Commit Graph

12671 Commits

Author SHA1 Message Date
many
c1ce4e4ca9 Introduce mocked ExactAttribute step in exactness criterion 2021-05-06 14:28:31 +02:00
many
a3f8686fbf Introduce exactness criterion 2021-05-06 14:28:30 +02:00
Marin Postma
b192cb9c1f enable string syntax for the filters 2021-05-06 12:48:31 +02:00
bors[bot]
25f75d4d03 Merge #189
189: Update version for the next release (v0.2.1) r=Kerollmops a=curquiza



Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2021-05-05 15:28:56 +00:00
bors[bot]
7e63e32960 Merge #187
187: Fix fields distribution after documents merge r=Kerollmops a=shekhirin

Resolves https://github.com/meilisearch/milli/issues/174

The problem was that the fields distribution was calculated before the merge in `output_from_sorter()`. If you imported two documents with the same primary key value, the fields distribution counted them as two documents, while `output_from_sorter()` merged them into one.

---

```console
➜ Downloads cat short_movies.json
[
{"id":"47474","title":"The Serpent's Egg","poster":"https://image.tmdb.org/t/p/w500/n7z0doFkXHcvo8QQWHLFnkEPXRU.jpg","overview":"The Serpent's Egg follows a week in the life of Abel Rosenberg, an out-of-work American circus acrobat living in poverty-stricken Berlin following Germany's defeat in World War I.","release_date":246844800,"genres":["Thriller","Drama","Mystery"]},
{"id":"47474","title":"The Serpent's Egg","poster":"https://image.tmdb.org/t/p/w500/n7z0doFkXHcvo8QQWHLFnkEPXRU.jpg","overview":"The Serpent's Egg follows a week in the life of Abel Rosenberg, an out-of-work American circus acrobat living in poverty-stricken Berlin following Germany's defeat in World War I.","release_date":246844800,"genres":["Thriller","Drama","Mystery"]}
]
➜ Downloads curl -X POST -H "Content-Type: text/json" --data-binary @short_movies.json 127.0.0.1:7700/indexes/movies/documents
{"updateId":0}
```

## Before
```console
➜ Downloads curl -s 127.0.0.1:7700/indexes/movies/stats | jq
{
  "numberOfDocuments": 1,
  "isIndexing": false,
  "fieldsDistribution": {
    "release_date": 2,
    "poster": 2,
    "title": 2,
    "overview": 2,
    "genres": 2,
    "id": 2
  }
}
```

## After
```console
➜ Downloads curl -s 127.0.0.1:7700/indexes/movies/stats | jq
{
  "numberOfDocuments": 1,
  "isIndexing": false,
  "fieldsDistribution": {
    "poster": 1,
    "release_date": 1,
    "title": 1,
    "genres": 1,
    "id": 1,
    "overview": 1
  }
}
```
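
The idea behind the fix can be illustrated with a minimal sketch that computes the distribution only after documents sharing a primary key have been merged. Everything here is an assumption made for illustration: the `Document` alias and the functions `merge_by_primary_key` and `fields_distribution` are hypothetical names, and this is not the actual code of `output_from_sorter()`.

```rust
use std::collections::BTreeMap;

/// Hypothetical simplification: a document is its primary key plus its field names.
type Document = (String, Vec<String>);

/// Merge documents that share a primary key. In this sketch a later document
/// simply replaces an earlier one with the same key.
fn merge_by_primary_key(docs: &[Document]) -> BTreeMap<String, Vec<String>> {
    let mut merged = BTreeMap::new();
    for (id, fields) in docs {
        merged.insert(id.clone(), fields.clone());
    }
    merged
}

/// Count, for every field name, how many *merged* documents contain it.
/// Counting here, after the merge, avoids counting duplicates twice.
fn fields_distribution(merged: &BTreeMap<String, Vec<String>>) -> BTreeMap<String, u64> {
    let mut distribution = BTreeMap::new();
    for fields in merged.values() {
        for field in fields {
            *distribution.entry(field.clone()).or_insert(0) += 1;
        }
    }
    distribution
}
```

Feeding the same document twice into `merge_by_primary_key` and then calling `fields_distribution` on the result yields a count of 1 per field, which matches the "After" output above.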

Co-authored-by: Alexey Shekhirin <a.shekhirin@gmail.com>
2021-05-05 14:45:08 +00:00
Clémentine Urquizar
1e11578ef0 Update version for the next release (v0.2.1) 2021-05-05 14:57:34 +02:00
bors[bot]
998d5ead34 Merge #182
182: remove facet setting r=MarinPostma a=MarinPostma

remove useless code


Co-authored-by: Marin Postma <postma.marin@protonmail.com>
2021-05-05 11:22:12 +00:00
Marin Postma
ec7eb7798f remove facet setting 2021-05-04 22:36:31 +02:00
Alexey Shekhirin
f8d0f5265f fix(update): fields distribution after documents merge 2021-05-04 22:12:20 +03:00
Marin Postma
a717925caa remove filters, rename facet_filters to filter 2021-05-04 18:20:56 +02:00
bors[bot]
88ae02f8d9 Merge #174
174: Upgrade Tokenizer r=MarinPostma a=curquiza



Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2021-05-04 15:57:07 +00:00
Clémentine Urquizar
eb03a3ccb1 Upgrade Milli and Tokenizer 2021-05-04 17:56:19 +02:00
bors[bot]
1207a058d0 Merge #185
185: Provide an iterator over all the documents in a milli index r=Kerollmops a=irevoire



Co-authored-by: tamo <tamo@meilisearch.com>
2021-05-04 14:04:16 +00:00
bors[bot]
77740829bd Merge #177
177: bump milli r=MarinPostma a=MarinPostma



Co-authored-by: Marin Postma <postma.marin@protonmail.com>
2021-05-04 13:49:37 +00:00
Marin Postma
928fb34eff bump milli and fix tests 2021-05-04 15:10:22 +02:00
tamo
d61566787e provide an iterator over all the documents in a milli index 2021-05-04 11:23:51 +02:00
bors[bot]
c08f4599f2 Merge #183
183: remove tests on main r=Kerollmops a=MarinPostma

remove testing on main since we now use bors for merging.


Co-authored-by: Marin Postma <postma.marin@protonmail.com>
2021-05-03 15:06:28 +00:00
Marin Postma
bb5823c775 remove tests on main 2021-05-03 15:21:20 +02:00
bors[bot]
792225eaff Merge #182
182: Upgrade Milli version (v0.2.0) r=Kerollmops a=curquiza



Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2021-05-03 13:00:16 +00:00
Clémentine Urquizar
a8680887d8 Upgrade Milli version (v0.2.0) 2021-05-03 14:50:47 +02:00
bors[bot]
5b93d6ab91 Merge #181
181: Upgrade Tokenizer version (v0.2.2) r=Kerollmops a=curquiza



Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2021-05-03 11:03:25 +00:00
bors[bot]
5c762b71dd Merge #177
177: Add bors r=Kerollmops a=curquiza



Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2021-05-03 10:57:09 +00:00
Clémentine Urquizar
c30f17fafb Add bors 2021-05-03 12:29:30 +02:00
Clémentine Urquizar
34e02aba42 Upgrade Tokenizer version (v0.2.2) 2021-05-03 10:55:55 +02:00
bors[bot]
1e6b40a24b Merge #172
172: Fix cors authentication issue r=MarinPostma a=MarinPostma

The error was due to the middleware returning an error, instead of a response containing the error.
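
A rough sketch of the difference, under the assumption that an outer CORS layer can only add its headers to an actual response. The types `Response` and `AuthError`, the functions below, and the 401 status code are all hypothetical stand-ins, not the framework's or the project's real API.

```rust
/// Hypothetical minimal response type.
#[derive(Debug, PartialEq)]
struct Response {
    status: u16,
    headers: Vec<(String, String)>,
}

/// Hypothetical error type produced by the authentication step.
#[derive(Debug, PartialEq)]
struct AuthError;

/// Stand-in for an outer CORS layer: it decorates responses, but an `Err`
/// passes through untouched and never receives the CORS header.
fn cors_layer(inner: Result<Response, AuthError>) -> Result<Response, AuthError> {
    inner.map(|mut res| {
        res.headers
            .push(("Access-Control-Allow-Origin".to_string(), "*".to_string()));
        res
    })
}

/// Problematic shape: an authentication failure short-circuits as an error.
fn auth_returning_error(authorized: bool) -> Result<Response, AuthError> {
    if authorized {
        Ok(Response { status: 200, headers: Vec::new() })
    } else {
        Err(AuthError)
    }
}

/// Fixed shape: the failure is an ordinary response that carries the error,
/// so outer layers still get a chance to decorate it.
fn auth_returning_response(authorized: bool) -> Result<Response, AuthError> {
    let status = if authorized { 200 } else { 401 };
    Ok(Response { status, headers: Vec::new() })
}
```

With these definitions, `cors_layer(auth_returning_error(false))` stays an `Err` without any header, while `cors_layer(auth_returning_response(false))` is a 401 response that does carry the CORS header.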

close #110


Co-authored-by: Marin Postma <postma.marin@protonmail.com>
2021-05-03 08:38:42 +00:00
Clément Renault
03bb95539b Merge pull request #180 from shekhirin/disable-autogenerated-doc-ids
Disable autogenerate_docids by default
2021-05-01 12:22:13 +02:00
Alexey Shekhirin
d81c0e8bba feat(update): disable autogenerate_docids by default 2021-04-30 21:41:34 +03:00
Clément Renault
c112877a4a Merge pull request #178 from meilisearch/visible-document-nb
make document addition number visible
2021-04-29 21:54:51 +02:00
Marin Postma
e8e32e0ba1 make document addition number visible 2021-04-29 20:05:07 +02:00
Marin Postma
78217bcf18 Fix cors authentication issue 2021-04-29 16:28:12 +02:00
bors[bot]
53c88d9fa3 Merge #170
170: Improve CI r=MarinPostma a=curquiza

Checked with @Kerollmops to improve (a little bit) the CI execution time.

Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2021-04-29 14:08:33 +00:00
bors[bot]
b14fdb1163 Merge #171
171: Update mini-dashboard with version 0.1.2 r=MarinPostma a=mdubus

Update of the mini-dashboard sha1 & assets-url, due to a new release

Co-authored-by: Morgane Dubus <morgane.d@meilisearch.com>
2021-04-29 13:48:54 +00:00
Morgane Dubus
3d5fba94c2 Update mini-dashboard with version 0.1.2 2021-04-29 15:22:41 +02:00
Clémentine Urquizar
3ee2b07918 Improve CI 2021-04-29 15:19:48 +02:00
Clément Renault
b31f36d68c Merge pull request #173 from meilisearch/enhance-distinct-attributes
Remove excluded document in criteria iterations
2021-04-29 12:14:44 +02:00
many
ee09e50e7f Remove excluded document in criteria iterations
- pass excluded documents to the criteria to remove them in higher levels of the bucket sort
- merge already returned documents with excluded documents to avoid duplicates
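
The two points above can be sketched with plain sets. This is only an illustration: `DocId` and `filter_bucket` are hypothetical names, and the real criteria work on their own data structures rather than on `BTreeSet`.

```rust
use std::collections::BTreeSet;

/// Hypothetical document id type for this sketch.
type DocId = u32;

/// Given the candidates of one bucket, drop every document that was either
/// already returned by a previous bucket or excluded (for example by a
/// distinct attribute). Merging both sets first means the bucket is filtered
/// once and no document can be returned twice.
fn filter_bucket(
    bucket: &BTreeSet<DocId>,
    already_returned: &BTreeSet<DocId>,
    excluded: &BTreeSet<DocId>,
) -> BTreeSet<DocId> {
    let skip: BTreeSet<DocId> = already_returned.union(excluded).copied().collect();
    bucket.difference(&skip).copied().collect()
}
```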

Related to #125 and #112
Fix #170
2021-04-29 12:09:38 +02:00
Clément Renault
374c2782ad Merge pull request #176 from yanns/patch-1
do not use echo that escapes newline
2021-04-29 10:50:15 +02:00
Yann Simon
566c4a53c5 do not use echo that escapes newline
Fix https://github.com/meilisearch/milli/issues/175
2021-04-29 09:25:35 +02:00
Many
5b9524e1ba Merge pull request #172 from meilisearch/optimize-proximity-criterion
Optimize proximity criterion
2021-04-28 15:41:57 +02:00
many
31607bf9cd Add a threshold on proximity when choosing between linear/set algorithm 2021-04-28 14:57:22 +02:00
Clément Renault
5a10de1b9f Merge pull request #122 from meilisearch/attribute-criterion
Introduce the Attribute criterion
2021-04-28 14:34:50 +02:00
many
3b7e6afb55 Make some refacto and add documentation 2021-04-28 13:53:27 +02:00
bors[bot]
8bc7dd8b03 Merge #143
143: Shared update store r=irevoire a=MarinPostma

This PR changes the updates process so that only one instance of an update store is shared among indexes.

This allows updates to always be processed sequentially without additional synchronization, and fixes the bug where the first pending update of every index was reported as processing, whereas only one actually was.

EDIT:

I ended up having to rewrite the whole `UpdateStore` so that updates are really queued and processed sequentially in the order they were added. For that purpose I created a `pending_queue` that orders the updates by a global update id.

To find the next `update_id` to use, both globally and for each index, I have created another database that contains the next id to use.

Finally, all updates that have been processed (with success or otherwise) are stored in an `updates` database.

The key layout of these databases makes it easy to iterate over the elements of a particular index, and greatly reduces the amount of code needed to do so compared to the former implementation.
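
As an in-memory sketch of this design, the three databases can be modeled with `BTreeMap`s. All of this is assumed for illustration: the struct `UpdateStoreSketch`, the 12-byte key made of a big-endian index id followed by a big-endian update id, and the `String` payloads are hypothetical and are not the actual on-disk layout of the `UpdateStore`.

```rust
use std::collections::BTreeMap;

/// Hypothetical key: index id then update id, both big-endian, so that byte
/// order equals (index id, update id) order and one index is a contiguous range.
fn key(index_id: u32, update_id: u64) -> [u8; 12] {
    let mut k = [0u8; 12];
    k[..4].copy_from_slice(&index_id.to_be_bytes());
    k[4..].copy_from_slice(&update_id.to_be_bytes());
    k
}

#[derive(Default)]
struct UpdateStoreSketch {
    next_global_id: u64,                               // next id for the global queue
    next_index_id: BTreeMap<u32, u64>,                 // next update id per index
    pending_queue: BTreeMap<u64, (u32, u64, String)>,  // ordered by global update id
    updates: BTreeMap<[u8; 12], String>,               // processed updates
}

impl UpdateStoreSketch {
    /// Enqueue an update and return its (global id, per-index update id).
    fn register(&mut self, index_id: u32, payload: &str) -> (u64, u64) {
        let global_id = self.next_global_id;
        self.next_global_id += 1;
        let next = self.next_index_id.entry(index_id).or_insert(0);
        let update_id = *next;
        *next += 1;
        self.pending_queue
            .insert(global_id, (index_id, update_id, payload.to_string()));
        (global_id, update_id)
    }

    /// Process the oldest pending update, whatever index it belongs to.
    fn process_next(&mut self) -> Option<(u32, u64)> {
        let global_id = *self.pending_queue.keys().next()?;
        let (index_id, update_id, payload) = self.pending_queue.remove(&global_id)?;
        self.updates.insert(key(index_id, update_id), payload);
        Some((index_id, update_id))
    }

    /// Iterate over the processed updates of one index with a single range scan.
    fn processed_for_index(&self, index_id: u32) -> Vec<u64> {
        self.updates
            .range(key(index_id, 0)..=key(index_id, u64::MAX))
            .map(|(k, _)| {
                let mut id = [0u8; 8];
                id.copy_from_slice(&k[4..]);
                u64::from_be_bytes(id)
            })
            .collect()
    }
}
```

Registering updates for indexes 1, 2, and 1 again and then calling `process_next` three times handles them in exactly that order, and `processed_for_index(1)` afterwards returns the two update ids of index 1 without touching the entries of index 2.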

I have also simplified the locking mechanism for the update store thanks to the StateLock data structure, which allows an arbitrary number of readers and a single writer to access the state concurrently. The current state can be either Idle, Processing, or Snapshotting. While an update or a snapshot is ongoing, the process holds the state lock until it is done with its task. When it is done, it sets the state back to Idle.
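
One possible shape for such a lock, built only on the standard library, is sketched below. It is an assumption about how the idea could be implemented, not the actual `StateLock`: the field names, the `StateLockGuard` type, and the choice to reset the state to `Idle` when the guard is dropped are all made up for this example.

```rust
use std::sync::{Arc, Mutex, MutexGuard, RwLock};

#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum State {
    Idle,
    Processing,
    Snapshotting,
}

struct StateLock {
    writer: Mutex<()>,         // held for the whole task: at most one writer
    data: RwLock<Arc<State>>,  // locked only briefly to read or swap the state
}

/// Held by the single writer while it processes an update or takes a snapshot.
struct StateLockGuard<'a> {
    _writer: MutexGuard<'a, ()>,
    lock: &'a StateLock,
}

impl StateLockGuard<'_> {
    /// Publish a new state; readers see it on their next `read`.
    fn swap(&self, state: State) {
        *self.lock.data.write().unwrap() = Arc::new(state);
    }
}

impl Drop for StateLockGuard<'_> {
    /// Design choice of this sketch: go back to Idle automatically when done.
    fn drop(&mut self) {
        self.swap(State::Idle);
    }
}

impl StateLock {
    fn new() -> Self {
        StateLock {
            writer: Mutex::new(()),
            data: RwLock::new(Arc::new(State::Idle)),
        }
    }

    /// Any number of readers can take a cheap snapshot of the current state,
    /// even while a writer holds its guard.
    fn read(&self) -> Arc<State> {
        self.data.read().unwrap().clone()
    }

    /// Blocks until no other writer is active.
    fn write(&self) -> StateLockGuard<'_> {
        StateLockGuard {
            _writer: self.writer.lock().unwrap(),
            lock: self,
        }
    }
}
```

A task would call `write()`, `swap(State::Processing)`, do its work, and let the guard go out of scope, while status requests keep calling `read()` in the meantime.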

I have made other small improvements here and there, and have left some others for future work, such as:
- When creating an update file to hold a request's content, it would be preferable to first create a temporary file, and then atomically persist it once we have written to it. This would simplify the case when there is no data to be written to the file, since we wouldn't have to take care of cleaning up after ourselves.
- The logic for content validation must be factored.
- Some more tests related to error handling in the process_pending_update function.
- The issue #159
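
The temporary-file idea from the first item could look roughly like the following standard-library sketch. The names `TempGuard` and `store_update_content` and the `.tmp` extension are hypothetical, and the sketch assumes the temporary file lives in the same directory as the destination so that the final rename stays on one filesystem.

```rust
use std::fs::{self, File};
use std::io::{self, Read};
use std::path::{Path, PathBuf};

/// Removes the temporary file when dropped, unless it has been persisted.
struct TempGuard {
    path: PathBuf,
    keep: bool,
}

impl Drop for TempGuard {
    fn drop(&mut self) {
        if !self.keep {
            // Best effort: the file may not exist if creation failed.
            let _ = fs::remove_file(&self.path);
        }
    }
}

/// Write `content` to a temporary file next to `dest`, then rename it into place.
/// Returns `Ok(None)` when there was nothing to write; in that case, and on any
/// error, the guard removes the temporary file and `dest` is never created.
fn store_update_content(mut content: impl Read, dest: &Path) -> io::Result<Option<u64>> {
    let mut guard = TempGuard { path: dest.with_extension("tmp"), keep: false };
    let mut file = File::create(&guard.path)?;
    let written = io::copy(&mut content, &mut file)?;
    if written == 0 {
        return Ok(None);
    }
    file.sync_all()?;
    drop(file); // close the handle before renaming
    fs::rename(&guard.path, dest)?;
    guard.keep = true;
    Ok(Some(written))
}
```

Because cleanup lives in the guard, the empty-payload case and the error cases need no dedicated cleanup code in the caller.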

close #114


Co-authored-by: Marin Postma <postma.marin@protonmail.com>
2021-04-27 18:41:55 +00:00
Thomas Payet
e6fd1afc3d Merge pull request #163 from meilisearch/curquiza-patch-1
Update README.md
2021-04-27 18:51:04 +02:00
Marin Postma
a961f0ce75 fix clippy warnings 2021-04-27 18:28:46 +02:00
Many
0add4d735c Update milli/src/search/criteria/attribute.rs
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-04-27 17:40:34 +02:00
Many
3794ffc952 Update milli/src/search/criteria/attribute.rs
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-04-27 17:39:23 +02:00
Many
329bd4a1bb Update milli/src/search/criteria/attribute.rs
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-04-27 17:39:03 +02:00
Many
3b1358b62f Update milli/src/search/criteria/attribute.rs
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-04-27 17:32:19 +02:00
Many
c862b1bc6b Update milli/src/search/criteria/attribute.rs
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-04-27 17:32:10 +02:00