254 Commits

SHA1 Message Date
0bf4f3f48a Modify a test to check that criteria additions change the fields ids map 2021-06-08 18:14:34 +02:00
82df524e09 Make sure that we register the field when setting criteria 2021-06-08 18:14:33 +02:00
133ab98260 Use the index primary key when deleting documents 2021-06-08 17:33:29 +02:00
a32236c80c Merge #211
211: Update Cargo.toml for next release v0.3.0 r=Kerollmops a=curquiza



Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2021-06-03 10:42:52 +00:00
3b2b3aeea9 Update Cargo.toml for next release v0.3.0 2021-06-03 12:24:27 +02:00
39ed133f9f Merge #193
193: Fix primary key behavior r=Kerollmops a=MarinPostma

This PR:
- Adds early returns on empty document additions, so that no error is returned when no documents are added and no primary key is set.
- Changes the primary key inference logic to match that of legacy meilisearch.
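
The early return described in the first point can be sketched as follows. This is a minimal illustration and not milli's code: `add_documents`, `AddError`, and the `Document` alias are hypothetical names, and the primary key handling is deliberately simplified (no inference is attempted).

```rust
use std::collections::BTreeMap;

// Hypothetical document type: a flat map of field name to raw value.
type Document = BTreeMap<String, String>;

#[derive(Debug, PartialEq)]
enum AddError {
    MissingPrimaryKey,
}

// Returns the number of documents accepted for indexing.
fn add_documents(primary_key: Option<&str>, documents: &[Document]) -> Result<usize, AddError> {
    // Early return: there is nothing to index, so a missing primary key is not an error.
    if documents.is_empty() {
        return Ok(0);
    }
    // Only a non-empty addition requires a primary key.
    let key = primary_key.ok_or(AddError::MissingPrimaryKey)?;
    // Count the documents that actually carry the primary key field.
    Ok(documents.iter().filter(|doc| doc.contains_key(key)).count())
}
```

With this shape, an empty batch succeeds whether or not a primary key is set, while a non-empty batch without a primary key still fails.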

close #194 

Co-authored-by: Marin Postma <postma.marin@protonmail.com>
Co-authored-by: marin postma <postma.marin@protonmail.com>
2021-06-03 10:24:21 +00:00
57898d8a90 fix silent deserialize error 2021-06-03 10:42:55 +02:00
834504aec0 Merge #204
204: Decorrelate Distinct, Asc/Desc, Filterable fields from the faceted fields r=Kerollmops a=Kerollmops

This PR decorrelates the fields that need to be stored in the facet databases (big inverted indexes for fast access) from the filterable fields. The previously named faceted fields are now named filterable fields, and the faceted fields are now the union of the distinct attribute, all the Asc/Desc criteria, and the filterable fields.

I added two tests to make sure that the engine correctly generates the facet databases when a distinct attribute or an Asc/Desc criterion is added, and one to make sure that it is impossible to filter on a non-filterable field even if it is a faceted one.
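
The relation between faceted and filterable fields can be sketched as below. This is only an illustration under assumptions: the `Criterion` enum, `faceted_fields`, and `can_filter_on` are hypothetical stand-ins, not milli's actual types or signatures.

```rust
use std::collections::HashSet;

// Hypothetical subset of ranking criteria; only Asc/Desc carry a field name.
#[allow(dead_code)]
enum Criterion {
    Typo,
    Words,
    Asc(String),
    Desc(String),
}

// Fields that must be stored in the facet databases: the union of the
// filterable fields, the distinct attribute, and every Asc/Desc field.
fn faceted_fields(
    filterable: &HashSet<String>,
    distinct: Option<&str>,
    criteria: &[Criterion],
) -> HashSet<String> {
    let mut fields = filterable.clone();
    if let Some(name) = distinct {
        fields.insert(name.to_string());
    }
    for criterion in criteria {
        match criterion {
            Criterion::Asc(name) | Criterion::Desc(name) => {
                fields.insert(name.clone());
            }
            _ => {}
        }
    }
    fields
}

// Filtering is allowed only on filterable fields, even if a field is faceted.
fn can_filter_on(filterable: &HashSet<String>, field: &str) -> bool {
    filterable.contains(field)
}
```

In this sketch a field used only by an `Asc` criterion ends up faceted but is still rejected by `can_filter_on`.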

Note that `AttributesForFacetting` has also been renamed to `FilterableAttributes`. It will be Transplant's job to make that change on the API; this change is only visible to users of the milli library.

- Related to https://github.com/meilisearch/transplant/issues/187.
- Fixes #161 by returning the documents that don't have the Asc/Desc field at the end of the bucket.
- Fixes #168.
- Fixes #152.
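
The fix for #161 (documents without the Asc/Desc field are returned at the end of the bucket) can be pictured with the following sketch. It assumes a bucket is a slice of `(document id, optional numeric value)` pairs; `order_bucket` is a hypothetical helper, not the engine's criterion implementation.

```rust
use std::cmp::Ordering;

// Orders one bucket of candidates by a numeric field. Documents that lack
// the field are kept, in their original order, after all the sorted ones.
fn order_bucket(bucket: &[(u32, Option<f64>)], ascending: bool) -> Vec<u32> {
    let mut with_value: Vec<(u32, f64)> = Vec::new();
    let mut without_value: Vec<u32> = Vec::new();
    for &(id, value) in bucket {
        match value {
            Some(v) => with_value.push((id, v)),
            None => without_value.push(id),
        }
    }
    with_value.sort_by(|a, b| {
        // NaN values compare as equal in this sketch.
        let ord = a.1.partial_cmp(&b.1).unwrap_or(Ordering::Equal);
        if ascending { ord } else { ord.reverse() }
    });
    with_value
        .into_iter()
        .map(|(id, _)| id)
        .chain(without_value)
        .collect()
}
```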

Co-authored-by: Kerollmops <clement@meilisearch.com>
Co-authored-by: Marin Postma <postma.marin@protonmail.com>
Co-authored-by: many <maxime@meilisearch.com>
2021-06-02 15:43:39 +00:00
26a9974667 Make asc/desc criterion return resting documents
Fix #161.2
2021-06-02 17:41:48 +02:00
3c304c89d4 Make sure that we generate the faceted database when required 2021-06-02 16:24:58 +02:00
b0c0490e85 Make sure that we can add a Asc/Desc field without it being filterable 2021-06-02 16:24:58 +02:00
3b1cd4c4b4 Rename the FacetCondition into FilterCondition 2021-06-02 16:24:58 +02:00
c2afdbb1fb Move and comment some internal facet_condition helper functions 2021-06-02 16:24:58 +02:00
6476827d3a Fix the indexer to be sure that distinct and Asc/Desc are also faceted 2021-06-02 16:24:58 +02:00
1e366dae3e remove useless lifetime on Distinct Trait 2021-06-02 16:24:58 +02:00
187c713de5 Remove the MapDistinct struct as now distinct attributes are faceted 2021-06-02 16:24:57 +02:00
ff440c1d9d Introduce the faceted fields method to retrieve those that needs faceting 2021-06-02 16:24:57 +02:00
2a3f9b32ff Rename the faceted fields into filterable fields 2021-06-02 16:24:57 +02:00
06c414a753 move the benchmarks to another crate so we can download the datasets automatically without adding overhead to the build of milli 2021-06-02 11:11:50 +02:00
3c84075d2d uses an env variable to find the datasets 2021-06-02 11:05:07 +02:00
4969abeaab update the facets for the benchmarks 2021-06-02 11:05:07 +02:00
e5dfde88fd fix the facets conditions 2021-06-02 11:05:07 +02:00
7c7fba4e57 remove the time limitation to let criterion do what it wants 2021-06-02 11:05:07 +02:00
5d5d115608 reformat all the files 2021-06-02 11:05:07 +02:00
7086009f93 improve the base search 2021-06-02 11:05:07 +02:00
d0b44c380f add benchmarks on a wiki dataset 2021-06-02 11:05:07 +02:00
beae843766 add a missing space 2021-06-02 11:05:07 +02:00
5132a106a1 refactorize everything related to the songs dataset in a songs benchmark file 2021-06-02 11:05:07 +02:00
136efd6b53 fix the benches 2021-06-02 11:05:07 +02:00
4b78ef31b6 add the configuration of the searchable fields and displayed fields and a default configuration for the songs 2021-06-02 11:05:07 +02:00
ea0c6d8c40 add a bunch of queries and start the introduction of the filters and the new dataset 2021-06-02 11:05:07 +02:00
3def42abd8 merge all the criterion only benchmarks in one file 2021-06-02 11:05:07 +02:00
a2bff68c1a remove the optional words for the typo criterion 2021-06-02 11:05:07 +02:00
aee49bb3cd add the proximity criterion 2021-06-02 11:05:07 +02:00
49e4cc3daf add the words criterion to the bench 2021-06-02 11:05:07 +02:00
15cce89a45 update the README with instructions to download the dataset 2021-06-02 11:05:07 +02:00
e425f70ef9 let criterion decide how much iteration it wants to do in 10s 2021-06-02 11:05:07 +02:00
4fdbfd6048 push a first version of the benchmark for the typo 2021-06-02 11:05:07 +02:00
270da98c46 Merge #202
202: Add field id word count docids database r=Kerollmops a=LegendreM

This PR introduces a new database, `field_id_word_count_docids`, that maps the number of words in an attribute to a list of document ids. This relation is limited to attributes that contain fewer than 11 words.
The exactness criterion uses this database to know whether a document has an attribute that contains exactly the query, without any additional word.
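
A rough in-memory picture of such a mapping is sketched below. Several things here are assumptions made for illustration: the key is taken to be a `(field id, word count)` pair, a `BTreeSet` stands in for the on-disk document id list, words are split on whitespace, and the struct and method names are hypothetical.

```rust
use std::collections::{BTreeMap, BTreeSet};

// Attributes with more words than this are not recorded ("fewer than 11 words").
const MAX_COUNTED_WORDS: usize = 10;

type FieldId = u16;
type DocumentId = u32;

#[derive(Default)]
struct FieldIdWordCountDocids {
    map: BTreeMap<(FieldId, u8), BTreeSet<DocumentId>>,
}

impl FieldIdWordCountDocids {
    // Records the word count of one attribute of one document.
    fn index_attribute(&mut self, docid: DocumentId, field_id: FieldId, text: &str) {
        let count = text.split_whitespace().count();
        if count <= MAX_COUNTED_WORDS {
            self.map.entry((field_id, count as u8)).or_default().insert(docid);
        }
    }

    // Documents whose attribute `field_id` contains exactly `word_count` words.
    fn docids(&self, field_id: FieldId, word_count: usize) -> BTreeSet<DocumentId> {
        if word_count > MAX_COUNTED_WORDS {
            return BTreeSet::new();
        }
        self.map
            .get(&(field_id, word_count as u8))
            .cloned()
            .unwrap_or_default()
    }
}
```

An exactness check for a two-word query would then intersect `docids(field_id, 2)` with the documents that contain both query words.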

Fix #165 
Fix #196
Related to [specifications:#36](https://github.com/meilisearch/specifications/pull/36)

Co-authored-by: many <maxime@meilisearch.com>
Co-authored-by: Many <legendre.maxime.isn@gmail.com>
2021-06-01 16:09:48 +00:00
e857ca4d7d Fix PR comments 2021-06-01 18:06:46 +02:00
ab2cf69e8d Update milli/src/update/delete_documents.rs
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-06-01 17:04:10 +02:00
8e6d1ff0dc Update milli/src/update/index_documents/store.rs
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-06-01 17:04:02 +02:00
7d36d664a7 Merge #203
203: Make the MatchingWords return the number of matching bytes r=Kerollmops a=LegendreM

Make the MatchingWords return the number of matching bytes using a custom Levenshtein algorithm.
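
One way to obtain a number of matching bytes from an edit-distance computation is sketched below. It is not the algorithm from this PR: `matching_bytes` is a hypothetical function that runs a plain Levenshtein table between the query and every prefix of the candidate word, picks the closest prefix (the longest one on ties), and reports its length in bytes.

```rust
use std::cmp::Reverse;

// Returns how many bytes of `word` match `query` within `max_typos` edits,
// or None when no prefix of `word` is close enough.
fn matching_bytes(query: &str, word: &str, max_typos: usize) -> Option<usize> {
    let query_chars: Vec<char> = query.chars().collect();
    // Keep byte offsets so a prefix length in chars can be turned into bytes.
    let word_chars: Vec<(usize, char)> = word.char_indices().collect();

    // row[j] = distance between the query consumed so far and the first j chars of `word`.
    let mut row: Vec<usize> = (0..=word_chars.len()).collect();
    for (i, &qc) in query_chars.iter().enumerate() {
        let mut next = vec![i + 1; word_chars.len() + 1];
        for j in 1..=word_chars.len() {
            let cost = if qc == word_chars[j - 1].1 { 0 } else { 1 };
            next[j] = (row[j - 1] + cost).min(row[j] + 1).min(next[j - 1] + 1);
        }
        row = next;
    }

    // Closest prefix of `word`; on equal distance prefer the longest prefix.
    let (prefix_len, distance) = row
        .iter()
        .copied()
        .enumerate()
        .min_by_key(|&(len, dist)| (dist, Reverse(len)))?;
    if distance > max_typos {
        return None;
    }
    // Byte offset where the prefix ends (the whole word if it matched entirely).
    let bytes = word_chars.get(prefix_len).map_or(word.len(), |&(offset, _)| offset);
    Some(bytes)
}
```

Returning bytes rather than characters matters for non-ASCII text: a one-character match on `é` covers two bytes of the original string.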

Fix #138

Co-authored-by: many <maxime@meilisearch.com>
2021-06-01 12:00:33 +00:00
225ae6fd25 Resolve PR comments 2021-06-01 11:53:09 +02:00
984dc7c1ed rewrite roaring codec without byteorder. 2021-05-31 22:15:39 +02:00
1373637da1 optimize roaring codec 2021-05-31 22:15:35 +02:00
1df68d342a Make the MatchingWords return the number of matching bytes 2021-05-31 18:22:29 +02:00
c701f8bf36 Use field id word count database in exactness criterion 2021-05-31 16:27:28 +02:00
4ddf008be2 add field id word count database 2021-05-31 16:27:28 +02:00
2f5e61bacb Merge #184
184: Transfer numbers and strings facets into the appropriate facet databases r=Kerollmops a=Kerollmops

This pull request is related to https://github.com/meilisearch/milli/issues/152 and changes the layout of the facet values: numbers and strings are now stored in dedicated databases, and the user no longer needs to define the type of the fields. No conversion between the two types is done anymore: numbers (floats and integers, converted to f64) go to the facet float database and strings go to the facet string database.
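
The routing described above can be sketched like this. The `Value` enum and the `FacetDatabases` struct are hypothetical simplifications (plain vectors of `(field id, value, document id)` tuples instead of real databases), shown only to make the "no conversion between types" rule concrete.

```rust
// Hypothetical, simplified field value.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Integer(i64),
    Float(f64),
    Text(String),
}

// Stand-ins for the two dedicated facet databases.
#[derive(Debug, Default)]
struct FacetDatabases {
    numbers: Vec<(u16, f64, u32)>,
    strings: Vec<(u16, String, u32)>,
}

impl FacetDatabases {
    // Routes a value by its own type; the user declares no field type.
    fn insert(&mut self, field_id: u16, docid: u32, value: &Value) {
        match value {
            // Integers and floats both end up as f64 in the number database.
            Value::Integer(i) => self.numbers.push((field_id, *i as f64, docid)),
            Value::Float(f) => self.numbers.push((field_id, *f, docid)),
            // Strings are never parsed as numbers, even if they look like one.
            Value::Text(s) => self.strings.push((field_id, s.clone(), docid)),
        }
    }
}
```

Note that the string `"42"` and the integer `42` land in different databases in this sketch, which is consistent with the CSV limitation mentioned below.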

There is one related issue that I found regarding CSVs: the values in a CSV are always considered to be strings. [meilisearch/specifications#28](d916b57d74/text/0028-indexing-csv.md) fixes this issue by allowing the user to define the field types using `:`, as described in its "CSV Formatting Rules" section.

All previous tests on facets have been modified to pass again and I have also done hand-driven tests with the 115m songs dataset. Everything seems to be good!

Fixes #192.

Co-authored-by: Clément Renault <clement@meilisearch.com>
Co-authored-by: Kerollmops <clement@meilisearch.com>
2021-05-31 13:32:58 +00:00