Commit Graph

1447 Commits

Author SHA1 Message Date
刘瀚骋
2c65781d91 format 2021-10-12 22:20:22 +08:00
bors[bot]
6e3b869e6a Merge #388
388: fix primary key inference r=MarinPostma a=MarinPostma

The primary key is was infered from a hashtable index of the field. For this reason the order in which the fields were interated upon was not deterministic, and the primary key was chosed ffrom the first field containing "id".

This fix sorts the the index by field_id when infering the primary key.


Co-authored-by: mpostma <postma.marin@protonmail.com>
2021-10-12 09:25:16 +00:00
mpostma
86ead92ed5 infer primary key on sorted fields 2021-10-12 11:15:11 +02:00
mpostma
9a266a531b test correct primary key inference 2021-10-12 11:08:53 +02:00
bors[bot]
3f7f24b90e Merge #368
368: Remove limit of 1000 position per attribute r=irevoire a=ManyTheFish

Instead of using an arbitrary limit we encode the absolute position in a u32
using one strong u16 for the field id and a weak u16 for the relative position in the attribute.

- [x] check database size difference

below is the database size difference for each dataset:
![Capture d’écran 2021-09-27 à 18 01 44](https://user-images.githubusercontent.com/6482087/134944199-bd25fed0-6c34-475c-9afc-197871e06553.png)

- [ ] check search time on big dataset


Related to [product#202](https://github.com/meilisearch/product/issues/202)

Co-authored-by: many <maxime@meilisearch.com>
2021-10-12 08:30:33 +00:00
many
c5a6075484 Make max_position_per_attributes changable 2021-10-12 10:10:50 +02:00
many
360c5ff3df Remove limit of 1000 position per attribute
Instead of using an arbitrary limit we encode the absolute position in a u32
using one strong u16 for the field id and a weak u16 for the relative position in the attribute.
2021-10-12 10:10:50 +02:00
刘瀚骋
d323e35001 add a test case 2021-10-12 13:30:40 +08:00
刘瀚骋
70f576d5d3 error handling 2021-10-12 13:30:40 +08:00
刘瀚骋
28f9be8d7c support syntax 2021-10-12 13:30:40 +08:00
刘瀚骋
469d92c569 tweak error handling 2021-10-12 13:30:40 +08:00
刘瀚骋
7a90a101ee reorganize parser logic 2021-10-12 13:30:40 +08:00
刘瀚骋
f7796edc7e remove everything about pest 2021-10-12 13:30:40 +08:00
刘瀚骋
ac1df9d9d7 fix typo and remove pest 2021-10-12 13:30:40 +08:00
刘瀚骋
50ad750ec1 enhance error handling 2021-10-12 13:30:40 +08:00
刘瀚骋
8748df2ca4 draft without error handling 2021-10-12 13:30:40 +08:00
bors[bot]
8f6b6c9042 Merge #385
385: Fix the wiki indexing benchmark r=ManyTheFish a=irevoire



Co-authored-by: Tamo <tamo@meilisearch.com>
2021-10-11 15:12:24 +00:00
bors[bot]
07fb6d64e5 Merge #386
386: fix obkv document r=curquiza a=MarinPostma

When serializing a document, the serializer resolved the field_id of the current field and immediately added it to the obkv document under construction. The issue with that is that obkv expects the fields to be inserted in order, and when a document with out of order fields was added, obkv failed to insert the field.

The current fix first resolves each field_id, and adds all the fields to a temporary `BTreeMap`, until `end` is called on the map serializer, where all the fields are added to the obkv at once, and in order.


Co-authored-by: mpostma <postma.marin@protonmail.com>
2021-10-11 13:45:04 +00:00
bors[bot]
e45c846af5 Merge #387
387: Update version for the next release (v0.17.2) r=Kerollmops a=curquiza



Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2021-10-11 13:21:47 +00:00
Clémentine Urquizar
dd56e82dba Update version for the next release (v0.17.2) 2021-10-11 15:20:35 +02:00
mpostma
99889a0ed0 add obkv document serialization test 2021-10-11 15:13:17 +02:00
mpostma
799f3d43c8 fix serialization to obkv format 2021-10-11 15:04:47 +02:00
Tamo
ed7fd855af fix the wiki indexing benchmark 2021-10-11 14:26:36 +02:00
Tom Parker-Shemilt
2dfe24f067 memmap -> memmap2 2021-10-10 22:47:12 +01:00
bors[bot]
a2743baaa3 Merge #383
383: Add check on latitude and longitude r=irevoire a=irevoire

Latitudes are not supposed to go beyond 90 degrees or below -90.
The same goes for longitudes with 180 or -180.

This was badly implemented in the filters, and was not implemented for the `AscDesc` rules.

Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: Irevoire <tamo@meilisearch.com>
2021-10-08 10:15:25 +00:00
Irevoire
b65aa7b5ac Apply suggestions from code review
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-10-07 17:51:52 +02:00
Tamo
11dfe38761 Update the check on the latitude and longitude
Latitude are not supposed to go beyound 90 degrees or below -90.
The same goes for longitude with 180 or -180.

This was badly implemented in the filters, and was not implemented for the AscDesc rules.
2021-10-07 16:10:43 +02:00
bors[bot]
dde1da1c0e Merge #382
382: Refactor attribute criterion r=Kerollmops a=ManyTheFish

### Re-implement set based algorithm for attribute criterion
#### Levels
Instead of doing level iteration and digging in the interesting level, we only iterate over the lowest level.

#### crossword iteration VS minimal position iteration
Instead of crossing word position in order to iterate strictly over the position that gives the best rank in good order; we iterate word by word starting with the word that increases the rank the little as possible.
This new method is a bit less precise but way simpler.

### Simplify word-level-position database
We don't use levels anymore in the attribute criterion, and so we removed the level complexity of the database making a word-position-docids database.

### Benchmarks on search on big datasets

#### songs main VS refactor-attribute-criterion
```diff
  group                                                   search_songsmain_31c18f09               search_songsrefactor-attribute-criterion_1bd15d84
  -----                                                   -------------------------               -------------------------------------------------
- smol-songs.csv: basic filter: <=/Notstandskomitee       1.00     84.8±0.58µs        ? ?/sec     1.09     92.2±8.98µs        ? ?/sec
+ smol-songs.csv: basic filter: TO/Notstandskomitee       1.18     98.0±6.30µs        ? ?/sec     1.00     83.2±0.97µs        ? ?/sec
+ smol-songs.csv: basic with quote/"david" "bowie"        114.68    76.0±0.20ms        ? ?/sec    1.00    662.5±5.03µs        ? ?/sec
- smol-songs.csv: basic with quote/"john"                 1.00    197.4±1.06µs        ? ?/sec     1.05    208.1±1.53µs        ? ?/sec
+ smol-songs.csv: basic with quote/"michael" "jackson"    2.75      2.0±0.01ms        ? ?/sec     1.00    738.9±3.91µs        ? ?/sec
+ smol-songs.csv: basic without quote/david bowie         297.42  1499.3±0.86ms        ? ?/sec    1.00      5.0±0.02ms        ? ?/sec
+ smol-songs.csv: basic without quote/michael jackson     2.55      8.9±0.02ms        ? ?/sec     1.00      3.5±0.01ms        ? ?/sec
+ smol-songs.csv: big filter/john                         1.08    473.6±2.25µs        ? ?/sec     1.00    438.1±2.59µs        ? ?/sec
- smol-songs.csv: prefix search/a                         1.00    446.9±1.81µs        ? ?/sec     1.79    800.5±4.45µs        ? ?/sec
- smol-songs.csv: prefix search/b                         1.00    398.5±2.74µs        ? ?/sec     1.81    723.1±5.46µs        ? ?/sec
- smol-songs.csv: prefix search/i                         1.00    486.3±1.99µs        ? ?/sec     1.69    823.6±9.42µs        ? ?/sec
- smol-songs.csv: prefix search/s                         1.00    229.6±3.29µs        ? ?/sec     2.59    594.4±2.22µs        ? ?/sec
- smol-songs.csv: prefix search/x                         1.00    150.2±0.76µs        ? ?/sec     1.11    166.0±0.87µs        ? ?/sec
```

On songs, the new algorithm gives a big improvement on slow queries, and is slower on one char prefix search (fast queries <1ms).

#### wiki main VS refactor-attribute-criterion
```diff
  group                                                           search_wikimain_31c18f09               search_wikirefactor-attribute-criterion_1bd15d84
  -----                                                           ------------------------               ------------------------------------------------
- smol-wiki-articles.csv: basic with quote/"rock" "and" "roll"    1.00      3.2±0.01ms        ? ?/sec    1.15      3.7±0.01ms        ? ?/sec
- smol-wiki-articles.csv: basic without quote/film                1.00    351.5±2.47µs        ? ?/sec    1.13    396.8±1.63µs        ? ?/sec
+ smol-wiki-articles.csv: basic without quote/rock and roll       1.10      9.4±0.02ms        ? ?/sec    1.00      8.6±0.04ms        ? ?/sec
- smol-wiki-articles.csv: basic without quote/spain               1.00    446.0±3.23µs        ? ?/sec    1.11    496.6±7.75µs        ? ?/sec
- smol-wiki-articles.csv: prefix search/c                         1.00    115.6±0.61µs        ? ?/sec    2.22    256.7±1.24µs        ? ?/sec
- smol-wiki-articles.csv: prefix search/g                         1.00    189.7±2.03µs        ? ?/sec    1.57    297.0±1.35µs        ? ?/sec
- smol-wiki-articles.csv: prefix search/j                         1.00    209.2±1.11µs        ? ?/sec    1.40    293.0±2.09µs        ? ?/sec
- smol-wiki-articles.csv: prefix search/q                         1.00     79.0±0.44µs        ? ?/sec    1.10     87.2±0.69µs        ? ?/sec
- smol-wiki-articles.csv: prefix search/t                         1.00    270.1±1.15µs        ? ?/sec    1.55    419.9±5.16µs        ? ?/sec
- smol-wiki-articles.csv: prefix search/x                         1.00    244.9±1.33µs        ? ?/sec    1.07    260.9±1.95µs        ? ?/sec
- smol-wiki-articles.csv: words/Abraham machin                    1.00      8.1±0.03ms        ? ?/sec    1.17      9.4±0.02ms        ? ?/sec
- smol-wiki-articles.csv: words/Idaho Bellevue pizza              1.00     19.3±0.07ms        ? ?/sec    1.07     20.6±0.05ms        ? ?/sec
```
On wiki we have some regressions `+17%` and `+15%` on request `>1ms`.

Co-authored-by: many <maxime@meilisearch.com>
2021-10-06 09:19:33 +00:00
many
085bc6440c Apply PR comments 2021-10-06 11:12:26 +02:00
many
1bd15d849b Reduce candidates threshold 2021-10-05 18:52:14 +02:00
many
ea4bd29d14 Apply PR comments 2021-10-05 17:35:07 +02:00
many
5ed75de0db Update infos crate 2021-10-05 13:56:12 +02:00
many
3296bb243c Simplify word level position DB into a word position DB 2021-10-05 12:15:02 +02:00
many
75d341d928 Re-implement set based algorithm for attribute criterion 2021-10-05 12:14:50 +02:00
bors[bot]
31c18f0953 Merge #381
381: Update version for the next release (v0.17.1) r=irevoire a=curquiza



Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2021-10-03 02:12:43 +00:00
Clémentine Urquizar
05d8a33a28 Update version for the next release (v0.17.1) 2021-10-02 16:21:31 +02:00
bors[bot]
c9092c72bf Merge #380
380: Reserved keyword error message r=Kerollmops a=irevoire

And I missed _another_ reserved keyword error message in the filter :(

Co-authored-by: Tamo <tamo@meilisearch.com>
2021-10-01 07:13:31 +00:00
Tamo
d9eba9d145 improve and test the sort error message 2021-09-30 14:38:27 +02:00
Tamo
0ee67bb7d1 improve the reserved keyword error message for the filters 2021-09-30 14:38:27 +02:00
bors[bot]
22551d0941 Merge #379
379: Revert "Change chunk size to 4MiB to fit more the end user usage" r=curquiza a=ManyTheFish

Reverts meilisearch/milli#370

Co-authored-by: Many <legendre.maxime.isn@gmail.com>
2021-09-29 13:20:53 +00:00
Many
26b5dad042 Revert "Change chunk size to 4MiB to fit more the end user usage" 2021-09-29 15:08:39 +02:00
bors[bot]
6a057a3bd0 Merge #378
378: Hotfix meilisearch#1707 r=Kerollmops a=ManyTheFish

This PR contains an ugly quick fix of [meilisearch#1707](https://github.com/meilisearch/MeiliSearch/issues/1707).

- remove comparison reverse on rank. Enhancing relevancy and performances
- iterate over level 0 only. Enhancing performances.

A better fix is in development.

Co-authored-by: many <maxime@meilisearch.com>
Co-authored-by: Many <legendre.maxime.isn@gmail.com>
2021-09-29 12:57:31 +00:00
Many
2e49230ca2 Update milli/src/search/criteria/attribute.rs
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-09-29 14:49:45 +02:00
Many
7ad0214089 Update milli/src/search/criteria/attribute.rs
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-09-29 14:49:41 +02:00
many
1df5b8712b Hotfix meilisearch#1707 2021-09-29 14:41:56 +02:00
bors[bot]
bfedbc1b6d Merge #374
374: Enhance CSV document parsing r=Kerollmops a=ManyTheFish

Benchmarks on `search_songs` were crashing because of the CSV parsing.

Co-authored-by: many <maxime@meilisearch.com>
2021-09-29 08:55:54 +00:00
bors[bot]
68c758a533 Merge #376
376: Stop casting integer docids to string r=Kerollmops a=irevoire

When a docid is an integer, we stop casting it to a string, and thus we don't add `"` around it.

Co-authored-by: Tamo <tamo@meilisearch.com>
2021-09-29 08:32:48 +00:00
many
d2427f18e5 Enhance CSV document parsing 2021-09-29 10:25:33 +02:00
bors[bot]
00f94b1ffd Merge #377
377: Update version for the next release (v0.17.0) r=Kerollmops a=curquiza



Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2021-09-28 20:43:33 +00:00
Clémentine Urquizar
0e8665bf18 Update version for the next release (v0.17.0) 2021-09-28 19:38:12 +02:00