Commit Graph

1583 Commits

Author SHA1 Message Date
刘瀚骋
50ad750ec1 enhance error handling 2021-10-12 13:30:40 +08:00
刘瀚骋
8748df2ca4 draft without error handling 2021-10-12 13:30:40 +08:00
bors[bot]
8f6b6c9042 Merge #385
385: Fix the wiki indexing benchmark r=ManyTheFish a=irevoire



Co-authored-by: Tamo <tamo@meilisearch.com>
2021-10-11 15:12:24 +00:00
bors[bot]
07fb6d64e5 Merge #386
386: fix obkv document r=curquiza a=MarinPostma

When serializing a document, the serializer resolved the field_id of the current field and immediately added it to the obkv document under construction. The issue with that is that obkv expects the fields to be inserted in order, and when a document with out of order fields was added, obkv failed to insert the field.

The current fix first resolves each field_id, and adds all the fields to a temporary `BTreeMap`, until `end` is called on the map serializer, where all the fields are added to the obkv at once, and in order.


Co-authored-by: mpostma <postma.marin@protonmail.com>
2021-10-11 13:45:04 +00:00
bors[bot]
e45c846af5 Merge #387
387: Update version for the next release (v0.17.2) r=Kerollmops a=curquiza



Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2021-10-11 13:21:47 +00:00
Clémentine Urquizar
dd56e82dba Update version for the next release (v0.17.2) 2021-10-11 15:20:35 +02:00
mpostma
99889a0ed0 add obkv document serialization test 2021-10-11 15:13:17 +02:00
mpostma
799f3d43c8 fix serialization to obkv format 2021-10-11 15:04:47 +02:00
Tamo
ed7fd855af fix the wiki indexing benchmark 2021-10-11 14:26:36 +02:00
Tom Parker-Shemilt
2dfe24f067 memmap -> memmap2 2021-10-10 22:47:12 +01:00
bors[bot]
a2743baaa3 Merge #383
383: Add check on latitude and longitude r=irevoire a=irevoire

Latitudes are not supposed to go beyond 90 degrees or below -90.
The same goes for longitudes with 180 or -180.

This was badly implemented in the filters, and was not implemented for the `AscDesc` rules.

Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: Irevoire <tamo@meilisearch.com>
2021-10-08 10:15:25 +00:00
Irevoire
b65aa7b5ac Apply suggestions from code review
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-10-07 17:51:52 +02:00
Tamo
11dfe38761 Update the check on the latitude and longitude
Latitude are not supposed to go beyound 90 degrees or below -90.
The same goes for longitude with 180 or -180.

This was badly implemented in the filters, and was not implemented for the AscDesc rules.
2021-10-07 16:10:43 +02:00
bors[bot]
dde1da1c0e Merge #382
382: Refactor attribute criterion r=Kerollmops a=ManyTheFish

### Re-implement set based algorithm for attribute criterion
#### Levels
Instead of doing level iteration and digging in the interesting level, we only iterate over the lowest level.

#### crossword iteration VS minimal position iteration
Instead of crossing word position in order to iterate strictly over the position that gives the best rank in good order; we iterate word by word starting with the word that increases the rank the little as possible.
This new method is a bit less precise but way simpler.

### Simplify word-level-position database
We don't use levels anymore in the attribute criterion, and so we removed the level complexity of the database making a word-position-docids database.

### Benchmarks on search on big datasets

#### songs main VS refactor-attribute-criterion
```diff
  group                                                   search_songsmain_31c18f09               search_songsrefactor-attribute-criterion_1bd15d84
  -----                                                   -------------------------               -------------------------------------------------
- smol-songs.csv: basic filter: <=/Notstandskomitee       1.00     84.8±0.58µs        ? ?/sec     1.09     92.2±8.98µs        ? ?/sec
+ smol-songs.csv: basic filter: TO/Notstandskomitee       1.18     98.0±6.30µs        ? ?/sec     1.00     83.2±0.97µs        ? ?/sec
+ smol-songs.csv: basic with quote/"david" "bowie"        114.68    76.0±0.20ms        ? ?/sec    1.00    662.5±5.03µs        ? ?/sec
- smol-songs.csv: basic with quote/"john"                 1.00    197.4±1.06µs        ? ?/sec     1.05    208.1±1.53µs        ? ?/sec
+ smol-songs.csv: basic with quote/"michael" "jackson"    2.75      2.0±0.01ms        ? ?/sec     1.00    738.9±3.91µs        ? ?/sec
+ smol-songs.csv: basic without quote/david bowie         297.42  1499.3±0.86ms        ? ?/sec    1.00      5.0±0.02ms        ? ?/sec
+ smol-songs.csv: basic without quote/michael jackson     2.55      8.9±0.02ms        ? ?/sec     1.00      3.5±0.01ms        ? ?/sec
+ smol-songs.csv: big filter/john                         1.08    473.6±2.25µs        ? ?/sec     1.00    438.1±2.59µs        ? ?/sec
- smol-songs.csv: prefix search/a                         1.00    446.9±1.81µs        ? ?/sec     1.79    800.5±4.45µs        ? ?/sec
- smol-songs.csv: prefix search/b                         1.00    398.5±2.74µs        ? ?/sec     1.81    723.1±5.46µs        ? ?/sec
- smol-songs.csv: prefix search/i                         1.00    486.3±1.99µs        ? ?/sec     1.69    823.6±9.42µs        ? ?/sec
- smol-songs.csv: prefix search/s                         1.00    229.6±3.29µs        ? ?/sec     2.59    594.4±2.22µs        ? ?/sec
- smol-songs.csv: prefix search/x                         1.00    150.2±0.76µs        ? ?/sec     1.11    166.0±0.87µs        ? ?/sec
```

On songs, the new algorithm gives a big improvement on slow queries, and is slower on one char prefix search (fast queries <1ms).

#### wiki main VS refactor-attribute-criterion
```diff
  group                                                           search_wikimain_31c18f09               search_wikirefactor-attribute-criterion_1bd15d84
  -----                                                           ------------------------               ------------------------------------------------
- smol-wiki-articles.csv: basic with quote/"rock" "and" "roll"    1.00      3.2±0.01ms        ? ?/sec    1.15      3.7±0.01ms        ? ?/sec
- smol-wiki-articles.csv: basic without quote/film                1.00    351.5±2.47µs        ? ?/sec    1.13    396.8±1.63µs        ? ?/sec
+ smol-wiki-articles.csv: basic without quote/rock and roll       1.10      9.4±0.02ms        ? ?/sec    1.00      8.6±0.04ms        ? ?/sec
- smol-wiki-articles.csv: basic without quote/spain               1.00    446.0±3.23µs        ? ?/sec    1.11    496.6±7.75µs        ? ?/sec
- smol-wiki-articles.csv: prefix search/c                         1.00    115.6±0.61µs        ? ?/sec    2.22    256.7±1.24µs        ? ?/sec
- smol-wiki-articles.csv: prefix search/g                         1.00    189.7±2.03µs        ? ?/sec    1.57    297.0±1.35µs        ? ?/sec
- smol-wiki-articles.csv: prefix search/j                         1.00    209.2±1.11µs        ? ?/sec    1.40    293.0±2.09µs        ? ?/sec
- smol-wiki-articles.csv: prefix search/q                         1.00     79.0±0.44µs        ? ?/sec    1.10     87.2±0.69µs        ? ?/sec
- smol-wiki-articles.csv: prefix search/t                         1.00    270.1±1.15µs        ? ?/sec    1.55    419.9±5.16µs        ? ?/sec
- smol-wiki-articles.csv: prefix search/x                         1.00    244.9±1.33µs        ? ?/sec    1.07    260.9±1.95µs        ? ?/sec
- smol-wiki-articles.csv: words/Abraham machin                    1.00      8.1±0.03ms        ? ?/sec    1.17      9.4±0.02ms        ? ?/sec
- smol-wiki-articles.csv: words/Idaho Bellevue pizza              1.00     19.3±0.07ms        ? ?/sec    1.07     20.6±0.05ms        ? ?/sec
```
On wiki we have some regressions `+17%` and `+15%` on request `>1ms`.

Co-authored-by: many <maxime@meilisearch.com>
2021-10-06 09:19:33 +00:00
many
085bc6440c Apply PR comments 2021-10-06 11:12:26 +02:00
many
1bd15d849b Reduce candidates threshold 2021-10-05 18:52:14 +02:00
many
ea4bd29d14 Apply PR comments 2021-10-05 17:35:07 +02:00
many
5ed75de0db Update infos crate 2021-10-05 13:56:12 +02:00
many
3296bb243c Simplify word level position DB into a word position DB 2021-10-05 12:15:02 +02:00
many
75d341d928 Re-implement set based algorithm for attribute criterion 2021-10-05 12:14:50 +02:00
bors[bot]
31c18f0953 Merge #381
381: Update version for the next release (v0.17.1) r=irevoire a=curquiza



Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2021-10-03 02:12:43 +00:00
Clémentine Urquizar
05d8a33a28 Update version for the next release (v0.17.1) 2021-10-02 16:21:31 +02:00
bors[bot]
c9092c72bf Merge #380
380: Reserved keyword error message r=Kerollmops a=irevoire

And I missed _another_ reserved keyword error message in the filter :(

Co-authored-by: Tamo <tamo@meilisearch.com>
2021-10-01 07:13:31 +00:00
Tamo
d9eba9d145 improve and test the sort error message 2021-09-30 14:38:27 +02:00
Tamo
0ee67bb7d1 improve the reserved keyword error message for the filters 2021-09-30 14:38:27 +02:00
bors[bot]
22551d0941 Merge #379
379: Revert "Change chunk size to 4MiB to fit more the end user usage" r=curquiza a=ManyTheFish

Reverts meilisearch/milli#370

Co-authored-by: Many <legendre.maxime.isn@gmail.com>
2021-09-29 13:20:53 +00:00
Many
26b5dad042 Revert "Change chunk size to 4MiB to fit more the end user usage" 2021-09-29 15:08:39 +02:00
bors[bot]
6a057a3bd0 Merge #378
378: Hotfix meilisearch#1707 r=Kerollmops a=ManyTheFish

This PR contains an ugly quick fix of [meilisearch#1707](https://github.com/meilisearch/MeiliSearch/issues/1707).

- remove comparison reverse on rank. Enhancing relevancy and performances
- iterate over level 0 only. Enhancing performances.

A better fix is in development.

Co-authored-by: many <maxime@meilisearch.com>
Co-authored-by: Many <legendre.maxime.isn@gmail.com>
2021-09-29 12:57:31 +00:00
Many
2e49230ca2 Update milli/src/search/criteria/attribute.rs
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-09-29 14:49:45 +02:00
Many
7ad0214089 Update milli/src/search/criteria/attribute.rs
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-09-29 14:49:41 +02:00
many
1df5b8712b Hotfix meilisearch#1707 2021-09-29 14:41:56 +02:00
bors[bot]
bfedbc1b6d Merge #374
374: Enhance CSV document parsing r=Kerollmops a=ManyTheFish

Benchmarks on `search_songs` were crashing because of the CSV parsing.

Co-authored-by: many <maxime@meilisearch.com>
2021-09-29 08:55:54 +00:00
bors[bot]
68c758a533 Merge #376
376: Stop casting integer docids to string r=Kerollmops a=irevoire

When a docid is an integer, we stop casting it to a string, and thus we don't add `"` around it.

Co-authored-by: Tamo <tamo@meilisearch.com>
2021-09-29 08:32:48 +00:00
many
d2427f18e5 Enhance CSV document parsing 2021-09-29 10:25:33 +02:00
bors[bot]
00f94b1ffd Merge #377
377: Update version for the next release (v0.17.0) r=Kerollmops a=curquiza



Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2021-09-28 20:43:33 +00:00
Clémentine Urquizar
0e8665bf18 Update version for the next release (v0.17.0) 2021-09-28 19:38:12 +02:00
Tamo
f65153ad64 stop casting integer docids to string 2021-09-28 18:35:54 +02:00
bors[bot]
adddf3f179 Merge #375
375: Fixes #365 r=Kerollmops a=vishnugt



Co-authored-by: Vishnu Ganesan <vganesan@microsoft.com>
Co-authored-by: Vishnu Gt <vishnugt@hotmail.com>
2021-09-28 14:42:48 +00:00
Vishnu Gt
785c1372f2 Change "settings" to "setting"
Co-authored-by: Clément Renault <renault.cle@gmail.com>
2021-09-28 20:11:32 +05:30
Vishnu Ganesan
3580b2d803 Fixes #365 2021-09-28 19:30:23 +05:30
bors[bot]
3a12f5887e Merge #373
373: Improve error message for bad sort syntax with geosearch r=Kerollmops a=irevoire

`@Kerollmops` This should be the last PR for the geosearch and error handling, sorry for doing it in so many steps 😬 

Co-authored-by: Tamo <tamo@meilisearch.com>
2021-09-28 12:39:32 +00:00
Tamo
a80dcfd4a3 improve error message for bad sort syntax with geosearch 2021-09-28 14:32:24 +02:00
bors[bot]
b2a332599e Merge #372
372: Fix Meilisearch 1714 r=Kerollmops a=ManyTheFish

The bug comes from the typo tolerance, to know how many typos are accepted we were counting bytes instead of characters in a word.
On Chinese Script characters, we were allowing  2 typos on 3 characters words.
We are now counting the number of char instead of counting bytes to assign the typo tolerance.

Related to [Meilisearch#1714](https://github.com/meilisearch/MeiliSearch/issues/1714)

Co-authored-by: many <maxime@meilisearch.com>
2021-09-28 11:59:45 +00:00
many
8046ae4bd5 Count the number of char instead of counting bytes to assign the typo tolerance 2021-09-28 12:10:43 +02:00
many
1988416295 Add failing test related to Meilisearch#1714 2021-09-28 12:05:11 +02:00
bors[bot]
3b479948c6 Merge #371
371: Provide a sort error handler r=Kerollmops a=irevoire

This PR simplify the error handling of asc-desc rules for Meilisearch or any other wrapper by providing directly in milli a new error type called `SortError` that can be generated from an `AscDescError` and that can be automatically converted to a `UserError`.

Basically now, wherever you are in the code as a user or in milli you can parse an `AscDesc` syntax and depending on the context, cast it either as a `SortError` or a `CriterionError` in one line with improved error messages.

Co-authored-by: Tamo <tamo@meilisearch.com>
2021-09-28 09:28:32 +00:00
Tamo
cc732fe95e update http-ui to use the sort-error 2021-09-28 11:15:24 +02:00
Tamo
c7cb816ae1 simplify the error handling of the sort syntax for meilisearch 2021-09-27 19:07:22 +02:00
bors[bot]
4c09f6838f Merge #370
370: Change chunk size to 4MiB to fit more the end user usage r=ManyTheFish a=ManyTheFish

We made several indexing tests using different sizes of datasets (5 datasets from 9MiB to 100MiB) on several typologies of VMs (`XS: 1GiB RAM, 1 VCPU`, `S: 2GiB RAM, 2 VCPU`, `M: 4GiB RAM, 3 VCPU`, `L: 8GiB RAM, 4 VCPU`).
The result of these tests shows that the `4MiB` chunk size seems to be the best size compared to other chunk sizes (`2Mib`, `4MiB`, `8Mib`, `16Mib`,  `32Mib`, `64Mib`, `128Mib`).

below is the average time per chunk size:

![Capture d’écran 2021-09-27 à 14 27 50](https://user-images.githubusercontent.com/6482087/134909368-ef0bc45e-68d5-49d1-aaf9-91113b7c410f.png)

<details>
<summary>Detailled data</summary>
<br>

![Capture d’écran 2021-09-27 à 14 39 48](https://user-images.githubusercontent.com/6482087/134909952-a36b1457-bbbd-4a6c-bbe5-519e4b926b5a.png)
</br>
</details> 


Co-authored-by: many <maxime@meilisearch.com>
2021-09-27 12:57:52 +00:00
many
b188063869 Change chunk size to 4MiB to fit more the end user usage 2021-09-27 14:26:21 +02:00