Commit Graph

7510 Commits

Author SHA1 Message Date
d388ea0f9d Merge #506
506: fix cargo warnings r=Kerollmops a=MarinPostma

fix cargo warnings


Co-authored-by: ad hoc <postma.marin@protonmail.com>
2022-04-26 15:45:20 +00:00
ec89030483 Update bors toml 2022-04-26 17:36:04 +02:00
5c29258e8e fix cargo warnings 2022-04-26 17:33:11 +02:00
2fdf520271 Merge #514
514: Stop flattening every field r=Kerollmops a=irevoire

We need to flatten a document when:
* The primary key contains a `.`.
* Some fields need to be flattened.

Instead of flattening the whole object, and thus creating a lot of allocations with the `serde_json_flatten_crate`, we generate a minimal sub-object containing only the fields that need to be flattened.
That should create fewer allocations and thus index faster.
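
The idea can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the PR: the `Value` enum is a hypothetical stand-in for a JSON value (so the sketch needs no external crate), and `flatten_into` and `flatten_nested_fields` are made-up names. Only top-level fields that hold a nested object are visited; scalar fields are neither copied nor reallocated. The case where the primary key contains a `.` is not covered.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a JSON value, kept minimal for the sketch.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Number(i64),
    Text(String),
    Object(BTreeMap<String, Value>),
}

/// Recursively flatten `value` into `out`, joining nested keys with `.`.
fn flatten_into(prefix: &str, value: &Value, out: &mut BTreeMap<String, Value>) {
    match value {
        Value::Object(map) => {
            for (key, inner) in map {
                let path = format!("{}.{}", prefix, key);
                flatten_into(&path, inner, out);
            }
        }
        // Leaves are stored under the full dotted path.
        leaf => {
            out.insert(prefix.to_string(), leaf.clone());
        }
    }
}

/// Build a flattened sub-object from the nested fields only.
/// Top-level scalar fields are skipped entirely.
fn flatten_nested_fields(doc: &BTreeMap<String, Value>) -> BTreeMap<String, Value> {
    let mut flat = BTreeMap::new();
    for (key, value) in doc {
        if let Value::Object(_) = value {
            flatten_into(key, value, &mut flat);
        }
    }
    flat
}
```

With a document such as `{ "id": 1, "title": "Paris", "_geo": { "lat": 48, "lng": 2 } }`, this sketch yields only the entries `_geo.lat` and `_geo.lng`.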

---------

```
group                                                             indexing_main_e1e362fa                 indexing_stop-flattening-every-field_40d1bd6b
-----                                                             ----------------------                 ---------------------------------------------
indexing/Indexing geo_point                                       1.99      23.7±0.23s        ? ?/sec    1.00      11.9±0.21s        ? ?/sec
indexing/Indexing movies in three batches                         1.00      18.2±0.24s        ? ?/sec    1.01      18.3±0.29s        ? ?/sec
indexing/Indexing movies with default settings                    1.00      17.5±0.09s        ? ?/sec    1.01      17.7±0.26s        ? ?/sec
indexing/Indexing songs in three batches with default settings    1.00      64.8±0.47s        ? ?/sec    1.00      65.1±0.49s        ? ?/sec
indexing/Indexing songs with default settings                     1.00      54.9±0.99s        ? ?/sec    1.01      55.7±1.34s        ? ?/sec
indexing/Indexing songs without any facets                        1.00      50.6±0.62s        ? ?/sec    1.01      50.9±1.05s        ? ?/sec
indexing/Indexing songs without faceted numbers                   1.00      54.0±1.14s        ? ?/sec    1.01      54.7±1.13s        ? ?/sec
indexing/Indexing wiki                                            1.00     996.2±8.54s        ? ?/sec    1.02   1021.1±30.63s        ? ?/sec
indexing/Indexing wiki in three batches                           1.00    1136.8±9.72s        ? ?/sec    1.00    1138.6±6.59s        ? ?/sec
```

So basically everything slowed down very slightly (by at most 2%), except the dataset with a nested field (`geo_point`), which became twice as fast.
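
For readers unfamiliar with this table layout: the first number in each column appears to be the run time relative to the fastest run on that row, so the fastest run shows 1.00. A small sketch of that arithmetic, assuming this interpretation (the function name `relative_factors` is made up for illustration):

```rust
/// Divide each timing by the fastest one, so the fastest run gets a factor of 1.00.
fn relative_factors(times_secs: &[f64]) -> Vec<f64> {
    let fastest = times_secs.iter().copied().fold(f64::INFINITY, f64::min);
    times_secs.iter().map(|t| t / fastest).collect()
}
```

For the `geo_point` row, `relative_factors(&[23.7, 11.9])` gives roughly 1.99 and 1.00, which matches the table.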

Co-authored-by: Tamo <tamo@meilisearch.com>
2022-04-26 11:50:33 +00:00
f19d2dc548 Only flatten the required fields
apply review comments

Co-authored-by: Kerollmops <kero@meilisearch.com>
2022-04-26 12:33:46 +02:00
5adeac8047 Merge #516
516: Fix the indexing fuzzer r=irevoire a=irevoire



Co-authored-by: Tamo <tamo@meilisearch.com>
2022-04-26 08:35:03 +00:00
7cb7643565 Make nightly CI run every week
Update CI

Fix CI
2022-04-25 18:52:27 +02:00
d138b3c704 Update version 2022-04-25 18:43:46 +02:00
fa6f495662 fix the indexing fuzzer 2022-04-25 18:32:06 +02:00
8cc86d5a8d Merge #515
515: Improve the README r=curquiza a=Kerollmops

This PR closes #512 by adding more content to the README. We listed all of the subcrates of the repository, changed the descriptions of the subcrates, and added a simple example usage in the README.

Co-authored-by: Kerollmops <clement@meilisearch.com>
Co-authored-by: Clémentine Urquizar - curqui <clementine@meilisearch.com>
2022-04-25 16:15:12 +00:00
5e562ffecf Update README.md 2022-04-25 18:14:43 +02:00
2277172f9c Update README.md 2022-04-25 18:14:39 +02:00
2db3d60259 Update README.md 2022-04-25 18:14:35 +02:00
7e19bf1c0e Add an example usage of the library in the README 2022-04-25 17:25:46 +02:00
fb192aaa9f Update the list of milli's subcrates 2022-04-25 15:55:38 +02:00
e1e362fa43 Merge #509
509: Remove pr_status from bors settings r=Kerollmops a=curquiza

Because of multiple issues we had with bors:
https://github.com/bors-ng/bors-ng/issues/1492

Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2022-04-25 11:45:37 +00:00
08753d002a Remove pr_status from bors settings 2022-04-25 13:39:45 +02:00
8d15ae37a1 Merge pull request #503 from meilisearch/improve-flatten-fuzzer
Improve the fuzzer of the flatten crate
2022-04-25 13:38:43 +02:00
3e53791de3 Merge pull request #508 from meilisearch/contributing
First version of new CONTRIBUTING.md
2022-04-25 13:36:41 +02:00
8010eca9c7 Merge #505
505: normalize exact words r=curquiza a=MarinPostma

Normalize the exact words, as required by the specification.


Co-authored-by: ad hoc <postma.marin@protonmail.com>
2022-04-25 09:35:32 +00:00
c07f3b44b7 Merge #2347
2347: Change Nelson path r=curquiza a=curquiza

Nelson is now hosted under the Meilisearch organization.

Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2022-04-21 17:50:46 +00:00
dc0d4addd9 First version of new CONTRIBUTING.md 2022-04-21 19:02:22 +02:00
38d681c230 Change Nelson path 2022-04-21 18:42:34 +02:00
e85377e725 Merge pull request #2346 from meilisearch/revert-2345-bump-meilisearch-v9000.0.0
Revert "[TEST PURPOSE] Bump meilisearch to version 9000.0.0"
2022-04-21 16:38:48 +02:00
6ff8bf823d Revert "[TEST PURPOSE] Bump meilisearch to version 9000.0.0" 2022-04-21 16:36:56 +02:00
4d25229df9 Merge pull request #2345 from meilisearch/bump-meilisearch-v9000.0.0
Bump meilisearch to version 9000.0.0
2022-04-21 16:28:46 +02:00
f1cd6b6ee8 bump meilisearch to v9000.0.0 2022-04-21 14:26:40 +00:00
63f75bd187 Merge pull request #2344 from meilisearch/revert-2340-bump-meilisearch-v8000.1.0
Revert "[TEST PURPOSE] Bump meilisearch to version 8000.1.0"
2022-04-21 16:24:57 +02:00
acf3357cf3 Revert "[TEST PURPOSE] Bump meilisearch to version 8000.1.0" 2022-04-21 16:24:27 +02:00
71414630fc Merge pull request #504 from meilisearch/test-long-words
Add a test to make sure that long words are handled
2022-04-21 16:06:13 +02:00
2e0089d5ff normalize exact words 2022-04-21 15:38:40 +02:00
202d6105b2 Merge pull request #2340 from meilisearch/bump-meilisearch-v8000.1.0
[TEST PURPOSE] Bump meilisearch to version 8000.1.0
2022-04-21 15:28:00 +02:00
0714551101 bump meilisearch to v8000.1.0 2022-04-21 13:23:46 +00:00
3a2451fcba add test normalize exact words 2022-04-21 13:52:09 +02:00
eb5830aa40 Add a test to make sure that long words are handled 2022-04-21 13:45:28 +02:00
04381011b0 Merge #2336
2336: Move permissive-json-pointer in the meilisearch repository r=Kerollmops a=irevoire

Move the permissive-json-pointer crate into the meilisearch repository.

Co-authored-by: Tamo <tamo@meilisearch.com>
2022-04-20 17:25:44 +00:00
1ef87cc6d0 chore: move permissive-json-pointer in the meilisearch repository
Update permissive-json-pointer/src/lib.rs

Co-authored-by: Clément Renault <clement@meilisearch.com>
2022-04-20 19:24:41 +02:00
4a9000bb96 Merge #2332
2332: fix(search): formatted field r=curquiza a=irevoire

fix #2318

Co-authored-by: Irevoire <tamo@meilisearch.com>
v0.27.0rc2
2022-04-20 14:59:41 +00:00
d81a3f4a74 improve the fuzzer of the flatten crate 2022-04-20 16:11:23 +02:00
754c49f991 Merge #2326
2326: rename min word length for typo r=irevoire a=MarinPostma

Rename `minWordLengthForTypo` to `minWordSizeForTypos`, as specified.

Discussed here: https://github.com/meilisearch/specifications/pull/117#discussion_r850795714

Co-authored-by: ad hoc <postma.marin@protonmail.com>
2022-04-20 11:54:10 +00:00
97adef6bfc Merge #2335
2335: Fix typo reset by upgrading Milli to v0.26.2 r=MarinPostma a=curquiza



Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2022-04-20 10:49:57 +00:00
a7fd199ded Fix typo resetting by upgrading milli to v0.26.2 2022-04-20 12:24:46 +02:00
2692b8c960 Merge #2334
2334: Update dashboard to v.0.1.10 r=curquiza a=mdubus

Closes #2322

Co-authored-by: Morgane Dubus <30866152+mdubus@users.noreply.github.com>
2022-04-20 10:14:46 +00:00
58a1124e9a fix(search): formatted field 2022-04-20 11:30:01 +02:00
b57ad15a24 Update dashboard to v.0.1.10 2022-04-20 11:14:42 +02:00
c7d0097c97 Merge #498
498: Get rid of the threshold when comparing benchmarks r=curquiza a=irevoire

It just hides things

Co-authored-by: Tamo <tamo@meilisearch.com>
2022-04-19 14:04:11 +00:00
152a10344c Get rid of the threshold when comparing benchmarks
It just hides things
2022-04-19 15:39:58 +02:00
04eb32e539 Merge #499
499: fix min-word-len-for-typo not reset properly r=Kerollmops a=MarinPostma

Fix min-word-len-for-typo not resetting properly, as reported in https://github.com/meilisearch/meilisearch/issues/2330


Co-authored-by: ad hoc <postma.marin@protonmail.com>
2022-04-19 13:22:19 +00:00
8b14090927 fix min-word-len-for-typo not reset properly 2022-04-19 15:20:16 +02:00
ea4bb9402f Merge #483
483: Enhance matching words r=Kerollmops a=ManyTheFish

# Summary

Enhance milli's word matcher so that it handles both match computation and cropping.

# Implementation

## Computing best matches for cropping

Previously, we considered the first match in the attribute to be the best one. This was accurate when only one word was searched, but it missed the target when more than one word was searched.

Now we search for the best interval of matches to crop around. The chosen interval is the one that:
1) has the highest count of unique matches
> for example, if we have a query `split the world`, then the interval `the split the split the` has 5 matches but only 2 unique matches (1 for `split` and 1 for `the`), whereas the interval `split of the world` has 3 matches and 3 unique matches. So the interval `split of the world` is considered better.
2) has the minimum distance between matches
> for example, if we have a query `split the world`, then the interval `split of the world` has a distance of 3 (2 between `split` and `the`, and 1 between `the` and `world`), whereas the interval `split the world` has a distance of 2. So the interval `split the world` is considered better.
3) has the highest count of ordered matches
> for example, if we have a query `split the world`, then the interval `the world split` has 2 ordered words, whereas the interval `split the world` has 3. So the interval `split the world` is considered better.
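
The three criteria above can be sketched as a lexicographic comparison. This is a simplified illustration, not milli's actual implementation: `Match`, `find_matches`, `interval_score`, and `best_interval` are hypothetical names, matching is exact and case-sensitive, and the ordered count is assumed to be "the first match, plus every match whose query index is greater than that of the previous match", which reproduces the numbers in the examples.

```rust
use std::collections::BTreeSet;

/// One matched word: its position in the text and the index of the query word it matched.
#[derive(Debug, Clone, Copy)]
struct Match {
    word_position: usize,
    query_index: usize,
}

/// Naive matcher for the sketch: exact, case-sensitive comparison of whitespace-separated words.
fn find_matches(text: &str, query: &str) -> Vec<Match> {
    let query_words: Vec<&str> = query.split_whitespace().collect();
    text.split_whitespace()
        .enumerate()
        .filter_map(|(word_position, word)| {
            query_words
                .iter()
                .position(|q| *q == word)
                .map(|query_index| Match { word_position, query_index })
        })
        .collect()
}

/// Score an interval as (unique matches, negated distance, ordered matches).
/// Tuples compare lexicographically, so a larger tuple is a better interval.
fn interval_score(matches: &[Match]) -> (usize, i64, usize) {
    let unique: BTreeSet<usize> = matches.iter().map(|m| m.query_index).collect();
    let mut distance = 0i64;
    let mut ordered = if matches.is_empty() { 0 } else { 1 };
    for pair in matches.windows(2) {
        // Sum of the gaps between consecutive matches.
        distance += pair[1].word_position as i64 - pair[0].word_position as i64;
        // A match counts as ordered when it follows a match of an earlier query word.
        if pair[1].query_index > pair[0].query_index {
            ordered += 1;
        }
    }
    (unique.len(), -distance, ordered)
}

/// Pick the candidate interval with the best score, if any.
fn best_interval(candidates: &[Vec<Match>]) -> Option<&Vec<Match>> {
    candidates.iter().max_by_key(|interval| interval_score(interval))
}
```

Negating the distance lets a single `max_by_key` call express "more unique matches, then smaller distance, then more ordered matches" without a custom comparator.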

## Cropping around the best matches interval

Previously, we cropped around the interval without checking the context.

Now we crop around words that are in the same context as the matching words.
This means that we prefer to keep words that are farther from the matching words but in the same phrase over words that are nearer but separated by a period.

> For instance, for the matching word `Split` the text:
`Natalie risk her future. Split The World is a book written by Emily Henry. I never read it.`
will be cropped like:
`…. Split The World is a book written by Emily Henry. …`
and not like:
`Natalie risk her future. Split The World is a book …`
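
A minimal sketch of this context-aware cropping, under simplifying assumptions that are not taken from the PR: words are split on whitespace, a sentence ends at any word that ends with `.`, the window grows to the right before it grows to the left, and no `…` markers are added. The name `crop_in_context` and the `crop_len` parameter are hypothetical.

```rust
/// Crop `crop_len` words around the first occurrence of `needle`,
/// staying inside the sentence of the match as long as it has words left.
fn crop_in_context(text: &str, needle: &str, crop_len: usize) -> Option<String> {
    let words: Vec<&str> = text.split_whitespace().collect();
    let hit = words.iter().position(|w| w.trim_end_matches('.') == needle)?;

    // The sentence starts right after the previous word ending with a period...
    let sentence_start = words[..hit]
        .iter()
        .rposition(|w| w.ends_with('.'))
        .map_or(0, |i| i + 1);
    // ...and ends at the next word ending with a period (inclusive).
    let sentence_end = words[hit..]
        .iter()
        .position(|w| w.ends_with('.'))
        .map_or(words.len() - 1, |i| hit + i);

    // Inclusive window bounds, grown one word at a time.
    let (mut start, mut end) = (hit, hit);
    while end - start + 1 < crop_len {
        if end < sentence_end {
            end += 1; // same sentence, to the right
        } else if start > sentence_start {
            start -= 1; // same sentence, to the left
        } else if end + 1 < words.len() {
            end += 1; // sentence exhausted: spill over to the right
        } else if start > 0 {
            start -= 1; // nothing left on the right: spill over to the left
        } else {
            break; // the whole text is already in the window
        }
    }
    Some(words[start..=end].join(" "))
}
```

On the example text, a 10-word crop around `Split` returns `Split The World is a book written by Emily Henry.` and leaves out the nearer words `her future.` from the previous sentence.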


Co-authored-by: ManyTheFish <many@meilisearch.com>
2022-04-19 11:42:32 +00:00