Commit Graph

1963 Commits

Author SHA1 Message Date
ad hoc
8d46a5b0b5 extract exact word docids 2022-04-04 20:54:02 +02:00
ad hoc
5451c64d5d increase criteria asc desc test map size 2022-04-04 20:54:02 +02:00
ad hoc
0a77be4ec0 introduce exact_word_docids db 2022-04-04 20:54:02 +02:00
ad hoc
5f9f82757d refactor spawn_extraction_task 2022-04-04 20:54:02 +02:00
ad hoc
f82d4b36eb introduce exact attribute setting 2022-04-04 20:54:02 +02:00
ad hoc
c882d8daf0 add test for exact words 2022-04-04 20:54:01 +02:00
ad hoc
7e9d56a9e7 disable typos on exact words 2022-04-04 20:54:01 +02:00
bors[bot]
900825bac0 Merge #474
474: Disable typos on exact word r=MarinPostma a=MarinPostma

This PR introduces the `exact_word` setting to disable typo tolerance on custom words.

If a user query contains a word from `exact_words`, no typo derivation will be made for that particular word.

I have chosen to store the words in a FST, to save on deserialization, and allow for fast lookups.

I had some trouble with the `serde` module, and had to rename it `serde_impl`.

## steps:
- [x] introduce new settings to register words to disable typos on
- [x] in `typos`, return exact match is the current word is part of the word to disable typos for.
- [x] update `Context` to return the exact words dictionary.
- [x] merge #473 


Co-authored-by: ad hoc <postma.marin@protonmail.com>
2022-04-04 18:39:43 +00:00
ad hoc
3e67d8818c fix typo in test comment 2022-04-04 20:34:23 +02:00
ad hoc
284d8a24e0 add intergration test for disabled typon on word 2022-04-04 20:15:51 +02:00
ad hoc
30a2711bac rename serde module to serde_impl module
needed because of issues with rustfmt
2022-04-04 20:10:55 +02:00
ad hoc
0fd55db21c fmt 2022-04-04 20:10:55 +02:00
ad hoc
559e46be5e fix bad rebase bug 2022-04-04 20:10:55 +02:00
ad hoc
8b1e5d9c6d add test for exact words 2022-04-04 20:10:55 +02:00
ad hoc
774fa8f065 disable typos on exact words 2022-04-04 20:10:55 +02:00
ad hoc
9bbffb8fee add exact words setting 2022-04-04 20:10:54 +02:00
bors[bot]
48a5ce7434 Merge #473
473: set minimum word len for typos r=MarinPostma a=MarinPostma

this PR allows the configuration on the minimum word length for typos.

The default values are the same as previously.

## steps
- [x] introduce settings for the minimum word length for 1 and 2 typos
- [x] update the settings update flow to set this setting
- [x] create a structure `TypoConfig` to configure typo tolerance in the query builder
- [x] in `typo`, use the configuration to create the appropriate query tree node.
- [x] extend `Context` to return the setting for minimum word length for typos
- [x] return correct error message for wrong settings.
- [x] merge #469 

Co-authored-by: ad hoc <postma.marin@protonmail.com>
2022-04-04 17:53:14 +00:00
bors[bot]
6bf9824fec Merge #485
485: fix bug on 2 typos derivation r=Kerollmops a=MarinPostma

I found a bug while working on #473. This pr fixes it and add the missing tests on word derivations.


Co-authored-by: ad hoc <postma.marin@protonmail.com>
2022-04-04 17:17:53 +00:00
ad hoc
853b4a520f fmt 2022-04-04 10:41:46 +02:00
ad hoc
2cb71dff4a add typo integration tests 2022-04-04 10:41:46 +02:00
ad hoc
1941072bb2 implement Copy on Setting 2022-04-04 10:41:46 +02:00
ad hoc
fdaf45aab2 replace hardcoded value with constant in TestContext 2022-04-04 10:41:46 +02:00
ad hoc
950a740bd4 refactor typos for readability 2022-04-04 10:41:46 +02:00
ad hoc
66020cd923 rename min_word_len* to use plain letter numbers 2022-04-04 10:41:46 +02:00
ad hoc
4c4b336ecb rename min word len for typo error 2022-04-01 11:17:03 +02:00
ad hoc
286dd7b2e4 rename min_word_len_2_typo 2022-04-01 11:17:03 +02:00
ad hoc
55af85db3c add tests for min_word_len_for_typo 2022-04-01 11:17:02 +02:00
ad hoc
9102de5500 fix error message 2022-04-01 11:17:02 +02:00
ad hoc
a1a3a49bc9 dynamic minimum word len for typos in query tree builder 2022-04-01 11:17:02 +02:00
ad hoc
5a24e60572 introduce word len for typo setting 2022-04-01 11:17:02 +02:00
ad hoc
9fe40df960 add word derivations tests 2022-04-01 11:05:18 +02:00
ad hoc
d5ddc6b080 fix 2 typos word derivation bug 2022-04-01 10:51:22 +02:00
bors[bot]
d2d930dd3f Merge #469
469: add authorize typo setting r=Kerollmops a=MarinPostma

This PR adds support for an authorize typo settings. This makes is possible to disable typos for a whole index. Typos are enabled by default.


Co-authored-by: ad hoc <postma.marin@protonmail.com>
2022-03-31 15:18:08 +00:00
ad hoc
3e34981d9b add test for authorize_typos in update 2022-03-31 14:12:00 +02:00
ad hoc
6ef3bb9d83 fmt 2022-03-31 14:06:23 +02:00
ad hoc
f782fe2062 add authorize_typo_test 2022-03-31 10:08:39 +02:00
ad hoc
c4653347fd add authorize typo setting 2022-03-31 10:05:44 +02:00
bors[bot]
d8dd357326 Merge #480
480: Increase benchmarks (push) CI timeout r=Kerollmops a=Kerollmops

This PR fixes the fact that the benchmarks CI on push were [canceled by GitHub](https://github.com/meilisearch/milli/actions/runs/2028844132) because they reached the default timeout of 6h. This PR changes the timeout to 72h, the same setting as the manually triggered benchmark one.

Co-authored-by: Kerollmops <clement@meilisearch.com>
2022-03-29 18:13:31 +00:00
Kerollmops
6a77c81a28 Increase benchmarks (push) CI timeout 2022-03-29 09:45:36 -07:00
bors[bot]
e10c26e70d Merge #479
479: Update version (v0.24.1) r=Kerollmops a=curquiza

From v0.23.1 to v0.24.1 since we had an issue with the versionning for the previous release

Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2022-03-24 20:12:37 +00:00
Clémentine Urquizar
ddf78a735b Update version (v0.24.1) 2022-03-24 16:39:45 +01:00
bors[bot]
2c7cafbf20 Merge #475
475: Bump tokenizer r=Kerollmops a=irevoire

This PR bump the tokenizer in v0.2.9 which fixes an issue we had with lindera where reqwest was used with openssl (which was breaking our benchmarks).

Co-authored-by: Irevoire <tamo@meilisearch.com>
2022-03-23 13:26:44 +00:00
Irevoire
86dd88698d bump tokenizer 2022-03-23 14:25:58 +01:00
bors[bot]
b82f46e862 Merge #476
476: Rollback meilisearch-tokenizer version r=Kerollmops a=irevoire

Lindera often fails to download some data from google drive we can’t compile consistently meilisearch / milli.
We can’t bump to the latest version (that moved out of google drive) either because lindera uses reqwest with openssl with no way of configuring it our benchmarks were not able to run. The latter issue should be fixed by https://github.com/lindera-morphology/lindera/pull/164.

Co-authored-by: Irevoire <tamo@meilisearch.com>
2022-03-22 14:02:00 +00:00
Irevoire
5dc464b9a7 rollback meilisearch-tokenizer version 2022-03-21 17:29:10 +01:00
bors[bot]
90276d9a2d Merge #472
472: Remove useless variables in proximity r=Kerollmops a=ManyTheFish

Was passing by plane sweep algorithm to find some inspiration, and I discover that we have useless variables that were not detected because of the recursive function.

Co-authored-by: ManyTheFish <many@meilisearch.com>
2022-03-16 15:33:11 +00:00
ManyTheFish
49d59d88c2 Remove useless variables in proximity 2022-03-16 16:12:52 +01:00
bors[bot]
5863afa1a5 Merge #468
468: Add a new error message when the filterableAttributes are empty r=Kerollmops a=brunoocasali

Fixes https://github.com/meilisearch/meilisearch/issues/2140

Is there a good way to reduce de duplication here? Maybe adding a shared function? I don't know the best and idiomatic way to do that, I appreciate any tip!

Another doubt is related to the duplication of the calling:

```rs
// filter.rs:373
FilterError::AttributeNotFilterable {
    attribute,
    filterable: filterable_fields.into_iter().collect::<Vec<_>>().join(" "),
},
```

and

```rs
// filter.rs:424
return Err(point[0].as_external_error(FilterError::AttributeNotFilterable {
    attribute: "_geo",
    filterable: filterable_fields.into_iter().collect::<Vec<_>>().join(" "),
}))?;
```

I think we could make the `filterable_fields.into_iter().collect::<Vec<_>>().join(" ")` directly into the error handling like the sortable error. I made it into the last commit, if this is something to avoid, let me know and I can remove it :)

Co-authored-by: Bruno Casali <brunoocasali@gmail.com>
2022-03-16 15:02:19 +00:00
Bruno Casali
adc71742c8 Move string concat to the struct instead of in the calling 2022-03-16 10:26:12 -03:00
bors[bot]
cb6b6915a4 Merge #470
470: Set the cargo crate resolver to v2 r=Kerollmops a=MarinPostma

This PR updates the workspace resolver to v2. This should fix [the benchmarks](https://github.com/meilisearch/milli/runs/5558347765?check_suite_focus=true#step:8:184).


Co-authored-by: ad hoc <postma.marin@protonmail.com>
2022-03-16 10:55:22 +00:00