Commit Graph

1615 Commits

Author SHA1 Message Date
ba0bb29cd8 refactor WordPrefixDocids to take dbs instead of indexes 2022-04-04 20:54:02 +02:00
c4c6e35352 query exact_word_docids in resolve_query_tree 2022-04-04 20:54:02 +02:00
8d46a5b0b5 extract exact word docids 2022-04-04 20:54:02 +02:00
5451c64d5d increase criteria asc desc test map size 2022-04-04 20:54:02 +02:00
0a77be4ec0 introduce exact_word_docids db 2022-04-04 20:54:02 +02:00
5f9f82757d refactor spawn_extraction_task 2022-04-04 20:54:02 +02:00
f82d4b36eb introduce exact attribute setting 2022-04-04 20:54:02 +02:00
c882d8daf0 add test for exact words 2022-04-04 20:54:01 +02:00
7e9d56a9e7 disable typos on exact words 2022-04-04 20:54:01 +02:00
900825bac0 Merge #474
474: Disable typos on exact word r=MarinPostma a=MarinPostma

This PR introduces the `exact_words` setting to disable typo tolerance on custom words.

If a user query contains a word from `exact_words`, no typo derivation will be made for that particular word.

I chose to store the words in an FST to save on deserialization and to allow fast lookups.

I had some trouble with the `serde` module, and had to rename it `serde_impl`.

## steps:
- [x] introduce new settings to register words to disable typos on
- [x] in `typos`, return an exact match if the current word is one of the words to disable typos for.
- [x] update `Context` to return the exact words dictionary.
- [x] merge #473 
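
The lookup described in the steps above can be sketched as follows. This is a minimal illustration, not the code from the PR: `ExactWords` and `typo_budget` are hypothetical names, and a sorted `BTreeSet` stands in for the FST so that the example depends only on the standard library.

```rust
use std::collections::BTreeSet;

/// Hypothetical stand-in for the exact words dictionary.
/// The PR stores the words in an FST; a sorted set is used here
/// only to keep the sketch free of external crates.
pub struct ExactWords {
    words: BTreeSet<String>,
}

impl ExactWords {
    /// Builds the dictionary from any collection of words.
    pub fn new<I, S>(words: I) -> Self
    where
        I: IntoIterator<Item = S>,
        S: Into<String>,
    {
        Self { words: words.into_iter().map(Into::into).collect() }
    }

    /// Returns true if `word` was registered as an exact word.
    pub fn contains(&self, word: &str) -> bool {
        self.words.contains(word)
    }
}

/// Returns how many typos may be derived for `word`.
/// `length_based_budget` is whatever the usual length rules would allow;
/// a word listed in `exact_words` always gets a budget of 0.
pub fn typo_budget(word: &str, exact_words: &ExactWords, length_based_budget: u8) -> u8 {
    if exact_words.contains(word) {
        0
    } else {
        length_based_budget
    }
}
```

With `goodbye` registered as an exact word, `typo_budget("goodbye", &exact, 1)` yields 0, while the misspelling `goodbey` keeps the budget it was given.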


Co-authored-by: ad hoc <postma.marin@protonmail.com>
2022-04-04 18:39:43 +00:00
3e67d8818c fix typo in test comment 2022-04-04 20:34:23 +02:00
284d8a24e0 add integration test for disabled typo on word 2022-04-04 20:15:51 +02:00
30a2711bac rename serde module to serde_impl module
needed because of issues with rustfmt
2022-04-04 20:10:55 +02:00
0fd55db21c fmt 2022-04-04 20:10:55 +02:00
559e46be5e fix bad rebase bug 2022-04-04 20:10:55 +02:00
8b1e5d9c6d add test for exact words 2022-04-04 20:10:55 +02:00
774fa8f065 disable typos on exact words 2022-04-04 20:10:55 +02:00
9bbffb8fee add exact words setting 2022-04-04 20:10:54 +02:00
48a5ce7434 Merge #473
473: set minimum word len for typos r=MarinPostma a=MarinPostma

This PR allows configuring the minimum word length for typos.

The default values are the same as previously.

## steps
- [x] introduce settings for the minimum word length for 1 and 2 typos
- [x] update the settings update flow to set this setting
- [x] create a structure `TypoConfig` to configure typo tolerance in the query builder
- [x] in `typo`, use the configuration to create the appropriate query tree node.
- [x] extend `Context` to return the setting for minimum word length for typos
- [x] return correct error message for wrong settings.
- [x] merge #469 
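
A minimal sketch of the configuration described in the steps above. Only the name `TypoConfig` comes from the PR; the field names, the error type, the validation rule (the one-typo threshold must not exceed the two-typo threshold) and the defaults of 5 and 9 are assumptions made for illustration.

```rust
/// Typo tolerance thresholds used when building the query tree.
/// Field names are hypothetical.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct TypoConfig {
    /// Minimum word length (in characters) to allow one typo.
    pub min_word_len_one_typo: u8,
    /// Minimum word length (in characters) to allow two typos.
    pub min_word_len_two_typos: u8,
}

/// Hypothetical error returned for inconsistent settings.
#[derive(Debug, PartialEq, Eq)]
pub struct InvalidMinTypoWordLen {
    pub one_typo: u8,
    pub two_typos: u8,
}

impl Default for TypoConfig {
    fn default() -> Self {
        // Assumed defaults, chosen for the example.
        Self { min_word_len_one_typo: 5, min_word_len_two_typos: 9 }
    }
}

impl TypoConfig {
    /// Rejects settings where one typo would require a longer word than two typos.
    pub fn new(one_typo: u8, two_typos: u8) -> Result<Self, InvalidMinTypoWordLen> {
        if one_typo > two_typos {
            Err(InvalidMinTypoWordLen { one_typo, two_typos })
        } else {
            Ok(Self { min_word_len_one_typo: one_typo, min_word_len_two_typos: two_typos })
        }
    }

    /// Maximum number of typos tolerated for `word`, based on its character count.
    pub fn max_typos(&self, word: &str) -> u8 {
        let len = word.chars().count();
        if len >= self.min_word_len_two_typos as usize {
            2
        } else if len >= self.min_word_len_one_typo as usize {
            1
        } else {
            0
        }
    }
}
```

Counting characters rather than bytes keeps the thresholds meaningful for non-ASCII words.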

Co-authored-by: ad hoc <postma.marin@protonmail.com>
2022-04-04 17:53:14 +00:00
6bf9824fec Merge #485
485: fix bug on 2 typos derivation r=Kerollmops a=MarinPostma

I found a bug while working on #473. This PR fixes it and adds the missing tests on word derivations.


Co-authored-by: ad hoc <postma.marin@protonmail.com>
2022-04-04 17:17:53 +00:00
853b4a520f fmt 2022-04-04 10:41:46 +02:00
2cb71dff4a add typo integration tests 2022-04-04 10:41:46 +02:00
1941072bb2 implement Copy on Setting 2022-04-04 10:41:46 +02:00
fdaf45aab2 replace hardcoded value with constant in TestContext 2022-04-04 10:41:46 +02:00
950a740bd4 refactor typos for readability 2022-04-04 10:41:46 +02:00
66020cd923 rename min_word_len* to use plain letter numbers 2022-04-04 10:41:46 +02:00
4c4b336ecb rename min word len for typo error 2022-04-01 11:17:03 +02:00
286dd7b2e4 rename min_word_len_2_typo 2022-04-01 11:17:03 +02:00
55af85db3c add tests for min_word_len_for_typo 2022-04-01 11:17:02 +02:00
9102de5500 fix error message 2022-04-01 11:17:02 +02:00
a1a3a49bc9 dynamic minimum word len for typos in query tree builder 2022-04-01 11:17:02 +02:00
5a24e60572 introduce word len for typo setting 2022-04-01 11:17:02 +02:00
9fe40df960 add word derivations tests 2022-04-01 11:05:18 +02:00
d5ddc6b080 fix 2 typos word derivation bug 2022-04-01 10:51:22 +02:00
d2d930dd3f Merge #469
469: add authorize typo setting r=Kerollmops a=MarinPostma

This PR adds support for an authorize typo setting. This makes it possible to disable typos for a whole index. Typos are enabled by default.


Co-authored-by: ad hoc <postma.marin@protonmail.com>
2022-03-31 15:18:08 +00:00
3e34981d9b add test for authorize_typos in update 2022-03-31 14:12:00 +02:00
6ef3bb9d83 fmt 2022-03-31 14:06:23 +02:00
f782fe2062 add authorize_typo_test 2022-03-31 10:08:39 +02:00
c4653347fd add authorize typo setting 2022-03-31 10:05:44 +02:00
d8dd357326 Merge #480
480: Increase benchmarks (push) CI timeout r=Kerollmops a=Kerollmops

This PR fixes the benchmark CI runs on push being [canceled by GitHub](https://github.com/meilisearch/milli/actions/runs/2028844132) because they reached the default timeout of 6h. It changes the timeout to 72h, the same setting as the manually triggered benchmark one.

Co-authored-by: Kerollmops <clement@meilisearch.com>
2022-03-29 18:13:31 +00:00
6a77c81a28 Increase benchmarks (push) CI timeout 2022-03-29 09:45:36 -07:00
e10c26e70d Merge #479
479: Update version (v0.24.1) r=Kerollmops a=curquiza

From v0.23.1 to v0.24.1, since we had an issue with the versioning for the previous release.

Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2022-03-24 20:12:37 +00:00
ddf78a735b Update version (v0.24.1) 2022-03-24 16:39:45 +01:00
2c7cafbf20 Merge #475
475: Bump tokenizer r=Kerollmops a=irevoire

This PR bumps the tokenizer to v0.2.9, which fixes an issue we had with lindera where reqwest was used with openssl (which was breaking our benchmarks).

Co-authored-by: Irevoire <tamo@meilisearch.com>
2022-03-23 13:26:44 +00:00
86dd88698d bump tokenizer 2022-03-23 14:25:58 +01:00
b82f46e862 Merge #476
476: Rollback meilisearch-tokenizer version r=Kerollmops a=irevoire

Lindera often fails to download some data from Google Drive, so we can’t consistently compile meilisearch / milli.
We can’t bump to the latest version (which moved off Google Drive) either: lindera uses reqwest with openssl, with no way of configuring it, so our benchmarks were not able to run. The latter issue should be fixed by https://github.com/lindera-morphology/lindera/pull/164.

Co-authored-by: Irevoire <tamo@meilisearch.com>
2022-03-22 14:02:00 +00:00
5dc464b9a7 rollback meilisearch-tokenizer version 2022-03-21 17:29:10 +01:00
90276d9a2d Merge #472
472: Remove useless variables in proximity r=Kerollmops a=ManyTheFish

I was looking through the plane sweep algorithm for inspiration and discovered that we have useless variables that went undetected because of the recursive function.

Co-authored-by: ManyTheFish <many@meilisearch.com>
2022-03-16 15:33:11 +00:00
49d59d88c2 Remove useless variables in proximity 2022-03-16 16:12:52 +01:00
5863afa1a5 Merge #468
468: Add a new error message when the filterableAttributes are empty r=Kerollmops a=brunoocasali

Fixes https://github.com/meilisearch/meilisearch/issues/2140

Is there a good way to reduce the duplication here? Maybe adding a shared function? I don't know the best and most idiomatic way to do that, so I'd appreciate any tips!

Another question concerns the duplicated call:

```rs
// filter.rs:373
FilterError::AttributeNotFilterable {
    attribute,
    filterable: filterable_fields.into_iter().collect::<Vec<_>>().join(" "),
},
```

and

```rs
// filter.rs:424
return Err(point[0].as_external_error(FilterError::AttributeNotFilterable {
    attribute: "_geo",
    filterable: filterable_fields.into_iter().collect::<Vec<_>>().join(" "),
}))?;
```

I think we could move the `filterable_fields.into_iter().collect::<Vec<_>>().join(" ")` directly into the error handling, as the sortable error does. I did this in the last commit; if this is something to avoid, let me know and I can remove it :)
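
One way to picture that change is a sketch where the error variant carries the raw set of fields and the `Display` implementation does the joining, so call sites no longer repeat it. This is an illustration under assumptions, not the code from the PR: the field name `filterable_fields` and the exact message wording are hypothetical, and only `FilterError::AttributeNotFilterable` is taken from the snippets above.

```rust
use std::collections::BTreeSet;
use std::fmt;

#[derive(Debug)]
pub enum FilterError {
    /// Carries the raw set of filterable fields instead of a pre-joined string.
    AttributeNotFilterable {
        attribute: String,
        filterable_fields: BTreeSet<String>,
    },
}

impl fmt::Display for FilterError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            FilterError::AttributeNotFilterable { attribute, filterable_fields } => {
                if filterable_fields.is_empty() {
                    // Dedicated message for an index without filterable attributes
                    // (wording is illustrative).
                    write!(
                        f,
                        "Attribute `{}` is not filterable. This index does not have configured filterable attributes.",
                        attribute
                    )
                } else {
                    // The join happens once, here, rather than at every call site.
                    let fields = filterable_fields
                        .iter()
                        .map(String::as_str)
                        .collect::<Vec<_>>()
                        .join(" ");
                    write!(
                        f,
                        "Attribute `{}` is not filterable. Available filterable attributes are: `{}`.",
                        attribute, fields
                    )
                }
            }
        }
    }
}

impl std::error::Error for FilterError {}
```

Both call sites would then construct the variant with the set itself and leave the formatting, including the empty case, to `Display`.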

Co-authored-by: Bruno Casali <brunoocasali@gmail.com>
2022-03-16 15:02:19 +00:00