Commit Graph

2340 Commits

Author SHA1 Message Date
ManyTheFish
bd30ee97b8 Keep separators at start of the cropped string 2022-04-05 17:41:32 +02:00
ManyTheFish
29c5f76d7f Use new matcher in http-ui 2022-04-05 17:41:32 +02:00
ManyTheFish
734d0899d3 Publish Matcher 2022-04-05 17:41:32 +02:00
ManyTheFish
4428cb5909 Add some tests and fix some corner cases 2022-04-05 17:41:32 +02:00
ManyTheFish
844f546a8b Add matches algorithm V1 2022-04-05 17:41:32 +02:00
ManyTheFish
3be1790803 Add crop algorithm with naive match algorithm 2022-04-05 17:41:32 +02:00
ManyTheFish
d96e72e5dc Create formatter with some tests 2022-04-05 17:41:32 +02:00
ad hoc
201fea0fda limit extract_word_docids memory usage 2022-04-05 14:14:15 +02:00
ad hoc
5cfd3d8407 add exact attributes documentation 2022-04-05 14:10:22 +02:00
Clémentine Urquizar
9eec44dd98 Update version (v0.25.0) 2022-04-05 12:06:42 +02:00
ad hoc
b85cd4983e remove field_id_from_position 2022-04-05 09:50:34 +02:00
ad hoc
dac81b2d44 add missing \n in cli settings 2022-04-05 09:48:56 +02:00
ad hoc
ab185a59b5 fix infos 2022-04-05 09:46:56 +02:00
ad hoc
59e41d98e3 add comments to integration test 2022-04-04 21:17:06 +02:00
ad hoc
1810927dbd rephrase exact_attributes doc 2022-04-04 21:04:49 +02:00
ad hoc
b7694c34f5 remove println 2022-04-04 21:00:07 +02:00
ad hoc
6cabd47c32 fix typo in comment 2022-04-04 20:59:20 +02:00
ad hoc
9963f11172 fix infos crate compilation issue 2022-04-04 20:54:03 +02:00
ad hoc
c8d3a09af8 add integration test for disable typo on attributes 2022-04-04 20:54:03 +02:00
ad hoc
bfd81ce050 add exact attributes to cli settings 2022-04-04 20:54:03 +02:00
ad hoc
6b2c2509b2 fix bug in exact search 2022-04-04 20:54:03 +02:00
ad hoc
56b4f5dce2 add exact prefix to query_docids 2022-04-04 20:54:03 +02:00
ad hoc
21ae4143b1 add exact_word_prefix to Context 2022-04-04 20:54:03 +02:00
ad hoc
e8f06f6c06 extract exact_word_prefix_docids 2022-04-04 20:54:03 +02:00
ad hoc
6dd2e4ffbd introduce exact_word_prefix database in index 2022-04-04 20:54:03 +02:00
ad hoc
ba0bb29cd8 refactor WordPrefixDocids to take dbs instead of indexes 2022-04-04 20:54:02 +02:00
ad hoc
c4c6e35352 query exact_word_docids in resolve_query_tree 2022-04-04 20:54:02 +02:00
ad hoc
8d46a5b0b5 extract exact word docids 2022-04-04 20:54:02 +02:00
ad hoc
5451c64d5d increase criteria asc desc test map size 2022-04-04 20:54:02 +02:00
ad hoc
0a77be4ec0 introduce exact_word_docids db 2022-04-04 20:54:02 +02:00
ad hoc
5f9f82757d refactor spawn_extraction_task 2022-04-04 20:54:02 +02:00
ad hoc
f82d4b36eb introduce exact attribute setting 2022-04-04 20:54:02 +02:00
ad hoc
c882d8daf0 add test for exact words 2022-04-04 20:54:01 +02:00
ad hoc
7e9d56a9e7 disable typos on exact words 2022-04-04 20:54:01 +02:00
bors[bot]
900825bac0 Merge #474
474: Disable typos on exact word r=MarinPostma a=MarinPostma

This PR introduces the `exact_words` setting to disable typo tolerance on custom words.

If a user query contains a word from `exact_words`, no typo derivation will be made for that particular word.

I have chosen to store the words in an FST, to save on deserialization and to allow for fast lookups.

I had some trouble with the `serde` module, and had to rename it `serde_impl`.

## steps:
- [x] introduce new settings to register words to disable typos on
- [x] in `typos`, return an exact match if the current word is part of the words to disable typos for.
- [x] update `Context` to return the exact words dictionary.
- [x] merge #473 


Co-authored-by: ad hoc <postma.marin@protonmail.com>
2022-04-04 18:39:43 +00:00
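The behaviour described in #474 can be illustrated with a minimal sketch. It is not the code from the PR: `ExactWords` and `typos_for` are hypothetical names, and a sorted set stands in for the FST mentioned above so that the example only needs the standard library.

```rust
use std::collections::BTreeSet;

/// Hypothetical stand-in for the exact-words dictionary.
/// The PR stores the words in an FST; a BTreeSet is used here only to keep
/// the sketch free of external crates.
struct ExactWords(BTreeSet<String>);

impl ExactWords {
    /// Builds the dictionary from a list of words.
    fn new<'a, I: IntoIterator<Item = &'a str>>(words: I) -> Self {
        ExactWords(words.into_iter().map(String::from).collect())
    }

    /// Returns true if `word` must be matched exactly.
    fn contains(&self, word: &str) -> bool {
        self.0.contains(word)
    }
}

/// Returns how many typos may be used when deriving candidates for `word`.
/// `default_max` is whatever the rest of the typo logic would allow
/// (an assumption of this sketch); exact words always get 0.
fn typos_for(word: &str, exact_words: &ExactWords, default_max: u8) -> u8 {
    if exact_words.contains(word) {
        0
    } else {
        default_max
    }
}
```

For instance, with `goodbye` registered as an exact word and a default of 2 typos, `typos_for("goodbye", &exact, 2)` yields 0 while `typos_for("farewell", &exact, 2)` yields 2.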
ad hoc
3e67d8818c fix typo in test comment 2022-04-04 20:34:23 +02:00
ad hoc
284d8a24e0 add integration test for disabled typos on word 2022-04-04 20:15:51 +02:00
ad hoc
30a2711bac rename serde module to serde_impl module
needed because of issues with rustfmt
2022-04-04 20:10:55 +02:00
ad hoc
0fd55db21c fmt 2022-04-04 20:10:55 +02:00
ad hoc
559e46be5e fix bad rebase bug 2022-04-04 20:10:55 +02:00
ad hoc
8b1e5d9c6d add test for exact words 2022-04-04 20:10:55 +02:00
ad hoc
774fa8f065 disable typos on exact words 2022-04-04 20:10:55 +02:00
ad hoc
9bbffb8fee add exact words setting 2022-04-04 20:10:54 +02:00
bors[bot]
48a5ce7434 Merge #473
473: set minimum word len for typos r=MarinPostma a=MarinPostma

This PR allows configuring the minimum word length for typos.

The default values are the same as previously.

## steps
- [x] introduce settings for the minimum word length for 1 and 2 typos
- [x] update the settings update flow to set this setting
- [x] create a structure `TypoConfig` to configure typo tolerance in the query builder
- [x] in `typo`, use the configuration to create the appropriate query tree node.
- [x] extend `Context` to return the setting for minimum word length for typos
- [x] return correct error message for wrong settings.
- [x] merge #469 

Co-authored-by: ad hoc <postma.marin@protonmail.com>
2022-04-04 17:53:14 +00:00
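As a rough illustration of the steps listed in #473, the sketch below shows how a `TypoConfig`-like structure could map a word length to a number of allowed typos and reject inconsistent settings. The field names, the validation rule, the error text, and the default thresholds (5 and 9) are assumptions made for this example; the PR only states that the defaults are unchanged from before.

```rust
/// Hypothetical typo configuration; field names are assumptions.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct TypoConfig {
    /// Minimum word length (in characters) to allow one typo.
    min_word_len_one_typo: u8,
    /// Minimum word length (in characters) to allow two typos.
    min_word_len_two_typos: u8,
}

impl Default for TypoConfig {
    /// Placeholder defaults assumed for this sketch.
    fn default() -> Self {
        TypoConfig { min_word_len_one_typo: 5, min_word_len_two_typos: 9 }
    }
}

impl TypoConfig {
    /// Builds a configuration, returning an error message when the
    /// one-typo threshold is larger than the two-typos threshold.
    fn new(one_typo: u8, two_typos: u8) -> Result<Self, String> {
        if one_typo > two_typos {
            return Err(format!(
                "min word length for one typo ({}) must not exceed the one for two typos ({})",
                one_typo, two_typos
            ));
        }
        Ok(TypoConfig {
            min_word_len_one_typo: one_typo,
            min_word_len_two_typos: two_typos,
        })
    }

    /// Returns the maximum number of typos allowed for `word`.
    fn max_typos(&self, word: &str) -> u8 {
        let len = word.chars().count();
        if len >= self.min_word_len_two_typos as usize {
            2
        } else if len >= self.min_word_len_one_typo as usize {
            1
        } else {
            0
        }
    }
}
```

Counting characters rather than bytes keeps the thresholds meaningful for non-ASCII words; whether the actual implementation does the same is not stated in the PR.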
bors[bot]
6bf9824fec Merge #485
485: fix bug on 2 typos derivation r=Kerollmops a=MarinPostma

I found a bug while working on #473. This PR fixes it and adds the missing tests for word derivations.


Co-authored-by: ad hoc <postma.marin@protonmail.com>
2022-04-04 17:17:53 +00:00
ad hoc
853b4a520f fmt 2022-04-04 10:41:46 +02:00
ad hoc
2cb71dff4a add typo integration tests 2022-04-04 10:41:46 +02:00
ad hoc
1941072bb2 implement Copy on Setting 2022-04-04 10:41:46 +02:00
ad hoc
fdaf45aab2 replace hardcoded value with constant in TestContext 2022-04-04 10:41:46 +02:00
ad hoc
950a740bd4 refactor typos for readability 2022-04-04 10:41:46 +02:00