Commit Graph

61 Commits

SHA1 Message Date
85824ee203 Try to make facet indexing incremental 2022-10-26 13:47:04 +02:00
39a4a0a362 Reintroduce filter range search and facet extractors 2022-10-26 13:46:14 +02:00
c3f49f766d Prepare refactor of facets database
2022-10-26 13:46:14 +02:00
d76d0cb1bf Merge branch 'main' into word-pair-proximity-docids-refactor 2022-10-24 15:23:00 +02:00
a7de4f5b85 Don't add swapped word pairs to the word_pair_proximity_docids db 2022-10-18 10:37:34 +02:00
bdeb47305e Change encoding of word_pair_proximity DB to (proximity, word1, word2)
Same for word_prefix_pair_proximity
2022-10-18 10:37:34 +02:00
beb987d3d1 Fixing piles of clippy errors.
Most of these are calling clone when the struct supports Copy.

Many are using & and &mut on `self` when the function they are called
from already has an immutable or mutable borrow so this isn't needed.

I tried to stay away from actual changes or places where I'd have to
name fresh variables.
2022-10-13 22:02:54 +02:00
762e320c35 Add proximity calculation for the same word 2022-10-07 12:59:12 +02:00
00c02d00f3 Add missing logging timer to extractors 2022-09-30 22:17:06 +05:30
3794962330 Use an unstable algorithm for grenad::Sorter when possible 2022-09-13 14:49:53 +02:00
fe3973a51c Make sure that long words are correctly skipped 2022-09-07 15:03:32 +02:00
306593144d Refactor word prefix pair proximity indexation 2022-08-17 11:59:00 +02:00
07003704a8 Merge branch 'filter/field-exist' 2022-07-21 14:51:41 +02:00
1506683705 Avoid using too much memory when indexing facet-exists-docids 2022-07-19 14:42:35 +02:00
aed8c69bcb Refactor indexation of the "facet-id-exists-docids" database
The idea is to directly create a sorted and merged list of bitmaps
in the form of a BTreeMap<FieldId, RoaringBitmap> instead of creating
a grenad::Reader where the keys are field_id and the values are docids.

Then we send that BTreeMap to the thing that handles TypedChunks, which
inserts its content into the database.
2022-07-19 10:07:33 +02:00
80b962b4f4 Run cargo fmt 2022-07-19 10:07:33 +02:00
30bd4db0fc Simplify indexing task for facet_exists_docids database 2022-07-19 10:07:33 +02:00
392472f4bb Apply suggestions from code review
Co-authored-by: Tamo <tamo@meilisearch.com>
2022-07-19 10:07:33 +02:00
453d593ce8 Add a database containing the docids where each field exists 2022-07-19 10:07:33 +02:00
2eec290424 Check the validity of the latitude and longitude numbers 2022-07-12 15:14:06 +02:00
d1a4da9812 Generate a real UUIDv4 when ids are auto-generated 2022-07-12 15:14:06 +02:00
fcfc4caf8c Move the Object type in the lib.rs file and use it everywhere 2022-07-12 14:55:51 +02:00
0146175fe6 Introduce the validate_documents_batch function 2022-07-12 14:55:51 +02:00
86ac8568e6 Use Charabia in milli 2022-06-02 16:59:11 +02:00
0af399a6d7 fix the mixed dataset geosearch indexing bug 2022-05-16 17:37:45 +02:00
c55368ddd4 apply code suggestion
Co-authored-by: Kerollmops <kero@meilisearch.com>
2022-05-04 14:11:03 +02:00
3cb1f6d0a1 improve geosearch error messages 2022-05-02 19:20:47 +02:00
4f3ce6d9cd nested fields 2022-04-07 16:58:46 +02:00
201fea0fda limit extract_word_docids memory usage 2022-04-05 14:14:15 +02:00
b85cd4983e remove field_id_from_position 2022-04-05 09:50:34 +02:00
b7694c34f5 remove println 2022-04-04 21:00:07 +02:00
6cabd47c32 fix typo in comment 2022-04-04 20:59:20 +02:00
6b2c2509b2 fix bug in exact search 2022-04-04 20:54:03 +02:00
8d46a5b0b5 extract exact word docids 2022-04-04 20:54:02 +02:00
0a77be4ec0 introduce exact_word_docids db 2022-04-04 20:54:02 +02:00
5f9f82757d refactor spawn_extraction_task 2022-04-04 20:54:02 +02:00
ff8d7a810d Change the behavior of the as_cloneable_grenad by taking a ref 2022-02-16 15:40:08 +01:00
f367cc2e75 Finally bump grenad to v0.4.1 2022-02-16 15:28:48 +01:00
8970246bc4 Sort positions before iterating over them during word pair proximity extraction 2021-11-22 18:16:54 +01:00
c5a6075484 Make max_position_per_attributes changeable 2021-10-12 10:10:50 +02:00
360c5ff3df Remove limit of 1000 position per attribute
Instead of using an arbitrary limit we encode the absolute position in a u32
using one strong u16 for the field id and a weak u16 for the relative position in the attribute.
2021-10-12 10:10:50 +02:00
3296bb243c Simplify word level position DB into a word position DB 2021-10-05 12:15:02 +02:00
a84f3a8b31 Apply suggestions from code review
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-09-09 15:09:35 +02:00
bad8ea47d5 edit the last two TODO comments 2021-09-08 18:24:09 +02:00
bd4c248292 improve the error handling in general and introduce the concept of reserved keywords 2021-09-08 18:24:09 +02:00
f73273d71c only call the extractor if needed 2021-09-08 17:51:08 +02:00
a21c854790 handle errors 2021-09-08 17:51:07 +02:00
70ab2c37c5 remove multiple bugs 2021-09-08 17:51:07 +02:00
b4b6ba6d82 rename all the ’long’ into ’lng’ like written in the specification 2021-09-08 17:51:07 +02:00
44d6b6ae9e Index the geo points 2021-09-08 17:51:07 +02:00
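
Commit 360c5ff3df describes encoding an absolute position in a u32 from the field id and the relative position within the attribute. The sketch below shows one way such a packing could look, assuming "strong u16" means the high 16 bits and "weak u16" means the low 16 bits. The function names pack_position and unpack_position are hypothetical and chosen for illustration; they are not taken from the repository.

```rust
/// Packs a field id and a relative position into a single u32.
/// Assumption: the field id occupies the high ("strong") 16 bits and the
/// relative position the low ("weak") 16 bits. Hypothetical helper name.
fn pack_position(field_id: u16, relative: u16) -> u32 {
    (u32::from(field_id) << 16) | u32::from(relative)
}

/// Splits a packed u32 back into (field_id, relative_position).
/// Hypothetical helper name, inverse of `pack_position`.
fn unpack_position(absolute: u32) -> (u16, u16) {
    // The shift leaves only the high half; the mask keeps only the low half,
    // so both casts to u16 are lossless.
    ((absolute >> 16) as u16, (absolute & 0xFFFF) as u16)
}
```

Under this layout, sorting the packed values orders positions first by field id and then by position inside the attribute, and every u16 relative position is representable, so no arbitrary per-attribute limit is required.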