4f3ce6d9cd
nested fields
2022-04-07 16:58:46 +02:00
b799f3326b
rename merge_nothing to merge_ignore_values
2022-04-05 18:44:35 +02:00
201fea0fda
limit extract_word_docids memory usage
2022-04-05 14:14:15 +02:00
b85cd4983e
remove field_id_from_position
2022-04-05 09:50:34 +02:00
b7694c34f5
remove println
2022-04-04 21:00:07 +02:00
6cabd47c32
fix typo in comment
2022-04-04 20:59:20 +02:00
6b2c2509b2
fix bug in exact search
2022-04-04 20:54:03 +02:00
e8f06f6c06
extract exact_word_prefix_docids
2022-04-04 20:54:03 +02:00
ba0bb29cd8
refactor WordPrefixDocids to take dbs instead of indexes
2022-04-04 20:54:02 +02:00
c4c6e35352
query exact_word_docids in resolve_query_tree
2022-04-04 20:54:02 +02:00
8d46a5b0b5
extract exact word docids
2022-04-04 20:54:02 +02:00
0a77be4ec0
introduce exact_word_docids db
2022-04-04 20:54:02 +02:00
5f9f82757d
refactor spawn_extraction_task
2022-04-04 20:54:02 +02:00
d5b8b5a2f8
Replace the ugly unwraps by clean if let Somes
2022-02-28 16:31:33 +01:00
8d26f3040c
Remove a useless grenad file merging
2022-02-28 16:31:33 +01:00
04b1bbf932
Reintroduce appending sorted entries when possible
2022-02-24 14:50:45 +01:00
25123af3b8
Merge #436
436: Speed up the word prefix databases computation time r=Kerollmops a=Kerollmops
This PR depends on the fixes done in #431 and must be merged after it.
In this PR we bring the `WordPrefixPairProximityDocids`, `WordPrefixDocids`, and `WordPrefixPositionDocids` update structures to a new era, a better era, where computing the word prefix pair proximities costs far fewer CPU cycles, an era where these update structures can reuse the previously computed set of new word docids from the newly indexed batch of documents.
---
The `WordPrefixPairProximityDocids` is an update structure, which means that it is an object that we feed with some parameters and which modifies the LMDB database of an index when asked to. This structure specifically computes the list of word prefix pair proximities, which correspond to a list of pairs of words associated with a proximity (the distance between both words) where the second word is not a full word but a prefix, e.g. `s`, `se`, `a`. Each word prefix pair proximity is associated with the list of document ids that contain the word and the prefix at the given proximity.
The origin of the performance issue this struct causes is that it starts its job from scratch: it clears the LMDB database before rewriting everything, using the other LMDB databases to achieve that. I hope you understand that this is absolutely not an optimized way of doing things.
Co-authored-by: Clément Renault <clement@meilisearch.com>
Co-authored-by: Kerollmops <clement@meilisearch.com>
2022-02-16 15:41:14 +00:00
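As a rough illustration of the incremental approach described in #436, here is a minimal Rust sketch. It is a simplified model, not milli's implementation: `BTreeMap` and `BTreeSet<u32>` stand in for the LMDB database and the document id bitmaps, and `update_prefix_pair_proximities` is a hypothetical name. It derives word prefix pair proximity entries only from the word pairs of the newly indexed batch and merges them into the existing entries instead of clearing the database and rebuilding it.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// (first word, second word, proximity) -> document ids.
type PairKey = (String, String, u8);
/// (first word, prefix, proximity) -> document ids.
type PrefixPairKey = (String, String, u8);

/// Hypothetical helper: for every word pair of the new batch, add its
/// document ids to each (word, prefix, proximity) entry whose prefix
/// matches the second word, keeping the already stored entries.
fn update_prefix_pair_proximities(
    existing: &mut BTreeMap<PrefixPairKey, BTreeSet<u32>>,
    new_pairs: &BTreeMap<PairKey, BTreeSet<u32>>,
    prefixes: &BTreeSet<String>,
) {
    for ((word1, word2, proximity), docids) in new_pairs {
        for prefix in prefixes {
            if word2.starts_with(prefix.as_str()) {
                // Union with what is already there instead of overwriting.
                existing
                    .entry((word1.clone(), prefix.clone(), *proximity))
                    .or_default()
                    .extend(docids.iter().copied());
            }
        }
    }
}
```

This only covers prefixes that already existed before the batch; a prefix that is new to the prefix set would presumably still need to be computed over all stored word pairs, not just the new ones.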
ff8d7a810d
Change the behavior of the as_cloneable_grenad by taking a ref
2022-02-16 15:40:08 +01:00
f367cc2e75
Finally bump grenad to v0.4.1
2022-02-16 15:28:48 +01:00
d59bcea749
Revert "Revert "Change chunk size to 4MiB to fit more the end user usage""
2022-02-02 17:01:13 +01:00
fb79c32430
Compute the new, common, and deleted prefix words fst once
2022-01-27 11:00:18 +01:00
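The split into new, common, and deleted prefix words can be sketched with plain set operations. This is a minimal Rust sketch under the assumption that `BTreeSet<String>` stands in for the previous and current versions of the prefixes FST; `PrefixDiff` and `diff_prefixes` are hypothetical names, and the comments describe the assumed use of each set rather than confirmed milli behavior.

```rust
use std::collections::BTreeSet;

/// Hypothetical result of comparing two versions of the prefix set.
struct PrefixDiff {
    new: BTreeSet<String>,
    common: BTreeSet<String>,
    deleted: BTreeSet<String>,
}

fn diff_prefixes(previous: &BTreeSet<String>, current: &BTreeSet<String>) -> PrefixDiff {
    PrefixDiff {
        // Present now but not before (assumed: computed over the whole database).
        new: current.difference(previous).cloned().collect(),
        // Present in both (assumed: only the new batch needs to be merged in).
        common: current.intersection(previous).cloned().collect(),
        // Present before but not now (assumed: their entries get removed).
        deleted: previous.difference(current).cloned().collect(),
    }
}
```

Computing the three sets once and passing them to each prefix update avoids recomputing the same difference in every update structure.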
51d1e64b23
Remove the now useless WriteMethod enum
2022-01-27 10:08:35 +01:00
e9c02173cf
Rework the WordsPrefixPositionDocids update to compute a subset of the database
2022-01-27 10:08:35 +01:00
d59e559317
Fix the computation of the newly added and common prefix words
2022-01-27 10:08:34 +01:00
28692f65be
Rework the WordPrefixDocids update to compute a subset of the database
2022-01-27 10:08:34 +01:00
5404bc02dd
Move the fst_stream_into_hashset method in the helper methods
2022-01-27 10:06:00 +01:00
822f67e9ad
Bring the newly created word pair proximity docids
2022-01-27 10:06:00 +01:00
d28f18658e
Retrieve the previous version of the words prefixes FST
2022-01-27 10:05:59 +01:00
fd177b63f8
Merge #423
423: Remove an unused file r=irevoire a=irevoire
This empty file is not included anywhere
Co-authored-by: Tamo <tamo@meilisearch.com>
2022-01-19 14:18:05 +00:00
0c84a40298
document batch support
reusable transform
rework update api
add indexer config
fix tests
review changes
Co-authored-by: Clément Renault <clement@meilisearch.com>
fmt
2022-01-19 12:40:20 +01:00
98a365aaae
store the geopoint in three dimensions
2021-12-14 12:21:24 +01:00
d671d6f0f1
remove an unused file
2021-12-13 19:27:34 +01:00
8970246bc4
Sort positions before iterating over them during word pair proximity extraction
2021-11-22 18:16:54 +01:00
6eb47ab792
remove update_id in UpdateBuilder
2021-11-16 13:07:04 +01:00
09b4281cff
improve document addition returned meta
2021-11-10 14:08:36 +01:00
3599df77f0
Change some error messages
2021-10-27 19:33:01 +02:00
baddd80069
implement review suggestions
2021-10-25 18:29:12 +02:00
430e9b13d3
add csv builder tests
2021-10-25 10:26:43 +02:00
2e62925a6e
fix tests
2021-10-25 10:26:42 +02:00
0f86d6b28f
implement csv serialization
2021-10-25 10:26:42 +02:00
8d70b01714
optimize document deserialization
2021-10-25 10:26:42 +02:00
c7db4176f3
Merge #384
384: Replace memmap with memmap2 r=Kerollmops a=palfrey
[memmap is unmaintained](https://rustsec.org/advisories/RUSTSEC-2020-0077.html) and needs replacing. memmap2 is a well-maintained, drop-in replacement fork. Note that the version numbers were reset on fork, hence the lower values.
Co-authored-by: Tom Parker-Shemilt <palfrey@tevp.net>
2021-10-13 13:47:23 +00:00
6e3b869e6a
Merge #388
388: fix primary key inference r=MarinPostma a=MarinPostma
The primary key was inferred from a hashtable index of the fields. For this reason, the order in which the fields were iterated over was not deterministic, and the primary key was chosen from the first field containing "id".
This fix sorts the index by field_id when inferring the primary key.
Co-authored-by: mpostma <postma.marin@protonmail.com>
2021-10-12 09:25:16 +00:00
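The fix in #388 can be illustrated with a minimal Rust sketch. It assumes the fields index is a `HashMap<String, u16>` from field name to field id, and that the match on "id" is case-insensitive; both are assumptions, and `infer_primary_key` is a hypothetical name. Because `HashMap` iteration order is unspecified, the candidate fields are first sorted by field id so the first match is always the same.

```rust
use std::collections::HashMap;

/// Hypothetical sketch: return the first field, in field-id order,
/// whose name contains "id" (case-insensitive, an assumption).
fn infer_primary_key(fields_ids_map: &HashMap<String, u16>) -> Option<&str> {
    let mut fields: Vec<(&str, u16)> = fields_ids_map
        .iter()
        .map(|(name, id)| (name.as_str(), *id))
        .collect();
    // Sorting by field id makes the result independent of HashMap ordering.
    fields.sort_unstable_by_key(|&(_, id)| id);
    fields
        .into_iter()
        .find(|(name, _)| name.to_lowercase().contains("id"))
        .map(|(name, _)| name)
}
```

With fields `name` (0), `product_id` (1), and `uuid` (2), both `product_id` and `uuid` contain "id", and sorting guarantees `product_id` is picked on every run.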
86ead92ed5
infer primary key on sorted fields
2021-10-12 11:15:11 +02:00
9a266a531b
test correct primary key inference
2021-10-12 11:08:53 +02:00
c5a6075484
Make max_position_per_attributes changeable
2021-10-12 10:10:50 +02:00
360c5ff3df
Remove limit of 1000 positions per attribute
Instead of using an arbitrary limit, we encode the absolute position in a u32,
using the most significant u16 for the field id and the least significant u16 for the relative position in the attribute.
2021-10-12 10:10:50 +02:00
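The encoding described above fits in two small functions. This is a minimal Rust sketch; the function names are illustrative and not necessarily the ones used in milli. The field id goes in the high 16 bits and the relative position in the low 16 bits, so the pair round-trips losslessly.

```rust
/// Pack a field id (high 16 bits) and a relative position (low 16 bits)
/// into a single absolute position.
fn absolute_from_relative_position(field_id: u16, relative: u16) -> u32 {
    (u32::from(field_id) << 16) | u32::from(relative)
}

/// Split an absolute position back into (field id, relative position).
fn relative_from_absolute_position(absolute: u32) -> (u16, u16) {
    ((absolute >> 16) as u16, (absolute & 0xFFFF) as u16)
}
```

For example, field 1 at relative position 5 encodes to `65541` (`1 << 16` is 65536). Putting the field id in the high bits means every position of a lower field id sorts before any position of a higher one, and each attribute can hold up to 65,536 relative positions instead of 1000.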
2dfe24f067
memmap -> memmap2
2021-10-10 22:47:12 +01:00
3296bb243c
Simplify word level position DB into a word position DB
2021-10-05 12:15:02 +02:00
26b5dad042
Revert "Change chunk size to 4MiB to fit more the end user usage"
2021-09-29 15:08:39 +02:00