29 Commits

SHA1 Message Date
63ef0aba18 Start porting facet distribution and sort to new database structure 2022-10-26 13:46:14 +02:00
7913d6365c Update Facets indexing to be compatible with new database structure 2022-10-26 13:46:14 +02:00
c3f49f766d Prepare refactor of facets database
2022-10-26 13:46:14 +02:00
6cc975704d Add some documentation to facets.rs 2022-08-17 12:59:52 +02:00
39687908f1 Add documentation and comments to facets.rs 2022-08-17 12:26:49 +02:00
8d4b21a005 Switch string facet levels indexation to new algo
Write the algorithm once for both numbers and strings
2022-08-17 12:26:49 +02:00
cf0cd92ed4 Refactor Facets::execute to increase performance 2022-08-17 12:26:49 +02:00
8ac24d3114 Cargo fmt + fix compiler warnings/error 2022-08-10 15:53:46 +02:00
6066256689 Add snapshot tests for indexing of word_prefix_pair_proximity_docids 2022-08-10 15:53:46 +02:00
3a734af159 Add snapshot tests for Facets::execute 2022-08-10 15:53:46 +02:00
25123af3b8 Merge #436
436: Speed up the word prefix databases computation time r=Kerollmops a=Kerollmops

This PR depends on the fixes done in #431 and must be merged after it.

In this PR we will bring the `WordPrefixPairProximityDocids`, `WordPrefixDocids`, and `WordPrefixPositionDocids` update structures to a new era, a better era, where computing the word prefix pair proximities costs far fewer CPU cycles, an era where this update structure can use the previously computed set of new word docids from the newly indexed batch of documents.

---

`WordPrefixPairProximityDocids` is an update structure, which means that it is an object that we feed with some parameters and that modifies the LMDB database of an index when asked to. This structure specifically computes the list of word prefix pair proximities: pairs of words associated with a proximity (the distance between both words), where the second element is not a full word but a prefix, e.g. `s`, `se`, `a`. Each word prefix pair proximity is associated with the list of document ids that contain the word and the prefix at the given proximity.

The performance issue with this struct comes from the fact that it starts its job from the beginning: it clears the LMDB database before rewriting everything from scratch, using the other LMDB databases to do so. I hope you understand that this is absolutely not an optimized way of doing things.
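
To make the difference concrete, here is a minimal sketch of the two strategies. It is not the actual milli code: plain `BTreeMap`s stand in for the LMDB databases, a `BTreeSet<u32>` stands in for a bitmap of document ids, and the names `rebuild_from_scratch` and `merge_new_pairs` are hypothetical. It also assumes a fixed set of prefixes and only handles document additions.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Stand-in for a bitmap of document ids (assumption: a plain ordered set).
type DocIds = BTreeSet<u32>;
/// Key: (first word, second word, proximity). Stand-in for an LMDB database.
type WordPairProximity = BTreeMap<(String, String, u8), DocIds>;
/// Key: (first word, prefix of the second word, proximity).
type WordPrefixPairProximity = BTreeMap<(String, String, u8), DocIds>;

/// Incremental strategy (hypothetical name): merge only the pairs that come
/// from the newly indexed batch into the existing prefix database.
fn merge_new_pairs(
    new_pairs: &WordPairProximity,
    prefixes: &[&str],
    out: &mut WordPrefixPairProximity,
) {
    for ((w1, w2, proximity), docids) in new_pairs {
        for prefix in prefixes {
            // The second word contributes to every prefix it starts with.
            if w2.starts_with(*prefix) {
                out.entry((w1.clone(), (*prefix).to_string(), *proximity))
                    .or_default()
                    .extend(docids.iter().copied());
            }
        }
    }
}

/// From-scratch strategy (hypothetical name): clear the prefix database and
/// recompute it from every word pair known to the index.
fn rebuild_from_scratch(
    all_pairs: &WordPairProximity,
    prefixes: &[&str],
    out: &mut WordPrefixPairProximity,
) {
    out.clear();
    merge_new_pairs(all_pairs, prefixes, out);
}
```

In this sketch both strategies end up with the same database, but the incremental one only walks the pairs of the new batch instead of every pair in the index.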

Co-authored-by: Clément Renault <clement@meilisearch.com>
Co-authored-by: Kerollmops <clement@meilisearch.com>
2022-02-16 15:41:14 +00:00
f367cc2e75 Finally bump grenad to v0.4.1 2022-02-16 15:28:48 +01:00
48542ac8fd get rid of chrono in favor of time 2022-02-15 11:41:55 +01:00
51d1e64b23 Remove the now useless WriteMethod enum 2022-01-27 10:08:35 +01:00
6eb47ab792 remove update_id in UpdateBuilder 2021-11-16 13:07:04 +01:00
fc7cc770d4 Add logging timers 2021-09-01 16:48:40 +02:00
1d314328f0 Plug new indexer 2021-09-01 16:48:36 +02:00
0227254a65 Return the original string values for the inverted facet index database 2021-07-21 16:59:39 +02:00
8c86348119 Indexing the facet strings levels 2021-07-21 16:59:38 +02:00
838ed1cd32 Use a u16 field id instead of one byte 2021-07-06 11:58:03 +02:00
32b7bd366f Remove the roaring operation functions warnings 2021-06-30 14:12:56 +02:00
9716fb3b36 format the whole project 2021-06-16 18:33:33 +02:00
312c2d1d8e Use the Error enum everywhere in the project 2021-06-14 16:58:38 +02:00
1c0a5cd136 Resolve code modification suggestions 2021-05-31 15:22:50 +02:00
bd7b285bae Split the update side to use the number and the strings facet databases 2021-05-25 11:30:00 +02:00
51767725b2 Simplify integer and float functions trait bounds 2021-04-20 10:23:31 +02:00
615fe095e1 update index updated at on index writes 2021-03-15 14:05:47 +01:00
f365de636f Compute and write the word-prefix-docids database 2021-02-17 11:12:38 +01:00
e8639517da Change the project to become a workspace with milli as a default-member 2021-02-12 16:15:09 +01:00