Commit Graph

1537 Commits

Author SHA1 Message Date
meili-bors[bot]
19e6f675b3 Merge #4900
4900: Indexer edition 2024 r=Kerollmops a=dureuill

This PR is implementing the indexer edition 2024, largely inspired by [the ideas from this blog post](https://blog.kerollmops.com/meilisearch-is-too-slow).

Fixes https://github.com/meilisearch/meilisearch/issues/4985

## Features
- Stream-first approach to reading documents.
- Minimum disk write operations.
- RAM usage-first approach to avoid modifying common bitmaps on disk but in memory.
- Reduced LMDB fragmentation by writing entries only once...
- ...computing the final version of the entries in parallel...
- ...and storing them in write-optimized data structures before sending them to the BTree (LMDB).
- Indexing in multiple transactions to improve large dataset support (dumps).


Co-authored-by: ManyTheFish <many@meilisearch.com>
Co-authored-by: Clément Renault <clement@meilisearch.com>
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-11-21 16:19:10 +00:00
Louis Dureuil
323ecbb885 Add span on document operation 2024-11-21 17:01:10 +01:00
Louis Dureuil
ffb60cb885 Add comment explaining why we fixed the version of insta 2024-11-21 16:56:56 +01:00
Louis Dureuil
dcc3caef0d Remove TopLevelMap 2024-11-21 16:56:46 +01:00
Louis Dureuil
221e547e86 Slight changes 2024-11-21 16:47:44 +01:00
Clément Renault
61d0615253 Document the geo point extractor 2024-11-21 16:47:08 +01:00
Clément Renault
5727e00374 Remove useless geo skipped 2024-11-21 16:47:08 +01:00
Clément Renault
9b60843831 Remove commented lines 2024-11-21 16:47:07 +01:00
ManyTheFish
36962b943b First batch of PR comment 2024-11-21 16:38:11 +01:00
Louis Dureuil
32bcacefd5 Changes Document::len to Document::top_level_fields_count 2024-11-21 15:01:07 +01:00
Louis Dureuil
4ed195426c remove unused stuff in global.rs 2024-11-21 15:01:07 +01:00
Many the fish
ff38f29981 Update crates/index-scheduler/src/batch.rs
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-11-21 14:18:39 +01:00
ManyTheFish
94b260fd25 Remove orphan span 2024-11-21 12:12:07 +01:00
Clément Renault
ab2c83f868 Use the disk less when computing prefixes 2024-11-21 10:45:37 +01:00
Louis Dureuil
1f9692cd04 Increase map size for tests 2024-11-20 17:52:21 +01:00
Tamo
1e694ae432 improve the count of the number of tasks in a batch 2024-11-20 17:48:26 +01:00
Tamo
71807cac6d makes clippy happy 2024-11-20 17:40:58 +01:00
Tamo
21a2264782 improve the details and stats of the current batch processing 2024-11-20 17:25:55 +01:00
Louis Dureuil
bda2b41d11 update snaps after merge 2024-11-20 17:08:30 +01:00
Louis Dureuil
6e6acfcf1b Merge branch 'main' into indexer-edition-2024 2024-11-20 16:59:58 +01:00
Louis Dureuil
e0864f1b21 Separate side effect and debug asserts 2024-11-20 16:25:17 +01:00
Clément Renault
a38344acb3 Replace eprintlns by tracing 2024-11-20 15:29:51 +01:00
ManyTheFish
4d616f8794 Parse every attributes and filter before tokenization 2024-11-20 15:15:25 +01:00
Louis Dureuil
ff9c92c409 rename documents -> substep 2024-11-20 15:12:02 +01:00
Clément Renault
8380ddbdcd Fix progress of into_changes 2024-11-20 15:10:09 +01:00
Louis Dureuil
867138f166 Add SP to into_changes 2024-11-20 15:07:05 +01:00
Clément Renault
567bd4538b Fxi the into_changes stop processing 2024-11-20 14:58:25 +01:00
Louis Dureuil
84600a10d1 Add MSP to document_update.into_changes() 2024-11-20 14:53:37 +01:00
ManyTheFish
35bbe1c2a2 Add failing test on settings changes 2024-11-20 14:48:12 +01:00
Louis Dureuil
7d64e8dbd3 Fix Windows compilation 2024-11-20 14:40:38 +01:00
Tamo
ec06879d28 apply review changes 2024-11-20 14:40:36 +01:00
Tamo
83d1f858c1 Update crates/index-scheduler/src/lib.rs
Co-authored-by: Clément Renault <clement@meilisearch.com>
2024-11-20 14:36:05 +01:00
Louis Dureuil
cae8c89467 "fix" last warnings 2024-11-20 14:03:52 +01:00
Tamo
a7ac590e9e implements the reverse query parameter for the batches 2024-11-20 13:29:52 +01:00
Clément Renault
7cb8732b45 Introduce a new bincode internal error 2024-11-20 13:23:11 +01:00
Tamo
8ad68dd708 stop leaking the update files of the canceled tasks 2024-11-20 13:17:54 +01:00
ManyTheFish
fe5d50969a Fix filed selector in extrators 2024-11-20 13:16:44 +01:00
Clément Renault
56c7c5d5f0 Fix comments 2024-11-20 13:16:44 +01:00
Louis Dureuil
4cdfdddd6d Fix one more 2024-11-20 13:16:43 +01:00
Louis Dureuil
2afa33011a Fix tokenize_document 2024-11-20 13:16:43 +01:00
Louis Dureuil
61feca1f41 More tests pass 2024-11-20 13:16:43 +01:00
Louis Dureuil
f893b5153e Don't mark [""] as empty facet 2024-11-20 13:16:42 +01:00
Louis Dureuil
ca779c21f9 facets: Handle boolean and skip empty strings 2024-11-20 13:16:42 +01:00
Louis Dureuil
477077bdc2 Remove _vectors from fid map when there are no vectors in sight 2024-11-20 13:16:42 +01:00
ManyTheFish
b1f8aec348 Fix index_documents_check_exists_database 2024-11-20 13:16:41 +01:00
ManyTheFish
ba7f091db3 Use tokenizer on numbers and booleans 2024-11-20 13:16:41 +01:00
Louis Dureuil
8049df125b Add depth to facet extraction so that null inside an array doesn't mark the entire field as null 2024-11-20 13:16:40 +01:00
Clément Renault
50d1bd01df We no longer index geo lat and lng 2024-11-20 13:16:40 +01:00
Louis Dureuil
a28d4f5d0c Fix setup_search_index_with_criteria 2024-11-20 13:16:40 +01:00
Louis Dureuil
fc14f4bc66 Attempt to fix setup_search_index_with_criteria 2024-11-20 13:16:39 +01:00