4900: Indexer edition 2024 r=Kerollmops a=dureuill

This PR is implementing the indexer edition 2024, largely inspired by [the ideas from this blog post](https://blog.kerollmops.com/meilisearch-is-too-slow).

Fixes https://github.com/meilisearch/meilisearch/issues/4985

## Features
- Stream-first approach to reading documents.
- Minimum disk write operations.
- RAM usage-first approach to avoid modifying common bitmaps on disk but in memory.
- Reduced LMDB fragmentation by writing entries only once...
- ...computing the final version of the entries in parallel...
- ...and storing them in write-optimized data structures before sending them to the BTree (LMDB).
- Indexing in multiple transactions to improve large dataset support (dumps).


Co-authored-by: ManyTheFish <many@meilisearch.com>
Co-authored-by: Clément Renault <clement@meilisearch.com>
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
This commit is contained in:
meili-bors[bot]
2024-11-21 16:19:10 +00:00
committed by GitHub
134 changed files with 15591 additions and 3843 deletions

View File

@@ -1,5 +1,5 @@
---
source: crates/index-scheduler/src/lib.rs
source: crates/crates/index-scheduler/src/lib.rs
snapshot_kind: text
---
### Autobatching Enabled = true
@@ -23,7 +23,7 @@ succeeded [0,1,2,]
doggos [0,1,2,]
----------------------------------------------------------------------
### Index Mapper:
doggos: { number_of_documents: 1, field_distribution: {"_vectors": 1, "breed": 1, "doggo": 1, "id": 1} }
doggos: { number_of_documents: 1, field_distribution: {"breed": 1, "doggo": 1, "id": 1} }
----------------------------------------------------------------------
### Canceled By:

View File

@@ -1,5 +1,5 @@
---
source: crates/index-scheduler/src/lib.rs
source: crates/crates/index-scheduler/src/lib.rs
snapshot_kind: text
---
### Autobatching Enabled = true
@@ -23,7 +23,7 @@ succeeded [0,1,]
doggos [0,1,2,]
----------------------------------------------------------------------
### Index Mapper:
doggos: { number_of_documents: 1, field_distribution: {"_vectors": 1, "breed": 1, "doggo": 1, "id": 1} }
doggos: { number_of_documents: 1, field_distribution: {"breed": 1, "doggo": 1, "id": 1} }
----------------------------------------------------------------------
### Canceled By:

View File

@@ -1,5 +1,5 @@
---
source: crates/index-scheduler/src/lib.rs
source: crates/crates/index-scheduler/src/lib.rs
snapshot_kind: text
---
### Autobatching Enabled = true
@@ -22,7 +22,7 @@ succeeded [0,1,]
doggos [0,1,]
----------------------------------------------------------------------
### Index Mapper:
doggos: { number_of_documents: 1, field_distribution: {"_vectors": 1, "breed": 1, "doggo": 1, "id": 1} }
doggos: { number_of_documents: 1, field_distribution: {"breed": 1, "doggo": 1, "id": 1} }
----------------------------------------------------------------------
### Canceled By: