4649: Don't store the vectors in the documents database r=dureuill a=irevoire

# Pull Request

## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/4607

## What does this PR do?
- Ensure that anything falling under `_vectors` is NOT searchable, filterable or sortable
- [x] per embedder, add a roaring bitmap of documents that provide "userProvided" embeddings
- [x] in the indexing process in `extract_vector_points`, set the bit corresponding to the document depending on the "userProvided" subfield in the `_vectors` field.
- [x] in the document DB in typed chunks, when writing the `_vectors` field, remove all keys corresponding to an embedder
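
The checklist above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `EmbedderEntry`, `mark_user_provided` and `strip_embedder_keys` are hypothetical names, and a `BTreeSet<u32>` stands in for the per-embedder roaring bitmap so that the example depends only on the standard library.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical, simplified view of one embedder's entry under `_vectors`.
#[derive(Debug, Clone, PartialEq)]
struct EmbedderEntry {
    /// Mirrors the "userProvided" subfield mentioned in the PR description.
    user_provided: bool,
    embedding: Vec<f32>,
}

/// Stand-in for a roaring bitmap of internal document ids (assumption:
/// the real code would use a compressed bitmap type instead).
type DocIds = BTreeSet<u32>;

/// For each embedder named in the document's `_vectors`, set or clear the
/// document's bit depending on whether its embeddings are user provided.
fn mark_user_provided(
    user_provided: &mut BTreeMap<String, DocIds>,
    doc_id: u32,
    vectors: &BTreeMap<String, EmbedderEntry>,
) {
    for (embedder, entry) in vectors {
        let docids = user_provided.entry(embedder.clone()).or_default();
        if entry.user_provided {
            docids.insert(doc_id);
        } else {
            docids.remove(&doc_id);
        }
    }
}

/// Before the document is written to the documents database, drop every
/// `_vectors` key that corresponds to a configured embedder.
fn strip_embedder_keys(vectors: &mut BTreeMap<String, EmbedderEntry>, embedders: &[&str]) {
    vectors.retain(|name, _| !embedders.contains(&name.as_str()));
}
```

Under these assumptions, a document whose `_vectors.default` entry is user provided ends up with its id in the `default` set, and after `strip_embedder_keys` the stored document no longer carries that key; entries for names that are not configured embedders are left untouched.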

Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
meili-bors[bot]
2024-06-17 12:32:03 +00:00
committed by GitHub
60 changed files with 3920 additions and 1126 deletions

Cargo.lock (generated, 6 lines changed)

@@ -2455,6 +2455,7 @@ name = "index-scheduler"
 version = "1.9.0"
 dependencies = [
  "anyhow",
+ "arroy",
  "big_s",
  "bincode",
  "crossbeam",
@@ -2465,6 +2466,7 @@ dependencies = [
  "file-store",
  "flate2",
  "insta",
+ "maplit",
  "meili-snap",
  "meilisearch-auth",
  "meilisearch-types",
@@ -5301,9 +5303,9 @@ dependencies = [
 [[package]]
 name = "tracing-actix-web"
-version = "0.7.10"
+version = "0.7.11"
 source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "fa069bd1503dd526ee793bb3fce408895136c95fc86d2edb2acf1c646d7f0684"
+checksum = "4ee9e39a66d9b615644893ffc1704d2a89b5b315b7fd0228ad3182ca9a306b19"
 dependencies = [
  "actix-web",
  "mutually_exclusive_features",