Compare commits

...

43 Commits

Author SHA1 Message Date
CaroFG
cf2758a217 Add image search demo 2025-10-30 10:17:34 +01:00
Clément Renault
6df196034e Merge pull request #5950 from meilisearch/update-version-v1.24.0
Update version to v1.24.0
2025-10-20 11:17:15 +00:00
Clément Renault
a63762737c Upgrade index scheduler 2025-10-20 12:22:27 +02:00
Clément Renault
77394bd4b9 Update insta tests 2025-10-20 10:54:16 +02:00
Clément Renault
cb87201c8b Fix dumpless upgrade and do nothing 2025-10-20 10:42:35 +02:00
Clément Renault
1a9c38794f Bump version to v1.24.0 2025-10-20 10:38:48 +02:00
Clément Renault
34233efb63 Merge pull request #5946 from meilisearch/fix-compaction-issues
Improve compaction behaviors
2025-10-16 15:42:38 +00:00
Clément Renault
af0608ebd6 Continue to the next index if index doesn't exists 2025-10-16 16:39:51 +02:00
Clément Renault
8c7e5c094e Improve the task batch stopped message 2025-10-16 16:39:50 +02:00
Clément Renault
c064737137 Remove duplicated logic in auto batching of tasks 2025-10-16 16:33:20 +02:00
Clément Renault
1d188a7ad3 Make the compaction tasks a priority over the export ones 2025-10-16 13:01:23 +02:00
Clément Renault
66a6b65716 Merge pull request #5945 from meilisearch/search-cutoff-vector-store
Search cutoff vector store
2025-10-16 09:43:20 +00:00
Louis Dureuil
326652a399 Update hannoy 2025-10-16 10:34:54 +02:00
Louis Dureuil
59316e8d5a add unit test 2025-10-16 10:34:20 +02:00
Louis Dureuil
76d7f20c87 fix snap 2025-10-16 10:34:19 +02:00
Louis Dureuil
380b2797a5 Share the same budget for all queries of a given index in federated search 2025-10-16 10:34:19 +02:00
Clémentine
1dd58f9bec Merge pull request #5866 from PedroTroller/build/alpine3.22
Bump Dockerfile alpine version to 3.22
2025-10-16 07:22:43 +00:00
Kerollmops
ddc76ad0dc Delete the leftover compaction files from canceled operations 2025-10-15 16:49:25 +02:00
Kerollmops
ffacf1c002 Introduce the new IndexMapper index path method 2025-10-15 16:49:25 +02:00
Kerollmops
5a49b93b77 Use constant tempfile name to reuse tempfile 2025-10-15 16:49:25 +02:00
Louis Dureuil
918a6eaec9 Implement for vector store ranking rule 2025-10-15 16:31:47 +02:00
Louis Dureuil
1e6ce70e3e "Uninteresting" ranking rule implementations 2025-10-15 16:31:47 +02:00
Louis Dureuil
b418054ee4 Change bucket_sort logic to pass the time budget and allow for retrieving non-blocking buckets 2025-10-15 16:31:47 +02:00
Louis Dureuil
58f30e9d8a Change RankingRule trait to account for budget 2025-10-15 16:31:46 +02:00
Many the fish
c45172a4bf Merge pull request #5942 from meilisearch/meili-bot-patch-1
Adapt the standards of prototypes
2025-10-15 11:22:03 +00:00
meili-bot
221ba20083 Adapt the standards of prototypes 2025-10-15 10:47:23 +02:00
Many the fish
93c5fbbb8b Merge pull request #5926 from meilisearch/search-metadata
Search metadata
2025-10-14 14:13:42 +00:00
ManyTheFish
22d529523a refactor: extract query metadata building logic into separate function 2025-10-14 14:39:07 +02:00
ManyTheFish
ed6f479940 Remove irrelevant test index method 2025-10-14 12:10:17 +02:00
ManyTheFish
f19f712433 Add local remote name when a remote federated search is made 2025-10-14 12:10:17 +02:00
ManyTheFish
24a92c2809 move contant header in search/mod.rs 2025-10-14 12:10:17 +02:00
ManyTheFish
443cc24408 --amend 2025-10-14 12:10:17 +02:00
ManyTheFish
e8d5228250 factorize metadata header 2025-10-14 12:10:17 +02:00
ManyTheFish
5c33fb090c avoid openning each index twice and remove clones 2025-10-14 12:10:17 +02:00
ManyTheFish
48dd9146e7 Add comprehensive metadata tests with insta snapshots
- Add 9 test cases covering single search, multi-search, and federated search
- Test metadata header opt-in functionality with case insensitivity
- Test header false value handling
- Test UUID format validation and consistency
- Use insta snapshots for reliable, maintainable test assertions
- Fix header parsing to properly handle 'false' values
- Add helper methods for testing with custom headers
2025-10-14 12:10:17 +02:00
ManyTheFish
c1c42e818e refactor: group perform_search parameters into SearchParams struct
- Create SearchParams struct to group related parameters
- Update perform_search function to use SearchParams instead of 8 individual parameters
- Fix clippy warning about too many arguments
- Update all callers to use new SearchParams struct
2025-10-14 12:10:17 +02:00
ManyTheFish
519905ef9c Fix remote index collision with HashMap-based lookup
- Replace BTreeMap with HashMap for (remote, index_uid) -> primary_key lookup
- Prevents collisions when multiple remotes have same index_uid but different primary keys
2025-10-14 12:10:17 +02:00
ManyTheFish
f242377d2b Fix remote index collision in federated search metadata
- Use composite key (indexUid, remote) instead of indexUid only for remote metadata lookup
- Prevents collisions when multiple remotes have same indexUid but different primary keys
- Ensures each remote query gets correct primaryKey from its specific remote instance
2025-10-14 12:10:17 +02:00
ManyTheFish
da06306274 Add header-based metadata opt-in for search responses
- Add Meili-Include-Metadata header constant
- Modify perform_search to conditionally include metadata based on header
- Modify perform_federated_search to conditionally include metadata based on header
- Update all search routes to check for header and pass include_metadata parameter
- Forward Meili-Include-Metadata header to remote requests for federated search
- Ensure remote queries include primaryKey metadata when header is present
2025-10-14 12:10:17 +02:00
ManyTheFish
b93b803a2e WIP: Add metadata field with queryUid, indexUid, primaryKey, and remote
- Add SearchMetadata struct with queryUid, indexUid, primaryKey, and remote fields
- Update SearchResult to include metadata field
- Update FederatedSearchResult to include metadata array
- Refactor federated search metadata building to maintain query order
- Support primary key extraction from both local and remote results
- Add remote field to identify remote instance for federated queries
- Ensure metadata array matches query order in federated search

Features:
- queryUid: UUID v7 for each query
- indexUid: Index identifier
- primaryKey: Primary key field name (null if not available)
- remote: Remote instance name (null for local queries)

This provides complete traceability for search operations across local and remote instances.
2025-10-14 12:10:17 +02:00
ManyTheFish
cf43ec4aff feat: add indexUid to SearchMetadata
- Add indexUid field to SearchMetadata struct
- Update perform_search to include indexUid in metadata
- Update federated search to include indexUid for each query

The metadata field now contains both queryUid and indexUid:
- For /search: single object with queryUid and indexUid
- For /multi-search: each result has metadata with both fields
- For federated search: array of objects, each with queryUid and indexUid
2025-10-14 12:10:17 +02:00
ManyTheFish
9795d98e77 feat: add metadata field with queryUid to search responses
- Add SearchMetadata struct with queryUid field (UUID v7)
- Add metadata field to SearchResult for /search route
- Add metadata field to FederatedSearchResult for /multi-search route
- Update perform_search to generate queryUid and set metadata
- Update federated search to generate queryUid for each query
- Update multi-search non-federated path to include metadata
- Fix pattern matching in analytics and other code

The metadata field contains:
- For /search: single object with queryUid
- For /multi-search: array of objects, one per query
- For federated search: array of objects, one per query

All queryUid values are generated using Uuid::now_v7() for time-ordered uniqueness.
2025-10-14 12:10:17 +02:00
PedroTroller
9f4dcd04e9 Bump alpine version to 3.22 2025-09-18 17:08:36 +02:00
49 changed files with 1392 additions and 219 deletions

38
Cargo.lock generated
View File

@@ -589,7 +589,7 @@ source = "git+https://github.com/meilisearch/bbqueue#cbb87cc707b5af415ef203bdaf2
 [[package]]
 name = "benchmarks"
-version = "1.23.0"
+version = "1.24.0"
 dependencies = [
 "anyhow",
 "bumpalo",
@@ -799,7 +799,7 @@ dependencies = [
 [[package]]
 name = "build-info"
-version = "1.23.0"
+version = "1.24.0"
 dependencies = [
 "anyhow",
 "time",
@@ -1829,7 +1829,7 @@ dependencies = [
 [[package]]
 name = "dump"
-version = "1.23.0"
+version = "1.24.0"
 dependencies = [
 "anyhow",
 "big_s",
@@ -2072,7 +2072,7 @@ checksum = "37909eebbb50d72f9059c3b6d82c0463f2ff062c9e95845c43a6c9c0355411be"
 [[package]]
 name = "file-store"
-version = "1.23.0"
+version = "1.24.0"
 dependencies = [
 "tempfile",
 "thiserror 2.0.16",
@@ -2094,7 +2094,7 @@ dependencies = [
 [[package]]
 name = "filter-parser"
-version = "1.23.0"
+version = "1.24.0"
 dependencies = [
 "insta",
 "levenshtein_automata",
@@ -2122,7 +2122,7 @@ dependencies = [
 [[package]]
 name = "flatten-serde-json"
-version = "1.23.0"
+version = "1.24.0"
 dependencies = [
 "criterion",
 "serde_json",
@@ -2279,7 +2279,7 @@ dependencies = [
 [[package]]
 name = "fuzzers"
-version = "1.23.0"
+version = "1.24.0"
 dependencies = [
 "arbitrary",
 "bumpalo",
@@ -2758,9 +2758,9 @@ dependencies = [
 [[package]]
 name = "hannoy"
-version = "0.0.9-nested-rtxns"
+version = "0.0.9-nested-rtxns-2"
 source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "cc5a945b92b063e677d658cfcc7cb6dec2502fe44631f017084938f14d6ce30e"
+checksum = "06eda090938d9dcd568c8c2a5de383047ed9191578ebf4a342d2975d16e621f2"
 dependencies = [
 "bytemuck",
 "byteorder",
@@ -3233,7 +3233,7 @@ dependencies = [
 [[package]]
 name = "index-scheduler"
-version = "1.23.0"
+version = "1.24.0"
 dependencies = [
 "anyhow",
 "backoff",
@@ -3487,7 +3487,7 @@ dependencies = [
 [[package]]
 name = "json-depth-checker"
-version = "1.23.0"
+version = "1.24.0"
 dependencies = [
 "criterion",
 "serde_json",
@@ -3996,7 +3996,7 @@ checksum = "490cc448043f947bae3cbee9c203358d62dbee0db12107a74be5c30ccfd09771"
 [[package]]
 name = "meili-snap"
-version = "1.23.0"
+version = "1.24.0"
 dependencies = [
 "insta",
 "md5",
@@ -4007,7 +4007,7 @@ dependencies = [
 [[package]]
 name = "meilisearch"
-version = "1.23.0"
+version = "1.24.0"
 dependencies = [
 "actix-cors",
 "actix-http",
@@ -4104,7 +4104,7 @@ dependencies = [
 [[package]]
 name = "meilisearch-auth"
-version = "1.23.0"
+version = "1.24.0"
 dependencies = [
 "base64 0.22.1",
 "enum-iterator",
@@ -4123,7 +4123,7 @@ dependencies = [
 [[package]]
 name = "meilisearch-types"
-version = "1.23.0"
+version = "1.24.0"
 dependencies = [
 "actix-web",
 "anyhow",
@@ -4158,7 +4158,7 @@ dependencies = [
 [[package]]
 name = "meilitool"
-version = "1.23.0"
+version = "1.24.0"
 dependencies = [
 "anyhow",
 "clap",
@@ -4192,7 +4192,7 @@ dependencies = [
 [[package]]
 name = "milli"
-version = "1.23.0"
+version = "1.24.0"
 dependencies = [
 "allocator-api2 0.3.1",
 "arroy",
@@ -4773,7 +4773,7 @@ checksum = "9b4f627cb1b25917193a259e49bdad08f671f8d9708acfd5fe0a8c1455d87220"
 [[package]]
 name = "permissive-json-pointer"
-version = "1.23.0"
+version = "1.24.0"
 dependencies = [
 "big_s",
 "serde_json",
@@ -7820,7 +7820,7 @@ dependencies = [
 [[package]]
 name = "xtask"
-version = "1.23.0"
+version = "1.24.0"
 dependencies = [
 "anyhow",
 "build-info",

View File

@@ -23,7 +23,7 @@ members = [
 ]
 [workspace.package]
-version = "1.23.0"
+version = "1.24.0"
 authors = [
 "Quentin de Quelen <quentin@dequelen.me>",
 "Clément Renault <clement@meilisearch.com>",

View File

@@ -1,5 +1,5 @@
 # Compile
-FROM rust:1.89-alpine3.20 AS compiler
+FROM rust:1.89-alpine3.22 AS compiler
 RUN apk add -q --no-cache build-base openssl-dev
@@ -20,7 +20,7 @@ RUN set -eux; \
 cargo build --release -p meilisearch -p meilitool
 # Run
-FROM alpine:3.20
+FROM alpine:3.22
 LABEL org.opencontainers.image.source="https://github.com/meilisearch/meilisearch"
 ENV MEILI_HTTP_ADDR 0.0.0.0:7700

View File

@@ -39,6 +39,7 @@
 ## 🖥 Examples
 - [**Movies**](https://where2watch.meilisearch.com/?utm_campaign=oss&utm_source=github&utm_medium=organization) — An application to help you find streaming platforms to watch movies using [hybrid search](https://www.meilisearch.com/solutions/hybrid-search?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=demos).
+- [**Images**](https://flickr.meilisearch.com/?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=demos) — Search Flickr images using [AI-powered search](https://www.meilisearch.com/solutions/hybrid-search?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=demos).
 - [**Ecommerce**](https://ecommerce.meilisearch.com/?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=demos) — Ecommerce website using disjunctive [facets](https://www.meilisearch.com/docs/learn/fine_tuning_results/faceted_search?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=demos), range and rating filtering, and pagination.
 - [**Songs**](https://music.meilisearch.com/?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=demos) — Search through 47 million of songs.
 - [**SaaS**](https://saas.meilisearch.com/?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=demos) — Search for contacts, deals, and companies in this [multi-tenant](https://www.meilisearch.com/docs/learn/security/multitenancy_tenant_tokens?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=demos) CRM application.

View File

@@ -199,7 +199,7 @@ impl IndexMapper {
 let uuid = Uuid::new_v4();
 self.index_mapping.put(&mut wtxn, name, &uuid)?;
-let index_path = self.base_path.join(uuid.to_string());
+let index_path = self.index_path(uuid);
 fs::create_dir_all(&index_path)?;
 // Error if the UUIDv4 somehow already exists in the map, since it should be fresh.
@@ -286,7 +286,7 @@ impl IndexMapper {
 };
 let index_map = self.index_map.clone();
-let index_path = self.base_path.join(uuid.to_string());
+let index_path = self.index_path(uuid);
 let index_name = name.to_string();
 thread::Builder::new()
 .name(String::from("index_deleter"))
@@ -408,7 +408,7 @@ impl IndexMapper {
 } else {
 continue;
 };
-let index_path = self.base_path.join(uuid.to_string());
+let index_path = self.index_path(uuid);
 // take the lock to reopen the environment.
 reopen
 .reopen(&mut self.index_map.write().unwrap(), &index_path)
@@ -425,7 +425,7 @@ impl IndexMapper {
 // if it's not already there.
 match index_map.get(&uuid) {
 Missing => {
-let index_path = self.base_path.join(uuid.to_string());
+let index_path = self.index_path(uuid);
 break index_map
 .create(
@@ -452,6 +452,14 @@ impl IndexMapper {
 Ok(index)
 }
+/// Returns the path of the index.
+///
+/// The folder located at this path is containing the data.mdb,
+/// the lock.mdb and an optional data.mdb.cpy file.
+pub fn index_path(&self, uuid: Uuid) -> PathBuf {
+self.base_path.join(uuid.to_string())
+}
 pub fn rollback_index(
 &self,
 rtxn: &RoTxn,
@@ -492,7 +500,7 @@ impl IndexMapper {
 };
 }
-let index_path = self.base_path.join(uuid.to_string());
+let index_path = self.index_path(uuid);
 Index::rollback(milli::heed::EnvOpenOptions::new().read_txn_without_tls(), index_path, to)
 .map_err(|err| crate::Error::from_milli(err, Some(name.to_string())))
 }
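The new index_path helper centralizes the base_path/<uuid> join that each call site previously repeated, so other parts of the scheduler can derive paths inside an index folder from the index UUID alone. A minimal sketch of the idea, assuming a simplified mapper type; only the base-path join and the data.mdb.cpy name come from this compare:

    use std::path::{Path, PathBuf};
    use uuid::Uuid;

    // Hypothetical, simplified stand-in for the scheduler's IndexMapper.
    struct MapperSketch {
        base_path: PathBuf,
    }

    impl MapperSketch {
        // Mirrors the helper added above: each index lives in a folder named after its UUID.
        fn index_path(&self, uuid: Uuid) -> PathBuf {
            self.base_path.join(uuid.to_string())
        }
    }

    fn main() {
        let mapper = MapperSketch { base_path: Path::new("/tmp/indexes").to_path_buf() };
        let uuid = Uuid::new_v4();
        // The same folder holds data.mdb, lock.mdb, and, during compaction, data.mdb.cpy.
        println!("{}", mapper.index_path(uuid).join("data.mdb.cpy").display());
    }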

View File

@@ -75,6 +75,7 @@ make_enum_progress! {
 pub enum TaskCancelationProgress {
 RetrievingTasks,
 CancelingUpgrade,
+CleaningCompactionLeftover,
 UpdatingTasks,
 }
 }

View File

@@ -25,7 +25,6 @@ enum AutobatchKind {
 IndexDeletion,
 IndexUpdate,
 IndexSwap,
-IndexCompaction,
 }
 impl AutobatchKind {
@@ -69,14 +68,14 @@ impl From<KindWithContent> for AutobatchKind {
 KindWithContent::IndexCreation { .. } => AutobatchKind::IndexCreation,
 KindWithContent::IndexUpdate { .. } => AutobatchKind::IndexUpdate,
 KindWithContent::IndexSwap { .. } => AutobatchKind::IndexSwap,
-KindWithContent::IndexCompaction { .. } => AutobatchKind::IndexCompaction,
-KindWithContent::TaskCancelation { .. }
+KindWithContent::IndexCompaction { .. }
+| KindWithContent::TaskCancelation { .. }
 | KindWithContent::TaskDeletion { .. }
 | KindWithContent::DumpCreation { .. }
 | KindWithContent::Export { .. }
 | KindWithContent::UpgradeDatabase { .. }
 | KindWithContent::SnapshotCreation => {
-panic!("The autobatcher should never be called with tasks that don't apply to an index.")
+panic!("The autobatcher should never be called with tasks with special priority or that don't apply to an index.")
 }
 }
 }
@@ -120,9 +119,6 @@ pub enum BatchKind {
 IndexSwap {
 id: TaskId,
 },
-IndexCompaction {
-id: TaskId,
-},
 }
 impl BatchKind {
@@ -188,13 +184,6 @@ impl BatchKind {
 )),
 false,
 ),
-K::IndexCompaction => (
-Break((
-BatchKind::IndexCompaction { id: task_id },
-BatchStopReason::TaskCannotBeBatched { kind, id: task_id },
-)),
-false,
-),
 K::DocumentClear => (Continue(BatchKind::DocumentClear { ids: vec![task_id] }), false),
 K::DocumentImport { allow_index_creation, primary_key: pk }
 if primary_key.is_none() || pk.is_none() || primary_key == pk.as_deref() =>
@@ -300,7 +289,7 @@ impl BatchKind {
 match (self, autobatch_kind) {
 // We don't batch any of these operations
-(this, K::IndexCreation | K::IndexUpdate | K::IndexSwap | K::DocumentEdition | K::IndexCompaction) => {
+(this, K::IndexCreation | K::IndexUpdate | K::IndexSwap | K::DocumentEdition) => {
 Break((this, BatchStopReason::TaskCannotBeBatched { kind, id }))
 },
 // We must not batch tasks that don't have the same index creation rights if the index doesn't already exists.
@@ -497,7 +486,6 @@ impl BatchKind {
 | BatchKind::IndexDeletion { .. }
 | BatchKind::IndexUpdate { .. }
 | BatchKind::IndexSwap { .. }
-| BatchKind::IndexCompaction { .. }
 | BatchKind::DocumentEdition { .. },
 _,
 ) => {

View File

@@ -437,12 +437,6 @@ impl IndexScheduler {
 current_batch.processing(Some(&mut task));
 Ok(Some(Batch::IndexSwap { task }))
 }
-BatchKind::IndexCompaction { id } => {
-let mut task =
-self.queue.tasks.get_task(rtxn, id)?.ok_or(Error::CorruptedTaskQueue)?;
-current_batch.processing(Some(&mut task));
-Ok(Some(Batch::IndexCompaction { index_uid, task }))
-}
 }
 }
@@ -525,17 +519,33 @@ impl IndexScheduler {
 return Ok(Some((Batch::TaskDeletions(tasks), current_batch)));
 }
-// 3. we batch the export.
+// 3. we get the next task to compact
+let to_compact = self.queue.tasks.get_kind(rtxn, Kind::IndexCompaction)? & enqueued;
+if let Some(task_id) = to_compact.min() {
+let mut task =
+self.queue.tasks.get_task(rtxn, task_id)?.ok_or(Error::CorruptedTaskQueue)?;
+current_batch.processing(Some(&mut task));
+current_batch.reason(BatchStopReason::TaskCannotBeBatched {
+kind: Kind::IndexCompaction,
+id: task_id,
+});
+let index_uid =
+task.index_uid().expect("Compaction task must have an index uid").to_owned();
+return Ok(Some((Batch::IndexCompaction { index_uid, task }, current_batch)));
+}
+// 4. we batch the export.
 let to_export = self.queue.tasks.get_kind(rtxn, Kind::Export)? & enqueued;
 if !to_export.is_empty() {
 let task_id = to_export.iter().next().expect("There must be at least one export task");
 let mut task = self.queue.tasks.get_task(rtxn, task_id)?.unwrap();
 current_batch.processing([&mut task]);
-current_batch.reason(BatchStopReason::TaskKindCannotBeBatched { kind: Kind::Export });
+current_batch
+.reason(BatchStopReason::TaskCannotBeBatched { kind: Kind::Export, id: task_id });
 return Ok(Some((Batch::Export { task }, current_batch)));
 }
-// 4. we batch the snapshot.
+// 5. we batch the snapshot.
 let to_snapshot = self.queue.tasks.get_kind(rtxn, Kind::SnapshotCreation)? & enqueued;
 if !to_snapshot.is_empty() {
 let mut tasks = self.queue.tasks.get_existing_tasks(rtxn, to_snapshot)?;
@@ -545,7 +555,7 @@ impl IndexScheduler {
 return Ok(Some((Batch::SnapshotCreation(tasks), current_batch)));
 }
-// 5. we batch the dumps.
+// 6. we batch the dumps.
 let to_dump = self.queue.tasks.get_kind(rtxn, Kind::DumpCreation)? & enqueued;
 if let Some(to_dump) = to_dump.min() {
 let mut task =
@@ -558,7 +568,7 @@ impl IndexScheduler {
 return Ok(Some((Batch::Dump(task), current_batch)));
 }
-// 6. We make a batch from the unprioritised tasks. Start by taking the next enqueued task.
+// 7. We make a batch from the unprioritised tasks. Start by taking the next enqueued task.
 let task_id = if let Some(task_id) = enqueued.min() { task_id } else { return Ok(None) };
 let mut task =
 self.queue.tasks.get_task(rtxn, task_id)?.ok_or(Error::CorruptedTaskQueue)?;

View File

@@ -1,5 +1,6 @@
 use std::collections::{BTreeSet, HashMap, HashSet};
-use std::io::{Seek, SeekFrom};
+use std::fs::{remove_file, File};
+use std::io::{ErrorKind, Seek, SeekFrom};
 use std::panic::{catch_unwind, AssertUnwindSafe};
 use std::sync::atomic::Ordering;
@@ -13,7 +14,7 @@ use meilisearch_types::tasks::{Details, IndexSwap, Kind, KindWithContent, Status
 use meilisearch_types::versioning::{VERSION_MAJOR, VERSION_MINOR, VERSION_PATCH};
 use milli::update::Settings as MilliSettings;
 use roaring::RoaringBitmap;
-use tempfile::PersistError;
+use tempfile::{PersistError, TempPath};
 use time::OffsetDateTime;
 use super::create_batch::Batch;
@@ -28,6 +29,9 @@ use crate::utils::{
 };
 use crate::{Error, IndexScheduler, Result, TaskId};
+/// The name of the copy of the data.mdb file used during compaction.
+const DATA_MDB_COPY_NAME: &str = "data.mdb.cpy";
 #[derive(Debug, Default)]
 pub struct ProcessBatchInfo {
 /// The write channel congestion. None when unavailable: settings update.
@@ -558,11 +562,12 @@ impl IndexScheduler {
 .set_currently_updating_index(Some((index_uid.to_string(), index.clone())));
 progress.update_progress(IndexCompaction::CreateTemporaryFile);
-let pre_size = std::fs::metadata(index.path().join("data.mdb"))?.len();
-let mut file = tempfile::Builder::new()
-.suffix("data.")
-.prefix(".mdb.cpy")
-.tempfile_in(index.path())?;
+let src_path = index.path().join("data.mdb");
+let pre_size = std::fs::metadata(&src_path)?.len();
+let dst_path = TempPath::from_path(index.path().join(DATA_MDB_COPY_NAME));
+let file = File::create(&dst_path)?;
+let mut file = tempfile::NamedTempFile::from_parts(file, dst_path);
 // 3. We copy the index data to the temporary file
 progress.update_progress(IndexCompaction::CopyAndCompactTheIndex);
@@ -574,7 +579,7 @@ impl IndexScheduler {
 // 4. We replace the index data file with the temporary file
 progress.update_progress(IndexCompaction::PersistTheCompactedIndex);
-match file.persist(index.path().join("data.mdb")) {
+match file.persist(src_path) {
 Ok(file) => file.sync_all()?,
 // TODO see if we have a _resource busy_ error and probably handle this by:
 // 1. closing the index, 2. replacing and 3. reopening it
@@ -910,9 +915,10 @@ impl IndexScheduler {
 let enqueued_tasks = &self.queue.tasks.get_status(rtxn, Status::Enqueued)?;
-// 0. Check if any upgrade task was matched.
+// 0. Check if any upgrade or compaction tasks were matched.
 // If so, we cancel all the failed or enqueued upgrade tasks.
 let upgrade_tasks = &self.queue.tasks.get_kind(rtxn, Kind::UpgradeDatabase)?;
+let compaction_tasks = &self.queue.tasks.get_kind(rtxn, Kind::IndexCompaction)?;
 let is_canceling_upgrade = !matched_tasks.is_disjoint(upgrade_tasks);
 if is_canceling_upgrade {
 let failed_tasks = self.queue.tasks.get_status(rtxn, Status::Failed)?;
@@ -977,7 +983,33 @@ impl IndexScheduler {
 }
 }
-// 3. We now have a list of tasks to cancel, cancel them
+// 3. If we are cancelling a compaction task, remove the tempfiles after incomplete compactions
+for compaction_task in &tasks_to_cancel & compaction_tasks {
+progress.update_progress(TaskCancelationProgress::CleaningCompactionLeftover);
+let task = self.queue.tasks.get_task(rtxn, compaction_task)?.unwrap();
+let Some(Details::IndexCompaction {
+index_uid,
+pre_compaction_size: _,
+post_compaction_size: _,
+}) = task.details
+else {
+unreachable!("wrong details for compaction task {compaction_task}")
+};
+let index_path = match self.index_mapper.index_mapping.get(rtxn, &index_uid)? {
+Some(index_uuid) => self.index_mapper.index_path(index_uuid),
+None => continue,
+};
+if let Err(e) = remove_file(index_path.join(DATA_MDB_COPY_NAME)) {
+match e.kind() {
+ErrorKind::NotFound => (),
+_ => return Err(Error::IoError(e)),
+}
+}
+}
+// 4. We now have a list of tasks to cancel, cancel them
 let (task_progress, progress_obj) = AtomicTaskStep::new(tasks_to_cancel.len() as u32);
 progress.update_progress(progress_obj);
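The compaction change above replaces a randomly named temporary file with a constant data.mdb.cpy inside the index directory, which is what lets the cancelation path delete leftovers by name. A minimal sketch of that pattern with the tempfile crate; the function names and error handling here are illustrative, and only the constant name and the persist-then-sync flow come from the diff:

    use std::fs::{remove_file, File};
    use std::io::ErrorKind;
    use std::path::Path;

    use tempfile::{NamedTempFile, TempPath};

    const DATA_MDB_COPY_NAME: &str = "data.mdb.cpy";

    fn compact_into_copy(index_dir: &Path) -> std::io::Result<()> {
        // Deterministic destination: reusing the same name lets a later run
        // find and remove leftovers from an interrupted compaction.
        let dst_path = TempPath::from_path(index_dir.join(DATA_MDB_COPY_NAME));
        let file = File::create(&dst_path)?;
        let file = NamedTempFile::from_parts(file, dst_path);

        // ... copy and compact the LMDB environment into `file` here ...

        // Atomically replace the live data file, then flush it to disk.
        let file = file.persist(index_dir.join("data.mdb")).map_err(|e| e.error)?;
        file.sync_all()
    }

    fn clean_leftover_copy(index_dir: &Path) -> std::io::Result<()> {
        match remove_file(index_dir.join(DATA_MDB_COPY_NAME)) {
            Ok(()) => Ok(()),
            // A missing copy just means no compaction was interrupted here.
            Err(e) if e.kind() == ErrorKind::NotFound => Ok(()),
            Err(e) => Err(e),
        }
    }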

View File

@@ -6,7 +6,7 @@ source: crates/index-scheduler/src/scheduler/test_failure.rs
 []
 ----------------------------------------------------------------------
 ### All Tasks:
-0 {uid: 0, batch_uid: 0, status: succeeded, details: { from: (1, 12, 0), to: (1, 23, 0) }, kind: UpgradeDatabase { from: (1, 12, 0) }}
+0 {uid: 0, batch_uid: 0, status: succeeded, details: { from: (1, 12, 0), to: (1, 24, 0) }, kind: UpgradeDatabase { from: (1, 12, 0) }}
 1 {uid: 1, batch_uid: 1, status: succeeded, details: { primary_key: Some("mouse"), old_new_uid: None, new_index_uid: None }, kind: IndexCreation { index_uid: "catto", primary_key: Some("mouse") }}
 2 {uid: 2, batch_uid: 2, status: succeeded, details: { primary_key: Some("bone"), old_new_uid: None, new_index_uid: None }, kind: IndexCreation { index_uid: "doggo", primary_key: Some("bone") }}
 3 {uid: 3, batch_uid: 3, status: failed, error: ResponseError { code: 200, message: "Index `doggo` already exists.", error_code: "index_already_exists", error_type: "invalid_request", error_link: "https://docs.meilisearch.com/errors#index_already_exists" }, details: { primary_key: Some("bone"), old_new_uid: None, new_index_uid: None }, kind: IndexCreation { index_uid: "doggo", primary_key: Some("bone") }}
@@ -57,7 +57,7 @@ girafo: { number_of_documents: 0, field_distribution: {} }
 [timestamp] [4,]
 ----------------------------------------------------------------------
 ### All Batches:
-0 {uid: 0, details: {"upgradeFrom":"v1.12.0","upgradeTo":"v1.23.0"}, stats: {"totalNbTasks":1,"status":{"succeeded":1},"types":{"upgradeDatabase":1},"indexUids":{}}, stop reason: "stopped after the last task of type `upgradeDatabase` because they cannot be batched with tasks of any other type.", }
+0 {uid: 0, details: {"upgradeFrom":"v1.12.0","upgradeTo":"v1.24.0"}, stats: {"totalNbTasks":1,"status":{"succeeded":1},"types":{"upgradeDatabase":1},"indexUids":{}}, stop reason: "stopped after the last task of type `upgradeDatabase` because they cannot be batched with tasks of any other type.", }
 1 {uid: 1, details: {"primaryKey":"mouse"}, stats: {"totalNbTasks":1,"status":{"succeeded":1},"types":{"indexCreation":1},"indexUids":{"catto":1}}, stop reason: "created batch containing only task with id 1 of type `indexCreation` that cannot be batched with any other task.", }
 2 {uid: 2, details: {"primaryKey":"bone"}, stats: {"totalNbTasks":1,"status":{"succeeded":1},"types":{"indexCreation":1},"indexUids":{"doggo":1}}, stop reason: "created batch containing only task with id 2 of type `indexCreation` that cannot be batched with any other task.", }
 3 {uid: 3, details: {"primaryKey":"bone"}, stats: {"totalNbTasks":1,"status":{"failed":1},"types":{"indexCreation":1},"indexUids":{"doggo":1}}, stop reason: "created batch containing only task with id 3 of type `indexCreation` that cannot be batched with any other task.", }

View File

@@ -6,7 +6,7 @@ source: crates/index-scheduler/src/scheduler/test_failure.rs
 []
 ----------------------------------------------------------------------
 ### All Tasks:
-0 {uid: 0, status: enqueued, details: { from: (1, 12, 0), to: (1, 23, 0) }, kind: UpgradeDatabase { from: (1, 12, 0) }}
+0 {uid: 0, status: enqueued, details: { from: (1, 12, 0), to: (1, 24, 0) }, kind: UpgradeDatabase { from: (1, 12, 0) }}
 ----------------------------------------------------------------------
 ### Status:
 enqueued [0,]

View File

@@ -6,7 +6,7 @@ source: crates/index-scheduler/src/scheduler/test_failure.rs
 []
 ----------------------------------------------------------------------
 ### All Tasks:
-0 {uid: 0, status: enqueued, details: { from: (1, 12, 0), to: (1, 23, 0) }, kind: UpgradeDatabase { from: (1, 12, 0) }}
+0 {uid: 0, status: enqueued, details: { from: (1, 12, 0), to: (1, 24, 0) }, kind: UpgradeDatabase { from: (1, 12, 0) }}
 1 {uid: 1, status: enqueued, details: { primary_key: Some("mouse"), old_new_uid: None, new_index_uid: None }, kind: IndexCreation { index_uid: "catto", primary_key: Some("mouse") }}
 ----------------------------------------------------------------------
 ### Status:

View File

@@ -6,7 +6,7 @@ source: crates/index-scheduler/src/scheduler/test_failure.rs
 []
 ----------------------------------------------------------------------
 ### All Tasks:
-0 {uid: 0, batch_uid: 0, status: failed, error: ResponseError { code: 200, message: "Planned failure for tests.", error_code: "internal", error_type: "internal", error_link: "https://docs.meilisearch.com/errors#internal" }, details: { from: (1, 12, 0), to: (1, 23, 0) }, kind: UpgradeDatabase { from: (1, 12, 0) }}
+0 {uid: 0, batch_uid: 0, status: failed, error: ResponseError { code: 200, message: "Planned failure for tests.", error_code: "internal", error_type: "internal", error_link: "https://docs.meilisearch.com/errors#internal" }, details: { from: (1, 12, 0), to: (1, 24, 0) }, kind: UpgradeDatabase { from: (1, 12, 0) }}
 1 {uid: 1, status: enqueued, details: { primary_key: Some("mouse"), old_new_uid: None, new_index_uid: None }, kind: IndexCreation { index_uid: "catto", primary_key: Some("mouse") }}
 ----------------------------------------------------------------------
 ### Status:
@@ -37,7 +37,7 @@ catto [1,]
 [timestamp] [0,]
 ----------------------------------------------------------------------
 ### All Batches:
-0 {uid: 0, details: {"upgradeFrom":"v1.12.0","upgradeTo":"v1.23.0"}, stats: {"totalNbTasks":1,"status":{"failed":1},"types":{"upgradeDatabase":1},"indexUids":{}}, stop reason: "stopped after the last task of type `upgradeDatabase` because they cannot be batched with tasks of any other type.", }
+0 {uid: 0, details: {"upgradeFrom":"v1.12.0","upgradeTo":"v1.24.0"}, stats: {"totalNbTasks":1,"status":{"failed":1},"types":{"upgradeDatabase":1},"indexUids":{}}, stop reason: "stopped after the last task of type `upgradeDatabase` because they cannot be batched with tasks of any other type.", }
 ----------------------------------------------------------------------
 ### Batch to tasks mapping:
 0 [0,]

View File

@@ -6,7 +6,7 @@ source: crates/index-scheduler/src/scheduler/test_failure.rs
 []
 ----------------------------------------------------------------------
 ### All Tasks:
-0 {uid: 0, batch_uid: 0, status: failed, error: ResponseError { code: 200, message: "Planned failure for tests.", error_code: "internal", error_type: "internal", error_link: "https://docs.meilisearch.com/errors#internal" }, details: { from: (1, 12, 0), to: (1, 23, 0) }, kind: UpgradeDatabase { from: (1, 12, 0) }}
+0 {uid: 0, batch_uid: 0, status: failed, error: ResponseError { code: 200, message: "Planned failure for tests.", error_code: "internal", error_type: "internal", error_link: "https://docs.meilisearch.com/errors#internal" }, details: { from: (1, 12, 0), to: (1, 24, 0) }, kind: UpgradeDatabase { from: (1, 12, 0) }}
 1 {uid: 1, status: enqueued, details: { primary_key: Some("mouse"), old_new_uid: None, new_index_uid: None }, kind: IndexCreation { index_uid: "catto", primary_key: Some("mouse") }}
 2 {uid: 2, status: enqueued, details: { primary_key: Some("bone"), old_new_uid: None, new_index_uid: None }, kind: IndexCreation { index_uid: "doggo", primary_key: Some("bone") }}
 ----------------------------------------------------------------------
@@ -40,7 +40,7 @@ doggo [2,]
 [timestamp] [0,]
 ----------------------------------------------------------------------
 ### All Batches:
-0 {uid: 0, details: {"upgradeFrom":"v1.12.0","upgradeTo":"v1.23.0"}, stats: {"totalNbTasks":1,"status":{"failed":1},"types":{"upgradeDatabase":1},"indexUids":{}}, stop reason: "stopped after the last task of type `upgradeDatabase` because they cannot be batched with tasks of any other type.", }
+0 {uid: 0, details: {"upgradeFrom":"v1.12.0","upgradeTo":"v1.24.0"}, stats: {"totalNbTasks":1,"status":{"failed":1},"types":{"upgradeDatabase":1},"indexUids":{}}, stop reason: "stopped after the last task of type `upgradeDatabase` because they cannot be batched with tasks of any other type.", }
 ----------------------------------------------------------------------
 ### Batch to tasks mapping:
 0 [0,]

View File

@@ -6,7 +6,7 @@ source: crates/index-scheduler/src/scheduler/test_failure.rs
 []
 ----------------------------------------------------------------------
 ### All Tasks:
-0 {uid: 0, batch_uid: 0, status: succeeded, details: { from: (1, 12, 0), to: (1, 23, 0) }, kind: UpgradeDatabase { from: (1, 12, 0) }}
+0 {uid: 0, batch_uid: 0, status: succeeded, details: { from: (1, 12, 0), to: (1, 24, 0) }, kind: UpgradeDatabase { from: (1, 12, 0) }}
 1 {uid: 1, status: enqueued, details: { primary_key: Some("mouse"), old_new_uid: None, new_index_uid: None }, kind: IndexCreation { index_uid: "catto", primary_key: Some("mouse") }}
 2 {uid: 2, status: enqueued, details: { primary_key: Some("bone"), old_new_uid: None, new_index_uid: None }, kind: IndexCreation { index_uid: "doggo", primary_key: Some("bone") }}
 3 {uid: 3, status: enqueued, details: { primary_key: Some("bone"), old_new_uid: None, new_index_uid: None }, kind: IndexCreation { index_uid: "doggo", primary_key: Some("bone") }}
@@ -43,7 +43,7 @@ doggo [2,3,]
 [timestamp] [0,]
 ----------------------------------------------------------------------
 ### All Batches:
-0 {uid: 0, details: {"upgradeFrom":"v1.12.0","upgradeTo":"v1.23.0"}, stats: {"totalNbTasks":1,"status":{"succeeded":1},"types":{"upgradeDatabase":1},"indexUids":{}}, stop reason: "stopped after the last task of type `upgradeDatabase` because they cannot be batched with tasks of any other type.", }
+0 {uid: 0, details: {"upgradeFrom":"v1.12.0","upgradeTo":"v1.24.0"}, stats: {"totalNbTasks":1,"status":{"succeeded":1},"types":{"upgradeDatabase":1},"indexUids":{}}, stop reason: "stopped after the last task of type `upgradeDatabase` because they cannot be batched with tasks of any other type.", }
 ----------------------------------------------------------------------
 ### Batch to tasks mapping:
 0 [0,]

View File

@@ -47,6 +47,7 @@ pub fn upgrade_index_scheduler(
 (1, 21, _) => 0,
 (1, 22, _) => 0,
 (1, 23, _) => 0,
+(1, 24, _) => 0,
 (major, minor, patch) => {
 if major > current_major
 || (major == current_major && minor > current_minor)

View File

@@ -22,11 +22,12 @@ use crate::extractors::authentication::GuardedData;
 use crate::extractors::sequential_extractor::SeqHandler;
 use crate::metrics::MEILISEARCH_DEGRADED_SEARCH_REQUESTS;
 use crate::routes::indexes::search_analytics::{SearchAggregator, SearchGET, SearchPOST};
+use crate::routes::parse_include_metadata_header;
 use crate::search::{
 add_search_rules, perform_search, HybridQuery, MatchingStrategy, RankingScoreThreshold,
-RetrieveVectors, SearchKind, SearchQuery, SearchResult, SemanticRatio, DEFAULT_CROP_LENGTH,
-DEFAULT_CROP_MARKER, DEFAULT_HIGHLIGHT_POST_TAG, DEFAULT_HIGHLIGHT_PRE_TAG,
-DEFAULT_SEARCH_LIMIT, DEFAULT_SEARCH_OFFSET, DEFAULT_SEMANTIC_RATIO,
+RetrieveVectors, SearchKind, SearchParams, SearchQuery, SearchResult, SemanticRatio,
+DEFAULT_CROP_LENGTH, DEFAULT_CROP_MARKER, DEFAULT_HIGHLIGHT_POST_TAG,
+DEFAULT_HIGHLIGHT_PRE_TAG, DEFAULT_SEARCH_LIMIT, DEFAULT_SEARCH_OFFSET, DEFAULT_SEMANTIC_RATIO,
 };
 use crate::search_queue::SearchQueue;
@@ -345,15 +346,20 @@ pub async fn search_with_url_query(
 search_kind(&query, index_scheduler.get_ref(), index_uid.to_string(), &index)?;
 let retrieve_vector = RetrieveVectors::new(query.retrieve_vectors);
 let permit = search_queue.try_get_search_permit().await?;
+let include_metadata = parse_include_metadata_header(&req);
 let search_result = tokio::task::spawn_blocking(move || {
 perform_search(
-index_uid.to_string(),
+SearchParams {
+index_uid: index_uid.to_string(),
+query,
+search_kind,
+retrieve_vectors: retrieve_vector,
+features: index_scheduler.features(),
+request_uid,
+include_metadata,
+},
 &index,
-query,
-search_kind,
-retrieve_vector,
-index_scheduler.features(),
-request_uid,
 )
 })
 .await;
@@ -453,16 +459,21 @@ pub async fn search_with_post(
 search_kind(&query, index_scheduler.get_ref(), index_uid.to_string(), &index)?;
 let retrieve_vectors = RetrieveVectors::new(query.retrieve_vectors);
+let include_metadata = parse_include_metadata_header(&req);
 let permit = search_queue.try_get_search_permit().await?;
 let search_result = tokio::task::spawn_blocking(move || {
 perform_search(
-index_uid.to_string(),
+SearchParams {
+index_uid: index_uid.to_string(),
+query,
+search_kind,
+retrieve_vectors,
+features: index_scheduler.features(),
+request_uid,
+include_metadata,
+},
 &index,
-query,
-search_kind,
-retrieve_vectors,
-index_scheduler.features(),
-request_uid,
 )
 })
 .await;
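Both handlers above now pass one SearchParams value instead of six positional arguments. The struct itself is defined in the search module and is not shown in this compare; a rough sketch of the shape implied by the call sites, where the placeholder types stand in for the real Meilisearch types of the same names and the exact field types are assumptions:

    use uuid::Uuid;

    // Placeholders for the real types used at the call sites above.
    struct SearchQuery;
    struct SearchKind;
    struct RetrieveVectors;
    struct RoFeatures;

    // Field names match the diff; the concrete types are assumed.
    struct SearchParams {
        index_uid: String,
        query: SearchQuery,
        search_kind: SearchKind,
        retrieve_vectors: RetrieveVectors,
        features: RoFeatures,
        request_uid: Uuid,
        include_metadata: bool,
    }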

View File

@@ -235,6 +235,7 @@ impl<Method: AggregateMethod> SearchAggregator<Method> {
 degraded,
 used_negative_operator,
 request_uid: _,
+metadata: _,
 } = result;
 self.total_succeeded = self.total_succeeded.saturating_add(1);
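The metadata: _ field ignored by the analytics aggregator above is the per-query metadata described in the search-metadata commits earlier in this compare (queryUid, indexUid, primaryKey, remote, with queryUid generated as a UUID v7). A hedged sketch of the response-side struct those commit messages imply; the serde attributes, field types, and sample values are assumptions:

    use serde::Serialize;
    use uuid::Uuid;

    // Field names come from the commit messages; primaryKey and remote stay null
    // for local queries or when no primary key is known.
    #[derive(Debug, Serialize)]
    #[serde(rename_all = "camelCase")]
    struct SearchMetadata {
        query_uid: Uuid,
        index_uid: String,
        primary_key: Option<String>,
        remote: Option<String>,
    }

    fn example() -> SearchMetadata {
        SearchMetadata {
            // UUID v7 keeps query identifiers time-ordered, as the commits note.
            query_uid: Uuid::now_v7(),
            index_uid: "movies".to_string(),     // illustrative index name
            primary_key: Some("id".to_string()), // illustrative primary key
            remote: None,
        }
    }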

View File

@@ -45,6 +45,7 @@ use crate::routes::webhooks::{WebhookResults, WebhookSettings, WebhookWithMetada
 use crate::search::{
 FederatedSearch, FederatedSearchResult, Federation, FederationOptions, MergeFacets,
 SearchQueryWithIndex, SearchResultWithIndex, SimilarQuery, SimilarResult,
+INCLUDE_METADATA_HEADER,
 };
 use crate::search_queue::SearchQueue;
 use crate::Opt;
@@ -184,6 +185,18 @@ pub fn is_dry_run(req: &HttpRequest, opt: &Opt) -> Result<bool, ResponseError> {
 .is_some_and(|s| s.to_lowercase() == "true"))
 }
+/// Parse the `Meili-Include-Metadata` header from an HTTP request.
+///
+/// Returns `true` if the header is present and set to "true" or "1" (case-insensitive).
+/// Returns `false` if the header is not present or has any other value.
+pub fn parse_include_metadata_header(req: &HttpRequest) -> bool {
+req.headers()
+.get(INCLUDE_METADATA_HEADER)
+.and_then(|h| h.to_str().ok())
+.map(|v| matches!(v.to_lowercase().as_str(), "true" | "1"))
+.unwrap_or(false)
+}
 #[derive(Debug, Serialize, Deserialize, ToSchema)]
 #[serde(rename_all = "camelCase")]
 pub struct SummarizedTaskView {
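A short usage sketch for the parser above, built with actix-web's test-request helpers; the header name matches the Meili-Include-Metadata header these commits introduce, while the test scaffolding itself is only illustrative:

    use actix_web::test::TestRequest;

    // Assumes parse_include_metadata_header (defined above) is in scope.
    fn demo() {
        let opted_in = TestRequest::default()
            .insert_header(("Meili-Include-Metadata", "TRUE"))
            .to_http_request();
        assert!(parse_include_metadata_header(&opted_in));

        let numeric = TestRequest::default()
            .insert_header(("Meili-Include-Metadata", "1"))
            .to_http_request();
        assert!(parse_include_metadata_header(&numeric));

        // Absent, "false", or any other value leaves metadata out of the response.
        let absent = TestRequest::default().to_http_request();
        assert!(!parse_include_metadata_header(&absent));
    }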

View File

@@ -18,10 +18,11 @@ use crate::extractors::authentication::policies::ActionPolicy;
 use crate::extractors::authentication::{AuthenticationError, GuardedData};
 use crate::extractors::sequential_extractor::SeqHandler;
 use crate::routes::indexes::search::search_kind;
+use crate::routes::parse_include_metadata_header;
 use crate::search::{
 add_search_rules, perform_federated_search, perform_search, FederatedSearch,
-FederatedSearchResult, RetrieveVectors, SearchQueryWithIndex, SearchResultWithIndex,
-PROXY_SEARCH_HEADER, PROXY_SEARCH_HEADER_VALUE,
+FederatedSearchResult, RetrieveVectors, SearchParams, SearchQueryWithIndex,
+SearchResultWithIndex, PROXY_SEARCH_HEADER, PROXY_SEARCH_HEADER_VALUE,
 };
 use crate::search_queue::SearchQueue;
@@ -188,6 +189,7 @@ pub async fn multi_search_with_post(
 err
 })?;
+let include_metadata = parse_include_metadata_header(&req);
 let response = match federation {
 Some(federation) => {
 debug!(
@@ -209,6 +211,7 @@ pub async fn multi_search_with_post(
 features,
 is_proxy,
 request_uid,
+include_metadata,
 )
 .await;
 permit.drop().await;
@@ -279,13 +282,16 @@ pub async fn multi_search_with_post(
 let search_result = tokio::task::spawn_blocking(move || {
 perform_search(
-index_uid_str.clone(),
+SearchParams {
+index_uid: index_uid_str.clone(),
+query,
+search_kind,
+retrieve_vectors: retrieve_vector,
+features,
+request_uid,
+include_metadata,
+},
 &index,
-query,
-search_kind,
-retrieve_vector,
-features,
-request_uid,
 )
 })
 .await

View File

@@ -22,7 +22,8 @@ use uuid::Uuid;
use super::super::ranking_rules::{self, RankingRules}; use super::super::ranking_rules::{self, RankingRules};
use super::super::{ use super::super::{
compute_facet_distribution_stats, prepare_search, AttributesFormat, ComputedFacets, HitMaker, compute_facet_distribution_stats, prepare_search, AttributesFormat, ComputedFacets, HitMaker,
HitsInfo, RetrieveVectors, SearchHit, SearchKind, SearchQuery, SearchQueryWithIndex, HitsInfo, RetrieveVectors, SearchHit, SearchKind, SearchMetadata, SearchQuery,
SearchQueryWithIndex,
}; };
use super::proxy::{proxy_search, ProxySearchError, ProxySearchParams}; use super::proxy::{proxy_search, ProxySearchError, ProxySearchParams};
use super::types::{ use super::types::{
@@ -41,6 +42,7 @@ pub async fn perform_federated_search(
features: RoFeatures, features: RoFeatures,
is_proxy: bool, is_proxy: bool,
request_uid: Uuid, request_uid: Uuid,
include_metadata: bool,
) -> Result<FederatedSearchResult, ResponseError> { ) -> Result<FederatedSearchResult, ResponseError> {
if is_proxy { if is_proxy {
features.check_network("Performing a remote federated search")?; features.check_network("Performing a remote federated search")?;
@@ -59,20 +61,38 @@ pub async fn perform_federated_search(
let network = index_scheduler.network(); let network = index_scheduler.network();
// Preconstruct metadata keeping the original queries order for later metadata building
let precomputed_query_metadata: Option<Vec<_>> = include_metadata.then(|| {
queries
.iter()
.map(|q| {
(
q.index_uid.to_string(),
q.federation_options.as_ref().and_then(|o| o.remote.clone()),
)
})
.collect()
});
// this implementation partition the queries by index to guarantee an important property: // this implementation partition the queries by index to guarantee an important property:
// - all the queries to a particular index use the same read transaction. // - all the queries to a particular index use the same read transaction.
// This is an important property, otherwise we cannot guarantee the self-consistency of the results. // This is an important property, otherwise we cannot guarantee the self-consistency of the results.
// 1. partition queries by host and index // 1. partition queries by host and index
let mut partitioned_queries = PartitionedQueries::new(); let mut partitioned_queries = PartitionedQueries::new();
for (query_index, federated_query) in queries.into_iter().enumerate() { for (query_index, federated_query) in queries.into_iter().enumerate() {
partitioned_queries.partition(federated_query, query_index, &network, features)? partitioned_queries.partition(federated_query, query_index, &network, features)?
} }
// 2. perform queries, merge and make hits index by index // 2. perform queries, merge and make hits index by index
// 2.1. start remote queries // 2.1. start remote queries
let remote_search = let remote_search = RemoteSearch::start(
RemoteSearch::start(partitioned_queries.remote_queries_by_host, &federation, deadline); partitioned_queries.remote_queries_by_host,
&federation,
deadline,
include_metadata,
);
// 2.2. concurrently execute local queries // 2.2. concurrently execute local queries
let params = SearchByIndexParams { let params = SearchByIndexParams {
@@ -114,11 +134,25 @@ pub async fn perform_federated_search(
    let after_waiting_remote_results = std::time::Instant::now();

    // 3. merge hits and metadata across indexes and hosts
-   // 3.1. merge metadata
+   // 3.1. Build metadata in the same order as the original queries
+   let query_metadata = precomputed_query_metadata.map(|precomputed_query_metadata| {
+       // If a remote is present, set the local remote name
+       let local_remote_name = network.local.clone().filter(|_| partitioned_queries.has_remote);
+       build_query_metadata(
+           precomputed_query_metadata,
+           local_remote_name,
+           &remote_results,
+           &results_by_index,
+       )
+   });
+
+   // 3.2. merge federation metadata
    let (estimated_total_hits, degraded, used_negative_operator, facets, max_remote_duration) =
        merge_metadata(&mut results_by_index, &remote_results);

-   // 3.2. merge hits
+   // 3.3. merge hits
    let merged_hits: Vec<_> = merge_index_global_results(results_by_index, &mut remote_results)
        .skip(federation.offset)
        .take(federation.limit)
@@ -133,7 +167,7 @@ pub async fn perform_federated_search(
        .map(|hit| hit.hit())
        .collect();

-   // 3.3. merge query vectors
+   // 3.4. merge query vectors
    let query_vectors = if retrieve_vectors {
        for remote_results in remote_results.iter_mut() {
            if let Some(remote_vectors) = remote_results.query_vectors.take() {
@@ -152,7 +186,7 @@ pub async fn perform_federated_search(
        None
    };

-   // 3.4. merge facets
+   // 3.5. merge facets
    let (facet_distribution, facet_stats, facets_by_index) =
        facet_order.merge(federation.merge_facets, remote_results, facets);
@@ -179,6 +213,7 @@ pub async fn perform_federated_search(
        facets_by_index,
        remote_errors: partitioned_queries.has_remote.then_some(remote_errors),
        request_uid: Some(request_uid),
+       metadata: query_metadata,
    })
}
@@ -402,6 +437,7 @@ struct SearchHitByIndex {
struct SearchResultByIndex {
    index: String,
+   primary_key: Option<String>,
    hits: Vec<SearchHitByIndex>,
    estimated_total_hits: usize,
    degraded: bool,
@@ -409,6 +445,61 @@ struct SearchResultByIndex {
    facets: Option<ComputedFacets>,
}
/// Builds query metadata for federated search results.
///
/// This function creates metadata for each query in the same order as the original queries,
/// combining information from both local and remote search results. It handles the mapping
/// of primary keys to their respective indexes and remotes to prevent collisions when
/// multiple remotes have the same index_uid but different primary keys.
fn build_query_metadata(
precomputed_query_metadata: Vec<(String, Option<String>)>,
local_remote_name: Option<String>,
remote_results: &[FederatedSearchResult],
results_by_index: &[SearchResultByIndex],
) -> Vec<SearchMetadata> {
// Create a map of (remote, index_uid) -> primary_key for quick lookup
// This prevents collisions when multiple remotes have the same index_uid but different primary keys
let mut primary_key_per_index = std::collections::HashMap::new();
// Build metadata for remote results
for remote_result in remote_results {
if let Some(remote_metadata) = &remote_result.metadata {
for remote_meta in remote_metadata {
if let SearchMetadata {
remote: Some(remote_name),
index_uid,
primary_key: Some(primary_key),
..
} = remote_meta
{
let key = (Some(remote_name), index_uid);
primary_key_per_index.insert(key, primary_key);
}
}
}
}
// Build metadata for local results
for local_meta in results_by_index {
if let SearchResultByIndex { index, primary_key: Some(primary_key), .. } = local_meta {
let key = (None, index);
primary_key_per_index.insert(key, primary_key);
}
}
// Build metadata in the same order as the original queries
let mut query_metadata = Vec::new();
for (index_uid, remote) in precomputed_query_metadata {
let primary_key =
primary_key_per_index.get(&(remote.as_ref(), &index_uid)).map(|pk| pk.to_string());
let query_uid = Uuid::now_v7();
// if the remote is not set, use the local remote name
let remote = remote.or_else(|| local_remote_name.clone());
query_metadata.push(SearchMetadata { query_uid, primary_key, index_uid, remote });
}
query_metadata
}
fn merge_metadata(
    results_by_index: &mut Vec<SearchResultByIndex>,
    remote_results: &Vec<FederatedSearchResult>,
@@ -420,6 +511,7 @@ fn merge_metadata(
    let mut max_remote_duration = Duration::ZERO;

    for SearchResultByIndex {
        index,
+       primary_key: _,
        hits: _,
        estimated_total_hits: estimated_total_hits_by_index,
        facets: facets_by_index,
@@ -448,6 +540,7 @@ fn merge_metadata(
        degraded: degraded_for_host,
        used_negative_operator: host_used_negative_operator,
        remote_errors: _,
+       metadata: _,
        request_uid: _,
    } in remote_results
    {
@@ -576,7 +669,12 @@ struct RemoteSearch {
}

impl RemoteSearch {
-   fn start(queries: RemoteQueriesByHost, federation: &Federation, deadline: Instant) -> Self {
+   fn start(
+       queries: RemoteQueriesByHost,
+       federation: &Federation,
+       deadline: Instant,
+       include_metadata: bool,
+   ) -> Self {
        let mut in_flight_remote_queries = BTreeMap::new();
        let client = reqwest::ClientBuilder::new()
            .connect_timeout(std::time::Duration::from_millis(200))
@@ -596,7 +694,10 @@ impl RemoteSearch {
                    // never merge distant facets
                    proxy_federation.merge_facets = None;
                    let params = params.clone();
-                   async move { proxy_search(&node, queries, proxy_federation, &params).await }
+                   async move {
+                       proxy_search(&node, queries, proxy_federation, &params, include_metadata)
+                           .await
+                   }
                }),
            );
        }
@@ -640,6 +741,13 @@ impl RemoteSearch {
                        continue 'remote_queries;
                    }

+                   // Add remote name to metadata
+                   if let Some(metadata) = res.metadata.as_mut() {
+                       for meta in metadata {
+                           meta.remote = Some(node_name.clone());
+                       }
+                   }
+
                    federation.insert(
                        FEDERATION_REMOTE.to_string(),
                        serde_json::Value::String(node_name.clone()),
@@ -735,6 +843,7 @@ impl SearchByIndex {
            }
        };
        let rtxn = index.read_txn()?;
+       let primary_key = index.primary_key(&rtxn)?.map(|pk| pk.to_string());
        let criteria = index.criteria(&rtxn)?;
        let dictionary = index.dictionary(&rtxn)?;
        let dictionary: Option<Vec<_>> =
@@ -761,6 +870,12 @@ impl SearchByIndex {
            return Err(error);
        }
        let mut results_by_query = Vec::with_capacity(queries.len());

+       // all queries for an index share the same budget
+       let time_budget = match cutoff {
+           Some(cutoff) => TimeBudget::new(Duration::from_millis(cutoff)),
+           None => TimeBudget::default(),
+       };
+
        for QueryByIndex { query, weight, query_index } in queries {
            // use an immediately invoked lambda to capture the result without returning from the function
@@ -830,17 +945,13 @@ impl SearchByIndex {
                let retrieve_vectors = RetrieveVectors::new(query.retrieve_vectors);

-               let time_budget = match cutoff {
-                   Some(cutoff) => TimeBudget::new(Duration::from_millis(cutoff)),
-                   None => TimeBudget::default(),
-               };
-
                let (mut search, _is_finite_pagination, _max_total_hits, _offset) = prepare_search(
                    &index,
                    &rtxn,
                    &query,
                    &search_kind,
-                   time_budget,
+                   // clones of `TimeBudget` share the budget rather than restart it
+                   time_budget.clone(),
                    params.features,
                )?;
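Since this hunk removes the per-query `TimeBudget` and the previous hunk builds a single budget per index, every query against the same index now counts against one shared cutoff; the `time_budget.clone()` handed to `prepare_search` continues the same countdown rather than restarting it. As a minimal illustrative sketch only (not Meilisearch's actual `TimeBudget`, which also offers `max()` and `with_stop_after`), cloning a deadline-based budget preserves the deadline:

use std::time::{Duration, Instant};

// Hypothetical stand-in for the real TimeBudget, for illustration only.
#[derive(Clone)]
struct SharedTimeBudget {
    deadline: Instant,
}

impl SharedTimeBudget {
    fn new(budget: Duration) -> Self {
        Self { deadline: Instant::now() + budget }
    }

    fn exceeded(&self) -> bool {
        Instant::now() >= self.deadline
    }
}

fn main() {
    // One budget for all queries of a given index...
    let budget = SharedTimeBudget::new(Duration::from_millis(150));
    // ...each query receives a clone, which shares the deadline instead of restarting it.
    let per_query = budget.clone();
    assert_eq!(budget.exceeded(), per_query.exceeded());
}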
@@ -987,6 +1098,7 @@ impl SearchByIndex {
        })?;
        self.results_by_index.push(SearchResultByIndex {
            index: index_uid,
+           primary_key,
            hits: merged_result,
            estimated_total_hits,
            degraded,

View File

@@ -7,7 +7,7 @@ use serde::de::DeserializeOwned;
use serde_json::Value;

use super::types::{FederatedSearch, FederatedSearchResult, Federation};
-use crate::search::SearchQueryWithIndex;
+use crate::search::{SearchQueryWithIndex, INCLUDE_METADATA_HEADER};

pub const PROXY_SEARCH_HEADER: &str = "Meili-Proxy-Search";
pub const PROXY_SEARCH_HEADER_VALUE: &str = "true";
@@ -98,6 +98,7 @@ pub async fn proxy_search(
    queries: Vec<SearchQueryWithIndex>,
    federation: Federation,
    params: &ProxySearchParams,
+   include_metadata: bool,
) -> Result<FederatedSearchResult, ProxySearchError> {
    let url = format!("{}/multi-search", node.url);
@@ -119,7 +120,16 @@ pub async fn proxy_search(
    };

    for i in 0..params.try_count {
-       match try_proxy_search(&url, search_api_key, &federated, &params.client, deadline).await {
+       match try_proxy_search(
+           &url,
+           search_api_key,
+           &federated,
+           &params.client,
+           deadline,
+           include_metadata,
+       )
+       .await
+       {
            Ok(response) => return Ok(response),
            Err(retry) => {
                let duration = retry.into_duration(i)?;
@@ -127,7 +137,7 @@ pub async fn proxy_search(
            }
        }
    }
-   try_proxy_search(&url, search_api_key, &federated, &params.client, deadline)
+   try_proxy_search(&url, search_api_key, &federated, &params.client, deadline, include_metadata)
        .await
        .map_err(Retry::into_error)
}
@@ -138,6 +148,7 @@ async fn try_proxy_search(
    federated: &FederatedSearch,
    client: &Client,
    deadline: std::time::Instant,
+   include_metadata: bool,
) -> Result<FederatedSearchResult, Retry> {
    let timeout = deadline.saturating_duration_since(std::time::Instant::now());
@@ -148,6 +159,8 @@ async fn try_proxy_search(
        request
    };
    let request = request.header(PROXY_SEARCH_HEADER, PROXY_SEARCH_HEADER_VALUE);
+   let request =
+       if include_metadata { request.header(INCLUDE_METADATA_HEADER, "true") } else { request };
    let response = request.send().await;
    let response = match response {

View File

@@ -18,6 +18,8 @@ use serde::{Deserialize, Serialize};
use utoipa::ToSchema;
use uuid::Uuid;

+use crate::search::SearchMetadata;
+
use super::super::{ComputedFacets, FacetStats, HitsInfo, SearchHit, SearchQueryWithIndex};
use crate::milli::vector::Embedding;
@@ -134,6 +136,8 @@ pub struct FederatedSearchResult {
    pub facets_by_index: FederatedFacets,
    #[serde(default, skip_serializing_if = "Option::is_none")]
    pub request_uid: Option<Uuid>,
+   #[serde(default, skip_serializing_if = "Option::is_none")]
+   pub metadata: Option<Vec<SearchMetadata>>,
    #[serde(default, skip_serializing_if = "Option::is_none")]
    pub remote_errors: Option<BTreeMap<String, ResponseError>>,
@@ -160,6 +164,7 @@ impl fmt::Debug for FederatedSearchResult {
            facets_by_index,
            remote_errors,
            request_uid,
+           metadata,
        } = self;

        let mut debug = f.debug_struct("SearchResult");
@@ -195,6 +200,9 @@ impl fmt::Debug for FederatedSearchResult {
        if let Some(request_uid) = request_uid {
            debug.field("request_uid", &request_uid);
        }
+       if let Some(metadata) = metadata {
+           debug.field("metadata", &metadata);
+       }

        debug.finish()
    }

View File

@@ -57,6 +57,7 @@ pub const DEFAULT_CROP_MARKER: fn() -> String = || "…".to_string();
pub const DEFAULT_HIGHLIGHT_PRE_TAG: fn() -> String = || "<em>".to_string();
pub const DEFAULT_HIGHLIGHT_POST_TAG: fn() -> String = || "</em>".to_string();
pub const DEFAULT_SEMANTIC_RATIO: fn() -> SemanticRatio = || SemanticRatio(0.5);
+pub const INCLUDE_METADATA_HEADER: &str = "Meili-Include-Metadata";

#[derive(Clone, Default, PartialEq, Deserr, ToSchema)]
#[deserr(error = DeserrJsonError, rename_all = camelCase, deny_unknown_fields)]
@@ -836,6 +837,18 @@ pub struct SearchHit {
    pub ranking_score_details: Option<serde_json::Map<String, serde_json::Value>>,
}
#[derive(Serialize, Deserialize, Debug, Clone, PartialEq, ToSchema)]
#[serde(rename_all = "camelCase")]
#[schema(rename_all = "camelCase")]
pub struct SearchMetadata {
pub query_uid: Uuid,
pub index_uid: String,
#[serde(skip_serializing_if = "Option::is_none")]
pub primary_key: Option<String>,
#[serde(skip_serializing_if = "Option::is_none")]
pub remote: Option<String>,
}
#[derive(Serialize, Clone, PartialEq, ToSchema)]
#[serde(rename_all = "camelCase")]
#[schema(rename_all = "camelCase")]
@@ -854,6 +867,8 @@ pub struct SearchResult {
    pub facet_stats: Option<BTreeMap<String, FacetStats>>,
    #[serde(skip_serializing_if = "Option::is_none")]
    pub request_uid: Option<Uuid>,
+   #[serde(skip_serializing_if = "Option::is_none")]
+   pub metadata: Option<SearchMetadata>,
    #[serde(skip_serializing_if = "Option::is_none")]
    pub semantic_hit_count: Option<u32>,
@@ -876,6 +891,7 @@ impl fmt::Debug for SearchResult {
            facet_distribution,
            facet_stats,
            request_uid,
+           metadata,
            semantic_hit_count,
            degraded,
            used_negative_operator,
@@ -908,6 +924,9 @@ impl fmt::Debug for SearchResult {
        if let Some(request_uid) = request_uid {
            debug.field("request_uid", &request_uid);
        }
+       if let Some(metadata) = metadata {
+           debug.field("metadata", &metadata);
+       }
        debug.finish()
    }
@@ -1120,16 +1139,28 @@ pub fn prepare_search<'t>(
    Ok((search, is_finite_pagination, max_total_hits, offset))
}

-pub fn perform_search(
-    index_uid: String,
-    index: &Index,
-    query: SearchQuery,
-    search_kind: SearchKind,
-    retrieve_vectors: RetrieveVectors,
-    features: RoFeatures,
-    request_uid: Uuid,
-) -> Result<SearchResult, ResponseError> {
+pub struct SearchParams {
+    pub index_uid: String,
+    pub query: SearchQuery,
+    pub search_kind: SearchKind,
+    pub retrieve_vectors: RetrieveVectors,
+    pub features: RoFeatures,
+    pub request_uid: Uuid,
+    pub include_metadata: bool,
+}
+
+pub fn perform_search(params: SearchParams, index: &Index) -> Result<SearchResult, ResponseError> {
+    let SearchParams {
+        index_uid,
+        query,
+        search_kind,
+        retrieve_vectors,
+        features,
+        request_uid,
+        include_metadata,
+    } = params;
    let before_search = Instant::now();
+   let index_uid_for_metadata = index_uid.clone();
    let rtxn = index.read_txn()?;

    let time_budget = match index.search_cutoff(&rtxn)? {
        Some(cutoff) => TimeBudget::new(Duration::from_millis(cutoff)),
@@ -1150,7 +1181,20 @@ pub fn perform_search(
            query_vector,
        },
        semantic_hit_count,
-   ) = search_from_kind(index_uid, search_kind, search)?;
+   ) = search_from_kind(index_uid.clone(), search_kind, search)?;
+
+   let metadata = if include_metadata {
+       let query_uid = Uuid::now_v7();
+       let primary_key = index.primary_key(&rtxn)?.map(|pk| pk.to_string());
+       Some(SearchMetadata {
+           query_uid,
+           index_uid: index_uid_for_metadata,
+           primary_key,
+           remote: None, // Local searches don't have a remote
+       })
+   } else {
+       None
+   };

    let SearchQuery {
        q,
@@ -1233,7 +1277,6 @@ pub fn perform_search(
        .transpose()?
        .map(|ComputedFacets { distribution, stats }| (distribution, stats))
        .unzip();
    let result = SearchResult {
        hits: documents,
        hits_info,
@@ -1246,6 +1289,7 @@ pub fn perform_search(
        used_negative_operator,
        semantic_hit_count,
        request_uid: Some(request_uid),
+       metadata,
    };
    Ok(result)
}
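With `INCLUDE_METADATA_HEADER` in place, `perform_search` only builds a `SearchMetadata` value (query uid, index uid, primary key, optional remote) when the caller sends the `Meili-Include-Metadata: true` header. A hedged client-side sketch, assuming a reqwest dependency with the `json` feature and a tokio runtime; the host, index name, and API key are placeholders, only the header name and the search route come from this changeset:

use serde_json::json;

#[tokio::main]
async fn main() -> Result<(), reqwest::Error> {
    let client = reqwest::Client::new();
    let response = client
        .post("http://localhost:7700/indexes/movies/search")
        .bearer_auth("MASTER_KEY") // placeholder API key
        .header("Meili-Include-Metadata", "true")
        .json(&json!({ "q": "glass" }))
        .send()
        .await?
        .json::<serde_json::Value>()
        .await?;

    // When the header is set, the response carries a `metadata` object with
    // `queryUid`, `indexUid`, `primaryKey`, and, for remote queries, `remote`.
    println!("{:#?}", response.get("metadata"));
    Ok(())
}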

View File

@@ -516,6 +516,18 @@ impl<State> Index<'_, State> {
        self.service.post_encoded(url, query, self.encoder).await
    }
pub async fn search_with_headers(
&self,
query: Value,
headers: Vec<(&str, &str)>,
) -> (Value, StatusCode) {
let url = format!("/indexes/{}/search", urlencode(self.uid.as_ref()));
let body = serde_json::to_string(&query).unwrap();
let mut all_headers = vec![("content-type", "application/json")];
all_headers.extend(headers);
self.service.post_str(url, body, all_headers).await
}
    pub async fn search_get(&self, query: &str) -> (Value, StatusCode) {
        let url = format!("/indexes/{}/search{}", urlencode(self.uid.as_ref()), query);
        self.service.get(url).await

View File

@@ -390,6 +390,17 @@ impl<State> Server<State> {
        self.service.post("/multi-search", queries).await
    }
pub async fn multi_search_with_headers(
&self,
queries: Value,
headers: Vec<(&str, &str)>,
) -> (Value, StatusCode) {
let body = serde_json::to_string(&queries).unwrap();
let mut all_headers = vec![("content-type", "application/json")];
all_headers.extend(headers);
self.service.post_str("/multi-search", body, all_headers).await
}
    pub async fn list_indexes_raw(&self, parameters: &str) -> (Value, StatusCode) {
        self.service.get(format!("/indexes{parameters}")).await
    }

View File

@@ -0,0 +1,387 @@
use meili_snap::{json_string, snapshot};
use crate::common::{shared_index_with_documents, Server, DOCUMENTS};
use crate::json;
#[actix_rt::test]
async fn search_without_metadata_header() {
let index = shared_index_with_documents().await;
// Test that metadata is not included by default
index
.search(json!({"q": "glass"}), |response, code| {
snapshot!(code, @"200 OK");
snapshot!(json_string!(response, { ".processingTimeMs" => "[duration]", ".requestUid" => "[uuid]" }), @r###"
{
"hits": [
{
"title": "Gläss",
"id": "450465",
"color": [
"blue",
"red"
]
}
],
"query": "glass",
"processingTimeMs": "[duration]",
"limit": 20,
"offset": 0,
"estimatedTotalHits": 1,
"requestUid": "[uuid]"
}
"###);
})
.await;
}
#[actix_rt::test]
async fn search_with_metadata_header() {
let server = Server::new_shared();
let index = server.unique_index();
let documents = DOCUMENTS.clone();
let (task, _code) = index.add_documents(documents, None).await;
server.wait_task(task.uid()).await.succeeded();
// Test with Meili-Include-Metadata header
let (response, code) = index
.search_with_headers(json!({"q": "glass"}), vec![("Meili-Include-Metadata", "true")])
.await;
snapshot!(code, @"200 OK");
snapshot!(json_string!(response, { ".processingTimeMs" => "[duration]", ".requestUid" => "[uuid]", ".metadata.queryUid" => "[uuid]" }), @r###"
{
"hits": [
{
"title": "Gläss",
"id": "450465",
"color": [
"blue",
"red"
]
}
],
"query": "glass",
"processingTimeMs": "[duration]",
"limit": 20,
"offset": 0,
"estimatedTotalHits": 1,
"requestUid": "[uuid]",
"metadata": {
"queryUid": "[uuid]",
"indexUid": "[uuid]",
"primaryKey": "id"
}
}
"###);
}
#[actix_rt::test]
async fn search_with_metadata_header_and_primary_key() {
let server = Server::new_shared();
let index = server.unique_index();
let documents = DOCUMENTS.clone();
let (task, _code) = index.add_documents(documents, Some("id")).await;
server.wait_task(task.uid()).await.succeeded();
// Test with Meili-Include-Metadata header
let (response, code) = index
.search_with_headers(json!({"q": "glass"}), vec![("Meili-Include-Metadata", "true")])
.await;
snapshot!(code, @"200 OK");
snapshot!(json_string!(response, { ".processingTimeMs" => "[duration]", ".requestUid" => "[uuid]", ".metadata.queryUid" => "[uuid]" }), @r###"
{
"hits": [
{
"id": "450465",
"title": "Gläss",
"color": [
"blue",
"red"
]
}
],
"query": "glass",
"processingTimeMs": "[duration]",
"limit": 20,
"offset": 0,
"estimatedTotalHits": 1,
"requestUid": "[uuid]",
"metadata": {
"queryUid": "[uuid]",
"indexUid": "[uuid]",
"primaryKey": "id"
}
}
"###);
}
#[actix_rt::test]
async fn multi_search_without_metadata_header() {
let server = Server::new_shared();
let index = server.unique_index();
let documents = DOCUMENTS.clone();
let (task, _code) = index.add_documents(documents, None).await;
server.wait_task(task.uid()).await.succeeded();
// Test multi-search without metadata header
let (response, code) = server
.multi_search(json!({
"queries": [
{"indexUid": index.uid, "q": "glass"},
{"indexUid": index.uid, "q": "dragon"}
]
}))
.await;
snapshot!(code, @"200 OK");
snapshot!(json_string!(response, { ".results[0].processingTimeMs" => "[duration]", ".results[0].requestUid" => "[uuid]", ".results[1].processingTimeMs" => "[duration]", ".results[1].requestUid" => "[uuid]" }), @r###"
{
"results": [
{
"indexUid": "[uuid]",
"hits": [
{
"title": "Gläss",
"id": "450465",
"color": [
"blue",
"red"
]
}
],
"query": "glass",
"processingTimeMs": "[duration]",
"limit": 20,
"offset": 0,
"estimatedTotalHits": 1,
"requestUid": "[uuid]"
},
{
"indexUid": "[uuid]",
"hits": [
{
"title": "How to Train Your Dragon: The Hidden World",
"id": "166428",
"color": [
"green",
"red"
]
}
],
"query": "dragon",
"processingTimeMs": "[duration]",
"limit": 20,
"offset": 0,
"estimatedTotalHits": 1,
"requestUid": "[uuid]"
}
]
}
"###);
}
#[actix_rt::test]
async fn multi_search_with_metadata_header() {
let server = Server::new_shared();
let index = server.unique_index();
let documents = DOCUMENTS.clone();
let (task, _code) = index.add_documents(documents, Some("id")).await;
server.wait_task(task.uid()).await.succeeded();
// Test multi-search with metadata header
let (response, code) = server
.multi_search_with_headers(
json!({
"queries": [
{"indexUid": index.uid, "q": "glass"},
{"indexUid": index.uid, "q": "dragon"}
]
}),
vec![("Meili-Include-Metadata", "true")],
)
.await;
snapshot!(code, @"200 OK");
snapshot!(json_string!(response, { ".results[0].processingTimeMs" => "[duration]", ".results[0].requestUid" => "[uuid]", ".results[0].metadata.queryUid" => "[uuid]", ".results[1].processingTimeMs" => "[duration]", ".results[1].requestUid" => "[uuid]", ".results[1].metadata.queryUid" => "[uuid]" }), @r###"
{
"results": [
{
"indexUid": "[uuid]",
"hits": [
{
"id": "450465",
"title": "Gläss",
"color": [
"blue",
"red"
]
}
],
"query": "glass",
"processingTimeMs": "[duration]",
"limit": 20,
"offset": 0,
"estimatedTotalHits": 1,
"requestUid": "[uuid]",
"metadata": {
"queryUid": "[uuid]",
"indexUid": "[uuid]",
"primaryKey": "id"
}
},
{
"indexUid": "[uuid]",
"hits": [
{
"id": "166428",
"title": "How to Train Your Dragon: The Hidden World",
"color": [
"green",
"red"
]
}
],
"query": "dragon",
"processingTimeMs": "[duration]",
"limit": 20,
"offset": 0,
"estimatedTotalHits": 1,
"requestUid": "[uuid]",
"metadata": {
"queryUid": "[uuid]",
"indexUid": "[uuid]",
"primaryKey": "id"
}
}
]
}
"###);
}
#[actix_rt::test]
async fn search_metadata_header_false_value() {
let server = Server::new_shared();
let index = server.unique_index();
let documents = DOCUMENTS.clone();
let (task, _code) = index.add_documents(documents, None).await;
server.wait_task(task.uid()).await.succeeded();
// Test with header set to false
let (response, code) = index
.search_with_headers(json!({"q": "glass"}), vec![("Meili-Include-Metadata", "false")])
.await;
snapshot!(code, @"200 OK");
snapshot!(json_string!(response, { ".processingTimeMs" => "[duration]", ".requestUid" => "[uuid]" }), @r###"
{
"hits": [
{
"title": "Gläss",
"id": "450465",
"color": [
"blue",
"red"
]
}
],
"query": "glass",
"processingTimeMs": "[duration]",
"limit": 20,
"offset": 0,
"estimatedTotalHits": 1,
"requestUid": "[uuid]"
}
"###);
}
#[actix_rt::test]
async fn search_metadata_uuid_format() {
let server = Server::new_shared();
let index = server.unique_index();
let documents = DOCUMENTS.clone();
let (task, _code) = index.add_documents(documents, None).await;
server.wait_task(task.uid()).await.succeeded();
let (response, code) = index
.search_with_headers(json!({"q": "glass"}), vec![("Meili-Include-Metadata", "true")])
.await;
snapshot!(code, @"200 OK");
snapshot!(json_string!(response, { ".processingTimeMs" => "[duration]", ".requestUid" => "[uuid]", ".metadata.queryUid" => "[uuid]" }), @r###"
{
"hits": [
{
"title": "Gläss",
"id": "450465",
"color": [
"blue",
"red"
]
}
],
"query": "glass",
"processingTimeMs": "[duration]",
"limit": 20,
"offset": 0,
"estimatedTotalHits": 1,
"requestUid": "[uuid]",
"metadata": {
"queryUid": "[uuid]",
"indexUid": "[uuid]",
"primaryKey": "id"
}
}
"###);
}
#[actix_rt::test]
async fn search_metadata_consistency_across_requests() {
let server = Server::new_shared();
let index = server.unique_index();
let documents = DOCUMENTS.clone();
let (task, _code) = index.add_documents(documents, Some("id")).await;
server.wait_task(task.uid()).await.succeeded();
// Make multiple requests and check that metadata is consistent
for _i in 0..3 {
let (response, code) = index
.search_with_headers(json!({"q": "glass"}), vec![("Meili-Include-Metadata", "true")])
.await;
snapshot!(code, @"200 OK");
snapshot!(json_string!(response, { ".processingTimeMs" => "[duration]", ".requestUid" => "[uuid]", ".metadata.queryUid" => "[uuid]" }), @r###"
{
"hits": [
{
"id": "450465",
"title": "Gläss",
"color": [
"blue",
"red"
]
}
],
"query": "glass",
"processingTimeMs": "[duration]",
"limit": 20,
"offset": 0,
"estimatedTotalHits": 1,
"requestUid": "[uuid]",
"metadata": {
"queryUid": "[uuid]",
"indexUid": "[uuid]",
"primaryKey": "id"
}
}
"###);
}
}

View File

@@ -11,6 +11,7 @@ mod hybrid;
#[cfg(not(feature = "chinese-pinyin"))]
mod locales;
mod matching_strategy;
+mod metadata;
mod multi;
mod pagination;
mod restrict_searchable;

View File

@@ -43,7 +43,7 @@ async fn version_too_old() {
    std::fs::write(db_path.join("VERSION"), "1.11.9999").unwrap();
    let options = Opt { experimental_dumpless_upgrade: true, ..default_settings };
    let err = Server::new_with_options(options).await.map(|_| ()).unwrap_err();
-   snapshot!(err, @"Database version 1.11.9999 is too old for the experimental dumpless upgrade feature. Please generate a dump using the v1.11.9999 and import it in the v1.23.0");
+   snapshot!(err, @"Database version 1.11.9999 is too old for the experimental dumpless upgrade feature. Please generate a dump using the v1.11.9999 and import it in the v1.24.0");
}
#[actix_rt::test] #[actix_rt::test]
@@ -58,7 +58,7 @@ async fn version_requires_downgrade() {
    std::fs::write(db_path.join("VERSION"), format!("{major}.{minor}.{patch}")).unwrap();
    let options = Opt { experimental_dumpless_upgrade: true, ..default_settings };
    let err = Server::new_with_options(options).await.map(|_| ()).unwrap_err();
-   snapshot!(err, @"Database version 1.23.1 is higher than the Meilisearch version 1.23.0. Downgrade is not supported");
+   snapshot!(err, @"Database version 1.24.1 is higher than the Meilisearch version 1.24.0. Downgrade is not supported");
}
#[actix_rt::test] #[actix_rt::test]

View File

@@ -8,7 +8,7 @@ source: crates/meilisearch/tests/upgrade/v1_12/v1_12_0.rs
"progress": null, "progress": null,
"details": { "details": {
"upgradeFrom": "v1.12.0", "upgradeFrom": "v1.12.0",
"upgradeTo": "v1.23.0" "upgradeTo": "v1.24.0"
}, },
"stats": { "stats": {
"totalNbTasks": 1, "totalNbTasks": 1,

View File

@@ -8,7 +8,7 @@ source: crates/meilisearch/tests/upgrade/v1_12/v1_12_0.rs
"progress": null, "progress": null,
"details": { "details": {
"upgradeFrom": "v1.12.0", "upgradeFrom": "v1.12.0",
"upgradeTo": "v1.23.0" "upgradeTo": "v1.24.0"
}, },
"stats": { "stats": {
"totalNbTasks": 1, "totalNbTasks": 1,

View File

@@ -8,7 +8,7 @@ source: crates/meilisearch/tests/upgrade/v1_12/v1_12_0.rs
"progress": null, "progress": null,
"details": { "details": {
"upgradeFrom": "v1.12.0", "upgradeFrom": "v1.12.0",
"upgradeTo": "v1.23.0" "upgradeTo": "v1.24.0"
}, },
"stats": { "stats": {
"totalNbTasks": 1, "totalNbTasks": 1,

View File

@@ -12,7 +12,7 @@ source: crates/meilisearch/tests/upgrade/v1_12/v1_12_0.rs
"canceledBy": null, "canceledBy": null,
"details": { "details": {
"upgradeFrom": "v1.12.0", "upgradeFrom": "v1.12.0",
"upgradeTo": "v1.23.0" "upgradeTo": "v1.24.0"
}, },
"error": null, "error": null,
"duration": "[duration]", "duration": "[duration]",

View File

@@ -12,7 +12,7 @@ source: crates/meilisearch/tests/upgrade/v1_12/v1_12_0.rs
"canceledBy": null, "canceledBy": null,
"details": { "details": {
"upgradeFrom": "v1.12.0", "upgradeFrom": "v1.12.0",
"upgradeTo": "v1.23.0" "upgradeTo": "v1.24.0"
}, },
"error": null, "error": null,
"duration": "[duration]", "duration": "[duration]",

View File

@@ -12,7 +12,7 @@ source: crates/meilisearch/tests/upgrade/v1_12/v1_12_0.rs
"canceledBy": null, "canceledBy": null,
"details": { "details": {
"upgradeFrom": "v1.12.0", "upgradeFrom": "v1.12.0",
"upgradeTo": "v1.23.0" "upgradeTo": "v1.24.0"
}, },
"error": null, "error": null,
"duration": "[duration]", "duration": "[duration]",

View File

@@ -8,7 +8,7 @@ source: crates/meilisearch/tests/upgrade/v1_12/v1_12_0.rs
"progress": null, "progress": null,
"details": { "details": {
"upgradeFrom": "v1.12.0", "upgradeFrom": "v1.12.0",
"upgradeTo": "v1.23.0" "upgradeTo": "v1.24.0"
}, },
"stats": { "stats": {
"totalNbTasks": 1, "totalNbTasks": 1,

View File

@@ -12,7 +12,7 @@ source: crates/meilisearch/tests/upgrade/v1_12/v1_12_0.rs
"canceledBy": null, "canceledBy": null,
"details": { "details": {
"upgradeFrom": "v1.12.0", "upgradeFrom": "v1.12.0",
"upgradeTo": "v1.23.0" "upgradeTo": "v1.24.0"
}, },
"error": null, "error": null,
"duration": "[duration]", "duration": "[duration]",

View File

@@ -90,7 +90,7 @@ rhai = { version = "1.22.2", features = [
"sync", "sync",
] } ] }
arroy = "0.6.4-nested-rtxns" arroy = "0.6.4-nested-rtxns"
hannoy = { version = "0.0.9-nested-rtxns", features = ["arroy"] } hannoy = { version = "0.0.9-nested-rtxns-2", features = ["arroy"] }
rand = "0.8.5" rand = "0.8.5"
tracing = "0.1.41" tracing = "0.1.41"
ureq = { version = "2.12.1", features = ["json"] } ureq = { version = "2.12.1", features = ["json"] }

View File

@@ -97,7 +97,7 @@ pub fn bucket_sort<'ctx, Q: RankingRuleQueryTrait>(
    logger.start_iteration_ranking_rule(0, ranking_rules[0].as_ref(), query, universe);
-   ranking_rules[0].start_iteration(ctx, logger, universe, query)?;
+   ranking_rules[0].start_iteration(ctx, logger, universe, query, &time_budget)?;

    let mut ranking_rule_scores: Vec<ScoreDetails> = vec![];
@@ -168,42 +168,6 @@ pub fn bucket_sort<'ctx, Q: RankingRuleQueryTrait>(
    };

    while valid_docids.len() < max_len_to_evaluate {
-       if time_budget.exceeded() {
-           loop {
-               let bucket = std::mem::take(&mut ranking_rule_universes[cur_ranking_rule_index]);
-               ranking_rule_scores.push(ScoreDetails::Skipped);
-               // remove candidates from the universe without adding them to result if their score is below the threshold
-               let is_below_threshold =
-                   ranking_score_threshold.is_some_and(|ranking_score_threshold| {
-                       let current_score = ScoreDetails::global_score(ranking_rule_scores.iter());
-                       current_score < ranking_score_threshold
-                   });
-               if is_below_threshold {
-                   all_candidates -= &bucket;
-                   all_candidates -= &ranking_rule_universes[cur_ranking_rule_index];
-               } else {
-                   maybe_add_to_results!(bucket);
-               }
-               ranking_rule_scores.pop();
-               if cur_ranking_rule_index == 0 {
-                   break;
-               }
-               back!();
-           }
-           return Ok(BucketSortOutput {
-               scores: valid_scores,
-               docids: valid_docids,
-               all_candidates,
-               degraded: true,
-           });
-       }
        // The universe for this bucket is zero, so we don't need to sort
        // anything, just go back to the parent ranking rule.
        if ranking_rule_universes[cur_ranking_rule_index].is_empty()
@@ -216,14 +180,63 @@ pub fn bucket_sort<'ctx, Q: RankingRuleQueryTrait>(
            continue;
        }

-       let Some(next_bucket) = ranking_rules[cur_ranking_rule_index].next_bucket(
-           ctx,
-           logger,
-           &ranking_rule_universes[cur_ranking_rule_index],
-       )?
-       else {
-           back!();
-           continue;
+       let next_bucket = if time_budget.exceeded() {
+           match ranking_rules[cur_ranking_rule_index].non_blocking_next_bucket(
+               ctx,
+               logger,
+               &ranking_rule_universes[cur_ranking_rule_index],
+           )? {
+               std::task::Poll::Ready(bucket) => bucket,
+               std::task::Poll::Pending => {
+                   loop {
+                       let bucket =
+                           std::mem::take(&mut ranking_rule_universes[cur_ranking_rule_index]);
+                       ranking_rule_scores.push(ScoreDetails::Skipped);
+                       // remove candidates from the universe without adding them to result if their score is below the threshold
+                       let is_below_threshold =
+                           ranking_score_threshold.is_some_and(|ranking_score_threshold| {
+                               let current_score =
+                                   ScoreDetails::global_score(ranking_rule_scores.iter());
+                               current_score < ranking_score_threshold
+                           });
+                       if is_below_threshold {
+                           all_candidates -= &bucket;
+                           all_candidates -= &ranking_rule_universes[cur_ranking_rule_index];
+                       } else {
+                           maybe_add_to_results!(bucket);
+                       }
+                       ranking_rule_scores.pop();
+                       if cur_ranking_rule_index == 0 {
+                           break;
+                       }
+                       back!();
+                   }
+                   return Ok(BucketSortOutput {
+                       scores: valid_scores,
+                       docids: valid_docids,
+                       all_candidates,
+                       degraded: true,
+                   });
+               }
+           }
+       } else {
+           let Some(next_bucket) = ranking_rules[cur_ranking_rule_index].next_bucket(
+               ctx,
+               logger,
+               &ranking_rule_universes[cur_ranking_rule_index],
+               &time_budget,
+           )?
+           else {
+               back!();
+               continue;
+           };
+           next_bucket
        };

        ranking_rule_scores.push(next_bucket.score);
@@ -275,6 +288,7 @@ pub fn bucket_sort<'ctx, Q: RankingRuleQueryTrait>(
            logger,
            &next_bucket.candidates,
            &next_bucket.query,
+           &time_budget,
        )?;
} }

View File

@@ -6,7 +6,7 @@ use super::ranking_rules::{RankingRule, RankingRuleOutput};
use crate::score_details::{self, ScoreDetails};
use crate::search::new::query_graph::QueryNodeData;
use crate::search::new::query_term::ExactTerm;
-use crate::{CboRoaringBitmapCodec, Result, SearchContext, SearchLogger};
+use crate::{CboRoaringBitmapCodec, Result, SearchContext, SearchLogger, TimeBudget};

/// A ranking rule that produces 3 disjoint buckets:
///
@@ -35,6 +35,7 @@ impl<'ctx> RankingRule<'ctx, QueryGraph> for ExactAttribute {
        _logger: &mut dyn SearchLogger<QueryGraph>,
        universe: &roaring::RoaringBitmap,
        query: &QueryGraph,
+       _time_budget: &TimeBudget,
    ) -> Result<()> {
        self.state = State::start_iteration(ctx, universe, query)?;
        Ok(())
@@ -46,6 +47,7 @@ impl<'ctx> RankingRule<'ctx, QueryGraph> for ExactAttribute {
        _ctx: &mut SearchContext<'ctx>,
        _logger: &mut dyn SearchLogger<QueryGraph>,
        universe: &roaring::RoaringBitmap,
+       _time_budget: &TimeBudget,
    ) -> Result<Option<RankingRuleOutput<QueryGraph>>> {
        let state = std::mem::take(&mut self.state);
        let (state, output) = State::next(state, universe);

View File

@@ -7,7 +7,7 @@ use super::ranking_rules::{RankingRule, RankingRuleOutput, RankingRuleQueryTrait
use crate::documents::geo_sort::{fill_cache, next_bucket};
use crate::documents::{GeoSortParameter, GeoSortStrategy};
use crate::score_details::{self, ScoreDetails};
-use crate::{GeoPoint, Result, SearchContext, SearchLogger};
+use crate::{GeoPoint, Result, SearchContext, SearchLogger, TimeBudget};

pub struct GeoSort<Q: RankingRuleQueryTrait> {
    query: Option<Q>,
@@ -84,6 +84,7 @@ impl<'ctx, Q: RankingRuleQueryTrait> RankingRule<'ctx, Q> for GeoSort<Q> {
        _logger: &mut dyn SearchLogger<Q>,
        universe: &RoaringBitmap,
        query: &Q,
+       _time_budget: &TimeBudget,
    ) -> Result<()> {
        assert!(self.query.is_none());
@@ -110,6 +111,7 @@ impl<'ctx, Q: RankingRuleQueryTrait> RankingRule<'ctx, Q> for GeoSort<Q> {
        ctx: &mut SearchContext<'ctx>,
        _logger: &mut dyn SearchLogger<Q>,
        universe: &RoaringBitmap,
+       _time_budget: &TimeBudget,
    ) -> Result<Option<RankingRuleOutput<Q>>> {
        let query = self.query.as_ref().unwrap().clone();

View File

@@ -53,7 +53,7 @@ use super::{QueryGraph, RankingRule, RankingRuleOutput, SearchContext};
use crate::score_details::Rank;
use crate::search::new::query_term::LocatedQueryTermSubset;
use crate::search::new::ranking_rule_graph::PathVisitor;
-use crate::{Result, TermsMatchingStrategy};
+use crate::{Result, TermsMatchingStrategy, TimeBudget};

pub type Words = GraphBasedRankingRule<WordsGraph>;
impl GraphBasedRankingRule<WordsGraph> {
@@ -135,6 +135,7 @@ impl<'ctx, G: RankingRuleGraphTrait> RankingRule<'ctx, QueryGraph> for GraphBase
        _logger: &mut dyn SearchLogger<QueryGraph>,
        _universe: &RoaringBitmap,
        query_graph: &QueryGraph,
+       _time_budget: &TimeBudget,
    ) -> Result<()> {
        // the `next_max_cost` is the successor integer to the maximum cost of the paths in the graph.
        //
@@ -217,6 +218,7 @@ impl<'ctx, G: RankingRuleGraphTrait> RankingRule<'ctx, QueryGraph> for GraphBase
        ctx: &mut SearchContext<'ctx>,
        logger: &mut dyn SearchLogger<QueryGraph>,
        universe: &RoaringBitmap,
+       _time_budget: &TimeBudget,
    ) -> Result<Option<RankingRuleOutput<QueryGraph>>> {
        // Will crash if `next_bucket` is called before `start_iteration` or after `end_iteration`,
        // should never happen

View File

@@ -1,9 +1,11 @@
+use std::task::Poll;
+
use roaring::RoaringBitmap;

use super::logger::SearchLogger;
use super::{QueryGraph, SearchContext};
use crate::score_details::ScoreDetails;
-use crate::Result;
+use crate::{Result, TimeBudget};

/// An internal trait implemented by only [`PlaceholderQuery`] and [`QueryGraph`]
pub trait RankingRuleQueryTrait: Sized + Clone + 'static {}
@@ -28,12 +30,15 @@ pub trait RankingRule<'ctx, Query: RankingRuleQueryTrait> {
    /// buckets using [`next_bucket`](RankingRule::next_bucket).
    ///
    /// The given universe is the universe that will be given to [`next_bucket`](RankingRule::next_bucket).
+   ///
+   /// If this function may take a long time, it should check the `time_budget` and return early if exceeded.
    fn start_iteration(
        &mut self,
        ctx: &mut SearchContext<'ctx>,
        logger: &mut dyn SearchLogger<Query>,
        universe: &RoaringBitmap,
        query: &Query,
+       time_budget: &TimeBudget,
    ) -> Result<()>;

    /// Return the next bucket of this ranking rule.
@@ -43,13 +48,31 @@ pub trait RankingRule<'ctx, Query: RankingRuleQueryTrait> {
    /// The universe given as argument is either:
    /// - a subset of the universe given to the previous call to [`next_bucket`](RankingRule::next_bucket); OR
    /// - the universe given to [`start_iteration`](RankingRule::start_iteration)
+   ///
+   /// If this function may take a long time, it should check the `time_budget` and return early if exceeded.
    fn next_bucket(
        &mut self,
        ctx: &mut SearchContext<'ctx>,
        logger: &mut dyn SearchLogger<Query>,
        universe: &RoaringBitmap,
+       time_budget: &TimeBudget,
    ) -> Result<Option<RankingRuleOutput<Query>>>;

+   /// Return the next bucket of this ranking rule, if doing so can be done without blocking
+   ///
+   /// Even if the time budget is exceeded, when getting the next bucket is a fast operation, this should return `true`
+   /// to allow Meilisearch to collect the results.
+   ///
+   /// Default implementation conservatively returns that it would block.
+   fn non_blocking_next_bucket(
+       &mut self,
+       _ctx: &mut SearchContext<'ctx>,
+       _logger: &mut dyn SearchLogger<Query>,
+       _universe: &RoaringBitmap,
+   ) -> Result<Poll<RankingRuleOutput<Query>>> {
+       Ok(Poll::Pending)
+   }
+
    /// Finish iterating over the buckets, which yields control to the parent ranking rule
    /// The next call to this ranking rule, if any, will be [`start_iteration`](RankingRule::start_iteration).
    fn end_iteration(
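The default `non_blocking_next_bucket` added above answers `Poll::Pending`, so existing ranking rules keep their current behavior once the time budget is exceeded; a rule that already holds precomputed results (such as the vector sort, which caches `cached_sorted_docids`) can answer `Poll::Ready` so `bucket_sort` can still collect those buckets. A minimal sketch of that contract only, not an implementation of the real trait (whose methods also take `SearchContext` and `SearchLogger`):

use std::collections::VecDeque;
use std::task::Poll;

// Hypothetical rule-like struct holding buckets that were already computed.
struct CachedBuckets {
    buckets: VecDeque<Vec<u32>>, // stand-in for RoaringBitmap buckets
}

impl CachedBuckets {
    fn non_blocking_next_bucket(&mut self) -> Poll<Vec<u32>> {
        match self.buckets.pop_front() {
            // Serving a cached bucket is cheap, so it is safe even past the budget.
            Some(bucket) => Poll::Ready(bucket),
            // Producing a new bucket would require real work: report Pending.
            None => Poll::Pending,
        }
    }
}

fn main() {
    let mut rule = CachedBuckets { buckets: VecDeque::from([vec![1, 2, 3]]) };
    assert!(matches!(rule.non_blocking_next_bucket(), Poll::Ready(_)));
    assert!(matches!(rule.non_blocking_next_bucket(), Poll::Pending));
}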

View File

@@ -7,7 +7,7 @@ use crate::heed_codec::facet::{FacetGroupKeyCodec, OrderedF64Codec};
use crate::heed_codec::{BytesRefCodec, StrRefCodec};
use crate::score_details::{self, ScoreDetails};
use crate::search::facet::{ascending_facet_sort, descending_facet_sort};
-use crate::{FieldId, Index, Result};
+use crate::{FieldId, Index, Result, TimeBudget};

pub trait RankingRuleOutputIter<'ctx, Query> {
    fn next_bucket(&mut self) -> Result<Option<RankingRuleOutput<Query>>>;
@@ -96,6 +96,7 @@ impl<'ctx, Query: RankingRuleQueryTrait> RankingRule<'ctx, Query> for Sort<'ctx,
        _logger: &mut dyn SearchLogger<Query>,
        parent_candidates: &RoaringBitmap,
        parent_query: &Query,
+       _time_budget: &TimeBudget,
    ) -> Result<()> {
        let iter: RankingRuleOutputIterWrapper<'ctx, Query> = match self.field_id {
            Some(field_id) => {
@@ -194,6 +195,7 @@ impl<'ctx, Query: RankingRuleQueryTrait> RankingRule<'ctx, Query> for Sort<'ctx,
        _ctx: &mut SearchContext<'ctx>,
        _logger: &mut dyn SearchLogger<Query>,
        universe: &RoaringBitmap,
+       _time_budget: &TimeBudget,
    ) -> Result<Option<RankingRuleOutput<Query>>> {
        let iter = self.iter.as_mut().unwrap();
        if let Some(mut bucket) = iter.next_bucket()? {

View File

@@ -3,12 +3,17 @@
//! 2. A test that ensure the filters are affectively applied even with a cutoff of 0
//! 3. A test that ensure the cutoff works well with the ranking scores

+use std::collections::BTreeMap;
+use std::sync::Arc;
use std::time::Duration;

use meili_snap::snapshot;

use crate::index::tests::TempIndex;
use crate::score_details::{ScoreDetails, ScoringStrategy};
+use crate::update::Setting;
+use crate::vector::settings::EmbeddingSettings;
+use crate::vector::{Embedder, EmbedderOptions};
use crate::{Criterion, Filter, FilterableAttributesRule, Search, TimeBudget};

fn create_index() -> TempIndex {
@@ -361,9 +366,8 @@ fn degraded_search_and_score_details() {
    ]
    "###);

-   // After SIX loop iteration. The words ranking rule gave us a new bucket.
-   // Since we reached the limit we were able to early exit without checking the typo ranking rule.
-   search.time_budget(TimeBudget::max().with_stop_after(6));
+   // After FIVE loop iterations. The words ranking rule gave us a new bucket.
+   search.time_budget(TimeBudget::max().with_stop_after(5));

    let result = search.execute().unwrap();
    snapshot!(format!("IDs: {:?}\nScores: {}\nScore Details:\n{:#?}", result.documents_ids, result.document_scores.iter().map(|scores| format!("{:.4} ", ScoreDetails::global_score(scores.iter()))).collect::<String>(), result.document_scores), @r###"
@@ -424,4 +428,399 @@ fn degraded_search_and_score_details() {
        ],
    ]
    "###);
// After SIX loop iterations.
// we finished
search.time_budget(TimeBudget::max().with_stop_after(6));
let result = search.execute().unwrap();
snapshot!(format!("IDs: {:?}\nScores: {}\nScore Details:\n{:#?}", result.documents_ids, result.document_scores.iter().map(|scores| format!("{:.4} ", ScoreDetails::global_score(scores.iter()))).collect::<String>(), result.document_scores), @r###"
IDs: [4, 1, 0, 3]
Scores: 1.0000 0.9167 0.8333 0.6667
Score Details:
[
[
Words(
Words {
matching_words: 3,
max_matching_words: 3,
},
),
Typo(
Typo {
typo_count: 0,
max_typo_count: 3,
},
),
],
[
Words(
Words {
matching_words: 3,
max_matching_words: 3,
},
),
Typo(
Typo {
typo_count: 1,
max_typo_count: 3,
},
),
],
[
Words(
Words {
matching_words: 3,
max_matching_words: 3,
},
),
Typo(
Typo {
typo_count: 2,
max_typo_count: 3,
},
),
],
[
Words(
Words {
matching_words: 2,
max_matching_words: 3,
},
),
Typo(
Typo {
typo_count: 0,
max_typo_count: 2,
},
),
],
]
"###);
}
#[test]
fn degraded_search_and_score_details_vector() {
let index = create_index();
index
.add_documents(documents!([
{
"id": 4,
"text": "hella puppo kefir",
"_vectors": {
"default": [0.1, 0.1]
}
},
{
"id": 3,
"text": "hella puppy kefir",
"_vectors": {
"default": [-0.1, 0.1]
}
},
{
"id": 2,
"text": "hello",
"_vectors": {
"default": [0.1, -0.1]
}
},
{
"id": 1,
"text": "hello puppy",
"_vectors": {
"default": [-0.1, -0.1]
}
},
{
"id": 0,
"text": "hello puppy kefir",
"_vectors": {
"default": null
}
},
]))
.unwrap();
index
.update_settings(|settings| {
let mut embedders = BTreeMap::new();
embedders.insert(
"default".into(),
Setting::Set(EmbeddingSettings {
source: Setting::Set(crate::vector::settings::EmbedderSource::UserProvided),
dimensions: Setting::Set(2),
..Default::default()
}),
);
settings.set_embedder_settings(embedders);
settings.set_vector_store(crate::vector::VectorStoreBackend::Hannoy);
})
.unwrap();
let rtxn = index.read_txn().unwrap();
let mut search = Search::new(&rtxn, &index);
let embedder = Arc::new(
Embedder::new(
EmbedderOptions::UserProvided(crate::vector::embedder::manual::EmbedderOptions {
dimensions: 2,
distribution: None,
}),
0,
)
.unwrap(),
);
search.semantic("default".into(), embedder, false, Some(vec![1., -1.]), None);
search.limit(4);
search.scoring_strategy(ScoringStrategy::Detailed);
search.time_budget(TimeBudget::max());
let result = search.execute().unwrap();
snapshot!(format!("IDs: {:?}\nScores: {}\nScore Details:\n{:#?}", result.documents_ids, result.document_scores.iter().map(|scores| format!("{:.4} ", ScoreDetails::global_score(scores.iter()))).collect::<String>(), result.document_scores), @r###"
IDs: [2, 0, 3, 1]
Scores: 1.0000 0.5000 0.5000 0.0000
Score Details:
[
[
Vector(
Vector {
similarity: Some(
1.0,
),
},
),
],
[
Vector(
Vector {
similarity: Some(
0.5,
),
},
),
],
[
Vector(
Vector {
similarity: Some(
0.5,
),
},
),
],
[
Vector(
Vector {
similarity: Some(
0.0,
),
},
),
],
]
"###);
// Do ONE loop iteration. Not much can be deduced, almost everyone matched the words first bucket.
search.time_budget(TimeBudget::max().with_stop_after(1));
let result = search.execute().unwrap();
snapshot!(format!("IDs: {:?}\nScores: {}\nScore Details:\n{:#?}", result.documents_ids, result.document_scores.iter().map(|scores| format!("{:.4} ", ScoreDetails::global_score(scores.iter()))).collect::<String>(), result.document_scores), @r###"
IDs: [0, 1, 2, 3]
Scores: 0.5000 0.0000 0.0000 0.0000
Score Details:
[
[
Vector(
Vector {
similarity: Some(
0.5,
),
},
),
],
[
Skipped,
],
[
Skipped,
],
[
Skipped,
],
]
"###);
search.time_budget(TimeBudget::max().with_stop_after(2));
let result = search.execute().unwrap();
snapshot!(format!("IDs: {:?}\nScores: {}\nScore Details:\n{:#?}", result.documents_ids, result.document_scores.iter().map(|scores| format!("{:.4} ", ScoreDetails::global_score(scores.iter()))).collect::<String>(), result.document_scores), @r###"
IDs: [0, 1, 2, 3]
Scores: 0.5000 0.0000 0.0000 0.0000
Score Details:
[
[
Vector(
Vector {
similarity: Some(
0.5,
),
},
),
],
[
Vector(
Vector {
similarity: Some(
0.0,
),
},
),
],
[
Skipped,
],
[
Skipped,
],
]
"###);
search.time_budget(TimeBudget::max().with_stop_after(3));
let result = search.execute().unwrap();
snapshot!(format!("IDs: {:?}\nScores: {}\nScore Details:\n{:#?}", result.documents_ids, result.document_scores.iter().map(|scores| format!("{:.4} ", ScoreDetails::global_score(scores.iter()))).collect::<String>(), result.document_scores), @r###"
IDs: [2, 0, 1, 3]
Scores: 1.0000 0.5000 0.0000 0.0000
Score Details:
[
[
Vector(
Vector {
similarity: Some(
1.0,
),
},
),
],
[
Vector(
Vector {
similarity: Some(
0.5,
),
},
),
],
[
Vector(
Vector {
similarity: Some(
0.0,
),
},
),
],
[
Skipped,
],
]
"###);
search.time_budget(TimeBudget::max().with_stop_after(4));
let result = search.execute().unwrap();
snapshot!(format!("IDs: {:?}\nScores: {}\nScore Details:\n{:#?}", result.documents_ids, result.document_scores.iter().map(|scores| format!("{:.4} ", ScoreDetails::global_score(scores.iter()))).collect::<String>(), result.document_scores), @r###"
IDs: [2, 0, 3, 1]
Scores: 1.0000 0.5000 0.5000 0.0000
Score Details:
[
[
Vector(
Vector {
similarity: Some(
1.0,
),
},
),
],
[
Vector(
Vector {
similarity: Some(
0.5,
),
},
),
],
[
Vector(
Vector {
similarity: Some(
0.5,
),
},
),
],
[
Vector(
Vector {
similarity: Some(
0.0,
),
},
),
],
]
"###);
search.time_budget(TimeBudget::max().with_stop_after(5));
let result = search.execute().unwrap();
snapshot!(format!("IDs: {:?}\nScores: {}\nScore Details:\n{:#?}", result.documents_ids, result.document_scores.iter().map(|scores| format!("{:.4} ", ScoreDetails::global_score(scores.iter()))).collect::<String>(), result.document_scores), @r###"
IDs: [2, 0, 3, 1]
Scores: 1.0000 0.5000 0.5000 0.0000
Score Details:
[
[
Vector(
Vector {
similarity: Some(
1.0,
),
},
),
],
[
Vector(
Vector {
similarity: Some(
0.5,
),
},
),
],
[
Vector(
Vector {
similarity: Some(
0.5,
),
},
),
],
[
Vector(
Vector {
similarity: Some(
0.0,
),
},
),
],
]
"###);
}

View File

@@ -1,4 +1,5 @@
use std::iter::FromIterator;
+use std::task::Poll;
use std::time::Instant;

use roaring::RoaringBitmap;
@@ -7,7 +8,7 @@ use super::ranking_rules::{RankingRule, RankingRuleOutput, RankingRuleQueryTrait
use super::VectorStoreStats; use super::VectorStoreStats;
use crate::score_details::{self, ScoreDetails}; use crate::score_details::{self, ScoreDetails};
use crate::vector::{DistributionShift, Embedder, VectorStore}; use crate::vector::{DistributionShift, Embedder, VectorStore};
use crate::{DocumentId, Result, SearchContext, SearchLogger}; use crate::{DocumentId, Result, SearchContext, SearchLogger, TimeBudget};
pub struct VectorSort<Q: RankingRuleQueryTrait> { pub struct VectorSort<Q: RankingRuleQueryTrait> {
query: Option<Q>, query: Option<Q>,
@@ -52,6 +53,7 @@ impl<Q: RankingRuleQueryTrait> VectorSort<Q> {
&mut self, &mut self,
ctx: &mut SearchContext<'_>, ctx: &mut SearchContext<'_>,
vector_candidates: &RoaringBitmap, vector_candidates: &RoaringBitmap,
time_budget: &TimeBudget,
) -> Result<()> { ) -> Result<()> {
let target = &self.target; let target = &self.target;
let backend = ctx.index.get_vector_store(ctx.txn)?.unwrap_or_default(); let backend = ctx.index.get_vector_store(ctx.txn)?.unwrap_or_default();
@@ -59,7 +61,13 @@ impl<Q: RankingRuleQueryTrait> VectorSort<Q> {
let before = Instant::now(); let before = Instant::now();
let reader = let reader =
VectorStore::new(backend, ctx.index.vector_store, self.embedder_index, self.quantized); VectorStore::new(backend, ctx.index.vector_store, self.embedder_index, self.quantized);
let results = reader.nns_by_vector(ctx.txn, target, self.limit, Some(vector_candidates))?; let results = reader.nns_by_vector(
ctx.txn,
target,
self.limit,
Some(vector_candidates),
time_budget,
)?;
self.cached_sorted_docids = results.into_iter(); self.cached_sorted_docids = results.into_iter();
*ctx.vector_store_stats.get_or_insert_default() += VectorStoreStats { *ctx.vector_store_stats.get_or_insert_default() += VectorStoreStats {
total_time: before.elapsed(), total_time: before.elapsed(),
@@ -69,6 +77,20 @@ impl<Q: RankingRuleQueryTrait> VectorSort<Q> {
Ok(()) Ok(())
} }
fn next_result(&mut self, vector_candidates: &RoaringBitmap) -> Option<(DocumentId, f32)> {
for (docid, distance) in self.cached_sorted_docids.by_ref() {
if vector_candidates.contains(docid) {
let score = 1.0 - distance;
let score = self
.distribution_shift
.map(|distribution| distribution.shift(score))
.unwrap_or(score);
return Some((docid, score));
}
}
None
}
} }
impl<'ctx, Q: RankingRuleQueryTrait> RankingRule<'ctx, Q> for VectorSort<Q> { impl<'ctx, Q: RankingRuleQueryTrait> RankingRule<'ctx, Q> for VectorSort<Q> {
@@ -83,12 +105,13 @@ impl<'ctx, Q: RankingRuleQueryTrait> RankingRule<'ctx, Q> for VectorSort<Q> {
_logger: &mut dyn SearchLogger<Q>, _logger: &mut dyn SearchLogger<Q>,
universe: &RoaringBitmap, universe: &RoaringBitmap,
query: &Q, query: &Q,
time_budget: &TimeBudget,
) -> Result<()> { ) -> Result<()> {
assert!(self.query.is_none()); assert!(self.query.is_none());
self.query = Some(query.clone()); self.query = Some(query.clone());
let vector_candidates = &self.vector_candidates & universe; let vector_candidates = &self.vector_candidates & universe;
self.fill_buffer(ctx, &vector_candidates)?; self.fill_buffer(ctx, &vector_candidates, time_budget)?;
Ok(()) Ok(())
} }
@@ -99,6 +122,7 @@ impl<'ctx, Q: RankingRuleQueryTrait> RankingRule<'ctx, Q> for VectorSort<Q> {
ctx: &mut SearchContext<'ctx>, ctx: &mut SearchContext<'ctx>,
_logger: &mut dyn SearchLogger<Q>, _logger: &mut dyn SearchLogger<Q>,
universe: &RoaringBitmap, universe: &RoaringBitmap,
time_budget: &TimeBudget,
) -> Result<Option<RankingRuleOutput<Q>>> { ) -> Result<Option<RankingRuleOutput<Q>>> {
let query = self.query.as_ref().unwrap().clone(); let query = self.query.as_ref().unwrap().clone();
let vector_candidates = &self.vector_candidates & universe; let vector_candidates = &self.vector_candidates & universe;
@@ -111,24 +135,17 @@ impl<'ctx, Q: RankingRuleQueryTrait> RankingRule<'ctx, Q> for VectorSort<Q> {
})); }));
} }
for (docid, distance) in self.cached_sorted_docids.by_ref() { if let Some((docid, score)) = self.next_result(&vector_candidates) {
if vector_candidates.contains(docid) { return Ok(Some(RankingRuleOutput {
let score = 1.0 - distance; query,
let score = self candidates: RoaringBitmap::from_iter([docid]),
.distribution_shift score: ScoreDetails::Vector(score_details::Vector { similarity: Some(score) }),
.map(|distribution| distribution.shift(score)) }));
.unwrap_or(score);
return Ok(Some(RankingRuleOutput {
query,
candidates: RoaringBitmap::from_iter([docid]),
score: ScoreDetails::Vector(score_details::Vector { similarity: Some(score) }),
}));
}
} }
// if we got out of this loop it means we've exhausted our cache. // if we got out of this loop it means we've exhausted our cache.
// we need to refill it and run the function again. // we need to refill it and run the function again.
self.fill_buffer(ctx, &vector_candidates)?; self.fill_buffer(ctx, &vector_candidates, time_budget)?;
// we tried filling the buffer, but it remained empty 😢 // we tried filling the buffer, but it remained empty 😢
// it means we don't actually have any document remaining in the universe with a vector. // it means we don't actually have any document remaining in the universe with a vector.
@@ -141,11 +158,39 @@ impl<'ctx, Q: RankingRuleQueryTrait> RankingRule<'ctx, Q> for VectorSort<Q> {
})); }));
} }
self.next_bucket(ctx, _logger, universe) self.next_bucket(ctx, _logger, universe, time_budget)
} }
#[tracing::instrument(level = "trace", skip_all, target = "search::vector_sort")] #[tracing::instrument(level = "trace", skip_all, target = "search::vector_sort")]
fn end_iteration(&mut self, _ctx: &mut SearchContext<'ctx>, _logger: &mut dyn SearchLogger<Q>) { fn end_iteration(&mut self, _ctx: &mut SearchContext<'ctx>, _logger: &mut dyn SearchLogger<Q>) {
self.query = None; self.query = None;
} }
fn non_blocking_next_bucket(
&mut self,
_ctx: &mut SearchContext<'ctx>,
_logger: &mut dyn SearchLogger<Q>,
universe: &RoaringBitmap,
) -> Result<Poll<RankingRuleOutput<Q>>> {
let query = self.query.as_ref().unwrap().clone();
let vector_candidates = &self.vector_candidates & universe;
if vector_candidates.is_empty() {
return Ok(Poll::Ready(RankingRuleOutput {
query,
candidates: universe.clone(),
score: ScoreDetails::Vector(score_details::Vector { similarity: None }),
}));
}
if let Some((docid, score)) = self.next_result(&vector_candidates) {
Ok(Poll::Ready(RankingRuleOutput {
query,
candidates: RoaringBitmap::from_iter([docid]),
score: ScoreDetails::Vector(score_details::Vector { similarity: Some(score) }),
}))
} else {
Ok(Poll::Pending)
}
}
} }
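The new `non_blocking_next_bucket` entry point returns a `Poll` so the caller can drain whatever is already cached once the time budget is spent, without issuing another (potentially slow) vector-store query. A hedged caller-side sketch of how such a result might be consumed; `rule`, `ctx`, `logger`, `universe`, and `buckets` are illustrative names, not the actual bucket-sort code:

```rust
use std::task::Poll;

// Hypothetical consumption pattern for the non-blocking path (names are illustrative).
match rule.non_blocking_next_bucket(ctx, logger, &universe)? {
    Poll::Ready(bucket) => {
        // A result was already cached from a previous fill, so returning it costs nothing.
        buckets.push(bucket);
    }
    Poll::Pending => {
        // Nothing cached: the remaining documents are surfaced with a `Skipped` score detail.
    }
}
```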

View File

@@ -41,6 +41,7 @@ const UPGRADE_FUNCTIONS: &[&dyn UpgradeIndex] = &[
     &ToTargetNoOp { target: (1, 21, 0) },
     &ToTargetNoOp { target: (1, 22, 0) },
     &ToTargetNoOp { target: (1, 23, 0) },
+    &ToTargetNoOp { target: (1, 24, 0) },
     // This is the last upgrade function, it will be called when the index is up to date.
     // any other upgrade function should be added before this one.
     &ToCurrentNoOp {},
@@ -75,6 +76,7 @@ const fn start(from: (u32, u32, u32)) -> Option<usize> {
         (1, 21, _) => function_index!(11),
         (1, 22, _) => function_index!(12),
         (1, 23, _) => function_index!(13),
+        (1, 24, _) => function_index!(14),
         // We deliberately don't add a placeholder with (VERSION_MAJOR, VERSION_MINOR, VERSION_PATCH) here to force manually
         // considering dumpless upgrade.
         (_major, _minor, _patch) => return None,
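The two tables must move in lockstep: the new `(1, 24, 0)` entry in `UPGRADE_FUNCTIONS` is only reachable because `start` also maps `(1, 24, _)` to `function_index!(14)`. As an illustration of the pattern a future bump would follow (the v1.25 lines below are hypothetical, not repository code):

```rust
// Hypothetical v1.25 bump, mirroring the v1.24 change above; shown as a fragment,
// with the surrounding entries elided.
const UPGRADE_FUNCTIONS: &[&dyn UpgradeIndex] = &[
    // ... earlier no-op upgrade targets ...
    &ToTargetNoOp { target: (1, 24, 0) },
    &ToTargetNoOp { target: (1, 25, 0) }, // assumed new entry
    // `ToCurrentNoOp` must remain the last upgrade function.
    &ToCurrentNoOp {},
];

// ... and the matching arm in `start`:
//     (1, 24, _) => function_index!(14),
//     (1, 25, _) => function_index!(15), // assumed matching index
```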

View File

@@ -8,6 +8,7 @@ use serde::{Deserialize, Serialize};
 use crate::progress::Progress;
 use crate::vector::Embeddings;
+use crate::TimeBudget;
 
 const HANNOY_EF_CONSTRUCTION: usize = 125;
 const HANNOY_M: usize = 16;
@@ -591,6 +592,7 @@ impl VectorStore {
         vector: &[f32],
         limit: usize,
         filter: Option<&RoaringBitmap>,
+        time_budget: &TimeBudget,
     ) -> crate::Result<Vec<(ItemId, f32)>> {
         if self.backend == VectorStoreBackend::Arroy {
             if self.quantized {
@@ -601,11 +603,25 @@ impl VectorStore {
                     .map_err(Into::into)
             }
         } else if self.quantized {
-            self._hannoy_nns_by_vector(rtxn, self._hannoy_quantized_db(), vector, limit, filter)
-                .map_err(Into::into)
+            self._hannoy_nns_by_vector(
+                rtxn,
+                self._hannoy_quantized_db(),
+                vector,
+                limit,
+                filter,
+                time_budget,
+            )
+            .map_err(Into::into)
         } else {
-            self._hannoy_nns_by_vector(rtxn, self._hannoy_angular_db(), vector, limit, filter)
-                .map_err(Into::into)
+            self._hannoy_nns_by_vector(
+                rtxn,
+                self._hannoy_angular_db(),
+                vector,
+                limit,
+                filter,
+                time_budget,
+            )
+            .map_err(Into::into)
         }
     }
 
     pub fn item_vectors(&self, rtxn: &RoTxn, item_id: u32) -> crate::Result<Vec<Vec<f32>>> {
@@ -1000,6 +1016,7 @@ impl VectorStore {
         vector: &[f32],
         limit: usize,
         filter: Option<&RoaringBitmap>,
+        time_budget: &TimeBudget,
     ) -> Result<Vec<(ItemId, f32)>, hannoy::Error> {
         let mut results = Vec::new();
@@ -1011,7 +1028,10 @@ impl VectorStore {
                 searcher.candidates(filter);
             }
 
-            results.append(&mut searcher.by_vector(rtxn, vector)?);
+            let (res, _degraded) =
+                &mut searcher
+                    .by_vector_with_cancellation(rtxn, vector, || time_budget.exceeded())?;
+            results.append(res);
         }
 
         results.sort_unstable_by_key(|(_, distance)| OrderedFloat(*distance));
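With this change every hannoy query is cancellable: the searcher periodically evaluates the closure and stops early once the budget is exceeded, returning whatever neighbours it has found so far. A caller-side sketch of the new call shape; it assumes `TimeBudget::new(Duration)` as the constructor and an open read transaction plus a candidate bitmap from the surrounding code, so it is not standalone repository code:

```rust
// Sketch only: identifiers other than the diffed `nns_by_vector` signature are assumed.
let budget = TimeBudget::new(std::time::Duration::from_millis(150));
let neighbours = vector_store.nns_by_vector(
    &rtxn,             // LMDB read transaction
    &query_vector,     // embedded query, &[f32]
    20,                // limit
    Some(&candidates), // RoaringBitmap restricting the search to the current universe
    &budget,           // cancels the hannoy traversal once exceeded
)?;
```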

View File

@@ -20,14 +20,16 @@ These make us iterate fast before stabilizing it for the current release.
 ### Release steps
 
-The prototype name must follow this convention: `prototype-X-Y` where
-- `X` is the feature name formatted in `kebab-case`. It should not end with a single number.
+The prototype name must follow this convention: `prototype-v<version>.<name>-<number>` where
+- `version` is the version of Meilisearch on which the prototype is based.
+- `name` is the feature name formatted in `kebab-case`. It should not end with a single number.
 - `Y` is the version of the prototype, starting from `0`.
 
-✅ Example: `prototype-auto-resize-0`. </br>
-❌ Bad example: `auto-resize-0`: lacks the `prototype` prefix. </br>
-❌ Bad example: `prototype-auto-resize`: lacks the version suffix. </br>
-❌ Bad example: `prototype-auto-resize-0-0`: feature name ends with a single number.
+✅ Example: `prototype-v1.23.0.search-personalization-0`. </br>
+❌ Bad example: `prototype-search-personalization-0`: version is missing.</br>
+❌ Bad example: `v1.23.0.auto-resize-0`: lacks the `prototype` prefix. </br>
+❌ Bad example: `prototype-v1.23.0.auto-resize`: lacks the version suffix. </br>
+❌ Bad example: `prototype-v1.23.0.auto-resize-0-0`: feature name ends with a single number.
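For quick self-checks, the updated convention can be expressed as a pattern. The snippet below is illustrative only (it assumes the `regex` crate and is not part of the repository), and it is stricter than the prose in that it forbids digits anywhere in the feature name:

```rust
use regex::Regex; // assumed dependency, illustration only

/// Validates `prototype-v<version>.<name>-<number>` as described above.
fn is_valid_prototype_name(name: &str) -> bool {
    let re = Regex::new(r"^prototype-v\d+\.\d+\.\d+\.[a-z]+(?:-[a-z]+)*-\d+$").unwrap();
    re.is_match(name)
}

fn main() {
    assert!(is_valid_prototype_name("prototype-v1.23.0.search-personalization-0"));
    assert!(!is_valid_prototype_name("prototype-search-personalization-0")); // version missing
    assert!(!is_valid_prototype_name("prototype-v1.23.0.auto-resize")); // prototype number missing
}
```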
 Steps to create a prototype: