diff --git a/.github/pull_request_template.md b/.github/pull_request_template.md
index 3665d3303..0fbc68c1d 100644
--- a/.github/pull_request_template.md
+++ b/.github/pull_request_template.md
@@ -5,12 +5,12 @@ Fixes #...
 ## Requirements
 
 ⚠️ Ensure the following requirements before merging ⚠️
-- [] Automated tests have been added.
-- [] If some tests cannot be automated, manual rigorous tests should be applied.
-- [] ⚠️ If there is any change in the DB:
+- [ ] Automated tests have been added.
+- [ ] If some tests cannot be automated, manual rigorous tests should be applied.
+- [ ] ⚠️ If there is any change in the DB:
   - [ ] Test that any impacted DB still works as expected after using `--experimental-dumpless-upgrade` on a DB created with the last released Meilisearch
   - [ ] Test that during the upgrade, **search is still available** (artificially make the upgrade longer if needed)
   - [ ] Set the `db change` label.
-- [] If necessary, the feature have been tested in the Cloud production environment (with [prototypes](./documentation/prototypes.md)) and the Cloud UI is ready.
-- [] If necessary, the [documentation](https://github.com/meilisearch/documentation) related to the implemented feature in the PR is ready.
-- [] If necessary, the [integrations](https://github.com/meilisearch/integration-guides) related to the implemented feature in the PR are ready.
+- [ ] If necessary, the feature has been tested in the Cloud production environment (with [prototypes](./documentation/prototypes.md)) and the Cloud UI is ready.
+- [ ] If necessary, the [documentation](https://github.com/meilisearch/documentation) related to the implemented feature in the PR is ready.
+- [ ] If necessary, the [integrations](https://github.com/meilisearch/integration-guides) related to the implemented feature in the PR are ready.
diff --git a/.github/release-draft-template.yml b/.github/release-draft-template.yml
index 1088be33b..ffe2fa5b7 100644
--- a/.github/release-draft-template.yml
+++ b/.github/release-draft-template.yml
@@ -3,17 +3,27 @@ tag-template: 'v$RESOLVED_VERSION'
 exclude-labels:
   - 'skip changelog'
 version-resolver:
-  major:
-    labels:
-      - 'breaking-change'
   minor:
     labels:
       - 'enhancement'
   default: patch
+categories:
+  - title: '⚠️ Breaking changes'
+    label: 'breaking-change'
+  - title: '🚀 Enhancements'
+    label: 'enhancement'
+  - title: '🐛 Bug Fixes'
+    label: 'bug'
+  - title: '🔒 Security'
+    label: 'security'
+  - title: '⚙️ Maintenance/misc'
+    label:
+      - 'maintenance'
+      - 'documentation'
 template: |
   $CHANGES
 
-  Thanks again to $CONTRIBUTORS! 🎉
+  ❤️ Huge thanks to our contributors: $CONTRIBUTORS.
no-changes-template: 'Changes are coming soon 😎' sort-direction: 'ascending' replacers: diff --git a/.github/workflows/publish-binaries.yml b/.github/workflows/publish-release-assets.yml similarity index 88% rename from .github/workflows/publish-binaries.yml rename to .github/workflows/publish-release-assets.yml index 27d8c3610..ec0d36711 100644 --- a/.github/workflows/publish-binaries.yml +++ b/.github/workflows/publish-release-assets.yml @@ -1,4 +1,4 @@ -name: Publish binaries to GitHub release +name: Publish assets to GitHub release on: workflow_dispatch: @@ -184,3 +184,28 @@ jobs: file: target/${{ matrix.target }}/release/meilisearch asset_name: ${{ matrix.asset_name }} tag: ${{ github.ref }} + + publish-openapi-file: + name: Publish OpenAPI file + runs-on: ubuntu-latest + steps: + - name: Checkout code + uses: actions/checkout@v4 + - name: Setup Rust + uses: actions-rs/toolchain@v1 + with: + toolchain: stable + override: true + - name: Generate OpenAPI file + run: | + cd crates/openapi-generator + cargo run --release -- --pretty --output ../../meilisearch.json + - name: Upload OpenAPI to Release + # No need to upload for dry run (cron) + if: github.event_name == 'release' + uses: svenstaro/upload-release-action@2.11.2 + with: + repo_token: ${{ secrets.MEILI_BOT_GH_PAT }} + file: ./meilisearch.json + asset_name: meilisearch-openapi.json + tag: ${{ github.ref }} diff --git a/.github/workflows/update-cargo-toml-version.yml b/.github/workflows/update-cargo-toml-version.yml index d13a4404a..4118cd651 100644 --- a/.github/workflows/update-cargo-toml-version.yml +++ b/.github/workflows/update-cargo-toml-version.yml @@ -41,5 +41,4 @@ jobs: --title "Update version for the next release ($NEW_VERSION) in Cargo.toml" \ --body '⚠️ This PR is automatically generated. Check the new version is the expected one and Cargo.lock has been updated before merging.' \ --label 'skip changelog' \ - --milestone $NEW_VERSION \ --base $GITHUB_REF_NAME diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 72a91a765..7f718c899 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -107,12 +107,18 @@ Run `cargo xtask --help` from the root of the repository to find out what is ava To update the openAPI file in the code, see [sprint_issue.md](https://github.com/meilisearch/meilisearch/blob/main/.github/ISSUE_TEMPLATE/sprint_issue.md#reminders-when-modifying-the-api). 
-If you want to update the openAPI file on the [open-api repository](https://github.com/meilisearch/open-api):
-- Pull the latest version of the latest rc of Meilisearch `git checkout release-vX.Y.Z; git pull`
+If you want to generate the OpenAPI file manually:
+
+With Swagger:
 - Starts Meilisearch with the `swagger` feature flag: `cargo run --features swagger`
 - On a browser, open the following URL: http://localhost:7700/scalar
 - Click the « Download openAPI file »
-- Open a PR replacing [this file](https://github.com/meilisearch/open-api/blob/main/open-api.json) with the one downloaded
+
+With the internal crate:
+```bash
+cd crates/openapi-generator
+cargo run --release -- --pretty --output meilisearch.json
+```
 
 ### Logging
diff --git a/Cargo.lock b/Cargo.lock
index 43f491f1e..04a3cdb76 100644
--- a/Cargo.lock
+++ b/Cargo.lock
@@ -580,7 +580,7 @@ source = "git+https://github.com/meilisearch/bbqueue#cbb87cc707b5af415ef203bdaf2
 
 [[package]]
 name = "benchmarks"
-version = "1.16.0"
+version = "1.17.1"
 dependencies = [
  "anyhow",
  "bumpalo",
@@ -770,7 +770,7 @@ dependencies = [
 
 [[package]]
 name = "build-info"
-version = "1.16.0"
+version = "1.17.1"
 dependencies = [
  "anyhow",
  "time",
@@ -1774,7 +1774,7 @@ dependencies = [
 
 [[package]]
 name = "dump"
-version = "1.16.0"
+version = "1.17.1"
 dependencies = [
  "anyhow",
  "big_s",
@@ -2006,7 +2006,7 @@ checksum = "37909eebbb50d72f9059c3b6d82c0463f2ff062c9e95845c43a6c9c0355411be"
 
 [[package]]
 name = "file-store"
-version = "1.16.0"
+version = "1.17.1"
 dependencies = [
  "tempfile",
  "thiserror 2.0.12",
@@ -2028,7 +2028,7 @@ dependencies = [
 
 [[package]]
 name = "filter-parser"
-version = "1.16.0"
+version = "1.17.1"
 dependencies = [
  "insta",
  "levenshtein_automata",
@@ -2050,7 +2050,7 @@ dependencies = [
 
 [[package]]
 name = "flatten-serde-json"
-version = "1.16.0"
+version = "1.17.1"
 dependencies = [
  "criterion",
  "serde_json",
@@ -2195,7 +2195,7 @@ dependencies = [
 
 [[package]]
 name = "fuzzers"
-version = "1.16.0"
+version = "1.17.1"
 dependencies = [
  "arbitrary",
  "bumpalo",
@@ -2995,7 +2995,7 @@ dependencies = [
 
 [[package]]
 name = "index-scheduler"
-version = "1.16.0"
+version = "1.17.1"
 dependencies = [
  "anyhow",
  "backoff",
@@ -3231,7 +3231,7 @@ dependencies = [
 
 [[package]]
 name = "json-depth-checker"
-version = "1.16.0"
+version = "1.17.1"
 dependencies = [
  "criterion",
  "serde_json",
@@ -3725,7 +3725,7 @@ checksum = "490cc448043f947bae3cbee9c203358d62dbee0db12107a74be5c30ccfd09771"
 
 [[package]]
 name = "meili-snap"
-version = "1.16.0"
+version = "1.17.1"
 dependencies = [
  "insta",
  "md5",
@@ -3736,7 +3736,7 @@ dependencies = [
 
 [[package]]
 name = "meilisearch"
-version = "1.16.0"
+version = "1.17.1"
 dependencies = [
  "actix-cors",
  "actix-http",
@@ -3832,7 +3832,7 @@ dependencies = [
 
 [[package]]
 name = "meilisearch-auth"
-version = "1.16.0"
+version = "1.17.1"
 dependencies = [
  "base64 0.22.1",
  "enum-iterator",
@@ -3851,7 +3851,7 @@ dependencies = [
 
 [[package]]
 name = "meilisearch-types"
-version = "1.16.0"
+version = "1.17.1"
 dependencies = [
  "actix-web",
  "anyhow",
@@ -3886,7 +3886,7 @@ dependencies = [
 
 [[package]]
 name = "meilitool"
-version = "1.16.0"
+version = "1.17.1"
 dependencies = [
  "anyhow",
  "clap",
@@ -3920,7 +3920,7 @@ dependencies = [
 
 [[package]]
 name = "milli"
-version = "1.16.0"
+version = "1.17.1"
 dependencies = [
  "allocator-api2 0.3.0",
  "arroy",
@@ -4339,6 +4339,17 @@ version = "11.1.5"
 source = "registry+https://github.com/rust-lang/crates.io-index"
 checksum = "d6790f58c7ff633d8771f42965289203411a5e5c68388703c06e14f24770b41e"
 
+[[package]]
+name = 
"openapi-generator" +version = "0.1.0" +dependencies = [ + "anyhow", + "clap", + "meilisearch", + "serde_json", + "utoipa", +] + [[package]] name = "openssl-probe" version = "0.1.6" @@ -4472,7 +4483,7 @@ checksum = "e3148f5046208a5d56bcfc03053e3ca6334e51da8dfb19b6cdc8b306fae3283e" [[package]] name = "permissive-json-pointer" -version = "1.16.0" +version = "1.17.1" dependencies = [ "big_s", "serde_json", @@ -7260,7 +7271,7 @@ dependencies = [ [[package]] name = "xtask" -version = "1.16.0" +version = "1.17.1" dependencies = [ "anyhow", "build-info", diff --git a/Cargo.toml b/Cargo.toml index 3e57563b6..bc1c354b7 100644 --- a/Cargo.toml +++ b/Cargo.toml @@ -19,10 +19,11 @@ members = [ "crates/tracing-trace", "crates/xtask", "crates/build-info", + "crates/openapi-generator", ] [workspace.package] -version = "1.16.0" +version = "1.17.1" authors = [ "Quentin de Quelen ", "Clément Renault ", diff --git a/crates/benchmarks/Cargo.toml b/crates/benchmarks/Cargo.toml index f60f0979c..f05100c2c 100644 --- a/crates/benchmarks/Cargo.toml +++ b/crates/benchmarks/Cargo.toml @@ -55,3 +55,7 @@ harness = false [[bench]] name = "sort" harness = false + +[[bench]] +name = "filter_starts_with" +harness = false diff --git a/crates/benchmarks/benches/filter_starts_with.rs b/crates/benchmarks/benches/filter_starts_with.rs new file mode 100644 index 000000000..a7682cbf8 --- /dev/null +++ b/crates/benchmarks/benches/filter_starts_with.rs @@ -0,0 +1,66 @@ +mod datasets_paths; +mod utils; + +use criterion::{criterion_group, criterion_main}; +use milli::update::Settings; +use milli::FilterableAttributesRule; +use utils::Conf; + +#[cfg(not(windows))] +#[global_allocator] +static ALLOC: mimalloc::MiMalloc = mimalloc::MiMalloc; + +fn base_conf(builder: &mut Settings) { + let displayed_fields = ["geonameid", "name"].iter().map(|s| s.to_string()).collect(); + builder.set_displayed_fields(displayed_fields); + + let filterable_fields = + ["name"].iter().map(|s| FilterableAttributesRule::Field(s.to_string())).collect(); + builder.set_filterable_fields(filterable_fields); +} + +#[rustfmt::skip] +const BASE_CONF: Conf = Conf { + dataset: datasets_paths::SMOL_ALL_COUNTRIES, + dataset_format: "jsonl", + queries: &[ + "", + ], + configure: base_conf, + primary_key: Some("geonameid"), + ..Conf::BASE +}; + +fn filter_starts_with(c: &mut criterion::Criterion) { + #[rustfmt::skip] + let confs = &[ + utils::Conf { + group_name: "1 letter", + filter: Some("name STARTS WITH e"), + ..BASE_CONF + }, + + utils::Conf { + group_name: "2 letters", + filter: Some("name STARTS WITH es"), + ..BASE_CONF + }, + + utils::Conf { + group_name: "3 letters", + filter: Some("name STARTS WITH est"), + ..BASE_CONF + }, + + utils::Conf { + group_name: "6 letters", + filter: Some("name STARTS WITH estoni"), + ..BASE_CONF + } + ]; + + utils::run_benches(c, confs); +} + +criterion_group!(benches, filter_starts_with); +criterion_main!(benches); diff --git a/crates/dump/src/reader/compat/v5_to_v6.rs b/crates/dump/src/reader/compat/v5_to_v6.rs index f173bb6bd..3a0c8ef0d 100644 --- a/crates/dump/src/reader/compat/v5_to_v6.rs +++ b/crates/dump/src/reader/compat/v5_to_v6.rs @@ -202,6 +202,10 @@ impl CompatV5ToV6 { pub fn network(&self) -> Result> { Ok(None) } + + pub fn webhooks(&self) -> Option<&v6::Webhooks> { + None + } } pub enum CompatIndexV5ToV6 { diff --git a/crates/dump/src/reader/mod.rs b/crates/dump/src/reader/mod.rs index c894c255f..da55bb4a8 100644 --- a/crates/dump/src/reader/mod.rs +++ b/crates/dump/src/reader/mod.rs @@ -138,6 +138,13 @@ impl DumpReader 
{ DumpReader::Compat(compat) => compat.network(), } } + + pub fn webhooks(&self) -> Option<&v6::Webhooks> { + match self { + DumpReader::Current(current) => current.webhooks(), + DumpReader::Compat(compat) => compat.webhooks(), + } + } } impl From for DumpReader { @@ -365,6 +372,7 @@ pub(crate) mod test { assert_eq!(dump.features().unwrap().unwrap(), RuntimeTogglableFeatures::default()); assert_eq!(dump.network().unwrap(), None); + assert_eq!(dump.webhooks(), None); } #[test] @@ -435,6 +443,43 @@ pub(crate) mod test { insta::assert_snapshot!(network.remotes.get("ms-2").as_ref().unwrap().search_api_key.as_ref().unwrap(), @"foo"); } + #[test] + fn import_dump_v6_webhooks() { + let dump = File::open("tests/assets/v6-with-webhooks.dump").unwrap(); + let dump = DumpReader::open(dump).unwrap(); + + // top level infos + insta::assert_snapshot!(dump.date().unwrap(), @"2025-07-31 9:21:30.479544 +00:00:00"); + insta::assert_debug_snapshot!(dump.instance_uid().unwrap(), @r" + Some( + cb887dcc-34b3-48d1-addd-9815ae721a81, + ) + "); + + // webhooks + let webhooks = dump.webhooks().unwrap(); + insta::assert_json_snapshot!(webhooks, @r#" + { + "webhooks": { + "627ea538-733d-4545-8d2d-03526eb381ce": { + "url": "https://example.com/authorization-less", + "headers": {} + }, + "771b0a28-ef28-4082-b984-536f82958c65": { + "url": "https://example.com/hook", + "headers": { + "authorization": "TOKEN" + } + }, + "f3583083-f8a7-4cbf-a5e7-fb3f1e28a7e9": { + "url": "https://third.com", + "headers": {} + } + } + } + "#); + } + #[test] fn import_dump_v5() { let dump = File::open("tests/assets/v5.dump").unwrap(); diff --git a/crates/dump/src/reader/v6/mod.rs b/crates/dump/src/reader/v6/mod.rs index 08d4700e5..9bc4b33c5 100644 --- a/crates/dump/src/reader/v6/mod.rs +++ b/crates/dump/src/reader/v6/mod.rs @@ -25,6 +25,7 @@ pub type Key = meilisearch_types::keys::Key; pub type ChatCompletionSettings = meilisearch_types::features::ChatCompletionSettings; pub type RuntimeTogglableFeatures = meilisearch_types::features::RuntimeTogglableFeatures; pub type Network = meilisearch_types::features::Network; +pub type Webhooks = meilisearch_types::webhooks::WebhooksDumpView; // ===== Other types to clarify the code of the compat module // everything related to the tasks @@ -59,6 +60,7 @@ pub struct V6Reader { keys: BufReader, features: Option, network: Option, + webhooks: Option, } impl V6Reader { @@ -93,8 +95,8 @@ impl V6Reader { Err(e) => return Err(e.into()), }; - let network_file = match fs::read(dump.path().join("network.json")) { - Ok(network_file) => Some(network_file), + let network = match fs::read(dump.path().join("network.json")) { + Ok(network_file) => Some(serde_json::from_reader(&*network_file)?), Err(error) => match error.kind() { // Allows the file to be missing, this will only result in all experimental features disabled. ErrorKind::NotFound => { @@ -104,10 +106,16 @@ impl V6Reader { _ => return Err(error.into()), }, }; - let network = if let Some(network_file) = network_file { - Some(serde_json::from_reader(&*network_file)?) 
- } else { - None + + let webhooks = match fs::read(dump.path().join("webhooks.json")) { + Ok(webhooks_file) => Some(serde_json::from_reader(&*webhooks_file)?), + Err(error) => match error.kind() { + ErrorKind::NotFound => { + debug!("`webhooks.json` not found in dump"); + None + } + _ => return Err(error.into()), + }, }; Ok(V6Reader { @@ -119,6 +127,7 @@ impl V6Reader { features, network, dump, + webhooks, }) } @@ -229,6 +238,10 @@ impl V6Reader { pub fn network(&self) -> Option<&Network> { self.network.as_ref() } + + pub fn webhooks(&self) -> Option<&Webhooks> { + self.webhooks.as_ref() + } } pub struct UpdateFile { diff --git a/crates/dump/src/writer.rs b/crates/dump/src/writer.rs index 9f828595a..1d41b6aa5 100644 --- a/crates/dump/src/writer.rs +++ b/crates/dump/src/writer.rs @@ -8,6 +8,7 @@ use meilisearch_types::batches::Batch; use meilisearch_types::features::{ChatCompletionSettings, Network, RuntimeTogglableFeatures}; use meilisearch_types::keys::Key; use meilisearch_types::settings::{Checked, Settings}; +use meilisearch_types::webhooks::WebhooksDumpView; use serde_json::{Map, Value}; use tempfile::TempDir; use time::OffsetDateTime; @@ -74,6 +75,13 @@ impl DumpWriter { Ok(std::fs::write(self.dir.path().join("network.json"), serde_json::to_string(&network)?)?) } + pub fn create_webhooks(&self, webhooks: WebhooksDumpView) -> Result<()> { + Ok(std::fs::write( + self.dir.path().join("webhooks.json"), + serde_json::to_string(&webhooks)?, + )?) + } + pub fn persist_to(self, mut writer: impl Write) -> Result<()> { let gz_encoder = GzEncoder::new(&mut writer, Compression::default()); let mut tar_encoder = tar::Builder::new(gz_encoder); diff --git a/crates/dump/tests/assets/v6-with-webhooks.dump b/crates/dump/tests/assets/v6-with-webhooks.dump new file mode 100644 index 000000000..955c2a63d Binary files /dev/null and b/crates/dump/tests/assets/v6-with-webhooks.dump differ diff --git a/crates/filter-parser/src/lib.rs b/crates/filter-parser/src/lib.rs index c761c583b..9ee00c3eb 100644 --- a/crates/filter-parser/src/lib.rs +++ b/crates/filter-parser/src/lib.rs @@ -178,9 +178,9 @@ impl<'a> FilterCondition<'a> { | Condition::Exists | Condition::LowerThan(_) | Condition::LowerThanOrEqual(_) - | Condition::Between { .. } => None, - Condition::Contains { keyword, word: _ } - | Condition::StartsWith { keyword, word: _ } => Some(keyword), + | Condition::Between { .. } + | Condition::StartsWith { .. 
} => None, + Condition::Contains { keyword, word: _ } => Some(keyword), }, FilterCondition::Not(this) => this.use_contains_operator(), FilterCondition::Or(seq) | FilterCondition::And(seq) => { diff --git a/crates/index-scheduler/src/features.rs b/crates/index-scheduler/src/features.rs index b52a659a6..dee665458 100644 --- a/crates/index-scheduler/src/features.rs +++ b/crates/index-scheduler/src/features.rs @@ -85,7 +85,7 @@ impl RoFeatures { Ok(()) } else { Err(FeatureNotEnabledError { - disabled_action: "Using `CONTAINS` or `STARTS WITH` in a filter", + disabled_action: "Using `CONTAINS` in a filter", feature: "contains filter", issue_link: "https://github.com/orgs/meilisearch/discussions/763", } @@ -182,6 +182,7 @@ impl FeatureData { ..persisted_features })); + // Once this is stabilized, network should be stored along with webhooks in index-scheduler's persisted database let network_db = runtime_features_db.remap_data_type::>(); let network: Network = network_db.get(wtxn, db_keys::NETWORK)?.unwrap_or_default(); diff --git a/crates/index-scheduler/src/insta_snapshot.rs b/crates/index-scheduler/src/insta_snapshot.rs index 32ce131b5..cb804d9b4 100644 --- a/crates/index-scheduler/src/insta_snapshot.rs +++ b/crates/index-scheduler/src/insta_snapshot.rs @@ -26,11 +26,11 @@ pub fn snapshot_index_scheduler(scheduler: &IndexScheduler) -> String { version, queue, scheduler, + persisted, index_mapper, features: _, - webhook_url: _, - webhook_authorization_header: _, + webhooks: _, test_breakpoint_sdr: _, planned_failures: _, run_loop_iteration: _, @@ -62,6 +62,13 @@ pub fn snapshot_index_scheduler(scheduler: &IndexScheduler) -> String { } snap.push_str("\n----------------------------------------------------------------------\n"); + let persisted_db_snapshot = snapshot_persisted_db(&rtxn, persisted); + if !persisted_db_snapshot.is_empty() { + snap.push_str("### Persisted:\n"); + snap.push_str(&persisted_db_snapshot); + snap.push_str("----------------------------------------------------------------------\n"); + } + snap.push_str("### All Tasks:\n"); snap.push_str(&snapshot_all_tasks(&rtxn, queue.tasks.all_tasks)); snap.push_str("----------------------------------------------------------------------\n"); @@ -200,6 +207,16 @@ pub fn snapshot_date_db(rtxn: &RoTxn, db: Database) -> String { + let mut snap = String::new(); + let iter = db.iter(rtxn).unwrap(); + for next in iter { + let (key, value) = next.unwrap(); + snap.push_str(&format!("{key}: {value}\n")); + } + snap +} + pub fn snapshot_task(task: &Task) -> String { let mut snap = String::new(); let Task { @@ -311,6 +328,7 @@ pub fn snapshot_status( } snap } + pub fn snapshot_kind(rtxn: &RoTxn, db: Database, RoaringBitmapCodec>) -> String { let mut snap = String::new(); let iter = db.iter(rtxn).unwrap(); @@ -331,6 +349,7 @@ pub fn snapshot_index_tasks(rtxn: &RoTxn, db: Database) } snap } + pub fn snapshot_canceled_by(rtxn: &RoTxn, db: Database) -> String { let mut snap = String::new(); let iter = db.iter(rtxn).unwrap(); diff --git a/crates/index-scheduler/src/lib.rs b/crates/index-scheduler/src/lib.rs index 46566b9ba..6ad7a8397 100644 --- a/crates/index-scheduler/src/lib.rs +++ b/crates/index-scheduler/src/lib.rs @@ -65,13 +65,16 @@ use meilisearch_types::milli::vector::{ use meilisearch_types::milli::{self, Index}; use meilisearch_types::task_view::TaskView; use meilisearch_types::tasks::{KindWithContent, Task}; +use meilisearch_types::webhooks::{Webhook, WebhooksDumpView, WebhooksView}; use milli::vector::db::IndexEmbeddingConfig; use 
processing::ProcessingTasks; pub use queue::Query; use queue::Queue; use roaring::RoaringBitmap; use scheduler::Scheduler; +use serde::{Deserialize, Serialize}; use time::OffsetDateTime; +use uuid::Uuid; use versioning::Versioning; use crate::index_mapper::IndexMapper; @@ -80,7 +83,15 @@ use crate::utils::clamp_to_page_size; pub(crate) type BEI128 = I128; const TASK_SCHEDULER_SIZE_THRESHOLD_PERCENT_INT: u64 = 40; -const CHAT_SETTINGS_DB_NAME: &str = "chat-settings"; + +mod db_name { + pub const CHAT_SETTINGS: &str = "chat-settings"; + pub const PERSISTED: &str = "persisted"; +} + +mod db_keys { + pub const WEBHOOKS: &str = "webhooks"; +} #[derive(Debug)] pub struct IndexSchedulerOptions { @@ -98,10 +109,10 @@ pub struct IndexSchedulerOptions { pub snapshots_path: PathBuf, /// The path to the folder containing the dumps. pub dumps_path: PathBuf, - /// The URL on which we must send the tasks statuses - pub webhook_url: Option, - /// The value we will send into the Authorization HTTP header on the webhook URL - pub webhook_authorization_header: Option, + /// The webhook url that was set by the CLI. + pub cli_webhook_url: Option, + /// The Authorization header to send to the webhook URL that was set by the CLI. + pub cli_webhook_authorization: Option, /// The maximum size, in bytes, of the task index. pub task_db_size: usize, /// The size, in bytes, with which a meilisearch index is opened the first time of each meilisearch index. @@ -171,10 +182,11 @@ pub struct IndexScheduler { /// Whether we should use the old document indexer or the new one. pub(crate) experimental_no_edition_2024_for_dumps: bool, - /// The webhook url we should send tasks to after processing every batches. - pub(crate) webhook_url: Option, - /// The Authorization header to send to the webhook URL. - pub(crate) webhook_authorization_header: Option, + /// A database to store single-keyed data that is persisted across restarts. + persisted: Database, + + /// Webhook, loaded and stored in the `persisted` database + webhooks: Arc, /// A map to retrieve the runtime representation of an embedder depending on its configuration. /// @@ -214,8 +226,9 @@ impl IndexScheduler { index_mapper: self.index_mapper.clone(), cleanup_enabled: self.cleanup_enabled, experimental_no_edition_2024_for_dumps: self.experimental_no_edition_2024_for_dumps, - webhook_url: self.webhook_url.clone(), - webhook_authorization_header: self.webhook_authorization_header.clone(), + persisted: self.persisted, + + webhooks: self.webhooks.clone(), embedders: self.embedders.clone(), #[cfg(test)] test_breakpoint_sdr: self.test_breakpoint_sdr.clone(), @@ -234,6 +247,7 @@ impl IndexScheduler { + IndexMapper::nb_db() + features::FeatureData::nb_db() + 1 // chat-prompts + + 1 // persisted } /// Create an index scheduler and start its run loop. 
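The hunks that follow create the new `persisted` LMDB database at startup, read any stored runtime webhooks back from it, and then overlay the webhook passed on the CLI, which is kept in memory only and surfaced under the nil UUID. A minimal, self-contained sketch of that merge rule, using simplified stand-ins for the diff's `Webhooks::get_all` rather than the real types (assumes the `uuid` crate with the `v4` feature enabled):

```rust
use std::collections::BTreeMap;

use uuid::Uuid;

// Simplified stand-in for meilisearch_types::webhooks::Webhook.
#[derive(Clone, Debug)]
struct Webhook {
    url: String,
    headers: BTreeMap<String, String>,
}

/// Mirrors the diff's `Webhooks::get_all`: the CLI webhook lives only in
/// memory and is exposed under the nil UUID, while runtime webhooks (the ones
/// persisted and dumped) keep their own UUIDs.
fn all_webhooks(
    cli: Option<&Webhook>,
    runtime: &BTreeMap<Uuid, Webhook>,
) -> BTreeMap<Uuid, Webhook> {
    cli.map(|wh| (Uuid::nil(), wh.clone()))
        .into_iter()
        .chain(runtime.iter().map(|(uuid, wh)| (*uuid, wh.clone())))
        .collect()
}

fn main() {
    let cli = Webhook { url: "https://example.com/cli".into(), headers: BTreeMap::new() };
    let mut runtime = BTreeMap::new();
    runtime.insert(
        Uuid::new_v4(),
        Webhook {
            url: "https://example.com/hook".into(),
            headers: BTreeMap::from([("Authorization".into(), "TOKEN".into())]),
        },
    );

    let all = all_webhooks(Some(&cli), &runtime);
    assert_eq!(all.len(), 2);
    // The CLI webhook is the entry under 00000000-0000-0000-0000-000000000000.
    assert!(all.contains_key(&Uuid::nil()));
}
```

Because the CLI entry always maps to `Uuid::nil()` and is never written to the database, restarting with a different `--task-webhook-url` swaps it out without touching the persisted runtime webhooks.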
@@ -284,10 +298,18 @@ impl IndexScheduler { let version = versioning::Versioning::new(&env, from_db_version)?; let mut wtxn = env.write_txn()?; + let features = features::FeatureData::new(&env, &mut wtxn, options.instance_features)?; let queue = Queue::new(&env, &mut wtxn, &options)?; let index_mapper = IndexMapper::new(&env, &mut wtxn, &options, budget)?; - let chat_settings = env.create_database(&mut wtxn, Some(CHAT_SETTINGS_DB_NAME))?; + let chat_settings = env.create_database(&mut wtxn, Some(db_name::CHAT_SETTINGS))?; + + let persisted = env.create_database(&mut wtxn, Some(db_name::PERSISTED))?; + let webhooks_db = persisted.remap_data_type::>(); + let mut webhooks = webhooks_db.get(&wtxn, db_keys::WEBHOOKS)?.unwrap_or_default(); + webhooks + .with_cli(options.cli_webhook_url.clone(), options.cli_webhook_authorization.clone()); + wtxn.commit()?; // allow unreachable_code to get rids of the warning in the case of a test build. @@ -303,8 +325,8 @@ impl IndexScheduler { experimental_no_edition_2024_for_dumps: options .indexer_config .experimental_no_edition_2024_for_dumps, - webhook_url: options.webhook_url, - webhook_authorization_header: options.webhook_authorization_header, + persisted, + webhooks: Arc::new(webhooks), embedders: Default::default(), #[cfg(test)] @@ -752,86 +774,92 @@ impl IndexScheduler { Ok(()) } - /// Once the tasks changes have been committed we must send all the tasks that were updated to our webhook if there is one. - fn notify_webhook(&self, updated: &RoaringBitmap) -> Result<()> { - if let Some(ref url) = self.webhook_url { - struct TaskReader<'a, 'b> { - rtxn: &'a RoTxn<'a>, - index_scheduler: &'a IndexScheduler, - tasks: &'b mut roaring::bitmap::Iter<'b>, - buffer: Vec, - written: usize, - } + /// Once the tasks changes have been committed we must send all the tasks that were updated to our webhooks + fn notify_webhooks(&self, updated: RoaringBitmap) { + struct TaskReader<'a, 'b> { + rtxn: &'a RoTxn<'a>, + index_scheduler: &'a IndexScheduler, + tasks: &'b mut roaring::bitmap::Iter<'b>, + buffer: Vec, + written: usize, + } - impl Read for TaskReader<'_, '_> { - fn read(&mut self, mut buf: &mut [u8]) -> std::io::Result { - if self.buffer.is_empty() { - match self.tasks.next() { - None => return Ok(0), - Some(task_id) => { - let task = self - .index_scheduler - .queue - .tasks - .get_task(self.rtxn, task_id) - .map_err(|err| io::Error::new(io::ErrorKind::Other, err))? - .ok_or_else(|| { - io::Error::new( - io::ErrorKind::Other, - Error::CorruptedTaskQueue, - ) - })?; + impl Read for TaskReader<'_, '_> { + fn read(&mut self, mut buf: &mut [u8]) -> std::io::Result { + if self.buffer.is_empty() { + match self.tasks.next() { + None => return Ok(0), + Some(task_id) => { + let task = self + .index_scheduler + .queue + .tasks + .get_task(self.rtxn, task_id) + .map_err(|err| io::Error::new(io::ErrorKind::Other, err))? 
+ .ok_or_else(|| { + io::Error::new(io::ErrorKind::Other, Error::CorruptedTaskQueue) + })?; - serde_json::to_writer( - &mut self.buffer, - &TaskView::from_task(&task), - )?; - self.buffer.push(b'\n'); - } + serde_json::to_writer(&mut self.buffer, &TaskView::from_task(&task))?; + self.buffer.push(b'\n'); } } - - let mut to_write = &self.buffer[self.written..]; - let wrote = io::copy(&mut to_write, &mut buf)?; - self.written += wrote as usize; - - // we wrote everything and must refresh our buffer on the next call - if self.written == self.buffer.len() { - self.written = 0; - self.buffer.clear(); - } - - Ok(wrote as usize) } - } - let rtxn = self.env.read_txn()?; + let mut to_write = &self.buffer[self.written..]; + let wrote = io::copy(&mut to_write, &mut buf)?; + self.written += wrote as usize; - let task_reader = TaskReader { - rtxn: &rtxn, - index_scheduler: self, - tasks: &mut updated.into_iter(), - buffer: Vec::with_capacity(50), // on average a task is around ~100 bytes - written: 0, - }; + // we wrote everything and must refresh our buffer on the next call + if self.written == self.buffer.len() { + self.written = 0; + self.buffer.clear(); + } - // let reader = GzEncoder::new(BufReader::new(task_reader), Compression::default()); - let reader = GzEncoder::new(BufReader::new(task_reader), Compression::default()); - let request = ureq::post(url) - .timeout(Duration::from_secs(30)) - .set("Content-Encoding", "gzip") - .set("Content-Type", "application/x-ndjson"); - let request = match &self.webhook_authorization_header { - Some(header) => request.set("Authorization", header), - None => request, - }; - - if let Err(e) = request.send(reader) { - tracing::error!("While sending data to the webhook: {e}"); + Ok(wrote as usize) } } - Ok(()) + let webhooks = self.webhooks.get_all(); + if webhooks.is_empty() { + return; + } + let this = self.private_clone(); + // We must take the RoTxn before entering the thread::spawn otherwise another batch may be + // processed before we had the time to take our txn. 
+        let rtxn = match self.env.clone().static_read_txn() {
+            Ok(rtxn) => rtxn,
+            Err(e) => {
+                tracing::error!("Couldn't get an rtxn to notify the webhook: {e}");
+                return;
+            }
+        };
+
+        std::thread::spawn(move || {
+            for (uuid, Webhook { url, headers }) in webhooks.iter() {
+                let task_reader = TaskReader {
+                    rtxn: &rtxn,
+                    index_scheduler: &this,
+                    tasks: &mut updated.iter(),
+                    buffer: Vec::with_capacity(page_size::get()),
+                    written: 0,
+                };
+
+                let reader = GzEncoder::new(BufReader::new(task_reader), Compression::default());
+
+                let mut request = ureq::post(url)
+                    .timeout(Duration::from_secs(30))
+                    .set("Content-Encoding", "gzip")
+                    .set("Content-Type", "application/x-ndjson");
+                for (header_name, header_value) in headers.iter() {
+                    request = request.set(header_name, header_value);
+                }
+
+                if let Err(e) = request.send(reader) {
+                    tracing::error!("While sending data to the webhook {uuid}: {e}");
+                }
+            }
+        });
     }
 
     pub fn index_stats(&self, index_uid: &str) -> Result<IndexStats> {
@@ -862,6 +890,29 @@ impl IndexScheduler {
         self.features.network()
     }
 
+    pub fn update_runtime_webhooks(&self, runtime: RuntimeWebhooks) -> Result<()> {
+        let webhooks = Webhooks::from_runtime(runtime);
+        let mut wtxn = self.env.write_txn()?;
+        let webhooks_db = self.persisted.remap_data_type::<SerdeJson<Webhooks>>();
+        webhooks_db.put(&mut wtxn, db_keys::WEBHOOKS, &webhooks)?;
+        wtxn.commit()?;
+        self.webhooks.update_runtime(webhooks.into_runtime());
+        Ok(())
+    }
+
+    pub fn webhooks_dump_view(&self) -> WebhooksDumpView {
+        // We must not dump the CLI API key
+        WebhooksDumpView { webhooks: self.webhooks.get_runtime() }
+    }
+
+    pub fn webhooks_view(&self) -> WebhooksView {
+        WebhooksView { webhooks: self.webhooks.get_all() }
+    }
+
+    pub fn retrieve_runtime_webhooks(&self) -> RuntimeWebhooks {
+        self.webhooks.get_runtime()
+    }
+
     pub fn embedders(
         &self,
         index_uid: String,
@@ -990,3 +1041,72 @@ pub struct IndexStats {
     /// Internal stats computed from the index.
     pub inner_stats: index_mapper::IndexStats,
 }
+
+/// These structures are not meant to be exposed to the end user; if needed, use the meilisearch-types::webhooks structures instead.
+/// /!\ Every time you deserialize this structure you should fill the cli_webhook later on with the `with_cli` method. /!\
+#[derive(Debug, Serialize, Deserialize, Default)]
+#[serde(rename_all = "camelCase")]
+struct Webhooks {
+    // The cli webhook should *never* be stored in a database.
+    // It represents a state that only exists for this execution of Meilisearch.
+    #[serde(skip)]
+    pub cli: Option<CliWebhook>,
+
+    #[serde(default)]
+    pub runtime: RwLock<RuntimeWebhooks>,
+}
+
+type RuntimeWebhooks = BTreeMap<Uuid, Webhook>;
+
+impl Webhooks {
+    pub fn with_cli(&mut self, url: Option<String>, auth: Option<String>) {
+        if let Some(url) = url {
+            let webhook = CliWebhook { url, auth };
+            self.cli = Some(webhook);
+        }
+    }
+
+    pub fn from_runtime(webhooks: RuntimeWebhooks) -> Self {
+        Self { cli: None, runtime: RwLock::new(webhooks) }
+    }
+
+    pub fn into_runtime(self) -> RuntimeWebhooks {
+        // safe because we own self and it cannot be cloned
+        self.runtime.into_inner().unwrap()
+    }
+
+    pub fn update_runtime(&self, webhooks: RuntimeWebhooks) {
+        *self.runtime.write().unwrap() = webhooks;
+    }
+
+    /// Returns all the webhooks in a unified view. The CLI webhook is represented with a UUID set to 0.
+    pub fn get_all(&self) -> BTreeMap<Uuid, Webhook> {
+        self.cli
+            .as_ref()
+            .map(|wh| (Uuid::nil(), Webhook::from(wh)))
+            .into_iter()
+            .chain(self.runtime.read().unwrap().iter().map(|(uuid, wh)| (*uuid, wh.clone())))
+            .collect()
+    }
+
+    /// Returns all the runtime webhooks.
+ pub fn get_runtime(&self) -> BTreeMap { + self.runtime.read().unwrap().iter().map(|(uuid, wh)| (*uuid, wh.clone())).collect() + } +} + +#[derive(Debug, Serialize, Deserialize, Default, Clone, PartialEq)] +struct CliWebhook { + pub url: String, + pub auth: Option, +} + +impl From<&CliWebhook> for Webhook { + fn from(webhook: &CliWebhook) -> Self { + let mut headers = BTreeMap::new(); + if let Some(ref auth) = webhook.auth { + headers.insert("Authorization".to_string(), auth.to_string()); + } + Self { url: webhook.url.to_string(), headers } + } +} diff --git a/crates/index-scheduler/src/processing.rs b/crates/index-scheduler/src/processing.rs index fdd8e42ef..3da81f143 100644 --- a/crates/index-scheduler/src/processing.rs +++ b/crates/index-scheduler/src/processing.rs @@ -108,6 +108,7 @@ make_enum_progress! { DumpTheBatches, DumpTheIndexes, DumpTheExperimentalFeatures, + DumpTheWebhooks, CompressTheDump, } } diff --git a/crates/index-scheduler/src/scheduler/mod.rs b/crates/index-scheduler/src/scheduler/mod.rs index 5ac591143..b2bb90c0b 100644 --- a/crates/index-scheduler/src/scheduler/mod.rs +++ b/crates/index-scheduler/src/scheduler/mod.rs @@ -446,8 +446,7 @@ impl IndexScheduler { Ok(()) })?; - // We shouldn't crash the tick function if we can't send data to the webhook. - let _ = self.notify_webhook(&ids); + self.notify_webhooks(ids); #[cfg(test)] self.breakpoint(crate::test_utils::Breakpoint::AfterProcessing); diff --git a/crates/index-scheduler/src/scheduler/process_dump_creation.rs b/crates/index-scheduler/src/scheduler/process_dump_creation.rs index b14f23d0b..4f3ec0fdd 100644 --- a/crates/index-scheduler/src/scheduler/process_dump_creation.rs +++ b/crates/index-scheduler/src/scheduler/process_dump_creation.rs @@ -270,6 +270,11 @@ impl IndexScheduler { let network = self.network(); dump.create_network(network)?; + // 7. 
Dump the webhooks + progress.update_progress(DumpCreationProgress::DumpTheWebhooks); + let webhooks = self.webhooks_dump_view(); + dump.create_webhooks(webhooks)?; + let dump_uid = started_at.format(format_description!( "[year repr:full][month repr:numerical][day padding:zero]-[hour padding:zero][minute padding:zero][second padding:zero][subsecond digits:3]" )).unwrap(); diff --git a/crates/index-scheduler/src/scheduler/snapshots/test_failure.rs/upgrade_failure/after_processing_everything.snap b/crates/index-scheduler/src/scheduler/snapshots/test_failure.rs/upgrade_failure/after_processing_everything.snap index 0b5d4409d..d700dd3db 100644 --- a/crates/index-scheduler/src/scheduler/snapshots/test_failure.rs/upgrade_failure/after_processing_everything.snap +++ b/crates/index-scheduler/src/scheduler/snapshots/test_failure.rs/upgrade_failure/after_processing_everything.snap @@ -6,7 +6,7 @@ source: crates/index-scheduler/src/scheduler/test_failure.rs [] ---------------------------------------------------------------------- ### All Tasks: -0 {uid: 0, batch_uid: 0, status: succeeded, details: { from: (1, 12, 0), to: (1, 16, 0) }, kind: UpgradeDatabase { from: (1, 12, 0) }} +0 {uid: 0, batch_uid: 0, status: succeeded, details: { from: (1, 12, 0), to: (1, 17, 1) }, kind: UpgradeDatabase { from: (1, 12, 0) }} 1 {uid: 1, batch_uid: 1, status: succeeded, details: { primary_key: Some("mouse") }, kind: IndexCreation { index_uid: "catto", primary_key: Some("mouse") }} 2 {uid: 2, batch_uid: 2, status: succeeded, details: { primary_key: Some("bone") }, kind: IndexCreation { index_uid: "doggo", primary_key: Some("bone") }} 3 {uid: 3, batch_uid: 3, status: failed, error: ResponseError { code: 200, message: "Index `doggo` already exists.", error_code: "index_already_exists", error_type: "invalid_request", error_link: "https://docs.meilisearch.com/errors#index_already_exists" }, details: { primary_key: Some("bone") }, kind: IndexCreation { index_uid: "doggo", primary_key: Some("bone") }} @@ -57,7 +57,7 @@ girafo: { number_of_documents: 0, field_distribution: {} } [timestamp] [4,] ---------------------------------------------------------------------- ### All Batches: -0 {uid: 0, details: {"upgradeFrom":"v1.12.0","upgradeTo":"v1.16.0"}, stats: {"totalNbTasks":1,"status":{"succeeded":1},"types":{"upgradeDatabase":1},"indexUids":{}}, stop reason: "stopped after the last task of type `upgradeDatabase` because they cannot be batched with tasks of any other type.", } +0 {uid: 0, details: {"upgradeFrom":"v1.12.0","upgradeTo":"v1.17.1"}, stats: {"totalNbTasks":1,"status":{"succeeded":1},"types":{"upgradeDatabase":1},"indexUids":{}}, stop reason: "stopped after the last task of type `upgradeDatabase` because they cannot be batched with tasks of any other type.", } 1 {uid: 1, details: {"primaryKey":"mouse"}, stats: {"totalNbTasks":1,"status":{"succeeded":1},"types":{"indexCreation":1},"indexUids":{"catto":1}}, stop reason: "created batch containing only task with id 1 of type `indexCreation` that cannot be batched with any other task.", } 2 {uid: 2, details: {"primaryKey":"bone"}, stats: {"totalNbTasks":1,"status":{"succeeded":1},"types":{"indexCreation":1},"indexUids":{"doggo":1}}, stop reason: "created batch containing only task with id 2 of type `indexCreation` that cannot be batched with any other task.", } 3 {uid: 3, details: {"primaryKey":"bone"}, stats: {"totalNbTasks":1,"status":{"failed":1},"types":{"indexCreation":1},"indexUids":{"doggo":1}}, stop reason: "created batch containing only task with id 3 of type 
`indexCreation` that cannot be batched with any other task.", } diff --git a/crates/index-scheduler/src/scheduler/snapshots/test_failure.rs/upgrade_failure/register_automatic_upgrade_task.snap b/crates/index-scheduler/src/scheduler/snapshots/test_failure.rs/upgrade_failure/register_automatic_upgrade_task.snap index 0bfb9c6da..ee3cefba4 100644 --- a/crates/index-scheduler/src/scheduler/snapshots/test_failure.rs/upgrade_failure/register_automatic_upgrade_task.snap +++ b/crates/index-scheduler/src/scheduler/snapshots/test_failure.rs/upgrade_failure/register_automatic_upgrade_task.snap @@ -6,7 +6,7 @@ source: crates/index-scheduler/src/scheduler/test_failure.rs [] ---------------------------------------------------------------------- ### All Tasks: -0 {uid: 0, status: enqueued, details: { from: (1, 12, 0), to: (1, 16, 0) }, kind: UpgradeDatabase { from: (1, 12, 0) }} +0 {uid: 0, status: enqueued, details: { from: (1, 12, 0), to: (1, 17, 1) }, kind: UpgradeDatabase { from: (1, 12, 0) }} ---------------------------------------------------------------------- ### Status: enqueued [0,] diff --git a/crates/index-scheduler/src/scheduler/snapshots/test_failure.rs/upgrade_failure/registered_a_task_while_the_upgrade_task_is_enqueued.snap b/crates/index-scheduler/src/scheduler/snapshots/test_failure.rs/upgrade_failure/registered_a_task_while_the_upgrade_task_is_enqueued.snap index 8d374479b..abaffbb1b 100644 --- a/crates/index-scheduler/src/scheduler/snapshots/test_failure.rs/upgrade_failure/registered_a_task_while_the_upgrade_task_is_enqueued.snap +++ b/crates/index-scheduler/src/scheduler/snapshots/test_failure.rs/upgrade_failure/registered_a_task_while_the_upgrade_task_is_enqueued.snap @@ -6,7 +6,7 @@ source: crates/index-scheduler/src/scheduler/test_failure.rs [] ---------------------------------------------------------------------- ### All Tasks: -0 {uid: 0, status: enqueued, details: { from: (1, 12, 0), to: (1, 16, 0) }, kind: UpgradeDatabase { from: (1, 12, 0) }} +0 {uid: 0, status: enqueued, details: { from: (1, 12, 0), to: (1, 17, 1) }, kind: UpgradeDatabase { from: (1, 12, 0) }} 1 {uid: 1, status: enqueued, details: { primary_key: Some("mouse") }, kind: IndexCreation { index_uid: "catto", primary_key: Some("mouse") }} ---------------------------------------------------------------------- ### Status: diff --git a/crates/index-scheduler/src/scheduler/snapshots/test_failure.rs/upgrade_failure/upgrade_task_failed.snap b/crates/index-scheduler/src/scheduler/snapshots/test_failure.rs/upgrade_failure/upgrade_task_failed.snap index 9fc28abbe..9569ecfe3 100644 --- a/crates/index-scheduler/src/scheduler/snapshots/test_failure.rs/upgrade_failure/upgrade_task_failed.snap +++ b/crates/index-scheduler/src/scheduler/snapshots/test_failure.rs/upgrade_failure/upgrade_task_failed.snap @@ -6,7 +6,7 @@ source: crates/index-scheduler/src/scheduler/test_failure.rs [] ---------------------------------------------------------------------- ### All Tasks: -0 {uid: 0, batch_uid: 0, status: failed, error: ResponseError { code: 200, message: "Planned failure for tests.", error_code: "internal", error_type: "internal", error_link: "https://docs.meilisearch.com/errors#internal" }, details: { from: (1, 12, 0), to: (1, 16, 0) }, kind: UpgradeDatabase { from: (1, 12, 0) }} +0 {uid: 0, batch_uid: 0, status: failed, error: ResponseError { code: 200, message: "Planned failure for tests.", error_code: "internal", error_type: "internal", error_link: "https://docs.meilisearch.com/errors#internal" }, details: { from: (1, 12, 0), to: 
(1, 17, 1) }, kind: UpgradeDatabase { from: (1, 12, 0) }} 1 {uid: 1, status: enqueued, details: { primary_key: Some("mouse") }, kind: IndexCreation { index_uid: "catto", primary_key: Some("mouse") }} ---------------------------------------------------------------------- ### Status: @@ -37,7 +37,7 @@ catto [1,] [timestamp] [0,] ---------------------------------------------------------------------- ### All Batches: -0 {uid: 0, details: {"upgradeFrom":"v1.12.0","upgradeTo":"v1.16.0"}, stats: {"totalNbTasks":1,"status":{"failed":1},"types":{"upgradeDatabase":1},"indexUids":{}}, stop reason: "stopped after the last task of type `upgradeDatabase` because they cannot be batched with tasks of any other type.", } +0 {uid: 0, details: {"upgradeFrom":"v1.12.0","upgradeTo":"v1.17.1"}, stats: {"totalNbTasks":1,"status":{"failed":1},"types":{"upgradeDatabase":1},"indexUids":{}}, stop reason: "stopped after the last task of type `upgradeDatabase` because they cannot be batched with tasks of any other type.", } ---------------------------------------------------------------------- ### Batch to tasks mapping: 0 [0,] diff --git a/crates/index-scheduler/src/scheduler/snapshots/test_failure.rs/upgrade_failure/upgrade_task_failed_again.snap b/crates/index-scheduler/src/scheduler/snapshots/test_failure.rs/upgrade_failure/upgrade_task_failed_again.snap index 33ddf7193..1d7945023 100644 --- a/crates/index-scheduler/src/scheduler/snapshots/test_failure.rs/upgrade_failure/upgrade_task_failed_again.snap +++ b/crates/index-scheduler/src/scheduler/snapshots/test_failure.rs/upgrade_failure/upgrade_task_failed_again.snap @@ -6,7 +6,7 @@ source: crates/index-scheduler/src/scheduler/test_failure.rs [] ---------------------------------------------------------------------- ### All Tasks: -0 {uid: 0, batch_uid: 0, status: failed, error: ResponseError { code: 200, message: "Planned failure for tests.", error_code: "internal", error_type: "internal", error_link: "https://docs.meilisearch.com/errors#internal" }, details: { from: (1, 12, 0), to: (1, 16, 0) }, kind: UpgradeDatabase { from: (1, 12, 0) }} +0 {uid: 0, batch_uid: 0, status: failed, error: ResponseError { code: 200, message: "Planned failure for tests.", error_code: "internal", error_type: "internal", error_link: "https://docs.meilisearch.com/errors#internal" }, details: { from: (1, 12, 0), to: (1, 17, 1) }, kind: UpgradeDatabase { from: (1, 12, 0) }} 1 {uid: 1, status: enqueued, details: { primary_key: Some("mouse") }, kind: IndexCreation { index_uid: "catto", primary_key: Some("mouse") }} 2 {uid: 2, status: enqueued, details: { primary_key: Some("bone") }, kind: IndexCreation { index_uid: "doggo", primary_key: Some("bone") }} ---------------------------------------------------------------------- @@ -40,7 +40,7 @@ doggo [2,] [timestamp] [0,] ---------------------------------------------------------------------- ### All Batches: -0 {uid: 0, details: {"upgradeFrom":"v1.12.0","upgradeTo":"v1.16.0"}, stats: {"totalNbTasks":1,"status":{"failed":1},"types":{"upgradeDatabase":1},"indexUids":{}}, stop reason: "stopped after the last task of type `upgradeDatabase` because they cannot be batched with tasks of any other type.", } +0 {uid: 0, details: {"upgradeFrom":"v1.12.0","upgradeTo":"v1.17.1"}, stats: {"totalNbTasks":1,"status":{"failed":1},"types":{"upgradeDatabase":1},"indexUids":{}}, stop reason: "stopped after the last task of type `upgradeDatabase` because they cannot be batched with tasks of any other type.", } 
---------------------------------------------------------------------- ### Batch to tasks mapping: 0 [0,] diff --git a/crates/index-scheduler/src/scheduler/snapshots/test_failure.rs/upgrade_failure/upgrade_task_succeeded.snap b/crates/index-scheduler/src/scheduler/snapshots/test_failure.rs/upgrade_failure/upgrade_task_succeeded.snap index 05d366d1e..869d1d0b2 100644 --- a/crates/index-scheduler/src/scheduler/snapshots/test_failure.rs/upgrade_failure/upgrade_task_succeeded.snap +++ b/crates/index-scheduler/src/scheduler/snapshots/test_failure.rs/upgrade_failure/upgrade_task_succeeded.snap @@ -6,7 +6,7 @@ source: crates/index-scheduler/src/scheduler/test_failure.rs [] ---------------------------------------------------------------------- ### All Tasks: -0 {uid: 0, batch_uid: 0, status: succeeded, details: { from: (1, 12, 0), to: (1, 16, 0) }, kind: UpgradeDatabase { from: (1, 12, 0) }} +0 {uid: 0, batch_uid: 0, status: succeeded, details: { from: (1, 12, 0), to: (1, 17, 1) }, kind: UpgradeDatabase { from: (1, 12, 0) }} 1 {uid: 1, status: enqueued, details: { primary_key: Some("mouse") }, kind: IndexCreation { index_uid: "catto", primary_key: Some("mouse") }} 2 {uid: 2, status: enqueued, details: { primary_key: Some("bone") }, kind: IndexCreation { index_uid: "doggo", primary_key: Some("bone") }} 3 {uid: 3, status: enqueued, details: { primary_key: Some("bone") }, kind: IndexCreation { index_uid: "doggo", primary_key: Some("bone") }} @@ -43,7 +43,7 @@ doggo [2,3,] [timestamp] [0,] ---------------------------------------------------------------------- ### All Batches: -0 {uid: 0, details: {"upgradeFrom":"v1.12.0","upgradeTo":"v1.16.0"}, stats: {"totalNbTasks":1,"status":{"succeeded":1},"types":{"upgradeDatabase":1},"indexUids":{}}, stop reason: "stopped after the last task of type `upgradeDatabase` because they cannot be batched with tasks of any other type.", } +0 {uid: 0, details: {"upgradeFrom":"v1.12.0","upgradeTo":"v1.17.1"}, stats: {"totalNbTasks":1,"status":{"succeeded":1},"types":{"upgradeDatabase":1},"indexUids":{}}, stop reason: "stopped after the last task of type `upgradeDatabase` because they cannot be batched with tasks of any other type.", } ---------------------------------------------------------------------- ### Batch to tasks mapping: 0 [0,] diff --git a/crates/index-scheduler/src/test_utils.rs b/crates/index-scheduler/src/test_utils.rs index bfed7f53a..36de0ed9e 100644 --- a/crates/index-scheduler/src/test_utils.rs +++ b/crates/index-scheduler/src/test_utils.rs @@ -98,8 +98,8 @@ impl IndexScheduler { indexes_path: tempdir.path().join("indexes"), snapshots_path: tempdir.path().join("snapshots"), dumps_path: tempdir.path().join("dumps"), - webhook_url: None, - webhook_authorization_header: None, + cli_webhook_url: None, + cli_webhook_authorization: None, task_db_size: 1000 * 1000 * 10, // 10 MB, we don't use MiB on purpose. index_base_map_size: 1000 * 1000, // 1 MB, we don't use MiB on purpose. 
enable_mdb_writemap: false, diff --git a/crates/index-scheduler/src/upgrade/mod.rs b/crates/index-scheduler/src/upgrade/mod.rs index 2053caa92..a749b31d5 100644 --- a/crates/index-scheduler/src/upgrade/mod.rs +++ b/crates/index-scheduler/src/upgrade/mod.rs @@ -39,6 +39,7 @@ pub fn upgrade_index_scheduler( (1, 13, _) => 0, (1, 14, _) => 0, (1, 15, _) => 0, + (1, 16, _) => 0, (major, minor, patch) => { if major > current_major || (major == current_major && minor > current_minor) diff --git a/crates/meilisearch-auth/src/store.rs b/crates/meilisearch-auth/src/store.rs index eb2170f08..470379e06 100644 --- a/crates/meilisearch-auth/src/store.rs +++ b/crates/meilisearch-auth/src/store.rs @@ -137,6 +137,14 @@ impl HeedAuthStore { Action::ChatsSettingsAll => { actions.extend([Action::ChatsSettingsGet, Action::ChatsSettingsUpdate]); } + Action::WebhooksAll => { + actions.extend([ + Action::WebhooksGet, + Action::WebhooksUpdate, + Action::WebhooksDelete, + Action::WebhooksCreate, + ]); + } other => { actions.insert(*other); } diff --git a/crates/meilisearch-types/src/error.rs b/crates/meilisearch-types/src/error.rs index 458034c00..4360d947b 100644 --- a/crates/meilisearch-types/src/error.rs +++ b/crates/meilisearch-types/src/error.rs @@ -418,7 +418,16 @@ InvalidChatCompletionSearchDescriptionPrompt , InvalidRequest , BAD_REQU InvalidChatCompletionSearchQueryParamPrompt , InvalidRequest , BAD_REQUEST ; InvalidChatCompletionSearchFilterParamPrompt , InvalidRequest , BAD_REQUEST ; InvalidChatCompletionSearchIndexUidParamPrompt , InvalidRequest , BAD_REQUEST ; -InvalidChatCompletionPreQueryPrompt , InvalidRequest , BAD_REQUEST +InvalidChatCompletionPreQueryPrompt , InvalidRequest , BAD_REQUEST ; +// Webhooks +InvalidWebhooks , InvalidRequest , BAD_REQUEST ; +InvalidWebhookUrl , InvalidRequest , BAD_REQUEST ; +InvalidWebhookHeaders , InvalidRequest , BAD_REQUEST ; +ImmutableWebhook , InvalidRequest , BAD_REQUEST ; +InvalidWebhookUuid , InvalidRequest , BAD_REQUEST ; +WebhookNotFound , InvalidRequest , NOT_FOUND ; +ImmutableWebhookUuid , InvalidRequest , BAD_REQUEST ; +ImmutableWebhookIsEditable , InvalidRequest , BAD_REQUEST } impl ErrorCode for JoinError { diff --git a/crates/meilisearch-types/src/keys.rs b/crates/meilisearch-types/src/keys.rs index aec3199a3..06f621e70 100644 --- a/crates/meilisearch-types/src/keys.rs +++ b/crates/meilisearch-types/src/keys.rs @@ -365,6 +365,21 @@ pub enum Action { #[serde(rename = "*.get")] #[deserr(rename = "*.get")] AllGet, + #[serde(rename = "webhooks.get")] + #[deserr(rename = "webhooks.get")] + WebhooksGet, + #[serde(rename = "webhooks.update")] + #[deserr(rename = "webhooks.update")] + WebhooksUpdate, + #[serde(rename = "webhooks.delete")] + #[deserr(rename = "webhooks.delete")] + WebhooksDelete, + #[serde(rename = "webhooks.create")] + #[deserr(rename = "webhooks.create")] + WebhooksCreate, + #[serde(rename = "webhooks.*")] + #[deserr(rename = "webhooks.*")] + WebhooksAll, } impl Action { @@ -416,6 +431,11 @@ impl Action { NETWORK_GET => Some(Self::NetworkGet), NETWORK_UPDATE => Some(Self::NetworkUpdate), ALL_GET => Some(Self::AllGet), + WEBHOOKS_GET => Some(Self::WebhooksGet), + WEBHOOKS_UPDATE => Some(Self::WebhooksUpdate), + WEBHOOKS_DELETE => Some(Self::WebhooksDelete), + WEBHOOKS_CREATE => Some(Self::WebhooksCreate), + WEBHOOKS_ALL => Some(Self::WebhooksAll), _otherwise => None, } } @@ -428,7 +448,9 @@ impl Action { match self { // Any action that expands to others must return false, as it wouldn't be able to expand recursively. 
All | AllGet | DocumentsAll | IndexesAll | ChatsAll | TasksAll | SettingsAll
-            | StatsAll | MetricsAll | DumpsAll | SnapshotsAll | ChatsSettingsAll => false,
+            All | AllGet | DocumentsAll | IndexesAll | ChatsAll | TasksAll | SettingsAll
+            | StatsAll | MetricsAll | DumpsAll | SnapshotsAll | ChatsSettingsAll | WebhooksAll => {
+                false
+            }
 
             Search => true,
             DocumentsAdd => false,
@@ -463,6 +485,10 @@ impl Action {
             ChatsDelete => false,
             ChatsSettingsGet => true,
             ChatsSettingsUpdate => false,
+            WebhooksGet => true,
+            WebhooksUpdate => false,
+            WebhooksDelete => false,
+            WebhooksCreate => false,
         }
     }
 
@@ -522,6 +548,12 @@ pub mod actions {
     pub const CHATS_SETTINGS_ALL: u8 = ChatsSettingsAll.repr();
     pub const CHATS_SETTINGS_GET: u8 = ChatsSettingsGet.repr();
     pub const CHATS_SETTINGS_UPDATE: u8 = ChatsSettingsUpdate.repr();
+
+    pub const WEBHOOKS_GET: u8 = WebhooksGet.repr();
+    pub const WEBHOOKS_UPDATE: u8 = WebhooksUpdate.repr();
+    pub const WEBHOOKS_DELETE: u8 = WebhooksDelete.repr();
+    pub const WEBHOOKS_CREATE: u8 = WebhooksCreate.repr();
+    pub const WEBHOOKS_ALL: u8 = WebhooksAll.repr();
 }
 
 #[cfg(test)]
@@ -577,6 +609,11 @@ pub(crate) mod test {
         assert!(ChatsSettingsGet.repr() == 42 && CHATS_SETTINGS_GET == 42);
         assert!(ChatsSettingsUpdate.repr() == 43 && CHATS_SETTINGS_UPDATE == 43);
         assert!(AllGet.repr() == 44 && ALL_GET == 44);
+        assert!(WebhooksGet.repr() == 45 && WEBHOOKS_GET == 45);
+        assert!(WebhooksUpdate.repr() == 46 && WEBHOOKS_UPDATE == 46);
+        assert!(WebhooksDelete.repr() == 47 && WEBHOOKS_DELETE == 47);
+        assert!(WebhooksCreate.repr() == 48 && WEBHOOKS_CREATE == 48);
+        assert!(WebhooksAll.repr() == 49 && WEBHOOKS_ALL == 49);
     }
 
 #[test]
diff --git a/crates/meilisearch-types/src/lib.rs b/crates/meilisearch-types/src/lib.rs
index fe69da526..9857bfb29 100644
--- a/crates/meilisearch-types/src/lib.rs
+++ b/crates/meilisearch-types/src/lib.rs
@@ -15,6 +15,7 @@ pub mod star_or;
 pub mod task_view;
 pub mod tasks;
 pub mod versioning;
+pub mod webhooks;
 pub use milli::{heed, Index};
 use uuid::Uuid;
 pub use versioning::VERSION_FILE_NAME;
diff --git a/crates/meilisearch-types/src/webhooks.rs b/crates/meilisearch-types/src/webhooks.rs
new file mode 100644
index 000000000..7a35850ab
--- /dev/null
+++ b/crates/meilisearch-types/src/webhooks.rs
@@ -0,0 +1,28 @@
+use std::collections::BTreeMap;
+
+use serde::{Deserialize, Serialize};
+use uuid::Uuid;
+
+#[derive(Debug, Serialize, Deserialize, Clone, PartialEq)]
+#[serde(rename_all = "camelCase")]
+pub struct Webhook {
+    pub url: String,
+    #[serde(default)]
+    pub headers: BTreeMap<String, String>,
+}
+
+#[derive(Debug, Serialize, Default, Clone, PartialEq)]
+#[serde(rename_all = "camelCase")]
+pub struct WebhooksView {
+    #[serde(default)]
+    pub webhooks: BTreeMap<Uuid, Webhook>,
+}
+
+// Same as the WebhooksView, except it should never contain the CLI webhooks.
+// It's the right structure to use in the dump +#[derive(Debug, Deserialize, Serialize, Default, Clone, PartialEq)] +#[serde(rename_all = "camelCase")] +pub struct WebhooksDumpView { + #[serde(default)] + pub webhooks: BTreeMap, +} diff --git a/crates/meilisearch/src/lib.rs b/crates/meilisearch/src/lib.rs index 0fb93b65a..ca9bb6f50 100644 --- a/crates/meilisearch/src/lib.rs +++ b/crates/meilisearch/src/lib.rs @@ -223,8 +223,8 @@ pub fn setup_meilisearch(opt: &Opt) -> anyhow::Result<(Arc, Arc< indexes_path: opt.db_path.join("indexes"), snapshots_path: opt.snapshot_dir.clone(), dumps_path: opt.dump_dir.clone(), - webhook_url: opt.task_webhook_url.as_ref().map(|url| url.to_string()), - webhook_authorization_header: opt.task_webhook_authorization_header.clone(), + cli_webhook_url: opt.task_webhook_url.as_ref().map(|url| url.to_string()), + cli_webhook_authorization: opt.task_webhook_authorization_header.clone(), task_db_size: opt.max_task_db_size.as_u64() as usize, index_base_map_size: opt.max_index_size.as_u64() as usize, enable_mdb_writemap: opt.experimental_reduce_indexing_memory_usage, @@ -491,7 +491,12 @@ fn import_dump( let _ = std::fs::write(db_path.join("instance-uid"), instance_uid.to_string().as_bytes()); }; - // 2. Import the `Key`s. + // 2. Import the webhooks + if let Some(webhooks) = dump_reader.webhooks() { + index_scheduler.update_runtime_webhooks(webhooks.webhooks.clone())?; + } + + // 3. Import the `Key`s. let mut keys = Vec::new(); auth.raw_delete_all_keys()?; for key in dump_reader.keys()? { @@ -500,20 +505,20 @@ fn import_dump( keys.push(key); } - // 3. Import the `ChatCompletionSettings`s. + // 4. Import the `ChatCompletionSettings`s. for result in dump_reader.chat_completions_settings()? { let (name, settings) = result?; index_scheduler.put_chat_settings(&name, &settings)?; } - // 4. Import the runtime features and network + // 5. Import the runtime features and network let features = dump_reader.features()?.unwrap_or_default(); index_scheduler.put_runtime_features(features)?; let network = dump_reader.network()?.cloned().unwrap_or_default(); index_scheduler.put_network(network)?; - // 4.1 Use all cpus to process dump if `max_indexing_threads` not configured + // 5.1 Use all cpus to process dump if `max_indexing_threads` not configured let backup_config; let base_config = index_scheduler.indexer_config(); @@ -530,7 +535,7 @@ fn import_dump( // /!\ The tasks must be imported AFTER importing the indexes or else the scheduler might // try to process tasks while we're trying to import the indexes. - // 5. Import the indexes. + // 6. Import the indexes. for index_reader in dump_reader.indexes()? { let mut index_reader = index_reader?; let metadata = index_reader.metadata(); @@ -543,12 +548,12 @@ fn import_dump( let mut wtxn = index.write_txn()?; let mut builder = milli::update::Settings::new(&mut wtxn, &index, indexer_config); - // 5.1 Import the primary key if there is one. + // 6.1 Import the primary key if there is one. if let Some(ref primary_key) = metadata.primary_key { builder.set_primary_key(primary_key.to_string()); } - // 5.2 Import the settings. + // 6.2 Import the settings. tracing::info!("Importing the settings."); let settings = index_reader.settings()?; apply_settings_to_builder(&settings, &mut builder); @@ -560,8 +565,8 @@ fn import_dump( let rtxn = index.read_txn()?; if index_scheduler.no_edition_2024_for_dumps() { - // 5.3 Import the documents. - // 5.3.1 We need to recreate the grenad+obkv format accepted by the index. + // 6.3 Import the documents. 
+            // 6.3.1 We need to recreate the grenad+obkv format accepted by the index.
             tracing::info!("Importing the documents.");
             let file = tempfile::tempfile()?;
             let mut builder = DocumentsBatchBuilder::new(BufWriter::new(file));
@@ -572,7 +577,7 @@ fn import_dump(
             // This flush the content of the batch builder.
             let file = builder.into_inner()?.into_inner()?;

-            // 5.3.2 We feed it to the milli index.
+            // 6.3.2 We feed it to the milli index.
             let reader = BufReader::new(file);
             let reader = DocumentsBatchReader::from_reader(reader)?;

@@ -651,15 +656,15 @@ fn import_dump(
         index_scheduler.refresh_index_stats(&uid)?;
     }

-    // 6. Import the queue
+    // 7. Import the queue
     let mut index_scheduler_dump = index_scheduler.register_dumped_task()?;
-    // 6.1. Import the batches
+    // 7.1. Import the batches
     for ret in dump_reader.batches()? {
         let batch = ret?;
         index_scheduler_dump.register_dumped_batch(batch)?;
     }

-    // 6.2. Import the tasks
+    // 7.2. Import the tasks
     for ret in dump_reader.tasks()? {
         let (task, file) = ret?;
         index_scheduler_dump.register_dumped_task(task, file)?;
diff --git a/crates/meilisearch/src/option.rs b/crates/meilisearch/src/option.rs
index dd77a1222..e27fa08cd 100644
--- a/crates/meilisearch/src/option.rs
+++ b/crates/meilisearch/src/option.rs
@@ -206,11 +206,13 @@ pub struct Opt {
     pub env: String,

     /// Called whenever a task finishes so a third party can be notified.
+    /// See also the dedicated API `/webhooks`.
     #[clap(long, env = MEILI_TASK_WEBHOOK_URL)]
     pub task_webhook_url: Option<Url>,

     /// The Authorization header to send on the webhook URL whenever
     /// a task finishes so a third party can be notified.
+    /// See also the dedicated API `/webhooks`.
     #[clap(long, env = MEILI_TASK_WEBHOOK_AUTHORIZATION_HEADER)]
     pub task_webhook_authorization_header: Option<String>,

diff --git a/crates/meilisearch/src/routes/indexes/search_analytics.rs b/crates/meilisearch/src/routes/indexes/search_analytics.rs
index 6b3b7ea46..e27e6347b 100644
--- a/crates/meilisearch/src/routes/indexes/search_analytics.rs
+++ b/crates/meilisearch/src/routes/indexes/search_analytics.rs
@@ -226,6 +226,7 @@ impl SearchAggregator {
         let SearchResult {
             hits: _,
             query: _,
+            query_vector: _,
             processing_time_ms,
             hits_info: _,
             semantic_hit_count: _,
diff --git a/crates/meilisearch/src/routes/indexes/settings.rs b/crates/meilisearch/src/routes/indexes/settings.rs
index 308977a6e..10120ebff 100644
--- a/crates/meilisearch/src/routes/indexes/settings.rs
+++ b/crates/meilisearch/src/routes/indexes/settings.rs
@@ -511,7 +511,7 @@ make_setting_routes!(
     },
     {
         route: "/chat",
-        update_verb: put,
+        update_verb: patch,
         value_type: ChatSettings,
         err_type: meilisearch_types::deserr::DeserrJsonError<
             meilisearch_types::error::deserr_codes::InvalidSettingsIndexChat,
diff --git a/crates/meilisearch/src/routes/mod.rs b/crates/meilisearch/src/routes/mod.rs
index 260d973a1..745ac5824 100644
--- a/crates/meilisearch/src/routes/mod.rs
+++ b/crates/meilisearch/src/routes/mod.rs
@@ -41,6 +41,7 @@ use crate::routes::indexes::IndexView;
 use crate::routes::multi_search::SearchResults;
 use crate::routes::network::{Network, Remote};
 use crate::routes::swap_indexes::SwapIndexesPayload;
+use crate::routes::webhooks::{WebhookResults, WebhookSettings, WebhookWithMetadata};
 use crate::search::{
     FederatedSearch, FederatedSearchResult, Federation, FederationOptions, MergeFacets,
     SearchQueryWithIndex, SearchResultWithIndex, SimilarQuery, SimilarResult,
@@ -70,6 +71,7 @@ mod swap_indexes;
 pub mod tasks;
 #[cfg(test)]
 mod tasks_test;
+mod webhooks;

 #[derive(OpenApi)]
 #[openapi(
@@
-89,6 +91,7 @@ mod tasks_test; (path = "/experimental-features", api = features::ExperimentalFeaturesApi), (path = "/export", api = export::ExportApi), (path = "/network", api = network::NetworkApi), + (path = "/webhooks", api = webhooks::WebhooksApi), ), paths(get_health, get_version, get_stats), tags( @@ -99,7 +102,7 @@ mod tasks_test; url = "/", description = "Local server", )), - components(schemas(PaginationView, PaginationView, IndexView, DocumentDeletionByFilter, AllBatches, BatchStats, ProgressStepView, ProgressView, BatchView, RuntimeTogglableFeatures, SwapIndexesPayload, DocumentEditionByFunction, MergeFacets, FederationOptions, SearchQueryWithIndex, Federation, FederatedSearch, FederatedSearchResult, SearchResults, SearchResultWithIndex, SimilarQuery, SimilarResult, PaginationView, BrowseQuery, UpdateIndexRequest, IndexUid, IndexCreateRequest, KeyView, Action, CreateApiKey, UpdateStderrLogs, LogMode, GetLogs, IndexStats, Stats, HealthStatus, HealthResponse, VersionResponse, Code, ErrorType, AllTasks, TaskView, Status, DetailsView, ResponseError, Settings, Settings, TypoSettings, MinWordSizeTyposSetting, FacetingSettings, PaginationSettings, SummarizedTaskView, Kind, Network, Remote, FilterableAttributesRule, FilterableAttributesPatterns, AttributePatterns, FilterableAttributesFeatures, FilterFeatures, Export)) + components(schemas(PaginationView, PaginationView, IndexView, DocumentDeletionByFilter, AllBatches, BatchStats, ProgressStepView, ProgressView, BatchView, RuntimeTogglableFeatures, SwapIndexesPayload, DocumentEditionByFunction, MergeFacets, FederationOptions, SearchQueryWithIndex, Federation, FederatedSearch, FederatedSearchResult, SearchResults, SearchResultWithIndex, SimilarQuery, SimilarResult, PaginationView, BrowseQuery, UpdateIndexRequest, IndexUid, IndexCreateRequest, KeyView, Action, CreateApiKey, UpdateStderrLogs, LogMode, GetLogs, IndexStats, Stats, HealthStatus, HealthResponse, VersionResponse, Code, ErrorType, AllTasks, TaskView, Status, DetailsView, ResponseError, Settings, Settings, TypoSettings, MinWordSizeTyposSetting, FacetingSettings, PaginationSettings, SummarizedTaskView, Kind, Network, Remote, FilterableAttributesRule, FilterableAttributesPatterns, AttributePatterns, FilterableAttributesFeatures, FilterFeatures, Export, WebhookSettings, WebhookResults, WebhookWithMetadata)) )] pub struct MeilisearchApi; @@ -120,7 +123,8 @@ pub fn configure(cfg: &mut web::ServiceConfig) { .service(web::scope("/experimental-features").configure(features::configure)) .service(web::scope("/network").configure(network::configure)) .service(web::scope("/export").configure(export::configure)) - .service(web::scope("/chats").configure(chats::configure)); + .service(web::scope("/chats").configure(chats::configure)) + .service(web::scope("/webhooks").configure(webhooks::configure)); #[cfg(feature = "swagger")] { diff --git a/crates/meilisearch/src/routes/network.rs b/crates/meilisearch/src/routes/network.rs index 7e58df113..4afa32c09 100644 --- a/crates/meilisearch/src/routes/network.rs +++ b/crates/meilisearch/src/routes/network.rs @@ -51,7 +51,7 @@ pub fn configure(cfg: &mut web::ServiceConfig) { get, path = "", tag = "Network", - security(("Bearer" = ["network.get", "network.*", "*"])), + security(("Bearer" = ["network.get", "*"])), responses( (status = OK, description = "Known nodes are returned", body = Network, content_type = "application/json", example = json!( { @@ -168,7 +168,7 @@ impl Aggregate for PatchNetworkAnalytics { path = "", tag = "Network", request_body = 
Network,
-    security(("Bearer" = ["network.update", "network.*", "*"])),
+    security(("Bearer" = ["network.update", "*"])),
     responses(
         (status = OK, description = "New network state is returned", body = Network, content_type = "application/json", example = json!(
             {
diff --git a/crates/meilisearch/src/routes/webhooks.rs b/crates/meilisearch/src/routes/webhooks.rs
new file mode 100644
index 000000000..b25b19336
--- /dev/null
+++ b/crates/meilisearch/src/routes/webhooks.rs
@@ -0,0 +1,474 @@
+use std::collections::BTreeMap;
+use std::str::FromStr;
+
+use actix_http::header::{
+    HeaderName, HeaderValue, InvalidHeaderName as ActixInvalidHeaderName,
+    InvalidHeaderValue as ActixInvalidHeaderValue,
+};
+use actix_web::web::{self, Data, Path};
+use actix_web::{HttpRequest, HttpResponse};
+use core::convert::Infallible;
+use deserr::actix_web::AwebJson;
+use deserr::{DeserializeError, Deserr, ValuePointerRef};
+use index_scheduler::IndexScheduler;
+use meilisearch_types::deserr::{immutable_field_error, DeserrJsonError};
+use meilisearch_types::error::deserr_codes::{
+    BadRequest, InvalidWebhookHeaders, InvalidWebhookUrl,
+};
+use meilisearch_types::error::{Code, ErrorCode, ResponseError};
+use meilisearch_types::keys::actions;
+use meilisearch_types::milli::update::Setting;
+use meilisearch_types::webhooks::Webhook;
+use serde::Serialize;
+use tracing::debug;
+use url::Url;
+use utoipa::{OpenApi, ToSchema};
+use uuid::Uuid;
+
+use crate::analytics::{Aggregate, Analytics};
+use crate::extractors::authentication::policies::ActionPolicy;
+use crate::extractors::authentication::GuardedData;
+use crate::extractors::sequential_extractor::SeqHandler;
+use WebhooksError::*;
+
+#[derive(OpenApi)]
+#[openapi(
+    paths(get_webhooks, get_webhook, post_webhook, patch_webhook, delete_webhook),
+    tags((
+        name = "Webhooks",
+        description = "The `/webhooks` route allows you to register endpoints to be called once tasks are processed.",
+        external_docs(url = "https://www.meilisearch.com/docs/reference/api/webhooks"),
+    )),
+)]
+pub struct WebhooksApi;
+
+pub fn configure(cfg: &mut web::ServiceConfig) {
+    cfg.service(
+        web::resource("")
+            .route(web::get().to(get_webhooks))
+            .route(web::post().to(SeqHandler(post_webhook))),
+    )
+    .service(
+        web::resource("/{uuid}")
+            .route(web::get().to(get_webhook))
+            .route(web::patch().to(SeqHandler(patch_webhook)))
+            .route(web::delete().to(SeqHandler(delete_webhook))),
+    );
+}
+
+#[derive(Debug, Deserr, ToSchema)]
+#[deserr(error = DeserrJsonError, rename_all = camelCase, deny_unknown_fields = deny_immutable_fields_webhook)]
+#[serde(rename_all = "camelCase")]
+#[schema(rename_all = "camelCase")]
+pub(super) struct WebhookSettings {
+    #[schema(value_type = Option<String>, example = "https://your.site/on-tasks-completed")]
+    #[deserr(default, error = DeserrJsonError<InvalidWebhookUrl>)]
+    #[serde(default)]
+    url: Setting<String>,
+    #[schema(value_type = Option<BTreeMap<String, String>>, example = json!({"Authorization":"Bearer a-secret-token"}))]
+    #[deserr(default, error = DeserrJsonError<InvalidWebhookHeaders>)]
+    #[serde(default)]
+    headers: Setting<BTreeMap<String, Setting<String>>>,
+}
+
+fn deny_immutable_fields_webhook(
+    field: &str,
+    accepted: &[&str],
+    location: ValuePointerRef,
+) -> DeserrJsonError {
+    match field {
+        "uuid" => immutable_field_error(field, accepted, Code::ImmutableWebhookUuid),
+        "isEditable" => immutable_field_error(field, accepted, Code::ImmutableWebhookIsEditable),
+        _ => deserr::take_cf_content(DeserrJsonError::<BadRequest>::error::<Infallible>(
+            None,
+            deserr::ErrorKind::UnknownKey { key: field, accepted },
+            location,
+        )),
+    }
+}
+
+#[derive(Debug, Serialize, ToSchema)]
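+// Response shape for the `/webhooks` routes: the stored Webhook flattened with
+// its UUID and an `isEditable` flag (false only for the CLI-defined webhook).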
+#[serde(rename_all = "camelCase")]
+#[schema(rename_all = "camelCase")]
+pub(super) struct WebhookWithMetadata {
+    uuid: Uuid,
+    is_editable: bool,
+    #[schema(value_type = WebhookSettings)]
+    #[serde(flatten)]
+    webhook: Webhook,
+}
+
+impl WebhookWithMetadata {
+    pub fn from(uuid: Uuid, webhook: Webhook) -> Self {
+        Self { uuid, is_editable: uuid != Uuid::nil(), webhook }
+    }
+}
+
+#[derive(Debug, Serialize, ToSchema)]
+#[serde(rename_all = "camelCase")]
+pub(super) struct WebhookResults {
+    results: Vec<WebhookWithMetadata>,
+}
+
+#[utoipa::path(
+    get,
+    path = "",
+    tag = "Webhooks",
+    security(("Bearer" = ["webhooks.get", "webhooks.*", "*.get", "*"])),
+    responses(
+        (status = OK, description = "Webhooks are returned", body = WebhookResults, content_type = "application/json", example = json!({
+            "results": [
+                {
+                    "uuid": "550e8400-e29b-41d4-a716-446655440000",
+                    "url": "https://your.site/on-tasks-completed",
+                    "headers": {
+                        "Authorization": "Bearer a-secret-token"
+                    },
+                    "isEditable": true
+                },
+                {
+                    "uuid": "550e8400-e29b-41d4-a716-446655440001",
+                    "url": "https://another.site/on-tasks-completed",
+                    "isEditable": true
+                }
+            ]
+        })),
+        (status = 401, description = "The authorization header is missing", body = ResponseError, content_type = "application/json", example = json!(
+            {
+                "message": "The Authorization header is missing. It must use the bearer authorization method.",
+                "code": "missing_authorization_header",
+                "type": "auth",
+                "link": "https://docs.meilisearch.com/errors#missing_authorization_header"
+            }
+        )),
+    )
+)]
+async fn get_webhooks(
+    index_scheduler: GuardedData<ActionPolicy<{ actions::WEBHOOKS_GET }>, Data<IndexScheduler>>,
+) -> Result<HttpResponse, ResponseError> {
+    let webhooks = index_scheduler.webhooks_view();
+    let results = webhooks
+        .webhooks
+        .into_iter()
+        .map(|(uuid, webhook)| WebhookWithMetadata::from(uuid, webhook))
+        .collect::<Vec<_>>();
+    let results = WebhookResults { results };
+
+    debug!(returns = ?results, "Get webhooks");
+    Ok(HttpResponse::Ok().json(results))
+}
+
+#[derive(Serialize, Default)]
+pub struct PatchWebhooksAnalytics;
+
+impl Aggregate for PatchWebhooksAnalytics {
+    fn event_name(&self) -> &'static str {
+        "Webhooks Updated"
+    }
+
+    fn aggregate(self: Box<Self>, _new: Box<Self>) -> Box<Self> {
+        self
+    }
+
+    fn into_event(self: Box<Self>) -> serde_json::Value {
+        serde_json::to_value(*self).unwrap_or_default()
+    }
+}
+
+#[derive(Serialize, Default)]
+pub struct PostWebhooksAnalytics;
+
+impl Aggregate for PostWebhooksAnalytics {
+    fn event_name(&self) -> &'static str {
+        "Webhooks Created"
+    }
+
+    fn aggregate(self: Box<Self>, _new: Box<Self>) -> Box<Self> {
+        self
+    }
+
+    fn into_event(self: Box<Self>) -> serde_json::Value {
+        serde_json::to_value(*self).unwrap_or_default()
+    }
+}
+
+#[derive(Debug, thiserror::Error)]
+enum WebhooksError {
+    #[error("The URL for the webhook `{0}` is missing.")]
+    MissingUrl(Uuid),
+    #[error("Defining too many webhooks would crush the server. Please limit the number of webhooks to 20. You may use a third-party proxy server to dispatch events to more than 20 endpoints.")]
+    TooManyWebhooks,
+    #[error("Too many headers for the webhook `{0}`. Please limit the number of headers to 200. Hint: To remove an already defined header set its value to `null`")]
+    TooManyHeaders(Uuid),
+    #[error("Webhook `{0}` is immutable.
The webhook defined from the command line cannot be modified using the API.")]
+    ImmutableWebhook(Uuid),
+    #[error("Webhook `{0}` not found.")]
+    WebhookNotFound(Uuid),
+    #[error("Invalid header name `{0}`: {1}")]
+    InvalidHeaderName(String, ActixInvalidHeaderName),
+    #[error("Invalid header value `{0}`: {1}")]
+    InvalidHeaderValue(String, ActixInvalidHeaderValue),
+    #[error("Invalid URL `{0}`: {1}")]
+    InvalidUrl(String, url::ParseError),
+    #[error("Invalid UUID: {0}")]
+    InvalidUuid(uuid::Error),
+}
+
+impl ErrorCode for WebhooksError {
+    fn error_code(&self) -> meilisearch_types::error::Code {
+        match self {
+            MissingUrl(_) => meilisearch_types::error::Code::InvalidWebhookUrl,
+            TooManyWebhooks => meilisearch_types::error::Code::InvalidWebhooks,
+            TooManyHeaders(_) => meilisearch_types::error::Code::InvalidWebhookHeaders,
+            ImmutableWebhook(_) => meilisearch_types::error::Code::ImmutableWebhook,
+            WebhookNotFound(_) => meilisearch_types::error::Code::WebhookNotFound,
+            InvalidHeaderName(_, _) => meilisearch_types::error::Code::InvalidWebhookHeaders,
+            InvalidHeaderValue(_, _) => meilisearch_types::error::Code::InvalidWebhookHeaders,
+            InvalidUrl(_, _) => meilisearch_types::error::Code::InvalidWebhookUrl,
+            InvalidUuid(_) => meilisearch_types::error::Code::InvalidWebhookUuid,
+        }
+    }
+}
+
+fn patch_webhook_inner(
+    uuid: &Uuid,
+    old_webhook: Webhook,
+    new_webhook: WebhookSettings,
+) -> Result<Webhook, WebhooksError> {
+    let Webhook { url: old_url, mut headers } = old_webhook;
+
+    let url = match new_webhook.url {
+        Setting::Set(url) => url,
+        Setting::NotSet => old_url,
+        Setting::Reset => return Err(MissingUrl(uuid.to_owned())),
+    };
+
+    match new_webhook.headers {
+        Setting::Set(new_headers) => {
+            for (name, value) in new_headers {
+                match value {
+                    Setting::Set(value) => {
+                        headers.insert(name, value);
+                    }
+                    Setting::NotSet => continue,
+                    Setting::Reset => {
+                        headers.remove(&name);
+                        continue;
+                    }
+                }
+            }
+        }
+        Setting::Reset => headers.clear(),
+        Setting::NotSet => (),
+    };
+
+    if headers.len() > 200 {
+        return Err(TooManyHeaders(uuid.to_owned()));
+    }
+
+    Ok(Webhook { url, headers })
+}
+
+fn check_changed(uuid: Uuid, webhook: &Webhook) -> Result<(), WebhooksError> {
+    if uuid.is_nil() {
+        return Err(ImmutableWebhook(uuid));
+    }
+
+    if webhook.url.is_empty() {
+        return Err(MissingUrl(uuid));
+    }
+
+    if webhook.headers.len() > 200 {
+        return Err(TooManyHeaders(uuid));
+    }
+
+    for (header, value) in &webhook.headers {
+        HeaderName::from_bytes(header.as_bytes())
+            .map_err(|e| InvalidHeaderName(header.to_owned(), e))?;
+        HeaderValue::from_str(value).map_err(|e| InvalidHeaderValue(header.to_owned(), e))?;
+    }
+
+    if let Err(e) = Url::parse(&webhook.url) {
+        return Err(InvalidUrl(webhook.url.to_owned(), e));
+    }
+
+    Ok(())
+}
+
+#[utoipa::path(
+    get,
+    path = "/{uuid}",
+    tag = "Webhooks",
+    security(("Bearer" = ["webhooks.get", "webhooks.*", "*.get", "*"])),
+    responses(
+        (status = 200, description = "Webhook found", body = WebhookWithMetadata, content_type = "application/json", example = json!({
+            "uuid": "550e8400-e29b-41d4-a716-446655440000",
+            "url": "https://your.site/on-tasks-completed",
+            "headers": {
+                "Authorization": "Bearer a-secret"
+            },
+            "isEditable": true
+        })),
+        (status = 404, description = "Webhook not found", body = ResponseError, content_type = "application/json"),
+        (status = 401, description = "The authorization header is missing", body = ResponseError, content_type = "application/json"),
+    ),
+    params(
+        ("uuid" = Uuid, Path, description = "The universally unique identifier of
the webhook") + ) +)] +async fn get_webhook( + index_scheduler: GuardedData, Data>, + uuid: Path, +) -> Result { + let uuid = Uuid::from_str(&uuid.into_inner()).map_err(InvalidUuid)?; + let mut webhooks = index_scheduler.webhooks_view(); + + let webhook = webhooks.webhooks.remove(&uuid).ok_or(WebhookNotFound(uuid))?; + let webhook = WebhookWithMetadata::from(uuid, webhook); + + debug!(returns = ?webhook, "Get webhook"); + Ok(HttpResponse::Ok().json(webhook)) +} + +#[utoipa::path( + post, + path = "", + tag = "Webhooks", + request_body = WebhookSettings, + security(("Bearer" = ["webhooks.create", "webhooks.*", "*"])), + responses( + (status = 201, description = "Webhook created successfully", body = WebhookWithMetadata, content_type = "application/json", example = json!({ + "uuid": "550e8400-e29b-41d4-a716-446655440000", + "url": "https://your.site/on-tasks-completed", + "headers": { + "Authorization": "Bearer a-secret-token" + }, + "isEditable": true + })), + (status = 401, description = "The authorization header is missing", body = ResponseError, content_type = "application/json"), + (status = 400, description = "Bad request", body = ResponseError, content_type = "application/json"), + ) +)] +async fn post_webhook( + index_scheduler: GuardedData, Data>, + webhook_settings: AwebJson, + req: HttpRequest, + analytics: Data, +) -> Result { + let webhook_settings = webhook_settings.into_inner(); + debug!(parameters = ?webhook_settings, "Post webhook"); + + let uuid = Uuid::new_v4(); + if webhook_settings.headers.as_ref().set().is_some_and(|h| h.len() > 200) { + return Err(TooManyHeaders(uuid).into()); + } + + let mut webhooks = index_scheduler.retrieve_runtime_webhooks(); + if webhooks.len() >= 20 { + return Err(TooManyWebhooks.into()); + } + + let webhook = Webhook { + url: webhook_settings.url.set().ok_or(MissingUrl(uuid))?, + headers: webhook_settings + .headers + .set() + .map(|h| h.into_iter().map(|(k, v)| (k, v.set().unwrap_or_default())).collect()) + .unwrap_or_default(), + }; + + check_changed(uuid, &webhook)?; + webhooks.insert(uuid, webhook.clone()); + index_scheduler.update_runtime_webhooks(webhooks)?; + + analytics.publish(PostWebhooksAnalytics, &req); + + let response = WebhookWithMetadata::from(uuid, webhook); + debug!(returns = ?response, "Post webhook"); + Ok(HttpResponse::Created().json(response)) +} + +#[utoipa::path( + patch, + path = "/{uuid}", + tag = "Webhooks", + request_body = WebhookSettings, + security(("Bearer" = ["webhooks.update", "webhooks.*", "*"])), + responses( + (status = 200, description = "Webhook updated successfully", body = WebhookWithMetadata, content_type = "application/json", example = json!({ + "uuid": "550e8400-e29b-41d4-a716-446655440000", + "url": "https://your.site/on-tasks-completed", + "headers": { + "Authorization": "Bearer a-secret-token" + }, + "isEditable": true + })), + (status = 401, description = "The authorization header is missing", body = ResponseError, content_type = "application/json"), + (status = 400, description = "Bad request", body = ResponseError, content_type = "application/json"), + ), + params( + ("uuid" = Uuid, Path, description = "The universally unique identifier of the webhook") + ) +)] +async fn patch_webhook( + index_scheduler: GuardedData, Data>, + uuid: Path, + webhook_settings: AwebJson, + req: HttpRequest, + analytics: Data, +) -> Result { + let uuid = Uuid::from_str(&uuid.into_inner()).map_err(InvalidUuid)?; + let webhook_settings = webhook_settings.into_inner(); + debug!(parameters = ?(uuid, &webhook_settings), 
"Patch webhook"); + + if uuid.is_nil() { + return Err(ImmutableWebhook(uuid).into()); + } + + let mut webhooks = index_scheduler.retrieve_runtime_webhooks(); + let old_webhook = webhooks.remove(&uuid).ok_or(WebhookNotFound(uuid))?; + let webhook = patch_webhook_inner(&uuid, old_webhook, webhook_settings)?; + + check_changed(uuid, &webhook)?; + webhooks.insert(uuid, webhook.clone()); + index_scheduler.update_runtime_webhooks(webhooks)?; + + analytics.publish(PatchWebhooksAnalytics, &req); + + let response = WebhookWithMetadata::from(uuid, webhook); + debug!(returns = ?response, "Patch webhook"); + Ok(HttpResponse::Ok().json(response)) +} + +#[utoipa::path( + delete, + path = "/{uuid}", + tag = "Webhooks", + security(("Bearer" = ["webhooks.delete", "webhooks.*", "*"])), + responses( + (status = 204, description = "Webhook deleted successfully"), + (status = 404, description = "Webhook not found", body = ResponseError, content_type = "application/json"), + (status = 401, description = "The authorization header is missing", body = ResponseError, content_type = "application/json"), + ), + params( + ("uuid" = Uuid, Path, description = "The universally unique identifier of the webhook") + ) +)] +async fn delete_webhook( + index_scheduler: GuardedData, Data>, + uuid: Path, +) -> Result { + let uuid = Uuid::from_str(&uuid.into_inner()).map_err(InvalidUuid)?; + debug!(parameters = ?uuid, "Delete webhook"); + + if uuid.is_nil() { + return Err(ImmutableWebhook(uuid).into()); + } + + let mut webhooks = index_scheduler.retrieve_runtime_webhooks(); + webhooks.remove(&uuid).ok_or(WebhookNotFound(uuid))?; + index_scheduler.update_runtime_webhooks(webhooks)?; + + debug!(returns = "No Content", "Delete webhook"); + Ok(HttpResponse::NoContent().finish()) +} diff --git a/crates/meilisearch/src/search/federated/perform.rs b/crates/meilisearch/src/search/federated/perform.rs index c0fec01e8..3c80c22e3 100644 --- a/crates/meilisearch/src/search/federated/perform.rs +++ b/crates/meilisearch/src/search/federated/perform.rs @@ -13,6 +13,7 @@ use meilisearch_types::error::ResponseError; use meilisearch_types::features::{Network, Remote}; use meilisearch_types::milli::order_by_map::OrderByMap; use meilisearch_types::milli::score_details::{ScoreDetails, WeightedScoreValue}; +use meilisearch_types::milli::vector::Embedding; use meilisearch_types::milli::{self, DocumentId, OrderBy, TimeBudget, DEFAULT_VALUES_PER_FACET}; use roaring::RoaringBitmap; use tokio::task::JoinHandle; @@ -46,6 +47,7 @@ pub async fn perform_federated_search( let deadline = before_search + std::time::Duration::from_secs(9); let required_hit_count = federation.limit + federation.offset; + let retrieve_vectors = queries.iter().any(|q| q.retrieve_vectors); let network = index_scheduler.network(); @@ -91,6 +93,7 @@ pub async fn perform_federated_search( federation, mut semantic_hit_count, mut results_by_index, + mut query_vectors, previous_query_data: _, facet_order, } = search_by_index; @@ -122,7 +125,26 @@ pub async fn perform_federated_search( .map(|hit| hit.hit()) .collect(); - // 3.3. merge facets + // 3.3. 
merge query vectors + let query_vectors = if retrieve_vectors { + for remote_results in remote_results.iter_mut() { + if let Some(remote_vectors) = remote_results.query_vectors.take() { + for (key, value) in remote_vectors.into_iter() { + debug_assert!( + !query_vectors.contains_key(&key), + "Query vector for query {key} already exists" + ); + query_vectors.insert(key, value); + } + } + } + + Some(query_vectors) + } else { + None + }; + + // 3.4. merge facets let (facet_distribution, facet_stats, facets_by_index) = facet_order.merge(federation.merge_facets, remote_results, facets); @@ -140,6 +162,7 @@ pub async fn perform_federated_search( offset: federation.offset, estimated_total_hits, }, + query_vectors, semantic_hit_count, degraded, used_negative_operator, @@ -408,6 +431,7 @@ fn merge_metadata( hits: _, processing_time_ms, hits_info, + query_vectors: _, semantic_hit_count: _, facet_distribution: _, facet_stats: _, @@ -657,6 +681,7 @@ struct SearchByIndex { // Then when merging, we'll update its value if there is any semantic hit semantic_hit_count: Option, results_by_index: Vec, + query_vectors: BTreeMap, previous_query_data: Option<(RankingRules, usize, String)>, // remember the order and name of first index for each facet when merging with index settings // to detect if the order is inconsistent for a facet. @@ -674,6 +699,7 @@ impl SearchByIndex { federation, semantic_hit_count: None, results_by_index: Vec::with_capacity(index_count), + query_vectors: BTreeMap::new(), previous_query_data: None, } } @@ -837,8 +863,19 @@ impl SearchByIndex { document_scores, degraded: query_degraded, used_negative_operator: query_used_negative_operator, + query_vector, } = result; + if query.retrieve_vectors { + if let Some(query_vector) = query_vector { + debug_assert!( + !self.query_vectors.contains_key(&query_index), + "Query vector for query {query_index} already exists" + ); + self.query_vectors.insert(query_index, query_vector); + } + } + candidates |= query_candidates; degraded |= query_degraded; used_negative_operator |= query_used_negative_operator; diff --git a/crates/meilisearch/src/search/federated/types.rs b/crates/meilisearch/src/search/federated/types.rs index 3cf28c815..9c96fe768 100644 --- a/crates/meilisearch/src/search/federated/types.rs +++ b/crates/meilisearch/src/search/federated/types.rs @@ -18,6 +18,7 @@ use serde::{Deserialize, Serialize}; use utoipa::ToSchema; use super::super::{ComputedFacets, FacetStats, HitsInfo, SearchHit, SearchQueryWithIndex}; +use crate::milli::vector::Embedding; pub const DEFAULT_FEDERATED_WEIGHT: f64 = 1.0; @@ -117,6 +118,9 @@ pub struct FederatedSearchResult { #[serde(flatten)] pub hits_info: HitsInfo, + #[serde(default, skip_serializing_if = "Option::is_none")] + pub query_vectors: Option>, + #[serde(default, skip_serializing_if = "Option::is_none")] pub semantic_hit_count: Option, @@ -144,6 +148,7 @@ impl fmt::Debug for FederatedSearchResult { hits, processing_time_ms, hits_info, + query_vectors, semantic_hit_count, degraded, used_negative_operator, @@ -158,6 +163,10 @@ impl fmt::Debug for FederatedSearchResult { debug.field("processing_time_ms", &processing_time_ms); debug.field("hits", &format!("[{} hits returned]", hits.len())); debug.field("hits_info", &hits_info); + if let Some(query_vectors) = query_vectors { + let known = query_vectors.len(); + debug.field("query_vectors", &format!("[{known} known vectors]")); + } if *used_negative_operator { debug.field("used_negative_operator", used_negative_operator); } diff --git 
a/crates/meilisearch/src/search/mod.rs b/crates/meilisearch/src/search/mod.rs index 7bcc8a9f8..fca8cc3a6 100644 --- a/crates/meilisearch/src/search/mod.rs +++ b/crates/meilisearch/src/search/mod.rs @@ -841,6 +841,8 @@ pub struct SearchHit { pub struct SearchResult { pub hits: Vec, pub query: String, + #[serde(skip_serializing_if = "Option::is_none")] + pub query_vector: Option>, pub processing_time_ms: u128, #[serde(flatten)] pub hits_info: HitsInfo, @@ -865,6 +867,7 @@ impl fmt::Debug for SearchResult { let SearchResult { hits, query, + query_vector, processing_time_ms, hits_info, facet_distribution, @@ -879,6 +882,9 @@ impl fmt::Debug for SearchResult { debug.field("processing_time_ms", &processing_time_ms); debug.field("hits", &format!("[{} hits returned]", hits.len())); debug.field("query", &query); + if query_vector.is_some() { + debug.field("query_vector", &"[...]"); + } debug.field("hits_info", &hits_info); if *used_negative_operator { debug.field("used_negative_operator", used_negative_operator); @@ -1050,6 +1056,7 @@ pub fn prepare_search<'t>( .map(|x| x as usize) .unwrap_or(DEFAULT_PAGINATION_MAX_TOTAL_HITS); + search.retrieve_vectors(query.retrieve_vectors); search.exhaustive_number_hits(is_finite_pagination); search.max_total_hits(Some(max_total_hits)); search.scoring_strategy( @@ -1132,6 +1139,7 @@ pub fn perform_search( document_scores, degraded, used_negative_operator, + query_vector, }, semantic_hit_count, ) = search_from_kind(index_uid, search_kind, search)?; @@ -1222,6 +1230,7 @@ pub fn perform_search( hits: documents, hits_info, query: q.unwrap_or_default(), + query_vector, processing_time_ms: before_search.elapsed().as_millis(), facet_distribution, facet_stats, @@ -1734,6 +1743,7 @@ pub fn perform_similar( document_scores, degraded: _, used_negative_operator: _, + query_vector: _, } = similar.execute().map_err(|err| match err { milli::Error::UserError(milli::UserError::InvalidFilter(_)) => { ResponseError::from_msg(err.to_string(), Code::InvalidSimilarFilter) diff --git a/crates/meilisearch/tests/auth/api_keys.rs b/crates/meilisearch/tests/auth/api_keys.rs index 6dc3f429b..8dca24ac4 100644 --- a/crates/meilisearch/tests/auth/api_keys.rs +++ b/crates/meilisearch/tests/auth/api_keys.rs @@ -421,7 +421,7 @@ async fn error_add_api_key_invalid_parameters_actions() { meili_snap::snapshot!(code, @"400 Bad Request"); meili_snap::snapshot!(meili_snap::json_string!(response, { ".createdAt" => "[ignored]", ".updatedAt" => "[ignored]" }), @r#" { - "message": "Unknown value `doc.add` at `.actions[0]`: expected one of `*`, `search`, `documents.*`, `documents.add`, `documents.get`, `documents.delete`, `indexes.*`, `indexes.create`, `indexes.get`, `indexes.update`, `indexes.delete`, `indexes.swap`, `tasks.*`, `tasks.cancel`, `tasks.delete`, `tasks.get`, `settings.*`, `settings.get`, `settings.update`, `stats.*`, `stats.get`, `metrics.*`, `metrics.get`, `dumps.*`, `dumps.create`, `snapshots.*`, `snapshots.create`, `version`, `keys.create`, `keys.get`, `keys.update`, `keys.delete`, `experimental.get`, `experimental.update`, `export`, `network.get`, `network.update`, `chatCompletions`, `chats.*`, `chats.get`, `chats.delete`, `chatsSettings.*`, `chatsSettings.get`, `chatsSettings.update`, `*.get`", + "message": "Unknown value `doc.add` at `.actions[0]`: expected one of `*`, `search`, `documents.*`, `documents.add`, `documents.get`, `documents.delete`, `indexes.*`, `indexes.create`, `indexes.get`, `indexes.update`, `indexes.delete`, `indexes.swap`, `tasks.*`, `tasks.cancel`, `tasks.delete`, 
`tasks.get`, `settings.*`, `settings.get`, `settings.update`, `stats.*`, `stats.get`, `metrics.*`, `metrics.get`, `dumps.*`, `dumps.create`, `snapshots.*`, `snapshots.create`, `version`, `keys.create`, `keys.get`, `keys.update`, `keys.delete`, `experimental.get`, `experimental.update`, `export`, `network.get`, `network.update`, `chatCompletions`, `chats.*`, `chats.get`, `chats.delete`, `chatsSettings.*`, `chatsSettings.get`, `chatsSettings.update`, `*.get`, `webhooks.get`, `webhooks.update`, `webhooks.delete`, `webhooks.create`, `webhooks.*`", "code": "invalid_api_key_actions", "type": "invalid_request", "link": "https://docs.meilisearch.com/errors#invalid_api_key_actions" diff --git a/crates/meilisearch/tests/auth/errors.rs b/crates/meilisearch/tests/auth/errors.rs index b16ccb2f5..2a40f4d2b 100644 --- a/crates/meilisearch/tests/auth/errors.rs +++ b/crates/meilisearch/tests/auth/errors.rs @@ -93,7 +93,7 @@ async fn create_api_key_bad_actions() { snapshot!(code, @"400 Bad Request"); snapshot!(json_string!(response), @r#" { - "message": "Unknown value `doggo` at `.actions[0]`: expected one of `*`, `search`, `documents.*`, `documents.add`, `documents.get`, `documents.delete`, `indexes.*`, `indexes.create`, `indexes.get`, `indexes.update`, `indexes.delete`, `indexes.swap`, `tasks.*`, `tasks.cancel`, `tasks.delete`, `tasks.get`, `settings.*`, `settings.get`, `settings.update`, `stats.*`, `stats.get`, `metrics.*`, `metrics.get`, `dumps.*`, `dumps.create`, `snapshots.*`, `snapshots.create`, `version`, `keys.create`, `keys.get`, `keys.update`, `keys.delete`, `experimental.get`, `experimental.update`, `export`, `network.get`, `network.update`, `chatCompletions`, `chats.*`, `chats.get`, `chats.delete`, `chatsSettings.*`, `chatsSettings.get`, `chatsSettings.update`, `*.get`", + "message": "Unknown value `doggo` at `.actions[0]`: expected one of `*`, `search`, `documents.*`, `documents.add`, `documents.get`, `documents.delete`, `indexes.*`, `indexes.create`, `indexes.get`, `indexes.update`, `indexes.delete`, `indexes.swap`, `tasks.*`, `tasks.cancel`, `tasks.delete`, `tasks.get`, `settings.*`, `settings.get`, `settings.update`, `stats.*`, `stats.get`, `metrics.*`, `metrics.get`, `dumps.*`, `dumps.create`, `snapshots.*`, `snapshots.create`, `version`, `keys.create`, `keys.get`, `keys.update`, `keys.delete`, `experimental.get`, `experimental.update`, `export`, `network.get`, `network.update`, `chatCompletions`, `chats.*`, `chats.get`, `chats.delete`, `chatsSettings.*`, `chatsSettings.get`, `chatsSettings.update`, `*.get`, `webhooks.get`, `webhooks.update`, `webhooks.delete`, `webhooks.create`, `webhooks.*`", "code": "invalid_api_key_actions", "type": "invalid_request", "link": "https://docs.meilisearch.com/errors#invalid_api_key_actions" diff --git a/crates/meilisearch/tests/common/index.rs b/crates/meilisearch/tests/common/index.rs index bb1506022..012c9bebe 100644 --- a/crates/meilisearch/tests/common/index.rs +++ b/crates/meilisearch/tests/common/index.rs @@ -249,6 +249,11 @@ impl<'a> Index<'a, Owned> { self.service.put_encoded(url, settings, self.encoder).await } + pub async fn update_settings_chat(&self, settings: Value) -> (Value, StatusCode) { + let url = format!("/indexes/{}/settings/chat", urlencode(self.uid.as_ref())); + self.service.patch_encoded(url, settings, self.encoder).await + } + pub async fn delete_settings(&self) -> (Value, StatusCode) { let url = format!("/indexes/{}/settings", urlencode(self.uid.as_ref())); self.service.delete(url).await diff --git 
a/crates/meilisearch/tests/common/server.rs b/crates/meilisearch/tests/common/server.rs index 63c990466..291356bf8 100644 --- a/crates/meilisearch/tests/common/server.rs +++ b/crates/meilisearch/tests/common/server.rs @@ -182,6 +182,25 @@ impl Server { self.service.patch("/network", value).await } + pub async fn create_webhook(&self, value: Value) -> (Value, StatusCode) { + self.service.post("/webhooks", value).await + } + + pub async fn get_webhook(&self, uuid: impl AsRef) -> (Value, StatusCode) { + let url = format!("/webhooks/{}", uuid.as_ref()); + self.service.get(url).await + } + + pub async fn delete_webhook(&self, uuid: impl AsRef) -> (Value, StatusCode) { + let url = format!("/webhooks/{}", uuid.as_ref()); + self.service.delete(url).await + } + + pub async fn patch_webhook(&self, uuid: impl AsRef, value: Value) -> (Value, StatusCode) { + let url = format!("/webhooks/{}", uuid.as_ref()); + self.service.patch(url, value).await + } + pub async fn get_metrics(&self) -> (Value, StatusCode) { self.service.get("/metrics").await } @@ -447,6 +466,10 @@ impl Server { pub async fn get_network(&self) -> (Value, StatusCode) { self.service.get("/network").await } + + pub async fn get_webhooks(&self) -> (Value, StatusCode) { + self.service.get("/webhooks").await + } } pub fn default_settings(dir: impl AsRef) -> Opt { diff --git a/crates/meilisearch/tests/search/errors.rs b/crates/meilisearch/tests/search/errors.rs index bbf4bbeb1..b89129775 100644 --- a/crates/meilisearch/tests/search/errors.rs +++ b/crates/meilisearch/tests/search/errors.rs @@ -1270,27 +1270,27 @@ async fn search_with_contains_without_enabling_the_feature() { index .search(json!({ "filter": "doggo CONTAINS kefir" }), |response, code| { snapshot!(code, @"400 Bad Request"); - snapshot!(json_string!(response), @r###" + snapshot!(json_string!(response), @r#" { - "message": "Using `CONTAINS` or `STARTS WITH` in a filter requires enabling the `contains filter` experimental feature. See https://github.com/orgs/meilisearch/discussions/763\n7:15 doggo CONTAINS kefir", + "message": "Using `CONTAINS` in a filter requires enabling the `contains filter` experimental feature. See https://github.com/orgs/meilisearch/discussions/763\n7:15 doggo CONTAINS kefir", "code": "feature_not_enabled", "type": "invalid_request", "link": "https://docs.meilisearch.com/errors#feature_not_enabled" } - "###); + "#); }) .await; index .search(json!({ "filter": "doggo != echo AND doggo CONTAINS kefir" }), |response, code| { snapshot!(code, @"400 Bad Request"); - snapshot!(json_string!(response), @r###" + snapshot!(json_string!(response), @r#" { - "message": "Using `CONTAINS` or `STARTS WITH` in a filter requires enabling the `contains filter` experimental feature. See https://github.com/orgs/meilisearch/discussions/763\n25:33 doggo != echo AND doggo CONTAINS kefir", + "message": "Using `CONTAINS` in a filter requires enabling the `contains filter` experimental feature. 
See https://github.com/orgs/meilisearch/discussions/763\n25:33 doggo != echo AND doggo CONTAINS kefir", "code": "feature_not_enabled", "type": "invalid_request", "link": "https://docs.meilisearch.com/errors#feature_not_enabled" } - "###); + "#); }) .await; @@ -1299,24 +1299,24 @@ async fn search_with_contains_without_enabling_the_feature() { index.search_post(json!({ "filter": ["doggo != echo", "doggo CONTAINS kefir"] })).await; snapshot!(code, @"400 Bad Request"); - snapshot!(json_string!(response), @r###" + snapshot!(json_string!(response), @r#" { - "message": "Using `CONTAINS` or `STARTS WITH` in a filter requires enabling the `contains filter` experimental feature. See https://github.com/orgs/meilisearch/discussions/763\n7:15 doggo CONTAINS kefir", + "message": "Using `CONTAINS` in a filter requires enabling the `contains filter` experimental feature. See https://github.com/orgs/meilisearch/discussions/763\n7:15 doggo CONTAINS kefir", "code": "feature_not_enabled", "type": "invalid_request", "link": "https://docs.meilisearch.com/errors#feature_not_enabled" } - "###); + "#); let (response, code) = index.search_post(json!({ "filter": ["doggo != echo", ["doggo CONTAINS kefir"]] })).await; snapshot!(code, @"400 Bad Request"); - snapshot!(json_string!(response), @r###" + snapshot!(json_string!(response), @r#" { - "message": "Using `CONTAINS` or `STARTS WITH` in a filter requires enabling the `contains filter` experimental feature. See https://github.com/orgs/meilisearch/discussions/763\n7:15 doggo CONTAINS kefir", + "message": "Using `CONTAINS` in a filter requires enabling the `contains filter` experimental feature. See https://github.com/orgs/meilisearch/discussions/763\n7:15 doggo CONTAINS kefir", "code": "feature_not_enabled", "type": "invalid_request", "link": "https://docs.meilisearch.com/errors#feature_not_enabled" } - "###); + "#); } diff --git a/crates/meilisearch/tests/search/hybrid.rs b/crates/meilisearch/tests/search/hybrid.rs index d95e6fb64..b2970f233 100644 --- a/crates/meilisearch/tests/search/hybrid.rs +++ b/crates/meilisearch/tests/search/hybrid.rs @@ -148,7 +148,70 @@ async fn simple_search() { ) .await; snapshot!(code, @"200 OK"); - snapshot!(response["hits"], @r###"[{"title":"Captain Planet","desc":"He's not part of the Marvel Cinematic Universe","id":"2","_vectors":{"default":{"embeddings":[[1.0,2.0]],"regenerate":false}}},{"title":"Captain Marvel","desc":"a Shazam ersatz","id":"3","_vectors":{"default":{"embeddings":[[2.0,3.0]],"regenerate":false}}},{"title":"Shazam!","desc":"a Captain Marvel ersatz","id":"1","_vectors":{"default":{"embeddings":[[1.0,3.0]],"regenerate":false}}}]"###); + snapshot!(response, @r#" + { + "hits": [ + { + "title": "Captain Planet", + "desc": "He's not part of the Marvel Cinematic Universe", + "id": "2", + "_vectors": { + "default": { + "embeddings": [ + [ + 1.0, + 2.0 + ] + ], + "regenerate": false + } + } + }, + { + "title": "Captain Marvel", + "desc": "a Shazam ersatz", + "id": "3", + "_vectors": { + "default": { + "embeddings": [ + [ + 2.0, + 3.0 + ] + ], + "regenerate": false + } + } + }, + { + "title": "Shazam!", + "desc": "a Captain Marvel ersatz", + "id": "1", + "_vectors": { + "default": { + "embeddings": [ + [ + 1.0, + 3.0 + ] + ], + "regenerate": false + } + } + } + ], + "query": "Captain", + "queryVector": [ + 1.0, + 1.0 + ], + "processingTimeMs": "[duration]", + "limit": 20, + "offset": 0, + "estimatedTotalHits": 3, + "semanticHitCount": 0 + } + "#); snapshot!(response["semanticHitCount"], @"0"); let (response, code) = index @@ 
-157,7 +220,73 @@ async fn simple_search() { ) .await; snapshot!(code, @"200 OK"); - snapshot!(response["hits"], @r###"[{"title":"Captain Marvel","desc":"a Shazam ersatz","id":"3","_vectors":{"default":{"embeddings":[[2.0,3.0]],"regenerate":false}},"_rankingScore":0.990290343761444},{"title":"Captain Planet","desc":"He's not part of the Marvel Cinematic Universe","id":"2","_vectors":{"default":{"embeddings":[[1.0,2.0]],"regenerate":false}},"_rankingScore":0.9848484848484848},{"title":"Shazam!","desc":"a Captain Marvel ersatz","id":"1","_vectors":{"default":{"embeddings":[[1.0,3.0]],"regenerate":false}},"_rankingScore":0.9472135901451112}]"###); + snapshot!(response, @r#" + { + "hits": [ + { + "title": "Captain Marvel", + "desc": "a Shazam ersatz", + "id": "3", + "_vectors": { + "default": { + "embeddings": [ + [ + 2.0, + 3.0 + ] + ], + "regenerate": false + } + }, + "_rankingScore": 0.990290343761444 + }, + { + "title": "Captain Planet", + "desc": "He's not part of the Marvel Cinematic Universe", + "id": "2", + "_vectors": { + "default": { + "embeddings": [ + [ + 1.0, + 2.0 + ] + ], + "regenerate": false + } + }, + "_rankingScore": 0.9848484848484848 + }, + { + "title": "Shazam!", + "desc": "a Captain Marvel ersatz", + "id": "1", + "_vectors": { + "default": { + "embeddings": [ + [ + 1.0, + 3.0 + ] + ], + "regenerate": false + } + }, + "_rankingScore": 0.9472135901451112 + } + ], + "query": "Captain", + "queryVector": [ + 1.0, + 1.0 + ], + "processingTimeMs": "[duration]", + "limit": 20, + "offset": 0, + "estimatedTotalHits": 3, + "semanticHitCount": 2 + } + "#); snapshot!(response["semanticHitCount"], @"2"); let (response, code) = index @@ -166,7 +295,73 @@ async fn simple_search() { ) .await; snapshot!(code, @"200 OK"); - snapshot!(response["hits"], @r###"[{"title":"Captain Marvel","desc":"a Shazam ersatz","id":"3","_vectors":{"default":{"embeddings":[[2.0,3.0]],"regenerate":false}},"_rankingScore":0.990290343761444},{"title":"Captain Planet","desc":"He's not part of the Marvel Cinematic Universe","id":"2","_vectors":{"default":{"embeddings":[[1.0,2.0]],"regenerate":false}},"_rankingScore":0.974341630935669},{"title":"Shazam!","desc":"a Captain Marvel ersatz","id":"1","_vectors":{"default":{"embeddings":[[1.0,3.0]],"regenerate":false}},"_rankingScore":0.9472135901451112}]"###); + snapshot!(response, @r#" + { + "hits": [ + { + "title": "Captain Marvel", + "desc": "a Shazam ersatz", + "id": "3", + "_vectors": { + "default": { + "embeddings": [ + [ + 2.0, + 3.0 + ] + ], + "regenerate": false + } + }, + "_rankingScore": 0.990290343761444 + }, + { + "title": "Captain Planet", + "desc": "He's not part of the Marvel Cinematic Universe", + "id": "2", + "_vectors": { + "default": { + "embeddings": [ + [ + 1.0, + 2.0 + ] + ], + "regenerate": false + } + }, + "_rankingScore": 0.974341630935669 + }, + { + "title": "Shazam!", + "desc": "a Captain Marvel ersatz", + "id": "1", + "_vectors": { + "default": { + "embeddings": [ + [ + 1.0, + 3.0 + ] + ], + "regenerate": false + } + }, + "_rankingScore": 0.9472135901451112 + } + ], + "query": "Captain", + "queryVector": [ + 1.0, + 1.0 + ], + "processingTimeMs": "[duration]", + "limit": 20, + "offset": 0, + "estimatedTotalHits": 3, + "semanticHitCount": 3 + } + "#); snapshot!(response["semanticHitCount"], @"3"); } diff --git a/crates/meilisearch/tests/search/multi/mod.rs b/crates/meilisearch/tests/search/multi/mod.rs index b9eed56da..16ee3906e 100644 --- a/crates/meilisearch/tests/search/multi/mod.rs +++ b/crates/meilisearch/tests/search/multi/mod.rs @@ 
-3703,7 +3703,7 @@ async fn federation_vector_two_indexes() { ]})) .await; snapshot!(code, @"200 OK"); - snapshot!(json_string!(response, { ".processingTimeMs" => "[duration]", ".**._rankingScore" => "[score]" }), @r###" + snapshot!(json_string!(response, { ".processingTimeMs" => "[duration]", ".**._rankingScore" => "[score]" }), @r#" { "hits": [ { @@ -3911,9 +3911,20 @@ async fn federation_vector_two_indexes() { "limit": 20, "offset": 0, "estimatedTotalHits": 8, + "queryVectors": { + "0": [ + 1.0, + 0.0, + 0.5 + ], + "1": [ + 0.8, + 0.6 + ] + }, "semanticHitCount": 6 } - "###); + "#); // hybrid search, distinct embedder let (response, code) = server @@ -3923,7 +3934,7 @@ async fn federation_vector_two_indexes() { ]})) .await; snapshot!(code, @"200 OK"); - snapshot!(json_string!(response, { ".processingTimeMs" => "[duration]", ".**._rankingScore" => "[score]" }), @r###" + snapshot!(json_string!(response, { ".processingTimeMs" => "[duration]", ".**._rankingScore" => "[score]" }), @r#" { "hits": [ { @@ -4139,9 +4150,20 @@ async fn federation_vector_two_indexes() { "limit": 20, "offset": 0, "estimatedTotalHits": 8, + "queryVectors": { + "0": [ + 1.0, + 0.0, + 0.5 + ], + "1": [ + -1.0, + 0.6 + ] + }, "semanticHitCount": 8 } - "###); + "#); } #[actix_rt::test] diff --git a/crates/meilisearch/tests/search/multi/proxy.rs b/crates/meilisearch/tests/search/multi/proxy.rs index 943295da5..2b1623ff8 100644 --- a/crates/meilisearch/tests/search/multi/proxy.rs +++ b/crates/meilisearch/tests/search/multi/proxy.rs @@ -2,8 +2,9 @@ use std::sync::Arc; use actix_http::StatusCode; use meili_snap::{json_string, snapshot}; -use wiremock::matchers::AnyMatcher; -use wiremock::{Mock, MockServer, ResponseTemplate}; +use wiremock::matchers::method; +use wiremock::matchers::{path, AnyMatcher}; +use wiremock::{Mock, MockServer, Request, ResponseTemplate}; use crate::common::{Server, Value, SCORE_DOCUMENTS}; use crate::json; @@ -415,6 +416,503 @@ async fn remote_sharding() { "###); } +#[actix_rt::test] +async fn remote_sharding_retrieve_vectors() { + let ms0 = Server::new().await; + let ms1 = Server::new().await; + let ms2 = Server::new().await; + let index0 = ms0.index("test"); + let index1 = ms1.index("test"); + let index2 = ms2.index("test"); + + // enable feature + + let (response, code) = ms0.set_features(json!({"network": true})).await; + snapshot!(code, @"200 OK"); + snapshot!(json_string!(response["network"]), @"true"); + let (response, code) = ms1.set_features(json!({"network": true})).await; + snapshot!(code, @"200 OK"); + snapshot!(json_string!(response["network"]), @"true"); + let (response, code) = ms2.set_features(json!({"network": true})).await; + snapshot!(code, @"200 OK"); + snapshot!(json_string!(response["network"]), @"true"); + + // set self + + let (response, code) = ms0.set_network(json!({"self": "ms0"})).await; + snapshot!(code, @"200 OK"); + snapshot!(json_string!(response), @r###" + { + "self": "ms0", + "remotes": {} + } + "###); + let (response, code) = ms1.set_network(json!({"self": "ms1"})).await; + snapshot!(code, @"200 OK"); + snapshot!(json_string!(response), @r###" + { + "self": "ms1", + "remotes": {} + } + "###); + let (response, code) = ms2.set_network(json!({"self": "ms2"})).await; + snapshot!(code, @"200 OK"); + snapshot!(json_string!(response), @r###" + { + "self": "ms2", + "remotes": {} + } + "###); + + // setup embedders + + let mock_server = MockServer::start().await; + Mock::given(method("POST")) + .and(path("/")) + .respond_with(move |req: &Request| { + println!("Received 
request: {:?}", req); + let text = req.body_json::().unwrap().to_lowercase(); + let patterns = [ + ("batman", [1.0, 0.0, 0.0]), + ("dark", [0.0, 0.1, 0.0]), + ("knight", [0.1, 0.1, 0.0]), + ("returns", [0.0, 0.0, 0.2]), + ("part", [0.05, 0.1, 0.0]), + ("1", [0.3, 0.05, 0.0]), + ("2", [0.2, 0.05, 0.0]), + ]; + let mut embedding = vec![0.; 3]; + for (pattern, vector) in patterns { + if text.contains(pattern) { + for (i, v) in vector.iter().enumerate() { + embedding[i] += v; + } + } + } + ResponseTemplate::new(200).set_body_json(json!({ "data": embedding })) + }) + .mount(&mock_server) + .await; + let url = mock_server.uri(); + + for (server, index) in [(&ms0, &index0), (&ms1, &index1), (&ms2, &index2)] { + let (response, code) = index + .update_settings(json!({ + "embedders": { + "rest": { + "source": "rest", + "url": url, + "dimensions": 3, + "request": "{{text}}", + "response": { "data": "{{embedding}}" }, + "documentTemplate": "{{doc.name}}", + }, + }, + })) + .await; + snapshot!(code, @"202 Accepted"); + server.wait_task(response.uid()).await.succeeded(); + } + + // wrap servers + let ms0 = Arc::new(ms0); + let ms1 = Arc::new(ms1); + let ms2 = Arc::new(ms2); + + let rms0 = LocalMeili::new(ms0.clone()).await; + let rms1 = LocalMeili::new(ms1.clone()).await; + let rms2 = LocalMeili::new(ms2.clone()).await; + + // set network + let network = json!({"remotes": { + "ms0": { + "url": rms0.url() + }, + "ms1": { + "url": rms1.url() + }, + "ms2": { + "url": rms2.url() + } + }}); + + let (_response, status_code) = ms0.set_network(network.clone()).await; + snapshot!(status_code, @"200 OK"); + let (_response, status_code) = ms1.set_network(network.clone()).await; + snapshot!(status_code, @"200 OK"); + let (_response, status_code) = ms2.set_network(network.clone()).await; + snapshot!(status_code, @"200 OK"); + + // multi vector search: one query per remote + + let request = json!({ + "federation": {}, + "queries": [ + { + "q": "batman", + "indexUid": "test", + "hybrid": { + "semanticRatio": 1.0, + "embedder": "rest" + }, + "retrieveVectors": true, + "federationOptions": { + "remote": "ms0" + } + }, + { + "q": "dark knight", + "indexUid": "test", + "hybrid": { + "semanticRatio": 1.0, + "embedder": "rest" + }, + "retrieveVectors": true, + "federationOptions": { + "remote": "ms1" + } + }, + { + "q": "returns", + "indexUid": "test", + "hybrid": { + "semanticRatio": 1.0, + "embedder": "rest" + }, + "retrieveVectors": true, + "federationOptions": { + "remote": "ms2" + } + }, + ] + }); + + let (response, _status_code) = ms0.multi_search(request.clone()).await; + snapshot!(code, @"200 OK"); + snapshot!(json_string!(response, { ".processingTimeMs" => "[time]" }), @r#" + { + "hits": [], + "processingTimeMs": "[time]", + "limit": 20, + "offset": 0, + "estimatedTotalHits": 0, + "queryVectors": { + "0": [ + 1.0, + 0.0, + 0.0 + ], + "1": [ + 0.1, + 0.2, + 0.0 + ], + "2": [ + 0.0, + 0.0, + 0.2 + ] + }, + "semanticHitCount": 0, + "remoteErrors": {} + } + "#); + + // multi vector search: two local queries, one remote + + let request = json!({ + "federation": {}, + "queries": [ + { + "q": "batman", + "indexUid": "test", + "hybrid": { + "semanticRatio": 1.0, + "embedder": "rest" + }, + "retrieveVectors": true, + "federationOptions": { + "remote": "ms0" + } + }, + { + "q": "dark knight", + "indexUid": "test", + "hybrid": { + "semanticRatio": 1.0, + "embedder": "rest" + }, + "retrieveVectors": true, + "federationOptions": { + "remote": "ms0" + } + }, + { + "q": "returns", + "indexUid": "test", + "hybrid": { + 
"semanticRatio": 1.0, + "embedder": "rest" + }, + "retrieveVectors": true, + "federationOptions": { + "remote": "ms2" + } + }, + ] + }); + + let (response, _status_code) = ms0.multi_search(request.clone()).await; + snapshot!(code, @"200 OK"); + snapshot!(json_string!(response, { ".processingTimeMs" => "[time]" }), @r#" + { + "hits": [], + "processingTimeMs": "[time]", + "limit": 20, + "offset": 0, + "estimatedTotalHits": 0, + "queryVectors": { + "0": [ + 1.0, + 0.0, + 0.0 + ], + "1": [ + 0.1, + 0.2, + 0.0 + ], + "2": [ + 0.0, + 0.0, + 0.2 + ] + }, + "semanticHitCount": 0, + "remoteErrors": {} + } + "#); + + // multi vector search: two queries on the same remote + + let request = json!({ + "federation": {}, + "queries": [ + { + "q": "batman", + "indexUid": "test", + "hybrid": { + "semanticRatio": 1.0, + "embedder": "rest" + }, + "retrieveVectors": true, + "federationOptions": { + "remote": "ms0" + } + }, + { + "q": "dark knight", + "indexUid": "test", + "hybrid": { + "semanticRatio": 1.0, + "embedder": "rest" + }, + "retrieveVectors": true, + "federationOptions": { + "remote": "ms1" + } + }, + { + "q": "returns", + "indexUid": "test", + "hybrid": { + "semanticRatio": 1.0, + "embedder": "rest" + }, + "retrieveVectors": true, + "federationOptions": { + "remote": "ms1" + } + }, + ] + }); + + let (response, _status_code) = ms0.multi_search(request.clone()).await; + snapshot!(code, @"200 OK"); + snapshot!(json_string!(response, { ".processingTimeMs" => "[time]" }), @r#" + { + "hits": [], + "processingTimeMs": "[time]", + "limit": 20, + "offset": 0, + "estimatedTotalHits": 0, + "queryVectors": { + "0": [ + 1.0, + 0.0, + 0.0 + ], + "1": [ + 0.1, + 0.2, + 0.0 + ], + "2": [ + 0.0, + 0.0, + 0.2 + ] + }, + "semanticHitCount": 0, + "remoteErrors": {} + } + "#); + + // multi search: two vector, one keyword + + let request = json!({ + "federation": {}, + "queries": [ + { + "q": "batman", + "indexUid": "test", + "hybrid": { + "semanticRatio": 1.0, + "embedder": "rest" + }, + "retrieveVectors": true, + "federationOptions": { + "remote": "ms0" + } + }, + { + "q": "dark knight", + "indexUid": "test", + "hybrid": { + "semanticRatio": 0.0, + "embedder": "rest" + }, + "retrieveVectors": true, + "federationOptions": { + "remote": "ms1" + } + }, + { + "q": "returns", + "indexUid": "test", + "hybrid": { + "semanticRatio": 1.0, + "embedder": "rest" + }, + "retrieveVectors": true, + "federationOptions": { + "remote": "ms1" + } + }, + ] + }); + + let (response, _status_code) = ms0.multi_search(request.clone()).await; + snapshot!(code, @"200 OK"); + snapshot!(json_string!(response, { ".processingTimeMs" => "[time]" }), @r#" + { + "hits": [], + "processingTimeMs": "[time]", + "limit": 20, + "offset": 0, + "estimatedTotalHits": 0, + "queryVectors": { + "0": [ + 1.0, + 0.0, + 0.0 + ], + "2": [ + 0.0, + 0.0, + 0.2 + ] + }, + "semanticHitCount": 0, + "remoteErrors": {} + } + "#); + + // multi vector search: no local queries, all remote + + let request = json!({ + "federation": {}, + "queries": [ + { + "q": "batman", + "indexUid": "test", + "hybrid": { + "semanticRatio": 1.0, + "embedder": "rest" + }, + "retrieveVectors": true, + "federationOptions": { + "remote": "ms1" + } + }, + { + "q": "dark knight", + "indexUid": "test", + "hybrid": { + "semanticRatio": 1.0, + "embedder": "rest" + }, + "retrieveVectors": true, + "federationOptions": { + "remote": "ms1" + } + }, + { + "q": "returns", + "indexUid": "test", + "hybrid": { + "semanticRatio": 1.0, + "embedder": "rest" + }, + "retrieveVectors": true, + "federationOptions": { 
+ "remote": "ms1" + } + }, + ] + }); + + let (response, _status_code) = ms0.multi_search(request.clone()).await; + snapshot!(code, @"200 OK"); + snapshot!(json_string!(response, { ".processingTimeMs" => "[time]" }), @r#" + { + "hits": [], + "processingTimeMs": "[time]", + "limit": 20, + "offset": 0, + "estimatedTotalHits": 0, + "queryVectors": { + "0": [ + 1.0, + 0.0, + 0.0 + ], + "1": [ + 0.1, + 0.2, + 0.0 + ], + "2": [ + 0.0, + 0.0, + 0.2 + ] + }, + "remoteErrors": {} + } + "#); +} + #[actix_rt::test] async fn error_unregistered_remote() { let ms0 = Server::new().await; diff --git a/crates/meilisearch/tests/settings/chat.rs b/crates/meilisearch/tests/settings/chat.rs new file mode 100644 index 000000000..891a22431 --- /dev/null +++ b/crates/meilisearch/tests/settings/chat.rs @@ -0,0 +1,66 @@ +use crate::common::Server; +use crate::json; +use meili_snap::{json_string, snapshot}; + +#[actix_rt::test] +async fn set_reset_chat_issue_5772() { + let server = Server::new().await; + let index = server.unique_index(); + + let (_, code) = server + .set_features(json!({ + "chatCompletions": true, + })) + .await; + snapshot!(code, @r#"200 OK"#); + + let (task1, _code) = index.update_settings_chat(json!({ + "description": "test!", + "documentTemplate": "{% for field in fields %}{% if field.is_searchable and field.value != nil %}{{ field.name }}: {{ field.value }}\n{% endif %}{% endfor %}", + "documentTemplateMaxBytes": 400, + "searchParameters": { + "limit": 15, + "sort": [], + "attributesToSearchOn": [] + } + })).await; + server.wait_task(task1.uid()).await.succeeded(); + + let (response, _) = index.settings().await; + snapshot!(json_string!(response["chat"]), @r#" + { + "description": "test!", + "documentTemplate": "{% for field in fields %}{% if field.is_searchable and field.value != nil %}{{ field.name }}: {{ field.value }}\n{% endif %}{% endfor %}", + "documentTemplateMaxBytes": 400, + "searchParameters": { + "limit": 15, + "sort": [], + "attributesToSearchOn": [] + } + } + "#); + + let (task2, _status_code) = index.update_settings_chat(json!({ + "description": "test!", + "documentTemplate": "{% for field in fields %}{% if field.is_searchable and field.value != nil %}{{ field.name }}: {{ field.value }}\n{% endif %}{% endfor %}", + "documentTemplateMaxBytes": 400, + "searchParameters": { + "limit": 16 + } + })).await; + server.wait_task(task2.uid()).await.succeeded(); + + let (response, _) = index.settings().await; + snapshot!(json_string!(response["chat"]), @r#" + { + "description": "test!", + "documentTemplate": "{% for field in fields %}{% if field.is_searchable and field.value != nil %}{{ field.name }}: {{ field.value }}\n{% endif %}{% endfor %}", + "documentTemplateMaxBytes": 400, + "searchParameters": { + "limit": 16, + "sort": [], + "attributesToSearchOn": [] + } + } + "#); +} diff --git a/crates/meilisearch/tests/settings/get_settings.rs b/crates/meilisearch/tests/settings/get_settings.rs index f50f7f940..8419f640d 100644 --- a/crates/meilisearch/tests/settings/get_settings.rs +++ b/crates/meilisearch/tests/settings/get_settings.rs @@ -186,7 +186,7 @@ test_setting_routes!( }, { setting: chat, - update_verb: put, + update_verb: patch, default_value: { "description": "", "documentTemplate": "{% for field in fields %}{% if field.is_searchable and field.value != nil %}{{ field.name }}: {{ field.value }}\n{% endif %}{% endfor %}", diff --git a/crates/meilisearch/tests/settings/mod.rs b/crates/meilisearch/tests/settings/mod.rs index 6b61e6be0..b3a956c25 100644 --- 
a/crates/meilisearch/tests/settings/mod.rs +++ b/crates/meilisearch/tests/settings/mod.rs @@ -1,3 +1,4 @@ +mod chat; mod distinct; mod errors; mod get_settings; diff --git a/crates/meilisearch/tests/tasks/webhook.rs b/crates/meilisearch/tests/tasks/webhook.rs index b18002eb7..bf2477b25 100644 --- a/crates/meilisearch/tests/tasks/webhook.rs +++ b/crates/meilisearch/tests/tasks/webhook.rs @@ -2,16 +2,18 @@ //! post requests. The webhook handle starts a server and forwards all the //! received requests into a channel for you to handle. +use std::path::PathBuf; use std::sync::Arc; use actix_http::body::MessageBody; use actix_web::dev::{ServiceFactory, ServiceResponse}; use actix_web::web::{Bytes, Data}; use actix_web::{post, App, HttpRequest, HttpResponse, HttpServer}; -use meili_snap::snapshot; +use meili_snap::{json_string, snapshot}; use meilisearch::Opt; use tokio::sync::mpsc; use url::Url; +use uuid::Uuid; use crate::common::{self, default_settings, Server}; use crate::json; @@ -68,21 +70,55 @@ async fn create_webhook_server() -> WebhookHandle { } #[actix_web::test] -async fn test_basic_webhook() { - let WebhookHandle { server_handle, url, mut receiver } = create_webhook_server().await; - +async fn cli_only() { let db_path = tempfile::tempdir().unwrap(); let server = Server::new_with_options(Opt { - task_webhook_url: Some(Url::parse(&url).unwrap()), + task_webhook_url: Some(Url::parse("https://example-cli.com/").unwrap()), + task_webhook_authorization_header: Some(String::from("Bearer a-secret-token")), ..default_settings(db_path.path()) }) .await .unwrap(); - let index = server.index("tamo"); + let (webhooks, code) = server.get_webhooks().await; + snapshot!(code, @"200 OK"); + snapshot!(webhooks, @r#" + { + "results": [ + { + "uuid": "00000000-0000-0000-0000-000000000000", + "isEditable": false, + "url": "https://example-cli.com/", + "headers": { + "Authorization": "Bearer a-secret-token" + } + } + ] + } + "#); +} + +#[actix_web::test] +async fn single_receives_data() { + let WebhookHandle { server_handle, url, mut receiver } = create_webhook_server().await; + + let server = Server::new().await; + + let (value, code) = server.create_webhook(json!({ "url": url })).await; + snapshot!(code, @"201 Created"); + snapshot!(json_string!(value, { ".uuid" => "[uuid]", ".url" => "[ignored]" }), @r#" + { + "uuid": "[uuid]", + "isEditable": true, + "url": "[ignored]", + "headers": {} + } + "#); + // May be flaky: we're relying on the fact that while the first document addition is processed, the other // operations will be received and will be batched together. If it doesn't happen, it's not a problem: // the rest of the test won't assume anything about the number of tasks per batch.
+ let index = server.index("tamo"); for i in 0..5 { let (_, _status) = index.add_documents(json!({ "id": i, "doggo": "bone" }), None).await; } @@ -127,3 +163,496 @@ async fn test_basic_webhook() { server_handle.abort(); } + +#[actix_web::test] +async fn multiple_receive_data() { + let WebhookHandle { server_handle: handle1, url: url1, receiver: mut receiver1 } = + create_webhook_server().await; + let WebhookHandle { server_handle: handle2, url: url2, receiver: mut receiver2 } = + create_webhook_server().await; + let WebhookHandle { server_handle: handle3, url: url3, receiver: mut receiver3 } = + create_webhook_server().await; + + let db_path = tempfile::tempdir().unwrap(); + let server = Server::new_with_options(Opt { + task_webhook_url: Some(Url::parse(&url3).unwrap()), + ..default_settings(db_path.path()) + }) + .await + .unwrap(); + + for url in [url1, url2] { + let (value, code) = server.create_webhook(json!({ "url": url })).await; + snapshot!(code, @"201 Created"); + snapshot!(json_string!(value, { ".uuid" => "[uuid]", ".url" => "[ignored]" }), @r#" + { + "uuid": "[uuid]", + "isEditable": true, + "url": "[ignored]", + "headers": {} + } + "#); + } + let index = server.index("tamo"); + let (_, status) = index.add_documents(json!({ "id": 1, "doggo": "bone" }), None).await; + snapshot!(status, @"202 Accepted"); + + let mut count1 = 0; + let mut count2 = 0; + let mut count3 = 0; + while count1 == 0 || count2 == 0 || count3 == 0 { + tokio::select! { + msg = receiver1.recv() => { if msg.is_some() { count1 += 1; } }, + msg = receiver2.recv() => { if msg.is_some() { count2 += 1; } }, + msg = receiver3.recv() => { if msg.is_some() { count3 += 1; } }, + } + } + + assert_eq!(count1, 1); + assert_eq!(count2, 1); + assert_eq!(count3, 1); + + handle1.abort(); + handle2.abort(); + handle3.abort(); +} + +#[actix_web::test] +async fn cli_with_dumps() { + let db_path = tempfile::tempdir().unwrap(); + let server = Server::new_with_options(Opt { + task_webhook_url: Some(Url::parse("http://defined-in-test-cli.com").unwrap()), + task_webhook_authorization_header: Some(String::from( + "Bearer a-secret-token-defined-in-test-cli", + )), + import_dump: Some(PathBuf::from("../dump/tests/assets/v6-with-webhooks.dump")), + ..default_settings(db_path.path()) + }) + .await + .unwrap(); + + let (webhooks, code) = server.get_webhooks().await; + snapshot!(code, @"200 OK"); + snapshot!(webhooks, @r#" + { + "results": [ + { + "uuid": "00000000-0000-0000-0000-000000000000", + "isEditable": false, + "url": "http://defined-in-test-cli.com/", + "headers": { + "Authorization": "Bearer a-secret-token-defined-in-test-cli" + } + }, + { + "uuid": "627ea538-733d-4545-8d2d-03526eb381ce", + "isEditable": true, + "url": "https://example.com/authorization-less", + "headers": {} + }, + { + "uuid": "771b0a28-ef28-4082-b984-536f82958c65", + "isEditable": true, + "url": "https://example.com/hook", + "headers": { + "authorization": "TOKEN" + } + }, + { + "uuid": "f3583083-f8a7-4cbf-a5e7-fb3f1e28a7e9", + "isEditable": true, + "url": "https://third.com", + "headers": {} + } + ] + } + "#); +} + +#[actix_web::test] +async fn reserved_names() { + let db_path = tempfile::tempdir().unwrap(); + let server = Server::new_with_options(Opt { + task_webhook_url: Some(Url::parse("https://example-cli.com/").unwrap()), + task_webhook_authorization_header: Some(String::from("Bearer a-secret-token")), + ..default_settings(db_path.path()) + }) + .await + .unwrap(); + + let (value, code) = server + .patch_webhook(Uuid::nil().to_string(), json!({ "url": 
"http://localhost:8080" })) + .await; + snapshot!(value, @r#" + { + "message": "Webhook `[uuid]` is immutable. The webhook defined from the command line cannot be modified using the API.", + "code": "immutable_webhook", + "type": "invalid_request", + "link": "https://docs.meilisearch.com/errors#immutable_webhook" + } + "#); + snapshot!(code, @"400 Bad Request"); + + let (value, code) = server.delete_webhook(Uuid::nil().to_string()).await; + snapshot!(value, @r#" + { + "message": "Webhook `[uuid]` is immutable. The webhook defined from the command line cannot be modified using the API.", + "code": "immutable_webhook", + "type": "invalid_request", + "link": "https://docs.meilisearch.com/errors#immutable_webhook" + } + "#); + snapshot!(code, @"400 Bad Request"); +} + +#[actix_web::test] +async fn over_limits() { + let server = Server::new().await; + + // Too many webhooks + let mut uuids = Vec::new(); + for _ in 0..20 { + let (value, code) = server.create_webhook(json!({ "url": "http://localhost:8080" } )).await; + snapshot!(code, @"201 Created"); + uuids.push(value.get("uuid").unwrap().as_str().unwrap().to_string()); + } + let (value, code) = server.create_webhook(json!({ "url": "http://localhost:8080" })).await; + snapshot!(code, @"400 Bad Request"); + snapshot!(value, @r#" + { + "message": "Defining too many webhooks would crush the server. Please limit the number of webhooks to 20. You may use a third-party proxy server to dispatch events to more than 20 endpoints.", + "code": "invalid_webhooks", + "type": "invalid_request", + "link": "https://docs.meilisearch.com/errors#invalid_webhooks" + } + "#); + + // Reset webhooks + for uuid in uuids { + let (_value, code) = server.delete_webhook(&uuid).await; + snapshot!(code, @"204 No Content"); + } + + // Test too many headers + let (value, code) = server.create_webhook(json!({ "url": "http://localhost:8080" })).await; + snapshot!(code, @"201 Created"); + let uuid = value.get("uuid").unwrap().as_str().unwrap(); + for i in 0..200 { + let header_name = format!("header_{i}"); + let (_value, code) = + server.patch_webhook(uuid, json!({ "headers": { header_name: "" } })).await; + snapshot!(code, @"200 OK"); + } + let (value, code) = + server.patch_webhook(uuid, json!({ "headers": { "header_200": "" } })).await; + snapshot!(code, @"400 Bad Request"); + snapshot!(value, @r#" + { + "message": "Too many headers for the webhook `[uuid]`. Please limit the number of headers to 200. 
Hint: To remove an already defined header set its value to `null`", + "code": "invalid_webhook_headers", + "type": "invalid_request", + "link": "https://docs.meilisearch.com/errors#invalid_webhook_headers" + } + "#); +} + +#[actix_web::test] +async fn post_get_delete() { + let server = Server::new().await; + + let (value, code) = server + .create_webhook(json!({ + "url": "https://example.com/hook", + "headers": { "authorization": "TOKEN" } + })) + .await; + snapshot!(code, @"201 Created"); + snapshot!(json_string!(value, { ".uuid" => "[uuid]" }), @r#" + { + "uuid": "[uuid]", + "isEditable": true, + "url": "https://example.com/hook", + "headers": { + "authorization": "TOKEN" + } + } + "#); + + let uuid = value.get("uuid").unwrap().as_str().unwrap(); + let (value, code) = server.get_webhook(uuid).await; + snapshot!(code, @"200 OK"); + snapshot!(json_string!(value, { ".uuid" => "[uuid]" }), @r#" + { + "uuid": "[uuid]", + "isEditable": true, + "url": "https://example.com/hook", + "headers": { + "authorization": "TOKEN" + } + } + "#); + + let (_value, code) = server.delete_webhook(uuid).await; + snapshot!(code, @"204 No Content"); + + let (_value, code) = server.get_webhook(uuid).await; + snapshot!(code, @"404 Not Found"); +} + +#[actix_web::test] +async fn create_and_patch() { + let server = Server::new().await; + + let (value, code) = + server.create_webhook(json!({ "headers": { "authorization": "TOKEN" } })).await; + snapshot!(code, @"400 Bad Request"); + snapshot!(value, @r#" + { + "message": "The URL for the webhook `[uuid]` is missing.", + "code": "invalid_webhook_url", + "type": "invalid_request", + "link": "https://docs.meilisearch.com/errors#invalid_webhook_url" + } + "#); + + let (value, code) = server.create_webhook(json!({ "url": "https://example.com/hook" })).await; + snapshot!(code, @"201 Created"); + snapshot!(json_string!(value, { ".uuid" => "[uuid]" }), @r#" + { + "uuid": "[uuid]", + "isEditable": true, + "url": "https://example.com/hook", + "headers": {} + } + "#); + + let uuid = value.get("uuid").unwrap().as_str().unwrap(); + let (value, code) = + server.patch_webhook(&uuid, json!({ "headers": { "authorization": "TOKEN" } })).await; + snapshot!(code, @"200 OK"); + snapshot!(json_string!(value, { ".uuid" => "[uuid]" }), @r#" + { + "uuid": "[uuid]", + "isEditable": true, + "url": "https://example.com/hook", + "headers": { + "authorization": "TOKEN" + } + } + "#); + + let (value, code) = + server.patch_webhook(&uuid, json!({ "headers": { "authorization2": "TOKEN" } })).await; + snapshot!(code, @"200 OK"); + snapshot!(json_string!(value, { ".uuid" => "[uuid]" }), @r#" + { + "uuid": "[uuid]", + "isEditable": true, + "url": "https://example.com/hook", + "headers": { + "authorization": "TOKEN", + "authorization2": "TOKEN" + } + } + "#); + + let (value, code) = + server.patch_webhook(&uuid, json!({ "headers": { "authorization": null } })).await; + snapshot!(code, @"200 OK"); + snapshot!(json_string!(value, { ".uuid" => "[uuid]" }), @r#" + { + "uuid": "[uuid]", + "isEditable": true, + "url": "https://example.com/hook", + "headers": { + "authorization2": "TOKEN" + } + } + "#); + + let (value, code) = server.patch_webhook(&uuid, json!({ "url": null })).await; + snapshot!(code, @"400 Bad Request"); + snapshot!(json_string!(value, { ".uuid" => "[uuid]" }), @r#" + { + "message": "The URL for the webhook `[uuid]` is missing.", + "code": "invalid_webhook_url", + "type": "invalid_request", + "link": "https://docs.meilisearch.com/errors#invalid_webhook_url" + } + "#); +} + +#[actix_web::test] 
+async fn invalid_url_and_headers() { + let server = Server::new().await; + + // Test invalid URL format + let (value, code) = server.create_webhook(json!({ "url": "not-a-valid-url" })).await; + snapshot!(code, @"400 Bad Request"); + snapshot!(value, @r#" + { + "message": "Invalid URL `not-a-valid-url`: relative URL without a base", + "code": "invalid_webhook_url", + "type": "invalid_request", + "link": "https://docs.meilisearch.com/errors#invalid_webhook_url" + } + "#); + + // Test invalid header name (containing spaces) + let (value, code) = server + .create_webhook(json!({ + "url": "https://example.com/hook", + "headers": { "invalid header name": "value" } + })) + .await; + snapshot!(code, @"400 Bad Request"); + snapshot!(value, @r#" + { + "message": "Invalid header name `invalid header name`: invalid HTTP header name", + "code": "invalid_webhook_headers", + "type": "invalid_request", + "link": "https://docs.meilisearch.com/errors#invalid_webhook_headers" + } + "#); + + // Test invalid header value (containing control characters) + let (value, code) = server + .create_webhook(json!({ + "url": "https://example.com/hook", + "headers": { "authorization": "token\nwith\nnewlines" } + })) + .await; + snapshot!(code, @"400 Bad Request"); + snapshot!(value, @r#" + { + "message": "Invalid header value `authorization`: failed to parse header value", + "code": "invalid_webhook_headers", + "type": "invalid_request", + "link": "https://docs.meilisearch.com/errors#invalid_webhook_headers" + } + "#); +} + +#[actix_web::test] +async fn invalid_uuid() { + let server = Server::new().await; + + // Test get webhook with invalid UUID + let (value, code) = server.get_webhook("invalid-uuid").await; + snapshot!(code, @"400 Bad Request"); + snapshot!(value, @r#" + { + "message": "Invalid UUID: invalid character: expected an optional prefix of `urn:uuid:` followed by [0-9a-fA-F-], found `i` at 1", + "code": "invalid_webhook_uuid", + "type": "invalid_request", + "link": "https://docs.meilisearch.com/errors#invalid_webhook_uuid" + } + "#); + + // Test update webhook with invalid UUID + let (value, code) = + server.patch_webhook("invalid-uuid", json!({ "url": "https://example.com/hook" })).await; + snapshot!(code, @"400 Bad Request"); + snapshot!(value, @r#" + { + "message": "Invalid UUID: invalid character: expected an optional prefix of `urn:uuid:` followed by [0-9a-fA-F-], found `i` at 1", + "code": "invalid_webhook_uuid", + "type": "invalid_request", + "link": "https://docs.meilisearch.com/errors#invalid_webhook_uuid" + } + "#); + + // Test delete webhook with invalid UUID + let (value, code) = server.delete_webhook("invalid-uuid").await; + snapshot!(code, @"400 Bad Request"); + snapshot!(value, @r#" + { + "message": "Invalid UUID: invalid character: expected an optional prefix of `urn:uuid:` followed by [0-9a-fA-F-], found `i` at 1", + "code": "invalid_webhook_uuid", + "type": "invalid_request", + "link": "https://docs.meilisearch.com/errors#invalid_webhook_uuid" + } + "#); +} + +#[actix_web::test] +async fn forbidden_fields() { + let server = Server::new().await; + + // Test creating webhook with uuid field + let custom_uuid = Uuid::new_v4(); + let (value, code) = server + .create_webhook(json!({ + "url": "https://example.com/hook", + "uuid": custom_uuid.to_string(), + "headers": { "authorization": "TOKEN" } + })) + .await; + snapshot!(code, @"400 Bad Request"); + snapshot!(value, @r#" + { + "message": "Immutable field `uuid`: expected one of `url`, `headers`", + "code": "immutable_webhook_uuid", + "type": 
"invalid_request", + "link": "https://docs.meilisearch.com/errors#immutable_webhook_uuid" + } + "#); + + // Test creating webhook with isEditable field + let (value, code) = server + .create_webhook(json!({ + "url": "https://example.com/hook2", + "isEditable": false, + "headers": { "authorization": "TOKEN" } + })) + .await; + snapshot!(code, @"400 Bad Request"); + snapshot!(value, @r#" + { + "message": "Immutable field `isEditable`: expected one of `url`, `headers`", + "code": "immutable_webhook_is_editable", + "type": "invalid_request", + "link": "https://docs.meilisearch.com/errors#immutable_webhook_is_editable" + } + "#); + + // Test patching webhook with uuid field + let (value, code) = server + .patch_webhook( + "uuid-whatever", + json!({ + "uuid": Uuid::new_v4(), + "headers": { "new-header": "value" } + }), + ) + .await; + snapshot!(code, @"400 Bad Request"); + snapshot!(value, @r#" + { + "message": "Immutable field `uuid`: expected one of `url`, `headers`", + "code": "immutable_webhook_uuid", + "type": "invalid_request", + "link": "https://docs.meilisearch.com/errors#immutable_webhook_uuid" + } + "#); + + // Test patching webhook with isEditable field + let (value, code) = server + .patch_webhook( + "uuid-whatever", + json!({ + "isEditable": false, + "headers": { "another-header": "value" } + }), + ) + .await; + snapshot!(code, @"400 Bad Request"); + snapshot!(json_string!(value, { ".uuid" => "[uuid]" }), @r#" + { + "message": "Immutable field `isEditable`: expected one of `url`, `headers`", + "code": "immutable_webhook_is_editable", + "type": "invalid_request", + "link": "https://docs.meilisearch.com/errors#immutable_webhook_is_editable" + } + "#); +} diff --git a/crates/meilisearch/tests/upgrade/mod.rs b/crates/meilisearch/tests/upgrade/mod.rs index 8114ed58b..5d120ba2f 100644 --- a/crates/meilisearch/tests/upgrade/mod.rs +++ b/crates/meilisearch/tests/upgrade/mod.rs @@ -43,7 +43,7 @@ async fn version_too_old() { std::fs::write(db_path.join("VERSION"), "1.11.9999").unwrap(); let options = Opt { experimental_dumpless_upgrade: true, ..default_settings }; let err = Server::new_with_options(options).await.map(|_| ()).unwrap_err(); - snapshot!(err, @"Database version 1.11.9999 is too old for the experimental dumpless upgrade feature. Please generate a dump using the v1.11.9999 and import it in the v1.16.0"); + snapshot!(err, @"Database version 1.11.9999 is too old for the experimental dumpless upgrade feature. Please generate a dump using the v1.11.9999 and import it in the v1.17.1"); } #[actix_rt::test] @@ -58,7 +58,7 @@ async fn version_requires_downgrade() { std::fs::write(db_path.join("VERSION"), format!("{major}.{minor}.{patch}")).unwrap(); let options = Opt { experimental_dumpless_upgrade: true, ..default_settings }; let err = Server::new_with_options(options).await.map(|_| ()).unwrap_err(); - snapshot!(err, @"Database version 1.16.1 is higher than the Meilisearch version 1.16.0. Downgrade is not supported"); + snapshot!(err, @"Database version 1.17.2 is higher than the Meilisearch version 1.17.1. 
Downgrade is not supported"); } #[actix_rt::test] diff --git a/crates/meilisearch/tests/upgrade/v1_12/snapshots/v1_12_0.rs/check_the_index_scheduler/batches_filter_afterEnqueuedAt_equal_2025-01-16T16_47_41.snap b/crates/meilisearch/tests/upgrade/v1_12/snapshots/v1_12_0.rs/check_the_index_scheduler/batches_filter_afterEnqueuedAt_equal_2025-01-16T16_47_41.snap index b56cc5ca3..e7d8768be 100644 --- a/crates/meilisearch/tests/upgrade/v1_12/snapshots/v1_12_0.rs/check_the_index_scheduler/batches_filter_afterEnqueuedAt_equal_2025-01-16T16_47_41.snap +++ b/crates/meilisearch/tests/upgrade/v1_12/snapshots/v1_12_0.rs/check_the_index_scheduler/batches_filter_afterEnqueuedAt_equal_2025-01-16T16_47_41.snap @@ -8,7 +8,7 @@ source: crates/meilisearch/tests/upgrade/v1_12/v1_12_0.rs "progress": null, "details": { "upgradeFrom": "v1.12.0", - "upgradeTo": "v1.16.0" + "upgradeTo": "v1.17.1" }, "stats": { "totalNbTasks": 1, diff --git a/crates/meilisearch/tests/upgrade/v1_12/snapshots/v1_12_0.rs/check_the_index_scheduler/batches_filter_afterFinishedAt_equal_2025-01-16T16_47_41.snap b/crates/meilisearch/tests/upgrade/v1_12/snapshots/v1_12_0.rs/check_the_index_scheduler/batches_filter_afterFinishedAt_equal_2025-01-16T16_47_41.snap index b56cc5ca3..e7d8768be 100644 --- a/crates/meilisearch/tests/upgrade/v1_12/snapshots/v1_12_0.rs/check_the_index_scheduler/batches_filter_afterFinishedAt_equal_2025-01-16T16_47_41.snap +++ b/crates/meilisearch/tests/upgrade/v1_12/snapshots/v1_12_0.rs/check_the_index_scheduler/batches_filter_afterFinishedAt_equal_2025-01-16T16_47_41.snap @@ -8,7 +8,7 @@ source: crates/meilisearch/tests/upgrade/v1_12/v1_12_0.rs "progress": null, "details": { "upgradeFrom": "v1.12.0", - "upgradeTo": "v1.16.0" + "upgradeTo": "v1.17.1" }, "stats": { "totalNbTasks": 1, diff --git a/crates/meilisearch/tests/upgrade/v1_12/snapshots/v1_12_0.rs/check_the_index_scheduler/batches_filter_afterStartedAt_equal_2025-01-16T16_47_41.snap b/crates/meilisearch/tests/upgrade/v1_12/snapshots/v1_12_0.rs/check_the_index_scheduler/batches_filter_afterStartedAt_equal_2025-01-16T16_47_41.snap index b56cc5ca3..e7d8768be 100644 --- a/crates/meilisearch/tests/upgrade/v1_12/snapshots/v1_12_0.rs/check_the_index_scheduler/batches_filter_afterStartedAt_equal_2025-01-16T16_47_41.snap +++ b/crates/meilisearch/tests/upgrade/v1_12/snapshots/v1_12_0.rs/check_the_index_scheduler/batches_filter_afterStartedAt_equal_2025-01-16T16_47_41.snap @@ -8,7 +8,7 @@ source: crates/meilisearch/tests/upgrade/v1_12/v1_12_0.rs "progress": null, "details": { "upgradeFrom": "v1.12.0", - "upgradeTo": "v1.16.0" + "upgradeTo": "v1.17.1" }, "stats": { "totalNbTasks": 1, diff --git a/crates/meilisearch/tests/upgrade/v1_12/snapshots/v1_12_0.rs/check_the_index_scheduler/tasks_filter_afterEnqueuedAt_equal_2025-01-16T16_47_41.snap b/crates/meilisearch/tests/upgrade/v1_12/snapshots/v1_12_0.rs/check_the_index_scheduler/tasks_filter_afterEnqueuedAt_equal_2025-01-16T16_47_41.snap index a52072f56..61dd95786 100644 --- a/crates/meilisearch/tests/upgrade/v1_12/snapshots/v1_12_0.rs/check_the_index_scheduler/tasks_filter_afterEnqueuedAt_equal_2025-01-16T16_47_41.snap +++ b/crates/meilisearch/tests/upgrade/v1_12/snapshots/v1_12_0.rs/check_the_index_scheduler/tasks_filter_afterEnqueuedAt_equal_2025-01-16T16_47_41.snap @@ -12,7 +12,7 @@ source: crates/meilisearch/tests/upgrade/v1_12/v1_12_0.rs "canceledBy": null, "details": { "upgradeFrom": "v1.12.0", - "upgradeTo": "v1.16.0" + "upgradeTo": "v1.17.1" }, "error": null, "duration": "[duration]", diff --git 
a/crates/meilisearch/tests/upgrade/v1_12/snapshots/v1_12_0.rs/check_the_index_scheduler/tasks_filter_afterFinishedAt_equal_2025-01-16T16_47_41.snap b/crates/meilisearch/tests/upgrade/v1_12/snapshots/v1_12_0.rs/check_the_index_scheduler/tasks_filter_afterFinishedAt_equal_2025-01-16T16_47_41.snap index a52072f56..61dd95786 100644 --- a/crates/meilisearch/tests/upgrade/v1_12/snapshots/v1_12_0.rs/check_the_index_scheduler/tasks_filter_afterFinishedAt_equal_2025-01-16T16_47_41.snap +++ b/crates/meilisearch/tests/upgrade/v1_12/snapshots/v1_12_0.rs/check_the_index_scheduler/tasks_filter_afterFinishedAt_equal_2025-01-16T16_47_41.snap @@ -12,7 +12,7 @@ source: crates/meilisearch/tests/upgrade/v1_12/v1_12_0.rs "canceledBy": null, "details": { "upgradeFrom": "v1.12.0", - "upgradeTo": "v1.16.0" + "upgradeTo": "v1.17.1" }, "error": null, "duration": "[duration]", diff --git a/crates/meilisearch/tests/upgrade/v1_12/snapshots/v1_12_0.rs/check_the_index_scheduler/tasks_filter_afterStartedAt_equal_2025-01-16T16_47_41.snap b/crates/meilisearch/tests/upgrade/v1_12/snapshots/v1_12_0.rs/check_the_index_scheduler/tasks_filter_afterStartedAt_equal_2025-01-16T16_47_41.snap index a52072f56..61dd95786 100644 --- a/crates/meilisearch/tests/upgrade/v1_12/snapshots/v1_12_0.rs/check_the_index_scheduler/tasks_filter_afterStartedAt_equal_2025-01-16T16_47_41.snap +++ b/crates/meilisearch/tests/upgrade/v1_12/snapshots/v1_12_0.rs/check_the_index_scheduler/tasks_filter_afterStartedAt_equal_2025-01-16T16_47_41.snap @@ -12,7 +12,7 @@ source: crates/meilisearch/tests/upgrade/v1_12/v1_12_0.rs "canceledBy": null, "details": { "upgradeFrom": "v1.12.0", - "upgradeTo": "v1.16.0" + "upgradeTo": "v1.17.1" }, "error": null, "duration": "[duration]", diff --git a/crates/meilisearch/tests/upgrade/v1_12/snapshots/v1_12_0.rs/check_the_index_scheduler/the_whole_batch_queue_once_everything_has_been_processed.snap b/crates/meilisearch/tests/upgrade/v1_12/snapshots/v1_12_0.rs/check_the_index_scheduler/the_whole_batch_queue_once_everything_has_been_processed.snap index 81b50fb92..8103ceed2 100644 --- a/crates/meilisearch/tests/upgrade/v1_12/snapshots/v1_12_0.rs/check_the_index_scheduler/the_whole_batch_queue_once_everything_has_been_processed.snap +++ b/crates/meilisearch/tests/upgrade/v1_12/snapshots/v1_12_0.rs/check_the_index_scheduler/the_whole_batch_queue_once_everything_has_been_processed.snap @@ -8,7 +8,7 @@ source: crates/meilisearch/tests/upgrade/v1_12/v1_12_0.rs "progress": null, "details": { "upgradeFrom": "v1.12.0", - "upgradeTo": "v1.16.0" + "upgradeTo": "v1.17.1" }, "stats": { "totalNbTasks": 1, diff --git a/crates/meilisearch/tests/upgrade/v1_12/snapshots/v1_12_0.rs/check_the_index_scheduler/the_whole_task_queue_once_everything_has_been_processed.snap b/crates/meilisearch/tests/upgrade/v1_12/snapshots/v1_12_0.rs/check_the_index_scheduler/the_whole_task_queue_once_everything_has_been_processed.snap index 1ec334fed..81259377c 100644 --- a/crates/meilisearch/tests/upgrade/v1_12/snapshots/v1_12_0.rs/check_the_index_scheduler/the_whole_task_queue_once_everything_has_been_processed.snap +++ b/crates/meilisearch/tests/upgrade/v1_12/snapshots/v1_12_0.rs/check_the_index_scheduler/the_whole_task_queue_once_everything_has_been_processed.snap @@ -12,7 +12,7 @@ source: crates/meilisearch/tests/upgrade/v1_12/v1_12_0.rs "canceledBy": null, "details": { "upgradeFrom": "v1.12.0", - "upgradeTo": "v1.16.0" + "upgradeTo": "v1.17.1" }, "error": null, "duration": "[duration]", diff --git a/crates/meilisearch/tests/vector/binary_quantized.rs 
b/crates/meilisearch/tests/vector/binary_quantized.rs index 6fcfa3563..adb0da441 100644 --- a/crates/meilisearch/tests/vector/binary_quantized.rs +++ b/crates/meilisearch/tests/vector/binary_quantized.rs @@ -323,7 +323,7 @@ async fn binary_quantize_clear_documents() { // Make sure the arroy DB has been cleared let (documents, _code) = index .search_post(json!({ "hybrid": { "embedder": "manual" }, "vector": [1, 1, 1] })).await; - snapshot!(documents, @r###" + snapshot!(documents, @r#" { "hits": [], "query": "", @@ -333,5 +333,5 @@ "estimatedTotalHits": 0, "semanticHitCount": 0 } - "###); + "#); } diff --git a/crates/meilisearch/tests/vector/mod.rs b/crates/meilisearch/tests/vector/mod.rs index ba2bbfec1..3c08b9e03 100644 --- a/crates/meilisearch/tests/vector/mod.rs +++ b/crates/meilisearch/tests/vector/mod.rs @@ -687,7 +687,7 @@ async fn clear_documents() { // Make sure the arroy DB has been cleared let (documents, _code) = index .search_post(json!({ "vector": [1, 1, 1], "hybrid": {"embedder": "manual"} })).await; - snapshot!(documents, @r###" + snapshot!(documents, @r#" { "hits": [], "query": "", @@ -697,7 +697,7 @@ "estimatedTotalHits": 0, "semanticHitCount": 0 } - "###); + "#); } #[actix_rt::test] @@ -741,7 +741,7 @@ async fn add_remove_one_vector_4588() { json!({"vector": [1, 1, 1], "hybrid": {"semanticRatio": 1.0, "embedder": "manual"} }), ) .await; - snapshot!(documents, @r###" + snapshot!(documents, @r#" { "hits": [ { @@ -756,7 +756,7 @@ "estimatedTotalHits": 1, "semanticHitCount": 1 } - "###); + "#); let (documents, _code) = index .get_all_documents(GetAllDocumentsOptions { retrieve_vectors: true, ..Default::default() }) diff --git a/crates/milli/src/search/facet/filter.rs b/crates/milli/src/search/facet/filter.rs index 1ddfe96c7..f9262f855 100644 --- a/crates/milli/src/search/facet/filter.rs +++ b/crates/milli/src/search/facet/filter.rs @@ -1,3 +1,4 @@ +use std::borrow::Cow; use std::collections::BTreeSet; use std::fmt::{Debug, Display}; use std::ops::Bound::{self, Excluded, Included, Unbounded}; @@ -14,10 +15,9 @@ use super::facet_range_search; use crate::constants::{RESERVED_GEO_FIELD_NAME, RESERVED_VECTORS_FIELD_NAME}; use crate::error::{Error, UserError}; use crate::filterable_attributes_rules::{filtered_matching_patterns, matching_features}; -use crate::heed_codec::facet::{ - FacetGroupKey, FacetGroupKeyCodec, FacetGroupValue, FacetGroupValueCodec, -}; +use crate::heed_codec::facet::{FacetGroupKey, FacetGroupKeyCodec, FacetGroupValueCodec}; use crate::index::db_name::FACET_ID_STRING_DOCIDS; +use crate::search::facet::facet_range_search::find_docids_of_facet_within_bounds; use crate::{ distance_between_two_points, lat_lng_to_xyz, FieldId, FieldsIdsMap, FilterableAttributesFeatures, FilterableAttributesRule, Index, InternalError, Result, @@ -422,20 +422,56 @@ impl<'a> Filter<'a> { return Ok(docids); } Condition::StartsWith { keyword: _, word } => { + // The idea here is that "STARTS WITH baba" is the same as "baba <= value < babb". + // We just increment the last letter to find the upper bound. + // The upper bound may not be valid UTF-8, but LMDB doesn't care as it works over bytes. + let value = crate::normalize_facet(word.value()); - let base = FacetGroupKey { field_id, level: 0, left_bound: value.as_str() }; - let docids = strings_db - .prefix_iter(rtxn, &base)? - .map(|result| -> Result<RoaringBitmap> { - match result { - Ok((_facet_group_key, FacetGroupValue { bitmap, ..
})) => Ok(bitmap), - Err(_e) => Err(InternalError::from(SerializationError::Decoding { - db_name: Some(FACET_ID_STRING_DOCIDS), - }) - .into()), - } - }) - .union()?; + let mut value2 = value.as_bytes().to_owned(); + + let last = match value2.last_mut() { + Some(last) => last, + None => { + // The prefix is empty, so all documents that have the field will match. + return index + .exists_faceted_documents_ids(rtxn, field_id) + .map_err(|e| e.into()); + } + }; + + if *last == u8::MAX { + // u8::MAX is a forbidden UTF-8 byte: we're guaranteed it cannot be sent through a filter to Meilisearch, but just in case, we return an empty result + tracing::warn!( + "Found non-UTF-8 character in filter. That shouldn't be possible" + ); + return Ok(RoaringBitmap::new()); + } + *last += 1; + + // This is very similar to `heed::Bytes` but its `EItem` is `&[u8]` instead of `[u8]` + struct BytesRef; + impl<'a> BytesEncode<'a> for BytesRef { + type EItem = &'a [u8]; + + fn bytes_encode( + item: &'a Self::EItem, + ) -> std::result::Result<Cow<'a, [u8]>, heed::BoxedError> { + Ok(Cow::Borrowed(item)) + } + } + + let mut docids = RoaringBitmap::new(); + let bytes_db = + index.facet_id_string_docids.remap_key_type::<FacetGroupKeyCodec<BytesRef>>(); + find_docids_of_facet_within_bounds::<BytesRef>( + rtxn, + bytes_db, + field_id, + &Included(value.as_bytes()), + &Excluded(value2.as_slice()), + universe, + &mut docids, + )?; + return Ok(docids); } diff --git a/crates/milli/src/search/hybrid.rs b/crates/milli/src/search/hybrid.rs index a29b6c4c7..1535c73ba 100644 --- a/crates/milli/src/search/hybrid.rs +++ b/crates/milli/src/search/hybrid.rs @@ -7,7 +7,7 @@ use roaring::RoaringBitmap; use crate::score_details::{ScoreDetails, ScoreValue, ScoringStrategy}; use crate::search::new::{distinct_fid, distinct_single_docid}; use crate::search::SemanticSearch; -use crate::vector::SearchQuery; +use crate::vector::{Embedding, SearchQuery}; use crate::{Index, MatchingWords, Result, Search, SearchResult}; struct ScoreWithRatioResult { @@ -16,6 +16,7 @@ document_scores: Vec<(u32, ScoreWithRatio)>, degraded: bool, used_negative_operator: bool, + query_vector: Option<Embedding>, } type ScoreWithRatio = (Vec<ScoreDetails>, f32); @@ -85,6 +86,7 @@ impl ScoreWithRatioResult { document_scores, degraded: results.degraded, used_negative_operator: results.used_negative_operator, + query_vector: results.query_vector, } } @@ -186,6 +188,7 @@ impl ScoreWithRatioResult { degraded: vector_results.degraded | keyword_results.degraded, used_negative_operator: vector_results.used_negative_operator | keyword_results.used_negative_operator, + query_vector: vector_results.query_vector, }, semantic_hit_count, )) @@ -209,6 +212,7 @@ impl Search<'_> { terms_matching_strategy: self.terms_matching_strategy, scoring_strategy: ScoringStrategy::Detailed, words_limit: self.words_limit, + retrieve_vectors: self.retrieve_vectors, exhaustive_number_hits: self.exhaustive_number_hits, max_total_hits: self.max_total_hits, rtxn: self.rtxn, @@ -265,7 +269,7 @@ impl Search<'_> { }; search.semantic = Some(SemanticSearch { - vector: Some(vector_query), + vector: Some(vector_query.clone()), embedder_name, embedder, quantized, @@ -322,6 +326,7 @@ fn return_keyword_results( mut document_scores, degraded, used_negative_operator, + query_vector, }: SearchResult, ) -> (SearchResult, Option<u32>) { let (documents_ids, document_scores) = if offset >= documents_ids.len() || @@ -348,6 +353,7 @@ document_scores, degraded, used_negative_operator, + query_vector, }, Some(0), )
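
An aside on the `STARTS WITH` change in filter.rs above: the comment's "baba <= value < babb" trick can be illustrated in isolation. The following is a minimal, self-contained sketch, not part of the patch; the helper name `prefix_upper_bound` is hypothetical.

// Compute the exclusive upper bound of a byte-wise prefix scan by
// incrementing the last byte of the prefix ("baba" -> "babb").
fn prefix_upper_bound(prefix: &str) -> Option<Vec<u8>> {
    let mut upper = prefix.as_bytes().to_owned();
    // Empty prefix: everything matches, so there is no finite upper bound.
    let last = upper.last_mut()?;
    // 0xFF never appears in valid UTF-8; mirror the guard in filter.rs and bail out.
    if *last == u8::MAX {
        return None;
    }
    *last += 1;
    Some(upper)
}

fn main() {
    let lower = "baba".as_bytes();
    let upper = prefix_upper_bound("baba").unwrap();
    assert_eq!(upper, b"babb".to_vec());
    // "baba" and "babar" fall inside [baba, babb); "bab" and "babb" do not.
    for candidate in ["baba", "babar", "bab", "babb"] {
        let bytes = candidate.as_bytes();
        let matches = bytes >= lower && bytes < upper.as_slice();
        println!("{candidate} starts with \"baba\": {matches}");
    }
}

This is why the real implementation can hand `Included(value.as_bytes())` and `Excluded(value2.as_slice())` to the facet range search, and why it special-cases the empty prefix and a trailing 0xFF byte.

diff --git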
a/crates/milli/src/search/mod.rs b/crates/milli/src/search/mod.rs index 8742db24d..2ae931ff5 100644 --- a/crates/milli/src/search/mod.rs +++ b/crates/milli/src/search/mod.rs @@ -52,6 +52,7 @@ pub struct Search<'a> { terms_matching_strategy: TermsMatchingStrategy, scoring_strategy: ScoringStrategy, words_limit: usize, + retrieve_vectors: bool, exhaustive_number_hits: bool, max_total_hits: Option<usize>, rtxn: &'a heed::RoTxn<'a>, @@ -75,6 +76,7 @@ impl<'a> Search<'a> { geo_param: GeoSortParameter::default(), terms_matching_strategy: TermsMatchingStrategy::default(), scoring_strategy: Default::default(), + retrieve_vectors: false, exhaustive_number_hits: false, max_total_hits: None, words_limit: 10, @@ -161,6 +163,11 @@ impl<'a> Search<'a> { self } + pub fn retrieve_vectors(&mut self, retrieve_vectors: bool) -> &mut Search<'a> { + self.retrieve_vectors = retrieve_vectors; + self + } + /// Forces the search to exhaustively compute the number of candidates, /// this will increase the search time but allows finite pagination. pub fn exhaustive_number_hits(&mut self, exhaustive_number_hits: bool) -> &mut Search<'a> { @@ -233,6 +240,7 @@ impl<'a> Search<'a> { } let universe = filtered_universe(ctx.index, ctx.txn, &self.filter)?; + let mut query_vector = None; let PartialSearchResult { located_query_terms, candidates, @@ -247,24 +255,29 @@ impl<'a> Search<'a> { embedder, quantized, media: _, - }) => execute_vector_search( - &mut ctx, - vector, - self.scoring_strategy, - self.exhaustive_number_hits, - self.max_total_hits, - universe, - &self.sort_criteria, - &self.distinct, - self.geo_param, - self.offset, - self.limit, - embedder_name, - embedder, - *quantized, - self.time_budget.clone(), - self.ranking_score_threshold, - )?, + }) => { + if self.retrieve_vectors { + query_vector = Some(vector.clone()); + } + execute_vector_search( + &mut ctx, + vector, + self.scoring_strategy, + self.exhaustive_number_hits, + self.max_total_hits, + universe, + &self.sort_criteria, + &self.distinct, + self.geo_param, + self.offset, + self.limit, + embedder_name, + embedder, + *quantized, + self.time_budget.clone(), + self.ranking_score_threshold, + )?
+ } _ => execute_search( &mut ctx, self.query.as_deref(), @@ -306,6 +319,7 @@ documents_ids, degraded, used_negative_operator, + query_vector, }) } } @@ -324,6 +338,7 @@ impl fmt::Debug for Search<'_> { terms_matching_strategy, scoring_strategy, words_limit, + retrieve_vectors, exhaustive_number_hits, max_total_hits, rtxn: _, @@ -344,6 +359,7 @@ .field("searchable_attributes", searchable_attributes) .field("terms_matching_strategy", terms_matching_strategy) .field("scoring_strategy", scoring_strategy) + .field("retrieve_vectors", retrieve_vectors) .field("exhaustive_number_hits", exhaustive_number_hits) .field("max_total_hits", max_total_hits) .field("words_limit", words_limit) @@ -366,6 +382,7 @@ pub struct SearchResult { pub document_scores: Vec<Vec<ScoreDetails>>, pub degraded: bool, pub used_negative_operator: bool, + pub query_vector: Option<Embedding>, } #[derive(Debug, Clone, Copy, PartialEq, Eq)] diff --git a/crates/milli/src/search/new/tests/integration.rs b/crates/milli/src/search/new/tests/integration.rs index 38f39e18b..6b8c25ab8 100644 --- a/crates/milli/src/search/new/tests/integration.rs +++ b/crates/milli/src/search/new/tests/integration.rs @@ -17,7 +17,7 @@ pub fn setup_search_index_with_criteria(criteria: &[Criterion]) -> Index { let path = tempfile::tempdir().unwrap(); let options = EnvOpenOptions::new(); let mut options = options.read_txn_without_tls(); - options.map_size(10 * 1024 * 1024); // 10 MB + options.map_size(10 * 1024 * 1024); // 10 MiB let index = Index::new(options, &path, true).unwrap(); let mut wtxn = index.write_txn().unwrap(); diff --git a/crates/milli/src/search/similar.rs b/crates/milli/src/search/similar.rs index 903b5fcf9..2235f6436 100644 --- a/crates/milli/src/search/similar.rs +++ b/crates/milli/src/search/similar.rs @@ -130,6 +130,7 @@ impl<'a> Similar<'a> { document_scores, degraded: false, used_negative_operator: false, + query_vector: None, }) } } diff --git a/crates/milli/src/test_index.rs b/crates/milli/src/test_index.rs index 12ac4e158..6e34961e7 100644 --- a/crates/milli/src/test_index.rs +++ b/crates/milli/src/test_index.rs @@ -1097,6 +1097,7 @@ fn bug_3021_fourth() { mut documents_ids, degraded: _, used_negative_operator: _, + query_vector: _, } = search.execute().unwrap(); let primary_key_id = index.fields_ids_map(&rtxn).unwrap().id("primary_key").unwrap(); documents_ids.sort_unstable(); diff --git a/crates/milli/src/update/upgrade/mod.rs b/crates/milli/src/update/upgrade/mod.rs index f53319a37..ecd1cec6c 100644 --- a/crates/milli/src/update/upgrade/mod.rs +++ b/crates/milli/src/update/upgrade/mod.rs @@ -8,6 +8,7 @@ use v1_12::{V1_12_3_To_V1_13_0, V1_12_To_V1_12_3}; use v1_13::{V1_13_0_To_V1_13_1, V1_13_1_To_Latest_V1_13}; use v1_14::Latest_V1_13_To_Latest_V1_14; use v1_15::Latest_V1_14_To_Latest_V1_15; +use v1_16::Latest_V1_16_To_V1_17_0; use crate::constants::{VERSION_MAJOR, VERSION_MINOR, VERSION_PATCH}; use crate::progress::{Progress, VariableNameStep}; @@ -34,6 +35,7 @@ const UPGRADE_FUNCTIONS: &[&dyn UpgradeIndex] = &[ &Latest_V1_13_To_Latest_V1_14 {}, &Latest_V1_14_To_Latest_V1_15 {}, &Latest_V1_15_To_V1_16_0 {}, + &Latest_V1_16_To_V1_17_0 {}, // This is the last upgrade function, it will be called when the index is up to date. // any other upgrade function should be added before this one. &ToCurrentNoOp {}, @@ -62,6 +64,7 @@ const fn start(from: (u32, u32, u32)) -> Option<usize> { // We must handle the current version in the match because in case of a failure some indexes may have been upgraded but not others.
(1, 15, _) => function_index!(6), (1, 16, _) => function_index!(7), + (1, 17, _) => function_index!(8), // We deliberately don't add a placeholder with (VERSION_MAJOR, VERSION_MINOR, VERSION_PATCH) here to force manually // considering dumpless upgrade. (_major, _minor, _patch) => return None, diff --git a/crates/milli/src/update/upgrade/v1_16.rs b/crates/milli/src/update/upgrade/v1_16.rs index f43efd77d..02dd136ce 100644 --- a/crates/milli/src/update/upgrade/v1_16.rs +++ b/crates/milli/src/update/upgrade/v1_16.rs @@ -46,3 +46,22 @@ impl UpgradeIndex for Latest_V1_15_To_V1_16_0 { (1, 16, 0) } } + +#[allow(non_camel_case_types)] +pub(super) struct Latest_V1_16_To_V1_17_0(); + +impl UpgradeIndex for Latest_V1_16_To_V1_17_0 { + fn upgrade( + &self, + _wtxn: &mut RwTxn, + _index: &Index, + _original: (u32, u32, u32), + _progress: Progress, + ) -> Result<bool> { + Ok(false) + } + + fn target_version(&self) -> (u32, u32, u32) { + (1, 17, 0) + } +} diff --git a/crates/milli/tests/search/filters.rs b/crates/milli/tests/search/filters.rs index bb5943782..c97143d48 100644 --- a/crates/milli/tests/search/filters.rs +++ b/crates/milli/tests/search/filters.rs @@ -25,13 +25,16 @@ macro_rules! test_filter { let SearchResult { documents_ids, .. } = search.execute().unwrap(); let filtered_ids = search::expected_filtered_ids($filter); - let expected_external_ids: Vec<_> = + let mut expected_external_ids: Vec<_> = search::expected_order(&criteria, TermsMatchingStrategy::default(), &[]) .into_iter() .filter_map(|d| if filtered_ids.contains(&d.id) { Some(d.id) } else { None }) .collect(); - let documents_ids = search::internal_to_external_ids(&index, &documents_ids); + let mut documents_ids = search::internal_to_external_ids(&index, &documents_ids); + + expected_external_ids.sort_unstable(); + documents_ids.sort_unstable(); assert_eq!(documents_ids, expected_external_ids); } }; @@ -102,3 +105,9 @@ test_filter!(empty_filter_1_double_not, vec![Right("NOT opt1 IS NOT EMPTY")]); test_filter!(in_filter, vec![Right("tag_in IN[1, 2, 3, four, five]")]); test_filter!(not_in_filter, vec![Right("tag_in NOT IN[1, 2, 3, four, five]")]); test_filter!(not_not_in_filter, vec![Right("NOT tag_in NOT IN[1, 2, 3, four, five]")]); + +test_filter!(starts_with_filter_single_letter, vec![Right("tag STARTS WITH e")]); +test_filter!(starts_with_filter_diacritic, vec![Right("tag STARTS WITH é")]); +test_filter!(starts_with_filter_empty_prefix, vec![Right("tag STARTS WITH ''")]); +test_filter!(starts_with_filter_hell, vec![Right("title STARTS WITH hell")]); +test_filter!(starts_with_filter_hello, vec![Right("title STARTS WITH hello")]); diff --git a/crates/milli/tests/search/mod.rs b/crates/milli/tests/search/mod.rs index fa03f1cc1..578a22009 100644 --- a/crates/milli/tests/search/mod.rs +++ b/crates/milli/tests/search/mod.rs @@ -12,7 +12,8 @@ use milli::update::new::indexer; use milli::update::{IndexerConfig, Settings}; use milli::vector::RuntimeEmbedders; use milli::{ - AscDesc, Criterion, DocumentId, FilterableAttributesRule, Index, Member, TermsMatchingStrategy, + normalize_facet, AscDesc, Criterion, DocumentId, FilterableAttributesRule, Index, Member, + TermsMatchingStrategy, }; use serde::{Deserialize, Deserializer}; use slice_group_by::GroupBy; @@ -36,7 +37,7 @@ pub fn setup_search_index_with_criteria(criteria: &[Criterion]) -> Index { let path = tempfile::tempdir().unwrap(); let options = EnvOpenOptions::new(); let mut options = options.read_txn_without_tls(); - options.map_size(10 * 1024 * 1024); // 10 MB + options.map_size(10 * 1024
* 1024); // 10 MiB let index = Index::new(options, &path, true).unwrap(); let mut wtxn = index.write_txn().unwrap(); @@ -46,6 +47,7 @@ pub fn setup_search_index_with_criteria(criteria: &[Criterion]) -> Index { builder.set_criteria(criteria.to_vec()); builder.set_filterable_fields(vec![ + FilterableAttributesRule::Field(S("title")), FilterableAttributesRule::Field(S("tag")), FilterableAttributesRule::Field(S("asc_desc_rank")), FilterableAttributesRule::Field(S("_geo")), @@ -220,6 +222,19 @@ fn execute_filter(filter: &str, document: &TestDocument) -> Option<String> { { id = Some(document.id.clone()) } + } else if let Some((field, prefix)) = filter.split_once("STARTS WITH") { + let field = match field.trim() { + "tag" => &document.tag, + "title" => &document.title, + "description" => &document.description, + _ => panic!("Unknown field: {field}"), + }; + + let field = normalize_facet(field); + let prefix = normalize_facet(prefix.trim().trim_matches('\'')); + if field.starts_with(&prefix) { + id = Some(document.id.clone()); + } } else if let Some(("asc_desc_rank", filter)) = filter.split_once('<') { if document.asc_desc_rank < filter.parse().unwrap() { id = Some(document.id.clone()) @@ -271,6 +286,8 @@ fn execute_filter(filter: &str, document: &TestDocument) -> Option<String> { } else if matches!(filter, "tag_in NOT IN[1, 2, 3, four, five]") { id = (!matches!(document.id.as_str(), "A" | "B" | "C" | "D" | "E")) .then(|| document.id.clone()); + } else { + panic!("Unknown filter: {filter}"); + } id } diff --git a/crates/openapi-generator/Cargo.toml b/crates/openapi-generator/Cargo.toml new file mode 100644 index 000000000..652f6fc57 --- /dev/null +++ b/crates/openapi-generator/Cargo.toml @@ -0,0 +1,12 @@ +[package] +name = "openapi-generator" +version = "0.1.0" +edition = "2021" +publish = false + +[dependencies] +meilisearch = { path = "../meilisearch" } +serde_json = "1.0" +clap = { version = "4.5.40", features = ["derive"] } +anyhow = "1.0.98" +utoipa = "5.4.0" diff --git a/crates/openapi-generator/src/main.rs b/crates/openapi-generator/src/main.rs new file mode 100644 index 000000000..a6196f771 --- /dev/null +++ b/crates/openapi-generator/src/main.rs @@ -0,0 +1,43 @@ +use std::path::PathBuf; + +use anyhow::Result; +use clap::Parser; +use meilisearch::routes::MeilisearchApi; +use utoipa::OpenApi; + +#[derive(Parser)] +#[command(name = "openapi-generator")] +#[command(about = "Generate OpenAPI specification for Meilisearch")] +struct Cli { + /// Output file path (default: meilisearch.json) + #[arg(short, long, value_name = "FILE")] + output: Option<PathBuf>, + + /// Pretty print the JSON output + #[arg(short, long)] + pretty: bool, +} + +fn main() -> Result<()> { + let cli = Cli::parse(); + + // Generate the OpenAPI specification + let openapi = MeilisearchApi::openapi(); + + // Determine output path + let output_path = cli.output.unwrap_or_else(|| PathBuf::from("meilisearch.json")); + + // Serialize to JSON + let json = if cli.pretty { + serde_json::to_string_pretty(&openapi)? + } else { + serde_json::to_string(&openapi)? + }; + + // Write to file + std::fs::write(&output_path, json)?; + + println!("OpenAPI specification written to: {}", output_path.display()); + + Ok(()) +} diff --git a/documentation/release.md b/documentation/release.md index f70fcf872..b3d0ed7e9 100644 --- a/documentation/release.md +++ b/documentation/release.md @@ -4,10 +4,11 @@ This guide is to describe how to make releases for the current repository. ## 📅 Weekly Meilisearch release -1.
A weekly meeting is done every Monday to define the release and to ensure minimal checks before the release. +1. A weekly meeting is held every Thursday afternoon to define the release and to run minimal checks before releasing.
Check out the TODO 👇👇👇 -- [ ] Define the version of the release (`vX.Y.Z`) +- [ ] Define the version of the release (`vX.Y.Z`) based on our Versioning Policy.
+- [ ] Define the commit the release tag will reference. Any PR merged after this commit will not be included in the upcoming release. - [ ] Manually test `--experimental-dumpless-upgrade` on a DB of the previous Meilisearch minor version
- [ ] Check recent automated tests on `main`
- [ ] Scheduled test suite
@@ -22,7 +23,7 @@ This guide is to describe how to make releases for the current repository. 2. Go to the GitHub interface, in the [`Release` section](https://github.com/meilisearch/meilisearch/releases). 3. Select the already drafted release or click on the `Draft a new release` button if you want to start a blank one, and fill the form with the appropriate information. -⚠️ Publish on `main` +⚠️ Publish on a specific commit defined by the team. Alternatively, publish on `main`, but make sure you want every PR already merged into it to be part of the release. ⚙️ The CIs will be triggered to: - [Upload binaries](https://github.com/meilisearch/meilisearch/actions/workflows/publish-binaries.yml) to the associated GitHub release. @@ -31,7 +32,7 @@ This guide is to describe how to make releases for the current repository. - [Move the `latest` git tag to the release commit](https://github.com/meilisearch/meilisearch/actions/workflows/latest-git-tag.yml). -### 🔥 How to do a patch release for an hotfix +### 🔥 How to do a patch release for a hotfix It happens that some releases come with impactful bugs in production (e.g. indexation or search issues): we obviously don't wait for the next cycle to fix them; instead, we release a patched version of Meilisearch.