Mirror of https://github.com/meilisearch/meilisearch.git, synced 2025-12-05 04:05:42 +00:00

Compare commits
3 commits: update-ver...fix-metric

| Author | SHA1 | Date |
|---|---|---|
| | 170b52004c | |
| | da733135c8 | |
| | c3f14b1f00 | |

5 .github/ISSUE_TEMPLATE/new_feature_issue.md vendored
@@ -24,11 +24,6 @@ TBD
- [ ] If not, add the `no db change` label to your PR, and you're good to merge.
- [ ] If yes, add the `db change` label to your PR. You'll receive a message explaining you what to do.

### Reminders when adding features

- [ ] Write unit tests using insta
- [ ] Write declarative integration tests in [workloads/tests](https://github.com/meilisearch/meilisearch/tree/main/workloads/test). Specify the routes to call and then call `cargo xtask test workloads/tests/YOUR_TEST.json --update-responses` so that responses are automatically filled.

### Reminders when modifying the API

- [ ] Update the openAPI file with utoipa:

2 .github/workflows/bench-pr.yml vendored
@@ -67,6 +67,8 @@ jobs:
ref: ${{ steps.comment-branch.outputs.head_ref }}

- uses: dtolnay/rust-toolchain@1.89
with:
profile: minimal

- name: Run benchmarks on PR ${{ github.event.issue.id }}
run: |

2 .github/workflows/bench-push-indexing.yml vendored
@@ -13,6 +13,8 @@ jobs:
steps:
- uses: actions/checkout@v5
- uses: dtolnay/rust-toolchain@1.89
with:
profile: minimal

# Run benchmarks
- name: Run benchmarks - Dataset ${BENCH_NAME} - Branch main - Commit ${{ github.sha }}

6 .github/workflows/db-change-comments.yml vendored
@@ -6,7 +6,7 @@ on:

env:
MESSAGE: |
### Hello, I'm a bot 🤖
### Hello, I'm a bot 🤖

You are receiving this message because you declared that this PR make changes to the Meilisearch database.
Depending on the nature of the change, additional actions might be required on your part. The following sections detail the additional actions depending on the nature of the change, please copy the relevant section in the description of your PR, and make sure to perform the required actions.
@@ -19,7 +19,6 @@ env:

- [ ] Detail the change to the DB format and why they are forward compatible
- [ ] Forward-compatibility: A database created before this PR and using the features touched by this PR was able to be opened by a Meilisearch produced by the code of this PR.
- [ ] Declarative test: add a [declarative test containing a dumpless upgrade](https://github.com/meilisearch/meilisearch/blob/main/TESTING.md#typical-usage)


## This PR makes breaking changes
@@ -36,7 +35,8 @@ env:
- [ ] Write the code to go from the old database to the new one
- If the change happened in milli, the upgrade function should be written and called [here](https://github.com/meilisearch/meilisearch/blob/3fd86e8d76d7d468b0095d679adb09211ca3b6c0/crates/milli/src/update/upgrade/mod.rs#L24-L47)
- If the change happened in the index-scheduler, we've never done it yet, but the right place to do it should be [here](https://github.com/meilisearch/meilisearch/blob/3fd86e8d76d7d468b0095d679adb09211ca3b6c0/crates/index-scheduler/src/scheduler/process_upgrade/mod.rs#L13)
- [ ] Declarative test: add a [declarative test containing a dumpless upgrade](https://github.com/meilisearch/meilisearch/blob/main/TESTING.md#typical-usage)
- [ ] Write an integration test [here](https://github.com/meilisearch/meilisearch/blob/main/crates/meilisearch/tests/upgrade/mod.rs) ensuring you can read the old database, upgrade to the new database, and read the new database as expected


jobs:
add-comment:

6 .github/workflows/flaky-tests.yml vendored
@@ -13,12 +13,6 @@ jobs:
image: ubuntu:22.04
steps:
- uses: actions/checkout@v5
- name: Clean space as per https://github.com/actions/virtual-environments/issues/709
run: |
sudo rm -rf "/opt/ghc" || true
sudo rm -rf "/usr/share/dotnet" || true
sudo rm -rf "/usr/local/lib/android" || true
sudo rm -rf "/usr/local/share/boost" || true
- name: Install needed dependencies
run: |
apt-get update && apt-get install -y curl

2 .github/workflows/fuzzer-indexing.yml vendored
@@ -13,6 +13,8 @@ jobs:
steps:
- uses: actions/checkout@v5
- uses: dtolnay/rust-toolchain@1.89
with:
profile: minimal

# Run benchmarks
- name: Run the fuzzer

6 .github/workflows/publish-apt-brew-pkg.yml vendored
@@ -25,12 +25,6 @@ jobs:
run: |
apt-get update && apt-get install -y curl
apt-get install build-essential -y
- name: Clean space as per https://github.com/actions/virtual-environments/issues/709
run: |
sudo rm -rf "/opt/ghc" || true
sudo rm -rf "/usr/share/dotnet" || true
sudo rm -rf "/usr/local/lib/android" || true
sudo rm -rf "/usr/local/share/boost" || true
- uses: dtolnay/rust-toolchain@1.89
- name: Install cargo-deb
run: cargo install cargo-deb

15 .github/workflows/publish-docker-images.yml vendored
@@ -208,8 +208,8 @@ jobs:
done
cosign sign --yes ${images}

# /!\ Don't touch this without checking with engineers working on the Cloud code base on #discussion-engineering Slack channel
- name: Notify meilisearch-cloud
# /!\ Don't touch this without checking with Cloud team
- name: Send CI information to Cloud team
# Do not send if nightly build (i.e. 'schedule' or 'workflow_dispatch' event)
if: ${{ (github.event_name == 'push') && (matrix.edition == 'enterprise') }}
uses: peter-evans/repository-dispatch@v3
@@ -218,14 +218,3 @@ jobs:
repository: meilisearch/meilisearch-cloud
event-type: cloud-docker-build
client-payload: '{ "meilisearch_version": "${{ github.ref_name }}", "stable": "${{ steps.check-tag-format.outputs.stable }}" }'

# /!\ Don't touch this without checking with integration team members on #discussion-integrations Slack channel
- name: Notify meilisearch-kubernetes
# Do not send if nightly build (i.e. 'schedule' or 'workflow_dispatch' event), or if not stable
if: ${{ github.event_name == 'push' && matrix.edition == 'community' && steps.check-tag-format.outputs.stable == 'true' }}
uses: peter-evans/repository-dispatch@v3
with:
token: ${{ secrets.MEILI_BOT_GH_PAT }}
repository: meilisearch/meilisearch-kubernetes
event-type: meilisearch-release
client-payload: '{ "version": "${{ github.ref_name }}" }'

151 .github/workflows/test-suite.yml vendored
@@ -19,36 +19,31 @@ jobs:
runs-on: ${{ matrix.runner }}
strategy:
matrix:
runner: [ubuntu-22.04, ubuntu-22.04-arm]
runner: [ubuntu-24.04, ubuntu-24.04-arm]
features: ["", "--features enterprise"]
container:
# Use ubuntu-22.04 to compile with glibc 2.35
image: ubuntu:22.04
steps:
- uses: actions/checkout@v5
- name: check free space before
run: df -h
- name: Clean space as per https://github.com/actions/virtual-environments/issues/709
- name: Install needed dependencies
run: |
sudo rm -rf "/opt/ghc" || true
sudo rm -rf "/usr/share/dotnet" || true
sudo rm -rf "/usr/local/lib/android" || true
sudo rm -rf "/usr/local/share/boost" || true
- name: check free space after
run: df -h
apt-get update && apt-get install -y curl
apt-get install build-essential -y
- name: Setup test with Rust stable
uses: dtolnay/rust-toolchain@1.89
- name: Cache dependencies
uses: Swatinem/rust-cache@v2.8.0
with:
key: ${{ matrix.features }}
- name: Run cargo build without any default features
- name: Run cargo check without any default features
uses: actions-rs/cargo@v1
with:
command: build
args: --locked --no-default-features --all
args: --locked --release --no-default-features --all
- name: Run cargo test
uses: actions-rs/cargo@v1
with:
command: test
args: --locked --all ${{ matrix.features }}
args: --locked --release --all ${{ matrix.features }}

test-others:
name: Tests on ${{ matrix.os }}
@@ -58,56 +53,53 @@ jobs:
matrix:
os: [macos-14, windows-2022]
features: ["", "--features enterprise"]
if: github.event_name == 'schedule' || github.event_name == 'workflow_dispatch'
steps:
- uses: actions/checkout@v5
- name: Cache dependencies
uses: Swatinem/rust-cache@v2.8.0
- uses: dtolnay/rust-toolchain@1.89
- name: Run cargo build without any default features
- name: Run cargo check without any default features
uses: actions-rs/cargo@v1
with:
command: build
args: --locked --no-default-features --all
args: --locked --release --no-default-features --all
- name: Run cargo test
uses: actions-rs/cargo@v1
with:
command: test
args: --locked --all ${{ matrix.features }}
args: --locked --release --all ${{ matrix.features }}

test-all-features:
name: Tests almost all features
runs-on: ubuntu-22.04
runs-on: ubuntu-latest
container:
# Use ubuntu-22.04 to compile with glibc 2.35
image: ubuntu:22.04
if: github.event_name == 'schedule' || github.event_name == 'workflow_dispatch'
steps:
- uses: actions/checkout@v5
- name: Clean space as per https://github.com/actions/virtual-environments/issues/709
- name: Install needed dependencies
run: |
sudo rm -rf "/opt/ghc" || true
sudo rm -rf "/usr/share/dotnet" || true
sudo rm -rf "/usr/local/lib/android" || true
sudo rm -rf "/usr/local/share/boost" || true
apt-get update
apt-get install --assume-yes build-essential curl
- uses: dtolnay/rust-toolchain@1.89
- name: Run cargo build with almost all features
run: |
cargo build --workspace --locked --features "$(cargo xtask list-features --exclude-feature cuda,test-ollama)"
cargo build --workspace --locked --release --features "$(cargo xtask list-features --exclude-feature cuda,test-ollama)"
- name: Run cargo test with almost all features
run: |
cargo test --workspace --locked --features "$(cargo xtask list-features --exclude-feature cuda,test-ollama)"
cargo test --workspace --locked --release --features "$(cargo xtask list-features --exclude-feature cuda,test-ollama)"

ollama-ubuntu:
name: Test with Ollama
runs-on: ubuntu-22.04
runs-on: ubuntu-latest
strategy:
matrix:
features: ["", "--features enterprise"]
env:
MEILI_TEST_OLLAMA_SERVER: "http://localhost:11434"
steps:
- uses: actions/checkout@v5
- name: Clean space as per https://github.com/actions/virtual-environments/issues/709
run: |
sudo rm -rf "/opt/ghc" || true
sudo rm -rf "/usr/share/dotnet" || true
sudo rm -rf "/usr/local/lib/android" || true
sudo rm -rf "/usr/local/share/boost" || true
- name: Install Ollama
run: |
curl -fsSL https://ollama.com/install.sh | sudo -E sh
@@ -131,20 +123,20 @@ jobs:
uses: actions-rs/cargo@v1
with:
command: test
args: --locked -p meilisearch --features test-ollama ollama
args: --locked --release --all --features test-ollama ollama ${{ matrix.features }}

test-disabled-tokenization:
name: Test disabled tokenization
runs-on: ubuntu-22.04
runs-on: ubuntu-latest
container:
image: ubuntu:22.04
if: github.event_name == 'schedule' || github.event_name == 'workflow_dispatch'
steps:
- uses: actions/checkout@v5
- name: Clean space as per https://github.com/actions/virtual-environments/issues/709
- name: Install needed dependencies
run: |
sudo rm -rf "/opt/ghc" || true
sudo rm -rf "/usr/share/dotnet" || true
sudo rm -rf "/usr/local/lib/android" || true
sudo rm -rf "/usr/local/share/boost" || true
apt-get update
apt-get install --assume-yes build-essential curl
- uses: dtolnay/rust-toolchain@1.89
- name: Run cargo tree without default features and check lindera is not present
run: |
@@ -156,39 +148,35 @@ jobs:
run: |
cargo tree -f '{p} {f}' -e normal | grep lindera -qz

build:
name: Build in release
runs-on: ubuntu-22.04
steps:
- uses: actions/checkout@v5
- name: Clean space as per https://github.com/actions/virtual-environments/issues/709
run: |
sudo rm -rf "/opt/ghc" || true
sudo rm -rf "/usr/share/dotnet" || true
sudo rm -rf "/usr/local/lib/android" || true
sudo rm -rf "/usr/local/share/boost" || true
- uses: dtolnay/rust-toolchain@1.89
- name: Cache dependencies
uses: Swatinem/rust-cache@v2.8.0
- name: Build
run: cargo build --release --locked --target x86_64-unknown-linux-gnu

clippy:
name: Run Clippy
# We run tests in debug also, to make sure that the debug_assertions are hit
test-debug:
name: Run tests in debug
runs-on: ubuntu-22.04
strategy:
matrix:
features: ["", "--features enterprise"]
steps:
- uses: actions/checkout@v5
- name: Clean space as per https://github.com/actions/virtual-environments/issues/709
run: |
sudo rm -rf "/opt/ghc" || true
sudo rm -rf "/usr/share/dotnet" || true
sudo rm -rf "/usr/local/lib/android" || true
sudo rm -rf "/usr/local/share/boost" || true
- uses: dtolnay/rust-toolchain@1.89
- name: Cache dependencies
uses: Swatinem/rust-cache@v2.8.0
- name: Run tests in debug
uses: actions-rs/cargo@v1
with:
command: test
args: --locked --all ${{ matrix.features }}

clippy:
name: Run Clippy
runs-on: ubuntu-latest
strategy:
matrix:
features: ["", "--features enterprise"]
steps:
- uses: actions/checkout@v5
- uses: dtolnay/rust-toolchain@1.89
with:
profile: minimal
components: clippy
- name: Cache dependencies
uses: Swatinem/rust-cache@v2.8.0
@@ -200,17 +188,14 @@ jobs:

fmt:
name: Run Rustfmt
runs-on: ubuntu-22.04
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v5
- name: Clean space as per https://github.com/actions/virtual-environments/issues/709
run: |
sudo rm -rf "/opt/ghc" || true
sudo rm -rf "/usr/share/dotnet" || true
sudo rm -rf "/usr/local/lib/android" || true
sudo rm -rf "/usr/local/share/boost" || true
- uses: dtolnay/rust-toolchain@1.89
with:
profile: minimal
toolchain: nightly-2024-07-09
override: true
components: rustfmt
- name: Cache dependencies
uses: Swatinem/rust-cache@v2.8.0
@@ -221,23 +206,3 @@ jobs:
run: |
echo -ne "\n" > crates/benchmarks/benches/datasets_paths.rs
cargo fmt --all -- --check

declarative-tests:
name: Run declarative tests
runs-on: ubuntu-22.04-arm
permissions:
contents: read
steps:
- uses: actions/checkout@v5
- name: Clean space as per https://github.com/actions/virtual-environments/issues/709
run: |
sudo rm -rf "/opt/ghc" || true
sudo rm -rf "/usr/share/dotnet" || true
sudo rm -rf "/usr/local/lib/android" || true
sudo rm -rf "/usr/local/share/boost" || true
- uses: dtolnay/rust-toolchain@1.89
- name: Cache dependencies
uses: Swatinem/rust-cache@v2.8.0
- name: Run declarative tests
run: |
cargo xtask test workloads/tests/*.json

@@ -18,13 +18,9 @@ jobs:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v5
- name: Clean space as per https://github.com/actions/virtual-environments/issues/709
run: |
sudo rm -rf "/opt/ghc" || true
sudo rm -rf "/usr/share/dotnet" || true
sudo rm -rf "/usr/local/lib/android" || true
sudo rm -rf "/usr/local/share/boost" || true
- uses: dtolnay/rust-toolchain@1.89
with:
profile: minimal
- name: Install sd
run: cargo install sd
- name: Update Cargo.toml file

@@ -124,7 +124,6 @@ They are JSON files with the following structure (comments are not actually supp
{
// Name of the workload. Must be unique to the workload, as it will be used to group results on the dashboard.
"name": "hackernews.ndjson_1M,no-threads",
"type": "bench",
// Number of consecutive runs of the commands that should be performed.
// Each run uses a fresh instance of Meilisearch and a fresh database.
// Each run produces its own report file.

106 Cargo.lock generated
@@ -580,7 +580,7 @@ source = "git+https://github.com/meilisearch/bbqueue#e8af4a4bccc8eb36b2b0442c4a9

[[package]]
name = "benchmarks"
version = "1.29.0"
version = "1.28.2"
dependencies = [
"anyhow",
"bumpalo",
@@ -790,11 +790,11 @@ dependencies = [

[[package]]
name = "build-info"
version = "1.29.0"
version = "1.28.2"
dependencies = [
"anyhow",
"time",
"vergen-gitcl",
"vergen-git2",
]

[[package]]
@@ -1786,7 +1786,7 @@ dependencies = [

[[package]]
name = "dump"
version = "1.29.0"
version = "1.28.2"
dependencies = [
"anyhow",
"big_s",
@@ -2018,7 +2018,7 @@ checksum = "37909eebbb50d72f9059c3b6d82c0463f2ff062c9e95845c43a6c9c0355411be"

[[package]]
name = "file-store"
version = "1.29.0"
version = "1.28.2"
dependencies = [
"tempfile",
"thiserror 2.0.17",
@@ -2040,7 +2040,7 @@ dependencies = [

[[package]]
name = "filter-parser"
version = "1.29.0"
version = "1.28.2"
dependencies = [
"insta",
"levenshtein_automata",
@@ -2068,7 +2068,7 @@ dependencies = [

[[package]]
name = "flatten-serde-json"
version = "1.29.0"
version = "1.28.2"
dependencies = [
"criterion",
"serde_json",
@@ -2231,7 +2231,7 @@ dependencies = [

[[package]]
name = "fuzzers"
version = "1.29.0"
version = "1.28.2"
dependencies = [
"arbitrary",
"bumpalo",
@@ -2604,6 +2604,19 @@ version = "0.32.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "e629b9b98ef3dd8afe6ca2bd0f89306cec16d43d907889945bc5d6687f2f13c7"

[[package]]
name = "git2"
version = "0.20.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "2deb07a133b1520dc1a5690e9bd08950108873d7ed5de38dcc74d3b5ebffa110"
dependencies = [
"bitflags 2.10.0",
"libc",
"libgit2-sys",
"log",
"url",
]

[[package]]
name = "glob"
version = "0.3.3"
@@ -2698,9 +2711,9 @@ dependencies = [

[[package]]
name = "hannoy"
version = "0.1.0-nested-rtxns"
version = "0.0.9-nested-rtxns-2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "be82bf3f2108ddc8885e3d306fcd7f4692066bfe26065ca8b42ba417f3c26dd1"
checksum = "06eda090938d9dcd568c8c2a5de383047ed9191578ebf4a342d2975d16e621f2"
dependencies = [
"bytemuck",
"byteorder",
@@ -3185,7 +3198,7 @@ dependencies = [

[[package]]
name = "index-scheduler"
version = "1.29.0"
version = "1.28.2"
dependencies = [
"anyhow",
"backoff",
@@ -3447,7 +3460,7 @@ dependencies = [

[[package]]
name = "json-depth-checker"
version = "1.29.0"
version = "1.28.2"
dependencies = [
"criterion",
"serde_json",
@@ -3544,6 +3557,18 @@ dependencies = [
"rle-decode-fast",
]

[[package]]
name = "libgit2-sys"
version = "0.18.2+1.9.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "1c42fe03df2bd3c53a3a9c7317ad91d80c81cd1fb0caec8d7cc4cd2bfa10c222"
dependencies = [
"cc",
"libc",
"libz-sys",
"pkg-config",
]

[[package]]
name = "libloading"
version = "0.8.9"
@@ -3601,6 +3626,18 @@ dependencies = [
"zlib-rs",
]

[[package]]
name = "libz-sys"
version = "1.1.22"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "8b70e7a7df205e92a1a4cd9aaae7898dac0aa555503cc0a649494d0d60e7651d"
dependencies = [
"cc",
"libc",
"pkg-config",
"vcpkg",
]

[[package]]
name = "lindera"
version = "0.43.3"
@@ -3937,7 +3974,7 @@ checksum = "ae960838283323069879657ca3de837e9f7bbb4c7bf6ea7f1b290d5e9476d2e0"

[[package]]
name = "meili-snap"
version = "1.29.0"
version = "1.28.2"
dependencies = [
"insta",
"md5 0.8.0",
@@ -3948,7 +3985,7 @@ dependencies = [

[[package]]
name = "meilisearch"
version = "1.29.0"
version = "1.28.2"
dependencies = [
"actix-cors",
"actix-http",
@@ -4046,7 +4083,7 @@ dependencies = [

[[package]]
name = "meilisearch-auth"
version = "1.29.0"
version = "1.28.2"
dependencies = [
"base64 0.22.1",
"enum-iterator",
@@ -4065,7 +4102,7 @@ dependencies = [

[[package]]
name = "meilisearch-types"
version = "1.29.0"
version = "1.28.2"
dependencies = [
"actix-web",
"anyhow",
@@ -4100,7 +4137,7 @@ dependencies = [

[[package]]
name = "meilitool"
version = "1.29.0"
version = "1.28.2"
dependencies = [
"anyhow",
"clap",
@@ -4134,7 +4171,7 @@ dependencies = [

[[package]]
name = "milli"
version = "1.29.0"
version = "1.28.2"
dependencies = [
"arroy",
"bbqueue",
@@ -4713,7 +4750,7 @@ checksum = "9b4f627cb1b25917193a259e49bdad08f671f8d9708acfd5fe0a8c1455d87220"

[[package]]
name = "permissive-json-pointer"
version = "1.29.0"
version = "1.28.2"
dependencies = [
"big_s",
"serde_json",
@@ -6035,20 +6072,6 @@ name = "similar"
version = "2.7.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "bbbb5d9659141646ae647b42fe094daf6c6192d1620870b449d9557f748b2daa"
dependencies = [
"bstr",
"unicode-segmentation",
]

[[package]]
name = "similar-asserts"
version = "1.7.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "b5b441962c817e33508847a22bd82f03a30cff43642dc2fae8b050566121eb9a"
dependencies = [
"console",
"similar",
]

[[package]]
name = "simple_asn1"
@@ -7082,6 +7105,12 @@ version = "0.1.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "ba73ea9cf16a25df0c8caa16c51acb937d5712a8429db78a3ee29d5dcacd3a65"

[[package]]
name = "vcpkg"
version = "0.2.15"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "accd4ea62f7bb7a82fe23066fb0957d48ef677f6eeb8215f372f52e48bb32426"

[[package]]
name = "vergen"
version = "9.0.6"
@@ -7095,13 +7124,14 @@ dependencies = [
]

[[package]]
name = "vergen-gitcl"
version = "1.0.8"
name = "vergen-git2"
version = "1.0.7"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "b9dfc1de6eb2e08a4ddf152f1b179529638bedc0ea95e6d667c014506377aefe"
checksum = "4f6ee511ec45098eabade8a0750e76eec671e7fb2d9360c563911336bea9cac1"
dependencies = [
"anyhow",
"derive_builder",
"git2",
"rustversion",
"time",
"vergen",
@@ -7753,7 +7783,7 @@ dependencies = [

[[package]]
name = "xtask"
version = "1.29.0"
version = "1.28.2"
dependencies = [
"anyhow",
"build-info",
@@ -7762,11 +7792,9 @@ dependencies = [
"futures-core",
"futures-util",
"reqwest",
"semver",
"serde",
"serde_json",
"sha2",
"similar-asserts",
"sysinfo",
"time",
"tokio",

@@ -23,7 +23,7 @@ members = [
]

[workspace.package]
version = "1.29.0"
version = "1.28.2"
authors = [
"Quentin de Quelen <quentin@dequelen.me>",
"Clément Renault <clement@meilisearch.com>",

326 TESTING.md
@@ -1,326 +0,0 @@
# Declarative tests

Declarative tests ensure that Meilisearch features remain stable across versions.

While we already have unit tests, those are run against **temporary databases** that are created fresh each time and therefore never risk corruption.

Declarative tests instead **simulate the lifetime of a database**: they chain together commands and requests to change the binary, verifying that database state and API responses remain consistent.

## Basic example

```jsonc
{
  "type": "test",
  "name": "api-keys",
  "binary": { // the first command will run on the binary following this specification.
    "source": "release", // get the binary as a release from GitHub
    "version": "1.19.0", // version to fetch
    "edition": "community" // edition to fetch
  },
  "commands": []
}
```

This example defines a no-op test (it does nothing).

If the file is saved at `workloads/tests/example.json`, you can run it with:

```bash
cargo xtask test workloads/tests/example.json
```

## Commands

Commands represent API requests sent to Meilisearch endpoints during a test.

They are executed sequentially, and their responses can be validated to ensure consistent behavior across upgrades.

```jsonc
{
  "route": "keys",
  "method": "POST",
  "body": {
    "inline": {
      "actions": [
        "search",
        "documents.add"
      ],
      "description": "Test API Key",
      "expiresAt": null,
      "indexes": [ "movies" ]
    }
  }
}
```

This command issues a `POST /keys` request, creating an API key with permissions to search and add documents in the `movies` index.

### Using assets in commands

To keep tests concise and reusable, you can define **assets** at the root of the workload file.

Assets are external data sources (such as datasets) that are cached between runs, making tests faster and easier to read.

```jsonc
{
  "type": "test",
  "name": "movies",
  "binary": {
    "source": "release",
    "version": "1.19.0",
    "edition": "community"
  },
  "assets": {
    "movies.json": {
      "local_location": null,
      "remote_location": "https://milli-benchmarks.fra1.digitaloceanspaces.com/bench/datasets/movies.json",
      "sha256": "5b6e4cb660bc20327776e8a33ea197b43d9ec84856710ead1cc87ab24df77de1"
    }
  },
  "commands": [
    {
      "route": "indexes/movies/documents",
      "method": "POST",
      "body": {
        "asset": "movies.json"
      }
    }
  ]
}
```

In this example:
- The `movies.json` dataset is defined as an asset, pointing to a remote URL.
- The SHA-256 checksum ensures integrity.
- The `POST /indexes/movies/documents` command uses this asset as the request body.

This makes the test much cleaner than inlining a large dataset directly into the command.

For asset handling, please refer to the [declarative benchmarks documentation](/BENCHMARKS.md#adding-new-assets).

### Asserting responses

Commands can specify both the **expected status code** and the **expected response body**.

```jsonc
{
  "route": "indexes/movies/documents",
  "method": "POST",
  "body": {
    "asset": "movies.json"
  },
  "expectedStatus": 202,
  "expectedResponse": {
    "enqueuedAt": "[timestamp]", // Set to a bracketed string to ignore the value
    "indexUid": "movies",
    "status": "enqueued",
    "taskUid": 1,
    "type": "documentAdditionOrUpdate"
  },
  "synchronous": "WaitForTask"
}
```

Manually writing `expectedResponse` fields can be tedious.

Instead, you can let the test runner populate them automatically:

```bash
# Run the workload to populate expected fields. Only adds the missing ones, doesn't change existing data
cargo xtask test workloads/tests/example.json --add-missing-responses

# OR

# Run the workload to populate expected fields. Updates all fields including existing ones
cargo xtask test workloads/tests/example.json --update-responses
```

This workflow is recommended:

1. Write the test without expected fields.
2. Run it with `--add-missing-responses` to capture the actual responses.
3. Review and commit the generated expectations.

## Changing binary

It is possible to insert an instruction to change the current Meilisearch instance from one binary specification to another during a test.

When executed, such an instruction will:
1. Stop the current Meilisearch instance.
2. Fetch the binary specified by the instruction.
3. Restart the server with the specified binary on the same database.

```jsonc
{
  "type": "test",
  "name": "movies",
  "binary": {
    "source": "release",
    "version": "1.19.0", // start with version v1.19.0
    "edition": "community"
  },
  "assets": {
    "movies.json": {
      "local_location": null,
      "remote_location": "https://milli-benchmarks.fra1.digitaloceanspaces.com/bench/datasets/movies.json",
      "sha256": "5b6e4cb660bc20327776e8a33ea197b43d9ec84856710ead1cc87ab24df77de1"
    }
  },
  "commands": [
    // setup some data
    {
      "route": "indexes/movies/documents",
      "method": "POST",
      "body": {
        "asset": "movies.json"
      }
    },
    // switch binary to v1.24.0
    {
      "binary": {
        "source": "release",
        "version": "1.24.0",
        "edition": "community"
      }
    }
  ]
}
```

### Typical Usage

In most cases, the change binary instruction will be used to update a database.

- **Set up** some data using commands on an older version.
- **Upgrade** to the latest version.
- **Assert** that the data and API behavior remain correct after the upgrade.

To properly test the dumpless upgrade, one should typically:

1. Open the database without processing the update task: Use a `binary` instruction to switch to the desired version, passing `--experimental-dumpless-upgrade` and `--experimental-max-number-of-batched-tasks=0` as extra CLI arguments
2. Check that the search, stats and task queue still work.
3. Open the database and process the update task: Use a `binary` instruction to switch to the desired version, passing `--experimental-dumpless-upgrade` as the extra CLI argument. Use a `health` command to wait for the upgrade task to finish (a sketch of such a command follows the example below).
4. Check that the indexing, search, stats, and task queue still work.

```jsonc
{
  "type": "test",
  "name": "movies",
  "binary": {
    "source": "release",
    "version": "1.12.0",
    "edition": "community"
  },
  "commands": [
    // 0. Run commands to populate the database
    {
      // ..
    },
    // 1. Open the database with new MS without processing the update task
    {
      "binary": {
        "source": "build", // build the binary from the sources in the current git repository
        "edition": "community",
        "extraCliArgs": [
          "--experimental-dumpless-upgrade", // allows to open with a newer MS
          "--experimental-max-number-of-batched-tasks=0" // prevent processing of the update task
        ]
      }
    },
    // 2. Check the search etc.
    {
      // ..
    },
    // 3. Open the database with new MS and processing the update task
    {
      "binary": {
        "source": "build", // build the binary from the sources in the current git repository
        "edition": "community",
        "extraCliArgs": [
          "--experimental-dumpless-upgrade" // allows to open with a newer MS
          // no `--experimental-max-number-of-batched-tasks=0`
        ]
      }
    },
    // 4. Check the indexing, search, etc.
    {
      // ..
    }
  ]
}
```
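
Step 3 above relies on a `health` command to wait for the upgrade task to finish. The exact command is not shown in this file; a minimal, hypothetical sketch (reusing the `expectedStatus`, `expectedResponse`, and `synchronous` fields documented earlier) could look like:

```jsonc
// Hypothetical "wait for the upgrade" step; the field values are assumptions,
// not a verbatim excerpt from an existing workload.
{
  "route": "health",
  "method": "GET",
  "expectedStatus": 200,
  "expectedResponse": {
    "status": "available"
  },
  "synchronous": "WaitForResponse"
}
```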

This ensures backward compatibility: databases created with older Meilisearch versions should remain functional and consistent after an upgrade.

## Variables

Sometimes a command needs to use a value returned by a **previous response**.
These values can be captured and reused using the register field.

```jsonc
{
  "route": "keys",
  "method": "POST",
  "body": {
    "inline": {
      "actions": [
        "search",
        "documents.add"
      ],
      "description": "Test API Key",
      "expiresAt": null,
      "indexes": [ "movies" ]
    }
  },
  "expectedResponse": {
    "key": "c6f64630bad2996b1f675007c8800168e14adf5d6a7bb1a400a6d2b158050eaf",
    // ...
  },
  "register": {
    "key": "/key"
  },
  "synchronous": "WaitForResponse"
}
```

The `register` field captures the value at the JSON path `/key` from the response.
Paths follow the **JavaScript Object Notation Pointer (RFC 6901)** format.
Registered variables are available for all subsequent commands.
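
As an illustration, a hypothetical command could register the `taskUid` of a document-addition response (like the one shown in the assertions section) under the name `task_id`, which the examples below then reuse:

```jsonc
// Sketch only: "/taskUid" is an RFC 6901 pointer into the response body;
// the captured value becomes the variable `task_id` for later commands.
{
  "route": "indexes/movies/documents",
  "method": "POST",
  "body": { "asset": "movies.json" },
  "register": {
    "task_id": "/taskUid"
  },
  "synchronous": "WaitForTask"
}
```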

Registered variables can be referenced by wrapping their name in double curly braces:

In the route/path:

```jsonc
{
  "route": "tasks/{{ task_id }}",
  "method": "GET"
}
```

In the request body:

```jsonc
{
  "route": "indexes/movies/documents",
  "method": "PATCH",
  "body": {
    "inline": {
      "id": "{{ document_id }}",
      "overview": "Shazam turns evil and the world is in danger.",
    }
  }
}
```

Or they can be referenced by their name (**without curly braces**) as an API key:

```jsonc
{
  "route": "indexes/movies/documents",
  "method": "POST",
  "body": { /* ... */ },
  "apiKeyVariable": "key" // The **content** of the key variable will be used as an API key
}
```

@@ -21,10 +21,6 @@ use roaring::RoaringBitmap;
#[global_allocator]
static ALLOC: mimalloc::MiMalloc = mimalloc::MiMalloc;

fn no_cancel() -> bool {
false
}

const BENCHMARK_ITERATION: usize = 10;

fn setup_dir(path: impl AsRef<Path>) {
@@ -69,7 +65,7 @@ fn setup_settings<'t>(
let sortable_fields = sortable_fields.iter().map(|s| s.to_string()).collect();
builder.set_sortable_fields(sortable_fields);

builder.execute(&no_cancel, &Progress::default(), Default::default()).unwrap();
builder.execute(&|| false, &Progress::default(), Default::default()).unwrap();
}

fn setup_index_with_settings(
@@ -156,7 +152,7 @@ fn indexing_songs_default(c: &mut Criterion) {
&rtxn,
None,
&mut new_fields_ids_map,
&no_cancel,
&|| false,
Progress::default(),
None,
)
@@ -172,7 +168,7 @@ fn indexing_songs_default(c: &mut Criterion) {
primary_key,
&document_changes,
RuntimeEmbedders::default(),
&no_cancel,
&|| false,
&Progress::default(),
&Default::default(),
)
@@ -224,7 +220,7 @@ fn reindexing_songs_default(c: &mut Criterion) {
&rtxn,
None,
&mut new_fields_ids_map,
&no_cancel,
&|| false,
Progress::default(),
None,
)
@@ -240,7 +236,7 @@ fn reindexing_songs_default(c: &mut Criterion) {
primary_key,
&document_changes,
RuntimeEmbedders::default(),
&no_cancel,
&|| false,
&Progress::default(),
&Default::default(),
)
@@ -270,7 +266,7 @@ fn reindexing_songs_default(c: &mut Criterion) {
&rtxn,
None,
&mut new_fields_ids_map,
&no_cancel,
&|| false,
Progress::default(),
None,
)
@@ -286,7 +282,7 @@ fn reindexing_songs_default(c: &mut Criterion) {
primary_key,
&document_changes,
RuntimeEmbedders::default(),
&no_cancel,
&|| false,
&Progress::default(),
&Default::default(),
)
@@ -340,7 +336,7 @@ fn deleting_songs_in_batches_default(c: &mut Criterion) {
&rtxn,
None,
&mut new_fields_ids_map,
&no_cancel,
&|| false,
Progress::default(),
None,
)
@@ -356,7 +352,7 @@ fn deleting_songs_in_batches_default(c: &mut Criterion) {
primary_key,
&document_changes,
RuntimeEmbedders::default(),
&no_cancel,
&|| false,
&Progress::default(),
&Default::default(),
)
@@ -418,7 +414,7 @@ fn indexing_songs_in_three_batches_default(c: &mut Criterion) {
&rtxn,
None,
&mut new_fields_ids_map,
&no_cancel,
&|| false,
Progress::default(),
None,
)
@@ -434,7 +430,7 @@ fn indexing_songs_in_three_batches_default(c: &mut Criterion) {
primary_key,
&document_changes,
RuntimeEmbedders::default(),
&no_cancel,
&|| false,
&Progress::default(),
&Default::default(),
)
@@ -464,7 +460,7 @@ fn indexing_songs_in_three_batches_default(c: &mut Criterion) {
&rtxn,
None,
&mut new_fields_ids_map,
&no_cancel,
&|| false,
Progress::default(),
None,
)
@@ -480,7 +476,7 @@ fn indexing_songs_in_three_batches_default(c: &mut Criterion) {
primary_key,
&document_changes,
RuntimeEmbedders::default(),
&no_cancel,
&|| false,
&Progress::default(),
&Default::default(),
)
@@ -506,7 +502,7 @@ fn indexing_songs_in_three_batches_default(c: &mut Criterion) {
&rtxn,
None,
&mut new_fields_ids_map,
&no_cancel,
&|| false,
Progress::default(),
None,
)
@@ -522,7 +518,7 @@ fn indexing_songs_in_three_batches_default(c: &mut Criterion) {
primary_key,
&document_changes,
RuntimeEmbedders::default(),
&no_cancel,
&|| false,
&Progress::default(),
&Default::default(),
)
@@ -575,7 +571,7 @@ fn indexing_songs_without_faceted_numbers(c: &mut Criterion) {
&rtxn,
None,
&mut new_fields_ids_map,
&no_cancel,
&|| false,
Progress::default(),
None,
)
@@ -591,7 +587,7 @@ fn indexing_songs_without_faceted_numbers(c: &mut Criterion) {
primary_key,
&document_changes,
RuntimeEmbedders::default(),
&no_cancel,
&|| false,
&Progress::default(),
&Default::default(),
)
@@ -643,7 +639,7 @@ fn indexing_songs_without_faceted_fields(c: &mut Criterion) {
&rtxn,
None,
&mut new_fields_ids_map,
&no_cancel,
&|| false,
Progress::default(),
None,
)
@@ -659,7 +655,7 @@ fn indexing_songs_without_faceted_fields(c: &mut Criterion) {
primary_key,
&document_changes,
RuntimeEmbedders::default(),
&no_cancel,
&|| false,
&Progress::default(),
&Default::default(),
)
@@ -711,7 +707,7 @@ fn indexing_wiki(c: &mut Criterion) {
&rtxn,
None,
&mut new_fields_ids_map,
&no_cancel,
&|| false,
Progress::default(),
None,
)
@@ -727,7 +723,7 @@ fn indexing_wiki(c: &mut Criterion) {
primary_key,
&document_changes,
RuntimeEmbedders::default(),
&no_cancel,
&|| false,
&Progress::default(),
&Default::default(),
)
@@ -778,7 +774,7 @@ fn reindexing_wiki(c: &mut Criterion) {
&rtxn,
None,
&mut new_fields_ids_map,
&no_cancel,
&|| false,
Progress::default(),
None,
)
@@ -794,7 +790,7 @@ fn reindexing_wiki(c: &mut Criterion) {
primary_key,
&document_changes,
RuntimeEmbedders::default(),
&no_cancel,
&|| false,
&Progress::default(),
&Default::default(),
)
@@ -824,7 +820,7 @@ fn reindexing_wiki(c: &mut Criterion) {
&rtxn,
None,
&mut new_fields_ids_map,
&no_cancel,
&|| false,
Progress::default(),
None,
)
@@ -840,7 +836,7 @@ fn reindexing_wiki(c: &mut Criterion) {
primary_key,
&document_changes,
RuntimeEmbedders::default(),
&no_cancel,
&|| false,
&Progress::default(),
&Default::default(),
)
@@ -893,7 +889,7 @@ fn deleting_wiki_in_batches_default(c: &mut Criterion) {
&rtxn,
None,
&mut new_fields_ids_map,
&no_cancel,
&|| false,
Progress::default(),
None,
)
@@ -909,7 +905,7 @@ fn deleting_wiki_in_batches_default(c: &mut Criterion) {
primary_key,
&document_changes,
RuntimeEmbedders::default(),
&no_cancel,
&|| false,
&Progress::default(),
&Default::default(),
)
@@ -971,7 +967,7 @@ fn indexing_wiki_in_three_batches(c: &mut Criterion) {
&rtxn,
None,
&mut new_fields_ids_map,
&no_cancel,
&|| false,
Progress::default(),
None,
)
@@ -987,7 +983,7 @@ fn indexing_wiki_in_three_batches(c: &mut Criterion) {
primary_key,
&document_changes,
RuntimeEmbedders::default(),
&no_cancel,
&|| false,
&Progress::default(),
&Default::default(),
)
@@ -1018,7 +1014,7 @@ fn indexing_wiki_in_three_batches(c: &mut Criterion) {
&rtxn,
None,
&mut new_fields_ids_map,
&no_cancel,
&|| false,
Progress::default(),
None,
)
@@ -1034,7 +1030,7 @@ fn indexing_wiki_in_three_batches(c: &mut Criterion) {
primary_key,
&document_changes,
RuntimeEmbedders::default(),
&no_cancel,
&|| false,
&Progress::default(),
&Default::default(),
)
@@ -1061,7 +1057,7 @@ fn indexing_wiki_in_three_batches(c: &mut Criterion) {
&rtxn,
None,
&mut new_fields_ids_map,
&no_cancel,
&|| false,
Progress::default(),
None,
)
@@ -1077,7 +1073,7 @@ fn indexing_wiki_in_three_batches(c: &mut Criterion) {
primary_key,
&document_changes,
RuntimeEmbedders::default(),
&no_cancel,
&|| false,
&Progress::default(),
&Default::default(),
)
@@ -1129,7 +1125,7 @@ fn indexing_movies_default(c: &mut Criterion) {
&rtxn,
None,
&mut new_fields_ids_map,
&no_cancel,
&|| false,
Progress::default(),
None,
)
@@ -1145,7 +1141,7 @@ fn indexing_movies_default(c: &mut Criterion) {
primary_key,
&document_changes,
RuntimeEmbedders::default(),
&no_cancel,
&|| false,
&Progress::default(),
&Default::default(),
)
@@ -1196,7 +1192,7 @@ fn reindexing_movies_default(c: &mut Criterion) {
&rtxn,
None,
&mut new_fields_ids_map,
&no_cancel,
&|| false,
Progress::default(),
None,
)
@@ -1212,7 +1208,7 @@ fn reindexing_movies_default(c: &mut Criterion) {
primary_key,
&document_changes,
RuntimeEmbedders::default(),
&no_cancel,
&|| false,
&Progress::default(),
&Default::default(),
)
@@ -1242,7 +1238,7 @@ fn reindexing_movies_default(c: &mut Criterion) {
&rtxn,
None,
&mut new_fields_ids_map,
&no_cancel,
&|| false,
Progress::default(),
None,
)
@@ -1258,7 +1254,7 @@ fn reindexing_movies_default(c: &mut Criterion) {
primary_key,
&document_changes,
RuntimeEmbedders::default(),
&no_cancel,
&|| false,
&Progress::default(),
&Default::default(),
)
@@ -1311,7 +1307,7 @@ fn deleting_movies_in_batches_default(c: &mut Criterion) {
&rtxn,
None,
&mut new_fields_ids_map,
&no_cancel,
&|| false,
Progress::default(),
None,
)
@@ -1327,7 +1323,7 @@ fn deleting_movies_in_batches_default(c: &mut Criterion) {
primary_key,
&document_changes,
RuntimeEmbedders::default(),
&no_cancel,
&|| false,
&Progress::default(),
&Default::default(),
)
@@ -1376,7 +1372,7 @@ fn delete_documents_from_ids(index: Index, document_ids_to_delete: Vec<RoaringBi
Some(primary_key),
&document_changes,
RuntimeEmbedders::default(),
&no_cancel,
&|| false,
&Progress::default(),
&Default::default(),
)
@@ -1426,7 +1422,7 @@ fn indexing_movies_in_three_batches(c: &mut Criterion) {
&rtxn,
None,
&mut new_fields_ids_map,
&no_cancel,
&|| false,
Progress::default(),
None,
)
@@ -1442,7 +1438,7 @@ fn indexing_movies_in_three_batches(c: &mut Criterion) {
primary_key,
&document_changes,
RuntimeEmbedders::default(),
&no_cancel,
&|| false,
&Progress::default(),
&Default::default(),
)
@@ -1472,7 +1468,7 @@ fn indexing_movies_in_three_batches(c: &mut Criterion) {
&rtxn,
None,
&mut new_fields_ids_map,
&no_cancel,
&|| false,
Progress::default(),
None,
)
@@ -1488,7 +1484,7 @@ fn indexing_movies_in_three_batches(c: &mut Criterion) {
primary_key,
&document_changes,
RuntimeEmbedders::default(),
&no_cancel,
&|| false,
&Progress::default(),
&Default::default(),
)
@@ -1514,7 +1510,7 @@ fn indexing_movies_in_three_batches(c: &mut Criterion) {
&rtxn,
None,
&mut new_fields_ids_map,
&no_cancel,
&|| false,
Progress::default(),
None,
)
@@ -1530,7 +1526,7 @@ fn indexing_movies_in_three_batches(c: &mut Criterion) {
primary_key,
&document_changes,
RuntimeEmbedders::default(),
&no_cancel,
&|| false,
&Progress::default(),
&Default::default(),
)
@@ -1605,7 +1601,7 @@ fn indexing_nested_movies_default(c: &mut Criterion) {
&rtxn,
None,
&mut new_fields_ids_map,
&no_cancel,
&|| false,
Progress::default(),
None,
)
@@ -1621,7 +1617,7 @@ fn indexing_nested_movies_default(c: &mut Criterion) {
primary_key,
&document_changes,
RuntimeEmbedders::default(),
&no_cancel,
&|| false,
&Progress::default(),
&Default::default(),
)
@@ -1697,7 +1693,7 @@ fn deleting_nested_movies_in_batches_default(c: &mut Criterion) {
&rtxn,
None,
&mut new_fields_ids_map,
&no_cancel,
&|| false,
Progress::default(),
None,
)
@@ -1713,7 +1709,7 @@ fn deleting_nested_movies_in_batches_default(c: &mut Criterion) {
primary_key,
&document_changes,
RuntimeEmbedders::default(),
&no_cancel,
&|| false,
&Progress::default(),
&Default::default(),
)
@@ -1781,7 +1777,7 @@ fn indexing_nested_movies_without_faceted_fields(c: &mut Criterion) {
&rtxn,
None,
&mut new_fields_ids_map,
&no_cancel,
&|| false,
Progress::default(),
None,
)
@@ -1797,7 +1793,7 @@ fn indexing_nested_movies_without_faceted_fields(c: &mut Criterion) {
primary_key,
&document_changes,
RuntimeEmbedders::default(),
&no_cancel,
&|| false,
&Progress::default(),
&Default::default(),
)
@@ -1849,7 +1845,7 @@ fn indexing_geo(c: &mut Criterion) {
&rtxn,
None,
&mut new_fields_ids_map,
&no_cancel,
&|| false,
Progress::default(),
None,
)
@@ -1865,7 +1861,7 @@ fn indexing_geo(c: &mut Criterion) {
primary_key,
&document_changes,
RuntimeEmbedders::default(),
&no_cancel,
&|| false,
&Progress::default(),
&Default::default(),
)
@@ -1916,7 +1912,7 @@ fn reindexing_geo(c: &mut Criterion) {
&rtxn,
None,
&mut new_fields_ids_map,
&no_cancel,
&|| false,
Progress::default(),
None,
)
@@ -1932,7 +1928,7 @@ fn reindexing_geo(c: &mut Criterion) {
primary_key,
&document_changes,
RuntimeEmbedders::default(),
&no_cancel,
&|| false,
&Progress::default(),
&Default::default(),
)
@@ -1962,7 +1958,7 @@ fn reindexing_geo(c: &mut Criterion) {
&rtxn,
None,
&mut new_fields_ids_map,
&no_cancel,
&|| false,
Progress::default(),
None,
)
@@ -1978,7 +1974,7 @@ fn reindexing_geo(c: &mut Criterion) {
primary_key,
&document_changes,
RuntimeEmbedders::default(),
&no_cancel,
&|| false,
&Progress::default(),
&Default::default(),
)
@@ -2031,7 +2027,7 @@ fn deleting_geo_in_batches_default(c: &mut Criterion) {
&rtxn,
None,
&mut new_fields_ids_map,
&no_cancel,
&|| false,
Progress::default(),
None,
)
@@ -2047,7 +2043,7 @@ fn deleting_geo_in_batches_default(c: &mut Criterion) {
primary_key,
&document_changes,
RuntimeEmbedders::default(),
&no_cancel,
&|| false,
&Progress::default(),
&Default::default(),
)

@@ -15,4 +15,4 @@ time = { version = "0.3.44", features = ["parsing"] }

[build-dependencies]
anyhow = "1.0.100"
vergen-gitcl = "1.0.8"
vergen-git2 = "1.0.7"

@@ -15,7 +15,7 @@ fn emit_git_variables() -> anyhow::Result<()> {
// Note: any code that needs VERGEN_ environment variables should take care to define them manually in the Dockerfile and pass them
// in the corresponding GitHub workflow (publish_docker.yml).
// This is due to the Dockerfile building the binary outside of the git directory.
let mut builder = vergen_gitcl::GitclBuilder::default();
let mut builder = vergen_git2::Git2Builder::default();

builder.branch(true);
builder.commit_timestamp(true);
@@ -25,5 +25,5 @@ fn emit_git_variables() -> anyhow::Result<()> {

let git2 = builder.build()?;

vergen_gitcl::Emitter::default().fail_on_error().add_instructions(&git2)?.emit()
vergen_git2::Emitter::default().fail_on_error().add_instructions(&git2)?.emit()
}

@@ -1,6 +0,0 @@
use build_info::BuildInfo;

fn main() {
let info = BuildInfo::from_build();
dbg!(info);
}
@@ -6,7 +6,7 @@ use meilisearch_types::heed::types::{SerdeBincode, SerdeJson, Str};
use meilisearch_types::heed::{Database, RoTxn};
use meilisearch_types::milli::{CboRoaringBitmapCodec, RoaringBitmapCodec, BEU32};
use meilisearch_types::tasks::{Details, Kind, Status, Task};
use meilisearch_types::versioning::{self, VERSION_MAJOR, VERSION_MINOR, VERSION_PATCH};
use meilisearch_types::versioning;
use roaring::RoaringBitmap;

use crate::index_mapper::IndexMapper;
@@ -320,11 +320,7 @@ fn snapshot_details(d: &Details) -> String {
format!("{{ url: {url:?}, api_key: {api_key:?}, payload_size: {payload_size:?}, indexes: {indexes:?} }}")
}
Details::UpgradeDatabase { from, to } => {
if to == &(VERSION_MAJOR, VERSION_MINOR, VERSION_PATCH) {
format!("{{ from: {from:?}, to: [current version] }}")
} else {
format!("{{ from: {from:?}, to: {to:?} }}")
}
format!("{{ from: {from:?}, to: {to:?} }}")
}
Details::IndexCompaction { index_uid, pre_compaction_size, post_compaction_size } => {
format!("{{ index_uid: {index_uid:?}, pre_compaction_size: {pre_compaction_size:?}, post_compaction_size: {post_compaction_size:?} }}")
@@ -404,21 +400,7 @@ pub fn snapshot_batch(batch: &Batch) -> String {

snap.push('{');
snap.push_str(&format!("uid: {uid}, "));
let details = if let Some(upgrade_to) = &details.upgrade_to {
if upgrade_to.as_str()
== format!("v{VERSION_MAJOR}.{VERSION_MINOR}.{VERSION_PATCH}").as_str()
{
let mut details = details.clone();

details.upgrade_to = Some("[current version]".into());
serde_json::to_string(&details).unwrap()
} else {
serde_json::to_string(details).unwrap()
}
} else {
serde_json::to_string(details).unwrap()
};
snap.push_str(&format!("details: {details}, "));
snap.push_str(&format!("details: {}, ", serde_json::to_string(details).unwrap()));
snap.push_str(&format!("stats: {}, ", serde_json::to_string(&stats).unwrap()));
if !embedder_stats.skip_serializing() {
snap.push_str(&format!(

@@ -6,7 +6,7 @@ source: crates/index-scheduler/src/scheduler/test_failure.rs
[]
----------------------------------------------------------------------
### All Tasks:
0 {uid: 0, batch_uid: 0, status: succeeded, details: { from: (1, 12, 0), to: [current version] }, kind: UpgradeDatabase { from: (1, 12, 0) }}
0 {uid: 0, batch_uid: 0, status: succeeded, details: { from: (1, 12, 0), to: (1, 28, 2) }, kind: UpgradeDatabase { from: (1, 12, 0) }}
1 {uid: 1, batch_uid: 1, status: succeeded, details: { primary_key: Some("mouse"), old_new_uid: None, new_index_uid: None }, kind: IndexCreation { index_uid: "catto", primary_key: Some("mouse") }}
2 {uid: 2, batch_uid: 2, status: succeeded, details: { primary_key: Some("bone"), old_new_uid: None, new_index_uid: None }, kind: IndexCreation { index_uid: "doggo", primary_key: Some("bone") }}
3 {uid: 3, batch_uid: 3, status: failed, error: ResponseError { code: 200, message: "Index `doggo` already exists.", error_code: "index_already_exists", error_type: "invalid_request", error_link: "https://docs.meilisearch.com/errors#index_already_exists" }, details: { primary_key: Some("bone"), old_new_uid: None, new_index_uid: None }, kind: IndexCreation { index_uid: "doggo", primary_key: Some("bone") }}
@@ -57,7 +57,7 @@ girafo: { number_of_documents: 0, field_distribution: {} }
[timestamp] [4,]
----------------------------------------------------------------------
### All Batches:
0 {uid: 0, details: {"upgradeFrom":"v1.12.0","upgradeTo":"[current version]"}, stats: {"totalNbTasks":1,"status":{"succeeded":1},"types":{"upgradeDatabase":1},"indexUids":{}}, stop reason: "stopped after the last task of type `upgradeDatabase` because they cannot be batched with tasks of any other type.", }
0 {uid: 0, details: {"upgradeFrom":"v1.12.0","upgradeTo":"v1.28.2"}, stats: {"totalNbTasks":1,"status":{"succeeded":1},"types":{"upgradeDatabase":1},"indexUids":{}}, stop reason: "stopped after the last task of type `upgradeDatabase` because they cannot be batched with tasks of any other type.", }
1 {uid: 1, details: {"primaryKey":"mouse"}, stats: {"totalNbTasks":1,"status":{"succeeded":1},"types":{"indexCreation":1},"indexUids":{"catto":1}}, stop reason: "created batch containing only task with id 1 of type `indexCreation` that cannot be batched with any other task.", }
2 {uid: 2, details: {"primaryKey":"bone"}, stats: {"totalNbTasks":1,"status":{"succeeded":1},"types":{"indexCreation":1},"indexUids":{"doggo":1}}, stop reason: "created batch containing only task with id 2 of type `indexCreation` that cannot be batched with any other task.", }
3 {uid: 3, details: {"primaryKey":"bone"}, stats: {"totalNbTasks":1,"status":{"failed":1},"types":{"indexCreation":1},"indexUids":{"doggo":1}}, stop reason: "created batch containing only task with id 3 of type `indexCreation` that cannot be batched with any other task.", }

@@ -6,7 +6,7 @@ source: crates/index-scheduler/src/scheduler/test_failure.rs
[]
----------------------------------------------------------------------
### All Tasks:
0 {uid: 0, status: enqueued, details: { from: (1, 12, 0), to: [current version] }, kind: UpgradeDatabase { from: (1, 12, 0) }}
0 {uid: 0, status: enqueued, details: { from: (1, 12, 0), to: (1, 28, 2) }, kind: UpgradeDatabase { from: (1, 12, 0) }}
----------------------------------------------------------------------
### Status:
enqueued [0,]

@@ -6,7 +6,7 @@ source: crates/index-scheduler/src/scheduler/test_failure.rs
[]
----------------------------------------------------------------------
### All Tasks:
0 {uid: 0, status: enqueued, details: { from: (1, 12, 0), to: [current version] }, kind: UpgradeDatabase { from: (1, 12, 0) }}
0 {uid: 0, status: enqueued, details: { from: (1, 12, 0), to: (1, 28, 2) }, kind: UpgradeDatabase { from: (1, 12, 0) }}
1 {uid: 1, status: enqueued, details: { primary_key: Some("mouse"), old_new_uid: None, new_index_uid: None }, kind: IndexCreation { index_uid: "catto", primary_key: Some("mouse") }}
----------------------------------------------------------------------
|
||||
### Status:
|
||||
|
||||
@@ -6,7 +6,7 @@ source: crates/index-scheduler/src/scheduler/test_failure.rs
|
||||
[]
|
||||
----------------------------------------------------------------------
|
||||
### All Tasks:
|
||||
0 {uid: 0, batch_uid: 0, status: failed, error: ResponseError { code: 200, message: "Planned failure for tests.", error_code: "internal", error_type: "internal", error_link: "https://docs.meilisearch.com/errors#internal" }, details: { from: (1, 12, 0), to: [current version] }, kind: UpgradeDatabase { from: (1, 12, 0) }}
|
||||
0 {uid: 0, batch_uid: 0, status: failed, error: ResponseError { code: 200, message: "Planned failure for tests.", error_code: "internal", error_type: "internal", error_link: "https://docs.meilisearch.com/errors#internal" }, details: { from: (1, 12, 0), to: (1, 28, 2) }, kind: UpgradeDatabase { from: (1, 12, 0) }}
|
||||
1 {uid: 1, status: enqueued, details: { primary_key: Some("mouse"), old_new_uid: None, new_index_uid: None }, kind: IndexCreation { index_uid: "catto", primary_key: Some("mouse") }}
|
||||
----------------------------------------------------------------------
|
||||
### Status:
|
||||
@@ -37,7 +37,7 @@ catto [1,]
|
||||
[timestamp] [0,]
|
||||
----------------------------------------------------------------------
|
||||
### All Batches:
|
||||
0 {uid: 0, details: {"upgradeFrom":"v1.12.0","upgradeTo":"[current version]"}, stats: {"totalNbTasks":1,"status":{"failed":1},"types":{"upgradeDatabase":1},"indexUids":{}}, stop reason: "stopped after the last task of type `upgradeDatabase` because they cannot be batched with tasks of any other type.", }
|
||||
0 {uid: 0, details: {"upgradeFrom":"v1.12.0","upgradeTo":"v1.28.2"}, stats: {"totalNbTasks":1,"status":{"failed":1},"types":{"upgradeDatabase":1},"indexUids":{}}, stop reason: "stopped after the last task of type `upgradeDatabase` because they cannot be batched with tasks of any other type.", }
|
||||
----------------------------------------------------------------------
|
||||
### Batch to tasks mapping:
|
||||
0 [0,]
|
||||
|
||||
@@ -6,7 +6,7 @@ source: crates/index-scheduler/src/scheduler/test_failure.rs
|
||||
[]
|
||||
----------------------------------------------------------------------
|
||||
### All Tasks:
|
||||
0 {uid: 0, batch_uid: 0, status: failed, error: ResponseError { code: 200, message: "Planned failure for tests.", error_code: "internal", error_type: "internal", error_link: "https://docs.meilisearch.com/errors#internal" }, details: { from: (1, 12, 0), to: [current version] }, kind: UpgradeDatabase { from: (1, 12, 0) }}
|
||||
0 {uid: 0, batch_uid: 0, status: failed, error: ResponseError { code: 200, message: "Planned failure for tests.", error_code: "internal", error_type: "internal", error_link: "https://docs.meilisearch.com/errors#internal" }, details: { from: (1, 12, 0), to: (1, 28, 2) }, kind: UpgradeDatabase { from: (1, 12, 0) }}
|
||||
1 {uid: 1, status: enqueued, details: { primary_key: Some("mouse"), old_new_uid: None, new_index_uid: None }, kind: IndexCreation { index_uid: "catto", primary_key: Some("mouse") }}
|
||||
2 {uid: 2, status: enqueued, details: { primary_key: Some("bone"), old_new_uid: None, new_index_uid: None }, kind: IndexCreation { index_uid: "doggo", primary_key: Some("bone") }}
|
||||
----------------------------------------------------------------------
|
||||
@@ -40,7 +40,7 @@ doggo [2,]
|
||||
[timestamp] [0,]
|
||||
----------------------------------------------------------------------
|
||||
### All Batches:
|
||||
0 {uid: 0, details: {"upgradeFrom":"v1.12.0","upgradeTo":"[current version]"}, stats: {"totalNbTasks":1,"status":{"failed":1},"types":{"upgradeDatabase":1},"indexUids":{}}, stop reason: "stopped after the last task of type `upgradeDatabase` because they cannot be batched with tasks of any other type.", }
|
||||
0 {uid: 0, details: {"upgradeFrom":"v1.12.0","upgradeTo":"v1.28.2"}, stats: {"totalNbTasks":1,"status":{"failed":1},"types":{"upgradeDatabase":1},"indexUids":{}}, stop reason: "stopped after the last task of type `upgradeDatabase` because they cannot be batched with tasks of any other type.", }
|
||||
----------------------------------------------------------------------
|
||||
### Batch to tasks mapping:
|
||||
0 [0,]
|
||||
|
||||
@@ -6,7 +6,7 @@ source: crates/index-scheduler/src/scheduler/test_failure.rs
|
||||
[]
|
||||
----------------------------------------------------------------------
|
||||
### All Tasks:
|
||||
0 {uid: 0, batch_uid: 0, status: succeeded, details: { from: (1, 12, 0), to: [current version] }, kind: UpgradeDatabase { from: (1, 12, 0) }}
|
||||
0 {uid: 0, batch_uid: 0, status: succeeded, details: { from: (1, 12, 0), to: (1, 28, 2) }, kind: UpgradeDatabase { from: (1, 12, 0) }}
|
||||
1 {uid: 1, status: enqueued, details: { primary_key: Some("mouse"), old_new_uid: None, new_index_uid: None }, kind: IndexCreation { index_uid: "catto", primary_key: Some("mouse") }}
|
||||
2 {uid: 2, status: enqueued, details: { primary_key: Some("bone"), old_new_uid: None, new_index_uid: None }, kind: IndexCreation { index_uid: "doggo", primary_key: Some("bone") }}
|
||||
3 {uid: 3, status: enqueued, details: { primary_key: Some("bone"), old_new_uid: None, new_index_uid: None }, kind: IndexCreation { index_uid: "doggo", primary_key: Some("bone") }}
|
||||
@@ -43,7 +43,7 @@ doggo [2,3,]
|
||||
[timestamp] [0,]
|
||||
----------------------------------------------------------------------
|
||||
### All Batches:
|
||||
0 {uid: 0, details: {"upgradeFrom":"v1.12.0","upgradeTo":"[current version]"}, stats: {"totalNbTasks":1,"status":{"succeeded":1},"types":{"upgradeDatabase":1},"indexUids":{}}, stop reason: "stopped after the last task of type `upgradeDatabase` because they cannot be batched with tasks of any other type.", }
|
||||
0 {uid: 0, details: {"upgradeFrom":"v1.12.0","upgradeTo":"v1.28.2"}, stats: {"totalNbTasks":1,"status":{"succeeded":1},"types":{"upgradeDatabase":1},"indexUids":{}}, stop reason: "stopped after the last task of type `upgradeDatabase` because they cannot be batched with tasks of any other type.", }
|
||||
----------------------------------------------------------------------
|
||||
### Batch to tasks mapping:
|
||||
0 [0,]
|
||||
|
||||
@@ -1,7 +1,7 @@
use anyhow::bail;
use meilisearch_types::heed::{Env, RwTxn, WithoutTls};
use meilisearch_types::tasks::{Details, KindWithContent, Status, Task};
use meilisearch_types::versioning;
use meilisearch_types::versioning::{VERSION_MAJOR, VERSION_MINOR, VERSION_PATCH};
use time::OffsetDateTime;
use tracing::info;

@@ -9,82 +9,83 @@ use crate::queue::TaskQueue;
use crate::versioning::Versioning;

trait UpgradeIndexScheduler {
fn upgrade(&self, env: &Env<WithoutTls>, wtxn: &mut RwTxn) -> anyhow::Result<()>;
/// Whether the migration should be applied, depending on the initial version of the index scheduler before
/// any migration was applied
fn must_upgrade(&self, initial_version: (u32, u32, u32)) -> bool;
/// A progress-centric description of the migration
fn description(&self) -> &'static str;
fn upgrade(
&self,
env: &Env<WithoutTls>,
wtxn: &mut RwTxn,
original: (u32, u32, u32),
) -> anyhow::Result<()>;
fn target_version(&self) -> (u32, u32, u32);
}

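With the new trait shape above, each migration step declares the version it brings the scheduler to and receives the version it started from. A purely illustrative implementation (not part of this diff) could look like:

```rust
// Hypothetical migration step targeting an assumed v1.29.0; the no-op step defined at the
// end of the real file always targets the binary version instead.
#[allow(non_camel_case_types)]
struct ToV1_29_0;

impl UpgradeIndexScheduler for ToV1_29_0 {
    fn upgrade(
        &self,
        _env: &Env<WithoutTls>,
        _wtxn: &mut RwTxn,
        _original: (u32, u32, u32),
    ) -> anyhow::Result<()> {
        // Rewrite whatever index-scheduler data changed in this hypothetical version.
        Ok(())
    }

    fn target_version(&self) -> (u32, u32, u32) {
        (1, 29, 0)
    }
}
```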
/// Upgrade the index scheduler to the binary version.
///
/// # Warning
///
/// The current implementation uses a single wtxn to the index scheduler for the whole duration of the upgrade.
/// If migrations start taking a long time, it might prevent tasks from being registered.
/// If this issue manifests, then it can be mitigated by adding a `fn target_version` to `UpgradeIndexScheduler`,
/// to be able to write intermediate versions and drop the wtxn between applying migrations.
pub fn upgrade_index_scheduler(
|
||||
env: &Env<WithoutTls>,
|
||||
versioning: &Versioning,
|
||||
initial_version: (u32, u32, u32),
|
||||
from: (u32, u32, u32),
|
||||
to: (u32, u32, u32),
|
||||
) -> anyhow::Result<()> {
|
||||
let target_major: u32 = versioning::VERSION_MAJOR;
|
||||
let target_minor: u32 = versioning::VERSION_MINOR;
|
||||
let target_patch: u32 = versioning::VERSION_PATCH;
|
||||
let target_version = (target_major, target_minor, target_patch);
|
||||
|
||||
if initial_version == target_version {
|
||||
return Ok(());
|
||||
}
|
||||
let current_major = to.0;
|
||||
let current_minor = to.1;
|
||||
let current_patch = to.2;
|
||||
|
||||
let upgrade_functions: &[&dyn UpgradeIndexScheduler] = &[
|
||||
// List all upgrade functions to apply in order here.
|
||||
// This is the last upgrade function, it will be called when the index is up to date.
|
||||
// any other upgrade function should be added before this one.
|
||||
&ToCurrentNoOp {},
|
||||
];
|
||||
|
||||
let (initial_major, initial_minor, initial_patch) = initial_version;
|
||||
|
||||
if initial_version > target_version {
|
||||
bail!(
|
||||
"Database version {initial_major}.{initial_minor}.{initial_patch} is higher than the Meilisearch version {target_major}.{target_minor}.{target_patch}. Downgrade is not supported",
|
||||
let start = match from {
|
||||
(1, 12, _) => 0,
|
||||
(1, 13, _) => 0,
|
||||
(1, 14, _) => 0,
|
||||
(1, 15, _) => 0,
|
||||
(1, 16, _) => 0,
|
||||
(1, 17, _) => 0,
|
||||
(1, 18, _) => 0,
|
||||
(1, 19, _) => 0,
|
||||
(1, 20, _) => 0,
|
||||
(1, 21, _) => 0,
|
||||
(1, 22, _) => 0,
|
||||
(1, 23, _) => 0,
|
||||
(1, 24, _) => 0,
|
||||
(1, 25, _) => 0,
|
||||
(1, 26, _) => 0,
|
||||
(1, 27, _) => 0,
|
||||
(1, 28, _) => 0,
|
||||
(major, minor, patch) => {
|
||||
if major > current_major
|
||||
|| (major == current_major && minor > current_minor)
|
||||
|| (major == current_major && minor == current_minor && patch > current_patch)
|
||||
{
|
||||
bail!(
|
||||
"Database version {major}.{minor}.{patch} is higher than the Meilisearch version {current_major}.{current_minor}.{current_patch}. Downgrade is not supported",
|
||||
);
|
||||
} else if major < 1 || (major == current_major && minor < 12) {
|
||||
bail!(
|
||||
"Database version {major}.{minor}.{patch} is too old for the experimental dumpless upgrade feature. Please generate a dump using the v{major}.{minor}.{patch} and import it in the v{current_major}.{current_minor}.{current_patch}",
|
||||
);
|
||||
}
|
||||
|
||||
if initial_version < (1, 12, 0) {
|
||||
bail!(
|
||||
"Database version {initial_major}.{initial_minor}.{initial_patch} is too old for the experimental dumpless upgrade feature. Please generate a dump using the v{initial_major}.{initial_minor}.{initial_patch} and import it in the v{target_major}.{target_minor}.{target_patch}",
|
||||
);
|
||||
}
|
||||
} else {
|
||||
bail!("Unknown database version: v{major}.{minor}.{patch}");
|
||||
}
|
||||
}
|
||||
};
|
||||
|
||||
info!("Upgrading the task queue");
|
||||
let mut wtxn = env.write_txn()?;
|
||||
let migration_count = upgrade_functions.len();
|
||||
for (migration_index, upgrade) in upgrade_functions.iter().enumerate() {
|
||||
if upgrade.must_upgrade(initial_version) {
|
||||
info!(
|
||||
"[{migration_index}/{migration_count}]Applying migration: {}",
|
||||
upgrade.description()
|
||||
);
|
||||
|
||||
upgrade.upgrade(env, &mut wtxn)?;
|
||||
|
||||
info!(
|
||||
"[{}/{migration_count}]Migration applied: {}",
|
||||
migration_index + 1,
|
||||
upgrade.description()
|
||||
)
|
||||
} else {
|
||||
info!(
|
||||
"[{migration_index}/{migration_count}]Skipping unnecessary migration: {}",
|
||||
upgrade.description()
|
||||
)
|
||||
}
|
||||
let mut local_from = from;
|
||||
for upgrade in upgrade_functions[start..].iter() {
|
||||
let target = upgrade.target_version();
|
||||
info!(
|
||||
"Upgrading from v{}.{}.{} to v{}.{}.{}",
|
||||
local_from.0, local_from.1, local_from.2, target.0, target.1, target.2
|
||||
);
|
||||
let mut wtxn = env.write_txn()?;
|
||||
upgrade.upgrade(env, &mut wtxn, local_from)?;
|
||||
versioning.set_version(&mut wtxn, target)?;
|
||||
wtxn.commit()?;
|
||||
local_from = target;
|
||||
}
|
||||
|
||||
versioning.set_version(&mut wtxn, target_version)?;
|
||||
info!("Task queue upgraded, spawning the upgrade database task");
|
||||
|
||||
let mut wtxn = env.write_txn()?;
|
||||
let queue = TaskQueue::new(env, &mut wtxn)?;
|
||||
let uid = queue.next_task_id(&wtxn)?;
|
||||
queue.register(
|
||||
@@ -97,9 +98,9 @@ pub fn upgrade_index_scheduler(
|
||||
finished_at: None,
|
||||
error: None,
|
||||
canceled_by: None,
|
||||
details: Some(Details::UpgradeDatabase { from: initial_version, to: target_version }),
|
||||
details: Some(Details::UpgradeDatabase { from, to }),
|
||||
status: Status::Enqueued,
|
||||
kind: KindWithContent::UpgradeDatabase { from: initial_version },
|
||||
kind: KindWithContent::UpgradeDatabase { from },
|
||||
network: None,
|
||||
custom_metadata: None,
|
||||
},
|
||||
@@ -108,3 +109,21 @@ pub fn upgrade_index_scheduler(
|
||||
|
||||
Ok(())
|
||||
}
|
||||
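Both versions of the guard logic above enforce the same ordering rules; since Rust compares tuples lexicographically, the checks reduce to something like this sketch (the real function additionally runs the migrations and registers the `upgradeDatabase` task):

```rust
// Sketch of the version guards only, assuming (major, minor, patch) tuples.
fn check_upgrade_path(from: (u32, u32, u32), to: (u32, u32, u32)) -> anyhow::Result<()> {
    if from > to {
        // e.g. from = (1, 28, 3) against to = (1, 28, 2) in the downgrade test below
        anyhow::bail!("Database version {from:?} is higher than the Meilisearch version {to:?}. Downgrade is not supported");
    }
    if from < (1, 12, 0) {
        anyhow::bail!("Database version {from:?} is too old for the experimental dumpless upgrade feature");
    }
    Ok(())
}
```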
|
||||
#[allow(non_camel_case_types)]
|
||||
struct ToCurrentNoOp {}
|
||||
|
||||
impl UpgradeIndexScheduler for ToCurrentNoOp {
|
||||
fn upgrade(
|
||||
&self,
|
||||
_env: &Env<WithoutTls>,
|
||||
_wtxn: &mut RwTxn,
|
||||
_original: (u32, u32, u32),
|
||||
) -> anyhow::Result<()> {
|
||||
Ok(())
|
||||
}
|
||||
|
||||
fn target_version(&self) -> (u32, u32, u32) {
|
||||
(VERSION_MAJOR, VERSION_MINOR, VERSION_PATCH)
|
||||
}
|
||||
}
|
||||
|
||||
@@ -64,7 +64,14 @@ impl Versioning {
};
wtxn.commit()?;

upgrade_index_scheduler(env, &this, from)?;
let bin_major: u32 = versioning::VERSION_MAJOR;
let bin_minor: u32 = versioning::VERSION_MINOR;
let bin_patch: u32 = versioning::VERSION_PATCH;
let to = (bin_major, bin_minor, bin_patch);

if from != to {
upgrade_index_scheduler(env, &this, from, to)?;
}

// Once we reach this point it means the upgrade process, if there was one is entirely finished
// we can safely say we reached the latest version of the index scheduler

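The `(u32, u32, u32)` tuples used here and in `upgrade_index_scheduler` correspond to the `MAJOR.MINOR.PATCH` string written to the `VERSION` file in the tests further down. A hypothetical parser, just to make that mapping concrete (the real `versioning` module presumably has its own):

```rust
// Illustrative only: turn "1.28.2" into the (1, 28, 2) tuple shape used by the upgrade path.
fn parse_version(s: &str) -> anyhow::Result<(u32, u32, u32)> {
    match s.trim().split('.').collect::<Vec<_>>().as_slice() {
        [major, minor, patch] => Ok((major.parse()?, minor.parse()?, patch.parse()?)),
        other => anyhow::bail!("expected MAJOR.MINOR.PATCH, got {other:?}"),
    }
}
```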
@@ -1,6 +1,5 @@
use std::collections::BTreeMap;

use serde::{Deserialize, Serialize};
use std::collections::BTreeMap;

#[derive(Serialize, Deserialize, Debug, Clone, PartialEq, Eq, Default)]
#[serde(rename_all = "camelCase")]

@@ -197,7 +197,7 @@ test_setting_routes!(
{
setting: vector_store,
update_verb: patch,
default_value: "experimental"
default_value: null
},
);

@@ -2,7 +2,6 @@ mod chat;
mod distinct;
mod errors;
mod get_settings;
mod parent_seachable_fields;
mod prefix_search_settings;
mod proximity_settings;
mod tokenizer_customization;

@@ -1,114 +0,0 @@
|
||||
use meili_snap::{json_string, snapshot};
|
||||
use once_cell::sync::Lazy;
|
||||
|
||||
use crate::common::Server;
|
||||
use crate::json;
|
||||
|
||||
static DOCUMENTS: Lazy<crate::common::Value> = Lazy::new(|| {
|
||||
json!([
|
||||
{
|
||||
"id": 1,
|
||||
"meta": {
|
||||
"title": "Soup of the day",
|
||||
"description": "many the fish",
|
||||
}
|
||||
},
|
||||
{
|
||||
"id": 2,
|
||||
"meta": {
|
||||
"title": "Soup of day",
|
||||
"description": "many the lazy fish",
|
||||
}
|
||||
},
|
||||
{
|
||||
"id": 3,
|
||||
"meta": {
|
||||
"title": "the Soup of day",
|
||||
"description": "many the fish",
|
||||
}
|
||||
},
|
||||
])
|
||||
});
|
||||
|
||||
#[actix_rt::test]
|
||||
async fn nested_field_becomes_searchable() {
|
||||
let server = Server::new_shared();
|
||||
let index = server.unique_index();
|
||||
|
||||
let (task, _status_code) = index.add_documents(DOCUMENTS.clone(), None).await;
|
||||
server.wait_task(task.uid()).await.succeeded();
|
||||
|
||||
let (response, code) = index
|
||||
.update_settings(json!({
|
||||
"searchableAttributes": ["meta.title"]
|
||||
}))
|
||||
.await;
|
||||
assert_eq!("202", code.as_str(), "{response:?}");
|
||||
server.wait_task(response.uid()).await.succeeded();
|
||||
|
||||
// We expect no documents when searching for
|
||||
// a nested non-searchable field
|
||||
index
|
||||
.search(json!({"q": "many fish"}), |response, code| {
|
||||
snapshot!(code, @"200 OK");
|
||||
snapshot!(json_string!(response["hits"]), @r###"[]"###);
|
||||
})
|
||||
.await;
|
||||
|
||||
let (response, code) = index
|
||||
.update_settings(json!({
|
||||
"searchableAttributes": ["meta.title", "meta.description"]
|
||||
}))
|
||||
.await;
|
||||
assert_eq!("202", code.as_str(), "{response:?}");
|
||||
server.wait_task(response.uid()).await.succeeded();
|
||||
|
||||
// We expect all the documents when the nested field becomes searchable
|
||||
index
|
||||
.search(json!({"q": "many fish"}), |response, code| {
|
||||
snapshot!(code, @"200 OK");
|
||||
snapshot!(json_string!(response["hits"]), @r###"
|
||||
[
|
||||
{
|
||||
"id": 1,
|
||||
"meta": {
|
||||
"title": "Soup of the day",
|
||||
"description": "many the fish"
|
||||
}
|
||||
},
|
||||
{
|
||||
"id": 3,
|
||||
"meta": {
|
||||
"title": "the Soup of day",
|
||||
"description": "many the fish"
|
||||
}
|
||||
},
|
||||
{
|
||||
"id": 2,
|
||||
"meta": {
|
||||
"title": "Soup of day",
|
||||
"description": "many the lazy fish"
|
||||
}
|
||||
}
|
||||
]
|
||||
"###);
|
||||
})
|
||||
.await;
|
||||
|
||||
let (response, code) = index
|
||||
.update_settings(json!({
|
||||
"searchableAttributes": ["meta.title"]
|
||||
}))
|
||||
.await;
|
||||
assert_eq!("202", code.as_str(), "{response:?}");
|
||||
server.wait_task(response.uid()).await.succeeded();
|
||||
|
||||
// We expect no documents when searching for
|
||||
// a nested non-searchable field
|
||||
index
|
||||
.search(json!({"q": "many fish"}), |response, code| {
|
||||
snapshot!(code, @"200 OK");
|
||||
snapshot!(json_string!(response["hits"]), @r###"[]"###);
|
||||
})
|
||||
.await;
|
||||
}
|
||||
@@ -42,16 +42,8 @@ async fn version_too_old() {
std::fs::create_dir_all(&db_path).unwrap();
std::fs::write(db_path.join("VERSION"), "1.11.9999").unwrap();
let options = Opt { experimental_dumpless_upgrade: true, ..default_settings };
let err = Server::new_with_options(options).await.map(|_| ()).unwrap_err().to_string();

let major = meilisearch_types::versioning::VERSION_MAJOR;
let minor = meilisearch_types::versioning::VERSION_MINOR;
let patch = meilisearch_types::versioning::VERSION_PATCH;

let current_version = format!("{major}.{minor}.{patch}");
let err = err.replace(&current_version, "[current version]");

snapshot!(err, @"Database version 1.11.9999 is too old for the experimental dumpless upgrade feature. Please generate a dump using the v1.11.9999 and import it in the v[current version]");
let err = Server::new_with_options(options).await.map(|_| ()).unwrap_err();
snapshot!(err, @"Database version 1.11.9999 is too old for the experimental dumpless upgrade feature. Please generate a dump using the v1.11.9999 and import it in the v1.28.2");
}

#[actix_rt::test]
@@ -62,21 +54,11 @@ async fn version_requires_downgrade() {
std::fs::create_dir_all(&db_path).unwrap();
let major = meilisearch_types::versioning::VERSION_MAJOR;
let minor = meilisearch_types::versioning::VERSION_MINOR;
let mut patch = meilisearch_types::versioning::VERSION_PATCH;

let current_version = format!("{major}.{minor}.{patch}");
patch += 1;
let future_version = format!("{major}.{minor}.{patch}");

std::fs::write(db_path.join("VERSION"), &future_version).unwrap();
let patch = meilisearch_types::versioning::VERSION_PATCH + 1;
std::fs::write(db_path.join("VERSION"), format!("{major}.{minor}.{patch}")).unwrap();
let options = Opt { experimental_dumpless_upgrade: true, ..default_settings };
let err = Server::new_with_options(options).await.map(|_| ()).unwrap_err();

let err = err.to_string();
let err = err.replace(&current_version, "[current version]");
let err = err.replace(&future_version, "[future version]");

snapshot!(err, @"Database version [future version] is higher than the Meilisearch version [current version]. Downgrade is not supported");
snapshot!(err, @"Database version 1.28.3 is higher than the Meilisearch version 1.28.2. Downgrade is not supported");
}

#[actix_rt::test]

@@ -8,7 +8,7 @@ source: crates/meilisearch/tests/upgrade/v1_12/v1_12_0.rs
|
||||
"progress": null,
|
||||
"details": {
|
||||
"upgradeFrom": "v1.12.0",
|
||||
"upgradeTo": "[current version]"
|
||||
"upgradeTo": "v1.28.2"
|
||||
},
|
||||
"stats": {
|
||||
"totalNbTasks": 1,
|
||||
|
||||
@@ -8,7 +8,7 @@ source: crates/meilisearch/tests/upgrade/v1_12/v1_12_0.rs
|
||||
"progress": null,
|
||||
"details": {
|
||||
"upgradeFrom": "v1.12.0",
|
||||
"upgradeTo": "[current version]"
|
||||
"upgradeTo": "v1.28.2"
|
||||
},
|
||||
"stats": {
|
||||
"totalNbTasks": 1,
|
||||
|
||||
@@ -8,7 +8,7 @@ source: crates/meilisearch/tests/upgrade/v1_12/v1_12_0.rs
|
||||
"progress": null,
|
||||
"details": {
|
||||
"upgradeFrom": "v1.12.0",
|
||||
"upgradeTo": "[current version]"
|
||||
"upgradeTo": "v1.28.2"
|
||||
},
|
||||
"stats": {
|
||||
"totalNbTasks": 1,
|
||||
|
||||
@@ -12,7 +12,7 @@ source: crates/meilisearch/tests/upgrade/v1_12/v1_12_0.rs
|
||||
"canceledBy": null,
|
||||
"details": {
|
||||
"upgradeFrom": "v1.12.0",
|
||||
"upgradeTo": "[current version]"
|
||||
"upgradeTo": "v1.28.2"
|
||||
},
|
||||
"error": null,
|
||||
"duration": "[duration]",
|
||||
|
||||
@@ -12,7 +12,7 @@ source: crates/meilisearch/tests/upgrade/v1_12/v1_12_0.rs
|
||||
"canceledBy": null,
|
||||
"details": {
|
||||
"upgradeFrom": "v1.12.0",
|
||||
"upgradeTo": "[current version]"
|
||||
"upgradeTo": "v1.28.2"
|
||||
},
|
||||
"error": null,
|
||||
"duration": "[duration]",
|
||||
|
||||
@@ -12,7 +12,7 @@ source: crates/meilisearch/tests/upgrade/v1_12/v1_12_0.rs
|
||||
"canceledBy": null,
|
||||
"details": {
|
||||
"upgradeFrom": "v1.12.0",
|
||||
"upgradeTo": "[current version]"
|
||||
"upgradeTo": "v1.28.2"
|
||||
},
|
||||
"error": null,
|
||||
"duration": "[duration]",
|
||||
|
||||
@@ -8,7 +8,7 @@ source: crates/meilisearch/tests/upgrade/v1_12/v1_12_0.rs
|
||||
"progress": null,
|
||||
"details": {
|
||||
"upgradeFrom": "v1.12.0",
|
||||
"upgradeTo": "[current version]"
|
||||
"upgradeTo": "v1.28.2"
|
||||
},
|
||||
"stats": {
|
||||
"totalNbTasks": 1,
|
||||
|
||||
@@ -12,7 +12,7 @@ source: crates/meilisearch/tests/upgrade/v1_12/v1_12_0.rs
|
||||
"canceledBy": null,
|
||||
"details": {
|
||||
"upgradeFrom": "v1.12.0",
|
||||
"upgradeTo": "[current version]"
|
||||
"upgradeTo": "v1.28.2"
|
||||
},
|
||||
"error": null,
|
||||
"duration": "[duration]",
|
||||
|
||||
@@ -166,55 +166,55 @@ async fn check_the_index_scheduler(server: &Server) {
|
||||
// We rewrite the first task for all calls because it may be the upgrade database with unknown dates and duration.
|
||||
// The other tasks should NOT change
|
||||
let (tasks, _) = server.tasks_filter("limit=1000").await;
|
||||
snapshot!(json_string!(tasks, { ".results[0].details.upgradeTo" => "[current version]", ".results[0].duration" => "[duration]", ".results[0].enqueuedAt" => "[date]", ".results[0].startedAt" => "[date]", ".results[0].finishedAt" => "[date]" }), name: "the_whole_task_queue_once_everything_has_been_processed");
|
||||
snapshot!(json_string!(tasks, { ".results[0].duration" => "[duration]", ".results[0].enqueuedAt" => "[date]", ".results[0].startedAt" => "[date]", ".results[0].finishedAt" => "[date]" }), name: "the_whole_task_queue_once_everything_has_been_processed");
|
||||
let (batches, _) = server.batches_filter("limit=1000").await;
|
||||
snapshot!(json_string!(batches, { ".results[0].details.upgradeTo" => "[current version]", ".results[0].duration" => "[duration]", ".results[0].enqueuedAt" => "[date]", ".results[0].startedAt" => "[date]", ".results[0].finishedAt" => "[date]", ".results[0].stats.progressTrace" => "[progressTrace]", ".results[0].stats.writeChannelCongestion" => "[writeChannelCongestion]" }), name: "the_whole_batch_queue_once_everything_has_been_processed");
|
||||
snapshot!(json_string!(batches, { ".results[0].duration" => "[duration]", ".results[0].enqueuedAt" => "[date]", ".results[0].startedAt" => "[date]", ".results[0].finishedAt" => "[date]", ".results[0].stats.progressTrace" => "[progressTrace]", ".results[0].stats.writeChannelCongestion" => "[writeChannelCongestion]" }), name: "the_whole_batch_queue_once_everything_has_been_processed");
|
||||
|
||||
// Tests all the tasks query parameters
|
||||
let (tasks, _) = server.tasks_filter("uids=10").await;
|
||||
snapshot!(json_string!(tasks, { ".results[0].details.upgradeTo" => "[current version]", ".results[0].duration" => "[duration]", ".results[0].enqueuedAt" => "[date]", ".results[0].startedAt" => "[date]", ".results[0].finishedAt" => "[date]" }), name: "tasks_filter_uids_equal_10");
|
||||
snapshot!(json_string!(tasks, { ".results[0].duration" => "[duration]", ".results[0].enqueuedAt" => "[date]", ".results[0].startedAt" => "[date]", ".results[0].finishedAt" => "[date]" }), name: "tasks_filter_uids_equal_10");
|
||||
let (tasks, _) = server.tasks_filter("batchUids=10").await;
|
||||
snapshot!(json_string!(tasks, { ".results[0].details.upgradeTo" => "[current version]", ".results[0].duration" => "[duration]", ".results[0].enqueuedAt" => "[date]", ".results[0].startedAt" => "[date]", ".results[0].finishedAt" => "[date]" }), name: "tasks_filter_batchUids_equal_10");
|
||||
snapshot!(json_string!(tasks, { ".results[0].duration" => "[duration]", ".results[0].enqueuedAt" => "[date]", ".results[0].startedAt" => "[date]", ".results[0].finishedAt" => "[date]" }), name: "tasks_filter_batchUids_equal_10");
|
||||
let (tasks, _) = server.tasks_filter("statuses=canceled").await;
|
||||
snapshot!(json_string!(tasks, { ".results[0].details.upgradeTo" => "[current version]", ".results[0].duration" => "[duration]", ".results[0].enqueuedAt" => "[date]", ".results[0].startedAt" => "[date]", ".results[0].finishedAt" => "[date]" }), name: "tasks_filter_statuses_equal_canceled");
|
||||
snapshot!(json_string!(tasks, { ".results[0].duration" => "[duration]", ".results[0].enqueuedAt" => "[date]", ".results[0].startedAt" => "[date]", ".results[0].finishedAt" => "[date]" }), name: "tasks_filter_statuses_equal_canceled");
|
||||
// types has already been tested above to retrieve the upgrade database
|
||||
let (tasks, _) = server.tasks_filter("canceledBy=19").await;
|
||||
snapshot!(json_string!(tasks, { ".results[0].details.upgradeTo" => "[current version]", ".results[0].duration" => "[duration]", ".results[0].enqueuedAt" => "[date]", ".results[0].startedAt" => "[date]", ".results[0].finishedAt" => "[date]" }), name: "tasks_filter_canceledBy_equal_19");
|
||||
snapshot!(json_string!(tasks, { ".results[0].duration" => "[duration]", ".results[0].enqueuedAt" => "[date]", ".results[0].startedAt" => "[date]", ".results[0].finishedAt" => "[date]" }), name: "tasks_filter_canceledBy_equal_19");
|
||||
let (tasks, _) = server.tasks_filter("beforeEnqueuedAt=2025-01-16T16:47:41Z").await;
|
||||
snapshot!(json_string!(tasks, { ".results[0].details.upgradeTo" => "[current version]", ".results[0].duration" => "[duration]", ".results[0].enqueuedAt" => "[date]", ".results[0].startedAt" => "[date]", ".results[0].finishedAt" => "[date]" }), name: "tasks_filter_beforeEnqueuedAt_equal_2025-01-16T16_47_41");
|
||||
snapshot!(json_string!(tasks, { ".results[0].duration" => "[duration]", ".results[0].enqueuedAt" => "[date]", ".results[0].startedAt" => "[date]", ".results[0].finishedAt" => "[date]" }), name: "tasks_filter_beforeEnqueuedAt_equal_2025-01-16T16_47_41");
|
||||
let (tasks, _) = server.tasks_filter("afterEnqueuedAt=2025-01-16T16:47:41Z").await;
|
||||
snapshot!(json_string!(tasks, { ".results[0].details.upgradeTo" => "[current version]", ".results[0].duration" => "[duration]", ".results[0].enqueuedAt" => "[date]", ".results[0].startedAt" => "[date]", ".results[0].finishedAt" => "[date]" }), name: "tasks_filter_afterEnqueuedAt_equal_2025-01-16T16_47_41");
|
||||
snapshot!(json_string!(tasks, { ".results[0].duration" => "[duration]", ".results[0].enqueuedAt" => "[date]", ".results[0].startedAt" => "[date]", ".results[0].finishedAt" => "[date]" }), name: "tasks_filter_afterEnqueuedAt_equal_2025-01-16T16_47_41");
|
||||
let (tasks, _) = server.tasks_filter("beforeStartedAt=2025-01-16T16:47:41Z").await;
|
||||
snapshot!(json_string!(tasks, { ".results[0].details.upgradeTo" => "[current version]", ".results[0].duration" => "[duration]", ".results[0].enqueuedAt" => "[date]", ".results[0].startedAt" => "[date]", ".results[0].finishedAt" => "[date]" }), name: "tasks_filter_beforeStartedAt_equal_2025-01-16T16_47_41");
|
||||
snapshot!(json_string!(tasks, { ".results[0].duration" => "[duration]", ".results[0].enqueuedAt" => "[date]", ".results[0].startedAt" => "[date]", ".results[0].finishedAt" => "[date]" }), name: "tasks_filter_beforeStartedAt_equal_2025-01-16T16_47_41");
|
||||
let (tasks, _) = server.tasks_filter("afterStartedAt=2025-01-16T16:47:41Z").await;
|
||||
snapshot!(json_string!(tasks, { ".results[0].details.upgradeTo" => "[current version]", ".results[0].duration" => "[duration]", ".results[0].enqueuedAt" => "[date]", ".results[0].startedAt" => "[date]", ".results[0].finishedAt" => "[date]" }), name: "tasks_filter_afterStartedAt_equal_2025-01-16T16_47_41");
|
||||
snapshot!(json_string!(tasks, { ".results[0].duration" => "[duration]", ".results[0].enqueuedAt" => "[date]", ".results[0].startedAt" => "[date]", ".results[0].finishedAt" => "[date]" }), name: "tasks_filter_afterStartedAt_equal_2025-01-16T16_47_41");
|
||||
let (tasks, _) = server.tasks_filter("beforeFinishedAt=2025-01-16T16:47:41Z").await;
|
||||
snapshot!(json_string!(tasks, { ".results[0].details.upgradeTo" => "[current version]", ".results[0].duration" => "[duration]", ".results[0].enqueuedAt" => "[date]", ".results[0].startedAt" => "[date]", ".results[0].finishedAt" => "[date]" }), name: "tasks_filter_beforeFinishedAt_equal_2025-01-16T16_47_41");
|
||||
snapshot!(json_string!(tasks, { ".results[0].duration" => "[duration]", ".results[0].enqueuedAt" => "[date]", ".results[0].startedAt" => "[date]", ".results[0].finishedAt" => "[date]" }), name: "tasks_filter_beforeFinishedAt_equal_2025-01-16T16_47_41");
|
||||
let (tasks, _) = server.tasks_filter("afterFinishedAt=2025-01-16T16:47:41Z").await;
|
||||
snapshot!(json_string!(tasks, { ".results[0].details.upgradeTo" => "[current version]", ".results[0].duration" => "[duration]", ".results[0].enqueuedAt" => "[date]", ".results[0].startedAt" => "[date]", ".results[0].finishedAt" => "[date]" }), name: "tasks_filter_afterFinishedAt_equal_2025-01-16T16_47_41");
|
||||
snapshot!(json_string!(tasks, { ".results[0].duration" => "[duration]", ".results[0].enqueuedAt" => "[date]", ".results[0].startedAt" => "[date]", ".results[0].finishedAt" => "[date]" }), name: "tasks_filter_afterFinishedAt_equal_2025-01-16T16_47_41");
|
||||
|
||||
// Tests all the batches query parameters
|
||||
let (batches, _) = server.batches_filter("uids=10").await;
|
||||
snapshot!(json_string!(batches, { ".results[0].details.upgradeTo" => "[current version]", ".results[0].duration" => "[duration]", ".results[0].enqueuedAt" => "[date]", ".results[0].startedAt" => "[date]", ".results[0].finishedAt" => "[date]", ".results[0].stats.progressTrace" => "[progressTrace]", ".results[0].stats.internalDatabaseSizes" => "[internalDatabaseSizes]", ".results[0].stats.writeChannelCongestion" => "[writeChannelCongestion]" }), name: "batches_filter_uids_equal_10");
|
||||
snapshot!(json_string!(batches, { ".results[0].duration" => "[duration]", ".results[0].enqueuedAt" => "[date]", ".results[0].startedAt" => "[date]", ".results[0].finishedAt" => "[date]", ".results[0].stats.progressTrace" => "[progressTrace]", ".results[0].stats.internalDatabaseSizes" => "[internalDatabaseSizes]", ".results[0].stats.writeChannelCongestion" => "[writeChannelCongestion]" }), name: "batches_filter_uids_equal_10");
|
||||
let (batches, _) = server.batches_filter("batchUids=10").await;
|
||||
snapshot!(json_string!(batches, { ".results[0].details.upgradeTo" => "[current version]", ".results[0].duration" => "[duration]", ".results[0].enqueuedAt" => "[date]", ".results[0].startedAt" => "[date]", ".results[0].finishedAt" => "[date]", ".results[0].stats.progressTrace" => "[progressTrace]", ".results[0].stats.internalDatabaseSizes" => "[internalDatabaseSizes]", ".results[0].stats.writeChannelCongestion" => "[writeChannelCongestion]" }), name: "batches_filter_batchUids_equal_10");
|
||||
snapshot!(json_string!(batches, { ".results[0].duration" => "[duration]", ".results[0].enqueuedAt" => "[date]", ".results[0].startedAt" => "[date]", ".results[0].finishedAt" => "[date]", ".results[0].stats.progressTrace" => "[progressTrace]", ".results[0].stats.internalDatabaseSizes" => "[internalDatabaseSizes]", ".results[0].stats.writeChannelCongestion" => "[writeChannelCongestion]" }), name: "batches_filter_batchUids_equal_10");
|
||||
let (batches, _) = server.batches_filter("statuses=canceled").await;
|
||||
snapshot!(json_string!(batches, { ".results[0].details.upgradeTo" => "[current version]", ".results[0].duration" => "[duration]", ".results[0].enqueuedAt" => "[date]", ".results[0].startedAt" => "[date]", ".results[0].finishedAt" => "[date]", ".results[0].stats.progressTrace" => "[progressTrace]", ".results[0].stats.internalDatabaseSizes" => "[internalDatabaseSizes]", ".results[0].stats.writeChannelCongestion" => "[writeChannelCongestion]" }), name: "batches_filter_statuses_equal_canceled");
|
||||
snapshot!(json_string!(batches, { ".results[0].duration" => "[duration]", ".results[0].enqueuedAt" => "[date]", ".results[0].startedAt" => "[date]", ".results[0].finishedAt" => "[date]", ".results[0].stats.progressTrace" => "[progressTrace]", ".results[0].stats.internalDatabaseSizes" => "[internalDatabaseSizes]", ".results[0].stats.writeChannelCongestion" => "[writeChannelCongestion]" }), name: "batches_filter_statuses_equal_canceled");
|
||||
// types has already been tested above to retrieve the upgrade database
|
||||
let (batches, _) = server.batches_filter("canceledBy=19").await;
|
||||
snapshot!(json_string!(batches, { ".results[0].details.upgradeTo" => "[current version]", ".results[0].duration" => "[duration]", ".results[0].enqueuedAt" => "[date]", ".results[0].startedAt" => "[date]", ".results[0].finishedAt" => "[date]", ".results[0].stats.progressTrace" => "[progressTrace]", ".results[0].stats.internalDatabaseSizes" => "[internalDatabaseSizes]", ".results[0].stats.writeChannelCongestion" => "[writeChannelCongestion]" }), name: "batches_filter_canceledBy_equal_19");
|
||||
snapshot!(json_string!(batches, { ".results[0].duration" => "[duration]", ".results[0].enqueuedAt" => "[date]", ".results[0].startedAt" => "[date]", ".results[0].finishedAt" => "[date]", ".results[0].stats.progressTrace" => "[progressTrace]", ".results[0].stats.internalDatabaseSizes" => "[internalDatabaseSizes]", ".results[0].stats.writeChannelCongestion" => "[writeChannelCongestion]" }), name: "batches_filter_canceledBy_equal_19");
|
||||
let (batches, _) = server.batches_filter("beforeEnqueuedAt=2025-01-16T16:47:41Z").await;
|
||||
snapshot!(json_string!(batches, { ".results[0].details.upgradeTo" => "[current version]", ".results[0].duration" => "[duration]", ".results[0].enqueuedAt" => "[date]", ".results[0].startedAt" => "[date]", ".results[0].finishedAt" => "[date]", ".results[0].stats.progressTrace" => "[progressTrace]", ".results[0].stats.internalDatabaseSizes" => "[internalDatabaseSizes]", ".results[0].stats.writeChannelCongestion" => "[writeChannelCongestion]" }), name: "batches_filter_beforeEnqueuedAt_equal_2025-01-16T16_47_41");
|
||||
snapshot!(json_string!(batches, { ".results[0].duration" => "[duration]", ".results[0].enqueuedAt" => "[date]", ".results[0].startedAt" => "[date]", ".results[0].finishedAt" => "[date]", ".results[0].stats.progressTrace" => "[progressTrace]", ".results[0].stats.internalDatabaseSizes" => "[internalDatabaseSizes]", ".results[0].stats.writeChannelCongestion" => "[writeChannelCongestion]" }), name: "batches_filter_beforeEnqueuedAt_equal_2025-01-16T16_47_41");
|
||||
let (batches, _) = server.batches_filter("afterEnqueuedAt=2025-01-16T16:47:41Z").await;
|
||||
snapshot!(json_string!(batches, { ".results[0].details.upgradeTo" => "[current version]", ".results[0].duration" => "[duration]", ".results[0].enqueuedAt" => "[date]", ".results[0].startedAt" => "[date]", ".results[0].finishedAt" => "[date]", ".results[0].stats.progressTrace" => "[progressTrace]", ".results[0].stats.internalDatabaseSizes" => "[internalDatabaseSizes]", ".results[0].stats.writeChannelCongestion" => "[writeChannelCongestion]" }), name: "batches_filter_afterEnqueuedAt_equal_2025-01-16T16_47_41");
|
||||
snapshot!(json_string!(batches, { ".results[0].duration" => "[duration]", ".results[0].enqueuedAt" => "[date]", ".results[0].startedAt" => "[date]", ".results[0].finishedAt" => "[date]", ".results[0].stats.progressTrace" => "[progressTrace]", ".results[0].stats.internalDatabaseSizes" => "[internalDatabaseSizes]", ".results[0].stats.writeChannelCongestion" => "[writeChannelCongestion]" }), name: "batches_filter_afterEnqueuedAt_equal_2025-01-16T16_47_41");
|
||||
let (batches, _) = server.batches_filter("beforeStartedAt=2025-01-16T16:47:41Z").await;
|
||||
snapshot!(json_string!(batches, { ".results[0].details.upgradeTo" => "[current version]", ".results[0].duration" => "[duration]", ".results[0].enqueuedAt" => "[date]", ".results[0].startedAt" => "[date]", ".results[0].finishedAt" => "[date]", ".results[0].stats.progressTrace" => "[progressTrace]", ".results[0].stats.internalDatabaseSizes" => "[internalDatabaseSizes]", ".results[0].stats.writeChannelCongestion" => "[writeChannelCongestion]" }), name: "batches_filter_beforeStartedAt_equal_2025-01-16T16_47_41");
|
||||
snapshot!(json_string!(batches, { ".results[0].duration" => "[duration]", ".results[0].enqueuedAt" => "[date]", ".results[0].startedAt" => "[date]", ".results[0].finishedAt" => "[date]", ".results[0].stats.progressTrace" => "[progressTrace]", ".results[0].stats.internalDatabaseSizes" => "[internalDatabaseSizes]", ".results[0].stats.writeChannelCongestion" => "[writeChannelCongestion]" }), name: "batches_filter_beforeStartedAt_equal_2025-01-16T16_47_41");
|
||||
let (batches, _) = server.batches_filter("afterStartedAt=2025-01-16T16:47:41Z").await;
|
||||
snapshot!(json_string!(batches, { ".results[0].details.upgradeTo" => "[current version]", ".results[0].duration" => "[duration]", ".results[0].enqueuedAt" => "[date]", ".results[0].startedAt" => "[date]", ".results[0].finishedAt" => "[date]", ".results[0].stats.progressTrace" => "[progressTrace]", ".results[0].stats.internalDatabaseSizes" => "[internalDatabaseSizes]", ".results[0].stats.writeChannelCongestion" => "[writeChannelCongestion]" }), name: "batches_filter_afterStartedAt_equal_2025-01-16T16_47_41");
|
||||
snapshot!(json_string!(batches, { ".results[0].duration" => "[duration]", ".results[0].enqueuedAt" => "[date]", ".results[0].startedAt" => "[date]", ".results[0].finishedAt" => "[date]", ".results[0].stats.progressTrace" => "[progressTrace]", ".results[0].stats.internalDatabaseSizes" => "[internalDatabaseSizes]", ".results[0].stats.writeChannelCongestion" => "[writeChannelCongestion]" }), name: "batches_filter_afterStartedAt_equal_2025-01-16T16_47_41");
|
||||
let (batches, _) = server.batches_filter("beforeFinishedAt=2025-01-16T16:47:41Z").await;
|
||||
snapshot!(json_string!(batches, { ".results[0].details.upgradeTo" => "[current version]", ".results[0].duration" => "[duration]", ".results[0].enqueuedAt" => "[date]", ".results[0].startedAt" => "[date]", ".results[0].finishedAt" => "[date]", ".results[0].stats.progressTrace" => "[progressTrace]", ".results[0].stats.internalDatabaseSizes" => "[internalDatabaseSizes]", ".results[0].stats.writeChannelCongestion" => "[writeChannelCongestion]" }), name: "batches_filter_beforeFinishedAt_equal_2025-01-16T16_47_41");
|
||||
snapshot!(json_string!(batches, { ".results[0].duration" => "[duration]", ".results[0].enqueuedAt" => "[date]", ".results[0].startedAt" => "[date]", ".results[0].finishedAt" => "[date]", ".results[0].stats.progressTrace" => "[progressTrace]", ".results[0].stats.internalDatabaseSizes" => "[internalDatabaseSizes]", ".results[0].stats.writeChannelCongestion" => "[writeChannelCongestion]" }), name: "batches_filter_beforeFinishedAt_equal_2025-01-16T16_47_41");
|
||||
let (batches, _) = server.batches_filter("afterFinishedAt=2025-01-16T16:47:41Z").await;
|
||||
snapshot!(json_string!(batches, { ".results[0].details.upgradeTo" => "[current version]", ".results[0].duration" => "[duration]", ".results[0].enqueuedAt" => "[date]", ".results[0].startedAt" => "[date]", ".results[0].finishedAt" => "[date]", ".results[0].stats.progressTrace" => "[progressTrace]", ".results[0].stats.internalDatabaseSizes" => "[internalDatabaseSizes]", ".results[0].stats.writeChannelCongestion" => "[writeChannelCongestion]" }), name: "batches_filter_afterFinishedAt_equal_2025-01-16T16_47_41");
|
||||
snapshot!(json_string!(batches, { ".results[0].duration" => "[duration]", ".results[0].enqueuedAt" => "[date]", ".results[0].startedAt" => "[date]", ".results[0].finishedAt" => "[date]", ".results[0].stats.progressTrace" => "[progressTrace]", ".results[0].stats.internalDatabaseSizes" => "[internalDatabaseSizes]", ".results[0].stats.writeChannelCongestion" => "[writeChannelCongestion]" }), name: "batches_filter_afterFinishedAt_equal_2025-01-16T16_47_41");
|
||||
|
||||
let (stats, _) = server.stats().await;
|
||||
assert_json_snapshot!(stats, {
|
||||
|
||||
@@ -104,8 +104,8 @@ async fn binary_quantize_before_sending_documents() {
|
||||
"manual": {
|
||||
"embeddings": [
|
||||
[
|
||||
0.0,
|
||||
0.0,
|
||||
-1.0,
|
||||
-1.0,
|
||||
1.0
|
||||
]
|
||||
],
|
||||
@@ -122,7 +122,7 @@ async fn binary_quantize_before_sending_documents() {
|
||||
[
|
||||
1.0,
|
||||
1.0,
|
||||
0.0
|
||||
-1.0
|
||||
]
|
||||
],
|
||||
"regenerate": false
|
||||
@@ -191,8 +191,8 @@ async fn binary_quantize_after_sending_documents() {
|
||||
"manual": {
|
||||
"embeddings": [
|
||||
[
|
||||
0.0,
|
||||
0.0,
|
||||
-1.0,
|
||||
-1.0,
|
||||
1.0
|
||||
]
|
||||
],
|
||||
@@ -209,7 +209,7 @@ async fn binary_quantize_after_sending_documents() {
|
||||
[
|
||||
1.0,
|
||||
1.0,
|
||||
0.0
|
||||
-1.0
|
||||
]
|
||||
],
|
||||
"regenerate": false
|
||||
|
||||
@@ -1,43 +0,0 @@
|
||||
use meili_snap::snapshot;
|
||||
|
||||
use crate::common::{GetAllDocumentsOptions, Server};
|
||||
use crate::json;
|
||||
|
||||
#[actix_rt::test]
|
||||
async fn hf_bge_m3_force_cls_settings() {
|
||||
let server = Server::new_shared();
|
||||
let index = server.unique_index();
|
||||
|
||||
let (response, code) = index
|
||||
.update_settings(json!({
|
||||
"embedders": {
|
||||
"default": {
|
||||
"source": "huggingFace",
|
||||
"model": "baai/bge-m3",
|
||||
"revision": "5617a9f61b028005a4858fdac845db406aefb181",
|
||||
"pooling": "forceCls",
|
||||
// minimal template to allow potential document embedding if used later
|
||||
"documentTemplate": "{{doc.title}}"
|
||||
}
|
||||
}
|
||||
}))
|
||||
.await;
|
||||
snapshot!(code, @"202 Accepted");
|
||||
server.wait_task(response.uid()).await.succeeded();
|
||||
|
||||
// Try to embed one simple document
|
||||
let (task, code) =
|
||||
index.add_documents(json!([{ "id": 1, "title": "Hello world" }]), None).await;
|
||||
snapshot!(code, @"202 Accepted");
|
||||
server.wait_task(task.uid()).await.succeeded();
|
||||
|
||||
// Retrieve the document with vectors and assert embeddings were produced
|
||||
let (documents, _code) = index
|
||||
.get_all_documents(GetAllDocumentsOptions { retrieve_vectors: true, ..Default::default() })
|
||||
.await;
|
||||
let has_vectors = documents["results"][0]["_vectors"]["default"]["embeddings"]
|
||||
.as_array()
|
||||
.map(|a| !a.is_empty())
|
||||
.unwrap_or(false);
|
||||
snapshot!(has_vectors, @"true");
|
||||
}
|
||||
@@ -1,6 +1,5 @@
|
||||
mod binary_quantized;
|
||||
mod fragments;
|
||||
mod huggingface;
|
||||
#[cfg(feature = "test-ollama")]
|
||||
mod ollama;
|
||||
mod openai;
|
||||
|
||||
@@ -500,6 +500,13 @@ async fn test_both_apis() {
|
||||
snapshot!(code, @"200 OK");
|
||||
snapshot!(json_string!(response["hits"]), @r###"
|
||||
[
|
||||
{
|
||||
"id": 0,
|
||||
"name": "kefir",
|
||||
"gender": "M",
|
||||
"birthyear": 2023,
|
||||
"breed": "Patou"
|
||||
},
|
||||
{
|
||||
"id": 2,
|
||||
"name": "Vénus",
|
||||
@@ -520,13 +527,6 @@ async fn test_both_apis() {
|
||||
"gender": "M",
|
||||
"birthyear": 1995,
|
||||
"breed": "Labrador Retriever"
|
||||
},
|
||||
{
|
||||
"id": 0,
|
||||
"name": "kefir",
|
||||
"gender": "M",
|
||||
"birthyear": 2023,
|
||||
"breed": "Patou"
|
||||
}
|
||||
]
|
||||
"###);
|
||||
@@ -540,6 +540,13 @@ async fn test_both_apis() {
|
||||
snapshot!(code, @"200 OK");
|
||||
snapshot!(json_string!(response["hits"]), @r###"
|
||||
[
|
||||
{
|
||||
"id": 0,
|
||||
"name": "kefir",
|
||||
"gender": "M",
|
||||
"birthyear": 2023,
|
||||
"breed": "Patou"
|
||||
},
|
||||
{
|
||||
"id": 2,
|
||||
"name": "Vénus",
|
||||
@@ -560,13 +567,6 @@ async fn test_both_apis() {
|
||||
"gender": "M",
|
||||
"birthyear": 1995,
|
||||
"breed": "Labrador Retriever"
|
||||
},
|
||||
{
|
||||
"id": 0,
|
||||
"name": "kefir",
|
||||
"gender": "M",
|
||||
"birthyear": 2023,
|
||||
"breed": "Patou"
|
||||
}
|
||||
]
|
||||
"###);
|
||||
@@ -581,18 +581,11 @@ async fn test_both_apis() {
|
||||
snapshot!(json_string!(response["hits"]), @r###"
|
||||
[
|
||||
{
|
||||
"id": 1,
|
||||
"name": "Intel",
|
||||
"id": 0,
|
||||
"name": "kefir",
|
||||
"gender": "M",
|
||||
"birthyear": 2011,
|
||||
"breed": "Beagle"
|
||||
},
|
||||
{
|
||||
"id": 2,
|
||||
"name": "Vénus",
|
||||
"gender": "F",
|
||||
"birthyear": 2003,
|
||||
"breed": "Jack Russel Terrier"
|
||||
"birthyear": 2023,
|
||||
"breed": "Patou"
|
||||
},
|
||||
{
|
||||
"id": 3,
|
||||
@@ -602,11 +595,18 @@ async fn test_both_apis() {
|
||||
"breed": "Labrador Retriever"
|
||||
},
|
||||
{
|
||||
"id": 0,
|
||||
"name": "kefir",
|
||||
"id": 2,
|
||||
"name": "Vénus",
|
||||
"gender": "F",
|
||||
"birthyear": 2003,
|
||||
"breed": "Jack Russel Terrier"
|
||||
},
|
||||
{
|
||||
"id": 1,
|
||||
"name": "Intel",
|
||||
"gender": "M",
|
||||
"birthyear": 2023,
|
||||
"breed": "Patou"
|
||||
"birthyear": 2011,
|
||||
"breed": "Beagle"
|
||||
}
|
||||
]
|
||||
"###);
|
||||
@@ -621,18 +621,11 @@ async fn test_both_apis() {
|
||||
snapshot!(json_string!(response["hits"]), @r###"
|
||||
[
|
||||
{
|
||||
"id": 1,
|
||||
"name": "Intel",
|
||||
"id": 0,
|
||||
"name": "kefir",
|
||||
"gender": "M",
|
||||
"birthyear": 2011,
|
||||
"breed": "Beagle"
|
||||
},
|
||||
{
|
||||
"id": 2,
|
||||
"name": "Vénus",
|
||||
"gender": "F",
|
||||
"birthyear": 2003,
|
||||
"breed": "Jack Russel Terrier"
|
||||
"birthyear": 2023,
|
||||
"breed": "Patou"
|
||||
},
|
||||
{
|
||||
"id": 3,
|
||||
@@ -642,11 +635,18 @@ async fn test_both_apis() {
|
||||
"breed": "Labrador Retriever"
|
||||
},
|
||||
{
|
||||
"id": 0,
|
||||
"name": "kefir",
|
||||
"id": 2,
|
||||
"name": "Vénus",
|
||||
"gender": "F",
|
||||
"birthyear": 2003,
|
||||
"breed": "Jack Russel Terrier"
|
||||
},
|
||||
{
|
||||
"id": 1,
|
||||
"name": "Intel",
|
||||
"gender": "M",
|
||||
"birthyear": 2023,
|
||||
"breed": "Patou"
|
||||
"birthyear": 2011,
|
||||
"breed": "Beagle"
|
||||
}
|
||||
]
|
||||
"###);
|
||||
@@ -661,18 +661,11 @@ async fn test_both_apis() {
|
||||
snapshot!(json_string!(response["hits"]), @r###"
|
||||
[
|
||||
{
|
||||
"id": 1,
|
||||
"name": "Intel",
|
||||
"id": 0,
|
||||
"name": "kefir",
|
||||
"gender": "M",
|
||||
"birthyear": 2011,
|
||||
"breed": "Beagle"
|
||||
},
|
||||
{
|
||||
"id": 2,
|
||||
"name": "Vénus",
|
||||
"gender": "F",
|
||||
"birthyear": 2003,
|
||||
"breed": "Jack Russel Terrier"
|
||||
"birthyear": 2023,
|
||||
"breed": "Patou"
|
||||
},
|
||||
{
|
||||
"id": 3,
|
||||
@@ -682,11 +675,18 @@ async fn test_both_apis() {
|
||||
"breed": "Labrador Retriever"
|
||||
},
|
||||
{
|
||||
"id": 0,
|
||||
"name": "kefir",
|
||||
"id": 2,
|
||||
"name": "Vénus",
|
||||
"gender": "F",
|
||||
"birthyear": 2003,
|
||||
"breed": "Jack Russel Terrier"
|
||||
},
|
||||
{
|
||||
"id": 1,
|
||||
"name": "Intel",
|
||||
"gender": "M",
|
||||
"birthyear": 2023,
|
||||
"breed": "Patou"
|
||||
"birthyear": 2011,
|
||||
"breed": "Beagle"
|
||||
}
|
||||
]
|
||||
"###);
|
||||
@@ -701,18 +701,11 @@ async fn test_both_apis() {
|
||||
snapshot!(json_string!(response["hits"]), @r###"
|
||||
[
|
||||
{
|
||||
"id": 1,
|
||||
"name": "Intel",
|
||||
"id": 0,
|
||||
"name": "kefir",
|
||||
"gender": "M",
|
||||
"birthyear": 2011,
|
||||
"breed": "Beagle"
|
||||
},
|
||||
{
|
||||
"id": 2,
|
||||
"name": "Vénus",
|
||||
"gender": "F",
|
||||
"birthyear": 2003,
|
||||
"breed": "Jack Russel Terrier"
|
||||
"birthyear": 2023,
|
||||
"breed": "Patou"
|
||||
},
|
||||
{
|
||||
"id": 3,
|
||||
@@ -722,11 +715,18 @@ async fn test_both_apis() {
|
||||
"breed": "Labrador Retriever"
|
||||
},
|
||||
{
|
||||
"id": 0,
|
||||
"name": "kefir",
|
||||
"id": 2,
|
||||
"name": "Vénus",
|
||||
"gender": "F",
|
||||
"birthyear": 2003,
|
||||
"breed": "Jack Russel Terrier"
|
||||
},
|
||||
{
|
||||
"id": 1,
|
||||
"name": "Intel",
|
||||
"gender": "M",
|
||||
"birthyear": 2023,
|
||||
"breed": "Patou"
|
||||
"birthyear": 2011,
|
||||
"breed": "Beagle"
|
||||
}
|
||||
]
|
||||
"###);
|
||||
|
||||
@@ -91,7 +91,7 @@ rhai = { version = "1.23.6", features = [
"sync",
] }
arroy = "0.6.4-nested-rtxns"
hannoy = { version = "0.1.0-nested-rtxns", features = ["arroy"] }
hannoy = { version = "0.0.9-nested-rtxns-2", features = ["arroy"] }
rand = "0.8.5"
tracing = "0.1.41"
ureq = { version = "2.12.1", features = ["json"] }

@@ -18,8 +18,6 @@ use crate::{
|
||||
pub struct Metadata {
|
||||
/// The weight as defined in the FieldidsWeightsMap of the searchable attribute if it is searchable.
|
||||
pub searchable: Option<Weight>,
|
||||
/// The field is part of the exact attributes.
|
||||
pub exact: bool,
|
||||
/// The field is part of the sortable attributes.
|
||||
pub sortable: bool,
|
||||
/// The field is defined as the distinct attribute.
|
||||
@@ -211,7 +209,6 @@ impl Metadata {
|
||||
#[derive(Debug, Clone)]
|
||||
pub struct MetadataBuilder {
|
||||
searchable_attributes: Option<Vec<String>>,
|
||||
exact_searchable_attributes: Vec<String>,
|
||||
filterable_attributes: Vec<FilterableAttributesRule>,
|
||||
sortable_attributes: HashSet<String>,
|
||||
localized_attributes: Option<Vec<LocalizedAttributesRule>>,
|
||||
@@ -223,18 +220,15 @@ impl MetadataBuilder {
|
||||
pub fn from_index(index: &Index, rtxn: &RoTxn) -> Result<Self> {
|
||||
let searchable_attributes = index
|
||||
.user_defined_searchable_fields(rtxn)?
|
||||
.map(|fields| fields.into_iter().map(String::from).collect());
|
||||
let exact_searchable_attributes =
|
||||
index.exact_attributes(rtxn)?.into_iter().map(String::from).collect();
|
||||
.map(|fields| fields.into_iter().map(|s| s.to_string()).collect());
|
||||
let filterable_attributes = index.filterable_attributes_rules(rtxn)?;
|
||||
let sortable_attributes = index.sortable_fields(rtxn)?;
|
||||
let localized_attributes = index.localized_attributes_rules(rtxn)?;
|
||||
let distinct_attribute = index.distinct_field(rtxn)?.map(String::from);
|
||||
let distinct_attribute = index.distinct_field(rtxn)?.map(|s| s.to_string());
|
||||
let asc_desc_attributes = index.asc_desc_fields(rtxn)?;
|
||||
|
||||
Ok(Self::new(
|
||||
searchable_attributes,
|
||||
exact_searchable_attributes,
|
||||
filterable_attributes,
|
||||
sortable_attributes,
|
||||
localized_attributes,
|
||||
@@ -248,7 +242,6 @@ impl MetadataBuilder {
    /// This is used for testing, prefer using `MetadataBuilder::from_index` instead.
    pub fn new(
        searchable_attributes: Option<Vec<String>>,
        exact_searchable_attributes: Vec<String>,
        filterable_attributes: Vec<FilterableAttributesRule>,
        sortable_attributes: HashSet<String>,
        localized_attributes: Option<Vec<LocalizedAttributesRule>>,
@@ -263,7 +256,6 @@ impl MetadataBuilder {

        Self {
            searchable_attributes,
            exact_searchable_attributes,
            filterable_attributes,
            sortable_attributes,
            localized_attributes,
@@ -277,7 +269,6 @@ impl MetadataBuilder {
        // Vectors fields are not searchable, filterable, distinct or asc_desc
        return Metadata {
            searchable: None,
            exact: false,
            sortable: false,
            distinct: false,
            asc_desc: false,
@@ -305,7 +296,6 @@ impl MetadataBuilder {
        // Geo fields are not searchable, distinct or asc_desc
        return Metadata {
            searchable: None,
            exact: false,
            sortable,
            distinct: false,
            asc_desc: false,
@@ -319,7 +309,6 @@ impl MetadataBuilder {
        debug_assert!(!sortable, "geojson fields should not be sortable");
        return Metadata {
            searchable: None,
            exact: false,
            sortable,
            distinct: false,
            asc_desc: false,
@@ -340,8 +329,6 @@ impl MetadataBuilder {
            None => Some(0),
        };

        let exact = self.exact_searchable_attributes.iter().any(|attr| is_faceted_by(field, attr));

        let distinct =
            self.distinct_attribute.as_ref().is_some_and(|distinct_field| field == distinct_field);
        let asc_desc = self.asc_desc_attributes.contains(field);
@@ -356,7 +343,6 @@ impl MetadataBuilder {

        Metadata {
            searchable,
            exact,
            sortable,
            distinct,
            asc_desc,

@@ -281,9 +281,6 @@ impl Index {
            &mut wtxn,
            (constants::VERSION_MAJOR, constants::VERSION_MINOR, constants::VERSION_PATCH),
        )?;
            // The database before v1.29 defaulted to using arroy, so we
            // need to set it explicitly because the new default is hannoy.
            this.put_vector_store(&mut wtxn, VectorStoreBackend::Hannoy)?;
        }
        wtxn.commit()?;


@@ -806,10 +806,6 @@ mod tests {
    use crate::vector::db::IndexEmbeddingConfig;
    use crate::{all_obkv_to_json, db_snap, Filter, FilterableAttributesRule, Search, UserError};

    fn no_cancel() -> bool {
        false
    }

    #[test]
    fn simple_document_replacement() {
        let index = TempIndex::new();
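The `no_cancel` helper above and the inline `&|| false` closures in the call sites below are interchangeable here: both satisfy the must-stop-processing parameter, whose bound elsewhere in this diff is `MSP: Fn() -> bool + Sync`. A minimal sketch, not part of the diff and using a made-up function name, of what such a parameter looks like from the callee's side:

    // Hypothetical consumer of the cancellation callback; returning `true`
    // from the closure asks the indexer to stop, and the tests never cancel.
    fn run_with_cancellation<MSP: Fn() -> bool + Sync>(must_stop_processing: &MSP) {
        if must_stop_processing() {
            return;
        }
        // ... indexing work ...
    }

    // Both forms seen in the surrounding hunks are valid arguments:
    // run_with_cancellation(&no_cancel);
    // run_with_cancellation(&|| false);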
@@ -1989,7 +1985,7 @@ mod tests {
|
||||
&rtxn,
|
||||
None,
|
||||
&mut new_fields_ids_map,
|
||||
&no_cancel,
|
||||
&|| false,
|
||||
Progress::default(),
|
||||
None,
|
||||
)
|
||||
@@ -2042,7 +2038,7 @@ mod tests {
|
||||
&rtxn,
|
||||
None,
|
||||
&mut new_fields_ids_map,
|
||||
&no_cancel,
|
||||
&|| false,
|
||||
Progress::default(),
|
||||
None,
|
||||
)
|
||||
@@ -2061,7 +2057,7 @@ mod tests {
|
||||
primary_key,
|
||||
&document_changes,
|
||||
RuntimeEmbedders::default(),
|
||||
&no_cancel,
|
||||
&|| false,
|
||||
&Progress::default(),
|
||||
&Default::default(),
|
||||
)
|
||||
@@ -2131,7 +2127,7 @@ mod tests {
|
||||
&rtxn,
|
||||
None,
|
||||
&mut new_fields_ids_map,
|
||||
&no_cancel,
|
||||
&|| false,
|
||||
Progress::default(),
|
||||
None,
|
||||
)
|
||||
@@ -2150,7 +2146,7 @@ mod tests {
|
||||
primary_key,
|
||||
&document_changes,
|
||||
RuntimeEmbedders::default(),
|
||||
&no_cancel,
|
||||
&|| false,
|
||||
&Progress::default(),
|
||||
&Default::default(),
|
||||
)
|
||||
@@ -2321,7 +2317,7 @@ mod tests {
|
||||
&rtxn,
|
||||
None,
|
||||
&mut new_fields_ids_map,
|
||||
&no_cancel,
|
||||
&|| false,
|
||||
Progress::default(),
|
||||
None,
|
||||
)
|
||||
@@ -2337,7 +2333,7 @@ mod tests {
|
||||
primary_key,
|
||||
&document_changes,
|
||||
embedders,
|
||||
&no_cancel,
|
||||
&|| false,
|
||||
&Progress::default(),
|
||||
&Default::default(),
|
||||
)
|
||||
@@ -2385,7 +2381,7 @@ mod tests {
|
||||
&rtxn,
|
||||
None,
|
||||
&mut new_fields_ids_map,
|
||||
&no_cancel,
|
||||
&|| false,
|
||||
Progress::default(),
|
||||
None,
|
||||
)
|
||||
@@ -2401,7 +2397,7 @@ mod tests {
|
||||
primary_key,
|
||||
&document_changes,
|
||||
embedders,
|
||||
&no_cancel,
|
||||
&|| false,
|
||||
&Progress::default(),
|
||||
&Default::default(),
|
||||
)
|
||||
@@ -2440,7 +2436,7 @@ mod tests {
|
||||
&rtxn,
|
||||
None,
|
||||
&mut new_fields_ids_map,
|
||||
&no_cancel,
|
||||
&|| false,
|
||||
Progress::default(),
|
||||
None,
|
||||
)
|
||||
@@ -2456,7 +2452,7 @@ mod tests {
|
||||
primary_key,
|
||||
&document_changes,
|
||||
embedders,
|
||||
&no_cancel,
|
||||
&|| false,
|
||||
&Progress::default(),
|
||||
&Default::default(),
|
||||
)
|
||||
@@ -2494,7 +2490,7 @@ mod tests {
|
||||
&rtxn,
|
||||
None,
|
||||
&mut new_fields_ids_map,
|
||||
&no_cancel,
|
||||
&|| false,
|
||||
Progress::default(),
|
||||
None,
|
||||
)
|
||||
@@ -2510,7 +2506,7 @@ mod tests {
|
||||
primary_key,
|
||||
&document_changes,
|
||||
embedders,
|
||||
&no_cancel,
|
||||
&|| false,
|
||||
&Progress::default(),
|
||||
&Default::default(),
|
||||
)
|
||||
@@ -2550,7 +2546,7 @@ mod tests {
|
||||
&rtxn,
|
||||
None,
|
||||
&mut new_fields_ids_map,
|
||||
&no_cancel,
|
||||
&|| false,
|
||||
Progress::default(),
|
||||
None,
|
||||
)
|
||||
@@ -2566,7 +2562,7 @@ mod tests {
|
||||
primary_key,
|
||||
&document_changes,
|
||||
embedders,
|
||||
&no_cancel,
|
||||
&|| false,
|
||||
&Progress::default(),
|
||||
&Default::default(),
|
||||
)
|
||||
@@ -2611,7 +2607,7 @@ mod tests {
|
||||
&rtxn,
|
||||
None,
|
||||
&mut new_fields_ids_map,
|
||||
&no_cancel,
|
||||
&|| false,
|
||||
Progress::default(),
|
||||
None,
|
||||
)
|
||||
@@ -2627,7 +2623,7 @@ mod tests {
|
||||
primary_key,
|
||||
&document_changes,
|
||||
embedders,
|
||||
&no_cancel,
|
||||
&|| false,
|
||||
&Progress::default(),
|
||||
&Default::default(),
|
||||
)
|
||||
@@ -2665,7 +2661,7 @@ mod tests {
|
||||
&rtxn,
|
||||
None,
|
||||
&mut new_fields_ids_map,
|
||||
&no_cancel,
|
||||
&|| false,
|
||||
Progress::default(),
|
||||
None,
|
||||
)
|
||||
@@ -2681,7 +2677,7 @@ mod tests {
|
||||
primary_key,
|
||||
&document_changes,
|
||||
embedders,
|
||||
&no_cancel,
|
||||
&|| false,
|
||||
&Progress::default(),
|
||||
&Default::default(),
|
||||
)
|
||||
@@ -2719,7 +2715,7 @@ mod tests {
|
||||
&rtxn,
|
||||
None,
|
||||
&mut new_fields_ids_map,
|
||||
&no_cancel,
|
||||
&|| false,
|
||||
Progress::default(),
|
||||
None,
|
||||
)
|
||||
@@ -2735,7 +2731,7 @@ mod tests {
|
||||
primary_key,
|
||||
&document_changes,
|
||||
embedders,
|
||||
&no_cancel,
|
||||
&|| false,
|
||||
&Progress::default(),
|
||||
&Default::default(),
|
||||
)
|
||||
@@ -2931,7 +2927,7 @@ mod tests {
|
||||
&rtxn,
|
||||
None,
|
||||
&mut new_fields_ids_map,
|
||||
&no_cancel,
|
||||
&|| false,
|
||||
Progress::default(),
|
||||
None,
|
||||
)
|
||||
@@ -2947,7 +2943,7 @@ mod tests {
|
||||
primary_key,
|
||||
&document_changes,
|
||||
embedders,
|
||||
&no_cancel,
|
||||
&|| false,
|
||||
&Progress::default(),
|
||||
&Default::default(),
|
||||
)
|
||||
@@ -2992,7 +2988,7 @@ mod tests {
|
||||
&rtxn,
|
||||
None,
|
||||
&mut new_fields_ids_map,
|
||||
&no_cancel,
|
||||
&|| false,
|
||||
Progress::default(),
|
||||
None,
|
||||
)
|
||||
@@ -3008,7 +3004,7 @@ mod tests {
|
||||
primary_key,
|
||||
&document_changes,
|
||||
embedders,
|
||||
&no_cancel,
|
||||
&|| false,
|
||||
&Progress::default(),
|
||||
&Default::default(),
|
||||
)
|
||||
@@ -3050,7 +3046,7 @@ mod tests {
|
||||
&rtxn,
|
||||
None,
|
||||
&mut new_fields_ids_map,
|
||||
&no_cancel,
|
||||
&|| false,
|
||||
Progress::default(),
|
||||
None,
|
||||
)
|
||||
@@ -3066,7 +3062,7 @@ mod tests {
|
||||
primary_key,
|
||||
&document_changes,
|
||||
embedders,
|
||||
&no_cancel,
|
||||
&|| false,
|
||||
&Progress::default(),
|
||||
&Default::default(),
|
||||
)
|
||||
|
||||
@@ -8,26 +8,17 @@ use bumpalo::Bump;

use super::match_searchable_field;
use super::tokenize_document::{tokenizer_builder, DocumentTokenizer};
use crate::fields_ids_map::metadata::Metadata;
use crate::update::new::document::DocumentContext;
use crate::update::new::extract::cache::BalancedCaches;
use crate::update::new::extract::perm_json_p::contained_in;
use crate::update::new::extract::searchable::has_searchable_children;
use crate::update::new::indexer::document_changes::{
    extract, DocumentChanges, Extractor, IndexingContext,
};
use crate::update::new::indexer::settings_changes::{
    settings_change_extract, DocumentsIndentifiers, SettingsChangeExtractor,
};
use crate::update::new::ref_cell_ext::RefCellExt as _;
use crate::update::new::steps::IndexingStep;
use crate::update::new::thread_local::{FullySend, MostlySend, ThreadLocal};
use crate::update::new::{DocumentChange, DocumentIdentifiers};
use crate::update::settings::SettingsDelta;
use crate::{
    bucketed_position, DocumentId, FieldId, PatternMatch, Result, UserError,
    MAX_POSITION_PER_ATTRIBUTE,
};
use crate::update::new::DocumentChange;
use crate::{bucketed_position, DocumentId, FieldId, Result, MAX_POSITION_PER_ATTRIBUTE};

const MAX_COUNTED_WORDS: usize = 30;

@@ -43,15 +34,6 @@ pub struct WordDocidsBalancedCaches<'extractor> {

unsafe impl MostlySend for WordDocidsBalancedCaches<'_> {}

/// Whether to extract or skip fields during word extraction.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum FieldDbExtraction {
    /// Extract the word and put it into the fid-based databases.
    Extract,
    /// Do not store the word in the fid-based databases.
    Skip,
}

impl<'extractor> WordDocidsBalancedCaches<'extractor> {
    pub fn new_in(buckets: usize, max_memory: Option<usize>, alloc: &'extractor Bump) -> Self {
        Self {
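A rough sketch, not taken from the diff, of how a caller might pick a variant; the real call sites later in this diff pass `Extract` on the document-change path and `Skip` when the fid-based entries were already written or are being removed wholesale. Variable names are borrowed from the surrounding hunks, the condition itself is illustrative:

    // Hypothetical call site: only feed the fid-based databases when the field
    // should (still) be extracted for them under the current settings.
    let field_db_extraction = if new_field_metadata.is_searchable() {
        FieldDbExtraction::Extract
    } else {
        FieldDbExtraction::Skip
    };
    cached_sorter.insert_add_u32(field_id, pos, word, exact, field_db_extraction, docid, doc_alloc)?;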
@@ -65,14 +47,12 @@ impl<'extractor> WordDocidsBalancedCaches<'extractor> {
        }
    }

    #[allow(clippy::too_many_arguments)]
    fn insert_add_u32(
        &mut self,
        field_id: FieldId,
        position: u16,
        word: &str,
        exact: bool,
        field_db_extraction: FieldDbExtraction,
        docid: u32,
        bump: &Bump,
    ) -> Result<()> {
@@ -86,13 +66,11 @@ impl<'extractor> WordDocidsBalancedCaches<'extractor> {
        let buffer_size = word_bytes.len() + 1 + size_of::<FieldId>();
        let mut buffer = BumpVec::with_capacity_in(buffer_size, bump);

        if field_db_extraction == FieldDbExtraction::Extract {
            buffer.clear();
            buffer.extend_from_slice(word_bytes);
            buffer.push(0);
            buffer.extend_from_slice(&field_id.to_be_bytes());
            self.word_fid_docids.insert_add_u32(&buffer, docid)?;
        }
        buffer.clear();
        buffer.extend_from_slice(word_bytes);
        buffer.push(0);
        buffer.extend_from_slice(&field_id.to_be_bytes());
        self.word_fid_docids.insert_add_u32(&buffer, docid)?;

        let position = bucketed_position(position);
        buffer.clear();
@@ -105,26 +83,21 @@ impl<'extractor> WordDocidsBalancedCaches<'extractor> {
|
||||
self.flush_fid_word_count(&mut buffer)?;
|
||||
}
|
||||
|
||||
if field_db_extraction == FieldDbExtraction::Extract {
|
||||
self.fid_word_count
|
||||
.entry(field_id)
|
||||
.and_modify(|(_current_count, new_count)| *new_count.get_or_insert(0) += 1)
|
||||
.or_insert((None, Some(1)));
|
||||
}
|
||||
|
||||
self.fid_word_count
|
||||
.entry(field_id)
|
||||
.and_modify(|(_current_count, new_count)| *new_count.get_or_insert(0) += 1)
|
||||
.or_insert((None, Some(1)));
|
||||
self.current_docid = Some(docid);
|
||||
|
||||
Ok(())
|
||||
}
|
||||
|
||||
#[allow(clippy::too_many_arguments)]
|
||||
fn insert_del_u32(
|
||||
&mut self,
|
||||
field_id: FieldId,
|
||||
position: u16,
|
||||
word: &str,
|
||||
exact: bool,
|
||||
field_db_extraction: FieldDbExtraction,
|
||||
docid: u32,
|
||||
bump: &Bump,
|
||||
) -> Result<()> {
|
||||
@@ -138,13 +111,11 @@ impl<'extractor> WordDocidsBalancedCaches<'extractor> {
|
||||
let buffer_size = word_bytes.len() + 1 + size_of::<FieldId>();
|
||||
let mut buffer = BumpVec::with_capacity_in(buffer_size, bump);
|
||||
|
||||
if field_db_extraction == FieldDbExtraction::Extract {
|
||||
buffer.clear();
|
||||
buffer.extend_from_slice(word_bytes);
|
||||
buffer.push(0);
|
||||
buffer.extend_from_slice(&field_id.to_be_bytes());
|
||||
self.word_fid_docids.insert_del_u32(&buffer, docid)?;
|
||||
}
|
||||
buffer.clear();
|
||||
buffer.extend_from_slice(word_bytes);
|
||||
buffer.push(0);
|
||||
buffer.extend_from_slice(&field_id.to_be_bytes());
|
||||
self.word_fid_docids.insert_del_u32(&buffer, docid)?;
|
||||
|
||||
let position = bucketed_position(position);
|
||||
buffer.clear();
|
||||
@@ -157,12 +128,10 @@ impl<'extractor> WordDocidsBalancedCaches<'extractor> {
|
||||
self.flush_fid_word_count(&mut buffer)?;
|
||||
}
|
||||
|
||||
if field_db_extraction == FieldDbExtraction::Extract {
|
||||
self.fid_word_count
|
||||
.entry(field_id)
|
||||
.and_modify(|(current_count, _new_count)| *current_count.get_or_insert(0) += 1)
|
||||
.or_insert((Some(1), None));
|
||||
}
|
||||
self.fid_word_count
|
||||
.entry(field_id)
|
||||
.and_modify(|(current_count, _new_count)| *current_count.get_or_insert(0) += 1)
|
||||
.or_insert((Some(1), None));
|
||||
|
||||
self.current_docid = Some(docid);
|
||||
|
||||
@@ -356,24 +325,6 @@ impl WordDocidsExtractors {
            exact_attributes.iter().any(|attr| contained_in(fname, attr))
                || disabled_typos_terms.is_exact(word)
        };

        let mut should_tokenize = |field_name: &str| {
            let Some((field_id, meta)) = new_fields_ids_map.id_with_metadata_or_insert(field_name)
            else {
                return Err(UserError::AttributeLimitReached.into());
            };

            let pattern_match = if meta.is_searchable() {
                PatternMatch::Match
            } else {
                // TODO: should be a match on the field_name using `match_field_legacy` function,
                // but for legacy reasons we iterate over all the fields to fill the field_id_map.
                PatternMatch::Parent
            };

            Ok((field_id, pattern_match))
        };

        match document_change {
            DocumentChange::Deletion(inner) => {
                let mut token_fn = |fname: &str, fid, pos, word: &str| {
@@ -382,14 +333,13 @@ impl WordDocidsExtractors {
|
||||
pos,
|
||||
word,
|
||||
is_exact(fname, word),
|
||||
FieldDbExtraction::Extract,
|
||||
inner.docid(),
|
||||
doc_alloc,
|
||||
)
|
||||
};
|
||||
document_tokenizer.tokenize_document(
|
||||
inner.current(rtxn, index, context.db_fields_ids_map)?,
|
||||
&mut should_tokenize,
|
||||
new_fields_ids_map,
|
||||
&mut token_fn,
|
||||
)?;
|
||||
}
|
||||
@@ -411,14 +361,13 @@ impl WordDocidsExtractors {
|
||||
pos,
|
||||
word,
|
||||
is_exact(fname, word),
|
||||
FieldDbExtraction::Extract,
|
||||
inner.docid(),
|
||||
doc_alloc,
|
||||
)
|
||||
};
|
||||
document_tokenizer.tokenize_document(
|
||||
inner.current(rtxn, index, context.db_fields_ids_map)?,
|
||||
&mut should_tokenize,
|
||||
new_fields_ids_map,
|
||||
&mut token_fn,
|
||||
)?;
|
||||
|
||||
@@ -428,14 +377,13 @@ impl WordDocidsExtractors {
|
||||
pos,
|
||||
word,
|
||||
is_exact(fname, word),
|
||||
FieldDbExtraction::Extract,
|
||||
inner.docid(),
|
||||
doc_alloc,
|
||||
)
|
||||
};
|
||||
document_tokenizer.tokenize_document(
|
||||
inner.merged(rtxn, index, context.db_fields_ids_map)?,
|
||||
&mut should_tokenize,
|
||||
new_fields_ids_map,
|
||||
&mut token_fn,
|
||||
)?;
|
||||
}
|
||||
@@ -446,14 +394,13 @@ impl WordDocidsExtractors {
|
||||
pos,
|
||||
word,
|
||||
is_exact(fname, word),
|
||||
FieldDbExtraction::Extract,
|
||||
inner.docid(),
|
||||
doc_alloc,
|
||||
)
|
||||
};
|
||||
document_tokenizer.tokenize_document(
|
||||
inner.inserted(),
|
||||
&mut should_tokenize,
|
||||
new_fields_ids_map,
|
||||
&mut token_fn,
|
||||
)?;
|
||||
}
|
||||
@@ -464,292 +411,3 @@ impl WordDocidsExtractors {
|
||||
cached_sorter.flush_fid_word_count(&mut buffer)
|
||||
}
|
||||
}
|
||||
|
||||
pub struct WordDocidsSettingsExtractorsData<'a, SD> {
|
||||
tokenizer: DocumentTokenizer<'a>,
|
||||
max_memory_by_thread: Option<usize>,
|
||||
buckets: usize,
|
||||
settings_delta: &'a SD,
|
||||
}
|
||||
|
||||
impl<'extractor, SD: SettingsDelta + Sync> SettingsChangeExtractor<'extractor>
|
||||
for WordDocidsSettingsExtractorsData<'_, SD>
|
||||
{
|
||||
type Data = RefCell<Option<WordDocidsBalancedCaches<'extractor>>>;
|
||||
|
||||
fn init_data<'doc>(&'doc self, extractor_alloc: &'extractor Bump) -> crate::Result<Self::Data> {
|
||||
Ok(RefCell::new(Some(WordDocidsBalancedCaches::new_in(
|
||||
self.buckets,
|
||||
self.max_memory_by_thread,
|
||||
extractor_alloc,
|
||||
))))
|
||||
}
|
||||
|
||||
fn process<'doc>(
|
||||
&'doc self,
|
||||
documents: impl Iterator<Item = crate::Result<DocumentIdentifiers<'doc>>>,
|
||||
context: &'doc DocumentContext<Self::Data>,
|
||||
) -> crate::Result<()> {
|
||||
for document in documents {
|
||||
let document = document?;
|
||||
SettingsChangeWordDocidsExtractors::extract_document_from_settings_change(
|
||||
document,
|
||||
context,
|
||||
&self.tokenizer,
|
||||
self.settings_delta,
|
||||
)?;
|
||||
}
|
||||
Ok(())
|
||||
}
|
||||
}
|
||||
|
||||
pub struct SettingsChangeWordDocidsExtractors;
|
||||
|
||||
impl SettingsChangeWordDocidsExtractors {
|
||||
pub fn run_extraction<'fid, 'indexer, 'index, 'extractor, SD, MSP>(
|
||||
settings_delta: &SD,
|
||||
documents: &'indexer DocumentsIndentifiers<'indexer>,
|
||||
indexing_context: IndexingContext<'fid, 'indexer, 'index, MSP>,
|
||||
extractor_allocs: &'extractor mut ThreadLocal<FullySend<Bump>>,
|
||||
step: IndexingStep,
|
||||
) -> Result<WordDocidsCaches<'extractor>>
|
||||
where
|
||||
SD: SettingsDelta + Sync,
|
||||
MSP: Fn() -> bool + Sync,
|
||||
{
|
||||
// Warning: this is duplicated code from extract_word_pair_proximity_docids.rs
|
||||
// TODO we need to read the new AND old settings to support changing global parameters
|
||||
let rtxn = indexing_context.index.read_txn()?;
|
||||
let stop_words = indexing_context.index.stop_words(&rtxn)?;
|
||||
let allowed_separators = indexing_context.index.allowed_separators(&rtxn)?;
|
||||
let allowed_separators: Option<Vec<_>> =
|
||||
allowed_separators.as_ref().map(|s| s.iter().map(String::as_str).collect());
|
||||
let dictionary = indexing_context.index.dictionary(&rtxn)?;
|
||||
let dictionary: Option<Vec<_>> =
|
||||
dictionary.as_ref().map(|s| s.iter().map(String::as_str).collect());
|
||||
let mut builder = tokenizer_builder(
|
||||
stop_words.as_ref(),
|
||||
allowed_separators.as_deref(),
|
||||
dictionary.as_deref(),
|
||||
);
|
||||
let tokenizer = builder.build();
|
||||
let localized_attributes_rules =
|
||||
indexing_context.index.localized_attributes_rules(&rtxn)?.unwrap_or_default();
|
||||
let document_tokenizer = DocumentTokenizer {
|
||||
tokenizer: &tokenizer,
|
||||
localized_attributes_rules: &localized_attributes_rules,
|
||||
max_positions_per_attributes: MAX_POSITION_PER_ATTRIBUTE,
|
||||
};
|
||||
let extractor_data = WordDocidsSettingsExtractorsData {
|
||||
tokenizer: document_tokenizer,
|
||||
max_memory_by_thread: indexing_context.grenad_parameters.max_memory_by_thread(),
|
||||
buckets: rayon::current_num_threads(),
|
||||
settings_delta,
|
||||
};
|
||||
let datastore = ThreadLocal::new();
|
||||
{
|
||||
let span = tracing::debug_span!(target: "indexing::documents::extract", "vectors");
|
||||
let _entered = span.enter();
|
||||
|
||||
settings_change_extract(
|
||||
documents,
|
||||
&extractor_data,
|
||||
indexing_context,
|
||||
extractor_allocs,
|
||||
&datastore,
|
||||
step,
|
||||
)?;
|
||||
}
|
||||
|
||||
let mut merger = WordDocidsCaches::new();
|
||||
for cache in datastore.into_iter().flat_map(RefCell::into_inner) {
|
||||
merger.push(cache)?;
|
||||
}
|
||||
|
||||
Ok(merger)
|
||||
}
|
||||
|
||||
/// Extracts document words from a settings change.
|
||||
fn extract_document_from_settings_change<SD: SettingsDelta>(
|
||||
document: DocumentIdentifiers<'_>,
|
||||
context: &DocumentContext<RefCell<Option<WordDocidsBalancedCaches>>>,
|
||||
document_tokenizer: &DocumentTokenizer,
|
||||
settings_delta: &SD,
|
||||
) -> Result<()> {
|
||||
let mut cached_sorter_ref = context.data.borrow_mut_or_yield();
|
||||
let cached_sorter = cached_sorter_ref.as_mut().unwrap();
|
||||
let doc_alloc = &context.doc_alloc;
|
||||
|
||||
let new_fields_ids_map = settings_delta.new_fields_ids_map();
|
||||
let old_fields_ids_map = context.index.fields_ids_map_with_metadata(&context.rtxn)?;
|
||||
let old_searchable = settings_delta.old_searchable_attributes().as_ref();
|
||||
let new_searchable = settings_delta.new_searchable_attributes().as_ref();
|
||||
|
||||
let current_document = document.current(
|
||||
&context.rtxn,
|
||||
context.index,
|
||||
old_fields_ids_map.as_fields_ids_map(),
|
||||
)?;
|
||||
|
||||
#[derive(Debug, Clone, Copy, PartialEq)]
|
||||
enum ActionToOperate {
|
||||
ReindexAllFields,
|
||||
// TODO improve by listing field prefixes
|
||||
IndexAddedFields,
|
||||
SkipDocument,
|
||||
}
|
||||
|
||||
let mut action = ActionToOperate::SkipDocument;
|
||||
// Here we do a preliminary check to determine the action to take.
|
||||
// This check doesn't trigger the tokenizer as we never return
|
||||
// PatternMatch::Match.
|
||||
document_tokenizer.tokenize_document(
|
||||
current_document,
|
||||
&mut |field_name| {
|
||||
let fid = new_fields_ids_map.id(field_name).expect("All fields IDs must exist");
|
||||
|
||||
// If the document must be reindexed, early return NoMatch to stop the scanning process.
|
||||
if action == ActionToOperate::ReindexAllFields {
|
||||
return Ok((fid, PatternMatch::NoMatch));
|
||||
}
|
||||
|
||||
let old_field_metadata = old_fields_ids_map.metadata(fid).unwrap();
|
||||
let new_field_metadata = new_fields_ids_map.metadata(fid).unwrap();
|
||||
|
||||
action = match (old_field_metadata, new_field_metadata) {
|
||||
// At least one field is added or removed from the exact fields => ReindexAllFields
|
||||
(Metadata { exact: old_exact, .. }, Metadata { exact: new_exact, .. })
|
||||
if old_exact != new_exact =>
|
||||
{
|
||||
ActionToOperate::ReindexAllFields
|
||||
}
|
||||
// At least one field is removed from the searchable fields => ReindexAllFields
|
||||
(Metadata { searchable: Some(_), .. }, Metadata { searchable: None, .. }) => {
|
||||
ActionToOperate::ReindexAllFields
|
||||
}
|
||||
// At least one field is added in the searchable fields => IndexAddedFields
|
||||
(Metadata { searchable: None, .. }, Metadata { searchable: Some(_), .. }) => {
|
||||
// We can safely overwrite the action, because we early return when action is ReindexAllFields.
|
||||
ActionToOperate::IndexAddedFields
|
||||
}
|
||||
_ => action,
|
||||
};
|
||||
|
||||
Ok((fid, PatternMatch::Parent))
|
||||
},
|
||||
&mut |_, _, _, _| Ok(()),
|
||||
)?;
|
||||
|
||||
// Early return when we don't need to index the document
|
||||
if action == ActionToOperate::SkipDocument {
|
||||
return Ok(());
|
||||
}
|
||||
|
||||
let mut should_tokenize = |field_name: &str| {
|
||||
let field_id = new_fields_ids_map.id(field_name).expect("All fields IDs must exist");
|
||||
let old_field_metadata = old_fields_ids_map.metadata(field_id).unwrap();
|
||||
let new_field_metadata = new_fields_ids_map.metadata(field_id).unwrap();
|
||||
|
||||
let pattern_match = match action {
|
||||
ActionToOperate::ReindexAllFields => {
|
||||
if old_field_metadata.is_searchable() || new_field_metadata.is_searchable() {
|
||||
PatternMatch::Match
|
||||
// If any old or new field is searchable then we need to iterate over all fields
|
||||
// else if any field matches we need to iterate over all fields
|
||||
} else if has_searchable_children(
|
||||
field_name,
|
||||
old_searchable.zip(new_searchable).map(|(old, new)| old.iter().chain(new)),
|
||||
) {
|
||||
PatternMatch::Parent
|
||||
} else {
|
||||
PatternMatch::NoMatch
|
||||
}
|
||||
}
|
||||
ActionToOperate::IndexAddedFields => {
|
||||
// Was not searchable but now is
|
||||
if !old_field_metadata.is_searchable() && new_field_metadata.is_searchable() {
|
||||
PatternMatch::Match
|
||||
// If the field is now a parent of a searchable field
|
||||
} else if has_searchable_children(field_name, new_searchable) {
|
||||
PatternMatch::Parent
|
||||
} else {
|
||||
PatternMatch::NoMatch
|
||||
}
|
||||
}
|
||||
ActionToOperate::SkipDocument => unreachable!(),
|
||||
};
|
||||
|
||||
Ok((field_id, pattern_match))
|
||||
};
|
||||
|
||||
let old_disabled_typos_terms = settings_delta.old_disabled_typos_terms();
|
||||
let new_disabled_typos_terms = settings_delta.new_disabled_typos_terms();
|
||||
let mut token_fn = |_field_name: &str, field_id, pos, word: &str| {
|
||||
let old_field_metadata = old_fields_ids_map.metadata(field_id).unwrap();
|
||||
let new_field_metadata = new_fields_ids_map.metadata(field_id).unwrap();
|
||||
|
||||
match (old_field_metadata, new_field_metadata) {
|
||||
(
|
||||
Metadata { searchable: Some(_), exact: old_exact, .. },
|
||||
Metadata { searchable: None, .. },
|
||||
) => cached_sorter.insert_del_u32(
|
||||
field_id,
|
||||
pos,
|
||||
word,
|
||||
old_exact || old_disabled_typos_terms.is_exact(word),
|
||||
// We deleted the field globally
|
||||
FieldDbExtraction::Skip,
|
||||
document.docid(),
|
||||
doc_alloc,
|
||||
),
|
||||
(
|
||||
Metadata { searchable: None, .. },
|
||||
Metadata { searchable: Some(_), exact: new_exact, .. },
|
||||
) => cached_sorter.insert_add_u32(
|
||||
field_id,
|
||||
pos,
|
||||
word,
|
||||
new_exact || new_disabled_typos_terms.is_exact(word),
|
||||
FieldDbExtraction::Extract,
|
||||
document.docid(),
|
||||
doc_alloc,
|
||||
),
|
||||
(Metadata { searchable: None, .. }, Metadata { searchable: None, .. }) => {
|
||||
unreachable!()
|
||||
}
|
||||
(Metadata { exact: old_exact, .. }, Metadata { exact: new_exact, .. }) => {
|
||||
cached_sorter.insert_del_u32(
|
||||
field_id,
|
||||
pos,
|
||||
word,
|
||||
old_exact || old_disabled_typos_terms.is_exact(word),
|
||||
// The field has already been extracted
|
||||
FieldDbExtraction::Skip,
|
||||
document.docid(),
|
||||
doc_alloc,
|
||||
)?;
|
||||
cached_sorter.insert_add_u32(
|
||||
field_id,
|
||||
pos,
|
||||
word,
|
||||
new_exact || new_disabled_typos_terms.is_exact(word),
|
||||
// The field has already been extracted
|
||||
FieldDbExtraction::Skip,
|
||||
document.docid(),
|
||||
doc_alloc,
|
||||
)
|
||||
}
|
||||
}
|
||||
};
|
||||
|
||||
// TODO we must tokenize twice when we change global parameters like stop words,
|
||||
// the language settings, dictionary, separators, non-separators...
|
||||
document_tokenizer.tokenize_document(
|
||||
current_document,
|
||||
&mut should_tokenize,
|
||||
&mut token_fn,
|
||||
)?;
|
||||
|
||||
Ok(())
|
||||
}
|
||||
}
|
||||
|
||||
@@ -6,24 +6,17 @@ use bumpalo::Bump;
|
||||
|
||||
use super::match_searchable_field;
|
||||
use super::tokenize_document::{tokenizer_builder, DocumentTokenizer};
|
||||
use crate::fields_ids_map::metadata::Metadata;
|
||||
use crate::proximity::ProximityPrecision::*;
|
||||
use crate::proximity::{index_proximity, MAX_DISTANCE};
|
||||
use crate::update::new::document::{Document, DocumentContext};
|
||||
use crate::update::new::extract::cache::BalancedCaches;
|
||||
use crate::update::new::indexer::document_changes::{
|
||||
extract, DocumentChanges, Extractor, IndexingContext,
|
||||
};
|
||||
use crate::update::new::indexer::settings_change_extract;
|
||||
use crate::update::new::indexer::settings_changes::{
|
||||
DocumentsIndentifiers, SettingsChangeExtractor,
|
||||
};
|
||||
use crate::update::new::ref_cell_ext::RefCellExt as _;
|
||||
use crate::update::new::steps::IndexingStep;
|
||||
use crate::update::new::thread_local::{FullySend, ThreadLocal};
|
||||
use crate::update::new::{DocumentChange, DocumentIdentifiers};
|
||||
use crate::update::settings::SettingsDelta;
|
||||
use crate::{FieldId, PatternMatch, Result, UserError, MAX_POSITION_PER_ATTRIBUTE};
|
||||
use crate::update::new::DocumentChange;
|
||||
use crate::{FieldId, GlobalFieldsIdsMap, Result, MAX_POSITION_PER_ATTRIBUTE};
|
||||
|
||||
pub struct WordPairProximityDocidsExtractorData<'a> {
|
||||
tokenizer: DocumentTokenizer<'a>,
|
||||
@@ -123,7 +116,7 @@ impl WordPairProximityDocidsExtractor {
    // and to store the docids of the documents that have a number of words in a given field
    // equal to or under MAX_COUNTED_WORDS.
    fn extract_document_change(
        context: &DocumentContext<RefCell<BalancedCaches<'_>>>,
        context: &DocumentContext<RefCell<BalancedCaches>>,
        document_tokenizer: &DocumentTokenizer,
        searchable_attributes: Option<&[&str]>,
        document_change: DocumentChange,
@@ -154,12 +147,8 @@ impl WordPairProximityDocidsExtractor {
|
||||
process_document_tokens(
|
||||
document,
|
||||
document_tokenizer,
|
||||
new_fields_ids_map,
|
||||
&mut word_positions,
|
||||
&mut |field_name| {
|
||||
new_fields_ids_map
|
||||
.id_with_metadata_or_insert(field_name)
|
||||
.ok_or(UserError::AttributeLimitReached.into())
|
||||
},
|
||||
&mut |(w1, w2), prox| {
|
||||
del_word_pair_proximity.push(((w1, w2), prox));
|
||||
},
|
||||
@@ -181,12 +170,8 @@ impl WordPairProximityDocidsExtractor {
|
||||
process_document_tokens(
|
||||
document,
|
||||
document_tokenizer,
|
||||
new_fields_ids_map,
|
||||
&mut word_positions,
|
||||
&mut |field_name| {
|
||||
new_fields_ids_map
|
||||
.id_with_metadata_or_insert(field_name)
|
||||
.ok_or(UserError::AttributeLimitReached.into())
|
||||
},
|
||||
&mut |(w1, w2), prox| {
|
||||
del_word_pair_proximity.push(((w1, w2), prox));
|
||||
},
|
||||
@@ -195,12 +180,8 @@ impl WordPairProximityDocidsExtractor {
|
||||
process_document_tokens(
|
||||
document,
|
||||
document_tokenizer,
|
||||
new_fields_ids_map,
|
||||
&mut word_positions,
|
||||
&mut |field_name| {
|
||||
new_fields_ids_map
|
||||
.id_with_metadata_or_insert(field_name)
|
||||
.ok_or(UserError::AttributeLimitReached.into())
|
||||
},
|
||||
&mut |(w1, w2), prox| {
|
||||
add_word_pair_proximity.push(((w1, w2), prox));
|
||||
},
|
||||
@@ -211,12 +192,8 @@ impl WordPairProximityDocidsExtractor {
|
||||
process_document_tokens(
|
||||
document,
|
||||
document_tokenizer,
|
||||
new_fields_ids_map,
|
||||
&mut word_positions,
|
||||
&mut |field_name| {
|
||||
new_fields_ids_map
|
||||
.id_with_metadata_or_insert(field_name)
|
||||
.ok_or(UserError::AttributeLimitReached.into())
|
||||
},
|
||||
&mut |(w1, w2), prox| {
|
||||
add_word_pair_proximity.push(((w1, w2), prox));
|
||||
},
|
||||
@@ -280,8 +257,8 @@ fn drain_word_positions(
|
||||
fn process_document_tokens<'doc>(
|
||||
document: impl Document<'doc>,
|
||||
document_tokenizer: &DocumentTokenizer,
|
||||
fields_ids_map: &mut GlobalFieldsIdsMap,
|
||||
word_positions: &mut VecDeque<(Rc<str>, u16)>,
|
||||
field_id_and_metadata: &mut impl FnMut(&str) -> Result<(FieldId, Metadata)>,
|
||||
word_pair_proximity: &mut impl FnMut((Rc<str>, Rc<str>), u8),
|
||||
) -> Result<()> {
|
||||
let mut field_id = None;
|
||||
@@ -302,248 +279,8 @@ fn process_document_tokens<'doc>(
|
||||
word_positions.push_back((Rc::from(word), pos));
|
||||
Ok(())
|
||||
};
|
||||
|
||||
let mut should_tokenize = |field_name: &str| {
|
||||
let (field_id, meta) = field_id_and_metadata(field_name)?;
|
||||
|
||||
let pattern_match = if meta.is_searchable() {
|
||||
PatternMatch::Match
|
||||
} else {
|
||||
// TODO: should be a match on the field_name using `match_field_legacy` function,
|
||||
// but for legacy reasons we iterate over all the fields to fill the field_id_map.
|
||||
PatternMatch::Parent
|
||||
};
|
||||
|
||||
Ok((field_id, pattern_match))
|
||||
};
|
||||
|
||||
document_tokenizer.tokenize_document(document, &mut should_tokenize, &mut token_fn)?;
|
||||
document_tokenizer.tokenize_document(document, fields_ids_map, &mut token_fn)?;
|
||||
|
||||
drain_word_positions(word_positions, word_pair_proximity);
|
||||
Ok(())
|
||||
}
|
||||
|
||||
pub struct WordPairProximityDocidsSettingsExtractorsData<'a, SD> {
|
||||
tokenizer: DocumentTokenizer<'a>,
|
||||
max_memory_by_thread: Option<usize>,
|
||||
buckets: usize,
|
||||
settings_delta: &'a SD,
|
||||
}
|
||||
|
||||
impl<'extractor, SD: SettingsDelta + Sync> SettingsChangeExtractor<'extractor>
|
||||
for WordPairProximityDocidsSettingsExtractorsData<'_, SD>
|
||||
{
|
||||
type Data = RefCell<BalancedCaches<'extractor>>;
|
||||
|
||||
fn init_data<'doc>(&'doc self, extractor_alloc: &'extractor Bump) -> crate::Result<Self::Data> {
|
||||
Ok(RefCell::new(BalancedCaches::new_in(
|
||||
self.buckets,
|
||||
self.max_memory_by_thread,
|
||||
extractor_alloc,
|
||||
)))
|
||||
}
|
||||
|
||||
fn process<'doc>(
|
||||
&'doc self,
|
||||
documents: impl Iterator<Item = crate::Result<DocumentIdentifiers<'doc>>>,
|
||||
context: &'doc DocumentContext<Self::Data>,
|
||||
) -> crate::Result<()> {
|
||||
for document in documents {
|
||||
let document = document?;
|
||||
SettingsChangeWordPairProximityDocidsExtractors::extract_document_from_settings_change(
|
||||
document,
|
||||
context,
|
||||
&self.tokenizer,
|
||||
self.settings_delta,
|
||||
)?;
|
||||
}
|
||||
Ok(())
|
||||
}
|
||||
}
|
||||
|
||||
pub struct SettingsChangeWordPairProximityDocidsExtractors;
|
||||
|
||||
impl SettingsChangeWordPairProximityDocidsExtractors {
|
||||
pub fn run_extraction<'fid, 'indexer, 'index, 'extractor, SD, MSP>(
|
||||
settings_delta: &SD,
|
||||
documents: &'indexer DocumentsIndentifiers<'indexer>,
|
||||
indexing_context: IndexingContext<'fid, 'indexer, 'index, MSP>,
|
||||
extractor_allocs: &'extractor mut ThreadLocal<FullySend<Bump>>,
|
||||
step: IndexingStep,
|
||||
) -> Result<Vec<BalancedCaches<'extractor>>>
|
||||
where
|
||||
SD: SettingsDelta + Sync,
|
||||
MSP: Fn() -> bool + Sync,
|
||||
{
|
||||
// Warning: this is duplicated code from extract_word_docids.rs
|
||||
let rtxn = indexing_context.index.read_txn()?;
|
||||
let stop_words = indexing_context.index.stop_words(&rtxn)?;
|
||||
let allowed_separators = indexing_context.index.allowed_separators(&rtxn)?;
|
||||
let allowed_separators: Option<Vec<_>> =
|
||||
allowed_separators.as_ref().map(|s| s.iter().map(String::as_str).collect());
|
||||
let dictionary = indexing_context.index.dictionary(&rtxn)?;
|
||||
let dictionary: Option<Vec<_>> =
|
||||
dictionary.as_ref().map(|s| s.iter().map(String::as_str).collect());
|
||||
let mut builder = tokenizer_builder(
|
||||
stop_words.as_ref(),
|
||||
allowed_separators.as_deref(),
|
||||
dictionary.as_deref(),
|
||||
);
|
||||
let tokenizer = builder.build();
|
||||
let localized_attributes_rules =
|
||||
indexing_context.index.localized_attributes_rules(&rtxn)?.unwrap_or_default();
|
||||
let document_tokenizer = DocumentTokenizer {
|
||||
tokenizer: &tokenizer,
|
||||
localized_attributes_rules: &localized_attributes_rules,
|
||||
max_positions_per_attributes: MAX_POSITION_PER_ATTRIBUTE,
|
||||
};
|
||||
let extractor_data = WordPairProximityDocidsSettingsExtractorsData {
|
||||
tokenizer: document_tokenizer,
|
||||
max_memory_by_thread: indexing_context.grenad_parameters.max_memory_by_thread(),
|
||||
buckets: rayon::current_num_threads(),
|
||||
settings_delta,
|
||||
};
|
||||
let datastore = ThreadLocal::new();
|
||||
{
|
||||
let span = tracing::trace_span!(target: "indexing::documents::extract", "word_pair_proximity_docids_extraction");
|
||||
let _entered = span.enter();
|
||||
|
||||
settings_change_extract(
|
||||
documents,
|
||||
&extractor_data,
|
||||
indexing_context,
|
||||
extractor_allocs,
|
||||
&datastore,
|
||||
step,
|
||||
)?;
|
||||
}
|
||||
|
||||
Ok(datastore.into_iter().map(RefCell::into_inner).collect())
|
||||
}
|
||||
|
||||
/// Extracts document words from a settings change.
|
||||
fn extract_document_from_settings_change<SD: SettingsDelta>(
|
||||
document: DocumentIdentifiers<'_>,
|
||||
context: &DocumentContext<RefCell<BalancedCaches<'_>>>,
|
||||
document_tokenizer: &DocumentTokenizer,
|
||||
settings_delta: &SD,
|
||||
) -> Result<()> {
|
||||
let mut cached_sorter = context.data.borrow_mut_or_yield();
|
||||
let doc_alloc = &context.doc_alloc;
|
||||
|
||||
let new_fields_ids_map = settings_delta.new_fields_ids_map();
|
||||
let old_fields_ids_map = settings_delta.old_fields_ids_map();
|
||||
let old_proximity_precision = *settings_delta.old_proximity_precision();
|
||||
let new_proximity_precision = *settings_delta.new_proximity_precision();
|
||||
|
||||
let current_document = document.current(
|
||||
&context.rtxn,
|
||||
context.index,
|
||||
old_fields_ids_map.as_fields_ids_map(),
|
||||
)?;
|
||||
|
||||
#[derive(Debug, Clone, Copy, PartialEq)]
|
||||
enum ActionToOperate {
|
||||
ReindexAllFields,
|
||||
SkipDocument,
|
||||
}
|
||||
|
||||
// TODO prefix_fid delete_old_fid_based_databases
|
||||
let mut action = match (old_proximity_precision, new_proximity_precision) {
|
||||
(ByAttribute, ByWord) => ActionToOperate::ReindexAllFields,
|
||||
(_, _) => ActionToOperate::SkipDocument,
|
||||
};
|
||||
|
||||
// Here we do a preliminary check to determine the action to take.
|
||||
// This check doesn't trigger the tokenizer as we never return
|
||||
// PatternMatch::Match.
|
||||
if action != ActionToOperate::ReindexAllFields {
|
||||
document_tokenizer.tokenize_document(
|
||||
current_document,
|
||||
&mut |field_name| {
|
||||
let fid = new_fields_ids_map.id(field_name).expect("All fields IDs must exist");
|
||||
|
||||
// If the document must be reindexed, early return NoMatch to stop the scanning process.
|
||||
if action == ActionToOperate::ReindexAllFields {
|
||||
return Ok((fid, PatternMatch::NoMatch));
|
||||
}
|
||||
|
||||
let old_field_metadata = old_fields_ids_map.metadata(fid).unwrap();
|
||||
let new_field_metadata = new_fields_ids_map.metadata(fid).unwrap();
|
||||
|
||||
action = match (old_field_metadata, new_field_metadata) {
|
||||
// At least one field is removed or added from the searchable fields
|
||||
(
|
||||
Metadata { searchable: Some(_), .. },
|
||||
Metadata { searchable: None, .. },
|
||||
)
|
||||
| (
|
||||
Metadata { searchable: None, .. },
|
||||
Metadata { searchable: Some(_), .. },
|
||||
) => ActionToOperate::ReindexAllFields,
|
||||
_ => action,
|
||||
};
|
||||
|
||||
Ok((fid, PatternMatch::Parent))
|
||||
},
|
||||
&mut |_, _, _, _| Ok(()),
|
||||
)?;
|
||||
}
|
||||
|
||||
// Early return when we don't need to index the document
|
||||
if action == ActionToOperate::SkipDocument {
|
||||
return Ok(());
|
||||
}
|
||||
|
||||
let mut del_word_pair_proximity = bumpalo::collections::Vec::new_in(doc_alloc);
|
||||
let mut add_word_pair_proximity = bumpalo::collections::Vec::new_in(doc_alloc);
|
||||
|
||||
// is a vecdequeue, and will be smol, so can stay on the heap for now
|
||||
let mut word_positions: VecDeque<(Rc<str>, u16)> =
|
||||
VecDeque::with_capacity(MAX_DISTANCE as usize);
|
||||
|
||||
process_document_tokens(
|
||||
current_document,
|
||||
// TODO Tokenize must be based on old settings
|
||||
document_tokenizer,
|
||||
&mut word_positions,
|
||||
&mut |field_name| {
|
||||
Ok(old_fields_ids_map.id_with_metadata(field_name).expect("All fields must exist"))
|
||||
},
|
||||
&mut |(w1, w2), prox| {
|
||||
del_word_pair_proximity.push(((w1, w2), prox));
|
||||
},
|
||||
)?;
|
||||
|
||||
process_document_tokens(
|
||||
current_document,
|
||||
// TODO Tokenize must be based on new settings
|
||||
document_tokenizer,
|
||||
&mut word_positions,
|
||||
&mut |field_name| {
|
||||
Ok(new_fields_ids_map.id_with_metadata(field_name).expect("All fields must exist"))
|
||||
},
|
||||
&mut |(w1, w2), prox| {
|
||||
add_word_pair_proximity.push(((w1, w2), prox));
|
||||
},
|
||||
)?;
|
||||
|
||||
let mut key_buffer = bumpalo::collections::Vec::new_in(doc_alloc);
|
||||
|
||||
del_word_pair_proximity.sort_unstable();
|
||||
del_word_pair_proximity.dedup_by(|(k1, _), (k2, _)| k1 == k2);
|
||||
for ((w1, w2), prox) in del_word_pair_proximity.iter() {
|
||||
let key = build_key(*prox, w1, w2, &mut key_buffer);
|
||||
cached_sorter.insert_del_u32(key, document.docid())?;
|
||||
}
|
||||
|
||||
add_word_pair_proximity.sort_unstable();
|
||||
add_word_pair_proximity.dedup_by(|(k1, _), (k2, _)| k1 == k2);
|
||||
for ((w1, w2), prox) in add_word_pair_proximity.iter() {
|
||||
let key = build_key(*prox, w1, w2, &mut key_buffer);
|
||||
cached_sorter.insert_add_u32(key, document.docid())?;
|
||||
}
|
||||
|
||||
Ok(())
|
||||
}
|
||||
}
|
||||
|
||||
@@ -2,12 +2,8 @@ mod extract_word_docids;
mod extract_word_pair_proximity_docids;
mod tokenize_document;

pub use extract_word_docids::{
    SettingsChangeWordDocidsExtractors, WordDocidsCaches, WordDocidsExtractors,
};
pub use extract_word_pair_proximity_docids::{
    SettingsChangeWordPairProximityDocidsExtractors, WordPairProximityDocidsExtractor,
};
pub use extract_word_docids::{WordDocidsCaches, WordDocidsExtractors};
pub use extract_word_pair_proximity_docids::WordPairProximityDocidsExtractor;

use crate::attribute_patterns::{match_field_legacy, PatternMatch};

@@ -31,17 +27,3 @@ pub fn match_searchable_field(

    selection
}

/// Returns `true` if the provided `field_name` is a parent of at least one of the fields contained in `searchable`,
/// or if `searchable` is `None`.
fn has_searchable_children<I, A>(field_name: &str, searchable: Option<I>) -> bool
where
    I: IntoIterator<Item = A>,
    A: AsRef<str>,
{
    searchable.is_none_or(|fields| {
        fields
            .into_iter()
            .any(|attr| match_field_legacy(attr.as_ref(), field_name) == PatternMatch::Parent)
    })
}

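A quick illustration of the contract documented above; this is not part of the diff, and the field names are invented:

    // `None` means every field is searchable, so the answer is always `true`.
    assert!(has_searchable_children::<Vec<&str>, _>("doggo", None));
    // "doggo" is a parent of the searchable attribute "doggo.name".
    assert!(has_searchable_children("doggo", Some(["doggo.name"])));
    // "breed" is unrelated to "doggo.name", so it has no searchable children.
    assert!(!has_searchable_children("breed", Some(["doggo.name"])));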
@@ -8,7 +8,10 @@ use crate::update::new::document::Document;
use crate::update::new::extract::perm_json_p::{
    seek_leaf_values_in_array, seek_leaf_values_in_object, Depth,
};
use crate::{FieldId, InternalError, LocalizedAttributesRule, Result, MAX_WORD_LENGTH};
use crate::{
    FieldId, GlobalFieldsIdsMap, InternalError, LocalizedAttributesRule, Result, UserError,
    MAX_WORD_LENGTH,
};

// todo: should be crate::proximity::MAX_DISTANCE but it has been forgotten
const MAX_DISTANCE: u32 = 8;
@@ -23,25 +26,26 @@ impl DocumentTokenizer<'_> {
|
||||
pub fn tokenize_document<'doc>(
|
||||
&self,
|
||||
document: impl Document<'doc>,
|
||||
should_tokenize: &mut impl FnMut(&str) -> Result<(FieldId, PatternMatch)>,
|
||||
field_id_map: &mut GlobalFieldsIdsMap,
|
||||
token_fn: &mut impl FnMut(&str, FieldId, u16, &str) -> Result<()>,
|
||||
) -> Result<()> {
|
||||
let mut field_position = HashMap::new();
|
||||
for entry in document.iter_top_level_fields() {
|
||||
let (field_name, value) = entry?;
|
||||
|
||||
if let (_, PatternMatch::NoMatch) = should_tokenize(field_name)? {
|
||||
continue;
|
||||
}
|
||||
|
||||
let mut tokenize_field = |field_name: &str, _depth, value: &Value| {
|
||||
let (fid, pattern_match) = should_tokenize(field_name)?;
|
||||
if pattern_match == PatternMatch::Match {
|
||||
self.tokenize_field(fid, field_name, value, token_fn, &mut field_position)?;
|
||||
}
|
||||
Ok(pattern_match)
|
||||
let mut tokenize_field = |field_name: &str, _depth, value: &Value| {
|
||||
let Some((field_id, meta)) = field_id_map.id_with_metadata_or_insert(field_name) else {
|
||||
return Err(UserError::AttributeLimitReached.into());
|
||||
};
|
||||
|
||||
if meta.is_searchable() {
|
||||
self.tokenize_field(field_id, field_name, value, token_fn, &mut field_position)?;
|
||||
}
|
||||
|
||||
// todo: should be a match on the field_name using `match_field_legacy` function,
|
||||
// but for legacy reasons we iterate over all the fields to fill the field_id_map.
|
||||
Ok(PatternMatch::Match)
|
||||
};
|
||||
|
||||
for entry in document.iter_top_level_fields() {
|
||||
let (field_name, value) = entry?;
|
||||
// parse json.
|
||||
match serde_json::to_value(value).map_err(InternalError::SerdeJson)? {
|
||||
Value::Object(object) => seek_leaf_values_in_object(
|
||||
@@ -188,7 +192,7 @@ mod test {
|
||||
use super::*;
|
||||
use crate::fields_ids_map::metadata::{FieldIdMapWithMetadata, MetadataBuilder};
|
||||
use crate::update::new::document::{DocumentFromVersions, Versions};
|
||||
use crate::{FieldsIdsMap, GlobalFieldsIdsMap, UserError};
|
||||
use crate::FieldsIdsMap;
|
||||
|
||||
#[test]
|
||||
fn test_tokenize_document() {
|
||||
@@ -227,7 +231,6 @@ mod test {
|
||||
Default::default(),
|
||||
Default::default(),
|
||||
Default::default(),
|
||||
Default::default(),
|
||||
None,
|
||||
None,
|
||||
Default::default(),
|
||||
@@ -248,19 +251,15 @@ mod test {
|
||||
let document = Versions::single(document);
|
||||
let document = DocumentFromVersions::new(&document);
|
||||
|
||||
let mut should_tokenize = |field_name: &str| {
|
||||
let Some(field_id) = global_fields_ids_map.id_or_insert(field_name) else {
|
||||
return Err(UserError::AttributeLimitReached.into());
|
||||
};
|
||||
|
||||
Ok((field_id, PatternMatch::Match))
|
||||
};
|
||||
|
||||
document_tokenizer
|
||||
.tokenize_document(document, &mut should_tokenize, &mut |_fname, fid, pos, word| {
|
||||
words.insert([fid, pos], word.to_string());
|
||||
Ok(())
|
||||
})
|
||||
.tokenize_document(
|
||||
document,
|
||||
&mut global_fields_ids_map,
|
||||
&mut |_fname, fid, pos, word| {
|
||||
words.insert([fid, pos], word.to_string());
|
||||
Ok(())
|
||||
},
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
snapshot!(format!("{:#?}", words), @r###"
|
||||
|
||||
@@ -1,6 +1,5 @@
|
||||
use std::cell::RefCell;
|
||||
use std::fmt::Debug;
|
||||
use std::sync::RwLock;
|
||||
|
||||
use bumpalo::collections::Vec as BVec;
|
||||
use bumpalo::Bump;
|
||||
@@ -28,10 +27,7 @@ use crate::vector::extractor::{
|
||||
use crate::vector::session::{EmbedSession, Input, Metadata, OnEmbed};
|
||||
use crate::vector::settings::ReindexAction;
|
||||
use crate::vector::{Embedding, RuntimeEmbedder, RuntimeEmbedders, RuntimeFragment};
|
||||
use crate::{
|
||||
DocumentId, FieldDistribution, GlobalFieldsIdsMap, InternalError, Result, ThreadPoolNoAbort,
|
||||
UserError,
|
||||
};
|
||||
use crate::{DocumentId, FieldDistribution, InternalError, Result, ThreadPoolNoAbort, UserError};
|
||||
|
||||
pub struct EmbeddingExtractor<'a, 'b> {
|
||||
embedders: &'a RuntimeEmbedders,
|
||||
@@ -325,15 +321,6 @@ impl<'extractor, SD: SettingsDelta + Sync> SettingsChangeExtractor<'extractor>
|
||||
let old_embedders = self.settings_delta.old_embedders();
|
||||
let unused_vectors_distribution = UnusedVectorsDistributionBump::new_in(&context.doc_alloc);
|
||||
|
||||
// We get a reference to the new and old fields ids maps but
|
||||
// note that those are local versions where updates to them
|
||||
// will not be reflected in the database. It's not an issue
|
||||
// because new settings do not generate new fields.
|
||||
let new_fields_ids_map = RwLock::new(self.settings_delta.new_fields_ids_map().clone());
|
||||
let new_fields_ids_map = RefCell::new(GlobalFieldsIdsMap::new(&new_fields_ids_map));
|
||||
let old_fields_ids_map = RwLock::new(self.settings_delta.old_fields_ids_map().clone());
|
||||
let old_fields_ids_map = RefCell::new(GlobalFieldsIdsMap::new(&old_fields_ids_map));
|
||||
|
||||
let mut all_chunks = BVec::with_capacity_in(embedders.len(), &context.doc_alloc);
|
||||
let embedder_configs = context.index.embedding_configs();
|
||||
for (embedder_name, action) in self.settings_delta.embedder_actions().iter() {
|
||||
@@ -409,7 +396,6 @@ impl<'extractor, SD: SettingsDelta + Sync> SettingsChangeExtractor<'extractor>
|
||||
if !must_regenerate {
|
||||
continue;
|
||||
}
|
||||
|
||||
// we need to regenerate the prompts for the document
|
||||
chunks.settings_change_autogenerated(
|
||||
document.docid(),
|
||||
@@ -420,8 +406,7 @@ impl<'extractor, SD: SettingsDelta + Sync> SettingsChangeExtractor<'extractor>
|
||||
context.db_fields_ids_map,
|
||||
)?,
|
||||
self.settings_delta,
|
||||
&old_fields_ids_map,
|
||||
&new_fields_ids_map,
|
||||
context.new_fields_ids_map,
|
||||
&unused_vectors_distribution,
|
||||
old_is_user_provided,
|
||||
fragments_changed,
|
||||
@@ -457,8 +442,7 @@ impl<'extractor, SD: SettingsDelta + Sync> SettingsChangeExtractor<'extractor>
|
||||
context.db_fields_ids_map,
|
||||
)?,
|
||||
self.settings_delta,
|
||||
&old_fields_ids_map,
|
||||
&new_fields_ids_map,
|
||||
context.new_fields_ids_map,
|
||||
&unused_vectors_distribution,
|
||||
old_is_user_provided,
|
||||
true,
|
||||
@@ -654,8 +638,7 @@ impl<'a, 'b, 'extractor> Chunks<'a, 'b, 'extractor> {
|
||||
external_docid: &'a str,
|
||||
document: D,
|
||||
settings_delta: &SD,
|
||||
old_fields_ids_map: &'a RefCell<GlobalFieldsIdsMap<'a>>,
|
||||
new_fields_ids_map: &'a RefCell<GlobalFieldsIdsMap<'a>>,
|
||||
fields_ids_map: &'a RefCell<crate::GlobalFieldsIdsMap>,
|
||||
unused_vectors_distribution: &UnusedVectorsDistributionBump<'a>,
|
||||
old_is_user_provided: bool,
|
||||
full_reindex: bool,
|
||||
@@ -750,17 +733,10 @@ impl<'a, 'b, 'extractor> Chunks<'a, 'b, 'extractor> {
|
||||
old_embedder.as_ref().map(|old_embedder| &old_embedder.document_template)
|
||||
};
|
||||
|
||||
let extractor = DocumentTemplateExtractor::new(
|
||||
document_template,
|
||||
doc_alloc,
|
||||
new_fields_ids_map,
|
||||
);
|
||||
let extractor =
|
||||
DocumentTemplateExtractor::new(document_template, doc_alloc, fields_ids_map);
|
||||
let old_extractor = old_document_template.map(|old_document_template| {
|
||||
DocumentTemplateExtractor::new(
|
||||
old_document_template,
|
||||
doc_alloc,
|
||||
old_fields_ids_map,
|
||||
)
|
||||
DocumentTemplateExtractor::new(old_document_template, doc_alloc, fields_ids_map)
|
||||
});
|
||||
let metadata =
|
||||
Metadata { docid, external_docid, extractor_id: extractor.extractor_id() };
|
||||
|
||||
@@ -372,10 +372,11 @@ where
|
||||
SD: SettingsDelta + Sync,
|
||||
{
|
||||
// Create the list of document ids to extract
|
||||
let index = indexing_context.index;
|
||||
let rtxn = index.read_txn()?;
|
||||
let all_document_ids = index.documents_ids(&rtxn)?.into_iter().collect::<Vec<_>>();
|
||||
let primary_key = primary_key_from_db(index, &rtxn, &indexing_context.db_fields_ids_map)?;
|
||||
let rtxn = indexing_context.index.read_txn()?;
|
||||
let all_document_ids =
|
||||
indexing_context.index.documents_ids(&rtxn)?.into_iter().collect::<Vec<_>>();
|
||||
let primary_key =
|
||||
primary_key_from_db(indexing_context.index, &rtxn, &indexing_context.db_fields_ids_map)?;
|
||||
let documents = DocumentsIndentifiers::new(&all_document_ids, primary_key);
|
||||
|
||||
let span =
|
||||
@@ -390,133 +391,6 @@ where
|
||||
extractor_allocs,
|
||||
)?;
|
||||
|
||||
{
|
||||
let WordDocidsCaches {
|
||||
word_docids,
|
||||
word_fid_docids,
|
||||
exact_word_docids,
|
||||
word_position_docids,
|
||||
fid_word_count_docids,
|
||||
} = {
|
||||
let span = tracing::trace_span!(target: "indexing::documents::extract", "word_docids");
|
||||
let _entered = span.enter();
|
||||
SettingsChangeWordDocidsExtractors::run_extraction(
|
||||
settings_delta,
|
||||
&documents,
|
||||
indexing_context,
|
||||
extractor_allocs,
|
||||
IndexingStep::ExtractingWords,
|
||||
)?
|
||||
};
|
||||
|
||||
indexing_context.progress.update_progress(IndexingStep::MergingWordCaches);
|
||||
|
||||
{
|
||||
let span = tracing::trace_span!(target: "indexing::documents::merge", "word_docids");
|
||||
let _entered = span.enter();
|
||||
indexing_context.progress.update_progress(MergingWordCache::WordDocids);
|
||||
|
||||
merge_and_send_docids(
|
||||
word_docids,
|
||||
index.word_docids.remap_types(),
|
||||
index,
|
||||
extractor_sender.docids::<WordDocids>(),
|
||||
&indexing_context.must_stop_processing,
|
||||
)?;
|
||||
}
|
||||
|
||||
{
|
||||
let span =
|
||||
tracing::trace_span!(target: "indexing::documents::merge", "word_fid_docids");
|
||||
let _entered = span.enter();
|
||||
indexing_context.progress.update_progress(MergingWordCache::WordFieldIdDocids);
|
||||
|
||||
merge_and_send_docids(
|
||||
word_fid_docids,
|
||||
index.word_fid_docids.remap_types(),
|
||||
index,
|
||||
extractor_sender.docids::<WordFidDocids>(),
|
||||
&indexing_context.must_stop_processing,
|
||||
)?;
|
||||
}
|
||||
|
||||
{
|
||||
let span =
|
||||
tracing::trace_span!(target: "indexing::documents::merge", "exact_word_docids");
|
||||
let _entered = span.enter();
|
||||
indexing_context.progress.update_progress(MergingWordCache::ExactWordDocids);
|
||||
|
||||
merge_and_send_docids(
|
||||
exact_word_docids,
|
||||
index.exact_word_docids.remap_types(),
|
||||
index,
|
||||
extractor_sender.docids::<ExactWordDocids>(),
|
||||
&indexing_context.must_stop_processing,
|
||||
)?;
|
||||
}
|
||||
|
||||
{
|
||||
let span =
|
||||
tracing::trace_span!(target: "indexing::documents::merge", "word_position_docids");
|
||||
let _entered = span.enter();
|
||||
indexing_context.progress.update_progress(MergingWordCache::WordPositionDocids);
|
||||
|
||||
merge_and_send_docids(
|
||||
word_position_docids,
|
||||
index.word_position_docids.remap_types(),
|
||||
index,
|
||||
extractor_sender.docids::<WordPositionDocids>(),
|
||||
&indexing_context.must_stop_processing,
|
||||
)?;
|
||||
}
|
||||
|
||||
{
|
||||
let span =
|
||||
tracing::trace_span!(target: "indexing::documents::merge", "fid_word_count_docids");
|
||||
let _entered = span.enter();
|
||||
indexing_context.progress.update_progress(MergingWordCache::FieldIdWordCountDocids);
|
||||
|
||||
merge_and_send_docids(
|
||||
fid_word_count_docids,
|
||||
index.field_id_word_count_docids.remap_types(),
|
||||
index,
|
||||
extractor_sender.docids::<FidWordCountDocids>(),
|
||||
&indexing_context.must_stop_processing,
|
||||
)?;
|
||||
}
|
||||
}
|
||||
|
||||
// Run the proximity extraction only if the precision is ByWord.
|
||||
let new_proximity_precision = settings_delta.new_proximity_precision();
|
||||
if *new_proximity_precision == ProximityPrecision::ByWord {
|
||||
let caches = {
|
||||
let span = tracing::trace_span!(target: "indexing::documents::extract", "word_pair_proximity_docids");
|
||||
let _entered = span.enter();
|
||||
|
||||
SettingsChangeWordPairProximityDocidsExtractors::run_extraction(
|
||||
settings_delta,
|
||||
&documents,
|
||||
indexing_context,
|
||||
extractor_allocs,
|
||||
IndexingStep::ExtractingWordProximity,
|
||||
)?
|
||||
};
|
||||
|
||||
{
|
||||
let span = tracing::trace_span!(target: "indexing::documents::merge", "word_pair_proximity_docids");
|
||||
let _entered = span.enter();
|
||||
indexing_context.progress.update_progress(IndexingStep::MergingWordProximity);
|
||||
|
||||
merge_and_send_docids(
|
||||
caches,
|
||||
index.word_pair_proximity_docids.remap_types(),
|
||||
index,
|
||||
extractor_sender.docids::<WordPairProximityDocids>(),
|
||||
&indexing_context.must_stop_processing,
|
||||
)?;
|
||||
}
|
||||
}
|
||||
|
||||
'vectors: {
|
||||
if settings_delta.embedder_actions().is_empty() {
|
||||
break 'vectors;
|
||||
|
||||
@@ -1,4 +1,4 @@
|
||||
use std::collections::{BTreeMap, BTreeSet};
|
||||
use std::collections::BTreeMap;
|
||||
use std::sync::atomic::AtomicBool;
|
||||
use std::sync::{Arc, Once, RwLock};
|
||||
use std::thread::{self, Builder};
|
||||
@@ -8,11 +8,9 @@ use document_changes::{DocumentChanges, IndexingContext};
|
||||
pub use document_deletion::DocumentDeletion;
|
||||
pub use document_operation::{DocumentOperation, PayloadStats};
|
||||
use hashbrown::HashMap;
|
||||
use heed::types::DecodeIgnore;
|
||||
use heed::{BytesDecode, Database, RoTxn, RwTxn};
|
||||
use heed::{RoTxn, RwTxn};
|
||||
pub use partial_dump::PartialDump;
|
||||
pub use post_processing::recompute_word_fst_from_word_docids_database;
|
||||
pub use settings_changes::settings_change_extract;
|
||||
pub use update_by_function::UpdateByFunction;
|
||||
pub use write::ChannelCongestion;
|
||||
use write::{build_vectors, update_index, write_to_db};
|
||||
@@ -22,18 +20,12 @@ use super::steps::IndexingStep;
|
||||
use super::thread_local::ThreadLocal;
|
||||
use crate::documents::PrimaryKey;
|
||||
use crate::fields_ids_map::metadata::{FieldIdMapWithMetadata, MetadataBuilder};
|
||||
use crate::heed_codec::StrBEU16Codec;
|
||||
use crate::progress::{EmbedderStats, Progress};
|
||||
use crate::proximity::ProximityPrecision;
|
||||
use crate::update::new::steps::SettingsIndexerStep;
|
||||
use crate::update::new::FacetFieldIdsDelta;
|
||||
use crate::update::settings::SettingsDelta;
|
||||
use crate::update::GrenadParameters;
|
||||
use crate::vector::settings::{EmbedderAction, RemoveFragments, WriteBackToDocuments};
|
||||
use crate::vector::{Embedder, RuntimeEmbedders, VectorStore};
|
||||
use crate::{
|
||||
Error, FieldsIdsMap, GlobalFieldsIdsMap, Index, InternalError, Result, ThreadPoolNoAbort,
|
||||
};
|
||||
use crate::{FieldsIdsMap, GlobalFieldsIdsMap, Index, InternalError, Result, ThreadPoolNoAbort};
|
||||
|
||||
#[cfg(not(feature = "enterprise"))]
|
||||
pub mod community_edition;
|
||||
@@ -250,20 +242,6 @@ where
|
||||
SD: SettingsDelta + Sync,
|
||||
{
|
||||
delete_old_embedders_and_fragments(wtxn, index, settings_delta)?;
|
||||
delete_old_fid_based_databases(wtxn, index, settings_delta, must_stop_processing, progress)?;
|
||||
|
||||
// Clear word_pair_proximity if byWord to byAttribute
|
||||
let old_proximity_precision = settings_delta.old_proximity_precision();
|
||||
let new_proximity_precision = settings_delta.new_proximity_precision();
|
||||
if *old_proximity_precision == ProximityPrecision::ByWord
|
||||
&& *new_proximity_precision == ProximityPrecision::ByAttribute
|
||||
{
|
||||
index.word_pair_proximity_docids.clear(wtxn)?;
|
||||
}
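
Together with the extraction gate in the earlier hunk, this gives one simple rule per proximity-precision transition. A self-contained sketch of that rule (the enums below are illustrative stand-ins, not code from this PR; only the two variants visible in the diff are assumed):

// Illustrative sketch: what happens to word_pair_proximity_docids for each transition.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum ProximityPrecision { ByWord, ByAttribute }

#[derive(Debug, PartialEq)]
enum PairProximityAction { Clear, Reextract, Nothing }

fn pair_proximity_action(old: ProximityPrecision, new: ProximityPrecision) -> PairProximityAction {
    use ProximityPrecision::*;
    match (old, new) {
        // Pairs become useless: post-processing clears the whole database.
        (ByWord, ByAttribute) => PairProximityAction::Clear,
        // Pairs are needed: the extractor runs and its caches are merged back in.
        (_, ByWord) => PairProximityAction::Reextract,
        // Nothing to extract, nothing to clear.
        (ByAttribute, ByAttribute) => PairProximityAction::Nothing,
    }
}

fn main() {
    use ProximityPrecision::*;
    assert_eq!(pair_proximity_action(ByWord, ByAttribute), PairProximityAction::Clear);
    assert_eq!(pair_proximity_action(ByAttribute, ByWord), PairProximityAction::Reextract);
}
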
|
||||
|
||||
// TODO delete useless searchable databases
|
||||
// - Clear fid_prefix_* in the post processing
|
||||
// - clear the prefix + fid_prefix if setting `PrefixSearch` is enabled
|
||||
|
||||
let mut bbbuffers = Vec::new();
|
||||
let finished_extraction = AtomicBool::new(false);
|
||||
@@ -322,8 +300,6 @@ where
|
||||
.unwrap()
|
||||
})?;
|
||||
|
||||
let global_fields_ids_map = GlobalFieldsIdsMap::new(&new_fields_ids_map);
|
||||
|
||||
let new_embedders = settings_delta.new_embedders();
|
||||
let embedder_actions = settings_delta.embedder_actions();
|
||||
let index_embedder_category_ids = settings_delta.new_embedder_category_id();
|
||||
@@ -358,18 +334,6 @@ where
|
||||
})
|
||||
.unwrap()?;
|
||||
|
||||
pool.install(|| {
|
||||
// WARN When implementing the facets don't forget this
|
||||
let facet_field_ids_delta = FacetFieldIdsDelta::new(0, 0);
|
||||
post_processing::post_process(
|
||||
indexing_context,
|
||||
wtxn,
|
||||
global_fields_ids_map,
|
||||
facet_field_ids_delta,
|
||||
)
|
||||
})
|
||||
.unwrap()?;
|
||||
|
||||
indexing_context.progress.update_progress(IndexingStep::BuildingGeoJson);
|
||||
index.cellulite.build(
|
||||
wtxn,
|
||||
@@ -499,106 +463,6 @@ where
|
||||
Ok(())
|
||||
}
|
||||
|
||||
/// Deletes entries referring to the provided fids from the fid-based databases.
|
||||
fn delete_old_fid_based_databases<SD, MSP>(
|
||||
wtxn: &mut RwTxn<'_>,
|
||||
index: &Index,
|
||||
settings_delta: &SD,
|
||||
must_stop_processing: &MSP,
|
||||
progress: &Progress,
|
||||
) -> Result<()>
|
||||
where
|
||||
SD: SettingsDelta + Sync,
|
||||
MSP: Fn() -> bool + Sync,
|
||||
{
|
||||
let fids_to_delete: Option<BTreeSet<_>> = {
|
||||
let rtxn = index.read_txn()?;
|
||||
let fields_ids_map = index.fields_ids_map(&rtxn)?;
|
||||
let old_searchable_attributes = settings_delta.old_searchable_attributes().as_ref();
|
||||
let new_searchable_attributes = settings_delta.new_searchable_attributes().as_ref();
|
||||
old_searchable_attributes.zip(new_searchable_attributes).map(|(old, new)| {
|
||||
old.iter()
|
||||
// Ignore the field if it is still searchable
// or if it was never referenced in any document
|
||||
.filter_map(|name| if new.contains(name) { None } else { fields_ids_map.id(name) })
|
||||
.collect()
|
||||
})
|
||||
};
|
||||
|
||||
let Some(fids_to_delete) = fids_to_delete else {
|
||||
return Ok(());
|
||||
};
|
||||
|
||||
progress.update_progress(SettingsIndexerStep::DeletingOldWordFidDocids);
|
||||
delete_old_word_fid_docids(wtxn, index.word_fid_docids, must_stop_processing, &fids_to_delete)?;
|
||||
|
||||
progress.update_progress(SettingsIndexerStep::DeletingOldFidWordCountDocids);
|
||||
delete_old_fid_word_count_docids(wtxn, index, must_stop_processing, &fids_to_delete)?;
|
||||
|
||||
progress.update_progress(SettingsIndexerStep::DeletingOldWordPrefixFidDocids);
|
||||
delete_old_word_fid_docids(
|
||||
wtxn,
|
||||
index.word_prefix_fid_docids,
|
||||
must_stop_processing,
|
||||
&fids_to_delete,
|
||||
)?;
|
||||
|
||||
Ok(())
|
||||
}
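
A small self-contained illustration of the `fids_to_delete` computation above (field names and ids are hypothetical):

use std::collections::{BTreeSet, HashMap};

fn main() {
    // "overview" stops being searchable, "title" stays, "tagline" was never seen in a document.
    let old = vec!["title", "overview", "tagline"];
    let new = vec!["title"];
    // Stand-in for fields_ids_map.id(name): only fields seen in documents have an id.
    let fields_ids_map: HashMap<&str, u16> = [("title", 0), ("overview", 1)].into();

    let fids_to_delete: BTreeSet<u16> = old
        .iter()
        // Ignore the field if it is still searchable or was never referenced in any document.
        .filter_map(|name| if new.contains(name) { None } else { fields_ids_map.get(name).copied() })
        .collect();

    // Only the fid of "overview" is scheduled for deletion.
    assert_eq!(fids_to_delete, BTreeSet::from([1]));
}
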
|
||||
|
||||
fn delete_old_word_fid_docids<'txn, MSP, DC>(
|
||||
wtxn: &mut RwTxn<'txn>,
|
||||
database: Database<StrBEU16Codec, DC>,
|
||||
must_stop_processing: &MSP,
|
||||
fids_to_delete: &BTreeSet<u16>,
|
||||
) -> Result<(), Error>
|
||||
where
|
||||
MSP: Fn() -> bool + Sync,
|
||||
DC: BytesDecode<'txn>,
|
||||
{
|
||||
let mut iter = database.iter_mut(wtxn)?.remap_data_type::<DecodeIgnore>();
|
||||
while let Some(((_word, fid), ())) = iter.next().transpose()? {
|
||||
// TODO should I call it that often?
|
||||
if must_stop_processing() {
|
||||
return Err(Error::InternalError(InternalError::AbortedIndexation));
|
||||
}
|
||||
|
||||
if fids_to_delete.contains(&fid) {
|
||||
// safety: We don't keep any references to the data.
|
||||
unsafe { iter.del_current()? };
|
||||
}
|
||||
}
|
||||
|
||||
Ok(())
|
||||
}
|
||||
|
||||
fn delete_old_fid_word_count_docids<MSP>(
|
||||
wtxn: &mut RwTxn<'_>,
|
||||
index: &Index,
|
||||
must_stop_processing: &MSP,
|
||||
fids_to_delete: &BTreeSet<u16>,
|
||||
) -> Result<(), Error>
|
||||
where
|
||||
MSP: Fn() -> bool + Sync,
|
||||
{
|
||||
let db = index.field_id_word_count_docids.remap_data_type::<DecodeIgnore>();
|
||||
for &fid_to_delete in fids_to_delete {
|
||||
if must_stop_processing() {
|
||||
return Err(Error::InternalError(InternalError::AbortedIndexation));
|
||||
}
|
||||
|
||||
let mut iter = db.prefix_iter_mut(wtxn, &(fid_to_delete, 0))?;
|
||||
while let Some(((fid, _word_count), ())) = iter.next().transpose()? {
|
||||
debug_assert_eq!(fid, fid_to_delete);
|
||||
// safety: We don't keep any references to the data.
|
||||
unsafe { iter.del_current()? };
|
||||
}
|
||||
}
|
||||
|
||||
Ok(())
|
||||
}
|
||||
|
||||
fn indexer_memory_settings(
|
||||
current_num_threads: usize,
|
||||
grenad_parameters: GrenadParameters,
|
||||
|
||||
@@ -28,9 +28,6 @@ make_enum_progress! {
|
||||
ChangingVectorStore,
|
||||
UsingStableIndexer,
|
||||
UsingExperimentalIndexer,
|
||||
DeletingOldWordFidDocids,
|
||||
DeletingOldFidWordCountDocids,
|
||||
DeletingOldWordPrefixFidDocids,
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
@@ -1589,33 +1589,33 @@ impl<'a, 't, 'i> Settings<'a, 't, 'i> {
|
||||
|
||||
// only use the new indexer when only the embedder possibly changed
|
||||
if let Self {
|
||||
searchable_fields: _,
|
||||
searchable_fields: Setting::NotSet,
|
||||
displayed_fields: Setting::NotSet,
|
||||
filterable_fields: Setting::NotSet,
|
||||
sortable_fields: Setting::NotSet,
|
||||
criteria: Setting::NotSet,
|
||||
stop_words: Setting::NotSet, // TODO (require force reindexing of searchables)
|
||||
non_separator_tokens: Setting::NotSet, // TODO (require force reindexing of searchables)
|
||||
separator_tokens: Setting::NotSet, // TODO (require force reindexing of searchables)
|
||||
dictionary: Setting::NotSet, // TODO (require force reindexing of searchables)
|
||||
stop_words: Setting::NotSet,
|
||||
non_separator_tokens: Setting::NotSet,
|
||||
separator_tokens: Setting::NotSet,
|
||||
dictionary: Setting::NotSet,
|
||||
distinct_field: Setting::NotSet,
|
||||
synonyms: Setting::NotSet,
|
||||
primary_key: Setting::NotSet,
|
||||
authorize_typos: Setting::NotSet,
|
||||
min_word_len_two_typos: Setting::NotSet,
|
||||
min_word_len_one_typo: Setting::NotSet,
|
||||
exact_words: Setting::NotSet, // TODO (require force reindexing of searchables)
|
||||
exact_attributes: _,
|
||||
exact_words: Setting::NotSet,
|
||||
exact_attributes: Setting::NotSet,
|
||||
max_values_per_facet: Setting::NotSet,
|
||||
sort_facet_values_by: Setting::NotSet,
|
||||
pagination_max_total_hits: Setting::NotSet,
|
||||
proximity_precision: _,
|
||||
proximity_precision: Setting::NotSet,
|
||||
embedder_settings: _,
|
||||
search_cutoff: Setting::NotSet,
|
||||
localized_attributes_rules: Setting::NotSet, // TODO to start with
|
||||
prefix_search: Setting::NotSet, // TODO continue with this
|
||||
localized_attributes_rules: Setting::NotSet,
|
||||
prefix_search: Setting::NotSet,
|
||||
facet_search: Setting::NotSet,
|
||||
disable_on_numbers: Setting::NotSet, // TODO (require force reindexing of searchables)
|
||||
disable_on_numbers: Setting::NotSet,
|
||||
chat: Setting::NotSet,
|
||||
vector_store: Setting::NotSet,
|
||||
wtxn: _,
|
||||
@@ -1632,12 +1632,10 @@ impl<'a, 't, 'i> Settings<'a, 't, 'i> {
|
||||
// Update index settings
|
||||
let embedding_config_updates = self.update_embedding_configs()?;
|
||||
self.update_user_defined_searchable_attributes()?;
|
||||
self.update_exact_attributes()?;
|
||||
self.update_proximity_precision()?;
|
||||
|
||||
// Note that we don't need to update the searchables here,
|
||||
// as it will be done after the settings update.
|
||||
let new_inner_settings = InnerIndexSettings::from_index(self.index, self.wtxn, None)?;
|
||||
let mut new_inner_settings =
|
||||
InnerIndexSettings::from_index(self.index, self.wtxn, None)?;
|
||||
new_inner_settings.recompute_searchables(self.wtxn, self.index)?;
|
||||
|
||||
let primary_key_id = self
|
||||
.index
|
||||
@@ -2064,12 +2062,9 @@ impl InnerIndexSettings {
|
||||
let sortable_fields = index.sortable_fields(rtxn)?;
|
||||
let asc_desc_fields = index.asc_desc_fields(rtxn)?;
|
||||
let distinct_field = index.distinct_field(rtxn)?.map(|f| f.to_string());
|
||||
let user_defined_searchable_attributes = match index.user_defined_searchable_fields(rtxn)? {
|
||||
Some(fields) if fields.contains(&"*") => None,
|
||||
Some(fields) => Some(fields.into_iter().map(|f| f.to_string()).collect()),
|
||||
None => None,
|
||||
};
|
||||
|
||||
let user_defined_searchable_attributes = index
|
||||
.user_defined_searchable_fields(rtxn)?
|
||||
.map(|fields| fields.into_iter().map(|f| f.to_string()).collect());
|
||||
let builder = MetadataBuilder::from_index(index, rtxn)?;
|
||||
let fields_ids_map = FieldIdMapWithMetadata::new(fields_ids_map, builder);
|
||||
let disabled_typos_terms = index.disabled_typos_terms(rtxn)?;
|
||||
@@ -2583,20 +2578,8 @@ fn deserialize_sub_embedder(
|
||||
/// Implement this trait for the settings delta type.
|
||||
/// This is used in the new settings update flow and will allow to easily replace the old settings delta type: `InnerIndexSettingsDiff`.
|
||||
pub trait SettingsDelta {
|
||||
fn old_fields_ids_map(&self) -> &FieldIdMapWithMetadata;
|
||||
fn new_fields_ids_map(&self) -> &FieldIdMapWithMetadata;
|
||||
|
||||
fn old_searchable_attributes(&self) -> &Option<Vec<String>>;
|
||||
fn new_searchable_attributes(&self) -> &Option<Vec<String>>;
|
||||
|
||||
fn old_disabled_typos_terms(&self) -> &DisabledTyposTerms;
|
||||
fn new_disabled_typos_terms(&self) -> &DisabledTyposTerms;
|
||||
|
||||
fn old_proximity_precision(&self) -> &ProximityPrecision;
|
||||
fn new_proximity_precision(&self) -> &ProximityPrecision;
|
||||
|
||||
fn old_embedders(&self) -> &RuntimeEmbedders;
|
||||
fn new_embedders(&self) -> &RuntimeEmbedders;
|
||||
fn old_embedders(&self) -> &RuntimeEmbedders;
|
||||
fn new_embedder_category_id(&self) -> &HashMap<String, u8>;
|
||||
fn embedder_actions(&self) -> &BTreeMap<String, EmbedderAction>;
|
||||
fn try_for_each_fragment_diff<F, E>(
|
||||
@@ -2606,6 +2589,7 @@ pub trait SettingsDelta {
|
||||
) -> std::result::Result<(), E>
|
||||
where
|
||||
F: FnMut(FragmentDiff) -> std::result::Result<(), E>;
|
||||
fn new_fields_ids_map(&self) -> &FieldIdMapWithMetadata;
|
||||
}
|
||||
|
||||
pub struct FragmentDiff<'a> {
|
||||
@@ -2614,47 +2598,26 @@ pub struct FragmentDiff<'a> {
|
||||
}
|
||||
|
||||
impl SettingsDelta for InnerIndexSettingsDiff {
|
||||
fn old_fields_ids_map(&self) -> &FieldIdMapWithMetadata {
|
||||
&self.old.fields_ids_map
|
||||
}
|
||||
fn new_fields_ids_map(&self) -> &FieldIdMapWithMetadata {
|
||||
&self.new.fields_ids_map
|
||||
}
|
||||
|
||||
fn old_searchable_attributes(&self) -> &Option<Vec<String>> {
|
||||
&self.old.user_defined_searchable_attributes
|
||||
}
|
||||
fn new_searchable_attributes(&self) -> &Option<Vec<String>> {
|
||||
&self.new.user_defined_searchable_attributes
|
||||
}
|
||||
|
||||
fn old_disabled_typos_terms(&self) -> &DisabledTyposTerms {
|
||||
&self.old.disabled_typos_terms
|
||||
}
|
||||
fn new_disabled_typos_terms(&self) -> &DisabledTyposTerms {
|
||||
&self.new.disabled_typos_terms
|
||||
}
|
||||
|
||||
fn old_proximity_precision(&self) -> &ProximityPrecision {
|
||||
&self.old.proximity_precision
|
||||
}
|
||||
fn new_proximity_precision(&self) -> &ProximityPrecision {
|
||||
&self.new.proximity_precision
|
||||
fn new_embedders(&self) -> &RuntimeEmbedders {
|
||||
&self.new.runtime_embedders
|
||||
}
|
||||
|
||||
fn old_embedders(&self) -> &RuntimeEmbedders {
|
||||
&self.old.runtime_embedders
|
||||
}
|
||||
fn new_embedders(&self) -> &RuntimeEmbedders {
|
||||
&self.new.runtime_embedders
|
||||
}
|
||||
|
||||
fn new_embedder_category_id(&self) -> &HashMap<String, u8> {
|
||||
&self.new.embedder_category_id
|
||||
}
|
||||
|
||||
fn embedder_actions(&self) -> &BTreeMap<String, EmbedderAction> {
|
||||
&self.embedding_config_updates
|
||||
}
|
||||
|
||||
fn new_fields_ids_map(&self) -> &FieldIdMapWithMetadata {
|
||||
&self.new.fields_ids_map
|
||||
}
|
||||
|
||||
fn try_for_each_fragment_diff<F, E>(
|
||||
&self,
|
||||
embedder_name: &str,
|
||||
|
||||
@@ -14,21 +14,28 @@ fn set_and_reset_searchable_fields() {
|
||||
let index = TempIndex::new();
|
||||
|
||||
// First we send 3 documents with ids from 1 to 3.
|
||||
let mut wtxn = index.write_txn().unwrap();
|
||||
|
||||
index
|
||||
.add_documents(documents!([
|
||||
{ "id": 1, "name": "kevin", "age": 23 },
|
||||
{ "id": 2, "name": "kevina", "age": 21},
|
||||
{ "id": 3, "name": "benoit", "age": 34 }
|
||||
]))
|
||||
.add_documents_using_wtxn(
|
||||
&mut wtxn,
|
||||
documents!([
|
||||
{ "id": 1, "name": "kevin", "age": 23 },
|
||||
{ "id": 2, "name": "kevina", "age": 21},
|
||||
{ "id": 3, "name": "benoit", "age": 34 }
|
||||
]),
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
// We change the searchable fields to be the "name" field only.
|
||||
index
|
||||
.update_settings(|settings| {
|
||||
.update_settings_using_wtxn(&mut wtxn, |settings| {
|
||||
settings.set_searchable_fields(vec!["name".into()]);
|
||||
})
|
||||
.unwrap();
|
||||
|
||||
wtxn.commit().unwrap();
|
||||
|
||||
db_snap!(index, fields_ids_map, @r###"
|
||||
0 id |
|
||||
1 name |
|
||||
|
||||
@@ -5,36 +5,103 @@ mod v1_15;
|
||||
mod v1_16;
|
||||
|
||||
use heed::RwTxn;
|
||||
use v1_12::{FixFieldDistribution, RecomputeStats};
|
||||
use v1_13::AddNewStats;
|
||||
use v1_14::UpgradeArroyVersion;
|
||||
use v1_15::RecomputeWordFst;
|
||||
use v1_16::SwitchToMultimodal;
|
||||
use v1_12::{V1_12_3_To_V1_13_0, V1_12_To_V1_12_3};
|
||||
use v1_13::{V1_13_0_To_V1_13_1, V1_13_1_To_Latest_V1_13};
|
||||
use v1_14::Latest_V1_13_To_Latest_V1_14;
|
||||
use v1_15::Latest_V1_14_To_Latest_V1_15;
|
||||
use v1_16::Latest_V1_15_To_V1_16_0;
|
||||
|
||||
use crate::constants::{VERSION_MAJOR, VERSION_MINOR, VERSION_PATCH};
|
||||
use crate::progress::{Progress, VariableNameStep};
|
||||
use crate::{Index, InternalError, Result};
|
||||
|
||||
trait UpgradeIndex {
|
||||
/// Returns `true` if `upgrade` should be called when the index started with version `initial_version`.
|
||||
fn must_upgrade(&self, initial_version: (u32, u32, u32)) -> bool;
|
||||
|
||||
/// Returns `true` if the index scheduler must regenerate its cached stats.
|
||||
fn upgrade(&self, wtxn: &mut RwTxn, index: &Index, progress: Progress) -> Result<bool>;
|
||||
|
||||
/// Description of the upgrade for progress display purposes.
|
||||
fn description(&self) -> &'static str;
|
||||
fn upgrade(
|
||||
&self,
|
||||
wtxn: &mut RwTxn,
|
||||
index: &Index,
|
||||
original: (u32, u32, u32),
|
||||
progress: Progress,
|
||||
) -> Result<bool>;
|
||||
fn target_version(&self) -> (u32, u32, u32);
|
||||
}
|
||||
|
||||
const UPGRADE_FUNCTIONS: &[&dyn UpgradeIndex] = &[
|
||||
&FixFieldDistribution {},
|
||||
&RecomputeStats {},
|
||||
&AddNewStats {},
|
||||
&UpgradeArroyVersion {},
|
||||
&RecomputeWordFst {},
|
||||
&SwitchToMultimodal {},
|
||||
&V1_12_To_V1_12_3 {},
|
||||
&V1_12_3_To_V1_13_0 {},
|
||||
&V1_13_0_To_V1_13_1 {},
|
||||
&V1_13_1_To_Latest_V1_13 {},
|
||||
&Latest_V1_13_To_Latest_V1_14 {},
|
||||
&Latest_V1_14_To_Latest_V1_15 {},
|
||||
&Latest_V1_15_To_V1_16_0 {},
|
||||
&ToTargetNoOp { target: (1, 18, 0) },
|
||||
&ToTargetNoOp { target: (1, 19, 0) },
|
||||
&ToTargetNoOp { target: (1, 20, 0) },
|
||||
&ToTargetNoOp { target: (1, 21, 0) },
|
||||
&ToTargetNoOp { target: (1, 22, 0) },
|
||||
&ToTargetNoOp { target: (1, 23, 0) },
|
||||
&ToTargetNoOp { target: (1, 24, 0) },
|
||||
&ToTargetNoOp { target: (1, 25, 0) },
|
||||
&ToTargetNoOp { target: (1, 26, 0) },
|
||||
&ToTargetNoOp { target: (1, 27, 0) },
|
||||
&ToTargetNoOp { target: (1, 28, 0) },
|
||||
// This is the last upgrade function; it will be called when the index is up to date.
// Any other upgrade function should be added before this one.
|
||||
&ToCurrentNoOp {},
|
||||
];
|
||||
|
||||
/// Causes a compile-time error if the argument is not in range of `0..UPGRADE_FUNCTIONS.len()`
|
||||
macro_rules! function_index {
|
||||
($start:expr) => {{
|
||||
const _CHECK_INDEX: () = {
|
||||
if $start >= $crate::update::upgrade::UPGRADE_FUNCTIONS.len() {
|
||||
panic!("upgrade functions out of range")
|
||||
}
|
||||
};
|
||||
|
||||
$start
|
||||
}};
|
||||
}
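
For instance, a literal past the end of the table, such as `function_index!(400)`, trips the `_CHECK_INDEX` constant and fails the build, so the `start` table below can never silently point at a missing migration.
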
|
||||
|
||||
const fn start(from: (u32, u32, u32)) -> Option<usize> {
|
||||
let start = match from {
|
||||
(1, 12, 0..=2) => function_index!(0),
|
||||
(1, 12, 3..) => function_index!(1),
|
||||
(1, 13, 0) => function_index!(2),
|
||||
(1, 13, _) => function_index!(4),
|
||||
(1, 14, _) => function_index!(5),
|
||||
// We must handle the current version in the match because, in case of a failure, some indexes may have been upgraded but not others.
|
||||
(1, 15, _) => function_index!(6),
|
||||
(1, 16, _) | (1, 17, _) => function_index!(7),
|
||||
(1, 18, _) => function_index!(8),
|
||||
(1, 19, _) => function_index!(9),
|
||||
(1, 20, _) => function_index!(10),
|
||||
(1, 21, _) => function_index!(11),
|
||||
(1, 22, _) => function_index!(12),
|
||||
(1, 23, _) => function_index!(13),
|
||||
(1, 24, _) => function_index!(14),
|
||||
(1, 25, _) => function_index!(15),
|
||||
(1, 26, _) => function_index!(16),
|
||||
(1, 27, _) => function_index!(17),
|
||||
(1, 28, _) => function_index!(18),
|
||||
// We deliberately don't add a placeholder with (VERSION_MAJOR, VERSION_MINOR, VERSION_PATCH) here,
// to force manually considering the dumpless upgrade.
|
||||
(_major, _minor, _patch) => return None,
|
||||
};
|
||||
|
||||
Some(start)
|
||||
}
|
||||
|
||||
/// Causes a compile-time error if the latest package cannot be upgraded.
|
||||
///
|
||||
/// This serves as a reminder to consider the proper dumpless upgrade implementation when changing the package version.
|
||||
const _CHECK_PACKAGE_CAN_UPGRADE: () = {
|
||||
if start((VERSION_MAJOR, VERSION_MINOR, VERSION_PATCH)).is_none() {
|
||||
panic!("cannot upgrade from latest package version")
|
||||
}
|
||||
};
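
Following the pattern above, releasing a hypothetical v1.29.0 with no database format change would need two coordinated edits: append `&ToTargetNoOp { target: (1, 29, 0) },` just before `&ToCurrentNoOp {}`, and add a `(1, 29, _) => function_index!(19),` arm to `start` (19 being the index `ToCurrentNoOp` would then sit at). Forgetting the second edit makes `_CHECK_PACKAGE_CAN_UPGRADE` fail to compile once `VERSION_MINOR` reaches 29, which is exactly the reminder it exists to give.
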
|
||||
|
||||
/// Returns `true` if the cached stats of the index must be regenerated
|
||||
pub fn upgrade<MSP>(
|
||||
wtxn: &mut RwTxn,
|
||||
@@ -46,34 +113,79 @@ pub fn upgrade<MSP>(
|
||||
where
|
||||
MSP: Fn() -> bool + Sync,
|
||||
{
|
||||
let upgrade_functions = UPGRADE_FUNCTIONS;
|
||||
let from = index.get_version(wtxn)?.unwrap_or(db_version);
|
||||
|
||||
let initial_version = index.get_version(wtxn)?.unwrap_or(db_version);
|
||||
let start =
|
||||
start(from).ok_or_else(|| InternalError::CannotUpgradeToVersion(from.0, from.1, from.2))?;
|
||||
|
||||
enum UpgradeVersion {}
|
||||
let upgrade_path = &UPGRADE_FUNCTIONS[start..];
|
||||
|
||||
let mut current_version = from;
|
||||
let mut regenerate_stats = false;
|
||||
for (i, upgrade) in upgrade_functions.iter().enumerate() {
|
||||
for (i, upgrade) in upgrade_path.iter().enumerate() {
|
||||
if (must_stop_processing)() {
|
||||
return Err(crate::Error::InternalError(InternalError::AbortedIndexation));
|
||||
}
|
||||
if upgrade.must_upgrade(initial_version) {
|
||||
regenerate_stats |= upgrade.upgrade(wtxn, index, progress.clone())?;
|
||||
progress.update_progress(VariableNameStep::<UpgradeVersion>::new(
|
||||
upgrade.description(),
|
||||
i as u32,
|
||||
upgrade_functions.len() as u32,
|
||||
));
|
||||
} else {
|
||||
progress.update_progress(VariableNameStep::<UpgradeVersion>::new(
|
||||
"Skipping migration that must not be applied",
|
||||
i as u32,
|
||||
upgrade_functions.len() as u32,
|
||||
));
|
||||
}
|
||||
let target = upgrade.target_version();
|
||||
progress.update_progress(VariableNameStep::<UpgradeVersion>::new(
|
||||
format!(
|
||||
"Upgrading from v{}.{}.{} to v{}.{}.{}",
|
||||
current_version.0,
|
||||
current_version.1,
|
||||
current_version.2,
|
||||
target.0,
|
||||
target.1,
|
||||
target.2
|
||||
),
|
||||
i as u32,
|
||||
upgrade_path.len() as u32,
|
||||
));
|
||||
regenerate_stats |= upgrade.upgrade(wtxn, index, from, progress.clone())?;
|
||||
index.put_version(wtxn, target)?;
|
||||
current_version = target;
|
||||
}
|
||||
|
||||
index.put_version(wtxn, (VERSION_MAJOR, VERSION_MINOR, VERSION_PATCH))?;
|
||||
|
||||
Ok(regenerate_stats)
|
||||
}
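
As a concrete walk-through of the loop above: an index last opened by v1.13.0 resumes at `function_index!(2)`, runs `V1_13_0_To_V1_13_1` (recording 1.13.1), then `V1_13_1_To_Latest_V1_13` (1.13.3), then every later migration in order, with `put_version` called after each step; an upgrade interrupted half-way therefore restarts from the last version that was written instead of from the beginning.
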
|
||||
|
||||
#[allow(non_camel_case_types)]
|
||||
struct ToCurrentNoOp {}
|
||||
|
||||
impl UpgradeIndex for ToCurrentNoOp {
|
||||
fn upgrade(
|
||||
&self,
|
||||
_wtxn: &mut RwTxn,
|
||||
_index: &Index,
|
||||
_original: (u32, u32, u32),
|
||||
_progress: Progress,
|
||||
) -> Result<bool> {
|
||||
Ok(false)
|
||||
}
|
||||
|
||||
fn target_version(&self) -> (u32, u32, u32) {
|
||||
(VERSION_MAJOR, VERSION_MINOR, VERSION_PATCH)
|
||||
}
|
||||
}
|
||||
|
||||
/// Perform no operation during the upgrade except changing to the specified target version.
|
||||
#[allow(non_camel_case_types)]
|
||||
struct ToTargetNoOp {
|
||||
pub target: (u32, u32, u32),
|
||||
}
|
||||
|
||||
impl UpgradeIndex for ToTargetNoOp {
|
||||
fn upgrade(
|
||||
&self,
|
||||
_wtxn: &mut RwTxn,
|
||||
_index: &Index,
|
||||
_original: (u32, u32, u32),
|
||||
_progress: Progress,
|
||||
) -> Result<bool> {
|
||||
Ok(false)
|
||||
}
|
||||
|
||||
fn target_version(&self) -> (u32, u32, u32) {
|
||||
self.target
|
||||
}
|
||||
}
|
||||
|
||||
@@ -4,10 +4,17 @@ use super::UpgradeIndex;
|
||||
use crate::progress::Progress;
|
||||
use crate::{make_enum_progress, Index, Result};
|
||||
|
||||
pub(super) struct FixFieldDistribution {}
|
||||
#[allow(non_camel_case_types)]
|
||||
pub(super) struct V1_12_To_V1_12_3 {}
|
||||
|
||||
impl UpgradeIndex for FixFieldDistribution {
|
||||
fn upgrade(&self, wtxn: &mut RwTxn, index: &Index, progress: Progress) -> Result<bool> {
|
||||
impl UpgradeIndex for V1_12_To_V1_12_3 {
|
||||
fn upgrade(
|
||||
&self,
|
||||
wtxn: &mut RwTxn,
|
||||
index: &Index,
|
||||
_original: (u32, u32, u32),
|
||||
progress: Progress,
|
||||
) -> Result<bool> {
|
||||
make_enum_progress! {
|
||||
enum FieldDistribution {
|
||||
RebuildingFieldDistribution,
|
||||
@@ -18,28 +25,27 @@ impl UpgradeIndex for FixFieldDistribution {
|
||||
Ok(true)
|
||||
}
|
||||
|
||||
fn must_upgrade(&self, initial_version: (u32, u32, u32)) -> bool {
|
||||
initial_version < (1, 12, 3)
|
||||
}
|
||||
|
||||
fn description(&self) -> &'static str {
|
||||
"Recomputing field distribution which was wrong before v1.12.3"
|
||||
fn target_version(&self) -> (u32, u32, u32) {
|
||||
(1, 12, 3)
|
||||
}
|
||||
}
|
||||
|
||||
pub(super) struct RecomputeStats {}
|
||||
#[allow(non_camel_case_types)]
|
||||
pub(super) struct V1_12_3_To_V1_13_0 {}
|
||||
|
||||
impl UpgradeIndex for RecomputeStats {
|
||||
fn upgrade(&self, _wtxn: &mut RwTxn, _index: &Index, _progress: Progress) -> Result<bool> {
|
||||
impl UpgradeIndex for V1_12_3_To_V1_13_0 {
|
||||
fn upgrade(
|
||||
&self,
|
||||
_wtxn: &mut RwTxn,
|
||||
_index: &Index,
|
||||
_original: (u32, u32, u32),
|
||||
_progress: Progress,
|
||||
) -> Result<bool> {
|
||||
// recompute the indexes stats
|
||||
Ok(true)
|
||||
}
|
||||
|
||||
fn must_upgrade(&self, initial_version: (u32, u32, u32)) -> bool {
|
||||
initial_version < (1, 13, 0)
|
||||
}
|
||||
|
||||
fn description(&self) -> &'static str {
|
||||
"Recomputing stats"
|
||||
fn target_version(&self) -> (u32, u32, u32) {
|
||||
(1, 13, 0)
|
||||
}
|
||||
}
|
||||
|
||||
@@ -5,10 +5,17 @@ use crate::database_stats::DatabaseStats;
|
||||
use crate::progress::Progress;
|
||||
use crate::{make_enum_progress, Index, Result};
|
||||
|
||||
pub(super) struct AddNewStats();
|
||||
#[allow(non_camel_case_types)]
|
||||
pub(super) struct V1_13_0_To_V1_13_1();
|
||||
|
||||
impl UpgradeIndex for AddNewStats {
|
||||
fn upgrade(&self, wtxn: &mut RwTxn, index: &Index, progress: Progress) -> Result<bool> {
|
||||
impl UpgradeIndex for V1_13_0_To_V1_13_1 {
|
||||
fn upgrade(
|
||||
&self,
|
||||
wtxn: &mut RwTxn,
|
||||
index: &Index,
|
||||
_original: (u32, u32, u32),
|
||||
progress: Progress,
|
||||
) -> Result<bool> {
|
||||
make_enum_progress! {
|
||||
enum DocumentsStats {
|
||||
CreatingDocumentsStats,
|
||||
@@ -23,11 +30,26 @@ impl UpgradeIndex for AddNewStats {
|
||||
Ok(true)
|
||||
}
|
||||
|
||||
fn must_upgrade(&self, initial_version: (u32, u32, u32)) -> bool {
|
||||
initial_version < (1, 13, 1)
|
||||
}
|
||||
|
||||
fn description(&self) -> &'static str {
|
||||
"Computing newly introduced document stats"
|
||||
fn target_version(&self) -> (u32, u32, u32) {
|
||||
(1, 13, 1)
|
||||
}
|
||||
}
|
||||
|
||||
#[allow(non_camel_case_types)]
|
||||
pub(super) struct V1_13_1_To_Latest_V1_13();
|
||||
|
||||
impl UpgradeIndex for V1_13_1_To_Latest_V1_13 {
|
||||
fn upgrade(
|
||||
&self,
|
||||
_wtxn: &mut RwTxn,
|
||||
_index: &Index,
|
||||
_original: (u32, u32, u32),
|
||||
_progress: Progress,
|
||||
) -> Result<bool> {
|
||||
Ok(false)
|
||||
}
|
||||
|
||||
fn target_version(&self) -> (u32, u32, u32) {
|
||||
(1, 13, 3)
|
||||
}
|
||||
}
|
||||
|
||||
@@ -5,10 +5,17 @@ use super::UpgradeIndex;
|
||||
use crate::progress::Progress;
|
||||
use crate::{make_enum_progress, Index, Result};
|
||||
|
||||
pub(super) struct UpgradeArroyVersion();
|
||||
#[allow(non_camel_case_types)]
|
||||
pub(super) struct Latest_V1_13_To_Latest_V1_14();
|
||||
|
||||
impl UpgradeIndex for UpgradeArroyVersion {
|
||||
fn upgrade(&self, wtxn: &mut RwTxn, index: &Index, progress: Progress) -> Result<bool> {
|
||||
impl UpgradeIndex for Latest_V1_13_To_Latest_V1_14 {
|
||||
fn upgrade(
|
||||
&self,
|
||||
wtxn: &mut RwTxn,
|
||||
index: &Index,
|
||||
_original: (u32, u32, u32),
|
||||
progress: Progress,
|
||||
) -> Result<bool> {
|
||||
make_enum_progress! {
|
||||
enum VectorStore {
|
||||
UpdateInternalVersions,
|
||||
@@ -28,11 +35,7 @@ impl UpgradeIndex for UpgradeArroyVersion {
|
||||
Ok(false)
|
||||
}
|
||||
|
||||
fn must_upgrade(&self, initial_version: (u32, u32, u32)) -> bool {
|
||||
initial_version < (1, 14, 0)
|
||||
}
|
||||
|
||||
fn description(&self) -> &'static str {
|
||||
"Updating vector store with an internal version"
|
||||
fn target_version(&self) -> (u32, u32, u32) {
|
||||
(1, 14, 0)
|
||||
}
|
||||
}
|
||||
|
||||
@@ -7,21 +7,25 @@ use crate::progress::Progress;
|
||||
use crate::update::new::indexer::recompute_word_fst_from_word_docids_database;
|
||||
use crate::{Index, Result};
|
||||
|
||||
pub(super) struct RecomputeWordFst();
|
||||
#[allow(non_camel_case_types)]
|
||||
pub(super) struct Latest_V1_14_To_Latest_V1_15();
|
||||
|
||||
impl UpgradeIndex for RecomputeWordFst {
|
||||
fn upgrade(&self, wtxn: &mut RwTxn, index: &Index, progress: Progress) -> Result<bool> {
|
||||
impl UpgradeIndex for Latest_V1_14_To_Latest_V1_15 {
|
||||
fn upgrade(
|
||||
&self,
|
||||
wtxn: &mut RwTxn,
|
||||
index: &Index,
|
||||
_original: (u32, u32, u32),
|
||||
progress: Progress,
|
||||
) -> Result<bool> {
|
||||
// Recompute the word FST from the word docids database.
|
||||
recompute_word_fst_from_word_docids_database(index, wtxn, &progress)?;
|
||||
|
||||
Ok(false)
|
||||
}
|
||||
fn must_upgrade(&self, initial_version: (u32, u32, u32)) -> bool {
|
||||
initial_version < (1, 15, 0)
|
||||
}
|
||||
|
||||
fn description(&self) -> &'static str {
|
||||
"Recomputing word FST from word docids database as it was wrong before v1.15.0"
|
||||
fn target_version(&self) -> (u32, u32, u32) {
|
||||
(1, 15, 0)
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
@@ -6,10 +6,17 @@ use crate::progress::Progress;
|
||||
use crate::vector::db::{EmbedderInfo, EmbeddingStatus};
|
||||
use crate::{Index, InternalError, Result};
|
||||
|
||||
pub(super) struct SwitchToMultimodal();
|
||||
#[allow(non_camel_case_types)]
|
||||
pub(super) struct Latest_V1_15_To_V1_16_0();
|
||||
|
||||
impl UpgradeIndex for SwitchToMultimodal {
|
||||
fn upgrade(&self, wtxn: &mut RwTxn, index: &Index, _progress: Progress) -> Result<bool> {
|
||||
impl UpgradeIndex for Latest_V1_15_To_V1_16_0 {
|
||||
fn upgrade(
|
||||
&self,
|
||||
wtxn: &mut RwTxn,
|
||||
index: &Index,
|
||||
_original: (u32, u32, u32),
|
||||
_progress: Progress,
|
||||
) -> Result<bool> {
|
||||
let v1_15_indexing_configs = index
|
||||
.main
|
||||
.remap_types::<Str, SerdeJson<Vec<super::v1_15::IndexEmbeddingConfig>>>()
|
||||
@@ -34,11 +41,8 @@ impl UpgradeIndex for SwitchToMultimodal {
|
||||
|
||||
Ok(false)
|
||||
}
|
||||
fn must_upgrade(&self, initial_version: (u32, u32, u32)) -> bool {
|
||||
initial_version < (1, 16, 0)
|
||||
}
|
||||
|
||||
fn description(&self) -> &'static str {
|
||||
"Migrating the database for multimodal support"
|
||||
fn target_version(&self) -> (u32, u32, u32) {
|
||||
(1, 16, 0)
|
||||
}
|
||||
}
|
||||
|
||||
@@ -2,7 +2,6 @@ use candle_core::Tensor;
|
||||
use candle_nn::VarBuilder;
|
||||
use candle_transformers::models::bert::{BertModel, Config as BertConfig, DTYPE};
|
||||
use candle_transformers::models::modernbert::{Config as ModernConfig, ModernBert};
|
||||
use candle_transformers::models::xlm_roberta::{Config as XlmRobertaConfig, XLMRobertaModel};
|
||||
// FIXME: currently we'll be using the hub to retrieve the model; in the future we might want to embed it into Meilisearch itself
|
||||
use hf_hub::api::sync::Api;
|
||||
use hf_hub::{Repo, RepoType};
|
||||
@@ -90,7 +89,6 @@ impl Default for EmbedderOptions {
|
||||
enum ModelKind {
|
||||
Bert(BertModel),
|
||||
Modern(ModernBert),
|
||||
XlmRoberta(XLMRobertaModel),
|
||||
}
|
||||
|
||||
/// Perform embedding of documents and queries
|
||||
@@ -306,8 +304,7 @@ impl Embedder {
|
||||
};
|
||||
|
||||
let is_modern = has_arch("modernbert");
|
||||
let is_xlm_roberta = has_arch("xlm-roberta") || has_arch("xlm_roberta");
|
||||
tracing::debug!(is_modern, is_xlm_roberta, model_type, "detected HF architecture");
|
||||
tracing::debug!(is_modern, model_type, "detected HF architecture");
|
||||
|
||||
let mut tokenizer = Tokenizer::from_file(&tokenizer_filename)
|
||||
.map_err(|inner| NewEmbedderError::open_tokenizer(tokenizer_filename, inner))?;
|
||||
@@ -343,18 +340,6 @@ impl Embedder {
|
||||
)
|
||||
})?;
|
||||
ModelKind::Modern(ModernBert::load(vb, &config).map_err(NewEmbedderError::load_model)?)
|
||||
} else if is_xlm_roberta {
|
||||
let config: XlmRobertaConfig = serde_json::from_str(&config_str).map_err(|inner| {
|
||||
NewEmbedderError::deserialize_config(
|
||||
options.model.clone(),
|
||||
config_str.clone(),
|
||||
config_filename.clone(),
|
||||
inner,
|
||||
)
|
||||
})?;
|
||||
ModelKind::XlmRoberta(
|
||||
XLMRobertaModel::new(&config, vb).map_err(NewEmbedderError::load_model)?,
|
||||
)
|
||||
} else {
|
||||
let config: BertConfig = serde_json::from_str(&config_str).map_err(|inner| {
|
||||
NewEmbedderError::deserialize_config(
|
||||
@@ -466,19 +451,6 @@ impl Embedder {
|
||||
let mask = Tensor::stack(&[mask], 0).map_err(EmbedError::tensor_shape)?;
|
||||
model.forward(&token_ids, &mask).map_err(EmbedError::model_forward)?
|
||||
}
|
||||
ModelKind::XlmRoberta(model) => {
|
||||
let mut mask_vec = tokens.get_attention_mask().to_vec();
|
||||
if mask_vec.len() > self.max_len {
|
||||
mask_vec.truncate(self.max_len);
|
||||
}
|
||||
let mask = Tensor::new(mask_vec.as_slice(), &self.device)
|
||||
.map_err(EmbedError::tensor_shape)?;
|
||||
let mask = Tensor::stack(&[mask], 0).map_err(EmbedError::tensor_shape)?;
|
||||
let token_type_ids = token_ids.zeros_like().map_err(EmbedError::tensor_shape)?;
|
||||
model
|
||||
.forward(&token_ids, &mask, &token_type_ids, None, None, None)
|
||||
.map_err(EmbedError::model_forward)?
|
||||
}
|
||||
};
|
||||
|
||||
let embedding = Self::pooling(embeddings, self.pooling)?;
|
||||
|
||||
@@ -1,5 +1,5 @@
|
||||
use hannoy::distances::{Cosine, Hamming};
|
||||
use hannoy::{ItemId, Searched};
|
||||
use hannoy::ItemId;
|
||||
use heed::{RoTxn, RwTxn, Unspecified};
|
||||
use ordered_float::OrderedFloat;
|
||||
use rand::SeedableRng as _;
|
||||
@@ -974,7 +974,7 @@ impl VectorStore {
|
||||
}
|
||||
|
||||
if let Some(mut ret) = searcher.by_item(rtxn, item)? {
|
||||
results.append(&mut ret.nns);
|
||||
results.append(&mut ret);
|
||||
}
|
||||
}
|
||||
results.sort_unstable_by_key(|(_, distance)| OrderedFloat(*distance));
|
||||
@@ -1028,9 +1028,10 @@ impl VectorStore {
|
||||
searcher.candidates(filter);
|
||||
}
|
||||
|
||||
let Searched { mut nns, did_cancel: _ } =
|
||||
searcher.by_vector_with_cancellation(rtxn, vector, || time_budget.exceeded())?;
|
||||
results.append(&mut nns);
|
||||
let (res, _degraded) =
|
||||
&mut searcher
|
||||
.by_vector_with_cancellation(rtxn, vector, || time_budget.exceeded())?;
|
||||
results.append(res);
|
||||
}
|
||||
|
||||
results.sort_unstable_by_key(|(_, distance)| OrderedFloat(*distance));
|
||||
|
||||
@@ -22,7 +22,6 @@ reqwest = { version = "0.12.24", features = [
|
||||
"json",
|
||||
"rustls-tls",
|
||||
], default-features = false }
|
||||
semver = "1.0.27"
|
||||
serde = { version = "1.0.228", features = ["derive"] }
|
||||
serde_json = "1.0.145"
|
||||
sha2 = "0.10.9"
|
||||
@@ -43,4 +42,3 @@ tracing = "0.1.41"
|
||||
tracing-subscriber = "0.3.20"
|
||||
tracing-trace = { version = "0.1.0", path = "../tracing-trace" }
|
||||
uuid = { version = "1.18.1", features = ["v7", "serde"] }
|
||||
similar-asserts = "1.7.0"
|
||||
|
||||
@@ -3,22 +3,21 @@ use std::io::{Read as _, Seek as _, Write as _};
|
||||
|
||||
use anyhow::{bail, Context};
|
||||
use futures_util::TryStreamExt as _;
|
||||
use serde::{Deserialize, Serialize};
|
||||
use serde::Deserialize;
|
||||
use sha2::Digest;
|
||||
|
||||
use super::client::Client;
|
||||
|
||||
#[derive(Serialize, Deserialize, Clone, Debug)]
|
||||
#[derive(Deserialize, Clone)]
|
||||
pub struct Asset {
|
||||
pub local_location: Option<String>,
|
||||
pub remote_location: Option<String>,
|
||||
#[serde(default, skip_serializing_if = "AssetFormat::is_default")]
|
||||
#[serde(default)]
|
||||
pub format: AssetFormat,
|
||||
#[serde(default, skip_serializing_if = "Option::is_none")]
|
||||
pub sha256: Option<String>,
|
||||
}
|
||||
|
||||
#[derive(Serialize, Deserialize, Default, Copy, Clone, Debug)]
|
||||
#[derive(Deserialize, Default, Copy, Clone)]
|
||||
pub enum AssetFormat {
|
||||
#[default]
|
||||
Auto,
|
||||
@@ -28,10 +27,6 @@ pub enum AssetFormat {
|
||||
}
|
||||
|
||||
impl AssetFormat {
|
||||
fn is_default(&self) -> bool {
|
||||
matches!(self, AssetFormat::Auto)
|
||||
}
|
||||
|
||||
pub fn to_content_type(self, filename: &str) -> &'static str {
|
||||
match self {
|
||||
AssetFormat::Auto => Self::auto_detect(filename).to_content_type(filename),
|
||||
@@ -171,14 +166,7 @@ fn check_sha256(name: &str, asset: &Asset, mut file: std::fs::File) -> anyhow::R
|
||||
}
|
||||
}
|
||||
None => {
|
||||
let msg = match name.starts_with("meilisearch-") {
|
||||
true => "Please add it to crates/xtask/src/common/instance/release.rs",
|
||||
false => "Please add it to workload file",
|
||||
};
|
||||
tracing::warn!(
|
||||
sha256 = file_hash,
|
||||
"Skipping hash for asset {name} that doesn't have one. {msg}"
|
||||
);
|
||||
tracing::warn!(sha256 = file_hash, "Skipping hash for asset {name} that doesn't have one. Please add it to workload file");
|
||||
true
|
||||
}
|
||||
})
|
||||
@@ -1,5 +1,5 @@
|
||||
use anyhow::Context;
|
||||
use serde::{Deserialize, Serialize};
|
||||
use serde::Deserialize;
|
||||
|
||||
#[derive(Debug, Clone)]
|
||||
pub struct Client {
|
||||
@@ -61,7 +61,7 @@ impl Client {
|
||||
}
|
||||
}
|
||||
|
||||
#[derive(Debug, Clone, Copy, Serialize, Deserialize)]
|
||||
#[derive(Debug, Clone, Copy, Deserialize)]
|
||||
#[serde(rename_all = "SCREAMING_SNAKE_CASE")]
|
||||
pub enum Method {
|
||||
Get,
|
||||
crates/xtask/src/bench/command.rs (new file, 194 lines)
@@ -0,0 +1,194 @@
|
||||
use std::collections::BTreeMap;
|
||||
use std::fmt::Display;
|
||||
use std::io::Read as _;
|
||||
|
||||
use anyhow::{bail, Context as _};
|
||||
use serde::Deserialize;
|
||||
|
||||
use super::assets::{fetch_asset, Asset};
|
||||
use super::client::{Client, Method};
|
||||
|
||||
#[derive(Clone, Deserialize)]
|
||||
pub struct Command {
|
||||
pub route: String,
|
||||
pub method: Method,
|
||||
#[serde(default)]
|
||||
pub body: Body,
|
||||
#[serde(default)]
|
||||
pub synchronous: SyncMode,
|
||||
}
|
||||
|
||||
#[derive(Default, Clone, Deserialize)]
|
||||
#[serde(untagged)]
|
||||
pub enum Body {
|
||||
Inline {
|
||||
inline: serde_json::Value,
|
||||
},
|
||||
Asset {
|
||||
asset: String,
|
||||
},
|
||||
#[default]
|
||||
Empty,
|
||||
}
|
||||
|
||||
impl Body {
|
||||
pub fn get(
|
||||
self,
|
||||
assets: &BTreeMap<String, Asset>,
|
||||
asset_folder: &str,
|
||||
) -> anyhow::Result<Option<(Vec<u8>, &'static str)>> {
|
||||
Ok(match self {
|
||||
Body::Inline { inline: body } => Some((
|
||||
serde_json::to_vec(&body)
|
||||
.context("serializing to bytes")
|
||||
.context("while getting inline body")?,
|
||||
"application/json",
|
||||
)),
|
||||
Body::Asset { asset: name } => Some({
|
||||
let context = || format!("while getting body from asset '{name}'");
|
||||
let (mut file, format) =
|
||||
fetch_asset(&name, assets, asset_folder).with_context(context)?;
|
||||
let mut buf = Vec::new();
|
||||
file.read_to_end(&mut buf).with_context(context)?;
|
||||
(buf, format.to_content_type(&name))
|
||||
}),
|
||||
Body::Empty => None,
|
||||
})
|
||||
}
|
||||
}
|
||||
|
||||
impl Display for Command {
|
||||
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
|
||||
write!(f, "{:?} {} ({:?})", self.method, self.route, self.synchronous)
|
||||
}
|
||||
}
|
||||
|
||||
#[derive(Default, Debug, Clone, Copy, Deserialize)]
|
||||
pub enum SyncMode {
|
||||
DontWait,
|
||||
#[default]
|
||||
WaitForResponse,
|
||||
WaitForTask,
|
||||
}
|
||||
|
||||
pub async fn run_batch(
|
||||
client: &Client,
|
||||
batch: &[Command],
|
||||
assets: &BTreeMap<String, Asset>,
|
||||
asset_folder: &str,
|
||||
) -> anyhow::Result<()> {
|
||||
let [.., last] = batch else { return Ok(()) };
|
||||
let sync = last.synchronous;
|
||||
|
||||
let mut tasks = tokio::task::JoinSet::new();
|
||||
|
||||
for command in batch {
|
||||
// FIXME: you probably don't want to copy assets every time here
|
||||
tasks.spawn({
|
||||
let client = client.clone();
|
||||
let command = command.clone();
|
||||
let assets = assets.clone();
|
||||
let asset_folder = asset_folder.to_owned();
|
||||
|
||||
async move { run(client, command, &assets, &asset_folder).await }
|
||||
});
|
||||
}
|
||||
|
||||
while let Some(result) = tasks.join_next().await {
|
||||
result
|
||||
.context("panicked while executing command")?
|
||||
.context("error while executing command")?;
|
||||
}
|
||||
|
||||
match sync {
|
||||
SyncMode::DontWait => {}
|
||||
SyncMode::WaitForResponse => {}
|
||||
SyncMode::WaitForTask => wait_for_tasks(client).await?,
|
||||
}
|
||||
|
||||
Ok(())
|
||||
}
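
For reference, the workload runner further down cuts its command list into batches with `split_inclusive` on every command that is not `DontWait`, then hands each batch to `run_batch` above, which runs the commands concurrently and only waits on the task queue when the batch's last command asks for it. A self-contained sketch of that batching rule (the local enum mirrors `SyncMode`; illustrative only):

// Each batch ends at the first command that is not DontWait, so a WaitForTask
// command closes its batch and gates everything that follows.
#[derive(Debug, PartialEq)]
enum SyncMode { DontWait, WaitForResponse, WaitForTask }

fn main() {
    use SyncMode::*;
    let commands = [DontWait, DontWait, WaitForTask, WaitForResponse, DontWait, WaitForTask];
    let batches: Vec<&[SyncMode]> = commands
        .split_inclusive(|sync| !matches!(sync, DontWait))
        .collect();

    assert_eq!(batches.len(), 3);
    assert_eq!(batches[0], &[DontWait, DontWait, WaitForTask][..]);
    assert_eq!(batches[1], &[WaitForResponse][..]);
    assert_eq!(batches[2], &[DontWait, WaitForTask][..]);
}
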
|
||||
|
||||
async fn wait_for_tasks(client: &Client) -> anyhow::Result<()> {
|
||||
loop {
|
||||
let response = client
|
||||
.get("tasks?statuses=enqueued,processing")
|
||||
.send()
|
||||
.await
|
||||
.context("could not wait for tasks")?;
|
||||
let response: serde_json::Value = response
|
||||
.json()
|
||||
.await
|
||||
.context("could not deserialize response to JSON")
|
||||
.context("could not wait for tasks")?;
|
||||
match response.get("total") {
|
||||
Some(serde_json::Value::Number(number)) => {
|
||||
let number = number.as_u64().with_context(|| {
|
||||
format!("waiting for tasks: could not parse 'total' as integer, got {}", number)
|
||||
})?;
|
||||
if number == 0 {
|
||||
break;
|
||||
} else {
|
||||
tokio::time::sleep(std::time::Duration::from_secs(1)).await;
|
||||
continue;
|
||||
}
|
||||
}
|
||||
Some(thing_else) => {
|
||||
bail!(format!(
|
||||
"waiting for tasks: could not parse 'total' as a number, got '{thing_else}'"
|
||||
))
|
||||
}
|
||||
None => {
|
||||
bail!(format!(
|
||||
"waiting for tasks: expected response to contain 'total', got '{response}'"
|
||||
))
|
||||
}
|
||||
}
|
||||
}
|
||||
Ok(())
|
||||
}
|
||||
|
||||
#[tracing::instrument(skip(client, command, assets, asset_folder), fields(command = %command))]
|
||||
pub async fn run(
|
||||
client: Client,
|
||||
mut command: Command,
|
||||
assets: &BTreeMap<String, Asset>,
|
||||
asset_folder: &str,
|
||||
) -> anyhow::Result<()> {
|
||||
// mem::take the body here to leave an empty body in its place, so that the command is not partially moved out
|
||||
let body = std::mem::take(&mut command.body)
|
||||
.get(assets, asset_folder)
|
||||
.with_context(|| format!("while getting body for command {command}"))?;
|
||||
|
||||
let request = client.request(command.method.into(), &command.route);
|
||||
|
||||
let request = if let Some((body, content_type)) = body {
|
||||
request.body(body).header(reqwest::header::CONTENT_TYPE, content_type)
|
||||
} else {
|
||||
request
|
||||
};
|
||||
|
||||
let response =
|
||||
request.send().await.with_context(|| format!("error sending command: {}", command))?;
|
||||
|
||||
let code = response.status();
|
||||
if code.is_client_error() {
|
||||
tracing::error!(%command, %code, "error in workload file");
|
||||
let response: serde_json::Value = response
|
||||
.json()
|
||||
.await
|
||||
.context("could not deserialize response as JSON")
|
||||
.context("parsing error in workload file when sending command")?;
|
||||
bail!("error in workload file: server responded with error code {code} and '{response}'")
|
||||
} else if code.is_server_error() {
|
||||
tracing::error!(%command, %code, "server error");
|
||||
let response: serde_json::Value = response
|
||||
.json()
|
||||
.await
|
||||
.context("could not deserialize response as JSON")
|
||||
.context("parsing server error when sending command")?;
|
||||
bail!("server error: server responded with error code {code} and '{response}'")
|
||||
}
|
||||
|
||||
Ok(())
|
||||
}
|
||||
@@ -7,9 +7,9 @@ use tokio::task::AbortHandle;
|
||||
use tracing_trace::processor::span_stats::CallStats;
|
||||
use uuid::Uuid;
|
||||
|
||||
use super::client::Client;
|
||||
use super::env_info;
|
||||
use super::workload::BenchWorkload;
|
||||
use crate::common::client::Client;
|
||||
use super::workload::Workload;
|
||||
|
||||
#[derive(Debug, Clone)]
|
||||
pub enum DashboardClient {
|
||||
@@ -89,7 +89,7 @@ impl DashboardClient {
|
||||
pub async fn create_workload(
|
||||
&self,
|
||||
invocation_uuid: Uuid,
|
||||
workload: &BenchWorkload,
|
||||
workload: &Workload,
|
||||
) -> anyhow::Result<Uuid> {
|
||||
let Self::Client(dashboard_client) = self else { return Ok(Uuid::now_v7()) };
|
||||
|
||||
|
||||
@@ -1,18 +1,18 @@
|
||||
use std::collections::{BTreeMap, HashMap};
|
||||
use std::collections::BTreeMap;
|
||||
use std::time::Duration;
|
||||
|
||||
use anyhow::{bail, Context as _};
|
||||
use tokio::process::Command as TokioCommand;
|
||||
use tokio::process::Command;
|
||||
use tokio::time;
|
||||
|
||||
use crate::common::client::Client;
|
||||
use crate::common::command::{health_command, run as run_command};
|
||||
use crate::common::instance::{Binary, BinarySource, Edition};
|
||||
use super::assets::Asset;
|
||||
use super::client::Client;
|
||||
use super::workload::Workload;
|
||||
|
||||
pub async fn kill_meili(mut meilisearch: tokio::process::Child) {
|
||||
pub async fn kill(mut meilisearch: tokio::process::Child) {
|
||||
let Some(id) = meilisearch.id() else { return };
|
||||
|
||||
match TokioCommand::new("kill").args(["--signal=TERM", &id.to_string()]).spawn() {
|
||||
match Command::new("kill").args(["--signal=TERM", &id.to_string()]).spawn() {
|
||||
Ok(mut cmd) => {
|
||||
let Err(error) = cmd.wait().await else { return };
|
||||
tracing::warn!(
|
||||
@@ -49,12 +49,9 @@ pub async fn kill_meili(mut meilisearch: tokio::process::Child) {
|
||||
}
|
||||
|
||||
#[tracing::instrument]
|
||||
async fn build(edition: Edition) -> anyhow::Result<()> {
|
||||
let mut command = TokioCommand::new("cargo");
|
||||
pub async fn build() -> anyhow::Result<()> {
|
||||
let mut command = Command::new("cargo");
|
||||
command.arg("build").arg("--release").arg("-p").arg("meilisearch");
|
||||
if let Edition::Enterprise = edition {
|
||||
command.arg("--features=enterprise");
|
||||
}
|
||||
|
||||
command.kill_on_drop(true);
|
||||
|
||||
@@ -67,68 +64,29 @@ async fn build(edition: Edition) -> anyhow::Result<()> {
|
||||
Ok(())
|
||||
}
|
||||
|
||||
#[tracing::instrument(skip(client, master_key))]
|
||||
pub async fn start_meili(
|
||||
#[tracing::instrument(skip(client, master_key, workload), fields(workload = workload.name))]
|
||||
pub async fn start(
|
||||
client: &Client,
|
||||
master_key: Option<&str>,
|
||||
binary: &Binary,
|
||||
workload: &Workload,
|
||||
asset_folder: &str,
|
||||
mut command: Command,
|
||||
) -> anyhow::Result<tokio::process::Child> {
|
||||
let mut command = match &binary.source {
|
||||
BinarySource::Build { edition } => {
|
||||
build(*edition).await?;
|
||||
let mut command = tokio::process::Command::new("cargo");
|
||||
|
||||
command
|
||||
.arg("run")
|
||||
.arg("--release")
|
||||
.arg("-p")
|
||||
.arg("meilisearch")
|
||||
.arg("--bin")
|
||||
.arg("meilisearch");
|
||||
if let Edition::Enterprise = *edition {
|
||||
command.arg("--features=enterprise");
|
||||
}
|
||||
command.arg("--");
|
||||
command
|
||||
}
|
||||
BinarySource::Release(release) => {
|
||||
let binary_path = release.binary_path(asset_folder)?;
|
||||
tokio::process::Command::new(binary_path)
|
||||
}
|
||||
BinarySource::Path(binary_path) => tokio::process::Command::new(binary_path),
|
||||
};
|
||||
|
||||
command.arg("--db-path").arg("./_xtask_benchmark.ms");
|
||||
if let Some(master_key) = master_key {
|
||||
command.arg("--master-key").arg(master_key);
|
||||
}
|
||||
command.arg("--experimental-enable-logs-route");
|
||||
|
||||
for extra_arg in binary.extra_cli_args.iter() {
|
||||
for extra_arg in workload.extra_cli_args.iter() {
|
||||
command.arg(extra_arg);
|
||||
}
|
||||
|
||||
command.kill_on_drop(true);
|
||||
|
||||
#[cfg(unix)]
|
||||
{
|
||||
use std::os::unix::fs::PermissionsExt;
|
||||
if let Some(binary_path) = binary.binary_path(asset_folder)? {
|
||||
let mut perms = tokio::fs::metadata(&binary_path)
|
||||
.await
|
||||
.with_context(|| format!("could not get metadata for {binary_path:?}"))?
|
||||
.permissions();
|
||||
perms.set_mode(perms.mode() | 0o111);
|
||||
tokio::fs::set_permissions(&binary_path, perms)
|
||||
.await
|
||||
.with_context(|| format!("could not set permissions for {binary_path:?}"))?;
|
||||
}
|
||||
}
|
||||
|
||||
let mut meilisearch = command.spawn().context("Error starting Meilisearch")?;
|
||||
|
||||
wait_for_health(client, &mut meilisearch).await?;
|
||||
wait_for_health(client, &mut meilisearch, &workload.assets, asset_folder).await?;
|
||||
|
||||
Ok(meilisearch)
|
||||
}
|
||||
@@ -136,11 +94,11 @@ pub async fn start_meili(
|
||||
async fn wait_for_health(
|
||||
client: &Client,
|
||||
meilisearch: &mut tokio::process::Child,
|
||||
assets: &BTreeMap<String, Asset>,
|
||||
asset_folder: &str,
|
||||
) -> anyhow::Result<()> {
|
||||
for i in 0..100 {
|
||||
let res =
|
||||
run_command(client, &health_command(), 0, &BTreeMap::new(), HashMap::new(), "", false)
|
||||
.await;
|
||||
let res = super::command::run(client.clone(), health_command(), assets, asset_folder).await;
|
||||
if res.is_ok() {
|
||||
// check that this is actually the current Meilisearch instance that answered us
|
||||
if let Some(exit_code) =
|
||||
@@ -164,6 +122,15 @@ async fn wait_for_health(
|
||||
bail!("meilisearch is not responding")
|
||||
}
|
||||
|
||||
pub async fn delete_db() {
|
||||
let _ = tokio::fs::remove_dir_all("./_xtask_benchmark.ms").await;
|
||||
fn health_command() -> super::command::Command {
|
||||
super::command::Command {
|
||||
route: "/health".into(),
|
||||
method: super::client::Method::Get,
|
||||
body: Default::default(),
|
||||
synchronous: super::command::SyncMode::WaitForResponse,
|
||||
}
|
||||
}
|
||||
|
||||
pub fn delete_db() {
|
||||
let _ = std::fs::remove_dir_all("./_xtask_benchmark.ms");
|
||||
}
|
||||
@@ -1,36 +1,51 @@
|
||||
mod assets;
|
||||
mod client;
|
||||
mod command;
|
||||
mod dashboard;
|
||||
mod env_info;
|
||||
mod meili_process;
|
||||
mod workload;
|
||||
|
||||
use crate::common::args::CommonArgs;
|
||||
use crate::common::logs::setup_logs;
|
||||
use crate::common::workload::Workload;
|
||||
use std::{path::PathBuf, sync::Arc};
|
||||
use std::io::LineWriter;
|
||||
use std::path::PathBuf;
|
||||
|
||||
use anyhow::{bail, Context};
|
||||
use anyhow::Context;
|
||||
use clap::Parser;
|
||||
use tracing_subscriber::fmt::format::FmtSpan;
|
||||
use tracing_subscriber::layer::SubscriberExt;
|
||||
use tracing_subscriber::Layer;
|
||||
|
||||
use crate::common::client::Client;
|
||||
pub use workload::BenchWorkload;
|
||||
use self::client::Client;
|
||||
use self::workload::Workload;
|
||||
|
||||
pub fn default_http_addr() -> String {
|
||||
"127.0.0.1:7700".to_string()
|
||||
}
|
||||
pub fn default_report_folder() -> String {
|
||||
"./bench/reports/".into()
|
||||
}
|
||||
|
||||
pub fn default_asset_folder() -> String {
|
||||
"./bench/assets/".into()
|
||||
}
|
||||
|
||||
pub fn default_log_filter() -> String {
|
||||
"info".into()
|
||||
}
|
||||
|
||||
pub fn default_dashboard_url() -> String {
|
||||
"http://localhost:9001".into()
|
||||
}
|
||||
|
||||
/// Run benchmarks from a workload
|
||||
#[derive(Parser, Debug)]
|
||||
pub struct BenchArgs {
|
||||
/// Common arguments shared with other commands
|
||||
#[command(flatten)]
|
||||
common: CommonArgs,
|
||||
|
||||
/// Meilisearch master keys
|
||||
#[arg(long)]
|
||||
pub master_key: Option<String>,
|
||||
pub struct BenchDeriveArgs {
|
||||
/// Filename of the workload file, pass multiple filenames
|
||||
/// to run multiple workloads in the specified order.
|
||||
///
|
||||
/// Each workload run will get its own report file.
|
||||
#[arg(value_name = "WORKLOAD_FILE", last = false)]
|
||||
workload_file: Vec<PathBuf>,
|
||||
|
||||
/// URL of the dashboard.
|
||||
#[arg(long, default_value_t = default_dashboard_url())]
|
||||
@@ -44,14 +59,34 @@ pub struct BenchArgs {
|
||||
#[arg(long, default_value_t = default_report_folder())]
|
||||
report_folder: String,
|
||||
|
||||
/// Directory to store the remote assets.
|
||||
#[arg(long, default_value_t = default_asset_folder())]
|
||||
asset_folder: String,
|
||||
|
||||
/// Log directives
|
||||
#[arg(short, long, default_value_t = default_log_filter())]
|
||||
log_filter: String,
|
||||
|
||||
/// Benchmark dashboard API key
|
||||
#[arg(long)]
|
||||
api_key: Option<String>,
|
||||
|
||||
/// Meilisearch master keys
|
||||
#[arg(long)]
|
||||
master_key: Option<String>,
|
||||
|
||||
/// Authentication bearer for fetching assets
|
||||
#[arg(long)]
|
||||
assets_key: Option<String>,
|
||||
|
||||
/// Reason for the benchmark invocation
|
||||
#[arg(short, long)]
|
||||
reason: Option<String>,
|
||||
|
||||
/// The maximum time in seconds we allow for fetching the task queue before timing out.
|
||||
#[arg(long, default_value_t = 60)]
|
||||
tasks_queue_timeout_secs: u64,
|
||||
|
||||
/// The path to the binary to run.
|
||||
///
|
||||
/// If unspecified, runs `cargo run` after building Meilisearch with `cargo build`.
|
||||
@@ -59,8 +94,18 @@ pub struct BenchArgs {
|
||||
binary_path: Option<PathBuf>,
|
||||
}
|
||||
|
||||
pub fn run(args: BenchArgs) -> anyhow::Result<()> {
|
||||
setup_logs(&args.common.log_filter)?;
|
||||
pub fn run(args: BenchDeriveArgs) -> anyhow::Result<()> {
|
||||
// setup logs
|
||||
let filter: tracing_subscriber::filter::Targets =
|
||||
args.log_filter.parse().context("invalid --log-filter")?;
|
||||
|
||||
let subscriber = tracing_subscriber::registry().with(
|
||||
tracing_subscriber::fmt::layer()
|
||||
.with_writer(|| LineWriter::new(std::io::stderr()))
|
||||
.with_span_events(FmtSpan::NEW | FmtSpan::CLOSE)
|
||||
.with_filter(filter),
|
||||
);
|
||||
tracing::subscriber::set_global_default(subscriber).context("could not setup logging")?;
|
||||
|
||||
// fetch environment and build info
|
||||
let env = env_info::Environment::generate_from_current_config();
|
||||
@@ -71,11 +116,8 @@ pub fn run(args: BenchArgs) -> anyhow::Result<()> {
|
||||
let _scope = rt.enter();
|
||||
|
||||
// setup clients
|
||||
let assets_client = Client::new(
|
||||
None,
|
||||
args.common.assets_key.as_deref(),
|
||||
Some(std::time::Duration::from_secs(3600)), // 1h
|
||||
)?;
|
||||
let assets_client =
|
||||
Client::new(None, args.assets_key.as_deref(), Some(std::time::Duration::from_secs(3600)))?; // 1h
|
||||
|
||||
let dashboard_client = if args.no_dashboard {
|
||||
dashboard::DashboardClient::new_dry()
|
||||
@@ -92,11 +134,11 @@ pub fn run(args: BenchArgs) -> anyhow::Result<()> {
|
||||
None,
|
||||
)?;
|
||||
|
||||
let meili_client = Arc::new(Client::new(
|
||||
let meili_client = Client::new(
|
||||
Some("http://127.0.0.1:7700".into()),
|
||||
args.master_key.as_deref(),
|
||||
Some(std::time::Duration::from_secs(args.common.tasks_queue_timeout_secs)),
|
||||
)?);
|
||||
Some(std::time::Duration::from_secs(args.tasks_queue_timeout_secs)),
|
||||
)?;
|
||||
|
||||
// enter runtime
|
||||
|
||||
@@ -104,11 +146,11 @@ pub fn run(args: BenchArgs) -> anyhow::Result<()> {
|
||||
dashboard_client.send_machine_info(&env).await?;
|
||||
|
||||
let commit_message = build_info.commit_msg.unwrap_or_default().split('\n').next().unwrap();
|
||||
let max_workloads = args.common.workload_file.len();
|
||||
let max_workloads = args.workload_file.len();
|
||||
let reason: Option<&str> = args.reason.as_deref();
|
||||
let invocation_uuid = dashboard_client.create_invocation(build_info.clone(), commit_message, env, max_workloads, reason).await?;
|
||||
|
||||
tracing::info!(workload_count = args.common.workload_file.len(), "handling workload files");
|
||||
tracing::info!(workload_count = args.workload_file.len(), "handling workload files");
|
||||
|
||||
// main task
|
||||
let workload_runs = tokio::spawn(
|
||||
@@ -116,17 +158,13 @@ pub fn run(args: BenchArgs) -> anyhow::Result<()> {
|
||||
let dashboard_client = dashboard_client.clone();
|
||||
let mut dashboard_urls = Vec::new();
|
||||
async move {
|
||||
for workload_file in args.common.workload_file.iter() {
|
||||
for workload_file in args.workload_file.iter() {
|
||||
let workload: Workload = serde_json::from_reader(
|
||||
std::fs::File::open(workload_file)
|
||||
.with_context(|| format!("error opening {}", workload_file.display()))?,
|
||||
)
|
||||
.with_context(|| format!("error parsing {} as JSON", workload_file.display()))?;
|
||||
|
||||
let Workload::Bench(workload) = workload else {
|
||||
bail!("workload file {} is not a bench workload", workload_file.display());
|
||||
};
|
||||
|
||||
let workload_name = workload.name.clone();
|
||||
|
||||
workload::execute(
|
||||
|
||||
@@ -1,28 +1,24 @@
|
||||
use std::collections::{BTreeMap, HashMap};
|
||||
use std::collections::BTreeMap;
|
||||
use std::fs::File;
|
||||
use std::io::{Seek as _, Write as _};
|
||||
use std::path::Path;
|
||||
use std::sync::Arc;
|
||||
|
||||
use anyhow::{bail, Context as _};
|
||||
use futures_util::TryStreamExt as _;
|
||||
use serde::{Deserialize, Serialize};
|
||||
use serde::Deserialize;
|
||||
use serde_json::json;
|
||||
use tokio::task::JoinHandle;
|
||||
use uuid::Uuid;
|
||||
|
||||
use super::assets::Asset;
|
||||
use super::client::Client;
|
||||
use super::command::SyncMode;
|
||||
use super::dashboard::DashboardClient;
|
||||
use super::BenchArgs;
|
||||
use crate::common::assets::{self, Asset};
|
||||
use crate::common::client::Client;
|
||||
use crate::common::command::{run_commands, Command};
|
||||
use crate::common::instance::Binary;
|
||||
use crate::common::process::{self, delete_db, start_meili};
|
||||
use super::BenchDeriveArgs;
|
||||
use crate::bench::{assets, meili_process};
|
||||
|
||||
/// A bench workload.
|
||||
/// Not to be confused with [a test workload](crate::test::workload::Workload).
|
||||
#[derive(Serialize, Deserialize, Debug)]
|
||||
pub struct BenchWorkload {
|
||||
#[derive(Deserialize)]
|
||||
pub struct Workload {
|
||||
pub name: String,
|
||||
pub run_count: u16,
|
||||
pub extra_cli_args: Vec<String>,
|
||||
@@ -30,34 +26,30 @@ pub struct BenchWorkload {
|
||||
#[serde(default)]
|
||||
pub target: String,
|
||||
#[serde(default)]
|
||||
pub precommands: Vec<Command>,
|
||||
pub commands: Vec<Command>,
|
||||
pub precommands: Vec<super::command::Command>,
|
||||
pub commands: Vec<super::command::Command>,
|
||||
}
|
||||
|
||||
async fn run_workload_commands(
|
||||
async fn run_commands(
|
||||
dashboard_client: &DashboardClient,
|
||||
logs_client: &Client,
|
||||
meili_client: &Arc<Client>,
|
||||
meili_client: &Client,
|
||||
workload_uuid: Uuid,
|
||||
workload: &BenchWorkload,
|
||||
args: &BenchArgs,
|
||||
workload: &Workload,
|
||||
args: &BenchDeriveArgs,
|
||||
run_number: u16,
|
||||
) -> anyhow::Result<JoinHandle<anyhow::Result<File>>> {
|
||||
let report_folder = &args.report_folder;
|
||||
let workload_name = &workload.name;
|
||||
let assets = Arc::new(workload.assets.clone());
|
||||
let asset_folder = args.common.asset_folder.clone().leak();
|
||||
|
||||
run_commands(
|
||||
meili_client,
|
||||
&workload.precommands,
|
||||
0,
|
||||
&assets,
|
||||
asset_folder,
|
||||
&mut HashMap::new(),
|
||||
false,
|
||||
)
|
||||
.await?;
|
||||
for batch in workload
|
||||
.precommands
|
||||
.as_slice()
|
||||
.split_inclusive(|command| !matches!(command.synchronous, SyncMode::DontWait))
|
||||
{
|
||||
super::command::run_batch(meili_client, batch, &workload.assets, &args.asset_folder)
|
||||
.await?;
|
||||
}
|
||||
|
||||
std::fs::create_dir_all(report_folder)
|
||||
.with_context(|| format!("could not create report directory at {report_folder}"))?;
|
||||
@@ -67,16 +59,14 @@ async fn run_workload_commands(
|
||||
|
||||
let report_handle = start_report(logs_client, trace_filename, &workload.target).await?;
|
||||
|
||||
run_commands(
|
||||
meili_client,
|
||||
&workload.commands,
|
||||
0,
|
||||
&assets,
|
||||
asset_folder,
|
||||
&mut HashMap::new(),
|
||||
false,
|
||||
)
|
||||
.await?;
|
||||
for batch in workload
|
||||
.commands
|
||||
.as_slice()
|
||||
.split_inclusive(|command| !matches!(command.synchronous, SyncMode::DontWait))
|
||||
{
|
||||
super::command::run_batch(meili_client, batch, &workload.assets, &args.asset_folder)
|
||||
.await?;
|
||||
}
|
||||
|
||||
let processor =
|
||||
stop_report(dashboard_client, logs_client, workload_uuid, report_filename, report_handle)
|
||||
@@ -91,14 +81,14 @@ pub async fn execute(
|
||||
assets_client: &Client,
|
||||
dashboard_client: &DashboardClient,
|
||||
logs_client: &Client,
|
||||
meili_client: &Arc<Client>,
|
||||
meili_client: &Client,
|
||||
invocation_uuid: Uuid,
|
||||
master_key: Option<&str>,
|
||||
workload: BenchWorkload,
|
||||
args: &BenchArgs,
|
||||
workload: Workload,
|
||||
args: &BenchDeriveArgs,
|
||||
binary_path: Option<&Path>,
|
||||
) -> anyhow::Result<()> {
|
||||
assets::fetch_assets(assets_client, &workload.assets, &args.common.asset_folder).await?;
|
||||
assets::fetch_assets(assets_client, &workload.assets, &args.asset_folder).await?;
|
||||
|
||||
let workload_uuid = dashboard_client.create_workload(invocation_uuid, &workload).await?;
|
||||
|
||||
@@ -139,33 +129,38 @@ pub async fn execute(
|
||||
async fn execute_run(
|
||||
dashboard_client: &DashboardClient,
|
||||
logs_client: &Client,
|
||||
meili_client: &Arc<Client>,
|
||||
meili_client: &Client,
|
||||
workload_uuid: Uuid,
|
||||
master_key: Option<&str>,
|
||||
workload: &BenchWorkload,
|
||||
args: &BenchArgs,
|
||||
workload: &Workload,
|
||||
args: &BenchDeriveArgs,
|
||||
binary_path: Option<&Path>,
|
||||
run_number: u16,
|
||||
) -> anyhow::Result<tokio::task::JoinHandle<anyhow::Result<std::fs::File>>> {
|
||||
delete_db().await;
|
||||
meili_process::delete_db();
|
||||
|
||||
let binary = match binary_path {
|
||||
Some(binary_path) => Binary {
|
||||
source: crate::common::instance::BinarySource::Path(binary_path.to_owned()),
|
||||
extra_cli_args: workload.extra_cli_args.clone(),
|
||||
},
|
||||
None => Binary {
|
||||
source: crate::common::instance::BinarySource::Build {
|
||||
edition: crate::common::instance::Edition::Community,
|
||||
},
|
||||
extra_cli_args: workload.extra_cli_args.clone(),
|
||||
},
|
||||
let run_command = match binary_path {
|
||||
Some(binary_path) => tokio::process::Command::new(binary_path),
|
||||
None => {
|
||||
meili_process::build().await?;
|
||||
let mut command = tokio::process::Command::new("cargo");
|
||||
command
|
||||
.arg("run")
|
||||
.arg("--release")
|
||||
.arg("-p")
|
||||
.arg("meilisearch")
|
||||
.arg("--bin")
|
||||
.arg("meilisearch")
|
||||
.arg("--");
|
||||
command
|
||||
}
|
||||
};
|
||||
|
||||
let meilisearch =
|
||||
start_meili(meili_client, master_key, &binary, &args.common.asset_folder).await?;
|
||||
meili_process::start(meili_client, master_key, workload, &args.asset_folder, run_command)
|
||||
.await?;
|
||||
|
||||
let processor = run_workload_commands(
|
||||
let processor = run_commands(
|
||||
dashboard_client,
|
||||
logs_client,
|
||||
meili_client,
|
||||
@@ -176,7 +171,7 @@ async fn execute_run(
|
||||
)
|
||||
.await?;
|
||||
|
||||
process::kill_meili(meilisearch).await;
|
||||
meili_process::kill(meilisearch).await;
|
||||
|
||||
tracing::info!(run_number, "Successful run");
|
||||
|
||||
|
||||
@@ -1,36 +0,0 @@
|
||||
use clap::Parser;
|
||||
use std::path::PathBuf;
|
||||
|
||||
pub fn default_asset_folder() -> String {
|
||||
"./bench/assets/".into()
|
||||
}
|
||||
|
||||
pub fn default_log_filter() -> String {
|
||||
"info".into()
|
||||
}
|
||||
|
||||
#[derive(Parser, Debug, Clone)]
|
||||
pub struct CommonArgs {
|
||||
/// Filename of the workload file, pass multiple filenames
|
||||
/// to run multiple workloads in the specified order.
|
||||
///
|
||||
/// For benches, each workload run will get its own report file.
|
||||
#[arg(value_name = "WORKLOAD_FILE", last = false)]
|
||||
pub workload_file: Vec<PathBuf>,
|
||||
|
||||
/// Directory to store the remote assets.
|
||||
#[arg(long, default_value_t = default_asset_folder())]
|
||||
pub asset_folder: String,
|
||||
|
||||
/// Log directives
|
||||
#[arg(short, long, default_value_t = default_log_filter())]
|
||||
pub log_filter: String,
|
||||
|
||||
/// Authentication bearer for fetching assets
|
||||
#[arg(long)]
|
||||
pub assets_key: Option<String>,
|
||||
|
||||
/// The maximum time in seconds we allow for fetching the task queue before timing out.
|
||||
#[arg(long, default_value_t = 60)]
|
||||
pub tasks_queue_timeout_secs: u64,
|
||||
}
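These shared flags are flattened into the workload-running xtask subcommands. As a rough sketch only (the workload path is hypothetical and the flag values shown are simply the defaults declared in `CommonArgs` above), an invocation exercising them could look like:

```bash
# Rough sketch, not a prescribed invocation: the workload path is hypothetical
# and the values passed to the flags are just the defaults declared in CommonArgs.
cargo xtask test workloads/tests/my_test.json \
    --asset-folder ./bench/assets/ \
    --log-filter info \
    --tasks-queue-timeout-secs 60
```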
|
||||
@@ -1,430 +0,0 @@
|
||||
use std::collections::{BTreeMap, HashMap};
|
||||
use std::fmt::Display;
|
||||
use std::io::Read as _;
|
||||
use std::sync::Arc;
|
||||
|
||||
use anyhow::{bail, Context as _};
|
||||
use reqwest::StatusCode;
|
||||
use serde::{Deserialize, Serialize};
|
||||
use serde_json::Value;
|
||||
use similar_asserts::SimpleDiff;
|
||||
|
||||
use crate::common::assets::{fetch_asset, Asset};
|
||||
use crate::common::client::{Client, Method};
|
||||
|
||||
#[derive(Serialize, Deserialize, Clone, Debug)]
|
||||
#[serde(rename_all = "camelCase", deny_unknown_fields)]
|
||||
pub struct Command {
|
||||
pub route: String,
|
||||
pub method: Method,
|
||||
#[serde(default)]
|
||||
pub body: Body,
|
||||
#[serde(default, skip_serializing_if = "Option::is_none")]
|
||||
pub expected_status: Option<u16>,
|
||||
#[serde(default, skip_serializing_if = "Option::is_none")]
|
||||
pub expected_response: Option<serde_json::Value>,
|
||||
#[serde(default, skip_serializing_if = "HashMap::is_empty")]
|
||||
pub register: HashMap<String, String>,
|
||||
#[serde(default, skip_serializing_if = "Option::is_none")]
|
||||
pub api_key_variable: Option<String>,
|
||||
#[serde(default)]
|
||||
pub synchronous: SyncMode,
|
||||
}
|
||||
|
||||
#[derive(Default, Clone, Serialize, Deserialize, Debug)]
|
||||
#[serde(untagged)]
|
||||
pub enum Body {
|
||||
Inline {
|
||||
inline: serde_json::Value,
|
||||
},
|
||||
Asset {
|
||||
asset: String,
|
||||
},
|
||||
#[default]
|
||||
Empty,
|
||||
}
|
||||
|
||||
impl Body {
|
||||
pub fn get(
|
||||
self,
|
||||
assets: &BTreeMap<String, Asset>,
|
||||
registered: &HashMap<String, Value>,
|
||||
asset_folder: &str,
|
||||
) -> anyhow::Result<Option<(Vec<u8>, &'static str)>> {
|
||||
Ok(match self {
|
||||
Body::Inline { inline: mut body } => {
|
||||
if !registered.is_empty() {
|
||||
insert_variables(&mut body, registered);
|
||||
}
|
||||
|
||||
Some((
|
||||
serde_json::to_vec(&body)
|
||||
.context("serializing to bytes")
|
||||
.context("while getting inline body")?,
|
||||
"application/json",
|
||||
))
|
||||
}
|
||||
Body::Asset { asset: name } => Some({
|
||||
let context = || format!("while getting body from asset '{name}'");
|
||||
let (mut file, format) =
|
||||
fetch_asset(&name, assets, asset_folder).with_context(context)?;
|
||||
let mut buf = Vec::new();
|
||||
file.read_to_end(&mut buf).with_context(context)?;
|
||||
(buf, format.to_content_type(&name))
|
||||
}),
|
||||
Body::Empty => None,
|
||||
})
|
||||
}
|
||||
}
|
||||
|
||||
impl Display for Command {
|
||||
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
|
||||
write!(f, "{:?} {} ({:?})", self.method, self.route, self.synchronous)
|
||||
}
|
||||
}
|
||||
|
||||
#[derive(Default, Debug, Clone, Copy, Serialize, Deserialize, PartialEq)]
|
||||
pub enum SyncMode {
|
||||
DontWait,
|
||||
#[default]
|
||||
WaitForResponse,
|
||||
WaitForTask,
|
||||
}
|
||||
|
||||
async fn run_batch(
|
||||
client: &Arc<Client>,
|
||||
batch: &[Command],
|
||||
first_command_index: usize,
|
||||
assets: &Arc<BTreeMap<String, Asset>>,
|
||||
asset_folder: &'static str,
|
||||
registered: &mut HashMap<String, Value>,
|
||||
return_response: bool,
|
||||
) -> anyhow::Result<Vec<(Value, StatusCode)>> {
|
||||
let [.., last] = batch else { return Ok(Vec::new()) };
|
||||
let sync = last.synchronous;
|
||||
let batch_len = batch.len();
|
||||
|
||||
let mut tasks = Vec::with_capacity(batch.len());
|
||||
for (index, command) in batch.iter().cloned().enumerate() {
|
||||
let client2 = Arc::clone(client);
|
||||
let assets2 = Arc::clone(assets);
|
||||
let needs_response = return_response || !command.register.is_empty();
|
||||
let registered2 = registered.clone(); // FIXME: cloning the whole map for each command is inefficient
|
||||
tasks.push(tokio::spawn(async move {
|
||||
run(
|
||||
&client2,
|
||||
&command,
|
||||
first_command_index + index,
|
||||
&assets2,
|
||||
registered2,
|
||||
asset_folder,
|
||||
needs_response,
|
||||
)
|
||||
.await
|
||||
}));
|
||||
}
|
||||
|
||||
let mut outputs = Vec::with_capacity(if return_response { batch_len } else { 0 });
|
||||
for (task, command) in tasks.into_iter().zip(batch.iter()) {
|
||||
let output = task.await.context("task panicked")??;
|
||||
if let Some(output) = output {
|
||||
for (name, path) in &command.register {
|
||||
let value = output
|
||||
.0
|
||||
.pointer(path)
|
||||
.with_context(|| format!("could not find path '{path}' in response (required to register '{name}')"))?
|
||||
.clone();
|
||||
registered.insert(name.clone(), value);
|
||||
}
|
||||
|
||||
if return_response {
|
||||
outputs.push(output);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
match sync {
|
||||
SyncMode::DontWait => {}
|
||||
SyncMode::WaitForResponse => {}
|
||||
SyncMode::WaitForTask => wait_for_tasks(client).await?,
|
||||
}
|
||||
|
||||
Ok(outputs)
|
||||
}
|
||||
|
||||
async fn wait_for_tasks(client: &Client) -> anyhow::Result<()> {
|
||||
loop {
|
||||
let response = client
|
||||
.get("tasks?statuses=enqueued,processing")
|
||||
.send()
|
||||
.await
|
||||
.context("could not wait for tasks")?;
|
||||
let response: serde_json::Value = response
|
||||
.json()
|
||||
.await
|
||||
.context("could not deserialize response to JSON")
|
||||
.context("could not wait for tasks")?;
|
||||
match response.get("total") {
|
||||
Some(serde_json::Value::Number(number)) => {
|
||||
let number = number.as_u64().with_context(|| {
|
||||
format!("waiting for tasks: could not parse 'total' as integer, got {}", number)
|
||||
})?;
|
||||
if number == 0 {
|
||||
break;
|
||||
} else {
|
||||
tokio::time::sleep(std::time::Duration::from_secs(1)).await;
|
||||
continue;
|
||||
}
|
||||
}
|
||||
Some(thing_else) => {
|
||||
bail!(format!(
|
||||
"waiting for tasks: could not parse 'total' as a number, got '{thing_else}'"
|
||||
))
|
||||
}
|
||||
None => {
|
||||
bail!(format!(
|
||||
"waiting for tasks: expected response to contain 'total', got '{response}'"
|
||||
))
|
||||
}
|
||||
}
|
||||
}
|
||||
Ok(())
|
||||
}
|
||||
|
||||
fn json_eq_ignore(reference: &Value, value: &Value) -> bool {
|
||||
match reference {
|
||||
Value::Null | Value::Bool(_) | Value::Number(_) => reference == value,
|
||||
Value::String(s) => (s.starts_with('[') && s.ends_with(']')) || reference == value,
|
||||
Value::Array(values) => match value {
|
||||
Value::Array(other_values) => {
|
||||
if values.len() != other_values.len() {
|
||||
return false;
|
||||
}
|
||||
for (value, other_value) in values.iter().zip(other_values.iter()) {
|
||||
if !json_eq_ignore(value, other_value) {
|
||||
return false;
|
||||
}
|
||||
}
|
||||
true
|
||||
}
|
||||
_ => false,
|
||||
},
|
||||
Value::Object(map) => match value {
|
||||
Value::Object(other_map) => {
|
||||
if map.len() != other_map.len() {
|
||||
return false;
|
||||
}
|
||||
for (key, value) in map.iter() {
|
||||
match other_map.get(key) {
|
||||
Some(other_value) => {
|
||||
if !json_eq_ignore(value, other_value) {
|
||||
return false;
|
||||
}
|
||||
}
|
||||
None => return false,
|
||||
}
|
||||
}
|
||||
true
|
||||
}
|
||||
_ => false,
|
||||
},
|
||||
}
|
||||
}
|
||||
|
||||
#[tracing::instrument(skip(client, command, assets, registered, asset_folder), fields(command = %command))]
|
||||
pub async fn run(
|
||||
client: &Client,
|
||||
command: &Command,
|
||||
command_index: usize,
|
||||
assets: &BTreeMap<String, Asset>,
|
||||
registered: HashMap<String, Value>,
|
||||
asset_folder: &str,
|
||||
return_value: bool,
|
||||
) -> anyhow::Result<Option<(Value, StatusCode)>> {
|
||||
// Try to replace variables in the route
|
||||
let mut route = &command.route;
|
||||
let mut owned_route;
|
||||
if !registered.is_empty() {
|
||||
while let (Some(pos1), Some(pos2)) = (route.find("{{"), route.rfind("}}")) {
|
||||
if pos2 > pos1 {
|
||||
let name = route[pos1 + 2..pos2].trim();
|
||||
if let Some(replacement) = registered.get(name).and_then(|r| r.as_str()) {
|
||||
let mut new_route = String::new();
|
||||
new_route.push_str(&route[..pos1]);
|
||||
new_route.push_str(replacement);
|
||||
new_route.push_str(&route[pos2 + 2..]);
|
||||
owned_route = new_route;
|
||||
route = &owned_route;
|
||||
continue;
|
||||
}
|
||||
}
|
||||
break;
|
||||
}
|
||||
}
|
||||
|
||||
// clone the body here so that the command is not partially moved out of
|
||||
let body = command
|
||||
.body
|
||||
.clone()
|
||||
.get(assets, ®istered, asset_folder)
|
||||
.with_context(|| format!("while getting body for command {command}"))?;
|
||||
|
||||
let mut request = client.request(command.method.into(), route);
|
||||
|
||||
// Replace the api key
|
||||
if let Some(var_name) = &command.api_key_variable {
|
||||
if let Some(api_key) = registered.get(var_name).and_then(|v| v.as_str()) {
|
||||
request = request.header("Authorization", format!("Bearer {api_key}"));
|
||||
} else {
|
||||
bail!("could not find API key variable '{var_name}' in registered values");
|
||||
}
|
||||
}
|
||||
|
||||
let request = if let Some((body, content_type)) = body {
|
||||
request.body(body).header(reqwest::header::CONTENT_TYPE, content_type)
|
||||
} else {
|
||||
request
|
||||
};
|
||||
|
||||
let response =
|
||||
request.send().await.with_context(|| format!("error sending command: {}", command))?;
|
||||
|
||||
let code = response.status();
|
||||
|
||||
if !return_value {
|
||||
if let Some(expected_status) = command.expected_status {
|
||||
if code.as_u16() != expected_status {
|
||||
let response = response
|
||||
.text()
|
||||
.await
|
||||
.context("could not read response body as text")
|
||||
.context("reading response body when checking expected status")?;
|
||||
bail!("unexpected status code: got {}, expected {expected_status}, response body: '{response}'", code.as_u16());
|
||||
}
|
||||
} else if code.is_client_error() {
|
||||
tracing::error!(%command, %code, "error in workload file");
|
||||
let response: serde_json::Value = response
|
||||
.json()
|
||||
.await
|
||||
.context("could not deserialize response as JSON")
|
||||
.context("parsing error in workload file when sending command")?;
|
||||
bail!(
|
||||
"error in workload file: server responded with error code {code} and '{response}'"
|
||||
)
|
||||
} else if code.is_server_error() {
|
||||
tracing::error!(%command, %code, "server error");
|
||||
let response: serde_json::Value = response
|
||||
.json()
|
||||
.await
|
||||
.context("could not deserialize response as JSON")
|
||||
.context("parsing server error when sending command")?;
|
||||
bail!("server error: server responded with error code {code} and '{response}'")
|
||||
}
|
||||
}
|
||||
|
||||
if let Some(expected_response) = &command.expected_response {
|
||||
let mut evaluated_expected_response;
|
||||
|
||||
let expected_response = if !registered.is_empty() {
|
||||
evaluated_expected_response = expected_response.clone();
|
||||
insert_variables(&mut evaluated_expected_response, ®istered);
|
||||
&evaluated_expected_response
|
||||
} else {
|
||||
expected_response
|
||||
};
|
||||
|
||||
let response: serde_json::Value = response
|
||||
.json()
|
||||
.await
|
||||
.context("could not deserialize response as JSON")
|
||||
.context("parsing response when checking expected response")?;
|
||||
if return_value {
|
||||
return Ok(Some((response, code)));
|
||||
}
|
||||
if !json_eq_ignore(expected_response, &response) {
|
||||
let expected_pretty = serde_json::to_string_pretty(expected_response)
|
||||
.context("serializing expected response as pretty JSON")?;
|
||||
let response_pretty = serde_json::to_string_pretty(&response)
|
||||
.context("serializing response as pretty JSON")?;
|
||||
let diff = SimpleDiff::from_str(&expected_pretty, &response_pretty, "expected", "got");
|
||||
bail!("command #{command_index} unexpected response:\n{diff}");
|
||||
}
|
||||
} else if return_value {
|
||||
let response: serde_json::Value = response
|
||||
.json()
|
||||
.await
|
||||
.context("could not deserialize response as JSON")
|
||||
.context("parsing response when recording expected response")?;
|
||||
return Ok(Some((response, code)));
|
||||
}
|
||||
|
||||
Ok(None)
|
||||
}
|
||||
|
||||
pub async fn run_commands(
|
||||
client: &Arc<Client>,
|
||||
commands: &[Command],
|
||||
mut first_command_index: usize,
|
||||
assets: &Arc<BTreeMap<String, Asset>>,
|
||||
asset_folder: &'static str,
|
||||
registered: &mut HashMap<String, Value>,
|
||||
return_response: bool,
|
||||
) -> anyhow::Result<Vec<(Value, StatusCode)>> {
|
||||
let mut responses = Vec::new();
|
||||
for batch in
|
||||
commands.split_inclusive(|command| !matches!(command.synchronous, SyncMode::DontWait))
|
||||
{
|
||||
let mut new_responses = run_batch(
|
||||
client,
|
||||
batch,
|
||||
first_command_index,
|
||||
assets,
|
||||
asset_folder,
|
||||
registered,
|
||||
return_response,
|
||||
)
|
||||
.await?;
|
||||
responses.append(&mut new_responses);
|
||||
|
||||
first_command_index += batch.len();
|
||||
}
|
||||
|
||||
Ok(responses)
|
||||
}
|
||||
|
||||
pub fn health_command() -> Command {
|
||||
Command {
|
||||
route: "/health".into(),
|
||||
method: crate::common::client::Method::Get,
|
||||
body: Default::default(),
|
||||
register: HashMap::new(),
|
||||
synchronous: SyncMode::WaitForResponse,
|
||||
expected_status: None,
|
||||
expected_response: None,
|
||||
api_key_variable: None,
|
||||
}
|
||||
}
|
||||
|
||||
pub fn insert_variables(value: &mut Value, registered: &HashMap<String, Value>) {
|
||||
match value {
|
||||
Value::Null | Value::Bool(_) | Value::Number(_) => (),
|
||||
Value::String(s) => {
|
||||
if s.starts_with("{{") && s.ends_with("}}") {
|
||||
let name = s[2..s.len() - 2].trim();
|
||||
if let Some(replacement) = registered.get(name) {
|
||||
*value = replacement.clone();
|
||||
}
|
||||
}
|
||||
}
|
||||
Value::Array(values) => {
|
||||
for value in values {
|
||||
insert_variables(value, registered);
|
||||
}
|
||||
}
|
||||
Value::Object(map) => {
|
||||
for (_key, value) in map.iter_mut() {
|
||||
insert_variables(value, registered);
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
@@ -1,113 +0,0 @@
|
||||
use std::fmt::Display;
|
||||
use std::path::PathBuf;
|
||||
|
||||
use serde::{Deserialize, Serialize};
|
||||
|
||||
mod release;
|
||||
|
||||
pub use release::{add_releases_to_assets, Release};
|
||||
|
||||
/// A binary to execute on a temporary DB.
|
||||
///
|
||||
/// - The URL of the binary will be in the form <http://localhost:PORT>, where `PORT`
|
||||
/// is selected by the runner.
|
||||
/// - The database will be temporary, cleaned before use, and will be selected by the runner.
|
||||
#[derive(Clone, Debug, Serialize, Deserialize)]
|
||||
#[serde(rename_all = "camelCase")]
|
||||
pub struct Binary {
|
||||
/// Describes how this binary should be instantiated
|
||||
#[serde(flatten)]
|
||||
pub source: BinarySource,
|
||||
/// Extra CLI arguments to pass to the binary.
|
||||
///
|
||||
/// Should be Meilisearch CLI options.
|
||||
#[serde(default, skip_serializing_if = "Vec::is_empty")]
|
||||
pub extra_cli_args: Vec<String>,
|
||||
}
|
||||
|
||||
impl Display for Binary {
|
||||
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
|
||||
write!(f, "{}", self.source)?;
|
||||
if !self.extra_cli_args.is_empty() {
|
||||
write!(f, "with arguments: {:?}", self.extra_cli_args)?;
|
||||
}
|
||||
Ok(())
|
||||
}
|
||||
}
|
||||
|
||||
impl Binary {
|
||||
pub fn as_release(&self) -> Option<&Release> {
|
||||
if let BinarySource::Release(release) = &self.source {
|
||||
Some(release)
|
||||
} else {
|
||||
None
|
||||
}
|
||||
}
|
||||
|
||||
pub fn binary_path(&self, asset_folder: &str) -> anyhow::Result<Option<PathBuf>> {
|
||||
self.source.binary_path(asset_folder)
|
||||
}
|
||||
}
|
||||
|
||||
#[derive(Clone, Debug, Serialize, Deserialize)]
|
||||
#[serde(rename_all = "camelCase", deny_unknown_fields, tag = "source")]
|
||||
/// Description of how to get a binary to instantiate.
|
||||
pub enum BinarySource {
|
||||
/// Compile and run the binary from the current repository.
|
||||
Build {
|
||||
#[serde(default)]
|
||||
edition: Edition,
|
||||
},
|
||||
/// Get a release from GitHub
|
||||
Release(Release),
|
||||
/// Run the binary from the specified local path.
|
||||
Path(PathBuf),
|
||||
}
|
||||
|
||||
impl Display for BinarySource {
|
||||
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
|
||||
match self {
|
||||
BinarySource::Build { edition: Edition::Community } => {
|
||||
f.write_str("git with community edition")
|
||||
}
|
||||
BinarySource::Build { edition: Edition::Enterprise } => {
|
||||
f.write_str("git with enterprise edition")
|
||||
}
|
||||
BinarySource::Release(release) => write!(f, "{release}"),
|
||||
BinarySource::Path(path) => write!(f, "binary at `{}`", path.display()),
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
impl Default for BinarySource {
|
||||
fn default() -> Self {
|
||||
Self::Build { edition: Default::default() }
|
||||
}
|
||||
}
|
||||
|
||||
impl BinarySource {
|
||||
fn binary_path(&self, asset_folder: &str) -> anyhow::Result<Option<PathBuf>> {
|
||||
Ok(match self {
|
||||
Self::Release(release) => Some(release.binary_path(asset_folder)?),
|
||||
Self::Build { .. } => None,
|
||||
Self::Path(path) => Some(path.clone()),
|
||||
})
|
||||
}
|
||||
}
|
||||
|
||||
#[derive(Debug, Clone, Copy, Default, Serialize, Deserialize)]
|
||||
#[serde(rename_all = "camelCase", deny_unknown_fields)]
|
||||
pub enum Edition {
|
||||
#[default]
|
||||
Community,
|
||||
Enterprise,
|
||||
}
|
||||
|
||||
impl Edition {
|
||||
fn binary_base(&self) -> &'static str {
|
||||
match self {
|
||||
Edition::Community => "meilisearch",
|
||||
Edition::Enterprise => "meilisearch-enterprise",
|
||||
}
|
||||
}
|
||||
}
|
||||
@@ -1,193 +0,0 @@
|
||||
use std::collections::BTreeMap;
|
||||
use std::fmt::Display;
|
||||
use std::path::PathBuf;
|
||||
|
||||
use anyhow::Context;
|
||||
use cargo_metadata::semver::Version;
|
||||
use serde::{Deserialize, Serialize};
|
||||
|
||||
use super::Edition;
|
||||
use crate::common::assets::{Asset, AssetFormat};
|
||||
|
||||
#[derive(Clone, Debug, Serialize, Deserialize)]
|
||||
#[serde(rename_all = "camelCase", deny_unknown_fields)]
|
||||
pub struct Release {
|
||||
#[serde(default)]
|
||||
pub edition: Edition,
|
||||
pub version: Version,
|
||||
}
|
||||
|
||||
impl Display for Release {
|
||||
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
|
||||
write!(f, "v{}", self.version)?;
|
||||
match self.edition {
|
||||
Edition::Community => f.write_str(" Community Edition"),
|
||||
Edition::Enterprise => f.write_str(" Enterprise Edition"),
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
impl Release {
|
||||
pub fn binary_path(&self, asset_folder: &str) -> anyhow::Result<PathBuf> {
|
||||
let mut asset_folder: PathBuf = asset_folder
|
||||
.parse()
|
||||
.with_context(|| format!("parsing asset folder `{asset_folder}` as a path"))?;
|
||||
asset_folder.push(self.local_filename()?);
|
||||
Ok(asset_folder)
|
||||
}
|
||||
|
||||
fn local_filename(&self) -> anyhow::Result<String> {
|
||||
let version = &self.version;
|
||||
let arch = get_arch()?;
|
||||
let base = self.edition.binary_base();
|
||||
|
||||
Ok(format!("{base}-{version}-{arch}"))
|
||||
}
|
||||
|
||||
fn remote_filename(&self) -> anyhow::Result<String> {
|
||||
let arch = get_arch()?;
|
||||
let base = self.edition.binary_base();
|
||||
|
||||
Ok(format!("{base}-{arch}"))
|
||||
}
|
||||
|
||||
async fn fetch_sha256(&self) -> anyhow::Result<String> {
|
||||
let version = &self.version;
|
||||
let asset_name = self.remote_filename()?;
|
||||
|
||||
// If version is lower than 1.15 there is no point in trying to get the sha256, GitHub didn't support it
|
||||
if *version < Version::parse("1.15.0")? {
|
||||
anyhow::bail!("version is lower than 1.15, sha256 not available");
|
||||
}
|
||||
|
||||
#[derive(Deserialize)]
|
||||
struct GithubReleaseAsset {
|
||||
name: String,
|
||||
digest: Option<String>,
|
||||
}
|
||||
|
||||
#[derive(Deserialize)]
|
||||
struct GithubRelease {
|
||||
assets: Vec<GithubReleaseAsset>,
|
||||
}
|
||||
|
||||
let url = format!(
|
||||
"https://api.github.com/repos/meilisearch/meilisearch/releases/tags/v{version}"
|
||||
);
|
||||
|
||||
let client = reqwest::Client::builder()
|
||||
.user_agent("Meilisearch bench xtask")
|
||||
.build()
|
||||
.context("failed to build reqwest client")?;
|
||||
let body = client.get(url).send().await?.text().await?;
|
||||
let data: GithubRelease = serde_json::from_str(&body)?;
|
||||
|
||||
let digest = data
|
||||
.assets
|
||||
.into_iter()
|
||||
.find(|asset| asset.name.as_str() == asset_name.as_str())
|
||||
.with_context(|| format!("asset {asset_name} not found in release {self}"))?
|
||||
.digest
|
||||
.with_context(|| format!("asset {asset_name} has no digest"))?;
|
||||
|
||||
let sha256 = digest
|
||||
.strip_prefix("sha256:")
|
||||
.map(|s| s.to_string())
|
||||
.context("invalid sha256 format")?;
|
||||
|
||||
Ok(sha256)
|
||||
}
|
||||
|
||||
async fn add_asset(&self, assets: &mut BTreeMap<String, Asset>) -> anyhow::Result<()> {
|
||||
let local_filename = self.local_filename()?;
|
||||
let version = &self.version;
|
||||
if assets.contains_key(&local_filename) {
|
||||
return Ok(());
|
||||
}
|
||||
|
||||
let remote_filename = self.remote_filename()?;
|
||||
|
||||
// Try to get the sha256 but it may fail if Github is rate limiting us
|
||||
// We hardcode some values to speed up tests and avoid hitting Github
|
||||
// Also, versions prior to 1.15 don't have sha256 available anyway
|
||||
let sha256 = match local_filename.as_str() {
|
||||
"meilisearch-1.12.0-macos-apple-silicon" => {
|
||||
Some("3b384707a5df9edf66f9157f0ddb70dcd3ac84d4887149169cf93067d06717b7".into())
|
||||
}
|
||||
"meilisearch-1.12.0-linux-amd64" => {
|
||||
Some("865a3fc222e3b3bd1f4b64346cb114b9669af691aae28d71fa68dbf39427abcf".into())
|
||||
}
|
||||
_ => match self.fetch_sha256().await {
|
||||
Ok(sha256) => Some(sha256),
|
||||
Err(err) => {
|
||||
tracing::warn!("failed to get sha256 for release {self}: {err}");
|
||||
None
|
||||
}
|
||||
},
|
||||
};
|
||||
|
||||
let url = format!(
|
||||
"https://github.com/meilisearch/meilisearch/releases/download/v{version}/{remote_filename}"
|
||||
);
|
||||
|
||||
let asset = Asset {
|
||||
local_location: Some(local_filename.clone()),
|
||||
remote_location: Some(url),
|
||||
format: AssetFormat::Raw,
|
||||
sha256,
|
||||
};
|
||||
|
||||
assets.insert(local_filename, asset);
|
||||
|
||||
Ok(())
|
||||
}
|
||||
}
|
||||
|
||||
pub fn get_arch() -> anyhow::Result<&'static str> {
|
||||
// linux-aarch64
|
||||
#[cfg(all(target_os = "linux", target_arch = "aarch64"))]
|
||||
{
|
||||
Ok("linux-aarch64")
|
||||
}
|
||||
|
||||
// linux-amd64
|
||||
#[cfg(all(target_os = "linux", target_arch = "x86_64"))]
|
||||
{
|
||||
Ok("linux-amd64")
|
||||
}
|
||||
|
||||
// macos-amd64
|
||||
#[cfg(all(target_os = "macos", target_arch = "x86_64"))]
|
||||
{
|
||||
Ok("macos-amd64")
|
||||
}
|
||||
|
||||
// macos-apple-silicon
|
||||
#[cfg(all(target_os = "macos", target_arch = "aarch64"))]
|
||||
{
|
||||
Ok("macos-apple-silicon")
|
||||
}
|
||||
|
||||
// windows-amd64
|
||||
#[cfg(all(target_os = "windows", target_arch = "x86_64"))]
|
||||
{
|
||||
Ok("windows-amd64")
|
||||
}
|
||||
|
||||
#[cfg(not(all(target_os = "windows", target_arch = "x86_64")))]
|
||||
#[cfg(not(all(target_os = "linux", target_arch = "aarch64")))]
|
||||
#[cfg(not(all(target_os = "linux", target_arch = "x86_64")))]
|
||||
#[cfg(not(all(target_os = "macos", target_arch = "aarch64")))]
|
||||
anyhow::bail!("unsupported platform")
|
||||
}
|
||||
|
||||
pub async fn add_releases_to_assets(
|
||||
assets: &mut BTreeMap<String, Asset>,
|
||||
releases: impl IntoIterator<Item = &Release>,
|
||||
) -> anyhow::Result<()> {
|
||||
for release in releases {
|
||||
release.add_asset(assets).await?;
|
||||
}
|
||||
|
||||
Ok(())
|
||||
}
|
||||
@@ -1,18 +0,0 @@
|
||||
use anyhow::Context;
|
||||
use std::io::LineWriter;
|
||||
use tracing_subscriber::{fmt::format::FmtSpan, layer::SubscriberExt, Layer};
|
||||
|
||||
pub fn setup_logs(log_filter: &str) -> anyhow::Result<()> {
|
||||
let filter: tracing_subscriber::filter::Targets =
|
||||
log_filter.parse().context("invalid --log-filter")?;
|
||||
|
||||
let subscriber = tracing_subscriber::registry().with(
|
||||
tracing_subscriber::fmt::layer()
|
||||
.with_writer(|| LineWriter::new(std::io::stderr()))
|
||||
.with_span_events(FmtSpan::NEW | FmtSpan::CLOSE)
|
||||
.with_filter(filter),
|
||||
);
|
||||
tracing::subscriber::set_global_default(subscriber).context("could not setup logging")?;
|
||||
|
||||
Ok(())
|
||||
}
|
||||
@@ -1,8 +0,0 @@
|
||||
pub mod args;
|
||||
pub mod assets;
|
||||
pub mod client;
|
||||
pub mod command;
|
||||
pub mod instance;
|
||||
pub mod logs;
|
||||
pub mod process;
|
||||
pub mod workload;
|
||||
@@ -1,11 +0,0 @@
|
||||
use serde::{Deserialize, Serialize};
|
||||
|
||||
use crate::{bench::BenchWorkload, test::TestWorkload};
|
||||
|
||||
#[derive(Serialize, Deserialize, Debug)]
|
||||
#[serde(tag = "type")]
|
||||
#[serde(rename_all = "camelCase")]
|
||||
pub enum Workload {
|
||||
Bench(BenchWorkload),
|
||||
Test(TestWorkload),
|
||||
}
|
||||
@@ -1,3 +1 @@
|
||||
pub mod bench;
|
||||
pub mod common;
|
||||
pub mod test;
|
||||
|
||||
@@ -1,34 +1,16 @@
|
||||
use std::{collections::HashSet, process::Stdio};
|
||||
use std::collections::HashSet;
|
||||
|
||||
use anyhow::Context;
|
||||
use clap::Parser;
|
||||
use semver::{Prerelease, Version};
|
||||
use xtask::{bench::BenchArgs, test::TestArgs};
|
||||
|
||||
/// This is the version of the crate but also the current Meilisearch version
|
||||
pub const VERSION: &str = env!("CARGO_PKG_VERSION");
|
||||
use xtask::bench::BenchDeriveArgs;
|
||||
|
||||
/// List features available in the workspace
|
||||
#[derive(Parser, Debug)]
|
||||
struct ListFeaturesArgs {
|
||||
struct ListFeaturesDeriveArgs {
|
||||
/// Feature to exclude from the list. Use a comma to separate multiple features.
|
||||
#[arg(short, long, value_delimiter = ',')]
|
||||
exclude_feature: Vec<String>,
|
||||
}
|
||||
|
||||
/// Create a git tag for the current version
|
||||
///
|
||||
/// The tag will of the form prototype-v<version>-<name>.<increment>
|
||||
#[derive(Parser, Debug)]
|
||||
struct PrototypeArgs {
|
||||
/// Name of the prototype to generate
|
||||
name: String,
|
||||
/// If true, generates a new prototype tag and refuses to do so if one already exists;
/// otherwise, increments the existing tag and refuses to proceed if no tag exists yet.
|
||||
#[arg(long)]
|
||||
generate_new: bool,
|
||||
}
|
||||
|
||||
/// Utility commands
|
||||
#[derive(Parser, Debug)]
|
||||
#[command(author, version, about, long_about)]
|
||||
@@ -36,10 +18,8 @@ struct PrototypeArgs {
|
||||
#[command(bin_name = "cargo xtask")]
|
||||
#[allow(clippy::large_enum_variant)] // please, that's enough...
|
||||
enum Command {
|
||||
ListFeatures(ListFeaturesArgs),
|
||||
Bench(BenchArgs),
|
||||
GeneratePrototype(PrototypeArgs),
|
||||
Test(TestArgs),
|
||||
ListFeatures(ListFeaturesDeriveArgs),
|
||||
Bench(BenchDeriveArgs),
|
||||
}
|
||||
|
||||
fn main() -> anyhow::Result<()> {
|
||||
@@ -47,13 +27,11 @@ fn main() -> anyhow::Result<()> {
|
||||
match args {
|
||||
Command::ListFeatures(args) => list_features(args),
|
||||
Command::Bench(args) => xtask::bench::run(args)?,
|
||||
Command::GeneratePrototype(args) => generate_prototype(args)?,
|
||||
Command::Test(args) => xtask::test::run(args)?,
|
||||
}
|
||||
Ok(())
|
||||
}
|
||||
|
||||
fn list_features(args: ListFeaturesArgs) {
|
||||
fn list_features(args: ListFeaturesDeriveArgs) {
|
||||
let exclude_features: HashSet<_> = args.exclude_feature.into_iter().collect();
|
||||
let metadata = cargo_metadata::MetadataCommand::new().no_deps().exec().unwrap();
|
||||
let features: Vec<String> = metadata
|
||||
@@ -66,106 +44,3 @@ fn list_features(args: ListFeaturesArgs) {
|
||||
let features = features.join(" ");
|
||||
println!("{features}")
|
||||
}
|
||||
|
||||
fn generate_prototype(args: PrototypeArgs) -> anyhow::Result<()> {
|
||||
let PrototypeArgs { name, generate_new: create_new } = args;
|
||||
|
||||
if name.rsplit_once(['.', '-']).filter(|(_, t)| t.chars().all(char::is_numeric)).is_some() {
|
||||
anyhow::bail!(
|
||||
"The increment must not be part of the name and will be rather incremented by this command."
|
||||
);
|
||||
}
|
||||
|
||||
// 1. Fetch the crate version
|
||||
let version = Version::parse(VERSION).context("while semver-parsing the crate version")?;
|
||||
|
||||
// 2. Pull tags from remote and retrieve last prototype tag
|
||||
std::process::Command::new("git")
|
||||
.arg("fetch")
|
||||
.arg("--tags")
|
||||
.stderr(Stdio::null())
|
||||
.stdout(Stdio::null())
|
||||
.status()?;
|
||||
|
||||
let output = std::process::Command::new("git")
|
||||
.arg("tag")
|
||||
.args(["--list", "prototype-v*"])
|
||||
.stderr(Stdio::inherit())
|
||||
.output()?;
|
||||
let output =
|
||||
String::from_utf8(output.stdout).context("while converting the tag list into a string")?;
|
||||
|
||||
let mut highest_increment = None;
|
||||
for tag in output.lines() {
|
||||
let Some(version) = tag.strip_prefix("prototype-v") else {
|
||||
continue;
|
||||
};
|
||||
let Ok(version) = Version::parse(version) else {
|
||||
continue;
|
||||
};
|
||||
let Ok(proto) = PrototypePrerelease::from_str(version.pre.as_str()) else {
|
||||
continue;
|
||||
};
|
||||
if proto.name() == name {
|
||||
highest_increment = match highest_increment {
|
||||
Some(last) if last < proto.increment() => Some(proto.increment()),
|
||||
Some(last) => Some(last),
|
||||
None => Some(proto.increment()),
|
||||
};
|
||||
}
|
||||
}
|
||||
|
||||
// 3. Generate the new tag name (without git, just a string)
|
||||
let increment = match (create_new, highest_increment) {
|
||||
(true, None) => 0,
|
||||
(true, Some(increment)) => anyhow::bail!(
|
||||
"A prototype with the name `{name}` already exists with increment `{increment}`"
|
||||
),
|
||||
(false, None) => anyhow::bail!(
|
||||
"Prototype `{name}` is missing and must exist to be incremented.\n\
|
||||
Use the --generate-new flag to create a new prototype with an increment at 0."
|
||||
),
|
||||
(false, Some(increment)) => {
|
||||
increment.checked_add(1).context("While incrementing by one the increment")?
|
||||
}
|
||||
};
|
||||
|
||||
// Note that we cannot have leading zeros in the increment
|
||||
let pre = format!("{name}.{increment}").parse().context("while parsing pre-release name")?;
|
||||
let tag_name = Version { pre, ..version };
|
||||
println!("prototype-v{tag_name}");
|
||||
|
||||
Ok(())
|
||||
}
|
||||
|
||||
#[derive(Debug, Clone, PartialEq, Eq)]
|
||||
struct PrototypePrerelease {
|
||||
pre: Prerelease,
|
||||
}
|
||||
|
||||
impl PrototypePrerelease {
|
||||
fn from_str(s: &str) -> anyhow::Result<Self> {
|
||||
Prerelease::new(s)
|
||||
.map_err(Into::into)
|
||||
.and_then(|pre| {
|
||||
if pre.rsplit_once('.').is_some() {
|
||||
Ok(pre)
|
||||
} else {
|
||||
Err(anyhow::anyhow!("Invalid prototype name, missing name or increment"))
|
||||
}
|
||||
})
|
||||
.map(|pre| PrototypePrerelease { pre })
|
||||
}
|
||||
|
||||
fn name(&self) -> &str {
|
||||
self.pre.rsplit_once('.').expect("Missing prototype name").0
|
||||
}
|
||||
|
||||
fn increment(&self) -> u32 {
|
||||
self.pre
|
||||
.as_str()
|
||||
.rsplit_once('.')
|
||||
.map(|(_, tail)| tail.parse().expect("Invalid increment"))
|
||||
.expect("Missing increment")
|
||||
}
|
||||
}
|
||||
|
||||
@@ -1,95 +0,0 @@
|
||||
use std::sync::Arc;
|
||||
use std::time::Duration;
|
||||
|
||||
use anyhow::{bail, Context};
|
||||
use clap::Parser;
|
||||
|
||||
use crate::common::args::CommonArgs;
|
||||
use crate::common::client::Client;
|
||||
use crate::common::command::SyncMode;
|
||||
use crate::common::logs::setup_logs;
|
||||
use crate::common::workload::Workload;
|
||||
use crate::test::workload::CommandOrBinary;
|
||||
|
||||
mod workload;
|
||||
|
||||
pub use workload::TestWorkload;
|
||||
|
||||
/// Run tests from a workload
|
||||
#[derive(Parser, Debug)]
|
||||
pub struct TestArgs {
|
||||
/// Common arguments shared with other commands
|
||||
#[command(flatten)]
|
||||
common: CommonArgs,
|
||||
|
||||
/// Enables workloads to be rewritten in place to update expected responses.
|
||||
#[arg(short, long, default_value_t = false)]
|
||||
pub update_responses: bool,
|
||||
|
||||
/// Enables workloads to be rewritten in place to add missing expected responses.
|
||||
#[arg(short, long, default_value_t = false)]
|
||||
pub add_missing_responses: bool,
|
||||
}
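A minimal usage sketch for this subcommand (the workload path is a hypothetical example): `--update-responses` rewrites the workload file in place with the responses the server actually returned, so the resulting diff can be reviewed and committed.

```bash
# Minimal sketch; the workload path is a hypothetical example.
# --update-responses rewrites the workload file in place with the recorded responses.
cargo xtask test workloads/tests/my_test.json --update-responses
```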
|
||||
|
||||
pub fn run(args: TestArgs) -> anyhow::Result<()> {
|
||||
let rt = tokio::runtime::Builder::new_current_thread().enable_io().enable_time().build()?;
|
||||
let _scope = rt.enter();
|
||||
|
||||
rt.block_on(async { run_inner(args).await })?;
|
||||
|
||||
Ok(())
|
||||
}
|
||||
|
||||
async fn run_inner(args: TestArgs) -> anyhow::Result<()> {
|
||||
setup_logs(&args.common.log_filter)?;
|
||||
|
||||
// setup clients
|
||||
let assets_client = Arc::new(Client::new(
|
||||
None,
|
||||
args.common.assets_key.as_deref(),
|
||||
Some(Duration::from_secs(3600)), // 1h
|
||||
)?);
|
||||
|
||||
let meili_client = Arc::new(Client::new(
|
||||
Some("http://127.0.0.1:7700".into()),
|
||||
Some("masterKey"),
|
||||
Some(Duration::from_secs(args.common.tasks_queue_timeout_secs)),
|
||||
)?);
|
||||
|
||||
let asset_folder = args.common.asset_folder.clone().leak();
|
||||
for workload_file in &args.common.workload_file {
|
||||
let string = tokio::fs::read_to_string(workload_file)
|
||||
.await
|
||||
.with_context(|| format!("error reading {}", workload_file.display()))?;
|
||||
let workload: Workload = serde_json::from_str(string.trim())
|
||||
.with_context(|| format!("error parsing {} as JSON", workload_file.display()))?;
|
||||
|
||||
let Workload::Test(workload) = workload else {
|
||||
bail!("workload file {} is not a test workload", workload_file.display());
|
||||
};
|
||||
|
||||
let has_faulty_register = workload.commands.iter().any(|c| {
|
||||
matches!(c, CommandOrBinary::Command(cmd) if cmd.synchronous == SyncMode::DontWait && !cmd.register.is_empty())
|
||||
});
|
||||
if has_faulty_register {
|
||||
bail!("workload {} contains commands that register values but are marked as --dont-wait. This is not supported because we cannot guarantee the value will be registered before the next command runs.", workload.name);
|
||||
}
|
||||
|
||||
let name = workload.name.clone();
|
||||
match workload.run(&args, &assets_client, &meili_client, asset_folder).await {
|
||||
Ok(_) => match args.update_responses || args.add_missing_responses {
|
||||
true => println!(
|
||||
"🛠️ Workload {name} was updated, please check the output and restart the test"
|
||||
),
|
||||
false => println!("✅ Workload {name} passed"),
|
||||
},
|
||||
Err(error) => {
|
||||
println!("❌ Workload {name} failed: {error}");
|
||||
println!("💡 Is this intentional? If so, rerun with --update-responses to update the workload files.");
|
||||
return Err(error);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
Ok(())
|
||||
}
|
||||
@@ -1,201 +0,0 @@
|
||||
use std::collections::{BTreeMap, HashMap};
|
||||
use std::io::Write;
|
||||
use std::sync::Arc;
|
||||
|
||||
use anyhow::Context;
|
||||
use serde::{Deserialize, Serialize};
|
||||
use serde_json::Value;
|
||||
|
||||
use crate::common::assets::{fetch_assets, Asset};
|
||||
use crate::common::client::Client;
|
||||
use crate::common::command::{run_commands, Command};
|
||||
use crate::common::instance::Binary;
|
||||
use crate::common::process::{self, delete_db, kill_meili};
|
||||
use crate::common::workload::Workload;
|
||||
use crate::test::TestArgs;
|
||||
|
||||
#[derive(Serialize, Deserialize, Debug)]
|
||||
#[serde(untagged)]
|
||||
#[allow(clippy::large_enum_variant)]
|
||||
pub enum CommandOrBinary {
|
||||
Command(Command),
|
||||
Binary { binary: Binary },
|
||||
}
|
||||
|
||||
enum CommandOrBinaryVec<'a> {
|
||||
Commands(Vec<&'a mut Command>),
|
||||
Binary(Binary),
|
||||
}
|
||||
|
||||
fn produce_reference_value(value: &mut Value) {
|
||||
match value {
|
||||
Value::Null | Value::Bool(_) | Value::Number(_) => (),
|
||||
Value::String(string) => {
|
||||
if time::OffsetDateTime::parse(
|
||||
string.as_str(),
|
||||
&time::format_description::well_known::Rfc3339,
|
||||
)
|
||||
.is_ok()
|
||||
{
|
||||
*string = String::from("[timestamp]");
|
||||
} else if uuid::Uuid::parse_str(string).is_ok() {
|
||||
*string = String::from("[uuid]");
|
||||
}
|
||||
}
|
||||
Value::Array(values) => {
|
||||
for value in values {
|
||||
produce_reference_value(value);
|
||||
}
|
||||
}
|
||||
Value::Object(map) => {
|
||||
for (key, value) in map.iter_mut() {
|
||||
match key.as_str() {
|
||||
"duration" => {
|
||||
*value = Value::String(String::from("[duration]"));
|
||||
}
|
||||
"processingTimeMs" => {
|
||||
*value = Value::String(String::from("[duration]"));
|
||||
}
|
||||
_ => produce_reference_value(value),
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
/// A test workload.
|
||||
/// Not to be confused with [a bench workload](crate::bench::workload::Workload).
|
||||
#[derive(Serialize, Deserialize, Debug)]
|
||||
#[serde(rename_all = "camelCase")]
|
||||
pub struct TestWorkload {
|
||||
pub name: String,
|
||||
pub binary: Binary,
|
||||
#[serde(default, skip_serializing_if = "Option::is_none")]
|
||||
pub master_key: Option<String>,
|
||||
#[serde(default, skip_serializing_if = "BTreeMap::is_empty")]
|
||||
pub assets: BTreeMap<String, Asset>,
|
||||
#[serde(default, skip_serializing_if = "Vec::is_empty")]
|
||||
pub commands: Vec<CommandOrBinary>,
|
||||
}
|
||||
|
||||
impl TestWorkload {
|
||||
pub async fn run(
|
||||
mut self,
|
||||
args: &TestArgs,
|
||||
assets_client: &Client,
|
||||
meili_client: &Arc<Client>,
|
||||
asset_folder: &'static str,
|
||||
) -> anyhow::Result<()> {
|
||||
// Group commands between upgrades
|
||||
let mut commands_or_instance = Vec::new();
|
||||
let mut current_commands = Vec::new();
|
||||
let mut all_releases = Vec::new();
|
||||
|
||||
if let Some(release) = self.binary.as_release() {
|
||||
all_releases.push(release);
|
||||
}
|
||||
for command_or_upgrade in &mut self.commands {
|
||||
match command_or_upgrade {
|
||||
CommandOrBinary::Command(command) => current_commands.push(command),
|
||||
CommandOrBinary::Binary { binary: instance } => {
|
||||
if !current_commands.is_empty() {
|
||||
commands_or_instance.push(CommandOrBinaryVec::Commands(current_commands));
|
||||
current_commands = Vec::new();
|
||||
}
|
||||
commands_or_instance.push(CommandOrBinaryVec::Binary(instance.clone()));
|
||||
if let Some(release) = instance.as_release() {
|
||||
all_releases.push(release);
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
if !current_commands.is_empty() {
|
||||
commands_or_instance.push(CommandOrBinaryVec::Commands(current_commands));
|
||||
}
|
||||
|
||||
// Fetch assets
|
||||
crate::common::instance::add_releases_to_assets(&mut self.assets, all_releases).await?;
|
||||
fetch_assets(assets_client, &self.assets, &args.common.asset_folder).await?;
|
||||
|
||||
// Run server
|
||||
delete_db().await;
|
||||
let mut process = process::start_meili(
|
||||
meili_client,
|
||||
Some("masterKey"),
|
||||
&self.binary,
|
||||
&args.common.asset_folder,
|
||||
)
|
||||
.await?;
|
||||
|
||||
let assets = Arc::new(self.assets.clone());
|
||||
let return_responses = args.add_missing_responses || args.update_responses;
|
||||
let mut registered = HashMap::new();
|
||||
let mut first_command_index = 0;
|
||||
for command_or_upgrade in commands_or_instance {
|
||||
match command_or_upgrade {
|
||||
CommandOrBinaryVec::Commands(commands) => {
|
||||
let cloned: Vec<_> = commands.iter().map(|c| (*c).clone()).collect();
|
||||
let responses = run_commands(
|
||||
meili_client,
|
||||
&cloned,
|
||||
first_command_index,
|
||||
&assets,
|
||||
asset_folder,
|
||||
&mut registered,
|
||||
return_responses,
|
||||
)
|
||||
.await?;
|
||||
first_command_index += cloned.len();
|
||||
if return_responses {
|
||||
assert_eq!(responses.len(), cloned.len());
|
||||
for (command, (mut response, status)) in commands.into_iter().zip(responses)
|
||||
{
|
||||
if args.update_responses
|
||||
|| (args.add_missing_responses
|
||||
&& command.expected_response.is_none())
|
||||
{
|
||||
produce_reference_value(&mut response);
|
||||
command.expected_response = Some(response);
|
||||
command.expected_status = Some(status.as_u16());
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
CommandOrBinaryVec::Binary(binary) => {
|
||||
kill_meili(process).await;
|
||||
process = process::start_meili(
|
||||
meili_client,
|
||||
Some("masterKey"),
|
||||
&binary,
|
||||
&args.common.asset_folder,
|
||||
)
|
||||
.await?;
|
||||
tracing::info!("Restarted instance with {binary}");
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// Write back the workload if needed
|
||||
if return_responses {
|
||||
// Filter out the assets we added for the versions
|
||||
self.assets.retain(|_, asset| {
|
||||
asset.local_location.as_ref().is_none_or(|a| !a.starts_with("meilisearch-"))
|
||||
});
|
||||
|
||||
let workload = Workload::Test(self);
|
||||
let mut file =
|
||||
std::fs::File::create(&args.common.workload_file[0]).with_context(|| {
|
||||
format!("could not open {}", args.common.workload_file[0].display())
|
||||
})?;
|
||||
serde_json::to_writer_pretty(&file, &workload).with_context(|| {
|
||||
format!("could not write to {}", args.common.workload_file[0].display())
|
||||
})?;
|
||||
file.write_all(b"\n").with_context(|| {
|
||||
format!("could not write to {}", args.common.workload_file[0].display())
|
||||
})?;
|
||||
tracing::info!("Updated workload file {}", args.common.workload_file[0].display());
|
||||
}
|
||||
|
||||
Ok(())
|
||||
}
|
||||
}
|
||||
@@ -20,33 +20,29 @@ These make us iterate fast before stabilizing it for the current release.

### Release steps

The prototype name must [follow this convention](https://semver.org/#spec-item-11): `prototype-v<version>-<name>.<iteration>` where
The prototype name must follow this convention: `prototype-v<version>.<name>-<number>` where
- `version` is the version of Meilisearch on which the prototype is based.
- `name` is the feature name formatted in `kebab-case`.
- `iteration` is the iteration of the prototype, starting from `0`.
- `name` is the feature name formatted in `kebab-case`. It should not end with a single number.
- `number` is the version of the prototype, starting from `0`.

✅ Example: `prototype-v1.23.0-search-personalization.1`. <br>
❌ Bad example: `prototype-v1.23.0-search-personalization-0`: a dash separates the name and version. <br>
❌ Bad example: `prototype-v1.23.0.search-personalization.0`: a dot separates the version and name. <br>
✅ Example: `prototype-v1.23.0.search-personalization-0`. <br>
❌ Bad example: `prototype-search-personalization-0`: version is missing. <br>
❌ Bad example: `v1.23.0-auto-resize-0`: lacks the `prototype-` prefix. <br>
❌ Bad example: `prototype-v1.23.0-auto-resize`: lacks the version suffix. <br>
❌ Bad example: `prototype-v1.23.0-auto-resize.0-0`: feature name ends with something other than a number.
❌ Bad example: `v1.23.0.auto-resize-0`: lacks the `prototype` prefix. <br>
❌ Bad example: `prototype-v1.23.0.auto-resize`: lacks the version suffix. <br>
❌ Bad example: `prototype-v1.23.0.auto-resize-0-0`: feature name ends with a single number.

Steps to create a prototype (a condensed shell sketch follows this list):

1. In your terminal, go to the last commit of your branch (the one you want to provide as a prototype).
2. Use the `cargo xtask generate-prototype` command to generate the prototype name.
3. Create the tag using the `git tag` command.
4. Checkout the tag, run Meilisearch, and check that its launch summary features a line: `Prototype: prototype-v<version>-<name>.<iteration>`.
5. Checkout back to your branch: `git checkout -`.
6. Push the tag: `git push origin prototype-v<version>-<name>.<iteration>`
7. Check that the [Docker CI](https://github.com/meilisearch/meilisearch/actions/workflows/publish-docker-images.yml) is now running.
2. Create a tag following the convention: `git tag prototype-X-Y`
3. Run Meilisearch and check that its launch summary features a line: `Prototype: prototype-X-Y` (you may need to switch branches and back after tagging for this to work).
4. Push the tag: `git push origin prototype-X-Y`
5. Check the [Docker CI](https://github.com/meilisearch/meilisearch/actions/workflows/publish-docker-images.yml) is now running.
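For reference, the current steps condense into the shell sketch below; the feature name, version, and increment are placeholders, and the exact tag to push is whatever `cargo xtask generate-prototype` actually prints.

```bash
# Condensed sketch of the steps above; `search-personalization`, the version,
# and the increment are placeholders. Use the tag name actually printed by
# `cargo xtask generate-prototype`.
git checkout <last-commit-of-your-branch>
cargo xtask generate-prototype search-personalization --generate-new
# -> prints e.g. prototype-v1.23.0-search-personalization.0
git tag prototype-v1.23.0-search-personalization.0
git checkout prototype-v1.23.0-search-personalization.0   # run Meilisearch, check the launch summary
git checkout -
git push origin prototype-v1.23.0-search-personalization.0
```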

🐳 Once the CI has finished running, a Docker image named `prototype-v<version>-<name>.<iteration>` will be available on [DockerHub](https://hub.docker.com/repository/docker/getmeili/meilisearch/general). People can use it with the following command: `docker run -p 7700:7700 -v $(pwd)/meili_data:/meili_data getmeili/meilisearch:prototype-v<version>-<name>.<iteration>`. <br>
🐳 Once the CI has finished running (~1h30), a Docker image named `prototype-X-Y` will be available on [DockerHub](https://hub.docker.com/repository/docker/getmeili/meilisearch/general). People can use it with the following command: `docker run -p 7700:7700 -v $(pwd)/meili_data:/meili_data getmeili/meilisearch:prototype-X-Y`. <br>
More information about [how to run Meilisearch with Docker](https://docs.meilisearch.com/learn/cookbooks/docker.html#download-meilisearch-with-docker).

⚠️ However, no binaries will be created. If the users do not use Docker, they can go to the `prototype-v<version>-<name>.<iteration>` tag in the Meilisearch repository and compile it from the source code.
⚠️ However, no binaries will be created. If the users do not use Docker, they can go to the `prototype-X-Y` tag in the Meilisearch repository and compile it from the source code.

### Communication

@@ -67,7 +63,7 @@ Here is an example of messages to share on GitHub:
> How to run the prototype?
> You need to start from a fresh new database (remove the previously used `data.ms`) and use the following Docker image:
> ```bash
> docker run -it --rm -p 7700:7700 -v $(pwd)/meili_data:/meili_data getmeili/meilisearch:prototype-v<version>-<name>.<iteration>
> docker run -it --rm -p 7700:7700 -v $(pwd)/meili_data:/meili_data getmeili/meilisearch:prototype-X-Y
> ```
>
> You can use the feature this way:
|
||||
|
||||
@@ -1,6 +1,5 @@
|
||||
{
|
||||
"name": "movies-subset-hf-embeddings",
|
||||
"type": "bench",
|
||||
"run_count": 5,
|
||||
"extra_cli_args": [
|
||||
"--max-indexing-threads=4"
|
||||
|
||||
@@ -1,6 +1,5 @@
|
||||
{
|
||||
"name": "settings-add-embeddings-hf",
|
||||
"type": "bench",
|
||||
"run_count": 5,
|
||||
"extra_cli_args": [
|
||||
"--max-indexing-threads=4"
|
||||
|
||||
@@ -1,6 +1,5 @@
|
||||
{
|
||||
"name": "hackernews.add_new_documents",
|
||||
"type": "bench",
|
||||
"run_count": 3,
|
||||
"extra_cli_args": [],
|
||||
"assets": {
|
||||
|
||||
@@ -1,6 +1,5 @@
|
||||
{
|
||||
"name": "hackernews.ndjson_1M_ignore_first_100k",
|
||||
"type": "bench",
|
||||
"run_count": 3,
|
||||
"extra_cli_args": [],
|
||||
"assets": {
|
||||
|
||||
@@ -1,6 +1,5 @@
|
||||
{
|
||||
"name": "hackernews.modify_facet_numbers",
|
||||
"type": "bench",
|
||||
"run_count": 3,
|
||||
"extra_cli_args": [],
|
||||
"assets": {
|
||||
|
||||
@@ -1,6 +1,5 @@
|
||||
{
|
||||
"name": "hackernews.modify_facet_strings",
|
||||
"type": "bench",
|
||||
"run_count": 3,
|
||||
"extra_cli_args": [],
|
||||
"assets": {
|
||||
|
||||
@@ -1,6 +1,5 @@
|
||||
{
|
||||
"name": "hackernews.modify_searchables",
|
||||
"type": "bench",
|
||||
"run_count": 3,
|
||||
"extra_cli_args": [],
|
||||
"assets": {
|
||||
|
||||
@@ -1,6 +1,5 @@
|
||||
{
|
||||
"name": "hackernews.ndjson_1M",
|
||||
"type": "bench",
|
||||
"run_count": 3,
|
||||
"extra_cli_args": [],
|
||||
"assets": {
|
||||
|
||||
@@ -1,6 +1,5 @@
|
||||
{
|
||||
"name": "movies.json,no-threads",
|
||||
"type": "bench",
|
||||
"run_count": 2,
|
||||
"extra_cli_args": [
|
||||
"--max-indexing-threads=1"
|
||||
|
||||
@@ -1,6 +1,5 @@
|
||||
{
|
||||
"name": "movies.json",
|
||||
"type": "bench",
|
||||
"run_count": 10,
|
||||
"extra_cli_args": [],
|
||||
"assets": {
|
||||
|
||||
@@ -1,6 +1,5 @@
|
||||
{
|
||||
"name": "search-movies-subset-hf-embeddings",
|
||||
"type": "bench",
|
||||
"run_count": 2,
|
||||
"target": "search::=trace",
|
||||
"extra_cli_args": [
|
||||
|
||||
@@ -1,6 +1,5 @@
|
||||
{
|
||||
"name": "search-filterable-movies.json",
|
||||
"type": "bench",
|
||||
"run_count": 10,
|
||||
"target": "search::=trace",
|
||||
"extra_cli_args": [],
|
||||
|
||||
@@ -1,7 +1,6 @@
|
||||
{
|
||||
"name": "search-geosort.jsonl_1M",
|
||||
"type": "bench",
|
||||
"run_count": 3,
|
||||
"run_count": 3,
|
||||
"target": "search::=trace",
|
||||
"extra_cli_args": [],
|
||||
"assets": {
|
||||
|
||||