Compare commits

7 Commits

Author SHA1 Message Date
meili-bors[bot]
5908aec6cb Merge #5192
Some checks failed
Test suite / Tests almost all features (push) Has been skipped
Test suite / Test disabled tokenization (push) Has been skipped
Test suite / Tests on ubuntu-20.04 (push) Failing after 12s
Test suite / Run tests in debug (push) Failing after 12s
Test suite / Tests on ${{ matrix.os }} (windows-2022) (push) Failing after 28s
Test suite / Run Rustfmt (push) Successful in 2m16s
Test suite / Run Clippy (push) Successful in 6m20s
Test suite / Tests on ${{ matrix.os }} (macos-13) (push) Has been cancelled
5192: Fix empty document addition r=irevoire a=irevoire

# Pull Request

## Related issue
Fixes #5190

## What does this PR do?
- Improve a test just to make sure this issue never arises again
- Fix the issue

For the reviewer: Calling `add_documents` with an empty `mmap` seems to work, but does it impact the perf in a significant way?

Co-authored-by: Tamo <tamo@meilisearch.com>
2024-12-31 17:11:10 +00:00
Tamo
19f48c15fb Fix the addition of empty payload 2024-12-31 18:00:14 +01:00
Tamo
47b484c07c update the test to ensure it works when specifying the primary key or not: it doesn't work 2024-12-31 17:24:32 +01:00
meili-bors[bot]
7d5e28b475 Merge #5193
5193: Update version for the next release (v1.12.1) in Cargo.toml r=irevoire a=meili-bot

⚠️ This PR is automatically generated. Check the new version is the expected one and Cargo.lock has been updated before merging.

Co-authored-by: curquiza <curquiza@users.noreply.github.com>
2024-12-31 09:40:31 +00:00
curquiza
0648e06aa2 Update version for the next release (v1.12.1) in Cargo.toml 2024-12-30 17:36:46 +00:00
Tamo
33921747b7 stop skipping empty tasks when adding documents 2024-12-30 17:48:25 +01:00
Tamo
970a489dcc add a test reproducing the bug 2024-12-30 16:21:06 +01:00
940 changed files with 32007 additions and 89076 deletions

View File

@@ -1,39 +1,27 @@
---
name: New feature issue
about: ⚠️ Should only be used by the internal Meili team ⚠️
name: New sprint issue
about: ⚠️ Should only be used by the engine team ⚠️
title: ''
labels: 'impacts docs, impacts integrations'
labels: 'missing usage in PRD, impacts docs'
assignees: ''
---
Related product team resources: [PRD]() (_internal only_)
Related product discussion:
## Motivation
<!---Copy/paste the information in PRD or briefly detail the product motivation. Ask product team if any hesitation.-->
## Usage
<!---Link to the public part of the PRD, or to the related product discussion for experimental features-->
TBD
## TODO
<!---If necessary, create a list with technical/product steps-->
### Are you modifying a database?
- [ ] If not, add the `no db change` label to your PR, and you're good to merge.
- [ ] If yes, add the `db change` label to your PR. You'll receive a message explaining what to do.
### Reminders when modifying the API
- [ ] Update the openAPI file with utoipa:
- [ ] If a new module has been introduced, create a new structure deriving [the OpenAPI proc-macro](https://docs.rs/utoipa/latest/utoipa/derive.OpenApi.html) and nest it in the main [openAPI structure](https://github.com/meilisearch/meilisearch/blob/f2185438eed60fa32d25b15480c5ee064f6fba4a/crates/meilisearch/src/routes/mod.rs#L64-L78).
- [ ] If a new route has been introduced, add the [path decorator](https://docs.rs/utoipa/latest/utoipa/attr.path.html) to it and add the route at the top of the file in its openAPI structure.
- [ ] If a structure which is deserialized or serialized in the API has been introduced or modified, it must derive the [`schema`](https://docs.rs/utoipa/latest/utoipa/macro.schema.html) or the [`IntoParams`](https://docs.rs/utoipa/latest/utoipa/derive.IntoParams.html) proc-macro.
If it's a **new** structure you must also add it to the big list of structures [in the main `OpenApi` structure](https://github.com/meilisearch/meilisearch/blob/f2185438eed60fa32d25b15480c5ee064f6fba4a/crates/meilisearch/src/routes/mod.rs#L88).
- [ ] Once everything is done, start Meilisearch with the swagger flag: `cargo run --features swagger`, open `http://localhost:7700/scalar` on your browser, and ensure everything works as expected.
- For more info, refer to [this presentation](https://pitch.com/v/generating-the-openapi-file-jrn3nh).
### Reminders when modifying the Setting API
<!--- Special steps to remind when adding a new index setting -->
@@ -52,5 +40,5 @@ TBD
## Impacted teams
<!---Ping the related teams. Ask on Slack if any hesitation-->
<!---@meilisearch/docs-team and @meilisearch/integration-team when there is any API change, e.g. settings addition-->
<!---Ping the related teams. Ask for the engine manager if any hesitation-->
<!---@meilisearch/docs-team when there is any API change, e.g. settings addition-->

View File

@@ -7,5 +7,6 @@ updates:
schedule:
interval: "monthly"
labels:
- 'skip changelog'
- 'dependencies'
rebase-strategy: disabled

View File

@@ -1,16 +0,0 @@
## Related issue
Fixes #...
## Requirements
⚠️ Ensure the following requirements before merging ⚠️
- [ ] Automated tests have been added.
- [ ] If some tests cannot be automated, manual rigorous tests should be applied.
- [ ] ⚠️ If there is any change in the DB:
- [ ] Test that any impacted DB still works as expected after using `--experimental-dumpless-upgrade` on a DB created with the last released Meilisearch
- [ ] Test that during the upgrade, **search is still available** (artificially make the upgrade longer if needed)
- [ ] Set the `db change` label.
- [ ] If necessary, the feature has been tested in the Cloud production environment (with [prototypes](./documentation/prototypes.md)) and the Cloud UI is ready.
- [ ] If necessary, the [documentation](https://github.com/meilisearch/documentation) related to the implemented feature in the PR is ready.
- [ ] If necessary, the [integrations](https://github.com/meilisearch/integration-guides) related to the implemented feature in the PR are ready.

View File

@@ -1,22 +0,0 @@
This issue is about updating Meilisearch dependencies:
- [ ] Update Meilisearch dependencies with the help of `cargo +nightly udeps --all-targets` (remove unused dependencies) and `cargo upgrade` (upgrade dependencies versions) - ⚠️ Some repositories may contain subdirectories (like heed, charabia, or deserr). Take care of updating these in the main crate as well. This won't be done automatically by `cargo upgrade`.
- [ ] [deserr](https://github.com/meilisearch/deserr)
- [ ] [charabia](https://github.com/meilisearch/charabia/)
- [ ] [heed](https://github.com/meilisearch/heed/)
- [ ] [roaring-rs](https://github.com/RoaringBitmap/roaring-rs/)
- [ ] [obkv](https://github.com/meilisearch/obkv)
- [ ] [grenad](https://github.com/meilisearch/grenad/)
- [ ] [arroy](https://github.com/meilisearch/arroy/)
- [ ] [segment](https://github.com/meilisearch/segment)
- [ ] [bumparaw-collections](https://github.com/meilisearch/bumparaw-collections)
- [ ] [bbqueue](https://github.com/meilisearch/bbqueue)
- [ ] Finally, [Meilisearch](https://github.com/meilisearch/MeiliSearch)
- [ ] If new Rust versions have been released, update the minimal Rust version in use at Meilisearch:
- [ ] in this [GitHub Action file](https://github.com/meilisearch/meilisearch/blob/main/.github/workflows/test-suite.yml), by changing the `toolchain` field of the `rustfmt` job to the latest available nightly (of the day before or the current day).
- [ ] in every [GitHub Action files](https://github.com/meilisearch/meilisearch/blob/main/.github/workflows), by changing all the `dtolnay/rust-toolchain@` references to use the latest stable version.
- [ ] in this [`rust-toolchain.toml`](https://github.com/meilisearch/meilisearch/blob/main/rust-toolchain.toml), by changing the `channel` field to the latest stable version.
- [ ] in the [Dockerfile](https://github.com/meilisearch/meilisearch/blob/main/Dockerfile), by changing the base image to `rust:<target_rust_version>-alpine<alpine_version>`. Check that the image exists on [Dockerhub](https://hub.docker.com/_/rust/tags?page=1&name=alpine). Also, build and run the image to check everything still works!
⚠️ This issue should be prioritized to avoid any deprecation and vulnerability issues.
The GitHub action dependencies are managed by [Dependabot](https://github.com/meilisearch/meilisearch/blob/main/.github/dependabot.yml), so no need to update them when solving this issue.

View File

@@ -1,27 +1,28 @@
name: Bench (manual)
on:
workflow_dispatch:
inputs:
workload:
description: "The path to the workloads to execute (workloads/...)"
required: true
default: "workloads/movies.json"
workflow_dispatch:
inputs:
workload:
description: 'The path to the workloads to execute (workloads/...)'
required: true
default: 'workloads/movies.json'
env:
WORKLOAD_NAME: ${{ github.event.inputs.workload }}
WORKLOAD_NAME: ${{ github.event.inputs.workload }}
jobs:
benchmarks:
name: Run and upload benchmarks
runs-on: benchmarks
timeout-minutes: 180 # 3h
steps:
- uses: actions/checkout@v5
- uses: dtolnay/rust-toolchain@1.89
with:
profile: minimal
benchmarks:
name: Run and upload benchmarks
runs-on: benchmarks
timeout-minutes: 180 # 3h
steps:
- uses: actions/checkout@v3
- uses: dtolnay/rust-toolchain@1.79
with:
profile: minimal
- name: Run benchmarks - workload ${WORKLOAD_NAME} - branch ${{ github.ref }} - commit ${{ github.sha }}
run: |
cargo xtask bench --api-key "${{ secrets.BENCHMARK_API_KEY }}" --dashboard-url "${{ vars.BENCHMARK_DASHBOARD_URL }}" --reason "Manual [Run #${{ github.run_id }}](https://github.com/meilisearch/meilisearch/actions/runs/${{ github.run_id }})" -- ${WORKLOAD_NAME}
- name: Run benchmarks - workload ${WORKLOAD_NAME} - branch ${{ github.ref }} - commit ${{ github.sha }}
run: |
cargo xtask bench --api-key "${{ secrets.BENCHMARK_API_KEY }}" --dashboard-url "${{ vars.BENCHMARK_DASHBOARD_URL }}" --reason "Manual [Run #${{ github.run_id }}](https://github.com/meilisearch/meilisearch/actions/runs/${{ github.run_id }})" -- ${WORKLOAD_NAME}

View File

@@ -1,82 +1,82 @@
name: Bench (PR)
on:
issue_comment:
types: [created]
issue_comment:
types: [created]
permissions:
issues: write
issues: write
env:
GH_TOKEN: ${{ secrets.MEILI_BOT_GH_PAT }}
GH_TOKEN: ${{ secrets.MEILI_BOT_GH_PAT }}
jobs:
run-benchmarks-on-comment:
if: startsWith(github.event.comment.body, '/bench')
name: Run and upload benchmarks
runs-on: benchmarks
timeout-minutes: 180 # 3h
steps:
- name: Check permissions
id: permission
env:
PR_AUTHOR: ${{github.event.issue.user.login }}
COMMENT_AUTHOR: ${{github.event.comment.user.login }}
REPOSITORY: ${{github.repository}}
PR_ID: ${{github.event.issue.number}}
run: |
PR_REPOSITORY=$(gh api /repos/"$REPOSITORY"/pulls/"$PR_ID" --jq .head.repo.full_name)
if $(gh api /repos/"$REPOSITORY"/collaborators/"$PR_AUTHOR"/permission --jq .user.permissions.push)
then
echo "::notice title=Authentication success::PR author authenticated"
else
echo "::error title=Authentication error::PR author doesn't have push permission on this repository"
exit 1
fi
if $(gh api /repos/"$REPOSITORY"/collaborators/"$COMMENT_AUTHOR"/permission --jq .user.permissions.push)
then
echo "::notice title=Authentication success::Comment author authenticated"
else
echo "::error title=Authentication error::Comment author doesn't have push permission on this repository"
exit 1
fi
if [ "$PR_REPOSITORY" = "$REPOSITORY" ]
then
echo "::notice title=Authentication success::PR started from main repository"
else
echo "::error title=Authentication error::PR started from a fork"
exit 1
fi
run-benchmarks-on-comment:
if: startsWith(github.event.comment.body, '/bench')
name: Run and upload benchmarks
runs-on: benchmarks
timeout-minutes: 180 # 3h
steps:
- name: Check permissions
id: permission
env:
PR_AUTHOR: ${{github.event.issue.user.login }}
COMMENT_AUTHOR: ${{github.event.comment.user.login }}
REPOSITORY: ${{github.repository}}
PR_ID: ${{github.event.issue.number}}
run: |
PR_REPOSITORY=$(gh api /repos/"$REPOSITORY"/pulls/"$PR_ID" --jq .head.repo.full_name)
if $(gh api /repos/"$REPOSITORY"/collaborators/"$PR_AUTHOR"/permission --jq .user.permissions.push)
then
echo "::notice title=Authentication success::PR author authenticated"
else
echo "::error title=Authentication error::PR author doesn't have push permission on this repository"
exit 1
fi
if $(gh api /repos/"$REPOSITORY"/collaborators/"$COMMENT_AUTHOR"/permission --jq .user.permissions.push)
then
echo "::notice title=Authentication success::Comment author authenticated"
else
echo "::error title=Authentication error::Comment author doesn't have push permission on this repository"
exit 1
fi
if [ "$PR_REPOSITORY" = "$REPOSITORY" ]
then
echo "::notice title=Authentication success::PR started from main repository"
else
echo "::error title=Authentication error::PR started from a fork"
exit 1
fi
- name: Check for Command
id: command
uses: xt0rted/slash-command-action@v2
with:
command: bench
reaction-type: "rocket"
repo-token: ${{ env.GH_TOKEN }}
- name: Check for Command
id: command
uses: xt0rted/slash-command-action@v2
with:
command: bench
reaction-type: "rocket"
repo-token: ${{ env.GH_TOKEN }}
- uses: xt0rted/pull-request-comment-branch@v3
id: comment-branch
with:
repo_token: ${{ env.GH_TOKEN }}
- uses: xt0rted/pull-request-comment-branch@v2
id: comment-branch
with:
repo_token: ${{ env.GH_TOKEN }}
- uses: actions/checkout@v5
if: success()
with:
fetch-depth: 0 # fetch full history to be able to get main commit sha
ref: ${{ steps.comment-branch.outputs.head_ref }}
- uses: actions/checkout@v3
if: success()
with:
fetch-depth: 0 # fetch full history to be able to get main commit sha
ref: ${{ steps.comment-branch.outputs.head_ref }}
- uses: dtolnay/rust-toolchain@1.89
with:
profile: minimal
- uses: dtolnay/rust-toolchain@1.79
with:
profile: minimal
- name: Run benchmarks on PR ${{ github.event.issue.id }}
run: |
cargo xtask bench --api-key "${{ secrets.BENCHMARK_API_KEY }}" \
--dashboard-url "${{ vars.BENCHMARK_DASHBOARD_URL }}" \
--reason "[Comment](${{ github.event.comment.html_url }}) on [#${{ github.event.issue.number }}](${{ github.event.issue.html_url }})" \
-- ${{ steps.command.outputs.command-arguments }} > benchlinks.txt
- name: Run benchmarks on PR ${{ github.event.issue.id }}
run: |
cargo xtask bench --api-key "${{ secrets.BENCHMARK_API_KEY }}" \
--dashboard-url "${{ vars.BENCHMARK_DASHBOARD_URL }}" \
--reason "[Comment](${{ github.event.comment.html_url }}) on [#${{ github.event.issue.number }}](${{ github.event.issue.html_url }})" \
-- ${{ steps.command.outputs.command-arguments }} > benchlinks.txt
- name: Send comment in PR
run: |
gh pr comment ${{github.event.issue.number}} --body-file benchlinks.txt
- name: Send comment in PR
run: |
gh pr comment ${{github.event.issue.number}} --body-file benchlinks.txt
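
The "Check permissions" step above gates the benchmark run on three conditions: the PR author and the comment author both have push access, and the PR branch lives in the main repository rather than a fork. A minimal sketch of that gate follows. It assumes a hypothetical `has_push_permission` stub in place of the real `gh api` lookup, and it compares the lookup's output as a string instead of executing it as a command the way the workflow does. The user names and the `may_run_bench` helper are illustrative only and do not appear in the repository.

```shell
# Hypothetical stub standing in for:
#   gh api /repos/"$REPOSITORY"/collaborators/"$USER"/permission --jq .user.permissions.push
# "alice" and "bob" are made-up collaborators with push access.
has_push_permission() {
  case "$1" in
    alice|bob) echo true ;;
    *) echo false ;;
  esac
}

# Hypothetical helper: succeeds only when all three checks from the step pass.
may_run_bench() {
  pr_author="$1"
  comment_author="$2"
  pr_repository="$3"
  repository="$4"

  if [ "$(has_push_permission "$pr_author")" != "true" ]; then
    echo "PR author doesn't have push permission on this repository"
    return 1
  fi
  if [ "$(has_push_permission "$comment_author")" != "true" ]; then
    echo "Comment author doesn't have push permission on this repository"
    return 1
  fi
  if [ "$pr_repository" != "$repository" ]; then
    echo "PR started from a fork"
    return 1
  fi
  echo "ok"
}
```

Comparing against the literal string `true` means an unexpected or empty API response is treated as a refusal rather than being run as a command.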

View File

@@ -1,22 +1,23 @@
name: Indexing bench (push)
on:
push:
branches:
- main
push:
branches:
- main
jobs:
benchmarks:
name: Run and upload benchmarks
runs-on: benchmarks
timeout-minutes: 180 # 3h
steps:
- uses: actions/checkout@v5
- uses: dtolnay/rust-toolchain@1.89
with:
profile: minimal
benchmarks:
name: Run and upload benchmarks
runs-on: benchmarks
timeout-minutes: 180 # 3h
steps:
- uses: actions/checkout@v3
- uses: dtolnay/rust-toolchain@1.79
with:
profile: minimal
# Run benchmarks
- name: Run benchmarks - Dataset ${BENCH_NAME} - Branch main - Commit ${{ github.sha }}
run: |
cargo xtask bench --api-key "${{ secrets.BENCHMARK_API_KEY }}" --dashboard-url "${{ vars.BENCHMARK_DASHBOARD_URL }}" --reason "Push on `main` [Run #${{ github.run_id }}](https://github.com/meilisearch/meilisearch/actions/runs/${{ github.run_id }})" -- workloads/*.json
# Run benchmarks
- name: Run benchmarks - Dataset ${BENCH_NAME} - Branch main - Commit ${{ github.sha }}
run: |
cargo xtask bench --api-key "${{ secrets.BENCHMARK_API_KEY }}" --dashboard-url "${{ vars.BENCHMARK_DASHBOARD_URL }}" --reason "Push on `main` [Run #${{ github.run_id }}](https://github.com/meilisearch/meilisearch/actions/runs/${{ github.run_id }})" -- workloads/*.json

View File

@@ -4,9 +4,9 @@ on:
workflow_dispatch:
inputs:
dataset_name:
description: "The name of the dataset used to benchmark (search_songs, search_wiki, search_geo or indexing)"
description: 'The name of the dataset used to benchmark (search_songs, search_wiki, search_geo or indexing)'
required: false
default: "search_songs"
default: 'search_songs'
env:
BENCH_NAME: ${{ github.event.inputs.dataset_name }}
@@ -17,8 +17,8 @@ jobs:
runs-on: benchmarks
timeout-minutes: 4320 # 72h
steps:
- uses: actions/checkout@v5
- uses: dtolnay/rust-toolchain@1.89
- uses: actions/checkout@v3
- uses: dtolnay/rust-toolchain@1.79
with:
profile: minimal
@@ -67,7 +67,7 @@ jobs:
out_dir: critcmp_results
# Helper
- name: "README: compare with another benchmark"
- name: 'README: compare with another benchmark'
run: |
echo "${{ steps.file.outputs.basename }}.json has just been pushed."
echo 'How to compare this benchmark with another one?'

View File

@@ -44,7 +44,7 @@ jobs:
exit 1
fi
- uses: dtolnay/rust-toolchain@1.89
- uses: dtolnay/rust-toolchain@1.79
with:
profile: minimal
@@ -56,12 +56,12 @@ jobs:
reaction-type: "eyes"
repo-token: ${{ env.GH_TOKEN }}
- uses: xt0rted/pull-request-comment-branch@v3
- uses: xt0rted/pull-request-comment-branch@v2
id: comment-branch
with:
repo_token: ${{ env.GH_TOKEN }}
- uses: actions/checkout@v5
- uses: actions/checkout@v3
if: success()
with:
fetch-depth: 0 # fetch full history to be able to get main commit sha

View File

@@ -15,8 +15,8 @@ jobs:
runs-on: benchmarks
timeout-minutes: 4320 # 72h
steps:
- uses: actions/checkout@v5
- uses: dtolnay/rust-toolchain@1.89
- uses: actions/checkout@v3
- uses: dtolnay/rust-toolchain@1.79
with:
profile: minimal
@@ -69,7 +69,7 @@ jobs:
run: telegraf --config https://eu-central-1-1.aws.cloud2.influxdata.com/api/v2/telegrafs/08b52e34a370b000 --once --debug
# Helper
- name: "README: compare with another benchmark"
- name: 'README: compare with another benchmark'
run: |
echo "${{ steps.file.outputs.basename }}.json has just been pushed."
echo 'How to compare this benchmark with another one?'

View File

@@ -14,8 +14,8 @@ jobs:
name: Run and upload benchmarks
runs-on: benchmarks
steps:
- uses: actions/checkout@v5
- uses: dtolnay/rust-toolchain@1.89
- uses: actions/checkout@v3
- uses: dtolnay/rust-toolchain@1.79
with:
profile: minimal
@@ -68,7 +68,7 @@ jobs:
run: telegraf --config https://eu-central-1-1.aws.cloud2.influxdata.com/api/v2/telegrafs/08b52e34a370b000 --once --debug
# Helper
- name: "README: compare with another benchmark"
- name: 'README: compare with another benchmark'
run: |
echo "${{ steps.file.outputs.basename }}.json has just been pushed."
echo 'How to compare this benchmark with another one?'

View File

@@ -14,8 +14,8 @@ jobs:
name: Run and upload benchmarks
runs-on: benchmarks
steps:
- uses: actions/checkout@v5
- uses: dtolnay/rust-toolchain@1.89
- uses: actions/checkout@v3
- uses: dtolnay/rust-toolchain@1.79
with:
profile: minimal
@@ -68,7 +68,7 @@ jobs:
run: telegraf --config https://eu-central-1-1.aws.cloud2.influxdata.com/api/v2/telegrafs/08b52e34a370b000 --once --debug
# Helper
- name: "README: compare with another benchmark"
- name: 'README: compare with another benchmark'
run: |
echo "${{ steps.file.outputs.basename }}.json has just been pushed."
echo 'How to compare this benchmark with another one?'

View File

@@ -14,8 +14,8 @@ jobs:
name: Run and upload benchmarks
runs-on: benchmarks
steps:
- uses: actions/checkout@v5
- uses: dtolnay/rust-toolchain@1.89
- uses: actions/checkout@v3
- uses: dtolnay/rust-toolchain@1.79
with:
profile: minimal
@@ -68,7 +68,7 @@ jobs:
run: telegraf --config https://eu-central-1-1.aws.cloud2.influxdata.com/api/v2/telegrafs/08b52e34a370b000 --once --debug
# Helper
- name: "README: compare with another benchmark"
- name: 'README: compare with another benchmark'
run: |
echo "${{ steps.file.outputs.basename }}.json has just been pushed."
echo 'How to compare this benchmark with another one?'

View File

@@ -1,57 +0,0 @@
name: Comment when db change labels are added
on:
pull_request:
types: [labeled]
env:
MESSAGE: |
### Hello, I'm a bot 🤖
You are receiving this message because you declared that this PR makes changes to the Meilisearch database.
Depending on the nature of the change, additional actions might be required on your part. The following sections detail the additional actions depending on the nature of the change, please copy the relevant section in the description of your PR, and make sure to perform the required actions.
Thank you for contributing to Meilisearch :heart:
## This PR makes forward-compatible changes
*Forward-compatible changes are changes to the database such that databases created in an older version of Meilisearch are still valid in the new version of Meilisearch. They usually represent additive changes, like adding a new optional attribute or setting.*
- [ ] Detail the change to the DB format and why they are forward compatible
- [ ] Forward-compatibility: A database created before this PR and using the features touched by this PR was able to be opened by a Meilisearch produced by the code of this PR.
## This PR makes breaking changes
*Breaking changes are changes to the database such that databases created in an older version of Meilisearch need changes to remain valid in the new version of Meilisearch. This typically happens when the way to store the data changed (change of database, new required key, etc). This can also happen due to breaking changes in the API of an experimental feature. ⚠️ This kind of change is more difficult to achieve safely, so proceed with caution and test dumpless upgrade right before merging the PR.*
- [ ] Detail the changes to the DB format,
- [ ] which are compatible, and why
- [ ] which are not compatible, why, and how they will be fixed up in the upgrade
- [ ] /!\ Ensure all the read operations still work!
- If the change happened in milli, you may need to check the version of the database before doing any read operation
- If the change happened in the index-scheduler, make sure the new code can immediately read the old database
- If the change happened in the meilisearch-auth database, reach out to the team; we don't know yet how to handle these changes
- [ ] Write the code to go from the old database to the new one
- If the change happened in milli, the upgrade function should be written and called [here](https://github.com/meilisearch/meilisearch/blob/3fd86e8d76d7d468b0095d679adb09211ca3b6c0/crates/milli/src/update/upgrade/mod.rs#L24-L47)
- If the change happened in the index-scheduler, we've never done it yet, but the right place to do it should be [here](https://github.com/meilisearch/meilisearch/blob/3fd86e8d76d7d468b0095d679adb09211ca3b6c0/crates/index-scheduler/src/scheduler/process_upgrade/mod.rs#L13)
- [ ] Write an integration test [here](https://github.com/meilisearch/meilisearch/blob/main/crates/meilisearch/tests/upgrade/mod.rs) ensuring you can read the old database, upgrade to the new database, and read the new database as expected
jobs:
add-comment:
runs-on: ubuntu-latest
if: github.event.label.name == 'db change'
steps:
- name: Add comment
uses: actions/github-script@v7
with:
github-token: ${{ secrets.GITHUB_TOKEN }}
script: |
const message = process.env.MESSAGE;
github.rest.issues.createComment({
issue_number: context.issue.number,
owner: context.repo.owner,
repo: context.repo.repo,
body: message
})

View File

@@ -1,28 +0,0 @@
name: Check db change labels
on:
pull_request:
types: [opened, synchronize, reopened, labeled, unlabeled]
jobs:
check-labels:
runs-on: ubuntu-latest
steps:
- name: Checkout code
uses: actions/checkout@v5
- name: Check db change labels
id: check_labels
env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
run: |
URL=/repos/meilisearch/meilisearch/pulls/${{ github.event.pull_request.number }}/labels
echo ${{ github.event.pull_request.number }}
echo $URL
LABELS=$(gh api -H "Accept: application/vnd.github+json" -H "X-GitHub-Api-Version: 2022-11-28" /repos/${{ github.repository }}/issues/${{ github.event.pull_request.number }}/labels -q .[].name)
echo "Labels: $LABELS"
if [[ ! "$LABELS" =~ "db change" && ! "$LABELS" =~ "no db change" ]]; then
echo "::error::Pull request must contain either the 'db change' or 'no db change' label."
exit 1
else
echo "The label is set"
fi
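
The label check above uses a substring match, and since the text `no db change` itself contains `db change`, any label whose name includes `db change` satisfies it. The sketch below shows a stricter variant that matches whole label names only. It is not the workflow's own code: the `has_db_change_label` helper is hypothetical, and it assumes the label names arrive one per line, as the `-q .[].name` query would print them.

```shell
# Hypothetical helper: succeeds when the newline-separated label list in $1
# contains a label named exactly "db change" or exactly "no db change".
has_db_change_label() {
  labels="$1"
  # -F: fixed strings, -x: match the whole line, -q: no output, only the exit status.
  printf '%s\n' "$labels" | grep -Fxq -e 'db change' -e 'no db change'
}

# Example with made-up label lists.
if has_db_change_label "$(printf 'bug\nno db change')"; then
  echo "The label is set"
fi
```

With whole-line matching, a label such as `needs db change review` would no longer count as satisfying the requirement.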

View File

@@ -13,9 +13,9 @@ jobs:
ISSUE_TEMPLATE: issue-template.md
GH_TOKEN: ${{ secrets.MEILI_BOT_GH_PAT }}
steps:
- uses: actions/checkout@v5
- uses: actions/checkout@v3
- name: Download the issue template
run: curl -s https://raw.githubusercontent.com/meilisearch/meilisearch/main/.github/templates/dependency-issue.md > $ISSUE_TEMPLATE
run: curl -s https://raw.githubusercontent.com/meilisearch/engine-team/main/issue-templates/dependency-issue.md > $ISSUE_TEMPLATE
- name: Create issue
run: |
gh issue create \

View File

@@ -3,28 +3,28 @@ name: Look for flaky tests
on:
workflow_dispatch:
schedule:
- cron: '0 4 * * *' # Every day at 4:00AM
- cron: "0 12 * * FRI" # Every Friday at 12:00PM
jobs:
flaky:
runs-on: ubuntu-latest
container:
# Use ubuntu-22.04 to compile with glibc 2.35
image: ubuntu:22.04
# Use ubuntu-20.04 to compile with glibc 2.28
image: ubuntu:20.04
steps:
- uses: actions/checkout@v5
- name: Install needed dependencies
run: |
apt-get update && apt-get install -y curl
apt-get install build-essential -y
- uses: dtolnay/rust-toolchain@1.89
- name: Install cargo-flaky
run: cargo install cargo-flaky
- name: Run cargo flaky in the dumps
run: cd crates/dump; cargo flaky -i 100 --release
- name: Run cargo flaky in the index-scheduler
run: cd crates/index-scheduler; cargo flaky -i 100 --release
- name: Run cargo flaky in the auth
run: cd crates/meilisearch-auth; cargo flaky -i 100 --release
- name: Run cargo flaky in meilisearch
run: cd crates/meilisearch; cargo flaky -i 100 --release
- uses: actions/checkout@v3
- name: Install needed dependencies
run: |
apt-get update && apt-get install -y curl
apt-get install build-essential -y
- uses: dtolnay/rust-toolchain@1.79
- name: Install cargo-flaky
run: cargo install cargo-flaky
- name: Run cargo flaky in the dumps
run: cd crates/dump; cargo flaky -i 100 --release
- name: Run cargo flaky in the index-scheduler
run: cd crates/index-scheduler; cargo flaky -i 100 --release
- name: Run cargo flaky in the auth
run: cd crates/meilisearch-auth; cargo flaky -i 100 --release
- name: Run cargo flaky in meilisearch
run: cd crates/meilisearch; cargo flaky -i 100 --release

View File

@@ -11,8 +11,8 @@ jobs:
runs-on: ubuntu-latest
timeout-minutes: 4320 # 72h
steps:
- uses: actions/checkout@v5
- uses: dtolnay/rust-toolchain@1.89
- uses: actions/checkout@v3
- uses: dtolnay/rust-toolchain@1.79
with:
profile: minimal

View File

@@ -10,7 +10,7 @@ jobs:
name: Check the version validity
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v5
- uses: actions/checkout@v3
- name: Check release validity
if: github.event_name == 'release'
run: bash .github/scripts/check-release.sh
@@ -19,7 +19,7 @@ jobs:
runs-on: ubuntu-latest
needs: check-version
steps:
- uses: actions/checkout@v5
- uses: actions/checkout@v3
- uses: rickstaa/action-create-tag@v1
with:
tag: "latest"

.github/workflows/milestone-workflow.yml (new file, 192 lines)
View File

@@ -0,0 +1,192 @@
name: Milestone's workflow
# /!\ No git flow are handled here
# For each Milestone created (not opened!), and if the release is NOT a patch release (only the patch changed)
# - the roadmap issue is created, see https://github.com/meilisearch/engine-team/blob/main/issue-templates/roadmap-issue.md
# - the changelog issue is created, see https://github.com/meilisearch/engine-team/blob/main/issue-templates/changelog-issue.md
# For each Milestone closed
# - the `release_version` label is created
# - this label is applied to all issues/PRs in the Milestone
on:
milestone:
types: [created, closed]
env:
MILESTONE_VERSION: ${{ github.event.milestone.title }}
MILESTONE_URL: ${{ github.event.milestone.html_url }}
MILESTONE_DUE_ON: ${{ github.event.milestone.due_on }}
GH_TOKEN: ${{ secrets.MEILI_BOT_GH_PAT }}
jobs:
# -----------------
# MILESTONE CREATED
# -----------------
get-release-version:
if: github.event.action == 'created'
runs-on: ubuntu-latest
outputs:
is-patch: ${{ steps.check-patch.outputs.is-patch }}
steps:
- uses: actions/checkout@v3
- name: Check if this release is a patch release only
id: check-patch
run: |
echo version: $MILESTONE_VERSION
if [[ $MILESTONE_VERSION =~ ^v[0-9]+\.[0-9]+\.0$ ]]; then
echo 'This is NOT a patch release'
echo "is-patch=false" >> $GITHUB_OUTPUT
elif [[ $MILESTONE_VERSION =~ ^v[0-9]+\.[0-9]+\.[0-9]+$ ]]; then
echo 'This is a patch release'
echo "is-patch=true" >> $GITHUB_OUTPUT
else
echo "Not a valid format of release, check the Milestone's title."
echo 'Should be vX.Y.Z'
exit 1
fi
create-roadmap-issue:
needs: get-release-version
# Create the roadmap issue if the release is not only a patch release
if: github.event.action == 'created' && needs.get-release-version.outputs.is-patch == 'false'
runs-on: ubuntu-latest
env:
ISSUE_TEMPLATE: issue-template.md
steps:
- uses: actions/checkout@v3
- name: Download the issue template
run: curl -s https://raw.githubusercontent.com/meilisearch/engine-team/main/issue-templates/roadmap-issue.md > $ISSUE_TEMPLATE
- name: Replace all empty occurrences in the templates
run: |
# Replace all <<version>> occurrences
sed -i "s/<<version>>/$MILESTONE_VERSION/g" $ISSUE_TEMPLATE
# Replace all <<milestone_id>> occurrences
milestone_id=$(echo $MILESTONE_URL | cut -d '/' -f 7)
sed -i "s/<<milestone_id>>/$milestone_id/g" $ISSUE_TEMPLATE
# Replace release date if exists
if [[ ! -z $MILESTONE_DUE_ON ]]; then
date=$(echo $MILESTONE_DUE_ON | cut -d 'T' -f 1)
sed -i "s/Release date\: 20XX-XX-XX/Release date\: $date/g" $ISSUE_TEMPLATE
fi
- name: Create the issue
run: |
gh issue create \
--title "$MILESTONE_VERSION ROADMAP" \
--label 'epic,impacts docs,impacts integrations,impacts cloud' \
--body-file $ISSUE_TEMPLATE \
--milestone $MILESTONE_VERSION
create-changelog-issue:
needs: get-release-version
# Create the changelog issue if the release is not only a patch release
if: github.event.action == 'created' && needs.get-release-version.outputs.is-patch == 'false'
runs-on: ubuntu-latest
env:
ISSUE_TEMPLATE: issue-template.md
steps:
- uses: actions/checkout@v3
- name: Download the issue template
run: curl -s https://raw.githubusercontent.com/meilisearch/engine-team/main/issue-templates/changelog-issue.md > $ISSUE_TEMPLATE
- name: Replace all empty occurrences in the templates
run: |
# Replace all <<version>> occurrences
sed -i "s/<<version>>/$MILESTONE_VERSION/g" $ISSUE_TEMPLATE
# Replace all <<milestone_id>> occurrences
milestone_id=$(echo $MILESTONE_URL | cut -d '/' -f 7)
sed -i "s/<<milestone_id>>/$milestone_id/g" $ISSUE_TEMPLATE
- name: Create the issue
run: |
gh issue create \
--title "Create release changelogs for $MILESTONE_VERSION" \
--label 'impacts docs,documentation' \
--body-file $ISSUE_TEMPLATE \
--milestone $MILESTONE_VERSION \
--assignee curquiza
create-update-version-issue:
needs: get-release-version
# Create the update-version issue even if the release is a patch release
if: github.event.action == 'created'
runs-on: ubuntu-latest
env:
ISSUE_TEMPLATE: issue-template.md
steps:
- uses: actions/checkout@v3
- name: Download the issue template
run: curl -s https://raw.githubusercontent.com/meilisearch/engine-team/main/issue-templates/update-version-issue.md > $ISSUE_TEMPLATE
- name: Create the issue
run: |
gh issue create \
--title "Update version in Cargo.toml for $MILESTONE_VERSION" \
--label 'maintenance' \
--body-file $ISSUE_TEMPLATE \
--milestone $MILESTONE_VERSION
create-update-openapi-issue:
needs: get-release-version
# Create the openAPI issue if the release is not only a patch release
if: github.event.action == 'created' && needs.get-release-version.outputs.is-patch == 'false'
runs-on: ubuntu-latest
env:
ISSUE_TEMPLATE: issue-template.md
steps:
- uses: actions/checkout@v3
- name: Download the issue template
run: curl -s https://raw.githubusercontent.com/meilisearch/engine-team/main/issue-templates/update-openapi-issue.md > $ISSUE_TEMPLATE
- name: Create the issue
run: |
gh issue create \
--title "Update Open API file for $MILESTONE_VERSION" \
--label 'maintenance' \
--body-file $ISSUE_TEMPLATE \
--milestone $MILESTONE_VERSION
# ----------------
# MILESTONE CLOSED
# ----------------
create-release-label:
if: github.event.action == 'closed'
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- name: Create the ${{ env.MILESTONE_VERSION }} label
run: |
label_description="PRs/issues solved in $MILESTONE_VERSION"
if [[ ! -z $MILESTONE_DUE_ON ]]; then
date=$(echo $MILESTONE_DUE_ON | cut -d 'T' -f 1)
label_description="$label_description released on $date"
fi
gh api repos/meilisearch/meilisearch/labels \
--method POST \
-H "Accept: application/vnd.github+json" \
-f name="$MILESTONE_VERSION" \
-f description="$label_description" \
-f color='ff5ba3'
labelize-all-milestone-content:
if: github.event.action == 'closed'
needs: create-release-label
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- name: Add label ${{ env.MILESTONE_VERSION }} to all PRs in the Milestone
run: |
prs=$(gh pr list --search milestone:"$MILESTONE_VERSION" --limit 1000 --state all --json number --template '{{range .}}{{tablerow (printf "%v" .number)}}{{end}}')
for pr in $prs; do
gh pr edit $pr --add-label $MILESTONE_VERSION
done
- name: Add label ${{ env.MILESTONE_VERSION }} to all issues in the Milestone
run: |
issues=$(gh issue list --search milestone:"$MILESTONE_VERSION" --limit 1000 --state all --json number --template '{{range .}}{{tablerow (printf "%v" .number)}}{{end}}')
for issue in $issues; do
gh issue edit $issue --add-label $MILESTONE_VERSION
done
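
The two `cut` calls in this workflow (field 7 of `$MILESTONE_URL` for the milestone ID, and the part of `$MILESTONE_DUE_ON` before the `T` for the date) can be checked locally. This is a minimal sketch, assuming the milestone URL has the shape `https://github.com/<owner>/<repo>/milestone/<id>` and the due date is an ISO 8601 timestamp. All three sample values are hypothetical.

```sh
# Hypothetical sample values; in the workflow these come from the milestone event.
MILESTONE_URL="https://github.com/meilisearch/meilisearch/milestone/42"
MILESTONE_DUE_ON="2025-01-06T08:00:00Z"
MILESTONE_VERSION="v1.13.0"

# Splitting on '/' gives: "https:", "", "github.com", owner, repo, "milestone", id.
milestone_id=$(echo "$MILESTONE_URL" | cut -d '/' -f 7)

# Same logic as the label step: append the date only when a due date is set.
label_description="PRs/issues solved in $MILESTONE_VERSION"
if [ -n "$MILESTONE_DUE_ON" ]; then
  date=$(echo "$MILESTONE_DUE_ON" | cut -d 'T' -f 1)
  label_description="$label_description released on $date"
fi

echo "$milestone_id"
echo "$label_description"
```

If the URL shape ever changed (for example, an extra path segment), field 7 would no longer be the ID, so the `cut` index is the fragile part of this step.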

View File

@@ -9,7 +9,7 @@ jobs:
name: Check the version validity
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v5
- uses: actions/checkout@v3
- name: Check release validity
run: bash .github/scripts/check-release.sh
@@ -18,28 +18,28 @@ jobs:
runs-on: ubuntu-latest
needs: check-version
container:
# Use ubuntu-22.04 to compile with glibc 2.35
image: ubuntu:22.04
# Use ubuntu-20.04 to compile with glibc 2.28
image: ubuntu:20.04
steps:
- name: Install needed dependencies
run: |
apt-get update && apt-get install -y curl
apt-get install build-essential -y
- uses: dtolnay/rust-toolchain@1.89
- name: Install cargo-deb
run: cargo install cargo-deb
- uses: actions/checkout@v5
- name: Build deb package
run: cargo deb -p meilisearch -o target/debian/meilisearch.deb
- name: Upload debian pkg to release
uses: svenstaro/upload-release-action@2.11.2
with:
repo_token: ${{ secrets.MEILI_BOT_GH_PAT }}
file: target/debian/meilisearch.deb
asset_name: meilisearch.deb
tag: ${{ github.ref }}
- name: Upload debian pkg to apt repository
run: curl -F package=@target/debian/meilisearch.deb https://${{ secrets.GEMFURY_PUSH_TOKEN }}@push.fury.io/meilisearch/
- name: Install needed dependencies
run: |
apt-get update && apt-get install -y curl
apt-get install build-essential -y
- uses: dtolnay/rust-toolchain@1.79
- name: Install cargo-deb
run: cargo install cargo-deb
- uses: actions/checkout@v3
- name: Build deb package
run: cargo deb -p meilisearch -o target/debian/meilisearch.deb
- name: Upload debian pkg to release
uses: svenstaro/upload-release-action@2.7.0
with:
repo_token: ${{ secrets.MEILI_BOT_GH_PAT }}
file: target/debian/meilisearch.deb
asset_name: meilisearch.deb
tag: ${{ github.ref }}
- name: Upload debian pkg to apt repository
run: curl -F package=@target/debian/meilisearch.deb https://${{ secrets.GEMFURY_PUSH_TOKEN }}@push.fury.io/meilisearch/
homebrew:
name: Bump Homebrew formula

View File

@@ -1,9 +1,9 @@
name: Publish assets to GitHub release
name: Publish binaries to GitHub release
on:
workflow_dispatch:
schedule:
- cron: "0 2 * * *" # Every day at 2:00am
- cron: '0 2 * * *' # Every day at 2:00am
release:
types: [published]
@@ -11,9 +11,9 @@ jobs:
check-version:
name: Check the version validity
runs-on: ubuntu-latest
# No need to check the version for dry run (cron or workflow_dispatch)
# No need to check the version for dry run (cron)
steps:
- uses: actions/checkout@v5
- uses: actions/checkout@v3
# Check if the tag has the v<nmumber>.<number>.<number> format.
# If yes, it means we are publishing an official release.
# If no, we are releasing a RC, so no need to check the version.
@@ -37,26 +37,26 @@ jobs:
runs-on: ubuntu-latest
needs: check-version
container:
# Use ubuntu-22.04 to compile with glibc 2.35
image: ubuntu:22.04
# Use ubuntu-20.04 to compile with glibc 2.28
image: ubuntu:20.04
steps:
- uses: actions/checkout@v5
- name: Install needed dependencies
run: |
apt-get update && apt-get install -y curl
apt-get install build-essential -y
- uses: dtolnay/rust-toolchain@1.89
- name: Build
run: cargo build --release --locked
# No need to upload binaries for dry run (cron or workflow_dispatch)
- name: Upload binaries to release
if: github.event_name == 'release'
uses: svenstaro/upload-release-action@2.11.2
with:
repo_token: ${{ secrets.MEILI_BOT_GH_PAT }}
file: target/release/meilisearch
asset_name: meilisearch-linux-amd64
tag: ${{ github.ref }}
- uses: actions/checkout@v3
- name: Install needed dependencies
run: |
apt-get update && apt-get install -y curl
apt-get install build-essential -y
- uses: dtolnay/rust-toolchain@1.79
- name: Build
run: cargo build --release --locked
# No need to upload binaries for dry run (cron)
- name: Upload binaries to release
if: github.event_name == 'release'
uses: svenstaro/upload-release-action@2.7.0
with:
repo_token: ${{ secrets.MEILI_BOT_GH_PAT }}
file: target/release/meilisearch
asset_name: meilisearch-linux-amd64
tag: ${{ github.ref }}
publish-macos-windows:
name: Publish binary for ${{ matrix.os }}
@@ -74,19 +74,19 @@ jobs:
artifact_name: meilisearch.exe
asset_name: meilisearch-windows-amd64.exe
steps:
- uses: actions/checkout@v5
- uses: dtolnay/rust-toolchain@1.89
- name: Build
run: cargo build --release --locked
# No need to upload binaries for dry run (cron or workflow_dispatch)
- name: Upload binaries to release
if: github.event_name == 'release'
uses: svenstaro/upload-release-action@2.11.2
with:
repo_token: ${{ secrets.MEILI_BOT_GH_PAT }}
file: target/release/${{ matrix.artifact_name }}
asset_name: ${{ matrix.asset_name }}
tag: ${{ github.ref }}
- uses: actions/checkout@v3
- uses: dtolnay/rust-toolchain@1.79
- name: Build
run: cargo build --release --locked
# No need to upload binaries for dry run (cron)
- name: Upload binaries to release
if: github.event_name == 'release'
uses: svenstaro/upload-release-action@2.7.0
with:
repo_token: ${{ secrets.MEILI_BOT_GH_PAT }}
file: target/release/${{ matrix.artifact_name }}
asset_name: ${{ matrix.asset_name }}
tag: ${{ github.ref }}
publish-macos-apple-silicon:
name: Publish binary for macOS silicon
@@ -99,9 +99,9 @@ jobs:
asset_name: meilisearch-macos-apple-silicon
steps:
- name: Checkout repository
uses: actions/checkout@v5
uses: actions/checkout@v3
- name: Installing Rust toolchain
uses: dtolnay/rust-toolchain@1.89
uses: dtolnay/rust-toolchain@1.79
with:
profile: minimal
target: ${{ matrix.target }}
@@ -111,9 +111,9 @@ jobs:
command: build
args: --release --target ${{ matrix.target }}
- name: Upload the binary to release
# No need to upload binaries for dry run (cron or workflow_dispatch)
# No need to upload binaries for dry run (cron)
if: github.event_name == 'release'
uses: svenstaro/upload-release-action@2.11.2
uses: svenstaro/upload-release-action@2.7.0
with:
repo_token: ${{ secrets.MEILI_BOT_GH_PAT }}
file: target/${{ matrix.target }}/release/meilisearch
@@ -127,8 +127,8 @@ jobs:
env:
DEBIAN_FRONTEND: noninteractive
container:
# Use ubuntu-22.04 to compile with glibc 2.35
image: ubuntu:22.04
# Use ubuntu-20.04 to compile with glibc 2.28
image: ubuntu:20.04
strategy:
matrix:
include:
@@ -136,7 +136,7 @@ jobs:
asset_name: meilisearch-linux-aarch64
steps:
- name: Checkout repository
uses: actions/checkout@v5
uses: actions/checkout@v3
- name: Install needed dependencies
run: |
apt-get update -y && apt upgrade -y
@@ -148,7 +148,7 @@ jobs:
add-apt-repository "deb [arch=$(dpkg --print-architecture)] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
apt-get update -y && apt-get install -y docker-ce
- name: Installing Rust toolchain
uses: dtolnay/rust-toolchain@1.89
uses: dtolnay/rust-toolchain@1.79
with:
profile: minimal
target: ${{ matrix.target }}
@@ -176,37 +176,11 @@ jobs:
- name: List target output files
run: ls -lR ./target
- name: Upload the binary to release
# No need to upload binaries for dry run (cron or workflow_dispatch)
# No need to upload binaries for dry run (cron)
if: github.event_name == 'release'
uses: svenstaro/upload-release-action@2.11.2
uses: svenstaro/upload-release-action@2.7.0
with:
repo_token: ${{ secrets.MEILI_BOT_GH_PAT }}
file: target/${{ matrix.target }}/release/meilisearch
asset_name: ${{ matrix.asset_name }}
tag: ${{ github.ref }}
publish-openapi-file:
name: Publish OpenAPI file
needs: check-version
runs-on: ubuntu-latest
steps:
- name: Checkout code
uses: actions/checkout@v5
- name: Setup Rust
uses: actions-rs/toolchain@v1
with:
toolchain: stable
override: true
- name: Generate OpenAPI file
run: |
cd crates/openapi-generator
cargo run --release -- --pretty --output ../../meilisearch.json
- name: Upload OpenAPI to Release
# No need to upload for dry run (cron or workflow_dispatch)
if: github.event_name == 'release'
uses: svenstaro/upload-release-action@2.11.2
with:
repo_token: ${{ secrets.MEILI_BOT_GH_PAT }}
file: ./meilisearch.json
asset_name: meilisearch-openapi.json
tag: ${{ github.ref }}
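
The comments in `check-version` distinguish an official release tag (`v<number>.<number>.<number>`) from an RC, but the step that performs the check is outside the hunks shown here. The sketch below is only one possible way to express that test; the helper name `is_official_release` and the sample tags are hypothetical and not taken from the workflow.

```sh
# Hypothetical helper, not the workflow's actual step: succeeds only for
# tags of the form v<number>.<number>.<number>.
is_official_release() {
  printf '%s\n' "$1" | grep -Eq '^v[0-9]+\.[0-9]+\.[0-9]+$'
}

for tag in v1.12.1 v1.13.0-rc.0 prototype-foo-1; do
  if is_official_release "$tag"; then
    echo "$tag: official release, check the version"
  else
    echo "$tag: not an official release, skip the check"
  fi
done
```

Anchoring the pattern with `^` and `$` is what keeps an RC suffix such as `-rc.0` from matching.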

View File

@@ -16,10 +16,8 @@ on:
jobs:
docker:
runs-on: docker
permissions:
id-token: write # This is needed to use Cosign in keyless mode
steps:
- uses: actions/checkout@v5
- uses: actions/checkout@v3
# If we are running a cron or manual job ('schedule' or 'workflow_dispatch' event), it means we are publishing the `nightly` tag, so not considered stable.
# If we have pushed a tag, and the tag has the v<nmumber>.<number>.<number> format, it means we are publishing an official release, so considered stable.
@@ -64,9 +62,6 @@ jobs:
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3
- name: Install cosign
uses: sigstore/cosign-installer@d7543c93d881b35a8faa02e8e3605f69b7a1ce62 # tag=v3.10.0
- name: Login to Docker Hub
uses: docker/login-action@v3
with:
@@ -90,7 +85,6 @@ jobs:
- name: Build and push
uses: docker/build-push-action@v6
id: build-and-push
with:
push: true
platforms: linux/amd64,linux/arm64
@@ -100,17 +94,6 @@ jobs:
COMMIT_DATE=${{ steps.build-metadata.outputs.date }}
GIT_TAG=${{ github.ref_name }}
- name: Sign the images with GitHub OIDC Token
env:
DIGEST: ${{ steps.build-and-push.outputs.digest }}
TAGS: ${{ steps.meta.outputs.tags }}
run: |
images=""
for tag in ${TAGS}; do
images+="${tag}@${DIGEST} "
done
cosign sign --yes ${images}
# /!\ Don't touch this without checking with Cloud team
- name: Send CI information to Cloud team
# Do not send if nightly build (i.e. 'schedule' or 'workflow_dispatch' event)
@@ -121,22 +104,3 @@ jobs:
repository: meilisearch/meilisearch-cloud
event-type: cloud-docker-build
client-payload: '{ "meilisearch_version": "${{ github.ref_name }}", "stable": "${{ steps.check-tag-format.outputs.stable }}" }'
# Send notification to Swarmia to notify of a deployment: https://app.swarmia.com
# - name: 'Setup jq'
# uses: dcarbone/install-jq-action
# - name: Send deployment to Swarmia
# if: github.event_name == 'push' && success()
# run: |
# JSON_STRING=$( jq --null-input --compact-output \
# --arg version "${{ github.ref_name }}" \
# --arg appName "meilisearch" \
# --arg environment "production" \
# --arg commitSha "${{ github.sha }}" \
# --arg repositoryFullName "${{ github.repository }}" \
# '{"version": $version, "appName": $appName, "environment": $environment, "commitSha": $commitSha, "repositoryFullName": $repositoryFullName}' )
# curl -H "Authorization: ${{ secrets.SWARMIA_DEPLOYMENTS_AUTHORIZATION }}" \
# -H "Content-Type: application/json" \
# -d "$JSON_STRING" \
# https://hook.swarmia.com/deployments
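
The signing step in this workflow pairs every tag with the digest of the pushed image before calling `cosign sign`. The sketch below reproduces only that string construction, using a POSIX-compatible concatenation instead of `+=`. The tag names and the digest are made-up values, and the final command is printed instead of executed.

```sh
# Hypothetical values; in the workflow they come from the metadata and build steps.
TAGS="getmeili/meilisearch:v1.24.0
getmeili/meilisearch:latest"
DIGEST="sha256:0123456789abcdef"

images=""
# Unquoted on purpose: word splitting turns each line of TAGS into one item.
for tag in ${TAGS}; do
  images="${images}${tag}@${DIGEST} "
done

# Print instead of signing, so the sketch has no side effects.
echo "cosign sign --yes ${images}"
```

Appending the digest to each tag means the signature refers to the exact image that was pushed, not to whatever the tag points at later.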

View File

@@ -9,7 +9,7 @@ on:
required: false
default: nightly
schedule:
- cron: '0 6 * * *' # Every day at 6:00am
- cron: "0 6 * * MON" # Every Monday at 6:00AM
env:
MEILI_MASTER_KEY: 'masterKey'
@@ -22,7 +22,7 @@ jobs:
outputs:
docker-image: ${{ steps.define-image.outputs.docker-image }}
steps:
- uses: actions/checkout@v5
- uses: actions/checkout@v4
- name: Define the Docker image we need to use
id: define-image
run: |
@@ -46,13 +46,13 @@ jobs:
MEILISEARCH_VERSION: ${{ needs.define-docker-image.outputs.docker-image }}
steps:
- uses: actions/checkout@v5
- uses: actions/checkout@v4
with:
repository: meilisearch/meilisearch-dotnet
- name: Setup .NET Core
uses: actions/setup-dotnet@v5
uses: actions/setup-dotnet@v4
with:
dotnet-version: "8.0.x"
dotnet-version: "6.0.x"
- name: Install dependencies
run: dotnet restore
- name: Build
@@ -75,7 +75,7 @@ jobs:
ports:
- '7700:7700'
steps:
- uses: actions/checkout@v5
- uses: actions/checkout@v4
with:
repository: meilisearch/meilisearch-dart
- uses: dart-lang/setup-dart@v1
@@ -100,10 +100,10 @@ jobs:
- '7700:7700'
steps:
- name: Set up Go
uses: actions/setup-go@v6
uses: actions/setup-go@v5
with:
go-version: stable
- uses: actions/checkout@v5
- uses: actions/checkout@v4
with:
repository: meilisearch/meilisearch-go
- name: Get dependencies
@@ -114,7 +114,7 @@ jobs:
dep ensure
fi
- name: Run integration tests
run: go test --race -v ./integration
run: go test -v ./...
meilisearch-java-tests:
needs: define-docker-image
@@ -129,19 +129,19 @@ jobs:
ports:
- '7700:7700'
steps:
- uses: actions/checkout@v5
- uses: actions/checkout@v4
with:
repository: meilisearch/meilisearch-java
- name: Set up Java
uses: actions/setup-java@v5
uses: actions/setup-java@v4
with:
java-version: 17
distribution: 'temurin'
java-version: 8
distribution: 'zulu'
cache: gradle
- name: Grant execute permission for gradlew
run: chmod +x gradlew
- name: Build and run unit and integration tests
run: ./gradlew build integrationTest --info
run: ./gradlew build integrationTest
meilisearch-js-tests:
needs: define-docker-image
@@ -156,11 +156,11 @@ jobs:
ports:
- '7700:7700'
steps:
- uses: actions/checkout@v5
- uses: actions/checkout@v4
with:
repository: meilisearch/meilisearch-js
- name: Setup node
uses: actions/setup-node@v5
uses: actions/setup-node@v4
with:
cache: 'yarn'
- name: Install dependencies
@@ -191,7 +191,7 @@ jobs:
ports:
- '7700:7700'
steps:
- uses: actions/checkout@v5
- uses: actions/checkout@v4
with:
repository: meilisearch/meilisearch-php
- name: Install PHP
@@ -220,11 +220,11 @@ jobs:
ports:
- '7700:7700'
steps:
- uses: actions/checkout@v5
- uses: actions/checkout@v4
with:
repository: meilisearch/meilisearch-python
- name: Set up Python
uses: actions/setup-python@v6
uses: actions/setup-python@v5
- name: Install pipenv
uses: dschep/install-pipenv-action@v1
- name: Install dependencies
@@ -245,7 +245,7 @@ jobs:
ports:
- '7700:7700'
steps:
- uses: actions/checkout@v5
- uses: actions/checkout@v4
with:
repository: meilisearch/meilisearch-ruby
- name: Set up Ruby 3
@@ -270,7 +270,7 @@ jobs:
ports:
- '7700:7700'
steps:
- uses: actions/checkout@v5
- uses: actions/checkout@v4
with:
repository: meilisearch/meilisearch-rust
- name: Build
@@ -291,7 +291,7 @@ jobs:
ports:
- '7700:7700'
steps:
- uses: actions/checkout@v5
- uses: actions/checkout@v4
with:
repository: meilisearch/meilisearch-swift
- name: Run tests
@@ -314,11 +314,11 @@ jobs:
ports:
- '7700:7700'
steps:
- uses: actions/checkout@v5
- uses: actions/checkout@v4
with:
repository: meilisearch/meilisearch-js-plugins
- name: Setup node
uses: actions/setup-node@v5
uses: actions/setup-node@v4
with:
cache: yarn
- name: Install dependencies
@@ -344,23 +344,15 @@ jobs:
MEILI_NO_ANALYTICS: ${{ env.MEILI_NO_ANALYTICS }}
ports:
- '7700:7700'
env:
RAILS_VERSION: '7.0'
steps:
- uses: actions/checkout@v5
- uses: actions/checkout@v4
with:
repository: meilisearch/meilisearch-rails
- name: Install SQLite dependencies
run: sudo apt-get update && sudo apt-get install -y libsqlite3-dev
- name: Set up Ruby
- name: Set up Ruby 3
uses: ruby/setup-ruby@v1
with:
ruby-version: 3
bundler-cache: true
- name: Start MongoDB
uses: supercharge/mongodb-github-action@1.12.0
with:
mongodb-version: 8.0
- name: Run tests
run: bundle exec rspec
@@ -377,7 +369,7 @@ jobs:
ports:
- '7700:7700'
steps:
- uses: actions/checkout@v5
- uses: actions/checkout@v4
with:
repository: meilisearch/meilisearch-symfony
- name: Install PHP

View File

@@ -3,10 +3,14 @@ name: Test suite
on:
workflow_dispatch:
schedule:
# Every day at 5:00am
- cron: "0 5 * * *"
# Everyday at 5:00am
- cron: '0 5 * * *'
pull_request:
merge_group:
push:
# trying and staging branches are for Bors config
branches:
- trying
- staging
env:
CARGO_TERM_COLOR: always
@@ -15,21 +19,21 @@ env:
jobs:
test-linux:
name: Tests on ubuntu-22.04
name: Tests on ubuntu-20.04
runs-on: ubuntu-latest
container:
# Use ubuntu-22.04 to compile with glibc 2.35
image: ubuntu:22.04
# Use ubuntu-20.04 to compile with glibc 2.28
image: ubuntu:20.04
steps:
- uses: actions/checkout@v5
- uses: actions/checkout@v3
- name: Install needed dependencies
run: |
apt-get update && apt-get install -y curl
apt-get install build-essential -y
- name: Setup test with Rust stable
uses: dtolnay/rust-toolchain@1.89
uses: dtolnay/rust-toolchain@1.79
- name: Cache dependencies
uses: Swatinem/rust-cache@v2.8.0
uses: Swatinem/rust-cache@v2.7.5
- name: Run cargo check without any default features
uses: actions-rs/cargo@v1
with:
@@ -49,10 +53,10 @@ jobs:
matrix:
os: [macos-13, windows-2022]
steps:
- uses: actions/checkout@v5
- uses: actions/checkout@v3
- name: Cache dependencies
uses: Swatinem/rust-cache@v2.8.0
- uses: dtolnay/rust-toolchain@1.89
uses: Swatinem/rust-cache@v2.7.5
- uses: dtolnay/rust-toolchain@1.79
- name: Run cargo check without any default features
uses: actions-rs/cargo@v1
with:
@@ -68,68 +72,36 @@ jobs:
name: Tests almost all features
runs-on: ubuntu-latest
container:
# Use ubuntu-22.04 to compile with glibc 2.35
image: ubuntu:22.04
# Use ubuntu-20.04 to compile with glibc 2.28
image: ubuntu:20.04
if: github.event_name == 'schedule' || github.event_name == 'workflow_dispatch'
steps:
- uses: actions/checkout@v5
- uses: actions/checkout@v3
- name: Install needed dependencies
run: |
apt-get update
apt-get install --assume-yes build-essential curl
- uses: dtolnay/rust-toolchain@1.89
- uses: dtolnay/rust-toolchain@1.79
- name: Run cargo build with almost all features
run: |
cargo build --workspace --locked --release --features "$(cargo xtask list-features --exclude-feature cuda,test-ollama)"
cargo build --workspace --locked --release --features "$(cargo xtask list-features --exclude-feature cuda)"
- name: Run cargo test with almost all features
run: |
cargo test --workspace --locked --release --features "$(cargo xtask list-features --exclude-feature cuda,test-ollama)"
ollama-ubuntu:
name: Test with Ollama
runs-on: ubuntu-latest
env:
MEILI_TEST_OLLAMA_SERVER: "http://localhost:11434"
steps:
- uses: actions/checkout@v5
- name: Install Ollama
run: |
curl -fsSL https://ollama.com/install.sh | sudo -E sh
- name: Start serving
run: |
# Run it in the background, there is no way to daemonise at the moment
ollama serve &
# A short pause is required before the HTTP port is opened
sleep 5
# This endpoint blocks until ready
time curl -i http://localhost:11434
- name: Pull nomic-embed-text & all-minilm
run: |
ollama pull nomic-embed-text
ollama pull all-minilm
- name: Run cargo test
uses: actions-rs/cargo@v1
with:
command: test
args: --locked --release --all --features test-ollama ollama
cargo test --workspace --locked --release --features "$(cargo xtask list-features --exclude-feature cuda)"
test-disabled-tokenization:
name: Test disabled tokenization
runs-on: ubuntu-latest
container:
image: ubuntu:22.04
image: ubuntu:20.04
if: github.event_name == 'schedule' || github.event_name == 'workflow_dispatch'
steps:
- uses: actions/checkout@v5
- uses: actions/checkout@v3
- name: Install needed dependencies
run: |
apt-get update
apt-get install --assume-yes build-essential curl
- uses: dtolnay/rust-toolchain@1.89
- uses: dtolnay/rust-toolchain@1.79
- name: Run cargo tree without default features and check lindera is not present
run: |
if cargo tree -f '{p} {f}' -e normal --no-default-features | grep -qz lindera; then
@@ -145,17 +117,17 @@ jobs:
name: Run tests in debug
runs-on: ubuntu-latest
container:
# Use ubuntu-22.04 to compile with glibc 2.35
image: ubuntu:22.04
# Use ubuntu-20.04 to compile with glibc 2.28
image: ubuntu:20.04
steps:
- uses: actions/checkout@v5
- uses: actions/checkout@v3
- name: Install needed dependencies
run: |
apt-get update && apt-get install -y curl
apt-get install build-essential -y
- uses: dtolnay/rust-toolchain@1.89
- uses: dtolnay/rust-toolchain@1.79
- name: Cache dependencies
uses: Swatinem/rust-cache@v2.8.0
uses: Swatinem/rust-cache@v2.7.5
- name: Run tests in debug
uses: actions-rs/cargo@v1
with:
@@ -166,13 +138,13 @@ jobs:
name: Run Clippy
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v5
- uses: dtolnay/rust-toolchain@1.89
- uses: actions/checkout@v3
- uses: dtolnay/rust-toolchain@1.79
with:
profile: minimal
components: clippy
- name: Cache dependencies
uses: Swatinem/rust-cache@v2.8.0
uses: Swatinem/rust-cache@v2.7.5
- name: Run cargo clippy
uses: actions-rs/cargo@v1
with:
@@ -183,15 +155,15 @@ jobs:
name: Run Rustfmt
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v5
- uses: dtolnay/rust-toolchain@1.89
- uses: actions/checkout@v3
- uses: dtolnay/rust-toolchain@1.79
with:
profile: minimal
toolchain: nightly-2024-07-09
override: true
components: rustfmt
- name: Cache dependencies
uses: Swatinem/rust-cache@v2.8.0
uses: Swatinem/rust-cache@v2.7.5
- name: Run cargo fmt
# Since we never ran the `build.rs` script in the benchmark directory we are missing one auto-generated import file.
# Since we want to trigger (and fail) this action as fast as possible, instead of building the benchmark crate

View File

@@ -4,7 +4,7 @@ on:
workflow_dispatch:
inputs:
new_version:
description: "The new version (vX.Y.Z)"
description: 'The new version (vX.Y.Z)'
required: true
env:
@@ -17,8 +17,8 @@ jobs:
name: Update version in Cargo.toml
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v5
- uses: dtolnay/rust-toolchain@1.89
- uses: actions/checkout@v3
- uses: dtolnay/rust-toolchain@1.79
with:
profile: minimal
- name: Install sd
@@ -41,4 +41,5 @@ jobs:
--title "Update version for the next release ($NEW_VERSION) in Cargo.toml" \
--body '⚠️ This PR is automatically generated. Check the new version is the expected one and Cargo.lock has been updated before merging.' \
--label 'skip changelog' \
--milestone $NEW_VERSION \
--base $GITHUB_REF_NAME

14
.gitignore vendored
View File

@@ -5,27 +5,17 @@
**/*.json_lines
**/*.rs.bk
/*.mdb
/*.ms
/data.ms
/snapshots
/dumps
/bench
/_xtask_benchmark.ms
/benchmarks
.DS_Store
# Snapshots
## ... large
*.full.snap
## ... unreviewed
## ... unreviewed
*.snap.new
## ... pending
*.pending-snap
# Tmp files
.tmp*
# Database snapshot
crates/meilisearch/db.snapshot
# Fuzzcheck data for the facet indexing fuzz test
crates/milli/fuzz/update::facet::incremental::fuzz::fuzz/

View File

@@ -48,27 +48,6 @@ cargo xtask bench --no-dashboard -- workloads/my_workload_1.json workloads/my_wo
For processing the results, look at [Looking at benchmark results/Without dashboard](#without-dashboard).
#### Sending a workload by hand
Sometimes you want to visualize the metrics of a workload that comes from a custom report.
It is not easy to trick the benchboard into thinking that your report is legitimate, but here are the commands you can run to upload your firefox report to a running benchboard.
```bash
# Name this hostname whatever you want
echo '{ "hostname": "the-best-place" }' | xh PUT 'http://127.0.0.1:9001/api/v1/machine'
# You'll receive an UUID from this command that we will call $invocation_uuid
echo '{ "commit": { "sha1": "1234567", "commit_date": "2024-09-05 12:00:12.0 +00:00:00", "message": "A cool message" }, "machine_hostname": "the-best-place", "max_workloads": 1 }' | xh PUT 'http://127.0.0.1:9001/api/v1/invocation'
# Just use UUID from the previous command
# and you'll receive another UUID that we will call $workload_uuid
echo '{ "invocation_uuid": "$invocation_uuid", "name": "toto", "max_runs": 1 }' | xh PUT 'http://127.0.0.1:9001/api/v1/workload'
# And now use your $workload_uuid and the content of your firefox report
# but don't forget to convert your firefox report from JSONLines into an object
echo '{ "workload_uuid": "$workload_uuid", "data": $REPORT_JSON_DATA }' | xh PUT 'http://127.0.0.1:9001/api/v1/run'
```
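
In the commands above, `$invocation_uuid` and `$workload_uuid` sit inside single quotes, so a shell would send them literally; they read as placeholders to substitute by hand. This is a minimal sketch of building the payload with the variable expanded, assuming the UUID was saved into a shell variable (the value below is made up):

```sh
# Hypothetical UUID; in practice, copy it from the response of the invocation call.
invocation_uuid="00000000-0000-0000-0000-000000000000"

# printf expands the variable while keeping the JSON double quotes intact.
payload=$(printf '{ "invocation_uuid": "%s", "name": "toto", "max_runs": 1 }' "$invocation_uuid")
echo "$payload"

# The payload could then be piped to the same endpoint as above:
# echo "$payload" | xh PUT 'http://127.0.0.1:9001/api/v1/workload'
```

The same pattern would apply to `$workload_uuid` in the final `run` call.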
### In CI
We have dedicated runners to run workloads on CI. Currently, there are three ways of running the CI:

View File

@@ -57,17 +57,9 @@ This command will be triggered to each PR as a requirement for merging it.
You can set the `LINDERA_CACHE` environment variable to speed up your successive builds by up to 2 minutes.
It'll store some built artifacts in the directory of your choice.
We recommend using the `$HOME/.cache/meili/lindera` directory:
We recommend using the standard `$HOME/.cache/lindera` directory:
```sh
export LINDERA_CACHE=$HOME/.cache/meili/lindera
```
You can set the `MILLI_BENCH_DATASETS_PATH` environment variable to further speed up your builds.
It'll store some big files used for the benchmarks in the directory of your choice.
We recommend using the `$HOME/.cache/meili/benches` directory:
```sh
export MILLI_BENCH_DATASETS_PATH=$HOME/.cache/meili/benches
export LINDERA_CACHE=$HOME/.cache/lindera
```
Furthermore, you can improve incremental compilation by setting the `MEILI_NO_VERGEN` environment variable.
@@ -103,23 +95,6 @@ Meilisearch follows the [cargo xtask](https://github.com/matklad/cargo-xtask) wo
Run `cargo xtask --help` from the root of the repository to find out what is available.
#### Update the openAPI file if the API changed
To update the openAPI file in the code, see [sprint_issue.md](https://github.com/meilisearch/meilisearch/blob/main/.github/ISSUE_TEMPLATE/sprint_issue.md#reminders-when-modifying-the-api).
If you want to generate OpenAPI file manually:
With swagger:
- Starts Meilisearch with the `swagger` feature flag: `cargo run --features swagger`
- On a browser, open the following URL: http://localhost:7700/scalar
- Click the « Download openAPI file »
With the internal crate:
```bash
cd crates/openapi-generator
cargo run --release -- --pretty --output meilisearch.json
```
### Logging
Meilisearch uses [`tracing`](https://lib.rs/crates/tracing) for logging purposes. Tracing logs are structured and can be displayed as JSON to the end user, so prefer passing arguments as fields rather than interpolating them in the message.
@@ -170,39 +145,28 @@ Some notes on GitHub PRs:
- The PR title should be accurate and descriptive of the changes.
- [Convert your PR as a draft](https://help.github.com/en/github/collaborating-with-issues-and-pull-requests/changing-the-stage-of-a-pull-request) if your changes are a work in progress: no one will review it until you pass your PR as ready for review.<br>
The draft PRs are recommended when you want to show that you are working on something and make your work visible.
- The branch related to the PR must be **up-to-date with `main`** before merging. Fortunately, this project uses [GitHub Merge Queues](https://github.blog/news-insights/product-news/github-merge-queue-is-generally-available/) to automatically enforce this requirement without the PR author having to rebase manually.
- The branch related to the PR must be **up-to-date with `main`** before merging. Fortunately, this project uses [Bors](https://github.com/bors-ng/bors-ng) to automatically enforce this requirement without the PR author having to rebase manually.
## Merging PRs
This project uses GitHub Merge Queues that helps us manage pull requests merging.
Before merging a PR, the maintainer should ensure the following requirements are met
- Automated tests have been added.
- If some tests cannot be automated, manual rigorous tests should be applied.
- ⚠️ If there is a change in the DB: it's mandatory to manually test the `--experimental-dumpless-upgrade` on a DB of the previous Meilisearch minor version (e.g. v1.13 for the v1.14 release).
- If necessary, the feature has been tested in the Cloud production environment (with [prototypes](./documentation/prototypes.md)) and the Cloud UI is ready.
- If necessary, the [documentation](https://github.com/meilisearch/documentation) related to the implemented feature in the PR is ready.
- If necessary, the [integrations](https://github.com/meilisearch/integration-guides) related to the implemented feature in the PR are ready.
## Publish Process (for internal team only)
## Release Process (for internal team only)
Meilisearch tools follow the [Semantic Versioning Convention](https://semver.org/).
### How to publish a new release
### Automation to rebase and Merge the PRs
The full Meilisearch release process is described in [this guide](./documentation/release.md).
This project integrates a bot that helps us manage pull requests merging.<br>
_[Read more about this](https://github.com/meilisearch/integration-guides/blob/main/resources/bors.md)._
### How to Publish a new Release
The full Meilisearch release process is described in [this guide](https://github.com/meilisearch/engine-team/blob/main/resources/meilisearch-release.md). Please follow it carefully before doing any release.
### How to publish a prototype
Depending on the developed feature, you might need to provide a prototyped version of Meilisearch to make it easier to test by the users.
This happens in two steps:
- [Release the prototype](./documentation/prototypes.md#how-to-publish-a-prototype)
- [Communicate about it](./documentation/prototypes.md#communication)
### How to implement and publish an experimental feature
Here are our [guidelines and process](./documentation/experimental-features.md) to implement and publish an experimental feature.
- [Release the prototype](https://github.com/meilisearch/engine-team/blob/main/resources/prototypes.md#how-to-publish-a-prototype)
- [Communicate about it](https://github.com/meilisearch/engine-team/blob/main/resources/prototypes.md#communication)
### Release assets

4499
Cargo.lock generated

File diff suppressed because it is too large

View File

@@ -19,11 +19,10 @@ members = [
"crates/tracing-trace",
"crates/xtask",
"crates/build-info",
"crates/openapi-generator",
]
[workspace.package]
version = "1.24.0"
version = "1.12.1"
authors = [
"Quentin de Quelen <quentin@dequelen.me>",
"Clément Renault <clement@meilisearch.com>",
@@ -37,12 +36,6 @@ license = "MIT"
[profile.release]
codegen-units = 1
# We now compile heed without the NDEBUG define for better performance.
# However, we still enable debug assertions for a better detection of
# disk corruption on the cloud or in OSS.
[profile.release.package.heed]
debug-assertions = true
[profile.dev.package.flate2]
opt-level = 3

View File

@@ -1,5 +1,5 @@
# Compile
FROM rust:1.89-alpine3.22 AS compiler
FROM rust:1.79.0-alpine3.20 AS compiler
RUN apk add -q --no-cache build-base openssl-dev
@@ -20,7 +20,7 @@ RUN set -eux; \
cargo build --release -p meilisearch -p meilitool
# Run
FROM alpine:3.22
FROM alpine:3.20
LABEL org.opencontainers.image.source="https://github.com/meilisearch/meilisearch"
ENV MEILI_HTTP_ADDR 0.0.0.0:7700

22
LICENSE
View File

@@ -1,9 +1,21 @@
# License
MIT License
Copyright (c) 2019-2025 Meili SAS
Copyright (c) 2019-2024 Meili SAS
Part of this work fall under the Meilisearch Enterprise Edition (EE) and are licensed under the Business Source License 1.1, please refer to [LICENSE-EE](./LICENSE-EE) for details.
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The other parts of this work are licensed under the [MIT license](./LICENSE-MIT).
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
`SPDX-License-Identifier: MIT AND BUSL-1.1`
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.


@@ -1,67 +0,0 @@
Business Source License 1.1 Adapted for Meili SAS
This license is based on the Business Source License version 1.1, as published by MariaDB Corporation Ab.
Parameters
Licensor: Meili SAS
Licensed Work: Any file explicitly marked as “Enterprise Edition (EE)” or “governed by the Business Source License” residing in enterprise_editions modules/folders.
Additional Use Grant:
You may use, modify, and distribute the Licensed Work for non-production purposes only, such as testing, development, or evaluation.
Production use of the Licensed Work requires a commercial license agreement with Meilisearch. Contact bonjour@meilisearch.com for licensing.
Change License: MIT
Change Date: Four years from the date the Licensed Work is published.
This License does not apply to any code outside of the Licensed Work, which remains under the MIT license.
For information about alternative licensing arrangements for the Licensed Work,
please contact bonjour@meilisearch.com or sales@meilisearch.com.
Notice
Business Source License 1.1
Terms
The Licensor hereby grants you the right to copy, modify, create derivative
works, redistribute, and make non-production use of the Licensed Work. The
Licensor may make an Additional Use Grant, above, permitting limited production use.
Effective on the Change Date, or the fourth anniversary of the first publicly
available distribution of a specific version of the Licensed Work under this
License, whichever comes first, the Licensor hereby grants you rights under
the terms of the Change License, and the rights granted in the paragraph
above terminate.
If your use of the Licensed Work does not comply with the requirements
currently in effect as described in this License, you must purchase a
commercial license from the Licensor, its affiliated entities, or authorized
resellers, or you must refrain from using the Licensed Work.
All copies of the original and modified Licensed Work, and derivative works
of the Licensed Work, are subject to this License. This License applies
separately for each version of the Licensed Work and the Change Date may vary
for each version of the Licensed Work released by Licensor.
You must conspicuously display this License on each original or modified copy
of the Licensed Work. If you receive the Licensed Work in original or
modified form from a third party, the terms and conditions set forth in this
License apply to your use of that work.
Any use of the Licensed Work in violation of this License will automatically
terminate your rights under this License for the current and all other
versions of the Licensed Work.
This License does not grant you any right in any trademark or logo of
Licensor or its affiliates (provided that you may use a trademark or logo of
Licensor as expressly required by this License).
TO THE EXTENT PERMITTED BY APPLICABLE LAW, THE LICENSED WORK IS PROVIDED ON
AN "AS IS" BASIS. LICENSOR HEREBY DISCLAIMS ALL WARRANTIES AND CONDITIONS,
EXPRESS OR IMPLIED, INCLUDING (WITHOUT LIMITATION) WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, NON-INFRINGEMENT, AND
TITLE.


@@ -1,21 +0,0 @@
MIT License
Copyright (c) 2019-2025 Meili SAS
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.


@@ -20,7 +20,7 @@
<p align="center">
<a href="https://deps.rs/repo/github/meilisearch/meilisearch"><img src="https://deps.rs/repo/github/meilisearch/meilisearch/status.svg" alt="Dependency status"></a>
<a href="https://github.com/meilisearch/meilisearch/blob/main/LICENSE"><img src="https://img.shields.io/badge/license-MIT-informational" alt="License"></a>
<a href="https://github.com/meilisearch/meilisearch/queue"><img alt="Merge Queues enabled" src="https://img.shields.io/badge/Merge_Queues-enabled-%2357cf60?logo=github"></a>
<a href="https://ms-bors.herokuapp.com/repositories/52"><img src="https://bors.tech/images/badge_small.svg" alt="Bors enabled"></a>
</p>
<p align="center">⚡ A lightning-fast search engine that fits effortlessly into your apps, websites, and workflow 🔍</p>
@@ -41,7 +41,7 @@
- [**Movies**](https://where2watch.meilisearch.com/?utm_campaign=oss&utm_source=github&utm_medium=organization) — An application to help you find streaming platforms to watch movies using [hybrid search](https://www.meilisearch.com/solutions/hybrid-search?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=demos).
- [**Ecommerce**](https://ecommerce.meilisearch.com/?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=demos) — Ecommerce website using disjunctive [facets](https://www.meilisearch.com/docs/learn/fine_tuning_results/faceted_search?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=demos), range and rating filtering, and pagination.
- [**Songs**](https://music.meilisearch.com/?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=demos) — Search through 47 million of songs.
- [**SaaS**](https://saas.meilisearch.com/?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=demos) — Search for contacts, deals, and companies in this [multi-tenant](https://www.meilisearch.com/docs/learn/security/multitenancy_tenant_tokens?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=demos) CRM application.
- [**SaaS**](https://saas.meilisearch.com/?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=demos) — Search for contacts, deals, and companies in this [multi-tenant](https://www.meilisearch.com/docs/learn/security/multitenancy_tenant_tokens?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=demos) CRM application.
See the list of all our example apps in our [demos repository](https://github.com/meilisearch/demos).
@@ -58,7 +58,6 @@ See the list of all our example apps in our [demos repository](https://github.co
- **[Multi-Tenancy](https://www.meilisearch.com/docs/learn/security/multitenancy_tenant_tokens?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=features):** personalize search results for any number of application tenants
- **Highly Customizable:** customize Meilisearch to your specific needs or use our out-of-the-box and hassle-free presets
- **[RESTful API](https://www.meilisearch.com/docs/reference/api/overview?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=features):** integrate Meilisearch in your technical stack with our plugins and SDKs
- **AI-ready:** works out of the box with [langchain](https://www.meilisearch.com/with/langchain) and the [model context protocol](https://github.com/meilisearch/meilisearch-mcp)
- **Easy to install, deploy, and maintain**
## 📖 Documentation
@@ -89,26 +88,6 @@ We also offer a wide range of dedicated guides to all Meilisearch features, such
Finally, for more in-depth information, refer to our articles explaining fundamental Meilisearch concepts such as [documents](https://www.meilisearch.com/docs/learn/core_concepts/documents?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=advanced) and [indexes](https://www.meilisearch.com/docs/learn/core_concepts/indexes?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=advanced).
## 🧾 Editions & Licensing
Meilisearch is available in two editions:
### 🧪 Community Edition (CE)
- Fully open source under the [MIT license](./LICENSE)
- Core search engine with fast and relevant full-text, semantic or hybrid search
- Free to use for anyone, including commercial usage
### 🏢 Enterprise Edition (EE)
- Includes advanced features such as:
- Sharding
- Governed by a [commercial license](./LICENSE-EE) or the [Business Source License 1.1](https://mariadb.com/bsl11)
- Not allowed in production without a commercial agreement with Meilisearch.
- You may use, modify, and distribute the Licensed Work for non-production purposes only, such as testing, development, or evaluation.
Want access to Enterprise features? → Contact us at [sales@meilisearch.com](maito:sales@meilisearch.com).
## 📊 Telemetry
Meilisearch collects **anonymized** user data to help us improve our product. You can [deactivate this](https://www.meilisearch.com/docs/learn/what_is_meilisearch/telemetry?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=telemetry#how-to-disable-data-collection) whenever you want.
@@ -119,9 +98,9 @@ If you want to know more about the kind of data we collect and what we use it fo
## 📫 Get in touch!
Meilisearch is a search engine created by [Meili](https://www.meilisearch.com/careers), a software development company headquartered in France and with team members all over the world. Want to know more about us? [Check out our blog!](https://blog.meilisearch.com/?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=contact)
Meilisearch is a search engine created by [Meili]([https://www.welcometothejungle.com/en/companies/meilisearch](https://www.meilisearch.com/careers)), a software development company headquartered in France and with team members all over the world. Want to know more about us? [Check out our blog!](https://blog.meilisearch.com/?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=contact)
🗞 [Subscribe to our newsletter](https://share-eu1.hsforms.com/1LN5N0x_GQgq7ss7tXmSykwfg3aq) if you don't want to miss any updates! We promise we won't clutter your mailbox: we only send one edition every two months.
🗞 [Subscribe to our newsletter](https://meilisearch.us2.list-manage.com/subscribe?u=27870f7b71c908a8b359599fb&id=79582d828e) if you don't want to miss any updates! We promise we won't clutter your mailbox: we only send one edition every two months.
💌 Want to make a suggestion or give feedback? Here are some of the channels where you can reach us:
@@ -139,6 +118,6 @@ Meilisearch is, and will always be, open-source! If you want to contribute to th
Meilisearch releases and their associated binaries are available on the project's [releases page](https://github.com/meilisearch/meilisearch/releases).
The binaries are versioned following [SemVer conventions](https://semver.org/). To know more, read our [versioning policy](./documentation/versioning-policy.md).
The binaries are versioned following [SemVer conventions](https://semver.org/). To know more, read our [versioning policy](https://github.com/meilisearch/engine-team/blob/main/resources/versioning-policy.md).
Differently from the binaries, crates in this repository are not currently available on [crates.io](https://crates.io/) and do not follow [SemVer conventions](https://semver.org).


@@ -1403,398 +1403,6 @@
"title": "Number of tasks by indexes",
"type": "timeseries"
},
{
"datasource": {
"type": "prometheus",
"uid": "${DS_PROMETHEUS}"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisBorderShow": false,
"axisCenteredZero": false,
"axisColorMode": "text",
"axisLabel": "",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 15,
"gradientMode": "none",
"hideFrom": {
"legend": false,
"tooltip": false,
"viz": false
},
"insertNulls": false,
"lineInterpolation": "linear",
"lineWidth": 1,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "never",
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "none"
},
"thresholdsStyle": {
"mode": "off"
}
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green"
},
{
"color": "red",
"value": 80
}
]
},
"unit": "none"
},
"overrides": []
},
"gridPos": {
"h": 11,
"w": 12,
"x": 12,
"y": 51
},
"id": 29,
"interval": "5s",
"options": {
"legend": {
"calcs": [],
"displayMode": "list",
"placement": "right",
"showLegend": true
},
"tooltip": {
"mode": "single",
"sort": "none"
}
},
"pluginVersion": "8.1.4",
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "${DS_PROMETHEUS}"
},
"editorMode": "builder",
"exemplar": true,
"expr": "meilisearch_task_queue_latency_seconds{instance=\"$instance\", job=\"$job\"}",
"interval": "",
"legendFormat": "{{value}} ",
"range": true,
"refId": "A"
}
],
"title": "Task queue latency",
"type": "timeseries"
},
{
"datasource": {
"type": "prometheus",
"uid": "${DS_PROMETHEUS}"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisBorderShow": false,
"axisCenteredZero": false,
"axisColorMode": "text",
"axisLabel": "",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 15,
"gradientMode": "none",
"hideFrom": {
"legend": false,
"tooltip": false,
"viz": false
},
"insertNulls": false,
"lineInterpolation": "linear",
"lineWidth": 1,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "never",
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "none"
},
"thresholdsStyle": {
"mode": "off"
}
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green"
},
{
"color": "red",
"value": 80
}
]
},
"unit": "none"
},
"overrides": []
},
"gridPos": {
"h": 11,
"w": 12,
"x": 12,
"y": 51
},
"id": 29,
"interval": "5s",
"options": {
"legend": {
"calcs": [],
"displayMode": "list",
"placement": "right",
"showLegend": true
},
"tooltip": {
"mode": "single",
"sort": "none"
}
},
"pluginVersion": "8.1.4",
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "${DS_PROMETHEUS}"
},
"editorMode": "builder",
"exemplar": true,
"expr": "meilisearch_task_queue_used_size{instance=\"$instance\", job=\"$job\"}",
"interval": "",
"legendFormat": "{{value}} ",
"range": true,
"refId": "A"
}
],
"title": "Task queue used size in bytes",
"type": "timeseries"
},
{
"datasource": {
"type": "prometheus",
"uid": "${DS_PROMETHEUS}"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisBorderShow": false,
"axisCenteredZero": false,
"axisColorMode": "text",
"axisLabel": "",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 15,
"gradientMode": "none",
"hideFrom": {
"legend": false,
"tooltip": false,
"viz": false
},
"insertNulls": false,
"lineInterpolation": "linear",
"lineWidth": 1,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "never",
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "none"
},
"thresholdsStyle": {
"mode": "off"
}
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green"
},
{
"color": "red",
"value": 80
}
]
},
"unit": "none"
},
"overrides": []
},
"gridPos": {
"h": 11,
"w": 12,
"x": 12,
"y": 51
},
"id": 29,
"interval": "5s",
"options": {
"legend": {
"calcs": [],
"displayMode": "list",
"placement": "right",
"showLegend": true
},
"tooltip": {
"mode": "single",
"sort": "none"
}
},
"pluginVersion": "8.1.4",
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "${DS_PROMETHEUS}"
},
"editorMode": "builder",
"exemplar": true,
"expr": "meilisearch_task_queue_size_until_stop_registering{instance=\"$instance\", job=\"$job\"}",
"interval": "",
"legendFormat": "{{value}} ",
"range": true,
"refId": "A"
}
],
"title": "Task queue available size until it stop receiving tasks.",
"type": "timeseries"
},
{
"datasource": {
"type": "prometheus",
"uid": "${DS_PROMETHEUS}"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisBorderShow": false,
"axisCenteredZero": false,
"axisColorMode": "text",
"axisLabel": "",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 15,
"gradientMode": "none",
"hideFrom": {
"legend": false,
"tooltip": false,
"viz": false
},
"insertNulls": false,
"lineInterpolation": "linear",
"lineWidth": 1,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "never",
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "none"
},
"thresholdsStyle": {
"mode": "off"
}
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green"
},
{
"color": "red",
"value": 80
}
]
},
"unit": "none"
},
"overrides": []
},
"gridPos": {
"h": 11,
"w": 12,
"x": 12,
"y": 51
},
"id": 29,
"interval": "5s",
"options": {
"legend": {
"calcs": [],
"displayMode": "list",
"placement": "right",
"showLegend": true
},
"tooltip": {
"mode": "single",
"sort": "none"
}
},
"pluginVersion": "8.1.4",
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "${DS_PROMETHEUS}"
},
"editorMode": "builder",
"exemplar": true,
"expr": "meilisearch_task_queue_max_size{instance=\"$instance\", job=\"$job\"}",
"interval": "",
"legendFormat": "{{value}} ",
"range": true,
"refId": "A"
}
],
"title": "Task queue maximum possible size",
"type": "stat"
},
{
"collapsed": true,
"datasource": {

Binary file not shown.


11
bors.toml Normal file

@@ -0,0 +1,11 @@
status = [
'Tests on ubuntu-20.04',
'Tests on macos-13',
'Tests on windows-2022',
'Run Clippy',
'Run Rustfmt',
'Run tests in debug',
]
pr_status = ['Milestone Check']
# 3 hours timeout
timeout-sec = 10800


@@ -11,27 +11,27 @@ edition.workspace = true
license.workspace = true
[dependencies]
anyhow = "1.0.98"
bumpalo = "3.18.1"
csv = "1.3.1"
memmap2 = "0.9.7"
anyhow = "1.0.86"
bumpalo = "3.16.0"
csv = "1.3.0"
memmap2 = "0.9.5"
milli = { path = "../milli" }
mimalloc = { version = "0.1.47", default-features = false }
serde_json = { version = "1.0.140", features = ["preserve_order"] }
tempfile = "3.20.0"
mimalloc = { version = "0.1.43", default-features = false }
serde_json = { version = "1.0.120", features = ["preserve_order"] }
tempfile = "3.14.0"
[dev-dependencies]
criterion = { version = "0.6.0", features = ["html_reports"] }
criterion = { version = "0.5.1", features = ["html_reports"] }
rand = "0.8.5"
rand_chacha = "0.3.1"
roaring = "0.10.12"
roaring = "0.10.7"
[build-dependencies]
anyhow = "1.0.98"
bytes = "1.10.1"
convert_case = "0.8.0"
flate2 = "1.1.2"
reqwest = { version = "0.12.20", features = ["blocking", "rustls-tls"], default-features = false }
anyhow = "1.0.86"
bytes = "1.6.0"
convert_case = "0.6.0"
flate2 = "1.0.30"
reqwest = { version = "0.12.5", features = ["blocking", "rustls-tls"], default-features = false }
[features]
default = ["milli/all-tokenizations"]
@@ -51,11 +51,3 @@ harness = false
[[bench]]
name = "indexing"
harness = false
[[bench]]
name = "sort"
harness = false
[[bench]]
name = "filter_starts_with"
harness = false


@@ -1,66 +0,0 @@
mod datasets_paths;
mod utils;
use criterion::{criterion_group, criterion_main};
use milli::update::Settings;
use milli::FilterableAttributesRule;
use utils::Conf;
#[cfg(not(windows))]
#[global_allocator]
static ALLOC: mimalloc::MiMalloc = mimalloc::MiMalloc;
fn base_conf(builder: &mut Settings) {
let displayed_fields = ["geonameid", "name"].iter().map(|s| s.to_string()).collect();
builder.set_displayed_fields(displayed_fields);
let filterable_fields =
["name"].iter().map(|s| FilterableAttributesRule::Field(s.to_string())).collect();
builder.set_filterable_fields(filterable_fields);
}
#[rustfmt::skip]
const BASE_CONF: Conf = Conf {
dataset: datasets_paths::SMOL_ALL_COUNTRIES,
dataset_format: "jsonl",
queries: &[
"",
],
configure: base_conf,
primary_key: Some("geonameid"),
..Conf::BASE
};
fn filter_starts_with(c: &mut criterion::Criterion) {
#[rustfmt::skip]
let confs = &[
utils::Conf {
group_name: "1 letter",
filter: Some("name STARTS WITH e"),
..BASE_CONF
},
utils::Conf {
group_name: "2 letters",
filter: Some("name STARTS WITH es"),
..BASE_CONF
},
utils::Conf {
group_name: "3 letters",
filter: Some("name STARTS WITH est"),
..BASE_CONF
},
utils::Conf {
group_name: "6 letters",
filter: Some("name STARTS WITH estoni"),
..BASE_CONF
}
];
utils::run_benches(c, confs);
}
criterion_group!(benches, filter_starts_with);
criterion_main!(benches);


@@ -10,9 +10,9 @@ use milli::documents::PrimaryKey;
use milli::heed::{EnvOpenOptions, RwTxn};
use milli::progress::Progress;
use milli::update::new::indexer;
use milli::update::{IndexerConfig, Settings};
use milli::vector::RuntimeEmbedders;
use milli::{FilterableAttributesRule, Index};
use milli::update::{IndexDocumentsMethod, IndexerConfig, Settings};
use milli::vector::EmbeddingConfigs;
use milli::Index;
use rand::seq::SliceRandom;
use rand_chacha::rand_core::SeedableRng;
use roaring::RoaringBitmap;
@@ -35,11 +35,10 @@ fn setup_dir(path: impl AsRef<Path>) {
fn setup_index() -> Index {
let path = "benches.mmdb";
setup_dir(path);
let options = EnvOpenOptions::new();
let mut options = options.read_txn_without_tls();
let mut options = EnvOpenOptions::new();
options.map_size(100 * 1024 * 1024 * 1024); // 100 GB
options.max_readers(100);
Index::new(options, path, true).unwrap()
Index::new(options, path).unwrap()
}
fn setup_settings<'t>(
@@ -58,14 +57,13 @@ fn setup_settings<'t>(
let searchable_fields = searchable_fields.iter().map(|s| s.to_string()).collect();
builder.set_searchable_fields(searchable_fields);
let filterable_fields =
filterable_fields.iter().map(|s| FilterableAttributesRule::Field(s.to_string())).collect();
let filterable_fields = filterable_fields.iter().map(|s| s.to_string()).collect();
builder.set_filterable_fields(filterable_fields);
let sortable_fields = sortable_fields.iter().map(|s| s.to_string()).collect();
builder.set_sortable_fields(sortable_fields);
builder.execute(&|| false, &Progress::default(), Default::default()).unwrap();
builder.execute(|_| (), || false).unwrap();
}
fn setup_index_with_settings(
@@ -140,9 +138,10 @@ fn indexing_songs_default(c: &mut Criterion) {
let db_fields_ids_map = index.fields_ids_map(&rtxn).unwrap();
let mut new_fields_ids_map = db_fields_ids_map.clone();
let mut indexer = indexer::DocumentOperation::new();
let mut indexer =
indexer::DocumentOperation::new(IndexDocumentsMethod::ReplaceDocuments);
let documents = utils::documents_from(datasets_paths::SMOL_SONGS, "csv");
indexer.replace_documents(&documents).unwrap();
indexer.add_documents(&documents).unwrap();
let indexer_alloc = Bump::new();
let (document_changes, _operation_stats, primary_key) = indexer
@@ -154,7 +153,6 @@ fn indexing_songs_default(c: &mut Criterion) {
&mut new_fields_ids_map,
&|| false,
Progress::default(),
None,
)
.unwrap();
@@ -167,10 +165,9 @@ fn indexing_songs_default(c: &mut Criterion) {
new_fields_ids_map,
primary_key,
&document_changes,
RuntimeEmbedders::default(),
EmbeddingConfigs::default(),
&|| false,
&Progress::default(),
&Default::default(),
)
.unwrap();
@@ -208,9 +205,10 @@ fn reindexing_songs_default(c: &mut Criterion) {
let db_fields_ids_map = index.fields_ids_map(&rtxn).unwrap();
let mut new_fields_ids_map = db_fields_ids_map.clone();
let mut indexer = indexer::DocumentOperation::new();
let mut indexer =
indexer::DocumentOperation::new(IndexDocumentsMethod::ReplaceDocuments);
let documents = utils::documents_from(datasets_paths::SMOL_SONGS, "csv");
indexer.replace_documents(&documents).unwrap();
indexer.add_documents(&documents).unwrap();
let indexer_alloc = Bump::new();
let (document_changes, _operation_stats, primary_key) = indexer
@@ -222,7 +220,6 @@ fn reindexing_songs_default(c: &mut Criterion) {
&mut new_fields_ids_map,
&|| false,
Progress::default(),
None,
)
.unwrap();
@@ -235,10 +232,9 @@ fn reindexing_songs_default(c: &mut Criterion) {
new_fields_ids_map,
primary_key,
&document_changes,
RuntimeEmbedders::default(),
EmbeddingConfigs::default(),
&|| false,
&Progress::default(),
&Default::default(),
)
.unwrap();
@@ -254,9 +250,10 @@ fn reindexing_songs_default(c: &mut Criterion) {
let db_fields_ids_map = index.fields_ids_map(&rtxn).unwrap();
let mut new_fields_ids_map = db_fields_ids_map.clone();
let mut indexer = indexer::DocumentOperation::new();
let mut indexer =
indexer::DocumentOperation::new(IndexDocumentsMethod::ReplaceDocuments);
let documents = utils::documents_from(datasets_paths::SMOL_SONGS, "csv");
indexer.replace_documents(&documents).unwrap();
indexer.add_documents(&documents).unwrap();
let indexer_alloc = Bump::new();
let (document_changes, _operation_stats, primary_key) = indexer
@@ -268,7 +265,6 @@ fn reindexing_songs_default(c: &mut Criterion) {
&mut new_fields_ids_map,
&|| false,
Progress::default(),
None,
)
.unwrap();
@@ -281,10 +277,9 @@ fn reindexing_songs_default(c: &mut Criterion) {
new_fields_ids_map,
primary_key,
&document_changes,
RuntimeEmbedders::default(),
EmbeddingConfigs::default(),
&|| false,
&Progress::default(),
&Default::default(),
)
.unwrap();
@@ -324,9 +319,10 @@ fn deleting_songs_in_batches_default(c: &mut Criterion) {
let db_fields_ids_map = index.fields_ids_map(&rtxn).unwrap();
let mut new_fields_ids_map = db_fields_ids_map.clone();
let mut indexer = indexer::DocumentOperation::new();
let mut indexer =
indexer::DocumentOperation::new(IndexDocumentsMethod::ReplaceDocuments);
let documents = utils::documents_from(datasets_paths::SMOL_SONGS, "csv");
indexer.replace_documents(&documents).unwrap();
indexer.add_documents(&documents).unwrap();
let indexer_alloc = Bump::new();
let (document_changes, _operation_stats, primary_key) = indexer
@@ -338,7 +334,6 @@ fn deleting_songs_in_batches_default(c: &mut Criterion) {
&mut new_fields_ids_map,
&|| false,
Progress::default(),
None,
)
.unwrap();
@@ -351,10 +346,9 @@ fn deleting_songs_in_batches_default(c: &mut Criterion) {
new_fields_ids_map,
primary_key,
&document_changes,
RuntimeEmbedders::default(),
EmbeddingConfigs::default(),
&|| false,
&Progress::default(),
&Default::default(),
)
.unwrap();
@@ -402,9 +396,10 @@ fn indexing_songs_in_three_batches_default(c: &mut Criterion) {
let db_fields_ids_map = index.fields_ids_map(&rtxn).unwrap();
let mut new_fields_ids_map = db_fields_ids_map.clone();
let mut indexer = indexer::DocumentOperation::new();
let mut indexer =
indexer::DocumentOperation::new(IndexDocumentsMethod::ReplaceDocuments);
let documents = utils::documents_from(datasets_paths::SMOL_SONGS_1_2, "csv");
indexer.replace_documents(&documents).unwrap();
indexer.add_documents(&documents).unwrap();
let indexer_alloc = Bump::new();
let (document_changes, _operation_stats, primary_key) = indexer
@@ -416,7 +411,6 @@ fn indexing_songs_in_three_batches_default(c: &mut Criterion) {
&mut new_fields_ids_map,
&|| false,
Progress::default(),
None,
)
.unwrap();
@@ -429,10 +423,9 @@ fn indexing_songs_in_three_batches_default(c: &mut Criterion) {
new_fields_ids_map,
primary_key,
&document_changes,
RuntimeEmbedders::default(),
EmbeddingConfigs::default(),
&|| false,
&Progress::default(),
&Default::default(),
)
.unwrap();
@@ -448,9 +441,10 @@ fn indexing_songs_in_three_batches_default(c: &mut Criterion) {
let db_fields_ids_map = index.fields_ids_map(&rtxn).unwrap();
let mut new_fields_ids_map = db_fields_ids_map.clone();
let mut indexer = indexer::DocumentOperation::new();
let mut indexer =
indexer::DocumentOperation::new(IndexDocumentsMethod::ReplaceDocuments);
let documents = utils::documents_from(datasets_paths::SMOL_SONGS_3_4, "csv");
indexer.replace_documents(&documents).unwrap();
indexer.add_documents(&documents).unwrap();
let indexer_alloc = Bump::new();
let (document_changes, _operation_stats, primary_key) = indexer
@@ -462,7 +456,6 @@ fn indexing_songs_in_three_batches_default(c: &mut Criterion) {
&mut new_fields_ids_map,
&|| false,
Progress::default(),
None,
)
.unwrap();
@@ -475,10 +468,9 @@ fn indexing_songs_in_three_batches_default(c: &mut Criterion) {
new_fields_ids_map,
primary_key,
&document_changes,
RuntimeEmbedders::default(),
EmbeddingConfigs::default(),
&|| false,
&Progress::default(),
&Default::default(),
)
.unwrap();
@@ -490,9 +482,10 @@ fn indexing_songs_in_three_batches_default(c: &mut Criterion) {
let db_fields_ids_map = index.fields_ids_map(&rtxn).unwrap();
let mut new_fields_ids_map = db_fields_ids_map.clone();
let mut indexer = indexer::DocumentOperation::new();
let mut indexer =
indexer::DocumentOperation::new(IndexDocumentsMethod::ReplaceDocuments);
let documents = utils::documents_from(datasets_paths::SMOL_SONGS_4_4, "csv");
indexer.replace_documents(&documents).unwrap();
indexer.add_documents(&documents).unwrap();
let indexer_alloc = Bump::new();
let (document_changes, _operation_stats, primary_key) = indexer
@@ -504,7 +497,6 @@ fn indexing_songs_in_three_batches_default(c: &mut Criterion) {
&mut new_fields_ids_map,
&|| false,
Progress::default(),
None,
)
.unwrap();
@@ -517,10 +509,9 @@ fn indexing_songs_in_three_batches_default(c: &mut Criterion) {
new_fields_ids_map,
primary_key,
&document_changes,
RuntimeEmbedders::default(),
EmbeddingConfigs::default(),
&|| false,
&Progress::default(),
&Default::default(),
)
.unwrap();
@@ -558,10 +549,11 @@ fn indexing_songs_without_faceted_numbers(c: &mut Criterion) {
let db_fields_ids_map = index.fields_ids_map(&rtxn).unwrap();
let mut new_fields_ids_map = db_fields_ids_map.clone();
let mut indexer = indexer::DocumentOperation::new();
let mut indexer =
indexer::DocumentOperation::new(IndexDocumentsMethod::ReplaceDocuments);
let documents = utils::documents_from(datasets_paths::SMOL_SONGS, "csv");
indexer.replace_documents(&documents).unwrap();
indexer.add_documents(&documents).unwrap();
let indexer_alloc = Bump::new();
let (document_changes, _operation_stats, primary_key) = indexer
@@ -573,7 +565,6 @@ fn indexing_songs_without_faceted_numbers(c: &mut Criterion) {
&mut new_fields_ids_map,
&|| false,
Progress::default(),
None,
)
.unwrap();
@@ -586,10 +577,9 @@ fn indexing_songs_without_faceted_numbers(c: &mut Criterion) {
new_fields_ids_map,
primary_key,
&document_changes,
RuntimeEmbedders::default(),
EmbeddingConfigs::default(),
&|| false,
&Progress::default(),
&Default::default(),
)
.unwrap();
@@ -627,9 +617,10 @@ fn indexing_songs_without_faceted_fields(c: &mut Criterion) {
let db_fields_ids_map = index.fields_ids_map(&rtxn).unwrap();
let mut new_fields_ids_map = db_fields_ids_map.clone();
let mut indexer = indexer::DocumentOperation::new();
let mut indexer =
indexer::DocumentOperation::new(IndexDocumentsMethod::ReplaceDocuments);
let documents = utils::documents_from(datasets_paths::SMOL_SONGS, "csv");
indexer.replace_documents(&documents).unwrap();
indexer.add_documents(&documents).unwrap();
let indexer_alloc = Bump::new();
let (document_changes, _operation_stats, primary_key) = indexer
@@ -641,7 +632,6 @@ fn indexing_songs_without_faceted_fields(c: &mut Criterion) {
&mut new_fields_ids_map,
&|| false,
Progress::default(),
None,
)
.unwrap();
@@ -654,10 +644,9 @@ fn indexing_songs_without_faceted_fields(c: &mut Criterion) {
new_fields_ids_map,
primary_key,
&document_changes,
RuntimeEmbedders::default(),
EmbeddingConfigs::default(),
&|| false,
&Progress::default(),
&Default::default(),
)
.unwrap();
@@ -695,9 +684,10 @@ fn indexing_wiki(c: &mut Criterion) {
let db_fields_ids_map = index.fields_ids_map(&rtxn).unwrap();
let mut new_fields_ids_map = db_fields_ids_map.clone();
let mut indexer = indexer::DocumentOperation::new();
let mut indexer =
indexer::DocumentOperation::new(IndexDocumentsMethod::ReplaceDocuments);
let documents = utils::documents_from(datasets_paths::SMOL_WIKI_ARTICLES, "csv");
indexer.replace_documents(&documents).unwrap();
indexer.add_documents(&documents).unwrap();
let indexer_alloc = Bump::new();
let (document_changes, _operation_stats, primary_key) = indexer
@@ -709,7 +699,6 @@ fn indexing_wiki(c: &mut Criterion) {
&mut new_fields_ids_map,
&|| false,
Progress::default(),
None,
)
.unwrap();
@@ -722,10 +711,9 @@ fn indexing_wiki(c: &mut Criterion) {
new_fields_ids_map,
primary_key,
&document_changes,
RuntimeEmbedders::default(),
EmbeddingConfigs::default(),
&|| false,
&Progress::default(),
&Default::default(),
)
.unwrap();
@@ -762,9 +750,10 @@ fn reindexing_wiki(c: &mut Criterion) {
let db_fields_ids_map = index.fields_ids_map(&rtxn).unwrap();
let mut new_fields_ids_map = db_fields_ids_map.clone();
let mut indexer = indexer::DocumentOperation::new();
let mut indexer =
indexer::DocumentOperation::new(IndexDocumentsMethod::ReplaceDocuments);
let documents = utils::documents_from(datasets_paths::SMOL_WIKI_ARTICLES, "csv");
indexer.replace_documents(&documents).unwrap();
indexer.add_documents(&documents).unwrap();
let indexer_alloc = Bump::new();
let (document_changes, _operation_stats, primary_key) = indexer
@@ -776,7 +765,6 @@ fn reindexing_wiki(c: &mut Criterion) {
&mut new_fields_ids_map,
&|| false,
Progress::default(),
None,
)
.unwrap();
@@ -789,10 +777,9 @@ fn reindexing_wiki(c: &mut Criterion) {
new_fields_ids_map,
primary_key,
&document_changes,
RuntimeEmbedders::default(),
EmbeddingConfigs::default(),
&|| false,
&Progress::default(),
&Default::default(),
)
.unwrap();
@@ -808,9 +795,10 @@ fn reindexing_wiki(c: &mut Criterion) {
let db_fields_ids_map = index.fields_ids_map(&rtxn).unwrap();
let mut new_fields_ids_map = db_fields_ids_map.clone();
let mut indexer = indexer::DocumentOperation::new();
let mut indexer =
indexer::DocumentOperation::new(IndexDocumentsMethod::ReplaceDocuments);
let documents = utils::documents_from(datasets_paths::SMOL_WIKI_ARTICLES, "csv");
indexer.replace_documents(&documents).unwrap();
indexer.add_documents(&documents).unwrap();
let indexer_alloc = Bump::new();
let (document_changes, _operation_stats, primary_key) = indexer
@@ -822,7 +810,6 @@ fn reindexing_wiki(c: &mut Criterion) {
&mut new_fields_ids_map,
&|| false,
Progress::default(),
None,
)
.unwrap();
@@ -835,10 +822,9 @@ fn reindexing_wiki(c: &mut Criterion) {
new_fields_ids_map,
primary_key,
&document_changes,
RuntimeEmbedders::default(),
EmbeddingConfigs::default(),
&|| false,
&Progress::default(),
&Default::default(),
)
.unwrap();
@@ -877,9 +863,10 @@ fn deleting_wiki_in_batches_default(c: &mut Criterion) {
let db_fields_ids_map = index.fields_ids_map(&rtxn).unwrap();
let mut new_fields_ids_map = db_fields_ids_map.clone();
let mut indexer = indexer::DocumentOperation::new();
let mut indexer =
indexer::DocumentOperation::new(IndexDocumentsMethod::ReplaceDocuments);
let documents = utils::documents_from(datasets_paths::SMOL_WIKI_ARTICLES, "csv");
indexer.replace_documents(&documents).unwrap();
indexer.add_documents(&documents).unwrap();
let indexer_alloc = Bump::new();
let (document_changes, _operation_stats, primary_key) = indexer
@@ -891,7 +878,6 @@ fn deleting_wiki_in_batches_default(c: &mut Criterion) {
&mut new_fields_ids_map,
&|| false,
Progress::default(),
None,
)
.unwrap();
@@ -904,10 +890,9 @@ fn deleting_wiki_in_batches_default(c: &mut Criterion) {
new_fields_ids_map,
primary_key,
&document_changes,
RuntimeEmbedders::default(),
EmbeddingConfigs::default(),
&|| false,
&Progress::default(),
&Default::default(),
)
.unwrap();
@@ -954,10 +939,11 @@ fn indexing_wiki_in_three_batches(c: &mut Criterion) {
let db_fields_ids_map = index.fields_ids_map(&rtxn).unwrap();
let mut new_fields_ids_map = db_fields_ids_map.clone();
let mut indexer = indexer::DocumentOperation::new();
let mut indexer =
indexer::DocumentOperation::new(IndexDocumentsMethod::ReplaceDocuments);
let documents =
utils::documents_from(datasets_paths::SMOL_WIKI_ARTICLES_1_2, "csv");
indexer.replace_documents(&documents).unwrap();
indexer.add_documents(&documents).unwrap();
let indexer_alloc = Bump::new();
let (document_changes, _operation_stats, primary_key) = indexer
@@ -969,7 +955,6 @@ fn indexing_wiki_in_three_batches(c: &mut Criterion) {
&mut new_fields_ids_map,
&|| false,
Progress::default(),
None,
)
.unwrap();
@@ -982,10 +967,9 @@ fn indexing_wiki_in_three_batches(c: &mut Criterion) {
new_fields_ids_map,
primary_key,
&document_changes,
RuntimeEmbedders::default(),
EmbeddingConfigs::default(),
&|| false,
&Progress::default(),
&Default::default(),
)
.unwrap();
@@ -1001,10 +985,11 @@ fn indexing_wiki_in_three_batches(c: &mut Criterion) {
let db_fields_ids_map = index.fields_ids_map(&rtxn).unwrap();
let mut new_fields_ids_map = db_fields_ids_map.clone();
let mut indexer = indexer::DocumentOperation::new();
let mut indexer =
indexer::DocumentOperation::new(IndexDocumentsMethod::ReplaceDocuments);
let documents =
utils::documents_from(datasets_paths::SMOL_WIKI_ARTICLES_3_4, "csv");
indexer.replace_documents(&documents).unwrap();
indexer.add_documents(&documents).unwrap();
let indexer_alloc = Bump::new();
let (document_changes, _operation_stats, primary_key) = indexer
@@ -1016,7 +1001,6 @@ fn indexing_wiki_in_three_batches(c: &mut Criterion) {
&mut new_fields_ids_map,
&|| false,
Progress::default(),
None,
)
.unwrap();
@@ -1029,10 +1013,9 @@ fn indexing_wiki_in_three_batches(c: &mut Criterion) {
new_fields_ids_map,
primary_key,
&document_changes,
RuntimeEmbedders::default(),
EmbeddingConfigs::default(),
&|| false,
&Progress::default(),
&Default::default(),
)
.unwrap();
@@ -1044,10 +1027,11 @@ fn indexing_wiki_in_three_batches(c: &mut Criterion) {
let db_fields_ids_map = index.fields_ids_map(&rtxn).unwrap();
let mut new_fields_ids_map = db_fields_ids_map.clone();
let mut indexer = indexer::DocumentOperation::new();
let mut indexer =
indexer::DocumentOperation::new(IndexDocumentsMethod::ReplaceDocuments);
let documents =
utils::documents_from(datasets_paths::SMOL_WIKI_ARTICLES_4_4, "csv");
indexer.replace_documents(&documents).unwrap();
indexer.add_documents(&documents).unwrap();
let indexer_alloc = Bump::new();
let (document_changes, _operation_stats, primary_key) = indexer
@@ -1059,7 +1043,6 @@ fn indexing_wiki_in_three_batches(c: &mut Criterion) {
&mut new_fields_ids_map,
&|| false,
Progress::default(),
None,
)
.unwrap();
@@ -1072,10 +1055,9 @@ fn indexing_wiki_in_three_batches(c: &mut Criterion) {
new_fields_ids_map,
primary_key,
&document_changes,
RuntimeEmbedders::default(),
EmbeddingConfigs::default(),
&|| false,
&Progress::default(),
&Default::default(),
)
.unwrap();
@@ -1113,9 +1095,10 @@ fn indexing_movies_default(c: &mut Criterion) {
let db_fields_ids_map = index.fields_ids_map(&rtxn).unwrap();
let mut new_fields_ids_map = db_fields_ids_map.clone();
let mut indexer = indexer::DocumentOperation::new();
let mut indexer =
indexer::DocumentOperation::new(IndexDocumentsMethod::ReplaceDocuments);
let documents = utils::documents_from(datasets_paths::MOVIES, "json");
indexer.replace_documents(&documents).unwrap();
indexer.add_documents(&documents).unwrap();
let indexer_alloc = Bump::new();
let (document_changes, _operation_stats, primary_key) = indexer
@@ -1127,7 +1110,6 @@ fn indexing_movies_default(c: &mut Criterion) {
&mut new_fields_ids_map,
&|| false,
Progress::default(),
None,
)
.unwrap();
@@ -1140,10 +1122,9 @@ fn indexing_movies_default(c: &mut Criterion) {
new_fields_ids_map,
primary_key,
&document_changes,
RuntimeEmbedders::default(),
EmbeddingConfigs::default(),
&|| false,
&Progress::default(),
&Default::default(),
)
.unwrap();
@@ -1180,9 +1161,10 @@ fn reindexing_movies_default(c: &mut Criterion) {
let db_fields_ids_map = index.fields_ids_map(&rtxn).unwrap();
let mut new_fields_ids_map = db_fields_ids_map.clone();
let mut indexer = indexer::DocumentOperation::new();
let mut indexer =
indexer::DocumentOperation::new(IndexDocumentsMethod::ReplaceDocuments);
let documents = utils::documents_from(datasets_paths::MOVIES, "json");
indexer.replace_documents(&documents).unwrap();
indexer.add_documents(&documents).unwrap();
let indexer_alloc = Bump::new();
let (document_changes, _operation_stats, primary_key) = indexer
@@ -1194,7 +1176,6 @@ fn reindexing_movies_default(c: &mut Criterion) {
&mut new_fields_ids_map,
&|| false,
Progress::default(),
None,
)
.unwrap();
@@ -1207,10 +1188,9 @@ fn reindexing_movies_default(c: &mut Criterion) {
new_fields_ids_map,
primary_key,
&document_changes,
RuntimeEmbedders::default(),
EmbeddingConfigs::default(),
&|| false,
&Progress::default(),
&Default::default(),
)
.unwrap();
@@ -1226,9 +1206,10 @@ fn reindexing_movies_default(c: &mut Criterion) {
let db_fields_ids_map = index.fields_ids_map(&rtxn).unwrap();
let mut new_fields_ids_map = db_fields_ids_map.clone();
let mut indexer = indexer::DocumentOperation::new();
let mut indexer =
indexer::DocumentOperation::new(IndexDocumentsMethod::ReplaceDocuments);
let documents = utils::documents_from(datasets_paths::MOVIES, "json");
indexer.replace_documents(&documents).unwrap();
indexer.add_documents(&documents).unwrap();
let indexer_alloc = Bump::new();
let (document_changes, _operation_stats, primary_key) = indexer
@@ -1240,7 +1221,6 @@ fn reindexing_movies_default(c: &mut Criterion) {
&mut new_fields_ids_map,
&|| false,
Progress::default(),
None,
)
.unwrap();
@@ -1253,10 +1233,9 @@ fn reindexing_movies_default(c: &mut Criterion) {
new_fields_ids_map,
primary_key,
&document_changes,
RuntimeEmbedders::default(),
EmbeddingConfigs::default(),
&|| false,
&Progress::default(),
&Default::default(),
)
.unwrap();
@@ -1295,9 +1274,10 @@ fn deleting_movies_in_batches_default(c: &mut Criterion) {
let db_fields_ids_map = index.fields_ids_map(&rtxn).unwrap();
let mut new_fields_ids_map = db_fields_ids_map.clone();
let mut indexer = indexer::DocumentOperation::new();
let mut indexer =
indexer::DocumentOperation::new(IndexDocumentsMethod::ReplaceDocuments);
let documents = utils::documents_from(datasets_paths::MOVIES, "json");
indexer.replace_documents(&documents).unwrap();
indexer.add_documents(&documents).unwrap();
let indexer_alloc = Bump::new();
let (document_changes, _operation_stats, primary_key) = indexer
@@ -1309,7 +1289,6 @@ fn deleting_movies_in_batches_default(c: &mut Criterion) {
&mut new_fields_ids_map,
&|| false,
Progress::default(),
None,
)
.unwrap();
@@ -1322,10 +1301,9 @@ fn deleting_movies_in_batches_default(c: &mut Criterion) {
new_fields_ids_map,
primary_key,
&document_changes,
RuntimeEmbedders::default(),
EmbeddingConfigs::default(),
&|| false,
&Progress::default(),
&Default::default(),
)
.unwrap();
@@ -1371,10 +1349,9 @@ fn delete_documents_from_ids(index: Index, document_ids_to_delete: Vec<RoaringBi
new_fields_ids_map,
Some(primary_key),
&document_changes,
RuntimeEmbedders::default(),
EmbeddingConfigs::default(),
&|| false,
&Progress::default(),
&Default::default(),
)
.unwrap();
@@ -1410,9 +1387,10 @@ fn indexing_movies_in_three_batches(c: &mut Criterion) {
let db_fields_ids_map = index.fields_ids_map(&rtxn).unwrap();
let mut new_fields_ids_map = db_fields_ids_map.clone();
let mut indexer = indexer::DocumentOperation::new();
let mut indexer =
indexer::DocumentOperation::new(IndexDocumentsMethod::ReplaceDocuments);
let documents = utils::documents_from(datasets_paths::MOVIES_1_2, "json");
indexer.replace_documents(&documents).unwrap();
indexer.add_documents(&documents).unwrap();
let indexer_alloc = Bump::new();
let (document_changes, _operation_stats, primary_key) = indexer
@@ -1424,7 +1402,6 @@ fn indexing_movies_in_three_batches(c: &mut Criterion) {
&mut new_fields_ids_map,
&|| false,
Progress::default(),
None,
)
.unwrap();
@@ -1437,10 +1414,9 @@ fn indexing_movies_in_three_batches(c: &mut Criterion) {
new_fields_ids_map,
primary_key,
&document_changes,
RuntimeEmbedders::default(),
EmbeddingConfigs::default(),
&|| false,
&Progress::default(),
&Default::default(),
)
.unwrap();
@@ -1456,9 +1432,10 @@ fn indexing_movies_in_three_batches(c: &mut Criterion) {
let db_fields_ids_map = index.fields_ids_map(&rtxn).unwrap();
let mut new_fields_ids_map = db_fields_ids_map.clone();
let mut indexer = indexer::DocumentOperation::new();
let mut indexer =
indexer::DocumentOperation::new(IndexDocumentsMethod::ReplaceDocuments);
let documents = utils::documents_from(datasets_paths::MOVIES_3_4, "json");
indexer.replace_documents(&documents).unwrap();
indexer.add_documents(&documents).unwrap();
let indexer_alloc = Bump::new();
let (document_changes, _operation_stats, primary_key) = indexer
@@ -1470,7 +1447,6 @@ fn indexing_movies_in_three_batches(c: &mut Criterion) {
&mut new_fields_ids_map,
&|| false,
Progress::default(),
None,
)
.unwrap();
@@ -1483,10 +1459,9 @@ fn indexing_movies_in_three_batches(c: &mut Criterion) {
new_fields_ids_map,
primary_key,
&document_changes,
RuntimeEmbedders::default(),
EmbeddingConfigs::default(),
&|| false,
&Progress::default(),
&Default::default(),
)
.unwrap();
@@ -1498,9 +1473,10 @@ fn indexing_movies_in_three_batches(c: &mut Criterion) {
let db_fields_ids_map = index.fields_ids_map(&rtxn).unwrap();
let mut new_fields_ids_map = db_fields_ids_map.clone();
let mut indexer = indexer::DocumentOperation::new();
let mut indexer =
indexer::DocumentOperation::new(IndexDocumentsMethod::ReplaceDocuments);
let documents = utils::documents_from(datasets_paths::MOVIES_4_4, "json");
indexer.replace_documents(&documents).unwrap();
indexer.add_documents(&documents).unwrap();
let indexer_alloc = Bump::new();
let (document_changes, _operation_stats, primary_key) = indexer
@@ -1512,7 +1488,6 @@ fn indexing_movies_in_three_batches(c: &mut Criterion) {
&mut new_fields_ids_map,
&|| false,
Progress::default(),
None,
)
.unwrap();
@@ -1525,10 +1500,9 @@ fn indexing_movies_in_three_batches(c: &mut Criterion) {
new_fields_ids_map,
primary_key,
&document_changes,
RuntimeEmbedders::default(),
EmbeddingConfigs::default(),
&|| false,
&Progress::default(),
&Default::default(),
)
.unwrap();
@@ -1589,9 +1563,10 @@ fn indexing_nested_movies_default(c: &mut Criterion) {
let db_fields_ids_map = index.fields_ids_map(&rtxn).unwrap();
let mut new_fields_ids_map = db_fields_ids_map.clone();
let mut indexer = indexer::DocumentOperation::new();
let mut indexer =
indexer::DocumentOperation::new(IndexDocumentsMethod::ReplaceDocuments);
let documents = utils::documents_from(datasets_paths::NESTED_MOVIES, "json");
indexer.replace_documents(&documents).unwrap();
indexer.add_documents(&documents).unwrap();
let indexer_alloc = Bump::new();
let (document_changes, _operation_stats, primary_key) = indexer
@@ -1603,7 +1578,6 @@ fn indexing_nested_movies_default(c: &mut Criterion) {
&mut new_fields_ids_map,
&|| false,
Progress::default(),
None,
)
.unwrap();
@@ -1616,10 +1590,9 @@ fn indexing_nested_movies_default(c: &mut Criterion) {
new_fields_ids_map,
primary_key,
&document_changes,
RuntimeEmbedders::default(),
EmbeddingConfigs::default(),
&|| false,
&Progress::default(),
&Default::default(),
)
.unwrap();
@@ -1681,9 +1654,10 @@ fn deleting_nested_movies_in_batches_default(c: &mut Criterion) {
let db_fields_ids_map = index.fields_ids_map(&rtxn).unwrap();
let mut new_fields_ids_map = db_fields_ids_map.clone();
let mut indexer = indexer::DocumentOperation::new();
let mut indexer =
indexer::DocumentOperation::new(IndexDocumentsMethod::ReplaceDocuments);
let documents = utils::documents_from(datasets_paths::NESTED_MOVIES, "json");
indexer.replace_documents(&documents).unwrap();
indexer.add_documents(&documents).unwrap();
let indexer_alloc = Bump::new();
let (document_changes, _operation_stats, primary_key) = indexer
@@ -1695,7 +1669,6 @@ fn deleting_nested_movies_in_batches_default(c: &mut Criterion) {
&mut new_fields_ids_map,
&|| false,
Progress::default(),
None,
)
.unwrap();
@@ -1708,10 +1681,9 @@ fn deleting_nested_movies_in_batches_default(c: &mut Criterion) {
new_fields_ids_map,
primary_key,
&document_changes,
RuntimeEmbedders::default(),
EmbeddingConfigs::default(),
&|| false,
&Progress::default(),
&Default::default(),
)
.unwrap();
@@ -1765,9 +1737,10 @@ fn indexing_nested_movies_without_faceted_fields(c: &mut Criterion) {
let db_fields_ids_map = index.fields_ids_map(&rtxn).unwrap();
let mut new_fields_ids_map = db_fields_ids_map.clone();
let mut indexer = indexer::DocumentOperation::new();
let mut indexer =
indexer::DocumentOperation::new(IndexDocumentsMethod::ReplaceDocuments);
let documents = utils::documents_from(datasets_paths::NESTED_MOVIES, "json");
indexer.replace_documents(&documents).unwrap();
indexer.add_documents(&documents).unwrap();
let indexer_alloc = Bump::new();
let (document_changes, _operation_stats, primary_key) = indexer
@@ -1779,7 +1752,6 @@ fn indexing_nested_movies_without_faceted_fields(c: &mut Criterion) {
&mut new_fields_ids_map,
&|| false,
Progress::default(),
None,
)
.unwrap();
@@ -1792,10 +1764,9 @@ fn indexing_nested_movies_without_faceted_fields(c: &mut Criterion) {
new_fields_ids_map,
primary_key,
&document_changes,
RuntimeEmbedders::default(),
EmbeddingConfigs::default(),
&|| false,
&Progress::default(),
&Default::default(),
)
.unwrap();
@@ -1833,9 +1804,10 @@ fn indexing_geo(c: &mut Criterion) {
let db_fields_ids_map = index.fields_ids_map(&rtxn).unwrap();
let mut new_fields_ids_map = db_fields_ids_map.clone();
let mut indexer = indexer::DocumentOperation::new();
let mut indexer =
indexer::DocumentOperation::new(IndexDocumentsMethod::ReplaceDocuments);
let documents = utils::documents_from(datasets_paths::SMOL_ALL_COUNTRIES, "jsonl");
indexer.replace_documents(&documents).unwrap();
indexer.add_documents(&documents).unwrap();
let indexer_alloc = Bump::new();
let (document_changes, _operation_stats, primary_key) = indexer
@@ -1847,7 +1819,6 @@ fn indexing_geo(c: &mut Criterion) {
&mut new_fields_ids_map,
&|| false,
Progress::default(),
None,
)
.unwrap();
@@ -1860,10 +1831,9 @@ fn indexing_geo(c: &mut Criterion) {
new_fields_ids_map,
primary_key,
&document_changes,
RuntimeEmbedders::default(),
EmbeddingConfigs::default(),
&|| false,
&Progress::default(),
&Default::default(),
)
.unwrap();
@@ -1900,9 +1870,10 @@ fn reindexing_geo(c: &mut Criterion) {
let db_fields_ids_map = index.fields_ids_map(&rtxn).unwrap();
let mut new_fields_ids_map = db_fields_ids_map.clone();
let mut indexer = indexer::DocumentOperation::new();
let mut indexer =
indexer::DocumentOperation::new(IndexDocumentsMethod::ReplaceDocuments);
let documents = utils::documents_from(datasets_paths::SMOL_ALL_COUNTRIES, "jsonl");
indexer.replace_documents(&documents).unwrap();
indexer.add_documents(&documents).unwrap();
let indexer_alloc = Bump::new();
let (document_changes, _operation_stats, primary_key) = indexer
@@ -1914,7 +1885,6 @@ fn reindexing_geo(c: &mut Criterion) {
&mut new_fields_ids_map,
&|| false,
Progress::default(),
None,
)
.unwrap();
@@ -1927,10 +1897,9 @@ fn reindexing_geo(c: &mut Criterion) {
new_fields_ids_map,
primary_key,
&document_changes,
RuntimeEmbedders::default(),
EmbeddingConfigs::default(),
&|| false,
&Progress::default(),
&Default::default(),
)
.unwrap();
@@ -1946,9 +1915,10 @@ fn reindexing_geo(c: &mut Criterion) {
let db_fields_ids_map = index.fields_ids_map(&rtxn).unwrap();
let mut new_fields_ids_map = db_fields_ids_map.clone();
let mut indexer = indexer::DocumentOperation::new();
let mut indexer =
indexer::DocumentOperation::new(IndexDocumentsMethod::ReplaceDocuments);
let documents = utils::documents_from(datasets_paths::SMOL_ALL_COUNTRIES, "jsonl");
indexer.replace_documents(&documents).unwrap();
indexer.add_documents(&documents).unwrap();
let indexer_alloc = Bump::new();
let (document_changes, _operation_stats, primary_key) = indexer
@@ -1960,7 +1930,6 @@ fn reindexing_geo(c: &mut Criterion) {
&mut new_fields_ids_map,
&|| false,
Progress::default(),
None,
)
.unwrap();
@@ -1973,10 +1942,9 @@ fn reindexing_geo(c: &mut Criterion) {
new_fields_ids_map,
primary_key,
&document_changes,
RuntimeEmbedders::default(),
EmbeddingConfigs::default(),
&|| false,
&Progress::default(),
&Default::default(),
)
.unwrap();
@@ -2015,9 +1983,10 @@ fn deleting_geo_in_batches_default(c: &mut Criterion) {
let db_fields_ids_map = index.fields_ids_map(&rtxn).unwrap();
let mut new_fields_ids_map = db_fields_ids_map.clone();
let mut indexer = indexer::DocumentOperation::new();
let mut indexer =
indexer::DocumentOperation::new(IndexDocumentsMethod::ReplaceDocuments);
let documents = utils::documents_from(datasets_paths::SMOL_ALL_COUNTRIES, "jsonl");
indexer.replace_documents(&documents).unwrap();
indexer.add_documents(&documents).unwrap();
let indexer_alloc = Bump::new();
let (document_changes, _operation_stats, primary_key) = indexer
@@ -2029,7 +1998,6 @@ fn deleting_geo_in_batches_default(c: &mut Criterion) {
&mut new_fields_ids_map,
&|| false,
Progress::default(),
None,
)
.unwrap();
@@ -2042,10 +2010,9 @@ fn deleting_geo_in_batches_default(c: &mut Criterion) {
new_fields_ids_map,
primary_key,
&document_changes,
RuntimeEmbedders::default(),
EmbeddingConfigs::default(),
&|| false,
&Progress::default(),
&Default::default(),
)
.unwrap();

@@ -3,7 +3,6 @@ mod utils;
use criterion::{criterion_group, criterion_main};
use milli::update::Settings;
use milli::FilterableAttributesRule;
use utils::Conf;
#[cfg(not(windows))]
@@ -22,10 +21,8 @@ fn base_conf(builder: &mut Settings) {
["name", "alternatenames", "elevation"].iter().map(|s| s.to_string()).collect();
builder.set_searchable_fields(searchable_fields);
let filterable_fields = ["_geo", "population", "elevation"]
.iter()
.map(|s| FilterableAttributesRule::Field(s.to_string()))
.collect();
let filterable_fields =
["_geo", "population", "elevation"].iter().map(|s| s.to_string()).collect();
builder.set_filterable_fields(filterable_fields);
let sortable_fields =

@@ -3,7 +3,6 @@ mod utils;
use criterion::{criterion_group, criterion_main};
use milli::update::Settings;
use milli::FilterableAttributesRule;
use utils::Conf;
#[cfg(not(windows))]
@@ -23,7 +22,7 @@ fn base_conf(builder: &mut Settings) {
let faceted_fields = ["released-timestamp", "duration-float", "genre", "country", "artist"]
.iter()
.map(|s| FilterableAttributesRule::Field(s.to_string()))
.map(|s| s.to_string())
.collect();
builder.set_filterable_fields(faceted_fields);
}

@@ -1,114 +0,0 @@
//! This benchmark module is used to compare the performance of sorting documents in /search VS /documents
//!
//! The tests/benchmarks were designed in the context of a query returning only 20 documents.
mod datasets_paths;
mod utils;
use criterion::{criterion_group, criterion_main};
use milli::update::Settings;
use utils::Conf;
#[cfg(not(windows))]
#[global_allocator]
static ALLOC: mimalloc::MiMalloc = mimalloc::MiMalloc;
fn base_conf(builder: &mut Settings) {
let displayed_fields =
["geonameid", "name", "asciiname", "alternatenames", "_geo", "population"]
.iter()
.map(|s| s.to_string())
.collect();
builder.set_displayed_fields(displayed_fields);
let sortable_fields =
["_geo", "name", "population", "elevation", "timezone", "modification-date"]
.iter()
.map(|s| s.to_string())
.collect();
builder.set_sortable_fields(sortable_fields);
}
#[rustfmt::skip]
const BASE_CONF: Conf = Conf {
dataset: datasets_paths::SMOL_ALL_COUNTRIES,
dataset_format: "jsonl",
configure: base_conf,
primary_key: Some("geonameid"),
queries: &[""],
offsets: &[
Some((0, 20)), // The most common query in the real world
Some((0, 500)), // A query that ranges over many documents
Some((980, 20)), // The worst query that could happen in the real world
Some((800_000, 20)) // The worst query
],
get_documents: true,
..Conf::BASE
};
fn bench_sort(c: &mut criterion::Criterion) {
#[rustfmt::skip]
let confs = &[
utils::Conf {
group_name: "without sort",
sort: None,
..BASE_CONF
},
utils::Conf {
group_name: "sort on many different values",
sort: Some(vec!["name:asc"]),
..BASE_CONF
},
utils::Conf {
group_name: "sort on many similar values",
sort: Some(vec!["timezone:desc"]),
..BASE_CONF
},
utils::Conf {
group_name: "sort on many similar then different values",
sort: Some(vec!["timezone:desc", "name:asc"]),
..BASE_CONF
},
utils::Conf {
group_name: "sort on many different then similar values",
sort: Some(vec!["timezone:desc", "name:asc"]),
..BASE_CONF
},
utils::Conf {
group_name: "geo sort",
sample_size: Some(10),
sort: Some(vec!["_geoPoint(45.4777599, 9.1967508):asc"]),
..BASE_CONF
},
utils::Conf {
group_name: "sort on many similar values then geo sort",
sample_size: Some(50),
sort: Some(vec!["timezone:desc", "_geoPoint(45.4777599, 9.1967508):asc"]),
..BASE_CONF
},
utils::Conf {
group_name: "sort on many different values then geo sort",
sample_size: Some(50),
sort: Some(vec!["name:desc", "_geoPoint(45.4777599, 9.1967508):asc"]),
..BASE_CONF
},
utils::Conf {
group_name: "sort on many fields",
sort: Some(vec!["population:asc", "name:asc", "elevation:asc", "timezone:asc"]),
..BASE_CONF
},
];
utils::run_benches(c, confs);
}
criterion_group!(benches, bench_sort);
criterion_main!(benches);
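
Note on the removed `offsets` entries: each one is an `(offset, limit)` pair. A minimal sketch of what such a window selects from an already ordered list of document ids, assuming a hypothetical helper named `window` (it is not part of the benchmark crate) and plain `u32` ids:

```rust
/// Returns the `(offset, limit)` window of an already ordered list of ids.
/// `window` is a hypothetical helper used only for illustration.
pub fn window(ids: &[u32], offset: usize, limit: usize) -> Vec<u32> {
    ids.iter()
        .copied()
        // Drop the first `offset` ids; yields nothing if `offset` is past the end.
        .skip(offset)
        // Keep at most `limit` of the remaining ids.
        .take(limit)
        .collect()
}
```

With this shape, a large offset such as `(800_000, 20)` still has to walk past every skipped id, which is presumably why the removed configuration labels it the worst case.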

@@ -9,12 +9,11 @@ use anyhow::Context;
use bumpalo::Bump;
use criterion::BenchmarkId;
use memmap2::Mmap;
use milli::documents::sort::recursive_sort;
use milli::heed::EnvOpenOptions;
use milli::progress::Progress;
use milli::update::new::indexer;
use milli::update::{IndexerConfig, Settings};
use milli::vector::RuntimeEmbedders;
use milli::update::{IndexDocumentsMethod, IndexerConfig, Settings};
use milli::vector::EmbeddingConfigs;
use milli::{Criterion, Filter, Index, Object, TermsMatchingStrategy};
use serde_json::Value;
@@ -36,12 +35,6 @@ pub struct Conf<'a> {
pub configure: fn(&mut Settings),
pub filter: Option<&'a str>,
pub sort: Option<Vec<&'a str>>,
/// set to skip documents (offset, limit)
pub offsets: &'a [Option<(usize, usize)>],
/// enable if you want to bench getting documents without querying
pub get_documents: bool,
/// configure the benchmark sample size
pub sample_size: Option<usize>,
/// enable or disable the optional words on the query
pub optional_words: bool,
/// primary key, if there is None we'll auto-generate docids for every documents
@@ -59,9 +52,6 @@ impl Conf<'_> {
configure: |_| (),
filter: None,
sort: None,
offsets: &[None],
get_documents: false,
sample_size: None,
optional_words: true,
primary_key: None,
};
@@ -75,11 +65,10 @@ pub fn base_setup(conf: &Conf) -> Index {
}
create_dir_all(conf.database_name).unwrap();
let options = EnvOpenOptions::new();
let mut options = options.read_txn_without_tls();
let mut options = EnvOpenOptions::new();
options.map_size(100 * 1024 * 1024 * 1024); // 100 GB
options.max_readers(100);
let index = Index::new(options, conf.database_name, true).unwrap();
let index = Index::new(options, conf.database_name).unwrap();
let config = IndexerConfig::default();
let mut wtxn = index.write_txn().unwrap();
@@ -100,7 +89,7 @@ pub fn base_setup(conf: &Conf) -> Index {
(conf.configure)(&mut builder);
builder.execute(&|| false, &Progress::default(), Default::default()).unwrap();
builder.execute(|_| (), || false).unwrap();
wtxn.commit().unwrap();
let config = IndexerConfig::default();
@@ -110,8 +99,8 @@ pub fn base_setup(conf: &Conf) -> Index {
let mut new_fields_ids_map = db_fields_ids_map.clone();
let documents = documents_from(conf.dataset, conf.dataset_format);
let mut indexer = indexer::DocumentOperation::new();
indexer.replace_documents(&documents).unwrap();
let mut indexer = indexer::DocumentOperation::new(IndexDocumentsMethod::ReplaceDocuments);
indexer.add_documents(&documents).unwrap();
let indexer_alloc = Bump::new();
let (document_changes, _operation_stats, primary_key) = indexer
@@ -123,7 +112,6 @@ pub fn base_setup(conf: &Conf) -> Index {
&mut new_fields_ids_map,
&|| false,
Progress::default(),
None,
)
.unwrap();
@@ -136,10 +124,9 @@ pub fn base_setup(conf: &Conf) -> Index {
new_fields_ids_map,
primary_key,
&document_changes,
RuntimeEmbedders::default(),
EmbeddingConfigs::default(),
&|| false,
&Progress::default(),
&Default::default(),
)
.unwrap();
@@ -156,79 +143,25 @@ pub fn run_benches(c: &mut criterion::Criterion, confs: &[Conf]) {
let file_name = Path::new(conf.dataset).file_name().and_then(|f| f.to_str()).unwrap();
let name = format!("{}: {}", file_name, conf.group_name);
let mut group = c.benchmark_group(&name);
if let Some(sample_size) = conf.sample_size {
group.sample_size(sample_size);
}
for &query in conf.queries {
for offset in conf.offsets {
let parameter = match offset {
None => query.to_string(),
Some((offset, limit)) => format!("{query}[{offset}:{limit}]"),
};
group.bench_with_input(
BenchmarkId::from_parameter(parameter),
&query,
|b, &query| {
b.iter(|| {
let rtxn = index.read_txn().unwrap();
let mut search = index.search(&rtxn);
search
.query(query)
.terms_matching_strategy(TermsMatchingStrategy::default());
if let Some(filter) = conf.filter {
let filter = Filter::from_str(filter).unwrap().unwrap();
search.filter(filter);
}
if let Some(sort) = &conf.sort {
let sort = sort.iter().map(|sort| sort.parse().unwrap()).collect();
search.sort_criteria(sort);
}
if let Some((offset, limit)) = offset {
search.offset(*offset).limit(*limit);
}
let _ids = search.execute().unwrap();
});
},
);
}
}
if conf.get_documents {
for offset in conf.offsets {
let parameter = match offset {
None => String::from("get_documents"),
Some((offset, limit)) => format!("get_documents[{offset}:{limit}]"),
};
group.bench_with_input(BenchmarkId::from_parameter(parameter), &(), |b, &()| {
b.iter(|| {
let rtxn = index.read_txn().unwrap();
if let Some(sort) = &conf.sort {
let sort = sort.iter().map(|sort| sort.parse().unwrap()).collect();
let all_docs = index.documents_ids(&rtxn).unwrap();
let facet_sort =
recursive_sort(&index, &rtxn, sort, &all_docs).unwrap();
let iter = facet_sort.iter().unwrap();
if let Some((offset, limit)) = offset {
let _results = iter.skip(*offset).take(*limit).collect::<Vec<_>>();
} else {
let _results = iter.collect::<Vec<_>>();
}
} else {
let all_docs = index.documents_ids(&rtxn).unwrap();
if let Some((offset, limit)) = offset {
let _results =
all_docs.iter().skip(*offset).take(*limit).collect::<Vec<_>>();
} else {
let _results = all_docs.iter().collect::<Vec<_>>();
}
}
});
group.bench_with_input(BenchmarkId::from_parameter(query), &query, |b, &query| {
b.iter(|| {
let rtxn = index.read_txn().unwrap();
let mut search = index.search(&rtxn);
search.query(query).terms_matching_strategy(TermsMatchingStrategy::default());
if let Some(filter) = conf.filter {
let filter = Filter::from_str(filter).unwrap().unwrap();
search.filter(filter);
}
if let Some(sort) = &conf.sort {
let sort = sort.iter().map(|sort| sort.parse().unwrap()).collect();
search.sort_criteria(sort);
}
let _ids = search.execute().unwrap();
});
}
});
}
group.finish();
index.prepare_for_closing().wait();
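
Note on the removed labelling logic in `run_benches`: the benchmark parameter was derived from the query and the optional `(offset, limit)` pair. A minimal sketch of that logic in isolation, assuming a hypothetical free function `bench_parameter` (the diff inlines this in the loop; only the `format!` pattern is taken from it):

```rust
/// Builds the benchmark label for one (query, offset) combination.
/// `bench_parameter` is a hypothetical name; the diff inlines this logic.
pub fn bench_parameter(query: &str, offset: Option<(usize, usize)>) -> String {
    match offset {
        // No window configured: the label is just the query text.
        None => query.to_string(),
        // Window configured: append `[offset:limit]` to the query text.
        Some((offset, limit)) => format!("{query}[{offset}:{limit}]"),
    }
}
```

After this change only the `None` branch remains in effect, since the label is the bare query passed to `BenchmarkId::from_parameter`.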

@@ -67,7 +67,7 @@ fn main() -> anyhow::Result<()> {
writeln!(
&mut manifest_paths_file,
r#"pub const {}: &str = {:?};"#,
dataset.to_case(Case::UpperSnake),
dataset.to_case(Case::ScreamingSnake),
out_file.display(),
)?;
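
Note on the changed line: it turns a dataset name into the identifier of a generated `pub const`. A rough standard-library sketch of that kind of conversion, assuming a hypothetical helper `to_upper_snake` and assuming dataset names made of ASCII words separated by `-` or `_` (the example names in the test are assumptions too). It is not the `convert_case` implementation used in the diff:

```rust
/// Converts a dataset name into an upper snake case identifier.
/// Hypothetical helper: every non-alphanumeric ASCII character becomes `_`,
/// every other character is upper-cased. Word-boundary detection as done by
/// a real case-conversion crate is intentionally left out.
pub fn to_upper_snake(name: &str) -> String {
    name.chars()
        .map(|c| if c.is_ascii_alphanumeric() { c.to_ascii_uppercase() } else { '_' })
        .collect()
}
```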

@@ -11,8 +11,8 @@ license.workspace = true
# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
[dependencies]
time = { version = "0.3.41", features = ["parsing"] }
time = { version = "0.3.36", features = ["parsing"] }
[build-dependencies]
anyhow = "1.0.98"
vergen-git2 = "1.0.7"
anyhow = "1.0.86"
vergen-git2 = "1.0.0"

@@ -11,21 +11,21 @@ readme.workspace = true
license.workspace = true
[dependencies]
anyhow = "1.0.98"
flate2 = "1.1.2"
http = "1.3.1"
anyhow = "1.0.86"
flate2 = "1.0.30"
http = "1.1.0"
meilisearch-types = { path = "../meilisearch-types" }
once_cell = "1.21.3"
regex = "1.11.1"
roaring = { version = "0.10.12", features = ["serde"] }
serde = { version = "1.0.219", features = ["derive"] }
serde_json = { version = "1.0.140", features = ["preserve_order"] }
tar = "0.4.44"
tempfile = "3.20.0"
thiserror = "2.0.12"
time = { version = "0.3.41", features = ["serde-well-known", "formatting", "parsing", "macros"] }
tracing = "0.1.41"
uuid = { version = "1.17.0", features = ["serde", "v4"] }
once_cell = "1.19.0"
regex = "1.10.5"
roaring = { version = "0.10.7", features = ["serde"] }
serde = { version = "1.0.204", features = ["derive"] }
serde_json = { version = "1.0.120", features = ["preserve_order"] }
tar = "0.4.41"
tempfile = "3.10.1"
thiserror = "1.0.61"
time = { version = "0.3.36", features = ["serde-well-known", "formatting", "parsing", "macros"] }
tracing = "0.1.40"
uuid = { version = "1.10.0", features = ["serde", "v4"] }
[dev-dependencies]
big_s = "1.0.2"

@@ -10,10 +10,8 @@ dump
├── instance-uid.uuid
├── keys.jsonl
├── metadata.json
── tasks
├── update_files
│ └── [task_id].jsonl
│ └── queue.jsonl
└── batches
── tasks
├── update_files
│ └── [task_id].jsonl
└── queue.jsonl
```
```

@@ -1,17 +1,12 @@
#![allow(clippy::type_complexity)]
#![allow(clippy::wrong_self_convention)]
use std::collections::BTreeMap;
use meilisearch_types::batches::BatchId;
use meilisearch_types::byte_unit::Byte;
use meilisearch_types::error::ResponseError;
use meilisearch_types::keys::Key;
use meilisearch_types::milli::update::IndexDocumentsMethod;
use meilisearch_types::settings::Unchecked;
use meilisearch_types::tasks::{
Details, ExportIndexSettings, IndexSwap, KindWithContent, Status, Task, TaskId, TaskNetwork,
};
use meilisearch_types::tasks::{Details, IndexSwap, KindWithContent, Status, Task, TaskId};
use meilisearch_types::InstanceUid;
use roaring::RoaringBitmap;
use serde::{Deserialize, Serialize};
@@ -94,8 +89,6 @@ pub struct TaskDump {
default
)]
pub finished_at: Option<OffsetDateTime>,
#[serde(default, skip_serializing_if = "Option::is_none")]
pub network: Option<TaskNetwork>,
}
// A `Kind` specific version made for the dump. If modified you may break the dump.
@@ -131,7 +124,6 @@ pub enum KindDump {
},
IndexUpdate {
primary_key: Option<String>,
uid: Option<String>,
},
IndexSwap {
swaps: Vec<IndexSwap>,
@@ -149,18 +141,6 @@ pub enum KindDump {
instance_uid: Option<InstanceUid>,
},
SnapshotCreation,
Export {
url: String,
api_key: Option<String>,
payload_size: Option<Byte>,
indexes: BTreeMap<String, ExportIndexSettings>,
},
UpgradeDatabase {
from: (u32, u32, u32),
},
IndexCompaction {
index_uid: String,
},
}
impl From<Task> for TaskDump {
@@ -177,7 +157,6 @@ impl From<Task> for TaskDump {
enqueued_at: task.enqueued_at,
started_at: task.started_at,
finished_at: task.finished_at,
network: task.network,
}
}
}
@@ -217,8 +196,8 @@ impl From<KindWithContent> for KindDump {
KindWithContent::IndexCreation { primary_key, .. } => {
KindDump::IndexCreation { primary_key }
}
KindWithContent::IndexUpdate { primary_key, new_index_uid: uid, .. } => {
KindDump::IndexUpdate { primary_key, uid }
KindWithContent::IndexUpdate { primary_key, .. } => {
KindDump::IndexUpdate { primary_key }
}
KindWithContent::IndexSwap { swaps } => KindDump::IndexSwap { swaps },
KindWithContent::TaskCancelation { query, tasks } => {
@@ -231,21 +210,6 @@ impl From<KindWithContent> for KindDump {
KindDump::DumpCreation { keys, instance_uid }
}
KindWithContent::SnapshotCreation => KindDump::SnapshotCreation,
KindWithContent::Export { url, api_key, payload_size, indexes } => KindDump::Export {
url,
api_key,
payload_size,
indexes: indexes
.into_iter()
.map(|(pattern, settings)| (pattern.to_string(), settings))
.collect(),
},
KindWithContent::UpgradeDatabase { from: version } => {
KindDump::UpgradeDatabase { from: version }
}
KindWithContent::IndexCompaction { index_uid } => {
KindDump::IndexCompaction { index_uid }
}
}
}
}
@@ -258,17 +222,14 @@ pub(crate) mod test {
use big_s::S;
use maplit::{btreemap, btreeset};
use meilisearch_types::batches::{Batch, BatchEnqueuedAt, BatchStats};
use meilisearch_types::enterprise_edition::network::{Network, Remote};
use meilisearch_types::facet_values_sort::FacetValuesSort;
use meilisearch_types::features::RuntimeTogglableFeatures;
use meilisearch_types::index_uid_pattern::IndexUidPattern;
use meilisearch_types::keys::{Action, Key};
use meilisearch_types::milli;
use meilisearch_types::milli::update::Setting;
use meilisearch_types::milli::{self, FilterableAttributesRule};
use meilisearch_types::settings::{Checked, FacetingSettings, Settings};
use meilisearch_types::task_view::DetailsView;
use meilisearch_types::tasks::{BatchStopReason, Details, Kind, Status};
use meilisearch_types::tasks::{Details, Status};
use serde_json::{json, Map, Value};
use time::macros::datetime;
use uuid::Uuid;
@@ -310,10 +271,7 @@ pub(crate) mod test {
let settings = Settings {
displayed_attributes: Setting::Set(vec![S("race"), S("name")]).into(),
searchable_attributes: Setting::Set(vec![S("name"), S("race")]).into(),
filterable_attributes: Setting::Set(vec![
FilterableAttributesRule::Field(S("race")),
FilterableAttributesRule::Field(S("age")),
]),
filterable_attributes: Setting::Set(btreeset! { S("race"), S("age") }),
sortable_attributes: Setting::Set(btreeset! { S("age") }),
ranking_rules: Setting::NotSet,
stop_words: Setting::NotSet,
@@ -336,42 +294,11 @@ pub(crate) mod test {
localized_attributes: Setting::NotSet,
facet_search: Setting::NotSet,
prefix_search: Setting::NotSet,
chat: Setting::NotSet,
vector_store: Setting::NotSet,
_kind: std::marker::PhantomData,
};
settings.check()
}
pub fn create_test_batches() -> Vec<Batch> {
vec![Batch {
uid: 0,
details: DetailsView {
received_documents: Some(12),
indexed_documents: Some(Some(10)),
..DetailsView::default()
},
progress: None,
stats: BatchStats {
total_nb_tasks: 1,
status: maplit::btreemap! { Status::Succeeded => 1 },
types: maplit::btreemap! { Kind::DocumentAdditionOrUpdate => 1 },
index_uids: maplit::btreemap! { "doggo".to_string() => 1 },
progress_trace: Default::default(),
write_channel_congestion: None,
internal_database_sizes: Default::default(),
},
embedder_stats: Default::default(),
enqueued_at: Some(BatchEnqueuedAt {
earliest: datetime!(2022-11-11 0:00 UTC),
oldest: datetime!(2022-11-11 0:00 UTC),
}),
started_at: datetime!(2022-11-20 0:00 UTC),
finished_at: Some(datetime!(2022-11-21 0:00 UTC)),
stop_reason: BatchStopReason::Unspecified.to_string(),
}]
}
pub fn create_test_tasks() -> Vec<(TaskDump, Option<Vec<Document>>)> {
vec![
(
@@ -395,7 +322,6 @@ pub(crate) mod test {
enqueued_at: datetime!(2022-11-11 0:00 UTC),
started_at: Some(datetime!(2022-11-20 0:00 UTC)),
finished_at: Some(datetime!(2022-11-21 0:00 UTC)),
network: None,
},
None,
),
@@ -420,7 +346,6 @@ pub(crate) mod test {
enqueued_at: datetime!(2022-11-11 0:00 UTC),
started_at: None,
finished_at: None,
network: None,
},
Some(vec![
json!({ "id": 4, "race": "leonberg" }).as_object().unwrap().clone(),
@@ -440,7 +365,6 @@ pub(crate) mod test {
enqueued_at: datetime!(2022-11-15 0:00 UTC),
started_at: None,
finished_at: None,
network: None,
},
None,
),
@@ -497,15 +421,6 @@ pub(crate) mod test {
index.flush().unwrap();
index.settings(&settings).unwrap();
// ========== pushing the batch queue
let batches = create_test_batches();
let mut batch_queue = dump.create_batches_queue().unwrap();
for batch in &batches {
batch_queue.push_batch(batch).unwrap();
}
batch_queue.flush().unwrap();
// ========== pushing the task queue
let tasks = create_test_tasks();
@@ -534,10 +449,6 @@ pub(crate) mod test {
dump.create_experimental_features(features).unwrap();
// ========== network
let network = create_test_network();
dump.create_network(network).unwrap();
// create the dump
let mut file = tempfile::tempfile().unwrap();
dump.persist_to(&mut file).unwrap();
@@ -547,15 +458,7 @@ pub(crate) mod test {
}
fn create_test_features() -> RuntimeTogglableFeatures {
RuntimeTogglableFeatures::default()
}
fn create_test_network() -> Network {
Network {
local: Some("myself".to_string()),
remotes: maplit::btreemap! {"other".to_string() => Remote { url: "http://test".to_string(), search_api_key: Some("apiKey".to_string()), write_api_key: Some("docApiKey".to_string()) }},
sharding: false,
}
RuntimeTogglableFeatures { vector_store: true, ..Default::default() }
}
#[test]
@@ -606,9 +509,5 @@ pub(crate) mod test {
// ==== checking the features
let expected = create_test_features();
assert_eq!(dump.features().unwrap().unwrap(), expected);
// ==== checking the network
let expected = create_test_network();
assert_eq!(&expected, dump.network().unwrap().unwrap());
}
}

@@ -1,4 +1,3 @@
use std::fs::File;
use std::str::FromStr;
use super::v2_to_v3::CompatV2ToV3;
@@ -95,10 +94,6 @@ impl CompatIndexV1ToV2 {
self.from.documents().map(|it| Box::new(it) as Box<dyn Iterator<Item = _>>)
}
pub fn documents_file(&self) -> &File {
self.from.documents_file()
}
pub fn settings(&mut self) -> Result<v2::settings::Settings<v2::settings::Checked>> {
Ok(v2::settings::Settings::<v2::settings::Unchecked>::from(self.from.settings()?).check())
}

@@ -1,4 +1,3 @@
use std::fs::File;
use std::str::FromStr;
use time::OffsetDateTime;
@@ -97,7 +96,6 @@ impl CompatV2ToV3 {
}
}
#[allow(clippy::large_enum_variant)]
pub enum CompatIndexV2ToV3 {
V2(v2::V2IndexReader),
Compat(Box<CompatIndexV1ToV2>),
@@ -124,13 +122,6 @@ impl CompatIndexV2ToV3 {
}
}
pub fn documents_file(&self) -> &File {
match self {
CompatIndexV2ToV3::V2(v2) => v2.documents_file(),
CompatIndexV2ToV3::Compat(compat) => compat.documents_file(),
}
}
pub fn settings(&mut self) -> Result<v3::Settings<v3::Checked>> {
let settings = match self {
CompatIndexV2ToV3::V2(from) => from.settings()?,

@@ -1,5 +1,3 @@
use std::fs::File;
use super::v2_to_v3::{CompatIndexV2ToV3, CompatV2ToV3};
use super::v4_to_v5::CompatV4ToV5;
use crate::reader::{v3, v4, UpdateFile};
@@ -254,13 +252,6 @@ impl CompatIndexV3ToV4 {
}
}
pub fn documents_file(&self) -> &File {
match self {
CompatIndexV3ToV4::V3(v3) => v3.documents_file(),
CompatIndexV3ToV4::Compat(compat) => compat.documents_file(),
}
}
pub fn settings(&mut self) -> Result<v4::Settings<v4::Checked>> {
Ok(match self {
CompatIndexV3ToV4::V3(v3) => {

@@ -1,5 +1,3 @@
use std::fs::File;
use super::v3_to_v4::{CompatIndexV3ToV4, CompatV3ToV4};
use super::v5_to_v6::CompatV5ToV6;
use crate::reader::{v4, v5, Document};
@@ -243,13 +241,6 @@ impl CompatIndexV4ToV5 {
}
}
pub fn documents_file(&self) -> &File {
match self {
CompatIndexV4ToV5::V4(v4) => v4.documents_file(),
CompatIndexV4ToV5::Compat(compat) => compat.documents_file(),
}
}
pub fn settings(&mut self) -> Result<v5::Settings<v5::Checked>> {
match self {
CompatIndexV4ToV5::V4(v4) => Ok(v5::Settings::from(v4.settings()?).check()),

@@ -1,5 +1,3 @@
use std::fs::File;
use std::num::NonZeroUsize;
use std::str::FromStr;
use super::v4_to_v5::{CompatIndexV4ToV5, CompatV4ToV5};
@@ -85,7 +83,7 @@ impl CompatV5ToV6 {
v6::Kind::IndexCreation { primary_key }
}
v5::tasks::TaskContent::IndexUpdate { primary_key, .. } => {
v6::Kind::IndexUpdate { primary_key, uid: None }
v6::Kind::IndexUpdate { primary_key }
}
v5::tasks::TaskContent::IndexDeletion { .. } => v6::Kind::IndexDeletion,
v5::tasks::TaskContent::DocumentAddition {
@@ -140,11 +138,9 @@ impl CompatV5ToV6 {
v5::Details::Settings { settings } => {
v6::Details::SettingsUpdate { settings: Box::new(settings.into()) }
}
v5::Details::IndexInfo { primary_key } => v6::Details::IndexInfo {
primary_key,
new_index_uid: None,
old_index_uid: None,
},
v5::Details::IndexInfo { primary_key } => {
v6::Details::IndexInfo { primary_key }
}
v5::Details::DocumentDeletion {
received_document_ids,
deleted_documents,
@@ -163,7 +159,6 @@ impl CompatV5ToV6 {
enqueued_at: task_view.enqueued_at,
started_at: task_view.started_at,
finished_at: task_view.finished_at,
network: None,
};
(task, content_file)
@@ -201,14 +196,6 @@ impl CompatV5ToV6 {
pub fn features(&self) -> Result<Option<v6::RuntimeTogglableFeatures>> {
Ok(None)
}
pub fn network(&self) -> Result<Option<&v6::Network>> {
Ok(None)
}
pub fn webhooks(&self) -> Option<&v6::Webhooks> {
None
}
}
pub enum CompatIndexV5ToV6 {
@@ -251,13 +238,6 @@ impl CompatIndexV5ToV6 {
}
}
pub fn documents_file(&self) -> &File {
match self {
CompatIndexV5ToV6::V5(v5) => v5.documents_file(),
CompatIndexV5ToV6::Compat(compat) => compat.documents_file(),
}
}
pub fn settings(&mut self) -> Result<v6::Settings<v6::Checked>> {
match self {
CompatIndexV5ToV6::V5(v5) => Ok(v6::Settings::from(v5.settings()?).check()),
@@ -338,16 +318,7 @@ impl<T> From<v5::Settings<T>> for v6::Settings<v6::Unchecked> {
v6::Settings {
displayed_attributes: v6::Setting::from(settings.displayed_attributes).into(),
searchable_attributes: v6::Setting::from(settings.searchable_attributes).into(),
filterable_attributes: match settings.filterable_attributes {
v5::settings::Setting::Set(filterable_attributes) => v6::Setting::Set(
filterable_attributes
.into_iter()
.map(v6::FilterableAttributesRule::Field)
.collect(),
),
v5::settings::Setting::Reset => v6::Setting::Reset,
v5::settings::Setting::NotSet => v6::Setting::NotSet,
},
filterable_attributes: settings.filterable_attributes.into(),
sortable_attributes: settings.sortable_attributes.into(),
ranking_rules: {
match settings.ranking_rules {
@@ -389,7 +360,6 @@ impl<T> From<v5::Settings<T>> for v6::Settings<v6::Unchecked> {
},
disable_on_words: typo.disable_on_words.into(),
disable_on_attributes: typo.disable_on_attributes.into(),
disable_on_numbers: v6::Setting::NotSet,
}),
v5::Setting::Reset => v6::Setting::Reset,
v5::Setting::NotSet => v6::Setting::NotSet,
@@ -404,13 +374,7 @@ impl<T> From<v5::Settings<T>> for v6::Settings<v6::Unchecked> {
},
pagination: match settings.pagination {
v5::Setting::Set(pagination) => v6::Setting::Set(v6::PaginationSettings {
max_total_hits: match pagination.max_total_hits {
v5::Setting::Set(max_total_hits) => v6::Setting::Set(
max_total_hits.try_into().unwrap_or(NonZeroUsize::new(1).unwrap()),
),
v5::Setting::Reset => v6::Setting::Reset,
v5::Setting::NotSet => v6::Setting::NotSet,
},
max_total_hits: pagination.max_total_hits.into(),
}),
v5::Setting::Reset => v6::Setting::Reset,
v5::Setting::NotSet => v6::Setting::NotSet,
@@ -420,8 +384,6 @@ impl<T> From<v5::Settings<T>> for v6::Settings<v6::Unchecked> {
search_cutoff_ms: v6::Setting::NotSet,
facet_search: v6::Setting::NotSet,
prefix_search: v6::Setting::NotSet,
chat: v6::Setting::NotSet,
vector_store: v6::Setting::NotSet,
_kind: std::marker::PhantomData,
}
}
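
The `max_total_hits` arm in the hunk above converts a plain integer into a `NonZeroUsize`, falling back to 1 when the conversion fails. A minimal sketch of that conversion in isolation, using only the standard library; the helper name `to_non_zero` is hypothetical and not part of the diff:

```rust
use std::num::NonZeroUsize;

/// Hypothetical helper (not from the diff): convert a raw limit into a
/// `NonZeroUsize`, using 1 when the input is 0.
fn to_non_zero(raw: usize) -> NonZeroUsize {
    // `NonZeroUsize: TryFrom<usize>` only fails for 0, so the fallback
    // mirrors the `unwrap_or(NonZeroUsize::new(1).unwrap())` in the hunk.
    NonZeroUsize::try_from(raw).unwrap_or(NonZeroUsize::new(1).unwrap())
}
```

Under this sketch a stored value of 0 silently becomes 1, while any positive value passes through unchanged.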

@@ -23,7 +23,6 @@ mod v6;
pub type Document = serde_json::Map<String, serde_json::Value>;
pub type UpdateFile = dyn Iterator<Item = Result<Document>>;
#[allow(clippy::large_enum_variant)]
pub enum DumpReader {
Current(V6Reader),
Compat(CompatV5ToV6),
@@ -102,13 +101,6 @@ impl DumpReader {
}
}
pub fn batches(&mut self) -> Result<Box<dyn Iterator<Item = Result<v6::Batch>> + '_>> {
match self {
DumpReader::Current(current) => Ok(current.batches()),
DumpReader::Compat(_compat) => Ok(Box::new(std::iter::empty())),
}
}
pub fn keys(&mut self) -> Result<Box<dyn Iterator<Item = Result<v6::Key>> + '_>> {
match self {
DumpReader::Current(current) => Ok(current.keys()),
@@ -116,35 +108,12 @@ impl DumpReader {
}
}
pub fn chat_completions_settings(
&mut self,
) -> Result<Box<dyn Iterator<Item = Result<(String, v6::ChatCompletionSettings)>> + '_>> {
match self {
DumpReader::Current(current) => current.chat_completions_settings(),
DumpReader::Compat(_compat) => Ok(Box::new(std::iter::empty())),
}
}
pub fn features(&self) -> Result<Option<v6::RuntimeTogglableFeatures>> {
match self {
DumpReader::Current(current) => Ok(current.features()),
DumpReader::Compat(compat) => compat.features(),
}
}
pub fn network(&self) -> Result<Option<&v6::Network>> {
match self {
DumpReader::Current(current) => Ok(current.network()),
DumpReader::Compat(compat) => compat.network(),
}
}
pub fn webhooks(&self) -> Option<&v6::Webhooks> {
match self {
DumpReader::Current(current) => current.webhooks(),
DumpReader::Compat(compat) => compat.webhooks(),
}
}
}
impl From<V6Reader> for DumpReader {
@@ -199,14 +168,6 @@ impl DumpIndexReader {
}
}
/// A reference to a file in the NDJSON format containing all the documents of the index
pub fn documents_file(&self) -> &File {
match self {
DumpIndexReader::Current(v6) => v6.documents_file(),
DumpIndexReader::Compat(compat) => compat.documents_file(),
}
}
pub fn settings(&mut self) -> Result<v6::Settings<v6::Checked>> {
match self {
DumpIndexReader::Current(v6) => v6.settings(),
@@ -258,10 +219,6 @@ pub(crate) mod test {
insta::assert_snapshot!(dump.date().unwrap(), @"2024-05-16 15:51:34.151044 +00:00:00");
insta::assert_debug_snapshot!(dump.instance_uid().unwrap(), @"None");
// batches didn't exists at the time
let batches = dump.batches().unwrap().collect::<Result<Vec<_>>>().unwrap();
meili_snap::snapshot!(meili_snap::json_string!(batches), @"[]");
// tasks
let tasks = dump.tasks().unwrap().collect::<Result<Vec<_>>>().unwrap();
let (tasks, update_files): (Vec<_>, Vec<_>) = tasks.into_iter().unzip();
@@ -370,9 +327,10 @@ pub(crate) mod test {
}
}
assert_eq!(dump.features().unwrap().unwrap(), RuntimeTogglableFeatures::default());
assert_eq!(dump.network().unwrap(), None);
assert_eq!(dump.webhooks(), None);
assert_eq!(
dump.features().unwrap().unwrap(),
RuntimeTogglableFeatures { vector_store: true, ..Default::default() }
);
}
#[test]
@@ -384,10 +342,6 @@ pub(crate) mod test {
insta::assert_snapshot!(dump.date().unwrap(), @"2023-07-06 7:10:27.21958 +00:00:00");
insta::assert_debug_snapshot!(dump.instance_uid().unwrap(), @"None");
// batches didn't exists at the time
let batches = dump.batches().unwrap().collect::<Result<Vec<_>>>().unwrap();
meili_snap::snapshot!(meili_snap::json_string!(batches), @"[]");
// tasks
let tasks = dump.tasks().unwrap().collect::<Result<Vec<_>>>().unwrap();
let (tasks, update_files): (Vec<_>, Vec<_>) = tasks.into_iter().unzip();
@@ -419,65 +373,10 @@ pub(crate) mod test {
assert_eq!(test.documents().unwrap().count(), 1);
assert_eq!(dump.features().unwrap().unwrap(), RuntimeTogglableFeatures::default());
}
#[test]
fn import_dump_v6_network() {
let dump = File::open("tests/assets/v6-with-network.dump").unwrap();
let dump = DumpReader::open(dump).unwrap();
// top level infos
insta::assert_snapshot!(dump.date().unwrap(), @"2025-01-29 15:45:32.738676 +00:00:00");
insta::assert_debug_snapshot!(dump.instance_uid().unwrap(), @"None");
// network
let network = dump.network().unwrap().unwrap();
insta::assert_snapshot!(network.local.as_ref().unwrap(), @"ms-0");
insta::assert_snapshot!(network.remotes.get("ms-0").as_ref().unwrap().url, @"http://localhost:7700");
insta::assert_snapshot!(network.remotes.get("ms-0").as_ref().unwrap().search_api_key.is_none(), @"true");
insta::assert_snapshot!(network.remotes.get("ms-1").as_ref().unwrap().url, @"http://localhost:7701");
insta::assert_snapshot!(network.remotes.get("ms-1").as_ref().unwrap().search_api_key.is_none(), @"true");
insta::assert_snapshot!(network.remotes.get("ms-2").as_ref().unwrap().url, @"http://ms-5679.example.meilisearch.io");
insta::assert_snapshot!(network.remotes.get("ms-2").as_ref().unwrap().search_api_key.as_ref().unwrap(), @"foo");
}
#[test]
fn import_dump_v6_webhooks() {
let dump = File::open("tests/assets/v6-with-webhooks.dump").unwrap();
let dump = DumpReader::open(dump).unwrap();
// top level infos
insta::assert_snapshot!(dump.date().unwrap(), @"2025-07-31 9:21:30.479544 +00:00:00");
insta::assert_debug_snapshot!(dump.instance_uid().unwrap(), @r"
Some(
cb887dcc-34b3-48d1-addd-9815ae721a81,
)
");
// webhooks
let webhooks = dump.webhooks().unwrap();
insta::assert_json_snapshot!(webhooks, @r#"
{
"webhooks": {
"627ea538-733d-4545-8d2d-03526eb381ce": {
"url": "https://example.com/authorization-less",
"headers": {}
},
"771b0a28-ef28-4082-b984-536f82958c65": {
"url": "https://example.com/hook",
"headers": {
"authorization": "TOKEN"
}
},
"f3583083-f8a7-4cbf-a5e7-fb3f1e28a7e9": {
"url": "https://third.com",
"headers": {}
}
}
}
"#);
assert_eq!(
dump.features().unwrap().unwrap(),
RuntimeTogglableFeatures { vector_store: true, ..Default::default() }
);
}
#[test]
@@ -489,10 +388,6 @@ pub(crate) mod test {
insta::assert_snapshot!(dump.date().unwrap(), @"2022-10-04 15:55:10.344982459 +00:00:00");
insta::assert_snapshot!(dump.instance_uid().unwrap().unwrap(), @"9e15e977-f2ae-4761-943f-1eaf75fd736d");
// batches didn't exists at the time
let batches = dump.batches().unwrap().collect::<Result<Vec<_>>>().unwrap();
meili_snap::snapshot!(meili_snap::json_string!(batches), @"[]");
// tasks
let tasks = dump.tasks().unwrap().collect::<Result<Vec<_>>>().unwrap();
let (tasks, update_files): (Vec<_>, Vec<_>) = tasks.into_iter().unzip();
@@ -573,10 +468,6 @@ pub(crate) mod test {
insta::assert_snapshot!(dump.date().unwrap(), @"2022-10-06 12:53:49.131989609 +00:00:00");
insta::assert_snapshot!(dump.instance_uid().unwrap().unwrap(), @"9e15e977-f2ae-4761-943f-1eaf75fd736d");
// batches didn't exists at the time
let batches = dump.batches().unwrap().collect::<Result<Vec<_>>>().unwrap();
meili_snap::snapshot!(meili_snap::json_string!(batches), @"[]");
// tasks
let tasks = dump.tasks().unwrap().collect::<Result<Vec<_>>>().unwrap();
let (tasks, update_files): (Vec<_>, Vec<_>) = tasks.into_iter().unzip();
@@ -654,10 +545,6 @@ pub(crate) mod test {
insta::assert_snapshot!(dump.date().unwrap(), @"2022-10-07 11:39:03.709153554 +00:00:00");
assert_eq!(dump.instance_uid().unwrap(), None);
// batches didn't exists at the time
let batches = dump.batches().unwrap().collect::<Result<Vec<_>>>().unwrap();
meili_snap::snapshot!(meili_snap::json_string!(batches), @"[]");
// tasks
let tasks = dump.tasks().unwrap().collect::<Result<Vec<_>>>().unwrap();
let (tasks, update_files): (Vec<_>, Vec<_>) = tasks.into_iter().unzip();
@@ -751,10 +638,6 @@ pub(crate) mod test {
insta::assert_snapshot!(dump.date().unwrap(), @"2022-10-09 20:27:59.904096267 +00:00:00");
assert_eq!(dump.instance_uid().unwrap(), None);
// batches didn't exists at the time
let batches = dump.batches().unwrap().collect::<Result<Vec<_>>>().unwrap();
meili_snap::snapshot!(meili_snap::json_string!(batches), @"[]");
// tasks
let tasks = dump.tasks().unwrap().collect::<Result<Vec<_>>>().unwrap();
let (tasks, update_files): (Vec<_>, Vec<_>) = tasks.into_iter().unzip();
@@ -848,10 +731,6 @@ pub(crate) mod test {
insta::assert_snapshot!(dump.date().unwrap(), @"2023-01-30 16:26:09.247261 +00:00:00");
assert_eq!(dump.instance_uid().unwrap(), None);
// batches didn't exists at the time
let batches = dump.batches().unwrap().collect::<Result<Vec<_>>>().unwrap();
meili_snap::snapshot!(meili_snap::json_string!(batches), @"[]");
// tasks
let tasks = dump.tasks().unwrap().collect::<Result<Vec<_>>>().unwrap();
let (tasks, update_files): (Vec<_>, Vec<_>) = tasks.into_iter().unzip();
@@ -928,10 +807,6 @@ pub(crate) mod test {
assert_eq!(dump.date(), None);
assert_eq!(dump.instance_uid().unwrap(), None);
// batches didn't exists at the time
let batches = dump.batches().unwrap().collect::<Result<Vec<_>>>().unwrap();
meili_snap::snapshot!(meili_snap::json_string!(batches), @"[]");
// tasks
let tasks = dump.tasks().unwrap().collect::<Result<Vec<_>>>().unwrap();
let (tasks, update_files): (Vec<_>, Vec<_>) = tasks.into_iter().unzip();

View File

@@ -1,5 +1,5 @@
---
source: crates/dump/src/reader/mod.rs
source: dump/src/reader/mod.rs
expression: vector_index.settings().unwrap()
---
{
@@ -49,7 +49,6 @@ expression: vector_index.settings().unwrap()
"source": "huggingFace",
"model": "BAAI/bge-base-en-v1.5",
"revision": "617ca489d9e86b49b8167676d8220688b99db36e",
"pooling": "forceMean",
"documentTemplate": "{% for field in fields %} {{ field.name }}: {{ field.value }}\n{% endfor %}"
}
},

@@ -72,10 +72,6 @@ impl V1IndexReader {
.map(|line| -> Result<_> { Ok(serde_json::from_str(&line?)?) }))
}
pub fn documents_file(&self) -> &File {
self.documents.get_ref()
}
pub fn settings(&mut self) -> Result<self::settings::Settings> {
Ok(serde_json::from_reader(&mut self.settings)?)
}

@@ -203,10 +203,6 @@ impl V2IndexReader {
.map(|line| -> Result<_> { Ok(serde_json::from_str(&line?)?) }))
}
pub fn documents_file(&self) -> &File {
self.documents.get_ref()
}
pub fn settings(&mut self) -> Result<Settings<Checked>> {
Ok(self.settings.clone())
}

@@ -215,10 +215,6 @@ impl V3IndexReader {
.map(|line| -> Result<_> { Ok(serde_json::from_str(&line?)?) }))
}
pub fn documents_file(&self) -> &File {
self.documents.get_ref()
}
pub fn settings(&mut self) -> Result<Settings<Checked>> {
Ok(self.settings.clone())
}

@@ -108,7 +108,7 @@ where
/// not supported on untagged enums.
struct StarOrVisitor<T>(PhantomData<T>);
impl<T, FE> Visitor<'_> for StarOrVisitor<T>
impl<'de, T, FE> Visitor<'de> for StarOrVisitor<T>
where
T: FromStr<Err = FE>,
FE: Display,

@@ -210,10 +210,6 @@ impl V4IndexReader {
.map(|line| -> Result<_> { Ok(serde_json::from_str(&line?)?) }))
}
pub fn documents_file(&self) -> &File {
self.documents.get_ref()
}
pub fn settings(&mut self) -> Result<Settings<Checked>> {
Ok(self.settings.clone())
}

@@ -99,7 +99,7 @@ impl Task {
/// Return true when a task is finished.
/// A task is finished when its last state is either `Succeeded` or `Failed`.
pub fn is_finished(&self) -> bool {
self.events.last().is_some_and(|event| {
self.events.last().map_or(false, |event| {
matches!(event, TaskEvent::Succeded { .. } | TaskEvent::Failed { .. })
})
}
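
The two spellings in this hunk, `is_some_and(..)` and `map_or(false, ..)`, should behave identically on an `Option`. A small sketch that puts them side by side; the `Event` enum and both function names are stand-ins invented for illustration, not the crate's `TaskEvent`:

```rust
/// Stand-in for the task event type (hypothetical, simplified).
enum Event {
    Created,
    Succeeded,
    Failed,
}

/// Variant using `Option::is_some_and` (available since Rust 1.70).
fn is_finished_is_some_and(events: &[Event]) -> bool {
    events.last().is_some_and(|e| matches!(e, Event::Succeeded | Event::Failed))
}

/// Variant using `Option::map_or` with a `false` default.
fn is_finished_map_or(events: &[Event]) -> bool {
    events.last().map_or(false, |e| matches!(e, Event::Succeeded | Event::Failed))
}
```

Both return `false` for an empty event list and otherwise look only at the last event, so the choice between them is a matter of toolchain version and lint preference rather than behavior.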

@@ -108,7 +108,7 @@ where
/// not supported on untagged enums.
struct StarOrVisitor<T>(PhantomData<T>);
impl<T, FE> Visitor<'_> for StarOrVisitor<T>
impl<'de, T, FE> Visitor<'de> for StarOrVisitor<T>
where
T: FromStr<Err = FE>,
FE: Display,

@@ -247,10 +247,6 @@ impl V5IndexReader {
.map(|line| -> Result<_> { Ok(serde_json::from_str(&line?)?) }))
}
pub fn documents_file(&self) -> &File {
self.documents.get_ref()
}
pub fn settings(&mut self) -> Result<Settings<Checked>> {
Ok(self.settings.clone())
}

@@ -114,7 +114,7 @@ impl Task {
/// Return true when a task is finished.
/// A task is finished when its last state is either `Succeeded` or `Failed`.
pub fn is_finished(&self) -> bool {
self.events.last().is_some_and(|event| {
self.events.last().map_or(false, |event| {
matches!(event, TaskEvent::Succeeded { .. } | TaskEvent::Failed { .. })
})
}

@@ -1,10 +1,8 @@
use std::ffi::OsStr;
use std::fs::{self, File};
use std::io::{BufRead, BufReader, ErrorKind};
use std::path::Path;
pub use meilisearch_types::milli;
use meilisearch_types::milli::vector::embedder::hf::OverridePooling;
use tempfile::TempDir;
use time::OffsetDateTime;
use tracing::debug;
@@ -20,12 +18,8 @@ pub type Checked = meilisearch_types::settings::Checked;
pub type Unchecked = meilisearch_types::settings::Unchecked;
pub type Task = crate::TaskDump;
pub type Batch = meilisearch_types::batches::Batch;
pub type Key = meilisearch_types::keys::Key;
pub type ChatCompletionSettings = meilisearch_types::features::ChatCompletionSettings;
pub type RuntimeTogglableFeatures = meilisearch_types::features::RuntimeTogglableFeatures;
pub type Network = meilisearch_types::enterprise_edition::network::Network;
pub type Webhooks = meilisearch_types::webhooks::WebhooksDumpView;
// ===== Other types to clarify the code of the compat module
// everything related to the tasks
@@ -49,18 +43,13 @@ pub type ResponseError = meilisearch_types::error::ResponseError;
pub type Code = meilisearch_types::error::Code;
pub type RankingRuleView = meilisearch_types::settings::RankingRuleView;
pub type FilterableAttributesRule = meilisearch_types::milli::FilterableAttributesRule;
pub struct V6Reader {
dump: TempDir,
instance_uid: Option<Uuid>,
metadata: Metadata,
tasks: BufReader<File>,
batches: Option<BufReader<File>>,
keys: BufReader<File>,
features: Option<RuntimeTogglableFeatures>,
network: Option<Network>,
webhooks: Option<Webhooks>,
}
impl V6Reader {
@@ -88,46 +77,14 @@ impl V6Reader {
} else {
None
};
let batches = match File::open(dump.path().join("batches").join("queue.jsonl")) {
Ok(file) => Some(BufReader::new(file)),
// The batch file was only introduced during the v1.13, anything prior to that won't have batches
Err(err) if err.kind() == ErrorKind::NotFound => None,
Err(e) => return Err(e.into()),
};
let network = match fs::read(dump.path().join("network.json")) {
Ok(network_file) => Some(serde_json::from_reader(&*network_file)?),
Err(error) => match error.kind() {
// Allows the file to be missing, this will only result in all experimental features disabled.
ErrorKind::NotFound => {
debug!("`network.json` not found in dump");
None
}
_ => return Err(error.into()),
},
};
let webhooks = match fs::read(dump.path().join("webhooks.json")) {
Ok(webhooks_file) => Some(serde_json::from_reader(&*webhooks_file)?),
Err(error) => match error.kind() {
ErrorKind::NotFound => {
debug!("`webhooks.json` not found in dump");
None
}
_ => return Err(error.into()),
},
};
Ok(V6Reader {
metadata: serde_json::from_reader(&*meta_file)?,
instance_uid,
tasks: BufReader::new(File::open(dump.path().join("tasks").join("queue.jsonl"))?),
batches,
keys: BufReader::new(File::open(dump.path().join("keys.jsonl"))?),
features,
network,
dump,
webhooks,
})
}
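
The `batches`, `network.json` and `webhooks.json` blocks in this hunk share one shape: a missing file is treated as "absent" while any other I/O error is propagated. A minimal standalone sketch of that shape, assuming only the standard library; `open_optional` is a hypothetical name, not a function from the crate:

```rust
use std::fs::File;
use std::io::{self, BufReader, ErrorKind};
use std::path::Path;

/// Hypothetical helper: open a file that older dumps may not contain.
/// Returns `Ok(None)` when the file does not exist and forwards every
/// other error to the caller.
fn open_optional(path: &Path) -> io::Result<Option<BufReader<File>>> {
    match File::open(path) {
        Ok(file) => Ok(Some(BufReader::new(file))),
        // A missing file is expected for dumps created before the file existed.
        Err(err) if err.kind() == ErrorKind::NotFound => Ok(None),
        // Permission problems, I/O failures, etc. are real errors.
        Err(err) => Err(err),
    }
}
```

Keeping the `NotFound` check as a match guard avoids swallowing unrelated errors, which a blanket `.ok()` on the `File::open` result would do.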
@@ -167,7 +124,7 @@ impl V6Reader {
&mut self,
) -> Box<dyn Iterator<Item = Result<(Task, Option<Box<super::UpdateFile>>)>> + '_> {
Box::new((&mut self.tasks).lines().map(|line| -> Result<_> {
let task: Task = serde_json::from_str(&line?)?;
let task: Task = serde_json::from_str(&line?).unwrap();
let update_file_path = self
.dump
@@ -179,7 +136,8 @@ impl V6Reader {
if update_file_path.exists() {
Ok((
task,
Some(Box::new(UpdateFile::new(&update_file_path)?) as Box<super::UpdateFile>),
Some(Box::new(UpdateFile::new(&update_file_path).unwrap())
as Box<super::UpdateFile>),
))
} else {
Ok((task, None))
@@ -187,61 +145,15 @@ impl V6Reader {
}))
}
pub fn batches(&mut self) -> Box<dyn Iterator<Item = Result<Batch>> + '_> {
match self.batches.as_mut() {
Some(batches) => Box::new((batches).lines().map(|line| -> Result<_> {
let batch = serde_json::from_str(&line?)?;
Ok(batch)
})),
None => Box::new(std::iter::empty()) as Box<dyn Iterator<Item = Result<Batch>> + '_>,
}
}
pub fn keys(&mut self) -> Box<dyn Iterator<Item = Result<Key>> + '_> {
Box::new(
(&mut self.keys).lines().map(|line| -> Result<_> { Ok(serde_json::from_str(&line?)?) }),
)
}
pub fn chat_completions_settings(
&mut self,
) -> Result<Box<dyn Iterator<Item = Result<(String, ChatCompletionSettings)>> + '_>> {
let entries = match fs::read_dir(self.dump.path().join("chat-completions-settings")) {
Ok(entries) => entries,
Err(e) if e.kind() == ErrorKind::NotFound => return Ok(Box::new(std::iter::empty())),
Err(e) => return Err(e.into()),
};
Ok(Box::new(
entries
.map(|entry| -> Result<Option<_>> {
let entry = entry?;
let file_name = entry.file_name();
let path = Path::new(&file_name);
if entry.file_type()?.is_file() && path.extension() == Some(OsStr::new("json"))
{
let name = path.file_stem().unwrap().to_str().unwrap().to_string();
let file = File::open(entry.path())?;
let settings = serde_json::from_reader(file)?;
Ok(Some((name, settings)))
} else {
Ok(None)
}
})
.filter_map(|entry| entry.transpose()),
))
}
pub fn features(&self) -> Option<RuntimeTogglableFeatures> {
self.features
}
pub fn network(&self) -> Option<&Network> {
self.network.as_ref()
}
pub fn webhooks(&self) -> Option<&Webhooks> {
self.webhooks.as_ref()
}
}
pub struct UpdateFile {
@@ -297,34 +209,8 @@ impl V6IndexReader {
.map(|line| -> Result<_> { Ok(serde_json::from_str(&line?)?) }))
}
pub fn documents_file(&self) -> &File {
self.documents.get_ref()
}
pub fn settings(&mut self) -> Result<Settings<Checked>> {
let mut settings: Settings<Unchecked> = serde_json::from_reader(&mut self.settings)?;
patch_embedders(&mut settings);
let settings: Settings<Unchecked> = serde_json::from_reader(&mut self.settings)?;
Ok(settings.check())
}
}
fn patch_embedders(settings: &mut Settings<Unchecked>) {
if let Setting::Set(embedders) = &mut settings.embedders {
for settings in embedders.values_mut() {
let Setting::Set(settings) = &mut settings.inner else {
continue;
};
if settings.source != Setting::Set(milli::vector::settings::EmbedderSource::HuggingFace)
{
continue;
}
settings.pooling = match settings.pooling {
Setting::Set(pooling) => Setting::Set(pooling),
// if the pooling for a hugging face embedder is not set, force it to `forceMean`
// for backward compatibility with v1.13
// dumps created in v1.14 and up will have the setting set for hugging face embedders
Setting::Reset | Setting::NotSet => Setting::Set(OverridePooling::ForceMean),
};
}
}
}
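
The fallback in `patch_embedders` can be read in isolation: an explicit pooling value is kept, and anything else is forced to mean pooling. The sketch below reproduces only that match. `Setting`, `Pooling` and `patch_pooling` are simplified, hypothetical stand-ins, not the real `milli` types.

```rust
/// Simplified stand-in for a three-state setting (hypothetical, not the real `milli` type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Setting<T> {
    Set(T),
    Reset,
    NotSet,
}

/// Simplified stand-in for the pooling override (hypothetical variants).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Pooling {
    UseModel,
    ForceMean,
}

/// Keeps an explicit pooling value and falls back to `ForceMean` otherwise,
/// mirroring the match in `patch_embedders`.
pub fn patch_pooling(pooling: Setting<Pooling>) -> Setting<Pooling> {
    match pooling {
        Setting::Set(p) => Setting::Set(p),
        Setting::Reset | Setting::NotSet => Setting::Set(Pooling::ForceMean),
    }
}
```

Under these assumptions, a dump that never recorded a pooling value is read as if it had asked for mean pooling, while a dump with an explicit value is left untouched.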


@@ -4,12 +4,9 @@ use std::path::PathBuf;
use flate2::write::GzEncoder;
use flate2::Compression;
use meilisearch_types::batches::Batch;
use meilisearch_types::enterprise_edition::network::Network;
use meilisearch_types::features::{ChatCompletionSettings, RuntimeTogglableFeatures};
use meilisearch_types::features::RuntimeTogglableFeatures;
use meilisearch_types::keys::Key;
use meilisearch_types::settings::{Checked, Settings};
use meilisearch_types::webhooks::WebhooksDumpView;
use serde_json::{Map, Value};
use tempfile::TempDir;
use time::OffsetDateTime;
@@ -53,18 +50,10 @@ impl DumpWriter {
KeyWriter::new(self.dir.path().to_path_buf())
}
pub fn create_chat_completions_settings(&self) -> Result<ChatCompletionsSettingsWriter> {
ChatCompletionsSettingsWriter::new(self.dir.path().join("chat-completions-settings"))
}
pub fn create_tasks_queue(&self) -> Result<TaskWriter> {
TaskWriter::new(self.dir.path().join("tasks"))
}
pub fn create_batches_queue(&self) -> Result<BatchWriter> {
BatchWriter::new(self.dir.path().join("batches"))
}
pub fn create_experimental_features(&self, features: RuntimeTogglableFeatures) -> Result<()> {
Ok(std::fs::write(
self.dir.path().join("experimental-features.json"),
@@ -72,17 +61,6 @@ impl DumpWriter {
)?)
}
pub fn create_network(&self, network: Network) -> Result<()> {
Ok(std::fs::write(self.dir.path().join("network.json"), serde_json::to_string(&network)?)?)
}
pub fn create_webhooks(&self, webhooks: WebhooksDumpView) -> Result<()> {
Ok(std::fs::write(
self.dir.path().join("webhooks.json"),
serde_json::to_string(&webhooks)?,
)?)
}
pub fn persist_to(self, mut writer: impl Write) -> Result<()> {
let gz_encoder = GzEncoder::new(&mut writer, Compression::default());
let mut tar_encoder = tar::Builder::new(gz_encoder);
@@ -106,7 +84,7 @@ impl KeyWriter {
}
pub fn push_key(&mut self, key: &Key) -> Result<()> {
serde_json::to_writer(&mut self.keys, &key)?;
self.keys.write_all(&serde_json::to_vec(key)?)?;
self.keys.write_all(b"\n")?;
Ok(())
}
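
Both variants of `push_key` in this hunk produce the same framing: one serialized record, then a single `\n`. The readers rely on that framing when they call `lines()`. A minimal std-only sketch of the round trip follows. `push_record` and `read_records` are hypothetical names, and the records are assumed to be already serialized, since `serde_json` is not used here.

```rust
use std::io::{self, BufRead, BufReader, Write};

/// Appends one already-serialized record followed by a newline (JSON Lines framing).
/// Hypothetical helper for illustration only.
pub fn push_record(writer: &mut impl Write, serialized: &[u8]) -> io::Result<()> {
    writer.write_all(serialized)?;
    writer.write_all(b"\n")
}

/// Reads the records back, one per line, the way the `lines()`-based readers do.
pub fn read_records(reader: impl io::Read) -> io::Result<Vec<String>> {
    BufReader::new(reader).lines().collect()
}
```

The framing only holds if a serialized record never contains a raw newline, which is the case for compact JSON output.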
@@ -117,24 +95,6 @@ impl KeyWriter {
}
}
pub struct ChatCompletionsSettingsWriter {
path: PathBuf,
}
impl ChatCompletionsSettingsWriter {
pub(crate) fn new(path: PathBuf) -> Result<Self> {
std::fs::create_dir(&path)?;
Ok(ChatCompletionsSettingsWriter { path })
}
pub fn push_settings(&mut self, name: &str, settings: &ChatCompletionSettings) -> Result<()> {
let mut settings_file = File::create(self.path.join(name).with_extension("json"))?;
serde_json::to_writer(&mut settings_file, &settings)?;
settings_file.flush()?;
Ok(())
}
}
pub struct TaskWriter {
queue: BufWriter<File>,
update_files: PathBuf,
@@ -154,7 +114,7 @@ impl TaskWriter {
/// Pushes tasks in the dump.
/// If the task has an associated `update_file`, it'll use the `task_id` as its name.
pub fn push_task(&mut self, task: &TaskDump) -> Result<UpdateFile> {
serde_json::to_writer(&mut self.queue, &task)?;
self.queue.write_all(&serde_json::to_vec(task)?)?;
self.queue.write_all(b"\n")?;
Ok(UpdateFile::new(self.update_files.join(format!("{}.jsonl", task.uid))))
@@ -166,30 +126,6 @@ impl TaskWriter {
}
}
pub struct BatchWriter {
queue: BufWriter<File>,
}
impl BatchWriter {
pub(crate) fn new(path: PathBuf) -> Result<Self> {
std::fs::create_dir(&path)?;
let queue = File::create(path.join("queue.jsonl"))?;
Ok(BatchWriter { queue: BufWriter::new(queue) })
}
/// Pushes batches in the dump.
pub fn push_batch(&mut self, batch: &Batch) -> Result<()> {
serde_json::to_writer(&mut self.queue, &batch)?;
self.queue.write_all(b"\n")?;
Ok(())
}
pub fn flush(mut self) -> Result<()> {
self.queue.flush()?;
Ok(())
}
}
pub struct UpdateFile {
path: PathBuf,
writer: Option<BufWriter<File>>,
@@ -201,8 +137,8 @@ impl UpdateFile {
}
pub fn push_document(&mut self, document: &Document) -> Result<()> {
if let Some(mut writer) = self.writer.as_mut() {
serde_json::to_writer(&mut writer, &document)?;
if let Some(writer) = self.writer.as_mut() {
writer.write_all(&serde_json::to_vec(document)?)?;
writer.write_all(b"\n")?;
} else {
let file = File::create(&self.path).unwrap();
@@ -269,8 +205,8 @@ pub(crate) mod test {
use super::*;
use crate::reader::Document;
use crate::test::{
create_test_api_keys, create_test_batches, create_test_documents, create_test_dump,
create_test_instance_uid, create_test_settings, create_test_tasks,
create_test_api_keys, create_test_documents, create_test_dump, create_test_instance_uid,
create_test_settings, create_test_tasks,
};
fn create_directory_hierarchy(dir: &Path) -> String {
@@ -345,10 +281,8 @@ pub(crate) mod test {
let dump_path = dump.path();
// ==== checking global file hierarchy (we want to be sure there aren't too many files or too few)
insta::assert_snapshot!(create_directory_hierarchy(dump_path), @r"
insta::assert_snapshot!(create_directory_hierarchy(dump_path), @r###"
.
├---- batches/
│ └---- queue.jsonl
├---- indexes/
│ └---- doggos/
│ │ ├---- documents.jsonl
@@ -361,9 +295,8 @@ pub(crate) mod test {
├---- experimental-features.json
├---- instance_uid.uuid
├---- keys.jsonl
---- metadata.json
└---- network.json
");
---- metadata.json
"###);
// ==== checking the top level infos
let metadata = fs::read_to_string(dump_path.join("metadata.json")).unwrap();
@@ -416,16 +349,6 @@ pub(crate) mod test {
}
}
// ==== checking the batch queue
let batches_queue = fs::read_to_string(dump_path.join("batches/queue.jsonl")).unwrap();
for (batch, expected) in batches_queue.lines().zip(create_test_batches()) {
let mut batch = serde_json::from_str::<Batch>(batch).unwrap();
if batch.details.settings == Some(Box::new(Settings::<Unchecked>::default())) {
batch.details.settings = None;
}
assert_eq!(batch, expected, "{batch:#?}{expected:#?}");
}
// ==== checking the keys
let keys = fs::read_to_string(dump_path.join("keys.jsonl")).unwrap();
for (key, expected) in keys.lines().zip(create_test_api_keys()) {


@@ -11,7 +11,7 @@ edition.workspace = true
license.workspace = true
[dependencies]
tempfile = "3.20.0"
thiserror = "2.0.12"
tracing = "0.1.41"
uuid = { version = "1.17.0", features = ["serde", "v4"] }
tempfile = "3.10.1"
thiserror = "1.0.61"
tracing = "0.1.40"
uuid = { version = "1.10.0", features = ["serde", "v4"] }


@@ -148,10 +148,11 @@ impl File {
Ok(Self { path: PathBuf::new(), file: None })
}
pub fn persist(self) -> Result<Option<StdFile>> {
let Some(file) = self.file else { return Ok(None) };
Ok(Some(file.persist(&self.path)?))
pub fn persist(self) -> Result<()> {
if let Some(file) = self.file {
file.persist(&self.path)?;
}
Ok(())
}
}


@@ -14,9 +14,7 @@ license.workspace = true
[dependencies]
nom = "7.1.3"
nom_locate = "4.2.0"
unescaper = "0.1.6"
levenshtein_automata = { version = "0.2.1", features = ["fst_automaton"] }
unescaper = "0.1.5"
[dev-dependencies]
# fixed version due to format breakages in v1.40
insta = "=1.39.0"
insta = "1.39.0"


@@ -7,14 +7,12 @@
use nom::branch::alt;
use nom::bytes::complete::tag;
use nom::character::complete::{char, multispace0, multispace1};
use nom::combinator::{cut, map, value};
use nom::sequence::{preceded, terminated, tuple};
use nom::character::complete::multispace1;
use nom::combinator::cut;
use nom::sequence::{terminated, tuple};
use Condition::*;
use crate::error::IResultExt;
use crate::value::{parse_vector_value, parse_vector_value_cut};
use crate::{parse_value, Error, ErrorKind, FilterCondition, IResult, Span, Token, VectorFilter};
use crate::{parse_value, FilterCondition, IResult, Span, Token};
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Condition<'a> {
@@ -32,25 +30,6 @@ pub enum Condition<'a> {
StartsWith { keyword: Token<'a>, word: Token<'a> },
}
impl Condition<'_> {
pub fn operator(&self) -> &str {
match self {
Condition::GreaterThan(_) => ">",
Condition::GreaterThanOrEqual(_) => ">=",
Condition::Equal(_) => "=",
Condition::NotEqual(_) => "!=",
Condition::Null => "IS NULL",
Condition::Empty => "IS EMPTY",
Condition::Exists => "EXISTS",
Condition::LowerThan(_) => "<",
Condition::LowerThanOrEqual(_) => "<=",
Condition::Between { .. } => "TO",
Condition::Contains { .. } => "CONTAINS",
Condition::StartsWith { .. } => "STARTS WITH",
}
}
}
/// condition = value ("==" | ">" ...) value
pub fn parse_condition(input: Span) -> IResult<FilterCondition> {
let operator = alt((tag("<="), tag(">="), tag("!="), tag("<"), tag(">"), tag("=")));
@@ -115,83 +94,6 @@ pub fn parse_not_exists(input: Span) -> IResult<FilterCondition> {
Ok((input, FilterCondition::Not(Box::new(FilterCondition::Condition { fid: key, op: Exists }))))
}
fn parse_vectors(input: Span) -> IResult<(Token, Option<Token>, VectorFilter)> {
let (input, _) = multispace0(input)?;
let (input, fid) = tag("_vectors")(input)?;
if let Ok((input, _)) = multispace1::<_, crate::Error>(input) {
return Ok((input, (Token::from(fid), None, VectorFilter::None)));
}
let (input, _) = char('.')(input)?;
// From this point, we are certain this is a vector filter, so our errors must be final.
// We could use nom's `cut` but it's better to be explicit about the errors
if let Ok((_, space)) = tag::<_, _, ()>(" ")(input) {
return Err(crate::Error::failure_from_kind(space, ErrorKind::VectorFilterMissingEmbedder));
}
let (input, embedder_name) =
parse_vector_value_cut(input, ErrorKind::VectorFilterInvalidEmbedder)?;
let (input, filter) = alt((
map(
preceded(tag(".fragments"), |input| {
let (input, _) = tag(".")(input).map_cut(ErrorKind::VectorFilterMissingFragment)?;
parse_vector_value_cut(input, ErrorKind::VectorFilterInvalidFragment)
}),
VectorFilter::Fragment,
),
value(VectorFilter::UserProvided, tag(".userProvided")),
value(VectorFilter::DocumentTemplate, tag(".documentTemplate")),
value(VectorFilter::Regenerate, tag(".regenerate")),
value(VectorFilter::None, nom::combinator::success("")),
))(input)?;
if let Ok((input, point)) = tag::<_, _, ()>(".")(input) {
let opt_value = parse_vector_value(input).ok().map(|(_, v)| v);
let value =
opt_value.as_ref().map(|v| v.value().to_owned()).unwrap_or_else(|| point.to_string());
let context = opt_value.map(|v| v.original_span()).unwrap_or(point);
let previous_kind = match filter {
VectorFilter::Fragment(_) => Some("fragments"),
VectorFilter::DocumentTemplate => Some("documentTemplate"),
VectorFilter::UserProvided => Some("userProvided"),
VectorFilter::Regenerate => Some("regenerate"),
VectorFilter::None => None,
};
return Err(Error::failure_from_kind(
context,
ErrorKind::VectorFilterUnknownSuffix(previous_kind, value),
));
}
let (input, _) = multispace1(input).map_cut(ErrorKind::VectorFilterLeftover)?;
Ok((input, (Token::from(fid), Some(embedder_name), filter)))
}
/// vectors_exists = vectors ("EXISTS" | ("NOT" WS+ "EXISTS"))
pub fn parse_vectors_exists(input: Span) -> IResult<FilterCondition> {
let (input, (fid, embedder, filter)) = parse_vectors(input)?;
// Try parsing "EXISTS" first
if let Ok((input, _)) = tag::<_, _, ()>("EXISTS")(input) {
return Ok((input, FilterCondition::VectorExists { fid, embedder, filter }));
}
// Try parsing "NOT EXISTS"
if let Ok((input, _)) = tuple::<_, _, (), _>((tag("NOT"), multispace1, tag("EXISTS")))(input) {
return Ok((
input,
FilterCondition::Not(Box::new(FilterCondition::VectorExists { fid, embedder, filter })),
));
}
Err(crate::Error::failure_from_kind(input, ErrorKind::VectorFilterOperation))
}
/// contains = value "CONTAINS" value
pub fn parse_contains(input: Span) -> IResult<FilterCondition> {
let (input, (fid, contains, value)) =


@@ -35,30 +35,13 @@ impl<E> NomErrorExt<E> for nom::Err<E> {
pub fn cut_with_err<'a, O>(
mut parser: impl FnMut(Span<'a>) -> IResult<'a, O>,
mut with: impl FnMut(Error<'a>) -> Error<'a>,
) -> impl FnMut(Span<'a>) -> IResult<'a, O> {
) -> impl FnMut(Span<'a>) -> IResult<O> {
move |input| match parser.parse(input) {
Err(nom::Err::Error(e)) => Err(nom::Err::Failure(with(e))),
rest => rest,
}
}
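
`cut_with_err` turns a recoverable parser error into a failure, so that an enclosing `alt` stops trying other branches and the custom error is reported. The sketch below shows the same control flow without nom. `ParseErr` and `open_bracket` are hypothetical stand-ins, and only the two error variants that matter here are modelled.

```rust
/// Minimal stand-in for nom's error wrapper (hypothetical, two variants only).
#[derive(Debug, PartialEq, Eq)]
pub enum ParseErr<E> {
    /// Recoverable: an enclosing alternative may try the next branch.
    Error(E),
    /// Unrecoverable: parsing stops and the error is reported.
    Failure(E),
}

/// Wraps `parser` so that a recoverable error becomes a failure built by `with`.
/// Successes and existing failures pass through unchanged.
pub fn cut_with_err<I, O, E>(
    mut parser: impl FnMut(I) -> Result<O, ParseErr<E>>,
    mut with: impl FnMut(E) -> E,
) -> impl FnMut(I) -> Result<O, ParseErr<E>> {
    move |input| match parser(input) {
        Err(ParseErr::Error(e)) => Err(ParseErr::Failure(with(e))),
        rest => rest,
    }
}

/// Toy parser: succeeds only when the input starts with `[`.
pub fn open_bracket(input: &str) -> Result<&str, ParseErr<String>> {
    input.strip_prefix('[').ok_or_else(|| ParseErr::Error("generic".to_string()))
}
```

This is the shape used for the opening bracket after `IN`: once `IN` has been consumed, a missing `[` can no longer be some other kind of filter, so the error is made final.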
pub trait IResultExt<'a> {
fn map_cut(self, kind: ErrorKind<'a>) -> Self;
}
impl<'a, T> IResultExt<'a> for IResult<'a, T> {
fn map_cut(self, kind: ErrorKind<'a>) -> Self {
self.map_err(move |e: nom::Err<Error<'a>>| {
let input = match e {
nom::Err::Incomplete(_) => return e,
nom::Err::Error(e) => *e.context(),
nom::Err::Failure(e) => *e.context(),
};
Error::failure_from_kind(input, kind)
})
}
}
#[derive(Debug)]
pub struct Error<'a> {
context: Span<'a>,
@@ -75,21 +58,9 @@ pub enum ExpectedValueKind {
pub enum ErrorKind<'a> {
ReservedGeo(&'a str),
GeoRadius,
GeoRadiusArgumentCount(usize),
GeoBoundingBox,
GeoPolygon,
GeoPolygonNotEnoughPoints(usize),
GeoCoordinatesNotPair(usize),
MisusedGeoRadius,
MisusedGeoBoundingBox,
VectorFilterLeftover,
VectorFilterInvalidQuotes,
VectorFilterMissingEmbedder,
VectorFilterInvalidEmbedder,
VectorFilterMissingFragment,
VectorFilterInvalidFragment,
VectorFilterUnknownSuffix(Option<&'static str>, String),
VectorFilterOperation,
InvalidPrimary,
InvalidEscapedNumber,
ExpectedEof,
@@ -120,10 +91,6 @@ impl<'a> Error<'a> {
Self { context, kind }
}
pub fn failure_from_kind(context: Span<'a>, kind: ErrorKind<'a>) -> nom::Err<Self> {
nom::Err::Failure(Self::new_from_kind(context, kind))
}
pub fn new_from_external(context: Span<'a>, error: impl std::error::Error) -> Self {
Self::new_from_kind(context, ErrorKind::External(error.to_string()))
}
@@ -154,27 +121,13 @@ impl<'a> ParseError<Span<'a>> for Error<'a> {
}
}
impl Display for Error<'_> {
impl<'a> Display for Error<'a> {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
let input = self.context.fragment();
// When printing our error message we want to escape all `\n` to be sure we keep our format with the
// first line being the diagnostic and the second line being the incriminated filter.
let escaped_input = input.escape_debug();
fn key_suggestion<'a>(key: &str, keys: &[&'a str]) -> Option<&'a str> {
let typos =
levenshtein_automata::LevenshteinAutomatonBuilder::new(2, true).build_dfa(key);
for key in keys.iter() {
match typos.eval(key) {
levenshtein_automata::Distance::Exact(_) => {
return Some(key);
}
levenshtein_automata::Distance::AtLeast(_) => continue,
}
}
None
}
match &self.kind {
ErrorKind::ExpectedValue(_) if input.trim().is_empty() => {
writeln!(f, "Was expecting a value but instead got nothing.")?
@@ -193,7 +146,7 @@ impl Display for Error<'_> {
}
ErrorKind::InvalidPrimary => {
let text = if input.trim().is_empty() { "but instead got nothing.".to_string() } else { format!("at `{}`.", escaped_input) };
writeln!(f, "Was expecting an operation `=`, `!=`, `>=`, `>`, `<=`, `<`, `IN`, `NOT IN`, `TO`, `EXISTS`, `NOT EXISTS`, `IS NULL`, `IS NOT NULL`, `IS EMPTY`, `IS NOT EMPTY`, `CONTAINS`, `NOT CONTAINS`, `STARTS WITH`, `NOT STARTS WITH`, `_geoRadius`, `_geoBoundingBox` or `_geoPolygon` {text}")?
writeln!(f, "Was expecting an operation `=`, `!=`, `>=`, `>`, `<=`, `<`, `IN`, `NOT IN`, `TO`, `EXISTS`, `NOT EXISTS`, `IS NULL`, `IS NOT NULL`, `IS EMPTY`, `IS NOT EMPTY`, `CONTAINS`, `NOT CONTAINS`, `STARTS WITH`, `NOT STARTS WITH`, `_geoRadius`, or `_geoBoundingBox` {}", text)?
}
ErrorKind::InvalidEscapedNumber => {
writeln!(f, "Found an invalid escaped sequence number: `{}`.", escaped_input)?
@@ -202,23 +155,11 @@ impl Display for Error<'_> {
writeln!(f, "Found unexpected characters at the end of the filter: `{}`. You probably forgot an `OR` or an `AND` rule.", escaped_input)?
}
ErrorKind::GeoRadius => {
writeln!(f, "The `_geoRadius` filter must be in the form: `_geoRadius(latitude, longitude, radius, optionalResolution)`.")?
}
ErrorKind::GeoRadiusArgumentCount(count) => {
writeln!(f, "Was expecting 3 or 4 arguments for `_geoRadius`, but instead found {count}.")?
writeln!(f, "The `_geoRadius` filter expects three arguments: `_geoRadius(latitude, longitude, radius)`.")?
}
ErrorKind::GeoBoundingBox => {
writeln!(f, "The `_geoBoundingBox` filter expects two pairs of arguments: `_geoBoundingBox([latitude, longitude], [latitude, longitude])`.")?
}
ErrorKind::GeoPolygon => {
writeln!(f, "The `_geoPolygon` filter doesn't match the expected format: `_geoPolygon([latitude, longitude], [latitude, longitude])`.")?
}
ErrorKind::GeoPolygonNotEnoughPoints(n) => {
writeln!(f, "The `_geoPolygon` filter expects at least 3 points but only {n} were specified")?;
}
ErrorKind::GeoCoordinatesNotPair(number) => {
writeln!(f, "Was expecting 2 coordinates but instead found {number}.")?
}
ErrorKind::ReservedGeo(name) => {
writeln!(f, "`{}` is a reserved keyword and thus can't be used as a filter expression. Use the `_geoRadius(latitude, longitude, distance)` or `_geoBoundingBox([latitude, longitude], [latitude, longitude])` built-in rules to filter on `_geo` coordinates.", name.escape_debug())?
}
@@ -228,44 +169,6 @@ impl Display for Error<'_> {
ErrorKind::MisusedGeoBoundingBox => {
writeln!(f, "The `_geoBoundingBox` filter is an operation and can't be used as a value.")?
}
ErrorKind::VectorFilterLeftover => {
writeln!(f, "The vector filter has leftover tokens.")?
}
ErrorKind::VectorFilterUnknownSuffix(_, value) if value.as_str() == "." => {
writeln!(f, "Was expecting one of `.fragments`, `.userProvided`, `.documentTemplate`, `.regenerate` or nothing, but instead found a point without a valid value.")?;
}
ErrorKind::VectorFilterUnknownSuffix(None, value) if ["fragments", "userProvided", "documentTemplate", "regenerate"].contains(&value.as_str()) => {
// This will happen with "_vectors.rest.\"userProvided\"" for instance
writeln!(f, "Was expecting this part to be unquoted.")?
}
ErrorKind::VectorFilterUnknownSuffix(None, value) => {
if let Some(suggestion) = key_suggestion(value, &["fragments", "userProvided", "documentTemplate", "regenerate"]) {
writeln!(f, "Was expecting one of `fragments`, `userProvided`, `documentTemplate`, `regenerate` or nothing, but instead found `{value}`. Did you mean `{suggestion}`?")?;
} else {
writeln!(f, "Was expecting one of `fragments`, `userProvided`, `documentTemplate`, `regenerate` or nothing, but instead found `{value}`.")?;
}
}
ErrorKind::VectorFilterUnknownSuffix(Some(previous_filter_kind), value) => {
writeln!(f, "Vector filter can only accept one of `fragments`, `userProvided`, `documentTemplate` or `regenerate`, but found both `{previous_filter_kind}` and `{value}`.")?
},
ErrorKind::VectorFilterInvalidFragment => {
writeln!(f, "The vector filter's fragment name is invalid.")?
}
ErrorKind::VectorFilterMissingFragment => {
writeln!(f, "The vector filter is missing a fragment name.")?
}
ErrorKind::VectorFilterMissingEmbedder => {
writeln!(f, "Was expecting embedder name but found nothing.")?
}
ErrorKind::VectorFilterInvalidEmbedder => {
writeln!(f, "The vector filter's embedder name is invalid.")?
}
ErrorKind::VectorFilterOperation => {
writeln!(f, "Was expecting an operation like `EXISTS` or `NOT EXISTS` after the vector filter.")?
}
ErrorKind::VectorFilterInvalidQuotes => {
writeln!(f, "The quotes in one of the values are inconsistent.")?
}
ErrorKind::ReservedKeyword(word) => {
writeln!(f, "`{word}` is a reserved keyword and thus cannot be used as a field name unless it is put inside quotes. Use \"{word}\" or \'{word}\' instead.")?
}
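
The `key_suggestion` helper in this hunk picks the first known suffix within two typos of what the user wrote. The original builds a Levenshtein automaton with the `levenshtein_automata` crate. The sketch below swaps in a plain dynamic-programming edit distance instead, so it is not the same implementation, and it counts a transposition as two edits where the automaton is configured to count it as one. The `max_typos` parameter is an assumption added for illustration.

```rust
/// Classic dynamic-programming Levenshtein distance over Unicode scalar values.
pub fn levenshtein(a: &str, b: &str) -> usize {
    let b: Vec<char> = b.chars().collect();
    // `prev[j]` is the distance between the processed prefix of `a` and `b[..j]`.
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for (i, ca) in a.chars().enumerate() {
        let mut curr = vec![i + 1];
        for (j, cb) in b.iter().enumerate() {
            let substitution = prev[j] + usize::from(ca != *cb);
            let insertion = curr[j] + 1;
            let deletion = prev[j + 1] + 1;
            curr.push(substitution.min(insertion).min(deletion));
        }
        prev = curr;
    }
    prev[b.len()]
}

/// Returns the first candidate within `max_typos` edits of `key`, if any.
pub fn key_suggestion<'a>(key: &str, keys: &[&'a str], max_typos: usize) -> Option<&'a str> {
    keys.iter().copied().find(|candidate| levenshtein(key, candidate) <= max_typos)
}
```

With the four suffixes named in the error messages as candidates, a near miss yields a "Did you mean" hint and anything further away yields none.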


@@ -19,7 +19,6 @@
//! word = (alphanumeric | _ | - | .)+
//! geoRadius = "_geoRadius(" WS* float WS* "," WS* float WS* "," float WS* ")"
//! geoBoundingBox = "_geoBoundingBox([" WS * float WS* "," WS* float WS* "], [" WS* float WS* "," WS* float WS* "]")
//! geoPolygon = "_geoPolygon([[" WS* float WS* "," WS* float WS* "],+])"
//! ```
//!
//! Other BNF grammar used to handle some specific errors:
@@ -66,9 +65,6 @@ use nom_locate::LocatedSpan;
pub(crate) use value::parse_value;
use value::word_exact;
use crate::condition::parse_vectors_exists;
use crate::error::IResultExt;
pub type Span<'a> = LocatedSpan<&'a str, &'a str>;
type IResult<'a, Ret> = nom::IResult<Span<'a>, Ret, Error<'a>>;
@@ -84,7 +80,7 @@ pub struct Token<'a> {
value: Option<String>,
}
impl PartialEq for Token<'_> {
impl<'a> PartialEq for Token<'a> {
fn eq(&self, other: &Self) -> bool {
self.span.fragment() == other.span.fragment()
}
@@ -117,7 +113,7 @@ impl<'a> Token<'a> {
self.span
}
pub fn parse_finite_float(&self) -> Result<f64, Error<'a>> {
pub fn parse_finite_float(&self) -> Result<f64, Error> {
let value: f64 = self.value().parse().map_err(|e| self.as_external_error(e))?;
if value.is_finite() {
Ok(value)
@@ -140,15 +136,6 @@ impl<'a> From<&'a str> for Token<'a> {
}
}
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum VectorFilter<'a> {
Fragment(Token<'a>),
DocumentTemplate,
UserProvided,
Regenerate,
None,
}
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum FilterCondition<'a> {
Not(Box<Self>),
@@ -156,10 +143,8 @@ pub enum FilterCondition<'a> {
In { fid: Token<'a>, els: Vec<Token<'a>> },
Or(Vec<Self>),
And(Vec<Self>),
VectorExists { fid: Token<'a>, embedder: Option<Token<'a>>, filter: VectorFilter<'a> },
GeoLowerThan { point: [Token<'a>; 2], radius: Token<'a>, resolution: Option<Token<'a>> },
GeoLowerThan { point: [Token<'a>; 2], radius: Token<'a> },
GeoBoundingBox { top_right_point: [Token<'a>; 2], bottom_left_point: [Token<'a>; 2] },
GeoPolygon { points: Vec<[Token<'a>; 2]> },
}
pub enum TraversedElement<'a> {
@@ -168,7 +153,7 @@ pub enum TraversedElement<'a> {
}
impl<'a> FilterCondition<'a> {
pub fn use_contains_operator(&self) -> Option<&Token<'a>> {
pub fn use_contains_operator(&self) -> Option<&Token> {
match self {
FilterCondition::Condition { fid: _, op } => match op {
Condition::GreaterThan(_)
@@ -180,59 +165,22 @@ impl<'a> FilterCondition<'a> {
| Condition::Exists
| Condition::LowerThan(_)
| Condition::LowerThanOrEqual(_)
| Condition::Between { .. }
| Condition::StartsWith { .. } => None,
Condition::Contains { keyword, word: _ } => Some(keyword),
| Condition::Between { .. } => None,
Condition::Contains { keyword, word: _ }
| Condition::StartsWith { keyword, word: _ } => Some(keyword),
},
FilterCondition::Not(this) => this.use_contains_operator(),
FilterCondition::Or(seq) | FilterCondition::And(seq) => {
seq.iter().find_map(|filter| filter.use_contains_operator())
}
FilterCondition::VectorExists { .. }
| FilterCondition::GeoLowerThan { .. }
| FilterCondition::GeoBoundingBox { .. }
| FilterCondition::GeoPolygon { .. }
| FilterCondition::In { .. } => None,
}
}
pub fn use_vector_filter(&self) -> Option<&Token<'a>> {
match self {
FilterCondition::Condition { .. } => None,
FilterCondition::Not(this) => this.use_vector_filter(),
FilterCondition::Or(seq) | FilterCondition::And(seq) => {
seq.iter().find_map(|filter| filter.use_vector_filter())
}
FilterCondition::GeoLowerThan { .. }
| FilterCondition::GeoBoundingBox { .. }
| FilterCondition::GeoPolygon { .. }
| FilterCondition::In { .. } => None,
FilterCondition::VectorExists { fid, .. } => Some(fid),
}
}
pub fn fids(&self, depth: usize) -> Box<dyn Iterator<Item = &Token<'a>> + '_> {
if depth == 0 {
return Box::new(std::iter::empty());
}
match self {
FilterCondition::Condition { fid, .. } | FilterCondition::In { fid, .. } => {
Box::new(std::iter::once(fid))
}
FilterCondition::Not(filter) => {
let depth = depth.saturating_sub(1);
filter.fids(depth)
}
FilterCondition::And(subfilters) | FilterCondition::Or(subfilters) => {
let depth = depth.saturating_sub(1);
Box::new(subfilters.iter().flat_map(move |f| f.fids(depth)))
}
_ => Box::new(std::iter::empty()),
}
}
/// Returns the first token found at the specified depth, `None` if no token at this depth.
pub fn token_at_depth(&self, depth: usize) -> Option<&Token<'a>> {
pub fn token_at_depth(&self, depth: usize) -> Option<&Token> {
match self {
FilterCondition::Condition { fid, .. } if depth == 0 => Some(fid),
FilterCondition::Or(subfilters) => {
@@ -258,7 +206,7 @@ impl<'a> FilterCondition<'a> {
}
}
pub fn parse(input: &'a str) -> Result<Option<Self>, Error<'a>> {
pub fn parse(input: &'a str) -> Result<Option<Self>, Error> {
if input.trim().is_empty() {
return Ok(None);
}
@@ -295,7 +243,10 @@ fn parse_in_body(input: Span) -> IResult<Vec<Token>> {
let (input, _) = ws(word_exact("IN"))(input)?;
// everything after `IN` can be a failure
let (input, _) = tag("[")(input).map_cut(ErrorKind::InOpeningBracket)?;
let (input, _) =
cut_with_err(tag("["), |_| Error::new_from_kind(input, ErrorKind::InOpeningBracket))(
input,
)?;
let (input, content) = cut(parse_value_list)(input)?;
@@ -400,27 +351,23 @@ fn parse_not(input: Span, depth: usize) -> IResult<FilterCondition> {
/// If we parse `_geoRadius` we MUST parse the rest of the expression.
fn parse_geo_radius(input: Span) -> IResult<FilterCondition> {
// we want to allow space BEFORE the _geoRadius but not after
let (input, _) = tuple((multispace0, word_exact("_geoRadius")))(input)?;
// if we were able to parse `_geoRadius` and can't parse the rest of the input we return a failure
let parsed =
delimited(char('('), separated_list1(tag(","), ws(recognize_float)), char(')'))(input)
.map_cut(ErrorKind::GeoRadius);
let parsed = preceded(
tuple((multispace0, word_exact("_geoRadius"))),
// if we were able to parse `_geoRadius` and can't parse the rest of the input we return a failure
cut(delimited(char('('), separated_list1(tag(","), ws(recognize_float)), char(')'))),
)(input)
.map_err(|e| e.map(|_| Error::new_from_kind(input, ErrorKind::GeoRadius)));
let (input, args) = parsed?;
if !(3..=4).contains(&args.len()) {
return Err(Error::failure_from_kind(input, ErrorKind::GeoRadiusArgumentCount(args.len())));
if args.len() != 3 {
return Err(nom::Err::Failure(Error::new_from_kind(input, ErrorKind::GeoRadius)));
}
let res = FilterCondition::GeoLowerThan {
point: [args[0].into(), args[1].into()],
radius: args[2].into(),
resolution: args.get(3).cloned().map(Token::from),
};
Ok((input, res))
}
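
The two argument-count checks visible in this hunk, `(3..=4).contains(&args.len())` and `args.len() != 3`, differ only in whether a fourth `resolution` argument is accepted. The sketch below isolates that check on a plain string. `parse_radius_args` is a hypothetical helper: the real parser works on nom spans and reports `ErrorKind` values rather than strings.

```rust
use std::ops::RangeInclusive;

/// Parses the comma-separated arguments found between the parentheses of a
/// `_geoRadius(...)` call and enforces an allowed argument count.
/// Hypothetical helper for illustration only.
pub fn parse_radius_args(
    body: &str,
    allowed: RangeInclusive<usize>,
) -> Result<Vec<f64>, String> {
    let args = body
        .split(',')
        .map(|arg| {
            let arg = arg.trim();
            arg.parse::<f64>().map_err(|e| format!("invalid float `{}`: {e}", arg))
        })
        .collect::<Result<Vec<_>, _>>()?;
    // The count is validated only after every argument parsed as a float.
    if !allowed.contains(&args.len()) {
        return Err(format!(
            "expected between {} and {} arguments, found {}",
            allowed.start(),
            allowed.end(),
            args.len()
        ));
    }
    Ok(args)
}
```

Passing `3..=3` models the stricter check and `3..=4` the one that tolerates an optional resolution.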
@@ -428,31 +375,24 @@ fn parse_geo_radius(input: Span) -> IResult<FilterCondition> {
/// If we parse `_geoBoundingBox` we MUST parse the rest of the expression.
fn parse_geo_bounding_box(input: Span) -> IResult<FilterCondition> {
// we want to allow space BEFORE the _geoBoundingBox but not after
let (input, _) = tuple((multispace0, word_exact("_geoBoundingBox")))(input)?;
// if we were able to parse `_geoBoundingBox` and can't parse the rest of the input we return a failure
let (input, args) = delimited(
char('('),
separated_list1(
tag(","),
ws(delimited(char('['), separated_list1(tag(","), ws(recognize_float)), char(']'))),
),
char(')'),
let parsed = preceded(
tuple((multispace0, word_exact("_geoBoundingBox"))),
// if we were able to parse `_geoBoundingBox` and can't parse the rest of the input we return a failure
cut(delimited(
char('('),
separated_list1(
tag(","),
ws(delimited(char('['), separated_list1(tag(","), ws(recognize_float)), char(']'))),
),
char(')'),
)),
)(input)
.map_cut(ErrorKind::GeoBoundingBox)?;
.map_err(|e| e.map(|_| Error::new_from_kind(input, ErrorKind::GeoBoundingBox)));
if args.len() != 2 {
return Err(Error::failure_from_kind(input, ErrorKind::GeoBoundingBox));
}
let (input, args) = parsed?;
if let Some(offending) = args.iter().find(|a| a.len() != 2) {
let context = offending.first().unwrap_or(&input);
return Err(Error::failure_from_kind(
*context,
ErrorKind::GeoCoordinatesNotPair(offending.len()),
));
if args.len() != 2 || args[0].len() != 2 || args[1].len() != 2 {
return Err(nom::Err::Failure(Error::new_from_kind(input, ErrorKind::GeoBoundingBox)));
}
let res = FilterCondition::GeoBoundingBox {
@@ -462,47 +402,6 @@ fn parse_geo_bounding_box(input: Span) -> IResult<FilterCondition> {
Ok((input, res))
}
/// geoPolygon = "_geoPolygon([[" WS* float WS* "," WS* float WS* "],+])"
/// If we parse `_geoPolygon` we MUST parse the rest of the expression.
fn parse_geo_polygon(input: Span) -> IResult<FilterCondition> {
// we want to allow space BEFORE the _geoPolygon but not after
let (input, _) = tuple((multispace0, word_exact("_geoPolygon")))(input)?;
// if we were able to parse `_geoPolygon` and can't parse the rest of the input we return a failure
let (input, args): (_, Vec<Vec<LocatedSpan<_, _>>>) = delimited(
char('('),
separated_list1(
tag(","),
ws(delimited(char('['), separated_list1(tag(","), ws(recognize_float)), char(']'))),
),
preceded(opt(ws(char(','))), char(')')), // Tolerate trailing comma
)(input)
.map_cut(ErrorKind::GeoPolygon)?;
if args.len() < 3 {
let context = args.last().and_then(|a| a.last()).unwrap_or(&input);
return Err(Error::failure_from_kind(
*context,
ErrorKind::GeoPolygonNotEnoughPoints(args.len()),
));
}
if let Some(offending) = args.iter().find(|a| a.len() != 2) {
let context = offending.first().unwrap_or(&input);
return Err(Error::failure_from_kind(
*context,
ErrorKind::GeoCoordinatesNotPair(offending.len()),
));
}
let res = FilterCondition::GeoPolygon {
points: args.into_iter().map(|a| [a[0].into(), a[1].into()]).collect(),
};
Ok((input, res))
}
/// geoPoint = WS* "_geoPoint(float WS* "," WS* float WS* "," WS* float)
fn parse_geo_point(input: Span) -> IResult<FilterCondition> {
// we want to forbid space BEFORE the _geoPoint but not after
@@ -514,7 +413,7 @@ fn parse_geo_point(input: Span) -> IResult<FilterCondition> {
))(input)
.map_err(|e| e.map(|_| Error::new_from_kind(input, ErrorKind::ReservedGeo("_geoPoint"))))?;
// if we succeeded we still return a `Failure` because geoPoints are not allowed
Err(Error::failure_from_kind(input, ErrorKind::ReservedGeo("_geoPoint")))
Err(nom::Err::Failure(Error::new_from_kind(input, ErrorKind::ReservedGeo("_geoPoint"))))
}
/// geoDistance = WS* "_geoDistance(float WS* "," WS* float WS* "," WS* float)
@@ -528,7 +427,7 @@ fn parse_geo_distance(input: Span) -> IResult<FilterCondition> {
))(input)
.map_err(|e| e.map(|_| Error::new_from_kind(input, ErrorKind::ReservedGeo("_geoDistance"))))?;
// if we succeeded we still return a `Failure` because `geoDistance` filters are not allowed
Err(Error::failure_from_kind(input, ErrorKind::ReservedGeo("_geoDistance")))
Err(nom::Err::Failure(Error::new_from_kind(input, ErrorKind::ReservedGeo("_geoDistance"))))
}
/// geo = WS* "_geo(float WS* "," WS* float WS* "," WS* float)
@@ -542,7 +441,7 @@ fn parse_geo(input: Span) -> IResult<FilterCondition> {
))(input)
.map_err(|e| e.map(|_| Error::new_from_kind(input, ErrorKind::ReservedGeo("_geo"))))?;
// if we succeeded we still return a `Failure` because `_geo` filter is not allowed
Err(Error::failure_from_kind(input, ErrorKind::ReservedGeo("_geo")))
Err(nom::Err::Failure(Error::new_from_kind(input, ErrorKind::ReservedGeo("_geo"))))
}
fn parse_error_reserved_keyword(input: Span) -> IResult<FilterCondition> {
@@ -572,8 +471,8 @@ fn parse_primary(input: Span, depth: usize) -> IResult<FilterCondition> {
Error::new_from_kind(input, ErrorKind::MissingClosingDelimiter(c.char()))
}),
),
// Made a random block of functions because we reached the maximum number of elements per alt
alt((parse_geo_radius, parse_geo_bounding_box, parse_geo_polygon)),
parse_geo_radius,
parse_geo_bounding_box,
parse_in,
parse_not_in,
parse_condition,
@@ -581,7 +480,8 @@ fn parse_primary(input: Span, depth: usize) -> IResult<FilterCondition> {
parse_is_not_null,
parse_is_empty,
parse_is_not_empty,
alt((parse_vectors_exists, parse_exists, parse_not_exists)),
parse_exists,
parse_not_exists,
parse_to,
parse_contains,
parse_not_contains,
@@ -607,7 +507,7 @@ pub fn parse_filter(input: Span) -> IResult<FilterCondition> {
terminated(|input| parse_expression(input, 0), eof)(input)
}
impl std::fmt::Display for FilterCondition<'_> {
impl<'a> std::fmt::Display for FilterCondition<'a> {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
match self {
FilterCondition::Not(filter) => {
@@ -637,28 +537,9 @@ impl std::fmt::Display for FilterCondition<'_> {
}
write!(f, "]")
}
FilterCondition::VectorExists { fid: _, embedder, filter: inner } => {
write!(f, "_vectors")?;
if let Some(embedder) = embedder {
write!(f, ".{:?}", embedder.value())?;
}
match inner {
VectorFilter::Fragment(fragment) => {
write!(f, ".fragments.{:?}", fragment.value())?
}
VectorFilter::DocumentTemplate => write!(f, ".documentTemplate")?,
VectorFilter::UserProvided => write!(f, ".userProvided")?,
VectorFilter::Regenerate => write!(f, ".regenerate")?,
VectorFilter::None => (),
}
write!(f, " EXISTS")
}
FilterCondition::GeoLowerThan { point, radius, resolution: None } => {
FilterCondition::GeoLowerThan { point, radius } => {
write!(f, "_geoRadius({}, {}, {})", point[0], point[1], radius)
}
FilterCondition::GeoLowerThan { point, radius, resolution: Some(resolution) } => {
write!(f, "_geoRadius({}, {}, {}, {})", point[0], point[1], radius, resolution)
}
FilterCondition::GeoBoundingBox {
top_right_point: top_left_point,
bottom_left_point: bottom_right_point,
@@ -672,18 +553,10 @@ impl std::fmt::Display for FilterCondition<'_> {
bottom_right_point[1]
)
}
FilterCondition::GeoPolygon { points } => {
write!(f, "_geoPolygon([")?;
for point in points {
write!(f, "[{}, {}], ", point[0], point[1])?;
}
write!(f, "])")
}
}
}
}
impl std::fmt::Display for Condition<'_> {
impl<'a> std::fmt::Display for Condition<'a> {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
match self {
Condition::GreaterThan(token) => write!(f, "> {token}"),
@@ -701,8 +574,7 @@ impl std::fmt::Display for Condition<'_> {
}
}
}
impl std::fmt::Display for Token<'_> {
impl<'a> std::fmt::Display for Token<'a> {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
write!(f, "{{{}}}", self.value())
}
@@ -717,7 +589,7 @@ pub mod tests {
/// Create a raw [Token]. You must specify the string that appear BEFORE your element followed by your element
pub fn rtok<'a>(before: &'a str, value: &'a str) -> Token<'a> {
// if the string is empty we still need to return 1 for the line number
let lines = if before.is_empty() { 1 } else { before.lines().count() };
let lines = before.is_empty().then_some(1).unwrap_or_else(|| before.lines().count());
let offset = before.chars().count();
// the extra field is not checked in the tests so we can set it to nothing
unsafe { Span::new_from_raw_offset(offset, lines as u32, value, "") }.into()
@@ -736,9 +608,6 @@ pub mod tests {
insta::assert_snapshot!(p(r"title = 'foo\\\\\\\\'"), @r#"{title} = {foo\\\\}"#);
// but it also works with other sequences
insta::assert_snapshot!(p(r#"title = 'foo\x20\n\t\"\'"'"#), @"{title} = {foo \n\t\"\'\"}");
insta::assert_snapshot!(p(r#"_vectors." valid.name ".fragments."also.. valid! " EXISTS"#), @r#"_vectors." valid.name ".fragments."also.. valid! " EXISTS"#);
insta::assert_snapshot!(p("_vectors.\"\n\t\r\\\"\" EXISTS"), @r#"_vectors."\n\t\r\"" EXISTS"#);
}
#[test]
@@ -801,18 +670,6 @@ pub mod tests {
insta::assert_snapshot!(p("NOT subscribers IS NOT EMPTY"), @"{subscribers} IS EMPTY");
insta::assert_snapshot!(p("subscribers IS NOT EMPTY"), @"NOT ({subscribers} IS EMPTY)");
// Test _vectors EXISTS + _vectors NOT EXITS
insta::assert_snapshot!(p("_vectors EXISTS"), @"_vectors EXISTS");
insta::assert_snapshot!(p("_vectors.embedderName EXISTS"), @r#"_vectors."embedderName" EXISTS"#);
insta::assert_snapshot!(p("_vectors.embedderName.documentTemplate EXISTS"), @r#"_vectors."embedderName".documentTemplate EXISTS"#);
insta::assert_snapshot!(p("_vectors.embedderName.regenerate EXISTS"), @r#"_vectors."embedderName".regenerate EXISTS"#);
insta::assert_snapshot!(p("_vectors.embedderName.regenerate EXISTS"), @r#"_vectors."embedderName".regenerate EXISTS"#);
insta::assert_snapshot!(p("_vectors.embedderName.fragments.fragmentName EXISTS"), @r#"_vectors."embedderName".fragments."fragmentName" EXISTS"#);
insta::assert_snapshot!(p(" _vectors.embedderName.fragments.fragmentName EXISTS"), @r#"_vectors."embedderName".fragments."fragmentName" EXISTS"#);
insta::assert_snapshot!(p("NOT _vectors EXISTS"), @"NOT (_vectors EXISTS)");
insta::assert_snapshot!(p(" NOT _vectors EXISTS"), @"NOT (_vectors EXISTS)");
insta::assert_snapshot!(p(" _vectors NOT EXISTS"), @"NOT (_vectors EXISTS)");
// Test EXISTS + NOT EXITS
insta::assert_snapshot!(p("subscribers EXISTS"), @"{subscribers} EXISTS");
insta::assert_snapshot!(p("NOT subscribers EXISTS"), @"NOT ({subscribers} EXISTS)");
@@ -842,17 +699,12 @@ pub mod tests {
insta::assert_snapshot!(p("_geoRadius(12, 13, 14)"), @"_geoRadius({12}, {13}, {14})");
insta::assert_snapshot!(p("NOT _geoRadius(12, 13, 14)"), @"NOT (_geoRadius({12}, {13}, {14}))");
insta::assert_snapshot!(p("_geoRadius(12,13,14)"), @"_geoRadius({12}, {13}, {14})");
insta::assert_snapshot!(p("_geoRadius(12,13,14,1000)"), @"_geoRadius({12}, {13}, {14}, {1000})");
// Test geo bounding box
insta::assert_snapshot!(p("_geoBoundingBox([12, 13], [14, 15])"), @"_geoBoundingBox([{12}, {13}], [{14}, {15}])");
insta::assert_snapshot!(p("NOT _geoBoundingBox([12, 13], [14, 15])"), @"NOT (_geoBoundingBox([{12}, {13}], [{14}, {15}]))");
insta::assert_snapshot!(p("_geoBoundingBox([12,13],[14,15])"), @"_geoBoundingBox([{12}, {13}], [{14}, {15}])");
// Test geo polygon
insta::assert_snapshot!(p("_geoPolygon([12, 13], [14, 15], [16, 17])"), @"_geoPolygon([[{12}, {13}], [{14}, {15}], [{16}, {17}], ])");
insta::assert_snapshot!(p("_geoPolygon([12, 13], [14, 15], [-1.2,2939.2], [1,1])"), @"_geoPolygon([[{12}, {13}], [{14}, {15}], [{-1.2}, {2939.2}], [{1}, {1}], ])");
// Test OR + AND
insta::assert_snapshot!(p("channel = ponce AND 'dog race' != 'bernese mountain'"), @"AND[{channel} = {ponce}, {dog race} != {bernese mountain}, ]");
insta::assert_snapshot!(p("channel = ponce OR 'dog race' != 'bernese mountain'"), @"OR[{channel} = {ponce}, {dog race} != {bernese mountain}, ]");
@@ -909,80 +761,50 @@ pub mod tests {
11:12 channel = 🐻 AND followers < 100
"###);
insta::assert_snapshot!(p("'OR'"), @r"
Was expecting an operation `=`, `!=`, `>=`, `>`, `<=`, `<`, `IN`, `NOT IN`, `TO`, `EXISTS`, `NOT EXISTS`, `IS NULL`, `IS NOT NULL`, `IS EMPTY`, `IS NOT EMPTY`, `CONTAINS`, `NOT CONTAINS`, `STARTS WITH`, `NOT STARTS WITH`, `_geoRadius`, `_geoBoundingBox` or `_geoPolygon` at `\'OR\'`.
insta::assert_snapshot!(p("'OR'"), @r###"
Was expecting an operation `=`, `!=`, `>=`, `>`, `<=`, `<`, `IN`, `NOT IN`, `TO`, `EXISTS`, `NOT EXISTS`, `IS NULL`, `IS NOT NULL`, `IS EMPTY`, `IS NOT EMPTY`, `CONTAINS`, `NOT CONTAINS`, `STARTS WITH`, `NOT STARTS WITH`, `_geoRadius`, or `_geoBoundingBox` at `\'OR\'`.
1:5 'OR'
");
"###);
insta::assert_snapshot!(p("OR"), @r###"
Was expecting a value but instead got `OR`, which is a reserved keyword. To use `OR` as a field name or a value, surround it by quotes.
1:3 OR
"###);
insta::assert_snapshot!(p("channel Ponce"), @r"
Was expecting an operation `=`, `!=`, `>=`, `>`, `<=`, `<`, `IN`, `NOT IN`, `TO`, `EXISTS`, `NOT EXISTS`, `IS NULL`, `IS NOT NULL`, `IS EMPTY`, `IS NOT EMPTY`, `CONTAINS`, `NOT CONTAINS`, `STARTS WITH`, `NOT STARTS WITH`, `_geoRadius`, `_geoBoundingBox` or `_geoPolygon` at `channel Ponce`.
insta::assert_snapshot!(p("channel Ponce"), @r###"
Was expecting an operation `=`, `!=`, `>=`, `>`, `<=`, `<`, `IN`, `NOT IN`, `TO`, `EXISTS`, `NOT EXISTS`, `IS NULL`, `IS NOT NULL`, `IS EMPTY`, `IS NOT EMPTY`, `CONTAINS`, `NOT CONTAINS`, `STARTS WITH`, `NOT STARTS WITH`, `_geoRadius`, or `_geoBoundingBox` at `channel Ponce`.
1:14 channel Ponce
");
"###);
insta::assert_snapshot!(p("channel = Ponce OR"), @r"
Was expecting an operation `=`, `!=`, `>=`, `>`, `<=`, `<`, `IN`, `NOT IN`, `TO`, `EXISTS`, `NOT EXISTS`, `IS NULL`, `IS NOT NULL`, `IS EMPTY`, `IS NOT EMPTY`, `CONTAINS`, `NOT CONTAINS`, `STARTS WITH`, `NOT STARTS WITH`, `_geoRadius`, `_geoBoundingBox` or `_geoPolygon` but instead got nothing.
insta::assert_snapshot!(p("channel = Ponce OR"), @r###"
Was expecting an operation `=`, `!=`, `>=`, `>`, `<=`, `<`, `IN`, `NOT IN`, `TO`, `EXISTS`, `NOT EXISTS`, `IS NULL`, `IS NOT NULL`, `IS EMPTY`, `IS NOT EMPTY`, `CONTAINS`, `NOT CONTAINS`, `STARTS WITH`, `NOT STARTS WITH`, `_geoRadius`, or `_geoBoundingBox` but instead got nothing.
19:19 channel = Ponce OR
");
"###);
insta::assert_snapshot!(p("_geoRadius"), @r"
The `_geoRadius` filter must be in the form: `_geoRadius(latitude, longitude, radius, optionalResolution)`.
11:11 _geoRadius
");
insta::assert_snapshot!(p("_geoRadius"), @r###"
The `_geoRadius` filter expects three arguments: `_geoRadius(latitude, longitude, radius)`.
1:11 _geoRadius
"###);
insta::assert_snapshot!(p("_geoRadius = 12"), @r"
The `_geoRadius` filter must be in the form: `_geoRadius(latitude, longitude, radius, optionalResolution)`.
11:16 _geoRadius = 12
");
insta::assert_snapshot!(p("_geoRadius = 12"), @r###"
The `_geoRadius` filter expects three arguments: `_geoRadius(latitude, longitude, radius)`.
1:16 _geoRadius = 12
"###);
insta::assert_snapshot!(p("_geoBoundingBox"), @r"
insta::assert_snapshot!(p("_geoBoundingBox"), @r###"
The `_geoBoundingBox` filter expects two pairs of arguments: `_geoBoundingBox([latitude, longitude], [latitude, longitude])`.
16:16 _geoBoundingBox
");
1:16 _geoBoundingBox
"###);
insta::assert_snapshot!(p("_geoBoundingBox = 12"), @r"
insta::assert_snapshot!(p("_geoBoundingBox = 12"), @r###"
The `_geoBoundingBox` filter expects two pairs of arguments: `_geoBoundingBox([latitude, longitude], [latitude, longitude])`.
16:21 _geoBoundingBox = 12
");
1:21 _geoBoundingBox = 12
"###);
insta::assert_snapshot!(p("_geoBoundingBox(1.0, 1.0)"), @r"
insta::assert_snapshot!(p("_geoBoundingBox(1.0, 1.0)"), @r###"
The `_geoBoundingBox` filter expects two pairs of arguments: `_geoBoundingBox([latitude, longitude], [latitude, longitude])`.
17:26 _geoBoundingBox(1.0, 1.0)
");
insta::assert_snapshot!(p("_geoPolygon([1,2,3])"), @r"
The `_geoPolygon` filter expects at least 3 points but only 1 were specified
18:19 _geoPolygon([1,2,3])
");
insta::assert_snapshot!(p("_geoPolygon(1,2,3)"), @r"
The `_geoPolygon` filter doesn't match the expected format: `_geoPolygon([latitude, longitude], [latitude, longitude])`.
13:19 _geoPolygon(1,2,3)
");
insta::assert_snapshot!(p("_geoPolygon([1,2],[1,2],[1,2,3])"), @r"
Was expecting 2 coordinates but instead found 3.
26:27 _geoPolygon([1,2],[1,2],[1,2,3])
");
insta::assert_snapshot!(p("_geoPolygon([1,2],[1,2,3])"), @r"
The `_geoPolygon` filter expects at least 3 points but only 2 were specified
24:25 _geoPolygon([1,2],[1,2,3])
");
insta::assert_snapshot!(p("_geoPolygon(1)"), @r"
The `_geoPolygon` filter doesn't match the expected format: `_geoPolygon([latitude, longitude], [latitude, longitude])`.
13:15 _geoPolygon(1)
");
insta::assert_snapshot!(p("_geoPolygon([1,2)"), @r"
The `_geoPolygon` filter doesn't match the expected format: `_geoPolygon([latitude, longitude], [latitude, longitude])`.
17:18 _geoPolygon([1,2)
");
1:26 _geoBoundingBox(1.0, 1.0)
"###);
insta::assert_snapshot!(p("_geoPoint(12, 13, 14)"), @r###"
`_geoPoint` is a reserved keyword and thus can't be used as a filter expression. Use the `_geoRadius(latitude, longitude, distance)` or `_geoBoundingBox([latitude, longitude], [latitude, longitude])` built-in rules to filter on `_geo` coordinates.
@@ -1039,15 +861,15 @@ pub mod tests {
34:35 channel = mv OR followers >= 1000)
"###);
insta::assert_snapshot!(p("colour NOT EXIST"), @r"
Was expecting an operation `=`, `!=`, `>=`, `>`, `<=`, `<`, `IN`, `NOT IN`, `TO`, `EXISTS`, `NOT EXISTS`, `IS NULL`, `IS NOT NULL`, `IS EMPTY`, `IS NOT EMPTY`, `CONTAINS`, `NOT CONTAINS`, `STARTS WITH`, `NOT STARTS WITH`, `_geoRadius`, `_geoBoundingBox` or `_geoPolygon` at `colour NOT EXIST`.
insta::assert_snapshot!(p("colour NOT EXIST"), @r###"
Was expecting an operation `=`, `!=`, `>=`, `>`, `<=`, `<`, `IN`, `NOT IN`, `TO`, `EXISTS`, `NOT EXISTS`, `IS NULL`, `IS NOT NULL`, `IS EMPTY`, `IS NOT EMPTY`, `CONTAINS`, `NOT CONTAINS`, `STARTS WITH`, `NOT STARTS WITH`, `_geoRadius`, or `_geoBoundingBox` at `colour NOT EXIST`.
1:17 colour NOT EXIST
");
"###);
insta::assert_snapshot!(p("subscribers 100 TO1000"), @r"
Was expecting an operation `=`, `!=`, `>=`, `>`, `<=`, `<`, `IN`, `NOT IN`, `TO`, `EXISTS`, `NOT EXISTS`, `IS NULL`, `IS NOT NULL`, `IS EMPTY`, `IS NOT EMPTY`, `CONTAINS`, `NOT CONTAINS`, `STARTS WITH`, `NOT STARTS WITH`, `_geoRadius`, `_geoBoundingBox` or `_geoPolygon` at `subscribers 100 TO1000`.
insta::assert_snapshot!(p("subscribers 100 TO1000"), @r###"
Was expecting an operation `=`, `!=`, `>=`, `>`, `<=`, `<`, `IN`, `NOT IN`, `TO`, `EXISTS`, `NOT EXISTS`, `IS NULL`, `IS NOT NULL`, `IS EMPTY`, `IS NOT EMPTY`, `CONTAINS`, `NOT CONTAINS`, `STARTS WITH`, `NOT STARTS WITH`, `_geoRadius`, or `_geoBoundingBox` at `subscribers 100 TO1000`.
1:23 subscribers 100 TO1000
");
"###);
insta::assert_snapshot!(p("channel = ponce ORdog != 'bernese mountain'"), @r###"
Found unexpected characters at the end of the filter: `ORdog != \'bernese mountain\'`. You probably forgot an `OR` or an `AND` rule.
@@ -1102,108 +924,43 @@ pub mod tests {
"###
);
insta::assert_snapshot!(p(r#"_vectors _vectors EXISTS"#), @r"
Was expecting an operation like `EXISTS` or `NOT EXISTS` after the vector filter.
10:25 _vectors _vectors EXISTS
");
insta::assert_snapshot!(p(r#"_vectors. embedderName EXISTS"#), @r"
Was expecting embedder name but found nothing.
10:11 _vectors. embedderName EXISTS
");
insta::assert_snapshot!(p(r#"_vectors .embedderName EXISTS"#), @r"
Was expecting an operation like `EXISTS` or `NOT EXISTS` after the vector filter.
10:30 _vectors .embedderName EXISTS
");
insta::assert_snapshot!(p(r#"_vectors.embedderName. EXISTS"#), @r"
Was expecting one of `.fragments`, `.userProvided`, `.documentTemplate`, `.regenerate` or nothing, but instead found a point without a valid value.
22:23 _vectors.embedderName. EXISTS
");
insta::assert_snapshot!(p(r#"_vectors."embedderName EXISTS"#), @r#"
The quotes in one of the values are inconsistent.
10:30 _vectors."embedderName EXISTS
"#);
insta::assert_snapshot!(p(r#"_vectors."embedderNam"e EXISTS"#), @r#"
The vector filter has leftover tokens.
23:31 _vectors."embedderNam"e EXISTS
"#);
insta::assert_snapshot!(p(r#"_vectors.embedderName.documentTemplate. EXISTS"#), @r"
Was expecting one of `.fragments`, `.userProvided`, `.documentTemplate`, `.regenerate` or nothing, but instead found a point without a valid value.
39:40 _vectors.embedderName.documentTemplate. EXISTS
");
insta::assert_snapshot!(p(r#"_vectors.embedderName.fragments EXISTS"#), @r"
The vector filter is missing a fragment name.
32:39 _vectors.embedderName.fragments EXISTS
");
insta::assert_snapshot!(p(r#"_vectors.embedderName.fragments. EXISTS"#), @r"
The vector filter's fragment name is invalid.
33:40 _vectors.embedderName.fragments. EXISTS
");
insta::assert_snapshot!(p(r#"_vectors.embedderName.fragments.test test EXISTS"#), @r"
Was expecting an operation like `EXISTS` or `NOT EXISTS` after the vector filter.
38:49 _vectors.embedderName.fragments.test test EXISTS
");
insta::assert_snapshot!(p(r#"_vectors.embedderName.fragments. test EXISTS"#), @r"
The vector filter's fragment name is invalid.
33:45 _vectors.embedderName.fragments. test EXISTS
");
insta::assert_snapshot!(p(r#"_vectors.embedderName .fragments. test EXISTS"#), @r"
Was expecting an operation like `EXISTS` or `NOT EXISTS` after the vector filter.
23:46 _vectors.embedderName .fragments. test EXISTS
");
insta::assert_snapshot!(p(r#"_vectors.embedderName .fragments.test EXISTS"#), @r"
Was expecting an operation like `EXISTS` or `NOT EXISTS` after the vector filter.
23:45 _vectors.embedderName .fragments.test EXISTS
");
insta::assert_snapshot!(p(r#"_vectors.embedderName.fargments.test EXISTS"#), @r"
Was expecting one of `fragments`, `userProvided`, `documentTemplate`, `regenerate` or nothing, but instead found `fargments`. Did you mean `fragments`?
23:32 _vectors.embedderName.fargments.test EXISTS
");
insta::assert_snapshot!(p(r#"_vectors.embedderName."userProvided" EXISTS"#), @r#"
Was expecting this part to be unquoted.
24:36 _vectors.embedderName."userProvided" EXISTS
"#);
insta::assert_snapshot!(p(r#"_vectors.embedderName.userProvided.fragments.test EXISTS"#), @r"
Vector filter can only accept one of `fragments`, `userProvided`, `documentTemplate` or `regenerate`, but found both `userProvided` and `fragments`.
36:45 _vectors.embedderName.userProvided.fragments.test EXISTS
");
insta::assert_snapshot!(p(r#"NOT OR EXISTS AND EXISTS NOT EXISTS"#), @r###"
Was expecting a value but instead got `OR`, which is a reserved keyword. To use `OR` as a field name or a value, surround it by quotes.
5:7 NOT OR EXISTS AND EXISTS NOT EXISTS
"###);
insta::assert_snapshot!(p(r#"value NULL"#), @r"
Was expecting an operation `=`, `!=`, `>=`, `>`, `<=`, `<`, `IN`, `NOT IN`, `TO`, `EXISTS`, `NOT EXISTS`, `IS NULL`, `IS NOT NULL`, `IS EMPTY`, `IS NOT EMPTY`, `CONTAINS`, `NOT CONTAINS`, `STARTS WITH`, `NOT STARTS WITH`, `_geoRadius`, `_geoBoundingBox` or `_geoPolygon` at `value NULL`.
insta::assert_snapshot!(p(r#"value NULL"#), @r###"
Was expecting an operation `=`, `!=`, `>=`, `>`, `<=`, `<`, `IN`, `NOT IN`, `TO`, `EXISTS`, `NOT EXISTS`, `IS NULL`, `IS NOT NULL`, `IS EMPTY`, `IS NOT EMPTY`, `CONTAINS`, `NOT CONTAINS`, `STARTS WITH`, `NOT STARTS WITH`, `_geoRadius`, or `_geoBoundingBox` at `value NULL`.
1:11 value NULL
");
insta::assert_snapshot!(p(r#"value NOT NULL"#), @r"
Was expecting an operation `=`, `!=`, `>=`, `>`, `<=`, `<`, `IN`, `NOT IN`, `TO`, `EXISTS`, `NOT EXISTS`, `IS NULL`, `IS NOT NULL`, `IS EMPTY`, `IS NOT EMPTY`, `CONTAINS`, `NOT CONTAINS`, `STARTS WITH`, `NOT STARTS WITH`, `_geoRadius`, `_geoBoundingBox` or `_geoPolygon` at `value NOT NULL`.
"###);
insta::assert_snapshot!(p(r#"value NOT NULL"#), @r###"
Was expecting an operation `=`, `!=`, `>=`, `>`, `<=`, `<`, `IN`, `NOT IN`, `TO`, `EXISTS`, `NOT EXISTS`, `IS NULL`, `IS NOT NULL`, `IS EMPTY`, `IS NOT EMPTY`, `CONTAINS`, `NOT CONTAINS`, `STARTS WITH`, `NOT STARTS WITH`, `_geoRadius`, or `_geoBoundingBox` at `value NOT NULL`.
1:15 value NOT NULL
");
insta::assert_snapshot!(p(r#"value EMPTY"#), @r"
Was expecting an operation `=`, `!=`, `>=`, `>`, `<=`, `<`, `IN`, `NOT IN`, `TO`, `EXISTS`, `NOT EXISTS`, `IS NULL`, `IS NOT NULL`, `IS EMPTY`, `IS NOT EMPTY`, `CONTAINS`, `NOT CONTAINS`, `STARTS WITH`, `NOT STARTS WITH`, `_geoRadius`, `_geoBoundingBox` or `_geoPolygon` at `value EMPTY`.
"###);
insta::assert_snapshot!(p(r#"value EMPTY"#), @r###"
Was expecting an operation `=`, `!=`, `>=`, `>`, `<=`, `<`, `IN`, `NOT IN`, `TO`, `EXISTS`, `NOT EXISTS`, `IS NULL`, `IS NOT NULL`, `IS EMPTY`, `IS NOT EMPTY`, `CONTAINS`, `NOT CONTAINS`, `STARTS WITH`, `NOT STARTS WITH`, `_geoRadius`, or `_geoBoundingBox` at `value EMPTY`.
1:12 value EMPTY
");
insta::assert_snapshot!(p(r#"value NOT EMPTY"#), @r"
Was expecting an operation `=`, `!=`, `>=`, `>`, `<=`, `<`, `IN`, `NOT IN`, `TO`, `EXISTS`, `NOT EXISTS`, `IS NULL`, `IS NOT NULL`, `IS EMPTY`, `IS NOT EMPTY`, `CONTAINS`, `NOT CONTAINS`, `STARTS WITH`, `NOT STARTS WITH`, `_geoRadius`, `_geoBoundingBox` or `_geoPolygon` at `value NOT EMPTY`.
"###);
insta::assert_snapshot!(p(r#"value NOT EMPTY"#), @r###"
Was expecting an operation `=`, `!=`, `>=`, `>`, `<=`, `<`, `IN`, `NOT IN`, `TO`, `EXISTS`, `NOT EXISTS`, `IS NULL`, `IS NOT NULL`, `IS EMPTY`, `IS NOT EMPTY`, `CONTAINS`, `NOT CONTAINS`, `STARTS WITH`, `NOT STARTS WITH`, `_geoRadius`, or `_geoBoundingBox` at `value NOT EMPTY`.
1:16 value NOT EMPTY
");
insta::assert_snapshot!(p(r#"value IS"#), @r"
Was expecting an operation `=`, `!=`, `>=`, `>`, `<=`, `<`, `IN`, `NOT IN`, `TO`, `EXISTS`, `NOT EXISTS`, `IS NULL`, `IS NOT NULL`, `IS EMPTY`, `IS NOT EMPTY`, `CONTAINS`, `NOT CONTAINS`, `STARTS WITH`, `NOT STARTS WITH`, `_geoRadius`, `_geoBoundingBox` or `_geoPolygon` at `value IS`.
"###);
insta::assert_snapshot!(p(r#"value IS"#), @r###"
Was expecting an operation `=`, `!=`, `>=`, `>`, `<=`, `<`, `IN`, `NOT IN`, `TO`, `EXISTS`, `NOT EXISTS`, `IS NULL`, `IS NOT NULL`, `IS EMPTY`, `IS NOT EMPTY`, `CONTAINS`, `NOT CONTAINS`, `STARTS WITH`, `NOT STARTS WITH`, `_geoRadius`, or `_geoBoundingBox` at `value IS`.
1:9 value IS
");
insta::assert_snapshot!(p(r#"value IS NOT"#), @r"
Was expecting an operation `=`, `!=`, `>=`, `>`, `<=`, `<`, `IN`, `NOT IN`, `TO`, `EXISTS`, `NOT EXISTS`, `IS NULL`, `IS NOT NULL`, `IS EMPTY`, `IS NOT EMPTY`, `CONTAINS`, `NOT CONTAINS`, `STARTS WITH`, `NOT STARTS WITH`, `_geoRadius`, `_geoBoundingBox` or `_geoPolygon` at `value IS NOT`.
"###);
insta::assert_snapshot!(p(r#"value IS NOT"#), @r###"
Was expecting an operation `=`, `!=`, `>=`, `>`, `<=`, `<`, `IN`, `NOT IN`, `TO`, `EXISTS`, `NOT EXISTS`, `IS NULL`, `IS NOT NULL`, `IS EMPTY`, `IS NOT EMPTY`, `CONTAINS`, `NOT CONTAINS`, `STARTS WITH`, `NOT STARTS WITH`, `_geoRadius`, or `_geoBoundingBox` at `value IS NOT`.
1:13 value IS NOT
");
insta::assert_snapshot!(p(r#"value IS EXISTS"#), @r"
Was expecting an operation `=`, `!=`, `>=`, `>`, `<=`, `<`, `IN`, `NOT IN`, `TO`, `EXISTS`, `NOT EXISTS`, `IS NULL`, `IS NOT NULL`, `IS EMPTY`, `IS NOT EMPTY`, `CONTAINS`, `NOT CONTAINS`, `STARTS WITH`, `NOT STARTS WITH`, `_geoRadius`, `_geoBoundingBox` or `_geoPolygon` at `value IS EXISTS`.
"###);
insta::assert_snapshot!(p(r#"value IS EXISTS"#), @r###"
Was expecting an operation `=`, `!=`, `>=`, `>`, `<=`, `<`, `IN`, `NOT IN`, `TO`, `EXISTS`, `NOT EXISTS`, `IS NULL`, `IS NOT NULL`, `IS EMPTY`, `IS NOT EMPTY`, `CONTAINS`, `NOT CONTAINS`, `STARTS WITH`, `NOT STARTS WITH`, `_geoRadius`, or `_geoBoundingBox` at `value IS EXISTS`.
1:16 value IS EXISTS
");
insta::assert_snapshot!(p(r#"value IS NOT EXISTS"#), @r"
Was expecting an operation `=`, `!=`, `>=`, `>`, `<=`, `<`, `IN`, `NOT IN`, `TO`, `EXISTS`, `NOT EXISTS`, `IS NULL`, `IS NOT NULL`, `IS EMPTY`, `IS NOT EMPTY`, `CONTAINS`, `NOT CONTAINS`, `STARTS WITH`, `NOT STARTS WITH`, `_geoRadius`, `_geoBoundingBox` or `_geoPolygon` at `value IS NOT EXISTS`.
"###);
insta::assert_snapshot!(p(r#"value IS NOT EXISTS"#), @r###"
Was expecting an operation `=`, `!=`, `>=`, `>`, `<=`, `<`, `IN`, `NOT IN`, `TO`, `EXISTS`, `NOT EXISTS`, `IS NULL`, `IS NOT NULL`, `IS EMPTY`, `IS NOT EMPTY`, `CONTAINS`, `NOT CONTAINS`, `STARTS WITH`, `NOT STARTS WITH`, `_geoRadius`, or `_geoBoundingBox` at `value IS NOT EXISTS`.
1:20 value IS NOT EXISTS
");
"###);
}
#[test]
@@ -1221,43 +978,6 @@ pub mod tests {
assert!(filter.token_at_depth(3).is_none());
}
#[test]
fn fids() {
let filter = Fc::parse("field = value").unwrap().unwrap();
let fids: Vec<_> = filter.fids(MAX_FILTER_DEPTH).collect();
assert_eq!(fids.len(), 1);
assert_eq!(fids[0].value(), "field");
let filter = Fc::parse("field IN [1, 2, 3]").unwrap().unwrap();
let fids: Vec<_> = filter.fids(MAX_FILTER_DEPTH).collect();
assert_eq!(fids.len(), 1);
assert_eq!(fids[0].value(), "field");
let filter = Fc::parse("field != value").unwrap().unwrap();
let fids: Vec<_> = filter.fids(MAX_FILTER_DEPTH).collect();
assert_eq!(fids.len(), 1);
assert_eq!(fids[0].value(), "field");
let filter = Fc::parse("field1 = value1 AND field2 = value2").unwrap().unwrap();
let fids: Vec<_> = filter.fids(MAX_FILTER_DEPTH).collect();
assert_eq!(fids.len(), 2);
assert!(fids[0].value() == "field1");
assert!(fids[1].value() == "field2");
let filter = Fc::parse("field1 = value1 OR field2 = value2").unwrap().unwrap();
let fids: Vec<_> = filter.fids(MAX_FILTER_DEPTH).collect();
assert_eq!(fids.len(), 2);
assert!(fids[0].value() == "field1");
assert!(fids[1].value() == "field2");
let depth = 2;
let filter =
Fc::parse("field1 = value1 AND (field2 = value2 OR field3 = value3)").unwrap().unwrap();
let fids: Vec<_> = filter.fids(depth).collect();
assert_eq!(fids.len(), 1);
assert_eq!(fids[0].value(), "field1");
}
#[test]
fn token_from_str() {
let s = "test string that should not be parsed";


@@ -52,7 +52,7 @@ fn quoted_by(quote: char, input: Span) -> IResult<Token> {
}
// word = (alphanumeric | _ | - | .)+ except for reserved keywords
pub fn word_not_keyword<'a>(input: Span<'a>) -> IResult<'a, Token<'a>> {
pub fn word_not_keyword<'a>(input: Span<'a>) -> IResult<Token<'a>> {
let (input, word): (_, Token<'a>) =
take_while1(is_value_component)(input).map(|(s, t)| (s, t.into()))?;
if is_keyword(word.value()) {
@@ -80,51 +80,6 @@ pub fn word_exact<'a, 'b: 'a>(tag: &'b str) -> impl Fn(Span<'a>) -> IResult<'a,
}
}
/// vector_value = ( non_dot_word | singleQuoted | doubleQuoted)
pub fn parse_vector_value(input: Span) -> IResult<Token> {
pub fn non_dot_word(input: Span) -> IResult<Token> {
let (input, word) = take_while1(|c| is_value_component(c) && c != '.')(input)?;
Ok((input, word.into()))
}
let (input, value) = alt((
delimited(char('\''), cut(|input| quoted_by('\'', input)), cut(char('\''))),
delimited(char('"'), cut(|input| quoted_by('"', input)), cut(char('"'))),
non_dot_word,
))(input)?;
match unescaper::unescape(value.value()) {
Ok(content) => {
if content.len() != value.value().len() {
Ok((input, Token::new(value.original_span(), Some(content))))
} else {
Ok((input, value))
}
}
Err(unescaper::Error::IncompleteStr(_)) => Err(nom::Err::Incomplete(nom::Needed::Unknown)),
Err(unescaper::Error::ParseIntError { .. }) => Err(nom::Err::Error(Error::new_from_kind(
value.original_span(),
ErrorKind::InvalidEscapedNumber,
))),
Err(unescaper::Error::InvalidChar { .. }) => Err(nom::Err::Error(Error::new_from_kind(
value.original_span(),
ErrorKind::MalformedValue,
))),
}
}
pub fn parse_vector_value_cut<'a>(input: Span<'a>, kind: ErrorKind<'a>) -> IResult<'a, Token<'a>> {
parse_vector_value(input).map_err(|e| match e {
nom::Err::Failure(e) => match e.kind() {
ErrorKind::Char(c) if *c == '"' || *c == '\'' => {
crate::Error::failure_from_kind(input, ErrorKind::VectorFilterInvalidQuotes)
}
_ => crate::Error::failure_from_kind(input, kind),
},
_ => crate::Error::failure_from_kind(input, kind),
})
}
/// value = WS* ( word | singleQuoted | doubleQuoted) WS+
pub fn parse_value(input: Span) -> IResult<Token> {
// to get better diagnostic message we are going to strip the left whitespaces from the input right now
@@ -144,21 +99,31 @@ pub fn parse_value(input: Span) -> IResult<Token> {
}
match parse_geo_radius(input) {
Ok(_) => return Err(Error::failure_from_kind(input, ErrorKind::MisusedGeoRadius)),
Ok(_) => {
return Err(nom::Err::Failure(Error::new_from_kind(input, ErrorKind::MisusedGeoRadius)))
}
// if we encountered a failure it means the user badly wrote a _geoRadius filter.
// But instead of showing them how to fix his syntax we are going to tell them they should not use this filter as a value.
Err(e) if e.is_failure() => {
return Err(Error::failure_from_kind(input, ErrorKind::MisusedGeoRadius))
return Err(nom::Err::Failure(Error::new_from_kind(input, ErrorKind::MisusedGeoRadius)))
}
_ => (),
}
match parse_geo_bounding_box(input) {
Ok(_) => return Err(Error::failure_from_kind(input, ErrorKind::MisusedGeoBoundingBox)),
Ok(_) => {
return Err(nom::Err::Failure(Error::new_from_kind(
input,
ErrorKind::MisusedGeoBoundingBox,
)))
}
// if we encountered a failure it means the user badly wrote a _geoBoundingBox filter.
// But instead of showing them how to fix his syntax we are going to tell them they should not use this filter as a value.
Err(e) if e.is_failure() => {
return Err(Error::failure_from_kind(input, ErrorKind::MisusedGeoBoundingBox))
return Err(nom::Err::Failure(Error::new_from_kind(
input,
ErrorKind::MisusedGeoBoundingBox,
)))
}
_ => (),
}


@@ -16,7 +16,7 @@ license.workspace = true
serde_json = "1.0"
[dev-dependencies]
criterion = { version = "0.6.0", features = ["html_reports"] }
criterion = { version = "0.5.1", features = ["html_reports"] }
[[bench]]
name = "benchmarks"


@@ -11,12 +11,12 @@ edition.workspace = true
license.workspace = true
[dependencies]
arbitrary = { version = "1.4.1", features = ["derive"] }
bumpalo = "3.18.1"
clap = { version = "4.5.40", features = ["derive"] }
either = "1.15.0"
fastrand = "2.3.0"
arbitrary = { version = "1.3.2", features = ["derive"] }
bumpalo = "3.16.0"
clap = { version = "4.5.9", features = ["derive"] }
either = "1.13.0"
fastrand = "2.1.0"
milli = { path = "../milli" }
serde = { version = "1.0.219", features = ["derive"] }
serde_json = { version = "1.0.140", features = ["preserve_order"] }
tempfile = "3.20.0"
serde = { version = "1.0.204", features = ["derive"] }
serde_json = { version = "1.0.120", features = ["preserve_order"] }
tempfile = "3.10.1"


@@ -12,8 +12,8 @@ use milli::documents::mmap_from_objects;
use milli::heed::EnvOpenOptions;
use milli::progress::Progress;
use milli::update::new::indexer;
use milli::update::IndexerConfig;
use milli::vector::RuntimeEmbedders;
use milli::update::{IndexDocumentsMethod, IndexerConfig};
use milli::vector::EmbeddingConfigs;
use milli::Index;
use serde_json::Value;
use tempfile::TempDir;
@@ -57,14 +57,13 @@ fn main() {
let opt = opt.clone();
let handle = std::thread::spawn(move || {
let options = EnvOpenOptions::new();
let mut options = options.read_txn_without_tls();
let mut options = EnvOpenOptions::new();
options.map_size(1024 * 1024 * 1024 * 1024);
let tempdir = match opt.path {
Some(path) => TempDir::new_in(path).unwrap(),
None => TempDir::new().unwrap(),
};
let index = Index::new(options, tempdir.path(), true).unwrap();
let index = Index::new(options, tempdir.path()).unwrap();
let indexer_config = IndexerConfig::default();
std::thread::scope(|s| {
@@ -89,8 +88,10 @@ fn main() {
let mut new_fields_ids_map = db_fields_ids_map.clone();
let indexer_alloc = Bump::new();
let embedders = RuntimeEmbedders::default();
let mut indexer = indexer::DocumentOperation::new();
let embedders = EmbeddingConfigs::default();
let mut indexer = indexer::DocumentOperation::new(
IndexDocumentsMethod::ReplaceDocuments,
);
let mut operations = Vec::new();
for op in batch.0 {
@@ -114,7 +115,7 @@ fn main() {
for op in &operations {
match op {
Either::Left(documents) => {
indexer.replace_documents(documents).unwrap()
indexer.add_documents(documents).unwrap()
}
Either::Right(ids) => indexer.delete_documents(ids),
}
@@ -129,7 +130,6 @@ fn main() {
&mut new_fields_ids_map,
&|| false,
Progress::default(),
None,
)
.unwrap();
@@ -145,7 +145,6 @@ fn main() {
embedders,
&|| false,
&Progress::default(),
&Default::default(),
)
.unwrap();


@@ -11,45 +11,42 @@ edition.workspace = true
license.workspace = true
[dependencies]
anyhow = "1.0.98"
anyhow = "1.0.86"
bincode = "1.3.3"
byte-unit = "5.1.6"
bumpalo = "3.18.1"
bumparaw-collections = "0.1.4"
convert_case = "0.8.0"
csv = "1.3.1"
derive_builder = "0.20.2"
bumpalo = "3.16.0"
bumparaw-collections = "0.1.2"
convert_case = "0.6.0"
csv = "1.3.0"
derive_builder = "0.20.0"
dump = { path = "../dump" }
enum-iterator = "2.1.0"
file-store = { path = "../file-store" }
flate2 = "1.1.2"
indexmap = "2.9.0"
flate2 = "1.0.30"
meilisearch-auth = { path = "../meilisearch-auth" }
meilisearch-types = { path = "../meilisearch-types" }
memmap2 = "0.9.7"
memmap2 = "0.9.4"
page_size = "0.6.0"
rayon = "1.10.0"
roaring = { version = "0.10.12", features = ["serde"] }
serde = { version = "1.0.219", features = ["derive"] }
serde_json = { version = "1.0.140", features = ["preserve_order"] }
roaring = { version = "0.10.7", features = ["serde"] }
serde = { version = "1.0.204", features = ["derive"] }
serde_json = { version = "1.0.120", features = ["preserve_order"] }
synchronoise = "1.0.1"
tempfile = "3.20.0"
thiserror = "2.0.12"
time = { version = "0.3.41", features = [
tempfile = "3.10.1"
thiserror = "1.0.61"
time = { version = "0.3.36", features = [
"serde-well-known",
"formatting",
"parsing",
"macros",
] }
tracing = "0.1.41"
ureq = "2.12.1"
uuid = { version = "1.17.0", features = ["serde", "v4"] }
backoff = "0.4.0"
tracing = "0.1.40"
ureq = "2.10.0"
uuid = { version = "1.10.0", features = ["serde", "v4"] }
[dev-dependencies]
arroy = "0.5.0"
big_s = "1.0.2"
crossbeam-channel = "0.5.15"
# fixed version due to format breakages in v1.40
insta = { version = "=1.39.0", features = ["json", "redactions"] }
crossbeam-channel = "0.5.13"
insta = { version = "1.39.0", features = ["json", "redactions"] }
maplit = "1.0.2"
meili-snap = { path = "../meili-snap" }


@@ -0,0 +1,901 @@
/*!
The autobatcher is responsible for combining the next enqueued
tasks affecting a single index into a [batch](crate::batch::Batch).
The main function of the autobatcher is [`autobatch`].
*/
use std::ops::ControlFlow::{self, Break, Continue};
use meilisearch_types::milli::update::IndexDocumentsMethod::{
self, ReplaceDocuments, UpdateDocuments,
};
use meilisearch_types::tasks::TaskId;
use crate::KindWithContent;
/// Succinctly describes a task's [`Kind`](meilisearch_types::tasks::Kind)
/// for the purpose of simplifying the implementation of the autobatcher.
///
/// Only the non-prioritised tasks that can be grouped in a batch have a corresponding [`AutobatchKind`].
enum AutobatchKind {
DocumentImport {
method: IndexDocumentsMethod,
allow_index_creation: bool,
primary_key: Option<String>,
},
DocumentEdition,
DocumentDeletion {
by_filter: bool,
},
DocumentClear,
Settings {
allow_index_creation: bool,
},
IndexCreation,
IndexDeletion,
IndexUpdate,
IndexSwap,
}
impl AutobatchKind {
#[rustfmt::skip]
fn allow_index_creation(&self) -> Option<bool> {
match self {
AutobatchKind::DocumentImport { allow_index_creation, .. }
| AutobatchKind::Settings { allow_index_creation, .. } => Some(*allow_index_creation),
_ => None,
}
}
fn primary_key(&self) -> Option<Option<&str>> {
match self {
AutobatchKind::DocumentImport { primary_key, .. } => Some(primary_key.as_deref()),
_ => None,
}
}
}
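The nested `Option<Option<&str>>` returned by `primary_key()` carries three distinct states. The following is a minimal sketch of that idea, assuming a hypothetical two-variant `Kind` enum and a hypothetical `describe` helper; neither exists in the crate, and the real `AutobatchKind` has many more variants.

```rust
/// Hypothetical, reduced stand-in for `AutobatchKind` (illustration only).
enum Kind {
    DocumentImport { primary_key: Option<String> },
    DocumentClear,
}

impl Kind {
    /// Outer `None`: the task never interacts with the primary key.
    /// `Some(None)`: the task could set a primary key but does not name one.
    /// `Some(Some(pk))`: the task explicitly asks for `pk`.
    fn primary_key(&self) -> Option<Option<&str>> {
        match self {
            Kind::DocumentImport { primary_key } => Some(primary_key.as_deref()),
            Kind::DocumentClear => None,
        }
    }
}

/// Hypothetical helper that spells out the three states in words.
fn describe(kind: &Kind) -> &'static str {
    match kind.primary_key() {
        None => "does not interact with the primary key",
        Some(None) => "imports documents without naming a primary key",
        Some(Some(_)) => "imports documents with an explicit primary key",
    }
}
```

Keeping the outer `Option` separate from the inner one is what lets the batching rules treat "no opinion on the primary key" differently from "explicitly no primary key given".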
impl From<KindWithContent> for AutobatchKind {
fn from(kind: KindWithContent) -> Self {
match kind {
KindWithContent::DocumentAdditionOrUpdate {
method,
allow_index_creation,
primary_key,
..
} => AutobatchKind::DocumentImport { method, allow_index_creation, primary_key },
KindWithContent::DocumentEdition { .. } => AutobatchKind::DocumentEdition,
KindWithContent::DocumentDeletion { .. } => {
AutobatchKind::DocumentDeletion { by_filter: false }
}
KindWithContent::DocumentClear { .. } => AutobatchKind::DocumentClear,
KindWithContent::DocumentDeletionByFilter { .. } => {
AutobatchKind::DocumentDeletion { by_filter: true }
}
KindWithContent::SettingsUpdate { allow_index_creation, is_deletion, .. } => {
AutobatchKind::Settings {
allow_index_creation: allow_index_creation && !is_deletion,
}
}
KindWithContent::IndexDeletion { .. } => AutobatchKind::IndexDeletion,
KindWithContent::IndexCreation { .. } => AutobatchKind::IndexCreation,
KindWithContent::IndexUpdate { .. } => AutobatchKind::IndexUpdate,
KindWithContent::IndexSwap { .. } => AutobatchKind::IndexSwap,
KindWithContent::TaskCancelation { .. }
| KindWithContent::TaskDeletion { .. }
| KindWithContent::DumpCreation { .. }
| KindWithContent::SnapshotCreation => {
panic!("The autobatcher should never be called with tasks that don't apply to an index.")
}
}
}
}
#[derive(Debug)]
pub enum BatchKind {
DocumentClear {
ids: Vec<TaskId>,
},
DocumentOperation {
method: IndexDocumentsMethod,
allow_index_creation: bool,
primary_key: Option<String>,
operation_ids: Vec<TaskId>,
},
DocumentEdition {
id: TaskId,
},
DocumentDeletion {
deletion_ids: Vec<TaskId>,
includes_by_filter: bool,
},
ClearAndSettings {
other: Vec<TaskId>,
allow_index_creation: bool,
settings_ids: Vec<TaskId>,
},
Settings {
allow_index_creation: bool,
settings_ids: Vec<TaskId>,
},
IndexDeletion {
ids: Vec<TaskId>,
},
IndexCreation {
id: TaskId,
},
IndexUpdate {
id: TaskId,
},
IndexSwap {
id: TaskId,
},
}
impl BatchKind {
#[rustfmt::skip]
fn allow_index_creation(&self) -> Option<bool> {
match self {
BatchKind::DocumentOperation { allow_index_creation, .. }
| BatchKind::ClearAndSettings { allow_index_creation, .. }
| BatchKind::Settings { allow_index_creation, .. } => Some(*allow_index_creation),
_ => None,
}
}
fn primary_key(&self) -> Option<Option<&str>> {
match self {
BatchKind::DocumentOperation { primary_key, .. } => Some(primary_key.as_deref()),
_ => None,
}
}
}
impl BatchKind {
/// Returns a `ControlFlow::Break` if you must stop right now.
/// The boolean tells you whether an index has been created by the batched task.
/// To ease the writing of the code, `true` can be returned when you don't need to create an index,
/// but `false` can't be returned if you need to create an index.
// TODO use an AutoBatchKind as input
pub fn new(
task_id: TaskId,
kind: KindWithContent,
primary_key: Option<&str>,
) -> (ControlFlow<BatchKind, BatchKind>, bool) {
use AutobatchKind as K;
match AutobatchKind::from(kind) {
K::IndexCreation => (Break(BatchKind::IndexCreation { id: task_id }), true),
K::IndexDeletion => (Break(BatchKind::IndexDeletion { ids: vec![task_id] }), false),
K::IndexUpdate => (Break(BatchKind::IndexUpdate { id: task_id }), false),
K::IndexSwap => (Break(BatchKind::IndexSwap { id: task_id }), false),
K::DocumentClear => (Continue(BatchKind::DocumentClear { ids: vec![task_id] }), false),
K::DocumentImport { method, allow_index_creation, primary_key: pk }
if primary_key.is_none() || pk.is_none() || primary_key == pk.as_deref() =>
{
(
Continue(BatchKind::DocumentOperation {
method,
allow_index_creation,
primary_key: pk,
operation_ids: vec![task_id],
}),
allow_index_creation,
)
}
// If the primary key set in the task differs from ours, we should stop and make this batch fail as soon as possible.
K::DocumentImport { method, allow_index_creation, primary_key } => (
Break(BatchKind::DocumentOperation {
method,
allow_index_creation,
primary_key,
operation_ids: vec![task_id],
}),
allow_index_creation,
),
K::DocumentEdition => (Break(BatchKind::DocumentEdition { id: task_id }), false),
K::DocumentDeletion { by_filter: includes_by_filter } => (
Continue(BatchKind::DocumentDeletion {
deletion_ids: vec![task_id],
includes_by_filter,
}),
false,
),
K::Settings { allow_index_creation } => (
Continue(BatchKind::Settings { allow_index_creation, settings_ids: vec![task_id] }),
allow_index_creation,
),
}
}
/// Returns a `ControlFlow::Break` if you must stop right now,
/// and a `ControlFlow::Continue` holding the updated batch otherwise.
#[rustfmt::skip]
fn accumulate(self, id: TaskId, kind: AutobatchKind, index_already_exists: bool, primary_key: Option<&str>) -> ControlFlow<BatchKind, BatchKind> {
use AutobatchKind as K;
match (self, kind) {
// We don't batch any of these operations
(this, K::IndexCreation | K::IndexUpdate | K::IndexSwap | K::DocumentEdition) => Break(this),
// We must not batch tasks that don't have the same index creation rights if the index doesn't already exist.
(this, kind) if !index_already_exists && this.allow_index_creation() == Some(false) && kind.allow_index_creation() == Some(true) => {
Break(this)
},
// NOTE: We need to negate the whole condition since we're checking if we need to break instead of continue.
// I wrote it this way because it's easier to understand than the other way around.
(this, kind) if !(
// 1. If neither the batch nor the task interacts with the primary key -> we can continue
(this.primary_key().is_none() && kind.primary_key().is_none()) ||
// 2. Else ->
(
// 2.1 If we already have a primary-key ->
(
primary_key.is_some() &&
// 2.1.1 If the task we're trying to accumulate has a pk, it must be equal to our primary key
// 2.1.2 If the task doesn't have a primary-key -> we can continue
kind.primary_key().map_or(true, |pk| pk == primary_key)
) ||
// 2.2 If we don't have a primary-key ->
(
// 2.2.1 If both the batch and the task have a primary key they should be equal
// 2.2.2 If the batch is set to Some(None), the task should be too
// 2.2.3 If the batch is set to None -> we can continue
this.primary_key().zip(kind.primary_key()).map_or(true, |(this, kind)| this == kind)
)
)
) // closing the negation
=> {
Break(this)
},
// The index deletion can batch with everything but must stop after
(
BatchKind::DocumentClear { mut ids }
| BatchKind::DocumentDeletion { deletion_ids: mut ids, includes_by_filter: _ }
| BatchKind::DocumentOperation { method: _, allow_index_creation: _, primary_key: _, operation_ids: mut ids }
| BatchKind::Settings { allow_index_creation: _, settings_ids: mut ids },
K::IndexDeletion,
) => {
ids.push(id);
Break(BatchKind::IndexDeletion { ids })
}
(
BatchKind::ClearAndSettings { settings_ids: mut ids, allow_index_creation: _, mut other },
K::IndexDeletion,
) => {
ids.push(id);
ids.append(&mut other);
Break(BatchKind::IndexDeletion { ids })
}
(
BatchKind::DocumentClear { mut ids },
K::DocumentClear | K::DocumentDeletion { by_filter: _ },
) => {
ids.push(id);
Continue(BatchKind::DocumentClear { ids })
}
(
this @ BatchKind::DocumentClear { .. },
K::DocumentImport { .. } | K::Settings { .. },
) => Break(this),
(
BatchKind::DocumentOperation { method: _, allow_index_creation: _, primary_key: _, mut operation_ids },
K::DocumentClear,
) => {
operation_ids.push(id);
Continue(BatchKind::DocumentClear { ids: operation_ids })
}
// we can autobatch the same kind of document additions / updates
(
BatchKind::DocumentOperation { method: ReplaceDocuments, allow_index_creation, primary_key: _, mut operation_ids },
K::DocumentImport { method: ReplaceDocuments, primary_key: pk, .. },
) => {
operation_ids.push(id);
Continue(BatchKind::DocumentOperation {
method: ReplaceDocuments,
allow_index_creation,
operation_ids,
primary_key: pk,
})
}
(
BatchKind::DocumentOperation { method: UpdateDocuments, allow_index_creation, primary_key: _, mut operation_ids },
K::DocumentImport { method: UpdateDocuments, primary_key: pk, .. },
) => {
operation_ids.push(id);
Continue(BatchKind::DocumentOperation {
method: UpdateDocuments,
allow_index_creation,
primary_key: pk,
operation_ids,
})
}
(
BatchKind::DocumentOperation { method, allow_index_creation, primary_key, mut operation_ids },
K::DocumentDeletion { by_filter: false },
) => {
operation_ids.push(id);
Continue(BatchKind::DocumentOperation {
method,
allow_index_creation,
primary_key,
operation_ids,
})
}
// We can't batch a document operation with a delete by filter
(
this @ BatchKind::DocumentOperation { .. },
K::DocumentDeletion { by_filter: true },
) => {
Break(this)
}
// but we can't autobatch documents if it's not the same kind
// this match branch MUST be AFTER the previous one
(
this @ BatchKind::DocumentOperation { .. },
K::DocumentImport { .. },
) => Break(this),
(
this @ BatchKind::DocumentOperation { .. },
K::Settings { .. },
) => Break(this),
(BatchKind::DocumentDeletion { mut deletion_ids, includes_by_filter: _ }, K::DocumentClear) => {
deletion_ids.push(id);
Continue(BatchKind::DocumentClear { ids: deletion_ids })
}
// we can't autobatch the deletion and import if the document deletion contained a filter
(
this @ BatchKind::DocumentDeletion { deletion_ids: _, includes_by_filter: true },
K::DocumentImport { .. }
) => Break(this),
// we can autobatch the deletion and import if the index already exists
(
BatchKind::DocumentDeletion { mut deletion_ids, includes_by_filter: false },
K::DocumentImport { method, allow_index_creation, primary_key }
) if index_already_exists => {
deletion_ids.push(id);
Continue(BatchKind::DocumentOperation {
method,
allow_index_creation,
primary_key,
operation_ids: deletion_ids,
})
}
// we can autobatch the deletion and import if both can't create an index
(
BatchKind::DocumentDeletion { mut deletion_ids, includes_by_filter: false },
K::DocumentImport { method, allow_index_creation, primary_key }
) if !allow_index_creation => {
deletion_ids.push(id);
Continue(BatchKind::DocumentOperation {
method,
allow_index_creation,
primary_key,
operation_ids: deletion_ids,
})
}
// we can't autobatch a deletion and an import if the index does not exist but would be created by an addition
(
this @ BatchKind::DocumentDeletion { .. },
K::DocumentImport { .. }
) => {
Break(this)
}
(BatchKind::DocumentDeletion { mut deletion_ids, includes_by_filter }, K::DocumentDeletion { by_filter }) => {
deletion_ids.push(id);
Continue(BatchKind::DocumentDeletion { deletion_ids, includes_by_filter: includes_by_filter | by_filter })
}
(this @ BatchKind::DocumentDeletion { .. }, K::Settings { .. }) => Break(this),
(
BatchKind::Settings { settings_ids, allow_index_creation },
K::DocumentClear,
) => Continue(BatchKind::ClearAndSettings {
settings_ids,
allow_index_creation,
other: vec![id],
}),
(
this @ BatchKind::Settings { .. },
K::DocumentImport { .. } | K::DocumentDeletion { .. },
) => Break(this),
(
BatchKind::Settings { mut settings_ids, allow_index_creation },
K::Settings { .. },
) => {
settings_ids.push(id);
Continue(BatchKind::Settings {
allow_index_creation,
settings_ids,
})
}
(
BatchKind::ClearAndSettings { mut other, settings_ids, allow_index_creation },
K::DocumentClear,
) => {
other.push(id);
Continue(BatchKind::ClearAndSettings {
other,
settings_ids,
allow_index_creation,
})
}
(this @ BatchKind::ClearAndSettings { .. }, K::DocumentImport { .. }) => Break(this),
(
BatchKind::ClearAndSettings {
mut other,
settings_ids,
allow_index_creation,
},
K::DocumentDeletion { .. },
) => {
other.push(id);
Continue(BatchKind::ClearAndSettings {
other,
settings_ids,
allow_index_creation,
})
}
(
BatchKind::ClearAndSettings { mut settings_ids, other, allow_index_creation },
K::Settings { .. },
) => {
settings_ids.push(id);
Continue(BatchKind::ClearAndSettings {
other,
settings_ids,
allow_index_creation,
})
}
(
BatchKind::IndexCreation { .. }
| BatchKind::IndexDeletion { .. }
| BatchKind::IndexUpdate { .. }
| BatchKind::IndexSwap { .. }
| BatchKind::DocumentEdition { .. },
_,
) => {
unreachable!()
}
}
}
}
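The negated primary-key guard in `accumulate` can be easier to follow when written as a positive, standalone predicate. The sketch below assumes a hypothetical free function `can_accumulate` (it is not part of the crate) whose three clauses mirror cases 1, 2.1 and 2.2 of the guard; `accumulate` breaks exactly when this predicate is `false`.

```rust
/// Hypothetical predicate mirroring the primary-key guard (illustration only).
///
/// `batch_pk` and `task_pk` have the same shape as the `primary_key()` accessors:
/// the outer `Option` says whether the batch or task interacts with the primary key at all.
/// `index_pk` is the primary key already known for the index, if any.
fn can_accumulate(
    batch_pk: Option<Option<&str>>,
    task_pk: Option<Option<&str>>,
    index_pk: Option<&str>,
) -> bool {
    // 1. Neither the batch nor the task interacts with the primary key.
    let neither = batch_pk.is_none() && task_pk.is_none();
    // 2.1 The index has a primary key, and the task either does not
    //     interact with it or asks for the very same one.
    let agrees_with_index = index_pk.is_some() && task_pk.map_or(true, |pk| pk == index_pk);
    // 2.2 When both the batch and the task interact with the primary key,
    //     they must carry the same value; otherwise there is nothing to compare.
    let agrees_with_batch = batch_pk.zip(task_pk).map_or(true, |(b, t)| b == t);
    neither || agrees_with_index || agrees_with_batch
}
```

For instance, `can_accumulate(Some(Some("id")), Some(Some("other")), None)` evaluates to `false`, which corresponds to the batch being closed before the conflicting task.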
/// Create a batch from an ordered list of tasks.
///
/// ## Preconditions
/// 1. The tasks must be enqueued and given in the order in which they were enqueued
/// 2. The tasks must not be prioritised tasks (e.g. task cancellation, dump, snapshot, task deletion)
/// 3. The tasks must all be related to the same index
///
/// ## Return
/// `None` if the list of tasks is empty. Otherwise, a [`BatchKind`] that represents
/// a subset of the given tasks, along with a boolean telling whether the index must be created.
pub fn autobatch(
enqueued: Vec<(TaskId, KindWithContent)>,
index_already_exists: bool,
primary_key: Option<&str>,
) -> Option<(BatchKind, bool)> {
let mut enqueued = enqueued.into_iter();
let (id, kind) = enqueued.next()?;
// `index_exist` keeps track of whether the index should exist at this point, after the tasks we have batched so far.
let mut index_exist = index_already_exists;
let (mut acc, must_create_index) = match BatchKind::new(id, kind, primary_key) {
(Continue(acc), create) => (acc, create),
(Break(acc), create) => return Some((acc, create)),
};
// if an index has been created in the previous step we can consider it as existing.
index_exist |= must_create_index;
for (id, kind) in enqueued {
acc = match acc.accumulate(id, kind.into(), index_exist, primary_key) {
Continue(acc) => acc,
Break(acc) => return Some((acc, must_create_index)),
};
}
Some((acc, must_create_index))
}
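The loop in `autobatch` folds tasks into an accumulator and stops at the first `Break`. Here is a minimal sketch of that pattern, assuming hypothetical reduced `Task` and `Batch` types and a hypothetical `batch_prefix` function; the real code also tracks index creation and the primary key, which this sketch leaves out.

```rust
use std::ops::ControlFlow::{self, Break, Continue};

/// Hypothetical, reduced task kinds (illustration only).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Task {
    Deletion,
    Settings,
}

/// Hypothetical, reduced batch: only homogeneous batches are modelled.
#[derive(Debug, PartialEq)]
enum Batch {
    Deletion(Vec<u32>),
    Settings(Vec<u32>),
}

impl Batch {
    fn new(id: u32, task: Task) -> Batch {
        match task {
            Task::Deletion => Batch::Deletion(vec![id]),
            Task::Settings => Batch::Settings(vec![id]),
        }
    }

    /// `Continue` with the extended batch, or `Break` with the batch as it
    /// was when the incoming task cannot join it.
    fn accumulate(self, id: u32, task: Task) -> ControlFlow<Batch, Batch> {
        match (self, task) {
            (Batch::Deletion(mut ids), Task::Deletion) => {
                ids.push(id);
                Continue(Batch::Deletion(ids))
            }
            (Batch::Settings(mut ids), Task::Settings) => {
                ids.push(id);
                Continue(Batch::Settings(ids))
            }
            (this, _) => Break(this),
        }
    }
}

/// Batches the longest compatible prefix of `tasks`; `None` when the list is empty.
fn batch_prefix(tasks: Vec<(u32, Task)>) -> Option<Batch> {
    let mut tasks = tasks.into_iter();
    let (id, task) = tasks.next()?;
    let mut acc = Batch::new(id, task);
    for (id, task) in tasks {
        acc = match acc.accumulate(id, task) {
            Continue(acc) => acc,
            Break(acc) => return Some(acc),
        };
    }
    Some(acc)
}
```

Because `accumulate` takes `self` by value and hands the batch back inside either variant, the accumulator is never cloned, and tasks after the first incompatible one are simply left for a later batch.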
#[cfg(test)]
mod tests {
use meilisearch_types::tasks::IndexSwap;
use uuid::Uuid;
use super::*;
use crate::debug_snapshot;
fn autobatch_from(
index_already_exists: bool,
primary_key: Option<&str>,
input: impl IntoIterator<Item = KindWithContent>,
) -> Option<(BatchKind, bool)> {
autobatch(
input.into_iter().enumerate().map(|(id, kind)| (id as TaskId, kind)).collect(),
index_already_exists,
primary_key,
)
}
fn doc_imp(
method: IndexDocumentsMethod,
allow_index_creation: bool,
primary_key: Option<&str>,
) -> KindWithContent {
KindWithContent::DocumentAdditionOrUpdate {
index_uid: String::from("doggo"),
primary_key: primary_key.map(|pk| pk.to_string()),
method,
content_file: Uuid::new_v4(),
documents_count: 0,
allow_index_creation,
}
}
fn doc_del() -> KindWithContent {
KindWithContent::DocumentDeletion {
index_uid: String::from("doggo"),
documents_ids: Vec::new(),
}
}
fn doc_del_fil() -> KindWithContent {
KindWithContent::DocumentDeletionByFilter {
index_uid: String::from("doggo"),
filter_expr: serde_json::json!("cuteness > 100"),
}
}
fn doc_clr() -> KindWithContent {
KindWithContent::DocumentClear { index_uid: String::from("doggo") }
}
fn settings(allow_index_creation: bool) -> KindWithContent {
KindWithContent::SettingsUpdate {
index_uid: String::from("doggo"),
new_settings: Default::default(),
is_deletion: false,
allow_index_creation,
}
}
fn idx_create() -> KindWithContent {
KindWithContent::IndexCreation { index_uid: String::from("doggo"), primary_key: None }
}
fn idx_update() -> KindWithContent {
KindWithContent::IndexUpdate { index_uid: String::from("doggo"), primary_key: None }
}
fn idx_del() -> KindWithContent {
KindWithContent::IndexDeletion { index_uid: String::from("doggo") }
}
fn idx_swap() -> KindWithContent {
KindWithContent::IndexSwap {
swaps: vec![IndexSwap { indexes: (String::from("doggo"), String::from("catto")) }],
}
}
#[test]
fn autobatch_simple_operation_together() {
// we can autobatch one or multiple `ReplaceDocuments` together.
// if the index exists.
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, None)]), @"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, false, None)]), @"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: false, primary_key: None, operation_ids: [0] }, false))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, None), doc_imp( ReplaceDocuments, true , None), doc_imp(ReplaceDocuments, true , None)]), @"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0, 1, 2] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, false, None), doc_imp( ReplaceDocuments, false , None), doc_imp(ReplaceDocuments, false , None)]), @"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: false, primary_key: None, operation_ids: [0, 1, 2] }, false))");
// if it doesn't exist.
debug_snapshot!(autobatch_from(false,None, [doc_imp(ReplaceDocuments, true, None)]), @"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0] }, true))");
debug_snapshot!(autobatch_from(false,None, [doc_imp(ReplaceDocuments, false, None)]), @"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: false, primary_key: None, operation_ids: [0] }, false))");
debug_snapshot!(autobatch_from(false,None, [doc_imp(ReplaceDocuments, true, None), doc_imp( ReplaceDocuments, true , None), doc_imp(ReplaceDocuments, true , None)]), @"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0, 1, 2] }, true))");
debug_snapshot!(autobatch_from(false,None, [doc_imp(ReplaceDocuments, false, None), doc_imp( ReplaceDocuments, true , None), doc_imp(ReplaceDocuments, true , None)]), @"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: false, primary_key: None, operation_ids: [0] }, false))");
// we can autobatch one or multiple `UpdateDocuments` together.
// if the index exists.
debug_snapshot!(autobatch_from(true, None, [doc_imp(UpdateDocuments, true, None)]), @"Some((DocumentOperation { method: UpdateDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(UpdateDocuments, true, None), doc_imp(UpdateDocuments, true, None), doc_imp(UpdateDocuments, true, None)]), @"Some((DocumentOperation { method: UpdateDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0, 1, 2] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(UpdateDocuments, false, None)]), @"Some((DocumentOperation { method: UpdateDocuments, allow_index_creation: false, primary_key: None, operation_ids: [0] }, false))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(UpdateDocuments, false, None), doc_imp(UpdateDocuments, false, None), doc_imp(UpdateDocuments, false, None)]), @"Some((DocumentOperation { method: UpdateDocuments, allow_index_creation: false, primary_key: None, operation_ids: [0, 1, 2] }, false))");
// if it doesn't exist.
debug_snapshot!(autobatch_from(false,None, [doc_imp(UpdateDocuments, true, None)]), @"Some((DocumentOperation { method: UpdateDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0] }, true))");
debug_snapshot!(autobatch_from(false,None, [doc_imp(UpdateDocuments, true, None), doc_imp(UpdateDocuments, true, None), doc_imp(UpdateDocuments, true, None)]), @"Some((DocumentOperation { method: UpdateDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0, 1, 2] }, true))");
debug_snapshot!(autobatch_from(false,None, [doc_imp(UpdateDocuments, false, None)]), @"Some((DocumentOperation { method: UpdateDocuments, allow_index_creation: false, primary_key: None, operation_ids: [0] }, false))");
debug_snapshot!(autobatch_from(false,None, [doc_imp(UpdateDocuments, false, None), doc_imp(UpdateDocuments, false, None), doc_imp(UpdateDocuments, false, None)]), @"Some((DocumentOperation { method: UpdateDocuments, allow_index_creation: false, primary_key: None, operation_ids: [0, 1, 2] }, false))");
// we can autobatch one or multiple DocumentDeletion together
debug_snapshot!(autobatch_from(true, None, [doc_del()]), @"Some((DocumentDeletion { deletion_ids: [0], includes_by_filter: false }, false))");
debug_snapshot!(autobatch_from(true, None, [doc_del(), doc_del(), doc_del()]), @"Some((DocumentDeletion { deletion_ids: [0, 1, 2], includes_by_filter: false }, false))");
debug_snapshot!(autobatch_from(false,None, [doc_del()]), @"Some((DocumentDeletion { deletion_ids: [0], includes_by_filter: false }, false))");
debug_snapshot!(autobatch_from(false,None, [doc_del(), doc_del(), doc_del()]), @"Some((DocumentDeletion { deletion_ids: [0, 1, 2], includes_by_filter: false }, false))");
// we can autobatch one or multiple DocumentDeletionByFilter together
debug_snapshot!(autobatch_from(true, None, [doc_del_fil()]), @"Some((DocumentDeletion { deletion_ids: [0], includes_by_filter: true }, false))");
debug_snapshot!(autobatch_from(true, None, [doc_del_fil(), doc_del_fil(), doc_del_fil()]), @"Some((DocumentDeletion { deletion_ids: [0, 1, 2], includes_by_filter: true }, false))");
debug_snapshot!(autobatch_from(false,None, [doc_del_fil()]), @"Some((DocumentDeletion { deletion_ids: [0], includes_by_filter: true }, false))");
debug_snapshot!(autobatch_from(false,None, [doc_del_fil(), doc_del_fil(), doc_del_fil()]), @"Some((DocumentDeletion { deletion_ids: [0, 1, 2], includes_by_filter: true }, false))");
// we can autobatch one or multiple Settings together
debug_snapshot!(autobatch_from(true, None, [settings(true)]), @"Some((Settings { allow_index_creation: true, settings_ids: [0] }, true))");
debug_snapshot!(autobatch_from(true, None, [settings(true), settings(true), settings(true)]), @"Some((Settings { allow_index_creation: true, settings_ids: [0, 1, 2] }, true))");
debug_snapshot!(autobatch_from(true, None, [settings(false)]), @"Some((Settings { allow_index_creation: false, settings_ids: [0] }, false))");
debug_snapshot!(autobatch_from(true, None, [settings(false), settings(false), settings(false)]), @"Some((Settings { allow_index_creation: false, settings_ids: [0, 1, 2] }, false))");
debug_snapshot!(autobatch_from(false,None, [settings(true)]), @"Some((Settings { allow_index_creation: true, settings_ids: [0] }, true))");
debug_snapshot!(autobatch_from(false,None, [settings(true), settings(true), settings(true)]), @"Some((Settings { allow_index_creation: true, settings_ids: [0, 1, 2] }, true))");
debug_snapshot!(autobatch_from(false,None, [settings(false)]), @"Some((Settings { allow_index_creation: false, settings_ids: [0] }, false))");
debug_snapshot!(autobatch_from(false,None, [settings(false), settings(false), settings(false)]), @"Some((Settings { allow_index_creation: false, settings_ids: [0, 1, 2] }, false))");
// We can autobatch document addition with document deletion
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, None), doc_del()]), @"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0, 1] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(UpdateDocuments, true, None), doc_del()]), @"Some((DocumentOperation { method: UpdateDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0, 1] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, false, None), doc_del()]), @"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: false, primary_key: None, operation_ids: [0, 1] }, false))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(UpdateDocuments, false, None), doc_del()]), @"Some((DocumentOperation { method: UpdateDocuments, allow_index_creation: false, primary_key: None, operation_ids: [0, 1] }, false))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, Some("catto")), doc_del()]), @r###"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: true, primary_key: Some("catto"), operation_ids: [0, 1] }, true))"###);
debug_snapshot!(autobatch_from(true, None, [doc_imp(UpdateDocuments, true, Some("catto")), doc_del()]), @r###"Some((DocumentOperation { method: UpdateDocuments, allow_index_creation: true, primary_key: Some("catto"), operation_ids: [0, 1] }, true))"###);
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, false, Some("catto")), doc_del()]), @r###"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: false, primary_key: Some("catto"), operation_ids: [0, 1] }, false))"###);
debug_snapshot!(autobatch_from(true, None, [doc_imp(UpdateDocuments, false, Some("catto")), doc_del()]), @r###"Some((DocumentOperation { method: UpdateDocuments, allow_index_creation: false, primary_key: Some("catto"), operation_ids: [0, 1] }, false))"###);
debug_snapshot!(autobatch_from(false, None, [doc_imp(ReplaceDocuments, true, None), doc_del()]), @"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0, 1] }, true))");
debug_snapshot!(autobatch_from(false, None, [doc_imp(UpdateDocuments, true, None), doc_del()]), @"Some((DocumentOperation { method: UpdateDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0, 1] }, true))");
debug_snapshot!(autobatch_from(false, None, [doc_imp(ReplaceDocuments, false, None), doc_del()]), @"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: false, primary_key: None, operation_ids: [0, 1] }, false))");
debug_snapshot!(autobatch_from(false, None, [doc_imp(UpdateDocuments, false, None), doc_del()]), @"Some((DocumentOperation { method: UpdateDocuments, allow_index_creation: false, primary_key: None, operation_ids: [0, 1] }, false))");
debug_snapshot!(autobatch_from(false, None, [doc_imp(ReplaceDocuments, true, Some("catto")), doc_del()]), @r###"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: true, primary_key: Some("catto"), operation_ids: [0, 1] }, true))"###);
debug_snapshot!(autobatch_from(false, None, [doc_imp(UpdateDocuments, true, Some("catto")), doc_del()]), @r###"Some((DocumentOperation { method: UpdateDocuments, allow_index_creation: true, primary_key: Some("catto"), operation_ids: [0, 1] }, true))"###);
debug_snapshot!(autobatch_from(false, None, [doc_imp(ReplaceDocuments, false, Some("catto")), doc_del()]), @r###"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: false, primary_key: Some("catto"), operation_ids: [0, 1] }, false))"###);
debug_snapshot!(autobatch_from(false, None, [doc_imp(UpdateDocuments, false, Some("catto")), doc_del()]), @r###"Some((DocumentOperation { method: UpdateDocuments, allow_index_creation: false, primary_key: Some("catto"), operation_ids: [0, 1] }, false))"###);
// And the other way around
debug_snapshot!(autobatch_from(true, None, [doc_del(), doc_imp(ReplaceDocuments, true, None)]), @"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0, 1] }, false))");
debug_snapshot!(autobatch_from(true, None, [doc_del(), doc_imp(UpdateDocuments, true, None)]), @"Some((DocumentOperation { method: UpdateDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0, 1] }, false))");
debug_snapshot!(autobatch_from(true, None, [doc_del(), doc_imp(ReplaceDocuments, false, None)]), @"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: false, primary_key: None, operation_ids: [0, 1] }, false))");
debug_snapshot!(autobatch_from(true, None, [doc_del(), doc_imp(UpdateDocuments, false, None)]), @"Some((DocumentOperation { method: UpdateDocuments, allow_index_creation: false, primary_key: None, operation_ids: [0, 1] }, false))");
debug_snapshot!(autobatch_from(true, None, [doc_del(), doc_imp(ReplaceDocuments, true, Some("catto"))]), @r###"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: true, primary_key: Some("catto"), operation_ids: [0, 1] }, false))"###);
debug_snapshot!(autobatch_from(true, None, [doc_del(), doc_imp(UpdateDocuments, true, Some("catto"))]), @r###"Some((DocumentOperation { method: UpdateDocuments, allow_index_creation: true, primary_key: Some("catto"), operation_ids: [0, 1] }, false))"###);
debug_snapshot!(autobatch_from(true, None, [doc_del(), doc_imp(ReplaceDocuments, false, Some("catto"))]), @r###"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: false, primary_key: Some("catto"), operation_ids: [0, 1] }, false))"###);
debug_snapshot!(autobatch_from(true, None, [doc_del(), doc_imp(UpdateDocuments, false, Some("catto"))]), @r###"Some((DocumentOperation { method: UpdateDocuments, allow_index_creation: false, primary_key: Some("catto"), operation_ids: [0, 1] }, false))"###);
debug_snapshot!(autobatch_from(false, None, [doc_del(), doc_imp(ReplaceDocuments, false, None)]), @"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: false, primary_key: None, operation_ids: [0, 1] }, false))");
debug_snapshot!(autobatch_from(false, None, [doc_del(), doc_imp(UpdateDocuments, false, None)]), @"Some((DocumentOperation { method: UpdateDocuments, allow_index_creation: false, primary_key: None, operation_ids: [0, 1] }, false))");
debug_snapshot!(autobatch_from(false, None, [doc_del(), doc_imp(ReplaceDocuments, false, Some("catto"))]), @r###"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: false, primary_key: Some("catto"), operation_ids: [0, 1] }, false))"###);
debug_snapshot!(autobatch_from(false, None, [doc_del(), doc_imp(UpdateDocuments, false, Some("catto"))]), @r###"Some((DocumentOperation { method: UpdateDocuments, allow_index_creation: false, primary_key: Some("catto"), operation_ids: [0, 1] }, false))"###);
// But we can't autobatch document addition with document deletion by filter
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, None), doc_del_fil()]), @"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(UpdateDocuments, true, None), doc_del_fil()]), @"Some((DocumentOperation { method: UpdateDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, false, None), doc_del_fil()]), @"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: false, primary_key: None, operation_ids: [0] }, false))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(UpdateDocuments, false, None), doc_del_fil()]), @"Some((DocumentOperation { method: UpdateDocuments, allow_index_creation: false, primary_key: None, operation_ids: [0] }, false))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, Some("catto")), doc_del_fil()]), @r###"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: true, primary_key: Some("catto"), operation_ids: [0] }, true))"###);
debug_snapshot!(autobatch_from(true, None, [doc_imp(UpdateDocuments, true, Some("catto")), doc_del_fil()]), @r###"Some((DocumentOperation { method: UpdateDocuments, allow_index_creation: true, primary_key: Some("catto"), operation_ids: [0] }, true))"###);
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, false, Some("catto")), doc_del_fil()]), @r###"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: false, primary_key: Some("catto"), operation_ids: [0] }, false))"###);
debug_snapshot!(autobatch_from(true, None, [doc_imp(UpdateDocuments, false, Some("catto")), doc_del_fil()]), @r###"Some((DocumentOperation { method: UpdateDocuments, allow_index_creation: false, primary_key: Some("catto"), operation_ids: [0] }, false))"###);
debug_snapshot!(autobatch_from(false, None, [doc_imp(ReplaceDocuments, true, None), doc_del_fil()]), @"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0] }, true))");
debug_snapshot!(autobatch_from(false, None, [doc_imp(UpdateDocuments, true, None), doc_del_fil()]), @"Some((DocumentOperation { method: UpdateDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0] }, true))");
debug_snapshot!(autobatch_from(false, None, [doc_imp(ReplaceDocuments, false, None), doc_del_fil()]), @"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: false, primary_key: None, operation_ids: [0] }, false))");
debug_snapshot!(autobatch_from(false, None, [doc_imp(UpdateDocuments, false, None), doc_del_fil()]), @"Some((DocumentOperation { method: UpdateDocuments, allow_index_creation: false, primary_key: None, operation_ids: [0] }, false))");
debug_snapshot!(autobatch_from(false, None, [doc_imp(ReplaceDocuments, true, Some("catto")), doc_del_fil()]), @r###"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: true, primary_key: Some("catto"), operation_ids: [0] }, true))"###);
debug_snapshot!(autobatch_from(false, None, [doc_imp(UpdateDocuments, true, Some("catto")), doc_del_fil()]), @r###"Some((DocumentOperation { method: UpdateDocuments, allow_index_creation: true, primary_key: Some("catto"), operation_ids: [0] }, true))"###);
debug_snapshot!(autobatch_from(false, None, [doc_imp(ReplaceDocuments, false, Some("catto")), doc_del_fil()]), @r###"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: false, primary_key: Some("catto"), operation_ids: [0] }, false))"###);
debug_snapshot!(autobatch_from(false, None, [doc_imp(UpdateDocuments, false, Some("catto")), doc_del_fil()]), @r###"Some((DocumentOperation { method: UpdateDocuments, allow_index_creation: false, primary_key: Some("catto"), operation_ids: [0] }, false))"###);
// And the other way around
debug_snapshot!(autobatch_from(true, None, [doc_del_fil(), doc_imp(ReplaceDocuments, true, None)]), @"Some((DocumentDeletion { deletion_ids: [0], includes_by_filter: true }, false))");
debug_snapshot!(autobatch_from(true, None, [doc_del_fil(), doc_imp(UpdateDocuments, true, None)]), @"Some((DocumentDeletion { deletion_ids: [0], includes_by_filter: true }, false))");
debug_snapshot!(autobatch_from(true, None, [doc_del_fil(), doc_imp(ReplaceDocuments, false, None)]), @"Some((DocumentDeletion { deletion_ids: [0], includes_by_filter: true }, false))");
debug_snapshot!(autobatch_from(true, None, [doc_del_fil(), doc_imp(UpdateDocuments, false, None)]), @"Some((DocumentDeletion { deletion_ids: [0], includes_by_filter: true }, false))");
debug_snapshot!(autobatch_from(true, None, [doc_del_fil(), doc_imp(ReplaceDocuments, true, Some("catto"))]), @"Some((DocumentDeletion { deletion_ids: [0], includes_by_filter: true }, false))");
debug_snapshot!(autobatch_from(true, None, [doc_del_fil(), doc_imp(UpdateDocuments, true, Some("catto"))]), @"Some((DocumentDeletion { deletion_ids: [0], includes_by_filter: true }, false))");
debug_snapshot!(autobatch_from(true, None, [doc_del_fil(), doc_imp(ReplaceDocuments, false, Some("catto"))]), @"Some((DocumentDeletion { deletion_ids: [0], includes_by_filter: true }, false))");
debug_snapshot!(autobatch_from(true, None, [doc_del_fil(), doc_imp(UpdateDocuments, false, Some("catto"))]), @"Some((DocumentDeletion { deletion_ids: [0], includes_by_filter: true }, false))");
debug_snapshot!(autobatch_from(false, None, [doc_del_fil(), doc_imp(ReplaceDocuments, false, None)]), @"Some((DocumentDeletion { deletion_ids: [0], includes_by_filter: true }, false))");
debug_snapshot!(autobatch_from(false, None, [doc_del_fil(), doc_imp(UpdateDocuments, false, None)]), @"Some((DocumentDeletion { deletion_ids: [0], includes_by_filter: true }, false))");
debug_snapshot!(autobatch_from(false, None, [doc_del_fil(), doc_imp(ReplaceDocuments, false, Some("catto"))]), @"Some((DocumentDeletion { deletion_ids: [0], includes_by_filter: true }, false))");
debug_snapshot!(autobatch_from(false, None, [doc_del_fil(), doc_imp(UpdateDocuments, false, Some("catto"))]), @"Some((DocumentDeletion { deletion_ids: [0], includes_by_filter: true }, false))");
}
#[test]
fn simple_document_operation_dont_autobatch_with_other() {
// addition, updates and deletion by filter can't batch together
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, None), doc_imp(UpdateDocuments, true, None)]), @"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(UpdateDocuments, true, None), doc_imp(ReplaceDocuments, true, None)]), @"Some((DocumentOperation { method: UpdateDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(UpdateDocuments, true, None), doc_del_fil()]), @"Some((DocumentOperation { method: UpdateDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, None), doc_del_fil()]), @"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_del_fil(), doc_imp(UpdateDocuments, true, None)]), @"Some((DocumentDeletion { deletion_ids: [0], includes_by_filter: true }, false))");
debug_snapshot!(autobatch_from(true, None, [doc_del_fil(), doc_imp(ReplaceDocuments, true, None)]), @"Some((DocumentDeletion { deletion_ids: [0], includes_by_filter: true }, false))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, None), idx_create()]), @"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(UpdateDocuments, true, None), idx_create()]), @"Some((DocumentOperation { method: UpdateDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_del(), idx_create()]), @"Some((DocumentDeletion { deletion_ids: [0], includes_by_filter: false }, false))");
debug_snapshot!(autobatch_from(true, None, [doc_del_fil(), idx_create()]), @"Some((DocumentDeletion { deletion_ids: [0], includes_by_filter: true }, false))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, None), idx_update()]), @"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(UpdateDocuments, true, None), idx_update()]), @"Some((DocumentOperation { method: UpdateDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_del(), idx_update()]), @"Some((DocumentDeletion { deletion_ids: [0], includes_by_filter: false }, false))");
debug_snapshot!(autobatch_from(true, None, [doc_del_fil(), idx_update()]), @"Some((DocumentDeletion { deletion_ids: [0], includes_by_filter: true }, false))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, None), idx_swap()]), @"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(UpdateDocuments, true, None), idx_swap()]), @"Some((DocumentOperation { method: UpdateDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_del(), idx_swap()]), @"Some((DocumentDeletion { deletion_ids: [0], includes_by_filter: false }, false))");
debug_snapshot!(autobatch_from(true, None, [doc_del_fil(), idx_swap()]), @"Some((DocumentDeletion { deletion_ids: [0], includes_by_filter: true }, false))");
}
#[test]
fn document_addition_doesnt_batch_with_settings() {
// simple case
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, None), settings(true)]), @"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(UpdateDocuments, true, None), settings(true)]), @"Some((DocumentOperation { method: UpdateDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0] }, true))");
// multiple settings and doc addition
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, None), doc_imp(ReplaceDocuments, true, None), settings(true), settings(true)]), @"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0, 1] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, None), doc_imp(ReplaceDocuments, true, None), settings(true), settings(true)]), @"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0, 1] }, true))");
// addition and setting unordered
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, None), settings(true), doc_imp(ReplaceDocuments, true, None), settings(true)]), @"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(UpdateDocuments, true, None), settings(true), doc_imp(UpdateDocuments, true, None), settings(true)]), @"Some((DocumentOperation { method: UpdateDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0] }, true))");
// Doesn't batch with other forbidden operations
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, None), settings(true), doc_imp(UpdateDocuments, true, None)]), @"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(UpdateDocuments, true, None), settings(true), doc_imp(ReplaceDocuments, true, None)]), @"Some((DocumentOperation { method: UpdateDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, None), settings(true), doc_del()]), @"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(UpdateDocuments, true, None), settings(true), doc_del()]), @"Some((DocumentOperation { method: UpdateDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, None), settings(true), idx_create()]), @"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(UpdateDocuments, true, None), settings(true), idx_create()]), @"Some((DocumentOperation { method: UpdateDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, None), settings(true), idx_update()]), @"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(UpdateDocuments, true, None), settings(true), idx_update()]), @"Some((DocumentOperation { method: UpdateDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, None), settings(true), idx_swap()]), @"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(UpdateDocuments, true, None), settings(true), idx_swap()]), @"Some((DocumentOperation { method: UpdateDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0] }, true))");
}
#[test]
fn clear_and_additions() {
// these two don't need to batch
debug_snapshot!(autobatch_from(true, None, [doc_clr(), doc_imp(ReplaceDocuments, true, None)]), @"Some((DocumentClear { ids: [0] }, false))");
debug_snapshot!(autobatch_from(true, None, [doc_clr(), doc_imp(UpdateDocuments, true, None)]), @"Some((DocumentClear { ids: [0] }, false))");
// Basic use case
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, None), doc_imp(ReplaceDocuments, true, None), doc_clr()]), @"Some((DocumentClear { ids: [0, 1, 2] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(UpdateDocuments, true, None), doc_imp(UpdateDocuments, true, None), doc_clr()]), @"Some((DocumentClear { ids: [0, 1, 2] }, true))");
// This batch kind doesn't mix with other document addition
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, None), doc_imp(ReplaceDocuments, true, None), doc_clr(), doc_imp(ReplaceDocuments, true, None)]), @"Some((DocumentClear { ids: [0, 1, 2] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(UpdateDocuments, true, None), doc_imp(UpdateDocuments, true, None), doc_clr(), doc_imp(UpdateDocuments, true, None)]), @"Some((DocumentClear { ids: [0, 1, 2] }, true))");
// But you can batch multiple clear together
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, None), doc_imp(ReplaceDocuments, true, None), doc_clr(), doc_clr(), doc_clr()]), @"Some((DocumentClear { ids: [0, 1, 2, 3, 4] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(UpdateDocuments, true, None), doc_imp(UpdateDocuments, true, None), doc_clr(), doc_clr(), doc_clr()]), @"Some((DocumentClear { ids: [0, 1, 2, 3, 4] }, true))");
}
#[test]
fn clear_and_additions_and_settings() {
// A clear doesn't need to autobatch the settings that happen AFTER it, since there are no documents left
debug_snapshot!(autobatch_from(true, None, [doc_clr(), settings(true)]), @"Some((DocumentClear { ids: [0] }, false))");
debug_snapshot!(autobatch_from(true, None, [settings(true), doc_clr(), settings(true)]), @"Some((ClearAndSettings { other: [1], allow_index_creation: true, settings_ids: [0, 2] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, None), settings(true), doc_clr()]), @"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(UpdateDocuments, true, None), settings(true), doc_clr()]), @"Some((DocumentOperation { method: UpdateDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0] }, true))");
}
#[test]
fn anything_and_index_deletion() {
// The `IndexDeletion` doesn't batch with anything that happens AFTER.
debug_snapshot!(autobatch_from(true, None, [idx_del(), doc_imp(ReplaceDocuments, true, None)]), @"Some((IndexDeletion { ids: [0] }, false))");
debug_snapshot!(autobatch_from(true, None, [idx_del(), doc_imp(UpdateDocuments, true, None)]), @"Some((IndexDeletion { ids: [0] }, false))");
debug_snapshot!(autobatch_from(true, None, [idx_del(), doc_imp(ReplaceDocuments, false, None)]), @"Some((IndexDeletion { ids: [0] }, false))");
debug_snapshot!(autobatch_from(true, None, [idx_del(), doc_imp(UpdateDocuments, false, None)]), @"Some((IndexDeletion { ids: [0] }, false))");
debug_snapshot!(autobatch_from(true, None, [idx_del(), doc_del()]), @"Some((IndexDeletion { ids: [0] }, false))");
debug_snapshot!(autobatch_from(true, None, [idx_del(), doc_del_fil()]), @"Some((IndexDeletion { ids: [0] }, false))");
debug_snapshot!(autobatch_from(true, None, [idx_del(), doc_clr()]), @"Some((IndexDeletion { ids: [0] }, false))");
debug_snapshot!(autobatch_from(true, None, [idx_del(), settings(true)]), @"Some((IndexDeletion { ids: [0] }, false))");
debug_snapshot!(autobatch_from(true, None, [idx_del(), settings(false)]), @"Some((IndexDeletion { ids: [0] }, false))");
debug_snapshot!(autobatch_from(false,None, [idx_del(), doc_imp(ReplaceDocuments, true, None)]), @"Some((IndexDeletion { ids: [0] }, false))");
debug_snapshot!(autobatch_from(false,None, [idx_del(), doc_imp(UpdateDocuments, true, None)]), @"Some((IndexDeletion { ids: [0] }, false))");
debug_snapshot!(autobatch_from(false,None, [idx_del(), doc_imp(ReplaceDocuments, false, None)]), @"Some((IndexDeletion { ids: [0] }, false))");
debug_snapshot!(autobatch_from(false,None, [idx_del(), doc_imp(UpdateDocuments, false, None)]), @"Some((IndexDeletion { ids: [0] }, false))");
debug_snapshot!(autobatch_from(false,None, [idx_del(), doc_del()]), @"Some((IndexDeletion { ids: [0] }, false))");
debug_snapshot!(autobatch_from(false,None, [idx_del(), doc_del_fil()]), @"Some((IndexDeletion { ids: [0] }, false))");
debug_snapshot!(autobatch_from(false,None, [idx_del(), doc_clr()]), @"Some((IndexDeletion { ids: [0] }, false))");
debug_snapshot!(autobatch_from(false,None, [idx_del(), settings(true)]), @"Some((IndexDeletion { ids: [0] }, false))");
debug_snapshot!(autobatch_from(false,None, [idx_del(), settings(false)]), @"Some((IndexDeletion { ids: [0] }, false))");
// The index deletion can accept almost any type of `BatchKind` and transform it to an `IndexDeletion`.
// First, the basic cases
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, None), idx_del()]), @"Some((IndexDeletion { ids: [0, 1] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(UpdateDocuments, true, None), idx_del()]), @"Some((IndexDeletion { ids: [0, 1] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, false, None), idx_del()]), @"Some((IndexDeletion { ids: [0, 1] }, false))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(UpdateDocuments, false, None), idx_del()]), @"Some((IndexDeletion { ids: [0, 1] }, false))");
debug_snapshot!(autobatch_from(true, None, [doc_del(), idx_del()]), @"Some((IndexDeletion { ids: [0, 1] }, false))");
debug_snapshot!(autobatch_from(true, None, [doc_del_fil(), idx_del()]), @"Some((IndexDeletion { ids: [0, 1] }, false))");
debug_snapshot!(autobatch_from(true, None, [doc_clr(), idx_del()]), @"Some((IndexDeletion { ids: [0, 1] }, false))");
debug_snapshot!(autobatch_from(true, None, [settings(true), idx_del()]), @"Some((IndexDeletion { ids: [0, 1] }, true))");
debug_snapshot!(autobatch_from(true, None, [settings(false), idx_del()]), @"Some((IndexDeletion { ids: [0, 1] }, false))");
debug_snapshot!(autobatch_from(false,None, [doc_imp(ReplaceDocuments, true, None), idx_del()]), @"Some((IndexDeletion { ids: [0, 1] }, true))");
debug_snapshot!(autobatch_from(false,None, [doc_imp(UpdateDocuments, true, None), idx_del()]), @"Some((IndexDeletion { ids: [0, 1] }, true))");
debug_snapshot!(autobatch_from(false,None, [doc_imp(ReplaceDocuments, false, None), idx_del()]), @"Some((IndexDeletion { ids: [0, 1] }, false))");
debug_snapshot!(autobatch_from(false,None, [doc_imp(UpdateDocuments, false, None), idx_del()]), @"Some((IndexDeletion { ids: [0, 1] }, false))");
debug_snapshot!(autobatch_from(false,None, [doc_del(), idx_del()]), @"Some((IndexDeletion { ids: [0, 1] }, false))");
debug_snapshot!(autobatch_from(false,None, [doc_del_fil(), idx_del()]), @"Some((IndexDeletion { ids: [0, 1] }, false))");
debug_snapshot!(autobatch_from(false,None, [doc_clr(), idx_del()]), @"Some((IndexDeletion { ids: [0, 1] }, false))");
debug_snapshot!(autobatch_from(false,None, [settings(true), idx_del()]), @"Some((IndexDeletion { ids: [0, 1] }, true))");
debug_snapshot!(autobatch_from(false,None, [settings(false), idx_del()]), @"Some((IndexDeletion { ids: [0, 1] }, false))");
}
#[test]
fn allowed_and_disallowed_index_creation() {
// A `DocumentImport` that may create the index can't be batched with one that may not, unless the index already exists.
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, false, None), doc_imp(ReplaceDocuments, true, None)]), @"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: false, primary_key: None, operation_ids: [0, 1] }, false))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, None), doc_imp(ReplaceDocuments, true, None)]), @"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0, 1] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, false, None), doc_imp(ReplaceDocuments, false, None)]), @"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: false, primary_key: None, operation_ids: [0, 1] }, false))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, None), settings(true)]), @"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, false, None), settings(true)]), @"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: false, primary_key: None, operation_ids: [0] }, false))");
debug_snapshot!(autobatch_from(false,None, [doc_imp(ReplaceDocuments, false, None), doc_imp(ReplaceDocuments, true, None)]), @"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: false, primary_key: None, operation_ids: [0] }, false))");
debug_snapshot!(autobatch_from(false,None, [doc_imp(ReplaceDocuments, true, None), doc_imp(ReplaceDocuments, true, None)]), @"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0, 1] }, true))");
debug_snapshot!(autobatch_from(false,None, [doc_imp(ReplaceDocuments, false, None), doc_imp(ReplaceDocuments, false, None)]), @"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: false, primary_key: None, operation_ids: [0, 1] }, false))");
debug_snapshot!(autobatch_from(false,None, [doc_imp(ReplaceDocuments, true, None), settings(true)]), @"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0] }, true))");
debug_snapshot!(autobatch_from(false,None, [doc_imp(ReplaceDocuments, false, None), settings(true)]), @"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: false, primary_key: None, operation_ids: [0] }, false))");
// batch deletion and addition
debug_snapshot!(autobatch_from(false, None, [doc_del(), doc_imp(ReplaceDocuments, true, Some("catto"))]), @"Some((DocumentDeletion { deletion_ids: [0], includes_by_filter: false }, false))");
debug_snapshot!(autobatch_from(false, None, [doc_del(), doc_imp(UpdateDocuments, true, Some("catto"))]), @"Some((DocumentDeletion { deletion_ids: [0], includes_by_filter: false }, false))");
debug_snapshot!(autobatch_from(false, None, [doc_del(), doc_imp(ReplaceDocuments, true, None)]), @"Some((DocumentDeletion { deletion_ids: [0], includes_by_filter: false }, false))");
debug_snapshot!(autobatch_from(false, None, [doc_del(), doc_imp(UpdateDocuments, true, None)]), @"Some((DocumentDeletion { deletion_ids: [0], includes_by_filter: false }, false))");
}
#[test]
fn autobatch_primary_key() {
// ==> If I have a pk
// With a single update
debug_snapshot!(autobatch_from(true, Some("id"), [doc_imp(ReplaceDocuments, true, None)]), @"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0] }, true))");
debug_snapshot!(autobatch_from(true, Some("id"), [doc_imp(ReplaceDocuments, true, Some("id"))]), @r###"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: true, primary_key: Some("id"), operation_ids: [0] }, true))"###);
debug_snapshot!(autobatch_from(true, Some("id"), [doc_imp(ReplaceDocuments, true, Some("other"))]), @r###"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: true, primary_key: Some("other"), operation_ids: [0] }, true))"###);
// With multiple updates
debug_snapshot!(autobatch_from(true, Some("id"), [doc_imp(ReplaceDocuments, true, None), doc_imp(ReplaceDocuments, true, None)]), @"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0, 1] }, true))");
debug_snapshot!(autobatch_from(true, Some("id"), [doc_imp(ReplaceDocuments, true, None), doc_imp(ReplaceDocuments, true, Some("id"))]), @r###"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: true, primary_key: Some("id"), operation_ids: [0, 1] }, true))"###);
debug_snapshot!(autobatch_from(true, Some("id"), [doc_imp(ReplaceDocuments, true, None), doc_imp(ReplaceDocuments, true, Some("id")), doc_imp(ReplaceDocuments, true, None)]), @r###"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: true, primary_key: Some("id"), operation_ids: [0, 1] }, true))"###);
debug_snapshot!(autobatch_from(true, Some("id"), [doc_imp(ReplaceDocuments, true, None), doc_imp(ReplaceDocuments, true, Some("other"))]), @"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0] }, true))");
debug_snapshot!(autobatch_from(true, Some("id"), [doc_imp(ReplaceDocuments, true, None), doc_imp(ReplaceDocuments, true, Some("other")), doc_imp(ReplaceDocuments, true, None)]), @"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0] }, true))");
debug_snapshot!(autobatch_from(true, Some("id"), [doc_imp(ReplaceDocuments, true, None), doc_imp(ReplaceDocuments, true, Some("other")), doc_imp(ReplaceDocuments, true, Some("id"))]), @"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0] }, true))");
debug_snapshot!(autobatch_from(true, Some("id"), [doc_imp(ReplaceDocuments, true, Some("id")), doc_imp(ReplaceDocuments, true, None)]), @r###"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: true, primary_key: Some("id"), operation_ids: [0] }, true))"###);
debug_snapshot!(autobatch_from(true, Some("id"), [doc_imp(ReplaceDocuments, true, Some("id")), doc_imp(ReplaceDocuments, true, Some("id"))]), @r###"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: true, primary_key: Some("id"), operation_ids: [0, 1] }, true))"###);
debug_snapshot!(autobatch_from(true, Some("id"), [doc_imp(ReplaceDocuments, true, Some("id")), doc_imp(ReplaceDocuments, true, Some("id")), doc_imp(ReplaceDocuments, true, None)]), @r###"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: true, primary_key: Some("id"), operation_ids: [0, 1] }, true))"###);
debug_snapshot!(autobatch_from(true, Some("id"), [doc_imp(ReplaceDocuments, true, Some("id")), doc_imp(ReplaceDocuments, true, Some("other"))]), @r###"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: true, primary_key: Some("id"), operation_ids: [0] }, true))"###);
debug_snapshot!(autobatch_from(true, Some("id"), [doc_imp(ReplaceDocuments, true, Some("id")), doc_imp(ReplaceDocuments, true, Some("other")), doc_imp(ReplaceDocuments, true, None)]), @r###"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: true, primary_key: Some("id"), operation_ids: [0] }, true))"###);
debug_snapshot!(autobatch_from(true, Some("id"), [doc_imp(ReplaceDocuments, true, Some("id")), doc_imp(ReplaceDocuments, true, Some("other")), doc_imp(ReplaceDocuments, true, Some("id"))]), @r###"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: true, primary_key: Some("id"), operation_ids: [0] }, true))"###);
debug_snapshot!(autobatch_from(true, Some("id"), [doc_imp(ReplaceDocuments, true, Some("other")), doc_imp(ReplaceDocuments, true, None)]), @r###"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: true, primary_key: Some("other"), operation_ids: [0] }, true))"###);
debug_snapshot!(autobatch_from(true, Some("id"), [doc_imp(ReplaceDocuments, true, Some("other")), doc_imp(ReplaceDocuments, true, Some("id"))]), @r###"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: true, primary_key: Some("other"), operation_ids: [0] }, true))"###);
debug_snapshot!(autobatch_from(true, Some("id"), [doc_imp(ReplaceDocuments, true, Some("other")), doc_imp(ReplaceDocuments, true, Some("id")), doc_imp(ReplaceDocuments, true, None)]), @r###"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: true, primary_key: Some("other"), operation_ids: [0] }, true))"###);
debug_snapshot!(autobatch_from(true, Some("id"), [doc_imp(ReplaceDocuments, true, Some("other")), doc_imp(ReplaceDocuments, true, Some("other"))]), @r###"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: true, primary_key: Some("other"), operation_ids: [0] }, true))"###);
debug_snapshot!(autobatch_from(true, Some("id"), [doc_imp(ReplaceDocuments, true, Some("other")), doc_imp(ReplaceDocuments, true, Some("other")), doc_imp(ReplaceDocuments, true, None)]), @r###"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: true, primary_key: Some("other"), operation_ids: [0] }, true))"###);
debug_snapshot!(autobatch_from(true, Some("id"), [doc_imp(ReplaceDocuments, true, Some("other")), doc_imp(ReplaceDocuments, true, Some("other")), doc_imp(ReplaceDocuments, true, Some("id"))]), @r###"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: true, primary_key: Some("other"), operation_ids: [0] }, true))"###);
// ==> If I don't have a pk
// With a single update
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, None)]), @"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, Some("id"))]), @r###"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: true, primary_key: Some("id"), operation_ids: [0] }, true))"###);
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, Some("other"))]), @r###"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: true, primary_key: Some("other"), operation_ids: [0] }, true))"###);
// With multiple updates
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, None), doc_imp(ReplaceDocuments, true, None)]), @"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0, 1] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, None), doc_imp(ReplaceDocuments, true, Some("id"))]), @"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, Some("id")), doc_imp(ReplaceDocuments, true, None)]), @r###"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: true, primary_key: Some("id"), operation_ids: [0] }, true))"###);
}
}

File diff suppressed because it is too large


@@ -1,329 +0,0 @@
#![allow(clippy::result_large_err)]
use std::collections::HashMap;
use std::io;
use dump::{KindDump, TaskDump, UpdateFile};
use meilisearch_types::batches::{Batch, BatchId};
use meilisearch_types::heed::RwTxn;
use meilisearch_types::index_uid_pattern::IndexUidPattern;
use meilisearch_types::milli;
use meilisearch_types::tasks::{Kind, KindWithContent, Status, Task};
use roaring::RoaringBitmap;
use uuid::Uuid;
use crate::{utils, Error, IndexScheduler, Result};
pub struct Dump<'a> {
index_scheduler: &'a IndexScheduler,
wtxn: RwTxn<'a>,
batch_to_task_mapping: HashMap<BatchId, RoaringBitmap>,
indexes: HashMap<String, RoaringBitmap>,
statuses: HashMap<Status, RoaringBitmap>,
kinds: HashMap<Kind, RoaringBitmap>,
batch_indexes: HashMap<String, RoaringBitmap>,
batch_statuses: HashMap<Status, RoaringBitmap>,
batch_kinds: HashMap<Kind, RoaringBitmap>,
}
impl<'a> Dump<'a> {
pub(crate) fn new(index_scheduler: &'a mut IndexScheduler) -> Result<Self> {
// While loading a dump no one should be able to access the scheduler thus I can block everything.
let wtxn = index_scheduler.env.write_txn()?;
Ok(Dump {
index_scheduler,
wtxn,
batch_to_task_mapping: HashMap::new(),
indexes: HashMap::new(),
statuses: HashMap::new(),
kinds: HashMap::new(),
batch_indexes: HashMap::new(),
batch_statuses: HashMap::new(),
batch_kinds: HashMap::new(),
})
}
/// Register a new batch coming from a dump in the scheduler.
/// By taking a mutable ref we're pretty sure no one will ever import a dump while actix is running.
pub fn register_dumped_batch(&mut self, batch: Batch) -> Result<()> {
self.index_scheduler.queue.batches.all_batches.put(&mut self.wtxn, &batch.uid, &batch)?;
if let Some(enqueued_at) = batch.enqueued_at {
utils::insert_task_datetime(
&mut self.wtxn,
self.index_scheduler.queue.batches.enqueued_at,
enqueued_at.earliest,
batch.uid,
)?;
utils::insert_task_datetime(
&mut self.wtxn,
self.index_scheduler.queue.batches.enqueued_at,
enqueued_at.oldest,
batch.uid,
)?;
}
utils::insert_task_datetime(
&mut self.wtxn,
self.index_scheduler.queue.batches.started_at,
batch.started_at,
batch.uid,
)?;
if let Some(finished_at) = batch.finished_at {
utils::insert_task_datetime(
&mut self.wtxn,
self.index_scheduler.queue.batches.finished_at,
finished_at,
batch.uid,
)?;
}
for index in batch.stats.index_uids.keys() {
match self.batch_indexes.get_mut(index) {
Some(bitmap) => {
bitmap.insert(batch.uid);
}
None => {
let mut bitmap = RoaringBitmap::new();
bitmap.insert(batch.uid);
self.batch_indexes.insert(index.to_string(), bitmap);
}
};
}
for status in batch.stats.status.keys() {
self.batch_statuses.entry(*status).or_default().insert(batch.uid);
}
for kind in batch.stats.types.keys() {
self.batch_kinds.entry(*kind).or_default().insert(batch.uid);
}
Ok(())
}
/// Register a new task coming from a dump in the scheduler.
/// By taking a mutable ref we're pretty sure no one will ever import a dump while actix is running.
pub fn register_dumped_task(
&mut self,
task: TaskDump,
content_file: Option<Box<UpdateFile>>,
) -> Result<Task> {
let task_has_no_docs = matches!(task.kind, KindDump::DocumentImport { documents_count, .. } if documents_count == 0);
let content_uuid = match content_file {
Some(content_file) if task.status == Status::Enqueued => {
let (uuid, file) = self.index_scheduler.queue.create_update_file(false)?;
let mut writer = io::BufWriter::new(file);
for doc in content_file {
let doc = doc?;
serde_json::to_writer(&mut writer, &doc).map_err(|e| {
Error::from_milli(milli::InternalError::SerdeJson(e).into(), None)
})?;
}
let file = writer.into_inner().map_err(|e| e.into_error())?;
file.persist()?;
Some(uuid)
}
// If the task isn't `Enqueued` then just generate a recognisable `Uuid`
// in case we try to open it later.
_ if task.status != Status::Enqueued => Some(Uuid::nil()),
None if task.status == Status::Enqueued && task_has_no_docs => {
let (uuid, file) = self.index_scheduler.queue.create_update_file(false)?;
file.persist()?;
Some(uuid)
}
_ => None,
};
let task = Task {
uid: task.uid,
batch_uid: task.batch_uid,
enqueued_at: task.enqueued_at,
started_at: task.started_at,
finished_at: task.finished_at,
error: task.error,
canceled_by: task.canceled_by,
details: task.details,
status: task.status,
network: task.network,
kind: match task.kind {
KindDump::DocumentImport {
primary_key,
method,
documents_count,
allow_index_creation,
} => KindWithContent::DocumentAdditionOrUpdate {
index_uid: task.index_uid.ok_or(Error::CorruptedDump)?,
primary_key,
method,
content_file: content_uuid.ok_or(Error::CorruptedDump)?,
documents_count,
allow_index_creation,
},
KindDump::DocumentDeletion { documents_ids } => KindWithContent::DocumentDeletion {
documents_ids,
index_uid: task.index_uid.ok_or(Error::CorruptedDump)?,
},
KindDump::DocumentDeletionByFilter { filter } => {
KindWithContent::DocumentDeletionByFilter {
filter_expr: filter,
index_uid: task.index_uid.ok_or(Error::CorruptedDump)?,
}
}
KindDump::DocumentEdition { filter, context, function } => {
KindWithContent::DocumentEdition {
index_uid: task.index_uid.ok_or(Error::CorruptedDump)?,
filter_expr: filter,
context,
function,
}
}
KindDump::DocumentClear => KindWithContent::DocumentClear {
index_uid: task.index_uid.ok_or(Error::CorruptedDump)?,
},
KindDump::Settings { settings, is_deletion, allow_index_creation } => {
KindWithContent::SettingsUpdate {
index_uid: task.index_uid.ok_or(Error::CorruptedDump)?,
new_settings: settings,
is_deletion,
allow_index_creation,
}
}
KindDump::IndexDeletion => KindWithContent::IndexDeletion {
index_uid: task.index_uid.ok_or(Error::CorruptedDump)?,
},
KindDump::IndexCreation { primary_key } => KindWithContent::IndexCreation {
index_uid: task.index_uid.ok_or(Error::CorruptedDump)?,
primary_key,
},
KindDump::IndexUpdate { primary_key, uid } => KindWithContent::IndexUpdate {
index_uid: task.index_uid.ok_or(Error::CorruptedDump)?,
primary_key,
new_index_uid: uid,
},
KindDump::IndexSwap { swaps } => KindWithContent::IndexSwap { swaps },
KindDump::TaskCancelation { query, tasks } => {
KindWithContent::TaskCancelation { query, tasks }
}
KindDump::TasksDeletion { query, tasks } => {
KindWithContent::TaskDeletion { query, tasks }
}
KindDump::DumpCreation { keys, instance_uid } => {
KindWithContent::DumpCreation { keys, instance_uid }
}
KindDump::SnapshotCreation => KindWithContent::SnapshotCreation,
KindDump::Export { url, api_key, payload_size, indexes } => {
KindWithContent::Export {
url,
api_key,
payload_size,
indexes: indexes
.into_iter()
.map(|(pattern, settings)| {
Ok((
IndexUidPattern::try_from(pattern)
.map_err(|_| Error::CorruptedDump)?,
settings,
))
})
.collect::<Result<_, Error>>()?,
}
}
KindDump::UpgradeDatabase { from } => KindWithContent::UpgradeDatabase { from },
KindDump::IndexCompaction { index_uid } => {
KindWithContent::IndexCompaction { index_uid }
}
},
};
self.index_scheduler.queue.tasks.all_tasks.put(&mut self.wtxn, &task.uid, &task)?;
if let Some(batch_id) = task.batch_uid {
self.batch_to_task_mapping.entry(batch_id).or_default().insert(task.uid);
}
for index in task.indexes() {
match self.indexes.get_mut(index) {
Some(bitmap) => {
bitmap.insert(task.uid);
}
None => {
let mut bitmap = RoaringBitmap::new();
bitmap.insert(task.uid);
self.indexes.insert(index.to_string(), bitmap);
}
};
}
utils::insert_task_datetime(
&mut self.wtxn,
self.index_scheduler.queue.tasks.enqueued_at,
task.enqueued_at,
task.uid,
)?;
// we can't override started_at and finished_at, so we must only set them if the task is finished and won't change
if matches!(task.status, Status::Succeeded | Status::Failed | Status::Canceled) {
if let Some(started_at) = task.started_at {
utils::insert_task_datetime(
&mut self.wtxn,
self.index_scheduler.queue.tasks.started_at,
started_at,
task.uid,
)?;
}
if let Some(finished_at) = task.finished_at {
utils::insert_task_datetime(
&mut self.wtxn,
self.index_scheduler.queue.tasks.finished_at,
finished_at,
task.uid,
)?;
}
}
self.statuses.entry(task.status).or_default().insert(task.uid);
self.kinds.entry(task.kind.as_kind()).or_default().insert(task.uid);
Ok(task)
}
/// Commit all the changes and exit the importing dump state
pub fn finish(mut self) -> Result<()> {
for (batch_id, task_ids) in self.batch_to_task_mapping {
self.index_scheduler.queue.batch_to_tasks_mapping.put(
&mut self.wtxn,
&batch_id,
&task_ids,
)?;
}
for (index, bitmap) in self.indexes {
self.index_scheduler.queue.tasks.index_tasks.put(&mut self.wtxn, &index, &bitmap)?;
}
for (status, bitmap) in self.statuses {
self.index_scheduler.queue.tasks.put_status(&mut self.wtxn, status, &bitmap)?;
}
for (kind, bitmap) in self.kinds {
self.index_scheduler.queue.tasks.put_kind(&mut self.wtxn, kind, &bitmap)?;
}
for (index, bitmap) in self.batch_indexes {
self.index_scheduler.queue.batches.index_tasks.put(&mut self.wtxn, &index, &bitmap)?;
}
for (status, bitmap) in self.batch_statuses {
self.index_scheduler.queue.batches.put_status(&mut self.wtxn, status, &bitmap)?;
}
for (kind, bitmap) in self.batch_kinds {
self.index_scheduler.queue.batches.put_kind(&mut self.wtxn, kind, &bitmap)?;
}
self.wtxn.commit()?;
self.index_scheduler.scheduler.wake_up.signal();
Ok(())
}
}
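
The dump importer above groups task and batch ids per index, status and kind in two ways: an explicit `match` on `get_mut`, and `entry(..).or_default().insert(..)`. The following is a minimal sketch of why the two are interchangeable. It assumes a `BTreeSet<u32>` as a stand-in for `RoaringBitmap`, and the helper names `insert_with_match` and `insert_with_entry` are hypothetical, not part of the code base.

```rust
use std::collections::{BTreeSet, HashMap};

/// Stand-in for `RoaringBitmap` (assumption): an ordered set of task or batch ids.
type IdSet = BTreeSet<u32>;

/// Explicit form, mirroring the `match map.get_mut(index)` blocks.
fn insert_with_match(map: &mut HashMap<String, IdSet>, index: &str, id: u32) {
    match map.get_mut(index) {
        Some(set) => {
            set.insert(id);
        }
        None => {
            let mut set = IdSet::new();
            set.insert(id);
            map.insert(index.to_string(), set);
        }
    }
}

/// Entry form, mirroring `entry(..).or_default().insert(..)`.
/// Note that it builds an owned key on every call, even when the key exists.
fn insert_with_entry(map: &mut HashMap<String, IdSet>, index: &str, id: u32) {
    map.entry(index.to_string()).or_default().insert(id);
}
```

Both helpers leave the map in the same state. The explicit form only allocates a `String` key on the first insertion for a given index, which may be why it is kept for string keys while the `Copy` keys (`Status`, `Kind`) use the shorter entry form.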


@@ -2,7 +2,6 @@ use std::fmt::Display;
use meilisearch_types::batches::BatchId;
use meilisearch_types::error::{Code, ErrorCode};
use meilisearch_types::milli::index::RollbackOutcome;
use meilisearch_types::tasks::{Kind, Status};
use meilisearch_types::{heed, milli};
use thiserror::Error;
@@ -67,8 +66,6 @@ pub enum Error {
SwapDuplicateIndexesFound(Vec<String>),
#[error("Index `{0}` not found.")]
SwapIndexNotFound(String),
#[error("Cannot rename `{0}` to `{1}` as the index already exists. Hint: You can remove `{1}` first and then do your rename.")]
SwapIndexFoundDuringRename(String, String),
#[error("Meilisearch cannot receive write operations because the limit of the task database has been reached. Please delete tasks to continue performing write operations.")]
NoSpaceLeftInTaskQueue,
#[error(
@@ -76,10 +73,6 @@ pub enum Error {
.0.iter().map(|s| format!("`{}`", s)).collect::<Vec<_>>().join(", ")
)]
SwapIndexesNotFound(Vec<String>),
#[error("The following indexes are being renamed but cannot because their new name conflicts with an already existing index: {}. Renaming doesn't overwrite the other index name.",
.0.iter().map(|s| format!("`{}`", s)).collect::<Vec<_>>().join(", ")
)]
SwapIndexesFoundDuringRename(Vec<String>),
#[error("Corrupted dump.")]
CorruptedDump,
#[error(
@@ -116,8 +109,6 @@ pub enum Error {
InvalidIndexUid { index_uid: String },
#[error("Task `{0}` not found.")]
TaskNotFound(TaskId),
#[error("Task `{0}` does not contain any documents. Only `documentAdditionOrUpdate` tasks with the statuses `enqueued` or `processing` contain documents")]
TaskFileNotFound(TaskId),
#[error("Batch `{0}` not found.")]
BatchNotFound(BatchId),
#[error("Query parameters to filter the tasks to delete are missing. Available query parameters are: `uids`, `indexUids`, `statuses`, `types`, `canceledBy`, `beforeEnqueuedAt`, `afterEnqueuedAt`, `beforeStartedAt`, `afterStartedAt`, `beforeFinishedAt`, `afterFinishedAt`.")]
@@ -136,8 +127,8 @@ pub enum Error {
_ => format!("{error}")
})]
Milli { error: milli::Error, index_uid: Option<String> },
#[error("An unexpected crash occurred when processing the task: {0}")]
ProcessBatchPanicked(String),
#[error("An unexpected crash occurred when processing the task.")]
ProcessBatchPanicked,
#[error(transparent)]
FileStore(#[from] file_store::Error),
#[error(transparent)]
@@ -156,29 +147,7 @@ pub enum Error {
#[error("Corrupted task queue.")]
CorruptedTaskQueue,
#[error(transparent)]
DatabaseUpgrade(Box<Self>),
#[error(transparent)]
Export(Box<Self>),
#[error("Failed to export documents to remote server {code} ({type}): {message} <{link}>")]
FromRemoteWhenExporting { message: String, code: String, r#type: String, link: String },
#[error("Failed to rollback for index `{index}`: {rollback_outcome} ")]
RollbackFailed { index: String, rollback_outcome: RollbackOutcome },
#[error(transparent)]
UnrecoverableError(Box<Self>),
#[error("The index scheduler is in version v{}.{}.{}, but Meilisearch is in version v{}.{}.{}.\n - hint: start the correct version of Meilisearch, or consider updating your database. See also <https://www.meilisearch.com/docs/learn/update_and_migration/updating>",
index_scheduler_version.0, index_scheduler_version.1, index_scheduler_version.2,
package_version.0, package_version.1, package_version.2)]
IndexSchedulerVersionMismatch {
index_scheduler_version: (u32, u32, u32),
package_version: (u32, u32, u32),
},
#[error("Index `{index}` is in version v{}.{}.{}, but Meilisearch is in version v{}.{}.{}.\n - note: this is an internal error, please consider filing a bug report: <https://github.com/meilisearch/meilisearch/issues/new?template=bug_report.md>",
index_version.0, index_version.1, index_version.2, package_version.0, package_version.1, package_version.2)]
IndexVersionMismatch {
index: String,
index_version: (u32, u32, u32),
package_version: (u32, u32, u32),
},
TaskDatabaseUpdate(Box<Self>),
#[error(transparent)]
HeedTransaction(heed::Error),
@@ -209,8 +178,6 @@ impl Error {
| Error::SwapIndexNotFound(_)
| Error::NoSpaceLeftInTaskQueue
| Error::SwapIndexesNotFound(_)
| Error::SwapIndexFoundDuringRename(_, _)
| Error::SwapIndexesFoundDuringRename(_)
| Error::CorruptedDump
| Error::InvalidTaskDate { .. }
| Error::InvalidTaskUid { .. }
@@ -220,29 +187,22 @@ impl Error {
| Error::InvalidTaskCanceledBy { .. }
| Error::InvalidIndexUid { .. }
| Error::TaskNotFound(_)
| Error::TaskFileNotFound(_)
| Error::BatchNotFound(_)
| Error::TaskDeletionWithEmptyQuery
| Error::TaskCancelationWithEmptyQuery
| Error::FromRemoteWhenExporting { .. }
| Error::AbortedTask
| Error::Dump(_)
| Error::Heed(_)
| Error::Milli { .. }
| Error::ProcessBatchPanicked(_)
| Error::ProcessBatchPanicked
| Error::FileStore(_)
| Error::IoError(_)
| Error::Persist(_)
| Error::FeatureNotEnabled(_)
| Error::Export(_)
| Error::Anyhow(_) => true,
Error::CreateBatch(_)
| Error::CorruptedTaskQueue
| Error::DatabaseUpgrade(_)
| Error::UnrecoverableError(_)
| Error::IndexSchedulerVersionMismatch { .. }
| Error::IndexVersionMismatch { .. }
| Error::RollbackFailed { .. }
| Error::TaskDatabaseUpdate(_)
| Error::HeedTransaction(_) => false,
#[cfg(test)]
Error::PlannedFailure => false,
@@ -279,8 +239,6 @@ impl ErrorCode for Error {
Error::SwapDuplicateIndexFound(_) => Code::InvalidSwapDuplicateIndexFound,
Error::SwapIndexNotFound(_) => Code::IndexNotFound,
Error::SwapIndexesNotFound(_) => Code::IndexNotFound,
Error::SwapIndexFoundDuringRename(_, _) => Code::IndexAlreadyExists,
Error::SwapIndexesFoundDuringRename(_) => Code::IndexAlreadyExists,
Error::InvalidTaskDate { field, .. } => (*field).into(),
Error::InvalidTaskUid { .. } => Code::InvalidTaskUids,
Error::InvalidBatchUid { .. } => Code::InvalidBatchUids,
@@ -289,7 +247,6 @@ impl ErrorCode for Error {
Error::InvalidTaskCanceledBy { .. } => Code::InvalidTaskCanceledBy,
Error::InvalidIndexUid { .. } => Code::InvalidIndexUid,
Error::TaskNotFound(_) => Code::TaskNotFound,
Error::TaskFileNotFound(_) => Code::TaskFileNotFound,
Error::BatchNotFound(_) => Code::BatchNotFound,
Error::TaskDeletionWithEmptyQuery => Code::MissingTaskFilters,
Error::TaskCancelationWithEmptyQuery => Code::MissingTaskFilters,
@@ -297,8 +254,7 @@ impl ErrorCode for Error {
Error::NoSpaceLeftInTaskQueue => Code::NoSpaceLeftOnDevice,
Error::Dump(e) => e.error_code(),
Error::Milli { error, .. } => error.error_code(),
Error::ProcessBatchPanicked(_) => Code::Internal,
Error::FromRemoteWhenExporting { .. } => Code::Internal,
Error::ProcessBatchPanicked => Code::Internal,
Error::Heed(e) => e.error_code(),
Error::HeedTransaction(e) => e.error_code(),
Error::FileStore(e) => e.error_code(),
@@ -310,12 +266,7 @@ impl ErrorCode for Error {
Error::Anyhow(_) => Code::Internal,
Error::CorruptedTaskQueue => Code::Internal,
Error::CorruptedDump => Code::Internal,
Error::DatabaseUpgrade(_) => Code::Internal,
Error::Export(_) => Code::Internal,
Error::RollbackFailed { .. } => Code::Internal,
Error::UnrecoverableError(_) => Code::Internal,
Error::IndexSchedulerVersionMismatch { .. } => Code::Internal,
Error::IndexVersionMismatch { .. } => Code::Internal,
Error::TaskDatabaseUpdate(_) => Code::Internal,
Error::CreateBatch(_) => Code::Internal,
// This one should never be seen by the end user
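
The hunks above touch three parallel views of the same enum: the display message, a boolean classification, and the mapping to a public error code. Adding or removing a variant therefore means editing three exhaustive matches. A reduced sketch of that shape follows, using only the standard library instead of `thiserror`. The type names `SchedulerError` and `Code` and the method name `is_recoverable` are assumptions chosen for illustration; only the three variants, their messages and their classification are taken from the diff.

```rust
use std::fmt;

/// Reduced stand-in for the scheduler error enum (three variants only).
#[derive(Debug)]
enum SchedulerError {
    TaskNotFound(u32),
    CorruptedDump,
    CorruptedTaskQueue,
}

/// Reduced stand-in for the public error codes.
#[derive(Debug, PartialEq, Eq)]
enum Code {
    TaskNotFound,
    Internal,
}

impl fmt::Display for SchedulerError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            SchedulerError::TaskNotFound(id) => write!(f, "Task `{id}` not found."),
            SchedulerError::CorruptedDump => write!(f, "Corrupted dump."),
            SchedulerError::CorruptedTaskQueue => write!(f, "Corrupted task queue."),
        }
    }
}

impl SchedulerError {
    /// Hypothetical name: `true` for the variants listed before `=> true` in the diff.
    fn is_recoverable(&self) -> bool {
        match self {
            SchedulerError::TaskNotFound(_) | SchedulerError::CorruptedDump => true,
            SchedulerError::CorruptedTaskQueue => false,
        }
    }

    /// Maps each variant to the code exposed to the user.
    fn error_code(&self) -> Code {
        match self {
            SchedulerError::TaskNotFound(_) => Code::TaskNotFound,
            SchedulerError::CorruptedDump | SchedulerError::CorruptedTaskQueue => Code::Internal,
        }
    }
}
```

Because none of the matches has a wildcard arm, the compiler rejects the build when a variant is added without updating every view, which is what keeps diffs like the one above consistent across all three places.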


@@ -1,30 +1,18 @@
use std::sync::{Arc, RwLock};
use meilisearch_types::enterprise_edition::network::Network;
use meilisearch_types::features::{InstanceTogglableFeatures, RuntimeTogglableFeatures};
use meilisearch_types::heed::types::{SerdeJson, Str};
use meilisearch_types::heed::{Database, Env, RwTxn, WithoutTls};
use meilisearch_types::heed::{Database, Env, RwTxn};
use crate::error::FeatureNotEnabledError;
use crate::Result;
/// The number of databases used by features
const NUMBER_OF_DATABASES: u32 = 1;
/// Database const names for the `FeatureData`.
mod db_name {
pub const EXPERIMENTAL_FEATURES: &str = "experimental-features";
}
mod db_keys {
pub const EXPERIMENTAL_FEATURES: &str = "experimental-features";
pub const NETWORK: &str = "network";
}
const EXPERIMENTAL_FEATURES: &str = "experimental-features";
#[derive(Clone)]
pub(crate) struct FeatureData {
persisted: Database<Str, SerdeJson<RuntimeTogglableFeatures>>,
runtime: Arc<RwLock<RuntimeTogglableFeatures>>,
network: Arc<RwLock<Network>>,
}
#[derive(Debug, Clone, Copy)]
@@ -68,6 +56,19 @@ impl RoFeatures {
}
}
pub fn check_vector(&self, disabled_action: &'static str) -> Result<()> {
if self.runtime.vector_store {
Ok(())
} else {
Err(FeatureNotEnabledError {
disabled_action,
feature: "vector store",
issue_link: "https://github.com/meilisearch/product/discussions/677",
}
.into())
}
}
pub fn check_edit_documents_by_function(&self, disabled_action: &'static str) -> Result<()> {
if self.runtime.edit_documents_by_function {
Ok(())
@@ -86,108 +87,24 @@ impl RoFeatures {
Ok(())
} else {
Err(FeatureNotEnabledError {
disabled_action: "Using `CONTAINS` in a filter",
disabled_action: "Using `CONTAINS` or `STARTS WITH` in a filter",
feature: "contains filter",
issue_link: "https://github.com/orgs/meilisearch/discussions/763",
}
.into())
}
}
pub fn check_network(&self, disabled_action: &'static str) -> Result<()> {
if self.runtime.network {
Ok(())
} else {
Err(FeatureNotEnabledError {
disabled_action,
feature: "network",
issue_link: "https://github.com/orgs/meilisearch/discussions/805",
}
.into())
}
}
pub fn check_get_task_documents_route(&self) -> Result<()> {
if self.runtime.get_task_documents_route {
Ok(())
} else {
Err(FeatureNotEnabledError {
disabled_action: "Getting the documents of an enqueued task",
feature: "get task documents route",
issue_link: "https://github.com/orgs/meilisearch/discussions/808",
}
.into())
}
}
pub fn check_composite_embedders(&self, disabled_action: &'static str) -> Result<()> {
if self.runtime.composite_embedders {
Ok(())
} else {
Err(FeatureNotEnabledError {
disabled_action,
feature: "composite embedders",
issue_link: "https://github.com/orgs/meilisearch/discussions/816",
}
.into())
}
}
pub fn check_chat_completions(&self, disabled_action: &'static str) -> Result<()> {
if self.runtime.chat_completions {
Ok(())
} else {
Err(FeatureNotEnabledError {
disabled_action,
feature: "chat completions",
issue_link: "https://github.com/orgs/meilisearch/discussions/835",
}
.into())
}
}
pub fn check_multimodal(&self, disabled_action: &'static str) -> Result<()> {
if self.runtime.multimodal {
Ok(())
} else {
Err(FeatureNotEnabledError {
disabled_action,
feature: "multimodal",
issue_link: "https://github.com/orgs/meilisearch/discussions/846",
}
.into())
}
}
pub fn check_vector_store_setting(&self, disabled_action: &'static str) -> Result<()> {
if self.runtime.vector_store_setting {
Ok(())
} else {
Err(FeatureNotEnabledError {
disabled_action,
feature: "vector_store_setting",
issue_link: "https://github.com/orgs/meilisearch/discussions/860",
}
.into())
}
}
}
impl FeatureData {
pub(crate) const fn nb_db() -> u32 {
NUMBER_OF_DATABASES
}
pub fn new(
env: &Env<WithoutTls>,
wtxn: &mut RwTxn,
instance_features: InstanceTogglableFeatures,
) -> Result<Self> {
let runtime_features_db =
env.create_database(wtxn, Some(db_name::EXPERIMENTAL_FEATURES))?;
pub fn new(env: &Env, instance_features: InstanceTogglableFeatures) -> Result<Self> {
let mut wtxn = env.write_txn()?;
let runtime_features_db = env.create_database(&mut wtxn, Some(EXPERIMENTAL_FEATURES))?;
wtxn.commit()?;
let txn = env.read_txn()?;
let persisted_features: RuntimeTogglableFeatures =
runtime_features_db.get(wtxn, db_keys::EXPERIMENTAL_FEATURES)?.unwrap_or_default();
runtime_features_db.get(&txn, EXPERIMENTAL_FEATURES)?.unwrap_or_default();
let InstanceTogglableFeatures { metrics, logs_route, contains_filter } = instance_features;
let runtime = Arc::new(RwLock::new(RuntimeTogglableFeatures {
metrics: metrics || persisted_features.metrics,
@@ -196,15 +113,7 @@ impl FeatureData {
..persisted_features
}));
// Once this is stabilized, network should be stored along with webhooks in index-scheduler's persisted database
let network_db = runtime_features_db.remap_data_type::<SerdeJson<Network>>();
let network: Network = network_db.get(wtxn, db_keys::NETWORK)?.unwrap_or_default();
Ok(Self {
persisted: runtime_features_db,
runtime,
network: Arc::new(RwLock::new(network)),
})
Ok(Self { persisted: runtime_features_db, runtime })
}
pub fn put_runtime_features(
@@ -212,7 +121,7 @@ impl FeatureData {
mut wtxn: RwTxn,
features: RuntimeTogglableFeatures,
) -> Result<()> {
self.persisted.put(&mut wtxn, db_keys::EXPERIMENTAL_FEATURES, &features)?;
self.persisted.put(&mut wtxn, EXPERIMENTAL_FEATURES, &features)?;
wtxn.commit()?;
// safe to unwrap, the lock will only fail if:
@@ -233,21 +142,4 @@ impl FeatureData {
pub fn features(&self) -> RoFeatures {
RoFeatures::new(self)
}
pub fn put_network(&self, mut wtxn: RwTxn, new_network: Network) -> Result<()> {
self.persisted.remap_data_type::<SerdeJson<Network>>().put(
&mut wtxn,
db_keys::NETWORK,
&new_network,
)?;
wtxn.commit()?;
let mut network = self.network.write().unwrap();
*network = new_network;
Ok(())
}
pub fn network(&self) -> Network {
Network::clone(&*self.network.read().unwrap())
}
}
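
The feature code above follows one pattern throughout: flags given at launch are OR-ed with the persisted flags, and each `check_*` method turns a disabled flag into a descriptive error. A minimal sketch of that pattern follows, using only the standard library. The types `RuntimeFeatures`, `FeatureNotEnabled` and `Features` are simplified stand-ins, and OR-ing `contains_filter` the same way as `metrics` is an assumption, since the hunk only shows the `metrics` line.

```rust
use std::sync::{Arc, RwLock};

/// Simplified stand-in for `RuntimeTogglableFeatures` (two flags only).
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
struct RuntimeFeatures {
    metrics: bool,
    contains_filter: bool,
}

/// Simplified stand-in for `FeatureNotEnabledError`.
#[derive(Debug, PartialEq, Eq)]
struct FeatureNotEnabled {
    disabled_action: &'static str,
    feature: &'static str,
}

struct Features {
    runtime: Arc<RwLock<RuntimeFeatures>>,
}

impl Features {
    /// A flag is on if it was enabled at launch or persisted earlier.
    fn new(instance: RuntimeFeatures, persisted: RuntimeFeatures) -> Self {
        let merged = RuntimeFeatures {
            metrics: instance.metrics || persisted.metrics,
            contains_filter: instance.contains_filter || persisted.contains_filter,
        };
        Self { runtime: Arc::new(RwLock::new(merged)) }
    }

    /// Gate in the style of the `check_*` methods: `Ok(())` when enabled.
    fn check_contains_filter(&self) -> Result<(), FeatureNotEnabled> {
        // Copy the flags out so the lock is released immediately.
        let runtime = *self.runtime.read().unwrap();
        if runtime.contains_filter {
            Ok(())
        } else {
            Err(FeatureNotEnabled {
                disabled_action: "Using `CONTAINS` or `STARTS WITH` in a filter",
                feature: "contains filter",
            })
        }
    }
}
```

A caller would typically write `features.check_contains_filter()?` before parsing a filter, so that the error carries the disabled action and feature name back to the user.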


@@ -1,7 +1,5 @@
use std::collections::BTreeMap;
use std::env::VarError;
use std::path::Path;
use std::str::FromStr;
use std::time::Duration;
use meilisearch_types::heed::{EnvClosingEvent, EnvFlags, EnvOpenOptions};
@@ -104,7 +102,7 @@ impl ReopenableIndex {
return Ok(());
}
map.unavailable.remove(&self.uuid);
map.create(&self.uuid, path, None, self.enable_mdb_writemap, self.map_size, false)?;
map.create(&self.uuid, path, None, self.enable_mdb_writemap, self.map_size)?;
}
Ok(())
}
@@ -173,12 +171,11 @@ impl IndexMap {
date: Option<(OffsetDateTime, OffsetDateTime)>,
enable_mdb_writemap: bool,
map_size: usize,
creation: bool,
) -> Result<Index> {
if !matches!(self.get_unavailable(uuid), Missing) {
panic!("Attempt to open an index that was unavailable");
}
let index = create_or_open_index(path, date, enable_mdb_writemap, map_size, creation)?;
let index = create_or_open_index(path, date, enable_mdb_writemap, map_size)?;
match self.available.insert(*uuid, index.clone()) {
InsertionOutcome::InsertedNew => (),
InsertionOutcome::Evicted(evicted_uuid, evicted_index) => {
@@ -302,31 +299,18 @@ fn create_or_open_index(
date: Option<(OffsetDateTime, OffsetDateTime)>,
enable_mdb_writemap: bool,
map_size: usize,
creation: bool,
) -> Result<Index> {
let options = EnvOpenOptions::new();
let mut options = options.read_txn_without_tls();
let mut options = EnvOpenOptions::new();
options.map_size(clamp_to_page_size(map_size));
// You can find more details about this experimental
// environment variable on the following GitHub discussion:
// <https://github.com/orgs/meilisearch/discussions/806>
let max_readers = match std::env::var("MEILI_EXPERIMENTAL_INDEX_MAX_READERS") {
Ok(value) => u32::from_str(&value).unwrap(),
Err(VarError::NotPresent) => 1024,
Err(VarError::NotUnicode(value)) => panic!(
"Invalid unicode for the `MEILI_EXPERIMENTAL_INDEX_MAX_READERS` env var: {value:?}"
),
};
options.max_readers(max_readers);
options.max_readers(1024);
if enable_mdb_writemap {
unsafe { options.flags(EnvFlags::WRITE_MAP) };
}
if let Some((created, updated)) = date {
Ok(Index::new_with_creation_dates(options, path, created, updated, creation)?)
Ok(Index::new_with_creation_dates(options, path, created, updated)?)
} else {
Ok(Index::new(options, path, creation)?)
Ok(Index::new(options, path)?)
}
}
@@ -334,17 +318,17 @@ fn create_or_open_index(
#[cfg(test)]
mod tests {
use meilisearch_types::heed::{Env, WithoutTls};
use meilisearch_types::heed::Env;
use meilisearch_types::Index;
use uuid::Uuid;
use super::super::IndexMapper;
use crate::test_utils::IndexSchedulerHandle;
use crate::tests::IndexSchedulerHandle;
use crate::utils::clamp_to_page_size;
use crate::IndexScheduler;
impl IndexMapper {
fn test() -> (Self, Env<WithoutTls>, IndexSchedulerHandle) {
fn test() -> (Self, Env, IndexSchedulerHandle) {
let (index_scheduler, handle) = IndexScheduler::test(true, vec![]);
(index_scheduler.index_mapper, index_scheduler.env, handle)
}
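
One side of the `create_or_open_index` hunk above reads the reader limit from `MEILI_EXPERIMENTAL_INDEX_MAX_READERS`, while the other hard-codes 1024. The lookup logic can be sketched as a small pure function. This sketch deliberately swaps the `unwrap()` and `panic!` of the diff for a returned error so that the behaviour can be tested; the function name `max_readers_from` and the constant `DEFAULT_MAX_READERS` are hypothetical.

```rust
use std::env::VarError;
use std::str::FromStr;

/// Default used when the variable is not set (1024 in the diff).
const DEFAULT_MAX_READERS: u32 = 1024;

/// Interprets the result of `std::env::var(..)` as a reader limit.
/// Returns an error message instead of panicking on bad input.
fn max_readers_from(var: Result<String, VarError>) -> Result<u32, String> {
    match var {
        Ok(value) => u32::from_str(&value)
            .map_err(|e| format!("invalid reader count `{value}`: {e}")),
        Err(VarError::NotPresent) => Ok(DEFAULT_MAX_READERS),
        Err(VarError::NotUnicode(value)) => {
            Err(format!("invalid unicode in the env var: {value:?}"))
        }
    }
}
```

It would be called as `max_readers_from(std::env::var("MEILI_EXPERIMENTAL_INDEX_MAX_READERS"))`, with the result passed to `options.max_readers(..)`. Taking the `Result` as a parameter keeps the function free of global state, so tests do not need to mutate the process environment.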


@@ -4,10 +4,8 @@ use std::time::Duration;
use std::{fs, thread};
use meilisearch_types::heed::types::{SerdeJson, Str};
use meilisearch_types::heed::{Database, Env, RoTxn, RwTxn, WithoutTls};
use meilisearch_types::heed::{Database, Env, RoTxn, RwTxn};
use meilisearch_types::milli;
use meilisearch_types::milli::database_stats::DatabaseStats;
use meilisearch_types::milli::index::RollbackOutcome;
use meilisearch_types::milli::update::IndexerConfig;
use meilisearch_types::milli::{FieldDistribution, Index};
use serde::{Deserialize, Serialize};
@@ -18,17 +16,12 @@ use uuid::Uuid;
use self::index_map::IndexMap;
use self::IndexStatus::{Available, BeingDeleted, Closing, Missing};
use crate::uuid_codec::UuidCodec;
use crate::{Error, IndexBudget, IndexSchedulerOptions, Result};
use crate::{Error, Result};
mod index_map;
/// The number of databases used by the index mapper
const NUMBER_OF_DATABASES: u32 = 2;
/// Database const names for the `IndexMapper`.
mod db_name {
pub const INDEX_MAPPING: &str = "index-mapping";
pub const INDEX_STATS: &str = "index-stats";
}
const INDEX_MAPPING: &str = "index-mapping";
const INDEX_STATS: &str = "index-stats";
/// Structure managing meilisearch's indexes.
///
@@ -71,7 +64,7 @@ pub struct IndexMapper {
/// Path to the folder where the LMDB environments of each index are.
base_path: PathBuf,
/// The map size an index is opened with the first time.
pub(crate) index_base_map_size: usize,
index_base_map_size: usize,
/// The quantity by which the map size of an index is incremented upon reopening, in bytes.
index_growth_amount: usize,
/// Whether we open a meilisearch index with the MDB_WRITEMAP option or not.
@@ -100,32 +93,19 @@ pub enum IndexStatus {
/// The statistics that can be computed from an `Index` object.
#[derive(Serialize, Deserialize, Debug)]
pub struct IndexStats {
/// Stats of the documents database.
#[serde(default)]
pub documents_database_stats: DatabaseStats,
#[serde(default, skip_serializing)]
pub number_of_documents: Option<u64>,
/// Number of documents in the index.
pub number_of_documents: u64,
/// Size taken up by the index' DB, in bytes.
///
/// This includes the size taken by both the used and free pages of the DB, and as the free pages
/// are not returned to the disk after a deletion, this number is typically larger than
/// `used_database_size` that only includes the size of the used pages.
pub database_size: u64,
/// Number of embeddings in the index.
/// Option: retrocompatible with the stats of the pre-v1.13.0 versions of meilisearch
pub number_of_embeddings: Option<u64>,
/// Number of embedded documents in the index.
/// Option: retrocompatible with the stats of the pre-v1.13.0 versions of meilisearch
pub number_of_embedded_documents: Option<u64>,
/// Size taken by the used pages of the index' DB, in bytes.
///
/// As the DB backend does not return to the disk the pages that are not currently used by the DB,
/// this value is typically smaller than `database_size`.
pub used_database_size: u64,
/// The primary key of the index
pub primary_key: Option<String>,
/// Association of every field name with the number of times it occurs in the documents.
pub field_distribution: FieldDistribution,
/// Creation date of the index.
@@ -143,15 +123,10 @@ impl IndexStats {
///
/// - rtxn: a RO transaction for the index, obtained from `Index::read_txn()`.
pub fn new(index: &Index, rtxn: &RoTxn) -> milli::Result<Self> {
let vector_store_stats = index.vector_store_stats(rtxn)?;
Ok(IndexStats {
number_of_embeddings: Some(vector_store_stats.number_of_embeddings),
number_of_embedded_documents: Some(vector_store_stats.documents.len()),
documents_database_stats: index.documents_stats(rtxn)?.unwrap_or_default(),
number_of_documents: None,
number_of_documents: index.number_of_documents(rtxn)?,
database_size: index.on_disk_size()?,
used_database_size: index.used_size()?,
primary_key: index.primary_key(rtxn)?.map(|s| s.to_string()),
field_distribution: index.field_distribution(rtxn)?,
created_at: index.created_at(rtxn)?,
updated_at: index.updated_at(rtxn)?,
@@ -160,25 +135,29 @@ impl IndexStats {
}
impl IndexMapper {
pub(crate) const fn nb_db() -> u32 {
NUMBER_OF_DATABASES
}
pub fn new(
env: &Env<WithoutTls>,
wtxn: &mut RwTxn,
options: &IndexSchedulerOptions,
budget: IndexBudget,
env: &Env,
base_path: PathBuf,
index_base_map_size: usize,
index_growth_amount: usize,
index_count: usize,
enable_mdb_writemap: bool,
indexer_config: IndexerConfig,
) -> Result<Self> {
let mut wtxn = env.write_txn()?;
let index_mapping = env.create_database(&mut wtxn, Some(INDEX_MAPPING))?;
let index_stats = env.create_database(&mut wtxn, Some(INDEX_STATS))?;
wtxn.commit()?;
Ok(Self {
index_map: Arc::new(RwLock::new(IndexMap::new(budget.index_count))),
index_mapping: env.create_database(wtxn, Some(db_name::INDEX_MAPPING))?,
index_stats: env.create_database(wtxn, Some(db_name::INDEX_STATS))?,
base_path: options.indexes_path.clone(),
index_base_map_size: budget.map_size,
index_growth_amount: options.index_growth_amount,
enable_mdb_writemap: options.enable_mdb_writemap,
indexer_config: options.indexer_config.clone(),
index_map: Arc::new(RwLock::new(IndexMap::new(index_count))),
index_mapping,
index_stats,
base_path,
index_base_map_size,
index_growth_amount,
enable_mdb_writemap,
indexer_config: Arc::new(indexer_config),
currently_updating_index: Default::default(),
})
}
@@ -199,7 +178,7 @@ impl IndexMapper {
let uuid = Uuid::new_v4();
self.index_mapping.put(&mut wtxn, name, &uuid)?;
let index_path = self.index_path(uuid);
let index_path = self.base_path.join(uuid.to_string());
fs::create_dir_all(&index_path)?;
// Error if the UUIDv4 somehow already exists in the map, since it should be fresh.
@@ -215,14 +194,8 @@ impl IndexMapper {
date,
self.enable_mdb_writemap,
self.index_base_map_size,
true,
)
.map_err(|e| Error::from_milli(e, Some(uuid.to_string())))?;
let index_rtxn = index.read_txn()?;
let stats = crate::index_mapper::IndexStats::new(&index, &index_rtxn)
.map_err(|e| Error::from_milli(e, Some(name.to_string())))?;
self.store_stats_of(&mut wtxn, name, &stats)?;
drop(index_rtxn);
wtxn.commit()?;
@@ -286,7 +259,7 @@ impl IndexMapper {
};
let index_map = self.index_map.clone();
let index_path = self.index_path(uuid);
let index_path = self.base_path.join(uuid.to_string());
let index_name = name.to_string();
thread::Builder::new()
.name(String::from("index_deleter"))
@@ -341,26 +314,6 @@ impl IndexMapper {
Ok(())
}
/// Closes the specified index.
///
/// This operation involves closing the underlying environment and so can take a long time to complete.
///
/// # Panics
///
/// - If the index corresponding to the passed name is concurrently being deleted/resized or cannot be found in the
/// in-memory hash map.
pub fn close_index(&self, rtxn: &RoTxn, name: &str) -> Result<()> {
let uuid = self
.index_mapping
.get(rtxn, name)?
.ok_or_else(|| Error::IndexNotFound(name.to_string()))?;
// We remove the index from the in-memory index map.
self.index_map.write().unwrap().close_for_resize(&uuid, self.enable_mdb_writemap, 0);
Ok(())
}
/// Return an index, may open it if it wasn't already opened.
pub fn index(&self, rtxn: &RoTxn, name: &str) -> Result<Index> {
if let Some((current_name, current_index)) =
@@ -408,7 +361,7 @@ impl IndexMapper {
} else {
continue;
};
let index_path = self.index_path(uuid);
let index_path = self.base_path.join(uuid.to_string());
// take the lock to reopen the environment.
reopen
.reopen(&mut self.index_map.write().unwrap(), &index_path)
@@ -425,7 +378,7 @@ impl IndexMapper {
// if it's not already there.
match index_map.get(&uuid) {
Missing => {
let index_path = self.index_path(uuid);
let index_path = self.base_path.join(uuid.to_string());
break index_map
.create(
@@ -434,7 +387,6 @@ impl IndexMapper {
None,
self.enable_mdb_writemap,
self.index_base_map_size,
false,
)
.map_err(|e| Error::from_milli(e, Some(uuid.to_string())))?;
}
@@ -452,59 +404,6 @@ impl IndexMapper {
Ok(index)
}
/// Returns the path of the index.
///
/// The folder located at this path contains the data.mdb,
/// the lock.mdb and an optional data.mdb.cpy file.
pub fn index_path(&self, uuid: Uuid) -> PathBuf {
self.base_path.join(uuid.to_string())
}
pub fn rollback_index(
&self,
rtxn: &RoTxn,
name: &str,
to: (u32, u32, u32),
) -> Result<RollbackOutcome> {
// remove any currently updating index to make sure that we aren't keeping a reference to the index somewhere
drop(self.currently_updating_index.write().unwrap().take());
let uuid = self
.index_mapping
.get(rtxn, name)?
.ok_or_else(|| Error::IndexNotFound(name.to_string()))?;
// take the lock to make sure no one is messing with the indexes while we roll back
// this will block any search or other operation, but we are rolling back so this is probably acceptable.
let mut index_map = self.index_map.write().unwrap();
'close_index: loop {
match index_map.get(&uuid) {
Available(_) => {
index_map.close_for_resize(&uuid, self.enable_mdb_writemap, 0);
// index should now be `Closing`; try again
continue;
}
// index already closed
Missing => break 'close_index,
// closing requested by this thread or another one; wait for closing to complete, then exit
Closing(closing_index) => {
if closing_index.wait_timeout(Duration::from_secs(100)).is_none() {
// release the lock so it doesn't get poisoned
drop(index_map);
panic!("cannot close index")
}
break;
}
BeingDeleted => return Err(Error::IndexNotFound(name.to_string())),
};
}
let index_path = self.index_path(uuid);
Index::rollback(milli::heed::EnvOpenOptions::new().read_txn_without_tls(), index_path, to)
.map_err(|err| crate::Error::from_milli(err, Some(name.to_string())))
}
/// Attempts `f` for each index that exists in the index mapper.
///
/// It is preferable to use this function rather than a loop that opens all indexes, as a way to avoid having all indexes opened,
@@ -554,20 +453,6 @@ impl IndexMapper {
Ok(())
}
/// Rename an index.
pub fn rename(&self, wtxn: &mut RwTxn, current: &str, new: &str) -> Result<()> {
let uuid = self
.index_mapping
.get(wtxn, current)?
.ok_or_else(|| Error::IndexNotFound(current.to_string()))?;
if self.index_mapping.get(wtxn, new)?.is_some() {
return Err(Error::IndexAlreadyExists(new.to_string()));
}
self.index_mapping.delete(wtxn, current)?;
self.index_mapping.put(wtxn, new, &uuid)?;
Ok(())
}
/// The stats of an index.
///
/// If available in the cache, they are directly returned.
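
The `rename` method shown in the hunk above re-points an index name at an existing UUID: it looks up the current name, refuses to overwrite an existing target name, then deletes the old key and writes the new one. A minimal sketch of that sequence follows, assuming an in-memory `BTreeMap` in place of the `index_mapping` database and a `u128` in place of `Uuid`; `Mapping` and `RenameError` are hypothetical names used only for illustration and are not part of the codebase.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for the scheduler's error type (assumption).
#[derive(Debug, PartialEq)]
pub enum RenameError {
    IndexNotFound(String),
    IndexAlreadyExists(String),
}

/// Hypothetical in-memory stand-in for the name -> uuid database (assumption).
pub struct Mapping {
    by_name: BTreeMap<String, u128>,
}

impl Mapping {
    pub fn new() -> Self {
        Self { by_name: BTreeMap::new() }
    }

    pub fn insert(&mut self, name: &str, uuid: u128) {
        self.by_name.insert(name.to_string(), uuid);
    }

    pub fn get(&self, name: &str) -> Option<u128> {
        self.by_name.get(name).copied()
    }

    /// Same order of checks as the method in the diff:
    /// 1. the current name must exist,
    /// 2. the new name must not exist,
    /// 3. the uuid is moved from the old key to the new key.
    pub fn rename(&mut self, current: &str, new: &str) -> Result<(), RenameError> {
        let uuid = self
            .get(current)
            .ok_or_else(|| RenameError::IndexNotFound(current.to_string()))?;
        if self.by_name.contains_key(new) {
            return Err(RenameError::IndexAlreadyExists(new.to_string()));
        }
        self.by_name.remove(current);
        self.by_name.insert(new.to_string(), uuid);
        Ok(())
    }
}
```

Because only the name key changes, the UUID (and therefore the on-disk folder derived from it) stays the same. In the real code both writes happen inside the caller's `RwTxn`, so they are committed or discarded together; the sketch does not model that.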


@@ -1,16 +1,15 @@
use std::collections::BTreeSet;
use std::fmt::Write;
use meilisearch_types::batches::{Batch, BatchEnqueuedAt, BatchStats};
use meilisearch_types::batches::Batch;
use meilisearch_types::heed::types::{SerdeBincode, SerdeJson, Str};
use meilisearch_types::heed::{Database, RoTxn};
use meilisearch_types::milli::{CboRoaringBitmapCodec, RoaringBitmapCodec, BEU32};
use meilisearch_types::tasks::{Details, Kind, Status, Task};
use meilisearch_types::versioning;
use meilisearch_types::tasks::{Details, Task};
use roaring::RoaringBitmap;
use crate::index_mapper::IndexMapper;
use crate::{IndexScheduler, BEI128};
use crate::{IndexScheduler, Kind, Status, BEI128};
pub fn snapshot_index_scheduler(scheduler: &IndexScheduler) -> String {
// Since we'll snapshot the index right afterward, we don't need to ensure it's internally consistent for every run.
@@ -19,38 +18,55 @@ pub fn snapshot_index_scheduler(scheduler: &IndexScheduler) -> String {
scheduler.assert_internally_consistent();
let IndexScheduler {
autobatching_enabled,
cleanup_enabled: _,
experimental_no_edition_2024_for_dumps: _,
must_stop_processing: _,
processing_tasks,
file_store,
env,
version,
queue,
scheduler,
persisted,
all_tasks,
all_batches,
batch_to_tasks_mapping,
// task reverse index
status,
kind,
index_tasks,
canceled_by,
enqueued_at,
started_at,
finished_at,
// batch reverse index
batch_status,
batch_kind,
batch_index_tasks,
batch_enqueued_at,
batch_started_at,
batch_finished_at,
index_mapper,
features: _,
webhooks: _,
max_number_of_tasks: _,
max_number_of_batched_tasks: _,
wake_up: _,
dumps_path: _,
snapshots_path: _,
auth_path: _,
version_file_path: _,
webhook_url: _,
webhook_authorization_header: _,
test_breakpoint_sdr: _,
planned_failures: _,
run_loop_iteration: _,
embedders: _,
chat_settings: _,
} = scheduler;
let rtxn = env.read_txn().unwrap();
let mut snap = String::new();
let indx_sched_version = version.get_version(&rtxn).unwrap();
let latest_version =
(versioning::VERSION_MAJOR, versioning::VERSION_MINOR, versioning::VERSION_PATCH);
if indx_sched_version != Some(latest_version) {
snap.push_str(&format!("index scheduler running on version {indx_sched_version:?}\n"));
}
let processing = processing_tasks.read().unwrap().clone();
snap.push_str(&format!("### Autobatching Enabled = {}\n", scheduler.autobatching_enabled));
snap.push_str(&format!("### Autobatching Enabled = {autobatching_enabled}\n"));
snap.push_str(&format!(
"### Processing batch {:?}:\n",
processing.batch.as_ref().map(|batch| batch.uid)
@@ -62,27 +78,20 @@ pub fn snapshot_index_scheduler(scheduler: &IndexScheduler) -> String {
}
snap.push_str("\n----------------------------------------------------------------------\n");
let persisted_db_snapshot = snapshot_persisted_db(&rtxn, persisted);
if !persisted_db_snapshot.is_empty() {
snap.push_str("### Persisted:\n");
snap.push_str(&persisted_db_snapshot);
snap.push_str("----------------------------------------------------------------------\n");
}
snap.push_str("### All Tasks:\n");
snap.push_str(&snapshot_all_tasks(&rtxn, queue.tasks.all_tasks));
snap.push_str(&snapshot_all_tasks(&rtxn, *all_tasks));
snap.push_str("----------------------------------------------------------------------\n");
snap.push_str("### Status:\n");
snap.push_str(&snapshot_status(&rtxn, queue.tasks.status));
snap.push_str(&snapshot_status(&rtxn, *status));
snap.push_str("----------------------------------------------------------------------\n");
snap.push_str("### Kind:\n");
snap.push_str(&snapshot_kind(&rtxn, queue.tasks.kind));
snap.push_str(&snapshot_kind(&rtxn, *kind));
snap.push_str("----------------------------------------------------------------------\n");
snap.push_str("### Index Tasks:\n");
snap.push_str(&snapshot_index_tasks(&rtxn, queue.tasks.index_tasks));
snap.push_str(&snapshot_index_tasks(&rtxn, *index_tasks));
snap.push_str("----------------------------------------------------------------------\n");
snap.push_str("### Index Mapper:\n");
@@ -90,55 +99,55 @@ pub fn snapshot_index_scheduler(scheduler: &IndexScheduler) -> String {
snap.push_str("\n----------------------------------------------------------------------\n");
snap.push_str("### Canceled By:\n");
snap.push_str(&snapshot_canceled_by(&rtxn, queue.tasks.canceled_by));
snap.push_str(&snapshot_canceled_by(&rtxn, *canceled_by));
snap.push_str("\n----------------------------------------------------------------------\n");
snap.push_str("### Enqueued At:\n");
snap.push_str(&snapshot_date_db(&rtxn, queue.tasks.enqueued_at));
snap.push_str(&snapshot_date_db(&rtxn, *enqueued_at));
snap.push_str("----------------------------------------------------------------------\n");
snap.push_str("### Started At:\n");
snap.push_str(&snapshot_date_db(&rtxn, queue.tasks.started_at));
snap.push_str(&snapshot_date_db(&rtxn, *started_at));
snap.push_str("----------------------------------------------------------------------\n");
snap.push_str("### Finished At:\n");
snap.push_str(&snapshot_date_db(&rtxn, queue.tasks.finished_at));
snap.push_str(&snapshot_date_db(&rtxn, *finished_at));
snap.push_str("----------------------------------------------------------------------\n");
snap.push_str("### All Batches:\n");
snap.push_str(&snapshot_all_batches(&rtxn, queue.batches.all_batches));
snap.push_str(&snapshot_all_batches(&rtxn, *all_batches));
snap.push_str("----------------------------------------------------------------------\n");
snap.push_str("### Batch to tasks mapping:\n");
snap.push_str(&snapshot_batches_to_tasks_mappings(&rtxn, queue.batch_to_tasks_mapping));
snap.push_str(&snapshot_batches_to_tasks_mappings(&rtxn, *batch_to_tasks_mapping));
snap.push_str("----------------------------------------------------------------------\n");
snap.push_str("### Batches Status:\n");
snap.push_str(&snapshot_status(&rtxn, queue.batches.status));
snap.push_str(&snapshot_status(&rtxn, *batch_status));
snap.push_str("----------------------------------------------------------------------\n");
snap.push_str("### Batches Kind:\n");
snap.push_str(&snapshot_kind(&rtxn, queue.batches.kind));
snap.push_str(&snapshot_kind(&rtxn, *batch_kind));
snap.push_str("----------------------------------------------------------------------\n");
snap.push_str("### Batches Index Tasks:\n");
snap.push_str(&snapshot_index_tasks(&rtxn, queue.batches.index_tasks));
snap.push_str(&snapshot_index_tasks(&rtxn, *batch_index_tasks));
snap.push_str("----------------------------------------------------------------------\n");
snap.push_str("### Batches Enqueued At:\n");
snap.push_str(&snapshot_date_db(&rtxn, queue.batches.enqueued_at));
snap.push_str(&snapshot_date_db(&rtxn, *batch_enqueued_at));
snap.push_str("----------------------------------------------------------------------\n");
snap.push_str("### Batches Started At:\n");
snap.push_str(&snapshot_date_db(&rtxn, queue.batches.started_at));
snap.push_str(&snapshot_date_db(&rtxn, *batch_started_at));
snap.push_str("----------------------------------------------------------------------\n");
snap.push_str("### Batches Finished At:\n");
snap.push_str(&snapshot_date_db(&rtxn, queue.batches.finished_at));
snap.push_str(&snapshot_date_db(&rtxn, *batch_finished_at));
snap.push_str("----------------------------------------------------------------------\n");
snap.push_str("### File Store:\n");
snap.push_str(&snapshot_file_store(&queue.file_store));
snap.push_str(&snapshot_file_store(file_store));
snap.push_str("\n----------------------------------------------------------------------\n");
snap
@@ -207,16 +216,6 @@ pub fn snapshot_date_db(rtxn: &RoTxn, db: Database<BEI128, CboRoaringBitmapCodec
snap
}
pub fn snapshot_persisted_db(rtxn: &RoTxn, db: &Database<Str, Str>) -> String {
let mut snap = String::new();
let iter = db.iter(rtxn).unwrap();
for next in iter {
let (key, value) = next.unwrap();
snap.push_str(&format!("{key}: {value}\n"));
}
snap
}
pub fn snapshot_task(task: &Task) -> String {
let mut snap = String::new();
let Task {
@@ -230,7 +229,6 @@ pub fn snapshot_task(task: &Task) -> String {
details,
status,
kind,
network,
} = task;
snap.push('{');
snap.push_str(&format!("uid: {uid}, "));
@@ -248,9 +246,6 @@ pub fn snapshot_task(task: &Task) -> String {
snap.push_str(&format!("details: {}, ", &snapshot_details(details)));
}
snap.push_str(&format!("kind: {kind:?}"));
if let Some(network) = network {
snap.push_str(&format!("network: {network:?}, "))
}
snap.push('}');
snap
@@ -278,8 +273,8 @@ fn snapshot_details(d: &Details) -> String {
Details::SettingsUpdate { settings } => {
format!("{{ settings: {settings:?} }}")
}
Details::IndexInfo { primary_key, new_index_uid, old_index_uid } => {
format!("{{ primary_key: {primary_key:?}, old_new_uid: {old_index_uid:?}, new_index_uid: {new_index_uid:?} }}")
Details::IndexInfo { primary_key } => {
format!("{{ primary_key: {primary_key:?} }}")
}
Details::DocumentDeletion {
provided_ids: received_document_ids,
@@ -311,15 +306,6 @@ fn snapshot_details(d: &Details) -> String {
Details::IndexSwap { swaps } => {
format!("{{ swaps: {swaps:?} }}")
}
Details::Export { url, api_key, payload_size, indexes } => {
format!("{{ url: {url:?}, api_key: {api_key:?}, payload_size: {payload_size:?}, indexes: {indexes:?} }}")
}
Details::UpgradeDatabase { from, to } => {
format!("{{ from: {from:?}, to: {to:?} }}")
}
Details::IndexCompaction { index_uid, pre_compaction_size, post_compaction_size } => {
format!("{{ index_uid: {index_uid:?}, pre_compaction_size: {pre_compaction_size:?}, post_compaction_size: {post_compaction_size:?} }}")
}
}
}
@@ -335,7 +321,6 @@ pub fn snapshot_status(
}
snap
}
pub fn snapshot_kind(rtxn: &RoTxn, db: Database<SerdeBincode<Kind>, RoaringBitmapCodec>) -> String {
let mut snap = String::new();
let iter = db.iter(rtxn).unwrap();
@@ -356,7 +341,6 @@ pub fn snapshot_index_tasks(rtxn: &RoTxn, db: Database<Str, RoaringBitmapCodec>)
}
snap
}
pub fn snapshot_canceled_by(rtxn: &RoTxn, db: Database<BEU32, RoaringBitmapCodec>) -> String {
let mut snap = String::new();
let iter = db.iter(rtxn).unwrap();
@@ -369,41 +353,14 @@ pub fn snapshot_canceled_by(rtxn: &RoTxn, db: Database<BEU32, RoaringBitmapCodec
pub fn snapshot_batch(batch: &Batch) -> String {
let mut snap = String::new();
let Batch {
uid,
details,
stats,
embedder_stats,
started_at,
finished_at,
progress: _,
enqueued_at,
stop_reason,
} = batch;
let stats = BatchStats {
progress_trace: Default::default(),
internal_database_sizes: Default::default(),
write_channel_congestion: None,
..stats.clone()
};
let Batch { uid, details, stats, started_at, finished_at, progress: _ } = batch;
if let Some(finished_at) = finished_at {
assert!(finished_at > started_at);
}
let BatchEnqueuedAt { earliest, oldest } = enqueued_at.unwrap();
assert!(*started_at > earliest);
assert!(earliest >= oldest);
snap.push('{');
snap.push_str(&format!("uid: {uid}, "));
snap.push_str(&format!("details: {}, ", serde_json::to_string(details).unwrap()));
snap.push_str(&format!("stats: {}, ", serde_json::to_string(&stats).unwrap()));
if !embedder_stats.skip_serializing() {
snap.push_str(&format!(
"embedder stats: {}, ",
serde_json::to_string(&embedder_stats).unwrap()
));
}
snap.push_str(&format!("stop reason: {}, ", serde_json::to_string(&stop_reason).unwrap()));
snap.push_str(&format!("stats: {}, ", serde_json::to_string(stats).unwrap()));
snap.push('}');
snap
}
@@ -416,8 +373,7 @@ pub fn snapshot_index_mapper(rtxn: &RoTxn, mapper: &IndexMapper) -> String {
let stats = mapper.stats_of(rtxn, &name).unwrap();
s.push_str(&format!(
"{name}: {{ number_of_documents: {}, field_distribution: {:?} }}\n",
stats.documents_database_stats.number_of_entries(),
stats.field_distribution
stats.number_of_documents, stats.field_distribution
));
}

File diff suppressed because it is too large


@@ -1,12 +1,14 @@
use std::borrow::Cow;
use std::sync::Arc;
use meilisearch_types::milli::progress::{AtomicSubStep, NamedStep, Progress, ProgressView};
use enum_iterator::Sequence;
use meilisearch_types::milli::progress::{AtomicSubStep, NamedStep, Progress, ProgressView, Step};
use meilisearch_types::milli::{make_atomic_progress, make_enum_progress};
use roaring::RoaringBitmap;
use crate::utils::ProcessingBatch;
#[derive(Clone, Default)]
#[derive(Clone)]
pub struct ProcessingTasks {
pub batch: Option<Arc<ProcessingBatch>>,
/// The list of tasks ids that are currently running.
@@ -18,7 +20,7 @@ pub struct ProcessingTasks {
impl ProcessingTasks {
/// Creates an empty `ProcessingTasks` struct.
pub fn new() -> ProcessingTasks {
ProcessingTasks::default()
ProcessingTasks { batch: None, processing: Arc::new(RoaringBitmap::new()), progress: None }
}
pub fn get_progress_view(&self) -> Option<ProgressView> {
@@ -64,18 +66,9 @@ make_enum_progress! {
}
}
make_enum_progress! {
pub enum FinalizingIndexStep {
Committing,
ComputingStats,
}
}
make_enum_progress! {
pub enum TaskCancelationProgress {
RetrievingTasks,
CancelingUpgrade,
CleaningCompactionLeftover,
UpdatingTasks,
}
}
@@ -104,12 +97,9 @@ make_enum_progress! {
pub enum DumpCreationProgress {
StartTheDumpCreation,
DumpTheApiKeys,
DumpTheChatCompletionSettings,
DumpTheTasks,
DumpTheBatches,
DumpTheIndexes,
DumpTheExperimentalFeatures,
DumpTheWebhooks,
CompressTheDump,
}
}
@@ -139,17 +129,6 @@ make_enum_progress! {
}
}
make_enum_progress! {
pub enum IndexCompaction {
RetrieveTheIndex,
CreateTemporaryFile,
CopyAndCompactTheIndex,
PersistTheCompactedIndex,
CloseTheIndex,
ReopenTheIndex,
}
}
make_enum_progress! {
pub enum InnerSwappingTwoIndexes {
RetrieveTheTasks,
@@ -189,19 +168,36 @@ make_enum_progress! {
}
}
make_enum_progress! {
pub enum Export {
EnsuringCorrectnessOfTheTarget,
ExportingTheSettings,
ExportingTheDocuments,
make_atomic_progress!(Task alias AtomicTaskStep => "task" );
make_atomic_progress!(Document alias AtomicDocumentStep => "document" );
make_atomic_progress!(Batch alias AtomicBatchStep => "batch" );
make_atomic_progress!(UpdateFile alias AtomicUpdateFileStep => "update file" );
pub struct VariableNameStep {
name: String,
current: u32,
total: u32,
}
impl VariableNameStep {
pub fn new(name: impl Into<String>, current: u32, total: u32) -> Self {
Self { name: name.into(), current, total }
}
}
make_atomic_progress!(Task alias AtomicTaskStep => "task" );
make_atomic_progress!(Document alias AtomicDocumentStep => "document" );
make_atomic_progress!(Index alias AtomicIndexStep => "index" );
make_atomic_progress!(Batch alias AtomicBatchStep => "batch" );
make_atomic_progress!(UpdateFile alias AtomicUpdateFileStep => "update file" );
impl Step for VariableNameStep {
fn name(&self) -> Cow<'static, str> {
self.name.clone().into()
}
fn current(&self) -> u32 {
self.current
}
fn total(&self) -> u32 {
self.total
}
}
#[cfg(test)]
mod test {
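
The `VariableNameStep` type in the hunk above is a progress step whose name is chosen at runtime rather than taken from an enum variant. A self-contained sketch of how such a step could be consumed follows. The `Step` trait here is a local stand-in reduced to the three methods visible in the diff (the real trait lives in `meilisearch_types::milli::progress` and may have additional bounds), and `describe` is a hypothetical helper added only for illustration.

```rust
use std::borrow::Cow;

/// Local stand-in for the `Step` trait (assumption: reduced to the
/// three methods implemented in the diff).
pub trait Step {
    fn name(&self) -> Cow<'static, str>;
    fn current(&self) -> u32;
    fn total(&self) -> u32;
}

/// A step whose name is only known at runtime, e.g. an index name.
pub struct VariableNameStep {
    name: String,
    current: u32,
    total: u32,
}

impl VariableNameStep {
    pub fn new(name: impl Into<String>, current: u32, total: u32) -> Self {
        Self { name: name.into(), current, total }
    }
}

impl Step for VariableNameStep {
    fn name(&self) -> Cow<'static, str> {
        // The owned String is cloned so the returned Cow can be 'static.
        self.name.clone().into()
    }

    fn current(&self) -> u32 {
        self.current
    }

    fn total(&self) -> u32 {
        self.total
    }
}

/// Hypothetical helper: renders any step as "name (current/total)".
pub fn describe(step: &dyn Step) -> String {
    format!("{} ({}/{})", step.name(), step.current(), step.total())
}
```

Returning `Cow<'static, str>` lets enum-based steps hand out borrowed string literals without allocating, while a runtime-named step such as this one pays for one clone per call.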


@@ -1,613 +0,0 @@
use std::collections::HashSet;
use std::ops::{Bound, RangeBounds};
use meilisearch_types::batches::{Batch, BatchId};
use meilisearch_types::heed::types::{DecodeIgnore, SerdeBincode, SerdeJson, Str};
use meilisearch_types::heed::{Database, Env, RoTxn, RwTxn, WithoutTls};
use meilisearch_types::milli::{CboRoaringBitmapCodec, RoaringBitmapCodec, BEU32};
use meilisearch_types::tasks::{Kind, Status};
use roaring::{MultiOps, RoaringBitmap};
use time::OffsetDateTime;
use super::{Query, Queue};
use crate::processing::ProcessingTasks;
use crate::utils::{
insert_task_datetime, keep_ids_within_datetimes, map_bound,
remove_n_tasks_datetime_earlier_than, remove_task_datetime, ProcessingBatch,
};
use crate::{Error, Result, BEI128};
/// The number of databases used by the batch queue.
const NUMBER_OF_DATABASES: u32 = 7;
/// Database const names for the `IndexScheduler`.
mod db_name {
pub const ALL_BATCHES: &str = "all-batches";
pub const BATCH_STATUS: &str = "batch-status";
pub const BATCH_KIND: &str = "batch-kind";
pub const BATCH_INDEX_TASKS: &str = "batch-index-tasks";
pub const BATCH_ENQUEUED_AT: &str = "batch-enqueued-at";
pub const BATCH_STARTED_AT: &str = "batch-started-at";
pub const BATCH_FINISHED_AT: &str = "batch-finished-at";
}
pub struct BatchQueue {
/// Contains all the batches accessible by their Id.
pub(crate) all_batches: Database<BEU32, SerdeJson<Batch>>,
/// All the batches containing a task matching the selected status.
pub(crate) status: Database<SerdeBincode<Status>, RoaringBitmapCodec>,
/// All the batches ids grouped by the kind of their task.
pub(crate) kind: Database<SerdeBincode<Kind>, RoaringBitmapCodec>,
/// Store the batches associated to an index.
pub(crate) index_tasks: Database<Str, RoaringBitmapCodec>,
/// Store the batches containing tasks which were enqueued at a specific date
pub(crate) enqueued_at: Database<BEI128, CboRoaringBitmapCodec>,
/// Store the batches containing finished tasks started at a specific date
pub(crate) started_at: Database<BEI128, CboRoaringBitmapCodec>,
/// Store the batches containing tasks finished at a specific date
pub(crate) finished_at: Database<BEI128, CboRoaringBitmapCodec>,
}
impl BatchQueue {
pub(crate) fn private_clone(&self) -> BatchQueue {
BatchQueue {
all_batches: self.all_batches,
status: self.status,
kind: self.kind,
index_tasks: self.index_tasks,
enqueued_at: self.enqueued_at,
started_at: self.started_at,
finished_at: self.finished_at,
}
}
pub(crate) const fn nb_db() -> u32 {
NUMBER_OF_DATABASES
}
pub(super) fn new(env: &Env<WithoutTls>, wtxn: &mut RwTxn) -> Result<Self> {
Ok(Self {
all_batches: env.create_database(wtxn, Some(db_name::ALL_BATCHES))?,
status: env.create_database(wtxn, Some(db_name::BATCH_STATUS))?,
kind: env.create_database(wtxn, Some(db_name::BATCH_KIND))?,
index_tasks: env.create_database(wtxn, Some(db_name::BATCH_INDEX_TASKS))?,
enqueued_at: env.create_database(wtxn, Some(db_name::BATCH_ENQUEUED_AT))?,
started_at: env.create_database(wtxn, Some(db_name::BATCH_STARTED_AT))?,
finished_at: env.create_database(wtxn, Some(db_name::BATCH_FINISHED_AT))?,
})
}
pub(crate) fn all_batch_ids(&self, rtxn: &RoTxn) -> Result<RoaringBitmap> {
enum_iterator::all().map(|s| self.get_status(rtxn, s)).union()
}
pub(crate) fn next_batch_id(&self, rtxn: &RoTxn) -> Result<BatchId> {
Ok(self
.all_batches
.remap_data_type::<DecodeIgnore>()
.last(rtxn)?
.map(|(k, _)| k + 1)
.unwrap_or_default())
}
pub(crate) fn get_batch(&self, rtxn: &RoTxn, batch_id: BatchId) -> Result<Option<Batch>> {
Ok(self.all_batches.get(rtxn, &batch_id)?)
}
/// Returns the whole set of batches that belong to this index.
pub(crate) fn index_batches(&self, rtxn: &RoTxn, index: &str) -> Result<RoaringBitmap> {
Ok(self.index_tasks.get(rtxn, index)?.unwrap_or_default())
}
pub(crate) fn update_index(
&self,
wtxn: &mut RwTxn,
index: &str,
f: impl Fn(&mut RoaringBitmap),
) -> Result<()> {
let mut batches = self.index_batches(wtxn, index)?;
f(&mut batches);
if batches.is_empty() {
self.index_tasks.delete(wtxn, index)?;
} else {
self.index_tasks.put(wtxn, index, &batches)?;
}
Ok(())
}
pub(crate) fn get_status(&self, rtxn: &RoTxn, status: Status) -> Result<RoaringBitmap> {
Ok(self.status.get(rtxn, &status)?.unwrap_or_default())
}
pub(crate) fn put_status(
&self,
wtxn: &mut RwTxn,
status: Status,
bitmap: &RoaringBitmap,
) -> Result<()> {
Ok(self.status.put(wtxn, &status, bitmap)?)
}
pub(crate) fn update_status(
&self,
wtxn: &mut RwTxn,
status: Status,
f: impl Fn(&mut RoaringBitmap),
) -> Result<()> {
let mut tasks = self.get_status(wtxn, status)?;
f(&mut tasks);
self.put_status(wtxn, status, &tasks)?;
Ok(())
}
pub(crate) fn get_kind(&self, rtxn: &RoTxn, kind: Kind) -> Result<RoaringBitmap> {
Ok(self.kind.get(rtxn, &kind)?.unwrap_or_default())
}
pub(crate) fn put_kind(
&self,
wtxn: &mut RwTxn,
kind: Kind,
bitmap: &RoaringBitmap,
) -> Result<()> {
Ok(self.kind.put(wtxn, &kind, bitmap)?)
}
pub(crate) fn update_kind(
&self,
wtxn: &mut RwTxn,
kind: Kind,
f: impl Fn(&mut RoaringBitmap),
) -> Result<()> {
let mut tasks = self.get_kind(wtxn, kind)?;
f(&mut tasks);
self.put_kind(wtxn, kind, &tasks)?;
Ok(())
}
pub(crate) fn write_batch(&self, wtxn: &mut RwTxn, batch: ProcessingBatch) -> Result<()> {
let old_batch = self.all_batches.get(wtxn, &batch.uid)?;
self.all_batches.put(
wtxn,
&batch.uid,
&Batch {
uid: batch.uid,
progress: None,
details: batch.details,
stats: batch.stats,
embedder_stats: batch.embedder_stats.as_ref().into(),
started_at: batch.started_at,
finished_at: batch.finished_at,
enqueued_at: batch.enqueued_at,
stop_reason: batch.reason.to_string(),
},
)?;
// Update the statuses
if let Some(ref old_batch) = old_batch {
for status in old_batch.stats.status.keys() {
self.update_status(wtxn, *status, |bitmap| {
bitmap.remove(batch.uid);
})?;
}
}
for status in batch.statuses {
self.update_status(wtxn, status, |bitmap| {
bitmap.insert(batch.uid);
})?;
}
// Update the kinds / types
if let Some(ref old_batch) = old_batch {
let kinds: HashSet<_> = old_batch.stats.types.keys().cloned().collect();
for kind in kinds.difference(&batch.kinds) {
self.update_kind(wtxn, *kind, |bitmap| {
bitmap.remove(batch.uid);
})?;
}
}
for kind in batch.kinds {
self.update_kind(wtxn, kind, |bitmap| {
bitmap.insert(batch.uid);
})?;
}
// Update the indexes
if let Some(ref old_batch) = old_batch {
let indexes: HashSet<_> = old_batch.stats.index_uids.keys().cloned().collect();
for index in indexes.difference(&batch.indexes) {
self.update_index(wtxn, index, |bitmap| {
bitmap.remove(batch.uid);
})?;
}
}
for index in batch.indexes {
self.update_index(wtxn, &index, |bitmap| {
bitmap.insert(batch.uid);
})?;
}
// Update the enqueued_at: we cannot always retrieve the previous enqueued_at from the previous batch, and
// may instead have to go through the db looking for it. We cannot look at the tasks contained in this batch either
// because they may have been removed.
// What we know, though, is that the task date is from before the enqueued_at, and at most two timestamps have been written
// to the DB per batch.
if let Some(ref old_batch) = old_batch {
if let Some(enqueued_at) = old_batch.enqueued_at {
remove_task_datetime(wtxn, self.enqueued_at, enqueued_at.earliest, old_batch.uid)?;
remove_task_datetime(wtxn, self.enqueued_at, enqueued_at.oldest, old_batch.uid)?;
} else {
// If we don't have the enqueued_at in the batch, it means the database comes from v1.12
// and we still need to find the date by scanning the database
remove_n_tasks_datetime_earlier_than(
wtxn,
self.enqueued_at,
old_batch.started_at,
old_batch.stats.total_nb_tasks.clamp(1, 2) as usize,
old_batch.uid,
)?;
}
}
// A finished batch MUST contain at least one task and have an enqueued_at
let enqueued_at = batch.enqueued_at.as_ref().unwrap();
insert_task_datetime(wtxn, self.enqueued_at, enqueued_at.earliest, batch.uid)?;
insert_task_datetime(wtxn, self.enqueued_at, enqueued_at.oldest, batch.uid)?;
// Update the started at and finished at
if let Some(ref old_batch) = old_batch {
remove_task_datetime(wtxn, self.started_at, old_batch.started_at, old_batch.uid)?;
if let Some(finished_at) = old_batch.finished_at {
remove_task_datetime(wtxn, self.finished_at, finished_at, old_batch.uid)?;
}
}
insert_task_datetime(wtxn, self.started_at, batch.started_at, batch.uid)?;
insert_task_datetime(wtxn, self.finished_at, batch.finished_at.unwrap(), batch.uid)?;
Ok(())
}
/// Convert an iterator to a `Vec` of batches. The batches MUST exist or a
/// `CorruptedTaskQueue` error will be thrown.
pub(crate) fn get_existing_batches(
&self,
rtxn: &RoTxn,
batches: impl IntoIterator<Item = BatchId>,
processing: &ProcessingTasks,
) -> Result<Vec<Batch>> {
batches
.into_iter()
.map(|batch_id| {
if Some(batch_id) == processing.batch.as_ref().map(|batch| batch.uid) {
let mut batch = processing.batch.as_ref().unwrap().to_batch();
batch.progress = processing.get_progress_view();
// Add progress_trace from the current progress state
if let Some(progress) = &processing.progress {
batch.stats.progress_trace = progress
.accumulated_durations()
.into_iter()
.map(|(k, v)| (k, v.into()))
.collect();
}
Ok(batch)
} else {
self.get_batch(rtxn, batch_id)
.and_then(|batch| batch.ok_or(Error::CorruptedTaskQueue))
}
})
.collect::<Result<_>>()
}
}
impl Queue {
/// Return the batch ids matched by the given query from the index scheduler's point of view.
pub(crate) fn get_batch_ids(
&self,
rtxn: &RoTxn,
query: &Query,
processing: &ProcessingTasks,
) -> Result<RoaringBitmap> {
let Query {
limit,
from,
reverse,
uids,
batch_uids,
statuses,
types,
index_uids,
canceled_by,
before_enqueued_at,
after_enqueued_at,
before_started_at,
after_started_at,
before_finished_at,
after_finished_at,
} = query;
let mut batches = self.batches.all_batch_ids(rtxn)?;
if let Some(batch_id) = processing.batch.as_ref().map(|batch| batch.uid) {
batches.insert(batch_id);
}
if let Some(from) = from {
let range = if reverse.unwrap_or_default() {
u32::MIN..*from
} else {
from.saturating_add(1)..u32::MAX
};
batches.remove_range(range);
}
if let Some(batch_uids) = &batch_uids {
let batches_uids = RoaringBitmap::from_iter(batch_uids);
batches &= batches_uids;
}
if let Some(status) = &statuses {
let mut status_batches = RoaringBitmap::new();
for status in status {
match status {
// special case for Processing batches
Status::Processing => {
if let Some(batch_id) = processing.batch.as_ref().map(|batch| batch.uid) {
status_batches.insert(batch_id);
}
}
// Enqueued tasks are not stored in batches
Status::Enqueued => (),
status => status_batches |= &self.batches.get_status(rtxn, *status)?,
};
}
if !status.contains(&Status::Processing) {
if let Some(ref batch) = processing.batch {
batches.remove(batch.uid);
}
}
batches &= status_batches;
}
if let Some(task_uids) = &uids {
let mut batches_by_task_uids = RoaringBitmap::new();
for task_uid in task_uids {
if let Some(task) = self.tasks.get_task(rtxn, *task_uid)? {
if let Some(batch_uid) = task.batch_uid {
batches_by_task_uids.insert(batch_uid);
}
}
}
batches &= batches_by_task_uids;
}
// There is no database for this query, we must retrieve the task queried by the client and ensure it's valid
if let Some(canceled_by) = &canceled_by {
let mut all_canceled_batches = RoaringBitmap::new();
for cancel_uid in canceled_by {
if let Some(task) = self.tasks.get_task(rtxn, *cancel_uid)? {
if task.kind.as_kind() == Kind::TaskCancelation
&& task.status == Status::Succeeded
{
if let Some(batch_uid) = task.batch_uid {
all_canceled_batches.insert(batch_uid);
}
}
}
}
// if the canceled_by has been specified but no batch
// matches then we prefer matching zero than all batches.
if all_canceled_batches.is_empty() {
return Ok(RoaringBitmap::new());
} else {
batches &= all_canceled_batches;
}
}
if let Some(kind) = &types {
let mut kind_batches = RoaringBitmap::new();
for kind in kind {
kind_batches |= self.batches.get_kind(rtxn, *kind)?;
if let Some(uid) = processing
.batch
.as_ref()
.and_then(|batch| batch.kinds.contains(kind).then_some(batch.uid))
{
kind_batches.insert(uid);
}
}
batches &= &kind_batches;
}
if let Some(index) = &index_uids {
let mut index_batches = RoaringBitmap::new();
for index in index {
index_batches |= self.batches.index_batches(rtxn, index)?;
if let Some(uid) = processing
.batch
.as_ref()
.and_then(|batch| batch.indexes.contains(index).then_some(batch.uid))
{
index_batches.insert(uid);
}
}
batches &= &index_batches;
}
// For the started_at filter, we need to treat the batches that are processing separately from the
// batches that are not processing. The non-processing ones are filtered normally while the processing ones
// are entirely removed unless the in-memory startedAt variable falls within the date filter.
// Once we have filtered the two subsets, we put them back together and assign the result back to `batches`.
batches = {
let (mut filtered_non_processing_batches, mut filtered_processing_batches) =
(&batches - &*processing.processing, &batches & &*processing.processing);
// special case for Processing batches
// A closure that clears the filtered_processing_batches if their started_at date falls outside the given bounds
let mut clear_filtered_processing_batches =
|start: Bound<OffsetDateTime>, end: Bound<OffsetDateTime>| {
let start = map_bound(start, |b| b.unix_timestamp_nanos());
let end = map_bound(end, |b| b.unix_timestamp_nanos());
let is_within_dates = RangeBounds::contains(
&(start, end),
&processing
.batch
.as_ref()
.map_or_else(OffsetDateTime::now_utc, |batch| batch.started_at)
.unix_timestamp_nanos(),
);
if !is_within_dates {
filtered_processing_batches.clear();
}
};
match (after_started_at, before_started_at) {
(None, None) => (),
(None, Some(before)) => {
clear_filtered_processing_batches(Bound::Unbounded, Bound::Excluded(*before))
}
(Some(after), None) => {
clear_filtered_processing_batches(Bound::Excluded(*after), Bound::Unbounded)
}
(Some(after), Some(before)) => clear_filtered_processing_batches(
Bound::Excluded(*after),
Bound::Excluded(*before),
),
};
keep_ids_within_datetimes(
rtxn,
&mut filtered_non_processing_batches,
self.batches.started_at,
*after_started_at,
*before_started_at,
)?;
filtered_non_processing_batches | filtered_processing_batches
};
keep_ids_within_datetimes(
rtxn,
&mut batches,
self.batches.enqueued_at,
*after_enqueued_at,
*before_enqueued_at,
)?;
keep_ids_within_datetimes(
rtxn,
&mut batches,
self.batches.finished_at,
*after_finished_at,
*before_finished_at,
)?;
if let Some(limit) = limit {
batches = if query.reverse.unwrap_or_default() {
batches.into_iter().take(*limit as usize).collect()
} else {
batches.into_iter().rev().take(*limit as usize).collect()
};
}
Ok(batches)
}
/// Return the batch ids matching the query along with the total number of batches
/// by ignoring the from and limit parameters from the user's point of view.
///
/// There are two differences between an internal query and a query executed by
/// the user.
///
/// 1. IndexSwap tasks are not publicly associated with any index, but they are associated
/// with many indexes internally.
/// 2. The user may not have the rights to access the tasks (internally) associated with all indexes.
pub(crate) fn get_batch_ids_from_authorized_indexes(
&self,
rtxn: &RoTxn,
query: &Query,
filters: &meilisearch_auth::AuthFilter,
processing: &ProcessingTasks,
) -> Result<(RoaringBitmap, u64)> {
// compute all batches matching the filter by ignoring the limits, to find the number of batches matching
// the filter.
// As this causes us to compute the filter twice it is slightly inefficient, but doing it this way spares
// us from modifying the underlying implementation, and the performance remains sufficient.
// Should this change, we would modify `get_batch_ids` to directly return the number of matching batches.
let total_batches =
self.get_batch_ids(rtxn, &query.clone().without_limits(), processing)?;
let mut batches = self.get_batch_ids(rtxn, query, processing)?;
// If the query contains a list of index uids or there is a finite list of authorized indexes,
// then we must exclude all the batches that only contain tasks associated with multiple indexes.
// This works because we don't autobatch tasks associated with multiple indexes with tasks associated
// with a single index. e.g: IndexSwap cannot be batched with IndexCreation.
if query.index_uids.is_some() || !filters.all_indexes_authorized() {
for kind in enum_iterator::all::<Kind>().filter(|kind| !kind.related_to_one_index()) {
batches -= self.tasks.get_kind(rtxn, kind)?;
if let Some(batch) = processing.batch.as_ref() {
if batch.kinds.contains(&kind) {
batches.remove(batch.uid);
}
}
}
}
// Any batch that is internally associated with at least one authorized index
// must be returned.
if !filters.all_indexes_authorized() {
let mut valid_indexes = RoaringBitmap::new();
let mut forbidden_indexes = RoaringBitmap::new();
let all_indexes_iter = self.batches.index_tasks.iter(rtxn)?;
for result in all_indexes_iter {
let (index, index_tasks) = result?;
if filters.is_index_authorized(index) {
valid_indexes |= index_tasks;
} else {
forbidden_indexes |= index_tasks;
}
}
if let Some(batch) = processing.batch.as_ref() {
for index in &batch.indexes {
if filters.is_index_authorized(index) {
valid_indexes.insert(batch.uid);
} else {
forbidden_indexes.insert(batch.uid);
}
}
}
// If a batch had ONE valid task then it should be returned
let invalid_batches = forbidden_indexes - valid_indexes;
batches -= invalid_batches;
}
Ok((batches, total_batches.len()))
}
pub(crate) fn get_batches_from_authorized_indexes(
&self,
rtxn: &RoTxn,
query: &Query,
filters: &meilisearch_auth::AuthFilter,
processing: &ProcessingTasks,
) -> Result<(Vec<Batch>, u64)> {
let (batches, total) =
self.get_batch_ids_from_authorized_indexes(rtxn, query, filters, processing)?;
let batches = if query.reverse.unwrap_or_default() {
Box::new(batches.into_iter()) as Box<dyn Iterator<Item = u32>>
} else {
Box::new(batches.into_iter().rev()) as Box<dyn Iterator<Item = u32>>
};
let batches = self.batches.get_existing_batches(
rtxn,
batches.take(query.limit.unwrap_or(u32::MAX) as usize),
processing,
)?;
Ok((batches, total))
}
}
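
The authorization step at the end of `get_batch_ids_from_authorized_indexes` can be sketched in isolation. This is a minimal illustration, not the scheduler's code: it assumes `std::collections::BTreeSet<u32>` as a stand-in for `RoaringBitmap`, and the function name `keep_authorized_batches` and its parameters are hypothetical. It shows why a batch touching both an authorized and a forbidden index is still returned: only ids in `forbidden - valid` are removed.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical stand-in for the authorization step.
/// `index_batches` maps an index name to the ids of the batches touching it,
/// and `authorized` lists the index names the caller is allowed to see.
fn keep_authorized_batches(
    batches: &BTreeSet<u32>,
    index_batches: &BTreeMap<&str, BTreeSet<u32>>,
    authorized: &[&str],
) -> BTreeSet<u32> {
    let mut valid = BTreeSet::new();
    let mut forbidden = BTreeSet::new();
    for (index, ids) in index_batches {
        if authorized.contains(index) {
            valid.extend(ids.iter().copied());
        } else {
            forbidden.extend(ids.iter().copied());
        }
    }
    // A batch is invalid only if it touches forbidden indexes and no authorized one.
    let invalid: BTreeSet<u32> = forbidden.difference(&valid).copied().collect();
    batches.difference(&invalid).copied().collect()
}
```

With batches 0 and 2 on `catto`, batches 1 and 2 on `doggo`, and only `doggo` authorized, batch 0 is dropped while batch 2 is kept because it has at least one authorized index.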

@@ -1,489 +0,0 @@
use meili_snap::snapshot;
use meilisearch_auth::AuthFilter;
use meilisearch_types::index_uid_pattern::IndexUidPattern;
use meilisearch_types::tasks::{IndexSwap, KindWithContent, Status};
use time::{Duration, OffsetDateTime};
use crate::insta_snapshot::{snapshot_bitmap, snapshot_index_scheduler};
use crate::test_utils::Breakpoint::*;
use crate::test_utils::{index_creation_task, FailureLocation};
use crate::{IndexScheduler, Query};
#[test]
fn query_batches_from_and_limit() {
let (index_scheduler, mut handle) = IndexScheduler::test(true, vec![]);
let kind = index_creation_task("doggo", "bone");
let _task = index_scheduler.register(kind, None, false).unwrap();
snapshot!(snapshot_index_scheduler(&index_scheduler), name: "registered_the_first_task");
let kind = index_creation_task("whalo", "plankton");
let _task = index_scheduler.register(kind, None, false).unwrap();
snapshot!(snapshot_index_scheduler(&index_scheduler), name: "registered_the_second_task");
let kind = index_creation_task("catto", "his_own_vomit");
let _task = index_scheduler.register(kind, None, false).unwrap();
snapshot!(snapshot_index_scheduler(&index_scheduler), name: "registered_the_third_task");
handle.advance_n_successful_batches(3);
snapshot!(snapshot_index_scheduler(&index_scheduler), name: "processed_all_tasks");
let proc = index_scheduler.processing_tasks.read().unwrap().clone();
let rtxn = index_scheduler.env.read_txn().unwrap();
let query = Query { limit: Some(0), ..Default::default() };
let (batches, _) = index_scheduler
.queue
.get_batch_ids_from_authorized_indexes(&rtxn, &query, &AuthFilter::default(), &proc)
.unwrap();
snapshot!(snapshot_bitmap(&batches), @"[]");
let query = Query { limit: Some(1), ..Default::default() };
let (batches, _) = index_scheduler
.queue
.get_batch_ids_from_authorized_indexes(&rtxn, &query, &AuthFilter::default(), &proc)
.unwrap();
snapshot!(snapshot_bitmap(&batches), @"[2,]");
let query = Query { limit: Some(2), ..Default::default() };
let (batches, _) = index_scheduler
.queue
.get_batch_ids_from_authorized_indexes(&rtxn, &query, &AuthFilter::default(), &proc)
.unwrap();
snapshot!(snapshot_bitmap(&batches), @"[1,2,]");
let query = Query { from: Some(1), ..Default::default() };
let (batches, _) = index_scheduler
.queue
.get_batch_ids_from_authorized_indexes(&rtxn, &query, &AuthFilter::default(), &proc)
.unwrap();
snapshot!(snapshot_bitmap(&batches), @"[0,1,]");
let query = Query { from: Some(2), ..Default::default() };
let (batches, _) = index_scheduler
.queue
.get_batch_ids_from_authorized_indexes(&rtxn, &query, &AuthFilter::default(), &proc)
.unwrap();
snapshot!(snapshot_bitmap(&batches), @"[0,1,2,]");
let query = Query { from: Some(1), limit: Some(1), ..Default::default() };
let (batches, _) = index_scheduler
.queue
.get_batch_ids_from_authorized_indexes(&rtxn, &query, &AuthFilter::default(), &proc)
.unwrap();
snapshot!(snapshot_bitmap(&batches), @"[1,]");
let query = Query { from: Some(1), limit: Some(2), ..Default::default() };
let (batches, _) = index_scheduler
.queue
.get_batch_ids_from_authorized_indexes(&rtxn, &query, &AuthFilter::default(), &proc)
.unwrap();
snapshot!(snapshot_bitmap(&batches), @"[0,1,]");
}
#[test]
fn query_batches_simple() {
let start_time = OffsetDateTime::now_utc();
let (index_scheduler, mut handle) =
IndexScheduler::test(true, vec![(3, FailureLocation::InsideProcessBatch)]);
let kind = index_creation_task("catto", "mouse");
let _task = index_scheduler.register(kind, None, false).unwrap();
let kind = index_creation_task("doggo", "sheep");
let _task = index_scheduler.register(kind, None, false).unwrap();
let kind = index_creation_task("whalo", "fish");
let _task = index_scheduler.register(kind, None, false).unwrap();
snapshot!(snapshot_index_scheduler(&index_scheduler), name: "start");
handle.advance_till([Start, BatchCreated]);
let query = Query { statuses: Some(vec![Status::Processing]), ..Default::default() };
let (mut batches, _) = index_scheduler
.get_batches_from_authorized_indexes(&query, &AuthFilter::default())
.unwrap();
assert_eq!(batches.len(), 1);
batches[0].started_at = OffsetDateTime::UNIX_EPOCH;
assert!(batches[0].enqueued_at.is_some());
batches[0].enqueued_at = None;
if !batches[0].stats.progress_trace.is_empty() {
batches[0].stats.progress_trace.clear();
batches[0]
.stats
.progress_trace
.insert("processing tasks".to_string(), "deterministic_duration".into());
}
// Insta cannot snapshot our batches because the batch stats contain an enum as key: https://github.com/mitsuhiko/insta/issues/689
let batch = serde_json::to_string_pretty(&batches[0]).unwrap();
snapshot!(batch, @r###"
{
"uid": 0,
"details": {
"primaryKey": "mouse"
},
"stats": {
"totalNbTasks": 1,
"status": {
"processing": 1
},
"types": {
"indexCreation": 1
},
"indexUids": {
"catto": 1
},
"progressTrace": {
"processing tasks": "deterministic_duration"
}
},
"startedAt": "1970-01-01T00:00:00Z",
"finishedAt": null,
"enqueuedAt": null,
"stopReason": "created batch containing only task with id 0 of type `indexCreation` that cannot be batched with any other task."
}
"###);
let query = Query { statuses: Some(vec![Status::Enqueued]), ..Default::default() };
let (batches, _) = index_scheduler
.get_batch_ids_from_authorized_indexes(&query, &AuthFilter::default())
.unwrap();
snapshot!(snapshot_bitmap(&batches), @"[]"); // The batches don't contain any enqueued tasks
let query =
Query { statuses: Some(vec![Status::Enqueued, Status::Processing]), ..Default::default() };
let (batches, _) = index_scheduler
.get_batch_ids_from_authorized_indexes(&query, &AuthFilter::default())
.unwrap();
snapshot!(snapshot_bitmap(&batches), @"[0,]"); // both enqueued and processing tasks in the first tick
let query = Query {
statuses: Some(vec![Status::Enqueued, Status::Processing]),
after_started_at: Some(start_time),
..Default::default()
};
let (batches, _) = index_scheduler
.get_batch_ids_from_authorized_indexes(&query, &AuthFilter::default())
.unwrap();
// both enqueued and processing tasks in the first tick, but limited to those with a started_at
// that comes after the start of the test, which should exclude the enqueued tasks
snapshot!(snapshot_bitmap(&batches), @"[0,]");
let query = Query {
statuses: Some(vec![Status::Enqueued, Status::Processing]),
before_started_at: Some(start_time),
..Default::default()
};
let (batches, _) = index_scheduler
.get_batch_ids_from_authorized_indexes(&query, &AuthFilter::default())
.unwrap();
// both enqueued and processing tasks in the first tick, but limited to those with a started_at
// that comes before the start of the test, which should exclude all of them
snapshot!(snapshot_bitmap(&batches), @"[]");
let query = Query {
statuses: Some(vec![Status::Enqueued, Status::Processing]),
after_started_at: Some(start_time),
before_started_at: Some(start_time + Duration::minutes(1)),
..Default::default()
};
let (batches, _) = index_scheduler
.get_batch_ids_from_authorized_indexes(&query, &AuthFilter::default())
.unwrap();
// both enqueued and processing tasks in the first tick, but limited to those with a started_at
// that comes after the start of the test and before one minute after the start of the test,
// which should exclude the enqueued tasks and include the only processing task
snapshot!(snapshot_bitmap(&batches), @"[0,]");
handle.advance_till([
InsideProcessBatch,
InsideProcessBatch,
ProcessBatchSucceeded,
AfterProcessing,
Start,
BatchCreated,
]);
snapshot!(snapshot_index_scheduler(&index_scheduler), name: "after-advancing-a-bit");
let second_start_time = OffsetDateTime::now_utc();
let query = Query {
statuses: Some(vec![Status::Succeeded, Status::Processing]),
after_started_at: Some(start_time),
before_started_at: Some(start_time + Duration::minutes(1)),
..Default::default()
};
let (batches, _) = index_scheduler
.get_batch_ids_from_authorized_indexes(&query, &AuthFilter::default())
.unwrap();
// both succeeded and processing tasks in the first tick, but limited to those with a started_at
// that comes after the start of the test and before one minute after the start of the test,
// which should include all tasks
snapshot!(snapshot_bitmap(&batches), @"[0,1,]");
let query = Query {
statuses: Some(vec![Status::Succeeded, Status::Processing]),
before_started_at: Some(start_time),
..Default::default()
};
let (batches, _) = index_scheduler
.get_batch_ids_from_authorized_indexes(&query, &AuthFilter::default())
.unwrap();
// both succeeded and processing tasks in the first tick, but limited to those with a started_at
// that comes before the start of the test, which should exclude all tasks
snapshot!(snapshot_bitmap(&batches), @"[]");
let query = Query {
statuses: Some(vec![Status::Enqueued, Status::Succeeded, Status::Processing]),
after_started_at: Some(second_start_time),
before_started_at: Some(second_start_time + Duration::minutes(1)),
..Default::default()
};
let (batches, _) = index_scheduler
.get_batch_ids_from_authorized_indexes(&query, &AuthFilter::default())
.unwrap();
// both succeeded and processing tasks in the first tick, but limited to those with a started_at
// that comes after the start of the second part of the test and before one minute after the
// second start of the test, which should exclude all tasks
snapshot!(snapshot_bitmap(&batches), @"[]");
// now we make one more batch, the started_at field of the new tasks will be past `second_start_time`
handle.advance_till([
InsideProcessBatch,
InsideProcessBatch,
ProcessBatchSucceeded,
AfterProcessing,
Start,
BatchCreated,
]);
let (batches, _) = index_scheduler
.get_batch_ids_from_authorized_indexes(&query, &AuthFilter::default())
.unwrap();
// we run the same query to verify that, and indeed find that the last task is matched
snapshot!(snapshot_bitmap(&batches), @"[2,]");
let query = Query {
statuses: Some(vec![Status::Enqueued, Status::Succeeded, Status::Processing]),
after_started_at: Some(second_start_time),
before_started_at: Some(second_start_time + Duration::minutes(1)),
..Default::default()
};
let (batches, _) = index_scheduler
.get_batch_ids_from_authorized_indexes(&query, &AuthFilter::default())
.unwrap();
// enqueued, succeeded, or processing tasks started after the second part of the test, should
// again only return the last task
snapshot!(snapshot_bitmap(&batches), @"[2,]");
handle.advance_till([ProcessBatchFailed, AfterProcessing]);
// now the last task should have failed
snapshot!(snapshot_index_scheduler(&index_scheduler), name: "end");
let (batches, _) = index_scheduler
.get_batch_ids_from_authorized_indexes(&query, &AuthFilter::default())
.unwrap();
// so running the last query should return nothing
snapshot!(snapshot_bitmap(&batches), @"[]");
let query = Query {
statuses: Some(vec![Status::Failed]),
after_started_at: Some(second_start_time),
before_started_at: Some(second_start_time + Duration::minutes(1)),
..Default::default()
};
let (batches, _) = index_scheduler
.get_batch_ids_from_authorized_indexes(&query, &AuthFilter::default())
.unwrap();
// but the same query on failed tasks should return the last task
snapshot!(snapshot_bitmap(&batches), @"[2,]");
let query = Query {
statuses: Some(vec![Status::Failed]),
after_started_at: Some(second_start_time),
before_started_at: Some(second_start_time + Duration::minutes(1)),
..Default::default()
};
let (batches, _) = index_scheduler
.get_batch_ids_from_authorized_indexes(&query, &AuthFilter::default())
.unwrap();
// but the same query on failed tasks should return the last task
snapshot!(snapshot_bitmap(&batches), @"[2,]");
let query = Query {
statuses: Some(vec![Status::Failed]),
uids: Some(vec![1]),
after_started_at: Some(second_start_time),
before_started_at: Some(second_start_time + Duration::minutes(1)),
..Default::default()
};
let (batches, _) = index_scheduler
.get_batch_ids_from_authorized_indexes(&query, &AuthFilter::default())
.unwrap();
// same query but with an invalid uid
snapshot!(snapshot_bitmap(&batches), @"[]");
let query = Query {
statuses: Some(vec![Status::Failed]),
uids: Some(vec![2]),
after_started_at: Some(second_start_time),
before_started_at: Some(second_start_time + Duration::minutes(1)),
..Default::default()
};
let (batches, _) = index_scheduler
.get_batch_ids_from_authorized_indexes(&query, &AuthFilter::default())
.unwrap();
// same query but with a valid uid
snapshot!(snapshot_bitmap(&batches), @"[2,]");
}
#[test]
fn query_batches_special_rules() {
let (index_scheduler, mut handle) =
IndexScheduler::test(true, vec![(3, FailureLocation::InsideProcessBatch)]);
let kind = index_creation_task("catto", "mouse");
let _task = index_scheduler.register(kind, None, false).unwrap();
let kind = index_creation_task("doggo", "sheep");
let _task = index_scheduler.register(kind, None, false).unwrap();
let kind = KindWithContent::IndexSwap {
swaps: vec![IndexSwap { indexes: ("catto".to_owned(), "doggo".to_owned()), rename: false }],
};
let _task = index_scheduler.register(kind, None, false).unwrap();
let kind = KindWithContent::IndexSwap {
swaps: vec![IndexSwap { indexes: ("catto".to_owned(), "whalo".to_owned()), rename: false }],
};
let _task = index_scheduler.register(kind, None, false).unwrap();
snapshot!(snapshot_index_scheduler(&index_scheduler), name: "start");
handle.advance_till([Start, BatchCreated]);
let rtxn = index_scheduler.env.read_txn().unwrap();
let proc = index_scheduler.processing_tasks.read().unwrap().clone();
let query = Query { index_uids: Some(vec!["catto".to_owned()]), ..Default::default() };
let (batches, _) = index_scheduler
.queue
.get_batch_ids_from_authorized_indexes(&rtxn, &query, &AuthFilter::default(), &proc)
.unwrap();
// only the first task associated with catto is returned, the indexSwap tasks are excluded!
snapshot!(snapshot_bitmap(&batches), @"[0,]");
let query = Query { index_uids: Some(vec!["catto".to_owned()]), ..Default::default() };
let (batches, _) = index_scheduler
.queue
.get_batch_ids_from_authorized_indexes(
&rtxn,
&query,
&AuthFilter::with_allowed_indexes(
vec![IndexUidPattern::new_unchecked("doggo")].into_iter().collect(),
),
&proc,
)
.unwrap();
// we have asked for only the tasks associated with catto, but are only authorized to retrieve the tasks
// associated with doggo -> empty result
snapshot!(snapshot_bitmap(&batches), @"[]");
drop(rtxn);
// We're going to advance and process all the batches for the next query to actually hit the db
handle.advance_till([
InsideProcessBatch,
InsideProcessBatch,
ProcessBatchSucceeded,
AfterProcessing,
]);
handle.advance_one_successful_batch();
handle.advance_n_failed_batches(2);
snapshot!(snapshot_index_scheduler(&index_scheduler), name: "after-processing-everything");
let rtxn = index_scheduler.env.read_txn().unwrap();
let query = Query::default();
let (batches, _) = index_scheduler
.queue
.get_batch_ids_from_authorized_indexes(
&rtxn,
&query,
&AuthFilter::with_allowed_indexes(
vec![IndexUidPattern::new_unchecked("doggo")].into_iter().collect(),
),
&proc,
)
.unwrap();
// we asked for all the tasks, but we are only authorized to retrieve the doggo tasks
// -> only the index creation of doggo should be returned
snapshot!(snapshot_bitmap(&batches), @"[1,]");
let query = Query::default();
let (batches, _) = index_scheduler
.queue
.get_batch_ids_from_authorized_indexes(
&rtxn,
&query,
&AuthFilter::with_allowed_indexes(
vec![
IndexUidPattern::new_unchecked("catto"),
IndexUidPattern::new_unchecked("doggo"),
]
.into_iter()
.collect(),
),
&proc,
)
.unwrap();
// we asked for all the tasks, but we are only authorized to retrieve the doggo and catto tasks
// -> all tasks except the swap of catto with whalo are returned
snapshot!(snapshot_bitmap(&batches), @"[0,1,]");
let query = Query::default();
let (batches, _) = index_scheduler
.queue
.get_batch_ids_from_authorized_indexes(&rtxn, &query, &AuthFilter::default(), &proc)
.unwrap();
// we asked for all the tasks with all index authorized -> all tasks returned
snapshot!(snapshot_bitmap(&batches), @"[0,1,2,3,]");
}
#[test]
fn query_batches_canceled_by() {
let (index_scheduler, mut handle) =
IndexScheduler::test(true, vec![(3, FailureLocation::InsideProcessBatch)]);
let kind = index_creation_task("catto", "mouse");
let _ = index_scheduler.register(kind, None, false).unwrap();
let kind = index_creation_task("doggo", "sheep");
let _ = index_scheduler.register(kind, None, false).unwrap();
let kind = KindWithContent::IndexSwap {
swaps: vec![IndexSwap { indexes: ("catto".to_owned(), "doggo".to_owned()), rename: false }],
};
let _task = index_scheduler.register(kind, None, false).unwrap();
handle.advance_n_successful_batches(1);
let kind = KindWithContent::TaskCancelation {
query: "test_query".to_string(),
tasks: [0, 1, 2, 3].into_iter().collect(),
};
let task_cancelation = index_scheduler.register(kind, None, false).unwrap();
handle.advance_n_successful_batches(1);
snapshot!(snapshot_index_scheduler(&index_scheduler), name: "start");
let query = Query { canceled_by: Some(vec![task_cancelation.uid]), ..Query::default() };
let (batches, _) = index_scheduler
.get_batch_ids_from_authorized_indexes(&query, &AuthFilter::default())
.unwrap();
// Batch 0 was the index creation task; batch 1 is the task cancelation
snapshot!(snapshot_bitmap(&batches), @"[1,]");
let query = Query { canceled_by: Some(vec![task_cancelation.uid]), ..Query::default() };
let (batches, _) = index_scheduler
.get_batch_ids_from_authorized_indexes(
&query,
&AuthFilter::with_allowed_indexes(
vec![IndexUidPattern::new_unchecked("doggo")].into_iter().collect(),
),
)
.unwrap();
// Return only 1 because the user is not authorized to see task 2
snapshot!(snapshot_bitmap(&batches), @"[1,]");
}
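
The `limit` snapshots in `query_batches_from_and_limit` follow from how `get_batch_ids` truncates its result: by default the newest (highest) ids are kept, and with `reverse` the oldest ones are. The sketch below reproduces that behaviour under stated assumptions: `BTreeSet<u32>` replaces `RoaringBitmap`, and `apply_limit` is a hypothetical helper name, not a function of the scheduler.

```rust
use std::collections::BTreeSet;

/// Hypothetical helper mirroring the `limit` handling of `get_batch_ids`.
/// Without `reverse`, the `limit` highest ids are kept; with `reverse`,
/// the `limit` lowest ids are kept.
fn apply_limit(ids: &BTreeSet<u32>, limit: usize, reverse: bool) -> BTreeSet<u32> {
    if reverse {
        ids.iter().copied().take(limit).collect()
    } else {
        ids.iter().rev().copied().take(limit).collect()
    }
}
```

For the three batches of the test (ids 0, 1 and 2), a limit of 1 yields `[2]` and a limit of 2 yields `[1, 2]`, matching the snapshots; the result is a set, so the ids are always displayed in ascending order regardless of which end was kept.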

@@ -1,391 +0,0 @@
mod batches;
#[cfg(test)]
mod batches_test;
mod tasks;
#[cfg(test)]
mod tasks_test;
#[cfg(test)]
mod test;
use std::collections::BTreeMap;
use std::fs::File as StdFile;
use std::time::Duration;
use file_store::FileStore;
use meilisearch_types::batches::BatchId;
use meilisearch_types::heed::{Database, Env, RoTxn, RwTxn, WithoutTls};
use meilisearch_types::milli::{CboRoaringBitmapCodec, BEU32};
use meilisearch_types::tasks::{Kind, KindWithContent, Status, Task};
use roaring::RoaringBitmap;
use time::format_description::well_known::Rfc3339;
use time::OffsetDateTime;
use uuid::Uuid;
pub(crate) use self::batches::BatchQueue;
pub(crate) use self::tasks::TaskQueue;
use crate::processing::ProcessingTasks;
use crate::utils::{
check_index_swap_validity, filter_out_references_to_newer_tasks, ProcessingBatch,
};
use crate::{Error, IndexSchedulerOptions, Result, TaskId};
/// The number of databases used by the queue itself
const NUMBER_OF_DATABASES: u32 = 1;
/// Database const names for the `IndexScheduler`.
mod db_name {
pub const BATCH_TO_TASKS_MAPPING: &str = "batch-to-tasks-mapping";
}
/// Defines a subset of tasks to be retrieved from the [`IndexScheduler`].
///
/// An empty/default query (where each field is set to `None`) matches all tasks.
/// Each non-null field restricts the set of tasks further.
#[derive(Default, Debug, Clone, PartialEq, Eq)]
pub struct Query {
/// The maximum number of tasks to be matched
pub limit: Option<u32>,
/// The minimum [task id](`meilisearch_types::tasks::Task::uid`) to be matched
pub from: Option<u32>,
/// The order used to return the tasks. By default the newest tasks are returned first and the boolean is `false`.
pub reverse: Option<bool>,
/// The [task ids](`meilisearch_types::tasks::Task::uid`) to be matched
pub uids: Option<Vec<TaskId>>,
/// The [batch ids](`meilisearch_types::batches::Batch::uid`) to be matched
pub batch_uids: Option<Vec<BatchId>>,
/// The allowed [statuses](`meilisearch_types::tasks::Task::status`) of the matched tasks
pub statuses: Option<Vec<Status>>,
/// The allowed [kinds](meilisearch_types::tasks::Kind) of the matched tasks.
///
/// The kind of a task is given by:
/// ```
/// # use meilisearch_types::tasks::{Task, Kind};
/// # fn doc_func(task: Task) -> Kind {
/// task.kind.as_kind()
/// # }
/// ```
pub types: Option<Vec<Kind>>,
/// The allowed [index ids](meilisearch_types::tasks::Task::index_uid) of the matched tasks
pub index_uids: Option<Vec<String>>,
/// The [task ids](`meilisearch_types::tasks::Task::uid`) of the [`TaskCancelation`](meilisearch_types::tasks::Task::Kind::TaskCancelation) tasks
/// that canceled the matched tasks.
pub canceled_by: Option<Vec<TaskId>>,
/// Exclusive upper bound of the matched tasks' [`enqueued_at`](meilisearch_types::tasks::Task::enqueued_at) field.
pub before_enqueued_at: Option<OffsetDateTime>,
/// Exclusive lower bound of the matched tasks' [`enqueued_at`](meilisearch_types::tasks::Task::enqueued_at) field.
pub after_enqueued_at: Option<OffsetDateTime>,
/// Exclusive upper bound of the matched tasks' [`started_at`](meilisearch_types::tasks::Task::started_at) field.
pub before_started_at: Option<OffsetDateTime>,
/// Exclusive lower bound of the matched tasks' [`started_at`](meilisearch_types::tasks::Task::started_at) field.
pub after_started_at: Option<OffsetDateTime>,
/// Exclusive upper bound of the matched tasks' [`finished_at`](meilisearch_types::tasks::Task::finished_at) field.
pub before_finished_at: Option<OffsetDateTime>,
/// Exclusive lower bound of the matched tasks' [`finished_at`](meilisearch_types::tasks::Task::finished_at) field.
pub after_finished_at: Option<OffsetDateTime>,
}
impl Query {
/// Return `true` if every field of the query is set to `None`, such that the query
/// matches all tasks.
pub fn is_empty(&self) -> bool {
matches!(
self,
Query {
limit: None,
from: None,
reverse: None,
uids: None,
batch_uids: None,
statuses: None,
types: None,
index_uids: None,
canceled_by: None,
before_enqueued_at: None,
after_enqueued_at: None,
before_started_at: None,
after_started_at: None,
before_finished_at: None,
after_finished_at: None,
}
)
}
/// Add an [index id](meilisearch_types::tasks::Task::index_uid) to the list of permitted indexes.
pub fn with_index(self, index_uid: String) -> Self {
let mut index_vec = self.index_uids.unwrap_or_default();
index_vec.push(index_uid);
Self { index_uids: Some(index_vec), ..self }
}
// Removes the `from` and `limit` restrictions from the query.
// Useful to get the total number of tasks matching a filter.
pub fn without_limits(self) -> Self {
Query { limit: None, from: None, ..self }
}
}
/// Structure which holds the task queue, the batch queue, the batch-to-tasks mapping
/// and the update files referenced by the tasks.
pub struct Queue {
pub(crate) tasks: tasks::TaskQueue,
pub(crate) batches: batches::BatchQueue,
/// Matches a batch id with the associated task ids.
pub(crate) batch_to_tasks_mapping: Database<BEU32, CboRoaringBitmapCodec>,
/// The list of files referenced by the tasks.
pub(crate) file_store: FileStore,
/// The max number of tasks allowed before the scheduler starts to delete
/// the finished tasks automatically.
pub(crate) max_number_of_tasks: usize,
}
impl Queue {
pub(crate) fn private_clone(&self) -> Queue {
Queue {
tasks: self.tasks.private_clone(),
batches: self.batches.private_clone(),
batch_to_tasks_mapping: self.batch_to_tasks_mapping,
file_store: self.file_store.clone(),
max_number_of_tasks: self.max_number_of_tasks,
}
}
pub(crate) const fn nb_db() -> u32 {
tasks::TaskQueue::nb_db() + batches::BatchQueue::nb_db() + NUMBER_OF_DATABASES
}
/// Create a queue, opening its databases and its file store.
pub(crate) fn new(
env: &Env<WithoutTls>,
wtxn: &mut RwTxn,
options: &IndexSchedulerOptions,
) -> Result<Self> {
// allow unreachable_code to get rid of the warning in the case of a test build.
Ok(Self {
file_store: FileStore::new(&options.update_file_path)?,
batch_to_tasks_mapping: env
.create_database(wtxn, Some(db_name::BATCH_TO_TASKS_MAPPING))?,
tasks: TaskQueue::new(env, wtxn)?,
batches: BatchQueue::new(env, wtxn)?,
max_number_of_tasks: options.max_number_of_tasks,
})
}
/// Returns the whole set of tasks that belongs to this batch.
pub(crate) fn tasks_in_batch(&self, rtxn: &RoTxn, batch_id: BatchId) -> Result<RoaringBitmap> {
Ok(self.batch_to_tasks_mapping.get(rtxn, &batch_id)?.unwrap_or_default())
}
/// Convert an iterator to a `Vec` of tasks and edit the `ProcessingBatch` to add the given tasks.
///
/// The tasks MUST exist, or a `CorruptedTaskQueue` error will be thrown.
pub(crate) fn get_existing_tasks_for_processing_batch(
&self,
rtxn: &RoTxn,
processing_batch: &mut ProcessingBatch,
tasks: impl IntoIterator<Item = TaskId>,
) -> Result<Vec<Task>> {
tasks
.into_iter()
.map(|task_id| {
let mut task = self
.tasks
.get_task(rtxn, task_id)
.and_then(|task| task.ok_or(Error::CorruptedTaskQueue));
processing_batch.processing(&mut task);
task
})
.collect::<Result<_>>()
}
pub(crate) fn write_batch(
&self,
wtxn: &mut RwTxn,
batch: ProcessingBatch,
tasks: &RoaringBitmap,
) -> Result<()> {
self.batch_to_tasks_mapping.put(wtxn, &batch.uid, tasks)?;
self.batches.write_batch(wtxn, batch)?;
Ok(())
}
pub(crate) fn delete_persisted_task_data(&self, task: &Task) -> Result<()> {
match task.content_uuid() {
Some(content_file) => self.delete_update_file(content_file),
None => Ok(()),
}
}
/// Opens and returns the task's content file.
pub fn update_file(&self, uuid: Uuid) -> file_store::Result<StdFile> {
self.file_store.get_update(uuid)
}
/// Delete a file from the index scheduler.
///
/// Counterpart to the [`create_update_file`](IndexScheduler::create_update_file) method.
pub fn delete_update_file(&self, uuid: Uuid) -> Result<()> {
Ok(self.file_store.delete(uuid)?)
}
/// Create a file and register it in the index scheduler.
///
/// The returned file and uuid can be used to associate
/// some data to a task. The file will be kept until
/// the task has been fully processed.
pub fn create_update_file(&self, dry_run: bool) -> Result<(Uuid, file_store::File)> {
if dry_run {
Ok((Uuid::nil(), file_store::File::dry_file()?))
} else {
Ok(self.file_store.new_update()?)
}
}
#[cfg(test)]
pub fn create_update_file_with_uuid(&self, uuid: u128) -> Result<(Uuid, file_store::File)> {
Ok(self.file_store.new_update_with_uuid(uuid)?)
}
/// The size on disk taken by all the update files contained in the `IndexScheduler`, in bytes.
pub fn compute_update_file_size(&self) -> Result<u64> {
Ok(self.file_store.compute_total_size()?)
}
pub fn register(
&self,
wtxn: &mut RwTxn,
kind: &KindWithContent,
task_id: Option<TaskId>,
dry_run: bool,
) -> Result<Task> {
let next_task_id = self.tasks.next_task_id(wtxn)?;
if let Some(uid) = task_id {
if uid < next_task_id {
return Err(Error::BadTaskId { received: uid, expected: next_task_id });
}
}
let mut task = Task {
uid: task_id.unwrap_or(next_task_id),
// The batch is defined once we start processing the task
batch_uid: None,
enqueued_at: OffsetDateTime::now_utc(),
started_at: None,
finished_at: None,
error: None,
canceled_by: None,
details: kind.default_details(),
status: Status::Enqueued,
kind: kind.clone(),
network: None,
};
// For deletion and cancelation tasks, we want to make extra sure that they
// don't attempt to delete/cancel tasks that are newer than themselves.
filter_out_references_to_newer_tasks(&mut task);
// If the registered task is an index swap task, verify that it is well-formed
// (that it does not contain duplicate indexes).
check_index_swap_validity(&task)?;
// At this point the task is going to be registered and no further checks will be done
if dry_run {
return Ok(task);
}
self.tasks.register(wtxn, &task)?;
Ok(task)
}
/// Register a task to clean up the task queue if needed
pub fn cleanup_task_queue(&self, wtxn: &mut RwTxn) -> Result<()> {
let nb_tasks = self.tasks.all_task_ids(wtxn)?.len();
// if we have fewer than `max_number_of_tasks` tasks everything is fine
if nb_tasks < self.max_number_of_tasks as u64 {
return Ok(());
}
let finished = self.tasks.status.get(wtxn, &Status::Succeeded)?.unwrap_or_default()
| self.tasks.status.get(wtxn, &Status::Failed)?.unwrap_or_default()
| self.tasks.status.get(wtxn, &Status::Canceled)?.unwrap_or_default();
let to_delete =
RoaringBitmap::from_sorted_iter(finished.into_iter().take(100_000)).unwrap();
// /!\ the len must be at least 2 or else we might enter an infinite loop where we only delete
// the deletion tasks we enqueued ourselves.
if to_delete.len() < 2 {
tracing::warn!("The task queue is almost full, but no task can be deleted yet.");
// the only thing we can do is hope that the user tasks are going to finish
return Ok(());
}
tracing::info!(
"The task queue is almost full. Deleting the oldest {} finished tasks.",
to_delete.len()
);
// it's safe to unwrap here because we checked the len above
let newest_task_id = to_delete.iter().next_back().unwrap();
let last_task_to_delete =
self.tasks.get_task(wtxn, newest_task_id)?.ok_or(Error::CorruptedTaskQueue)?;
// increase time by one nanosecond so that the enqueuedAt of the last task to delete is also lower than that date.
let delete_before = last_task_to_delete.enqueued_at + Duration::from_nanos(1);
self.register(
wtxn,
&KindWithContent::TaskDeletion {
query: format!(
"?beforeEnqueuedAt={}&statuses=succeeded,failed,canceled",
delete_before.format(&Rfc3339).map_err(|_| Error::CorruptedTaskQueue)?,
),
tasks: to_delete,
},
None,
false,
)?;
Ok(())
}
pub fn get_stats(
&self,
rtxn: &RoTxn,
processing: &ProcessingTasks,
) -> Result<BTreeMap<String, BTreeMap<String, u64>>> {
let mut res = BTreeMap::new();
let processing_tasks = processing.processing.len();
res.insert(
"statuses".to_string(),
enum_iterator::all::<Status>()
.map(|s| {
let tasks = self.tasks.get_status(rtxn, s)?.len();
match s {
Status::Enqueued => Ok((s.to_string(), tasks - processing_tasks)),
Status::Processing => Ok((s.to_string(), processing_tasks)),
s => Ok((s.to_string(), tasks)),
}
})
.collect::<Result<BTreeMap<String, u64>>>()?,
);
res.insert(
"types".to_string(),
enum_iterator::all::<Kind>()
.map(|s| Ok((s.to_string(), self.tasks.get_kind(rtxn, s)?.len())))
.collect::<Result<BTreeMap<String, u64>>>()?,
);
res.insert(
"indexes".to_string(),
self.tasks
.index_tasks
.iter(rtxn)?
.map(|res| Ok(res.map(|(name, bitmap)| (name.to_string(), bitmap.len()))?))
.collect::<Result<BTreeMap<String, u64>>>()?,
);
Ok(res)
}
}
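
The selection step in `cleanup_task_queue` above can be sketched in isolation. This is a minimal illustration, not the scheduler's code: `select_tasks_to_delete` is a hypothetical helper, `BTreeSet<u32>` stands in for `RoaringBitmap`, and `batch_size` plays the role of the hard-coded `100_000`.

```rust
use std::collections::BTreeSet;

/// Hypothetical helper mirroring the selection logic of `cleanup_task_queue`.
/// Returns the ids that a deletion task would target, or `None` when nothing
/// should be enqueued.
fn select_tasks_to_delete(
    succeeded: &BTreeSet<u32>,
    failed: &BTreeSet<u32>,
    canceled: &BTreeSet<u32>,
    total_tasks: u64,
    max_tasks: u64,
    batch_size: usize,
) -> Option<BTreeSet<u32>> {
    // Below the limit: nothing to clean up.
    if total_tasks < max_tasks {
        return None;
    }
    // Union of every finished task, whatever its final status.
    let finished: BTreeSet<u32> =
        succeeded.iter().chain(failed).chain(canceled).copied().collect();
    // A `BTreeSet` iterates in ascending order, so this keeps the oldest ids.
    let to_delete: BTreeSet<u32> = finished.into_iter().take(batch_size).collect();
    // With fewer than two candidates, the cleanup could end up deleting only
    // the deletion tasks it enqueued itself, so it gives up instead.
    if to_delete.len() < 2 {
        return None;
    }
    Some(to_delete)
}
```

The largest id of the returned set corresponds to `newest_task_id` in the original, whose `enqueued_at` (plus one nanosecond) becomes the `beforeEnqueuedAt` bound of the deletion query.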


@@ -1,562 +0,0 @@
use std::ops::{Bound, RangeBounds};
use meilisearch_types::heed::types::{DecodeIgnore, SerdeBincode, SerdeJson, Str};
use meilisearch_types::heed::{Database, Env, RoTxn, RwTxn, WithoutTls};
use meilisearch_types::milli::{CboRoaringBitmapCodec, RoaringBitmapCodec, BEU32};
use meilisearch_types::tasks::{Kind, Status, Task};
use roaring::{MultiOps, RoaringBitmap};
use time::OffsetDateTime;
use super::{Query, Queue};
use crate::processing::ProcessingTasks;
use crate::utils::{
self, insert_task_datetime, keep_ids_within_datetimes, map_bound, remove_task_datetime,
};
use crate::{Error, Result, TaskId, BEI128};
/// The number of databases used by the task queue
const NUMBER_OF_DATABASES: u32 = 8;
/// Database const names for the `IndexScheduler`.
mod db_name {
pub const ALL_TASKS: &str = "all-tasks";
pub const STATUS: &str = "status";
pub const KIND: &str = "kind";
pub const INDEX_TASKS: &str = "index-tasks";
pub const CANCELED_BY: &str = "canceled_by";
pub const ENQUEUED_AT: &str = "enqueued-at";
pub const STARTED_AT: &str = "started-at";
pub const FINISHED_AT: &str = "finished-at";
}
pub struct TaskQueue {
/// The main database, it contains all the tasks accessible by their Id.
pub(crate) all_tasks: Database<BEU32, SerdeJson<Task>>,
/// All the tasks ids grouped by their status.
// TODO we should not be able to serialize a `Status::Processing` in this database.
pub(crate) status: Database<SerdeBincode<Status>, RoaringBitmapCodec>,
/// All the tasks ids grouped by their kind.
pub(crate) kind: Database<SerdeBincode<Kind>, RoaringBitmapCodec>,
/// Store the tasks associated to an index.
pub(crate) index_tasks: Database<Str, RoaringBitmapCodec>,
/// Store the tasks that were canceled by a task uid
pub(crate) canceled_by: Database<BEU32, RoaringBitmapCodec>,
/// Store the task ids of tasks which were enqueued at a specific date
pub(crate) enqueued_at: Database<BEI128, CboRoaringBitmapCodec>,
/// Store the task ids of finished tasks which started being processed at a specific date
pub(crate) started_at: Database<BEI128, CboRoaringBitmapCodec>,
/// Store the task ids of tasks which finished at a specific date
pub(crate) finished_at: Database<BEI128, CboRoaringBitmapCodec>,
}
impl TaskQueue {
pub(crate) fn private_clone(&self) -> TaskQueue {
TaskQueue {
all_tasks: self.all_tasks,
status: self.status,
kind: self.kind,
index_tasks: self.index_tasks,
canceled_by: self.canceled_by,
enqueued_at: self.enqueued_at,
started_at: self.started_at,
finished_at: self.finished_at,
}
}
pub(crate) const fn nb_db() -> u32 {
NUMBER_OF_DATABASES
}
pub(crate) fn new(env: &Env<WithoutTls>, wtxn: &mut RwTxn) -> Result<Self> {
Ok(Self {
all_tasks: env.create_database(wtxn, Some(db_name::ALL_TASKS))?,
status: env.create_database(wtxn, Some(db_name::STATUS))?,
kind: env.create_database(wtxn, Some(db_name::KIND))?,
index_tasks: env.create_database(wtxn, Some(db_name::INDEX_TASKS))?,
canceled_by: env.create_database(wtxn, Some(db_name::CANCELED_BY))?,
enqueued_at: env.create_database(wtxn, Some(db_name::ENQUEUED_AT))?,
started_at: env.create_database(wtxn, Some(db_name::STARTED_AT))?,
finished_at: env.create_database(wtxn, Some(db_name::FINISHED_AT))?,
})
}
pub(crate) fn last_task_id(&self, rtxn: &RoTxn) -> Result<Option<TaskId>> {
Ok(self.all_tasks.remap_data_type::<DecodeIgnore>().last(rtxn)?.map(|(k, _)| k + 1))
}
pub(crate) fn next_task_id(&self, rtxn: &RoTxn) -> Result<TaskId> {
Ok(self.last_task_id(rtxn)?.unwrap_or_default())
}
pub(crate) fn all_task_ids(&self, rtxn: &RoTxn) -> Result<RoaringBitmap> {
enum_iterator::all().map(|s| self.get_status(rtxn, s)).union()
}
pub(crate) fn get_task(&self, rtxn: &RoTxn, task_id: TaskId) -> Result<Option<Task>> {
Ok(self.all_tasks.get(rtxn, &task_id)?)
}
/// Update the inverted task indexes and write the new value of the task.
///
/// The passed `task` object typically comes from a previous transaction, so two kinds of modification might have occurred:
/// 1. Modification to the `task` object after loading it from the DB (the purpose of this method is to persist these changes)
/// 2. Modification to the task committed by another transaction in the DB (an annoying consequence of having lost the original
/// transaction from which the `task` instance was deserialized)
///
/// When calling this function, this `task` is modified to take into account any existing `network`
/// that may have been added since the task was loaded into memory.
///
/// Any other modification to the task that was committed to the DB since the parameter was pulled from the DB will be overwritten.
///
/// # Errors
///
/// - CorruptedTaskQueue: The task doesn't exist in the database
pub(crate) fn update_task(&self, wtxn: &mut RwTxn, task: &mut Task) -> Result<()> {
let old_task = self.get_task(wtxn, task.uid)?.ok_or(Error::CorruptedTaskQueue)?;
let reprocessing = old_task.status != Status::Enqueued;
debug_assert!(old_task != *task);
debug_assert_eq!(old_task.uid, task.uid);
// If we're processing a task that failed it may already contain a batch_uid
debug_assert!(
reprocessing || (old_task.batch_uid.is_none() && task.batch_uid.is_some()),
"\n==> old: {old_task:?}\n==> new: {task:?}"
);
if old_task.status != task.status {
self.update_status(wtxn, old_task.status, |bitmap| {
bitmap.remove(task.uid);
})?;
self.update_status(wtxn, task.status, |bitmap| {
bitmap.insert(task.uid);
})?;
}
if old_task.kind.as_kind() != task.kind.as_kind() {
self.update_kind(wtxn, old_task.kind.as_kind(), |bitmap| {
bitmap.remove(task.uid);
})?;
self.update_kind(wtxn, task.kind.as_kind(), |bitmap| {
bitmap.insert(task.uid);
})?;
}
assert_eq!(
old_task.enqueued_at, task.enqueued_at,
"Cannot update a task's enqueued_at time"
);
if old_task.started_at != task.started_at {
assert!(
reprocessing || old_task.started_at.is_none(),
"Cannot update a task's started_at time"
);
if let Some(started_at) = old_task.started_at {
remove_task_datetime(wtxn, self.started_at, started_at, task.uid)?;
}
if let Some(started_at) = task.started_at {
insert_task_datetime(wtxn, self.started_at, started_at, task.uid)?;
}
}
if old_task.finished_at != task.finished_at {
assert!(
reprocessing || old_task.finished_at.is_none(),
"Cannot update a task's finished_at time"
);
if let Some(finished_at) = old_task.finished_at {
remove_task_datetime(wtxn, self.finished_at, finished_at, task.uid)?;
}
if let Some(finished_at) = task.finished_at {
insert_task_datetime(wtxn, self.finished_at, finished_at, task.uid)?;
}
}
task.network = match (old_task.network, task.network.take()) {
(None, None) => None,
(None, Some(network)) | (Some(network), None) => Some(network),
(Some(_), Some(network)) => Some(network),
};
self.all_tasks.put(wtxn, &task.uid, task)?;
Ok(())
}
/// Returns the whole set of tasks that belong to this index.
pub(crate) fn index_tasks(&self, rtxn: &RoTxn, index: &str) -> Result<RoaringBitmap> {
Ok(self.index_tasks.get(rtxn, index)?.unwrap_or_default())
}
pub(crate) fn update_index(
&self,
wtxn: &mut RwTxn,
index: &str,
f: impl Fn(&mut RoaringBitmap),
) -> Result<()> {
let mut tasks = self.index_tasks(wtxn, index)?;
f(&mut tasks);
if tasks.is_empty() {
self.index_tasks.delete(wtxn, index)?;
} else {
self.index_tasks.put(wtxn, index, &tasks)?;
}
Ok(())
}
pub(crate) fn get_status(&self, rtxn: &RoTxn, status: Status) -> Result<RoaringBitmap> {
Ok(self.status.get(rtxn, &status)?.unwrap_or_default())
}
pub(crate) fn put_status(
&self,
wtxn: &mut RwTxn,
status: Status,
bitmap: &RoaringBitmap,
) -> Result<()> {
Ok(self.status.put(wtxn, &status, bitmap)?)
}
pub(crate) fn update_status(
&self,
wtxn: &mut RwTxn,
status: Status,
f: impl Fn(&mut RoaringBitmap),
) -> Result<()> {
let mut tasks = self.get_status(wtxn, status)?;
f(&mut tasks);
self.put_status(wtxn, status, &tasks)?;
Ok(())
}
pub(crate) fn get_kind(&self, rtxn: &RoTxn, kind: Kind) -> Result<RoaringBitmap> {
Ok(self.kind.get(rtxn, &kind)?.unwrap_or_default())
}
pub(crate) fn put_kind(
&self,
wtxn: &mut RwTxn,
kind: Kind,
bitmap: &RoaringBitmap,
) -> Result<()> {
Ok(self.kind.put(wtxn, &kind, bitmap)?)
}
pub(crate) fn update_kind(
&self,
wtxn: &mut RwTxn,
kind: Kind,
f: impl Fn(&mut RoaringBitmap),
) -> Result<()> {
let mut tasks = self.get_kind(wtxn, kind)?;
f(&mut tasks);
self.put_kind(wtxn, kind, &tasks)?;
Ok(())
}
/// Convert an iterator of task ids to a `Vec` of tasks. The tasks MUST exist or a
/// `CorruptedTaskQueue` error will be returned.
pub(crate) fn get_existing_tasks(
&self,
rtxn: &RoTxn,
tasks: impl IntoIterator<Item = TaskId>,
) -> Result<Vec<Task>> {
tasks
.into_iter()
.map(|task_id| {
self.get_task(rtxn, task_id).and_then(|task| task.ok_or(Error::CorruptedTaskQueue))
})
.collect::<Result<_>>()
}
pub(crate) fn register(&self, wtxn: &mut RwTxn, task: &Task) -> Result<()> {
self.all_tasks.put(wtxn, &task.uid, task)?;
for index in task.indexes() {
self.update_index(wtxn, index, |bitmap| {
bitmap.insert(task.uid);
})?;
}
self.update_status(wtxn, Status::Enqueued, |bitmap| {
bitmap.insert(task.uid);
})?;
self.update_kind(wtxn, task.kind.as_kind(), |bitmap| {
bitmap.insert(task.uid);
})?;
utils::insert_task_datetime(wtxn, self.enqueued_at, task.enqueued_at, task.uid)?;
Ok(())
}
}
impl Queue {
/// Return the task ids matched by the given query from the index scheduler's point of view.
pub(crate) fn get_task_ids(
&self,
rtxn: &RoTxn,
query: &Query,
processing_tasks: &ProcessingTasks,
) -> Result<RoaringBitmap> {
let ProcessingTasks { batch: processing_batch, processing: processing_tasks, progress: _ } =
processing_tasks;
let Query {
limit,
from,
reverse,
uids,
batch_uids,
statuses,
types,
index_uids,
canceled_by,
before_enqueued_at,
after_enqueued_at,
before_started_at,
after_started_at,
before_finished_at,
after_finished_at,
} = query;
let mut tasks = self.tasks.all_task_ids(rtxn)?;
if let Some(from) = from {
let range = if reverse.unwrap_or_default() {
u32::MIN..*from
} else {
from.saturating_add(1)..u32::MAX
};
tasks.remove_range(range);
}
if let Some(batch_uids) = batch_uids {
let mut batch_tasks = RoaringBitmap::new();
for batch_uid in batch_uids {
if processing_batch.as_ref().is_some_and(|batch| batch.uid == *batch_uid) {
batch_tasks |= &**processing_tasks;
} else {
batch_tasks |= self.tasks_in_batch(rtxn, *batch_uid)?;
}
}
tasks &= batch_tasks;
}
if let Some(status) = statuses {
let mut status_tasks = RoaringBitmap::new();
for status in status {
match status {
// special case for Processing tasks
Status::Processing => {
status_tasks |= &**processing_tasks;
}
status => status_tasks |= &self.tasks.get_status(rtxn, *status)?,
};
}
if !status.contains(&Status::Processing) {
tasks -= &**processing_tasks;
}
tasks &= status_tasks;
}
if let Some(uids) = uids {
let uids = RoaringBitmap::from_iter(uids);
tasks &= &uids;
}
if let Some(canceled_by) = canceled_by {
let mut all_canceled_tasks = RoaringBitmap::new();
for cancel_task_uid in canceled_by {
if let Some(canceled_by_uid) = self.tasks.canceled_by.get(rtxn, cancel_task_uid)? {
all_canceled_tasks |= canceled_by_uid;
}
}
// if the canceled_by has been specified but no task
// matches then we prefer matching zero than all tasks.
if all_canceled_tasks.is_empty() {
return Ok(RoaringBitmap::new());
} else {
tasks &= all_canceled_tasks;
}
}
if let Some(kind) = types {
let mut kind_tasks = RoaringBitmap::new();
for kind in kind {
kind_tasks |= self.tasks.get_kind(rtxn, *kind)?;
}
tasks &= &kind_tasks;
}
if let Some(index) = index_uids {
let mut index_tasks = RoaringBitmap::new();
for index in index {
index_tasks |= self.tasks.index_tasks(rtxn, index)?;
}
tasks &= &index_tasks;
}
// For the started_at filter, we need to treat the tasks that are processing separately from the
// tasks that are not processing. The non-processing ones are filtered normally while the processing ones
// are entirely removed unless the in-memory startedAt variable falls within the date filter.
// Once we have filtered the two subsets, we put them back together and assign the result back to `tasks`.
tasks = {
let (mut filtered_non_processing_tasks, mut filtered_processing_tasks) =
(&tasks - &**processing_tasks, &tasks & &**processing_tasks);
// special case for Processing tasks
// A closure that clears the filtered_processing_tasks if their started_at date falls outside the given bounds
let mut clear_filtered_processing_tasks =
|start: Bound<OffsetDateTime>, end: Bound<OffsetDateTime>| {
let start = map_bound(start, |b| b.unix_timestamp_nanos());
let end = map_bound(end, |b| b.unix_timestamp_nanos());
let is_within_dates = RangeBounds::contains(
&(start, end),
&processing_batch
.as_ref()
.map_or_else(OffsetDateTime::now_utc, |batch| batch.started_at)
.unix_timestamp_nanos(),
);
if !is_within_dates {
filtered_processing_tasks.clear();
}
};
match (after_started_at, before_started_at) {
(None, None) => (),
(None, Some(before)) => {
clear_filtered_processing_tasks(Bound::Unbounded, Bound::Excluded(*before))
}
(Some(after), None) => {
clear_filtered_processing_tasks(Bound::Excluded(*after), Bound::Unbounded)
}
(Some(after), Some(before)) => clear_filtered_processing_tasks(
Bound::Excluded(*after),
Bound::Excluded(*before),
),
};
keep_ids_within_datetimes(
rtxn,
&mut filtered_non_processing_tasks,
self.tasks.started_at,
*after_started_at,
*before_started_at,
)?;
filtered_non_processing_tasks | filtered_processing_tasks
};
keep_ids_within_datetimes(
rtxn,
&mut tasks,
self.tasks.enqueued_at,
*after_enqueued_at,
*before_enqueued_at,
)?;
keep_ids_within_datetimes(
rtxn,
&mut tasks,
self.tasks.finished_at,
*after_finished_at,
*before_finished_at,
)?;
if let Some(limit) = limit {
tasks = if query.reverse.unwrap_or_default() {
tasks.into_iter().take(*limit as usize).collect()
} else {
tasks.into_iter().rev().take(*limit as usize).collect()
};
}
Ok(tasks)
}
pub(crate) fn get_task_ids_from_authorized_indexes(
&self,
rtxn: &RoTxn,
query: &Query,
filters: &meilisearch_auth::AuthFilter,
processing_tasks: &ProcessingTasks,
) -> Result<(RoaringBitmap, u64)> {
// compute all tasks matching the filter by ignoring the limits, to find the number of tasks matching
// the filter.
// As this causes us to compute the filter twice it is slightly inefficient, but doing it this way spares
// us from modifying the underlying implementation, and the performance remains sufficient.
// Should this change, we would modify `get_task_ids` to directly return the number of matching tasks.
let total_tasks =
self.get_task_ids(rtxn, &query.clone().without_limits(), processing_tasks)?;
let mut tasks = self.get_task_ids(rtxn, query, processing_tasks)?;
// If the query contains a list of index uid or there is a finite list of authorized indexes,
// then we must exclude all the kinds that aren't associated to one and only one index.
if query.index_uids.is_some() || !filters.all_indexes_authorized() {
for kind in enum_iterator::all::<Kind>().filter(|kind| !kind.related_to_one_index()) {
tasks -= self.tasks.get_kind(rtxn, kind)?;
}
}
// Any task that is internally associated with a non-authorized index
// must be discarded.
if !filters.all_indexes_authorized() {
let all_indexes_iter = self.tasks.index_tasks.iter(rtxn)?;
for result in all_indexes_iter {
let (index, index_tasks) = result?;
if !filters.is_index_authorized(index) {
tasks -= index_tasks;
}
}
}
Ok((tasks, total_tasks.len()))
}
pub(crate) fn get_tasks_from_authorized_indexes(
&self,
rtxn: &RoTxn,
query: &Query,
filters: &meilisearch_auth::AuthFilter,
processing_tasks: &ProcessingTasks,
) -> Result<(Vec<Task>, u64)> {
let (tasks, total) =
self.get_task_ids_from_authorized_indexes(rtxn, query, filters, processing_tasks)?;
let tasks = if query.reverse.unwrap_or_default() {
Box::new(tasks.into_iter()) as Box<dyn Iterator<Item = u32>>
} else {
Box::new(tasks.into_iter().rev()) as Box<dyn Iterator<Item = u32>>
};
let tasks = self
.tasks
.get_existing_tasks(rtxn, tasks.take(query.limit.unwrap_or(u32::MAX) as usize))?;
let ProcessingTasks { batch, processing, progress: _ } = processing_tasks;
let ret = tasks.into_iter();
if processing.is_empty() || batch.is_none() {
Ok((ret.collect(), total))
} else {
// Safe because we ensured there was a batch in the previous branch
let batch = batch.as_ref().unwrap();
Ok((
ret.map(|task| {
if processing.contains(task.uid) {
Task {
status: Status::Processing,
batch_uid: Some(batch.uid),
started_at: Some(batch.started_at),
..task
}
} else {
task
}
})
.collect(),
total,
))
}
}
}
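
The `from`, `reverse` and `limit` handling in `get_task_ids` above can be sketched on its own. This is a simplified illustration under stated assumptions: `window` is a hypothetical function, and `BTreeSet<u32>` stands in for `RoaringBitmap`.

```rust
use std::collections::BTreeSet;

/// Hypothetical stand-in for the `from` / `reverse` / `limit` steps of
/// `get_task_ids`, applied to an already filtered set of task ids.
fn window(
    mut tasks: BTreeSet<u32>,
    from: Option<u32>,
    limit: Option<u32>,
    reverse: bool,
) -> BTreeSet<u32> {
    if let Some(from) = from {
        if reverse {
            // Oldest first: drop every id below `from`.
            tasks.retain(|id| *id >= from);
        } else {
            // Newest first (the default): drop every id above `from`.
            tasks.retain(|id| *id <= from);
        }
    }
    if let Some(limit) = limit {
        let limit = limit as usize;
        tasks = if reverse {
            // Keep the `limit` smallest ids.
            tasks.into_iter().take(limit).collect()
        } else {
            // Keep the `limit` largest ids.
            tasks.into_iter().rev().take(limit).collect()
        };
    }
    tasks
}
```

With three tasks `0`, `1` and `2` and the default order, `limit: Some(1)` keeps only `2`, and `from: Some(1)` combined with `limit: Some(1)` keeps only `1`.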

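
The status filter in `get_task_ids` treats `Processing` as an in-memory state: processing tasks are still stored under `Enqueued` on disk. A rough sketch of that rule follows, with a hypothetical `filter_by_status` function, a reduced `Status` enum, and `BTreeSet<u32>` in place of `RoaringBitmap`.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Reduced, illustrative version of the task status enum.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
enum Status {
    Enqueued,
    Processing,
    Succeeded,
}

/// Hypothetical stand-in for the `statuses` branch of `get_task_ids`.
/// `persisted` maps each stored status to its task ids, and `processing`
/// holds the ids of the batch currently being processed (in memory only).
fn filter_by_status(
    all: &BTreeSet<u32>,
    persisted: &BTreeMap<Status, BTreeSet<u32>>,
    processing: &BTreeSet<u32>,
    wanted: &[Status],
) -> BTreeSet<u32> {
    let mut status_tasks = BTreeSet::new();
    for status in wanted {
        match status {
            // Processing tasks only exist in memory.
            Status::Processing => status_tasks.extend(processing.iter().copied()),
            other => {
                if let Some(ids) = persisted.get(other) {
                    status_tasks.extend(ids.iter().copied());
                }
            }
        }
    }
    let mut tasks = all.clone();
    // Processing tasks are persisted as enqueued, so they must be removed
    // explicitly when the caller did not ask for them.
    if !wanted.contains(&Status::Processing) {
        tasks.retain(|id| !processing.contains(id));
    }
    tasks.retain(|id| status_tasks.contains(id));
    tasks
}
```

With tasks `0`, `1` and `2` all stored as enqueued and task `0` in the processing batch, asking for `Enqueued` yields `1` and `2`, while asking for `Processing` yields only `0`.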
View File

@@ -1,441 +0,0 @@
use meili_snap::snapshot;
use meilisearch_auth::AuthFilter;
use meilisearch_types::index_uid_pattern::IndexUidPattern;
use meilisearch_types::tasks::{IndexSwap, KindWithContent, Status};
use time::{Duration, OffsetDateTime};
use crate::insta_snapshot::{snapshot_bitmap, snapshot_index_scheduler};
use crate::test_utils::Breakpoint::*;
use crate::test_utils::{index_creation_task, FailureLocation};
use crate::{IndexScheduler, Query};
#[test]
fn query_tasks_from_and_limit() {
let (index_scheduler, mut handle) = IndexScheduler::test(true, vec![]);
let kind = index_creation_task("doggo", "bone");
let _task = index_scheduler.register(kind, None, false).unwrap();
snapshot!(snapshot_index_scheduler(&index_scheduler), name: "registered_the_first_task");
let kind = index_creation_task("whalo", "plankton");
let _task = index_scheduler.register(kind, None, false).unwrap();
snapshot!(snapshot_index_scheduler(&index_scheduler), name: "registered_the_second_task");
let kind = index_creation_task("catto", "his_own_vomit");
let _task = index_scheduler.register(kind, None, false).unwrap();
snapshot!(snapshot_index_scheduler(&index_scheduler), name: "registered_the_third_task");
handle.advance_n_successful_batches(3);
snapshot!(snapshot_index_scheduler(&index_scheduler), name: "processed_all_tasks");
let rtxn = index_scheduler.env.read_txn().unwrap();
let processing = index_scheduler.processing_tasks.read().unwrap();
let query = Query { limit: Some(0), ..Default::default() };
let (tasks, _) = index_scheduler
.queue
.get_task_ids_from_authorized_indexes(&rtxn, &query, &AuthFilter::default(), &processing)
.unwrap();
snapshot!(snapshot_bitmap(&tasks), @"[]");
let query = Query { limit: Some(1), ..Default::default() };
let (tasks, _) = index_scheduler
.queue
.get_task_ids_from_authorized_indexes(&rtxn, &query, &AuthFilter::default(), &processing)
.unwrap();
snapshot!(snapshot_bitmap(&tasks), @"[2,]");
let query = Query { limit: Some(2), ..Default::default() };
let (tasks, _) = index_scheduler
.queue
.get_task_ids_from_authorized_indexes(&rtxn, &query, &AuthFilter::default(), &processing)
.unwrap();
snapshot!(snapshot_bitmap(&tasks), @"[1,2,]");
let query = Query { from: Some(1), ..Default::default() };
let (tasks, _) = index_scheduler
.queue
.get_task_ids_from_authorized_indexes(&rtxn, &query, &AuthFilter::default(), &processing)
.unwrap();
snapshot!(snapshot_bitmap(&tasks), @"[0,1,]");
let query = Query { from: Some(2), ..Default::default() };
let (tasks, _) = index_scheduler
.queue
.get_task_ids_from_authorized_indexes(&rtxn, &query, &AuthFilter::default(), &processing)
.unwrap();
snapshot!(snapshot_bitmap(&tasks), @"[0,1,2,]");
let query = Query { from: Some(1), limit: Some(1), ..Default::default() };
let (tasks, _) = index_scheduler
.queue
.get_task_ids_from_authorized_indexes(&rtxn, &query, &AuthFilter::default(), &processing)
.unwrap();
snapshot!(snapshot_bitmap(&tasks), @"[1,]");
let query = Query { from: Some(1), limit: Some(2), ..Default::default() };
let (tasks, _) = index_scheduler
.queue
.get_task_ids_from_authorized_indexes(&rtxn, &query, &AuthFilter::default(), &processing)
.unwrap();
snapshot!(snapshot_bitmap(&tasks), @"[0,1,]");
}
#[test]
fn query_tasks_simple() {
let start_time = OffsetDateTime::now_utc();
let (index_scheduler, mut handle) =
IndexScheduler::test(true, vec![(3, FailureLocation::InsideProcessBatch)]);
let kind = index_creation_task("catto", "mouse");
let _task = index_scheduler.register(kind, None, false).unwrap();
let kind = index_creation_task("doggo", "sheep");
let _task = index_scheduler.register(kind, None, false).unwrap();
let kind = index_creation_task("whalo", "fish");
let _task = index_scheduler.register(kind, None, false).unwrap();
snapshot!(snapshot_index_scheduler(&index_scheduler), name: "start");
handle.advance_till([Start, BatchCreated]);
let query = Query { statuses: Some(vec![Status::Processing]), ..Default::default() };
let (tasks, _) = index_scheduler
.get_task_ids_from_authorized_indexes(&query, &AuthFilter::default())
.unwrap();
snapshot!(snapshot_bitmap(&tasks), @"[0,]"); // only the processing tasks in the first tick
let query = Query { statuses: Some(vec![Status::Enqueued]), ..Default::default() };
let (tasks, _) = index_scheduler
.get_task_ids_from_authorized_indexes(&query, &AuthFilter::default())
.unwrap();
snapshot!(snapshot_bitmap(&tasks), @"[1,2,]"); // only the enqueued tasks in the first tick
let query =
Query { statuses: Some(vec![Status::Enqueued, Status::Processing]), ..Default::default() };
let (tasks, _) = index_scheduler
.get_task_ids_from_authorized_indexes(&query, &AuthFilter::default())
.unwrap();
snapshot!(snapshot_bitmap(&tasks), @"[0,1,2,]"); // both enqueued and processing tasks in the first tick
let query = Query {
statuses: Some(vec![Status::Enqueued, Status::Processing]),
after_started_at: Some(start_time),
..Default::default()
};
let (tasks, _) = index_scheduler
.get_task_ids_from_authorized_indexes(&query, &AuthFilter::default())
.unwrap();
// both enqueued and processing tasks in the first tick, but limited to those with a started_at
// that comes after the start of the test, which should exclude the enqueued tasks
snapshot!(snapshot_bitmap(&tasks), @"[0,]");
let query = Query {
statuses: Some(vec![Status::Enqueued, Status::Processing]),
before_started_at: Some(start_time),
..Default::default()
};
let (tasks, _) = index_scheduler
.get_task_ids_from_authorized_indexes(&query, &AuthFilter::default())
.unwrap();
// both enqueued and processing tasks in the first tick, but limited to those with a started_at
// that comes before the start of the test, which should exclude all of them
snapshot!(snapshot_bitmap(&tasks), @"[]");
let query = Query {
statuses: Some(vec![Status::Enqueued, Status::Processing]),
after_started_at: Some(start_time),
before_started_at: Some(start_time + Duration::minutes(1)),
..Default::default()
};
let (tasks, _) = index_scheduler
.get_task_ids_from_authorized_indexes(&query, &AuthFilter::default())
.unwrap();
// both enqueued and processing tasks in the first tick, but limited to those with a started_at
// that comes after the start of the test and before one minute after the start of the test,
// which should exclude the enqueued tasks and include the only processing task
snapshot!(snapshot_bitmap(&tasks), @"[0,]");
handle.advance_till([
InsideProcessBatch,
InsideProcessBatch,
ProcessBatchSucceeded,
AfterProcessing,
Start,
BatchCreated,
]);
let second_start_time = OffsetDateTime::now_utc();
let query = Query {
statuses: Some(vec![Status::Succeeded, Status::Processing]),
after_started_at: Some(start_time),
before_started_at: Some(start_time + Duration::minutes(1)),
..Default::default()
};
let (tasks, _) = index_scheduler
.get_task_ids_from_authorized_indexes(&query, &AuthFilter::default())
.unwrap();
// both succeeded and processing tasks in the first tick, but limited to those with a started_at
// that comes after the start of the test and before one minute after the start of the test,
// which should include all tasks
snapshot!(snapshot_bitmap(&tasks), @"[0,1,]");
let query = Query {
statuses: Some(vec![Status::Succeeded, Status::Processing]),
before_started_at: Some(start_time),
..Default::default()
};
let (tasks, _) = index_scheduler
.get_task_ids_from_authorized_indexes(&query, &AuthFilter::default())
.unwrap();
// both succeeded and processing tasks in the first tick, but limited to those with a started_at
// that comes before the start of the test, which should exclude all tasks
snapshot!(snapshot_bitmap(&tasks), @"[]");
let query = Query {
statuses: Some(vec![Status::Enqueued, Status::Succeeded, Status::Processing]),
after_started_at: Some(second_start_time),
before_started_at: Some(second_start_time + Duration::minutes(1)),
..Default::default()
};
let (tasks, _) = index_scheduler
.get_task_ids_from_authorized_indexes(&query, &AuthFilter::default())
.unwrap();
// both succeeded and processing tasks in the first tick, but limited to those with a started_at
// that comes after the start of the second part of the test and before one minute after the
// second start of the test, which should exclude all tasks
snapshot!(snapshot_bitmap(&tasks), @"[]");
// now we make one more batch, the started_at field of the new tasks will be past `second_start_time`
handle.advance_till([
InsideProcessBatch,
InsideProcessBatch,
ProcessBatchSucceeded,
AfterProcessing,
Start,
BatchCreated,
]);
let (tasks, _) = index_scheduler
.get_task_ids_from_authorized_indexes(&query, &AuthFilter::default())
.unwrap();
// we run the same query to verify that, and indeed find that the last task is matched
snapshot!(snapshot_bitmap(&tasks), @"[2,]");
let query = Query {
statuses: Some(vec![Status::Enqueued, Status::Succeeded, Status::Processing]),
after_started_at: Some(second_start_time),
before_started_at: Some(second_start_time + Duration::minutes(1)),
..Default::default()
};
let (tasks, _) = index_scheduler
.get_task_ids_from_authorized_indexes(&query, &AuthFilter::default())
.unwrap();
// enqueued, succeeded, or processing tasks started after the second part of the test, should
// again only return the last task
snapshot!(snapshot_bitmap(&tasks), @"[2,]");
handle.advance_till([ProcessBatchFailed, AfterProcessing]);
// now the last task should have failed
snapshot!(snapshot_index_scheduler(&index_scheduler), name: "end");
let (tasks, _) = index_scheduler
.get_task_ids_from_authorized_indexes(&query, &AuthFilter::default())
.unwrap();
// so running the last query should return nothing
snapshot!(snapshot_bitmap(&tasks), @"[]");
let query = Query {
statuses: Some(vec![Status::Failed]),
after_started_at: Some(second_start_time),
before_started_at: Some(second_start_time + Duration::minutes(1)),
..Default::default()
};
let (tasks, _) = index_scheduler
.get_task_ids_from_authorized_indexes(&query, &AuthFilter::default())
.unwrap();
// but the same query on failed tasks should return the last task
snapshot!(snapshot_bitmap(&tasks), @"[2,]");
let query = Query {
statuses: Some(vec![Status::Failed]),
after_started_at: Some(second_start_time),
before_started_at: Some(second_start_time + Duration::minutes(1)),
..Default::default()
};
let (tasks, _) = index_scheduler
.get_task_ids_from_authorized_indexes(&query, &AuthFilter::default())
.unwrap();
// but the same query on failed tasks should return the last task
snapshot!(snapshot_bitmap(&tasks), @"[2,]");
let query = Query {
statuses: Some(vec![Status::Failed]),
uids: Some(vec![1]),
after_started_at: Some(second_start_time),
before_started_at: Some(second_start_time + Duration::minutes(1)),
..Default::default()
};
let (tasks, _) = index_scheduler
.get_task_ids_from_authorized_indexes(&query, &AuthFilter::default())
.unwrap();
// same query but with an invalid uid
snapshot!(snapshot_bitmap(&tasks), @"[]");
let query = Query {
statuses: Some(vec![Status::Failed]),
uids: Some(vec![2]),
after_started_at: Some(second_start_time),
before_started_at: Some(second_start_time + Duration::minutes(1)),
..Default::default()
};
let (tasks, _) = index_scheduler
.get_task_ids_from_authorized_indexes(&query, &AuthFilter::default())
.unwrap();
// same query but with a valid uid
snapshot!(snapshot_bitmap(&tasks), @"[2,]");
}
#[test]
fn query_tasks_special_rules() {
let (index_scheduler, mut handle) =
IndexScheduler::test(true, vec![(3, FailureLocation::InsideProcessBatch)]);
let kind = index_creation_task("catto", "mouse");
let _task = index_scheduler.register(kind, None, false).unwrap();
let kind = index_creation_task("doggo", "sheep");
let _task = index_scheduler.register(kind, None, false).unwrap();
let kind = KindWithContent::IndexSwap {
swaps: vec![IndexSwap { indexes: ("catto".to_owned(), "doggo".to_owned()), rename: false }],
};
let _task = index_scheduler.register(kind, None, false).unwrap();
let kind = KindWithContent::IndexSwap {
swaps: vec![IndexSwap { indexes: ("catto".to_owned(), "whalo".to_owned()), rename: false }],
};
let _task = index_scheduler.register(kind, None, false).unwrap();
snapshot!(snapshot_index_scheduler(&index_scheduler), name: "start");
handle.advance_till([Start, BatchCreated]);
let rtxn = index_scheduler.env.read_txn().unwrap();
let proc = index_scheduler.processing_tasks.read().unwrap();
let query = Query { index_uids: Some(vec!["catto".to_owned()]), ..Default::default() };
let (tasks, _) = index_scheduler
.queue
.get_task_ids_from_authorized_indexes(&rtxn, &query, &AuthFilter::default(), &proc)
.unwrap();
// only the first task associated with catto is returned, the indexSwap tasks are excluded!
snapshot!(snapshot_bitmap(&tasks), @"[0,]");
let query = Query { index_uids: Some(vec!["catto".to_owned()]), ..Default::default() };
let (tasks, _) = index_scheduler
.queue
.get_task_ids_from_authorized_indexes(
&rtxn,
&query,
&AuthFilter::with_allowed_indexes(
vec![IndexUidPattern::new_unchecked("doggo")].into_iter().collect(),
),
&proc,
)
.unwrap();
// we have asked for only the tasks associated with catto, but are only authorized to retrieve the tasks
// associated with doggo -> empty result
snapshot!(snapshot_bitmap(&tasks), @"[]");
let query = Query::default();
let (tasks, _) = index_scheduler
.queue
.get_task_ids_from_authorized_indexes(
&rtxn,
&query,
&AuthFilter::with_allowed_indexes(
vec![IndexUidPattern::new_unchecked("doggo")].into_iter().collect(),
),
&proc,
)
.unwrap();
// we asked for all the tasks, but we are only authorized to retrieve the doggo tasks
// -> only the index creation of doggo should be returned
snapshot!(snapshot_bitmap(&tasks), @"[1,]");
let query = Query::default();
let (tasks, _) = index_scheduler
.queue
.get_task_ids_from_authorized_indexes(
&rtxn,
&query,
&AuthFilter::with_allowed_indexes(
vec![
IndexUidPattern::new_unchecked("catto"),
IndexUidPattern::new_unchecked("doggo"),
]
.into_iter()
.collect(),
),
&proc,
)
.unwrap();
// we asked for all the tasks, but we are only authorized to retrieve the doggo and catto tasks
// -> all tasks except the swap of catto with whalo are returned
snapshot!(snapshot_bitmap(&tasks), @"[0,1,]");
let query = Query::default();
let (tasks, _) = index_scheduler
.queue
.get_task_ids_from_authorized_indexes(&rtxn, &query, &AuthFilter::default(), &proc)
.unwrap();
// we asked for all the tasks with all index authorized -> all tasks returned
snapshot!(snapshot_bitmap(&tasks), @"[0,1,2,3,]");
}
#[test]
fn query_tasks_canceled_by() {
let (index_scheduler, mut handle) =
IndexScheduler::test(true, vec![(3, FailureLocation::InsideProcessBatch)]);
let kind = index_creation_task("catto", "mouse");
let _ = index_scheduler.register(kind, None, false).unwrap();
let kind = index_creation_task("doggo", "sheep");
let _ = index_scheduler.register(kind, None, false).unwrap();
let kind = KindWithContent::IndexSwap {
swaps: vec![IndexSwap { indexes: ("catto".to_owned(), "doggo".to_owned()), rename: false }],
};
let _task = index_scheduler.register(kind, None, false).unwrap();
handle.advance_n_successful_batches(1);
let kind = KindWithContent::TaskCancelation {
query: "test_query".to_string(),
tasks: [0, 1, 2, 3].into_iter().collect(),
};
let task_cancelation = index_scheduler.register(kind, None, false).unwrap();
handle.advance_n_successful_batches(1);
snapshot!(snapshot_index_scheduler(&index_scheduler), name: "start");
let rtxn = index_scheduler.read_txn().unwrap();
let proc = index_scheduler.processing_tasks.read().unwrap();
let query = Query { canceled_by: Some(vec![task_cancelation.uid]), ..Query::default() };
let (tasks, _) = index_scheduler
.queue
.get_task_ids_from_authorized_indexes(&rtxn, &query, &AuthFilter::default(), &proc)
.unwrap();
// 0 is not returned because it was not canceled, 3 is not returned because it is the uid of the
// taskCancelation itself
snapshot!(snapshot_bitmap(&tasks), @"[1,2,]");
let query = Query { canceled_by: Some(vec![task_cancelation.uid]), ..Query::default() };
let (tasks, _) = index_scheduler
.queue
.get_task_ids_from_authorized_indexes(
&rtxn,
&query,
&AuthFilter::with_allowed_indexes(
vec![IndexUidPattern::new_unchecked("doggo")].into_iter().collect(),
),
&proc,
)
.unwrap();
// Return only 1 because the user is not authorized to see task 2
snapshot!(snapshot_bitmap(&tasks), @"[1,]");
}

@@ -1,427 +0,0 @@
use big_s::S;
use meili_snap::{json_string, snapshot};
use meilisearch_types::error::ErrorCode;
use meilisearch_types::tasks::{KindWithContent, Status};
use roaring::RoaringBitmap;
use crate::insta_snapshot::snapshot_index_scheduler;
use crate::test_utils::Breakpoint::*;
use crate::test_utils::{index_creation_task, replace_document_import_task};
use crate::{IndexScheduler, Query};
#[test]
fn register() {
// In this test, the handle doesn't make any progress; we only check that the tasks are registered
let (index_scheduler, mut _handle) = IndexScheduler::test(true, vec![]);
let kinds = [
index_creation_task("catto", "mouse"),
replace_document_import_task("catto", None, 0, 12),
replace_document_import_task("catto", None, 1, 50),
replace_document_import_task("doggo", Some("bone"), 2, 5000),
];
let (_, file) = index_scheduler.queue.create_update_file_with_uuid(0).unwrap();
file.persist().unwrap();
let (_, file) = index_scheduler.queue.create_update_file_with_uuid(1).unwrap();
file.persist().unwrap();
let (_, file) = index_scheduler.queue.create_update_file_with_uuid(2).unwrap();
file.persist().unwrap();
for (idx, kind) in kinds.into_iter().enumerate() {
let k = kind.as_kind();
let task = index_scheduler.register(kind, None, false).unwrap();
index_scheduler.assert_internally_consistent();
assert_eq!(task.uid, idx as u32);
assert_eq!(task.status, Status::Enqueued);
assert_eq!(task.kind.as_kind(), k);
}
snapshot!(snapshot_index_scheduler(&index_scheduler), name: "everything_is_successfully_registered");
}
#[test]
fn dry_run() {
let (index_scheduler, _handle) = IndexScheduler::test(true, vec![]);
let kind = KindWithContent::IndexCreation { index_uid: S("doggo"), primary_key: None };
let task = index_scheduler.register(kind, None, true).unwrap();
snapshot!(task.uid, @"0");
snapshot!(snapshot_index_scheduler(&index_scheduler), @r"
### Autobatching Enabled = true
### Processing batch None:
[]
----------------------------------------------------------------------
### All Tasks:
----------------------------------------------------------------------
### Status:
----------------------------------------------------------------------
### Kind:
----------------------------------------------------------------------
### Index Tasks:
----------------------------------------------------------------------
### Index Mapper:
----------------------------------------------------------------------
### Canceled By:
----------------------------------------------------------------------
### Enqueued At:
----------------------------------------------------------------------
### Started At:
----------------------------------------------------------------------
### Finished At:
----------------------------------------------------------------------
### All Batches:
----------------------------------------------------------------------
### Batch to tasks mapping:
----------------------------------------------------------------------
### Batches Status:
----------------------------------------------------------------------
### Batches Kind:
----------------------------------------------------------------------
### Batches Index Tasks:
----------------------------------------------------------------------
### Batches Enqueued At:
----------------------------------------------------------------------
### Batches Started At:
----------------------------------------------------------------------
### Batches Finished At:
----------------------------------------------------------------------
### File Store:
----------------------------------------------------------------------
");
let kind = KindWithContent::IndexCreation { index_uid: S("doggo"), primary_key: None };
let task = index_scheduler.register(kind, Some(12), true).unwrap();
snapshot!(task.uid, @"12");
snapshot!(snapshot_index_scheduler(&index_scheduler), @r"
### Autobatching Enabled = true
### Processing batch None:
[]
----------------------------------------------------------------------
### All Tasks:
----------------------------------------------------------------------
### Status:
----------------------------------------------------------------------
### Kind:
----------------------------------------------------------------------
### Index Tasks:
----------------------------------------------------------------------
### Index Mapper:
----------------------------------------------------------------------
### Canceled By:
----------------------------------------------------------------------
### Enqueued At:
----------------------------------------------------------------------
### Started At:
----------------------------------------------------------------------
### Finished At:
----------------------------------------------------------------------
### All Batches:
----------------------------------------------------------------------
### Batch to tasks mapping:
----------------------------------------------------------------------
### Batches Status:
----------------------------------------------------------------------
### Batches Kind:
----------------------------------------------------------------------
### Batches Index Tasks:
----------------------------------------------------------------------
### Batches Enqueued At:
----------------------------------------------------------------------
### Batches Started At:
----------------------------------------------------------------------
### Batches Finished At:
----------------------------------------------------------------------
### File Store:
----------------------------------------------------------------------
");
}
#[test]
fn basic_set_taskid() {
let (index_scheduler, _handle) = IndexScheduler::test(true, vec![]);
let kind = KindWithContent::IndexCreation { index_uid: S("doggo"), primary_key: None };
let task = index_scheduler.register(kind, None, false).unwrap();
snapshot!(task.uid, @"0");
let kind = KindWithContent::IndexCreation { index_uid: S("doggo"), primary_key: None };
let task = index_scheduler.register(kind, Some(12), false).unwrap();
snapshot!(task.uid, @"12");
let kind = KindWithContent::IndexCreation { index_uid: S("doggo"), primary_key: None };
let error = index_scheduler.register(kind, Some(5), false).unwrap_err();
snapshot!(error, @"Received bad task id: 5 should be >= to 13.");
}
#[test]
fn test_disable_auto_deletion_of_tasks() {
let (index_scheduler, mut handle) = IndexScheduler::test_with_custom_config(vec![], |config| {
config.cleanup_enabled = false;
config.max_number_of_tasks = 2;
None
});
index_scheduler
.register(
KindWithContent::IndexCreation { index_uid: S("doggo"), primary_key: None },
None,
false,
)
.unwrap();
handle.advance_one_successful_batch();
index_scheduler
.register(
KindWithContent::IndexCreation { index_uid: S("doggo"), primary_key: None },
None,
false,
)
.unwrap();
handle.advance_one_failed_batch();
// at this point the max number of tasks is reached
// we can still enqueue multiple tasks
index_scheduler
.register(
KindWithContent::IndexCreation { index_uid: S("doggo"), primary_key: None },
None,
false,
)
.unwrap();
index_scheduler
.register(
KindWithContent::IndexCreation { index_uid: S("doggo"), primary_key: None },
None,
false,
)
.unwrap();
let rtxn = index_scheduler.env.read_txn().unwrap();
let proc = index_scheduler.processing_tasks.read().unwrap();
let tasks =
index_scheduler.queue.get_task_ids(&rtxn, &Query { ..Default::default() }, &proc).unwrap();
let tasks = index_scheduler.queue.tasks.get_existing_tasks(&rtxn, tasks).unwrap();
snapshot!(json_string!(tasks, { "[].enqueuedAt" => "[date]", "[].startedAt" => "[date]", "[].finishedAt" => "[date]" }), name: "task_queue_is_full");
drop(rtxn);
drop(proc);
// now we're above the max number of tasks
// and if we try to advance in the tick function no new task deletion should be enqueued
handle.advance_till([Start, BatchCreated]);
let rtxn = index_scheduler.env.read_txn().unwrap();
let proc = index_scheduler.processing_tasks.read().unwrap();
let tasks =
index_scheduler.queue.get_task_ids(&rtxn, &Query { ..Default::default() }, &proc).unwrap();
let tasks = index_scheduler.queue.tasks.get_existing_tasks(&rtxn, tasks).unwrap();
snapshot!(json_string!(tasks, { "[].enqueuedAt" => "[date]", "[].startedAt" => "[date]", "[].finishedAt" => "[date]", ".**.original_filter" => "[filter]", ".**.query" => "[query]" }), name: "task_deletion_have_not_been_enqueued");
drop(rtxn);
drop(proc);
}
#[test]
fn test_auto_deletion_of_tasks() {
let (index_scheduler, mut handle) = IndexScheduler::test_with_custom_config(vec![], |config| {
config.max_number_of_tasks = 2;
None
});
index_scheduler
.register(
KindWithContent::IndexCreation { index_uid: S("doggo"), primary_key: None },
None,
false,
)
.unwrap();
handle.advance_one_successful_batch();
index_scheduler
.register(
KindWithContent::IndexCreation { index_uid: S("doggo"), primary_key: None },
None,
false,
)
.unwrap();
handle.advance_one_failed_batch();
// at this point the max number of tasks is reached
// we can still enqueue multiple tasks
index_scheduler
.register(
KindWithContent::IndexCreation { index_uid: S("doggo"), primary_key: None },
None,
false,
)
.unwrap();
index_scheduler
.register(
KindWithContent::IndexCreation { index_uid: S("doggo"), primary_key: None },
None,
false,
)
.unwrap();
let rtxn = index_scheduler.env.read_txn().unwrap();
let proc = index_scheduler.processing_tasks.read().unwrap();
let tasks =
index_scheduler.queue.get_task_ids(&rtxn, &Query { ..Default::default() }, &proc).unwrap();
let tasks = index_scheduler.queue.tasks.get_existing_tasks(&rtxn, tasks).unwrap();
snapshot!(json_string!(tasks, { "[].enqueuedAt" => "[date]", "[].startedAt" => "[date]", "[].finishedAt" => "[date]" }), name: "task_queue_is_full");
drop(rtxn);
drop(proc);
// now we're above the max number of tasks
// and if we try to advance in the tick function a new task deletion should be enqueued
handle.advance_till([Start, BatchCreated]);
let rtxn = index_scheduler.env.read_txn().unwrap();
let proc = index_scheduler.processing_tasks.read().unwrap();
let tasks =
index_scheduler.queue.get_task_ids(&rtxn, &Query { ..Default::default() }, &proc).unwrap();
let tasks = index_scheduler.queue.tasks.get_existing_tasks(&rtxn, tasks).unwrap();
snapshot!(json_string!(tasks, { "[].enqueuedAt" => "[date]", "[].startedAt" => "[date]", "[].finishedAt" => "[date]", ".**.original_filter" => "[filter]", ".**.query" => "[query]" }), name: "task_deletion_have_been_enqueued");
drop(rtxn);
drop(proc);
handle.advance_till([InsideProcessBatch, ProcessBatchSucceeded, AfterProcessing]);
let rtxn = index_scheduler.env.read_txn().unwrap();
let proc = index_scheduler.processing_tasks.read().unwrap();
let tasks =
index_scheduler.queue.get_task_ids(&rtxn, &Query { ..Default::default() }, &proc).unwrap();
let tasks = index_scheduler.queue.tasks.get_existing_tasks(&rtxn, tasks).unwrap();
snapshot!(json_string!(tasks, { "[].enqueuedAt" => "[date]", "[].startedAt" => "[date]", "[].finishedAt" => "[date]", ".**.original_filter" => "[filter]", ".**.query" => "[query]" }), name: "task_deletion_have_been_processed");
drop(rtxn);
drop(proc);
handle.advance_one_failed_batch();
// a new task deletion has been enqueued
handle.advance_one_successful_batch();
let rtxn = index_scheduler.env.read_txn().unwrap();
let proc = index_scheduler.processing_tasks.read().unwrap();
let tasks =
index_scheduler.queue.get_task_ids(&rtxn, &Query { ..Default::default() }, &proc).unwrap();
let tasks = index_scheduler.queue.tasks.get_existing_tasks(&rtxn, tasks).unwrap();
snapshot!(json_string!(tasks, { "[].enqueuedAt" => "[date]", "[].startedAt" => "[date]", "[].finishedAt" => "[date]", ".**.original_filter" => "[filter]", ".**.query" => "[query]" }), name: "after_the_second_task_deletion");
drop(rtxn);
drop(proc);
handle.advance_one_failed_batch();
handle.advance_one_successful_batch();
let rtxn = index_scheduler.env.read_txn().unwrap();
let proc = index_scheduler.processing_tasks.read().unwrap();
let tasks =
index_scheduler.queue.get_task_ids(&rtxn, &Query { ..Default::default() }, &proc).unwrap();
let tasks = index_scheduler.queue.tasks.get_existing_tasks(&rtxn, tasks).unwrap();
snapshot!(json_string!(tasks, { "[].enqueuedAt" => "[date]", "[].startedAt" => "[date]", "[].finishedAt" => "[date]", ".**.original_filter" => "[filter]", ".**.query" => "[query]" }), name: "everything_has_been_processed");
drop(rtxn);
drop(proc);
}
#[test]
fn test_task_queue_is_full() {
let (index_scheduler, mut handle) = IndexScheduler::test_with_custom_config(vec![], |config| {
// that's the minimum map size possible
config.task_db_size = 1048576 * 3;
None
});
index_scheduler
.register(
KindWithContent::IndexCreation { index_uid: S("doggo"), primary_key: None },
None,
false,
)
.unwrap();
handle.advance_one_successful_batch();
// on average this task takes ~600 bytes
loop {
let result = index_scheduler.register(
KindWithContent::IndexCreation { index_uid: S("doggo"), primary_key: None },
None,
false,
);
if result.is_err() {
break;
}
handle.advance_one_failed_batch();
}
index_scheduler.assert_internally_consistent();
// at this point the task DB should have reached its limit and we should not be able to register new tasks
let result = index_scheduler
.register(
KindWithContent::IndexCreation { index_uid: S("doggo"), primary_key: None },
None,
false,
)
.unwrap_err();
snapshot!(result, @"Meilisearch cannot receive write operations because the limit of the task database has been reached. Please delete tasks to continue performing write operations.");
// we won't be able to test this error in an integration test, so as a best-effort check I still ensure the error returns the expected error code
snapshot!(format!("{:?}", result.error_code()), @"NoSpaceLeftOnDevice");
// Even the task deletion and cancelation that don't delete anything should be refused
let result = index_scheduler
.register(
KindWithContent::TaskDeletion { query: S("test"), tasks: RoaringBitmap::new() },
None,
false,
)
.unwrap_err();
snapshot!(result, @"Meilisearch cannot receive write operations because the limit of the task database has been reached. Please delete tasks to continue performing write operations.");
let result = index_scheduler
.register(
KindWithContent::TaskCancelation { query: S("test"), tasks: RoaringBitmap::new() },
None,
false,
)
.unwrap_err();
snapshot!(result, @"Meilisearch cannot receive write operations because the limit of the task database has been reached. Please delete tasks to continue performing write operations.");
// we won't be able to test this error in an integration test, so as a best-effort check I still ensure the error returns the expected error code
snapshot!(format!("{:?}", result.error_code()), @"NoSpaceLeftOnDevice");
// But a task cancelation that cancels something should work
index_scheduler
.register(
KindWithContent::TaskCancelation { query: S("test"), tasks: (0..100).collect() },
None,
false,
)
.unwrap();
handle.advance_one_successful_batch();
// But we should still be forbidden from enqueuing new tasks
let result = index_scheduler
.register(
KindWithContent::IndexCreation { index_uid: S("doggo"), primary_key: None },
None,
false,
)
.unwrap_err();
snapshot!(result, @"Meilisearch cannot receive write operations because the limit of the task database has been reached. Please delete tasks to continue performing write operations.");
// And a task deletion that deletes something should work
index_scheduler
.register(
KindWithContent::TaskDeletion { query: S("test"), tasks: (0..100).collect() },
None,
false,
)
.unwrap();
handle.advance_one_successful_batch();
// Now we should be able to enqueue a few tasks again
index_scheduler
.register(
KindWithContent::IndexCreation { index_uid: S("doggo"), primary_key: None },
None,
false,
)
.unwrap();
handle.advance_one_failed_batch();
}

@@ -1,539 +0,0 @@
/*!
The autobatcher is responsible for combining the next enqueued
tasks affecting a single index into a [batch](crate::batch::Batch).
The main function of the autobatcher is [`next_autobatch`].
*/
use std::ops::ControlFlow::{self, Break, Continue};
use meilisearch_types::tasks::{BatchStopReason, PrimaryKeyMismatchReason, TaskId};
use crate::KindWithContent;
/// Succinctly describes a task's [`Kind`](meilisearch_types::tasks::Kind)
/// for the purpose of simplifying the implementation of the autobatcher.
///
/// Only the non-prioritised tasks that can be grouped in a batch have a corresponding [`AutobatchKind`]
enum AutobatchKind {
DocumentImport { allow_index_creation: bool, primary_key: Option<String> },
DocumentEdition,
DocumentDeletion { by_filter: bool },
DocumentClear,
Settings { allow_index_creation: bool },
IndexCreation,
IndexDeletion,
IndexUpdate,
IndexSwap,
}
impl AutobatchKind {
#[rustfmt::skip]
fn allow_index_creation(&self) -> Option<bool> {
match self {
AutobatchKind::DocumentImport { allow_index_creation, .. }
| AutobatchKind::Settings { allow_index_creation, .. } => Some(*allow_index_creation),
_ => None,
}
}
fn primary_key(&self) -> Option<Option<&str>> {
match self {
AutobatchKind::DocumentImport { primary_key, .. } => Some(primary_key.as_deref()),
_ => None,
}
}
}
impl From<KindWithContent> for AutobatchKind {
fn from(kind: KindWithContent) -> Self {
match kind {
KindWithContent::DocumentAdditionOrUpdate {
allow_index_creation, primary_key, ..
} => AutobatchKind::DocumentImport { allow_index_creation, primary_key },
KindWithContent::DocumentEdition { .. } => AutobatchKind::DocumentEdition,
KindWithContent::DocumentDeletion { .. } => {
AutobatchKind::DocumentDeletion { by_filter: false }
}
KindWithContent::DocumentClear { .. } => AutobatchKind::DocumentClear,
KindWithContent::DocumentDeletionByFilter { .. } => {
AutobatchKind::DocumentDeletion { by_filter: true }
}
KindWithContent::SettingsUpdate { allow_index_creation, is_deletion, .. } => {
AutobatchKind::Settings {
allow_index_creation: allow_index_creation && !is_deletion,
}
}
KindWithContent::IndexDeletion { .. } => AutobatchKind::IndexDeletion,
KindWithContent::IndexCreation { .. } => AutobatchKind::IndexCreation,
KindWithContent::IndexUpdate { .. } => AutobatchKind::IndexUpdate,
KindWithContent::IndexSwap { .. } => AutobatchKind::IndexSwap,
KindWithContent::IndexCompaction { .. }
| KindWithContent::TaskCancelation { .. }
| KindWithContent::TaskDeletion { .. }
| KindWithContent::DumpCreation { .. }
| KindWithContent::Export { .. }
| KindWithContent::UpgradeDatabase { .. }
| KindWithContent::SnapshotCreation => {
panic!("The autobatcher should never be called with tasks with special priority or that don't apply to an index.")
}
}
}
}
#[derive(Debug)]
pub enum BatchKind {
DocumentClear {
ids: Vec<TaskId>,
},
DocumentOperation {
allow_index_creation: bool,
primary_key: Option<String>,
operation_ids: Vec<TaskId>,
},
DocumentEdition {
id: TaskId,
},
DocumentDeletion {
deletion_ids: Vec<TaskId>,
includes_by_filter: bool,
},
ClearAndSettings {
other: Vec<TaskId>,
allow_index_creation: bool,
settings_ids: Vec<TaskId>,
},
Settings {
allow_index_creation: bool,
settings_ids: Vec<TaskId>,
},
IndexDeletion {
ids: Vec<TaskId>,
},
IndexCreation {
id: TaskId,
},
IndexUpdate {
id: TaskId,
},
IndexSwap {
id: TaskId,
},
}
impl BatchKind {
#[rustfmt::skip]
fn allow_index_creation(&self) -> Option<bool> {
match self {
BatchKind::DocumentOperation { allow_index_creation, .. }
| BatchKind::ClearAndSettings { allow_index_creation, .. }
| BatchKind::Settings { allow_index_creation, .. } => Some(*allow_index_creation),
_ => None,
}
}
fn primary_key(&self) -> Option<Option<&str>> {
match self {
BatchKind::DocumentOperation { primary_key, .. } => Some(primary_key.as_deref()),
_ => None,
}
}
}
impl BatchKind {
/// Returns a `ControlFlow::Break` if you must stop right now.
/// The boolean tells you whether an index has been created by the batched task.
/// To ease the writing of the code, `true` can be returned when you don't need to create an index,
/// but `false` can't be returned if you need to create an index.
// TODO use an AutobatchKind as input
pub fn new(
task_id: TaskId,
kind_with_content: KindWithContent,
primary_key: Option<&str>,
) -> (ControlFlow<(BatchKind, BatchStopReason), BatchKind>, bool) {
use AutobatchKind as K;
let kind = kind_with_content.as_kind();
match AutobatchKind::from(kind_with_content) {
K::IndexCreation => (
Break((
BatchKind::IndexCreation { id: task_id },
BatchStopReason::TaskCannotBeBatched { kind, id: task_id },
)),
true,
),
K::IndexDeletion => (
Break((
BatchKind::IndexDeletion { ids: vec![task_id] },
BatchStopReason::IndexDeletion { id: task_id },
)),
false,
),
K::IndexUpdate => (
Break((
BatchKind::IndexUpdate { id: task_id },
BatchStopReason::TaskCannotBeBatched { kind, id: task_id },
)),
false,
),
K::IndexSwap => (
Break((
BatchKind::IndexSwap { id: task_id },
BatchStopReason::TaskCannotBeBatched { kind, id: task_id },
)),
false,
),
K::DocumentClear => (Continue(BatchKind::DocumentClear { ids: vec![task_id] }), false),
K::DocumentImport { allow_index_creation, primary_key: pk }
if primary_key.is_none() || pk.is_none() || primary_key == pk.as_deref() =>
{
(
Continue(BatchKind::DocumentOperation {
allow_index_creation,
primary_key: pk,
operation_ids: vec![task_id],
}),
allow_index_creation,
)
}
// if the primary key set in the task differs from the index's primary key, we should stop and make this batch fail asap.
K::DocumentImport { allow_index_creation, primary_key: pk } => (
Break((
BatchKind::DocumentOperation {
allow_index_creation,
primary_key: pk.clone(),
operation_ids: vec![task_id],
},
BatchStopReason::PrimaryKeyIndexMismatch {
id: task_id,
in_index: primary_key.unwrap().to_owned(),
in_task: pk.unwrap(),
},
)),
allow_index_creation,
),
K::DocumentEdition => (
Break((
BatchKind::DocumentEdition { id: task_id },
BatchStopReason::TaskCannotBeBatched { kind, id: task_id },
)),
false,
),
K::DocumentDeletion { by_filter: includes_by_filter } => (
Continue(BatchKind::DocumentDeletion {
deletion_ids: vec![task_id],
includes_by_filter,
}),
false,
),
K::Settings { allow_index_creation } => (
Continue(BatchKind::Settings { allow_index_creation, settings_ids: vec![task_id] }),
allow_index_creation,
),
}
}
/// Returns a `ControlFlow::Break` if you must stop right now.
/// The boolean tells you whether an index has been created by the batched task.
/// To ease the writing of the code, `true` can be returned when you don't need to create an index,
/// but `false` can't be returned if you need to create an index.
#[rustfmt::skip]
fn accumulate(self, id: TaskId, kind_with_content: KindWithContent, index_already_exists: bool, primary_key: Option<&str>) -> ControlFlow<(BatchKind, BatchStopReason), BatchKind> {
use AutobatchKind as K;
let kind = kind_with_content.as_kind();
let autobatch_kind = AutobatchKind::from(kind_with_content);
let pk: Option<String> = match (self.primary_key(), autobatch_kind.primary_key(), primary_key) {
// 1. If the incoming task doesn't interact with the primary key -> we can continue
(batch_pk, None | Some(None), _) => {
batch_pk.flatten().map(ToOwned::to_owned)
},
// 2.1 If we already have a primary-key ->
// 2.1.1 If the task we're trying to accumulate has a pk, it must be equal to our primary key
(_batch_pk, Some(Some(task_pk)), Some(index_pk)) => if task_pk == index_pk {
Some(task_pk.to_owned())
} else {
return Break((self, BatchStopReason::PrimaryKeyMismatch {
id,
reason: PrimaryKeyMismatchReason::TaskPrimaryKeyDifferFromIndexPrimaryKey {
task_pk: task_pk.to_owned(),
index_pk: index_pk.to_owned(),
},
}))
},
// 2.2 If we don't have a primary-key ->
// 2.2.2 If the batch is set to Some(None), the task should be too
(Some(None), Some(Some(task_pk)), None) => return Break((self, BatchStopReason::PrimaryKeyMismatch {
id,
reason: PrimaryKeyMismatchReason::CannotInterfereWithPrimaryKeyGuessing {
task_pk: task_pk.to_owned(),
},
})),
(Some(Some(batch_pk)), Some(Some(task_pk)), None) => if task_pk == batch_pk {
Some(task_pk.to_owned())
} else {
let batch_pk = batch_pk.to_owned();
let task_pk = task_pk.to_owned();
return Break((self, BatchStopReason::PrimaryKeyMismatch {
id,
reason: PrimaryKeyMismatchReason::TaskPrimaryKeyDifferFromCurrentBatchPrimaryKey {
batch_pk,
task_pk
},
}))
},
(None, Some(Some(task_pk)), None) => Some(task_pk.to_owned())
};
match (self, autobatch_kind) {
// We don't batch any of these operations
(this, K::IndexCreation | K::IndexUpdate | K::IndexSwap | K::DocumentEdition) => {
Break((this, BatchStopReason::TaskCannotBeBatched { kind, id }))
},
// We must not batch tasks that don't have the same index creation rights if the index doesn't already exist.
(this, kind) if !index_already_exists && this.allow_index_creation() == Some(false) && kind.allow_index_creation() == Some(true) => {
Break((this, BatchStopReason::IndexCreationMismatch { id }))
},
// The index deletion can batch with everything but must stop after
(
BatchKind::DocumentClear { mut ids }
| BatchKind::DocumentDeletion { deletion_ids: mut ids, includes_by_filter: _ }
| BatchKind::DocumentOperation { allow_index_creation: _, primary_key: _, operation_ids: mut ids }
| BatchKind::Settings { allow_index_creation: _, settings_ids: mut ids },
K::IndexDeletion,
) => {
ids.push(id);
Break((BatchKind::IndexDeletion { ids }, BatchStopReason::IndexDeletion { id }))
}
(
BatchKind::ClearAndSettings { settings_ids: mut ids, allow_index_creation: _, mut other },
K::IndexDeletion,
) => {
ids.push(id);
ids.append(&mut other);
Break((BatchKind::IndexDeletion { ids }, BatchStopReason::IndexDeletion { id }))
}
(
BatchKind::DocumentClear { mut ids },
K::DocumentClear | K::DocumentDeletion { by_filter: _ },
) => {
ids.push(id);
Continue(BatchKind::DocumentClear { ids })
}
(
this @ BatchKind::DocumentClear { .. },
K::DocumentImport { .. } | K::Settings { .. },
) => Break((this, BatchStopReason::DocumentOperationWithSettings { id })),
(
BatchKind::DocumentOperation { allow_index_creation: _, primary_key: _, mut operation_ids },
K::DocumentClear,
) => {
operation_ids.push(id);
Continue(BatchKind::DocumentClear { ids: operation_ids })
}
// we can autobatch different kinds of document operations and mix replacements with updates
(
BatchKind::DocumentOperation { allow_index_creation, primary_key: _, mut operation_ids },
K::DocumentImport { primary_key: _, .. },
) => {
operation_ids.push(id);
Continue(BatchKind::DocumentOperation {
allow_index_creation,
operation_ids,
primary_key: pk,
})
}
(
BatchKind::DocumentOperation { allow_index_creation, primary_key: _, mut operation_ids },
K::DocumentDeletion { by_filter: false },
) => {
operation_ids.push(id);
Continue(BatchKind::DocumentOperation {
allow_index_creation,
operation_ids,
primary_key: pk,
})
}
// We can't batch a document operation with a delete by filter
(
this @ BatchKind::DocumentOperation { .. },
K::DocumentDeletion { by_filter: true },
) => {
Break((this, BatchStopReason::DocumentOperationWithDeletionByFilter { id }))
}
(
this @ BatchKind::DocumentOperation { .. },
K::Settings { .. },
) => Break((this, BatchStopReason::DocumentOperationWithSettings { id })),
(BatchKind::DocumentDeletion { mut deletion_ids, includes_by_filter: _ }, K::DocumentClear) => {
deletion_ids.push(id);
Continue(BatchKind::DocumentClear { ids: deletion_ids })
}
// we can't autobatch the deletion and import if the document deletion contained a filter
(
this @ BatchKind::DocumentDeletion { deletion_ids: _, includes_by_filter: true },
K::DocumentImport { .. }
) => Break((this, BatchStopReason::DeletionByFilterWithDocumentOperation { id })),
// we can autobatch the deletion and import if the index already exists
(
BatchKind::DocumentDeletion { mut deletion_ids, includes_by_filter: false },
K::DocumentImport { allow_index_creation, primary_key }
) if index_already_exists => {
deletion_ids.push(id);
Continue(BatchKind::DocumentOperation {
allow_index_creation,
primary_key,
operation_ids: deletion_ids,
})
}
// we can autobatch the deletion and import if both can't create an index
(
BatchKind::DocumentDeletion { mut deletion_ids, includes_by_filter: false },
K::DocumentImport { allow_index_creation, primary_key }
) if !allow_index_creation => {
deletion_ids.push(id);
Continue(BatchKind::DocumentOperation {
allow_index_creation,
primary_key,
operation_ids: deletion_ids,
})
}
// we can't autobatch a deletion and an import if the index does not exist but would be created by an addition
(
this @ BatchKind::DocumentDeletion { .. },
K::DocumentImport { .. }
) => {
Break((this, BatchStopReason::IndexCreationMismatch { id }))
}
(BatchKind::DocumentDeletion { mut deletion_ids, includes_by_filter }, K::DocumentDeletion { by_filter }) => {
deletion_ids.push(id);
Continue(BatchKind::DocumentDeletion { deletion_ids, includes_by_filter: includes_by_filter | by_filter })
}
(this @ BatchKind::DocumentDeletion { .. }, K::Settings { .. }) => Break((this, BatchStopReason::DocumentOperationWithSettings { id })),
(
BatchKind::Settings { settings_ids, allow_index_creation },
K::DocumentClear,
) => Continue(BatchKind::ClearAndSettings {
settings_ids,
allow_index_creation,
other: vec![id],
}),
(
this @ BatchKind::Settings { .. },
K::DocumentImport { .. } | K::DocumentDeletion { .. },
) => Break((this, BatchStopReason::SettingsWithDocumentOperation { id })),
(
BatchKind::Settings { mut settings_ids, allow_index_creation },
K::Settings { .. },
) => {
settings_ids.push(id);
Continue(BatchKind::Settings {
allow_index_creation,
settings_ids,
})
}
(
BatchKind::ClearAndSettings { mut other, settings_ids, allow_index_creation },
K::DocumentClear,
) => {
other.push(id);
Continue(BatchKind::ClearAndSettings {
other,
settings_ids,
allow_index_creation,
})
}
(this @ BatchKind::ClearAndSettings { .. }, K::DocumentImport { .. }) => Break((this, BatchStopReason::SettingsWithDocumentOperation { id })),
(
BatchKind::ClearAndSettings {
mut other,
settings_ids,
allow_index_creation,
},
K::DocumentDeletion { .. },
) => {
other.push(id);
Continue(BatchKind::ClearAndSettings {
other,
settings_ids,
allow_index_creation,
})
}
(
BatchKind::ClearAndSettings { mut settings_ids, other, allow_index_creation },
K::Settings { .. },
) => {
settings_ids.push(id);
Continue(BatchKind::ClearAndSettings {
other,
settings_ids,
allow_index_creation,
})
}
(
BatchKind::IndexCreation { .. }
| BatchKind::IndexDeletion { .. }
| BatchKind::IndexUpdate { .. }
| BatchKind::IndexSwap { .. }
| BatchKind::DocumentEdition { .. },
_,
) => {
unreachable!()
}
}
}
}
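The four `DocumentDeletion` + `DocumentImport` arms above reduce to a small decision over three booleans. The sketch below restates that decision on its own. `Decision` and `deletion_then_import` are hypothetical names introduced only for illustration; the real code returns `Continue(BatchKind::DocumentOperation { .. })` or `Break((this, BatchStopReason::..))` rather than this enum.

```rust
/// Hypothetical summary of the outcome; not a type from the autobatcher.
#[derive(Debug, PartialEq)]
pub enum Decision {
    /// The import joins the deletion batch as a document operation.
    Merge,
    /// The batch stops: the deletion batch contains a deletion by filter.
    StopDeletionByFilter,
    /// The batch stops: the import could create an index that does not exist yet.
    StopIndexCreationMismatch,
}

/// Mirrors the order of the match arms: the filter check comes first,
/// then the two cases in which merging is allowed, then the fallback.
pub fn deletion_then_import(
    includes_by_filter: bool,
    index_already_exists: bool,
    allow_index_creation: bool,
) -> Decision {
    if includes_by_filter {
        return Decision::StopDeletionByFilter;
    }
    if index_already_exists || !allow_index_creation {
        return Decision::Merge;
    }
    Decision::StopIndexCreationMismatch
}
```

With these assumptions, the only non-filter case that stops the batch is a missing index combined with an import that is allowed to create it.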
/// Create a batch from an ordered list of tasks.
///
/// ## Preconditions
/// 1. The tasks must be enqueued and given in the order in which they were enqueued
/// 2. The tasks must not be prioritised tasks (e.g. task cancellation, dump, snapshot, task deletion)
/// 3. The tasks must all be related to the same index
///
/// ## Return
/// `None` if the list of tasks is empty. Otherwise, an [`AutoBatch`] that represents
/// a subset of the given tasks.
pub fn autobatch(
enqueued: Vec<(TaskId, KindWithContent)>,
index_already_exists: bool,
primary_key: Option<&str>,
) -> Option<(BatchKind, bool, Option<BatchStopReason>)> {
let mut enqueued = enqueued.into_iter();
let (id, kind) = enqueued.next()?;
// `index_exist` tracks whether the index should exist at this point, given the tasks batched so far.
let mut index_exist = index_already_exists;
let (mut acc, must_create_index) = match BatchKind::new(id, kind, primary_key) {
(Continue(acc), create) => (acc, create),
(Break((acc, batch_stop_reason)), create) => {
return Some((acc, create, Some(batch_stop_reason)))
}
};
// if an index has been created in the previous step we can consider it as existing.
index_exist |= must_create_index;
for (id, kind_with_content) in enqueued {
acc = match acc.accumulate(id, kind_with_content, index_exist, primary_key) {
Continue(acc) => acc,
Break((acc, batch_stop_reason)) => {
return Some((acc, must_create_index, Some(batch_stop_reason)))
}
};
}
Some((acc, must_create_index, None))
}
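The loop in `autobatch` is a fold that can stop early: each task either extends the accumulator (`Continue`) or ends the batch with a reason (`Break`). Below is a minimal, self-contained sketch of that pattern using `std::ops::ControlFlow`. The `Task`, `Batch`, `Stop`, and `batch_tasks` names are hypothetical simplifications, not the real `KindWithContent`, `BatchKind`, or `BatchStopReason`, and the compatibility rule is reduced to "document tasks batch together, settings batch with settings".

```rust
use std::ops::ControlFlow::{self, Break, Continue};

/// Hypothetical, simplified task kinds (not the real `KindWithContent`).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Task {
    Import,
    Deletion,
    Settings,
}

/// Hypothetical batch: remembers the kind of its first task and the ids it holds.
#[derive(Debug, PartialEq)]
pub struct Batch {
    pub first: Task,
    pub ids: Vec<u32>,
}

/// Hypothetical stop reason: the id of the first task that could not be batched.
#[derive(Debug, PartialEq)]
pub struct Stop {
    pub id: u32,
}

impl Batch {
    /// Either absorb the task and continue, or give the batch back with a stop reason.
    fn accumulate(mut self, id: u32, task: Task) -> ControlFlow<(Batch, Stop), Batch> {
        let both_settings = self.first == Task::Settings && task == Task::Settings;
        let both_documents = self.first != Task::Settings && task != Task::Settings;
        if both_settings || both_documents {
            self.ids.push(id);
            Continue(self)
        } else {
            Break((self, Stop { id }))
        }
    }
}

/// Returns `None` for an empty list, otherwise the batch built from a prefix of
/// `tasks` and, if batching stopped early, the reason why.
pub fn batch_tasks(tasks: Vec<(u32, Task)>) -> Option<(Batch, Option<Stop>)> {
    let mut tasks = tasks.into_iter();
    let (id, first) = tasks.next()?;
    let mut acc = Batch { first, ids: vec![id] };
    for (id, task) in tasks {
        acc = match acc.accumulate(id, task) {
            Continue(acc) => acc,
            Break((acc, stop)) => return Some((acc, Some(stop))),
        };
    }
    Some((acc, None))
}
```

Taking `self` by value in `accumulate` lets the `Break` arm hand the partially built batch back to the caller, which is why the early return can still yield a usable batch.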

@@ -1,402 +0,0 @@
use meilisearch_types::milli::update::IndexDocumentsMethod::{
self, ReplaceDocuments, UpdateDocuments,
};
use meilisearch_types::tasks::{BatchStopReason, IndexSwap, KindWithContent};
use uuid::Uuid;
use self::autobatcher::{autobatch, BatchKind};
use super::*;
use crate::TaskId;
#[macro_export]
macro_rules! debug_snapshot {
($value:expr, @$snapshot:literal) => {{
let value = format!("{:?}", $value);
meili_snap::snapshot!(value, @$snapshot);
}};
}
fn autobatch_from(
index_already_exists: bool,
primary_key: Option<&str>,
input: impl IntoIterator<Item = KindWithContent>,
) -> Option<(BatchKind, bool, Option<BatchStopReason>)> {
autobatch(
input.into_iter().enumerate().map(|(id, kind)| (id as TaskId, kind)).collect(),
index_already_exists,
primary_key,
)
}
fn doc_imp(
method: IndexDocumentsMethod,
allow_index_creation: bool,
primary_key: Option<&str>,
) -> KindWithContent {
KindWithContent::DocumentAdditionOrUpdate {
index_uid: String::from("doggo"),
primary_key: primary_key.map(|pk| pk.to_string()),
method,
content_file: Uuid::new_v4(),
documents_count: 0,
allow_index_creation,
}
}
fn doc_del() -> KindWithContent {
KindWithContent::DocumentDeletion {
index_uid: String::from("doggo"),
documents_ids: Vec::new(),
}
}
fn doc_del_fil() -> KindWithContent {
KindWithContent::DocumentDeletionByFilter {
index_uid: String::from("doggo"),
filter_expr: serde_json::json!("cuteness > 100"),
}
}
fn doc_clr() -> KindWithContent {
KindWithContent::DocumentClear { index_uid: String::from("doggo") }
}
fn settings(allow_index_creation: bool) -> KindWithContent {
KindWithContent::SettingsUpdate {
index_uid: String::from("doggo"),
new_settings: Default::default(),
is_deletion: false,
allow_index_creation,
}
}
fn idx_create() -> KindWithContent {
KindWithContent::IndexCreation { index_uid: String::from("doggo"), primary_key: None }
}
fn idx_update() -> KindWithContent {
KindWithContent::IndexUpdate {
index_uid: String::from("doggo"),
primary_key: None,
new_index_uid: None,
}
}
fn idx_del() -> KindWithContent {
KindWithContent::IndexDeletion { index_uid: String::from("doggo") }
}
fn idx_swap() -> KindWithContent {
KindWithContent::IndexSwap {
swaps: vec![IndexSwap {
indexes: (String::from("doggo"), String::from("catto")),
rename: false,
}],
}
}
#[test]
fn autobatch_simple_operation_together() {
// we can autobatch one or multiple `ReplaceDocuments` together.
// if the index exists.
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, None)]), @"Some((DocumentOperation { allow_index_creation: true, primary_key: None, operation_ids: [0] }, true, None))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, false, None)]), @"Some((DocumentOperation { allow_index_creation: false, primary_key: None, operation_ids: [0] }, false, None))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, None), doc_imp( ReplaceDocuments, true , None), doc_imp(ReplaceDocuments, true , None)]), @"Some((DocumentOperation { allow_index_creation: true, primary_key: None, operation_ids: [0, 1, 2] }, true, None))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, false, None), doc_imp( ReplaceDocuments, false , None), doc_imp(ReplaceDocuments, false , None)]), @"Some((DocumentOperation { allow_index_creation: false, primary_key: None, operation_ids: [0, 1, 2] }, false, None))");
// if it doesn't exist.
debug_snapshot!(autobatch_from(false,None, [doc_imp(ReplaceDocuments, true, None)]), @"Some((DocumentOperation { allow_index_creation: true, primary_key: None, operation_ids: [0] }, true, None))");
debug_snapshot!(autobatch_from(false,None, [doc_imp(ReplaceDocuments, false, None)]), @"Some((DocumentOperation { allow_index_creation: false, primary_key: None, operation_ids: [0] }, false, None))");
debug_snapshot!(autobatch_from(false,None, [doc_imp(ReplaceDocuments, true, None), doc_imp( ReplaceDocuments, true , None), doc_imp(ReplaceDocuments, true , None)]), @"Some((DocumentOperation { allow_index_creation: true, primary_key: None, operation_ids: [0, 1, 2] }, true, None))");
debug_snapshot!(autobatch_from(false,None, [doc_imp(ReplaceDocuments, false, None), doc_imp( ReplaceDocuments, true , None), doc_imp(ReplaceDocuments, true , None)]), @"Some((DocumentOperation { allow_index_creation: false, primary_key: None, operation_ids: [0] }, false, Some(IndexCreationMismatch { id: 1 })))");
// we can autobatch one or multiple `UpdateDocuments` together.
// if the index exists.
debug_snapshot!(autobatch_from(true, None, [doc_imp(UpdateDocuments, true, None)]), @"Some((DocumentOperation { allow_index_creation: true, primary_key: None, operation_ids: [0] }, true, None))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(UpdateDocuments, true, None), doc_imp(UpdateDocuments, true, None), doc_imp(UpdateDocuments, true, None)]), @"Some((DocumentOperation { allow_index_creation: true, primary_key: None, operation_ids: [0, 1, 2] }, true, None))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(UpdateDocuments, false, None)]), @"Some((DocumentOperation { allow_index_creation: false, primary_key: None, operation_ids: [0] }, false, None))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(UpdateDocuments, false, None), doc_imp(UpdateDocuments, false, None), doc_imp(UpdateDocuments, false, None)]), @"Some((DocumentOperation { allow_index_creation: false, primary_key: None, operation_ids: [0, 1, 2] }, false, None))");
// if it doesn't exist.
debug_snapshot!(autobatch_from(false,None, [doc_imp(UpdateDocuments, true, None)]), @"Some((DocumentOperation { allow_index_creation: true, primary_key: None, operation_ids: [0] }, true, None))");
debug_snapshot!(autobatch_from(false,None, [doc_imp(UpdateDocuments, true, None), doc_imp(UpdateDocuments, true, None), doc_imp(UpdateDocuments, true, None)]), @"Some((DocumentOperation { allow_index_creation: true, primary_key: None, operation_ids: [0, 1, 2] }, true, None))");
debug_snapshot!(autobatch_from(false,None, [doc_imp(UpdateDocuments, false, None)]), @"Some((DocumentOperation { allow_index_creation: false, primary_key: None, operation_ids: [0] }, false, None))");
debug_snapshot!(autobatch_from(false,None, [doc_imp(UpdateDocuments, false, None), doc_imp(UpdateDocuments, false, None), doc_imp(UpdateDocuments, false, None)]), @"Some((DocumentOperation { allow_index_creation: false, primary_key: None, operation_ids: [0, 1, 2] }, false, None))");
// we can autobatch one or multiple DocumentDeletion together
debug_snapshot!(autobatch_from(true, None, [doc_del()]), @"Some((DocumentDeletion { deletion_ids: [0], includes_by_filter: false }, false, None))");
debug_snapshot!(autobatch_from(true, None, [doc_del(), doc_del(), doc_del()]), @"Some((DocumentDeletion { deletion_ids: [0, 1, 2], includes_by_filter: false }, false, None))");
debug_snapshot!(autobatch_from(false,None, [doc_del()]), @"Some((DocumentDeletion { deletion_ids: [0], includes_by_filter: false }, false, None))");
debug_snapshot!(autobatch_from(false,None, [doc_del(), doc_del(), doc_del()]), @"Some((DocumentDeletion { deletion_ids: [0, 1, 2], includes_by_filter: false }, false, None))");
// we can autobatch one or multiple DocumentDeletionByFilter together
debug_snapshot!(autobatch_from(true, None, [doc_del_fil()]), @"Some((DocumentDeletion { deletion_ids: [0], includes_by_filter: true }, false, None))");
debug_snapshot!(autobatch_from(true, None, [doc_del_fil(), doc_del_fil(), doc_del_fil()]), @"Some((DocumentDeletion { deletion_ids: [0, 1, 2], includes_by_filter: true }, false, None))");
debug_snapshot!(autobatch_from(false,None, [doc_del_fil()]), @"Some((DocumentDeletion { deletion_ids: [0], includes_by_filter: true }, false, None))");
debug_snapshot!(autobatch_from(false,None, [doc_del_fil(), doc_del_fil(), doc_del_fil()]), @"Some((DocumentDeletion { deletion_ids: [0, 1, 2], includes_by_filter: true }, false, None))");
// we can autobatch one or multiple Settings together
debug_snapshot!(autobatch_from(true, None, [settings(true)]), @"Some((Settings { allow_index_creation: true, settings_ids: [0] }, true, None))");
debug_snapshot!(autobatch_from(true, None, [settings(true), settings(true), settings(true)]), @"Some((Settings { allow_index_creation: true, settings_ids: [0, 1, 2] }, true, None))");
debug_snapshot!(autobatch_from(true, None, [settings(false)]), @"Some((Settings { allow_index_creation: false, settings_ids: [0] }, false, None))");
debug_snapshot!(autobatch_from(true, None, [settings(false), settings(false), settings(false)]), @"Some((Settings { allow_index_creation: false, settings_ids: [0, 1, 2] }, false, None))");
debug_snapshot!(autobatch_from(false,None, [settings(true)]), @"Some((Settings { allow_index_creation: true, settings_ids: [0] }, true, None))");
debug_snapshot!(autobatch_from(false,None, [settings(true), settings(true), settings(true)]), @"Some((Settings { allow_index_creation: true, settings_ids: [0, 1, 2] }, true, None))");
debug_snapshot!(autobatch_from(false,None, [settings(false)]), @"Some((Settings { allow_index_creation: false, settings_ids: [0] }, false, None))");
debug_snapshot!(autobatch_from(false,None, [settings(false), settings(false), settings(false)]), @"Some((Settings { allow_index_creation: false, settings_ids: [0, 1, 2] }, false, None))");
// We can autobatch document addition with document deletion
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, None), doc_del()]), @"Some((DocumentOperation { allow_index_creation: true, primary_key: None, operation_ids: [0, 1] }, true, None))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(UpdateDocuments, true, None), doc_del()]), @"Some((DocumentOperation { allow_index_creation: true, primary_key: None, operation_ids: [0, 1] }, true, None))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, false, None), doc_del()]), @"Some((DocumentOperation { allow_index_creation: false, primary_key: None, operation_ids: [0, 1] }, false, None))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(UpdateDocuments, false, None), doc_del()]), @"Some((DocumentOperation { allow_index_creation: false, primary_key: None, operation_ids: [0, 1] }, false, None))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, Some("catto")), doc_del()]), @r###"Some((DocumentOperation { allow_index_creation: true, primary_key: Some("catto"), operation_ids: [0, 1] }, true, None))"###);
debug_snapshot!(autobatch_from(true, None, [doc_imp(UpdateDocuments, true, Some("catto")), doc_del()]), @r###"Some((DocumentOperation { allow_index_creation: true, primary_key: Some("catto"), operation_ids: [0, 1] }, true, None))"###);
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, false, Some("catto")), doc_del()]), @r###"Some((DocumentOperation { allow_index_creation: false, primary_key: Some("catto"), operation_ids: [0, 1] }, false, None))"###);
debug_snapshot!(autobatch_from(true, None, [doc_imp(UpdateDocuments, false, Some("catto")), doc_del()]), @r###"Some((DocumentOperation { allow_index_creation: false, primary_key: Some("catto"), operation_ids: [0, 1] }, false, None))"###);
debug_snapshot!(autobatch_from(false, None, [doc_imp(ReplaceDocuments, true, None), doc_del()]), @"Some((DocumentOperation { allow_index_creation: true, primary_key: None, operation_ids: [0, 1] }, true, None))");
debug_snapshot!(autobatch_from(false, None, [doc_imp(UpdateDocuments, true, None), doc_del()]), @"Some((DocumentOperation { allow_index_creation: true, primary_key: None, operation_ids: [0, 1] }, true, None))");
debug_snapshot!(autobatch_from(false, None, [doc_imp(ReplaceDocuments, false, None), doc_del()]), @"Some((DocumentOperation { allow_index_creation: false, primary_key: None, operation_ids: [0, 1] }, false, None))");
debug_snapshot!(autobatch_from(false, None, [doc_imp(UpdateDocuments, false, None), doc_del()]), @"Some((DocumentOperation { allow_index_creation: false, primary_key: None, operation_ids: [0, 1] }, false, None))");
debug_snapshot!(autobatch_from(false, None, [doc_imp(ReplaceDocuments, true, Some("catto")), doc_del()]), @r###"Some((DocumentOperation { allow_index_creation: true, primary_key: Some("catto"), operation_ids: [0, 1] }, true, None))"###);
debug_snapshot!(autobatch_from(false, None, [doc_imp(UpdateDocuments, true, Some("catto")), doc_del()]), @r###"Some((DocumentOperation { allow_index_creation: true, primary_key: Some("catto"), operation_ids: [0, 1] }, true, None))"###);
debug_snapshot!(autobatch_from(false, None, [doc_imp(ReplaceDocuments, false, Some("catto")), doc_del()]), @r###"Some((DocumentOperation { allow_index_creation: false, primary_key: Some("catto"), operation_ids: [0, 1] }, false, None))"###);
debug_snapshot!(autobatch_from(false, None, [doc_imp(UpdateDocuments, false, Some("catto")), doc_del()]), @r###"Some((DocumentOperation { allow_index_creation: false, primary_key: Some("catto"), operation_ids: [0, 1] }, false, None))"###);
// And the other way around
debug_snapshot!(autobatch_from(true, None, [doc_del(), doc_imp(ReplaceDocuments, true, None)]), @"Some((DocumentOperation { allow_index_creation: true, primary_key: None, operation_ids: [0, 1] }, false, None))");
debug_snapshot!(autobatch_from(true, None, [doc_del(), doc_imp(UpdateDocuments, true, None)]), @"Some((DocumentOperation { allow_index_creation: true, primary_key: None, operation_ids: [0, 1] }, false, None))");
debug_snapshot!(autobatch_from(true, None, [doc_del(), doc_imp(ReplaceDocuments, false, None)]), @"Some((DocumentOperation { allow_index_creation: false, primary_key: None, operation_ids: [0, 1] }, false, None))");
debug_snapshot!(autobatch_from(true, None, [doc_del(), doc_imp(UpdateDocuments, false, None)]), @"Some((DocumentOperation { allow_index_creation: false, primary_key: None, operation_ids: [0, 1] }, false, None))");
debug_snapshot!(autobatch_from(true, None, [doc_del(), doc_imp(ReplaceDocuments, true, Some("catto"))]), @r###"Some((DocumentOperation { allow_index_creation: true, primary_key: Some("catto"), operation_ids: [0, 1] }, false, None))"###);
debug_snapshot!(autobatch_from(true, None, [doc_del(), doc_imp(UpdateDocuments, true, Some("catto"))]), @r###"Some((DocumentOperation { allow_index_creation: true, primary_key: Some("catto"), operation_ids: [0, 1] }, false, None))"###);
debug_snapshot!(autobatch_from(true, None, [doc_del(), doc_imp(ReplaceDocuments, false, Some("catto"))]), @r###"Some((DocumentOperation { allow_index_creation: false, primary_key: Some("catto"), operation_ids: [0, 1] }, false, None))"###);
debug_snapshot!(autobatch_from(true, None, [doc_del(), doc_imp(UpdateDocuments, false, Some("catto"))]), @r###"Some((DocumentOperation { allow_index_creation: false, primary_key: Some("catto"), operation_ids: [0, 1] }, false, None))"###);
debug_snapshot!(autobatch_from(false, None, [doc_del(), doc_imp(ReplaceDocuments, false, None)]), @"Some((DocumentOperation { allow_index_creation: false, primary_key: None, operation_ids: [0, 1] }, false, None))");
debug_snapshot!(autobatch_from(false, None, [doc_del(), doc_imp(UpdateDocuments, false, None)]), @"Some((DocumentOperation { allow_index_creation: false, primary_key: None, operation_ids: [0, 1] }, false, None))");
debug_snapshot!(autobatch_from(false, None, [doc_del(), doc_imp(ReplaceDocuments, false, Some("catto"))]), @r###"Some((DocumentOperation { allow_index_creation: false, primary_key: Some("catto"), operation_ids: [0, 1] }, false, None))"###);
debug_snapshot!(autobatch_from(false, None, [doc_del(), doc_imp(UpdateDocuments, false, Some("catto"))]), @r###"Some((DocumentOperation { allow_index_creation: false, primary_key: Some("catto"), operation_ids: [0, 1] }, false, None))"###);
// But we can't autobatch document addition with document deletion by filter
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, None), doc_del_fil()]), @"Some((DocumentOperation { allow_index_creation: true, primary_key: None, operation_ids: [0] }, true, Some(DocumentOperationWithDeletionByFilter { id: 1 })))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(UpdateDocuments, true, None), doc_del_fil()]), @"Some((DocumentOperation { allow_index_creation: true, primary_key: None, operation_ids: [0] }, true, Some(DocumentOperationWithDeletionByFilter { id: 1 })))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, false, None), doc_del_fil()]), @"Some((DocumentOperation { allow_index_creation: false, primary_key: None, operation_ids: [0] }, false, Some(DocumentOperationWithDeletionByFilter { id: 1 })))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(UpdateDocuments, false, None), doc_del_fil()]), @"Some((DocumentOperation { allow_index_creation: false, primary_key: None, operation_ids: [0] }, false, Some(DocumentOperationWithDeletionByFilter { id: 1 })))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, Some("catto")), doc_del_fil()]), @r###"Some((DocumentOperation { allow_index_creation: true, primary_key: Some("catto"), operation_ids: [0] }, true, Some(DocumentOperationWithDeletionByFilter { id: 1 })))"###);
debug_snapshot!(autobatch_from(true, None, [doc_imp(UpdateDocuments, true, Some("catto")), doc_del_fil()]), @r###"Some((DocumentOperation { allow_index_creation: true, primary_key: Some("catto"), operation_ids: [0] }, true, Some(DocumentOperationWithDeletionByFilter { id: 1 })))"###);
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, false, Some("catto")), doc_del_fil()]), @r###"Some((DocumentOperation { allow_index_creation: false, primary_key: Some("catto"), operation_ids: [0] }, false, Some(DocumentOperationWithDeletionByFilter { id: 1 })))"###);
debug_snapshot!(autobatch_from(true, None, [doc_imp(UpdateDocuments, false, Some("catto")), doc_del_fil()]), @r###"Some((DocumentOperation { allow_index_creation: false, primary_key: Some("catto"), operation_ids: [0] }, false, Some(DocumentOperationWithDeletionByFilter { id: 1 })))"###);
debug_snapshot!(autobatch_from(false, None, [doc_imp(ReplaceDocuments, true, None), doc_del_fil()]), @"Some((DocumentOperation { allow_index_creation: true, primary_key: None, operation_ids: [0] }, true, Some(DocumentOperationWithDeletionByFilter { id: 1 })))");
debug_snapshot!(autobatch_from(false, None, [doc_imp(UpdateDocuments, true, None), doc_del_fil()]), @"Some((DocumentOperation { allow_index_creation: true, primary_key: None, operation_ids: [0] }, true, Some(DocumentOperationWithDeletionByFilter { id: 1 })))");
debug_snapshot!(autobatch_from(false, None, [doc_imp(ReplaceDocuments, false, None), doc_del_fil()]), @"Some((DocumentOperation { allow_index_creation: false, primary_key: None, operation_ids: [0] }, false, Some(DocumentOperationWithDeletionByFilter { id: 1 })))");
debug_snapshot!(autobatch_from(false, None, [doc_imp(UpdateDocuments, false, None), doc_del_fil()]), @"Some((DocumentOperation { allow_index_creation: false, primary_key: None, operation_ids: [0] }, false, Some(DocumentOperationWithDeletionByFilter { id: 1 })))");
debug_snapshot!(autobatch_from(false, None, [doc_imp(ReplaceDocuments, true, Some("catto")), doc_del_fil()]), @r###"Some((DocumentOperation { allow_index_creation: true, primary_key: Some("catto"), operation_ids: [0] }, true, Some(DocumentOperationWithDeletionByFilter { id: 1 })))"###);
debug_snapshot!(autobatch_from(false, None, [doc_imp(UpdateDocuments, true, Some("catto")), doc_del_fil()]), @r###"Some((DocumentOperation { allow_index_creation: true, primary_key: Some("catto"), operation_ids: [0] }, true, Some(DocumentOperationWithDeletionByFilter { id: 1 })))"###);
debug_snapshot!(autobatch_from(false, None, [doc_imp(ReplaceDocuments, false, Some("catto")), doc_del_fil()]), @r###"Some((DocumentOperation { allow_index_creation: false, primary_key: Some("catto"), operation_ids: [0] }, false, Some(DocumentOperationWithDeletionByFilter { id: 1 })))"###);
debug_snapshot!(autobatch_from(false, None, [doc_imp(UpdateDocuments, false, Some("catto")), doc_del_fil()]), @r###"Some((DocumentOperation { allow_index_creation: false, primary_key: Some("catto"), operation_ids: [0] }, false, Some(DocumentOperationWithDeletionByFilter { id: 1 })))"###);
// And the other way around
debug_snapshot!(autobatch_from(true, None, [doc_del_fil(), doc_imp(ReplaceDocuments, true, None)]), @"Some((DocumentDeletion { deletion_ids: [0], includes_by_filter: true }, false, Some(DeletionByFilterWithDocumentOperation { id: 1 })))");
debug_snapshot!(autobatch_from(true, None, [doc_del_fil(), doc_imp(UpdateDocuments, true, None)]), @"Some((DocumentDeletion { deletion_ids: [0], includes_by_filter: true }, false, Some(DeletionByFilterWithDocumentOperation { id: 1 })))");
debug_snapshot!(autobatch_from(true, None, [doc_del_fil(), doc_imp(ReplaceDocuments, false, None)]), @"Some((DocumentDeletion { deletion_ids: [0], includes_by_filter: true }, false, Some(DeletionByFilterWithDocumentOperation { id: 1 })))");
debug_snapshot!(autobatch_from(true, None, [doc_del_fil(), doc_imp(UpdateDocuments, false, None)]), @"Some((DocumentDeletion { deletion_ids: [0], includes_by_filter: true }, false, Some(DeletionByFilterWithDocumentOperation { id: 1 })))");
debug_snapshot!(autobatch_from(true, None, [doc_del_fil(), doc_imp(ReplaceDocuments, true, Some("catto"))]), @"Some((DocumentDeletion { deletion_ids: [0], includes_by_filter: true }, false, Some(DeletionByFilterWithDocumentOperation { id: 1 })))");
debug_snapshot!(autobatch_from(true, None, [doc_del_fil(), doc_imp(UpdateDocuments, true, Some("catto"))]), @"Some((DocumentDeletion { deletion_ids: [0], includes_by_filter: true }, false, Some(DeletionByFilterWithDocumentOperation { id: 1 })))");
debug_snapshot!(autobatch_from(true, None, [doc_del_fil(), doc_imp(ReplaceDocuments, false, Some("catto"))]), @"Some((DocumentDeletion { deletion_ids: [0], includes_by_filter: true }, false, Some(DeletionByFilterWithDocumentOperation { id: 1 })))");
debug_snapshot!(autobatch_from(true, None, [doc_del_fil(), doc_imp(UpdateDocuments, false, Some("catto"))]), @"Some((DocumentDeletion { deletion_ids: [0], includes_by_filter: true }, false, Some(DeletionByFilterWithDocumentOperation { id: 1 })))");
debug_snapshot!(autobatch_from(false, None, [doc_del_fil(), doc_imp(ReplaceDocuments, false, None)]), @"Some((DocumentDeletion { deletion_ids: [0], includes_by_filter: true }, false, Some(DeletionByFilterWithDocumentOperation { id: 1 })))");
debug_snapshot!(autobatch_from(false, None, [doc_del_fil(), doc_imp(UpdateDocuments, false, None)]), @"Some((DocumentDeletion { deletion_ids: [0], includes_by_filter: true }, false, Some(DeletionByFilterWithDocumentOperation { id: 1 })))");
debug_snapshot!(autobatch_from(false, None, [doc_del_fil(), doc_imp(ReplaceDocuments, false, Some("catto"))]), @"Some((DocumentDeletion { deletion_ids: [0], includes_by_filter: true }, false, Some(DeletionByFilterWithDocumentOperation { id: 1 })))");
debug_snapshot!(autobatch_from(false, None, [doc_del_fil(), doc_imp(UpdateDocuments, false, Some("catto"))]), @"Some((DocumentDeletion { deletion_ids: [0], includes_by_filter: true }, false, Some(DeletionByFilterWithDocumentOperation { id: 1 })))");
}
#[test]
fn simple_different_document_operations_autobatch_together() {
// additions and updates batch together, but neither batches with a deletion by filter
// or with index-level tasks (creation, update, swap)
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, None), doc_imp(UpdateDocuments, true, None)]), @"Some((DocumentOperation { allow_index_creation: true, primary_key: None, operation_ids: [0, 1] }, true, None))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(UpdateDocuments, true, None), doc_imp(ReplaceDocuments, true, None)]), @"Some((DocumentOperation { allow_index_creation: true, primary_key: None, operation_ids: [0, 1] }, true, None))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(UpdateDocuments, true, None), doc_del_fil()]), @"Some((DocumentOperation { allow_index_creation: true, primary_key: None, operation_ids: [0] }, true, Some(DocumentOperationWithDeletionByFilter { id: 1 })))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, None), doc_del_fil()]), @"Some((DocumentOperation { allow_index_creation: true, primary_key: None, operation_ids: [0] }, true, Some(DocumentOperationWithDeletionByFilter { id: 1 })))");
debug_snapshot!(autobatch_from(true, None, [doc_del_fil(), doc_imp(UpdateDocuments, true, None)]), @"Some((DocumentDeletion { deletion_ids: [0], includes_by_filter: true }, false, Some(DeletionByFilterWithDocumentOperation { id: 1 })))");
debug_snapshot!(autobatch_from(true, None, [doc_del_fil(), doc_imp(ReplaceDocuments, true, None)]), @"Some((DocumentDeletion { deletion_ids: [0], includes_by_filter: true }, false, Some(DeletionByFilterWithDocumentOperation { id: 1 })))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, None), idx_create()]), @"Some((DocumentOperation { allow_index_creation: true, primary_key: None, operation_ids: [0] }, true, Some(TaskCannotBeBatched { kind: IndexCreation, id: 1 })))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(UpdateDocuments, true, None), idx_create()]), @"Some((DocumentOperation { allow_index_creation: true, primary_key: None, operation_ids: [0] }, true, Some(TaskCannotBeBatched { kind: IndexCreation, id: 1 })))");
debug_snapshot!(autobatch_from(true, None, [doc_del(), idx_create()]), @"Some((DocumentDeletion { deletion_ids: [0], includes_by_filter: false }, false, Some(TaskCannotBeBatched { kind: IndexCreation, id: 1 })))");
debug_snapshot!(autobatch_from(true, None, [doc_del_fil(), idx_create()]), @"Some((DocumentDeletion { deletion_ids: [0], includes_by_filter: true }, false, Some(TaskCannotBeBatched { kind: IndexCreation, id: 1 })))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, None), idx_update()]), @"Some((DocumentOperation { allow_index_creation: true, primary_key: None, operation_ids: [0] }, true, Some(TaskCannotBeBatched { kind: IndexUpdate, id: 1 })))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(UpdateDocuments, true, None), idx_update()]), @"Some((DocumentOperation { allow_index_creation: true, primary_key: None, operation_ids: [0] }, true, Some(TaskCannotBeBatched { kind: IndexUpdate, id: 1 })))");
debug_snapshot!(autobatch_from(true, None, [doc_del(), idx_update()]), @"Some((DocumentDeletion { deletion_ids: [0], includes_by_filter: false }, false, Some(TaskCannotBeBatched { kind: IndexUpdate, id: 1 })))");
debug_snapshot!(autobatch_from(true, None, [doc_del_fil(), idx_update()]), @"Some((DocumentDeletion { deletion_ids: [0], includes_by_filter: true }, false, Some(TaskCannotBeBatched { kind: IndexUpdate, id: 1 })))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, None), idx_swap()]), @"Some((DocumentOperation { allow_index_creation: true, primary_key: None, operation_ids: [0] }, true, Some(TaskCannotBeBatched { kind: IndexSwap, id: 1 })))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(UpdateDocuments, true, None), idx_swap()]), @"Some((DocumentOperation { allow_index_creation: true, primary_key: None, operation_ids: [0] }, true, Some(TaskCannotBeBatched { kind: IndexSwap, id: 1 })))");
debug_snapshot!(autobatch_from(true, None, [doc_del(), idx_swap()]), @"Some((DocumentDeletion { deletion_ids: [0], includes_by_filter: false }, false, Some(TaskCannotBeBatched { kind: IndexSwap, id: 1 })))");
debug_snapshot!(autobatch_from(true, None, [doc_del_fil(), idx_swap()]), @"Some((DocumentDeletion { deletion_ids: [0], includes_by_filter: true }, false, Some(TaskCannotBeBatched { kind: IndexSwap, id: 1 })))");
}
#[test]
fn document_addition_doesnt_batch_with_settings() {
// simple case
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, None), settings(true)]), @"Some((DocumentOperation { allow_index_creation: true, primary_key: None, operation_ids: [0] }, true, Some(DocumentOperationWithSettings { id: 1 })))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(UpdateDocuments, true, None), settings(true)]), @"Some((DocumentOperation { allow_index_creation: true, primary_key: None, operation_ids: [0] }, true, Some(DocumentOperationWithSettings { id: 1 })))");
// multiple settings and doc addition
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, None), doc_imp(ReplaceDocuments, true, None), settings(true), settings(true)]), @"Some((DocumentOperation { allow_index_creation: true, primary_key: None, operation_ids: [0, 1] }, true, Some(DocumentOperationWithSettings { id: 2 })))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(UpdateDocuments, true, None), doc_imp(UpdateDocuments, true, None), settings(true), settings(true)]), @"Some((DocumentOperation { allow_index_creation: true, primary_key: None, operation_ids: [0, 1] }, true, Some(DocumentOperationWithSettings { id: 2 })))");
// addition and setting unordered
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, None), settings(true), doc_imp(ReplaceDocuments, true, None), settings(true)]), @"Some((DocumentOperation { allow_index_creation: true, primary_key: None, operation_ids: [0] }, true, Some(DocumentOperationWithSettings { id: 1 })))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(UpdateDocuments, true, None), settings(true), doc_imp(UpdateDocuments, true, None), settings(true)]), @"Some((DocumentOperation { allow_index_creation: true, primary_key: None, operation_ids: [0] }, true, Some(DocumentOperationWithSettings { id: 1 })))");
// Doesn't batch with other forbidden operations
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, None), settings(true), doc_imp(UpdateDocuments, true, None)]), @"Some((DocumentOperation { allow_index_creation: true, primary_key: None, operation_ids: [0] }, true, Some(DocumentOperationWithSettings { id: 1 })))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(UpdateDocuments, true, None), settings(true), doc_imp(ReplaceDocuments, true, None)]), @"Some((DocumentOperation { allow_index_creation: true, primary_key: None, operation_ids: [0] }, true, Some(DocumentOperationWithSettings { id: 1 })))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, None), settings(true), doc_del()]), @"Some((DocumentOperation { allow_index_creation: true, primary_key: None, operation_ids: [0] }, true, Some(DocumentOperationWithSettings { id: 1 })))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(UpdateDocuments, true, None), settings(true), doc_del()]), @"Some((DocumentOperation { allow_index_creation: true, primary_key: None, operation_ids: [0] }, true, Some(DocumentOperationWithSettings { id: 1 })))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, None), settings(true), idx_create()]), @"Some((DocumentOperation { allow_index_creation: true, primary_key: None, operation_ids: [0] }, true, Some(DocumentOperationWithSettings { id: 1 })))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(UpdateDocuments, true, None), settings(true), idx_create()]), @"Some((DocumentOperation { allow_index_creation: true, primary_key: None, operation_ids: [0] }, true, Some(DocumentOperationWithSettings { id: 1 })))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, None), settings(true), idx_update()]), @"Some((DocumentOperation { allow_index_creation: true, primary_key: None, operation_ids: [0] }, true, Some(DocumentOperationWithSettings { id: 1 })))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(UpdateDocuments, true, None), settings(true), idx_update()]), @"Some((DocumentOperation { allow_index_creation: true, primary_key: None, operation_ids: [0] }, true, Some(DocumentOperationWithSettings { id: 1 })))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, None), settings(true), idx_swap()]), @"Some((DocumentOperation { allow_index_creation: true, primary_key: None, operation_ids: [0] }, true, Some(DocumentOperationWithSettings { id: 1 })))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(UpdateDocuments, true, None), settings(true), idx_swap()]), @"Some((DocumentOperation { allow_index_creation: true, primary_key: None, operation_ids: [0] }, true, Some(DocumentOperationWithSettings { id: 1 })))");
}
#[test]
fn clear_and_additions() {
// These two don't need to batch
debug_snapshot!(autobatch_from(true, None, [doc_clr(), doc_imp(ReplaceDocuments, true, None)]), @"Some((DocumentClear { ids: [0] }, false, Some(DocumentOperationWithSettings { id: 1 })))");
debug_snapshot!(autobatch_from(true, None, [doc_clr(), doc_imp(UpdateDocuments, true, None)]), @"Some((DocumentClear { ids: [0] }, false, Some(DocumentOperationWithSettings { id: 1 })))");
// Basic use case
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, None), doc_imp(ReplaceDocuments, true, None), doc_clr()]), @"Some((DocumentClear { ids: [0, 1, 2] }, true, None))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(UpdateDocuments, true, None), doc_imp(UpdateDocuments, true, None), doc_clr()]), @"Some((DocumentClear { ids: [0, 1, 2] }, true, None))");
// This batch kind doesn't mix with other document additions
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, None), doc_imp(ReplaceDocuments, true, None), doc_clr(), doc_imp(ReplaceDocuments, true, None)]), @"Some((DocumentClear { ids: [0, 1, 2] }, true, Some(DocumentOperationWithSettings { id: 3 })))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(UpdateDocuments, true, None), doc_imp(UpdateDocuments, true, None), doc_clr(), doc_imp(UpdateDocuments, true, None)]), @"Some((DocumentClear { ids: [0, 1, 2] }, true, Some(DocumentOperationWithSettings { id: 3 })))");
// But you can batch multiple clears together
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, None), doc_imp(ReplaceDocuments, true, None), doc_clr(), doc_clr(), doc_clr()]), @"Some((DocumentClear { ids: [0, 1, 2, 3, 4] }, true, None))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(UpdateDocuments, true, None), doc_imp(UpdateDocuments, true, None), doc_clr(), doc_clr(), doc_clr()]), @"Some((DocumentClear { ids: [0, 1, 2, 3, 4] }, true, None))");
}
#[test]
fn clear_and_additions_and_settings() {
// A clear doesn't need to autobatch the settings that happen AFTER it: there are no documents left
debug_snapshot!(autobatch_from(true, None, [doc_clr(), settings(true)]), @"Some((DocumentClear { ids: [0] }, false, Some(DocumentOperationWithSettings { id: 1 })))");
debug_snapshot!(autobatch_from(true, None, [settings(true), doc_clr(), settings(true)]), @"Some((ClearAndSettings { other: [1], allow_index_creation: true, settings_ids: [0, 2] }, true, None))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, None), settings(true), doc_clr()]), @"Some((DocumentOperation { allow_index_creation: true, primary_key: None, operation_ids: [0] }, true, Some(DocumentOperationWithSettings { id: 1 })))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(UpdateDocuments, true, None), settings(true), doc_clr()]), @"Some((DocumentOperation { allow_index_creation: true, primary_key: None, operation_ids: [0] }, true, Some(DocumentOperationWithSettings { id: 1 })))");
}
#[test]
fn anything_and_index_deletion() {
// The `IndexDeletion` doesn't batch with anything that happens AFTER.
debug_snapshot!(autobatch_from(true, None, [idx_del(), doc_imp(ReplaceDocuments, true, None)]), @"Some((IndexDeletion { ids: [0] }, false, Some(IndexDeletion { id: 0 })))");
debug_snapshot!(autobatch_from(true, None, [idx_del(), doc_imp(UpdateDocuments, true, None)]), @"Some((IndexDeletion { ids: [0] }, false, Some(IndexDeletion { id: 0 })))");
debug_snapshot!(autobatch_from(true, None, [idx_del(), doc_imp(ReplaceDocuments, false, None)]), @"Some((IndexDeletion { ids: [0] }, false, Some(IndexDeletion { id: 0 })))");
debug_snapshot!(autobatch_from(true, None, [idx_del(), doc_imp(UpdateDocuments, false, None)]), @"Some((IndexDeletion { ids: [0] }, false, Some(IndexDeletion { id: 0 })))");
debug_snapshot!(autobatch_from(true, None, [idx_del(), doc_del()]), @"Some((IndexDeletion { ids: [0] }, false, Some(IndexDeletion { id: 0 })))");
debug_snapshot!(autobatch_from(true, None, [idx_del(), doc_del_fil()]), @"Some((IndexDeletion { ids: [0] }, false, Some(IndexDeletion { id: 0 })))");
debug_snapshot!(autobatch_from(true, None, [idx_del(), doc_clr()]), @"Some((IndexDeletion { ids: [0] }, false, Some(IndexDeletion { id: 0 })))");
debug_snapshot!(autobatch_from(true, None, [idx_del(), settings(true)]), @"Some((IndexDeletion { ids: [0] }, false, Some(IndexDeletion { id: 0 })))");
debug_snapshot!(autobatch_from(true, None, [idx_del(), settings(false)]), @"Some((IndexDeletion { ids: [0] }, false, Some(IndexDeletion { id: 0 })))");
debug_snapshot!(autobatch_from(false,None, [idx_del(), doc_imp(ReplaceDocuments, true, None)]), @"Some((IndexDeletion { ids: [0] }, false, Some(IndexDeletion { id: 0 })))");
debug_snapshot!(autobatch_from(false,None, [idx_del(), doc_imp(UpdateDocuments, true, None)]), @"Some((IndexDeletion { ids: [0] }, false, Some(IndexDeletion { id: 0 })))");
debug_snapshot!(autobatch_from(false,None, [idx_del(), doc_imp(ReplaceDocuments, false, None)]), @"Some((IndexDeletion { ids: [0] }, false, Some(IndexDeletion { id: 0 })))");
debug_snapshot!(autobatch_from(false,None, [idx_del(), doc_imp(UpdateDocuments, false, None)]), @"Some((IndexDeletion { ids: [0] }, false, Some(IndexDeletion { id: 0 })))");
debug_snapshot!(autobatch_from(false,None, [idx_del(), doc_del()]), @"Some((IndexDeletion { ids: [0] }, false, Some(IndexDeletion { id: 0 })))");
debug_snapshot!(autobatch_from(false,None, [idx_del(), doc_del_fil()]), @"Some((IndexDeletion { ids: [0] }, false, Some(IndexDeletion { id: 0 })))");
debug_snapshot!(autobatch_from(false,None, [idx_del(), doc_clr()]), @"Some((IndexDeletion { ids: [0] }, false, Some(IndexDeletion { id: 0 })))");
debug_snapshot!(autobatch_from(false,None, [idx_del(), settings(true)]), @"Some((IndexDeletion { ids: [0] }, false, Some(IndexDeletion { id: 0 })))");
debug_snapshot!(autobatch_from(false,None, [idx_del(), settings(false)]), @"Some((IndexDeletion { ids: [0] }, false, Some(IndexDeletion { id: 0 })))");
// The index deletion can accept almost any type of `BatchKind` and transform it to an `IndexDeletion`.
// First, the basic cases
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, None), idx_del()]), @"Some((IndexDeletion { ids: [0, 1] }, true, Some(IndexDeletion { id: 1 })))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(UpdateDocuments, true, None), idx_del()]), @"Some((IndexDeletion { ids: [0, 1] }, true, Some(IndexDeletion { id: 1 })))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, false, None), idx_del()]), @"Some((IndexDeletion { ids: [0, 1] }, false, Some(IndexDeletion { id: 1 })))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(UpdateDocuments, false, None), idx_del()]), @"Some((IndexDeletion { ids: [0, 1] }, false, Some(IndexDeletion { id: 1 })))");
debug_snapshot!(autobatch_from(true, None, [doc_del(), idx_del()]), @"Some((IndexDeletion { ids: [0, 1] }, false, Some(IndexDeletion { id: 1 })))");
debug_snapshot!(autobatch_from(true, None, [doc_del_fil(), idx_del()]), @"Some((IndexDeletion { ids: [0, 1] }, false, Some(IndexDeletion { id: 1 })))");
debug_snapshot!(autobatch_from(true, None, [doc_clr(), idx_del()]), @"Some((IndexDeletion { ids: [0, 1] }, false, Some(IndexDeletion { id: 1 })))");
debug_snapshot!(autobatch_from(true, None, [settings(true), idx_del()]), @"Some((IndexDeletion { ids: [0, 1] }, true, Some(IndexDeletion { id: 1 })))");
debug_snapshot!(autobatch_from(true, None, [settings(false), idx_del()]), @"Some((IndexDeletion { ids: [0, 1] }, false, Some(IndexDeletion { id: 1 })))");
debug_snapshot!(autobatch_from(false,None, [doc_imp(ReplaceDocuments, true, None), idx_del()]), @"Some((IndexDeletion { ids: [0, 1] }, true, Some(IndexDeletion { id: 1 })))");
debug_snapshot!(autobatch_from(false,None, [doc_imp(UpdateDocuments, true, None), idx_del()]), @"Some((IndexDeletion { ids: [0, 1] }, true, Some(IndexDeletion { id: 1 })))");
debug_snapshot!(autobatch_from(false,None, [doc_imp(ReplaceDocuments, false, None), idx_del()]), @"Some((IndexDeletion { ids: [0, 1] }, false, Some(IndexDeletion { id: 1 })))");
debug_snapshot!(autobatch_from(false,None, [doc_imp(UpdateDocuments, false, None), idx_del()]), @"Some((IndexDeletion { ids: [0, 1] }, false, Some(IndexDeletion { id: 1 })))");
debug_snapshot!(autobatch_from(false,None, [doc_del(), idx_del()]), @"Some((IndexDeletion { ids: [0, 1] }, false, Some(IndexDeletion { id: 1 })))");
debug_snapshot!(autobatch_from(false,None, [doc_del_fil(), idx_del()]), @"Some((IndexDeletion { ids: [0, 1] }, false, Some(IndexDeletion { id: 1 })))");
debug_snapshot!(autobatch_from(false,None, [doc_clr(), idx_del()]), @"Some((IndexDeletion { ids: [0, 1] }, false, Some(IndexDeletion { id: 1 })))");
debug_snapshot!(autobatch_from(false,None, [settings(true), idx_del()]), @"Some((IndexDeletion { ids: [0, 1] }, true, Some(IndexDeletion { id: 1 })))");
debug_snapshot!(autobatch_from(false,None, [settings(false), idx_del()]), @"Some((IndexDeletion { ids: [0, 1] }, false, Some(IndexDeletion { id: 1 })))");
}
#[test]
fn allowed_and_disallowed_index_creation() {
// A `DocumentImport` allowed to create the index can't be batched with one that isn't, unless the index already exists.
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, false, None), doc_imp(ReplaceDocuments, true, None)]), @"Some((DocumentOperation { allow_index_creation: false, primary_key: None, operation_ids: [0, 1] }, false, None))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, None), doc_imp(ReplaceDocuments, true, None)]), @"Some((DocumentOperation { allow_index_creation: true, primary_key: None, operation_ids: [0, 1] }, true, None))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, false, None), doc_imp(ReplaceDocuments, false, None)]), @"Some((DocumentOperation { allow_index_creation: false, primary_key: None, operation_ids: [0, 1] }, false, None))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, None), settings(true)]), @"Some((DocumentOperation { allow_index_creation: true, primary_key: None, operation_ids: [0] }, true, Some(DocumentOperationWithSettings { id: 1 })))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, false, None), settings(true)]), @"Some((DocumentOperation { allow_index_creation: false, primary_key: None, operation_ids: [0] }, false, Some(DocumentOperationWithSettings { id: 1 })))");
debug_snapshot!(autobatch_from(false,None, [doc_imp(ReplaceDocuments, false, None), doc_imp(ReplaceDocuments, true, None)]), @"Some((DocumentOperation { allow_index_creation: false, primary_key: None, operation_ids: [0] }, false, Some(IndexCreationMismatch { id: 1 })))");
debug_snapshot!(autobatch_from(false,None, [doc_imp(ReplaceDocuments, true, None), doc_imp(ReplaceDocuments, true, None)]), @"Some((DocumentOperation { allow_index_creation: true, primary_key: None, operation_ids: [0, 1] }, true, None))");
debug_snapshot!(autobatch_from(false,None, [doc_imp(ReplaceDocuments, false, None), doc_imp(ReplaceDocuments, false, None)]), @"Some((DocumentOperation { allow_index_creation: false, primary_key: None, operation_ids: [0, 1] }, false, None))");
debug_snapshot!(autobatch_from(false,None, [doc_imp(ReplaceDocuments, true, None), settings(true)]), @"Some((DocumentOperation { allow_index_creation: true, primary_key: None, operation_ids: [0] }, true, Some(DocumentOperationWithSettings { id: 1 })))");
debug_snapshot!(autobatch_from(false,None, [doc_imp(ReplaceDocuments, false, None), settings(true)]), @"Some((DocumentOperation { allow_index_creation: false, primary_key: None, operation_ids: [0] }, false, Some(IndexCreationMismatch { id: 1 })))");
// batch deletion and addition
debug_snapshot!(autobatch_from(false, None, [doc_del(), doc_imp(ReplaceDocuments, true, Some("catto"))]), @"Some((DocumentDeletion { deletion_ids: [0], includes_by_filter: false }, false, Some(IndexCreationMismatch { id: 1 })))");
debug_snapshot!(autobatch_from(false, None, [doc_del(), doc_imp(UpdateDocuments, true, Some("catto"))]), @"Some((DocumentDeletion { deletion_ids: [0], includes_by_filter: false }, false, Some(IndexCreationMismatch { id: 1 })))");
debug_snapshot!(autobatch_from(false, None, [doc_del(), doc_imp(ReplaceDocuments, true, None)]), @"Some((DocumentDeletion { deletion_ids: [0], includes_by_filter: false }, false, Some(IndexCreationMismatch { id: 1 })))");
debug_snapshot!(autobatch_from(false, None, [doc_del(), doc_imp(UpdateDocuments, true, None)]), @"Some((DocumentDeletion { deletion_ids: [0], includes_by_filter: false }, false, Some(IndexCreationMismatch { id: 1 })))");
}
#[test]
fn autobatch_primary_key() {
// ==> If I have a pk
// With a single update
debug_snapshot!(autobatch_from(true, Some("id"), [doc_imp(ReplaceDocuments, true, None)]), @"Some((DocumentOperation { allow_index_creation: true, primary_key: None, operation_ids: [0] }, true, None))");
debug_snapshot!(autobatch_from(true, Some("id"), [doc_imp(ReplaceDocuments, true, Some("id"))]), @r###"Some((DocumentOperation { allow_index_creation: true, primary_key: Some("id"), operation_ids: [0] }, true, None))"###);
debug_snapshot!(autobatch_from(true, Some("id"), [doc_imp(ReplaceDocuments, true, Some("other"))]), @r###"Some((DocumentOperation { allow_index_creation: true, primary_key: Some("other"), operation_ids: [0] }, true, Some(PrimaryKeyIndexMismatch { id: 0, in_index: "id", in_task: "other" })))"###);
// With multiple updates
debug_snapshot!(autobatch_from(true, Some("id"), [doc_imp(ReplaceDocuments, true, None), doc_imp(ReplaceDocuments, true, None)]), @"Some((DocumentOperation { allow_index_creation: true, primary_key: None, operation_ids: [0, 1] }, true, None))");
debug_snapshot!(autobatch_from(true, Some("id"), [doc_imp(ReplaceDocuments, true, None), doc_imp(ReplaceDocuments, true, Some("id"))]), @r###"Some((DocumentOperation { allow_index_creation: true, primary_key: Some("id"), operation_ids: [0, 1] }, true, None))"###);
debug_snapshot!(autobatch_from(true, Some("id"), [doc_imp(ReplaceDocuments, true, None), doc_imp(ReplaceDocuments, true, Some("id")), doc_imp(ReplaceDocuments, true, None)]), @r###"Some((DocumentOperation { allow_index_creation: true, primary_key: Some("id"), operation_ids: [0, 1, 2] }, true, None))"###);
debug_snapshot!(autobatch_from(true, Some("id"), [doc_imp(ReplaceDocuments, true, None), doc_imp(ReplaceDocuments, true, Some("other"))]), @r###"Some((DocumentOperation { allow_index_creation: true, primary_key: None, operation_ids: [0] }, true, Some(PrimaryKeyMismatch { id: 1, reason: TaskPrimaryKeyDifferFromIndexPrimaryKey { task_pk: "other", index_pk: "id" } })))"###);
debug_snapshot!(autobatch_from(true, Some("id"), [doc_imp(ReplaceDocuments, true, None), doc_imp(ReplaceDocuments, true, Some("other")), doc_imp(ReplaceDocuments, true, None)]), @r###"Some((DocumentOperation { allow_index_creation: true, primary_key: None, operation_ids: [0] }, true, Some(PrimaryKeyMismatch { id: 1, reason: TaskPrimaryKeyDifferFromIndexPrimaryKey { task_pk: "other", index_pk: "id" } })))"###);
debug_snapshot!(autobatch_from(true, Some("id"), [doc_imp(ReplaceDocuments, true, None), doc_imp(ReplaceDocuments, true, Some("other")), doc_imp(ReplaceDocuments, true, Some("id"))]), @r###"Some((DocumentOperation { allow_index_creation: true, primary_key: None, operation_ids: [0] }, true, Some(PrimaryKeyMismatch { id: 1, reason: TaskPrimaryKeyDifferFromIndexPrimaryKey { task_pk: "other", index_pk: "id" } })))"###);
debug_snapshot!(autobatch_from(true, Some("id"), [doc_imp(ReplaceDocuments, true, Some("id")), doc_imp(ReplaceDocuments, true, None)]), @r###"Some((DocumentOperation { allow_index_creation: true, primary_key: Some("id"), operation_ids: [0, 1] }, true, None))"###);
debug_snapshot!(autobatch_from(true, Some("id"), [doc_imp(ReplaceDocuments, true, Some("id")), doc_imp(ReplaceDocuments, true, Some("id"))]), @r###"Some((DocumentOperation { allow_index_creation: true, primary_key: Some("id"), operation_ids: [0, 1] }, true, None))"###);
debug_snapshot!(autobatch_from(true, Some("id"), [doc_imp(ReplaceDocuments, true, Some("id")), doc_imp(ReplaceDocuments, true, Some("id")), doc_imp(ReplaceDocuments, true, None)]), @r###"Some((DocumentOperation { allow_index_creation: true, primary_key: Some("id"), operation_ids: [0, 1, 2] }, true, None))"###);
debug_snapshot!(autobatch_from(true, Some("id"), [doc_imp(ReplaceDocuments, true, Some("id")), doc_imp(ReplaceDocuments, true, Some("other"))]), @r###"Some((DocumentOperation { allow_index_creation: true, primary_key: Some("id"), operation_ids: [0] }, true, Some(PrimaryKeyMismatch { id: 1, reason: TaskPrimaryKeyDifferFromIndexPrimaryKey { task_pk: "other", index_pk: "id" } })))"###);
debug_snapshot!(autobatch_from(true, Some("id"), [doc_imp(ReplaceDocuments, true, Some("id")), doc_imp(ReplaceDocuments, true, Some("other")), doc_imp(ReplaceDocuments, true, None)]), @r###"Some((DocumentOperation { allow_index_creation: true, primary_key: Some("id"), operation_ids: [0] }, true, Some(PrimaryKeyMismatch { id: 1, reason: TaskPrimaryKeyDifferFromIndexPrimaryKey { task_pk: "other", index_pk: "id" } })))"###);
debug_snapshot!(autobatch_from(true, Some("id"), [doc_imp(ReplaceDocuments, true, Some("id")), doc_imp(ReplaceDocuments, true, Some("other")), doc_imp(ReplaceDocuments, true, Some("id"))]), @r###"Some((DocumentOperation { allow_index_creation: true, primary_key: Some("id"), operation_ids: [0] }, true, Some(PrimaryKeyMismatch { id: 1, reason: TaskPrimaryKeyDifferFromIndexPrimaryKey { task_pk: "other", index_pk: "id" } })))"###);
debug_snapshot!(autobatch_from(true, Some("id"), [doc_imp(ReplaceDocuments, true, Some("other")), doc_imp(ReplaceDocuments, true, None)]), @r###"Some((DocumentOperation { allow_index_creation: true, primary_key: Some("other"), operation_ids: [0] }, true, Some(PrimaryKeyIndexMismatch { id: 0, in_index: "id", in_task: "other" })))"###);
debug_snapshot!(autobatch_from(true, Some("id"), [doc_imp(ReplaceDocuments, true, Some("other")), doc_imp(ReplaceDocuments, true, Some("id"))]), @r###"Some((DocumentOperation { allow_index_creation: true, primary_key: Some("other"), operation_ids: [0] }, true, Some(PrimaryKeyIndexMismatch { id: 0, in_index: "id", in_task: "other" })))"###);
debug_snapshot!(autobatch_from(true, Some("id"), [doc_imp(ReplaceDocuments, true, Some("other")), doc_imp(ReplaceDocuments, true, Some("id")), doc_imp(ReplaceDocuments, true, None)]), @r###"Some((DocumentOperation { allow_index_creation: true, primary_key: Some("other"), operation_ids: [0] }, true, Some(PrimaryKeyIndexMismatch { id: 0, in_index: "id", in_task: "other" })))"###);
debug_snapshot!(autobatch_from(true, Some("id"), [doc_imp(ReplaceDocuments, true, Some("other")), doc_imp(ReplaceDocuments, true, Some("other"))]), @r###"Some((DocumentOperation { allow_index_creation: true, primary_key: Some("other"), operation_ids: [0] }, true, Some(PrimaryKeyIndexMismatch { id: 0, in_index: "id", in_task: "other" })))"###);
debug_snapshot!(autobatch_from(true, Some("id"), [doc_imp(ReplaceDocuments, true, Some("other")), doc_imp(ReplaceDocuments, true, Some("other")), doc_imp(ReplaceDocuments, true, None)]), @r###"Some((DocumentOperation { allow_index_creation: true, primary_key: Some("other"), operation_ids: [0] }, true, Some(PrimaryKeyIndexMismatch { id: 0, in_index: "id", in_task: "other" })))"###);
debug_snapshot!(autobatch_from(true, Some("id"), [doc_imp(ReplaceDocuments, true, Some("other")), doc_imp(ReplaceDocuments, true, Some("other")), doc_imp(ReplaceDocuments, true, Some("id"))]), @r###"Some((DocumentOperation { allow_index_creation: true, primary_key: Some("other"), operation_ids: [0] }, true, Some(PrimaryKeyIndexMismatch { id: 0, in_index: "id", in_task: "other" })))"###);
// ==> If I don't have a pk
// With a single update
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, None)]), @"Some((DocumentOperation { allow_index_creation: true, primary_key: None, operation_ids: [0] }, true, None))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, Some("id"))]), @r###"Some((DocumentOperation { allow_index_creation: true, primary_key: Some("id"), operation_ids: [0] }, true, None))"###);
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, Some("other"))]), @r###"Some((DocumentOperation { allow_index_creation: true, primary_key: Some("other"), operation_ids: [0] }, true, None))"###);
// With multiple updates
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, None), doc_imp(ReplaceDocuments, true, None)]), @"Some((DocumentOperation { allow_index_creation: true, primary_key: None, operation_ids: [0, 1] }, true, None))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, None), doc_imp(ReplaceDocuments, true, Some("id"))]), @r###"Some((DocumentOperation { allow_index_creation: true, primary_key: None, operation_ids: [0] }, true, Some(PrimaryKeyMismatch { id: 1, reason: CannotInterfereWithPrimaryKeyGuessing { task_pk: "id" } })))"###);
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, Some("id")), doc_imp(ReplaceDocuments, true, None)]), @r###"Some((DocumentOperation { allow_index_creation: true, primary_key: Some("id"), operation_ids: [0, 1] }, true, None))"###);
}
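
The primary-key snapshots in `autobatch_primary_key` suggest a small decision rule for whether a follow-up document import may join an existing batch. The sketch below is an illustration inferred from those snapshots only, not the autobatcher's actual code: `Decision`, `decide`, and the `DiffersFromBatch` case are hypothetical names, and the last case is an assumption because no snapshot above covers it. It also ignores the first task of a batch, which the snapshots show is always taken (with a `PrimaryKeyIndexMismatch` stop reason when it conflicts with the index).

```rust
/// Hypothetical outcome of trying to append one document import to a batch.
#[derive(Debug, PartialEq)]
enum Decision {
    /// The task joins the batch; the payload is the batch's primary key afterwards.
    Accept(Option<String>),
    /// The task's primary key differs from the one stored in the index.
    DiffersFromIndex,
    /// No primary key is known yet, and earlier tasks rely on guessing it.
    InterferesWithGuessing,
    /// ASSUMPTION: not covered by the snapshots above.
    DiffersFromBatch,
}

/// Decide whether a follow-up task with `task_pk` can join a batch whose
/// current primary key is `batch_pk`, given the index's `index_pk`.
fn decide(index_pk: Option<&str>, batch_pk: Option<&str>, task_pk: Option<&str>) -> Decision {
    match (index_pk, batch_pk, task_pk) {
        // A task without a primary key never conflicts; the batch key is unchanged.
        (_, batch, None) => Decision::Accept(batch.map(String::from)),
        // The index already has a key: the task must agree with it.
        (Some(index), _, Some(task)) => {
            if task == index {
                Decision::Accept(Some(task.to_string()))
            } else {
                Decision::DiffersFromIndex
            }
        }
        // No key anywhere yet: a late explicit key would change what earlier tasks guess.
        (None, None, Some(_)) => Decision::InterferesWithGuessing,
        // The batch already fixed a key: the task must agree with it (assumed behaviour).
        (None, Some(batch), Some(task)) => {
            if task == batch {
                Decision::Accept(Some(task.to_string()))
            } else {
                Decision::DiffersFromBatch
            }
        }
    }
}
```

For example, `decide(None, None, Some("id"))` returns `Decision::InterferesWithGuessing`, mirroring the `CannotInterfereWithPrimaryKeyGuessing` snapshot, while `decide(Some("id"), None, Some("id"))` accepts the task and sets the batch key to `"id"`.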

Some files were not shown because too many files have changed in this diff