diff --git a/.github/ISSUE_TEMPLATE/sprint_issue.md b/.github/ISSUE_TEMPLATE/new_feature_issue.md similarity index 89% rename from .github/ISSUE_TEMPLATE/sprint_issue.md rename to .github/ISSUE_TEMPLATE/new_feature_issue.md index 30b5e16ff..cf55fa43f 100644 --- a/.github/ISSUE_TEMPLATE/sprint_issue.md +++ b/.github/ISSUE_TEMPLATE/new_feature_issue.md @@ -1,28 +1,26 @@ --- -name: New sprint issue -about: ⚠️ Should only be used by the engine team ⚠️ +name: New feature issue +about: ⚠️ Should only be used by the internal Meili team ⚠️ title: '' -labels: 'missing usage in PRD, impacts docs' +labels: 'impacts docs, impacts integrations' assignees: '' --- Related product team resources: [PRD]() (_internal only_) -Related product discussion: - -## Motivation - - ## Usage +TBD + ## TODO ### Are you modifying a database? + - [ ] If not, add the `no db change` label to your PR, and you're good to merge. - [ ] If yes, add the `db change` label to your PR. You'll receive a message explaining what to do. @@ -54,5 +52,5 @@ Related product discussion: ## Impacted teams - - + + diff --git a/.github/pull_request_template.md b/.github/pull_request_template.md new file mode 100644 index 000000000..0fbc68c1d --- /dev/null +++ b/.github/pull_request_template.md @@ -0,0 +1,16 @@ +## Related issue + +Fixes #... + +## Requirements + +⚠️ Ensure the following requirements are met before merging ⚠️ +- [ ] Automated tests have been added. +- [ ] If some tests cannot be automated, rigorous manual tests should be applied. +- [ ] ⚠️ If there is any change in the DB: + - [ ] Test that any impacted DB still works as expected after using `--experimental-dumpless-upgrade` on a DB created with the last released Meilisearch + - [ ] Test that during the upgrade, **search is still available** (artificially make the upgrade longer if needed) + - [ ] Set the `db change` label. +- [ ] If necessary, the feature has been tested in the Cloud production environment (with [prototypes](./documentation/prototypes.md)) and the Cloud UI is ready. +- [ ] If necessary, the [documentation](https://github.com/meilisearch/documentation) related to the implemented feature in the PR is ready. +- [ ] If necessary, the [integrations](https://github.com/meilisearch/integration-guides) related to the implemented feature in the PR are ready. diff --git a/.github/release-draft-template.yml b/.github/release-draft-template.yml new file mode 100644 index 000000000..ffe2fa5b7 --- /dev/null +++ b/.github/release-draft-template.yml @@ -0,0 +1,33 @@ +name-template: 'v$RESOLVED_VERSION' +tag-template: 'v$RESOLVED_VERSION' +exclude-labels: + - 'skip changelog' +version-resolver: + minor: + labels: + - 'enhancement' + default: patch +categories: + - title: '⚠️ Breaking changes' + label: 'breaking-change' + - title: '🚀 Enhancements' + label: 'enhancement' + - title: '🐛 Bug Fixes' + label: 'bug' + - title: '🔒 Security' + label: 'security' + - title: '⚙️ Maintenance/misc' + label: + - 'maintenance' + - 'documentation' +template: | + $CHANGES + + ❤️ Huge thanks to our contributors: $CONTRIBUTORS.
+no-changes-template: 'Changes are coming soon 😎' +sort-direction: 'ascending' +replacers: + - search: '/(?:and )?@dependabot-preview(?:\[bot\])?,?/g' + replace: '' + - search: '/(?:and )?@dependabot(?:\[bot\])?,?/g' + replace: '' diff --git a/.github/templates/dependency-issue.md b/.github/templates/dependency-issue.md new file mode 100644 index 000000000..72835c5f6 --- /dev/null +++ b/.github/templates/dependency-issue.md @@ -0,0 +1,22 @@ +This issue is about updating Meilisearch dependencies: + - [ ] Update Meilisearch dependencies with the help of `cargo +nightly udeps --all-targets` (remove unused dependencies) and `cargo upgrade` (upgrade dependency versions) - ⚠️ Some repositories may contain subdirectories (like heed, charabia, or deserr). Take care of updating these in the main crate as well. This won't be done automatically by `cargo upgrade`. + - [ ] [deserr](https://github.com/meilisearch/deserr) + - [ ] [charabia](https://github.com/meilisearch/charabia/) + - [ ] [heed](https://github.com/meilisearch/heed/) + - [ ] [roaring-rs](https://github.com/RoaringBitmap/roaring-rs/) + - [ ] [obkv](https://github.com/meilisearch/obkv) + - [ ] [grenad](https://github.com/meilisearch/grenad/) + - [ ] [arroy](https://github.com/meilisearch/arroy/) + - [ ] [segment](https://github.com/meilisearch/segment) + - [ ] [bumparaw-collections](https://github.com/meilisearch/bumparaw-collections) + - [ ] [bbqueue](https://github.com/meilisearch/bbqueue) + - [ ] Finally, [Meilisearch](https://github.com/meilisearch/MeiliSearch) + - [ ] If new Rust versions have been released, update the minimal Rust version in use at Meilisearch: + - [ ] in this [GitHub Action file](https://github.com/meilisearch/meilisearch/blob/main/.github/workflows/test-suite.yml), by changing the `toolchain` field of the `rustfmt` job to the latest available nightly (of the day before or the current day). + - [ ] in every [GitHub Action file](https://github.com/meilisearch/meilisearch/blob/main/.github/workflows), by changing all the `dtolnay/rust-toolchain@` references to use the latest stable version. + - [ ] in this [`rust-toolchain.toml`](https://github.com/meilisearch/meilisearch/blob/main/rust-toolchain.toml), by changing the `channel` field to the latest stable version. + - [ ] in the [Dockerfile](https://github.com/meilisearch/meilisearch/blob/main/Dockerfile), by changing the base image to `rust:<version>-alpine`. Check that the image exists on [Dockerhub](https://hub.docker.com/_/rust/tags?page=1&name=alpine). Also, build and run the image to check everything still works! + +⚠️ This issue should be prioritized to avoid any deprecation and vulnerability issues. + +The GitHub Actions dependencies are managed by [Dependabot](https://github.com/meilisearch/meilisearch/blob/main/.github/dependabot.yml), so there is no need to update them when solving this issue.
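For reference, a minimal sketch of the update loop this template describes, assuming `cargo-udeps` and `cargo-edit` are installed (the two `cargo` commands are the ones named in the checklist above; everything else here is illustrative):

```bash
# One-off setup: cargo-udeps needs a nightly toolchain; `cargo upgrade` is provided by cargo-edit.
rustup toolchain install nightly
cargo install cargo-udeps cargo-edit

# Then, in each repository of the checklist (deserr, charabia, heed, ..., finally Meilisearch):
cargo +nightly udeps --all-targets   # list dependencies that are never used, and remove them
cargo upgrade                        # bump dependency versions in Cargo.toml
cargo build && cargo test            # check that everything still compiles and passes
```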
diff --git a/.github/workflows/check-valid-milestone.yml b/.github/workflows/check-valid-milestone.yml deleted file mode 100644 index 91d2daa8e..000000000 --- a/.github/workflows/check-valid-milestone.yml +++ /dev/null @@ -1,100 +0,0 @@ -name: PR Milestone Check - -on: - pull_request: - types: [opened, reopened, edited, synchronize, milestoned, demilestoned] - branches: - - "main" - - "release-v*.*.*" - -jobs: - check-milestone: - name: Check PR Milestone - runs-on: ubuntu-latest - - steps: - - name: Checkout code - uses: actions/checkout@v3 - - - name: Validate PR milestone - uses: actions/github-script@v7 - with: - github-token: ${{ secrets.GITHUB_TOKEN }} - script: | - // Get PR number directly from the event payload - const prNumber = context.payload.pull_request.number; - - // Get PR details - const { data: prData } = await github.rest.pulls.get({ - owner: 'meilisearch', - repo: 'meilisearch', - pull_number: prNumber - }); - - // Get base branch name - const baseBranch = prData.base.ref; - console.log(`Base branch: ${baseBranch}`); - - // Get PR milestone - const prMilestone = prData.milestone; - if (!prMilestone) { - core.setFailed('PR must have a milestone assigned'); - return; - } - console.log(`PR milestone: ${prMilestone.title}`); - - // Validate milestone format: vx.y.z - const milestoneRegex = /^v\d+\.\d+\.\d+$/; - if (!milestoneRegex.test(prMilestone.title)) { - core.setFailed(`Milestone "${prMilestone.title}" does not follow the required format vx.y.z`); - return; - } - - // For main branch PRs, check if the milestone is the highest one - if (baseBranch === 'main') { - // Get all milestones - const { data: milestones } = await github.rest.issues.listMilestones({ - owner: 'meilisearch', - repo: 'meilisearch', - state: 'open', - sort: 'due_on', - direction: 'desc' - }); - - // Sort milestones by version number (vx.y.z) - const sortedMilestones = milestones - .filter(m => milestoneRegex.test(m.title)) - .sort((a, b) => { - const versionA = a.title.substring(1).split('.').map(Number); - const versionB = b.title.substring(1).split('.').map(Number); - - // Compare major version - if (versionA[0] !== versionB[0]) return versionB[0] - versionA[0]; - // Compare minor version - if (versionA[1] !== versionB[1]) return versionB[1] - versionA[1]; - // Compare patch version - return versionB[2] - versionA[2]; - }); - - if (sortedMilestones.length === 0) { - core.setFailed('No valid milestones found in the repository. 
Please create at least one milestone with the format vx.y.z'); - return; - } - - const highestMilestone = sortedMilestones[0]; - console.log(`Highest milestone: ${highestMilestone.title}`); - - if (prMilestone.title !== highestMilestone.title) { - core.setFailed(`PRs targeting the main branch must use the highest milestone (${highestMilestone.title}), but this PR uses ${prMilestone.title}`); - return; - } - } else { - // For release branches, the milestone should match the branch version - const branchVersion = baseBranch.substring(8); // remove 'release-' - if (prMilestone.title !== branchVersion) { - core.setFailed(`PRs targeting release branch "${baseBranch}" must use the matching milestone "${branchVersion}", but this PR uses "${prMilestone.title}"`); - return; - } - } - - console.log('PR milestone validation passed!'); diff --git a/.github/workflows/dependency-issue.yml b/.github/workflows/dependency-issue.yml index 99bd8330a..5de490d76 100644 --- a/.github/workflows/dependency-issue.yml +++ b/.github/workflows/dependency-issue.yml @@ -15,7 +15,7 @@ jobs: steps: - uses: actions/checkout@v3 - name: Download the issue template - run: curl -s https://raw.githubusercontent.com/meilisearch/engine-team/main/issue-templates/dependency-issue.md > $ISSUE_TEMPLATE + run: curl -s https://raw.githubusercontent.com/meilisearch/meilisearch/main/.github/templates/dependency-issue.md > $ISSUE_TEMPLATE - name: Create issue run: | gh issue create \ diff --git a/.github/workflows/flaky-tests.yml b/.github/workflows/flaky-tests.yml index 66be5b823..8f803f0ee 100644 --- a/.github/workflows/flaky-tests.yml +++ b/.github/workflows/flaky-tests.yml @@ -3,7 +3,7 @@ name: Look for flaky tests on: workflow_dispatch: schedule: - - cron: "0 12 * * FRI" # Every Friday at 12:00PM + - cron: '0 4 * * *' # Every day at 4:00AM jobs: flaky: diff --git a/.github/workflows/milestone-workflow.yml b/.github/workflows/milestone-workflow.yml deleted file mode 100644 index f2841c97e..000000000 --- a/.github/workflows/milestone-workflow.yml +++ /dev/null @@ -1,224 +0,0 @@ -name: Milestone's workflow - -# /!\ No git flow are handled here - -# For each Milestone created (not opened!), and if the release is NOT a patch release (only the patch changed) -# - the roadmap issue is created, see https://github.com/meilisearch/engine-team/blob/main/issue-templates/roadmap-issue.md -# - the changelog issue is created, see https://github.com/meilisearch/engine-team/blob/main/issue-templates/changelog-issue.md -# - update the ruleset to add the current release version to the list of allowed versions and be able to use the merge queue. 
- -# For each Milestone closed -# - the `release_version` label is created -# - this label is applied to all issues/PRs in the Milestone - -on: - milestone: - types: [created, closed] - -env: - MILESTONE_VERSION: ${{ github.event.milestone.title }} - MILESTONE_URL: ${{ github.event.milestone.html_url }} - MILESTONE_DUE_ON: ${{ github.event.milestone.due_on }} - GH_TOKEN: ${{ secrets.MEILI_BOT_GH_PAT }} - -jobs: - # ----------------- - # MILESTONE CREATED - # ----------------- - - get-release-version: - if: github.event.action == 'created' - runs-on: ubuntu-latest - outputs: - is-patch: ${{ steps.check-patch.outputs.is-patch }} - steps: - - uses: actions/checkout@v3 - - name: Check if this release is a patch release only - id: check-patch - run: | - echo version: $MILESTONE_VERSION - if [[ $MILESTONE_VERSION =~ ^v[0-9]+\.[0-9]+\.0$ ]]; then - echo 'This is NOT a patch release' - echo "is-patch=false" >> $GITHUB_OUTPUT - elif [[ $MILESTONE_VERSION =~ ^v[0-9]+\.[0-9]+\.[0-9]+$ ]]; then - echo 'This is a patch release' - echo "is-patch=true" >> $GITHUB_OUTPUT - else - echo "Not a valid format of release, check the Milestone's title." - echo 'Should be vX.Y.Z' - exit 1 - fi - - create-roadmap-issue: - needs: get-release-version - # Create the roadmap issue if the release is not only a patch release - if: github.event.action == 'created' && needs.get-release-version.outputs.is-patch == 'false' - runs-on: ubuntu-latest - env: - ISSUE_TEMPLATE: issue-template.md - steps: - - uses: actions/checkout@v3 - - name: Download the issue template - run: curl -s https://raw.githubusercontent.com/meilisearch/engine-team/main/issue-templates/roadmap-issue.md > $ISSUE_TEMPLATE - - name: Replace all empty occurrences in the templates - run: | - # Replace all <<version>> occurrences - sed -i "s/<<version>>/$MILESTONE_VERSION/g" $ISSUE_TEMPLATE - - # Replace all <<milestone_id>> occurrences - milestone_id=$(echo $MILESTONE_URL | cut -d '/' -f 7) - sed -i "s/<<milestone_id>>/$milestone_id/g" $ISSUE_TEMPLATE - - # Replace release date if exists - if [[ ! -z $MILESTONE_DUE_ON ]]; then - date=$(echo $MILESTONE_DUE_ON | cut -d 'T' -f 1) - sed -i "s/Release date\: 20XX-XX-XX/Release date\: $date/g" $ISSUE_TEMPLATE - fi - - name: Create the issue - run: | - gh issue create \ - --title "$MILESTONE_VERSION ROADMAP" \ - --label 'epic,impacts docs,impacts integrations,impacts cloud' \ - --body-file $ISSUE_TEMPLATE \ - --milestone $MILESTONE_VERSION - - create-changelog-issue: - needs: get-release-version - # Create the changelog issue if the release is not only a patch release - if: github.event.action == 'created' && needs.get-release-version.outputs.is-patch == 'false' - runs-on: ubuntu-latest - env: - ISSUE_TEMPLATE: issue-template.md - steps: - - uses: actions/checkout@v3 - - name: Download the issue template - run: curl -s https://raw.githubusercontent.com/meilisearch/engine-team/main/issue-templates/changelog-issue.md > $ISSUE_TEMPLATE - - name: Replace all empty occurrences in the templates - run: | - # Replace all <<version>> occurrences - sed -i "s/<<version>>/$MILESTONE_VERSION/g" $ISSUE_TEMPLATE - - # Replace all <<milestone_id>> occurrences - milestone_id=$(echo $MILESTONE_URL | cut -d '/' -f 7) - sed -i "s/<<milestone_id>>/$milestone_id/g" $ISSUE_TEMPLATE - - name: Create the issue - run: | - gh issue create \ - --title "Create release changelogs for $MILESTONE_VERSION" \ - --label 'impacts docs,documentation' \ - --body-file $ISSUE_TEMPLATE \ - --milestone $MILESTONE_VERSION \ - --assignee curquiza - - create-update-version-issue: - needs: get-release-version - # Create the update-version issue even if the release is a patch release - if: github.event.action == 'created' - runs-on: ubuntu-latest - env: - ISSUE_TEMPLATE: issue-template.md - steps: - - uses: actions/checkout@v3 - - name: Download the issue template - run: curl -s https://raw.githubusercontent.com/meilisearch/engine-team/main/issue-templates/update-version-issue.md > $ISSUE_TEMPLATE - - name: Create the issue - run: | - gh issue create \ - --title "Update version in Cargo.toml for $MILESTONE_VERSION" \ - --label 'maintenance' \ - --body-file $ISSUE_TEMPLATE \ - --milestone $MILESTONE_VERSION - - create-update-openapi-issue: - needs: get-release-version - # Create the openAPI issue if the release is not only a patch release - if: github.event.action == 'created' && needs.get-release-version.outputs.is-patch == 'false' - runs-on: ubuntu-latest - env: - ISSUE_TEMPLATE: issue-template.md - steps: - - uses: actions/checkout@v3 - - name: Download the issue template - run: curl -s https://raw.githubusercontent.com/meilisearch/engine-team/main/issue-templates/update-openapi-issue.md > $ISSUE_TEMPLATE - - name: Create the issue - run: | - gh issue create \ - --title "Update Open API file for $MILESTONE_VERSION" \ - --label 'maintenance' \ - --body-file $ISSUE_TEMPLATE \ - --milestone $MILESTONE_VERSION - - update-ruleset: - runs-on: ubuntu-latest - if: github.event.action == 'created' - steps: - - uses: actions/checkout@v3 - - name: Install jq - run: | - sudo apt-get update - sudo apt-get install -y jq - - name: Update ruleset - env: - # gh api repos/meilisearch/meilisearch/rulesets --jq '.[] | {name: .name, id: .id}' - RULESET_ID: 4253297 - BRANCH_NAME: ${{ github.event.inputs.branch_name }} - run: | - echo "RULESET_ID: ${{ env.RULESET_ID }}" - echo "BRANCH_NAME: ${{ env.BRANCH_NAME }}" - - # Get current ruleset conditions - CONDITIONS=$(gh api repos/meilisearch/meilisearch/rulesets/${{ env.RULESET_ID }} --jq '{ conditions: .conditions }') - - # Update the conditions by appending the milestone version -
UPDATED_CONDITIONS=$(echo $CONDITIONS | jq '.conditions.ref_name.include += ["refs/heads/release-'${{ env.MILESTONE_VERSION }}'"]') - - # Update the ruleset from stdin (-) - echo $UPDATED_CONDITIONS | - gh api repos/meilisearch/meilisearch/rulesets/${{ env.RULESET_ID }} \ - --method PUT \ - -H "Accept: application/vnd.github+json" \ - -H "X-GitHub-Api-Version: 2022-11-28" \ - --input - - - # ---------------- - # MILESTONE CLOSED - # ---------------- - - create-release-label: - if: github.event.action == 'closed' - runs-on: ubuntu-latest - steps: - - uses: actions/checkout@v3 - - name: Create the ${{ env.MILESTONE_VERSION }} label - run: | - label_description="PRs/issues solved in $MILESTONE_VERSION" - if [[ ! -z $MILESTONE_DUE_ON ]]; then - date=$(echo $MILESTONE_DUE_ON | cut -d 'T' -f 1) - label_description="$label_description released on $date" - fi - - gh api repos/meilisearch/meilisearch/labels \ - --method POST \ - -H "Accept: application/vnd.github+json" \ - -f name="$MILESTONE_VERSION" \ - -f description="$label_description" \ - -f color='ff5ba3' - - labelize-all-milestone-content: - if: github.event.action == 'closed' - needs: create-release-label - runs-on: ubuntu-latest - steps: - - uses: actions/checkout@v3 - - name: Add label ${{ env.MILESTONE_VERSION }} to all PRs in the Milestone - run: | - prs=$(gh pr list --search milestone:"$MILESTONE_VERSION" --limit 1000 --state all --json number --template '{{range .}}{{tablerow (printf "%v" .number)}}{{end}}') - for pr in $prs; do - gh pr edit $pr --add-label $MILESTONE_VERSION - done - - name: Add label ${{ env.MILESTONE_VERSION }} to all issues in the Milestone - run: | - issues=$(gh issue list --search milestone:"$MILESTONE_VERSION" --limit 1000 --state all --json number --template '{{range .}}{{tablerow (printf "%v" .number)}}{{end}}') - for issue in $issues; do - gh issue edit $issue --add-label $MILESTONE_VERSION - done diff --git a/.github/workflows/publish-apt-brew-pkg.yml b/.github/workflows/publish-apt-brew-pkg.yml index 5b6994dcf..9a9c566e3 100644 --- a/.github/workflows/publish-apt-brew-pkg.yml +++ b/.github/workflows/publish-apt-brew-pkg.yml @@ -32,7 +32,7 @@ jobs: - name: Build deb package run: cargo deb -p meilisearch -o target/debian/meilisearch.deb - name: Upload debian pkg to release - uses: svenstaro/upload-release-action@2.11.1 + uses: svenstaro/upload-release-action@2.11.2 with: repo_token: ${{ secrets.MEILI_BOT_GH_PAT }} file: target/debian/meilisearch.deb diff --git a/.github/workflows/publish-docker-images.yml b/.github/workflows/publish-docker-images.yml index 74384e670..0ac834bbb 100644 --- a/.github/workflows/publish-docker-images.yml +++ b/.github/workflows/publish-docker-images.yml @@ -16,6 +16,8 @@ on: jobs: docker: runs-on: docker + permissions: + id-token: write # This is needed to use Cosign in keyless mode steps: - uses: actions/checkout@v3 @@ -62,6 +64,9 @@ jobs: - name: Set up Docker Buildx uses: docker/setup-buildx-action@v3 + - name: Install cosign + uses: sigstore/cosign-installer@d58896d6a1865668819e1d91763c7751a165e159 # tag=v3.9.2 + - name: Login to Docker Hub uses: docker/login-action@v3 with: @@ -85,6 +90,7 @@ jobs: - name: Build and push uses: docker/build-push-action@v6 + id: build-and-push with: push: true platforms: linux/amd64,linux/arm64 @@ -94,6 +100,17 @@ jobs: COMMIT_DATE=${{ steps.build-metadata.outputs.date }} GIT_TAG=${{ github.ref_name }} + - name: Sign the images with GitHub OIDC Token + env: + DIGEST: ${{ steps.build-and-push.outputs.digest }} + TAGS: ${{ 
steps.meta.outputs.tags }} + run: | + images="" + for tag in ${TAGS}; do + images+="${tag}@${DIGEST} " + done + cosign sign --yes ${images} + # /!\ Don't touch this without checking with Cloud team - name: Send CI information to Cloud team # Do not send if nightly build (i.e. 'schedule' or 'workflow_dispatch' event) diff --git a/.github/workflows/publish-binaries.yml b/.github/workflows/publish-release-assets.yml similarity index 85% rename from .github/workflows/publish-binaries.yml rename to .github/workflows/publish-release-assets.yml index 3200e778e..ec0d36711 100644 --- a/.github/workflows/publish-binaries.yml +++ b/.github/workflows/publish-release-assets.yml @@ -1,4 +1,4 @@ -name: Publish binaries to GitHub release +name: Publish assets to GitHub release on: workflow_dispatch: @@ -51,7 +51,7 @@ jobs: # No need to upload binaries for dry run (cron) - name: Upload binaries to release if: github.event_name == 'release' - uses: svenstaro/upload-release-action@2.11.1 + uses: svenstaro/upload-release-action@2.11.2 with: repo_token: ${{ secrets.MEILI_BOT_GH_PAT }} file: target/release/meilisearch @@ -81,7 +81,7 @@ jobs: # No need to upload binaries for dry run (cron) - name: Upload binaries to release if: github.event_name == 'release' - uses: svenstaro/upload-release-action@2.11.1 + uses: svenstaro/upload-release-action@2.11.2 with: repo_token: ${{ secrets.MEILI_BOT_GH_PAT }} file: target/release/${{ matrix.artifact_name }} @@ -113,7 +113,7 @@ jobs: - name: Upload the binary to release # No need to upload binaries for dry run (cron) if: github.event_name == 'release' - uses: svenstaro/upload-release-action@2.11.1 + uses: svenstaro/upload-release-action@2.11.2 with: repo_token: ${{ secrets.MEILI_BOT_GH_PAT }} file: target/${{ matrix.target }}/release/meilisearch @@ -178,9 +178,34 @@ jobs: - name: Upload the binary to release # No need to upload binaries for dry run (cron) if: github.event_name == 'release' - uses: svenstaro/upload-release-action@2.11.1 + uses: svenstaro/upload-release-action@2.11.2 with: repo_token: ${{ secrets.MEILI_BOT_GH_PAT }} file: target/${{ matrix.target }}/release/meilisearch asset_name: ${{ matrix.asset_name }} tag: ${{ github.ref }} + + publish-openapi-file: + name: Publish OpenAPI file + runs-on: ubuntu-latest + steps: + - name: Checkout code + uses: actions/checkout@v4 + - name: Setup Rust + uses: actions-rs/toolchain@v1 + with: + toolchain: stable + override: true + - name: Generate OpenAPI file + run: | + cd crates/openapi-generator + cargo run --release -- --pretty --output ../../meilisearch.json + - name: Upload OpenAPI to Release + # No need to upload for dry run (cron) + if: github.event_name == 'release' + uses: svenstaro/upload-release-action@2.11.2 + with: + repo_token: ${{ secrets.MEILI_BOT_GH_PAT }} + file: ./meilisearch.json + asset_name: meilisearch-openapi.json + tag: ${{ github.ref }} diff --git a/.github/workflows/release-drafter.yml b/.github/workflows/release-drafter.yml new file mode 100644 index 000000000..2f8ec04b0 --- /dev/null +++ b/.github/workflows/release-drafter.yml @@ -0,0 +1,20 @@ +name: Release Drafter + +permissions: + contents: read + pull-requests: write + +on: + push: + branches: + - main + +jobs: + update_release_draft: + runs-on: ubuntu-latest + steps: + - uses: release-drafter/release-drafter@v6 + with: + config-name: release-draft-template.yml + env: + GITHUB_TOKEN: ${{ secrets.RELEASE_DRAFTER_TOKEN }} diff --git a/.github/workflows/sdks-tests.yml b/.github/workflows/sdks-tests.yml index dc4d51068..62e31a4ae 100644 --- 
a/.github/workflows/sdks-tests.yml +++ b/.github/workflows/sdks-tests.yml @@ -9,7 +9,7 @@ on: required: false default: nightly schedule: - - cron: "0 6 * * MON" # Every Monday at 6:00AM + - cron: '0 6 * * *' # Every day at 6:00am env: MEILI_MASTER_KEY: 'masterKey' @@ -114,7 +114,7 @@ jobs: dep ensure fi - name: Run integration tests - run: go test -v ./... + run: go test --race -v ./integration meilisearch-java-tests: needs: define-docker-image diff --git a/.github/workflows/test-suite.yml b/.github/workflows/test-suite.yml index 2924a07bc..75914aea1 100644 --- a/.github/workflows/test-suite.yml +++ b/.github/workflows/test-suite.yml @@ -3,7 +3,7 @@ name: Test suite on: workflow_dispatch: schedule: - # Everyday at 5:00am + # Every day at 5:00am - cron: "0 5 * * *" pull_request: merge_group: diff --git a/.github/workflows/update-cargo-toml-version.yml b/.github/workflows/update-cargo-toml-version.yml index d13a4404a..4118cd651 100644 --- a/.github/workflows/update-cargo-toml-version.yml +++ b/.github/workflows/update-cargo-toml-version.yml @@ -41,5 +41,4 @@ jobs: --title "Update version for the next release ($NEW_VERSION) in Cargo.toml" \ --body '⚠️ This PR is automatically generated. Check the new version is the expected one and Cargo.lock has been updated before merging.' \ --label 'skip changelog' \ - --milestone $NEW_VERSION \ --base $GITHUB_REF_NAME diff --git a/.gitignore b/.gitignore index fc24b8306..44cfa8f75 100644 --- a/.gitignore +++ b/.gitignore @@ -5,18 +5,24 @@ **/*.json_lines **/*.rs.bk /*.mdb -/data.ms +/*.ms /snapshots /dumps /bench /_xtask_benchmark.ms /benchmarks +.DS_Store # Snapshots ## ... large *.full.snap -## ... unreviewed +## ... unreviewed *.snap.new +## ... pending +*.pending-snap + +# Tmp files +.tmp* # Database snapshot crates/meilisearch/db.snapshot diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 57d52116e..7f718c899 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -106,7 +106,19 @@ Run `cargo xtask --help` from the root of the repository to find out what is ava #### Update the openAPI file if the API changed To update the openAPI file in the code, see [sprint_issue.md](https://github.com/meilisearch/meilisearch/blob/main/.github/ISSUE_TEMPLATE/sprint_issue.md#reminders-when-modifying-the-api). -If you want to update the openAPI file on the [open-api repository](https://github.com/meilisearch/open-api), see [update-openapi-issue.md](https://github.com/meilisearch/engine-team/blob/main/issue-templates/update-openapi-issue.md). + +If you want to generate the OpenAPI file manually: + +With swagger: +- Start Meilisearch with the `swagger` feature flag: `cargo run --features swagger` +- In a browser, open the following URL: http://localhost:7700/scalar +- Click the « Download openAPI file » + +With the internal crate: +```bash +cd crates/openapi-generator +cargo run --release -- --pretty --output meilisearch.json +``` ### Logging @@ -160,25 +172,37 @@ Some notes on GitHub PRs: The draft PRs are recommended when you want to show that you are working on something and make your work visible. - The branch related to the PR must be **up-to-date with `main`** before merging. Fortunately, this project uses [GitHub Merge Queues](https://github.blog/news-insights/product-news/github-merge-queue-is-generally-available/) to automatically enforce this requirement without the PR author having to rebase manually. -## Release Process (for internal team only) - -Meilisearch tools follow the [Semantic Versioning Convention](https://semver.org/).
- -### Automation to rebase and Merge the PRs +## Merging PRs This project uses GitHub Merge Queues that help us manage pull request merging. -### How to Publish a new Release +Before merging a PR, the maintainer should ensure the following requirements are met: +- Automated tests have been added. +- If some tests cannot be automated, rigorous manual tests should be applied. +- ⚠️ If there is a change in the DB: it's mandatory to manually test the `--experimental-dumpless-upgrade` on a DB of the previous Meilisearch minor version (e.g. v1.13 for the v1.14 release); a sketch of this check is given below. +- If necessary, the feature has been tested in the Cloud production environment (with [prototypes](./documentation/prototypes.md)) and the Cloud UI is ready. +- If necessary, the [documentation](https://github.com/meilisearch/documentation) related to the implemented feature in the PR is ready. +- If necessary, the [integrations](https://github.com/meilisearch/integration-guides) related to the implemented feature in the PR are ready. -The full Meilisearch release process is described in [this guide](https://github.com/meilisearch/engine-team/blob/main/resources/meilisearch-release.md). Please follow it carefully before doing any release. +## Publish Process (for internal team only) + +Meilisearch tools follow the [Semantic Versioning Convention](https://semver.org/). + +### How to publish a new release + +The full Meilisearch release process is described in [this guide](./documentation/release.md). ### How to publish a prototype Depending on the developed feature, you might need to provide a prototyped version of Meilisearch to make it easier for users to test. This happens in two steps: -- [Release the prototype](https://github.com/meilisearch/engine-team/blob/main/resources/prototypes.md#how-to-publish-a-prototype) -- [Communicate about it](https://github.com/meilisearch/engine-team/blob/main/resources/prototypes.md#communication) +- [Release the prototype](./documentation/prototypes.md#how-to-publish-a-prototype) +- [Communicate about it](./documentation/prototypes.md#communication) + +### How to implement and publish an experimental feature + +Here are our [guidelines and process](./documentation/experimental-features.md) to implement and publish an experimental feature.
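To make the DB-change requirement above concrete, here is a rough sketch of the manual dumpless-upgrade check; the binary names, index, and document are hypothetical, while `--db-path` and `--experimental-dumpless-upgrade` are the actual Meilisearch flags:

```bash
# Create a database with the previous minor release (v1.13 in the example above).
./meilisearch-v1.13 --db-path ./data.ms &
curl -X POST 'http://localhost:7700/indexes/movies/documents' \
  -H 'Content-Type: application/json' \
  --data '[{"id": 1, "title": "Carol"}]'
kill %1

# Re-open the same database with the release candidate and the upgrade flag,
# then verify that search keeps answering while the upgrade task runs.
./meilisearch-rc --db-path ./data.ms --experimental-dumpless-upgrade &
curl 'http://localhost:7700/indexes/movies/search?q=carol'
```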
### Release assets diff --git a/Cargo.lock b/Cargo.lock index ab749c589..c5db441f1 100644 --- a/Cargo.lock +++ b/Cargo.lock @@ -589,7 +589,7 @@ source = "git+https://github.com/meilisearch/bbqueue#cbb87cc707b5af415ef203bdaf2 [[package]] name = "benchmarks" -version = "1.16.0" +version = "1.17.1" dependencies = [ "anyhow", "bumpalo", @@ -779,7 +779,7 @@ dependencies = [ [[package]] name = "build-info" -version = "1.16.0" +version = "1.17.1" dependencies = [ "anyhow", "time", @@ -1812,7 +1812,7 @@ dependencies = [ [[package]] name = "dump" -version = "1.16.0" +version = "1.17.1" dependencies = [ "anyhow", "big_s", @@ -2054,7 +2054,7 @@ checksum = "37909eebbb50d72f9059c3b6d82c0463f2ff062c9e95845c43a6c9c0355411be" [[package]] name = "file-store" -version = "1.16.0" +version = "1.17.1" dependencies = [ "tempfile", "thiserror 2.0.12", @@ -2076,9 +2076,10 @@ dependencies = [ [[package]] name = "filter-parser" -version = "1.16.0" +version = "1.17.1" dependencies = [ "insta", + "levenshtein_automata", "nom", "nom_locate", "unescaper", @@ -2097,7 +2098,7 @@ dependencies = [ [[package]] name = "flatten-serde-json" -version = "1.16.0" +version = "1.17.1" dependencies = [ "criterion", "serde_json", @@ -2254,7 +2255,7 @@ dependencies = [ [[package]] name = "fuzzers" -version = "1.16.0" +version = "1.17.1" dependencies = [ "arbitrary", "bumpalo", @@ -3172,7 +3173,7 @@ dependencies = [ [[package]] name = "index-scheduler" -version = "1.16.0" +version = "1.17.1" dependencies = [ "anyhow", "backoff", @@ -3449,7 +3450,7 @@ dependencies = [ [[package]] name = "json-depth-checker" -version = "1.16.0" +version = "1.17.1" dependencies = [ "criterion", "serde_json", @@ -3943,7 +3944,7 @@ checksum = "490cc448043f947bae3cbee9c203358d62dbee0db12107a74be5c30ccfd09771" [[package]] name = "meili-snap" -version = "1.16.0" +version = "1.17.1" dependencies = [ "insta", "md5", @@ -3954,7 +3955,7 @@ dependencies = [ [[package]] name = "meilisearch" -version = "1.16.0" +version = "1.17.1" dependencies = [ "actix-cors", "actix-http", @@ -3994,6 +3995,7 @@ dependencies = [ "meili-snap", "meilisearch-auth", "meilisearch-types", + "memmap2", "mimalloc", "mime", "mopa-maintained", @@ -4049,7 +4051,7 @@ dependencies = [ [[package]] name = "meilisearch-auth" -version = "1.16.0" +version = "1.17.1" dependencies = [ "base64 0.22.1", "enum-iterator", @@ -4068,7 +4070,7 @@ dependencies = [ [[package]] name = "meilisearch-types" -version = "1.16.0" +version = "1.17.1" dependencies = [ "actix-web", "anyhow", @@ -4103,7 +4105,7 @@ dependencies = [ [[package]] name = "meilitool" -version = "1.16.0" +version = "1.17.1" dependencies = [ "anyhow", "clap", @@ -4127,9 +4129,9 @@ checksum = "32a282da65faaf38286cf3be983213fcf1d2e2a58700e808f83f4ea9a4804bc0" [[package]] name = "memmap2" -version = "0.9.5" +version = "0.9.7" source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "fd3f7eed9d3848f8b98834af67102b720745c4ec028fcd0aa0239277e7de374f" +checksum = "483758ad303d734cec05e5c12b41d7e93e6a6390c5e9dae6bdeb7c1259012d28" dependencies = [ "libc", "stable_deref_trait", @@ -4137,7 +4139,7 @@ dependencies = [ [[package]] name = "milli" -version = "1.16.0" +version = "1.17.1" dependencies = [ "allocator-api2 0.3.0", "arroy", @@ -4561,6 +4563,17 @@ version = "11.1.5" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "d6790f58c7ff633d8771f42965289203411a5e5c68388703c06e14f24770b41e" +[[package]] +name = "openapi-generator" +version = "0.1.0" +dependencies = [ + "anyhow", + "clap", + "meilisearch", + 
"serde_json", + "utoipa", +] + [[package]] name = "openssl-probe" version = "0.1.6" @@ -4694,7 +4707,7 @@ checksum = "e3148f5046208a5d56bcfc03053e3ca6334e51da8dfb19b6cdc8b306fae3283e" [[package]] name = "permissive-json-pointer" -version = "1.16.0" +version = "1.17.1" dependencies = [ "big_s", "serde_json", @@ -7533,7 +7546,7 @@ dependencies = [ [[package]] name = "xtask" -version = "1.16.0" +version = "1.17.1" dependencies = [ "anyhow", "build-info", diff --git a/Cargo.toml b/Cargo.toml index 3e57563b6..bc1c354b7 100644 --- a/Cargo.toml +++ b/Cargo.toml @@ -19,10 +19,11 @@ members = [ "crates/tracing-trace", "crates/xtask", "crates/build-info", + "crates/openapi-generator", ] [workspace.package] -version = "1.16.0" +version = "1.17.1" authors = [ "Quentin de Quelen ", "Clément Renault ", diff --git a/README.md b/README.md index 77eecde25..40833a0d6 100644 --- a/README.md +++ b/README.md @@ -119,6 +119,6 @@ Meilisearch is, and will always be, open-source! If you want to contribute to th Meilisearch releases and their associated binaries are available on the project's [releases page](https://github.com/meilisearch/meilisearch/releases). -The binaries are versioned following [SemVer conventions](https://semver.org/). To know more, read our [versioning policy](https://github.com/meilisearch/engine-team/blob/main/resources/versioning-policy.md). +The binaries are versioned following [SemVer conventions](https://semver.org/). To know more, read our [versioning policy](./documentation/versioning-policy.md). Differently from the binaries, crates in this repository are not currently available on [crates.io](https://crates.io/) and do not follow [SemVer conventions](https://semver.org). diff --git a/crates/benchmarks/Cargo.toml b/crates/benchmarks/Cargo.toml index 9dccc444b..f05100c2c 100644 --- a/crates/benchmarks/Cargo.toml +++ b/crates/benchmarks/Cargo.toml @@ -14,7 +14,7 @@ license.workspace = true anyhow = "1.0.98" bumpalo = "3.18.1" csv = "1.3.1" -memmap2 = "0.9.5" +memmap2 = "0.9.7" milli = { path = "../milli" } mimalloc = { version = "0.1.47", default-features = false } serde_json = { version = "1.0.140", features = ["preserve_order"] } @@ -51,3 +51,11 @@ harness = false [[bench]] name = "indexing" harness = false + +[[bench]] +name = "sort" +harness = false + +[[bench]] +name = "filter_starts_with" +harness = false diff --git a/crates/benchmarks/benches/filter_starts_with.rs b/crates/benchmarks/benches/filter_starts_with.rs new file mode 100644 index 000000000..a7682cbf8 --- /dev/null +++ b/crates/benchmarks/benches/filter_starts_with.rs @@ -0,0 +1,66 @@ +mod datasets_paths; +mod utils; + +use criterion::{criterion_group, criterion_main}; +use milli::update::Settings; +use milli::FilterableAttributesRule; +use utils::Conf; + +#[cfg(not(windows))] +#[global_allocator] +static ALLOC: mimalloc::MiMalloc = mimalloc::MiMalloc; + +fn base_conf(builder: &mut Settings) { + let displayed_fields = ["geonameid", "name"].iter().map(|s| s.to_string()).collect(); + builder.set_displayed_fields(displayed_fields); + + let filterable_fields = + ["name"].iter().map(|s| FilterableAttributesRule::Field(s.to_string())).collect(); + builder.set_filterable_fields(filterable_fields); +} + +#[rustfmt::skip] +const BASE_CONF: Conf = Conf { + dataset: datasets_paths::SMOL_ALL_COUNTRIES, + dataset_format: "jsonl", + queries: &[ + "", + ], + configure: base_conf, + primary_key: Some("geonameid"), + ..Conf::BASE +}; + +fn filter_starts_with(c: &mut criterion::Criterion) { + #[rustfmt::skip] + let confs = &[ + 
utils::Conf { + group_name: "1 letter", + filter: Some("name STARTS WITH e"), + ..BASE_CONF + }, + + utils::Conf { + group_name: "2 letters", + filter: Some("name STARTS WITH es"), + ..BASE_CONF + }, + + utils::Conf { + group_name: "3 letters", + filter: Some("name STARTS WITH est"), + ..BASE_CONF + }, + + utils::Conf { + group_name: "6 letters", + filter: Some("name STARTS WITH estoni"), + ..BASE_CONF + } + ]; + + utils::run_benches(c, confs); +} + +criterion_group!(benches, filter_starts_with); +criterion_main!(benches); diff --git a/crates/benchmarks/benches/sort.rs b/crates/benchmarks/benches/sort.rs new file mode 100644 index 000000000..c3e934432 --- /dev/null +++ b/crates/benchmarks/benches/sort.rs @@ -0,0 +1,114 @@ +//! This benchmark module is used to compare the performance of sorting documents in /search VS /documents +//! +//! The tests/benchmarks were designed in the context of a query returning only 20 documents. + +mod datasets_paths; +mod utils; + +use criterion::{criterion_group, criterion_main}; +use milli::update::Settings; +use utils::Conf; + +#[cfg(not(windows))] +#[global_allocator] +static ALLOC: mimalloc::MiMalloc = mimalloc::MiMalloc; + +fn base_conf(builder: &mut Settings) { + let displayed_fields = + ["geonameid", "name", "asciiname", "alternatenames", "_geo", "population"] + .iter() + .map(|s| s.to_string()) + .collect(); + builder.set_displayed_fields(displayed_fields); + + let sortable_fields = + ["_geo", "name", "population", "elevation", "timezone", "modification-date"] + .iter() + .map(|s| s.to_string()) + .collect(); + builder.set_sortable_fields(sortable_fields); +} + +#[rustfmt::skip] +const BASE_CONF: Conf = Conf { + dataset: datasets_paths::SMOL_ALL_COUNTRIES, + dataset_format: "jsonl", + configure: base_conf, + primary_key: Some("geonameid"), + queries: &[""], + offsets: &[ + Some((0, 20)), // The most common query in the real world + Some((0, 500)), // A query that ranges over many documents + Some((980, 20)), // The worst query that could happen in the real world + Some((800_000, 20)) // The worst query + ], + get_documents: true, + ..Conf::BASE +}; + +fn bench_sort(c: &mut criterion::Criterion) { + #[rustfmt::skip] + let confs = &[ + utils::Conf { + group_name: "without sort", + sort: None, + ..BASE_CONF + }, + + utils::Conf { + group_name: "sort on many different values", + sort: Some(vec!["name:asc"]), + ..BASE_CONF + }, + + utils::Conf { + group_name: "sort on many similar values", + sort: Some(vec!["timezone:desc"]), + ..BASE_CONF + }, + + utils::Conf { + group_name: "sort on many similar then different values", + sort: Some(vec!["timezone:desc", "name:asc"]), + ..BASE_CONF + }, + + utils::Conf { + group_name: "sort on many different then similar values", + sort: Some(vec!["timezone:desc", "name:asc"]), + ..BASE_CONF + }, + + utils::Conf { + group_name: "geo sort", + sample_size: Some(10), + sort: Some(vec!["_geoPoint(45.4777599, 9.1967508):asc"]), + ..BASE_CONF + }, + + utils::Conf { + group_name: "sort on many similar values then geo sort", + sample_size: Some(50), + sort: Some(vec!["timezone:desc", "_geoPoint(45.4777599, 9.1967508):asc"]), + ..BASE_CONF + }, + + utils::Conf { + group_name: "sort on many different values then geo sort", + sample_size: Some(50), + sort: Some(vec!["name:desc", "_geoPoint(45.4777599, 9.1967508):asc"]), + ..BASE_CONF + }, + + utils::Conf { + group_name: "sort on many fields", + sort: Some(vec!["population:asc", "name:asc", "elevation:asc", "timezone:asc"]), + ..BASE_CONF + }, + ]; + + utils::run_benches(c, confs); 
+} + +criterion_group!(benches, bench_sort); +criterion_main!(benches); diff --git a/crates/benchmarks/benches/utils.rs b/crates/benchmarks/benches/utils.rs index b12408051..0abbd6c71 100644 --- a/crates/benchmarks/benches/utils.rs +++ b/crates/benchmarks/benches/utils.rs @@ -9,6 +9,7 @@ use anyhow::Context; use bumpalo::Bump; use criterion::BenchmarkId; use memmap2::Mmap; +use milli::documents::sort::recursive_sort; use milli::heed::EnvOpenOptions; use milli::progress::Progress; use milli::update::new::indexer; @@ -35,6 +36,12 @@ pub struct Conf<'a> { pub configure: fn(&mut Settings), pub filter: Option<&'a str>, pub sort: Option>, + /// set to skip documents (offset, limit) + pub offsets: &'a [Option<(usize, usize)>], + /// enable if you want to bench getting documents without querying + pub get_documents: bool, + /// configure the benchmark sample size + pub sample_size: Option, /// enable or disable the optional words on the query pub optional_words: bool, /// primary key, if there is None we'll auto-generate docids for every documents @@ -52,6 +59,9 @@ impl Conf<'_> { configure: |_| (), filter: None, sort: None, + offsets: &[None], + get_documents: false, + sample_size: None, optional_words: true, primary_key: None, }; @@ -145,25 +155,79 @@ pub fn run_benches(c: &mut criterion::Criterion, confs: &[Conf]) { let file_name = Path::new(conf.dataset).file_name().and_then(|f| f.to_str()).unwrap(); let name = format!("{}: {}", file_name, conf.group_name); let mut group = c.benchmark_group(&name); + if let Some(sample_size) = conf.sample_size { + group.sample_size(sample_size); + } for &query in conf.queries { - group.bench_with_input(BenchmarkId::from_parameter(query), &query, |b, &query| { - b.iter(|| { - let rtxn = index.read_txn().unwrap(); - let mut search = index.search(&rtxn); - search.query(query).terms_matching_strategy(TermsMatchingStrategy::default()); - if let Some(filter) = conf.filter { - let filter = Filter::from_str(filter).unwrap().unwrap(); - search.filter(filter); - } - if let Some(sort) = &conf.sort { - let sort = sort.iter().map(|sort| sort.parse().unwrap()).collect(); - search.sort_criteria(sort); - } - let _ids = search.execute().unwrap(); - }); - }); + for offset in conf.offsets { + let parameter = match offset { + None => query.to_string(), + Some((offset, limit)) => format!("{query}[{offset}:{limit}]"), + }; + group.bench_with_input( + BenchmarkId::from_parameter(parameter), + &query, + |b, &query| { + b.iter(|| { + let rtxn = index.read_txn().unwrap(); + let mut search = index.search(&rtxn); + search + .query(query) + .terms_matching_strategy(TermsMatchingStrategy::default()); + if let Some(filter) = conf.filter { + let filter = Filter::from_str(filter).unwrap().unwrap(); + search.filter(filter); + } + if let Some(sort) = &conf.sort { + let sort = sort.iter().map(|sort| sort.parse().unwrap()).collect(); + search.sort_criteria(sort); + } + if let Some((offset, limit)) = offset { + search.offset(*offset).limit(*limit); + } + + let _ids = search.execute().unwrap(); + }); + }, + ); + } } + + if conf.get_documents { + for offset in conf.offsets { + let parameter = match offset { + None => String::from("get_documents"), + Some((offset, limit)) => format!("get_documents[{offset}:{limit}]"), + }; + group.bench_with_input(BenchmarkId::from_parameter(parameter), &(), |b, &()| { + b.iter(|| { + let rtxn = index.read_txn().unwrap(); + if let Some(sort) = &conf.sort { + let sort = sort.iter().map(|sort| sort.parse().unwrap()).collect(); + let all_docs = 
index.documents_ids(&rtxn).unwrap(); + let facet_sort = + recursive_sort(&index, &rtxn, sort, &all_docs).unwrap(); + let iter = facet_sort.iter().unwrap(); + if let Some((offset, limit)) = offset { + let _results = iter.skip(*offset).take(*limit).collect::>(); + } else { + let _results = iter.collect::>(); + } + } else { + let all_docs = index.documents_ids(&rtxn).unwrap(); + if let Some((offset, limit)) = offset { + let _results = + all_docs.iter().skip(*offset).take(*limit).collect::>(); + } else { + let _results = all_docs.iter().collect::>(); + } + } + }); + }); + } + } + group.finish(); index.prepare_for_closing().wait(); diff --git a/crates/dump/src/reader/compat/v1_to_v2.rs b/crates/dump/src/reader/compat/v1_to_v2.rs index 0d050497b..35d369c3a 100644 --- a/crates/dump/src/reader/compat/v1_to_v2.rs +++ b/crates/dump/src/reader/compat/v1_to_v2.rs @@ -1,3 +1,4 @@ +use std::fs::File; use std::str::FromStr; use super::v2_to_v3::CompatV2ToV3; @@ -94,6 +95,10 @@ impl CompatIndexV1ToV2 { self.from.documents().map(|it| Box::new(it) as Box>) } + pub fn documents_file(&self) -> &File { + self.from.documents_file() + } + pub fn settings(&mut self) -> Result> { Ok(v2::settings::Settings::::from(self.from.settings()?).check()) } diff --git a/crates/dump/src/reader/compat/v2_to_v3.rs b/crates/dump/src/reader/compat/v2_to_v3.rs index e7516e708..62326040e 100644 --- a/crates/dump/src/reader/compat/v2_to_v3.rs +++ b/crates/dump/src/reader/compat/v2_to_v3.rs @@ -1,3 +1,4 @@ +use std::fs::File; use std::str::FromStr; use time::OffsetDateTime; @@ -122,6 +123,13 @@ impl CompatIndexV2ToV3 { } } + pub fn documents_file(&self) -> &File { + match self { + CompatIndexV2ToV3::V2(v2) => v2.documents_file(), + CompatIndexV2ToV3::Compat(compat) => compat.documents_file(), + } + } + pub fn settings(&mut self) -> Result> { let settings = match self { CompatIndexV2ToV3::V2(from) => from.settings()?, diff --git a/crates/dump/src/reader/compat/v3_to_v4.rs b/crates/dump/src/reader/compat/v3_to_v4.rs index 5bb70e9b2..1dba37771 100644 --- a/crates/dump/src/reader/compat/v3_to_v4.rs +++ b/crates/dump/src/reader/compat/v3_to_v4.rs @@ -1,3 +1,5 @@ +use std::fs::File; + use super::v2_to_v3::{CompatIndexV2ToV3, CompatV2ToV3}; use super::v4_to_v5::CompatV4ToV5; use crate::reader::{v3, v4, UpdateFile}; @@ -252,6 +254,13 @@ impl CompatIndexV3ToV4 { } } + pub fn documents_file(&self) -> &File { + match self { + CompatIndexV3ToV4::V3(v3) => v3.documents_file(), + CompatIndexV3ToV4::Compat(compat) => compat.documents_file(), + } + } + pub fn settings(&mut self) -> Result> { Ok(match self { CompatIndexV3ToV4::V3(v3) => { diff --git a/crates/dump/src/reader/compat/v4_to_v5.rs b/crates/dump/src/reader/compat/v4_to_v5.rs index e52acb176..3f47b5b48 100644 --- a/crates/dump/src/reader/compat/v4_to_v5.rs +++ b/crates/dump/src/reader/compat/v4_to_v5.rs @@ -1,3 +1,5 @@ +use std::fs::File; + use super::v3_to_v4::{CompatIndexV3ToV4, CompatV3ToV4}; use super::v5_to_v6::CompatV5ToV6; use crate::reader::{v4, v5, Document}; @@ -241,6 +243,13 @@ impl CompatIndexV4ToV5 { } } + pub fn documents_file(&self) -> &File { + match self { + CompatIndexV4ToV5::V4(v4) => v4.documents_file(), + CompatIndexV4ToV5::Compat(compat) => compat.documents_file(), + } + } + pub fn settings(&mut self) -> Result> { match self { CompatIndexV4ToV5::V4(v4) => Ok(v5::Settings::from(v4.settings()?).check()), diff --git a/crates/dump/src/reader/compat/v5_to_v6.rs b/crates/dump/src/reader/compat/v5_to_v6.rs index f7bda81c6..3a0c8ef0d 100644 --- 
a/crates/dump/src/reader/compat/v5_to_v6.rs +++ b/crates/dump/src/reader/compat/v5_to_v6.rs @@ -1,3 +1,4 @@ +use std::fs::File; use std::num::NonZeroUsize; use std::str::FromStr; @@ -201,6 +202,10 @@ impl CompatV5ToV6 { pub fn network(&self) -> Result> { Ok(None) } + + pub fn webhooks(&self) -> Option<&v6::Webhooks> { + None + } } pub enum CompatIndexV5ToV6 { @@ -243,6 +248,13 @@ impl CompatIndexV5ToV6 { } } + pub fn documents_file(&self) -> &File { + match self { + CompatIndexV5ToV6::V5(v5) => v5.documents_file(), + CompatIndexV5ToV6::Compat(compat) => compat.documents_file(), + } + } + pub fn settings(&mut self) -> Result> { match self { CompatIndexV5ToV6::V5(v5) => Ok(v6::Settings::from(v5.settings()?).check()), diff --git a/crates/dump/src/reader/mod.rs b/crates/dump/src/reader/mod.rs index 23e7eec9e..da55bb4a8 100644 --- a/crates/dump/src/reader/mod.rs +++ b/crates/dump/src/reader/mod.rs @@ -138,6 +138,13 @@ impl DumpReader { DumpReader::Compat(compat) => compat.network(), } } + + pub fn webhooks(&self) -> Option<&v6::Webhooks> { + match self { + DumpReader::Current(current) => current.webhooks(), + DumpReader::Compat(compat) => compat.webhooks(), + } + } } impl From for DumpReader { @@ -192,6 +199,14 @@ impl DumpIndexReader { } } + /// A reference to a file in the NDJSON format containing all the documents of the index + pub fn documents_file(&self) -> &File { + match self { + DumpIndexReader::Current(v6) => v6.documents_file(), + DumpIndexReader::Compat(compat) => compat.documents_file(), + } + } + pub fn settings(&mut self) -> Result> { match self { DumpIndexReader::Current(v6) => v6.settings(), @@ -357,6 +372,7 @@ pub(crate) mod test { assert_eq!(dump.features().unwrap().unwrap(), RuntimeTogglableFeatures::default()); assert_eq!(dump.network().unwrap(), None); + assert_eq!(dump.webhooks(), None); } #[test] @@ -427,6 +443,43 @@ pub(crate) mod test { insta::assert_snapshot!(network.remotes.get("ms-2").as_ref().unwrap().search_api_key.as_ref().unwrap(), @"foo"); } + #[test] + fn import_dump_v6_webhooks() { + let dump = File::open("tests/assets/v6-with-webhooks.dump").unwrap(); + let dump = DumpReader::open(dump).unwrap(); + + // top level infos + insta::assert_snapshot!(dump.date().unwrap(), @"2025-07-31 9:21:30.479544 +00:00:00"); + insta::assert_debug_snapshot!(dump.instance_uid().unwrap(), @r" + Some( + cb887dcc-34b3-48d1-addd-9815ae721a81, + ) + "); + + // webhooks + let webhooks = dump.webhooks().unwrap(); + insta::assert_json_snapshot!(webhooks, @r#" + { + "webhooks": { + "627ea538-733d-4545-8d2d-03526eb381ce": { + "url": "https://example.com/authorization-less", + "headers": {} + }, + "771b0a28-ef28-4082-b984-536f82958c65": { + "url": "https://example.com/hook", + "headers": { + "authorization": "TOKEN" + } + }, + "f3583083-f8a7-4cbf-a5e7-fb3f1e28a7e9": { + "url": "https://third.com", + "headers": {} + } + } + } + "#); + } + #[test] fn import_dump_v5() { let dump = File::open("tests/assets/v5.dump").unwrap(); diff --git a/crates/dump/src/reader/v1/mod.rs b/crates/dump/src/reader/v1/mod.rs index ac7324d9a..d86ede62c 100644 --- a/crates/dump/src/reader/v1/mod.rs +++ b/crates/dump/src/reader/v1/mod.rs @@ -72,6 +72,10 @@ impl V1IndexReader { .map(|line| -> Result<_> { Ok(serde_json::from_str(&line?)?) })) } + pub fn documents_file(&self) -> &File { + self.documents.get_ref() + } + pub fn settings(&mut self) -> Result { Ok(serde_json::from_reader(&mut self.settings)?) 
} diff --git a/crates/dump/src/reader/v2/mod.rs b/crates/dump/src/reader/v2/mod.rs index 14a643c2d..a74687381 100644 --- a/crates/dump/src/reader/v2/mod.rs +++ b/crates/dump/src/reader/v2/mod.rs @@ -203,6 +203,10 @@ impl V2IndexReader { .map(|line| -> Result<_> { Ok(serde_json::from_str(&line?)?) })) } + pub fn documents_file(&self) -> &File { + self.documents.get_ref() + } + pub fn settings(&mut self) -> Result> { Ok(self.settings.clone()) } diff --git a/crates/dump/src/reader/v3/mod.rs b/crates/dump/src/reader/v3/mod.rs index 920e1dc6e..5f89eb861 100644 --- a/crates/dump/src/reader/v3/mod.rs +++ b/crates/dump/src/reader/v3/mod.rs @@ -215,6 +215,10 @@ impl V3IndexReader { .map(|line| -> Result<_> { Ok(serde_json::from_str(&line?)?) })) } + pub fn documents_file(&self) -> &File { + self.documents.get_ref() + } + pub fn settings(&mut self) -> Result> { Ok(self.settings.clone()) } diff --git a/crates/dump/src/reader/v4/mod.rs b/crates/dump/src/reader/v4/mod.rs index 585786ae4..16a1e27c2 100644 --- a/crates/dump/src/reader/v4/mod.rs +++ b/crates/dump/src/reader/v4/mod.rs @@ -210,6 +210,10 @@ impl V4IndexReader { .map(|line| -> Result<_> { Ok(serde_json::from_str(&line?)?) })) } + pub fn documents_file(&self) -> &File { + self.documents.get_ref() + } + pub fn settings(&mut self) -> Result> { Ok(self.settings.clone()) } diff --git a/crates/dump/src/reader/v5/mod.rs b/crates/dump/src/reader/v5/mod.rs index dfbc6346c..0123db433 100644 --- a/crates/dump/src/reader/v5/mod.rs +++ b/crates/dump/src/reader/v5/mod.rs @@ -247,6 +247,10 @@ impl V5IndexReader { .map(|line| -> Result<_> { Ok(serde_json::from_str(&line?)?) })) } + pub fn documents_file(&self) -> &File { + self.documents.get_ref() + } + pub fn settings(&mut self) -> Result> { Ok(self.settings.clone()) } diff --git a/crates/dump/src/reader/v6/mod.rs b/crates/dump/src/reader/v6/mod.rs index 449a7e5fe..9bc4b33c5 100644 --- a/crates/dump/src/reader/v6/mod.rs +++ b/crates/dump/src/reader/v6/mod.rs @@ -25,6 +25,7 @@ pub type Key = meilisearch_types::keys::Key; pub type ChatCompletionSettings = meilisearch_types::features::ChatCompletionSettings; pub type RuntimeTogglableFeatures = meilisearch_types::features::RuntimeTogglableFeatures; pub type Network = meilisearch_types::features::Network; +pub type Webhooks = meilisearch_types::webhooks::WebhooksDumpView; // ===== Other types to clarify the code of the compat module // everything related to the tasks @@ -59,6 +60,7 @@ pub struct V6Reader { keys: BufReader, features: Option, network: Option, + webhooks: Option, } impl V6Reader { @@ -93,8 +95,8 @@ impl V6Reader { Err(e) => return Err(e.into()), }; - let network_file = match fs::read(dump.path().join("network.json")) { - Ok(network_file) => Some(network_file), + let network = match fs::read(dump.path().join("network.json")) { + Ok(network_file) => Some(serde_json::from_reader(&*network_file)?), Err(error) => match error.kind() { // Allows the file to be missing, this will only result in all experimental features disabled. ErrorKind::NotFound => { @@ -104,10 +106,16 @@ impl V6Reader { _ => return Err(error.into()), }, }; - let network = if let Some(network_file) = network_file { - Some(serde_json::from_reader(&*network_file)?) 
- } else { - None + + let webhooks = match fs::read(dump.path().join("webhooks.json")) { + Ok(webhooks_file) => Some(serde_json::from_reader(&*webhooks_file)?), + Err(error) => match error.kind() { + ErrorKind::NotFound => { + debug!("`webhooks.json` not found in dump"); + None + } + _ => return Err(error.into()), + }, }; Ok(V6Reader { @@ -119,6 +127,7 @@ impl V6Reader { features, network, dump, + webhooks, }) } @@ -229,6 +238,10 @@ impl V6Reader { pub fn network(&self) -> Option<&Network> { self.network.as_ref() } + + pub fn webhooks(&self) -> Option<&Webhooks> { + self.webhooks.as_ref() + } } pub struct UpdateFile { @@ -284,6 +297,10 @@ impl V6IndexReader { .map(|line| -> Result<_> { Ok(serde_json::from_str(&line?)?) })) } + pub fn documents_file(&self) -> &File { + self.documents.get_ref() + } + pub fn settings(&mut self) -> Result> { let mut settings: Settings = serde_json::from_reader(&mut self.settings)?; patch_embedders(&mut settings); diff --git a/crates/dump/src/writer.rs b/crates/dump/src/writer.rs index 9f828595a..1d41b6aa5 100644 --- a/crates/dump/src/writer.rs +++ b/crates/dump/src/writer.rs @@ -8,6 +8,7 @@ use meilisearch_types::batches::Batch; use meilisearch_types::features::{ChatCompletionSettings, Network, RuntimeTogglableFeatures}; use meilisearch_types::keys::Key; use meilisearch_types::settings::{Checked, Settings}; +use meilisearch_types::webhooks::WebhooksDumpView; use serde_json::{Map, Value}; use tempfile::TempDir; use time::OffsetDateTime; @@ -74,6 +75,13 @@ impl DumpWriter { Ok(std::fs::write(self.dir.path().join("network.json"), serde_json::to_string(&network)?)?) } + pub fn create_webhooks(&self, webhooks: WebhooksDumpView) -> Result<()> { + Ok(std::fs::write( + self.dir.path().join("webhooks.json"), + serde_json::to_string(&webhooks)?, + )?) 
+ } + pub fn persist_to(self, mut writer: impl Write) -> Result<()> { let gz_encoder = GzEncoder::new(&mut writer, Compression::default()); let mut tar_encoder = tar::Builder::new(gz_encoder); diff --git a/crates/dump/tests/assets/v6-with-webhooks.dump b/crates/dump/tests/assets/v6-with-webhooks.dump new file mode 100644 index 000000000..955c2a63d Binary files /dev/null and b/crates/dump/tests/assets/v6-with-webhooks.dump differ diff --git a/crates/filter-parser/Cargo.toml b/crates/filter-parser/Cargo.toml index 6eeb0794b..173cabd4b 100644 --- a/crates/filter-parser/Cargo.toml +++ b/crates/filter-parser/Cargo.toml @@ -15,6 +15,7 @@ license.workspace = true nom = "7.1.3" nom_locate = "4.2.0" unescaper = "0.1.6" +levenshtein_automata = { version = "0.2.1", features = ["fst_automaton"] } [dev-dependencies] # fixed version due to format breakages in v1.40 diff --git a/crates/filter-parser/src/condition.rs b/crates/filter-parser/src/condition.rs index 0fc007bf1..8e3c04040 100644 --- a/crates/filter-parser/src/condition.rs +++ b/crates/filter-parser/src/condition.rs @@ -7,11 +7,22 @@ use nom::branch::alt; use nom::bytes::complete::tag; +use nom::character::complete::char; +use nom::character::complete::multispace0; use nom::character::complete::multispace1; use nom::combinator::cut; +use nom::combinator::map; +use nom::combinator::value; +use nom::sequence::preceded; use nom::sequence::{terminated, tuple}; use Condition::*; +use crate::error::IResultExt; +use crate::value::parse_vector_value; +use crate::value::parse_vector_value_cut; +use crate::Error; +use crate::ErrorKind; +use crate::VectorFilter; use crate::{parse_value, FilterCondition, IResult, Span, Token}; #[derive(Debug, Clone, PartialEq, Eq)] @@ -113,6 +124,83 @@ pub fn parse_not_exists(input: Span) -> IResult { Ok((input, FilterCondition::Not(Box::new(FilterCondition::Condition { fid: key, op: Exists })))) } +fn parse_vectors(input: Span) -> IResult<(Token, Option, VectorFilter<'_>)> { + let (input, _) = multispace0(input)?; + let (input, fid) = tag("_vectors")(input)?; + + if let Ok((input, _)) = multispace1::<_, crate::Error>(input) { + return Ok((input, (Token::from(fid), None, VectorFilter::None))); + } + + let (input, _) = char('.')(input)?; + + // From this point, we are certain this is a vector filter, so our errors must be final. 
+ // We could use nom's `cut` but it's better to be explicit about the errors + + if let Ok((_, space)) = tag::<_, _, ()>(" ")(input) { + return Err(crate::Error::failure_from_kind(space, ErrorKind::VectorFilterMissingEmbedder)); + } + + let (input, embedder_name) = + parse_vector_value_cut(input, ErrorKind::VectorFilterInvalidEmbedder)?; + + let (input, filter) = alt(( + map( + preceded(tag(".fragments"), |input| { + let (input, _) = tag(".")(input).map_cut(ErrorKind::VectorFilterMissingFragment)?; + parse_vector_value_cut(input, ErrorKind::VectorFilterInvalidFragment) + }), + VectorFilter::Fragment, + ), + value(VectorFilter::UserProvided, tag(".userProvided")), + value(VectorFilter::DocumentTemplate, tag(".documentTemplate")), + value(VectorFilter::Regenerate, tag(".regenerate")), + value(VectorFilter::None, nom::combinator::success("")), + ))(input)?; + + if let Ok((input, point)) = tag::<_, _, ()>(".")(input) { + let opt_value = parse_vector_value(input).ok().map(|(_, v)| v); + let value = + opt_value.as_ref().map(|v| v.value().to_owned()).unwrap_or_else(|| point.to_string()); + let context = opt_value.map(|v| v.original_span()).unwrap_or(point); + let previous_kind = match filter { + VectorFilter::Fragment(_) => Some("fragments"), + VectorFilter::DocumentTemplate => Some("documentTemplate"), + VectorFilter::UserProvided => Some("userProvided"), + VectorFilter::Regenerate => Some("regenerate"), + VectorFilter::None => None, + }; + return Err(Error::failure_from_kind( + context, + ErrorKind::VectorFilterUnknownSuffix(previous_kind, value), + )); + } + + let (input, _) = multispace1(input).map_cut(ErrorKind::VectorFilterLeftover)?; + + Ok((input, (Token::from(fid), Some(embedder_name), filter))) +} + +/// vectors_exists = vectors ("EXISTS" | ("NOT" WS+ "EXISTS")) +pub fn parse_vectors_exists(input: Span) -> IResult { + let (input, (fid, embedder, filter)) = parse_vectors(input)?; + + // Try parsing "EXISTS" first + if let Ok((input, _)) = tag::<_, _, ()>("EXISTS")(input) { + return Ok((input, FilterCondition::VectorExists { fid, embedder, filter })); + } + + // Try parsing "NOT EXISTS" + if let Ok((input, _)) = tuple::<_, _, (), _>((tag("NOT"), multispace1, tag("EXISTS")))(input) { + return Ok(( + input, + FilterCondition::Not(Box::new(FilterCondition::VectorExists { fid, embedder, filter })), + )); + } + + Err(crate::Error::failure_from_kind(input, ErrorKind::VectorFilterOperation)) +} + /// contains = value "CONTAINS" value pub fn parse_contains(input: Span) -> IResult { let (input, (fid, contains, value)) = diff --git a/crates/filter-parser/src/error.rs b/crates/filter-parser/src/error.rs index 855ce983e..e381f45e2 100644 --- a/crates/filter-parser/src/error.rs +++ b/crates/filter-parser/src/error.rs @@ -42,6 +42,23 @@ pub fn cut_with_err<'a, O>( } } +pub trait IResultExt<'a> { + fn map_cut(self, kind: ErrorKind<'a>) -> Self; +} + +impl<'a, T> IResultExt<'a> for IResult<'a, T> { + fn map_cut(self, kind: ErrorKind<'a>) -> Self { + self.map_err(move |e: nom::Err>| { + let input = match e { + nom::Err::Incomplete(_) => return e, + nom::Err::Error(e) => *e.context(), + nom::Err::Failure(e) => *e.context(), + }; + Error::failure_from_kind(input, kind) + }) + } +} + #[derive(Debug)] pub struct Error<'a> { context: Span<'a>, @@ -61,6 +78,14 @@ pub enum ErrorKind<'a> { GeoBoundingBox, MisusedGeoRadius, MisusedGeoBoundingBox, + VectorFilterLeftover, + VectorFilterInvalidQuotes, + VectorFilterMissingEmbedder, + VectorFilterInvalidEmbedder, + VectorFilterMissingFragment, + 
VectorFilterInvalidFragment, + VectorFilterUnknownSuffix(Option<&'static str>, String), + VectorFilterOperation, InvalidPrimary, InvalidEscapedNumber, ExpectedEof, @@ -91,6 +116,10 @@ impl<'a> Error<'a> { Self { context, kind } } + pub fn failure_from_kind(context: Span<'a>, kind: ErrorKind<'a>) -> nom::Err { + nom::Err::Failure(Self::new_from_kind(context, kind)) + } + pub fn new_from_external(context: Span<'a>, error: impl std::error::Error) -> Self { Self::new_from_kind(context, ErrorKind::External(error.to_string())) } @@ -128,6 +157,20 @@ impl Display for Error<'_> { // first line being the diagnostic and the second line being the incriminated filter. let escaped_input = input.escape_debug(); + fn key_suggestion<'a>(key: &str, keys: &[&'a str]) -> Option<&'a str> { + let typos = + levenshtein_automata::LevenshteinAutomatonBuilder::new(2, true).build_dfa(key); + for key in keys.iter() { + match typos.eval(key) { + levenshtein_automata::Distance::Exact(_) => { + return Some(key); + } + levenshtein_automata::Distance::AtLeast(_) => continue, + } + } + None + } + match &self.kind { ErrorKind::ExpectedValue(_) if input.trim().is_empty() => { writeln!(f, "Was expecting a value but instead got nothing.")? @@ -169,6 +212,44 @@ impl Display for Error<'_> { ErrorKind::MisusedGeoBoundingBox => { writeln!(f, "The `_geoBoundingBox` filter is an operation and can't be used as a value.")? } + ErrorKind::VectorFilterLeftover => { + writeln!(f, "The vector filter has leftover tokens.")? + } + ErrorKind::VectorFilterUnknownSuffix(_, value) if value.as_str() == "." => { + writeln!(f, "Was expecting one of `.fragments`, `.userProvided`, `.documentTemplate`, `.regenerate` or nothing, but instead found a point without a valid value.")?; + } + ErrorKind::VectorFilterUnknownSuffix(None, value) if ["fragments", "userProvided", "documentTemplate", "regenerate"].contains(&value.as_str()) => { + // This will happen with "_vectors.rest.\"userProvided\"" for instance + writeln!(f, "Was expecting this part to be unquoted.")? + } + ErrorKind::VectorFilterUnknownSuffix(None, value) => { + if let Some(suggestion) = key_suggestion(value, &["fragments", "userProvided", "documentTemplate", "regenerate"]) { + writeln!(f, "Was expecting one of `fragments`, `userProvided`, `documentTemplate`, `regenerate` or nothing, but instead found `{value}`. Did you mean `{suggestion}`?")?; + } else { + writeln!(f, "Was expecting one of `fragments`, `userProvided`, `documentTemplate`, `regenerate` or nothing, but instead found `{value}`.")?; + } + } + ErrorKind::VectorFilterUnknownSuffix(Some(previous_filter_kind), value) => { + writeln!(f, "Vector filter can only accept one of `fragments`, `userProvided`, `documentTemplate` or `regenerate`, but found both `{previous_filter_kind}` and `{value}`.")? + }, + ErrorKind::VectorFilterInvalidFragment => { + writeln!(f, "The vector filter's fragment name is invalid.")? + } + ErrorKind::VectorFilterMissingFragment => { + writeln!(f, "The vector filter is missing a fragment name.")? + } + ErrorKind::VectorFilterMissingEmbedder => { + writeln!(f, "Was expecting embedder name but found nothing.")? + } + ErrorKind::VectorFilterInvalidEmbedder => { + writeln!(f, "The vector filter's embedder name is invalid.")? + } + ErrorKind::VectorFilterOperation => { + writeln!(f, "Was expecting an operation like `EXISTS` or `NOT EXISTS` after the vector filter.")? + } + ErrorKind::VectorFilterInvalidQuotes => { + writeln!(f, "The quotes in one of the values are inconsistent.")? 
+ } ErrorKind::ReservedKeyword(word) => { writeln!(f, "`{word}` is a reserved keyword and thus cannot be used as a field name unless it is put inside quotes. Use \"{word}\" or \'{word}\' instead.")? } diff --git a/crates/filter-parser/src/lib.rs b/crates/filter-parser/src/lib.rs index 67ac8a3a2..64bb8dd37 100644 --- a/crates/filter-parser/src/lib.rs +++ b/crates/filter-parser/src/lib.rs @@ -66,6 +66,9 @@ use nom_locate::LocatedSpan; pub(crate) use value::parse_value; use value::word_exact; +use crate::condition::parse_vectors_exists; +use crate::error::IResultExt; + pub type Span<'a> = LocatedSpan<&'a str, &'a str>; type IResult<'a, Ret> = nom::IResult, Ret, Error<'a>>; @@ -137,6 +140,15 @@ impl<'a> From<&'a str> for Token<'a> { } } +#[derive(Debug, Clone, PartialEq, Eq)] +pub enum VectorFilter<'a> { + Fragment(Token<'a>), + DocumentTemplate, + UserProvided, + Regenerate, + None, +} + #[derive(Debug, Clone, PartialEq, Eq)] pub enum FilterCondition<'a> { Not(Box), @@ -144,6 +156,7 @@ pub enum FilterCondition<'a> { In { fid: Token<'a>, els: Vec> }, Or(Vec), And(Vec), + VectorExists { fid: Token<'a>, embedder: Option>, filter: VectorFilter<'a> }, GeoLowerThan { point: [Token<'a>; 2], radius: Token<'a> }, GeoBoundingBox { top_right_point: [Token<'a>; 2], bottom_left_point: [Token<'a>; 2] }, GeoPolygon { points: Vec<[Token<'a>; 2]> }, @@ -167,18 +180,34 @@ impl<'a> FilterCondition<'a> { | Condition::Exists | Condition::LowerThan(_) | Condition::LowerThanOrEqual(_) - | Condition::Between { .. } => None, - Condition::Contains { keyword, word: _ } - | Condition::StartsWith { keyword, word: _ } => Some(keyword), + | Condition::Between { .. } + | Condition::StartsWith { .. } => None, + Condition::Contains { keyword, word: _ } => Some(keyword), }, FilterCondition::Not(this) => this.use_contains_operator(), FilterCondition::Or(seq) | FilterCondition::And(seq) => { seq.iter().find_map(|filter| filter.use_contains_operator()) } + FilterCondition::VectorExists { .. } + | FilterCondition::GeoLowerThan { .. } + | FilterCondition::GeoBoundingBox { .. } + | FilterCondition::GeoPolygon { .. } + | FilterCondition::In { .. } => None, + } + } + + pub fn use_vector_filter(&self) -> Option<&Token> { + match self { + FilterCondition::Condition { .. } => None, + FilterCondition::Not(this) => this.use_vector_filter(), + FilterCondition::Or(seq) | FilterCondition::And(seq) => { + seq.iter().find_map(|filter| filter.use_vector_filter()) + } FilterCondition::GeoLowerThan { .. } | FilterCondition::GeoBoundingBox { .. } | FilterCondition::GeoPolygon { .. } | FilterCondition::In { .. } => None, + FilterCondition::VectorExists { fid, .. 
} => Some(fid), } } @@ -266,10 +295,7 @@ fn parse_in_body(input: Span) -> IResult> { let (input, _) = ws(word_exact("IN"))(input)?; // everything after `IN` can be a failure - let (input, _) = - cut_with_err(tag("["), |_| Error::new_from_kind(input, ErrorKind::InOpeningBracket))( - input, - )?; + let (input, _) = tag("[")(input).map_cut(ErrorKind::InOpeningBracket)?; let (input, content) = cut(parse_value_list)(input)?; @@ -415,7 +441,7 @@ fn parse_geo_bounding_box(input: Span) -> IResult { let (input, args) = parsed?; if args.len() != 2 || args[0].len() != 2 || args[1].len() != 2 { - return Err(nom::Err::Failure(Error::new_from_kind(input, ErrorKind::GeoBoundingBox))); + return Err(Error::failure_from_kind(input, ErrorKind::GeoBoundingBox)); } let res = FilterCondition::GeoBoundingBox { @@ -468,7 +494,7 @@ fn parse_geo_point(input: Span) -> IResult { ))(input) .map_err(|e| e.map(|_| Error::new_from_kind(input, ErrorKind::ReservedGeo("_geoPoint"))))?; // if we succeeded we still return a `Failure` because geoPoints are not allowed - Err(nom::Err::Failure(Error::new_from_kind(input, ErrorKind::ReservedGeo("_geoPoint")))) + Err(Error::failure_from_kind(input, ErrorKind::ReservedGeo("_geoPoint"))) } /// geoPoint = WS* "_geoDistance(float WS* "," WS* float WS* "," WS* float) @@ -482,7 +508,7 @@ fn parse_geo_distance(input: Span) -> IResult { ))(input) .map_err(|e| e.map(|_| Error::new_from_kind(input, ErrorKind::ReservedGeo("_geoDistance"))))?; // if we succeeded we still return a `Failure` because `geoDistance` filters are not allowed - Err(nom::Err::Failure(Error::new_from_kind(input, ErrorKind::ReservedGeo("_geoDistance")))) + Err(Error::failure_from_kind(input, ErrorKind::ReservedGeo("_geoDistance"))) } /// geo = WS* "_geo(float WS* "," WS* float WS* "," WS* float) @@ -496,7 +522,7 @@ fn parse_geo(input: Span) -> IResult { ))(input) .map_err(|e| e.map(|_| Error::new_from_kind(input, ErrorKind::ReservedGeo("_geo"))))?; // if we succeeded we still return a `Failure` because `_geo` filter is not allowed - Err(nom::Err::Failure(Error::new_from_kind(input, ErrorKind::ReservedGeo("_geo")))) + Err(Error::failure_from_kind(input, ErrorKind::ReservedGeo("_geo"))) } fn parse_error_reserved_keyword(input: Span) -> IResult { @@ -535,8 +561,7 @@ fn parse_primary(input: Span, depth: usize) -> IResult { parse_is_not_null, parse_is_empty, parse_is_not_empty, - parse_exists, - parse_not_exists, + alt((parse_vectors_exists, parse_exists, parse_not_exists)), parse_to, parse_contains, parse_not_contains, @@ -592,6 +617,22 @@ impl std::fmt::Display for FilterCondition<'_> { } write!(f, "]") } + FilterCondition::VectorExists { fid: _, embedder, filter: inner } => { + write!(f, "_vectors")?; + if let Some(embedder) = embedder { + write!(f, ".{:?}", embedder.value())?; + } + match inner { + VectorFilter::Fragment(fragment) => { + write!(f, ".fragments.{:?}", fragment.value())? 
+                }
+                VectorFilter::DocumentTemplate => write!(f, ".documentTemplate")?,
+                VectorFilter::UserProvided => write!(f, ".userProvided")?,
+                VectorFilter::Regenerate => write!(f, ".regenerate")?,
+                VectorFilter::None => (),
+            }
+            write!(f, " EXISTS")
+        }
         FilterCondition::GeoLowerThan { point, radius } => {
             write!(f, "_geoRadius({}, {}, {})", point[0], point[1], radius)
         }
@@ -672,6 +713,9 @@ pub mod tests {
         insta::assert_snapshot!(p(r"title = 'foo\\\\\\\\'"), @r#"{title} = {foo\\\\}"#);
         // but it also works with other sequences
         insta::assert_snapshot!(p(r#"title = 'foo\x20\n\t\"\'"'"#), @"{title} = {foo \n\t\"\'\"}");
+
+        insta::assert_snapshot!(p(r#"_vectors." valid.name ".fragments."also.. valid! " EXISTS"#), @r#"_vectors." valid.name ".fragments."also.. valid! " EXISTS"#);
+        insta::assert_snapshot!(p("_vectors.\"\n\t\r\\\"\" EXISTS"), @r#"_vectors."\n\t\r\"" EXISTS"#);
     }

     #[test]
@@ -734,6 +778,17 @@ pub mod tests {
         insta::assert_snapshot!(p("NOT subscribers IS NOT EMPTY"), @"{subscribers} IS EMPTY");
         insta::assert_snapshot!(p("subscribers IS NOT EMPTY"), @"NOT ({subscribers} IS EMPTY)");

+        // Test _vectors EXISTS + _vectors NOT EXISTS
+        insta::assert_snapshot!(p("_vectors EXISTS"), @"_vectors EXISTS");
+        insta::assert_snapshot!(p("_vectors.embedderName EXISTS"), @r#"_vectors."embedderName" EXISTS"#);
+        insta::assert_snapshot!(p("_vectors.embedderName.documentTemplate EXISTS"), @r#"_vectors."embedderName".documentTemplate EXISTS"#);
+        insta::assert_snapshot!(p("_vectors.embedderName.regenerate EXISTS"), @r#"_vectors."embedderName".regenerate EXISTS"#);
+        insta::assert_snapshot!(p("_vectors.embedderName.fragments.fragmentName EXISTS"), @r#"_vectors."embedderName".fragments."fragmentName" EXISTS"#);
+        insta::assert_snapshot!(p(" _vectors.embedderName.fragments.fragmentName EXISTS"), @r#"_vectors."embedderName".fragments."fragmentName" EXISTS"#);
+        insta::assert_snapshot!(p("NOT _vectors EXISTS"), @"NOT (_vectors EXISTS)");
+        insta::assert_snapshot!(p(" NOT _vectors EXISTS"), @"NOT (_vectors EXISTS)");
+        insta::assert_snapshot!(p(" _vectors NOT EXISTS"), @"NOT (_vectors EXISTS)");
+
         // Test EXISTS + NOT EXISTS
         insta::assert_snapshot!(p("subscribers EXISTS"), @"{subscribers} EXISTS");
         insta::assert_snapshot!(p("NOT subscribers EXISTS"), @"NOT ({subscribers} EXISTS)");
@@ -988,6 +1043,71 @@ pub mod tests {
         "###
         );
+        insta::assert_snapshot!(p(r#"_vectors _vectors EXISTS"#), @r"
+        Was expecting an operation like `EXISTS` or `NOT EXISTS` after the vector filter.
+        10:25 _vectors _vectors EXISTS
+        ");
+        insta::assert_snapshot!(p(r#"_vectors. embedderName EXISTS"#), @r"
+        Was expecting embedder name but found nothing.
+        10:11 _vectors. embedderName EXISTS
+        ");
+        insta::assert_snapshot!(p(r#"_vectors .embedderName EXISTS"#), @r"
+        Was expecting an operation like `EXISTS` or `NOT EXISTS` after the vector filter.
+        10:30 _vectors .embedderName EXISTS
+        ");
+        insta::assert_snapshot!(p(r#"_vectors.embedderName. EXISTS"#), @r"
+        Was expecting one of `.fragments`, `.userProvided`, `.documentTemplate`, `.regenerate` or nothing, but instead found a point without a valid value.
+        22:23 _vectors.embedderName. EXISTS
+        ");
+        insta::assert_snapshot!(p(r#"_vectors."embedderName EXISTS"#), @r#"
+        The quotes in one of the values are inconsistent.
+        10:30 _vectors."embedderName EXISTS
+        "#);
+        insta::assert_snapshot!(p(r#"_vectors."embedderNam"e EXISTS"#), @r#"
+        The vector filter has leftover tokens.
+ 23:31 _vectors."embedderNam"e EXISTS + "#); + insta::assert_snapshot!(p(r#"_vectors.embedderName.documentTemplate. EXISTS"#), @r" + Was expecting one of `.fragments`, `.userProvided`, `.documentTemplate`, `.regenerate` or nothing, but instead found a point without a valid value. + 39:40 _vectors.embedderName.documentTemplate. EXISTS + "); + insta::assert_snapshot!(p(r#"_vectors.embedderName.fragments EXISTS"#), @r" + The vector filter is missing a fragment name. + 32:39 _vectors.embedderName.fragments EXISTS + "); + insta::assert_snapshot!(p(r#"_vectors.embedderName.fragments. EXISTS"#), @r" + The vector filter's fragment name is invalid. + 33:40 _vectors.embedderName.fragments. EXISTS + "); + insta::assert_snapshot!(p(r#"_vectors.embedderName.fragments.test test EXISTS"#), @r" + Was expecting an operation like `EXISTS` or `NOT EXISTS` after the vector filter. + 38:49 _vectors.embedderName.fragments.test test EXISTS + "); + insta::assert_snapshot!(p(r#"_vectors.embedderName.fragments. test EXISTS"#), @r" + The vector filter's fragment name is invalid. + 33:45 _vectors.embedderName.fragments. test EXISTS + "); + insta::assert_snapshot!(p(r#"_vectors.embedderName .fragments. test EXISTS"#), @r" + Was expecting an operation like `EXISTS` or `NOT EXISTS` after the vector filter. + 23:46 _vectors.embedderName .fragments. test EXISTS + "); + insta::assert_snapshot!(p(r#"_vectors.embedderName .fragments.test EXISTS"#), @r" + Was expecting an operation like `EXISTS` or `NOT EXISTS` after the vector filter. + 23:45 _vectors.embedderName .fragments.test EXISTS + "); + insta::assert_snapshot!(p(r#"_vectors.embedderName.fargments.test EXISTS"#), @r" + Was expecting one of `fragments`, `userProvided`, `documentTemplate`, `regenerate` or nothing, but instead found `fargments`. Did you mean `fragments`? + 23:32 _vectors.embedderName.fargments.test EXISTS + "); + insta::assert_snapshot!(p(r#"_vectors.embedderName."userProvided" EXISTS"#), @r#" + Was expecting this part to be unquoted. + 24:36 _vectors.embedderName."userProvided" EXISTS + "#); + insta::assert_snapshot!(p(r#"_vectors.embedderName.userProvided.fragments.test EXISTS"#), @r" + Vector filter can only accept one of `fragments`, `userProvided`, `documentTemplate` or `regenerate`, but found both `userProvided` and `fragments`. + 36:45 _vectors.embedderName.userProvided.fragments.test EXISTS + "); + insta::assert_snapshot!(p(r#"NOT OR EXISTS AND EXISTS NOT EXISTS"#), @r###" Was expecting a value but instead got `OR`, which is a reserved keyword. To use `OR` as a field name or a value, surround it by quotes. 
5:7 NOT OR EXISTS AND EXISTS NOT EXISTS diff --git a/crates/filter-parser/src/value.rs b/crates/filter-parser/src/value.rs index 98cac39fe..35a5c0ab4 100644 --- a/crates/filter-parser/src/value.rs +++ b/crates/filter-parser/src/value.rs @@ -80,6 +80,51 @@ pub fn word_exact<'a, 'b: 'a>(tag: &'b str) -> impl Fn(Span<'a>) -> IResult<'a, } } +/// vector_value = ( non_dot_word | singleQuoted | doubleQuoted) +pub fn parse_vector_value(input: Span) -> IResult { + pub fn non_dot_word(input: Span) -> IResult { + let (input, word) = take_while1(|c| is_value_component(c) && c != '.')(input)?; + Ok((input, word.into())) + } + + let (input, value) = alt(( + delimited(char('\''), cut(|input| quoted_by('\'', input)), cut(char('\''))), + delimited(char('"'), cut(|input| quoted_by('"', input)), cut(char('"'))), + non_dot_word, + ))(input)?; + + match unescaper::unescape(value.value()) { + Ok(content) => { + if content.len() != value.value().len() { + Ok((input, Token::new(value.original_span(), Some(content)))) + } else { + Ok((input, value)) + } + } + Err(unescaper::Error::IncompleteStr(_)) => Err(nom::Err::Incomplete(nom::Needed::Unknown)), + Err(unescaper::Error::ParseIntError { .. }) => Err(nom::Err::Error(Error::new_from_kind( + value.original_span(), + ErrorKind::InvalidEscapedNumber, + ))), + Err(unescaper::Error::InvalidChar { .. }) => Err(nom::Err::Error(Error::new_from_kind( + value.original_span(), + ErrorKind::MalformedValue, + ))), + } +} + +pub fn parse_vector_value_cut<'a>(input: Span<'a>, kind: ErrorKind<'a>) -> IResult<'a, Token<'a>> { + parse_vector_value(input).map_err(|e| match e { + nom::Err::Failure(e) => match e.kind() { + ErrorKind::Char(c) if *c == '"' || *c == '\'' => { + crate::Error::failure_from_kind(input, ErrorKind::VectorFilterInvalidQuotes) + } + _ => crate::Error::failure_from_kind(input, kind), + }, + _ => crate::Error::failure_from_kind(input, kind), + }) +} + /// value = WS* ( word | singleQuoted | doubleQuoted) WS+ pub fn parse_value(input: Span) -> IResult { // to get better diagnostic message we are going to strip the left whitespaces from the input right now @@ -99,31 +144,21 @@ pub fn parse_value(input: Span) -> IResult { } match parse_geo_radius(input) { - Ok(_) => { - return Err(nom::Err::Failure(Error::new_from_kind(input, ErrorKind::MisusedGeoRadius))) - } + Ok(_) => return Err(Error::failure_from_kind(input, ErrorKind::MisusedGeoRadius)), // if we encountered a failure it means the user badly wrote a _geoRadius filter. // But instead of showing them how to fix his syntax we are going to tell them they should not use this filter as a value. Err(e) if e.is_failure() => { - return Err(nom::Err::Failure(Error::new_from_kind(input, ErrorKind::MisusedGeoRadius))) + return Err(Error::failure_from_kind(input, ErrorKind::MisusedGeoRadius)) } _ => (), } match parse_geo_bounding_box(input) { - Ok(_) => { - return Err(nom::Err::Failure(Error::new_from_kind( - input, - ErrorKind::MisusedGeoBoundingBox, - ))) - } + Ok(_) => return Err(Error::failure_from_kind(input, ErrorKind::MisusedGeoBoundingBox)), // if we encountered a failure it means the user badly wrote a _geoBoundingBox filter. // But instead of showing them how to fix his syntax we are going to tell them they should not use this filter as a value. 
Err(e) if e.is_failure() => { - return Err(nom::Err::Failure(Error::new_from_kind( - input, - ErrorKind::MisusedGeoBoundingBox, - ))) + return Err(Error::failure_from_kind(input, ErrorKind::MisusedGeoBoundingBox)) } _ => (), } diff --git a/crates/index-scheduler/Cargo.toml b/crates/index-scheduler/Cargo.toml index de0d01935..20cc49686 100644 --- a/crates/index-scheduler/Cargo.toml +++ b/crates/index-scheduler/Cargo.toml @@ -26,7 +26,7 @@ flate2 = "1.1.2" indexmap = "2.9.0" meilisearch-auth = { path = "../meilisearch-auth" } meilisearch-types = { path = "../meilisearch-types" } -memmap2 = "0.9.5" +memmap2 = "0.9.7" page_size = "0.6.0" rayon = "1.10.0" roaring = { version = "0.10.12", features = ["serde"] } diff --git a/crates/index-scheduler/src/features.rs b/crates/index-scheduler/src/features.rs index b52a659a6..dee665458 100644 --- a/crates/index-scheduler/src/features.rs +++ b/crates/index-scheduler/src/features.rs @@ -85,7 +85,7 @@ impl RoFeatures { Ok(()) } else { Err(FeatureNotEnabledError { - disabled_action: "Using `CONTAINS` or `STARTS WITH` in a filter", + disabled_action: "Using `CONTAINS` in a filter", feature: "contains filter", issue_link: "https://github.com/orgs/meilisearch/discussions/763", } @@ -182,6 +182,7 @@ impl FeatureData { ..persisted_features })); + // Once this is stabilized, network should be stored along with webhooks in index-scheduler's persisted database let network_db = runtime_features_db.remap_data_type::>(); let network: Network = network_db.get(wtxn, db_keys::NETWORK)?.unwrap_or_default(); diff --git a/crates/index-scheduler/src/index_mapper/mod.rs b/crates/index-scheduler/src/index_mapper/mod.rs index 86fb17ca7..e6bdccd41 100644 --- a/crates/index-scheduler/src/index_mapper/mod.rs +++ b/crates/index-scheduler/src/index_mapper/mod.rs @@ -71,7 +71,7 @@ pub struct IndexMapper { /// Path to the folder where the LMDB environments of each index are. base_path: PathBuf, /// The map size an index is opened with on the first time. - index_base_map_size: usize, + pub(crate) index_base_map_size: usize, /// The quantity by which the map size of an index is incremented upon reopening, in bytes. index_growth_amount: usize, /// Whether we open a meilisearch index with the MDB_WRITEMAP option or not. 
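For context, the filter-parser changes above add a small grammar on top of the existing one: `_vectors`, optionally followed by a quoted or unquoted embedder name and one of `.fragments.<name>`, `.userProvided`, `.documentTemplate`, or `.regenerate`, terminated by `EXISTS` / `NOT EXISTS`. A minimal sketch of how a caller might exercise it, assuming the crate's public `FilterCondition::parse` entry point (the harness below is illustrative, not part of the diff):

use filter_parser::FilterCondition;

fn main() {
    // Each input below should parse into a `FilterCondition::VectorExists` node,
    // or fail with an `Error` whose `Display` output is the two-line diagnostic
    // exercised by the snapshot tests above.
    for input in [
        "_vectors EXISTS",                             // any embedder, any source
        "_vectors.default EXISTS",                     // a single embedder
        "_vectors.default.userProvided EXISTS",        // only user-provided embeddings
        "_vectors.default.fragments.title NOT EXISTS", // a single fragment
        "_vectors.default.fargments.title EXISTS",     // typo: should suggest `fragments`
    ] {
        match FilterCondition::parse(input) {
            Ok(Some(condition)) => println!("{input} => {condition}"),
            Ok(None) => println!("{input} => empty filter"),
            Err(error) => println!("{input} =>\n{error}"),
        }
    }
}

The `Did you mean ...?` suggestion comes from the `levenshtein_automata` dependency added above, which accepts any known suffix within an edit distance of two.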
diff --git a/crates/index-scheduler/src/insta_snapshot.rs b/crates/index-scheduler/src/insta_snapshot.rs
index 0cbbb2514..cb804d9b4 100644
--- a/crates/index-scheduler/src/insta_snapshot.rs
+++ b/crates/index-scheduler/src/insta_snapshot.rs
@@ -20,16 +20,17 @@ pub fn snapshot_index_scheduler(scheduler: &IndexScheduler) -> String {
     let IndexScheduler {
         cleanup_enabled: _,
+        experimental_no_edition_2024_for_dumps: _,
         processing_tasks,
         env,
         version,
         queue,
         scheduler,
+        persisted,
         index_mapper,
         features: _,
-        webhook_url: _,
-        webhook_authorization_header: _,
+        webhooks: _,
         test_breakpoint_sdr: _,
         planned_failures: _,
         run_loop_iteration: _,
@@ -61,6 +62,13 @@ pub fn snapshot_index_scheduler(scheduler: &IndexScheduler) -> String {
     }
     snap.push_str("\n----------------------------------------------------------------------\n");

+    let persisted_db_snapshot = snapshot_persisted_db(&rtxn, persisted);
+    if !persisted_db_snapshot.is_empty() {
+        snap.push_str("### Persisted:\n");
+        snap.push_str(&persisted_db_snapshot);
+        snap.push_str("----------------------------------------------------------------------\n");
+    }
+
     snap.push_str("### All Tasks:\n");
     snap.push_str(&snapshot_all_tasks(&rtxn, queue.tasks.all_tasks));
     snap.push_str("----------------------------------------------------------------------\n");
@@ -199,6 +207,16 @@ pub fn snapshot_date_db(rtxn: &RoTxn, db: Database<BEI128, CboRoaringBitmapCodec>) -> String {
+pub fn snapshot_persisted_db(rtxn: &RoTxn, db: Database<Str, Str>) -> String {
+    let mut snap = String::new();
+    let iter = db.iter(rtxn).unwrap();
+    for next in iter {
+        let (key, value) = next.unwrap();
+        snap.push_str(&format!("{key}: {value}\n"));
+    }
+    snap
+}
+
 pub fn snapshot_task(task: &Task) -> String {
     let mut snap = String::new();
     let Task {
@@ -310,6 +328,7 @@ pub fn snapshot_status(
     }
     snap
 }
+
 pub fn snapshot_kind(rtxn: &RoTxn, db: Database<SerdeBincode<Kind>, RoaringBitmapCodec>) -> String {
     let mut snap = String::new();
     let iter = db.iter(rtxn).unwrap();
@@ -330,6 +349,7 @@ pub fn snapshot_index_tasks(rtxn: &RoTxn, db: Database<Str, RoaringBitmapCodec>)
     }
     snap
 }
+
 pub fn snapshot_canceled_by(rtxn: &RoTxn, db: Database<BEU32, RoaringBitmapCodec>) -> String {
     let mut snap = String::new();
     let iter = db.iter(rtxn).unwrap();
diff --git a/crates/index-scheduler/src/lib.rs b/crates/index-scheduler/src/lib.rs
index b2f27d66b..6ad7a8397 100644
--- a/crates/index-scheduler/src/lib.rs
+++ b/crates/index-scheduler/src/lib.rs
@@ -65,13 +65,16 @@ use meilisearch_types::milli::vector::{
 use meilisearch_types::milli::{self, Index};
 use meilisearch_types::task_view::TaskView;
 use meilisearch_types::tasks::{KindWithContent, Task};
+use meilisearch_types::webhooks::{Webhook, WebhooksDumpView, WebhooksView};
 use milli::vector::db::IndexEmbeddingConfig;
 use processing::ProcessingTasks;
 pub use queue::Query;
 use queue::Queue;
 use roaring::RoaringBitmap;
 use scheduler::Scheduler;
+use serde::{Deserialize, Serialize};
 use time::OffsetDateTime;
+use uuid::Uuid;
 use versioning::Versioning;

 use crate::index_mapper::IndexMapper;
@@ -80,7 +83,15 @@ use crate::utils::clamp_to_page_size;
 pub(crate) type BEI128 = I128<BE>;

 const TASK_SCHEDULER_SIZE_THRESHOLD_PERCENT_INT: u64 = 40;
-const CHAT_SETTINGS_DB_NAME: &str = "chat-settings";
+
+mod db_name {
+    pub const CHAT_SETTINGS: &str = "chat-settings";
+    pub const PERSISTED: &str = "persisted";
+}
+
+mod db_keys {
+    pub const WEBHOOKS: &str = "webhooks";
+}

 #[derive(Debug)]
 pub struct IndexSchedulerOptions {
@@ -98,10 +109,10 @@ pub struct IndexSchedulerOptions {
     pub snapshots_path: PathBuf,
     /// The path to the folder containing the dumps.
     pub dumps_path: PathBuf,
-    /// The URL on which we must send the tasks statuses
-    pub webhook_url: Option<String>,
-    /// The value we will send into the Authorization HTTP header on the webhook URL
-    pub webhook_authorization_header: Option<String>,
+    /// The webhook URL that was set by the CLI.
+    pub cli_webhook_url: Option<String>,
+    /// The Authorization header to send to the webhook URL that was set by the CLI.
+    pub cli_webhook_authorization: Option<String>,
     /// The maximum size, in bytes, of the task index.
     pub task_db_size: usize,
     /// The size, in bytes, with which a meilisearch index is opened the first time of each meilisearch index.
@@ -168,10 +179,14 @@ pub struct IndexScheduler {
     /// Whether we should automatically cleanup the task queue or not.
     pub(crate) cleanup_enabled: bool,

-    /// The webhook url we should send tasks to after processing every batches.
-    pub(crate) webhook_url: Option<String>,
-    /// The Authorization header to send to the webhook URL.
-    pub(crate) webhook_authorization_header: Option<String>,
+    /// Whether we should use the old document indexer or the new one.
+    pub(crate) experimental_no_edition_2024_for_dumps: bool,
+
+    /// A database to store single-keyed data that is persisted across restarts.
+    persisted: Database<Str, Str>,
+
+    /// Webhooks, loaded and stored in the `persisted` database
+    webhooks: Arc<Webhooks>,

     /// A map to retrieve the runtime representation of an embedder depending on its configuration.
     ///
@@ -210,8 +225,10 @@ impl IndexScheduler {
             index_mapper: self.index_mapper.clone(),
             cleanup_enabled: self.cleanup_enabled,
-            webhook_url: self.webhook_url.clone(),
-            webhook_authorization_header: self.webhook_authorization_header.clone(),
+            experimental_no_edition_2024_for_dumps: self.experimental_no_edition_2024_for_dumps,
+            persisted: self.persisted,
+
+            webhooks: self.webhooks.clone(),
             embedders: self.embedders.clone(),
             #[cfg(test)]
             test_breakpoint_sdr: self.test_breakpoint_sdr.clone(),
@@ -230,6 +247,7 @@
             + IndexMapper::nb_db()
             + features::FeatureData::nb_db()
             + 1 // chat-prompts
+            + 1 // persisted
     }

     /// Create an index scheduler and start its run loop.
@@ -280,10 +298,18 @@
         let version = versioning::Versioning::new(&env, from_db_version)?;

         let mut wtxn = env.write_txn()?;
+
         let features = features::FeatureData::new(&env, &mut wtxn, options.instance_features)?;
         let queue = Queue::new(&env, &mut wtxn, &options)?;
         let index_mapper = IndexMapper::new(&env, &mut wtxn, &options, budget)?;
-        let chat_settings = env.create_database(&mut wtxn, Some(CHAT_SETTINGS_DB_NAME))?;
+        let chat_settings = env.create_database(&mut wtxn, Some(db_name::CHAT_SETTINGS))?;
+
+        let persisted = env.create_database(&mut wtxn, Some(db_name::PERSISTED))?;
+        let webhooks_db = persisted.remap_data_type::<SerdeJson<Webhooks>>();
+        let mut webhooks = webhooks_db.get(&wtxn, db_keys::WEBHOOKS)?.unwrap_or_default();
+        webhooks
+            .with_cli(options.cli_webhook_url.clone(), options.cli_webhook_authorization.clone());
+
         wtxn.commit()?;

         // allow unreachable_code to get rid of the warning in the case of a test build.
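The constructor above persists webhooks under a single fixed key (`db_keys::WEBHOOKS`) in the new `persisted` database, remapping the data type to `SerdeJson` at each use site. A minimal sketch of that single-key pattern in isolation, assuming heed's `Database<Str, Str>` / `remap_data_type` API (the `StoredWebhooks` type and helper names are illustrative, not the scheduler's actual items):

use heed::types::{SerdeJson, Str};
use heed::{Database, Env, Result};
use serde::{Deserialize, Serialize};

#[derive(Debug, Default, Serialize, Deserialize)]
struct StoredWebhooks {
    runtime: std::collections::BTreeMap<uuid::Uuid, String>,
}

// Read the value stored under the fixed key, falling back to the default.
fn load_webhooks(env: &Env, persisted: Database<Str, Str>) -> Result<StoredWebhooks> {
    let rtxn = env.read_txn()?;
    let db = persisted.remap_data_type::<SerdeJson<StoredWebhooks>>();
    Ok(db.get(&rtxn, "webhooks")?.unwrap_or_default())
}

// Overwrite the value under the same key inside a write transaction.
fn store_webhooks(env: &Env, persisted: Database<Str, Str>, webhooks: &StoredWebhooks) -> Result<()> {
    let mut wtxn = env.write_txn()?;
    let db = persisted.remap_data_type::<SerdeJson<StoredWebhooks>>();
    db.put(&mut wtxn, "webhooks", webhooks)?;
    wtxn.commit()
}

This also explains why the CLI-provided webhook is re-attached with `with_cli` after each load: it is skipped during serialization, so it never ends up in the database.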
@@ -296,8 +322,11 @@ impl IndexScheduler { index_mapper, env, cleanup_enabled: options.cleanup_enabled, - webhook_url: options.webhook_url, - webhook_authorization_header: options.webhook_authorization_header, + experimental_no_edition_2024_for_dumps: options + .indexer_config + .experimental_no_edition_2024_for_dumps, + persisted, + webhooks: Arc::new(webhooks), embedders: Default::default(), #[cfg(test)] @@ -594,6 +623,11 @@ impl IndexScheduler { Ok(nbr_index_processing_tasks > 0) } + /// Whether the index should use the old document indexer. + pub fn no_edition_2024_for_dumps(&self) -> bool { + self.experimental_no_edition_2024_for_dumps + } + /// Return the tasks matching the query from the user's point of view along /// with the total number of tasks matching the query, ignoring from and limit. /// @@ -740,86 +774,92 @@ impl IndexScheduler { Ok(()) } - /// Once the tasks changes have been committed we must send all the tasks that were updated to our webhook if there is one. - fn notify_webhook(&self, updated: &RoaringBitmap) -> Result<()> { - if let Some(ref url) = self.webhook_url { - struct TaskReader<'a, 'b> { - rtxn: &'a RoTxn<'a>, - index_scheduler: &'a IndexScheduler, - tasks: &'b mut roaring::bitmap::Iter<'b>, - buffer: Vec, - written: usize, - } + /// Once the tasks changes have been committed we must send all the tasks that were updated to our webhooks + fn notify_webhooks(&self, updated: RoaringBitmap) { + struct TaskReader<'a, 'b> { + rtxn: &'a RoTxn<'a>, + index_scheduler: &'a IndexScheduler, + tasks: &'b mut roaring::bitmap::Iter<'b>, + buffer: Vec, + written: usize, + } - impl Read for TaskReader<'_, '_> { - fn read(&mut self, mut buf: &mut [u8]) -> std::io::Result { - if self.buffer.is_empty() { - match self.tasks.next() { - None => return Ok(0), - Some(task_id) => { - let task = self - .index_scheduler - .queue - .tasks - .get_task(self.rtxn, task_id) - .map_err(|err| io::Error::new(io::ErrorKind::Other, err))? - .ok_or_else(|| { - io::Error::new( - io::ErrorKind::Other, - Error::CorruptedTaskQueue, - ) - })?; + impl Read for TaskReader<'_, '_> { + fn read(&mut self, mut buf: &mut [u8]) -> std::io::Result { + if self.buffer.is_empty() { + match self.tasks.next() { + None => return Ok(0), + Some(task_id) => { + let task = self + .index_scheduler + .queue + .tasks + .get_task(self.rtxn, task_id) + .map_err(|err| io::Error::new(io::ErrorKind::Other, err))? 
+ .ok_or_else(|| { + io::Error::new(io::ErrorKind::Other, Error::CorruptedTaskQueue) + })?; - serde_json::to_writer( - &mut self.buffer, - &TaskView::from_task(&task), - )?; - self.buffer.push(b'\n'); - } + serde_json::to_writer(&mut self.buffer, &TaskView::from_task(&task))?; + self.buffer.push(b'\n'); } } - - let mut to_write = &self.buffer[self.written..]; - let wrote = io::copy(&mut to_write, &mut buf)?; - self.written += wrote as usize; - - // we wrote everything and must refresh our buffer on the next call - if self.written == self.buffer.len() { - self.written = 0; - self.buffer.clear(); - } - - Ok(wrote as usize) } - } - let rtxn = self.env.read_txn()?; + let mut to_write = &self.buffer[self.written..]; + let wrote = io::copy(&mut to_write, &mut buf)?; + self.written += wrote as usize; - let task_reader = TaskReader { - rtxn: &rtxn, - index_scheduler: self, - tasks: &mut updated.into_iter(), - buffer: Vec::with_capacity(50), // on average a task is around ~100 bytes - written: 0, - }; + // we wrote everything and must refresh our buffer on the next call + if self.written == self.buffer.len() { + self.written = 0; + self.buffer.clear(); + } - // let reader = GzEncoder::new(BufReader::new(task_reader), Compression::default()); - let reader = GzEncoder::new(BufReader::new(task_reader), Compression::default()); - let request = ureq::post(url) - .timeout(Duration::from_secs(30)) - .set("Content-Encoding", "gzip") - .set("Content-Type", "application/x-ndjson"); - let request = match &self.webhook_authorization_header { - Some(header) => request.set("Authorization", header), - None => request, - }; - - if let Err(e) = request.send(reader) { - tracing::error!("While sending data to the webhook: {e}"); + Ok(wrote as usize) } } - Ok(()) + let webhooks = self.webhooks.get_all(); + if webhooks.is_empty() { + return; + } + let this = self.private_clone(); + // We must take the RoTxn before entering the thread::spawn otherwise another batch may be + // processed before we had the time to take our txn. 
+        let rtxn = match self.env.clone().static_read_txn() {
+            Ok(rtxn) => rtxn,
+            Err(e) => {
+                tracing::error!("Couldn't get an rtxn to notify the webhook: {e}");
+                return;
+            }
+        };
+
+        std::thread::spawn(move || {
+            for (uuid, Webhook { url, headers }) in webhooks.iter() {
+                let task_reader = TaskReader {
+                    rtxn: &rtxn,
+                    index_scheduler: &this,
+                    tasks: &mut updated.iter(),
+                    buffer: Vec::with_capacity(page_size::get()),
+                    written: 0,
+                };
+
+                let reader = GzEncoder::new(BufReader::new(task_reader), Compression::default());
+
+                let mut request = ureq::post(url)
+                    .timeout(Duration::from_secs(30))
+                    .set("Content-Encoding", "gzip")
+                    .set("Content-Type", "application/x-ndjson");
+                for (header_name, header_value) in headers.iter() {
+                    request = request.set(header_name, header_value);
+                }
+
+                if let Err(e) = request.send(reader) {
+                    tracing::error!("While sending data to the webhook {uuid}: {e}");
+                }
+            }
+        });
     }

     pub fn index_stats(&self, index_uid: &str) -> Result<IndexStats> {
@@ -850,6 +890,29 @@ impl IndexScheduler {
         self.features.network()
     }

+    pub fn update_runtime_webhooks(&self, runtime: RuntimeWebhooks) -> Result<()> {
+        let webhooks = Webhooks::from_runtime(runtime);
+        let mut wtxn = self.env.write_txn()?;
+        let webhooks_db = self.persisted.remap_data_type::<SerdeJson<Webhooks>>();
+        webhooks_db.put(&mut wtxn, db_keys::WEBHOOKS, &webhooks)?;
+        wtxn.commit()?;
+        self.webhooks.update_runtime(webhooks.into_runtime());
+        Ok(())
+    }
+
+    pub fn webhooks_dump_view(&self) -> WebhooksDumpView {
+        // We must not dump the cli api key
+        WebhooksDumpView { webhooks: self.webhooks.get_runtime() }
+    }
+
+    pub fn webhooks_view(&self) -> WebhooksView {
+        WebhooksView { webhooks: self.webhooks.get_all() }
+    }
+
+    pub fn retrieve_runtime_webhooks(&self) -> RuntimeWebhooks {
+        self.webhooks.get_runtime()
+    }
+
     pub fn embedders(
         &self,
         index_uid: String,
@@ -978,3 +1041,72 @@ pub struct IndexStats {
     /// Internal stats computed from the index.
     pub inner_stats: index_mapper::IndexStats,
 }
+
+/// These structures are not meant to be exposed to the end user; if needed, use the meilisearch-types::webhooks structures instead.
+/// /!\ Every time you deserialize this structure you should fill the cli_webhook later on with the `with_cli` method. /!\
+#[derive(Debug, Serialize, Deserialize, Default)]
+#[serde(rename_all = "camelCase")]
+struct Webhooks {
+    // The cli webhook should *never* be stored in a database.
+    // It represents a state that only exists for this execution of Meilisearch.
+    #[serde(skip)]
+    pub cli: Option<CliWebhook>,
+
+    #[serde(default)]
+    pub runtime: RwLock<RuntimeWebhooks>,
+}
+
+type RuntimeWebhooks = BTreeMap<Uuid, Webhook>;
+
+impl Webhooks {
+    pub fn with_cli(&mut self, url: Option<String>, auth: Option<String>) {
+        if let Some(url) = url {
+            let webhook = CliWebhook { url, auth };
+            self.cli = Some(webhook);
+        }
+    }
+
+    pub fn from_runtime(webhooks: RuntimeWebhooks) -> Self {
+        Self { cli: None, runtime: RwLock::new(webhooks) }
+    }
+
+    pub fn into_runtime(self) -> RuntimeWebhooks {
+        // safe because we own self and it cannot be cloned
+        self.runtime.into_inner().unwrap()
+    }
+
+    pub fn update_runtime(&self, webhooks: RuntimeWebhooks) {
+        *self.runtime.write().unwrap() = webhooks;
+    }
+
+    /// Returns all the webhooks in a unified view. The cli webhook is represented with a uuid set to 0.
+    pub fn get_all(&self) -> BTreeMap<Uuid, Webhook> {
+        self.cli
+            .as_ref()
+            .map(|wh| (Uuid::nil(), Webhook::from(wh)))
+            .into_iter()
+            .chain(self.runtime.read().unwrap().iter().map(|(uuid, wh)| (*uuid, wh.clone())))
+            .collect()
+    }
+
+    /// Returns all the runtime webhooks.
+    pub fn get_runtime(&self) -> BTreeMap<Uuid, Webhook> {
+        self.runtime.read().unwrap().iter().map(|(uuid, wh)| (*uuid, wh.clone())).collect()
+    }
+}
+
+#[derive(Debug, Serialize, Deserialize, Default, Clone, PartialEq)]
+struct CliWebhook {
+    pub url: String,
+    pub auth: Option<String>,
+}
+
+impl From<&CliWebhook> for Webhook {
+    fn from(webhook: &CliWebhook) -> Self {
+        let mut headers = BTreeMap::new();
+        if let Some(ref auth) = webhook.auth {
+            headers.insert("Authorization".to_string(), auth.to_string());
+        }
+        Self { url: webhook.url.to_string(), headers }
+    }
+}
diff --git a/crates/index-scheduler/src/processing.rs b/crates/index-scheduler/src/processing.rs
index fdd8e42ef..3da81f143 100644
--- a/crates/index-scheduler/src/processing.rs
+++ b/crates/index-scheduler/src/processing.rs
@@ -108,6 +108,7 @@ make_enum_progress! {
         DumpTheBatches,
         DumpTheIndexes,
         DumpTheExperimentalFeatures,
+        DumpTheWebhooks,
         CompressTheDump,
     }
 }
diff --git a/crates/index-scheduler/src/scheduler/mod.rs b/crates/index-scheduler/src/scheduler/mod.rs
index 5ac591143..b2bb90c0b 100644
--- a/crates/index-scheduler/src/scheduler/mod.rs
+++ b/crates/index-scheduler/src/scheduler/mod.rs
@@ -446,8 +446,7 @@ impl IndexScheduler {
                 Ok(())
             })?;

-            // We shouldn't crash the tick function if we can't send data to the webhook.
-            let _ = self.notify_webhook(&ids);
+            self.notify_webhooks(ids);

             #[cfg(test)]
             self.breakpoint(crate::test_utils::Breakpoint::AfterProcessing);
diff --git a/crates/index-scheduler/src/scheduler/process_dump_creation.rs b/crates/index-scheduler/src/scheduler/process_dump_creation.rs
index b8d100415..4f3ec0fdd 100644
--- a/crates/index-scheduler/src/scheduler/process_dump_creation.rs
+++ b/crates/index-scheduler/src/scheduler/process_dump_creation.rs
@@ -5,6 +5,7 @@ use std::sync::atomic::Ordering;

 use dump::IndexMetadata;
 use meilisearch_types::milli::constants::RESERVED_VECTORS_FIELD_NAME;
+use meilisearch_types::milli::index::EmbeddingsWithMetadata;
 use meilisearch_types::milli::progress::{Progress, VariableNameStep};
 use meilisearch_types::milli::vector::parsed_vectors::{ExplicitVectors, VectorOrArrayOfVectors};
 use meilisearch_types::milli::{self};
@@ -227,12 +228,21 @@ impl IndexScheduler {
                         return Err(Error::from_milli(user_err, Some(uid.to_string())));
                     };

-                    for (embedder_name, (embeddings, regenerate)) in embeddings {
+                    for (
+                        embedder_name,
+                        EmbeddingsWithMetadata { embeddings, regenerate, has_fragments },
+                    ) in embeddings
+                    {
                         let embeddings = ExplicitVectors {
                             embeddings: Some(VectorOrArrayOfVectors::from_array_of_vectors(
                                 embeddings,
                             )),
-                            regenerate,
+                            regenerate: regenerate &&
+                                // Meilisearch does not handle dumps with fragments well: because the
+                                // fragments are marked as user-provided,
+                                // all embeddings would be regenerated on any settings change or document update.
+                                // To prevent this, we mark embeddings as non-regenerate in this case.
+                                !has_fragments,
                         };
                         vectors.insert(embedder_name, serde_json::to_value(embeddings).unwrap());
                     }
@@ -260,6 +270,11 @@ impl IndexScheduler {
         let network = self.network();
         dump.create_network(network)?;

+        // 7.
Dump the webhooks + progress.update_progress(DumpCreationProgress::DumpTheWebhooks); + let webhooks = self.webhooks_dump_view(); + dump.create_webhooks(webhooks)?; + let dump_uid = started_at.format(format_description!( "[year repr:full][month repr:numerical][day padding:zero]-[hour padding:zero][minute padding:zero][second padding:zero][subsecond digits:3]" )).unwrap(); diff --git a/crates/index-scheduler/src/scheduler/process_export.rs b/crates/index-scheduler/src/scheduler/process_export.rs index 2062e1c28..0cd06f2e4 100644 --- a/crates/index-scheduler/src/scheduler/process_export.rs +++ b/crates/index-scheduler/src/scheduler/process_export.rs @@ -9,6 +9,7 @@ use flate2::write::GzEncoder; use flate2::Compression; use meilisearch_types::index_uid_pattern::IndexUidPattern; use meilisearch_types::milli::constants::RESERVED_VECTORS_FIELD_NAME; +use meilisearch_types::milli::index::EmbeddingsWithMetadata; use meilisearch_types::milli::progress::{Progress, VariableNameStep}; use meilisearch_types::milli::update::{request_threads, Setting}; use meilisearch_types::milli::vector::parsed_vectors::{ExplicitVectors, VectorOrArrayOfVectors}; @@ -62,13 +63,14 @@ impl IndexScheduler { let ExportIndexSettings { filter, override_settings } = export_settings; let index = self.index(uid)?; let index_rtxn = index.read_txn()?; + let bearer = api_key.map(|api_key| format!("Bearer {api_key}")); // First, check if the index already exists let url = format!("{base_url}/indexes/{uid}"); let response = retry(&must_stop_processing, || { let mut request = agent.get(&url); - if let Some(api_key) = api_key { - request = request.set("Authorization", &format!("Bearer {api_key}")); + if let Some(bearer) = &bearer { + request = request.set("Authorization", bearer); } request.send_bytes(Default::default()).map_err(into_backoff_error) @@ -90,8 +92,8 @@ impl IndexScheduler { let url = format!("{base_url}/indexes"); retry(&must_stop_processing, || { let mut request = agent.post(&url); - if let Some(api_key) = api_key { - request = request.set("Authorization", &format!("Bearer {api_key}")); + if let Some(bearer) = &bearer { + request = request.set("Authorization", bearer); } let index_param = json!({ "uid": uid, "primaryKey": primary_key }); request.send_json(&index_param).map_err(into_backoff_error) @@ -103,8 +105,8 @@ impl IndexScheduler { let url = format!("{base_url}/indexes/{uid}"); retry(&must_stop_processing, || { let mut request = agent.patch(&url); - if let Some(api_key) = api_key { - request = request.set("Authorization", &format!("Bearer {api_key}")); + if let Some(bearer) = &bearer { + request = request.set("Authorization", bearer); } let index_param = json!({ "primaryKey": primary_key }); request.send_json(&index_param).map_err(into_backoff_error) @@ -122,7 +124,6 @@ impl IndexScheduler { } // Retry logic for sending settings let url = format!("{base_url}/indexes/{uid}/settings"); - let bearer = api_key.map(|api_key| format!("Bearer {api_key}")); retry(&must_stop_processing, || { let mut request = agent.patch(&url); if let Some(bearer) = bearer.as_ref() { @@ -167,10 +168,10 @@ impl IndexScheduler { }, ); - let limit = payload_size.map(|ps| ps.as_u64() as usize).unwrap_or(50 * 1024 * 1024); // defaults to 50 MiB + let limit = payload_size.map(|ps| ps.as_u64() as usize).unwrap_or(20 * 1024 * 1024); // defaults to 20 MiB let documents_url = format!("{base_url}/indexes/{uid}/documents"); - request_threads() + let results = request_threads() .broadcast(|ctx| { let index_rtxn = index .read_txn() @@ -229,12 +230,21 @@ 
impl IndexScheduler {
                        ));
                    };

-                    for (embedder_name, (embeddings, regenerate)) in embeddings {
+                    for (
+                        embedder_name,
+                        EmbeddingsWithMetadata { embeddings, regenerate, has_fragments },
+                    ) in embeddings
+                    {
                         let embeddings = ExplicitVectors {
                             embeddings: Some(
                                 VectorOrArrayOfVectors::from_array_of_vectors(embeddings),
                             ),
-                            regenerate,
+                            regenerate: regenerate &&
+                                // Meilisearch does not handle dumps with fragments well: because the
+                                // fragments are marked as user-provided,
+                                // all embeddings would be regenerated on any settings change or document update.
+                                // To prevent this, we mark embeddings as non-regenerate in this case.
+                                !has_fragments,
                         };
                         vectors.insert(
                             embedder_name,
@@ -265,9 +275,8 @@ impl IndexScheduler {
                     let mut request = agent.post(&documents_url);
                     request = request.set("Content-Type", "application/x-ndjson");
                     request = request.set("Content-Encoding", "gzip");
-                    if let Some(api_key) = api_key {
-                        request = request
-                            .set("Authorization", &(format!("Bearer {api_key}")));
+                    if let Some(bearer) = &bearer {
+                        request = request.set("Authorization", bearer);
                     }
                     request.send_bytes(&compressed_buffer).map_err(into_backoff_error)
                 })?;
@@ -276,7 +285,7 @@ impl IndexScheduler {
                 }
                 buffer.extend_from_slice(&tmp_buffer);

-                if i % 100 == 0 {
+                if i > 0 && i % 100 == 0 {
                     step.fetch_add(100, atomic::Ordering::Relaxed);
                 }
             }
@@ -284,8 +293,8 @@
             retry(&must_stop_processing, || {
                 let mut request = agent.post(&documents_url);
                 request = request.set("Content-Type", "application/x-ndjson");
-                if let Some(api_key) = api_key {
-                    request = request.set("Authorization", &(format!("Bearer {api_key}")));
+                if let Some(bearer) = &bearer {
+                    request = request.set("Authorization", bearer);
                 }
                 request.send_bytes(&buffer).map_err(into_backoff_error)
             })?;
@@ -298,6 +307,9 @@ impl IndexScheduler {
                     Some(uid.to_string()),
                 )
             })?;
+            for result in results {
+                result?;
+            }

             step.store(total_documents, atomic::Ordering::Relaxed);
         }
diff --git a/crates/index-scheduler/src/scheduler/process_snapshot_creation.rs b/crates/index-scheduler/src/scheduler/process_snapshot_creation.rs
index d58157ae3..4a7a9e074 100644
--- a/crates/index-scheduler/src/scheduler/process_snapshot_creation.rs
+++ b/crates/index-scheduler/src/scheduler/process_snapshot_creation.rs
@@ -7,9 +7,73 @@ use meilisearch_types::milli::progress::{Progress, VariableNameStep};
 use meilisearch_types::tasks::{Status, Task};
 use meilisearch_types::{compression, VERSION_FILE_NAME};

+use crate::heed::EnvOpenOptions;
 use crate::processing::{AtomicUpdateFileStep, SnapshotCreationProgress};
+use crate::queue::TaskQueue;
 use crate::{Error, IndexScheduler, Result};

+/// # Safety
+///
+/// See [`EnvOpenOptions::open`].
+unsafe fn remove_tasks(
+    tasks: &[Task],
+    dst: &std::path::Path,
+    index_base_map_size: usize,
+) -> Result<()> {
+    let env_options = EnvOpenOptions::new();
+    let mut env_options = env_options.read_txn_without_tls();
+    let env = env_options.max_dbs(TaskQueue::nb_db()).map_size(index_base_map_size).open(dst)?;
+    let mut wtxn = env.write_txn()?;
+    let task_queue = TaskQueue::new(&env, &mut wtxn)?;
+
+    // Destructuring to ensure the code below gets updated if a database gets added in the future.
+ let TaskQueue { + all_tasks, + status, + kind, + index_tasks: _, // snapshot creation tasks are not index tasks + canceled_by, + enqueued_at, + started_at, + finished_at, + } = task_queue; + + for task in tasks { + all_tasks.delete(&mut wtxn, &task.uid)?; + + let mut tasks = status.get(&wtxn, &task.status)?.unwrap_or_default(); + tasks.remove(task.uid); + status.put(&mut wtxn, &task.status, &tasks)?; + + let mut tasks = kind.get(&wtxn, &task.kind.as_kind())?.unwrap_or_default(); + tasks.remove(task.uid); + kind.put(&mut wtxn, &task.kind.as_kind(), &tasks)?; + + canceled_by.delete(&mut wtxn, &task.uid)?; + + let timestamp = task.enqueued_at.unix_timestamp_nanos(); + let mut tasks = enqueued_at.get(&wtxn, ×tamp)?.unwrap_or_default(); + tasks.remove(task.uid); + enqueued_at.put(&mut wtxn, ×tamp, &tasks)?; + + if let Some(task_started_at) = task.started_at { + let timestamp = task_started_at.unix_timestamp_nanos(); + let mut tasks = started_at.get(&wtxn, ×tamp)?.unwrap_or_default(); + tasks.remove(task.uid); + started_at.put(&mut wtxn, ×tamp, &tasks)?; + } + + if let Some(task_finished_at) = task.finished_at { + let timestamp = task_finished_at.unix_timestamp_nanos(); + let mut tasks = finished_at.get(&wtxn, ×tamp)?.unwrap_or_default(); + tasks.remove(task.uid); + finished_at.put(&mut wtxn, ×tamp, &tasks)?; + } + } + wtxn.commit()?; + Ok(()) +} + impl IndexScheduler { pub(super) fn process_snapshot( &self, @@ -48,14 +112,26 @@ impl IndexScheduler { }; self.env.copy_to_path(dst.join("data.mdb"), compaction_option)?; - // 2.2 Create a read transaction on the index-scheduler + // 2.2 Remove the current snapshot tasks + // + // This is done to ensure that the tasks are not processed again when the snapshot is imported + // + // # Safety + // + // This is safe because we open the env file we just created in a temporary directory. + // We are sure it's not being used by any other process nor thread. 
+ unsafe { + remove_tasks(&tasks, &dst, self.index_mapper.index_base_map_size)?; + } + + // 2.3 Create a read transaction on the index-scheduler let rtxn = self.env.read_txn()?; - // 2.3 Create the update files directory + // 2.4 Create the update files directory let update_files_dir = temp_snapshot_dir.path().join("update_files"); fs::create_dir_all(&update_files_dir)?; - // 2.4 Only copy the update files of the enqueued tasks + // 2.5 Only copy the update files of the enqueued tasks progress.update_progress(SnapshotCreationProgress::SnapshotTheUpdateFiles); let enqueued = self.queue.tasks.get_status(&rtxn, Status::Enqueued)?; let (atomic, update_file_progress) = AtomicUpdateFileStep::new(enqueued.len() as u32); diff --git a/crates/index-scheduler/src/scheduler/snapshots/test_failure.rs/upgrade_failure/after_processing_everything.snap b/crates/index-scheduler/src/scheduler/snapshots/test_failure.rs/upgrade_failure/after_processing_everything.snap index 0b5d4409d..d700dd3db 100644 --- a/crates/index-scheduler/src/scheduler/snapshots/test_failure.rs/upgrade_failure/after_processing_everything.snap +++ b/crates/index-scheduler/src/scheduler/snapshots/test_failure.rs/upgrade_failure/after_processing_everything.snap @@ -6,7 +6,7 @@ source: crates/index-scheduler/src/scheduler/test_failure.rs [] ---------------------------------------------------------------------- ### All Tasks: -0 {uid: 0, batch_uid: 0, status: succeeded, details: { from: (1, 12, 0), to: (1, 16, 0) }, kind: UpgradeDatabase { from: (1, 12, 0) }} +0 {uid: 0, batch_uid: 0, status: succeeded, details: { from: (1, 12, 0), to: (1, 17, 1) }, kind: UpgradeDatabase { from: (1, 12, 0) }} 1 {uid: 1, batch_uid: 1, status: succeeded, details: { primary_key: Some("mouse") }, kind: IndexCreation { index_uid: "catto", primary_key: Some("mouse") }} 2 {uid: 2, batch_uid: 2, status: succeeded, details: { primary_key: Some("bone") }, kind: IndexCreation { index_uid: "doggo", primary_key: Some("bone") }} 3 {uid: 3, batch_uid: 3, status: failed, error: ResponseError { code: 200, message: "Index `doggo` already exists.", error_code: "index_already_exists", error_type: "invalid_request", error_link: "https://docs.meilisearch.com/errors#index_already_exists" }, details: { primary_key: Some("bone") }, kind: IndexCreation { index_uid: "doggo", primary_key: Some("bone") }} @@ -57,7 +57,7 @@ girafo: { number_of_documents: 0, field_distribution: {} } [timestamp] [4,] ---------------------------------------------------------------------- ### All Batches: -0 {uid: 0, details: {"upgradeFrom":"v1.12.0","upgradeTo":"v1.16.0"}, stats: {"totalNbTasks":1,"status":{"succeeded":1},"types":{"upgradeDatabase":1},"indexUids":{}}, stop reason: "stopped after the last task of type `upgradeDatabase` because they cannot be batched with tasks of any other type.", } +0 {uid: 0, details: {"upgradeFrom":"v1.12.0","upgradeTo":"v1.17.1"}, stats: {"totalNbTasks":1,"status":{"succeeded":1},"types":{"upgradeDatabase":1},"indexUids":{}}, stop reason: "stopped after the last task of type `upgradeDatabase` because they cannot be batched with tasks of any other type.", } 1 {uid: 1, details: {"primaryKey":"mouse"}, stats: {"totalNbTasks":1,"status":{"succeeded":1},"types":{"indexCreation":1},"indexUids":{"catto":1}}, stop reason: "created batch containing only task with id 1 of type `indexCreation` that cannot be batched with any other task.", } 2 {uid: 2, details: {"primaryKey":"bone"}, stats: 
{"totalNbTasks":1,"status":{"succeeded":1},"types":{"indexCreation":1},"indexUids":{"doggo":1}}, stop reason: "created batch containing only task with id 2 of type `indexCreation` that cannot be batched with any other task.", } 3 {uid: 3, details: {"primaryKey":"bone"}, stats: {"totalNbTasks":1,"status":{"failed":1},"types":{"indexCreation":1},"indexUids":{"doggo":1}}, stop reason: "created batch containing only task with id 3 of type `indexCreation` that cannot be batched with any other task.", } diff --git a/crates/index-scheduler/src/scheduler/snapshots/test_failure.rs/upgrade_failure/register_automatic_upgrade_task.snap b/crates/index-scheduler/src/scheduler/snapshots/test_failure.rs/upgrade_failure/register_automatic_upgrade_task.snap index 0bfb9c6da..ee3cefba4 100644 --- a/crates/index-scheduler/src/scheduler/snapshots/test_failure.rs/upgrade_failure/register_automatic_upgrade_task.snap +++ b/crates/index-scheduler/src/scheduler/snapshots/test_failure.rs/upgrade_failure/register_automatic_upgrade_task.snap @@ -6,7 +6,7 @@ source: crates/index-scheduler/src/scheduler/test_failure.rs [] ---------------------------------------------------------------------- ### All Tasks: -0 {uid: 0, status: enqueued, details: { from: (1, 12, 0), to: (1, 16, 0) }, kind: UpgradeDatabase { from: (1, 12, 0) }} +0 {uid: 0, status: enqueued, details: { from: (1, 12, 0), to: (1, 17, 1) }, kind: UpgradeDatabase { from: (1, 12, 0) }} ---------------------------------------------------------------------- ### Status: enqueued [0,] diff --git a/crates/index-scheduler/src/scheduler/snapshots/test_failure.rs/upgrade_failure/registered_a_task_while_the_upgrade_task_is_enqueued.snap b/crates/index-scheduler/src/scheduler/snapshots/test_failure.rs/upgrade_failure/registered_a_task_while_the_upgrade_task_is_enqueued.snap index 8d374479b..abaffbb1b 100644 --- a/crates/index-scheduler/src/scheduler/snapshots/test_failure.rs/upgrade_failure/registered_a_task_while_the_upgrade_task_is_enqueued.snap +++ b/crates/index-scheduler/src/scheduler/snapshots/test_failure.rs/upgrade_failure/registered_a_task_while_the_upgrade_task_is_enqueued.snap @@ -6,7 +6,7 @@ source: crates/index-scheduler/src/scheduler/test_failure.rs [] ---------------------------------------------------------------------- ### All Tasks: -0 {uid: 0, status: enqueued, details: { from: (1, 12, 0), to: (1, 16, 0) }, kind: UpgradeDatabase { from: (1, 12, 0) }} +0 {uid: 0, status: enqueued, details: { from: (1, 12, 0), to: (1, 17, 1) }, kind: UpgradeDatabase { from: (1, 12, 0) }} 1 {uid: 1, status: enqueued, details: { primary_key: Some("mouse") }, kind: IndexCreation { index_uid: "catto", primary_key: Some("mouse") }} ---------------------------------------------------------------------- ### Status: diff --git a/crates/index-scheduler/src/scheduler/snapshots/test_failure.rs/upgrade_failure/upgrade_task_failed.snap b/crates/index-scheduler/src/scheduler/snapshots/test_failure.rs/upgrade_failure/upgrade_task_failed.snap index 9fc28abbe..9569ecfe3 100644 --- a/crates/index-scheduler/src/scheduler/snapshots/test_failure.rs/upgrade_failure/upgrade_task_failed.snap +++ b/crates/index-scheduler/src/scheduler/snapshots/test_failure.rs/upgrade_failure/upgrade_task_failed.snap @@ -6,7 +6,7 @@ source: crates/index-scheduler/src/scheduler/test_failure.rs [] ---------------------------------------------------------------------- ### All Tasks: -0 {uid: 0, batch_uid: 0, status: failed, error: ResponseError { code: 200, message: "Planned failure for tests.", error_code: 
"internal", error_type: "internal", error_link: "https://docs.meilisearch.com/errors#internal" }, details: { from: (1, 12, 0), to: (1, 16, 0) }, kind: UpgradeDatabase { from: (1, 12, 0) }} +0 {uid: 0, batch_uid: 0, status: failed, error: ResponseError { code: 200, message: "Planned failure for tests.", error_code: "internal", error_type: "internal", error_link: "https://docs.meilisearch.com/errors#internal" }, details: { from: (1, 12, 0), to: (1, 17, 1) }, kind: UpgradeDatabase { from: (1, 12, 0) }} 1 {uid: 1, status: enqueued, details: { primary_key: Some("mouse") }, kind: IndexCreation { index_uid: "catto", primary_key: Some("mouse") }} ---------------------------------------------------------------------- ### Status: @@ -37,7 +37,7 @@ catto [1,] [timestamp] [0,] ---------------------------------------------------------------------- ### All Batches: -0 {uid: 0, details: {"upgradeFrom":"v1.12.0","upgradeTo":"v1.16.0"}, stats: {"totalNbTasks":1,"status":{"failed":1},"types":{"upgradeDatabase":1},"indexUids":{}}, stop reason: "stopped after the last task of type `upgradeDatabase` because they cannot be batched with tasks of any other type.", } +0 {uid: 0, details: {"upgradeFrom":"v1.12.0","upgradeTo":"v1.17.1"}, stats: {"totalNbTasks":1,"status":{"failed":1},"types":{"upgradeDatabase":1},"indexUids":{}}, stop reason: "stopped after the last task of type `upgradeDatabase` because they cannot be batched with tasks of any other type.", } ---------------------------------------------------------------------- ### Batch to tasks mapping: 0 [0,] diff --git a/crates/index-scheduler/src/scheduler/snapshots/test_failure.rs/upgrade_failure/upgrade_task_failed_again.snap b/crates/index-scheduler/src/scheduler/snapshots/test_failure.rs/upgrade_failure/upgrade_task_failed_again.snap index 33ddf7193..1d7945023 100644 --- a/crates/index-scheduler/src/scheduler/snapshots/test_failure.rs/upgrade_failure/upgrade_task_failed_again.snap +++ b/crates/index-scheduler/src/scheduler/snapshots/test_failure.rs/upgrade_failure/upgrade_task_failed_again.snap @@ -6,7 +6,7 @@ source: crates/index-scheduler/src/scheduler/test_failure.rs [] ---------------------------------------------------------------------- ### All Tasks: -0 {uid: 0, batch_uid: 0, status: failed, error: ResponseError { code: 200, message: "Planned failure for tests.", error_code: "internal", error_type: "internal", error_link: "https://docs.meilisearch.com/errors#internal" }, details: { from: (1, 12, 0), to: (1, 16, 0) }, kind: UpgradeDatabase { from: (1, 12, 0) }} +0 {uid: 0, batch_uid: 0, status: failed, error: ResponseError { code: 200, message: "Planned failure for tests.", error_code: "internal", error_type: "internal", error_link: "https://docs.meilisearch.com/errors#internal" }, details: { from: (1, 12, 0), to: (1, 17, 1) }, kind: UpgradeDatabase { from: (1, 12, 0) }} 1 {uid: 1, status: enqueued, details: { primary_key: Some("mouse") }, kind: IndexCreation { index_uid: "catto", primary_key: Some("mouse") }} 2 {uid: 2, status: enqueued, details: { primary_key: Some("bone") }, kind: IndexCreation { index_uid: "doggo", primary_key: Some("bone") }} ---------------------------------------------------------------------- @@ -40,7 +40,7 @@ doggo [2,] [timestamp] [0,] ---------------------------------------------------------------------- ### All Batches: -0 {uid: 0, details: {"upgradeFrom":"v1.12.0","upgradeTo":"v1.16.0"}, stats: {"totalNbTasks":1,"status":{"failed":1},"types":{"upgradeDatabase":1},"indexUids":{}}, stop reason: "stopped after the last 
task of type `upgradeDatabase` because they cannot be batched with tasks of any other type.", } +0 {uid: 0, details: {"upgradeFrom":"v1.12.0","upgradeTo":"v1.17.1"}, stats: {"totalNbTasks":1,"status":{"failed":1},"types":{"upgradeDatabase":1},"indexUids":{}}, stop reason: "stopped after the last task of type `upgradeDatabase` because they cannot be batched with tasks of any other type.", } ---------------------------------------------------------------------- ### Batch to tasks mapping: 0 [0,] diff --git a/crates/index-scheduler/src/scheduler/snapshots/test_failure.rs/upgrade_failure/upgrade_task_succeeded.snap b/crates/index-scheduler/src/scheduler/snapshots/test_failure.rs/upgrade_failure/upgrade_task_succeeded.snap index 05d366d1e..869d1d0b2 100644 --- a/crates/index-scheduler/src/scheduler/snapshots/test_failure.rs/upgrade_failure/upgrade_task_succeeded.snap +++ b/crates/index-scheduler/src/scheduler/snapshots/test_failure.rs/upgrade_failure/upgrade_task_succeeded.snap @@ -6,7 +6,7 @@ source: crates/index-scheduler/src/scheduler/test_failure.rs [] ---------------------------------------------------------------------- ### All Tasks: -0 {uid: 0, batch_uid: 0, status: succeeded, details: { from: (1, 12, 0), to: (1, 16, 0) }, kind: UpgradeDatabase { from: (1, 12, 0) }} +0 {uid: 0, batch_uid: 0, status: succeeded, details: { from: (1, 12, 0), to: (1, 17, 1) }, kind: UpgradeDatabase { from: (1, 12, 0) }} 1 {uid: 1, status: enqueued, details: { primary_key: Some("mouse") }, kind: IndexCreation { index_uid: "catto", primary_key: Some("mouse") }} 2 {uid: 2, status: enqueued, details: { primary_key: Some("bone") }, kind: IndexCreation { index_uid: "doggo", primary_key: Some("bone") }} 3 {uid: 3, status: enqueued, details: { primary_key: Some("bone") }, kind: IndexCreation { index_uid: "doggo", primary_key: Some("bone") }} @@ -43,7 +43,7 @@ doggo [2,3,] [timestamp] [0,] ---------------------------------------------------------------------- ### All Batches: -0 {uid: 0, details: {"upgradeFrom":"v1.12.0","upgradeTo":"v1.16.0"}, stats: {"totalNbTasks":1,"status":{"succeeded":1},"types":{"upgradeDatabase":1},"indexUids":{}}, stop reason: "stopped after the last task of type `upgradeDatabase` because they cannot be batched with tasks of any other type.", } +0 {uid: 0, details: {"upgradeFrom":"v1.12.0","upgradeTo":"v1.17.1"}, stats: {"totalNbTasks":1,"status":{"succeeded":1},"types":{"upgradeDatabase":1},"indexUids":{}}, stop reason: "stopped after the last task of type `upgradeDatabase` because they cannot be batched with tasks of any other type.", } ---------------------------------------------------------------------- ### Batch to tasks mapping: 0 [0,] diff --git a/crates/index-scheduler/src/scheduler/test_document_addition.rs b/crates/index-scheduler/src/scheduler/test_document_addition.rs index b642f5604..7ca72da95 100644 --- a/crates/index-scheduler/src/scheduler/test_document_addition.rs +++ b/crates/index-scheduler/src/scheduler/test_document_addition.rs @@ -736,7 +736,7 @@ fn test_document_addition_mixed_rights_with_index() { #[test] fn test_document_addition_mixed_right_without_index_starts_with_cant_create() { // We're going to autobatch multiple document additions. - // - The index does not exists + // - The index does not exist // - The first document addition doesn't have the right to create an index // - The second does. They should not batch together. // - The second should batch with everything else as it's going to create an index.
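A minimal sketch of the autobatching rule this test exercises, using a hypothetical `may_batch` helper (the real decision lives in the scheduler's autobatcher, not in a standalone function like this):

fn may_batch(index_exists: bool, prev_allows_creation: bool, next_allows_creation: bool) -> bool {
    // On an existing index, the index-creation right is irrelevant: additions batch freely.
    if index_exists {
        return true;
    }
    // On a missing index, additions only batch when they agree on the right to
    // create it; an addition without that right must fail on its own.
    prev_allows_creation == next_allows_creation
}

fn main() {
    // Mirrors the scenario above: index missing, first addition cannot create it, the second can.
    assert!(!may_batch(false, false, true));
    // The remaining additions, which all carry the creation right, batch together.
    assert!(may_batch(false, true, true));
}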
diff --git a/crates/index-scheduler/src/scheduler/test_embedders.rs b/crates/index-scheduler/src/scheduler/test_embedders.rs index a9b920bd2..791fed4d8 100644 --- a/crates/index-scheduler/src/scheduler/test_embedders.rs +++ b/crates/index-scheduler/src/scheduler/test_embedders.rs @@ -3,6 +3,7 @@ use std::collections::BTreeMap; use big_s::S; use insta::assert_json_snapshot; use meili_snap::{json_string, snapshot}; +use meilisearch_types::milli::index::EmbeddingsWithMetadata; use meilisearch_types::milli::update::Setting; use meilisearch_types::milli::vector::settings::EmbeddingSettings; use meilisearch_types::milli::vector::SearchQuery; @@ -220,8 +221,8 @@ fn import_vectors() { let embeddings = index.embeddings(&rtxn, 0).unwrap(); - assert_json_snapshot!(embeddings[&simple_hf_name].0[0] == lab_embed, @"true"); - assert_json_snapshot!(embeddings[&fakerest_name].0[0] == beagle_embed, @"true"); + assert_json_snapshot!(embeddings[&simple_hf_name].embeddings[0] == lab_embed, @"true"); + assert_json_snapshot!(embeddings[&fakerest_name].embeddings[0] == beagle_embed, @"true"); let doc = index.documents(&rtxn, std::iter::once(0)).unwrap()[0].1; let fields_ids_map = index.fields_ids_map(&rtxn).unwrap(); @@ -311,9 +312,9 @@ fn import_vectors() { let embeddings = index.embeddings(&rtxn, 0).unwrap(); // automatically changed to patou because set to regenerate - assert_json_snapshot!(embeddings[&simple_hf_name].0[0] == patou_embed, @"true"); + assert_json_snapshot!(embeddings[&simple_hf_name].embeddings[0] == patou_embed, @"true"); // remained beagle - assert_json_snapshot!(embeddings[&fakerest_name].0[0] == beagle_embed, @"true"); + assert_json_snapshot!(embeddings[&fakerest_name].embeddings[0] == beagle_embed, @"true"); let doc = index.documents(&rtxn, std::iter::once(0)).unwrap()[0].1; let fields_ids_map = index.fields_ids_map(&rtxn).unwrap(); @@ -497,13 +498,13 @@ fn import_vectors_first_and_embedder_later() { let docid = index.external_documents_ids.get(&rtxn, "0").unwrap().unwrap(); let embeddings = index.embeddings(&rtxn, docid).unwrap(); - let (embedding, _) = &embeddings["my_doggo_embedder"]; - assert!(!embedding.is_empty(), "{embedding:?}"); + let EmbeddingsWithMetadata { embeddings, .. } = &embeddings["my_doggo_embedder"]; + assert!(!embeddings.is_empty(), "{embeddings:?}"); // the document with the id 3 should keep its original embedding let docid = index.external_documents_ids.get(&rtxn, "3").unwrap().unwrap(); let embeddings = index.embeddings(&rtxn, docid).unwrap(); - let (embeddings, _) = &embeddings["my_doggo_embedder"]; + let EmbeddingsWithMetadata { embeddings, .. } = &embeddings["my_doggo_embedder"]; snapshot!(embeddings.len(), @"1"); assert!(embeddings[0].iter().all(|i| *i == 3.0), "{:?}", embeddings[0]); @@ -558,7 +559,7 @@ fn import_vectors_first_and_embedder_later() { "###); let embeddings = index.embeddings(&rtxn, docid).unwrap(); - let (embedding, _) = &embeddings["my_doggo_embedder"]; + let EmbeddingsWithMetadata { embeddings: embedding, .. } = &embeddings["my_doggo_embedder"]; assert!(!embedding.is_empty()); assert!(!embedding[0].iter().all(|i| *i == 3.0), "{:?}", embedding[0]); @@ -566,7 +567,7 @@ fn import_vectors_first_and_embedder_later() { // the document with the id 4 should generate an embedding let docid = index.external_documents_ids.get(&rtxn, "4").unwrap().unwrap(); let embeddings = index.embeddings(&rtxn, docid).unwrap(); - let (embedding, _) = &embeddings["my_doggo_embedder"]; + let EmbeddingsWithMetadata { embeddings: embedding, .. 
} = &embeddings["my_doggo_embedder"]; assert!(!embedding.is_empty()); } @@ -696,7 +697,7 @@ fn delete_document_containing_vector() { "###); let docid = index.external_documents_ids.get(&rtxn, "0").unwrap().unwrap(); let embeddings = index.embeddings(&rtxn, docid).unwrap(); - let (embedding, _) = &embeddings["manual"]; + let EmbeddingsWithMetadata { embeddings: embedding, .. } = &embeddings["manual"]; assert!(!embedding.is_empty(), "{embedding:?}"); index_scheduler diff --git a/crates/index-scheduler/src/test_utils.rs b/crates/index-scheduler/src/test_utils.rs index bfed7f53a..36de0ed9e 100644 --- a/crates/index-scheduler/src/test_utils.rs +++ b/crates/index-scheduler/src/test_utils.rs @@ -98,8 +98,8 @@ impl IndexScheduler { indexes_path: tempdir.path().join("indexes"), snapshots_path: tempdir.path().join("snapshots"), dumps_path: tempdir.path().join("dumps"), - webhook_url: None, - webhook_authorization_header: None, + cli_webhook_url: None, + cli_webhook_authorization: None, task_db_size: 1000 * 1000 * 10, // 10 MB, we don't use MiB on purpose. index_base_map_size: 1000 * 1000, // 1 MB, we don't use MiB on purpose. enable_mdb_writemap: false, diff --git a/crates/index-scheduler/src/upgrade/mod.rs b/crates/index-scheduler/src/upgrade/mod.rs index 2053caa92..a749b31d5 100644 --- a/crates/index-scheduler/src/upgrade/mod.rs +++ b/crates/index-scheduler/src/upgrade/mod.rs @@ -39,6 +39,7 @@ pub fn upgrade_index_scheduler( (1, 13, _) => 0, (1, 14, _) => 0, (1, 15, _) => 0, + (1, 16, _) => 0, (major, minor, patch) => { if major > current_major || (major == current_major && minor > current_minor) diff --git a/crates/meilisearch-auth/src/lib.rs b/crates/meilisearch-auth/src/lib.rs index 27d163192..6f5a5c2a2 100644 --- a/crates/meilisearch-auth/src/lib.rs +++ b/crates/meilisearch-auth/src/lib.rs @@ -158,7 +158,7 @@ impl AuthController { self.store.delete_all_keys() } - /// Delete all the keys in the DB. + /// Insert a key directly into the store. 
pub fn raw_insert_key(&mut self, key: Key) -> Result<()> { self.store.put_api_key(key)?; Ok(()) } @@ -351,6 +351,7 @@ pub struct IndexSearchRules { fn generate_default_keys(store: &HeedAuthStore) -> Result<()> { store.put_api_key(Key::default_chat())?; + store.put_api_key(Key::default_read_only_admin())?; store.put_api_key(Key::default_admin())?; store.put_api_key(Key::default_search())?; diff --git a/crates/meilisearch-auth/src/store.rs b/crates/meilisearch-auth/src/store.rs index bae27afe4..470379e06 100644 --- a/crates/meilisearch-auth/src/store.rs +++ b/crates/meilisearch-auth/src/store.rs @@ -88,7 +88,13 @@ impl HeedAuthStore { let mut actions = HashSet::new(); for action in &key.actions { match action { - Action::All => actions.extend(enum_iterator::all::<Action>()), + Action::All => { + actions.extend(enum_iterator::all::<Action>()); + actions.remove(&Action::AllGet); + } + Action::AllGet => { + actions.extend(enum_iterator::all::<Action>().filter(|a| a.is_read())) + } Action::DocumentsAll => { actions.extend( [Action::DocumentsGet, Action::DocumentsDelete, Action::DocumentsAdd] @@ -131,6 +137,14 @@ impl HeedAuthStore { Action::ChatsSettingsAll => { actions.extend([Action::ChatsSettingsGet, Action::ChatsSettingsUpdate]); } + Action::WebhooksAll => { + actions.extend([ + Action::WebhooksGet, + Action::WebhooksUpdate, + Action::WebhooksDelete, + Action::WebhooksCreate, + ]); + } other => { actions.insert(*other); } diff --git a/crates/meilisearch-types/Cargo.toml b/crates/meilisearch-types/Cargo.toml index faf59643f..f3279a094 100644 --- a/crates/meilisearch-types/Cargo.toml +++ b/crates/meilisearch-types/Cargo.toml @@ -24,7 +24,7 @@ enum-iterator = "2.1.0" file-store = { path = "../file-store" } flate2 = "1.1.2" fst = "0.4.7" -memmap2 = "0.9.5" +memmap2 = "0.9.7" milli = { path = "../milli" } roaring = { version = "0.10.12", features = ["serde"] } rustc-hash = "2.1.1" diff --git a/crates/meilisearch-types/src/error.rs b/crates/meilisearch-types/src/error.rs index d8d2f628a..1cb09eee3 100644 --- a/crates/meilisearch-types/src/error.rs +++ b/crates/meilisearch-types/src/error.rs @@ -237,6 +237,7 @@ InvalidDocumentRetrieveVectors , InvalidRequest , BAD_REQU MissingDocumentFilter , InvalidRequest , BAD_REQUEST ; MissingDocumentEditionFunction , InvalidRequest , BAD_REQUEST ; InvalidDocumentFilter , InvalidRequest , BAD_REQUEST ; +InvalidDocumentSort , InvalidRequest , BAD_REQUEST ; InvalidDocumentGeoField , InvalidRequest , BAD_REQUEST ; InvalidVectorDimensions , InvalidRequest , BAD_REQUEST ; InvalidVectorsType , InvalidRequest , BAD_REQUEST ; @@ -415,8 +416,18 @@ InvalidChatCompletionPrompts , InvalidRequest , BAD_REQU InvalidChatCompletionSystemPrompt , InvalidRequest , BAD_REQUEST ; InvalidChatCompletionSearchDescriptionPrompt , InvalidRequest , BAD_REQUEST ; InvalidChatCompletionSearchQueryParamPrompt , InvalidRequest , BAD_REQUEST ; +InvalidChatCompletionSearchFilterParamPrompt , InvalidRequest , BAD_REQUEST ; InvalidChatCompletionSearchIndexUidParamPrompt , InvalidRequest , BAD_REQUEST ; -InvalidChatCompletionPreQueryPrompt , InvalidRequest , BAD_REQUEST +InvalidChatCompletionPreQueryPrompt , InvalidRequest , BAD_REQUEST ; +// Webhooks +InvalidWebhooks , InvalidRequest , BAD_REQUEST ; +InvalidWebhookUrl , InvalidRequest , BAD_REQUEST ; +InvalidWebhookHeaders , InvalidRequest , BAD_REQUEST ; +ImmutableWebhook , InvalidRequest , BAD_REQUEST ; +InvalidWebhookUuid , InvalidRequest , BAD_REQUEST ; +WebhookNotFound , InvalidRequest , NOT_FOUND ; +ImmutableWebhookUuid , InvalidRequest , BAD_REQUEST ;
+ImmutableWebhookIsEditable , InvalidRequest , BAD_REQUEST } impl ErrorCode for JoinError { @@ -476,7 +487,8 @@ impl ErrorCode for milli::Error { UserError::InvalidDistinctAttribute { .. } => Code::InvalidSearchDistinct, UserError::SortRankingRuleMissing => Code::InvalidSearchSort, UserError::InvalidFacetsDistribution { .. } => Code::InvalidSearchFacets, - UserError::InvalidSortableAttribute { .. } => Code::InvalidSearchSort, + UserError::InvalidSearchSortableAttribute { .. } => Code::InvalidSearchSort, + UserError::InvalidDocumentSortableAttribute { .. } => Code::InvalidDocumentSort, UserError::InvalidSearchableAttribute { .. } => { Code::InvalidSearchAttributesToSearchOn } @@ -494,7 +506,8 @@ impl ErrorCode for milli::Error { UserError::InvalidVectorsMapType { .. } | UserError::InvalidVectorsEmbedderConf { .. } => Code::InvalidVectorsType, UserError::TooManyVectors(_, _) => Code::TooManyVectors, - UserError::SortError(_) => Code::InvalidSearchSort, + UserError::SortError { search: true, .. } => Code::InvalidSearchSort, + UserError::SortError { search: false, .. } => Code::InvalidDocumentSort, UserError::InvalidMinTypoWordLenSetting(_, _) => { Code::InvalidSettingsTypoTolerance } diff --git a/crates/meilisearch-types/src/features.rs b/crates/meilisearch-types/src/features.rs index 3c78035e8..ddffb107c 100644 --- a/crates/meilisearch-types/src/features.rs +++ b/crates/meilisearch-types/src/features.rs @@ -4,10 +4,11 @@ use serde::{Deserialize, Serialize}; use crate::error::{Code, ResponseError}; -pub const DEFAULT_CHAT_SYSTEM_PROMPT: &str = "You are a highly capable research assistant with access to powerful search tools. IMPORTANT INSTRUCTIONS:1. When answering questions, you MUST make multiple tool calls (at least 2-3) to gather comprehensive information.2. Use different search queries for each tool call - vary keywords, rephrase questions, and explore different semantic angles to ensure broad coverage.3. Always explicitly announce BEFORE making each tool call by saying: \"I'll search for [specific information] now.\"4. Combine information from ALL tool calls to provide complete, nuanced answers rather than relying on a single source.5. For complex topics, break down your research into multiple targeted queries rather than using a single generic search."; +pub const DEFAULT_CHAT_SYSTEM_PROMPT: &str = "You are a highly capable research assistant with access to powerful search tools. IMPORTANT INSTRUCTIONS:1. When answering questions, you MUST make multiple tool calls (at least 2-3) to gather comprehensive information.2. Use different search queries for each tool call - vary keywords, rephrase questions, and explore different semantic angles to ensure broad coverage.3. Always explicitly announce BEFORE making each tool call by saying: \"I'll search for [specific information] now.\"4. Combine information from ALL tool calls to provide complete, nuanced answers rather than relying on a single source.5. For complex topics, break down your research into multiple targeted queries rather than using a single generic search. Meilisearch doesn't use the colon (:) syntax to filter but rather the equal (=) one. Separate filters from query and keep the q parameter empty if needed. Same for the filter parameter: keep it empty if need be. If you need to find documents that CONTAIN keywords, simply put the keywords in the q parameter; do not use a filter for this purpose. Whenever you get an error, read the error message and fix your error. 
"; pub const DEFAULT_CHAT_SEARCH_DESCRIPTION_PROMPT: &str = - "Search the database for relevant JSON documents using an optional query."; + "Query: 'best story about Rust before 2018' with year: 2018, 2020, 2021\nlabel: analysis, golang, javascript\ntype: story, link\nvote: 300, 298, 278\n: {\"q\": \"\", \"filter\": \"category = Rust AND type = story AND year < 2018 AND vote > 100\"}\nQuery: 'A black or green car that can go fast with red brakes' with maxspeed_kmh: 200, 150, 130\ncolor: black, grey, red, green\nbrand: Toyota, Renault, Jeep, Ferrari\n: {\"q\": \"red brakes\", \"filter\": \"maxspeed_kmh > 150 AND color IN ['black', green]\"}\nQuery: 'Superman movie released in 2018 or after' with year: 2018, 2020, 2021\ngenres: Drama, Comedy, Adventure, Fiction\n: {\"q\":\"Superman\",\"filter\":\"genres IN [Adventure, Fiction] AND year >= 2018\"}"; pub const DEFAULT_CHAT_SEARCH_Q_PARAM_PROMPT: &str = "The search query string used to find relevant documents in the index. This should contain keywords or phrases that best represent what the user is looking for. More specific queries will yield more precise results."; +pub const DEFAULT_CHAT_SEARCH_FILTER_PARAM_PROMPT: &str = "The search filter string used to find relevant documents in the index. It supports parentheses, `=`, `!=`, `>=`, `>`, `<=`, `<`, `IN`, `NOT IN`, `TO`, `EXISTS`, `NOT EXISTS`, `IS NULL`, `IS NOT NULL`, `IS EMPTY`, `IS NOT EMPTY`, `_geoRadius`, or `_geoBoundingBox`. Here is an example: \"price > 100 AND category = 'electronics'\". The following is a list of fields that can be filtered on: "; pub const DEFAULT_CHAT_SEARCH_INDEX_UID_PARAM_PROMPT: &str = "The name of the index to search within. An index is a collection of documents organized for search. Selecting the right index ensures the most relevant results for the user query."; #[derive(Serialize, Deserialize, Debug, Clone, Copy, Default, PartialEq, Eq)] @@ -161,18 +162,31 @@ impl ChatCompletionSource { #[derive(Serialize, Deserialize, Debug, Clone, PartialEq, Eq)] #[serde(rename_all = "camelCase")] pub struct ChatCompletionPrompts { + #[serde(default)] pub system: String, + #[serde(default)] pub search_description: String, + #[serde(default)] pub search_q_param: String, + #[serde(default = "default_search_filter_param")] + pub search_filter_param: String, + #[serde(default)] pub search_index_uid_param: String, } +/// This function is used for when the search_filter_param is +/// not provided and this can happen when the database is in v1.15. +fn default_search_filter_param() -> String { + DEFAULT_CHAT_SEARCH_FILTER_PARAM_PROMPT.to_string() +} + impl Default for ChatCompletionPrompts { fn default() -> Self { Self { system: DEFAULT_CHAT_SYSTEM_PROMPT.to_string(), search_description: DEFAULT_CHAT_SEARCH_DESCRIPTION_PROMPT.to_string(), search_q_param: DEFAULT_CHAT_SEARCH_Q_PARAM_PROMPT.to_string(), + search_filter_param: DEFAULT_CHAT_SEARCH_FILTER_PARAM_PROMPT.to_string(), search_index_uid_param: DEFAULT_CHAT_SEARCH_INDEX_UID_PARAM_PROMPT.to_string(), } } diff --git a/crates/meilisearch-types/src/keys.rs b/crates/meilisearch-types/src/keys.rs index 3ba31c2cb..06f621e70 100644 --- a/crates/meilisearch-types/src/keys.rs +++ b/crates/meilisearch-types/src/keys.rs @@ -144,6 +144,21 @@ impl Key { } } + pub fn default_read_only_admin() -> Self { + let now = OffsetDateTime::now_utc(); + let uid = Uuid::new_v4(); + Self { + name: Some("Default Read-Only Admin API Key".to_string()), + description: Some("Use it to read information across the whole database. Caution! 
Do not expose this key on a public frontend".to_string()), + uid, + actions: vec![Action::AllGet, Action::KeysGet], + indexes: vec![IndexUidPattern::all()], + expires_at: None, + created_at: now, + updated_at: now, + } + } + pub fn default_search() -> Self { let now = OffsetDateTime::now_utc(); let uid = Uuid::new_v4(); @@ -347,6 +362,24 @@ pub enum Action { #[serde(rename = "chatsSettings.update")] #[deserr(rename = "chatsSettings.update")] ChatsSettingsUpdate, + #[serde(rename = "*.get")] + #[deserr(rename = "*.get")] + AllGet, + #[serde(rename = "webhooks.get")] + #[deserr(rename = "webhooks.get")] + WebhooksGet, + #[serde(rename = "webhooks.update")] + #[deserr(rename = "webhooks.update")] + WebhooksUpdate, + #[serde(rename = "webhooks.delete")] + #[deserr(rename = "webhooks.delete")] + WebhooksDelete, + #[serde(rename = "webhooks.create")] + #[deserr(rename = "webhooks.create")] + WebhooksCreate, + #[serde(rename = "webhooks.*")] + #[deserr(rename = "webhooks.*")] + WebhooksAll, } impl Action { @@ -385,6 +418,7 @@ impl Action { METRICS_GET => Some(Self::MetricsGet), DUMPS_ALL => Some(Self::DumpsAll), DUMPS_CREATE => Some(Self::DumpsCreate), + SNAPSHOTS_ALL => Some(Self::SnapshotsAll), SNAPSHOTS_CREATE => Some(Self::SnapshotsCreate), VERSION => Some(Self::Version), KEYS_CREATE => Some(Self::KeysAdd), @@ -393,12 +427,71 @@ KEYS_DELETE => Some(Self::KeysDelete), EXPERIMENTAL_FEATURES_GET => Some(Self::ExperimentalFeaturesGet), EXPERIMENTAL_FEATURES_UPDATE => Some(Self::ExperimentalFeaturesUpdate), + EXPORT => Some(Self::Export), NETWORK_GET => Some(Self::NetworkGet), NETWORK_UPDATE => Some(Self::NetworkUpdate), + ALL_GET => Some(Self::AllGet), + WEBHOOKS_GET => Some(Self::WebhooksGet), + WEBHOOKS_UPDATE => Some(Self::WebhooksUpdate), + WEBHOOKS_DELETE => Some(Self::WebhooksDelete), + WEBHOOKS_CREATE => Some(Self::WebhooksCreate), + WEBHOOKS_ALL => Some(Self::WebhooksAll), _otherwise => None, } } + /// Whether the action should be included in [Action::AllGet]. + pub fn is_read(&self) -> bool { + use Action::*; + + // An exhaustive match is used to force a decision here whenever a new action is added. + match self { + // Any action that expands to others must return false, as it wouldn't be able to expand recursively.
+ All | AllGet | DocumentsAll | IndexesAll | ChatsAll | TasksAll | SettingsAll + | StatsAll | MetricsAll | DumpsAll | SnapshotsAll | ChatsSettingsAll | WebhooksAll => { + false + } + + Search => true, + DocumentsAdd => false, + DocumentsGet => true, + DocumentsDelete => false, + Export => true, + IndexesAdd => false, + IndexesGet => true, + IndexesUpdate => false, + IndexesDelete => false, + IndexesSwap => false, + TasksCancel => false, + TasksDelete => false, + TasksGet => true, + SettingsGet => true, + SettingsUpdate => false, + StatsGet => true, + MetricsGet => true, + DumpsCreate => false, + SnapshotsCreate => false, + Version => true, + KeysAdd => false, + KeysGet => false, // Disabled in order to prevent privilege escalation + KeysUpdate => false, + KeysDelete => false, + ExperimentalFeaturesGet => true, + ExperimentalFeaturesUpdate => false, + NetworkGet => true, + NetworkUpdate => false, + ChatCompletions => false, // Disabled because it might trigger generation of new chats + ChatsGet => true, + ChatsDelete => false, + ChatsSettingsGet => true, + ChatsSettingsUpdate => false, + WebhooksGet => true, + WebhooksUpdate => false, + WebhooksDelete => false, + WebhooksCreate => false, + } + } + pub const fn repr(&self) -> u8 { *self as u8 } @@ -408,6 +501,7 @@ pub mod actions { use super::Action::*; pub(crate) const ALL: u8 = All.repr(); + pub const ALL_GET: u8 = AllGet.repr(); pub const SEARCH: u8 = Search.repr(); pub const DOCUMENTS_ALL: u8 = DocumentsAll.repr(); pub const DOCUMENTS_ADD: u8 = DocumentsAdd.repr(); @@ -432,6 +526,7 @@ pub mod actions { pub const METRICS_GET: u8 = MetricsGet.repr(); pub const DUMPS_ALL: u8 = DumpsAll.repr(); pub const DUMPS_CREATE: u8 = DumpsCreate.repr(); + pub const SNAPSHOTS_ALL: u8 = SnapshotsAll.repr(); pub const SNAPSHOTS_CREATE: u8 = SnapshotsCreate.repr(); pub const VERSION: u8 = Version.repr(); pub const KEYS_CREATE: u8 = KeysAdd.repr(); @@ -453,4 +548,80 @@ pub mod actions { pub const CHATS_SETTINGS_ALL: u8 = ChatsSettingsAll.repr(); pub const CHATS_SETTINGS_GET: u8 = ChatsSettingsGet.repr(); pub const CHATS_SETTINGS_UPDATE: u8 = ChatsSettingsUpdate.repr(); + + pub const WEBHOOKS_GET: u8 = WebhooksGet.repr(); + pub const WEBHOOKS_UPDATE: u8 = WebhooksUpdate.repr(); + pub const WEBHOOKS_DELETE: u8 = WebhooksDelete.repr(); + pub const WEBHOOKS_CREATE: u8 = WebhooksCreate.repr(); + pub const WEBHOOKS_ALL: u8 = WebhooksAll.repr(); +} + +#[cfg(test)] +pub(crate) mod test { + use super::actions::*; + use super::Action::*; + use super::*; + + #[test] + fn test_action_repr_and_constants() { + assert!(All.repr() == 0 && ALL == 0); + assert!(Search.repr() == 1 && SEARCH == 1); + assert!(DocumentsAll.repr() == 2 && DOCUMENTS_ALL == 2); + assert!(DocumentsAdd.repr() == 3 && DOCUMENTS_ADD == 3); + assert!(DocumentsGet.repr() == 4 && DOCUMENTS_GET == 4); + assert!(DocumentsDelete.repr() == 5 && DOCUMENTS_DELETE == 5); + assert!(IndexesAll.repr() == 6 && INDEXES_ALL == 6); + assert!(IndexesAdd.repr() == 7 && INDEXES_CREATE == 7); + assert!(IndexesGet.repr() == 8 && INDEXES_GET == 8); + assert!(IndexesUpdate.repr() == 9 && INDEXES_UPDATE == 9); + assert!(IndexesDelete.repr() == 10 && INDEXES_DELETE == 10); + assert!(IndexesSwap.repr() == 11 && INDEXES_SWAP == 11); + assert!(TasksAll.repr() == 12 && TASKS_ALL == 12); + assert!(TasksCancel.repr() == 13 && TASKS_CANCEL == 13); + assert!(TasksDelete.repr() == 14 && TASKS_DELETE == 14); + assert!(TasksGet.repr() == 15 && TASKS_GET == 15); + assert!(SettingsAll.repr() == 16 && SETTINGS_ALL == 16); + 
assert!(SettingsGet.repr() == 17 && SETTINGS_GET == 17); + assert!(SettingsUpdate.repr() == 18 && SETTINGS_UPDATE == 18); + assert!(StatsAll.repr() == 19 && STATS_ALL == 19); + assert!(StatsGet.repr() == 20 && STATS_GET == 20); + assert!(MetricsAll.repr() == 21 && METRICS_ALL == 21); + assert!(MetricsGet.repr() == 22 && METRICS_GET == 22); + assert!(DumpsAll.repr() == 23 && DUMPS_ALL == 23); + assert!(DumpsCreate.repr() == 24 && DUMPS_CREATE == 24); + assert!(SnapshotsAll.repr() == 25 && SNAPSHOTS_ALL == 25); + assert!(SnapshotsCreate.repr() == 26 && SNAPSHOTS_CREATE == 26); + assert!(Version.repr() == 27 && VERSION == 27); + assert!(KeysAdd.repr() == 28 && KEYS_CREATE == 28); + assert!(KeysGet.repr() == 29 && KEYS_GET == 29); + assert!(KeysUpdate.repr() == 30 && KEYS_UPDATE == 30); + assert!(KeysDelete.repr() == 31 && KEYS_DELETE == 31); + assert!(ExperimentalFeaturesGet.repr() == 32 && EXPERIMENTAL_FEATURES_GET == 32); + assert!(ExperimentalFeaturesUpdate.repr() == 33 && EXPERIMENTAL_FEATURES_UPDATE == 33); + assert!(Export.repr() == 34 && EXPORT == 34); + assert!(NetworkGet.repr() == 35 && NETWORK_GET == 35); + assert!(NetworkUpdate.repr() == 36 && NETWORK_UPDATE == 36); + assert!(ChatCompletions.repr() == 37 && CHAT_COMPLETIONS == 37); + assert!(ChatsAll.repr() == 38 && CHATS_ALL == 38); + assert!(ChatsGet.repr() == 39 && CHATS_GET == 39); + assert!(ChatsDelete.repr() == 40 && CHATS_DELETE == 40); + assert!(ChatsSettingsAll.repr() == 41 && CHATS_SETTINGS_ALL == 41); + assert!(ChatsSettingsGet.repr() == 42 && CHATS_SETTINGS_GET == 42); + assert!(ChatsSettingsUpdate.repr() == 43 && CHATS_SETTINGS_UPDATE == 43); + assert!(AllGet.repr() == 44 && ALL_GET == 44); + assert!(WebhooksGet.repr() == 45 && WEBHOOKS_GET == 45); + assert!(WebhooksUpdate.repr() == 46 && WEBHOOKS_UPDATE == 46); + assert!(WebhooksDelete.repr() == 47 && WEBHOOKS_DELETE == 47); + assert!(WebhooksCreate.repr() == 48 && WEBHOOKS_CREATE == 48); + assert!(WebhooksAll.repr() == 49 && WEBHOOKS_ALL == 49); + } + + #[test] + fn test_from_repr() { + for action in enum_iterator::all::<Action>() { + let repr = action.repr(); + let action_from_repr = Action::from_repr(repr); + assert_eq!(Some(action), action_from_repr, "Failed for action: {:?}", action); + } + } } diff --git a/crates/meilisearch-types/src/lib.rs b/crates/meilisearch-types/src/lib.rs index fe69da526..9857bfb29 100644 --- a/crates/meilisearch-types/src/lib.rs +++ b/crates/meilisearch-types/src/lib.rs @@ -15,6 +15,7 @@ pub mod star_or; pub mod task_view; pub mod tasks; pub mod versioning; +pub mod webhooks; pub use milli::{heed, Index}; use uuid::Uuid; pub use versioning::VERSION_FILE_NAME; diff --git a/crates/meilisearch-types/src/webhooks.rs b/crates/meilisearch-types/src/webhooks.rs new file mode 100644 index 000000000..7a35850ab --- /dev/null +++ b/crates/meilisearch-types/src/webhooks.rs @@ -0,0 +1,28 @@ +use std::collections::BTreeMap; + +use serde::{Deserialize, Serialize}; +use uuid::Uuid; + +#[derive(Debug, Serialize, Deserialize, Clone, PartialEq)] +#[serde(rename_all = "camelCase")] +pub struct Webhook { + pub url: String, + #[serde(default)] + pub headers: BTreeMap<String, String>, +} + +#[derive(Debug, Serialize, Default, Clone, PartialEq)] +#[serde(rename_all = "camelCase")] +pub struct WebhooksView { + #[serde(default)] + pub webhooks: BTreeMap<Uuid, Webhook>, +} + +// Same as WebhooksView, except it should never contain the CLI webhooks.
+// It's the right structure to use in the dump. +#[derive(Debug, Deserialize, Serialize, Default, Clone, PartialEq)] +#[serde(rename_all = "camelCase")] +pub struct WebhooksDumpView { + #[serde(default)] + pub webhooks: BTreeMap<Uuid, Webhook>, +} diff --git a/crates/meilisearch/Cargo.toml b/crates/meilisearch/Cargo.toml index 83eb439d9..5cbbb6666 100644 --- a/crates/meilisearch/Cargo.toml +++ b/crates/meilisearch/Cargo.toml @@ -50,6 +50,7 @@ jsonwebtoken = "9.3.1" lazy_static = "1.5.0" meilisearch-auth = { path = "../meilisearch-auth" } meilisearch-types = { path = "../meilisearch-types" } +memmap2 = "0.9.7" mimalloc = { version = "0.1.47", default-features = false } mime = "0.3.17" num_cpus = "1.17.0" @@ -169,5 +170,5 @@ german = ["meilisearch-types/german"] turkish = ["meilisearch-types/turkish"] [package.metadata.mini-dashboard] -assets-url = "https://github.com/meilisearch/mini-dashboard/releases/download/v0.2.20/build.zip" -sha1 = "82a7ddd7bf14bb5323c3d235d2b62892a98b6a59" +assets-url = "https://github.com/meilisearch/mini-dashboard/releases/download/v0.2.22/build.zip" +sha1 = "b70b2036b5f167da9ea0b637da8b320c7ea88254" diff --git a/crates/meilisearch/src/analytics/mock_analytics.rs b/crates/meilisearch/src/analytics/mock_analytics.rs index 54b8d4f1b..062240018 100644 --- a/crates/meilisearch/src/analytics/mock_analytics.rs +++ b/crates/meilisearch/src/analytics/mock_analytics.rs @@ -104,6 +104,4 @@ impl Analytics for MockAnalytics { _request: &HttpRequest, ) { } - fn get_fetch_documents(&self, _documents_query: &DocumentFetchKind, _request: &HttpRequest) {} - fn post_fetch_documents(&self, _documents_query: &DocumentFetchKind, _request: &HttpRequest) {} } diff --git a/crates/meilisearch/src/analytics/mod.rs b/crates/meilisearch/src/analytics/mod.rs index bd14b0bfa..0d1a860e1 100644 --- a/crates/meilisearch/src/analytics/mod.rs +++ b/crates/meilisearch/src/analytics/mod.rs @@ -73,12 +73,6 @@ pub enum DocumentDeletionKind { PerFilter, } -#[derive(Copy, Clone, Debug, PartialEq, Eq)] -pub enum DocumentFetchKind { - PerDocumentId { retrieve_vectors: bool }, - Normal { with_filter: bool, limit: usize, offset: usize, retrieve_vectors: bool }, -} - /// To send an event to segment, your event must be able to aggregate itself with another event of the same type. pub trait Aggregate: 'static + mopa::Any + Send { /// The name of the event that will be sent to segment.
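A quick usage sketch for the new webhook types above (illustrative only; it assumes the `meilisearch_types::webhooks` module compiles exactly as shown):

use std::collections::BTreeMap;

use meilisearch_types::webhooks::{Webhook, WebhooksView};
use uuid::Uuid;

fn example_webhooks_view() -> WebhooksView {
    let mut headers = BTreeMap::new();
    headers.insert("Authorization".to_string(), "Bearer my-token".to_string());
    let webhook = Webhook { url: "https://example.com/hook".to_string(), headers };

    let mut webhooks = BTreeMap::new();
    webhooks.insert(Uuid::new_v4(), webhook);
    // Thanks to the camelCase rename, this serializes as
    // {"webhooks": {"<uuid>": {"url": "...", "headers": {...}}}}.
    WebhooksView { webhooks }
}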
diff --git a/crates/meilisearch/src/analytics/segment_analytics.rs b/crates/meilisearch/src/analytics/segment_analytics.rs index 0abc5c817..a2a0f0c05 100644 --- a/crates/meilisearch/src/analytics/segment_analytics.rs +++ b/crates/meilisearch/src/analytics/segment_analytics.rs @@ -203,6 +203,7 @@ struct Infos { experimental_composite_embedders: bool, experimental_embedding_cache_entries: usize, experimental_no_snapshot_compaction: bool, + experimental_no_edition_2024_for_dumps: bool, experimental_no_edition_2024_for_settings: bool, gpu_enabled: bool, db_path: bool, @@ -293,6 +294,7 @@ impl Infos { max_indexing_threads, skip_index_budget: _, experimental_no_edition_2024_for_settings, + experimental_no_edition_2024_for_dumps, } = indexer_options; let RuntimeTogglableFeatures { @@ -329,6 +331,7 @@ impl Infos { experimental_composite_embedders: composite_embedders, experimental_embedding_cache_entries, experimental_no_snapshot_compaction, + experimental_no_edition_2024_for_dumps, gpu_enabled: meilisearch_types::milli::vector::is_cuda_enabled(), db_path: db_path != PathBuf::from("./data.ms"), import_dump: import_dump.is_some(), diff --git a/crates/meilisearch/src/error.rs b/crates/meilisearch/src/error.rs index 91c6c23fa..8d4430f07 100644 --- a/crates/meilisearch/src/error.rs +++ b/crates/meilisearch/src/error.rs @@ -49,7 +49,7 @@ pub enum MeilisearchHttpError { TooManySearchRequests(usize), #[error("Internal error: Search limiter is down.")] SearchLimiterIsDown, - #[error("The provided payload reached the size limit. The maximum accepted payload size is {}.", Byte::from_u64(*.0 as u64).get_appropriate_unit(UnitType::Binary))] + #[error("The provided payload reached the size limit. The maximum accepted payload size is {}.", Byte::from_u64(*.0 as u64).get_appropriate_unit(if *.0 % 1024 == 0 { UnitType::Binary } else { UnitType::Decimal }))] PayloadTooLarge(usize), #[error("Two indexes must be given for each swap. 
The list `[{}]` contains {} indexes.", .0.iter().map(|uid| format!("\"{uid}\"")).collect::<Vec<_>>().join(", "), .0.len() diff --git a/crates/meilisearch/src/lib.rs b/crates/meilisearch/src/lib.rs index 43d7afe0e..ca9bb6f50 100644 --- a/crates/meilisearch/src/lib.rs +++ b/crates/meilisearch/src/lib.rs @@ -30,6 +30,7 @@ use actix_web::web::Data; use actix_web::{web, HttpRequest}; use analytics::Analytics; use anyhow::bail; +use bumpalo::Bump; use error::PayloadError; use extractors::payload::PayloadConfig; use index_scheduler::versioning::Versioning; @@ -38,6 +39,7 @@ use meilisearch_auth::{open_auth_store_env, AuthController}; use meilisearch_types::milli::constants::VERSION_MAJOR; use meilisearch_types::milli::documents::{DocumentsBatchBuilder, DocumentsBatchReader}; use meilisearch_types::milli::progress::{EmbedderStats, Progress}; +use meilisearch_types::milli::update::new::indexer; use meilisearch_types::milli::update::{ default_thread_pool_and_threads, IndexDocumentsConfig, IndexDocumentsMethod, IndexerConfig, }; @@ -221,8 +223,8 @@ pub fn setup_meilisearch(opt: &Opt) -> anyhow::Result<(Arc<IndexScheduler>, Arc< indexes_path: opt.db_path.join("indexes"), snapshots_path: opt.snapshot_dir.clone(), dumps_path: opt.dump_dir.clone(), - webhook_url: opt.task_webhook_url.as_ref().map(|url| url.to_string()), - webhook_authorization_header: opt.task_webhook_authorization_header.clone(), + cli_webhook_url: opt.task_webhook_url.as_ref().map(|url| url.to_string()), + cli_webhook_authorization: opt.task_webhook_authorization_header.clone(), task_db_size: opt.max_task_db_size.as_u64() as usize, index_base_map_size: opt.max_index_size.as_u64() as usize, enable_mdb_writemap: opt.experimental_reduce_indexing_memory_usage, @@ -489,7 +491,12 @@ fn import_dump( let _ = std::fs::write(db_path.join("instance-uid"), instance_uid.to_string().as_bytes()); }; - // 2. Import the `Key`s. + // 2. Import the webhooks + if let Some(webhooks) = dump_reader.webhooks() { + index_scheduler.update_runtime_webhooks(webhooks.webhooks.clone())?; + } + + // 3. Import the `Key`s. let mut keys = Vec::new(); auth.raw_delete_all_keys()?; for key in dump_reader.keys()? { @@ -498,20 +505,20 @@ fn import_dump( keys.push(key); } - // 3. Import the `ChatCompletionSettings`s. + // 4. Import the `ChatCompletionSettings`s. for result in dump_reader.chat_completions_settings()? { let (name, settings) = result?; index_scheduler.put_chat_settings(&name, &settings)?; } - // 4. Import the runtime features and network + // 5. Import the runtime features and network let features = dump_reader.features()?.unwrap_or_default(); index_scheduler.put_runtime_features(features)?; let network = dump_reader.network()?.cloned().unwrap_or_default(); index_scheduler.put_network(network)?; - // 4.1 Use all cpus to process dump if `max_indexing_threads` not configured + // 5.1 Use all cpus to process dump if `max_indexing_threads` not configured let backup_config; let base_config = index_scheduler.indexer_config(); @@ -528,12 +535,12 @@ fn import_dump( // /!\ The tasks must be imported AFTER importing the indexes or else the scheduler might // try to process tasks while we're trying to import the indexes. - // 5. Import the indexes. + // 6. Import the indexes. for index_reader in dump_reader.indexes()?
{ let mut index_reader = index_reader?; let metadata = index_reader.metadata(); let uid = metadata.uid.clone(); - tracing::info!("Importing index `{}`.", metadata.uid); + tracing::info!("Importing index `{uid}`."); let date = Some((metadata.created_at, metadata.updated_at)); let index = index_scheduler.create_raw_index(&metadata.uid, date)?; @@ -541,71 +548,123 @@ let mut wtxn = index.write_txn()?; let mut builder = milli::update::Settings::new(&mut wtxn, &index, indexer_config); - // 5.1 Import the primary key if there is one. + // 6.1 Import the primary key if there is one. if let Some(ref primary_key) = metadata.primary_key { builder.set_primary_key(primary_key.to_string()); } - // 5.2 Import the settings. + // 6.2 Import the settings. tracing::info!("Importing the settings."); let settings = index_reader.settings()?; apply_settings_to_builder(&settings, &mut builder); let embedder_stats: Arc<EmbedderStats> = Default::default(); builder.execute(&|| false, &progress, embedder_stats.clone())?; + wtxn.commit()?; - // 5.3 Import the documents. - // 5.3.1 We need to recreate the grenad+obkv format accepted by the index. - tracing::info!("Importing the documents."); - let file = tempfile::tempfile()?; - let mut builder = DocumentsBatchBuilder::new(BufWriter::new(file)); - for document in index_reader.documents()? { - builder.append_json_object(&document?)?; + let mut wtxn = index.write_txn()?; + let rtxn = index.read_txn()?; + + if index_scheduler.no_edition_2024_for_dumps() { + // 6.3 Import the documents. + // 6.3.1 We need to recreate the grenad+obkv format accepted by the index. + tracing::info!("Importing the documents."); + let file = tempfile::tempfile()?; + let mut builder = DocumentsBatchBuilder::new(BufWriter::new(file)); + for document in index_reader.documents()? { + builder.append_json_object(&document?)?; + } + + // This flushes the content of the batch builder. + let file = builder.into_inner()?.into_inner()?; + + // 6.3.2 We feed it to the milli index. + let reader = BufReader::new(file); + let reader = DocumentsBatchReader::from_reader(reader)?; + + let embedder_configs = index.embedding_configs().embedding_configs(&wtxn)?; + let embedders = index_scheduler.embedders(uid.to_string(), embedder_configs)?; + + let builder = milli::update::IndexDocuments::new( + &mut wtxn, + &index, + indexer_config, + IndexDocumentsConfig { + update_method: IndexDocumentsMethod::ReplaceDocuments, + ..Default::default() + }, + |indexing_step| tracing::trace!("update: {:?}", indexing_step), + || false, + &embedder_stats, + )?; + + let builder = builder.with_embedders(embedders); + + let (builder, user_result) = builder.add_documents(reader)?; + let user_result = user_result?; + tracing::info!(documents_found = user_result, "{} documents found.", user_result); + builder.execute()?; + } else { + let db_fields_ids_map = index.fields_ids_map(&rtxn)?; + let primary_key = index.primary_key(&rtxn)?; + let mut new_fields_ids_map = db_fields_ids_map.clone(); + + let mut indexer = indexer::DocumentOperation::new(); + let embedders = index.embedding_configs().embedding_configs(&rtxn)?; + let embedders = index_scheduler.embedders(uid.clone(), embedders)?; + + let mmap = unsafe { memmap2::Mmap::map(index_reader.documents_file())?
}; + + indexer.replace_documents(&mmap)?; + + let indexer_config = index_scheduler.indexer_config(); + let pool = &indexer_config.thread_pool; + + let indexer_alloc = Bump::new(); + let (document_changes, mut operation_stats, primary_key) = indexer.into_changes( + &indexer_alloc, + &index, + &rtxn, + primary_key, + &mut new_fields_ids_map, + &|| false, // never stop processing a dump + progress.clone(), + )?; + + let operation_stats = operation_stats.pop().unwrap(); + if let Some(error) = operation_stats.error { + return Err(error.into()); + } + + let _congestion = indexer::index( + &mut wtxn, + &index, + pool, + indexer_config.grenad_parameters(), + &db_fields_ids_map, + new_fields_ids_map, + primary_key, + &document_changes, + embedders, + &|| false, // never stop processing a dump + &progress, + &embedder_stats, + )?; } - // This flush the content of the batch builder. - let file = builder.into_inner()?.into_inner()?; - - // 5.3.2 We feed it to the milli index. - let reader = BufReader::new(file); - let reader = DocumentsBatchReader::from_reader(reader)?; - - let embedder_configs = index.embedding_configs().embedding_configs(&wtxn)?; - let embedders = index_scheduler.embedders(uid.to_string(), embedder_configs)?; - - let builder = milli::update::IndexDocuments::new( - &mut wtxn, - &index, - indexer_config, - IndexDocumentsConfig { - update_method: IndexDocumentsMethod::ReplaceDocuments, - ..Default::default() - }, - |indexing_step| tracing::trace!("update: {:?}", indexing_step), - || false, - &embedder_stats, - )?; - - let builder = builder.with_embedders(embedders); - - let (builder, user_result) = builder.add_documents(reader)?; - let user_result = user_result?; - tracing::info!(documents_found = user_result, "{} documents found.", user_result); - builder.execute()?; wtxn.commit()?; tracing::info!("All documents successfully imported."); - index_scheduler.refresh_index_stats(&uid)?; } - // 6. Import the queue + // 7. Import the queue let mut index_scheduler_dump = index_scheduler.register_dumped_task()?; - // 6.1. Import the batches + // 7.1. Import the batches for ret in dump_reader.batches()? { let batch = ret?; index_scheduler_dump.register_dumped_batch(batch)?; } - // 6.2. Import the tasks + // 7.2. Import the tasks for ret in dump_reader.tasks()? { let (task, file) = ret?; index_scheduler_dump.register_dumped_task(task, file)?; diff --git a/crates/meilisearch/src/metrics.rs b/crates/meilisearch/src/metrics.rs index d52e04cc6..607bc91eb 100644 --- a/crates/meilisearch/src/metrics.rs +++ b/crates/meilisearch/src/metrics.rs @@ -15,30 +15,33 @@ lazy_static! 
{ "Meilisearch number of degraded search requests" )) .expect("Can't create a metric"); - pub static ref MEILISEARCH_CHAT_SEARCH_REQUESTS: IntCounterVec = register_int_counter_vec!( + pub static ref MEILISEARCH_CHAT_SEARCHES_TOTAL: IntCounterVec = register_int_counter_vec!( opts!( - "meilisearch_chat_search_requests", - "Meilisearch number of search requests performed by the chat route itself" + "meilisearch_chat_searches_total", + "Total number of searches performed by the chat route" ), &["type"] ) .expect("Can't create a metric"); - pub static ref MEILISEARCH_CHAT_PROMPT_TOKENS_USAGE: IntCounterVec = register_int_counter_vec!( - opts!("meilisearch_chat_prompt_tokens_usage", "Meilisearch Chat Prompt Tokens Usage"), + pub static ref MEILISEARCH_CHAT_PROMPT_TOKENS_TOTAL: IntCounterVec = register_int_counter_vec!( + opts!("meilisearch_chat_prompt_tokens_total", "Total number of prompt tokens consumed"), &["workspace", "model"] ) .expect("Can't create a metric"); - pub static ref MEILISEARCH_CHAT_COMPLETION_TOKENS_USAGE: IntCounterVec = + pub static ref MEILISEARCH_CHAT_COMPLETION_TOKENS_TOTAL: IntCounterVec = register_int_counter_vec!( opts!( - "meilisearch_chat_completion_tokens_usage", - "Meilisearch Chat Completion Tokens Usage" + "meilisearch_chat_completion_tokens_total", + "Total number of completion tokens consumed" ), &["workspace", "model"] ) .expect("Can't create a metric"); - pub static ref MEILISEARCH_CHAT_TOTAL_TOKENS_USAGE: IntCounterVec = register_int_counter_vec!( - opts!("meilisearch_chat_total_tokens_usage", "Meilisearch Chat Total Tokens Usage"), + pub static ref MEILISEARCH_CHAT_TOKENS_TOTAL: IntCounterVec = register_int_counter_vec!( + opts!( + "meilisearch_chat_tokens_total", + "Total number of tokens consumed (prompt + completion)" + ), &["workspace", "model"] ) .expect("Can't create a metric"); diff --git a/crates/meilisearch/src/option.rs b/crates/meilisearch/src/option.rs index 9658352c8..e27fa08cd 100644 --- a/crates/meilisearch/src/option.rs +++ b/crates/meilisearch/src/option.rs @@ -68,6 +68,8 @@ const MEILI_EXPERIMENTAL_LIMIT_BATCHED_TASKS_TOTAL_SIZE: &str = const MEILI_EXPERIMENTAL_EMBEDDING_CACHE_ENTRIES: &str = "MEILI_EXPERIMENTAL_EMBEDDING_CACHE_ENTRIES"; const MEILI_EXPERIMENTAL_NO_SNAPSHOT_COMPACTION: &str = "MEILI_EXPERIMENTAL_NO_SNAPSHOT_COMPACTION"; +const MEILI_EXPERIMENTAL_NO_EDITION_2024_FOR_DUMPS: &str = + "MEILI_EXPERIMENTAL_NO_EDITION_2024_FOR_DUMPS"; const DEFAULT_CONFIG_FILE_PATH: &str = "./config.toml"; const DEFAULT_DB_PATH: &str = "./data.ms"; const DEFAULT_HTTP_ADDR: &str = "localhost:7700"; @@ -204,11 +206,13 @@ pub struct Opt { pub env: String, /// Called whenever a task finishes so a third party can be notified. + /// See also the dedicated API `/webhooks`. #[clap(long, env = MEILI_TASK_WEBHOOK_URL)] pub task_webhook_url: Option, /// The Authorization header to send on the webhook URL whenever /// a task finishes so a third party can be notified. + /// See also the dedicated API `/webhooks`. #[clap(long, env = MEILI_TASK_WEBHOOK_AUTHORIZATION_HEADER)] pub task_webhook_authorization_header: Option, @@ -759,6 +763,15 @@ pub struct IndexerOpts { #[clap(long, env = MEILI_EXPERIMENTAL_NO_EDITION_2024_FOR_SETTINGS)] #[serde(default)] pub experimental_no_edition_2024_for_settings: bool, + + /// Experimental make dump imports use the old document indexer. + /// + /// When enabled, Meilisearch will use the old document indexer when importing dumps. + /// + /// For more information, see . 
+ #[clap(long, env = MEILI_EXPERIMENTAL_NO_EDITION_2024_FOR_DUMPS)] + #[serde(default)] + pub experimental_no_edition_2024_for_dumps: bool, } impl IndexerOpts { @@ -769,6 +782,7 @@ impl IndexerOpts { max_indexing_threads, skip_index_budget: _, experimental_no_edition_2024_for_settings, + experimental_no_edition_2024_for_dumps, } = self; if let Some(max_indexing_memory) = max_indexing_memory.0 { export_to_env_if_not_present( @@ -788,6 +802,12 @@ impl IndexerOpts { experimental_no_edition_2024_for_settings.to_string(), ); } + if experimental_no_edition_2024_for_dumps { + export_to_env_if_not_present( + MEILI_EXPERIMENTAL_NO_EDITION_2024_FOR_DUMPS, + experimental_no_edition_2024_for_dumps.to_string(), + ); + } } } @@ -808,6 +828,7 @@ impl TryFrom<&IndexerOpts> for IndexerConfig { skip_index_budget: other.skip_index_budget, experimental_no_edition_2024_for_settings: other .experimental_no_edition_2024_for_settings, + experimental_no_edition_2024_for_dumps: other.experimental_no_edition_2024_for_dumps, chunk_compression_type: Default::default(), chunk_compression_level: Default::default(), documents_chunk_size: Default::default(), diff --git a/crates/meilisearch/src/routes/chats/chat_completions.rs b/crates/meilisearch/src/routes/chats/chat_completions.rs index 4f7087ae8..f2c17a696 100644 --- a/crates/meilisearch/src/routes/chats/chat_completions.rs +++ b/crates/meilisearch/src/routes/chats/chat_completions.rs @@ -27,9 +27,10 @@ use meilisearch_types::features::{ ChatCompletionPrompts as DbChatCompletionPrompts, ChatCompletionSource as DbChatCompletionSource, SystemRole, }; +use meilisearch_types::heed::RoTxn; use meilisearch_types::keys::actions; use meilisearch_types::milli::index::ChatConfig; -use meilisearch_types::milli::{all_obkv_to_json, obkv_to_json, TimeBudget}; +use meilisearch_types::milli::{all_obkv_to_json, obkv_to_json, OrderBy, PatternMatch, TimeBudget}; use meilisearch_types::{Document, Index}; use serde::Deserialize; use serde_json::json; @@ -49,8 +50,8 @@ use crate::error::MeilisearchHttpError; use crate::extractors::authentication::policies::ActionPolicy; use crate::extractors::authentication::{extract_token_from_request, GuardedData, Policy as _}; use crate::metrics::{ - MEILISEARCH_CHAT_COMPLETION_TOKENS_USAGE, MEILISEARCH_CHAT_PROMPT_TOKENS_USAGE, - MEILISEARCH_CHAT_SEARCH_REQUESTS, MEILISEARCH_CHAT_TOTAL_TOKENS_USAGE, + MEILISEARCH_CHAT_COMPLETION_TOKENS_TOTAL, MEILISEARCH_CHAT_PROMPT_TOKENS_TOTAL, + MEILISEARCH_CHAT_SEARCHES_TOTAL, MEILISEARCH_CHAT_TOKENS_TOTAL, MEILISEARCH_DEGRADED_SEARCH_REQUESTS, }; use crate::routes::chats::utils::SseEventSender; @@ -169,6 +170,7 @@ fn setup_search_tool( let mut index_uids = Vec::new(); let mut function_description = prompts.search_description.clone(); + let mut filter_description = prompts.search_filter_param.clone(); index_scheduler.try_for_each_index::<_, ()>(|name, index| { // Make sure to skip unauthorized indexes if !filters.is_index_authorized(name) { @@ -180,16 +182,22 @@ fn setup_search_tool( let index_description = chat_config.description; let _ = writeln!(&mut function_description, "\n\n - {name}: {index_description}\n"); index_uids.push(name.to_string()); + let facet_distributions = format_facet_distributions(index, &rtxn, 10).unwrap(); // TODO do not unwrap + let _ = writeln!(&mut filter_description, "\n## Facet distributions of the {name} index"); + let _ = writeln!(&mut filter_description, "{facet_distributions}"); Ok(()) })?; + tracing::debug!("LLM function description: {function_description}"); + tracing::debug!("LLM 
filter description: {filter_description}"); + + let tool = ChatCompletionToolArgs::default() .r#type(ChatCompletionToolType::Function) .function( FunctionObjectArgs::default() .name(MEILI_SEARCH_IN_INDEX_FUNCTION_NAME) - .description(&function_description) + .description(function_description) .parameters(json!({ "type": "object", "properties": { @@ -203,9 +211,13 @@ fn setup_search_tool( // "type": ["string", "null"], "type": "string", "description": prompts.search_q_param, + }, + "filter": { + "type": "string", + "description": filter_description, } }, - "required": ["index_uid", "q"], + "required": ["index_uid", "q", "filter"], "additionalProperties": false, })) .strict(true) @@ -247,11 +259,19 @@ async fn process_search_request( auth_token: &str, index_uid: String, q: Option<String>, + filter: Option<String>, ) -> Result<(Index, Vec<Document>, String), ResponseError> { let index = index_scheduler.index(&index_uid)?; let rtxn = index.static_read_txn()?; let ChatConfig { description: _, prompt: _, search_parameters } = index.chat_config(&rtxn)?; - let mut query = SearchQuery { q, ..SearchQuery::from(search_parameters) }; + let mut query = SearchQuery { + q, + filter: filter.map(serde_json::Value::from), + ..SearchQuery::from(search_parameters) + }; + + tracing::debug!("LLM query: {:?}", query); + let auth_filter = ActionPolicy::<{ actions::SEARCH }>::authenticate( auth_ctrl, auth_token, @@ -280,17 +300,26 @@ async fn process_search_request( let (search, _is_finite_pagination, _max_total_hits, _offset) = prepare_search(&index_cloned, &rtxn, &query, &search_kind, time_budget, features)?; - search_from_kind(index_uid, search_kind, search) - .map(|(search_results, _)| (rtxn, search_results)) - .map_err(ResponseError::from) + match search_from_kind(index_uid, search_kind, search) { + Ok((search_results, _)) => Ok((rtxn, Ok(search_results))), + Err(MeilisearchHttpError::Milli { + error: meilisearch_types::milli::Error::UserError(user_error), + index_name: _, + }) => Ok((rtxn, Err(user_error))), + Err(err) => Err(ResponseError::from(err)), + } }) .await; permit.drop().await; - let output = output?; + let output = match output?
{ + Ok((rtxn, Ok(search_results))) => Ok((rtxn, search_results)), + Ok((_rtxn, Err(error))) => return Ok((index, Vec::new(), error.to_string())), + Err(err) => Err(err), + }; let mut documents = Vec::new(); if let Ok((ref rtxn, ref search_result)) = output { - MEILISEARCH_CHAT_SEARCH_REQUESTS.with_label_values(&["internal"]).inc(); + MEILISEARCH_CHAT_SEARCHES_TOTAL.with_label_values(&["internal"]).inc(); if search_result.degraded { MEILISEARCH_DEGRADED_SEARCH_REQUESTS.inc(); } @@ -395,16 +424,19 @@ async fn non_streamed_chat( for call in meili_calls { let result = match serde_json::from_str(&call.function.arguments) { - Ok(SearchInIndexParameters { index_uid, q }) => process_search_request( - &index_scheduler, - auth_ctrl.clone(), - &search_queue, - auth_token, - index_uid, - q, - ) - .await - .map_err(|e| e.to_string()), + Ok(SearchInIndexParameters { index_uid, q, filter }) => { + process_search_request( + &index_scheduler, + auth_ctrl.clone(), + &search_queue, + auth_token, + index_uid, + q, + filter, + ) + .await + .map_err(|e| e.to_string()) + } Err(err) => Err(err.to_string()), }; @@ -564,13 +596,13 @@ async fn run_conversation( match result { Ok(resp) => { if let Some(usage) = resp.usage.as_ref() { - MEILISEARCH_CHAT_PROMPT_TOKENS_USAGE + MEILISEARCH_CHAT_PROMPT_TOKENS_TOTAL .with_label_values(&[workspace_uid, &chat_completion.model]) .inc_by(usage.prompt_tokens as u64); - MEILISEARCH_CHAT_COMPLETION_TOKENS_USAGE + MEILISEARCH_CHAT_COMPLETION_TOKENS_TOTAL .with_label_values(&[workspace_uid, &chat_completion.model]) .inc_by(usage.completion_tokens as u64); - MEILISEARCH_CHAT_TOTAL_TOKENS_USAGE + MEILISEARCH_CHAT_TOKENS_TOTAL .with_label_values(&[workspace_uid, &chat_completion.model]) .inc_by(usage.total_tokens as u64); } @@ -719,13 +751,14 @@ async fn handle_meili_tools( let mut error = None; let result = match serde_json::from_str(&call.function.arguments) { - Ok(SearchInIndexParameters { index_uid, q }) => match process_search_request( + Ok(SearchInIndexParameters { index_uid, q, filter }) => match process_search_request( index_scheduler, auth_ctrl.clone(), search_queue, auth_token, index_uid, q, + filter, ) .await { @@ -801,4 +834,42 @@ struct SearchInIndexParameters { index_uid: String, /// The query parameter to use. q: Option<String>, + /// The filter parameter to use.
+ filter: Option, +} + +fn format_facet_distributions( + index: &Index, + rtxn: &RoTxn, + max_values_per_facet: usize, +) -> meilisearch_types::milli::Result { + let universe = index.documents_ids(rtxn)?; + let rules = index.filterable_attributes_rules(rtxn)?; + let fields_ids_map = index.fields_ids_map(rtxn)?; + let filterable_attributes = fields_ids_map + .names() + .filter(|name| rules.iter().any(|rule| matches!(rule.match_str(name), PatternMatch::Match))) + .map(|name| (name, OrderBy::Count)); + let facets_distribution = index + .facets_distribution(rtxn) + .max_values_per_facet(max_values_per_facet) + .candidates(universe) + .facets(filterable_attributes) + .execute()?; + + let mut output = String::new(); + for (facet_name, entries) in facets_distribution { + let _ = write!(&mut output, "{}: ", facet_name); + let total_entries = entries.len(); + for (i, (value, _count)) in entries.into_iter().enumerate() { + let _ = if total_entries.saturating_sub(1) == i { + write!(&mut output, "{value}.") + } else { + write!(&mut output, "{value}, ") + }; + } + let _ = writeln!(&mut output); + } + + Ok(output) } diff --git a/crates/meilisearch/src/routes/chats/settings.rs b/crates/meilisearch/src/routes/chats/settings.rs index 38eb0d3c5..44c099c14 100644 --- a/crates/meilisearch/src/routes/chats/settings.rs +++ b/crates/meilisearch/src/routes/chats/settings.rs @@ -8,8 +8,8 @@ use meilisearch_types::error::{Code, ResponseError}; use meilisearch_types::features::{ ChatCompletionPrompts as DbChatCompletionPrompts, ChatCompletionSettings, ChatCompletionSource as DbChatCompletionSource, DEFAULT_CHAT_SEARCH_DESCRIPTION_PROMPT, - DEFAULT_CHAT_SEARCH_INDEX_UID_PARAM_PROMPT, DEFAULT_CHAT_SEARCH_Q_PARAM_PROMPT, - DEFAULT_CHAT_SYSTEM_PROMPT, + DEFAULT_CHAT_SEARCH_FILTER_PARAM_PROMPT, DEFAULT_CHAT_SEARCH_INDEX_UID_PARAM_PROMPT, + DEFAULT_CHAT_SEARCH_Q_PARAM_PROMPT, DEFAULT_CHAT_SYSTEM_PROMPT, }; use meilisearch_types::keys::actions; use meilisearch_types::milli::update::Setting; @@ -84,6 +84,11 @@ async fn patch_settings( Setting::Reset => DEFAULT_CHAT_SEARCH_Q_PARAM_PROMPT.to_string(), Setting::NotSet => old_settings.prompts.search_q_param, }, + search_filter_param: match new_prompts.search_filter_param { + Setting::Set(new_description) => new_description, + Setting::Reset => DEFAULT_CHAT_SEARCH_FILTER_PARAM_PROMPT.to_string(), + Setting::NotSet => old_settings.prompts.search_filter_param, + }, search_index_uid_param: match new_prompts.search_index_uid_param { Setting::Set(new_description) => new_description, Setting::Reset => DEFAULT_CHAT_SEARCH_INDEX_UID_PARAM_PROMPT.to_string(), @@ -252,6 +257,10 @@ pub struct ChatPrompts { #[schema(value_type = Option, example = json!("This is query parameter..."))] pub search_q_param: Setting, #[serde(default)] + #[deserr(default, error = DeserrJsonError)] + #[schema(value_type = Option, example = json!("This is filter parameter..."))] + pub search_filter_param: Setting, + #[serde(default)] #[deserr(default, error = DeserrJsonError)] #[schema(value_type = Option, example = json!("This is index you want to search in..."))] pub search_index_uid_param: Setting, diff --git a/crates/meilisearch/src/routes/indexes/documents.rs b/crates/meilisearch/src/routes/indexes/documents.rs index a93d736f7..5ced4603e 100644 --- a/crates/meilisearch/src/routes/indexes/documents.rs +++ b/crates/meilisearch/src/routes/indexes/documents.rs @@ -1,6 +1,7 @@ use std::collections::HashSet; use std::io::{ErrorKind, Seek as _}; use std::marker::PhantomData; +use std::str::FromStr; use 
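For context, `format_facet_distributions` above builds the text handed to the LLM: one line per filterable attribute, values ordered by count, with a period after the last value. A self-contained sketch of the same formatting over plain slices (the real code pulls attribute names and value counts from the index):

```rust
use std::fmt::Write as _;

fn format_facets(facets: &[(&str, &[&str])]) -> String {
    let mut output = String::new();
    for (facet_name, values) in facets {
        let _ = write!(&mut output, "{facet_name}: ");
        let total = values.len();
        for (i, value) in values.iter().enumerate() {
            // Same terminator rule as the real loop: a period after the
            // last value, a comma between all the others.
            let _ = if i + 1 == total {
                write!(&mut output, "{value}.")
            } else {
                write!(&mut output, "{value}, ")
            };
        }
        let _ = writeln!(&mut output);
    }
    output
}

fn main() {
    let out = format_facets(&[
        ("genre", &["drama", "comedy"][..]),
        ("year", &["2021", "2019"][..]),
    ]);
    print!("{out}"); // genre: drama, comedy.
                     // year: 2021, 2019.
}
```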
actix_web::http::header::CONTENT_TYPE; use actix_web::web::Data; @@ -17,9 +18,11 @@ use meilisearch_types::error::deserr_codes::*; use meilisearch_types::error::{Code, ResponseError}; use meilisearch_types::heed::RoTxn; use meilisearch_types::index_uid::IndexUid; +use meilisearch_types::milli::documents::sort::recursive_sort; +use meilisearch_types::milli::index::EmbeddingsWithMetadata; use meilisearch_types::milli::update::IndexDocumentsMethod; use meilisearch_types::milli::vector::parsed_vectors::ExplicitVectors; -use meilisearch_types::milli::DocumentId; +use meilisearch_types::milli::{AscDesc, DocumentId}; use meilisearch_types::serde_cs::vec::CS; use meilisearch_types::star_or::OptionStarOrList; use meilisearch_types::tasks::KindWithContent; @@ -42,6 +45,7 @@ use crate::extractors::authentication::policies::*; use crate::extractors::authentication::GuardedData; use crate::extractors::payload::Payload; use crate::extractors::sequential_extractor::SeqHandler; +use crate::routes::indexes::search::fix_sort_query_parameters; use crate::routes::{ get_task_id, is_dry_run, PaginationView, SummarizedTaskView, PAGINATION_DEFAULT_LIMIT, }; @@ -135,6 +139,10 @@ pub struct DocumentsFetchAggregator { per_document_id: bool, // if a filter was used per_filter: bool, + with_vector_filter: bool, + + // if documents were sorted + sort: bool, #[serde(rename = "vector.retrieve_vectors")] retrieve_vectors: bool, @@ -151,39 +159,6 @@ pub struct DocumentsFetchAggregator { marker: std::marker::PhantomData, } -#[derive(Copy, Clone, Debug, PartialEq, Eq)] -pub enum DocumentFetchKind { - PerDocumentId { retrieve_vectors: bool }, - Normal { with_filter: bool, limit: usize, offset: usize, retrieve_vectors: bool, ids: usize }, -} - -impl DocumentsFetchAggregator { - pub fn from_query(query: &DocumentFetchKind) -> Self { - let (limit, offset, retrieve_vectors) = match query { - DocumentFetchKind::PerDocumentId { retrieve_vectors } => (1, 0, *retrieve_vectors), - DocumentFetchKind::Normal { limit, offset, retrieve_vectors, .. } => { - (*limit, *offset, *retrieve_vectors) - } - }; - - let ids = match query { - DocumentFetchKind::Normal { ids, .. } => *ids, - DocumentFetchKind::PerDocumentId { .. } => 0, - }; - - Self { - per_document_id: matches!(query, DocumentFetchKind::PerDocumentId { .. }), - per_filter: matches!(query, DocumentFetchKind::Normal { with_filter, .. 
} if *with_filter), - max_limit: limit, - max_offset: offset, - retrieve_vectors, - max_document_ids: ids, - - marker: PhantomData, - } - } -} - impl Aggregate for DocumentsFetchAggregator { fn event_name(&self) -> &'static str { Method::event_name() @@ -193,6 +168,8 @@ impl Aggregate for DocumentsFetchAggregator { Box::new(Self { per_document_id: self.per_document_id | new.per_document_id, per_filter: self.per_filter | new.per_filter, + with_vector_filter: self.with_vector_filter | new.with_vector_filter, + sort: self.sort | new.sort, retrieve_vectors: self.retrieve_vectors | new.retrieve_vectors, max_limit: self.max_limit.max(new.max_limit), max_offset: self.max_offset.max(new.max_offset), @@ -276,6 +253,8 @@ pub async fn get_document( retrieve_vectors: param_retrieve_vectors.0, per_document_id: true, per_filter: false, + with_vector_filter: false, + sort: false, max_limit: 0, max_offset: 0, max_document_ids: 0, @@ -406,6 +385,8 @@ pub struct BrowseQueryGet { #[param(default, value_type = Option, example = "popularity > 1000")] #[deserr(default, error = DeserrQueryParamError)] filter: Option, + #[deserr(default, error = DeserrQueryParamError)] + sort: Option, } #[derive(Debug, Deserr, ToSchema)] @@ -430,6 +411,9 @@ pub struct BrowseQuery { #[schema(default, value_type = Option, example = "popularity > 1000")] #[deserr(default, error = DeserrJsonError)] filter: Option, + #[schema(default, value_type = Option>, example = json!(["title:asc", "rating:desc"]))] + #[deserr(default, error = DeserrJsonError)] + sort: Option>, } /// Get documents with POST @@ -495,6 +479,11 @@ pub async fn documents_by_query_post( analytics.publish( DocumentsFetchAggregator:: { per_filter: body.filter.is_some(), + with_vector_filter: body + .filter + .as_ref() + .is_some_and(|f| f.to_string().contains("_vectors")), + sort: body.sort.is_some(), retrieve_vectors: body.retrieve_vectors, max_limit: body.limit, max_offset: body.offset, @@ -571,7 +560,7 @@ pub async fn get_documents( ) -> Result { debug!(parameters = ?params, "Get documents GET"); - let BrowseQueryGet { limit, offset, fields, retrieve_vectors, filter, ids } = + let BrowseQueryGet { limit, offset, fields, retrieve_vectors, filter, ids, sort } = params.into_inner(); let filter = match filter { @@ -582,20 +571,24 @@ pub async fn get_documents( None => None, }; - let ids = ids.map(|ids| ids.into_iter().map(Into::into).collect()); - let query = BrowseQuery { offset: offset.0, limit: limit.0, fields: fields.merge_star_and_none(), retrieve_vectors: retrieve_vectors.0, filter, - ids, + ids: ids.map(|ids| ids.into_iter().map(Into::into).collect()), + sort: sort.map(|attr| fix_sort_query_parameters(&attr)), }; analytics.publish( DocumentsFetchAggregator:: { per_filter: query.filter.is_some(), + with_vector_filter: query + .filter + .as_ref() + .is_some_and(|f| f.to_string().contains("_vectors")), + sort: query.sort.is_some(), retrieve_vectors: query.retrieve_vectors, max_limit: query.limit, max_offset: query.offset, @@ -615,7 +608,7 @@ fn documents_by_query( query: BrowseQuery, ) -> Result { let index_uid = IndexUid::try_from(index_uid.into_inner())?; - let BrowseQuery { offset, limit, fields, retrieve_vectors, filter, ids } = query; + let BrowseQuery { offset, limit, fields, retrieve_vectors, filter, ids, sort } = query; let retrieve_vectors = RetrieveVectors::new(retrieve_vectors); @@ -633,6 +626,18 @@ fn documents_by_query( None }; + let sort_criteria = if let Some(sort) = &sort { + let sorts: Vec<_> = match sort.iter().map(|s| 
milli::AscDesc::from_str(s)).collect() { + Ok(sorts) => sorts, + Err(asc_desc_error) => { + return Err(milli::SortError::from(asc_desc_error).into_document_error().into()) + } + }; + Some(sorts) + } else { + None + }; + let index = index_scheduler.index(&index_uid)?; let (total, documents) = retrieve_documents( &index, @@ -643,6 +648,7 @@ fn documents_by_query( fields, retrieve_vectors, index_scheduler.features(), + sort_criteria, )?; let ret = PaginationView::new(offset, limit, total as usize, documents); @@ -1461,15 +1467,17 @@ fn some_documents<'a, 't: 'a>( document.remove("_vectors"); } RetrieveVectors::Retrieve => { - // Clippy is simply wrong - #[allow(clippy::manual_unwrap_or_default)] let mut vectors = match document.remove("_vectors") { Some(Value::Object(map)) => map, _ => Default::default(), }; - for (name, (vector, regenerate)) in index.embeddings(rtxn, key)? { + for ( + name, + EmbeddingsWithMetadata { embeddings, regenerate, has_fragments: _ }, + ) in index.embeddings(rtxn, key)? + { let embeddings = - ExplicitVectors { embeddings: Some(vector.into()), regenerate }; + ExplicitVectors { embeddings: Some(embeddings.into()), regenerate }; vectors.insert( name, serde_json::to_value(embeddings).map_err(MeilisearchHttpError::from)?, @@ -1494,6 +1502,7 @@ fn retrieve_documents>( attributes_to_retrieve: Option>, retrieve_vectors: RetrieveVectors, features: RoFeatures, + sort_criteria: Option>, ) -> Result<(u64, Vec), ResponseError> { let rtxn = index.read_txn()?; let filter = &filter; @@ -1526,15 +1535,32 @@ fn retrieve_documents>( })? } - let (it, number_of_documents) = { + let (it, number_of_documents) = if let Some(sort) = sort_criteria { + let number_of_documents = candidates.len(); + let facet_sort = recursive_sort(index, &rtxn, sort, &candidates)?; + let iter = facet_sort.iter()?; + let mut documents = Vec::with_capacity(limit); + for result in iter.skip(offset).take(limit) { + documents.push(result?); + } + ( + itertools::Either::Left(some_documents( + index, + &rtxn, + documents.into_iter(), + retrieve_vectors, + )?), + number_of_documents, + ) + } else { let number_of_documents = candidates.len(); ( - some_documents( + itertools::Either::Right(some_documents( index, &rtxn, candidates.into_iter().skip(offset).take(limit), retrieve_vectors, - )?, + )?), number_of_documents, ) }; diff --git a/crates/meilisearch/src/routes/indexes/search_analytics.rs b/crates/meilisearch/src/routes/indexes/search_analytics.rs index 07f79eba7..e27e6347b 100644 --- a/crates/meilisearch/src/routes/indexes/search_analytics.rs +++ b/crates/meilisearch/src/routes/indexes/search_analytics.rs @@ -40,6 +40,7 @@ pub struct SearchAggregator { // filter filter_with_geo_radius: bool, filter_with_geo_bounding_box: bool, + filter_on_vectors: bool, // every time a request has a filter, this field must be incremented by the number of terms it contains filter_sum_of_criteria_terms: usize, // every time a request has a filter, this field must be incremented by one @@ -163,6 +164,7 @@ impl SearchAggregator { let stringified_filters = filter.to_string(); ret.filter_with_geo_radius = stringified_filters.contains("_geoRadius("); ret.filter_with_geo_bounding_box = stringified_filters.contains("_geoBoundingBox("); + ret.filter_on_vectors = stringified_filters.contains("_vectors"); ret.filter_sum_of_criteria_terms = RE.split(&stringified_filters).count(); } @@ -224,6 +226,7 @@ impl SearchAggregator { let SearchResult { hits: _, query: _, + query_vector: _, processing_time_ms, hits_info: _, semantic_hit_count: _, @@ 
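The `sort` values accepted here are the usual `field:asc` / `field:desc` strings (the GET route additionally normalises its comma-separated form through `fix_sort_query_parameters`). Below is a rough stand-in for the `milli::AscDesc::from_str` parsing used above; the real parser lives in milli and reports structured errors that `SortError` then wraps.

```rust
#[derive(Debug, PartialEq)]
enum AscDesc {
    Asc(String),
    Desc(String),
}

// Hypothetical simplification: split on the last `:` and require an
// explicit direction, erroring out otherwise.
fn parse_asc_desc(s: &str) -> Result<AscDesc, String> {
    match s.rsplit_once(':') {
        Some((field, "asc")) => Ok(AscDesc::Asc(field.to_string())),
        Some((field, "desc")) => Ok(AscDesc::Desc(field.to_string())),
        _ => Err(format!("Invalid asc/desc syntax for `{s}`.")),
    }
}

fn main() {
    assert_eq!(parse_asc_desc("title:asc"), Ok(AscDesc::Asc("title".into())));
    assert!(parse_asc_desc("title").is_err());
}
```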
-260,6 +263,7 @@ impl Aggregate for SearchAggregator { distinct, filter_with_geo_radius, filter_with_geo_bounding_box, + filter_on_vectors, filter_sum_of_criteria_terms, filter_total_number_of_criteria, used_syntax, @@ -314,6 +318,7 @@ impl Aggregate for SearchAggregator { // filter self.filter_with_geo_radius |= filter_with_geo_radius; self.filter_with_geo_bounding_box |= filter_with_geo_bounding_box; + self.filter_on_vectors |= filter_on_vectors; self.filter_sum_of_criteria_terms = self.filter_sum_of_criteria_terms.saturating_add(filter_sum_of_criteria_terms); self.filter_total_number_of_criteria = @@ -388,6 +393,7 @@ impl Aggregate for SearchAggregator { distinct, filter_with_geo_radius, filter_with_geo_bounding_box, + filter_on_vectors, filter_sum_of_criteria_terms, filter_total_number_of_criteria, used_syntax, @@ -445,6 +451,7 @@ impl Aggregate for SearchAggregator { "filter": { "with_geoRadius": filter_with_geo_radius, "with_geoBoundingBox": filter_with_geo_bounding_box, + "on_vectors": filter_on_vectors, "avg_criteria_number": format!("{:.2}", filter_sum_of_criteria_terms as f64 / filter_total_number_of_criteria as f64), "most_used_syntax": used_syntax.iter().max_by_key(|(_, v)| *v).map(|(k, _)| json!(k)).unwrap_or_else(|| json!(null)), }, diff --git a/crates/meilisearch/src/routes/indexes/settings.rs b/crates/meilisearch/src/routes/indexes/settings.rs index 308977a6e..10120ebff 100644 --- a/crates/meilisearch/src/routes/indexes/settings.rs +++ b/crates/meilisearch/src/routes/indexes/settings.rs @@ -511,7 +511,7 @@ make_setting_routes!( }, { route: "/chat", - update_verb: put, + update_verb: patch, value_type: ChatSettings, err_type: meilisearch_types::deserr::DeserrJsonError< meilisearch_types::error::deserr_codes::InvalidSettingsIndexChat, diff --git a/crates/meilisearch/src/routes/mod.rs b/crates/meilisearch/src/routes/mod.rs index 260d973a1..745ac5824 100644 --- a/crates/meilisearch/src/routes/mod.rs +++ b/crates/meilisearch/src/routes/mod.rs @@ -41,6 +41,7 @@ use crate::routes::indexes::IndexView; use crate::routes::multi_search::SearchResults; use crate::routes::network::{Network, Remote}; use crate::routes::swap_indexes::SwapIndexesPayload; +use crate::routes::webhooks::{WebhookResults, WebhookSettings, WebhookWithMetadata}; use crate::search::{ FederatedSearch, FederatedSearchResult, Federation, FederationOptions, MergeFacets, SearchQueryWithIndex, SearchResultWithIndex, SimilarQuery, SimilarResult, @@ -70,6 +71,7 @@ mod swap_indexes; pub mod tasks; #[cfg(test)] mod tasks_test; +mod webhooks; #[derive(OpenApi)] #[openapi( @@ -89,6 +91,7 @@ mod tasks_test; (path = "/experimental-features", api = features::ExperimentalFeaturesApi), (path = "/export", api = export::ExportApi), (path = "/network", api = network::NetworkApi), + (path = "/webhooks", api = webhooks::WebhooksApi), ), paths(get_health, get_version, get_stats), tags( @@ -99,7 +102,7 @@ mod tasks_test; url = "/", description = "Local server", )), - components(schemas(PaginationView, PaginationView, IndexView, DocumentDeletionByFilter, AllBatches, BatchStats, ProgressStepView, ProgressView, BatchView, RuntimeTogglableFeatures, SwapIndexesPayload, DocumentEditionByFunction, MergeFacets, FederationOptions, SearchQueryWithIndex, Federation, FederatedSearch, FederatedSearchResult, SearchResults, SearchResultWithIndex, SimilarQuery, SimilarResult, PaginationView, BrowseQuery, UpdateIndexRequest, IndexUid, IndexCreateRequest, KeyView, Action, CreateApiKey, UpdateStderrLogs, LogMode, GetLogs, IndexStats, Stats, HealthStatus, 
HealthResponse, VersionResponse, Code, ErrorType, AllTasks, TaskView, Status, DetailsView, ResponseError, Settings, Settings, TypoSettings, MinWordSizeTyposSetting, FacetingSettings, PaginationSettings, SummarizedTaskView, Kind, Network, Remote, FilterableAttributesRule, FilterableAttributesPatterns, AttributePatterns, FilterableAttributesFeatures, FilterFeatures, Export)) + components(schemas(PaginationView, PaginationView, IndexView, DocumentDeletionByFilter, AllBatches, BatchStats, ProgressStepView, ProgressView, BatchView, RuntimeTogglableFeatures, SwapIndexesPayload, DocumentEditionByFunction, MergeFacets, FederationOptions, SearchQueryWithIndex, Federation, FederatedSearch, FederatedSearchResult, SearchResults, SearchResultWithIndex, SimilarQuery, SimilarResult, PaginationView, BrowseQuery, UpdateIndexRequest, IndexUid, IndexCreateRequest, KeyView, Action, CreateApiKey, UpdateStderrLogs, LogMode, GetLogs, IndexStats, Stats, HealthStatus, HealthResponse, VersionResponse, Code, ErrorType, AllTasks, TaskView, Status, DetailsView, ResponseError, Settings, Settings, TypoSettings, MinWordSizeTyposSetting, FacetingSettings, PaginationSettings, SummarizedTaskView, Kind, Network, Remote, FilterableAttributesRule, FilterableAttributesPatterns, AttributePatterns, FilterableAttributesFeatures, FilterFeatures, Export, WebhookSettings, WebhookResults, WebhookWithMetadata)) )] pub struct MeilisearchApi; @@ -120,7 +123,8 @@ pub fn configure(cfg: &mut web::ServiceConfig) { .service(web::scope("/experimental-features").configure(features::configure)) .service(web::scope("/network").configure(network::configure)) .service(web::scope("/export").configure(export::configure)) - .service(web::scope("/chats").configure(chats::configure)); + .service(web::scope("/chats").configure(chats::configure)) + .service(web::scope("/webhooks").configure(webhooks::configure)); #[cfg(feature = "swagger")] { diff --git a/crates/meilisearch/src/routes/network.rs b/crates/meilisearch/src/routes/network.rs index 7e58df113..4afa32c09 100644 --- a/crates/meilisearch/src/routes/network.rs +++ b/crates/meilisearch/src/routes/network.rs @@ -51,7 +51,7 @@ pub fn configure(cfg: &mut web::ServiceConfig) { get, path = "", tag = "Network", - security(("Bearer" = ["network.get", "network.*", "*"])), + security(("Bearer" = ["network.get", "*"])), responses( (status = OK, description = "Known nodes are returned", body = Network, content_type = "application/json", example = json!( { @@ -168,7 +168,7 @@ impl Aggregate for PatchNetworkAnalytics { path = "", tag = "Network", request_body = Network, - security(("Bearer" = ["network.update", "network.*", "*"])), + security(("Bearer" = ["network.update", "*"])), responses( (status = OK, description = "New network state is returned", body = Network, content_type = "application/json", example = json!( { diff --git a/crates/meilisearch/src/routes/tasks.rs b/crates/meilisearch/src/routes/tasks.rs index 95c105894..fb0f73425 100644 --- a/crates/meilisearch/src/routes/tasks.rs +++ b/crates/meilisearch/src/routes/tasks.rs @@ -336,7 +336,7 @@ impl Aggregate for TaskFilterAnalytics, example = "https://your.site/on-tasks-completed")] + #[deserr(default, error = DeserrJsonError)] + #[serde(default)] + url: Setting, + #[schema(value_type = Option>, example = json!({"Authorization":"Bearer a-secret-token"}))] + #[deserr(default, error = DeserrJsonError)] + #[serde(default)] + headers: Setting>>, +} + +fn deny_immutable_fields_webhook( + field: &str, + accepted: &[&str], + location: ValuePointerRef, +) -> 
DeserrJsonError { + match field { + "uuid" => immutable_field_error(field, accepted, Code::ImmutableWebhookUuid), + "isEditable" => immutable_field_error(field, accepted, Code::ImmutableWebhookIsEditable), + _ => deserr::take_cf_content(DeserrJsonError::::error::( + None, + deserr::ErrorKind::UnknownKey { key: field, accepted }, + location, + )), + } +} + +#[derive(Debug, Serialize, ToSchema)] +#[serde(rename_all = "camelCase")] +#[schema(rename_all = "camelCase")] +pub(super) struct WebhookWithMetadata { + uuid: Uuid, + is_editable: bool, + #[schema(value_type = WebhookSettings)] + #[serde(flatten)] + webhook: Webhook, +} + +impl WebhookWithMetadata { + pub fn from(uuid: Uuid, webhook: Webhook) -> Self { + Self { uuid, is_editable: uuid != Uuid::nil(), webhook } + } +} + +#[derive(Debug, Serialize, ToSchema)] +#[serde(rename_all = "camelCase")] +pub(super) struct WebhookResults { + results: Vec, +} + +#[utoipa::path( + get, + path = "", + tag = "Webhooks", + security(("Bearer" = ["webhooks.get", "webhooks.*", "*.get", "*"])), + responses( + (status = OK, description = "Webhooks are returned", body = WebhookResults, content_type = "application/json", example = json!({ + "results": [ + { + "uuid": "550e8400-e29b-41d4-a716-446655440000", + "url": "https://your.site/on-tasks-completed", + "headers": { + "Authorization": "Bearer a-secret-token" + }, + "isEditable": true + }, + { + "uuid": "550e8400-e29b-41d4-a716-446655440001", + "url": "https://another.site/on-tasks-completed", + "isEditable": true + } + ] + })), + (status = 401, description = "The authorization header is missing", body = ResponseError, content_type = "application/json", example = json!( + { + "message": "The Authorization header is missing. It must use the bearer authorization method.", + "code": "missing_authorization_header", + "type": "auth", + "link": "https://docs.meilisearch.com/errors#missing_authorization_header" + } + )), + ) +)] +async fn get_webhooks( + index_scheduler: GuardedData, Data>, +) -> Result { + let webhooks = index_scheduler.webhooks_view(); + let results = webhooks + .webhooks + .into_iter() + .map(|(uuid, webhook)| WebhookWithMetadata::from(uuid, webhook)) + .collect::>(); + let results = WebhookResults { results }; + + debug!(returns = ?results, "Get webhooks"); + Ok(HttpResponse::Ok().json(results)) +} + +#[derive(Serialize, Default)] +pub struct PatchWebhooksAnalytics; + +impl Aggregate for PatchWebhooksAnalytics { + fn event_name(&self) -> &'static str { + "Webhooks Updated" + } + + fn aggregate(self: Box, _new: Box) -> Box { + self + } + + fn into_event(self: Box) -> serde_json::Value { + serde_json::to_value(*self).unwrap_or_default() + } +} + +#[derive(Serialize, Default)] +pub struct PostWebhooksAnalytics; + +impl Aggregate for PostWebhooksAnalytics { + fn event_name(&self) -> &'static str { + "Webhooks Created" + } + + fn aggregate(self: Box, _new: Box) -> Box { + self + } + + fn into_event(self: Box) -> serde_json::Value { + serde_json::to_value(*self).unwrap_or_default() + } +} + +#[derive(Debug, thiserror::Error)] +enum WebhooksError { + #[error("The URL for the webhook `{0}` is missing.")] + MissingUrl(Uuid), + #[error("Defining too many webhooks would crush the server. Please limit the number of webhooks to 20. You may use a third-party proxy server to dispatch events to more than 20 endpoints.")] + TooManyWebhooks, + #[error("Too many headers for the webhook `{0}`. Please limit the number of headers to 200. 
Hint: To remove an already defined header set its value to `null`")] + TooManyHeaders(Uuid), + #[error("Webhook `{0}` is immutable. The webhook defined from the command line cannot be modified using the API.")] + ImmutableWebhook(Uuid), + #[error("Webhook `{0}` not found.")] + WebhookNotFound(Uuid), + #[error("Invalid header name `{0}`: {1}")] + InvalidHeaderName(String, ActixInvalidHeaderName), + #[error("Invalid header value `{0}`: {1}")] + InvalidHeaderValue(String, ActixInvalidHeaderValue), + #[error("Invalid URL `{0}`: {1}")] + InvalidUrl(String, url::ParseError), + #[error("Invalid UUID: {0}")] + InvalidUuid(uuid::Error), +} + +impl ErrorCode for WebhooksError { + fn error_code(&self) -> meilisearch_types::error::Code { + match self { + MissingUrl(_) => meilisearch_types::error::Code::InvalidWebhookUrl, + TooManyWebhooks => meilisearch_types::error::Code::InvalidWebhooks, + TooManyHeaders(_) => meilisearch_types::error::Code::InvalidWebhookHeaders, + ImmutableWebhook(_) => meilisearch_types::error::Code::ImmutableWebhook, + WebhookNotFound(_) => meilisearch_types::error::Code::WebhookNotFound, + InvalidHeaderName(_, _) => meilisearch_types::error::Code::InvalidWebhookHeaders, + InvalidHeaderValue(_, _) => meilisearch_types::error::Code::InvalidWebhookHeaders, + InvalidUrl(_, _) => meilisearch_types::error::Code::InvalidWebhookUrl, + InvalidUuid(_) => meilisearch_types::error::Code::InvalidWebhookUuid, + } + } +} + +fn patch_webhook_inner( + uuid: &Uuid, + old_webhook: Webhook, + new_webhook: WebhookSettings, +) -> Result { + let Webhook { url: old_url, mut headers } = old_webhook; + + let url = match new_webhook.url { + Setting::Set(url) => url, + Setting::NotSet => old_url, + Setting::Reset => return Err(MissingUrl(uuid.to_owned())), + }; + + match new_webhook.headers { + Setting::Set(new_headers) => { + for (name, value) in new_headers { + match value { + Setting::Set(value) => { + headers.insert(name, value); + } + Setting::NotSet => continue, + Setting::Reset => { + headers.remove(&name); + continue; + } + } + } + } + Setting::Reset => headers.clear(), + Setting::NotSet => (), + }; + + if headers.len() > 200 { + return Err(TooManyHeaders(uuid.to_owned())); + } + + Ok(Webhook { url, headers }) +} + +fn check_changed(uuid: Uuid, webhook: &Webhook) -> Result<(), WebhooksError> { + if uuid.is_nil() { + return Err(ImmutableWebhook(uuid)); + } + + if webhook.url.is_empty() { + return Err(MissingUrl(uuid)); + } + + if webhook.headers.len() > 200 { + return Err(TooManyHeaders(uuid)); + } + + for (header, value) in &webhook.headers { + HeaderName::from_bytes(header.as_bytes()) + .map_err(|e| InvalidHeaderName(header.to_owned(), e))?; + HeaderValue::from_str(value).map_err(|e| InvalidHeaderValue(header.to_owned(), e))?; + } + + if let Err(e) = Url::parse(&webhook.url) { + return Err(InvalidUrl(webhook.url.to_owned(), e)); + } + + Ok(()) +} + +#[utoipa::path( + get, + path = "/{uuid}", + tag = "Webhooks", + security(("Bearer" = ["webhooks.get", "webhooks.*", "*.get", "*"])), + responses( + (status = 200, description = "Webhook found", body = WebhookWithMetadata, content_type = "application/json", example = json!({ + "uuid": "550e8400-e29b-41d4-a716-446655440000", + "url": "https://your.site/on-tasks-completed", + "headers": { + "Authorization": "Bearer a-secret" + }, + "isEditable": true + })), + (status = 404, description = "Webhook not found", body = ResponseError, content_type = "application/json"), + (status = 401, description = "The authorization header is missing", body = 
ResponseError, content_type = "application/json"), + ), + params( + ("uuid" = Uuid, Path, description = "The universally unique identifier of the webhook") + ) +)] +async fn get_webhook( + index_scheduler: GuardedData, Data>, + uuid: Path, +) -> Result { + let uuid = Uuid::from_str(&uuid.into_inner()).map_err(InvalidUuid)?; + let mut webhooks = index_scheduler.webhooks_view(); + + let webhook = webhooks.webhooks.remove(&uuid).ok_or(WebhookNotFound(uuid))?; + let webhook = WebhookWithMetadata::from(uuid, webhook); + + debug!(returns = ?webhook, "Get webhook"); + Ok(HttpResponse::Ok().json(webhook)) +} + +#[utoipa::path( + post, + path = "", + tag = "Webhooks", + request_body = WebhookSettings, + security(("Bearer" = ["webhooks.create", "webhooks.*", "*"])), + responses( + (status = 201, description = "Webhook created successfully", body = WebhookWithMetadata, content_type = "application/json", example = json!({ + "uuid": "550e8400-e29b-41d4-a716-446655440000", + "url": "https://your.site/on-tasks-completed", + "headers": { + "Authorization": "Bearer a-secret-token" + }, + "isEditable": true + })), + (status = 401, description = "The authorization header is missing", body = ResponseError, content_type = "application/json"), + (status = 400, description = "Bad request", body = ResponseError, content_type = "application/json"), + ) +)] +async fn post_webhook( + index_scheduler: GuardedData, Data>, + webhook_settings: AwebJson, + req: HttpRequest, + analytics: Data, +) -> Result { + let webhook_settings = webhook_settings.into_inner(); + debug!(parameters = ?webhook_settings, "Post webhook"); + + let uuid = Uuid::new_v4(); + if webhook_settings.headers.as_ref().set().is_some_and(|h| h.len() > 200) { + return Err(TooManyHeaders(uuid).into()); + } + + let mut webhooks = index_scheduler.retrieve_runtime_webhooks(); + if webhooks.len() >= 20 { + return Err(TooManyWebhooks.into()); + } + + let webhook = Webhook { + url: webhook_settings.url.set().ok_or(MissingUrl(uuid))?, + headers: webhook_settings + .headers + .set() + .map(|h| h.into_iter().map(|(k, v)| (k, v.set().unwrap_or_default())).collect()) + .unwrap_or_default(), + }; + + check_changed(uuid, &webhook)?; + webhooks.insert(uuid, webhook.clone()); + index_scheduler.update_runtime_webhooks(webhooks)?; + + analytics.publish(PostWebhooksAnalytics, &req); + + let response = WebhookWithMetadata::from(uuid, webhook); + debug!(returns = ?response, "Post webhook"); + Ok(HttpResponse::Created().json(response)) +} + +#[utoipa::path( + patch, + path = "/{uuid}", + tag = "Webhooks", + request_body = WebhookSettings, + security(("Bearer" = ["webhooks.update", "webhooks.*", "*"])), + responses( + (status = 200, description = "Webhook updated successfully", body = WebhookWithMetadata, content_type = "application/json", example = json!({ + "uuid": "550e8400-e29b-41d4-a716-446655440000", + "url": "https://your.site/on-tasks-completed", + "headers": { + "Authorization": "Bearer a-secret-token" + }, + "isEditable": true + })), + (status = 401, description = "The authorization header is missing", body = ResponseError, content_type = "application/json"), + (status = 400, description = "Bad request", body = ResponseError, content_type = "application/json"), + ), + params( + ("uuid" = Uuid, Path, description = "The universally unique identifier of the webhook") + ) +)] +async fn patch_webhook( + index_scheduler: GuardedData, Data>, + uuid: Path, + webhook_settings: AwebJson, + req: HttpRequest, + analytics: Data, +) -> Result { + let uuid = 
Uuid::from_str(&uuid.into_inner()).map_err(InvalidUuid)?; + let webhook_settings = webhook_settings.into_inner(); + debug!(parameters = ?(uuid, &webhook_settings), "Patch webhook"); + + if uuid.is_nil() { + return Err(ImmutableWebhook(uuid).into()); + } + + let mut webhooks = index_scheduler.retrieve_runtime_webhooks(); + let old_webhook = webhooks.remove(&uuid).ok_or(WebhookNotFound(uuid))?; + let webhook = patch_webhook_inner(&uuid, old_webhook, webhook_settings)?; + + check_changed(uuid, &webhook)?; + webhooks.insert(uuid, webhook.clone()); + index_scheduler.update_runtime_webhooks(webhooks)?; + + analytics.publish(PatchWebhooksAnalytics, &req); + + let response = WebhookWithMetadata::from(uuid, webhook); + debug!(returns = ?response, "Patch webhook"); + Ok(HttpResponse::Ok().json(response)) +} + +#[utoipa::path( + delete, + path = "/{uuid}", + tag = "Webhooks", + security(("Bearer" = ["webhooks.delete", "webhooks.*", "*"])), + responses( + (status = 204, description = "Webhook deleted successfully"), + (status = 404, description = "Webhook not found", body = ResponseError, content_type = "application/json"), + (status = 401, description = "The authorization header is missing", body = ResponseError, content_type = "application/json"), + ), + params( + ("uuid" = Uuid, Path, description = "The universally unique identifier of the webhook") + ) +)] +async fn delete_webhook( + index_scheduler: GuardedData, Data>, + uuid: Path, +) -> Result { + let uuid = Uuid::from_str(&uuid.into_inner()).map_err(InvalidUuid)?; + debug!(parameters = ?uuid, "Delete webhook"); + + if uuid.is_nil() { + return Err(ImmutableWebhook(uuid).into()); + } + + let mut webhooks = index_scheduler.retrieve_runtime_webhooks(); + webhooks.remove(&uuid).ok_or(WebhookNotFound(uuid))?; + index_scheduler.update_runtime_webhooks(webhooks)?; + + debug!(returns = "No Content", "Delete webhook"); + Ok(HttpResponse::NoContent().finish()) +} diff --git a/crates/meilisearch/src/search/federated/perform.rs b/crates/meilisearch/src/search/federated/perform.rs index 5ad64d63c..3c80c22e3 100644 --- a/crates/meilisearch/src/search/federated/perform.rs +++ b/crates/meilisearch/src/search/federated/perform.rs @@ -13,6 +13,7 @@ use meilisearch_types::error::ResponseError; use meilisearch_types::features::{Network, Remote}; use meilisearch_types::milli::order_by_map::OrderByMap; use meilisearch_types::milli::score_details::{ScoreDetails, WeightedScoreValue}; +use meilisearch_types::milli::vector::Embedding; use meilisearch_types::milli::{self, DocumentId, OrderBy, TimeBudget, DEFAULT_VALUES_PER_FACET}; use roaring::RoaringBitmap; use tokio::task::JoinHandle; @@ -46,6 +47,7 @@ pub async fn perform_federated_search( let deadline = before_search + std::time::Duration::from_secs(9); let required_hit_count = federation.limit + federation.offset; + let retrieve_vectors = queries.iter().any(|q| q.retrieve_vectors); let network = index_scheduler.network(); @@ -91,6 +93,7 @@ pub async fn perform_federated_search( federation, mut semantic_hit_count, mut results_by_index, + mut query_vectors, previous_query_data: _, facet_order, } = search_by_index; @@ -122,7 +125,26 @@ pub async fn perform_federated_search( .map(|hit| hit.hit()) .collect(); - // 3.3. merge facets + // 3.3. 
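The PATCH semantics for webhook headers above follow the usual tri-state `Setting` convention: a JSON value overwrites, an explicit `null` removes, and an absent key leaves the header untouched. A minimal sketch over plain maps with a stand-in `Setting` enum (the real one is `meilisearch_types::milli::update::Setting`):

```rust
use std::collections::BTreeMap;

enum Setting<T> {
    Set(T),  // key present with a value in the PATCH body
    Reset,   // key present but set to `null`
    NotSet,  // key absent
}

fn merge_headers(
    mut headers: BTreeMap<String, String>,
    patch: BTreeMap<String, Setting<String>>,
) -> BTreeMap<String, String> {
    for (name, value) in patch {
        match value {
            Setting::Set(value) => {
                headers.insert(name, value);
            }
            Setting::Reset => {
                headers.remove(&name);
            }
            Setting::NotSet => {}
        }
    }
    headers
}

fn main() {
    let headers = BTreeMap::from([("Authorization".to_string(), "Bearer old".to_string())]);
    let patch = BTreeMap::from([
        ("Authorization".to_string(), Setting::Set("Bearer new".to_string())),
        ("X-Extra".to_string(), Setting::Reset),
    ]);
    let merged = merge_headers(headers, patch);
    assert_eq!(merged.get("Authorization").map(String::as_str), Some("Bearer new"));
    assert!(!merged.contains_key("X-Extra"));
}
```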
merge query vectors + let query_vectors = if retrieve_vectors { + for remote_results in remote_results.iter_mut() { + if let Some(remote_vectors) = remote_results.query_vectors.take() { + for (key, value) in remote_vectors.into_iter() { + debug_assert!( + !query_vectors.contains_key(&key), + "Query vector for query {key} already exists" + ); + query_vectors.insert(key, value); + } + } + } + + Some(query_vectors) + } else { + None + }; + + // 3.4. merge facets let (facet_distribution, facet_stats, facets_by_index) = facet_order.merge(federation.merge_facets, remote_results, facets); @@ -140,6 +162,7 @@ pub async fn perform_federated_search( offset: federation.offset, estimated_total_hits, }, + query_vectors, semantic_hit_count, degraded, used_negative_operator, @@ -408,6 +431,7 @@ fn merge_metadata( hits: _, processing_time_ms, hits_info, + query_vectors: _, semantic_hit_count: _, facet_distribution: _, facet_stats: _, @@ -657,6 +681,7 @@ struct SearchByIndex { // Then when merging, we'll update its value if there is any semantic hit semantic_hit_count: Option, results_by_index: Vec, + query_vectors: BTreeMap, previous_query_data: Option<(RankingRules, usize, String)>, // remember the order and name of first index for each facet when merging with index settings // to detect if the order is inconsistent for a facet. @@ -674,6 +699,7 @@ impl SearchByIndex { federation, semantic_hit_count: None, results_by_index: Vec::with_capacity(index_count), + query_vectors: BTreeMap::new(), previous_query_data: None, } } @@ -745,10 +771,9 @@ impl SearchByIndex { match sort.iter().map(|s| milli::AscDesc::from_str(s)).collect() { Ok(sorts) => sorts, Err(asc_desc_error) => { - return Err(milli::Error::from(milli::SortError::from( - asc_desc_error, - )) - .into()) + return Err(milli::SortError::from(asc_desc_error) + .into_search_error() + .into()) } }; Some(sorts) @@ -838,8 +863,19 @@ impl SearchByIndex { document_scores, degraded: query_degraded, used_negative_operator: query_used_negative_operator, + query_vector, } = result; + if query.retrieve_vectors { + if let Some(query_vector) = query_vector { + debug_assert!( + !self.query_vectors.contains_key(&query_index), + "Query vector for query {query_index} already exists" + ); + self.query_vectors.insert(query_index, query_vector); + } + } + candidates |= query_candidates; degraded |= query_degraded; used_negative_operator |= query_used_negative_operator; diff --git a/crates/meilisearch/src/search/federated/types.rs b/crates/meilisearch/src/search/federated/types.rs index 3cf28c815..9c96fe768 100644 --- a/crates/meilisearch/src/search/federated/types.rs +++ b/crates/meilisearch/src/search/federated/types.rs @@ -18,6 +18,7 @@ use serde::{Deserialize, Serialize}; use utoipa::ToSchema; use super::super::{ComputedFacets, FacetStats, HitsInfo, SearchHit, SearchQueryWithIndex}; +use crate::milli::vector::Embedding; pub const DEFAULT_FEDERATED_WEIGHT: f64 = 1.0; @@ -117,6 +118,9 @@ pub struct FederatedSearchResult { #[serde(flatten)] pub hits_info: HitsInfo, + #[serde(default, skip_serializing_if = "Option::is_none")] + pub query_vectors: Option>, + #[serde(default, skip_serializing_if = "Option::is_none")] pub semantic_hit_count: Option, @@ -144,6 +148,7 @@ impl fmt::Debug for FederatedSearchResult { hits, processing_time_ms, hits_info, + query_vectors, semantic_hit_count, degraded, used_negative_operator, @@ -158,6 +163,10 @@ impl fmt::Debug for FederatedSearchResult { debug.field("processing_time_ms", &processing_time_ms); debug.field("hits", &format!("[{} hits 
returned]", hits.len())); debug.field("hits_info", &hits_info); + if let Some(query_vectors) = query_vectors { + let known = query_vectors.len(); + debug.field("query_vectors", &format!("[{known} known vectors]")); + } if *used_negative_operator { debug.field("used_negative_operator", used_negative_operator); } diff --git a/crates/meilisearch/src/search/mod.rs b/crates/meilisearch/src/search/mod.rs index 1c987a70c..fca8cc3a6 100644 --- a/crates/meilisearch/src/search/mod.rs +++ b/crates/meilisearch/src/search/mod.rs @@ -16,7 +16,7 @@ use meilisearch_types::error::{Code, ResponseError}; use meilisearch_types::heed::RoTxn; use meilisearch_types::index_uid::IndexUid; use meilisearch_types::locales::Locale; -use meilisearch_types::milli::index::{self, SearchParameters}; +use meilisearch_types::milli::index::{self, EmbeddingsWithMetadata, SearchParameters}; use meilisearch_types::milli::score_details::{ScoreDetails, ScoringStrategy}; use meilisearch_types::milli::vector::parsed_vectors::ExplicitVectors; use meilisearch_types::milli::vector::Embedder; @@ -841,6 +841,8 @@ pub struct SearchHit { pub struct SearchResult { pub hits: Vec, pub query: String, + #[serde(skip_serializing_if = "Option::is_none")] + pub query_vector: Option>, pub processing_time_ms: u128, #[serde(flatten)] pub hits_info: HitsInfo, @@ -865,6 +867,7 @@ impl fmt::Debug for SearchResult { let SearchResult { hits, query, + query_vector, processing_time_ms, hits_info, facet_distribution, @@ -879,6 +882,9 @@ impl fmt::Debug for SearchResult { debug.field("processing_time_ms", &processing_time_ms); debug.field("hits", &format!("[{} hits returned]", hits.len())); debug.field("query", &query); + if query_vector.is_some() { + debug.field("query_vector", &"[...]"); + } debug.field("hits_info", &hits_info); if *used_negative_operator { debug.field("used_negative_operator", used_negative_operator); @@ -1050,7 +1056,9 @@ pub fn prepare_search<'t>( .map(|x| x as usize) .unwrap_or(DEFAULT_PAGINATION_MAX_TOTAL_HITS); + search.retrieve_vectors(query.retrieve_vectors); search.exhaustive_number_hits(is_finite_pagination); + search.max_total_hits(Some(max_total_hits)); search.scoring_strategy( if query.show_ranking_score || query.show_ranking_score_details @@ -1091,7 +1099,7 @@ pub fn prepare_search<'t>( let sort = match sort.iter().map(|s| AscDesc::from_str(s)).collect() { Ok(sorts) => sorts, Err(asc_desc_error) => { - return Err(milli::Error::from(SortError::from(asc_desc_error)).into()) + return Err(SortError::from(asc_desc_error).into_search_error().into()) } }; @@ -1131,6 +1139,7 @@ pub fn perform_search( document_scores, degraded, used_negative_operator, + query_vector, }, semantic_hit_count, ) = search_from_kind(index_uid, search_kind, search)?; @@ -1221,6 +1230,7 @@ pub fn perform_search( hits: documents, hits_info, query: q.unwrap_or_default(), + query_vector, processing_time_ms: before_search.elapsed().as_millis(), facet_distribution, facet_stats, @@ -1527,8 +1537,11 @@ impl<'a> HitMaker<'a> { Some(Value::Object(map)) => map, _ => Default::default(), }; - for (name, (vector, regenerate)) in self.index.embeddings(self.rtxn, id)? { - let embeddings = ExplicitVectors { embeddings: Some(vector.into()), regenerate }; + for (name, EmbeddingsWithMetadata { embeddings, regenerate, has_fragments: _ }) in + self.index.embeddings(self.rtxn, id)? 
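Since the new `query_vector` field is marked `skip_serializing_if = "Option::is_none"`, existing clients never see the key unless a semantic or hybrid search actually produced an embedding. A small serde sketch of that behaviour, assuming the field shape is `Option<Vec<f32>>` as the `Embedding` alias suggests:

```rust
use serde::Serialize;

#[derive(Serialize)]
struct SearchResultSketch {
    query: String,
    #[serde(skip_serializing_if = "Option::is_none")]
    query_vector: Option<Vec<f32>>,
}

fn main() {
    let without = SearchResultSketch { query: "chair".into(), query_vector: None };
    let with = SearchResultSketch { query: "chair".into(), query_vector: Some(vec![0.1, 0.2]) };
    // {"query":"chair"} : no new key for callers that did not ask for vectors
    println!("{}", serde_json::to_string(&without).unwrap());
    // {"query":"chair","query_vector":[0.1,0.2]}
    println!("{}", serde_json::to_string(&with).unwrap());
}
```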
+ { + let embeddings = + ExplicitVectors { embeddings: Some(embeddings.into()), regenerate }; vectors.insert( name, serde_json::to_value(embeddings).map_err(InternalError::SerdeJson)?, @@ -1730,6 +1743,7 @@ pub fn perform_similar( document_scores, degraded: _, used_negative_operator: _, + query_vector: _, } = similar.execute().map_err(|err| match err { milli::Error::UserError(milli::UserError::InvalidFilter(_)) => { ResponseError::from_msg(err.to_string(), Code::InvalidSimilarFilter) @@ -2077,7 +2091,7 @@ pub(crate) fn parse_filter( })?; if let Some(ref filter) = filter { - // If the contains operator is used while the contains filter features is not enabled, errors out + // If the contains operator is used while the contains filter feature is not enabled, errors out if let Some((token, error)) = filter.use_contains_operator().zip(features.check_contains_filter().err()) { @@ -2088,6 +2102,18 @@ pub(crate) fn parse_filter( } } + if let Some(ref filter) = filter { + // If a vector filter is used while the multi modal feature is not enabled, errors out + if let Some((token, error)) = + filter.use_vector_filter().zip(features.check_multimodal("using a vector filter").err()) + { + return Err(ResponseError::from_msg( + token.as_external_error(error).to_string(), + Code::FeatureNotEnabled, + )); + } + } + Ok(filter) } diff --git a/crates/meilisearch/tests/auth/api_keys.rs b/crates/meilisearch/tests/auth/api_keys.rs index 2688dd918..8dca24ac4 100644 --- a/crates/meilisearch/tests/auth/api_keys.rs +++ b/crates/meilisearch/tests/auth/api_keys.rs @@ -419,14 +419,14 @@ async fn error_add_api_key_invalid_parameters_actions() { let (response, code) = server.add_api_key(content).await; meili_snap::snapshot!(code, @"400 Bad Request"); - meili_snap::snapshot!(meili_snap::json_string!(response, { ".createdAt" => "[ignored]", ".updatedAt" => "[ignored]" }), @r###" + meili_snap::snapshot!(meili_snap::json_string!(response, { ".createdAt" => "[ignored]", ".updatedAt" => "[ignored]" }), @r#" { - "message": "Unknown value `doc.add` at `.actions[0]`: expected one of `*`, `search`, `documents.*`, `documents.add`, `documents.get`, `documents.delete`, `indexes.*`, `indexes.create`, `indexes.get`, `indexes.update`, `indexes.delete`, `indexes.swap`, `tasks.*`, `tasks.cancel`, `tasks.delete`, `tasks.get`, `settings.*`, `settings.get`, `settings.update`, `stats.*`, `stats.get`, `metrics.*`, `metrics.get`, `dumps.*`, `dumps.create`, `snapshots.*`, `snapshots.create`, `version`, `keys.create`, `keys.get`, `keys.update`, `keys.delete`, `experimental.get`, `experimental.update`, `export`, `network.get`, `network.update`, `chatCompletions`, `chats.*`, `chats.get`, `chats.delete`, `chatsSettings.*`, `chatsSettings.get`, `chatsSettings.update`", + "message": "Unknown value `doc.add` at `.actions[0]`: expected one of `*`, `search`, `documents.*`, `documents.add`, `documents.get`, `documents.delete`, `indexes.*`, `indexes.create`, `indexes.get`, `indexes.update`, `indexes.delete`, `indexes.swap`, `tasks.*`, `tasks.cancel`, `tasks.delete`, `tasks.get`, `settings.*`, `settings.get`, `settings.update`, `stats.*`, `stats.get`, `metrics.*`, `metrics.get`, `dumps.*`, `dumps.create`, `snapshots.*`, `snapshots.create`, `version`, `keys.create`, `keys.get`, `keys.update`, `keys.delete`, `experimental.get`, `experimental.update`, `export`, `network.get`, `network.update`, `chatCompletions`, `chats.*`, `chats.get`, `chats.delete`, `chatsSettings.*`, `chatsSettings.get`, `chatsSettings.update`, `*.get`, `webhooks.get`, `webhooks.update`, 
`webhooks.delete`, `webhooks.create`, `webhooks.*`", "code": "invalid_api_key_actions", "type": "invalid_request", "link": "https://docs.meilisearch.com/errors#invalid_api_key_actions" } - "###); + "#); } #[actix_rt::test] @@ -790,7 +790,7 @@ async fn list_api_keys() { meili_snap::snapshot!(code, @"201 Created"); let (response, code) = server.list_api_keys("").await; - meili_snap::snapshot!(meili_snap::json_string!(response, { ".results[].createdAt" => "[ignored]", ".results[].updatedAt" => "[ignored]", ".results[].uid" => "[ignored]", ".results[].key" => "[ignored]" }), @r###" + meili_snap::snapshot!(meili_snap::json_string!(response, { ".results[].createdAt" => "[ignored]", ".results[].updatedAt" => "[ignored]", ".results[].uid" => "[ignored]", ".results[].key" => "[ignored]" }), @r#" { "results": [ { @@ -850,6 +850,22 @@ async fn list_api_keys() { "createdAt": "[ignored]", "updatedAt": "[ignored]" }, + { + "name": "Default Read-Only Admin API Key", + "description": "Use it to read information across the whole database. Caution! Do not expose this key on a public frontend", + "key": "[ignored]", + "uid": "[ignored]", + "actions": [ + "*.get", + "keys.get" + ], + "indexes": [ + "*" + ], + "expiresAt": null, + "createdAt": "[ignored]", + "updatedAt": "[ignored]" + }, { "name": "Default Chat API Key", "description": "Use it to chat and search from the frontend", @@ -869,9 +885,9 @@ async fn list_api_keys() { ], "offset": 0, "limit": 20, - "total": 4 + "total": 5 } - "###); + "#); meili_snap::snapshot!(code, @"200 OK"); } diff --git a/crates/meilisearch/tests/auth/errors.rs b/crates/meilisearch/tests/auth/errors.rs index 687cb67a0..2a40f4d2b 100644 --- a/crates/meilisearch/tests/auth/errors.rs +++ b/crates/meilisearch/tests/auth/errors.rs @@ -91,14 +91,14 @@ async fn create_api_key_bad_actions() { // can't parse let (response, code) = server.add_api_key(json!({ "actions": ["doggo"] })).await; snapshot!(code, @"400 Bad Request"); - snapshot!(json_string!(response), @r###" + snapshot!(json_string!(response), @r#" { - "message": "Unknown value `doggo` at `.actions[0]`: expected one of `*`, `search`, `documents.*`, `documents.add`, `documents.get`, `documents.delete`, `indexes.*`, `indexes.create`, `indexes.get`, `indexes.update`, `indexes.delete`, `indexes.swap`, `tasks.*`, `tasks.cancel`, `tasks.delete`, `tasks.get`, `settings.*`, `settings.get`, `settings.update`, `stats.*`, `stats.get`, `metrics.*`, `metrics.get`, `dumps.*`, `dumps.create`, `snapshots.*`, `snapshots.create`, `version`, `keys.create`, `keys.get`, `keys.update`, `keys.delete`, `experimental.get`, `experimental.update`, `export`, `network.get`, `network.update`, `chatCompletions`, `chats.*`, `chats.get`, `chats.delete`, `chatsSettings.*`, `chatsSettings.get`, `chatsSettings.update`", + "message": "Unknown value `doggo` at `.actions[0]`: expected one of `*`, `search`, `documents.*`, `documents.add`, `documents.get`, `documents.delete`, `indexes.*`, `indexes.create`, `indexes.get`, `indexes.update`, `indexes.delete`, `indexes.swap`, `tasks.*`, `tasks.cancel`, `tasks.delete`, `tasks.get`, `settings.*`, `settings.get`, `settings.update`, `stats.*`, `stats.get`, `metrics.*`, `metrics.get`, `dumps.*`, `dumps.create`, `snapshots.*`, `snapshots.create`, `version`, `keys.create`, `keys.get`, `keys.update`, `keys.delete`, `experimental.get`, `experimental.update`, `export`, `network.get`, `network.update`, `chatCompletions`, `chats.*`, `chats.get`, `chats.delete`, `chatsSettings.*`, `chatsSettings.get`, `chatsSettings.update`, `*.get`, 
`webhooks.get`, `webhooks.update`, `webhooks.delete`, `webhooks.create`, `webhooks.*`", "code": "invalid_api_key_actions", "type": "invalid_request", "link": "https://docs.meilisearch.com/errors#invalid_api_key_actions" } - "###); + "#); } #[actix_rt::test] diff --git a/crates/meilisearch/tests/batches/mod.rs b/crates/meilisearch/tests/batches/mod.rs index a409aba03..9d6bee7c1 100644 --- a/crates/meilisearch/tests/batches/mod.rs +++ b/crates/meilisearch/tests/batches/mod.rs @@ -1,21 +1,37 @@ mod errors; +use insta::internals::{Content, ContentPath}; use meili_snap::insta::assert_json_snapshot; -use meili_snap::snapshot; +use meili_snap::{json_string, snapshot}; +use once_cell::sync::Lazy; +use regex::Regex; use crate::common::Server; use crate::json; +static TASK_WITH_ID_RE: Lazy = + Lazy::new(|| Regex::new(r"task with id (\d+) of type").unwrap()); + +fn task_with_id_redaction(value: Content, _path: ContentPath) -> Content { + match value { + Content::String(s) => { + let replaced = TASK_WITH_ID_RE.replace_all(&s, "task with id X of type"); + Content::String(replaced.to_string()) + } + _ => value.clone(), + } +} + #[actix_rt::test] async fn error_get_unexisting_batch_status() { - let server = Server::new().await; - let index = server.index("test"); + let server = Server::new_shared(); + let index = server.unique_index(); let (task, _coder) = index.create(None).await; server.wait_task(task.uid()).await.succeeded(); - let (response, code) = index.get_batch(1).await; + let (response, code) = index.get_batch(u32::MAX).await; let expected_response = json!({ - "message": "Batch `1` not found.", + "message": format!("Batch `{}` not found.", u32::MAX), "code": "batch_not_found", "type": "invalid_request", "link": "https://docs.meilisearch.com/errors#batch_not_found" @@ -27,18 +43,18 @@ async fn error_get_unexisting_batch_status() { #[actix_rt::test] async fn get_batch_status() { - let server = Server::new().await; - let index = server.index("test"); + let server = Server::new_shared(); + let index = server.unique_index(); let (task, _status_code) = index.create(None).await; - server.wait_task(task.uid()).await.succeeded(); - let (_response, code) = index.get_batch(0).await; + let task = server.wait_task(task.uid()).await.succeeded(); + let (_response, code) = index.get_batch(task.batch_uid()).await; assert_eq!(code, 200); } #[actix_rt::test] async fn list_batches() { - let server = Server::new().await; - let index = server.index("test"); + let server = Server::new_shared(); + let index = server.unique_index(); let (task, _status_code) = index.create(None).await; server.wait_task(task.uid()).await.succeeded(); let (task, _status_code) = index.create(None).await; @@ -62,7 +78,7 @@ async fn list_batches_pagination_and_reverse() { let index = server.index(format!("test-{i}")); last_batch = Some(index.create(None).await.0.uid()); } - server.wait_task(last_batch.unwrap()).await; + server.wait_task(last_batch.unwrap()).await.succeeded(); let (response, code) = server.batches_filter("limit=3").await; assert_eq!(code, 200); @@ -119,91 +135,91 @@ async fn list_batches_with_star_filters() { let (response, code) = index.service.get("/batches?types=*,documentAdditionOrUpdate&statuses=*").await; - assert_eq!(code, 200, "{:?}", response); + assert_eq!(code, 200, "{response:?}"); assert_eq!(response["results"].as_array().unwrap().len(), 2); let (response, code) = index .service .get("/batches?types=*,documentAdditionOrUpdate&statuses=*,failed&indexUids=test") .await; - assert_eq!(code, 200, "{:?}", response); + 
assert_eq!(code, 200, "{response:?}"); assert_eq!(response["results"].as_array().unwrap().len(), 2); let (response, code) = index .service .get("/batches?types=*,documentAdditionOrUpdate&statuses=*,failed&indexUids=test,*") .await; - assert_eq!(code, 200, "{:?}", response); + assert_eq!(code, 200, "{response:?}"); assert_eq!(response["results"].as_array().unwrap().len(), 2); } #[actix_rt::test] async fn list_batches_status_filtered() { - let server = Server::new().await; - let index = server.index("test"); + let server = Server::new_shared(); + let index = server.unique_index(); let (task, _status_code) = index.create(None).await; server.wait_task(task.uid()).await.succeeded(); let (task, _status_code) = index.create(None).await; server.wait_task(task.uid()).await.failed(); let (response, code) = index.filtered_batches(&[], &["succeeded"], &[]).await; - assert_eq!(code, 200, "{}", response); + assert_eq!(code, 200, "{response}"); assert_eq!(response["results"].as_array().unwrap().len(), 1); let (response, code) = index.filtered_batches(&[], &["succeeded"], &[]).await; - assert_eq!(code, 200, "{}", response); + assert_eq!(code, 200, "{response}"); assert_eq!(response["results"].as_array().unwrap().len(), 1); let (response, code) = index.filtered_batches(&[], &["succeeded", "failed"], &[]).await; - assert_eq!(code, 200, "{}", response); + assert_eq!(code, 200, "{response}"); assert_eq!(response["results"].as_array().unwrap().len(), 2); } #[actix_rt::test] async fn list_batches_type_filtered() { - let server = Server::new().await; - let index = server.index("test"); + let server = Server::new_shared(); + let index = server.unique_index(); let (task, _) = index.create(None).await; server.wait_task(task.uid()).await.succeeded(); let (task, _) = index.delete().await; server.wait_task(task.uid()).await.succeeded(); let (response, code) = index.filtered_batches(&["indexCreation"], &[], &[]).await; - assert_eq!(code, 200, "{}", response); + assert_eq!(code, 200, "{response}"); assert_eq!(response["results"].as_array().unwrap().len(), 1); let (response, code) = - index.filtered_batches(&["indexCreation", "IndexDeletion"], &[], &[]).await; - assert_eq!(code, 200, "{}", response); + index.filtered_batches(&["indexCreation", "indexDeletion"], &[], &[]).await; + assert_eq!(code, 200, "{response}"); assert_eq!(response["results"].as_array().unwrap().len(), 2); let (response, code) = index.filtered_batches(&["indexCreation"], &[], &[]).await; - assert_eq!(code, 200, "{}", response); + assert_eq!(code, 200, "{response}"); assert_eq!(response["results"].as_array().unwrap().len(), 1); } #[actix_rt::test] async fn list_batches_invalid_canceled_by_filter() { - let server = Server::new().await; - let index = server.index("test"); + let server = Server::new_shared(); + let index = server.unique_index(); let (task, _status_code) = index.create(None).await; server.wait_task(task.uid()).await.succeeded(); let (response, code) = index.filtered_batches(&[], &[], &["0"]).await; - assert_eq!(code, 200, "{}", response); + assert_eq!(code, 200, "{response}"); assert_eq!(response["results"].as_array().unwrap().len(), 0); } #[actix_rt::test] async fn list_batches_status_and_type_filtered() { - let server = Server::new().await; - let index = server.index("test"); + let server = Server::new_shared(); + let index = server.unique_index(); let (task, _status_code) = index.create(None).await; server.wait_task(task.uid()).await.succeeded(); let (task, _status_code) = index.update(Some("id")).await; 
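A quick illustration of the `TASK_WITH_ID_RE` redaction introduced for these tests: on a shared server, task ids depend on scheduling order, so any batch detail quoting a concrete id has to be normalised before snapshotting.

```rust
use regex::Regex;

fn main() {
    // Same pattern and replacement as the helper above.
    let re = Regex::new(r"task with id (\d+) of type").unwrap();
    let msg = "created by task with id 42 of type indexCreation";
    assert_eq!(
        re.replace_all(msg, "task with id X of type"),
        "created by task with id X of type indexCreation"
    );
}
```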
server.wait_task(task.uid()).await.succeeded(); let (response, code) = index.filtered_batches(&["indexCreation"], &["failed"], &[]).await; - assert_eq!(code, 200, "{}", response); + assert_eq!(code, 200, "{response}"); assert_eq!(response["results"].as_array().unwrap().len(), 0); let (response, code) = index @@ -213,17 +229,17 @@ async fn list_batches_status_and_type_filtered() { &[], ) .await; - assert_eq!(code, 200, "{}", response); + assert_eq!(code, 200, "{response}"); assert_eq!(response["results"].as_array().unwrap().len(), 2); } #[actix_rt::test] async fn list_batch_filter_error() { - let server = Server::new().await; + let server = Server::new_shared(); let (response, code) = server.batches_filter("lol=pied").await; - assert_eq!(code, 400, "{}", response); - meili_snap::snapshot!(meili_snap::json_string!(response), @r#" + assert_eq!(code, 400, "{response}"); + snapshot!(json_string!(response), @r#" { "message": "Unknown parameter `lol`: expected one of `limit`, `from`, `reverse`, `batchUids`, `uids`, `canceledBy`, `types`, `statuses`, `indexUids`, `afterEnqueuedAt`, `beforeEnqueuedAt`, `afterStartedAt`, `beforeStartedAt`, `afterFinishedAt`, `beforeFinishedAt`", "code": "bad_request", @@ -233,8 +249,8 @@ async fn list_batch_filter_error() { "#); let (response, code) = server.batches_filter("uids=pied").await; - assert_eq!(code, 400, "{}", response); - meili_snap::snapshot!(meili_snap::json_string!(response), @r#" + assert_eq!(code, 400, "{response}"); + snapshot!(json_string!(response), @r#" { "message": "Invalid value in parameter `uids`: could not parse `pied` as a positive integer", "code": "invalid_task_uids", @@ -244,8 +260,8 @@ async fn list_batch_filter_error() { "#); let (response, code) = server.batches_filter("from=pied").await; - assert_eq!(code, 400, "{}", response); - meili_snap::snapshot!(meili_snap::json_string!(response), @r#" + assert_eq!(code, 400, "{response}"); + snapshot!(json_string!(response), @r#" { "message": "Invalid value in parameter `from`: could not parse `pied` as a positive integer", "code": "invalid_task_from", @@ -255,8 +271,8 @@ async fn list_batch_filter_error() { "#); let (response, code) = server.batches_filter("beforeStartedAt=pied").await; - assert_eq!(code, 400, "{}", response); - meili_snap::snapshot!(meili_snap::json_string!(response), @r#" + assert_eq!(code, 400, "{response}"); + snapshot!(json_string!(response), @r#" { "message": "Invalid value in parameter `beforeStartedAt`: `pied` is an invalid date-time. 
It should follow the YYYY-MM-DD or RFC 3339 date-time format.", "code": "invalid_task_before_started_at", @@ -268,25 +284,28 @@ async fn list_batch_filter_error() { #[actix_web::test] async fn test_summarized_document_addition_or_update() { - let server = Server::new().await; - let index = server.index("test"); + let server = Server::new_shared(); + let index = server.unique_index(); let (task, _status_code) = index.add_documents(json!({ "id": 42, "content": "doggos & fluff" }), None).await; - server.wait_task(task.uid()).await.succeeded(); - let (batch, _) = index.get_batch(0).await; + let task = server.wait_task(task.uid()).await.succeeded(); + let (batch, _) = index.get_batch(task.batch_uid()).await; assert_json_snapshot!(batch, { + ".uid" => "[uid]", ".duration" => "[duration]", ".enqueuedAt" => "[date]", ".startedAt" => "[date]", ".finishedAt" => "[date]", ".stats.progressTrace" => "[progressTrace]", ".stats.writeChannelCongestion" => "[writeChannelCongestion]", - ".stats.internalDatabaseSizes" => "[internalDatabaseSizes]" + ".stats.internalDatabaseSizes" => "[internalDatabaseSizes]", + ".stats.indexUids" => r#"{"[uuid]": 1}"#, + ".batchStrategy" => "batched all enqueued tasks", }, @r###" { - "uid": 0, + "uid": "[uid]", "progress": null, "details": { "receivedDocuments": 1, @@ -300,9 +319,7 @@ async fn test_summarized_document_addition_or_update() { "types": { "documentAdditionOrUpdate": 1 }, - "indexUids": { - "test": 1 - }, + "indexUids": "{\"[uuid]\": 1}", "progressTrace": "[progressTrace]", "writeChannelCongestion": "[writeChannelCongestion]", "internalDatabaseSizes": "[internalDatabaseSizes]" @@ -316,21 +333,24 @@ async fn test_summarized_document_addition_or_update() { let (task, _status_code) = index.add_documents(json!({ "id": 42, "content": "doggos & fluff" }), Some("id")).await; - server.wait_task(task.uid()).await.succeeded(); - let (batch, _) = index.get_batch(1).await; + let task = server.wait_task(task.uid()).await.succeeded(); + let (batch, _) = index.get_batch(task.batch_uid()).await; assert_json_snapshot!(batch, { + ".uid" => "[uid]", ".duration" => "[duration]", ".enqueuedAt" => "[date]", ".startedAt" => "[date]", ".finishedAt" => "[date]", ".stats.progressTrace" => "[progressTrace]", ".stats.writeChannelCongestion" => "[writeChannelCongestion]", - ".stats.internalDatabaseSizes" => "[internalDatabaseSizes]" + ".stats.internalDatabaseSizes" => "[internalDatabaseSizes]", + ".stats.indexUids" => r#"{"[uuid]": 1}"#, + ".batchStrategy" => "batched all enqueued tasks", }, @r###" { - "uid": 1, + "uid": "[uid]", "progress": null, "details": { "receivedDocuments": 1, @@ -344,9 +364,7 @@ async fn test_summarized_document_addition_or_update() { "types": { "documentAdditionOrUpdate": 1 }, - "indexUids": { - "test": 1 - }, + "indexUids": "{\"[uuid]\": 1}", "progressTrace": "[progressTrace]", "writeChannelCongestion": "[writeChannelCongestion]" }, @@ -360,23 +378,29 @@ async fn test_summarized_document_addition_or_update() { #[actix_web::test] async fn test_summarized_delete_documents_by_batch() { - let server = Server::new().await; - let index = server.index("test"); - let (task, _status_code) = index.delete_batch(vec![1, 2, 3]).await; - server.wait_task(task.uid()).await.failed(); - let (batch, _) = index.get_batch(0).await; + let server = Server::new_shared(); + let index = server.unique_index(); + let task_uid_1 = (u32::MAX - 1) as u64; + let task_uid_2 = (u32::MAX - 2) as u64; + let task_uid_3 = (u32::MAX - 3) as u64; + let (task, _status_code) = 
index.delete_batch(vec![task_uid_1, task_uid_2, task_uid_3]).await; + let task = server.wait_task(task.uid()).await.failed(); + let (batch, _) = index.get_batch(task.batch_uid()).await; assert_json_snapshot!(batch, { + ".uid" => "[uid]", ".duration" => "[duration]", ".enqueuedAt" => "[date]", ".startedAt" => "[date]", ".finishedAt" => "[date]", ".stats.progressTrace" => "[progressTrace]", - ".stats.writeChannelCongestion" => "[writeChannelCongestion]" + ".stats.writeChannelCongestion" => "[writeChannelCongestion]", + ".stats.indexUids" => r#"{"[uuid]": 1}"#, + ".batchStrategy" => "batched all enqueued tasks", }, @r###" { - "uid": 0, + "uid": "[uid]", "progress": null, "details": { "providedIds": 3, @@ -390,9 +414,7 @@ async fn test_summarized_delete_documents_by_batch() { "types": { "documentDeletion": 1 }, - "indexUids": { - "test": 1 - }, + "indexUids": "{\"[uuid]\": 1}", "progressTrace": "[progressTrace]" }, "duration": "[duration]", @@ -404,21 +426,24 @@ async fn test_summarized_delete_documents_by_batch() { index.create(None).await; let (task, _status_code) = index.delete_batch(vec![42]).await; - server.wait_task(task.uid()).await.succeeded(); - let (batch, _) = index.get_batch(2).await; + let task = server.wait_task(task.uid()).await.succeeded(); + let (batch, _) = index.get_batch(task.batch_uid()).await; assert_json_snapshot!(batch, { + ".uid" => "[uid]", ".duration" => "[duration]", ".enqueuedAt" => "[date]", ".startedAt" => "[date]", ".finishedAt" => "[date]", ".stats.progressTrace" => "[progressTrace]", ".stats.writeChannelCongestion" => "[writeChannelCongestion]", - ".stats.internalDatabaseSizes" => "[internalDatabaseSizes]" + ".stats.internalDatabaseSizes" => "[internalDatabaseSizes]", + ".stats.indexUids" => r#"{"[uuid]": 1}"#, + ".batchStrategy" => "batched all enqueued tasks", }, @r###" { - "uid": 2, + "uid": "[uid]", "progress": null, "details": { "providedIds": 1, @@ -432,9 +457,7 @@ async fn test_summarized_delete_documents_by_batch() { "types": { "documentDeletion": 1 }, - "indexUids": { - "test": 1 - }, + "indexUids": "{\"[uuid]\": 1}", "progressTrace": "[progressTrace]" }, "duration": "[duration]", @@ -447,25 +470,28 @@ async fn test_summarized_delete_documents_by_batch() { #[actix_web::test] async fn test_summarized_delete_documents_by_filter() { - let server = Server::new().await; - let index = server.index("test"); + let server = Server::new_shared(); + let index = server.unique_index(); let (task, _status_code) = index.delete_document_by_filter(json!({ "filter": "doggo = bernese" })).await; - server.wait_task(task.uid()).await.failed(); - let (batch, _) = index.get_batch(0).await; + let task = server.wait_task(task.uid()).await.failed(); + let (batch, _) = index.get_batch(task.batch_uid()).await; assert_json_snapshot!(batch, { + ".uid" => "[uid]", ".duration" => "[duration]", ".enqueuedAt" => "[date]", ".startedAt" => "[date]", ".finishedAt" => "[date]", ".stats.progressTrace" => "[progressTrace]", - ".stats.writeChannelCongestion" => "[writeChannelCongestion]" + ".stats.writeChannelCongestion" => "[writeChannelCongestion]", + ".stats.indexUids" => r#"{"[uuid]": 1}"#, + ".batchStrategy" => "batched all enqueued tasks", }, @r###" { - "uid": 0, + "uid": "[uid]", "progress": null, "details": { "providedIds": 0, @@ -480,9 +506,7 @@ async fn test_summarized_delete_documents_by_filter() { "types": { "documentDeletion": 1 }, - "indexUids": { - "test": 1 - }, + "indexUids": "{\"[uuid]\": 1}", "progressTrace": "[progressTrace]" }, "duration": "[duration]", @@ -495,21 
+519,24 @@ async fn test_summarized_delete_documents_by_filter() { index.create(None).await; let (task, _status_code) = index.delete_document_by_filter(json!({ "filter": "doggo = bernese" })).await; - server.wait_task(task.uid()).await.failed(); - let (batch, _) = index.get_batch(2).await; + let task = server.wait_task(task.uid()).await.failed(); + let (batch, _) = index.get_batch(task.batch_uid()).await; assert_json_snapshot!(batch, { + ".uid" => "[uid]", ".duration" => "[duration]", ".enqueuedAt" => "[date]", ".startedAt" => "[date]", ".finishedAt" => "[date]", ".stats.progressTrace" => "[progressTrace]", ".stats.writeChannelCongestion" => "[writeChannelCongestion]", - ".stats.internalDatabaseSizes" => "[internalDatabaseSizes]" + ".stats.internalDatabaseSizes" => "[internalDatabaseSizes]", + ".stats.indexUids" => r#"{"[uuid]": 1}"#, + ".batchStrategy" => "batched all enqueued tasks", }, @r###" { - "uid": 2, + "uid": "[uid]", "progress": null, "details": { "providedIds": 0, @@ -524,9 +551,7 @@ async fn test_summarized_delete_documents_by_filter() { "types": { "documentDeletion": 1 }, - "indexUids": { - "test": 1 - }, + "indexUids": "{\"[uuid]\": 1}", "progressTrace": "[progressTrace]" }, "duration": "[duration]", @@ -539,21 +564,24 @@ async fn test_summarized_delete_documents_by_filter() { index.update_settings(json!({ "filterableAttributes": ["doggo"] })).await; let (task, _status_code) = index.delete_document_by_filter(json!({ "filter": "doggo = bernese" })).await; - server.wait_task(task.uid()).await.succeeded(); - let (batch, _) = index.get_batch(4).await; + let task = server.wait_task(task.uid()).await.succeeded(); + let (batch, _) = index.get_batch(task.batch_uid()).await; assert_json_snapshot!(batch, { + ".uid" => "[uid]", ".duration" => "[duration]", ".enqueuedAt" => "[date]", ".startedAt" => "[date]", ".finishedAt" => "[date]", ".stats.progressTrace" => "[progressTrace]", ".stats.writeChannelCongestion" => "[writeChannelCongestion]", - ".stats.internalDatabaseSizes" => "[internalDatabaseSizes]" + ".stats.internalDatabaseSizes" => "[internalDatabaseSizes]", + ".stats.indexUids" => r#"{"[uuid]": 1}"#, + ".batchStrategy" => "batched all enqueued tasks" }, @r###" { - "uid": 4, + "uid": "[uid]", "progress": null, "details": { "providedIds": 0, @@ -568,9 +596,7 @@ async fn test_summarized_delete_documents_by_filter() { "types": { "documentDeletion": 1 }, - "indexUids": { - "test": 1 - }, + "indexUids": "{\"[uuid]\": 1}", "progressTrace": "[progressTrace]" }, "duration": "[duration]", @@ -583,11 +609,11 @@ async fn test_summarized_delete_documents_by_filter() { #[actix_web::test] async fn test_summarized_delete_document_by_id() { - let server = Server::new().await; - let index = server.index("test"); + let server = Server::new_shared(); + let index = server.unique_index(); let (task, _status_code) = index.delete_document(1).await; - server.wait_task(task.uid()).await.failed(); - let (batch, _) = index.get_batch(0).await; + let task = server.wait_task(task.uid()).await.failed(); + let (batch, _) = index.get_batch(task.batch_uid()).await; assert_json_snapshot!(batch, { ".uid" => "[uid]", @@ -596,7 +622,9 @@ async fn test_summarized_delete_document_by_id() { ".startedAt" => "[date]", ".finishedAt" => "[date]", ".stats.progressTrace" => "[progressTrace]", - ".stats.writeChannelCongestion" => "[writeChannelCongestion]" + ".stats.writeChannelCongestion" => "[writeChannelCongestion]", + ".stats.indexUids" => r#"{"[uuid]": 1}"#, + ".batchStrategy" => "batched all enqueued tasks", }, @r###" { @@ 
-614,9 +642,7 @@ async fn test_summarized_delete_document_by_id() { "types": { "documentDeletion": 1 }, - "indexUids": { - "test": 1 - }, + "indexUids": "{\"[uuid]\": 1}", "progressTrace": "[progressTrace]" }, "duration": "[duration]", @@ -628,21 +654,24 @@ async fn test_summarized_delete_document_by_id() { index.create(None).await; let (task, _status_code) = index.delete_document(42).await; - server.wait_task(task.uid()).await.succeeded(); - let (batch, _) = index.get_batch(2).await; + let task = server.wait_task(task.uid()).await.succeeded(); + let (batch, _) = index.get_batch(task.batch_uid()).await; assert_json_snapshot!(batch, { + ".uid" => "[uid]", ".duration" => "[duration]", ".enqueuedAt" => "[date]", ".startedAt" => "[date]", ".finishedAt" => "[date]", ".stats.progressTrace" => "[progressTrace]", ".stats.writeChannelCongestion" => "[writeChannelCongestion]", - ".stats.internalDatabaseSizes" => "[internalDatabaseSizes]" + ".stats.internalDatabaseSizes" => "[internalDatabaseSizes]", + ".stats.indexUids" => r#"{"[uuid]": 1}"#, + ".batchStrategy" => "batched all enqueued tasks", }, @r###" { - "uid": 2, + "uid": "[uid]", "progress": null, "details": { "providedIds": 1, @@ -656,9 +685,7 @@ async fn test_summarized_delete_document_by_id() { "types": { "documentDeletion": 1 }, - "indexUids": { - "test": 1 - }, + "indexUids": "{\"[uuid]\": 1}", "progressTrace": "[progressTrace]" }, "duration": "[duration]", @@ -671,12 +698,12 @@ async fn test_summarized_delete_document_by_id() { #[actix_web::test] async fn test_summarized_settings_update() { - let server = Server::new().await; - let index = server.index("test"); + let server = Server::new_shared(); + let index = server.unique_index(); // here we should find my payload even in the failed batch. let (response, code) = index.update_settings(json!({ "rankingRules": ["custom"] })).await; - meili_snap::snapshot!(code, @"400 Bad Request"); - meili_snap::snapshot!(meili_snap::json_string!(response), @r###" + snapshot!(code, @"400 Bad Request"); + snapshot!(json_string!(response), @r###" { "message": "Invalid value at `.rankingRules[0]`: `custom` ranking rule is invalid. 
Valid ranking rules are words, typo, sort, proximity, attribute, exactness and custom ranking rules.", "code": "invalid_settings_ranking_rules", @@ -686,21 +713,24 @@ async fn test_summarized_settings_update() { "###); let (task,_status_code) = index.update_settings(json!({ "displayedAttributes": ["doggos", "name"], "filterableAttributes": ["age", "nb_paw_pads"], "sortableAttributes": ["iq"] })).await; - server.wait_task(task.uid()).await.succeeded(); - let (batch, _) = index.get_batch(0).await; + let task = server.wait_task(task.uid()).await.succeeded(); + let (batch, _) = index.get_batch(task.batch_uid()).await; assert_json_snapshot!(batch, { + ".uid" => "[uid]", ".duration" => "[duration]", ".enqueuedAt" => "[date]", ".startedAt" => "[date]", ".finishedAt" => "[date]", ".stats.progressTrace" => "[progressTrace]", ".stats.writeChannelCongestion" => "[writeChannelCongestion]", - ".stats.internalDatabaseSizes" => "[internalDatabaseSizes]" + ".stats.internalDatabaseSizes" => "[internalDatabaseSizes]", + ".stats.indexUids" => r#"{"[uuid]": 1}"#, + ".batchStrategy" => "batched all enqueued tasks" }, @r###" { - "uid": 0, + "uid": "[uid]", "progress": null, "details": { "displayedAttributes": [ @@ -723,9 +753,7 @@ async fn test_summarized_settings_update() { "types": { "settingsUpdate": 1 }, - "indexUids": { - "test": 1 - }, + "indexUids": "{\"[uuid]\": 1}", "progressTrace": "[progressTrace]" }, "duration": "[duration]", @@ -738,23 +766,26 @@ async fn test_summarized_settings_update() { #[actix_web::test] async fn test_summarized_index_creation() { - let server = Server::new().await; - let index = server.index("test"); + let server = Server::new_shared(); + let index = server.unique_index(); let (task, _status_code) = index.create(None).await; - server.wait_task(task.uid()).await.succeeded(); - let (batch, _) = index.get_batch(0).await; + let task = server.wait_task(task.uid()).await.succeeded(); + let (batch, _) = index.get_batch(task.batch_uid()).await; assert_json_snapshot!(batch, { + ".uid" => "[uid]", ".duration" => "[duration]", ".enqueuedAt" => "[date]", ".startedAt" => "[date]", ".finishedAt" => "[date]", ".stats.progressTrace" => "[progressTrace]", - ".stats.writeChannelCongestion" => "[writeChannelCongestion]" + ".stats.writeChannelCongestion" => "[writeChannelCongestion]", + ".stats.indexUids" => r#"{"[uuid]": 1}"#, + ".batchStrategy" => insta::dynamic_redaction(task_with_id_redaction), }, @r###" { - "uid": 0, + "uid": "[uid]", "progress": null, "details": {}, "stats": { @@ -765,33 +796,34 @@ async fn test_summarized_index_creation() { "types": { "indexCreation": 1 }, - "indexUids": { - "test": 1 - }, + "indexUids": "{\"[uuid]\": 1}", "progressTrace": "[progressTrace]" }, "duration": "[duration]", "startedAt": "[date]", "finishedAt": "[date]", - "batchStrategy": "created batch containing only task with id 0 of type `indexCreation` that cannot be batched with any other task." + "batchStrategy": "created batch containing only task with id X of type `indexCreation` that cannot be batched with any other task." 
} "###); let (task, _status_code) = index.create(Some("doggos")).await; - server.wait_task(task.uid()).await.failed(); - let (batch, _) = index.get_batch(1).await; + let task = server.wait_task(task.uid()).await.failed(); + let (batch, _) = index.get_batch(task.batch_uid()).await; assert_json_snapshot!(batch, { + ".uid" => "[uid]", ".duration" => "[duration]", ".enqueuedAt" => "[date]", ".startedAt" => "[date]", ".finishedAt" => "[date]", ".stats.progressTrace" => "[progressTrace]", - ".stats.writeChannelCongestion" => "[writeChannelCongestion]" + ".stats.writeChannelCongestion" => "[writeChannelCongestion]", + ".stats.indexUids" => r#"{"[uuid]": 1}"#, + ".batchStrategy" => insta::dynamic_redaction(task_with_id_redaction), }, @r###" { - "uid": 1, + "uid": "[uid]", "progress": null, "details": { "primaryKey": "doggos" @@ -804,23 +836,21 @@ async fn test_summarized_index_creation() { "types": { "indexCreation": 1 }, - "indexUids": { - "test": 1 - }, + "indexUids": "{\"[uuid]\": 1}", "progressTrace": "[progressTrace]" }, "duration": "[duration]", "startedAt": "[date]", "finishedAt": "[date]", - "batchStrategy": "created batch containing only task with id 1 of type `indexCreation` that cannot be batched with any other task." + "batchStrategy": "created batch containing only task with id X of type `indexCreation` that cannot be batched with any other task." } "###); } #[actix_web::test] async fn test_summarized_index_deletion() { - let server = Server::new().await; - let index = server.index("test"); + let server = Server::new_shared(); + let index = server.unique_index(); let (ret, _code) = index.delete().await; let batch = server.wait_task(ret.uid()).await.failed(); snapshot!(batch, @@ -828,7 +858,7 @@ async fn test_summarized_index_deletion() { { "uid": "[uid]", "batchUid": "[batch_uid]", - "indexUid": "test", + "indexUid": "[uuid]", "status": "failed", "type": "indexDeletion", "canceledBy": null, @@ -836,7 +866,7 @@ async fn test_summarized_index_deletion() { "deletedDocuments": 0 }, "error": { - "message": "Index `test` not found.", + "message": "Index `[uuid]` not found.", "code": "index_not_found", "type": "invalid_request", "link": "https://docs.meilisearch.com/errors#index_not_found" @@ -859,7 +889,7 @@ async fn test_summarized_index_deletion() { { "uid": "[uid]", "batchUid": "[batch_uid]", - "indexUid": "test", + "indexUid": "[uuid]", "status": "succeeded", "type": "documentAdditionOrUpdate", "canceledBy": null, @@ -882,7 +912,7 @@ async fn test_summarized_index_deletion() { { "uid": "[uid]", "batchUid": "[batch_uid]", - "indexUid": "test", + "indexUid": "[uuid]", "status": "succeeded", "type": "indexDeletion", "canceledBy": null, @@ -905,7 +935,7 @@ async fn test_summarized_index_deletion() { { "uid": "[uid]", "batchUid": "[batch_uid]", - "indexUid": "test", + "indexUid": "[uuid]", "status": "failed", "type": "indexDeletion", "canceledBy": null, @@ -913,7 +943,7 @@ async fn test_summarized_index_deletion() { "deletedDocuments": 0 }, "error": { - "message": "Index `test` not found.", + "message": "Index `[uuid]` not found.", "code": "index_not_found", "type": "invalid_request", "link": "https://docs.meilisearch.com/errors#index_not_found" @@ -928,24 +958,27 @@ async fn test_summarized_index_deletion() { #[actix_web::test] async fn test_summarized_index_update() { - let server = Server::new().await; - let index = server.index("test"); + let server = Server::new_shared(); + let index = server.unique_index(); // If the index doesn't exist yet, we should get errors with or without the 
primary key. let (task, _status_code) = index.update(None).await; - server.wait_task(task.uid()).await.failed(); - let (batch, _) = index.get_batch(0).await; + let task = server.wait_task(task.uid()).await.failed(); + let (batch, _) = index.get_batch(task.batch_uid()).await; assert_json_snapshot!(batch, { + ".uid" => "[uid]", ".duration" => "[duration]", ".enqueuedAt" => "[date]", ".startedAt" => "[date]", ".finishedAt" => "[date]", ".stats.progressTrace" => "[progressTrace]", - ".stats.writeChannelCongestion" => "[writeChannelCongestion]" + ".stats.writeChannelCongestion" => "[writeChannelCongestion]", + ".stats.indexUids" => r#"{"[uuid]": 1}"#, + ".batchStrategy" => insta::dynamic_redaction(task_with_id_redaction), }, @r###" { - "uid": 0, + "uid": "[uid]", "progress": null, "details": {}, "stats": { @@ -956,33 +989,34 @@ async fn test_summarized_index_update() { "types": { "indexUpdate": 1 }, - "indexUids": { - "test": 1 - }, + "indexUids": "{\"[uuid]\": 1}", "progressTrace": "[progressTrace]" }, "duration": "[duration]", "startedAt": "[date]", "finishedAt": "[date]", - "batchStrategy": "created batch containing only task with id 0 of type `indexUpdate` that cannot be batched with any other task." + "batchStrategy": "created batch containing only task with id X of type `indexUpdate` that cannot be batched with any other task." } "###); let (task, _status_code) = index.update(Some("bones")).await; - server.wait_task(task.uid()).await.failed(); - let (batch, _) = index.get_batch(1).await; + let task = server.wait_task(task.uid()).await.failed(); + let (batch, _) = index.get_batch(task.batch_uid()).await; assert_json_snapshot!(batch, { + ".uid" => "[uid]", ".duration" => "[duration]", ".enqueuedAt" => "[date]", ".startedAt" => "[date]", ".finishedAt" => "[date]", ".stats.progressTrace" => "[progressTrace]", - ".stats.writeChannelCongestion" => "[writeChannelCongestion]" + ".stats.writeChannelCongestion" => "[writeChannelCongestion]", + ".stats.indexUids" => r#"{"[uuid]": 1}"#, + ".batchStrategy" => insta::dynamic_redaction(task_with_id_redaction), }, @r###" { - "uid": 1, + "uid": "[uid]", "progress": null, "details": { "primaryKey": "bones" @@ -995,36 +1029,37 @@ async fn test_summarized_index_update() { "types": { "indexUpdate": 1 }, - "indexUids": { - "test": 1 - }, + "indexUids": "{\"[uuid]\": 1}", "progressTrace": "[progressTrace]" }, "duration": "[duration]", "startedAt": "[date]", "finishedAt": "[date]", - "batchStrategy": "created batch containing only task with id 1 of type `indexUpdate` that cannot be batched with any other task." + "batchStrategy": "created batch containing only task with id X of type `indexUpdate` that cannot be batched with any other task." } "###); - // And run the same two tests once the index do exists. + // And run the same two tests once the index does exist. 
index.create(None).await; let (task, _status_code) = index.update(None).await; - server.wait_task(task.uid()).await.succeeded(); - let (batch, _) = index.get_batch(3).await; + let task = server.wait_task(task.uid()).await.succeeded(); + let (batch, _) = index.get_batch(task.batch_uid()).await; assert_json_snapshot!(batch, { + ".uid" => "[uid]", ".duration" => "[duration]", ".enqueuedAt" => "[date]", ".startedAt" => "[date]", ".finishedAt" => "[date]", ".stats.progressTrace" => "[progressTrace]", - ".stats.writeChannelCongestion" => "[writeChannelCongestion]" + ".stats.writeChannelCongestion" => "[writeChannelCongestion]", + ".stats.indexUids" => r#"{"[uuid]": 1}"#, + ".batchStrategy" => insta::dynamic_redaction(task_with_id_redaction), }, @r###" { - "uid": 3, + "uid": "[uid]", "progress": null, "details": {}, "stats": { @@ -1035,33 +1070,34 @@ async fn test_summarized_index_update() { "types": { "indexUpdate": 1 }, - "indexUids": { - "test": 1 - }, + "indexUids": "{\"[uuid]\": 1}", "progressTrace": "[progressTrace]" }, "duration": "[duration]", "startedAt": "[date]", "finishedAt": "[date]", - "batchStrategy": "created batch containing only task with id 3 of type `indexUpdate` that cannot be batched with any other task." + "batchStrategy": "created batch containing only task with id X of type `indexUpdate` that cannot be batched with any other task." } "###); let (task, _status_code) = index.update(Some("bones")).await; - server.wait_task(task.uid()).await.succeeded(); - let (batch, _) = index.get_batch(4).await; + let task = server.wait_task(task.uid()).await.succeeded(); + let (batch, _) = index.get_batch(task.batch_uid()).await; assert_json_snapshot!(batch, { + ".uid" => "[uid]", ".duration" => "[duration]", ".enqueuedAt" => "[date]", ".startedAt" => "[date]", ".finishedAt" => "[date]", ".stats.progressTrace" => "[progressTrace]", - ".stats.writeChannelCongestion" => "[writeChannelCongestion]" + ".stats.writeChannelCongestion" => "[writeChannelCongestion]", + ".stats.indexUids" => r#"{"[uuid]": 1}"#, + ".batchStrategy" => insta::dynamic_redaction(task_with_id_redaction), }, @r###" { - "uid": 4, + "uid": "[uid]", "progress": null, "details": { "primaryKey": "bones" @@ -1074,41 +1110,41 @@ async fn test_summarized_index_update() { "types": { "indexUpdate": 1 }, - "indexUids": { - "test": 1 - }, + "indexUids": "{\"[uuid]\": 1}", "progressTrace": "[progressTrace]" }, "duration": "[duration]", "startedAt": "[date]", "finishedAt": "[date]", - "batchStrategy": "created batch containing only task with id 4 of type `indexUpdate` that cannot be batched with any other task." + "batchStrategy": "created batch containing only task with id X of type `indexUpdate` that cannot be batched with any other task." 
} "###); } #[actix_web::test] async fn test_summarized_index_swap() { - let server = Server::new().await; + let server = Server::new_shared(); let (task, _status_code) = server .index_swap(json!([ { "indexes": ["doggos", "cattos"] } ])) .await; - server.wait_task(task.uid()).await.failed(); - let (batch, _) = server.get_batch(0).await; + let task = server.wait_task(task.uid()).await.failed(); + let (batch, _) = server.get_batch(task.batch_uid()).await; assert_json_snapshot!(batch, { + ".uid" => "[uid]", ".duration" => "[duration]", ".enqueuedAt" => "[date]", ".startedAt" => "[date]", ".finishedAt" => "[date]", ".stats.progressTrace" => "[progressTrace]", - ".stats.writeChannelCongestion" => "[writeChannelCongestion]" + ".stats.writeChannelCongestion" => "[writeChannelCongestion]", + ".batchStrategy" => insta::dynamic_redaction(task_with_id_redaction), }, @r###" { - "uid": 0, + "uid": "[uid]", "progress": null, "details": { "swaps": [ @@ -1134,31 +1170,36 @@ async fn test_summarized_index_swap() { "duration": "[duration]", "startedAt": "[date]", "finishedAt": "[date]", - "batchStrategy": "created batch containing only task with id 0 of type `indexSwap` that cannot be batched with any other task." + "batchStrategy": "created batch containing only task with id X of type `indexSwap` that cannot be batched with any other task." } "###); - server.index("doggos").create(None).await; - let (task, _status_code) = server.index("cattos").create(None).await; + let doggos_index = server.unique_index(); + doggos_index.create(None).await; + let cattos_index = server.unique_index(); + let (task, _status_code) = cattos_index.create(None).await; server .index_swap(json!([ - { "indexes": ["doggos", "cattos"] } + { "indexes": [doggos_index.uid, cattos_index.uid] } ])) .await; - server.wait_task(task.uid()).await.succeeded(); - let (batch, _) = server.get_batch(1).await; + let task = server.wait_task(task.uid()).await.succeeded(); + let (batch, _) = server.get_batch(task.batch_uid()).await; assert_json_snapshot!(batch, { + ".uid" => "[uid]", ".duration" => "[duration]", ".enqueuedAt" => "[date]", ".startedAt" => "[date]", ".finishedAt" => "[date]", ".stats.progressTrace" => "[progressTrace]", - ".stats.writeChannelCongestion" => "[writeChannelCongestion]" + ".stats.writeChannelCongestion" => "[writeChannelCongestion]", + ".stats.indexUids" => r#"{"[uuid]": 1}"#, + ".batchStrategy" => insta::dynamic_redaction(task_with_id_redaction), }, @r###" { - "uid": 1, + "uid": "[uid]", "progress": null, "details": {}, "stats": { @@ -1169,46 +1210,47 @@ async fn test_summarized_index_swap() { "types": { "indexCreation": 1 }, - "indexUids": { - "doggos": 1 - }, + "indexUids": "{\"[uuid]\": 1}", "progressTrace": "[progressTrace]" }, "duration": "[duration]", "startedAt": "[date]", "finishedAt": "[date]", - "batchStrategy": "created batch containing only task with id 1 of type `indexCreation` that cannot be batched with any other task." + "batchStrategy": "created batch containing only task with id X of type `indexCreation` that cannot be batched with any other task." 
} "###); } #[actix_web::test] async fn test_summarized_batch_cancelation() { - let server = Server::new().await; - let index = server.index("doggos"); + let server = Server::new_shared(); + let index = server.unique_index(); // to avoid being flaky we're only going to cancel an already finished batch :( let (task, _status_code) = index.create(None).await; server.wait_task(task.uid()).await.succeeded(); - let (task, _status_code) = server.cancel_tasks("uids=0").await; - server.wait_task(task.uid()).await.succeeded(); - let (batch, _) = index.get_batch(1).await; + let (task, _status_code) = server.cancel_tasks(format!("uids={}", task.uid()).as_str()).await; + let task = server.wait_task(task.uid()).await.succeeded(); + let (batch, _) = index.get_batch(task.batch_uid()).await; assert_json_snapshot!(batch, { + ".uid" => "[uid]", ".duration" => "[duration]", ".enqueuedAt" => "[date]", ".startedAt" => "[date]", ".finishedAt" => "[date]", ".stats.progressTrace" => "[progressTrace]", - ".stats.writeChannelCongestion" => "[writeChannelCongestion]" + ".stats.writeChannelCongestion" => "[writeChannelCongestion]", + ".details.originalFilter" => "?uids=X", + ".batchStrategy" => insta::dynamic_redaction(task_with_id_redaction), }, @r###" { - "uid": 1, + "uid": "[uid]", "progress": null, "details": { "matchedTasks": 1, "canceledTasks": 0, - "originalFilter": "?uids=0" + "originalFilter": "?uids=X" }, "stats": { "totalNbTasks": 1, @@ -1224,38 +1266,40 @@ async fn test_summarized_batch_cancelation() { "duration": "[duration]", "startedAt": "[date]", "finishedAt": "[date]", - "batchStrategy": "created batch containing only task with id 1 of type `taskCancelation` that cannot be batched with any other task." + "batchStrategy": "created batch containing only task with id X of type `taskCancelation` that cannot be batched with any other task." 
} "###); } #[actix_web::test] async fn test_summarized_batch_deletion() { - let server = Server::new().await; - let index = server.index("doggos"); + let server = Server::new_shared(); + let index = server.unique_index(); // to avoid being flaky we're only going to delete an already finished batch :( let (task, _status_code) = index.create(None).await; server.wait_task(task.uid()).await.succeeded(); - let (task, _status_code) = server.delete_tasks("uids=0").await; - server.wait_task(task.uid()).await.succeeded(); - let (batch, _) = index.get_batch(1).await; + let (task, _status_code) = server.delete_tasks(format!("uids={}", task.uid()).as_str()).await; + let task = server.wait_task(task.uid()).await.succeeded(); + let (batch, _) = index.get_batch(task.batch_uid()).await; assert_json_snapshot!(batch, { + ".uid" => "[uid]", ".duration" => "[duration]", ".enqueuedAt" => "[date]", ".startedAt" => "[date]", ".finishedAt" => "[date]", ".stats.progressTrace" => "[progressTrace]", - ".stats.writeChannelCongestion" => "[writeChannelCongestion]" + ".stats.writeChannelCongestion" => "[writeChannelCongestion]", + ".details.originalFilter" => "?uids=X" }, @r###" { - "uid": 1, + "uid": "[uid]", "progress": null, "details": { "matchedTasks": 1, "deletedTasks": 1, - "originalFilter": "?uids=0" + "originalFilter": "?uids=X" }, "stats": { "totalNbTasks": 1, @@ -1278,23 +1322,25 @@ async fn test_summarized_batch_deletion() { #[actix_web::test] async fn test_summarized_dump_creation() { - let server = Server::new().await; + let server = Server::new_shared(); let (task, _status_code) = server.create_dump().await; - server.wait_task(task.uid()).await; - let (batch, _) = server.get_batch(0).await; + let task = server.wait_task(task.uid()).await.succeeded(); + let (batch, _) = server.get_batch(task.batch_uid()).await; assert_json_snapshot!(batch, { + ".uid" => "[uid]", ".details.dumpUid" => "[dumpUid]", ".duration" => "[duration]", ".enqueuedAt" => "[date]", ".startedAt" => "[date]", ".finishedAt" => "[date]", ".stats.progressTrace" => "[progressTrace]", - ".stats.writeChannelCongestion" => "[writeChannelCongestion]" + ".stats.writeChannelCongestion" => "[writeChannelCongestion]", + ".batchStrategy" => insta::dynamic_redaction(task_with_id_redaction), }, @r###" { - "uid": 0, + "uid": "[uid]", "progress": null, "details": { "dumpUid": "[dumpUid]" @@ -1313,7 +1359,7 @@ async fn test_summarized_dump_creation() { "duration": "[duration]", "startedAt": "[date]", "finishedAt": "[date]", - "batchStrategy": "created batch containing only task with id 0 of type `dumpCreation` that cannot be batched with any other task." + "batchStrategy": "created batch containing only task with id X of type `dumpCreation` that cannot be batched with any other task." 
} "###); } diff --git a/crates/meilisearch/tests/common/index.rs b/crates/meilisearch/tests/common/index.rs index b4ae151f3..012c9bebe 100644 --- a/crates/meilisearch/tests/common/index.rs +++ b/crates/meilisearch/tests/common/index.rs @@ -249,6 +249,11 @@ impl<'a> Index<'a, Owned> { self.service.put_encoded(url, settings, self.encoder).await } + pub async fn update_settings_chat(&self, settings: Value) -> (Value, StatusCode) { + let url = format!("/indexes/{}/settings/chat", urlencode(self.uid.as_ref())); + self.service.patch_encoded(url, settings, self.encoder).await + } + pub async fn delete_settings(&self) -> (Value, StatusCode) { let url = format!("/indexes/{}/settings", urlencode(self.uid.as_ref())); self.service.delete(url).await @@ -551,5 +556,7 @@ pub struct GetAllDocumentsOptions { pub offset: Option, #[serde(skip_serializing_if = "Option::is_none")] pub fields: Option>, + #[serde(skip_serializing_if = "Option::is_none")] + pub sort: Option>, pub retrieve_vectors: bool, } diff --git a/crates/meilisearch/tests/common/mod.rs b/crates/meilisearch/tests/common/mod.rs index d023d464e..03b1271f1 100644 --- a/crates/meilisearch/tests/common/mod.rs +++ b/crates/meilisearch/tests/common/mod.rs @@ -3,8 +3,10 @@ pub mod index; pub mod server; pub mod service; +use std::collections::BTreeMap; use std::fmt::{self, Display}; +use actix_http::StatusCode; #[allow(unused)] pub use index::GetAllDocumentsOptions; use meili_snap::json_string; @@ -13,6 +15,8 @@ use serde::{Deserialize, Serialize}; #[allow(unused)] pub use server::{default_settings, Server}; use tokio::sync::OnceCell; +use wiremock::matchers::{method, path}; +use wiremock::{Mock, MockServer, Request, ResponseTemplate}; use crate::common::index::Index; @@ -38,6 +42,15 @@ impl Value { self["uid"].as_u64().is_some() || self["taskUid"].as_u64().is_some() } + #[track_caller] + pub fn batch_uid(&self) -> u32 { + if let Some(batch_uid) = self["batchUid"].as_u64() { + batch_uid as u32 + } else { + panic!("Didn't find `batchUid` in: {self}"); + } + } + /// Return `true` if the `status` field is set to `succeeded`. /// Panic if the `status` field doesn't exists. 
#[track_caller] @@ -508,3 +521,166 @@ pub async fn shared_index_with_geo_documents() -> &'static Index<'static, Shared> { }) .await } + +pub async fn shared_index_for_fragments() -> Index<'static, Shared> { + static INDEX: OnceCell<(Server<Shared>, String)> = OnceCell::const_new(); + let (server, uid) = INDEX + .get_or_init(|| async { + let (server, uid, _) = init_fragments_index().await; + (server.into_shared(), uid) + }) + .await; + server._index(uid).to_shared() +} + +async fn fragment_mock_server() -> String { + let text_to_embedding: BTreeMap<_, _> = vec![ + ("kefir", [0.5, -0.5, 0.0]), + ("intel", [1.0, 1.0, 0.0]), + ("dustin", [-0.5, 0.5, 0.0]), + ("bulldog", [0.0, 0.0, 1.0]), + ("labrador", [0.0, 0.0, -1.0]), + ("{{ doc.", [-9999.0, -9999.0, -9999.0]), // If a template didn't render + ] + .into_iter() + .collect(); + + let mock_server = Box::leak(Box::new(MockServer::start().await)); + + Mock::given(method("POST")) + .and(path("/")) + .respond_with(move |req: &Request| { + let text = String::from_utf8_lossy(&req.body).to_string(); + + let mut data = [0.0, 0.0, 0.0]; + for (inner_text, inner_data) in &text_to_embedding { + if text.contains(inner_text) { + for (i, &value) in inner_data.iter().enumerate() { + data[i] += value; + } + } + } + ResponseTemplate::new(200).set_body_json(json!({ "data": data })) + }) + .mount(mock_server) + .await; + + mock_server.uri() +} + +pub async fn init_fragments_index() -> (Server, String, crate::common::Value) { + let url = fragment_mock_server().await; + let server = Server::new().await; + let index = server.unique_index(); + + let (_response, code) = server.set_features(json!({"multimodal": true})).await; + assert_eq!(code, StatusCode::OK); + + // Configure the index to use our mock embedder + let settings = json!({ + "embedders": { + "rest": { + "source": "rest", + "url": url, + "dimensions": 3, + "request": "{{fragment}}", + "response": { + "data": "{{embedding}}" + }, + "indexingFragments": { + "withBreed": {"value": "{{ doc.name }} is a {{ doc.breed }}"}, + "basic": {"value": "{{ doc.name }} is a dog"}, + }, + "searchFragments": { + "justBreed": {"value": "It's a {{ media.breed }}"}, + "justName": {"value": "{{ media.name }} is a dog"}, + "query": {"value": "Some pre-prompt for query {{ q }}"}, + } + }, + }, + }); + let (response, code) = index.update_settings(settings.clone()).await; + assert_eq!(code, StatusCode::ACCEPTED); + + server.wait_task(response.uid()).await.succeeded(); + + // Send documents + let documents = json!([ + {"id": 0, "name": "kefir"}, + {"id": 1, "name": "echo", "_vectors": { "rest": [1, 1, 1] }}, + {"id": 2, "name": "intel", "breed": "labrador"}, + {"id": 3, "name": "dustin", "breed": "bulldog"}, + ]); + let (value, code) = index.add_documents(documents, None).await; + assert_eq!(code, StatusCode::ACCEPTED); + + let _task = server.wait_task(value.uid()).await.succeeded(); + + let uid = index.uid.clone(); + (server, uid, settings) +} + +pub async fn init_fragments_index_composite() -> (Server, String, crate::common::Value) { + let url = fragment_mock_server().await; + let server = Server::new().await; + let index = server.unique_index(); + + let (_response, code) = server.set_features(json!({"multimodal": true})).await; + assert_eq!(code, StatusCode::OK); + + let (_response, code) = server.set_features(json!({"compositeEmbedders": true})).await; + assert_eq!(code, StatusCode::OK); + + // Configure the index to use our mock embedder + let settings = json!({ + "embedders": { + "rest": { + "source": "composite", + "searchEmbedder": {
+ "source": "rest", + "url": url, + "dimensions": 3, + "request": "{{fragment}}", + "response": { + "data": "{{embedding}}" + }, + "searchFragments": { + "query": {"value": "Some pre-prompt for query {{ q }}"}, + } + }, + "indexingEmbedder": { + "source": "rest", + "url": url, + "dimensions": 3, + "request": "{{fragment}}", + "response": { + "data": "{{embedding}}" + }, + "indexingFragments": { + "withBreed": {"value": "{{ doc.name }} is a {{ doc.breed }}"}, + "basic": {"value": "{{ doc.name }} is a dog"}, + } + }, + }, + }, + }); + let (response, code) = index.update_settings(settings.clone()).await; + assert_eq!(code, StatusCode::ACCEPTED); + + server.wait_task(response.uid()).await.succeeded(); + + // Send documents + let documents = json!([ + {"id": 0, "name": "kefir"}, + {"id": 1, "name": "echo", "_vectors": { "rest": [1, 1, 1] }}, + {"id": 2, "name": "intel", "breed": "labrador"}, + {"id": 3, "name": "dustin", "breed": "bulldog"}, + ]); + let (value, code) = index.add_documents(documents, None).await; + assert_eq!(code, StatusCode::ACCEPTED); + + server.wait_task(value.uid()).await.succeeded(); + + let uid = index.uid.clone(); + (server, uid, settings) +} diff --git a/crates/meilisearch/tests/common/server.rs b/crates/meilisearch/tests/common/server.rs index 89c5a3aaa..291356bf8 100644 --- a/crates/meilisearch/tests/common/server.rs +++ b/crates/meilisearch/tests/common/server.rs @@ -35,7 +35,7 @@ pub struct Server { pub static TEST_TEMP_DIR: Lazy = Lazy::new(|| TempDir::new().unwrap()); impl Server { - fn into_shared(self) -> Server { + pub(super) fn into_shared(self) -> Server { Server { service: self.service, _dir: self._dir, _marker: PhantomData } } @@ -97,6 +97,7 @@ impl Server { self.use_api_key(master_key); let (response, code) = self.list_api_keys("").await; assert_eq!(200, code, "{:?}", response); + // TODO: relying on the order of keys is not ideal, we should use the name instead let admin_key = &response["results"][1]["key"]; self.use_api_key(admin_key.as_str().unwrap()); } @@ -181,6 +182,25 @@ impl Server { self.service.patch("/network", value).await } + pub async fn create_webhook(&self, value: Value) -> (Value, StatusCode) { + self.service.post("/webhooks", value).await + } + + pub async fn get_webhook(&self, uuid: impl AsRef) -> (Value, StatusCode) { + let url = format!("/webhooks/{}", uuid.as_ref()); + self.service.get(url).await + } + + pub async fn delete_webhook(&self, uuid: impl AsRef) -> (Value, StatusCode) { + let url = format!("/webhooks/{}", uuid.as_ref()); + self.service.delete(url).await + } + + pub async fn patch_webhook(&self, uuid: impl AsRef, value: Value) -> (Value, StatusCode) { + let url = format!("/webhooks/{}", uuid.as_ref()); + self.service.patch(url, value).await + } + pub async fn get_metrics(&self) -> (Value, StatusCode) { self.service.get("/metrics").await } @@ -446,6 +466,10 @@ impl Server { pub async fn get_network(&self) -> (Value, StatusCode) { self.service.get("/network").await } + + pub async fn get_webhooks(&self) -> (Value, StatusCode) { + self.service.get("/webhooks").await + } } pub fn default_settings(dir: impl AsRef) -> Opt { @@ -465,6 +489,7 @@ pub fn default_settings(dir: impl AsRef) -> Opt { // Having 2 threads makes the tests way faster max_indexing_threads: MaxThreads::from_str("2").unwrap(), experimental_no_edition_2024_for_settings: false, + experimental_no_edition_2024_for_dumps: false, }, experimental_enable_metrics: false, ..Parser::parse_from(None as Option<&str>) diff --git a/crates/meilisearch/tests/documents/errors.rs 
b/crates/meilisearch/tests/documents/errors.rs index ed1aec7e5..0ce5f0675 100644 --- a/crates/meilisearch/tests/documents/errors.rs +++ b/crates/meilisearch/tests/documents/errors.rs @@ -557,7 +557,7 @@ async fn delete_document_by_filter() { "###); let index = shared_does_not_exists_index().await; - // index does not exists + // index does not exist let (response, _code) = index.delete_document_by_filter_fail(json!({ "filter": "doggo = bernese"}), server).await; snapshot!(response, @r###" diff --git a/crates/meilisearch/tests/documents/get_documents.rs b/crates/meilisearch/tests/documents/get_documents.rs index 44eb181df..b3c68351f 100644 --- a/crates/meilisearch/tests/documents/get_documents.rs +++ b/crates/meilisearch/tests/documents/get_documents.rs @@ -5,8 +5,8 @@ use urlencoding::encode as urlencode; use crate::common::encoder::Encoder; use crate::common::{ - shared_does_not_exists_index, shared_empty_index, shared_index_with_test_set, - GetAllDocumentsOptions, Server, Value, + shared_does_not_exists_index, shared_empty_index, shared_index_with_geo_documents, + shared_index_with_test_set, GetAllDocumentsOptions, Server, Value, }; use crate::json; @@ -83,6 +83,311 @@ async fn get_document() { ); } +#[actix_rt::test] +async fn get_document_sorted() { + let server = Server::new_shared(); + let index = server.unique_index(); + index.load_test_set(server).await; + + let (task, _status_code) = + index.update_settings_sortable_attributes(json!(["age", "email", "gender", "name"])).await; + server.wait_task(task.uid()).await.succeeded(); + + let (response, _code) = index + .get_all_documents(GetAllDocumentsOptions { + fields: Some(vec!["id", "age", "email"]), + sort: Some(vec!["age:asc", "email:desc"]), + ..Default::default() + }) + .await; + let results = response["results"].as_array().unwrap(); + snapshot!(json_string!(results), @r#" + [ + { + "id": 5, + "age": 20, + "email": "warrenwatson@chorizon.com" + }, + { + "id": 6, + "age": 20, + "email": "sheliaberry@chorizon.com" + }, + { + "id": 57, + "age": 20, + "email": "kaitlinconner@chorizon.com" + }, + { + "id": 45, + "age": 20, + "email": "irenebennett@chorizon.com" + }, + { + "id": 40, + "age": 21, + "email": "staffordemerson@chorizon.com" + }, + { + "id": 41, + "age": 21, + "email": "salinasgamble@chorizon.com" + }, + { + "id": 63, + "age": 21, + "email": "knowleshebert@chorizon.com" + }, + { + "id": 50, + "age": 21, + "email": "guerramcintyre@chorizon.com" + }, + { + "id": 44, + "age": 22, + "email": "jonispears@chorizon.com" + }, + { + "id": 56, + "age": 23, + "email": "tuckerbarry@chorizon.com" + }, + { + "id": 51, + "age": 23, + "email": "keycervantes@chorizon.com" + }, + { + "id": 60, + "age": 23, + "email": "jodyherrera@chorizon.com" + }, + { + "id": 70, + "age": 23, + "email": "glassperkins@chorizon.com" + }, + { + "id": 75, + "age": 24, + "email": "emmajacobs@chorizon.com" + }, + { + "id": 68, + "age": 24, + "email": "angelinadyer@chorizon.com" + }, + { + "id": 17, + "age": 25, + "email": "ortegabrennan@chorizon.com" + }, + { + "id": 76, + "age": 25, + "email": "claricegardner@chorizon.com" + }, + { + "id": 43, + "age": 25, + "email": "arnoldbender@chorizon.com" + }, + { + "id": 12, + "age": 25, + "email": "aidakirby@chorizon.com" + }, + { + "id": 9, + "age": 26, + "email": "kellimendez@chorizon.com" + } + ] + "#); + + let (response, _code) = index + .get_all_documents(GetAllDocumentsOptions { + fields: Some(vec!["id", "gender", "name"]), + sort: Some(vec!["gender:asc", "name:asc"]), + ..Default::default() + }) + .await; + let 
results = response["results"].as_array().unwrap(); + snapshot!(json_string!(results), @r#" + [ + { + "id": 3, + "name": "Adeline Flynn", + "gender": "female" + }, + { + "id": 12, + "name": "Aida Kirby", + "gender": "female" + }, + { + "id": 68, + "name": "Angelina Dyer", + "gender": "female" + }, + { + "id": 15, + "name": "Aurelia Contreras", + "gender": "female" + }, + { + "id": 36, + "name": "Barbra Valenzuela", + "gender": "female" + }, + { + "id": 23, + "name": "Blanca Mcclain", + "gender": "female" + }, + { + "id": 53, + "name": "Caitlin Burnett", + "gender": "female" + }, + { + "id": 71, + "name": "Candace Sawyer", + "gender": "female" + }, + { + "id": 65, + "name": "Carole Rowland", + "gender": "female" + }, + { + "id": 33, + "name": "Cecilia Greer", + "gender": "female" + }, + { + "id": 1, + "name": "Cherry Orr", + "gender": "female" + }, + { + "id": 38, + "name": "Christina Short", + "gender": "female" + }, + { + "id": 7, + "name": "Chrystal Boyd", + "gender": "female" + }, + { + "id": 76, + "name": "Clarice Gardner", + "gender": "female" + }, + { + "id": 73, + "name": "Eleanor Shepherd", + "gender": "female" + }, + { + "id": 75, + "name": "Emma Jacobs", + "gender": "female" + }, + { + "id": 16, + "name": "Estella Bass", + "gender": "female" + }, + { + "id": 62, + "name": "Estelle Ramirez", + "gender": "female" + }, + { + "id": 20, + "name": "Florence Long", + "gender": "female" + }, + { + "id": 42, + "name": "Graciela Russell", + "gender": "female" + } + ] + "#); +} + +#[actix_rt::test] +async fn get_document_geosorted() { + let index = shared_index_with_geo_documents().await; + + let (response, _code) = index + .get_all_documents(GetAllDocumentsOptions { + sort: Some(vec!["_geoPoint(45.4777599, 9.1967508):asc"]), + ..Default::default() + }) + .await; + let results = response["results"].as_array().unwrap(); + snapshot!(json_string!(results), @r#" + [ + { + "id": 2, + "name": "La Bella Italia", + "address": "456 Elm Street, Townsville", + "type": "Italian", + "rating": 9, + "_geo": { + "lat": "45.4777599", + "lng": "9.1967508" + } + }, + { + "id": 1, + "name": "Taco Truck", + "address": "444 Salsa Street, Burritoville", + "type": "Mexican", + "rating": 9, + "_geo": { + "lat": 34.0522, + "lng": -118.2437 + } + }, + { + "id": 3, + "name": "Crêpe Truck", + "address": "2 Billig Avenue, Rouenville", + "type": "French", + "rating": 10 + } + ] + "#); +} + +#[actix_rt::test] +async fn get_document_sort_the_unsortable() { + let index = shared_index_with_test_set().await; + + let (response, _code) = index + .get_all_documents(GetAllDocumentsOptions { + fields: Some(vec!["id", "name"]), + sort: Some(vec!["name:asc"]), + ..Default::default() + }) + .await; + + snapshot!(json_string!(response), @r#" + { + "message": "Attribute `name` is not sortable. 
This index does not have configured sortable attributes.", + "code": "invalid_document_sort", + "type": "invalid_request", + "link": "https://docs.meilisearch.com/errors#invalid_document_sort" + } + "#); +} + #[actix_rt::test] async fn error_get_unexisting_index_all_documents() { let index = shared_does_not_exists_index().await; diff --git a/crates/meilisearch/tests/index/stats.rs b/crates/meilisearch/tests/index/stats.rs index 610601318..7f2ca9b4a 100644 --- a/crates/meilisearch/tests/index/stats.rs +++ b/crates/meilisearch/tests/index/stats.rs @@ -1,5 +1,4 @@ use crate::common::{shared_does_not_exists_index, Server}; - use crate::json; #[actix_rt::test] diff --git a/crates/meilisearch/tests/search/errors.rs b/crates/meilisearch/tests/search/errors.rs index 363ece067..b89129775 100644 --- a/crates/meilisearch/tests/search/errors.rs +++ b/crates/meilisearch/tests/search/errors.rs @@ -304,7 +304,7 @@ async fn search_bad_filter() { let server = Server::new_shared(); let index = server.unique_index(); // Also, to trigger the error message we need to effectively create the index or else it'll throw an - // index does not exists error. + // index does not exist error. let (response, _code) = index.create(None).await; server.wait_task(response.uid()).await.succeeded(); @@ -1263,34 +1263,34 @@ async fn search_with_contains_without_enabling_the_feature() { let server = Server::new_shared(); let index = server.unique_index(); // Also, to trigger the error message we need to effectively create the index or else it'll throw an - // index does not exists error. + // index does not exist error. let (task, _code) = index.create(None).await; server.wait_task(task.uid()).await.succeeded(); index .search(json!({ "filter": "doggo CONTAINS kefir" }), |response, code| { snapshot!(code, @"400 Bad Request"); - snapshot!(json_string!(response), @r###" + snapshot!(json_string!(response), @r#" { - "message": "Using `CONTAINS` or `STARTS WITH` in a filter requires enabling the `contains filter` experimental feature. See https://github.com/orgs/meilisearch/discussions/763\n7:15 doggo CONTAINS kefir", + "message": "Using `CONTAINS` in a filter requires enabling the `contains filter` experimental feature. See https://github.com/orgs/meilisearch/discussions/763\n7:15 doggo CONTAINS kefir", "code": "feature_not_enabled", "type": "invalid_request", "link": "https://docs.meilisearch.com/errors#feature_not_enabled" } - "###); + "#); }) .await; index .search(json!({ "filter": "doggo != echo AND doggo CONTAINS kefir" }), |response, code| { snapshot!(code, @"400 Bad Request"); - snapshot!(json_string!(response), @r###" + snapshot!(json_string!(response), @r#" { - "message": "Using `CONTAINS` or `STARTS WITH` in a filter requires enabling the `contains filter` experimental feature. See https://github.com/orgs/meilisearch/discussions/763\n25:33 doggo != echo AND doggo CONTAINS kefir", + "message": "Using `CONTAINS` in a filter requires enabling the `contains filter` experimental feature. 
See https://github.com/orgs/meilisearch/discussions/763\n25:33 doggo != echo AND doggo CONTAINS kefir", "code": "feature_not_enabled", "type": "invalid_request", "link": "https://docs.meilisearch.com/errors#feature_not_enabled" } - "###); + "#); }) .await; @@ -1299,24 +1299,24 @@ async fn search_with_contains_without_enabling_the_feature() { index.search_post(json!({ "filter": ["doggo != echo", "doggo CONTAINS kefir"] })).await; snapshot!(code, @"400 Bad Request"); - snapshot!(json_string!(response), @r###" + snapshot!(json_string!(response), @r#" { - "message": "Using `CONTAINS` or `STARTS WITH` in a filter requires enabling the `contains filter` experimental feature. See https://github.com/orgs/meilisearch/discussions/763\n7:15 doggo CONTAINS kefir", + "message": "Using `CONTAINS` in a filter requires enabling the `contains filter` experimental feature. See https://github.com/orgs/meilisearch/discussions/763\n7:15 doggo CONTAINS kefir", "code": "feature_not_enabled", "type": "invalid_request", "link": "https://docs.meilisearch.com/errors#feature_not_enabled" } - "###); + "#); let (response, code) = index.search_post(json!({ "filter": ["doggo != echo", ["doggo CONTAINS kefir"]] })).await; snapshot!(code, @"400 Bad Request"); - snapshot!(json_string!(response), @r###" + snapshot!(json_string!(response), @r#" { - "message": "Using `CONTAINS` or `STARTS WITH` in a filter requires enabling the `contains filter` experimental feature. See https://github.com/orgs/meilisearch/discussions/763\n7:15 doggo CONTAINS kefir", + "message": "Using `CONTAINS` in a filter requires enabling the `contains filter` experimental feature. See https://github.com/orgs/meilisearch/discussions/763\n7:15 doggo CONTAINS kefir", "code": "feature_not_enabled", "type": "invalid_request", "link": "https://docs.meilisearch.com/errors#feature_not_enabled" } - "###); + "#); } diff --git a/crates/meilisearch/tests/search/filters.rs b/crates/meilisearch/tests/search/filters.rs index ffa025f5c..ef562bf4f 100644 --- a/crates/meilisearch/tests/search/filters.rs +++ b/crates/meilisearch/tests/search/filters.rs @@ -4,8 +4,8 @@ use tempfile::TempDir; use super::test_settings_documents_indexing_swapping_and_search; use crate::common::{ - default_settings, shared_index_with_documents, shared_index_with_nested_documents, Server, - DOCUMENTS, NESTED_DOCUMENTS, + default_settings, shared_index_for_fragments, shared_index_with_documents, + shared_index_with_nested_documents, Server, DOCUMENTS, NESTED_DOCUMENTS, }; use crate::json; @@ -731,3 +731,432 @@ async fn test_filterable_attributes_priority() { ) .await; } + +#[actix_rt::test] +async fn vector_filter_all_embedders() { + let index = shared_index_for_fragments().await; + + let (value, _code) = index + .search_post(json!({ + "filter": "_vectors EXISTS", + "attributesToRetrieve": ["name"] + })) + .await; + snapshot!(value, @r#" + { + "hits": [ + { + "name": "kefir" + }, + { + "name": "echo" + }, + { + "name": "intel" + }, + { + "name": "dustin" + } + ], + "query": "", + "processingTimeMs": "[duration]", + "limit": 20, + "offset": 0, + "estimatedTotalHits": 4 + } + "#); +} + +#[actix_rt::test] +async fn vector_filter_missing_fragment() { + let index = shared_index_for_fragments().await; + + let (value, _code) = index + .search_post(json!({ + "filter": "_vectors.rest.fragments EXISTS", + "attributesToRetrieve": ["name"] + })) + .await; + snapshot!(value, @r#" + { + "message": "The vector filter is missing a fragment name.\n24:31 _vectors.rest.fragments EXISTS", + "code": 
"invalid_search_filter", + "type": "invalid_request", + "link": "https://docs.meilisearch.com/errors#invalid_search_filter" + } + "#); +} + +#[actix_rt::test] +async fn vector_filter_nonexistent_embedder() { + let index = shared_index_for_fragments().await; + + let (value, _code) = index + .search_post(json!({ + "filter": "_vectors.other EXISTS", + "attributesToRetrieve": ["name"] + })) + .await; + snapshot!(value, @r#" + { + "message": "Index `[uuid]`: The embedder `other` does not exist. Available embedders are: `rest`.\n10:15 _vectors.other EXISTS", + "code": "invalid_search_filter", + "type": "invalid_request", + "link": "https://docs.meilisearch.com/errors#invalid_search_filter" + } + "#); +} + +#[actix_rt::test] +async fn vector_filter_all_embedders_user_provided() { + let index = shared_index_for_fragments().await; + + // This one is counterintuitive, but it is the same as the previous one. + // It's because userProvided is interpreted as an embedder name + let (value, _code) = index + .search_post(json!({ + "filter": "_vectors.userProvided EXISTS", + "attributesToRetrieve": ["name"] + })) + .await; + snapshot!(value, @r#" + { + "message": "Index `[uuid]`: The embedder `userProvided` does not exist. Available embedders are: `rest`.\n10:22 _vectors.userProvided EXISTS", + "code": "invalid_search_filter", + "type": "invalid_request", + "link": "https://docs.meilisearch.com/errors#invalid_search_filter" + } + "#); +} + +#[actix_rt::test] +async fn vector_filter_specific_embedder() { + let index = shared_index_for_fragments().await; + + let (value, _code) = index + .search_post(json!({ + "filter": "_vectors.rest EXISTS", + "attributesToRetrieve": ["name"] + })) + .await; + snapshot!(value, @r#" + { + "hits": [ + { + "name": "kefir" + }, + { + "name": "echo" + }, + { + "name": "intel" + }, + { + "name": "dustin" + } + ], + "query": "", + "processingTimeMs": "[duration]", + "limit": 20, + "offset": 0, + "estimatedTotalHits": 4 + } + "#); +} + +#[actix_rt::test] +async fn vector_filter_user_provided() { + let index = shared_index_for_fragments().await; + + let (value, _code) = index + .search_post(json!({ + "filter": "_vectors.rest.userProvided EXISTS", + "attributesToRetrieve": ["name"] + })) + .await; + snapshot!(value, @r#" + { + "hits": [ + { + "name": "echo" + } + ], + "query": "", + "processingTimeMs": "[duration]", + "limit": 20, + "offset": 0, + "estimatedTotalHits": 1 + } + "#); +} + +#[actix_rt::test] +async fn vector_filter_specific_fragment() { + let index = shared_index_for_fragments().await; + + let (value, _code) = index + .search_post(json!({ + "filter": "_vectors.rest.fragments.withBreed EXISTS", + "attributesToRetrieve": ["name"] + })) + .await; + snapshot!(value, @r#" + { + "hits": [ + { + "name": "intel" + }, + { + "name": "dustin" + } + ], + "query": "", + "processingTimeMs": "[duration]", + "limit": 20, + "offset": 0, + "estimatedTotalHits": 2 + } + "#); + + let (value, _code) = index + .search_post(json!({ + "filter": "_vectors.rest.fragments.basic EXISTS", + "attributesToRetrieve": ["name"] + })) + .await; + snapshot!(value, @r#" + { + "hits": [ + { + "name": "kefir" + }, + { + "name": "intel" + }, + { + "name": "dustin" + } + ], + "query": "", + "processingTimeMs": "[duration]", + "limit": 20, + "offset": 0, + "estimatedTotalHits": 3 + } + "#); +} + +#[actix_rt::test] +async fn vector_filter_non_existant_fragment() { + let index = shared_index_for_fragments().await; + + let (value, _code) = index + .search_post(json!({ + "filter": "_vectors.rest.fragments.withBred 
EXISTS", + "attributesToRetrieve": ["name"] + })) + .await; + snapshot!(value, @r#" + { + "message": "Index `[uuid]`: The fragment `withBred` does not exist on embedder `rest`. Available fragments on this embedder are: `basic`, `withBreed`. Did you mean `withBreed`?\n25:33 _vectors.rest.fragments.withBred EXISTS", + "code": "invalid_search_filter", + "type": "invalid_request", + "link": "https://docs.meilisearch.com/errors#invalid_search_filter" + } + "#); +} + +#[actix_rt::test] +async fn vector_filter_document_template_but_fragments_used() { + let index = shared_index_for_fragments().await; + + let (value, _code) = index + .search_post(json!({ + "filter": "_vectors.rest.documentTemplate EXISTS", + "attributesToRetrieve": ["name"] + })) + .await; + snapshot!(value, @r#" + { + "hits": [], + "query": "", + "processingTimeMs": "[duration]", + "limit": 20, + "offset": 0, + "estimatedTotalHits": 0 + } + "#); +} + +#[actix_rt::test] +async fn vector_filter_document_template() { + let (_mock, setting) = crate::vector::create_mock().await; + let server = crate::vector::get_server_vector().await; + let index = server.index("doggo"); + + let (_response, code) = server.set_features(json!({"multimodal": true})).await; + snapshot!(code, @"200 OK"); + + let (response, code) = index + .update_settings(json!({ + "embedders": { + "rest": setting, + }, + })) + .await; + snapshot!(code, @"202 Accepted"); + server.wait_task(response.uid()).await.succeeded(); + + let documents = json!([ + {"id": 0, "name": "kefir"}, + {"id": 1, "name": "echo", "_vectors": { "rest": [1, 1, 1] }}, + {"id": 2, "name": "intel"}, + {"id": 3, "name": "iko" } + ]); + let (value, code) = index.add_documents(documents, None).await; + snapshot!(code, @"202 Accepted"); + server.wait_task(value.uid()).await.succeeded(); + + let (value, _code) = index + .search_post(json!({ + "filter": "_vectors.rest.documentTemplate EXISTS", + "attributesToRetrieve": ["name"] + })) + .await; + snapshot!(value, @r#" + { + "hits": [ + { + "name": "kefir" + }, + { + "name": "intel" + }, + { + "name": "iko" + } + ], + "query": "", + "processingTimeMs": "[duration]", + "limit": 20, + "offset": 0, + "estimatedTotalHits": 3 + } + "#); +} + +#[actix_rt::test] +async fn vector_filter_feature_gate() { + let index = shared_index_with_documents().await; + + let (value, _code) = index + .search_post(json!({ + "filter": "_vectors EXISTS", + "attributesToRetrieve": ["name"] + })) + .await; + snapshot!(value, @r#" + { + "message": "using a vector filter requires enabling the `multimodal` experimental feature. 
See https://github.com/orgs/meilisearch/discussions/846\n1:9 _vectors EXISTS", + "code": "feature_not_enabled", + "type": "invalid_request", + "link": "https://docs.meilisearch.com/errors#feature_not_enabled" + } + "#); +} + +#[actix_rt::test] +async fn vector_filter_negation() { + let index = shared_index_for_fragments().await; + + let (value, _code) = index + .search_post(json!({ + "filter": "_vectors.rest.userProvided NOT EXISTS", + "attributesToRetrieve": ["name"] + })) + .await; + snapshot!(value, @r#" + { + "hits": [ + { + "name": "kefir" + }, + { + "name": "intel" + }, + { + "name": "dustin" + } + ], + "query": "", + "processingTimeMs": "[duration]", + "limit": 20, + "offset": 0, + "estimatedTotalHits": 3 + } + "#); +} + +#[actix_rt::test] +async fn vector_filter_or_combination() { + let index = shared_index_for_fragments().await; + + let (value, _code) = index + .search_post(json!({ + "filter": "_vectors.rest.fragments.withBreed EXISTS OR _vectors.rest.userProvided EXISTS", + "attributesToRetrieve": ["name"] + })) + .await; + snapshot!(value, @r#" + { + "hits": [ + { + "name": "echo" + }, + { + "name": "intel" + }, + { + "name": "dustin" + } + ], + "query": "", + "processingTimeMs": "[duration]", + "limit": 20, + "offset": 0, + "estimatedTotalHits": 3 + } + "#); +} + +#[actix_rt::test] +async fn vector_filter_regenerate() { + let index = shared_index_for_fragments().await; + + let (value, _code) = index + .search_post(json!({ + "filter": format!("_vectors.rest.regenerate EXISTS"), + "attributesToRetrieve": ["name"] + })) + .await; + snapshot!(value, @r#" + { + "hits": [ + { + "name": "kefir" + }, + { + "name": "intel" + }, + { + "name": "dustin" + } + ], + "query": "", + "processingTimeMs": "[duration]", + "limit": 20, + "offset": 0, + "estimatedTotalHits": 3 + } + "#); +} diff --git a/crates/meilisearch/tests/search/hybrid.rs b/crates/meilisearch/tests/search/hybrid.rs index d95e6fb64..b2970f233 100644 --- a/crates/meilisearch/tests/search/hybrid.rs +++ b/crates/meilisearch/tests/search/hybrid.rs @@ -148,7 +148,70 @@ async fn simple_search() { ) .await; snapshot!(code, @"200 OK"); - snapshot!(response["hits"], @r###"[{"title":"Captain Planet","desc":"He's not part of the Marvel Cinematic Universe","id":"2","_vectors":{"default":{"embeddings":[[1.0,2.0]],"regenerate":false}}},{"title":"Captain Marvel","desc":"a Shazam ersatz","id":"3","_vectors":{"default":{"embeddings":[[2.0,3.0]],"regenerate":false}}},{"title":"Shazam!","desc":"a Captain Marvel ersatz","id":"1","_vectors":{"default":{"embeddings":[[1.0,3.0]],"regenerate":false}}}]"###); + snapshot!(response, @r#" + { + "hits": [ + { + "title": "Captain Planet", + "desc": "He's not part of the Marvel Cinematic Universe", + "id": "2", + "_vectors": { + "default": { + "embeddings": [ + [ + 1.0, + 2.0 + ] + ], + "regenerate": false + } + } + }, + { + "title": "Captain Marvel", + "desc": "a Shazam ersatz", + "id": "3", + "_vectors": { + "default": { + "embeddings": [ + [ + 2.0, + 3.0 + ] + ], + "regenerate": false + } + } + }, + { + "title": "Shazam!", + "desc": "a Captain Marvel ersatz", + "id": "1", + "_vectors": { + "default": { + "embeddings": [ + [ + 1.0, + 3.0 + ] + ], + "regenerate": false + } + } + } + ], + "query": "Captain", + "queryVector": [ + 1.0, + 1.0 + ], + "processingTimeMs": "[duration]", + "limit": 20, + "offset": 0, + "estimatedTotalHits": 3, + "semanticHitCount": 0 + } + "#); snapshot!(response["semanticHitCount"], @"0"); let (response, code) = index @@ -157,7 +220,73 @@ async fn simple_search() { ) .await; 
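Aside: the `simple_search` snapshots in this hunk now assert the whole response rather than just `hits`, so `queryVector` and `semanticHitCount` are pinned too. `semanticHitCount` reports how many of the returned hits were contributed by the semantic side of the hybrid query; it climbs from 0 to 2 to 3 across the three requests, presumably as the `semanticRatio` grows. As a rough illustration of what such a ratio controls, here is a minimal sketch assuming a plain linear interpolation between the keyword and semantic scores (an illustration only, not Meilisearch's actual internal scoring):

```rust
/// Minimal sketch of hybrid score blending, assuming linear interpolation:
/// a ratio of 0.0 keeps only the keyword score, 1.0 keeps only the semantic
/// score. Illustrative only; not Meilisearch's exact formula.
fn blend(keyword_score: f64, semantic_score: f64, semantic_ratio: f64) -> f64 {
    assert!((0.0..=1.0).contains(&semantic_ratio), "ratio must be in [0, 1]");
    (1.0 - semantic_ratio) * keyword_score + semantic_ratio * semantic_score
}

fn main() {
    // At a 0.5 ratio both sides weigh equally: prints 0.985.
    println!("{}", blend(0.98, 0.99, 0.5));
}
```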
snapshot!(code, @"200 OK"); - snapshot!(response["hits"], @r###"[{"title":"Captain Marvel","desc":"a Shazam ersatz","id":"3","_vectors":{"default":{"embeddings":[[2.0,3.0]],"regenerate":false}},"_rankingScore":0.990290343761444},{"title":"Captain Planet","desc":"He's not part of the Marvel Cinematic Universe","id":"2","_vectors":{"default":{"embeddings":[[1.0,2.0]],"regenerate":false}},"_rankingScore":0.9848484848484848},{"title":"Shazam!","desc":"a Captain Marvel ersatz","id":"1","_vectors":{"default":{"embeddings":[[1.0,3.0]],"regenerate":false}},"_rankingScore":0.9472135901451112}]"###); + snapshot!(response, @r#" + { + "hits": [ + { + "title": "Captain Marvel", + "desc": "a Shazam ersatz", + "id": "3", + "_vectors": { + "default": { + "embeddings": [ + [ + 2.0, + 3.0 + ] + ], + "regenerate": false + } + }, + "_rankingScore": 0.990290343761444 + }, + { + "title": "Captain Planet", + "desc": "He's not part of the Marvel Cinematic Universe", + "id": "2", + "_vectors": { + "default": { + "embeddings": [ + [ + 1.0, + 2.0 + ] + ], + "regenerate": false + } + }, + "_rankingScore": 0.9848484848484848 + }, + { + "title": "Shazam!", + "desc": "a Captain Marvel ersatz", + "id": "1", + "_vectors": { + "default": { + "embeddings": [ + [ + 1.0, + 3.0 + ] + ], + "regenerate": false + } + }, + "_rankingScore": 0.9472135901451112 + } + ], + "query": "Captain", + "queryVector": [ + 1.0, + 1.0 + ], + "processingTimeMs": "[duration]", + "limit": 20, + "offset": 0, + "estimatedTotalHits": 3, + "semanticHitCount": 2 + } + "#); snapshot!(response["semanticHitCount"], @"2"); let (response, code) = index @@ -166,7 +295,73 @@ async fn simple_search() { ) .await; snapshot!(code, @"200 OK"); - snapshot!(response["hits"], @r###"[{"title":"Captain Marvel","desc":"a Shazam ersatz","id":"3","_vectors":{"default":{"embeddings":[[2.0,3.0]],"regenerate":false}},"_rankingScore":0.990290343761444},{"title":"Captain Planet","desc":"He's not part of the Marvel Cinematic Universe","id":"2","_vectors":{"default":{"embeddings":[[1.0,2.0]],"regenerate":false}},"_rankingScore":0.974341630935669},{"title":"Shazam!","desc":"a Captain Marvel ersatz","id":"1","_vectors":{"default":{"embeddings":[[1.0,3.0]],"regenerate":false}},"_rankingScore":0.9472135901451112}]"###); + snapshot!(response, @r#" + { + "hits": [ + { + "title": "Captain Marvel", + "desc": "a Shazam ersatz", + "id": "3", + "_vectors": { + "default": { + "embeddings": [ + [ + 2.0, + 3.0 + ] + ], + "regenerate": false + } + }, + "_rankingScore": 0.990290343761444 + }, + { + "title": "Captain Planet", + "desc": "He's not part of the Marvel Cinematic Universe", + "id": "2", + "_vectors": { + "default": { + "embeddings": [ + [ + 1.0, + 2.0 + ] + ], + "regenerate": false + } + }, + "_rankingScore": 0.974341630935669 + }, + { + "title": "Shazam!", + "desc": "a Captain Marvel ersatz", + "id": "1", + "_vectors": { + "default": { + "embeddings": [ + [ + 1.0, + 3.0 + ] + ], + "regenerate": false + } + }, + "_rankingScore": 0.9472135901451112 + } + ], + "query": "Captain", + "queryVector": [ + 1.0, + 1.0 + ], + "processingTimeMs": "[duration]", + "limit": 20, + "offset": 0, + "estimatedTotalHits": 3, + "semanticHitCount": 3 + } + "#); snapshot!(response["semanticHitCount"], @"3"); } diff --git a/crates/meilisearch/tests/search/multi/mod.rs b/crates/meilisearch/tests/search/multi/mod.rs index b9eed56da..16ee3906e 100644 --- a/crates/meilisearch/tests/search/multi/mod.rs +++ b/crates/meilisearch/tests/search/multi/mod.rs @@ -3703,7 +3703,7 @@ async fn federation_vector_two_indexes() 
{ ]})) .await; snapshot!(code, @"200 OK"); - snapshot!(json_string!(response, { ".processingTimeMs" => "[duration]", ".**._rankingScore" => "[score]" }), @r###" + snapshot!(json_string!(response, { ".processingTimeMs" => "[duration]", ".**._rankingScore" => "[score]" }), @r#" { "hits": [ { @@ -3911,9 +3911,20 @@ async fn federation_vector_two_indexes() { "limit": 20, "offset": 0, "estimatedTotalHits": 8, + "queryVectors": { + "0": [ + 1.0, + 0.0, + 0.5 + ], + "1": [ + 0.8, + 0.6 + ] + }, "semanticHitCount": 6 } - "###); + "#); // hybrid search, distinct embedder let (response, code) = server @@ -3923,7 +3934,7 @@ async fn federation_vector_two_indexes() { ]})) .await; snapshot!(code, @"200 OK"); - snapshot!(json_string!(response, { ".processingTimeMs" => "[duration]", ".**._rankingScore" => "[score]" }), @r###" + snapshot!(json_string!(response, { ".processingTimeMs" => "[duration]", ".**._rankingScore" => "[score]" }), @r#" { "hits": [ { @@ -4139,9 +4150,20 @@ async fn federation_vector_two_indexes() { "limit": 20, "offset": 0, "estimatedTotalHits": 8, + "queryVectors": { + "0": [ + 1.0, + 0.0, + 0.5 + ], + "1": [ + -1.0, + 0.6 + ] + }, "semanticHitCount": 8 } - "###); + "#); } #[actix_rt::test] diff --git a/crates/meilisearch/tests/search/multi/proxy.rs b/crates/meilisearch/tests/search/multi/proxy.rs index 311f69d9e..2b1623ff8 100644 --- a/crates/meilisearch/tests/search/multi/proxy.rs +++ b/crates/meilisearch/tests/search/multi/proxy.rs @@ -2,8 +2,9 @@ use std::sync::Arc; use actix_http::StatusCode; use meili_snap::{json_string, snapshot}; -use wiremock::matchers::AnyMatcher; -use wiremock::{Mock, MockServer, ResponseTemplate}; +use wiremock::matchers::method; +use wiremock::matchers::{path, AnyMatcher}; +use wiremock::{Mock, MockServer, Request, ResponseTemplate}; use crate::common::{Server, Value, SCORE_DOCUMENTS}; use crate::json; @@ -415,6 +416,503 @@ async fn remote_sharding() { "###); } +#[actix_rt::test] +async fn remote_sharding_retrieve_vectors() { + let ms0 = Server::new().await; + let ms1 = Server::new().await; + let ms2 = Server::new().await; + let index0 = ms0.index("test"); + let index1 = ms1.index("test"); + let index2 = ms2.index("test"); + + // enable feature + + let (response, code) = ms0.set_features(json!({"network": true})).await; + snapshot!(code, @"200 OK"); + snapshot!(json_string!(response["network"]), @"true"); + let (response, code) = ms1.set_features(json!({"network": true})).await; + snapshot!(code, @"200 OK"); + snapshot!(json_string!(response["network"]), @"true"); + let (response, code) = ms2.set_features(json!({"network": true})).await; + snapshot!(code, @"200 OK"); + snapshot!(json_string!(response["network"]), @"true"); + + // set self + + let (response, code) = ms0.set_network(json!({"self": "ms0"})).await; + snapshot!(code, @"200 OK"); + snapshot!(json_string!(response), @r###" + { + "self": "ms0", + "remotes": {} + } + "###); + let (response, code) = ms1.set_network(json!({"self": "ms1"})).await; + snapshot!(code, @"200 OK"); + snapshot!(json_string!(response), @r###" + { + "self": "ms1", + "remotes": {} + } + "###); + let (response, code) = ms2.set_network(json!({"self": "ms2"})).await; + snapshot!(code, @"200 OK"); + snapshot!(json_string!(response), @r###" + { + "self": "ms2", + "remotes": {} + } + "###); + + // setup embedders + + let mock_server = MockServer::start().await; + Mock::given(method("POST")) + .and(path("/")) + .respond_with(move |req: &Request| { + println!("Received request: {:?}", req); + let text = 
req.body_json::<String>().unwrap().to_lowercase(); + let patterns = [ + ("batman", [1.0, 0.0, 0.0]), + ("dark", [0.0, 0.1, 0.0]), + ("knight", [0.1, 0.1, 0.0]), + ("returns", [0.0, 0.0, 0.2]), + ("part", [0.05, 0.1, 0.0]), + ("1", [0.3, 0.05, 0.0]), + ("2", [0.2, 0.05, 0.0]), + ]; + let mut embedding = vec![0.; 3]; + for (pattern, vector) in patterns { + if text.contains(pattern) { + for (i, v) in vector.iter().enumerate() { + embedding[i] += v; + } + } + } + ResponseTemplate::new(200).set_body_json(json!({ "data": embedding })) + }) + .mount(&mock_server) + .await; + let url = mock_server.uri(); + + for (server, index) in [(&ms0, &index0), (&ms1, &index1), (&ms2, &index2)] { + let (response, code) = index + .update_settings(json!({ + "embedders": { + "rest": { + "source": "rest", + "url": url, + "dimensions": 3, + "request": "{{text}}", + "response": { "data": "{{embedding}}" }, + "documentTemplate": "{{doc.name}}", + }, + }, + })) + .await; + snapshot!(code, @"202 Accepted"); + server.wait_task(response.uid()).await.succeeded(); + } + + // wrap servers + let ms0 = Arc::new(ms0); + let ms1 = Arc::new(ms1); + let ms2 = Arc::new(ms2); + + let rms0 = LocalMeili::new(ms0.clone()).await; + let rms1 = LocalMeili::new(ms1.clone()).await; + let rms2 = LocalMeili::new(ms2.clone()).await; + + // set network + let network = json!({"remotes": { + "ms0": { + "url": rms0.url() + }, + "ms1": { + "url": rms1.url() + }, + "ms2": { + "url": rms2.url() + } + }}); + + let (_response, status_code) = ms0.set_network(network.clone()).await; + snapshot!(status_code, @"200 OK"); + let (_response, status_code) = ms1.set_network(network.clone()).await; + snapshot!(status_code, @"200 OK"); + let (_response, status_code) = ms2.set_network(network.clone()).await; + snapshot!(status_code, @"200 OK"); + + // multi vector search: one query per remote + + let request = json!({ + "federation": {}, + "queries": [ + { + "q": "batman", + "indexUid": "test", + "hybrid": { + "semanticRatio": 1.0, + "embedder": "rest" + }, + "retrieveVectors": true, + "federationOptions": { + "remote": "ms0" + } + }, + { + "q": "dark knight", + "indexUid": "test", + "hybrid": { + "semanticRatio": 1.0, + "embedder": "rest" + }, + "retrieveVectors": true, + "federationOptions": { + "remote": "ms1" + } + }, + { + "q": "returns", + "indexUid": "test", + "hybrid": { + "semanticRatio": 1.0, + "embedder": "rest" + }, + "retrieveVectors": true, + "federationOptions": { + "remote": "ms2" + } + }, + ] + }); + + let (response, _status_code) = ms0.multi_search(request.clone()).await; + snapshot!(code, @"200 OK"); + snapshot!(json_string!(response, { ".processingTimeMs" => "[time]" }), @r#" + { + "hits": [], + "processingTimeMs": "[time]", + "limit": 20, + "offset": 0, + "estimatedTotalHits": 0, + "queryVectors": { + "0": [ + 1.0, + 0.0, + 0.0 + ], + "1": [ + 0.1, + 0.2, + 0.0 + ], + "2": [ + 0.0, + 0.0, + 0.2 + ] + }, + "semanticHitCount": 0, + "remoteErrors": {} + } + "#); + + // multi vector search: two local queries, one remote + + let request = json!({ + "federation": {}, + "queries": [ + { + "q": "batman", + "indexUid": "test", + "hybrid": { + "semanticRatio": 1.0, + "embedder": "rest" + }, + "retrieveVectors": true, + "federationOptions": { + "remote": "ms0" + } + }, + { + "q": "dark knight", + "indexUid": "test", + "hybrid": { + "semanticRatio": 1.0, + "embedder": "rest" + }, + "retrieveVectors": true, + "federationOptions": { + "remote": "ms0" + } + }, + { + "q": "returns", + "indexUid": "test", + "hybrid": { + "semanticRatio": 1.0, + "embedder": "rest" + 
}, + "retrieveVectors": true, + "federationOptions": { + "remote": "ms2" + } + }, + ] + }); + + let (response, _status_code) = ms0.multi_search(request.clone()).await; + snapshot!(code, @"200 OK"); + snapshot!(json_string!(response, { ".processingTimeMs" => "[time]" }), @r#" + { + "hits": [], + "processingTimeMs": "[time]", + "limit": 20, + "offset": 0, + "estimatedTotalHits": 0, + "queryVectors": { + "0": [ + 1.0, + 0.0, + 0.0 + ], + "1": [ + 0.1, + 0.2, + 0.0 + ], + "2": [ + 0.0, + 0.0, + 0.2 + ] + }, + "semanticHitCount": 0, + "remoteErrors": {} + } + "#); + + // multi vector search: two queries on the same remote + + let request = json!({ + "federation": {}, + "queries": [ + { + "q": "batman", + "indexUid": "test", + "hybrid": { + "semanticRatio": 1.0, + "embedder": "rest" + }, + "retrieveVectors": true, + "federationOptions": { + "remote": "ms0" + } + }, + { + "q": "dark knight", + "indexUid": "test", + "hybrid": { + "semanticRatio": 1.0, + "embedder": "rest" + }, + "retrieveVectors": true, + "federationOptions": { + "remote": "ms1" + } + }, + { + "q": "returns", + "indexUid": "test", + "hybrid": { + "semanticRatio": 1.0, + "embedder": "rest" + }, + "retrieveVectors": true, + "federationOptions": { + "remote": "ms1" + } + }, + ] + }); + + let (response, _status_code) = ms0.multi_search(request.clone()).await; + snapshot!(code, @"200 OK"); + snapshot!(json_string!(response, { ".processingTimeMs" => "[time]" }), @r#" + { + "hits": [], + "processingTimeMs": "[time]", + "limit": 20, + "offset": 0, + "estimatedTotalHits": 0, + "queryVectors": { + "0": [ + 1.0, + 0.0, + 0.0 + ], + "1": [ + 0.1, + 0.2, + 0.0 + ], + "2": [ + 0.0, + 0.0, + 0.2 + ] + }, + "semanticHitCount": 0, + "remoteErrors": {} + } + "#); + + // multi search: two vector, one keyword + + let request = json!({ + "federation": {}, + "queries": [ + { + "q": "batman", + "indexUid": "test", + "hybrid": { + "semanticRatio": 1.0, + "embedder": "rest" + }, + "retrieveVectors": true, + "federationOptions": { + "remote": "ms0" + } + }, + { + "q": "dark knight", + "indexUid": "test", + "hybrid": { + "semanticRatio": 0.0, + "embedder": "rest" + }, + "retrieveVectors": true, + "federationOptions": { + "remote": "ms1" + } + }, + { + "q": "returns", + "indexUid": "test", + "hybrid": { + "semanticRatio": 1.0, + "embedder": "rest" + }, + "retrieveVectors": true, + "federationOptions": { + "remote": "ms1" + } + }, + ] + }); + + let (response, _status_code) = ms0.multi_search(request.clone()).await; + snapshot!(code, @"200 OK"); + snapshot!(json_string!(response, { ".processingTimeMs" => "[time]" }), @r#" + { + "hits": [], + "processingTimeMs": "[time]", + "limit": 20, + "offset": 0, + "estimatedTotalHits": 0, + "queryVectors": { + "0": [ + 1.0, + 0.0, + 0.0 + ], + "2": [ + 0.0, + 0.0, + 0.2 + ] + }, + "semanticHitCount": 0, + "remoteErrors": {} + } + "#); + + // multi vector search: no local queries, all remote + + let request = json!({ + "federation": {}, + "queries": [ + { + "q": "batman", + "indexUid": "test", + "hybrid": { + "semanticRatio": 1.0, + "embedder": "rest" + }, + "retrieveVectors": true, + "federationOptions": { + "remote": "ms1" + } + }, + { + "q": "dark knight", + "indexUid": "test", + "hybrid": { + "semanticRatio": 1.0, + "embedder": "rest" + }, + "retrieveVectors": true, + "federationOptions": { + "remote": "ms1" + } + }, + { + "q": "returns", + "indexUid": "test", + "hybrid": { + "semanticRatio": 1.0, + "embedder": "rest" + }, + "retrieveVectors": true, + "federationOptions": { + "remote": "ms1" + } + }, + ] + }); + + let 
(response, _status_code) = ms0.multi_search(request.clone()).await; + snapshot!(code, @"200 OK"); + snapshot!(json_string!(response, { ".processingTimeMs" => "[time]" }), @r#" + { + "hits": [], + "processingTimeMs": "[time]", + "limit": 20, + "offset": 0, + "estimatedTotalHits": 0, + "queryVectors": { + "0": [ + 1.0, + 0.0, + 0.0 + ], + "1": [ + 0.1, + 0.2, + 0.0 + ], + "2": [ + 0.0, + 0.0, + 0.2 + ] + }, + "remoteErrors": {} + } + "#); +} + #[actix_rt::test] async fn error_unregistered_remote() { let ms0 = Server::new().await; @@ -2500,7 +2998,7 @@ pub struct LocalMeiliParams { /// A server that exploits [`MockServer`] to provide an URL for testing network and the network. pub struct LocalMeili { - mock_server: MockServer, + mock_server: &'static MockServer, } impl LocalMeili { @@ -2509,7 +3007,7 @@ impl LocalMeili { } pub async fn with_params(server: Arc<Server>, params: LocalMeiliParams) -> Self { - let mock_server = MockServer::start().await; + let mock_server = Box::leak(Box::new(MockServer::start().await)); // tokio won't let us execute asynchronous code from a sync function inside of an async test, // so instead we spawn another thread that will call the service on a brand new tokio runtime @@ -2573,7 +3071,7 @@ impl LocalMeili { response.set_body_json(value) } }) - .mount(&mock_server) + .mount(mock_server) .await; Self { mock_server } } diff --git a/crates/meilisearch/tests/search/pagination.rs b/crates/meilisearch/tests/search/pagination.rs index c0752e7ec..6dd8b3181 100644 --- a/crates/meilisearch/tests/search/pagination.rs +++ b/crates/meilisearch/tests/search/pagination.rs @@ -1,6 +1,7 @@ use super::shared_index_with_documents; use crate::common::Server; use crate::json; +use meili_snap::{json_string, snapshot}; #[actix_rt::test] async fn default_search_should_return_estimated_total_hit() { @@ -133,3 +134,61 @@ async fn ensure_placeholder_search_hit_count_valid() { .await; } } + +#[actix_rt::test] +async fn test_issue_5274() { + let server = Server::new_shared(); + let index = server.unique_index(); + + let documents = json!([ + { + "id": 1, + "title": "Document 1", + "content": "This is the first." + }, + { + "id": 2, + "title": "Document 2", + "content": "This is the second doc." + } + ]); + let (task, _code) = index.add_documents(documents, None).await; + server.wait_task(task.uid()).await.succeeded(); + + // Find out the lowest ranking score among the documents + let (rep, _status) = index + .search_post(json!({"q": "doc", "page": 1, "hitsPerPage": 2, "showRankingScore": true})) + .await; + let hits = rep["hits"].as_array().expect("Missing hits array"); + let second_hit = hits.get(1).expect("Missing second hit"); + let ranking_score = second_hit + .get("_rankingScore") + .expect("Missing _rankingScore field") + .as_f64() + .expect("Expected _rankingScore to be a f64"); + + // Search with a ranking score threshold just above and expect to be a single hit + let (rep, _status) = index + .search_post(json!({"q": "doc", "page": 1, "hitsPerPage": 1, "rankingScoreThreshold": ranking_score + 0.0001})) + .await; + + snapshot!(json_string!(rep, { + ".processingTimeMs" => "[ignored]", + }), @r#" + { + "hits": [ + { + "id": 2, + "title": "Document 2", + "content": "This is the second doc." 
+ } + ], + "query": "doc", + "processingTimeMs": "[ignored]", + "hitsPerPage": 1, + "page": 1, + "totalPages": 1, + "totalHits": 1 + } + "#); +} diff --git a/crates/meilisearch/tests/settings/chat.rs b/crates/meilisearch/tests/settings/chat.rs new file mode 100644 index 000000000..891a22431 --- /dev/null +++ b/crates/meilisearch/tests/settings/chat.rs @@ -0,0 +1,66 @@ +use crate::common::Server; +use crate::json; +use meili_snap::{json_string, snapshot}; + +#[actix_rt::test] +async fn set_reset_chat_issue_5772() { + let server = Server::new().await; + let index = server.unique_index(); + + let (_, code) = server + .set_features(json!({ + "chatCompletions": true, + })) + .await; + snapshot!(code, @r#"200 OK"#); + + let (task1, _code) = index.update_settings_chat(json!({ + "description": "test!", + "documentTemplate": "{% for field in fields %}{% if field.is_searchable and field.value != nil %}{{ field.name }}: {{ field.value }}\n{% endif %}{% endfor %}", + "documentTemplateMaxBytes": 400, + "searchParameters": { + "limit": 15, + "sort": [], + "attributesToSearchOn": [] + } + })).await; + server.wait_task(task1.uid()).await.succeeded(); + + let (response, _) = index.settings().await; + snapshot!(json_string!(response["chat"]), @r#" + { + "description": "test!", + "documentTemplate": "{% for field in fields %}{% if field.is_searchable and field.value != nil %}{{ field.name }}: {{ field.value }}\n{% endif %}{% endfor %}", + "documentTemplateMaxBytes": 400, + "searchParameters": { + "limit": 15, + "sort": [], + "attributesToSearchOn": [] + } + } + "#); + + let (task2, _status_code) = index.update_settings_chat(json!({ + "description": "test!", + "documentTemplate": "{% for field in fields %}{% if field.is_searchable and field.value != nil %}{{ field.name }}: {{ field.value }}\n{% endif %}{% endfor %}", + "documentTemplateMaxBytes": 400, + "searchParameters": { + "limit": 16 + } + })).await; + server.wait_task(task2.uid()).await.succeeded(); + + let (response, _) = index.settings().await; + snapshot!(json_string!(response["chat"]), @r#" + { + "description": "test!", + "documentTemplate": "{% for field in fields %}{% if field.is_searchable and field.value != nil %}{{ field.name }}: {{ field.value }}\n{% endif %}{% endfor %}", + "documentTemplateMaxBytes": 400, + "searchParameters": { + "limit": 16, + "sort": [], + "attributesToSearchOn": [] + } + } + "#); +} diff --git a/crates/meilisearch/tests/settings/get_settings.rs b/crates/meilisearch/tests/settings/get_settings.rs index 47e699380..8419f640d 100644 --- a/crates/meilisearch/tests/settings/get_settings.rs +++ b/crates/meilisearch/tests/settings/get_settings.rs @@ -186,7 +186,7 @@ test_setting_routes!( }, { setting: chat, - update_verb: put, + update_verb: patch, default_value: { "description": "", "documentTemplate": "{% for field in fields %}{% if field.is_searchable and field.value != nil %}{{ field.name }}: {{ field.value }}\n{% endif %}{% endfor %}", @@ -692,3 +692,68 @@ async fn granular_filterable_attributes() { ] "###); } + +#[actix_rt::test] +async fn test_searchable_attributes_order() { + let server = Server::new_shared(); + let index = server.unique_index(); + + // 1) Create an index with settings "searchableAttributes": ["title", "overview"] + let (response, code) = index.create(None).await; + assert_eq!(code, 202, "{response}"); + server.wait_task(response.uid()).await.succeeded(); + + let (task, code) = index + .update_settings(json!({ + "searchableAttributes": ["title", "overview"] + })) + .await; + assert_eq!(code, 202, 
"{task}"); + server.wait_task(task.uid()).await.succeeded(); + + // 2) Add documents in the index + let documents = json!([ + { + "id": 1, + "title": "The Matrix", + "overview": "A computer hacker learns from mysterious rebels about the true nature of his reality." + }, + { + "id": 2, + "title": "Inception", + "overview": "A thief who steals corporate secrets through dream-sharing technology." + } + ]); + + let (response, code) = index.add_documents(documents, None).await; + assert_eq!(code, 202, "{response}"); + server.wait_task(response.uid()).await.succeeded(); + + // 3) Modify the settings "searchableAttributes": ["overview", "title"] (overview is put first) + let (task, code) = index + .update_settings(json!({ + "searchableAttributes": ["overview", "title"] + })) + .await; + assert_eq!(code, 202, "{task}"); + server.wait_task(task.uid()).await.succeeded(); + + // 4) Check if it has been applied + let (response, code) = index.settings().await; + assert_eq!(code, 200, "{response}"); + assert_eq!(response["searchableAttributes"], json!(["overview", "title"])); + + // 5) Re-modify the settings "searchableAttributes": ["title", "overview"] (title is put first) + let (task, code) = index + .update_settings(json!({ + "searchableAttributes": ["title", "overview"] + })) + .await; + assert_eq!(code, 202, "{task}"); + server.wait_task(task.uid()).await.succeeded(); + + // 6) Check if it has been applied + let (response, code) = index.settings().await; + assert_eq!(code, 200, "{response}"); + assert_eq!(response["searchableAttributes"], json!(["title", "overview"])); +} diff --git a/crates/meilisearch/tests/settings/mod.rs b/crates/meilisearch/tests/settings/mod.rs index 6b61e6be0..b3a956c25 100644 --- a/crates/meilisearch/tests/settings/mod.rs +++ b/crates/meilisearch/tests/settings/mod.rs @@ -1,3 +1,4 @@ +mod chat; mod distinct; mod errors; mod get_settings; diff --git a/crates/meilisearch/tests/snapshot/mod.rs b/crates/meilisearch/tests/snapshot/mod.rs index 32946b06e..98ce17b80 100644 --- a/crates/meilisearch/tests/snapshot/mod.rs +++ b/crates/meilisearch/tests/snapshot/mod.rs @@ -122,11 +122,7 @@ async fn perform_on_demand_snapshot() { let server = Server::new_with_options(options).await.unwrap(); let index = server.index("catto"); - index - .update_settings(json! ({ - "searchableAttributes": [], - })) - .await; + index.update_settings(json! 
({ "searchableAttributes": [] })).await; index.load_test_set(&server).await; @@ -203,3 +199,70 @@ async fn perform_on_demand_snapshot() { server.index("doggo").settings(), ); } + +#[actix_rt::test] +#[cfg_attr(target_os = "windows", ignore)] +async fn snapshotception_issue_4653() { + let temp = tempfile::tempdir().unwrap(); + let snapshot_dir = tempfile::tempdir().unwrap(); + let options = + Opt { snapshot_dir: snapshot_dir.path().to_owned(), ..default_settings(temp.path()) }; + + let server = Server::new_with_options(options).await.unwrap(); + + let (task, code) = server.create_snapshot().await; + snapshot!(code, @"202 Accepted"); + snapshot!(json_string!(task, { ".enqueuedAt" => "[date]" }), @r###" + { + "taskUid": 0, + "indexUid": null, + "status": "enqueued", + "type": "snapshotCreation", + "enqueuedAt": "[date]" + } + "###); + server.wait_task(task.uid()).await.succeeded(); + + let temp = tempfile::tempdir().unwrap(); + let snapshot_path = snapshot_dir.path().to_owned().join("db.snapshot"); + + let options = Opt { import_snapshot: Some(snapshot_path), ..default_settings(temp.path()) }; + let snapshot_server = Server::new_with_options(options).await.unwrap(); + + // The snapshot should have been taken without the snapshot creation task + let (tasks, code) = snapshot_server.tasks().await; + snapshot!(code, @"200 OK"); + snapshot!(tasks, @r#" + { + "results": [], + "total": 0, + "limit": 20, + "from": null, + "next": null + } + "#); + + // Ensure the task is not present in the snapshot + let (task, code) = snapshot_server.get_task(0).await; + snapshot!(code, @"404 Not Found"); + snapshot!(task, @r#" + { + "message": "Task `0` not found.", + "code": "task_not_found", + "type": "invalid_request", + "link": "https://docs.meilisearch.com/errors#task_not_found" + } + "#); + + // Ensure the batch is also not present + let (batch, code) = snapshot_server.get_batch(0).await; + snapshot!(code, @"404 Not Found"); + snapshot!(batch, @r#" + { + "message": "Batch `0` not found.", + "code": "batch_not_found", + "type": "invalid_request", + "link": "https://docs.meilisearch.com/errors#batch_not_found" + } + "#); +} diff --git a/crates/meilisearch/tests/tasks/webhook.rs b/crates/meilisearch/tests/tasks/webhook.rs index b18002eb7..bf2477b25 100644 --- a/crates/meilisearch/tests/tasks/webhook.rs +++ b/crates/meilisearch/tests/tasks/webhook.rs @@ -2,16 +2,18 @@ //! post requests. The webhook handle starts a server and forwards all the //! received requests into a channel for you to handle. 
+use std::path::PathBuf; use std::sync::Arc; use actix_http::body::MessageBody; use actix_web::dev::{ServiceFactory, ServiceResponse}; use actix_web::web::{Bytes, Data}; use actix_web::{post, App, HttpRequest, HttpResponse, HttpServer}; -use meili_snap::snapshot; +use meili_snap::{json_string, snapshot}; use meilisearch::Opt; use tokio::sync::mpsc; use url::Url; +use uuid::Uuid; use crate::common::{self, default_settings, Server}; use crate::json; @@ -68,21 +70,55 @@ async fn create_webhook_server() -> WebhookHandle { } #[actix_web::test] -async fn test_basic_webhook() { - let WebhookHandle { server_handle, url, mut receiver } = create_webhook_server().await; - +async fn cli_only() { let db_path = tempfile::tempdir().unwrap(); let server = Server::new_with_options(Opt { - task_webhook_url: Some(Url::parse(&url).unwrap()), + task_webhook_url: Some(Url::parse("https://example-cli.com/").unwrap()), + task_webhook_authorization_header: Some(String::from("Bearer a-secret-token")), ..default_settings(db_path.path()) }) .await .unwrap(); - let index = server.index("tamo"); + let (webhooks, code) = server.get_webhooks().await; + snapshot!(code, @"200 OK"); + snapshot!(webhooks, @r#" + { + "results": [ + { + "uuid": "00000000-0000-0000-0000-000000000000", + "isEditable": false, + "url": "https://example-cli.com/", + "headers": { + "Authorization": "Bearer a-secret-token" + } + } + ] + } + "#); +} + +#[actix_web::test] +async fn single_receives_data() { + let WebhookHandle { server_handle, url, mut receiver } = create_webhook_server().await; + + let server = Server::new().await; + + let (value, code) = server.create_webhook(json!({ "url": url })).await; + snapshot!(code, @"201 Created"); + snapshot!(json_string!(value, { ".uuid" => "[uuid]", ".url" => "[ignored]" }), @r#" + { + "uuid": "[uuid]", + "isEditable": true, + "url": "[ignored]", + "headers": {} + } + "#); + // May be flaky: we're relying on the fact that while the first document addition is processed, the other // operations will be received and will be batched together. If it doesn't happen it's not a problem // the rest of the test won't assume anything about the number of tasks per batch. 
+ let index = server.index("tamo"); for i in 0..5 { let (_, _status) = index.add_documents(json!({ "id": i, "doggo": "bone" }), None).await; } @@ -127,3 +163,496 @@ async fn test_basic_webhook() { server_handle.abort(); } + +#[actix_web::test] +async fn multiple_receive_data() { + let WebhookHandle { server_handle: handle1, url: url1, receiver: mut receiver1 } = + create_webhook_server().await; + let WebhookHandle { server_handle: handle2, url: url2, receiver: mut receiver2 } = + create_webhook_server().await; + let WebhookHandle { server_handle: handle3, url: url3, receiver: mut receiver3 } = + create_webhook_server().await; + + let db_path = tempfile::tempdir().unwrap(); + let server = Server::new_with_options(Opt { + task_webhook_url: Some(Url::parse(&url3).unwrap()), + ..default_settings(db_path.path()) + }) + .await + .unwrap(); + + for url in [url1, url2] { + let (value, code) = server.create_webhook(json!({ "url": url })).await; + snapshot!(code, @"201 Created"); + snapshot!(json_string!(value, { ".uuid" => "[uuid]", ".url" => "[ignored]" }), @r#" + { + "uuid": "[uuid]", + "isEditable": true, + "url": "[ignored]", + "headers": {} + } + "#); + } + let index = server.index("tamo"); + let (_, status) = index.add_documents(json!({ "id": 1, "doggo": "bone" }), None).await; + snapshot!(status, @"202 Accepted"); + + let mut count1 = 0; + let mut count2 = 0; + let mut count3 = 0; + while count1 == 0 || count2 == 0 || count3 == 0 { + tokio::select! { + msg = receiver1.recv() => { if msg.is_some() { count1 += 1; } }, + msg = receiver2.recv() => { if msg.is_some() { count2 += 1; } }, + msg = receiver3.recv() => { if msg.is_some() { count3 += 1; } }, + } + } + + assert_eq!(count1, 1); + assert_eq!(count2, 1); + assert_eq!(count3, 1); + + handle1.abort(); + handle2.abort(); + handle3.abort(); +} + +#[actix_web::test] +async fn cli_with_dumps() { + let db_path = tempfile::tempdir().unwrap(); + let server = Server::new_with_options(Opt { + task_webhook_url: Some(Url::parse("http://defined-in-test-cli.com").unwrap()), + task_webhook_authorization_header: Some(String::from( + "Bearer a-secret-token-defined-in-test-cli", + )), + import_dump: Some(PathBuf::from("../dump/tests/assets/v6-with-webhooks.dump")), + ..default_settings(db_path.path()) + }) + .await + .unwrap(); + + let (webhooks, code) = server.get_webhooks().await; + snapshot!(code, @"200 OK"); + snapshot!(webhooks, @r#" + { + "results": [ + { + "uuid": "00000000-0000-0000-0000-000000000000", + "isEditable": false, + "url": "http://defined-in-test-cli.com/", + "headers": { + "Authorization": "Bearer a-secret-token-defined-in-test-cli" + } + }, + { + "uuid": "627ea538-733d-4545-8d2d-03526eb381ce", + "isEditable": true, + "url": "https://example.com/authorization-less", + "headers": {} + }, + { + "uuid": "771b0a28-ef28-4082-b984-536f82958c65", + "isEditable": true, + "url": "https://example.com/hook", + "headers": { + "authorization": "TOKEN" + } + }, + { + "uuid": "f3583083-f8a7-4cbf-a5e7-fb3f1e28a7e9", + "isEditable": true, + "url": "https://third.com", + "headers": {} + } + ] + } + "#); +} + +#[actix_web::test] +async fn reserved_names() { + let db_path = tempfile::tempdir().unwrap(); + let server = Server::new_with_options(Opt { + task_webhook_url: Some(Url::parse("https://example-cli.com/").unwrap()), + task_webhook_authorization_header: Some(String::from("Bearer a-secret-token")), + ..default_settings(db_path.path()) + }) + .await + .unwrap(); + + let (value, code) = server + .patch_webhook(Uuid::nil().to_string(), json!({ "url": 
"http://localhost:8080" })) + .await; + snapshot!(value, @r#" + { + "message": "Webhook `[uuid]` is immutable. The webhook defined from the command line cannot be modified using the API.", + "code": "immutable_webhook", + "type": "invalid_request", + "link": "https://docs.meilisearch.com/errors#immutable_webhook" + } + "#); + snapshot!(code, @"400 Bad Request"); + + let (value, code) = server.delete_webhook(Uuid::nil().to_string()).await; + snapshot!(value, @r#" + { + "message": "Webhook `[uuid]` is immutable. The webhook defined from the command line cannot be modified using the API.", + "code": "immutable_webhook", + "type": "invalid_request", + "link": "https://docs.meilisearch.com/errors#immutable_webhook" + } + "#); + snapshot!(code, @"400 Bad Request"); +} + +#[actix_web::test] +async fn over_limits() { + let server = Server::new().await; + + // Too many webhooks + let mut uuids = Vec::new(); + for _ in 0..20 { + let (value, code) = server.create_webhook(json!({ "url": "http://localhost:8080" } )).await; + snapshot!(code, @"201 Created"); + uuids.push(value.get("uuid").unwrap().as_str().unwrap().to_string()); + } + let (value, code) = server.create_webhook(json!({ "url": "http://localhost:8080" })).await; + snapshot!(code, @"400 Bad Request"); + snapshot!(value, @r#" + { + "message": "Defining too many webhooks would crush the server. Please limit the number of webhooks to 20. You may use a third-party proxy server to dispatch events to more than 20 endpoints.", + "code": "invalid_webhooks", + "type": "invalid_request", + "link": "https://docs.meilisearch.com/errors#invalid_webhooks" + } + "#); + + // Reset webhooks + for uuid in uuids { + let (_value, code) = server.delete_webhook(&uuid).await; + snapshot!(code, @"204 No Content"); + } + + // Test too many headers + let (value, code) = server.create_webhook(json!({ "url": "http://localhost:8080" })).await; + snapshot!(code, @"201 Created"); + let uuid = value.get("uuid").unwrap().as_str().unwrap(); + for i in 0..200 { + let header_name = format!("header_{i}"); + let (_value, code) = + server.patch_webhook(uuid, json!({ "headers": { header_name: "" } })).await; + snapshot!(code, @"200 OK"); + } + let (value, code) = + server.patch_webhook(uuid, json!({ "headers": { "header_200": "" } })).await; + snapshot!(code, @"400 Bad Request"); + snapshot!(value, @r#" + { + "message": "Too many headers for the webhook `[uuid]`. Please limit the number of headers to 200. 
Hint: To remove an already defined header set its value to `null`", + "code": "invalid_webhook_headers", + "type": "invalid_request", + "link": "https://docs.meilisearch.com/errors#invalid_webhook_headers" + } + "#); +} + +#[actix_web::test] +async fn post_get_delete() { + let server = Server::new().await; + + let (value, code) = server + .create_webhook(json!({ + "url": "https://example.com/hook", + "headers": { "authorization": "TOKEN" } + })) + .await; + snapshot!(code, @"201 Created"); + snapshot!(json_string!(value, { ".uuid" => "[uuid]" }), @r#" + { + "uuid": "[uuid]", + "isEditable": true, + "url": "https://example.com/hook", + "headers": { + "authorization": "TOKEN" + } + } + "#); + + let uuid = value.get("uuid").unwrap().as_str().unwrap(); + let (value, code) = server.get_webhook(uuid).await; + snapshot!(code, @"200 OK"); + snapshot!(json_string!(value, { ".uuid" => "[uuid]" }), @r#" + { + "uuid": "[uuid]", + "isEditable": true, + "url": "https://example.com/hook", + "headers": { + "authorization": "TOKEN" + } + } + "#); + + let (_value, code) = server.delete_webhook(uuid).await; + snapshot!(code, @"204 No Content"); + + let (_value, code) = server.get_webhook(uuid).await; + snapshot!(code, @"404 Not Found"); +} + +#[actix_web::test] +async fn create_and_patch() { + let server = Server::new().await; + + let (value, code) = + server.create_webhook(json!({ "headers": { "authorization": "TOKEN" } })).await; + snapshot!(code, @"400 Bad Request"); + snapshot!(value, @r#" + { + "message": "The URL for the webhook `[uuid]` is missing.", + "code": "invalid_webhook_url", + "type": "invalid_request", + "link": "https://docs.meilisearch.com/errors#invalid_webhook_url" + } + "#); + + let (value, code) = server.create_webhook(json!({ "url": "https://example.com/hook" })).await; + snapshot!(code, @"201 Created"); + snapshot!(json_string!(value, { ".uuid" => "[uuid]" }), @r#" + { + "uuid": "[uuid]", + "isEditable": true, + "url": "https://example.com/hook", + "headers": {} + } + "#); + + let uuid = value.get("uuid").unwrap().as_str().unwrap(); + let (value, code) = + server.patch_webhook(&uuid, json!({ "headers": { "authorization": "TOKEN" } })).await; + snapshot!(code, @"200 OK"); + snapshot!(json_string!(value, { ".uuid" => "[uuid]" }), @r#" + { + "uuid": "[uuid]", + "isEditable": true, + "url": "https://example.com/hook", + "headers": { + "authorization": "TOKEN" + } + } + "#); + + let (value, code) = + server.patch_webhook(&uuid, json!({ "headers": { "authorization2": "TOKEN" } })).await; + snapshot!(code, @"200 OK"); + snapshot!(json_string!(value, { ".uuid" => "[uuid]" }), @r#" + { + "uuid": "[uuid]", + "isEditable": true, + "url": "https://example.com/hook", + "headers": { + "authorization": "TOKEN", + "authorization2": "TOKEN" + } + } + "#); + + let (value, code) = + server.patch_webhook(&uuid, json!({ "headers": { "authorization": null } })).await; + snapshot!(code, @"200 OK"); + snapshot!(json_string!(value, { ".uuid" => "[uuid]" }), @r#" + { + "uuid": "[uuid]", + "isEditable": true, + "url": "https://example.com/hook", + "headers": { + "authorization2": "TOKEN" + } + } + "#); + + let (value, code) = server.patch_webhook(&uuid, json!({ "url": null })).await; + snapshot!(code, @"400 Bad Request"); + snapshot!(json_string!(value, { ".uuid" => "[uuid]" }), @r#" + { + "message": "The URL for the webhook `[uuid]` is missing.", + "code": "invalid_webhook_url", + "type": "invalid_request", + "link": "https://docs.meilisearch.com/errors#invalid_webhook_url" + } + "#); +} + +#[actix_web::test] 
+async fn invalid_url_and_headers() { + let server = Server::new().await; + + // Test invalid URL format + let (value, code) = server.create_webhook(json!({ "url": "not-a-valid-url" })).await; + snapshot!(code, @"400 Bad Request"); + snapshot!(value, @r#" + { + "message": "Invalid URL `not-a-valid-url`: relative URL without a base", + "code": "invalid_webhook_url", + "type": "invalid_request", + "link": "https://docs.meilisearch.com/errors#invalid_webhook_url" + } + "#); + + // Test invalid header name (containing spaces) + let (value, code) = server + .create_webhook(json!({ + "url": "https://example.com/hook", + "headers": { "invalid header name": "value" } + })) + .await; + snapshot!(code, @"400 Bad Request"); + snapshot!(value, @r#" + { + "message": "Invalid header name `invalid header name`: invalid HTTP header name", + "code": "invalid_webhook_headers", + "type": "invalid_request", + "link": "https://docs.meilisearch.com/errors#invalid_webhook_headers" + } + "#); + + // Test invalid header value (containing control characters) + let (value, code) = server + .create_webhook(json!({ + "url": "https://example.com/hook", + "headers": { "authorization": "token\nwith\nnewlines" } + })) + .await; + snapshot!(code, @"400 Bad Request"); + snapshot!(value, @r#" + { + "message": "Invalid header value `authorization`: failed to parse header value", + "code": "invalid_webhook_headers", + "type": "invalid_request", + "link": "https://docs.meilisearch.com/errors#invalid_webhook_headers" + } + "#); +} + +#[actix_web::test] +async fn invalid_uuid() { + let server = Server::new().await; + + // Test get webhook with invalid UUID + let (value, code) = server.get_webhook("invalid-uuid").await; + snapshot!(code, @"400 Bad Request"); + snapshot!(value, @r#" + { + "message": "Invalid UUID: invalid character: expected an optional prefix of `urn:uuid:` followed by [0-9a-fA-F-], found `i` at 1", + "code": "invalid_webhook_uuid", + "type": "invalid_request", + "link": "https://docs.meilisearch.com/errors#invalid_webhook_uuid" + } + "#); + + // Test update webhook with invalid UUID + let (value, code) = + server.patch_webhook("invalid-uuid", json!({ "url": "https://example.com/hook" })).await; + snapshot!(code, @"400 Bad Request"); + snapshot!(value, @r#" + { + "message": "Invalid UUID: invalid character: expected an optional prefix of `urn:uuid:` followed by [0-9a-fA-F-], found `i` at 1", + "code": "invalid_webhook_uuid", + "type": "invalid_request", + "link": "https://docs.meilisearch.com/errors#invalid_webhook_uuid" + } + "#); + + // Test delete webhook with invalid UUID + let (value, code) = server.delete_webhook("invalid-uuid").await; + snapshot!(code, @"400 Bad Request"); + snapshot!(value, @r#" + { + "message": "Invalid UUID: invalid character: expected an optional prefix of `urn:uuid:` followed by [0-9a-fA-F-], found `i` at 1", + "code": "invalid_webhook_uuid", + "type": "invalid_request", + "link": "https://docs.meilisearch.com/errors#invalid_webhook_uuid" + } + "#); +} + +#[actix_web::test] +async fn forbidden_fields() { + let server = Server::new().await; + + // Test creating webhook with uuid field + let custom_uuid = Uuid::new_v4(); + let (value, code) = server + .create_webhook(json!({ + "url": "https://example.com/hook", + "uuid": custom_uuid.to_string(), + "headers": { "authorization": "TOKEN" } + })) + .await; + snapshot!(code, @"400 Bad Request"); + snapshot!(value, @r#" + { + "message": "Immutable field `uuid`: expected one of `url`, `headers`", + "code": "immutable_webhook_uuid", + "type": 
"invalid_request", + "link": "https://docs.meilisearch.com/errors#immutable_webhook_uuid" + } + "#); + + // Test creating webhook with isEditable field + let (value, code) = server + .create_webhook(json!({ + "url": "https://example.com/hook2", + "isEditable": false, + "headers": { "authorization": "TOKEN" } + })) + .await; + snapshot!(code, @"400 Bad Request"); + snapshot!(value, @r#" + { + "message": "Immutable field `isEditable`: expected one of `url`, `headers`", + "code": "immutable_webhook_is_editable", + "type": "invalid_request", + "link": "https://docs.meilisearch.com/errors#immutable_webhook_is_editable" + } + "#); + + // Test patching webhook with uuid field + let (value, code) = server + .patch_webhook( + "uuid-whatever", + json!({ + "uuid": Uuid::new_v4(), + "headers": { "new-header": "value" } + }), + ) + .await; + snapshot!(code, @"400 Bad Request"); + snapshot!(value, @r#" + { + "message": "Immutable field `uuid`: expected one of `url`, `headers`", + "code": "immutable_webhook_uuid", + "type": "invalid_request", + "link": "https://docs.meilisearch.com/errors#immutable_webhook_uuid" + } + "#); + + // Test patching webhook with isEditable field + let (value, code) = server + .patch_webhook( + "uuid-whatever", + json!({ + "isEditable": false, + "headers": { "another-header": "value" } + }), + ) + .await; + snapshot!(code, @"400 Bad Request"); + snapshot!(json_string!(value, { ".uuid" => "[uuid]" }), @r#" + { + "message": "Immutable field `isEditable`: expected one of `url`, `headers`", + "code": "immutable_webhook_is_editable", + "type": "invalid_request", + "link": "https://docs.meilisearch.com/errors#immutable_webhook_is_editable" + } + "#); +} diff --git a/crates/meilisearch/tests/upgrade/mod.rs b/crates/meilisearch/tests/upgrade/mod.rs index 8114ed58b..5d120ba2f 100644 --- a/crates/meilisearch/tests/upgrade/mod.rs +++ b/crates/meilisearch/tests/upgrade/mod.rs @@ -43,7 +43,7 @@ async fn version_too_old() { std::fs::write(db_path.join("VERSION"), "1.11.9999").unwrap(); let options = Opt { experimental_dumpless_upgrade: true, ..default_settings }; let err = Server::new_with_options(options).await.map(|_| ()).unwrap_err(); - snapshot!(err, @"Database version 1.11.9999 is too old for the experimental dumpless upgrade feature. Please generate a dump using the v1.11.9999 and import it in the v1.16.0"); + snapshot!(err, @"Database version 1.11.9999 is too old for the experimental dumpless upgrade feature. Please generate a dump using the v1.11.9999 and import it in the v1.17.1"); } #[actix_rt::test] @@ -58,7 +58,7 @@ async fn version_requires_downgrade() { std::fs::write(db_path.join("VERSION"), format!("{major}.{minor}.{patch}")).unwrap(); let options = Opt { experimental_dumpless_upgrade: true, ..default_settings }; let err = Server::new_with_options(options).await.map(|_| ()).unwrap_err(); - snapshot!(err, @"Database version 1.16.1 is higher than the Meilisearch version 1.16.0. Downgrade is not supported"); + snapshot!(err, @"Database version 1.17.2 is higher than the Meilisearch version 1.17.1. 
Downgrade is not supported"); } #[actix_rt::test] diff --git a/crates/meilisearch/tests/upgrade/v1_12/snapshots/v1_12_0.rs/check_the_index_features/kefir_settings.snap b/crates/meilisearch/tests/upgrade/v1_12/snapshots/v1_12_0.rs/check_the_index_features/kefir_settings.snap index af7e82c8b..3c97dbe70 100644 --- a/crates/meilisearch/tests/upgrade/v1_12/snapshots/v1_12_0.rs/check_the_index_features/kefir_settings.snap +++ b/crates/meilisearch/tests/upgrade/v1_12/snapshots/v1_12_0.rs/check_the_index_features/kefir_settings.snap @@ -61,7 +61,16 @@ source: crates/meilisearch/tests/upgrade/v1_12/v1_12_0.rs "pagination": { "maxTotalHits": 15 }, - "embedders": {}, + "embedders": { + "doggo_embedder": { + "source": "huggingFace", + "model": "sentence-transformers/all-MiniLM-L6-v2", + "revision": "e4ce9877abf3edfe10b0d82785e83bdcb973e22e", + "pooling": "forceMean", + "documentTemplate": "{{doc.description}}", + "documentTemplateMaxBytes": 400 + } + }, "searchCutoffMs": 8000, "localizedAttributes": [ { diff --git a/crates/meilisearch/tests/upgrade/v1_12/snapshots/v1_12_0.rs/check_the_index_features/search_with_retrieve_vectors.snap b/crates/meilisearch/tests/upgrade/v1_12/snapshots/v1_12_0.rs/check_the_index_features/search_with_retrieve_vectors.snap new file mode 100644 index 000000000..5baf8155c --- /dev/null +++ b/crates/meilisearch/tests/upgrade/v1_12/snapshots/v1_12_0.rs/check_the_index_features/search_with_retrieve_vectors.snap @@ -0,0 +1,40 @@ +--- +source: crates/meilisearch/tests/upgrade/v1_12/v1_12_0.rs +--- +[ + { + "id": 1, + "name": "kefir", + "surname": [ + "kef", + "kefkef", + "kefirounet", + "boubou" + ], + "age": 1.4, + "description": "kefir est un petit chien blanc très mignon", + "_vectors": { + "doggo_embedder": { + "embeddings": "[vector]", + "regenerate": true + } + } + }, + { + "id": 2, + "name": "intel", + "surname": [ + "untel", + "tétel", + "iouiou" + ], + "age": 11.5, + "description": "intel est un grand beagle très mignon", + "_vectors": { + "doggo_embedder": { + "embeddings": "[vector]", + "regenerate": false + } + } + } +] diff --git a/crates/meilisearch/tests/upgrade/v1_12/snapshots/v1_12_0.rs/check_the_index_scheduler/batches_filter_afterEnqueuedAt_equal_2025-01-16T16_47_41.snap b/crates/meilisearch/tests/upgrade/v1_12/snapshots/v1_12_0.rs/check_the_index_scheduler/batches_filter_afterEnqueuedAt_equal_2025-01-16T16_47_41.snap index f4edae51b..e7d8768be 100644 --- a/crates/meilisearch/tests/upgrade/v1_12/snapshots/v1_12_0.rs/check_the_index_scheduler/batches_filter_afterEnqueuedAt_equal_2025-01-16T16_47_41.snap +++ b/crates/meilisearch/tests/upgrade/v1_12/snapshots/v1_12_0.rs/check_the_index_scheduler/batches_filter_afterEnqueuedAt_equal_2025-01-16T16_47_41.snap @@ -4,11 +4,11 @@ source: crates/meilisearch/tests/upgrade/v1_12/v1_12_0.rs { "results": [ { - "uid": 24, + "uid": 30, "progress": null, "details": { "upgradeFrom": "v1.12.0", - "upgradeTo": "v1.16.0" + "upgradeTo": "v1.17.1" }, "stats": { "totalNbTasks": 1, @@ -26,6 +26,155 @@ source: crates/meilisearch/tests/upgrade/v1_12/v1_12_0.rs "finishedAt": "[date]", "batchStrategy": "stopped after the last task of type `upgradeDatabase` because they cannot be batched with tasks of any other type." 
}, + { + "uid": 29, + "progress": null, + "details": { + "receivedDocuments": 1, + "indexedDocuments": 1 + }, + "stats": { + "totalNbTasks": 1, + "status": { + "succeeded": 1 + }, + "types": { + "documentAdditionOrUpdate": 1 + }, + "indexUids": { + "kefir": 1 + } + }, + "duration": "PT0.067201S", + "startedAt": "2025-07-07T13:43:08.772854Z", + "finishedAt": "2025-07-07T13:43:08.840055Z", + "batchStrategy": "unspecified" + }, + { + "uid": 28, + "progress": null, + "details": { + "deletedDocuments": 1 + }, + "stats": { + "totalNbTasks": 1, + "status": { + "succeeded": 1 + }, + "types": { + "indexDeletion": 1 + }, + "indexUids": { + "mieli": 1 + } + }, + "duration": "PT0.012727S", + "startedAt": "2025-07-07T13:42:50.745461Z", + "finishedAt": "2025-07-07T13:42:50.758188Z", + "batchStrategy": "unspecified" + }, + { + "uid": 27, + "progress": null, + "details": { + "receivedDocuments": 1, + "indexedDocuments": 0 + }, + "stats": { + "totalNbTasks": 1, + "status": { + "failed": 1 + }, + "types": { + "documentAdditionOrUpdate": 1 + }, + "indexUids": { + "kefir": 1 + } + }, + "duration": "PT0.059920S", + "startedAt": "2025-07-07T13:42:15.625413Z", + "finishedAt": "2025-07-07T13:42:15.685333Z", + "batchStrategy": "unspecified" + }, + { + "uid": 26, + "progress": null, + "details": { + "receivedDocuments": 1, + "indexedDocuments": 1 + }, + "stats": { + "totalNbTasks": 1, + "status": { + "succeeded": 1 + }, + "types": { + "documentAdditionOrUpdate": 1 + }, + "indexUids": { + "mieli": 1 + } + }, + "duration": "PT0.088879S", + "startedAt": "2025-07-07T13:40:01.461741Z", + "finishedAt": "2025-07-07T13:40:01.55062Z", + "batchStrategy": "unspecified" + }, + { + "uid": 25, + "progress": null, + "details": { + "receivedDocuments": 1, + "indexedDocuments": 1 + }, + "stats": { + "totalNbTasks": 1, + "status": { + "succeeded": 1 + }, + "types": { + "documentAdditionOrUpdate": 1 + }, + "indexUids": { + "kefir": 1 + } + }, + "duration": "PT0.312911S", + "startedAt": "2025-07-07T13:32:46.139785Z", + "finishedAt": "2025-07-07T13:32:46.452696Z", + "batchStrategy": "unspecified" + }, + { + "uid": 24, + "progress": null, + "details": { + "embedders": { + "doggo_embedder": { + "source": "huggingFace", + "model": "sentence-transformers/all-MiniLM-L6-v2", + "revision": "e4ce9877abf3edfe10b0d82785e83bdcb973e22e", + "documentTemplate": "{{doc.description}}" + } + } + }, + "stats": { + "totalNbTasks": 1, + "status": { + "succeeded": 1 + }, + "types": { + "settingsUpdate": 1 + }, + "indexUids": { + "kefir": 1 + } + }, + "duration": "PT0.247378S", + "startedAt": "2025-07-07T13:28:27.391344Z", + "finishedAt": "2025-07-07T13:28:27.638722Z", + "batchStrategy": "unspecified" + }, { "uid": 23, "progress": null, @@ -348,179 +497,10 @@ source: crates/meilisearch/tests/upgrade/v1_12/v1_12_0.rs "startedAt": "2025-01-16T17:01:14.112756687Z", "finishedAt": "2025-01-16T17:01:14.120064527Z", "batchStrategy": "unspecified" - }, - { - "uid": 10, - "progress": null, - "details": { - "faceting": { - "maxValuesPerFacet": 99 - }, - "pagination": { - "maxTotalHits": 15 - } - }, - "stats": { - "totalNbTasks": 1, - "status": { - "succeeded": 1 - }, - "types": { - "settingsUpdate": 1 - }, - "indexUids": { - "kefir": 1 - } - }, - "duration": "PT0.007391353S", - "startedAt": "2025-01-16T17:00:29.201180268Z", - "finishedAt": "2025-01-16T17:00:29.208571621Z", - "batchStrategy": "unspecified" - }, - { - "uid": 9, - "progress": null, - "details": { - "faceting": { - "maxValuesPerFacet": 100 - }, - "pagination": { - "maxTotalHits": 1000 - } - }, - "stats": 
{ - "totalNbTasks": 1, - "status": { - "succeeded": 1 - }, - "types": { - "settingsUpdate": 1 - }, - "indexUids": { - "kefir": 1 - } - }, - "duration": "PT0.007445825S", - "startedAt": "2025-01-16T17:00:15.77629445Z", - "finishedAt": "2025-01-16T17:00:15.783740275Z", - "batchStrategy": "unspecified" - }, - { - "uid": 8, - "progress": null, - "details": { - "typoTolerance": { - "minWordSizeForTypos": { - "oneTypo": 4 - }, - "disableOnWords": [ - "kefir" - ], - "disableOnAttributes": [ - "surname" - ] - } - }, - "stats": { - "totalNbTasks": 1, - "status": { - "succeeded": 1 - }, - "types": { - "settingsUpdate": 1 - }, - "indexUids": { - "kefir": 1 - } - }, - "duration": "PT0.012020083S", - "startedAt": "2025-01-16T16:59:42.744086671Z", - "finishedAt": "2025-01-16T16:59:42.756106754Z", - "batchStrategy": "unspecified" - }, - { - "uid": 7, - "progress": null, - "details": { - "typoTolerance": { - "minWordSizeForTypos": { - "oneTypo": 4 - } - } - }, - "stats": { - "totalNbTasks": 1, - "status": { - "succeeded": 1 - }, - "types": { - "settingsUpdate": 1 - }, - "indexUids": { - "kefir": 1 - } - }, - "duration": "PT0.007440092S", - "startedAt": "2025-01-16T16:58:41.2155771Z", - "finishedAt": "2025-01-16T16:58:41.223017192Z", - "batchStrategy": "unspecified" - }, - { - "uid": 6, - "progress": null, - "details": { - "synonyms": { - "boubou": [ - "kefir" - ] - } - }, - "stats": { - "totalNbTasks": 1, - "status": { - "succeeded": 1 - }, - "types": { - "settingsUpdate": 1 - }, - "indexUids": { - "kefir": 1 - } - }, - "duration": "PT0.007565161S", - "startedAt": "2025-01-16T16:54:51.940332781Z", - "finishedAt": "2025-01-16T16:54:51.947897942Z", - "batchStrategy": "unspecified" - }, - { - "uid": 5, - "progress": null, - "details": { - "stopWords": [ - "le", - "un" - ] - }, - "stats": { - "totalNbTasks": 1, - "status": { - "succeeded": 1 - }, - "types": { - "settingsUpdate": 1 - }, - "indexUids": { - "kefir": 1 - } - }, - "duration": "PT0.016307263S", - "startedAt": "2025-01-16T16:53:19.913351957Z", - "finishedAt": "2025-01-16T16:53:19.92965922Z", - "batchStrategy": "unspecified" } ], - "total": 23, + "total": 29, "limit": 20, - "from": 24, - "next": 4 + "from": 30, + "next": 10 } diff --git a/crates/meilisearch/tests/upgrade/v1_12/snapshots/v1_12_0.rs/check_the_index_scheduler/batches_filter_afterFinishedAt_equal_2025-01-16T16_47_41.snap b/crates/meilisearch/tests/upgrade/v1_12/snapshots/v1_12_0.rs/check_the_index_scheduler/batches_filter_afterFinishedAt_equal_2025-01-16T16_47_41.snap index f4edae51b..e7d8768be 100644 --- a/crates/meilisearch/tests/upgrade/v1_12/snapshots/v1_12_0.rs/check_the_index_scheduler/batches_filter_afterFinishedAt_equal_2025-01-16T16_47_41.snap +++ b/crates/meilisearch/tests/upgrade/v1_12/snapshots/v1_12_0.rs/check_the_index_scheduler/batches_filter_afterFinishedAt_equal_2025-01-16T16_47_41.snap @@ -4,11 +4,11 @@ source: crates/meilisearch/tests/upgrade/v1_12/v1_12_0.rs { "results": [ { - "uid": 24, + "uid": 30, "progress": null, "details": { "upgradeFrom": "v1.12.0", - "upgradeTo": "v1.16.0" + "upgradeTo": "v1.17.1" }, "stats": { "totalNbTasks": 1, @@ -26,6 +26,155 @@ source: crates/meilisearch/tests/upgrade/v1_12/v1_12_0.rs "finishedAt": "[date]", "batchStrategy": "stopped after the last task of type `upgradeDatabase` because they cannot be batched with tasks of any other type." 
}, + { + "uid": 29, + "progress": null, + "details": { + "receivedDocuments": 1, + "indexedDocuments": 1 + }, + "stats": { + "totalNbTasks": 1, + "status": { + "succeeded": 1 + }, + "types": { + "documentAdditionOrUpdate": 1 + }, + "indexUids": { + "kefir": 1 + } + }, + "duration": "PT0.067201S", + "startedAt": "2025-07-07T13:43:08.772854Z", + "finishedAt": "2025-07-07T13:43:08.840055Z", + "batchStrategy": "unspecified" + }, + { + "uid": 28, + "progress": null, + "details": { + "deletedDocuments": 1 + }, + "stats": { + "totalNbTasks": 1, + "status": { + "succeeded": 1 + }, + "types": { + "indexDeletion": 1 + }, + "indexUids": { + "mieli": 1 + } + }, + "duration": "PT0.012727S", + "startedAt": "2025-07-07T13:42:50.745461Z", + "finishedAt": "2025-07-07T13:42:50.758188Z", + "batchStrategy": "unspecified" + }, + { + "uid": 27, + "progress": null, + "details": { + "receivedDocuments": 1, + "indexedDocuments": 0 + }, + "stats": { + "totalNbTasks": 1, + "status": { + "failed": 1 + }, + "types": { + "documentAdditionOrUpdate": 1 + }, + "indexUids": { + "kefir": 1 + } + }, + "duration": "PT0.059920S", + "startedAt": "2025-07-07T13:42:15.625413Z", + "finishedAt": "2025-07-07T13:42:15.685333Z", + "batchStrategy": "unspecified" + }, + { + "uid": 26, + "progress": null, + "details": { + "receivedDocuments": 1, + "indexedDocuments": 1 + }, + "stats": { + "totalNbTasks": 1, + "status": { + "succeeded": 1 + }, + "types": { + "documentAdditionOrUpdate": 1 + }, + "indexUids": { + "mieli": 1 + } + }, + "duration": "PT0.088879S", + "startedAt": "2025-07-07T13:40:01.461741Z", + "finishedAt": "2025-07-07T13:40:01.55062Z", + "batchStrategy": "unspecified" + }, + { + "uid": 25, + "progress": null, + "details": { + "receivedDocuments": 1, + "indexedDocuments": 1 + }, + "stats": { + "totalNbTasks": 1, + "status": { + "succeeded": 1 + }, + "types": { + "documentAdditionOrUpdate": 1 + }, + "indexUids": { + "kefir": 1 + } + }, + "duration": "PT0.312911S", + "startedAt": "2025-07-07T13:32:46.139785Z", + "finishedAt": "2025-07-07T13:32:46.452696Z", + "batchStrategy": "unspecified" + }, + { + "uid": 24, + "progress": null, + "details": { + "embedders": { + "doggo_embedder": { + "source": "huggingFace", + "model": "sentence-transformers/all-MiniLM-L6-v2", + "revision": "e4ce9877abf3edfe10b0d82785e83bdcb973e22e", + "documentTemplate": "{{doc.description}}" + } + } + }, + "stats": { + "totalNbTasks": 1, + "status": { + "succeeded": 1 + }, + "types": { + "settingsUpdate": 1 + }, + "indexUids": { + "kefir": 1 + } + }, + "duration": "PT0.247378S", + "startedAt": "2025-07-07T13:28:27.391344Z", + "finishedAt": "2025-07-07T13:28:27.638722Z", + "batchStrategy": "unspecified" + }, { "uid": 23, "progress": null, @@ -348,179 +497,10 @@ source: crates/meilisearch/tests/upgrade/v1_12/v1_12_0.rs "startedAt": "2025-01-16T17:01:14.112756687Z", "finishedAt": "2025-01-16T17:01:14.120064527Z", "batchStrategy": "unspecified" - }, - { - "uid": 10, - "progress": null, - "details": { - "faceting": { - "maxValuesPerFacet": 99 - }, - "pagination": { - "maxTotalHits": 15 - } - }, - "stats": { - "totalNbTasks": 1, - "status": { - "succeeded": 1 - }, - "types": { - "settingsUpdate": 1 - }, - "indexUids": { - "kefir": 1 - } - }, - "duration": "PT0.007391353S", - "startedAt": "2025-01-16T17:00:29.201180268Z", - "finishedAt": "2025-01-16T17:00:29.208571621Z", - "batchStrategy": "unspecified" - }, - { - "uid": 9, - "progress": null, - "details": { - "faceting": { - "maxValuesPerFacet": 100 - }, - "pagination": { - "maxTotalHits": 1000 - } - }, - "stats": 
{ - "totalNbTasks": 1, - "status": { - "succeeded": 1 - }, - "types": { - "settingsUpdate": 1 - }, - "indexUids": { - "kefir": 1 - } - }, - "duration": "PT0.007445825S", - "startedAt": "2025-01-16T17:00:15.77629445Z", - "finishedAt": "2025-01-16T17:00:15.783740275Z", - "batchStrategy": "unspecified" - }, - { - "uid": 8, - "progress": null, - "details": { - "typoTolerance": { - "minWordSizeForTypos": { - "oneTypo": 4 - }, - "disableOnWords": [ - "kefir" - ], - "disableOnAttributes": [ - "surname" - ] - } - }, - "stats": { - "totalNbTasks": 1, - "status": { - "succeeded": 1 - }, - "types": { - "settingsUpdate": 1 - }, - "indexUids": { - "kefir": 1 - } - }, - "duration": "PT0.012020083S", - "startedAt": "2025-01-16T16:59:42.744086671Z", - "finishedAt": "2025-01-16T16:59:42.756106754Z", - "batchStrategy": "unspecified" - }, - { - "uid": 7, - "progress": null, - "details": { - "typoTolerance": { - "minWordSizeForTypos": { - "oneTypo": 4 - } - } - }, - "stats": { - "totalNbTasks": 1, - "status": { - "succeeded": 1 - }, - "types": { - "settingsUpdate": 1 - }, - "indexUids": { - "kefir": 1 - } - }, - "duration": "PT0.007440092S", - "startedAt": "2025-01-16T16:58:41.2155771Z", - "finishedAt": "2025-01-16T16:58:41.223017192Z", - "batchStrategy": "unspecified" - }, - { - "uid": 6, - "progress": null, - "details": { - "synonyms": { - "boubou": [ - "kefir" - ] - } - }, - "stats": { - "totalNbTasks": 1, - "status": { - "succeeded": 1 - }, - "types": { - "settingsUpdate": 1 - }, - "indexUids": { - "kefir": 1 - } - }, - "duration": "PT0.007565161S", - "startedAt": "2025-01-16T16:54:51.940332781Z", - "finishedAt": "2025-01-16T16:54:51.947897942Z", - "batchStrategy": "unspecified" - }, - { - "uid": 5, - "progress": null, - "details": { - "stopWords": [ - "le", - "un" - ] - }, - "stats": { - "totalNbTasks": 1, - "status": { - "succeeded": 1 - }, - "types": { - "settingsUpdate": 1 - }, - "indexUids": { - "kefir": 1 - } - }, - "duration": "PT0.016307263S", - "startedAt": "2025-01-16T16:53:19.913351957Z", - "finishedAt": "2025-01-16T16:53:19.92965922Z", - "batchStrategy": "unspecified" } ], - "total": 23, + "total": 29, "limit": 20, - "from": 24, - "next": 4 + "from": 30, + "next": 10 } diff --git a/crates/meilisearch/tests/upgrade/v1_12/snapshots/v1_12_0.rs/check_the_index_scheduler/batches_filter_afterStartedAt_equal_2025-01-16T16_47_41.snap b/crates/meilisearch/tests/upgrade/v1_12/snapshots/v1_12_0.rs/check_the_index_scheduler/batches_filter_afterStartedAt_equal_2025-01-16T16_47_41.snap index f4edae51b..e7d8768be 100644 --- a/crates/meilisearch/tests/upgrade/v1_12/snapshots/v1_12_0.rs/check_the_index_scheduler/batches_filter_afterStartedAt_equal_2025-01-16T16_47_41.snap +++ b/crates/meilisearch/tests/upgrade/v1_12/snapshots/v1_12_0.rs/check_the_index_scheduler/batches_filter_afterStartedAt_equal_2025-01-16T16_47_41.snap @@ -4,11 +4,11 @@ source: crates/meilisearch/tests/upgrade/v1_12/v1_12_0.rs { "results": [ { - "uid": 24, + "uid": 30, "progress": null, "details": { "upgradeFrom": "v1.12.0", - "upgradeTo": "v1.16.0" + "upgradeTo": "v1.17.1" }, "stats": { "totalNbTasks": 1, @@ -26,6 +26,155 @@ source: crates/meilisearch/tests/upgrade/v1_12/v1_12_0.rs "finishedAt": "[date]", "batchStrategy": "stopped after the last task of type `upgradeDatabase` because they cannot be batched with tasks of any other type." 
}, + { + "uid": 29, + "progress": null, + "details": { + "receivedDocuments": 1, + "indexedDocuments": 1 + }, + "stats": { + "totalNbTasks": 1, + "status": { + "succeeded": 1 + }, + "types": { + "documentAdditionOrUpdate": 1 + }, + "indexUids": { + "kefir": 1 + } + }, + "duration": "PT0.067201S", + "startedAt": "2025-07-07T13:43:08.772854Z", + "finishedAt": "2025-07-07T13:43:08.840055Z", + "batchStrategy": "unspecified" + }, + { + "uid": 28, + "progress": null, + "details": { + "deletedDocuments": 1 + }, + "stats": { + "totalNbTasks": 1, + "status": { + "succeeded": 1 + }, + "types": { + "indexDeletion": 1 + }, + "indexUids": { + "mieli": 1 + } + }, + "duration": "PT0.012727S", + "startedAt": "2025-07-07T13:42:50.745461Z", + "finishedAt": "2025-07-07T13:42:50.758188Z", + "batchStrategy": "unspecified" + }, + { + "uid": 27, + "progress": null, + "details": { + "receivedDocuments": 1, + "indexedDocuments": 0 + }, + "stats": { + "totalNbTasks": 1, + "status": { + "failed": 1 + }, + "types": { + "documentAdditionOrUpdate": 1 + }, + "indexUids": { + "kefir": 1 + } + }, + "duration": "PT0.059920S", + "startedAt": "2025-07-07T13:42:15.625413Z", + "finishedAt": "2025-07-07T13:42:15.685333Z", + "batchStrategy": "unspecified" + }, + { + "uid": 26, + "progress": null, + "details": { + "receivedDocuments": 1, + "indexedDocuments": 1 + }, + "stats": { + "totalNbTasks": 1, + "status": { + "succeeded": 1 + }, + "types": { + "documentAdditionOrUpdate": 1 + }, + "indexUids": { + "mieli": 1 + } + }, + "duration": "PT0.088879S", + "startedAt": "2025-07-07T13:40:01.461741Z", + "finishedAt": "2025-07-07T13:40:01.55062Z", + "batchStrategy": "unspecified" + }, + { + "uid": 25, + "progress": null, + "details": { + "receivedDocuments": 1, + "indexedDocuments": 1 + }, + "stats": { + "totalNbTasks": 1, + "status": { + "succeeded": 1 + }, + "types": { + "documentAdditionOrUpdate": 1 + }, + "indexUids": { + "kefir": 1 + } + }, + "duration": "PT0.312911S", + "startedAt": "2025-07-07T13:32:46.139785Z", + "finishedAt": "2025-07-07T13:32:46.452696Z", + "batchStrategy": "unspecified" + }, + { + "uid": 24, + "progress": null, + "details": { + "embedders": { + "doggo_embedder": { + "source": "huggingFace", + "model": "sentence-transformers/all-MiniLM-L6-v2", + "revision": "e4ce9877abf3edfe10b0d82785e83bdcb973e22e", + "documentTemplate": "{{doc.description}}" + } + } + }, + "stats": { + "totalNbTasks": 1, + "status": { + "succeeded": 1 + }, + "types": { + "settingsUpdate": 1 + }, + "indexUids": { + "kefir": 1 + } + }, + "duration": "PT0.247378S", + "startedAt": "2025-07-07T13:28:27.391344Z", + "finishedAt": "2025-07-07T13:28:27.638722Z", + "batchStrategy": "unspecified" + }, { "uid": 23, "progress": null, @@ -348,179 +497,10 @@ source: crates/meilisearch/tests/upgrade/v1_12/v1_12_0.rs "startedAt": "2025-01-16T17:01:14.112756687Z", "finishedAt": "2025-01-16T17:01:14.120064527Z", "batchStrategy": "unspecified" - }, - { - "uid": 10, - "progress": null, - "details": { - "faceting": { - "maxValuesPerFacet": 99 - }, - "pagination": { - "maxTotalHits": 15 - } - }, - "stats": { - "totalNbTasks": 1, - "status": { - "succeeded": 1 - }, - "types": { - "settingsUpdate": 1 - }, - "indexUids": { - "kefir": 1 - } - }, - "duration": "PT0.007391353S", - "startedAt": "2025-01-16T17:00:29.201180268Z", - "finishedAt": "2025-01-16T17:00:29.208571621Z", - "batchStrategy": "unspecified" - }, - { - "uid": 9, - "progress": null, - "details": { - "faceting": { - "maxValuesPerFacet": 100 - }, - "pagination": { - "maxTotalHits": 1000 - } - }, - "stats": 
{ - "totalNbTasks": 1, - "status": { - "succeeded": 1 - }, - "types": { - "settingsUpdate": 1 - }, - "indexUids": { - "kefir": 1 - } - }, - "duration": "PT0.007445825S", - "startedAt": "2025-01-16T17:00:15.77629445Z", - "finishedAt": "2025-01-16T17:00:15.783740275Z", - "batchStrategy": "unspecified" - }, - { - "uid": 8, - "progress": null, - "details": { - "typoTolerance": { - "minWordSizeForTypos": { - "oneTypo": 4 - }, - "disableOnWords": [ - "kefir" - ], - "disableOnAttributes": [ - "surname" - ] - } - }, - "stats": { - "totalNbTasks": 1, - "status": { - "succeeded": 1 - }, - "types": { - "settingsUpdate": 1 - }, - "indexUids": { - "kefir": 1 - } - }, - "duration": "PT0.012020083S", - "startedAt": "2025-01-16T16:59:42.744086671Z", - "finishedAt": "2025-01-16T16:59:42.756106754Z", - "batchStrategy": "unspecified" - }, - { - "uid": 7, - "progress": null, - "details": { - "typoTolerance": { - "minWordSizeForTypos": { - "oneTypo": 4 - } - } - }, - "stats": { - "totalNbTasks": 1, - "status": { - "succeeded": 1 - }, - "types": { - "settingsUpdate": 1 - }, - "indexUids": { - "kefir": 1 - } - }, - "duration": "PT0.007440092S", - "startedAt": "2025-01-16T16:58:41.2155771Z", - "finishedAt": "2025-01-16T16:58:41.223017192Z", - "batchStrategy": "unspecified" - }, - { - "uid": 6, - "progress": null, - "details": { - "synonyms": { - "boubou": [ - "kefir" - ] - } - }, - "stats": { - "totalNbTasks": 1, - "status": { - "succeeded": 1 - }, - "types": { - "settingsUpdate": 1 - }, - "indexUids": { - "kefir": 1 - } - }, - "duration": "PT0.007565161S", - "startedAt": "2025-01-16T16:54:51.940332781Z", - "finishedAt": "2025-01-16T16:54:51.947897942Z", - "batchStrategy": "unspecified" - }, - { - "uid": 5, - "progress": null, - "details": { - "stopWords": [ - "le", - "un" - ] - }, - "stats": { - "totalNbTasks": 1, - "status": { - "succeeded": 1 - }, - "types": { - "settingsUpdate": 1 - }, - "indexUids": { - "kefir": 1 - } - }, - "duration": "PT0.016307263S", - "startedAt": "2025-01-16T16:53:19.913351957Z", - "finishedAt": "2025-01-16T16:53:19.92965922Z", - "batchStrategy": "unspecified" } ], - "total": 23, + "total": 29, "limit": 20, - "from": 24, - "next": 4 + "from": 30, + "next": 10 } diff --git a/crates/meilisearch/tests/upgrade/v1_12/snapshots/v1_12_0.rs/check_the_index_scheduler/tasks_filter_afterEnqueuedAt_equal_2025-01-16T16_47_41.snap b/crates/meilisearch/tests/upgrade/v1_12/snapshots/v1_12_0.rs/check_the_index_scheduler/tasks_filter_afterEnqueuedAt_equal_2025-01-16T16_47_41.snap index 01d2ea341..61dd95786 100644 --- a/crates/meilisearch/tests/upgrade/v1_12/snapshots/v1_12_0.rs/check_the_index_scheduler/tasks_filter_afterEnqueuedAt_equal_2025-01-16T16_47_41.snap +++ b/crates/meilisearch/tests/upgrade/v1_12/snapshots/v1_12_0.rs/check_the_index_scheduler/tasks_filter_afterEnqueuedAt_equal_2025-01-16T16_47_41.snap @@ -4,15 +4,15 @@ source: crates/meilisearch/tests/upgrade/v1_12/v1_12_0.rs { "results": [ { - "uid": 25, - "batchUid": 24, + "uid": 31, + "batchUid": 30, "indexUid": null, "status": "succeeded", "type": "upgradeDatabase", "canceledBy": null, "details": { "upgradeFrom": "v1.12.0", - "upgradeTo": "v1.16.0" + "upgradeTo": "v1.17.1" }, "error": null, "duration": "[duration]", @@ -20,6 +20,118 @@ source: crates/meilisearch/tests/upgrade/v1_12/v1_12_0.rs "startedAt": "[date]", "finishedAt": "[date]" }, + { + "uid": 30, + "batchUid": 29, + "indexUid": "kefir", + "status": "succeeded", + "type": "documentAdditionOrUpdate", + "canceledBy": null, + "details": { + "receivedDocuments": 1, + 
"indexedDocuments": 1 + }, + "error": null, + "duration": "PT0.067201S", + "enqueuedAt": "2025-07-07T13:43:08.772432Z", + "startedAt": "2025-07-07T13:43:08.772854Z", + "finishedAt": "2025-07-07T13:43:08.840055Z" + }, + { + "uid": 29, + "batchUid": 28, + "indexUid": "mieli", + "status": "succeeded", + "type": "indexDeletion", + "canceledBy": null, + "details": { + "deletedDocuments": 1 + }, + "error": null, + "duration": "PT0.012727S", + "enqueuedAt": "2025-07-07T13:42:50.744793Z", + "startedAt": "2025-07-07T13:42:50.745461Z", + "finishedAt": "2025-07-07T13:42:50.758188Z" + }, + { + "uid": 28, + "batchUid": 27, + "indexUid": "kefir", + "status": "failed", + "type": "documentAdditionOrUpdate", + "canceledBy": null, + "details": { + "receivedDocuments": 1, + "indexedDocuments": 0 + }, + "error": { + "message": "Index `kefir`: Bad embedder configuration in the document with id: `2`. Could not parse `._vectors.doggo_embedder`: trailing characters at line 1 column 13", + "code": "invalid_vectors_type", + "type": "invalid_request", + "link": "https://docs.meilisearch.com/errors#invalid_vectors_type" + }, + "duration": "PT0.059920S", + "enqueuedAt": "2025-07-07T13:42:15.624598Z", + "startedAt": "2025-07-07T13:42:15.625413Z", + "finishedAt": "2025-07-07T13:42:15.685333Z" + }, + { + "uid": 27, + "batchUid": 26, + "indexUid": "mieli", + "status": "succeeded", + "type": "documentAdditionOrUpdate", + "canceledBy": null, + "details": { + "receivedDocuments": 1, + "indexedDocuments": 1 + }, + "error": null, + "duration": "PT0.088879S", + "enqueuedAt": "2025-07-07T13:40:01.46081Z", + "startedAt": "2025-07-07T13:40:01.461741Z", + "finishedAt": "2025-07-07T13:40:01.55062Z" + }, + { + "uid": 26, + "batchUid": 25, + "indexUid": "kefir", + "status": "succeeded", + "type": "documentAdditionOrUpdate", + "canceledBy": null, + "details": { + "receivedDocuments": 1, + "indexedDocuments": 1 + }, + "error": null, + "duration": "PT0.312911S", + "enqueuedAt": "2025-07-07T13:32:46.13871Z", + "startedAt": "2025-07-07T13:32:46.139785Z", + "finishedAt": "2025-07-07T13:32:46.452696Z" + }, + { + "uid": 25, + "batchUid": 24, + "indexUid": "kefir", + "status": "succeeded", + "type": "settingsUpdate", + "canceledBy": null, + "details": { + "embedders": { + "doggo_embedder": { + "source": "huggingFace", + "model": "sentence-transformers/all-MiniLM-L6-v2", + "revision": "e4ce9877abf3edfe10b0d82785e83bdcb973e22e", + "documentTemplate": "{{doc.description}}" + } + } + }, + "error": null, + "duration": "PT0.247378S", + "enqueuedAt": "2025-07-07T13:28:27.390054Z", + "startedAt": "2025-07-07T13:28:27.391344Z", + "finishedAt": "2025-07-07T13:28:27.638722Z" + }, { "uid": 24, "batchUid": 23, @@ -264,134 +376,10 @@ source: crates/meilisearch/tests/upgrade/v1_12/v1_12_0.rs "enqueuedAt": "2025-01-16T17:02:52.527382964Z", "startedAt": "2025-01-16T17:02:52.539749853Z", "finishedAt": "2025-01-16T17:02:52.547390016Z" - }, - { - "uid": 11, - "batchUid": 11, - "indexUid": "kefir", - "status": "succeeded", - "type": "settingsUpdate", - "canceledBy": null, - "details": { - "searchCutoffMs": 8000 - }, - "error": null, - "duration": "PT0.007307840S", - "enqueuedAt": "2025-01-16T17:01:14.100316617Z", - "startedAt": "2025-01-16T17:01:14.112756687Z", - "finishedAt": "2025-01-16T17:01:14.120064527Z" - }, - { - "uid": 10, - "batchUid": 10, - "indexUid": "kefir", - "status": "succeeded", - "type": "settingsUpdate", - "canceledBy": null, - "details": { - "faceting": { - "maxValuesPerFacet": 99 - }, - "pagination": { - "maxTotalHits": 15 - } - }, - "error": 
null, - "duration": "PT0.007391353S", - "enqueuedAt": "2025-01-16T17:00:29.188815062Z", - "startedAt": "2025-01-16T17:00:29.201180268Z", - "finishedAt": "2025-01-16T17:00:29.208571621Z" - }, - { - "uid": 9, - "batchUid": 9, - "indexUid": "kefir", - "status": "succeeded", - "type": "settingsUpdate", - "canceledBy": null, - "details": { - "faceting": { - "maxValuesPerFacet": 100 - }, - "pagination": { - "maxTotalHits": 1000 - } - }, - "error": null, - "duration": "PT0.007445825S", - "enqueuedAt": "2025-01-16T17:00:15.759501709Z", - "startedAt": "2025-01-16T17:00:15.77629445Z", - "finishedAt": "2025-01-16T17:00:15.783740275Z" - }, - { - "uid": 8, - "batchUid": 8, - "indexUid": "kefir", - "status": "succeeded", - "type": "settingsUpdate", - "canceledBy": null, - "details": { - "typoTolerance": { - "minWordSizeForTypos": { - "oneTypo": 4 - }, - "disableOnWords": [ - "kefir" - ], - "disableOnAttributes": [ - "surname" - ] - } - }, - "error": null, - "duration": "PT0.012020083S", - "enqueuedAt": "2025-01-16T16:59:42.727292501Z", - "startedAt": "2025-01-16T16:59:42.744086671Z", - "finishedAt": "2025-01-16T16:59:42.756106754Z" - }, - { - "uid": 7, - "batchUid": 7, - "indexUid": "kefir", - "status": "succeeded", - "type": "settingsUpdate", - "canceledBy": null, - "details": { - "typoTolerance": { - "minWordSizeForTypos": { - "oneTypo": 4 - } - } - }, - "error": null, - "duration": "PT0.007440092S", - "enqueuedAt": "2025-01-16T16:58:41.203145044Z", - "startedAt": "2025-01-16T16:58:41.2155771Z", - "finishedAt": "2025-01-16T16:58:41.223017192Z" - }, - { - "uid": 6, - "batchUid": 6, - "indexUid": "kefir", - "status": "succeeded", - "type": "settingsUpdate", - "canceledBy": null, - "details": { - "synonyms": { - "boubou": [ - "kefir" - ] - } - }, - "error": null, - "duration": "PT0.007565161S", - "enqueuedAt": "2025-01-16T16:54:51.927866243Z", - "startedAt": "2025-01-16T16:54:51.940332781Z", - "finishedAt": "2025-01-16T16:54:51.947897942Z" } ], - "total": 24, + "total": 30, "limit": 20, - "from": 25, - "next": 5 + "from": 31, + "next": 11 } diff --git a/crates/meilisearch/tests/upgrade/v1_12/snapshots/v1_12_0.rs/check_the_index_scheduler/tasks_filter_afterFinishedAt_equal_2025-01-16T16_47_41.snap b/crates/meilisearch/tests/upgrade/v1_12/snapshots/v1_12_0.rs/check_the_index_scheduler/tasks_filter_afterFinishedAt_equal_2025-01-16T16_47_41.snap index 01d2ea341..61dd95786 100644 --- a/crates/meilisearch/tests/upgrade/v1_12/snapshots/v1_12_0.rs/check_the_index_scheduler/tasks_filter_afterFinishedAt_equal_2025-01-16T16_47_41.snap +++ b/crates/meilisearch/tests/upgrade/v1_12/snapshots/v1_12_0.rs/check_the_index_scheduler/tasks_filter_afterFinishedAt_equal_2025-01-16T16_47_41.snap @@ -4,15 +4,15 @@ source: crates/meilisearch/tests/upgrade/v1_12/v1_12_0.rs { "results": [ { - "uid": 25, - "batchUid": 24, + "uid": 31, + "batchUid": 30, "indexUid": null, "status": "succeeded", "type": "upgradeDatabase", "canceledBy": null, "details": { "upgradeFrom": "v1.12.0", - "upgradeTo": "v1.16.0" + "upgradeTo": "v1.17.1" }, "error": null, "duration": "[duration]", @@ -20,6 +20,118 @@ source: crates/meilisearch/tests/upgrade/v1_12/v1_12_0.rs "startedAt": "[date]", "finishedAt": "[date]" }, + { + "uid": 30, + "batchUid": 29, + "indexUid": "kefir", + "status": "succeeded", + "type": "documentAdditionOrUpdate", + "canceledBy": null, + "details": { + "receivedDocuments": 1, + "indexedDocuments": 1 + }, + "error": null, + "duration": "PT0.067201S", + "enqueuedAt": "2025-07-07T13:43:08.772432Z", + "startedAt": 
"2025-07-07T13:43:08.772854Z", + "finishedAt": "2025-07-07T13:43:08.840055Z" + }, + { + "uid": 29, + "batchUid": 28, + "indexUid": "mieli", + "status": "succeeded", + "type": "indexDeletion", + "canceledBy": null, + "details": { + "deletedDocuments": 1 + }, + "error": null, + "duration": "PT0.012727S", + "enqueuedAt": "2025-07-07T13:42:50.744793Z", + "startedAt": "2025-07-07T13:42:50.745461Z", + "finishedAt": "2025-07-07T13:42:50.758188Z" + }, + { + "uid": 28, + "batchUid": 27, + "indexUid": "kefir", + "status": "failed", + "type": "documentAdditionOrUpdate", + "canceledBy": null, + "details": { + "receivedDocuments": 1, + "indexedDocuments": 0 + }, + "error": { + "message": "Index `kefir`: Bad embedder configuration in the document with id: `2`. Could not parse `._vectors.doggo_embedder`: trailing characters at line 1 column 13", + "code": "invalid_vectors_type", + "type": "invalid_request", + "link": "https://docs.meilisearch.com/errors#invalid_vectors_type" + }, + "duration": "PT0.059920S", + "enqueuedAt": "2025-07-07T13:42:15.624598Z", + "startedAt": "2025-07-07T13:42:15.625413Z", + "finishedAt": "2025-07-07T13:42:15.685333Z" + }, + { + "uid": 27, + "batchUid": 26, + "indexUid": "mieli", + "status": "succeeded", + "type": "documentAdditionOrUpdate", + "canceledBy": null, + "details": { + "receivedDocuments": 1, + "indexedDocuments": 1 + }, + "error": null, + "duration": "PT0.088879S", + "enqueuedAt": "2025-07-07T13:40:01.46081Z", + "startedAt": "2025-07-07T13:40:01.461741Z", + "finishedAt": "2025-07-07T13:40:01.55062Z" + }, + { + "uid": 26, + "batchUid": 25, + "indexUid": "kefir", + "status": "succeeded", + "type": "documentAdditionOrUpdate", + "canceledBy": null, + "details": { + "receivedDocuments": 1, + "indexedDocuments": 1 + }, + "error": null, + "duration": "PT0.312911S", + "enqueuedAt": "2025-07-07T13:32:46.13871Z", + "startedAt": "2025-07-07T13:32:46.139785Z", + "finishedAt": "2025-07-07T13:32:46.452696Z" + }, + { + "uid": 25, + "batchUid": 24, + "indexUid": "kefir", + "status": "succeeded", + "type": "settingsUpdate", + "canceledBy": null, + "details": { + "embedders": { + "doggo_embedder": { + "source": "huggingFace", + "model": "sentence-transformers/all-MiniLM-L6-v2", + "revision": "e4ce9877abf3edfe10b0d82785e83bdcb973e22e", + "documentTemplate": "{{doc.description}}" + } + } + }, + "error": null, + "duration": "PT0.247378S", + "enqueuedAt": "2025-07-07T13:28:27.390054Z", + "startedAt": "2025-07-07T13:28:27.391344Z", + "finishedAt": "2025-07-07T13:28:27.638722Z" + }, { "uid": 24, "batchUid": 23, @@ -264,134 +376,10 @@ source: crates/meilisearch/tests/upgrade/v1_12/v1_12_0.rs "enqueuedAt": "2025-01-16T17:02:52.527382964Z", "startedAt": "2025-01-16T17:02:52.539749853Z", "finishedAt": "2025-01-16T17:02:52.547390016Z" - }, - { - "uid": 11, - "batchUid": 11, - "indexUid": "kefir", - "status": "succeeded", - "type": "settingsUpdate", - "canceledBy": null, - "details": { - "searchCutoffMs": 8000 - }, - "error": null, - "duration": "PT0.007307840S", - "enqueuedAt": "2025-01-16T17:01:14.100316617Z", - "startedAt": "2025-01-16T17:01:14.112756687Z", - "finishedAt": "2025-01-16T17:01:14.120064527Z" - }, - { - "uid": 10, - "batchUid": 10, - "indexUid": "kefir", - "status": "succeeded", - "type": "settingsUpdate", - "canceledBy": null, - "details": { - "faceting": { - "maxValuesPerFacet": 99 - }, - "pagination": { - "maxTotalHits": 15 - } - }, - "error": null, - "duration": "PT0.007391353S", - "enqueuedAt": "2025-01-16T17:00:29.188815062Z", - "startedAt": 
"2025-01-16T17:00:29.201180268Z", - "finishedAt": "2025-01-16T17:00:29.208571621Z" - }, - { - "uid": 9, - "batchUid": 9, - "indexUid": "kefir", - "status": "succeeded", - "type": "settingsUpdate", - "canceledBy": null, - "details": { - "faceting": { - "maxValuesPerFacet": 100 - }, - "pagination": { - "maxTotalHits": 1000 - } - }, - "error": null, - "duration": "PT0.007445825S", - "enqueuedAt": "2025-01-16T17:00:15.759501709Z", - "startedAt": "2025-01-16T17:00:15.77629445Z", - "finishedAt": "2025-01-16T17:00:15.783740275Z" - }, - { - "uid": 8, - "batchUid": 8, - "indexUid": "kefir", - "status": "succeeded", - "type": "settingsUpdate", - "canceledBy": null, - "details": { - "typoTolerance": { - "minWordSizeForTypos": { - "oneTypo": 4 - }, - "disableOnWords": [ - "kefir" - ], - "disableOnAttributes": [ - "surname" - ] - } - }, - "error": null, - "duration": "PT0.012020083S", - "enqueuedAt": "2025-01-16T16:59:42.727292501Z", - "startedAt": "2025-01-16T16:59:42.744086671Z", - "finishedAt": "2025-01-16T16:59:42.756106754Z" - }, - { - "uid": 7, - "batchUid": 7, - "indexUid": "kefir", - "status": "succeeded", - "type": "settingsUpdate", - "canceledBy": null, - "details": { - "typoTolerance": { - "minWordSizeForTypos": { - "oneTypo": 4 - } - } - }, - "error": null, - "duration": "PT0.007440092S", - "enqueuedAt": "2025-01-16T16:58:41.203145044Z", - "startedAt": "2025-01-16T16:58:41.2155771Z", - "finishedAt": "2025-01-16T16:58:41.223017192Z" - }, - { - "uid": 6, - "batchUid": 6, - "indexUid": "kefir", - "status": "succeeded", - "type": "settingsUpdate", - "canceledBy": null, - "details": { - "synonyms": { - "boubou": [ - "kefir" - ] - } - }, - "error": null, - "duration": "PT0.007565161S", - "enqueuedAt": "2025-01-16T16:54:51.927866243Z", - "startedAt": "2025-01-16T16:54:51.940332781Z", - "finishedAt": "2025-01-16T16:54:51.947897942Z" } ], - "total": 24, + "total": 30, "limit": 20, - "from": 25, - "next": 5 + "from": 31, + "next": 11 } diff --git a/crates/meilisearch/tests/upgrade/v1_12/snapshots/v1_12_0.rs/check_the_index_scheduler/tasks_filter_afterStartedAt_equal_2025-01-16T16_47_41.snap b/crates/meilisearch/tests/upgrade/v1_12/snapshots/v1_12_0.rs/check_the_index_scheduler/tasks_filter_afterStartedAt_equal_2025-01-16T16_47_41.snap index 01d2ea341..61dd95786 100644 --- a/crates/meilisearch/tests/upgrade/v1_12/snapshots/v1_12_0.rs/check_the_index_scheduler/tasks_filter_afterStartedAt_equal_2025-01-16T16_47_41.snap +++ b/crates/meilisearch/tests/upgrade/v1_12/snapshots/v1_12_0.rs/check_the_index_scheduler/tasks_filter_afterStartedAt_equal_2025-01-16T16_47_41.snap @@ -4,15 +4,15 @@ source: crates/meilisearch/tests/upgrade/v1_12/v1_12_0.rs { "results": [ { - "uid": 25, - "batchUid": 24, + "uid": 31, + "batchUid": 30, "indexUid": null, "status": "succeeded", "type": "upgradeDatabase", "canceledBy": null, "details": { "upgradeFrom": "v1.12.0", - "upgradeTo": "v1.16.0" + "upgradeTo": "v1.17.1" }, "error": null, "duration": "[duration]", @@ -20,6 +20,118 @@ source: crates/meilisearch/tests/upgrade/v1_12/v1_12_0.rs "startedAt": "[date]", "finishedAt": "[date]" }, + { + "uid": 30, + "batchUid": 29, + "indexUid": "kefir", + "status": "succeeded", + "type": "documentAdditionOrUpdate", + "canceledBy": null, + "details": { + "receivedDocuments": 1, + "indexedDocuments": 1 + }, + "error": null, + "duration": "PT0.067201S", + "enqueuedAt": "2025-07-07T13:43:08.772432Z", + "startedAt": "2025-07-07T13:43:08.772854Z", + "finishedAt": "2025-07-07T13:43:08.840055Z" + }, + { + "uid": 29, + "batchUid": 28, + "indexUid": 
"mieli", + "status": "succeeded", + "type": "indexDeletion", + "canceledBy": null, + "details": { + "deletedDocuments": 1 + }, + "error": null, + "duration": "PT0.012727S", + "enqueuedAt": "2025-07-07T13:42:50.744793Z", + "startedAt": "2025-07-07T13:42:50.745461Z", + "finishedAt": "2025-07-07T13:42:50.758188Z" + }, + { + "uid": 28, + "batchUid": 27, + "indexUid": "kefir", + "status": "failed", + "type": "documentAdditionOrUpdate", + "canceledBy": null, + "details": { + "receivedDocuments": 1, + "indexedDocuments": 0 + }, + "error": { + "message": "Index `kefir`: Bad embedder configuration in the document with id: `2`. Could not parse `._vectors.doggo_embedder`: trailing characters at line 1 column 13", + "code": "invalid_vectors_type", + "type": "invalid_request", + "link": "https://docs.meilisearch.com/errors#invalid_vectors_type" + }, + "duration": "PT0.059920S", + "enqueuedAt": "2025-07-07T13:42:15.624598Z", + "startedAt": "2025-07-07T13:42:15.625413Z", + "finishedAt": "2025-07-07T13:42:15.685333Z" + }, + { + "uid": 27, + "batchUid": 26, + "indexUid": "mieli", + "status": "succeeded", + "type": "documentAdditionOrUpdate", + "canceledBy": null, + "details": { + "receivedDocuments": 1, + "indexedDocuments": 1 + }, + "error": null, + "duration": "PT0.088879S", + "enqueuedAt": "2025-07-07T13:40:01.46081Z", + "startedAt": "2025-07-07T13:40:01.461741Z", + "finishedAt": "2025-07-07T13:40:01.55062Z" + }, + { + "uid": 26, + "batchUid": 25, + "indexUid": "kefir", + "status": "succeeded", + "type": "documentAdditionOrUpdate", + "canceledBy": null, + "details": { + "receivedDocuments": 1, + "indexedDocuments": 1 + }, + "error": null, + "duration": "PT0.312911S", + "enqueuedAt": "2025-07-07T13:32:46.13871Z", + "startedAt": "2025-07-07T13:32:46.139785Z", + "finishedAt": "2025-07-07T13:32:46.452696Z" + }, + { + "uid": 25, + "batchUid": 24, + "indexUid": "kefir", + "status": "succeeded", + "type": "settingsUpdate", + "canceledBy": null, + "details": { + "embedders": { + "doggo_embedder": { + "source": "huggingFace", + "model": "sentence-transformers/all-MiniLM-L6-v2", + "revision": "e4ce9877abf3edfe10b0d82785e83bdcb973e22e", + "documentTemplate": "{{doc.description}}" + } + } + }, + "error": null, + "duration": "PT0.247378S", + "enqueuedAt": "2025-07-07T13:28:27.390054Z", + "startedAt": "2025-07-07T13:28:27.391344Z", + "finishedAt": "2025-07-07T13:28:27.638722Z" + }, { "uid": 24, "batchUid": 23, @@ -264,134 +376,10 @@ source: crates/meilisearch/tests/upgrade/v1_12/v1_12_0.rs "enqueuedAt": "2025-01-16T17:02:52.527382964Z", "startedAt": "2025-01-16T17:02:52.539749853Z", "finishedAt": "2025-01-16T17:02:52.547390016Z" - }, - { - "uid": 11, - "batchUid": 11, - "indexUid": "kefir", - "status": "succeeded", - "type": "settingsUpdate", - "canceledBy": null, - "details": { - "searchCutoffMs": 8000 - }, - "error": null, - "duration": "PT0.007307840S", - "enqueuedAt": "2025-01-16T17:01:14.100316617Z", - "startedAt": "2025-01-16T17:01:14.112756687Z", - "finishedAt": "2025-01-16T17:01:14.120064527Z" - }, - { - "uid": 10, - "batchUid": 10, - "indexUid": "kefir", - "status": "succeeded", - "type": "settingsUpdate", - "canceledBy": null, - "details": { - "faceting": { - "maxValuesPerFacet": 99 - }, - "pagination": { - "maxTotalHits": 15 - } - }, - "error": null, - "duration": "PT0.007391353S", - "enqueuedAt": "2025-01-16T17:00:29.188815062Z", - "startedAt": "2025-01-16T17:00:29.201180268Z", - "finishedAt": "2025-01-16T17:00:29.208571621Z" - }, - { - "uid": 9, - "batchUid": 9, - "indexUid": "kefir", - "status": 
"succeeded", - "type": "settingsUpdate", - "canceledBy": null, - "details": { - "faceting": { - "maxValuesPerFacet": 100 - }, - "pagination": { - "maxTotalHits": 1000 - } - }, - "error": null, - "duration": "PT0.007445825S", - "enqueuedAt": "2025-01-16T17:00:15.759501709Z", - "startedAt": "2025-01-16T17:00:15.77629445Z", - "finishedAt": "2025-01-16T17:00:15.783740275Z" - }, - { - "uid": 8, - "batchUid": 8, - "indexUid": "kefir", - "status": "succeeded", - "type": "settingsUpdate", - "canceledBy": null, - "details": { - "typoTolerance": { - "minWordSizeForTypos": { - "oneTypo": 4 - }, - "disableOnWords": [ - "kefir" - ], - "disableOnAttributes": [ - "surname" - ] - } - }, - "error": null, - "duration": "PT0.012020083S", - "enqueuedAt": "2025-01-16T16:59:42.727292501Z", - "startedAt": "2025-01-16T16:59:42.744086671Z", - "finishedAt": "2025-01-16T16:59:42.756106754Z" - }, - { - "uid": 7, - "batchUid": 7, - "indexUid": "kefir", - "status": "succeeded", - "type": "settingsUpdate", - "canceledBy": null, - "details": { - "typoTolerance": { - "minWordSizeForTypos": { - "oneTypo": 4 - } - } - }, - "error": null, - "duration": "PT0.007440092S", - "enqueuedAt": "2025-01-16T16:58:41.203145044Z", - "startedAt": "2025-01-16T16:58:41.2155771Z", - "finishedAt": "2025-01-16T16:58:41.223017192Z" - }, - { - "uid": 6, - "batchUid": 6, - "indexUid": "kefir", - "status": "succeeded", - "type": "settingsUpdate", - "canceledBy": null, - "details": { - "synonyms": { - "boubou": [ - "kefir" - ] - } - }, - "error": null, - "duration": "PT0.007565161S", - "enqueuedAt": "2025-01-16T16:54:51.927866243Z", - "startedAt": "2025-01-16T16:54:51.940332781Z", - "finishedAt": "2025-01-16T16:54:51.947897942Z" } ], - "total": 24, + "total": 30, "limit": 20, - "from": 25, - "next": 5 + "from": 31, + "next": 11 } diff --git a/crates/meilisearch/tests/upgrade/v1_12/snapshots/v1_12_0.rs/check_the_index_scheduler/the_whole_batch_queue_once_everything_has_been_processed.snap b/crates/meilisearch/tests/upgrade/v1_12/snapshots/v1_12_0.rs/check_the_index_scheduler/the_whole_batch_queue_once_everything_has_been_processed.snap index fb62b35da..8103ceed2 100644 --- a/crates/meilisearch/tests/upgrade/v1_12/snapshots/v1_12_0.rs/check_the_index_scheduler/the_whole_batch_queue_once_everything_has_been_processed.snap +++ b/crates/meilisearch/tests/upgrade/v1_12/snapshots/v1_12_0.rs/check_the_index_scheduler/the_whole_batch_queue_once_everything_has_been_processed.snap @@ -4,11 +4,11 @@ source: crates/meilisearch/tests/upgrade/v1_12/v1_12_0.rs { "results": [ { - "uid": 24, + "uid": 30, "progress": null, "details": { "upgradeFrom": "v1.12.0", - "upgradeTo": "v1.16.0" + "upgradeTo": "v1.17.1" }, "stats": { "totalNbTasks": 1, @@ -26,6 +26,155 @@ source: crates/meilisearch/tests/upgrade/v1_12/v1_12_0.rs "finishedAt": "[date]", "batchStrategy": "stopped after the last task of type `upgradeDatabase` because they cannot be batched with tasks of any other type." 
}, + { + "uid": 29, + "progress": null, + "details": { + "receivedDocuments": 1, + "indexedDocuments": 1 + }, + "stats": { + "totalNbTasks": 1, + "status": { + "succeeded": 1 + }, + "types": { + "documentAdditionOrUpdate": 1 + }, + "indexUids": { + "kefir": 1 + } + }, + "duration": "PT0.067201S", + "startedAt": "2025-07-07T13:43:08.772854Z", + "finishedAt": "2025-07-07T13:43:08.840055Z", + "batchStrategy": "unspecified" + }, + { + "uid": 28, + "progress": null, + "details": { + "deletedDocuments": 1 + }, + "stats": { + "totalNbTasks": 1, + "status": { + "succeeded": 1 + }, + "types": { + "indexDeletion": 1 + }, + "indexUids": { + "mieli": 1 + } + }, + "duration": "PT0.012727S", + "startedAt": "2025-07-07T13:42:50.745461Z", + "finishedAt": "2025-07-07T13:42:50.758188Z", + "batchStrategy": "unspecified" + }, + { + "uid": 27, + "progress": null, + "details": { + "receivedDocuments": 1, + "indexedDocuments": 0 + }, + "stats": { + "totalNbTasks": 1, + "status": { + "failed": 1 + }, + "types": { + "documentAdditionOrUpdate": 1 + }, + "indexUids": { + "kefir": 1 + } + }, + "duration": "PT0.059920S", + "startedAt": "2025-07-07T13:42:15.625413Z", + "finishedAt": "2025-07-07T13:42:15.685333Z", + "batchStrategy": "unspecified" + }, + { + "uid": 26, + "progress": null, + "details": { + "receivedDocuments": 1, + "indexedDocuments": 1 + }, + "stats": { + "totalNbTasks": 1, + "status": { + "succeeded": 1 + }, + "types": { + "documentAdditionOrUpdate": 1 + }, + "indexUids": { + "mieli": 1 + } + }, + "duration": "PT0.088879S", + "startedAt": "2025-07-07T13:40:01.461741Z", + "finishedAt": "2025-07-07T13:40:01.55062Z", + "batchStrategy": "unspecified" + }, + { + "uid": 25, + "progress": null, + "details": { + "receivedDocuments": 1, + "indexedDocuments": 1 + }, + "stats": { + "totalNbTasks": 1, + "status": { + "succeeded": 1 + }, + "types": { + "documentAdditionOrUpdate": 1 + }, + "indexUids": { + "kefir": 1 + } + }, + "duration": "PT0.312911S", + "startedAt": "2025-07-07T13:32:46.139785Z", + "finishedAt": "2025-07-07T13:32:46.452696Z", + "batchStrategy": "unspecified" + }, + { + "uid": 24, + "progress": null, + "details": { + "embedders": { + "doggo_embedder": { + "source": "huggingFace", + "model": "sentence-transformers/all-MiniLM-L6-v2", + "revision": "e4ce9877abf3edfe10b0d82785e83bdcb973e22e", + "documentTemplate": "{{doc.description}}" + } + } + }, + "stats": { + "totalNbTasks": 1, + "status": { + "succeeded": 1 + }, + "types": { + "settingsUpdate": 1 + }, + "indexUids": { + "kefir": 1 + } + }, + "duration": "PT0.247378S", + "startedAt": "2025-07-07T13:28:27.391344Z", + "finishedAt": "2025-07-07T13:28:27.638722Z", + "batchStrategy": "unspecified" + }, { "uid": 23, "progress": null, @@ -642,8 +791,8 @@ source: crates/meilisearch/tests/upgrade/v1_12/v1_12_0.rs "batchStrategy": "unspecified" } ], - "total": 25, + "total": 31, "limit": 1000, - "from": 24, + "from": 30, "next": null } diff --git a/crates/meilisearch/tests/upgrade/v1_12/snapshots/v1_12_0.rs/check_the_index_scheduler/the_whole_task_queue_once_everything_has_been_processed.snap b/crates/meilisearch/tests/upgrade/v1_12/snapshots/v1_12_0.rs/check_the_index_scheduler/the_whole_task_queue_once_everything_has_been_processed.snap index abb4dcdd9..81259377c 100644 --- a/crates/meilisearch/tests/upgrade/v1_12/snapshots/v1_12_0.rs/check_the_index_scheduler/the_whole_task_queue_once_everything_has_been_processed.snap +++ 
b/crates/meilisearch/tests/upgrade/v1_12/snapshots/v1_12_0.rs/check_the_index_scheduler/the_whole_task_queue_once_everything_has_been_processed.snap @@ -4,15 +4,15 @@ source: crates/meilisearch/tests/upgrade/v1_12/v1_12_0.rs { "results": [ { - "uid": 25, - "batchUid": 24, + "uid": 31, + "batchUid": 30, "indexUid": null, "status": "succeeded", "type": "upgradeDatabase", "canceledBy": null, "details": { "upgradeFrom": "v1.12.0", - "upgradeTo": "v1.16.0" + "upgradeTo": "v1.17.1" }, "error": null, "duration": "[duration]", @@ -20,6 +20,118 @@ source: crates/meilisearch/tests/upgrade/v1_12/v1_12_0.rs "startedAt": "[date]", "finishedAt": "[date]" }, + { + "uid": 30, + "batchUid": 29, + "indexUid": "kefir", + "status": "succeeded", + "type": "documentAdditionOrUpdate", + "canceledBy": null, + "details": { + "receivedDocuments": 1, + "indexedDocuments": 1 + }, + "error": null, + "duration": "PT0.067201S", + "enqueuedAt": "2025-07-07T13:43:08.772432Z", + "startedAt": "2025-07-07T13:43:08.772854Z", + "finishedAt": "2025-07-07T13:43:08.840055Z" + }, + { + "uid": 29, + "batchUid": 28, + "indexUid": "mieli", + "status": "succeeded", + "type": "indexDeletion", + "canceledBy": null, + "details": { + "deletedDocuments": 1 + }, + "error": null, + "duration": "PT0.012727S", + "enqueuedAt": "2025-07-07T13:42:50.744793Z", + "startedAt": "2025-07-07T13:42:50.745461Z", + "finishedAt": "2025-07-07T13:42:50.758188Z" + }, + { + "uid": 28, + "batchUid": 27, + "indexUid": "kefir", + "status": "failed", + "type": "documentAdditionOrUpdate", + "canceledBy": null, + "details": { + "receivedDocuments": 1, + "indexedDocuments": 0 + }, + "error": { + "message": "Index `kefir`: Bad embedder configuration in the document with id: `2`. Could not parse `._vectors.doggo_embedder`: trailing characters at line 1 column 13", + "code": "invalid_vectors_type", + "type": "invalid_request", + "link": "https://docs.meilisearch.com/errors#invalid_vectors_type" + }, + "duration": "PT0.059920S", + "enqueuedAt": "2025-07-07T13:42:15.624598Z", + "startedAt": "2025-07-07T13:42:15.625413Z", + "finishedAt": "2025-07-07T13:42:15.685333Z" + }, + { + "uid": 27, + "batchUid": 26, + "indexUid": "mieli", + "status": "succeeded", + "type": "documentAdditionOrUpdate", + "canceledBy": null, + "details": { + "receivedDocuments": 1, + "indexedDocuments": 1 + }, + "error": null, + "duration": "PT0.088879S", + "enqueuedAt": "2025-07-07T13:40:01.46081Z", + "startedAt": "2025-07-07T13:40:01.461741Z", + "finishedAt": "2025-07-07T13:40:01.55062Z" + }, + { + "uid": 26, + "batchUid": 25, + "indexUid": "kefir", + "status": "succeeded", + "type": "documentAdditionOrUpdate", + "canceledBy": null, + "details": { + "receivedDocuments": 1, + "indexedDocuments": 1 + }, + "error": null, + "duration": "PT0.312911S", + "enqueuedAt": "2025-07-07T13:32:46.13871Z", + "startedAt": "2025-07-07T13:32:46.139785Z", + "finishedAt": "2025-07-07T13:32:46.452696Z" + }, + { + "uid": 25, + "batchUid": 24, + "indexUid": "kefir", + "status": "succeeded", + "type": "settingsUpdate", + "canceledBy": null, + "details": { + "embedders": { + "doggo_embedder": { + "source": "huggingFace", + "model": "sentence-transformers/all-MiniLM-L6-v2", + "revision": "e4ce9877abf3edfe10b0d82785e83bdcb973e22e", + "documentTemplate": "{{doc.description}}" + } + } + }, + "error": null, + "duration": "PT0.247378S", + "enqueuedAt": "2025-07-07T13:28:27.390054Z", + "startedAt": "2025-07-07T13:28:27.391344Z", + "finishedAt": "2025-07-07T13:28:27.638722Z" + }, { "uid": 24, "batchUid": 23, @@ -497,8 +609,8 @@ 
source: crates/meilisearch/tests/upgrade/v1_12/v1_12_0.rs "finishedAt": "2025-01-16T16:45:16.131303739Z" } ], - "total": 26, + "total": 32, "limit": 1000, - "from": 25, + "from": 31, "next": null } diff --git a/crates/meilisearch/tests/upgrade/v1_12/v1_12_0.ms/auth/lock.mdb b/crates/meilisearch/tests/upgrade/v1_12/v1_12_0.ms/auth/lock.mdb index 4c80ffe2c..80fb2b9d5 100644 Binary files a/crates/meilisearch/tests/upgrade/v1_12/v1_12_0.ms/auth/lock.mdb and b/crates/meilisearch/tests/upgrade/v1_12/v1_12_0.ms/auth/lock.mdb differ diff --git a/crates/meilisearch/tests/upgrade/v1_12/v1_12_0.ms/indexes/381abe91-f939-4b91-92f2-01a24c2e8e3d/data.mdb b/crates/meilisearch/tests/upgrade/v1_12/v1_12_0.ms/indexes/381abe91-f939-4b91-92f2-01a24c2e8e3d/data.mdb index c31db3415..95ca0a9da 100644 Binary files a/crates/meilisearch/tests/upgrade/v1_12/v1_12_0.ms/indexes/381abe91-f939-4b91-92f2-01a24c2e8e3d/data.mdb and b/crates/meilisearch/tests/upgrade/v1_12/v1_12_0.ms/indexes/381abe91-f939-4b91-92f2-01a24c2e8e3d/data.mdb differ diff --git a/crates/meilisearch/tests/upgrade/v1_12/v1_12_0.ms/indexes/381abe91-f939-4b91-92f2-01a24c2e8e3d/lock.mdb b/crates/meilisearch/tests/upgrade/v1_12/v1_12_0.ms/indexes/381abe91-f939-4b91-92f2-01a24c2e8e3d/lock.mdb index c99608b77..5fa5e6b49 100644 Binary files a/crates/meilisearch/tests/upgrade/v1_12/v1_12_0.ms/indexes/381abe91-f939-4b91-92f2-01a24c2e8e3d/lock.mdb and b/crates/meilisearch/tests/upgrade/v1_12/v1_12_0.ms/indexes/381abe91-f939-4b91-92f2-01a24c2e8e3d/lock.mdb differ diff --git a/crates/meilisearch/tests/upgrade/v1_12/v1_12_0.ms/tasks/data.mdb b/crates/meilisearch/tests/upgrade/v1_12/v1_12_0.ms/tasks/data.mdb index 226be2332..f2bcb1b8b 100644 Binary files a/crates/meilisearch/tests/upgrade/v1_12/v1_12_0.ms/tasks/data.mdb and b/crates/meilisearch/tests/upgrade/v1_12/v1_12_0.ms/tasks/data.mdb differ diff --git a/crates/meilisearch/tests/upgrade/v1_12/v1_12_0.ms/tasks/lock.mdb b/crates/meilisearch/tests/upgrade/v1_12/v1_12_0.ms/tasks/lock.mdb index 6d38eab08..b8e0e358d 100644 Binary files a/crates/meilisearch/tests/upgrade/v1_12/v1_12_0.ms/tasks/lock.mdb and b/crates/meilisearch/tests/upgrade/v1_12/v1_12_0.ms/tasks/lock.mdb differ diff --git a/crates/meilisearch/tests/upgrade/v1_12/v1_12_0.rs b/crates/meilisearch/tests/upgrade/v1_12/v1_12_0.rs index 1b2ae054c..b98f27b2d 100644 --- a/crates/meilisearch/tests/upgrade/v1_12/v1_12_0.rs +++ b/crates/meilisearch/tests/upgrade/v1_12/v1_12_0.rs @@ -114,13 +114,13 @@ async fn check_the_index_scheduler(server: &Server) { // All the indexes are still present let (indexes, _) = server.list_indexes(None, None).await; - snapshot!(indexes, @r#" + snapshot!(indexes, @r###" { "results": [ { "uid": "kefir", "createdAt": "2025-01-16T16:45:16.020663157Z", - "updatedAt": "2025-01-23T11:36:22.634859166Z", + "updatedAt": "2025-07-07T13:43:08.835381Z", "primaryKey": "id" } ], @@ -128,7 +128,7 @@ async fn check_the_index_scheduler(server: &Server) { "limit": 20, "total": 1 } - "#); + "###); // And their metadata are still right let (stats, _) = server.stats().await; assert_json_snapshot!(stats, { @@ -141,21 +141,21 @@ async fn check_the_index_scheduler(server: &Server) { { "databaseSize": "[bytes]", "usedDatabaseSize": "[bytes]", - "lastUpdate": "2025-01-23T11:36:22.634859166Z", + "lastUpdate": "2025-07-07T13:43:08.835381Z", "indexes": { "kefir": { - "numberOfDocuments": 1, + "numberOfDocuments": 2, "rawDocumentDbSize": "[bytes]", "avgDocumentSize": "[bytes]", "isIndexing": false, - "numberOfEmbeddings": 0, - "numberOfEmbeddedDocuments": 0, + 
"numberOfEmbeddings": 2, + "numberOfEmbeddedDocuments": 2, "fieldDistribution": { - "age": 1, - "description": 1, - "id": 1, - "name": 1, - "surname": 1 + "age": 2, + "description": 2, + "id": 2, + "name": 2, + "surname": 2 } } } @@ -227,21 +227,21 @@ async fn check_the_index_scheduler(server: &Server) { { "databaseSize": "[bytes]", "usedDatabaseSize": "[bytes]", - "lastUpdate": "2025-01-23T11:36:22.634859166Z", + "lastUpdate": "2025-07-07T13:43:08.835381Z", "indexes": { "kefir": { - "numberOfDocuments": 1, + "numberOfDocuments": 2, "rawDocumentDbSize": "[bytes]", "avgDocumentSize": "[bytes]", "isIndexing": false, - "numberOfEmbeddings": 0, - "numberOfEmbeddedDocuments": 0, + "numberOfEmbeddings": 2, + "numberOfEmbeddedDocuments": 2, "fieldDistribution": { - "age": 1, - "description": 1, - "id": 1, - "name": 1, - "surname": 1 + "age": 2, + "description": 2, + "id": 2, + "name": 2, + "surname": 2 } } } @@ -254,18 +254,18 @@ async fn check_the_index_scheduler(server: &Server) { ".avgDocumentSize" => "[bytes]", }), @r###" { - "numberOfDocuments": 1, + "numberOfDocuments": 2, "rawDocumentDbSize": "[bytes]", "avgDocumentSize": "[bytes]", "isIndexing": false, - "numberOfEmbeddings": 0, - "numberOfEmbeddedDocuments": 0, + "numberOfEmbeddings": 2, + "numberOfEmbeddedDocuments": 2, "fieldDistribution": { - "age": 1, - "description": 1, - "id": 1, - "name": 1, - "surname": 1 + "age": 2, + "description": 2, + "id": 2, + "name": 2, + "surname": 2 } } "###); @@ -295,4 +295,8 @@ async fn check_the_index_features(server: &Server) { let (results, _status) = kefir.search_post(json!({ "sort": ["age:asc"], "filter": "surname = kefirounet" })).await; snapshot!(results, name: "search_with_sort_and_filter"); + + // ensuring we can get the vectors and their `regenerate` is still good. 
+ let (results, _status) = kefir.search_post(json!({"retrieveVectors": true})).await; + snapshot!(json_string!(results["hits"], {"[]._vectors.doggo_embedder.embeddings" => "[vector]"}), name: "search_with_retrieve_vectors"); } diff --git a/crates/meilisearch/tests/vector/binary_quantized.rs b/crates/meilisearch/tests/vector/binary_quantized.rs index 6fcfa3563..adb0da441 100644 --- a/crates/meilisearch/tests/vector/binary_quantized.rs +++ b/crates/meilisearch/tests/vector/binary_quantized.rs @@ -323,7 +323,7 @@ async fn binary_quantize_clear_documents() { // Make sure the arroy DB has been cleared let (documents, _code) = index.search_post(json!({ "hybrid": { "embedder": "manual" }, "vector": [1, 1, 1] })).await; - snapshot!(documents, @r###" + snapshot!(documents, @r#" { "hits": [], "query": "", @@ -333,5 +333,5 @@ async fn binary_quantize_clear_documents() { "estimatedTotalHits": 0, "semanticHitCount": 0 } - "###); + "#); } diff --git a/crates/meilisearch/tests/vector/fragments.rs b/crates/meilisearch/tests/vector/fragments.rs new file mode 100644 index 000000000..81c2e3a55 --- /dev/null +++ b/crates/meilisearch/tests/vector/fragments.rs @@ -0,0 +1,2120 @@ +use meili_snap::{json_string, snapshot}; + +use crate::common::{ + init_fragments_index, init_fragments_index_composite, shared_index_for_fragments, +}; +use crate::json; +use crate::vector::{GetAllDocumentsOptions, Server}; + +#[actix_rt::test] +async fn experimental_feature_not_enabled() { + let server = Server::new().await; + let index = server.unique_index(); + + let settings = json!({ + "embedders": { + "rest": { + "source": "rest", + "url": "http://localhost:1337", + "dimensions": 3, + "request": "{{fragment}}", + "response": { + "data": "{{embedding}}" + }, + "indexingFragments": { + "basic": {"value": "{{ doc.name }} is a dog"}, + }, + "searchFragments": { + "query": {"value": "Some pre-prompt for query {{ q }}"}, + } + }, + }, + }); + let (response, code) = index.update_settings(settings.clone()).await; + snapshot!(code, @"400 Bad Request"); + snapshot!(response, @r#" + { + "message": "setting `indexingFragments` requires enabling the `multimodal` experimental feature. 
See https://github.com/orgs/meilisearch/discussions/846", + "code": "feature_not_enabled", + "type": "invalid_request", + "link": "https://docs.meilisearch.com/errors#feature_not_enabled" + } + "#); +} + +#[actix_rt::test] +async fn indexing_fragments() { + let index = shared_index_for_fragments().await; + + // Make sure the documents have been indexed and their embeddings retrieved + let (documents, code) = index + .get_all_documents(GetAllDocumentsOptions { retrieve_vectors: true, ..Default::default() }) + .await; + snapshot!(code, @"200 OK"); + snapshot!(documents, @r#" + { + "results": [ + { + "id": 0, + "name": "kefir", + "_vectors": { + "rest": { + "embeddings": [ + [ + 0.5, + -0.5, + 0.0 + ] + ], + "regenerate": true + } + } + }, + { + "id": 1, + "name": "echo", + "_vectors": { + "rest": { + "embeddings": [ + [ + 1.0, + 1.0, + 1.0 + ] + ], + "regenerate": false + } + } + }, + { + "id": 2, + "name": "intel", + "breed": "labrador", + "_vectors": { + "rest": { + "embeddings": [ + [ + 1.0, + 1.0, + 0.0 + ], + [ + 1.0, + 1.0, + -1.0 + ] + ], + "regenerate": true + } + } + }, + { + "id": 3, + "name": "dustin", + "breed": "bulldog", + "_vectors": { + "rest": { + "embeddings": [ + [ + -0.5, + 0.5, + 0.0 + ], + [ + -0.5, + 0.5, + 1.0 + ] + ], + "regenerate": true + } + } + } + ], + "offset": 0, + "limit": 20, + "total": 4 + } + "#); +} + +#[actix_rt::test] +async fn replace_document() { + let (server, uid, _settings) = init_fragments_index().await; + let index = server.index(uid); + + let documents = json!([ + { "id": 0, "name": "kefir", "breed": "sorry-I-forgot" }, + ]); + let (value, code) = index.add_documents(documents, None).await; + snapshot!(code, @"202 Accepted"); + + server.wait_task(value.uid()).await.succeeded(); + + // Make sure kefir now has 2 vectors + let (documents, code) = index + .get_all_documents(GetAllDocumentsOptions { retrieve_vectors: true, ..Default::default() }) + .await; + snapshot!(code, @"200 OK"); + snapshot!(documents, @r#" + { + "results": [ + { + "id": 0, + "name": "kefir", + "breed": "sorry-I-forgot", + "_vectors": { + "rest": { + "embeddings": [ + [ + 0.5, + -0.5, + 0.0 + ], + [ + 0.5, + -0.5, + 0.0 + ] + ], + "regenerate": true + } + } + }, + { + "id": 1, + "name": "echo", + "_vectors": { + "rest": { + "embeddings": [ + [ + 1.0, + 1.0, + 1.0 + ] + ], + "regenerate": false + } + } + }, + { + "id": 2, + "name": "intel", + "breed": "labrador", + "_vectors": { + "rest": { + "embeddings": [ + [ + 1.0, + 1.0, + 0.0 + ], + [ + 1.0, + 1.0, + -1.0 + ] + ], + "regenerate": true + } + } + }, + { + "id": 3, + "name": "dustin", + "breed": "bulldog", + "_vectors": { + "rest": { + "embeddings": [ + [ + -0.5, + 0.5, + 0.0 + ], + [ + -0.5, + 0.5, + 1.0 + ] + ], + "regenerate": true + } + } + } + ], + "offset": 0, + "limit": 20, + "total": 4 + } + "#); +} + +#[actix_rt::test] +async fn search_with_vector() { + let index = shared_index_for_fragments().await; + + let (value, code) = index.search_post( + json!({"vector": [1.0, 1.0, 1.0], "hybrid": {"semanticRatio": 1.0, "embedder": "rest"}, "limit": 1} + )).await; + snapshot!(code, @"200 OK"); + snapshot!(value, @r#" + { + "hits": [ + { + "id": 1, + "name": "echo" + } + ], + "query": "", + "processingTimeMs": "[duration]", + "limit": 1, + "offset": 0, + "estimatedTotalHits": 4, + "semanticHitCount": 1 + } + "#); +} + +#[actix_rt::test] +async fn search_with_media() { + let index = shared_index_for_fragments().await; + + let (value, code) = index + .search_post(json!({ + "media": { "breed": "labrador" }, + "hybrid": 
{"semanticRatio": 1.0, "embedder": "rest"}, + "limit": 1 + } + )) + .await; + snapshot!(code, @"200 OK"); + snapshot!(value, @r#" + { + "hits": [ + { + "id": 2, + "name": "intel", + "breed": "labrador" + } + ], + "query": "", + "processingTimeMs": "[duration]", + "limit": 1, + "offset": 0, + "estimatedTotalHits": 4, + "semanticHitCount": 1 + } + "#); +} + +#[actix_rt::test] +async fn search_with_media_and_vector() { + let index = shared_index_for_fragments().await; + + let (value, code) = index + .search_post(json!({ + "vector": [1.0, 1.0, 1.0], + "media": { "breed": "labrador" }, + "hybrid": {"semanticRatio": 1.0, "embedder": "rest"}, + "limit": 1 + } + )) + .await; + snapshot!(code, @"400 Bad Request"); + snapshot!(value, @r#" + { + "message": "Invalid request: both `media` and `vector` parameters are present.", + "code": "invalid_search_media_and_vector", + "type": "invalid_request", + "link": "https://docs.meilisearch.com/errors#invalid_search_media_and_vector" + } + "#); +} + +#[actix_rt::test] +async fn search_with_media_matching_multiple_fragments() { + let index = shared_index_for_fragments().await; + + let (value, code) = index + .search_post(json!({ + "media": { "name": "dustin", "breed": "labrador" }, + "hybrid": {"semanticRatio": 1.0, "embedder": "rest"}, + "limit": 1 + } + )) + .await; + snapshot!(code, @"400 Bad Request"); + snapshot!(value, @r#" + { + "message": "Error while generating embeddings: user error: Query matches multiple search fragments.\n - Note: First matched fragment `justBreed`.\n - Note: Second matched fragment `justName`.\n - Note: {\"q\":null,\"media\":{\"name\":\"dustin\",\"breed\":\"labrador\"}}", + "code": "vector_embedding_error", + "type": "invalid_request", + "link": "https://docs.meilisearch.com/errors#vector_embedding_error" + } + "#); +} + +#[actix_rt::test] +async fn search_with_media_matching_no_fragment() { + let index = shared_index_for_fragments().await; + + let (value, code) = index + .search_post(json!({ + "media": { "ticker": "GME", "section": "portfolio" }, + "hybrid": {"semanticRatio": 1.0, "embedder": "rest"}, + "limit": 1 + } + )) + .await; + snapshot!(code, @"400 Bad Request"); + snapshot!(value, @r#" + { + "message": "Error while generating embeddings: user error: Query matches no search fragment.\n - Note: {\"q\":null,\"media\":{\"ticker\":\"GME\",\"section\":\"portfolio\"}}", + "code": "vector_embedding_error", + "type": "invalid_request", + "link": "https://docs.meilisearch.com/errors#vector_embedding_error" + } + "#); +} + +#[actix_rt::test] +async fn search_with_query() { + let index = shared_index_for_fragments().await; + + let (value, code) = index + .search_post(json!({ + "q": "bulldog", + "hybrid": {"semanticRatio": 1.0, "embedder": "rest"}, + "limit": 1 + } + )) + .await; + snapshot!(code, @"200 OK"); + snapshot!(value, @r#" + { + "hits": [ + { + "id": 3, + "name": "dustin", + "breed": "bulldog" + } + ], + "query": "bulldog", + "processingTimeMs": "[duration]", + "limit": 1, + "offset": 0, + "estimatedTotalHits": 4, + "semanticHitCount": 1 + } + "#); +} + +#[actix_rt::test] +async fn deleting_fragments_deletes_vectors() { + let (server, uid, mut settings) = init_fragments_index().await; + let index = server.index(uid); + + settings["embedders"]["rest"]["indexingFragments"]["basic"] = serde_json::Value::Null; + + let (response, code) = index.update_settings(settings).await; + snapshot!(code, @"202 Accepted"); + let value = server.wait_task(response.uid()).await.succeeded(); + snapshot!(value, @r#" + { + "uid": "[uid]", + 
"batchUid": "[batch_uid]", + "indexUid": "[uuid]", + "status": "succeeded", + "type": "settingsUpdate", + "canceledBy": null, + "details": { + "embedders": { + "rest": { + "source": "rest", + "dimensions": 3, + "url": "[url]", + "indexingFragments": { + "basic": null, + "withBreed": { + "value": "{{ doc.name }} is a {{ doc.breed }}" + } + }, + "searchFragments": { + "justBreed": { + "value": "It's a {{ media.breed }}" + }, + "justName": { + "value": "{{ media.name }} is a dog" + }, + "query": { + "value": "Some pre-prompt for query {{ q }}" + } + }, + "request": "{{fragment}}", + "response": { + "data": "{{embedding}}" + } + } + } + }, + "error": null, + "duration": "[duration]", + "enqueuedAt": "[date]", + "startedAt": "[date]", + "finishedAt": "[date]" + } + "#); + + let (value, code) = index.settings().await; + snapshot!(code, @"200 OK"); + snapshot!(json_string!(value["embedders"], { + ".rest.url" => "[url]", + }), @r#" + { + "rest": { + "source": "rest", + "dimensions": 3, + "url": "[url]", + "indexingFragments": { + "withBreed": { + "value": "{{ doc.name }} is a {{ doc.breed }}" + } + }, + "searchFragments": { + "justBreed": { + "value": "It's a {{ media.breed }}" + }, + "justName": { + "value": "{{ media.name }} is a dog" + }, + "query": { + "value": "Some pre-prompt for query {{ q }}" + } + }, + "request": "{{fragment}}", + "response": { + "data": "{{embedding}}" + }, + "headers": {} + } + } + "#); + + let (documents, code) = index + .get_all_documents(GetAllDocumentsOptions { retrieve_vectors: true, ..Default::default() }) + .await; + snapshot!(code, @"200 OK"); + snapshot!(json_string!(documents), @r###" + { + "results": [ + { + "id": 0, + "name": "kefir", + "_vectors": { + "rest": { + "embeddings": [], + "regenerate": true + } + } + }, + { + "id": 1, + "name": "echo", + "_vectors": { + "rest": { + "embeddings": [ + [ + 1.0, + 1.0, + 1.0 + ] + ], + "regenerate": false + } + } + }, + { + "id": 2, + "name": "intel", + "breed": "labrador", + "_vectors": { + "rest": { + "embeddings": [ + [ + 1.0, + 1.0, + -1.0 + ] + ], + "regenerate": true + } + } + }, + { + "id": 3, + "name": "dustin", + "breed": "bulldog", + "_vectors": { + "rest": { + "embeddings": [ + [ + -0.5, + 0.5, + 1.0 + ] + ], + "regenerate": true + } + } + } + ], + "offset": 0, + "limit": 20, + "total": 4 + } + "###); +} + +#[actix_rt::test] +async fn modifying_fragments_modifies_vectors() { + let (server, uid, mut settings) = init_fragments_index().await; + let index = server.index(uid); + + settings["embedders"]["rest"]["indexingFragments"]["basic"]["value"] = + serde_json::Value::String("{{ doc.name }} is a dog (maybe bulldog?)".to_string()); + + let (response, code) = index.update_settings(settings).await; + snapshot!(code, @"202 Accepted"); + let value = server.wait_task(response.uid()).await.succeeded(); + snapshot!(value, @r#" + { + "uid": "[uid]", + "batchUid": "[batch_uid]", + "indexUid": "[uuid]", + "status": "succeeded", + "type": "settingsUpdate", + "canceledBy": null, + "details": { + "embedders": { + "rest": { + "source": "rest", + "dimensions": 3, + "url": "[url]", + "indexingFragments": { + "basic": { + "value": "{{ doc.name }} is a dog (maybe bulldog?)" + }, + "withBreed": { + "value": "{{ doc.name }} is a {{ doc.breed }}" + } + }, + "searchFragments": { + "justBreed": { + "value": "It's a {{ media.breed }}" + }, + "justName": { + "value": "{{ media.name }} is a dog" + }, + "query": { + "value": "Some pre-prompt for query {{ q }}" + } + }, + "request": "{{fragment}}", + "response": { + "data": 
"{{embedding}}" + } + } + } + }, + "error": null, + "duration": "[duration]", + "enqueuedAt": "[date]", + "startedAt": "[date]", + "finishedAt": "[date]" + } + "#); + + let (documents, code) = index + .get_all_documents(GetAllDocumentsOptions { retrieve_vectors: true, ..Default::default() }) + .await; + snapshot!(code, @"200 OK"); + snapshot!(json_string!(documents), @r#" + { + "results": [ + { + "id": 0, + "name": "kefir", + "_vectors": { + "rest": { + "embeddings": [ + [ + 0.5, + -0.5, + 1.0 + ] + ], + "regenerate": true + } + } + }, + { + "id": 1, + "name": "echo", + "_vectors": { + "rest": { + "embeddings": [ + [ + 1.0, + 1.0, + 1.0 + ] + ], + "regenerate": false + } + } + }, + { + "id": 2, + "name": "intel", + "breed": "labrador", + "_vectors": { + "rest": { + "embeddings": [ + [ + 1.0, + 1.0, + 1.0 + ], + [ + 1.0, + 1.0, + -1.0 + ] + ], + "regenerate": true + } + } + }, + { + "id": 3, + "name": "dustin", + "breed": "bulldog", + "_vectors": { + "rest": { + "embeddings": [ + [ + -0.5, + 0.5, + 1.0 + ], + [ + -0.5, + 0.5, + 1.0 + ] + ], + "regenerate": true + } + } + } + ], + "offset": 0, + "limit": 20, + "total": 4 + } + "#); +} + +#[actix_rt::test] +async fn swapping_fragments() { + let (server, uid, mut settings) = init_fragments_index().await; + let index = server.index(uid); + + let basic = settings["embedders"]["rest"]["indexingFragments"]["basic"].clone(); + let with_breed = settings["embedders"]["rest"]["indexingFragments"]["withBreed"].clone(); + settings["embedders"]["rest"]["indexingFragments"]["basic"] = with_breed; + settings["embedders"]["rest"]["indexingFragments"]["withBreed"] = basic; + + let (response, code) = index.update_settings(settings).await; + snapshot!(code, @"202 Accepted"); + let value = server.wait_task(response.uid()).await.succeeded(); + snapshot!(value, @r#" + { + "uid": "[uid]", + "batchUid": "[batch_uid]", + "indexUid": "[uuid]", + "status": "succeeded", + "type": "settingsUpdate", + "canceledBy": null, + "details": { + "embedders": { + "rest": { + "source": "rest", + "dimensions": 3, + "url": "[url]", + "indexingFragments": { + "basic": { + "value": "{{ doc.name }} is a {{ doc.breed }}" + }, + "withBreed": { + "value": "{{ doc.name }} is a dog" + } + }, + "searchFragments": { + "justBreed": { + "value": "It's a {{ media.breed }}" + }, + "justName": { + "value": "{{ media.name }} is a dog" + }, + "query": { + "value": "Some pre-prompt for query {{ q }}" + } + }, + "request": "{{fragment}}", + "response": { + "data": "{{embedding}}" + } + } + } + }, + "error": null, + "duration": "[duration]", + "enqueuedAt": "[date]", + "startedAt": "[date]", + "finishedAt": "[date]" + } + "#); + + let (documents, code) = index + .get_all_documents(GetAllDocumentsOptions { retrieve_vectors: true, ..Default::default() }) + .await; + snapshot!(code, @"200 OK"); + snapshot!(documents, @r#" + { + "results": [ + { + "id": 0, + "name": "kefir", + "_vectors": { + "rest": { + "embeddings": [ + [ + 0.5, + -0.5, + 0.0 + ] + ], + "regenerate": true + } + } + }, + { + "id": 1, + "name": "echo", + "_vectors": { + "rest": { + "embeddings": [ + [ + 1.0, + 1.0, + 1.0 + ] + ], + "regenerate": false + } + } + }, + { + "id": 2, + "name": "intel", + "breed": "labrador", + "_vectors": { + "rest": { + "embeddings": [ + [ + 1.0, + 1.0, + -1.0 + ], + [ + 1.0, + 1.0, + 0.0 + ] + ], + "regenerate": true + } + } + }, + { + "id": 3, + "name": "dustin", + "breed": "bulldog", + "_vectors": { + "rest": { + "embeddings": [ + [ + -0.5, + 0.5, + 1.0 + ], + [ + -0.5, + 0.5, + 0.0 + ] + ], + "regenerate": 
true
+            }
+          }
+        }
+      ],
+      "offset": 0,
+      "limit": 20,
+      "total": 4
+    }
+    "#);
+}
+
+#[actix_rt::test]
+async fn omitted_fragment_isnt_removed() {
+    let (server, uid, mut settings) = init_fragments_index().await;
+    let index = server.index(uid);
+
+    settings["embedders"]["rest"]["indexingFragments"]["basic"] = serde_json::Value::Null; // basic is removed
+    settings["embedders"]["rest"]["indexingFragments"].as_object_mut().unwrap().remove("withBreed"); // withBreed isn't specified
+
+    let (response, code) = index.update_settings(settings).await;
+    snapshot!(code, @"202 Accepted");
+    let value = server.wait_task(response.uid()).await.succeeded();
+    snapshot!(value, @r#"
+    {
+      "uid": "[uid]",
+      "batchUid": "[batch_uid]",
+      "indexUid": "[uuid]",
+      "status": "succeeded",
+      "type": "settingsUpdate",
+      "canceledBy": null,
+      "details": {
+        "embedders": {
+          "rest": {
+            "source": "rest",
+            "dimensions": 3,
+            "url": "[url]",
+            "indexingFragments": {
+              "basic": null
+            },
+            "searchFragments": {
+              "justBreed": {
+                "value": "It's a {{ media.breed }}"
+              },
+              "justName": {
+                "value": "{{ media.name }} is a dog"
+              },
+              "query": {
+                "value": "Some pre-prompt for query {{ q }}"
+              }
+            },
+            "request": "{{fragment}}",
+            "response": {
+              "data": "{{embedding}}"
+            }
+          }
+        }
+      },
+      "error": null,
+      "duration": "[duration]",
+      "enqueuedAt": "[date]",
+      "startedAt": "[date]",
+      "finishedAt": "[date]"
+    }
+    "#);
+
+    // Make sure withBreed is still here because it wasn't specified
+    let (value, code) = index.settings().await;
+    snapshot!(code, @"200 OK");
+    snapshot!(json_string!(value["embedders"], {
+        ".rest.url" => "[url]",
+    }), @r#"
+    {
+      "rest": {
+        "source": "rest",
+        "dimensions": 3,
+        "url": "[url]",
+        "indexingFragments": {
+          "withBreed": {
+            "value": "{{ doc.name }} is a {{ doc.breed }}"
+          }
+        },
+        "searchFragments": {
+          "justBreed": {
+            "value": "It's a {{ media.breed }}"
+          },
+          "justName": {
+            "value": "{{ media.name }} is a dog"
+          },
+          "query": {
+            "value": "Some pre-prompt for query {{ q }}"
+          }
+        },
+        "request": "{{fragment}}",
+        "response": {
+          "data": "{{embedding}}"
+        },
+        "headers": {}
+      }
+    }
+    "#);
+}
+
+#[actix_rt::test]
+async fn fragment_insertion() {
+    let (server, uid, mut settings) = init_fragments_index().await;
+    let index = server.index(uid);
+
+    settings["embedders"]["rest"]["indexingFragments"].as_object_mut().unwrap().insert(
+        String::from("useless"),
+        serde_json::json!({
+            "value": "This fragment is useless"
+        }),
+    );
+
+    let (response, code) = index.update_settings(settings).await;
+    snapshot!(code, @"202 Accepted");
+    let value = server.wait_task(response.uid()).await.succeeded();
+    snapshot!(value, @r#"
+    {
+      "uid": "[uid]",
+      "batchUid": "[batch_uid]",
+      "indexUid": "[uuid]",
+      "status": "succeeded",
+      "type": "settingsUpdate",
+      "canceledBy": null,
+      "details": {
+        "embedders": {
+          "rest": {
+            "source": "rest",
+            "dimensions": 3,
+            "url": "[url]",
+            "indexingFragments": {
+              "basic": {
+                "value": "{{ doc.name }} is a dog"
+              },
+              "useless": {
+                "value": "This fragment is useless"
+              },
+              "withBreed": {
+                "value": "{{ doc.name }} is a {{ doc.breed }}"
+              }
+            },
+            "searchFragments": {
+              "justBreed": {
+                "value": "It's a {{ media.breed }}"
+              },
+              "justName": {
+                "value": "{{ media.name }} is a dog"
+              },
+              "query": {
+                "value": "Some pre-prompt for query {{ q }}"
+              }
+            },
+            "request": "{{fragment}}",
+            "response": {
+              "data": "{{embedding}}"
+            }
+          }
+        }
+      },
+      "error": null,
+      "duration": "[duration]",
+      "enqueuedAt": "[date]",
+      "startedAt": "[date]",
+      "finishedAt": "[date]"
+    }
+    "#);
+ + let (documents, code) = index + .get_all_documents(GetAllDocumentsOptions { retrieve_vectors: true, ..Default::default() }) + .await; + snapshot!(code, @"200 OK"); + snapshot!(json_string!(documents), @r#" + { + "results": [ + { + "id": 0, + "name": "kefir", + "_vectors": { + "rest": { + "embeddings": [ + [ + 0.5, + -0.5, + 0.0 + ], + [ + 0.0, + 0.0, + 0.0 + ] + ], + "regenerate": true + } + } + }, + { + "id": 1, + "name": "echo", + "_vectors": { + "rest": { + "embeddings": [ + [ + 1.0, + 1.0, + 1.0 + ] + ], + "regenerate": false + } + } + }, + { + "id": 2, + "name": "intel", + "breed": "labrador", + "_vectors": { + "rest": { + "embeddings": [ + [ + 1.0, + 1.0, + 0.0 + ], + [ + 1.0, + 1.0, + -1.0 + ], + [ + 0.0, + 0.0, + 0.0 + ] + ], + "regenerate": true + } + } + }, + { + "id": 3, + "name": "dustin", + "breed": "bulldog", + "_vectors": { + "rest": { + "embeddings": [ + [ + -0.5, + 0.5, + 0.0 + ], + [ + -0.5, + 0.5, + 1.0 + ], + [ + 0.0, + 0.0, + 0.0 + ] + ], + "regenerate": true + } + } + } + ], + "offset": 0, + "limit": 20, + "total": 4 + } + "#); +} + +#[actix_rt::test] +async fn multiple_embedders() { + let (server, uid, mut settings) = init_fragments_index().await; + let index = server.index(uid); + + let url = settings["embedders"]["rest"]["url"].as_str().unwrap(); + + let settings2 = json!({ + "embedders": { + "rest2": { + "source": "rest", + "url": url, + "dimensions": 3, + "request": "{{fragment}}", + "response": { + "data": "{{embedding}}" + }, + "indexingFragments": { + "withBreed": {"value": "{{ doc.name }} is a {{ doc.breed }}"}, + "basic": {"value": "{{ doc.name }} is a dog"}, + }, + "searchFragments": { + "query": {"value": "Some pre-prompt for query {{ q }}"}, + } + }, + "rest3": { + "source": "rest", + "url": url, + "dimensions": 3, + "request": "{{fragment}}", + "response": { + "data": "{{embedding}}" + }, + "indexingFragments": { + "basic": {"value": "{{ doc.name }} is a dog"}, + }, + "searchFragments": { + "query": {"value": "Some pre-prompt for query {{ q }}"}, + } + }, + }, + }); + let (response, code) = index.update_settings(settings2).await; + snapshot!(code, @"202 Accepted"); + let task = server.wait_task(response.uid()).await.succeeded(); + snapshot!(task, @r#" + { + "uid": "[uid]", + "batchUid": "[batch_uid]", + "indexUid": "[uuid]", + "status": "succeeded", + "type": "settingsUpdate", + "canceledBy": null, + "details": { + "embedders": { + "rest2": { + "source": "rest", + "dimensions": 3, + "url": "[url]", + "indexingFragments": { + "basic": { + "value": "{{ doc.name }} is a dog" + }, + "withBreed": { + "value": "{{ doc.name }} is a {{ doc.breed }}" + } + }, + "searchFragments": { + "query": { + "value": "Some pre-prompt for query {{ q }}" + } + }, + "request": "{{fragment}}", + "response": { + "data": "{{embedding}}" + } + }, + "rest3": { + "source": "rest", + "dimensions": 3, + "url": "[url]", + "indexingFragments": { + "basic": { + "value": "{{ doc.name }} is a dog" + } + }, + "searchFragments": { + "query": { + "value": "Some pre-prompt for query {{ q }}" + } + }, + "request": "{{fragment}}", + "response": { + "data": "{{embedding}}" + } + } + } + }, + "error": null, + "duration": "[duration]", + "enqueuedAt": "[date]", + "startedAt": "[date]", + "finishedAt": "[date]" + } + "#); + + let (documents, code) = index + .get_all_documents(GetAllDocumentsOptions { retrieve_vectors: true, ..Default::default() }) + .await; + snapshot!(code, @"200 OK"); + snapshot!(json_string!(documents), @r#" + { + "results": [ + { + "id": 0, + "name": "kefir", + "_vectors": { 
+ "rest": { + "embeddings": [ + [ + 0.5, + -0.5, + 0.0 + ] + ], + "regenerate": true + }, + "rest2": { + "embeddings": [ + [ + 0.5, + -0.5, + 0.0 + ] + ], + "regenerate": true + }, + "rest3": { + "embeddings": [ + [ + 0.5, + -0.5, + 0.0 + ] + ], + "regenerate": true + } + } + }, + { + "id": 1, + "name": "echo", + "_vectors": { + "rest": { + "embeddings": [ + [ + 1.0, + 1.0, + 1.0 + ] + ], + "regenerate": false + }, + "rest2": { + "embeddings": [ + [ + 0.0, + 0.0, + 0.0 + ] + ], + "regenerate": true + }, + "rest3": { + "embeddings": [ + [ + 0.0, + 0.0, + 0.0 + ] + ], + "regenerate": true + } + } + }, + { + "id": 2, + "name": "intel", + "breed": "labrador", + "_vectors": { + "rest": { + "embeddings": [ + [ + 1.0, + 1.0, + 0.0 + ], + [ + 1.0, + 1.0, + -1.0 + ] + ], + "regenerate": true + }, + "rest2": { + "embeddings": [ + [ + 1.0, + 1.0, + 0.0 + ], + [ + 1.0, + 1.0, + -1.0 + ] + ], + "regenerate": true + }, + "rest3": { + "embeddings": [ + [ + 1.0, + 1.0, + 0.0 + ] + ], + "regenerate": true + } + } + }, + { + "id": 3, + "name": "dustin", + "breed": "bulldog", + "_vectors": { + "rest": { + "embeddings": [ + [ + -0.5, + 0.5, + 0.0 + ], + [ + -0.5, + 0.5, + 1.0 + ] + ], + "regenerate": true + }, + "rest2": { + "embeddings": [ + [ + -0.5, + 0.5, + 0.0 + ], + [ + -0.5, + 0.5, + 1.0 + ] + ], + "regenerate": true + }, + "rest3": { + "embeddings": [ + [ + -0.5, + 0.5, + 0.0 + ] + ], + "regenerate": true + } + } + } + ], + "offset": 0, + "limit": 20, + "total": 4 + } + "#); + + // Remove Rest2 + + settings["embedders"]["rest2"] = serde_json::Value::Null; + + let (response, code) = index.update_settings(settings.clone()).await; + snapshot!(code, @"202 Accepted"); + server.wait_task(response.uid()).await.succeeded(); + + let (documents, code) = index + .get_all_documents(GetAllDocumentsOptions { retrieve_vectors: true, ..Default::default() }) + .await; + snapshot!(code, @"200 OK"); + snapshot!(json_string!(documents), @r#" + { + "results": [ + { + "id": 0, + "name": "kefir", + "_vectors": { + "rest": { + "embeddings": [ + [ + 0.5, + -0.5, + 0.0 + ] + ], + "regenerate": true + }, + "rest3": { + "embeddings": [ + [ + 0.5, + -0.5, + 0.0 + ] + ], + "regenerate": true + } + } + }, + { + "id": 1, + "name": "echo", + "_vectors": { + "rest": { + "embeddings": [ + [ + 1.0, + 1.0, + 1.0 + ] + ], + "regenerate": false + }, + "rest3": { + "embeddings": [ + [ + 0.0, + 0.0, + 0.0 + ] + ], + "regenerate": true + } + } + }, + { + "id": 2, + "name": "intel", + "breed": "labrador", + "_vectors": { + "rest": { + "embeddings": [ + [ + 1.0, + 1.0, + 0.0 + ], + [ + 1.0, + 1.0, + -1.0 + ] + ], + "regenerate": true + }, + "rest3": { + "embeddings": [ + [ + 1.0, + 1.0, + 0.0 + ] + ], + "regenerate": true + } + } + }, + { + "id": 3, + "name": "dustin", + "breed": "bulldog", + "_vectors": { + "rest": { + "embeddings": [ + [ + -0.5, + 0.5, + 0.0 + ], + [ + -0.5, + 0.5, + 1.0 + ] + ], + "regenerate": true + }, + "rest3": { + "embeddings": [ + [ + -0.5, + 0.5, + 0.0 + ] + ], + "regenerate": true + } + } + } + ], + "offset": 0, + "limit": 20, + "total": 4 + } + "#); + + // Remove rest's basic fragment + + settings["embedders"]["rest"]["indexingFragments"]["basic"] = serde_json::Value::Null; + //settings["embedders"].as_object_mut().unwrap().remove("rest2"); + + let (response, code) = index.update_settings(settings).await; + snapshot!(code, @"202 Accepted"); + server.wait_task(response.uid()).await.succeeded(); + + let (documents, code) = index + .get_all_documents(GetAllDocumentsOptions { retrieve_vectors: true, ..Default::default() 
})
+        .await;
+    snapshot!(code, @"200 OK");
+    snapshot!(json_string!(documents), @r#"
+    {
+      "results": [
+        {
+          "id": 0,
+          "name": "kefir",
+          "_vectors": {
+            "rest": {
+              "embeddings": [],
+              "regenerate": true
+            },
+            "rest3": {
+              "embeddings": [
+                [
+                  0.5,
+                  -0.5,
+                  0.0
+                ]
+              ],
+              "regenerate": true
+            }
+          }
+        },
+        {
+          "id": 1,
+          "name": "echo",
+          "_vectors": {
+            "rest": {
+              "embeddings": [
+                [
+                  1.0,
+                  1.0,
+                  1.0
+                ]
+              ],
+              "regenerate": false
+            },
+            "rest3": {
+              "embeddings": [
+                [
+                  0.0,
+                  0.0,
+                  0.0
+                ]
+              ],
+              "regenerate": true
+            }
+          }
+        },
+        {
+          "id": 2,
+          "name": "intel",
+          "breed": "labrador",
+          "_vectors": {
+            "rest": {
+              "embeddings": [
+                [
+                  1.0,
+                  1.0,
+                  -1.0
+                ]
+              ],
+              "regenerate": true
+            },
+            "rest3": {
+              "embeddings": [
+                [
+                  1.0,
+                  1.0,
+                  0.0
+                ]
+              ],
+              "regenerate": true
+            }
+          }
+        },
+        {
+          "id": 3,
+          "name": "dustin",
+          "breed": "bulldog",
+          "_vectors": {
+            "rest": {
+              "embeddings": [
+                [
+                  -0.5,
+                  0.5,
+                  1.0
+                ]
+              ],
+              "regenerate": true
+            },
+            "rest3": {
+              "embeddings": [
+                [
+                  -0.5,
+                  0.5,
+                  0.0
+                ]
+              ],
+              "regenerate": true
+            }
+          }
+        }
+      ],
+      "offset": 0,
+      "limit": 20,
+      "total": 4
+    }
+    "#);
+}
+
+#[actix_rt::test]
+async fn remove_non_existent_embedder() {
+    let (server, uid, mut settings) = init_fragments_index().await;
+    let index = server.index(uid);
+
+    settings["embedders"]
+        .as_object_mut()
+        .unwrap()
+        .insert(String::from("non-existent"), serde_json::Value::Null);
+
+    let (response, code) = index.update_settings(settings).await;
+    snapshot!(code, @"202 Accepted");
+    let task = server.wait_task(response.uid()).await.succeeded();
+    snapshot!(task, @r#"
+    {
+      "uid": "[uid]",
+      "batchUid": "[batch_uid]",
+      "indexUid": "[uuid]",
+      "status": "succeeded",
+      "type": "settingsUpdate",
+      "canceledBy": null,
+      "details": {
+        "embedders": {
+          "non-existent": null,
+          "rest": {
+            "source": "rest",
+            "dimensions": 3,
+            "url": "[url]",
+            "indexingFragments": {
+              "basic": {
+                "value": "{{ doc.name }} is a dog"
+              },
+              "withBreed": {
+                "value": "{{ doc.name }} is a {{ doc.breed }}"
+              }
+            },
+            "searchFragments": {
+              "justBreed": {
+                "value": "It's a {{ media.breed }}"
+              },
+              "justName": {
+                "value": "{{ media.name }} is a dog"
+              },
+              "query": {
+                "value": "Some pre-prompt for query {{ q }}"
+              }
+            },
+            "request": "{{fragment}}",
+            "response": {
+              "data": "{{embedding}}"
+            }
+          }
+        }
+      },
+      "error": null,
+      "duration": "[duration]",
+      "enqueuedAt": "[date]",
+      "startedAt": "[date]",
+      "finishedAt": "[date]"
+    }
+    "#);
+}
+
+#[actix_rt::test]
+async fn double_remove_embedder() {
+    let (server, uid, mut settings) = init_fragments_index().await;
+    let index = server.index(uid);
+
+    settings["embedders"]
+        .as_object_mut()
+        .unwrap()
+        .insert(String::from("rest"), serde_json::Value::Null);
+
+    let (response, code) = index.update_settings(settings.clone()).await;
+    snapshot!(code, @"202 Accepted");
+    let task = server.wait_task(response.uid()).await.succeeded();
+    snapshot!(task, @r#"
+    {
+      "uid": "[uid]",
+      "batchUid": "[batch_uid]",
+      "indexUid": "[uuid]",
+      "status": "succeeded",
+      "type": "settingsUpdate",
+      "canceledBy": null,
+      "details": {
+        "embedders": {
+          "rest": null
+        }
+      },
+      "error": null,
+      "duration": "[duration]",
+      "enqueuedAt": "[date]",
+      "startedAt": "[date]",
+      "finishedAt": "[date]"
+    }
+    "#);
+
+    let (response, code) = index.update_settings(settings.clone()).await;
+    snapshot!(code, @"202 Accepted");
+    let task = server.wait_task(response.uid()).await.succeeded();
+    snapshot!(task, @r#"
+    {
+      "uid": "[uid]",
+      "batchUid": "[batch_uid]",
+      "indexUid": "[uuid]",
+ "status": "succeeded", + "type": "settingsUpdate", + "canceledBy": null, + "details": { + "embedders": { + "rest": null + } + }, + "error": null, + "duration": "[duration]", + "enqueuedAt": "[date]", + "startedAt": "[date]", + "finishedAt": "[date]" + } + "#); +} + +#[actix_rt::test] +async fn complex_fragment() { + let (server, uid, mut settings) = init_fragments_index().await; + let index = server.index(uid); + + settings["embedders"]["rest"]["indexingFragments"].as_object_mut().unwrap().insert( + String::from("complex"), + serde_json::json!({ + "value": { + "breed": "{{ doc.breed }}", + "breeds": [ + "{{ doc.breed }}", + { + "breed": "{{ doc.breed }}", + } + ] + } + }), + ); + + let (response, code) = index.update_settings(settings).await; + snapshot!(code, @"202 Accepted"); + let task = server.wait_task(response.uid()).await.succeeded(); + snapshot!(task, @r#" + { + "uid": "[uid]", + "batchUid": "[batch_uid]", + "indexUid": "[uuid]", + "status": "succeeded", + "type": "settingsUpdate", + "canceledBy": null, + "details": { + "embedders": { + "rest": { + "source": "rest", + "dimensions": 3, + "url": "[url]", + "indexingFragments": { + "basic": { + "value": "{{ doc.name }} is a dog" + }, + "complex": { + "value": { + "breed": "{{ doc.breed }}", + "breeds": [ + "{{ doc.breed }}", + { + "breed": "{{ doc.breed }}" + } + ] + } + }, + "withBreed": { + "value": "{{ doc.name }} is a {{ doc.breed }}" + } + }, + "searchFragments": { + "justBreed": { + "value": "It's a {{ media.breed }}" + }, + "justName": { + "value": "{{ media.name }} is a dog" + }, + "query": { + "value": "Some pre-prompt for query {{ q }}" + } + }, + "request": "{{fragment}}", + "response": { + "data": "{{embedding}}" + } + } + } + }, + "error": null, + "duration": "[duration]", + "enqueuedAt": "[date]", + "startedAt": "[date]", + "finishedAt": "[date]" + } + "#); + + let (documents, code) = index + .get_all_documents(GetAllDocumentsOptions { retrieve_vectors: true, ..Default::default() }) + .await; + snapshot!(code, @"200 OK"); + snapshot!(json_string!(documents), @r#" + { + "results": [ + { + "id": 0, + "name": "kefir", + "_vectors": { + "rest": { + "embeddings": [ + [ + 0.5, + -0.5, + 0.0 + ] + ], + "regenerate": true + } + } + }, + { + "id": 1, + "name": "echo", + "_vectors": { + "rest": { + "embeddings": [ + [ + 1.0, + 1.0, + 1.0 + ] + ], + "regenerate": false + } + } + }, + { + "id": 2, + "name": "intel", + "breed": "labrador", + "_vectors": { + "rest": { + "embeddings": [ + [ + 1.0, + 1.0, + 0.0 + ], + [ + 1.0, + 1.0, + -1.0 + ], + [ + 0.0, + 0.0, + -1.0 + ] + ], + "regenerate": true + } + } + }, + { + "id": 3, + "name": "dustin", + "breed": "bulldog", + "_vectors": { + "rest": { + "embeddings": [ + [ + -0.5, + 0.5, + 0.0 + ], + [ + -0.5, + 0.5, + 1.0 + ], + [ + 0.0, + 0.0, + 1.0 + ] + ], + "regenerate": true + } + } + } + ], + "offset": 0, + "limit": 20, + "total": 4 + } + "#); +} + +#[actix_rt::test] +async fn both_fragments_and_document_template() { + let server = Server::new().await; + let index = server.unique_index(); + + let (_response, code) = server.set_features(json!({"multimodal": true})).await; + snapshot!(code, @"200 OK"); + + let settings = json!({ + "embedders": { + "rest": { + "source": "rest", + "url": "http://localhost:1337", + "dimensions": 3, + "request": "{{fragment}}", + "response": { + "data": "{{embedding}}" + }, + "indexingFragments": { + "basic": {"value": "{{ doc.name }} is a dog"}, + }, + "searchFragments": { + "justBreed": {"value": "It's a {{ media.breed }}"}, + }, + "documentTemplate": 
"{{ doc.name }} is a dog", + }, + }, + }); + + let (response, code) = index.update_settings(settings.clone()).await; + snapshot!(code, @"400 Bad Request"); + snapshot!(response, @r#" + { + "message": "Error while generating embeddings: user error: cannot pass both fragments and a document template.\n - Note: 1 fragments declared in `indexingFragments` and 1 fragments declared in `search_fragments_len`.\n - Hint: remove the declared fragments or remove the `documentTemplate`", + "code": "vector_embedding_error", + "type": "invalid_request", + "link": "https://docs.meilisearch.com/errors#vector_embedding_error" + } + "#); +} + +#[ignore = "failing due to issue #5746"] +#[actix_rt::test] +async fn set_fragments_then_document_template() { + let (server, uid, settings) = init_fragments_index().await; + let index = server.index(uid); + + let url = settings["embedders"]["rest"]["url"].as_str().unwrap(); + + let settings = json!({ + "embedders": { + "rest": { + "source": "rest", + "url": url, + "dimensions": 3, + "request": "{{fragment}}", + "response": { + "data": "{{embedding}}" + }, + "documentTemplate": "{{ doc.name }} is a dog", + }, + }, + }); + + let (response, code) = index.update_settings(settings.clone()).await; + snapshot!(code, @"202 Accepted"); + let task = server.wait_task(response.uid()).await; + snapshot!(task, @r""); + + let (settings, code) = index.settings().await; + snapshot!(code, @"200 OK"); + snapshot!(settings, @r#""#); // Should have removed fragments +} + +#[actix_rt::test] +async fn composite() { + let (server, uid, _settings) = init_fragments_index_composite().await; + let index = server.index(uid); + + let (value, code) = index.search_post( + json!({"vector": [1.0, 1.0, 1.0], "hybrid": {"semanticRatio": 1.0, "embedder": "rest"}, "limit": 1} + )).await; + snapshot!(code, @"200 OK"); + snapshot!(value, @r#" + { + "hits": [ + { + "id": 1, + "name": "echo" + } + ], + "query": "", + "processingTimeMs": "[duration]", + "limit": 1, + "offset": 0, + "estimatedTotalHits": 4, + "semanticHitCount": 1 + } + "#); + + let (value, code) = index + .search_post( + json!({"q": "bulldog", "hybrid": {"semanticRatio": 1.0, "embedder": "rest"}, "limit": 1} + ), + ) + .await; + snapshot!(code, @"200 OK"); + snapshot!(value, @r#" + { + "hits": [ + { + "id": 3, + "name": "dustin", + "breed": "bulldog" + } + ], + "query": "bulldog", + "processingTimeMs": "[duration]", + "limit": 1, + "offset": 0, + "estimatedTotalHits": 4, + "semanticHitCount": 1 + } + "#); +} diff --git a/crates/meilisearch/tests/vector/mod.rs b/crates/meilisearch/tests/vector/mod.rs index ca2ecc998..3c08b9e03 100644 --- a/crates/meilisearch/tests/vector/mod.rs +++ b/crates/meilisearch/tests/vector/mod.rs @@ -1,4 +1,5 @@ mod binary_quantized; +mod fragments; #[cfg(feature = "test-ollama")] mod ollama; mod openai; @@ -13,8 +14,9 @@ use meilisearch::option::MaxThreads; use crate::common::index::Index; use crate::common::{default_settings, GetAllDocumentsOptions, Server}; use crate::json; +pub use rest::create_mock; -async fn get_server_vector() -> Server { +pub async fn get_server_vector() -> Server { Server::new().await } @@ -685,7 +687,7 @@ async fn clear_documents() { // Make sure the arroy DB has been cleared let (documents, _code) = index.search_post(json!({ "vector": [1, 1, 1], "hybrid": {"embedder": "manual"} })).await; - snapshot!(documents, @r###" + snapshot!(documents, @r#" { "hits": [], "query": "", @@ -695,7 +697,7 @@ async fn clear_documents() { "estimatedTotalHits": 0, "semanticHitCount": 0 } - "###); + "#); } 
#[actix_rt::test] @@ -739,7 +741,7 @@ async fn add_remove_one_vector_4588() { json!({"vector": [1, 1, 1], "hybrid": {"semanticRatio": 1.0, "embedder": "manual"} }), ) .await; - snapshot!(documents, @r###" + snapshot!(documents, @r#" { "hits": [ { @@ -754,7 +756,7 @@ async fn add_remove_one_vector_4588() { "estimatedTotalHits": 1, "semanticHitCount": 1 } - "###); + "#); let (documents, _code) = index .get_all_documents(GetAllDocumentsOptions { retrieve_vectors: true, ..Default::default() }) diff --git a/crates/meilisearch/tests/vector/openai.rs b/crates/meilisearch/tests/vector/openai.rs index 19b13228a..1d7e94a23 100644 --- a/crates/meilisearch/tests/vector/openai.rs +++ b/crates/meilisearch/tests/vector/openai.rs @@ -136,7 +136,7 @@ fn long_text() -> &'static str { }) } -async fn create_mock_tokenized() -> (MockServer, Value) { +async fn create_mock_tokenized() -> (&'static MockServer, Value) { create_mock_with_template("{{doc.text}}", ModelDimensions::Large, false, false).await } @@ -145,8 +145,8 @@ async fn create_mock_with_template( model_dimensions: ModelDimensions, fallible: bool, slow: bool, -) -> (MockServer, Value) { - let mock_server = MockServer::start().await; +) -> (&'static MockServer, Value) { + let mock_server = Box::leak(Box::new(MockServer::start().await)); const API_KEY: &str = "my-api-key"; const API_KEY_BEARER: &str = "Bearer my-api-key"; @@ -299,7 +299,7 @@ async fn create_mock_with_template( } })) }) - .mount(&mock_server) + .mount(mock_server) .await; let url = mock_server.uri(); @@ -321,27 +321,27 @@ const DOGGO_TEMPLATE: &str = r#"{%- if doc.gender == "F" -%}Une chienne nommée Un chien nommé {{doc.name}}, né en {{doc.birthyear}} {%- endif %}, de race {{doc.breed}}."#; -async fn create_mock() -> (MockServer, Value) { +async fn create_mock() -> (&'static MockServer, Value) { create_mock_with_template(DOGGO_TEMPLATE, ModelDimensions::Large, false, false).await } -async fn create_mock_dimensions() -> (MockServer, Value) { +async fn create_mock_dimensions() -> (&'static MockServer, Value) { create_mock_with_template(DOGGO_TEMPLATE, ModelDimensions::Large512, false, false).await } -async fn create_mock_small_embedding_model() -> (MockServer, Value) { +async fn create_mock_small_embedding_model() -> (&'static MockServer, Value) { create_mock_with_template(DOGGO_TEMPLATE, ModelDimensions::Small, false, false).await } -async fn create_mock_legacy_embedding_model() -> (MockServer, Value) { +async fn create_mock_legacy_embedding_model() -> (&'static MockServer, Value) { create_mock_with_template(DOGGO_TEMPLATE, ModelDimensions::Ada, false, false).await } -async fn create_fallible_mock() -> (MockServer, Value) { +async fn create_fallible_mock() -> (&'static MockServer, Value) { create_mock_with_template(DOGGO_TEMPLATE, ModelDimensions::Large, true, false).await } -async fn create_slow_mock() -> (MockServer, Value) { +async fn create_slow_mock() -> (&'static MockServer, Value) { create_mock_with_template(DOGGO_TEMPLATE, ModelDimensions::Large, true, true).await } diff --git a/crates/meilisearch/tests/vector/rest.rs b/crates/meilisearch/tests/vector/rest.rs index 768d03eb9..7668dbcc3 100644 --- a/crates/meilisearch/tests/vector/rest.rs +++ b/crates/meilisearch/tests/vector/rest.rs @@ -12,8 +12,8 @@ use crate::common::Value; use crate::json; use crate::vector::{get_server_vector, GetAllDocumentsOptions}; -async fn create_mock() -> (MockServer, Value) { - let mock_server = MockServer::start().await; +pub async fn create_mock() -> (&'static MockServer, Value) { + let mock_server = 
Box::leak(Box::new(MockServer::start().await));

     let text_to_embedding: BTreeMap<_, _> = vec![
         // text -> embedding
@@ -32,7 +32,7 @@ async fn create_mock() -> (MockServer, Value) {
                 json!({ "data": text_to_embedding.get(text.as_str()).unwrap_or(&[99., 99., 99.]) }),
             )
         })
-        .mount(&mock_server)
+        .mount(mock_server)
         .await;

     let url = mock_server.uri();
@@ -50,8 +50,8 @@ async fn create_mock() -> (MockServer, Value) {
     (mock_server, embedder_settings)
 }

-async fn create_mock_default_template() -> (MockServer, Value) {
-    let mock_server = MockServer::start().await;
+async fn create_mock_default_template() -> (&'static MockServer, Value) {
+    let mock_server = Box::leak(Box::new(MockServer::start().await));

     let text_to_embedding: BTreeMap<_, _> = vec![
         // text -> embedding
@@ -73,7 +73,7 @@ async fn create_mock_default_template() -> (MockServer, Value) {
                 .set_body_json(json!({"error": "text not found", "text": text})),
             }
         })
-        .mount(&mock_server)
+        .mount(mock_server)
         .await;

     let url = mock_server.uri();
@@ -106,8 +106,8 @@ struct SingleResponse {
     embedding: Vec<f32>,
 }

-async fn create_mock_multiple() -> (MockServer, Value) {
-    let mock_server = MockServer::start().await;
+async fn create_mock_multiple() -> (&'static MockServer, Value) {
+    let mock_server = Box::leak(Box::new(MockServer::start().await));

     let text_to_embedding: BTreeMap<_, _> = vec![
         // text -> embedding
@@ -146,7 +146,7 @@ async fn create_mock_multiple() -> (MockServer, Value) {

         ResponseTemplate::new(200).set_body_json(response)
     })
-    .mount(&mock_server)
+    .mount(mock_server)
     .await;

     let url = mock_server.uri();
@@ -176,8 +176,8 @@ struct SingleRequest {
     input: String,
 }

-async fn create_mock_single_response_in_array() -> (MockServer, Value) {
-    let mock_server = MockServer::start().await;
+async fn create_mock_single_response_in_array() -> (&'static MockServer, Value) {
+    let mock_server = Box::leak(Box::new(MockServer::start().await));

     let text_to_embedding: BTreeMap<_, _> = vec![
         // text -> embedding
@@ -212,7 +212,7 @@ async fn create_mock_single_response_in_array() -> (MockServer, Value) {

         ResponseTemplate::new(200).set_body_json(response)
     })
-    .mount(&mock_server)
+    .mount(mock_server)
     .await;

     let url = mock_server.uri();
@@ -236,8 +236,8 @@ async fn create_mock_single_response_in_array() -> (MockServer, Value) {
     (mock_server, embedder_settings)
 }

-async fn create_mock_raw_with_custom_header() -> (MockServer, Value) {
-    let mock_server = MockServer::start().await;
+async fn create_mock_raw_with_custom_header() -> (&'static MockServer, Value) {
+    let mock_server = Box::leak(Box::new(MockServer::start().await));

     let text_to_embedding: BTreeMap<_, _> = vec![
         // text -> embedding
@@ -277,7 +277,7 @@ async fn create_mock_raw_with_custom_header() -> (MockServer, Value) {

         ResponseTemplate::new(200).set_body_json(output)
     })
-    .mount(&mock_server)
+    .mount(mock_server)
     .await;

     let url = mock_server.uri();
@@ -293,8 +293,8 @@ async fn create_mock_raw_with_custom_header() -> (MockServer, Value) {
     (mock_server, embedder_settings)
 }

-async fn create_mock_raw() -> (MockServer, Value) {
-    let mock_server = MockServer::start().await;
+async fn create_mock_raw() -> (&'static MockServer, Value) {
+    let mock_server = Box::leak(Box::new(MockServer::start().await));

     let text_to_embedding: BTreeMap<_, _> = vec![
         // text -> embedding
@@ -321,7 +321,7 @@ async fn create_mock_raw() -> (MockServer, Value) {

         ResponseTemplate::new(200).set_body_json(output)
     })
-    .mount(&mock_server)
+    .mount(mock_server)
     .await;

     let url = mock_server.uri();
@@ -337,8 
+337,8 @@ async fn create_mock_raw() -> (MockServer, Value) {
     (mock_server, embedder_settings)
 }

-async fn create_faulty_mock_raw(sender: mpsc::Sender<()>) -> (MockServer, Value) {
-    let mock_server = MockServer::start().await;
+async fn create_faulty_mock_raw(sender: mpsc::Sender<()>) -> (&'static MockServer, Value) {
+    let mock_server = Box::leak(Box::new(MockServer::start().await));
     let count = AtomicUsize::new(0);

     Mock::given(method("POST"))
@@ -355,7 +355,7 @@ async fn create_faulty_mock_raw(sender: mpsc::Sender<()>) -> (MockServer, Value)
                 ResponseTemplate::new(500).set_body_string("Service Unavailable")
             }
         })
-        .mount(&mock_server)
+        .mount(mock_server)
         .await;

     let url = mock_server.uri();
diff --git a/crates/meilisearch/tests/vector/settings.rs b/crates/meilisearch/tests/vector/settings.rs
index 50253f930..d26174faf 100644
--- a/crates/meilisearch/tests/vector/settings.rs
+++ b/crates/meilisearch/tests/vector/settings.rs
@@ -101,14 +101,7 @@ async fn reset_embedder_documents() {
     server.wait_task(response.uid()).await;

     // Make sure the documents are still present
-    let (documents, _code) = index
-        .get_all_documents(GetAllDocumentsOptions {
-            limit: None,
-            offset: None,
-            retrieve_vectors: false,
-            fields: None,
-        })
-        .await;
+    let (documents, _code) = index.get_all_documents(GetAllDocumentsOptions::default()).await;
     snapshot!(json_string!(documents), @r###"
     {
       "results": [
diff --git a/crates/meilitool/src/main.rs b/crates/meilitool/src/main.rs
index b967e620c..170bbdcc8 100644
--- a/crates/meilitool/src/main.rs
+++ b/crates/meilitool/src/main.rs
@@ -15,6 +15,7 @@ use meilisearch_types::heed::{
 };
 use meilisearch_types::milli::constants::RESERVED_VECTORS_FIELD_NAME;
 use meilisearch_types::milli::documents::{obkv_to_object, DocumentsBatchReader};
+use meilisearch_types::milli::index::EmbeddingsWithMetadata;
 use meilisearch_types::milli::vector::parsed_vectors::{ExplicitVectors, VectorOrArrayOfVectors};
 use meilisearch_types::milli::{obkv_to_json, BEU32};
 use meilisearch_types::tasks::{Status, Task};
@@ -591,12 +592,21 @@ fn export_documents(
                         .into());
                     };

-                    for (embedder_name, (embeddings, regenerate)) in embeddings {
+                    for (
+                        embedder_name,
+                        EmbeddingsWithMetadata { embeddings, regenerate, has_fragments },
+                    ) in embeddings
+                    {
                         let embeddings = ExplicitVectors {
                             embeddings: Some(VectorOrArrayOfVectors::from_array_of_vectors(
                                 embeddings,
                             )),
-                            regenerate,
+                            regenerate: regenerate &&
+                                // Meilisearch does not handle dumps with fragments well: because the
+                                // fragments are marked as user-provided, all embeddings would be
+                                // regenerated on any settings change or document update.
+                                // To prevent this, we mark these embeddings as non-regenerate.
+                                !has_fragments,
                         };
                         vectors
                             .insert(embedder_name, serde_json::to_value(embeddings).unwrap());
diff --git a/crates/milli/Cargo.toml b/crates/milli/Cargo.toml
index 7ee61a877..b7bcd8c68 100644
--- a/crates/milli/Cargo.toml
+++ b/crates/milli/Cargo.toml
@@ -44,7 +44,7 @@ indexmap = { version = "2.9.0", features = ["serde"] }
 json-depth-checker = { path = "../json-depth-checker" }
 levenshtein_automata = { version = "0.2.1", features = ["fst_automaton"] }
 memchr = "2.7.5"
-memmap2 = "0.9.5"
+memmap2 = "0.9.7"
 obkv = "0.3.0"
 once_cell = "1.21.3"
 ordered-float = "5.0.0"
diff --git a/crates/milli/src/asc_desc.rs b/crates/milli/src/asc_desc.rs
index e75adf83d..d7288faa3 100644
--- a/crates/milli/src/asc_desc.rs
+++ b/crates/milli/src/asc_desc.rs
@@ -168,6 +168,16 @@ pub enum SortError {
     ReservedNameForFilter { name: String },
 }

+impl SortError {
+    pub fn into_search_error(self) -> Error {
+        Error::UserError(UserError::SortError { error: self, search: true })
+    }
+
+    pub fn into_document_error(self) -> Error {
+        Error::UserError(UserError::SortError { error: self, search: false })
+    }
+}
+
 impl From<AscDescError> for SortError {
     fn from(error: AscDescError) -> Self {
         match error {
@@ -190,12 +200,6 @@ impl From<AscDescError> for SortError {
     }
 }

-impl From<SortError> for Error {
-    fn from(error: SortError) -> Self {
-        Self::UserError(UserError::SortError(error))
-    }
-}
-
 #[cfg(test)]
 mod tests {
     use big_s::S;
diff --git a/crates/milli/src/documents/geo_sort.rs b/crates/milli/src/documents/geo_sort.rs
new file mode 100644
index 000000000..0750dfe5c
--- /dev/null
+++ b/crates/milli/src/documents/geo_sort.rs
@@ -0,0 +1,294 @@
+use crate::{
+    distance_between_two_points,
+    heed_codec::facet::{FieldDocIdFacetCodec, OrderedF64Codec},
+    lat_lng_to_xyz,
+    search::new::{facet_string_values, facet_values_prefix_key},
+    GeoPoint, Index,
+};
+use heed::{
+    types::{Bytes, Unit},
+    RoPrefix, RoTxn,
+};
+use roaring::RoaringBitmap;
+use rstar::RTree;
+use std::collections::VecDeque;
+
+#[derive(Debug, Clone, Copy)]
+pub struct GeoSortParameter {
+    // Define the strategy used by the geo sort
+    pub strategy: GeoSortStrategy,
+    // Limit the number of docs in a single bucket to avoid unexpectedly large overhead
+    pub max_bucket_size: u64,
+    // Considering the errors of GPS and geographical calculations, distances less than distance_error_margin will be treated as equal
+    pub distance_error_margin: f64,
+}
+
+impl Default for GeoSortParameter {
+    fn default() -> Self {
+        Self {
+            strategy: GeoSortStrategy::default(),
+            max_bucket_size: 1000,
+            distance_error_margin: 1.0,
+        }
+    }
+}
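For context, the three knobs above can be overridden wholesale; a minimal sketch with illustrative values (it assumes the `milli::documents` re-exports added later in this diff; `GeoSortStrategy` itself is declared just below):

```rust
use milli::documents::{GeoSortParameter, GeoSortStrategy};

// A sketch, not part of the patch: tighter geo-sort parameters.
fn tight_geo_sort_parameter() -> GeoSortParameter {
    GeoSortParameter {
        // always use the iterative scan, caching up to 500 docids at a time
        strategy: GeoSortStrategy::AlwaysIterative(500),
        // close a bucket after 100 documents instead of 1000
        max_bucket_size: 100,
        // treat distances within 0.5 of each other as equal
        distance_error_margin: 0.5,
    }
}
```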
+/// Define the strategy used by the geo sort.
+/// The parameter represents the cache size, and, in the case of the Dynamic strategy,
+/// the point where we move from using the iterative strategy to the rtree.
+#[derive(Debug, Clone, Copy)]
+pub enum GeoSortStrategy {
+    AlwaysIterative(usize),
+    AlwaysRtree(usize),
+    Dynamic(usize),
+}
+
+impl Default for GeoSortStrategy {
+    fn default() -> Self {
+        GeoSortStrategy::Dynamic(1000)
+    }
+}
+
+impl GeoSortStrategy {
+    pub fn use_rtree(&self, candidates: usize) -> bool {
+        match self {
+            GeoSortStrategy::AlwaysIterative(_) => false,
+            GeoSortStrategy::AlwaysRtree(_) => true,
+            GeoSortStrategy::Dynamic(i) => candidates >= *i,
+        }
+    }
+
+    pub fn cache_size(&self) -> usize {
+        match self {
+            GeoSortStrategy::AlwaysIterative(i)
+            | GeoSortStrategy::AlwaysRtree(i)
+            | GeoSortStrategy::Dynamic(i) => *i,
+        }
+    }
+}
+
+#[allow(clippy::too_many_arguments)]
+pub fn fill_cache(
+    index: &Index,
+    txn: &RoTxn,
+    strategy: GeoSortStrategy,
+    ascending: bool,
+    target_point: [f64; 2],
+    field_ids: &Option<[u16; 2]>,
+    rtree: &mut Option<RTree<GeoPoint>>,
+    geo_candidates: &RoaringBitmap,
+    cached_sorted_docids: &mut VecDeque<(u32, [f64; 2])>,
+) -> crate::Result<()> {
+    debug_assert!(cached_sorted_docids.is_empty());
+
+    // lazily initialize the rtree if needed by the strategy, and cache it in `rtree`
+    let rtree = if strategy.use_rtree(geo_candidates.len() as usize) {
+        if let Some(rtree) = rtree.as_ref() {
+            // get rtree from cache
+            Some(rtree)
+        } else {
+            let rtree2 = index.geo_rtree(txn)?.expect("geo candidates but no rtree");
+            // insert rtree in cache and return it.
+            // Can't use `get_or_insert_with` because getting the rtree from the DB is a fallible operation.
+            Some(&*rtree.insert(rtree2))
+        }
+    } else {
+        None
+    };
+
+    let cache_size = strategy.cache_size();
+    if let Some(rtree) = rtree {
+        if ascending {
+            let point = lat_lng_to_xyz(&target_point);
+            for point in rtree.nearest_neighbor_iter(&point) {
+                if geo_candidates.contains(point.data.0) {
+                    cached_sorted_docids.push_back(point.data);
+                    if cached_sorted_docids.len() >= cache_size {
+                        break;
+                    }
+                }
+            }
+        } else {
+            // in the case of the desc geo sort we look for the closest point to the opposite of the queried point,
+            // and we insert the points in reverse order so that they come back out in the right order
+            // when emptying the cache later on
+            let point = lat_lng_to_xyz(&opposite_of(target_point));
+            for point in rtree.nearest_neighbor_iter(&point) {
+                if geo_candidates.contains(point.data.0) {
+                    cached_sorted_docids.push_front(point.data);
+                    if cached_sorted_docids.len() >= cache_size {
+                        break;
+                    }
+                }
+            }
+        }
+    } else {
+        // the iterative version
+        let [lat, lng] = field_ids.expect("fill_cache can't be called without the lat&lng");
+
+        let mut documents = geo_candidates
+            .iter()
+            .map(|id| -> crate::Result<_> { Ok((id, geo_value(id, lat, lng, index, txn)?)) })
+            .collect::<crate::Result<Vec<(u32, [f64; 2])>>>()?;
+        // computing the distance between two points is expensive thus we cache the result
+        documents
+            .sort_by_cached_key(|(_, p)| distance_between_two_points(&target_point, p) as usize);
+        cached_sorted_docids.extend(documents);
+    };
+
+    Ok(())
+}
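A tiny standalone check of the dispatch above (a sketch, using the `milli::documents::GeoSortStrategy` re-export):

```rust
use milli::documents::GeoSortStrategy;

fn main() {
    let strategy = GeoSortStrategy::Dynamic(1000);
    assert!(!strategy.use_rtree(999)); // below the threshold: iterative scan
    assert!(strategy.use_rtree(1000)); // at or above it: build and query the rtree
    assert_eq!(strategy.cache_size(), 1000); // the same parameter bounds the docid cache
}
```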
+#[allow(clippy::too_many_arguments)]
+pub fn next_bucket(
+    index: &Index,
+    txn: &RoTxn,
+    universe: &RoaringBitmap,
+    ascending: bool,
+    target_point: [f64; 2],
+    field_ids: &Option<[u16; 2]>,
+    rtree: &mut Option<RTree<GeoPoint>>,
+    cached_sorted_docids: &mut VecDeque<(u32, [f64; 2])>,
+    geo_candidates: &RoaringBitmap,
+    parameter: GeoSortParameter,
+) -> crate::Result<Option<(RoaringBitmap, Option<[f64; 2]>)>> {
+    let mut geo_candidates = geo_candidates & universe;
+
+    if geo_candidates.is_empty() {
+        return Ok(Some((universe.clone(), None)));
+    }
+
+    let next = |cache: &mut VecDeque<_>| {
+        if ascending {
+            cache.pop_front()
+        } else {
+            cache.pop_back()
+        }
+    };
+    let put_back = |cache: &mut VecDeque<_>, x: _| {
+        if ascending {
+            cache.push_front(x)
+        } else {
+            cache.push_back(x)
+        }
+    };
+
+    let mut current_bucket = RoaringBitmap::new();
+    // current_distance stores the first point and distance in current bucket
+    let mut current_distance: Option<([f64; 2], f64)> = None;
+    loop {
+        // The loop will only exit when we have found all points with equal distance or have exhausted the candidates.
+        if let Some((id, point)) = next(cached_sorted_docids) {
+            if geo_candidates.contains(id) {
+                let distance = distance_between_two_points(&target_point, &point);
+                if let Some((point0, bucket_distance)) = current_distance.as_ref() {
+                    if (bucket_distance - distance).abs() > parameter.distance_error_margin {
+                        // different distance, point belongs to next bucket
+                        put_back(cached_sorted_docids, (id, point));
+                        return Ok(Some((current_bucket, Some(point0.to_owned()))));
+                    } else {
+                        // same distance, point belongs to current bucket
+                        current_bucket.insert(id);
+                        // remove from candidates to prevent it from being added to the cache again
+                        geo_candidates.remove(id);
+                        // current bucket size reaches limit, force return
+                        if current_bucket.len() == parameter.max_bucket_size {
+                            return Ok(Some((current_bucket, Some(point0.to_owned()))));
+                        }
+                    }
+                } else {
+                    // first doc in current bucket
+                    current_distance = Some((point, distance));
+                    current_bucket.insert(id);
+                    geo_candidates.remove(id);
+                    // current bucket size reaches limit, force return
+                    if current_bucket.len() == parameter.max_bucket_size {
+                        return Ok(Some((current_bucket, Some(point.to_owned()))));
+                    }
+                }
+            }
+        } else {
+            // cache exhausted, we need to refill it
+            fill_cache(
+                index,
+                txn,
+                parameter.strategy,
+                ascending,
+                target_point,
+                field_ids,
+                rtree,
+                &geo_candidates,
+                cached_sorted_docids,
+            )?;
+
+            if cached_sorted_docids.is_empty() {
+                // candidates exhausted, exit
+                if let Some((point0, _)) = current_distance.as_ref() {
+                    return Ok(Some((current_bucket, Some(point0.to_owned()))));
+                } else {
+                    return Ok(Some((universe.clone(), None)));
+                }
+            }
+        }
+    }
+}
+
+/// Return an iterator over each number value in the given field of the given document.
+fn facet_number_values<'a>(
+    docid: u32,
+    field_id: u16,
+    index: &Index,
+    txn: &'a RoTxn<'a>,
+) -> crate::Result<RoPrefix<'a, FieldDocIdFacetCodec<OrderedF64Codec>, Unit>> {
+    let key = facet_values_prefix_key(field_id, docid);
+
+    let iter = index
+        .field_id_docid_facet_f64s
+        .remap_key_type::<Bytes>()
+        .prefix_iter(txn, &key)?
+        .remap_key_type();
+
+    Ok(iter)
+}
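The margin comparison inside `next_bucket` is the entire bucketing rule; a self-contained sketch of it (distances are in the same unit as `distance_between_two_points` returns):

```rust
// Two documents fall into the same geo bucket when their distances to the
// target differ by at most the configured margin (1.0 by default).
fn same_bucket(bucket_distance: f64, distance: f64, margin: f64) -> bool {
    (bucket_distance - distance).abs() <= margin
}

fn main() {
    assert!(same_bucket(120.4, 121.0, 1.0)); // within the margin: same bucket
    assert!(!same_bucket(120.4, 122.0, 1.0)); // beyond it: starts the next bucket
}
```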
+/// Extracts the lat and long values from a single document.
+///
+/// If it is not able to find it in the facet number index it will extract it
+/// from the facet string index and parse it as f64 (as the geo extraction behaves).
+pub(crate) fn geo_value(
+    docid: u32,
+    field_lat: u16,
+    field_lng: u16,
+    index: &Index,
+    rtxn: &RoTxn<'_>,
+) -> crate::Result<[f64; 2]> {
+    let extract_geo = |geo_field: u16| -> crate::Result<f64> {
+        match facet_number_values(docid, geo_field, index, rtxn)?.next() {
+            Some(Ok(((_, _, geo), ()))) => Ok(geo),
+            Some(Err(e)) => Err(e.into()),
+            None => match facet_string_values(docid, geo_field, index, rtxn)?.next() {
+                Some(Ok((_, geo))) => {
+                    Ok(geo.parse::<f64>().expect("cannot parse geo field as f64"))
+                }
+                Some(Err(e)) => Err(e.into()),
+                None => panic!("A geo faceted document doesn't contain any lat or lng"),
+            },
+        }
+    };
+
+    let lat = extract_geo(field_lat)?;
+    let lng = extract_geo(field_lng)?;
+
+    Ok([lat, lng])
+}
+
+/// Compute the antipodal coordinate of `coord`
+pub(crate) fn opposite_of(mut coord: [f64; 2]) -> [f64; 2] {
+    coord[0] *= -1.;
+    // in the case of x,0 we want to return x,180
+    if coord[1] > 0. {
+        coord[1] -= 180.;
+    } else {
+        coord[1] += 180.;
+    }
+
+    coord
+}
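A quick worked example of the antipode computation above (values chosen to be exactly representable as floats; `opposite_of` is `pub(crate)`, so this only runs inside the crate):

```rust
// latitude flips sign; a positive longitude loses 180 degrees...
assert_eq!(opposite_of([35.5, 139.5]), [-35.5, -40.5]);
// ...a non-positive one gains 180, which covers the x,0 edge case
assert_eq!(opposite_of([48.5, -2.5]), [-48.5, 177.5]);
assert_eq!(opposite_of([12.0, 0.0]), [-12.0, 180.0]);
```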
diff --git a/crates/milli/src/documents/mod.rs b/crates/milli/src/documents/mod.rs
index f43f7e842..7a4babfa8 100644
--- a/crates/milli/src/documents/mod.rs
+++ b/crates/milli/src/documents/mod.rs
@@ -1,8 +1,10 @@
 mod builder;
 mod enriched;
+pub mod geo_sort;
 mod primary_key;
 mod reader;
 mod serde_impl;
+pub mod sort;

 use std::fmt::Debug;
 use std::io;
@@ -19,6 +21,7 @@ pub use primary_key::{
 pub use reader::{DocumentsBatchCursor, DocumentsBatchCursorError, DocumentsBatchReader};
 use serde::{Deserialize, Serialize};

+pub use self::geo_sort::{GeoSortParameter, GeoSortStrategy};
 use crate::error::{FieldIdMapMissingEntry, InternalError};
 use crate::{FieldId, Object, Result};

diff --git a/crates/milli/src/documents/sort.rs b/crates/milli/src/documents/sort.rs
new file mode 100644
index 000000000..3866d9e27
--- /dev/null
+++ b/crates/milli/src/documents/sort.rs
@@ -0,0 +1,444 @@
+use std::collections::{BTreeSet, VecDeque};
+
+use crate::{
+    constants::RESERVED_GEO_FIELD_NAME,
+    documents::{geo_sort::next_bucket, GeoSortParameter},
+    heed_codec::{
+        facet::{FacetGroupKeyCodec, FacetGroupValueCodec},
+        BytesRefCodec,
+    },
+    is_faceted,
+    search::facet::{ascending_facet_sort, descending_facet_sort},
+    AscDesc, DocumentId, Member, UserError,
+};
+use heed::Database;
+use roaring::RoaringBitmap;
+
+#[derive(Debug, Clone, Copy)]
+enum AscDescId {
+    Facet { field_id: u16, ascending: bool },
+    Geo { field_ids: [u16; 2], target_point: [f64; 2], ascending: bool },
+}
+
+/// A [`SortedDocumentsIterator`] allows efficient access to a continuous range of sorted documents.
+/// This is ideal in the context of paginated queries in which only a small number of documents are needed at a time.
+/// Search operations will only be performed upon access.
+pub enum SortedDocumentsIterator<'ctx> {
+    Leaf {
+        /// The exact number of documents remaining
+        size: usize,
+        values: Box<dyn Iterator<Item = DocumentId> + 'ctx>,
+    },
+    Branch {
+        /// The current child, obtained from the children iterator
+        current_child: Option<Box<SortedDocumentsIterator<'ctx>>>,
+        /// The exact number of documents remaining, excluding documents in the current child
+        next_children_size: usize,
+        /// Iterators to become the current child once it is exhausted
+        next_children:
+            Box<dyn Iterator<Item = crate::Result<SortedDocumentsIteratorBuilder<'ctx>>> + 'ctx>,
+    },
+}
+
+impl SortedDocumentsIterator<'_> {
+    /// Takes care of updating the current child if it is `None`, and also updates the size
+    fn update_current<'ctx>(
+        current_child: &mut Option<Box<SortedDocumentsIterator<'ctx>>>,
+        next_children_size: &mut usize,
+        next_children: &mut Box<
+            dyn Iterator<Item = crate::Result<SortedDocumentsIteratorBuilder<'ctx>>> + 'ctx,
+        >,
+    ) -> crate::Result<()> {
+        if current_child.is_none() {
+            *current_child = match next_children.next() {
+                Some(Ok(builder)) => {
+                    let next_child = Box::new(builder.build()?);
+                    *next_children_size -= next_child.size_hint().0;
+                    Some(next_child)
+                }
+                Some(Err(e)) => return Err(e),
+                None => return Ok(()),
+            };
+        }
+        Ok(())
+    }
+}
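Because sizes are tracked exactly, pagination can jump over whole subtrees; a hedged sketch of consuming one page with the `Iterator` implementation that follows (crate-internal types):

```rust
// Sketch, not part of the patch: `skip` calls `nth` internally, so the
// offset skips entire children instead of decoding every document.
fn page(
    iter: SortedDocumentsIterator<'_>,
    offset: usize,
    limit: usize,
) -> crate::Result<Vec<DocumentId>> {
    iter.skip(offset).take(limit).collect()
}
```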
+impl Iterator for SortedDocumentsIterator<'_> {
+    type Item = crate::Result<DocumentId>;
+
+    /// Implementing the `nth` method allows for efficient access to the nth document in the sorted order.
+    /// It's used by `skip` internally.
+    /// The default implementation of `nth` would iterate over all children, which is inefficient for large datasets.
+    /// This implementation will jump over whole chunks of children until it gets close.
+    fn nth(&mut self, n: usize) -> Option<Self::Item> {
+        if n == 0 {
+            return self.next();
+        }
+
+        // If it's at the leaf level, just forward the call to the values iterator
+        let (current_child, next_children, next_children_size) = match self {
+            SortedDocumentsIterator::Leaf { values, size } => {
+                *size = size.saturating_sub(n);
+                return values.nth(n).map(Ok);
+            }
+            SortedDocumentsIterator::Branch {
+                current_child,
+                next_children,
+                next_children_size,
+            } => (current_child, next_children, next_children_size),
+        };
+
+        // Otherwise don't directly iterate over children, skip them if we know we will go further
+        let mut to_skip = n - 1;
+        while to_skip > 0 {
+            if let Err(e) = SortedDocumentsIterator::update_current(
+                current_child,
+                next_children_size,
+                next_children,
+            ) {
+                return Some(Err(e));
+            }
+            let Some(inner) = current_child else {
+                return None; // No more inner iterators, everything has been consumed.
+            };
+
+            if to_skip >= inner.size_hint().0 {
+                // The current child isn't large enough to contain the nth element.
+                // Skip it and continue with the next one.
+                to_skip -= inner.size_hint().0;
+                *current_child = None;
+                continue;
+            } else {
+                // The current iterator is large enough, so we can forward the call to it.
+                return inner.nth(to_skip + 1);
+            }
+        }
+
+        self.next()
+    }
+
+    /// Iterators need to keep track of their size so that they can be skipped efficiently by the `nth` method.
+    fn size_hint(&self) -> (usize, Option<usize>) {
+        let size = match self {
+            SortedDocumentsIterator::Leaf { size, .. } => *size,
+            SortedDocumentsIterator::Branch {
+                next_children_size,
+                current_child: Some(current_child),
+                ..
+            } => current_child.size_hint().0 + next_children_size,
+            SortedDocumentsIterator::Branch { next_children_size, current_child: None, .. } => {
+                *next_children_size
+            }
+        };
+
+        (size, Some(size))
+    }
+
+    fn next(&mut self) -> Option<Self::Item> {
+        match self {
+            SortedDocumentsIterator::Leaf { values, size } => {
+                let result = values.next().map(Ok);
+                if result.is_some() {
+                    *size -= 1;
+                }
+                result
+            }
+            SortedDocumentsIterator::Branch {
+                current_child,
+                next_children_size,
+                next_children,
+            } => {
+                let mut result = None;
+                while result.is_none() {
+                    // Ensure we have selected an iterator to work with
+                    if let Err(e) = SortedDocumentsIterator::update_current(
+                        current_child,
+                        next_children_size,
+                        next_children,
+                    ) {
+                        return Some(Err(e));
+                    }
+                    let Some(inner) = current_child else {
+                        return None;
+                    };
+
+                    result = inner.next();
+
+                    // If the current iterator is exhausted, we need to try the next one
+                    if result.is_none() {
+                        *current_child = None;
+                    }
+                }
+                result
+            }
+        }
+    }
+}
+
+/// Builder for a [`SortedDocumentsIterator`].
+/// Most builders won't ever be built, because pagination will skip them.
+pub struct SortedDocumentsIteratorBuilder<'ctx> {
+    index: &'ctx crate::Index,
+    rtxn: &'ctx heed::RoTxn<'ctx>,
+    number_db: Database<FacetGroupKeyCodec<BytesRefCodec>, FacetGroupValueCodec>,
+    string_db: Database<FacetGroupKeyCodec<BytesRefCodec>, FacetGroupValueCodec>,
+    fields: &'ctx [AscDescId],
+    candidates: RoaringBitmap,
+    geo_candidates: &'ctx RoaringBitmap,
+}
+
+impl<'ctx> SortedDocumentsIteratorBuilder<'ctx> {
+    /// Performs the sort and builds a [`SortedDocumentsIterator`].
+    fn build(self) -> crate::Result<SortedDocumentsIterator<'ctx>> {
+        let size = self.candidates.len() as usize;
+
+        match self.fields {
+            [] => Ok(SortedDocumentsIterator::Leaf {
+                size,
+                values: Box::new(self.candidates.into_iter()),
+            }),
+            [AscDescId::Facet { field_id, ascending }, next_fields @ ..] => {
+                SortedDocumentsIteratorBuilder::build_facet(
+                    self.index,
+                    self.rtxn,
+                    self.number_db,
+                    self.string_db,
+                    next_fields,
+                    self.candidates,
+                    self.geo_candidates,
+                    *field_id,
+                    *ascending,
+                )
+            }
+            [AscDescId::Geo { field_ids, target_point, ascending }, next_fields @ ..] => {
+                SortedDocumentsIteratorBuilder::build_geo(
+                    self.index,
+                    self.rtxn,
+                    self.number_db,
+                    self.string_db,
+                    next_fields,
+                    self.candidates,
+                    self.geo_candidates,
+                    *field_ids,
+                    *target_point,
+                    *ascending,
+                )
+            }
+        }
+    }
+
+    /// Builds a [`SortedDocumentsIterator`] based on the results of a facet sort.
+    #[allow(clippy::too_many_arguments)]
+    fn build_facet(
+        index: &'ctx crate::Index,
+        rtxn: &'ctx heed::RoTxn<'ctx>,
+        number_db: Database<FacetGroupKeyCodec<BytesRefCodec>, FacetGroupValueCodec>,
+        string_db: Database<FacetGroupKeyCodec<BytesRefCodec>, FacetGroupValueCodec>,
+        next_fields: &'ctx [AscDescId],
+        candidates: RoaringBitmap,
+        geo_candidates: &'ctx RoaringBitmap,
+        field_id: u16,
+        ascending: bool,
+    ) -> crate::Result<SortedDocumentsIterator<'ctx>> {
+        let size = candidates.len() as usize;
+
+        // Perform the sort on the first field
+        let (number_iter, string_iter) = if ascending {
+            let number_iter = ascending_facet_sort(rtxn, number_db, field_id, candidates.clone())?;
+            let string_iter = ascending_facet_sort(rtxn, string_db, field_id, candidates)?;
+
+            (itertools::Either::Left(number_iter), itertools::Either::Left(string_iter))
+        } else {
+            let number_iter = descending_facet_sort(rtxn, number_db, field_id, candidates.clone())?;
+            let string_iter = descending_facet_sort(rtxn, string_db, field_id, candidates)?;
+
+            (itertools::Either::Right(number_iter), itertools::Either::Right(string_iter))
+        };
+
+        // Create builders for the next level of the tree
+        let number_iter = number_iter.map(|r| r.map(|(d, _)| d));
+        let string_iter = string_iter.map(|r| r.map(|(d, _)| d));
+        let next_children = number_iter.chain(string_iter).map(move |r| {
+            Ok(SortedDocumentsIteratorBuilder {
+                index,
+                rtxn,
+                number_db,
+                string_db,
+                fields: next_fields,
+                candidates: r?,
+                geo_candidates,
+            })
+        });
+
+        Ok(SortedDocumentsIterator::Branch {
+            current_child: None,
+            next_children_size: size,
+            next_children: Box::new(next_children),
+        })
+    }
+
+    /// Builds a [`SortedDocumentsIterator`] based on the (lazy) results of a geo sort.
+    #[allow(clippy::too_many_arguments)]
+    fn build_geo(
+        index: &'ctx crate::Index,
+        rtxn: &'ctx heed::RoTxn<'ctx>,
+        number_db: Database<FacetGroupKeyCodec<BytesRefCodec>, FacetGroupValueCodec>,
+        string_db: Database<FacetGroupKeyCodec<BytesRefCodec>, FacetGroupValueCodec>,
+        next_fields: &'ctx [AscDescId],
+        candidates: RoaringBitmap,
+        geo_candidates: &'ctx RoaringBitmap,
+        field_ids: [u16; 2],
+        target_point: [f64; 2],
+        ascending: bool,
+    ) -> crate::Result<SortedDocumentsIterator<'ctx>> {
+        let mut cache = VecDeque::new();
+        let mut rtree = None;
+        let size = candidates.len() as usize;
+        let not_geo_candidates = candidates.clone() - geo_candidates;
+        let mut geo_remaining = size - not_geo_candidates.len() as usize;
+        let mut not_geo_candidates = Some(not_geo_candidates);
+
+        let next_children = std::iter::from_fn(move || {
+            // Find the next bucket of geo-sorted documents.
+            // next_bucket loops and will go back to the beginning so we use a variable to track how many are left.
+            if geo_remaining > 0 {
+                if let Ok(Some((docids, _point))) = next_bucket(
+                    index,
+                    rtxn,
+                    &candidates,
+                    ascending,
+                    target_point,
+                    &Some(field_ids),
+                    &mut rtree,
+                    &mut cache,
+                    geo_candidates,
+                    GeoSortParameter::default(),
+                ) {
+                    geo_remaining -= docids.len() as usize;
+                    return Some(Ok(SortedDocumentsIteratorBuilder {
+                        index,
+                        rtxn,
+                        number_db,
+                        string_db,
+                        fields: next_fields,
+                        candidates: docids,
+                        geo_candidates,
+                    }));
+                }
+            }
+
+            // Once all geo candidates have been processed, we can return the others
+            if let Some(not_geo_candidates) = not_geo_candidates.take() {
+                if !not_geo_candidates.is_empty() {
+                    return Some(Ok(SortedDocumentsIteratorBuilder {
+                        index,
+                        rtxn,
+                        number_db,
+                        string_db,
+                        fields: next_fields,
+                        candidates: not_geo_candidates,
+                        geo_candidates,
+                    }));
+                }
+            }
+
+            None
+        });
+
+        Ok(SortedDocumentsIterator::Branch {
+            current_child: None,
+            next_children_size: size,
+            next_children: Box::new(next_children),
+        })
+    }
+}
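The `std::iter::from_fn` pattern in `build_geo` is what keeps later buckets from ever being computed when pagination stops early; a toy, self-contained model of it (plain `Vec`s standing in for geo buckets):

```rust
use std::collections::VecDeque;
use std::iter;

// Children are only materialized when the consumer asks for them.
fn lazy_buckets(mut buckets: VecDeque<Vec<u32>>) -> impl Iterator<Item = Vec<u32>> {
    iter::from_fn(move || buckets.pop_front())
}

fn main() {
    let mut children = lazy_buckets(VecDeque::from([vec![1, 3], vec![2], vec![4, 5]]));
    // Stopping after the first bucket means the rest are never built.
    assert_eq!(children.next(), Some(vec![1, 3]));
}
```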
+            if geo_remaining > 0 {
+                if let Ok(Some((docids, _point))) = next_bucket(
+                    index,
+                    rtxn,
+                    &candidates,
+                    ascending,
+                    target_point,
+                    &Some(field_ids),
+                    &mut rtree,
+                    &mut cache,
+                    geo_candidates,
+                    GeoSortParameter::default(),
+                ) {
+                    geo_remaining -= docids.len() as usize;
+                    return Some(Ok(SortedDocumentsIteratorBuilder {
+                        index,
+                        rtxn,
+                        number_db,
+                        string_db,
+                        fields: next_fields,
+                        candidates: docids,
+                        geo_candidates,
+                    }));
+                }
+            }
+
+            // Once all geo candidates have been processed, we can return the others
+            if let Some(not_geo_candidates) = not_geo_candidates.take() {
+                if !not_geo_candidates.is_empty() {
+                    return Some(Ok(SortedDocumentsIteratorBuilder {
+                        index,
+                        rtxn,
+                        number_db,
+                        string_db,
+                        fields: next_fields,
+                        candidates: not_geo_candidates,
+                        geo_candidates,
+                    }));
+                }
+            }
+
+            None
+        });
+
+        Ok(SortedDocumentsIterator::Branch {
+            current_child: None,
+            next_children_size: size,
+            next_children: Box::new(next_children),
+        })
+    }
+}
+
+/// A structure owning the data needed during the lifetime of a [`SortedDocumentsIterator`].
+pub struct SortedDocuments<'ctx> {
+    index: &'ctx crate::Index,
+    rtxn: &'ctx heed::RoTxn<'ctx>,
+    fields: Vec<AscDescId>,
+    number_db: Database<FacetGroupKeyCodec<BytesRefCodec>, FacetGroupValueCodec>,
+    string_db: Database<FacetGroupKeyCodec<BytesRefCodec>, FacetGroupValueCodec>,
+    candidates: &'ctx RoaringBitmap,
+    geo_candidates: RoaringBitmap,
+}
+
+impl<'ctx> SortedDocuments<'ctx> {
+    pub fn iter(&'ctx self) -> crate::Result<SortedDocumentsIterator<'ctx>> {
+        let builder = SortedDocumentsIteratorBuilder {
+            index: self.index,
+            rtxn: self.rtxn,
+            number_db: self.number_db,
+            string_db: self.string_db,
+            fields: &self.fields,
+            candidates: self.candidates.clone(),
+            geo_candidates: &self.geo_candidates,
+        };
+        builder.build()
+    }
+}
+
+pub fn recursive_sort<'ctx>(
+    index: &'ctx crate::Index,
+    rtxn: &'ctx heed::RoTxn<'ctx>,
+    sort: Vec<AscDesc>,
+    candidates: &'ctx RoaringBitmap,
+) -> crate::Result<SortedDocuments<'ctx>> {
+    let sortable_fields: BTreeSet<_> = index.sortable_fields(rtxn)?.into_iter().collect();
+    let fields_ids_map = index.fields_ids_map(rtxn)?;
+
+    // Retrieve the field ids that are used for sorting
+    let mut fields = Vec::new();
+    let mut need_geo_candidates = false;
+    for asc_desc in sort {
+        let (field, geofield) = match asc_desc {
+            AscDesc::Asc(Member::Field(field)) => (Some((field, true)), None),
+            AscDesc::Desc(Member::Field(field)) => (Some((field, false)), None),
+            AscDesc::Asc(Member::Geo(target_point)) => (None, Some((target_point, true))),
+            AscDesc::Desc(Member::Geo(target_point)) => (None, Some((target_point, false))),
+        };
+        if let Some((field, ascending)) = field {
+            if is_faceted(&field, &sortable_fields) {
+                if let Some(field_id) = fields_ids_map.id(&field) {
+                    fields.push(AscDescId::Facet { field_id, ascending });
+                    continue;
+                }
+            }
+            return Err(UserError::InvalidDocumentSortableAttribute {
+                field: field.to_string(),
+                sortable_fields: sortable_fields.clone(),
+            }
+            .into());
+        }
+        if let Some((target_point, ascending)) = geofield {
+            if sortable_fields.contains(RESERVED_GEO_FIELD_NAME) {
+                if let (Some(lat), Some(lng)) =
+                    (fields_ids_map.id("_geo.lat"), fields_ids_map.id("_geo.lng"))
+                {
+                    need_geo_candidates = true;
+                    fields.push(AscDescId::Geo { field_ids: [lat, lng], target_point, ascending });
+                    continue;
+                }
+            }
+            return Err(UserError::InvalidDocumentSortableAttribute {
+                field: RESERVED_GEO_FIELD_NAME.to_string(),
+                sortable_fields: sortable_fields.clone(),
+            }
+            .into());
+        }
+    }
+
+    let geo_candidates = if need_geo_candidates {
+        index.geo_faceted_documents_ids(rtxn)?
+    } else {
+        RoaringBitmap::new()
+    };
+
+    let number_db = index.facet_id_f64_docids.remap_key_type::<FacetGroupKeyCodec<BytesRefCodec>>();
+    let string_db =
+        index.facet_id_string_docids.remap_key_type::<FacetGroupKeyCodec<BytesRefCodec>>();
+
+    Ok(SortedDocuments { index, rtxn, fields, number_db, string_db, candidates, geo_candidates })
+}
diff --git a/crates/milli/src/error.rs b/crates/milli/src/error.rs
index c9c39db18..122be47a0 100644
--- a/crates/milli/src/error.rs
+++ b/crates/milli/src/error.rs
@@ -203,7 +203,21 @@ and can not be more than 511 bytes.", .document_id.to_string()
     ),
     }
     )]
-    InvalidSortableAttribute { field: String, valid_fields: BTreeSet<String>, hidden_fields: bool },
+    InvalidSearchSortableAttribute {
+        field: String,
+        valid_fields: BTreeSet<String>,
+        hidden_fields: bool,
+    },
+    #[error("Attribute `{}` is not sortable. {}",
+        .field,
+        match .sortable_fields.is_empty() {
+            true => "This index does not have configured sortable attributes.".to_string(),
+            false => format!("Available sortable attributes are: `{}`.",
+                sortable_fields.iter().map(AsRef::as_ref).collect::<Vec<&str>>().join(", ")
+            ),
+        }
+    )]
+    InvalidDocumentSortableAttribute { field: String, sortable_fields: BTreeSet<String> },
     #[error("Attribute `{}` is not filterable and thus, cannot be used as distinct attribute. {}",
     .field,
     match (.valid_patterns.is_empty(), .matching_rule_index) {
@@ -284,8 +298,8 @@ and can not be more than 511 bytes.", .document_id.to_string()
     PrimaryKeyCannotBeChanged(String),
     #[error(transparent)]
     SerdeJson(serde_json::Error),
-    #[error(transparent)]
-    SortError(#[from] SortError),
+    #[error("{error}")]
+    SortError { error: SortError, search: bool },
     #[error("An unknown internal document id have been used: `{document_id}`.")]
     UnknownInternalDocumentId { document_id: DocumentId },
     #[error("`minWordSizeForTypos` setting is invalid. `oneTypo` and `twoTypos` fields should be between `0` and `255`, and `twoTypos` should be greater or equals to `oneTypo` but found `oneTypo: {0}` and twoTypos: {1}`.")]
@@ -628,7 +642,7 @@ fn conditionally_lookup_for_error_message() {
     ];

     for (list, suffix) in messages {
-        let err = UserError::InvalidSortableAttribute {
+        let err = UserError::InvalidSearchSortableAttribute {
             field: "name".to_string(),
             valid_fields: list,
             hidden_fields: false,
@@ -637,3 +651,29 @@
         assert_eq!(err.to_string(), format!("{} {}", prefix, suffix));
     }
 }
+
+pub struct DidYouMean<'a>(Option<&'a str>);
+
+impl<'a> DidYouMean<'a> {
+    pub fn new(key: &str, keys: &'a [String]) -> DidYouMean<'a> {
+        let typos = levenshtein_automata::LevenshteinAutomatonBuilder::new(2, true).build_dfa(key);
+        for key in keys.iter() {
+            match typos.eval(key) {
+                levenshtein_automata::Distance::Exact(_) => {
+                    return DidYouMean(Some(key));
+                }
+                levenshtein_automata::Distance::AtLeast(_) => continue,
+            }
+        }
+        DidYouMean(None)
+    }
+}
+
+impl std::fmt::Display for DidYouMean<'_> {
+    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
+        if let Some(suggestion) = self.0 {
+            write!(f, " Did you mean `{suggestion}`?")?;
+        }
+        Ok(())
+    }
+}
diff --git a/crates/milli/src/filterable_attributes_rules.rs b/crates/milli/src/filterable_attributes_rules.rs
index b48c55770..10539da83 100644
--- a/crates/milli/src/filterable_attributes_rules.rs
+++ b/crates/milli/src/filterable_attributes_rules.rs
@@ -115,7 +115,7 @@ impl FilterableAttributesFeatures {
         self.filter.is_filterable_null()
     }

-    /// Check if `IS EXISTS` is allowed
+    /// Check if `EXISTS` is allowed
     pub fn is_filterable_exists(&self) -> bool {
         self.filter.is_filterable_exists()
     }
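A quick behavioural sketch of the `DidYouMean` helper added above: a suggestion is produced only when some known key is within a Levenshtein distance of 2 (transpositions included) of the input, and the `Display` impl stays silent otherwise. Hypothetical values, assuming the helper is in scope:

    fn did_you_mean_demo() {
        let keys = vec!["default".to_string(), "openai".to_string()];
        // "defalut" is within two edits of "default", so a suggestion is emitted...
        let msg = format!("The embedder does not exist.{}", DidYouMean::new("defalut", &keys));
        assert_eq!(msg, "The embedder does not exist. Did you mean `default`?");
        // ...while a distant input produces no suffix at all.
        assert_eq!(DidYouMean::new("zzz", &keys).to_string(), "");
    }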
diff --git a/crates/milli/src/index.rs b/crates/milli/src/index.rs
index a2c5fce0b..cd68ee118 100644
--- a/crates/milli/src/index.rs
+++ b/crates/milli/src/index.rs
@@ -1780,20 +1780,22 @@ impl Index {
         &self,
         rtxn: &RoTxn<'_>,
         docid: DocumentId,
-    ) -> Result<BTreeMap<String, (Vec<Embedding>, bool)>> {
+    ) -> Result<BTreeMap<String, EmbeddingsWithMetadata>> {
         let mut res = BTreeMap::new();
         let embedders = self.embedding_configs();
         for config in embedders.embedding_configs(rtxn)? {
             let embedder_info = embedders.embedder_info(rtxn, &config.name)?.unwrap();
+            let has_fragments = config.config.embedder_options.has_fragments();
             let reader = ArroyWrapper::new(
                 self.vector_arroy,
                 embedder_info.embedder_id,
                 config.config.quantized(),
             );
             let embeddings = reader.item_vectors(rtxn, docid)?;
+            let regenerate = embedder_info.embedding_status.must_regenerate(docid);
             res.insert(
                 config.name.to_owned(),
-                (embeddings, embedder_info.embedding_status.must_regenerate(docid)),
+                EmbeddingsWithMetadata { embeddings, regenerate, has_fragments },
             );
         }
         Ok(res)
@@ -1934,6 +1936,12 @@ impl Index {
     }
 }

+pub struct EmbeddingsWithMetadata {
+    pub embeddings: Vec<Embedding>,
+    pub regenerate: bool,
+    pub has_fragments: bool,
+}
+
 #[derive(Debug, Default, Deserialize, Serialize)]
 pub struct ChatConfig {
     pub description: String,
diff --git a/crates/milli/src/lib.rs b/crates/milli/src/lib.rs
index a7c99a2f0..fedd58efe 100644
--- a/crates/milli/src/lib.rs
+++ b/crates/milli/src/lib.rs
@@ -43,12 +43,13 @@ use std::fmt;
 use std::hash::BuildHasherDefault;

 use charabia::normalizer::{CharNormalizer, CompatibilityDecompositionNormalizer};
+pub use documents::GeoSortStrategy;
 pub use filter_parser::{Condition, FilterCondition, Span, Token};
 use fxhash::{FxHasher32, FxHasher64};
 pub use grenad::CompressionType;
 pub use search::new::{
-    execute_search, filtered_universe, DefaultSearchLogger, GeoSortStrategy, SearchContext,
-    SearchLogger, VisualSearchLogger,
+    execute_search, filtered_universe, DefaultSearchLogger, SearchContext, SearchLogger,
+    VisualSearchLogger,
 };
 use serde_json::Value;
 pub use thread_pool_no_abort::{PanicCatched, ThreadPoolNoAbort, ThreadPoolNoAbortBuilder};
diff --git a/crates/milli/src/search/facet/filter.rs b/crates/milli/src/search/facet/filter.rs
index 82cf070c5..d359a98d2 100644
--- a/crates/milli/src/search/facet/filter.rs
+++ b/crates/milli/src/search/facet/filter.rs
@@ -1,3 +1,4 @@
+use std::borrow::Cow;
 use std::collections::BTreeSet;
 use std::fmt::{Debug, Display};
 use std::ops::Bound::{self, Excluded, Included, Unbounded};
@@ -11,13 +12,14 @@ use roaring::{MultiOps, RoaringBitmap};
 use serde_json::Value;

 use super::facet_range_search;
-use crate::constants::{RESERVED_GEOJSON_FIELD_NAME, RESERVED_GEO_FIELD_NAME};
+use crate::constants::{
+    RESERVED_GEOJSON_FIELD_NAME, RESERVED_GEO_FIELD_NAME, RESERVED_VECTORS_FIELD_NAME,
+};
 use crate::error::{Error, UserError};
 use crate::filterable_attributes_rules::{filtered_matching_patterns, matching_features};
-use crate::heed_codec::facet::{
-    FacetGroupKey, FacetGroupKeyCodec, FacetGroupValue, FacetGroupValueCodec,
-};
+use crate::heed_codec::facet::{FacetGroupKey, FacetGroupKeyCodec, FacetGroupValueCodec};
 use crate::index::db_name::FACET_ID_STRING_DOCIDS;
+use crate::search::facet::facet_range_search::find_docids_of_facet_within_bounds;
 use crate::{
     distance_between_two_points, lat_lng_to_xyz, FieldId, FieldsIdsMap,
     FilterableAttributesFeatures, FilterableAttributesRule, Index, InternalError, Result,
@@ -228,6 +230,10 @@ impl<'a> Filter<'a> {
     pub fn use_contains_operator(&self) -> Option<&Token> {
         self.condition.use_contains_operator()
     }
+
+    pub fn use_vector_filter(&self) -> Option<&Token> {
+        self.condition.use_vector_filter()
+    }
 }

 impl<'a> Filter<'a> {
@@ -235,10 +241,12 @@ impl<'a> Filter<'a> {
         // to avoid doing this for each recursive call we're going to do it ONCE ahead of time
         let fields_ids_map = index.fields_ids_map(rtxn)?;
         let filterable_attributes_rules = index.filterable_attributes_rules(rtxn)?;
+
         for fid in self.condition.fids(MAX_FILTER_DEPTH) {
             let attribute = fid.value();
             if matching_features(attribute, &filterable_attributes_rules)
                 .is_some_and(|(_, features)| features.is_filterable())
+                || attribute == RESERVED_VECTORS_FIELD_NAME
             {
                 continue;
             }
@@ -416,20 +424,56 @@ impl<'a> Filter<'a> {
                 return Ok(docids);
             }
             Condition::StartsWith { keyword: _, word } => {
+                // The idea here is that "STARTS WITH baba" is the same as "baba <= value < babb".
+                // We just increment the last byte of the prefix to find the upper bound.
+                // The upper bound may not be valid UTF-8, but LMDB doesn't care as it works over bytes.
+
                 let value = crate::normalize_facet(word.value());
-                let base = FacetGroupKey { field_id, level: 0, left_bound: value.as_str() };
-                let docids = strings_db
-                    .prefix_iter(rtxn, &base)?
-                    .map(|result| -> Result<RoaringBitmap> {
-                        match result {
-                            Ok((_facet_group_key, FacetGroupValue { bitmap, .. })) => Ok(bitmap),
-                            Err(_e) => Err(InternalError::from(SerializationError::Decoding {
-                                db_name: Some(FACET_ID_STRING_DOCIDS),
-                            })
-                            .into()),
-                        }
-                    })
-                    .union()?;
+                let mut value2 = value.as_bytes().to_owned();
+
+                let last = match value2.last_mut() {
+                    Some(last) => last,
+                    None => {
+                        // The prefix is empty, so all documents that have the field will match.
+                        return index
+                            .exists_faceted_documents_ids(rtxn, field_id)
+                            .map_err(|e| e.into());
+                    }
+                };
+
+                if *last == u8::MAX {
+                    // u8::MAX never appears in valid UTF-8, so a filter should not be able to
+                    // send it to Meilisearch; just in case, return no documents.
+                    tracing::warn!(
+                        "Found non utf-8 character in filter. That shouldn't be possible"
+                    );
+                    return Ok(RoaringBitmap::new());
+                }
+                *last += 1;
+
+                // This is very similar to `heed::Bytes` but its `EItem` is `&[u8]` instead of `[u8]`
+                struct BytesRef;
+                impl<'a> BytesEncode<'a> for BytesRef {
+                    type EItem = &'a [u8];
+
+                    fn bytes_encode(
+                        item: &'a Self::EItem,
+                    ) -> std::result::Result<Cow<'a, [u8]>, heed::BoxedError> {
+                        Ok(Cow::Borrowed(item))
+                    }
+                }
+
+                let mut docids = RoaringBitmap::new();
+                let bytes_db =
+                    index.facet_id_string_docids.remap_key_type::<FacetGroupKeyCodec<BytesRef>>();
+                find_docids_of_facet_within_bounds::<BytesRef>(
+                    rtxn,
+                    bytes_db,
+                    field_id,
+                    &Included(value.as_bytes()),
+                    &Excluded(value2.as_slice()),
+                    universe,
+                    &mut docids,
+                )?;

                 return Ok(docids);
             }
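The half-open range trick for `STARTS WITH` is easiest to see on a concrete prefix; the following sketch reproduces just the bound computation, without the heed codecs or the LMDB lookup:

    /// "baba" matches exactly the strings s with "baba" <= s < "babb".
    fn prefix_bounds(prefix: &str) -> (Vec<u8>, Vec<u8>) {
        let lower = prefix.as_bytes().to_vec();
        let mut upper = lower.clone();
        let last = upper.last_mut().expect("empty prefixes are handled separately");
        debug_assert!(*last != u8::MAX, "0xFF never occurs in valid UTF-8");
        *last += 1; // may no longer be valid UTF-8; LMDB compares raw bytes anyway
        (lower, upper)
    }

    // prefix_bounds("baba") == (b"baba".to_vec(), b"babb".to_vec())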
@@ -542,7 +586,8 @@ impl<'a> Filter<'a> {
                 .union()
         }
         FilterCondition::Condition { fid, op } => {
-            let Some(field_id) = field_ids_map.id(fid.value()) else {
+            let value = fid.value();
+            let Some(field_id) = field_ids_map.id(value) else {
                 return Ok(RoaringBitmap::new());
             };
             let Some((rule_index, features)) =
@@ -599,6 +644,9 @@ impl<'a> Filter<'a> {
                 Ok(RoaringBitmap::new())
             }
         }
+        FilterCondition::VectorExists { fid: _, embedder, filter } => {
+            super::filter_vector::evaluate(rtxn, index, universe, embedder.clone(), filter)
+        }
         FilterCondition::GeoLowerThan { point, radius } => {
             if index.is_geo_filtering_enabled(rtxn)? {
                 let base_point: [f64; 2] =
diff --git a/crates/milli/src/search/facet/filter_vector.rs b/crates/milli/src/search/facet/filter_vector.rs
new file mode 100644
index 000000000..1ef4b8e3d
--- /dev/null
+++ b/crates/milli/src/search/facet/filter_vector.rs
@@ -0,0 +1,157 @@
+use filter_parser::{Token, VectorFilter};
+use roaring::{MultiOps, RoaringBitmap};
+
+use crate::error::{DidYouMean, Error};
+use crate::vector::db::IndexEmbeddingConfig;
+use crate::vector::{ArroyStats, ArroyWrapper};
+use crate::Index;
+
+#[derive(Debug, thiserror::Error)]
+pub enum VectorFilterError<'a> {
+    #[error("The embedder `{}` does not exist. {}", embedder.value(), {
+        if available.is_empty() {
+            String::from("This index does not have any configured embedders.")
+        } else {
+            let mut available = available.clone();
+            available.sort_unstable();
+            let did_you_mean = DidYouMean::new(embedder.value(), &available);
+            format!("Available embedders are: {}.{did_you_mean}", available.iter().map(|e| format!("`{e}`")).collect::<Vec<_>>().join(", "))
+        }
+    })]
+    EmbedderDoesNotExist { embedder: &'a Token<'a>, available: Vec<String> },
+
+    #[error("The fragment `{}` does not exist on embedder `{}`. {}", fragment.value(), embedder.value(), {
+        if available.is_empty() {
+            String::from("This embedder does not have any configured fragments.")
+        } else {
+            let mut available = available.clone();
+            available.sort_unstable();
+            let did_you_mean = DidYouMean::new(fragment.value(), &available);
+            format!("Available fragments on this embedder are: {}.{did_you_mean}", available.iter().map(|f| format!("`{f}`")).collect::<Vec<_>>().join(", "))
+        }
+    })]
+    FragmentDoesNotExist {
+        embedder: &'a Token<'a>,
+        fragment: &'a Token<'a>,
+        available: Vec<String>,
+    },
+}
+
+use VectorFilterError::*;
+
+impl<'a> From<VectorFilterError<'a>> for Error {
+    fn from(err: VectorFilterError<'a>) -> Self {
+        match &err {
+            EmbedderDoesNotExist { embedder: token, .. }
+            | FragmentDoesNotExist { fragment: token, .. } => token.as_external_error(err).into(),
+        }
+    }
+}
+
+pub(super) fn evaluate(
+    rtxn: &heed::RoTxn<'_>,
+    index: &Index,
+    universe: Option<&RoaringBitmap>,
+    embedder: Option<Token<'_>>,
+    filter: &VectorFilter<'_>,
+) -> crate::Result<RoaringBitmap> {
+    let index_embedding_configs = index.embedding_configs();
+    let embedding_configs = index_embedding_configs.embedding_configs(rtxn)?;
+
+    let embedders = match embedder {
+        Some(embedder) => vec![embedder],
+        None => embedding_configs.iter().map(|config| Token::from(config.name.as_str())).collect(),
+    };
+
+    let mut docids = embedders
+        .iter()
+        .map(|e| evaluate_inner(rtxn, index, e, &embedding_configs, filter))
+        .union()?;
+
+    if let Some(universe) = universe {
+        docids &= universe;
+    }
+
+    Ok(docids)
+}
+
+fn evaluate_inner(
+    rtxn: &heed::RoTxn<'_>,
+    index: &Index,
+    embedder: &Token<'_>,
+    embedding_configs: &[IndexEmbeddingConfig],
+    filter: &VectorFilter<'_>,
+) -> crate::Result<RoaringBitmap> {
+    let embedder_name = embedder.value();
+    let available_embedders =
+        || embedding_configs.iter().map(|c| c.name.clone()).collect::<Vec<_>>();
+
+    let embedding_config = embedding_configs
+        .iter()
+        .find(|config| config.name == embedder_name)
+        .ok_or_else(|| EmbedderDoesNotExist { embedder, available: available_embedders() })?;
+
+    let embedder_info = index
+        .embedding_configs()
+        .embedder_info(rtxn, embedder_name)?
+        .ok_or_else(|| EmbedderDoesNotExist { embedder, available: available_embedders() })?;
+
+    let arroy_wrapper = ArroyWrapper::new(
+        index.vector_arroy,
+        embedder_info.embedder_id,
+        embedding_config.config.quantized(),
+    );
+
+    let docids = match filter {
+        VectorFilter::Fragment(fragment) => {
+            let fragment_name = fragment.value();
+            let fragment_config = embedding_config
+                .fragments
+                .as_slice()
+                .iter()
+                .find(|fragment| fragment.name == fragment_name)
+                .ok_or_else(|| FragmentDoesNotExist {
+                    embedder,
+                    fragment,
+                    available: embedding_config
+                        .fragments
+                        .as_slice()
+                        .iter()
+                        .map(|f| f.name.clone())
+                        .collect(),
+                })?;
+
+            let user_provided_docids = embedder_info.embedding_status.user_provided_docids();
+            arroy_wrapper.items_in_store(rtxn, fragment_config.id, |bitmap| {
+                bitmap.clone() - user_provided_docids
+            })?
+        }
+        VectorFilter::DocumentTemplate => {
+            if !embedding_config.fragments.as_slice().is_empty() {
+                return Ok(RoaringBitmap::new());
+            }
+
+            let user_provided_docids = embedder_info.embedding_status.user_provided_docids();
+            let mut stats = ArroyStats::default();
+            arroy_wrapper.aggregate_stats(rtxn, &mut stats)?;
+            stats.documents - user_provided_docids.clone()
+        }
+        VectorFilter::UserProvided => {
+            let user_provided_docids = embedder_info.embedding_status.user_provided_docids();
+            user_provided_docids.clone()
+        }
+        VectorFilter::Regenerate => {
+            let mut stats = ArroyStats::default();
+            arroy_wrapper.aggregate_stats(rtxn, &mut stats)?;
+            let skip_regenerate = embedder_info.embedding_status.skip_regenerate_docids();
+            stats.documents - skip_regenerate
+        }
+        VectorFilter::None => {
+            let mut stats = ArroyStats::default();
+            arroy_wrapper.aggregate_stats(rtxn, &mut stats)?;
+            stats.documents
+        }
+    };
+
+    Ok(docids)
+}
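Summing up the new vector filter: each `VectorFilter` variant resolves to a docid set per embedder, the per-embedder sets are unioned (over all configured embedders when none is named), and the result is clamped to the candidate universe. A sketch of that combination step, mirroring `evaluate` above:

    use roaring::RoaringBitmap;

    fn combine(per_embedder: Vec<RoaringBitmap>, universe: Option<&RoaringBitmap>) -> RoaringBitmap {
        // Union the matches of every embedder the filter applies to...
        let mut docids = per_embedder.into_iter().fold(RoaringBitmap::new(), |acc, b| acc | b);
        // ...then keep only documents that are still candidates.
        if let Some(universe) = universe {
            docids &= universe;
        }
        docids
    }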
diff --git a/crates/milli/src/search/facet/mod.rs b/crates/milli/src/search/facet/mod.rs
index a5e65c95d..fac85df59 100644
--- a/crates/milli/src/search/facet/mod.rs
+++ b/crates/milli/src/search/facet/mod.rs
@@ -17,6 +17,7 @@ mod facet_range_search;
 mod facet_sort_ascending;
 mod facet_sort_descending;
 mod filter;
+mod filter_vector;
 mod search;

 fn facet_extreme_value<'t>(
diff --git a/crates/milli/src/search/hybrid.rs b/crates/milli/src/search/hybrid.rs
index c906e1eb7..1535c73ba 100644
--- a/crates/milli/src/search/hybrid.rs
+++ b/crates/milli/src/search/hybrid.rs
@@ -7,7 +7,7 @@ use roaring::RoaringBitmap;
 use crate::score_details::{ScoreDetails, ScoreValue, ScoringStrategy};
 use crate::search::new::{distinct_fid, distinct_single_docid};
 use crate::search::SemanticSearch;
-use crate::vector::SearchQuery;
+use crate::vector::{Embedding, SearchQuery};
 use crate::{Index, MatchingWords, Result, Search, SearchResult};

 struct ScoreWithRatioResult {
@@ -16,6 +16,7 @@ struct ScoreWithRatioResult {
     document_scores: Vec<(u32, ScoreWithRatio)>,
     degraded: bool,
     used_negative_operator: bool,
+    query_vector: Option<Embedding>,
 }

 type ScoreWithRatio = (Vec<ScoreDetails>, f32);
@@ -85,6 +86,7 @@ impl ScoreWithRatioResult {
             document_scores,
             degraded: results.degraded,
             used_negative_operator: results.used_negative_operator,
+            query_vector: results.query_vector,
         }
     }
@@ -186,6 +188,7 @@ impl ScoreWithRatioResult {
                 degraded: vector_results.degraded | keyword_results.degraded,
                 used_negative_operator: vector_results.used_negative_operator
                     | keyword_results.used_negative_operator,
+                query_vector: vector_results.query_vector,
             },
             semantic_hit_count,
         ))
@@ -209,7 +212,9 @@ impl Search<'_> {
             terms_matching_strategy: self.terms_matching_strategy,
             scoring_strategy: ScoringStrategy::Detailed,
             words_limit: self.words_limit,
+            retrieve_vectors: self.retrieve_vectors,
             exhaustive_number_hits: self.exhaustive_number_hits,
+            max_total_hits: self.max_total_hits,
             rtxn: self.rtxn,
             index: self.index,
             semantic: self.semantic.clone(),
@@ -264,7 +269,7 @@ impl Search<'_> {
         };

         search.semantic = Some(SemanticSearch {
-            vector: Some(vector_query),
+            vector: Some(vector_query.clone()),
             embedder_name,
             embedder,
             quantized,
@@ -321,6 +326,7 @@ fn return_keyword_results(
         mut document_scores,
         degraded,
         used_negative_operator,
+        query_vector,
     }: SearchResult,
 ) -> (SearchResult, Option<u32>) {
     let (documents_ids, document_scores) = if offset >= documents_ids.len() ||
@@ -347,6 +353,7 @@ fn return_keyword_results(
             document_scores,
             degraded,
             used_negative_operator,
+            query_vector,
         },
         Some(0),
     )
diff --git a/crates/milli/src/search/mod.rs b/crates/milli/src/search/mod.rs
index 97d542524..2ae931ff5 100644
--- a/crates/milli/src/search/mod.rs
+++ b/crates/milli/src/search/mod.rs
@@ -9,6 +9,7 @@ use roaring::bitmap::RoaringBitmap;
 pub use self::facet::{FacetDistribution, Filter, OrderBy, DEFAULT_VALUES_PER_FACET};
 pub use self::new::matches::{FormatOptions, MatchBounds, MatcherBuilder, MatchingWords};
 use self::new::{execute_vector_search, PartialSearchResult, VectorStoreStats};
+use crate::documents::GeoSortParameter;
 use crate::filterable_attributes_rules::{filtered_matching_patterns, matching_features};
 use crate::index::MatchingStrategy;
 use crate::score_details::{ScoreDetails, ScoringStrategy};
@@ -47,11 +48,13 @@ pub struct Search<'a> {
     sort_criteria: Option<Vec<AscDesc>>,
     distinct: Option<String>,
     searchable_attributes: Option<&'a [String]>,
-    geo_param: new::GeoSortParameter,
+    geo_param: GeoSortParameter,
     terms_matching_strategy: TermsMatchingStrategy,
     scoring_strategy: ScoringStrategy,
     words_limit: usize,
+    retrieve_vectors: bool,
     exhaustive_number_hits: bool,
+    max_total_hits: Option<usize>,
     rtxn: &'a heed::RoTxn<'a>,
     index: &'a Index,
     semantic: Option<SemanticSearch>,
@@ -70,10 +73,12 @@ impl<'a> Search<'a> {
             sort_criteria: None,
             distinct: None,
             searchable_attributes: None,
-            geo_param: new::GeoSortParameter::default(),
+            geo_param: GeoSortParameter::default(),
             terms_matching_strategy: TermsMatchingStrategy::default(),
             scoring_strategy: Default::default(),
+            retrieve_vectors: false,
             exhaustive_number_hits: false,
+            max_total_hits: None,
             words_limit: 10,
             rtxn,
             index,
@@ -147,7 +152,7 @@ impl<'a> Search<'a> {
     }

     #[cfg(test)]
-    pub fn geo_sort_strategy(&mut self, strategy: new::GeoSortStrategy) -> &mut Search<'a> {
+    pub fn geo_sort_strategy(&mut self, strategy: crate::GeoSortStrategy) -> &mut Search<'a> {
         self.geo_param.strategy = strategy;
         self
     }
@@ -158,6 +163,11 @@ impl<'a> Search<'a> {
         self
     }

+    pub fn retrieve_vectors(&mut self, retrieve_vectors: bool) -> &mut Search<'a> {
+        self.retrieve_vectors = retrieve_vectors;
+        self
+    }
+
     /// Forces the search to exhaustively compute the number of candidates;
     /// this will increase the search time but allows finite pagination.
    pub fn exhaustive_number_hits(&mut self, exhaustive_number_hits: bool) -> &mut Search<'a> {
@@ -165,6 +175,11 @@
         self
     }

+    pub fn max_total_hits(&mut self, max_total_hits: Option<usize>) -> &mut Search<'a> {
+        self.max_total_hits = max_total_hits;
+        self
+    }
+
     pub fn time_budget(&mut self, time_budget: TimeBudget) -> &mut Search<'a> {
         self.time_budget = time_budget;
         self
@@ -225,6 +240,7 @@ impl<'a> Search<'a> {
         }

         let universe = filtered_universe(ctx.index, ctx.txn, &self.filter)?;
+        let mut query_vector = None;
         let PartialSearchResult {
             located_query_terms,
             candidates,
@@ -239,28 +255,36 @@ impl<'a> Search<'a> {
             embedder,
             quantized,
             media: _,
-        }) => execute_vector_search(
-            &mut ctx,
-            vector,
-            self.scoring_strategy,
-            universe,
-            &self.sort_criteria,
-            &self.distinct,
-            self.geo_param,
-            self.offset,
-            self.limit,
-            embedder_name,
-            embedder,
-            *quantized,
-            self.time_budget.clone(),
-            self.ranking_score_threshold,
-        )?,
+        }) => {
+            if self.retrieve_vectors {
+                query_vector = Some(vector.clone());
+            }
+            execute_vector_search(
+                &mut ctx,
+                vector,
+                self.scoring_strategy,
+                self.exhaustive_number_hits,
+                self.max_total_hits,
+                universe,
+                &self.sort_criteria,
+                &self.distinct,
+                self.geo_param,
+                self.offset,
+                self.limit,
+                embedder_name,
+                embedder,
+                *quantized,
+                self.time_budget.clone(),
+                self.ranking_score_threshold,
+            )?
+        }
         _ => execute_search(
             &mut ctx,
             self.query.as_deref(),
             self.terms_matching_strategy,
             self.scoring_strategy,
             self.exhaustive_number_hits,
+            self.max_total_hits,
             universe,
             &self.sort_criteria,
             &self.distinct,
@@ -295,6 +319,7 @@ impl<'a> Search<'a> {
             documents_ids,
             degraded,
             used_negative_operator,
+            query_vector,
         })
     }
 }
@@ -313,7 +338,9 @@ impl fmt::Debug for Search<'_> {
             terms_matching_strategy,
             scoring_strategy,
             words_limit,
+            retrieve_vectors,
             exhaustive_number_hits,
+            max_total_hits,
             rtxn: _,
             index: _,
             semantic,
@@ -332,7 +359,9 @@ impl fmt::Debug for Search<'_> {
             .field("searchable_attributes", searchable_attributes)
             .field("terms_matching_strategy", terms_matching_strategy)
             .field("scoring_strategy", scoring_strategy)
+            .field("retrieve_vectors", retrieve_vectors)
             .field("exhaustive_number_hits", exhaustive_number_hits)
+            .field("max_total_hits", max_total_hits)
             .field("words_limit", words_limit)
             .field(
                 "semantic.embedder_name",
@@ -353,6 +382,7 @@ pub struct SearchResult {
     pub document_scores: Vec<Vec<ScoreDetails>>,
     pub degraded: bool,
     pub used_negative_operator: bool,
+    pub query_vector: Option<Embedding>,
 }

 #[derive(Debug, Clone, Copy, PartialEq, Eq)]
diff --git a/crates/milli/src/search/new/bucket_sort.rs b/crates/milli/src/search/new/bucket_sort.rs
index 3c26cad5c..645d36e16 100644
--- a/crates/milli/src/search/new/bucket_sort.rs
+++ b/crates/milli/src/search/new/bucket_sort.rs
@@ -32,6 +32,8 @@ pub fn bucket_sort<'ctx, Q: RankingRuleQueryTrait>(
     logger: &mut dyn SearchLogger<Q>,
     time_budget: TimeBudget,
     ranking_score_threshold: Option<f64>,
+    exhaustive_number_hits: bool,
+    max_total_hits: Option<usize>,
 ) -> Result<BucketSortOutput> {
     logger.initial_query(query);
     logger.ranking_rules(&ranking_rules);
@@ -159,7 +161,13 @@ pub fn bucket_sort<'ctx, Q: RankingRuleQueryTrait>(
         };
     }

-    while valid_docids.len() < length {
+    let max_len_to_evaluate =
+        match (max_total_hits, exhaustive_number_hits && ranking_score_threshold.is_some()) {
+            (Some(max_total_hits), true) => max_total_hits,
+            _ => length,
+        };
+
+    while valid_docids.len() < max_len_to_evaluate {
         if time_budget.exceeded() {
             loop {
                 let bucket = std::mem::take(&mut ranking_rule_universes[cur_ranking_rule_index]);
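The new cap in `bucket_sort` only changes behaviour when the hit count must be exhaustive and a ranking score threshold could discard documents; otherwise the classic `offset + limit` length is kept. The decision, extracted into a standalone function for illustration:

    fn max_len_to_evaluate(
        max_total_hits: Option<usize>,
        exhaustive_number_hits: bool,
        ranking_score_threshold: Option<f64>,
        length: usize,
    ) -> usize {
        match (max_total_hits, exhaustive_number_hits && ranking_score_threshold.is_some()) {
            (Some(max_total_hits), true) => max_total_hits,
            _ => length,
        }
    }

    // max_len_to_evaluate(Some(1000), true, Some(0.5), 20) == 1000
    // max_len_to_evaluate(Some(1000), false, None, 20) == 20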
diff --git a/crates/milli/src/search/new/distinct.rs b/crates/milli/src/search/new/distinct.rs
index 36172302a..455b495f5 100644
--- a/crates/milli/src/search/new/distinct.rs
+++ b/crates/milli/src/search/new/distinct.rs
@@ -82,7 +82,7 @@ fn facet_value_docids(
 }

 /// Return an iterator over each number value in the given field of the given document.
-fn facet_number_values<'a>(
+pub(crate) fn facet_number_values<'a>(
     docid: u32,
     field_id: u16,
     index: &Index,
@@ -118,7 +118,7 @@ pub fn facet_string_values<'a>(
 }

 #[allow(clippy::drop_non_drop)]
-fn facet_values_prefix_key(distinct: u16, id: u32) -> [u8; FID_SIZE + DOCID_SIZE] {
+pub(crate) fn facet_values_prefix_key(distinct: u16, id: u32) -> [u8; FID_SIZE + DOCID_SIZE] {
     concat_arrays::concat_arrays!(distinct.to_be_bytes(), id.to_be_bytes())
 }

diff --git a/crates/milli/src/search/new/geo_sort.rs b/crates/milli/src/search/new/geo_sort.rs
index 3e7fe3458..6c7d7b03b 100644
--- a/crates/milli/src/search/new/geo_sort.rs
+++ b/crates/milli/src/search/new/geo_sort.rs
@@ -1,96 +1,18 @@
 use std::collections::VecDeque;

-use heed::types::{Bytes, Unit};
-use heed::{RoPrefix, RoTxn};
 use roaring::RoaringBitmap;
 use rstar::RTree;

-use super::facet_string_values;
 use super::ranking_rules::{RankingRule, RankingRuleOutput, RankingRuleQueryTrait};
-use crate::heed_codec::facet::{FieldDocIdFacetCodec, OrderedF64Codec};
+use crate::documents::geo_sort::{fill_cache, next_bucket};
+use crate::documents::{GeoSortParameter, GeoSortStrategy};
 use crate::score_details::{self, ScoreDetails};
-use crate::{
-    distance_between_two_points, lat_lng_to_xyz, GeoPoint, Index, Result, SearchContext,
-    SearchLogger,
-};
-
-const FID_SIZE: usize = 2;
-const DOCID_SIZE: usize = 4;
-
-#[allow(clippy::drop_non_drop)]
-fn facet_values_prefix_key(distinct: u16, id: u32) -> [u8; FID_SIZE + DOCID_SIZE] {
-    concat_arrays::concat_arrays!(distinct.to_be_bytes(), id.to_be_bytes())
-}
-
-/// Return an iterator over each number value in the given field of the given document.
-fn facet_number_values<'a>(
-    docid: u32,
-    field_id: u16,
-    index: &Index,
-    txn: &'a RoTxn<'a>,
-) -> Result<RoPrefix<'a, FieldDocIdFacetCodec<OrderedF64Codec>, Unit>> {
-    let key = facet_values_prefix_key(field_id, docid);
-
-    let iter = index
-        .field_id_docid_facet_f64s
-        .remap_key_type::<Bytes>()
-        .prefix_iter(txn, &key)?
-        .remap_key_type();
-
-    Ok(iter)
-}
-
-#[derive(Debug, Clone, Copy)]
-pub struct Parameter {
-    // Define the strategy used by the geo sort
-    pub strategy: Strategy,
-    // Limit the number of docs in a single bucket to avoid unexpectedly large overhead
-    pub max_bucket_size: u64,
-    // Considering the errors of GPS and geographical calculations, distances less than distance_error_margin will be treated as equal
-    pub distance_error_margin: f64,
-}
-
-impl Default for Parameter {
-    fn default() -> Self {
-        Self { strategy: Strategy::default(), max_bucket_size: 1000, distance_error_margin: 1.0 }
-    }
-}
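The removed `Parameter` lives on as `GeoSortParameter` in `crate::documents`. Assuming it kept the same three fields, the old defaults can be written out explicitly like this (sketch only):

    fn default_geo_sort_parameter() -> GeoSortParameter {
        GeoSortParameter {
            strategy: GeoSortStrategy::default(), // Dynamic(1000) in the removed impl
            max_bucket_size: 1000,                // cap on docs per bucket
            distance_error_margin: 1.0,           // distances closer than this compare equal
        }
    }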
-/// Define the strategy used by the geo sort.
-/// The parameter represents the cache size, and, in the case of the Dynamic strategy,
-/// the point where we move from using the iterative strategy to the rtree.
-#[derive(Debug, Clone, Copy)]
-pub enum Strategy {
-    AlwaysIterative(usize),
-    AlwaysRtree(usize),
-    Dynamic(usize),
-}
-
-impl Default for Strategy {
-    fn default() -> Self {
-        Strategy::Dynamic(1000)
-    }
-}
-
-impl Strategy {
-    pub fn use_rtree(&self, candidates: usize) -> bool {
-        match self {
-            Strategy::AlwaysIterative(_) => false,
-            Strategy::AlwaysRtree(_) => true,
-            Strategy::Dynamic(i) => candidates >= *i,
-        }
-    }
-
-    pub fn cache_size(&self) -> usize {
-        match self {
-            Strategy::AlwaysIterative(i) | Strategy::AlwaysRtree(i) | Strategy::Dynamic(i) => *i,
-        }
-    }
-}
+use crate::{GeoPoint, Result, SearchContext, SearchLogger};

 pub struct GeoSort<Q: RankingRuleQueryTrait> {
     query: Option<Q>,
-    strategy: Strategy,
+    strategy: GeoSortStrategy,
     ascending: bool,
     point: [f64; 2],
     field_ids: Option<[u16; 2]>,
@@ -107,12 +29,12 @@ impl<Q: RankingRuleQueryTrait> GeoSort<Q> {
     pub fn new(
-        parameter: Parameter,
+        parameter: GeoSortParameter,
         geo_faceted_docids: RoaringBitmap,
         point: [f64; 2],
         ascending: bool,
     ) -> Result<Self> {
-        let Parameter { strategy, max_bucket_size, distance_error_margin } = parameter;
+        let GeoSortParameter { strategy, max_bucket_size, distance_error_margin } = parameter;
         Ok(Self {
             query: None,
             strategy,
@@ -134,98 +56,22 @@ impl<Q: RankingRuleQueryTrait> GeoSort<Q> {
         ctx: &mut SearchContext<'_>,
         geo_candidates: &RoaringBitmap,
     ) -> Result<()> {
-        debug_assert!(self.field_ids.is_some(), "fill_buffer can't be called without the lat&lng");
-        debug_assert!(self.cached_sorted_docids.is_empty());
-
-        // lazily initialize the rtree if needed by the strategy, and cache it in `self.rtree`
-        let rtree = if self.strategy.use_rtree(geo_candidates.len() as usize) {
-            if let Some(rtree) = self.rtree.as_ref() {
-                // get the rtree from the cache
-                Some(rtree)
-            } else {
-                let rtree = ctx.index.geo_rtree(ctx.txn)?.expect("geo candidates but no rtree");
-                // insert the rtree in the cache and return it.
-                // Can't use `get_or_insert_with` because getting the rtree from the DB is a fallible operation.
-                Some(&*self.rtree.insert(rtree))
-            }
-        } else {
-            None
-        };
-
-        let cache_size = self.strategy.cache_size();
-        if let Some(rtree) = rtree {
-            if self.ascending {
-                let point = lat_lng_to_xyz(&self.point);
-                for point in rtree.nearest_neighbor_iter(&point) {
-                    if geo_candidates.contains(point.data.0) {
-                        self.cached_sorted_docids.push_back(point.data);
-                        if self.cached_sorted_docids.len() >= cache_size {
-                            break;
-                        }
-                    }
-                }
-            } else {
-                // in the case of the desc geo sort we look for the closest point to the opposite of the queried point,
-                // and we insert the points in reverse order; they get reversed again when emptying the cache later on
-                let point = lat_lng_to_xyz(&opposite_of(self.point));
-                for point in rtree.nearest_neighbor_iter(&point) {
-                    if geo_candidates.contains(point.data.0) {
-                        self.cached_sorted_docids.push_front(point.data);
-                        if self.cached_sorted_docids.len() >= cache_size {
-                            break;
-                        }
-                    }
-                }
-            }
-        } else {
-            // the iterative version
-            let [lat, lng] = self.field_ids.unwrap();
-
-            let mut documents = geo_candidates
-                .iter()
-                .map(|id| -> Result<_> { Ok((id, geo_value(id, lat, lng, ctx.index, ctx.txn)?)) })
-                .collect::<Result<Vec<_>>>()?;
-            // computing the distance between two points is expensive thus we cache the result
-            documents
-                .sort_by_cached_key(|(_, p)| distance_between_two_points(&self.point, p) as usize);
-            self.cached_sorted_docids.extend(documents);
-        };
+        fill_cache(
+            ctx.index,
+            ctx.txn,
+            self.strategy,
+            self.ascending,
+            self.point,
+            &self.field_ids,
+            &mut self.rtree,
+            geo_candidates,
+            &mut self.cached_sorted_docids,
+        )?;

         Ok(())
     }
 }
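`fill_cache` inherits the strategy dispatch that used to live here: a small candidate set is cheap to sort outright, while a large one is better served by walking the R-tree until the cache is full. Reconstructed from the deleted `Strategy` impl, assuming the renamed `GeoSortStrategy` keeps the same variants:

    fn use_rtree(strategy: GeoSortStrategy, candidates: usize) -> bool {
        match strategy {
            GeoSortStrategy::AlwaysIterative(_) => false,
            GeoSortStrategy::AlwaysRtree(_) => true,
            // Dynamic switches to the rtree once enough candidates are at stake.
            GeoSortStrategy::Dynamic(threshold) => candidates >= threshold,
        }
    }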
-/// Extracts the lat and long values from a single document.
-///
-/// If it is not able to find it in the facet number index it will extract it
-/// from the facet string index and parse it as f64 (as the geo extraction behaves).
-fn geo_value(
-    docid: u32,
-    field_lat: u16,
-    field_lng: u16,
-    index: &Index,
-    rtxn: &RoTxn<'_>,
-) -> Result<[f64; 2]> {
-    let extract_geo = |geo_field: u16| -> Result<f64> {
-        match facet_number_values(docid, geo_field, index, rtxn)?.next() {
-            Some(Ok(((_, _, geo), ()))) => Ok(geo),
-            Some(Err(e)) => Err(e.into()),
-            None => match facet_string_values(docid, geo_field, index, rtxn)?.next() {
-                Some(Ok((_, geo))) => {
-                    Ok(geo.parse::<f64>().expect("cannot parse geo field as f64"))
-                }
-                Some(Err(e)) => Err(e.into()),
-                None => panic!("A geo faceted document doesn't contain any lat or lng"),
-            },
-        }
-    };
-
-    let lat = extract_geo(field_lat)?;
-    let lng = extract_geo(field_lng)?;
-
-    Ok([lat, lng])
-}
-
 impl<'ctx, Q: RankingRuleQueryTrait> RankingRule<'ctx, Q> for GeoSort<Q> {
     fn id(&self) -> String {
         "geo_sort".to_owned()
     }
@@ -267,124 +113,33 @@ impl<'ctx, Q: RankingRuleQueryTrait> RankingRule<'ctx, Q> for GeoSort<Q> {
     ) -> Result<Option<RankingRuleOutput<Q>>> {
         let query = self.query.as_ref().unwrap().clone();

-        let mut geo_candidates = &self.geo_candidates & universe;
-
-        if geo_candidates.is_empty() {
-            return Ok(Some(RankingRuleOutput {
+        next_bucket(
+            ctx.index,
+            ctx.txn,
+            universe,
+            self.ascending,
+            self.point,
+            &self.field_ids,
+            &mut self.rtree,
+            &mut self.cached_sorted_docids,
+            &self.geo_candidates,
+            GeoSortParameter {
+                strategy: self.strategy,
+                max_bucket_size: self.max_bucket_size,
+                distance_error_margin: self.distance_error_margin,
+            },
+        )
+        .map(|o| {
+            o.map(|(candidates, point)| RankingRuleOutput {
                 query,
-                candidates: universe.clone(),
+                candidates,
                 score: ScoreDetails::GeoSort(score_details::GeoSort {
                     target_point: self.point,
                     ascending: self.ascending,
-                    value: None,
+                    value: point,
                 }),
-            }));
-        }
-
-        let ascending = self.ascending;
-        let next = |cache: &mut VecDeque<_>| {
-            if ascending {
-                cache.pop_front()
-            } else {
-                cache.pop_back()
-            }
-        };
-        let put_back = |cache: &mut VecDeque<_>, x: _| {
-            if ascending {
-                cache.push_front(x)
-            } else {
-                cache.push_back(x)
-            }
-        };
-
-        let mut current_bucket = RoaringBitmap::new();
-        // current_distance stores the first point and distance in current bucket
-        let mut current_distance: Option<([f64; 2], f64)> = None;
-        loop {
-            // The loop will only exit when we have found all points with equal distance or have exhausted the candidates.
-            if let Some((id, point)) = next(&mut self.cached_sorted_docids) {
-                if geo_candidates.contains(id) {
-                    let distance = distance_between_two_points(&self.point, &point);
-                    if let Some((point0, bucket_distance)) = current_distance.as_ref() {
-                        if (bucket_distance - distance).abs() > self.distance_error_margin {
-                            // different distance, point belongs to next bucket
-                            put_back(&mut self.cached_sorted_docids, (id, point));
-                            return Ok(Some(RankingRuleOutput {
-                                query,
-                                candidates: current_bucket,
-                                score: ScoreDetails::GeoSort(score_details::GeoSort {
-                                    target_point: self.point,
-                                    ascending: self.ascending,
-                                    value: Some(point0.to_owned()),
-                                }),
-                            }));
-                        } else {
-                            // same distance, point belongs to current bucket
-                            current_bucket.insert(id);
-                            // remove from candidates to prevent it from being added to the cache again
-                            geo_candidates.remove(id);
-                            // current bucket size reaches limit, force return
-                            if current_bucket.len() == self.max_bucket_size {
-                                return Ok(Some(RankingRuleOutput {
-                                    query,
-                                    candidates: current_bucket,
-                                    score: ScoreDetails::GeoSort(score_details::GeoSort {
-                                        target_point: self.point,
-                                        ascending: self.ascending,
-                                        value: Some(point0.to_owned()),
-                                    }),
-                                }));
-                            }
-                        }
-                    } else {
-                        // first doc in current bucket
-                        current_distance = Some((point, distance));
-                        current_bucket.insert(id);
-                        geo_candidates.remove(id);
-                        // current bucket size reaches limit, force return
-                        if current_bucket.len() == self.max_bucket_size {
-                            return Ok(Some(RankingRuleOutput {
-                                query,
-                                candidates: current_bucket,
-                                score: ScoreDetails::GeoSort(score_details::GeoSort {
-                                    target_point: self.point,
-                                    ascending: self.ascending,
-                                    value: Some(point.to_owned()),
-                                }),
-                            }));
-                        }
-                    }
-                }
-            } else {
-                // cache exhausted, we need to refill it
-                self.fill_buffer(ctx, &geo_candidates)?;
-
-                if self.cached_sorted_docids.is_empty() {
-                    // candidates exhausted, exit
-                    if let Some((point0, _)) = current_distance.as_ref() {
-                        return Ok(Some(RankingRuleOutput {
-                            query,
-                            candidates: current_bucket,
-                            score: ScoreDetails::GeoSort(score_details::GeoSort {
-                                target_point: self.point,
-                                ascending: self.ascending,
-                                value: Some(point0.to_owned()),
-                            }),
-                        }));
-                    } else {
-                        return Ok(Some(RankingRuleOutput {
-                            query,
-                            candidates: universe.clone(),
-                            score: ScoreDetails::GeoSort(score_details::GeoSort {
-                                target_point: self.point,
-                                ascending: self.ascending,
-                                value: None,
-                            }),
-                        }));
-                    }
-                }
-            }
-        }
+            })
+        })
     }

     #[tracing::instrument(level = "trace", skip_all, target = "search::geo_sort")]
@@ -394,16 +149,3 @@ impl<'ctx, Q: RankingRuleQueryTrait> RankingRule<'ctx, Q> for GeoSort<Q> {
         self.cached_sorted_docids.clear();
     }
 }
-
-/// Compute the antipodal coordinate of `coord`
-fn opposite_of(mut coord: [f64; 2]) -> [f64; 2] {
-    coord[0] *= -1.;
-    // in the case of x,0 we want to return x,180
-    if coord[1] > 0. {
-        coord[1] -= 180.;
-    } else {
-        coord[1] += 180.;
-    }
-
-    coord
-}
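Descending geo sort reuses the ascending machinery by querying around the antipode of the target point; the helper removed above computes it. A self-contained check of the wrap-around math (same arithmetic, values chosen to compare exactly in f64):

    fn opposite_of(mut coord: [f64; 2]) -> [f64; 2] {
        coord[0] *= -1.;
        if coord[1] > 0. {
            coord[1] -= 180.;
        } else {
            coord[1] += 180.;
        }
        coord
    }

    #[test]
    fn antipode_wraps_longitude() {
        // A point on the antimeridian wraps back to longitude 0...
        assert_eq!(opposite_of([10.0, 180.0]), [-10.0, 0.0]);
        // ...and a point at longitude -90 wraps to +90.
        assert_eq!(opposite_of([0.0, -90.0]), [0.0, 90.0]);
    }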
diff --git a/crates/milli/src/search/new/matches/mod.rs b/crates/milli/src/search/new/matches/mod.rs
index 2d6f2cf17..66f65f5e5 100644
--- a/crates/milli/src/search/new/matches/mod.rs
+++ b/crates/milli/src/search/new/matches/mod.rs
@@ -510,6 +510,7 @@ mod tests {
             crate::TermsMatchingStrategy::default(),
             crate::score_details::ScoringStrategy::Skip,
             false,
+            None,
             universe,
             &None,
             &None,
diff --git a/crates/milli/src/search/new/mod.rs b/crates/milli/src/search/new/mod.rs
index a65b4076b..e22883839 100644
--- a/crates/milli/src/search/new/mod.rs
+++ b/crates/milli/src/search/new/mod.rs
@@ -1,7 +1,7 @@
 mod bucket_sort;
 mod db_cache;
 mod distinct;
-mod geo_sort;
+pub(crate) mod geo_sort;
 mod graph_based_ranking_rule;
 mod interner;
 mod limits;
@@ -46,14 +46,14 @@ use resolve_query_graph::{compute_query_graph_docids, PhraseDocIdsCache};
 use roaring::RoaringBitmap;
 use sort::Sort;

-use self::distinct::facet_string_values;
+pub(crate) use self::distinct::{facet_string_values, facet_values_prefix_key};
 use self::geo_sort::GeoSort;
-pub use self::geo_sort::{Parameter as GeoSortParameter, Strategy as GeoSortStrategy};
 use self::graph_based_ranking_rule::Words;
 use self::interner::Interned;
 use self::vector_sort::VectorSort;
 use crate::attribute_patterns::{match_pattern, PatternMatch};
 use crate::constants::RESERVED_GEO_FIELD_NAME;
+use crate::documents::GeoSortParameter;
 use crate::index::PrefixSearch;
 use crate::localized_attributes_rules::LocalizedFieldIds;
 use crate::score_details::{ScoreDetails, ScoringStrategy};
@@ -319,7 +319,7 @@ fn resolve_negative_phrases(
 fn get_ranking_rules_for_placeholder_search<'ctx>(
     ctx: &SearchContext<'ctx>,
     sort_criteria: &Option<Vec<AscDesc>>,
-    geo_param: geo_sort::Parameter,
+    geo_param: GeoSortParameter,
 ) -> Result<Vec<BoxRankingRule<'ctx, PlaceholderQuery>>> {
     let mut sort = false;
     let mut sorted_fields = HashSet::new();
@@ -371,7 +371,7 @@ fn get_ranking_rules_for_vector<'ctx>(
     ctx: &SearchContext<'ctx>,
     sort_criteria: &Option<Vec<AscDesc>>,
-    geo_param: geo_sort::Parameter,
+    geo_param: GeoSortParameter,
     limit_plus_offset: usize,
     target: &[f32],
     embedder_name: &str,
@@ -448,7 +448,7 @@ fn get_ranking_rules_for_query_graph_search<'ctx>(
     ctx: &SearchContext<'ctx>,
     sort_criteria: &Option<Vec<AscDesc>>,
-    geo_param: geo_sort::Parameter,
+    geo_param: GeoSortParameter,
     terms_matching_strategy: TermsMatchingStrategy,
 ) -> Result<Vec<BoxRankingRule<'ctx, QueryGraph>>> {
     // query graph search
@@ -559,7 +559,7 @@ fn resolve_sort_criteria<'ctx, Query: RankingRuleQueryTrait>(
     ranking_rules: &mut Vec<BoxRankingRule<'ctx, Query>>,
     sorted_fields: &mut HashSet<String>,
     geo_sorted: &mut bool,
-    geo_param: geo_sort::Parameter,
+    geo_param: GeoSortParameter,
 ) -> Result<()> {
     let sort_criteria = sort_criteria.clone().unwrap_or_default();
     ranking_rules.reserve(sort_criteria.len());
@@ -626,10 +626,12 @@ pub fn execute_vector_search(
     ctx: &mut SearchContext<'_>,
     vector: &[f32],
     scoring_strategy: ScoringStrategy,
+    exhaustive_number_hits: bool,
+    max_total_hits: Option<usize>,
     universe: RoaringBitmap,
     sort_criteria: &Option<Vec<AscDesc>>,
     distinct: &Option<String>,
-    geo_param: geo_sort::Parameter,
+    geo_param: GeoSortParameter,
     from: usize,
     length: usize,
     embedder_name: &str,
@@ -669,6 +671,8 @@ pub fn execute_vector_search(
         placeholder_search_logger,
         time_budget,
         ranking_score_threshold,
+        exhaustive_number_hits,
+        max_total_hits,
     )?;

     Ok(PartialSearchResult {
@@ -689,10 +693,11 @@ pub fn execute_search(
     terms_matching_strategy: TermsMatchingStrategy,
     scoring_strategy: ScoringStrategy,
     exhaustive_number_hits: bool,
+    max_total_hits: Option<usize>,
     mut universe: RoaringBitmap,
     sort_criteria: &Option<Vec<AscDesc>>,
     distinct: &Option<String>,
-    geo_param: geo_sort::Parameter,
+    geo_param: GeoSortParameter,
     from: usize,
     length: usize,
     words_limit: Option<usize>,
@@ -825,6 +830,8 @@ pub fn execute_search(
             query_graph_logger,
             time_budget,
             ranking_score_threshold,
+            exhaustive_number_hits,
+            max_total_hits,
         )?
     } else {
         let ranking_rules =
@@ -841,6 +848,8 @@ pub fn execute_search(
             placeholder_search_logger,
             time_budget,
             ranking_score_threshold,
+            exhaustive_number_hits,
+            max_total_hits,
         )?
     };
@@ -872,7 +881,7 @@ pub fn execute_search(
     })
 }

-fn check_sort_criteria(
+pub(crate) fn check_sort_criteria(
     ctx: &SearchContext<'_>,
     sort_criteria: Option<&Vec<AscDesc>>,
 ) -> Result<()> {
@@ -902,7 +911,7 @@ pub(crate) fn check_sort_criteria(
             let (valid_fields, hidden_fields) =
                 ctx.index.remove_hidden_fields(ctx.txn, sortable_fields)?;

-            return Err(UserError::InvalidSortableAttribute {
+            return Err(UserError::InvalidSearchSortableAttribute {
                 field: field.to_string(),
                 valid_fields,
                 hidden_fields,
@@ -913,7 +922,7 @@ pub(crate) fn check_sort_criteria(
             let (valid_fields, hidden_fields) =
                 ctx.index.remove_hidden_fields(ctx.txn, sortable_fields)?;

-            return Err(UserError::InvalidSortableAttribute {
+            return Err(UserError::InvalidSearchSortableAttribute {
                 field: RESERVED_GEO_FIELD_NAME.to_string(),
                 valid_fields,
                 hidden_fields,
diff --git a/crates/milli/src/search/new/tests/integration.rs b/crates/milli/src/search/new/tests/integration.rs
index 38f39e18b..6b8c25ab8 100644
--- a/crates/milli/src/search/new/tests/integration.rs
+++ b/crates/milli/src/search/new/tests/integration.rs
@@ -17,7 +17,7 @@ pub fn setup_search_index_with_criteria(criteria: &[Criterion]) -> Index {
     let path = tempfile::tempdir().unwrap();
     let options = EnvOpenOptions::new();
     let mut options = options.read_txn_without_tls();
-    options.map_size(10 * 1024 * 1024); // 10 MB
+    options.map_size(10 * 1024 * 1024); // 10 MiB
     let index = Index::new(options, &path, true).unwrap();

     let mut wtxn = index.write_txn().unwrap();
diff --git a/crates/milli/src/search/similar.rs b/crates/milli/src/search/similar.rs
index 903b5fcf9..2235f6436 100644
--- a/crates/milli/src/search/similar.rs
+++ b/crates/milli/src/search/similar.rs
@@ -130,6 +130,7 @@ impl<'a> Similar<'a> {
             document_scores,
             degraded: false,
             used_negative_operator: false,
+            query_vector: None,
         })
     }
 }
diff --git a/crates/milli/src/test_index.rs b/crates/milli/src/test_index.rs
index 6bb6b1345..6e34961e7 100644
--- a/crates/milli/src/test_index.rs
+++ b/crates/milli/src/test_index.rs
@@ -1097,6 +1097,7 @@ fn bug_3021_fourth() {
         mut documents_ids,
         degraded: _,
         used_negative_operator: _,
+        query_vector: _,
     } = search.execute().unwrap();
     let primary_key_id = index.fields_ids_map(&rtxn).unwrap().id("primary_key").unwrap();
     documents_ids.sort_unstable();
@@ -1338,10 +1339,9 @@ fn vectors_are_never_indexed_as_searchable_or_filterable() {
     assert!(results.candidates.is_empty());

     let mut search = index.search(&rtxn);
-    let results = search
-        .filter(Filter::from_str("_vectors.doggo = 6789").unwrap().unwrap())
-        .execute()
-        .unwrap();
+    let results =
+        dbg!(search.filter(Filter::from_str("_vectors.doggo = 6789").unwrap().unwrap()).execute())
+            .unwrap();
     assert!(results.candidates.is_empty());

     index
diff --git a/crates/milli/src/update/chat.rs b/crates/milli/src/update/chat.rs
index 2f364894d..a6c0b3fbc 100644
--- a/crates/milli/src/update/chat.rs
+++ b/crates/milli/src/update/chat.rs
@@ -93,7 +93,7 @@ pub struct ChatSearchParams {
     pub hybrid: Setting<HybridQuery>,
     #[serde(default, skip_serializing_if = "Setting::is_not_set")]
-    #[deserr(default = Setting::Set(20))]
+    #[deserr(default)]
     #[schema(value_type = Option<usize>)]
     pub limit: Setting<usize>,
diff --git a/crates/milli/src/update/clear_documents.rs b/crates/milli/src/update/clear_documents.rs
index af733d25f..a6cbc6207 100644
--- a/crates/milli/src/update/clear_documents.rs
+++ b/crates/milli/src/update/clear_documents.rs
@@ -2,7 +2,7 @@ use heed::RwTxn;
 use roaring::RoaringBitmap;
 use time::OffsetDateTime;

-use crate::{FieldDistribution, Index, Result};
+use crate::{database_stats::DatabaseStats, FieldDistribution, Index, Result};

 pub struct ClearDocuments<'t, 'i> {
     wtxn: &'t mut RwTxn<'i>,
@@ -94,6 +94,10 @@ impl<'t, 'i> ClearDocuments<'t, 'i> {
         documents.clear(self.wtxn)?;

+        // Update the stats of the documents database after clearing all documents.
+        let stats = DatabaseStats::new(self.index.documents.remap_data_type(), self.wtxn)?;
+        self.index.put_documents_stats(self.wtxn, stats)?;
+
         Ok(number_of_documents)
     }
 }
@@ -124,6 +128,9 @@ mod tests {

     let rtxn = index.read_txn().unwrap();

+    // Variables for statistics verification
+    let stats = index.documents_stats(&rtxn).unwrap().unwrap();
+
     // the value is 7 because there is `[id, name, age, country, _geo, _geo.lng, _geo.lat]`
     assert_eq!(index.fields_ids_map(&rtxn).unwrap().len(), 7);
@@ -144,5 +151,9 @@ mod tests {
     assert!(index.field_id_docid_facet_f64s.is_empty(&rtxn).unwrap());
     assert!(index.field_id_docid_facet_strings.is_empty(&rtxn).unwrap());
     assert!(index.documents.is_empty(&rtxn).unwrap());
+
+    // Verify that the statistics are correctly updated after clearing documents
+    assert_eq!(index.number_of_documents(&rtxn).unwrap(), 0);
+    assert_eq!(stats.number_of_entries(), 0);
 }
 }
diff --git a/crates/milli/src/update/facet/mod.rs b/crates/milli/src/update/facet/mod.rs
index c40916670..71596530e 100644
--- a/crates/milli/src/update/facet/mod.rs
+++ b/crates/milli/src/update/facet/mod.rs
@@ -119,6 +119,7 @@ pub struct FacetsUpdate<'i> {
     min_level_size: u8,
     data_size: u64,
 }
+
 impl<'i> FacetsUpdate<'i> {
     pub fn new(
diff --git a/crates/milli/src/update/index_documents/extract/extract_vector_points.rs b/crates/milli/src/update/index_documents/extract/extract_vector_points.rs
index 064cfd154..a1dfa1aad 100644
--- a/crates/milli/src/update/index_documents/extract/extract_vector_points.rs
+++ b/crates/milli/src/update/index_documents/extract/extract_vector_points.rs
@@ -23,7 +23,7 @@ use crate::progress::EmbedderStats;
 use crate::prompt::Prompt;
 use crate::update::del_add::{DelAdd, KvReaderDelAdd, KvWriterDelAdd};
 use crate::update::settings::InnerIndexSettingsDiff;
-use crate::vector::db::{EmbedderInfo, EmbeddingStatus, EmbeddingStatusDelta};
+use crate::vector::db::{EmbedderInfo, EmbeddingStatusDelta};
 use crate::vector::error::{EmbedErrorKind, PossibleEmbeddingMistakes, UnusedVectorsDistribution};
 use crate::vector::extractor::{Extractor, ExtractorDiff, RequestFragmentExtractor};
 use crate::vector::parsed_vectors::{ParsedVectorsDiff, VectorState};
@@ -441,6 +441,8 @@ pub fn extract_vector_points(
     {
         let embedder_is_manual = matches!(*runtime.embedder, Embedder::UserProvided(_));

+        let (old_is_user_provided, old_must_regenerate) =
+            embedder_info.embedding_status.is_user_provided_must_regenerate(docid);
         let (old, new) = parsed_vectors.remove(embedder_name);
         let new_must_regenerate = new.must_regenerate();
         let delta = match action {
@@ -499,16 +501,19 @@ pub fn extract_vector_points(
                     let is_adding_fragments = has_fragments && !old_has_fragments;

-                    if is_adding_fragments {
+                    if !has_fragments {
+                        // removing fragments
+                        regenerate_prompt(obkv, &runtime.document_template, new_fields_ids_map)?
+                    } else if is_adding_fragments ||
+                        // regenerate all fragments when going from user-provided to not user-provided
+                        old_is_user_provided
+                    {
                         regenerate_all_fragments(
                             runtime.fragments(),
                             &doc_alloc,
                             new_fields_ids_map,
                             obkv,
                         )
-                    } else if !has_fragments {
-                        // removing fragments
-                        regenerate_prompt(obkv, &runtime.document_template, new_fields_ids_map)?
                     } else {
                         let mut fragment_diff = Vec::new();
                         let new_fields_ids_map = new_fields_ids_map.as_fields_ids_map();
@@ -600,7 +605,8 @@ pub fn extract_vector_points(
             docid,
             &delta,
             new_must_regenerate,
-            &embedder_info.embedding_status,
+            old_is_user_provided,
+            old_must_regenerate,
         );

         // and we finally push the unique vectors into the writer
@@ -657,10 +663,9 @@ fn push_embedding_status_delta(
     docid: DocumentId,
     delta: &VectorStateDelta,
     new_must_regenerate: bool,
-    embedding_status: &EmbeddingStatus,
+    old_is_user_provided: bool,
+    old_must_regenerate: bool,
 ) {
-    let (old_is_user_provided, old_must_regenerate) =
-        embedding_status.is_user_provided_must_regenerate(docid);
     let new_is_user_provided = match delta {
         VectorStateDelta::NoChange => old_is_user_provided,
         VectorStateDelta::NowRemoved => {
diff --git a/crates/milli/src/update/indexer_config.rs b/crates/milli/src/update/indexer_config.rs
index a0f901818..845da5a51 100644
--- a/crates/milli/src/update/indexer_config.rs
+++ b/crates/milli/src/update/indexer_config.rs
@@ -16,6 +16,7 @@ pub struct IndexerConfig {
     pub max_positions_per_attributes: Option<u32>,
     pub skip_index_budget: bool,
     pub experimental_no_edition_2024_for_settings: bool,
+    pub experimental_no_edition_2024_for_dumps: bool,
 }

 impl IndexerConfig {
@@ -65,6 +66,7 @@ impl Default for IndexerConfig {
             max_positions_per_attributes: None,
             skip_index_budget: false,
             experimental_no_edition_2024_for_settings: false,
+            experimental_no_edition_2024_for_dumps: false,
         }
     }
 }
diff --git a/crates/milli/src/update/new/extract/vectors/mod.rs b/crates/milli/src/update/new/extract/vectors/mod.rs
index 25a26f9fb..6efc06917 100644
--- a/crates/milli/src/update/new/extract/vectors/mod.rs
+++ b/crates/milli/src/update/new/extract/vectors/mod.rs
@@ -620,12 +620,35 @@ impl<'a, 'b, 'extractor> Chunks<'a, 'b, 'extractor> {
     where
         'a: 'doc,
     {
-        match &mut self.kind {
-            ChunkType::Fragments { fragments: _, session } => {
-                let doc_alloc = session.doc_alloc();
+        self.set_status(docid, old_is_user_provided, true, false, true);

-                if old_is_user_provided | full_reindex {
+        match &mut self.kind {
+            ChunkType::Fragments { fragments, session } => {
+                let doc_alloc = session.doc_alloc();
+                let reindex_all_fragments =
+                    // when the vectors were user-provided, Meilisearch cannot know if they come from a particular fragment,
+                    // and so Meilisearch needs to clear all embeddings in that case.
+                    // Fortunately, as dumps export fragment vectors with `regenerate` set to `false`,
+                    // this case should be rare and opt-in.
+                    old_is_user_provided ||
+                    // full-reindex case
+                    full_reindex;
+
+                if reindex_all_fragments {
                     session.on_embed_mut().clear_vectors(docid);
+                    let extractors = fragments.iter().map(|fragment| {
+                        RequestFragmentExtractor::new(fragment, doc_alloc).ignore_errors()
+                    });
+                    insert_autogenerated(
+                        docid,
+                        external_docid,
+                        extractors,
+                        document,
+                        &(),
+                        session,
+                        unused_vectors_distribution,
+                    )?;
+                    return Ok(());
                 }

                 settings_delta.try_for_each_fragment_diff(
@@ -669,7 +692,6 @@ impl<'a, 'b, 'extractor> Chunks<'a, 'b, 'extractor> {
                         Result::Ok(())
                     },
                 )?;
-                self.set_status(docid, old_is_user_provided, true, false, true);
             }
             ChunkType::DocumentTemplate { document_template, session } => {
                 let doc_alloc = session.doc_alloc();
@@ -690,12 +712,18 @@ impl<'a, 'b, 'extractor> Chunks<'a, 'b, 'extractor> {
                 match extractor.diff_settings(document, &external_docid, old_extractor.as_ref())? {
                     ExtractorDiff::Removed => {
+                        if old_is_user_provided || full_reindex {
+                            session.on_embed_mut().clear_vectors(docid);
+                        }
                         OnEmbed::process_embedding_response(
                             session.on_embed_mut(),
                             crate::vector::session::EmbeddingResponse { metadata, embedding: None },
                         );
                     }
                     ExtractorDiff::Added(input) | ExtractorDiff::Updated(input) => {
+                        if old_is_user_provided || full_reindex {
+                            session.on_embed_mut().clear_vectors(docid);
+                        }
                         session.request_embedding(metadata, input, unused_vectors_distribution)?;
                     }
                     ExtractorDiff::Unchanged => { /* do nothing */ }
@@ -722,6 +750,13 @@ impl<'a, 'b, 'extractor> Chunks<'a, 'b, 'extractor> {
     where
         'a: 'doc,
     {
+        self.set_status(
+            docid,
+            old_is_user_provided,
+            old_must_regenerate,
+            false,
+            new_must_regenerate,
+        );
         match &mut self.kind {
             ChunkType::DocumentTemplate { document_template, session } => {
                 let doc_alloc = session.doc_alloc();
@@ -731,10 +766,6 @@ impl<'a, 'b, 'extractor> Chunks<'a, 'b, 'extractor> {
                     new_fields_ids_map,
                 );

-                if old_is_user_provided {
-                    session.on_embed_mut().clear_vectors(docid);
-                }
-
                 update_autogenerated(
                     docid,
                     external_docid,
@@ -743,6 +774,7 @@ impl<'a, 'b, 'extractor> Chunks<'a, 'b, 'extractor> {
                     new_document,
                     &external_docid,
                     old_must_regenerate,
+                    old_is_user_provided,
                     session,
                     unused_vectors_distribution,
                 )?
@@ -754,7 +786,21 @@ impl<'a, 'b, 'extractor> Chunks<'a, 'b, 'extractor> {
                 });

                 if old_is_user_provided {
+                    // when the document was `userProvided`, Meilisearch cannot know which fragment a
+                    // particular vector came from.
+                    // As a result, Meilisearch regenerates all fragments in this case.
+                    // Fortunately, since dumps for fragments set `regenerate` to `false`, this case should be rare.
                     session.on_embed_mut().clear_vectors(docid);
+                    insert_autogenerated(
+                        docid,
+                        external_docid,
+                        extractors,
+                        new_document,
+                        &(),
+                        session,
+                        unused_vectors_distribution,
+                    )?;
+                    return Ok(());
                 }

                 update_autogenerated(
@@ -765,25 +811,18 @@ impl<'a, 'b, 'extractor> Chunks<'a, 'b, 'extractor> {
                     new_document,
                     &(),
                     old_must_regenerate,
+                    false,
                     session,
                     unused_vectors_distribution,
                 )?
             }
         };

-        self.set_status(
-            docid,
-            old_is_user_provided,
-            old_must_regenerate,
-            false,
-            new_must_regenerate,
-        );
-
         Ok(())
     }

     #[allow(clippy::too_many_arguments)]
-    pub fn insert_autogenerated<D: Document<'a> + Debug>(
+    pub fn insert_autogenerated<'doc, D: Document<'doc> + Debug>(
         &mut self,
         docid: DocumentId,
         external_docid: &'a str,
@@ -791,7 +830,10 @@ impl<'a, 'b, 'extractor> Chunks<'a, 'b, 'extractor> {
         new_fields_ids_map: &'a RefCell<GlobalFieldsIdsMap<'a>>,
         unused_vectors_distribution: &UnusedVectorsDistributionBump<'a>,
         new_must_regenerate: bool,
-    ) -> Result<()> {
+    ) -> Result<()>
+    where
+        'a: 'doc,
+    {
         let (default_is_user_provided, default_must_regenerate) = (false, true);
         self.set_status(
             docid,
@@ -956,6 +998,7 @@ fn update_autogenerated<'doc, 'a: 'doc, 'b, E, OD, ND>(
     new_document: ND,
     meta: &E::DocumentMetadata,
     old_must_regenerate: bool,
+    mut must_clear_on_generation: bool,
     session: &mut EmbedSession<'a, OnEmbeddingDocumentUpdates<'a, 'b>, E::Input>,
     unused_vectors_distribution: &UnusedVectorsDistributionBump<'a>,
 ) -> Result<()>
 where
@@ -984,6 +1027,11 @@ where
     };

     if must_regenerate {
+        if must_clear_on_generation {
+            must_clear_on_generation = false;
+            session.on_embed_mut().clear_vectors(docid);
+        }
+
         let metadata =
             Metadata { docid, external_docid, extractor_id: extractor.extractor_id() };
@@ -1002,7 +1050,7 @@ where
     Ok(())
 }

-fn insert_autogenerated<'a, 'b, E, D: Document<'a> + Debug>(
+fn insert_autogenerated<'doc, 'a: 'doc, 'b, E, D: Document<'doc> + Debug>(
     docid: DocumentId,
     external_docid: &'a str,
     extractors: impl IntoIterator<Item = E>,
diff --git a/crates/milli/src/update/settings.rs b/crates/milli/src/update/settings.rs
index a4d8b7203..db4f8070a 100644
--- a/crates/milli/src/update/settings.rs
+++ b/crates/milli/src/update/settings.rs
@@ -101,6 +101,10 @@ impl<T> Setting<T> {
         matches!(self, Self::NotSet)
     }

+    pub const fn is_reset(&self) -> bool {
+        matches!(self, Self::Reset)
+    }
+
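`is_reset` rounds out the `Setting` accessors and is what the settings hunk below uses to skip `Reset` entries for embedders that never existed. A behaviour sketch, assuming the usual three states:

    fn setting_states_demo() {
        // Setting<T> distinguishes Set(value), Reset (explicit null) and NotSet (absent).
        let set: Setting<u32> = Setting::Set(3);
        assert!(!set.is_reset() && !set.is_not_set());
        assert!(Setting::<u32>::Reset.is_reset());
        assert!(Setting::<u32>::NotSet.is_not_set());
    }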
     pub fn or_reset(self, val: T) -> Self {
         match self {
@@ -554,10 +558,10 @@ impl<'a, 't, 'i> Settings<'a, 't, 'i> {
         match self.searchable_fields {
             Setting::Set(ref fields) => {
                 // Check to see if the searchable fields changed before doing anything else
-                let old_fields = self.index.searchable_fields(self.wtxn)?;
+                let old_fields = self.index.user_defined_searchable_fields(self.wtxn)?;
                 let did_change = {
                     let new_fields = fields.iter().map(String::as_str).collect::<Vec<_>>();
-                    new_fields != old_fields
+                    old_fields.is_none_or(|old| new_fields != old)
                 };
                 if !did_change {
                     return Ok(false);
@@ -1213,6 +1217,10 @@ impl<'a, 't, 'i> Settings<'a, 't, 'i> {
                 // new config
                 EitherOrBoth::Right((name, mut setting)) => {
                     tracing::debug!(embedder = name, "new embedder");
+                    // if we are asked to reset an embedder that doesn't exist, just ignore it
+                    if setting.is_reset() {
+                        continue;
+                    }
                     // apply the default source in case the source was not set so that it gets validated
                     crate::vector::settings::EmbeddingSettings::apply_default_source(&mut setting);
                     crate::vector::settings::EmbeddingSettings::apply_default_openai_model(
diff --git a/crates/milli/src/update/upgrade/mod.rs b/crates/milli/src/update/upgrade/mod.rs
index 9f64ca0e3..ecd1cec6c 100644
--- a/crates/milli/src/update/upgrade/mod.rs
+++ b/crates/milli/src/update/upgrade/mod.rs
@@ -2,14 +2,17 @@ mod v1_12;
 mod v1_13;
 mod v1_14;
 mod v1_15;
+mod v1_16;

 use heed::RwTxn;
 use v1_12::{V1_12_3_To_V1_13_0, V1_12_To_V1_12_3};
 use v1_13::{V1_13_0_To_V1_13_1, V1_13_1_To_Latest_V1_13};
 use v1_14::Latest_V1_13_To_Latest_V1_14;
 use v1_15::Latest_V1_14_To_Latest_V1_15;
+use v1_16::Latest_V1_16_To_V1_17_0;

 use crate::constants::{VERSION_MAJOR, VERSION_MINOR, VERSION_PATCH};
 use crate::progress::{Progress, VariableNameStep};
+use crate::update::upgrade::v1_16::Latest_V1_15_To_V1_16_0;
 use crate::{Index, InternalError, Result};

 trait UpgradeIndex {
@@ -24,6 +27,61 @@ trait UpgradeIndex {
     fn target_version(&self) -> (u32, u32, u32);
 }

+const UPGRADE_FUNCTIONS: &[&dyn UpgradeIndex] = &[
+    &V1_12_To_V1_12_3 {},
+    &V1_12_3_To_V1_13_0 {},
+    &V1_13_0_To_V1_13_1 {},
+    &V1_13_1_To_Latest_V1_13 {},
+    &Latest_V1_13_To_Latest_V1_14 {},
+    &Latest_V1_14_To_Latest_V1_15 {},
+    &Latest_V1_15_To_V1_16_0 {},
+    &Latest_V1_16_To_V1_17_0 {},
+    // This is the last upgrade function, it will be called when the index is up to date.
+    // Any other upgrade function should be added before this one.
+    &ToCurrentNoOp {},
+];
+
+/// Causes a compile-time error if the argument is not in range of `0..UPGRADE_FUNCTIONS.len()`
+macro_rules! function_index {
+    ($start:expr) => {{
+        const _CHECK_INDEX: () = {
+            if $start >= $crate::update::upgrade::UPGRADE_FUNCTIONS.len() {
+                panic!("upgrade functions out of range")
+            }
+        };
+
+        $start
+    }};
+}
+
+const fn start(from: (u32, u32, u32)) -> Option<usize> {
+    let start = match from {
+        (1, 12, 0..=2) => function_index!(0),
+        (1, 12, 3..) => function_index!(1),
+        (1, 13, 0) => function_index!(2),
+        (1, 13, _) => function_index!(4),
+        (1, 14, _) => function_index!(5),
+        // We must handle the current version in the match because in case of a failure some indexes may have been upgraded but not others.
+        (1, 15, _) => function_index!(6),
+        (1, 16, _) => function_index!(7),
+        (1, 17, _) => function_index!(8),
+        // We deliberately don't add a placeholder with (VERSION_MAJOR, VERSION_MINOR, VERSION_PATCH) here to force manually
+        // considering dumpless upgrade.
+        (_major, _minor, _patch) => return None,
+    };
+
+    Some(start)
+}
+
+/// Causes a compile-time error if the latest package cannot be upgraded.
+///
+/// This serves as a reminder to consider the proper dumpless upgrade implementation when changing the package version.
+const _CHECK_PACKAGE_CAN_UPGRADE: () = {
+    if start((VERSION_MAJOR, VERSION_MINOR, VERSION_PATCH)).is_none() {
+        panic!("cannot upgrade from latest package version")
+    }
+};
+
 /// Return true if the cached stats of the index must be regenerated
 pub fn upgrade(
     wtxn: &mut RwTxn,
@@ -36,33 +94,12 @@
 where
     MSP: Fn() -> bool + Sync,
 {
     let from = index.get_version(wtxn)?.unwrap_or(db_version);
-    let upgrade_functions: &[&dyn UpgradeIndex] = &[
-        &V1_12_To_V1_12_3 {},
-        &V1_12_3_To_V1_13_0 {},
-        &V1_13_0_To_V1_13_1 {},
-        &V1_13_1_To_Latest_V1_13 {},
-        &Latest_V1_13_To_Latest_V1_14 {},
-        &Latest_V1_14_To_Latest_V1_15 {},
-        // This is the last upgrade function, it will be called when the index is up to date.
-        // any other upgrade function should be added before this one.
-        &ToCurrentNoOp {},
-    ];
-    let start = match from {
-        (1, 12, 0..=2) => 0,
-        (1, 12, 3..) => 1,
-        (1, 13, 0) => 2,
-        (1, 13, _) => 4,
-        (1, 14, _) => 5,
-        // We must handle the current version in the match because in case of a failure some index may have been upgraded but not other.
-        (1, 15, _) => 6,
-        (major, minor, patch) => {
-            return Err(InternalError::CannotUpgradeToVersion(major, minor, patch).into())
-        }
-    };
+    let start =
+        start(from).ok_or_else(|| InternalError::CannotUpgradeToVersion(from.0, from.1, from.2))?;

     enum UpgradeVersion {}
-    let upgrade_path = &upgrade_functions[start..];
+    let upgrade_path = &UPGRADE_FUNCTIONS[start..];

     let mut current_version = from;
     let mut regenerate_stats = false;
diff --git a/crates/milli/src/update/upgrade/v1_15.rs b/crates/milli/src/update/upgrade/v1_15.rs
index cea4783a1..3457e69ba 100644
--- a/crates/milli/src/update/upgrade/v1_15.rs
+++ b/crates/milli/src/update/upgrade/v1_15.rs
@@ -1,4 +1,6 @@
 use heed::RwTxn;
+use roaring::RoaringBitmap;
+use serde::Deserialize;

 use super::UpgradeIndex;
 use crate::progress::Progress;
@@ -26,3 +28,14 @@ impl UpgradeIndex for Latest_V1_14_To_Latest_V1_15 {
         (1, 15, 0)
     }
 }
+
+/// Parts of v1.15 `IndexEmbeddingConfig` that are relevant for upgrade to v1.16
+///
+/// # Warning
+///
+/// This object should not be rewritten to the DB, only read to get the name and `user_provided` roaring.
+#[derive(Debug, Deserialize)]
+pub struct IndexEmbeddingConfig {
+    pub name: String,
+    pub user_provided: RoaringBitmap,
+}
diff --git a/crates/milli/src/update/upgrade/v1_16.rs b/crates/milli/src/update/upgrade/v1_16.rs
new file mode 100644
index 000000000..02dd136ce
--- /dev/null
+++ b/crates/milli/src/update/upgrade/v1_16.rs
@@ -0,0 +1,67 @@
+use heed::types::{SerdeJson, Str};
+use heed::RwTxn;
+
+use super::UpgradeIndex;
+use crate::progress::Progress;
+use crate::vector::db::{EmbedderInfo, EmbeddingStatus};
+use crate::{Index, InternalError, Result};
+
+#[allow(non_camel_case_types)]
+pub(super) struct Latest_V1_15_To_V1_16_0();
+
+impl UpgradeIndex for Latest_V1_15_To_V1_16_0 {
+    fn upgrade(
+        &self,
+        wtxn: &mut RwTxn,
+        index: &Index,
+        _original: (u32, u32, u32),
+        _progress: Progress,
+    ) -> Result<bool> {
+        let v1_15_indexing_configs = index
+            .main
+            .remap_types::<Str, SerdeJson<Vec<super::v1_15::IndexEmbeddingConfig>>>()
+            .get(wtxn, crate::index::main_key::EMBEDDING_CONFIGS)?
+            .unwrap_or_default();
+
+        let embedders = index.embedding_configs();
+        for config in v1_15_indexing_configs {
+            let embedder_id = embedders.embedder_id(wtxn, &config.name)?.ok_or(
+                InternalError::DatabaseMissingEntry {
+                    db_name: crate::index::db_name::VECTOR_EMBEDDER_CATEGORY_ID,
+                    key: None,
+                },
+            )?;
+            let info = EmbedderInfo {
+                embedder_id,
+                // v1.15 did not distinguish between `user_provided` and `!regenerate`.
+                embedding_status: EmbeddingStatus::from_user_provided(config.user_provided),
+            };
+            embedders.put_embedder_info(wtxn, &config.name, &info)?;
+        }
+
+        Ok(false)
+    }
+
+    fn target_version(&self) -> (u32, u32, u32) {
+        (1, 16, 0)
+    }
+}
+
+#[allow(non_camel_case_types)]
+pub(super) struct Latest_V1_16_To_V1_17_0();
+
+impl UpgradeIndex for Latest_V1_16_To_V1_17_0 {
+    fn upgrade(
+        &self,
+        _wtxn: &mut RwTxn,
+        _index: &Index,
+        _original: (u32, u32, u32),
+        _progress: Progress,
+    ) -> Result<bool> {
+        Ok(false)
+    }
+
+    fn target_version(&self) -> (u32, u32, u32) {
+        (1, 17, 0)
+    }
+}
diff --git a/crates/milli/src/vector/composite.rs b/crates/milli/src/vector/composite.rs
index 8314b8649..2e31da094 100644
--- a/crates/milli/src/vector/composite.rs
+++ b/crates/milli/src/vector/composite.rs
@@ -59,12 +59,24 @@ pub struct EmbedderOptions {

 impl Embedder {
     pub fn new(
-        EmbedderOptions { search, index }: EmbedderOptions,
+        EmbedderOptions { search: search_options, index: index_options }: EmbedderOptions,
         cache_cap: usize,
     ) -> Result<Self, NewEmbedderError> {
-        let search = SubEmbedder::new(search, cache_cap)?;
+        // don't check similarity if one child is a rest embedder with fragments
+        // FIXME: skipping the check isn't ideal but we are unsure how to handle fragments in this context
+        let mut skip_similarity_check = false;
+        for options in [&search_options, &index_options] {
+            if let SubEmbedderOptions::Rest(options) = &options {
+                if !options.search_fragments.is_empty() || !options.indexing_fragments.is_empty() {
+                    skip_similarity_check = true;
+                    break;
+                }
+            }
+        }
+
+        let search = SubEmbedder::new(search_options, cache_cap)?;
         // cache is only used at search
-        let index = SubEmbedder::new(index, 0)?;
+        let index = SubEmbedder::new(index_options, 0)?;

         // check dimensions
         if search.dimensions() != index.dimensions() {
@@ -73,7 +85,12 @@ impl Embedder {
                 index.dimensions(),
             ));
         }
+
         // check similarity
+        if skip_similarity_check {
+            return Ok(Self { search, index });
+        }
+
         let search_embeddings = search
             .embed(
                 vec![
diff --git a/crates/milli/src/vector/db.rs b/crates/milli/src/vector/db.rs
index 0e890fac9..d445b47c0 100644
--- a/crates/milli/src/vector/db.rs
+++ b/crates/milli/src/vector/db.rs
@@ -117,10 +117,18 @@ impl EmbeddingStatus {
         Default::default()
     }

+    /// Create a new `EmbeddingStatus` that assumes that any `user_provided` docid is also skipping regenerate.
+    ///
+    /// Used for migration from v1.15 and earlier DBs.
+    pub(crate) fn from_user_provided(user_provided: RoaringBitmap) -> Self {
+        Self { user_provided, skip_regenerate_different_from_user_provided: Default::default() }
+    }
+
     /// Whether the document contains user-provided vectors for that embedder.
     pub fn is_user_provided(&self, docid: DocumentId) -> bool {
         self.user_provided.contains(docid)
     }
+
     /// Whether vectors should be regenerated for that document and that embedder.
     pub fn must_regenerate(&self, docid: DocumentId) -> bool {
         let invert = self.skip_regenerate_different_from_user_provided.contains(docid);
diff --git a/crates/milli/src/vector/mod.rs b/crates/milli/src/vector/mod.rs
index f64223e41..1f07f6c4f 100644
--- a/crates/milli/src/vector/mod.rs
+++ b/crates/milli/src/vector/mod.rs
@@ -556,9 +556,6 @@ impl ArroyWrapper {
         for reader in self.readers(rtxn, self.quantized_db()) {
             let reader = reader?;
             let documents = reader.item_ids();
-            if documents.is_empty() {
-                break;
-            }
             stats.documents |= documents;
             stats.number_of_embeddings += documents.len();
         }
@@ -566,9 +563,6 @@ impl ArroyWrapper {
         for reader in self.readers(rtxn, self.angular_db()) {
             let reader = reader?;
             let documents = reader.item_ids();
-            if documents.is_empty() {
-                break;
-            }
             stats.documents |= documents;
             stats.number_of_embeddings += documents.len();
         }
@@ -841,6 +835,25 @@ impl EmbedderOptions {
             }
         }
     }
+
+    pub fn has_fragments(&self) -> bool {
+        match &self {
+            EmbedderOptions::HuggingFace(_)
+            | EmbedderOptions::OpenAi(_)
+            | EmbedderOptions::Ollama(_)
+            | EmbedderOptions::UserProvided(_) => false,
+            EmbedderOptions::Rest(embedder_options) => {
+                !embedder_options.indexing_fragments.is_empty()
+            }
+            EmbedderOptions::Composite(embedder_options) => {
+                if let SubEmbedderOptions::Rest(embedder_options) = &embedder_options.index {
+                    !embedder_options.indexing_fragments.is_empty()
+                } else {
+                    false
+                }
+            }
+        }
+    }
 }

 impl Default for EmbedderOptions {
diff --git a/crates/milli/tests/search/filters.rs b/crates/milli/tests/search/filters.rs
index bb5943782..c97143d48 100644
--- a/crates/milli/tests/search/filters.rs
+++ b/crates/milli/tests/search/filters.rs
@@ -25,13 +25,16 @@ macro_rules! test_filter {
             let SearchResult { documents_ids, .. } = search.execute().unwrap();
             let filtered_ids = search::expected_filtered_ids($filter);

-            let expected_external_ids: Vec<_> =
+            let mut expected_external_ids: Vec<_> =
                 search::expected_order(&criteria, TermsMatchingStrategy::default(), &[])
                     .into_iter()
                     .filter_map(|d| if filtered_ids.contains(&d.id) { Some(d.id) } else { None })
                     .collect();

-            let documents_ids = search::internal_to_external_ids(&index, &documents_ids);
+            let mut documents_ids = search::internal_to_external_ids(&index, &documents_ids);
+
+            expected_external_ids.sort_unstable();
+            documents_ids.sort_unstable();
             assert_eq!(documents_ids, expected_external_ids);
         }
     };
@@ -102,3 +105,9 @@ test_filter!(empty_filter_1_double_not, vec![Right("NOT opt1 IS NOT EMPTY")]);
 test_filter!(in_filter, vec![Right("tag_in IN[1, 2, 3, four, five]")]);
 test_filter!(not_in_filter, vec![Right("tag_in NOT IN[1, 2, 3, four, five]")]);
 test_filter!(not_not_in_filter, vec![Right("NOT tag_in NOT IN[1, 2, 3, four, five]")]);
+
+test_filter!(starts_with_filter_single_letter, vec![Right("tag STARTS WITH e")]);
+test_filter!(starts_with_filter_diacritic, vec![Right("tag STARTS WITH é")]);
+test_filter!(starts_with_filter_empty_prefix, vec![Right("tag STARTS WITH ''")]);
+test_filter!(starts_with_filter_hell, vec![Right("title STARTS WITH hell")]);
+test_filter!(starts_with_filter_hello, vec![Right("title STARTS WITH hello")]);
diff --git a/crates/milli/tests/search/mod.rs b/crates/milli/tests/search/mod.rs
index fa03f1cc1..578a22009 100644
--- a/crates/milli/tests/search/mod.rs
+++ b/crates/milli/tests/search/mod.rs
@@ -12,7 +12,8 @@ use milli::update::new::indexer;
 use milli::update::{IndexerConfig, Settings};
 use milli::vector::RuntimeEmbedders;
 use milli::{
-    AscDesc, Criterion, DocumentId, FilterableAttributesRule, Index, Member, TermsMatchingStrategy,
+    normalize_facet, AscDesc, Criterion, DocumentId, FilterableAttributesRule, Index, Member,
+    TermsMatchingStrategy,
 };
 use serde::{Deserialize, Deserializer};
 use slice_group_by::GroupBy;
@@ -36,7 +37,7 @@ pub fn setup_search_index_with_criteria(criteria: &[Criterion]) -> Index {
     let path = tempfile::tempdir().unwrap();
     let options = EnvOpenOptions::new();
     let mut options = options.read_txn_without_tls();
-    options.map_size(10 * 1024 * 1024); // 10 MB
+    options.map_size(10 * 1024 * 1024); // 10 MiB
     let index = Index::new(options, &path, true).unwrap();

     let mut wtxn = index.write_txn().unwrap();
@@ -46,6 +47,7 @@ pub fn setup_search_index_with_criteria(criteria: &[Criterion]) -> Index {
     builder.set_criteria(criteria.to_vec());

     builder.set_filterable_fields(vec![
+        FilterableAttributesRule::Field(S("title")),
         FilterableAttributesRule::Field(S("tag")),
         FilterableAttributesRule::Field(S("asc_desc_rank")),
         FilterableAttributesRule::Field(S("_geo")),
@@ -220,6 +222,19 @@ fn execute_filter(filter: &str, document: &TestDocument) -> Option<String> {
         {
             id = Some(document.id.clone())
         }
+    } else if let Some((field, prefix)) = filter.split_once("STARTS WITH") {
+        let field = match field.trim() {
+            "tag" => &document.tag,
+            "title" => &document.title,
+            "description" => &document.description,
+            _ => panic!("Unknown field: {field}"),
+        };
+
+        let field = normalize_facet(field);
+        let prefix = normalize_facet(prefix.trim().trim_matches('\''));
+        if field.starts_with(&prefix) {
+            id = Some(document.id.clone());
+        }
     } else if let Some(("asc_desc_rank", filter)) = filter.split_once('<') {
         if document.asc_desc_rank < filter.parse().unwrap() {
             id = Some(document.id.clone())
@@ -271,6 +286,8 @@ fn execute_filter(filter: &str, document: &TestDocument) -> Option<String> {
     } else if matches!(filter, "tag_in NOT IN[1, 2, 3, four, five]") {
         id = (!matches!(document.id.as_str(), "A" | "B" | "C" | "D" | "E"))
             .then(|| document.id.clone());
+    } else {
+        panic!("Unknown filter: {filter}");
     }
     id
 }
diff --git a/crates/openapi-generator/Cargo.toml b/crates/openapi-generator/Cargo.toml
new file mode 100644
index 000000000..652f6fc57
--- /dev/null
+++ b/crates/openapi-generator/Cargo.toml
@@ -0,0 +1,12 @@
+[package]
+name = "openapi-generator"
+version = "0.1.0"
+edition = "2021"
+publish = false
+
+[dependencies]
+meilisearch = { path = "../meilisearch" }
+serde_json = "1.0"
+clap = { version = "4.5.40", features = ["derive"] }
+anyhow = "1.0.98"
+utoipa = "5.4.0"
diff --git a/crates/openapi-generator/src/main.rs b/crates/openapi-generator/src/main.rs
new file mode 100644
index 000000000..a6196f771
--- /dev/null
+++ b/crates/openapi-generator/src/main.rs
@@ -0,0 +1,43 @@
+use std::path::PathBuf;
+
+use anyhow::Result;
+use clap::Parser;
+use meilisearch::routes::MeilisearchApi;
+use utoipa::OpenApi;
+
+#[derive(Parser)]
+#[command(name = "openapi-generator")]
+#[command(about = "Generate OpenAPI specification for Meilisearch")]
+struct Cli {
+    /// Output file path (default: meilisearch.json)
+    #[arg(short, long, value_name = "FILE")]
+    output: Option<PathBuf>,
+
+    /// Pretty print the JSON output
+    #[arg(short, long)]
+    pretty: bool,
+}
+
+fn main() -> Result<()> {
+    let cli = Cli::parse();
+
+    // Generate the OpenAPI specification
+    let openapi = MeilisearchApi::openapi();
+
+    // Determine output path
+    let output_path = cli.output.unwrap_or_else(|| PathBuf::from("meilisearch.json"));
+
+    // Serialize to JSON
+    let json = if cli.pretty {
+        serde_json::to_string_pretty(&openapi)?
+    } else {
+        serde_json::to_string(&openapi)?
+    };
+
+    // Write to file
+    std::fs::write(&output_path, json)?;
+
+    println!("OpenAPI specification written to: {}", output_path.display());
+
+    Ok(())
+}
diff --git a/documentation/experimental-features.md b/documentation/experimental-features.md
new file mode 100644
index 000000000..64e884ee1
--- /dev/null
+++ b/documentation/experimental-features.md
@@ -0,0 +1,83 @@
+# Experimental features: description and process
+
+## Quick definition of experimental features
+
+An experimental feature is a feature present in the final Meilisearch binary that is not considered stable. This means its API might become incompatible between two Meilisearch releases.
+
+Experimental features must be explicitly enabled by a user.
+
+> ⚠️ Experimental features are NOT [prototypes](./prototypes.md). All experimental features are thoroughly tested before release and follow the same quality standards as other features.
+
+## Motivation
+
+Since the release of v1, Meilisearch has been considered a stable binary whose API cannot break between minor and patch versions. This means it is impossible to make breaking changes to a feature without releasing a major version.
+
+This limitation, which guarantees our users that Meilisearch is a stable and reliable product, also applies to new features. If we introduce a new feature in one release, any breaking change to it will require a new major version.
+
+To avoid releasing new major versions too frequently while still continuing to develop new features, we first provide these features as "experimental". This allows users to test them, report implementation issues, and give us important feedback.
+
+## When is a feature considered experimental?
+
+Not all new features need to go through the experimental feature process. We treat a feature as experimental when it falls into one of the following categories:
+
+- New features we are considering adding to the search engine, but for which we need user feedback before making our final decision and/or committing to a specific implementation. Example: a new API route or CLI flag
+- Improvements to existing functionality the engine team is not comfortable releasing as stable immediately. Example: changes to search relevancy or performance improvements
+- New features that would introduce breaking changes and cannot be integrated as stable before a new major version
+- New features that will NEVER be stable. These features are useful to provide quick temporary fixes to critical issues. Example: an option to disable auto-batching
+
+## How to enable experimental features?
+
+Users must explicitly enable experimental features with a CLI flag. Experimental features are always disabled by default.
+
+Example CLI flags: `--experimental-disable-soft-delete`, `--experimental-multi-index-search`.
+
+⚠️ To ensure users understand a feature is experimental, flags must contain the `experimental` prefix.
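+
+For instance, an experimental feature is enabled at launch time like any other CLI option. A minimal sketch, reusing one of the example flags above (each feature documents its own flag):
+
+```bash
+./meilisearch --experimental-disable-soft-delete
+```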
+
+## Rules and expectations
+
+- The API and behavior of an experimental feature can break between two minor versions of Meilisearch
+- The experimental feature process described here can significantly change between two minor versions of Meilisearch
+- Providing a feature as “experimental” does not guarantee it will ever be stable: newly introduced experimental features or improvements may be removed in a future release
+- While experimental features are supposed to be unstable regarding usage and compatibility between versions, users should not expect any more bugs or issues than with any other Meilisearch feature. Experimental features should follow the same quality standards as stable features, including thorough testing suites and in-depth code reviews. That said, certain experimental features might be inherently more prone to bugs and regressions
+
+## Communication with users
+
+For each new experimental feature, we must:
+- GitHub: open a dedicated GitHub discussion in the [product repository](https://github.com/meilisearch/product/discussions). This discussion should never become stale and should be updated regularly. Users need to understand they can interact with us and get quick answers. The discussion should inform users about:
+  - Our motivations: why is this feature unstable?
+  - Usage: how do users activate this feature? Is a migration with a dump needed?
+  - Planning: what are the conditions to make this feature stable? When do we expect it to become stable?
+- Meilisearch CLI: update the `--help` command in the Meilisearch binary so it redirects users to the related GitHub discussion and warns them about the unstable state of the feature
+- Documentation: create a small dedicated page about the purpose of the experimental feature. This page should contain no usage instructions and should redirect users to the related GitHub discussion for more information
+
+## Usage warnings
+
+- The API can break between two versions of Meilisearch. People using an experimental feature in production should pay extra attention to it.
+- Some experimental features might require re-indexing. In these cases, users will have to use a dump to activate and deactivate the experimental feature (see the sketch below). Users will be clearly informed about this in the related GitHub discussion
+
+> ⚠️ Since this process is not mature yet, users might experience issues with their DB when deactivating these features even when using a dump.\
+> We recommend users always save their data (with snapshots and/or dumps) before activating experimental features.
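+
+A sketch of the dump-based activation flow mentioned above; the feature flag and the dump file name are placeholders:
+
+```bash
+# 1. Ask the running instance to create a dump
+curl -X POST 'http://localhost:7700/dumps'
+# 2. Stop Meilisearch, then restart it on a fresh DB, importing the dump with the feature enabled
+./meilisearch --import-dump ./dumps/<dump-uid>.dump --experimental-some-feature
+```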
+
+## Technical details
+
+### Why does Meilisearch need to be restarted when activating an experimental feature?
+
+Meilisearch uses LMDB to store both documents and internal application data, such as Meilisearch tasks. Altering these internal data structures requires closing and re-opening the LMDB environment.
+
+If an experimental feature implementation involves a modification of internal data structures, users must restart Meilisearch. This cannot be done via HTTP routes.
+
+Unfortunately, this might impact most experimental features. However, this might change in the future, or be adapted to the context of a specific new feature.
+
+### Why will some features require migrating data with dumps?
+
+Under some circumstances, Meilisearch might have issues when reading a database generated by a different Meilisearch release. This might cause an instance to crash or work with faulty data.
+
+This is already a possibility when migrating between minor Meilisearch versions, and is more likely to happen when activating a new experimental feature. The opposite operation (migrating a database with experimental features activated to one where those features are not active) is currently riskier. As we improve the experimental feature process, this procedure will become safer and more reliable.
+
+### Restarting Meilisearch and migrating databases with dumps to activate an experimental feature is inconvenient. Will this improve in the future?
+
+We understand the situation is inconvenient and less than ideal. We will only ask users to use dumps to activate an experimental feature when it is strictly necessary.
+
+Avoiding restarts is more difficult, especially for features that currently require database migrations with dumps. We are not currently working on this, but the situation might change in the future.
diff --git a/documentation/prototypes.md b/documentation/prototypes.md
new file mode 100644
index 000000000..047f22e7b
--- /dev/null
+++ b/documentation/prototypes.md
@@ -0,0 +1,74 @@
+# Prototype process
+
+## What is a prototype?
+
+A prototype is an alternative version of Meilisearch (provided in a Docker image) containing a new feature or an improvement that the engine team provides to users.
+
+## Why provide a prototype?
+
+For some features or improvements we want to introduce in Meilisearch, we have users test them before release, for several reasons:
+- to ensure we solve the first use case defined during the discovery
+- to ensure the API does not have major usage issues
+- to identify/remove concrete technical roadblocks, like performance issues, by working on an implementation as soon as possible
+- to get any other feedback from users regarding their usage
+
+This lets us iterate quickly before stabilizing the feature for the current release.
+
+> ⚠️ Prototypes are NOT [experimental features](./experimental-features.md). All experimental features are thoroughly tested before release and follow the same quality standards as other features. This is not the case with prototypes, which are the equivalent of a first draft of a new feature.
+
+## How to publish a prototype?
+
+### Release steps
+
+The prototype name must follow this convention: `prototype-X-Y` where
+- `X` is the feature name formatted in `kebab-case`. It should not end with a single number.
+- `Y` is the version of the prototype, starting from `0`.
+ +✅ Example: `prototype-auto-resize-0`.
+❌ Bad example: `auto-resize-0`: lacks the `prototype` prefix.
+❌ Bad example: `prototype-auto-resize`: lacks the version suffix.
+❌ Bad example: `prototype-auto-resize-0-0`: the feature name ends with a single number.
+
+Steps to create a prototype (see the sketch below):
+
+1. In your terminal, go to the last commit of your branch (the one you want to provide as a prototype).
+2. Create a tag following the convention: `git tag prototype-X-Y`
+3. Run Meilisearch and check that its launch summary features a line: `Prototype: prototype-X-Y` (you may need to switch branches and back after tagging for this to work).
+4. Push the tag: `git push origin prototype-X-Y`
+5. Check that the [Docker CI](https://github.com/meilisearch/meilisearch/actions/workflows/publish-docker-images.yml) is now running.
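+
+A sketch of the full sequence, using a hypothetical feature branch and the prototype name from the example above:
+
+```bash
+git checkout my-feature-branch          # the last commit you want to ship as a prototype
+git tag prototype-auto-resize-0         # follows the prototype-X-Y convention
+git push origin prototype-auto-resize-0 # triggers the Docker CI on this tag
+```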
+
+🐳 Once the CI has finished running (~1h30), a Docker image named `prototype-X-Y` will be available on [DockerHub](https://hub.docker.com/repository/docker/getmeili/meilisearch/general). People can use it with the following command: `docker run -p 7700:7700 -v $(pwd)/meili_data:/meili_data getmeili/meilisearch:prototype-X-Y`.\
+More information about [how to run Meilisearch with Docker](https://docs.meilisearch.com/learn/cookbooks/docker.html#download-meilisearch-with-docker).
+
+⚠️ However, no binaries will be created. If users do not use Docker, they can go to the `prototype-X-Y` tag in the Meilisearch repository and compile it from the source code.
+
+### Communication
+
+When sharing a prototype with users, it's important to:
+- remind them not to use it in production. Prototypes are solely for test purposes.
+- explain how to run the prototype
+- explain how to use the new feature
+- encourage users to leave feedback
+
+The prototype should be shared at least in the related issue and/or the related product discussion. It's up to the developer and the PM to decide whether to add more communication, like sharing it on Discord or Twitter.
+
+Here is an example of a message to share on GitHub:
+
+> Hello everyone,
+>
+> Here is the current prototype you can use to test the new XXX feature:
+>
+> How to run the prototype?
+> You need to start from a fresh new database (remove the previously used `data.ms`) and use the following Docker image:
+> ```bash
+> docker run -it --rm -p 7700:7700 -v $(pwd)/meili_data:/meili_data getmeili/meilisearch:prototype-X-Y
+> ```
+>
+> You can use the feature this way:
+> ```bash
+> ...
+> ```
+>
+> ⚠️ We do NOT recommend using this prototype in production. This is only for test purposes.
+>
+> Everyone is more than welcome to give feedback and to report any issue or bug you might encounter when using this prototype. Thanks in advance for your involvement. It means a lot to us ❤️
diff --git a/documentation/release.md b/documentation/release.md
new file mode 100644
index 000000000..b3d0ed7e9
--- /dev/null
+++ b/documentation/release.md
@@ -0,0 +1,79 @@
+# Meilisearch release process
+
+This guide describes how to make releases for the current repository.
+
+## 📅 Weekly Meilisearch release
+
+1. A weekly meeting is held every Thursday afternoon to define the release and to ensure minimal checks before the release.
+Check out the TODO 👇👇👇
+
+- [ ] Define the version of the release (`vX.Y.Z`) based on our Versioning Policy.\
+- [ ] Define the commit that will reference the tag release. Every PR merged after this commit will not be taken into account in the future release.\
+- [ ] Manually test `--experimental-dumpless-upgrade` on a DB of the previous Meilisearch minor version (see the sketch after this checklist).\
+- [ ] Check recent automated tests on `main`
+ - [ ] Scheduled test suite
+ - [ ] Scheduled SDK tests
+ - [ ] Scheduled flaky tests
+ - [ ] Scheduled fuzzer tests
+ - [ ] Scheduled Docker CI (dry run)
+ - [ ] Scheduled GitHub binary release (dry run)
+- [ ] Create the PR updating the version and merge it.
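+
+A sketch of the manual dumpless-upgrade check from the checklist above; binary names, versions, and the index name are placeholders:
+
+```bash
+# Create and populate a DB with the latest released Meilisearch
+./meilisearch-v1.15 --db-path ./data.ms
+# Stop it, then start the release candidate on the same DB
+./meilisearch-v1.16-rc --db-path ./data.ms --experimental-dumpless-upgrade
+# While the upgrade task runs, check that search is still available
+curl 'http://localhost:7700/indexes/movies/search?q=test'
+```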
+
+2. Go to the GitHub interface, in the [`Release` section](https://github.com/meilisearch/meilisearch/releases).
+
+3. Select the already drafted release, or click on the `Draft a new release` button if you want to start a blank one, and fill in the form with the appropriate information.
+⚠️ Publish on a specific commit defined by the team, or publish on `main`, but ensure you really want all the merged PRs in your release.
+
+⚙️ The CIs will be triggered to:
+- [Upload binaries](https://github.com/meilisearch/meilisearch/actions/workflows/publish-binaries.yml) to the associated GitHub release.
+- [Publish the Docker images](https://github.com/meilisearch/meilisearch/actions/workflows/publish-docker-images.yml) (`latest`, `vX`, `vX.Y` and `vX.Y.Z`) to DockerHub -> check the "Docker meta" steps in the CI to verify the right tags are created.
+- [Publish binaries for Homebrew and APT](https://github.com/meilisearch/meilisearch/actions/workflows/publish-apt-brew-pkg.yml)
+- [Move the `latest` git tag to the release commit](https://github.com/meilisearch/meilisearch/actions/workflows/latest-git-tag.yml).
+
+### 🔥 How to do a patch release for a hotfix
+
+Some releases ship with impactful bugs in production (e.g. indexing or search issues): we obviously don't wait for the next cycle to fix them, and we release a patched version of Meilisearch.
+
+1. Create a new release branch starting from the latest stable Meilisearch release (`latest` git tag or the corresponding `vX.Y.Z` tag).
+
+```bash
+# Ensure you get all the current tags of the repository
+git fetch origin --tags --force
+
+# Create the branch
+git checkout vX.Y.Z # The latest release you want to patch
+git checkout -b release-vX.Y.Z+1 # Increase the Z here
+git push -u origin release-vX.Y.Z+1
+```
+
+2. Add the newly created branch `release-vX.Y.Z+1` to the "Target Branches" of [this GitHub Ruleset](https://github.com/meilisearch/meilisearch/settings/rules/4253297).
+Why? GitHub Merge Queue does not work with branch patterns yet, so we have to add the newly created branch to the GitHub Ruleset to be able to use GitHub Merge Queue.
+
+3. Change the [version in the `Cargo.toml` file](https://github.com/meilisearch/meilisearch/blob/e9b62aacb38f2c7a777adfda55293d407e0d6254/Cargo.toml#L21). You can use [our automation](https://github.com/meilisearch/meilisearch/actions/workflows/update-cargo-toml-version.yml) -> click on `Run workflow` -> fill in the appropriate version and run it on the newly created branch `release-vX.Y.Z+1` -> click on "Run workflow". A PR updating the version in the `Cargo.toml` and `Cargo.lock` files will be created.
+
+4. Open and merge the PRs (fixing your bugs): they should point to the `release-vX.Y.Z+1` branch.
+
+5. Go to the GitHub interface, in the [`Release` section](https://github.com/meilisearch/meilisearch/releases), and click on `Draft a new release`.
+   ⚠️⚠️⚠️ Publish on the `release-vX.Y.Z+1` branch, not on `main`!
+
+⚠️ If doing a patch release that should NOT be the `latest` release:
+
+- Do NOT check `Set as the latest release` when creating the GitHub release. If you did, quickly interrupt all CIs and delete the GitHub release!
+- Once the release is created, you don't have to care about the Homebrew, APT and Docker CIs: they will not consider this new release as the latest; the CIs are already adapted to this situation.
+- However, the [CI updating the `latest` git tag](https://github.com/meilisearch/meilisearch/actions/workflows/latest-git-tag.yml) does not currently handle this situation and will attach the `latest` git tag to the just-created release, which is something we don't want! If you don't manage to stop the CI in time, don't worry: you just have to re-run the [old CI](https://github.com/meilisearch/meilisearch/actions/workflows/latest-git-tag.yml) corresponding to the real latest release, and the `latest` git tag will be attached back to the right commit.
+
+6. Bring the new commits back from `release-vX.Y.Z+1` to `main` by merging a PR originating from `release-vX.Y.Z+1` and pointing to `main`.
+
+⚠️ If you encounter any merge conflicts, please do NOT fix the git conflicts directly on the `release-vX.Y.Z+1` branch. That would bring the changes present in `main` into `release-vX.Y.Z+1`, which would break a potential future patched release.
+
+![GitHub interface showing merge conflicts](../assets/merge-conflicts.png)
+
+Instead (see the sketch below):
+- Create a new branch originating from `release-vX.Y.Z+1`, like `tmp-release-vX.Y.Z+1`
+- Create a PR from the `tmp-release-vX.Y.Z+1` branch pointing to `main`
+- Fix the git conflicts on this new branch
+  - either via the GitHub interface
+  - or by pulling the `main` branch into `tmp-release-vX.Y.Z+1` and fixing the conflicts on your machine
+- Merge this new PR into `main`
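+
+A sketch of that merge-back flow:
+
+```bash
+git checkout release-vX.Y.Z+1
+git checkout -b tmp-release-vX.Y.Z+1
+git pull origin main # fix the conflicts here, on the temporary branch
+git push -u origin tmp-release-vX.Y.Z+1
+# then open a PR from tmp-release-vX.Y.Z+1 to main and merge it
+```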
diff --git a/documentation/versioning-policy.md b/documentation/versioning-policy.md
new file mode 100644
index 000000000..eda02137a
--- /dev/null
+++ b/documentation/versioning-policy.md
@@ -0,0 +1,83 @@
+# Versioning policy
+
+This page describes the versioning rules Meilisearch will follow once v1.0.0 is released, and how/when we should increase the MAJOR, MINOR, and PATCH parts of the versions.
+
+## 🤖 Basic rules
+
+Meilisearch engine releases follow the [SemVer rules](https://semver.org/), including the following basic ones:
+
+> 🔥 Given a version number MAJOR.MINOR.PATCH, increment the:
+>
+> 1. MAJOR version when you make incompatible API changes
+> 2. MINOR version when you add functionality in a backwards compatible
+> manner
+> 3. PATCH version when you make backwards compatible bug fixes
+
+**Changes that MAY lead Meilisearch users (developers) to change their code are considered API incompatibilities and will make us increase the MAJOR version of Meilisearch.**
+
+**In other words, if users MAY have to do more than just download the new Meilisearch binary and run it, a new MAJOR is needed.**
+
+Examples of changes that break user code and therefore involve increasing the MAJOR:
+
+- Renaming a route or a field in the request/response body
+- Changing the default value of a parameter or a setting
+- Any API behavior change: users expect, in their code, the engine to behave a certain way, but it does not.
+Examples:
+  - Making a synchronous error asynchronous, or the contrary
+  - `displayableAttributes` now impacts the `/documents` route: users expect to retrieve all the fields, or only specific fields, in their code, but cannot.
+- Changing a final value type.
+Ex: `/stats` now returns floats instead of integers. This can impact strongly typed languages.
+
+⚠️ This guide only applies to the Meilisearch binary. Additional tools like SDKs and Docker images are out of the scope of this guide. However, we will ensure the changelogs are clear enough to inform users of the changes and their impacts.
+
+## ✋ Exceptions related to Meilisearch’s specificities
+
+Meilisearch is a search engine working with an internal database. This means some parts of the project would be really problematic to treat as breaking (and thus as requiring a MAJOR increase) without slowing down innovation.
+
+Here is the list of exceptions: changes that will not lead to an increase of the MAJOR in a Meilisearch release.
+
+### DB incompatibilities: force using a dump
+
+A DB breaking change leads to a failure when starting Meilisearch: you need to use a dump.
+
+We know this kind of failure requiring an additional step is the definition of “breaking” on the user side, but it’s really complicated to justify increasing the MAJOR for this. Indeed, since we don’t want to release a major version every two months but also want to keep innovating at the same time, increasing the MINOR is the best solution.
+
+People would sometimes need to use a dump between two MAJOR versions; for instance, this is something [PostgreSQL does](https://www.postgresql.org/support/versioning/) by asking their users to perform some manual actions between two MINOR releases.
+
+### Search relevancy and algorithm improvements
+
+Relevancy is the engine team’s job; we need to improve it every day, like performance. It would be really hard to improve the engine without allowing the team to change the relevancy algorithm. As with DB breaking changes, considering relevancy changes as breaking can really slow down innovation.
+
+So, changing the final relevancy result (cropping algorithm, search algorithm, placeholder behavior, highlight behavior…), as opposed to the API behavior or fields, is not considered a breaking change. Indeed, changing the relevancy behavior is not supposed to make user code fail, since the final results of Meilisearch are only displayed, no matter which documents matched.
+
+This kind of change will lead us to increase the MINOR, to let people know about the change and avoid unexpected changes when pulling the latest patched version of Meilisearch. Indeed, increasing the MINOR (instead of the PATCH) prevents users from downloading the new patched version without noticing the changes.
+
+🚨 Any relevancy change that is related to API usage, and that may thus force users to change their code (for instance, changing the default `matchingStrategy` value), is not covered by this section and would lead us to increase the MAJOR.
+
+### New "variant" type addition
+
+We don't consider it breaking to add a new type to an already existing list of variants. For example, adding a new type of `task`, or a new type of error `code`.
+
+We are aware some strongly typed language code bases could be impacted, and our recommendation is to handle the possibility of having an unknown type when deserializing Meilisearch's response, as sketched below.
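+
+A minimal sketch of that defensive deserialization, assuming a hypothetical client-side `TaskType` model and the `serde`/`serde_json` crates; the `Unknown` catch-all keeps the client working when a new variant appears:
+
+```rust
+use serde::Deserialize;
+
+// Hypothetical client-side model of a Meilisearch task type.
+#[derive(Debug, Deserialize)]
+#[serde(rename_all = "camelCase")]
+enum TaskType {
+    IndexCreation,
+    DocumentAdditionOrUpdate,
+    // Catch-all: any variant this client version does not know about yet.
+    #[serde(other)]
+    Unknown,
+}
+
+fn main() {
+    // A task type introduced by a newer Meilisearch release.
+    let task: TaskType = serde_json::from_str("\"indexSwap\"").unwrap();
+    assert!(matches!(task, TaskType::Unknown));
+}
+```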
+
+### Human-readability purposes
+
+- Changing the value of `message` or `link` in the error object will only increase the PATCH. Users should not refer to these fields in their code, since `code` and `type` exist in the same object.
+- Any change to an error message sent to the terminal will only increase the PATCH. People should not rely on these messages, since they are for human debugging.
+- Updating the log format will increase the MINOR: logs are meant to be used by humans for debugging, but we are aware some people plug tools on top of them. Since that is not the main purpose of our logs, we don’t want to increase the MAJOR for a log format change. However, we will increase the MINOR to let people know about the change and avoid bad surprises when pulling the latest patched version of Meilisearch.
+
+### Integrated web-interface
+
+Any changes done to the integrated web interface are not considered breaking. The interface is considered an additional tool for test purposes, not for production.
+
+## 📝 About the Meilisearch changelogs
+
+All changes, whether or not they are considered breaking and whether or not they relate to an algorithm change, will be announced in the changelogs.
+
+The level of detail will depend on the impact on users. For instance, giving too many details about really deep tech improvements can lead to some confusion on the user side.
+
+## 👀 Some precisions
+
+- Updating a dependency requirement of Meilisearch is NOT considered breaking by the SemVer guide and will lead, in our case, to increasing the MINOR. Indeed, increasing the MINOR (instead of the PATCH) prevents users from downloading the new patched version without noticing the changes.
+See the [related rule](https://semver.org/#what-should-i-do-if-i-update-my-own-dependencies-without-changing-the-public-api).
+- Fixing a CVE (Common Vulnerabilities and Exposures) will not increase the MAJOR; depending on the CVE, it will be a PATCH or a MINOR upgrade.