Compare commits

...

127 Commits

Author SHA1 Message Date
38100d5c05 secure the connections between the leader and followers + forbid joining the cluster without the right master key 2023-03-23 12:28:36 +01:00
7ad14373e9 send the api keys in a dump 2023-03-22 18:45:08 +01:00
0a772fb391 sync the api key operations between the leader and followers 2023-03-22 15:49:35 +01:00
56f7e0d89a remove an unused function 2023-03-22 14:25:26 +01:00
9112b26cd1 fix the consistency rules 2023-03-22 14:24:53 +01:00
8089ec668e fix the analytics 2023-03-21 18:39:06 +01:00
4058f6c870 fix the tests 2023-03-21 18:28:26 +01:00
3df58831c6 make the consistency configurable 2023-03-21 18:25:53 +01:00
8ebc2b19ea remove a bunch of confusing logs 2023-03-21 17:58:36 +01:00
5ecfa3570f let the follower join the leader at any time 2023-03-21 17:56:18 +01:00
498b59ac84 get rid of my broken tricks/asserts 2023-03-21 12:25:33 +01:00
301907869d first version working with index operation 2023-03-20 18:13:39 +01:00
f1aa22567f add some debug logs 2023-03-16 19:02:54 +01:00
de1b939ca0 plug the cluster thingy into meilisearch 2023-03-16 18:53:22 +01:00
6cc14feb51 synchronize most of the operations 2023-03-16 16:31:16 +01:00
145f0e753c move the leader to a new design 2023-03-16 14:31:03 +01:00
c4c1240ab8 start distributing meilisearch 2023-03-14 17:38:21 +01:00
7c9a8b1e1b Merge #3587
3587: Enable cache again in test suite CI r=curquiza a=curquiza

Following the change in this PR introduced in v1.1: https://github.com/meilisearch/meilisearch/pull/3422

The cache was removed due to failures (lack of space). Now that the binary is smaller (from 250MB to 50MB), we want to try enabling the cache again.
Indeed, without the cache step, the CI runs are much slower (45 min instead of 20-30 min).

For later: Rust 1.68 introduced a new way to fetch crates. Updating the Rust version might also help in the future!

Co-authored-by: curquiza <clementine@meilisearch.com>
2023-03-13 13:51:32 +00:00
f45daf8031 Enable cache again in test suite CI 2023-03-13 14:24:15 +01:00
7935bef4cd Merge #3567
3567: Clean CI file names r=curquiza a=curquiza

Make the CI names more consistent to ease Gillian's onboarding 😇

No impact for the users or the developers of the team

Co-authored-by: curquiza <clementine@meilisearch.com>
2023-03-09 12:20:18 +00:00
febc8d1b52 Clean CI file names 2023-03-08 19:12:33 +01:00
df3986cd83 Merge #3510 #3551 #3552 #3553
3510: Add scheduled test to Actions for all features r=curquiza a=jlucktay

# Pull Request

## Related issue

Fixes #3506.

## What does this PR do?

Add a new job to the Rust workflow to run `cargo build` and `cargo test` (on the cron schedule only) with the `--all-features` flag.
This will execute across all three environments: Linux, macOS, Windows.
    
Autoformat the Rust workflow file via [the Red Hat YAML extension for Visual Studio Code](https://marketplace.visualstudio.com/items?itemName=redhat.vscode-yaml).
This straightens out whitespace and string quoting for safer parsing.

As [pointed out by `@irevoire` here](https://github.com/meilisearch/meilisearch/issues/3506#issuecomment-1433501867), changes to CI such as this one will need to wait for #3496 before going ahead.
The new action [was executed on my fork](https://github.com/jlucktay/meilisearch/actions/runs/4211694210) but ended up failing on some metrics tests, as called out in that linked comment.

## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!


3551: Bump Swatinem/rust-cache from 2.2.0 to 2.2.1 r=curquiza a=dependabot[bot]

Bumps [Swatinem/rust-cache](https://github.com/Swatinem/rust-cache) from 2.2.0 to 2.2.1.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a href="https://github.com/Swatinem/rust-cache/releases">Swatinem/rust-cache's releases</a>.</em></p>
<blockquote>
<h2>v2.2.1</h2>
<ul>
<li>Update <code>@actions/cache</code> dependency to fix usage of <code>zstd</code> compression.</li>
</ul>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a href="https://github.com/Swatinem/rust-cache/blob/master/CHANGELOG.md">Swatinem/rust-cache's changelog</a>.</em></p>
<blockquote>
<h2>2.2.1</h2>
<ul>
<li>Update <code>@actions/cache</code> dependency to fix usage of <code>zstd</code> compression.</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a href="6fd3edff69"><code>6fd3edf</code></a> 2.2.1</li>
<li><a href="a1c019f71a"><code>a1c019f</code></a> update dependencies and rebuild</li>
<li><a href="664ce0090f"><code>664ce00</code></a> chore: Create check-dist.yml (<a href="https://github-redirect.dependabot.com/Swatinem/rust-cache/issues/96">#96</a>)</li>
<li>See full diff in <a href="https://github.com/Swatinem/rust-cache/compare/v2.2.0...v2.2.1">compare view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=Swatinem/rust-cache&package-manager=github_actions&previous-version=2.2.0&new-version=2.2.1)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

You can trigger a rebase of this PR by commenting `@dependabot rebase`.


3552: Bump svenstaro/upload-release-action from 2.4.0 to 2.5.0 r=curquiza a=dependabot[bot]

Bumps [svenstaro/upload-release-action](https://github.com/svenstaro/upload-release-action) from 2.4.0 to 2.5.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a href="https://github.com/svenstaro/upload-release-action/releases">svenstaro/upload-release-action's releases</a>.</em></p>
<blockquote>
<h2>2.5.0</h2>
<ul>
<li>Add retry to upload release <a href="https://github-redirect.dependabot.com/svenstaro/upload-release-action/pull/96">#96</a> (thanks <a href="https://github.com/sonphantrung"><code>@sonphantrung</code></a>)</li>
</ul>
<h2>2.4.1</h2>
<ul>
<li>Modernize octokit usage</li>
</ul>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a href="https://github.com/svenstaro/upload-release-action/blob/master/CHANGELOG.md">svenstaro/upload-release-action's changelog</a>.</em></p>
<blockquote>
<h2>[2.5.0] - 2023-02-21</h2>
<ul>
<li>Add retry to upload release <a href="https://github-redirect.dependabot.com/svenstaro/upload-release-action/pull/96">#96</a> (thanks <a href="https://github.com/sonphantrung"><code>@sonphantrung</code></a>)</li>
</ul>
<h2>[2.4.1] - 2023-02-01</h2>
<ul>
<li>Modernize octokit usage</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a href="7319e4733e"><code>7319e47</code></a> 2.5.0</li>
<li><a href="4e86b8565b"><code>4e86b85</code></a> Prepare release</li>
<li><a href="3a6baf0f12"><code>3a6baf0</code></a> Add CHANGELOG entry for retry feature</li>
<li><a href="e8c797e08e"><code>e8c797e</code></a> Merge pull request <a href="https://github-redirect.dependabot.com/svenstaro/upload-release-action/issues/96">#96</a> from sonphantrung/retry-v2</li>
<li><a href="cf83be2c7f"><code>cf83be2</code></a> Merge branch 'master' into retry-v2</li>
<li><a href="cfdd9b50bd"><code>cfdd9b5</code></a> Merge branch 'retry' of <a href="https://github.com/messense/upload-release-action">https://github.com/messense/upload-release-action</a></li>
<li><a href="cc92c9093e"><code>cc92c90</code></a> 2.4.1</li>
<li><a href="72f6bf584a"><code>72f6bf5</code></a> Merge pull request <a href="https://github-redirect.dependabot.com/svenstaro/upload-release-action/issues/93">#93</a> from ggreif/gabor/fix</li>
<li><a href="f2899b4677"><code>f2899b4</code></a> use <code>createReadStream</code></li>
<li><a href="af306bddfe"><code>af306bd</code></a> Revert &quot;use the <code>`@file</code>` mechanism of octokit-5&quot;</li>
<li>Additional commits viewable in <a href="https://github.com/svenstaro/upload-release-action/compare/2.4.0...2.5.0">compare view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=svenstaro/upload-release-action&package-manager=github_actions&previous-version=2.4.0&new-version=2.5.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

You can trigger a rebase of this PR by commenting `@dependabot rebase`.


3553: Bump docker/build-push-action from 3 to 4 r=curquiza a=dependabot[bot]

Bumps [docker/build-push-action](https://github.com/docker/build-push-action) from 3 to 4.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a href="https://github.com/docker/build-push-action/releases">docker/build-push-action's releases</a>.</em></p>
<blockquote>
<h2>v4.0.0</h2>
<blockquote>
<p><strong>Note</strong></p>
<p>Buildx v0.10 enables support for a minimal <a href="https://slsa.dev/provenance/">SLSA Provenance</a> attestation, which requires support for <a href="https://github.com/opencontainers/image-spec">OCI-compliant</a> multi-platform images. This may introduce issues with registry and runtime support (e.g. <a href="https://github-redirect.dependabot.com/docker/buildx/issues/1533">Google Cloud Run and AWS Lambda</a>). You can optionally disable the default provenance attestation functionality using <code>provenance: false</code>.</p>
</blockquote>
<ul>
<li>Revert disable provenance by default if not set by <a href="https://github.com/crazy-max"><code>@crazy-max</code></a> in <a href="https://github-redirect.dependabot.com/docker/build-push-action/pull/784">docker/build-push-action#784</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a href="https://github.com/docker/build-push-action/compare/v3.3.1...v4.0.0">https://github.com/docker/build-push-action/compare/v3.3.1...v4.0.0</a></p>
<h2>v3.3.1</h2>
<ul>
<li>Disable provenance by default if not set by <a href="https://github.com/crazy-max"><code>@crazy-max</code></a> (<a href="https://github-redirect.dependabot.com/docker/build-push-action/issues/781">#781</a>)</li>
</ul>
<p><strong>Full Changelog</strong>: <a href="https://github.com/docker/build-push-action/compare/v3.3.0...v3.3.1">https://github.com/docker/build-push-action/compare/v3.3.0...v3.3.1</a></p>
<h2>v3.3.0</h2>
<blockquote>
<p><strong>Note</strong></p>
<p>Buildx v0.10 enables support for a minimal <a href="https://slsa.dev/provenance/">SLSA Provenance</a> attestation, which requires support for <a href="https://github.com/opencontainers/image-spec">OCI-compliant</a> multi-platform images. This may introduce issues with registry and runtime support (e.g. <a href="https://github-redirect.dependabot.com/docker/buildx/issues/1533">Google Cloud Run and AWS Lambda</a>). You can optionally disable the default provenance attestation functionality using <code>provenance: false</code>.</p>
</blockquote>
<ul>
<li>Add <code>attests</code>, <code>provenance</code> and <code>sbom</code> inputs by <a href="https://github.com/crazy-max"><code>@crazy-max</code></a> (<a href="https://github-redirect.dependabot.com/docker/build-push-action/issues/746">#746</a> <a href="https://github-redirect.dependabot.com/docker/build-push-action/issues/759">#759</a>)</li>
<li>Log GitHub Actions runtime token access controls by <a href="https://github.com/crazy-max"><code>@crazy-max</code></a> (<a href="https://github-redirect.dependabot.com/docker/build-push-action/issues/707">#707</a>)</li>
<li>Examples moved to <a href="https://docs.docker.com/build/ci/github-actions/examples/">docs website</a> by <a href="https://github.com/crazy-max"><code>@crazy-max</code></a> (<a href="https://github-redirect.dependabot.com/docker/build-push-action/issues/718">#718</a>)</li>
<li>Bump minimatch from 3.0.4 to 3.1.2 (<a href="https://github-redirect.dependabot.com/docker/build-push-action/issues/732">#732</a>)</li>
<li>Bump csv-parse from 5.3.0 to 5.3.3 (<a href="https://github-redirect.dependabot.com/docker/build-push-action/issues/729">#729</a>)</li>
<li>Bump json5 from 2.2.0 to 2.2.3 (<a href="https://github-redirect.dependabot.com/docker/build-push-action/issues/749">#749</a>)</li>
</ul>
<p><strong>Full Changelog</strong>: <a href="https://github.com/docker/build-push-action/compare/v3.2.0...v3.3.0">https://github.com/docker/build-push-action/compare/v3.2.0...v3.3.0</a></p>
<h2>v3.2.0</h2>
<ul>
<li>Remove workaround for <code>setOutput</code> by <a href="https://github.com/crazy-max"><code>@crazy-max</code></a> (<a href="https://github-redirect.dependabot.com/docker/build-push-action/issues/704">#704</a>)</li>
<li>Docs: fix Git context link and add more details about subdir support by <a href="https://github.com/crazy-max"><code>@crazy-max</code></a> (<a href="https://github-redirect.dependabot.com/docker/build-push-action/issues/685">#685</a>)</li>
<li>Docs: named context by <a href="https://github.com/baibaratsky"><code>@baibaratsky</code></a> and <a href="https://github.com/crazy-max"><code>@crazy-max</code></a> (<a href="https://github-redirect.dependabot.com/docker/build-push-action/issues/665">#665</a>)</li>
<li>Bump <code>@actions/core</code> from 1.9.0 to 1.10.0 (<a href="https://github-redirect.dependabot.com/docker/build-push-action/issues/667">#667</a> <a href="https://github-redirect.dependabot.com/docker/build-push-action/issues/695">#695</a>)</li>
<li>Bump <code>@actions/github</code> from 5.0.3 to 5.1.1 (<a href="https://github-redirect.dependabot.com/docker/build-push-action/issues/696">#696</a>)</li>
</ul>
<p><strong>Full Changelog</strong>: <a href="https://github.com/docker/build-push-action/compare/v3.1.1...v3.2.0">https://github.com/docker/build-push-action/compare/v3.1.1...v3.2.0</a></p>
<h2>v3.1.1</h2>
<ul>
<li>Fix GitHub token not passed with Git context if subdir defined by <a href="https://github.com/crazy-max"><code>@crazy-max</code></a> (<a href="https://github-redirect.dependabot.com/docker/build-push-action/issues/663">#663</a>)</li>
<li>Replace deprecated <code>fs.rmdir</code> with <code>fs.rm</code> by <a href="https://github.com/bendrucker"><code>@bendrucker</code></a> (<a href="https://github-redirect.dependabot.com/docker/build-push-action/issues/657">#657</a>)</li>
</ul>
<p><strong>Full Changelog</strong>: <a href="https://github.com/docker/build-push-action/compare/v3.1.0...v3.1.1">https://github.com/docker/build-push-action/compare/v3.1.0...v3.1.1</a></p>
<h2>v3.1.0</h2>
<ul>
<li><code>no-cache-filters</code> input by <a href="https://github.com/crazy-max"><code>@crazy-max</code></a> (<a href="https://github-redirect.dependabot.com/docker/build-push-action/issues/653">#653</a>)</li>
<li>Bump <code>@actions/github</code> from 5.0.1 to 5.0.3 (<a href="https://github-redirect.dependabot.com/docker/build-push-action/issues/619">#619</a>)</li>
<li>Bump <code>@actions/core</code> from 1.6.0 to 1.9.0 (<a href="https://github-redirect.dependabot.com/docker/build-push-action/issues/620">#620</a> <a href="https://github-redirect.dependabot.com/docker/build-push-action/issues/637">#637</a>)</li>
<li>Bump csv-parse from 5.0.4 to 5.3.0 (<a href="https://github-redirect.dependabot.com/docker/build-push-action/issues/623">#623</a> <a href="https://github-redirect.dependabot.com/docker/build-push-action/issues/650">#650</a>)</li>
</ul>
<p><strong>Full Changelog</strong>: <a href="https://github.com/docker/build-push-action/compare/v3.0.0...v3.1.0">https://github.com/docker/build-push-action/compare/v3.0.0...v3.1.0</a></p>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a href="3b5e8027fc"><code>3b5e802</code></a> Merge pull request <a href="https://github-redirect.dependabot.com/docker/build-push-action/issues/784">#784</a> from crazy-max/enable-provenance</li>
<li><a href="02d3266a89"><code>02d3266</code></a> update generated content</li>
<li><a href="f403dafe18"><code>f403daf</code></a> revert disable provenance by default if not set</li>
<li>See full diff in <a href="https://github.com/docker/build-push-action/compare/v3...v4">compare view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=docker/build-push-action&package-manager=github_actions&previous-version=3&new-version=4)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

You can trigger a rebase of this PR by commenting `@dependabot rebase`.


Co-authored-by: James Lucktaylor <jlucktay+github@gmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-03-06 15:45:33 +00:00
34ed6518ae Merge #3554
3554: Bump docker/metadata-action from 3 to 4 r=curquiza a=dependabot[bot]

Bumps [docker/metadata-action](https://github.com/docker/metadata-action) from 3 to 4.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a href="https://github.com/docker/metadata-action/releases">docker/metadata-action's releases</a>.</em></p>
<blockquote>
<h2>v4.0.0</h2>
<ul>
<li>Node 16 as default runtime by <a href="https://github.com/crazy-max"><code>@crazy-max</code></a> (<a href="https://github-redirect.dependabot.com/docker/metadata-action/issues/176">#176</a>)
<ul>
<li>This requires a minimum <a href="https://github.com/actions/runner/releases/tag/v2.285.0">Actions Runner</a> version of v2.285.0, which is by default available in GHES 3.4 or later.</li>
</ul>
</li>
<li>Do not sanitize before pattern matching by <a href="https://github.com/crazy-max"><code>@crazy-max</code></a> (<a href="https://github-redirect.dependabot.com/docker/metadata-action/issues/201">#201</a>)
<ul>
<li>Breaking change with <code>type=match</code> pattern matching</li>
</ul>
</li>
</ul>
<p><strong>Full Changelog</strong>: <a href="https://github.com/docker/metadata-action/compare/v3.8.0...v4.0.0">https://github.com/docker/metadata-action/compare/v3.8.0...v4.0.0</a></p>
<h2>v3.8.0</h2>
<ul>
<li>Add attribute to enable/disable images by <a href="https://github.com/crazy-max"><code>@crazy-max</code></a> (<a href="https://github-redirect.dependabot.com/docker/metadata-action/issues/193">#193</a>)</li>
<li>Add <code>is_default_branch</code> global expression by <a href="https://github.com/crazy-max"><code>@crazy-max</code></a> (<a href="https://github-redirect.dependabot.com/docker/metadata-action/issues/192">#192</a> <a href="https://github-redirect.dependabot.com/docker/metadata-action/issues/197">#197</a> <a href="https://github-redirect.dependabot.com/docker/metadata-action/issues/198">#198</a>)</li>
<li>Update fixtures (dev) by <a href="https://github.com/crazy-max"><code>@crazy-max</code></a> (<a href="https://github-redirect.dependabot.com/docker/metadata-action/issues/190">#190</a>)</li>
<li>Bump semver from 7.3.5 to 7.3.7 (<a href="https://github-redirect.dependabot.com/docker/metadata-action/issues/185">#185</a>)</li>
<li>Bump moment from 2.29.2 to 2.29.3 (<a href="https://github-redirect.dependabot.com/docker/metadata-action/issues/187">#187</a>)</li>
<li>Bump csv-parse from 4.16.3 to 5.0.4 (<a href="https://github-redirect.dependabot.com/docker/metadata-action/issues/195">#195</a>)</li>
</ul>
<p><strong>Full Changelog</strong>: <a href="https://github.com/docker/metadata-action/compare/v3.7.0...v3.8.0">https://github.com/docker/metadata-action/compare/v3.7.0...v3.8.0</a></p>
<h2>v3.7.0</h2>
<ul>
<li>Handle comments for multi-line inputs (<a href="https://github-redirect.dependabot.com/docker/metadata-action/issues/172">#172</a>)</li>
<li>Missing <code>json</code> output in <code>action.yml</code> (<a href="https://github-redirect.dependabot.com/docker/metadata-action/issues/167">#167</a>)</li>
<li>Update dev dependencies and workflow (<a href="https://github-redirect.dependabot.com/docker/metadata-action/issues/175">#175</a>)</li>
<li>Bump minimist from 1.2.5 to 1.2.6 (<a href="https://github-redirect.dependabot.com/docker/metadata-action/issues/182">#182</a>)</li>
<li>Bump moment from 2.29.1 to 2.29.2 (<a href="https://github-redirect.dependabot.com/docker/metadata-action/issues/180">#180</a>)</li>
<li>Bump <code>@actions/github</code> from 5.0.0 to 5.0.1 (<a href="https://github-redirect.dependabot.com/docker/metadata-action/issues/179">#179</a>)</li>
<li>Bump node-fetch from 2.6.1 to 2.6.7 (<a href="https://github-redirect.dependabot.com/docker/metadata-action/issues/173">#173</a>)</li>
</ul>
<h2>v3.6.2</h2>
<ul>
<li>Handle raw statement for pre-release (<a href="https://github-redirect.dependabot.com/docker/metadata-action/issues/155">#155</a> <a href="https://github-redirect.dependabot.com/docker/metadata-action/issues/156">#156</a>)</li>
</ul>
<h2>v3.6.1</h2>
<ul>
<li>Preserve quotes inside unquoted field (<a href="https://github-redirect.dependabot.com/docker/metadata-action/issues/153">#153</a>)</li>
</ul>
<h2>v3.6.0</h2>
<ul>
<li><code>base_ref</code> global expression (<a href="https://github-redirect.dependabot.com/docker/metadata-action/issues/142">#142</a>)</li>
<li>Trim tags and flavor inputs (<a href="https://github-redirect.dependabot.com/docker/metadata-action/issues/143">#143</a>)</li>
<li>Bump <code>@actions/core</code> from 1.5.0 to 1.6.0 (<a href="https://github-redirect.dependabot.com/docker/metadata-action/issues/135">#135</a>)</li>
<li>Bump ansi-regex from 5.0.0 to 5.0.1 (<a href="https://github-redirect.dependabot.com/docker/metadata-action/issues/134">#134</a>)</li>
<li>Bump tmpl from 1.0.4 to 1.0.5 (<a href="https://github-redirect.dependabot.com/docker/metadata-action/issues/132">#132</a>)</li>
<li>Bump csv-parse from 4.16.0 to 4.16.3 (<a href="https://github-redirect.dependabot.com/docker/metadata-action/issues/131">#131</a>)</li>
</ul>
<h2>v3.5.0</h2>
<ul>
<li>Add global expression <code>date</code> (<a href="https://github-redirect.dependabot.com/docker/metadata-action/issues/121">#121</a>)</li>
<li>Bump <code>@actions/core</code> from 1.4.0 to 1.5.0 (<a href="https://github-redirect.dependabot.com/docker/metadata-action/issues/122">#122</a>)</li>
</ul>
<h2>v3.4.1</h2>
<ul>
<li>Only return edge if branch matches (<a href="https://github-redirect.dependabot.com/docker/metadata-action/issues/115">#115</a>)</li>
</ul>
<h2>v3.4.0</h2>
<ul>
<li>PEP 440 support (<a href="https://github-redirect.dependabot.com/docker/metadata-action/issues/108">#108</a>)</li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Upgrade guide</summary>
<p><em>Sourced from <a href="https://github.com/docker/metadata-action/blob/master/UPGRADE.md">docker/metadata-action's upgrade guide</a>.</em></p>
<blockquote>
<h1>Upgrade notes</h1>
<h2>v2 to v3</h2>
<ul>
<li>Repository has been moved to docker org. Replace <code>crazy-max/ghaction-docker-meta@v2</code>
with <code>docker/metadata-action@v4</code></li>
<li>The default bake target has been changed: <code>ghaction-docker-meta</code> &gt; <code>docker-metadata-action</code></li>
</ul>
<h2>v1 to v2</h2>
<ul>
<li><a href="https://github.com/docker/metadata-action/blob/master/#inputs">inputs</a>
<ul>
<li><a href="https://github.com/docker/metadata-action/blob/master/#tag-sha"><code>tag-sha</code></a></li>
<li><a href="https://github.com/docker/metadata-action/blob/master/#tag-edge--tag-edge-branch"><code>tag-edge</code> / <code>tag-edge-branch</code></a></li>
<li><a href="https://github.com/docker/metadata-action/blob/master/#tag-semver"><code>tag-semver</code></a></li>
<li><a href="https://github.com/docker/metadata-action/blob/master/#tag-match--tag-match-group"><code>tag-match</code> / <code>tag-match-group</code></a></li>
<li><a href="https://github.com/docker/metadata-action/blob/master/#tag-latest"><code>tag-latest</code></a></li>
<li><a href="https://github.com/docker/metadata-action/blob/master/#tag-schedule"><code>tag-schedule</code></a></li>
<li><a href="https://github.com/docker/metadata-action/blob/master/#tag-custom--tag-custom-only"><code>tag-custom</code> / <code>tag-custom-only</code></a></li>
<li><a href="https://github.com/docker/metadata-action/blob/master/#label-custom"><code>label-custom</code></a></li>
</ul>
</li>
<li><a href="https://github.com/docker/metadata-action/blob/master/#basic-workflow">Basic workflow</a></li>
<li><a href="https://github.com/docker/metadata-action/blob/master/#semver-workflow">Semver workflow</a></li>
</ul>
<h3>inputs</h3>
<table>
<thead>
<tr>
<th>New</th>
<th>Unchanged</th>
<th>Removed</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>tags</code></td>
<td><code>images</code></td>
<td><code>tag-sha</code></td>
</tr>
<tr>
<td><code>flavor</code></td>
<td><code>sep-tags</code></td>
<td><code>tag-edge</code></td>
</tr>
<tr>
<td><code>labels</code></td>
<td><code>sep-labels</code></td>
<td><code>tag-edge-branch</code></td>
</tr>
<tr>
<td></td>
<td></td>
<td><code>tag-semver</code></td>
</tr>
<tr>
<td></td>
<td></td>
<td><code>tag-match</code></td>
</tr>
<tr>
<td></td>
<td></td>
<td><code>tag-match-group</code></td>
</tr>
<tr>
<td></td>
<td></td>
<td><code>tag-latest</code></td>
</tr>
<tr>
<td></td>
<td></td>
<td><code>tag-schedule</code></td>
</tr>
<tr>
<td></td>
<td></td>
<td><code>tag-custom</code></td>
</tr>
<tr>
<td></td>
<td></td>
<td><code>tag-custom-only</code></td>
</tr>
<tr>
<td></td>
<td></td>
<td><code>label-custom</code></td>
</tr>
</tbody>
</table>
<h4><code>tag-sha</code></h4>
<pre lang="yaml"><code>tags: |
  type=sha
</code></pre>
<h4><code>tag-edge</code> / <code>tag-edge-branch</code></h4>
<pre lang="yaml"><code>tags: |
  # default branch
</code></pre>
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a href="507c2f2dc5"><code>507c2f2</code></a> Merge pull request <a href="https://github-redirect.dependabot.com/docker/metadata-action/issues/257">#257</a> from crazy-max/env-output</li>
<li><a href="04861f5102"><code>04861f5</code></a> update generated content</li>
<li><a href="6729545cde"><code>6729545</code></a> Provide outputs as env vars</li>
<li><a href="05d22bf317"><code>05d22bf</code></a> Merge pull request <a href="https://github-redirect.dependabot.com/docker/metadata-action/issues/256">#256</a> from crazy-max/fix-readme</li>
<li><a href="70b403b46b"><code>70b403b</code></a> Fix README</li>
<li><a href="9e6ae02878"><code>9e6ae02</code></a> Merge pull request <a href="https://github-redirect.dependabot.com/docker/metadata-action/issues/252">#252</a> from docker/dependabot/npm_and_yarn/json5-2.2.3</li>
<li><a href="3d239e8b8a"><code>3d239e8</code></a> Bump json5 from 2.2.0 to 2.2.3</li>
<li><a href="7cb52e2750"><code>7cb52e2</code></a> Merge pull request <a href="https://github-redirect.dependabot.com/docker/metadata-action/issues/251">#251</a> from chroju/set_timezone</li>
<li><a href="90a1d5cf21"><code>90a1d5c</code></a> Add tz attribute to handlebar date function</li>
<li><a href="c98ac5e987"><code>c98ac5e</code></a> Merge pull request <a href="https://github-redirect.dependabot.com/docker/metadata-action/issues/249">#249</a> from crazy-max/fix-readme</li>
<li>Additional commits viewable in <a href="https://github.com/docker/metadata-action/compare/v3...v4">compare view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=docker/metadata-action&package-manager=github_actions&previous-version=3&new-version=4)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

You can trigger a rebase of this PR by commenting `@dependabot rebase`.


Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-03-06 14:56:20 +00:00
c0ede6d152 Merge #3562
3562: Update version for the next release (v1.1.0) in Cargo.toml r=curquiza a=meili-bot

⚠️ This PR is automatically generated. Check the new version is the expected one and Cargo.lock has been updated before merging.

Co-authored-by: curquiza <curquiza@users.noreply.github.com>
2023-03-06 13:54:16 +00:00
577e7126f9 Update version for the next release (v1.1.0) in Cargo.toml 2023-03-06 13:52:54 +00:00
22219fd88f ci(actions/rust): explicitly set up dependencies and toolchain override 2023-03-06 12:45:08 +00:00
a9e17ab8c6 style(actions/rust): resolve PR review 2023-03-03 12:08:30 +00:00
2dd948a4a1 ci(actions/rust): align with test-linux job 2023-03-03 12:07:42 +00:00
76cf1bff87 Add scheduled test to Actions for all features
Add a new job to the Rust workflow to run 'cargo build' and 'cargo
test' (on the cron schedule only) with the '--all-features' flag.
This will execute across all three environments: Linux, macOS,
Windows.

Autoformat the Rust workflow file via the Red Hat YAML extension for
Visual Studio Code:
https://marketplace.visualstudio.com/items?itemName=redhat.vscode-yaml
This straightens out whitespace and string quoting for safer parsing.

Fixes #3506.
2023-03-03 12:01:14 +00:00
3d1046369c Merge #3529
3529: Add an analytics on the geo bounding box feature r=ManyTheFish a=irevoire

Fixes #3527

[The specification of the geoBoundingBox feature](https://github.com/meilisearch/specifications/pull/223) has been updated and now introduces a new analytics event to follow the usage of the geoBoundingBox feature in search requests.

Co-authored-by: Tamo <tamo@meilisearch.com>
2023-03-02 11:58:39 +00:00
4f1ccbc495 Merge #3525
3525: Fix phrase search containing stop words r=ManyTheFish a=ManyTheFish

# Summary
A search with a phrase containing only stop words was returning an HTTP 500 error.
This PR drops phrases that contain only stop words before the search starts; a query whose phrases contain only stop words now behaves like a placeholder search.
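
Not from the PR itself, but as a rough illustration of the idea (plain Rust, with a hypothetical `Phrase` alias and stop-word set): phrases whose words are all stop words are dropped before the search runs, so a query reduced to nothing can fall back to a placeholder search.

```rust
use std::collections::HashSet;

/// A phrase is an ordered list of words, e.g. `"the of"` -> ["the", "of"].
type Phrase = Vec<String>;

/// Keep only the phrases that contain at least one non-stop word.
/// If every phrase is dropped and there is no other query term,
/// the caller should treat the request as a placeholder search.
fn filter_stop_word_phrases(phrases: Vec<Phrase>, stop_words: &HashSet<String>) -> Vec<Phrase> {
    phrases
        .into_iter()
        .filter(|phrase| phrase.iter().any(|word| !stop_words.contains(word)))
        .collect()
}

fn main() {
    let stop_words: HashSet<String> = ["the", "of"].iter().map(|s| s.to_string()).collect();
    let phrases = vec![
        vec!["the".to_string(), "of".to_string()],  // only stop words: dropped
        vec!["the".to_string(), "dog".to_string()], // kept
    ];
    let kept = filter_stop_word_phrases(phrases, &stop_words);
    assert_eq!(kept.len(), 1); // the all-stop-word phrase is gone
}
```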

fixes https://github.com/meilisearch/meilisearch/issues/3521

related v1.0.2 PR on milli: https://github.com/meilisearch/milli/pull/779



Co-authored-by: ManyTheFish <many@meilisearch.com>
2023-03-02 10:55:37 +00:00
37489fd495 Return an internal error when the matching word is invalid 2023-03-01 19:05:16 +01:00
c0d8eb295d Bump docker/metadata-action from 3 to 4
Bumps [docker/metadata-action](https://github.com/docker/metadata-action) from 3 to 4.
- [Release notes](https://github.com/docker/metadata-action/releases)
- [Upgrade guide](https://github.com/docker/metadata-action/blob/master/UPGRADE.md)
- [Commits](https://github.com/docker/metadata-action/compare/v3...v4)

---
updated-dependencies:
- dependency-name: docker/metadata-action
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-03-01 17:58:18 +00:00
bcd3f6054a Bump docker/build-push-action from 3 to 4
Bumps [docker/build-push-action](https://github.com/docker/build-push-action) from 3 to 4.
- [Release notes](https://github.com/docker/build-push-action/releases)
- [Commits](https://github.com/docker/build-push-action/compare/v3...v4)

---
updated-dependencies:
- dependency-name: docker/build-push-action
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-03-01 17:58:11 +00:00
3a0314f9de Bump svenstaro/upload-release-action from 2.4.0 to 2.5.0
Bumps [svenstaro/upload-release-action](https://github.com/svenstaro/upload-release-action) from 2.4.0 to 2.5.0.
- [Release notes](https://github.com/svenstaro/upload-release-action/releases)
- [Changelog](https://github.com/svenstaro/upload-release-action/blob/master/CHANGELOG.md)
- [Commits](https://github.com/svenstaro/upload-release-action/compare/2.4.0...2.5.0)

---
updated-dependencies:
- dependency-name: svenstaro/upload-release-action
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-03-01 17:58:05 +00:00
fa4d8b8348 Bump Swatinem/rust-cache from 2.2.0 to 2.2.1
Bumps [Swatinem/rust-cache](https://github.com/Swatinem/rust-cache) from 2.2.0 to 2.2.1.
- [Release notes](https://github.com/Swatinem/rust-cache/releases)
- [Changelog](https://github.com/Swatinem/rust-cache/blob/master/CHANGELOG.md)
- [Commits](https://github.com/Swatinem/rust-cache/compare/v2.2.0...v2.2.1)

---
updated-dependencies:
- dependency-name: Swatinem/rust-cache
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-03-01 17:57:57 +00:00
d9e19c89c5 Merge #3544
3544: Attempt to use default vram budget for faster startup r=Kerollmops a=dureuill

# Pull Request

## Related issue
Follow-up to #3382: addresses the added startup time on Windows/macOS.

## What does this PR do?
- Attempt to skip the budget calculation by using "known good values" instead
- Perform the dichotomic budget calculation as a fallback only when the known value is not actually good (a rough sketch of that fallback appears below)
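
As a loose sketch of the fallback mentioned above (not the actual Meilisearch code; `probe` is a hypothetical stand-in for "try to reserve this much virtual memory"), a dichotomic search narrows down the largest budget for which the probe still succeeds:

```rust
/// Binary-search the largest budget (in bytes) for which `probe` succeeds,
/// assuming `probe` is monotone (if a budget works, any smaller one works too).
fn dichotomic_budget(mut lo: usize, mut hi: usize, probe: impl Fn(usize) -> bool) -> usize {
    while lo < hi {
        // Round up so the loop terminates when hi = lo + 1.
        let mid = lo + (hi - lo + 1) / 2;
        if probe(mid) {
            lo = mid; // mid works: search the upper half
        } else {
            hi = mid - 1; // mid fails: search the lower half
        }
    }
    lo
}

fn main() {
    // Pretend anything up to 2 TiB of address space can be reserved (64-bit target assumed).
    let limit = 2usize << 40;
    let budget = dichotomic_budget(0, usize::MAX / 2, |candidate| candidate <= limit);
    assert_eq!(budget, limit);
}
```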


Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2023-03-01 09:49:38 +00:00
18bf740ee2 Merge #3539
3539: Update migration link to the docs r=curquiza a=curquiza

Fixes https://github.com/meilisearch/meilisearch/issues/3449

Co-authored-by: curquiza <clementine@meilisearch.com>
2023-02-28 18:21:11 +00:00
0202ff8ab4 Attempt to use default budget for faster startup 2023-02-28 10:55:43 +01:00
fbe4ab158e Merge #3543
3543: config: case `experimental_enable_metrics` in snake_case r=dureuill a=dureuill

# Pull Request

Avoids "Error: unknown field `experimental-enable-metrics` at line 1 column 1" error when using the default config file.

Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2023-02-28 09:35:25 +00:00
92318ca573 config: case experimental_enable_metrics in snake_case 2023-02-27 17:14:06 +01:00
6ca7a109b9 Merge #3538
3538: Improve the api key of the metrics r=dureuill a=irevoire

Related to https://github.com/meilisearch/meilisearch/pull/3524#discussion_r1115903998
Update: https://github.com/meilisearch/meilisearch/issues/3523

Right after merging the PR, we changed our minds and decided to update the way we handle API keys on the metrics route.
Now, instead of bypassing all the rules applied to the API key, we forbid the usage of the `/metrics` route if the key has any restrictions on the indexes.
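
A very rough sketch of that rule (hypothetical types, not the actual Meilisearch auth code): before serving `/metrics`, check that the key has no index restriction, i.e. that it is authorized on all indexes.

```rust
use std::collections::HashSet;

/// Hypothetical view of an API key's index restrictions:
/// `None` means "all indexes are allowed".
struct ApiKeyFilter {
    allowed_indexes: Option<HashSet<String>>,
}

impl ApiKeyFilter {
    /// `/metrics` exposes data about every index, so it is only allowed
    /// for keys that are not restricted to a subset of indexes.
    fn can_access_metrics(&self) -> bool {
        self.allowed_indexes.is_none()
    }
}

fn main() {
    let unrestricted = ApiKeyFilter { allowed_indexes: None };
    let restricted = ApiKeyFilter {
        allowed_indexes: Some(["movies".to_string()].into_iter().collect()),
    };
    assert!(unrestricted.can_access_metrics());
    assert!(!restricted.can_access_metrics()); // would get a 403 on /metrics
}
```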

Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2023-02-27 13:46:57 +00:00
d4d4702f1b Rephrase hint message 2023-02-27 13:46:16 +01:00
2648bbca25 Update migration link to the docs 2023-02-23 18:36:30 +01:00
562c86ea01 Merge #3519
3519: Update comments in version bump CI r=irevoire a=curquiza

Following https://github.com/meilisearch/meilisearch/pull/3499

Co-authored-by: curquiza <clementine@meilisearch.com>
2023-02-23 16:46:10 +00:00
7ae10abb6b fix the auth tests 2023-02-23 17:27:42 +01:00
dc533584c6 Forbid the usage of the metrics route if your API key has a limitation on the indexes 2023-02-23 17:13:22 +01:00
442c1e36de Merge #3537
3537: fix a bug where the filestore could try to parse its own tmp file and fail (main) r=irevoire a=curquiza



Co-authored-by: Tamo <tamo@meilisearch.com>
2023-02-23 15:58:05 +00:00
66b5e4b548 fix a bug where the filestore could try to parse its own tmp file and fail 2023-02-23 16:52:41 +01:00
89ac1015f3 Merge #3524
3524: Update the metrics route r=irevoire a=irevoire

Fixes #3523

Make the metrics available by default without a feature flag.
+ Rename the cli-flag to `experimental-enable-metrics`.

Co-authored-by: Tamo <tamo@meilisearch.com>
2023-02-23 15:11:10 +00:00
ca25904c26 Merge #3331
3331: Limit the number of concurrently opened indexes r=dureuill a=dureuill

# Pull Request

## Related issue
Relevant to #1841, fixes #3382

## What does this PR do?

### User standpoint

- Limit the number of concurrently opened indexes (currently, the number of indexes that can be concurrently opened is computed at startup)
- When too many indexes are opened, the least recently used one is closed and its virtual memory is released.
- This allows a user to have an arbitrary number of indexes of an arbitrary size

### Implementation standpoint

- Added an LRU cache map in `index-scheduler::lru` (a minimal sketch of such a map follows this list). A more complete implementation (e.g. with helper functions not used here) is available but would better fit a dedicated crate.
- Use the LRU cache map in the `IndexScheduler`. To simplify the lifecycle of indexes, they are never removed from the cache when they are in the middle of a resize or delete operation. To achieve this, an intermediate `Vec` stores the UUIDs of the indexes that are in the middle of such an operation.
- Upon creating the index scheduler object, compute the total virtual memory that is addressable by using a dichotomic search on the max size of an index. Use this as a base to compute the number of indexes that can be open with 2TiB per index. If the virtual memory address space is lower than 2TiB, then only allow for 1 index of a fraction of that size.
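
As mentioned in the first bullet, here is a minimal, generic sketch of such an LRU map (not the actual `index-scheduler::lru` implementation): every access stamps the entry with an increasing generation number, and inserting into a full map evicts the entry with the oldest stamp.

```rust
use std::collections::HashMap;
use std::hash::Hash;

struct LruMap<K, V> {
    capacity: usize,
    generation: u64,
    entries: HashMap<K, (u64, V)>,
}

impl<K: Eq + Hash + Clone, V> LruMap<K, V> {
    fn new(capacity: usize) -> Self {
        Self { capacity, generation: 0, entries: HashMap::new() }
    }

    fn get(&mut self, key: &K) -> Option<&V> {
        self.generation += 1;
        let generation = self.generation;
        match self.entries.get_mut(key) {
            Some((stamp, value)) => {
                *stamp = generation; // mark as most recently used
                Some(&*value)
            }
            None => None,
        }
    }

    /// Returns the evicted (key, value) pair, if any.
    fn insert(&mut self, key: K, value: V) -> Option<(K, V)> {
        self.generation += 1;
        let mut evicted = None;
        if self.entries.len() >= self.capacity && !self.entries.contains_key(&key) {
            // Find the least recently used key (smallest generation stamp)...
            let lru_key = self
                .entries
                .iter()
                .min_by_key(|(_, (stamp, _))| *stamp)
                .map(|(k, _)| (*k).clone());
            // ...and evict it to make room.
            if let Some(lru_key) = lru_key {
                evicted = self.entries.remove(&lru_key).map(|(_, v)| (lru_key, v));
            }
        }
        self.entries.insert(key, (self.generation, value));
        evicted
    }
}

fn main() {
    let mut cache = LruMap::new(2);
    cache.insert("index-a", 1);
    cache.insert("index-b", 2);
    cache.get(&"index-a"); // touch "index-a" so "index-b" becomes the LRU entry
    let evicted = cache.insert("index-c", 3);
    assert_eq!(evicted.map(|(k, _)| k), Some("index-b"));
}
```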

Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2023-02-23 14:20:52 +00:00
8a1b1a95f3 comment the right of the metrics 2023-02-23 13:59:01 +01:00
8d47d2d018 update the auth api after the rebase 2023-02-23 13:15:51 +01:00
5082cd5e67 update the config file to mention the experimental metrics feature 2023-02-23 12:26:22 +01:00
750a2b6842 Update meilisearch/src/option.rs
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2023-02-23 12:26:22 +01:00
bc7d4112d9 send the cli experimental feature in the analytics 2023-02-23 12:26:22 +01:00
88a18677d0 rename the metrics cli flag 2023-02-23 12:26:22 +01:00
68e30214ca remove the feature flag and reorganize the module slightly 2023-02-23 12:26:21 +01:00
b985b96e4e Merge #3530
3530: Fix highlighter bug r=Kerollmops a=ManyTheFish

# Pull Request

There was a highlighting issue with CJK characters: we were highlighting too many characters, and these additional characters were duplicated after the highlight tag.

## Related issue
Fixes #3517 
Fixes #3526 

## What does this PR do?
- add a test showcasing the bug
- fix the bug by activating the char_map creation of the tokenizer during the highlighting process


Co-authored-by: ManyTheFish <many@meilisearch.com>
2023-02-23 10:59:43 +00:00
71e7900c67 move index_map to file 2023-02-23 11:29:11 +01:00
431782f3ee Move index_mapper to mod.rs 2023-02-23 11:29:11 +01:00
3db613ff77 Don't iterate all indexes manually 2023-02-23 11:29:09 +01:00
5822764be9 Skip computing index budget in tests 2023-02-23 11:23:39 +01:00
c63294f331 Switch to 2TiB default index size, updates documentation 2023-02-23 11:23:39 +01:00
a529bf160c Compute budget 2023-02-23 11:23:39 +01:00
f1119f2dc2 Add dichotomic search to utils 2023-02-23 11:23:39 +01:00
1db7d5d851 Add basic tests for index eviction and resize 2023-02-23 11:23:39 +01:00
80b060f920 Use LRU cache 2023-02-23 11:23:39 +01:00
fdf043580c Add LruMap 2023-02-23 11:23:38 +01:00
f62703cd67 Merge #3534
3534: Update the csv error code from InvalidIndexCsvDelimiter to InvalidDocumentCsvDelimiter r=Kerollmops a=irevoire

Fixes #3533

Co-authored-by: Tamo <tamo@meilisearch.com>
2023-02-23 07:05:12 +00:00
76f82c880d update the csv error code from InvalidIndexCsvDelimiter to InvalidDocumentCsvDelimiter 2023-02-22 19:26:48 +01:00
6eeba3a8ab Merge #3417
3417: Allow multiple searches in a single request r=irevoire a=dureuill

# Pull Request

## Related issue
Fixes #3427

## What does this PR do?

### User standpoint

- Adds a new `/multi-search` entry point (not to be confused with the existing `/{index_uid}/search` entry points) that accepts a POST whose body is an object containing an array of queries.
    - Each query must specify on which index it acts by providing its `indexUid`. Other parameters are identical to the one in the existing search routes (`q`, `limit`, etc.).
    - The response is a JSON object containing an array of the results for each search query as if it had been performed using the `/{index_uid}/search` routes.

### Implementation standpoint

- Refactor authentication module:
  -  Allow tenant token to be checked even without an index in URL
  - Add `meilisearch-auth` as a dependency to `index-scheduler` so that there is a working way to check whether indexes are authorized there, taking both the API key and the tenant token into account (the existing method relied on a behavior that returned the allowed indexes from the API key as long as there wasn't any tenant token)
  - Make `AuthFilter` an object with invariants, so its fields are now private
  - Use the methods of `AuthFilter` to know if an index is authorized, rather than relying on its internal search rules
  - Make tenant token search rules optional and `None` when the `AuthFilter` was not built with a tenant token (a rough sketch of the resulting shape follows this list)
- Add a new `routes::index::search::multiple_search` module containing a post handler that performs the same work as the existing `routes::index::search` post handler, but in a loop.
  - Add various tests
  - Add authentication test suite 
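
Loosely sketching the invariant described above (hypothetical fields and names, not Meilisearch's actual `AuthFilter`): the filter keeps its fields private and exposes a method that takes both the API key's indexes and the optional tenant-token rules into account.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified AuthFilter: fields are private so the
/// invariants can only be set up through a constructor.
pub struct AuthFilter {
    /// Indexes allowed by the API key; `None` means all indexes.
    key_authorized_indexes: Option<HashSet<String>>,
    /// Rules coming from a tenant token; `None` when the filter
    /// was not built from a tenant token.
    tenant_token_indexes: Option<HashSet<String>>,
}

impl AuthFilter {
    pub fn new(
        key_authorized_indexes: Option<HashSet<String>>,
        tenant_token_indexes: Option<HashSet<String>>,
    ) -> Self {
        Self { key_authorized_indexes, tenant_token_indexes }
    }

    /// An index is authorized only if both the API key and the
    /// (optional) tenant token allow it.
    pub fn is_index_authorized(&self, index: &str) -> bool {
        let key_ok = self
            .key_authorized_indexes
            .as_ref()
            .map_or(true, |indexes| indexes.contains(index));
        let tenant_ok = self
            .tenant_token_indexes
            .as_ref()
            .map_or(true, |indexes| indexes.contains(index));
        key_ok && tenant_ok
    }
}

fn main() {
    let key_only: HashSet<String> = ["movies".to_string()].into_iter().collect();
    let filter = AuthFilter::new(Some(key_only), None);
    assert!(filter.is_index_authorized("movies"));
    assert!(!filter.is_index_authorized("products"));
}
```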

### Sample request


<details>
<summary>
Click to see request/response
</summary>

```json
~/datasets
❯ curl \
  -X POST 'http://localhost:7700/multi-search' \
  -H 'Content-Type: application/json' \
  --data-binary '{"queries": [{ "indexUid": "index-0", "q": "toto", "limit": 1 }, {"indexUid": "index-1", "q": "titi", "limit": 1}]}' | jsonxf
{ "results": [
  {
    "indexUid": "index-0",
    "hits": [
      {
        "id": 20480,
        "title": "Toto - 25th Anniversary - Live in Amsterdam",
        "overview": "Filmed in High Definition in Amsterdam on Toto's 25th Anniversary Tour in 2003, this stunning concert captures the band at their very best, reunited with original vocalist Bobby Kimball. The set combines all their hits with tracks from their latest album \"Through the Looking Glass\" and other live favorites, performed in front of a wildly enthusastic sell-out crowd. Extras include 35 minute behind-the-scenes film following the band through various stages of their world tour including footage from Japan, Thailand, South Korea, and France.  Toto celebrate their 25th anniversary with this blistering live concert, filmed in Amsterdam on February 25th, 2003. Proving they've still got exactly what it takes to move a crowd, the band perform a mixture of medley's, solo spots, and huge hits. Tracks include \"Rosanna,\" \"Africa,\" \"Hold The Line,\" a cover of the Beatles' \"While My Guitar Gently Weeps,\" and many more.",
        "genres": [
          "Music"
        ],
        "poster": "https://image.tmdb.org/t/p/w500/7SCbUPwoB8Z7VUIA1Rn1WWwjNiT.jpg",
        "release_date": 1064275200
      }
    ],
    "query": "toto",
    "processingTimeMs": 1,
    "limit": 1,
    "offset": 0,
    "estimatedTotalHits": 17
  },
  {
    "indexUid": "index-1",
    "hits": [
      {
        "id": 41212,
        "title": "Titicut Follies",
        "overview": "The film is a stark and graphic portrayal of the conditions that existed at the State Prison for the Criminally Insane at Bridgewater, Massachusetts. TITICUT FOLLIES documents the various ways the inmates are treated by the guards, social workers and psychiatrists.",
        "genres": [
          "Documentary"
        ],
        "poster": "https://image.tmdb.org/t/p/w500/2Ju5hn1ofOPeP1eRJtQWakiHuhW.jpg",
        "release_date": -70934400
      }
    ],
    "query": "titi",
    "processingTimeMs": 0,
    "limit": 1,
    "offset": 0,
    "estimatedTotalHits": 7
  }
]}
```

</details>


## PR checklist
Please check if your PR fulfills the following requirements:
- [ ] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [ ] Have you read the contributing guidelines?
- [ ] Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!


Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2023-02-22 17:23:45 +00:00
28d6a4466d Make the tokenizer create a char map during highlighting 2023-02-22 17:43:10 +01:00
1ba2fae3ae multi-search/authentication: Add authentication tests 2023-02-22 17:04:12 +01:00
28d6ab78de multi-search: Add multi search tests 2023-02-22 17:04:12 +01:00
3ba5dfb6ec multi-search: Add test server search method for multi search 2023-02-22 17:04:12 +01:00
a23fbf6c7b multi-search: Add search with an array of indexes 2023-02-22 17:04:12 +01:00
596a98f7c6 multi-search: Add basic analytics 2023-02-22 16:37:18 +01:00
14c4a222da Authentication: AuthFilter::allow_index_creation both check that the index is authorized and the IndexCreate action 2023-02-22 16:37:13 +01:00
690bb2e5cc Authentication: Make allow_index_creation a private field 2023-02-22 16:35:52 +01:00
d0f2c9c72e Authentication: Make search_rules optional in AuthFilter 2023-02-22 16:35:52 +01:00
42577403d8 Authentication: Directly pass the authfilter to the index scheduler 2023-02-22 16:35:52 +01:00
c8c5944094 Authentication: is_index_authorized takes into account API key indexes even with a tenant token 2023-02-22 16:35:52 +01:00
4b65851793 Authentication: Refactor authentication check to work for tenant token even without an index in URL
Callers need to manually check `is_index_authorized` when using the route without an index in URL
2023-02-22 16:35:51 +01:00
10d4a1a9af Make ResponseError code and message pub so that they can be modified 2023-02-22 16:35:51 +01:00
ad35edfa32 Add test 2023-02-22 15:47:15 +01:00
033417e9cc add an analytics on the geo bounding box feature 2023-02-22 15:35:26 +01:00
ac5a1e4c4b Merge #3423
3423: Add min and max facet stats r=dureuill a=dureuill

# Pull Request

## Related issue
Fixes #3426

## What does this PR do?

### User standpoint

- When using a `facets` parameter in search, the facets that have numeric values are displayed in a new section of the response called `facetStats` that contains, per facet, the numeric min and max value of the hits returned by the search.

<details>
<summary>
Sample request/response
</summary>

```json
❯ curl \
  -X POST 'http://localhost:7700/indexes/meteorites/search?facets=mass' \
  -H 'Content-Type: application/json' \
  --data-binary '{ "q": "LL6", "facets":["mass", "recclass"], "limit": 5 }' | jsonxf
{
  "hits": [
    {
      "name": "Niger (LL6)",
      "id": "16975",
      "nametype": "Valid",
      "recclass": "LL6",
      "mass": 3.3,
      "fall": "Fell"
    },
    {
      "name": "Appley Bridge",
      "id": "2318",
      "nametype": "Valid",
      "recclass": "LL6",
      "mass": 15000,
      "fall": "Fell",
      "_geo": {
        "lat": 53.58333,
        "lng": -2.71667
      }
    },
    {
      "name": "Athens",
      "id": "4885",
      "nametype": "Valid",
      "recclass": "LL6",
      "mass": 265,
      "fall": "Fell",
      "_geo": {
        "lat": 34.75,
        "lng": -87.0
      }
    },
    {
      "name": "Bandong",
      "id": "4935",
      "nametype": "Valid",
      "recclass": "LL6",
      "mass": 11500,
      "fall": "Fell",
      "_geo": {
        "lat": -6.91667,
        "lng": 107.6
      }
    },
    {
      "name": "Benguerir",
      "id": "30443",
      "nametype": "Valid",
      "recclass": "LL6",
      "mass": 25000,
      "fall": "Fell",
      "_geo": {
        "lat": 32.25,
        "lng": -8.15
      }
    }
  ],
  "query": "LL6",
  "processingTimeMs": 15,
  "limit": 5,
  "offset": 0,
  "estimatedTotalHits": 42,
  "facetDistribution": {
    "mass": {
      "110000": 1,
      "11500": 1,
      "1161": 1,
      "12000": 1,
      "1215.5": 1,
      "127000": 1,
      "15000": 1,
      "1676": 1,
      "1700": 1,
      "1710.5": 1,
      "18000": 1,
      "19000": 1,
      "220000": 1,
      "2220": 1,
      "22300": 1,
      "25000": 2,
      "265": 1,
      "271000": 1,
      "2840": 1,
      "3.3": 1,
      "3000": 1,
      "303": 1,
      "32000": 1,
      "34000": 1,
      "36.1": 1,
      "45000": 1,
      "460": 1,
      "478": 1,
      "483": 1,
      "5500": 2,
      "600": 1,
      "6000": 1,
      "67.8": 1,
      "678": 1,
      "680.5": 1,
      "6930": 1,
      "8": 1,
      "8300": 1,
      "840": 1,
      "8400": 1
    },
    "recclass": {
      "L/LL6": 3,
      "LL6": 39
    }
  },
  "facetStats": {
    "mass": {
      "min": 3.3,
      "max": 271000.0
    }
  }
}
```

</details>

## PR checklist
Please check if your PR fulfills the following requirements:
- [ ] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [ ] Have you read the contributing guidelines?
- [ ] Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!


Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2023-02-22 13:06:43 +00:00
3eb9a08b5c Update comments in version bump CI 2023-02-21 19:14:59 +01:00
900bae3d9d keep phrases that have at least one word 2023-02-21 18:16:51 +01:00
28b7d73d4a Remove an inefficient part of a test on milli 2023-02-21 18:16:51 +01:00
6841f167b4 Add test 2023-02-21 18:02:52 +01:00
c88b6f331f Merge #3482
3482: Optimize meilisearch uffizzi build r=curquiza a=waveywaves

# Pull Request

## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/3476

## What does this PR do?
Even though the Docker cache was already being used for Uffizzi builds, the cache layers didn't seem to persist. This commit moves the Meilisearch build outside the Dockerfile so that we can use the Rust cache action. We also build for the musl target so that the resulting Meilisearch binary can be used by the Uffizzi ttyd image, which is based on Alpine.

Meilisearch build time brought to 5 mins example https://github.com/waveywaves/meilisearch/actions/runs/4142776058

We also update the version of the Uffizzi action used here, which fixes another Uffizzi bug where the environments were not deployed. https://app.uffizzi.com/github.com/waveywaves/meilisearch/pull/2 was built as part of a test for this PR, and we can be sure that the deployment works well now.

## PR checklist
Please check if your PR fulfills the following requirements:
- [ ] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [ ] Have you read the contributing guidelines?
- [ ] Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!


Co-authored-by: Vibhav Bobade <vibhav.bobde@gmail.com>
2023-02-21 16:06:53 +00:00
09a94e0db3 optimize meilisearch uffizzi build
Even though the Docker cache was already being used for Uffizzi builds,
the cache layers didn't seem to persist. This commit moves the
Meilisearch build outside the Dockerfile so that we can use the Rust
cache action. We also build for the musl target so that the resulting
Meilisearch binary can be used by the Uffizzi ttyd image, which is based on Alpine.
2023-02-21 17:25:28 +05:30
39407885c2 Merge #3347
3347: Enhance language detection r=irevoire a=ManyTheFish

## Summary

Some completely unrelated languages can share the same characters. In Meilisearch we detect languages using `whatlang`, which works well on large texts but fails on small search queries, leading to bad segmentation and normalization of the query.

This PR stores the languages detected during indexing in order to reduce the list of languages that can be detected during search.

## Detail

- Create a 19th database mapping the detected scripts and languages to the documents in which they were detected
- Fill the newly created database during indexing
- Create an allow-list from this database and pass it to Charabia (see the sketch after this list)
- Add a test ensuring that a Japanese request containing kanjis only is detected as Japanese and not Chinese
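
As a rough illustration of the allow-list idea (a minimal sketch, not the actual Meilisearch/Charabia code; the language codes and the `detect_query_language` helper are made up for the example):

```rust
use std::collections::HashSet;

/// Restrict query-time language detection to the languages that were
/// actually detected at indexing time (the allow-list).
/// `candidates` is whatever the raw detector would consider for the query,
/// ordered by its own confidence.
fn detect_query_language<'a>(
    candidates: &[&'a str],
    indexed_languages: &HashSet<&'a str>,
) -> Option<&'a str> {
    candidates.iter().copied().find(|lang| indexed_languages.contains(lang))
}

fn main() {
    // Suppose indexing only ever detected Japanese documents.
    let allow_list: HashSet<&str> = ["jpn"].into_iter().collect();
    // For a kanji-only query the raw detector might rank Chinese first...
    let raw_candidates = ["cmn", "jpn"];
    // ...but with the allow-list the query resolves to Japanese.
    assert_eq!(detect_query_language(&raw_candidates, &allow_list), Some("jpn"));
}
```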

## Related issues
Fixes #2403
Fixes #3513

Co-authored-by: f3r10 <frledesma@outlook.com>
Co-authored-by: ManyTheFish <many@meilisearch.com>
Co-authored-by: Many the fish <many@meilisearch.com>
2023-02-21 10:52:13 +00:00
a3e41ba33e Merge #3496
3496: Fix metrics feature r=irevoire a=james-2001

# Pull Request

## Related issue

Resolves: #3469
See also: #2763

## What does this PR do?
As reported, the metrics feature was broken by still using an old reference to `meilisearch_auth::actions`. This commit switches to the new location, `meilisearch_types::keys::actions`.

The original issue was not *that* clear as to exactly what was broken, and the build logs have disappeared, but it seemed to be just this one-line fix. If this is not the case and I've missed the mark, let me know and I'll head back to the drawing board.

## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?

Co-authored-by: James <james.a.may.2001@gmail.com>
2023-02-21 10:13:11 +00:00
ce807d760b Fix formatting issue on Opt struct
Adjust a tab in enable_metrics_route to fix cargo fmt issues

Resolves: #3469
See also: #2763
2023-02-21 09:45:18 +00:00
bbecab8948 fix clippy 2023-02-21 10:18:44 +01:00
5cff435bf6 Add feature flags to Opt structure
Resolves: #3469
See also: #2763
2023-02-21 07:41:41 +00:00
8aa808d51b Merge branch 'main' into enhance-language-detection 2023-02-20 18:14:34 +01:00
23f4e82b53 Add test ensuring that Meilisearch works on kanji only requests 2023-02-20 15:43:29 +01:00
119e6d8811 Update milli/src/search/mod.rs
Co-authored-by: Tamo <tamo@meilisearch.com>
2023-02-20 15:33:10 +01:00
cb8d5f2d4b Update Charabia to 0.7.1 2023-02-20 14:00:31 +01:00
eb28d4c525 add facet test 2023-02-20 13:52:28 +01:00
9ac981d025 Remove some clippy type complexity warns by deboxing iters 2023-02-20 13:52:27 +01:00
74859ecd61 Add min and max facet stats 2023-02-20 13:52:27 +01:00
8ae441a4db Update usage of iterators 2023-02-20 13:52:27 +01:00
042d86cbb3 facet sort ascending/descending now also return the values 2023-02-20 13:52:27 +01:00
91048d209d Fix metrics feature
The metrics feature was relying on old references. It was refactored with inspiration from the `get_stats` method in `meilisearch/src/routes/lib.rs`, and `enable_metrics_routes` was added to the options in `segment_analytics`.

Resolves: #3469
See also: #2763
2023-02-17 20:11:57 +00:00
62358bd31c Fix metrics feature
As reported, the metrics feature was broken by still using an old reference to `meilisearch_auth::actions`. This commit switches to the new location, `meilisearch_types::keys::actions`.

Resolves: #3469
See also: #2763
2023-02-14 17:29:38 +00:00
0bc1a18f52 Use Languages list detected during indexing at search time 2023-02-01 18:57:43 +01:00
643d99e0f9 Add expectancy test 2023-02-01 18:39:54 +01:00
064158e4e2 Update test 2023-02-01 15:34:01 +01:00
77d32d0ee8 Fix codec deserialization 2023-02-01 15:26:26 +01:00
f4569b04ad Update Charabia version 2023-02-01 15:26:26 +01:00
2922c5c899 Fix code format 2023-01-31 11:28:05 +01:00
7681be5367 Format code 2023-01-31 11:28:05 +01:00
50bc156257 Fix tests 2023-01-31 11:28:05 +01:00
d8207356f4 Skip script,language insertion if language is undetected 2023-01-31 11:28:05 +01:00
2d58b28f43 Improve script language codec 2023-01-31 11:28:05 +01:00
fd60a39f1c Format code 2023-01-31 11:28:05 +01:00
369c05732e Add test checking that deleted docids were removed from the script_language_docids database
2023-01-31 11:28:05 +01:00
34d04f3d3f Filter soft-deleted documents from the script_language_docids database 2023-01-31 11:28:05 +01:00
a27f329e3a Add tests for checking that detected script and language associated with document(s) were stored during indexing 2023-01-31 11:28:05 +01:00
b216ddba63 Delete and clear data from the new database 2023-01-31 11:28:05 +01:00
d97fb6117e Extract and index data 2023-01-31 11:28:05 +01:00
c45d1e3610 Create a new database on index and add a specialized codec for it 2023-01-31 11:28:05 +01:00
86 changed files with 5812 additions and 960 deletions

View File

@ -1,24 +1,3 @@
# Compile
FROM rust:alpine3.16 AS compiler
RUN apk add -q --update-cache --no-cache build-base openssl-dev
WORKDIR /meilisearch
ARG COMMIT_SHA
ARG COMMIT_DATE
ARG GIT_TAG
ENV COMMIT_SHA=${COMMIT_SHA} COMMIT_DATE=${COMMIT_DATE} VERGEN_GIT_SEMVER_LIGHTWEIGHT=${GIT_TAG}
ENV RUSTFLAGS="-C target-feature=-crt-static"
COPY . .
RUN set -eux; \
apkArch="$(apk --print-arch)"; \
if [ "$apkArch" = "aarch64" ]; then \
export JEMALLOC_SYS_WITH_LG_PAGE=16; \
fi && \
cargo build --release
# Run
FROM uffizzi/ttyd:alpine
@ -29,19 +8,11 @@ ENV MEILI_NO_ANALYTICS true
RUN apk update --quiet \
&& apk add -q --no-cache libgcc tini curl
# add meilisearch to the `/bin` so you can run it from anywhere and it's easy
# to find.
COPY --from=compiler /meilisearch/target/release/meilisearch /bin/meilisearch
# To stay compatible with the older version of the container (pre v0.27.0) we're
# going to symlink the meilisearch binary in the path to `/meilisearch`
COPY target/x86_64-unknown-linux-musl/release/meilisearch /bin/meilisearch
RUN ln -s /bin/meilisearch /meilisearch
# This directory should hold all the data related to meilisearch so we're going
# to move our PWD in there.
# We don't want to put the meilisearch binary
WORKDIR /meili_data
EXPOSE 7700/tcp
ENTRYPOINT ["tini", "--"]

View File

@ -1,4 +1,4 @@
name: Benchmarks
name: Benchmarks (manual)
on:
workflow_dispatch:

View File

@ -1,4 +1,4 @@
name: Benchmarks indexing (push)
name: Benchmarks of indexing (push)
on:
push:

View File

@ -1,4 +1,4 @@
name: Benchmarks search geo (push)
name: Benchmarks of search for geo (push)
on:
push:

View File

@ -1,4 +1,4 @@
name: Benchmarks search songs (push)
name: Benchmarks of search for songs (push)
on:
push:

View File

@ -1,4 +1,4 @@
name: Benchmarks search wikipedia articles (push)
name: Benchmarks of search for Wikipedia articles (push)
on:
push:

View File

@ -1,28 +0,0 @@
name: Create issue to upgrade dependencies
on:
schedule:
# Run the first of the month, every 3 month
- cron: '0 0 1 */3 *'
workflow_dispatch:
jobs:
create-issue:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- name: Create an issue
uses: actions-ecosystem/action-create-issue@v1
with:
github_token: ${{ secrets.MEILI_BOT_GH_PAT }}
title: Upgrade dependencies
body: |
This issue is about updating Meilisearch dependencies:
- [ ] Cargo toml dependencies of Meilisearch; but also the main engine-team repositories that Meilisearch depends on (charabia, heed...)
- [ ] If new Rust versions have been released, update the Rust version in the Clippy job of this [GitHub Action file](./.github/workflows/rust.yml)
⚠️ To avoid last minute bugs, this issue should only be done at the beginning of the sprint!
The GitHub action dependencies are managed by [Dependabot](./.github/dependabot.yml)
labels: |
dependencies
maintenance

24
.github/workflows/dependency-issue.yml vendored Normal file
View File

@ -0,0 +1,24 @@
name: Create issue to upgrade dependencies
on:
schedule:
# Run the first of the month, every 3 month
- cron: '0 0 1 */3 *'
workflow_dispatch:
jobs:
create-issue:
runs-on: ubuntu-latest
env:
ISSUE_TEMPLATE: issue-template.md
GH_TOKEN: ${{ secrets.MEILI_BOT_GH_PAT }}
steps:
- uses: actions/checkout@v3
- name: Download the issue template
run: curl -s https://raw.githubusercontent.com/meilisearch/engine-team/main/issue-templates/dependency-issue.md > $ISSUE_TEMPLATE
- name: Create issue
run: |
gh issue create \
--title 'Upgrade dependencies' \
--label 'dependencies,maintenance' \
--body-file $ISSUE_TEMPLATE

View File

@ -1,4 +1,4 @@
name: Publish to APT repository & Homebrew
name: Publish to APT & Homebrew
on:
release:
@ -35,7 +35,7 @@ jobs:
- name: Build deb package
run: cargo deb -p meilisearch -o target/debian/meilisearch.deb
- name: Upload debian pkg to release
uses: svenstaro/upload-release-action@2.4.0
uses: svenstaro/upload-release-action@2.5.0
with:
repo_token: ${{ secrets.MEILI_BOT_GH_PAT }}
file: target/debian/meilisearch.deb

View File

@ -1,3 +1,5 @@
name: Publish binaries to GitHub release
on:
workflow_dispatch:
schedule:
@ -5,8 +7,6 @@ on:
release:
types: [published]
name: Publish binaries to release
jobs:
check-version:
name: Check the version validity
@ -54,7 +54,7 @@ jobs:
# No need to upload binaries for dry run (cron)
- name: Upload binaries to release
if: github.event_name == 'release'
uses: svenstaro/upload-release-action@2.4.0
uses: svenstaro/upload-release-action@2.5.0
with:
repo_token: ${{ secrets.MEILI_BOT_GH_PAT }}
file: target/release/meilisearch
@ -87,7 +87,7 @@ jobs:
# No need to upload binaries for dry run (cron)
- name: Upload binaries to release
if: github.event_name == 'release'
uses: svenstaro/upload-release-action@2.4.0
uses: svenstaro/upload-release-action@2.5.0
with:
repo_token: ${{ secrets.MEILI_BOT_GH_PAT }}
file: target/release/${{ matrix.artifact_name }}
@ -123,7 +123,7 @@ jobs:
- name: Upload the binary to release
# No need to upload binaries for dry run (cron)
if: github.event_name == 'release'
uses: svenstaro/upload-release-action@2.4.0
uses: svenstaro/upload-release-action@2.5.0
with:
repo_token: ${{ secrets.MEILI_BOT_GH_PAT }}
file: target/${{ matrix.target }}/release/meilisearch
@ -183,7 +183,7 @@ jobs:
- name: Upload the binary to release
# No need to upload binaries for dry run (cron)
if: github.event_name == 'release'
uses: svenstaro/upload-release-action@2.4.0
uses: svenstaro/upload-release-action@2.5.0
with:
repo_token: ${{ secrets.MEILI_BOT_GH_PAT }}
file: target/${{ matrix.target }}/release/meilisearch

View File

@ -1,4 +1,5 @@
---
name: Publish images to Docker Hub
on:
push:
# Will run for every tag pushed except `latest`
@ -12,8 +13,6 @@ on:
- cron: '0 23 * * *' # Every day at 11:00pm
workflow_dispatch:
name: Publish tagged images to Docker Hub
jobs:
docker:
runs-on: docker

View File

@ -1,4 +1,4 @@
name: Rust
name: Test suite
on:
workflow_dispatch:
@ -25,36 +25,35 @@ jobs:
# Use ubuntu-18.04 to compile with glibc 2.27, which are the production expectations
image: ubuntu:18.04
steps:
- uses: actions/checkout@v3
- name: Install needed dependencies
run: |
apt-get update && apt-get install -y curl
apt-get install build-essential -y
- name: Run test with Rust stable
if: github.event_name != 'schedule'
uses: actions-rs/toolchain@v1
with:
toolchain: stable
override: true
- name: Run test with Rust nightly
if: github.event_name == 'schedule'
uses: actions-rs/toolchain@v1
with:
toolchain: nightly
override: true
# Disable cache due to disk space issues with Windows workers in CI
# - name: Cache dependencies
# uses: Swatinem/rust-cache@v2.2.0
- name: Run cargo check without any default features
uses: actions-rs/cargo@v1
with:
command: build
args: --locked --release --no-default-features --all
- name: Run cargo test
uses: actions-rs/cargo@v1
with:
command: test
args: --locked --release --all
- uses: actions/checkout@v3
- name: Install needed dependencies
run: |
apt-get update && apt-get install -y curl
apt-get install build-essential -y
- name: Run test with Rust stable
if: github.event_name != 'schedule'
uses: actions-rs/toolchain@v1
with:
toolchain: stable
override: true
- name: Run test with Rust nightly
if: github.event_name == 'schedule'
uses: actions-rs/toolchain@v1
with:
toolchain: nightly
override: true
- name: Cache dependencies
uses: Swatinem/rust-cache@v2.2.0
- name: Run cargo check without any default features
uses: actions-rs/cargo@v1
with:
command: build
args: --locked --release --no-default-features --all
- name: Run cargo test
uses: actions-rs/cargo@v1
with:
command: test
args: --locked --release --all
test-others:
name: Tests on ${{ matrix.os }}
@ -64,19 +63,47 @@ jobs:
matrix:
os: [macos-12, windows-2022]
steps:
- uses: actions/checkout@v3
# - name: Cache dependencies
# uses: Swatinem/rust-cache@v2.2.0
- name: Run cargo check without any default features
uses: actions-rs/cargo@v1
with:
command: build
args: --locked --release --no-default-features --all
- name: Run cargo test
uses: actions-rs/cargo@v1
with:
command: test
args: --locked --release --all
- uses: actions/checkout@v3
- name: Cache dependencies
uses: Swatinem/rust-cache@v2.2.0
- name: Run cargo check without any default features
uses: actions-rs/cargo@v1
with:
command: build
args: --locked --release --no-default-features --all
- name: Run cargo test
uses: actions-rs/cargo@v1
with:
command: test
args: --locked --release --all
test-all-features:
name: Tests all features on cron schedule only
runs-on: ubuntu-latest
container:
# Use ubuntu-18.04 to compile with glibc 2.27, which are the production expectations
image: ubuntu:18.04
if: github.event_name == 'schedule'
steps:
- uses: actions/checkout@v3
- name: Install needed dependencies
run: |
apt-get update
apt-get install --assume-yes build-essential curl
- uses: actions-rs/toolchain@v1
with:
toolchain: stable
override: true
- name: Run cargo build with all features
uses: actions-rs/cargo@v1
with:
command: build
args: --workspace --locked --release --all-features
- name: Run cargo test with all features
uses: actions-rs/cargo@v1
with:
command: test
args: --workspace --locked --release --all-features
# We run tests in debug also, to make sure that the debug_assertions are hit
test-debug:
@ -95,8 +122,8 @@ jobs:
with:
toolchain: stable
override: true
# - name: Cache dependencies
# uses: Swatinem/rust-cache@v2.2.0
- name: Cache dependencies
uses: Swatinem/rust-cache@v2.2.0
- name: Run tests in debug
uses: actions-rs/cargo@v1
with:
@ -114,8 +141,8 @@ jobs:
toolchain: 1.67.0
override: true
components: clippy
# - name: Cache dependencies
# uses: Swatinem/rust-cache@v2.2.0
- name: Cache dependencies
uses: Swatinem/rust-cache@v2.2.0
- name: Run cargo clippy
uses: actions-rs/cargo@v1
with:
@ -134,8 +161,8 @@ jobs:
toolchain: nightly
override: true
components: rustfmt
# - name: Cache dependencies
# uses: Swatinem/rust-cache@v2.2.0
- name: Cache dependencies
uses: Swatinem/rust-cache@v2.2.0
- name: Run cargo fmt
# Since we never ran the `build.rs` script in the benchmark directory we are missing one auto-generated import file.
# Since we want to trigger (and fail) this action as fast as possible, instead of building the benchmark crate

View File

@ -14,6 +14,26 @@ jobs:
- name: checkout
uses: actions/checkout@v3
- run: sudo apt-get install musl-tools
- uses: actions-rs/toolchain@v1
with:
toolchain: stable
override: true
target: x86_64-unknown-linux-musl
- name: Cache dependencies
uses: Swatinem/rust-cache@v2.2.1
- name: Run cargo check without any default features
uses: actions-rs/cargo@v1
with:
command: build
args: --target x86_64-unknown-linux-musl --release
- name: Remove dockerignore so we can use the target folder in our docker build
run: rm -f .dockerignore
- name: Set up QEMU
uses: docker/setup-qemu-action@v2
@ -26,14 +46,14 @@ jobs:
- name: Docker metadata
id: meta
uses: docker/metadata-action@v3
uses: docker/metadata-action@v4
with:
images: registry.uffizzi.com/${{ env.UUID_TAG }}
tags: |
type=raw,value=60d
- name: Build Image
uses: docker/build-push-action@v3
uses: docker/build-push-action@v4
with:
context: ./
file: .github/uffizzi/Dockerfile

View File

@ -82,7 +82,7 @@ jobs:
name: Use Remote Workflow to Preview on Uffizzi
needs:
- cache-compose-file
uses: UffizziCloud/preview-action/.github/workflows/reusable.yaml@desc
uses: UffizziCloud/preview-action/.github/workflows/reusable.yaml@v2
with:
# If this workflow was triggered by a PR close event, cache-key will be an empty string
# and this reusable workflow will delete the preview deployment.
@ -95,8 +95,8 @@ jobs:
`meilisearch` command. You should be able to access this instance of meilisearch running in
the preview from the link Meilisearch Endpoint link given below.
Web Terminal Endpoint : ${{ needs.cache-compose-file.outputs.expected-url }}
Meilisearch Endpoint : ${{ needs.cache-compose-file.outputs.expected-url }}/meilisearch
Web Terminal Endpoint : <uffizzi-url>
Meilisearch Endpoint : <uffizzi-url>/meilisearch
permissions:
contents: read
pull-requests: write

View File

@ -1,4 +1,4 @@
name: Update Meilisearch version in all Cargo.toml files
name: Update Meilisearch version in Cargo.toml
on:
workflow_dispatch:
@ -14,7 +14,7 @@ env:
jobs:
update-version-cargo-toml:
name: Update version in Cargo.toml files
name: Update version in Cargo.toml
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
@ -25,7 +25,7 @@ jobs:
override: true
- name: Install sd
run: cargo install sd
- name: Update Cargo.toml files
- name: Update Cargo.toml file
run: |
raw_new_version=$(echo $NEW_VERSION | cut -d 'v' -f 2)
new_string="version = \"$raw_new_version\""
@ -35,13 +35,13 @@ jobs:
- name: Commit and push the changes to the ${{ env.NEW_BRANCH }} branch
uses: EndBug/add-and-commit@v9
with:
message: "Update version for the next release (${{ env.NEW_VERSION }}) in Cargo.toml files"
message: "Update version for the next release (${{ env.NEW_VERSION }}) in Cargo.toml"
new_branch: ${{ env.NEW_BRANCH }}
- name: Create the PR pointing to ${{ github.ref_name }}
run: |
gh pr create \
--title "Update version for the next release ($NEW_VERSION) in Cargo.toml files" \
--body '⚠️ This PR is automatically generated. Check the new version is the expected one before merging.' \
--title "Update version for the next release ($NEW_VERSION) in Cargo.toml" \
--body '⚠️ This PR is automatically generated. Check the new version is the expected one and Cargo.lock has been updated before merging.' \
--label 'skip changelog' \
--milestone $NEW_VERSION \
--base $GITHUB_REF_NAME

326
Cargo.lock generated
View File

@ -252,7 +252,7 @@ source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "9e8b47f52ea9bae42228d07ec09eb676433d7c4ed1ebdf0f1d1c29ed446f1ab8"
dependencies = [
"cfg-if",
"cipher",
"cipher 0.3.0",
"cpufeatures",
"opaque-debug",
]
@ -410,7 +410,7 @@ checksum = "b645a089122eccb6111b4f81cbc1a49f5900ac4666bb93ac027feaecf15607bf"
[[package]]
name = "benchmarks"
version = "1.0.0"
version = "1.1.0"
dependencies = [
"anyhow",
"bytes",
@ -517,18 +517,23 @@ dependencies = [
"serde",
]
[[package]]
name = "build_const"
version = "0.2.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "b4ae4235e6dac0694637c763029ecea1a2ec9e4e06ec2729bd21ba4d9c863eb7"
[[package]]
name = "bumpalo"
version = "3.11.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "572f695136211188308f16ad2ca5c851a712c464060ae6974944458eb83880ba"
[[package]]
name = "bus"
version = "2.3.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "80cb4625f5b60155ff1018c9d4ce2e38bf5ae3e5780dfab9fa68bb44a6b751e2"
dependencies = [
"crossbeam-channel",
"num_cpus",
"parking_lot_core",
]
[[package]]
name = "byte-unit"
version = "4.0.18"
@ -647,6 +652,17 @@ version = "1.0.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "baf1de4339761588bc0619e3cbc0120ee582ebb74b53b4efbf79117bd2da40fd"
[[package]]
name = "chacha20"
version = "0.9.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "c7fc89c7c5b9e7a02dfe45cd2367bae382f9ed31c61ca8debe5f827c420a2f08"
dependencies = [
"cfg-if",
"cipher 0.4.4",
"cpufeatures",
]
[[package]]
name = "change-detection"
version = "1.2.0"
@ -659,16 +675,19 @@ dependencies = [
[[package]]
name = "charabia"
version = "0.7.0"
version = "0.7.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "b57f9571f611796ea38e5a9c12e5ce37476f70397b032757f8dfe0c7b9bc5637"
checksum = "1ad3d9667a6b4e03813162c22c4d58235c2dc25d580d60837ce29199038341c9"
dependencies = [
"cow-utils",
"csv",
"deunicode",
"fst",
"irg-kvariants",
"jieba-rs",
"lindera",
"lindera-ipadic",
"lindera-ko-dic",
"once_cell",
"pinyin",
"serde",
@ -715,20 +734,25 @@ dependencies = [
"generic-array",
]
[[package]]
name = "cipher"
version = "0.4.4"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "773f3b9af64447d2ce9850330c473515014aa235e6a783b02db81ff39e4a3dad"
dependencies = [
"crypto-common",
"inout",
]
[[package]]
name = "clap"
version = "3.2.23"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "71655c45cb9845d3270c9d6df84ebe72b4dad3c2ba3f7023ad47c144e4e473a5"
dependencies = [
"atty",
"bitflags",
"clap_derive 3.2.18",
"clap_lex 0.2.4",
"indexmap",
"once_cell",
"strsim",
"termcolor",
"textwrap",
]
@ -739,7 +763,7 @@ source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "a7db700bc935f9e43e88d00b0850dae18a63773cfbec6d8e070fccf7fef89a39"
dependencies = [
"bitflags",
"clap_derive 4.0.21",
"clap_derive",
"clap_lex 0.3.0",
"is-terminal",
"once_cell",
@ -747,19 +771,6 @@ dependencies = [
"termcolor",
]
[[package]]
name = "clap_derive"
version = "3.2.18"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "ea0c8bce528c4be4da13ea6fead8965e95b6073585a2f05204bd8f4119f82a65"
dependencies = [
"heck",
"proc-macro-error",
"proc-macro2",
"quote",
"syn",
]
[[package]]
name = "clap_derive"
version = "4.0.21"
@ -791,6 +802,24 @@ dependencies = [
"os_str_bytes",
]
[[package]]
name = "cluster"
version = "1.1.0"
dependencies = [
"bus",
"crossbeam",
"ductile",
"log",
"meilisearch-types",
"roaring",
"serde",
"serde_json",
"synchronoise",
"thiserror",
"time",
"uuid 1.3.0",
]
[[package]]
name = "concat-arrays"
version = "0.1.2"
@ -873,15 +902,6 @@ dependencies = [
"libc",
]
[[package]]
name = "crc"
version = "1.8.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "d663548de7f5cca343f1e0a48d14dcfb0e9eb4e079ec58883b7251539fa10aeb"
dependencies = [
"build_const",
]
[[package]]
name = "crc32fast"
version = "1.3.2"
@ -1178,9 +1198,24 @@ dependencies = [
"winapi",
]
[[package]]
name = "ductile"
version = "0.3.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "12cde25956886749c891a27249630ae99471f1ba05c4a924aad1a6ffe6932812"
dependencies = [
"anyhow",
"bincode",
"chacha20",
"crossbeam-channel",
"log",
"rand",
"serde",
]
[[package]]
name = "dump"
version = "1.0.0"
version = "1.1.0"
dependencies = [
"anyhow",
"big_s",
@ -1200,14 +1235,14 @@ dependencies = [
"tempfile",
"thiserror",
"time",
"uuid 1.2.2",
"uuid 1.3.0",
]
[[package]]
name = "either"
version = "1.8.0"
version = "1.8.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "90e5c1c8368803113bf0c9584fc495a58b86dc8a29edbf8fe877d21d9507e797"
checksum = "7fcaabb2fef8c910e7f4c7ce9f67a1283a1715879a7c230ca9d6d1ae31f16d91"
dependencies = [
"serde",
]
@ -1333,6 +1368,19 @@ dependencies = [
"termcolor",
]
[[package]]
name = "env_logger"
version = "0.10.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "85cdab6a89accf66733ad5a1693a4dcced6aeff64602b634530dd73c1f3ee9f0"
dependencies = [
"humantime",
"is-terminal",
"log",
"regex",
"termcolor",
]
[[package]]
name = "errno"
version = "0.2.8"
@ -1388,12 +1436,12 @@ dependencies = [
[[package]]
name = "file-store"
version = "1.0.0"
version = "1.1.0"
dependencies = [
"faux",
"tempfile",
"thiserror",
"uuid 1.2.2",
"uuid 1.3.0",
]
[[package]]
@ -1410,7 +1458,7 @@ dependencies = [
[[package]]
name = "filter-parser"
version = "1.0.0"
version = "1.1.0"
dependencies = [
"insta",
"nom",
@ -1430,7 +1478,7 @@ dependencies = [
[[package]]
name = "flatten-serde-json"
version = "1.0.0"
version = "1.1.0"
dependencies = [
"criterion",
"serde_json",
@ -1907,11 +1955,12 @@ dependencies = [
[[package]]
name = "index-scheduler"
version = "1.0.0"
version = "1.1.0"
dependencies = [
"anyhow",
"big_s",
"bincode",
"cluster",
"crossbeam",
"csv",
"derive_builder",
@ -1921,6 +1970,7 @@ dependencies = [
"insta",
"log",
"meili-snap",
"meilisearch-auth",
"meilisearch-types",
"nelson",
"page_size 0.5.0",
@ -1931,7 +1981,7 @@ dependencies = [
"tempfile",
"thiserror",
"time",
"uuid 1.2.2",
"uuid 1.3.0",
]
[[package]]
@ -1945,6 +1995,15 @@ dependencies = [
"serde",
]
[[package]]
name = "inout"
version = "0.1.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "a0c10553d664a4d0bcff9f4215d0aac67a639cc68ef660840afe309b807bc9f5"
dependencies = [
"generic-array",
]
[[package]]
name = "insta"
version = "1.26.0"
@ -1986,6 +2045,17 @@ version = "2.7.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "30e22bd8629359895450b59ea7a776c850561b96a3b1d31321c1949d9e6c9146"
[[package]]
name = "irg-kvariants"
version = "0.1.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "c73214298363629cf9dbfc93b426808865ee3c121029778cb31b1284104fdf78"
dependencies = [
"csv",
"once_cell",
"serde",
]
[[package]]
name = "is-terminal"
version = "0.4.2"
@ -2054,7 +2124,7 @@ dependencies = [
[[package]]
name = "json-depth-checker"
version = "1.0.0"
version = "1.1.0"
dependencies = [
"criterion",
"serde_json",
@ -2074,6 +2144,15 @@ dependencies = [
"simple_asn1",
]
[[package]]
name = "kanaria"
version = "0.2.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "c0f9d9652540055ac4fded998a73aca97d965899077ab1212587437da44196ff"
dependencies = [
"bitflags",
]
[[package]]
name = "language-tags"
version = "0.3.2"
@ -2143,14 +2222,15 @@ dependencies = [
[[package]]
name = "lindera"
version = "0.17.0"
version = "0.21.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "082ca91ac4d1557028ace9bfb8cee1500d156a4574dda93cfcdcf4caaebb9bd7"
checksum = "0f33a20bb9cbf95572b2d2f40d7040c8d8c7ad09ae20e1f6513db6ef2564dfc5"
dependencies = [
"anyhow",
"bincode",
"byteorder",
"encoding",
"kanaria",
"lindera-cc-cedict-builder",
"lindera-core",
"lindera-dictionary",
@ -2159,24 +2239,27 @@ dependencies = [
"lindera-ko-dic",
"lindera-ko-dic-builder",
"lindera-unidic-builder",
"regex",
"serde",
"serde_json",
"thiserror",
"unicode-blocks",
"unicode-normalization",
"yada",
]
[[package]]
name = "lindera-cc-cedict-builder"
version = "0.17.0"
version = "0.21.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "a8967615a6d85320ec2755e1435c36165467ba01a79026adc3f86dad1b668df3"
checksum = "60c3b379251edadbac7a5fdb31e482274e11dae6ab6cc789d0d86cf34369cf49"
dependencies = [
"anyhow",
"bincode",
"byteorder",
"clap 3.2.23",
"csv",
"encoding",
"env_logger",
"env_logger 0.10.0",
"glob",
"lindera-core",
"lindera-decompress",
@ -2185,16 +2268,28 @@ dependencies = [
]
[[package]]
name = "lindera-core"
version = "0.17.0"
name = "lindera-compress"
version = "0.21.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "0e8ed3cea13f73557a4574a179b1518670a3b70bfdad120521313b03cc89380e"
checksum = "a8d0ea3de5625e2381cac94e518d3b56103fde56bc0dce840fe875c1e871b125"
dependencies = [
"anyhow",
"flate2",
"lindera-decompress",
]
[[package]]
name = "lindera-core"
version = "0.21.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "2281747b98fdd46bcc54ce7fdb6870dad9f67ddb3dc086c47b6704f3e1178cd5"
dependencies = [
"anyhow",
"bincode",
"byteorder",
"encoding_rs",
"log",
"once_cell",
"serde",
"thiserror",
"yada",
@ -2202,20 +2297,20 @@ dependencies = [
[[package]]
name = "lindera-decompress"
version = "0.17.0"
version = "0.21.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "2badb41828f89cfa6452db0a66da77897c0a04478304de26c8b2b36613e08d43"
checksum = "52101bd454754c506305ab897af5ac2ae41fe91e3272c1ff5c6a02a089dfaefd"
dependencies = [
"anyhow",
"lzma-rs",
"flate2",
"serde",
]
[[package]]
name = "lindera-dictionary"
version = "0.17.0"
version = "0.21.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "e219722c9f56b920c231210e7c25d8b5d35b508e7a2fd69d368916c4b1c926f6"
checksum = "af1c6668848f1d30d216c99093a3ed3fe125c105fa12a4aeed5a1861dc01dd52"
dependencies = [
"anyhow",
"bincode",
@ -2225,15 +2320,16 @@ dependencies = [
[[package]]
name = "lindera-ipadic"
version = "0.17.0"
version = "0.21.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "2c8e87c8362c724e8188fb7d9b6d184cac15d01369295e9bff7812b630d57e3b"
checksum = "693098007200fa43fd5cdc9ca8740f371327369672ce812cd87a1f6344971e31"
dependencies = [
"bincode",
"byteorder",
"encoding",
"flate2",
"lindera-core",
"lindera-decompress",
"lindera-ipadic-builder",
"once_cell",
"tar",
@ -2241,19 +2337,19 @@ dependencies = [
[[package]]
name = "lindera-ipadic-builder"
version = "0.17.0"
version = "0.21.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "1439e95852e444a116424086dc64d709c90e8af269ff7d2c2c4020f666f8dfab"
checksum = "7b6b7240d097a8fc37ee8f90ebff02c4db0ba5325ecb0dacb6da3724596798c9"
dependencies = [
"anyhow",
"bincode",
"byteorder",
"clap 3.2.23",
"csv",
"encoding_rs",
"encoding_rs_io",
"env_logger",
"env_logger 0.10.0",
"glob",
"lindera-compress",
"lindera-core",
"lindera-decompress",
"log",
@ -2263,15 +2359,16 @@ dependencies = [
[[package]]
name = "lindera-ko-dic"
version = "0.17.0"
version = "0.21.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "cb15f949220da45872d774b7831bb030855ec083435c907499782f8558c8a203"
checksum = "abd3c5a4addeb61ca66788a3dd1fd51093e6cd8fea1d997042ada5aa60e8cc5e"
dependencies = [
"bincode",
"byteorder",
"encoding",
"flate2",
"lindera-core",
"lindera-decompress",
"lindera-ko-dic-builder",
"once_cell",
"tar",
@ -2279,18 +2376,18 @@ dependencies = [
[[package]]
name = "lindera-ko-dic-builder"
version = "0.17.0"
version = "0.21.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "fde5a7352f4754be4f741e90bf4dff38a12a6572ab3880d0cf688e1166b8d82b"
checksum = "512bb1393a9281e0b13704319d1343b7931416865852d9d7b7c0178431518326"
dependencies = [
"anyhow",
"bincode",
"byteorder",
"clap 3.2.23",
"csv",
"encoding",
"env_logger",
"env_logger 0.10.0",
"glob",
"lindera-compress",
"lindera-core",
"lindera-decompress",
"log",
@ -2299,17 +2396,16 @@ dependencies = [
[[package]]
name = "lindera-unidic-builder"
version = "0.17.0"
version = "0.21.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "f1451b2ed8a7184a5f815d84f99d358c1d67297305831453dfdc0eb5d08e22b5"
checksum = "7f575a27f8ba67c15fe16ebf7d277a0ac04e8c8a0f72670ebc2443da9d41c450"
dependencies = [
"anyhow",
"bincode",
"byteorder",
"clap 3.2.23",
"csv",
"encoding",
"env_logger",
"env_logger 0.10.0",
"glob",
"lindera-core",
"lindera-decompress",
@ -2398,16 +2494,6 @@ dependencies = [
"syn",
]
[[package]]
name = "lzma-rs"
version = "0.2.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "aba8ecb0450dfabce4ad72085eed0a75dffe8f21f7ada05638564ea9db2d7fb1"
dependencies = [
"byteorder",
"crc",
]
[[package]]
name = "manifest-dir-macros"
version = "0.1.16"
@ -2434,7 +2520,7 @@ checksum = "490cc448043f947bae3cbee9c203358d62dbee0db12107a74be5c30ccfd09771"
[[package]]
name = "meili-snap"
version = "1.0.0"
version = "1.1.0"
dependencies = [
"insta",
"md5",
@ -2443,7 +2529,7 @@ dependencies = [
[[package]]
name = "meilisearch"
version = "1.0.0"
version = "1.1.0"
dependencies = [
"actix-cors",
"actix-http",
@ -2462,11 +2548,12 @@ dependencies = [
"bytes",
"cargo_toml",
"clap 4.0.32",
"cluster",
"crossbeam-channel",
"deserr",
"dump",
"either",
"env_logger",
"env_logger 0.9.3",
"file-store",
"flate2",
"fst",
@ -2522,7 +2609,7 @@ dependencies = [
"tokio-stream",
"toml",
"urlencoding",
"uuid 1.2.2",
"uuid 1.3.0",
"vergen",
"walkdir",
"yaup",
@ -2531,9 +2618,10 @@ dependencies = [
[[package]]
name = "meilisearch-auth"
version = "1.0.0"
version = "1.1.0"
dependencies = [
"base64 0.13.1",
"cluster",
"enum-iterator",
"hmac",
"maplit",
@ -2545,12 +2633,12 @@ dependencies = [
"sha2",
"thiserror",
"time",
"uuid 1.2.2",
"uuid 1.3.0",
]
[[package]]
name = "meilisearch-types"
version = "1.0.0"
version = "1.1.0"
dependencies = [
"actix-web",
"anyhow",
@ -2575,7 +2663,7 @@ dependencies = [
"thiserror",
"time",
"tokio",
"uuid 1.2.2",
"uuid 1.3.0",
]
[[package]]
@ -2604,7 +2692,7 @@ dependencies = [
[[package]]
name = "milli"
version = "1.0.0"
version = "1.1.0"
dependencies = [
"big_s",
"bimap",
@ -2650,7 +2738,7 @@ dependencies = [
"tempfile",
"thiserror",
"time",
"uuid 1.2.2",
"uuid 1.3.0",
]
[[package]]
@ -2958,7 +3046,7 @@ checksum = "478c572c3d73181ff3c2539045f6eb99e5491218eae919370993b890cdbdd98e"
[[package]]
name = "permissive-json-pointer"
version = "1.0.0"
version = "1.1.0"
dependencies = [
"big_s",
"serde_json",
@ -3488,9 +3576,9 @@ checksum = "58bc9567378fc7690d6b2addae4e60ac2eeea07becb2c64b9f218b53865cba2a"
[[package]]
name = "serde"
version = "1.0.152"
version = "1.0.155"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "bb7d1f0d3021d347a83e556fc4683dea2ea09d87bccdf88ff5c12545d89d5efb"
checksum = "71f2b4817415c6d4210bfe1c7bfcf4801b2d904cb4d0e1a8fdb651013c9e86b8"
dependencies = [
"serde_derive",
]
@ -3506,9 +3594,9 @@ dependencies = [
[[package]]
name = "serde_derive"
version = "1.0.152"
version = "1.0.155"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "af487d118eecd09402d70a5d72551860e788df87b464af30e5ea6a38c75c541e"
checksum = "d071a94a3fac4aff69d023a7f411e33f40f3483f8c5190b1953822b6b76d7630"
dependencies = [
"proc-macro2",
"quote",
@ -3517,9 +3605,9 @@ dependencies = [
[[package]]
name = "serde_json"
version = "1.0.91"
version = "1.0.94"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "877c235533714907a8c2464236f5c4b2a17262ef1bd71f38f35ea592c8da6883"
checksum = "1c533a59c9d8a93a09c6ab31f0fd5e5f4dd1b8fc9434804029839884765d04ea"
dependencies = [
"indexmap",
"itoa 1.0.5",
@ -3805,18 +3893,18 @@ checksum = "222a222a5bfe1bba4a77b45ec488a741b3cb8872e5e499451fd7d0129c9c7c3d"
[[package]]
name = "thiserror"
version = "1.0.38"
version = "1.0.39"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "6a9cd18aa97d5c45c6603caea1da6628790b37f7a34b6ca89522331c5180fed0"
checksum = "a5ab016db510546d856297882807df8da66a16fb8c4101cb8b30054b0d5b2d9c"
dependencies = [
"thiserror-impl",
]
[[package]]
name = "thiserror-impl"
version = "1.0.38"
version = "1.0.39"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "1fb327af4685e4d03fa8cbcf1716380da910eeb2bb8be417e7f9fd3fb164f36f"
checksum = "5420d42e90af0c38c3290abcca25b9b3bdf379fc9f55c528f53a269d9c9a267e"
dependencies = [
"proc-macro2",
"quote",
@ -3825,9 +3913,9 @@ dependencies = [
[[package]]
name = "time"
version = "0.3.17"
version = "0.3.20"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "a561bf4617eebd33bca6434b988f39ed798e527f51a1e797d0ee4f61c0a38376"
checksum = "cd0cbfecb4d19b5ea75bb31ad904eb5b9fa13f21079c3b92017ebdf4999a5890"
dependencies = [
"itoa 1.0.5",
"serde",
@ -3843,9 +3931,9 @@ checksum = "2e153e1f1acaef8acc537e68b44906d2db6436e2b35ac2c6b42640fff91f00fd"
[[package]]
name = "time-macros"
version = "0.2.6"
version = "0.2.8"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "d967f99f534ca7e495c575c62638eebc2898a8c84c119b89e250477bc4ba16b2"
checksum = "fd80a657e71da814b8e5d60d3374fc6d35045062245d80224748ae522dd76f36"
dependencies = [
"time-core",
]
@ -4011,6 +4099,12 @@ version = "0.3.8"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "099b7128301d285f79ddd55b9a83d5e6b9e97c92e0ea0daebee7263e932de992"
[[package]]
name = "unicode-blocks"
version = "0.1.5"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "9de2be6bad6f56ce8373d377e611cbb2265de3a656138065609ce82e217aad70"
[[package]]
name = "unicode-ident"
version = "1.0.6"
@ -4084,9 +4178,9 @@ dependencies = [
[[package]]
name = "uuid"
version = "1.2.2"
version = "1.3.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "422ee0de9031b5b948b97a8fc04e3aa35230001a722ddd27943e0be31564ce4c"
checksum = "1674845326ee10d37ca60470760d4288a6f80f304007d92e5c53bab78c9cfd79"
dependencies = [
"getrandom",
"serde",

View File

@ -9,6 +9,7 @@ members = [
"dump",
"file-store",
"permissive-json-pointer",
"cluster",
"milli",
"filter-parser",
"flatten-serde-json",
@ -17,7 +18,7 @@ members = [
]
[workspace.package]
version = "1.0.0"
version = "1.1.0"
authors = ["Quentin de Quelen <quentin@dequelen.me>", "Clément Renault <clement@meilisearch.com>"]
description = "Meilisearch HTTP server"
homepage = "https://meilisearch.com"

View File

@ -29,7 +29,7 @@ fn bench_formatting(c: &mut criterion::Criterion) {
(vec![Rc::new(MatchingWord::new("thedoord".to_string(), 1, true).unwrap())], vec![0, 1, 2]),
(vec![Rc::new(MatchingWord::new("doord".to_string(), 1, true).unwrap())], vec![1, 2]),
]
), TokenizerBuilder::default().build()),
).unwrap(), TokenizerBuilder::default().build()),
},
];

25
cluster/Cargo.toml Normal file
View File

@ -0,0 +1,25 @@
[package]
name = "cluster"
publish = false
version.workspace = true
authors.workspace = true
description.workspace = true
homepage.workspace = true
readme.workspace = true
edition.workspace = true
license.workspace = true
[dependencies]
ductile = "0.3.0"
serde = { version = "1.0.155", features = ["derive"] }
serde_json = "1.0.94"
thiserror = "1.0.39"
meilisearch-types = { path = "../meilisearch-types" }
roaring = { version = "0.10.1", features = ["serde"] }
log = "0.4.17"
crossbeam = "0.8.2"
bus = "2.3.0"
time = "0.3.20"
uuid = { version = "1.3.0", features = ["v4"] }
synchronoise = "1.0.1"

148
cluster/src/batch.rs Normal file
View File

@ -0,0 +1,148 @@
use meilisearch_types::milli::update::IndexDocumentsMethod;
use meilisearch_types::settings::{Settings, Unchecked};
use meilisearch_types::tasks::TaskId;
use roaring::RoaringBitmap;
use serde::{Deserialize, Serialize};
use time::OffsetDateTime;
use uuid::Uuid;
/// Represents a combination of tasks that can all be processed at the same time.
///
/// A batch contains the set of tasks that it represents (accessible through
/// [`self.ids()`](Batch::ids)), as well as additional information on how to
/// be processed.
#[derive(Debug, Clone, Serialize, Deserialize)]
pub enum Batch {
TaskCancelation {
/// The task cancelation itself.
task: TaskId,
/// The date and time at which the previously processing tasks started.
previous_started_at: OffsetDateTime,
/// The list of tasks that were processing when this task cancelation appeared.
previous_processing_tasks: RoaringBitmap,
},
TaskDeletion(TaskId),
SnapshotCreation(Vec<TaskId>),
Dump(TaskId),
IndexOperation {
op: IndexOperation,
must_create_index: bool,
},
IndexCreation {
index_uid: String,
primary_key: Option<String>,
task: TaskId,
},
IndexUpdate {
index_uid: String,
primary_key: Option<String>,
task: TaskId,
},
IndexDeletion {
index_uid: String,
tasks: Vec<TaskId>,
index_has_been_created: bool,
},
IndexSwap {
task: TaskId,
},
}
#[derive(Debug, Clone, Serialize, Deserialize)]
pub enum DocumentOperation {
Add(Uuid),
Delete(Vec<String>),
}
/// A [batch](Batch) that combines multiple tasks operating on an index.
#[derive(Debug, Clone, Serialize, Deserialize)]
pub enum IndexOperation {
DocumentOperation {
index_uid: String,
primary_key: Option<String>,
method: IndexDocumentsMethod,
documents_counts: Vec<u64>,
operations: Vec<DocumentOperation>,
tasks: Vec<TaskId>,
},
DocumentDeletion {
index_uid: String,
// The vec associated with each document deletion tasks.
documents: Vec<Vec<String>>,
tasks: Vec<TaskId>,
},
DocumentClear {
index_uid: String,
tasks: Vec<TaskId>,
},
Settings {
index_uid: String,
// The boolean indicates if it's a settings deletion or creation.
settings: Vec<(bool, Settings<Unchecked>)>,
tasks: Vec<TaskId>,
},
DocumentClearAndSetting {
index_uid: String,
cleared_tasks: Vec<TaskId>,
// The boolean indicates if it's a settings deletion or creation.
settings: Vec<(bool, Settings<Unchecked>)>,
settings_tasks: Vec<TaskId>,
},
SettingsAndDocumentOperation {
index_uid: String,
primary_key: Option<String>,
method: IndexDocumentsMethod,
documents_counts: Vec<u64>,
operations: Vec<DocumentOperation>,
document_import_tasks: Vec<TaskId>,
// The boolean indicates if it's a settings deletion or creation.
settings: Vec<(bool, Settings<Unchecked>)>,
settings_tasks: Vec<TaskId>,
},
}
impl Batch {
pub fn ids(&self) -> impl Iterator<Item = TaskId> {
type Ret = Box<dyn Iterator<Item = TaskId>>;
match self {
Batch::TaskCancelation { task, .. } => Box::new(std::iter::once(*task)) as Ret,
Batch::TaskDeletion(task) => Box::new(std::iter::once(*task)) as Ret,
Batch::SnapshotCreation(tasks) => Box::new(tasks.clone().into_iter()) as Ret,
Batch::Dump(task) => Box::new(std::iter::once(*task)) as Ret,
Batch::IndexOperation { op, .. } => match op {
IndexOperation::DocumentOperation { tasks, .. } => {
Box::new(tasks.clone().into_iter()) as Ret
}
IndexOperation::DocumentDeletion { tasks, .. } => {
Box::new(tasks.clone().into_iter()) as Ret
}
IndexOperation::DocumentClear { tasks, .. } => {
Box::new(tasks.clone().into_iter()) as Ret
}
IndexOperation::Settings { tasks, .. } => {
Box::new(tasks.clone().into_iter()) as Ret
}
IndexOperation::DocumentClearAndSetting {
cleared_tasks, settings_tasks, ..
} => {
Box::new(cleared_tasks.clone().into_iter().chain(settings_tasks.clone())) as Ret
}
IndexOperation::SettingsAndDocumentOperation {
document_import_tasks,
settings_tasks,
..
} => Box::new(
document_import_tasks.clone().into_iter().chain(settings_tasks.clone()),
) as Ret,
},
Batch::IndexCreation { task, .. } => Box::new(std::iter::once(*task)) as Ret,
Batch::IndexUpdate { task, .. } => Box::new(std::iter::once(*task)) as Ret,
Batch::IndexDeletion { tasks, .. } => Box::new(tasks.clone().into_iter()) as Ret,
Batch::IndexSwap { task } => Box::new(std::iter::once(*task)) as Ret,
}
}
}
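
For reference, here is how a caller might consume `Batch::ids()` (an illustrative sketch, not part of the diff; it assumes the `cluster` crate is used as a path dependency and that `TaskId` is the `u32` alias from meilisearch-types):

```rust
use cluster::batch::Batch;

/// Collect the task ids covered by a batch, e.g. to mark them as processing.
fn tasks_of(batch: &Batch) -> Vec<u32> {
    batch.ids().collect()
}

fn main() {
    // A dump batch wraps a single task id, so the iterator yields exactly one id.
    let batch = Batch::Dump(42);
    assert_eq!(tasks_of(&batch), vec![42]);
}
```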

276
cluster/src/leader.rs Normal file
View File

@ -0,0 +1,276 @@
use std::net::ToSocketAddrs;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{atomic, Arc, Mutex, RwLock};
use std::time::Duration;
use bus::{Bus, BusReader};
use crossbeam::channel::{unbounded, Receiver, Sender};
use ductile::{ChannelReceiver, ChannelSender, ChannelServer};
use log::{info, warn};
use meilisearch_types::keys::Key;
use meilisearch_types::tasks::Task;
use synchronoise::SignalEvent;
use uuid::Uuid;
use crate::batch::Batch;
use crate::{ApiKeyOperation, Consistency, FollowerMsg, LeaderMsg};
#[derive(Clone)]
pub struct Leader {
task_ready_to_commit: Receiver<u32>,
broadcast_to_follower: Sender<LeaderMsg>,
needs_key_sender: Sender<Sender<Vec<Key>>>,
needs_key_receiver: Receiver<Sender<Vec<Key>>>,
pub wake_up: Arc<SignalEvent>,
new_followers: Arc<AtomicUsize>,
active_followers: Arc<AtomicUsize>,
batch_id: Arc<RwLock<u32>>,
}
impl Leader {
pub fn new(
listen_on: impl ToSocketAddrs + Send + 'static,
master_key: Option<String>,
) -> Leader {
let new_followers = Arc::new(AtomicUsize::new(0));
let active_followers = Arc::new(AtomicUsize::new(1));
let wake_up = Arc::new(SignalEvent::auto(true));
let (broadcast_to_follower, process_batch_receiver) = unbounded();
let (task_finished_sender, task_finished_receiver) = unbounded();
let (needs_key_sender, needs_key_receiver) = unbounded();
let nf = new_followers.clone();
let af = active_followers.clone();
let wu = wake_up.clone();
std::thread::spawn(move || {
Self::listener(
listen_on,
master_key,
nf,
af,
wu,
process_batch_receiver,
task_finished_sender,
)
});
Leader {
task_ready_to_commit: task_finished_receiver,
broadcast_to_follower,
needs_key_sender,
needs_key_receiver,
wake_up,
new_followers,
active_followers,
batch_id: Arc::default(),
}
}
pub fn has_new_followers(&self) -> bool {
self.new_followers.load(Ordering::Relaxed) != 0
}
/// Takes all the necessary channels to chat with the scheduler and gives them
/// to each new follower
fn listener(
listen_on: impl ToSocketAddrs,
master_key: Option<String>,
new_followers: Arc<AtomicUsize>,
active_followers: Arc<AtomicUsize>,
wake_up: Arc<SignalEvent>,
broadcast_to_follower: Receiver<LeaderMsg>,
task_finished: Sender<u32>,
) {
let listener: ChannelServer<LeaderMsg, FollowerMsg> = if let Some(ref master_key) =
master_key
{
let mut enc = [0; 32];
let master_key = master_key.as_bytes();
if master_key.len() < 32 {
warn!("Master key is not secure, use a longer master key (at least 32 bytes long)");
}
enc.iter_mut().zip(master_key).for_each(|(enc, mk)| *enc = *mk);
info!("Listening with encryption enabled");
ChannelServer::bind_with_enc(listen_on, enc).unwrap()
} else {
ChannelServer::bind(listen_on).unwrap()
};
info!("Ready to receive connections");
// We're going to broadcast all the batches to all our followers
let bus: Bus<LeaderMsg> = Bus::new(10);
let bus = Arc::new(Mutex::new(bus));
let b = bus.clone();
std::thread::spawn(move || loop {
let msg = broadcast_to_follower.recv().expect("Main thread is dead");
b.lock().unwrap().broadcast(msg);
});
for (sender, receiver, _addr) in listener {
let task_finished = task_finished.clone();
let nf = new_followers.clone();
let af = active_followers.clone();
let wu = wake_up.clone();
let process_batch = bus.lock().unwrap().add_rx();
std::thread::spawn(move || {
Self::follower(sender, receiver, nf, af, wu, process_batch, task_finished)
});
}
}
/// Allow a follower to chat with the scheduler
fn follower(
sender: ChannelSender<LeaderMsg>,
receiver: ChannelReceiver<FollowerMsg>,
new_followers: Arc<AtomicUsize>,
active_followers: Arc<AtomicUsize>,
wake_up: Arc<SignalEvent>,
mut broadcast_to_follower: BusReader<LeaderMsg>,
task_finished: Sender<u32>,
) {
let size = new_followers.fetch_add(1, Ordering::Relaxed) + 1;
wake_up.signal();
info!("A new follower joined the cluster. {} members.", size);
loop {
if let msg @ LeaderMsg::JoinFromDump(_) =
broadcast_to_follower.recv().expect("Main thread died")
{
// we exit the new_follower state and become an active follower even though
// the dump will take some time to index
new_followers.fetch_sub(1, Ordering::Relaxed);
let size = active_followers.fetch_add(1, Ordering::Relaxed) + 1;
info!("A new follower became active. {} active members.", size);
sender.send(msg).unwrap();
break;
}
}
// send messages to the follower
std::thread::spawn(move || loop {
let msg = broadcast_to_follower.recv().expect("Main thread died");
match msg {
LeaderMsg::JoinFromDump(_) => (),
msg => {
if sender.send(msg).is_err() {
// the follower died, the logging and cluster size update should be done
// in the other thread
break;
}
}
}
});
// receive messages from the follower
loop {
match receiver.recv() {
Err(_) => break,
Ok(msg) => match msg {
FollowerMsg::ReadyToCommit(id) => {
task_finished.send(id).expect("Can't reach the main thread")
}
FollowerMsg::RegisterNewTask(_) => todo!(),
},
}
}
// if we exited from the previous loop it means the follower is down and should
// be removed from the cluster
let size = active_followers.fetch_sub(1, atomic::Ordering::Relaxed) - 1;
info!("A follower left the cluster. {} members.", size);
}
// ============= Everything related to the setup of the cluster
pub fn join_me(&self, dump: Vec<u8>) {
self.broadcast_to_follower
.send(LeaderMsg::JoinFromDump(dump))
.expect("Lost the link with the followers");
}
// ============= Everything related to the scheduler
pub fn starts_batch(&self, batch: Batch) {
let mut batch_id = self.batch_id.write().unwrap();
info!("Send the batch to process to the followers");
*batch_id += 1;
self.broadcast_to_follower
.send(LeaderMsg::StartBatch { id: *batch_id, batch })
.expect("Can't reach the cluster");
}
pub fn commit(&self, consistency_level: Consistency) {
info!("Wait until enough followers are ready to commit a batch");
let batch_id = self.batch_id.write().unwrap();
let mut nodes_ready_to_commit = 1;
loop {
let size = self.active_followers.load(atomic::Ordering::Relaxed);
info!("{nodes_ready_to_commit} nodes are ready to commit for a cluster size of {size}");
let all = nodes_ready_to_commit == size;
match consistency_level {
Consistency::One if nodes_ready_to_commit >= 1 || all => break,
Consistency::Two if nodes_ready_to_commit >= 2 || all => break,
Consistency::Quorum if nodes_ready_to_commit >= (size / 2) || all => break,
Consistency::All if all => break,
_ => (),
}
// we can't wait forever here because if a node dies the cluster size might get updated while we're stuck
match self.task_ready_to_commit.recv_timeout(Duration::new(1, 0)) {
Ok(id) if id == *batch_id => nodes_ready_to_commit += 1,
_ => continue,
};
}
info!("Tell all the followers to commit");
self.broadcast_to_follower.send(LeaderMsg::Commit(*batch_id)).unwrap();
}
pub fn register_new_task(&self, task: Task, update_file: Option<Vec<u8>>) {
info!("Tell all the followers to register a new task");
self.broadcast_to_follower
.send(LeaderMsg::RegisterNewTask { task, update_file })
.expect("Main thread is dead");
}
// ============= Everything related to the api-keys
pub fn insert_key(&self, key: Key) {
self.broadcast_to_follower
.send(LeaderMsg::ApiKeyOperation(ApiKeyOperation::Insert(key)))
.unwrap()
}
pub fn delete_key(&self, uuid: Uuid) {
self.broadcast_to_follower
.send(LeaderMsg::ApiKeyOperation(ApiKeyOperation::Delete(uuid)))
.unwrap()
}
pub fn needs_keys(&self) -> Sender<Vec<Key>> {
self.needs_key_receiver.recv().expect("The cluster is dead")
}
pub fn get_keys(&self) -> Vec<Key> {
let (send, rcv) = crossbeam::channel::bounded(1);
self.needs_key_sender.send(send).expect("The cluster is dead");
rcv.recv().expect("The auth controller is dead")
}
}
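
To make the acknowledgement rule in `commit` easier to follow, here it is extracted into a standalone predicate (an illustrative sketch, not code from this changeset; `ready` starts at 1 because the leader counts itself):

```rust
/// Mirrors `cluster::Consistency` for the purpose of this sketch.
#[derive(Clone, Copy)]
enum Consistency {
    One,
    Two,
    Quorum,
    All,
}

/// Returns true once enough members have acknowledged the batch.
/// `ready` is the number of acknowledgements received so far (the leader
/// counts itself) and `size` is the number of active cluster members.
fn can_commit(consistency: Consistency, ready: usize, size: usize) -> bool {
    let all = ready == size;
    match consistency {
        Consistency::One => ready >= 1 || all,
        Consistency::Two => ready >= 2 || all,
        // Integer division, as in the leader loop: a cluster of 4 commits at 2 acks.
        Consistency::Quorum => ready >= size / 2 || all,
        Consistency::All => all,
    }
}

fn main() {
    assert!(can_commit(Consistency::One, 1, 3));
    assert!(!can_commit(Consistency::All, 2, 3));
    assert!(can_commit(Consistency::Quorum, 2, 4));
}
```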

231
cluster/src/lib.rs Normal file
View File

@ -0,0 +1,231 @@
use std::net::ToSocketAddrs;
use std::str::FromStr;
use std::sync::{Arc, RwLock};
use batch::Batch;
use crossbeam::channel::{unbounded, Receiver, Sender};
use ductile::{connect_channel, connect_channel_with_enc, ChannelReceiver, ChannelSender};
use log::{info, warn};
use meilisearch_types::keys::Key;
use meilisearch_types::tasks::{KindWithContent, Task};
use serde::{Deserialize, Serialize};
pub mod batch;
mod leader;
pub use leader::Leader;
use uuid::Uuid;
#[derive(Debug, thiserror::Error)]
pub enum Error {
#[error("Network issue occurred")]
NetworkIssue,
#[error("Internal error: {0}")]
SerdeJson(#[from] serde_json::Error),
}
#[derive(Debug, Clone, Serialize, Deserialize)]
pub enum LeaderMsg {
/// A dump to join the cluster
JoinFromDump(Vec<u8>),
/// Starts a new batch
StartBatch { id: u32, batch: Batch },
/// Tell the follower to commit the update asap
Commit(u32),
/// Tell the follower to register a new task
RegisterNewTask { task: Task, update_file: Option<Vec<u8>> },
/// Tell the follower to apply an API key operation
ApiKeyOperation(ApiKeyOperation),
}
#[derive(Debug, Clone, Serialize, Deserialize)]
pub enum FollowerMsg {
// Let the leader know you're ready to commit
ReadyToCommit(u32),
RegisterNewTask(KindWithContent),
}
#[derive(Default, Debug, Clone, Copy, Serialize, Deserialize, PartialEq, Eq)]
#[serde(rename_all = "lowercase")]
pub enum Consistency {
One,
Two,
Quorum,
#[default]
All,
}
#[derive(Debug, Clone, Serialize, Deserialize, PartialEq, Eq)]
pub enum ApiKeyOperation {
Insert(Key),
Delete(Uuid),
}
impl std::fmt::Display for Consistency {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
match self {
Consistency::One => write!(f, "one"),
Consistency::Two => write!(f, "two"),
Consistency::Quorum => write!(f, "quorum"),
Consistency::All => write!(f, "all"),
}
}
}
impl FromStr for Consistency {
type Err = String;
fn from_str(s: &str) -> Result<Self, Self::Err> {
match s {
"one" => Ok(Consistency::One),
"two" => Ok(Consistency::Two),
"quorum" => Ok(Consistency::Quorum),
"all" => Ok(Consistency::All),
s => Err(format!(
"Unexpected value `{s}`, expected one of `one`, `two`, `quorum`, `all`"
)),
}
}
}
#[derive(Clone)]
pub enum Cluster {
Leader(Leader),
Follower(Follower),
}
#[derive(Clone)]
pub struct Follower {
sender: ChannelSender<FollowerMsg>,
get_batch: Receiver<(u32, Batch)>,
must_commit: Receiver<u32>,
register_new_task: Receiver<(Task, Option<Vec<u8>>)>,
api_key_op: Receiver<ApiKeyOperation>,
batch_id: Arc<RwLock<u32>>,
}
impl Follower {
pub fn join(leader: impl ToSocketAddrs, master_key: Option<String>) -> (Follower, Vec<u8>) {
let (sender, receiver) = if let Some(master_key) = master_key {
let mut enc = [0; 32];
let master_key = master_key.as_bytes();
if master_key.len() < 32 {
warn!("Master key is not secure, use a longer master key (at least 32 bytes long)");
}
enc.iter_mut().zip(master_key).for_each(|(enc, mk)| *enc = *mk);
info!("Connecting with encryption enabled");
connect_channel_with_enc(leader, &enc).unwrap()
} else {
connect_channel(leader).unwrap()
};
info!("Connection to the leader established");
info!("Waiting for the leader to contact us");
let state = receiver.recv().unwrap();
let dump = match state {
LeaderMsg::JoinFromDump(dump) => dump,
msg => panic!("Received unexpected message {msg:?}"),
};
let (get_batch_sender, get_batch_receiver) = unbounded();
let (must_commit_sender, must_commit_receiver) = unbounded();
let (register_task_sender, register_task_receiver) = unbounded();
let (create_api_key_sender, create_api_key_receiver) = unbounded();
std::thread::spawn(move || {
Self::router(
receiver,
get_batch_sender,
must_commit_sender,
register_task_sender,
create_api_key_sender,
);
});
(
Follower {
sender,
get_batch: get_batch_receiver,
must_commit: must_commit_receiver,
register_new_task: register_task_receiver,
api_key_op: create_api_key_receiver,
batch_id: Arc::default(),
},
dump,
)
}
fn router(
receiver: ChannelReceiver<LeaderMsg>,
get_batch: Sender<(u32, Batch)>,
must_commit: Sender<u32>,
register_new_task: Sender<(Task, Option<Vec<u8>>)>,
api_key_op: Sender<ApiKeyOperation>,
) {
loop {
match receiver.recv().expect("Lost connection to the leader") {
LeaderMsg::JoinFromDump(_) => {
warn!("Received a join-from-dump msg but I'm already running: ignoring the message")
}
LeaderMsg::StartBatch { id, batch } => {
info!("Starting to process a new batch");
get_batch.send((id, batch)).expect("Lost connection to the main thread")
}
LeaderMsg::Commit(id) => {
info!("Must commit");
must_commit.send(id).expect("Lost connection to the main thread")
}
LeaderMsg::RegisterNewTask { task, update_file } => {
info!("Registered a new task");
register_new_task
.send((task, update_file))
.expect("Lost connection to the main thread")
}
LeaderMsg::ApiKeyOperation(key) => {
api_key_op.send(key).expect("Lost connection to the main thread")
}
}
}
}
pub fn get_new_batch(&self) -> Batch {
info!("Get new batch called");
let (id, batch) = self.get_batch.recv().expect("Lost connection to the leader");
info!("Got a new batch");
*self.batch_id.write().unwrap() = id;
batch
}
pub fn ready_to_commit(&self) {
info!("I'm ready to commit");
let batch_id = self.batch_id.read().unwrap();
self.sender.send(FollowerMsg::ReadyToCommit(*batch_id)).unwrap();
loop {
let id = self.must_commit.recv().expect("Lost connection to the leader");
#[allow(clippy::comparison_chain)]
if id == *batch_id {
break;
} else if id > *batch_id {
panic!("We missed a batch");
}
}
info!("I got the right to commit");
}
pub fn get_new_task(&self) -> (Task, Option<Vec<u8>>) {
self.register_new_task.recv().expect("Lost connection to the leader")
}
pub fn api_key_operation(&self) -> ApiKeyOperation {
info!("Creating a new api key");
self.api_key_op.recv().expect("Lost connection to the leader")
}
}
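
Putting the pieces together, a node could be wired up roughly like this (a hedged sketch under stated assumptions: the `CLUSTER_LEADER` switch and the addresses are made up, error handling is elided, and the real wiring lives in the index scheduler):

```rust
use std::str::FromStr;

use cluster::{Cluster, Consistency, Follower, Leader};

fn main() {
    // Parse a user-provided consistency level; fall back to the default (`all`).
    let consistency = Consistency::from_str("quorum").unwrap_or_default();

    // Either start as the leader of the cluster...
    let cluster = if std::env::var("CLUSTER_LEADER").is_ok() {
        Cluster::Leader(Leader::new("0.0.0.0:9999", None))
    } else {
        // ...or join an existing leader; the leader answers with a dump that
        // the follower is expected to import before becoming active.
        let (follower, _dump) = Follower::join("127.0.0.1:9999", None);
        Cluster::Follower(follower)
    };

    // The scheduler would then hold on to `cluster` and `consistency` and call
    // `starts_batch`/`commit` (leader) or `get_new_batch`/`ready_to_commit` (follower).
    let _ = (cluster, consistency);
}
```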

View File

@ -118,3 +118,13 @@ ssl_resumption = false
ssl_tickets = false
# Activates SSL tickets.
# https://docs.meilisearch.com/learn/configuration/instance_options.html#ssl-tickets
#############################
### Experimental features ###
#############################
experimental_enable_metrics = false
# Experimental metrics feature. For more information, see: <https://github.com/meilisearch/meilisearch/discussions/3518>
# Enables the Prometheus metrics on the `GET /metrics` endpoint.

View File

@ -116,10 +116,20 @@ impl FileStore {
/// List the Uuids of the files in the FileStore
pub fn all_uuids(&self) -> Result<impl Iterator<Item = Result<Uuid>>> {
Ok(self.path.read_dir()?.map(|entry| {
Ok(Uuid::from_str(
entry?.file_name().to_str().ok_or(Error::CouldNotParseFileNameAsUtf8)?,
)?)
Ok(self.path.read_dir()?.filter_map(|entry| {
let file_name = match entry {
Ok(entry) => entry.file_name(),
Err(e) => return Some(Err(e.into())),
};
let file_name = match file_name.to_str() {
Some(file_name) => file_name,
None => return Some(Err(Error::CouldNotParseFileNameAsUtf8)),
};
if file_name.starts_with('.') {
None
} else {
Some(Uuid::from_str(file_name).map_err(|e| e.into()))
}
}))
}
}
@ -135,3 +145,34 @@ impl File {
Ok(())
}
}
#[cfg(test)]
mod test {
use std::io::Write;
use tempfile::TempDir;
use super::*;
#[test]
fn all_uuids() {
let dir = TempDir::new().unwrap();
let fs = FileStore::new(dir.path()).unwrap();
let (uuid, mut file) = fs.new_update().unwrap();
file.write_all(b"Hello world").unwrap();
file.persist().unwrap();
let all_uuids = fs.all_uuids().unwrap().collect::<Result<Vec<_>>>().unwrap();
assert_eq!(all_uuids, vec![uuid]);
let (uuid2, file) = fs.new_update().unwrap();
let all_uuids = fs.all_uuids().unwrap().collect::<Result<Vec<_>>>().unwrap();
assert_eq!(all_uuids, vec![uuid]);
file.persist().unwrap();
let mut all_uuids = fs.all_uuids().unwrap().collect::<Result<Vec<_>>>().unwrap();
all_uuids.sort();
let mut expected = vec![uuid, uuid2];
expected.sort();
assert_eq!(all_uuids, expected);
}
}

View File

@ -13,12 +13,15 @@ license.workspace = true
[dependencies]
anyhow = "1.0.64"
bincode = "1.3.3"
cluster = { path = "../cluster" }
crossbeam = "0.8.2"
csv = "1.1.6"
derive_builder = "0.11.2"
dump = { path = "../dump" }
enum-iterator = "1.1.3"
file-store = { path = "../file-store" }
log = "0.4.14"
meilisearch-auth = { path = "../meilisearch-auth" }
meilisearch-types = { path = "../meilisearch-types" }
page_size = "0.5.0"
roaring = { version = "0.10.0", features = ["serde"] }

View File

@ -22,7 +22,8 @@ use std::ffi::OsStr;
use std::fs::{self, File};
use std::io::BufWriter;
use dump::IndexMetadata;
use crossbeam::utils::Backoff;
use dump::{DumpWriter, IndexMetadata};
use log::{debug, error, info};
use meilisearch_types::heed::{RoTxn, RwTxn};
use meilisearch_types::milli::documents::{obkv_to_object, DocumentsBatchReader};
@ -41,14 +42,14 @@ use uuid::Uuid;
use crate::autobatcher::{self, BatchKind};
use crate::utils::{self, swap_index_uid_in_task};
use crate::{Error, IndexScheduler, ProcessingTasks, Result, TaskId};
use crate::{Cluster, Error, IndexScheduler, ProcessingTasks, Result, TaskId};
/// Represents a combination of tasks that can all be processed at the same time.
///
/// A batch contains the set of tasks that it represents (accessible through
/// [`self.ids()`](Batch::ids)), as well as additional information on how to
/// be processed.
#[derive(Debug)]
#[derive(Debug, Clone)]
pub(crate) enum Batch {
TaskCancelation {
/// The task cancelation itself.
@ -85,14 +86,14 @@ pub(crate) enum Batch {
},
}
#[derive(Debug)]
#[derive(Debug, Clone)]
pub(crate) enum DocumentOperation {
Add(Uuid),
Delete(Vec<String>),
}
/// A [batch](Batch) that combines multiple tasks operating on an index.
#[derive(Debug)]
#[derive(Debug, Clone)]
pub(crate) enum IndexOperation {
DocumentOperation {
index_uid: String,
@ -586,6 +587,12 @@ impl IndexScheduler {
_ => unreachable!(),
}
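// Cluster synchronization point: the leader waits for follower acknowledgements
// (according to the configured consistency level) before committing, while a follower
// signals that it finished the batch and blocks until the leader allows the commit.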
match &self.cluster {
Some(Cluster::Leader(leader)) => leader.commit(self.consistency_level),
Some(Cluster::Follower(follower)) => follower.ready_to_commit(),
None => (),
}
// We must only remove the content files if the transaction is successfully committed
// and if errors occurs when we are deleting files we must do our best to delete
// everything. We do not return the encountered errors when deleting the content
@ -629,6 +636,13 @@ impl IndexScheduler {
}
_ => unreachable!(),
}
match &self.cluster {
Some(Cluster::Leader(leader)) => leader.commit(self.consistency_level),
Some(Cluster::Follower(follower)) => follower.ready_to_commit(),
None => (),
}
wtxn.commit()?;
Ok(vec![task])
}
@ -723,95 +737,9 @@ impl IndexScheduler {
Ok(tasks)
}
Batch::Dump(mut task) => {
// TODO: It would be better to use the started_at from the task instead of generating a new one
let started_at = OffsetDateTime::now_utc();
let (keys, instance_uid) =
if let KindWithContent::DumpCreation { keys, instance_uid } = &task.kind {
(keys, instance_uid)
} else {
unreachable!();
};
let dump = dump::DumpWriter::new(*instance_uid)?;
// 1. dump the keys
let mut dump_keys = dump.create_keys()?;
for key in keys {
dump_keys.push_key(key)?;
}
dump_keys.flush()?;
let rtxn = self.env.read_txn()?;
// 2. dump the tasks
let mut dump_tasks = dump.create_tasks_queue()?;
for ret in self.all_tasks.iter(&rtxn)? {
let (_, mut t) = ret?;
let status = t.status;
let content_file = t.content_uuid();
// In the case we're dumping ourselves we want to be marked as finished
// to not loop over ourselves indefinitely.
if t.uid == task.uid {
let finished_at = OffsetDateTime::now_utc();
// We're going to fake the date because we don't know if everything is going to go well.
// But we need to dump the task as finished and successful.
// If something fails, everything will be set appropriately in the end.
t.status = Status::Succeeded;
t.started_at = Some(started_at);
t.finished_at = Some(finished_at);
}
let mut dump_content_file = dump_tasks.push_task(&t.into())?;
// 2.1. Dump the `content_file` associated with the task if there is one and the task is not finished yet.
if let Some(content_file) = content_file {
if status == Status::Enqueued {
let content_file = self.file_store.get_update(content_file)?;
let reader = DocumentsBatchReader::from_reader(content_file)
.map_err(milli::Error::from)?;
let (mut cursor, documents_batch_index) =
reader.into_cursor_and_fields_index();
while let Some(doc) =
cursor.next_document().map_err(milli::Error::from)?
{
dump_content_file.push_document(&obkv_to_object(
&doc,
&documents_batch_index,
)?)?;
}
dump_content_file.flush()?;
}
}
}
dump_tasks.flush()?;
// 3. Dump the indexes
for (uid, index) in self.index_mapper.indexes(&rtxn)? {
let rtxn = index.read_txn()?;
let metadata = IndexMetadata {
uid: uid.clone(),
primary_key: index.primary_key(&rtxn)?.map(String::from),
created_at: index.created_at(&rtxn)?,
updated_at: index.updated_at(&rtxn)?,
};
let mut index_dumper = dump.create_index(&uid, &metadata)?;
let fields_ids_map = index.fields_ids_map(&rtxn)?;
let all_fields: Vec<_> = fields_ids_map.iter().map(|(id, _)| id).collect();
// 3.1. Dump the documents
for ret in index.all_documents(&rtxn)? {
let (_id, doc) = ret?;
let document = milli::obkv_to_json(&all_fields, &fields_ids_map, doc)?;
index_dumper.push_document(&document)?;
}
// 3.2. Dump the settings
let settings = meilisearch_types::settings::settings(&index, &rtxn)?;
index_dumper.settings(&settings)?;
}
let dump = self.create_dump(&task, &started_at)?;
let dump_uid = started_at.format(format_description!(
"[year repr:full][month repr:numerical][day padding:zero]-[hour padding:zero][minute padding:zero][second padding:zero][subsecond digits:3]"
@ -839,6 +767,13 @@ impl IndexScheduler {
let mut index_wtxn = index.write_txn()?;
let tasks = self.apply_index_operation(&mut index_wtxn, &index, op)?;
match &self.cluster {
Some(Cluster::Leader(leader)) => leader.commit(self.consistency_level),
Some(Cluster::Follower(follower)) => follower.ready_to_commit(),
None => (),
}
index_wtxn.commit()?;
Ok(tasks)
@ -937,6 +872,13 @@ impl IndexScheduler {
for swap in swaps {
self.apply_index_swap(&mut wtxn, task.uid, &swap.indexes.0, &swap.indexes.1)?;
}
match &self.cluster {
Some(Cluster::Leader(leader)) => leader.commit(self.consistency_level),
Some(Cluster::Follower(follower)) => follower.ready_to_commit(),
None => (),
}
wtxn.commit()?;
task.status = Status::Succeeded;
Ok(vec![task])
@ -944,6 +886,99 @@ impl IndexScheduler {
}
}
pub(crate) fn create_dump(
&self,
task: &Task,
started_at: &OffsetDateTime,
) -> Result<DumpWriter> {
let (keys, instance_uid) =
if let KindWithContent::DumpCreation { keys, instance_uid } = &task.kind {
(keys, instance_uid)
} else {
unreachable!();
};
let dump = dump::DumpWriter::new(*instance_uid)?;
// 1. dump the keys
let mut dump_keys = dump.create_keys()?;
for key in keys {
dump_keys.push_key(key)?;
}
dump_keys.flush()?;
let rtxn = self.env.read_txn()?;
// 2. dump the tasks
let mut dump_tasks = dump.create_tasks_queue()?;
for ret in self.all_tasks.iter(&rtxn)? {
let (_, mut t) = ret?;
let status = t.status;
let content_file = t.content_uuid();
// In the case we're dumping ourselves we want to be marked as finished
// to not loop over ourselves indefinitely.
if t.uid == task.uid {
let finished_at = OffsetDateTime::now_utc();
// We're going to fake the date because we don't know if everything is going to go well.
// But we need to dump the task as finished and successful.
// If something fails, everything will be set appropriately in the end.
t.status = Status::Succeeded;
t.started_at = Some(*started_at);
t.finished_at = Some(finished_at);
}
let mut dump_content_file = dump_tasks.push_task(&t.into())?;
// 2.1. Dump the `content_file` associated with the task if there is one and the task is not finished yet.
if let Some(content_file) = content_file {
if status == Status::Enqueued {
let content_file = self.file_store.get_update(content_file)?;
let reader = DocumentsBatchReader::from_reader(content_file)
.map_err(milli::Error::from)?;
let (mut cursor, documents_batch_index) = reader.into_cursor_and_fields_index();
while let Some(doc) = cursor.next_document().map_err(milli::Error::from)? {
dump_content_file
.push_document(&obkv_to_object(&doc, &documents_batch_index)?)?;
}
dump_content_file.flush()?;
}
}
}
dump_tasks.flush()?;
// 3. Dump the indexes
self.index_mapper.try_for_each_index(&rtxn, |uid, index| -> Result<()> {
let rtxn = index.read_txn()?;
let metadata = IndexMetadata {
uid: uid.to_owned(),
primary_key: index.primary_key(&rtxn)?.map(String::from),
created_at: index.created_at(&rtxn)?,
updated_at: index.updated_at(&rtxn)?,
};
let mut index_dumper = dump.create_index(uid, &metadata)?;
let fields_ids_map = index.fields_ids_map(&rtxn)?;
let all_fields: Vec<_> = fields_ids_map.iter().map(|(id, _)| id).collect();
// 3.1. Dump the documents
for ret in index.all_documents(&rtxn)? {
let (_id, doc) = ret?;
let document = milli::obkv_to_json(&all_fields, &fields_ids_map, doc)?;
index_dumper.push_document(&document)?;
}
// 3.2. Dump the settings
let settings = meilisearch_types::settings::settings(index, &rtxn)?;
index_dumper.settings(&settings)?;
Ok(())
})?;
Ok(dump)
}
/// Swap the index `lhs` with the index `rhs`.
fn apply_index_swap(&self, wtxn: &mut RwTxn, task_id: u32, lhs: &str, rhs: &str) -> Result<()> {
// 1. Verify that both lhs and rhs are existing indexes
@ -1374,4 +1409,274 @@ impl IndexScheduler {
Ok(content_files_to_delete)
}
pub(crate) fn get_batch_from_cluster_batch(
&self,
batch: cluster::batch::Batch,
) -> Result<Batch> {
use cluster::batch::Batch as CBatch;
let mut rtxn = self.env.read_txn().map_err(Error::HeedTransaction)?;
for id in batch.ids() {
let backoff = Backoff::new();
let id = BEU32::new(id);
loop {
if self.all_tasks.get(&rtxn, &id)?.is_some() {
info!("Found the task_id");
break;
}
info!("The task is not present in the task queue, we wait");
// we need to drop and reopen the read txn so that writes committed by other threads become visible
drop(rtxn);
backoff.spin();
rtxn = self.env.read_txn().map_err(Error::HeedTransaction)?;
}
}
Ok(match batch {
CBatch::TaskCancelation { task, previous_started_at, previous_processing_tasks } => {
Batch::TaskCancelation {
task: self.get_existing_tasks(&rtxn, Some(task))?[0].clone(),
previous_started_at,
previous_processing_tasks,
}
}
CBatch::TaskDeletion(task) => {
Batch::TaskDeletion(self.get_existing_tasks(&rtxn, Some(task))?[0].clone())
}
CBatch::SnapshotCreation(tasks) => {
Batch::SnapshotCreation(self.get_existing_tasks(&rtxn, tasks)?)
}
CBatch::Dump(task) => {
Batch::Dump(self.get_existing_tasks(&rtxn, Some(task))?[0].clone())
}
CBatch::IndexOperation { op, must_create_index } => Batch::IndexOperation {
op: self.get_index_op_from_cluster_index_op(&rtxn, op)?,
must_create_index,
},
CBatch::IndexCreation { index_uid, primary_key, task } => Batch::IndexCreation {
index_uid,
primary_key,
task: self.get_existing_tasks(&rtxn, Some(task))?[0].clone(),
},
CBatch::IndexUpdate { index_uid, primary_key, task } => Batch::IndexUpdate {
index_uid,
primary_key,
task: self.get_existing_tasks(&rtxn, Some(task))?[0].clone(),
},
CBatch::IndexDeletion { index_uid, tasks, index_has_been_created } => {
Batch::IndexDeletion {
index_uid,
tasks: self.get_existing_tasks(&rtxn, tasks)?,
index_has_been_created,
}
}
CBatch::IndexSwap { task } => {
Batch::IndexSwap { task: self.get_existing_tasks(&rtxn, Some(task))?[0].clone() }
}
})
}
pub(crate) fn get_index_op_from_cluster_index_op(
&self,
rtxn: &RoTxn,
op: cluster::batch::IndexOperation,
) -> Result<IndexOperation> {
use cluster::batch::IndexOperation as COp;
Ok(match op {
COp::DocumentOperation {
index_uid,
primary_key,
method,
documents_counts,
operations,
tasks,
} => IndexOperation::DocumentOperation {
index_uid,
primary_key,
method,
documents_counts,
operations: operations.into_iter().map(|op| op.into()).collect(),
tasks: self.get_existing_tasks(rtxn, tasks)?,
},
COp::DocumentDeletion { index_uid, documents, tasks } => {
IndexOperation::DocumentDeletion {
index_uid,
documents,
tasks: self.get_existing_tasks(rtxn, tasks)?,
}
}
COp::DocumentClear { index_uid, tasks } => IndexOperation::DocumentClear {
index_uid,
tasks: self.get_existing_tasks(rtxn, tasks)?,
},
COp::Settings { index_uid, settings, tasks } => IndexOperation::Settings {
index_uid,
settings,
tasks: self.get_existing_tasks(rtxn, tasks)?,
},
COp::DocumentClearAndSetting { index_uid, cleared_tasks, settings, settings_tasks } => {
IndexOperation::DocumentClearAndSetting {
index_uid,
cleared_tasks: self.get_existing_tasks(rtxn, cleared_tasks)?,
settings,
settings_tasks: self.get_existing_tasks(rtxn, settings_tasks)?,
}
}
COp::SettingsAndDocumentOperation {
index_uid,
primary_key,
method,
documents_counts,
operations,
document_import_tasks,
settings,
settings_tasks,
} => IndexOperation::SettingsAndDocumentOperation {
index_uid,
primary_key,
method,
documents_counts,
operations: operations.into_iter().map(|op| op.into()).collect(),
document_import_tasks: self.get_existing_tasks(rtxn, document_import_tasks)?,
settings,
settings_tasks: self.get_existing_tasks(rtxn, settings_tasks)?,
},
})
}
}
impl From<Batch> for cluster::batch::Batch {
fn from(batch: Batch) -> Self {
use cluster::batch::Batch as CBatch;
match batch {
Batch::TaskCancelation { task, previous_started_at, previous_processing_tasks } => {
CBatch::TaskCancelation {
task: task.uid,
previous_started_at,
previous_processing_tasks,
}
}
Batch::TaskDeletion(task) => CBatch::TaskDeletion(task.uid),
Batch::SnapshotCreation(task) => {
CBatch::SnapshotCreation(task.into_iter().map(|task| task.uid).collect())
}
Batch::Dump(task) => CBatch::Dump(task.uid),
Batch::IndexOperation { op, must_create_index } => {
CBatch::IndexOperation { op: op.into(), must_create_index }
}
Batch::IndexCreation { index_uid, primary_key, task } => {
CBatch::IndexCreation { index_uid, primary_key, task: task.uid }
}
Batch::IndexUpdate { index_uid, primary_key, task } => {
CBatch::IndexUpdate { index_uid, primary_key, task: task.uid }
}
Batch::IndexDeletion { index_uid, tasks, index_has_been_created } => {
CBatch::IndexDeletion {
index_uid,
tasks: tasks.into_iter().map(|task| task.uid).collect(),
index_has_been_created,
}
}
Batch::IndexSwap { task } => CBatch::IndexSwap { task: task.uid },
}
}
}
impl From<IndexOperation> for cluster::batch::IndexOperation {
fn from(op: IndexOperation) -> Self {
use cluster::batch::IndexOperation as COp;
match op {
IndexOperation::DocumentOperation {
index_uid,
primary_key,
method,
documents_counts,
operations,
tasks,
} => COp::DocumentOperation {
index_uid,
primary_key,
method,
documents_counts,
operations: operations.into_iter().map(|op| op.into()).collect(),
tasks: tasks.into_iter().map(|task| task.uid).collect(),
},
IndexOperation::DocumentDeletion { index_uid, documents, tasks } => {
COp::DocumentDeletion {
index_uid,
documents,
tasks: tasks.into_iter().map(|task| task.uid).collect(),
}
}
IndexOperation::DocumentClear { index_uid, tasks } => COp::DocumentClear {
index_uid,
tasks: tasks.into_iter().map(|task| task.uid).collect(),
},
IndexOperation::Settings { index_uid, settings, tasks } => COp::Settings {
index_uid,
settings,
tasks: tasks.into_iter().map(|task| task.uid).collect(),
},
IndexOperation::DocumentClearAndSetting {
index_uid,
cleared_tasks,
settings,
settings_tasks,
} => COp::DocumentClearAndSetting {
index_uid,
cleared_tasks: cleared_tasks.into_iter().map(|task| task.uid).collect(),
settings,
settings_tasks: settings_tasks.into_iter().map(|task| task.uid).collect(),
},
IndexOperation::SettingsAndDocumentOperation {
index_uid,
primary_key,
method,
documents_counts,
operations,
document_import_tasks,
settings,
settings_tasks,
} => COp::SettingsAndDocumentOperation {
index_uid,
primary_key,
method,
documents_counts,
operations: operations.into_iter().map(|op| op.into()).collect(),
document_import_tasks: document_import_tasks
.into_iter()
.map(|task| task.uid)
.collect(),
settings,
settings_tasks: settings_tasks.into_iter().map(|task| task.uid).collect(),
},
}
}
}
impl From<DocumentOperation> for cluster::batch::DocumentOperation {
fn from(op: DocumentOperation) -> Self {
use cluster::batch::DocumentOperation as COp;
match op {
DocumentOperation::Add(uuid) => COp::Add(uuid),
DocumentOperation::Delete(docs) => COp::Delete(docs),
}
}
}
impl From<cluster::batch::DocumentOperation> for DocumentOperation {
fn from(op: cluster::batch::DocumentOperation) -> Self {
use cluster::batch::DocumentOperation as COp;
match op {
COp::Add(uuid) => DocumentOperation::Add(uuid),
COp::Delete(docs) => DocumentOperation::Delete(docs),
}
}
}

View File

@ -0,0 +1,370 @@
/// The map size to use when we fail to read the actual map size of an index.
const DEFAULT_MAP_SIZE: usize = 10 * 1024 * 1024 * 1024; // 10 GiB
use std::collections::BTreeMap;
use std::path::Path;
use std::time::Duration;
use meilisearch_types::heed::{EnvClosingEvent, EnvOpenOptions};
use meilisearch_types::milli::Index;
use time::OffsetDateTime;
use uuid::Uuid;
use super::IndexStatus::{self, Available, BeingDeleted, Closing, Missing};
use crate::lru::{InsertionOutcome, LruMap};
use crate::{clamp_to_page_size, Result};
/// Keep an internally consistent view of the open indexes in memory.
///
/// This view is made of an LRU cache that will evict the least recently used indexes when new indexes are opened.
/// Indexes that are being closed (for resizing or due to cache eviction) or deleted cannot be evicted from the cache and
/// are stored separately.
///
/// This view provides operations to change the state of the index as it is known in memory:
/// open an index (making it available for queries), close an index (specifying the new size it should be opened with),
/// delete an index.
///
/// External consistency with the other bits of data of an index is provided by the `IndexMapper` parent structure.
pub struct IndexMap {
/// A LRU map of indexes that are in the open state and available for queries.
available: LruMap<Uuid, Index>,
/// A map of indexes that are not available for queries, either because they are being deleted
/// or because they are being closed.
///
/// If they are being deleted, the UUID points to `None`.
unavailable: BTreeMap<Uuid, Option<ClosingIndex>>,
/// A monotonically increasing generation number, used to differentiate between multiple successive index closing requests.
///
/// Because multiple readers could be waiting on an index to close, the following could theoretically happen:
///
/// 1. Multiple readers wait for the index closing to occur.
/// 2. One of them "wins the race", takes the lock and then removes the index that finished closing from the map.
/// 3. The index is reopened, but must be closed again (such as being resized again).
/// 4. One reader that "lost the race" in (2) wakes up and tries to take the lock and remove the index from the map.
///
/// In that situation, the index may or may not have finished closing. The `generation` field lets us remember which
/// closing request was made, so the reader that "lost the race" holds the old generation and will need to wait again for the index
/// to close.
generation: usize,
}
#[derive(Clone)]
pub struct ClosingIndex {
uuid: Uuid,
closing_event: EnvClosingEvent,
map_size: usize,
generation: usize,
}
impl ClosingIndex {
/// Waits for the index to be definitely closed.
///
/// To avoid blocking, users should relinquish their locks on the IndexMap before calling this function.
///
/// After the index is physically closed, the in-memory map must still be updated to take this into account.
/// To do so, a `ReopenableIndex` is returned, that can be used to either definitely close or definitely open
/// the index without waiting anymore.
pub fn wait_timeout(self, timeout: Duration) -> Option<ReopenableIndex> {
self.closing_event.wait_timeout(timeout).then_some(ReopenableIndex {
uuid: self.uuid,
map_size: self.map_size,
generation: self.generation,
})
}
}
pub struct ReopenableIndex {
uuid: Uuid,
map_size: usize,
generation: usize,
}
impl ReopenableIndex {
/// Attempts to reopen the index, which can result in the index being reopened again or not
/// (e.g. if another thread already opened and closed the index again).
///
/// Use get again on the IndexMap to get the updated status.
///
/// Fails if the underlying index creation fails.
///
/// # Status table
///
/// | Previous Status | New Status |
/// |-----------------|----------------------------------------------|
/// | Missing | Missing |
/// | BeingDeleted | BeingDeleted |
/// | Closing | Available or Closing depending on generation |
/// | Available | Available |
///
pub fn reopen(self, map: &mut IndexMap, path: &Path) -> Result<()> {
if let Closing(reopen) = map.get(&self.uuid) {
if reopen.generation != self.generation {
return Ok(());
}
map.unavailable.remove(&self.uuid);
map.create(&self.uuid, path, None, self.map_size)?;
}
Ok(())
}
/// Attempts to close the index, which may or may not result in the index being closed
/// (e.g. if another thread already reopened the index again).
///
/// Use get again on the IndexMap to get the updated status.
///
/// # Status table
///
/// | Previous Status | New Status |
/// |-----------------|--------------------------------------------|
/// | Missing | Missing |
/// | BeingDeleted | BeingDeleted |
/// | Closing | Missing or Closing depending on generation |
/// | Available | Available |
pub fn close(self, map: &mut IndexMap) {
if let Closing(reopen) = map.get(&self.uuid) {
if reopen.generation != self.generation {
return;
}
map.unavailable.remove(&self.uuid);
}
}
}
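// Illustrative sketch, not part of this patch: the intended reader-side pattern for the
// two types above. A reader that finds an index `Closing` drops its lock, waits (with a
// timeout) for the environment to actually close, then reopens the entry; the
// `generation` check inside `reopen`/`close` keeps this safe even if the index was
// reopened and closed again in the meantime.
fn wait_and_reopen(
    map: &std::sync::RwLock<IndexMap>,
    closing: ClosingIndex,
    path: &Path,
) -> Result<()> {
    // No lock is held while waiting, so this cannot deadlock with writers of the map.
    if let Some(reopenable) = closing.wait_timeout(Duration::from_secs(6)) {
        // Take the lock only for the short, non-blocking bookkeeping step.
        reopenable.reopen(&mut map.write().unwrap(), path)?;
    }
    Ok(())
}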
impl IndexMap {
pub fn new(cap: usize) -> IndexMap {
Self { unavailable: Default::default(), available: LruMap::new(cap), generation: 0 }
}
/// Gets the current status of an index in the map.
///
/// If the index is available it can be accessed from the returned status.
pub fn get(&self, uuid: &Uuid) -> IndexStatus {
self.available
.get(uuid)
.map(|index| Available(index.clone()))
.unwrap_or_else(|| self.get_unavailable(uuid))
}
fn get_unavailable(&self, uuid: &Uuid) -> IndexStatus {
match self.unavailable.get(uuid) {
Some(Some(reopen)) => Closing(reopen.clone()),
Some(None) => BeingDeleted,
None => Missing,
}
}
/// Attempts to create a new index that didn't exist before.
///
/// # Status table
///
/// | Previous Status | New Status |
/// |-----------------|------------|
/// | Missing | Available |
/// | BeingDeleted | panics |
/// | Closing | panics |
/// | Available | panics |
///
pub fn create(
&mut self,
uuid: &Uuid,
path: &Path,
date: Option<(OffsetDateTime, OffsetDateTime)>,
map_size: usize,
) -> Result<Index> {
if !matches!(self.get_unavailable(uuid), Missing) {
panic!("Attempt to open an index that was unavailable");
}
let index = create_or_open_index(path, date, map_size)?;
match self.available.insert(*uuid, index.clone()) {
InsertionOutcome::InsertedNew => (),
InsertionOutcome::Evicted(evicted_uuid, evicted_index) => {
self.close(evicted_uuid, evicted_index, 0);
}
InsertionOutcome::Replaced(_) => {
panic!("Attempt to open an index that was already opened")
}
}
Ok(index)
}
/// Increases the current generation. See documentation for this field.
///
/// In the unlikely event that all 2^64 generations are exhausted, we simply wrap around.
///
/// For this to cause an issue, one would have to stop a reader right after it got a `ReopenableIndex` and before it takes the lock
/// to remove it from the unavailable map, and keep the reader frozen in this state for 2^64 closings of other indexes.
///
/// This seems practically impossible to achieve.
fn next_generation(&mut self) -> usize {
self.generation = self.generation.wrapping_add(1);
self.generation
}
/// Attempts to close an index.
///
/// # Status table
///
/// | Previous Status | New Status |
/// |-----------------|---------------|
/// | Missing | Missing |
/// | BeingDeleted | BeingDeleted |
/// | Closing | Closing |
/// | Available | Closing |
///
pub fn close_for_resize(&mut self, uuid: &Uuid, map_size_growth: usize) {
let Some(index) = self.available.remove(uuid) else { return; };
self.close(*uuid, index, map_size_growth);
}
fn close(&mut self, uuid: Uuid, index: Index, map_size_growth: usize) {
let map_size = index.map_size().unwrap_or(DEFAULT_MAP_SIZE) + map_size_growth;
let closing_event = index.prepare_for_closing();
let generation = self.next_generation();
self.unavailable
.insert(uuid, Some(ClosingIndex { uuid, closing_event, map_size, generation }));
}
/// Attempts to delete an index.
///
/// `end_deletion` must be called just after.
///
/// # Status table
///
/// | Previous Status | New Status | Return value |
/// |-----------------|--------------|-----------------------------|
/// | Missing | BeingDeleted | Ok(None) |
/// | BeingDeleted | BeingDeleted | Err(None) |
/// | Closing | Closing | Err(Some(reopen)) |
/// | Available | BeingDeleted | Ok(Some(env_closing_event)) |
pub fn start_deletion(
&mut self,
uuid: &Uuid,
) -> std::result::Result<Option<EnvClosingEvent>, Option<ClosingIndex>> {
if let Some(index) = self.available.remove(uuid) {
self.unavailable.insert(*uuid, None);
return Ok(Some(index.prepare_for_closing()));
}
match self.unavailable.remove(uuid) {
Some(Some(reopen)) => Err(Some(reopen)),
Some(None) => Err(None),
None => Ok(None),
}
}
/// Marks that an index deletion finished.
///
/// Must be used after calling `start_deletion`.
///
/// # Status table
///
/// | Previous Status | New Status |
/// |-----------------|------------|
/// | Missing | Missing |
/// | BeingDeleted | Missing |
/// | Closing | panics |
/// | Available | panics |
pub fn end_deletion(&mut self, uuid: &Uuid) {
assert!(
self.available.get(uuid).is_none(),
"Attempt to finish deletion of an index that was not being deleted"
);
// Do not panic if the index was Missing or BeingDeleted
assert!(
!matches!(self.unavailable.remove(uuid), Some(Some(_))),
"Attempt to finish deletion of an index that was being closed"
);
}
}
/// Create or open an index in the specified path.
/// The path *must* exist or an error will be thrown.
fn create_or_open_index(
path: &Path,
date: Option<(OffsetDateTime, OffsetDateTime)>,
map_size: usize,
) -> Result<Index> {
let mut options = EnvOpenOptions::new();
options.map_size(clamp_to_page_size(map_size));
options.max_readers(1024);
if let Some((created, updated)) = date {
Ok(Index::new_with_creation_dates(options, path, created, updated)?)
} else {
Ok(Index::new(options, path)?)
}
}
/// Putting the tests of the LRU down there so we have access to the cache's private members
#[cfg(test)]
mod tests {
use meilisearch_types::heed::Env;
use meilisearch_types::Index;
use uuid::Uuid;
use super::super::IndexMapper;
use crate::tests::IndexSchedulerHandle;
use crate::utils::clamp_to_page_size;
use crate::IndexScheduler;
impl IndexMapper {
fn test() -> (Self, Env, IndexSchedulerHandle) {
let (index_scheduler, handle) = IndexScheduler::test(true, vec![]);
(index_scheduler.index_mapper, index_scheduler.env, handle)
}
}
fn check_first_unavailable(mapper: &IndexMapper, expected_uuid: Uuid, is_closing: bool) {
let index_map = mapper.index_map.read().unwrap();
let (uuid, state) = index_map.unavailable.first_key_value().unwrap();
assert_eq!(uuid, &expected_uuid);
assert_eq!(state.is_some(), is_closing);
}
#[test]
fn evict_indexes() {
let (mapper, env, _handle) = IndexMapper::test();
let mut uuids = vec![];
// LRU cap + 1
for i in 0..(5 + 1) {
let index_name = format!("index-{i}");
let wtxn = env.write_txn().unwrap();
mapper.create_index(wtxn, &index_name, None).unwrap();
let txn = env.read_txn().unwrap();
uuids.push(mapper.index_mapping.get(&txn, &index_name).unwrap().unwrap());
}
// index-0 was evicted
check_first_unavailable(&mapper, uuids[0], true);
// get back the evicted index
let wtxn = env.write_txn().unwrap();
mapper.create_index(wtxn, "index-0", None).unwrap();
// Least recently used is now index-1
check_first_unavailable(&mapper, uuids[1], true);
}
#[test]
fn resize_index() {
let (mapper, env, _handle) = IndexMapper::test();
let index = mapper.create_index(env.write_txn().unwrap(), "index", None).unwrap();
assert_index_size(index, mapper.index_base_map_size);
mapper.resize_index(&env.read_txn().unwrap(), "index").unwrap();
let index = mapper.create_index(env.write_txn().unwrap(), "index", None).unwrap();
assert_index_size(index, mapper.index_base_map_size + mapper.index_growth_amount);
mapper.resize_index(&env.read_txn().unwrap(), "index").unwrap();
let index = mapper.create_index(env.write_txn().unwrap(), "index", None).unwrap();
assert_index_size(index, mapper.index_base_map_size + mapper.index_growth_amount * 2);
}
fn assert_index_size(index: Index, expected: usize) {
let expected = clamp_to_page_size(expected);
let index_map_size = index.map_size().unwrap();
assert_eq!(index_map_size, expected);
}
}

View File

@ -1,21 +1,22 @@
use std::collections::hash_map::Entry;
use std::collections::HashMap;
use std::path::{Path, PathBuf};
use std::path::PathBuf;
use std::sync::{Arc, RwLock};
use std::time::Duration;
use std::{fs, thread};
use log::error;
use meilisearch_types::heed::types::Str;
use meilisearch_types::heed::{Database, Env, EnvOpenOptions, RoTxn, RwTxn};
use meilisearch_types::heed::{Database, Env, RoTxn, RwTxn};
use meilisearch_types::milli::update::IndexerConfig;
use meilisearch_types::milli::Index;
use synchronoise::SignalEvent;
use time::OffsetDateTime;
use uuid::Uuid;
use self::IndexStatus::{Available, BeingDeleted, BeingResized};
use self::index_map::IndexMap;
use self::IndexStatus::{Available, BeingDeleted, Closing, Missing};
use crate::uuid_codec::UuidCodec;
use crate::{clamp_to_page_size, Error, Result};
use crate::{Error, Result};
mod index_map;
const INDEX_MAPPING: &str = "index-mapping";
@ -26,17 +27,38 @@ const INDEX_MAPPING: &str = "index-mapping";
/// 2. Opening indexes and storing references to these opened indexes
/// 3. Accessing indexes through their uuid
/// 4. Mapping a user-defined name to each index uuid.
///
/// # Implementation notes
///
/// An index exists as 3 bits of data:
/// 1. The index data on disk, that can exist in 3 states: Missing, Present, or BeingDeleted.
/// 2. The persistent database containing the association between the index's name and its UUID,
/// that can exist in 2 states: Missing or Present.
/// 3. The state of the index in the in-memory `IndexMap`, that can exist in multiple states:
/// - Missing
/// - Available
/// - Closing (because an index needs resizing or was evicted from the cache)
/// - BeingDeleted
///
/// All of this data should be kept consistent between index operations, which is achieved by the `IndexMapper`
/// with the use of the following primitives:
/// - A RwLock on the `IndexMap`.
/// - Transactions on the association database.
/// - ClosingEvent signals emitted when closing an environment.
#[derive(Clone)]
pub struct IndexMapper {
/// Keep track of the opened indexes. Used mainly by the index resolver.
index_map: Arc<RwLock<HashMap<Uuid, IndexStatus>>>,
index_map: Arc<RwLock<IndexMap>>,
/// Map an index name with an index uuid currently available on disk.
pub(crate) index_mapping: Database<Str, UuidCodec>,
/// Path to the folder where the LMDB environments of each index are.
base_path: PathBuf,
index_size: usize,
/// The map size an index is opened with on the first time.
index_base_map_size: usize,
/// The quantity by which the map size of an index is incremented upon reopening, in bytes.
index_growth_amount: usize,
pub indexer_config: Arc<IndexerConfig>,
}
@ -44,10 +66,12 @@ pub struct IndexMapper {
#[allow(clippy::large_enum_variant)]
#[derive(Clone)]
pub enum IndexStatus {
/// Not currently in the index map.
Missing,
/// Do not insert it back in the index map as it is currently being deleted.
BeingDeleted,
/// Temporarily do not insert the index in the index map as it is currently being resized.
BeingResized(Arc<SignalEvent>),
/// Temporarily do not insert the index in the index map as it is currently being resized/evicted from the map.
Closing(index_map::ClosingIndex),
/// You can use the index without worrying about anything.
Available(Index),
}
@ -56,37 +80,21 @@ impl IndexMapper {
pub fn new(
env: &Env,
base_path: PathBuf,
index_size: usize,
index_base_map_size: usize,
index_growth_amount: usize,
index_count: usize,
indexer_config: IndexerConfig,
) -> Result<Self> {
Ok(Self {
index_map: Arc::default(),
index_map: Arc::new(RwLock::new(IndexMap::new(index_count))),
index_mapping: env.create_database(Some(INDEX_MAPPING))?,
base_path,
index_size,
index_base_map_size,
index_growth_amount,
indexer_config: Arc::new(indexer_config),
})
}
/// Create or open an index in the specified path.
/// The path *must* exist or an error will be thrown.
fn create_or_open_index(
&self,
path: &Path,
date: Option<(OffsetDateTime, OffsetDateTime)>,
map_size: usize,
) -> Result<Index> {
let mut options = EnvOpenOptions::new();
options.map_size(clamp_to_page_size(map_size));
options.max_readers(1024);
if let Some((created, updated)) = date {
Ok(Index::new_with_creation_dates(options, path, created, updated)?)
} else {
Ok(Index::new(options, path)?)
}
}
/// Get or create the index.
pub fn create_index(
&self,
@ -106,16 +114,17 @@ impl IndexMapper {
let index_path = self.base_path.join(uuid.to_string());
fs::create_dir_all(&index_path)?;
let index = self.create_or_open_index(&index_path, date, self.index_size)?;
wtxn.commit()?;
// Error if the UUIDv4 somehow already exists in the map, since it should be fresh.
// This is very unlikely to happen in practice.
// TODO: it would be better to lazily create the index. But we need an Index::open function for milli.
if self.index_map.write().unwrap().insert(uuid, Available(index.clone())).is_some()
{
panic!("Uuid v4 conflict: index with UUID {uuid} already exists.");
}
let index = self.index_map.write().unwrap().create(
&uuid,
&index_path,
date,
self.index_base_map_size,
)?;
wtxn.commit()?;
Ok(index)
}
@ -135,23 +144,42 @@ impl IndexMapper {
assert!(self.index_mapping.delete(&mut wtxn, name)?);
wtxn.commit()?;
// We remove the index from the in-memory index map.
let mut tries = 0;
// Attempts to remove the index from the in-memory index map in a loop.
//
// If the index is currently being closed, we will wait for it to be closed and retry getting it in a subsequent
// loop iteration.
//
// We make 100 attempts before giving up.
// This could happen in the following situations:
//
// 1. There is a bug preventing the index from being correctly closed, or us from detecting this.
// 2. A user of the index is keeping it open for more than 600 seconds. This could happen e.g. during a pathological search.
// This cannot be caused by indexation, because deleting an index happens in the scheduler itself, so it cannot be concurrent with indexation.
//
// In these situations, reporting the error through a panic is in order.
let closing_event = loop {
let mut lock = self.index_map.write().unwrap();
let resize_operation = match lock.insert(uuid, BeingDeleted) {
Some(Available(index)) => break Some(index.prepare_for_closing()),
// The target index is in the middle of a resize operation.
// Wait for this operation to complete, then try again.
Some(BeingResized(resize_operation)) => resize_operation.clone(),
// The index is already being deleted or doesn't exist.
// It's OK to remove it from the map again.
_ => break None,
};
// Avoiding deadlocks: we need to drop the lock before waiting for the end of the resize, which
// will involve operations on the very map we're locking.
drop(lock);
resize_operation.wait();
match lock.start_deletion(&uuid) {
Ok(env_closing) => break env_closing,
Err(Some(reopen)) => {
// drop the lock here so that we don't synchronously wait for the index to close.
drop(lock);
tries += 1;
if tries >= 100 {
panic!("Too many attempts to close index {name} prior to deletion.")
}
let reopen = if let Some(reopen) = reopen.wait_timeout(Duration::from_secs(6)) {
reopen
} else {
continue;
};
reopen.close(&mut self.index_map.write().unwrap());
continue;
}
Err(None) => return Ok(()),
}
};
let index_map = self.index_map.clone();
@ -161,7 +189,7 @@ impl IndexMapper {
.name(String::from("index_deleter"))
.spawn(move || {
// We first wait to be sure that the previously opened index is effectively closed.
// This can take a lot of time, this is why we do that in a seperate thread.
// This can take a lot of time, this is why we do that in a separate thread.
if let Some(closing_event) = closing_event {
closing_event.wait();
}
@ -175,7 +203,7 @@ impl IndexMapper {
}
// Finally we remove the entry from the index map.
assert!(matches!(index_map.write().unwrap().remove(&uuid), Some(BeingDeleted)));
index_map.write().unwrap().end_deletion(&uuid);
})
.unwrap();
@ -195,76 +223,15 @@ impl IndexMapper {
/// - If the Index corresponding to the passed name is concurrently being deleted/resized or cannot be found in the
/// in memory hash map.
pub fn resize_index(&self, rtxn: &RoTxn, name: &str) -> Result<()> {
// fixme: factor to a function?
let uuid = self
.index_mapping
.get(rtxn, name)?
.ok_or_else(|| Error::IndexNotFound(name.to_string()))?;
// We remove the index from the in-memory index map.
let mut lock = self.index_map.write().unwrap();
// signal that will be sent when the resize operation completes
let resize_operation = Arc::new(SignalEvent::manual(false));
let index = match lock.insert(uuid, BeingResized(resize_operation)) {
Some(Available(index)) => index,
Some(previous_status) => {
lock.insert(uuid, previous_status);
panic!(
"Attempting to resize index {name} that is already being resized or deleted."
)
}
None => {
panic!("Could not find the status of index {name} in the in-memory index mapper.")
}
};
self.index_map.write().unwrap().close_for_resize(&uuid, self.index_growth_amount);
drop(lock);
let resize_succeeded = (move || {
let current_size = index.map_size()?;
let new_size = current_size * 2;
let closing_event = index.prepare_for_closing();
log::debug!("Waiting for index {name} to close");
if !closing_event.wait_timeout(std::time::Duration::from_secs(600)) {
// fail after 10 minutes waiting
panic!("Could not resize index {name} (unable to close it)");
}
log::info!("Resized index {name} from {current_size} to {new_size} bytes");
let index_path = self.base_path.join(uuid.to_string());
let index = self.create_or_open_index(&index_path, None, new_size)?;
Ok(index)
})();
// Put the map back to a consistent state.
// Even if there was an error we don't want to leave the map in an inconsistent state as it would cause
// deadlocks.
let mut lock = self.index_map.write().unwrap();
let (resize_operation, resize_succeeded) = match resize_succeeded {
Ok(index) => {
// insert the resized index
let Some(BeingResized(resize_operation)) = lock.insert(uuid, Available(index)) else {
panic!("Index state for index {name} was modified while it was being resized")
};
(resize_operation, Ok(()))
}
Err(error) => {
// there was an error, not much we can do... delete the index from the in-memory map to prevent future errors
let Some(BeingResized(resize_operation)) = lock.remove(&uuid) else {
panic!("Index state for index {name} was modified while it was being resized")
};
(resize_operation, Err(error))
}
};
// drop the lock before signaling completion so that other threads don't immediately await on the lock after waking up.
drop(lock);
resize_operation.signal();
resize_succeeded
Ok(())
}
/// Return an index, may open it if it wasn't already opened.
@ -274,47 +241,68 @@ impl IndexMapper {
.get(rtxn, name)?
.ok_or_else(|| Error::IndexNotFound(name.to_string()))?;
// we clone here to drop the lock before entering the match
let mut tries = 0;
// attempts to open the index in a loop.
//
// If the index is currently being closed, we will wait for it to be closed and retry getting it in a subsequent
// loop iteration.
//
// We make 100 attempts before giving up.
// This could happen in the following situations:
//
// 1. There is a bug preventing the index from being correctly closed, or us from detecting it was.
// 2. A user of the index is keeping it open for more than 600 seconds. This could happen e.g. during a long indexation,
// a pathological search, and so on.
//
// In these situations, reporting the error through a panic is in order.
let index = loop {
let index = self.index_map.read().unwrap().get(&uuid).cloned();
tries += 1;
if tries > 100 {
panic!("Too many spurious wake ups while trying to open the index {name}");
}
// we get the index here to drop the lock before entering the match
let index = self.index_map.read().unwrap().get(&uuid);
match index {
Some(Available(index)) => break index,
Some(BeingResized(ref resize_operation)) => {
Available(index) => break index,
Closing(reopen) => {
// Avoiding deadlocks: no lock taken while doing this operation.
resize_operation.wait();
let reopen = if let Some(reopen) = reopen.wait_timeout(Duration::from_secs(6)) {
reopen
} else {
continue;
};
let index_path = self.base_path.join(uuid.to_string());
// take the lock to reopen the environment.
reopen.reopen(&mut self.index_map.write().unwrap(), &index_path)?;
continue;
}
Some(BeingDeleted) => return Err(Error::IndexNotFound(name.to_string())),
BeingDeleted => return Err(Error::IndexNotFound(name.to_string())),
// since we're lazy, it's possible that the index has not been opened yet.
None => {
Missing => {
let mut index_map = self.index_map.write().unwrap();
// between the read lock and the write lock it's not impossible
// that someone already opened the index (eg if two search happens
// that someone already opened the index (eg if two searches happen
// at the same time), thus before opening it we check a second time
// if it's not already there.
// Since there is a good chance it's not already there we can use
// the entry method.
match index_map.entry(uuid) {
Entry::Vacant(entry) => {
match index_map.get(&uuid) {
Missing => {
let index_path = self.base_path.join(uuid.to_string());
let index =
self.create_or_open_index(&index_path, None, self.index_size)?;
entry.insert(Available(index.clone()));
break index;
break index_map.create(
&uuid,
&index_path,
None,
self.index_base_map_size,
)?;
}
Entry::Occupied(entry) => match entry.get() {
Available(index) => break index.clone(),
BeingResized(resize_operation) => {
// Avoiding the deadlock: we drop the lock before waiting
let resize_operation = resize_operation.clone();
drop(index_map);
resize_operation.wait();
continue;
}
BeingDeleted => return Err(Error::IndexNotFound(name.to_string())),
},
Available(index) => break index,
Closing(_) => {
// the reopening will be handled in the next loop iteration
continue;
}
BeingDeleted => return Err(Error::IndexNotFound(name.to_string())),
}
}
}
@ -323,18 +311,38 @@ impl IndexMapper {
Ok(index)
}
/// Return all indexes, may open them if they weren't already opened.
pub fn indexes(&self, rtxn: &RoTxn) -> Result<Vec<(String, Index)>> {
/// Attempts `f` for each index that exists in the index mapper.
///
/// It is preferable to use this function rather than a loop that opens all indexes, as a way to avoid having all indexes opened,
/// which is unsupported in general.
///
/// Since `f` is allowed to return a result, and `Index` is cloneable, it is still possible to wrongly build e.g. a vector of
/// all the indexes, but this function makes it harder and so less likely to do accidentally.
pub fn try_for_each_index<U, V>(
&self,
rtxn: &RoTxn,
mut f: impl FnMut(&str, &Index) -> Result<U>,
) -> Result<V>
where
V: FromIterator<U>,
{
self.index_mapping
.iter(rtxn)?
.map(|ret| {
ret.map_err(Error::from).and_then(|(name, _)| {
self.index(rtxn, name).map(|index| (name.to_string(), index))
})
.map(|res| {
res.map_err(Error::from)
.and_then(|(name, _)| self.index(rtxn, name).and_then(|index| f(name, &index)))
})
.collect()
}
/// Return the names of all indexes without opening them.
pub fn index_names(&self, rtxn: &RoTxn) -> Result<Vec<String>> {
self.index_mapping
.iter(rtxn)?
.map(|res| res.map_err(Error::from).map(|(name, _)| name.to_string()))
.collect()
}
/// Swap two index names.
pub fn swap(&self, wtxn: &mut RwTxn, lhs: &str, rhs: &str) -> Result<()> {
let lhs_uuid = self

View File

@ -33,6 +33,8 @@ pub fn snapshot_index_scheduler(scheduler: &IndexScheduler) -> String {
snapshots_path: _,
auth_path: _,
version_file_path: _,
cluster: _,
consistency_level: _,
test_breakpoint_sdr: _,
planned_failures: _,
run_loop_iteration: _,
@ -254,6 +256,6 @@ pub fn snapshot_canceled_by(
snap
}
pub fn snapshot_index_mapper(rtxn: &RoTxn, mapper: &IndexMapper) -> String {
let names = mapper.indexes(rtxn).unwrap().into_iter().map(|(n, _)| n).collect::<Vec<_>>();
let names = mapper.index_names(rtxn).unwrap();
format!("{names:?}")
}

View File

@ -24,32 +24,37 @@ pub mod error;
mod index_mapper;
#[cfg(test)]
mod insta_snapshot;
mod lru;
mod utils;
mod uuid_codec;
pub type Result<T> = std::result::Result<T, Error>;
pub type TaskId = u32;
use std::io::Write;
use std::ops::{Bound, RangeBounds};
use std::path::PathBuf;
use std::path::{Path, PathBuf};
use std::sync::atomic::AtomicBool;
use std::sync::atomic::Ordering::Relaxed;
use std::sync::{Arc, RwLock};
use std::time::Duration;
use batch::Batch;
use cluster::{Cluster, Consistency};
use dump::{KindDump, TaskDump, UpdateFile};
pub use error::Error;
use file_store::FileStore;
use log::info;
use meilisearch_types::error::ResponseError;
use meilisearch_types::heed::types::{OwnedType, SerdeBincode, SerdeJson, Str};
use meilisearch_types::heed::{self, Database, Env, RoTxn};
use meilisearch_types::index_uid_pattern::IndexUidPattern;
use meilisearch_types::milli;
use meilisearch_types::milli::documents::DocumentsBatchBuilder;
use meilisearch_types::milli::update::IndexerConfig;
use meilisearch_types::milli::{CboRoaringBitmapCodec, Index, RoaringBitmapCodec, BEU32};
use meilisearch_types::tasks::{Kind, KindWithContent, Status, Task};
use roaring::RoaringBitmap;
use serde::Deserialize;
use synchronoise::SignalEvent;
use time::OffsetDateTime;
use utils::{filter_out_references_to_newer_tasks, keep_tasks_within_datetimes, map_bound};
@ -230,8 +235,12 @@ pub struct IndexSchedulerOptions {
pub dumps_path: PathBuf,
/// The maximum size, in bytes, of the task index.
pub task_db_size: usize,
/// The maximum size, in bytes, of each meilisearch index.
pub index_size: usize,
/// The size, in bytes, with which each meilisearch index is opened the first time.
pub index_base_map_size: usize,
/// The size, in bytes, by which the map size of an index is increased when it is resized due to being full.
pub index_growth_amount: usize,
/// The number of indexes that can be concurrently opened in memory.
pub index_count: usize,
/// Configuration used during indexing for each meilisearch index.
pub indexer_config: IndexerConfig,
/// Set to `true` iff the index scheduler is allowed to automatically
@ -298,6 +307,11 @@ pub struct IndexScheduler {
/// The path to the version file of Meilisearch.
pub(crate) version_file_path: PathBuf,
/// The role in the cluster
pub(crate) cluster: Option<Cluster>,
/// The consistency level used by the leader. Ignored if this node is not the leader of a cluster.
pub(crate) consistency_level: Consistency,
// ================= test
// The next entry is dedicated to the tests.
/// Provide a way to set a breakpoint in multiple part of the scheduler.
@ -317,6 +331,24 @@ pub struct IndexScheduler {
run_loop_iteration: Arc<RwLock<usize>>,
}
#[derive(Debug, Clone, Copy, PartialEq, Eq, Deserialize)]
pub enum ClusterMode {
Leader,
Follower,
}
impl std::str::FromStr for ClusterMode {
type Err = ();
fn from_str(s: &str) -> std::result::Result<Self, Self::Err> {
match s {
"leader" => Ok(ClusterMode::Leader),
"follower" => Ok(ClusterMode::Follower),
_ => Err(()),
}
}
}
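// e.g. `"leader".parse::<ClusterMode>()` yields `Ok(ClusterMode::Leader)` and
// `"follower".parse::<ClusterMode>()` yields `Ok(ClusterMode::Follower)`; any other
// string yields `Err(())`.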
impl IndexScheduler {
fn private_clone(&self) -> IndexScheduler {
IndexScheduler {
@ -339,6 +371,8 @@ impl IndexScheduler {
dumps_path: self.dumps_path.clone(),
auth_path: self.auth_path.clone(),
version_file_path: self.version_file_path.clone(),
cluster: self.cluster.clone(),
consistency_level: self.consistency_level,
#[cfg(test)]
test_breakpoint_sdr: self.test_breakpoint_sdr.clone(),
#[cfg(test)]
@ -353,6 +387,8 @@ impl IndexScheduler {
/// Create an index scheduler and start its run loop.
pub fn new(
options: IndexSchedulerOptions,
cluster: Option<Cluster>,
consistency_level: Consistency,
#[cfg(test)] test_breakpoint_sdr: crossbeam::channel::Sender<(Breakpoint, bool)>,
#[cfg(test)] planned_failures: Vec<(usize, tests::FailureLocation)>,
) -> Result<Self> {
@ -361,9 +397,25 @@ impl IndexScheduler {
std::fs::create_dir_all(&options.indexes_path)?;
std::fs::create_dir_all(&options.dumps_path)?;
let task_db_size = clamp_to_page_size(options.task_db_size);
let budget = if options.indexer_config.skip_index_budget {
IndexBudget {
map_size: options.index_base_map_size,
index_count: options.index_count,
task_db_size,
}
} else {
Self::index_budget(
&options.tasks_path,
options.index_base_map_size,
task_db_size,
options.index_count,
)
};
let env = heed::EnvOpenOptions::new()
.max_dbs(10)
.map_size(clamp_to_page_size(options.task_db_size))
.map_size(budget.task_db_size)
.open(options.tasks_path)?;
let file_store = FileStore::new(&options.update_file_path)?;
@ -383,7 +435,9 @@ impl IndexScheduler {
index_mapper: IndexMapper::new(
&env,
options.indexes_path,
options.index_size,
budget.map_size,
options.index_growth_amount,
budget.index_count,
options.indexer_config,
)?,
env,
@ -394,6 +448,8 @@ impl IndexScheduler {
snapshots_path: options.snapshots_path,
auth_path: options.auth_path,
version_file_path: options.version_file_path,
cluster,
consistency_level,
#[cfg(test)]
test_breakpoint_sdr,
@ -407,6 +463,75 @@ impl IndexScheduler {
Ok(this)
}
fn index_budget(
tasks_path: &Path,
base_map_size: usize,
mut task_db_size: usize,
max_index_count: usize,
) -> IndexBudget {
#[cfg(windows)]
const DEFAULT_BUDGET: usize = 6 * 1024 * 1024 * 1024 * 1024; // 6 TiB, 1 index
#[cfg(not(windows))]
const DEFAULT_BUDGET: usize = 80 * 1024 * 1024 * 1024 * 1024; // 80 TiB, 18 indexes
let budget = if Self::is_good_heed(tasks_path, DEFAULT_BUDGET) {
DEFAULT_BUDGET
} else {
log::debug!("determining budget with dichotomic search");
utils::dichotomic_search(DEFAULT_BUDGET / 2, |map_size| {
Self::is_good_heed(tasks_path, map_size)
})
};
log::debug!("memmap budget: {budget}B");
let mut budget = budget / 2;
if task_db_size > (budget / 2) {
task_db_size = clamp_to_page_size(budget * 2 / 5);
log::debug!(
"Decreasing max size of task DB to {task_db_size}B due to constrained memory space"
);
}
budget -= task_db_size;
// won't be mutated again
let budget = budget;
let task_db_size = task_db_size;
log::debug!("index budget: {budget}B");
let mut index_count = budget / base_map_size;
if index_count < 2 {
// take a bit less than half of the budget to make sure we can always afford to open an index
let map_size = (budget * 2) / 5;
// single index of max budget
log::debug!("1 index of {map_size}B can be opened simultaneously.");
return IndexBudget { map_size, index_count: 1, task_db_size };
}
// give us some space for an additional index when the cache is already full
// decrement is OK because index_count >= 2.
index_count -= 1;
if index_count > max_index_count {
index_count = max_index_count;
}
log::debug!("Up to {index_count} indexes of {base_map_size}B opened simultaneously.");
IndexBudget { map_size: base_map_size, index_count, task_db_size }
}
fn is_good_heed(tasks_path: &Path, map_size: usize) -> bool {
if let Ok(env) =
heed::EnvOpenOptions::new().map_size(clamp_to_page_size(map_size)).open(tasks_path)
{
env.prepare_for_closing().wait();
true
} else {
// We're treating all errors equally here, not only allocation errors.
// This means there's a possibility for the budget to be lowered due to errors other than allocation errors.
// For persistent errors, this is OK as long as the task db is then reopened normally without ignoring the error this time.
// For transient errors, this could lead to an instance with too low a budget.
// However transient errors are: 1) less likely than persistent errors 2) likely to cause other issues down the line anyway.
false
}
}
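// Illustrative sketch, not part of this patch: `utils::dichotomic_search` used above is
// assumed to behave roughly like this, probing with ever smaller steps around the
// starting point to find the largest map size that `is_good` accepts. The real helper
// lives in `utils` and may differ in its exact stopping condition.
fn dichotomic_search_sketch(start: usize, is_good: impl Fn(usize) -> bool) -> usize {
    let mut candidate = start;
    let mut step = start / 2;
    let mut best = 0;
    // Stop once the step becomes negligible compared to the starting point.
    while step > start / 1024 {
        if is_good(candidate) {
            best = candidate;
            candidate += step;
        } else {
            candidate = candidate.saturating_sub(step);
        }
        step /= 2;
    }
    best
}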
pub fn read_txn(&self) -> Result<RoTxn> {
self.env.read_txn().map_err(|e| e.into())
}
@ -417,6 +542,26 @@ impl IndexScheduler {
/// only once per index scheduler.
fn run(&self) {
let run = self.private_clone();
// if we're a follower, we start a thread to register the tasks coming from the leader
if let Some(Cluster::Follower(ref follower)) = self.cluster {
let this = self.private_clone();
let follower = follower.clone();
std::thread::spawn(move || loop {
let (task, content) = follower.get_new_task();
this.register_raw_task(task, content);
});
} else if let Some(Cluster::Leader(ref leader)) = self.cluster {
// we need a way to let the leader come out of its loop if a new follower joins the cluster
let cluster = leader.wake_up.clone();
let scheduler = self.wake_up.clone();
std::thread::spawn(move || loop {
cluster.wait();
scheduler.signal();
});
}
std::thread::Builder::new()
.name(String::from("scheduler"))
.spawn(move || {
@ -460,15 +605,42 @@ impl IndexScheduler {
///
/// * If the index wasn't opened before, the index will be opened.
/// * If the index doesn't exist on disk, the `IndexNotFoundError` is thrown.
///
/// ### Note
///
/// Since an `Index` requires a large swath of the virtual memory address space, correct usage of an `Index` means not
/// keeping its handle around for longer than necessary.
///
/// Some configurations also can't reasonably open multiple indexes at once.
/// If you need to fetch information from or perform an action on all indexes,
/// see the `try_for_each_index` function.
pub fn index(&self, name: &str) -> Result<Index> {
let rtxn = self.env.read_txn()?;
self.index_mapper.index(&rtxn, name)
}
/// Return and open all the indexes.
pub fn indexes(&self) -> Result<Vec<(String, Index)>> {
/// Return the names of all indexes without opening them.
pub fn index_names(&self) -> Result<Vec<String>> {
let rtxn = self.env.read_txn()?;
self.index_mapper.indexes(&rtxn)
self.index_mapper.index_names(&rtxn)
}
/// Attempts `f` for each index known to the index scheduler.
///
/// It is preferable to use this function rather than a loop that opens all indexes, as a way to avoid having all indexes opened,
/// which is unsupported in general.
///
/// Since `f` is allowed to return a result, and `Index` is cloneable, it is still possible to wrongly build e.g. a vector of
/// all the indexes, but this function makes it harder and so less likely to do accidentally.
///
/// If many indexes exist, this operation can take time to complete (on the order of seconds for 1000 indexes) as it needs to open
/// all the indexes.
pub fn try_for_each_index<U, V>(&self, f: impl FnMut(&str, &Index) -> Result<U>) -> Result<V>
where
V: FromIterator<U>,
{
let rtxn = self.env.read_txn()?;
self.index_mapper.try_for_each_index(&rtxn, f)
}
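// Illustrative sketch, not part of this patch: gathering per-index information without
// keeping every `Index` open at once. `number_of_documents` is assumed to be available
// on `Index`, as it is used elsewhere in the codebase.
fn document_counts_sketch(&self) -> Result<Vec<(String, u64)>> {
    self.try_for_each_index(|name, index| {
        // Open a short-lived read transaction on this index only.
        let rtxn = index.read_txn()?;
        Ok((name.to_string(), index.number_of_documents(&rtxn)?))
    })
}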
/// Return the task ids matched by the given query from the index scheduler's point of view.
@ -630,13 +802,13 @@ impl IndexScheduler {
&self,
rtxn: &RoTxn,
query: &Query,
authorized_indexes: &Option<Vec<IndexUidPattern>>,
filters: &meilisearch_auth::AuthFilter,
) -> Result<RoaringBitmap> {
let mut tasks = self.get_task_ids(rtxn, query)?;
// If the query contains a list of index uid or there is a finite list of authorized indexes,
// then we must exclude all the kinds that aren't associated to one and only one index.
if query.index_uids.is_some() || authorized_indexes.is_some() {
if query.index_uids.is_some() || !filters.all_indexes_authorized() {
for kind in enum_iterator::all::<Kind>().filter(|kind| !kind.related_to_one_index()) {
tasks -= self.get_kind(rtxn, kind)?;
}
@ -644,11 +816,11 @@ impl IndexScheduler {
// Any task that is internally associated with a non-authorized index
// must be discarded.
if let Some(authorized_indexes) = authorized_indexes {
if !filters.all_indexes_authorized() {
let all_indexes_iter = self.index_tasks.iter(rtxn)?;
for result in all_indexes_iter {
let (index, index_tasks) = result?;
if !authorized_indexes.iter().any(|p| p.matches_str(index)) {
if !filters.is_index_authorized(index) {
tasks -= index_tasks;
}
}
@ -668,12 +840,11 @@ impl IndexScheduler {
pub fn get_tasks_from_authorized_indexes(
&self,
query: Query,
authorized_indexes: Option<Vec<IndexUidPattern>>,
filters: &meilisearch_auth::AuthFilter,
) -> Result<Vec<Task>> {
let rtxn = self.env.read_txn()?;
let tasks =
self.get_task_ids_from_authorized_indexes(&rtxn, &query, &authorized_indexes)?;
let tasks = self.get_task_ids_from_authorized_indexes(&rtxn, &query, filters)?;
let tasks = self.get_existing_tasks(
&rtxn,
@ -748,6 +919,16 @@ impl IndexScheduler {
return Err(e.into());
}
if let Some(Cluster::Leader(leader)) = &self.cluster {
let update_file = if let Some(uuid) = task.content_uuid() {
let path = self.file_store.get_update_path(uuid);
Some(std::fs::read(path).unwrap())
} else {
None
};
leader.register_new_task(task.clone(), update_file);
}
// If the registered task is a task cancelation
// we inform the processing tasks to stop (if necessary).
if let KindWithContent::TaskCancelation { tasks, .. } = kind {
@ -877,6 +1058,44 @@ impl IndexScheduler {
Ok(task)
}
/// /!\ should only be used when you're a follower in cluster mode
pub fn register_raw_task(&self, task: Task, content_file: Option<Vec<u8>>) {
if let Some(content) = content_file {
let uuid = task.content_uuid().expect("bad task");
let (_, mut file) = self.file_store.new_update_with_uuid(uuid.as_u128()).unwrap();
file.write_all(&content).unwrap();
file.persist().unwrap();
}
let mut wtxn = self.env.write_txn().unwrap();
self.all_tasks.put(&mut wtxn, &BEU32::new(task.uid), &task).unwrap();
for index in task.indexes() {
self.update_index(&mut wtxn, index, |bitmap| {
bitmap.insert(task.uid);
})
.unwrap();
}
self.update_status(&mut wtxn, task.status, |bitmap| {
bitmap.insert(task.uid);
})
.unwrap();
self.update_kind(&mut wtxn, task.kind.as_kind(), |bitmap| {
(bitmap.insert(task.uid));
})
.unwrap();
utils::insert_task_datetime(&mut wtxn, self.enqueued_at, task.enqueued_at, task.uid)
.unwrap();
wtxn.commit().unwrap();
self.wake_up.signal();
}
/// Create a new index without any associated task.
pub fn create_raw_index(
&self,
@ -933,14 +1152,15 @@ impl IndexScheduler {
self.breakpoint(Breakpoint::Start);
}
let rtxn = self.env.read_txn().map_err(Error::HeedTransaction)?;
let batch =
match self.create_next_batch(&rtxn).map_err(|e| Error::CreateBatch(Box::new(e)))? {
Some(batch) => batch,
None => return Ok(TickOutcome::WaitForSignal),
};
info!("before getting a new batch");
let batch = match self.get_or_create_next_batch()? {
Some(batch) => batch,
None => return Ok(TickOutcome::WaitForSignal),
};
info!("after getting a new batch");
let index_uid = batch.index_uid().map(ToOwned::to_owned);
drop(rtxn);
// TODO cluster: Should we send the starting date as well so everyone is in sync?
// 1. store the starting date with the bitmap of processing tasks.
let mut ids = batch.ids();
@ -1069,6 +1289,63 @@ impl IndexScheduler {
Ok(TickOutcome::TickAgain(processed_tasks))
}
/// If there is no cluster, or if this node is the leader, create a new batch.
/// If this node is a follower, wait until the leader gives us a batch to process.
fn get_or_create_next_batch(&self) -> Result<Option<Batch>> {
info!("inside get or create next batch");
let batch = match &self.cluster {
None | Some(Cluster::Leader(_)) => {
let rtxn = self.env.read_txn().map_err(Error::HeedTransaction)?;
self.create_next_batch(&rtxn).map_err(|e| Error::CreateBatch(Box::new(e)))?
}
Some(Cluster::Follower(follower)) => {
let batch = follower.get_new_batch();
Some(self.get_batch_from_cluster_batch(batch)?)
}
};
if let Some(Cluster::Leader(leader)) = &self.cluster {
// first, onboard the new followers
if leader.has_new_followers() {
info!("New followers are trying to join the cluster");
let started_at = OffsetDateTime::now_utc();
let dump = self
.create_dump(
&Task {
uid: TaskId::MAX,
enqueued_at: started_at,
started_at: Some(started_at),
finished_at: Some(started_at),
error: None,
canceled_by: None,
details: None,
status: Status::Enqueued,
kind: KindWithContent::DumpCreation {
keys: leader.get_keys(),
// TODO cluster: should we unify the instance_uid across all instances?
instance_uid: None,
},
},
&started_at,
)
.unwrap();
let mut buffer = Vec::new();
// TODO cluster: stop writing everything in RAM
dump.persist_to(&mut buffer).unwrap();
leader.join_me(buffer);
}
// second, start processing the batch
if let Some(ref batch) = batch {
leader.starts_batch(batch.clone().into());
}
}
Ok(batch)
}
pub(crate) fn delete_persisted_task_data(&self, task: &Task) -> Result<()> {
match task.content_uuid() {
Some(content_file) => self.delete_update_file(content_file),
@ -1111,6 +1388,16 @@ pub enum TickOutcome {
WaitForSignal,
}
/// How many indexes we can afford to have open simultaneously.
struct IndexBudget {
/// Map size of an index.
map_size: usize,
/// Maximum number of simultaneously opened indexes.
index_count: usize,
/// For very constrained systems we might need to reduce the base task_db_size so we can accept at least one index.
task_db_size: usize,
}
#[cfg(test)]
mod tests {
use std::io::{BufWriter, Seek, Write};
@ -1120,7 +1407,9 @@ mod tests {
use crossbeam::channel::RecvTimeoutError;
use file_store::File;
use meili_snap::snapshot;
use meilisearch_auth::AuthFilter;
use meilisearch_types::document_formats::DocumentFormatError;
use meilisearch_types::index_uid_pattern::IndexUidPattern;
use meilisearch_types::milli::obkv_to_json;
use meilisearch_types::milli::update::IndexDocumentsMethod::{
ReplaceDocuments, UpdateDocuments,
@ -1154,6 +1443,8 @@ mod tests {
let tempdir = TempDir::new().unwrap();
let (sender, receiver) = crossbeam::channel::bounded(0);
let indexer_config = IndexerConfig { skip_index_budget: true, ..Default::default() };
let options = IndexSchedulerOptions {
version_file_path: tempdir.path().join(VERSION_FILE_NAME),
auth_path: tempdir.path().join("auth"),
@ -1163,12 +1454,15 @@ mod tests {
snapshots_path: tempdir.path().join("snapshots"),
dumps_path: tempdir.path().join("dumps"),
task_db_size: 1000 * 1000, // 1 MB, we don't use MiB on purpose.
index_size: 1000 * 1000, // 1 MB, we don't use MiB on purpose.
indexer_config: IndexerConfig::default(),
index_base_map_size: 1000 * 1000, // 1 MB, we don't use MiB on purpose.
index_growth_amount: 1000 * 1000, // 1 MB
index_count: 5,
indexer_config,
autobatching_enabled,
};
let index_scheduler = Self::new(options, sender, planned_failures).unwrap();
let index_scheduler =
Self::new(options, None, Consistency::default(), sender, planned_failures).unwrap();
// To be 100% consistent between all test we're going to start the scheduler right now
// and ensure it's in the expected starting state.
@ -2371,38 +2665,45 @@ mod tests {
let rtxn = index_scheduler.env.read_txn().unwrap();
let query = Query { limit: Some(0), ..Default::default() };
let tasks =
index_scheduler.get_task_ids_from_authorized_indexes(&rtxn, &query, &None).unwrap();
let tasks = index_scheduler
.get_task_ids_from_authorized_indexes(&rtxn, &query, &AuthFilter::default())
.unwrap();
snapshot!(snapshot_bitmap(&tasks), @"[]");
let query = Query { limit: Some(1), ..Default::default() };
let tasks =
index_scheduler.get_task_ids_from_authorized_indexes(&rtxn, &query, &None).unwrap();
let tasks = index_scheduler
.get_task_ids_from_authorized_indexes(&rtxn, &query, &AuthFilter::default())
.unwrap();
snapshot!(snapshot_bitmap(&tasks), @"[2,]");
let query = Query { limit: Some(2), ..Default::default() };
let tasks =
index_scheduler.get_task_ids_from_authorized_indexes(&rtxn, &query, &None).unwrap();
let tasks = index_scheduler
.get_task_ids_from_authorized_indexes(&rtxn, &query, &AuthFilter::default())
.unwrap();
snapshot!(snapshot_bitmap(&tasks), @"[1,2,]");
let query = Query { from: Some(1), ..Default::default() };
let tasks =
index_scheduler.get_task_ids_from_authorized_indexes(&rtxn, &query, &None).unwrap();
let tasks = index_scheduler
.get_task_ids_from_authorized_indexes(&rtxn, &query, &AuthFilter::default())
.unwrap();
snapshot!(snapshot_bitmap(&tasks), @"[0,1,]");
let query = Query { from: Some(2), ..Default::default() };
let tasks =
index_scheduler.get_task_ids_from_authorized_indexes(&rtxn, &query, &None).unwrap();
let tasks = index_scheduler
.get_task_ids_from_authorized_indexes(&rtxn, &query, &AuthFilter::default())
.unwrap();
snapshot!(snapshot_bitmap(&tasks), @"[0,1,2,]");
let query = Query { from: Some(1), limit: Some(1), ..Default::default() };
let tasks =
index_scheduler.get_task_ids_from_authorized_indexes(&rtxn, &query, &None).unwrap();
let tasks = index_scheduler
.get_task_ids_from_authorized_indexes(&rtxn, &query, &AuthFilter::default())
.unwrap();
snapshot!(snapshot_bitmap(&tasks), @"[1,]");
let query = Query { from: Some(1), limit: Some(2), ..Default::default() };
let tasks =
index_scheduler.get_task_ids_from_authorized_indexes(&rtxn, &query, &None).unwrap();
let tasks = index_scheduler
.get_task_ids_from_authorized_indexes(&rtxn, &query, &AuthFilter::default())
.unwrap();
snapshot!(snapshot_bitmap(&tasks), @"[0,1,]");
}
@ -2427,21 +2728,24 @@ mod tests {
let rtxn = index_scheduler.env.read_txn().unwrap();
let query = Query { statuses: Some(vec![Status::Processing]), ..Default::default() };
let tasks =
index_scheduler.get_task_ids_from_authorized_indexes(&rtxn, &query, &None).unwrap();
let tasks = index_scheduler
.get_task_ids_from_authorized_indexes(&rtxn, &query, &AuthFilter::default())
.unwrap();
snapshot!(snapshot_bitmap(&tasks), @"[0,]"); // only the processing tasks in the first tick
let query = Query { statuses: Some(vec![Status::Enqueued]), ..Default::default() };
let tasks =
index_scheduler.get_task_ids_from_authorized_indexes(&rtxn, &query, &None).unwrap();
let tasks = index_scheduler
.get_task_ids_from_authorized_indexes(&rtxn, &query, &AuthFilter::default())
.unwrap();
snapshot!(snapshot_bitmap(&tasks), @"[1,2,]"); // only the enqueued tasks in the first tick
let query = Query {
statuses: Some(vec![Status::Enqueued, Status::Processing]),
..Default::default()
};
let tasks =
index_scheduler.get_task_ids_from_authorized_indexes(&rtxn, &query, &None).unwrap();
let tasks = index_scheduler
.get_task_ids_from_authorized_indexes(&rtxn, &query, &AuthFilter::default())
.unwrap();
snapshot!(snapshot_bitmap(&tasks), @"[0,1,2,]"); // both enqueued and processing tasks in the first tick
let query = Query {
@ -2449,8 +2753,9 @@ mod tests {
after_started_at: Some(start_time),
..Default::default()
};
let tasks =
index_scheduler.get_task_ids_from_authorized_indexes(&rtxn, &query, &None).unwrap();
let tasks = index_scheduler
.get_task_ids_from_authorized_indexes(&rtxn, &query, &AuthFilter::default())
.unwrap();
// both enqueued and processing tasks in the first tick, but limited to those with a started_at
// that comes after the start of the test, which should exclude the enqueued tasks
snapshot!(snapshot_bitmap(&tasks), @"[0,]");
@ -2460,8 +2765,9 @@ mod tests {
before_started_at: Some(start_time),
..Default::default()
};
let tasks =
index_scheduler.get_task_ids_from_authorized_indexes(&rtxn, &query, &None).unwrap();
let tasks = index_scheduler
.get_task_ids_from_authorized_indexes(&rtxn, &query, &AuthFilter::default())
.unwrap();
// both enqueued and processing tasks in the first tick, but limited to those with a started_at
// that comes before the start of the test, which should exclude all of them
snapshot!(snapshot_bitmap(&tasks), @"[]");
@ -2472,8 +2778,9 @@ mod tests {
before_started_at: Some(start_time + Duration::minutes(1)),
..Default::default()
};
let tasks =
index_scheduler.get_task_ids_from_authorized_indexes(&rtxn, &query, &None).unwrap();
let tasks = index_scheduler
.get_task_ids_from_authorized_indexes(&rtxn, &query, &AuthFilter::default())
.unwrap();
// both enqueued and processing tasks in the first tick, but limited to those with a started_at
// that comes after the start of the test and before one minute after the start of the test,
// which should exclude the enqueued tasks and include the only processing task
@ -2498,8 +2805,9 @@ mod tests {
before_started_at: Some(start_time + Duration::minutes(1)),
..Default::default()
};
let tasks =
index_scheduler.get_task_ids_from_authorized_indexes(&rtxn, &query, &None).unwrap();
let tasks = index_scheduler
.get_task_ids_from_authorized_indexes(&rtxn, &query, &AuthFilter::default())
.unwrap();
// both succeeded and processing tasks in the first tick, but limited to those with a started_at
// that comes after the start of the test and before one minute after the start of the test,
// which should include all tasks
@ -2510,8 +2818,9 @@ mod tests {
before_started_at: Some(start_time),
..Default::default()
};
let tasks =
index_scheduler.get_task_ids_from_authorized_indexes(&rtxn, &query, &None).unwrap();
let tasks = index_scheduler
.get_task_ids_from_authorized_indexes(&rtxn, &query, &AuthFilter::default())
.unwrap();
// both succeeded and processing tasks in the first tick, but limited to those with a started_at
// that comes before the start of the test, which should exclude all tasks
snapshot!(snapshot_bitmap(&tasks), @"[]");
@ -2522,8 +2831,9 @@ mod tests {
before_started_at: Some(second_start_time + Duration::minutes(1)),
..Default::default()
};
let tasks =
index_scheduler.get_task_ids_from_authorized_indexes(&rtxn, &query, &None).unwrap();
let tasks = index_scheduler
.get_task_ids_from_authorized_indexes(&rtxn, &query, &AuthFilter::default())
.unwrap();
// both succeeded and processing tasks in the first tick, but limited to those with a started_at
// that comes after the start of the second part of the test and before one minute after the
// second start of the test, which should exclude all tasks
@ -2541,8 +2851,9 @@ mod tests {
let rtxn = index_scheduler.env.read_txn().unwrap();
let tasks =
index_scheduler.get_task_ids_from_authorized_indexes(&rtxn, &query, &None).unwrap();
let tasks = index_scheduler
.get_task_ids_from_authorized_indexes(&rtxn, &query, &AuthFilter::default())
.unwrap();
// we run the same query to verify that, and indeed find that the last task is matched
snapshot!(snapshot_bitmap(&tasks), @"[2,]");
@ -2552,8 +2863,9 @@ mod tests {
before_started_at: Some(second_start_time + Duration::minutes(1)),
..Default::default()
};
let tasks =
index_scheduler.get_task_ids_from_authorized_indexes(&rtxn, &query, &None).unwrap();
let tasks = index_scheduler
.get_task_ids_from_authorized_indexes(&rtxn, &query, &AuthFilter::default())
.unwrap();
// enqueued, succeeded, or processing tasks started after the second part of the test, should
// again only return the last task
snapshot!(snapshot_bitmap(&tasks), @"[2,]");
@ -2563,8 +2875,9 @@ mod tests {
// now the last task should have failed
snapshot!(snapshot_index_scheduler(&index_scheduler), name: "end");
let tasks =
index_scheduler.get_task_ids_from_authorized_indexes(&rtxn, &query, &None).unwrap();
let tasks = index_scheduler
.get_task_ids_from_authorized_indexes(&rtxn, &query, &AuthFilter::default())
.unwrap();
// so running the last query should return nothing
snapshot!(snapshot_bitmap(&tasks), @"[]");
@ -2574,8 +2887,9 @@ mod tests {
before_started_at: Some(second_start_time + Duration::minutes(1)),
..Default::default()
};
let tasks =
index_scheduler.get_task_ids_from_authorized_indexes(&rtxn, &query, &None).unwrap();
let tasks = index_scheduler
.get_task_ids_from_authorized_indexes(&rtxn, &query, &AuthFilter::default())
.unwrap();
// but the same query on failed tasks should return the last task
snapshot!(snapshot_bitmap(&tasks), @"[2,]");
@ -2585,8 +2899,9 @@ mod tests {
before_started_at: Some(second_start_time + Duration::minutes(1)),
..Default::default()
};
let tasks =
index_scheduler.get_task_ids_from_authorized_indexes(&rtxn, &query, &None).unwrap();
let tasks = index_scheduler
.get_task_ids_from_authorized_indexes(&rtxn, &query, &AuthFilter::default())
.unwrap();
// but the same query on failed tasks should return the last task
snapshot!(snapshot_bitmap(&tasks), @"[2,]");
@ -2597,8 +2912,9 @@ mod tests {
before_started_at: Some(second_start_time + Duration::minutes(1)),
..Default::default()
};
let tasks =
index_scheduler.get_task_ids_from_authorized_indexes(&rtxn, &query, &None).unwrap();
let tasks = index_scheduler
.get_task_ids_from_authorized_indexes(&rtxn, &query, &AuthFilter::default())
.unwrap();
// same query but with an invalid uid
snapshot!(snapshot_bitmap(&tasks), @"[]");
@ -2609,8 +2925,9 @@ mod tests {
before_started_at: Some(second_start_time + Duration::minutes(1)),
..Default::default()
};
let tasks =
index_scheduler.get_task_ids_from_authorized_indexes(&rtxn, &query, &None).unwrap();
let tasks = index_scheduler
.get_task_ids_from_authorized_indexes(&rtxn, &query, &AuthFilter::default())
.unwrap();
// same query but with a valid uid
snapshot!(snapshot_bitmap(&tasks), @"[2,]");
}
@ -2640,8 +2957,9 @@ mod tests {
let rtxn = index_scheduler.env.read_txn().unwrap();
let query = Query { index_uids: Some(vec!["catto".to_owned()]), ..Default::default() };
let tasks =
index_scheduler.get_task_ids_from_authorized_indexes(&rtxn, &query, &None).unwrap();
let tasks = index_scheduler
.get_task_ids_from_authorized_indexes(&rtxn, &query, &AuthFilter::default())
.unwrap();
// only the first task associated with catto is returned, the indexSwap tasks are excluded!
snapshot!(snapshot_bitmap(&tasks), @"[0,]");
@ -2650,7 +2968,9 @@ mod tests {
.get_task_ids_from_authorized_indexes(
&rtxn,
&query,
&Some(vec![IndexUidPattern::new_unchecked("doggo")]),
&AuthFilter::with_allowed_indexes(
vec![IndexUidPattern::new_unchecked("doggo")].into_iter().collect(),
),
)
.unwrap();
// we have asked for only the tasks associated with catto, but are only authorized to retrieve the tasks
@ -2662,7 +2982,9 @@ mod tests {
.get_task_ids_from_authorized_indexes(
&rtxn,
&query,
&Some(vec![IndexUidPattern::new_unchecked("doggo")]),
&AuthFilter::with_allowed_indexes(
vec![IndexUidPattern::new_unchecked("doggo")].into_iter().collect(),
),
)
.unwrap();
// we asked for all the tasks, but we are only authorized to retrieve the doggo tasks
@ -2674,10 +2996,14 @@ mod tests {
.get_task_ids_from_authorized_indexes(
&rtxn,
&query,
&Some(vec![
IndexUidPattern::new_unchecked("catto"),
IndexUidPattern::new_unchecked("doggo"),
]),
&AuthFilter::with_allowed_indexes(
vec![
IndexUidPattern::new_unchecked("catto"),
IndexUidPattern::new_unchecked("doggo"),
]
.into_iter()
.collect(),
),
)
.unwrap();
// we asked for all the tasks, but we are only authorized to retrieve the doggo and catto tasks
@ -2685,8 +3011,9 @@ mod tests {
snapshot!(snapshot_bitmap(&tasks), @"[0,1,]");
let query = Query::default();
let tasks =
index_scheduler.get_task_ids_from_authorized_indexes(&rtxn, &query, &None).unwrap();
let tasks = index_scheduler
.get_task_ids_from_authorized_indexes(&rtxn, &query, &AuthFilter::default())
.unwrap();
// we asked for all the tasks with all index authorized -> all tasks returned
snapshot!(snapshot_bitmap(&tasks), @"[0,1,2,3,]");
}
@ -2717,8 +3044,9 @@ mod tests {
let rtxn = index_scheduler.read_txn().unwrap();
let query = Query { canceled_by: Some(vec![task_cancelation.uid]), ..Query::default() };
let tasks =
index_scheduler.get_task_ids_from_authorized_indexes(&rtxn, &query, &None).unwrap();
let tasks = index_scheduler
.get_task_ids_from_authorized_indexes(&rtxn, &query, &AuthFilter::default())
.unwrap();
// 0 is not returned because it was not canceled, 3 is not returned because it is the uid of the
// taskCancelation itself
snapshot!(snapshot_bitmap(&tasks), @"[1,2,]");
@ -2728,7 +3056,9 @@ mod tests {
.get_task_ids_from_authorized_indexes(
&rtxn,
&query,
&Some(vec![IndexUidPattern::new_unchecked("doggo")]),
&AuthFilter::with_allowed_indexes(
vec![IndexUidPattern::new_unchecked("doggo")].into_iter().collect(),
),
)
.unwrap();
// Return only 1 because the user is not authorized to see task 2

203
index-scheduler/src/lru.rs Normal file
View File

@ -0,0 +1,203 @@
//! Thread-safe `Vec`-backed LRU cache using [`std::sync::atomic::AtomicU64`] for synchronization.
use std::sync::atomic::{AtomicU64, Ordering};
/// Thread-safe `Vec`-backed LRU cache
#[derive(Debug)]
pub struct Lru<T> {
data: Vec<(AtomicU64, T)>,
generation: AtomicU64,
cap: usize,
}
impl<T> Lru<T> {
/// Creates a new LRU cache with the specified capacity.
///
/// The capacity is allocated up-front, and will never change through a [`Self::put`] operation.
///
/// # Panics
///
/// - If the capacity is 0.
/// - If the capacity exceeds `isize::MAX` bytes.
pub fn new(cap: usize) -> Self {
assert_ne!(cap, 0, "The capacity of a cache cannot be 0");
Self {
// Note: since the element of the vector contains an AtomicU64, it is definitely not zero-sized so cap will never be usize::MAX.
data: Vec::with_capacity(cap),
generation: AtomicU64::new(0),
cap,
}
}
/// The capacity of this LRU cache, that is the maximum number of elements it can hold before evicting elements from the cache.
///
/// The cache will contain at most this number of elements at any given time.
pub fn capacity(&self) -> usize {
self.cap
}
fn next_generation(&self) -> u64 {
// Acquire so this "happens-before" any potential store to a data cell (with Release ordering)
let generation = self.generation.fetch_add(1, Ordering::Acquire);
generation + 1
}
fn next_generation_mut(&mut self) -> u64 {
let generation = self.generation.get_mut();
*generation += 1;
*generation
}
/// Add a value in the cache, evicting an older value if necessary.
///
/// If a value was evicted from the cache, it is returned.
///
/// # Complexity
///
/// - If the cache is full, then linear in the capacity.
/// - Otherwise constant.
pub fn put(&mut self, value: T) -> Option<T> {
// no need for a memory fence: we assume that whichever mechanism provides us synchronization
// (very probably, a RwLock) takes care of fencing for us.
let next_generation = self.next_generation_mut();
let evicted = if self.is_full() { self.pop() } else { None };
self.data.push((AtomicU64::new(next_generation), value));
evicted
}
/// Evict the oldest value from the cache.
///
/// If the cache is empty, `None` will be returned.
///
/// # Complexity
///
/// - Linear in the capacity of the cache.
pub fn pop(&mut self) -> Option<T> {
// Don't use `Iterator::min_by_key`, which only provides shared references to its elements;
// we want an exclusive one.
// This allows handling the `AtomicU64`s as normal integers without using atomic instructions.
let mut min_generation_index = None;
for (index, (generation, _)) in self.data.iter_mut().enumerate() {
let generation = *generation.get_mut();
if let Some((_, min_generation)) = min_generation_index {
if min_generation > generation {
min_generation_index = Some((index, generation));
}
} else {
min_generation_index = Some((index, generation))
}
}
min_generation_index.map(|(min_index, _)| self.data.swap_remove(min_index).1)
}
/// The current number of elements in the cache.
///
/// This value is guaranteed to be less than or equal to [`Self::capacity`].
pub fn len(&self) -> usize {
self.data.len()
}
/// Returns `true` if putting any additional element in the cache would cause the eviction of an element.
pub fn is_full(&self) -> bool {
self.len() == self.capacity()
}
}
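A tiny standalone illustration of the trick called out in `pop`'s comment: with exclusive access, `AtomicU64::get_mut` hands back a plain `&mut u64`, so the recency scan needs no atomic instructions. Types and values below are made up for the example.

```rust
use std::sync::atomic::AtomicU64;

fn oldest_index(slots: &mut [(AtomicU64, &'static str)]) -> Option<usize> {
    let mut oldest: Option<(usize, u64)> = None;
    for (index, (generation, _)) in slots.iter_mut().enumerate() {
        // With `&mut` access, reading the counter needs no atomic load.
        let generation = *generation.get_mut();
        match oldest {
            Some((_, oldest_generation)) if oldest_generation <= generation => {}
            _ => oldest = Some((index, generation)),
        }
    }
    oldest.map(|(index, _)| index)
}

fn main() {
    let mut slots = vec![
        (AtomicU64::new(7), "catto"),
        (AtomicU64::new(3), "doggo"),
        (AtomicU64::new(9), "birdo"),
    ];
    // "doggo" carries the smallest generation, so it is the eviction candidate.
    assert_eq!(oldest_index(&mut slots), Some(1));
}
```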
pub struct LruMap<K, V>(Lru<(K, V)>);
impl<K, V> LruMap<K, V>
where
K: Eq,
{
/// Creates a new LRU cache map with the specified capacity.
///
/// The capacity is allocated up-front, and will never change through a [`Self::insert`] operation.
///
/// # Panics
///
/// - If the capacity is 0.
/// - If the capacity exceeds `isize::MAX` bytes.
pub fn new(cap: usize) -> Self {
Self(Lru::new(cap))
}
/// Gets a value in the cache map by its key.
///
/// If no value matches, `None` will be returned.
///
/// # Complexity
///
/// - Linear in the capacity of the cache.
pub fn get(&self, key: &K) -> Option<&V> {
for (generation, (candidate, value)) in self.0.data.iter() {
if key == candidate {
generation.store(self.0.next_generation(), Ordering::Release);
return Some(value);
}
}
None
}
/// Gets a value in the cache map by its key.
///
/// If no value matches, `None` will be returned.
///
/// # Complexity
///
/// - Linear in the capacity of the cache.
pub fn get_mut(&mut self, key: &K) -> Option<&mut V> {
let next_generation = self.0.next_generation_mut();
for (generation, (candidate, value)) in self.0.data.iter_mut() {
if key == candidate {
*generation.get_mut() = next_generation;
return Some(value);
}
}
None
}
/// Inserts a value in the cache map by its key, replacing any existing value and returning any evicted value.
///
/// # Complexity
///
/// - Linear in the capacity of the cache.
pub fn insert(&mut self, key: K, mut value: V) -> InsertionOutcome<K, V> {
match self.get_mut(&key) {
Some(old_value) => {
std::mem::swap(old_value, &mut value);
InsertionOutcome::Replaced(value)
}
None => match self.0.put((key, value)) {
Some((key, value)) => InsertionOutcome::Evicted(key, value),
None => InsertionOutcome::InsertedNew,
},
}
}
/// Removes an element from the cache map by its key, returning its value.
///
/// Returns `None` if there was no element with this key in the cache.
///
/// # Complexity
///
/// - Linear in the capacity of the cache.
pub fn remove(&mut self, key: &K) -> Option<V> {
for (index, (_, (candidate, _))) in self.0.data.iter_mut().enumerate() {
if key == candidate {
return Some(self.0.data.swap_remove(index).1 .1);
}
}
None
}
}
/// The result of an insertion in an LRU map.
pub enum InsertionOutcome<K, V> {
/// The key was not in the cache, the key-value pair has been inserted.
InsertedNew,
/// The key was not in the cache and an old key-value pair was evicted from the cache to make room for its insertion.
Evicted(K, V),
/// The key was already in the cache map, its value has been updated.
Replaced(V),
}

View File

@ -538,3 +538,37 @@ impl IndexScheduler {
}
}
}
pub fn dichotomic_search(start_point: usize, mut is_good: impl FnMut(usize) -> bool) -> usize {
let mut biggest_good = None;
let mut smallest_bad = None;
let mut current = start_point;
loop {
let is_good = is_good(current);
(biggest_good, smallest_bad, current) = match (biggest_good, smallest_bad, is_good) {
(None, None, false) => (None, Some(current), current / 2),
(None, None, true) => (Some(current), None, current * 2),
(None, Some(smallest_bad), true) => {
(Some(current), Some(smallest_bad), (current + smallest_bad) / 2)
}
(None, Some(_), false) => (None, Some(current), current / 2),
(Some(_), None, true) => (Some(current), None, current * 2),
(Some(biggest_good), None, false) => {
(Some(biggest_good), Some(current), (biggest_good + current) / 2)
}
(Some(_), Some(smallest_bad), true) => {
(Some(current), Some(smallest_bad), (smallest_bad + current) / 2)
}
(Some(biggest_good), Some(_), false) => {
(Some(biggest_good), Some(current), (biggest_good + current) / 2)
}
};
if current == 0 {
return current;
}
if smallest_bad.is_some() && biggest_good.is_some() && biggest_good >= Some(current) {
return current;
}
}
}
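A small usage sketch, assuming the `dichotomic_search` above is in scope: the probe grows while the predicate holds and shrinks once it fails, settling on the largest value accepted by `is_good` (here, a hypothetical budget cap of 1000).

```rust
fn main() {
    // Grows 64 -> 128 -> ... -> 1024, then narrows back down between the
    // last good probe and the first bad one.
    let budget = dichotomic_search(64, |candidate| candidate <= 1000);
    assert_eq!(budget, 1000);
    println!("selected budget: {budget}");
}
```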

View File

@ -12,6 +12,7 @@ license.workspace = true
[dependencies]
base64 = "0.13.1"
cluster = { path = "../cluster" }
enum-iterator = "1.1.3"
hmac = "0.12.1"
maplit = "1.0.2"

View File

@ -6,6 +6,7 @@ use std::collections::{HashMap, HashSet};
use std::path::Path;
use std::sync::Arc;
use cluster::Cluster;
use error::{AuthControllerError, Result};
use maplit::hashset;
use meilisearch_types::index_uid_pattern::IndexUidPattern;
@ -21,17 +22,52 @@ use uuid::Uuid;
pub struct AuthController {
store: Arc<HeedAuthStore>,
master_key: Option<String>,
cluster: Option<Cluster>,
}
impl AuthController {
pub fn new(db_path: impl AsRef<Path>, master_key: &Option<String>) -> Result<Self> {
pub fn new(
db_path: impl AsRef<Path>,
master_key: &Option<String>,
cluster: Option<Cluster>,
) -> Result<Self> {
let store = HeedAuthStore::new(db_path)?;
if store.is_empty()? {
generate_default_keys(&store)?;
}
Ok(Self { store: Arc::new(store), master_key: master_key.clone() })
let this = Self {
store: Arc::new(store),
master_key: master_key.clone(),
cluster: cluster.clone(),
};
if let Some(Cluster::Follower(follower)) = cluster {
let this = this.clone();
std::thread::spawn(move || loop {
match follower.api_key_operation() {
cluster::ApiKeyOperation::Insert(key) => {
this.store.put_api_key(key).expect("Inconsistency with the leader");
}
cluster::ApiKeyOperation::Delete(uuid) => {
this.store.delete_api_key(uuid).expect("Inconsistency with the leader");
}
}
});
} else if let Some(Cluster::Leader(leader)) = cluster {
let this = this.clone();
std::thread::spawn(move || loop {
let channel = leader.needs_keys();
let keys = this.list_keys().expect("auth controller is dead");
channel.send(keys).expect("Cluster is dead");
});
}
Ok(this)
}
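A simplified standalone sketch of the follower loop above, with hypothetical types: key operations streamed from the leader are applied to the local store in arrival order, which is what keeps followers consistent with the leader.

```rust
use std::collections::HashMap;

enum ApiKeyOperation {
    Insert(u32, String), // uid, key payload
    Delete(u32),
}

fn apply(store: &mut HashMap<u32, String>, op: ApiKeyOperation) {
    match op {
        ApiKeyOperation::Insert(uid, key) => {
            store.insert(uid, key);
        }
        ApiKeyOperation::Delete(uid) => {
            store.remove(&uid);
        }
    }
}

fn main() {
    let mut store = HashMap::new();
    apply(&mut store, ApiKeyOperation::Insert(1, "search key".to_string()));
    apply(&mut store, ApiKeyOperation::Delete(1));
    assert!(store.is_empty());
}
```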
/// Return the size of the `AuthController` database in bytes.
@ -42,7 +78,13 @@ impl AuthController {
pub fn create_key(&self, create_key: CreateApiKey) -> Result<Key> {
match self.store.get_api_key(create_key.uid)? {
Some(_) => Err(AuthControllerError::ApiKeyAlreadyExists(create_key.uid.to_string())),
None => self.store.put_api_key(create_key.to_key()),
None => {
let key = self.store.put_api_key(create_key.to_key())?;
if let Some(Cluster::Leader(ref leader)) = self.cluster {
leader.insert_key(key.clone());
}
Ok(key)
}
}
}
@ -57,7 +99,12 @@ impl AuthController {
name => key.name = name.set(),
};
key.updated_at = OffsetDateTime::now_utc();
self.store.put_api_key(key)
let key = self.store.put_api_key(key)?;
if let Some(Cluster::Leader(ref leader)) = self.cluster {
leader.insert_key(key.clone());
}
Ok(key)
}
pub fn get_key(&self, uid: Uuid) -> Result<Key> {
@ -85,17 +132,13 @@ impl AuthController {
uid: Uuid,
search_rules: Option<SearchRules>,
) -> Result<AuthFilter> {
let mut filters = AuthFilter::default();
let key = self.get_key(uid)?;
filters.search_rules = match search_rules {
Some(search_rules) => search_rules,
None => SearchRules::Set(key.indexes.into_iter().collect()),
};
let key_authorized_indexes = SearchRules::Set(key.indexes.into_iter().collect());
filters.allow_index_creation = self.is_key_authorized(uid, Action::IndexesAdd, None)?;
let allow_index_creation = self.is_key_authorized(uid, Action::IndexesAdd, None)?;
Ok(filters)
Ok(AuthFilter { search_rules, key_authorized_indexes, allow_index_creation })
}
pub fn list_keys(&self) -> Result<Vec<Key>> {
@ -104,6 +147,9 @@ impl AuthController {
pub fn delete_key(&self, uid: Uuid) -> Result<()> {
if self.store.delete_api_key(uid)? {
if let Some(Cluster::Leader(ref leader)) = self.cluster {
leader.delete_key(uid);
}
Ok(())
} else {
Err(AuthControllerError::ApiKeyNotFound(uid.to_string()))
@ -160,13 +206,59 @@ impl AuthController {
}
pub struct AuthFilter {
pub search_rules: SearchRules,
pub allow_index_creation: bool,
search_rules: Option<SearchRules>,
key_authorized_indexes: SearchRules,
allow_index_creation: bool,
}
impl Default for AuthFilter {
fn default() -> Self {
Self { search_rules: SearchRules::default(), allow_index_creation: true }
Self {
search_rules: None,
key_authorized_indexes: SearchRules::default(),
allow_index_creation: true,
}
}
}
impl AuthFilter {
#[inline]
pub fn allow_index_creation(&self, index: &str) -> bool {
self.allow_index_creation && self.is_index_authorized(index)
}
pub fn with_allowed_indexes(allowed_indexes: HashSet<IndexUidPattern>) -> Self {
Self {
search_rules: None,
key_authorized_indexes: SearchRules::Set(allowed_indexes),
allow_index_creation: false,
}
}
pub fn all_indexes_authorized(&self) -> bool {
self.key_authorized_indexes.all_indexes_authorized()
&& self
.search_rules
.as_ref()
.map(|search_rules| search_rules.all_indexes_authorized())
.unwrap_or(true)
}
pub fn is_index_authorized(&self, index: &str) -> bool {
self.key_authorized_indexes.is_index_authorized(index)
&& self
.search_rules
.as_ref()
.map(|search_rules| search_rules.is_index_authorized(index))
.unwrap_or(true)
}
pub fn get_index_search_rules(&self, index: &str) -> Option<IndexSearchRules> {
if !self.is_index_authorized(index) {
return None;
}
let search_rules = self.search_rules.as_ref().unwrap_or(&self.key_authorized_indexes);
search_rules.get_index_search_rules(index)
}
}
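A minimal standalone sketch, with hypothetical simplified types, of the composition rule implemented above: the key's authorized indexes and the optional tenant-token search rules are ANDed together, so an index must be allowed by both to be authorized.

```rust
use std::collections::HashSet;

// "*" stands for "every index", as in the real search rules.
fn allows(set: &HashSet<String>, index: &str) -> bool {
    set.contains("*") || set.contains(index)
}

struct Filter {
    key_authorized_indexes: HashSet<String>, // what the API key allows
    search_rules: Option<HashSet<String>>,   // extra tenant-token restriction
}

impl Filter {
    // An index is authorized only if the key allows it *and* the tenant
    // token (when present) allows it too.
    fn is_index_authorized(&self, index: &str) -> bool {
        allows(&self.key_authorized_indexes, index)
            && self.search_rules.as_ref().map(|rules| allows(rules, index)).unwrap_or(true)
    }
}

fn main() {
    let filter = Filter {
        key_authorized_indexes: ["catto", "doggo"].into_iter().map(String::from).collect(),
        search_rules: Some(["doggo"].into_iter().map(String::from).collect()),
    };
    assert!(filter.is_index_authorized("doggo"));  // allowed by both
    assert!(!filter.is_index_authorized("catto")); // key allows it, token does not
}
```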
@ -185,7 +277,7 @@ impl Default for SearchRules {
}
impl SearchRules {
pub fn is_index_authorized(&self, index: &str) -> bool {
fn is_index_authorized(&self, index: &str) -> bool {
match self {
Self::Set(set) => {
set.contains("*")
@ -200,7 +292,7 @@ impl SearchRules {
}
}
pub fn get_index_search_rules(&self, index: &str) -> Option<IndexSearchRules> {
fn get_index_search_rules(&self, index: &str) -> Option<IndexSearchRules> {
match self {
Self::Set(_) => {
if self.is_index_authorized(index) {
@ -219,24 +311,10 @@ impl SearchRules {
}
}
/// Return the list of indexes such that `self.is_index_authorized(index) == true`,
/// or `None` if all indexes satisfy this condition.
pub fn authorized_indexes(&self) -> Option<Vec<IndexUidPattern>> {
fn all_indexes_authorized(&self) -> bool {
match self {
SearchRules::Set(set) => {
if set.contains("*") {
None
} else {
Some(set.iter().cloned().collect())
}
}
SearchRules::Map(map) => {
if map.contains_key("*") {
None
} else {
Some(map.keys().cloned().collect())
}
}
SearchRules::Set(set) => set.contains("*"),
SearchRules::Map(map) => map.contains_key("*"),
}
}
}

View File

@ -11,8 +11,8 @@ use serde::{Deserialize, Serialize};
#[serde(rename_all = "camelCase")]
pub struct ResponseError {
#[serde(skip)]
code: StatusCode,
message: String,
pub code: StatusCode,
pub message: String,
#[serde(rename = "code")]
error_code: String,
#[serde(rename = "type")]
@ -212,6 +212,7 @@ InvalidApiKeyName , InvalidRequest , BAD_REQUEST ;
InvalidApiKeyOffset , InvalidRequest , BAD_REQUEST ;
InvalidApiKeyUid , InvalidRequest , BAD_REQUEST ;
InvalidContentType , InvalidRequest , UNSUPPORTED_MEDIA_TYPE ;
InvalidDocumentCsvDelimiter , InvalidRequest , BAD_REQUEST ;
InvalidDocumentFields , InvalidRequest , BAD_REQUEST ;
InvalidDocumentGeoField , InvalidRequest , BAD_REQUEST ;
InvalidDocumentId , InvalidRequest , BAD_REQUEST ;
@ -220,7 +221,6 @@ InvalidDocumentOffset , InvalidRequest , BAD_REQUEST ;
InvalidIndexLimit , InvalidRequest , BAD_REQUEST ;
InvalidIndexOffset , InvalidRequest , BAD_REQUEST ;
InvalidIndexPrimaryKey , InvalidRequest , BAD_REQUEST ;
InvalidIndexCsvDelimiter , InvalidRequest , BAD_REQUEST ;
InvalidIndexUid , InvalidRequest , BAD_REQUEST ;
InvalidSearchAttributesToCrop , InvalidRequest , BAD_REQUEST ;
InvalidSearchAttributesToHighlight , InvalidRequest , BAD_REQUEST ;

View File

@ -46,7 +46,7 @@ pub fn check_version_file(db_path: &Path) -> anyhow::Result<()> {
pub enum VersionFileError {
#[error(
"Meilisearch (v{}) failed to infer the version of the database.
To update Meilisearch please follow our guide on https://docs.meilisearch.com/learn/advanced/updating.html.",
To update Meilisearch please follow our guide on https://docs.meilisearch.com/learn/update_and_migration/updating.html.",
env!("CARGO_PKG_VERSION").to_string()
)]
MissingVersionFile,
@ -54,7 +54,7 @@ pub enum VersionFileError {
MalformedVersionFile,
#[error(
"Your database version ({major}.{minor}.{patch}) is incompatible with your current engine version ({}).\n\
To migrate data between Meilisearch versions, please follow our guide on https://docs.meilisearch.com/learn/advanced/updating.html.",
To migrate data between Meilisearch versions, please follow our guide on https://docs.meilisearch.com/learn/update_and_migration/updating.html.",
env!("CARGO_PKG_VERSION").to_string()
)]
VersionMismatch { major: String, minor: String, patch: String },

View File

@ -24,6 +24,7 @@ bstr = "1.0.1"
byte-unit = { version = "4.0.14", default-features = false, features = ["std", "serde"] }
bytes = "1.2.1"
clap = { version = "4.0.9", features = ["derive", "env"] }
cluster = { path = "../cluster" }
crossbeam-channel = "0.5.6"
deserr = "0.5.0"
dump = { path = "../dump" }
@ -52,7 +53,7 @@ parking_lot = "0.12.1"
permissive-json-pointer = { path = "../permissive-json-pointer" }
pin-project-lite = "0.2.9"
platform-dirs = "0.3.0"
prometheus = { version = "0.13.2", features = ["process"], optional = true }
prometheus = { version = "0.13.2", features = ["process"] }
rand = "0.8.5"
rayon = "1.5.3"
regex = "1.6.0"
@ -107,7 +108,6 @@ zip = { version = "0.6.2", optional = true }
[features]
default = ["analytics", "meilisearch-types/default", "mini-dashboard"]
metrics = ["prometheus"]
analytics = ["segment"]
mini-dashboard = ["actix-web-static-files", "static-files", "anyhow", "cargo_toml", "hex", "reqwest", "sha-1", "tempfile", "zip"]
chinese = ["meilisearch-types/chinese"]

View File

@ -26,6 +26,18 @@ impl SearchAggregator {
pub fn succeed(&mut self, _: &dyn Any) {}
}
#[derive(Default)]
pub struct MultiSearchAggregator;
#[allow(dead_code)]
impl MultiSearchAggregator {
pub fn from_queries(_: &dyn Any, _: &dyn Any) -> Self {
Self::default()
}
pub fn succeed(&mut self) {}
}
impl MockAnalytics {
#[allow(clippy::new_ret_no_self)]
pub fn new(opt: &Opt) -> Arc<dyn Analytics> {
@ -43,6 +55,7 @@ impl Analytics for MockAnalytics {
fn publish(&self, _event_name: String, _send: Value, _request: Option<&HttpRequest>) {}
fn get_search(&self, _aggregate: super::SearchAggregator) {}
fn post_search(&self, _aggregate: super::SearchAggregator) {}
fn post_multi_search(&self, _aggregate: super::MultiSearchAggregator) {}
fn add_documents(
&self,
_documents_query: &UpdateDocumentsQuery,

View File

@ -23,6 +23,8 @@ use crate::routes::tasks::TasksFilterQuery;
pub type SegmentAnalytics = mock_analytics::MockAnalytics;
#[cfg(any(debug_assertions, not(feature = "analytics")))]
pub type SearchAggregator = mock_analytics::SearchAggregator;
#[cfg(any(debug_assertions, not(feature = "analytics")))]
pub type MultiSearchAggregator = mock_analytics::MultiSearchAggregator;
// if we are in release mode and the feature analytics was enabled
// we use the real analytics
@ -30,6 +32,8 @@ pub type SearchAggregator = mock_analytics::SearchAggregator;
pub type SegmentAnalytics = segment_analytics::SegmentAnalytics;
#[cfg(all(not(debug_assertions), feature = "analytics"))]
pub type SearchAggregator = segment_analytics::SearchAggregator;
#[cfg(all(not(debug_assertions), feature = "analytics"))]
pub type MultiSearchAggregator = segment_analytics::MultiSearchAggregator;
/// The Meilisearch config dir:
/// `~/.config/Meilisearch` on *NIX or *BSD.
@ -74,6 +78,9 @@ pub trait Analytics: Sync + Send {
/// This method should be called to aggregate a post search
fn post_search(&self, aggregate: SearchAggregator);
/// This method should be called to aggregate a post array of searches
fn post_multi_search(&self, aggregate: MultiSearchAggregator);
// this method should be called to aggregate a add documents request
fn add_documents(
&self,

View File

@ -9,7 +9,7 @@ use actix_web::HttpRequest;
use byte_unit::Byte;
use http::header::CONTENT_TYPE;
use index_scheduler::IndexScheduler;
use meilisearch_auth::{AuthController, SearchRules};
use meilisearch_auth::{AuthController, AuthFilter};
use meilisearch_types::InstanceUid;
use once_cell::sync::Lazy;
use regex::Regex;
@ -30,7 +30,7 @@ use crate::routes::indexes::documents::UpdateDocumentsQuery;
use crate::routes::tasks::TasksFilterQuery;
use crate::routes::{create_all_stats, Stats};
use crate::search::{
SearchQuery, SearchResult, DEFAULT_CROP_LENGTH, DEFAULT_CROP_MARKER,
SearchQuery, SearchQueryWithIndex, SearchResult, DEFAULT_CROP_LENGTH, DEFAULT_CROP_MARKER,
DEFAULT_HIGHLIGHT_POST_TAG, DEFAULT_HIGHLIGHT_PRE_TAG, DEFAULT_SEARCH_LIMIT,
};
use crate::Opt;
@ -68,6 +68,7 @@ pub enum AnalyticsMsg {
BatchMessage(Track),
AggregateGetSearch(SearchAggregator),
AggregatePostSearch(SearchAggregator),
AggregatePostMultiSearch(MultiSearchAggregator),
AggregateAddDocuments(DocumentsAggregator),
AggregateDeleteDocuments(DocumentsDeletionAggregator),
AggregateUpdateDocuments(DocumentsAggregator),
@ -133,6 +134,7 @@ impl SegmentAnalytics {
opt: opt.clone(),
batcher,
post_search_aggregator: SearchAggregator::default(),
post_multi_search_aggregator: MultiSearchAggregator::default(),
get_search_aggregator: SearchAggregator::default(),
add_documents_aggregator: DocumentsAggregator::default(),
delete_documents_aggregator: DocumentsDeletionAggregator::default(),
@ -174,6 +176,10 @@ impl super::Analytics for SegmentAnalytics {
let _ = self.sender.try_send(AnalyticsMsg::AggregatePostSearch(aggregate));
}
fn post_multi_search(&self, aggregate: MultiSearchAggregator) {
let _ = self.sender.try_send(AnalyticsMsg::AggregatePostMultiSearch(aggregate));
}
fn add_documents(
&self,
documents_query: &UpdateDocumentsQuery,
@ -218,6 +224,7 @@ impl super::Analytics for SegmentAnalytics {
#[derive(Debug, Clone, Serialize)]
struct Infos {
env: String,
experimental_enable_metrics: bool,
db_path: bool,
import_dump: bool,
dump_dir: bool,
@ -250,6 +257,7 @@ impl From<Opt> for Infos {
// Thus we must not insert `..` at the end.
let Opt {
db_path,
experimental_enable_metrics,
http_addr,
master_key: _,
env,
@ -274,6 +282,7 @@ impl From<Opt> for Infos {
dump_dir,
log_level,
indexer_options,
cluster_configuration: _,
config_file_path,
#[cfg(all(not(debug_assertions), feature = "analytics"))]
no_analytics: _,
@ -284,12 +293,14 @@ impl From<Opt> for Infos {
ScheduleSnapshot::Enabled(interval) => Some(interval),
};
let IndexerOpts { max_indexing_memory, max_indexing_threads } = indexer_options;
let IndexerOpts { max_indexing_memory, max_indexing_threads, skip_index_budget: _ } =
indexer_options;
// We're going to override every sensible information.
// We consider information sensible if it contains a path, an address, or a key.
Self {
env,
experimental_enable_metrics,
db_path: db_path != PathBuf::from("./data.ms"),
import_dump: import_dump.is_some(),
dump_dir: dump_dir != PathBuf::from("dumps/"),
@ -324,6 +335,7 @@ pub struct Segment {
batcher: AutoBatcher,
get_search_aggregator: SearchAggregator,
post_search_aggregator: SearchAggregator,
post_multi_search_aggregator: MultiSearchAggregator,
add_documents_aggregator: DocumentsAggregator,
delete_documents_aggregator: DocumentsDeletionAggregator,
update_documents_aggregator: DocumentsAggregator,
@ -381,6 +393,7 @@ impl Segment {
Some(AnalyticsMsg::BatchMessage(msg)) => drop(self.batcher.push(msg).await),
Some(AnalyticsMsg::AggregateGetSearch(agreg)) => self.get_search_aggregator.aggregate(agreg),
Some(AnalyticsMsg::AggregatePostSearch(agreg)) => self.post_search_aggregator.aggregate(agreg),
Some(AnalyticsMsg::AggregatePostMultiSearch(agreg)) => self.post_multi_search_aggregator.aggregate(agreg),
Some(AnalyticsMsg::AggregateAddDocuments(agreg)) => self.add_documents_aggregator.aggregate(agreg),
Some(AnalyticsMsg::AggregateDeleteDocuments(agreg)) => self.delete_documents_aggregator.aggregate(agreg),
Some(AnalyticsMsg::AggregateUpdateDocuments(agreg)) => self.update_documents_aggregator.aggregate(agreg),
@ -399,7 +412,7 @@ impl Segment {
auth_controller: AuthController,
) {
if let Ok(stats) =
create_all_stats(index_scheduler.into(), auth_controller, &SearchRules::default())
create_all_stats(index_scheduler.into(), auth_controller, &AuthFilter::default())
{
// Replace the version number with the prototype name if any.
let version = if let Some(prototype) = crate::prototype_name() {
@ -426,6 +439,8 @@ impl Segment {
.into_event(&self.user, "Documents Searched GET");
let post_search = std::mem::take(&mut self.post_search_aggregator)
.into_event(&self.user, "Documents Searched POST");
let post_multi_search = std::mem::take(&mut self.post_multi_search_aggregator)
.into_event(&self.user, "Documents Searched by Multi-Search POST");
let add_documents = std::mem::take(&mut self.add_documents_aggregator)
.into_event(&self.user, "Documents Added");
let delete_documents = std::mem::take(&mut self.delete_documents_aggregator)
@ -443,6 +458,9 @@ impl Segment {
if let Some(post_search) = post_search {
let _ = self.batcher.push(post_search).await;
}
if let Some(post_multi_search) = post_multi_search {
let _ = self.batcher.push(post_multi_search).await;
}
if let Some(add_documents) = add_documents {
let _ = self.batcher.push(add_documents).await;
}
@ -483,6 +501,7 @@ pub struct SearchAggregator {
// filter
filter_with_geo_radius: bool,
filter_with_geo_bounding_box: bool,
// every time a request has a filter, this field must be incremented by the number of terms it contains
filter_sum_of_criteria_terms: usize,
// every time a request has a filter, this field must be incremented by one
@ -550,6 +569,7 @@ impl SearchAggregator {
let stringified_filters = filter.to_string();
ret.filter_with_geo_radius = stringified_filters.contains("_geoRadius(");
ret.filter_with_geo_bounding_box = stringified_filters.contains("_geoBoundingBox(");
ret.filter_sum_of_criteria_terms = RE.split(&stringified_filters).count();
}
@ -609,6 +629,7 @@ impl SearchAggregator {
// filter
self.filter_with_geo_radius |= other.filter_with_geo_radius;
self.filter_with_geo_bounding_box |= other.filter_with_geo_bounding_box;
self.filter_sum_of_criteria_terms =
self.filter_sum_of_criteria_terms.saturating_add(other.filter_sum_of_criteria_terms);
self.filter_total_number_of_criteria = self
@ -676,6 +697,7 @@ impl SearchAggregator {
},
"filter": {
"with_geoRadius": self.filter_with_geo_radius,
"with_geoBoundingBox": self.filter_with_geo_bounding_box,
"avg_criteria_number": format!("{:.2}", self.filter_sum_of_criteria_terms as f64 / self.filter_total_number_of_criteria as f64),
"most_used_syntax": self.used_syntax.iter().max_by_key(|(_, v)| *v).map(|(k, _)| json!(k)).unwrap_or_else(|| json!(null)),
},
@ -716,6 +738,118 @@ impl SearchAggregator {
}
}
#[derive(Default)]
pub struct MultiSearchAggregator {
timestamp: Option<OffsetDateTime>,
// requests
total_received: usize,
total_succeeded: usize,
// sum of the number of distinct indexes in each single request, use with total_received to compute an avg
total_distinct_index_count: usize,
// number of queries with a single index, use with total_received to compute a proportion
total_single_index: usize,
// sum of the number of search queries in the requests, use with total_received to compute an average
total_search_count: usize,
// context
user_agents: HashSet<String>,
}
impl MultiSearchAggregator {
pub fn from_queries(query: &[SearchQueryWithIndex], request: &HttpRequest) -> Self {
let timestamp = Some(OffsetDateTime::now_utc());
let user_agents = extract_user_agents(request).into_iter().collect();
let distinct_indexes: HashSet<_> =
query.iter().map(|query| query.index_uid.as_str()).collect();
Self {
timestamp,
total_received: 1,
total_succeeded: 0,
total_distinct_index_count: distinct_indexes.len(),
total_single_index: if distinct_indexes.len() == 1 { 1 } else { 0 },
total_search_count: query.len(),
user_agents,
}
}
pub fn succeed(&mut self) {
self.total_succeeded = self.total_succeeded.saturating_add(1);
}
pub fn aggregate(&mut self, other: Self) {
// write the aggregate in a way that will cause a compilation error if a field is added.
// get ownership of self, replacing it by a default value.
let this = std::mem::take(self);
let timestamp = this.timestamp.or(other.timestamp);
let total_received = this.total_received.saturating_add(other.total_received);
let total_succeeded = this.total_succeeded.saturating_add(other.total_succeeded);
let total_distinct_index_count =
this.total_distinct_index_count.saturating_add(other.total_distinct_index_count);
let total_single_index = this.total_single_index.saturating_add(other.total_single_index);
let total_search_count = this.total_search_count.saturating_add(other.total_search_count);
let mut user_agents = this.user_agents;
for user_agent in other.user_agents.into_iter() {
user_agents.insert(user_agent);
}
// need all fields or compile error
let mut aggregated = Self {
timestamp,
total_received,
total_succeeded,
total_distinct_index_count,
total_single_index,
total_search_count,
user_agents,
// do not add _ or ..Default::default() here
};
// replace the default self with the aggregated value
std::mem::swap(self, &mut aggregated);
}
pub fn into_event(self, user: &User, event_name: &str) -> Option<Track> {
if self.total_received == 0 {
None
} else {
let properties = json!({
"user-agent": self.user_agents,
"requests": {
"total_succeeded": self.total_succeeded,
"total_failed": self.total_received.saturating_sub(self.total_succeeded), // just to be sure we never panics
"total_received": self.total_received,
},
"indexes": {
"total_single_index": self.total_single_index,
"total_distinct_index_count": self.total_distinct_index_count,
"avg_distinct_index_count": (self.total_distinct_index_count as f64) / (self.total_received as f64), // not 0 else returned early
},
"searches": {
"total_search_count": self.total_search_count,
"avg_search_count": (self.total_search_count as f64) / (self.total_received as f64),
}
});
Some(Track {
timestamp: self.timestamp,
user: user.clone(),
event: event_name.to_string(),
properties,
..Default::default()
})
}
}
}
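A standalone illustration of the pattern noted in the comment above ("compilation error if a field is added"), using a made-up `Counters` struct: taking ownership of `self` and rebuilding the struct without `..` forces every field, present and future, to be aggregated explicitly.

```rust
#[derive(Default)]
struct Counters {
    received: usize,
    succeeded: usize,
}

impl Counters {
    fn aggregate(&mut self, other: Self) {
        let this = std::mem::take(self);
        // No `..Default::default()` here on purpose: adding a field to
        // `Counters` makes this constructor fail to compile until the
        // aggregation logic handles it.
        let mut aggregated = Counters {
            received: this.received.saturating_add(other.received),
            succeeded: this.succeeded.saturating_add(other.succeeded),
        };
        std::mem::swap(self, &mut aggregated);
    }
}

fn main() {
    let mut a = Counters { received: 2, succeeded: 1 };
    a.aggregate(Counters { received: 3, succeeded: 3 });
    assert_eq!((a.received, a.succeeded), (5, 4));
}
```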
#[derive(Default)]
pub struct DocumentsAggregator {
timestamp: Option<OffsetDateTime>,

View File

@ -136,6 +136,13 @@ pub mod policies {
use crate::extractors::authentication::Policy;
enum TenantTokenOutcome {
NotATenantToken,
Invalid,
Expired,
Valid(Uuid, SearchRules),
}
fn tenant_token_validation() -> Validation {
let mut validation = Validation::default();
validation.validate_exp = false;
@ -164,29 +171,42 @@ pub mod policies {
pub struct ActionPolicy<const A: u8>;
impl<const A: u8> Policy for ActionPolicy<A> {
/// Attempts to grant authentication from a bearer token (that can be a tenant token or an API key), the requested Action,
/// and a list of requested indexes.
///
/// If the bearer token is not allowed for the specified indexes and action, returns `None`.
/// Otherwise, returns an object containing the generated permissions: the search filters to add to a search, and the list of allowed indexes
/// (that may contain more indexes than requested).
fn authenticate(
auth: AuthController,
token: &str,
index: Option<&str>,
) -> Option<AuthFilter> {
// authenticate if token is the master key.
// master key can only have access to keys routes.
// if master key is None only keys routes are inaccessible.
// Without a master key, all routes are accessible except the key-related routes.
if auth.get_master_key().map_or_else(|| !is_keys_action(A), |mk| mk == token) {
return Some(AuthFilter::default());
}
// Tenant token
if let Some(filters) = ActionPolicy::<A>::authenticate_tenant_token(&auth, token, index)
{
return Some(filters);
} else if let Some(action) = Action::from_repr(A) {
// API key
if let Ok(Some(uid)) = auth.get_optional_uid_from_encoded_key(token.as_bytes()) {
if let Ok(true) = auth.is_key_authorized(uid, action, index) {
return auth.get_key_filters(uid, None).ok();
let (key_uuid, search_rules) =
match ActionPolicy::<A>::authenticate_tenant_token(&auth, token) {
TenantTokenOutcome::Valid(key_uuid, search_rules) => {
(key_uuid, Some(search_rules))
}
}
TenantTokenOutcome::Expired => return None,
TenantTokenOutcome::Invalid => return None,
TenantTokenOutcome::NotATenantToken => {
(auth.get_optional_uid_from_encoded_key(token.as_bytes()).ok()??, None)
}
};
// check that the indexes are allowed
let action = Action::from_repr(A)?;
let auth_filter = auth.get_key_filters(key_uuid, search_rules).ok()?;
if auth.is_key_authorized(key_uuid, action, index).unwrap_or(false)
&& index.map(|index| auth_filter.is_index_authorized(index)).unwrap_or(true)
{
return Some(auth_filter);
}
None
@ -194,50 +214,43 @@ pub mod policies {
}
impl<const A: u8> ActionPolicy<A> {
fn authenticate_tenant_token(
auth: &AuthController,
token: &str,
index: Option<&str>,
) -> Option<AuthFilter> {
// A tenant token only has access to the search route which always defines an index.
let index = index?;
fn authenticate_tenant_token(auth: &AuthController, token: &str) -> TenantTokenOutcome {
// Only search action can be accessed by a tenant token.
if A != actions::SEARCH {
return None;
return TenantTokenOutcome::NotATenantToken;
}
let uid = extract_key_id(token)?;
// check if parent key is authorized to do the action.
if auth.is_key_authorized(uid, Action::Search, Some(index)).ok()? {
// Check if tenant token is valid.
let key = auth.generate_key(uid)?;
let data = decode::<Claims>(
token,
&DecodingKey::from_secret(key.as_bytes()),
&tenant_token_validation(),
)
.ok()?;
let uid = if let Some(uid) = extract_key_id(token) {
uid
} else {
return TenantTokenOutcome::NotATenantToken;
};
// Check index access if an index restriction is provided.
if !data.claims.search_rules.is_index_authorized(index) {
return None;
// Check if tenant token is valid.
let key = if let Some(key) = auth.generate_key(uid) {
key
} else {
return TenantTokenOutcome::Invalid;
};
let data = if let Ok(data) = decode::<Claims>(
token,
&DecodingKey::from_secret(key.as_bytes()),
&tenant_token_validation(),
) {
data
} else {
return TenantTokenOutcome::Invalid;
};
// Check if token is expired.
if let Some(exp) = data.claims.exp {
if OffsetDateTime::now_utc().unix_timestamp() > exp {
return TenantTokenOutcome::Expired;
}
// Check if token is expired.
if let Some(exp) = data.claims.exp {
if OffsetDateTime::now_utc().unix_timestamp() > exp {
return None;
}
}
return match auth.get_key_filters(uid, Some(data.claims.search_rules)) {
Ok(auth) if auth.search_rules.is_index_authorized(index) => Some(auth),
_ => None,
};
}
None
TenantTokenOutcome::Valid(uid, data.claims.search_rules)
}
}

View File

@ -4,17 +4,15 @@ pub mod error;
pub mod analytics;
#[macro_use]
pub mod extractors;
pub mod metrics;
pub mod middleware;
pub mod option;
pub mod routes;
pub mod search;
#[cfg(feature = "metrics")]
pub mod metrics;
#[cfg(feature = "metrics")]
pub mod route_metrics;
use std::fs::File;
use std::io::{BufReader, BufWriter};
use std::io::{BufReader, BufWriter, Write};
use std::net::ToSocketAddrs;
use std::path::Path;
use std::sync::Arc;
use std::thread;
@ -25,14 +23,15 @@ use actix_http::body::MessageBody;
use actix_web::dev::{ServiceFactory, ServiceResponse};
use actix_web::error::JsonPayloadError;
use actix_web::web::Data;
use actix_web::{middleware, web, HttpRequest};
use actix_web::{web, HttpRequest};
use analytics::Analytics;
use anyhow::bail;
use cluster::{Cluster, Follower, Leader};
use error::PayloadError;
use extractors::payload::PayloadConfig;
use http::header::CONTENT_TYPE;
use index_scheduler::{IndexScheduler, IndexSchedulerOptions};
use log::error;
use log::{error, info};
use meilisearch_auth::AuthController;
use meilisearch_types::milli::documents::{DocumentsBatchBuilder, DocumentsBatchReader};
use meilisearch_types::milli::update::{IndexDocumentsConfig, IndexDocumentsMethod};
@ -45,6 +44,34 @@ use option::ScheduleSnapshot;
use crate::error::MeilisearchHttpError;
/// Default number of simultaneously opened indexes.
///
/// This value is used when dynamic computation of how many indexes can be opened at once was skipped (e.g., in tests).
///
/// Lower on Windows, which dedicates a smaller virtual address space to processes.
///
/// The value was chosen this way:
///
/// - Windows provides a small virtual address space of about 10TiB to processes.
/// - The chosen value allows for indexes to use the default map size of 2TiB safely.
#[cfg(windows)]
const DEFAULT_INDEX_COUNT: usize = 4;
/// Default number of simultaneously opened indexes.
///
/// This value is used when dynamic computation of how many indexes can be opened at once was skipped (e.g., in tests).
///
/// The higher, the better for avoiding reopening indexes.
///
/// The value was chosen this way:
///
/// - Opening an index consumes a file descriptor.
/// - The default on many unices is about 256 file descriptors for a process.
/// - 100 is a little bit less than half this value.
/// - The chosen value allows for indexes to use the default map size of 2TiB safely.
#[cfg(not(windows))]
const DEFAULT_INDEX_COUNT: usize = 20;
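A back-of-the-envelope sketch of these constants: the 2 TiB default map size and the ~10 TiB Windows figure come from the comments above, while the ~128 TiB figure is the usual 64-bit Linux user address space and is an assumption of this example.

```rust
fn main() {
    const TIB: u64 = 1 << 40;
    let default_map_size = 2 * TIB;

    // 4 indexes at 2 TiB of address space each fit in the ~10 TiB
    // Windows dedicates to a process...
    assert!(4 * default_map_size <= 10 * TIB);

    // ...while 20 indexes need 40 TiB, which only fits in the much larger
    // address space available on 64-bit unices (~128 TiB on Linux).
    assert!(20 * default_map_size <= 128 * TIB);
}
```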
/// Check if a db is empty. It does not provide any information on the
/// validity of the data in it.
/// We consider a database as non empty when it's a non empty directory.
@ -86,13 +113,13 @@ pub fn create_app(
analytics.clone(),
)
})
.configure(routes::configure)
.configure(|cfg| routes::configure(cfg, opt.experimental_enable_metrics))
.configure(|s| dashboard(s, enable_dashboard));
#[cfg(feature = "metrics")]
let app = app.configure(|s| configure_metrics_route(s, opt.enable_metrics_route));
#[cfg(feature = "metrics")]
let app = app.wrap(Condition::new(opt.enable_metrics_route, route_metrics::RouteMetrics));
let app = app.wrap(actix_web::middleware::Condition::new(
opt.experimental_enable_metrics,
middleware::RouteMetrics,
));
app.wrap(
Cors::default()
.send_wildcard()
@ -101,9 +128,9 @@ pub fn create_app(
.allow_any_method()
.max_age(86_400), // 24h
)
.wrap(middleware::Logger::default())
.wrap(middleware::Compress::default())
.wrap(middleware::NormalizePath::new(middleware::TrailingSlash::Trim))
.wrap(actix_web::middleware::Logger::default())
.wrap(actix_web::middleware::Compress::default())
.wrap(actix_web::middleware::NormalizePath::new(actix_web::middleware::TrailingSlash::Trim))
}
enum OnFailure {
@ -118,7 +145,7 @@ pub fn setup_meilisearch(opt: &Opt) -> anyhow::Result<(Arc<IndexScheduler>, Auth
// the db is empty and the snapshot exists, import it
if empty_db && snapshot_path_exists {
match compression::from_tar_gz(snapshot_path, &opt.db_path) {
Ok(()) => open_or_create_database_unchecked(opt, OnFailure::RemoveDb)?,
Ok(()) => open_or_create_database_unchecked(opt, None, OnFailure::RemoveDb)?,
Err(e) => {
std::fs::remove_dir_all(&opt.db_path)?;
return Err(e);
@ -135,14 +162,14 @@ pub fn setup_meilisearch(opt: &Opt) -> anyhow::Result<(Arc<IndexScheduler>, Auth
bail!("snapshot doesn't exist at {}", snapshot_path.display())
// the snapshot and the db exist, and we can ignore the snapshot because of the ignore_snapshot_if_db_exists flag
} else {
open_or_create_database(opt, empty_db)?
open_or_create_database(opt, empty_db, None)?
}
} else if let Some(ref path) = opt.import_dump {
let src_path_exists = path.exists();
// the db is empty and the dump exists, import it
if empty_db && src_path_exists {
let (mut index_scheduler, mut auth_controller) =
open_or_create_database_unchecked(opt, OnFailure::RemoveDb)?;
open_or_create_database_unchecked(opt, None, OnFailure::RemoveDb)?;
match import_dump(&opt.db_path, path, &mut index_scheduler, &mut auth_controller) {
Ok(()) => (index_scheduler, auth_controller),
Err(e) => {
@ -162,10 +189,62 @@ pub fn setup_meilisearch(opt: &Opt) -> anyhow::Result<(Arc<IndexScheduler>, Auth
// the dump and the db exist and we can ignore the dump because of the ignore_dump_if_db_exists flag
// or, the dump is missing but we can ignore that because of the ignore_missing_dump flag
} else {
open_or_create_database(opt, empty_db)?
open_or_create_database(opt, empty_db, None)?
}
} else if let Some(ref cluster) = opt.cluster_configuration.experimental_enable_ha {
match cluster.as_str() {
"leader" => {
info!("Starting as a leader");
let mut addr = opt.http_addr.to_socket_addrs().unwrap().next().unwrap();
addr.set_port(6666);
open_or_create_database(
opt,
empty_db,
Some(Cluster::Leader(Leader::new(addr, opt.master_key.clone()))),
)?
}
"follower" => {
info!("Starting as a follower");
if !empty_db {
panic!("Can't start as a follower with an already existing data.ms");
}
let mut addr = opt
.cluster_configuration
.leader
.as_ref()
.expect("Can't be a follower without a leader")
.to_socket_addrs()
.unwrap()
.next()
.unwrap();
addr.set_port(6666);
let (follower, dump) = Follower::join(addr, opt.master_key.clone());
let mut dump_file = tempfile::NamedTempFile::new().unwrap();
dump_file.write_all(&dump).unwrap();
let (mut index_scheduler, mut auth_controller) = open_or_create_database_unchecked(
opt,
Some(Cluster::Follower(follower)),
OnFailure::RemoveDb,
)?;
match import_dump(
&opt.db_path,
dump_file.path(),
&mut index_scheduler,
&mut auth_controller,
) {
Ok(()) => (index_scheduler, auth_controller),
Err(e) => {
std::fs::remove_dir_all(&opt.db_path)?;
return Err(e);
}
}
}
_ => panic!("Available values for the cluster mode are leader and follower"),
}
} else {
open_or_create_database(opt, empty_db)?
open_or_create_database(opt, empty_db, None)?
};
// We create a loop in a thread that registers snapshotCreation tasks
@ -190,25 +269,34 @@ pub fn setup_meilisearch(opt: &Opt) -> anyhow::Result<(Arc<IndexScheduler>, Auth
/// Try to start the IndexScheduler and AuthController without checking the VERSION file or anything.
fn open_or_create_database_unchecked(
opt: &Opt,
cluster: Option<Cluster>,
on_failure: OnFailure,
) -> anyhow::Result<(IndexScheduler, AuthController)> {
// we don't want to create anything in the data.ms yet, thus we
// wrap our two builders in a closure that'll be executed later.
let auth_controller = AuthController::new(&opt.db_path, &opt.master_key);
let auth_controller = AuthController::new(&opt.db_path, &opt.master_key, cluster.clone());
let index_scheduler_builder = || -> anyhow::Result<_> {
Ok(IndexScheduler::new(IndexSchedulerOptions {
version_file_path: opt.db_path.join(VERSION_FILE_NAME),
auth_path: opt.db_path.join("auth"),
tasks_path: opt.db_path.join("tasks"),
update_file_path: opt.db_path.join("update_files"),
indexes_path: opt.db_path.join("indexes"),
snapshots_path: opt.snapshot_dir.clone(),
dumps_path: opt.dump_dir.clone(),
task_db_size: opt.max_task_db_size.get_bytes() as usize,
index_size: opt.max_index_size.get_bytes() as usize,
indexer_config: (&opt.indexer_options).try_into()?,
autobatching_enabled: true,
})?)
Ok(IndexScheduler::new(
IndexSchedulerOptions {
version_file_path: opt.db_path.join(VERSION_FILE_NAME),
auth_path: opt.db_path.join("auth"),
tasks_path: opt.db_path.join("tasks"),
update_file_path: opt.db_path.join("update_files"),
indexes_path: opt.db_path.join("indexes"),
snapshots_path: opt.snapshot_dir.clone(),
dumps_path: opt.dump_dir.clone(),
task_db_size: opt.max_task_db_size.get_bytes() as usize,
index_base_map_size: opt.max_index_size.get_bytes() as usize,
indexer_config: (&opt.indexer_options).try_into()?,
autobatching_enabled: true,
index_growth_amount: byte_unit::Byte::from_str("10GiB").unwrap().get_bytes()
as usize,
index_count: DEFAULT_INDEX_COUNT,
},
cluster,
opt.cluster_configuration.consistency,
)?)
};
match (
@ -230,12 +318,13 @@ fn open_or_create_database_unchecked(
fn open_or_create_database(
opt: &Opt,
empty_db: bool,
cluster: Option<Cluster>,
) -> anyhow::Result<(IndexScheduler, AuthController)> {
if !empty_db {
check_version_file(&opt.db_path)?;
}
open_or_create_database_unchecked(opt, OnFailure::KeepDb)
open_or_create_database_unchecked(opt, cluster, OnFailure::KeepDb)
}
fn import_dump(
@ -419,15 +508,6 @@ pub fn dashboard(config: &mut web::ServiceConfig, _enable_frontend: bool) {
config.service(web::resource("/").route(web::get().to(routes::running)));
}
#[cfg(feature = "metrics")]
pub fn configure_metrics_route(config: &mut web::ServiceConfig, enable_metrics_route: bool) {
if enable_metrics_route {
config.service(
web::resource("/metrics").route(web::get().to(crate::route_metrics::get_metrics)),
);
}
}
/// Parses the output of
/// [`VERGEN_GIT_SEMVER_LIGHTWEIGHT`](https://docs.rs/vergen/latest/vergen/struct.Git.html#instructions)
/// as a prototype name.

View File

@ -1,40 +1,11 @@
//! Contains all the custom middleware used in meilisearch
use std::future::{ready, Ready};
use actix_web::dev::{self, Service, ServiceRequest, ServiceResponse, Transform};
use actix_web::http::header;
use actix_web::{Error, HttpResponse};
use actix_web::Error;
use futures_util::future::LocalBoxFuture;
use meilisearch_auth::actions;
use meilisearch_lib::MeiliSearch;
use meilisearch_types::error::ResponseError;
use prometheus::{Encoder, HistogramTimer, TextEncoder};
use crate::extractors::authentication::policies::ActionPolicy;
use crate::extractors::authentication::GuardedData;
pub async fn get_metrics(
meilisearch: GuardedData<ActionPolicy<{ actions::METRICS_GET }>, MeiliSearch>,
) -> Result<HttpResponse, ResponseError> {
let search_rules = &meilisearch.filters().search_rules;
let response = meilisearch.get_all_stats(search_rules).await?;
crate::metrics::MEILISEARCH_DB_SIZE_BYTES.set(response.database_size as i64);
crate::metrics::MEILISEARCH_INDEX_COUNT.set(response.indexes.len() as i64);
for (index, value) in response.indexes.iter() {
crate::metrics::MEILISEARCH_INDEX_DOCS_COUNT
.with_label_values(&[index])
.set(value.number_of_documents as i64);
}
let encoder = TextEncoder::new();
let mut buffer = vec![];
encoder.encode(&prometheus::gather(), &mut buffer).expect("Failed to encode metrics");
let response = String::from_utf8(buffer).expect("Failed to convert bytes to string");
Ok(HttpResponse::Ok().insert_header(header::ContentType(mime::TEXT_PLAIN)).body(response))
}
use prometheus::HistogramTimer;
pub struct RouteMetrics;

View File

@ -12,6 +12,7 @@ use std::{env, fmt, fs};
use byte_unit::{Byte, ByteError};
use clap::Parser;
use cluster::Consistency;
use meilisearch_types::milli::update::IndexerConfig;
use rustls::server::{
AllowAnyAnonymousOrAuthenticatedClient, AllowAnyAuthenticatedClient, ServerSessionMemoryCache,
@ -47,8 +48,7 @@ const MEILI_IGNORE_MISSING_DUMP: &str = "MEILI_IGNORE_MISSING_DUMP";
const MEILI_IGNORE_DUMP_IF_DB_EXISTS: &str = "MEILI_IGNORE_DUMP_IF_DB_EXISTS";
const MEILI_DUMP_DIR: &str = "MEILI_DUMP_DIR";
const MEILI_LOG_LEVEL: &str = "MEILI_LOG_LEVEL";
#[cfg(feature = "metrics")]
const MEILI_ENABLE_METRICS_ROUTE: &str = "MEILI_ENABLE_METRICS_ROUTE";
const MEILI_EXPERIMENTAL_ENABLE_METRICS: &str = "MEILI_EXPERIMENTAL_ENABLE_METRICS";
const DEFAULT_CONFIG_FILE_PATH: &str = "./config.toml";
const DEFAULT_DB_PATH: &str = "./data.ms";
@ -65,11 +65,11 @@ const MEILI_MAX_INDEXING_THREADS: &str = "MEILI_MAX_INDEXING_THREADS";
const DEFAULT_LOG_EVERY_N: usize = 100_000;
// Each environment (index and task-db) is taking space in the virtual address space.
//
// The size of the virtual address space is limited by the OS. About 100TB for Linux and about 10TB for Windows.
// This means that the number of indexes is limited to about 200 for Linux and about 20 for Windows.
pub const INDEX_SIZE: u64 = 536_870_912_000; // 500 GiB
pub const TASK_DB_SIZE: u64 = 10_737_418_240; // 10 GiB
// Ideally, indexes can occupy 2TiB each to avoid having to manually resize them.
// The actual size of the virtual address space is computed at startup to determine how many 2TiB indexes can be
// opened simultaneously.
pub const INDEX_SIZE: u64 = 2 * 1024 * 1024 * 1024 * 1024; // 2 TiB
pub const TASK_DB_SIZE: u64 = 10 * 1024 * 1024 * 1024; // 10 GiB
#[derive(Debug, Default, Clone, Copy, Serialize, Deserialize)]
#[serde(rename_all = "UPPERCASE")]
@ -287,16 +287,21 @@ pub struct Opt {
#[serde(default)]
pub log_level: LogLevel,
/// Enables Prometheus metrics and /metrics route.
#[cfg(feature = "metrics")]
#[clap(long, env = MEILI_ENABLE_METRICS_ROUTE)]
/// Experimental metrics feature. For more information, see: <https://github.com/meilisearch/meilisearch/discussions/3518>
///
/// Enables the Prometheus metrics on the `GET /metrics` endpoint.
#[clap(long, env = MEILI_EXPERIMENTAL_ENABLE_METRICS)]
#[serde(default)]
pub enable_metrics_route: bool,
pub experimental_enable_metrics: bool,
#[serde(flatten)]
#[clap(flatten)]
pub indexer_options: IndexerOpts,
#[serde(flatten)]
#[clap(flatten)]
pub cluster_configuration: ClusterOpts,
/// Set the path to a configuration file that should be used to setup the engine.
/// Format must be TOML.
#[clap(long)]
@ -384,8 +389,8 @@ impl Opt {
config_file_path: _,
#[cfg(all(not(debug_assertions), feature = "analytics"))]
no_analytics,
#[cfg(feature = "metrics")]
enable_metrics_route,
experimental_enable_metrics: enable_metrics_route,
cluster_configuration: _,
} = self;
export_to_env_if_not_present(MEILI_DB_PATH, db_path);
export_to_env_if_not_present(MEILI_HTTP_ADDR, http_addr);
@ -423,13 +428,10 @@ impl Opt {
export_to_env_if_not_present(MEILI_DUMP_DIR, dump_dir);
export_to_env_if_not_present(MEILI_LOG_LEVEL, log_level.to_string());
#[cfg(feature = "metrics")]
{
export_to_env_if_not_present(
MEILI_ENABLE_METRICS_ROUTE,
enable_metrics_route.to_string(),
);
}
export_to_env_if_not_present(
MEILI_EXPERIMENTAL_ENABLE_METRICS,
enable_metrics_route.to_string(),
);
indexer_options.export_to_env();
}
@ -494,12 +496,21 @@ pub struct IndexerOpts {
#[clap(long, env = MEILI_MAX_INDEXING_THREADS, default_value_t)]
#[serde(default)]
pub max_indexing_threads: MaxThreads,
/// Whether to determine the budget of virtual memory address space available dynamically (the default)
/// or statically.
///
/// Determining the budget of virtual memory address space dynamically takes some time on some systems (such as macOS)
/// and may make tests non-deterministic, so we want to skip it in tests.
#[clap(skip)]
#[serde(skip)]
pub skip_index_budget: bool,
}
impl IndexerOpts {
/// Exports the values to their corresponding env vars if they are not set.
pub fn export_to_env(self) {
let IndexerOpts { max_indexing_memory, max_indexing_threads } = self;
let IndexerOpts { max_indexing_memory, max_indexing_threads, skip_index_budget: _ } = self;
if let Some(max_indexing_memory) = max_indexing_memory.0 {
export_to_env_if_not_present(
MEILI_MAX_INDEXING_MEMORY,
@ -513,6 +524,21 @@ impl IndexerOpts {
}
}
#[derive(Debug, Default, Clone, Parser, Deserialize)]
pub struct ClusterOpts {
#[clap(long)]
#[serde(default)]
pub experimental_enable_ha: Option<String>,
#[clap(long)]
#[serde(default)]
pub leader: Option<String>,
#[clap(long, default_value_t)]
#[serde(default)]
pub consistency: Consistency,
}
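Assuming clap derives the usual kebab-case flag names from the fields above (the diff does not spell them out), joining a cluster could be exercised roughly like this hypothetical sketch; the key and addresses are placeholders.
use clap::Parser;
// Hypothetical follower invocation; a leader would be started the same way
// with `--experimental-enable-ha leader` and no `--leader` flag.
fn example_follower_opts() -> Opt {
    Opt::parse_from([
        "meilisearch",
        "--experimental-enable-ha", "follower",
        "--leader", "127.0.0.1:7700",
        "--master-key", "some-shared-key",
    ])
}
The follower resolves the leader's address and then talks to it on the hard-coded cluster port 6666, as shown in `setup_meilisearch` above.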
impl TryFrom<&IndexerOpts> for IndexerConfig {
type Error = anyhow::Error;
@ -527,6 +553,7 @@ impl TryFrom<&IndexerOpts> for IndexerConfig {
max_memory: other.max_indexing_memory.map(|b| b.get_bytes() as usize),
thread_pool: Some(thread_pool),
max_positions_per_attributes: None,
skip_index_budget: other.skip_index_budget,
..Default::default()
})
}

View File

@ -160,19 +160,19 @@ pub async fn get_all_documents(
pub struct UpdateDocumentsQuery {
#[deserr(default, error = DeserrQueryParamError<InvalidIndexPrimaryKey>)]
pub primary_key: Option<String>,
#[deserr(default, try_from(char) = from_char_csv_delimiter -> DeserrQueryParamError<InvalidIndexCsvDelimiter>, error = DeserrQueryParamError<InvalidIndexCsvDelimiter>)]
#[deserr(default, try_from(char) = from_char_csv_delimiter -> DeserrQueryParamError<InvalidDocumentCsvDelimiter>, error = DeserrQueryParamError<InvalidDocumentCsvDelimiter>)]
pub csv_delimiter: Option<u8>,
}
fn from_char_csv_delimiter(
c: char,
) -> Result<Option<u8>, DeserrQueryParamError<InvalidIndexCsvDelimiter>> {
) -> Result<Option<u8>, DeserrQueryParamError<InvalidDocumentCsvDelimiter>> {
if c.is_ascii() {
Ok(Some(c as u8))
} else {
Err(DeserrQueryParamError::new(
format!("csv delimiter must be an ascii character. Found: `{}`", c),
Code::InvalidIndexCsvDelimiter,
Code::InvalidDocumentCsvDelimiter,
))
}
}
@ -192,7 +192,7 @@ pub async fn replace_documents(
analytics.add_documents(&params, index_scheduler.index(&index_uid).is_err(), &req);
let allow_index_creation = index_scheduler.filters().allow_index_creation;
let allow_index_creation = index_scheduler.filters().allow_index_creation(&index_uid);
let task = document_addition(
extract_mime_type(&req)?,
index_scheduler,
@ -223,7 +223,7 @@ pub async fn update_documents(
analytics.update_documents(&params, index_scheduler.index(&index_uid).is_err(), &req);
let allow_index_creation = index_scheduler.filters().allow_index_creation;
let allow_index_creation = index_scheduler.filters().allow_index_creation(&index_uid);
let task = document_addition(
extract_mime_type(&req)?,
index_scheduler,

View File

@ -61,6 +61,8 @@ pub struct IndexView {
impl IndexView {
fn new(uid: String, index: &Index) -> Result<IndexView, milli::Error> {
// It is important that this function does not keep the Index handle or a clone of it, because
// `list_indexes` relies on this property to avoid opening all indexes at once.
let rtxn = index.read_txn()?;
Ok(IndexView {
uid,
@ -89,14 +91,16 @@ pub async fn list_indexes(
index_scheduler: GuardedData<ActionPolicy<{ actions::INDEXES_GET }>, Data<IndexScheduler>>,
paginate: AwebQueryParameter<ListIndexes, DeserrQueryParamError>,
) -> Result<HttpResponse, ResponseError> {
let search_rules = &index_scheduler.filters().search_rules;
let indexes: Vec<_> = index_scheduler.indexes()?;
let indexes = indexes
.into_iter()
.filter(|(name, _)| search_rules.is_index_authorized(name))
.map(|(name, index)| IndexView::new(name, &index))
.collect::<Result<Vec<_>, _>>()?;
let filters = index_scheduler.filters();
let indexes: Vec<Option<IndexView>> =
index_scheduler.try_for_each_index(|uid, index| -> Result<Option<IndexView>, _> {
if !filters.is_index_authorized(uid) {
return Ok(None);
}
Ok(Some(IndexView::new(uid.to_string(), index)?))
})?;
// Won't cause all indexes to be opened, because `IndexView` doesn't keep the `Index` open.
let indexes: Vec<IndexView> = indexes.into_iter().flatten().collect();
let ret = paginate.as_pagination().auto_paginate_sized(indexes.into_iter());
debug!("returns: {:?}", ret);
@ -120,7 +124,7 @@ pub async fn create_index(
) -> Result<HttpResponse, ResponseError> {
let IndexCreateRequest { primary_key, uid } = body.into_inner();
let allow_index_creation = index_scheduler.filters().search_rules.is_index_authorized(&uid);
let allow_index_creation = index_scheduler.filters().allow_index_creation(&uid);
if allow_index_creation {
analytics.publish(
"Index Created".to_string(),

View File

@ -3,7 +3,6 @@ use actix_web::{web, HttpRequest, HttpResponse};
use deserr::actix_web::{AwebJson, AwebQueryParameter};
use index_scheduler::IndexScheduler;
use log::debug;
use meilisearch_auth::IndexSearchRules;
use meilisearch_types::deserr::query_params::Param;
use meilisearch_types::deserr::{DeserrJsonError, DeserrQueryParamError};
use meilisearch_types::error::deserr_codes::*;
@ -17,9 +16,9 @@ use crate::extractors::authentication::policies::*;
use crate::extractors::authentication::GuardedData;
use crate::extractors::sequential_extractor::SeqHandler;
use crate::search::{
perform_search, MatchingStrategy, SearchQuery, DEFAULT_CROP_LENGTH, DEFAULT_CROP_MARKER,
DEFAULT_HIGHLIGHT_POST_TAG, DEFAULT_HIGHLIGHT_PRE_TAG, DEFAULT_SEARCH_LIMIT,
DEFAULT_SEARCH_OFFSET,
add_search_rules, perform_search, MatchingStrategy, SearchQuery, DEFAULT_CROP_LENGTH,
DEFAULT_CROP_MARKER, DEFAULT_HIGHLIGHT_POST_TAG, DEFAULT_HIGHLIGHT_PRE_TAG,
DEFAULT_SEARCH_LIMIT, DEFAULT_SEARCH_OFFSET,
};
pub fn configure(cfg: &mut web::ServiceConfig) {
@ -101,26 +100,6 @@ impl From<SearchQueryGet> for SearchQuery {
}
}
/// Incorporate search rules in search query
fn add_search_rules(query: &mut SearchQuery, rules: IndexSearchRules) {
query.filter = match (query.filter.take(), rules.filter) {
(None, rules_filter) => rules_filter,
(filter, None) => filter,
(Some(filter), Some(rules_filter)) => {
let filter = match filter {
Value::Array(filter) => filter,
filter => vec![filter],
};
let rules_filter = match rules_filter {
Value::Array(rules_filter) => rules_filter,
rules_filter => vec![rules_filter],
};
Some(Value::Array([filter, rules_filter].concat()))
}
}
}
// TODO: TAMO: split on :asc, and :desc, instead of doing some weird things
/// Transform the sort query parameter into something that matches the post expected format.
@ -159,9 +138,7 @@ pub async fn search_with_url_query(
let mut query: SearchQuery = params.into_inner().into();
// Tenant token search_rules.
if let Some(search_rules) =
index_scheduler.filters().search_rules.get_index_search_rules(&index_uid)
{
if let Some(search_rules) = index_scheduler.filters().get_index_search_rules(&index_uid) {
add_search_rules(&mut query, search_rules);
}
@ -193,9 +170,7 @@ pub async fn search_with_post(
debug!("search called with params: {:?}", query);
// Tenant token search_rules.
if let Some(search_rules) =
index_scheduler.filters().search_rules.get_index_search_rules(&index_uid)
{
if let Some(search_rules) = index_scheduler.filters().get_index_search_rules(&index_uid) {
add_search_rules(&mut query, search_rules);
}

View File

@ -45,7 +45,8 @@ macro_rules! make_setting_route {
let new_settings = Settings { $attr: Setting::Reset.into(), ..Default::default() };
let allow_index_creation = index_scheduler.filters().allow_index_creation;
let allow_index_creation =
index_scheduler.filters().allow_index_creation(&index_uid);
let task = KindWithContent::SettingsUpdate {
index_uid: index_uid.to_string(),
@ -86,7 +87,8 @@ macro_rules! make_setting_route {
..Default::default()
};
let allow_index_creation = index_scheduler.filters().allow_index_creation;
let allow_index_creation =
index_scheduler.filters().allow_index_creation(&index_uid);
let task = KindWithContent::SettingsUpdate {
index_uid: index_uid.to_string(),
@ -560,7 +562,7 @@ pub async fn update_all(
Some(&req),
);
let allow_index_creation = index_scheduler.filters().allow_index_creation;
let allow_index_creation = index_scheduler.filters().allow_index_creation(&index_uid);
let index_uid = IndexUid::try_from(index_uid.into_inner())?.into_inner();
let task = KindWithContent::SettingsUpdate {
index_uid,
@ -596,7 +598,7 @@ pub async fn delete_all(
let new_settings = Settings::cleared().into_unchecked();
let allow_index_creation = index_scheduler.filters().allow_index_creation;
let allow_index_creation = index_scheduler.filters().allow_index_creation(&index_uid);
let index_uid = IndexUid::try_from(index_uid.into_inner())?.into_inner();
let task = KindWithContent::SettingsUpdate {
index_uid,

View File

@ -0,0 +1,50 @@
use actix_web::http::header;
use actix_web::web::{self, Data};
use actix_web::HttpResponse;
use index_scheduler::IndexScheduler;
use meilisearch_auth::AuthController;
use meilisearch_types::error::ResponseError;
use meilisearch_types::keys::actions;
use prometheus::{Encoder, TextEncoder};
use crate::extractors::authentication::policies::ActionPolicy;
use crate::extractors::authentication::{AuthenticationError, GuardedData};
use crate::routes::create_all_stats;
pub fn configure(config: &mut web::ServiceConfig) {
config.service(web::resource("").route(web::get().to(get_metrics)));
}
pub async fn get_metrics(
index_scheduler: GuardedData<ActionPolicy<{ actions::METRICS_GET }>, Data<IndexScheduler>>,
auth_controller: GuardedData<ActionPolicy<{ actions::METRICS_GET }>, AuthController>,
) -> Result<HttpResponse, ResponseError> {
let auth_filters = index_scheduler.filters();
if !auth_filters.all_indexes_authorized() {
let mut error = ResponseError::from(AuthenticationError::InvalidToken);
error
.message
.push_str(" The API key for the `/metrics` route must allow access to all indexes.");
return Err(error);
}
let response =
create_all_stats((*index_scheduler).clone(), (*auth_controller).clone(), auth_filters)?;
crate::metrics::MEILISEARCH_DB_SIZE_BYTES.set(response.database_size as i64);
crate::metrics::MEILISEARCH_INDEX_COUNT.set(response.indexes.len() as i64);
for (index, value) in response.indexes.iter() {
crate::metrics::MEILISEARCH_INDEX_DOCS_COUNT
.with_label_values(&[index])
.set(value.number_of_documents as i64);
}
let encoder = TextEncoder::new();
let mut buffer = vec![];
encoder.encode(&prometheus::gather(), &mut buffer).expect("Failed to encode metrics");
let response = String::from_utf8(buffer).expect("Failed to convert bytes to string");
Ok(HttpResponse::Ok().insert_header(header::ContentType(mime::TEXT_PLAIN)).body(response))
}
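The gauges updated here are process-wide statics. A minimal sketch of how such statics can be declared with the `prometheus` crate's macros; the metric names and the use of `once_cell` are assumptions, not necessarily how `crate::metrics` defines them.
use once_cell::sync::Lazy;
use prometheus::{register_int_gauge, register_int_gauge_vec, IntGauge, IntGaugeVec};
// Assumed shape of the statics used above, illustration only.
pub static EXAMPLE_DB_SIZE_BYTES: Lazy<IntGauge> = Lazy::new(|| {
    register_int_gauge!("meilisearch_db_size_bytes", "Size of the database in bytes")
        .expect("metric can be registered")
});
pub static EXAMPLE_INDEX_DOCS_COUNT: Lazy<IntGaugeVec> = Lazy::new(|| {
    register_int_gauge_vec!(
        "meilisearch_index_docs_count",
        "Number of documents per index",
        &["index"]
    )
    .expect("metric can be registered")
});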

View File

@ -22,10 +22,12 @@ const PAGINATION_DEFAULT_LIMIT: usize = 20;
mod api_key;
mod dump;
pub mod indexes;
mod metrics;
mod multi_search;
mod swap_indexes;
pub mod tasks;
pub fn configure(cfg: &mut web::ServiceConfig) {
pub fn configure(cfg: &mut web::ServiceConfig, enable_metrics: bool) {
cfg.service(web::scope("/tasks").configure(tasks::configure))
.service(web::resource("/health").route(web::get().to(get_health)))
.service(web::scope("/keys").configure(api_key::configure))
@ -33,7 +35,12 @@ pub fn configure(cfg: &mut web::ServiceConfig) {
.service(web::resource("/stats").route(web::get().to(get_stats)))
.service(web::resource("/version").route(web::get().to(get_version)))
.service(web::scope("/indexes").configure(indexes::configure))
.service(web::scope("/multi-search").configure(multi_search::configure))
.service(web::scope("/swap-indexes").configure(swap_indexes::configure));
if enable_metrics {
cfg.service(web::scope("/metrics").configure(metrics::configure));
}
}
#[derive(Debug, Serialize)]
@ -237,10 +244,9 @@ async fn get_stats(
analytics: web::Data<dyn Analytics>,
) -> Result<HttpResponse, ResponseError> {
analytics.publish("Stats Seen".to_string(), json!({ "per_index_uid": false }), Some(&req));
let search_rules = &index_scheduler.filters().search_rules;
let filters = index_scheduler.filters();
let stats =
create_all_stats((*index_scheduler).clone(), (*auth_controller).clone(), search_rules)?;
let stats = create_all_stats((*index_scheduler).clone(), (*auth_controller).clone(), filters)?;
debug!("returns: {:?}", stats);
Ok(HttpResponse::Ok().json(stats))
@ -249,20 +255,20 @@ async fn get_stats(
pub fn create_all_stats(
index_scheduler: Data<IndexScheduler>,
auth_controller: AuthController,
search_rules: &meilisearch_auth::SearchRules,
filters: &meilisearch_auth::AuthFilter,
) -> Result<Stats, ResponseError> {
let mut last_task: Option<OffsetDateTime> = None;
let mut indexes = BTreeMap::new();
let mut database_size = 0;
let processing_task = index_scheduler.get_tasks_from_authorized_indexes(
Query { statuses: Some(vec![Status::Processing]), limit: Some(1), ..Query::default() },
search_rules.authorized_indexes(),
filters,
)?;
// accumulate the size of each index
let processing_index = processing_task.first().and_then(|task| task.index_uid());
for (name, index) in index_scheduler.indexes()? {
if !search_rules.is_index_authorized(&name) {
continue;
index_scheduler.try_for_each_index(|name, index| {
if !filters.is_index_authorized(name) {
return Ok(());
}
database_size += index.on_disk_size()?;
@ -277,8 +283,9 @@ pub fn create_all_stats(
let updated_at = index.updated_at(&rtxn)?;
last_task = last_task.map_or(Some(updated_at), |last| Some(last.max(updated_at)));
indexes.insert(name, stats);
}
indexes.insert(name.to_string(), stats);
Ok(())
})?;
database_size += index_scheduler.size()?;
database_size += auth_controller.size()?;

View File

@ -0,0 +1,122 @@
use actix_http::StatusCode;
use actix_web::web::{self, Data};
use actix_web::{HttpRequest, HttpResponse};
use deserr::actix_web::AwebJson;
use index_scheduler::IndexScheduler;
use log::debug;
use meilisearch_types::deserr::DeserrJsonError;
use meilisearch_types::error::ResponseError;
use meilisearch_types::keys::actions;
use serde::Serialize;
use crate::analytics::{Analytics, MultiSearchAggregator};
use crate::extractors::authentication::policies::ActionPolicy;
use crate::extractors::authentication::{AuthenticationError, GuardedData};
use crate::extractors::sequential_extractor::SeqHandler;
use crate::search::{
add_search_rules, perform_search, SearchQueryWithIndex, SearchResultWithIndex,
};
pub fn configure(cfg: &mut web::ServiceConfig) {
cfg.service(web::resource("").route(web::post().to(SeqHandler(multi_search_with_post))));
}
#[derive(Serialize)]
struct SearchResults {
results: Vec<SearchResultWithIndex>,
}
#[derive(Debug, deserr::Deserr)]
#[deserr(error = DeserrJsonError, rename_all = camelCase, deny_unknown_fields)]
pub struct SearchQueries {
queries: Vec<SearchQueryWithIndex>,
}
pub async fn multi_search_with_post(
index_scheduler: GuardedData<ActionPolicy<{ actions::SEARCH }>, Data<IndexScheduler>>,
params: AwebJson<SearchQueries, DeserrJsonError>,
req: HttpRequest,
analytics: web::Data<dyn Analytics>,
) -> Result<HttpResponse, ResponseError> {
let queries = params.into_inner().queries;
let mut multi_aggregate = MultiSearchAggregator::from_queries(&queries, &req);
// Explicitly expect a `(ResponseError, usize)` for the error type rather than `ResponseError` alone,
// so that `?` cannot be used without going through `with_index`, ensuring the query index is never
// forgotten when the code changes.
let search_results: Result<_, (ResponseError, usize)> = (|| {
async {
let mut search_results = Vec::with_capacity(queries.len());
for (query_index, (index_uid, mut query)) in
queries.into_iter().map(SearchQueryWithIndex::into_index_query).enumerate()
{
debug!("multi-search #{query_index}: called with params: {:?}", query);
// Check index from API key
if !index_scheduler.filters().is_index_authorized(&index_uid) {
return Err(AuthenticationError::InvalidToken).with_index(query_index);
}
// Apply search rules from tenant token
if let Some(search_rules) =
index_scheduler.filters().get_index_search_rules(&index_uid)
{
add_search_rules(&mut query, search_rules);
}
let index = index_scheduler
.index(&index_uid)
.map_err(|err| {
let mut err = ResponseError::from(err);
// Patch the HTTP status code to 400 as it defaults to 404 for `index_not_found`, but
// here the resource not found is not part of the URL.
err.code = StatusCode::BAD_REQUEST;
err
})
.with_index(query_index)?;
let search_result =
tokio::task::spawn_blocking(move || perform_search(&index, query))
.await
.with_index(query_index)?;
search_results.push(SearchResultWithIndex {
index_uid: index_uid.into_inner(),
result: search_result.with_index(query_index)?,
});
}
Ok(search_results)
}
})()
.await;
if search_results.is_ok() {
multi_aggregate.succeed();
}
analytics.post_multi_search(multi_aggregate);
let search_results = search_results.map_err(|(mut err, query_index)| {
// Add the query index that failed as context for the error message.
// We're doing it only here and not directly in the `WithIndex` trait so that the `with_index` function returns a different type
// of result and we can benefit from static typing.
err.message = format!("Inside `.queries[{query_index}]`: {}", err.message);
err
})?;
debug!("returns: {:?}", search_results);
Ok(HttpResponse::Ok().json(SearchResults { results: search_results }))
}
/// Local `Result` extension trait to avoid `map_err` boilerplate.
trait WithIndex {
type T;
/// Convert the error type inside the `Result` to a `ResponseError`, and return it paired with the query index (`usize`).
fn with_index(self, index: usize) -> Result<Self::T, (ResponseError, usize)>;
}
impl<T, E: Into<ResponseError>> WithIndex for Result<T, E> {
type T = T;
fn with_index(self, index: usize) -> Result<T, (ResponseError, usize)> {
self.map_err(|err| (err.into(), index))
}
}
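Stripped of the Meilisearch types, the pattern is a helper that pairs every error with the position of the item that produced it, so `?` can only be used once that position is attached; a generic sketch:
// Generic shape of the `with_index` idea, illustration only.
fn tag_err<T, E>(res: Result<T, E>, position: usize) -> Result<T, (E, usize)> {
    res.map_err(|err| (err, position))
}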

View File

@ -42,7 +42,7 @@ pub async fn swap_indexes(
}),
Some(&req),
);
let search_rules = &index_scheduler.filters().search_rules;
let filters = index_scheduler.filters();
let mut swaps = vec![];
for SwapIndexesPayload { indexes } in params.into_iter() {
@ -53,7 +53,7 @@ pub async fn swap_indexes(
return Err(MeilisearchHttpError::SwapIndexPayloadWrongLength(indexes).into());
}
};
if !search_rules.is_index_authorized(lhs) || !search_rules.is_index_authorized(rhs) {
if !filters.is_index_authorized(lhs) || !filters.is_index_authorized(rhs) {
return Err(AuthenticationError::InvalidToken.into());
}
swaps.push(IndexSwap { indexes: (lhs.to_string(), rhs.to_string()) });

View File

@ -319,7 +319,7 @@ async fn cancel_tasks(
let tasks = index_scheduler.get_task_ids_from_authorized_indexes(
&index_scheduler.read_txn()?,
&query,
&index_scheduler.filters().search_rules.authorized_indexes(),
index_scheduler.filters(),
)?;
let task_cancelation =
KindWithContent::TaskCancelation { query: format!("?{}", req.query_string()), tasks };
@ -364,7 +364,7 @@ async fn delete_tasks(
let tasks = index_scheduler.get_task_ids_from_authorized_indexes(
&index_scheduler.read_txn()?,
&query,
&index_scheduler.filters().search_rules.authorized_indexes(),
index_scheduler.filters(),
)?;
let task_deletion =
KindWithContent::TaskDeletion { query: format!("?{}", req.query_string()), tasks };
@ -398,10 +398,7 @@ async fn get_tasks(
let query = params.into_query();
let mut tasks_results: Vec<TaskView> = index_scheduler
.get_tasks_from_authorized_indexes(
query,
index_scheduler.filters().search_rules.authorized_indexes(),
)?
.get_tasks_from_authorized_indexes(query, index_scheduler.filters())?
.into_iter()
.map(|t| TaskView::from_task(&t))
.collect();
@ -439,12 +436,8 @@ async fn get_task(
let query = index_scheduler::Query { uids: Some(vec![task_uid]), ..Query::default() };
if let Some(task) = index_scheduler
.get_tasks_from_authorized_indexes(
query,
index_scheduler.filters().search_rules.authorized_indexes(),
)?
.first()
if let Some(task) =
index_scheduler.get_tasks_from_authorized_indexes(query, index_scheduler.filters())?.first()
{
let task_view = TaskView::from_task(task);
Ok(HttpResponse::Ok().json(task_view))

View File

@ -5,8 +5,10 @@ use std::time::Instant;
use deserr::Deserr;
use either::Either;
use meilisearch_auth::IndexSearchRules;
use meilisearch_types::deserr::DeserrJsonError;
use meilisearch_types::error::deserr_codes::*;
use meilisearch_types::index_uid::IndexUid;
use meilisearch_types::settings::DEFAULT_PAGINATION_MAX_TOTAL_HITS;
use meilisearch_types::{milli, Document};
use milli::tokenizer::TokenizerBuilder;
@ -74,6 +76,100 @@ impl SearchQuery {
}
}
/// A `SearchQuery` + an index UID.
// This struct contains the fields of `SearchQuery` inline.
// This is because neither deserr nor serde supports `flatten` when using `deny_unknown_fields`.
// The `From<SearchQueryWithIndex>` implementation ensures both structs remain up to date.
#[derive(Debug, Clone, PartialEq, Eq, Deserr)]
#[deserr(error = DeserrJsonError, rename_all = camelCase, deny_unknown_fields)]
pub struct SearchQueryWithIndex {
#[deserr(error = DeserrJsonError<InvalidIndexUid>, missing_field_error = DeserrJsonError::missing_index_uid)]
pub index_uid: IndexUid,
#[deserr(default, error = DeserrJsonError<InvalidSearchQ>)]
pub q: Option<String>,
#[deserr(default = DEFAULT_SEARCH_OFFSET(), error = DeserrJsonError<InvalidSearchOffset>)]
pub offset: usize,
#[deserr(default = DEFAULT_SEARCH_LIMIT(), error = DeserrJsonError<InvalidSearchLimit>)]
pub limit: usize,
#[deserr(default, error = DeserrJsonError<InvalidSearchPage>)]
pub page: Option<usize>,
#[deserr(default, error = DeserrJsonError<InvalidSearchHitsPerPage>)]
pub hits_per_page: Option<usize>,
#[deserr(default, error = DeserrJsonError<InvalidSearchAttributesToRetrieve>)]
pub attributes_to_retrieve: Option<BTreeSet<String>>,
#[deserr(default, error = DeserrJsonError<InvalidSearchAttributesToCrop>)]
pub attributes_to_crop: Option<Vec<String>>,
#[deserr(default, error = DeserrJsonError<InvalidSearchCropLength>, default = DEFAULT_CROP_LENGTH())]
pub crop_length: usize,
#[deserr(default, error = DeserrJsonError<InvalidSearchAttributesToHighlight>)]
pub attributes_to_highlight: Option<HashSet<String>>,
#[deserr(default, error = DeserrJsonError<InvalidSearchShowMatchesPosition>, default)]
pub show_matches_position: bool,
#[deserr(default, error = DeserrJsonError<InvalidSearchFilter>)]
pub filter: Option<Value>,
#[deserr(default, error = DeserrJsonError<InvalidSearchSort>)]
pub sort: Option<Vec<String>>,
#[deserr(default, error = DeserrJsonError<InvalidSearchFacets>)]
pub facets: Option<Vec<String>>,
#[deserr(default, error = DeserrJsonError<InvalidSearchHighlightPreTag>, default = DEFAULT_HIGHLIGHT_PRE_TAG())]
pub highlight_pre_tag: String,
#[deserr(default, error = DeserrJsonError<InvalidSearchHighlightPostTag>, default = DEFAULT_HIGHLIGHT_POST_TAG())]
pub highlight_post_tag: String,
#[deserr(default, error = DeserrJsonError<InvalidSearchCropMarker>, default = DEFAULT_CROP_MARKER())]
pub crop_marker: String,
#[deserr(default, error = DeserrJsonError<InvalidSearchMatchingStrategy>, default)]
pub matching_strategy: MatchingStrategy,
}
impl SearchQueryWithIndex {
pub fn into_index_query(self) -> (IndexUid, SearchQuery) {
let SearchQueryWithIndex {
index_uid,
q,
offset,
limit,
page,
hits_per_page,
attributes_to_retrieve,
attributes_to_crop,
crop_length,
attributes_to_highlight,
show_matches_position,
filter,
sort,
facets,
highlight_pre_tag,
highlight_post_tag,
crop_marker,
matching_strategy,
} = self;
(
index_uid,
SearchQuery {
q,
offset,
limit,
page,
hits_per_page,
attributes_to_retrieve,
attributes_to_crop,
crop_length,
attributes_to_highlight,
show_matches_position,
filter,
sort,
facets,
highlight_pre_tag,
highlight_post_tag,
crop_marker,
matching_strategy,
// do not use ..Default::default() here,
// rather add any missing field from `SearchQuery` to `SearchQueryWithIndex`
},
)
}
}
#[derive(Debug, Clone, PartialEq, Eq, Deserr)]
#[deserr(rename_all = camelCase)]
pub enum MatchingStrategy {
@ -108,7 +204,7 @@ pub struct SearchHit {
pub matches_position: Option<MatchesPosition>,
}
#[derive(Serialize, Debug, Clone, PartialEq, Eq)]
#[derive(Serialize, Debug, Clone, PartialEq)]
#[serde(rename_all = "camelCase")]
pub struct SearchResult {
pub hits: Vec<SearchHit>,
@ -118,6 +214,16 @@ pub struct SearchResult {
pub hits_info: HitsInfo,
#[serde(skip_serializing_if = "Option::is_none")]
pub facet_distribution: Option<BTreeMap<String, BTreeMap<String, u64>>>,
#[serde(skip_serializing_if = "Option::is_none")]
pub facet_stats: Option<BTreeMap<String, FacetStats>>,
}
#[derive(Serialize, Debug, Clone, PartialEq)]
#[serde(rename_all = "camelCase")]
pub struct SearchResultWithIndex {
pub index_uid: String,
#[serde(flatten)]
pub result: SearchResult,
}
#[derive(Serialize, Debug, Clone, PartialEq, Eq)]
@ -129,6 +235,32 @@ pub enum HitsInfo {
OffsetLimit { limit: usize, offset: usize, estimated_total_hits: usize },
}
#[derive(Serialize, Debug, Clone, PartialEq)]
pub struct FacetStats {
pub min: f64,
pub max: f64,
}
/// Incorporate search rules in search query
pub fn add_search_rules(query: &mut SearchQuery, rules: IndexSearchRules) {
query.filter = match (query.filter.take(), rules.filter) {
(None, rules_filter) => rules_filter,
(filter, None) => filter,
(Some(filter), Some(rules_filter)) => {
let filter = match filter {
Value::Array(filter) => filter,
filter => vec![filter],
};
let rules_filter = match rules_filter {
Value::Array(rules_filter) => rules_filter,
rules_filter => vec![rules_filter],
};
Some(Value::Array([filter, rules_filter].concat()))
}
}
}
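A worked example of the merge, with filters invented for illustration: both sides are normalised to arrays and concatenated, which Meilisearch's filter syntax treats as an AND of the two.
use serde_json::{json, Value};
fn example_merge() -> Value {
    let query_filter = json!("genre = horror");   // from the request
    let token_filter = json!(["user_id = 1"]);    // from the tenant token
    let lhs = match query_filter { Value::Array(a) => a, v => vec![v] };
    let rhs = match token_filter { Value::Array(a) => a, v => vec![v] };
    // => ["genre = horror", "user_id = 1"]
    Value::Array([lhs, rhs].concat())
}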
pub fn perform_search(
index: &Index,
query: SearchQuery,
@ -243,9 +375,10 @@ pub fn perform_search(
&displayed_ids,
);
let tokenizer = TokenizerBuilder::default().build();
let mut tokenizer_builder = TokenizerBuilder::default();
tokenizer_builder.create_char_map(true);
let mut formatter_builder = MatcherBuilder::new(matching_words, tokenizer);
let mut formatter_builder = MatcherBuilder::new(matching_words, tokenizer_builder.build());
formatter_builder.crop_marker(query.crop_marker);
formatter_builder.highlight_prefix(query.highlight_pre_tag);
formatter_builder.highlight_suffix(query.highlight_post_tag);
@ -300,7 +433,7 @@ pub fn perform_search(
HitsInfo::OffsetLimit { limit: query.limit, offset, estimated_total_hits: number_of_hits }
};
let facet_distribution = match query.facets {
let (facet_distribution, facet_stats) = match query.facets {
Some(ref fields) => {
let mut facet_distribution = index.facets_distribution(&rtxn);
@ -314,18 +447,23 @@ pub fn perform_search(
facet_distribution.facets(fields);
}
let distribution = facet_distribution.candidates(candidates).execute()?;
Some(distribution)
let stats = facet_distribution.compute_stats()?;
(Some(distribution), Some(stats))
}
None => None,
None => (None, None),
};
let facet_stats = facet_stats.map(|stats| {
stats.into_iter().map(|(k, (min, max))| (k, FacetStats { min, max })).collect()
});
let result = SearchResult {
hits: documents,
hits_info,
query: query.q.clone().unwrap_or_default(),
processing_time_ms: before_search.elapsed().as_millis(),
facet_distribution,
facet_stats,
};
Ok(result)
}

View File

@ -10,7 +10,8 @@ use crate::common::Server;
pub static AUTHORIZATIONS: Lazy<HashMap<(&'static str, &'static str), HashSet<&'static str>>> =
Lazy::new(|| {
let mut authorizations = hashmap! {
let authorizations = hashmap! {
("POST", "/multi-search") => hashset!{"search", "*"},
("POST", "/indexes/products/search") => hashset!{"search", "*"},
("GET", "/indexes/products/search") => hashset!{"search", "*"},
("POST", "/indexes/products/documents") => hashset!{"documents.add", "documents.*", "*"},
@ -51,6 +52,7 @@ pub static AUTHORIZATIONS: Lazy<HashMap<(&'static str, &'static str), HashSet<&'
("GET", "/stats") => hashset!{"stats.get", "stats.*", "*"},
("POST", "/dumps") => hashset!{"dumps.create", "dumps.*", "*"},
("GET", "/version") => hashset!{"version", "*"},
("GET", "/metrics") => hashset!{"metrics.get", "metrics.*", "*"},
("PATCH", "/keys/mykey/") => hashset!{"keys.update", "*"},
("GET", "/keys/mykey/") => hashset!{"keys.get", "*"},
("DELETE", "/keys/mykey/") => hashset!{"keys.delete", "*"},
@ -58,10 +60,6 @@ pub static AUTHORIZATIONS: Lazy<HashMap<(&'static str, &'static str), HashSet<&'
("GET", "/keys") => hashset!{"keys.get", "*"},
};
if cfg!(feature = "metrics") {
authorizations.insert(("GET", "/metrics"), hashset! {"metrics.get", "metrics.*", "*"});
}
authorizations
});
@ -77,6 +75,14 @@ static INVALID_RESPONSE: Lazy<Value> = Lazy::new(|| {
})
});
static INVALID_METRICS_RESPONSE: Lazy<Value> = Lazy::new(|| {
json!({"message": "The provided API key is invalid. The API key for the `/metrics` route must allow access to all indexes.",
"code": "invalid_api_key",
"type": "auth",
"link": "https://docs.meilisearch.com/errors#invalid_api_key"
})
});
const MASTER_KEY: &str = "MASTER_KEY";
#[actix_rt::test]
@ -204,15 +210,28 @@ async fn access_authorized_restricted_index() {
let (response, code) = server.dummy_request(method, route).await;
assert_ne!(
response,
INVALID_RESPONSE.clone(),
"on route: {:?} - {:?} with action: {:?}",
method,
route,
action
);
assert_ne!(code, 403);
// The `/metrics` route MUST reject API keys that are restricted to specific indexes
if *route == "/metrics" {
assert_eq!(
response,
INVALID_METRICS_RESPONSE.clone(),
"on route: {:?} - {:?} with action: {:?}",
method,
route,
action
);
assert_eq!(code, 403);
} else {
assert_ne!(
response,
INVALID_RESPONSE.clone(),
"on route: {:?} - {:?} with action: {:?}",
method,
route,
action
);
assert_ne!(code, 403);
}
}
}
}

View File

@ -60,7 +60,7 @@ async fn create_api_key_bad_uid() {
snapshot!(code, @"400 Bad Request");
snapshot!(json_string!(response), @r###"
{
"message": "Invalid value at `.uid`: invalid character: expected an optional prefix of `urn:uuid:` followed by [0-9a-zA-Z], found `o` at 2",
"message": "Invalid value at `.uid`: invalid character: expected an optional prefix of `urn:uuid:` followed by [0-9a-fA-F-], found `o` at 2",
"code": "invalid_api_key_uid",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_api_key_uid"

View File

@ -4,6 +4,8 @@ mod errors;
mod payload;
mod tenant_token;
mod tenant_token_multi_search;
use actix_web::http::StatusCode;
use serde_json::{json, Value};

File diff suppressed because it is too large

View File

@ -103,6 +103,10 @@ impl Server {
Index { uid: uid.as_ref().to_string(), service: &self.service, encoder }
}
pub async fn multi_search(&self, queries: Value) -> (Value, StatusCode) {
self.service.post("/multi-search", queries).await
}
pub async fn list_indexes_raw(&self, parameters: &str) -> (Value, StatusCode) {
self.service.get(format!("/indexes{parameters}")).await
}
@ -201,10 +205,10 @@ pub fn default_settings(dir: impl AsRef<Path>) -> Opt {
indexer_options: IndexerOpts {
// memory has to be unlimited because several Meilisearch instances are running in the test context.
max_indexing_memory: MaxMemory::unlimited(),
skip_index_budget: true,
..Parser::parse_from(None as Option<&str>)
},
#[cfg(feature = "metrics")]
enable_metrics_route: true,
experimental_enable_metrics: true,
..Parser::parse_from(None as Option<&str>)
}
}

View File

@ -280,9 +280,9 @@ async fn replace_documents_bad_csv_delimiter() {
snapshot!(json_string!(response), @r###"
{
"message": "Invalid value in parameter `csvDelimiter`: expected a string of one character, but found an empty string",
"code": "invalid_index_csv_delimiter",
"code": "invalid_document_csv_delimiter",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_index_csv_delimiter"
"link": "https://docs.meilisearch.com/errors#invalid_document_csv_delimiter"
}
"###);
@ -292,9 +292,9 @@ async fn replace_documents_bad_csv_delimiter() {
snapshot!(json_string!(response), @r###"
{
"message": "Invalid value in parameter `csvDelimiter`: expected a string of one character, but found the following string of 5 characters: `doggo`",
"code": "invalid_index_csv_delimiter",
"code": "invalid_document_csv_delimiter",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_index_csv_delimiter"
"link": "https://docs.meilisearch.com/errors#invalid_document_csv_delimiter"
}
"###);
@ -305,9 +305,9 @@ async fn replace_documents_bad_csv_delimiter() {
snapshot!(json_string!(response), @r###"
{
"message": "csv delimiter must be an ascii character. Found: `🍰`",
"code": "invalid_index_csv_delimiter",
"code": "invalid_document_csv_delimiter",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_index_csv_delimiter"
"link": "https://docs.meilisearch.com/errors#invalid_document_csv_delimiter"
}
"###);
}
@ -323,9 +323,9 @@ async fn update_documents_bad_csv_delimiter() {
snapshot!(json_string!(response), @r###"
{
"message": "Invalid value in parameter `csvDelimiter`: expected a string of one character, but found an empty string",
"code": "invalid_index_csv_delimiter",
"code": "invalid_document_csv_delimiter",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_index_csv_delimiter"
"link": "https://docs.meilisearch.com/errors#invalid_document_csv_delimiter"
}
"###);
@ -335,9 +335,9 @@ async fn update_documents_bad_csv_delimiter() {
snapshot!(json_string!(response), @r###"
{
"message": "Invalid value in parameter `csvDelimiter`: expected a string of one character, but found the following string of 5 characters: `doggo`",
"code": "invalid_index_csv_delimiter",
"code": "invalid_document_csv_delimiter",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_index_csv_delimiter"
"link": "https://docs.meilisearch.com/errors#invalid_document_csv_delimiter"
}
"###);
@ -352,9 +352,9 @@ async fn update_documents_bad_csv_delimiter() {
snapshot!(json_string!(response), @r###"
{
"message": "csv delimiter must be an ascii character. Found: `🍰`",
"code": "invalid_index_csv_delimiter",
"code": "invalid_document_csv_delimiter",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_index_csv_delimiter"
"link": "https://docs.meilisearch.com/errors#invalid_document_csv_delimiter"
}
"###);
}

View File

@ -442,3 +442,37 @@ async fn displayedattr_2_smol() {
)
.await;
}
#[cfg(feature = "default")]
#[actix_rt::test]
async fn test_cjk_highlight() {
let server = Server::new().await;
let index = server.index("test");
let documents = json!([
{ "id": 0, "title": "この度、クーポンで無料で頂きました。" },
{ "id": 1, "title": "大卫到了扫罗那里" },
]);
index.add_documents(documents, None).await;
index.wait_task(0).await;
index
.search(json!({"q": "", "attributesToHighlight": ["title"]}), |response, code| {
assert_eq!(code, 200, "{}", response);
assert_eq!(
response["hits"][0]["_formatted"]["title"],
json!("この度、クーポン<em>で</em>無料<em>で</em>頂きました。")
);
})
.await;
index
.search(json!({"q": "大卫", "attributesToHighlight": ["title"]}), |response, code| {
assert_eq!(code, 200, "{}", response);
assert_eq!(
response["hits"][0]["_formatted"]["title"],
json!("<em>大卫</em>到了扫罗那里")
);
})
.await;
}

View File

@ -3,6 +3,7 @@
mod errors;
mod formatted;
mod multi;
mod pagination;
use once_cell::sync::Lazy;
@ -148,6 +149,49 @@ async fn simple_search() {
.await;
}
#[actix_rt::test]
async fn phrase_search_with_stop_word() {
// related to https://github.com/meilisearch/meilisearch/issues/3521
let server = Server::new().await;
let index = server.index("test");
let (_, code) = index.update_settings(json!({"stopWords": ["the", "of"]})).await;
meili_snap::snapshot!(code, @"202 Accepted");
let documents = DOCUMENTS.clone();
index.add_documents(documents, None).await;
index.wait_task(1).await;
index
.search(json!({"q": "how \"to\" train \"the" }), |response, code| {
assert_eq!(code, 200, "{}", response);
assert_eq!(response["hits"].as_array().unwrap().len(), 1);
})
.await;
}
#[cfg(feature = "default")]
#[actix_rt::test]
async fn test_kanji_language_detection() {
let server = Server::new().await;
let index = server.index("test");
let documents = json!([
{ "id": 0, "title": "The quick (\"brown\") fox can't jump 32.3 feet, right? Brr, it's 29.3°F!" },
{ "id": 1, "title": "東京のお寿司。" },
{ "id": 2, "title": "הַשּׁוּעָל הַמָּהִיר (״הַחוּם״) לֹא יָכוֹל לִקְפֹּץ 9.94 מֶטְרִים, נָכוֹן? ברר, 1.5°C- בַּחוּץ!" }
]);
index.add_documents(documents, None).await;
index.wait_task(0).await;
index
.search(json!({"q": "東京"}), |response, code| {
assert_eq!(code, 200, "{}", response);
assert_eq!(response["hits"].as_array().unwrap().len(), 1);
})
.await;
}
#[actix_rt::test]
async fn search_multiple_params() {
let server = Server::new().await;

View File

@ -0,0 +1,343 @@
use meili_snap::{json_string, snapshot};
use serde_json::json;
use super::{DOCUMENTS, NESTED_DOCUMENTS};
use crate::common::Server;
#[actix_rt::test]
async fn search_empty_list() {
let server = Server::new().await;
let (response, code) = server.multi_search(json!({"queries": []})).await;
snapshot!(code, @"200 OK");
snapshot!(json_string!(response), @r###"
{
"results": []
}
"###);
}
#[actix_rt::test]
async fn search_json_object() {
let server = Server::new().await;
let (response, code) = server.multi_search(json!({})).await;
snapshot!(code, @"400 Bad Request");
snapshot!(json_string!(response), @r###"
{
"message": "Missing field `queries`",
"code": "bad_request",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#bad_request"
}
"###);
}
#[actix_rt::test]
async fn search_json_array() {
let server = Server::new().await;
let (response, code) = server.multi_search(json!([])).await;
snapshot!(code, @"400 Bad Request");
snapshot!(json_string!(response), @r###"
{
"message": "Invalid value type: expected an object, but found an array: `[]`",
"code": "bad_request",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#bad_request"
}
"###);
}
#[actix_rt::test]
async fn simple_search_single_index() {
let server = Server::new().await;
let index = server.index("test");
let documents = DOCUMENTS.clone();
index.add_documents(documents, None).await;
index.wait_task(0).await;
let (response, code) = server
.multi_search(json!({"queries": [
{"indexUid" : "test", "q": "glass"},
{"indexUid": "test", "q": "captain"},
]}))
.await;
snapshot!(code, @"200 OK");
insta::assert_json_snapshot!(response["results"], { "[].processingTimeMs" => "[time]" }, @r###"
[
{
"indexUid": "test",
"hits": [
{
"title": "Glass",
"id": "450465"
}
],
"query": "glass",
"processingTimeMs": "[time]",
"limit": 20,
"offset": 0,
"estimatedTotalHits": 1
},
{
"indexUid": "test",
"hits": [
{
"title": "Captain Marvel",
"id": "299537"
}
],
"query": "captain",
"processingTimeMs": "[time]",
"limit": 20,
"offset": 0,
"estimatedTotalHits": 1
}
]
"###);
}
#[actix_rt::test]
async fn simple_search_missing_index_uid() {
let server = Server::new().await;
let index = server.index("test");
let documents = DOCUMENTS.clone();
index.add_documents(documents, None).await;
index.wait_task(0).await;
let (response, code) = server
.multi_search(json!({"queries": [
{"q": "glass"},
]}))
.await;
snapshot!(code, @"400 Bad Request");
insta::assert_json_snapshot!(response, @r###"
{
"message": "Missing field `indexUid` inside `.queries[0]`",
"code": "missing_index_uid",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#missing_index_uid"
}
"###);
}
#[actix_rt::test]
async fn simple_search_illegal_index_uid() {
let server = Server::new().await;
let index = server.index("test");
let documents = DOCUMENTS.clone();
index.add_documents(documents, None).await;
index.wait_task(0).await;
let (response, code) = server
.multi_search(json!({"queries": [
{"indexUid": "hé", "q": "glass"},
]}))
.await;
snapshot!(code, @"400 Bad Request");
insta::assert_json_snapshot!(response, @r###"
{
"message": "Invalid value at `.queries[0].indexUid`: `hé` is not a valid index uid. Index uid can be an integer or a string containing only alphanumeric characters, hyphens (-) and underscores (_).",
"code": "invalid_index_uid",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_index_uid"
}
"###);
}
#[actix_rt::test]
async fn simple_search_two_indexes() {
let server = Server::new().await;
let index = server.index("test");
let documents = DOCUMENTS.clone();
index.add_documents(documents, None).await;
index.wait_task(0).await;
let index = server.index("nested");
let documents = NESTED_DOCUMENTS.clone();
index.add_documents(documents, None).await;
index.wait_task(1).await;
let (response, code) = server
.multi_search(json!({"queries": [
{"indexUid" : "test", "q": "glass"},
{"indexUid": "nested", "q": "pesti"},
]}))
.await;
snapshot!(code, @"200 OK");
insta::assert_json_snapshot!(response["results"], { "[].processingTimeMs" => "[time]" }, @r###"
[
{
"indexUid": "test",
"hits": [
{
"title": "Glass",
"id": "450465"
}
],
"query": "glass",
"processingTimeMs": "[time]",
"limit": 20,
"offset": 0,
"estimatedTotalHits": 1
},
{
"indexUid": "nested",
"hits": [
{
"id": 852,
"father": "jean",
"mother": "michelle",
"doggos": [
{
"name": "bobby",
"age": 2
},
{
"name": "buddy",
"age": 4
}
],
"cattos": "pesti"
},
{
"id": 654,
"father": "pierre",
"mother": "sabine",
"doggos": [
{
"name": "gros bill",
"age": 8
}
],
"cattos": [
"simba",
"pestiféré"
]
}
],
"query": "pesti",
"processingTimeMs": "[time]",
"limit": 20,
"offset": 0,
"estimatedTotalHits": 2
}
]
"###);
}
#[actix_rt::test]
async fn search_one_index_doesnt_exist() {
let server = Server::new().await;
let index = server.index("test");
let documents = DOCUMENTS.clone();
index.add_documents(documents, None).await;
index.wait_task(0).await;
let (response, code) = server
.multi_search(json!({"queries": [
{"indexUid" : "test", "q": "glass"},
{"indexUid": "nested", "q": "pesti"},
]}))
.await;
snapshot!(code, @"400 Bad Request");
snapshot!(json_string!(response), @r###"
{
"message": "Inside `.queries[1]`: Index `nested` not found.",
"code": "index_not_found",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#index_not_found"
}
"###);
}
#[actix_rt::test]
async fn search_multiple_indexes_dont_exist() {
let server = Server::new().await;
let (response, code) = server
.multi_search(json!({"queries": [
{"indexUid" : "test", "q": "glass"},
{"indexUid": "nested", "q": "pesti"},
]}))
.await;
snapshot!(code, @"400 Bad Request");
snapshot!(json_string!(response), @r###"
{
"message": "Inside `.queries[0]`: Index `test` not found.",
"code": "index_not_found",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#index_not_found"
}
"###);
}
#[actix_rt::test]
async fn search_one_query_error() {
let server = Server::new().await;
let index = server.index("test");
let documents = DOCUMENTS.clone();
index.add_documents(documents, None).await;
index.wait_task(0).await;
let index = server.index("nested");
let documents = NESTED_DOCUMENTS.clone();
index.add_documents(documents, None).await;
index.wait_task(1).await;
let (response, code) = server
.multi_search(json!({"queries": [
{"indexUid" : "test", "q": "glass", "facets": ["title"]},
{"indexUid": "nested", "q": "pesti"},
]}))
.await;
snapshot!(code, @"400 Bad Request");
snapshot!(json_string!(response), @r###"
{
"message": "Inside `.queries[0]`: Invalid facet distribution, this index does not have configured filterable attributes.",
"code": "invalid_search_facets",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_search_facets"
}
"###);
}
#[actix_rt::test]
async fn search_multiple_query_errors() {
let server = Server::new().await;
let index = server.index("test");
let documents = DOCUMENTS.clone();
index.add_documents(documents, None).await;
index.wait_task(0).await;
let index = server.index("nested");
let documents = NESTED_DOCUMENTS.clone();
index.add_documents(documents, None).await;
index.wait_task(1).await;
let (response, code) = server
.multi_search(json!({"queries": [
{"indexUid" : "test", "q": "glass", "facets": ["title"]},
{"indexUid": "nested", "q": "pesti", "facets": ["doggos"]},
]}))
.await;
snapshot!(code, @"400 Bad Request");
snapshot!(json_string!(response), @r###"
{
"message": "Inside `.queries[0]`: Invalid facet distribution, this index does not have configured filterable attributes.",
"code": "invalid_search_facets",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_search_facets"
}
"###);
}

View File

@ -16,7 +16,7 @@ bimap = { version = "0.6.2", features = ["serde"] }
bincode = "1.3.3"
bstr = "1.0.1"
byteorder = "1.4.3"
charabia = { version = "0.7.0", default-features = false }
charabia = { version = "0.7.1", default-features = false }
concat-arrays = "0.1.2"
crossbeam-channel = "0.5.6"
deserr = "0.5.0"

View File

@ -59,6 +59,8 @@ pub enum InternalError {
Utf8(#[from] str::Utf8Error),
#[error("An indexation process was explicitly aborted.")]
AbortedIndexation,
#[error("The matching words list contains at least one invalid member.")]
InvalidMatchingWords,
}
#[derive(Error, Debug)]

View File

@ -5,6 +5,7 @@ mod field_id_word_count_codec;
mod obkv_codec;
mod roaring_bitmap;
mod roaring_bitmap_length;
mod script_language_codec;
mod str_beu32_codec;
mod str_ref;
mod str_str_u8_codec;
@ -19,5 +20,6 @@ pub use self::roaring_bitmap::{BoRoaringBitmapCodec, CboRoaringBitmapCodec, Roar
pub use self::roaring_bitmap_length::{
BoRoaringBitmapLenCodec, CboRoaringBitmapLenCodec, RoaringBitmapLenCodec,
};
pub use self::script_language_codec::ScriptLanguageCodec;
pub use self::str_beu32_codec::StrBEU32Codec;
pub use self::str_str_u8_codec::{U8StrStrCodec, UncheckedU8StrStrCodec};

View File

@ -0,0 +1,38 @@
use std::borrow::Cow;
use std::str;
use charabia::{Language, Script};
pub struct ScriptLanguageCodec;
impl<'a> heed::BytesDecode<'a> for ScriptLanguageCodec {
type DItem = (Script, Language);
fn bytes_decode(bytes: &'a [u8]) -> Option<Self::DItem> {
let sep = bytes.iter().position(|b| *b == 0)?;
let (s_bytes, l_bytes) = bytes.split_at(sep);
let script = str::from_utf8(s_bytes).ok()?;
let script_name = Script::from_name(script);
let lan = str::from_utf8(l_bytes).ok()?;
// skip the '\0' byte between the two strings.
let lan_name = Language::from_name(&lan[1..]);
Some((script_name, lan_name))
}
}
impl<'a> heed::BytesEncode<'a> for ScriptLanguageCodec {
type EItem = (Script, Language);
fn bytes_encode((script, lan): &Self::EItem) -> Option<Cow<[u8]>> {
let script_name = script.name().as_bytes();
let lan_name = lan.name().as_bytes();
let mut bytes = Vec::with_capacity(script_name.len() + lan_name.len() + 1);
bytes.extend_from_slice(script_name);
bytes.push(0);
bytes.extend_from_slice(lan_name);
Some(Cow::Owned(bytes))
}
}
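
For orientation, here is a minimal round-trip sketch of the codec above (not part of the diff). It assumes ScriptLanguageCodec is in scope together with heed's BytesEncode/BytesDecode traits and charabia's Script/Language types; the key is stored as "<script name>\0<language name>".

use charabia::{Language, Script};
use heed::{BytesDecode, BytesEncode};

fn script_language_codec_roundtrip() {
    let key = (Script::Cj, Language::Cmn);
    // Encode to `<script>\0<language>` and decode it back.
    let bytes = ScriptLanguageCodec::bytes_encode(&key).expect("encoding an in-memory key");
    let decoded = ScriptLanguageCodec::bytes_decode(&bytes).expect("decoding what we just encoded");
    assert!(decoded == key);
}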

View File

@ -4,6 +4,7 @@ use std::fs::File;
use std::mem::size_of;
use std::path::Path;
use charabia::{Language, Script};
use heed::flags::Flags;
use heed::types::*;
use heed::{CompactionOption, Database, PolyDatabase, RoTxn, RwTxn};
@ -18,7 +19,7 @@ use crate::heed_codec::facet::{
FacetGroupKeyCodec, FacetGroupValueCodec, FieldDocIdFacetF64Codec, FieldDocIdFacetStringCodec,
FieldIdCodec, OrderedF64Codec,
};
use crate::heed_codec::StrRefCodec;
use crate::heed_codec::{ScriptLanguageCodec, StrRefCodec};
use crate::{
default_criteria, BEU32StrCodec, BoRoaringBitmapCodec, CboRoaringBitmapCodec, Criterion,
DocumentId, ExternalDocumentsIds, FacetDistribution, FieldDistribution, FieldId,
@ -83,6 +84,7 @@ pub mod db_name {
pub const FIELD_ID_DOCID_FACET_F64S: &str = "field-id-docid-facet-f64s";
pub const FIELD_ID_DOCID_FACET_STRINGS: &str = "field-id-docid-facet-strings";
pub const DOCUMENTS: &str = "documents";
pub const SCRIPT_LANGUAGE_DOCIDS: &str = "script_language_docids";
}
#[derive(Clone)]
@ -122,6 +124,9 @@ pub struct Index {
/// Maps the position of a word prefix with all the docids where this prefix appears.
pub word_prefix_position_docids: Database<StrBEU32Codec, CboRoaringBitmapCodec>,
/// Maps the script and language to all the docids that correspond to them.
pub script_language_docids: Database<ScriptLanguageCodec, RoaringBitmapCodec>,
/// Maps the facet field id and the docids for which this field exists
pub facet_id_exists_docids: Database<FieldIdCodec, CboRoaringBitmapCodec>,
@ -148,7 +153,7 @@ impl Index {
) -> Result<Index> {
use db_name::*;
options.max_dbs(18);
options.max_dbs(19);
unsafe { options.flag(Flags::MdbAlwaysFreePages) };
let env = options.open(path)?;
@ -159,6 +164,7 @@ impl Index {
let exact_word_prefix_docids = env.create_database(Some(EXACT_WORD_PREFIX_DOCIDS))?;
let docid_word_positions = env.create_database(Some(DOCID_WORD_POSITIONS))?;
let word_pair_proximity_docids = env.create_database(Some(WORD_PAIR_PROXIMITY_DOCIDS))?;
let script_language_docids = env.create_database(Some(SCRIPT_LANGUAGE_DOCIDS))?;
let word_prefix_pair_proximity_docids =
env.create_database(Some(WORD_PREFIX_PAIR_PROXIMITY_DOCIDS))?;
let prefix_word_pair_proximity_docids =
@ -186,6 +192,7 @@ impl Index {
exact_word_prefix_docids,
docid_word_positions,
word_pair_proximity_docids,
script_language_docids,
word_prefix_pair_proximity_docids,
prefix_word_pair_proximity_docids,
word_position_docids,
@ -1187,6 +1194,38 @@ impl Index {
pub(crate) fn delete_pagination_max_total_hits(&self, txn: &mut RwTxn) -> heed::Result<bool> {
self.main.delete::<_, Str>(txn, main_key::PAGINATION_MAX_TOTAL_HITS)
}
/* script language docids */
/// Retrieve all the documents ids that correspond to the given (Script, Language) key, or `None` if the key has no entry.
pub fn script_language_documents_ids(
&self,
rtxn: &RoTxn,
key: &(Script, Language),
) -> heed::Result<Option<RoaringBitmap>> {
let soft_deleted_documents = self.soft_deleted_documents_ids(rtxn)?;
let doc_ids = self.script_language_docids.get(rtxn, key)?;
Ok(doc_ids.map(|ids| ids - soft_deleted_documents))
}
pub fn script_language(&self, rtxn: &RoTxn) -> heed::Result<HashMap<Script, Vec<Language>>> {
let soft_deleted_documents = self.soft_deleted_documents_ids(rtxn)?;
let mut script_language: HashMap<Script, Vec<Language>> = HashMap::new();
for sl in self.script_language_docids.iter(rtxn)? {
let ((script, language), docids) = sl?;
// keep only the languages that contain at least one document.
if !soft_deleted_documents.is_superset(&docids) {
if let Some(languages) = script_language.get_mut(&script) {
(*languages).push(language);
} else {
script_language.insert(script, vec![language]);
}
}
}
Ok(script_language)
}
}
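
A hedged usage sketch (not part of the diff): the map returned by script_language can be handed to charabia's tokenizer as an allow list, which is what the search change further down does. Only script_language, allow_list, build, tokenize and lemma appear in this changeset; TokenizerBuilder::default() and the wrapper function are assumptions for illustration.

use charabia::TokenizerBuilder;

fn tokenize_with_detected_languages(
    index: &crate::Index,
    rtxn: &heed::RoTxn,
    query: &str,
) -> crate::Result<Vec<String>> {
    // Restrict language detection to the (script, language) pairs stored at indexing time.
    let script_lang_map = index.script_language(rtxn)?;
    let mut builder = TokenizerBuilder::default(); // assumed constructor
    if !script_lang_map.is_empty() {
        builder.allow_list(&script_lang_map);
    }
    let tokenizer = builder.build();
    Ok(tokenizer.tokenize(query).map(|token| token.lemma().to_string()).collect())
}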
#[cfg(test)]

View File

@ -1,5 +1,6 @@
use std::mem::take;
use heed::BytesDecode;
use itertools::Itertools;
use log::debug;
use ordered_float::OrderedFloat;
@ -7,7 +8,7 @@ use roaring::RoaringBitmap;
use super::{Criterion, CriterionParameters, CriterionResult};
use crate::facet::FacetType;
use crate::heed_codec::facet::FacetGroupKeyCodec;
use crate::heed_codec::facet::{FacetGroupKeyCodec, OrderedF64Codec};
use crate::heed_codec::ByteSliceRefCodec;
use crate::search::criteria::{resolve_query_tree, CriteriaBuilder, InitialCandidates};
use crate::search::facet::{ascending_facet_sort, descending_facet_sort};
@ -196,6 +197,38 @@ fn facet_ordered_iterative<'t>(
Ok(Box::new(number_iter.chain(string_iter).map(Ok)) as Box<dyn Iterator<Item = _>>)
}
fn facet_extreme_value<'t>(
mut extreme_it: impl Iterator<Item = heed::Result<(RoaringBitmap, &'t [u8])>> + 't,
) -> Result<Option<f64>> {
let extreme_value =
if let Some(extreme_value) = extreme_it.next() { extreme_value } else { return Ok(None) };
let (_, extreme_value) = extreme_value?;
Ok(OrderedF64Codec::bytes_decode(extreme_value))
}
pub fn facet_min_value<'t>(
index: &'t Index,
rtxn: &'t heed::RoTxn,
field_id: FieldId,
candidates: RoaringBitmap,
) -> Result<Option<f64>> {
let db = index.facet_id_f64_docids.remap_key_type::<FacetGroupKeyCodec<ByteSliceRefCodec>>();
let it = ascending_facet_sort(rtxn, db, field_id, candidates)?;
facet_extreme_value(it)
}
pub fn facet_max_value<'t>(
index: &'t Index,
rtxn: &'t heed::RoTxn,
field_id: FieldId,
candidates: RoaringBitmap,
) -> Result<Option<f64>> {
let db = index.facet_id_f64_docids.remap_key_type::<FacetGroupKeyCodec<ByteSliceRefCodec>>();
let it = descending_facet_sort(rtxn, db, field_id, candidates)?;
facet_extreme_value(it)
}
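
For context, a small hypothetical wrapper showing how the two helpers above combine: the minimum of a numeric facet is the first value yielded by the ascending sort, the maximum the first value yielded by the descending sort.

use roaring::RoaringBitmap;

fn facet_bounds(
    index: &crate::Index,
    rtxn: &heed::RoTxn,
    field_id: crate::FieldId,
    candidates: RoaringBitmap,
) -> crate::Result<Option<(f64, f64)>> {
    // `None` when the field has no numeric facet value for these candidates.
    let min = facet_min_value(index, rtxn, field_id, candidates.clone())?;
    let max = facet_max_value(index, rtxn, field_id, candidates)?;
    Ok(min.zip(max))
}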
fn facet_ordered_set_based<'t>(
index: &'t Index,
rtxn: &'t heed::RoTxn,
@ -203,23 +236,24 @@ fn facet_ordered_set_based<'t>(
is_ascending: bool,
candidates: RoaringBitmap,
) -> Result<Box<dyn Iterator<Item = heed::Result<RoaringBitmap>> + 't>> {
let make_iter = if is_ascending { ascending_facet_sort } else { descending_facet_sort };
let number_db =
index.facet_id_f64_docids.remap_key_type::<FacetGroupKeyCodec<ByteSliceRefCodec>>();
let string_db =
index.facet_id_string_docids.remap_key_type::<FacetGroupKeyCodec<ByteSliceRefCodec>>();
let number_iter = make_iter(
rtxn,
index.facet_id_f64_docids.remap_key_type::<FacetGroupKeyCodec<ByteSliceRefCodec>>(),
field_id,
candidates.clone(),
)?;
let (number_iter, string_iter) = if is_ascending {
let number_iter = ascending_facet_sort(rtxn, number_db, field_id, candidates.clone())?;
let string_iter = ascending_facet_sort(rtxn, string_db, field_id, candidates)?;
let string_iter = make_iter(
rtxn,
index.facet_id_string_docids.remap_key_type::<FacetGroupKeyCodec<ByteSliceRefCodec>>(),
field_id,
candidates,
)?;
(itertools::Either::Left(number_iter), itertools::Either::Left(string_iter))
} else {
let number_iter = descending_facet_sort(rtxn, number_db, field_id, candidates.clone())?;
let string_iter = descending_facet_sort(rtxn, string_db, field_id, candidates)?;
Ok(Box::new(number_iter.chain(string_iter)))
(itertools::Either::Right(number_iter), itertools::Either::Right(string_iter))
};
Ok(Box::new(number_iter.chain(string_iter).map(|res| res.map(|(doc_ids, _)| doc_ids))))
}
/// Returns an iterator over groups of the given candidates in ascending or descending order.
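
The branches of facet_ordered_set_based above now return itertools::Either instead of a boxed trait object, so the ascending and descending iterators share one concrete type. A tiny self-contained illustration of the pattern (the values are made up):

use itertools::Either;

fn either_is_an_iterator(ascending: bool) -> Vec<i32> {
    // Either<L, R> implements Iterator whenever both sides yield the same Item type,
    // which is what lets the code above drop the Box<dyn Iterator>.
    let iter = if ascending {
        Either::Left(0..3)
    } else {
        Either::Right(vec![7, 8, 9].into_iter())
    };
    iter.collect()
}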

View File

@ -21,6 +21,7 @@ use crate::update::{MAX_LENGTH_FOR_PREFIX_PROXIMITY_DB, MAX_PROXIMITY_FOR_PREFIX
use crate::{AscDesc as AscDescName, DocumentId, FieldId, Index, Member, Result};
mod asc_desc;
pub use asc_desc::{facet_max_value, facet_min_value};
mod attribute;
mod exactness;
pub mod r#final;

View File

@ -278,6 +278,65 @@ impl<'a> FacetDistribution<'a> {
}
}
pub fn compute_stats(&self) -> Result<BTreeMap<String, (f64, f64)>> {
let fields_ids_map = self.index.fields_ids_map(self.rtxn)?;
let filterable_fields = self.index.filterable_fields(self.rtxn)?;
let candidates = if let Some(candidates) = self.candidates.clone() {
candidates
} else {
return Ok(Default::default());
};
let fields = match &self.facets {
Some(facets) => {
let invalid_fields: HashSet<_> = facets
.iter()
.filter(|facet| !crate::is_faceted(facet, &filterable_fields))
.collect();
if !invalid_fields.is_empty() {
return Err(UserError::InvalidFacetsDistribution {
invalid_facets_name: invalid_fields.into_iter().cloned().collect(),
valid_facets_name: filterable_fields.into_iter().collect(),
}
.into());
} else {
facets.clone()
}
}
None => filterable_fields,
};
let mut distribution = BTreeMap::new();
for (fid, name) in fields_ids_map.iter() {
if crate::is_faceted(name, &fields) {
let min_value = if let Some(min_value) = crate::search::criteria::facet_min_value(
self.index,
self.rtxn,
fid,
candidates.clone(),
)? {
min_value
} else {
continue;
};
let max_value = if let Some(max_value) = crate::search::criteria::facet_max_value(
self.index,
self.rtxn,
fid,
candidates.clone(),
)? {
max_value
} else {
continue;
};
distribution.insert(name.to_string(), (min_value, max_value));
}
}
Ok(distribution)
}
pub fn execute(&self) -> Result<BTreeMap<String, BTreeMap<String, u64>>> {
let fields_ids_map = self.index.fields_ids_map(self.rtxn)?;
let filterable_fields = self.index.filterable_fields(self.rtxn)?;
@ -537,4 +596,216 @@ mod tests {
milli_snap!(format!("{map:?}"), "candidates_0_5_000", @"825f23a4090d05756f46176987b7d992");
}
#[test]
fn facet_stats() {
let mut index = TempIndex::new_with_map_size(4096 * 10_000);
index.index_documents_config.autogenerate_docids = true;
index
.update_settings(|settings| settings.set_filterable_fields(hashset! { S("colour") }))
.unwrap();
let facet_values = (0..1000).into_iter().collect::<Vec<_>>();
let mut documents = vec![];
for i in 0..1000 {
let document = serde_json::json!({
"colour": facet_values[i % 1000],
})
.as_object()
.unwrap()
.clone();
documents.push(document);
}
let documents = documents_batch_reader_from_objects(documents);
index.add_documents(documents).unwrap();
let txn = index.read_txn().unwrap();
let map = FacetDistribution::new(&txn, &index)
.facets(std::iter::once("colour"))
.compute_stats()
.unwrap();
milli_snap!(format!("{map:?}"), "no_candidates", @"{}");
let map = FacetDistribution::new(&txn, &index)
.facets(std::iter::once("colour"))
.candidates((0..1000).into_iter().collect())
.compute_stats()
.unwrap();
milli_snap!(format!("{map:?}"), "candidates_0_1000", @r###"{"colour": (0.0, 999.0)}"###);
let map = FacetDistribution::new(&txn, &index)
.facets(std::iter::once("colour"))
.candidates((217..777).into_iter().collect())
.compute_stats()
.unwrap();
milli_snap!(format!("{map:?}"), "candidates_217_777", @r###"{"colour": (217.0, 776.0)}"###);
}
#[test]
fn facet_stats_array() {
let mut index = TempIndex::new_with_map_size(4096 * 10_000);
index.index_documents_config.autogenerate_docids = true;
index
.update_settings(|settings| settings.set_filterable_fields(hashset! { S("colour") }))
.unwrap();
let facet_values = (0..1000).into_iter().collect::<Vec<_>>();
let mut documents = vec![];
for i in 0..1000 {
let document = serde_json::json!({
"colour": [facet_values[i % 1000], facet_values[i % 1000] + 1000],
})
.as_object()
.unwrap()
.clone();
documents.push(document);
}
let documents = documents_batch_reader_from_objects(documents);
index.add_documents(documents).unwrap();
let txn = index.read_txn().unwrap();
let map = FacetDistribution::new(&txn, &index)
.facets(std::iter::once("colour"))
.compute_stats()
.unwrap();
milli_snap!(format!("{map:?}"), "no_candidates", @"{}");
let map = FacetDistribution::new(&txn, &index)
.facets(std::iter::once("colour"))
.candidates((0..1000).into_iter().collect())
.compute_stats()
.unwrap();
milli_snap!(format!("{map:?}"), "candidates_0_1000", @r###"{"colour": (0.0, 1999.0)}"###);
let map = FacetDistribution::new(&txn, &index)
.facets(std::iter::once("colour"))
.candidates((217..777).into_iter().collect())
.compute_stats()
.unwrap();
milli_snap!(format!("{map:?}"), "candidates_217_777", @r###"{"colour": (217.0, 1776.0)}"###);
}
#[test]
fn facet_stats_mixed_array() {
let mut index = TempIndex::new_with_map_size(4096 * 10_000);
index.index_documents_config.autogenerate_docids = true;
index
.update_settings(|settings| settings.set_filterable_fields(hashset! { S("colour") }))
.unwrap();
let facet_values = (0..1000).into_iter().collect::<Vec<_>>();
let mut documents = vec![];
for i in 0..1000 {
let document = serde_json::json!({
"colour": [facet_values[i % 1000], format!("{}", facet_values[i % 1000] + 1000)],
})
.as_object()
.unwrap()
.clone();
documents.push(document);
}
let documents = documents_batch_reader_from_objects(documents);
index.add_documents(documents).unwrap();
let txn = index.read_txn().unwrap();
let map = FacetDistribution::new(&txn, &index)
.facets(std::iter::once("colour"))
.compute_stats()
.unwrap();
milli_snap!(format!("{map:?}"), "no_candidates", @"{}");
let map = FacetDistribution::new(&txn, &index)
.facets(std::iter::once("colour"))
.candidates((0..1000).into_iter().collect())
.compute_stats()
.unwrap();
milli_snap!(format!("{map:?}"), "candidates_0_1000", @r###"{"colour": (0.0, 999.0)}"###);
let map = FacetDistribution::new(&txn, &index)
.facets(std::iter::once("colour"))
.candidates((217..777).into_iter().collect())
.compute_stats()
.unwrap();
milli_snap!(format!("{map:?}"), "candidates_217_777", @r###"{"colour": (217.0, 776.0)}"###);
}
#[test]
fn facet_mixed_values() {
let mut index = TempIndex::new_with_map_size(4096 * 10_000);
index.index_documents_config.autogenerate_docids = true;
index
.update_settings(|settings| settings.set_filterable_fields(hashset! { S("colour") }))
.unwrap();
let facet_values = (0..1000).into_iter().collect::<Vec<_>>();
let mut documents = vec![];
for i in 0..1000 {
let document = if i % 2 == 0 {
serde_json::json!({
"colour": [facet_values[i % 1000], facet_values[i % 1000] + 1000],
})
} else {
serde_json::json!({
"colour": format!("{}", facet_values[i % 1000] + 10000),
})
};
let document = document.as_object().unwrap().clone();
documents.push(document);
}
let documents = documents_batch_reader_from_objects(documents);
index.add_documents(documents).unwrap();
let txn = index.read_txn().unwrap();
let map = FacetDistribution::new(&txn, &index)
.facets(std::iter::once("colour"))
.compute_stats()
.unwrap();
milli_snap!(format!("{map:?}"), "no_candidates", @"{}");
let map = FacetDistribution::new(&txn, &index)
.facets(std::iter::once("colour"))
.candidates((0..1000).into_iter().collect())
.compute_stats()
.unwrap();
milli_snap!(format!("{map:?}"), "candidates_0_1000", @r###"{"colour": (0.0, 1998.0)}"###);
let map = FacetDistribution::new(&txn, &index)
.facets(std::iter::once("colour"))
.candidates((217..777).into_iter().collect())
.compute_stats()
.unwrap();
milli_snap!(format!("{map:?}"), "candidates_217_777", @r###"{"colour": (218.0, 1776.0)}"###);
}
}

View File

@ -34,15 +34,20 @@ pub fn ascending_facet_sort<'t>(
db: heed::Database<FacetGroupKeyCodec<ByteSliceRefCodec>, FacetGroupValueCodec>,
field_id: u16,
candidates: RoaringBitmap,
) -> Result<Box<dyn Iterator<Item = Result<RoaringBitmap>> + 't>> {
) -> Result<impl Iterator<Item = Result<(RoaringBitmap, &'t [u8])>> + 't> {
let highest_level = get_highest_level(rtxn, db, field_id)?;
if let Some(first_bound) = get_first_facet_value::<ByteSliceRefCodec>(rtxn, db, field_id)? {
let first_key = FacetGroupKey { field_id, level: highest_level, left_bound: first_bound };
let iter = db.range(rtxn, &(first_key..)).unwrap().take(usize::MAX);
Ok(Box::new(AscendingFacetSort { rtxn, db, field_id, stack: vec![(candidates, iter)] }))
Ok(itertools::Either::Left(AscendingFacetSort {
rtxn,
db,
field_id,
stack: vec![(candidates, iter)],
}))
} else {
Ok(Box::new(std::iter::empty()))
Ok(itertools::Either::Right(std::iter::empty()))
}
}
@ -60,7 +65,7 @@ struct AscendingFacetSort<'t, 'e> {
}
impl<'t, 'e> Iterator for AscendingFacetSort<'t, 'e> {
type Item = Result<RoaringBitmap>;
type Item = Result<(RoaringBitmap, &'t [u8])>;
fn next(&mut self) -> Option<Self::Item> {
'outer: loop {
@ -90,7 +95,8 @@ impl<'t, 'e> Iterator for AscendingFacetSort<'t, 'e> {
*documents_ids -= &bitmap;
if level == 0 {
return Some(Ok(bitmap));
// Since the level is 0, the left_bound is the exact value.
return Some(Ok((bitmap, left_bound)));
}
let starting_key_below =
FacetGroupKey { field_id: self.field_id, level: level - 1, left_bound };
@ -130,7 +136,7 @@ mod tests {
let mut results = String::new();
let iter = ascending_facet_sort(&txn, index.content, 0, candidates).unwrap();
for el in iter {
let docids = el.unwrap();
let (docids, _) = el.unwrap();
results.push_str(&display_bitmap(&docids));
results.push('\n');
}
@ -152,7 +158,7 @@ mod tests {
let mut results = String::new();
let iter = ascending_facet_sort(&txn, index.content, 0, candidates.clone()).unwrap();
for el in iter {
let docids = el.unwrap();
let (docids, _) = el.unwrap();
results.push_str(&display_bitmap(&docids));
results.push('\n');
}
@ -161,7 +167,7 @@ mod tests {
let mut results = String::new();
let iter = ascending_facet_sort(&txn, index.content, 1, candidates).unwrap();
for el in iter {
let docids = el.unwrap();
let (docids, _) = el.unwrap();
results.push_str(&display_bitmap(&docids));
results.push('\n');
}
@ -183,7 +189,7 @@ mod tests {
let mut results = String::new();
let iter = ascending_facet_sort(&txn, index.content, 0, candidates.clone()).unwrap();
for el in iter {
let docids = el.unwrap();
let (docids, _) = el.unwrap();
results.push_str(&display_bitmap(&docids));
results.push('\n');
}
@ -192,7 +198,7 @@ mod tests {
let mut results = String::new();
let iter = ascending_facet_sort(&txn, index.content, 1, candidates).unwrap();
for el in iter {
let docids = el.unwrap();
let (docids, _) = el.unwrap();
results.push_str(&display_bitmap(&docids));
results.push('\n');
}
@ -214,7 +220,7 @@ mod tests {
let mut results = String::new();
let iter = ascending_facet_sort(&txn, index.content, 3, candidates.clone()).unwrap();
for el in iter {
let docids = el.unwrap();
let (docids, _) = el.unwrap();
results.push_str(&display_bitmap(&docids));
results.push('\n');
}

View File

@ -17,21 +17,21 @@ pub fn descending_facet_sort<'t>(
db: heed::Database<FacetGroupKeyCodec<ByteSliceRefCodec>, FacetGroupValueCodec>,
field_id: u16,
candidates: RoaringBitmap,
) -> Result<Box<dyn Iterator<Item = Result<RoaringBitmap>> + 't>> {
) -> Result<impl Iterator<Item = Result<(RoaringBitmap, &'t [u8])>> + 't> {
let highest_level = get_highest_level(rtxn, db, field_id)?;
if let Some(first_bound) = get_first_facet_value::<ByteSliceRefCodec>(rtxn, db, field_id)? {
let first_key = FacetGroupKey { field_id, level: highest_level, left_bound: first_bound };
let last_bound = get_last_facet_value::<ByteSliceRefCodec>(rtxn, db, field_id)?.unwrap();
let last_key = FacetGroupKey { field_id, level: highest_level, left_bound: last_bound };
let iter = db.rev_range(rtxn, &(first_key..=last_key))?.take(usize::MAX);
Ok(Box::new(DescendingFacetSort {
Ok(itertools::Either::Left(DescendingFacetSort {
rtxn,
db,
field_id,
stack: vec![(candidates, iter, Bound::Included(last_bound))],
}))
} else {
Ok(Box::new(std::iter::empty()))
Ok(itertools::Either::Right(std::iter::empty()))
}
}
@ -50,7 +50,7 @@ struct DescendingFacetSort<'t> {
}
impl<'t> Iterator for DescendingFacetSort<'t> {
type Item = Result<RoaringBitmap>;
type Item = Result<(RoaringBitmap, &'t [u8])>;
fn next(&mut self) -> Option<Self::Item> {
'outer: loop {
@ -77,7 +77,8 @@ impl<'t> Iterator for DescendingFacetSort<'t> {
*documents_ids -= &bitmap;
if level == 0 {
return Some(Ok(bitmap));
// Since we're at level 0, the left_bound is the exact value.
return Some(Ok((bitmap, left_bound)));
}
let starting_key_below =
FacetGroupKey { field_id, level: level - 1, left_bound };
@ -146,7 +147,7 @@ mod tests {
let db = index.content.remap_key_type::<FacetGroupKeyCodec<ByteSliceRefCodec>>();
let iter = descending_facet_sort(&txn, db, 0, candidates).unwrap();
for el in iter {
let docids = el.unwrap();
let (docids, _) = el.unwrap();
results.push_str(&display_bitmap(&docids));
results.push('\n');
}
@ -169,7 +170,7 @@ mod tests {
let db = index.content.remap_key_type::<FacetGroupKeyCodec<ByteSliceRefCodec>>();
let iter = descending_facet_sort(&txn, db, 0, candidates.clone()).unwrap();
for el in iter {
let docids = el.unwrap();
let (docids, _) = el.unwrap();
results.push_str(&display_bitmap(&docids));
results.push('\n');
}
@ -179,7 +180,7 @@ mod tests {
let iter = descending_facet_sort(&txn, db, 1, candidates).unwrap();
for el in iter {
let docids = el.unwrap();
let (docids, _) = el.unwrap();
results.push_str(&display_bitmap(&docids));
results.push('\n');
}
@ -200,7 +201,7 @@ mod tests {
let mut results = String::new();
let iter = descending_facet_sort(&txn, index.content, 0, candidates.clone()).unwrap();
for el in iter {
let docids = el.unwrap();
let (docids, _) = el.unwrap();
results.push_str(&display_bitmap(&docids));
results.push('\n');
}
@ -209,7 +210,7 @@ mod tests {
let mut results = String::new();
let iter = descending_facet_sort(&txn, index.content, 1, candidates).unwrap();
for el in iter {
let docids = el.unwrap();
let (docids, _) = el.unwrap();
results.push_str(&display_bitmap(&docids));
results.push('\n');
}
@ -231,7 +232,7 @@ mod tests {
let mut results = String::new();
let iter = descending_facet_sort(&txn, index.content, 3, candidates.clone()).unwrap();
for el in iter {
let docids = el.unwrap();
let (docids, _) = el.unwrap();
results.push_str(&display_bitmap(&docids));
results.push('\n');
}

View File

@ -7,6 +7,7 @@ use std::rc::Rc;
use charabia::Token;
use levenshtein_automata::{Distance, DFA};
use crate::error::InternalError;
use crate::search::build_dfa;
use crate::MAX_WORD_LENGTH;
@ -31,12 +32,19 @@ impl fmt::Debug for MatchingWords {
}
impl MatchingWords {
pub fn new(mut matching_words: Vec<(Vec<Rc<MatchingWord>>, Vec<PrimitiveWordId>)>) -> Self {
pub fn new(
mut matching_words: Vec<(Vec<Rc<MatchingWord>>, Vec<PrimitiveWordId>)>,
) -> crate::Result<Self> {
// if one of the matching_words vecs doesn't contain any word, the whole list is invalid.
if matching_words.iter().any(|(mw, _)| mw.is_empty()) {
return Err(InternalError::InvalidMatchingWords.into());
}
// Sort words by length in descending order, prioritizing the longest matches,
// in order to highlight the longest part of the matched word.
matching_words.sort_unstable_by_key(|(mw, _)| Reverse((mw.len(), mw[0].word.len())));
Self { inner: matching_words }
Ok(Self { inner: matching_words })
}
/// Returns an iterator over terms that match or partially match the given token.
@ -360,7 +368,7 @@ mod tests {
(vec![all[2].clone()], vec![2]),
];
let matching_words = MatchingWords::new(matching_words);
let matching_words = MatchingWords::new(matching_words).unwrap();
assert_eq!(
matching_words

View File

@ -513,7 +513,7 @@ mod tests {
(vec![all[2].clone()], vec![2]),
];
MatchingWords::new(matching_words)
MatchingWords::new(matching_words).unwrap()
}
impl MatcherBuilder<'_, Vec<u8>> {
@ -600,7 +600,7 @@ mod tests {
];
let matching_words = vec![(vec![all[0].clone()], vec![0]), (vec![all[1].clone()], vec![1])];
let matching_words = MatchingWords::new(matching_words);
let matching_words = MatchingWords::new(matching_words).unwrap();
let builder = MatcherBuilder::from_matching_words(matching_words);
@ -847,7 +847,7 @@ mod tests {
(vec![all[4].clone()], vec![2]),
];
let matching_words = MatchingWords::new(matching_words);
let matching_words = MatchingWords::new(matching_words).unwrap();
let mut builder = MatcherBuilder::from_matching_words(matching_words);
builder.highlight_prefix("_".to_string());

View File

@ -152,6 +152,11 @@ impl<'a> Search<'a> {
tokbuilder.stop_words(stop_words);
}
let script_lang_map = self.index.script_language(self.rtxn)?;
if !script_lang_map.is_empty() {
tokbuilder.allow_list(&script_lang_map);
}
let tokenizer = tokbuilder.build();
let tokens = tokenizer.tokenize(query);
builder
@ -446,6 +451,28 @@ mod test {
use super::*;
use crate::index::tests::TempIndex;
#[cfg(feature = "default")]
#[test]
fn test_kanji_language_detection() {
let index = TempIndex::new();
index
.add_documents(documents!([
{ "id": 0, "title": "The quick (\"brown\") fox can't jump 32.3 feet, right? Brr, it's 29.3°F!" },
{ "id": 1, "title": "東京のお寿司。" },
{ "id": 2, "title": "הַשּׁוּעָל הַמָּהִיר (״הַחוּם״) לֹא יָכוֹל לִקְפֹּץ 9.94 מֶטְרִים, נָכוֹן? ברר, 1.5°C- בַּחוּץ!" }
]))
.unwrap();
let txn = index.write_txn().unwrap();
let mut search = Search::new(&txn, &index);
search.query("東京");
let SearchResult { documents_ids, .. } = search.execute().unwrap();
assert_eq!(documents_ids, vec![1]);
}
#[test]
fn test_is_authorized_typos() {
let index = TempIndex::new();

View File

@ -747,7 +747,7 @@ fn create_matching_words(
let mut matching_word_cache = MatchingWordCache::default();
let mut matching_words = Vec::new();
ngrams(ctx, authorize_typos, query, &mut matching_words, &mut matching_word_cache, 0)?;
Ok(MatchingWords::new(matching_words))
MatchingWords::new(matching_words)
}
pub type PrimitiveQuery = Vec<PrimitiveQueryPart>;
@ -825,9 +825,13 @@ where
quoted = !quoted;
}
// if there is a quote or a hard separator, we close the phrase.
if !phrase.is_empty() && (quote_count > 0 || separator_kind == SeparatorKind::Hard)
{
primitive_query.push(PrimitiveQueryPart::Phrase(mem::take(&mut phrase)));
if quote_count > 0 || separator_kind == SeparatorKind::Hard {
let phrase = mem::take(&mut phrase);
// if the phrase only contains stop words, we don't keep it in the query.
if phrase.iter().any(|w| w.is_some()) {
primitive_query.push(PrimitiveQueryPart::Phrase(phrase));
}
}
}
_ => (),
@ -835,7 +839,7 @@ where
}
// If a quote is never closed, we consider the rest of the query as a phrase.
if !phrase.is_empty() {
if phrase.iter().any(|w| w.is_some()) {
primitive_query.push(PrimitiveQueryPart::Phrase(mem::take(&mut phrase)));
}
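
A hypothetical mini-check of the new rule above, with plain Option<String>s standing in for the phrase words (stop words are stored as None in the phrase buffer): a phrase made only of stop words no longer produces a Phrase query part.

fn keeps_phrase(phrase: &[Option<String>]) -> bool {
    // Same predicate as above: keep the phrase only if at least one word is not a stop word.
    phrase.iter().any(|w| w.is_some())
}

// e.g. keeps_phrase(&[None, None]) == false, keeps_phrase(&[Some("quick".into()), None]) == true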

View File

@ -30,6 +30,7 @@ impl<'t, 'u, 'i> ClearDocuments<'t, 'u, 'i> {
word_position_docids,
field_id_word_count_docids,
word_prefix_position_docids,
script_language_docids,
facet_id_f64_docids,
facet_id_string_docids,
facet_id_exists_docids,
@ -82,6 +83,7 @@ impl<'t, 'u, 'i> ClearDocuments<'t, 'u, 'i> {
word_position_docids.clear(self.wtxn)?;
field_id_word_count_docids.clear(self.wtxn)?;
word_prefix_position_docids.clear(self.wtxn)?;
script_language_docids.clear(self.wtxn)?;
facet_id_f64_docids.clear(self.wtxn)?;
facet_id_exists_docids.clear(self.wtxn)?;
facet_id_string_docids.clear(self.wtxn)?;

View File

@ -243,6 +243,7 @@ impl<'t, 'u, 'i> DeleteDocuments<'t, 'u, 'i> {
facet_id_string_docids: _,
field_id_docid_facet_f64s: _,
field_id_docid_facet_strings: _,
script_language_docids,
facet_id_exists_docids,
documents,
} = self.index;
@ -499,6 +500,22 @@ impl<'t, 'u, 'i> DeleteDocuments<'t, 'u, 'i> {
.execute(self.wtxn)?;
}
// Remove the documents ids from the script language database.
let mut iter = script_language_docids.iter_mut(self.wtxn)?;
while let Some((key, mut docids)) = iter.next().transpose()? {
let previous_len = docids.len();
docids -= &self.to_delete_docids;
if docids.is_empty() {
// safety: we don't keep references from inside the LMDB database.
unsafe { iter.del_current()? };
} else if docids.len() != previous_len {
let key = key.to_owned();
// safety: we don't keep references from inside the LMDB database.
unsafe { iter.put_current(&key, &docids)? };
}
}
drop(iter);
// We delete the documents ids that are under the facet field id values.
remove_docids_from_facet_id_exists_docids(
self.wtxn,
@ -1166,4 +1183,52 @@ mod tests {
stats_should_not_return_deleted_documents_(DeletionStrategy::AlwaysHard);
stats_should_not_return_deleted_documents_(DeletionStrategy::AlwaysSoft);
}
fn stored_detected_script_and_language_should_not_return_deleted_documents_(
deletion_strategy: DeletionStrategy,
) {
use charabia::{Language, Script};
let index = TempIndex::new();
let mut wtxn = index.write_txn().unwrap();
index
.add_documents_using_wtxn(
&mut wtxn,
documents!([
{ "id": "0", "title": "The quick (\"brown\") fox can't jump 32.3 feet, right? Brr, it's 29.3°F!" },
{ "id": "1", "title": "人人生而自由﹐在尊嚴和權利上一律平等。他們賦有理性和良心﹐並應以兄弟關係的精神互相對待。" },
{ "id": "2", "title": "הַשּׁוּעָל הַמָּהִיר (״הַחוּם״) לֹא יָכוֹל לִקְפֹּץ 9.94 מֶטְרִים, נָכוֹן? ברר, 1.5°C- בַּחוּץ!" },
{ "id": "3", "title": "関西国際空港限定トートバッグ すもももももももものうち" },
{ "id": "4", "title": "ภาษาไทยง่ายนิดเดียว" },
{ "id": "5", "title": "The quick 在尊嚴和權利上一律平等。" },
]))
.unwrap();
let key_cmn = (Script::Cj, Language::Cmn);
let cj_cmn_docs =
index.script_language_documents_ids(&wtxn, &key_cmn).unwrap().unwrap_or_default();
let mut expected_cj_cmn_docids = RoaringBitmap::new();
expected_cj_cmn_docids.push(1);
expected_cj_cmn_docids.push(5);
assert_eq!(cj_cmn_docs, expected_cj_cmn_docids);
delete_documents(&mut wtxn, &index, &["1"], deletion_strategy);
wtxn.commit().unwrap();
let rtxn = index.read_txn().unwrap();
let cj_cmn_docs =
index.script_language_documents_ids(&rtxn, &key_cmn).unwrap().unwrap_or_default();
let mut expected_cj_cmn_docids = RoaringBitmap::new();
expected_cj_cmn_docids.push(5);
assert_eq!(cj_cmn_docs, expected_cj_cmn_docids);
}
#[test]
fn stored_detected_script_and_language_should_not_return_deleted_documents() {
stored_detected_script_and_language_should_not_return_deleted_documents_(
DeletionStrategy::AlwaysHard,
);
stored_detected_script_and_language_should_not_return_deleted_documents_(
DeletionStrategy::AlwaysSoft,
);
}
}

View File

@ -1,9 +1,9 @@
use std::collections::HashSet;
use std::collections::{HashMap, HashSet};
use std::convert::TryInto;
use std::fs::File;
use std::{io, mem, str};
use charabia::{SeparatorKind, Token, TokenKind, TokenizerBuilder};
use charabia::{Language, Script, SeparatorKind, Token, TokenKind, TokenizerBuilder};
use roaring::RoaringBitmap;
use serde_json::Value;
@ -13,6 +13,8 @@ use crate::{
absolute_from_relative_position, FieldId, Result, MAX_POSITION_PER_ATTRIBUTE, MAX_WORD_LENGTH,
};
pub type ScriptLanguageDocidsMap = HashMap<(Script, Language), RoaringBitmap>;
/// Extracts the words and the positions where they appear and
/// prefixes them with the document id.
///
@ -25,12 +27,13 @@ pub fn extract_docid_word_positions<R: io::Read + io::Seek>(
searchable_fields: &Option<HashSet<FieldId>>,
stop_words: Option<&fst::Set<&[u8]>>,
max_positions_per_attributes: Option<u32>,
) -> Result<(RoaringBitmap, grenad::Reader<File>)> {
) -> Result<(RoaringBitmap, grenad::Reader<File>, ScriptLanguageDocidsMap)> {
let max_positions_per_attributes = max_positions_per_attributes
.map_or(MAX_POSITION_PER_ATTRIBUTE, |max| max.min(MAX_POSITION_PER_ATTRIBUTE));
let max_memory = indexer.max_memory_by_thread();
let mut documents_ids = RoaringBitmap::new();
let mut script_language_pair = HashMap::new();
let mut docid_word_positions_sorter = create_sorter(
grenad::SortAlgorithm::Stable,
concat_u32s_array,
@ -70,6 +73,13 @@ pub fn extract_docid_word_positions<R: io::Read + io::Seek>(
.take_while(|(p, _)| (*p as u32) < max_positions_per_attributes);
for (index, token) in tokens {
if let Some(language) = token.language {
let script = token.script;
let entry = script_language_pair
.entry((script, language))
.or_insert_with(RoaringBitmap::new);
entry.push(document_id);
}
let token = token.lemma().trim();
if !token.is_empty() && token.len() <= MAX_WORD_LENGTH {
key_buffer.truncate(mem::size_of::<u32>());
@ -88,7 +98,8 @@ pub fn extract_docid_word_positions<R: io::Read + io::Seek>(
}
}
sorter_into_reader(docid_word_positions_sorter, indexer).map(|reader| (documents_ids, reader))
sorter_into_reader(docid_word_positions_sorter, indexer)
.map(|reader| (documents_ids, reader, script_language_pair))
}
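
A stripped-down, hypothetical illustration of the accumulation above: one RoaringBitmap of document ids per detected (Script, Language) pair (the detection tuples are invented for the example).

use std::collections::HashMap;

use charabia::{Language, Script};
use roaring::RoaringBitmap;

fn accumulate(detections: Vec<(u32, Script, Language)>) -> HashMap<(Script, Language), RoaringBitmap> {
    let mut map = HashMap::new();
    for (document_id, script, language) in detections {
        // Same entry pattern as above: create the bitmap lazily, then add the docid.
        map.entry((script, language)).or_insert_with(RoaringBitmap::new).insert(document_id);
    }
    map
}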
/// Transform a JSON value into a string that can be indexed.

View File

@ -257,13 +257,14 @@ fn send_and_extract_flattened_documents_data(
let (docid_word_positions_chunk, docid_fid_facet_values_chunks): (Result<_>, Result<_>) =
rayon::join(
|| {
let (documents_ids, docid_word_positions_chunk) = extract_docid_word_positions(
flattened_documents_chunk.clone(),
indexer,
searchable_fields,
stop_words.as_ref(),
max_positions_per_attributes,
)?;
let (documents_ids, docid_word_positions_chunk, script_language_pair) =
extract_docid_word_positions(
flattened_documents_chunk.clone(),
indexer,
searchable_fields,
stop_words.as_ref(),
max_positions_per_attributes,
)?;
// send documents_ids to DB writer
let _ = lmdb_writer_sx.send(Ok(TypedChunk::NewDocumentsIds(documents_ids)));
@ -274,6 +275,9 @@ fn send_and_extract_flattened_documents_data(
let _ = lmdb_writer_sx
.send(Ok(TypedChunk::DocidWordPositions(docid_word_positions_chunk.clone())));
let _ =
lmdb_writer_sx.send(Ok(TypedChunk::ScriptLanguageDocids(script_language_pair)));
Ok(docid_word_positions_chunk)
},
|| {

View File

@ -1906,6 +1906,33 @@ mod tests {
index.add_documents(doc1).unwrap();
}
#[cfg(feature = "default")]
#[test]
fn store_detected_script_and_language_per_document_during_indexing() {
use charabia::{Language, Script};
let index = TempIndex::new();
index
.add_documents(documents!([
{ "id": 1, "title": "The quick (\"brown\") fox can't jump 32.3 feet, right? Brr, it's 29.3°F!" },
{ "id": 2, "title": "人人生而自由﹐在尊嚴和權利上一律平等。他們賦有理性和良心﹐並應以兄弟關係的精神互相對待。" },
{ "id": 3, "title": "הַשּׁוּעָל הַמָּהִיר (״הַחוּם״) לֹא יָכוֹל לִקְפֹּץ 9.94 מֶטְרִים, נָכוֹן? ברר, 1.5°C- בַּחוּץ!" },
{ "id": 4, "title": "関西国際空港限定トートバッグ すもももももももものうち" },
{ "id": 5, "title": "ภาษาไทยง่ายนิดเดียว" },
{ "id": 6, "title": "The quick 在尊嚴和權利上一律平等。" },
]))
.unwrap();
let rtxn = index.read_txn().unwrap();
let key_jpn = (Script::Cj, Language::Jpn);
let key_cmn = (Script::Cj, Language::Cmn);
let cj_jpn_docs = index.script_language_documents_ids(&rtxn, &key_jpn).unwrap().unwrap();
let cj_cmn_docs = index.script_language_documents_ids(&rtxn, &key_cmn).unwrap().unwrap();
let expected_cj_jpn_docids = [3].iter().collect();
assert_eq!(cj_jpn_docs, expected_cj_jpn_docids);
let expected_cj_cmn_docids = [1, 5].iter().collect();
assert_eq!(cj_cmn_docs, expected_cj_cmn_docids);
}
#[test]
fn add_and_delete_documents_in_single_transform() {
let mut index = TempIndex::new();

View File

@ -1,8 +1,10 @@
use std::borrow::Cow;
use std::collections::HashMap;
use std::convert::TryInto;
use std::fs::File;
use std::io;
use charabia::{Language, Script};
use grenad::MergerBuilder;
use heed::types::ByteSlice;
use heed::{BytesDecode, RwTxn};
@ -38,6 +40,7 @@ pub(crate) enum TypedChunk {
FieldIdFacetNumberDocids(grenad::Reader<File>),
FieldIdFacetExistsDocids(grenad::Reader<File>),
GeoPoints(grenad::Reader<File>),
ScriptLanguageDocids(HashMap<(Script, Language), RoaringBitmap>),
}
/// Write typed chunk in the corresponding LMDB database of the provided index.
@ -210,6 +213,24 @@ pub(crate) fn write_typed_chunk_into_index(
index.put_geo_rtree(wtxn, &rtree)?;
index.put_geo_faceted_documents_ids(wtxn, &geo_faceted_docids)?;
}
TypedChunk::ScriptLanguageDocids(hash_pair) => {
let mut buffer = Vec::new();
for (key, value) in hash_pair {
buffer.clear();
let final_value = match index.script_language_docids.get(wtxn, &key)? {
Some(db_values) => {
let mut db_value_buffer = Vec::new();
serialize_roaring_bitmap(&db_values, &mut db_value_buffer)?;
let mut new_value_buffer = Vec::new();
serialize_roaring_bitmap(&value, &mut new_value_buffer)?;
merge_roaring_bitmaps(&new_value_buffer, &db_value_buffer, &mut buffer)?;
RoaringBitmap::deserialize_from(&buffer[..])?
}
None => value,
};
index.script_language_docids.put(wtxn, &key, &final_value)?;
}
}
}
Ok((RoaringBitmap::new(), is_merged_database))
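
A hedged side note on the ScriptLanguageDocids branch above: merging the stored bitmap with the new one is semantically a plain union; the serialize/merge round-trip appears to reuse an existing bitmap-merge helper. A minimal equivalent of the union itself, with illustrative names:

use roaring::RoaringBitmap;

fn merged_docids(db_values: Option<RoaringBitmap>, new_values: RoaringBitmap) -> RoaringBitmap {
    // Union the already-stored docids (if any) with the freshly extracted ones.
    match db_values {
        Some(db_values) => db_values | new_values,
        None => new_values,
    }
}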

View File

@ -11,6 +11,7 @@ pub struct IndexerConfig {
pub chunk_compression_level: Option<u32>,
pub thread_pool: Option<ThreadPool>,
pub max_positions_per_attributes: Option<u32>,
pub skip_index_budget: bool,
}
impl Default for IndexerConfig {
@ -24,6 +25,7 @@ impl Default for IndexerConfig {
chunk_compression_level: None,
thread_pool: None,
max_positions_per_attributes: None,
skip_index_budget: false,
}
}
}

View File

@ -32,15 +32,6 @@ fn test_phrase_search_with_stop_words_given_criteria(criteria: &[Criterion]) {
let result = search.execute().unwrap();
// 1 document should match
assert_eq!(result.documents_ids.len(), 1);
// test for a single stop word only, no other search terms
let mut search = Search::new(&txn, &index);
search.query("\"the\"");
search.limit(10);
search.authorize_typos(false);
search.terms_matching_strategy(TermsMatchingStrategy::All);
let result = search.execute().unwrap();
assert_eq!(result.documents_ids.len(), 0);
}
#[test]