Compare commits

...

59 Commits

Author SHA1 Message Date
a1a79389fc remove unused import 2023-06-12 16:30:42 +02:00
3115af9baf score_details: short circuit when the order has been established 2023-06-09 08:33:13 +02:00
da138deaf7 WIP merge by score details strategy 2023-06-09 08:32:47 +02:00
fafe432eb1 WIP comparable details 2023-06-08 22:22:44 +02:00
5e99f16859 WIP normalized score merge strategy 2023-06-08 22:22:21 +02:00
fc0eb3901d Gate _rankingScore behind showRankingScore query parameter 2023-06-08 17:03:04 +02:00
4e740f4c5f Introduce linear scale instead of a unit one for scores 2023-06-08 17:03:04 +02:00
efc3371b6f use linear scale in search 2023-06-08 17:03:03 +02:00
73085d6b03 Fix tests 2023-06-08 17:03:03 +02:00
0ee35ede86 Expose the scores and detailed scores in the API 2023-06-08 17:03:03 +02:00
16898c661e Store the scores for each bucket 2023-06-08 17:03:03 +02:00
4a2a6dc529 Compute score for the ranking rules 2023-06-08 17:03:03 +02:00
63ddea8ae4 Add score_details 2023-06-08 17:03:03 +02:00
df749d424c add virtual conditions to fid and position to always have the max cost 2023-06-08 17:03:03 +02:00
0cfecf4e9a Remove optimization where ranking rules are not executed on buckets of a single document 2023-06-08 17:03:03 +02:00
b8f4e2b3e4 Change how the cost of removing words is computed 2023-06-08 17:03:03 +02:00
daafbc88d6 Fix sort id 2023-06-08 17:03:03 +02:00
047d22fcb1 Merge #3824
3824: Changes the way words are counted in the word count DB r=ManyTheFish a=dureuill

# Pull Request

## Related issue

Fixes https://github.com/meilisearch/meilisearch/issues/3823

## What does this PR do?

- Apply an offset when parsing the query that is consistent with the one applied at indexing time

### DB breaking changes

- Count the number of words in `field_id_word_count_docids`
- Raise the word count limit for storing an entry in the DB from 10 to 30
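
As a rough illustration of the change (this is not the actual milli code; the constant and helper below are made up), an entry is only written to `field_id_word_count_docids` while the field's word count stays within the raised limit:

```rust
// Hypothetical sketch: names are illustrative; only the 10 -> 30 limit change
// and the "skip the entry above the limit" behaviour come from this PR.
const MAX_COUNTED_WORDS: usize = 30; // raised from 10

fn word_count_entry(field_id: u16, words_in_field: usize) -> Option<(u16, u8)> {
    // Store the (field_id, word_count) entry only when the field holds at most
    // MAX_COUNTED_WORDS words; larger fields are simply not recorded.
    (words_in_field <= MAX_COUNTED_WORDS).then(|| (field_id, words_in_field as u8))
}

fn main() {
    assert_eq!(word_count_entry(0, 12), Some((0, 12))); // now stored (skipped with the old limit of 10)
    assert_eq!(word_count_entry(0, 42), None);          // still skipped
}
```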

Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2023-06-08 13:26:05 +00:00
a2a3b8c973 Fix offset difference between query and indexing for hard separators 2023-06-08 12:07:12 +02:00
9f37b61666 DB BREAKING: raise limit of word count from 10 to 30. 2023-06-08 12:07:12 +02:00
c15c076da9 DB BREAKING: Count the number of words in field_id_word_count_docids 2023-06-08 12:07:11 +02:00
9dcf1da59d Merge #3819
3819: Remove the `docid_word_positions` database r=Kerollmops a=loiclec

Remove the `docid_word_positions` database, which was only used during deletion operations. In the process, also fixes https://github.com/meilisearch/meilisearch/issues/3816




Co-authored-by: Loïc Lecrenier <loic.lecrenier@icloud.com>
2023-06-07 09:53:25 +00:00
8628a0c856 Remove docid_word_positions_db + fix deletion bug
That would happen when a word was deleted from all exact attributes
but not all regular attributes.
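
A minimal sketch of the repaired invariant (hypothetical names, not milli's actual API): entries derived from a word may only be dropped once the word is gone from both the exact-attribute and the regular-attribute word databases.

```rust
use std::collections::HashSet;

// Illustrative only: the docids still containing the word in exact attributes
// and in regular attributes, modelled as plain sets.
fn can_drop_word_entries(exact_word_docids: &HashSet<u32>, word_docids: &HashSet<u32>) -> bool {
    // The bug fixed here: entries were dropped when the word was gone from all
    // exact attributes, even though it was still present in a regular one.
    exact_word_docids.is_empty() && word_docids.is_empty()
}

fn main() {
    let exact: HashSet<u32> = HashSet::new();       // deleted from all exact attributes
    let regular: HashSet<u32> = HashSet::from([7]); // still present in a regular attribute
    assert!(!can_drop_word_entries(&exact, &regular)); // entries must be kept
}
```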
2023-06-07 10:52:50 +02:00
c1e3cc04b0 Merge #3811
3811: Bring back changes from `release-v1.2.0` to `main` r=Kerollmops a=curquiza



Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>
Co-authored-by: meili-bors[bot] <89034592+meili-bors[bot]@users.noreply.github.com>
Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: Filip Bachul <filipbachul@gmail.com>
Co-authored-by: Kerollmops <clement@meilisearch.com>
Co-authored-by: ManyTheFish <many@meilisearch.com>
Co-authored-by: Clément Renault <clement@meilisearch.com>
2023-06-06 13:10:24 +00:00
d96d8bb0dd Merge #3789
3789: Improve the metrics r=dureuill a=irevoire

# Pull Request

## Related issue
Implements https://github.com/meilisearch/meilisearch/issues/3790
Associated specification: https://github.com/meilisearch/specifications/pull/242

## Be cautious; it's DB-breaking 😱 

While reviewing and after merging this PR, be cautious: if you already have a `data.ms` and run Meilisearch with this code on it, it won't work, because we need to cache new information in the index stats (which are persisted on disk). You'll get internal errors.

### About the breaking-change label

We only break the API of the metrics route, which does not pose any problem since it's experimental.

## What does this PR do?
- Create a method to get the « facet distribution » of the task queue.
- Prefix all the metrics by `meilisearch_`
- Add the real database size used by meilisearch
- Add metrics on the task queue
- Update the grafana dashboard to these new changes
- Move the dashboard to the `assets` directory
- Provide a new prometheus file to scrape meilisearch easily
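
For illustration, after this PR a scrape of the `GET /metrics` route exposes entries shaped like the following (metric and label names are the ones introduced here; the numbers and the `movies` index name are made up):

```
meilisearch_db_size_bytes 57344
meilisearch_used_db_size_bytes 40960
meilisearch_index_count 2
meilisearch_nb_tasks{kind="statuses",value="succeeded"} 12
meilisearch_nb_tasks{kind="types",value="settingsUpdate"} 3
meilisearch_nb_tasks{kind="indexes",value="movies"} 9
```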

Co-authored-by: Tamo <tamo@meilisearch.com>
2023-06-06 11:44:54 +00:00
4a3405afec comment the stats method 2023-06-06 12:59:58 +02:00
3cfd653db1 Apply suggestions from code review
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2023-06-06 11:38:41 +02:00
f3e2f79290 Merge branch 'main' into tmp-release-v1.2.0 2023-06-05 18:36:28 +02:00
f517274d1f Merge #3788
3788: Use `RoaringBitmap::deserialize_unchecked_from` to reduce the deserialization time r=irevoire a=Kerollmops

This pull request replaces the `RoaringBitmap::deserialize_from` calls with `deserialize_unchecked_from` to avoid doing too many checks. We know the written bitmaps are valid, as we do not disable the checks during the indexing phase.

I did a small test with #3780 and discovered that the deserialization time dropped from 32% to 9.46% when using these changes. It seems it was low-hanging fruit hidden behind a leaf.
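
A minimal sketch of the swap (assuming the `roaring` crate; error handling and the real milli call sites are elided):

```rust
use roaring::RoaringBitmap;

fn decode_bitmap(bytes: &[u8]) -> std::io::Result<RoaringBitmap> {
    // Before: RoaringBitmap::deserialize_from(bytes), which validates the buffer on every read.
    // After: skip the validation, since the bitmaps were already checked when they were written.
    RoaringBitmap::deserialize_unchecked_from(bytes)
}

fn main() -> std::io::Result<()> {
    let bitmap: RoaringBitmap = (0..1_000).collect();
    let mut buffer = Vec::new();
    bitmap.serialize_into(&mut buffer)?;
    assert_eq!(decode_bitmap(&buffer)?, bitmap);
    Ok(())
}
```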

Co-authored-by: Kerollmops <clement@meilisearch.com>
2023-06-05 09:20:30 +00:00
3f41bc642a Merge #3804 #3805
3804: Bump svenstaro/upload-release-action from 2.5.0 to 2.6.1 r=curquiza a=dependabot[bot]

Bumps [svenstaro/upload-release-action](https://github.com/svenstaro/upload-release-action) from 2.5.0 to 2.6.1.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a href="https://github.com/svenstaro/upload-release-action/releases">svenstaro/upload-release-action's releases</a>.</em></p>
<blockquote>
<h2>2.6.1</h2>
<ul>
<li>Do not overwrite body or name if empty <a href="https://redirect.github.com/svenstaro/upload-release-action/pull/108">#108</a> (thanks <a href="https://github.com/regevbr"><code>@regevbr</code></a>)</li>
</ul>
<h2>2.6.0</h2>
<ul>
<li>Add <code>make_latest</code> input parameter. Can be set to <code>false</code> to prevent the created release from being marked as the latest release for the repository <a href="https://redirect.github.com/svenstaro/upload-release-action/pull/100">#100</a> (thanks <a href="https://github.com/brandonkelly"><code>@brandonkelly</code></a>)</li>
<li>Don't try to upload empty files <a href="https://redirect.github.com/svenstaro/upload-release-action/pull/102">#102</a> (thanks <a href="https://github.com/Loyalsoldier"><code>@Loyalsoldier</code></a>)</li>
<li>Bump all deps <a href="https://redirect.github.com/svenstaro/upload-release-action/pull/105">#105</a></li>
<li><code>overwrite</code> option also overwrites name and body <a href="https://redirect.github.com/svenstaro/upload-release-action/pull/106">#106</a> (thanks <a href="https://github.com/regevbr"><code>@regevbr</code></a>)</li>
<li>Add <code>promote</code> option to allow prereleases to be promoted <a href="https://redirect.github.com/svenstaro/upload-release-action/pull/74">#74</a> (thanks <a href="https://github.com/regevbr"><code>@regevbr</code></a>)</li>
</ul>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a href="https://github.com/svenstaro/upload-release-action/blob/master/CHANGELOG.md">svenstaro/upload-release-action's changelog</a>.</em></p>
<blockquote>
<h2>[2.6.1] - 2023-05-31</h2>
<ul>
<li>Do not overwrite body or name if empty <a href="https://redirect.github.com/svenstaro/upload-release-action/pull/108">#108</a> (thanks <a href="https://github.com/regevbr"><code>@regevbr</code></a>)</li>
</ul>
<h2>[2.6.0] - 2023-05-23</h2>
<ul>
<li>Add <code>make_latest</code> input parameter. Can be set to <code>false</code> to prevent the created release from being marked as the latest release for the repository <a href="https://redirect.github.com/svenstaro/upload-release-action/pull/100">#100</a> (thanks <a href="https://github.com/brandonkelly"><code>@brandonkelly</code></a>)</li>
<li>Don't try to upload empty files <a href="https://redirect.github.com/svenstaro/upload-release-action/pull/102">#102</a> (thanks <a href="https://github.com/Loyalsoldier"><code>@Loyalsoldier</code></a>)</li>
<li>Bump all deps <a href="https://redirect.github.com/svenstaro/upload-release-action/pull/105">#105</a></li>
<li><code>overwrite</code> option also overwrites name and body <a href="https://redirect.github.com/svenstaro/upload-release-action/pull/106">#106</a> (thanks <a href="https://github.com/regevbr"><code>@regevbr</code></a>)</li>
<li>Add <code>promote</code> option to allow prereleases to be promoted <a href="https://redirect.github.com/svenstaro/upload-release-action/pull/74">#74</a> (thanks <a href="https://github.com/regevbr"><code>@regevbr</code></a>)</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a href="2b9d2847a9"><code>2b9d284</code></a> 2.6.1</li>
<li><a href="f9beb0ad08"><code>f9beb0a</code></a> Merge pull request <a href="https://redirect.github.com/svenstaro/upload-release-action/issues/108">#108</a> from regevbr/<a href="https://redirect.github.com/svenstaro/upload-release-action/issues/107">#107</a></li>
<li><a href="1662cfa449"><code>1662cfa</code></a> fix <a href="https://redirect.github.com/svenstaro/upload-release-action/issues/197">#197</a> - do not overwrite, if empty</li>
<li><a href="a5002416a0"><code>a500241</code></a> Document running npm update after changing version</li>
<li><a href="58d5258088"><code>58d5258</code></a> 2.6.0</li>
<li><a href="ffc1afa9c0"><code>ffc1afa</code></a> Update CHANGELOG</li>
<li><a href="24bced81d9"><code>24bced8</code></a> Merge pull request <a href="https://redirect.github.com/svenstaro/upload-release-action/issues/74">#74</a> from regevbr/body</li>
<li><a href="794b3152e1"><code>794b315</code></a> fix <a href="https://redirect.github.com/svenstaro/upload-release-action/issues/42">#42</a> - overwrite body and name as well</li>
<li><a href="b00963776a"><code>b009637</code></a> fix <a href="https://redirect.github.com/svenstaro/upload-release-action/issues/42">#42</a> - overwrite body and name as well</li>
<li><a href="210500d479"><code>210500d</code></a> fix <a href="https://redirect.github.com/svenstaro/upload-release-action/issues/42">#42</a> - overwrite body and name as well</li>
<li>Additional commits viewable in <a href="https://github.com/svenstaro/upload-release-action/compare/2.5.0...2.6.1">compare view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=svenstaro/upload-release-action&package-manager=github_actions&previous-version=2.5.0&new-version=2.6.1)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

You can trigger a rebase of this PR by commenting `@dependabot rebase`.


---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)


</details>

3805: Bump actions/setup-go from 3 to 4 r=curquiza a=dependabot[bot]

Bumps [actions/setup-go](https://github.com/actions/setup-go) from 3 to 4.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a href="https://github.com/actions/setup-go/releases">actions/setup-go's releases</a>.</em></p>
<blockquote>
<h2>v4.0.0</h2>
<p>In scope of release we enable cache by default. The action won’t throw an error if the cache can’t be restored or saved. The action will throw a warning message but it won’t stop a build process. The cache can be disabled by specifying <code>cache: false</code>.</p>
<pre lang="yaml"><code>steps:
  - uses: actions/checkout@v3
  - uses: actions/setup-go@v4
    with:
      go-version: ‘1.19’
  - run: go run hello.go
</code></pre>
<p>Besides, we introduce such changes as</p>
<ul>
<li><a href="https://redirect.github.com/actions/setup-go/pull/305">Allow to use only GOCACHE for cache</a></li>
<li><a href="https://redirect.github.com/actions/setup-go/pull/315">Bump json5 from 2.2.1 to 2.2.3</a></li>
<li><a href="https://redirect.github.com/actions/setup-go/pull/323">Use proper version for primary key in cache</a></li>
<li><a href="https://redirect.github.com/actions/setup-go/pull/351">Always add Go bin to the PATH</a></li>
<li><a href="https://redirect.github.com/actions/setup-go/pull/350">Add step warning if go-version input is empty</a></li>
</ul>
<h2>Add support for stable and oldstable aliases</h2>
<p>In scope of this release we introduce aliases for the <code>go-version</code> input. The <code>stable</code> alias installs the latest stable version of Go. The <code>oldstable</code> alias installs the previous minor release (if stable is 1.19.x, oldstable is 1.18.x).</p>
<h3>Stable</h3>
<pre lang="yaml"><code>steps:
  - uses: actions/checkout@v3
  - uses: actions/setup-go@v3
    with:
      go-version: 'stable'
  - run: go run hello.go
</code></pre>
<h3>OldStable</h3>
<pre lang="yaml"><code>steps:
  - uses: actions/checkout@v3
  - uses: actions/setup-go@v3
    with:
      go-version: 'oldstable'
  - run: go run hello.go
</code></pre>
<h2>Add support for go.work and pass the token input through on GHES</h2>
<p>In scope of this release we added <a href="https://redirect.github.com/actions/setup-go/pull/283">support for go.work file to pass it in go-version-file input</a>.</p>
<pre lang="yaml"><code>steps:
  - uses: actions/checkout@v3
  - uses: actions/setup-go@v3
</code></pre>
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a href="fac708d667"><code>fac708d</code></a> Bump <code>`@​actions/cache</code>` dependency to v3.2.1 (<a href="https://redirect.github.com/actions/setup-go/issues/374">#374</a>)</li>
<li><a href="dd84a9531a"><code>dd84a95</code></a> Update xml2js (<a href="https://redirect.github.com/actions/setup-go/issues/370">#370</a>)</li>
<li><a href="41c2024c46"><code>41c2024</code></a> Fix glob bug in package.json scripts section (<a href="https://redirect.github.com/actions/setup-go/issues/359">#359</a>)</li>
<li><a href="8dbf352f06"><code>8dbf352</code></a> update README fo v4 (<a href="https://redirect.github.com/actions/setup-go/issues/354">#354</a>)</li>
<li><a href="4d34df0c23"><code>4d34df0</code></a> Update configuration files (<a href="https://redirect.github.com/actions/setup-go/issues/348">#348</a>)</li>
<li><a href="fdc0d672a1"><code>fdc0d67</code></a> Add Go bin if go-version input is empty (<a href="https://redirect.github.com/actions/setup-go/issues/351">#351</a>)</li>
<li><a href="ebfdf6ac95"><code>ebfdf6a</code></a> add warning if go-version is empty (<a href="https://redirect.github.com/actions/setup-go/issues/350">#350</a>)</li>
<li><a href="b27d76912e"><code>b27d769</code></a> fix lockfileVersion (<a href="https://redirect.github.com/actions/setup-go/issues/349">#349</a>)</li>
<li><a href="c51a720768"><code>c51a720</code></a> Enable caching by default with default input (<a href="https://redirect.github.com/actions/setup-go/issues/332">#332</a>)</li>
<li><a href="6b848af622"><code>6b848af</code></a> Merge pull request <a href="https://redirect.github.com/actions/setup-go/issues/343">#343</a> from akv-platform/reusable-workflow</li>
<li>Additional commits viewable in <a href="https://github.com/actions/setup-go/compare/v3...v4">compare view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=actions/setup-go&package-manager=github_actions&previous-version=3&new-version=4)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

You can trigger a rebase of this PR by commenting `@dependabot rebase`.


---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)


</details>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-06-05 08:36:22 +00:00
672abdb341 Merge #3803
3803: Bump Swatinem/rust-cache from 2.2.1 to 2.4.0 r=curquiza a=dependabot[bot]

Bumps [Swatinem/rust-cache](https://github.com/Swatinem/rust-cache) from 2.2.1 to 2.4.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a href="https://github.com/Swatinem/rust-cache/releases">Swatinem/rust-cache's releases</a>.</em></p>
<blockquote>
<h2>v2.4.0</h2>
<ul>
<li>Fix cache key stability.</li>
<li>Use 8 character hash components to reduce the key length, making it more readable.</li>
</ul>
<h2>v2.3.0</h2>
<ul>
<li>Add <code>cache-all-crates</code> option, which enables caching of crates installed by workflows.</li>
<li>Add installed packages to cache key, so changes to workflows that install rust tools are detected and cached properly.</li>
<li>Fix cache restore failures due to upstream bug.</li>
<li>Fix <code>EISDIR</code> error due to globed directories.</li>
<li>Update runtime <code>@actions/cache</code>, <code>@actions/io</code> and dev <code>typescript</code> dependencies.</li>
<li>Update <code>npm run prepare</code> so it creates distribution files with the right line endings.</li>
</ul>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a href="https://github.com/Swatinem/rust-cache/blob/master/CHANGELOG.md">Swatinem/rust-cache's changelog</a>.</em></p>
<blockquote>
<h2>2.4.0</h2>
<ul>
<li>Fix cache key stability.</li>
<li>Use 8 character hash components to reduce the key length, making it more readable.</li>
</ul>
<h2>2.3.0</h2>
<ul>
<li>Add <code>cache-all-crates</code> option, which enables caching of crates installed by workflows.</li>
<li>Add installed packages to cache key, so changes to workflows that install rust tools are detected and cached properly.</li>
<li>Fix cache restore failures due to upstream bug.</li>
<li>Fix <code>EISDIR</code> error due to globed directories.</li>
<li>Update runtime <code>@actions/cache</code>, <code>@actions/io</code> and dev <code>typescript</code> dependencies.</li>
<li>Update <code>npm run prepare</code> so it creates distribution files with the right line endings.</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a href="988c164c3d"><code>988c164</code></a> 2.4.0</li>
<li><a href="bb80d0f127"><code>bb80d0f</code></a> chore: use 8 character hash components (<a href="https://redirect.github.com/Swatinem/rust-cache/issues/143">#143</a>)</li>
<li><a href="ad97570a01"><code>ad97570</code></a> fix: cache key stability (<a href="https://redirect.github.com/Swatinem/rust-cache/issues/142">#142</a>)</li>
<li><a href="060bda31e0"><code>060bda3</code></a> 2.3.0</li>
<li><a href="865fd1f6db"><code>865fd1f</code></a> &quot;update dependencies and changelog&quot;</li>
<li><a href="7c7e41ab01"><code>7c7e41a</code></a> chore: changelog v2.3.0 (<a href="https://redirect.github.com/Swatinem/rust-cache/issues/139">#139</a>)</li>
<li><a href="68aeeba167"><code>68aeeba</code></a> chore: use linefix to ensure platform line endings (<a href="https://redirect.github.com/Swatinem/rust-cache/issues/135">#135</a>)</li>
<li><a href="def0926359"><code>def0926</code></a> feat: add option to cache all crates (<a href="https://redirect.github.com/Swatinem/rust-cache/issues/137">#137</a>)</li>
<li><a href="827c240e23"><code>827c240</code></a> fix: cache key dependency on installed packages (<a href="https://redirect.github.com/Swatinem/rust-cache/issues/138">#138</a>)</li>
<li><a href="5e9fae966f"><code>5e9fae9</code></a> fix: cache restore failures (<a href="https://redirect.github.com/Swatinem/rust-cache/issues/136">#136</a>)</li>
<li>Additional commits viewable in <a href="https://github.com/Swatinem/rust-cache/compare/v2.2.1...v2.4.0">compare view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=Swatinem/rust-cache&package-manager=github_actions&previous-version=2.2.1&new-version=2.4.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

You can trigger a rebase of this PR by commenting `@dependabot rebase`.


---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)


</details>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-06-05 07:58:52 +00:00
a13ed4d0b0 Bump actions/setup-go from 3 to 4
Bumps [actions/setup-go](https://github.com/actions/setup-go) from 3 to 4.
- [Release notes](https://github.com/actions/setup-go/releases)
- [Commits](https://github.com/actions/setup-go/compare/v3...v4)

---
updated-dependencies:
- dependency-name: actions/setup-go
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-06-01 17:57:48 +00:00
4cc2988482 Bump svenstaro/upload-release-action from 2.5.0 to 2.6.1
Bumps [svenstaro/upload-release-action](https://github.com/svenstaro/upload-release-action) from 2.5.0 to 2.6.1.
- [Release notes](https://github.com/svenstaro/upload-release-action/releases)
- [Changelog](https://github.com/svenstaro/upload-release-action/blob/master/CHANGELOG.md)
- [Commits](https://github.com/svenstaro/upload-release-action/compare/2.5.0...2.6.1)

---
updated-dependencies:
- dependency-name: svenstaro/upload-release-action
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-06-01 17:57:43 +00:00
26c7e31f25 Bump Swatinem/rust-cache from 2.2.1 to 2.4.0
Bumps [Swatinem/rust-cache](https://github.com/Swatinem/rust-cache) from 2.2.1 to 2.4.0.
- [Release notes](https://github.com/Swatinem/rust-cache/releases)
- [Changelog](https://github.com/Swatinem/rust-cache/blob/master/CHANGELOG.md)
- [Commits](https://github.com/Swatinem/rust-cache/compare/v2.2.1...v2.4.0)

---
updated-dependencies:
- dependency-name: Swatinem/rust-cache
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-06-01 17:57:40 +00:00
b2dee07b5e Merge #3783
3783: Improve SDK CI to choose the Docker image r=curquiza a=curquiza

The point is to have the following "form" when running the SDK CI manually; `nightly` is the default value when the CI is run manually.

<img width="1105" alt="Capture d’écran 2023-05-25 à 12 17 35" src="https://github.com/meilisearch/meilisearch/assets/20380692/87ae7123-efe8-4e7b-a99b-4a40aafa3f79">


Co-authored-by: curquiza <clementine@meilisearch.com>
2023-05-31 12:10:07 +00:00
d963b5f85a Merge #3792
3792: fix the type of the document deletion by filter tasks r=dureuill a=irevoire

# Pull Request

## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/3791

## What does this PR do?
- Hide the deleteDocumentByFilter internal type from the users.


Co-authored-by: Tamo <tamo@meilisearch.com>
2023-05-30 18:20:28 +00:00
2acc3ec5ee fix the type of the document deletion by filter tasks 2023-05-30 15:18:52 +02:00
da04edff8c Better use deserialize_unchecked_from to reduce the deserialization time 2023-05-30 14:58:30 +02:00
85a80f4f4c move the grafana dashboard to the assets directory and upload a basic prometheus scraper to help new users 2023-05-29 18:39:34 +02:00
1213ec7164 update the dashboard once again 2023-05-29 18:37:55 +02:00
0a7817a002 Merge #3786
3786: Consistently use wrapping add to avoid overflow in debug when query s… r=dureuill a=dureuill

# Pull Request

## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/3785

## What does this PR do?
- Some of the code paths would erroneously use the default addition operator, whose semantics are "overflow is an error, checked at runtime in debug", instead of the intended "overflow is expected" semantics that this code relies on (it uses `u16::MAX` as a sentinel). This PR makes it so the wrapping add operator is used everywhere.
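
A minimal sketch of the difference (not the actual milli code; `next_position` is a made-up helper):

```rust
// Positions are u16 and u16::MAX is used as a sentinel value, so arithmetic
// near the sentinel is expected to wrap rather than overflow.
fn next_position(pos: u16, offset: u16) -> u16 {
    // `pos + offset` panics on overflow when debug_assertions are enabled;
    // wrapping_add encodes the intended "overflow is expected" semantics.
    pos.wrapping_add(offset)
}

fn main() {
    assert_eq!(next_position(u16::MAX, 1), 0); // wraps instead of panicking in debug
}
```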

Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2023-05-29 12:39:54 +00:00
1dfc4038ab Add test that fails before PR and passes now 2023-05-29 11:58:26 +02:00
73198179f1 Consistently use wrapping add to avoid overflow in debug when query starts with a separator 2023-05-29 11:54:12 +02:00
51dce9e9d1 improve the dashboard slightly 2023-05-25 18:33:01 +02:00
c9b65677bf return the on disk size actually used by meilisearch 2023-05-25 18:30:30 +02:00
35d5556f1f prefix all the metrics by meilisearch_ 2023-05-25 17:41:53 +02:00
c433bdd1cd add a view for the task queue in the metrics 2023-05-25 12:58:13 +02:00
2db09725f8 Improve SDK CI to choose the Docker image 2023-05-25 12:22:35 +02:00
fdb23132d4 Merge #3781
3781: Revert "Improve docker cache" r=Kerollmops a=curquiza

Reverts meilisearch/meilisearch#3566 because it does not work as expected, and so I want to remove useless complexity from the CI and the Dockerfile

Co-authored-by: Clémentine U. - curqui <clementine@meilisearch.com>
2023-05-25 09:57:40 +00:00
11b95284cd Revert "Improve docker cache" 2023-05-25 11:48:26 +02:00
1b601f70c6 increase the bucketing of requests 2023-05-25 11:08:16 +02:00
8185731bbf Merge #3779
3779: Add a cron test with disabled tokenization (with @roy9495) r=Kerollmops a=curquiza

Replaces https://github.com/meilisearch/meilisearch/pull/3746 because of bors issue

Co-authored-by: TATHAGATA ROY <98920199+roy9495@users.noreply.github.com>
Co-authored-by: Clémentine U. - curqui <clementine@meilisearch.com>
2023-05-25 08:13:14 +00:00
840727d76f Update .github/workflows/test-suite.yml 2023-05-25 10:07:59 +02:00
ead07d0b9d Update .github/workflows/test-suite.yml 2023-05-25 10:07:52 +02:00
44f231d41e Update .github/workflows/test-suite.yml 2023-05-25 10:07:45 +02:00
3c5d1c93de Added a cron test for disabled all-tokenization 2023-05-25 10:07:32 +02:00
57d53de402 Increase the number of buckets 2023-05-24 10:47:15 +02:00
918ce1dd67 Merge #3731
3731: Move comments above keys in config.toml r=curquiza a=jirutka

The current style is very unusual and confusing, and it breaks compatibility with tools that parse config files including their comments. Everyone writes comments above the items to which they refer (except maybe Pythonists), so let's stick to that.


Co-authored-by: Jakub Jirutka <jakub@jirutka.cz>
2023-05-09 09:24:36 +00:00
8095f21999 Move comments above keys in config.toml
The current style is very unusual and confusing, and it breaks compatibility
with tools that parse config files including their comments. Everyone writes
comments above the items to which they refer (except maybe Pythonists),
so let's stick to that.
2023-05-06 18:10:54 +02:00
60 changed files with 3074 additions and 1586 deletions

View File

@ -2,4 +2,3 @@ target
Dockerfile
.dockerignore
.gitignore
**/.git

View File

@ -35,7 +35,7 @@ jobs:
- name: Build deb package
run: cargo deb -p meilisearch -o target/debian/meilisearch.deb
- name: Upload debian pkg to release
uses: svenstaro/upload-release-action@2.5.0
uses: svenstaro/upload-release-action@2.6.1
with:
repo_token: ${{ secrets.MEILI_BOT_GH_PAT }}
file: target/debian/meilisearch.deb

View File

@ -54,7 +54,7 @@ jobs:
# No need to upload binaries for dry run (cron)
- name: Upload binaries to release
if: github.event_name == 'release'
uses: svenstaro/upload-release-action@2.5.0
uses: svenstaro/upload-release-action@2.6.1
with:
repo_token: ${{ secrets.MEILI_BOT_GH_PAT }}
file: target/release/meilisearch
@ -87,7 +87,7 @@ jobs:
# No need to upload binaries for dry run (cron)
- name: Upload binaries to release
if: github.event_name == 'release'
uses: svenstaro/upload-release-action@2.5.0
uses: svenstaro/upload-release-action@2.6.1
with:
repo_token: ${{ secrets.MEILI_BOT_GH_PAT }}
file: target/release/${{ matrix.artifact_name }}
@ -121,7 +121,7 @@ jobs:
- name: Upload the binary to release
# No need to upload binaries for dry run (cron)
if: github.event_name == 'release'
uses: svenstaro/upload-release-action@2.5.0
uses: svenstaro/upload-release-action@2.6.1
with:
repo_token: ${{ secrets.MEILI_BOT_GH_PAT }}
file: target/${{ matrix.target }}/release/meilisearch
@ -183,7 +183,7 @@ jobs:
- name: Upload the binary to release
# No need to upload binaries for dry run (cron)
if: github.event_name == 'release'
uses: svenstaro/upload-release-action@2.5.0
uses: svenstaro/upload-release-action@2.6.1
with:
repo_token: ${{ secrets.MEILI_BOT_GH_PAT }}
file: target/${{ matrix.target }}/release/meilisearch

View File

@ -58,13 +58,9 @@ jobs:
- name: Set up QEMU
uses: docker/setup-qemu-action@v2
with:
platforms: linux/amd64,linux/arm64
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v2
with:
platforms: linux/amd64,linux/arm64
- name: Login to Docker Hub
uses: docker/login-action@v2
@ -92,13 +88,10 @@ jobs:
push: true
platforms: linux/amd64,linux/arm64
tags: ${{ steps.meta.outputs.tags }}
builder: ${{ steps.buildx.outputs.name }}
build-args: |
COMMIT_SHA=${{ github.sha }}
COMMIT_DATE=${{ steps.build-metadata.outputs.date }}
GIT_TAG=${{ github.ref_name }}
cache-from: type=gha
cache-to: type=gha,mode=max
# /!\ Don't touch this without checking with Cloud team
- name: Send CI information to Cloud team

View File

@ -3,6 +3,11 @@ name: SDKs tests
on:
workflow_dispatch:
inputs:
docker_image:
description: 'The Meilisearch Docker image used'
required: false
default: nightly
schedule:
- cron: "0 6 * * MON" # Every Monday at 6:00AM
@ -17,7 +22,7 @@ jobs:
runs-on: ubuntu-latest
services:
meilisearch:
image: getmeili/meilisearch:nightly
image: getmeili/meilisearch:${{ github.event.inputs.docker_image }}
env:
MEILI_MASTER_KEY: ${{ env.MEILI_MASTER_KEY }}
MEILI_NO_ANALYTICS: ${{ env.MEILI_NO_ANALYTICS }}
@ -51,7 +56,7 @@ jobs:
runs-on: ubuntu-latest
services:
meilisearch:
image: getmeili/meilisearch:nightly
image: getmeili/meilisearch:${{ github.event.inputs.docker_image }}
env:
MEILI_MASTER_KEY: ${{ env.MEILI_MASTER_KEY }}
MEILI_NO_ANALYTICS: ${{ env.MEILI_NO_ANALYTICS }}
@ -77,7 +82,7 @@ jobs:
runs-on: ubuntu-latest
services:
meilisearch:
image: getmeili/meilisearch:nightly
image: getmeili/meilisearch:${{ github.event.inputs.docker_image }}
env:
MEILI_MASTER_KEY: ${{ env.MEILI_MASTER_KEY }}
MEILI_NO_ANALYTICS: ${{ env.MEILI_NO_ANALYTICS }}
@ -107,7 +112,7 @@ jobs:
runs-on: ubuntu-latest
services:
meilisearch:
image: getmeili/meilisearch:nightly
image: getmeili/meilisearch:${{ github.event.inputs.docker_image }}
env:
MEILI_MASTER_KEY: ${{ env.MEILI_MASTER_KEY }}
MEILI_NO_ANALYTICS: ${{ env.MEILI_NO_ANALYTICS }}
@ -131,7 +136,7 @@ jobs:
runs-on: ubuntu-latest
services:
meilisearch:
image: getmeili/meilisearch:nightly
image: getmeili/meilisearch:${{ github.event.inputs.docker_image }}
env:
MEILI_MASTER_KEY: ${{ env.MEILI_MASTER_KEY }}
MEILI_NO_ANALYTICS: ${{ env.MEILI_NO_ANALYTICS }}
@ -139,7 +144,7 @@ jobs:
- '7700:7700'
steps:
- name: Set up Go
uses: actions/setup-go@v3
uses: actions/setup-go@v4
with:
go-version: stable
- uses: actions/checkout@v3
@ -160,7 +165,7 @@ jobs:
runs-on: ubuntu-latest
services:
meilisearch:
image: getmeili/meilisearch:nightly
image: getmeili/meilisearch:${{ github.event.inputs.docker_image }}
env:
MEILI_MASTER_KEY: ${{ env.MEILI_MASTER_KEY }}
MEILI_NO_ANALYTICS: ${{ env.MEILI_NO_ANALYTICS }}
@ -184,7 +189,7 @@ jobs:
runs-on: ubuntu-latest
services:
meilisearch:
image: getmeili/meilisearch:nightly
image: getmeili/meilisearch:${{ github.event.inputs.docker_image }}
env:
MEILI_MASTER_KEY: ${{ env.MEILI_MASTER_KEY }}
MEILI_NO_ANALYTICS: ${{ env.MEILI_NO_ANALYTICS }}

View File

@ -43,7 +43,7 @@ jobs:
toolchain: nightly
override: true
- name: Cache dependencies
uses: Swatinem/rust-cache@v2.2.1
uses: Swatinem/rust-cache@v2.4.0
- name: Run cargo check without any default features
uses: actions-rs/cargo@v1
with:
@ -65,7 +65,7 @@ jobs:
steps:
- uses: actions/checkout@v3
- name: Cache dependencies
uses: Swatinem/rust-cache@v2.2.1
uses: Swatinem/rust-cache@v2.4.0
- name: Run cargo check without any default features
uses: actions-rs/cargo@v1
with:
@ -105,6 +105,29 @@ jobs:
command: test
args: --workspace --locked --release --all-features
test-disabled-tokenization:
name: Test disabled tokenization
runs-on: ubuntu-latest
container:
image: ubuntu:18.04
if: github.event_name == 'schedule'
steps:
- uses: actions/checkout@v3
- name: Install needed dependencies
run: |
apt-get update
apt-get install --assume-yes build-essential curl
- uses: actions-rs/toolchain@v1
with:
toolchain: stable
override: true
- name: Run cargo tree without default features and check lindera is not present
run: |
cargo tree -f '{p} {f}' -e normal --no-default-features | grep lindera -vqz
- name: Run cargo tree with default features and check lindera is present
run: |
cargo tree -f '{p} {f}' -e normal | grep lindera -qz
# We run tests in debug also, to make sure that the debug_assertions are hit
test-debug:
name: Run tests in debug
@ -123,7 +146,7 @@ jobs:
toolchain: stable
override: true
- name: Cache dependencies
uses: Swatinem/rust-cache@v2.2.1
uses: Swatinem/rust-cache@v2.4.0
- name: Run tests in debug
uses: actions-rs/cargo@v1
with:
@ -142,7 +165,7 @@ jobs:
override: true
components: clippy
- name: Cache dependencies
uses: Swatinem/rust-cache@v2.2.1
uses: Swatinem/rust-cache@v2.4.0
- name: Run cargo clippy
uses: actions-rs/cargo@v1
with:
@ -161,7 +184,7 @@ jobs:
override: true
components: rustfmt
- name: Cache dependencies
uses: Swatinem/rust-cache@v2.2.1
uses: Swatinem/rust-cache@v2.4.0
- name: Run cargo fmt
# Since we never ran the `build.rs` script in the benchmark directory we are missing one auto-generated import file.
# Since we want to trigger (and fail) this action as fast as possible, instead of building the benchmark crate

View File

@ -1,4 +1,3 @@
# syntax=docker/dockerfile:1.4
# Compile
FROM rust:alpine3.16 AS compiler
@ -12,7 +11,7 @@ ARG GIT_TAG
ENV VERGEN_GIT_SHA=${COMMIT_SHA} VERGEN_GIT_COMMIT_TIMESTAMP=${COMMIT_DATE} VERGEN_GIT_SEMVER_LIGHTWEIGHT=${GIT_TAG}
ENV RUSTFLAGS="-C target-feature=-crt-static"
COPY --link . .
COPY . .
RUN set -eux; \
apkArch="$(apk --print-arch)"; \
if [ "$apkArch" = "aarch64" ]; then \
@ -31,7 +30,7 @@ RUN apk update --quiet \
# add meilisearch to the `/bin` so you can run it from anywhere and it's easy
# to find.
COPY --from=compiler --link /meilisearch/target/release/meilisearch /bin/meilisearch
COPY --from=compiler /meilisearch/target/release/meilisearch /bin/meilisearch
# To stay compatible with the older version of the container (pre v0.27.0) we're
# going to symlink the meilisearch binary in the path to `/meilisearch`
RUN ln -s /bin/meilisearch /meilisearch

File diff suppressed because it is too large.

View File

@ -0,0 +1,19 @@
global:
scrape_interval: 15s # By default, scrape targets every 15 seconds.
# Attach these labels to any time series or alerts when communicating with
# external systems (federation, remote storage, Alertmanager).
external_labels:
monitor: 'codelab-monitor'
# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
# The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
- job_name: 'meilisearch'
# Override the global default and scrape targets from this job every 5 seconds.
scrape_interval: 5s
static_configs:
- targets: ['localhost:7700']

View File

@ -1,131 +1,131 @@
# This file shows the default configuration of Meilisearch.
# All variables are defined here: https://www.meilisearch.com/docs/learn/configuration/instance_options#environment-variables
db_path = "./data.ms"
# Designates the location where database files will be created and retrieved.
# https://www.meilisearch.com/docs/learn/configuration/instance_options#database-path
db_path = "./data.ms"
env = "development"
# Configures the instance's environment. Value must be either `production` or `development`.
# https://www.meilisearch.com/docs/learn/configuration/instance_options#environment
env = "development"
http_addr = "localhost:7700"
# The address on which the HTTP server will listen.
http_addr = "localhost:7700"
# master_key = "YOUR_MASTER_KEY_VALUE"
# Sets the instance's master key, automatically protecting all routes except GET /health.
# https://www.meilisearch.com/docs/learn/configuration/instance_options#master-key
# master_key = "YOUR_MASTER_KEY_VALUE"
# no_analytics = true
# Deactivates Meilisearch's built-in telemetry when provided.
# Meilisearch automatically collects data from all instances that do not opt out using this flag.
# All gathered data is used solely for the purpose of improving Meilisearch, and can be deleted at any time.
# https://www.meilisearch.com/docs/learn/configuration/instance_options#disable-analytics
# no_analytics = true
http_payload_size_limit = "100 MB"
# Sets the maximum size of accepted payloads.
# https://www.meilisearch.com/docs/learn/configuration/instance_options#payload-limit-size
http_payload_size_limit = "100 MB"
log_level = "INFO"
# Defines how much detail should be present in Meilisearch's logs.
# Meilisearch currently supports six log levels, listed in order of increasing verbosity: `OFF`, `ERROR`, `WARN`, `INFO`, `DEBUG`, `TRACE`
# https://www.meilisearch.com/docs/learn/configuration/instance_options#log-level
log_level = "INFO"
# max_indexing_memory = "2 GiB"
# Sets the maximum amount of RAM Meilisearch can use when indexing.
# https://www.meilisearch.com/docs/learn/configuration/instance_options#max-indexing-memory
# max_indexing_memory = "2 GiB"
# max_indexing_threads = 4
# Sets the maximum number of threads Meilisearch can use during indexing.
# https://www.meilisearch.com/docs/learn/configuration/instance_options#max-indexing-threads
# max_indexing_threads = 4
#############
### DUMPS ###
#############
dump_dir = "dumps/"
# Sets the directory where Meilisearch will create dump files.
# https://www.meilisearch.com/docs/learn/configuration/instance_options#dump-directory
dump_dir = "dumps/"
# import_dump = "./path/to/my/file.dump"
# Imports the dump file located at the specified path. Path must point to a .dump file.
# https://www.meilisearch.com/docs/learn/configuration/instance_options#import-dump
# import_dump = "./path/to/my/file.dump"
ignore_missing_dump = false
# Prevents Meilisearch from throwing an error when `import_dump` does not point to a valid dump file.
# https://www.meilisearch.com/docs/learn/configuration/instance_options#ignore-missing-dump
ignore_missing_dump = false
ignore_dump_if_db_exists = false
# Prevents a Meilisearch instance with an existing database from throwing an error when using `import_dump`.
# https://www.meilisearch.com/docs/learn/configuration/instance_options#ignore-dump-if-db-exists
ignore_dump_if_db_exists = false
#################
### SNAPSHOTS ###
#################
schedule_snapshot = false
# Enables scheduled snapshots when true, disables them when false (the default).
# If the value is given as an integer, then enables the scheduled snapshot with the passed value as the interval
# between each snapshot, in seconds.
# https://www.meilisearch.com/docs/learn/configuration/instance_options#schedule-snapshot-creation
schedule_snapshot = false
snapshot_dir = "snapshots/"
# Sets the directory where Meilisearch will store snapshots.
# https://www.meilisearch.com/docs/learn/configuration/instance_options#snapshot-destination
snapshot_dir = "snapshots/"
# import_snapshot = "./path/to/my/snapshot"
# Launches Meilisearch after importing a previously-generated snapshot at the given filepath.
# https://www.meilisearch.com/docs/learn/configuration/instance_options#import-snapshot
# import_snapshot = "./path/to/my/snapshot"
ignore_missing_snapshot = false
# Prevents a Meilisearch instance from throwing an error when `import_snapshot` does not point to a valid snapshot file.
# https://www.meilisearch.com/docs/learn/configuration/instance_options#ignore-missing-snapshot
ignore_missing_snapshot = false
ignore_snapshot_if_db_exists = false
# Prevents a Meilisearch instance with an existing database from throwing an error when using `import_snapshot`.
# https://www.meilisearch.com/docs/learn/configuration/instance_options#ignore-snapshot-if-db-exists
ignore_snapshot_if_db_exists = false
###########
### SSL ###
###########
# ssl_auth_path = "./path/to/root"
# Enables client authentication in the specified path.
# https://www.meilisearch.com/docs/learn/configuration/instance_options#ssl-authentication-path
# ssl_auth_path = "./path/to/root"
# ssl_cert_path = "./path/to/certfile"
# Sets the server's SSL certificates.
# https://www.meilisearch.com/docs/learn/configuration/instance_options#ssl-certificates-path
# ssl_cert_path = "./path/to/certfile"
# ssl_key_path = "./path/to/private-key"
# Sets the server's SSL key files.
# https://www.meilisearch.com/docs/learn/configuration/instance_options#ssl-key-path
# ssl_key_path = "./path/to/private-key"
# ssl_ocsp_path = "./path/to/ocsp-file"
# Sets the server's OCSP file.
# https://www.meilisearch.com/docs/learn/configuration/instance_options#ssl-ocsp-path
# ssl_ocsp_path = "./path/to/ocsp-file"
ssl_require_auth = false
# Makes SSL authentication mandatory.
# https://www.meilisearch.com/docs/learn/configuration/instance_options#ssl-require-auth
ssl_require_auth = false
ssl_resumption = false
# Activates SSL session resumption.
# https://www.meilisearch.com/docs/learn/configuration/instance_options#ssl-resumption
ssl_resumption = false
ssl_tickets = false
# Activates SSL tickets.
# https://www.meilisearch.com/docs/learn/configuration/instance_options#ssl-tickets
ssl_tickets = false
#############################
### Experimental features ###
#############################
experimental_enable_metrics = false
# Experimental metrics feature. For more information, see: <https://github.com/meilisearch/meilisearch/discussions/3518>
# Enables the Prometheus metrics on the `GET /metrics` endpoint.
experimental_enable_metrics = false
experimental_reduce_indexing_memory_usage = false
# Experimental RAM reduction during indexing, do not use in production, see: <https://github.com/meilisearch/product/discussions/652>
experimental_reduce_indexing_memory_usage = false

File diff suppressed because it is too large.

View File

@ -90,8 +90,17 @@ pub enum IndexStatus {
pub struct IndexStats {
/// Number of documents in the index.
pub number_of_documents: u64,
/// Size of the index' DB, in bytes.
/// Size taken up by the index' DB, in bytes.
///
/// This includes the size taken by both the used and free pages of the DB, and as the free pages
/// are not returned to the disk after a deletion, this number is typically larger than
/// `used_database_size` that only includes the size of the used pages.
pub database_size: u64,
/// Size taken by the used pages of the index' DB, in bytes.
///
/// As the DB backend does not return to the disk the pages that are not currently used by the DB,
/// this value is typically smaller than `database_size`.
pub used_database_size: u64,
/// Association of every field name with the number of times it occurs in the documents.
pub field_distribution: FieldDistribution,
/// Creation date of the index.
@ -107,10 +116,10 @@ impl IndexStats {
///
/// - rtxn: a RO transaction for the index, obtained from `Index::read_txn()`.
pub fn new(index: &Index, rtxn: &RoTxn) -> Result<Self> {
let database_size = index.on_disk_size()?;
Ok(IndexStats {
number_of_documents: index.number_of_documents(rtxn)?,
database_size,
database_size: index.on_disk_size()?,
used_database_size: index.used_size()?,
field_distribution: index.field_distribution(rtxn)?,
created_at: index.created_at(rtxn)?,
updated_at: index.updated_at(rtxn)?,

View File

@ -31,7 +31,7 @@ mod uuid_codec;
pub type Result<T> = std::result::Result<T, Error>;
pub type TaskId = u32;
use std::collections::HashMap;
use std::collections::{BTreeMap, HashMap};
use std::ops::{Bound, RangeBounds};
use std::path::{Path, PathBuf};
use std::sync::atomic::AtomicBool;
@ -573,10 +573,16 @@ impl IndexScheduler {
&self.index_mapper.indexer_config
}
/// Return the real database size (i.e.: The size **with** the free pages)
pub fn size(&self) -> Result<u64> {
Ok(self.env.real_disk_size()?)
}
/// Return the used database size (i.e.: The size **without** the free pages)
pub fn used_size(&self) -> Result<u64> {
Ok(self.env.non_free_pages_size()?)
}
/// Return the index corresponding to the name.
///
/// * If the index wasn't opened before, the index will be opened.
@ -756,6 +762,38 @@ impl IndexScheduler {
Ok(tasks)
}
/// The returned structure contains:
/// 1. The name of the property being observed can be `statuses`, `types`, or `indexes`.
/// 2. The name of the specific data related to the property can be `enqueued` for the `statuses`, `settingsUpdate` for the `types`, or the name of the index for the `indexes`, for example.
/// 3. The number of times the properties appeared.
pub fn get_stats(&self) -> Result<BTreeMap<String, BTreeMap<String, u64>>> {
let rtxn = self.read_txn()?;
let mut res = BTreeMap::new();
res.insert(
"statuses".to_string(),
enum_iterator::all::<Status>()
.map(|s| Ok((s.to_string(), self.get_status(&rtxn, s)?.len())))
.collect::<Result<BTreeMap<String, u64>>>()?,
);
res.insert(
"types".to_string(),
enum_iterator::all::<Kind>()
.map(|s| Ok((s.to_string(), self.get_kind(&rtxn, s)?.len())))
.collect::<Result<BTreeMap<String, u64>>>()?,
);
res.insert(
"indexes".to_string(),
self.index_tasks
.iter(&rtxn)?
.map(|res| Ok(res.map(|(name, bitmap)| (name.to_string(), bitmap.len()))?))
.collect::<Result<BTreeMap<String, u64>>>()?,
);
Ok(res)
}
/// Return true iff there is at least one task associated with this index
/// that is processing.
pub fn is_index_processing(&self, index: &str) -> Result<bool> {

View File

@ -466,7 +466,7 @@ impl IndexScheduler {
}
}
Details::DocumentDeletionByFilter { deleted_documents, original_filter: _ } => {
assert_eq!(kind.as_kind(), Kind::DocumentDeletionByFilter);
assert_eq!(kind.as_kind(), Kind::DocumentDeletion);
let (index_uid, _) = if let KindWithContent::DocumentDeletionByFilter {
ref index_uid,
ref filter_expr,

View File

@ -45,6 +45,11 @@ impl AuthController {
self.store.size()
}
/// Return the used size of the `AuthController` database in bytes.
pub fn used_size(&self) -> Result<u64> {
self.store.used_size()
}
pub fn create_key(&self, create_key: CreateApiKey) -> Result<Key> {
match self.store.get_api_key(create_key.uid)? {
Some(_) => Err(AuthControllerError::ApiKeyAlreadyExists(create_key.uid.to_string())),

View File

@ -75,6 +75,11 @@ impl HeedAuthStore {
Ok(self.env.real_disk_size()?)
}
/// Return the number of bytes actually used in the database
pub fn used_size(&self) -> Result<u64> {
Ok(self.env.non_free_pages_size()?)
}
pub fn set_drop_on_close(&mut self, v: bool) {
self.should_close_on_drop = v;
}

View File

@ -236,10 +236,13 @@ InvalidSearchHighlightPreTag , InvalidRequest , BAD_REQUEST ;
InvalidSearchHitsPerPage , InvalidRequest , BAD_REQUEST ;
InvalidSearchLimit , InvalidRequest , BAD_REQUEST ;
InvalidSearchMatchingStrategy , InvalidRequest , BAD_REQUEST ;
InvalidMultiSearchMergeStrategy , InvalidRequest , BAD_REQUEST ;
InvalidSearchOffset , InvalidRequest , BAD_REQUEST ;
InvalidSearchPage , InvalidRequest , BAD_REQUEST ;
InvalidSearchQ , InvalidRequest , BAD_REQUEST ;
InvalidSearchShowMatchesPosition , InvalidRequest , BAD_REQUEST ;
InvalidSearchShowRankingScore , InvalidRequest , BAD_REQUEST ;
InvalidSearchShowRankingScoreDetails , InvalidRequest , BAD_REQUEST ;
InvalidSearchSort , InvalidRequest , BAD_REQUEST ;
InvalidSettingsDisplayedAttributes , InvalidRequest , BAD_REQUEST ;
InvalidSettingsDistinctAttribute , InvalidRequest , BAD_REQUEST ;

View File

@ -395,7 +395,6 @@ impl std::error::Error for ParseTaskStatusError {}
pub enum Kind {
DocumentAdditionOrUpdate,
DocumentDeletion,
DocumentDeletionByFilter,
SettingsUpdate,
IndexCreation,
IndexDeletion,
@ -412,7 +411,6 @@ impl Kind {
match self {
Kind::DocumentAdditionOrUpdate
| Kind::DocumentDeletion
| Kind::DocumentDeletionByFilter
| Kind::SettingsUpdate
| Kind::IndexCreation
| Kind::IndexDeletion
@ -430,7 +428,6 @@ impl Display for Kind {
match self {
Kind::DocumentAdditionOrUpdate => write!(f, "documentAdditionOrUpdate"),
Kind::DocumentDeletion => write!(f, "documentDeletion"),
Kind::DocumentDeletionByFilter => write!(f, "documentDeletionByFilter"),
Kind::SettingsUpdate => write!(f, "settingsUpdate"),
Kind::IndexCreation => write!(f, "indexCreation"),
Kind::IndexDeletion => write!(f, "indexDeletion"),

View File

@ -4,20 +4,32 @@ use prometheus::{
register_int_gauge_vec, HistogramVec, IntCounterVec, IntGauge, IntGaugeVec,
};
const HTTP_RESPONSE_TIME_CUSTOM_BUCKETS: &[f64; 14] = &[
0.0005, 0.0008, 0.00085, 0.0009, 0.00095, 0.001, 0.00105, 0.0011, 0.00115, 0.0012, 0.0015,
0.002, 0.003, 1.0,
];
/// Create evenly distributed buckets
fn create_buckets() -> [f64; 29] {
(0..10)
.chain((10..100).step_by(10))
.chain((100..=1000).step_by(100))
.map(|i| i as f64 / 1000.)
.collect::<Vec<_>>()
.try_into()
.unwrap()
}
lazy_static! {
pub static ref HTTP_REQUESTS_TOTAL: IntCounterVec = register_int_counter_vec!(
opts!("http_requests_total", "HTTP requests total"),
pub static ref HTTP_RESPONSE_TIME_CUSTOM_BUCKETS: [f64; 29] = create_buckets();
pub static ref MEILISEARCH_HTTP_REQUESTS_TOTAL: IntCounterVec = register_int_counter_vec!(
opts!("meilisearch_http_requests_total", "Meilisearch HTTP requests total"),
&["method", "path"]
)
.expect("Can't create a metric");
pub static ref MEILISEARCH_DB_SIZE_BYTES: IntGauge =
register_int_gauge!(opts!("meilisearch_db_size_bytes", "Meilisearch Db Size In Bytes"))
register_int_gauge!(opts!("meilisearch_db_size_bytes", "Meilisearch DB Size In Bytes"))
.expect("Can't create a metric");
pub static ref MEILISEARCH_USED_DB_SIZE_BYTES: IntGauge = register_int_gauge!(opts!(
"meilisearch_used_db_size_bytes",
"Meilisearch Used DB Size In Bytes"
))
.expect("Can't create a metric");
pub static ref MEILISEARCH_INDEX_COUNT: IntGauge =
register_int_gauge!(opts!("meilisearch_index_count", "Meilisearch Index Count"))
.expect("Can't create a metric");
@ -26,11 +38,16 @@ lazy_static! {
&["index"]
)
.expect("Can't create a metric");
pub static ref HTTP_RESPONSE_TIME_SECONDS: HistogramVec = register_histogram_vec!(
pub static ref MEILISEARCH_HTTP_RESPONSE_TIME_SECONDS: HistogramVec = register_histogram_vec!(
"http_response_time_seconds",
"HTTP response times",
&["method", "path"],
HTTP_RESPONSE_TIME_CUSTOM_BUCKETS.to_vec()
)
.expect("Can't create a metric");
pub static ref MEILISEARCH_NB_TASKS: IntGaugeVec = register_int_gauge_vec!(
opts!("meilisearch_nb_tasks", "Meilisearch Number of tasks"),
&["kind", "value"]
)
.expect("Can't create a metric");
}

View File

@ -52,11 +52,11 @@ where
if is_registered_resource {
let request_method = req.method().to_string();
histogram_timer = Some(
crate::metrics::HTTP_RESPONSE_TIME_SECONDS
crate::metrics::MEILISEARCH_HTTP_RESPONSE_TIME_SECONDS
.with_label_values(&[&request_method, request_path])
.start_timer(),
);
crate::metrics::HTTP_REQUESTS_TOTAL
crate::metrics::MEILISEARCH_HTTP_REQUESTS_TOTAL
.with_label_values(&[&request_method, request_path])
.inc();
}

View File

@ -56,6 +56,10 @@ pub struct SearchQueryGet {
sort: Option<String>,
#[deserr(default, error = DeserrQueryParamError<InvalidSearchShowMatchesPosition>)]
show_matches_position: Param<bool>,
#[deserr(default, error = DeserrQueryParamError<InvalidSearchShowRankingScore>)]
show_ranking_score: Param<bool>,
#[deserr(default, error = DeserrQueryParamError<InvalidSearchShowRankingScoreDetails>)]
show_ranking_score_details: Param<bool>,
#[deserr(default, error = DeserrQueryParamError<InvalidSearchFacets>)]
facets: Option<CS<String>>,
#[deserr( default = DEFAULT_HIGHLIGHT_PRE_TAG(), error = DeserrQueryParamError<InvalidSearchHighlightPreTag>)]
@ -91,6 +95,8 @@ impl From<SearchQueryGet> for SearchQuery {
filter,
sort: other.sort.map(|attr| fix_sort_query_parameters(&attr)),
show_matches_position: other.show_matches_position.0,
show_ranking_score: other.show_ranking_score.0,
show_ranking_score_details: other.show_ranking_score_details.0,
facets: other.facets.map(|o| o.into_iter().collect()),
highlight_pre_tag: other.highlight_pre_tag,
highlight_post_tag: other.highlight_post_tag,

View File

@ -17,7 +17,7 @@ pub fn configure(config: &mut web::ServiceConfig) {
pub async fn get_metrics(
index_scheduler: GuardedData<ActionPolicy<{ actions::METRICS_GET }>, Data<IndexScheduler>>,
auth_controller: GuardedData<ActionPolicy<{ actions::METRICS_GET }>, Data<AuthController>>,
auth_controller: Data<AuthController>,
) -> Result<HttpResponse, ResponseError> {
let auth_filters = index_scheduler.filters();
if !auth_filters.all_indexes_authorized() {
@ -28,10 +28,10 @@ pub async fn get_metrics(
return Err(error);
}
let response =
create_all_stats((*index_scheduler).clone(), (*auth_controller).clone(), auth_filters)?;
let response = create_all_stats((*index_scheduler).clone(), auth_controller, auth_filters)?;
crate::metrics::MEILISEARCH_DB_SIZE_BYTES.set(response.database_size as i64);
crate::metrics::MEILISEARCH_USED_DB_SIZE_BYTES.set(response.used_database_size as i64);
crate::metrics::MEILISEARCH_INDEX_COUNT.set(response.indexes.len() as i64);
for (index, value) in response.indexes.iter() {
@ -40,6 +40,14 @@ pub async fn get_metrics(
.set(value.number_of_documents as i64);
}
for (kind, value) in index_scheduler.get_stats()? {
for (value, count) in value {
crate::metrics::MEILISEARCH_NB_TASKS
.with_label_values(&[&kind, &value])
.set(count as i64);
}
}
let encoder = TextEncoder::new();
let mut buffer = vec![];
encoder.encode(&prometheus::gather(), &mut buffer).expect("Failed to encode metrics");

View File

@ -231,6 +231,8 @@ pub async fn running() -> HttpResponse {
#[serde(rename_all = "camelCase")]
pub struct Stats {
pub database_size: u64,
#[serde(skip)]
pub used_database_size: u64,
#[serde(serialize_with = "time::serde::rfc3339::option::serialize")]
pub last_update: Option<OffsetDateTime>,
pub indexes: BTreeMap<String, indexes::IndexStats>,
@ -259,6 +261,7 @@ pub fn create_all_stats(
let mut last_task: Option<OffsetDateTime> = None;
let mut indexes = BTreeMap::new();
let mut database_size = 0;
let mut used_database_size = 0;
for index_uid in index_scheduler.index_names()? {
// Accumulate the size of all indexes, even unauthorized ones, so
@ -266,6 +269,7 @@ pub fn create_all_stats(
// See <https://github.com/meilisearch/meilisearch/pull/3541#discussion_r1126747643> for context.
let stats = index_scheduler.index_stats(&index_uid)?;
database_size += stats.inner_stats.database_size;
used_database_size += stats.inner_stats.used_database_size;
if !filters.is_index_authorized(&index_uid) {
continue;
@ -278,10 +282,14 @@ pub fn create_all_stats(
}
database_size += index_scheduler.size()?;
used_database_size += index_scheduler.used_size()?;
database_size += auth_controller.size()?;
database_size += index_scheduler.compute_update_file_size()?;
used_database_size += auth_controller.used_size()?;
let update_file_size = index_scheduler.compute_update_file_size()?;
database_size += update_file_size;
used_database_size += update_file_size;
let stats = Stats { database_size, last_update: last_task, indexes };
let stats = Stats { database_size, used_database_size, last_update: last_task, indexes };
Ok(stats)
}

View File

@ -1,20 +1,26 @@
use std::collections::HashMap;
use actix_http::StatusCode;
use actix_web::web::{self, Data};
use actix_web::{HttpRequest, HttpResponse};
use deserr::actix_web::AwebJson;
use deserr::Deserr;
use index_scheduler::IndexScheduler;
use log::debug;
use meilisearch_types::deserr::DeserrJsonError;
use meilisearch_types::error::deserr_codes::InvalidMultiSearchMergeStrategy;
use meilisearch_types::error::ResponseError;
use meilisearch_types::keys::actions;
use meilisearch_types::milli::score_details::NotComparable;
use serde::Serialize;
use crate::analytics::{Analytics, MultiSearchAggregator};
use crate::extractors::authentication::policies::ActionPolicy;
use crate::extractors::authentication::{AuthenticationError, GuardedData};
use crate::extractors::sequential_extractor::SeqHandler;
use crate::milli::score_details::ScoreDetails;
use crate::search::{
add_search_rules, perform_search, SearchQueryWithIndex, SearchResultWithIndex,
add_search_rules, perform_search, SearchHit, SearchQueryWithIndex, SearchResultWithIndex,
};
pub fn configure(cfg: &mut web::ServiceConfig) {
@ -23,13 +29,34 @@ pub fn configure(cfg: &mut web::ServiceConfig) {
#[derive(Serialize)]
struct SearchResults {
#[serde(skip_serializing_if = "Option::is_none")]
aggregate_hits: Option<Vec<SearchHitWithIndex>>,
results: Vec<SearchResultWithIndex>,
}
#[derive(Serialize, Debug, Clone, PartialEq)]
#[serde(rename_all = "camelCase")]
struct SearchHitWithIndex {
pub index_uid: String,
#[serde(flatten)]
pub hit: SearchHit,
}
#[derive(Debug, deserr::Deserr)]
#[deserr(error = DeserrJsonError, rename_all = camelCase, deny_unknown_fields)]
pub struct SearchQueries {
queries: Vec<SearchQueryWithIndex>,
#[deserr(default, error = DeserrJsonError<InvalidMultiSearchMergeStrategy>, default)]
merge_strategy: MergeStrategy,
}
#[derive(Debug, Clone, PartialEq, Eq, Deserr, Default)]
#[deserr(rename_all = camelCase)]
pub enum MergeStrategy {
#[default]
None,
ByNormalizedScore,
ByScoreDetails,
}
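A sketch of what a multi-search body selecting one of the new merge strategies might look like, assuming the camelCase renames above; the index uids and queries are placeholders:

```rust
// Hypothetical request body for POST /multi-search with a merge strategy.
fn example_multi_search_body() -> serde_json::Value {
    serde_json::json!({
        "mergeStrategy": "byNormalizedScore",
        "queries": [
            { "indexUid": "movies", "q": "glass", "showRankingScore": true },
            { "indexUid": "comics", "q": "glass", "showRankingScore": true }
        ]
    })
}
```

With `mergeStrategy` left out (or set to `none`), no `aggregateHits` array is produced and the response keeps only the per-index `results`, as before.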
pub async fn multi_search_with_post(
@ -38,7 +65,13 @@ pub async fn multi_search_with_post(
req: HttpRequest,
analytics: web::Data<dyn Analytics>,
) -> Result<HttpResponse, ResponseError> {
let queries = params.into_inner().queries;
let SearchQueries { queries, merge_strategy } = params.into_inner();
// FIXME: REMOVE UNWRAP
let max_hits = queries
.iter()
.map(|SearchQueryWithIndex { limit, hits_per_page, .. }| hits_per_page.unwrap_or(*limit))
.max()
.unwrap();
let mut multi_aggregate = MultiSearchAggregator::from_queries(&queries, &req);
@ -104,7 +137,117 @@ pub async fn multi_search_with_post(
debug!("returns: {:?}", search_results);
Ok(HttpResponse::Ok().json(SearchResults { results: search_results }))
let aggregate_hits = match merge_strategy {
MergeStrategy::None => None,
MergeStrategy::ByScoreDetails => Some(merge_by_score_details(&search_results, max_hits)),
MergeStrategy::ByNormalizedScore => {
Some(merge_by_normalized_score(&search_results, max_hits))
}
};
Ok(HttpResponse::Ok().json(SearchResults { aggregate_hits, results: search_results }))
}
fn merge_by_score_details(
search_results: &[SearchResultWithIndex],
max_hits: usize,
) -> Vec<SearchHitWithIndex> {
let mut iterators: Vec<_> = search_results
.iter()
.filter_map(|SearchResultWithIndex { index_uid, result }| {
let mut it = result.hits.iter();
let next = it.next()?;
Some((index_uid, it, next))
})
.collect();
let mut hits = Vec::with_capacity(max_hits);
let mut inconsistent_indexes = HashMap::new();
for _ in 0..max_hits {
iterators.sort_by(|(left_uid, _, left_hit), (right_uid, _, right_hit)| {
let error = match ScoreDetails::partial_cmp_iter(
left_hit.ranking_score_raw.iter(),
right_hit.ranking_score_raw.iter(),
) {
Ok(ord) => return ord,
Err(NotComparable(incomparable_index)) => incomparable_index,
};
inconsistent_indexes.entry((left_uid.to_owned(), right_uid.to_owned())).or_insert_with(
|| {
format!(
"Detailed score {:?} is not comparable with {:?}: (left: {:#?}, right: {:#?})",
left_hit.ranking_score_raw.get(error),
right_hit.ranking_score_raw.get(error),
left_hit.ranking_score_raw,
right_hit.ranking_score_raw
)
},
);
std::cmp::Ordering::Less
});
if !inconsistent_indexes.is_empty() {
let mut s = String::new();
for ((left_uid, right_uid), error) in &inconsistent_indexes {
use std::fmt::Write;
writeln!(s, "Indexes {} and {} are inconsistent: {}", left_uid, right_uid, error)
.unwrap();
}
// Replace panic with proper error
panic!("{}", s);
}
let Some((index_uid, it, next)) = iterators.last_mut()
else {
break;
};
let hit = SearchHitWithIndex { index_uid: index_uid.clone(), hit: next.clone() };
if let Some(next_hit) = it.next() {
*next = next_hit;
} else {
iterators.pop();
}
hits.push(hit);
}
hits
}
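The function above is a k-way merge: on every iteration the per-index iterators are re-sorted so that the best current hit ends up last, `last_mut()` pops it into the merged list, and its iterator is advanced or dropped. A minimal, self-contained sketch of the same pattern over plain integers (each input list assumed sorted best-first, i.e. descending):

```rust
// Hypothetical illustration of the "sort, take the last, advance" merge.
fn merge_desc(lists: Vec<Vec<u32>>, max_hits: usize) -> Vec<u32> {
    // One (iterator, current head) pair per non-empty input list.
    let mut heads: Vec<(std::vec::IntoIter<u32>, u32)> = lists
        .into_iter()
        .filter_map(|list| {
            let mut it = list.into_iter();
            it.next().map(|head| (it, head))
        })
        .collect();
    let mut out = Vec::with_capacity(max_hits);
    for _ in 0..max_hits {
        // Sort ascending so the best (largest) head ends up last.
        heads.sort_by_key(|(_, head)| *head);
        let Some((it, head)) = heads.last_mut() else { break };
        out.push(*head);
        if let Some(next) = it.next() {
            *head = next;
        } else {
            heads.pop();
        }
    }
    out
}
```

For example, `merge_desc(vec![vec![9, 3, 1], vec![8, 7]], 4)` yields `[9, 8, 7, 3]`. `merge_by_normalized_score` below follows the same skeleton, but orders the heads by their global linear-scale score instead of comparing the detailed scores pairwise.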
fn merge_by_normalized_score(
search_results: &[SearchResultWithIndex],
max_hits: usize,
) -> Vec<SearchHitWithIndex> {
let mut iterators: Vec<_> = search_results
.iter()
.filter_map(|SearchResultWithIndex { index_uid, result }| {
let mut it = result.hits.iter();
let next = it.next()?;
Some((index_uid, it, next))
})
.collect();
let mut hits = Vec::with_capacity(max_hits);
for _ in 0..max_hits {
iterators.sort_by_key(|(_, _, hit)| {
ScoreDetails::global_score_linear_scale(hit.ranking_score_raw.iter())
});
let Some((index_uid, it, next)) = iterators.last_mut()
else {
break;
};
let hit = SearchHitWithIndex { index_uid: index_uid.clone(), hit: next.clone() };
if let Some(next_hit) = it.next() {
*next = next_hit;
} else {
iterators.pop();
}
hits.push(hit);
}
hits
}
/// Local `Result` extension trait to avoid `map_err` boilerplate.

View File

@ -730,7 +730,7 @@ mod tests {
let err = deserr_query_params::<TaskDeletionOrCancelationQuery>(params).unwrap_err();
snapshot!(meili_snap::json_string!(err), @r###"
{
"message": "Invalid value in parameter `types`: `createIndex` is not a valid task type. Available types are `documentAdditionOrUpdate`, `documentDeletion`, `documentDeletionByFilter`, `settingsUpdate`, `indexCreation`, `indexDeletion`, `indexUpdate`, `indexSwap`, `taskCancelation`, `taskDeletion`, `dumpCreation`, `snapshotCreation`.",
"message": "Invalid value in parameter `types`: `createIndex` is not a valid task type. Available types are `documentAdditionOrUpdate`, `documentDeletion`, `settingsUpdate`, `indexCreation`, `indexDeletion`, `indexUpdate`, `indexSwap`, `taskCancelation`, `taskDeletion`, `dumpCreation`, `snapshotCreation`.",
"code": "invalid_task_types",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_task_types"

View File

@ -9,6 +9,7 @@ use meilisearch_auth::IndexSearchRules;
use meilisearch_types::deserr::DeserrJsonError;
use meilisearch_types::error::deserr_codes::*;
use meilisearch_types::index_uid::IndexUid;
use meilisearch_types::milli::score_details::ScoreDetails;
use meilisearch_types::settings::DEFAULT_PAGINATION_MAX_TOTAL_HITS;
use meilisearch_types::{milli, Document};
use milli::tokenizer::TokenizerBuilder;
@ -54,6 +55,10 @@ pub struct SearchQuery {
pub attributes_to_highlight: Option<HashSet<String>>,
#[deserr(default, error = DeserrJsonError<InvalidSearchShowMatchesPosition>, default)]
pub show_matches_position: bool,
#[deserr(default, error = DeserrJsonError<InvalidSearchShowRankingScore>, default)]
pub show_ranking_score: bool,
#[deserr(default, error = DeserrJsonError<InvalidSearchShowRankingScoreDetails>, default)]
pub show_ranking_score_details: bool,
#[deserr(default, error = DeserrJsonError<InvalidSearchFilter>)]
pub filter: Option<Value>,
#[deserr(default, error = DeserrJsonError<InvalidSearchSort>)]
@ -103,6 +108,10 @@ pub struct SearchQueryWithIndex {
pub crop_length: usize,
#[deserr(default, error = DeserrJsonError<InvalidSearchAttributesToHighlight>)]
pub attributes_to_highlight: Option<HashSet<String>>,
#[deserr(default, error = DeserrJsonError<InvalidSearchShowRankingScore>, default)]
pub show_ranking_score: bool,
#[deserr(default, error = DeserrJsonError<InvalidSearchShowRankingScoreDetails>, default)]
pub show_ranking_score_details: bool,
#[deserr(default, error = DeserrJsonError<InvalidSearchShowMatchesPosition>, default)]
pub show_matches_position: bool,
#[deserr(default, error = DeserrJsonError<InvalidSearchFilter>)]
@ -134,6 +143,8 @@ impl SearchQueryWithIndex {
attributes_to_crop,
crop_length,
attributes_to_highlight,
show_ranking_score,
show_ranking_score_details,
show_matches_position,
filter,
sort,
@ -155,6 +166,8 @@ impl SearchQueryWithIndex {
attributes_to_crop,
crop_length,
attributes_to_highlight,
show_ranking_score,
show_ranking_score_details,
show_matches_position,
filter,
sort,
@ -194,7 +207,7 @@ impl From<MatchingStrategy> for TermsMatchingStrategy {
}
}
#[derive(Debug, Clone, Serialize, PartialEq, Eq)]
#[derive(Debug, Clone, Serialize, PartialEq)]
pub struct SearchHit {
#[serde(flatten)]
pub document: Document,
@ -202,6 +215,12 @@ pub struct SearchHit {
pub formatted: Document,
#[serde(rename = "_matchesPosition", skip_serializing_if = "Option::is_none")]
pub matches_position: Option<MatchesPosition>,
#[serde(rename = "_rankingScore", skip_serializing_if = "Option::is_none")]
pub ranking_score: Option<u64>,
#[serde(rename = "_rankingScoreDetails", skip_serializing_if = "Option::is_none")]
pub ranking_score_details: Option<serde_json::Map<String, serde_json::Value>>,
#[serde(skip)]
pub ranking_score_raw: Vec<ScoreDetails>,
}
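A sketch of how a hit might serialize once both flags are enabled, given the serde attributes above; the document fields and numbers are placeholders, and the exact detail keys depend on which ranking rules ran:

```rust
// Hypothetical serialized hit carrying the two new optional fields.
fn example_hit() -> serde_json::Value {
    serde_json::json!({
        "id": 852,
        "title": "Gläss",
        "_rankingScore": 750,
        "_rankingScoreDetails": {
            "words": { "order": 0, "matchingWords": 3, "maxMatchingWords": 4, "score": 750 },
            "typo":  { "order": 1, "typoCount": 0, "maxTypoCount": 1, "score": 1000 }
        }
    })
}
```

`ranking_score_raw` is `#[serde(skip)]`, so the raw `ScoreDetails` never reach the response body; they only feed the multi-search merge strategies.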
#[derive(Serialize, Debug, Clone, PartialEq)]
@ -320,7 +339,8 @@ pub fn perform_search(
search.sort_criteria(sort);
}
let milli::SearchResult { documents_ids, matching_words, candidates, .. } = search.execute()?;
let milli::SearchResult { documents_ids, matching_words, candidates, document_scores, .. } =
search.execute()?;
let fields_ids_map = index.fields_ids_map(&rtxn).unwrap();
@ -392,7 +412,7 @@ pub fn perform_search(
let documents_iter = index.documents(&rtxn, documents_ids)?;
for (_id, obkv) in documents_iter {
for ((_id, obkv), score) in documents_iter.into_iter().zip(document_scores.into_iter()) {
// First generate a document with all the displayed fields
let displayed_document = make_document(&displayed_ids, &fields_ids_map, obkv)?;
@ -416,7 +436,19 @@ pub fn perform_search(
insert_geo_distance(sort, &mut document);
}
let hit = SearchHit { document, formatted, matches_position };
let ranking_score =
query.show_ranking_score.then(|| ScoreDetails::global_score_linear_scale(score.iter()));
let ranking_score_details =
query.show_ranking_score_details.then(|| ScoreDetails::to_json_map(score.iter()));
let hit = SearchHit {
document,
formatted,
matches_position,
ranking_score_details,
ranking_score,
ranking_score_raw: score,
};
documents.push(hit);
}

View File

@ -1,3 +1,4 @@
use insta::{allow_duplicates, assert_json_snapshot};
use serde_json::json;
use super::*;
@ -18,30 +19,45 @@ async fn formatted_contain_wildcard() {
|response, code|
{
assert_eq!(code, 200, "{}", response);
assert_eq!(
response["hits"][0],
json!({
"_formatted": {
"id": "852",
"cattos": "<em>pésti</em>",
},
"_matchesPosition": {"cattos": [{"start": 0, "length": 5}]},
})
);
}
allow_duplicates! {
assert_json_snapshot!(response["hits"][0],
{ "._rankingScore" => "[score]" },
@r###"
{
"_formatted": {
"id": "852",
"cattos": "<em>pésti</em>"
},
"_matchesPosition": {
"cattos": [
{
"start": 0,
"length": 5
}
]
},
"_rankingScore": "[score]"
}
"###);
}
}
)
.await;
index
.search(json!({ "q": "pésti", "attributesToRetrieve": ["*"] }), |response, code| {
assert_eq!(code, 200, "{}", response);
assert_eq!(
response["hits"][0],
json!({
"id": 852,
"cattos": "pésti",
})
);
allow_duplicates! {
assert_json_snapshot!(response["hits"][0],
{ "._rankingScore" => "[score]" },
@r###"
{
"id": 852,
"cattos": "pésti",
"_rankingScore": "[score]"
}
"###)
}
})
.await;
@ -50,20 +66,30 @@ async fn formatted_contain_wildcard() {
json!({ "q": "pésti", "attributesToRetrieve": ["*"], "attributesToHighlight": ["id"], "showMatchesPosition": true }),
|response, code| {
assert_eq!(code, 200, "{}", response);
assert_eq!(
response["hits"][0],
json!({
"id": 852,
"cattos": "pésti",
"_formatted": {
"id": "852",
"cattos": "pésti",
},
"_matchesPosition": {"cattos": [{"start": 0, "length": 5}]},
})
);
}
)
allow_duplicates! {
assert_json_snapshot!(response["hits"][0],
{ "._rankingScore" => "[score]" },
@r###"
{
"id": 852,
"cattos": "pésti",
"_formatted": {
"id": "852",
"cattos": "pésti"
},
"_matchesPosition": {
"cattos": [
{
"start": 0,
"length": 5
}
]
},
"_rankingScore": "[score]"
}
"###)
}
})
.await;
index
@ -71,17 +97,21 @@ async fn formatted_contain_wildcard() {
json!({ "q": "pésti", "attributesToRetrieve": ["*"], "attributesToCrop": ["*"] }),
|response, code| {
assert_eq!(code, 200, "{}", response);
assert_eq!(
response["hits"][0],
json!({
"id": 852,
"cattos": "pésti",
"_formatted": {
"id": "852",
"cattos": "pésti",
}
})
);
allow_duplicates! {
assert_json_snapshot!(response["hits"][0],
{ "._rankingScore" => "[score]" },
@r###"
{
"id": 852,
"cattos": "pésti",
"_formatted": {
"id": "852",
"cattos": "pésti"
},
"_rankingScore": "[score]"
}
"###);
}
},
)
.await;
@ -89,17 +119,21 @@ async fn formatted_contain_wildcard() {
index
.search(json!({ "q": "pésti", "attributesToCrop": ["*"] }), |response, code| {
assert_eq!(code, 200, "{}", response);
assert_eq!(
response["hits"][0],
json!({
"id": 852,
"cattos": "pésti",
"_formatted": {
"id": "852",
"cattos": "pésti",
}
})
);
allow_duplicates! {
assert_json_snapshot!(response["hits"][0],
{ "._rankingScore" => "[score]" },
@r###"
{
"id": 852,
"cattos": "pésti",
"_formatted": {
"id": "852",
"cattos": "pésti"
},
"_rankingScore": "[score]"
}
"###)
}
})
.await;
}
@ -116,21 +150,25 @@ async fn format_nested() {
index
.search(json!({ "q": "pésti", "attributesToRetrieve": ["doggos"] }), |response, code| {
assert_eq!(code, 200, "{}", response);
assert_eq!(
response["hits"][0],
json!({
"doggos": [
{
"name": "bobby",
"age": 2,
},
{
"name": "buddy",
"age": 4,
},
],
})
);
allow_duplicates! {
assert_json_snapshot!(response["hits"][0],
{ "._rankingScore" => "[score]" },
@r###"
{
"doggos": [
{
"name": "bobby",
"age": 2
},
{
"name": "buddy",
"age": 4
}
],
"_rankingScore": "[score]"
}
"###)
}
})
.await;
@ -139,19 +177,23 @@ async fn format_nested() {
json!({ "q": "pésti", "attributesToRetrieve": ["doggos.name"] }),
|response, code| {
assert_eq!(code, 200, "{}", response);
assert_eq!(
response["hits"][0],
json!({
"doggos": [
{
"name": "bobby",
},
{
"name": "buddy",
},
],
})
);
allow_duplicates! {
assert_json_snapshot!(response["hits"][0],
{ "._rankingScore" => "[score]" },
@r###"
{
"doggos": [
{
"name": "bobby"
},
{
"name": "buddy"
}
],
"_rankingScore": "[score]"
}
"###)
}
},
)
.await;
@ -161,20 +203,31 @@ async fn format_nested() {
json!({ "q": "bobby", "attributesToRetrieve": ["doggos.name"], "showMatchesPosition": true }),
|response, code| {
assert_eq!(code, 200, "{}", response);
assert_eq!(
response["hits"][0],
json!({
"doggos": [
{
"name": "bobby",
},
{
"name": "buddy",
},
],
"_matchesPosition": {"doggos.name": [{"start": 0, "length": 5}]},
})
);
allow_duplicates! {
assert_json_snapshot!(response["hits"][0],
{ "._rankingScore" => "[score]" },
@r###"
{
"doggos": [
{
"name": "bobby"
},
{
"name": "buddy"
}
],
"_matchesPosition": {
"doggos.name": [
{
"start": 0,
"length": 5
}
]
},
"_rankingScore": "[score]"
}
"###)
}
}
)
.await;
@ -183,21 +236,25 @@ async fn format_nested() {
.search(json!({ "q": "pésti", "attributesToRetrieve": [], "attributesToHighlight": ["doggos.name"] }),
|response, code| {
assert_eq!(code, 200, "{}", response);
assert_eq!(
response["hits"][0],
json!({
"_formatted": {
"doggos": [
{
"name": "bobby",
},
{
"name": "buddy",
},
],
},
})
);
allow_duplicates! {
assert_json_snapshot!(response["hits"][0],
{ "._rankingScore" => "[score]" },
@r###"
{
"_formatted": {
"doggos": [
{
"name": "bobby"
},
{
"name": "buddy"
}
]
},
"_rankingScore": "[score]"
}
"###)
}
})
.await;
@ -205,21 +262,25 @@ async fn format_nested() {
.search(json!({ "q": "pésti", "attributesToRetrieve": [], "attributesToCrop": ["doggos.name"] }),
|response, code| {
assert_eq!(code, 200, "{}", response);
assert_eq!(
response["hits"][0],
json!({
"_formatted": {
"doggos": [
{
"name": "bobby",
},
{
"name": "buddy",
},
],
},
})
);
allow_duplicates! {
assert_json_snapshot!(response["hits"][0],
{ "._rankingScore" => "[score]" },
@r###"
{
"_formatted": {
"doggos": [
{
"name": "bobby"
},
{
"name": "buddy"
}
]
},
"_rankingScore": "[score]"
}
"###)
}
})
.await;
@ -227,55 +288,63 @@ async fn format_nested() {
.search(json!({ "q": "pésti", "attributesToRetrieve": ["doggos.name"], "attributesToHighlight": ["doggos.age"] }),
|response, code| {
assert_eq!(code, 200, "{}", response);
assert_eq!(
response["hits"][0],
json!({
"doggos": [
{
"name": "bobby",
},
{
"name": "buddy",
},
],
"_formatted": {
"doggos": [
{
"name": "bobby",
"age": "2",
},
{
"name": "buddy",
"age": "4",
},
],
allow_duplicates! {
assert_json_snapshot!(response["hits"][0],
{ "._rankingScore" => "[score]" },
@r###"
{
"doggos": [
{
"name": "bobby"
},
})
);
})
{
"name": "buddy"
}
],
"_formatted": {
"doggos": [
{
"name": "bobby",
"age": "2"
},
{
"name": "buddy",
"age": "4"
}
]
},
"_rankingScore": "[score]"
}
"###)
}
})
.await;
index
.search(json!({ "q": "pésti", "attributesToRetrieve": [], "attributesToHighlight": ["doggos.age"], "attributesToCrop": ["doggos.name"] }),
|response, code| {
assert_eq!(code, 200, "{}", response);
assert_eq!(
response["hits"][0],
json!({
"_formatted": {
"doggos": [
allow_duplicates! {
assert_json_snapshot!(response["hits"][0],
{ "._rankingScore" => "[score]" },
@r###"
{
"name": "bobby",
"age": "2",
},
{
"name": "buddy",
"age": "4",
},
],
},
})
);
"_formatted": {
"doggos": [
{
"name": "bobby",
"age": "2"
},
{
"name": "buddy",
"age": "4"
}
]
},
"_rankingScore": "[score]"
}
"###)
}
}
)
.await;
@ -297,54 +366,70 @@ async fn displayedattr_2_smol() {
.search(json!({ "attributesToRetrieve": ["father", "id"], "attributesToHighlight": ["mother"], "attributesToCrop": ["cattos"] }),
|response, code| {
assert_eq!(code, 200, "{}", response);
assert_eq!(
response["hits"][0],
json!({
"id": 852,
})
);
allow_duplicates! {
assert_json_snapshot!(response["hits"][0],
{ "._rankingScore" => "[score]" },
@r###"
{
"id": 852,
"_rankingScore": "[score]"
}
"###)
}
})
.await;
index
.search(json!({ "attributesToRetrieve": ["id"] }), |response, code| {
assert_eq!(code, 200, "{}", response);
assert_eq!(
response["hits"][0],
json!({
"id": 852,
})
);
allow_duplicates! {
assert_json_snapshot!(response["hits"][0],
{ "._rankingScore" => "[score]" },
@r###"
{
"id": 852,
"_rankingScore": "[score]"
}
"###)
}
})
.await;
index
.search(json!({ "attributesToHighlight": ["id"] }), |response, code| {
assert_eq!(code, 200, "{}", response);
assert_eq!(
response["hits"][0],
json!({
"id": 852,
"_formatted": {
"id": "852",
}
})
);
allow_duplicates! {
assert_json_snapshot!(response["hits"][0],
{ "._rankingScore" => "[score]" },
@r###"
{
"id": 852,
"_formatted": {
"id": "852"
},
"_rankingScore": "[score]"
}
"###)
}
})
.await;
index
.search(json!({ "attributesToCrop": ["id"] }), |response, code| {
assert_eq!(code, 200, "{}", response);
assert_eq!(
response["hits"][0],
json!({
"id": 852,
"_formatted": {
"id": "852",
}
})
);
allow_duplicates! {
assert_json_snapshot!(response["hits"][0],
{ "._rankingScore" => "[score]" },
@r###"
{
"id": 852,
"_formatted": {
"id": "852"
},
"_rankingScore": "[score]"
}
"###)
}
})
.await;
@ -353,15 +438,19 @@ async fn displayedattr_2_smol() {
json!({ "attributesToHighlight": ["id"], "attributesToCrop": ["id"] }),
|response, code| {
assert_eq!(code, 200, "{}", response);
assert_eq!(
response["hits"][0],
json!({
"id": 852,
"_formatted": {
"id": "852",
}
})
);
allow_duplicates! {
assert_json_snapshot!(response["hits"][0],
{ "._rankingScore" => "[score]" },
@r###"
{
"id": 852,
"_formatted": {
"id": "852"
},
"_rankingScore": "[score]"
}
"###)
}
},
)
.await;
@ -369,31 +458,47 @@ async fn displayedattr_2_smol() {
index
.search(json!({ "attributesToHighlight": ["cattos"] }), |response, code| {
assert_eq!(code, 200, "{}", response);
assert_eq!(
response["hits"][0],
json!({
"id": 852,
})
);
allow_duplicates! {
assert_json_snapshot!(response["hits"][0],
{ "._rankingScore" => "[score]" },
@r###"
{
"id": 852,
"_rankingScore": "[score]"
}
"###)
}
})
.await;
index
.search(json!({ "attributesToCrop": ["cattos"] }), |response, code| {
assert_eq!(code, 200, "{}", response);
assert_eq!(
response["hits"][0],
json!({
"id": 852,
})
);
allow_duplicates! {
assert_json_snapshot!(response["hits"][0],
{ "._rankingScore" => "[score]" },
@r###"
{
"id": 852,
"_rankingScore": "[score]"
}
"###)
}
})
.await;
index
.search(json!({ "attributesToRetrieve": ["cattos"] }), |response, code| {
assert_eq!(code, 200, "{}", response);
assert_eq!(response["hits"][0], json!({}));
allow_duplicates! {
assert_json_snapshot!(response["hits"][0],
{ "._rankingScore" => "[score]" },
@r###"
{
"_rankingScore": "[score]"
}
"###)
}
})
.await;
@ -402,7 +507,15 @@ async fn displayedattr_2_smol() {
json!({ "attributesToRetrieve": ["cattos"], "attributesToHighlight": ["cattos"], "attributesToCrop": ["cattos"] }),
|response, code| {
assert_eq!(code, 200, "{}", response);
assert_eq!(response["hits"][0], json!({}));
allow_duplicates! {
assert_json_snapshot!(response["hits"][0],
{ "._rankingScore" => "[score]" },
@r###"
{
"_rankingScore": "[score]"
}
"###)
}
}
)
@ -413,14 +526,18 @@ async fn displayedattr_2_smol() {
json!({ "attributesToRetrieve": ["cattos"], "attributesToHighlight": ["id"] }),
|response, code| {
assert_eq!(code, 200, "{}", response);
assert_eq!(
response["hits"][0],
json!({
"_formatted": {
"id": "852",
}
})
);
allow_duplicates! {
assert_json_snapshot!(response["hits"][0],
{ "._rankingScore" => "[score]" },
@r###"
{
"_formatted": {
"id": "852"
},
"_rankingScore": "[score]"
}
"###)
}
},
)
.await;
@ -430,14 +547,18 @@ async fn displayedattr_2_smol() {
json!({ "attributesToRetrieve": ["cattos"], "attributesToCrop": ["id"] }),
|response, code| {
assert_eq!(code, 200, "{}", response);
assert_eq!(
response["hits"][0],
json!({
"_formatted": {
"id": "852",
}
})
);
allow_duplicates! {
assert_json_snapshot!(response["hits"][0],
{ "._rankingScore" => "[score]" },
@r###"
{
"_formatted": {
"id": "852"
},
"_rankingScore": "[score]"
}
"###)
}
},
)
.await;

View File

@ -65,14 +65,15 @@ async fn simple_search_single_index() {
]}))
.await;
snapshot!(code, @"200 OK");
insta::assert_json_snapshot!(response["results"], { "[].processingTimeMs" => "[time]" }, @r###"
insta::assert_json_snapshot!(response["results"], { "[].processingTimeMs" => "[time]", ".**._rankingScore" => "[score]" }, @r###"
[
{
"indexUid": "test",
"hits": [
{
"title": "Gläss",
"id": "450465"
"id": "450465",
"_rankingScore": "[score]"
}
],
"query": "glass",
@ -86,7 +87,8 @@ async fn simple_search_single_index() {
"hits": [
{
"title": "Captain Marvel",
"id": "299537"
"id": "299537",
"_rankingScore": "[score]"
}
],
"query": "captain",
@ -170,14 +172,15 @@ async fn simple_search_two_indexes() {
]}))
.await;
snapshot!(code, @"200 OK");
insta::assert_json_snapshot!(response["results"], { "[].processingTimeMs" => "[time]" }, @r###"
insta::assert_json_snapshot!(response["results"], { "[].processingTimeMs" => "[time]", ".**._rankingScore" => "[score]" }, @r###"
[
{
"indexUid": "test",
"hits": [
{
"title": "Gläss",
"id": "450465"
"id": "450465",
"_rankingScore": "[score]"
}
],
"query": "glass",
@ -203,7 +206,8 @@ async fn simple_search_two_indexes() {
"age": 4
}
],
"cattos": "pésti"
"cattos": "pésti",
"_rankingScore": "[score]"
},
{
"id": 654,
@ -218,7 +222,8 @@ async fn simple_search_two_indexes() {
"cattos": [
"simba",
"pestiféré"
]
],
"_rankingScore": "[score]"
}
],
"query": "pésti",

View File

@ -97,7 +97,7 @@ async fn task_bad_types() {
snapshot!(code, @"400 Bad Request");
snapshot!(json_string!(response), @r###"
{
"message": "Invalid value in parameter `types`: `doggo` is not a valid task type. Available types are `documentAdditionOrUpdate`, `documentDeletion`, `documentDeletionByFilter`, `settingsUpdate`, `indexCreation`, `indexDeletion`, `indexUpdate`, `indexSwap`, `taskCancelation`, `taskDeletion`, `dumpCreation`, `snapshotCreation`.",
"message": "Invalid value in parameter `types`: `doggo` is not a valid task type. Available types are `documentAdditionOrUpdate`, `documentDeletion`, `settingsUpdate`, `indexCreation`, `indexDeletion`, `indexUpdate`, `indexSwap`, `taskCancelation`, `taskDeletion`, `dumpCreation`, `snapshotCreation`.",
"code": "invalid_task_types",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_task_types"
@ -108,7 +108,7 @@ async fn task_bad_types() {
snapshot!(code, @"400 Bad Request");
snapshot!(json_string!(response), @r###"
{
"message": "Invalid value in parameter `types`: `doggo` is not a valid task type. Available types are `documentAdditionOrUpdate`, `documentDeletion`, `documentDeletionByFilter`, `settingsUpdate`, `indexCreation`, `indexDeletion`, `indexUpdate`, `indexSwap`, `taskCancelation`, `taskDeletion`, `dumpCreation`, `snapshotCreation`.",
"message": "Invalid value in parameter `types`: `doggo` is not a valid task type. Available types are `documentAdditionOrUpdate`, `documentDeletion`, `settingsUpdate`, `indexCreation`, `indexDeletion`, `indexUpdate`, `indexSwap`, `taskCancelation`, `taskDeletion`, `dumpCreation`, `snapshotCreation`.",
"code": "invalid_task_types",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_task_types"
@ -119,7 +119,7 @@ async fn task_bad_types() {
snapshot!(code, @"400 Bad Request");
snapshot!(json_string!(response), @r###"
{
"message": "Invalid value in parameter `types`: `doggo` is not a valid task type. Available types are `documentAdditionOrUpdate`, `documentDeletion`, `documentDeletionByFilter`, `settingsUpdate`, `indexCreation`, `indexDeletion`, `indexUpdate`, `indexSwap`, `taskCancelation`, `taskDeletion`, `dumpCreation`, `snapshotCreation`.",
"message": "Invalid value in parameter `types`: `doggo` is not a valid task type. Available types are `documentAdditionOrUpdate`, `documentDeletion`, `settingsUpdate`, `indexCreation`, `indexDeletion`, `indexUpdate`, `indexSwap`, `taskCancelation`, `taskDeletion`, `dumpCreation`, `snapshotCreation`.",
"code": "invalid_task_types",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_task_types"

View File

@ -49,7 +49,7 @@ impl CboRoaringBitmapCodec {
} else {
// Otherwise, it means we used the classic RoaringBitmapCodec and
// that the header takes threshold integers.
RoaringBitmap::deserialize_from(bytes)
RoaringBitmap::deserialize_unchecked_from(bytes)
}
}
@ -69,7 +69,7 @@ impl CboRoaringBitmapCodec {
vec.push(integer);
}
} else {
roaring |= RoaringBitmap::deserialize_from(bytes.as_ref())?;
roaring |= RoaringBitmap::deserialize_unchecked_from(bytes.as_ref())?;
}
}

View File

@ -8,7 +8,7 @@ impl heed::BytesDecode<'_> for RoaringBitmapCodec {
type DItem = RoaringBitmap;
fn bytes_decode(bytes: &[u8]) -> Option<Self::DItem> {
RoaringBitmap::deserialize_from(bytes).ok()
RoaringBitmap::deserialize_unchecked_from(bytes).ok()
}
}
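Both codecs above now decode with the unchecked variant. A sketch of the trade-off, assuming the roaring 0.10 API: the unchecked function skips validation of the serialized containers, which is reasonable when the bytes were written by this same codec and stored in LMDB, and it avoids paying the validation cost on every read:

```rust
use roaring::RoaringBitmap;

// Hypothetical round trip showing the serialize/deserialize pair involved.
fn roundtrip() -> std::io::Result<RoaringBitmap> {
    let bitmap: RoaringBitmap = (0..10u32).collect();
    let mut bytes = Vec::new();
    bitmap.serialize_into(&mut bytes)?;
    // Trusts the input instead of validating it, hence the speed-up.
    RoaringBitmap::deserialize_unchecked_from(&bytes[..])
}
```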

View File

@ -21,10 +21,9 @@ use crate::heed_codec::facet::{
};
use crate::heed_codec::{ScriptLanguageCodec, StrBEU16Codec, StrRefCodec};
use crate::{
default_criteria, BEU32StrCodec, BoRoaringBitmapCodec, CboRoaringBitmapCodec, Criterion,
DocumentId, ExternalDocumentsIds, FacetDistribution, FieldDistribution, FieldId,
FieldIdWordCountCodec, GeoPoint, ObkvCodec, Result, RoaringBitmapCodec, RoaringBitmapLenCodec,
Search, U8StrStrCodec, BEU16, BEU32,
default_criteria, CboRoaringBitmapCodec, Criterion, DocumentId, ExternalDocumentsIds,
FacetDistribution, FieldDistribution, FieldId, FieldIdWordCountCodec, GeoPoint, ObkvCodec,
Result, RoaringBitmapCodec, RoaringBitmapLenCodec, Search, U8StrStrCodec, BEU16, BEU32,
};
pub const DEFAULT_MIN_WORD_LEN_ONE_TYPO: u8 = 5;
@ -111,9 +110,6 @@ pub struct Index {
/// A prefix of word and all the documents ids containing this prefix, from attributes for which typos are not allowed.
pub exact_word_prefix_docids: Database<Str, RoaringBitmapCodec>,
/// Maps a word and a document id (u32) to all the positions where the given word appears.
pub docid_word_positions: Database<BEU32StrCodec, BoRoaringBitmapCodec>,
/// Maps the proximity between a pair of words with all the docids where this relation appears.
pub word_pair_proximity_docids: Database<U8StrStrCodec, CboRoaringBitmapCodec>,
/// Maps the proximity between a pair of word and prefix with all the docids where this relation appears.
@ -177,7 +173,6 @@ impl Index {
let word_prefix_docids = env.create_database(&mut wtxn, Some(WORD_PREFIX_DOCIDS))?;
let exact_word_prefix_docids =
env.create_database(&mut wtxn, Some(EXACT_WORD_PREFIX_DOCIDS))?;
let docid_word_positions = env.create_database(&mut wtxn, Some(DOCID_WORD_POSITIONS))?;
let word_pair_proximity_docids =
env.create_database(&mut wtxn, Some(WORD_PAIR_PROXIMITY_DOCIDS))?;
let script_language_docids =
@ -220,7 +215,6 @@ impl Index {
exact_word_docids,
word_prefix_docids,
exact_word_prefix_docids,
docid_word_positions,
word_pair_proximity_docids,
script_language_docids,
word_prefix_pair_proximity_docids,
@ -2494,8 +2488,12 @@ pub(crate) mod tests {
let rtxn = index.read_txn().unwrap();
let search = Search::new(&rtxn, &index);
let SearchResult { matching_words: _, candidates: _, mut documents_ids } =
search.execute().unwrap();
let SearchResult {
matching_words: _,
candidates: _,
document_scores: _,
mut documents_ids,
} = search.execute().unwrap();
let primary_key_id = index.fields_ids_map(&rtxn).unwrap().id("primary_key").unwrap();
documents_ids.sort_unstable();
let docs = index.documents(&rtxn, documents_ids).unwrap();

View File

@ -5,52 +5,6 @@
#[global_allocator]
pub static ALLOC: mimalloc::MiMalloc = mimalloc::MiMalloc;
// #[cfg(test)]
// pub mod allocator {
// use std::alloc::{GlobalAlloc, System};
// use std::sync::atomic::{self, AtomicI64};
// #[global_allocator]
// pub static ALLOC: CountingAlloc = CountingAlloc {
// max_resident: AtomicI64::new(0),
// resident: AtomicI64::new(0),
// allocated: AtomicI64::new(0),
// };
// pub struct CountingAlloc {
// pub max_resident: AtomicI64,
// pub resident: AtomicI64,
// pub allocated: AtomicI64,
// }
// unsafe impl GlobalAlloc for CountingAlloc {
// unsafe fn alloc(&self, layout: std::alloc::Layout) -> *mut u8 {
// self.allocated.fetch_add(layout.size() as i64, atomic::Ordering::SeqCst);
// let old_resident =
// self.resident.fetch_add(layout.size() as i64, atomic::Ordering::SeqCst);
// let resident = old_resident + layout.size() as i64;
// self.max_resident.fetch_max(resident, atomic::Ordering::SeqCst);
// // if layout.size() > 1_000_000 {
// // eprintln!(
// // "allocating {} with new resident size: {resident}",
// // layout.size() / 1_000_000
// // );
// // // let trace = std::backtrace::Backtrace::capture();
// // // let t = trace.to_string();
// // // eprintln!("{t}");
// // }
// System.alloc(layout)
// }
// unsafe fn dealloc(&self, ptr: *mut u8, layout: std::alloc::Layout) {
// self.resident.fetch_sub(layout.size() as i64, atomic::Ordering::Relaxed);
// System.dealloc(ptr, layout)
// }
// }
// }
#[macro_use]
pub mod documents;
@ -63,6 +17,7 @@ mod fields_ids_map;
pub mod heed_codec;
pub mod index;
pub mod proximity;
pub mod score_details;
mod search;
pub mod update;

milli/src/score_details.rs (new file, 544 lines)
View File

@ -0,0 +1,544 @@
use std::cmp::Ordering;
use serde::Serialize;
use crate::distance_between_two_points;
#[derive(Debug, Clone, PartialEq)]
pub enum ScoreDetails {
Words(Words),
Typo(Typo),
Proximity(Rank),
Fid(Rank),
Position(Rank),
ExactAttribute(ExactAttribute),
Exactness(Rank),
Sort(Sort),
GeoSort(GeoSort),
}
impl PartialOrd for ScoreDetails {
fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
use ScoreDetails::*;
match (self, other) {
// matching left and right hands => defer to sub impl
(Words(left), Words(right)) => left.partial_cmp(right),
(Typo(left), Typo(right)) => left.partial_cmp(right),
(Proximity(left), Proximity(right)) => left.partial_cmp(right),
(Fid(left), Fid(right)) => left.partial_cmp(right),
(Position(left), Position(right)) => left.partial_cmp(right),
(ExactAttribute(left), ExactAttribute(right)) => left.partial_cmp(right),
(Exactness(left), Exactness(right)) => left.partial_cmp(right),
(Sort(left), Sort(right)) => left.partial_cmp(right),
(GeoSort(left), GeoSort(right)) => left.partial_cmp(right),
// non matching left and right hands => None
// written this way rather than with a single `_` arm, so that adding a new variant
// still results in a compile error
(Words(_), _) => None,
(Typo(_), _) => None,
(Proximity(_), _) => None,
(Fid(_), _) => None,
(Position(_), _) => None,
(ExactAttribute(_), _) => None,
(Exactness(_), _) => None,
(Sort(_), _) => None,
(GeoSort(_), _) => None,
}
}
}
impl ScoreDetails {
pub fn local_score(&self) -> Option<f64> {
self.rank().map(Rank::local_score)
}
pub fn rank(&self) -> Option<Rank> {
match self {
ScoreDetails::Words(details) => Some(details.rank()),
ScoreDetails::Typo(details) => Some(details.rank()),
ScoreDetails::Proximity(details) => Some(*details),
ScoreDetails::Fid(details) => Some(*details),
ScoreDetails::Position(details) => Some(*details),
ScoreDetails::ExactAttribute(details) => Some(details.rank()),
ScoreDetails::Exactness(details) => Some(*details),
ScoreDetails::Sort(_) => None,
ScoreDetails::GeoSort(_) => None,
}
}
pub fn global_score<'a>(details: impl Iterator<Item = &'a Self>) -> f64 {
Rank::global_score(details.filter_map(Self::rank))
}
pub fn global_score_linear_scale<'a>(details: impl Iterator<Item = &'a Self>) -> u64 {
(Self::global_score(details) * LINEAR_SCALE_FACTOR).round() as u64
}
/// Panics
///
/// - If Position is not preceded by Fid
/// - If Exactness is not preceded by ExactAttribute
/// - If a sort fid is not contained in the passed `fields_ids_map`.
pub fn to_json_map<'a>(
details: impl Iterator<Item = &'a Self>,
) -> serde_json::Map<String, serde_json::Value> {
let mut order = 0;
let mut details_map = serde_json::Map::default();
for details in details {
match details {
ScoreDetails::Words(words) => {
let words_details = serde_json::json!({
"order": order,
"matchingWords": words.matching_words,
"maxMatchingWords": words.max_matching_words,
"score": words.rank().local_score_linear_scale(),
});
details_map.insert("words".into(), words_details);
order += 1;
}
ScoreDetails::Typo(typo) => {
let typo_details = serde_json::json!({
"order": order,
"typoCount": typo.typo_count,
"maxTypoCount": typo.max_typo_count,
"score": typo.rank().local_score_linear_scale(),
});
details_map.insert("typo".into(), typo_details);
order += 1;
}
ScoreDetails::Proximity(proximity) => {
let proximity_details = serde_json::json!({
"order": order,
"score": proximity.local_score_linear_scale(),
});
details_map.insert("proximity".into(), proximity_details);
order += 1;
}
ScoreDetails::Fid(fid) => {
// For now, fid is a virtual rule always followed by the "position" rule
let fid_details = serde_json::json!({
"order": order,
"attributes_ranking_order": fid.local_score_linear_scale(),
});
details_map.insert("attribute".into(), fid_details);
order += 1;
}
ScoreDetails::Position(position) => {
// For now, position is a virtual rule always preceded by the "fid" rule
let attribute_details = details_map
.get_mut("attribute")
.expect("position not preceded by attribute");
let attribute_details = attribute_details
.as_object_mut()
.expect("attribute details was not an object");
attribute_details.insert(
"attributes_query_word_order".into(),
position.local_score_linear_scale().into(),
);
// do not update the order since this was already done by fid
}
ScoreDetails::ExactAttribute(exact_attribute) => {
let exactness_details = serde_json::json!({
"order": order,
"exactIn": exact_attribute,
"score": exact_attribute.rank().local_score_linear_scale(),
});
details_map.insert("exactness".into(), exactness_details);
order += 1;
}
ScoreDetails::Exactness(details) => {
// For now, exactness is a virtual rule always preceded by the "ExactAttribute" rule
let exactness_details = details_map
.get_mut("exactness")
.expect("Exactness not preceded by exactAttribute");
let exactness_details = exactness_details
.as_object_mut()
.expect("exactness details was not an object");
if exactness_details.get("exactIn").expect("missing 'exactIn'")
== &serde_json::json!(ExactAttribute::NoExactMatch)
{
let score = Rank::global_score_linear_scale(
[ExactAttribute::NoExactMatch.rank(), *details].iter().copied(),
);
*exactness_details.get_mut("score").expect("missing score") = score.into();
}
// do not update the order since this was already done by exactAttribute
}
ScoreDetails::Sort(details) => {
let sort = format!(
"{}:{}",
details.field_name,
if details.ascending { "asc" } else { "desc" }
);
let sort_details = serde_json::json!({
"order": order,
"value": details.value,
});
details_map.insert(sort, sort_details);
order += 1;
}
ScoreDetails::GeoSort(details) => {
let sort = format!(
"_geoPoint({}, {}):{}",
details.target_point[0],
details.target_point[1],
if details.ascending { "asc" } else { "desc" }
);
let point = if let Some(value) = details.value {
serde_json::json!({ "lat": value[0], "lng": value[1]})
} else {
serde_json::Value::Null
};
let sort_details = serde_json::json!({
"order": order,
"value": point,
"distance": details.distance(),
});
details_map.insert(sort, sort_details);
order += 1;
}
}
}
details_map
}
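A sketch of the map this produces for a simple two-rule score, using the key names emitted above and `LINEAR_SCALE_FACTOR` = 1000 (matching 3 of 4 words scores 750; one typo out of a maximum of two gives rank 2/3, i.e. 667):

```rust
// Illustrative only; the assertions follow from the arithmetic above.
fn example_detail_map() -> serde_json::Map<String, serde_json::Value> {
    let details = [
        ScoreDetails::Words(Words { matching_words: 3, max_matching_words: 4 }),
        ScoreDetails::Typo(Typo { typo_count: 1, max_typo_count: 2 }),
    ];
    let map = ScoreDetails::to_json_map(details.iter());
    assert_eq!(map["words"]["score"], 750);
    assert_eq!(map["typo"]["score"], 667);
    map
}
```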
pub fn partial_cmp_iter<'a>(
mut left: impl Iterator<Item = &'a Self>,
mut right: impl Iterator<Item = &'a Self>,
) -> Result<Ordering, NotComparable> {
let mut index = 0;
let mut order = match (left.next(), right.next()) {
(Some(left), Some(right)) => left.partial_cmp(right).incomparable(index)?,
_ => return Ok(Ordering::Equal),
};
for (left, right) in left.zip(right) {
if order != Ordering::Equal {
return Ok(order);
};
index += 1;
order = left.partial_cmp(right).incomparable(index)?;
}
Ok(order)
}
}
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash)]
pub struct NotComparable(pub usize);
trait OptionToNotComparable<T> {
fn incomparable(self, index: usize) -> Result<T, NotComparable>;
}
impl<T> OptionToNotComparable<T> for Option<T> {
fn incomparable(self, index: usize) -> Result<T, NotComparable> {
match self {
Some(t) => Ok(t),
None => Err(NotComparable(index)),
}
}
}
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct Words {
pub matching_words: u32,
pub max_matching_words: u32,
}
impl PartialOrd for Words {
fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
(self.max_matching_words == other.max_matching_words)
.then(|| self.matching_words.cmp(&other.matching_words))
}
}
impl Words {
pub fn rank(&self) -> Rank {
Rank { rank: self.matching_words, max_rank: self.max_matching_words }
}
pub(crate) fn from_rank(rank: Rank) -> Words {
Words { matching_words: rank.rank, max_matching_words: rank.max_rank }
}
}
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct Typo {
pub typo_count: u32,
pub max_typo_count: u32,
}
impl PartialOrd for Typo {
fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
(self.max_typo_count == other.max_typo_count).then(|| {
// the order is reversed, as having fewer typos gives a better score
self.typo_count.cmp(&other.typo_count).reverse()
})
}
}
impl Typo {
pub fn rank(&self) -> Rank {
Rank {
rank: self.max_typo_count - self.typo_count + 1,
max_rank: (self.max_typo_count + 1),
}
}
// max_rank = max_typo + 1
// max_typo = max_rank - 1
//
// rank = max_typo - typo + 1
// rank = max_rank - 1 - typo + 1
// rank + typo = max_rank
// typo = max_rank - rank
pub fn from_rank(rank: Rank) -> Typo {
Typo { typo_count: rank.max_rank - rank.rank, max_typo_count: rank.max_rank - 1 }
}
}
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct Rank {
/// The ordinal rank, such that `max_rank` is the first rank, and 0 is the last rank.
///
/// The higher the better. Documents with a rank of 0 have a score of 0 and are typically never returned
/// (they don't match the query).
pub rank: u32,
/// The maximum possible rank. Documents with this rank have a score of 1.
///
/// The max rank should not be 0.
pub max_rank: u32,
}
impl PartialOrd for Rank {
fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
(self.max_rank == other.max_rank).then(|| self.rank.cmp(&other.rank))
}
}
impl Rank {
pub fn local_score(self) -> f64 {
self.rank as f64 / self.max_rank as f64
}
pub fn local_score_linear_scale(self) -> u64 {
(self.local_score() * LINEAR_SCALE_FACTOR).round() as u64
}
pub fn global_score(details: impl Iterator<Item = Self>) -> f64 {
let mut rank = Rank { rank: 1, max_rank: 1 };
for inner_rank in details {
rank.rank -= 1;
rank.rank *= inner_rank.max_rank;
rank.max_rank *= inner_rank.max_rank;
rank.rank += inner_rank.rank;
}
rank.local_score()
}
pub fn global_score_linear_scale(details: impl Iterator<Item = Self>) -> u64 {
(Self::global_score(details) * LINEAR_SCALE_FACTOR).round() as u64
}
}
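A worked example of how `global_score` folds the individual ranks, starting from 1/1 and refining by each rule in turn:

```rust
// With ranks 3/4 then 2/3:
//   after 3/4: rank = (1 - 1) * 4 + 3 = 3,  max_rank = 4
//   after 2/3: rank = (3 - 1) * 3 + 2 = 8,  max_rank = 12
// so the global score is 8 / 12 ≈ 0.667, i.e. 667 on the linear scale.
fn global_score_example() {
    let ranks = [Rank { rank: 3, max_rank: 4 }, Rank { rank: 2, max_rank: 3 }];
    assert_eq!(Rank::global_score_linear_scale(ranks.iter().copied()), 667);
}
```

In other words, earlier ranking rules weigh more: a document can only beat another on a later rule if the two are tied on all the earlier ones.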
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash, Serialize)]
#[serde(rename_all = "camelCase")]
pub enum ExactAttribute {
// Do not reorder as the order is significant, from least relevant to most relevant
NoExactMatch,
MatchesStart,
MatchesFull,
}
impl ExactAttribute {
pub fn rank(&self) -> Rank {
let rank = match self {
ExactAttribute::MatchesFull => 3,
ExactAttribute::MatchesStart => 2,
ExactAttribute::NoExactMatch => 1,
};
Rank { rank, max_rank: 3 }
}
}
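The three variants above therefore map onto ranks 1/3, 2/3 and 3/3, i.e. linear-scale scores of 333, 667 and 1000:

```rust
// Illustrative assertions derived from the rank() mapping above.
fn exact_attribute_scores() {
    assert_eq!(ExactAttribute::NoExactMatch.rank().local_score_linear_scale(), 333);
    assert_eq!(ExactAttribute::MatchesStart.rank().local_score_linear_scale(), 667);
    assert_eq!(ExactAttribute::MatchesFull.rank().local_score_linear_scale(), 1000);
}
```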
#[derive(Debug, Clone, PartialEq)]
pub struct Sort {
pub field_name: String,
pub ascending: bool,
pub value: serde_json::Value,
}
impl PartialOrd for Sort {
fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
if self.field_name != other.field_name {
return None;
}
if self.ascending != other.ascending {
return None;
}
match (&self.value, &other.value) {
(serde_json::Value::Null, serde_json::Value::Null) => Some(Ordering::Equal),
(serde_json::Value::Null, _) => Some(Ordering::Less),
(_, serde_json::Value::Null) => Some(Ordering::Greater),
// numbers are always before strings
(serde_json::Value::Number(_), serde_json::Value::String(_)) => Some(Ordering::Greater),
(serde_json::Value::String(_), serde_json::Value::Number(_)) => Some(Ordering::Less),
(serde_json::Value::Number(left), serde_json::Value::Number(right)) => {
//FIXME: unwrap permitted here?
let order = left.as_f64().unwrap().partial_cmp(&right.as_f64().unwrap())?;
// reversed when ascending, as a smaller value then gives a better score
Some(if self.ascending { order.reverse() } else { order })
}
(serde_json::Value::String(left), serde_json::Value::String(right)) => {
let order = left.cmp(right);
Some(if self.ascending { order.reverse() } else { order })
}
_ => None,
}
}
}
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct GeoSort {
pub target_point: [f64; 2],
pub ascending: bool,
pub value: Option<[f64; 2]>,
}
impl PartialOrd for GeoSort {
fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
if self.target_point != other.target_point {
return None;
}
if self.ascending != other.ascending {
return None;
}
Some(match (self.distance(), other.distance()) {
(None, None) => Ordering::Equal,
(None, Some(_)) => Ordering::Less,
(Some(_), None) => Ordering::Greater,
(Some(left), Some(right)) => {
let order = left.partial_cmp(&right)?;
if self.ascending {
// when ascending, the one with the smallest distance has the best score
order.reverse()
} else {
order
}
}
})
}
}
impl GeoSort {
pub fn distance(&self) -> Option<f64> {
self.value.map(|value| distance_between_two_points(&self.target_point, &value))
}
}
const LINEAR_SCALE_FACTOR: f64 = 1000.0;
#[cfg(test)]
mod test {
use super::*;
#[test]
fn compare() {
let left = [
ScoreDetails::Words(Words { matching_words: 3, max_matching_words: 4 }),
ScoreDetails::Sort(Sort {
field_name: "doggo".into(),
ascending: true,
value: "Intel the Beagle".into(),
}),
];
let right = [
ScoreDetails::Words(Words { matching_words: 3, max_matching_words: 4 }),
ScoreDetails::Sort(Sort {
field_name: "doggo".into(),
ascending: true,
value: "Max the Labrador".into(),
}),
];
assert_eq!(
Ok(Ordering::Greater),
ScoreDetails::partial_cmp_iter(left.iter(), right.iter())
);
// equal when all the common components are equal
assert_eq!(
Ok(Ordering::Equal),
ScoreDetails::partial_cmp_iter(left[0..1].iter(), right.iter())
);
let right = [
ScoreDetails::Words(Words { matching_words: 4, max_matching_words: 4 }),
ScoreDetails::Sort(Sort {
field_name: "doggo".into(),
ascending: true,
value: "Max the Labrador".into(),
}),
];
assert_eq!(Ok(Ordering::Less), ScoreDetails::partial_cmp_iter(left.iter(), right.iter()));
}
#[test]
fn sort_not_comparable() {
let left = [
ScoreDetails::Words(Words { matching_words: 3, max_matching_words: 4 }),
ScoreDetails::Sort(Sort {
// not the same field name
field_name: "catto".into(),
ascending: true,
value: "Sylver the cat".into(),
}),
];
let right = [
ScoreDetails::Words(Words { matching_words: 3, max_matching_words: 4 }),
ScoreDetails::Sort(Sort {
field_name: "doggo".into(),
ascending: true,
value: "Max the Labrador".into(),
}),
];
assert_eq!(
Err(NotComparable(1)),
ScoreDetails::partial_cmp_iter(left.iter(), right.iter())
);
let left = [
ScoreDetails::Words(Words { matching_words: 3, max_matching_words: 4 }),
ScoreDetails::Sort(Sort {
field_name: "doggo".into(),
// Not the same order
ascending: false,
value: "Intel the Beagle".into(),
}),
];
let right = [
ScoreDetails::Words(Words { matching_words: 3, max_matching_words: 4 }),
ScoreDetails::Sort(Sort {
field_name: "doggo".into(),
ascending: true,
value: "Max the Labrador".into(),
}),
];
assert_eq!(
Err(NotComparable(1)),
ScoreDetails::partial_cmp_iter(left.iter(), right.iter())
);
}
#[test]
fn sort_behavior() {
let left = Sort { field_name: "price".into(), ascending: true, value: "5400".into() };
let right = Sort { field_name: "price".into(), ascending: true, value: 53.into() };
// numbers always rank better than strings
assert_eq!(Some(Ordering::Less), left.partial_cmp(&right));
let left = Sort { field_name: "price".into(), ascending: false, value: "5400".into() };
let right = Sort { field_name: "price".into(), ascending: false, value: 53.into() };
// true regardless of the sort direction
assert_eq!(Some(Ordering::Less), left.partial_cmp(&right));
}
}

View File

@ -7,6 +7,7 @@ use roaring::bitmap::RoaringBitmap;
pub use self::facet::{FacetDistribution, Filter, DEFAULT_VALUES_PER_FACET};
pub use self::new::matches::{FormatOptions, MatchBounds, Matcher, MatcherBuilder, MatchingWords};
use self::new::PartialSearchResult;
use crate::score_details::ScoreDetails;
use crate::{
execute_search, AscDesc, DefaultSearchLogger, DocumentId, Index, Result, SearchContext,
};
@ -93,7 +94,7 @@ impl<'a> Search<'a> {
self
}
/// Force the search to exhastivelly compute the number of candidates,
/// Forces the search to exhaustively compute the number of candidates,
/// this will increase the search time but allows finite pagination.
pub fn exhaustive_number_hits(&mut self, exhaustive_number_hits: bool) -> &mut Search<'a> {
self.exhaustive_number_hits = exhaustive_number_hits;
@ -102,7 +103,7 @@ impl<'a> Search<'a> {
pub fn execute(&self) -> Result<SearchResult> {
let mut ctx = SearchContext::new(self.index, self.rtxn);
let PartialSearchResult { located_query_terms, candidates, documents_ids } =
let PartialSearchResult { located_query_terms, candidates, documents_ids, document_scores } =
execute_search(
&mut ctx,
&self.query,
@ -124,7 +125,7 @@ impl<'a> Search<'a> {
None => MatchingWords::default(),
};
Ok(SearchResult { matching_words, candidates, documents_ids })
Ok(SearchResult { matching_words, candidates, document_scores, documents_ids })
}
}
@ -160,8 +161,8 @@ impl fmt::Debug for Search<'_> {
pub struct SearchResult {
pub matching_words: MatchingWords,
pub candidates: RoaringBitmap,
// TODO those documents ids should be associated with their criteria scores.
pub documents_ids: Vec<DocumentId>,
pub document_scores: Vec<Vec<ScoreDetails>>,
}
#[derive(Debug, Clone, Copy, PartialEq, Eq)]

View File

@ -3,11 +3,13 @@ use roaring::RoaringBitmap;
use super::logger::SearchLogger;
use super::ranking_rules::{BoxRankingRule, RankingRuleQueryTrait};
use super::SearchContext;
use crate::score_details::ScoreDetails;
use crate::search::new::distinct::{apply_distinct_rule, distinct_single_docid, DistinctOutput};
use crate::Result;
pub struct BucketSortOutput {
pub docids: Vec<u32>,
pub scores: Vec<Vec<ScoreDetails>>,
pub all_candidates: RoaringBitmap,
}
@ -31,7 +33,11 @@ pub fn bucket_sort<'ctx, Q: RankingRuleQueryTrait>(
};
if universe.len() < from as u64 {
return Ok(BucketSortOutput { docids: vec![], all_candidates: universe.clone() });
return Ok(BucketSortOutput {
docids: vec![],
scores: vec![],
all_candidates: universe.clone(),
});
}
if ranking_rules.is_empty() {
if let Some(distinct_fid) = distinct_fid {
@ -49,22 +55,32 @@ pub fn bucket_sort<'ctx, Q: RankingRuleQueryTrait>(
}
let mut all_candidates = universe - excluded;
all_candidates.extend(results.iter().copied());
return Ok(BucketSortOutput { docids: results, all_candidates });
return Ok(BucketSortOutput {
scores: vec![Default::default(); results.len()],
docids: results,
all_candidates,
});
} else {
let docids = universe.iter().skip(from).take(length).collect();
return Ok(BucketSortOutput { docids, all_candidates: universe.clone() });
let docids: Vec<u32> = universe.iter().skip(from).take(length).collect();
return Ok(BucketSortOutput {
scores: vec![Default::default(); docids.len()],
docids,
all_candidates: universe.clone(),
});
};
}
let ranking_rules_len = ranking_rules.len();
logger.start_iteration_ranking_rule(0, ranking_rules[0].as_ref(), query, universe);
ranking_rules[0].start_iteration(ctx, logger, universe, query)?;
let mut ranking_rule_scores: Vec<ScoreDetails> = vec![];
let mut ranking_rule_universes: Vec<RoaringBitmap> =
vec![RoaringBitmap::default(); ranking_rules_len];
ranking_rule_universes[0] = universe.clone();
let mut cur_ranking_rule_index = 0;
/// Finish iterating over the current ranking rule, yielding
@ -89,11 +105,16 @@ pub fn bucket_sort<'ctx, Q: RankingRuleQueryTrait>(
} else {
cur_ranking_rule_index -= 1;
}
// FIXME: check off by one
if ranking_rule_scores.len() > cur_ranking_rule_index {
ranking_rule_scores.pop();
}
};
}
let mut all_candidates = universe.clone();
let mut valid_docids = vec![];
let mut valid_scores = vec![];
let mut cur_offset = 0usize;
macro_rules! maybe_add_to_results {
@ -104,23 +125,23 @@ pub fn bucket_sort<'ctx, Q: RankingRuleQueryTrait>(
length,
logger,
&mut valid_docids,
&mut valid_scores,
&mut all_candidates,
&mut ranking_rule_universes,
&mut ranking_rules,
cur_ranking_rule_index,
&mut cur_offset,
distinct_fid,
&ranking_rule_scores,
$candidates,
)?;
};
}
while valid_docids.len() < length {
// The universe for this bucket is zero or one element, so we don't need to sort
// anything, just extend the results and go back to the parent ranking rule.
if ranking_rule_universes[cur_ranking_rule_index].len() <= 1 {
let bucket = std::mem::take(&mut ranking_rule_universes[cur_ranking_rule_index]);
maybe_add_to_results!(bucket);
// The universe for this bucket is zero, so we don't need to sort
// anything, just go back to the parent ranking rule.
if ranking_rule_universes[cur_ranking_rule_index].is_empty() {
back!();
continue;
}
@ -130,6 +151,8 @@ pub fn bucket_sort<'ctx, Q: RankingRuleQueryTrait>(
continue;
};
ranking_rule_scores.push(next_bucket.score);
logger.next_bucket_ranking_rule(
cur_ranking_rule_index,
ranking_rules[cur_ranking_rule_index].as_ref(),
@ -143,10 +166,11 @@ pub fn bucket_sort<'ctx, Q: RankingRuleQueryTrait>(
ranking_rule_universes[cur_ranking_rule_index] -= &next_bucket.candidates;
if cur_ranking_rule_index == ranking_rules_len - 1
|| next_bucket.candidates.len() <= 1
|| cur_offset + (next_bucket.candidates.len() as usize) < from
{
maybe_add_to_results!(next_bucket.candidates);
// FIXME: use index based logic like all the other rules so that you don't have to maintain the pop/push?
ranking_rule_scores.pop();
continue;
}
@ -166,7 +190,7 @@ pub fn bucket_sort<'ctx, Q: RankingRuleQueryTrait>(
)?;
}
Ok(BucketSortOutput { docids: valid_docids, all_candidates })
Ok(BucketSortOutput { docids: valid_docids, scores: valid_scores, all_candidates })
}
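The net effect is that `scores` stays parallel to `docids`: one `Vec<ScoreDetails>` per returned document, with one entry per ranking rule that bucketed it. A sketch of how a caller might rely on that invariant:

```rust
// Hypothetical consumer of BucketSortOutput, assuming the parallel-vector
// invariant suggested by the code above.
fn zip_output(output: &BucketSortOutput) {
    debug_assert_eq!(output.docids.len(), output.scores.len());
    for (docid, score_details) in output.docids.iter().zip(output.scores.iter()) {
        let _ = (docid, score_details);
    }
}
```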
/// Add the candidates to the results. Take `distinct`, `from`, `length`, and `cur_offset`
@ -179,14 +203,18 @@ fn maybe_add_to_results<'ctx, Q: RankingRuleQueryTrait>(
logger: &mut dyn SearchLogger<Q>,
valid_docids: &mut Vec<u32>,
valid_scores: &mut Vec<Vec<ScoreDetails>>,
all_candidates: &mut RoaringBitmap,
ranking_rule_universes: &mut [RoaringBitmap],
ranking_rules: &mut [BoxRankingRule<'ctx, Q>],
cur_ranking_rule_index: usize,
cur_offset: &mut usize,
distinct_fid: Option<u16>,
ranking_rule_scores: &[ScoreDetails],
candidates: RoaringBitmap,
) -> Result<()> {
// First apply the distinct rule on the candidates, reducing the universes if necessary
@ -231,13 +259,17 @@ fn maybe_add_to_results<'ctx, Q: RankingRuleQueryTrait>(
let candidates =
candidates.iter().take(length - valid_docids.len()).copied().collect::<Vec<_>>();
logger.add_to_results(&candidates);
valid_docids.extend(&candidates);
valid_docids.extend_from_slice(&candidates);
valid_scores
.extend(std::iter::repeat(ranking_rule_scores.to_owned()).take(candidates.len()));
}
} else {
// if we have passed the offset already, add some of the documents (up to the limit)
let candidates = candidates.iter().take(length - valid_docids.len()).collect::<Vec<u32>>();
logger.add_to_results(&candidates);
valid_docids.extend(&candidates);
valid_docids.extend_from_slice(&candidates);
valid_scores
.extend(std::iter::repeat(ranking_rule_scores.to_owned()).take(candidates.len()));
}
*cur_offset += candidates.len() as usize;

View File

@ -2,6 +2,7 @@ use roaring::{MultiOps, RoaringBitmap};
use super::query_graph::QueryGraph;
use super::ranking_rules::{RankingRule, RankingRuleOutput};
use crate::score_details::{self, ScoreDetails};
use crate::search::new::query_graph::QueryNodeData;
use crate::search::new::query_term::ExactTerm;
use crate::{Result, SearchContext, SearchLogger};
@ -244,7 +245,13 @@ impl State {
candidates &= universe;
(
State::AttributeStarts(query_graph.clone(), candidates_per_attribute),
Some(RankingRuleOutput { query: query_graph, candidates }),
Some(RankingRuleOutput {
query: query_graph,
candidates,
score: ScoreDetails::ExactAttribute(
score_details::ExactAttribute::MatchesFull,
),
}),
)
}
State::AttributeStarts(query_graph, candidates_per_attribute) => {
@ -257,12 +264,24 @@ impl State {
candidates &= universe;
(
State::Empty(query_graph.clone()),
Some(RankingRuleOutput { query: query_graph, candidates }),
Some(RankingRuleOutput {
query: query_graph,
candidates,
score: ScoreDetails::ExactAttribute(
score_details::ExactAttribute::MatchesStart,
),
}),
)
}
State::Empty(query_graph) => (
State::Empty(query_graph.clone()),
Some(RankingRuleOutput { query: query_graph, candidates: universe.clone() }),
Some(RankingRuleOutput {
query: query_graph,
candidates: universe.clone(),
score: ScoreDetails::ExactAttribute(
score_details::ExactAttribute::NoExactMatch,
),
}),
),
};
(state, output)
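Each state of the exact-attribute rule above now tags its bucket with one of three variants. A hedged sketch of a comparable enum; the derived ordering shown here is an assumption drawn from the variant names, not taken from the crate's `score_details` module:

// Sketch (assumption): a three-level exactness marker where the derived
// Ord makes NoExactMatch < MatchesStart < MatchesFull.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum ExactAttribute {
    NoExactMatch,
    MatchesStart,
    MatchesFull,
}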

View File

@ -8,6 +8,7 @@ use rstar::RTree;
use super::ranking_rules::{RankingRule, RankingRuleOutput, RankingRuleQueryTrait};
use crate::heed_codec::facet::{FieldDocIdFacetCodec, OrderedF64Codec};
use crate::score_details::{self, ScoreDetails};
use crate::{
distance_between_two_points, lat_lng_to_xyz, GeoPoint, Index, Result, SearchContext,
SearchLogger,
@ -80,7 +81,7 @@ pub struct GeoSort<Q: RankingRuleQueryTrait> {
field_ids: Option<[u16; 2]>,
rtree: Option<RTree<GeoPoint>>,
cached_sorted_docids: VecDeque<u32>,
cached_sorted_docids: VecDeque<(u32, [f64; 2])>,
geo_candidates: RoaringBitmap,
}
@ -130,7 +131,7 @@ impl<Q: RankingRuleQueryTrait> GeoSort<Q> {
let point = lat_lng_to_xyz(&self.point);
for point in rtree.nearest_neighbor_iter(&point) {
if self.geo_candidates.contains(point.data.0) {
self.cached_sorted_docids.push_back(point.data.0);
self.cached_sorted_docids.push_back(point.data);
if self.cached_sorted_docids.len() >= cache_size {
break;
}
@ -142,7 +143,7 @@ impl<Q: RankingRuleQueryTrait> GeoSort<Q> {
let point = lat_lng_to_xyz(&opposite_of(self.point));
for point in rtree.nearest_neighbor_iter(&point) {
if self.geo_candidates.contains(point.data.0) {
self.cached_sorted_docids.push_front(point.data.0);
self.cached_sorted_docids.push_front(point.data);
if self.cached_sorted_docids.len() >= cache_size {
break;
}
@ -177,7 +178,7 @@ impl<Q: RankingRuleQueryTrait> GeoSort<Q> {
// computing the distance between two points is expensive thus we cache the result
documents
.sort_by_cached_key(|(_, p)| distance_between_two_points(&self.point, p) as usize);
self.cached_sorted_docids.extend(documents.into_iter().map(|(doc_id, _)| doc_id));
self.cached_sorted_docids.extend(documents.into_iter());
};
Ok(())
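The cache above now stores the coordinates alongside each document id, so the geo-sort score can report the matched point without a second lookup, and `sort_by_cached_key` memoizes the expensive distance computation per element. A self-contained sketch of the same sorting pattern, using a hypothetical squared-distance key rather than the crate's `distance_between_two_points`:

// Sketch: sort (doc_id, point) pairs by distance to a target point,
// computing the expensive key only once per element via sort_by_cached_key.
fn sort_by_distance(documents: &mut [(u32, [f64; 2])], target: [f64; 2]) {
    documents.sort_by_cached_key(|(_, p)| {
        let dx = p[0] - target[0];
        let dy = p[1] - target[1];
        // the key must be Ord, so scale and truncate the f64 distance
        ((dx * dx + dy * dy) * 1_000_000.0) as u64
    });
}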
@ -220,12 +221,19 @@ impl<'ctx, Q: RankingRuleQueryTrait> RankingRule<'ctx, Q> for GeoSort<Q> {
logger: &mut dyn SearchLogger<Q>,
universe: &RoaringBitmap,
) -> Result<Option<RankingRuleOutput<Q>>> {
assert!(universe.len() > 1);
let query = self.query.as_ref().unwrap().clone();
self.geo_candidates &= universe;
if self.geo_candidates.is_empty() {
return Ok(Some(RankingRuleOutput { query, candidates: universe.clone() }));
return Ok(Some(RankingRuleOutput {
query,
candidates: universe.clone(),
score: ScoreDetails::GeoSort(score_details::GeoSort {
target_point: self.point,
ascending: self.ascending,
value: None,
}),
}));
}
let ascending = self.ascending;
@ -236,11 +244,16 @@ impl<'ctx, Q: RankingRuleQueryTrait> RankingRule<'ctx, Q> for GeoSort<Q> {
cache.pop_back()
}
};
while let Some(id) = next(&mut self.cached_sorted_docids) {
while let Some((id, point)) = next(&mut self.cached_sorted_docids) {
if self.geo_candidates.contains(id) {
return Ok(Some(RankingRuleOutput {
query,
candidates: RoaringBitmap::from_iter([id]),
score: ScoreDetails::GeoSort(score_details::GeoSort {
target_point: self.point,
ascending: self.ascending,
value: Some(point),
}),
}));
}
}

View File

@ -50,6 +50,7 @@ use super::ranking_rule_graph::{
};
use super::small_bitmap::SmallBitmap;
use super::{QueryGraph, RankingRule, RankingRuleOutput, SearchContext};
use crate::score_details::Rank;
use crate::search::new::query_term::LocatedQueryTermSubset;
use crate::search::new::ranking_rule_graph::PathVisitor;
use crate::{Result, TermsMatchingStrategy};
@ -118,6 +119,8 @@ pub struct GraphBasedRankingRuleState<G: RankingRuleGraphTrait> {
all_costs: MappedInterner<QueryNode, Vec<u64>>,
/// An index in the first element of `all_costs`, giving the cost of the next bucket
cur_cost: u64,
/// One above the highest possible cost for this rule
next_max_cost: u64,
}
impl<'ctx, G: RankingRuleGraphTrait> RankingRule<'ctx, QueryGraph> for GraphBasedRankingRule<G> {
@ -139,13 +142,12 @@ impl<'ctx, G: RankingRuleGraphTrait> RankingRule<'ctx, QueryGraph> for GraphBase
let mut forbidden_nodes =
SmallBitmap::for_interned_values_in(&query_graph.nodes);
let mut costs = query_graph.nodes.map(|_| None);
let mut cost = 100;
// FIXME: this works because only words uses termsmatchingstrategy at the moment.
for ns in removal_order {
for n in ns.iter() {
*costs.get_mut(n) = Some((cost, forbidden_nodes.clone()));
*costs.get_mut(n) = Some((1, forbidden_nodes.clone()));
}
forbidden_nodes.union(&ns);
cost += 100;
}
costs
}
@ -162,12 +164,16 @@ impl<'ctx, G: RankingRuleGraphTrait> RankingRule<'ctx, QueryGraph> for GraphBase
// Then pre-compute the cost of all paths from each node to the end node
let all_costs = graph.find_all_costs_to_end();
let next_max_cost =
all_costs.get(graph.query_graph.root_node).iter().copied().max().unwrap_or(0) + 1;
let state = GraphBasedRankingRuleState {
graph,
conditions_cache: condition_docids_cache,
dead_ends_cache,
all_costs,
cur_cost: 0,
next_max_cost,
};
self.state = Some(state);
@ -181,17 +187,13 @@ impl<'ctx, G: RankingRuleGraphTrait> RankingRule<'ctx, QueryGraph> for GraphBase
logger: &mut dyn SearchLogger<QueryGraph>,
universe: &RoaringBitmap,
) -> Result<Option<RankingRuleOutput<QueryGraph>>> {
// If universe.len() <= 1, the bucket sort algorithm
// should not have called this function.
assert!(universe.len() > 1);
// Will crash if `next_bucket` is called before `start_iteration` or after `end_iteration`;
// this should never happen.
let mut state = self.state.take().unwrap();
let all_costs = state.all_costs.get(state.graph.query_graph.root_node);
// Retrieve the cost of the paths to compute
let Some(&cost) = state
.all_costs
.get(state.graph.query_graph.root_node)
let Some(&cost) = all_costs
.iter()
.find(|c| **c >= state.cur_cost) else {
self.state = None;
@ -207,8 +209,12 @@ impl<'ctx, G: RankingRuleGraphTrait> RankingRule<'ctx, QueryGraph> for GraphBase
dead_ends_cache,
all_costs,
cur_cost: _,
next_max_cost,
} = &mut state;
let rank = *next_max_cost - cost;
let score = G::rank_to_score(Rank { rank: rank as u32, max_rank: *next_max_cost as u32 });
let mut universe = universe.clone();
let mut used_conditions = SmallBitmap::for_interned_values_in(&graph.conditions_interner);
@ -325,7 +331,7 @@ impl<'ctx, G: RankingRuleGraphTrait> RankingRule<'ctx, QueryGraph> for GraphBase
self.state = Some(state);
Ok(Some(RankingRuleOutput { query: next_query_graph, candidates: bucket }))
Ok(Some(RankingRuleOutput { query: next_query_graph, candidates: bucket, score }))
}
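Since `next_max_cost` is defined as one above the highest path cost, the bucket's rank above is simply the distance from that ceiling: a cheaper path (better match) yields a higher rank out of the same `max_rank`. A minimal sketch of that conversion, with the `Rank` fields mirroring the ones used in the hunk:

// Sketch: convert a path cost into a (rank, max_rank) pair.
// A cost of 0 yields the maximum rank; more expensive paths rank lower.
struct Rank {
    rank: u32,
    max_rank: u32,
}

fn cost_to_rank(cost: u64, next_max_cost: u64) -> Rank {
    Rank { rank: (next_max_cost - cost) as u32, max_rank: next_max_cost as u32 }
}
// e.g. with costs {0, 1, 2}, next_max_cost is 3 and the buckets rank 3, 2, 1.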
fn end_iteration(

View File

@ -44,6 +44,7 @@ use self::geo_sort::GeoSort;
pub use self::geo_sort::Strategy as GeoSortStrategy;
use self::graph_based_ranking_rule::Words;
use self::interner::Interned;
use crate::score_details::ScoreDetails;
use crate::search::new::distinct::apply_distinct_rule;
use crate::{AscDesc, DocumentId, Filter, Index, Member, Result, TermsMatchingStrategy, UserError};
@ -426,13 +427,15 @@ pub fn execute_search(
)?
};
let BucketSortOutput { docids, mut all_candidates } = bucket_sort_output;
let BucketSortOutput { docids, scores, mut all_candidates } = bucket_sort_output;
let fields_ids_map = ctx.index.fields_ids_map(ctx.txn)?;
// The candidates are the universe unless the exhaustive number of hits
// is requested and a distinct attribute is set.
if exhaustive_number_hits {
if let Some(f) = ctx.index.distinct_field(ctx.txn)? {
if let Some(distinct_fid) = ctx.index.fields_ids_map(ctx.txn)?.id(f) {
if let Some(distinct_fid) = fields_ids_map.id(f) {
all_candidates = apply_distinct_rule(ctx, distinct_fid, &all_candidates)?.remaining;
}
}
@ -440,6 +443,7 @@ pub fn execute_search(
Ok(PartialSearchResult {
candidates: all_candidates,
document_scores: scores,
documents_ids: docids,
located_query_terms,
})
@ -491,4 +495,5 @@ pub struct PartialSearchResult {
pub located_query_terms: Option<Vec<LocatedQueryTerm>>,
pub candidates: RoaringBitmap,
pub documents_ids: Vec<DocumentId>,
pub document_scores: Vec<Vec<ScoreDetails>>,
}
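`PartialSearchResult` now exposes one `Vec<ScoreDetails>` per returned document, in the same order as `documents_ids`. A small sketch of consuming the two parallel vectors together (the zipping helper is illustrative, not part of the crate):

// Sketch: documents_ids and document_scores are parallel, so they can be
// zipped to associate each document with its per-rule score details.
fn pair_docs_with_scores<S: Clone>(
    documents_ids: &[u32],
    document_scores: &[Vec<S>],
) -> Vec<(u32, Vec<S>)> {
    documents_ids.iter().copied().zip(document_scores.iter().cloned()).collect()
}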

View File

@ -77,13 +77,9 @@ pub fn located_query_terms_from_tokens(
}
}
TokenKind::Separator(separator_kind) => {
match separator_kind {
SeparatorKind::Hard => {
position += 1;
}
SeparatorKind::Soft => {
position += 0;
}
// add penalty for hard separators
if let SeparatorKind::Hard = separator_kind {
position = position.wrapping_add(7);
}
phrase = 'phrase: {
@ -288,3 +284,36 @@ impl PhraseBuilder {
})
}
}
#[cfg(test)]
mod tests {
use charabia::TokenizerBuilder;
use super::*;
use crate::index::tests::TempIndex;
fn temp_index_with_documents() -> TempIndex {
let temp_index = TempIndex::new();
temp_index
.add_documents(documents!([
{ "id": 1, "name": "split this world westfali westfalia the Ŵôřlḑôle" },
{ "id": 2, "name": "Westfália" },
{ "id": 3, "name": "Ŵôřlḑôle" },
]))
.unwrap();
temp_index
}
#[test]
fn start_with_hard_separator() -> Result<()> {
let tokenizer = TokenizerBuilder::new().build();
let tokens = tokenizer.tokenize(".");
let index = temp_index_with_documents();
let rtxn = index.read_txn()?;
let mut ctx = SearchContext::new(&index, &rtxn);
// panics with `attempt to add with overflow` before <https://github.com/meilisearch/meilisearch/issues/3785>
let located_query_terms = located_query_terms_from_tokens(&mut ctx, tokens, None)?;
assert!(located_query_terms.is_empty());
Ok(())
}
}
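The separator handling above replaces the old per-kind increments with a fixed penalty of 7 for hard separators, applied with `wrapping_add` so a query that starts with a separator can no longer trigger the overflow panic covered by the test. A minimal sketch of that bookkeeping, assuming soft separators leave the position untouched (as in the hunk):

// Sketch: advance the query position across separators.
// Hard separators add a fixed penalty of 7; soft separators add nothing.
enum Separator {
    Hard,
    Soft,
}

fn advance_position(position: u16, separator: Separator) -> u16 {
    match separator {
        // wrapping_add avoids the `attempt to add with overflow` panic
        // exercised by the regression test above.
        Separator::Hard => position.wrapping_add(7),
        Separator::Soft => position,
    }
}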

View File

@ -49,10 +49,15 @@ impl<G: RankingRuleGraphTrait> RankingRuleGraph<G> {
if let Some((cost_of_ignoring, forbidden_nodes)) =
cost_of_ignoring_node.get(dest_idx)
{
let dest = graph_nodes.get(dest_idx);
let dest_size = match &dest.data {
QueryNodeData::Term(term) => term.term_ids.len(),
_ => panic!(),
};
let new_edge_id = edges_store.insert(Some(Edge {
source_node: source_id,
dest_node: dest_idx,
cost: *cost_of_ignoring,
cost: *cost_of_ignoring * dest_size as u32,
condition: None,
nodes_to_skip: forbidden_nodes.clone(),
}));

View File

@ -1,6 +1,7 @@
use roaring::RoaringBitmap;
use super::{ComputedCondition, RankingRuleGraphTrait};
use crate::score_details::{Rank, ScoreDetails};
use crate::search::new::interner::{DedupInterner, Interned};
use crate::search::new::query_term::{ExactTerm, LocatedQueryTermSubset};
use crate::search::new::resolve_query_graph::compute_query_term_subset_docids;
@ -84,4 +85,8 @@ impl RankingRuleGraphTrait for ExactnessGraph {
Ok(vec![(0, exact_condition), (dest_node.term_ids.len() as u32, skip_condition)])
}
fn rank_to_score(rank: Rank) -> ScoreDetails {
ScoreDetails::Exactness(rank)
}
}

View File

@ -2,6 +2,7 @@ use fxhash::FxHashSet;
use roaring::RoaringBitmap;
use super::{ComputedCondition, RankingRuleGraphTrait};
use crate::score_details::{Rank, ScoreDetails};
use crate::search::new::interner::{DedupInterner, Interned};
use crate::search::new::query_term::LocatedQueryTermSubset;
use crate::search::new::resolve_query_graph::compute_query_term_subset_docids_within_field_id;
@ -68,7 +69,7 @@ impl RankingRuleGraphTrait for FidGraph {
}
let mut edges = vec![];
for fid in all_fields {
for fid in all_fields.iter().copied() {
// TODO: We can improve performance and relevancy by storing
// the term subsets associated with each field id fetched.
edges.push((
@ -80,6 +81,35 @@ impl RankingRuleGraphTrait for FidGraph {
));
}
// always look up the max_fid if we haven't already, and add an artificial condition for max scoring
let max_fid: Option<u16> = {
if let Some(max_fid) = ctx
.index
.searchable_fields_ids(ctx.txn)?
.map(|field_ids| field_ids.into_iter().max())
{
max_fid
} else {
ctx.index.fields_ids_map(ctx.txn)?.ids().max()
}
};
if let Some(max_fid) = max_fid {
if !all_fields.contains(&max_fid) {
edges.push((
max_fid as u32 * term.term_ids.len() as u32, // TODO improve the fid score i.e. fid^10.
conditions_interner.insert(FidCondition {
term: term.clone(), // TODO remove this ugly clone
fid: max_fid,
}),
));
}
}
Ok(edges)
}
fn rank_to_score(rank: Rank) -> ScoreDetails {
ScoreDetails::Fid(rank)
}
}
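The fallback above guarantees the graph always contains an edge priced at the highest searchable field id, so the maximum cost (and therefore the score scale) does not depend on which fields a given query happens to match. A condensed sketch of resolving that maximum field id, mirroring the two lookups in the hunk:

// Sketch: resolve the highest field id to price the artificial edge with.
// When a restricted set of searchable fields exists, take its max;
// otherwise fall back to the highest id in the whole fields-id map.
fn max_field_id(searchable_fields: Option<Vec<u16>>, all_field_ids: Vec<u16>) -> Option<u16> {
    match searchable_fields {
        Some(field_ids) => field_ids.into_iter().max(),
        None => all_field_ids.into_iter().max(),
    }
}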

View File

@ -41,6 +41,7 @@ use super::interner::{DedupInterner, FixedSizeInterner, Interned, MappedInterner
use super::query_term::LocatedQueryTermSubset;
use super::small_bitmap::SmallBitmap;
use super::{QueryGraph, QueryNode, SearchContext};
use crate::score_details::{Rank, ScoreDetails};
use crate::Result;
pub struct ComputedCondition {
@ -110,6 +111,9 @@ pub trait RankingRuleGraphTrait: Sized + 'static {
source_node: Option<&LocatedQueryTermSubset>,
dest_node: &LocatedQueryTermSubset,
) -> Result<Vec<(u32, Interned<Self::Condition>)>>;
/// Convert the rank of a path to its corresponding score for the ranking rule
fn rank_to_score(rank: Rank) -> ScoreDetails;
}
/// The graph used by graph-based ranking rules.

View File

@ -2,6 +2,7 @@ use fxhash::{FxHashMap, FxHashSet};
use roaring::RoaringBitmap;
use super::{ComputedCondition, RankingRuleGraphTrait};
use crate::score_details::{Rank, ScoreDetails};
use crate::search::new::interner::{DedupInterner, Interned};
use crate::search::new::query_term::LocatedQueryTermSubset;
use crate::search::new::resolve_query_graph::compute_query_term_subset_docids_within_position;
@ -105,8 +106,20 @@ impl RankingRuleGraphTrait for PositionGraph {
));
}
// artificial empty condition for computing max cost
let max_cost = term.term_ids.len() as u32 * 10;
edges.push((
max_cost,
conditions_interner
.insert(PositionCondition { term: term.clone(), positions: Vec::default() }),
));
Ok(edges)
}
fn rank_to_score(rank: Rank) -> ScoreDetails {
ScoreDetails::Position(rank)
}
}
fn cost_from_position(sum_positions: u32) -> u32 {

View File

@ -4,6 +4,7 @@ pub mod compute_docids;
use roaring::RoaringBitmap;
use super::{ComputedCondition, RankingRuleGraphTrait};
use crate::score_details::{Rank, ScoreDetails};
use crate::search::new::interner::{DedupInterner, Interned};
use crate::search::new::query_term::LocatedQueryTermSubset;
use crate::search::new::SearchContext;
@ -36,4 +37,8 @@ impl RankingRuleGraphTrait for ProximityGraph {
) -> Result<Vec<(u32, Interned<Self::Condition>)>> {
build::build_edges(ctx, conditions_interner, source_term, dest_term)
}
fn rank_to_score(rank: Rank) -> ScoreDetails {
ScoreDetails::Proximity(rank)
}
}

View File

@ -1,6 +1,7 @@
use roaring::RoaringBitmap;
use super::{ComputedCondition, RankingRuleGraphTrait};
use crate::score_details::{self, Rank, ScoreDetails};
use crate::search::new::interner::{DedupInterner, Interned};
use crate::search::new::query_term::LocatedQueryTermSubset;
use crate::search::new::resolve_query_graph::compute_query_term_subset_docids;
@ -75,4 +76,8 @@ impl RankingRuleGraphTrait for TypoGraph {
}
Ok(edges)
}
fn rank_to_score(rank: Rank) -> ScoreDetails {
ScoreDetails::Typo(score_details::Typo::from_rank(rank))
}
}

View File

@ -1,6 +1,7 @@
use roaring::RoaringBitmap;
use super::{ComputedCondition, RankingRuleGraphTrait};
use crate::score_details::{self, Rank, ScoreDetails};
use crate::search::new::interner::{DedupInterner, Interned};
use crate::search::new::query_term::LocatedQueryTermSubset;
use crate::search::new::resolve_query_graph::compute_query_term_subset_docids;
@ -41,9 +42,10 @@ impl RankingRuleGraphTrait for WordsGraph {
_from: Option<&LocatedQueryTermSubset>,
to_term: &LocatedQueryTermSubset,
) -> Result<Vec<(u32, Interned<Self::Condition>)>> {
Ok(vec![(
to_term.term_ids.len() as u32,
conditions_interner.insert(WordsCondition { term: to_term.clone() }),
)])
Ok(vec![(0, conditions_interner.insert(WordsCondition { term: to_term.clone() }))])
}
fn rank_to_score(rank: Rank) -> ScoreDetails {
ScoreDetails::Words(score_details::Words::from_rank(rank))
}
}

View File

@ -2,6 +2,7 @@ use roaring::RoaringBitmap;
use super::logger::SearchLogger;
use super::{QueryGraph, SearchContext};
use crate::score_details::ScoreDetails;
use crate::Result;
/// An internal trait implemented by only [`PlaceholderQuery`] and [`QueryGraph`]
@ -66,4 +67,6 @@ pub struct RankingRuleOutput<Q> {
pub query: Q,
/// The allowed candidates for the child ranking rule
pub candidates: RoaringBitmap,
/// The score for the candidates of the current bucket
pub score: ScoreDetails,
}

View File

@ -1,9 +1,11 @@
use heed::BytesDecode;
use roaring::RoaringBitmap;
use super::logger::SearchLogger;
use super::{RankingRule, RankingRuleOutput, RankingRuleQueryTrait, SearchContext};
use crate::heed_codec::facet::FacetGroupKeyCodec;
use crate::heed_codec::ByteSliceRefCodec;
use crate::heed_codec::facet::{FacetGroupKeyCodec, OrderedF64Codec};
use crate::heed_codec::{ByteSliceRefCodec, StrRefCodec};
use crate::score_details::{self, ScoreDetails};
use crate::search::facet::{ascending_facet_sort, descending_facet_sort};
use crate::{FieldId, Index, Result};
@ -67,7 +69,7 @@ impl<'ctx, Query> Sort<'ctx, Query> {
impl<'ctx, Query: RankingRuleQueryTrait> RankingRule<'ctx, Query> for Sort<'ctx, Query> {
fn id(&self) -> String {
let Self { field_name, is_ascending, .. } = self;
format!("{field_name}:{}", if *is_ascending { "asc" } else { "desc " })
format!("{field_name}:{}", if *is_ascending { "asc" } else { "desc" })
}
fn start_iteration(
&mut self,
@ -118,12 +120,43 @@ impl<'ctx, Query: RankingRuleQueryTrait> RankingRule<'ctx, Query> for Sort<'ctx,
(itertools::Either::Right(number_iter), itertools::Either::Right(string_iter))
};
let number_iter = number_iter.map(|r| -> Result<_> {
let (docids, bytes) = r?;
Ok((
docids,
serde_json::Value::Number(
serde_json::Number::from_f64(
OrderedF64Codec::bytes_decode(bytes).expect("some number"),
)
.expect("too big float"),
),
))
});
let string_iter = string_iter.map(|r| -> Result<_> {
let (docids, bytes) = r?;
Ok((
docids,
serde_json::Value::String(
StrRefCodec::bytes_decode(bytes).expect("some string").to_owned(),
),
))
});
let query_graph = parent_query.clone();
let ascending = self.is_ascending;
let field_name = self.field_name.clone();
RankingRuleOutputIterWrapper::new(Box::new(number_iter.chain(string_iter).map(
move |r| {
let (docids, _) = r?;
Ok(RankingRuleOutput { query: query_graph.clone(), candidates: docids })
let (docids, value) = r?;
Ok(RankingRuleOutput {
query: query_graph.clone(),
candidates: docids,
score: ScoreDetails::Sort(score_details::Sort {
field_name: field_name.clone(),
ascending,
value,
}),
})
},
)))
}
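The sort rule above now decodes the facet value that produced each bucket so it can be reported in `ScoreDetails::Sort`. A small sketch of the number branch of that decoding, using `serde_json` directly (the `expect` mirrors the assumption in the hunk that stored facet numbers are finite):

// Sketch: turn a decoded f64 facet value into a JSON number.
// serde_json::Number::from_f64 returns None for NaN or infinite values.
fn facet_number_to_json(value: f64) -> serde_json::Value {
    serde_json::Value::Number(
        serde_json::Number::from_f64(value).expect("facet numbers are finite"),
    )
}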
@ -150,7 +183,15 @@ impl<'ctx, Query: RankingRuleQueryTrait> RankingRule<'ctx, Query> for Sort<'ctx,
Ok(Some(bucket))
} else {
let query = self.original_query.as_ref().unwrap().clone();
Ok(Some(RankingRuleOutput { query, candidates: universe.clone() }))
Ok(Some(RankingRuleOutput {
query,
candidates: universe.clone(),
score: ScoreDetails::Sort(score_details::Sort {
field_name: self.field_name.clone(),
ascending: self.is_ascending,
value: serde_json::Value::Null,
}),
}))
}
}

View File

@ -89,7 +89,6 @@ Create a snapshot test of the given database.
- `exact_word_docids`
- `word_prefix_docids`
- `exact_word_prefix_docids`
- `docid_word_positions`
- `word_pair_proximity_docids`
- `word_prefix_pair_proximity_docids`
- `word_position_docids`
@ -217,11 +216,6 @@ pub fn snap_exact_word_prefix_docids(index: &Index) -> String {
&format!("{s:<16} {}", display_bitmap(&b))
})
}
pub fn snap_docid_word_positions(index: &Index) -> String {
make_db_snap_from_iter!(index, docid_word_positions, |((idx, s), b)| {
&format!("{idx:<6} {s:<16} {}", display_bitmap(&b))
})
}
pub fn snap_word_pair_proximity_docids(index: &Index) -> String {
make_db_snap_from_iter!(index, word_pair_proximity_docids, |((proximity, word1, word2), b)| {
&format!("{proximity:<2} {word1:<16} {word2:<16} {}", display_bitmap(&b))
@ -477,9 +471,6 @@ macro_rules! full_snap_of_db {
($index:ident, exact_word_prefix_docids) => {{
$crate::snapshot_tests::snap_exact_word_prefix_docids(&$index)
}};
($index:ident, docid_word_positions) => {{
$crate::snapshot_tests::snap_docid_word_positions(&$index)
}};
($index:ident, word_pair_proximity_docids) => {{
$crate::snapshot_tests::snap_word_pair_proximity_docids(&$index)
}};

View File

@ -23,7 +23,6 @@ impl<'t, 'u, 'i> ClearDocuments<'t, 'u, 'i> {
exact_word_docids,
word_prefix_docids,
exact_word_prefix_docids,
docid_word_positions,
word_pair_proximity_docids,
word_prefix_pair_proximity_docids,
prefix_word_pair_proximity_docids,
@ -80,7 +79,6 @@ impl<'t, 'u, 'i> ClearDocuments<'t, 'u, 'i> {
exact_word_docids.clear(self.wtxn)?;
word_prefix_docids.clear(self.wtxn)?;
exact_word_prefix_docids.clear(self.wtxn)?;
docid_word_positions.clear(self.wtxn)?;
word_pair_proximity_docids.clear(self.wtxn)?;
word_prefix_pair_proximity_docids.clear(self.wtxn)?;
prefix_word_pair_proximity_docids.clear(self.wtxn)?;
@ -141,7 +139,6 @@ mod tests {
assert!(index.word_docids.is_empty(&rtxn).unwrap());
assert!(index.word_prefix_docids.is_empty(&rtxn).unwrap());
assert!(index.docid_word_positions.is_empty(&rtxn).unwrap());
assert!(index.word_pair_proximity_docids.is_empty(&rtxn).unwrap());
assert!(index.field_id_word_count_docids.is_empty(&rtxn).unwrap());
assert!(index.word_prefix_pair_proximity_docids.is_empty(&rtxn).unwrap());

View File

@ -1,5 +1,5 @@
use std::collections::btree_map::Entry;
use std::collections::{HashMap, HashSet};
use std::collections::{BTreeSet, HashMap, HashSet};
use fst::IntoStreamer;
use heed::types::{ByteSlice, DecodeIgnore, Str, UnalignedSlice};
@ -15,8 +15,7 @@ use crate::facet::FacetType;
use crate::heed_codec::facet::FieldDocIdFacetCodec;
use crate::heed_codec::CboRoaringBitmapCodec;
use crate::{
ExternalDocumentsIds, FieldId, FieldIdMapMissingEntry, Index, Result, RoaringBitmapCodec,
SmallString32, BEU32,
ExternalDocumentsIds, FieldId, FieldIdMapMissingEntry, Index, Result, RoaringBitmapCodec, BEU32,
};
pub struct DeleteDocuments<'t, 'u, 'i> {
@ -232,7 +231,6 @@ impl<'t, 'u, 'i> DeleteDocuments<'t, 'u, 'i> {
exact_word_docids,
word_prefix_docids,
exact_word_prefix_docids,
docid_word_positions,
word_pair_proximity_docids,
field_id_word_count_docids,
word_prefix_pair_proximity_docids,
@ -251,23 +249,9 @@ impl<'t, 'u, 'i> DeleteDocuments<'t, 'u, 'i> {
facet_id_is_empty_docids,
documents,
} = self.index;
// Retrieve the words contained in the documents.
let mut words = Vec::new();
// Remove from the documents database
for docid in &self.to_delete_docids {
documents.delete(self.wtxn, &BEU32::new(docid))?;
// We iterate through the words positions of the document id, retrieve the word and delete the positions.
// We create an iterator to be able to get the content and delete the key-value itself.
// It's faster to acquire a cursor to get and delete, as we avoid traversing the LMDB B-Tree two times but only once.
let mut iter = docid_word_positions.prefix_iter_mut(self.wtxn, &(docid, ""))?;
while let Some(result) = iter.next() {
let ((_docid, word), _positions) = result?;
// This boolean will indicate if we must remove this word from the words FST.
words.push((SmallString32::from(word), false));
// safety: we don't keep references from inside the LMDB database.
unsafe { iter.del_current()? };
}
}
// We acquire the current external documents ids map...
// Note that its soft-deleted document ids field will be equal to the `to_delete_docids`
@ -278,42 +262,27 @@ impl<'t, 'u, 'i> DeleteDocuments<'t, 'u, 'i> {
let new_external_documents_ids = new_external_documents_ids.into_static();
self.index.put_external_documents_ids(self.wtxn, &new_external_documents_ids)?;
// Maybe we can improve the get performance of the words
// if we sort the words first, keeping the LMDB pages in cache.
words.sort_unstable();
let mut words_to_keep = BTreeSet::default();
let mut words_to_delete = BTreeSet::default();
// We iterate over the words and delete the documents ids
// from the word docids database.
for (word, must_remove) in &mut words {
remove_from_word_docids(
self.wtxn,
word_docids,
word.as_str(),
must_remove,
&self.to_delete_docids,
)?;
remove_from_word_docids(
self.wtxn,
exact_word_docids,
word.as_str(),
must_remove,
&self.to_delete_docids,
)?;
}
remove_from_word_docids(
self.wtxn,
word_docids,
&self.to_delete_docids,
&mut words_to_keep,
&mut words_to_delete,
)?;
remove_from_word_docids(
self.wtxn,
exact_word_docids,
&self.to_delete_docids,
&mut words_to_keep,
&mut words_to_delete,
)?;
// We construct an FST set that contains the words to delete from the words FST.
let words_to_delete =
words.iter().filter_map(
|(word, must_remove)| {
if *must_remove {
Some(word.as_str())
} else {
None
}
},
);
let words_to_delete = fst::Set::from_iter(words_to_delete)?;
let words_to_delete = fst::Set::from_iter(words_to_delete.difference(&words_to_keep))?;
let new_words_fst = {
// We retrieve the current words FST from the database.
@ -532,23 +501,24 @@ fn remove_from_word_prefix_docids(
fn remove_from_word_docids(
txn: &mut heed::RwTxn,
db: &heed::Database<Str, RoaringBitmapCodec>,
word: &str,
must_remove: &mut bool,
to_remove: &RoaringBitmap,
words_to_keep: &mut BTreeSet<String>,
words_to_remove: &mut BTreeSet<String>,
) -> Result<()> {
// We create an iterator to be able to get the content and delete the word docids.
// It's faster to acquire a cursor to get and delete or put, as it avoids traversing
// the LMDB B-Tree twice: everything happens in a single pass.
let mut iter = db.prefix_iter_mut(txn, word)?;
if let Some((key, mut docids)) = iter.next().transpose()? {
if key == word {
let previous_len = docids.len();
docids -= to_remove;
if docids.is_empty() {
// safety: we don't keep references from inside the LMDB database.
unsafe { iter.del_current()? };
*must_remove = true;
} else if docids.len() != previous_len {
let mut iter = db.iter_mut(txn)?;
while let Some((key, mut docids)) = iter.next().transpose()? {
let previous_len = docids.len();
docids -= to_remove;
if docids.is_empty() {
// safety: we don't keep references from inside the LMDB database.
unsafe { iter.del_current()? };
words_to_remove.insert(key.to_owned());
} else {
words_to_keep.insert(key.to_owned());
if docids.len() != previous_len {
let key = key.to_owned();
// safety: we don't keep references from inside the LMDB database.
unsafe { iter.put_current(&key, &docids)? };
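With the rewrite above, a single pass over the word docids database subtracts the deleted documents from every posting list and sorts each word into a keep set or a delete set; a word only leaves the words FST if no posting list (regular or exact) still contains it, hence the set difference further up. A minimal sketch of that final classification:

use std::collections::BTreeSet;

// Sketch: a word is dropped from the FST only when it was scheduled for
// deletion by some database and kept by none.
fn words_to_drop_from_fst(
    words_to_delete: &BTreeSet<String>,
    words_to_keep: &BTreeSet<String>,
) -> Vec<String> {
    words_to_delete.difference(words_to_keep).cloned().collect()
}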
@ -627,7 +597,7 @@ mod tests {
use super::*;
use crate::index::tests::TempIndex;
use crate::{db_snap, Filter};
use crate::{db_snap, Filter, Search};
fn delete_documents<'t>(
wtxn: &mut RwTxn<'t, '_>,
@ -1199,4 +1169,52 @@ mod tests {
DeletionStrategy::AlwaysSoft,
);
}
#[test]
fn delete_words_exact_attributes() {
let index = TempIndex::new();
index
.update_settings(|settings| {
settings.set_primary_key(S("id"));
settings.set_searchable_fields(vec![S("text"), S("exact")]);
settings.set_exact_attributes(vec![S("exact")].into_iter().collect());
})
.unwrap();
index
.add_documents(documents!([
{ "id": 0, "text": "hello" },
{ "id": 1, "exact": "hello"}
]))
.unwrap();
db_snap!(index, word_docids, 1, @r###"
hello [0, ]
"###);
db_snap!(index, exact_word_docids, 1, @r###"
hello [1, ]
"###);
db_snap!(index, words_fst, 1, @"300000000000000001084cfcfc2ce1000000016000000090ea47f");
let mut wtxn = index.write_txn().unwrap();
let deleted_internal_ids =
delete_documents(&mut wtxn, &index, &["1"], DeletionStrategy::AlwaysHard);
wtxn.commit().unwrap();
db_snap!(index, word_docids, 2, @r###"
hello [0, ]
"###);
db_snap!(index, exact_word_docids, 2, @"");
db_snap!(index, words_fst, 2, @"300000000000000001084cfcfc2ce1000000016000000090ea47f");
insta::assert_snapshot!(format!("{deleted_internal_ids:?}"), @"[1]");
let txn = index.read_txn().unwrap();
let words = index.words_fst(&txn).unwrap().into_stream().into_strs().unwrap();
insta::assert_snapshot!(format!("{words:?}"), @r###"["hello"]"###);
let mut s = Search::new(&txn, &index);
s.query("hello");
let crate::SearchResult { documents_ids, .. } = s.execute().unwrap();
insta::assert_snapshot!(format!("{documents_ids:?}"), @"[0]");
}
}

View File

@ -1,6 +1,6 @@
use std::collections::HashMap;
use std::fs::File;
use std::{cmp, io};
use std::io;
use grenad::Sorter;
@ -54,11 +54,10 @@ pub fn extract_fid_word_count_docids<R: io::Read + io::Seek>(
}
for position in read_u32_ne_bytes(value) {
let (field_id, position) = relative_from_absolute_position(position);
let word_count = position as u32 + 1;
let (field_id, _) = relative_from_absolute_position(position);
let value = document_fid_wordcount.entry(field_id as FieldId).or_insert(0);
*value = cmp::max(*value, word_count);
*value += 1;
}
}
@ -83,7 +82,7 @@ fn drain_document_fid_wordcount_into_sorter(
let mut key_buffer = Vec::new();
for (fid, count) in document_fid_wordcount.drain() {
if count <= 10 {
if count <= 30 {
key_buffer.clear();
key_buffer.extend_from_slice(&fid.to_be_bytes());
key_buffer.push(count as u8);
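The extractor above no longer derives a word count from the highest relative position; it increments a per-field counter for every position read, and only writes entries whose count is at most the raised 30-word limit. A minimal sketch of that accumulation with plain standard-library types:

use std::collections::HashMap;

// Sketch: count words per field id, then keep only the fields whose
// word count fits the 30-word storage limit.
fn field_word_counts(field_ids: impl Iterator<Item = u16>) -> HashMap<u16, u32> {
    let mut counts: HashMap<u16, u32> = HashMap::new();
    for field_id in field_ids {
        *counts.entry(field_id).or_insert(0) += 1;
    }
    counts.retain(|_, count| *count <= 30);
    counts
}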

View File

@ -325,8 +325,6 @@ fn send_and_extract_flattened_documents_data(
// send docid_word_positions_chunk to DB writer
let docid_word_positions_chunk =
unsafe { as_cloneable_grenad(&docid_word_positions_chunk)? };
let _ = lmdb_writer_sx
.send(Ok(TypedChunk::DocidWordPositions(docid_word_positions_chunk.clone())));
let _ =
lmdb_writer_sx.send(Ok(TypedChunk::ScriptLanguageDocids(script_language_pair)));

View File

@ -4,7 +4,6 @@ use std::result::Result as StdResult;
use roaring::RoaringBitmap;
use super::read_u32_ne_bytes;
use crate::heed_codec::CboRoaringBitmapCodec;
use crate::update::index_documents::transform::Operation;
use crate::Result;
@ -22,10 +21,6 @@ pub fn concat_u32s_array<'a>(_key: &[u8], values: &[Cow<'a, [u8]>]) -> Result<Co
}
}
pub fn roaring_bitmap_from_u32s_array(slice: &[u8]) -> RoaringBitmap {
read_u32_ne_bytes(slice).collect()
}
pub fn serialize_roaring_bitmap(bitmap: &RoaringBitmap, buffer: &mut Vec<u8>) -> io::Result<()> {
buffer.clear();
buffer.reserve(bitmap.serialized_size());

View File

@ -14,8 +14,8 @@ pub use grenad_helpers::{
};
pub use merge_functions::{
concat_u32s_array, keep_first, keep_latest_obkv, merge_cbo_roaring_bitmaps,
merge_obkvs_and_operations, merge_roaring_bitmaps, merge_two_obkvs,
roaring_bitmap_from_u32s_array, serialize_roaring_bitmap, MergeFn,
merge_obkvs_and_operations, merge_roaring_bitmaps, merge_two_obkvs, serialize_roaring_bitmap,
MergeFn,
};
use crate::MAX_WORD_LENGTH;

View File

@ -2471,11 +2471,11 @@ mod tests {
{
"id": 3,
"text": "a a a a a a a a a a a a a a a a a
a a a a a a a a a a a a a a a a a a a a a a a a a a
a a a a a a a a a a a a a a a a a a a a a a a a a a
a a a a a a a a a a a a a a a a a a a a a a a a a a
a a a a a a a a a a a a a a a a a a a a a a a a a a
a a a a a a a a a a a a a a a a a a a a a a a a a a
a a a a a a a a a a a a a a a a a a a a a a a a a a
a a a a a a a a a a a a a a a a a a a a a a a a a a
a a a a a a a a a a a a a a a a a a a a a a a a a a
a a a a a a a a a a a a a a a a a a a a a a a a a a
a a a a a a a a a a a a a a a a a a a a a a a a a a
a a a a a a a a a a a a a a a a a a a a a "
}
]))
@ -2513,6 +2513,5 @@ mod tests {
db_snap!(index, word_fid_docids, 3, @"4c2e2a1832e5802796edc1638136d933");
db_snap!(index, word_position_docids, 3, @"74f556b91d161d997a89468b4da1cb8f");
db_snap!(index, docid_word_positions, 3, @"5287245332627675740b28bd46e1cde1");
}
}

View File

@ -7,24 +7,19 @@ use std::io;
use charabia::{Language, Script};
use grenad::MergerBuilder;
use heed::types::ByteSlice;
use heed::{BytesDecode, RwTxn};
use heed::RwTxn;
use roaring::RoaringBitmap;
use super::helpers::{
self, merge_ignore_values, roaring_bitmap_from_u32s_array, serialize_roaring_bitmap,
valid_lmdb_key, CursorClonableMmap,
self, merge_ignore_values, serialize_roaring_bitmap, valid_lmdb_key, CursorClonableMmap,
};
use super::{ClonableMmap, MergeFn};
use crate::facet::FacetType;
use crate::update::facet::FacetsUpdate;
use crate::update::index_documents::helpers::as_cloneable_grenad;
use crate::{
lat_lng_to_xyz, BoRoaringBitmapCodec, CboRoaringBitmapCodec, DocumentId, GeoPoint, Index,
Result,
};
use crate::{lat_lng_to_xyz, CboRoaringBitmapCodec, DocumentId, GeoPoint, Index, Result};
pub(crate) enum TypedChunk {
DocidWordPositions(grenad::Reader<CursorClonableMmap>),
FieldIdDocidFacetStrings(grenad::Reader<CursorClonableMmap>),
FieldIdDocidFacetNumbers(grenad::Reader<CursorClonableMmap>),
Documents(grenad::Reader<CursorClonableMmap>),
@ -56,29 +51,6 @@ pub(crate) fn write_typed_chunk_into_index(
) -> Result<(RoaringBitmap, bool)> {
let mut is_merged_database = false;
match typed_chunk {
TypedChunk::DocidWordPositions(docid_word_positions_iter) => {
write_entries_into_database(
docid_word_positions_iter,
&index.docid_word_positions,
wtxn,
index_is_empty,
|value, buffer| {
// ensure that values are unique and ordered
let positions = roaring_bitmap_from_u32s_array(value);
BoRoaringBitmapCodec::serialize_into(&positions, buffer);
Ok(buffer)
},
|new_values, db_values, buffer| {
let new_values = roaring_bitmap_from_u32s_array(new_values);
let positions = match BoRoaringBitmapCodec::bytes_decode(db_values) {
Some(db_values) => new_values | db_values,
None => new_values, // should not happen
};
BoRoaringBitmapCodec::serialize_into(&positions, buffer);
Ok(())
},
)?;
}
TypedChunk::Documents(obkv_documents_iter) => {
let mut cursor = obkv_documents_iter.into_cursor()?;
while let Some((key, value)) = cursor.move_on_next()? {