4462: Divide threshold by ten r=dureuill a=ManyTheFish
Change the facet incremental vs bulk indexing threshold to better fit our user needs, it might be changed in the future if we have more insights
Co-authored-by: ManyTheFish <many@meilisearch.com>
4458: Replace logging timer by spans r=Kerollmops a=dureuill
- Remove logging timer dependency.
- Remplace last uses in search by spans
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
4459: Put a bound on OpenAI timeout r=dureuill a=dureuill
# Pull Request
## Related issue
Fixes#4460
## What does this PR do?
- Makes sure that the timeout of the openai embedder is limited to max 1min, rather than the prior 15min+
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
4445: Add subcommand to run benchmarks r=irevoire a=dureuill
# Pull Request
## Related issue
Not user-facing, no issue
## What does this PR do?
- Adds a new `cargo xtask bench` subcommand that can run one or multiple workload files and report the results to a server
- A workload file is a JSON file with a specific schema
- Refactor our use of the `vergen` crate:
- update to the beta `vergen-git2` crate
- VERGEN_GIT_SEMVER_LIGHTWEIGHT => VERGEN_GIT_DESCRIBE
- factor logic in a single `build-info` crate that is used both by meilisearch and xtask (prevents vergen variables from overriding themselves)
- checked that defining the variables by hand when no git repo is available (docker build case) still works.
- Add CI to run `cargo xtask bench`
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
4450: Add the content type in the webhook + improve the test r=Kerollmops a=irevoire
# Pull Request
## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/4436
## What does this PR do?
- Specify the content type of the webhook
- Ensure it’s the case in the test
Co-authored-by: Tamo <tamo@meilisearch.com>
4453: Don't test on nightly r=dureuill a=dureuill
# Pull Request
## Related issue
Fixes#4441 better 😅
## What does this PR do?
- No longer run tests on nightly
The motivation for this change is that we are now updating Rust at fixed points in time, and so no longer need nightly runs to ensure that a change won't get into stable and break our build at the worst possible moment.
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
4451: Fix nightly build r=dureuill a=dureuill
# Pull Request
## Related issue
Fixes#4441
## What does this PR do?
- Change imports following https://github.com/rust-lang/rust/pull/117772
## Note
This one is going to be annoying a bit until the lint stabilizes:
- We only get the warning on nightly, so we will discover them when it runs in the CI that uses the nightly compiler (not on regular PRs)
- There's the case of `TryInto`/`TryFrom` traits. They have been added to the prelude in Rust edition 2021, so it means that `use`ing them is a warning on nightly for 2021 edition crates (most crates), but not `use`ing them is an error anywhere for 2018 Rust edition crates, such as `milli`
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
4433: Enhance facet incremental r=Kerollmops a=ManyTheFish
# Pull Request
## Related issue
Fixes#4367Fixes#4409
## What does this PR do?
- Add a test reproducing #4409
- Fix#4409 by removing a document from a level only if it is no more present in all the linked sub-level nodes
- Optimize facet Incremental indexing by creating or deleting a complete level once per field id instead of for each facet value
- Optimize facet Incremental indexing by doing the additions and the deletions in the same process instead of doing them separately
Co-authored-by: ManyTheFish <many@meilisearch.com>
4446: Do not omit vectors when importing a dump r=irevoire a=dureuill
# Pull Request
## Related issue
Fixes#4447
## What does this PR do?
- Correctly populate the maps of embedders before starting the indexing operations, while importing a dump
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
4435: Make update file deletion atomic r=Kerollmops a=irevoire
# Pull Request
## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/4432
Fixes https://github.com/meilisearch/meilisearch/issues/4438 by adding the logs the user asked
## What does this PR do?
- Adds a bunch of logs to help debug this kind of issue in the future
- Delete the update files AFTER committing the update in the `index-scheduler` (thus, if a restart happens, we are able to re-process the batch successfully)
- Multi-thread the deletion of all update files.
Co-authored-by: Tamo <tamo@meilisearch.com>
4443: Add GPU analytics r=dureuill a=dureuill
# Pull Request
## Related issue
Adds analytics indicating whether Meilisearch was compiled with the `milli/cuda` feature.
Cc `@macraig`
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
4442: Send custom task r=ManyTheFish a=irevoire
This PR has already been merged on main but was supposed to be merged on `release-v1.7.0` thus we need to merge it a second time; sorry 😓
### This PR implements the necessary parameters for the High Availability
Introduce a new CLI flag called `--experimental-replication-parameters` that changes a few behaviors in the engine:
- [The auto-deletion of tasks is disabled](https://specs.meilisearch.com/specifications/text/0060-tasks-api.html#_2-technical-details)
- Upon registering a task, you can choose its task ID by sending a new header: `TaskId: 456645`. It must be a valid number, which must be superior to the last task id ever seen.
- Add the ability to « dry-register » a task. That means meilisearch will answer to you with a valid task ID like everything went well, but won’t actually write anything in the database. To do that, you need to use the `DryRun: true` header.
- Specification’s here: https://github.com/meilisearch/specifications/pull/266
Co-authored-by: Tamo <tamo@meilisearch.com>
4418: Output logs to stderr r=dureuill a=irevoire
Output the logs to `stderr` instead of `stdout`. This was introduced in the `v1.7.0-rc.0` and is a bug; logs should always be outputted to stderr.
Fix#4419
Co-authored-by: Tamo <tamo@meilisearch.com>
4410: Implement the experimental log mode cli flag and log level updates at runtime r=dureuill a=irevoire
# Pull Request
This PR fixes two issues at once because they’re highly correlated in the codebase.
## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/4415
Fixes https://github.com/meilisearch/meilisearch/issues/4413
## What does this PR do?
- It makes the fmt logger configurable to output json or human-readable logs (like we already do today)
- It moves the fmt logger under a `reload` layer so we can update its targets at runtime
- Add the possibility to stream logs in the json mode
- Adds an analytics for the new CLI flag
Co-authored-by: Tamo <tamo@meilisearch.com>
4401: Update version for the next release (v1.7.0) in Cargo.toml r=irevoire a=meili-bot
⚠️ This PR is automatically generated. Check the new version is the expected one and Cargo.lock has been updated before merging.
Co-authored-by: irevoire <irevoire@users.noreply.github.com>
4391: Tracing r=dureuill a=irevoire
# Pull Request
- [ ] Hide the parameters of the process batch
- [x] Make actix-web trace every call on every route
- [x] Remove all `env_logger`/`logs` dependencies
- [x] Be able to enable or disable the memory measurement using the `/logs` route parameters
See the following product discussion: https://github.com/orgs/meilisearch/discussions/721
Supersedes https://github.com/meilisearch/meilisearch/pull/4338
## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/4317
## What does this PR do?
Update the format of the logs from:
```
[2024-02-06T14:54:11Z INFO actix_server::builder] starting 10 workers
```
to
```
2024-02-06T13:58:14.710803Z INFO actix_server::builder: 200: starting 10 workers
```
First, run meilisearch with the route enabled via the feature flag:
- `cargo run --experimental-enable-logs-route`
- Or at runtime by sending the following payload:
```
curl \
-X PATCH 'http://localhost:7700/experimental-features/' \
-H 'Content-Type: application/json' \
--data-binary '{
"logsRoute": true
}'
```
Then gather data from meilisearch by calling for example:
```
curl \
-X POST http://localhost:7700/logs \
-H 'Content-Type: application/json' \
--data-binary '{
"mode": "fmt",
"target": "milli=trace"
}'
```
Once your operation is over, tell meilisearch to stop the route:
```
curl \
-X DELETE http://localhost:7700/logs
```
----
In the case you’re profiling code, you will be interested by the next command that converts the output of the route to a format that the firefox profiler can understand.
```bash
cargo run --release --bin trace-to-firefox -- 2024-01-17_17:07:55-indexing-trace.json
```
Then go to https://profiler.firefox.com and load it.
Note that we can also share the profiles using the https://share.firefox.dev website.
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
Co-authored-by: Clément Renault <clement@meilisearch.com>
Co-authored-by: Tamo <tamo@meilisearch.com>
4388: Cap the maximum memory of the grenad sorters r=curquiza a=Kerollmops
This PR clamps the memory usage of the grenad sorters to a reasonable maximum. Grenad sorters are opened on multiple threads at a time. This can result in higher memory usage than expected, even though it shouldn't consume more than the memory available.
Fixes#4152.
Co-authored-by: Clément Renault <clement@meilisearch.com>
4389: Stabilize scoreDetails r=dureuill a=dureuill
# Pull Request
## Related issue
Fixes#4359
## What does this PR do?
### User standpoint
- Users no longer need to enable the `scoreDetails` experimental feature to use `showRankingScoreDetails` in search queries.
- ⚠️ **Breaking change**: sending an object containing the key `"scoreDetails"` to the `/experimental-features` route is now an error. However, importing a dump of a database where that feature was enabled completes successfully.
### Implementation standpoint
- remove `scoreDetails` from the experimental features
- remove check on the experimental feature `scoreDetails` before accepting `showRankingScoreDetails`
- remove `scoreDetails` from the accepted fields in the `/experimental-features` route
- fix tests accordingly
## Manual tests
1. exported a dump with the `scoreDetails` feature enabled on `main`
- tried to import the dump after the changes in this PR
- the dump imported successfully
2. tried to make a search with `showRankingScoreDetails: true`
- the ranking score details are displayed
- an automated test case also exists and passes
3. tried to enable the `scoreDetails` in `/experimental-features`
- get error message
```
Unknown field `scoreDetails`: expected one of `vectorStore`, `metrics`, `exportPuffinReports`
```
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
4375: Feat: add new OpenAI models and ability to override dimensions r=dureuill a=Gosti
# Pull Request
Fixes#4394
## Related discussion
https://github.com/orgs/meilisearch/discussions/677#discussioncomment-8306384
## What does this PR do?
- Add text-embedding-3-small
- Add text-embedding-3-large
- Add optional dimensions parameter for both new models
## Note
As the dimensions option is not available for text-embedding-ada-002 I've added a manual check to prevent, but I feel it could be implemented in a more idiomatic rust
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: Gosti <gostitsog@gmail.com>
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
-> make sure the settings change is rejected or the settings task fails when the specified model doesn't support
overriding `dimensions` and the passed `dimensions` differs from the model's default dimensions.
4360: fix readme broken links r=curquiza a=Elliot67
# Pull Request
## What does this PR do?
- fix some links in the readme
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Co-authored-by: Elliot Lintz <45725915+Elliot67@users.noreply.github.com>
Co-authored-by: gui machiavelli <hey@guimachiavelli.com>
4364: Revert "Remove panic on the geosearch" r=curquiza a=irevoire
After more thought about it, we want to fix this bug in a patch release instead of `main`.
I revert this PR for now, but the fix will still land on `main` once we bring back the change of the `v1.6.1` on `main`.
Reverts meilisearch/meilisearch#4337
Co-authored-by: Tamo <irevoire@protonmail.ch>
4304: Add CUDA GPU support for Hugging Face embedders r=Kerollmops a=dureuill
Adds a "cuda" feature to `milli`.
Compiling with this feature requires that the CUDA support library be installed (see "with CUDA support" paragraph in https://huggingface.github.io/candle/guide/installation.html), and adds CUDA support to the `huggingFace` embedder.
To enable GPU support, users will need to:
1. Have a compatible NVidia GPU under Linux
2. Follow [the guide](https://huggingface.github.io/candle/guide/installation.html) to install the CUDA dependencies
3. Compile Meilisearch with the `cuda` feature: `cargo build --release --features cuda`
# Impact
Enabling the CUDA feature allows to use an available GPU to compute embeddings with a `huggingFace` embedder.
On an AWS Graviton 2, this yields a x3 - x5 improvement on indexing time.
# Technical details
- I had to change the CI so that the cuda feature is not included in the `Tests all features` workflow
- To achieve that, I had to add a binary following the `cargo xtask` design pattern, to list all features excepted the cuda one.
- I then changed the workflow accordingly (renamed to "Tests almost all features" 😉)
- A test run of the new feature was done on a temporary version of this PR that had it enabled for PRs: [See the results here](https://github.com/meilisearch/meilisearch/actions/runs/7461331929/job/20301216732)
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
4345: Bump h2 from 0.3.20 to 0.3.24 r=curquiza a=dependabot[bot]
Bumps [h2](https://github.com/hyperium/h2) from 0.3.20 to 0.3.24.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a href="https://github.com/hyperium/h2/releases">h2's releases</a>.</em></p>
<blockquote>
<h2>v0.3.24</h2>
<h2>Fixed</h2>
<ul>
<li>Limit error resets for misbehaving connections.</li>
</ul>
<h2>v0.3.23</h2>
<h2>What's Changed</h2>
<ul>
<li>cherry-pick fix: streams awaiting capacity lockout in <a href="https://redirect.github.com/hyperium/h2/pull/734">hyperium/h2#734</a></li>
</ul>
<h2>v0.3.22</h2>
<h2>What's Changed</h2>
<ul>
<li>Add <code>header_table_size(usize)</code> option to client and server builders.</li>
<li>Improve throughput when vectored IO is not available.</li>
<li>Update indexmap to 2.</li>
</ul>
<h2>New Contributors</h2>
<ul>
<li><a href="https://github.com/tottoto"><code>`@tottoto</code></a>` made their first contribution in <a href="https://redirect.github.com/hyperium/h2/pull/714">hyperium/h2#714</a></li>
<li><a href="https://github.com/xiaoyawei"><code>`@xiaoyawei</code></a>` made their first contribution in <a href="https://redirect.github.com/hyperium/h2/pull/712">hyperium/h2#712</a></li>
<li><a href="https://github.com/Protryon"><code>`@Protryon</code></a>` made their first contribution in <a href="https://redirect.github.com/hyperium/h2/pull/719">hyperium/h2#719</a></li>
<li><a href="https://github.com/4JX"><code>`@4JX</code></a>` made their first contribution in <a href="https://redirect.github.com/hyperium/h2/pull/638">hyperium/h2#638</a></li>
<li><a href="https://github.com/vuittont60"><code>`@vuittont60</code></a>` made their first contribution in <a href="https://redirect.github.com/hyperium/h2/pull/724">hyperium/h2#724</a></li>
</ul>
<h2>v0.3.21</h2>
<h2>What's Changed</h2>
<ul>
<li>Fix opening of new streams over peer's max concurrent limit.</li>
<li>Fix <code>RecvStream</code> to return data even if it has received a <code>CANCEL</code> stream error.</li>
<li>Update MSRV to 1.63.</li>
</ul>
<h2>New Contributors</h2>
<ul>
<li><a href="https://github.com/DDtKey"><code>`@DDtKey</code></a>` made their first contribution in <a href="https://redirect.github.com/hyperium/h2/pull/703">hyperium/h2#703</a></li>
<li><a href="https://github.com/jwilm"><code>`@jwilm</code></a>` made their first contribution in <a href="https://redirect.github.com/hyperium/h2/pull/707">hyperium/h2#707</a></li>
</ul>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a href="https://github.com/hyperium/h2/blob/v0.3.24/CHANGELOG.md">h2's changelog</a>.</em></p>
<blockquote>
<h1>0.3.24 (January 17, 2024)</h1>
<ul>
<li>Limit error resets for misbehaving connections.</li>
</ul>
<h1>0.3.23 (January 10, 2024)</h1>
<ul>
<li>Backport fix from 0.4.1 for stream capacity assignment.</li>
</ul>
<h1>0.3.22 (November 15, 2023)</h1>
<ul>
<li>Add <code>header_table_size(usize)</code> option to client and server builders.</li>
<li>Improve throughput when vectored IO is not available.</li>
<li>Update indexmap to 2.</li>
</ul>
<h1>0.3.21 (August 21, 2023)</h1>
<ul>
<li>Fix opening of new streams over peer's max concurrent limit.</li>
<li>Fix <code>RecvStream</code> to return data even if it has received a <code>CANCEL</code> stream error.</li>
<li>Update MSRV to 1.63.</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a href="7243ab5854"><code>7243ab5</code></a> Prepare v0.3.24</li>
<li><a href="d919cd6fd8"><code>d919cd6</code></a> streams: limit error resets for misbehaving connections</li>
<li><a href="a7eb14a487"><code>a7eb14a</code></a> v0.3.23</li>
<li><a href="b668c7fbe2"><code>b668c7f</code></a> fix: streams awaiting capacity lockout (<a href="https://redirect.github.com/hyperium/h2/issues/730">#730</a>) (<a href="https://redirect.github.com/hyperium/h2/issues/734">#734</a>)</li>
<li><a href="0f412d8b9c"><code>0f412d8</code></a> v0.3.22</li>
<li><a href="c7ca62f69b"><code>c7ca62f</code></a> docs: fix typos (<a href="https://redirect.github.com/hyperium/h2/issues/724">#724</a>)</li>
<li><a href="ef743ecb22"><code>ef743ec</code></a> Add a setter for header_table_size (<a href="https://redirect.github.com/hyperium/h2/issues/638">#638</a>)</li>
<li><a href="56651e6e51"><code>56651e6</code></a> fix lint about unused import</li>
<li><a href="4aa7b16342"><code>4aa7b16</code></a> Fix documentation for max_send_buffer_size (<a href="https://redirect.github.com/hyperium/h2/issues/718">#718</a>)</li>
<li><a href="d03c54a80d"><code>d03c54a</code></a> chore(dependencies): update tracing minimal version to 0.1.35</li>
<li>Additional commits viewable in <a href="https://github.com/hyperium/h2/compare/v0.3.20...v0.3.24">compare view</a></li>
</ul>
</details>
<br />
[](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting ``@dependabot` rebase`.
[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)
---
<details>
<summary>Dependabot commands and options</summary>
<br />
You can trigger Dependabot actions by commenting on this PR:
- ``@dependabot` rebase` will rebase this PR
- ``@dependabot` recreate` will recreate this PR, overwriting any edits that have been made to it
- ``@dependabot` merge` will merge this PR after your CI passes on it
- ``@dependabot` squash and merge` will squash and merge this PR after your CI passes on it
- ``@dependabot` cancel merge` will cancel a previously requested merge and block automerging
- ``@dependabot` reopen` will reopen this PR if it is closed
- ``@dependabot` close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- ``@dependabot` show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency
- ``@dependabot` ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- ``@dependabot` ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- ``@dependabot` ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
You can disable automated security fix PRs for this repo from the [Security Alerts page](https://github.com/meilisearch/meilisearch/network/alerts).
</details>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
4325: Add Setting API reminder in issue template r=ManyTheFish a=ManyTheFish
When adding a new setting, several important points can be easily forgotten.
This PR adds a small reminder list of some of these points in the issue template.
Co-authored-by: Many the fish <many@meilisearch.com>
4330: Add job variable to grafana dashboard r=irevoire a=capJavert
# Pull Request
## Related issue
Fixes https://github.com/orgs/meilisearch/discussions/625#discussioncomment-8143282
## What does this PR do?
"meilisearch" as [job_name](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#job_name) was hardcoded in the dashboard config so if user sets anything but "meilisearch" as job_name on prometheus side the dashboard does not work.
With this change dasboard will auto load the values from data source (much like instance variable) and show the correct data. This now also adds support for multiple meilisearch jobs in single dashboard.
See: https://github.com/orgs/meilisearch/discussions/625#discussioncomment-8143282
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: capJavert <ante@kickass.website>
4337: Remove panic on the geosearch r=ManyTheFish a=irevoire
# Pull Request
## Related issue
Fixes #4333
## What does this PR do?
- Add tests for the enrich pipeline on malformed documents with `null` value
- Reproduce the issue when updating the settings while there is malformed documents in the DB
- Fix the bug
Co-authored-by: Tamo <tamo@meilisearch.com>
4332: Update the dependencies r=irevoire a=Kerollmops
This PR upgrades the dependencies and fixes#4287.
- ~We keep arroy at the current commit. We will release and use the latest version published when possible~
- We also updated arroy to 0.2.0.
- I rolled back the version of rustls has too many breaking changes.
- I had to keep HTTP to 0.2.11 due to actix-cors.
Co-authored-by: Clément Renault <clement@meilisearch.com>
4263: Bump rustls-webpki from 0.101.3 to 0.101.7 r=irevoire a=dependabot[bot]
Bumps [rustls-webpki](https://github.com/rustls/webpki) from 0.101.3 to 0.101.7.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a href="https://github.com/rustls/webpki/releases">rustls-webpki's releases</a>.</em></p>
<blockquote>
<h2>0.101.7</h2>
<ul>
<li>Upgrades <code>*ring*</code> to 0.17, and <code>untrusted</code> to 0.9. Note: since <code>untrusted</code> appears in the <code>Error</code> API this may be a breaking change for applications using two <code>untrusted</code> versions.</li>
</ul>
<h2>What's Changed</h2>
<ul>
<li>Simplify tests for DER errors by <a href="https://github.com/djc"><code>`@djc</code></a>` in <a href="https://redirect.github.com/rustls/webpki/pull/193">rustls/webpki#193</a></li>
<li>Upgrade to ring 0.17, untrusted 0.9 by <a href="https://github.com/djc"><code>`@djc</code></a>` in <a href="https://redirect.github.com/rustls/webpki/pull/193">rustls/webpki#193</a></li>
<li>Bump MSRV to 1.61 by <a href="https://github.com/djc"><code>`@djc</code></a>` in <a href="https://redirect.github.com/rustls/webpki/pull/193">rustls/webpki#193</a></li>
<li>Upgrade to rcgen 0.11.3 by <a href="https://github.com/cpu"><code>`@cpu</code></a>` in <a href="https://redirect.github.com/rustls/webpki/pull/189">rustls/webpki#189</a>, <a href="https://redirect.github.com/rustls/webpki/pull/195">rustls/webpki#195</a></li>
<li>v0.101.7 preparation by <a href="https://github.com/cpu"><code>`@cpu</code></a>` in <a href="https://redirect.github.com/rustls/webpki/pull/199">rustls/webpki#199</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a href="https://github.com/rustls/webpki/compare/v/0.101.6...v/0.101.7">https://github.com/rustls/webpki/compare/v/0.101.6...v/0.101.7</a></p>
<h2>0.101.6</h2>
<ul>
<li>The <code>CertificateRevocationList</code> trait's <code>verify_signature</code> <code>Budget</code> argument was removed. This was a semver incompatible change mistakenly introduced in v0.101.5.</li>
</ul>
<h2>What's Changed</h2>
<ul>
<li>crl: rm Budget from verify_signature fn by <a href="https://github.com/cpu"><code>`@cpu</code></a>` in <a href="https://redirect.github.com/rustls/webpki/pull/187">rustls/webpki#187</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a href="https://github.com/rustls/webpki/compare/v/0.101.5...v/0.101.6">https://github.com/rustls/webpki/compare/v/0.101.5...v/0.101.6</a></p>
<h2>0.101.5</h2>
<ul>
<li>Path building complexity is now limited to a maximum budget of path finding operations, avoiding exponential processing time when encountering certificate chains containing many certificates with the same subject/issuer distinguished name but different subject public key information.</li>
<li>Name constraints evaluation is now limited to a maximum number of comparison operations, avoiding exponential processing time when encountering certificate chains containing many name constraints and subject alternate names.</li>
<li>Subject common names are no longer parsed for name iteration, or applying name constraints. Webpki only uses Subject Alternate Names when validating certificates, and the common name handling was buggy, producing <code>Error::BadDer</code> when iterating certificates with printable string subject common names, or omitted common names encoded as an empty sequence.</li>
</ul>
<h2>What's Changed</h2>
<p>The following PRs were backported to the rel-0.101 branch in <a href="https://redirect.github.com/rustls/webpki/issues/170">#170</a>:</p>
<ul>
<li>Further limits on expensive path building (<a href="https://redirect.github.com/rustls/webpki/issues/163">#163</a>)</li>
<li>Budget tweaks (<a href="https://redirect.github.com/rustls/webpki/issues/164">#164</a>)</li>
<li>Bound name constraint comparisons (<a href="https://redirect.github.com/rustls/webpki/issues/165">#165</a>)</li>
<li>Remove subject common name parsing (<a href="https://redirect.github.com/rustls/webpki/issues/169">#169</a>, thanks to <a href="https://github.com/hawkw"><code>`@hawkw</code></a>)</li>`
<li>Correct handling of fatal errors (<a href="https://redirect.github.com/rustls/webpki/issues/168">#168</a>)</li>
</ul>
<p>Thanks to all who have contributed, on behalf of the rustls team (<a href="https://github.com/ctz"><code>`@ctz</code></a>,` <a href="https://github.com/cpu"><code>`@cpu</code></a>` and <a href="https://github.com/djc"><code>`@djc</code></a>)!</p>`
<h2>0.101.4</h2>
<h2>Release notes</h2>
<ul>
<li>certificate path building and verification is now capped at 100 signature validation operations to avoid the risk of CPU usage denial-of-service attack when validating crafted certificate chains producing quadratic runtime. This risk affected both clients, as well as servers that verified client certificates.</li>
</ul>
<h2>What's Changed</h2>
<ul>
<li>v0.101.4 prep by <a href="https://github.com/cpu"><code>`@cpu</code></a>` in <a href="https://redirect.github.com/rustls/webpki/pull/153">rustls/webpki#153</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a href="https://github.com/rustls/webpki/compare/v/0.101.3...v/0.101.4">https://github.com/rustls/webpki/compare/v/0.101.3...v/0.101.4</a></p>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a href="ee5aab1dff"><code>ee5aab1</code></a> Cargo: v0.101.6 -> v0.101.7</li>
<li><a href="4f721a901f"><code>4f721a9</code></a> Upgrade to rcgen 0.11.3</li>
<li><a href="3be3625584"><code>3be3625</code></a> Bump MSRV to 1.61</li>
<li><a href="bb7c7f47ab"><code>bb7c7f4</code></a> Upgrade to ring 0.17, untrusted 0.9</li>
<li><a href="2eeb2920cf"><code>2eeb292</code></a> Simplify tests for DER errors</li>
<li><a href="7956538ee7"><code>7956538</code></a> Cargo: v0.101.5 -> v0.101.6</li>
<li><a href="7f8208ec06"><code>7f8208e</code></a> crl: rm <code>Budget</code> from <code>verify_signature</code> fn</li>
<li><a href="7cb6c646a0"><code>7cb6c64</code></a> Cargo: bump version 0.101.4 -> 0.101.5</li>
<li><a href="2dd2a06016"><code>2dd2a06</code></a> verify_cert: use enum for build chain error</li>
<li><a href="c255d61a6a"><code>c255d61</code></a> verify_cert: correct handling of fatal errors</li>
<li>Additional commits viewable in <a href="https://github.com/rustls/webpki/compare/v/0.101.3...v/0.101.7">compare view</a></li>
</ul>
</details>
<br />
[](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting ``@dependabot` rebase`.
[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)
---
<details>
<summary>Dependabot commands and options</summary>
<br />
You can trigger Dependabot actions by commenting on this PR:
- ``@dependabot` rebase` will rebase this PR
- ``@dependabot` recreate` will recreate this PR, overwriting any edits that have been made to it
- ``@dependabot` merge` will merge this PR after your CI passes on it
- ``@dependabot` squash and merge` will squash and merge this PR after your CI passes on it
- ``@dependabot` cancel merge` will cancel a previously requested merge and block automerging
- ``@dependabot` reopen` will reopen this PR if it is closed
- ``@dependabot` close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- ``@dependabot` show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency
- ``@dependabot` ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- ``@dependabot` ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- ``@dependabot` ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
You can disable automated security fix PRs for this repo from the [Security Alerts page](https://github.com/meilisearch/meilisearch/network/alerts).
</details>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
4319: Update README r=curquiza a=codesmith-emmy
# Pull Request
## Related issue
Fixes #<issue_number>
## What does this PR do?
- ...
## PR checklist
Please check if your PR fulfills the following requirements:
- [ ] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [ ] Have you read the contributing guidelines?
- [ ] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: emmanuel <154705254+codesmith-emmy@users.noreply.github.com>
When adding a new setting, there are several important points that can be easily forgotten.
This PR adds a small reminder list of some of these points.
4318: Hide embedders r=ManyTheFish a=dureuill
Hides `embedders` when it is an empty dictionary.
Manual tests:
- getting settings with empty embedders: not displayed
- getting settings with non-empty embedders: displayed like before
- dump with empty embedders: can be imported
- dump with non-empty embedders: can be imported
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
4313: Fix document formatting performances r=Kerollmops a=ManyTheFish
reduce the formatted option list to the attributes that should be formatted,
instead of all the attributes to display.
The time to compute the `format` list scales with the number of fields to format;
cumulated with `map_leaf_values` that iterates over all the nested fields, it gives a quadratic complexity:
`d*f` where `d` is the total number of fields to display and `f` is the total number of fields to format.
Co-authored-by: ManyTheFish <many@meilisearch.com>
4314: Fix proximity precision telemetry r=Kerollmops a=ManyTheFish
The proximity precision telemetry was partially missing in the global setting route.
This PR adds the missing field and return the default value when the value is not set.
Co-authored-by: ManyTheFish <many@meilisearch.com>
4311: Limit the number of values returned by the facet search r=dureuill a=Kerollmops
This PR fixes a bug where the number of values per facet returned by the `indexes/{index}/facet-search` route was not tacking the `faceting.maxValuePerFacet` setting. It also adds a test.
Co-authored-by: Clément Renault <clement@meilisearch.com>
4308: Fix hang on `/indexes` and `/stats` routes r=Kerollmops a=dureuill
# Pull Request
## Related issue
Fixes#4218
## Context
- A previous fix added a field to the `IndexScheduler` to memorize the `currently_updating_index`, so that accessing it through the search would return the handle without trying to open it. This resolved a hang on the search, but #4218 reported further hangs on the `/indexes` and `/stats` routes
- These routes were shunting the `IndexScheduler` and using internal `IndexMapper` logic to access the indexes, again trying to reopen the updating index.
## What does this PR do?
- Moves the logic relative to the `currently_updating_index` from the `IndexScheduler` to the `IndexMapper`, so that any index request to the `IndexMapper` can benefit from it.
## Test
1. Follow reproducer from #4218
2. Before this PR, notice a hang on `/stats` and `/indexes`, but not on `/indexes/<updating_index>/search`
3. After this PR, notice no hang on either of `/stats`, `/indexes` or `/indexes/<updating_index>/search`
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
4296: Fix single element search r=irevoire a=dureuill
# Pull Request
Before this PR, indexing a single vector in a single document would result in the vector not being found by the vector search.
This PR adds a test case for this condition, and resolves it by bumping arroy to a version containing the fix.
# Test case
Output of the test before and after this PR:
```diff
diff --git a/meilisearch/tests/search/hybrid.rs b/meilisearch/tests/search/hybrid.rs
index 2cd4b83e7..79819cab2 100644
--- a/meilisearch/tests/search/hybrid.rs on release-v1.6.0
+++ b/meilisearch/tests/search/hybrid.rs on fix-single-element-search
`@@` -171,5 +171,5 `@@` async fn single_document() {
.await;
snapshot!(code, `@"200` OK");
- snapshot!(response["hits"][0], `@r###"{"title":"Shazam!","desc":"a` Captain Marvel ersatz","id":"1","_vectors":{"default":[1.0,3.0]},"_rankingScore":0.0}"###);
+ snapshot!(response["hits"][0], `@r###"{"title":"Shazam!","desc":"a` Captain Marvel ersatz","id":"1","_vectors":{"default":[1.0,3.0]},"_rankingScore":1.0,"_semanticScore":1.0}"###);
}
```
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
4293: Update SDK test dependencies r=curquiza a=curquiza
Replace dependabot updates
The changes are really un-impactful for the engine team velocity because is about a CI
- that does not run during release deployment
- that does not run to merge a PR
It's only a weekly scheduled CI to check the breaking we introduced in the integrations.
I updated the dependencies based on what we do on the integration CIs
For example for dart, I looked at what we have in the [Dart CI](63fd758882/.github/workflows/tests.yml (L16-L54)) and I updated our CI in this repo accordingly. I did the same for each repository. This ensures we test the same things.
Co-authored-by: curquiza <clementine@meilisearch.com>
4294: fix compilation warnings for release v1.6 r=curquiza a=irevoire
# Pull Request
## Related issue
Fixes#4292
## What does this PR do?
- Removed unused imports
#4295 fixes the issue no main
Co-authored-by: Tamo <tamo@meilisearch.com>
4279: Check experimental feature on setting update query rather than in the task. r=ManyTheFish a=dureuill
Improve the UX by checking for the vector store feature and returning an error synchronously when sending a setting update, rather than in the indexing task.
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
4238: Task queue webhook r=dureuill a=irevoire
# Prototype `prototype-task-queue-webhook-1`
The prototype is available through Docker by using the following command:
```bash
docker run -p 7700:7700 -v $(pwd)/meili_data:/meili_data getmeili/meilisearch:prototype-task-queue-webhook-1
```
# Pull Request
Implements the task queue webhook.
## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/4236
## What does this PR do?
- Provide a new cli and env var for the webhook, respectively called `--task-webhook-url` and `MEILI_TASK_WEBHOOK_URL`
- Also supports sending the requests with a custom `Authorization` header by specifying the optional `--task-webhook-authorization-header` CLI parameter or `MEILI_TASK_WEBHOOK_AUTHORIZATION_HEADER` env variable.
- Throw an error if the specified URL is invalid
- Every time a batch is processed, send all the finished tasks into the webhook with our public `TaskView` type as a JSON Line GZIPed body.
- Add one test.
## PR checklist
### Before becoming ready to review
- [x] Add a test
- [x] Compress the data we send
- [x] Chunk and stream the data we send
- [x] Remove the unwrap in the index-scheduler when sending the data fails
- [x] The analytics are missing
### Before merging
- [x] Release a prototype
Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: Clément Renault <clement@meilisearch.com>
4277: Update mini-dashboard to v0.2.12 r=curquiza a=mdubus
# Pull Request
## Related issue
Fixes#4276
## What does this PR do?
Upgrade mini-dashboard to version 0.2.12 ([see changes](https://github.com/meilisearch/mini-dashboard/releases/tag/v0.2.12))
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: Morgane Dubus <30866152+mdubus@users.noreply.github.com>
4275: Flatten settings r=dureuill a=dureuill
# Pull Request
## Related issue
Initial internal feedback seems to indicate that the current shape of the `embedders` setting is undesirable: it has too much depth.
This PR changes this by flattening the structure of the embedders to the following:
```json5
// NEW structure
"embedders": {
// still starts with the embedder name
"default": {
"source": "huggingFace", // now a string
// properties of the source are all at the same level as the source
"model": "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2",
"revision": "a9c555277f9bcf24f28fa5e092e665fc6f7c49cd",
"documentTemplate": "A product titled '{{doc.title}}'" // now a string
}
}
```
By comparison, the old structure was:
```json5
// PREVIOUS version, no longer working with this PR
"embedders": {
// still starts with the embedder name
"default": {
"source": {
"huggingFace": {
"model": "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2",
"revision": "a9c555277f9bcf24f28fa5e092e665fc6f7c49cd"
},
"documentTemplate": {
"template": "A product titled '{{doc.title}}'" // now a string
}
}
}
```
The fields that are accepted in the new version of the `embedders` setting are depending on the value of the `source` field:
```json5
// huggingFace
"embedders": {
"default": {
"source": "huggingFace",
"model": "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2",
"revision": "a9c555277f9bcf24f28fa5e092e665fc6f7c49cd",
"documentTemplate": "A product titled '{{doc.title}}'"
}
}
// openAi
"embedders": {
"default": {
"source": "openAi",
"model": "text-embedding-ada-002",
"apiKey": "open_ai_api_key",
"documentTemplate": "A product titled '{{doc.title}}'"
}
}
// userProvided
"embedders": {
"default": {
"source": "userProvided",
"dimensions": 42, // mandatory
}
}
```
## What does this PR do?
- Flatten the settings structure
- Validate the prompt earlier to return a synchronous error on setting change rather than in the failing task
- Make it an error to pass a field for the wrong source (see above for allowed fields for each source)
- Not changed: It is still an error not to pass `dimensions` to the `userProvided` embedder
- If `source` was specified in the settings, validate the setting early to return a synchronous error in case of a missing mandatory field for the userProvided source (dimensions) or a forbidden field for the specified source.
- If `source` was not specified in the settings, still validate the setting, but only at indexing time, by using the source stored in the DB.
- Resets all values if the source changes, even if the user did not reset them explicitly.
## PR checklist
Please check if your PR fulfills the following requirements:
- [ ] Change the public facing guide for using the API
- [ ] Change examples of use in the changelog
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
4272: Don't pass default revision when the model is explicitly set in config r=Kerollmops a=dureuill
# Pull Request
## Related issue
Fixes#4271
## What does this PR do?
- When the `model` is explicitly set in the `embedders` setting, we reset the `revision` to `None`, such that if the user doesn't specify a revision, the head of the model repository is chosen.
- Not changed: If the user specifies a revision, it applies, like previously.
- Not changed: If the user doesn't specify a model, the default model with the default revision applies, like previously.
## Manual testing on a fresh DB
1. Enable experimental feature:
```sh
curl \
-X PATCH 'http://localhost:7700/experimental-features/' \
-H 'Content-Type: application/json' -H 'Authorization: Bearer foo' \
--data-binary '{ "vectorStore": true
}'
```
2. Send settings with a specified model but no specified revision:
```sh
curl \
-X PATCH 'http://localhost:7700/indexes/products/settings' \
-H 'Content-Type: application/json' --data-binary \
'{ "embedders": { "default": { "source": { "huggingFace": { "model": "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2" } }, "documentTemplate": { "template": "A product titled '{{doc.title}}'"} } } }'
```
3. Check that the task was successful:
```sh
curl 'http://localhost:7700/tasks/0'
{"uid":0,"indexUid":"products","status":"succeeded","type":"settingsUpdate","canceledBy":null,"details":{"embedders":{"default":{"source":{"huggingFace":{"model":"sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2"}},"documentTemplate":{"template":"A product titled {{doc.title}}"}}}},"error":null,"duration":"PT0.001892S","enqueuedAt":"2023-12-20T09:17:01.73789Z","startedAt":"2023-12-20T09:17:01.73854Z","finishedAt":"2023-12-20T09:17:01.740432Z"}
```
4. Send documents to index:
```sh
curl 'https://localhost:7700/indexes/products/documents' -H 'Content-Type: application/json' --data-binary '{"id": 0, "title": "Best product"}'
```
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
4269: Remove dependency that requires libstdc++ r=dureuill a=dureuill
Removes the dependency that caused the additional runtime dependency on libstdc++ by disabling the default features of the hf tokenizer.
## Discussion
- This removes a feature that is using a C++ dependency and is supposed to accelerate the tokenizer. As the tokenizer is likely to be a significant bottleneck for embedding texts using a HF model, this is an issue.
- We should at least rerun the movies vector indexing and check that it still works correctly and that it has a runtime in the ballpark of what it used to be.
Co-authored-by: Louis Dureuil <louis.dureuil@xinra.net>
4268: Add libstdc++ in Dockerfile r=curquiza a=sanders41
# Pull Request
## Related issue
Fixes#4267
## What does this PR do?
- Add libstdc++ in the Dockerfile
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: Paul Sanders <psanders1@gmail.com>
4262: Update version for the next release (v1.6.0) in Cargo.toml r=curquiza a=meili-bot
⚠️ This PR is automatically generated. Check the new version is the expected one and Cargo.lock has been updated before merging.
Co-authored-by: curquiza <curquiza@users.noreply.github.com>
4257: Change proximity precision settings r=dureuill a=ManyTheFish
- [x] Add proximity_precision value into the analytics
- [x] Change the naming of `attributeScale` and `wordScale` into `byAttribute` and `byWord`
- [x] Remove proximityPrecision from the experimental feature
Co-authored-by: ManyTheFish <many@meilisearch.com>
Co-authored-by: Many the fish <many@meilisearch.com>
4226: Hybrid search r=dureuill a=dureuill
Allows to perform hybrid search requests that combine the results of semantic and keyword search and automatically generate embeddings.
## How to use
See [feature description](https://meilisearch.notion.site/v1-6-Hybrid-Search-Embedders-ea42c82f90cc4bc0be1eeb917c1118c8)
## Changes
- work is based on #4213
- milli::new search now takes an input universe directly, rather than computing it from a filter. This adds flexibility to require results on a subset of documents
- vector search is now a regular ranking rule (akin to sort and geosort) and reports its score as a ScoreDetail
- separate keyword search and vector search functions, vector search now respects (geo)sort ranking rules
- add automatic embedding
- add hybrid search
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
Co-authored-by: ManyTheFish <many@meilisearch.com>
- DistributionShift in Search object (to be set from model in embed?)
- Fix issue where embedder index wasn't computed at search time
- Accept as default embedder either the "default" one, or the only embedder when there is only one
4254: Bring back v1.5.1 changes into main r=ManyTheFish a=Kerollmops
This pull request brings back changes from the _release-v1.5.1_ branch into _main_.
Co-authored-by: ManyTheFish <many@meilisearch.com>
Co-authored-by: meili-bors[bot] <89034592+meili-bors[bot]@users.noreply.github.com>
Co-authored-by: curquiza <curquiza@users.noreply.github.com>
Co-authored-by: Clément Renault <clement@meilisearch.com>
4250: Update version for the next release (v1.5.1) in Cargo.toml r=dureuill a=meili-bot
⚠️ This PR is automatically generated. Check the new version is the expected one and Cargo.lock has been updated before merging.
Co-authored-by: curquiza <curquiza@users.noreply.github.com>
4239: Remove the actix-web dependency from milli r=dureuill a=Kerollmops
Just remove actix-web from milli.
Co-authored-by: Clément Renault <clement@meilisearch.com>
4233: Add test reproducing #4232 r=dureuill a=ManyTheFish
- add a test reproducing the bug
- fix the bug by creating 2 different restricting lists of attributes, one for the exact attributes, and the other for the tolerant attributes
## Related issue
Fixes#4232
Co-authored-by: ManyTheFish <many@meilisearch.com>
4223: Update to heed 0.20 r=dureuill a=Kerollmops
This PR brings the v0.20-alpha.9 version of heed into Meilisearch 🎉 The main goal is to test it in a real environment to make the necessary changes if needed. We also want to merge it as soon as possible during the pre-release phase to ensure we catch bugs before the release.
Most of the calls to heed are the same as before, except:
- The `PolyDatabase` has been replaced with a `Database<Unspecified, Unspecified>`. We replaced the `get<T, U>()` by a `remap<T, U>().get()` calls.
- The `Database` `append(...)` method has been replaced with a `put_with_flags(PutFlags::APPEND, ...)`.
- The `RwTxn<'e, 'p>` has been simplified into a `RwTxn<'e>`.
- The `BytesEncode/Decode` traits return a `Result<_, BoxedError>` instead of an `Option<_>`.
- We no longer need to wrap and unwrap the `BEU32` integer when storing/getting them from heed.
### TODO
- [x] Create actual, simple error types instead of using strings in the codecs.
### Follow-up work
- Move the codecs into another member crate (we depend on the uuid one in the meilitool crate).
- Display the internal decoding error in the `SerializationError` internal error variant.
Co-authored-by: Clément Renault <clement@meilisearch.com>
4234: Fix puffin in the index scheduler r=dureuill a=irevoire
Currently, we can't compile the index scheduler without this feature.
It could be cool to specify the dependencies in the main workspace cargo toml like quickwit does to avoid this kind of error in the future; https://github.com/quickwit-oss/quickwit/blob/main/quickwit/Cargo.toml#L41
Co-authored-by: Tamo <tamo@meilisearch.com>
4231: Fixed payload limit setting being ignored for delete documents by batch r=Kerollmops a=Karribalu
# Pull Request
## Related issue
Fixes#4224
## What does this PR do?
- Added http_payload_size_limit to JsonConfig to allow deleting documents in batches with a payload size greater than 2MB, which is the default limit set in the JsonConfig crate.
## PR checklist
Please check if your PR fulfills the following requirements:
- [Y] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [Y] Have you read the contributing guidelines?
- [Y] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: karribalu <karri.balu123456@gmail.com>
4090: Diff indexing r=ManyTheFish a=ManyTheFish
This pull request aims to reduce the indexing time by computing a difference between the data added to the index and the data removed from the index before writing in LMDB.
## Why focus on reducing the writings in LMDB?
The indexing in Meilisearch is split into 3 main phases:
1) The computing or the extraction of the data (Multi-threaded)
2) The writing of the data in LMDB (Mono-threaded)
3) The processing of the prefix databases (Mono-threaded)
see below:

Because the writing is mono-threaded, it represents a bottleneck in the indexing, reducing the number of writes in LMDB will reduce the pressure on the main thread and should reduce the global time spent on the indexing.
## Give Feedback
We created [a dedicated discussion](https://github.com/meilisearch/meilisearch/discussions/4196) for users to try this new feature and to give feedback on bugs or performance issues.
## Technical approach
### Part 1: merge the addition and the deletion process
This part:
a) Aims to reduce the time spent on indexing only the filterable/sortable fields of documents, for example:
- Updating the number of "likes" or "stars" of a song or a movie
- Updating the "stock count" or the "price" of a product
b) Aims to reduce the time spent on writing in LMDB which should reduce the global indexing time for the highly multi-threaded machines by reducing the writing bottleneck.
c) Aims to reduce the average time spent to delete documents without having to keep the soft-deleted documents implementation
- [x] Create a preprocessing function that creates the diff-based documents chuck (`OBKV<fid, OBKV<AddDel, value>>`)
- [x] and clearly separate the faceted fields and the searchable fields in two different chunks
- Change the parameters of the input extractor by taking an `OBKV<fid, OBKV<AddDel, value>>` instead of `OBKV<fid, value>`.
- [x] extract_docid_word_positions
- [x] extract_geo_points
- [x] extract_vector_points
- [x] extract_fid_docid_facet_values
- Adapt the searchable extractors to the new diff-chucks
- [x] extract_fid_word_count_docids
- [x] extract_word_pair_proximity_docids
- [x] extract_word_position_docids
- [x] extract_word_docids
- Adapt the facet extractors to the new diff-chucks
- [x] extract_facet_number_docids
- [x] extract_facet_string_docids
- [x] extract_fid_docid_facet_values
- [x] FacetsUpdate
- [x] Adapt the prefix database extractors ⚠️⚠️
- [x] Make the LMDB writer remove the document_ids to delete at the same time the new document_ids are added
- [x] Remove document deletion pipeline
- [x] remove `new_documents_ids` entirely and `replaced_documents_ids`
- [x] reuse extracted external id from transform instead of re-extracting in `TypedChunks::Documents`
- [x] Remove deletion pipeline after autobatcher
- [x] remove autobatcher deletion pipeline
- [x] everything uses `IndexOperation::DocumentOperation`
- [x] repair deletion by internal id for filter by delete
- [x] Improve the deletion via internal ids by avoiding iterating over the whole set of external document ids.
- [x] Remove soft-deleted documents
#### FIXME
- [x] field distribution is not correctly updated after deletion
- [x] missing documents in the tests of tokenizer_customization
### Part 2: Only compute the documents field by field
This part aims to reduce the global indexing time for any kind of partial document modification on any size of machine from the mono-threaded one to the highly multi-threaded one.
- [ ] Make the preprocessing function only send the fields that changed to the extractors
- [ ] remove the `word_docids` and `exact_word_docids` database and adapt the search (⚠️ could impact the search performances)
- [ ] replace the `word_pair_proximity_docids` database with a `word_pair_proximity_fid_docids` database and adapt the search (⚠️ could impact the search performances)
- [ ] Adapt the prefix database extractors ⚠️⚠️
## Technical Concerns
- The part 1 implementation could increase the indexing time for the smallest machines (with few threads) by increasing the extracting time (multi-threaded) more than the writing time (mono-threaded)
- The part 2 implementation needs to change the databases which could have a significant impact on the search performances
- The prefix databases are a bit special to process and may be a pain to adapt to the difference-based indexing
Co-authored-by: ManyTheFish <many@meilisearch.com>
Co-authored-by: Clément Renault <clement@meilisearch.com>
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
4203: Extract external document docids from docs on deletion by filter r=Kerollmops a=dureuill
This fixes some of the performance regression observed on `diff-indexing` when doing delete-by-filter with a filter matching many documents.
To delete 19 768 771 documents (hackernews dataset, all documents matching `type = comment`), here are the observed time:
|branch (commit sha1sum)|time|speed-down factor (lower is better)|
|--|--|--|
|`main` (48865470d7)|1212.885536s (~20min)|x1.0 (baseline)|
|`diff-indexing` (523519fdbf)|5385.550543s (90min)|x4.44|
|**`diff-indexing-extract-primary-key`**(f8289cd974)|2582.323324s (43min) | x2.13|
So we're still suffering a speed-down of x2.13, but that's much better than x4.44.
---
Changes:
- Refactor the logic of PrimaryKey extraction to a struct
- Add a trait to abstract the extraction of field id from a name between `DocumentBatch` and `FieldIdMap`.
- Add `Index::external_id_of` to get the external ids of a bitmap of internal ids.
- Use this new method to add new Transform and Batch methods to remove documents that are known to be from the DB.
- Modify delete-by-filter to use the new method
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
4205: Prevent search hang on the processing index r=Kerollmops a=dureuill
Fixes#4206, an issue originally [reported on Discord](https://discord.com/channels/1006923006964154428/1148983671026618579/1148983671026618579) where having parallel search requests on more indexes than the index cache capacity would cause search requests on the currently updating index to hang until the index is done updating.
## Test setup
- Create 20 empty indexes by sending settings to them
- repeatedly send placeholder search requests to each of the indexes in a loop
- Create another index and send a significant batch of documents to index.
- Attempt to perform a search request on that last index.
- Before this PR, the search request hangs while the index update task is processing
- After this PR, the search request respond immediately even while the index update task is processing
## Changes
- When getting the handle to an index for some potentially long running batches of tasks, save it in the index scheduler.
- Drop the handle from the index-scheduler when the task is done so that we don't leak indexes.
- When getting an index from outside the task queue processor, check if there is such an handle matching the requested index. If so, skip the cache entirely and clone the handle.
Co-authored-by: Louis Dureuil <louis.dureuil@xinra.net>
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
4204: Throw error when the vector search is sent with the wrong size r=Kerollmops a=dureuill
# Pull Request
## Related issue
Fixes#4201
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
4185: Bump Swatinem/rust-cache from 2.6.2 to 2.7.1 r=curquiza a=dependabot[bot]
Bumps [Swatinem/rust-cache](https://github.com/swatinem/rust-cache) from 2.6.2 to 2.7.1.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a href="https://github.com/swatinem/rust-cache/releases">Swatinem/rust-cache's releases</a>.</em></p>
<blockquote>
<h2>v2.7.0</h2>
<h2>What's Changed</h2>
<ul>
<li>Fix save-if documentation in readme by <a href="https://github.com/rukai"><code>`@rukai</code></a>` in <a href="https://redirect.github.com/Swatinem/rust-cache/pull/166">Swatinem/rust-cache#166</a></li>
<li>Support for <code>trybuild</code> and similar macro testing tools by <a href="https://github.com/neysofu"><code>`@neysofu</code></a>` in <a href="https://redirect.github.com/Swatinem/rust-cache/pull/168">Swatinem/rust-cache#168</a></li>
</ul>
<h2>New Contributors</h2>
<ul>
<li><a href="https://github.com/rukai"><code>`@rukai</code></a>` made their first contribution in <a href="https://redirect.github.com/Swatinem/rust-cache/pull/166">Swatinem/rust-cache#166</a></li>
<li><a href="https://github.com/neysofu"><code>`@neysofu</code></a>` made their first contribution in <a href="https://redirect.github.com/Swatinem/rust-cache/pull/168">Swatinem/rust-cache#168</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a href="https://github.com/Swatinem/rust-cache/compare/v2.6.2...v2.7.0">https://github.com/Swatinem/rust-cache/compare/v2.6.2...v2.7.0</a></p>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a href="https://github.com/Swatinem/rust-cache/blob/master/CHANGELOG.md">Swatinem/rust-cache's changelog</a>.</em></p>
<blockquote>
<h2>2.7.1</h2>
<ul>
<li>Update toml parser to fix parsing errors.</li>
</ul>
<h2>2.7.0</h2>
<ul>
<li>Properly cache <code>trybuild</code> tests.</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a href="3cf7f8cc28"><code>3cf7f8c</code></a> 2.7.1</li>
<li><a href="e03705e031"><code>e03705e</code></a> changelog</li>
<li><a href="b86d1c6caa"><code>b86d1c6</code></a> bump all the other dependencies too</li>
<li><a href="f27990c89a"><code>f27990c</code></a> Update Dependencies (<a href="https://redirect.github.com/swatinem/rust-cache/issues/172">#172</a>)</li>
<li><a href="a95ba19544"><code>a95ba19</code></a> 2.7.0</li>
<li><a href="82c8487d00"><code>82c8487</code></a> changelog</li>
<li><a href="67c46e7159"><code>67c46e7</code></a> Support for <code>trybuild</code> and similar macro testing tools (<a href="https://redirect.github.com/swatinem/rust-cache/issues/168">#168</a>)</li>
<li><a href="44b6087283"><code>44b6087</code></a> Fix save-if documentation in readme (<a href="https://redirect.github.com/swatinem/rust-cache/issues/166">#166</a>)</li>
<li>See full diff in <a href="https://github.com/swatinem/rust-cache/compare/v2.6.2...v2.7.1">compare view</a></li>
</ul>
</details>
<br />
[](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
You can trigger a rebase of this PR by commenting ``@dependabot` rebase`.
[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)
---
<details>
<summary>Dependabot commands and options</summary>
<br />
You can trigger Dependabot actions by commenting on this PR:
- ``@dependabot` rebase` will rebase this PR
- ``@dependabot` recreate` will recreate this PR, overwriting any edits that have been made to it
- ``@dependabot` merge` will merge this PR after your CI passes on it
- ``@dependabot` squash and merge` will squash and merge this PR after your CI passes on it
- ``@dependabot` cancel merge` will cancel a previously requested merge and block automerging
- ``@dependabot` reopen` will reopen this PR if it is closed
- ``@dependabot` close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- ``@dependabot` show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency
- ``@dependabot` ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- ``@dependabot` ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- ``@dependabot` ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
</details>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
4167: Introduce the `meilitool` command line interface r=Kerollmops a=Kerollmops
This PR introduces a small tool to help the Cloud team:
- Clear the tasks queue by removing all the tasks
- Dump a Meilisearch database without having to enqueue the task
- Access this `meilitool` binary from the Docker Image
## TODO
- [x] Modify the Docker File to ship with this new tool (`@curquiza,` could you review that, please?)
- [x] Clear the tasks queue by removing all the tasks
- [x] Add more logs to explain what is happening
- [x] Clear the `update_files` folder
- [x] Dump a Meilisearch database without having to enqueue the task
- [x] Add more logs to explain what is happening
- [x] Introduce a flag to skip dumping enqueued and processing tasks.
- [x] Dump the instance uid.
- [x] Dump the keys.
- [x] Dump the tasks with the update files.
- [x] Dump the index documents and settings.
- [ ] ~Dump the experimental features~
Co-authored-by: Clément Renault <clement@meilisearch.com>
4169: update charabia r=curquiza a=ManyTheFish
Update Charabia to v0.8.5 and add the new khmer tokenizer
Co-authored-by: ManyTheFish <many@meilisearch.com>
4132: Extract the creation and last updated timestamp from v2 dumps r=irevoire a=vivek-26
# Pull Request
## Related issue
Fixes#2989
## What does this PR do?
This PR -
- extracts the `created_at` and `updated_at` dates from v2 dumps.
- updates the unit tests.
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: Vivek Kumar <vivek.26@outlook.com>
4154: Update version for the next release (v1.5.0) in Cargo.toml r=curquiza a=meili-bot
⚠️ This PR is automatically generated. Check the new version is the expected one and Cargo.lock has been updated before merging.
Co-authored-by: curquiza <curquiza@users.noreply.github.com>
4126: Make the experimental route /metrics activable via HTTP r=dureuill a=braddotcoffee
# Pull Request
## Related issue
Closes#4086
## What does this PR do?
- [x] Make `/metrics` available via HTTP as described in #4086
- [x] The users can still launch Meilisearch using the `--experimental-enable-metrics` flag.
- [x] If the flag `--experimental-enable-metrics` is activated, a call to the `GET /experimental-features` route right after the launch will show `"metrics": true` even if the user has not called the `PATCH /experimental-features` route yet.
- [x] Even if the --experimental-enable-metrics flag is present at launch, calling the `PATCH /experimental-features` route with `"metrics": false` disables the experimental feature.
- [x] Update the spec
- I was unable to find docs in this repository to update about the `/experimental-features` endpoint. I'll happily update if you point me in the right direction!
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Co-authored-by: bwbonanno <bradfordbonanno@gmail.com>
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
4134: Bump rustix from 0.36.15 to 0.36.16 r=Kerollmops a=dependabot[bot]
Bumps [rustix](https://github.com/bytecodealliance/rustix) from 0.36.15 to 0.36.16.
<details>
<summary>Commits</summary>
<ul>
<li><a href="6534992521"><code>6534992</code></a> chore: Release rustix version 0.36.16</li>
<li><a href="4928cf7a38"><code>4928cf7</code></a> Disable riscv64 testing.</li>
<li><a href="8cc159c4c3"><code>8cc159c</code></a> Fix the <code>test_ttyname_ok</code> test when /dev/stdin is inaccessable. (<a href="https://redirect.github.com/bytecodealliance/rustix/issues/821">#821</a>)</li>
<li><a href="6dc7ba9478"><code>6dc7ba9</code></a> Downgrade dependencies and disable tests to compile under Rust 1.48.</li>
<li><a href="ded8986e7e"><code>ded8986</code></a> Disable MIPS in CI. (<a href="https://redirect.github.com/bytecodealliance/rustix/issues/793">#793</a>)</li>
<li><a href="739f9c3ba0"><code>739f9c3</code></a> Fixes for <code>Dir</code> on macOS, FreeBSD, and WASI.</li>
<li><a href="87481a97f4"><code>87481a9</code></a> Merge pull request from GHSA-c827-hfw6-qwvm</li>
<li>See full diff in <a href="https://github.com/bytecodealliance/rustix/compare/v0.36.15...v0.36.16">compare view</a></li>
</ul>
</details>
<br />
[](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting ``@dependabot` rebase`.
[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)
---
<details>
<summary>Dependabot commands and options</summary>
<br />
You can trigger Dependabot actions by commenting on this PR:
- ``@dependabot` rebase` will rebase this PR
- ``@dependabot` recreate` will recreate this PR, overwriting any edits that have been made to it
- ``@dependabot` merge` will merge this PR after your CI passes on it
- ``@dependabot` squash and merge` will squash and merge this PR after your CI passes on it
- ``@dependabot` cancel merge` will cancel a previously requested merge and block automerging
- ``@dependabot` reopen` will reopen this PR if it is closed
- ``@dependabot` close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- ``@dependabot` show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency
- ``@dependabot` ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- ``@dependabot` ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- ``@dependabot` ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
You can disable automated security fix PRs for this repo from the [Security Alerts page](https://github.com/meilisearch/meilisearch/network/alerts).
</details>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
4073: Simplify Puffin report exports r=ManyTheFish a=Kerollmops
This PR changes how we export Puffin reports by directly writing them to disk when the `exportPuffinReports` [experimental feature is enabled](https://www.meilisearch.com/docs/learn/experimental/overview) on the `/experimental-features` route. It also adds more puffing logging to the deletion phase and grenad helpers. The puffin reports are identified by the date and time at which they are exported.
## Todo List
- [x] Change the CLI flag to be an API experimental option.
- [x] Create [a PRD for this experimental feature (private)](https://www.notion.so/meilisearch/Export-Puffin-Reports-091df151e71c4edfb7d72f4bf995b3ea).
- [x] Create and complete [a product discussion](https://github.com/meilisearch/product/discussions/693) (copy/paste PROFILING markdown?).
- [x] Update the _PROFILING.md_ markdown file instructions.
- [x] Change the debug logs of the processing operation (visible in puffin viewer).
Co-authored-by: Clément Renault <clement@meilisearch.com>
Co-authored-by: Kerollmops <clement@meilisearch.com>
4101: Bump webpki from 0.22.1 to 0.22.2 r=curquiza a=dependabot[bot]
Bumps [webpki](https://github.com/briansmith/webpki) from 0.22.1 to 0.22.2.
<details>
<summary>Commits</summary>
<ul>
<li>See full diff in <a href="https://github.com/briansmith/webpki/commits">compare view</a></li>
</ul>
</details>
<br />
[](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting ``@dependabot` rebase`.
[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)
---
<details>
<summary>Dependabot commands and options</summary>
<br />
You can trigger Dependabot actions by commenting on this PR:
- ``@dependabot` rebase` will rebase this PR
- ``@dependabot` recreate` will recreate this PR, overwriting any edits that have been made to it
- ``@dependabot` merge` will merge this PR after your CI passes on it
- ``@dependabot` squash and merge` will squash and merge this PR after your CI passes on it
- ``@dependabot` cancel merge` will cancel a previously requested merge and block automerging
- ``@dependabot` reopen` will reopen this PR if it is closed
- ``@dependabot` close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- ``@dependabot` show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency
- ``@dependabot` ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- ``@dependabot` ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- ``@dependabot` ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
You can disable automated security fix PRs for this repo from the [Security Alerts page](https://github.com/meilisearch/meilisearch/network/alerts).
</details>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
4108: Fix bug where search with distinct attribute and no ranking, returns offset+limit hits r=curquiza a=vivek-26
# Pull Request
## Related issue
Fixes#4078
## What does this PR do?
This PR -
- Fixes bug where search with distinct attribute and no ranking, returns offset+limit hits.
- Adds unit and integration tests.
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: Vivek Kumar <vivek.26@outlook.com>
4089: Use a bufreader and bufwriter everytime there is a grenad<file> r=curquiza a=irevoire
# Pull Request
Wrap all the files we give to a grenad in a `BufReader` or `BufWriter`.
The dump import I tried in the issue went from 2h to 10 minutes on my machine.
I also ran a bunch of benchmarks on my machine, and we're faster by a few seconds everywhere but nothing huge.
-----
The one thing I’m afraid about is if we used to get the inner file in a grenad and then do a read right after without a seek at the beginning of the file or a reopen.
Since we now use a bufreader our read would return the bytes one buffer later and probably completely corrupt what we were supposed to read.
From what I see, it looks like it works, but I may have missed something, I don't know much about this part of the codebase.
This issue should not arise on the bufwriter, though, because if we're not able to write the content of the buffer I ensured that the `into_inner` of the bufwriter should return an internal error.
## Related issue
Fixes#4087
Co-authored-by: Tamo <tamo@meilisearch.com>
4112: Update version for the next release (v1.4.1) in Cargo.toml r=curquiza a=meili-bot
⚠️ This PR is automatically generated. Check the new version is the expected one and Cargo.lock has been updated before merging.
Co-authored-by: curquiza <curquiza@users.noreply.github.com>
4102: Introduce the first bot that shows benchmarks results r=curquiza a=Kerollmops
TBD
Co-authored-by: Kerollmops <clement@meilisearch.com>
Co-authored-by: Clément Renault <clement@meilisearch.com>
4074: Enable analytics in debug builds r=Kerollmops a=irevoire
# Pull Request
## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/4072
## What does this PR do?
- Stop disabling the analytics if meilisearch has been compiled in debug mode
Co-authored-by: Tamo <tamo@meilisearch.com>
4065: Dependency issue every 6 months r=curquiza a=curquiza
To avoid spending too much time on it (1 every two sprints)
If you disagree `@Kerollmops,` for security or any reason, please close the PR
Co-authored-by: Clémentine U. - curqui <clementine@meilisearch.com>
4044: Add more integrations to SDK CI r=curquiza a=curquiza
For the integration scope management, but also to anticipate bugs and breaking changes for engine team, we need to add more SDKs tests into the CI
Co-authored-by: curquiza <clementine@meilisearch.com>
Display docker image
Add strapi and firebase
Add rails and symfony tests
Remove strapi and firestore tests
Fix dotnet SDK CI
Use specific dart SDK version
Disable coverage for ruby SDK
Prevent pushing coverage information to codecov
Remove codecoverage token
Trigger Build
Trigger Build
Trigger Build
Trigger Build
Trigger Build
4056: Rewrite segment_analytics module with the destructuring syntax r=Kerollmops a=vivek-26
# Pull Request
## Related issue
Fixes#3928
## What does this PR do?
- This PR uses Rust Destructuring syntax in the `segment_analytics` module, such that adding or deleting fields causes an error at compile time.
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: Vivek Kumar <vivek.26@outlook.com>
4053: Fix the stats of the documents deletion by filter r=Kerollmops a=irevoire
# Pull Request
The issue was that the operation « DocumentDeletionByFilter » was not declared as an index operation. That means the index stats were not reprocessed after the application of the operation.
## Related issue
Fixes#4018
## What does this PR do?
- Move the `DocumentDeletionByFilter` internal operation into the category of the `IndexOperation`. This means that the stats will automatically be re-processed after a batch is processed.
- Update a test to ensure that the stats are valid after each operation
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: Tamo <tamo@meilisearch.com>
4051: Implement the snapshots on demand r=Kerollmops a=irevoire
# Pull Request
Private link: [PRD available here](https://www.notion.so/meilisearch/On-demand-snapshots-5676e542b905459d96eec228da133b00#847ff0cafeb64fe09e8ee7150852b474)
Specification here: https://github.com/meilisearch/specifications/pull/258
## Prototype
A prototype is available under the name: `prototype-snapshot-on-demand-0`.
## Related issue
Fixes#4052
## What does this PR do?
- Introduce a new route, `POST /snapshots` to create snapshots on demand
- Introduce a new api-key action `snapshot.create`
- Introduce a new analytic `Snapshot Created` sent every time a snapshot is created.
## Notes for the team
I made a prototype so users can test the feature before the v1.5 comes out. But we can merge the PR as-is.
Co-authored-by: Tamo <tamo@meilisearch.com>
3997: Refactor empty arrays/objects should return empty instead of null r=Kerollmops a=dogukanakkaya
# Pull Request
## What does this PR do?
At the moment if we select empty objects and array of object properties with dot notations like:
```json
{
"array": [],
"object": {}
}
```
```rs
GetDocumentOptions { fields: Some(vec!["array.name", "object.name"]) }
```
returns null if the array/object has no property yet.
I am not sure if this is expected or it's the correct behaviour but I add my document with a property that is assigned to an empty array/object, later on when I select it, returns null which is kinda weird and unexpected in my opinion.
This PR fixes that issue by returning an empty vector if the array is empty or an empty map if object is empty. This is not added for `permissive-json-pointer/src/lib.rs:224` because `create_array` loops over each item. Selecting a single property that is an object, in an array of objects would result other objects to be empty maps instead of none.
```json
"doggos": [
{
"jean": {
"race": {
"name": "bernese mountain",
}
}
},
{
"marc": {
"age": 4,
"race": {
"name": "golden retriever",
}
}
}
]
```
```rs
GetDocumentOptions { fields: Some(vec!["doggos.jean"]) }
```
Would result in `jean` object and an extra empty object for `marc`.
## PR checklist
Please check if your PR fulfills the following requirements:
- [ ] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: dogukanakkaya <doguakkaya27@hotmail.com>
4009: Bump rustls-webpki from 0.100.1 to 0.100.2 r=Kerollmops a=dependabot[bot]
Bumps [rustls-webpki](https://github.com/rustls/webpki) from 0.100.1 to 0.100.2.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a href="https://github.com/rustls/webpki/releases">rustls-webpki's releases</a>.</em></p>
<blockquote>
<h2>v/0.100.2</h2>
<h2>Release notes</h2>
<ul>
<li>certificate path building and verification is now capped at 100 signature validation operations to avoid the risk of CPU usage denial-of-service attack when validating crafted certificate chains producing quadratic runtime. This risk affected both clients, as well as servers that verified client certificates.</li>
</ul>
<h2>What's Changed</h2>
<ul>
<li>v0.100.2 prep by <a href="https://github.com/cpu"><code>`@cpu</code></a>` in <a href="https://redirect.github.com/rustls/webpki/pull/154">rustls/webpki#154</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a href="https://github.com/rustls/webpki/compare/v/0.100.1...v/0.100.2">https://github.com/rustls/webpki/compare/v/0.100.1...v/0.100.2</a></p>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a href="c8b821450b"><code>c8b8214</code></a> Bump MSRV to 1.60</li>
<li><a href="855752292e"><code>8557522</code></a> Avoid testing MSRV of dev-dependencies</li>
<li><a href="73a7f0c7d7"><code>73a7f0c</code></a> Cargo: version 0.100.1 -> 0.100.2</li>
<li><a href="4ea052366f"><code>4ea0523</code></a> verify_cert: enforce maximum number of signatures.</li>
<li>See full diff in <a href="https://github.com/rustls/webpki/compare/v/0.100.1...v/0.100.2">compare view</a></li>
</ul>
</details>
<br />
[](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting ``@dependabot` rebase`.
[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)
---
<details>
<summary>Dependabot commands and options</summary>
<br />
You can trigger Dependabot actions by commenting on this PR:
- ``@dependabot` rebase` will rebase this PR
- ``@dependabot` recreate` will recreate this PR, overwriting any edits that have been made to it
- ``@dependabot` merge` will merge this PR after your CI passes on it
- ``@dependabot` squash and merge` will squash and merge this PR after your CI passes on it
- ``@dependabot` cancel merge` will cancel a previously requested merge and block automerging
- ``@dependabot` reopen` will reopen this PR if it is closed
- ``@dependabot` close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- ``@dependabot` show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency
- ``@dependabot` ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- ``@dependabot` ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- ``@dependabot` ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
You can disable automated security fix PRs for this repo from the [Security Alerts page](https://github.com/meilisearch/meilisearch/network/alerts).
</details>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
The issue was that the operation « DocumentDeletionByFilter » was not
declared as an index operation. That means the indexes stats were not
reprocessed after the application of the operation.
4050: Bump webpki from 0.22.0 to 0.22.1 r=Kerollmops a=dependabot[bot]
Bumps [webpki](https://github.com/briansmith/webpki) from 0.22.0 to 0.22.1.
<details>
<summary>Commits</summary>
<ul>
<li>See full diff in <a href="https://github.com/briansmith/webpki/commits">compare view</a></li>
</ul>
</details>
<br />
[](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting ``@dependabot` rebase`.
[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)
---
<details>
<summary>Dependabot commands and options</summary>
<br />
You can trigger Dependabot actions by commenting on this PR:
- ``@dependabot` rebase` will rebase this PR
- ``@dependabot` recreate` will recreate this PR, overwriting any edits that have been made to it
- ``@dependabot` merge` will merge this PR after your CI passes on it
- ``@dependabot` squash and merge` will squash and merge this PR after your CI passes on it
- ``@dependabot` cancel merge` will cancel a previously requested merge and block automerging
- ``@dependabot` reopen` will reopen this PR if it is closed
- ``@dependabot` close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- ``@dependabot` show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency
- ``@dependabot` ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- ``@dependabot` ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- ``@dependabot` ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
You can disable automated security fix PRs for this repo from the [Security Alerts page](https://github.com/meilisearch/meilisearch/network/alerts).
</details>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* [Update] test-suite.yml
Added New run command for cargo tree without default features using if-then block
* [Updated] test-disabled-tokenization in test-suite.yml
* [Updated] test-suite.yml
* Update .github/workflows/test-suite.yml
---------
Co-authored-by: Clémentine U. - curqui <clementine@meilisearch.com>
4028: Fix highlighting bug when searching for a phrase with cropping r=ManyTheFish a=vivek-26
# Pull Request
## Related issue
Fixes#3975
## What does this PR do?
This PR -
- Fixes the bug where searching **only** for a phrase (containing multiple words) along with cropping, highlighted only the first word of the phrase.
- Adds unit test case for the above mentioned scenario.
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: Vivek Kumar <vivek.26@outlook.com>
4041: Register the swap indexe task in a spawn blocking to be sure to never… r=ManyTheFish a=irevoire
# Pull Request
## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/4040
## What does this PR do?
- Register the swap indexes task in a spawn blocking task
Co-authored-by: Tamo <tamo@meilisearch.com>
4039: Fix multiple vectors dimensions r=ManyTheFish a=Kerollmops
This PR fixes#4035, making providing multiple vectors in documents possible. This is fixed by extracting the vectors from the non-flattened version of the documents.
Co-authored-by: Kerollmops <clement@meilisearch.com>
4038: Fix filter escaping issues r=ManyTheFish a=Kerollmops
This PR fixes#4034 by always escaping the sequences. Users must always put quotes (simple or double) to escape the filter values.
Co-authored-by: Kerollmops <clement@meilisearch.com>
4037: Update version for the next release (v1.3.3) in Cargo.toml r=curquiza a=meili-bot
⚠️ This PR is automatically generated. Check the new version is the expected one and Cargo.lock has been updated before merging.
Co-authored-by: curquiza <curquiza@users.noreply.github.com>
3994: Fix synonyms with separators r=Kerollmops a=ManyTheFish
# Pull Request
## Related issue
Fixes#3977
## Available prototype
```
$ docker pull getmeili/meilisearch:prototype-fix-synonyms-with-separators-0
```
## What does this PR do?
- add a new test
- filter the empty synonyms after normalization
Co-authored-by: ManyTheFish <many@meilisearch.com>
4016: Define the full Homebrew formula path r=curquiza a=Kerollmops
This PR fixes#4015 by defining the full Homebrew formula path.
Co-authored-by: Clément Renault <clement@meilisearch.com>
4025: Bump Swatinem/rust-cache from 2.5.1 to 2.6.2 r=curquiza a=dependabot[bot]
Bumps [Swatinem/rust-cache](https://github.com/swatinem/rust-cache) from 2.5.1 to 2.6.2.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a href="https://github.com/swatinem/rust-cache/releases">Swatinem/rust-cache's releases</a>.</em></p>
<blockquote>
<h2>v2.6.2</h2>
<h2>What's Changed</h2>
<ul>
<li>dep: Use <code>smol-toml</code> instead of <code>toml</code> by <a href="https://github.com/NobodyXu"><code>`@NobodyXu</code></a>` in <a href="https://redirect.github.com/Swatinem/rust-cache/pull/164">Swatinem/rust-cache#164</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a href="https://github.com/Swatinem/rust-cache/compare/v2...v2.6.2">https://github.com/Swatinem/rust-cache/compare/v2...v2.6.2</a></p>
<h2>v2.6.1</h2>
<ul>
<li>Fix hash contributions of <code>Cargo.lock</code>/<code>Cargo.toml</code> files.</li>
</ul>
<h2>v2.6.0</h2>
<h2>What's Changed</h2>
<ul>
<li>Add "buildjet" as a second <code>cache-provider</code> backend <a href="https://github.com/joroshiba"><code>`@joroshiba</code></a>` in <a href="https://redirect.github.com/Swatinem/rust-cache/pull/154">Swatinem/rust-cache#154</a></li>
<li>Clean up sparse registry index.</li>
<li>Do not clean up src of <code>-sys</code> crates.</li>
<li>Remove <code>.cargo/credentials.toml</code> before saving.</li>
</ul>
<h2>New Contributors</h2>
<ul>
<li><a href="https://github.com/joroshiba"><code>`@joroshiba</code></a>` made their first contribution in <a href="https://redirect.github.com/Swatinem/rust-cache/pull/154">Swatinem/rust-cache#154</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a href="https://github.com/Swatinem/rust-cache/compare/v2.5.1...v2.6.0">https://github.com/Swatinem/rust-cache/compare/v2.5.1...v2.6.0</a></p>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a href="https://github.com/Swatinem/rust-cache/blob/master/CHANGELOG.md">Swatinem/rust-cache's changelog</a>.</em></p>
<blockquote>
<h2>2.6.2</h2>
<ul>
<li>Fix <code>toml</code> parsing.</li>
</ul>
<h2>2.6.1</h2>
<ul>
<li>Fix hash contributions of <code>Cargo.lock</code>/<code>Cargo.toml</code> files.</li>
</ul>
<h2>2.6.0</h2>
<ul>
<li>Add "buildjet" as a second <code>cache-provider</code> backend.</li>
<li>Clean up sparse registry index.</li>
<li>Do not clean up src of <code>-sys</code> crates.</li>
<li>Remove <code>.cargo/credentials.toml</code> before saving.</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a href="e207df5d26"><code>e207df5</code></a> 2.6.2</li>
<li><a href="decb69d790"><code>decb69d</code></a> Update dependencies and add changelog</li>
<li><a href="ab6b2769d1"><code>ab6b276</code></a> dep: Use <code>smol-toml</code> instead of <code>toml</code> (<a href="https://redirect.github.com/swatinem/rust-cache/issues/164">#164</a>)</li>
<li><a href="578b235f6e"><code>578b235</code></a> 2.6.1</li>
<li><a href="5113490c3f"><code>5113490</code></a> prepare 2.6.1</li>
<li><a href="c0e052c18c"><code>c0e052c</code></a> Fix hashing of parsed <code>Cargo.toml</code> (<a href="https://redirect.github.com/swatinem/rust-cache/issues/160">#160</a>)</li>
<li><a href="4e0f4b19dd"><code>4e0f4b1</code></a> Fix typo in hashing parsed <code>Cargo.lock</code> (<a href="https://redirect.github.com/swatinem/rust-cache/issues/159">#159</a>)</li>
<li><a href="b919e1427f"><code>b919e14</code></a> feat: Add logging to <code>Cargo.lock</code>/<code>Cargo.toml</code> hashing (<a href="https://redirect.github.com/swatinem/rust-cache/issues/156">#156</a>)</li>
<li><a href="b8a6852b4f"><code>b8a6852</code></a> 2.6.0</li>
<li><a href="80c47cc945"><code>80c47cc</code></a> Clean up <code>credentials.toml</code></li>
<li>Additional commits viewable in <a href="https://github.com/swatinem/rust-cache/compare/v2.5.1...v2.6.2">compare view</a></li>
</ul>
</details>
<br />
[](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
You can trigger a rebase of this PR by commenting ``@dependabot` rebase`.
[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)
---
<details>
<summary>Dependabot commands and options</summary>
<br />
You can trigger Dependabot actions by commenting on this PR:
- ``@dependabot` rebase` will rebase this PR
- ``@dependabot` recreate` will recreate this PR, overwriting any edits that have been made to it
- ``@dependabot` merge` will merge this PR after your CI passes on it
- ``@dependabot` squash and merge` will squash and merge this PR after your CI passes on it
- ``@dependabot` cancel merge` will cancel a previously requested merge and block automerging
- ``@dependabot` reopen` will reopen this PR if it is closed
- ``@dependabot` close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- ``@dependabot` show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency
- ``@dependabot` ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- ``@dependabot` ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- ``@dependabot` ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
</details>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
4020: Update version for the next release (v1.4.0) in Cargo.toml r=Kerollmops a=meili-bot
⚠️ This PR is automatically generated. Check the new version is the expected one and Cargo.lock has been updated before merging.
Co-authored-by: Kerollmops <Kerollmops@users.noreply.github.com>
Co-authored-by: Clément Renault <clement@meilisearch.com>
4013: Fix the ranking rule by temporarily disabling an assert in the bucket sort algorithm r=Kerollmops a=Kerollmops
This PR temporarily disables an assertion, making the search crash. [I created a tracking issue](https://github.com/meilisearch/meilisearch/issues/4012) to find a better way to fix this.
It no longer reverts a20e4d447c, which seemed to generate unreachable graphs and make the bucket sort ranking algorithm panic because of entering an unreachable state. We discussed that below in the comments.
Temporary fixes#4002, fixes#4006, and fixes#3995.
---
It took me approximately 2 days to find the first bad commit just because I'm bad in `git bisect` x `bash`, i.e. [I misused `%1` with `$!` to kill the most recently backgrounded job](https://unix.stackexchange.com/a/340084/212574)...
<details>
<summary>Here is the script I used to find the invalid commit</summary>
```bash
#!/usr/bin/env bash
set -x
# remove the data
rm -rf data.ms
# build meilisearch
cargo build --release
# ignore this commit if it doesn't compile
if [[ $? != 0 ]]; then
exit 125
fi
# index the dump and start from it
./target/release/meilisearch \
--http-addr 'localhost:7705' \
--import-dump $HOME/Downloads/modified-20230822-083016113.dump &
# wait 10 sec while it indexes the docs
sleep 5
# check if the server crashes on requests
echo '{
"q": "rtx 305",
"attributesToHighlight": [
"*"
],
"highlightPreTag": "<ais-highlight-0000000000>",
"highlightPostTag": "</ais-highlight-0000000000>",
"limit": 21,
"offset": 0
}' | xh 'localhost:7705/indexes/arvutitark_local_orderables/search'
last_exit_code=$?
# Now kill Meilisearch
kill $!
# Clean the potential Cargo.lock
git checkout .
exit $last_exit_code
```
</details>
Co-authored-by: Kerollmops <clement@meilisearch.com>
Co-authored-by: Clément Renault <clement@meilisearch.com>
3945: Do not leak field information on error r=Kerollmops a=vivek-26
# Pull Request
## Related issue
Fixes#3865
## What does this PR do?
This PR ensures that `InvalidSortableAttribute`and `InvalidFacetSearchFacetName` errors do not leak field information i.e. fields which are not part of `displayedAttributes` in the settings are hidden from the error message.
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: Vivek Kumar <vivek.26@outlook.com>
4000: Update version for the next release (v1.3.2) in Cargo.toml r=irevoire a=meili-bot
⚠️ This PR is automatically generated. Check the new version is the expected one and Cargo.lock has been updated before merging.
Co-authored-by: irevoire <irevoire@users.noreply.github.com>
3998: Accept the `null` JSON value as a value of the `_vectors` field r=irevoire a=Kerollmops
This PR fixes#3979 by accepting `null` JSON values in the `_vectors` fields provided by the user.
Can the reviewer please verify that I am merging in the right branch?
I think we must create a new _release-v1.3.2_.
Co-authored-by: Kerollmops <clement@meilisearch.com>
3990: Removed unnecessary borrow call that failed nightly tests r=irevoire a=JannisK89
# Pull Request
## Related issue
Fixes#3988
## What does this PR do?
- Removes unnecessary borrow call that was causing warnings when running tests on nightly.
## PR checklist
Please check if your PR fulfills the following requirements:
- [ x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [ x] Have you read the contributing guidelines?
- [ x] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Please let me know if there is anything else I can do to improve this PR.
Thank you.
Co-authored-by: JannisK89 <jannis.karanikis@gmail.com>
3976: Fix the get stats method r=ManyTheFish a=irevoire
# Pull Request
- The get stats method of the index-scheduler was not using at all the processing tasks. That was returning a wrong number of enqueued tasks and 0 processing tasks.
- Added a test
- Currently this method was **ONLY** used to compute the `meilisearch_nb_tasks` field of the **experimental feature** metrics.
## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/3972
Co-authored-by: Tamo <tamo@meilisearch.com>
3946: Settings customizing tokenization r=irevoire a=ManyTheFish
# Pull Request
This pull Request allows the User to customize Meilisearch Tokenization by providing specialized settings.
## Small documentation
All the new settings can be set and reset like the other index settings by calling the route `/indexes/:name/settings`
### `nonSeparatorTokens`
The Meilisearch word segmentation uses a default list of separators to segment words, however, for specific use cases some of the default separators shouldn't be considered separators, the `nonSeparatorTokens` setting allows to remove of some tokens from the default list of separators.
***Request payload `PUT`- `/indexes/articles/settings/non-separator-tokens`***
```json
["`@",` "#", "&"]
```
### `separatorTokens`
Some use cases need to define additional separators, some are related to a specific way of parsing technical documents some others are related to encodings in documents, the `separatorTokens` setting allows adding some tokens to the list of separators.
***Request payload `PUT`- `/indexes/articles/settings/separator-tokens`***
```json
["§", "&sep"]
```
### `dictionary`
The Meilisearch word segmentation relies on separators and language-based word-dictionaries to segment words, however, this segmentation is inaccurate on technical or use-case specific vocabulary (like `G/Box` to say `Gear Box`), or on proper nouns (like `J. R. R.` when parsing `J. R. R. Tolkien`), the `dictionary` setting allows defining a list of words that would be segmented as described in the list.
***Request payload `PUT`- `/indexes/articles/settings/dictionary`***
```json
["J. R. R.", "J.R.R."]
```
these last feature synergies well with the `stopWords` setting or the `synonyms` setting allowing to segment words and correctly retrieve the synonyms:
***Request payload `PATCH`- `/indexes/articles/settings`***
```json
{
"dictionary": ["J. R. R.", "J.R.R."],
"synonyms": {
"J.R.R.": ["jrr", "J. R. R."],
"J. R. R.": ["jrr", "J.R.R."],
"jrr": ["J.R.R.", "J. R. R."],
}
}
```
### Related specifications:
- https://github.com/meilisearch/specifications/pull/255
- https://github.com/meilisearch/specifications/pull/254
### Try it with Docker
```bash
$ docker pull getmeili/meilisearch:prototype-tokenizer-customization-3
```
## Related issue
Fixes#3610Fixes#3917
Fixes https://github.com/meilisearch/product/discussions/468
Fixes https://github.com/meilisearch/product/discussions/160
Fixes https://github.com/meilisearch/product/discussions/260
Fixes https://github.com/meilisearch/product/discussions/381
Fixes https://github.com/meilisearch/product/discussions/131
Related to https://github.com/meilisearch/meilisearch/issues/2879Fixes#2760
## What does this PR do?
- Add a setting `nonSeparatorTokens` allowing to remove a token from the default separator tokens
- Add a setting `separatorTokens` allowing to add a token in the separator tokens
- Add a setting `dictionary` allowing to override the segmentation on specific words
- add new error code `invalid_settings_non_separator_tokens` (invalid_request)
- add new error code `invalid_settings_separator_tokens` (invalid_request)
- add new error code `invalid_settings_dictionary` (invalid_request)
Co-authored-by: ManyTheFish <many@meilisearch.com>
Co-authored-by: Many the fish <many@meilisearch.com>
3986: Fix geo bounding box with strings r=ManyTheFish a=irevoire
# Pull Request
When sending a document with one geofield of type string (i.e.: `{ "_geo": { "lat": 12, "lng": "13" }}`), the geobounding box would exclude this document.
This PR fixes this issue by automatically parsing the string value in case we're working on a geofield.
## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/3973
## What does this PR do?
- Automatically parse the facet value iif we're working on a geofield.
- Make insta works with snapshots in loops or closure executed multiple times. (you may need to update your cli if it panics after this PR: `cargo install cargo-insta`).
- Add one integration test in milli and in meilisearch to ensure it works forever.
- Add three snapshots for the dump that mysteriously disappeared I don't know how
Co-authored-by: Tamo <tamo@meilisearch.com>
3981: Truncate the normalized long facets used in the search for facet value r=irevoire a=ManyTheFish
# Pull Request
Truncate the normalized long facets used in the search for facet value
## targeted release
v1.3.1
## Related issue
Fixes#3978
Co-authored-by: ManyTheFish <many@meilisearch.com>
3982: Update version for the next release (v1.3.1) in Cargo.toml r=irevoire a=meili-bot
⚠️ This PR is automatically generated. Check the new version is the expected one and Cargo.lock has been updated before merging.
Co-authored-by: irevoire <irevoire@users.noreply.github.com>
3968: Bump svenstaro/upload-release-action from 2.6.1 to 2.7.0 r=curquiza a=dependabot[bot]
Bumps [svenstaro/upload-release-action](https://github.com/svenstaro/upload-release-action) from 2.6.1 to 2.7.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a href="https://github.com/svenstaro/upload-release-action/releases">svenstaro/upload-release-action's releases</a>.</em></p>
<blockquote>
<h2>2.7.0</h2>
<ul>
<li>Allow setting an explicit target_commitish <a href="https://redirect.github.com/svenstaro/upload-release-action/pull/46">#46</a> (thanks <a href="https://github.com/Spikatrix"><code>`@Spikatrix</code></a>)</li>`
</ul>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a href="https://github.com/svenstaro/upload-release-action/blob/master/CHANGELOG.md">svenstaro/upload-release-action's changelog</a>.</em></p>
<blockquote>
<h2>[2.7.0] - 2023-07-28</h2>
<ul>
<li>Allow setting an explicit target_commitish <a href="https://redirect.github.com/svenstaro/upload-release-action/pull/46">#46</a> (thanks <a href="https://github.com/Spikatrix"><code>`@Spikatrix</code></a>)</li>`
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a href="1beeb572c1"><code>1beeb57</code></a> 2.7.0</li>
<li><a href="5206d34958"><code>5206d34</code></a> Bump deps</li>
<li><a href="80d7a7e41c"><code>80d7a7e</code></a> Merge pull request <a href="https://redirect.github.com/svenstaro/upload-release-action/issues/46">#46</a> from Spikatrix/master</li>
<li><a href="5eb2ffd70b"><code>5eb2ffd</code></a> Merge pull request <a href="https://redirect.github.com/svenstaro/upload-release-action/issues/110">#110</a> from svenstaro/dependabot/npm_and_yarn/word-wrap-1.2.4</li>
<li><a href="07af2f374a"><code>07af2f3</code></a> Bump word-wrap from 1.2.3 to 1.2.4</li>
<li><a href="5164410c7d"><code>5164410</code></a> Push dist</li>
<li><a href="f47fb36ff1"><code>f47fb36</code></a> Use the ref api to check if a tag exists</li>
<li><a href="212d4babf8"><code>212d4ba</code></a> Rethrow getTag error if not 404</li>
<li><a href="7670b98fa0"><code>7670b98</code></a> Push dist files</li>
<li><a href="ac438791c4"><code>ac43879</code></a> Warn when target_commit is ignored</li>
<li>Additional commits viewable in <a href="https://github.com/svenstaro/upload-release-action/compare/2.6.1...2.7.0">compare view</a></li>
</ul>
</details>
<br />
[](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
You can trigger a rebase of this PR by commenting ``@dependabot` rebase`.
[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)
---
<details>
<summary>Dependabot commands and options</summary>
<br />
You can trigger Dependabot actions by commenting on this PR:
- ``@dependabot` rebase` will rebase this PR
- ``@dependabot` recreate` will recreate this PR, overwriting any edits that have been made to it
- ``@dependabot` merge` will merge this PR after your CI passes on it
- ``@dependabot` squash and merge` will squash and merge this PR after your CI passes on it
- ``@dependabot` cancel merge` will cancel a previously requested merge and block automerging
- ``@dependabot` reopen` will reopen this PR if it is closed
- ``@dependabot` close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- ``@dependabot` ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- ``@dependabot` ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- ``@dependabot` ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
</details>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
3969: Bump Swatinem/rust-cache from 2.5.0 to 2.5.1 r=curquiza a=dependabot[bot]
Bumps [Swatinem/rust-cache](https://github.com/swatinem/rust-cache) from 2.5.0 to 2.5.1.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a href="https://github.com/swatinem/rust-cache/releases">Swatinem/rust-cache's releases</a>.</em></p>
<blockquote>
<h2>v2.5.1</h2>
<ul>
<li>Fix hash contribution of <code>Cargo.lock</code>.</li>
</ul>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a href="https://github.com/Swatinem/rust-cache/blob/master/CHANGELOG.md">Swatinem/rust-cache's changelog</a>.</em></p>
<blockquote>
<h2>2.5.1</h2>
<ul>
<li>Fix hash contribution of <code>Cargo.lock</code>.</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a href="dd05243424"><code>dd05243</code></a> 2.5.1</li>
<li><a href="65dbc54a5d"><code>65dbc54</code></a> update changelog</li>
<li><a href="be7377e68e"><code>be7377e</code></a> fix <code>src/config.ts</code>: Remove <code>sort_object</code> (<a href="https://redirect.github.com/swatinem/rust-cache/issues/152">#152</a>)</li>
<li>See full diff in <a href="https://github.com/swatinem/rust-cache/compare/v2.5.0...v2.5.1">compare view</a></li>
</ul>
</details>
<br />
[](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
You can trigger a rebase of this PR by commenting ``@dependabot` rebase`.
[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)
---
<details>
<summary>Dependabot commands and options</summary>
<br />
You can trigger Dependabot actions by commenting on this PR:
- ``@dependabot` rebase` will rebase this PR
- ``@dependabot` recreate` will recreate this PR, overwriting any edits that have been made to it
- ``@dependabot` merge` will merge this PR after your CI passes on it
- ``@dependabot` squash and merge` will squash and merge this PR after your CI passes on it
- ``@dependabot` cancel merge` will cancel a previously requested merge and block automerging
- ``@dependabot` reopen` will reopen this PR if it is closed
- ``@dependabot` close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- ``@dependabot` ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- ``@dependabot` ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- ``@dependabot` ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
</details>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
3967: Bring back changes from `release-v1.3.0` into `main` r=ManyTheFish a=curquiza
Using a temp branch because of git conflict
Co-authored-by: Cong Chen <cong.chen@ocrlabs.com>
Co-authored-by: ManyTheFish <many@meilisearch.com>
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
Co-authored-by: meili-bors[bot] <89034592+meili-bors[bot]@users.noreply.github.com>
Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: Kerollmops <clement@meilisearch.com>
Co-authored-by: Clément Renault <clement@meilisearch.com>
3963: Fix the milli crate r=ManyTheFish a=irevoire
Milli was using the serde feature of either without enabling it first; thus, it wasn't working.
It was working in meilisearch, though, because `meilisearch-types` was using the feature which enables it globally for all the other crates.
## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/3962
Co-authored-by: Tamo <tamo@meilisearch.com>
3957: fix: upgrade mimalloc dependency to resolve FreeBSD build r=irevoire a=ThatOneCalculator
# Pull Request
## Related issue
Fixes#3806
## What does this PR do?
- Upgrades mimalloc to 0.1.37
- Fixes build on FreeBSD
Ref: https://github.com/meilisearch/meilisearch/issues/3806#issuecomment-1653693468
Tested and working on FreeBSD 13.1-RELEASE-p5
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: ThatOneCalculator <kainoa@t1c.dev>
3955: Update mini-dashboard to version 0.2.11 r=curquiza a=bidoubiwa
# Pull Request
## What does this PR do?
- Updates the mini-dashboard to version [0.2.11](https://github.com/meilisearch/mini-dashboard/releases/tag/v0.2.11)
## PR checklist
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: Charlotte Vermandel <charlottevermandel@gmail.com>
3952: Use the new safe `read-txn-no-tls` heed feature r=ManyTheFish a=Kerollmops
[We recently found out](https://github.com/meilisearch/heed/issues/191#issuecomment-1650280513) that the `read-sync-txn` heed feature was invalid and must be removed from this crate. We were declaring it in milli/meilisearch but, fortunately, not sharing the `RoTxn`s across threads 😮💨
[I recently introduced the `read-txn-no-tls` heed feature](https://github.com/meilisearch/heed/pull/194), which implements `RoTxn: Send` and allows multiple read transactions on a single thread (which we use).
This PR removes the `sync-read-txn` heed feature from the _Cargo.toml_ file. I will fix this in heed v0.20.0 and will fill a RustSec advisory in the meantime.
Co-authored-by: Clément Renault <clement@meilisearch.com>
3953: Update UTM campaign r=curquiza a=macraig
# Pull Request
## What does this PR do?
Redirect CTAs to Cloud landing page
Co-authored-by: María <maria@Marias-MacBook-Pro.local>
3942: Normalize for the search the facets values r=ManyTheFish a=Kerollmops
This PR improves and fixes the search for facet values feature. Searching for _bre_ wasn't returning facet values like _brévent_ or _brô_.
The issue was related to the fact that facets are normalized but not in the same way as the `searchableAttributes` are. We decided to normalize them further and add another intermediate database where the key is the normalized facet value, and the value is a set of the non-normalized facets. We then use these non-normalized ones to get the correct counts by fetching the associated databases.
### What's missing in this PR?
- [x] Apply the change to the whole set of `SearchForFacetValue::execute` conditions.
- [x] Factorize the code that does an intermediate normalized value fetch in a function.
- [x] Add or modify the search for facet value test.
Co-authored-by: Clément Renault <clement@meilisearch.com>
Co-authored-by: Kerollmops <clement@meilisearch.com>
3948: Fix hnsw internal panic by using another library r=ManyTheFish a=Kerollmops
This pull request fixes#3923. The issue concerns the `hnsw` crate panicking due to a wrong call to the `[T]::copy_from_slice` function.
I decided to switch the library to `instant-distance`, which is maintained [by someone of trust](https://lib.rs/~djc), who maintains a lot of very important crates.
- [x] Make Clippy happy with the first commit.
- [x] Reproduce the #3923 bug without this patch
- [x] Check if the bug disappeared with this PR.
- [x] Test with [the Algolia e-commerce dataset](https://www.notion.so/meilisearch/Algolia-Ecommerce-c5fa3b5f23a7485295df7e87306d5859).
Co-authored-by: Kerollmops <clement@meilisearch.com>
3940: Update mini dashboard v0.2.9 r=gillian-meilisearch a=bidoubiwa
# Pull Request
## What does this PR do?
- Updates the mini-dashboard to version [0.2.9](https://github.com/meilisearch/mini-dashboard/releases/tag/v0.2.9)
## PR checklist
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: Charlotte Vermandel <charlottevermandel@gmail.com>
3937: Update Charabia to the last version r=Kerollmops a=ManyTheFish
# Pull Request
## Related issue
Fixes#3924
## What does this PR do?
- Update Charabia
Co-authored-by: ManyTheFish <many@meilisearch.com>
3913: Expose a Puffin server to profile the indexing process r=Kerollmops a=Kerollmops
This PR exposes a puffin HTTP server to expose the internal timing it takes to index documents, delete documents, or update the settings of an index.
<img width="1752" alt="Capture d’écran 2023-07-10 à 18 44 58" src="https://github.com/meilisearch/meilisearch/assets/3610253/a3c7a6bf-db5b-42f4-8be1-c4e31c869843">
## To be done
- [x] Move the puffin HTTP server under a feature flag.
- [x] Use [the `puffin::set_scopes_on` function](https://docs.rs/puffin/latest/puffin/fn.set_scopes_on.html) to toggle it (by using the feature directly).
When this function is called with `false`, [a call to `profile_scope!` talked 1-2ns](https://docs.rs/puffin/latest/puffin/fn.set_scopes_on.html).
- [x] Create a _PROFILING.md_ file explaining how to use it.
- [x] Explain that merging scopes on the interface is not always useful.
- [x] Add more info on the number of batched tasks (using the `puffin::profile_scope!` macro data).
- I added more info, but that's more continuous work when we consider we need more info here and there.
- [x] Clean up some scopes, and don't touch too much code to inject puffin.
- I am not sure that the _index_documents/mod.rs_ function is that complex with the addition of the scope.
- [x] Think about what we consider frames. One indexation operation or the wall program. When must we stop the frame, then?
- What we consider a frame is one single `IndexScheduler::tick` execution.
- We can change that later.
Co-authored-by: Kerollmops <clement@meilisearch.com>
Co-authored-by: Clément Renault <clement@meilisearch.com>
3932: Add UTM tracking to README r=gillian-meilisearch a=Strift
# Pull Request
Hi `@macraig` `@curquiza` 👋
## Related issue
N/A
## What does this PR do?
This PR adds UTM tracking to the links in the README.
It add UTM params to:
- links in the nav
- links to where2watch
- links in the Features section
- Docs & Getting started links (cc `@guimachiavelli)`
- links in the SDKs section
- links in the Advanced usage section
- links in the Telemetry section
- links in the Get in touch section
Additionally, this PR adds a link to the Meilisearch logo (there is currently none.)
## On the UTM pattern
All links in this PR use the new convention `@gmourier` and I agreed on:
- utm_campaign=oss
- utm_source=github
- utm_medium=meilisearch
- utm_content= where the link is in the page
It's worth considering updating the tracking link for the Cloud, which is the only one that doesn’t follow the new convention. It is currently using `utm_campaign=oss&utm_source=engine&utm_medium=meilisearch`.
Merging analytics from different UTMs is doable on Amplitude, but can't be done in Fathom. Plus, having two different conventions creates knowledge overhead, and is bound to result in corrupt analytics at some point. I suggest we change the Cloud UTM trackers too — the sooner we eat the frog, the better imo.
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: Strift <strift@Strifts-MacBook-Pro.local>
Co-authored-by: Strift <laurent@meilisearch.com>
3935: Update mini-dashboard to version 0.2.8 r=Kerollmops a=bidoubiwa
# Pull Request
## What does this PR do?
- Updates the mini-dashboard to version [0.2.8](https://github.com/meilisearch/mini-dashboard/releases/tag/v0.2.8)
## PR checklist
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: Charlotte Vermandel <charlottevermandel@gmail.com>
3933: Stop computing the update files size r=ManyTheFish a=Kerollmops
This PR, related #3934, removes the part which computes the total size of the `data.ms/update_files` folder, which can take a lot of time when many updates must be processed.
It is not breaking API-side but is breaking on the result we will show to the user. The `databaseSize` field returned by the `/stats` endpoint will be reduced.
Co-authored-by: Kerollmops <clement@meilisearch.com>
3929: Fix a panic when sorting geo fields represented by strings r=Kerollmops a=Kerollmops
This issue fixes#3927 by retrieving and parsing the original string values into f64s. I also added a test to ensure we don't break it in a future version.
Co-authored-by: Kerollmops <clement@meilisearch.com>
3921: Deactivate camel case segmentation r=dureuill a=ManyTheFish
# Pull Request
This PR deactivates the camel case segmentation to retrieve the possibility to accept typos over camel-cased words
## Related issue
Fixes#3869Fixes#3818
## What does this PR do?
- deactivates camelcase segmentation
related to #3919
Co-authored-by: ManyTheFish <many@meilisearch.com>
3907: Add telemetry for define field to search on at query time r=dureuill a=ManyTheFish
Add "attributes_to_search_on" telemetry usage counter:
```json
"attributes_to_search_on": {
"total_number_of_use": 12,
},
```
This measures the number of search queries that the user uses `attributesToSearchOn` field.
related to https://github.com/meilisearch/specifications/pull/251
## reviewers:
- `@macraig` for validating the telemetry's name
- `@dureuill` for validating the code
Co-authored-by: ManyTheFish <many@meilisearch.com>
3915: `attributesToSearchOn` supports wildcards r=ManyTheFish a=dureuill
# Pull Request
## Related issue
Fixes#3912 and #3911
## What does this PR do?
- Adding `*` in the list of `attributesToSearchOn` allows searching on all the `searchableAttributes`.
- If `searchableAttributes contains "*"`, then any attribute is accepted in the `attributesToSearchOn` list.
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
3918: Update and fix the Test Suite CI r=dureuill a=Kerollmops
This Pull Request renames the _Run test with Rust_ into _Setup test with Rust_ for more clarity and `cargo update -p proc-macro2` to make the project compile with the latest Rust Nightly.
Co-authored-by: Kerollmops <clement@meilisearch.com>
3908: Allow a comma-separated value to the `vector` argument in GET search r=Kerollmops a=dureuill
# Pull Request
For request:
```
curl \
-X GET 'http://localhost:7700/indexes/movies/search?vector=0.123,1.124,244'
```
Before PR:
```
{"message":"Invalid value type for parameter `vector`: expected a string, but found a string: `0,1,2`","code":"invalid_search_vector","type":"invalid_request","link":"https://docs.meilisearch.com/errors#invalid_search_vector"}%
```
After PR:
```
{"hits":[],"query":"","vector":[0.123,1.124,244.0],"processingTimeMs":0,"limit":20,"offset":0,"estimatedTotalHits":1000}%
```
cc `@gmourier` `@bidoubiwa`
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
3904: Sort by lexicographic order after normalization r=dureuill a=dureuill
# Pull Request
## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/3893
## What does this PR do?
- Re-sort stop words after normalization so they're not sent out-of-order to the FST
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
3895: Update README.md r=curquiza a=ferdi05
Adding the free-trial option
# Pull Request
## Related issue
Fixes #<issue_number>
## What does this PR do?
- ...
## PR checklist
Please check if your PR fulfills the following requirements:
- [ ] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [ ] Have you read the contributing guidelines?
- [ ] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: Ferdinand Boas <ferdinand.boas@gmail.com>
3897: Add automated tests for `/experimental-features` route r=Kerollmops a=dureuill
# Pull Request
## What does this PR do?
- Make `RuntimeTogglableFeatures` `Eq`
- Add various tests for the `/experimental-features` route
- Integration tests for the route itself
- Integration tests for the effect of enabling `scoreDetails` and `vectorStore` through this route.
- Dump integration tests
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
3889: Display the total number of tasks matching a filter/query r=dureuill a=Kerollmops
This PR returns a new field on the `/tasks` routes. The `total` field exposes the total number of tasks that matches the given filter/query. It is useful to display information on a user interface and can help understand when progress is made in processing tasks, i.e., the total number of tasks on `/tasks?statuses=succeeded` will increase over time.
Fixes#3888.
- [ ] Update the specs fo the `/tasks` route.
## How have I implemented it?
I found it much easier to run two times the task filtering system. Once with the original `from` and `limit` parameters and a second time without. The second call will return the total number of tasks that match the query, not only the number of tasks on the current page.
So far, in terms of performance, there doesn't seem to be any issue. I tried different filters with something like 250k tasks. Note that there is a limit of 1M tasks in the queue.
Co-authored-by: Clément Renault <clement@meilisearch.com>
3891: Fix the way we compute the 99th percentile r=dureuill a=Kerollmops
This PR fixes how we compute the 99th percentile by avoiding using float and doing the multiplication and divisions in the correct order avoiding going out of the buffer of timings. You can see the issue on [this rust playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2021).
When there are a very small number of successful requests, the number is so tiny that the 99th percentile calculus sometimes gives an index out of the buffer. In this example, the `1`/`1.0` represent the number of timings you collected (one). As you can see, the float computation gives us the index `1.0`, with is out of a vector of only one value. This makes the engine generate a `null` value.
```rust
1 * 99 / 100 = 0 // with integers
0.99_f64 * (1.0 - 1.0) + 1.0 = 1.0 // with floats
```
Co-authored-by: Clément Renault <clement@meilisearch.com>
3890: Fix the analytics of the sort facet values by count feature r=dureuill a=Kerollmops
This PR ensures we return the right analytics from the settings route.
Co-authored-by: Clément Renault <clement@meilisearch.com>
3877: update the total_received properties of multiple events r=dureuill a=dureuill
# Pull Request
## Related issue
Fixes#3814
## What does this PR do?
-fix name of `total_received` for several events
Co-authored-by: Tamo <tamo@meilisearch.com>
3878: Remove unsafe `atty` dependency r=dureuill a=Kerollmops
This PR replaces the `atty` dependency with the `is-terminal` one. We do that to fix GHSA-g98v-hv3f-hcfr.
Co-authored-by: Kerollmops <clement@meilisearch.com>
3851: Expose lastUpdate and isIndexing in /stats endpoint r=dureuill a=gentcys
# Pull Request
## Related issue
Fixes#3843
## What does this PR do?
- expose lastUpdate in `/stats` endpoint
- expose isIndex in `stats` endpoint
- add a method `is_task_processing` in index-scheduler/src/lib.rs.
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: Cong Chen <cong.chen@ocrlabs.com>
Co-authored-by: ManyTheFish <many@meilisearch.com>
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
3874: Update version for the next release (v1.3.0) in Cargo.toml r=curquiza a=meili-bot
⚠️ This PR is automatically generated. Check the new version is the expected one and Cargo.lock has been updated before merging.
Co-authored-by: gillian-meilisearch <gillian-meilisearch@users.noreply.github.com>
3873: Format let-else ❤️🎉 r=Kerollmops a=dureuill
# Pull Request
Allows passing CI after landing of 6162f6f123
## What does this PR do?
- `cargo +nightly fmt`
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
3866: Update charabia v0.8.0 r=dureuill a=ManyTheFish
# Pull Request
Update Charabia:
- enhance Japanese segmentation
- enhance Latin Tokenization
- words containing `_` are now properly segmented into several words
- brackets `{([])}` are no more considered as context separators so word separated by brackets are now considered near together for the proximity ranking rule
- fixes#3815
- fixes#3778
- fixes [product#151](https://github.com/meilisearch/product/discussions/151)
> Important note: now the float numbers are segmented around the `.` so `3.22` is segmented as [`3`, `.`, `22`] but the middle dot isn't considered as a hard separator, which means that if we search `3.22` we find documents containing `3.22`
Co-authored-by: ManyTheFish <many@meilisearch.com>
3780: Be able to sort facet values by alpha or count r=dureuill a=Kerollmops
This PR introduces a new `sortFacetValuesBy` settings parameter to expose the facet distribution in either count or lexicographic/alpha order.
## Mini Spec of the `sortFacetValuesBy` Settings Parameter
This parameter can be set in the settings to change how the engine returns the facet values. There are two possible values to this parameter.
Please note that the current behavior changed a bit, and keys are returned in lexicographic order instead of undefined order. The previous order wasn't defined as we were using a `HashMap`, which returns entries in hash order (undefined), and we are now using an `IndexMap`, which returns them in insertion order (the order we actually want).
Also, note that there are performance issues when the dataset is enormous. Here are the timings of the engine running on my Macbook Pro M1 (16Go of RAM). [The dataset is 40 million songs file](https://www.notion.so/meilisearch/Songs-from-MusicBrainz-686e31b2bd3845898c7746f502a6e117), and the database size is about 50GiB. Even if you think 800ms is not that high, don't forget that the API is public, and anybody can ask for multiple facets in a single query.
| Search Kind | Get Facets | Max Values per Facet | Time for Alpha | Time for Count | Count but with #3788 |
|------------:|------------|----------------------|:--------------:|----------------|----------------------|
| Placeholder | genres | default (100) | 7ms | 187ms | 122ms |
| Placeholder | genres | 20 | 6ms | 124ms | 75ms |
| Placeholder | album | default (100) | 9ms | 808ms | 677ms |
| Placeholder | album | 20 | 8ms | 579ms | 446ms |
| Placeholder | artist | default (100) | 9ms | 462ms | 344ms |
| Placeholder | artist | 20 | 9ms | 341ms | 246ms |
### Order Values in Alphanumeric Order
This is the default one. Values will be returned by lexicographic order, ascending from A to Z.
```bash
# First, update the settings
curl 'localhost:7700/indexes/movies/settings/facetting' \
-H "Content-Type: application/json" \
-d '{ "sortFacetValuesBy": { "*": "alpha" } }'
# Then, ask for the facet distribution
curl 'localhost:7700/indexes/movies/search?facets=genres'
```
```json5
{
"hits": [
/* list of results */
],
"query": "",
"processingTimeMs": 0,
"limit": 20,
"offset": 0,
"estimatedTotalHits": 1000,
"facetDistribution": {
"genres": {
"Action": 3215,
"Adventure": 1972,
"Animation": 1577,
"Comedy": 5883,
"Crime": 1808,
// ...
}
},
"facetStats": {}
}
```
### Order Values in Count Order
Facet values are sorted by decreasing count. The count is the number of records containing this facet value in the query results.
```bash
# First, update the settings
curl 'localhost:7700/indexes/movies/settings/facetting' \
-H "Content-Type: application/json" \
-d '{ "sortFacetValuesBy": { "*": "count" } }'
# Then, ask for the facet distribution
curl 'localhost:7700/indexes/movies/search?facets=genres'
```
```json5
{
"hits": [
/* list of results */
],
"query": "",
"processingTimeMs": 0,
"limit": 20,
"offset": 0,
"estimatedTotalHits": 1000,
"facetDistribution": {
"genres": {
"Drama": 7337,
"Comedy": 5883,
"Action": 3215,
"Thriller": 3189,
"Romance": 2507,
// ...
}
},
"facetStats": {}
}
```
## Todo List
- [x] Add tests
- [x] Send analytics when a user change the `sortFacetValuesBy`
- [x] Create a prototype and announce it in https://github.com/meilisearch/product/discussions/519.
Co-authored-by: Kerollmops <clement@meilisearch.com>
Co-authored-by: Clément Renault <clement@meilisearch.com>
3864: Remove `/experimental-features` verbs that weren't in the PRD r=dureuill a=dureuill
Removes:
- POST `/experimental-features`
- DELETE `/experimental-features`
keeping only:
- PATCH `/experimental-features`
- GET `/experimental-features`
The two routes that are described in the PRD.
Following `@guimachiavelli's` [question](https://github.com/meilisearch/documentation/issues/2482#issuecomment-1611845372) about the POST route.
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
3699: Search for Facet Values r=Kerollmops a=Kerollmops
This PR introduces the first version of [the _Search for Facet Values_ feature](https://github.com/meilisearch/product/discussions/515) that allows a user to search for facets, by optionally using a prefix string and optionally specifying the `q` and `filter` original search parameters to restrict the candidates to search in.
The steps to merge it into Meilisearch will first start by providing prototype Docker images. This way users will be able to test the prototypes before using them.
The current route to use the _Search for Facet Values_ feature is the `POST /indexes/{index}/facet-search` where the body is a JSON object that looks like the following:
```json5
{
"q": "spiderman", // optional
"filter": "rating > 10", // optional
"facetName": "genres",
"facetQuery": "a" // optional
}
```
## What is missing?
- [x] Send some analytics.
- [x] Support the `matchingStrategy` parameter.
- [x] Make sure that the errors are the right ones.
- [x] Use the [Index typo tolerance settings](https://www.meilisearch.com/docs/learn/configuration/typo_tolerance#minwordsizefortypos) when matching facet values.
- [x] minWordSizeForTypos.oneTypo
- [x] minWordSizeForTypos.twoTypo
- [x] Add tests
- [x] Log the time it took to compute the results.
- [x] Fix the compilation warnings.
- [x] [Create an issue to fix potential performance issues when indexing](https://github.com/meilisearch/meilisearch/issues/3862).
Co-authored-by: Clément Renault <clement@meilisearch.com>
Co-authored-by: Kerollmops <clement@meilisearch.com>
3861: Add "meilisearch" prefix to last metrics that were missing it r=Kerollmops a=dureuill
# Pull Request
## Related issue
Related to #3790
## What does this PR do?
- change implementation to follow the spec on metrics name
- regenerate grafana dashboard from the code
## PR checklist
Please check if your PR fulfills the following requirements:
- [ ] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [ ] Have you read the contributing guidelines?
- [ ] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
3834: Define searchable fields at runtime r=Kerollmops a=ManyTheFish
## Summary
This feature allows the end-user to search in one or multiple attributes using the search parameter `attributesToSearchOn`:
```json
{
"q": "Captain Marvel",
"attributesToSearchOn": ["title"]
}
```
This feature act like a filter, forcing Meilisearch to only return the documents containing the requested words in the attributes-to-search-on. Note that, with the matching strategy `last`, Meilisearch will only ensure that the first word is in the attributes-to-search-on, but, the retrieved documents will be ordered taking into account the word contained in the attributes-to-search-on.
## Trying the prototype
A dedicated docker image has been released for this feature:
#### last prototype version:
```bash
docker pull getmeili/meilisearch:prototype-define-searchable-fields-at-search-time-1
```
#### others prototype versions:
```bash
docker pull getmeili/meilisearch:prototype-define-searchable-fields-at-search-time-0
```
## Technical Detail
The attributes-to-search-on list is given to the search context, then, the search context uses the `fid_word_docids`database using only the allowed field ids instead of the global `word_docids` database. This is the same for the prefix databases.
The database cache is updated with the merged values, meaning that the union of the field-id-database values is only made if the requested key is missing from the cache.
### Relevancy limits
Almost all ranking rules behave as expected when ordering the documents.
Only `proximity` could miss-order documents if all the searched words are in the restricted attribute but a better proximity is found in an ignored attribute in a document that should be ranked lower. I put below a failing test showing it:
```rust
#[actix_rt::test]
async fn proximity_ranking_rule_order() {
let server = Server::new().await;
let index = index_with_documents(
&server,
&json!([
{
"title": "Captain super mega cool. A Marvel story",
// Perfect distance between words in an ignored attribute
"desc": "Captain Marvel",
"id": "1",
},
{
"title": "Captain America from Marvel",
"desc": "a Shazam ersatz",
"id": "2",
}]),
)
.await;
// Document 2 should appear before document 1.
index
.search(json!({"q": "Captain Marvel", "attributesToSearchOn": ["title"], "attributesToRetrieve": ["id"]}), |response, code| {
assert_eq!(code, 200, "{}", response);
assert_eq!(
response["hits"],
json!([
{"id": "2"},
{"id": "1"},
])
);
})
.await;
}
```
Fixing this would force us to create a `fid_word_pair_proximity_docids` and a `fid_word_prefix_pair_proximity_docids` databases which may multiply the keys of `word_pair_proximity_docids` and `word_prefix_pair_proximity_docids` by the number of attributes in the searchable_attributes list. If we think we should fix this test, I'll suggest doing it in another PR.
## Related
Fixes#3772
Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: ManyTheFish <many@meilisearch.com>
3745: tests: add unit test for `PayloadTooLarge` error r=curquiza a=cymruu
# Pull Request
Add a unit test for the `Payload`, which verifies that a request with a payload that is too large is rejected with the appropriate message.
This was requested in this PR https://github.com/meilisearch/meilisearch/pull/3739
## Related issue
https://github.com/meilisearch/meilisearch/pull/3739
## What does this PR do?
- Adds requested test
## PR checklist
Please check if your PR fulfills the following requirements:
- [ ] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [ ] Have you read the contributing guidelines?
- [ ] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: Filip Bachul <filipbachul@gmail.com>
3859: Merge all analytics events pertaining to updating the experimental features r=Kerollmops a=dureuill
Follow-up to #3850
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
3825: Accept semantic vectors and allow users to query nearest neighbors r=Kerollmops a=Kerollmops
This Pull Request brings a new feature to the current API. The engine accepts a new `_vectors` field akin to the `_geo` one. This vector is stored in Meilisearch and can be retrieved via search. This work is the first step toward hybrid search, bringing the best of both worlds: keyword and semantic search ❤️🔥
## ToDo
- [x] Make it possible to get the `limit` nearest neighbors from a user-generated vector by using the `vector` field of search route.
- [x] Delete the documents and vectors from the HNSW-related data structures.
- [x] Do it the slow and ugly way (we need to be able to iterate over all the values).
- [ ] Do it the efficient way (Wait for a new method or implement it myself).
- [ ] ~~Move from the `hnsw` crate to the hgg one~~ The hgg crate is too slow.
Meilisearch takes approximately 88s to answer a query. It is related to the time it takes to deserialize the `Hgg` data structure or search in it. I didn't take the time to measure precisely. We moved back to the hnsw crate which takes approximately 40ms to answer.
- [ ] ~~Wait for a fix for https://github.com/rust-cv/hgg/issues/4.~~
- [x] Fix the current dot product function.
- [x] Fill in the other `SearchResult` fields.
- [x] Remove the `hnsw` dependency of the meilisearch crate.
- [x] Fix the pages by taking the offset into account.
- [x] Release a first prototype https://github.com/meilisearch/product/discussions/621#discussioncomment-6183647
- [x] Make the pagination and filtering faster and more correct.
- [x] Return the original vector in the output search results (like `query`).
- [x] Return an `_semanticSimilarity` field in the documents (it's a dot product)
- [x] Return this score even if the `_vectors` field is not displayed
- [x] Rename the field `_semanticScore`.
- [ ] Return the `_geoDistance` value even if the `_geo` field is not displayed
- [x] Store the HNSW on possibly multiple LMDB values.
- [ ] Measure it and make it faster if needed
- [ ] Export the `ReadableSlices` type into a small external crate
- [x] Accept an `_vectors` field instead of the `_vector` one.
- [x] Normalize all vectors.
- [ ] Remove the `_vectors` field from the default searchable attributes (as we do with `_geo`?).
- [ ] Correctly compute the candidates by remembering the documents having a valid `_vectors` field.
- [ ] Return the right errors:
- [ ] Return an error when the query vector is not the same length as the vectors in the HNSW.
- [ ] We must return the user document id that triggered the vector dimension issue.
- [x] If an indexation error occurs.
- [ ] Fix the error codes when using the search route.
- [ ] ~~Introduce some settings:~~
We currently ensure that the vector length is consistent over the whole set of documents and return an error for when a vector dimension doesn't follow the current number of dimensions.
- [ ] The length of the vector the user will provide.
- [ ] The distance function (we only support dot as of now).
- [ ] Introduce other distance functions
- [ ] Euclidean
- [ ] Dot Product
- [ ] Cosine
- [ ] Make them SIMD optimized
- [ ] Give credit to qdrant
- [ ] Add tests.
- [ ] Write a mini spec.
- [ ] Release it in v1.3 as an experimental feature.
Co-authored-by: Clément Renault <clement@meilisearch.com>
Co-authored-by: Kerollmops <clement@meilisearch.com>
3853: docs: fixed some broken links r=gillian-meilisearch a=0xflotus
Some of the links in the README file were broken.
Co-authored-by: 0xflotus <0xflotus@gmail.com>
3850: Experimental features r=Kerollmops a=dureuill
# Pull Request
## Related issue
- Fixes https://github.com/meilisearch/meilisearch/issues/3857
- Related to https://github.com/meilisearch/meilisearch/issues/3771
## What does this PR do?
### Example
<details>
<summary>Using the feature to enable `scoreDetails`</summary>
```json
❯ curl \
-X POST 'http://localhost:7700/indexes/index-word-count-10-count/search' \
-H 'Content-Type: application/json' \
--data-binary '{ "q": "Batman", "limit": 1, "showRankingScoreDetails": true, "attributesToRetrieve": ["title"]}' | jsonxf
{
"message": "Computing score details requires enabling the `score details` experimental feature. See https://github.com/meilisearch/product/discussions/674",
"code": "feature_not_enabled",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#feature_not_enabled"
}
```
```json
❯ curl \
-X PATCH 'http://localhost:7700/experimental-features/' \
-H 'Content-Type: application/json' \
--data-binary '{
"scoreDetails": true
}'
{"scoreDetails":true,"vectorSearch":false}
```
```json
❯ curl \
-X POST 'http://localhost:7700/indexes/index-word-count-10-count/search' \
-H 'Content-Type: application/json' \
--data-binary '{ "q": "Batman", "limit": 1, "showRankingScoreDetails": true, "attributesToRetrieve": ["title"]}' | jsonxf
{
"hits": [
{
"title": "Batman",
"_rankingScoreDetails": {
"words": {
"order": 0,
"matchingWords": 1,
"maxMatchingWords": 1,
"score": 1.0
},
"typo": {
"order": 1,
"typoCount": 0,
"maxTypoCount": 1,
"score": 1.0
},
"proximity": {
"order": 2,
"score": 1.0
},
"attribute": {
"order": 3,
"attribute_ranking_order_score": 1.0,
"query_word_distance_score": 1.0,
"score": 1.0
},
"exactness": {
"order": 4,
"matchType": "exactMatch",
"score": 1.0
}
}
}
],
"query": "Batman",
"processingTimeMs": 3,
"limit": 1,
"offset": 0,
"estimatedTotalHits": 46
}
```
</details>
### User standpoint
- Add new route GET/POST/PATCH/DELETE `/experimental-features` to switch on or off some of the experimental features in a manner persistent between instance restarts
- Use these new routes to allow setting on/off the following experimental features:
- vector store **TODO:** fill in issue
- score details (related to https://github.com/meilisearch/meilisearch/issues/3771)
- Make the way of checking feature availability and error message uniform for the Prometheus metrics experimental feature
- Save the enabled features in dump, restore from dumps
- **TODO:** tests:
- Test new security permissions (do they allow access with ALL, do they prevent access when missing)
- Test dump behavior, in particular ability to import existing v6 dumps
- Test basic behavior when calling the rule
### Implementation standpoint
- New DB "experimental-features"
- dumps are modified to save the state of that new DB as a `experimental-features.json` file, that is then loaded back when importing the dump. This doesn't change the dump version, as the file is optional and it missing will not cause the dump to fail
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
3821: Add normalized and detailed scores to documents returned by a query r=dureuill a=dureuill
# Pull Request
## Related issue
Fixes#3771
## What does this PR do?
### User standpoint
<details>
<summary>Request ranking score</summary>
```
echo '{
"q": "Badman dark knight returns",
"showRankingScore": true,
"limit": 10,
"attributesToRetrieve": ["title"]
}' | mieli search -i index-word-count-10-count
```
</details>
<details>
<summary>Response</summary>
```json
{
"hits": [
{
"title": "Batman: The Dark Knight Returns, Part 1",
"_rankingScore": 0.947520325203252
},
{
"title": "Batman: The Dark Knight Returns, Part 2",
"_rankingScore": 0.947520325203252
},
{
"title": "Batman Unmasked: The Psychology of the Dark Knight",
"_rankingScore": 0.6657594086021505
},
{
"title": "Legends of the Dark Knight: The History of Batman",
"_rankingScore": 0.6654905913978495
},
{
"title": "Angel and the Badman",
"_rankingScore": 0.2196969696969697
},
{
"title": "Angel and the Badman",
"_rankingScore": 0.2196969696969697
},
{
"title": "Batman",
"_rankingScore": 0.11553030303030302
},
{
"title": "Batman Begins",
"_rankingScore": 0.11553030303030302
},
{
"title": "Batman Returns",
"_rankingScore": 0.11553030303030302
},
{
"title": "Batman Forever",
"_rankingScore": 0.11553030303030302
}
],
"query": "Badman dark knight returns",
"processingTimeMs": 12,
"limit": 10,
"offset": 0,
"estimatedTotalHits": 46
}
```
</details>
- If adding a `showRankingScore` parameter to the search query, then documents returned by a search now contain an additional field `_rankingScore` that is a float bigger than 0 and lower or equal to 1.0. This field represents the relevancy of the document, relatively to the search query and the settings of the index, with 1.0 meaning "perfect match" and 0 meaning "not matching the query" (Meilisearch should never return documents not matching the query at all).
- The `sort` and `geosort` ranking rules do not influence the `_rankingScore`.
<details>
<summary>Request detailed ranking scores</summary>
```
echo '{
"q": "Badman dark knight returns",
"showRankingScoreDetails": true,
"limit": 5,
"attributesToRetrieve": ["title"]
}' | mieli search -i index-word-count-10-count
```
</details>
<details>
<summary>Response</summary>
```json
{
"hits": [
{
"title": "Batman: The Dark Knight Returns, Part 1",
"_rankingScoreDetails": {
"words": {
"order": 0,
"matchingWords": 4,
"maxMatchingWords": 4,
"score": 1.0
},
"typo": {
"order": 1,
"typoCount": 1,
"maxTypoCount": 4,
"score": 0.8
},
"proximity": {
"order": 2,
"score": 0.9545454545454546
},
"attribute": {
"order": 3,
"attributes_ranking_order": 1.0,
"attributes_query_word_order": 0.926829268292683,
"score": 0.926829268292683
},
"exactness": {
"order": 4,
"matchType": "noExactMatch",
"score": 0.26666666666666666
}
}
},
{
"title": "Batman: The Dark Knight Returns, Part 2",
"_rankingScoreDetails": {
"words": {
"order": 0,
"matchingWords": 4,
"maxMatchingWords": 4,
"score": 1.0
},
"typo": {
"order": 1,
"typoCount": 1,
"maxTypoCount": 4,
"score": 0.8
},
"proximity": {
"order": 2,
"score": 0.9545454545454546
},
"attribute": {
"order": 3,
"attributes_ranking_order": 1.0,
"attributes_query_word_order": 0.926829268292683,
"score": 0.926829268292683
},
"exactness": {
"order": 4,
"matchType": "noExactMatch",
"score": 0.26666666666666666
}
}
},
{
"title": "Batman Unmasked: The Psychology of the Dark Knight",
"_rankingScoreDetails": {
"words": {
"order": 0,
"matchingWords": 3,
"maxMatchingWords": 4,
"score": 0.75
},
"typo": {
"order": 1,
"typoCount": 1,
"maxTypoCount": 3,
"score": 0.75
},
"proximity": {
"order": 2,
"score": 0.6666666666666666
},
"attribute": {
"order": 3,
"attributes_ranking_order": 1.0,
"attributes_query_word_order": 0.8064516129032258,
"score": 0.8064516129032258
},
"exactness": {
"order": 4,
"matchType": "noExactMatch",
"score": 0.25
}
}
},
{
"title": "Legends of the Dark Knight: The History of Batman",
"_rankingScoreDetails": {
"words": {
"order": 0,
"matchingWords": 3,
"maxMatchingWords": 4,
"score": 0.75
},
"typo": {
"order": 1,
"typoCount": 1,
"maxTypoCount": 3,
"score": 0.75
},
"proximity": {
"order": 2,
"score": 0.6666666666666666
},
"attribute": {
"order": 3,
"attributes_ranking_order": 1.0,
"attributes_query_word_order": 0.7419354838709677,
"score": 0.7419354838709677
},
"exactness": {
"order": 4,
"matchType": "noExactMatch",
"score": 0.25
}
}
},
{
"title": "Angel and the Badman",
"_rankingScoreDetails": {
"words": {
"order": 0,
"matchingWords": 1,
"maxMatchingWords": 4,
"score": 0.25
},
"typo": {
"order": 1,
"typoCount": 0,
"maxTypoCount": 1,
"score": 1.0
},
"proximity": {
"order": 2,
"score": 1.0
},
"attribute": {
"order": 3,
"attributes_ranking_order": 1.0,
"attributes_query_word_order": 0.8181818181818182,
"score": 0.8181818181818182
},
"exactness": {
"order": 4,
"matchType": "noExactMatch",
"score": 0.3333333333333333
}
}
}
],
"query": "Badman dark knight returns",
"processingTimeMs": 9,
"limit": 5,
"offset": 0,
"estimatedTotalHits": 46
}
```
</details>
- If adding a `showRankingScoreDetails` parameter to the search query, then the returned documents will now contain an additional `_rankingScoreDetails` field that is a JSON object containing one field per ranking rule that was applied, whose value is a JSON object with the following fields:
- `order`: a number indicating the order this rule was applied (0 is the first applied ranking rule)
- `score` (except for `sort` and `geosort`): a float indicating how the document matched this particular rule.
- other fields that are specific to the rule, indicating for example how many words matched for a document and how many typos were counted in a matching document.
- If the `displayableAttributes` list is defined in the settings of the index, any ranking rule using an attribute **not** part of that list will be marked as `<hidden-rule>` in the `_rankingScoreDetails`.
- Search queries that are part of a `multi-search` requests are modified in the same way and each of the queries can take the `showRankingScore` and `showRankingScoreDetails` parameters independently. The results are still returned in separate lists and providing a unified list of results between multiple queries is not in the scope of this PR (but is unblocked by this PR and can be done manually by using the scores of the various documents).
### Implementation standpoint
- Fix difference in how the position of terms were computed at indexing time and query time: this difference meant that a query containing a hard separator would fail the exactness check.
- Fix the id reported by the sort ranking rule (very minor)
- Change how the cost of removing words is computed. After this change the cost no longer works for any other ranking rule than `words`. Also made `words` have a cost of 0 such that the entire cost of `words` is given by the termRemovalStrategy. The new cost computation makes it so the score is computed in a way consistent with the number of words in the query. Additionally, the words that appear in phrases in the query are also counted as matching words.
- When any score computation is requested through `showRankingScore` or `showRankingScoreDetails`, remove optimization where ranking rules are not executed on buckets of a single document: this is important to allow the computation of an accurate score.
- add virtual conditions to fid and position to always have the max cost: this ensures that the score is independent from the dataset
- the Position ranking rule now takes into account the distance to the position of the word in the query instead of the distance to the position 0.
- modified proximity ranking rule cost calculation so that the cost is 0 for documents that are perfectly matching the query
- Add a new `milli::score_details` module containing all the types that are involved in score computation.
- Make it so a bucket of result now contains a `ScoreDetails` and changed the ranking rules to produce their `ScoreDetails`.
- Expose the scores in the REST API.
- Add very light analytics for scoring.
- Update the search tests to add the expected scores.
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
3842: fix some typos r=dureuill a=cuishuang
# Pull Request
## Related issue
Fixes #<issue_number>
## What does this PR do?
- fix some typos
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: cui fliter <imcusg@gmail.com>
3799: Fix error messages in `check-release.sh` r=curquiza a=vvv
- `check_tag`: Report file name correctly. Use named variables.
- Introduce `read_version` helper function. Simplify the implementation.
- Show meaningful error message if `GITHUB_REF` is not set or its format is incorrect.
Co-authored-by: Valeriy V. Vorotyntsev <valery.vv@gmail.com>
3844: Fix SDK CI (again) r=curquiza a=curquiza
Following this PR: https://github.com/meilisearch/meilisearch/pull/3813
Sorry `@Kerollmops,` here is (I hope) the latest fix 🙏 I made tests last time that were not sufficient. I really did a lot this time. I hope I have not missed anything.
Co-authored-by: curquiza <clementine@meilisearch.com>
- `check_tag`: Report file name correctly. Use named variables.
- Introduce `read_version` helper function. Simplify the implementation.
- Show meaningful error message if `GITHUB_REF` is not set or its format
is incorrect.
3670: Fix addition deletion bug r=irevoire a=irevoire
The first commit of this PR is a revert of https://github.com/meilisearch/meilisearch/pull/3667. It re-enable the auto-batching of addition and deletion of tasks. No new changes have been introduced outside of `milli`. So all the changes you see on the autobatcher have actually already been reviewed.
It fixes https://github.com/meilisearch/meilisearch/issues/3440.
### What was happening?
The issue was that the `external_documents_ids` generated in the `transform` were used in a very strange way that wasn’t compatible with the deletion of documents.
Instead of doing a clear merge between the external document IDs of the DB and the one returned by the transform + writing it on disk, we were doing some weird tricks with the soft-deleted to avoid writing the fst on disk as much as possible.
The new algorithm may be a bit slower but is way more straightforward and doesn’t change depending on if the soft deletion was used or not. Here is a list of the changes introduced:
1. We now do a clear distinction between the `new_external_documents_ids` coming from the transform and only held on RAM and the `external_documents_ids` coming from the DB.
2. The `new_external_documents_ids` (coming out of the transform) are now represented as an `fst`. We don't need to struggle with the hard, soft distinction + the soft_deleted => That's easier to understand
3. When indexing documents, we merge the `external_documents_ids` coming from the DB and the `new_external_documents_ids` coming from the transform.
### Other things introduced in this PR
Since we constantly have to write small, very specialized fuzzers for this kind of bug, we decided to push the one used to reproduce this bug.
It's not perfect, but it's easy to improve in the future.
It'll also run for as long as possible on every merge on the main branch.
Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: Loïc Lecrenier <loic.lecrenier@icloud.com>
3835: Add more documentation to graph-based ranking rule algorithms + comment cleanup r=Kerollmops a=loiclec
In addition to documenting the `cheapest_path.rs` file, this PR cleans up a few outdated comments as well as some TODOs. These TODOs have been moved to https://github.com/meilisearch/meilisearch/issues/3776
Co-authored-by: Loïc Lecrenier <loic.lecrenier@icloud.com>
3836: Remove trailing whitespace in snapshots r=dureuill a=dureuill
# Pull Request
## Related issue
No issue, maintenance
## What does this PR do?
- Remove trailing whitespace in snapshots by adding a trailing `|` at the end of lines that would previously end with fixed-width integers
- This allows contributors whose editor is configured to remove trailing whitespace not to modify the tests when changing an unrelated part of the file containing the tests
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
3813: Fix SDK CI for scheduled jobs r=curquiza a=curquiza
The SDK CI does not run for the scheduled job (`cron`) every day, and only works for manual triggers.
I added a job to define the Docker image we use depending on the event: `worflow_dispatch` = manual triggering, or `scheduled` = cron jobs
Co-authored-by: curquiza <clementine@meilisearch.com>
3824: Changes the way words are counted in the word count DB r=ManyTheFish a=dureuill
# Pull Request
## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/3823
## What does this PR do?
- Apply offset when parsing query that is consistent with the indexing
### DB breaking changes
- Count the number of words in `field_id_word_count_docids`
- raise limit of word count for storing the entry in the DB from 10 to 30
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
3789: Improve the metrics r=dureuill a=irevoire
# Pull Request
## Related issue
Implements https://github.com/meilisearch/meilisearch/issues/3790
Associated specification: https://github.com/meilisearch/specifications/pull/242
## Be cautious; it's DB-breaking 😱
While reviewing and after merging this PR, be cautious; if you already have a `data.ms` and run meilisearch with this code on it, it won't work because we need to cache a new information on the index stats (that are backed up on disk). You'll get internal errors.
### About the breaking-change label
We only break the API of the metrics route, which does not pose any problem since it's experimental.
## What does this PR do?
- Create a method to get the « facet distribution » of the task queue.
- Prefix all the metrics by `meilisearch_`
- Add the real database size used by meilisearch
- Add metrics on the task queue
- Update the grafana dashboard to these new changes
- Move the dashboard to the `assets` directory
- Provide a new prometheus file to scrape meilisearch easily
Co-authored-by: Tamo <tamo@meilisearch.com>
3788: Use `RoaringBitmap::deserialize_unchecked_from` to reduce the deserialization time r=irevoire a=Kerollmops
This pull request replaces the `RoaringBitmap::deserialize_from` methods with the `deserialize_unchecked_from` to avoid doing too much checks. We know the written bitmaps are valid as we do not disable the checks during the indexation phase.
I did a small test with #3780 and discovered that the deserialization time changed from 32% to 9.46% when using these changes. It seems it was low-hanging fruit hidden behind a leaf.
Co-authored-by: Kerollmops <clement@meilisearch.com>
3803: Bump Swatinem/rust-cache from 2.2.1 to 2.4.0 r=curquiza a=dependabot[bot]
Bumps [Swatinem/rust-cache](https://github.com/Swatinem/rust-cache) from 2.2.1 to 2.4.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a href="https://github.com/Swatinem/rust-cache/releases">Swatinem/rust-cache's releases</a>.</em></p>
<blockquote>
<h2>v2.4.0</h2>
<ul>
<li>Fix cache key stability.</li>
<li>Use 8 character hash components to reduce the key length, making it more readable.</li>
</ul>
<h2>v2.3.0</h2>
<ul>
<li>Add <code>cache-all-crates</code> option, which enables caching of crates installed by workflows.</li>
<li>Add installed packages to cache key, so changes to workflows that install rust tools are detected and cached properly.</li>
<li>Fix cache restore failures due to upstream bug.</li>
<li>Fix <code>EISDIR</code> error due to globed directories.</li>
<li>Update runtime <code>`@actions/cache</code>,` <code>`@actions/io</code>` and dev <code>typescript</code> dependencies.</li>
<li>Update <code>npm run prepare</code> so it creates distribution files with the right line endings.</li>
</ul>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a href="https://github.com/Swatinem/rust-cache/blob/master/CHANGELOG.md">Swatinem/rust-cache's changelog</a>.</em></p>
<blockquote>
<h2>2.4.0</h2>
<ul>
<li>Fix cache key stability.</li>
<li>Use 8 character hash components to reduce the key length, making it more readable.</li>
</ul>
<h2>2.3.0</h2>
<ul>
<li>Add <code>cache-all-crates</code> option, which enables caching of crates installed by workflows.</li>
<li>Add installed packages to cache key, so changes to workflows that install rust tools are detected and cached properly.</li>
<li>Fix cache restore failures due to upstream bug.</li>
<li>Fix <code>EISDIR</code> error due to globed directories.</li>
<li>Update runtime <code>`@actions/cache</code>,` <code>`@actions/io</code>` and dev <code>typescript</code> dependencies.</li>
<li>Update <code>npm run prepare</code> so it creates distribution files with the right line endings.</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a href="988c164c3d"><code>988c164</code></a> 2.4.0</li>
<li><a href="bb80d0f127"><code>bb80d0f</code></a> chore: use 8 character hash components (<a href="https://redirect.github.com/Swatinem/rust-cache/issues/143">#143</a>)</li>
<li><a href="ad97570a01"><code>ad97570</code></a> fix: cache key stability (<a href="https://redirect.github.com/Swatinem/rust-cache/issues/142">#142</a>)</li>
<li><a href="060bda31e0"><code>060bda3</code></a> 2.3.0</li>
<li><a href="865fd1f6db"><code>865fd1f</code></a> "update dependencies and changelog"</li>
<li><a href="7c7e41ab01"><code>7c7e41a</code></a> chore: changelog v2.3.0 (<a href="https://redirect.github.com/Swatinem/rust-cache/issues/139">#139</a>)</li>
<li><a href="68aeeba167"><code>68aeeba</code></a> chore: use linefix to ensure platform line endings (<a href="https://redirect.github.com/Swatinem/rust-cache/issues/135">#135</a>)</li>
<li><a href="def0926359"><code>def0926</code></a> feat: add option to cache all crates (<a href="https://redirect.github.com/Swatinem/rust-cache/issues/137">#137</a>)</li>
<li><a href="827c240e23"><code>827c240</code></a> fix: cache key dependency on installed packages (<a href="https://redirect.github.com/Swatinem/rust-cache/issues/138">#138</a>)</li>
<li><a href="5e9fae966f"><code>5e9fae9</code></a> fix: cache restore failures (<a href="https://redirect.github.com/Swatinem/rust-cache/issues/136">#136</a>)</li>
<li>Additional commits viewable in <a href="https://github.com/Swatinem/rust-cache/compare/v2.2.1...v2.4.0">compare view</a></li>
</ul>
</details>
<br />
[](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
You can trigger a rebase of this PR by commenting ``@dependabot` rebase`.
[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)
---
<details>
<summary>Dependabot commands and options</summary>
<br />
You can trigger Dependabot actions by commenting on this PR:
- ``@dependabot` rebase` will rebase this PR
- ``@dependabot` recreate` will recreate this PR, overwriting any edits that have been made to it
- ``@dependabot` merge` will merge this PR after your CI passes on it
- ``@dependabot` squash and merge` will squash and merge this PR after your CI passes on it
- ``@dependabot` cancel merge` will cancel a previously requested merge and block automerging
- ``@dependabot` reopen` will reopen this PR if it is closed
- ``@dependabot` close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- ``@dependabot` ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- ``@dependabot` ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- ``@dependabot` ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
</details>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
3792: fix the type of the document deletion by filter tasks r=dureuill a=irevoire
# Pull Request
## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/3791
## What does this PR do?
- Hide the deleteDocumentByFilter internal type from the users.
Co-authored-by: Tamo <tamo@meilisearch.com>
3786: Consistently use wrapping add to avoid overflow in debug when query s… r=dureuill a=dureuill
# Pull Request
## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/3785
## What does this PR do?
- Some of the code paths would erroneously use the default addition operator that has the semantics that "overflow is an error, checked at runtime in debug" instead of the intended "overflow is expected" semantics that this code use (this code is using `u16::MAX` as a sentinel). This PR makes it so the wrapping add operator is used everywhere.
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
3781: Revert "Improve docker cache" r=Kerollmops a=curquiza
Reverts meilisearch/meilisearch#3566 because does not work as expected, and so I want to remove useless complexity from the CI and Dockerfile
Co-authored-by: Clémentine U. - curqui <clementine@meilisearch.com>
3775: Last error code changes on the new get/delete documents routes r=dureuill a=irevoire
# Pull Request
## Related issue
Fixes#3774
## What does this PR do?
Following the specification: https://github.com/meilisearch/specifications/pull/236
1. Get rid of the `invalid_document_delete_filter` and always use the `invalid_document_filter`
2. Introduce a new `missing_document_filter` instead of returning `invalid_document_delete_filter` (that’s consistent with all the other routes that have a mandatory parameter)
3. Always return the `original_filter` in the details (potentially set to `null`) instead of hiding it if it wasn’t used
Co-authored-by: Tamo <tamo@meilisearch.com>
3768: Fix bugs in graph-based ranking rules + make `words` a graph-based ranking rule r=dureuill a=loiclec
This PR contains three changes:
## 1. Don't call the `words` ranking rule if the term matching strategy is `All`
This is because the purpose of `words` is only to remove nodes from the query graph. It would never do any useful work when the matching strategy was `All`. Remember that the universe was already computed before by computing all the docids corresponding to the "maximally reduced" query graph, which, in the case of `All`, is equal to the original graph.
## 2. The `words` ranking rule is replaced by a graph-based ranking rule.
This is for three reasons:
1. **performance**: graph-based ranking rules benefit from a lot of optimisations by default, which ensures that they are never too slow. The previous implementation of `words` could call `compute_query_graph_docids` many times if some words had to be removed from the query, which would be quite expensive. I was especially worried about its performance in cases where it is placed right after the `sort` ranking rule. Furthermore, `compute_query_graph_docids` would clone a lot of bitmaps many times unnecessarily.
2. **consistency**: every other ranking rule (except `sort`) is graph-based. It makes sense to implement `words` like that as well. It will automatically benefit from all the features, optimisations, and bug fixes that all the other ranking rules get.
3. **surfacing bugs**: as the first ranking rule to be called (most of the time), I'd like `words` to behave the same as the other ranking rules so that we can quickly detect bugs in our graph algorithms. This actually already happened, which is why this PR also contains a bug fix.
## 3. Fix the `update_all_costs_before_nodes` function
It is a bit difficult to explain what was wrong, but I'll try. The bug happened when we had graphs like:
<img width="730" alt="Screenshot 2023-05-16 at 10 58 57" src="https://github.com/meilisearch/meilisearch/assets/6040237/40db1a68-d852-4e89-99d5-0d65757242a7">
and we gave the node `is` as argument.
Then, we'd walk backwards from the node breadth-first. We'd update the costs of:
1. `sun`
2. `thesun`
3. `start`
4. `the`
which is an incorrect order. The correct order is:
1. `sun`
2. `thesun`
3. `the`
4. `start`
That is, we can only update the cost of a node when all of its successors have either already been visited or were not affected by the update to the node passed as argument. To solve this bug, I factored out the graph-traversal logic into a `traverse_breadth_first_backward` function.
Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
3757: Adjust the cost of edges in the `position` ranking rule by bucketing positions more aggressively r=loiclec a=loiclec
This PR significantly improves the performance of the `position` ranking rule when:
1. a query contains many words
2. the `position` ranking rule needs to be called many times
3. the score of the documents according to `position` is high
These conditions greatly increase:
1. the number of edge traversals that are needed to find a valid path from the `start` node to the `end` node
2. the number of edges that need to be deleted from the graph, and therefore the number of times that we need to recompute all the possible costs from START to END
As a result, a majority of the search time is spent in `visit_condition`, `visit_node`, and `update_all_costs_before_node`. This is frustrating because it often happens when the "universe" given to the rule consists of only a handful of document ids.
By limiting the number of possible edges between two nodes from `20` to `10`, we:
1. reduce the number of possible costs from START to END
2. reduce the number of edges that will be deleted
3. make it faster to update the costs after deleting an edge
4. reduce the number of buckets that need to be computed
In terms of relevancy, I don't think we lose or gain much. We still prefer terms that are in a lower positions, with decreasing precision as we go further. The previous choice of bucketing wasn't chosen in a principled way, and neither is this one. They both "feel" right to me.
Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>
Co-authored-by: meili-bors[bot] <89034592+meili-bors[bot]@users.noreply.github.com>
3738: Add analytics on the get documents resource r=dureuill a=irevoire
# Pull Request
## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/3737
Related spec https://github.com/meilisearch/specifications/pull/234
## What does this PR do?
Add the analytics for the following routes:
- `GET` - `/indexes/:uid/documents`
- `GET` - `/indexes/:uid/documents/:doc_id`
- `POST` - `/indexes/:uid/documents/fetch`
These analytics are aggregated between two events:
- `Documents Fetched GET`
- `Documents Fetched POST`
That shares the same payload:
Property name | Description | Example |
|---------------|-------------|---------|
| `requests.total_received` | Total number of request received in this batch | 325 |
| `per_document_id` | `false` | false |
| `per_filter` | `true` if `POST /indexes/:indexUid/documents/fetch` endpoint was used with a filter in this batch, otherwise `false` | false |
| `pagination.max_limit` | Highest value given for the `limit` parameter in this batch | 60 |
| `pagination.max_offset` | Highest value given for the `offset` parameter in this batch | 1000 |
Co-authored-by: Tamo <tamo@meilisearch.com>
3759: Invalid error code when parsing filters r=dureuill a=irevoire
# Pull Request
## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/3753
## What does this PR do?
Fix the error code in case the error comes from the evaluate of the filter for the get, fetch and delete documents routes.
Co-authored-by: Tamo <tamo@meilisearch.com>
3741: Add ngram support to the highlighter r=ManyTheFish a=loiclec
This PR fixes a bug introduced by the search refactor, where ngrams were not highlighted.
The solution was to add the ngrams to the vector of `LocatedQueryTerm` that is given to the `MatchingWords` structure.
Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>
3749: Fix back: sort error message r=ManyTheFish a=ManyTheFish
This PR reintroduces the error message modified in https://github.com/meilisearch/milli/pull/375.
However, this added double-quotes around `sort` in the message. I don't think another message contains double-quotes, so I have added a separate commit replacing the double-quotes with back-ticks, which seems more consistent with the other error messages, this last change can be reverted easily.
## Detailed changes
#### v1.2-rc0
```
The sort ranking rule must be specified in the ranking rules settings to use the sort parameter at search time.
```
#### [Reintroduce fix (previous and expected behavior)](23d1c86825)
```
You must specify where "sort" is listed in the rankingRules setting to use the sort parameter at search time
```
#### [Replace double-quotes with back-ticks (my suggestion)](4d691d071a)
```
You must specify where `sort` is listed in the rankingRules setting to use the sort parameter at search time
```
## Related
Fixes#3722
## Reviewers
- technical review: `@irevoire`
- to validate the replacement: `@macraig`
Co-authored-by: ManyTheFish <many@meilisearch.com>
3651: Use the writemap flag to reduce the memory usage r=irevoire a=Kerollmops
This draft PR is showing some stats about the memory usage of Meilisearch when [the LMDB `MDB_WRITEMAP` flag](3947014aed/libraries/liblmdb/lmdb.h (L573-L581)) is enabled and when it is not. As you can see there is a reduction of about 50% of the memory usage pick. The dataset used was [the Wikipedia one](https://www.notion.so/meilisearch/Wikipedia-8b1486e4b17547c5bda485d2d97767a0) with the first 30 000 first CSV documents without settings. This PR depends on https://github.com/meilisearch/heed/pull/168.
I just [opened a discussion](https://github.com/meilisearch/product/discussions/652) for people to understand the tradeoffs and give their feedback.
- [x] Create an experiment flag `--experimental-reduce-indexing-memory-usage`.
- [x] Add it to the config file.
- [x] Explain the tradeoff and copy/link the LMDB documentation in the help message.
- [x] Add analytics about the experimental flag.
- [x] Document that this flag cannot be used on Windows, ~~or hide it~~.
<details>
<summary>The command I used to run the tests</summary>
#### Sign the binary to be able to use Instruments / xcrun
```sh
codesign -s - -f --entitlements ~/ent.plist target/release/meilisearch
```
where `ent.plist` contains:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>com.apple.security.get-task-allow</key>
<true/>
</dict>
</plist>
```
#### Run Meilisearch in measure-mode
```sh
xcrun xctrace record --template 'Allocations' --launch -- target/release/meilisearch --max-indexing-memory 0MiB
```
#### Send the wiki dataset available on notion.so / Public
```sh
for f in 0.csv 15000.csv; do echo sending $f; xh 'localhost:7700/indexes/wiki/documents' 'content-type:text/csv' `@$f;` done
```
#### Wait for the task to finish
```sh
watch --color xh --pretty all 'localhost:7700/tasks?statuses=processing'
```
</details>
Keep in mind that I tested that with the Instruments Apple tools on an iMac 5k 2019. More benchmarks must be done, especially on the indexation speed, as the flag is told to slow down writing into databases bigger that the amount of memory.
On the left Meilisearch is running without the flag. On the right, it is running with the flag.
<p align="center">
<img align="left" width="45%" alt="Instrument showing the memory usage of Meilisearch without the MDB_WRITEMAP flag" src="https://user-images.githubusercontent.com/3610253/234299524-7607f1df-6fc1-45d3-bd3d-4f9388002857.png">
<img align="right" width="45%" alt="Instrument showing the memory usage of Meilisearch with the MDB_WRITEMAP flag" src="https://user-images.githubusercontent.com/3610253/234299534-6cc3ae58-8bd9-426c-aa79-4c78f9e88b94.png">
</p>
Co-authored-by: Kerollmops <clement@meilisearch.com>
Co-authored-by: Clément Renault <clement@meilisearch.com>
3739: fix: update `payload_too_large` error message to include human readable maximum acceptable payload size r=Kerollmops a=cymruu
# Pull Request
## Related issue
Fixes#3736
## What does this PR do?
- update `payload_too_large` error message as requested in ticket
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: Filip Bachul <filipbachul@gmail.com>
3742: Compute split words derivations of terms that don't accept typos r=ManyTheFish a=loiclec
Allows looking for the split-word derivation for short words in the user's query (like `the -> "t he"` or `door -> do or`) as well as for 3grams.
Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>
3731: Move comments above keys in config.toml r=curquiza a=jirutka
The current style is very unusual, confusing and breaks compatibility with tools for parsing config files including comments. Everyone writes comments above the items to which they refer (maybe except pythonists), so let's stick to that.
Co-authored-by: Jakub Jirutka <jakub@jirutka.cz>
3734: Update version for the next release (v1.2.0) in Cargo.toml r=curquiza a=meili-bot
⚠️ This PR is automatically generated. Check the new version is the expected one and Cargo.lock has been updated before merging.
Co-authored-by: curquiza <curquiza@users.noreply.github.com>
3726: Fix prefix highlighting r=loiclec a=ManyTheFish
The prefix queries were not properly highlighted, this PR now highlights only the start of a word when it matched with a prefix
Co-authored-by: ManyTheFish <many@meilisearch.com>
Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>
The current style is very unusual, confusing and breaks compatibility
with tools for parsing config files including comments. Everyone writes
comments above the items to which they refer (maybe except pythonists),
so let's stick to that.
3687: Allow to disable specialized tokenizations (again) r=Kerollmops a=jirutka
In PR #2773, I added the `chinese`, `hebrew`, `japanese` and `thai` feature flags to allow melisearch to be built without huge specialed tokenizations that took up 90% of the melisearch binary size. Unfortunately, due to some recent changes, this doesn't work anymore. The problem lies in excessive use of the `default` feature flag, which infects the dependency graph.
Instead of adding `default-features = false` here and there, it's easier and more future-proof to not declare `default` in `milli` and `meilisearch-types`. I've renamed it to `all-tokenizers`, which also makes it a bit clearer what it's about.
Co-authored-by: Jakub Jirutka <jakub@jirutka.cz>
In PR #2773, I added the `chinese`, `hebrew`, `japanese` and `thai`
feature flags to allow melisearch to be built without huge specialed
tokenizations that took up 90% of the melisearch binary size.
Unfortunately, due to some recent changes, this doesn't work anymore.
The problem lies in excessive use of the `default` feature flag, which
infects the dependency graph.
Instead of adding `default-features = false` here and there, it's easier
and more future-proof to not declare `default` in `milli` and
`meilisearch-types`. I've renamed it to `all-tokenizers`, which also
makes it a bit clearer what it's about.
3570: Get documents by filter r=irevoire a=dureuill
# Pull Request
## Related issue
Associated spec: https://github.com/meilisearch/specifications/pull/234
None really, this is more of an extension of #3477: since after this issue we'll be able to delete documents by filter, it makes sense to also be able to get documents by filter.
## What does this PR do?
### User standpoint
- Add a new `filter` URL parameter to `GET /indexes/{:indexUid}/documents` and a new `POST /indexes/{:indexUid}/documents/fetch` route with the same `offset, limit, fields, filter`
### Implementation standpoint
- Add a new `Index::iter_documents` method to iterate on a set of documents rather than return a vector of these documents.
- Rewrite the other `Index::*documents` methods to use the new `Index::iter_documents` method.
## Usage
<details>
<summary>
Sample request and response
</summary>
```
curl -X POST 'http://localhost:7700/indexes/index-1101/documents/fetch' -H 'Content-Type: application/json' --data-binary '{ "filter": "genres = Comedy", "limit": 3, "offset": 8000}' | jsonxf
```
```json
{
"results": [
{
"id": 326126,
"title": "Bad Exorcists",
"overview": "A trio of awkward teens intend to win a horror festival by making their own movie, but wind up getting their actress possessed in the process.",
"genres": [
"Horror",
"Comedy"
],
"poster": "https://image.tmdb.org/t/p/w500/lwd65kPbjFacAw3QSXiwSsW6cFU.jpg",
"release_date": 1425081600
},
{
"id": 326215,
"title": "Ooops! Noah is Gone...",
"overview": "It's the end of the world. A flood is coming. Luckily for Dave and his son Finny, a couple of clumsy Nestrians, an Ark has been built to save all animals. But as it turns out, Nestrians aren't allowed. Sneaking on board with the involuntary help of Hazel and her daughter Leah, two Grymps, they think they're safe. Until the curious kids fall off the Ark. Now Finny and Leah struggle to survive the flood and hungry predators and attempt to reach the top of a mountain, while Dave and Hazel must put aside their differences, turn the Ark around and save their kids. It's definitely not going to be smooth sailing.",
"genres": [
"Animation",
"Adventure",
"Comedy",
"Family"
],
"poster": "https://image.tmdb.org/t/p/w500/gEJXHgpiKh89Vwjc4XUY5CIgUdB.jpg",
"release_date": 1427328000
},
{
"id": 326241,
"title": "For Here or to Go?",
"overview": "An aspiring Indian tech entrepreneur in the Silicon Valley finds himself unexpectedly battling the bizarre American immigration system to keep his dream alive or prepare to return home forever.",
"genres": [
"Drama",
"Comedy"
],
"poster": "https://image.tmdb.org/t/p/w500/ff8WaA7ItBgl36kdT232i0d0Fnq.jpg",
"release_date": 1490918400
}
],
"offset": 8000,
"limit": 3,
"total": 9331
}
```
<img width="1348" alt="Capture d’écran 2023-03-08 à 10 09 04" src="https://user-images.githubusercontent.com/41078892/223670905-6932b79b-f9b8-4a41-b59e-be2171705b7d.png">
</details>
# Draft status
- [ ] Route naming: having one route be `GET /indexes/{:indexUid}/documents` and the other `POST /indexes/{:indexUid}/documents/fetch` is suboptimal (also, technically a breaking change for documents with `fetch` as uid?), but `POST /indexes/{:indexUid}/documents` is already used to insert documents.
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
Co-authored-by: Tamo <tamo@meilisearch.com>
3550: Delete documents by filter r=irevoire a=dureuill
# Prototype `prototype-delete-by-filter-0`
Usage:
A new route is available under `POST /indexes/{index_uid}/documents/delete` that allows you to delete your documents by filter.
The expected payload looks like that:
```json
{
"filter": "doggo = bernese",
}
```
It'll then enqueue a task in your task queue that'll delete all the documents matching this filter once it's processed.
Here is an example of the associated details;
```json
"details": {
"deletedDocuments": 53,
"originalFilter": "\"doggo = bernese\""
}
```
----------
# Pull Request
## Related issue
Related to https://github.com/meilisearch/meilisearch/issues/3477
## What does this PR do?
### User standpoint
- Modifies the `/indexes/{:indexUid}/documents/delete-batch` route to accept either the existing array of documents ids, or a JSON object with a `filter` field representing a filter to apply. If that latter variant is used, any document matching the filter will be deleted.
### Implementation standpoint
- (processing time version) Adds a new BatchKind that is not autobatchable and that performs the delete by filter
- Reuse the `documentDeletion` task with a new `originalFilter` detail that replaces the `providedIds` detail.
## Example
<details>
<summary>Sample request, response and task result</summary>
Request:
```
curl \
-X POST 'http://localhost:7700/indexes/index-10/documents/delete-batch' \
-H 'Content-Type: application/json' \
--data-binary '{ "filter" : "mass = 600"}'
```
Response:
```
{
"taskUid": 3902,
"indexUid": "index-10",
"status": "enqueued",
"type": "documentDeletion",
"enqueuedAt": "2023-02-28T20:50:31.667502Z"
}
```
Task log:
```json
{
"uid": 3906,
"indexUid": "index-12",
"status": "succeeded",
"type": "documentDeletion",
"canceledBy": null,
"details": {
"deletedDocuments": 3,
"originalFilter": "\"mass = 600\""
},
"error": null,
"duration": "PT0.001819S",
"enqueuedAt": "2023-03-07T08:57:20.11387Z",
"startedAt": "2023-03-07T08:57:20.115895Z",
"finishedAt": "2023-03-07T08:57:20.117714Z"
}
```
</details>
## Draft status
- [ ] Error handling
- [ ] Analytics
- [ ] Do we want to reuse the `delete-batch` route in this way, or create a new route instead?
- [ ] Should the filter be applied at request time or when the deletion task is processed?
- The first commit in this PR applies the filter at request time, meaning that even if a document is modified in a way that no longer matches the filter in a later update, it will be deleted as long as the deletion task is processed after that update.
- The other commits in this PR apply the filter only when the asynchronous deletion task is processed, meaning that documents that match the filter at processing time are deleted even if they didn't match the filter at request time.
- [ ] If keeping the filter at request time, find a more elegant way to recover the user document ids from the internal document ids. The current way implemented in the first commit of this PR involves getting all the documents matching the filter, looking for the value of their primary key, and turning it into a string by copy-pasting routines found in milli...
- [ ] Security consideration, if any
- [ ] Fix the tests (but waiting until product questions are resolved)
- [ ] Add delete by filter specific tests
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
Co-authored-by: Tamo <tamo@meilisearch.com>
3639: Add a dedicated error variant for planned failures in index scheduler tests r=Kerollmops a=Sufflope
# Pull Request
## Related issue
Fixes#3086
## What does this PR do?
- Add a dedicated test variant in test cfg to avoid reusing a misleading existing error
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: Jean-Sébastien Bour <jean-sebastien@bour.name>
3693: Implement the auto deletion of tasks r=dureuill a=irevoire
Fixes https://github.com/meilisearch/meilisearch/issues/3622
This PR should be the definite fix for #3622.
It adds a limit (1M) to the maximum number of tasks the task queue can hold.
Once the task queue reaches this limit (1M of tasks are in the task queue, whatever their status is), meilisearch will schedule a task deletion that tries to delete the oldest 100k tasks.
If meilisearch can't delete 100k tasks because some of them are not yet finished, it will delete as many tasks as possible.
Once the limit is reached, you're still able to register new tasks. The engine will only stop you from adding new tasks once [the other hard limit](https://github.com/meilisearch/meilisearch/pull/3659) of 10GiB of tasks is reached (that's between 5M and 15M of tasks depending on your workflow).
-------
Technically;
- We only try to schedule our task deletion when calling the tick function but before creating a new batch. This means we never enqueue a task we're not going to process ~right away.
- If our task deletion doesn't delete anything, we don't enqueue it and log a warn the user that the engine is not working properly
Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
3542: Refactor of the search algorithms r=dureuill a=loiclec
This PR refactors a large part of the search logic (related to https://github.com/meilisearch/meilisearch/issues/3547)
- The "query tree" is replaced by a "query graph", which describes the different ways in which the search query can be interpreted and precomputes the word derivations for each query term. Example:
<img width="1162" alt="Screenshot 2023-02-27 at 10 26 50" src="https://user-images.githubusercontent.com/6040237/221525270-87917cc0-60d1-473f-847f-2c5a7de9e370.png">
- The control flow between the ~criterions~ ranking rules is managed in a single place instead of being independently implemented by each ranking rule.
- The set of document candidates is determined greedily from the beginning. It is often referred as the "universe" in the code.
- The ranking rules `proximity`, `attribute`, `typo`, and (maybe) `exactness` are or will be implemented using a K-shortest path graph algorithm. This minimises the number of database and bitmap operations we need to do to compute each ranking rule bucket. It also simplifies the code a lot since a lot of ranking rules will share a large part of their implementation.
- Pointers to database values are stored in a cache to avoid searching in the LMDB databases needlessly.
- The result of some roaring bitmap operations are also stored in a cache, although we'll need to measure the memory pressure this puts on the system and maybe deactivate this cache later on.
- Search requests can be visually logged and debugged in tests.
TODO:
- [ ] Reintroduce search benchmarks
- [x] Implement `disableOnWords` and `disableOnAttributes` settings of typo tolerance
- [x] Implement "exhaustive number of hits
- [x] Implement `attribute` ranking rule
- [x] Indexing changes: split into `word_fid_docids` and `word_position_docids` (with bucketed position)
- [x] Ranking rule implementations
- [ ] Implement `exactness` ranking rule
- [x] Initial implementation
- [ ] Correct implementation when followed by `Words`
- [ ] Implement `geosort` ranking rule
- [ ] Add tests
- [x] Typo tolerance `disableOnWords`/`disableOnAttributes`
- [ ] Geosort
- [x] Exactness
- [ ] Attribute/Position
- [ ] Interactions between ranking rules:
- [x] Typo/Proximity/Attribute not preceded by Words
- [x] Exactness not preceded by Words
- [x] Exactness -> Words (+ check universe correctness)
- [x] Exactness -> Typo, etc.
- [ ] Sort -> Words (performance tests)
- [ ] Attribute/Position -> Typo
- [ ] Attribute/Position -> Proximity
- [x] Typo -> Exactness
- [x] Typo -> Proximity
- [x] Proximity -> Typo
- [x] Words
- [x] Typo
- [x] Proximity
- [x] Sort
- [x] Ngrams
- [x] Split words
- [x] Ngram + Split Words
- [x] Term matching strategy
- [x] Distinct attribute
- [x] Phrase Search
- [x] Placeholder search
- [x] Highlighter
- [x] Limit the number of word derivations in a search query
- [x] Compute the initial universe correctly according to the terms matching strategy
- [x] Implement placeholder search
- [x] Get the list of ranking rules from the settings
- [x] Implement `distinct`
- [x] Determine what to do when one of `attribute`, `proximity`, `typo`, or `exactness` is placed before `words`
- [x] Make sure the correct number of allowed typos is used for each word, including the prefix one
- [x] Make sure stop words are treated correctly (e.g. correct position in query graph), including in phrases
- [x] Support phrases correctly
- [x] Support synonyms
- [x] Support split words
- [x] Support combination of ngram + split-words (e.g. `whiteh orse` -> `"white horse"`)
- [x] Implement `typo` ranking rule
- [x] Implement `sort` ranking rule
- [x] Use existing `Search` interface to use the new search algorithms
- [x] Remove old code
Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>
Conflicts | resolution
----------|-----------
Cargo.lock | added mimalloc
Cargo.toml | took origin/main version
milli/src/search/criteria/exactness.rs | deleted after checking it was only clippy changes
milli/src/search/query_tree.rs | deleted after checking it was only clippy changes
3702: Update charabia v0.7.2 r=curquiza a=ManyTheFish
fixes#3701fixes#3689fixes#3285
3710: Updated messages pointing to the docs website r=curquiza a=roy9495
# Pull Request
Fixes partially #3668
## What does this PR do?
- ...Any messages referencing this docs site https://docs.meilisearch.com has been changed to this docs site https://meilisearch.com/docs .
Thanks.
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: ManyTheFish <many@meilisearch.com>
Co-authored-by: TATHAGATA ROY <98920199+roy9495@users.noreply.github.com>
3718: Fix broken README links r=curquiza a=Kerollmops
This PR fixes#3708 by changing the link to the new SDKs and API Reference pages. I would like to thank `@Tommy-42,` who also found the issue.
Co-authored-by: Clément Renault <clement@meilisearch.com>
3709: Add SDKs test in a CI r=Kerollmops a=curquiza
Add a CI running every week to run the `nightly` docker image of Meilisearch with the most "strategic" SDKs (most used, well tested, strongly typed SDK)
- meilisearch-js
- instant-meilisearch
- meilisearch-php
- meilisearch-python
- meilisearch-go
- meilisearch-ruby
- meilisearch-rust
Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
3571: Introduce two filters to select documents with `null` and empty fields r=irevoire a=Kerollmops
# Pull Request
## Related issue
This PR implements the `X IS NULL`, `X IS NOT NULL`, `X IS EMPTY`, `X IS NOT EMPTY` filters that [this comment](https://github.com/meilisearch/product/discussions/539#discussioncomment-5115884) is describing in a very detailed manner.
## What does this PR do?
### `IS NULL` and `IS NOT NULL`
This PR will be exposed as a prototype for now. Below is the copy/pasted version of a spec that defines this filter.
- `IS NULL` matches fields that `EXISTS` AND `= IS NULL`
- `IS NOT NULL` matches fields that `NOT EXISTS` OR `!= IS NULL`
1. `{"name": "A", "price": null}`
2. `{"name": "A", "price": 10}`
3. `{"name": "A"}`
`price IS NULL` would match 1
`price IS NOT NULL` or `NOT price IS NULL` would match 2,3
`price EXISTS` would match 1, 2
`price NOT EXISTS` or `NOT price EXISTS` would match 3
common query : `(price EXISTS) AND (price IS NOT NULL)` would match 2
### `IS EMPTY` and `IS NOT EMPTY`
- `IS EMPTY` matches Array `[]`, Object `{}`, or String `""` fields that `EXISTS` and are empty
- `IS NOT EMPTY` matches fields that `NOT EXISTS` OR are not empty.
1. `{"name": "A", "tags": null}`
2. `{"name": "A", "tags": [null]}`
3. `{"name": "A", "tags": []}`
4. `{"name": "A", "tags": ["hello","world"]}`
5. `{"name": "A", "tags": [""]}`
6. `{"name": "A"}`
7. `{"name": "A", "tags": {}}`
8. `{"name": "A", "tags": {"t1":"v1"}}`
9. `{"name": "A", "tags": {"t1":""}}`
10. `{"name": "A", "tags": ""}`
`tags IS EMPTY` would match 3,7,10
`tags IS NOT EMPTY` or `NOT tags IS EMPTY` would match 1,2,4,5,6,8,9
`tags IS NULL` would match 1
`tags IS NOT NULL` or `NOT tags IS NULL` would match 2,3,4,5,6,7,8,9,10
`tags EXISTS` would match 1,2,3,4,5,7,8,9,10
`tags NOT EXISTS` or `NOT tags EXISTS` would match 6
common query : `(tags EXISTS) AND (tags IS NOT NULL) AND (tags IS NOT EMPTY)` would match 2,4,5,8,9
## What should the reviewer do?
- Check that I tested the filters
- Check that I deleted the ids of the documents when deleting documents
Co-authored-by: Clément Renault <clement@meilisearch.com>
Co-authored-by: Kerollmops <clement@meilisearch.com>
3464: Remove CLI changes for clippy r=curquiza a=dureuill
# Pull Request
## Related issue
Reverts #3434, which was linked to https://github.com/rust-lang/rust-clippy/issues/10087, as putting the lint in the pedantic group [is being uplifted to Rust 1.67.1](https://github.com/rust-lang/rust/pull/107743#issue-1573438821) (my thanks to everyone involved in this work 🎉).
## Motivation
- Using "standard issue" clippy in the CI spares our contributors and us from knowing/remembering to add the lint when running clippy locally
- In particular, spares us from configuring tools like rust-analyzer to take the lint into account.
- Should this lint come back in another form in the future, we won't blindly ignore it, and we will be able to reassess it, which will be good wrt writing idiomatic Rust. By the time this occurs, lints might be configurable through `clippy.toml` too, which would make disabling one globally much more convenient if needs be.
## Note
We should wait for the release of Rust 1.67.1 and its propagation to our CI before merging this. The PR won't pass CI before this.
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
3696: Remove the unused snapshot files r=dureuill a=irevoire
While « reverting by hand » the PR about the auto batching of the addition&deletion of documents, we forgot to remove the associated snapshot files.
Here is the command I used to generate this PR: `cargo insta test --delete-unreferenced-snapshots`
Co-authored-by: Tamo <tamo@meilisearch.com>
3566: Improve docker cache r=curquiza a=inductor
# Pull Request
## Related issue
Fixes #<issue_number>
## What does this PR do?
- Use `--mount=type=cache` and GHA build cache for faster build
- `=> => transferring context: 75.37MB` to `=> => transferring context: 19.21MB` with `.dockerignore`
## PR checklist
Please check if your PR fulfills the following requirements:
- [ ] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: inductor <kela@inductor.me>
Co-authored-by: Clémentine Urquizar - curqui <clementine@meilisearch.com>
3694: Remove Uffizzi because not used by the team r=Kerollmops a=curquiza
After discussion with the team, we don't really use Uffizzi and even had issues with it recently: the preview build failing randomly leading to unwanted GitHub notifications + issue to reach the container
<img width="628" alt="Capture d’écran 2023-04-25 à 11 55 39" src="https://user-images.githubusercontent.com/20380692/234298586-ef9cff85-ded8-4ec5-ba13-bb7a24d476b3.png">
Thanks for the involvement of Uffizzi team anyway, the tool is just not adapted to our team 😊
Co-authored-by: curquiza <clementine@meilisearch.com>
3661: Bump the dependencies r=Kerollmops a=Kerollmops
This PR bumps all the dependencies of Meilisearch and the sub-crates. I first did a `cargo upgrade --compatible`, fixed the tests, continued with a `cargo upgrade --incompatible`, and finally fixed the compilation issues.
I wasn't able to bump _rustls_ to _0.21.0_ (_actix-web_ is using _tokio-tls 0.23.4_, which uses the _0.20.8_) and neither _vergen_, which changed everything without any guide, I didn't find a way to declare that with the version 8.1.1.
bc25f378e8/meilisearch/build.rs (L4-L9)Fixes#3285
Co-authored-by: Kerollmops <clement@meilisearch.com>
3688: Following release v1.1.1: bring back changes into `main` r=curquiza a=curquiza
`@meilisearch/engine-team` ensure the changes we bring to `main` are the ones you want
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
Co-authored-by: bors[bot] <26634292+bors[bot]@users.noreply.github.com>
Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: dureuill <dureuill@users.noreply.github.com>
3674: Bump h2 from 0.3.15 to 0.3.17 r=Kerollmops a=dependabot[bot]
Bumps [h2](https://github.com/hyperium/h2) from 0.3.15 to 0.3.17.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a href="https://github.com/hyperium/h2/releases">h2's releases</a>.</em></p>
<blockquote>
<h2>v0.3.17</h2>
<h2>What's Changed</h2>
<ul>
<li>Add <code>Error::is_library()</code> method to check if the originated inside <code>h2</code>.</li>
<li>Add <code>max_pending_accept_reset_streams(usize)</code> option to client and server
builders.</li>
<li>Fix theoretical memory growth when receiving too many HEADERS and then
RST_STREAM frames faster than an application can accept them off the queue.
(CVE-2023-26964)</li>
</ul>
<h2>v0.3.16</h2>
<h2>What's Changed</h2>
<ul>
<li>Set <code>Protocol</code> extension on requests when received Extended CONNECT requests.</li>
<li>Remove <code>B: Unpin + 'static</code> bound requiremented of bufs</li>
<li>Fix releasing of frames when stream is finished, reducing memory usage.</li>
<li>Fix panic when trying to send data and connection window is available, but stream window is not.</li>
<li>Fix spurious wakeups when stream capacity is not available.</li>
</ul>
<h2>New Contributors</h2>
<ul>
<li><a href="https://github.com/vi"><code>`@vi</code></a>` made their first contribution in <a href="https://redirect.github.com/hyperium/h2/pull/646">hyperium/h2#646</a></li>
<li><a href="https://github.com/silence-coding"><code>`@silence-coding</code></a>` made their first contribution in <a href="https://redirect.github.com/hyperium/h2/pull/651">hyperium/h2#651</a></li>
<li><a href="https://github.com/gtsiam"><code>`@gtsiam</code></a>` made their first contribution in <a href="https://redirect.github.com/hyperium/h2/pull/649">hyperium/h2#649</a></li>
<li><a href="https://github.com/howardjohn"><code>`@howardjohn</code></a>` made their first contribution in <a href="https://redirect.github.com/hyperium/h2/pull/658">hyperium/h2#658</a></li>
<li><a href="https://github.com/cloneable"><code>`@cloneable</code></a>` made their first contribution in <a href="https://redirect.github.com/hyperium/h2/pull/655">hyperium/h2#655</a></li>
<li><a href="https://github.com/aftersnow"><code>`@aftersnow</code></a>` made their first contribution in <a href="https://redirect.github.com/hyperium/h2/pull/657">hyperium/h2#657</a></li>
<li><a href="https://github.com/vadim-eg"><code>`@vadim-eg</code></a>` made their first contribution in <a href="https://redirect.github.com/hyperium/h2/pull/661">hyperium/h2#661</a></li>
</ul>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a href="https://github.com/hyperium/h2/blob/master/CHANGELOG.md">h2's changelog</a>.</em></p>
<blockquote>
<h1>0.3.17 (April 13, 2023)</h1>
<ul>
<li>Add <code>Error::is_library()</code> method to check if the originated inside <code>h2</code>.</li>
<li>Add <code>max_pending_accept_reset_streams(usize)</code> option to client and server
builders.</li>
<li>Fix theoretical memory growth when receiving too many HEADERS and then
RST_STREAM frames faster than an application can accept them off the queue.
(CVE-2023-26964)</li>
</ul>
<h1>0.3.16 (February 27, 2023)</h1>
<ul>
<li>Set <code>Protocol</code> extension on requests when received Extended CONNECT requests.</li>
<li>Remove <code>B: Unpin + 'static</code> bound requiremented of bufs</li>
<li>Fix releasing of frames when stream is finished, reducing memory usage.</li>
<li>Fix panic when trying to send data and connection window is available, but stream window is not.</li>
<li>Fix spurious wakeups when stream capacity is not available.</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a href="af4bcacf6d"><code>af4bcac</code></a> v0.3.17</li>
<li><a href="d3f37e9fba"><code>d3f37e9</code></a> feat: add <code>max_pending_accept_reset_streams(n)</code> options</li>
<li><a href="5bc8e72e5f"><code>5bc8e72</code></a> fix: limit the amount of pending-accept reset streams</li>
<li><a href="8088ca658d"><code>8088ca6</code></a> feat: add Error::is_library method</li>
<li><a href="481c31d528"><code>481c31d</code></a> chore: Use Cargo metadata for the MSRV build job</li>
<li><a href="d3d50ef812"><code>d3d50ef</code></a> chore: Replace unmaintained/outdated GitHub Actions</li>
<li><a href="45b9bccdfc"><code>45b9bcc</code></a> chore: set rust-version in Cargo.toml (<a href="https://redirect.github.com/hyperium/h2/issues/664">#664</a>)</li>
<li><a href="b9dcd39915"><code>b9dcd39</code></a> v0.3.16</li>
<li><a href="96caf4fca3"><code>96caf4f</code></a> Add a message for EOF-related broken pipe errors (<a href="https://redirect.github.com/hyperium/h2/issues/615">#615</a>)</li>
<li><a href="732319039f"><code>7323190</code></a> Avoid spurious wakeups when stream capacity is not available (<a href="https://redirect.github.com/hyperium/h2/issues/661">#661</a>)</li>
<li>Additional commits viewable in <a href="https://github.com/hyperium/h2/compare/v0.3.15...v0.3.17">compare view</a></li>
</ul>
</details>
<br />
[](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting ``@dependabot` rebase`.
[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)
---
<details>
<summary>Dependabot commands and options</summary>
<br />
You can trigger Dependabot actions by commenting on this PR:
- ``@dependabot` rebase` will rebase this PR
- ``@dependabot` recreate` will recreate this PR, overwriting any edits that have been made to it
- ``@dependabot` merge` will merge this PR after your CI passes on it
- ``@dependabot` squash and merge` will squash and merge this PR after your CI passes on it
- ``@dependabot` cancel merge` will cancel a previously requested merge and block automerging
- ``@dependabot` reopen` will reopen this PR if it is closed
- ``@dependabot` close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- ``@dependabot` ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- ``@dependabot` ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- ``@dependabot` ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
You can disable automated security fix PRs for this repo from the [Security Alerts page](https://github.com/meilisearch/meilisearch/network/alerts).
</details>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
3673: Handle the task queue being full r=irevoire a=dureuill
# Pull Request
## Related issue
Fixes a remaining issue with #3659 where it was not always possible to send tasks back even after deleting some tasks when prompted.
## Tests
- see integration test
- also manually tested with a 1MiB task queue. Was not possible to become unblocked before this PR, is now possible.
## What does this PR do?
- Use the `non_free_pages_size` method to compute the space occupied by the task db instead of the `real_disk_size` which is not always affected by task deletion.
- Expand the test so that it adds a task after the deletion. The test now fails before this PR and succeeds after this PR.
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
3672: Update version for the next release (v1.1.1) in Cargo.toml r=dureuill a=meili-bot
⚠️ This PR is automatically generated. Check the new version is the expected one and Cargo.lock has been updated before merging.
Co-authored-by: dureuill <dureuill@users.noreply.github.com>
3659: stops receiving tasks once the task queue is full r=Kerollmops a=irevoire
Give 20GiB to the task queue + once 50% of the task queue is used, it blocks itself and only receives task deletion requests to ensure we never get in a state where we can’t do anything.
Also, create a new error message when we reach this case:
```
Meilisearch cannot receive write operations because the size limit of the tasks database has been reached. Please delete tasks to continue performing write operations.
```
Co-authored-by: Tamo <tamo@meilisearch.com>
3666: Update README to reference new docs website r=curquiza a=guimachiavelli
With the launch of the new website, we need to update the README so it references the correct URLs.
Two minor details:
- we have removed the contact page from the documentation (it had the same links present in this readme and on the community section of the landing page)
- we have recently separated filtering and faceted search into two separate articles
Co-authored-by: gui machiavelli <hey@guimachiavelli.com>
3667: Disable autobatching of additions and deletions r=irevoire a=dureuill
# Pull Request
## Related issue
Fixes#3664
## What does this PR do?
- Modifies the autobatcher to not batch document additions and deletions, as a workaround to the DB corruption in #3664
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
With the launch of the new website, we need to update the README so it references the correct URLs.
Two minor details:
- we have removed the contact page from the documentation (it had the same links present in this readme and on the community section of the landing page)
- we have recently separated filtering and faceted search into two separate articles
3647: Improve the health route by ensuring lmdb is not down r=irevoire a=irevoire
Fixes#3644
In this PR, I try to make a small read on the `AuthController` and `IndexScheduler` databases.
The idea is not to validate that everything works but just to avoid the bug we had last time when lmdb was stuck forever.
In order to get access to the `AuthController` without going through the extractor, I need to wrap it in the `Data` type from `actix-web`.
And to do that, I had to patch our extractor so it works with the `Data` type as well.
Co-authored-by: Tamo <tamo@meilisearch.com>
3638: filter errors - `geo(x,y,z)` and `geoDistance(x,y,z)` r=irevoire a=cymruu
# Pull Request
## Related issue
Fixes#3006
## What does this PR do?
- fixes the display function of `ParseError::ReservedGeo`. The previous display string was missing back ticks around available filters.
- makes the filter-parser parse `_geo(x,y,z)` and `geoDistance(x,y,z)` filters. Both parsing functions will throw an error if the filter was used.
- removes `FilterError::ReservedGeo` and `FilterError::Reserved` error variants since they are now thrown by the filter-parser.
I ran `cargo test --package milli -- --test-threads 1` and the tests passed.
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [ ] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: Filip Bachul <filipbachul@gmail.com>
Co-authored-by: filip <filipbachul@gmail.com>
3631: sort errors - `_geo(x,y)` and `_geoDistance(x,y)` return `ReservedKeyword` r=irevoire a=cymruu
# Pull Request
## Related issue
Fix part of #3006 (sort errors)
## What does this PR do?
- made meilisearch return `ReservedKeyword` when sorting by `_geo(x,y)` or `_geoDistance(x,y)`
Screenshot:

I ran `cargo test --package milli -- --test-threads 1` and the tests passed.
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [ ] Have you made sure that the title is accurate and descriptive of the changes?
Co-authored-by: Filip Bachul <filipbachul@gmail.com>
3636: Add a newline after the meilisearch version in the issue template r=curquiza a=bidoubiwa
# Pull Request
## What does this PR do?
Just a small change that makes it easier to remove the version example and adds consistency with the other examples in its positioning.
Co-authored-by: cvermand <33010418+bidoubiwa@users.noreply.github.com>
3633: Bump Swatinem/rust-cache from 2.2.0 to 2.2.1 r=curquiza a=dependabot[bot]
Bumps [Swatinem/rust-cache](https://github.com/Swatinem/rust-cache) from 2.2.0 to 2.2.1.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a href="https://github.com/Swatinem/rust-cache/releases">Swatinem/rust-cache's releases</a>.</em></p>
<blockquote>
<h2>v2.2.1</h2>
<ul>
<li>Update <code>`@actions/cache</code>` dependency to fix usage of <code>zstd</code> compression.</li>
</ul>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a href="https://github.com/Swatinem/rust-cache/blob/master/CHANGELOG.md">Swatinem/rust-cache's changelog</a>.</em></p>
<blockquote>
<h2>2.2.1</h2>
<ul>
<li>Update <code>`@actions/cache</code>` dependency to fix usage of <code>zstd</code> compression.</li>
</ul>
<h2>2.2.0</h2>
<ul>
<li>Add new <code>save-if</code> option to always restore, but only conditionally save the cache.</li>
</ul>
<h2>2.1.0</h2>
<ul>
<li>Only hash <code>Cargo.{lock,toml}</code> files in the configured workspace directories.</li>
</ul>
<h2>2.0.2</h2>
<ul>
<li>Avoid calling <code>cargo metadata</code> on pre-cleanup.</li>
<li>Added <code>prefix-key</code>, <code>cache-directories</code> and <code>cache-targets</code> options.</li>
</ul>
<h2>2.0.1</h2>
<ul>
<li>Primarily just updating dependencies to fix GitHub deprecation notices.</li>
</ul>
<h2>2.0.0</h2>
<ul>
<li>The action code was refactored to allow for caching multiple workspaces and
different <code>target</code> directory layouts.</li>
<li>The <code>working-directory</code> and <code>target-dir</code> input options were replaced by a
single <code>workspaces</code> option that has the form of <code>$workspace -> $target</code>.</li>
<li>Support for considering <code>env-vars</code> as part of the cache key.</li>
<li>The <code>sharedKey</code> input option was renamed to <code>shared-key</code> for consistency.</li>
</ul>
<h2>1.4.0</h2>
<ul>
<li>Clean both <code>debug</code> and <code>release</code> target directories.</li>
</ul>
<h2>1.3.0</h2>
<ul>
<li>Use Rust toolchain file as additional cache key.</li>
<li>Allow for a configurable target-dir.</li>
</ul>
<h2>1.2.0</h2>
<ul>
<li>Cache <code>~/.cargo/bin</code>.</li>
<li>Support for custom <code>$CARGO_HOME</code>.</li>
<li>Add a <code>cache-hit</code> output.</li>
<li>Add a new <code>sharedKey</code> option that overrides the automatic job-name based key.</li>
</ul>
<h2>1.1.0</h2>
<ul>
<li>Add a new <code>working-directory</code> input.</li>
<li>Support caching git dependencies.</li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a href="6fd3edff69"><code>6fd3edf</code></a> 2.2.1</li>
<li><a href="a1c019f71a"><code>a1c019f</code></a> update dependencies and rebuild</li>
<li><a href="664ce0090f"><code>664ce00</code></a> chore: Create check-dist.yml (<a href="https://redirect.github.com/Swatinem/rust-cache/issues/96">#96</a>)</li>
<li>See full diff in <a href="https://github.com/Swatinem/rust-cache/compare/v2.2.0...v2.2.1">compare view</a></li>
</ul>
</details>
<br />
[](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
You can trigger a rebase of this PR by commenting ``@dependabot` rebase`.
[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)
---
<details>
<summary>Dependabot commands and options</summary>
<br />
You can trigger Dependabot actions by commenting on this PR:
- ``@dependabot` rebase` will rebase this PR
- ``@dependabot` recreate` will recreate this PR, overwriting any edits that have been made to it
- ``@dependabot` merge` will merge this PR after your CI passes on it
- ``@dependabot` squash and merge` will squash and merge this PR after your CI passes on it
- ``@dependabot` cancel merge` will cancel a previously requested merge and block automerging
- ``@dependabot` reopen` will reopen this PR if it is closed
- ``@dependabot` close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- ``@dependabot` ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- ``@dependabot` ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- ``@dependabot` ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
</details>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
3623: Update mini-dashboard to version v0.2.7 r=curquiza a=bidoubiwa
## Changes
* Retrieve the API Key from the url parameters (#416) `@qdequele`
## 🐛 Bug Fixes
* Fix show more button not displaying all fields (#419) `@bidoubiwa`
Thanks again to `@bidoubiwa,` and `@qdequele!` 🎉
Co-authored-by: Charlotte Vermandel <charlottevermandel@gmail.com>
3624: Reduce the time to import a dump r=irevoire a=irevoire
When importing a dump, this PR does multiple things;
- Stops committing the changes between each task import
- Stop deserializing + serializing every bitmap for every task
Pros:
Importing 1M tasks in a dump went from 3m36 on my computer to 6s
Cons: We use slightly more memory, but since we’re using roaring bitmaps, that really shouldn’t be noticeable.
Fixes#3620
Co-authored-by: Tamo <tamo@meilisearch.com>
3621: Fix facet normalization r=Kerollmops a=ManyTheFish
# Pull Request
Make sure the facet normalization is the same between indexing and search.
## Related issue
Fixes#3599
Co-authored-by: ManyTheFish <many@meilisearch.com>
3608: In a settings update, check to see if the primary key actually changes before erroring out r=irevoire a=GregoryConrad
Previously, if the primary key was set and a Settings update contained a primary key, an error would be returned.
However, this error is not needed if the new PK == the current PK. This PR just checks to see if the PK actually changes before raising an error.
I came across this slight hiccup in https://github.com/GregoryConrad/mimir/issues/156#issuecomment-1484128654
Co-authored-by: Gregory Conrad <gregorysconrad@gmail.com>
3617: update the geoBoundingBox feature r=dureuill a=irevoire
Closing #3616
Implementing this change in the spec: 38a715c072
Now instead of using the (top_left, bottom_right) corners of the bounding box, it’s using the (top_right, bottom_left) corners.
Co-authored-by: Tamo <tamo@meilisearch.com>
Previously, if the primary key was set and a Settings update contained
a primary key, an error would be returned.
However, this error is not needed if the new PK == the current PK.
This commit just checks to see if the PK actually changes
before raising an error.
3597: ensure that the task queue is correctly imported r=irevoire a=irevoire
## Related issue
Fixes#3596
I updated all the dump's integration tests to ensure that we're effectively able to query the tasks
Co-authored-by: Tamo <tamo@meilisearch.com>
3576: Add boolean support for csv documents r=irevoire a=irevoire
Fixes https://github.com/meilisearch/meilisearch/issues/3572
## What does this PR do?
Add support for the boolean types in csv documents.
The type definition is `boolean` and the possible values are
- `true` for true
- `false` for false
- ` ` for null
Here is an example:
```csv
#id,cute:boolean
0,true
1,false
2,
```
Co-authored-by: Tamo <tamo@meilisearch.com>
3587: Enable cache again in test suite CI r=curquiza a=curquiza
Following the change in this PR introduced in v1.1: https://github.com/meilisearch/meilisearch/pull/3422
The cache was removed due to failures (lack of space). Now the binary is smaller (from 250Mb to 50Mb) we want to try to enable the cache again.
Indeed, without the cache step, the CIs are wayyyy slower (45min instead of 20-30min).
For later: Rust 1.68 introduced a new way to fetch crates. Updating the rust version might also help in the future!
Co-authored-by: curquiza <clementine@meilisearch.com>
3568: CI: Fix `publish-aarch64` job that still uses ubuntu-18.04 r=Kerollmops a=curquiza
Fixes#3563
Main change
- add the usage of the `ubuntu-18.04` container instead of the native `ubuntu-18.04` of GitHub actions: I had to install docker in the container.
Small additional changes
- remove useless `fail-fast` and unused/irrelevant matrix inputs (`build`, `linker`, `os`, `use-cross`...)
- Remove useless step in job
Proof of work with this CI triggered on this current branch: https://github.com/meilisearch/meilisearch/actions/runs/4366233882
3569: Enhance Japanese language detection r=dureuill a=ManyTheFish
# Pull Request
This PR is a prototype and can be tested by downloading [the dedicated docker image](https://hub.docker.com/layers/getmeili/meilisearch/prototype-better-language-detection-0/images/sha256-a12847de00e21a71ab797879fd09777dadcb0881f65b5f810e7d1ed434d116ef?context=explore):
```bash
$ docker pull getmeili/meilisearch:prototype-better-language-detection-0
```
## Context
Some Languages are harder to detect than others, this miss-detection leads to bad tokenization making some words or even documents completely unsearchable. Japanese is the main Language affected and can be detected as Chinese which has a completely different way of tokenization.
A [first iteration has been implemented for v1.1.0](https://github.com/meilisearch/meilisearch/pull/3347) but is an insufficient enhancement to make Japanese work. This first implementation was detecting the Language during the indexing to avoid bad detections during the search.
Unfortunately, some documents (shorter ones) can be wrongly detected as Chinese running bad tokenization for these documents and making possible the detection of Chinese during the search because it has been detected during the indexing.
For instance, a Japanese document `{"id": 1, "name": "東京スカパラダイスオーケストラ"}` is detected as Japanese during indexing, during the search the query `東京` will be detected as Japanese because only Japanese documents have been detected during indexing despite the fact that v1.0.2 would detect it as Chinese.
However if in the dataset there is at least one document containing a field with only Kanjis like:
_A document with only 1 field containing only Kanjis:_
```json
{
"id":4,
"name": "東京特許許可局"
}
```
_A document with 1 field containing only Kanjis and 1 field containing several Japanese characters:_
```json
{
"id":105,
"name": "東京特許許可局",
"desc": "日経平均株価は26日 に約8カ月ぶりに2万4000円の心理的な節目を上回った。株高を支える材料のひとつは、自民党総裁選で3選を決めた安倍晋三首相の経済政策への期待だ。恩恵が見込まれるとされる人材サービスや建設株の一角が買われている。ただ思惑が先行して資金が集まっている面 は否めない。実際に政策効果を取り込む企業はどこか、なお未知数だ。"
}
```
Then, in both cases, the field `name` will be detected as Chinese during indexing allowing the search to detect Chinese in queries. Therefore, the query `東京` will be detected as Chinese and only the two last documents will be retrieved by Meilisearch.
## Technical Approach
The current PR partially fixes these issues by:
1) Adding a check over potential miss-detections and rerunning the extraction of the document forcing the tokenization over the main Languages detected in it.
> 1) run a first extraction allowing the tokenizer to detect any Language in any Script
> 2) generate a distribution of tokens by Script and Languages (`script_language`)
> 3) if for a Script we have a token distribution of one of the Language that is under the threshold, then we rerun the extraction forbidding the tokenizer to detect the marginal Languages
> 4) the tokenizer will fall back on the other available Languages to tokenize the text. For example, if the Chinese were marginally detected compared to the Japanese on the CJ script, then the second extraction will force Japanese tokenization for CJ text in the document. however, the text on another script like Latin will not be impacted by this restriction.
2) Adding a filtering threshold during the search over Languages that have been marginally detected in documents
## Limits
This PR introduces 2 arbitrary thresholds:
1) during the indexing, a Language is considered miss-detected if the number of detected tokens of this Language is under 10% of the tokens detected in the same Script (Japanese and Chinese are 2 different Languages sharing the "same" script "CJK").
2) during the search, a Language is considered marginal if less than 5% of documents are detected as this Language.
This PR only partially fixes these issues:
- ✅ the query `東京` now find Japanese documents if less than 5% of documents are detected as Chinese.
- ✅ the document with the id `105` containing the Japanese field `desc` but the miss-detected field `name` is now completely detected and tokenized as Japanese and is found with the query `東京`.
- ❌ the document with the id `4` no longer breaks the search Language detection but continues to be detected as a Chinese document and can't be found during the search.
## Related issue
Fixes#3565
## Possible future enhancements
- Change or contribute to the Library used to detect the Language
- the related issue on Whatlang: https://github.com/greyblake/whatlang-rs/issues/122
Co-authored-by: curquiza <clementine@meilisearch.com>
Co-authored-by: ManyTheFish <many@meilisearch.com>
Co-authored-by: Many the fish <many@meilisearch.com>
3577: Avoid fetching an LMDB value with an empty string r=ManyTheFish a=Kerollmops
# Pull Request
## Related issue
Fixes#3574
## What does this PR do?
This PR fixes a bug where an empty key fetches an entry in the database. LMDB throws an error if an empty or too-long key is used to fetch an entry. This empty string seems to have been generated by the Charabia tokenizer.
Co-authored-by: Clément Renault <clement@meilisearch.com>
3567: Clean CI file names r=curquiza a=curquiza
Make the CI names more consistent to ease the Gillian's onboarding 😇
No impact for the users or the developers of the team
Co-authored-by: curquiza <clementine@meilisearch.com>
3561: Fix the snapshots permissions on unix system r=irevoire a=irevoire
# Pull Request
## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/3507
The snapshot permissions were wrong after the v0.30 and the huge refacto of the index scheduler.
Fix this issue + add a test on the permissions on unix
Co-authored-by: Tamo <tamo@meilisearch.com>
3562: Update version for the next release (v1.1.0) in Cargo.toml r=curquiza a=meili-bot
⚠️ This PR is automatically generated. Check the new version is the expected one and Cargo.lock has been updated before merging.
Co-authored-by: curquiza <curquiza@users.noreply.github.com>
Add a new job to the Rust workflow to run 'cargo build' and 'cargo
test' (on the cron schedule only) with the '--all-features' flag.
This will execute across all three environments: Linux, macOS,
Windows.
Autoformat the Rust workflow file via the Red Hat YAML extension for
Visual Studio Code:
https://marketplace.visualstudio.com/items?itemName=redhat.vscode-yaml
This straightens out whitespace and string quoting for safer parsing.
Fixes#3506.
3529: Add an analytics on the geo bounding box feature r=ManyTheFish a=irevoire
Fixes#3527
[The specification of the geoBoundingBox](https://github.com/meilisearch/specifications/pull/223) feature has been updated and now introduces a new analytics to follow the usage of the geoBoundingBox feature in the search requests.
Co-authored-by: Tamo <tamo@meilisearch.com>
3525: Fix phrase search containing stop words r=ManyTheFish a=ManyTheFish
# Summary
A search with a phrase containing only stop words was returning an HTTP error 500,
this PR filters the phrase containing only stop words dropping them before the search starts, a query with a phrase containing only stop words now behaves like a placeholder search.
fixes https://github.com/meilisearch/meilisearch/issues/3521
related v1.0.2 PR on milli: https://github.com/meilisearch/milli/pull/779
Co-authored-by: ManyTheFish <many@meilisearch.com>
3544: Attempt to use default vram budget for faster startup r=Kerollmops a=dureuill
# Pull Request
## Related issue
Follow-up to #3382: addresses the added startup time on Windows/macOS.
## What does this PR do?
- Attempt to skip budget calculation by using "known good values" instead
- Perform dichotomic budget calculation as fallback only when the known value is not actually good.
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
3543: config: case `experimental_enable_metrics` in snake_case r=dureuill a=dureuill
# Pull Request
Avoids "Error: unknown field `experimental-enable-metrics` at line 1 column 1" error when using the default config file.
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
3537: fix a bug where the filestore could try to parse its own tmp file and fail (main) r=irevoire a=curquiza
Co-authored-by: Tamo <tamo@meilisearch.com>
3524: Update the metrics route r=irevoire a=irevoire
Fixes#3523
Make the metrics available by default without a feature flag.
+ Rename the cli-flag to `experimental-enable-metrics`.
Co-authored-by: Tamo <tamo@meilisearch.com>
3331: Limit the number of concurrently opened indexes r=dureuill a=dureuill
# Pull Request
## Related issue
Relevant to #1841, fixes#3382
## What does this PR do?
### User standpoint
- Limit the number of concurrently opened indexes (currently, the number of indexes that can be concurrently opened is computed at startup)
- When too many an index is opened, the least recently used one is closed and its virtual memory released.
- This allows a user to have an arbitrary number of indexes of an arbitrary size
### Implementation standpoint
- Added a LRU cache map in `index-scheduler::lru`. A more complete implementation (eg with helper functions not used here) is available but would better fit a dedicated crate.
- Use the LRU cache map in the `IndexScheduler`. To simplify the lifecycle of indexes, they are never removed from the cache when they are in the middle of a resize or delete operation. To achieve this, an intermediate `Vec` stores the UUIDs of the indexes that are in the middle of such an operation.
- Upon creating the index scheduler object, compute the total virtual memory that is adressable by using a dichotomic search on the max size of an index. Use this as a base to compute the number of indexes that can be open with 2TiB per index. If the virtual memory address space is lower than 2TiB, then only allow for 1 index of a fraction of that size.
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
3530: Fix highlighter bug r=Kerollmops a=ManyTheFish
# Pull Request
There was a highlighting issue on CJK's character, we were highlighting too many characters and these additional characters were duplicated after the highlight tag.
## Related issue
Fixes#3517Fixes#3526
## What does this PR do?
- add a test showcasing the bug
- fix the bug by activating the char_map creation of the tokenizer during the highlighting process
Co-authored-by: ManyTheFish <many@meilisearch.com>
3534: Update the csv error code from InvalidIndexCsvDelimiter to InvalidDocumentCsvDelimiter r=Kerollmops a=irevoire
Fixes#3533
Co-authored-by: Tamo <tamo@meilisearch.com>
3417: Allow multiple searches in a single request r=irevoire a=dureuill
# Pull Request
## Related issue
Fixes#3427
## What does this PR do?
### User standpoint
- Adds a new `/multi-search` entry point (not to be confused with the existing `/{index_uid}/search` entry points) that accepts a POST whose body is an object containing an array of queries.
- Each query must specify on which index it acts by providing its `indexUid`. Other parameters are identical to the one in the existing search routes (`q`, `limit`, etc.).
- The response is a JSON object containing an array of the results for each search query as if it had been performed using the `/{index_uid}/search` routes.
### Implementation standpoint
- Refactor authentication module:
- Allow tenant token to be checked even without an index in URL
- Add `meilisearch-auth` as a dependency to `index-scheduler` so as to have a working method of checking if the indexes are authorized there that takes into account both the API key and the tenant token (existing method relied on a behavior that was returning the allowed indexes from the API key as long as there weren't any tenant token)
- Make `AuthFilter` an object with invariants and so its fields are now private
- Use the methods of `AuthFilter` to know if an index is authorized rather than relying on its internal search rules.
- Make tenant token search rules optional and `None` when the `AuthFilter` was not built with a tenant token.
- Add a new `routes::index::search::multiple_search` module containing a post handler that performs the same work as the existing `routes::index::search` post handler, but in a loop.
- Add various tests
- Add authentication test suite
### Sample request
<details>
<summary>
Click to see request/response
</summary>
```json
~/datasets
❯ curl \
-X POST 'http://localhost:7700/multi-search' \
-H 'Content-Type: application/json' \
--data-binary '{"queries": [{ "indexUid": "index-0", "q": "toto", "limit": 1 }, {"indexUid": "index-1", "q": "titi", "limit": 1}]}' | jsonxf
{ "results": [
{
"indexUid": "index-0",
"hits": [
{
"id": 20480,
"title": "Toto - 25th Anniversary - Live in Amsterdam",
"overview": "Filmed in High Definition in Amsterdam on Toto's 25th Anniversary Tour in 2003, this stunning concert captures the band at their very best, reunited with original vocalist Bobby Kimball. The set combines all their hits with tracks from their latest album \"Through the Looking Glass\" and other live favorites, performed in front of a wildly enthusastic sell-out crowd. Extras include 35 minute behind-the-scenes film following the band through various stages of their world tour including footage from Japan, Thailand, South Korea, and France. Toto celebrate their 25th anniversary with this blistering live concert, filmed in Amsterdam on February 25th, 2003. Proving they've still got exactly what it takes to move a crowd, the band perform a mixture of medley's, solo spots, and huge hits. Tracks include \"Rosanna,\" \"Africa,\" \"Hold The Line,\" a cover of the Beatles' \"While My Guitar Gently Weeps,\" and many more.",
"genres": [
"Music"
],
"poster": "https://image.tmdb.org/t/p/w500/7SCbUPwoB8Z7VUIA1Rn1WWwjNiT.jpg",
"release_date": 1064275200
}
],
"query": "toto",
"processingTimeMs": 1,
"limit": 1,
"offset": 0,
"estimatedTotalHits": 17
},
{
"indexUid": "index-1",
"hits": [
{
"id": 41212,
"title": "Titicut Follies",
"overview": "The film is a stark and graphic portrayal of the conditions that existed at the State Prison for the Criminally Insane at Bridgewater, Massachusetts. TITICUT FOLLIES documents the various ways the inmates are treated by the guards, social workers and psychiatrists.",
"genres": [
"Documentary"
],
"poster": "https://image.tmdb.org/t/p/w500/2Ju5hn1ofOPeP1eRJtQWakiHuhW.jpg",
"release_date": -70934400
}
],
"query": "titi",
"processingTimeMs": 0,
"limit": 1,
"offset": 0,
"estimatedTotalHits": 7
}
]}
```
</details>
## PR checklist
Please check if your PR fulfills the following requirements:
- [ ] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [ ] Have you read the contributing guidelines?
- [ ] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
3482: Optimize meilisearch uffizzi build r=curquiza a=waveywaves
# Pull Request
## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/3476
## What does this PR do?
even though docker cache was being used earlier for uffizzi builds, seems like the cache layers weren't persisting. This commit adds changes to move meilisearch building outside the dockerfile so that we can use the rust cache action. We are also building to the musl target so that the binary for meilisearch which is created can be used for the uffizzi ttyd image which uses alpine.
Meilisearch build time brought to 5 mins example https://github.com/waveywaves/meilisearch/actions/runs/4142776058
we also update the version of uffizzi action used here which fixes another uffizzi bug where the environments are not deployed. https://app.uffizzi.com/github.com/waveywaves/meilisearch/pull/2 was built as a part of a test for this PR and we can be sure that the deployment works well now.
## PR checklist
Please check if your PR fulfills the following requirements:
- [ ] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [ ] Have you read the contributing guidelines?
- [ ] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: Vibhav Bobade <vibhav.bobde@gmail.com>
even though docker cache was being used earlier for uffizzi builds,
seems like the cache layers weren't persisting. This commit adds changes
to move meilisearch building outside the dockerfile so that we can
use the rust cache action. We are also building to the musl target
so that the binary for meilisearch which is created can be used for
the uffizzi ttyd image which uses alpine.
3347: Enhance language detection r=irevoire a=ManyTheFish
## Summary
Some completely unrelated Languages can share the same characters, in Meilisearch we detect the Languages using `whatlang`, which works well on large texts but fails on small search queries leading to a bad segmentation and normalization of the query.
This PR now stores the Languages detected during the indexing in order to reduce the Languages list that can be detected during the search.
## Detail
- Create a 19th database mapping the scripts and the Languages detected with the documents where the Language is detected
- Fill the newly created database during indexing
- Create an allow-list with this database and pass it to Charabia
- Add a test ensuring that a Japanese request containing kanjis only is detected as Japanese and not Chinese
## Related issues
Fixes#2403Fixes#3513
Co-authored-by: f3r10 <frledesma@outlook.com>
Co-authored-by: ManyTheFish <many@meilisearch.com>
Co-authored-by: Many the fish <many@meilisearch.com>
3496: Fix metrics feature r=irevoire a=james-2001
# Pull Request
## Related issue
Resolves: #3469
See also: #2763
## What does this PR do?
As reported the metrics feature was broken by still using and old reference to `meilisearch_auth::actions`. This commit switches to the new location, `meilisearch_types::keys::actions`.
The original issue was not *that* clear as to exactly what was broken, and the build logs have disappeared, but it seemed to just be this one line fix. If this is not the case and I've missed the mark let me know, and i'll head back to the drawing board.
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Co-authored-by: James <james.a.may.2001@gmail.com>
3505: Csv delimiter r=irevoire a=irevoire
Fixes https://github.com/meilisearch/meilisearch/issues/3442
Closes https://github.com/meilisearch/meilisearch/pull/2803
Specified in https://github.com/meilisearch/specifications/pull/221
This PR is a reimplementation of https://github.com/meilisearch/meilisearch/pull/2803, on the new engine. Thanks for your idea and initial PR `@MixusMinimax;` sorry I couldn’t update/merge your PR. Way too many changes happened on the engine in the meantime.
**Attention to reviewer**; I had to update deserr to implement the support of deserializing `char`s
-------
It introduces four new error messages;
- Invalid value in parameter csvDelimiter: expected a string of one character, but found an empty string
- Invalid value in parameter csvDelimiter: expected a string of one character, but found the following string of 5 characters: doggo
- csv delimiter must be an ascii character. Found: 🍰
- The Content-Type application/json does not support the use of a csv delimiter. The csv delimiter can only be used with the Content-Type text/csv.
And one error code;
- `invalid_index_csv_delimiter`
The `invalid_content_type` error code is now also used when we encounter the `csvDelimiter` query parameter with a non-csv content type.
Co-authored-by: Tamo <tamo@meilisearch.com>
3319: Transparently resize indexes on MaxDatabaseSizeReached errors r=Kerollmops a=dureuill
# Pull Request
## Related issue
Related to https://github.com/meilisearch/meilisearch/discussions/3280, depends on https://github.com/meilisearch/milli/pull/760
## What does this PR do?
### User standpoint
- Meilisearch no longer fails tasks that encounter the `milli::UserError(MaxDatabaseSizeReached)` error.
- Instead, these tasks are retried after increasing the maximum size allocated to the index where the failure occurred.
### Implementation standpoint
- Add `Batch::index_uid` to get the `index_uid` of a batch of task if there is one
- `IndexMapper::create_or_open_index` now takes an additional `size` argument that allows to (re)open indexes with a size different from the base `IndexScheduler::index_size` field
- `IndexScheduler::tick` now returns a `Result<TickOutcome>` instead of a `Result<usize>`. This offers more explicit control over what the behavior should be wrt the next tick.
- Add `IndexStatus::BeingResized` that contains a handle that a thread can use to await for the resize operation to complete and the index to be available again.
- Add `IndexMapper::resize_index` to increase the size of an index.
- In `IndexScheduler::tick`, intercept task batches that failed due to `MaxDatabaseSizeReached` and resize the index that caused the error, then request a new tick that will eventually handle the still enqueued task.
## Testing the PR
The following diff can be applied to this branch to make testing the PR easier:
<details>
```diff
diff --git a/index-scheduler/src/index_mapper.rs b/index-scheduler/src/index_mapper.rs
index 553ab45a..022b2f00 100644
--- a/index-scheduler/src/index_mapper.rs
+++ b/index-scheduler/src/index_mapper.rs
`@@` -228,13 +228,15 `@@` impl IndexMapper {
drop(lock);
+ std:🧵:sleep_ms(2000);
+
let current_size = index.map_size()?;
let closing_event = index.prepare_for_closing();
- log::info!("Resizing index {} from {} to {} bytes", name, current_size, current_size * 2);
+ log::error!("Resizing index {} from {} to {} bytes", name, current_size, current_size * 2);
closing_event.wait();
- log::info!("Resized index {} from {} to {} bytes", name, current_size, current_size * 2);
+ log::error!("Resized index {} from {} to {} bytes", name, current_size, current_size * 2);
let index_path = self.base_path.join(uuid.to_string());
let index = self.create_or_open_index(&index_path, None, 2 * current_size)?;
`@@` -268,8 +270,10 `@@` impl IndexMapper {
match index {
Some(Available(index)) => break index,
Some(BeingResized(ref resize_operation)) => {
+ log::error!("waiting for resize end");
// Deadlock: no lock taken while doing this operation.
resize_operation.wait();
+ log::error!("trying our luck again!");
continue;
}
Some(BeingDeleted) => return Err(Error::IndexNotFound(name.to_string())),
diff --git a/index-scheduler/src/lib.rs b/index-scheduler/src/lib.rs
index 11b17d05..242dc095 100644
--- a/index-scheduler/src/lib.rs
+++ b/index-scheduler/src/lib.rs
`@@` -908,6 +908,7 `@@` impl IndexScheduler {
///
/// Returns the number of processed tasks.
fn tick(&self) -> Result<TickOutcome> {
+ log::error!("ticking!");
#[cfg(test)]
{
*self.run_loop_iteration.write().unwrap() += 1;
diff --git a/meilisearch/src/main.rs b/meilisearch/src/main.rs
index 050c825a..63f312f6 100644
--- a/meilisearch/src/main.rs
+++ b/meilisearch/src/main.rs
`@@` -25,7 +25,7 `@@` fn setup(opt: &Opt) -> anyhow::Result<()> {
#[actix_web::main]
async fn main() -> anyhow::Result<()> {
- let (opt, config_read_from) = Opt::try_build()?;
+ let (mut opt, config_read_from) = Opt::try_build()?;
setup(&opt)?;
`@@` -56,6 +56,8 `@@` We generated a secure master key for you (you can safely copy this token):
_ => (),
}
+ opt.max_index_size = byte_unit::Byte::from_str("1MB").unwrap();
+
let (index_scheduler, auth_controller) = setup_meilisearch(&opt)?;
#[cfg(all(not(debug_assertions), feature = "analytics"))]
```
</details>
Mainly, these debug changes do the following:
- Set the default index size to 1MiB so that index resizes are initially frequent
- Turn some logs from info to error so that they can be displayed with `--log-level ERROR` (hiding the other infos)
- Add a long sleep between the beginning and the end of the resize so that we can observe the `BeingResized` index status (otherwise it would never come up in my tests)
## Open questions
- Is the growth factor of x2 the correct solution? For a `Vec` in memory it makes sense, but here we're manipulating quantities that are potentially in the order of 500GiBs. For bigger indexes it may make more sense to add at most e.g. 100GiB on each resize operation, avoiding big steps like 500GiB -> 1TiB.
## PR checklist
Please check if your PR fulfills the following requirements:
- [ ] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [ ] Have you read the contributing guidelines?
- [ ] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
3470: Autobatch addition and deletion r=irevoire a=irevoire
This PR adds the capability to meilisearch to batch document addition and deletion together.
Fix https://github.com/meilisearch/meilisearch/issues/3440
--------------
Things to check before merging;
- [x] What happens if we delete multiple time the same documents -> add a test
- [x] If a documentDeletion gets batched with a documentAddition but the index doesn't exist yet? It should not work
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
Co-authored-by: Tamo <tamo@meilisearch.com>
3467: Identify builds git tagged with `prototype-...` in CLI and analytics r=curquiza a=dureuill
# Pull Request
## What does this PR do?
- Parses the last git tag to extract a prototype name if:
- Current build uses the prototype tag (not after the tag) precisely
- The prototype tag name respects the following conditions:
1. starts with `prototype-`
2. ends with a number
3. the hyphen-separated segment right before the number is not a number (required to reject commits after the tag).
- Display the prototype name in the launch summary in the CLI
- Send the prototype name to analytics if any
- Update prototypes instructions in CONTRIBUTING.md
|`VERGEN_GIT_SEMVER_LIGHTWEIGHT` value | Prototype |
|---|---|
| `Some("prototype-geo-bounding-box-0-139-gcde89018")` | `None` (does not end with a number) |
| `Some("prototype-geo-bounding-box-0-139-89018")` | `None` (before the last segment is a number) |
| `Some("prototype-geo-bounding-box-0")` | `Some("prototype-geo-bounding-box-0")` |
| `Some("prototype-geo-bounding-box")` | `None` (does not end with a number") |
| `Some("geo-bounding-box-0")` | `None` (does not start with "prototype") |
| `None` | `None` |
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
Metrics feature was relying on old references. Refactored with inspiration from the `get_stats` method in `meilisearch/src/routes/lib.rs`. `enable_metrics_routes` added to options in `segment_analytics`.
Resolves: #3469
See also: #2763
3499: Use the workspace inheritance r=Kerollmops a=irevoire
Use the workspace inheritance [introduced in rust 1.64](https://blog.rust-lang.org/2022/09/22/Rust-1.64.0.html#cargo-improvements-workspace-inheritance-and-multi-target-builds).
It allows us to define the version of meilisearch once in the main `Cargo.toml` and let all the other `Cargo.toml` uses this version.
`@curquiza` I added you as a reviewer because I had to patch some CI scripts
And `@Kerollmops,` I had to bump the `cargo_toml` crates because our version was getting old and didn't support the feature yet.
Also, in another PR, I would like to unify some of our dependencies to ensure we always stay in sync between all our crates.
Co-authored-by: Tamo <tamo@meilisearch.com>
3490: Fix attributes set candidates r=curquiza a=ManyTheFish
# Pull Request
Fix attributes set candidates for v1.1.0
## details
The attribute criterion was not returning the remaining candidates when its internal algorithm was been exhausted.
We had a loss of candidates by the attribute criterion leading to the bug reported in the issue linked below.
After some investigation, it seems that it was the only criterion that had this behavior.
We are now returning the remaining candidates instead of an empty bitmap.
## Related issue
Fixes#3483
PR on milli for v1.0.1: https://github.com/meilisearch/milli/pull/777
Co-authored-by: ManyTheFish <many@meilisearch.com>
3492: Bump deserr r=Kerollmops a=irevoire
Bump deserr to the latest version;
- We now use the default actix-web extractors that deserr provides (which were copy/pasted from meilisearch)
- We also use the default `JsonError` message provided by deserr instead of defining our own in meilisearch
- Finally, we get the new `did you mean?` error message. Fix#3493
Co-authored-by: Tamo <tamo@meilisearch.com>
3495: Add tests with rust nightly in CI r=curquiza a=ztkmkoo
# Pull Request
## Related issue
Fixes#3402
## What does this PR do?
- add ci test with rust nightly
- make test with rust stable not run on schedule event
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: Kebron <ztkmkoo@gmail.com>
3479: Unify "Bad latitude" & "Bad longitude" errors r=irevoire a=cymruu
# Pull Request
## Related issue
Fix part of #3006
## What does this PR do?
- Moved out `BadGeoLat`, `BadGeoLng`, `BadGeoBoundingBoxTopIsBelowBottom` from `FilterError` into newly introduced error type `ParseGeoError`.
- Renamed `BadGeo` error to `ReservedGeo`
- Used new `ParseGeoError` type in `FilterError` and `AscDescError`
Screenshot:

I ran `cargo test --package milli -- --test-threads 1` and tests passed.
`--test-threads` was set to 1 because my OS complained about too many opened files.
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [ ] Have you made sure that the title is accurate and descriptive of the changes?
Co-authored-by: Filip Bachul <filipbachul@gmail.com>
Co-authored-by: filip <filipbachul@gmail.com>
As reported the metrics feature was broken by still using and old reference to `meilisearch_auth::actions`. This commit switches to the new location, `meilisearch_types::keys::actions`.
Resolves: #3469
See also: #2763
3174: Allow wildcards at the end of index names for API Keys and Tenant tokens r=irevoire a=Kerollmops
This PR introduces the wildcards at the end of the index names when identifying indexes in the API Keys and tenant tokens. It fixes#2788 and fixes#2908. This PR is based on `@akhildevelops'` work.
Note that when a tenant token filter is chosen to restrict a search, it is always the most restrictive pattern that is chosen. If we have an index pattern _prod*_ that defines _filter1_ and _p*_ that defines _filter2_, the engine will choose _filter1_ over _filter2_ as it is defined for a most restrictive pattern, _prod*_. This restrictiveness is defined by 1. is it exact, without _*_ 2. the length of the pattern.
It is a continuation of work that has already started and should close#2869.
Co-authored-by: Clément Renault <clement@meilisearch.com>
Co-authored-by: Kerollmops <clement@meilisearch.com>
3480: Gitignore vscode & jetbrains IDE folders r=curquiza a=AymanHamdoun
# Pull Request
## Related issue
There is no issue for it, and i couldn't find an appropriate category to make an issue for it.
## What does this PR do?
- Its just a gitignore edit so people who use vscode and jetbrains IDEs (like IntelliJ) dont have to deal with committing the folder the IDE generates to store local project configs by mistake. (I honestly wanted to fork the repo to add something else but this bothered me enough to make a PR for it first)
## PR checklist
Please check if your PR fulfills the following requirements:
- [ ✔️ ] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [✔️ ] Have you read the contributing guidelines?
- [✔️ ] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: Ayman <ayman.s.hamdoun@gmail.com>
3456: Bump tokio from 1.24.1 to 1.24.2 r=curquiza a=dependabot[bot]
Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.24.1 to 1.24.2.
<details>
<summary>Commits</summary>
<ul>
<li>See full diff in <a href="https://github.com/tokio-rs/tokio/commits">compare view</a></li>
</ul>
</details>
<br />
[](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting ``@dependabot` rebase`.
[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)
---
<details>
<summary>Dependabot commands and options</summary>
<br />
You can trigger Dependabot actions by commenting on this PR:
- ``@dependabot` rebase` will rebase this PR
- ``@dependabot` recreate` will recreate this PR, overwriting any edits that have been made to it
- ``@dependabot` merge` will merge this PR after your CI passes on it
- ``@dependabot` squash and merge` will squash and merge this PR after your CI passes on it
- ``@dependabot` cancel merge` will cancel a previously requested merge and block automerging
- ``@dependabot` reopen` will reopen this PR if it is closed
- ``@dependabot` close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- ``@dependabot` ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- ``@dependabot` ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- ``@dependabot` ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
You can disable automated security fix PRs for this repo from the [Security Alerts page](https://github.com/meilisearch/meilisearch/network/alerts).
</details>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
3461: Bring v1 changes into main r=curquiza a=Kerollmops
Also bring back changes in milli (the remote repository) into main done during the pre-release
Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>
Co-authored-by: bors[bot] <26634292+bors[bot]@users.noreply.github.com>
Co-authored-by: curquiza <curquiza@users.noreply.github.com>
Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: Philipp Ahlner <philipp@ahlner.com>
Co-authored-by: Kerollmops <clement@meilisearch.com>
3451: Pin Rust version in Clippy job r=dureuill a=curquiza
Avoid "surprising" CI failure because of clippy when rust is releasing a new version
Co-authored-by: curquiza <clementine@meilisearch.com>
Co-authored-by: Clémentine Urquizar - curqui <clementine@meilisearch.com>
3040: feat: create a preview environment for every PR using Uffizzi r=curquiza a=waveywaves
# Pull Request
## Related discussion (was created as an issue initially)
https://github.com/meilisearch/meilisearch/discussions/2883
## What does this PR do?
This PR adds gha workflows to create preview environments on every PR. This workflow also posts the preview url as a comment on the PR.
[This PR created against my fork of meilisearch](https://github.com/waveywaves/meilisearch/pull/2) demonstrates how this change behaves.
In [the demo preview](https://pr-2-deployment-7396-meilisearch.app.uffizzi.com/) you can run the `meilisearch` binary built from the PR and can access meilisearch running from the PR by adding `/meilisearch` to the url of the PR.
eg: I go to the demo preview at the URL https://app.uffizzi.com/github.com/waveywaves/meilisearch/pull/2, run `meilisearch` in the terminal. I can access this running instance of `meilisearch` in the preview env fromhttps://pr-2-deployment-7396-meilisearch.app.uffizzi.com/meilisearch
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [ ] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: Vibhav Bobade <vibhav.bobde@gmail.com>
776: Reduce incremental indexing time of `words_prefix_position_docids` DB r=curquiza a=loiclec
Fixes partially https://github.com/meilisearch/milli/issues/605
The `words_prefix_position_docids` can easily contain millions of entries. Thus, iterating
over it can be very expensive. But we do so needlessly for every document addition tasks.
It can sometimes cause indexing performance issues when :
- a user sends many `documentAdditionOrUpdate` tasks that cannot be all batched together (for example if they are interspersed with `documentDeletion` tasks)
- the documents contain long, diverse text fields, thus increasing the number of entries in `words_prefix_position_docids`
- the index has accumulated many soft-deleted documents, further increasing the size of `words_prefix_position_docids`
- the machine running Meilisearch does not have great IO performance (e.g. slow SSD, or quota-limited by the cloud provider)
Note, before approving the PR: the only changed file should be `milli/src/update/words_prefix_position_docids.rs`.
Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>
3441: Fix import of dump v2 r=dureuill a=irevoire
# Pull Request
This bug was introduced because of a mistake we did earlier: We said the last version to export dump v2 was the v0.21.0 while it was the v0.22.0.
To fix the bug I updated our whole v2 reader to use the code from meilisearch v0.22.0.
Also:
- Import the bugged dump in the tests
- Test the import of this dump in the v2 reader and current reader
## Related issue
Fixes#3435
Co-authored-by: Tamo <tamo@meilisearch.com>
774: Update version for the next release (v0.41.1) in Cargo.toml files r=curquiza a=meili-bot
⚠️ This PR is automatically generated. Check the new version is the expected one before merging.
Co-authored-by: curquiza <curquiza@users.noreply.github.com>
This database can easily contain millions of entries. Thus, iterating
over it can be very expensive.
For regular `documentAdditionOrUpdate` tasks, `del_prefix_fst_words`
will always be empty. Thus, we can save a significant amount of time
by adding this `if !del_prefix_fst_words.is_empty()` condition.
The code's behaviour remains completely unchanged.
3437: Make clippy happy for Rust 1.67, allow uninlined_format_args r=Kerollmops a=dureuill
# Pull Request
This PR is the equivalent of #3434 for the `release-v1.0.0` branch.
See #3434 for more information.
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
3434: Make clippy happy for Rust 1.67, allow `uninlined_format_args` r=Kerollmops a=dureuill
# Pull Request
This PR allows `uninlined_format_args` in CI for clippy.
This is due to https://github.com/rust-lang/rust-clippy/issues/10087, which in particular has correctness issues wrt edition 2018 crates, and is a big change altogether. https://github.com/rust-lang/rust-clippy/pull/10265 is already open in order to change the category of this lint to "pedantic", meaning that if this latter PR merges, a future Rust release will accept our code unmodified wrt uninlined format arguments.
As a result, this PR introduces the following changes:
1. Allow `uninlined_format_args` in the clippy command in CI.
2. Use rewind rather than seek(0)
3. Remove lifetimes that clippy deems needless.
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
3420: Add image hyperlink in README.md r=curquiza a=gregsadetsky
# Pull Request
## What does this PR do?
- tiny README.md improvement: under "SDKs & integration tools", add a hyperlink to the image with all of the language logos so that clicking the image leads to the integrations page. Otherwise, right now, clicking this image leads to the image file in the repo, which is not really useful.
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [X] Have you read the contributing guidelines?
- [X] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
3422: Remove cache from test CI r=dureuill a=curquiza
Comment the lines in CIs where we use the test CIs
We indeed have cache issues (lack of space on the machine) when running our test CIs
https://github.com/meilisearch/meilisearch/pull/3403
Co-authored-by: Greg Sadetsky <lepetitg@gmail.com>
Co-authored-by: Clémentine Urquizar - curqui <clementine@meilisearch.com>
Co-authored-by: curquiza <clementine@meilisearch.com>
3415: Test all the errors of wrong `_geo` field and bump milli r=dureuill a=irevoire
## Attention to reviewer
The first commit is only a refactoring of the test suite to use snapshot tests everywhere instead of `assert_eq`.
It doesn’t change the content of anything and there is probably nothing to review. I just made it for maintenance purpose in the future.
Fix https://github.com/meilisearch/meilisearch/issues/3414
Co-authored-by: Tamo <tamo@meilisearch.com>
3409: Bump libgit2-sys from 0.14.1+1.5.0 to 0.14.2+1.5.1 r=Kerollmops a=dependabot[bot]
Bumps [libgit2-sys](https://github.com/rust-lang/git2-rs) from 0.14.1+1.5.0 to 0.14.2+1.5.1.
<details>
<summary>Commits</summary>
<ul>
<li><a href="a233483a39"><code>a233483</code></a> Update to libgit2 1.5.1</li>
<li><a href="bce15556ef"><code>bce1555</code></a> Merge pull request <a href="https://github-redirect.dependabot.com/rust-lang/git2-rs/issues/909">#909</a> from ehuss/ssh-keys</li>
<li><a href="222fbf3b9e"><code>222fbf3</code></a> Bump versions</li>
<li><a href="fa41943135"><code>fa41943</code></a> Change the certificate_check callback to support passthrough.</li>
<li><a href="84e21aad4e"><code>84e21aa</code></a> Add ability to get the SSH host key and its type.</li>
<li><a href="e6aa6666b9"><code>e6aa666</code></a> Bump git2-curl version. (<a href="https://github-redirect.dependabot.com/rust-lang/git2-rs/issues/861">#861</a>)</li>
<li><a href="46674cebd9"><code>46674ce</code></a> Fix warning about unused_must_use for Box::from_raw (<a href="https://github-redirect.dependabot.com/rust-lang/git2-rs/issues/860">#860</a>)</li>
<li><a href="951dce9dea"><code>951dce9</code></a> Merge pull request <a href="https://github-redirect.dependabot.com/rust-lang/git2-rs/issues/858">#858</a> from davidkna/git2150</li>
<li><a href="8871f8e9b3"><code>8871f8e</code></a> bump libgit2 to 1.5.0</li>
<li><a href="04278a24ba"><code>04278a2</code></a> Merge pull request <a href="https://github-redirect.dependabot.com/rust-lang/git2-rs/issues/839">#839</a> from davidkna/libgit2_143</li>
<li>Additional commits viewable in <a href="https://github.com/rust-lang/git2-rs/compare/0.14.1...libgit2-sys-0.14.2">compare view</a></li>
</ul>
</details>
<br />
[](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting ``@dependabot` rebase`.
[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)
---
<details>
<summary>Dependabot commands and options</summary>
<br />
You can trigger Dependabot actions by commenting on this PR:
- ``@dependabot` rebase` will rebase this PR
- ``@dependabot` recreate` will recreate this PR, overwriting any edits that have been made to it
- ``@dependabot` merge` will merge this PR after your CI passes on it
- ``@dependabot` squash and merge` will squash and merge this PR after your CI passes on it
- ``@dependabot` cancel merge` will cancel a previously requested merge and block automerging
- ``@dependabot` reopen` will reopen this PR if it is closed
- ``@dependabot` close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- ``@dependabot` ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- ``@dependabot` ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- ``@dependabot` ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
You can disable automated security fix PRs for this repo from the [Security Alerts page](https://github.com/meilisearch/meilisearch/network/alerts).
</details>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
3406: Master Key: Implements errors and warnings from the specification r=irevoire a=dureuill
<sub>Now in technicolor</sub>
# Pull Request
## What does this PR do?
- Uses `atty` and `termcolor` as dependency
- Use these dependencies to print colored background for warning messages
- Update messages to match https://github.com/meilisearch/specifications/pull/209
## PR checklist
Please check if your PR fulfills the following requirements:
- [ ] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [ ] Have you read the contributing guidelines?
- [ ] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
While updating the test suite I also noticed an issue with the indexed_documents value of failed task and had to update it.
I also named a bunch of snapshots that had no name sorry 😬
3395: Indicate filterable attributes in facet distributions when user requests a non filterable one. r=irevoire a=dureuill
# Pull Request
## Related issue
Fixes#3390
## What does this PR do?
- bump milli & deserr
- Update and add tests
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
Co-authored-by: ManyTheFish <many@meilisearch.com>
Co-authored-by: Tamo <tamo@meilisearch.com>
771: Update version for the next release (v0.40.0) in Cargo.toml files r=curquiza a=meili-bot
⚠️ This PR is automatically generated. Check the new version is the expected one before merging.
Co-authored-by: curquiza <curquiza@users.noreply.github.com>
3343: Extract creation and last updated timestamp for v3 dump r=curquiza a=FrancisMurillo
# Pull Request
## Related issue
Fixes#2988
## What does this PR do?
Inspired by the v4 dump implementation, this extracts the first `createdAt` and last `updatedAt` fields by parsing the task queue.
Questions:
- Should the parsing of the tasks be cached instead of being parsed for every index since it might add a performance penalty?
- I am not sure if the `created_at` and `processed_at` fields are correct
- Should I assume the data is sorted in some order like with `uuid` or `updateId`? I assumed the list is unordered.
- I was planning to populate my dev instance with data and dump my data. Is there a way to dump with previous versions?
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: Francis Murillo <evacuee.overlap.vs3op@aleeas.com>
3398: Error links use underscores again r=irevoire a=dureuill
# Pull Request
## Related issue
Follow-up of #3288 where [it was decided](https://github.com/meilisearch/meilisearch/pull/3288#issuecomment-1396733603) to revert course on the separator to use in error anchors.
## What does this PR do?
- Use `_` again as separator in anchors of error link
- Fix tests
Impacts `@meilisearch/docs-team` : we need `_`-separated anchors to be generated in the online documentation to match the ones emitted from the engine.
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
763: Fixes error message when lat and lng are unparseable r=loiclec a=ahlner
# Pull Request
## Related issue
Fixes partially [#3007](https://github.com/meilisearch/meilisearch/issues/3007)
## What does this PR do?
- Changes function validate_geo_from_json to return a BadLatitudeAndLongitude if lat or lng is a string and not parseable to f64
- implemented some unittests
- Derived PartialEq for GeoError to use assert_eq! in tests
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: Philipp Ahlner <philipp@ahlner.com>
767: Update version for the next release (v0.39.2) in Cargo.toml files r=curquiza a=meili-bot
⚠️ This PR is automatically generated. Check the new version is the expected one before merging.
Co-authored-by: curquiza <curquiza@users.noreply.github.com>
766: Indicate filterable attributes when the user sets a non filterable attribute in facet distributions r=irevoire a=dureuill
# Pull Request
## Related issue
Related to https://github.com/meilisearch/meilisearch/issues/3390
## What does this PR do?
- Title
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
3389: Return `invalid_search_facets` rather than `bad_request` when using facet on a non filterable attribute r=irevoire a=dureuill
# Pull Request
## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/3384
## What does the PR does
- title
- also adds a test
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
3372: Enhance facet string normalization r=ManyTheFish a=ManyTheFish
# Pull Request
Use compatibility decomposition normalizer in facet string extraction in order to have a more human friendly sort order.
Now, [é (U+00E9)](https://www.compart.com/fr/unicode/U+00E9) is converted to [e (U+0065)](https://www.compart.com/fr/unicode/U+0065) + [◌́ (U+0301)](https://www.compart.com/fr/unicode/U+0301). This way any word starting with an accented/diacritized version of a character is put just after the words starting with the unaccented version of the character.
## Related issue
Fixes#3260
Co-authored-by: ManyTheFish <many@meilisearch.com>
3335: Remove test badge r=curquiza a=curquiza
Suggestion of removal: from my point of view, this badge does not provide any useful information, and most of all is often outdated. Currently ours is "no status" despite our tests passing.
Plus, sometimes our tests are not passing because we are still in development, but it does not mean our current provided binaries are not.
<img width="619" alt="Capture d’écran 2023-01-12 à 14 06 40" src="https://user-images.githubusercontent.com/20380692/212074200-f9e3ab3e-ad1d-4171-bd13-46584c3cd117.png">
3353: Bump svenstaro/upload-release-action from 2.3.0 to 2.4.0 r=curquiza a=dependabot[bot]
Bumps [svenstaro/upload-release-action](https://github.com/svenstaro/upload-release-action) from 2.3.0 to 2.4.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a href="https://github.com/svenstaro/upload-release-action/releases">svenstaro/upload-release-action's releases</a>.</em></p>
<blockquote>
<h2>2.4.0</h2>
<ul>
<li>Update to node 16</li>
<li>Bump most dependencies</li>
</ul>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a href="https://github.com/svenstaro/upload-release-action/blob/master/CHANGELOG.md">svenstaro/upload-release-action's changelog</a>.</em></p>
<blockquote>
<h2>[2.4.0] - 2023-01-09</h2>
<ul>
<li>Update to node 16</li>
<li>Bump most dependencies</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a href="2728235f7d"><code>2728235</code></a> 2.4.0</li>
<li><a href="c2e0608dc4"><code>c2e0608</code></a> Merge pull request <a href="https://github-redirect.dependabot.com/svenstaro/upload-release-action/issues/88">#88</a> from svenstaro/dependabot/npm_and_yarn/json5-1.0.2</li>
<li><a href="bd74772a1a"><code>bd74772</code></a> Don't shadow vars</li>
<li><a href="16e7903b2d"><code>16e7903</code></a> Bump json5 from 1.0.1 to 1.0.2</li>
<li><a href="f2c549b117"><code>f2c549b</code></a> Bump some more deps</li>
<li><a href="7a7d004438"><code>7a7d004</code></a> Bump some deps</li>
<li><a href="9c4a92ec0d"><code>9c4a92e</code></a> Use explicit any</li>
<li><a href="039214a996"><code>039214a</code></a> Bump jest and typescript versions</li>
<li><a href="2b373356cb"><code>2b37335</code></a> Update to node16</li>
<li><a href="fb1eb39e74"><code>fb1eb39</code></a> Merge pull request <a href="https://github-redirect.dependabot.com/svenstaro/upload-release-action/issues/75">#75</a> from svenstaro/dependabot/npm_and_yarn/jsdom-16.7.0</li>
<li>Additional commits viewable in <a href="https://github.com/svenstaro/upload-release-action/compare/2.3.0...2.4.0">compare view</a></li>
</ul>
</details>
<br />
[](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
You can trigger a rebase of this PR by commenting ``@dependabot` rebase`.
[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)
---
<details>
<summary>Dependabot commands and options</summary>
<br />
You can trigger Dependabot actions by commenting on this PR:
- ``@dependabot` rebase` will rebase this PR
- ``@dependabot` recreate` will recreate this PR, overwriting any edits that have been made to it
- ``@dependabot` merge` will merge this PR after your CI passes on it
- ``@dependabot` squash and merge` will squash and merge this PR after your CI passes on it
- ``@dependabot` cancel merge` will cancel a previously requested merge and block automerging
- ``@dependabot` reopen` will reopen this PR if it is closed
- ``@dependabot` close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- ``@dependabot` ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- ``@dependabot` ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- ``@dependabot` ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
</details>
Co-authored-by: Clémentine Urquizar - curqui <clementine@meilisearch.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
3346: Import milli 🎉 r=Kerollmops a=Kerollmops
Fixes https://github.com/meilisearch/meilisearch/issues/2901
Main work
- integrate the milli repository as an internal crate into this repo
- Update the Cargo.toml accordingly
- Ensure meilisearch-type now uses the internal milli crate and not the remote repository
- Update the milli's version to follow the meilisearch one
Also
- Removed the beta tests in test CI (will be re-integrated later if needed)
- Move and modify milli's README into the `milli` folder
- remove the script folder from `milli`
- Removed useless CI (release-drafter and enforce-label)
⚠️ Also, import all the `release-v1.0.0` until [a5c4fb](a5c4fbbcea) included (merged of the PR https://github.com/meilisearch/meilisearch/pull/3334)
Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>
Co-authored-by: bors[bot] <26634292+bors[bot]@users.noreply.github.com>
Co-authored-by: Samyak S Sarnayak <samyak201@gmail.com>
Co-authored-by: unvalley <kirohi.code@gmail.com>
Co-authored-by: Samyak Sarnayak <samyak201@gmail.com>
3339: Continued deserr integration r=irevoire a=loiclec
Fix https://github.com/meilisearch/meilisearch/issues/3337
Fix https://github.com/meilisearch/meilisearch/issues/3338
1. Add new error codes that should have been implemented earlier:
- `MissingApiKeyActions`
- `MissingApiKeyExpiresAt`
- `MissingApiKeyIndexes`
- `MissingSwapIndexes`
2. Fix a bug where it was possible to create an API key without specifying the value of `expiresAt`
3. Improve the error messages generated by deserr. Have specific error messages for JSON and QueryParam deserialisation errors.
4. Improve error tests by passing query params as arguments to `GET` routes directly instead of using an intermediary JSON object
5. [Use invalid_index_uid error code in more places](e225608337)
Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>
765: Update version for the next release (v0.39.1) in Cargo.toml files r=curquiza a=meili-bot
⚠️ This PR is automatically generated. Check the new version is the expected one before merging.
Co-authored-by: curquiza <curquiza@users.noreply.github.com>
764: Update deserr to latest version r=irevoire a=loiclec
Update deserr to 0.1.5, which changes the `DeserializeFromValue` trait, getting rid of the `default()` method.
Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>
3334: Add specific error codes `immutable_...` r=irevoire a=loiclec
Add the following error codes:
When an immutable field of API key is sent to the `PATCH /keys` route:
- `ImmutableApiKeyUid`
- `ImmutableApiKeyKey`
- `ImmutableApiKeyActions`
- `ImmutableApiKeyIndexes`
- `ImmutableApiKeyExpiresAt`
- `ImmutableApiKeyCreatedAt`
- `ImmutableApiKeyUpdatedAt`
When an immutable field of Index is sent to the `PATCH /indexes/{uid}` route:
- `ImmutableIndexUid`
- `ImmutableIndexCreatedAt`
- `ImmutableIndexUpdatedAt`
Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>
Co-authored-by: Tamo <tamo@meilisearch.com>
3330: test the error codes on the task routes + fix the missing error codes on the limit and from r=dureuill a=irevoire
Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>
762: Update version for the next release (v0.39.0) in Cargo.toml files r=curquiza a=meili-bot
⚠️ This PR is automatically generated. Check the new version is the expected one before merging.
Co-authored-by: curquiza <curquiza@users.noreply.github.com>
761: Integrate deserr r=irevoire a=loiclec
1. `Setting<T>` now implements `DeserializeFromValue`
2. The settings now store ranking rules as strongly typed `Criterion` instead of `String`, since the validation of the ranking rules will be done on meilisearch's side from now on
Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>
759: Change primary key inference error messages r=Kerollmops a=dureuill
# Pull Request
## Related issue
Milli part of https://github.com/meilisearch/meilisearch/issues/3301
## What does this PR do?
- Change error message strings
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
3317: Remove the unused error codes r=irevoire a=irevoire
Remove some unused error code + fix the usage of the search+settings sort and filter error_code
Co-authored-by: Tamo <tamo@meilisearch.com>
760: Add Index::map_size r=Kerollmops a=dureuill
# Pull Request
## Related issue
Related to discussion: https://github.com/meilisearch/meilisearch/discussions/3280
## What does this PR do?
- Expose `heed::Env::map_size` through `Index::map_size`. This allows knowing after the fact with which `map_size` an environment was opened (which is not always the `map_size` that was configured for the opening of the environment, see the documentation for `Index::map_size`), which will be necessary to guarantee we can reopen the index with a larger `map_size`.
## PR checklist
Please check if your PR fulfills the following requirements:
- [ ] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [ ] Have you read the contributing guidelines?
- [ ] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
3305: Remove hidden but usable CLI arguments r=Kerollmops a=Kerollmops
`@curquiza` found out that we were exposing some internal CLI arguments: `nb-max-chunks` and `log-every-n`. In this PR I removed those two, the only two ones that I found. Those options shouldn't be accessible as non-documented in the documentation or the `--help` message.
Fixes https://github.com/meilisearch/meilisearch/issues/3307
Co-authored-by: Clément Renault <clement@meilisearch.com>
3308: Remove `--generate-master-key` option r=Kerollmops a=dureuill
# Pull Request
## Related issue
Related to https://github.com/meilisearch/specifications/pull/210#issuecomment-1372035525
## What does this PR do?
- Remove the short-lived `--generate-master-key` flag that was too beautiful for this world :D.
Removal of this option proceeds of the following reasoning:
1. It is the only option that starts meilisearch and then immediately exits
2. We are unsure if we want to keep it under this form in the future or switch to a subcommand.
3. Releasing this option in v1 would make it insta-stable.
5. The option is only marginally useful, as users will be presented with freshly generated key directly in the error messages if their master key is absent/too short.
6. If we remove this option now, we can still add it back in a future v1 release. If we add it now, we won't be able to remove it in any future v1 version.
## PR checklist
Please check if your PR fulfills the following requirements:
- [ ] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [ ] Have you read the contributing guidelines?
- [ ] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
### Impacts
this impacts the docs team as they would previously have had to document this option, and they may have wanted to use it in the user workflow.
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
3302: Update insta snap tests for index dates of dump v5 r=curquiza a=loiclec
This PR simply updates the content of the insta snapshot test following https://github.com/meilisearch/meilisearch/pull/3013 . I manually verified that the dates in the snaps are indeed correct.
Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>
3303: Update version for the next release (v1.0.0) in Cargo.toml files r=curquiza a=meili-bot
⚠️ This PR is automatically generated. Check the new version is the expected one before merging.
Co-authored-by: curquiza <curquiza@users.noreply.github.com>
3266: Improve the way we receive the documents payload- serde multiple ndjson fix r=curquiza a=jiangbo212
# Pull Request
## Related issue
Fixes#3037
## Related PR
#3164
## What does this PR do?
Sorry, This PR is mainly to fix the problems caused by my previously provided PR #3164. It causes multiple ndjson data deserialization failures
- Fix serde multiple ndjson data failures and add test to it
- Fix serde jsonarray error and againest serde it use `from_slice`. only use `from_slice` when serde error category is `data`, it indicate json data is a single json.
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: jiangbo212 <peiyaoliukuan@126.com>
3288: Replace underscores with hyphens in documentation link to error code r=dureuill a=loiclec
# Pull Request
## Related issue
Fixes#3097
## Implementation
Add a new dependency to `convert_case` (already used transitively by `deserr`) so that the link can be generated using:
```rust
/// return the doc url associated with the error
fn url(&self) -> String {
format!(
"https://docs.meilisearch.com/errors#{}",
self.name().to_case(convert_case::Case::Kebab)
)
}
```
## Review
I'd like the reviewer to check whether it is expected that the content of some `dump` snapshot tests changed :-)
Co-authored-by: ManyTheFish <many@meilisearch.com>
Co-authored-by: bors[bot] <26634292+bors[bot]@users.noreply.github.com>
Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>
3295: Adjust Master Key-related messages r=dureuill a=dureuill
# Pull Request
## Related issue
Follow up for #3272
## What does this PR do?
- Consistently capitalize "master key" (instead of "Master Key" sometimes) (see https://github.com/meilisearch/specifications/pull/209#discussion_r1060081094)
- Clarify that the counted unit for master key length is bytes, not characters (see https://github.com/meilisearch/documentation/issues/2069#issuecomment-1368873167)
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
3013: Extract the dates out of the dumpv5. r=loiclec a=funilrys
Hi there,
please review this PR that tries to fix#2986. I'm still learning Rust and I found that #2986 is an excellent way for me to read and learn what others do with Rust. So please excuse my semantics ...
Stay safe and healthy.
---
# Pull Request
This patch possibly fixes#2986.
This patch introduces a way to fill the IndexMetadata.created_at and IndexMetadata.updated_at keys from the tasks events. This is done by reading the creation date of the first event (created_at) and the creation date of the last event (updated_at).
## Related issue
Fixes#2986
## What does this PR do?
- Extract the dates out of the dumpv5.
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: funilrys <contact@funilrys.com>
3278: Remove `--max-index-size` and `--max-task-db-size` flags r=Kerollmops a=dureuill
# Pull Request
## Related issue
Fixes#3231
## What does this PR do?
- Remove `--max-index-size` and `--max-task-db-size` flags from the CLI, config file and environment variable
- Set the size of all indexes to **500GiB** and the size of the task DB to **10GiB**. Reviewers might want to review these values carefully.
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
3269: Simplify primary key inference r=dureuill a=dureuill
# Pull Request
## Related issue
Related to https://github.com/meilisearch/meilisearch/issues/3233
## What does this PR do?
- Integrates https://github.com/meilisearch/milli/pull/752 in meilisearch
- Remove `Serialize` and `Deserialize` from `error::Code` as it is unused.
- No longer filter on `milli` logs when `--log-level` is "info".
- `milli` only has the newly-added inference log at the `info` level (from greping `info` in the codebase)
- the default value for `--log-level` is "INFO" and not "info" since `v0.30` so the filter is not active by default.
- updates milli to v0.38.0
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
3281: Merge `--schedule-snapshot` and `--snapshot-interval-sec` options r=dureuill a=dureuill
# Pull Request
## Related issue
Fixes#3131
## What does this PR do?
- Removes `--snapshot-interval-sec`
- `--schedule-snapshot` now accepts an optional integer value specifying the interval in seconds
- The config file no longer has a snapshot_interval_sec key. Instead, the schedule_snapshot key now additionally accepts an integer value specifying the interval in seconds
- The env variable MEILI_SNAPSHOT_INTERVAL no longer exists
- The env variable MEILI_SCHEDULE_SNAPSHOT is always specified to the interval of the snapshot in seconds when defined. If snapshots are disabled the variable is undefined.
---
Relevant part of the `--help`
<img width="885" alt="Capture d’écran 2022-12-27 à 18 22 32" src="https://user-images.githubusercontent.com/41078892/209700626-1a1292c1-14e3-45b6-8265-e0adbd76ecf1.png">
---
### Tests
| `schedule_snapshot` in config.toml | `--schedule-snapshot` flag on CLI | `MEILI_SCHEDULE_SNAPSHOT` | `opt.schedule_snapshot` |
|--|--|--|--|
| missing | missing | missing | `Disabled`
| `false` | missing | missing | `Disabled`
| `true` | missing | missing | `Enabled(86400)`
| `1234` | missing | missing | `Enabled(1234)`
| missing | `--schedule-snapshot` | missing | `Enabled(86400)`
| `false` | `--schedule-snapshot` | missing | `Enabled(86400)`
| missing | `--schedule-snapshot 2345` | missing | `Enabled(2345)`
| `false` | `--schedule-snapshot 2345` | missing | `Enabled(2345)`
| `true` | `--schedule-snapshot 2345` | missing | `Enabled(2345)`
| `1234` | `--schedule-snapshot 2345` | missing | `Enabled(2345)`
| `false` | `--schedule-snapshot 2345` | 3456 | `Enabled(2345)`
| `false` | `--schedule-snapshot` | 3456 | **`Enabled(86400)`**
| `1234` | missing | 3456 | `Enabled(3456)`
| `false` | missing | 3456 | `Enabled(3456)`
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
733: Avoid a prefix-related worst-case scenario in the proximity criterion r=loiclec a=loiclec
# Pull Request
## Related issue
Somewhat fixes (until merged into meilisearch) https://github.com/meilisearch/meilisearch/issues/3118
## What does this PR do?
When a query ends with a word and a prefix, such as:
```
word pr
```
Then we first determine whether `pre` *could possibly* be in the proximity prefix database before querying it. There are then three possibilities:
1. `pr` is not in any prefix cache because it is not the prefix of many words. We don't query the proximity prefix database. Instead, we list all the word derivations of `pre` through the FST and query the regular proximity databases.
2. `pr` is in the prefix cache but cannot be found in the proximity prefix databases. **In this case, we partially disable the proximity ranking rule for the pair `word pre`.** This is done as follows:
1. Only find the documents where `word` is in proximity to `pre` **exactly** (no derivations)
2. Otherwise, assume that their proximity in all the documents in which they coexist is >= 8
3. `pr` is in the prefix cache and can be found in the proximity prefix databases. In this case we simply query the proximity prefix databases.
Note that if a prefix is longer than 2 bytes, then it cannot be in the proximity prefix databases. Also, proximities larger than 4 are not present in these databases either. Therefore, the impact on relevancy is:
1. For common prefixes of one or two letters: we no longer distinguish between proximities from 4 to 8
2. For common prefixes of more than two letters: we no longer distinguish between any proximities
3. For uncommon prefixes: nothing changes
Regarding (1), it means that these two documents would be considered equally relevant according to the proximity rule for the query `heard pr` (IF `pr` is the prefix of more than 200 words in the dataset):
```json
[
{ "text": "I heard there is a faster proximity criterion" },
{ "text": "I heard there is a faster but less relevant proximity criterion" }
]
```
Regarding (2), it means that two documents would be considered equally relevant according to the proximity rule for the query "faster pro":
```json
[
{ "text": "I heard there is a faster but less relevant proximity criterion" }
{ "text": "I heard there is a faster proximity criterion" },
]
```
But the following document would be considered more relevant than the two documents above:
```json
{ "text": "I heard there is a faster swimmer who is competing in the pro section of the competition " }
```
Note, however, that this change of behaviour only occurs when using the set-based version of the proximity criterion. In cases where there are fewer than 1000 candidate documents when the proximity criterion is called, this PR does not change anything.
---
## Performance
I couldn't use the existing search benchmarks to measure the impact of the PR, but I did some manual tests with the `songs` benchmark dataset.
```
1. 10x 'a':
- 640ms ⟹ 630ms = no significant difference
2. 10x 'b':
- set-based: 4.47s ⟹ 7.42 = bad, ~2x regression
- dynamic: 1s ⟹ 870 ms = no significant difference
3. 'Someone I l':
- set-based: 250ms ⟹ 12 ms = very good, x20 speedup
- dynamic: 21ms ⟹ 11 ms = good, x2 speedup
4. 'billie e':
- set-based: 623ms ⟹ 2ms = very good, x300 speedup
- dynamic: ~4ms ⟹ 4ms = no difference
5. 'billie ei':
- set-based: 57ms ⟹ 20ms = good, ~2x speedup
- dynamic: ~4ms ⟹ ~2ms. = no significant difference
6. 'i am getting o'
- set-based: 300ms ⟹ 60ms = very good, 5x speedup
- dynamic: 30ms ⟹ 6ms = very good, 5x speedup
7. 'prologue 1 a 1:
- set-based: 3.36s ⟹ 120ms = very good, 30x speedup
- dynamic: 200ms ⟹ 30ms = very good, 6x speedup
8. 'prologue 1 a 10':
- set-based: 590ms ⟹ 18ms = very good, 30x speedup
- dynamic: 82ms ⟹ 35ms = good, ~2x speedup
```
Performance is often significantly better, but there is also one regression in the set-based implementation with the query `b b b b b b b b b b`.
Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>
732: Interpret synonyms as phrases r=loiclec a=loiclec
# Pull Request
## Related issue
Fixes (when merged into meilisearch) https://github.com/meilisearch/meilisearch/issues/3125
## What does this PR do?
We now map multi-word synonyms to phrases instead of loose words. Such that the request:
```
btw I am going to nyc soon
```
is interpreted as (when the synonym interpretation is chosen for both `btw` and `nyc`):
```
"by the way" I am going to "New York City" soon
```
instead of:
```
by the way I am going to New York City soon
```
This prevents queries containing multi-word synonyms to exceed to word length limit and degrade the search performance.
In terms of relevancy, there is a debate to have. I personally think this could be considered an improvement, since it would be strange for a user to search for:
```
good DIY project
```
and have a result such as:
```
{
"text": "whether it is a good project to do, you'll have to decide for yourself"
}
```
However, for synonyms such as `NYC -> New York City`, then we will stop matching documents where `New York` is separated from `City`. This is however solvable by adding an additional mapping: `NYC -> New York`.
## Performance
With the old behaviour, some long search requests making heavy uses of synonyms could take minutes to be executed. This is no longer the case, these search requests now take an average amount of time to be resolved.
Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>
3296: Remove `--disable-auto-batching` CLI option r=gmourier a=loiclec
Fixes#3294
The `index-scheduler` code is not modified, only the CLI options have changed.
Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>
736: Update charabia r=curquiza a=ManyTheFish
Update Charabia to the last version.
> We are now Romanizing Chinese characters into Pinyin.
> Note that we keep the accent because they are in fact never typed directly by the end-user, moreover, changing an accent leads to a different Chinese character, and I don't have sufficient knowledge to forecast the impact of removing accents in this context.
Co-authored-by: ManyTheFish <many@meilisearch.com>
758: Bump taiki-e/install-action from 1 to 2 r=curquiza a=dependabot[bot]
Bumps [taiki-e/install-action](https://github.com/taiki-e/install-action) from 1 to 2.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a href="https://github.com/taiki-e/install-action/releases">taiki-e/install-action's releases</a>.</em></p>
<blockquote>
<h2>2.0.0</h2>
<p>This release implements a mechanism to automatically track the latest version of the tool on our end. (<a href="https://github-redirect.dependabot.com/taiki-e/install-action/pull/27">#27</a>)
Hopefully, this will avoid situations such as "new version of the tool has been released, but the maintainer has not been aware of it for a number of months".
This also makes it easier to add support for new tools.</p>
<p>This release also includes the following improvements:</p>
<ul>
<li>
<p>Verify SHA256 checksums for downloaded files in all tools installed from GH releases. (<a href="https://github-redirect.dependabot.com/taiki-e/install-action/pull/27">#27</a>)</p>
</li>
<li>
<p>Support omitting the patch/minor version in all tools installed from GH releases. (<a href="https://github-redirect.dependabot.com/taiki-e/install-action/pull/27">#27</a>)</p>
<p>For example:</p>
<pre lang="yaml"><code>- uses: taiki-e/install-action@v2
with:
tool: cargo-hack@0.5
</code></pre>
<p>You can also omit the minor version if the major version of tool is 1 or greater.</p>
</li>
<li>
<p>Support <code>just</code>. (<a href="https://github-redirect.dependabot.com/taiki-e/install-action/pull/34">#34</a>)</p>
</li>
<li>
<p>Support <code>dprint</code>. (<a href="https://github-redirect.dependabot.com/taiki-e/install-action/pull/34">#34</a>)</p>
</li>
</ul>
<p>Note: This release is considered a breaking change because installing on versions not yet recognized by the action or on pre-release versions will no longer work with this release. (They were never officially supported, but they could work before.) Please submit an issue if you need these supports again.</p>
<h2>1.17.3</h2>
<ul>
<li>Update <code>wasmtime@latest</code> to 4.0.0.</li>
</ul>
<h2>1.17.2</h2>
<ul>
<li>Update <code>mdbook@latest</code> to 0.4.25.</li>
</ul>
<h2>1.17.1</h2>
<ul>
<li>Update <code>mdbook@latest</code> to 0.4.23.</li>
<li>Support <code>mdbook</code> on Linux (musl).</li>
<li>Update <code>cargo-llvm-cov@latest</code> to 0.5.3.</li>
</ul>
<h2>1.17.0</h2>
<ul>
<li>Update <code>protoc@latest</code> to 3.21.12.</li>
<li>Support aarch64 self-hosted runners (Linux, macOS, Windows).</li>
<li>Improve support for Fedora/RHEL based containers/self-hosted runners.</li>
</ul>
<h2>1.16.0</h2>
<ul>
<li>
<p>Update <code>cargo-binstall@latest</code> to 0.18.1. (<a href="https://github-redirect.dependabot.com/taiki-e/install-action/pull/32">#32</a>, thanks <a href="https://github.com/NobodyXu"><code>`@NobodyXu</code></a>)</p>`
</li>
<li>
<p>If the host environment lacks packages required for installation, such as <code>curl</code> or <code>tar</code>, install them if possible.</p>
<p>It is mainly intended to make the use of this action easy on containers or self-hosted runners, and currently supports Debian-based distributions (including Ubuntu) and Alpine.</p>
</li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a href="https://github.com/taiki-e/install-action/blob/main/CHANGELOG.md">taiki-e/install-action's changelog</a>.</em></p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a href="8ffc26aecd"><code>8ffc26a</code></a> Release 2.0.1</li>
<li><a href="82e9eb5996"><code>82e9eb5</code></a> Update DEVELOPMENT.md</li>
<li><a href="d3b7ad8380"><code>d3b7ad8</code></a> Update changelog</li>
<li><a href="f1a96ee3ed"><code>f1a96ee</code></a> Update changelog when manifest update</li>
<li><a href="46063c186c"><code>46063c1</code></a> Update cargo-minimal-versions</li>
<li><a href="048586d7a8"><code>048586d</code></a> Update cargo-hack</li>
<li><a href="d117b8d41a"><code>d117b8d</code></a> Remove outdated todo</li>
<li><a href="76828c33cd"><code>76828c3</code></a> Release 2.0.0</li>
<li><a href="eea8c318de"><code>eea8c31</code></a> Update readme and changelog</li>
<li><a href="ab0e193cf5"><code>ab0e193</code></a> Support dprint</li>
<li>Additional commits viewable in <a href="https://github.com/taiki-e/install-action/compare/v1...v2">compare view</a></li>
</ul>
</details>
<br />
[](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
You can trigger a rebase of this PR by commenting ``@dependabot` rebase`.
[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)
---
<details>
<summary>Dependabot commands and options</summary>
<br />
You can trigger Dependabot actions by commenting on this PR:
- ``@dependabot` rebase` will rebase this PR
- ``@dependabot` recreate` will recreate this PR, overwriting any edits that have been made to it
- ``@dependabot` merge` will merge this PR after your CI passes on it
- ``@dependabot` squash and merge` will squash and merge this PR after your CI passes on it
- ``@dependabot` cancel merge` will cancel a previously requested merge and block automerging
- ``@dependabot` reopen` will reopen this PR if it is closed
- ``@dependabot` close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- ``@dependabot` ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- ``@dependabot` ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- ``@dependabot` ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
</details>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
3279: Clarify error message when the db and engine versions are incompatible r=irevoire a=dureuill
# Pull Request
## Related issue
Related to https://github.com/meilisearch/meilisearch/issues/2752
## What does this PR do?
- Implements https://github.com/meilisearch/product/discussions/572#discussioncomment-4390616
## PR checklist
Please check if your PR fulfills the following requirements:
- [ ] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
3245: Enable create_raw_index(...) to specify time r=irevoire a=amab8901
# Pull Request
## Related issue
Partially fixes#2983
## What does this PR do?
- Enables [`create_raw_index`](660be071b5/index-scheduler/src/lib.rs (L868)) to specify time
## PR checklist
Please check if your PR fulfills the following requirements:
- [ X ] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [ X ] Have you read the contributing guidelines?
- [ X ] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: amab8901 <amab8901@protonmail.com>
709: Optimise the `ExactWords` sub-criterion within `Exactness` r=loiclec a=loiclec
# Pull Request
## Related issue
Fixes (partially) https://github.com/meilisearch/meilisearch/issues/3116
## What does this PR do?
1. Reduces the algorithmic complexity of finding the documents containing N exact words from something that is exponential to something that is polynomial.
2. Cache intermediary results between different calls to the `exactness` criterion.
## Performance Results
On the `smol_songs.csv` dataset, a request containing 10 common words now takes about 60ms instead of 5 seconds to execute. For example, this is the case with this (admittedly nonsensical) request: `Rock You Hip Hop Folk World Country Electronic Love The`.
Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>
752: Simplify primary key inference r=irevoire a=dureuill
# Pull Request
## Related issue
Related to https://github.com/meilisearch/meilisearch/issues/3233
## What does this PR do?
### User PoV
- Change primary key inference to only consider a value as a candidate when it ends with "id", rather than when it simply contains "id".
- Change primary key inference to always fail when there are multiple candidates.
- Replace UserError::MissingPrimaryKey with `UserError::NoPrimaryKeyCandidateFound` and `UserError::MultiplePrimaryKeyCandidatesFound`
### Implementation-wise
- Remove uses of UserError::MissingPrimaryKey not pertaining to inference. This introduces a possible panicking path.
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
3276: README: Replace Slack link with Discord r=dureuill a=shivaylamba
# Pull Request
## Related issue
Fixes#3275
## What does this PR do?
Update Slack link with Discord link in the README
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: Shivay Lamba <shivaylamba@gmail.com>
Indeed, before this patch we were using the reference instead of
"reopening" the task list each time we needed to access it.
Without this patch, all other usage of the task attribute will
break.
This patch possibly fixes#2986.
This patch introduces a way to fill the IndexMetadata.created_at
and IndexMetadata.updated_at keys from the tasks events.
This is done by reading the creation date of the first event
(created_at) and the creation date of the last event (updated_at).
Displays log message in the form:
```
[2022-12-21T09:19:42Z INFO milli::update::index_documents::enrich] Primary key was not specified in index. Inferred to 'id'
```
742: Add a "Criterion implementation strategy" parameter to Search r=irevoire a=loiclec
Add a parameter to search requests which determines the implementation strategy of the criteria. This can be either `set-based`, `iterative`, or `dynamic` (ie choosing between set-based or iterative at search time). See https://github.com/meilisearch/milli/issues/755 for more context about this change.
Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>
750: Fix hard-deletion of an external id that was soft-deleted and then reimported - main r=irevoire a=loiclec
# Pull Request
## Related issue
Fixes (when merged into meilisearch) https://github.com/meilisearch/meilisearch/issues/3021
## What does this PR do?
There was a bug happening when:
1. Documents were added
2. Some of these documents were replaced using soft-deletion
3. A deletion of another non-replaced document takes place and triggers a hard-deletion
4. Documents with the same identifiers as the replaced documents are added again
Then, search results would return duplicate documents. No crash would happen at any time (this is the reason it wasn't caught by the previous fuzz test. I have updated the new one such that it also checks the result of a placeholder search request, which then finds the bug immediately).
The cause of the bug is:
1. When a hard-deletion is triggered, we try to retrieve the external document id associated with each soft-deleted document id.
2. Then, we take this list of external document ids and remove each of them from the `ExternalDocumentsIds` structure.
3. However, this is not correct in case an existing (non-deleted) document shares the external id of a soft-deleted document.
## Implementation of the fix
1. Before we process a permanent deletion, we update the list of soft-deleted document ids.
2. Then, the permanent deletion's job is to remove the soft-deleted documents from all data structures. Therefore, to update `ExternalDocumentsIds`, we can simply call the `delete_soft_deleted_documents_ids_from_fsts` method, which is faster and simpler.
## Correctness
A unit test was added to reproduce the bug. The new fuzz test, when adjusted to check the correctness of a placeholder search, could also instantly reproduce the bug, but now does not find any other problem.
Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>
3264: Remove macos-latest and windows-latest usages r=curquiza a=curquiza
Related to https://github.com/meilisearch/meilisearch/issues/3109#issuecomment-1359151297
Remove the `macos-latest` and `windows-latest` to replace them with the specific version: this will avoid "surprises" in the future when GitHub changes the `latest` version.
This way, it will also allow us to let the documentation team know about the changes, since we will control the macOS/Windows version we support
Co-authored-by: curquiza <clementine@meilisearch.com>
3261: Use ubuntu-18.04 container instead of GitHub hosted actions r=curquiza a=curquiza
Related to (but does not fix totally) https://github.com/meilisearch/meilisearch/issues/3109 and https://github.com/meilisearch/product/discussions/547#discussioncomment-4109143
## For reviewers, what's the PR changes:
- Use ubuntu-latest where compiling with ubuntu-18.04 is not needed (`update-version-cargo-toml`, `fmt`, `clippy` jobs)
- Where ubuntu-18.04 is required
- Use `ubuntu-latest` as runner
- Use `ubuntu:18.04` as Docker container
- Install the required dependencies (curl and cc)
- Use `actions-rs/toolchain@v1` instead of `hecrj/setup-rust-action@master`. It's more stable and followed alternative. Plus it was easy to make it work with our container contrary to the old one. Change applied in all our CIs to be more consistent
- Remove some useless space to increase readability.
Co-authored-by: curquiza <clementine@meilisearch.com>
747: Soft-deletion computation no longer depends on the mapsize r=irevoire a=dureuill
# Pull Request
## Related issue
Related to https://github.com/meilisearch/meilisearch/issues/3231: After removing `--max-index-size`, the `mapsize` will always be unrelated to the actual max size the user wants for their DB, so it doesn't make sense to use these values any longer.
This implements solution 2.3 from https://github.com/meilisearch/meilisearch/issues/3231#issuecomment-1348628824
## What does this PR do?
### User-visible
- Soft-deleted are no longer deleted when there is less than 10% of the mapsize available or when they take more than 10% of the mapsize
- Instead, they are deleted when they are more soft deleted than regular documents, or when they take more than 1GiB disk space (estimated).
### Implementation standpoint
1. Adds a `DeletionStrategy` struct to replace the boolean `disable_soft_deletion` that we had up until now. This enum allows us to specify that we want "always hard", "always soft", or to use the dynamic soft-deletion strategy (default).
2. Uses the current strategy when deleting documents, with the new heuristics being used in the `DeletionStrategy::Dynamic` variant.
3. Updates the tests to use the appropriate DeletionStrategy whenever needed (one of `AlwaysHard` or `AlwaysSoft` depending on the test)
Note to reviewers: this PR is optimized for a commit-by-commit review.
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
Co-authored-by: Tamo <tamo@meilisearch.com>
751: Update version for the next release (v0.38.0) in Cargo.toml files r=curquiza a=meili-bot
⚠️ This PR is automatically generated. Check the new version is the expected one before merging.
Co-authored-by: curquiza <curquiza@users.noreply.github.com>
3262: Clippy fixes after updating Rust to v1.66 r=curquiza a=dureuill
Ran `cargo clippy --fix`
Fixes the CI.
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
3249: Bring back changes from release-v0.30.3 to main r=curquiza a=curquiza
⚠️⚠️ I had to fix git conflicts, ensure I did not lose anything ⚠️⚠️
Co-authored-by: Kerollmops <clement@meilisearch.com>
Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
728: Add some integration tests on the sort criterion r=ManyTheFish a=loiclec
This is simply an integration test ensuring that the sort criterion works properly.
However, only one version of the algorithm is tested here (the iterative one). To test the version that uses the facet DB, one has to manually set the `CANDIDATES_THRESHOLD` constant to `0`. I have done that and ensured that the test still succeeds. However, in the future, we will probably want to have an option to force which algorithm is used at runtime, for testing purposes.
Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>
3236: Improves clarity of the code that receives payloads r=Kerollmops a=Kerollmops
This PR makes small changes to #3164. It improves the clarity and simplicity of some parts of the code.
Co-authored-by: Kerollmops <clement@meilisearch.com>
740: Fix two nightly errors r=Kerollmops a=irevoire
Currently, we have these two errors on rust nightly. It would be nice to help rustc understand what's going on
```
error[E0658]: anonymous lifetimes in `impl Trait` are unstable
--> filter-parser/src/lib.rs:173:53
|
173 | fn ws<'a, O>(inner: impl FnMut(Span<'a>) -> IResult<O>) -> impl FnMut(Span<'a>) -> IResult<O> {
| ^ expected named lifetime parameter
|
= help: add `#![feature(anonymous_lifetime_in_impl_trait)]` to the crate attributes to enable
help: consider introducing a named lifetime parameter
|
173 | fn ws<'a, 'a, O>(inner: impl FnMut(Span<'a>) -> IResult<'a, O>) -> impl FnMut(Span<'a>) -> IResult<O> {
| +++ +++
error[E0658]: anonymous lifetimes in `impl Trait` are unstable
--> filter-parser/src/error.rs:36:49
|
36 | mut parser: impl FnMut(Span<'a>) -> IResult<O>,
| ^ expected named lifetime parameter
|
= help: add `#![feature(anonymous_lifetime_in_impl_trait)]` to the crate attributes to enable
help: consider introducing a named lifetime parameter
|
35 ~ pub fn cut_with_err<'a, 'a, O>(
36 ~ mut parser: impl FnMut(Span<'a>) -> IResult<'a, O>,
|
For more information about this error, try `rustc --explain E0658`.
error: could not compile `filter-parser` due to 2 previous errors
```
Co-authored-by: Tamo <tamo@meilisearch.com>
3164: Improve the way we receive the documents payload r=Kerollmops a=jiangbo212
# Pull Request
## Related issue
Fixes#3037
## What does this PR do?
- writing the playload to a temporary file via BufWritter
- deserialising the json tempporary file to an array of Objects by means of a memory map
- deserialising thie csv tempporary file by means of a memory map
- Adapted some read_json tests
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: jiangbo212 <peiyaoliukuan@gmail.com>
Co-authored-by: jiangbo212 <peiyaoliukuan@126.com>
734: Fix bug 2945/3021 (missing key in documents database) r=Kerollmops a=loiclec
# Pull Request
## Related issue
Fixes (partially, until merged into meilisearch) https://github.com/meilisearch/meilisearch/issues/2945 (until we integrate the new milli bump into meilisearch).
**Note that a dump will not be sufficient to upgrade from meilisearch v0.30.2 to meilisearch v0.30.3 due to this fix** because the bug could have caused the `documents` database to be corrupted. Instead, a full manual reimport of the documents will be necessary.
## What does this PR do?
There was a bug happening when:
1. A few documents are added to the index
2. Some of these documents are soft-deleted
3. New documents are added, replacing existing ones and triggering a hard-deletion
The `IndexDocuments::execute` method would then perform the hard-deletion but forget to change the `external_document_ids` structure appropriately. As a result, the `external_document_ids` would contain keys corresponding to documents that do no exist anymore.
To fix this bug, I split the `DeleteDocuments::execute` method into two: `execute_inner` and `execute`.
- `execute_inner` returns a `DetailedDocumentDeletionResult` which says whether soft-deletion was used or not
- `execute` keeps the exact same signature and behaviour
Then, when deleting replaced documents inside `IndexDocuments::execute`, we call `DeleteDocuments::execute_inner` instead of `DeleteDocuments::execute`. If soft-deletion was used, nothing more is done. But if hard-deletion was used, we remove every reference to soft-deleted documents in the new `external_documents_ids` structure.
## Correctness
- Every other test still passes
- The reproduction test case now passes
- In a different branch ([`update-fuzz-test`](https://github.com/meilisearch/milli/pull/735)), I created a fuzz-test that reproduces the past two bugs. This fuzz test cannot find this bug through any combination of some hand-selected `DocumentAddition / DocumentDeletion / DocumentClear / SettingsUpdate` operations. In that test, each relevant operations can be executed with or without soft-deletion, and document additions can be done in batches, replacing or updating existing documents.
Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>
3223: Bring back release-v0.30.2 changes into main r=irevoire a=curquiza
Only bring back the necessary changes from `release-v0.30.2` to `main`, following v0.30.2 release
Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: curquiza <clementine@meilisearch.com>
3224: Fix update-cargo-toml-version.yml r=curquiza a=mohitsaxenaknoldus
# Pull Request
## Related issue
Fixes#3219
## What does this PR do?
- ...
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: Mohit Saxena <76725454+mohitsaxenaknoldus@users.noreply.github.com>
3229: Add a nightly CI: create every day a `nightly` Docker tag based on the latest commit on `main` r=Kerollmops a=curquiza
Also, fixes#3195
Easy to follow with the commits
- In the Docker CI:
- create every day a `nightly` Docker tag based on the latest commit on `main`
- check if the release is the latest one, before creating the `latest` Docker tag. A script has been added.
- add the `worflow_dispatch` event to trigger the CI to build the `nightly` tag when we want (always on the latest commit on `main`)
- In multiple CIs: replace the `released` type by `published`, see [here](https://stackoverflow.com/questions/59319281/github-action-different-between-release-created-and-published) why. Will not impact anything, but will prevent to fail our future automation
- Remove a useless CI (code coverage, not used for 1 year)
- Remove useless lines (comments and CI logic) that don't have any impact
Co-authored-by: curquiza <clementine@meilisearch.com>
We were writing the instance-uid as bytes instead of string in the data.ms and thus we were unable to parse it later.
Also it was less practical for our user to retrieve it and send it to us.
3112: Rename meilisearch-http r=Kerollmops a=colbsmcdolbs
# Pull Request
## Related issue
Fixes#3073
## What does this PR do?
- Renames all references of `meilisearch-http` to `meilisearch`
- Might need to be rebased before the 1.0.0 release
## PR checklist
Please check if your PR fulfills the following requirements:
- [X] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [X] Have you read the contributing guidelines?
- [X] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: Colby Allen <colbyjayallen@gmail.com>
3221: Update README to reference Meilisearch Cloud r=curquiza a=davelarkan
# Pull Request
## Related issue
Fixes#3220
## What does this PR do?
- Updates the README to link to the Pricing page where people can choose a Meilisearch Cloud plan
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: Dave Larkan <davelarkan@gmail.com>
3216: Update version for the next release (v1.0.0) in Cargo.toml files r=curquiza a=meili-bot
⚠️ This PR is automatically generated. Check the new version is the expected one before merging.
Co-authored-by: curquiza <curquiza@users.noreply.github.com>
3215: Use nightly in cargo fmt r=curquiza a=curquiza
Discussed with `@Kerollmops,` needs this change
Co-authored-by: Clémentine Urquizar - curqui <clementine@meilisearch.com>
729: Fix distincted exhaustive hits r=Kerollmops a=ManyTheFish
This PR changes the name and behavior of `bucket_candidates`:
- `bucket_candidates` become `initial_candidates` that is less confusing
- `initial_candidates` is no more a simple `RoaringBitmap` but an enum allowing us to precise if the candidates are exhaustive or not
- this enum ensures that any modification is allowed only if the candidates are not already exhaustive.
The bug occurred because `initial_candidates` are modified during the bucket sort allowing the estimation to be more and more precise along the search, and this was an issue when the `initial_candidates` were already exhaustive, now, if candidates are exhaustive, then no modifications are made.
Co-authored-by: ManyTheFish <many@meilisearch.com>
727: Fix bug in filter search r=Kerollmops a=loiclec
# Pull Request
## Related issue
Fixes (partially, until merged into meilisearch) https://github.com/meilisearch/meilisearch/issues/3178
## What does this PR do?
The most important change is this one:
```rust
// in milli/src/search/facet/facet_range_search.rs, line 239
let should_stop = {
match self.right {
Bound::Included(right) => right < previous_key.left_bound,
Bound::Excluded(right) => right <= previous_key.left_bound,
Bound::Unbounded => false,
}
};
```
where the operations `<` and `<=` between the two branches were switched. This caused (very few) documents to be missing from filter results.
The second change is a simplification of the algorithm for filters such as `field = value`, where we now perform a direct query into the "Level 0" of the facet db to retrieve the docids instead of invoking the full facet search algorithm. This change is done in `milli/src/search/facet/filter.rs`.
I have added yet more insta-snapshot tests, rechecked the content of the snapshots, and added some integration tests as well.
This is purely a fix in the search algorithms. Based on this PR alone, a dump will not be necessary to switch from v0.30.1 (where this bug is present) to v0.30.2 (where this PR is merged).
Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>
By creating snapshots and updating the format of the existing
snapshots. The next commit will apply the fix, which will show
its effects cleanly on the old and new snapshot tests
3208: Stop snapshotting the version of meilisearch in the dump r=Kerollmops a=irevoire
It might change, and we don't want to update this test every time we make a new release.
Co-authored-by: Tamo <tamo@meilisearch.com>
3128: Bumps cargo_toml version to most up to date r=curquiza a=colbsmcdolbs
# Pull Request
## Related issue
Fixes#3127
## What does this PR do?
- The README of this repository declares that one package is not up to date. In order to ensure Due Diligence, I have bumped the version number of the package. No test failures running on Windows.
## PR checklist
Please check if your PR fulfills the following requirements:
- [X] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [X] Have you read the contributing guidelines?
- [X] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: Colby Allen <colbyjayallen@gmail.com>
723: Fix bug in handling of soft deleted documents when updating settings r=Kerollmops a=loiclec
# Pull Request
## Related issue
Fixes (partially, until merged into meilisearch) https://github.com/meilisearch/meilisearch/issues/3021
## What does this PR do?
This PR fixes the bug where a `missing key in documents database` internal error message could appear when indexing documents.
When updating the settings, before clearing the database and before creating the transform output, we now modify the `ExternalDocumentsIds` structure to get rid of all references to soft deleted document ids in its FSTs.
It used to be that updating the settings would clear the soft-deleted document ids, but keep the original `ExternalDocumentsIds` structure. As a consequence of this, when processing a future document addition, we could wrongly believe that a document was being replaced when, in fact, it was a completely new document. See the tests `bug_3021_first`, `bug_3021_second`, and `bug_3021` for a minimal test case that would have reproduced the issue.
We need to take special care to:
- evaluate how users should update to v0.30.1 (containing this fix): dump? reimporting all documents from scratch?
- understand IF/HOW this bug could have caused duplicate documents to be returned
- and evaluate the correctness of the fix, of course :)
Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>
3190: Fix the dump date-import of the dumpv4 r=irevoire a=irevoire
# Pull Request
After merging https://github.com/meilisearch/meilisearch/pull/3012 I realized that the tests on the date of the dump-v4 were still ignored, thus, I fixed them and then noticed #3012 wasn't working properly.
## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/2987 a second time
`@funilrys` since you wrote most of the code you might be interested, but don't feel obligated to review this code.
Someone from the team will double-check it works 😁
Co-authored-by: Tamo <tamo@meilisearch.com>
719: Add more members of `filter_parser` to `milli::` & `From<&str>` implementation for `Token` r=Kerollmops a=GregoryConrad
## What does this PR do?
The current `milli::Filter` and `milli::FilterCondition` APIs require working with some members of `filter_parser` directly that `milli::` does *not* re-export to its users (at least when not parsing input using `parse`). Also, using `filter_parser` does not make sense when using milli from an embedded context where there is no query to parse.
Instead of reworking `milli::Filter` and `milli::FilterCondition`, this PR adds two non-breaking changes that ease the use of milli:
- Re-exports more members of the dependent version of `filter_parser` in `milli`
- Implements `From<&str>` for `filter_parser::Token`
- This will also allow some basic tests that need to create a `Token` from a string to avoid some boilerplate.
In conjunction, both of these will allow milli users to easily create a `Token` from a `&str` without needing to add `filter_parser` as an extra dependency.
Note: I wanted to use `FromStr` for the `From` implementation; however, it requires returning a `Result` which is not needed for the conversion. Thus, I just left it as `From<&str>`.
Co-authored-by: Gregory Conrad <gregorysconrad@gmail.com>
3175: Rename dump command from --dumps-dir to --dump-dir r=dureuill a=dureuill
# Pull Request
## Related issue
Fixes#3132
## What does this PR do?
- Rename the dump commands, env variables and default config
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
722: Geosearch for zero radius r=irevoire a=amab8901
# Pull Request
## Related issue
Fixes#3167 (https://github.com/meilisearch/meilisearch/issues/3167)
## What does this PR do?
- allows Geosearch with zero radius to return the specified location when the coordinates match perfectly (instead of returning nothing). See link for more details.
- new attempt on https://github.com/meilisearch/milli/pull/713
## PR checklist
Please check if your PR fulfills the following requirements:
- [ X ] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [ X ] Have you read the contributing guidelines?
- [ X ] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: amab8901 <amab8901@protonmail.com>
Co-authored-by: Tamo <irevoire@protonmail.ch>
720: Make soft deletion optional in document addition and deletion + add lots of tests r=irevoire a=loiclec
# Pull Request
## What does this PR do?
When debugging recent issues, I created a few unit tests in the hopes reproducing the bugs I was looking for. In the end, I didn't find any, but I thought it would still be good to keep those tests.
More importantly, I added a field to the `DeleteDocuments` and `IndexDocuments` builders, called `disable_soft_deletion`. If set to `true`, the indexing/deletion will never add documents to the `soft_deleted_documents_ids` and instead perform a real deletion of the documents from the databases.
For the new tests, I have:
- Improved the insta-snapshot format of the `external_documents_ids` structure
- Added more tests for the facet DB indexing, deletion, and search algorithms, making sure to test them when the facet DB contains strings (instead of numbers) as well.
- Added more tests for the incremental indexing of the prefix proximity databases. For example, to see if documents are replaced correctly and if common prefixes are deleted correctly.
- Added tests that mix soft deletion and hard deletion, including when processing batches of document updates.
Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>
3188: re-enable the dump test on the dates r=irevoire a=irevoire
I just noticed that we have the real date in the dump-v1 contrarily to the dump-v2/3/4/5, thus we can ensure it doesn't change unexpectedly 👍
Co-authored-by: Tamo <tamo@meilisearch.com>
3012: Extract the dates out of the dumpv4. r=irevoire a=funilrys
Hi there,
please review this PR that tries to fix#2987. I'm still learning Rust and I found that #2987 is an excellent way for me to read and learn what others do with Rust. So please excuse my semantics ...
Stay safe and healthy.
---
# Pull Request
This patch possibly fixes#2987.
This patch introduces a way to fill the IndexMetadata.created_at and IndexMetadata.updated_at keys from the tasks events. This is done by reading the creation date of the first event (created_at) and the creation date of the last event (updated_at).
## Related issue
Fixes#2987
## What does this PR do?
- Extract the dates out of the dumpv4.
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: funilrys <contact@funilrys.com>
Indeed, before this patch, I was (probably) breaking every usage
of the tasks BufReader. This patch solves the issue by reopening
the the tasks file every time its needed.
3170: Re-enable importing from dumps v1 r=irevoire a=dureuill
# Pull Request
## Related issue
Fixes#2985
## What does this PR do?
### User standpoint
- Allows importing dumps version 1 (exported with Meilisearch <=v0.20) to modern-day Meilisearch
- Tasks of type "Customs" are skipped, with a warning
- Tasks of status "enqueued" are skipped, with a warning
- The "WordsPosition" ranking rule is skipped when encountered in the ranking rules, with a warning.
After an import from a v1 dump, it is recommended that a user checks each index and its settings.
### Implementation standpoint
- Add a dump v1 reader based on the one by `@irevoire`
- Add a v1_to_v2 compatibility layer based on the v2_to_v3 one
- as v2 requires UUIDs, the v1 indexes are mapped to UUIDs built from their position in the metada file: the first index is given UUID all zeroes, the second one UUID `00000000-0000-0000-0000-000000000001`, and so on... This should have no bearing on the final indexes because v6 is not using UUIDs, but this allows us to correctly identify which tasks belong to which index.
- Modify the v2_to_v3 compatibility layer to account for the fact that the reader can actually be a v1_to_v2 compat layer
- Make some base dump types Clone
- impl Display for v2::settings::Criterion
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
3180: Bump mislav/bump-homebrew-formula-action from 1 to 2 r=curquiza a=dependabot[bot]
Bumps [mislav/bump-homebrew-formula-action](https://github.com/mislav/bump-homebrew-formula-action) from 1 to 2.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a href="https://github.com/mislav/bump-homebrew-formula-action/releases">mislav/bump-homebrew-formula-action's releases</a>.</em></p>
<blockquote>
<h2>bump-homebrew-formula 2.0</h2>
<h2>What's Changed</h2>
<ul>
<li>Use Node 16 by <a href="https://github.com/chenrui333"><code>`@chenrui333</code></a>` in <a href="https://github-redirect.dependabot.com/mislav/bump-homebrew-formula-action/pull/36">mislav/bump-homebrew-formula-action#36</a></li>
<li>Bump minimist from 1.2.5 to 1.2.6 by <a href="https://github.com/dependabot"><code>`@dependabot</code></a>` in <a href="https://github-redirect.dependabot.com/mislav/bump-homebrew-formula-action/pull/33">mislav/bump-homebrew-formula-action#33</a></li>
<li>Bump node-fetch from 2.6.6 to 2.6.7 by <a href="https://github.com/dependabot"><code>`@dependabot</code></a>` in <a href="https://github-redirect.dependabot.com/mislav/bump-homebrew-formula-action/pull/34">mislav/bump-homebrew-formula-action#34</a></li>
</ul>
<h2>New Contributors</h2>
<ul>
<li><a href="https://github.com/chenrui333"><code>`@chenrui333</code></a>` made their first contribution in <a href="https://github-redirect.dependabot.com/mislav/bump-homebrew-formula-action/pull/36">mislav/bump-homebrew-formula-action#36</a></li>
</ul>
<h2>bump-homebrew-formula 1.16</h2>
<ul>
<li>Replaces broken v1.15 tag, thanks <a href="https://github.com/hendrikmaus"><code>`@hendrikmaus</code></a>` <a href="https://github-redirect.dependabot.com/mislav/bump-homebrew-formula-action/issues/32">mislav/bump-homebrew-formula-action#32</a></li>
<li>Add <code>push-to</code> option, thanks <a href="https://github.com/codefromthecrypt"><code>`@codefromthecrypt</code></a>` <a href="https://github-redirect.dependabot.com/mislav/bump-homebrew-formula-action/pull/30">mislav/bump-homebrew-formula-action#30</a></li>
<li>Fix syntax error, thanks <a href="https://github.com/hendrikmaus"><code>`@hendrikmaus</code></a>` <a href="https://github.com/wata727"><code>`@wata727</code></a>` <a href="https://github-redirect.dependabot.com/mislav/bump-homebrew-formula-action/pull/27">mislav/bump-homebrew-formula-action#27</a></li>
<li>Ensure repeated placeholders in <code>commit-message</code> are expanded, thanks <a href="https://github.com/hendrikmaus"><code>`@hendrikmaus</code></a>` <a href="https://github-redirect.dependabot.com/mislav/bump-homebrew-formula-action/pull/29">mislav/bump-homebrew-formula-action#29</a></li>
</ul>
<h2>bump-homebrew-formula 1.14</h2>
<ul>
<li>Ignore HTTP 409 error when fast-forwading the main branch of <code>homebrew-tap</code> fork</li>
</ul>
<h2>bump-homebrew-formula 1.13</h2>
<ul>
<li>Add <code>create-pullrequest</code> input to control whether or not a PR is submitted to <code>homebrew-tap</code></li>
<li>Add <code>download-sha256</code> input to define the SHA256 checksum of the archive at <code>download-url</code></li>
<li>Fix creating a new branch in the forked repo failing with HTTP 404</li>
</ul>
<h2>bump-homebrew-formula 1.12</h2>
<ul>
<li>Fix Actions CJS loader halting on <code>foo?.bar</code> JS syntax</li>
</ul>
<h2>bump-homebrew-formula 1.11</h2>
<ul>
<li>New optional <code>formula-path</code> input accepts the filename of the formula file to edit (default <code>Formula/<formula-name>.rb</code>).</li>
<li>Remove <code>revision N</code> lines when bumping Homebrew formulae.</li>
</ul>
<h2>bump-homebrew-formula 1.10</h2>
<ul>
<li>The new optional <code>tag-name</code> input allows this action to be <a href="https://docs.github.com/en/actions/managing-workflow-runs/manually-running-a-workflow">manually triggered via <code>workflow_dispatch</code></a> instead of on git push to a tag.</li>
</ul>
<h2>bump-homebrew-formula 1.9</h2>
<ul>
<li>Fix following multiple HTTP redirects while calculating checksum for <code>download-url</code></li>
</ul>
<h2>bump-homebrew-formula 1.8</h2>
<ul>
<li>Enable JavaScript source maps for better failure debugging</li>
</ul>
<h2>bump-homebrew-formula 1.7</h2>
<ul>
<li>
<p>Allow <code>download-url</code> as input parameter</p>
</li>
<li>
<p>Add support for git-based <code>download-url</code></p>
</li>
</ul>
<h2>bump-homebrew-formula 1.6</h2>
<ul>
<li>Control the git commit message template being used for updating the formula file via the <code>commit-message</code> action input</li>
</ul>
<h2>bump-homebrew-formula 1.5</h2>
<ul>
<li>Support detection version from <code>https://github.com/OWNER/REPO/releases/download/TAG/FILE</code> download URLs</li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a href="fcd7e28e54"><code>fcd7e28</code></a> lib</li>
<li><a href="33989a8502"><code>33989a8</code></a> Merge branch 'main' into v2</li>
<li><a href="5983bb6c59"><code>5983bb6</code></a> Improve extracting complex tag names from URLs</li>
<li><a href="9750a1166b"><code>9750a11</code></a> lib</li>
<li><a href="64410e9c96"><code>64410e9</code></a> v2</li>
<li><a href="677d7482a3"><code>677d748</code></a> Merge pull request <a href="https://github-redirect.dependabot.com/mislav/bump-homebrew-formula-action/issues/36">#36</a> from chenrui333/node-16</li>
<li><a href="f364e76079"><code>f364e76</code></a> also update some build dependencies</li>
<li><a href="c08fd9bee5"><code>c08fd9b</code></a> deps: update to use nodev16</li>
<li><a href="280f532e9a"><code>280f532</code></a> Merge pull request <a href="https://github-redirect.dependabot.com/mislav/bump-homebrew-formula-action/issues/34">#34</a> from mislav/dependabot/npm_and_yarn/node-fetch-2.6.7</li>
<li><a href="5d94a66af3"><code>5d94a66</code></a> Bump node-fetch from 2.6.6 to 2.6.7</li>
<li>Additional commits viewable in <a href="https://github.com/mislav/bump-homebrew-formula-action/compare/v1...v2">compare view</a></li>
</ul>
</details>
<br />
[](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
You can trigger a rebase of this PR by commenting ``@dependabot` rebase`.
[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)
---
<details>
<summary>Dependabot commands and options</summary>
<br />
You can trigger Dependabot actions by commenting on this PR:
- ``@dependabot` rebase` will rebase this PR
- ``@dependabot` recreate` will recreate this PR, overwriting any edits that have been made to it
- ``@dependabot` merge` will merge this PR after your CI passes on it
- ``@dependabot` squash and merge` will squash and merge this PR after your CI passes on it
- ``@dependabot` cancel merge` will cancel a previously requested merge and block automerging
- ``@dependabot` reopen` will reopen this PR if it is closed
- ``@dependabot` close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- ``@dependabot` ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- ``@dependabot` ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- ``@dependabot` ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
</details>
3181: Bump Swatinem/rust-cache from 2.0.0 to 2.2.0 r=curquiza a=dependabot[bot]
Bumps [Swatinem/rust-cache](https://github.com/Swatinem/rust-cache) from 2.0.0 to 2.2.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a href="https://github.com/Swatinem/rust-cache/releases">Swatinem/rust-cache's releases</a>.</em></p>
<blockquote>
<h2>v2.2.0</h2>
<ul>
<li>Add new <code>save-if</code> option to always restore, but only conditionally save the cache.</li>
</ul>
<h2>v2.1.0</h2>
<ul>
<li>Only hash <code>Cargo.{lock,toml}</code> files in the configured workspace directories.</li>
</ul>
<h2>v2.0.2</h2>
<ul>
<li>Avoid calling cargo metadata on pre-cleanup.</li>
<li>Added <code>prefix-key</code>, <code>cache-directories</code> and <code>cache-targets</code> options.</li>
</ul>
<h2>v2.0.1</h2>
<ul>
<li>Primarily just updating dependencies to fix GitHub deprecation notices.</li>
</ul>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a href="https://github.com/Swatinem/rust-cache/blob/master/CHANGELOG.md">Swatinem/rust-cache's changelog</a>.</em></p>
<blockquote>
<h2>2.2.0</h2>
<ul>
<li>Add new <code>save-if</code> option to always restore, but only conditionally save the cache.</li>
</ul>
<h2>2.1.0</h2>
<ul>
<li>Only hash <code>Cargo.{lock,toml}</code> files in the configured workspace directories.</li>
</ul>
<h2>2.0.2</h2>
<ul>
<li>Avoid calling <code>cargo metadata</code> on pre-cleanup.</li>
<li>Added <code>prefix-key</code>, <code>cache-directories</code> and <code>cache-targets</code> options.</li>
</ul>
<h2>2.0.1</h2>
<ul>
<li>Primarily just updating dependencies to fix GitHub deprecation notices.</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a href="359a70e43a"><code>359a70e</code></a> 2.2.0</li>
<li><a href="ecee04e7b3"><code>ecee04e</code></a> feat: add save-if option, closes <a href="https://github-redirect.dependabot.com/Swatinem/rust-cache/issues/66">#66</a> (<a href="https://github-redirect.dependabot.com/Swatinem/rust-cache/issues/91">#91</a>)</li>
<li><a href="b894d59a8d"><code>b894d59</code></a> 2.1.0</li>
<li><a href="e78327dd9e"><code>e78327d</code></a> small code style improvements, README and CHANGELOG updates</li>
<li><a href="ccdddcc049"><code>ccdddcc</code></a> only hash Cargo.toml/Cargo.lock that belong to a configured workspace (<a href="https://github-redirect.dependabot.com/Swatinem/rust-cache/issues/90">#90</a>)</li>
<li><a href="b5ec9edd91"><code>b5ec9ed</code></a> 2.0.2</li>
<li><a href="3f2513fdf4"><code>3f2513f</code></a> avoid calling cargo metadata on pre-cleanup</li>
<li><a href="19c46583c5"><code>19c4658</code></a> update dependencies</li>
<li><a href="b8e72aae83"><code>b8e72aa</code></a> Added <code>prefix-key</code> <code>cache-directories</code> and <code>cache-targets</code> options (<a href="https://github-redirect.dependabot.com/Swatinem/rust-cache/issues/85">#85</a>)</li>
<li><a href="22c9328bcb"><code>22c9328</code></a> 2.0.1</li>
<li>Additional commits viewable in <a href="https://github.com/Swatinem/rust-cache/compare/v2.0.0...v2.2.0">compare view</a></li>
</ul>
</details>
<br />
[](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
You can trigger a rebase of this PR by commenting ``@dependabot` rebase`.
[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)
---
<details>
<summary>Dependabot commands and options</summary>
<br />
You can trigger Dependabot actions by commenting on this PR:
- ``@dependabot` rebase` will rebase this PR
- ``@dependabot` recreate` will recreate this PR, overwriting any edits that have been made to it
- ``@dependabot` merge` will merge this PR after your CI passes on it
- ``@dependabot` squash and merge` will squash and merge this PR after your CI passes on it
- ``@dependabot` cancel merge` will cancel a previously requested merge and block automerging
- ``@dependabot` reopen` will reopen this PR if it is closed
- ``@dependabot` close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- ``@dependabot` ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- ``@dependabot` ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- ``@dependabot` ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
</details>
3183: Use ubuntu-latest when not impacting r=Kerollmops a=curquiza
Minor changes
- Use `ubuntu-latest` for CI where there is no compilation
- rename one of the workflow (obsolete name)
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: curquiza <clementine@meilisearch.com>
3179: Bump svenstaro/upload-release-action from 1.pre.release to 2.3.0 r=curquiza a=dependabot[bot]
Bumps [svenstaro/upload-release-action](https://github.com/svenstaro/upload-release-action) from 1.pre.release to 2.3.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a href="https://github.com/svenstaro/upload-release-action/releases">svenstaro/upload-release-action's releases</a>.</em></p>
<blockquote>
<h2>2.3.0</h2>
<ul>
<li>Now defaults <code>repo_token</code> to <code>${{ github.token }}</code> and <code>tag</code> to <code>${{ github.ref }}</code> <a href="https://github-redirect.dependabot.com/svenstaro/upload-release-action/pull/69">#69</a> (thanks <a href="https://github.com/leighmcculloch"><code>`@leighmcculloch</code></a>)**</li>`
</ul>
<h2>2.2.1</h2>
<ul>
<li>Added support for the GitHub pagination API for repositories with many releases <a href="https://github-redirect.dependabot.com/svenstaro/upload-release-action/pull/36">#36</a> (thanks <a href="https://github.com/djpohly"><code>`@djpohly</code></a>)</li>`
</ul>
<h2>2.2.0</h2>
<ul>
<li>Add support for ceating a new release in a foreign repository <a href="https://github-redirect.dependabot.com/svenstaro/upload-release-action/pull/25">#25</a> (thanks <a href="https://github.com/kittaakos"><code>`@kittaakos</code></a>)</li>`
<li>Upgrade all deps</li>
</ul>
<h2>2.1.1</h2>
<ul>
<li>Fix <code>release_name</code> option <a href="https://github-redirect.dependabot.com/svenstaro/upload-release-action/pull/27">#27</a> (thanks <a href="https://github.com/kittaakos"><code>`@kittaakos</code></a>)</li>`
</ul>
<h2>2.1.0</h2>
<ul>
<li>Strip refs/heads/ from the input tag <a href="https://github-redirect.dependabot.com/svenstaro/upload-release-action/pull/23">#23</a> (thanks <a href="https://github.com/OmarEmaraDev"><code>`@OmarEmaraDev</code></a>)</li>`
</ul>
<h2>2.0.0</h2>
<ul>
<li>Add <code>prerelease</code> input parameter. Setting this marks the created release as a pre-release.</li>
<li>Add <code>release_name</code> input parameter. Setting this explicitly sets the title of the release.</li>
<li>Add <code>body</code> input parameter. Setting this sets the text content of the created release.</li>
<li>Add <code>browser_download_url</code> output variable. This contains the publicly accessible download URL of the uploaded artifact.</li>
<li>Allow for leaving <code>asset_name</code> unset. This will cause the asset to use the filename.</li>
</ul>
<h2>1.1.0</h2>
<p>No release notes provided.</p>
<h2>1.0.1</h2>
<p>No release notes provided.</p>
<h2>1.0.0</h2>
<p>No release notes provided.</p>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a href="https://github.com/svenstaro/upload-release-action/blob/master/CHANGELOG.md">svenstaro/upload-release-action's changelog</a>.</em></p>
<blockquote>
<h2>[2.3.0] - 2022-06-05</h2>
<ul>
<li>Now defaults <code>repo_token</code> to <code>${{ github.token }}</code> and <code>tag</code> to <code>${{ github.ref }}</code> <a href="https://github-redirect.dependabot.com/svenstaro/upload-release-action/pull/69">#69</a> (thanks <a href="https://github.com/leighmcculloch"><code>`@leighmcculloch</code></a>)</li>`
</ul>
<h2>[2.2.1] - 2020-12-16</h2>
<ul>
<li>Added support for the GitHub pagination API for repositories with many releases <a href="https://github-redirect.dependabot.com/svenstaro/upload-release-action/pull/36">#36</a> (thanks <a href="https://github.com/djpohly"><code>`@djpohly</code></a>)</li>`
</ul>
<h2>[2.2.0] - 2020-10-07</h2>
<ul>
<li>Add support for ceating a new release in a foreign repository <a href="https://github-redirect.dependabot.com/svenstaro/upload-release-action/pull/25">#25</a> (thanks <a href="https://github.com/kittaakos"><code>`@kittaakos</code></a>)</li>`
<li>Upgrade all deps</li>
</ul>
<h2>[2.1.1] - 2020-09-25</h2>
<ul>
<li>Fix <code>release_name</code> option <a href="https://github-redirect.dependabot.com/svenstaro/upload-release-action/pull/27">#27</a> (thanks <a href="https://github.com/kittaakos"><code>`@kittaakos</code></a>)</li>`
</ul>
<h2>[2.1.0] - 2020-08-10</h2>
<ul>
<li>Strip refs/heads/ from the input tag <a href="https://github-redirect.dependabot.com/svenstaro/upload-release-action/pull/23">#23</a> (thanks <a href="https://github.com/OmarEmaraDev"><code>`@OmarEmaraDev</code></a>)</li>`
</ul>
<h2>[2.0.0] - 2020-07-03</h2>
<ul>
<li>Add <code>prerelease</code> input parameter. Setting this marks the created release as a pre-release.</li>
<li>Add <code>release_name</code> input parameter. Setting this explicitly sets the title of the release.</li>
<li>Add <code>body</code> input parameter. Setting this sets the text content of the created release.</li>
<li>Add <code>browser_download_url</code> output variable. This contains the publicly accessible download URL of the uploaded artifact.</li>
<li>Allow for leaving <code>asset_name</code> unset. This will cause the asset to use the filename.</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a href="133984371c"><code>1339843</code></a> 2.3.0</li>
<li><a href="c2b649c57e"><code>c2b649c</code></a> Merge pull request <a href="https://github-redirect.dependabot.com/svenstaro/upload-release-action/issues/72">#72</a> from svenstaro/dependabot/npm_and_yarn/node-fetch-2.6.7</li>
<li><a href="eb625cd0ad"><code>eb625cd</code></a> Merge pull request <a href="https://github-redirect.dependabot.com/svenstaro/upload-release-action/issues/71">#71</a> from svenstaro/dependabot/npm_and_yarn/ansi-regex-4.1.1</li>
<li><a href="99cbd251b2"><code>99cbd25</code></a> Bump node-fetch from 2.6.1 to 2.6.7</li>
<li><a href="a6824c9e54"><code>a6824c9</code></a> Bump ansi-regex from 4.1.0 to 4.1.1</li>
<li><a href="6eb74c809d"><code>6eb74c8</code></a> Merge pull request <a href="https://github-redirect.dependabot.com/svenstaro/upload-release-action/issues/67">#67</a> from svenstaro/dependabot/npm_and_yarn/minimist-1.2.6</li>
<li><a href="8ec375d911"><code>8ec375d</code></a> Merge pull request <a href="https://github-redirect.dependabot.com/svenstaro/upload-release-action/issues/69">#69</a> from leighmcculloch/patch-1</li>
<li><a href="8d45355ac2"><code>8d45355</code></a> Update README.md</li>
<li><a href="7d269bd712"><code>7d269bd</code></a> Add defaults to commonly set fields to reduce setup</li>
<li><a href="c71fb95114"><code>c71fb95</code></a> Bump minimist from 1.2.5 to 1.2.6</li>
<li>Additional commits viewable in <a href="https://github.com/svenstaro/upload-release-action/compare/v1-release...2.3.0">compare view</a></li>
</ul>
</details>
<br />
[](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
You can trigger a rebase of this PR by commenting ``@dependabot` rebase`.
[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)
---
<details>
<summary>Dependabot commands and options</summary>
<br />
You can trigger Dependabot actions by commenting on this PR:
- ``@dependabot` rebase` will rebase this PR
- ``@dependabot` recreate` will recreate this PR, overwriting any edits that have been made to it
- ``@dependabot` merge` will merge this PR after your CI passes on it
- ``@dependabot` squash and merge` will squash and merge this PR after your CI passes on it
- ``@dependabot` cancel merge` will cancel a previously requested merge and block automerging
- ``@dependabot` reopen` will reopen this PR if it is closed
- ``@dependabot` close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- ``@dependabot` ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- ``@dependabot` ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- ``@dependabot` ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
</details>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
706: Limit the reindexing caused by updating settings when not needed r=curquiza a=GregoryConrad
## What does this PR do?
When updating index settings using `update::Settings`, sometimes a `reindex` of `update::Settings` is triggered when it doesn't need to be. This PR aims to prevent those unnecessary `reindex` calls.
For reference, here is a snippet from the current `execute` method in `update::Settings`:
```rust
// ...
if stop_words_updated
|| faceted_updated
|| synonyms_updated
|| searchable_updated
|| exact_attributes_updated
{
self.reindex(&progress_callback, &should_abort, old_fields_ids_map)?;
}
```
- [x] `faceted_updated` - looks good as-is ✅
- [x] `stop_words_updated` - looks good as-is ✅
- [x] `synonyms_updated` - looks good as-is ✅
- [x] `searchable_updated` - fixed in this PR
- [x] `exact_attributes_updated` - fixed in this PR
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: Gregory Conrad <gregorysconrad@gmail.com>
3172: Add CI to push a latest git tag for every stable Meilisearch release r=curquiza a=curquiza
Fixes partially #3147.
- Add a CI to add/update a `latest` git tag ONLY when releasing a stable version of Meilisearch (not for pre-release or for custom tag for prototypes)
- Update the Docker CI to avoid being triggered when `latest` git tag is pushed: the CI is still triggered when a git tag is pushed, except for the `latest` tag.
Reminder: the `latest` Docker image is already created and pushed when releasing a stable version of Meilisearch. This step is already present in our current Docker CI.
Co-authored-by: curquiza <clementine@meilisearch.com>
716: Bump Swatinem/rust-cache from 2.0.1 to 2.2.0 r=curquiza a=dependabot[bot]
Bumps [Swatinem/rust-cache](https://github.com/Swatinem/rust-cache) from 2.0.1 to 2.2.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a href="https://github.com/Swatinem/rust-cache/releases">Swatinem/rust-cache's releases</a>.</em></p>
<blockquote>
<h2>v2.2.0</h2>
<ul>
<li>Add new <code>save-if</code> option to always restore, but only conditionally save the cache.</li>
</ul>
<h2>v2.1.0</h2>
<ul>
<li>Only hash <code>Cargo.{lock,toml}</code> files in the configured workspace directories.</li>
</ul>
<h2>v2.0.2</h2>
<ul>
<li>Avoid calling cargo metadata on pre-cleanup.</li>
<li>Added <code>prefix-key</code>, <code>cache-directories</code> and <code>cache-targets</code> options.</li>
</ul>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a href="https://github.com/Swatinem/rust-cache/blob/master/CHANGELOG.md">Swatinem/rust-cache's changelog</a>.</em></p>
<blockquote>
<h2>2.2.0</h2>
<ul>
<li>Add new <code>save-if</code> option to always restore, but only conditionally save the cache.</li>
</ul>
<h2>2.1.0</h2>
<ul>
<li>Only hash <code>Cargo.{lock,toml}</code> files in the configured workspace directories.</li>
</ul>
<h2>2.0.2</h2>
<ul>
<li>Avoid calling <code>cargo metadata</code> on pre-cleanup.</li>
<li>Added <code>prefix-key</code>, <code>cache-directories</code> and <code>cache-targets</code> options.</li>
</ul>
<h2>2.0.1</h2>
<ul>
<li>Primarily just updating dependencies to fix GitHub deprecation notices.</li>
</ul>
<h2>2.0.0</h2>
<ul>
<li>The action code was refactored to allow for caching multiple workspaces and
different <code>target</code> directory layouts.</li>
<li>The <code>working-directory</code> and <code>target-dir</code> input options were replaced by a
single <code>workspaces</code> option that has the form of <code>$workspace -> $target</code>.</li>
<li>Support for considering <code>env-vars</code> as part of the cache key.</li>
<li>The <code>sharedKey</code> input option was renamed to <code>shared-key</code> for consistency.</li>
</ul>
<h2>1.4.0</h2>
<ul>
<li>Clean both <code>debug</code> and <code>release</code> target directories.</li>
</ul>
<h2>1.3.0</h2>
<ul>
<li>Use Rust toolchain file as additional cache key.</li>
<li>Allow for a configurable target-dir.</li>
</ul>
<h2>1.2.0</h2>
<ul>
<li>Cache <code>~/.cargo/bin</code>.</li>
<li>Support for custom <code>$CARGO_HOME</code>.</li>
<li>Add a <code>cache-hit</code> output.</li>
<li>Add a new <code>sharedKey</code> option that overrides the automatic job-name based key.</li>
</ul>
<h2>1.1.0</h2>
<ul>
<li>Add a new <code>working-directory</code> input.</li>
<li>Support caching git dependencies.</li>
<li>Lots of other improvements.</li>
</ul>
<h2>1.0.2</h2>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a href="359a70e43a"><code>359a70e</code></a> 2.2.0</li>
<li><a href="ecee04e7b3"><code>ecee04e</code></a> feat: add save-if option, closes <a href="https://github-redirect.dependabot.com/Swatinem/rust-cache/issues/66">#66</a> (<a href="https://github-redirect.dependabot.com/Swatinem/rust-cache/issues/91">#91</a>)</li>
<li><a href="b894d59a8d"><code>b894d59</code></a> 2.1.0</li>
<li><a href="e78327dd9e"><code>e78327d</code></a> small code style improvements, README and CHANGELOG updates</li>
<li><a href="ccdddcc049"><code>ccdddcc</code></a> only hash Cargo.toml/Cargo.lock that belong to a configured workspace (<a href="https://github-redirect.dependabot.com/Swatinem/rust-cache/issues/90">#90</a>)</li>
<li><a href="b5ec9edd91"><code>b5ec9ed</code></a> 2.0.2</li>
<li><a href="3f2513fdf4"><code>3f2513f</code></a> avoid calling cargo metadata on pre-cleanup</li>
<li><a href="19c46583c5"><code>19c4658</code></a> update dependencies</li>
<li><a href="b8e72aae83"><code>b8e72aa</code></a> Added <code>prefix-key</code> <code>cache-directories</code> and <code>cache-targets</code> options (<a href="https://github-redirect.dependabot.com/Swatinem/rust-cache/issues/85">#85</a>)</li>
<li>See full diff in <a href="https://github.com/Swatinem/rust-cache/compare/v2.0.1...v2.2.0">compare view</a></li>
</ul>
</details>
<br />
[](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
You can trigger a rebase of this PR by commenting ``@dependabot` rebase`.
[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)
---
<details>
<summary>Dependabot commands and options</summary>
<br />
You can trigger Dependabot actions by commenting on this PR:
- ``@dependabot` rebase` will rebase this PR
- ``@dependabot` recreate` will recreate this PR, overwriting any edits that have been made to it
- ``@dependabot` merge` will merge this PR after your CI passes on it
- ``@dependabot` squash and merge` will squash and merge this PR after your CI passes on it
- ``@dependabot` cancel merge` will cancel a previously requested merge and block automerging
- ``@dependabot` reopen` will reopen this PR if it is closed
- ``@dependabot` close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- ``@dependabot` ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- ``@dependabot` ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- ``@dependabot` ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
</details>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
708: Reduce memory usage of the MatchingWords structure r=ManyTheFish a=loiclec
# Pull Request
## Related issue
Fixes (partially) https://github.com/meilisearch/meilisearch/issues/3115
## What does this PR do?
1. Reduces the memory usage caused by the creation of a 10-word query tree by 20x.
This is done by deduplicating the `MatchingWord` values, which are heavy because of their inner DFA. The deduplication works by wrapping each `MatchingWord` in a reference-counted box and using a hash map to determine whether a `MatchingWord` DFA already exists for a certain signature, or whether a new one needs to be built.
2. Avoid the worst-case scenario of creating a `MatchingWord` for extremely long words that cannot be indexed by milli.
Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>
712: Fix bulk facet indexing bug r=Kerollmops a=loiclec
# Pull Request
## Related issue
Fixes (partially, until merged into meilisearch) https://github.com/meilisearch/meilisearch/issues/3165
## What does this PR do?
Fixes a bug where indexing certain numbers of filterable attribute values in bulk led to corrupted facet databases. This was due to a lossy integer conversion which would ultimately prevent entire levels of the facet database to be written into LMDB.
More specifically, this change was made:
```diff
- if cur_writer_len as u8 >= self.min_level_size {
+ if cur_writer_len >= self.min_level_size as usize {
```
I also checked other comparisons to `min_level_size` and other conversions such as `x as u8` in this part of the codebase.
Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>
711: Replace deprecated gh actions r=curquiza a=pnhatminh
# Pull Request
## Related issue
Fixes#678
## What does this PR do?
- Replace deprecated github action command with newly defined command.
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: Minh Pham <minh.pham@codelink.io>
3149: Fix the dump tests r=Kerollmops a=irevoire
You'll need to trust me on this one. But the tests in the release-v0.30.0 were deactivated for a long time, and I don't know what was wrong with them.
Anyway, I checked that these SHA did match the tasks view we're expecting, and it looks good to me.
Co-authored-by: Tamo <tamo@meilisearch.com>
* Fix error code of the "duplicate index found" error
* Use the content of the ProcessingTasks in the tasks cancelation system
* Change the missing_filters error code into missing_task_filters
* WIP Introduce the invalid_task_uid error code
* Use more precise error codes/message for the task routes
+ Allow star operator in delete/cancel tasks
+ rename originalQuery to originalFilters
+ Display error/canceled_by in task view even when they are = null
+ Rename task filter fields by using their plural forms
+ Prepare an error code for canceledBy filter
+ Only return global tasks if the API key action `index.*` is there
* Add canceledBy task filter
* Update tests following task API changes
* Rename original_query to original_filters everywhere
* Update more insta-snap tests
* Make clippy happy
They're a happy clip now.
* Make rustfmt happy
>:-(
* Fix Index name parsing error message to fit the specification
* Bump milli version to 0.35.1
* Fix the new error messages
* fix the error messages and add tests
* rename the error codes for the sake of consistency
* refactor the way we send the cli informations + add the analytics for the config file and ssl usage
* Apply suggestions from code review
Co-authored-by: Clément Renault <clement@meilisearch.com>
* add a comment over the new infos structure
* reformat, sorry @kero
* Store analytics for the documents deletions
* Add analytics on all the settings
* Spawn threads with names
* Spawn rayon threads with names
* update the distinct attributes to the spec update
* update the analytics on the search route
* implements the analytics on the health and version routes
* Fix task details serialization
* Add the question mark to the task deletion query filter
* Add the question mark to the task cancelation query filter
* Fix tests
* add analytics on the task route
* Add all the missing fields of the new task query type
* Create a new analytics for the task deletion
* Create a new analytics for the task creation
* batch the tasks seen events
* Update the finite pagination analytics
* add the analytics of the swap-indexes route
* Stop removing the DB when failing to read it
* Rename originalFilters into originalFilters
* Rename matchedDocuments into providedIds
* Add `workflow_dispatch` to flaky.yml
* Bump grenad to 0.4.4
* Bump milli to version v0.37.0
* Don't multiply total memory returned by sysinfo anymore
sysinfo now returns bytes rather than KB
* Add a dispatch to the publish binaries workflow
* Fix publish release CI
* Don't use gold but the default linker
* Always display details for the indexDeletion task
* Fix the insta tests
* refactorize the whole test suite
1. Make a call to assert_internally_consistent automatically when snapshoting the scheduler. There is no point in snapshoting something broken and expect the dumb humans to notice.
2. Replace every possible call to assert_internally_consistent by a snapshot of the scheduler. It takes as many lines and ensure we never change something without noticing in any tests ever.
3. Name every snapshots: it's easier to debug when something goes wrong and easier to review in general.
4. Stop skipping breakpoints, it's too easy to miss something. Now you must explicitely show which path is the scheduler supposed to use.
5. Add a timeout on the channel.recv, it eases the process of writing tests, now when something file you get a failure instead of a deadlock.
* rebase on release-v0.30
* makes clippy happy
* update the snapshots after a rebase
* try to remove the flakyness of the failing test
* Add more analytics on the ranking rules positions
* Update the dump test to check for the dumpUid dumpCreation task details
* send the ranking rules as a string because amplitude is too dumb to process an array as a single value
* Display a null dumpUid until we computed the dump itself on disk
* Update tests
* Check if the master key is missing before returning an error
Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>
Co-authored-by: bors[bot] <26634292+bors[bot]@users.noreply.github.com>
Co-authored-by: Kerollmops <clement@meilisearch.com>
Co-authored-by: ManyTheFish <many@meilisearch.com>
Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
707: Add all_obkv_to_json function r=Kerollmops a=GregoryConrad
## What does this PR do?
When embedding milli in an application (other than Meilisearch), it often makes sense to not use the `displayed_attributes` functionality and instead just use milli as a full document store. Thus, this PR adds a function, `all_obkv_to_json`, to supplement the already exposed `milli::obkv_to_json` so that those embedding milli *do not* need to deal with `displayed_attributes` if they don't need to.
~This PR also introduces a slight breaking change: `obkv_to_json` now accepts a reference to `obkv::KvReaderU16` instead of taking ownership of it. As far as I can tell, this seems like a change for the better (`obkv_to_json` only acts upon `obkv` rather than consuming it), but I can change it back if you so desire.~ (reverted in [935a724](935a724c57))
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: Gregory Conrad <gregorysconrad@gmail.com>
710: Update Clippy to use Rust Stable r=irevoire a=Kerollmops
This PR changes the CI to use Rust stable for Clippy and Rustfmt. This way we will reduce the number of times we break the CI. [The version will only change every two months or so](https://www.whatrustisit.com/).
Co-authored-by: Clément Renault <clement@meilisearch.com>
The issue was linked to the fact that the debug implementation of the PhantomData wasn't the same between rust stable and rust nightly.
This was causing an issue while snapshsotting the settings and this commit fix it by representing the settings as json which already ignores the PhantomData
3100: Add a dispatch to the publish binaries workflow r=Kerollmops a=curquiza
Add `worklfow_dispatch` event to publish binaries workflow to allow the manually trigger
Co-authored-by: Kerollmops <clement@meilisearch.com>
697: Fix bug in prefix DB indexing r=loiclec a=loiclec
Where the batch's information was not properly updated in cases where only the proximity changed between two consecutive word pair proximities.
Closes partially https://github.com/meilisearch/meilisearch/issues/3043
Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>
3041: Add `workflow_dispatch` to flaky.yml r=irevoire a=curquiza
To be able to run the job manual and don't wait for one week
Co-authored-by: Clémentine Urquizar - curqui <clementine@meilisearch.com>
702: Update version for the next release (v0.37.0) in Cargo.toml files r=ManyTheFish a=meili-bot
⚠️ This PR is automatically generated. Check the new version is the expected one before merging.
Co-authored-by: curquiza <curquiza@users.noreply.github.com>
696: Fix Facet Indexing bugs r=Kerollmops a=loiclec
1. Handle keys with variable length correctly
Closes (partially) https://github.com/meilisearch/meilisearch/issues/3042
This issue is now easily reproducible with the updated fuzz tests, which now generate keys with variable lengths.
2. Prevent adding facets to the database if their encoded value does not satisfy `valid_lmdb_key`.
Closes (partially) https://github.com/meilisearch/meilisearch/issues/2743
This fixes an indexing failure when a document had a filterable attribute containing a value whose length is higher than ~500 bytes. For now, this fix is just meant to prevent crashes. Better handling of long values of filterable attributes will be handled in a separate PR.
Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>
1. Handle keys with variable length correctly
This fixes https://github.com/meilisearch/meilisearch/issues/3042 and
is easily reproducible with the updated fuzz tests, which now generate
keys with variable lengths.
2. Prevent adding facets to the database if their encoded value does
not satisfy `valid_lmdb_key`.
This fixes an indexing failure when a document had a filterable
attribute containing a value whose length is higher than ~500 bytes.
3070: Remove core and use engine r=Kerollmops a=curquiza
Following the new team name
Not mandatory since GitHub is doing redirection, but more consistent
Co-authored-by: curquiza <clementine@meilisearch.com>
685: ci: Use pre-compiled binaries for faster CI r=irevoire a=azzamsa
# Pull Request
## Related issue
Fixes #<issue_number>
## What does this PR do?
- ...
## PR checklist
Please check if your PR fulfills the following requirements:
- [ ] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [ ] Have you read the contributing guidelines?
- [ ] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: azzamsa <me@azzamsa.com>
699: Force vendoring of LMDB even if a system version is available r=Kerollmops a=dureuill
# Pull Request
## Related issue
Related to https://github.com/meilisearch/meilisearch/issues/3017: will fix once ported to milli and meilisearch.
## What does this PR do?
- Force using vendored version of LMDB
- **don't use lmdb master3 branch anymore**: this is a bit of a side effect of using a tag instead of branch for heed as a dependency, but it is wanted anyway for now as lmdb master3 was more of an experiment
- **modifies CI to run `cargo check` on the release rather than the debug artifacts**. This is an attempt to reduce the necessary disk space and avoid "out of space" failures.
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
3015: Replace deprecated set-output in GitHub actions r=curquiza a=funilrys
# Pull Request
This patch fixes#3011.
This patch fixes the deprecation warning regarding the usage of `set-output`.
This patch fixes the issues by switching the following format:
```
echo ::set-output name=[name]::[value]
```
into the following format:
```
echo "[name]=[value]" >> ${GITHUB_OUTPUT}
```
## Related issue
Fixes#3011
## What does this PR do?
- Fix CI/CD deprecation warnings.
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: funilrys <contact@funilrys.com>
Co-authored-by: Clémentine Urquizar - curqui <clementine@meilisearch.com>
694: Update version for the next release (v0.36.0) in Cargo.toml files r=curquiza a=meili-bot
⚠️ This PR is automatically generated. Check the new version is the expected one before merging.
Co-authored-by: Kerollmops <Kerollmops@users.noreply.github.com>
693: use the lmdb-master.3 branch r=Kerollmops a=irevoire
After investigating https://github.com/meilisearch/meilisearch/issues/3017, we found out that it was due to lmdb and that, without any code change on our side, bumping using the lmdb-master-3 branch fix our issues.
But, we’re not really confident about what changed between the `mdb.master` and `mdb.master3` branches; thus this is a temporary change, and we hope we’ll be able to move to the new version of heed asap (either before the end of the pre-release or for the next release).
--------
The bug is hard to reproduce; I can reproduce it 100% of the time on my archlinux personal computer. But on a scaleway archlinux bare-metal machine, it doesn’t reproduce. It’s flaky on our test suite, but `@loiclec` was able to write a minimal test that reproduces it every time on macOS.
Basically, what happens is when there are multiple threads opening databases in a different directory at the same time.
If there are 10 or more threads running at the same time, lmdb starts throwing the `Invalid argument (os error 22)` error for no reason, we believe.
I would like to submit an issue to lmdb, but I don’t really have the time to write a test in C without heed currently.
`@hyc,` if you want to take a look at it, here is the repo that reproduces the issue on macOS: https://github.com/irevoire/heed-bug
Co-authored-by: Irevoire <tamo@meilisearch.com>
691: Update version for the next release (v0.35.1) in Cargo.toml files r=curquiza a=meili-bot
⚠️ This PR is automatically generated. Check the new version is the expected one before merging.
Co-authored-by: Kerollmops <Kerollmops@users.noreply.github.com>
689: Handle non-finite floats consistently in filters r=irevoire a=dureuill
# Pull Request
## Related issue
Related meilisearch/meilisearch#3000
## What does this PR do?
### User
- Filters using `field = inf`, (or `infinite`, `NaN`) now match the value as a string rather than returning an internal error.
- Filters using `field < inf` (or other comparison operators) now return an invalid_filter error rather than returning an internal error, much like when using `field < aaa`.
### Implementation
- Add new `NonFiniteFloat` error variants to the filter-parser errors
- Add `Token::parse_as_finite_float` that can fail both when the string is not a float and when the float is not finite
- Refactor `Filter::inner_evaluate` to always use `parse_as_finite_float` instead of just `parse`
- Add corresponding tests
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
673: Add clippy job r=ManyTheFish a=unvalley
# Pull Request
## Related issue
Fixes#231
## What does this PR do?
- fix some clippy errors remain
- add clippy job to CI (I set `nightly` as toolchain)
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Co-authored-by: unvalley <kirohi.code@gmail.com>
659: Fix clippy error to add clippy job on Ci r=Kerollmops a=unvalley
## Related PR
This PR is for #673
## What does this PR do?
- ~~add `Run Clippy` job to CI (rust.yml)~~
- apply `cargo clippy --fix` command
- fix some `cargo clippy` error manually (but warnings still remain on tests)
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Co-authored-by: unvalley <kirohi.code@gmail.com>
Co-authored-by: unvalley <38400669+unvalley@users.noreply.github.com>
679: Bump Swatinem/rust-cache from 2.0.0 to 2.0.1 r=curquiza a=dependabot[bot]
Bumps [Swatinem/rust-cache](https://github.com/Swatinem/rust-cache) from 2.0.0 to 2.0.1.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a href="https://github.com/Swatinem/rust-cache/releases">Swatinem/rust-cache's releases</a>.</em></p>
<blockquote>
<h2>v2.0.1</h2>
<ul>
<li>Primarily just updating dependencies to fix GitHub deprecation notices.</li>
</ul>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a href="https://github.com/Swatinem/rust-cache/blob/master/CHANGELOG.md">Swatinem/rust-cache's changelog</a>.</em></p>
<blockquote>
<h2>2.0.1</h2>
<ul>
<li>Primarily just updating dependencies to fix GitHub deprecation notices.</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a href="22c9328bcb"><code>22c9328</code></a> 2.0.1</li>
<li><a href="d4d463bd9b"><code>d4d463b</code></a> bump deps and rebuild</li>
<li><a href="c4652c677c"><code>c4652c6</code></a> Update <code>`@actions/core</code>` (<a href="https://github-redirect.dependabot.com/Swatinem/rust-cache/issues/83">#83</a>)</li>
<li><a href="76686c56f2"><code>76686c5</code></a> docs: Fix github workflows directory (<a href="https://github-redirect.dependabot.com/Swatinem/rust-cache/issues/79">#79</a>)</li>
<li><a href="1b43d2f2c3"><code>1b43d2f</code></a> remove outdated versioning note</li>
<li><a href="20b9201e8a"><code>20b9201</code></a> bump cargo hash</li>
<li><a href="0d72e5f9a0"><code>0d72e5f</code></a> revert explicit dir close</li>
<li><a href="86531941c2"><code>8653194</code></a> Merge branch 'master' of <a href="https://github.com/Swatinem/rust-cache">https://github.com/Swatinem/rust-cache</a></li>
<li><a href="be4be3720d"><code>be4be37</code></a> explicitly close dir handles, add more logging, cleanups</li>
<li><a href="213334cd98"><code>213334c</code></a> cargo update</li>
<li>Additional commits viewable in <a href="https://github.com/Swatinem/rust-cache/compare/v2.0.0...v2.0.1">compare view</a></li>
</ul>
</details>
<br />
[](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
You can trigger a rebase of this PR by commenting ``@dependabot` rebase`.
[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)
---
<details>
<summary>Dependabot commands and options</summary>
<br />
You can trigger Dependabot actions by commenting on this PR:
- ``@dependabot` rebase` will rebase this PR
- ``@dependabot` recreate` will recreate this PR, overwriting any edits that have been made to it
- ``@dependabot` merge` will merge this PR after your CI passes on it
- ``@dependabot` squash and merge` will squash and merge this PR after your CI passes on it
- ``@dependabot` cancel merge` will cancel a previously requested merge and block automerging
- ``@dependabot` reopen` will reopen this PR if it is closed
- ``@dependabot` close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- ``@dependabot` ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- ``@dependabot` ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- ``@dependabot` ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
</details>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
This patch fixes#3011.
This patch fixes the depracation warning regarding the usage of
`set-output`.
This patch fixes the issues by switching the following format:
```
echo ::set-output name=[name]::[value]
```
into the following format:
```
echo "[name]=[value]" >> ${GITHUB_OUTPUT}
```
677: run the tests in all workspaces r=curquiza a=irevoire
With #676 I noticed the tests were not running in any of our sub crates.
Most of our sub crates didn't includes any tests though.
But the filter-parser did and we're lucky we never broke these one without noticing 😁
Co-authored-by: Irevoire <tamo@meilisearch.com>
This patch possibly fixes#2987.
This patch introduces a way to fill the IndexMetadata.created_at
and IndexMetadata.updated_at keys from the tasks events.
This is done by reading the creation date of the first event
(created_at) and the creation date of the last event (updated_at).
676: chore: added `IN`,`NOT IN` to `invalid_filter` msg r=irevoire a=Pranav-yadav
# Pull Request
## Related issue
`Fixes` https://github.com/meilisearch/meilisearch/issues/3004
## What does this PR do?
- Improves correct error msg in response
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: Pranav Yadav <Pranavyadav3912@gmail.com>
3001: Implement Uuid codec for heed r=Kerollmops a=elbertronnie
# Pull Request
## Related issue
Fixes#2984
## What does this PR do?
- Created a new heed codec for uuid::Uuid named as UuidCodec
- Replaced SerdeBincode\<Uuid\> with UuidCodec
- Removed the TODO in code associated with this issue
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: Elbert Ronnie <elbert.ronniep@gmail.com>
3002: Fix dump import without instance uid r=Kerollmops a=irevoire
When creating a dump without any instance-uid (that can happen if you’ve always run meilisearch with the `--no-analytics` flag), you could get an error when trying to load the dump.
Co-authored-by: Irevoire <tamo@meilisearch.com>
675: Deleted empty files r=Kerollmops a=SKVKPandey
# Pull Request
## Related issue
Fixes#674
## What does this PR do?
Delete empty files:
- `milli/src/heed_codec/facet/facet_string_level_zero_value_codec.rs`
- `milli/src/heed_codec/facet/facet_string_zero_bounds_value_codec.rs`
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: Shashank Kashyap <50551759+SKVKPandey@users.noreply.github.com>
664: Fix phrase search containing stop words r=ManyTheFish a=Samyak2
# Pull Request
This a WIP draft PR I wanted to create to let other potential contributors know that I'm working on this issue. I'll be completing this in a few hours from opening this.
## Related issue
Fixes#661 and towards fixing meilisearch/meilisearch#2905
## What does this PR do?
- [x] Change Phrase Operation to use a `Vec<Option<String>>` instead of `Vec<String>` where `None` corresponds to a stop word
- [x] Update all other uses of phrase operation
- [x] Update `resolve_phrase`
- [x] Update `create_primitive_query`?
- [x] Add test
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Co-authored-by: Samyak S Sarnayak <samyak201@gmail.com>
Co-authored-by: Samyak Sarnayak <samyak201@gmail.com>
2981: Move index swap error handling from meilisearch-http to index-scheduler r=irevoire a=loiclec
And make index_not_found error asynchronous, since we can't know whether the index will exist by the time the index swap task is processed.
Improve the index-swap test to verify that future tasks are not swapped and to test the new error messages that were introduced.
## Related issue
https://github.com/meilisearch/meilisearch/issues/2973
2996: Get rids of the unecessary tasks when an index_uid is specified r=Kerollmops a=irevoire
Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>
Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: Irevoire <tamo@meilisearch.com>
2982: Adapt task queries to account for special index swap rules r=irevoire a=loiclec
# Pull Request
## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/2970
## What does this PR do?
- Replace the `get_tasks` method with a `get_tasks_from_authorized_indexes` which returns the list of tasks matched by the query **from the point of view of the user**. That is, it takes into consideration the list of authorised indexes as well as the special case of `IndexSwap` which should not be returned if an index_uid is specified or if any of its associated indexes are not authorised.
- Adapt the code in other places following this change
- Add some tests
- Also the method `get_task_ids_from_authorized_indexes` now takes a read transaction as argument. This is because we want to make sure that the implementation of `get_tasks_from_authorized_indexes` only uses one read transaction. Otherwise, we could (1) get a list of task ids matching the query, then (2) one of these task ids is deleted by a taskDeletion task, and finally (3) we try to get the `Task`s associated with each returned task ids, and get a `CorruptedTaskQueue` error.
Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>
2993: Reconsider the Windows tests r=irevoire a=Kerollmops
This PR removes the `ignore` cfg on top of a lot of our tests. Now that we reworked the index scheduler we can make them pass again!
Fixes#2038, fixes#1966.
Co-authored-by: Clément Renault <clement@meilisearch.com>
2991: Update version for the next release (v0.30.0) in Cargo.toml files r=Kerollmops a=meili-bot
⚠️ This PR is automatically generated. Check the new version is the expected one before merging.
Co-authored-by: curquiza <curquiza@users.noreply.github.com>
2990: isolate the search in another task r=Kerollmops a=irevoire
In case there is a failure on milli's side that should avoid blocking the tokio main thread
Co-authored-by: Irevoire <tamo@meilisearch.com>
And make index_not_found error asynchronous, since we can't know
whether the index will exist by the time the index swap task is
processed.
Improve the index-swap test to verify that future tasks are not swapped
and to test the new error messages that were introduced.
2763: Index scheduler r=Kerollmops a=irevoire
Fix https://github.com/meilisearch/meilisearch/issues/2725
- [x] Durability of the tasks once an answer has been sent to the user.
- [x] Fix the analytics
- [x] Disable the auto-batching system.
- [x] Make sure the task scheduler run if there are tasks to process.
- [x] Auto-batching of enqueued tasks:
- [x] Do not batch operations from two different indexes.
- [x] Document addition.
- [x] Document updates.
- [x] Settings.
- [x] Document deletion.
- [x] Make sure that we only merge batches with the same index-creation rights:
- [x] the batch either starts with a `yes`
- [x] [we only batch `no`s together and stop batching when we encounter a `yes`](https://www.youtube.com/watch?v=O27mdRvR1GY)
- [x] Unify the logic about `false` and `true` index creation rights.
- [ ] Execute all batch kind:
- [x] Import dumps at startup time.
- [x] Export dumps i.e. export the tasks queue.
- [x] Document addition
- [x] Document update
- [x] Document deletion.
- [x] Clear all documents.
- [x] Update the settings of an index.
- [ ] Merge multiple settings into a single one.
- [x] Index update e.g. Create an Index, change an index primary key, delete an index.
- [x] Cancel enqueued or processing tasks (with filters) (don't count tasks from forbidden indexes) (can't cancel a task with a higher or equal task_id than your own).
- [x] Delete processed tasks from the task store (with filters) (don't count tasks from forbidden indexes) (can't flush a task with a higher or equal task_id than your own)
- [x] Document addition + settings
- [x] Document addition + settings + clear all documents
- [x] anything + index deletion
- [x] Snapshot
- [x] Make the `SnapshotCreation` task visible.
- [x] Snapshot tasks are scheduled by a detached thread.
- [x] Only include update files that are useful.
- [x] Check that statuses and details are correctly set. (ie; if you enqueue a `documentAddition`, is the `documentReceived` well set?)
- [x] Prioritize and reorder tasks i.e. Index deletion, Delete all the documents.
- [x] Always accept new tasks without blocking.
- [x] Fairly share the loads over the different indexes e.g. Always process the index queue with the lowest id.
- [x] Easily testable.
- [x] Well tested i.e. tasks reordering, tasks prioritizing, use atomic barriers to block the tasks for tests.
- [x] Dump
- [x] Serialize the uuid as string in the keys
- [x] Create a dump crate with getters and setters
- [x] Serialize the API key in the dump task
- [x] Get the instance-uuid in the dump task
- [x] List and filter tasks:
- [x] Paginate the tasks.
- [x] Filter by index name.
- [x] Filter on the status, the enqueued, processing, and finished tasks.
- [x] Filter on the type of task.
- [x] Check that it works in `meilisearch-http`.
- [x] Think about [the index wrapper](2c4c14caa8/index/src/updates.rs (L269)) and probably move or remove it.
- [x] Reduce the amount of copy/paste for the batched operations by creating a sub-enum for the `Batch` enum.
- [x] Move the `IndexScheduler` in the lib.rs file.
- [x] Think about the `MilliError` type and probably remove it.
- [x] Remove the `index` crate entirely
- [x] Remove the `Kind` type from the `TaskView` and introduce another type, remove the `<Kind as FromStr>`.
- [x] Once the point above is done; remove the unreachable variant from the autobatchingkind
- [x] Rename the `Settings` task `Kind` to `SettingsUpdate`
- [x] Rename the `DumpExport` task `Kind` to `DumpExport`
- [x] Path the error message when deserializing a `Kind` and `Status`.
- [x] Check the version file when starting.
- [x] Copy the version file when creating snapshots.
---------
Once everything above is done;
- [ ] Check what happens with the update files i.e. when are they deleted.
- [ ] When a TaskDeletion occurs
- [ ] When a TaskCancelation
- [ ] When a task is finished
- [ ] When a task fails
- [ ] When importing a dump forward the date to milli
- [ ] Add tests for the snapshots.
- [ ] Look at all the places where we put _TODOs_.
- [ ] Rename a bunch of things, see https://github.com/meilisearch/meilisearch/pull/2917
- [ ] Ensure that when compiling meilisearch-http with `no-default-features` it doesn’t pull lindera etc
- [ ] Run a bunch of operations in a `tokio::spawn_blocking`
- [ ] The search requests
- [ ] Issue to create once this is merged:
- [ ] Realtime progressing status e.g. Websocket events (optional).
- [ ] Implement an `Uuid` codec instead of using a `Bincode<Uuid>`.
- [ ] Handle the dump-v1
- [ ] When importing a dump v1 we could iterate over the whole task queue to find the creation and last update date
- [ ] When importing a dump v2 we could iterate over the whole task queue to find the creation and last update date
- [ ] When importing a dump v3 we could iterate over the whole task queue to find the creation and last update date
- [ ] When importing a dump v4 we could iterate over the whole task queue to find the creation and last update date
- [ ] When importing a dump v5 we could iterate over the whole task queue to find the creation and last update date
Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: Kerollmops <clement@meilisearch.com>
Co-authored-by: Loïc Lecrenier <loic@meilisearch.com>
Use rfc3339 or YYYY-MM-DD.
Add a day to the parsed date when it is an excluded lower bound
and the YYYY-MM-DD was used.
Also the Query type does not need to be serialisable anymore
2922: Add new error when using /keys without masterkey set r=ManyTheFish a=vishalsodani
# Pull Request
## Related issue
Fixes#2918
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Co-authored-by: vishalsodani <vishalsodani@rediffmail.com>
619: Refactor the Facets databases to enable incremental indexing r=curquiza a=loiclec
# Pull Request
## What does this PR do?
Party fixes https://github.com/meilisearch/milli/issues/605 by making the indexing of the facet databases (i.e. `facet_id_f64_docids` and `facet_id_string_docids`) incremental. It also closes#327 and https://github.com/meilisearch/meilisearch/issues/2820 . Two more untracked bugs were also fixed:
1. The facet distribution algorithm did not respect the `maxFacetValues` parameter when there were only a few candidate document ids.
2. The structure of the levels > 0 of the facet databases were not updated following the deletion of documents
## How to review this PR
First, read this comment to get an overview of the changes.
Then, based on this comment, raise any concerns you might have about:
1. the new structure of the databases
2. the algorithms for sort, facet distribution, and range search
3. the new/removed heed codecs
Then, weigh in on the following concerns:
1. adding `fuzzcheck` as a fuzz-only dependency may add too much complexity for the benefits it provides
2. the `ByteSliceRef` and `StrRefCodec` are misnamed or should not exist
3. the new behaviour of facet distributions can be considered incorrect
4. incremental deletion is useless given that documents are always deleted in bulk
## What's left for me to do
1. Re-read everything once to make sure I haven't forgotten anything
2. Wait for the results of the benchmarks and see if (1) they provide enough information (2) there was any change in performance, especially for search queries. Then, maybe, spend some time optimising the code.
3. Test whether the `info`/`http-ui` crates survived the refactor
## Old structure of the `facet_id_f64_docids` and `facet_id_string_docids` databases
Previously, these two databases had different but conceptually similar structures. For each field id, the facet number database had the following format:
```
┌───────────────────────────────┬───────────────────────────────┬───────────────┐
┌───────┐ │ 1.2 – 2 │ 3.4 – 100 │ 102 – 104 │
│Level 2│ │ │ │ │
└───────┘ │ a, b, d, f, z │ c, d, e, f, g │ u, y │
├───────────────┬───────────────┼───────────────┬───────────────┼───────────────┤
┌───────┐ │ 1.2 – 1.3 │ 1.6 – 2 │ 3.4 – 12 │ 12.3 – 100 │ 102 – 104 │
│Level 1│ │ │ │ │ │ │
└───────┘ │ a, b, d, z │ a, b, f │ c, d, g │ e, f │ u, y │
├───────┬───────┼───────┬───────┼───────┬───────┼───────┬───────┼───────┬───────┤
┌───────┐ │ 1.2 │ 1.3 │ 1.6 │ 2 │ 3.4 │ 12 │ 12.3 │ 100 │ 102 │ 104 │
│Level 0│ │ │ │ │ │ │ │ │ │ │ │
└───────┘ │ a, b │ d, z │ b, f │ a, f │ c, d │ g │ e │ e, f │ y │ u │
└───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┘
```
where the first line is the key of the database, consisting of :
- the field id
- the level height
- the left and right bound of the group
and the second line is the value of the database, consisting of:
- a bitmap of all the docids that have a facet value within the bounds
The `facet_id_string_docids` had a similar structure:
```
┌───────────────────────────────┬───────────────────────────────┬───────────────┐
┌───────┐ │ 0 – 3 │ 4 – 7 │ 8 – 9 │
│Level 2│ │ │ │ │
└───────┘ │ a, b, d, f, z │ c, d, e, f, g │ u, y │
├───────────────┬───────────────┼───────────────┬───────────────┼───────────────┤
┌───────┐ │ 0 – 1 │ 2 – 3 │ 4 – 5 │ 6 – 7 │ 8 – 9 │
│Level 1│ │ "ab" – "ac" │ "ba" – "bac" │ "gaf" – "gal" │"form" – "wow" │ "woz" – "zz" │
└───────┘ │ a, b, d, z │ a, b, f │ c, d, g │ e, f │ u, y │
├───────┬───────┼───────┬───────┼───────┬───────┼───────┬───────┼───────┬───────┤
┌───────┐ │ "ab" │ "ac" │ "ba" │ "bac" │ "gaf" │ "gal" │ "form"│ "wow" │ "woz" │ "zz" │
│Level 0│ │ "AB" │ " Ac" │ "ba " │ "Bac" │ " GAF"│ "gal" │ "Form"│ " wow"│ "woz" │ "ZZ" │
└───────┘ │ a, b │ d, z │ b, f │ a, f │ c, d │ g │ e │ e, f │ y │ u │
└───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┘
```
where, **at level 0**, the key is:
* the normalised facet value (string)
and the value is:
* the original facet value (string)
* a bitmap of all the docids that have this normalised string facet value
**At level 1**, the key is:
* the left bound of the range as an index in level 0
* the right bound of the range as an index in level 0
and the value is:
* the left bound of the range as a normalised string
* the right bound of the range as a normalised string
* a bitmap of all the docids that have a string facet value within the bounds
**At level > 1**, the key is:
* the left bound of the range as an index in level 0
* the right bound of the range as an index in level 0
and the value is:
* a bitmap of all the docids that have a string facet value within the bounds
## New structure of the `facet_id_f64_docids` and `facet_id_string_docids` databases
Now both the `facet_id_f64_docids` and `facet_id_string_docids` databases have the exact same structure:
```
┌───────────────────────────────┬───────────────────────────────┬───────────────┐
┌───────┐ │ "ab" (2) │ "gaf" (2) │ "woz" (1) │
│Level 2│ │ │ │ │
└───────┘ │ [a, b, d, f, z] │ [c, d, e, f, g] │ [u, y] │
├───────────────┬───────────────┼───────────────┬───────────────┼───────────────┤
┌───────┐ │ "ab" (2) │ "ba" (2) │ "gaf" (2) │ "form" (2) │ "woz" (2) │
│Level 1│ │ │ │ │ │ │
└───────┘ │ [a, b, d, z] │ [a, b, f] │ [c, d, g] │ [e, f] │ [u, y] │
├───────┬───────┼───────┬───────┼───────┬───────┼───────┬───────┼───────┬───────┤
┌───────┐ │ "ab" │ "ac" │ "ba" │ "bac" │ "gaf" │ "gal" │ "form"│ "wow" │ "woz" │ "zz" │
│Level 0│ │ │ │ │ │ │ │ │ │ │ │
└───────┘ │ [a, b]│ [d, z]│ [b, f]│ [a, f]│ [c, d]│ [g] │ [e] │ [e, f]│ [y] │ [u] │
└───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┘
```
where for all levels, the key is a `FacetGroupKey<T>` containing:
* the field id (`u16`)
* the level height (`u8`)
* the left bound of the range (`T`)
and the value is a `FacetGroupValue` containing:
* the number of elements from the level below that are part of the range (`u8`, =0 for level 0)
* a bitmap of all the docids that have a facet value within the bounds (`RoaringBitmap`)
The right bound of the range is now implicit, it is equal to `Excluded(next_left_bound)`.
In the code, the key is always encoded using `FacetGroupKeyCodec<C>` where `C` is the codec used to encode the facet value (either `OrderedF64Codec` or `StrRefCodec`) and the value is encoded with `FacetGroupValueCodec`.
Since both databases share the same structure, we can implement almost all operations only once by treating the facet value as a byte slice (i.e. `FacetGroupKey<&[u8]>` encoded as `FacetGroupKeyCodec<ByteSliceRef>`). This is, in my opinion, a big simplification.
The reason for changing the structure of the databases is to make it possible to incrementally add a facet value to an existing database. Since the `facet_id_string_docids` used to store indices to `level 0` in all levels > 0, adding an element to level 0 would potentially invalidate all the indices.
Note that the original string value of a facet is no longer stored in this database.
## Incrementally adding a facet value
Here I describe how we can add a facet value to the new database incrementally. If we want to add the document with id `z` and facet value `gap`., then we want to add/modify the elements highlighted below in pink:
<img width="946" alt="Screenshot 2022-09-12 at 10 14 54" src="https://user-images.githubusercontent.com/6040237/189605532-fe4b0f52-e13d-4b3c-92d9-10c705953e3d.png">
which results in:
<img width="662" alt="Screenshot 2022-09-12 at 10 23 29" src="https://user-images.githubusercontent.com/6040237/189607015-c3a37588-b825-43c2-878a-f8f85c000b94.png">
* one element was added in level 0
* one key/value was modified in level 1
* one value was modified in level 2
Adding this element was easy since we could simply add it to level 0 and then increase the `group_size` part of the value for the level above. However, in order to keep the structure balanced, we can't always do this. If the group size reaches a threshold (`max_group_size`), then we split the node into two. For example, let's imagine that `max_group_size` is `4` and we add the docid `y` with facet value `gas`. First, we add it in level 0:
<img width="904" alt="Screenshot 2022-09-12 at 10 30 40" src="https://user-images.githubusercontent.com/6040237/189608391-531f9df1-3424-4f1f-8344-73eb194570e5.png">
Then, we realise that the group size of its parent is going to reach the maximum group size (=4) and thus we split the parent into two nodes:
<img width="919" alt="Screenshot 2022-09-12 at 10 33 16" src="https://user-images.githubusercontent.com/6040237/189608884-66f87635-1fc6-41d2-a459-87c995491ac4.png">
and since we inserted an element in level 1, we also update level 2 accordingly, by increasing the group size of the parent:
<img width="915" alt="Screenshot 2022-09-12 at 10 34 42" src="https://user-images.githubusercontent.com/6040237/189609233-d4a893ff-254a-48a7-a5ad-c0dc337f23ca.png">
We also have two other parameters:
* `group_size` is the default group size when building the database from scratch
* `min_level_size` is the minimum number of elements that a level should contain
When the highest level size is greater than `group_size * min_level_size`, then we create an additional level above it.
There is one more edge case for the insertion algorithm. While we normally don't modify the existing left bounds of a key, we have to do it if the facet value being inserted is smaller than the first left bound. For example, inserting `"aa"` with the docid `w` would change the database to:
<img width="756" alt="Screenshot 2022-09-12 at 10 41 56" src="https://user-images.githubusercontent.com/6040237/189610637-a043ef71-7159-4bf1-b4fd-9903134fc095.png">
The root of the code for incremental indexing is the `FacetUpdateIncremental` builder.
## Incrementally removing a facet value
TODO: the algorithm was implemented and works, but its current API is: `fn delete(self, facet_value, single_docid)`. It removes the given document id from all keys containing the given facet value. I don't think it is the right way to implement it anymore. Perhaps a bitmap of docids should be given instead. This is fairly easy to do. But since we batch document deletions together (because of soft deletion), it's not clear to me anymore that incremental deletion should be implemented at all.
## Bulk insertion
While it's faster to incrementally add a single facet value to the database, it is sometimes **slower** to repeatedly add facet values one-by-one instead of doing it in bulk. For example, during initial indexing, we'd like to build the database from a list of facet values and associated document ids in one go. The `FacetUpdateBulk` builder provides a way to do so. It works by:
1. clearing all levels > 0 from the DB
2. adding all new elements in level 0
3. rebuilding the higher levels from scratch
The algorithm for bulk insertion is the same as the previous one.
## Choosing between incremental and bulk insertion
On my computer, I measured that is about 50x slower to add N facet values incrementally than it is to re-build a database with N facet values in level 0. Therefore, we dynamically choose to use either incremental insertion or bulk insertion based on (1) the number of existing elements in level 0 of the database and (2) the number of facet values from the new documents.
This is imprecise but is mainly aimed at avoiding the worst-case scenario where the incremental insertion method is used repeatedly millions of times.
## Fuzz-testing
**Potentially controversial:**
I fuzz-tested incremental addition and deletion using fuzzcheck, which found many bugs. The fuzz-test consists of inserting/deleting facet values and docids in succession, each operation is processed with different parameters for `group_size`, `max_group_size`, and `min_level_size`. After all the operations are processed, the content of level 0 is compared to the content of an equivalent structure with a simple and easily-checked implementation. Furthermore, we check that the database has a correct structure (all groups from levels > 0 correctly combine the content of their children). I also visualised the code coverage found by the fuzz-test. It covered 100% of the relevant code except for `unreachable/panic` statements and errors returned by `heed`.
The fuzz-test and the fuzzcheck dependency are only compiled when `cargo fuzzcheck` is used. For now, the dependency is from a local path on my computer, but it can be changed to a crate version if we decide to keep it.
## Algorithms operating on the facet databases
There are four important algorithms making use of the facet databases:
1. Sort, ascending
2. Sort, descending
3. Facet distribution
4. Range search
Previously, the implementation of all four algorithms was based on a number of iterators specific to each database kind (number or string): `FacetNumberRange`, `FacetNumberRevRange`, `FacetNumberIter` (with a reversed and reducing/non-reducing option), `FacetStringGroupRange`, `FacetStringGroupRevRange`, `FacetStringLevel0Range`, `FacetStringLevel0RevRange`, and `FacetStringIter` (reversed + reducing/non-reducing).
Now, all four algorithms have a unique implementation shared by both the string and number databases. There are four functions:
1. `ascending_facet_sort` in `search/facet/facet_sort_ascending.rs`
2. `descending_facet_sort` in `search/facet/facet_sort_descending.rs`
3. `iterate_over_facet_distribution` in `search/facet/facet_distribution_iter.rs`
4. `find_docids_of_facet_within_bounds` in `search/facet/facet_range_search.rs`
I have tried to test them with some snapshot tests but more testing could still be done. I don't *think* that the performance of these algorithms regressed, but that will need to be confirmed by benchmarks.
## Change of behaviour for facet distributions
Previously, the original string value of a facet was stored in the level 0 of `facet_id_string_docids `. This is no longer the case. The original string value was used in the implementation of the facet distribution algorithm. Now, to recover it, we pick a random document id which contains the normalised string value and look up the original one in `field_id_docid_facet_strings`. As a consequence, it may be that the string value returned in the field distribution does not appear in any of the candidates. For example,
```json
{ "id": 0, "colour": "RED" }
{ "id": 1, "colour": "red" }
```
Facet distribution for the `colour` field among the candidates `[1]`:
```
{ "RED": 1 }
```
Here, "RED" was given as the original facet value even though it does not appear in the document id `1`.
## Heed codecs
A number of heed codecs related to the facet databases were removed:
* `FacetLevelValueF64Codec`
* `FacetLevelValueU32Codec`
* `FacetStringLevelZeroCodec`
* `StringValueCodec`
* `FacetStringZeroBoundsValueCodec`
* `FacetValueStringCodec`
* `FieldDocIdFacetStringCodec`
* `FieldDocIdFacetF64Codec`
They were replaced by:
* `FacetGroupKeyCodec<C>` (replaces all key codecs for the facet databases)
* `FacetGroupValueCodec` (replaces all value codecs for the facet databases)
* `FieldDocIdFacetCodec<C>` (replaces `FieldDocIdFacetStringCodec` and `FieldDocIdFacetF64Codec`)
Since the associated encoded item of `FacetGroupKeyCodec<C>` is `FacetKey<T>` and we often work with `FacetKey<&[u8]>` and `FacetKey<&str>`, then we need to have codecs that encode values of type `&str` and `&[u8]`. The existing `ByteSlice` and `Str` codecs do not work for that purpose (their `EItem` are `[u8]` and `str`), I have also created two new codecs:
* `ByteSliceRef` is a codec with a `EItem = DItem = &[u8]`
* `StrRefCodec` is a codec with a `EItem = DItem = &str`
I have also factored out the code used to encode an ordered f64 into its own `OrderedF64Codec`.
Co-authored-by: Loïc Lecrenier <loic@meilisearch.com>
668: Fix many Clippy errors part 2 r=ManyTheFish a=ehiggs
This brings us a step closer to enforcing clippy on each build.
# Pull Request
## Related issue
This does not fix any issue outright, but it is a second round of fixes for clippy after https://github.com/meilisearch/milli/pull/665. This should contribute to fixing https://github.com/meilisearch/milli/pull/659.
## What does this PR do?
Satisfies many issues for clippy. The complaints are mostly:
* Passing reference where a variable is already a reference.
* Using clone where a struct already implements `Copy`
* Using `ok_or_else` when it is a closure that returns a value instead of using the closure to call function (hence we use `ok_or`)
* Unambiguous lifetimes don't need names, so we can just use `'_`
* Using `return` when it is not needed as we are on the last expression of a function.
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: Ewan Higgs <ewan.higgs@gmail.com>
671: Update version for the next release (v0.35.0) in Cargo.toml files r=Kerollmops a=meili-bot
⚠️ This PR is automatically generated. Check the new version is the expected one before merging.
Co-authored-by: curquiza <curquiza@users.noreply.github.com>
e.g. add one facet value incrementally with a group_size = X and then
add another one with group_size = Y
It is not actually possible to do so with the public API of milli,
but I wanted to make sure the algorithm worked well in those cases
anyway.
The bugs were found by fuzzing the code with fuzzcheck, which I've added
to milli as a conditional dev-dependency. But it can be removed later.
616: Introduce an indexation abortion function when indexing documents r=Kerollmops a=Kerollmops
Co-authored-by: Kerollmops <clement@meilisearch.com>
Co-authored-by: Clément Renault <clement@meilisearch.com>
669: Add method to create a new Index with specific creation dates r=irevoire a=loiclec
This functionality is needed to implement the import of dumps correctly.
Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>
2961: Changed error message for config file r=curquiza a=LunarMarathon
# Pull Request
## Related issue
Fixes#2959
## What does this PR do?
- Changed the error message as required in the issue - changed config to configuration
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: LunarMarathon <lmaytan24@gmail.com>
2951: Update mini-dashboard to v0.2.3 r=curquiza a=mdubus
# Pull Request
## What does this PR do?
- Update the mini-dashboard with its latest release (v0.2.3)
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: Morgane Dubus <30866152+mdubus@users.noreply.github.com>
2930: fix wrong variant returned for invalid_api_key_indexes error r=ManyTheFish a=vishalsodani
# Pull Request
## Related issue
Fixes#2924
## What does this PR do?
- ...
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: vishalsodani <vishalsodani@rediffmail.com>
667: Update version for the next release (v0.34.0) in Cargo.toml files r=curquiza a=meili-bot
⚠️ This PR is automatically generated. Check the new version is the expected one before merging.
Co-authored-by: curquiza <curquiza@users.noreply.github.com>
665: Fixing piles of clippy errors. r=ManyTheFish a=ehiggs
## Related issue
No issue fixed. Simply cleaning up some code for clippy on the march towards a clean build when #659 is merged.
## What does this PR do?
Most of these are calling clone when the struct supports Copy.
Many are using & and &mut on `self` when the function they are called from already has an immutable or mutable borrow so this isn't needed.
I tried to stay away from actual changes or places where I'd have to name fresh variables.
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Co-authored-by: Ewan Higgs <ewan.higgs@gmail.com>
2928: Improve default config file r=curquiza a=curquiza
Following https://github.com/meilisearch/meilisearch/pull/2745#pullrequestreview-1115361274
`@gmourier` `@maryamsulemani97,` here are the changes I applied
- [After discussing with the cloud team](https://meilisearch.slack.com/archives/C03T1T47TUG/p1666014688582379) (internal link only), I let commented and uncommented the variables depending on their role.
- The only boolean variable I let be commented is the `no_analytics` one, and I wrote the non-default value (i.e. `true`)-> it will be easier for the users to disable analytics by only uncommenting the line. Plus, `no_analytics = false` is a double negative and not really easy to understand.
- I removed the `INDEX` section that was not really related to indexes
- I keep only 4 sections: SSL, Dumps, Snapshots, and an untitled section with all the uncategorized variables. I gathered all the uncategorized variable at the top of the file, so in the "untitled" section.
- I copied/pasted the really basic [documentation description](https://docs.meilisearch.com/learn/configuration/instance_options.html) of the variables
- I added every doc linked to each variable.
Let me know if you agree or not with my choices!
Thanks for your review 🙏
Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
586: Add settings to force milli to exhaustively compute the total number of hits r=Kerollmops a=ManyTheFish
Add a new setting `exhaustive_number_hits` to `Search` forcing the `Initial` criterion to exhaustively compute the bucket_candidates allowing the end users to implement finite pagination.
related to https://github.com/meilisearch/meilisearch/pull/2601
Co-authored-by: ManyTheFish <many@meilisearch.com>
Co-authored-by: Many the fish <many@meilisearch.com>
2913: download-latest: some refactoring r=curquiza a=nfsec
# Pull Request
## What does this PR do?
- Usually the elevation of variables.
## PR checklist
Please check if your PR fulfills the following requirements:
- [X] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [X] Have you read the contributing guidelines?
- [X] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: Patryk Krawaczyński <nfsec@users.noreply.github.com>
Co-authored-by: Clémentine Urquizar - curqui <clementine@meilisearch.com>
2851: Upgrade clap to 4.0 r=loiclec a=choznerol
# Pull Request
## Related issue
Fixes#2846
This PR is draft based on #2847 to avoid conflict. I will rebase and mark as 'Ready for review' after #2847 is merged.
## What does this PR do?
1. Upgrade clap to the latest version or 4.0 (4.0.9 as of today) by following the [migrating instruction](https://github.com/clap-rs/clap/blob/master/CHANGELOG.md#migrating) from [4.0 changelog](https://github.com/clap-rs/clap/blob/master/CHANGELOG.md#migrating)
2. Fix an `ArgGroup` typo that can only be caught after upgrading to 4.0 in 20a715e29ed17c5a76229c98fb31504ada873597
## Notable changes
### The `--help` message
The format, ordering and indentation of `--help` message was changed in 4.0. I recorded the output of `cargo run -- --help` before and after upgrade to 4.0 for reference.
<details>
<summary>diff</summary>
Output of `diff --ignore-all-space --text --unified --new-file help-message-before.txt help-message-after.txt`:
```diff
--- help-message-before.txt 2022-10-14 16:45:36.000000000 +0800
+++ help-message-after.txt 2022-10-14 16:36:53.000000000 +0800
`@@` -1,12 +1,8 `@@`
-meilisearch-http 0.29.1
+Usage: meilisearch [OPTIONS]
-USAGE:
- meilisearch [OPTIONS]
-
-OPTIONS:
+Options:
--config-file-path <CONFIG_FILE_PATH>
- Set the path to a configuration file that should be used to setup the engine. Format
- must be TOML
+ Set the path to a configuration file that should be used to setup the engine. Format must be TOML
--db-path <DB_PATH>
Designates the location where database files will be created and retrieved
`@@` -26,15 +22,14 `@@`
[default: dumps/]
--env <ENV>
- Configures the instance's environment. Value must be either `production` or
- `development`
+ Configures the instance's environment. Value must be either `production` or `development`
[env: MEILI_ENV=]
[default: development]
[possible values: development, production]
-h, --help
- Print help information
+ Print help information (use `-h` for a summary)
--http-addr <HTTP_ADDR>
Sets the HTTP address and port Meilisearch will use
`@@` -43,63 +38,53 `@@`
[default: 127.0.0.1:7700]
--http-payload-size-limit <HTTP_PAYLOAD_SIZE_LIMIT>
- Sets the maximum size of accepted payloads. Value must be given in bytes or explicitly
- stating a base unit (for instance: 107374182400, '107.7Gb', or '107374 Mb')
+ Sets the maximum size of accepted payloads. Value must be given in bytes or explicitly stating a base unit (for instance: 107374182400, '107.7Gb', or '107374 Mb')
[env: MEILI_HTTP_PAYLOAD_SIZE_LIMIT=]
[default: 100000000]
--ignore-dump-if-db-exists
- Prevents a Meilisearch instance with an existing database from throwing an error when
- using `--import-dump`. Instead, the dump will be ignored and Meilisearch will launch
- using the existing database.
+ Prevents a Meilisearch instance with an existing database from throwing an error when using `--import-dump`. Instead, the dump will be ignored and Meilisearch will launch using the existing database.
This option will trigger an error if `--import-dump` is not defined.
[env: MEILI_IGNORE_DUMP_IF_DB_EXISTS=]
--ignore-missing-dump
- Prevents Meilisearch from throwing an error when `--import-dump` does not point to a
- valid dump file. Instead, Meilisearch will start normally without importing any dump.
+ Prevents Meilisearch from throwing an error when `--import-dump` does not point to a valid dump file. Instead, Meilisearch will start normally without importing any dump.
This option will trigger an error if `--import-dump` is not defined.
[env: MEILI_IGNORE_MISSING_DUMP=]
--ignore-missing-snapshot
- Prevents a Meilisearch instance from throwing an error when `--import-snapshot` does not
- point to a valid snapshot file.
+ Prevents a Meilisearch instance from throwing an error when `--import-snapshot` does not point to a valid snapshot file.
This command will throw an error if `--import-snapshot` is not defined.
[env: MEILI_IGNORE_MISSING_SNAPSHOT=]
--ignore-snapshot-if-db-exists
- Prevents a Meilisearch instance with an existing database from throwing an error when
- using `--import-snapshot`. Instead, the snapshot will be ignored and Meilisearch will
- launch using the existing database.
+ Prevents a Meilisearch instance with an existing database from throwing an error when using `--import-snapshot`. Instead, the snapshot will be ignored and Meilisearch will launch using the existing database.
This command will throw an error if `--import-snapshot` is not defined.
[env: MEILI_IGNORE_SNAPSHOT_IF_DB_EXISTS=]
--import-dump <IMPORT_DUMP>
- Imports the dump file located at the specified path. Path must point to a `.dump` file.
- If a database already exists, Meilisearch will throw an error and abort launch
+ Imports the dump file located at the specified path. Path must point to a `.dump` file. If a database already exists, Meilisearch will throw an error and abort launch
[env: MEILI_IMPORT_DUMP=]
--import-snapshot <IMPORT_SNAPSHOT>
- Launches Meilisearch after importing a previously-generated snapshot at the given
- filepath
+ Launches Meilisearch after importing a previously-generated snapshot at the given filepath
[env: MEILI_IMPORT_SNAPSHOT=]
--log-level <LOG_LEVEL>
Defines how much detail should be present in Meilisearch's logs.
- Meilisearch currently supports five log levels, listed in order of increasing verbosity:
- ERROR, WARN, INFO, DEBUG, TRACE.
+ Meilisearch currently supports five log levels, listed in order of increasing verbosity: ERROR, WARN, INFO, DEBUG, TRACE.
[env: MEILI_LOG_LEVEL=]
[default: INFO]
`@@` -110,31 +95,25 `@@`
[env: MEILI_MASTER_KEY=]
--max-index-size <MAX_INDEX_SIZE>
- Sets the maximum size of the index. Value must be given in bytes or explicitly stating a
- base unit (for instance: 107374182400, '107.7Gb', or '107374 Mb')
+ Sets the maximum size of the index. Value must be given in bytes or explicitly stating a base unit (for instance: 107374182400, '107.7Gb', or '107374 Mb')
[env: MEILI_MAX_INDEX_SIZE=]
[default: 107374182400]
--max-indexing-memory <MAX_INDEXING_MEMORY>
- Sets the maximum amount of RAM Meilisearch can use when indexing. By default,
- Meilisearch uses no more than two thirds of available memory
+ Sets the maximum amount of RAM Meilisearch can use when indexing. By default, Meilisearch uses no more than two thirds of available memory
[env: MEILI_MAX_INDEXING_MEMORY=]
[default: "21.33 TiB"]
--max-indexing-threads <MAX_INDEXING_THREADS>
- Sets the maximum number of threads Meilisearch can use during indexation. By default,
- the indexer avoids using more than half of a machine's total processing units. This
- ensures Meilisearch is always ready to perform searches, even while you are updating an
- index
+ Sets the maximum number of threads Meilisearch can use during indexation. By default, the indexer avoids using more than half of a machine's total processing units. This ensures Meilisearch is always ready to perform searches, even while you are updating an index
[env: MEILI_MAX_INDEXING_THREADS=]
[default: 5]
--max-task-db-size <MAX_TASK_DB_SIZE>
- Sets the maximum size of the task database. Value must be given in bytes or explicitly
- stating a base unit (for instance: 107374182400, '107.7Gb', or '107374 Mb')
+ Sets the maximum size of the task database. Value must be given in bytes or explicitly stating a base unit (for instance: 107374182400, '107.7Gb', or '107374 Mb')
[env: MEILI_MAX_TASK_DB_SIZE=]
[default: 107374182400]
```
- ~[help-message-before.txt](https://github.com/meilisearch/meilisearch/files/9715683/help-message-before.txt)~ [help-message-before.txt](https://github.com/meilisearch/meilisearch/files/9784156/help-message-before-2.txt)
- ~[help-message-after.txt](https://github.com/meilisearch/meilisearch/files/9715682/help-message-after.txt)~ [help-message-after.txt](https://github.com/meilisearch/meilisearch/files/9784091/help-message-after.txt)
</details>
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: Lawrence Chou <choznerol@protonmail.com>
The 'Tests on windows-latest' now failed with error message below
---- option::test::test_meilli_config_file_path_invalid stdout ----
thread 'option::test::test_meilli_config_file_path_invalid' panicked at 'assertion failed:
left: `"unable to open or read the \"../configgg.toml\" configuration file: The system cannot find the file specified. (os error 2)."`,
right: `"unable to open or read the \"../configgg.toml\" configuration file: No such file or directory (os error 2)."`', meilisearch-http\src\option.rs:555:17
https://github.com/meilisearch/meilisearch/actions/runs/3231941308/jobs/5291998750
Most of these are calling clone when the struct supports Copy.
Many are using & and &mut on `self` when the function they are called
from already has an immutable or mutable borrow so this isn't needed.
I tried to stay away from actual changes or places where I'd have to
name fresh variables.
2876: Full support for compressed (Gzip, Brotli, Zlib) API requests r=Kerollmops a=mou
# Pull Request
## Related issue
Fixes#2802
## What does this PR do?
- Adds missed content-encoding support for streamed requests (documents)
- Adds additional tests to validate content-encoding support is in place
- Adds new tests to validate content-encoding support for previously existing code (built-in actix functionality for unmarshaling JSON payloads)
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: Andrey "MOU" Larionov <anlarionov@gmail.com>
2896: Fix CI to send signal to Cloud team r=Kerollmops a=curquiza
Hello `@eskombro` 👋
I realized the CI we recently created together was not good when we release the official Meilisearch version. Indeed, in this case `steps.meta.outputs.tags` contains several tags, see: https://github.com/meilisearch/meilisearch/actions/runs/3197492898/jobs/5220776456
You might want to ask: why the CI, including the Cloud team signal, was available when releasing v0.29.1 and not v0.29.0? Good question, thanks for asking it Sam! It was a mistake on our side, it should not have been available for v0.29.1, and this is how I found out v0.29.1 was broken and contained commits it should not have. So I deleted everything and started the release process again for v0.29.1.
Anyway, since I had the chance to see the bug in this release mess, I want to take the opportunity to fix it. Now, we will only send the real tag.
Here you have more documentation about what `github.ref_name` is: https://docs.github.com/en/actions/learn-github-actions/contexts
Bisous bisous!
Co-authored-by: Clémentine Urquizar - curqui <clementine@meilisearch.com>
662: Enhance word splitting strategy r=ManyTheFish a=akki1306
# Pull Request
## Related issue
Fixes#648
## What does this PR do?
- [split_best_frequency](55d889522b/milli/src/search/query_tree.rs (L282-L301)) to use frequency of word pairs near together with proximity value of 1 instead of considering the frequency of individual words. Word pairs having max frequency are considered.
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: Akshay Kulkarni <akshayk.gj@gmail.com>
Alongside request encoding (compression) support, it is helpful to verify that the server respect `Accept-Encoding` headers and apply the corresponding compression to responses.
604: Speed up debug builds r=Kerollmops a=loiclec
Note: this draft PR is based on https://github.com/meilisearch/milli/pull/601 , for no particular reason.
## What does this PR do?
Make a series of changes with the goal of speeding up debug builds:
1. Add an `all_languages` feature which compiles charabia with its `default` features activated.
The `all_languages` feature is activated by default. But running:
```
cargo build --no-default-features
```
on `milli` is now much faster.
2. Reduce the debug optimisation level from 3 to 0, except for a few critical dependencies.
3. Compile the build dependencies quicker as well. Previously, all build dependencies were compiled with `opt-level = 3`. Now, only the critical build dependencies are compiled with optimisations.
4. Reduce the amount of code generated by the `documents!` macro
5. Make the "progress update" closure provided to indexing functions a trait object instead of a generic parameter. This avoids monomorphising the indexing code multiple times needlessly.
## Results
Initial build times on my computer before and after these changes:
| | cargo check | cargo check --no-default-features | cargo test | cargo test --lib | cargo test --no-default-features | cargo test --lib --no-default-features |
|--------|-------------|-----------------------------------|------------|------------------|----------------------------------|----------------------------------------|
| before | 1m05s | 1m05s | 2m06s | 1m47s | 2m06 | 1m47s |
| after | 28.9s | 13.1s | 40s | 38s | 23s | 21s |
Co-authored-by: Loïc Lecrenier <loic@meilisearch.com>
2890: fix: add handle dumpCreation query on tasks request r=Kerollmops a=washbin
# Pull Request
## Related issue
Fixes#2874
## What does this PR do?
- add missing `DumpCreation` type in tasks route
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Co-authored-by: washbin <76929116+washbin@users.noreply.github.com>
658: Add proximity calculation for the same word r=ManyTheFish a=msvaljek
# Pull Request
## Related issue
Fixes https://github.com/meilisearch/milli/issues/647
## What does this PR do?
- During [the increase of the current word position](d94339a858/milli/src/update/index_documents/extract/extract_word_pair_proximity_docids.rs (L129-L135)) we extract the proximity between the current position and the next one.
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: msvaljek <marko.svaljek@commercetools.com>
2862: Use Ubuntu 18.04 for all CI tasks that previously used Ubuntu 20.04 r=curquiza a=loiclec
This is to prevent linking with a version of glibc that is too recent.
With meilisearch v0.29.0 we inadvertently bumped the minimum supported glibc version to 2.29, which means it couldn't be run from Debian 10 (for example) anymore. By using Ubuntu 18.04, which uses glibc 2.27, we restore support for older Linux distros.
Fixes#2850
Co-authored-by: Loïc Lecrenier <loic@meilisearch.com>
Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
This is to prevent linking with a version of glibc that is too recent.
With meilisearch v0.29.0 we inadvertently bumped the minimum supported
glibc version to 2.29, which means it couldn't be run from Debian 10
(for example) anymore. By using Ubuntu 18.04, which uses glibc 2.27, we
restore support for older Linux distros.
2868: Uncomment cache steps in Github CI r=curquiza a=AM1TB
# Pull Request
Uncomment cache steps as they were previously affected by an issue with Github actions: https://www.githubstatus.com/incidents/gq1x0j8bv67v
## Related issue
Fixes#2864
## What does this PR do?
- Reintroduce the rust-cache steps within Github CI.
## PR checklist
Please check if your PR fulfills the following requirements:
- [X] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [X] Have you read the contributing guidelines?
- [X] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: Amit Banerjee <amit.banerjee.jsr@gmail.com>
Actix provides different content encodings out of the box, but only if we use built-in content wrappers and containers. This patch wraps its own Payload implementation with an actix decoder, which enables request compression support.
Refactored tests code to allow to specify compression (content-encoding) algorithm.
Added tests to verify what actix actually handle different content encodings properly.
2861: Change default bind address to localhost r=Kerollmops a=Fall1ngStar
# Pull Request
## Related issue
Fixes#2782
## What does this PR do?
- Change the default bind address to `localhost` so that it can be accessed with IPv6
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: Fall1ngStar <fall1ngstar.public@gmail.com>
Close#2800
This is an alternative to the `--config-file-path` option. If both `--config-file-path` and `MEILI_CONFIG_FILE_PATH` are present, `--config-file-path` takes precedence according to the "Priority order" section of #2558.
Not sure why but the compiler didn't catch this until clap is upgraded to v4.
Follwoing are the error from 'cargo test':
running 2 tests
test routes::indexes::search::test::test_fix_sort_query_parameters ... ok
test option::test::test_valid_opt ... FAILED
failures:
---- option::test::test_valid_opt stdout ----
thread 'option::test::test_valid_opt' panicked at 'Command meilisearch-http: Argument or group 'import-snapshot' specified in 'requires*' for 'ignore_missing_snapshot' does not exist', /Users/ychou/.cargo/registry/src/github.com-1ecc6299db9ec823/clap-4.0.9/src/builder/debug_asserts.rs:152:13
note: run with environment variable to display a backtrace
failures:
option::test::test_valid_opt
test result: FAILED. 1 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
2839: Update internal CLI documentation r=curquiza a=jeertmans
# Pull Request
## Related issue
Fixes#2810
## What does this PR do?
- Make internal CLI documentation match the online one
## Remarks
- Scope is limited to `meilisearch/meilisearch-http/src/option.rs`, not the flattened structs: should I also take care of them?
- Could not find online docs for `enable_metrics_route`
- Max column width wrapping was done by hand, so may not be perfect
- I removed the links from the internal doc: should I put them back?
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Co-authored-by: Jérome Eertmans <jeertmans@icloud.com>
2862: Use Ubuntu 18.04 for all CI tasks that previously used Ubuntu 20.04 r=curquiza a=loiclec
This is to prevent linking with a version of glibc that is too recent.
With meilisearch v0.29.0 we inadvertently bumped the minimum supported glibc version to 2.29, which means it couldn't be run from Debian 10 (for example) anymore. By using Ubuntu 18.04, which uses glibc 2.27, we restore support for older Linux distros.
Fixes#2850
Co-authored-by: Loïc Lecrenier <loic@meilisearch.com>
Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2841: Bail if config file contains 'config_file_path' r=Kerollmops a=arriven
# Pull Request
## Related issue
Fixes#2801
## What does this PR do?
- Return an error if config file contains 'config_file_path'
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Co-authored-by: arriven <20084245+Arriven@users.noreply.github.com>
This is to prevent linking with a version of glibc that is too recent.
With meilisearch v0.29.0 we inadvertently bumped the minimum supported
glibc version to 2.29, which means it couldn't be run from Debian 10
(for example) anymore. By using Ubuntu 18.04, which uses glibc 2.27, we
restore support for older Linux distros.
2847: Upgrade dependencies r=Kerollmops a=loiclec
# Pull Request
## Related issue
Partly fixes#2822
## What does this PR do?
Upgrade Meilisearch's dependencies to their latest versions, except:
- clap stays on 3.0 because it is more complicated to upgrade ( see #2846 )
- `enum_iterator` goes up to 1.1.2 instead of 1.2 because of the vergen dependency
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Co-authored-by: Loïc Lecrenier <loic@meilisearch.com>
2837: chore: generate Apple Silicon binaries r=curquiza a=jeertmans
# Pull Request
This creates a new job in the "publish binaries" workflow.
User `@mohitsaxenaknoldus` did not seem to be active on this anymore, so I tryied myself ;-)
## Related issue
Fixes#2792
## What does this PR do?
- As titled
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Co-authored-by: Jérome Eertmans <jeertmans@icloud.com>
654: Re-upload milli's logo r=curquiza a=jeertmans
# Pull Request
## Related issue
None
## What does this PR do?
Apparently, some [commit](add96f921b) deleted the logo file, and updated the `src` path. It seems to me that this was an error, and that the logo file should have been moved, not deleted.
This fixes the problem of seeing this (see image) instead of the actual logo.

## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Co-authored-by: Jérome Eertmans <jeertmans@icloud.com>
2845: Fixed broken Link r=curquiza a=AnirudhDaya
# Pull Request
## Related issue
Fixes#2840
## What does this PR do?
- Broken link is fixed. Now it redirects to the correct website.
## PR checklist
Please check if your PR fulfils the following requirements:
- [X] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [X] Have you read the contributing guidelines?
- [X] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: Anirudh Dayanand <anirudhdaya@gmail.com>
650: Add missing logging timer to extractors r=Kerollmops a=vishalsodani
# Pull Request
## What does this PR do?
#645
<!-- Please link the issue you're trying to fix with this PR, if none then please create an issue first. -->
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: vishalsodani <vishalsodani@rediffmail.com>
2819: Replace a meaningless serde message r=irevoire a=onyxcherry
# Pull Request
## What does this PR do?
Fixes#2680
<!-- Please link the issue you're trying to fix with this PR, if none then please create an issue first. -->
I've renamed the `serde_msg` variable to a `message` as _message_ does or does not include the serde error message --> is more generic.
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: Tomasz Wiśniewski <tomasz@wisniewski.app>
2830: Increase max concurrent readers on indexes r=irevoire a=arriven
# Pull Request
## What does this PR do?
Fixes#2786
<!-- Please link the issue you're trying to fix with this PR, if none then please create an issue first. -->
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: arriven <20084245+Arriven@users.noreply.github.com>
2817: Update Hacktoberfest section in CONTRIBUTING.md r=curquiza a=meili-bot
_This PR is auto-generated._
Following: af850854e4
Update Hacktoberfest section in CONTRIBUTING.md with the global guideline information.
Co-authored-by: meili-bot <74670311+meili-bot@users.noreply.github.com>
2827: Upgrade to alpine 3.16 r=curquiza a=nontw
# Pull Request
## What does this PR do?
Just a simple version upgrade to the existing Dockerfile. Tested building the image locally successfully.
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: nont <nontkrub@gmail.com>
2834: Deleted v1.rs because it is not included in Project r=irevoire a=Himanshu664
# Pull Request
## Related issue
Fixes#2833
## What does this PR do?
- Deleted the v1.rs file because it is not included in project/
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: Himanshu Malviya <gs2012023@sgsitsindore.in>
2745: Config file support r=curquiza a=mlemesle
# Pull Request
## What does this PR do?
Fixes#2558
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
2789: Fix typos r=Kerollmops a=kianmeng
# Pull Request
## What does this PR do?
Found via `codespell -L crate,nam,hart`.
## PR checklist
Please check if your PR fulfills the following requirements:
- [ ] Does this PR fix an existing issue?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
2814: Skip dashboard test if mini-dashboard feature is disabled r=Kerollmops a=jirutka
Fixes#2813
Fixes the following error:
cargo test --no-default-features
...
error: couldn't read target/debug/build/meilisearch-http-ec029d8c902cf2cb/out/generated.rs: No such file or directory (os error 2)
--> meilisearch-http/tests/dashboard/mod.rs:8:9
|
8 | include!(concat!(env!("OUT_DIR"), "/generated.rs"));
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
= note: this error originates in the macro `include` (in Nightly builds, run with -Z macro-backtrace for more info)
error: could not compile `meilisearch-http` due to previous error
2826: Rename receivedDocumentIds into matchedDocuments r=Kerollmops a=Ugzuzg
# Pull Request
## What does this PR do?
Fixes#2799
Changes DocumentDeletion task details response.
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Tested with curl:
```
curl \
-X POST 'http://localhost:7700/indexes/movies/documents/delete-batch' \
-H 'Content-Type: application/json' \
--data-binary '[
23488,
153738,
437035,
363869
]'
{"taskUid":1,"indexUid":"movies","status":"enqueued","type":"documentDeletion","enqueuedAt":"2022-10-01T20:06:37.105416054Z"}%
curl \
-X GET 'http://localhost:7700/tasks/1'
{"uid":1,"indexUid":"movies","status":"succeeded","type":"documentDeletion","details":{"matchedDocuments":4,"deletedDocuments":2},"duration":"PT0.005708322S","enqueuedAt":"2022-10-01T20:06:37.105416054Z","startedAt":"2022-10-01T20:06:37.115562733Z","finishedAt":"2022-10-01T20:06:37.121271055Z"}
```
Co-authored-by: mlemesle <lemesle.martin@hotmail.fr>
Co-authored-by: Kian-Meng Ang <kianmeng@cpan.org>
Co-authored-by: Jakub Jirutka <jakub@jirutka.cz>
Co-authored-by: Jarasłaŭ Viktorčyk <ugzuzg@gmail.com>
653: Fix#652 - Change Spelling of `author` in `README.md` r=curquiza a=anirudhRowjee
# Pull Request
## What does this PR do?
Fixes#652
- Changes spellings of `au{hor` to `author`
- Minor formatting changes in Markdown
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Co-authored-by: Anirudh Rowjee <ani.rowjee@gmail.com>
649: Update Hacktoberfest section in CONTRIBUTING.md r=curquiza a=meili-bot
_This PR is auto-generated._
Following: af850854e4
Update Hacktoberfest section in CONTRIBUTING.md with the global guideline information.
Co-authored-by: meili-bot <74670311+meili-bot@users.noreply.github.com>
Fixes the following error:
cargo test --no-default-features
...
error: couldn't read target/debug/build/meilisearch-http-ec029d8c902cf2cb/out/generated.rs: No such file or directory (os error 2)
--> meilisearch-http/tests/dashboard/mod.rs:8:9
|
8 | include!(concat!(env!("OUT_DIR"), "/generated.rs"));
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
= note: this error originates in the macro `include` (in Nightly builds, run with -Z macro-backtrace for more info)
error: could not compile `meilisearch-http` due to previous error
2772: Improve issue template display to avoid support in Meilisearch issues r=curquiza a=curquiza
Move the support line higher to make it more visible
Like this: https://github.com/meilisearch/meilisearch/discussions/2780
2773: Allow building without specialized tokenizations r=curquiza a=jirutka
Fixes#2774
(Some of) these specialized tokenizations include huge dictionaries that currently account for 90% (!) of the meilisearch binary size.
This commit adds `chinese`, `hebrew`, `japanese`, and `thai` feature flags that are propagated via `milli` down to the `charabia` crate. To keep it backwards compatible, they are enabled by default.
Related to meilisearch/milli#632
Co-authored-by: Clémentine Urquizar - curqui <clementine@meilisearch.com>
Co-authored-by: Jakub Jirutka <jakub@jirutka.cz>
2790: Improve Docker CI for cloud team r=curquiza a=curquiza
Work with `@eskombro`
Update CI to send a signal to Cloud team CI when Docker image is pushed
Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
642: Remove LTO in release profile r=Kerollmops a=loiclec
Since we can't enable it in Meilisearch (see https://github.com/meilisearch/meilisearch/pull/2717 ), we should not enable it in milli either. The goal is for milli's benchmarks to accurately represent its performance within meilisearch.
Co-authored-by: Loïc Lecrenier <loic@meilisearch.com>
641: Remove `helpers` crate r=Kerollmops a=loiclec
# Pull Request
## What does this PR do?
Remove the `helpers` crates, because (I think) we don't use it. This should have been part of https://github.com/meilisearch/milli/pull/636 , but I forgot about it then :)
Co-authored-by: Loïc Lecrenier <loic@meilisearch.com>
(Some of) these specialized tokenizations include huge dictionaries
that currently account for 90% (!) of the meilisearch binary size.
This commit adds chinese, hebrew, japanese, and thai feature flags
that are propagated via milli down to the charabia crate. To keep it
backward compatible, they are enabled by default.
Related to meilisearch/milli#632
636: Remove unused `infos`, `http-ui`, and `milli/fuzz`, crates r=ManyTheFish a=loiclec
We haven't used the `infos/`, `http-ui/` and `milli/fuzz/` crates in a long time. They are not properly maintained and probably do not work correctly anymore.
This PR removes these crates entirely from the workspace to reduce the amount of code we need to maintain.
Co-authored-by: Loïc Lecrenier <loic@meilisearch.com>
635: Use an unstable algorithm for `grenad::Sorter` when possible r=Kerollmops a=loiclec
# Pull Request
## What does this PR do?
Use an unstable algorithm to sort the internal vector used by `grenad::Sorter` whenever possible to speed up indexing.
In practice, every time the merge function creates a `RoaringBitmap`, we use an unstable sort. For every other merge function, such as `keep_first`, `keep_last`, etc., a stable sort is used.
Co-authored-by: Loïc Lecrenier <loic@meilisearch.com>
638: Update version for the next release (v0.33.4) in Cargo.toml files r=curquiza a=meili-bot
⚠️ This PR is automatically generated. Check the new version is the expected one before merging.
Co-authored-by: curquiza <curquiza@users.noreply.github.com>
2727: Don't panic when the error length is slightly over 100 r=Kerollmops a=onyxcherry
# Pull Request
## What does this PR do?
Fixes PR #2207 as [the last commit](7ece7a9d9e) has changed number of the characters at the end to leave in place from `50` to `85` **but the lower limit of a string length wasn't changed**.
Therefore, any data (e.g. example string from issue #2680) was causing `meilisearch` to **panic**.
So I simply raised the minimum value from `100` to `135` (`50 + 85`) to ensure that `replace_range()` won't panic due to an inverted range.
At the same time I am in favor of the `85` value which was changed in the `@CNLHC's` last commit.
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing ~issue~ pull request?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: Tomasz Wiśniewski <tomasz@wisniewski.app>
629: Update version for the next release (v0.33.3) in Cargo.toml files r=curquiza a=meili-bot
⚠️ This PR is automatically generated. Check the new version is the expected one before merging.
Co-authored-by: curquiza <curquiza@users.noreply.github.com>
2755: Update mini-dashboard to v0.2.2 r=Kerollmops a=mdubus
# Pull Request
## What does this PR do?
Fixes#2716
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: Morgane Dubus <30866152+mdubus@users.noreply.github.com>
624: Bump Swatinem/rust-cache from 1.3.0 to 2.0.0 r=curquiza a=dependabot[bot]
Bumps [Swatinem/rust-cache](https://github.com/Swatinem/rust-cache) from 1.3.0 to 2.0.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a href="https://github.com/Swatinem/rust-cache/releases">Swatinem/rust-cache's releases</a>.</em></p>
<blockquote>
<h2>v2.0.0</h2>
<ul>
<li>The action code was refactored to allow for caching multiple workspaces and
different <code>target</code> directory layouts.</li>
<li>The <code>working-directory</code> and <code>target-dir</code> input options were replaced by a
single <code>workspaces</code> option that has the form of <code>$workspace -> $target</code>.</li>
<li>Support for considering <code>env-vars</code> as part of the cache key.</li>
<li>The <code>sharedKey</code> input option was renamed to <code>shared-key</code> for consistency.</li>
</ul>
<h2>v1.4.0</h2>
<ul>
<li>Clean both debug and release target directories.</li>
</ul>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a href="https://github.com/Swatinem/rust-cache/blob/master/CHANGELOG.md">Swatinem/rust-cache's changelog</a>.</em></p>
<blockquote>
<h2>2.0.0</h2>
<ul>
<li>The action code was refactored to allow for caching multiple workspaces and
different <code>target</code> directory layouts.</li>
<li>The <code>working-directory</code> and <code>target-dir</code> input options were replaced by a
single <code>workspaces</code> option that has the form of <code>$workspace -> $target</code>.</li>
<li>Support for considering <code>env-vars</code> as part of the cache key.</li>
<li>The <code>sharedKey</code> input option was renamed to <code>shared-key</code> for consistency.</li>
</ul>
<h2>1.4.0</h2>
<ul>
<li>Clean both <code>debug</code> and <code>release</code> target directories.</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a href="6720f05bc4"><code>6720f05</code></a> 2.0.0</li>
<li><a href="5733786579"><code>5733786</code></a> rebuild</li>
<li><a href="622616010e"><code>6226160</code></a> prepare v2</li>
<li><a href="0497f9301f"><code>0497f93</code></a> improve registry cleanpu</li>
<li><a href="7b8626742a"><code>7b86267</code></a> update registry cleaning</li>
<li><a href="911d8e9e55"><code>911d8e9</code></a> test sparse registry</li>
<li><a href="875be5ce2d"><code>875be5c</code></a> bump cache</li>
<li><a href="07a2ee71bc"><code>07a2ee7</code></a> lol, dependency check was reversed</li>
<li><a href="7c190ef171"><code>7c190ef</code></a> fix actual test code ;-)</li>
<li><a href="fffd6895b2"><code>fffd689</code></a> add some more tests</li>
<li>Additional commits viewable in <a href="https://github.com/Swatinem/rust-cache/compare/v1.3.0...v2.0.0">compare view</a></li>
</ul>
</details>
<br />
[](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
You can trigger a rebase of this PR by commenting ``@dependabot` rebase`.
[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)
---
<details>
<summary>Dependabot commands and options</summary>
<br />
You can trigger Dependabot actions by commenting on this PR:
- ``@dependabot` rebase` will rebase this PR
- ``@dependabot` recreate` will recreate this PR, overwriting any edits that have been made to it
- ``@dependabot` merge` will merge this PR after your CI passes on it
- ``@dependabot` squash and merge` will squash and merge this PR after your CI passes on it
- ``@dependabot` cancel merge` will cancel a previously requested merge and block automerging
- ``@dependabot` reopen` will reopen this PR if it is closed
- ``@dependabot` close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- ``@dependabot` ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- ``@dependabot` ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- ``@dependabot` ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
</details>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
622: Minor fixes in the just added update-version CI r=ManyTheFish a=curquiza
These fixes are minor, and do not prevent us to use the current CI
Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2744: Minor fixes in the just added update-version CI r=ManyTheFish a=curquiza
These fixes does not prevent us to use the current CI
Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
* Add some tests
* Disallow index creation when API key doesn't havec explicitelly the right on the creating index
* Fix lazy index creation with `indexes.*` action
2740: Update checkout v2 to v3 in CI manifests and use a unique GitHub PAT r=Kerollmops a=curquiza
Upgrade the missing checkout v2 into v3
Probably a bad merge conflicts that make them removed when merging `stable` into `main` after v0.28.0 release.
Also, use `MEILI_BOT_GH_PAT` instead of `PUBLISH_TOKEN` and the default github token, which allow us to remove useless GitHub secrets (once this PR is merged and v0.29.0 is release because `PUBLISH_TOKEN` is still used on `release-v0.29.0`)
Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2738: Add missing env vars for dumps and snapshots features r=irevoire a=gmourier
# Pull Request
## What does this PR do?
Fixes#2721
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: Guillaume Mourier <guillaume@meilisearch.com>
2741: Add CI to update the Meilisearch version in Cargo.toml files r=ManyTheFish a=curquiza
Add a CI we can trigger manually to create a PR updating the Meilisearch version
The next step is to create a Slack bot that will trigger this CI
In the meantime, we can trigger this CI manually in the [Actions tab](https://github.com/meilisearch/milli/actions)
The `MEILI_BOT_GH_PAT` secrets has been added to the organization level, and is accessible for the following repositories (so far): Meilisearch, Milli and Charabia
Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
621: Add CI to update the Milli version r=ManyTheFish a=curquiza
Add a CI we can trigger manually to create a PR updating the Milli version
The next step is to create a Slack bot that will trigger this CI
In the meantime, we can trigger this CI manually in the [Actions tab](https://github.com/meilisearch/milli/actions)
The `MEILI_BOT_GH_PAT` secrets has been added to the organization level, and is accessible for the following repositories (so far): Meilisearch, Milli and Charabia
Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2726: Add dry run for publishing binaries: check the compilation works r=Kerollmops a=curquiza
To avoid realizing the compilation of one type of binary does not work during the release, I create a dry run for binary compilation every day at 2am 😇
See the problem we had recently because missing this CI: https://github.com/meilisearch/meilisearch/issues/2718
Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2724: Make the document addition done log to appear once indexing is over r=curquiza a=evpeople
# Pull Request
## What does this PR do?
Fixes#2703
<!-- Please link the issue you're trying to fix with this PR, if none then please create an issue first. -->
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: evpeople <hangcaihui@gmail.com>
618: Update version for next release (v0.33.1) in Cargo.toml r=Kerollmops a=curquiza
No breaking for this release
Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
617: Accept integers as document ids again r=irevoire a=Kerollmops
This PR is related to https://github.com/meilisearch/meilisearch/issues/2723 and will fix when this PR will be merged, a new release deployed and used in Meilisearch itself.
This PR makes the indexer to try to parse the values of the fields identified as numbers i.e. `id:number` as integer first then as float if it fails.
Co-authored-by: Clément Renault <clement@meilisearch.com>
2717: Disable LTO due to compilation failures on some platforms r=curquiza a=loiclec
Meilisearch fails to compile on aarch64 Linux due to a linker error ( https://github.com/meilisearch/meilisearch/runs/8072616457?check_suite_focus=true ). This is probably caused by link-time-optimisation (LTO). Since it is not possible to modify a profile based on the target triple, this PR deactivates LTO completely for all platforms.
In the future, we might want to create different custom profiles, such as:
```toml
[profile.release-lto]
inherits = "release"
lto = "thin"
```
and compile Meilisearch using `cargo build --profile release-lto` on the platforms that can support it.
Co-authored-by: Loïc Lecrenier <loic@meilisearch.com>
2713: Move prometheus behind a feature flag r=Kerollmops a=irevoire
We decided we wanted to continue working on this feature before making it public.
Co-authored-by: Tamo <tamo@meilisearch.com>
2702: Add link to the main image r=curquiza a=brunoocasali
I have wrapped the image with a `<a>` link, and it seems to be working fine, WDYT?
Co-authored-by: Bruno Casali <brunoocasali@gmail.com>
Co-authored-by: Clémentine Urquizar - curqui <clementine@meilisearch.com>
2504: New README 🌟 r=curquiza a=curquiza
⚠️ Please do not only look at the Markdown but also how the GitHub renders the README 😇👉👉 [Rendered](https://github.com/meilisearch/meilisearch/blob/new-readme/README.md) 👈👈
2697: Accept an environment variable to enable the metrics route r=ManyTheFish a=Kerollmops
With the PR Meilisearch is able to accept the `MEILI_ENABLE_METRICS_ROUTE` environment variable to enable the newly introduces metrics route.
Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
Co-authored-by: Clément Renault <clement@meilisearch.com>
2696: Add the new `metrics.get` and `metrics.all` actions rights r=Kerollmops a=Kerollmops
Follow the specification and add the new `metrics.get` and `metrics.all` actions, making the `/metrics` route only accessible with those rights.
Co-authored-by: Clément Renault <clement@meilisearch.com>
2636: Upgrade milli to v0.33.0 r=Kerollmops a=ManyTheFish
# Summary
- Update milli to v0.33.0
- Classify the new InvalidLmdbOpenOptions error as an Internal error
- Update filter error check in tests
- Introduce Terms Matching Policies
fixes#2479fixes#2484fixes#2486fixes#2516fixes#2578fixes#2580fixes#2583fixes#2600fixes#2640fixes#2672fixes#2679fixes#2686
# Terms Matching Policies
This PR allows end users to customize matching term policies
## Todo
- [x] Update the API to return the number of pages and allow users to directly choose a page instead of computing an offset
- [x] Change generation of the query tree depending on the chosen settings https://github.com/meilisearch/milli/pull/598
## Small Documentation
### Default search query
**request**:
```sh
curl \
-X POST 'http://localhost:7700/indexes/movies/search' \
-H 'Content-Type: application/json' \
--data-binary '{ "q": "doctor of tokio" }'
```
**result**:
```json
{
"hits":[...],
"estimatedTotalHits":32,
"query":"doctor of tokio",
"limit":20,
"offset":0,
"processingTimeMs":7
}
```
The default behavior doesn't change with the current Meilisearch behavior:
If we don't have enough documents to fit the requested limit, we remove the query words from the last to the first typed word.
## Search query with `optionalWords` parameter
**request**:
```sh
curl \
-X POST 'http://localhost:7700/indexes/movies/search' \
-H 'Content-Type: application/json' \
--data-binary '{ "q": "doctor of tokio", "matchingStrategy": "all"}'
```
**result**:
```json
{
"hits":[...],
"estimatedTotalHits":1,
"query":"doctor of tokio",
"limit":20,
"offset":0,
"processingTimeMs":7
}
```
### allowed `matchingStrategy` values
#### `last`
The default behavior, If we don't have enough documents to fit the requested limit, we remove the query words from the last to the first typed word.
#### `all`
No word will be removed, If we don't have enough documents to fit the requested limit, we return the number of documents we found.
### In charge of the feature
Core: `@ManyTheFish` & `@curquiza`
Docs: TBD
Integration: `@bidoubiwa`
Co-authored-by: ManyTheFish <many@meilisearch.com>
Co-authored-by: Many the fish <many@meilisearch.com>
2689: Use mimalloc as the global allocator r=Kerollmops a=loiclec
milli has switched its global allocator to mimalloc already, and we have seen some performance gains as a result. Furthermore, we can use mimalloc as the global allocator on all platforms whereas jemalloc was only activated on Linux.
This PR brings mimalloc to Meilisearch as well.
2690: Add LTO and codegen-units=1 to release compile options r=Kerollmops a=loiclec
This PR brings Meilisearch's release compile options in line with milli (see https://github.com/meilisearch/milli/pull/606 ).
Adding LTO and codegen=units=1 will make compile times longer, but they also speed up the final binary significantly.
Co-authored-by: Loïc Lecrenier <loic@meilisearch.com>
2691: Update version for next release (v0.29.0) in Cargo.toml files r=Kerollmops a=curquiza
Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
598: Matching query terms policy r=Kerollmops a=ManyTheFish
## Summary
Implement several optional words strategy.
## Content
Replace `optional_words` boolean with an enum containing several term matching strategies:
```rust
pub enum TermsMatchingStrategy {
// remove last word first
Last,
// remove first word first
First,
// remove more frequent word first
Frequency,
// remove smallest word first
Size,
// only one of the word is mandatory
Any,
// all words are mandatory
All,
}
```
All strategies implemented during the prototype are kept, but only `Last` and `All` will be published by Meilisearch in the `v0.29.0` release.
## Related
spec: https://github.com/meilisearch/specifications/pull/173
prototype discussion: https://github.com/meilisearch/meilisearch/discussions/2639#discussioncomment-3447699
Co-authored-by: ManyTheFish <many@meilisearch.com>
610: Share heed between all sub-crates r=Kerollmops a=irevoire
# Pull Request
## What does this PR do?
Use the reexported version of heed in the benchmarks and the fuzzer
Co-authored-by: Irevoire <tamo@meilisearch.com>
2674: Add analytics on the stats routes r=ManyTheFish a=irevoire
# Pull Request
## What does this PR do?
Implements https://github.com/meilisearch/specifications/pull/169
## PR checklist
Please check if your PR fulfills the following requirements:
- [ ] Does this PR fix an existing issue?
- [ ] Have you read the contributing guidelines?
- [ ] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: Irevoire <tamo@meilisearch.com>
2678: Accept either an array of documents or a single document r=irevoire a=Kerollmops
# Pull Request
## What does this PR do?
Fixes#2671
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Co-authored-by: Clément Renault <clement@meilisearch.com>
2677: Hide the batch_uid field from the tasks route r=Kerollmops a=Kerollmops
# Pull Request
## What does this PR do?
Fixes#2676
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Co-authored-by: Clément Renault <clement@meilisearch.com>
607: Better threshold r=Kerollmops a=irevoire
# Pull Request
## What does this PR do?
Fixes#570
This PR tries to improve the threshold used to trigger the real deletion of documents.
The deletion is now triggered in two cases;
- 10% of the total available space is used by soft deleted documents
- 90% of the total available space is used.
In this context, « total available space » means the `map_size` of lmdb.
And the size used by the soft deleted documents is actually an estimation. We can't determine precisely the size used by one document thus what we do is; take the total space used, divide it by the number of documents + soft deleted documents to estimate the size of one average document. Then multiply the size of one avg document by the number of soft deleted document.
--------
<img width="808" alt="image" src="https://user-images.githubusercontent.com/7032172/185083075-92cf379e-8ae1-4bfc-9ca6-93b54e6ab4e9.png">
Here we can see we have a ~10GB drift in the end between the space used by the soft deleted and the real space used by the documents.
Personally I don’t think that's a big issue because once the red line reach 90GB everything will be freed but now you know.
If you have an idea on how to improve this estimation I would love to hear it.
It look like the difference is linear so maybe we could simply multiply the current estimation by two?
Co-authored-by: Irevoire <tamo@meilisearch.com>
587: Word prefix pair proximity docids indexation refactor r=Kerollmops a=loiclec
# Pull Request
## What does this PR do?
Refactor the code of `WordPrefixPairProximityDocIds` to make it much faster, fix a bug, and add a unit test.
## Why is it faster?
Because we avoid using a sorter to insert the (`word1`, `prefix`, `proximity`) keys and their associated bitmaps, and thus we don't have to sort a potentially very big set of data. I have also added a couple of other optimisations:
1. reusing allocations
2. using a prefix trie instead of an array of prefixes to get all the prefixes of a word
3. inserting directly into the database instead of putting the data in an intermediary grenad when possible. Also avoid checking for pre-existing values in the database when we know for certain that they do not exist.
## What bug was fixed?
When reindexing, the `new_prefix_fst_words` prefixes may look like:
```
["ant", "axo", "bor"]
```
which we group by first letter:
```
[["ant", "axo"], ["bor"]]
```
Later in the code, if we have the word2 "axolotl", we try to find which subarray of prefixes contains its prefixes. This check is done with `word2.starts_with(subarray_prefixes[0])`, but `"axolotl".starts_with("ant")` is false, and thus we wrongly think that there are no prefixes in `new_prefix_fst_words` that are prefixes of `axolotl`.
## StrStrU8Codec
I had to change the encoding of `StrStrU8Codec` to make the second string null-terminated as well. I don't think this should be a problem, but I may have missed some nuances about the impacts of this change.
## Requests when reviewing this PR
I have explained what the code does in the module documentation of `word_pair_proximity_prefix_docids`. It would be nice if someone could read it and give their opinion on whether it is a clear explanation or not.
I also have a couple questions regarding the code itself:
- Should we clean up and factor out the `PrefixTrieNode` code to try and make broader use of it outside this module? For now, the prefixes undergo a few transformations: from FST, to array, to prefix trie. It seems like it could be simplified.
- I wrote a function called `write_into_lmdb_database_without_merging`. (1) Are we okay with such a function existing? (2) Should it be in `grenad_helpers` instead?
## Benchmark Results
We reduce the time it takes to index about 8% in most cases, but it varies between -3% and -20%.
```
group indexing_main_ce90fc62 indexing_word-prefix-pair-proximity-docids-refactor_cbad2023
----- ---------------------- ------------------------------------------------------------
indexing/-geo-delete-facetedNumber-facetedGeo-searchable- 1.00 1893.0±233.03µs ? ?/sec 1.01 1921.2±260.79µs ? ?/sec
indexing/-movies-delete-facetedString-facetedNumber-searchable- 1.05 9.4±3.51ms ? ?/sec 1.00 9.0±2.14ms ? ?/sec
indexing/-movies-delete-facetedString-facetedNumber-searchable-nested- 1.22 18.3±11.42ms ? ?/sec 1.00 15.0±5.79ms ? ?/sec
indexing/-songs-delete-facetedString-facetedNumber-searchable- 1.00 41.4±4.20ms ? ?/sec 1.28 53.0±13.97ms ? ?/sec
indexing/-wiki-delete-searchable- 1.00 285.6±18.12ms ? ?/sec 1.03 293.1±16.09ms ? ?/sec
indexing/Indexing geo_point 1.03 60.8±0.45s ? ?/sec 1.00 58.8±0.68s ? ?/sec
indexing/Indexing movies in three batches 1.14 16.5±0.30s ? ?/sec 1.00 14.5±0.24s ? ?/sec
indexing/Indexing movies with default settings 1.11 13.7±0.07s ? ?/sec 1.00 12.3±0.28s ? ?/sec
indexing/Indexing nested movies with default settings 1.10 10.6±0.11s ? ?/sec 1.00 9.6±0.15s ? ?/sec
indexing/Indexing nested movies without any facets 1.11 9.4±0.15s ? ?/sec 1.00 8.5±0.10s ? ?/sec
indexing/Indexing songs in three batches with default settings 1.18 66.2±0.39s ? ?/sec 1.00 56.0±0.67s ? ?/sec
indexing/Indexing songs with default settings 1.07 58.7±1.26s ? ?/sec 1.00 54.7±1.71s ? ?/sec
indexing/Indexing songs without any facets 1.08 53.1±0.88s ? ?/sec 1.00 49.3±1.43s ? ?/sec
indexing/Indexing songs without faceted numbers 1.08 57.7±1.33s ? ?/sec 1.00 53.3±0.98s ? ?/sec
indexing/Indexing wiki 1.06 1051.1±21.46s ? ?/sec 1.00 989.6±24.55s ? ?/sec
indexing/Indexing wiki in three batches 1.20 1184.8±8.93s ? ?/sec 1.00 989.7±7.06s ? ?/sec
indexing/Reindexing geo_point 1.04 67.5±0.75s ? ?/sec 1.00 64.9±0.32s ? ?/sec
indexing/Reindexing movies with default settings 1.12 13.9±0.17s ? ?/sec 1.00 12.4±0.13s ? ?/sec
indexing/Reindexing songs with default settings 1.05 60.6±0.84s ? ?/sec 1.00 57.5±0.99s ? ?/sec
indexing/Reindexing wiki 1.07 1725.0±17.92s ? ?/sec 1.00 1611.4±9.90s ? ?/sec
```
Co-authored-by: Loïc Lecrenier <loic@meilisearch.com>
594: Fix(Search): Fix phrase search candidates computation r=Kerollmops a=ManyTheFish
This bug is an old bug but was hidden by the proximity criterion,
Phrase searches were always returning an empty candidates list when the proximity criterion is deactivated.
Before the fix, we were trying to find any words[n] near words[n]
instead of finding any words[n] near words[n+1], for example:
for a phrase search '"Hello world"' we were searching for "hello" near "hello" first, instead of "hello" near "world".
Co-authored-by: ManyTheFish <many@meilisearch.com>
NOTE: The token_at_depth is method is a bit useless now, as the only
cases where there would be a toke at depth 1000 are the cases where
the parser already stack-overflowed earlier.
Example: (((((... (x=1) ...)))))
602: Use mimalloc as the default allocator r=Kerollmops a=loiclec
## What does this PR do?
Use mimalloc as the global allocator for milli's benchmarks on macOS.
## Why?
On Linux, we use jemalloc, which is a very fast allocator. But on macOS, we currently use the system allocator, which is very slow. In practice, this difference in allocator speed means that it is difficult to gain insight into milli's performance by running benchmarks locally on the Mac.
By using mimalloc, which is another excellent allocator, we reduce the speed difference between the two platforms.
Co-authored-by: Loïc Lecrenier <loic@meilisearch.com>
New full snapshot:
---
source: milli/src/update/word_prefix_pair_proximity_docids.rs
---
5 a 1 [101, ]
5 a 2 [101, ]
5 am 1 [101, ]
5 b 4 [101, ]
5 be 4 [101, ]
am a 3 [101, ]
amazing a 1 [100, ]
amazing a 2 [100, ]
amazing a 3 [100, ]
amazing an 1 [100, ]
amazing an 2 [100, ]
amazing b 2 [100, ]
amazing be 2 [100, ]
an a 1 [100, ]
an a 2 [100, 202, ]
an am 1 [100, ]
an an 2 [100, ]
an b 3 [100, ]
an be 3 [100, ]
and a 2 [100, ]
and a 3 [100, ]
and a 4 [100, ]
and am 2 [100, ]
and an 3 [100, ]
and b 1 [100, ]
and be 1 [100, ]
at a 1 [100, 202, ]
at a 2 [100, 101, ]
at a 3 [100, ]
at am 2 [100, 101, ]
at an 1 [100, 202, ]
at an 3 [100, ]
at b 3 [101, ]
at b 4 [100, ]
at be 3 [101, ]
at be 4 [100, ]
beautiful a 2 [100, ]
beautiful a 3 [100, ]
beautiful a 4 [100, ]
beautiful am 3 [100, ]
beautiful an 2 [100, ]
beautiful an 4 [100, ]
bell a 2 [101, ]
bell a 4 [101, ]
bell am 4 [101, ]
extraordinary a 2 [202, ]
extraordinary a 3 [202, ]
extraordinary an 2 [202, ]
house a 3 [100, 202, ]
house a 4 [100, 202, ]
house am 4 [100, ]
house an 3 [100, 202, ]
house b 2 [100, ]
house be 2 [100, ]
rings a 1 [101, ]
rings a 3 [101, ]
rings am 3 [101, ]
rings b 2 [101, ]
rings be 2 [101, ]
the a 3 [101, ]
the b 1 [101, ]
the be 1 [101, ]
New snapshot (yes, it's wrong as well, it will get fixed later):
---
source: milli/src/update/word_prefix_pair_proximity_docids.rs
---
5 a 1 [101, ]
5 a 2 [101, ]
5 am 1 [101, ]
5 b 4 [101, ]
5 be 4 [101, ]
am a 3 [101, ]
amazing a 1 [100, ]
amazing a 2 [100, ]
amazing a 3 [100, ]
amazing an 1 [100, ]
amazing an 2 [100, ]
amazing b 2 [100, ]
amazing be 2 [100, ]
an a 1 [100, ]
an a 2 [100, 202, ]
an am 1 [100, ]
an b 3 [100, ]
an be 3 [100, ]
and a 2 [100, ]
and a 3 [100, ]
and a 4 [100, ]
and b 1 [100, ]
and be 1 [100, ]
d\0 0 [100, 202, ]
an an 2 [100, ]
and am 2 [100, ]
and an 3 [100, ]
at a 2 [100, 101, ]
at a 3 [100, ]
at am 2 [100, 101, ]
at an 1 [100, 202, ]
at an 3 [100, ]
at b 3 [101, ]
at b 4 [100, ]
at be 3 [101, ]
at be 4 [100, ]
beautiful a 2 [100, ]
beautiful a 3 [100, ]
beautiful a 4 [100, ]
beautiful am 3 [100, ]
beautiful an 2 [100, ]
beautiful an 4 [100, ]
bell a 2 [101, ]
bell a 4 [101, ]
bell am 4 [101, ]
extraordinary a 2 [202, ]
extraordinary a 3 [202, ]
extraordinary an 2 [202, ]
house a 4 [100, 202, ]
house a 4 [100, ]
house am 4 [100, ]
house an 3 [100, 202, ]
house b 2 [100, ]
house be 2 [100, ]
rings a 1 [101, ]
rings a 3 [101, ]
rings am 3 [101, ]
rings b 2 [101, ]
rings be 2 [101, ]
the a 3 [101, ]
the b 1 [101, ]
the be 1 [101, ]
606: Make binaries faster on release profile through better compile options r=Kerollmops a=loiclec
Using `codegen-units = 1` and `lto = 'thin'` makes the compile time a bit longer, but also produces faster binaries.
I'd like to run milli's benchmark with these options, so that we can see whether it is worth enabling on meilisearch.
Co-authored-by: Loïc Lecrenier <loic@meilisearch.com>
2664: 🐞 fix: Support https in print_launch_resume r=irevoire a=evpeople
fix#2660
# Pull Request
## What does this PR do?
Fixes#2660
<!-- Please link the issue you're trying to fix with this PR, if none then please create an issue first. -->
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: evpeople <hangcaihui@gmail.com>
2523: Improve the tasks error reporting when processed in batches r=irevoire a=Kerollmops
This fixes#2478 by changing the behavior of the task handler when there is an error in a batch of document addition or update.
What changes is that when there is a user error in a task in a batch we now report this task as failed with the right error message but we continue to process the other tasks. A user error can be when a geo field is invalid, a document id is invalid, or missing.
fixes#2582, #2478
Co-authored-by: Kerollmops <clement@meilisearch.com>
Co-authored-by: ManyTheFish <many@meilisearch.com>
601: Introduce snapshot tests r=Kerollmops a=loiclec
# Pull Request
## What does this PR do?
Introduce snapshot tests into milli, by using the `insta` crate. This implements the idea described by #597
See: [insta.rs](https://insta.rs)
## Design
There is now a new file, `snapshot_tests.rs`, which is compiled only under `#[cfg(test)]`. It exposes the `db_snap!` macro, which is used to snapshot the content of a database.
When running `cargo test`, `insta` will check that the value of the current snapshot is the same as the previous one (on the file system). If they are the same, the test passes. If they are different, the test fails and you are asked to review the new snapshot to approve or reject it.
We don't want to save very large snapshots to the file system, because it will pollute the git repository and increase its size too much. Instead, we only save their `md5` hashes under the name `<snapshot_name>.hash.snap`. There is a new environment variable called `MILLI_TEST_FULL_SNAPS` which can be set to `true` in order to *also* save the full content of the snapshot under the name `<snapshot_name>.full.snap`. However, snapshots with the extension `.full.snap` are never saved to the git repository.
## Example
```rust
// In e.g. facets.rs
#[test]
fn my_test() {
// create an index
let index = TempIndex::new():
index.add_documents(...);
index.update_settings(|settings| ...);
// then snapshot the content of one of its databases
// the snapshot will be saved at the current folder under facets.rs/my_test/facet_id_string_docids.snap
db_snap!(index, facet_id_string_docids);
index.add_documents(...);
// we can also name the snapshot to ensure there is no conflict
// this snapshot will be saved at facets.rs/my_test/updated/facet_id_string_docids.snap
db_snap!(index, facet_id_string, docids, "updated");
// and we can also use "inline" snapshots, which insert their content in the given string literal
db_snap!(index, field_distributions, `@"");`
// once the snapshot is approved, it will automatically get transformed to, e.g.:
// db_snap!(index, field_distributions, `@"`
// my_facet 21
// other_field 3
// ");
// now let's add **many** documents
index.add_documents(...);
// because the snapshot is too big, its hash is saved instead
// if the MILLI_TEST_FULL_SNAPS env variable is set to true, then the full snapshot will also be saved
// at facets.rs/my_test/large/facet_id_string_docids.full.snap
db_snap!(index, facet_id_string_docids, "large", `@"5348bbc46b5384455b6a900666d2a502");`
}
```
Co-authored-by: Loïc Lecrenier <loic@meilisearch.com>
2662: Fix(cli): Clamp databases max size to a multiple of system page size r=Kerollmops a=ManyTheFish
fix#2659
Co-authored-by: ManyTheFish <many@meilisearch.com>
600: Simplify some unit tests r=ManyTheFish a=loiclec
# Pull Request
## What does this PR do?
Simplify the code that is used in unit tests to create and modify an index. Basically, the following code:
```rust
let path = tempfile::tempdir().unwrap();
let mut options = EnvOpenOptions::new();
options.map_size(10 * 1024 * 1024); // 10 MB
let index = Index::new(options, &path).unwrap();
let mut wtxn = index.write_txn().unwrap();
let content = documents!([
{ "id": 0, "name": "kevin" },
]);
let config = IndexerConfig::default();
let indexing_config = IndexDocumentsConfig::default();
let builder =
IndexDocuments::new(&mut wtxn, &index, &config, indexing_config, |_| ()).unwrap();
let (builder, user_error) = builder.add_documents(content).unwrap();
user_error.unwrap();
builder.execute().unwrap();
wtxn.commit.unwrap();
let mut wtxn = index.write_txn().unwrap();
let config = IndexerConfig::default();
let mut builder = Settings::new(&mut wtxn, &index, &config);
builder.set_primary_key(S("docid"));
builder.set_filterable_fields(hashset! { S("label") });
builder.execute(|_| ()).unwrap();
wtxn.commit().unwrap();
```
becomes:
```rust
let index = TempIndex::new():
index.add_documents(documents!(
{ "id": 0, "name": "kevin" },
)).unwrap();
index.update_settings(|settings| {
settings.set_primary_key(S("docid"));
settings.set_filterable_fields(hashset! { S("label") });
}).unwrap();
```
Then there is a bunch of options to modify the indexing configs, the map size, to reuse a transaction, etc. For example:
```rust
let mut index = TempIndex::new_with_map_size(1000 * 4096 * 10);
index.index_documents_config.autogenerate_docids = true;
let mut wtxn = index.write_txn().unwrap();
index.update_settings_using_wtxn(&mut wtxn, |settings| {
settings.set_primary_key(S("docids"));
}).unwrap();
wtxn.commit().unwrap();
```
Co-authored-by: Loïc Lecrenier <loic@meilisearch.com>
Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>
Co-authored-by: bors[bot] <26634292+bors[bot]@users.noreply.github.com>
556: Add EXISTS filter r=loiclec a=loiclec
## What does this PR do?
Fixes issue [#2484](https://github.com/meilisearch/meilisearch/issues/2484) in the meilisearch repo.
It creates a `field EXISTS` filter which selects all documents containing the `field` key.
For example, with the following documents:
```json
[{
"id": 0,
"colour": []
},
{
"id": 1,
"colour": ["blue", "green"]
},
{
"id": 2,
"colour": 145238
},
{
"id": 3,
"colour": null
},
{
"id": 4,
"colour": {
"green": []
}
},
{
"id": 5,
"colour": {}
},
{
"id": 6
}]
```
Then the filter `colour EXISTS` selects the ids `[0, 1, 2, 3, 4, 5]`. The filter `colour NOT EXISTS` selects `[6]`.
## Details
There is a new database named `facet-id-exists-docids`. Its keys are field ids and its values are bitmaps of all the document ids where the corresponding field exists.
To create this database, the indexing part of milli had to be adapted. The implementation there is basically copy/pasted from the code handling the `facet-id-f64-docids` database, with appropriate modifications in place.
There was an issue involving the flattening of documents during (re)indexing. Previously, the following JSON:
```json
{
"id": 0,
"colour": [],
"size": {}
}
```
would be flattened to:
```json
{
"id": 0
}
```
prior to being given to the extraction pipeline.
This transformation would lose the information that is needed to populate the `facet-id-exists-docids` database. Therefore, I have also changed the implementation of the `flatten-serde-json` crate. Now, as it traverses the Json, it keeps track of which key was encountered. Then, at the end, if a previously encountered key is not present in the flattened object, it adds that key to the object with an empty array as value. For example:
```json
{
"id": 0,
"colour": {
"green": [],
"blue": 1
},
"size": {}
}
```
becomes
```json
{
"id": 0,
"colour": [],
"colour.green": [],
"colour.blue": 1,
"size": []
}
```
Co-authored-by: Kerollmops <clement@meilisearch.com>
2653: Bump Swatinem/rust-cache from 1.4.0 to 2.0.0 r=curquiza a=dependabot[bot]
Bumps [Swatinem/rust-cache](https://github.com/Swatinem/rust-cache) from 1.4.0 to 2.0.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a href="https://github.com/Swatinem/rust-cache/releases">Swatinem/rust-cache's releases</a>.</em></p>
<blockquote>
<h2>v2.0.0</h2>
<ul>
<li>The action code was refactored to allow for caching multiple workspaces and
different <code>target</code> directory layouts.</li>
<li>The <code>working-directory</code> and <code>target-dir</code> input options were replaced by a
single <code>workspaces</code> option that has the form of <code>$workspace -> $target</code>.</li>
<li>Support for considering <code>env-vars</code> as part of the cache key.</li>
<li>The <code>sharedKey</code> input option was renamed to <code>shared-key</code> for consistency.</li>
</ul>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a href="https://github.com/Swatinem/rust-cache/blob/master/CHANGELOG.md">Swatinem/rust-cache's changelog</a>.</em></p>
<blockquote>
<h2>2.0.0</h2>
<ul>
<li>The action code was refactored to allow for caching multiple workspaces and
different <code>target</code> directory layouts.</li>
<li>The <code>working-directory</code> and <code>target-dir</code> input options were replaced by a
single <code>workspaces</code> option that has the form of <code>$workspace -> $target</code>.</li>
<li>Support for considering <code>env-vars</code> as part of the cache key.</li>
<li>The <code>sharedKey</code> input option was renamed to <code>shared-key</code> for consistency.</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a href="6720f05bc4"><code>6720f05</code></a> 2.0.0</li>
<li><a href="5733786579"><code>5733786</code></a> rebuild</li>
<li><a href="622616010e"><code>6226160</code></a> prepare v2</li>
<li><a href="0497f9301f"><code>0497f93</code></a> improve registry cleanpu</li>
<li><a href="7b8626742a"><code>7b86267</code></a> update registry cleaning</li>
<li><a href="911d8e9e55"><code>911d8e9</code></a> test sparse registry</li>
<li><a href="875be5ce2d"><code>875be5c</code></a> bump cache</li>
<li><a href="07a2ee71bc"><code>07a2ee7</code></a> lol, dependency check was reversed</li>
<li><a href="7c190ef171"><code>7c190ef</code></a> fix actual test code ;-)</li>
<li><a href="fffd6895b2"><code>fffd689</code></a> add some more tests</li>
<li>Additional commits viewable in <a href="https://github.com/Swatinem/rust-cache/compare/v1.4.0...v2.0.0">compare view</a></li>
</ul>
</details>
<br />
[](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
You can trigger a rebase of this PR by commenting ``@dependabot` rebase`.
[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)
---
<details>
<summary>Dependabot commands and options</summary>
<br />
You can trigger Dependabot actions by commenting on this PR:
- ``@dependabot` rebase` will rebase this PR
- ``@dependabot` recreate` will recreate this PR, overwriting any edits that have been made to it
- ``@dependabot` merge` will merge this PR after your CI passes on it
- ``@dependabot` squash and merge` will squash and merge this PR after your CI passes on it
- ``@dependabot` cancel merge` will cancel a previously requested merge and block automerging
- ``@dependabot` reopen` will reopen this PR if it is closed
- ``@dependabot` close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- ``@dependabot` ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- ``@dependabot` ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- ``@dependabot` ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
</details>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
This bug is an old bug but was hidden by the proximity criterion,
Phrase search were always returning an empty candidates list.
Before the fix, we were trying to find any words[n] near words[n]
instead of finding any words[n] near words[n+1], for example:
for a phrase search '"Hello world"' we were searching for "hello" near "hello" first, instead of "hello" near "world".
561: Enriched documents batch reader r=curquiza a=Kerollmops
~This PR is based on #555 and must be rebased on main after it has been merged to ease the review.~
This PR contains the work in #555 and can be merged on main as soon as reviewed and approved.
- [x] Create an `EnrichedDocumentsBatchReader` that contains the external documents id.
- [x] Extract the primary key name and make it accessible in the `EnrichedDocumentsBatchReader`.
- [x] Use the external id from the `EnrichedDocumentsBatchReader` in the `Transform::read_documents`.
- [x] Remove the `update_primary_key` from the _transform.rs_ file.
- [x] Really generate the auto-generated documents ids.
- [x] Insert the (auto-generated) document ids in the document while processing it in `Transform::read_documents`.
Co-authored-by: Kerollmops <clement@meilisearch.com>
2631: Update mini-dashboard to v0.2.1 r=curquiza a=mdubus
# Pull Request
## What does this PR do?
Fixes#2629
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: Morgane Dubus <30866152+mdubus@users.noreply.github.com>
2625: Update link to Cloud beta form r=curquiza a=davelarkan
# Pull Request
## What does this PR do?
Fixes#2624
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: Dave Larkan <davelarkan@gmail.com>
The idea is to directly create a sorted and merged list of bitmaps
in the form of a BTreeMap<FieldId, RoaringBitmap> instead of creating
a grenad::Reader where the keys are field_id and the values are docids.
Then we send that BTreeMap to the thing that handles TypedChunks, which
inserts its content into the database.
588: Fix name of "release_date" facet in movies benchmarks r=ManyTheFish a=loiclec
## What does this PR do?
The `movies.json` file in the benchmark datasets contains a filterable field called "release_date", but the indexing benchmarks wrongly called the field "released_date" instead. This PR fixes that.
Co-authored-by: Loïc Lecrenier <loic@meilisearch.com>
2592: Chores: Add a dedicated section for Language Support in the issue template r=curquiza a=ManyTheFish
This new section in put upper than the feature proposal because language support is kind of a sub-category of it,
and so, in the reading order, we choose to create a feature proposal only if it is not related to Language.
Co-authored-by: ManyTheFish <many@meilisearch.com>
This new section in put apper than feature proposal because language-support is kind of a sub-category of it,
and so, in the reading order, we choose to create a feature proposal only if it is not related to Language.
2591: Introduce the Tasks Seen event when filtering r=Kerollmops a=Kerollmops
This PR fixes#2377 by introducing the Tasks Seen analytics events.
Co-authored-by: Kerollmops <clement@meilisearch.com>
2579: API keys: adds action * for actions r=irevoire a=phdavis1027
# Pull Request
This is PR builds on `@janithpet's` addition to DocumentsAll; it's basically a copy-and-paste job, except I used ```iter.filter()``` to avoid the possibility for duplication that they mentioned. I'm not sure how much that matters.
Also, hi! This is my first open-source contribution and my first attempt to write Rust for anyone other than myself, so any feedback whatsoever is appreciated.
## What does this PR do?
Fixes#2560
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: Phillip Davis <phdavis1027@gmail.com>
2589: Update create-issue-dependencies.yml r=ManyTheFish a=curquiza
Minor change to update the format in the description of the issue created: remove the useless newlines
See the format without this change: https://github.com/meilisearch/meilisearch/issues/2588
Co-authored-by: Clémentine Urquizar - curqui <clementine@meilisearch.com>
583: Use BufReader to read datasets in benchmarks r=ManyTheFish a=loiclec
## What does this PR do?
Ensure that the datasets used by the benchmarks are read efficiently by using a `BufReader`.
## Why?
Using a `BufReader` is more representative of how `meilisearch` works. It will also make performance comparisons between different branches of `milli` more accurate.
Co-authored-by: Loïc Lecrenier <loic@meilisearch.com>
2584: Format API keys in hexa instead of base64 r=curquiza a=ManyTheFish
This PR:
- Changes API key generation and formatting to ease the generation of the key made by our users
- updates the `uuid` crate version
The API key can now be generated in bash as below:
```sh
echo -n $HYPHENATED_UUID | openssl dgst -sha256 -hmac $MASTER_KEY
```
fixes the issue raised in [product/discussion#421](https://github.com/meilisearch/product/discussions/421#discussioncomment-3079410), this should not impact anything in documentation nor integration but ease the key generation on the user sides.
poke `@gmourier`
Co-authored-by: ManyTheFish <many@meilisearch.com>
2587: Update mini-dashboard to v0.2.0 r=curquiza a=mdubus
# Pull Request
## What does this PR do?
Fixes#2469
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: Morgane Dubus <30866152+mdubus@users.noreply.github.com>
2539: Update Docker credentials r=curquiza a=curquiza
This is to avoid using the tpayet credentials and use `@meili-bot` credentials instead.
The `DOCKER_USERNAME` and `DOCKER_PASSWORD` are still present as secret, I will remove them once v0.28.0 is fully merged (they are still used on `release-v0.28.0`)
I tested by created a tag on the branch, it worked: the tag was pushed to docker hub by meili-bot
Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
572: Add reindexing benchmarks r=Kerollmops a=irevoire
With #557 coming, we should add benchmarks that measure our impact on the reindexing process.
Co-authored-by: Tamo <tamo@meilisearch.com>
557: Fasten documents deletion and update r=Kerollmops a=irevoire
When a document deletion occurs, instead of deleting the document we mark it as deleted in the new “soft deleted” bitmap. It is then removed from the search and all the other endpoints.
I ran the benchmarks against main;
```
% ./compare.sh indexing_main_83ad1aaf.json indexing_fasten-document-deletion_abab51fb.json
group indexing_fasten-document-deletion_abab51fb indexing_main_83ad1aaf
----- ------------------------------------------ ----------------------
indexing/-geo-delete-facetedNumber-facetedGeo-searchable- 1.05 2.0±0.40ms ? ?/sec 1.00 1904.9±190.00µs ? ?/sec
indexing/-movies-delete-facetedString-facetedNumber-searchable- 1.00 10.3±2.64ms ? ?/sec 961.61 9.9±0.12s ? ?/sec
indexing/-movies-delete-facetedString-facetedNumber-searchable-nested- 1.00 15.1±3.90ms ? ?/sec 554.63 8.4±0.12s ? ?/sec
indexing/-songs-delete-facetedString-facetedNumber-searchable- 1.00 45.1±7.53ms ? ?/sec 710.15 32.0±0.10s ? ?/sec
indexing/-wiki-delete-searchable- 1.00 277.8±7.97ms ? ?/sec 1946.57 540.8±3.15s ? ?/sec
indexing/Indexing geo_point 1.00 12.0±0.20s ? ?/sec 1.03 12.4±0.19s ? ?/sec
indexing/Indexing movies in three batches 1.00 19.3±0.30s ? ?/sec 1.01 19.4±0.16s ? ?/sec
indexing/Indexing movies with default settings 1.00 18.8±0.09s ? ?/sec 1.00 18.9±0.10s ? ?/sec
indexing/Indexing nested movies with default settings 1.00 25.9±0.19s ? ?/sec 1.00 25.9±0.12s ? ?/sec
indexing/Indexing nested movies without any facets 1.00 24.8±0.17s ? ?/sec 1.00 24.8±0.18s ? ?/sec
indexing/Indexing songs in three batches with default settings 1.00 65.9±0.96s ? ?/sec 1.03 67.8±0.82s ? ?/sec
indexing/Indexing songs with default settings 1.00 58.8±1.11s ? ?/sec 1.02 59.9±2.09s ? ?/sec
indexing/Indexing songs without any facets 1.00 53.4±0.72s ? ?/sec 1.01 54.2±0.88s ? ?/sec
indexing/Indexing songs without faceted numbers 1.00 57.9±1.17s ? ?/sec 1.01 58.3±1.20s ? ?/sec
indexing/Indexing wiki 1.00 1065.2±13.26s ? ?/sec 1.00 1065.8±12.66s ? ?/sec
indexing/Indexing wiki in three batches 1.00 1182.4±6.20s ? ?/sec 1.01 1190.8±8.48s ? ?/sec
```
Most things do not change, we lost 0.1ms on the indexing of geo point (I don’t get why), and then we are between 500 and 1900 times faster when we delete documents.
Co-authored-by: Tamo <tamo@meilisearch.com>
577: Fix deserialisation of NDJson documents in benchmarks r=irevoire a=loiclec
Previously, the first document in the NDJson file was read over and over again. So the `geo_point` benchmark was not working properly: it only indexed one document.
Co-authored-by: Loïc Lecrenier <loic@meilisearch.com>
When a document deletion occurs, instead of deleting the document we mark it as deleted
in the new “soft deleted” bitmap. It is then removed from the search, and all the other
endpoints.
2561: Add dependabot for GHA r=Kerollmops a=curquiza
No panic, I only add dependabot to update our CI, not the rust dependencies. Indeed, no easy command exists for this.
If it's too much regular notification, as usual, I will remove it.
Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2525: Auth: Provide all document related permissions for action document.* r=Kerollmops a=janithpet
Added a `Action::DocumentsAll` identifier as [suggested](https://github.com/meilisearch/meilisearch/issues/2080#issuecomment-1022952486), along with the other necessary changes in `action.rs`.
Inside `store.rs`, added an extra condition in `HeedAuthStore::put_api_key` to append all document related permissions if `key.actions.contains(&DocumentsAll)`.
Updated the tests as [suggested](https://github.com/meilisearch/meilisearch/issues/2080#issuecomment-1022952486).
I am quite new to Rust, so please let me know if I had made any mistakes; have I written the code in the most idiomatic/efficient way? I am aware that the way I append the document permissions could create duplicates in the `actions` vector, but I am not sure how fix that in a simple way (other than using other dependencies like [itertools](https://github.com/rust-itertools/itertools), for example).
## What does this PR do?
Fixes#2080
## PR checklist
Please check if your PR fulfills the following requirements:
- [ ] Does this PR fix an existing issue?
- [ x] Have you read the contributing guidelines?
- [ x] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: janithPet <jpetangoda@gmail.com>
2557: add more tests on the formatted route r=Kerollmops a=irevoire
We had a bunch of tests trying to send array of array in a get request, this is actually not supported thus I updated the tests to only send a single array or a direct string.
Also, the real tests that ensure the array of array are well handled are in milli so I don’t think we should lose time trying to « improve » our test surface on this point.
Co-authored-by: Irevoire <tamo@meilisearch.com>
568: Fix not equal filter when field contains both number and strings r=Kerollmops a=GraDKh
Related to https://github.com/meilisearch/meilisearch/issues/2516
Looks like the issue should be moved to this repo, but I'm not sure what the right procedure for it.
Co-authored-by: Dmytro Gordon <dmytro@bigstream.co>
2553: Fix ci binary push r=irevoire a=curquiza
Prevent the check of version when releasing a RC for the binary publish.
Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2549: Fix CI checking version compatibility r=irevoire a=curquiza
- Fix CI checks in `if` regarding the `stable` variable created
- Fix command to remove prefix `refs/tags/v` from `GITHUB_REF` in `check-release.sh` script
- Change from `sh` to `bash` for `check-release.sh` script
- Fix error message
Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2544: Fix content of dump/assets for testing r=curquiza a=loiclec
This is just a change to the content of two .dump files used in the integration tests.
Those files contained settings with the criterion `desc(fame)`, which is invalid
for a v3 or higher dump.
The change took place in the updates.json file inside the decompressed
.dump files. Instances of `desc(field)` or `asc(field)` were changed to
`field:desc` and `field:asc`.
The tests were (wrongly) passing because the ranking rules were never parsed.
Co-authored-by: Loïc Lecrenier <loic@meilisearch.com>
566: Introduce the copy_to_path method on the Index r=irevoire a=Kerollmops
Meilisearch needs this method to do snapshots.
Co-authored-by: Kerollmops <clement@meilisearch.com>
Some contained settings with the criterion desc(fame), which is invalid
for a v3 or higher dump.
The change took place in the updates.json file inside the decompressed
.dump files. Instances of desc(field) or asc(field) were changed to
field:desc and field:asc
2530: Check the version in Cargo.toml before publishing r=irevoire a=curquiza
Fixes#2079
Also
- improves the current docker CI for v0.28.0: the current implementation will make run 2 CI instead of just one for the official release
- move the `is-latest-releaes.sh` script, and update the documentation comment
- fix version of permissive-json-pointer
How to test the script?
```
export GITHUB_REF='refs/tags/v0.28.0'
sh .github/scripts/check-release.sh
```
Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
Co-authored-by: Clémentine Urquizar - curqui <clementine@meilisearch.com>
563: Improve the `estimatedNbHits` when a `distinctAttribute` is specified r=irevoire a=Kerollmops
This PR is related to https://github.com/meilisearch/meilisearch/issues/2532 but it doesn't fix it entirely. It improves it by computing the excluded documents (the ones with an already-seen distinct value) before stopping the loop, I think it was a mistake and should always have been this way.
The reason it doesn't fix the issue is that Meilisearch is lazy, just to be sure not to compute too many things and answer by taking too much time. When we deduplicate the documents by their distinct value we must do it along the water, everytime we see a new document we check that its distinct value of it doesn't collide with an already returned document.
The reason we can see the correct result when enough documents are fetched is that we were lucky to see all of the different distinct values possible in the dataset and all of the deduplication was done, no document can be returned.
If we wanted to implement that to have a correct `extimatedNbHits` every time we should have done a pass on the whole set of possible distinct values for the distinct attribute and do a big intersection, this could cost a lot of CPU cycles.
Co-authored-by: Kerollmops <clement@meilisearch.com>
* Create a docker tag without patch version if git tag has 0 patch version.
* Create Docker tag without patch number if git tag follows v<number>.<number>.<number>
Add minor changes on CI
* Create a docker tag without patch version if git tag has 0 patch version.
* Create Docker tag without patch number if git tag follows v<number>.<number>.<number>
Add minor changes on CI
2524: Add the specific routes for the pagination and faceting settings r=curquiza a=Kerollmops
Fixes#2522
Co-authored-by: Kerollmops <clement@meilisearch.com>
2517: Fix typos in `tasks/.rs` r=MarinPostma a=ryanrussell
Signed-off-by: Ryan Russell <git@ryanrussell.org>
## What does this PR do?
Readability in `tasks/.rs` files.
## PR checklist
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Co-authored-by: Ryan Russell <git@ryanrussell.org>
2508: docs: Readability improvements in `download-latest.sh` and `src/analytics/.rs` r=Kerollmops a=ryanrussell
# Pull Request
## What does this PR do?
Improves readability in `download-latest.sh` and in the `src/analytics/.rs` files
## PR checklist
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: Ryan Russell <git@ryanrussell.org>
2466: index resolver tests r=MarinPostma a=MarinPostma
add more index resolver tests
depends on #2455
followup #2453
Co-authored-by: ad hoc <postma.marin@protonmail.com>
2414: Improve index uid validation upon API key creation r=Kerollmops a=pierre-l
- ~Use an IndexUid newtype to enforce stronger constraints~
- ~`cargo update -p vergen`~ (`rustup update` was the proper fix for this)
- Add a new `meilisearch_types` crate
- Move `meilisearch_error` to `meilisearch_types::error`
- Move `meilisearch_lib::index_resolver::IndexUid` to `meilisearch_types::index_uid`
- Add a new `InvalidIndexUid` error in `meilisearch_types::index_uid`
- Move `meilisearch_http::routes::StarOr` to `meilisearch_types::star_or`
- Use the `IndexUid` and `StarOr` in `meilisearch_auth::Key`
Fixes#2158
Co-authored-by: pierre-l <pierre.larger@gmail.com>
552: Fix escaped quotes in filter r=Kerollmops a=irevoire
Will fix https://github.com/meilisearch/meilisearch/issues/2380
The issue was that in the evaluation of the filter, I was using the deref implementation instead of calling the `value` method of my token.
To avoid the problem happening again, I removed the deref implementation; now, you need to either call the `lexeme` or the `value` methods but can't rely on a « default » implementation to get a string out of a token.
Co-authored-by: Tamo <tamo@meilisearch.com>
2455: refactor perform task r=curquiza a=MarinPostma
Refactor some index resolver functions to make duties more consistent, and code easier to test
Co-authored-by: ad hoc <postma.marin@protonmail.com>
2498: Update CONTRIBUTING.md to add the release process r=Kerollmops a=curquiza
Add release process to the contributing
Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
Move `meilisearch_error` to `meilisearch_types::error`
Move `meilisearch_lib::index_resolver::IndexUid` to `meilisearch_types::index_uid`
Add a new `InvalidIndexUid` error in `meilisearch_types::index_uid`
2494: Introduce the new faceting and pagination settings r=ManyTheFish a=Kerollmops
This PR introduces two new settings following the newly created spec https://github.com/meilisearch/specifications/pull/157:
- The `faceting.max_values_per_facet` one describes the maximum number of values (each with a count) associated with a value in a facet distribution query.
- The `pagination.limited_to` one describes the maximum number of documents that a search query can ever return.
Co-authored-by: Kerollmops <clement@meilisearch.com>
2491: Improve Docker CIs r=curquiza a=curquiza
Close#1901
Continuing work of
- [x] Merge two docker CI in one (#2477). The latest tag is still only push for the official release.
- [x] Add cron job in GHA to run the CI (without publishing the image) every day. This will check the docker build is working.
Co-authored-by: Lawrence Chou <lawrencechou1024@gmail.com>
Co-authored-by: Clémentine Urquizar - curqui <clementine@meilisearch.com>
Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2460: Create custom error types for `TaskType`, `TaskStatus`, and `IndexUid` r=Kerollmops a=walterbm
# Pull Request
## What does this PR do?
Fixes#2443 by making the following changes:
- Add custom `TaskTypeError` for `TaskType::from_str`
- Add custom `TaskStatusError` for `TaskStatus::from_str`
- Add custom `IndexUidFormatError` for `IndexUid::from_str`
- Implement `From<IndexUidFormatError> for IndexResolverError` to convert between errors
- Replace all usages of `IndexUid::new` with `IndexUid::from_str`
- **NOTE** I am relatively new to Rust and I struggled a lot with this final part. This PR ended up with a messy error conversion which does not seem ideal. Please let me know if you have any suggestions for how to make this better and I'll be happy to make any updates!
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: walter <walter.beller.morales@gmail.com>
2473: fix blocking in dumps r=irevoire a=MarinPostma
This PR fixes two blocking calls in the dump process.
Co-authored-by: ad hoc <postma.marin@protonmail.com>
548: Setup the new limits on the number of facet values to return r=ManyTheFish a=Kerollmops
This PR implements the early draft of the new spec (waiting for it) specifying how the new facet limit feature should work and which limit we apply to the number of facet values to return by facet.
Co-authored-by: Kerollmops <clement@meilisearch.com>
2487: Feat(auth): Use hmac instead of sha256 to derivate API keys from master key r=MarinPostma a=ManyTheFish
Wrap sha256 in HMAC to derivate new API keys from the master key.
2489: Fix(auth): Authorization test were not testing keys unrestricted on index r=Kerollmops a=ManyTheFish
Co-authored-by: ManyTheFish <many@meilisearch.com>
2453: test index resolver r=MarinPostma a=MarinPostma
add some tests to the `IndexResolver` implementation of `BatchHandler`
Co-authored-by: ad hoc <postma.marin@protonmail.com>
2471: Remove the connection keep-alive timeout r=MarinPostma a=Thearas
# Pull Request
## What does this PR do?
Fixes <https://github.com/meilisearch/meilisearch-go/issues/221>.
Meilisearch has a default connection keep-alive timeout for 5s, which means it will close the connections with idle time >= 5s.
This PR set actix-web keep-alive config to `KeepAlive::Os`, let the client and system to decide when to close the connection.
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Co-authored-by: Thearas <thearas850@gmail.com>
2476: fix(test): Windows index pagination test r=ManyTheFish a=ManyTheFish
The test `index::get_index::get_and_paginate_indexes` fails on every PRs during `Tests on windows-latest` CI job.
- reduce the default index size in tests.
- wait for each task instead of waiting for the last one at the end.
Co-authored-by: ManyTheFish <many@meilisearch.com>
2464: Simplify the `star_or` function usage r=ManyTheFish a=Kerollmops
This PR simplifies the usage of the `star_or` function that was originally introduced in #2399. The `serde-cs` https://github.com/naughie/serde-cs/pull/1 PR was merged and was implementing the `IntoIterator` trait on the `CS` types, which makes it possible to directly collect without converting a `CS` into the inner type (vec).
Co-authored-by: Kerollmops <clement@meilisearch.com>
2459: Fix typo in codebase comments r=Kerollmops a=ryanrussell
# Pull Request
## PR checklist
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Co-authored-by: Ryan Russell <git@ryanrussell.org>
541: Update version for next release (v0.29.0) r=ManyTheFish a=curquiza
Need to update the version since #540 was merged and breaking
Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2454: Unify the pagination of the index and documents route behind a common type r=curquiza a=irevoire
`@MarinPostma` wdyt of keeping the `auto_paginate_sized` until we implement the pagination on every route that needs it just to see if it could be useful to something else
Co-authored-by: Tamo <tamo@meilisearch.com>
2447: move index uid in task content r=Kerollmops a=MarinPostma
this pr moves the index_uid from the `Task` to the `TaskContent`. This is because the task can now have content that do not target a particular index.
Co-authored-by: ad hoc <postma.marin@protonmail.com>
2451: feat(API-keys): Change immutable_field error message r=Kerollmops a=ManyTheFish
Change the immutable_field error message to fit the recent changes in the spec:
aa0a148ee3..84a9baff68
Co-authored-by: ManyTheFish <many@meilisearch.com>
2450: Bump the dependencies r=ManyTheFish a=Kerollmops
In order to use [the latest version of grenad](https://docs.rs/grenad) I bump the dependencies here. We also use the latest versions of all our other dependencies now.
Co-authored-by: Kerollmops <clement@meilisearch.com>
2438: Refine keys api r=ManyTheFish a=ManyTheFish
waiting for #2410 and #2444 to be merged.
fix#2369
Co-authored-by: ManyTheFish <many@meilisearch.com>
535: Reintroduce the max values by facet limit r=ManyTheFish a=Kerollmops
This PR reintroduces the max values by facet limit this is related to https://github.com/meilisearch/meilisearch/issues/2349.
~I would like some help in deciding on whether I keep the default 100 max values in milli and set up the `FacetDistribution` settings in Meilisearch to use 1000 as the new value, I expose the `max_values_by_facet` for this purpose.~
I changed the default value to 1000 and the max to 10000, thank you `@ManyTheFish` for the help!
Co-authored-by: Kerollmops <clement@meilisearch.com>
2448: docs(security): Fix `Supported` r=MarinPostma a=ryanrussell
Signed-off-by: Ryan Russell <git@ryanrussell.org>
# Pull Request
## What does this PR do?
- Fix typo in security `Suported` -> `Supported`
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Co-authored-by: Ryan Russell <git@ryanrussell.org>
2446: rename Succeded to Succeeded r=irevoire a=MarinPostma
this pr renames `TaskEvent::Succeded` to `TaskEvent::Succeeded` and apply the migration to the dumps
Co-authored-by: ad hoc <postma.marin@protonmail.com>
2427: Update the documents resource r=MarinPostma a=irevoire
- Return Documents API resources on `/documents` in an array in the results field.
- Add limit, offset, and total in the response body.
- Rename `attributesToRetrieve` into `fields` (only for the `/documents` endpoints, not for the `/search` ones).
- The `displayedAttributes` settings do not impact anymore the displayed fields returned in the `/documents` endpoints. These settings only impact the `/search` endpoint.
- make the `/document/:uid` route accept the `fields` query parameter
Fix#2372
Co-authored-by: Irevoire <tamo@meilisearch.com>
- Return Documents API resources on `/documents` in an array in the the results field.
- Add limit, offset and total in the response body.
- Rename `attributesToRetrieve` into `fields` (only for the `/documents` endpoints, not for the `/search` ones).
- The `displayedAttributes` settings does not impact anymore the displayed fields returned in the `/documents` endpoints. These settings only impacts the `/search` endpoint.
Fix#2372
2399: Update the tasks endpoints r=MarinPostma a=Kerollmops
This PR wraps all the changes related to the `tasks` endpoints, it is related to https://github.com/meilisearch/meilisearch/issues/2377 but doesn't close it. I will create a new PR to work on [the seek-based pagination](https://github.com/meilisearch/specifications/pull/115).
I wanted to do something cool with Github: being able to merge multiple PR in this one, to help review changes one by one, unfortunately, Github doesn't allow creating empty PRs. I also struggled with git itself when it comes to merging things in the right order, so I decided that I would add all of the changes in this single PR. I will list the changes and references to the specs here.
- [x] Tasks statuses and types must be case insensitive
- [x] Tasks statuses, types and indexUid must accept the `*` selector
- [ ] Rename the `TaskDetails` struct fields
## Changes
- [ ] Add seek-based pagination following [the spec](https://github.com/meilisearch/specifications/pull/115)
- [x] Add filtering on the `/tasks` endpoint following [this spec](https://github.com/meilisearch/specifications/pull/116)
- [x] Add filtering capabilities on `type`, `status` and `indexUid` for `GET` `task` lists endpoints.
- [x] It is possible to specify several values for a filter using the `,` character. e.g. `?status=enqueued,processing`
- [x] Between two different filters, an AND operation is applied. e.g. `?status=enqueued&type=indexCreation` is equivalent to `status=enqueued AND type = indexCreation`
- [x] Remove `GET /indexes/:indexUid/tasks`. It can be replaced by `GET /tasks?indexUid=:indexUid`
- [x] Remove `GET /indexes/:indexUid/tasks/:taskUid`.
- [x] Rename `uid` to `taskUid` in the `202 - Accepted` task response return by every asynchronous tasks (ex: index creation, document addition...)
- [x] Rename some task properties
- [x] `documentPartial`-> `documentAdditionOrUpdate`
- [x] `documentAddition`-> `documentAdditionOrUpdate`
- [x] `clearAll` -> `documentDeletion`
Co-authored-by: Kerollmops <clement@meilisearch.com>
2444: add boilerplate for dump v5 r=MarinPostma a=MarinPostma
add the boilerplate files for dump v5
Co-authored-by: ad hoc <postma.marin@protonmail.com>
Co-authored-by: ManyTheFish <many@meilisearch.com>
2410: Make dump a task r=Kerollmops a=MarinPostma
This PR transforms the dump task into a proper task.
The `GET /dumps/:dump_uid` is removed.
Some changes were made to make this work, and a bit a refactoring was necessary.
- The `dump_actor` module has been renamed do `dumps` and moved to the root
- There isn't a `DumpActor` anymore, and the dump process is handled by the `DumpHandler`.
- The `TaskPerformer` is renamed to `BatchHandler`
- The `BatchHandler` trait no longer has a `perform_job` method, but instead has a `accept` method returning whether a handler can proccess a batch
- The scheduler now accept a list of `BatchHandler`, and iterates trhough them until it finds one to accept the current batch.
- `Job` doesn't exist anymore, and everything in now inside of the `BatchContent` enum.
- The `Vec<TaskId>` from `Batch` is replaced with a `BatchContent` enum which hints at the content.
- The Scheduler is slightly modified to accept batch, and prioritize them before regular tasks.
- The `TaskList` are not identified by a `String` representing the index uid anymore, but by a `TaskListIdentifier` which also works for dumps which are not targeting any specific indexes.
- The `GET /dump/:dump_id` no longer exists
- `DumpActorError` is renamed to `DumpError`
close#2410
Co-authored-by: ad hoc <postma.marin@protonmail.com>
2429: Send the analytics to `telemetry.meilisearch.com` instead of segment r=MarinPostma a=irevoire
Fix#2425
Co-authored-by: Irevoire <tamo@meilisearch.com>
538: speedup exact words r=Kerollmops a=MarinPostma
This PR make `exact_words` return an `Option` instead of an empty set, since set creation is costly, as noticed by `@kerollmops.`
I was not convinces that this was the cause for all of the performance drop we measured, and then realized that methods that initialized it were called recursively which caused initialization times to add up. While the first fix solves the issue when not using exact words, using exact word remained way more expensive that it should be. To address this issue, the exact words are cached into the `Context`, so they are only initialized once.
Co-authored-by: ad hoc <postma.marin@protonmail.com>
536: Improves ranking rules error message r=Kerollmops a=matthias-wright
This PR improves the ranking rules error message to properly reflect the case sensitivity.
The issue was highlighted in [meilisearch/issues/2407](https://github.com/meilisearch/meilisearch/issues/2407).
Cheers!
Co-authored-by: Matthias Wright <matthias.s.wright@gmail.com>
2357: chore(dump): add dump tests r=Kerollmops a=irevoire
Add tests on the import of dump v1, v2, v3 and v4.
Since the dumps are slow to decompress, I made the `flate2` crate always compile in optimized.
And since they're also slow to index, I also made the `milli` crate always compile in optimized. What do you think of this `@MarinPostma?`
Should we keep milli unoptimized in case it could help us debug some things? 👀
Co-authored-by: Tamo <tamo@meilisearch.com>
2408: Upgrade milli v0.28 r=ManyTheFish a=ManyTheFish
- Add smart crop
- Add test on _matches_infos
- Fix some test
Co-authored-by: ManyTheFish <many@meilisearch.com>
2404: Bring `release-v0.27.1` into `main` r=curquiza a=curquiza
Following the v0.27.1 hotfixes
Co-authored-by: ad hoc <postma.marin@protonmail.com>
Co-authored-by: bors[bot] <26634292+bors[bot]@users.noreply.github.com>
Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
532: Add some implementation on MatchBounds r=Kerollmops a=ManyTheFish
Theses Implementations are needed in meilisearch
Co-authored-by: ManyTheFish <many@meilisearch.com>
2339: deny warnings in CI r=irevoire a=MarinPostma
Add `RUSTFLAGS= -D warnings` to the CI so all warnings are treated as hard errors.
Co-authored-by: ad hoc <postma.marin@protonmail.com>
525: Simplify the error creation with thiserror r=irevoire a=irevoire
I introduced [`thiserror`](https://docs.rs/thiserror/latest/thiserror/) to implements all the `Display` trait and most of the `impl From<xxx> for yyy` in way less lines.
And then I introduced a cute macro to implements the `impl<X, Y, Z> From<X> for Z where Y: From<X>, Z: From<X>` more easily.
Co-authored-by: Tamo <tamo@meilisearch.com>
527: Remove the wip section part of the contributing file r=curquiza a=Kerollmops
Everything was good in the _Development Workflow_ section so I removed the _WIP Section_ part, now this PR fixes https://github.com/meilisearch/milli/issues/513.
Co-authored-by: Kerollmops <clement@meilisearch.com>
522: Do not generate keys that are too long for LMDB r=Kerollmops a=Kerollmops
This PR fixes https://github.com/meilisearch/meilisearch/issues/2338 by making sure that we do not generate keys that are too long for LMDB especially when we are creating our prefix and proximity pairs keys.
Co-authored-by: Kerollmops <clement@meilisearch.com>
507: deny warnings in CI r=Kerollmops a=MarinPostma
Add `RUSTFLAGS= -D warnings` to the CI so all warnings are treated as hard errors.
Co-authored-by: ad hoc <postma.marin@protonmail.com>
511: Update version in every workspace r=curquiza a=curquiza
Checked with `@Kerollmops`
- Update the version into every workspace (the current version is v0.27.0, but I forgot to update it for the previous release)
- add `publish = false` except in `milli` workspace.
Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
514: Stop flattening every field r=Kerollmops a=irevoire
When we need to flatten a document:
* The primary key contains a `.`.
* Some fields need to be flattened
Instead of flattening the whole object and thus creating a lot of allocations with the `serde_json_flatten_crate`, we instead generate a minimal sub-object containing only the fields that need to be flattened.
That should create fewer allocations and thus index faster.
---------
```
group indexing_main_e1e362fa indexing_stop-flattening-every-field_40d1bd6b
----- ---------------------- ---------------------------------------------
indexing/Indexing geo_point 1.99 23.7±0.23s ? ?/sec 1.00 11.9±0.21s ? ?/sec
indexing/Indexing movies in three batches 1.00 18.2±0.24s ? ?/sec 1.01 18.3±0.29s ? ?/sec
indexing/Indexing movies with default settings 1.00 17.5±0.09s ? ?/sec 1.01 17.7±0.26s ? ?/sec
indexing/Indexing songs in three batches with default settings 1.00 64.8±0.47s ? ?/sec 1.00 65.1±0.49s ? ?/sec
indexing/Indexing songs with default settings 1.00 54.9±0.99s ? ?/sec 1.01 55.7±1.34s ? ?/sec
indexing/Indexing songs without any facets 1.00 50.6±0.62s ? ?/sec 1.01 50.9±1.05s ? ?/sec
indexing/Indexing songs without faceted numbers 1.00 54.0±1.14s ? ?/sec 1.01 54.7±1.13s ? ?/sec
indexing/Indexing wiki 1.00 996.2±8.54s ? ?/sec 1.02 1021.1±30.63s ? ?/sec
indexing/Indexing wiki in three batches 1.00 1136.8±9.72s ? ?/sec 1.00 1138.6±6.59s ? ?/sec
```
So basically everything slowed down a liiiiiittle bit except the dataset with a nested field which got twice faster
Co-authored-by: Tamo <tamo@meilisearch.com>
515: Improve the README r=curquiza a=Kerollmops
This PR closes#512 by adding more content to the README. We listed all of the subcrates of the repository, changed the descriptions of the subcrates, and added a simple example usage in the README.
Co-authored-by: Kerollmops <clement@meilisearch.com>
Co-authored-by: Clémentine Urquizar - curqui <clementine@meilisearch.com>
505: normalize exact words r=curquiza a=MarinPostma
Normalize the exact words, as specified in the specification.
Co-authored-by: ad hoc <postma.marin@protonmail.com>
2347: Change Nelson path r=curquiza a=curquiza
Nelson is now on the Meilisearch orga side
Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
483: Enhance matching words r=Kerollmops a=ManyTheFish
# Summary
Enhance milli word-matcher making it handle match computing and cropping.
# Implementation
## Computing best matches for cropping
Before we were considering that the first match of the attribute was the best one, this was accurate when only one word was searched but was missing the target when more than one word was searched.
Now we are searching for the best matches interval to crop around, the chosen interval is the one:
1) that have the highest count of unique matches
> for example, if we have a query `split the world`, then the interval `the split the split the` has 5 matches but only 2 unique matches (1 for `split` and 1 for `the`) where the interval `split of the world` has 3 matches and 3 unique matches. So the interval `split of the world` is considered better.
2) that have the minimum distance between matches
> for example, if we have a query `split the world`, then the interval `split of the world` has a distance of 3 (2 between `split` and `the`, and 1 between `the` and `world`) where the interval `split the world` has a distance of 2. So the interval `split the world` is considered better.
3) that have the highest count of ordered matches
> for example, if we have a query `split the world`, then the interval `the world split` has 2 ordered words where the interval `split the world` has 3. So the interval `split the world` is considered better.
## Cropping around the best matches interval
Before we were cropping around the interval without checking the context.
Now we are cropping around words in the same context as matching words.
This means that we will keep words that are farther from the matching words but are in the same phrase, than words that are nearer but separated by a dot.
> For instance, for the matching word `Split` the text:
`Natalie risk her future. Split The World is a book written by Emily Henry. I never read it.`
will be cropped like:
`…. Split The World is a book written by Emily Henry. …`
and not like:
`Natalie risk her future. Split The World is a book …`
Co-authored-by: ManyTheFish <many@meilisearch.com>
493: Use smartstring to store the external id in our hashmap r=Kerollmops a=irevoire
We need to store all the external id (primary key) in a hashmap
associated to their internal id.
The smartstring remove heap allocation / memory usage and should
improve the cache locality.
I ran the benchmarks to measure the impact of this PR on the indexing time.
I think we should merge it whatever happens thought because it'll decrease the memory consumption.
---------
This improve really sliiiiiightly the performances but improve the memory usage thus it should be merged.
```
group indexing_main_6b073738 indexing_use-smartsring_3f343511
----- ---------------------- --------------------------------
indexing/Indexing geo_point 1.02 25.2±0.20s ? ?/sec 1.00 24.8±0.13s ? ?/sec
indexing/Indexing movies in three batches 1.00 18.2±0.10s ? ?/sec 1.00 18.2±0.23s ? ?/sec
indexing/Indexing movies with default settings 1.00 17.5±0.09s ? ?/sec 1.01 17.7±0.11s ? ?/sec
indexing/Indexing songs in three batches with default settings 1.00 68.3±1.01s ? ?/sec 1.00 68.0±0.95s ? ?/sec
indexing/Indexing songs with default settings 1.00 63.2±0.78s ? ?/sec 1.00 63.0±0.58s ? ?/sec
indexing/Indexing songs without any facets 1.02 59.6±1.00s ? ?/sec 1.00 58.5±1.03s ? ?/sec
indexing/Indexing songs without faceted numbers 1.00 62.8±0.38s ? ?/sec 1.00 62.6±1.02s ? ?/sec
indexing/Indexing wiki 1.01 1009.2±25.25s ? ?/sec 1.00 998.1±11.27s ? ?/sec
indexing/Indexing wiki in three batches 1.01 1142.0±9.97s ? ?/sec 1.00 1134.4±11.21s ? ?/sec
```
Co-authored-by: Tamo <tamo@meilisearch.com>
We need to store all the external id (primary key) in a hashmap
associated to their internal id during.
The smartstring remove heap allocation / memory usage and should
improve the cache locality.
496: Improve the performances of the flattening subcrate r=irevoire a=Kerollmops
This PR adds some benchmarks to the _flatten-serde-json_ crate, this crate is responsible for transforming the original documents into flat versions that the engine can understand. It can probably be speed-up and this is why I added benchmarks to it.
I make some interesting performance improvements when I replaced the `json!` macro calls.
```
flatten/simple time: [452.44 ns 453.31 ns 454.18 ns]
change: [-15.036% -14.751% -14.473%] (p = 0.00 < 0.05)
Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
2 (2.00%) high mild
Benchmarking flatten/complex: Collecting 100 samples in estimated 5.0007 s (4.9M i flatten/complex time: [1.0101 us 1.0131 us 1.0160 us]
change: [-18.001% -17.775% -17.536%] (p = 0.00 < 0.05)
Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
5 (5.00%) high mild
1 (1.00%) high severe
```
---
_I removed this particular commit from this PR._ The reason is that the two other commits were enough for this PR to give enough impact and be merged. We will continue to explore where we can get performances later.
But when I changed the flattening function to accept an owned version of the objects, we lost a lot of performances. Yes, I rewrote the benchmarks (locally) to clone the input object (and measured both, previous and new versions, with the cloning benchmarks). Maybe cloning the benchmark inputs is not the right thing to do...
```
Benchmarking flatten/simple: Collecting 100 samples in estimated 5.0005 s (6.7M it flatten/simple time: [746.46 ns 749.59 ns 752.70 ns]
change: [+40.082% +40.714% +41.347%] (p = 0.00 < 0.05)
Performance has regressed.
Benchmarking flatten/complex: Collecting 100 samples in estimated 5.0047 s (2.9M i flatten/complex time: [1.7311 us 1.7342 us 1.7368 us]
change: [+40.976% +41.398% +41.807%] (p = 0.00 < 0.05)
Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) low mild
```
Co-authored-by: Kerollmops <clement@meilisearch.com>
491: remove the unused key warning r=curquiza a=irevoire
When I copy-pasted my flatten crate I forgot to remove the key used to publish the package and that throw a warning.
Co-authored-by: Tamo <tamo@meilisearch.com>
490: Enforce labelling for the PRs r=curquiza a=curquiza
- Enforce one of the following labels to make the CI pass: `no breaking`, `DB breaking`, `API breaking` (milli API, not the Meilisearch API of course), or `skip changelog`. This new CI is now `Required` in the GitHub settings for merging a PR.
- Adapt the release drafter to these new labels
- rename `skip-changelog` into `skip changelog` according to the new label name
Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
458: Nested fields r=Kerollmops a=irevoire
For the following document:
```json
{
"id": 1,
"person": {
"name": "tamo",
"age": 25,
}
}
```
Suppose the user sets `person` as a filterable attribute. We need to store `person` in the filterable _obviously_. But we also need to keep track of `person.name` and `person.age` somewhere.
That’s where I changed a little bit the logic of the engine.
Currently, we have a function called `faceted_field` that returns the union of the filterable and sortable.
I renamed this function in `user_defined_faceted_field`. And now, when we finish indexing documents, we look at all the fields and see if they « match » a `user_defined_faceted_field`.
So in our case:
- does `id` match `person`: 🔴
- does `person.name` match `person`: 🟢
- does `person.age` match `person`: 🟢
And thus, we insert in the database the following faceted fields: `person, person.name, person.age`.
The good thing about that solution is that we generate everything during the indexing phase, and then during the search, we can access our field without recomputing too much globbing.
-----
Now the bad thing is that I had to create a new db.
And if that was only one db, that would be ok, but actually, I need to do the same for the:
- Displayed attributes
- Attributes to retrieve
- Attributes to highlight
- Attribute to crop
`@Kerollmops`
Do you think there is a better way to do it?
Apart from all the code, can we have a problem because we have too many dbs?
Co-authored-by: Irevoire <tamo@meilisearch.com>
Co-authored-by: Tamo <tamo@meilisearch.com>
486: Update version (v0.25.0) r=curquiza a=curquiza
v0.25.0 will be released once #478 is merged
Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
474: Disable typos on exact word r=MarinPostma a=MarinPostma
This PR introduces the `exact_word` setting to disable typo tolerance on custom words.
If a user query contains a word from `exact_words`, no typo derivation will be made for that particular word.
I have chosen to store the words in a FST, to save on deserialization, and allow for fast lookups.
I had some trouble with the `serde` module, and had to rename it `serde_impl`.
## steps:
- [x] introduce new settings to register words to disable typos on
- [x] in `typos`, return exact match is the current word is part of the word to disable typos for.
- [x] update `Context` to return the exact words dictionary.
- [x] merge #473
Co-authored-by: ad hoc <postma.marin@protonmail.com>
473: set minimum word len for typos r=MarinPostma a=MarinPostma
this PR allows the configuration on the minimum word length for typos.
The default values are the same as previously.
## steps
- [x] introduce settings for the minimum word length for 1 and 2 typos
- [x] update the settings update flow to set this setting
- [x] create a structure `TypoConfig` to configure typo tolerance in the query builder
- [x] in `typo`, use the configuration to create the appropriate query tree node.
- [x] extend `Context` to return the setting for minimum word length for typos
- [x] return correct error message for wrong settings.
- [x] merge #469
Co-authored-by: ad hoc <postma.marin@protonmail.com>
485: fix bug on 2 typos derivation r=Kerollmops a=MarinPostma
I found a bug while working on #473. This pr fixes it and add the missing tests on word derivations.
Co-authored-by: ad hoc <postma.marin@protonmail.com>
469: add authorize typo setting r=Kerollmops a=MarinPostma
This PR adds support for an authorize typo settings. This makes is possible to disable typos for a whole index. Typos are enabled by default.
Co-authored-by: ad hoc <postma.marin@protonmail.com>
480: Increase benchmarks (push) CI timeout r=Kerollmops a=Kerollmops
This PR fixes the fact that the benchmarks CI on push were [canceled by GitHub](https://github.com/meilisearch/milli/actions/runs/2028844132) because they reached the default timeout of 6h. This PR changes the timeout to 72h, the same setting as the manually triggered benchmark one.
Co-authored-by: Kerollmops <clement@meilisearch.com>
479: Update version (v0.24.1) r=Kerollmops a=curquiza
From v0.23.1 to v0.24.1 since we had an issue with the versionning for the previous release
Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
475: Bump tokenizer r=Kerollmops a=irevoire
This PR bump the tokenizer in v0.2.9 which fixes an issue we had with lindera where reqwest was used with openssl (which was breaking our benchmarks).
Co-authored-by: Irevoire <tamo@meilisearch.com>
476: Rollback meilisearch-tokenizer version r=Kerollmops a=irevoire
Lindera often fails to download some data from google drive we can’t compile consistently meilisearch / milli.
We can’t bump to the latest version (that moved out of google drive) either because lindera uses reqwest with openssl with no way of configuring it our benchmarks were not able to run. The latter issue should be fixed by https://github.com/lindera-morphology/lindera/pull/164.
Co-authored-by: Irevoire <tamo@meilisearch.com>
472: Remove useless variables in proximity r=Kerollmops a=ManyTheFish
Was passing by plane sweep algorithm to find some inspiration, and I discover that we have useless variables that were not detected because of the recursive function.
Co-authored-by: ManyTheFish <many@meilisearch.com>
468: Add a new error message when the filterableAttributes are empty r=Kerollmops a=brunoocasali
Fixes https://github.com/meilisearch/meilisearch/issues/2140
Is there a good way to reduce de duplication here? Maybe adding a shared function? I don't know the best and idiomatic way to do that, I appreciate any tip!
Another doubt is related to the duplication of the calling:
```rs
// filter.rs:373
FilterError::AttributeNotFilterable {
attribute,
filterable: filterable_fields.into_iter().collect::<Vec<_>>().join(" "),
},
```
and
```rs
// filter.rs:424
return Err(point[0].as_external_error(FilterError::AttributeNotFilterable {
attribute: "_geo",
filterable: filterable_fields.into_iter().collect::<Vec<_>>().join(" "),
}))?;
```
I think we could make the `filterable_fields.into_iter().collect::<Vec<_>>().join(" ")` directly into the error handling like the sortable error. I made it into the last commit, if this is something to avoid, let me know and I can remove it :)
Co-authored-by: Bruno Casali <brunoocasali@gmail.com>
466: Bump version to 0.23.1 r=curquiza a=Kerollmops
This PR bumps the crate versions to 0.23.1. Nothing seems to be breaking in the next release.
Co-authored-by: Kerollmops <clement@meilisearch.com>
467: optimize prefix database r=Kerollmops a=MarinPostma
This pr introduces two optimizations that greatly improve the speed of computing prefix databases.
- The time that it takes to create the prefix FST has been divided by 5 by inverting the way we iterated over the words FST.
- We unconditionally and needlessly checked for documents to remove in `word_prefix_pair`, which caused an iteration over the whole database.
Co-authored-by: ad hoc <postma.marin@protonmail.com>
465: Update dependencies r=ManyTheFish a=Kerollmops
This PR upgrade and updates this crate's dependencies but first, it removes three dependencies that we don't use anymore. I used [cargo udeps](https://github.com/est31/cargo-udeps) to upgrade them ⬆️
Co-authored-by: Kerollmops <clement@meilisearch.com>
Co-authored-by: Clément Renault <clement@meilisearch.com>
464: exporting heed to avoid having different versions of Heed in Meilisearch r=curquiza a=psvnlsaikumar
# Pull Request
## What does this PR do?
Fixes the issue in meilisearch https://github.com/meilisearch/meilisearch/issues/2210
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: psvnl sai kumar <psvnlsaikumar@gmail.com>
457: Avoid iterating on big databases when useless r=Kerollmops a=Kerollmops
This PR makes the prefix database updates to avoid iterating on big grenad files when it is unnecessary. We introduced this regression in #436 but it went unnoticed.
---
According to the following benchmark results, we take more time when we index documents in one run than before #436. It looks like it is probably due to the fact that, now, instead of computing the prefixes database by iterating on the LMDB we directly iterate on the grenad file. Those could be slower to iterate on and could be the slowdown cause.
I just pushed a commit that tests this branch with the new unreleased version of grenad where some work was done to speed up the iteration on grenad files. [The benchmarks for this last commit](https://github.com/meilisearch/milli/actions/runs/1927187408) are currently running. You can [see the diff](https://github.com/meilisearch/grenad/compare/v0.4.1...main) between the v0.4 and the unreleased v0.5 version of grenad.
```diff
group indexing_benchmark-multi-batch-indexing-before-speed-up_45f52620 indexing_stop-iterating-on-big-grenad-files_ac8b85c4
----- ---------------------------------------------------------------- ----------------------------------------------------
+ indexing/Indexing songs in three batches with default settings 1.12 57.7±2.14s ? ?/sec 1.00 51.3±2.76s ? ?/sec
- indexing/Indexing wiki 1.00 917.3±30.01s ? ?/sec 1.10 1008.4±38.27s ? ?/sec
+ indexing/Indexing wiki in three batches 1.10 1091.2±32.73s ? ?/sec 1.00 995.5±24.33s ? ?/sec
```
Co-authored-by: Kerollmops <clement@meilisearch.com>
461: Add a new error message when the `valid_fields` is empty r=curquiza a=brunoocasali
I've created a test case to handle the new error formatting behavior, but I'm not sure if:
- this is the right place to add the test?
- this is the best way to test this behavior?
And I'm not sure also regarding the `match` implementation, is this something required? Or maybe just an `if` statement is ok as well?
I left the two messages literally without "reusing the prefix" in the implementation because I think this could help the "searchability" of the error in the future.
# Pull Request
## What does this PR do?
Fixes https://github.com/meilisearch/meilisearch/issues/2140
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue?
- [ ] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: Bruno Casali <brunoocasali@gmail.com>
462: cli improvements r=Kerollmops a=MarinPostma
a few improvements:
- use bufreader to load documents, so the loading of the document doesn't appear on flamegraphs
- set default db path to current directory so the `-i` flag can be omitted.
Co-authored-by: ad hoc <postma.marin@protonmail.com>
> "Attribute `{}` is not sortable. This index doesn't have configured sortable attributes."
> "Attribute `{}` is not sortable. Available sortable attributes are: `{}`."
coexist in the error handling
459: Update heed link in cargo toml r=Kerollmops a=curquiza
Since grenad and heed have been moved to the meilisearch orga, this PR changes the link.
This is a minor change since GitHub handles automatically the redirection. This PR is only for consisitency.
Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
456: Remove useless grenad merging r=Kerollmops a=Kerollmops
This PR must be merged after #454.
This PR removes the part of code that was merging all of the grenad Readers merging that we don't need as the indexer should have merged them and, therefore, we should only have one final grenad Reader. We reduce the amount of CPU usage and memory pressure we were doing uselessly.
`@ManyTheFish` are you sure I can skip merging the `word_docids` database?
Here is the benchmark comparison with the previously merged PR #454:
```
group indexing_reintroduce-appending-sorted-values_c05e42a8 indexing_remove-useless-grenad-merging_d5b8b5a2
----- ----------------------------------------------------- -----------------------------------------------
indexing/Indexing movies with default settings 1.06 16.6±1.04s ? ?/sec 1.00 15.7±0.93s ? ?/sec
indexing/Indexing songs with default settings 1.16 60.1±7.07s ? ?/sec 1.00 51.7±5.98s ? ?/sec
indexing/Indexing songs without faceted numbers 1.06 55.4±6.14s ? ?/sec 1.00 52.2±4.13s ? ?/sec
```
And the comparison with multi-batch indexing before #436, we can see that we gain time for benchmarks that index datasets in multiple batches but there is _so much_ variance that it's not clear.
```
group indexing_benchmark-multi-batch-indexing-before-speed-up_45f52620 indexing_remove-useless-grenad-merging_d5b8b5a2
----- ---------------------------------------------------------------- -----------------------------------------------
indexing/Indexing geo_point 1.07 6.6±0.08s ? ?/sec 1.00 6.2±0.11s ? ?/sec
indexing/Indexing songs in three batches with default settings 1.12 57.7±2.14s ? ?/sec 1.00 51.5±3.80s ? ?/sec
indexing/Indexing songs with default settings 1.00 47.5±2.52s ? ?/sec 1.09 51.7±5.98s ? ?/sec
indexing/Indexing songs without any facets 1.00 43.5±1.43s ? ?/sec 1.12 48.8±3.73s ? ?/sec
indexing/Indexing songs without faceted numbers 1.00 47.1±2.23s ? ?/sec 1.11 52.2±4.13s ? ?/sec
indexing/Indexing wiki 1.00 917.3±30.01s ? ?/sec 1.09 998.7±38.92s ? ?/sec
indexing/Indexing wiki in three batches 1.09 1091.2±32.73s ? ?/sec 1.00 996.5±15.70s ? ?/sec
```
What do you think `@irevoire?` Should we change the benchmarks to make them do more runs?
Co-authored-by: Kerollmops <clement@meilisearch.com>
454: Reintroduce appending sorted entries when possible r=Kerollmops a=Kerollmops
This PR modifies the `sorter_into_lmdb_database` function to append values into the database instead of get-put-merging them, it should improve the indexation speed for when the database is empty.
```txt
group indexing_main_25123af3 indexing_reintroduce-appending-sorted-values_c05e42a8
----- ---------------------- -----------------------------------------------------
indexing/Indexing movies with default settings 1.07 17.8±0.99s ? ?/sec 1.00 16.6±1.04s ? ?/sec
indexing/Indexing songs with default settings 1.00 57.0±6.01s ? ?/sec 1.05 60.1±7.07s ? ?/sec
indexing/Indexing songs without any facets 1.10 51.8±5.36s ? ?/sec 1.00 47.3±3.30s ? ?/sec
```
Co-authored-by: Clément Renault <clement@meilisearch.com>
453: Benchmark multi batch indexing r=Kerollmops a=Kerollmops
Hey `@irevoire,` could you please add the new benchmarks into influx?
Co-authored-by: Kerollmops <clement@meilisearch.com>
Co-authored-by: Clément Renault <clement@meilisearch.com>
436: Speed up the word prefix databases computation time r=Kerollmops a=Kerollmops
This PR depends on the fixes done in #431 and must be merged after it.
In this PR we will bring the `WordPrefixPairProximityDocids`, `WordPrefixDocids` and, `WordPrefixPositionDocids` update structures to a new era, a better era, where computing the word prefix pair proximities costs much fewer CPU cycles, an era where this update structure can use the, previously computed, set of new word docids from the newly indexed batch of documents.
---
The `WordPrefixPairProximityDocids` is an update structure, which means that it is an object that we feed with some parameters and which modifies the LMDB database of an index when asked for. This structure specifically computes the list of word prefix pair proximities, which correspond to a list of pairs of words associated with a proximity (the distance between both words) where the second word is not a word but a prefix e.g. `s`, `se`, `a`. This word prefix pair proximity is associated with the list of documents ids which contains the pair of words and prefix at the given proximity.
The origin of the performances issue that this struct brings is related to the fact that it starts its job from the beginning, it clears the LMDB database before rewriting everything from scratch, using the other LMDB databases to achieve that. I hope you understand that this is absolutely not an optimized way of doing things.
Co-authored-by: Clément Renault <clement@meilisearch.com>
Co-authored-by: Kerollmops <clement@meilisearch.com>
451: Update LICENSE with Meili SAS name r=Kerollmops a=curquiza
Check with thomas, we must put the real name of the company
Co-authored-by: Clémentine Urquizar - curqui <clementine@meilisearch.com>
442: fix phrase search r=curquiza a=MarinPostma
Run the exact match search on 7 words windows instead of only two. This makes false positive very very unlikely, and impossible on phrase query that are less than seven words.
Co-authored-by: ad hoc <postma.marin@protonmail.com>
445: allow null values in csv r=Kerollmops a=MarinPostma
This pr allows null values in csv:
- if the field is of type string, then an empty field is considered null (`,,`), anything other is turned into a string (i.e `, ,` is a single whitespace string)
- if the field is of type number, when the trimmed field is empty, we consider the value null (i.e `,,`, `, ,` are both null numbers) otherwise we try to parse the number.
Co-authored-by: ad hoc <postma.marin@protonmail.com>
444: Fix the parsing of ndjson requests to index more than the first line r=Kerollmops a=Kerollmops
This PR correctly uses the `BufRead` trait to read every line of the content instead of just the first one. This bug was only affecting the http-ui test crate.
Co-authored-by: Kerollmops <clement@meilisearch.com>
431: Fix and improve word prefix pair proximity r=ManyTheFish a=Kerollmops
This PR first fixes the algorithm we used to select and compute the word prefix pair proximity database. The previous version was skipping nearly all of the prefixes. The issue is that this fix made this method to take more time and we were trying to reduce the time spent in it.
With `@ManyTheFish` we found out that we could skip some of the work we were doing by:
- discarding the prefixes that were shorter than a specific threshold (default: 2).
- discarding the word prefix pairs with proximity bigger than a specific threshold (default: 4).
- remove the unused threshold that was specifying a minimum amount of word docids to merge.
We will take more time to do some more optimization, like stop clearing and recomputing from scratch the database, we will compute the subsets of keys to create, keep and merge. This change is a little bit more complex than what this PR does.
I keep this PR as a draft as I want to further test the real gain if it is enough or not if it is valid or not. I advise reviewers to review commit by commit to see the changes bit by bit, reviewing the whole PR can be hard.
Co-authored-by: Clément Renault <clement@meilisearch.com>
441: Changes related to the rebranding r=curquiza a=meili-bot
_This PR is auto-generated._
- [X] Change the name `MeiliSearch` to `Meilisearch` in README.
- [x] ⚠️ Ensure the bot did not update part you don’t want it to update, especially in the code examples in the Getting started.
- [x] Please, ensure there is no other "MeiliSearch". For example, in the comments or in the tests name.
- [x] Put the new logo on the README if needed -> still using the milli logo so far
Co-authored-by: meili-bot <74670311+meili-bot@users.noreply.github.com>
Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
438: CLI improvements r=Kerollmops a=MarinPostma
I've made the following changes to the cli:
- `settings-update` become `settings`, with two subcommands: `update` and `show`.
- `document-addition` becomes `documents` with a subcommands: `add` (I'll add a feature to list documents later)
- `search` now has an interactive mode `-i`
- search return the number of documents and the time it took to perform the search.
Co-authored-by: mpostma <postma.marin@protonmail.com>
430: Document batch support r=Kerollmops a=MarinPostma
This pr adds support for document batches in milli. It changes the API of the `IndexDocuments` builder by adding a `add_documents` method. The API of the updates is changed a little, with the `UpdateBuilder` being renamed to `IndexerConfig` and being passed to the update builders. This makes it easier to pass around structs that need to access the indexer config, rather that extracting the fields each time. This change impacts many function signatures and simplify them.
The change in not thorough, and may require another PR to propagate to the whole codebase. I restricted to the necessary for this PR.
Co-authored-by: Marin Postma <postma.marin@protonmail.com>
433: fix(filter): Fix two bugs. r=Kerollmops a=irevoire
- Stop lowercasing the field when looking in the field id map
- When a field id does not exist it means there is currently zero
documents containing this field thus we return an empty RoaringBitmap
instead of throwing an internal error
Will fix https://github.com/meilisearch/MeiliSearch/issues/2082 once meilisearch is released
Co-authored-by: Tamo <tamo@meilisearch.com>
426: Fix search highlight for non-unicode chars r=ManyTheFish a=Samyak2
# Pull Request
## What does this PR do?
Fixes https://github.com/meilisearch/MeiliSearch/issues/1480
<!-- Please link the issue you're trying to fix with this PR, if none then please create an issue first. -->
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
## Changes
The `matching_bytes` function takes a `&Token` now and:
- gets the number of bytes to highlight (unchanged).
- uses `Token.num_graphemes_from_bytes` to get the number of grapheme clusters to highlight.
In essence, the `matching_bytes` function now returns the number of matching grapheme clusters instead of bytes.
Added proper highlighting in the HTTP UI:
- requires dependency on `unicode-segmentation` to extract grapheme clusters from tokens
- `<mark>` tag is put around only the matched part
- before this change, the entire word was highlighted even if only a part of it matched
## Questions
Since `matching_bytes` does not return number of bytes but grapheme clusters, should it be renamed to something like `matching_chars` or `matching_graphemes`? Will this break the API?
Thank you very much `@ManyTheFish` for helping 😄
Co-authored-by: Samyak S Sarnayak <samyak201@gmail.com>
- Stop lowercasing the field when looking in the field id map
- When a field id does not exist it means there is currently zero
documents containing this field thus we returns an empty RoaringBitmap
instead of throwing an internal error
The `matching_bytes` function takes a `&Token` now and:
- gets the number of bytes to highlight (unchanged).
- uses `Token.num_graphemes_from_bytes` to get the number of grapheme
clusters to highlight.
In essence, the `matching_bytes` function returns the number of matching
grapheme clusters instead of bytes. Should this function be renamed
then?
Added proper highlighting in the HTTP UI:
- requires dependency on `unicode-segmentation` to extract grapheme
clusters from tokens
- `<mark>` tag is put around only the matched part
- before this change, the entire word was highlighted even if only a
part of it matched
424: Store the geopoint in three dimensions r=Kerollmops a=irevoire
Related to this issue: https://github.com/meilisearch/MeiliSearch/issues/1872
Fix the whole computation of distance for any “geo” operations (sort or filter). Now when you sort points they are returned to you in the right order.
And when you filter on a specific radius you only get points included in the radius.
This PR changes the way we store the geo points in the RTree.
Instead of considering the latitude and longitude as orthogonal coordinates, we convert them to real orthogonal coordinates projected on a sphere with a radius of 1.
This is the conversion formulae.

Which, in rust, translate to this function:
```rust
pub fn lat_lng_to_xyz(coord: &[f64; 2]) -> [f64; 3] {
let [lat, lng] = coord.map(|f| f.to_radians());
let x = lat.cos() * lng.cos();
let y = lat.cos() * lng.sin();
let z = lat.sin();
[x, y, z]
}
```
Storing the points on a sphere is easier / faster to compute than storing the point on an approximation of the real earth shape.
But when we need to compute the distance between two points we still need to use the haversine distance which works with latitude and longitude.
So, to do the fewest search-time computation possible I'm now associating every point with its `DocId` and its lat/lng.
Co-authored-by: Tamo <tamo@meilisearch.com>
429: Benchmark CIs: not use a default label to call the GH runner r=irevoire a=curquiza
Since we now have multiple self-hosted github runners, we need to differentiate them calling them in the CI. The `self-hosted` label is the default one, so we need to use the unique and appropriate one for the benchmark machine
<img width="925" alt="Capture d’écran 2022-01-04 à 15 42 18" src="https://user-images.githubusercontent.com/20380692/148079840-49cd7878-5912-46ff-8ab8-bf646777f782.png">
Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
428: Reintroduce the gitignore for the fuzzer r=Kerollmops a=irevoire
Reintroduce the gitignore in the fuzz directory
Co-authored-by: Tamo <tamo@meilisearch.com>
425: Push the result of the benchmarks to influxdb r=irevoire a=irevoire
Now execute a benchmark for every PR merged into main and then upload the results to influxdb.
Co-authored-by: Tamo <tamo@meilisearch.com>
422: Prefer returning `None` instead of using an `FilterCondition::Empty` state r=Kerollmops a=Kerollmops
This PR is related to the issue comment https://github.com/meilisearch/MeiliSearch/issues/1338#issuecomment-989322889 which exhibits the fact that when a filter is known to be empty no results are returned which is wrong, the filter should not apply as no restriction is done on the documents set.
The filter system on the milli side has introduced an Empty state which was used in this kind of situation but I found out that it is not needed and that when we parse a filter and that it is empty we can simply return `None` as the `Filter::from_array` constructor does. So I removed it and added tests!
On the MeiliSearch side, we just need to match on a `None` and completely ignore the filter in such a case.
Co-authored-by: Clément Renault <clement@meilisearch.com>
421: Introduce the depth method on FilterCondition r=Kerollmops a=Kerollmops
This PR introduces the depth method on the FilterCondition type to be able to react to it. It is meant to be used to reject filters that go too deep and can make the engine stack overflow.
Co-authored-by: Clément Renault <clement@meilisearch.com>
418: change visibility of DocumentDeletionResult r=Kerollmops a=MarinPostma
Change the visibility of `DocumentDeletionResult`, so its fields can be accesses from outside milli.
Co-authored-by: Marin Postma <postma.marin@protonmail.com>
409: remove update_id in UpdateBuilder r=ManyTheFish a=MarinPostma
Removing the `update_id` from `UpdateBuidler`, since it serves no purpose. I had introduced it when working in HA some time ago, but I think there are better ways to do it now, so it can be removed an stop being in our way.
Co-authored-by: Marin Postma <postma.marin@protonmail.com>
414: improve update result types r=ManyTheFish a=MarinPostma
Inprove the returned meta when performing document additions and deletions:
- On document addition return the number of indexed documents and the total number of documents in the index after the indexion
- On document deletion return the number of deleted documents, and the remaining number of documents in the index after the deletion is performed
I also fixed a potential bug when performing a document deletion and the primary key couldn't be found: before we assumed that the db was empty and returned that no documents were deleted, but since we checked before that the db wasn't empty, entering this branch is actually a bug, and now returns a 'MissingPrimaryKey' error.
Co-authored-by: Marin Postma <postma.marin@protonmail.com>
returned metaimprove document addition returned metaimprove document
addition returned metaimprove document addition returned metaimprove
document addition returned metaimprove document addition returned
metaimprove document addition returned meta
400: Rewrite the filter parser and add a lot of tests r=irevoire a=irevoire
This PR is a complete rewrite of #358, which was reverted in #403.
You can already try this PR in Meilisearch here https://github.com/meilisearch/MeiliSearch/pull/1880.
Since writing a parser is quite complicated, I moved all the logic to another workspace called `filter_parser`.
In this workspace, we don't know anything about milli, the filterable fields / field ID or anything.
As you can see in its `cargo.toml`, it has only three dependencies entirely focused on the parsing part:
```
nom = "7.0.0"
nom_locate = "4.0.0"
```
But introducing this new workspace made some changes necessary on the “AST”. Now the parser only returns `Tokens` (a simple `&str` with a bit of context). Everything is interpreted when we execute the filter later in milli.
This crate provides a new error type for all filter related errors.
---------
## Errors
Currently, we have multiple kinds of errors. Sometimes we are generating errors looking like that: (for `name = truc`)
```
Attribute `name` is not filterable. Available filterable attributes are: ``.
```
While sometimes pest was generating errors looking like that:
```
Invalid syntax for the filter parameter: ` --> 1:7
|
1 | name =
| ^---
|
= expected word`.
```
Which most people were seeing like that: (for `name =`)
```
Invalid syntax for the filter parameter: ` --> 1:7\n |\n1 | name =\n | ^---\n |\n = expected word`.
```
-----------
With this PR, the error format is unified between all errors.
All errors follow this more straightforward format:
```
The error message.
[from char]:[to char] filter
```
This should be way easier to read when embedded in the JSON for a human. And it should also allow us to parse the errors easily and provide highlighting or something with a frontend playground.
Here is an example of the two previous errors with the new format:
For `name = truc`:
```
Attribute `name` is not filterable. Available filterable attributes are: ``.
1:4 name = truc
```
Or in one line:
```
Attribute `name` is not filterable. Available filterable attributes are: ``.\n1:4 name = truc
```
And for `name =`:
```
Was expecting a value but instead got nothing.
7:7 name =
```
Or in one line:
```
Was expecting a value but instead got nothing.\n7:7 name =
```
Also, since we now have control over the parser, we can generate more explicit error messages so a lot of new errors have been created. I tried to be as helpful as possible for the user; here is a little overview of the new error message you can get when misusing a filter:
```
Expression `"truc` is missing the following closing delimiter: `"`.
8:13 name = "truc
```
The `_geoRadius` filter is an operation and can't be used as a value.
8:30 name = _geoRadius(12, 13, 14)
```
etc
## Tests
A lot of tests have been written in the `filter_parser` crate. I think there is a unit test for every part of the syntax.
But since we can never be sure we covered all the cases, I also fuzzed the new parser A LOT (for ±8 hours on 20 threads). And the code to fuzz the parser is included in the workspace, so if one day we need to change something to the syntax, we'll be able to re-use it by simply running:
```
cargo fuzz run --release parse
```
## Milli
I renamed the type and module `filter_condition.rs` / `FilterCondition` to `filter.rs` / `Filter`.
Co-authored-by: Tamo <tamo@meilisearch.com>
407: Update version for the next release (v0.20.0) r=curquiza a=curquiza
Breaking because of #405 and #406
Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
406: return document count from builder r=MarinPostma a=MarinPostma
`DocumentBatchBuilder::finish` now returns the number of documents in the batch. This is more compact that calling `len()` just before calling finish.
Co-authored-by: marin postma <postma.marin@protonmail.com>
402: Optimize document transform r=MarinPostma a=MarinPostma
This pr optimizes the transform of documents additions in the obkv format. Instead on accepting any serializable objects, we instead treat json and CSV specifically:
- For json, we build a serde `Visitor`, that transform the json straight into obkv without intermediate representation.
- For csv, we directly write the lines in the obkv, applying other optimization as well.
Co-authored-by: marin postma <postma.marin@protonmail.com>
404: remove search crate r=Kerollmops a=MarinPostma
The functionalities of the search crate have been moved to the cli crate. The outstanding files are removed by this pr.
Co-authored-by: marin postma <postma.marin@protonmail.com>
394: Added search_geo benchmark in cron job r=irevoire a=fumblehool
fixes: #392
`search_geo` cron will run every friday at 18:30
Co-authored-by: Damanpreet Singh <daman.4880@gmail.com>
390: Add helper methods on the settings r=Kerollmops a=irevoire
This would be a good addition to look at the content of a setting without consuming it.
It’s useful for analytics.
Co-authored-by: Irevoire <tamo@meilisearch.com>
384: Replace memmap with memmap2 r=Kerollmops a=palfrey
[memmap is unmaintained](https://rustsec.org/advisories/RUSTSEC-2020-0077.html) and needs replacing. memmap2 is a drop-in replacement fork that's well maintained. Note that the version numbers got reset on fork, hence the lower values.
Co-authored-by: Tom Parker-Shemilt <palfrey@tevp.net>
388: fix primary key inference r=MarinPostma a=MarinPostma
The primary key is was infered from a hashtable index of the field. For this reason the order in which the fields were interated upon was not deterministic, and the primary key was chosed ffrom the first field containing "id".
This fix sorts the the index by field_id when infering the primary key.
Co-authored-by: mpostma <postma.marin@protonmail.com>
Instead of using an arbitrary limit we encode the absolute position in a u32
using one strong u16 for the field id and a weak u16 for the relative position in the attribute.
386: fix obkv document r=curquiza a=MarinPostma
When serializing a document, the serializer resolved the field_id of the current field and immediately added it to the obkv document under construction. The issue with that is that obkv expects the fields to be inserted in order, and when a document with out of order fields was added, obkv failed to insert the field.
The current fix first resolves each field_id, and adds all the fields to a temporary `BTreeMap`, until `end` is called on the map serializer, where all the fields are added to the obkv at once, and in order.
Co-authored-by: mpostma <postma.marin@protonmail.com>
383: Add check on latitude and longitude r=irevoire a=irevoire
Latitudes are not supposed to go beyond 90 degrees or below -90.
The same goes for longitudes with 180 or -180.
This was badly implemented in the filters, and was not implemented for the `AscDesc` rules.
Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: Irevoire <tamo@meilisearch.com>
Latitude are not supposed to go beyound 90 degrees or below -90.
The same goes for longitude with 180 or -180.
This was badly implemented in the filters, and was not implemented for the AscDesc rules.
380: Reserved keyword error message r=Kerollmops a=irevoire
And I missed _another_ reserved keyword error message in the filter :(
Co-authored-by: Tamo <tamo@meilisearch.com>
378: Hotfix meilisearch#1707 r=Kerollmops a=ManyTheFish
This PR contains an ugly quick fix of [meilisearch#1707](https://github.com/meilisearch/MeiliSearch/issues/1707).
- remove comparison reverse on rank. Enhancing relevancy and performances
- iterate over level 0 only. Enhancing performances.
A better fix is in development.
Co-authored-by: many <maxime@meilisearch.com>
Co-authored-by: Many <legendre.maxime.isn@gmail.com>
374: Enhance CSV document parsing r=Kerollmops a=ManyTheFish
Benchmarks on `search_songs` were crashing because of the CSV parsing.
Co-authored-by: many <maxime@meilisearch.com>
376: Stop casting integer docids to string r=Kerollmops a=irevoire
When a docid is an integer, we stop casting it to a string, and thus we don't add `"` around it.
Co-authored-by: Tamo <tamo@meilisearch.com>
373: Improve error message for bad sort syntax with geosearch r=Kerollmops a=irevoire
`@Kerollmops` This should be the last PR for the geosearch and error handling, sorry for doing it in so many steps 😬
Co-authored-by: Tamo <tamo@meilisearch.com>
372: Fix Meilisearch 1714 r=Kerollmops a=ManyTheFish
The bug comes from the typo tolerance, to know how many typos are accepted we were counting bytes instead of characters in a word.
On Chinese Script characters, we were allowing 2 typos on 3 characters words.
We are now counting the number of char instead of counting bytes to assign the typo tolerance.
Related to [Meilisearch#1714](https://github.com/meilisearch/MeiliSearch/issues/1714)
Co-authored-by: many <maxime@meilisearch.com>
371: Provide a sort error handler r=Kerollmops a=irevoire
This PR simplify the error handling of asc-desc rules for Meilisearch or any other wrapper by providing directly in milli a new error type called `SortError` that can be generated from an `AscDescError` and that can be automatically converted to a `UserError`.
Basically now, wherever you are in the code as a user or in milli you can parse an `AscDesc` syntax and depending on the context, cast it either as a `SortError` or a `CriterionError` in one line with improved error messages.
Co-authored-by: Tamo <tamo@meilisearch.com>
370: Change chunk size to 4MiB to fit more the end user usage r=ManyTheFish a=ManyTheFish
We made several indexing tests using different sizes of datasets (5 datasets from 9MiB to 100MiB) on several typologies of VMs (`XS: 1GiB RAM, 1 VCPU`, `S: 2GiB RAM, 2 VCPU`, `M: 4GiB RAM, 3 VCPU`, `L: 8GiB RAM, 4 VCPU`).
The result of these tests shows that the `4MiB` chunk size seems to be the best size compared to other chunk sizes (`2Mib`, `4MiB`, `8Mib`, `16Mib`, `32Mib`, `64Mib`, `128Mib`).
below is the average time per chunk size:

<details>
<summary>Detailled data</summary>
<br>

</br>
</details>
Co-authored-by: many <maxime@meilisearch.com>
366: Geosearch error handling r=Kerollmops a=irevoire
Rewrite most of geosearch error handling and another batch of tests on the criterion parsing.
Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: Irevoire <tamo@meilisearch.com>
364: Fix all the benchmarks r=Kerollmops a=irevoire
#324 broke all benchmarks.
I fixed everything and noticed that `cargo check --all` was insufficient to check the bench in multiple workspaces, so I also updated the CI to use `cargo check --workspace --all-targets`.
Co-authored-by: Tamo <tamo@meilisearch.com>
363: Fix the returned `AscDesc` error r=Kerollmops a=irevoire
With my previous PR on the geosearch I erased the change I've introduced with my pre-previous PR about the new error type when we fail to parse the `AscDesc` type.
Sorry for that, here is the fix
Co-authored-by: Tamo <tamo@meilisearch.com>
357: Add benchmarks for the geosearch r=Kerollmops a=irevoire
closes#336
Should I merge this PR in #322 and then we merge everything in `main` or should we wait for #322 to be merged and then merge this one in `main` later?
Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: Irevoire <tamo@meilisearch.com>
324: Implement documents API r=Kerollmops a=MarinPostma
This pr implement the intermediary document representation for milli. The JSON, JSONL and CSV formats are replaced with the format instead, to push the serialization duty on the client side.
The `documents` module contains the interface to the new document format:
- The `DocumentsBuilder` allows the creation of a writer backed document addition, when documents are added either one by one, or as arrays of depth 1. This is made possible by the fact that the seriliazer used by the `add_documents` methods only accepts `[Object]` and `Object`. The related serialization logic is located in the `serde.rs` file.
- The `DocumentsReader` allows to to iterate over the documents created by a `DocumentsBuilder`. A call to `next_document_with_index` returns the next obkv reader in the document addition, along with a reference to the index used to map the field ids in the obkv reader to the field names
All references to json, jsonl or csv in the tests have been replaced with the `documents!` macro, works exaclty like the `serde_json::json` macro, as a convenient way to create a `DocumentsReader`.
Rewrote the search cli, to the `cli` crate, to also allow index manipulation. This only offers basic functionalities for now, but is meant to be easier to extend than http ui
blocked by #308
Co-authored-by: mpostma <postma.marin@protonmail.com>
360: Update version for the next release (v0.14.0) r=Kerollmops a=curquiza
Release containing the geosearch, cf #322
Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
322: Geosearch r=ManyTheFish a=irevoire
This PR introduces [basic geo-search functionalities](https://github.com/meilisearch/specifications/pull/59), it makes the engine able to index, filter and, sort by geo-point. We decided to use [the rstar library](https://docs.rs/rstar) and to save the points in [an RTree](https://docs.rs/rstar/0.9.1/rstar/struct.RTree.html) that we de/serialize in the index database [by using serde](https://serde.rs/) with [bincode](https://docs.rs/bincode). This is not an efficient way to query this tree as it will consume a lot of CPU and memory when a search is made, but at least it is an easy first way to do so.
### What we will have to do on the indexing part:
- [x] Index the `_geo` fields from the documents.
- [x] Create a new module with an extractor in the `extract` module that takes the `obkv_documents` and retrieves the latitude and longitude coordinates, outputting them in a `grenad::Reader` for further process.
- [x] Call the extractor in the `extract::extract_documents_data` function and send the result to the `TypedChunk` module.
- [x] Get the `grenad::Reader` in the `typed_chunk::write_typed_chunk_into_index` function and store all the points in the `rtree`
- [x] Delete the documents from the `RTree` when deleting documents from the database. All this can be done in the `delete_documents.rs` file by getting the data structure and removing the points from it, inserting it back after the modification.
- [x] Clearing the `RTree` entirely when we clear the documents from the database, everything happens in the `clear_documents.rs` file.
- [x] save a Roaring bitmap of all documents containing the `_geo` field
### What we will have to do on the query part:
- [x] Filter the documents at a certain distance around a point, this is done by [collecting the documents from the searched point](https://docs.rs/rstar/0.9.1/rstar/struct.RTree.html#method.nearest_neighbor_iter) while they are in range.
- [x] We must introduce new `geoLowerThan` and `geoGreaterThan` variants to the `Operator` filter enum.
- [x] Implement the `negative` method on both variants where the `geoGreaterThan` variant is implemented by executing the `geoLowerThan` and removing the results found from the whole list of geo faceted documents.
- [x] Add the `_geoRadius` function in the pest parser.
- [x] Introduce a `_geo` ascending ranking function that takes a point in parameter, ~~this function must keep the iterator on the `RTree` and make it peekable~~ This was not possible for now, we had to collect the whole iterator. Only the documents that are part of the candidates must be sent too!
- [x] This ascending ranking rule will only be active if the search is set up with the `_geoPoint` parameter that indicates the center point of the ascending ranking rule.
-----------
- On Meilisearch part: We must introduce a new concept, returning the documents with a new `_geoDistance` field when it passed by the `_geo` ranking rule, this has never been done before. We could maybe just do it afterward when the documents have been retrieved from the database, computing the distance from the `_geoPoint` and all of the documents to be returned.
Co-authored-by: Irevoire <tamo@meilisearch.com>
Co-authored-by: cvermand <33010418+bidoubiwa@users.noreply.github.com>
Co-authored-by: Tamo <tamo@meilisearch.com>
359: Improve the benchmark comparison script r=irevoire a=irevoire
This modification allow us to compare more than 2 benchmarks or to only print the results of one benchmark
Co-authored-by: Irevoire <tamo@meilisearch.com>
Co-authored-by: Tamo <tamo@meilisearch.com>
356: Update the README r=curquiza a=Kerollmops
This PR updates a little bit the README and more specifically the indexing times, fixes#352.
Co-authored-by: Kerollmops <clement@meilisearch.com>
349: Enable the grenad tempfile feature back r=ManyTheFish a=Kerollmops
This PR enables the grenad `tempfile` feature back, [when this is feature is disabled the sorter writes the entries in memory](7c082d05bf/src/sorter.rs (L470-L476)) instead of on disk and therefore, consumes more memory. By enabling this feature grenad merges on disk by using the `tempfile` dependency.
This PR also bumps milli to v0.3.1 where `@ManyTheFish` added an assert for when the allocator can't allocate and disable the default snappy compression in the `http-ui` crate.
Co-authored-by: Kerollmops <clement@meilisearch.com>
344: Move the sort ranking rule before the exactness ranking rule r=ManyTheFish a=Kerollmops
This PR moves the sort ranking rule at the 5th position by default, right before the exactness one.
Co-authored-by: Kerollmops <clement@meilisearch.com>
341: Throw a query time error when a sort parameter is used but the sort ranking rule is missing r=Kerollmops a=Kerollmops
This PR makes the engine throw an error for when the ranking rules don't contain the `sort` rule, the `sortable_fields` are correctly set but the user tries to use the `sort` query parameter. Doing so will have no effect on the returned documents so we preferred returning an error to help debug this.
That's breaking on the MeiliSearch side as we added a new variant to the `UserError` enum.
Co-authored-by: Kerollmops <clement@meilisearch.com>
346: remove unused grenad default features r=Kerollmops a=MarinPostma
Milli is not using any of grenad default features, and it's zstd feature creates conflict with meilisearch. This pr simply remove the unused features.
Co-authored-by: mpostma <postma.marin@protonmail.com>
338: Fix string fields sorting r=Kerollmops a=shekhirin
Resolves https://github.com/meilisearch/milli/issues/333
<details>
<summary>curl checks</summary>
```console
➜ ~ curl -s 'localhost:7700/indexes/movies/search' -d '{"sort": ["title:asc"], "limit": 30}' | jq -r '.hits | map(.title)[]'
#1 Cheerleader Camp
#Horror
#RealityHigh
#Roxy
#SquadGoals
$5 a Day
$9.99
'71
(2)
(500) Days of Summer
(Girl)Friend
*batteries not included
...And God Created Woman
...And Justice for All
...E fuori nevica!
.45
1
1 Mile To You
1 Night
10
10 Cloverfield Lane
10 giorni senza mamma
10 Items or Less
10 Rillington Place
10 Rules for Sleeping Around
10 Things I Hate About You
10 to Midnight
10 Years
10,000 BC
10,000 Saints
➜ ~ curl -s 'localhost:7700/indexes/movies/search' -d '{"sort": ["title:desc"], "limit": 30}' | jq -r '.hits | map(.title)[]'
크게 될 놈
왓칭
뷰티플 마인드
노무현과 바보들
ハニー
Счастье – это… Часть 2
СОТКА
Смотри мою любовь
Позивний 'Бандерас'
Лошо момиче
Күлүк Хомус
Куда течет море
Каникулы президента
Ακίνητο Ποτάμι
Üç Harfliler: Beddua
È nata una Star?
Æon Flux
Ága
À propos de Nice
À Nos Amours
À l'aventure
¡Three Amigos!
Zulu Dawn
Zulu
Zulu
Zu: Warriors from the Magic Mountain
Zu Warriors
Zorro
Zorba the Greek
Zootopia
```
</details>
Co-authored-by: Alexey Shekhirin <a.shekhirin@gmail.com>
308: Implement a better parallel indexer r=Kerollmops a=ManyTheFish
Rewrite the indexer:
- enhance memory consumption control
- optimize parallelism using rayon and crossbeam channel
- factorize the different parts and make new DB implementation easier
- optimize and fix prefix databases
Co-authored-by: many <maxime@meilisearch.com>
337: Update version for the next release (v0.12.0) r=Kerollmops a=curquiza
Breaking because of the new indexer that implies DB changes #308
Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
335: Get sortable_fields from index only if criteria present in query r=Kerollmops a=shekhirin
Seems like we don't need to retrieve `sortable_fields` from the index if there's no any `sort_criteria` in the query.
Small 🤏 optimization opportunity out there.
Co-authored-by: Alexey Shekhirin <a.shekhirin@gmail.com>
329: Run all benchmarks once every friday r=irevoire a=irevoire
All the benchmarks run every Friday on the `main` branch.
To avoid having pending benchmarks everywhere, we execute one benchmark every 8 hours.
Then the results are uploaded as if it was a normal user-run benchmark.
This PR closes#314 and #321
Co-authored-by: Irevoire <tamo@meilisearch.com>
Co-authored-by: Tamo <tamo@meilisearch.com>
330: Introduce the reset_sortable_fields Settings method r=irevoire a=Kerollmops
I forgot to add the `reset_sortable_fields` method on the `Settings` builder, it is no big deal as the library user (like MeiliSearch) can always call `set_sortable_fields` with an empty list of fields, it is equivalent.
Co-authored-by: Kerollmops <clement@meilisearch.com>
Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: bors[bot] <26634292+bors[bot]@users.noreply.github.com>
332: Sortable attributes in http-ui r=Kerollmops a=irevoire
- Add a `reset_sortable_attribute` method
- Add the `sortable_attributes` to http-ui
- Fix some broken test in http-ui
Co-authored-by: Tamo <tamo@meilisearch.com>
325: Update milli version to v0.11.0 r=curquiza a=Kerollmops
This PR also clean-up some dependencies in the Cargo.toml.
Co-authored-by: Kerollmops <clement@meilisearch.com>
315: Rewrite the indexing benchmarks r=Kerollmops a=irevoire
There was a panic on the benchmark and while I was trying to understand what was happening I decided to rewrite the way the benchmarks were working.
Before we were creating a database with the good setting, and then for each benchmarks we were:
1. Deleting all documents in the database
2. Indexing a batch of documents
Now for each iteration we recreate entirely a new database from scratch.
Since deleting all the documents in a database may not be the same as starting with a fresh new database I prefer this solution.
Co-authored-by: Irevoire <tamo@meilisearch.com>
317: Fix the facet string docids filterable deletion bug r=Kerollmops a=Kerollmops
Fixes a bug where the deletion of documents was returning a decoding error. But only when the settings are set with filterable attributes.
This bug was introduced in #254 in which we made the engine faster in returning the facet distribution. We changed the way we were storing the inverted index, we were no more storing only documents ids with the original values but also groups identified with integers, depending on the facet level we were using. This is similar to how facet numbers are already stored.
⚠️ As `@curquiza` already said, we must first revert #309 before merging this!
Related to https://github.com/meilisearch/MeiliSearch/issues/1601.
Co-authored-by: Clément Renault <clement@meilisearch.com>
309: Sort at query time r=Kerollmops a=Kerollmops
This PR:
- Makes the `Asc/Desc` criteria work with strings too, it first returns documents ordered by numbers then by strings, and finally the documents that can't be ordered. Note that it is lexicographically ordered and not ordered by character, which means that it doesn't know about wide and short characters i.e. `a`, `丹`, `▲`.
- Changes the syntax for the `Asc/Desc` criterion by now using a colon to separate the name and the order i.e. `title:asc`, `price:desc`.
- Add the `Sort` criterion at the third position in the ranking rules by default.
- Add the `sort_criteria` method to the `Search` builder struct to let the users define the `Asc/Desc` sortable attributes they want to use at query time. Note that we need to check that the fields are registered in the sortable attributes before performing the search.
- Introduce a new `InvalidSortableAttribute` user error that is raised when the sort criteria declared at query time are not part of the sortable attributes.
- `@ManyTheFish` introduced integration tests for the dynamic Sort criterion.
Fixes#305.
Co-authored-by: Kerollmops <clement@meilisearch.com>
Co-authored-by: many <maxime@meilisearch.com>
302: Update milli to v0.9.0 r=curquiza a=curquiza
Updating the minor and not patch since #300 seems to be breaking: it involves a re-indexation to get the fix, so it involves an additional step from the users, not only downloading the latest version.
Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
300: Fix prefix level position docids database r=curquiza a=ManyTheFish
The prefix search was inverted when we generated the DB.
Instead of searching if word had a prefix in prefix fst,
we were searching if the word was a prefix of a prefix contained in the prefix fst.
The indexer, now, iterate over prefix contained in the fst
and search them by prefix in the word-level-position-docids database,
aggregating matches in a sorter.
Fix#299
Co-authored-by: many <maxime@meilisearch.com>
The prefix search was inverted when we generated the DB.
Instead of searching if word had a prefix in prefix fst,
we were searching if the word was a prefix of a prefix contained in the prefix fst.
The indexer, now, iterate over prefix contained in the fst
and search them by prefix in the word-level-position-docids database,
aggregating matches in a sorter.
Fix#299
298: Rename the search benchmarks r=Kerollmops a=irevoire
And fix a bug. As always, I was not closing the env.
Co-authored-by: Tamo <tamo@meilisearch.com>
296: Fix invalid faceted documents ids buffer size r=Kerollmops a=Kerollmops
Fix a bug found by `@irevoire` when benchmarking the search.
Co-authored-by: Kerollmops <clement@meilisearch.com>
288: Stop tracking the Cargo.lock and add cache + windows to the CI r=curquiza a=irevoire
We reuse the same `~/.cargo` and `./target` directory between each run on the same OS and rust toolchain.
The `key` to decide if we can use the cache or not is: `$OS_NAME-$RUST_TOOLCHAIN-$HASH(Cargo.toml)`
We also removed the `Cargo.lock` from this repository. Indeed, milli is a library and [should not track the `Cargo.lock`](https://doc.rust-lang.org/cargo/guide/cargo-toml-vs-cargo-lock.html)
And finally, we enabled the tests on `windows-latest`. Since `lmdb` has been updated, this is now possible.
Co-authored-by: Tamo <tamo@meilisearch.com>
291: Fix a bug about zero bytes in the inputs r=irevoire a=Kerollmops
Ok, good news, after a little session of debugging with `@irevoire` we found out that the bug seems to be related to zeroes in the input update. The engine wasn't designed to accept those. The chosen solution is to update the tokenizer to remove those zeroes. We are waiting on https://github.com/meilisearch/tokenizer/pull/52 to be merged and a new version to be released.
It is not an undefined behavior, I repeat: it is a "normal" bug 🎉👏
----
This PR tries to fix a bug where we use LMDB in the wrong way, leading to panic due to an undefined behavior on the Rust side. I thought [we fixed it in a previous PR](https://github.com/meilisearch/milli/pull/264) but we found out that _a similar_ bug was still present. `@bb` found a way to trigger this bug and helped us find the origin of it.
As I don't have a minimal reproducible example of this bug I bet on the unsafe `put_current` calls when we index new documents as the bug was trigger after a big indexation on a clean database, thus not triggering a deletion update. I only replaced the unsafe `put_current` with two safe calls to `get`/`put`.
I hope it helps and fixes the bug, only `@bb` can help us check that. I am not even sure how I can create a custom Docker image and expose it for testing purposes.
<details>
<summary>The backtrace leading us to a panic in grenad.</summary>
```
meilisearch_1 | thread 'tokio-runtime-worker' panicked at 'assertion failed: key > &last_key', /root/.cargo/git/checkouts/grenad-e2cb77f65d31bb02/3adcb26/src/block_builder.rs:38:17
meilisearch_1 | stack backtrace:
meilisearch_1 | 0: rust_begin_unwind
meilisearch_1 | at ./rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/std/src/panicking.rs:493:5
meilisearch_1 | 1: core::panicking::panic_fmt
meilisearch_1 | at ./rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/core/src/panicking.rs:92:14
meilisearch_1 | 2: core::panicking::panic
meilisearch_1 | at ./rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/core/src/panicking.rs:50:5
meilisearch_1 | 3: grenad::block_builder::BlockBuilder::insert
meilisearch_1 | at ./root/.cargo/git/checkouts/grenad-e2cb77f65d31bb02/3adcb26/src/block_builder.rs:38:17
meilisearch_1 | 4: grenad::writer::Writer<W>::insert
meilisearch_1 | at ./root/.cargo/git/checkouts/grenad-e2cb77f65d31bb02/3adcb26/src/writer.rs:92:12
meilisearch_1 | 5: milli::update::words_level_positions::write_level_entry
meilisearch_1 | at ./root/.cargo/git/checkouts/milli-00376cd5db949a15/007fec2/milli/src/update/words_level_positions.rs:262:5
meilisearch_1 | 6: milli::update::words_level_positions::compute_positions_levels
meilisearch_1 | at ./root/.cargo/git/checkouts/milli-00376cd5db949a15/007fec2/milli/src/update/words_level_positions.rs:211:13
meilisearch_1 | 7: milli::update::words_level_positions::WordsLevelPositions::execute
meilisearch_1 | at ./root/.cargo/git/checkouts/milli-00376cd5db949a15/007fec2/milli/src/update/words_level_positions.rs:65:23
meilisearch_1 | 8: milli::update::index_documents::IndexDocuments::execute_raw
meilisearch_1 | at ./root/.cargo/git/checkouts/milli-00376cd5db949a15/007fec2/milli/src/update/index_documents/mod.rs:831:9
meilisearch_1 | 9: milli::update::index_documents::IndexDocuments::execute
meilisearch_1 | at ./root/.cargo/git/checkouts/milli-00376cd5db949a15/007fec2/milli/src/update/index_documents/mod.rs:372:9
meilisearch_1 | 10: meilisearch_http::index::updates::<impl meilisearch_http::index::Index>::update_documents_txn
meilisearch_1 | at ./meilisearch/meilisearch-http/src/index/updates.rs:225:30
meilisearch_1 | 11: meilisearch_http::index::updates::<impl meilisearch_http::index::Index>::update_documents
meilisearch_1 | at ./meilisearch/meilisearch-http/src/index/updates.rs:183:22
meilisearch_1 | 12: meilisearch_http::index::update_handler::UpdateHandler::handle_update
meilisearch_1 | at ./meilisearch/meilisearch-http/src/index/update_handler.rs:75:18
meilisearch_1 | 13: meilisearch_http::index_controller::index_actor::actor::IndexActor<S>::handle_update::{{closure}}::{{closure}}
meilisearch_1 | at ./meilisearch/meilisearch-http/src/index_controller/index_actor/actor.rs:174:35
meilisearch_1 | 14: <tokio::runtime::blocking::task::BlockingTask<T> as core::future::future::Future>::poll
meilisearch_1 | at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/runtime/blocking/task.rs:42:21
meilisearch_1 | 15: tokio::runtime::task::core::CoreStage<T>::poll::{{closure}}
meilisearch_1 | at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/runtime/task/core.rs:243:17
meilisearch_1 | 16: tokio::loom::std::unsafe_cell::UnsafeCell<T>::with_mut
meilisearch_1 | at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/loom/std/unsafe_cell.rs:14:9
meilisearch_1 | 17: tokio::runtime::task::core::CoreStage<T>::poll
meilisearch_1 | at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/runtime/task/core.rs:233:13
meilisearch_1 | 18: tokio::runtime::task::harness::poll_future::{{closure}}
meilisearch_1 | at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/runtime/task/harness.rs:427:23
meilisearch_1 | 19: <std::panic::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once
meilisearch_1 | at ./rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/std/src/panic.rs:344:9
meilisearch_1 | 20: std::panicking::try::do_call
meilisearch_1 | at ./rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/std/src/panicking.rs:379:40
meilisearch_1 | 21: std::panicking::try
meilisearch_1 | at ./rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/std/src/panicking.rs:343:19
meilisearch_1 | 22: std::panic::catch_unwind
meilisearch_1 | at ./rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/std/src/panic.rs:431:14
meilisearch_1 | 23: tokio::runtime::task::harness::poll_future
meilisearch_1 | at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/runtime/task/harness.rs:414:19
meilisearch_1 | 24: tokio::runtime::task::harness::Harness<T,S>::poll_inner
meilisearch_1 | at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/runtime/task/harness.rs:89:9
meilisearch_1 | 25: tokio::runtime::task::harness::Harness<T,S>::poll
meilisearch_1 | at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/runtime/task/harness.rs:59:15
meilisearch_1 | 26: tokio::runtime::task::raw::RawTask::poll
meilisearch_1 | at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/runtime/task/raw.rs:66:18
meilisearch_1 | 27: tokio::runtime::task::Notified<S>::run
meilisearch_1 | at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/runtime/task/mod.rs:171:9
meilisearch_1 | 28: tokio::runtime::blocking::pool::Inner::run
meilisearch_1 | at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/runtime/blocking/pool.rs:265:17
meilisearch_1 | 29: tokio::runtime::blocking::pool::Spawner::spawn_thread::{{closure}}
meilisearch_1 | at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/runtime/blocking/pool.rs:245:17
meilisearch_1 | note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
```
</details>
Co-authored-by: Kerollmops <clement@meilisearch.com>
254: Improve the facet string distribution speed r=Kerollmops a=Kerollmops
This pull request creates a data structure similar to the one we use for the faceted numbers, a tetratomic decision tree but this time for the facet strings. This PR also changes the facet distribution behavior by returning one of the original facet values, fixes#260.
This data structure defines bucket-like structures where documents ids are stored under their facet value and helps the search decide if it wants to move to a lower level under a given bucket or not, depending on if the current bucket contains interesting documents or not. The whole format, algorithm, and previous attempts are explained in the [`facet_string.rs` file](ec1cfdd42b/milli/src/search/facet/facet_string.rs).
Note that this data structure **could** be used to sort by string lexicographically, that hypothetically possible. We need more testing, in terms of performance and quality, as we will sort on lowercased versions of the facet values.
- [x] Implement a faster and more precise way to fetch the facet distribution.
- [x] Store and return the original facet string value. We currently return the lowercased version.
Co-authored-by: Kerollmops <clement@meilisearch.com>
Co-authored-by: Clément Renault <clement@meilisearch.com>
287: Add benchmarks for indexing r=Kerollmops a=irevoire
closes#274
I don't really know how much time this will take on our bench machine. I'm afraid the wiki dataset will take a really long time to bench (it takes 1h30 on my computer).
If you are ok with it, I would like to merge this first PR since it introduces a first set of benchmarks and see how much time it takes in reality on our setup.
Co-authored-by: Tamo <tamo@meilisearch.com>
285: Support documents with at most 65536 fields r=Kerollmops a=Kerollmops
Fixes#248.
In this PR I updated the `obkv` crate, it now supports arbitrary key length and therefore I was able to use an `u16` to represent the fields instead of a single byte. It was impressively easy to update the whole codebase 🍡🍔
Co-authored-by: Kerollmops <clement@meilisearch.com>
283: Use the AlwaysFreePages flag when opening an index r=irevoire a=Kerollmops
We introduced a new flag in our fork of LMDB, this `AlwaysFreePages` flag forces LMDB to always free the single pages it uses before writing to the disk instead of keeping them in a linked list.
Declaring this flag reduces the memory print (leak) we have on memory after indexing a lot of documents.
Fixes#279.
Co-authored-by: Kerollmops <clement@meilisearch.com>
284: [http-ui] Introduce the route `die` r=Kerollmops a=irevoire
This route just `exit` the process. This can come in handy when you run `http-ui` inside of another process (a profiler for example), and you don't want to kill everything
Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: Irevoire <tamo@meilisearch.com>
275: Fix the benchmarks dependencies r=Kerollmops a=irevoire
Import exactly the same dependency as milli instead of a wildcard that can do anything
Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: Irevoire <irevoire@protonmail.ch>
269: Fix bug when inserting previously deleted documents r=Kerollmops a=Kerollmops
This PR fixes#268.
The issue was in the `ExternalDocumentsIds` implementation in the specific case that an external document id was in the soft map marked as deleted.
The bug was due to a wrong assumption on my side about how the FST unions were returning the `IndexedValue`s, I thought the values returned in an array were in the same order as the FSTs given to the `OpBuilder` but in fact, [the `IndexedValue`'s `index` field was here to indicate from which FST the values were coming from](https://docs.rs/fst/0.4.7/fst/map/struct.IndexedValue.html).
271: Remove the roaring operation functions warnings r=Kerollmops a=Kerollmops
In this PR we are just replacing the usages of the roaring operations function by the new operators. This removes a lot of warnings.
Co-authored-by: Kerollmops <clement@meilisearch.com>
267: Highlighting r=Kerollmops a=irevoire
closes#262
I basically rewrote a part of the damerau-levenshtein function we were using for the highlighting to accept at most two errors from the user and stop on the third mistake.
Also, now it supports utf-8, so it should fix our issue.
Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: Irevoire <irevoire@protonmail.ch>
266: Bump LMDB to the latest version (v0.9.70) r=Kerollmops a=Kerollmops
By bumping to a new version of heed (from git, v0.12.0 unpublished yet), this PR fixes Windows disk reservation problems. This new version of heed changes the `del/put_current`, and `append` iterator methods signature by declaring them unsafe.
This PR also bumps milli itself into v0.7.0 as it is breaking due to the heed/LMDB bump.
This PR must be merged after #264.
Co-authored-by: Clément Renault <clement@meilisearch.com>
It is undefined behavior to keep a reference to the database while
modifying it, we were keeping references in the database and also
feeding the heed put_current methods with keys referenced inside
the database itself.
https://github.com/Kerollmops/heed/pull/108
257: Fix unconditional facet indexing r=Kerollmops a=Kerollmops
We were indexing every searchable field as filterable, this was a mistake.
Co-authored-by: Kerollmops <clement@meilisearch.com>
246: Improve the ci r=Kerollmops a=irevoire
Rewrite the CI entirely:
- run the ci on Linux, macOS and Windows.
- run the ci on rust stable, beta and nightly
- add rustfmt to the CI.
- split the CI into multiple tasks, this way, the ci should be faster to fail
Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: Irevoire <tamo@meilisearch.com>
255: Fix facet distribution error r=Kerollmops a=Kerollmops
This PR fixes two invalid behaviors and fixes#253:
- We were ignoring the list of fields for which the user wanted a facet distribution.
- We were not raising any error for when a non-filterable field was requested a facet distribution.
~For the latter behavior I need the help of @curquiza to help me choose the right error type.~
Co-authored-by: Kerollmops <clement@meilisearch.com>
250: Add the limit field to http-ui r=Kerollmops a=irevoire
251: Fix the limit r=Kerollmops a=irevoire
There was no check on the limit and thus if a user specified a very large number this line could cause a panic.
Co-authored-by: Tamo <tamo@meilisearch.com>
245: Warn for when a key is too large for LMDB r=Kerollmops a=Kerollmops
Closes#191, and resolves#140.
Co-authored-by: Kerollmops <clement@meilisearch.com>
247: Return a `MissingDocumentId` error when a document doesn't have one r=Kerollmops a=Kerollmops
We were wrongly returning a `MissingPrimaryKey` instead of a `MissingDocumentId` error for when a document was missing a document id. We also improved the error message for when a document id is invalid (wrong type or wrong format).
Co-authored-by: Kerollmops <clement@meilisearch.com>
237: change sub errors visibility r=Kerollmops a=MarinPostma
re-export sub-error types so they can be matched upon outside of milli.
Co-authored-by: marin postma <postma.marin@protonmail.com>
Co-authored-by: marin <postma.marin@protonmail.com>
236: Format the whole project r=Kerollmops a=irevoire
I need to add `cargo fmt` in the CI before closing #231
Co-authored-by: Tamo <tamo@meilisearch.com>
234: Revert "Enable optimization in every profile" r=Kerollmops a=ManyTheFish
compiling tests in release takes too much time.
Reverts meilisearch/milli#224
Fix#233
Co-authored-by: Many <legendre.maxime.isn@gmail.com>
227: Replace Consecutive by Phrase in query tree r=Kerollmops a=ManyTheFish
Replace `Consecutive` by `Phrase` in the query tree in order to remove theoretical bugs,
due to the `Consecutive` enum type.
Co-authored-by: many <maxime@meilisearch.com>
Co-authored-by: Many <legendre.maxime.isn@gmail.com>
212: Introduce integration test on criteria r=Kerollmops a=ManyTheFish
- add pre-ranked dataset
- test each criterion 1 by 1
- test all criteria in several order
222: Move the `UpdateStore` into the http-ui crate r=Kerollmops a=Kerollmops
We no more need to have the `UpdateStore` inside of the mill crate as this is the job of the caller to stack the updates and sequentially give them to milli.
223: Update dataset links r=Kerollmops a=curquiza
Co-authored-by: many <maxime@meilisearch.com>
Co-authored-by: Many <legendre.maxime.isn@gmail.com>
Co-authored-by: Kerollmops <clement@meilisearch.com>
Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
220: Make hard separators split phrase query r=Kerollmops a=ManyTheFish
hard separators will now split a phrase query as two sequential phrases (double-quoted strings):
the query `"Radioactive (Imagine Dragons)"` would be considered equivalent to `"Radioactive" "Imagine Dragons"` which as the little disadvantage of not keeping the order of the two (or more) separate phrases.
Fix#208
Co-authored-by: many <maxime@meilisearch.com>
Co-authored-by: Many <legendre.maxime.isn@gmail.com>
218: Enable optimization for build.rs and macro r=Kerollmops a=irevoire
It fasten the unzip of the benchmark’s dataset a lot
Co-authored-by: Irevoire <tamo@meilisearch.com>
217: Improve the benchmarks readme r=Kerollmops a=irevoire
- Move the Dataset part to the end of the readme so when peoples just want to run the benchmarks they are not tempted to download the benchmarks by hand (which are going to be downloaded anyway by the `build.rs` scritp)
- Fix the links in the dataset -- wiki part
Co-authored-by: Irevoire <tamo@meilisearch.com>
213: Fix the benchmarks script and names r=Kerollmops a=Kerollmops
The benchmarks compare script was not using the `--output` flag and was therefore failing the download of the JSON reports. We also modified the criterion benchmarks to use shorter names, it helps in looking at the benchmarks in the terminal.
Co-authored-by: Kerollmops <clement@meilisearch.com>
193: Fix primary key behavior r=Kerollmops a=MarinPostma
this pr:
- Adds early returns on empty document additions, avoiding error messages to be returned when adding no documents and no primary key was set.
- Changes the primary key inference logic to match that of legacy meilisearch.
close#194
Co-authored-by: Marin Postma <postma.marin@protonmail.com>
Co-authored-by: marin postma <postma.marin@protonmail.com>
204: Decorrelate Distinct, Asc/Desc, Filterable fields from the faceted fields r=Kerollmops a=Kerollmops
This PR decorrelates the fields that need to be stored in facet databases (big inverted indexes for fast access) from the filterable fields, the previously named faceted fields are now named filterable fields and are the union of the distinct attribute, all the Asc/Desc criteria and, the filterable fields.
I added two tests to make sure that the engine was correctly generating the faceted databases when a distinct attribute or an Asc/Desc criteria were added, and one to make sure that it was impossible to filter on a non-filterable field even if it was a faceted one.
Note that the `AttributesForFacetting` has also been renamed into `FilterableAttributes`. But it will be the Transplant's job to do that on the API, this change is only visible to the milli's library users.
- Related to https://github.com/meilisearch/transplant/issues/187.
- Fixes#161 by returning the documents that don't have the Asc/Desc field at the end of the bucket.
- Fixes#168.
- Fixes#152.
Co-authored-by: Kerollmops <clement@meilisearch.com>
Co-authored-by: Marin Postma <postma.marin@protonmail.com>
Co-authored-by: many <maxime@meilisearch.com>
202: Add field id word count docids database r=Kerollmops a=LegendreM
This PR introduces a new database, `field_id_word_count_docids`, that maps the number of words in an attribute with a list of document ids. This relation is limited to attributes that contain less than 11 words.
This database is used by the exactness criterion to know if a document has an attribute that contains exactly the query without any additional word.
Fix#165Fix#196
Related to [specifications:#36](https://github.com/meilisearch/specifications/pull/36)
Co-authored-by: many <maxime@meilisearch.com>
Co-authored-by: Many <legendre.maxime.isn@gmail.com>
206: Fix http-ui r=Kerollmops a=irevoire
I just noticed that `http-ui` was not compiling on `main`.
I'm not sure this is the best fix, but it works 👀
Co-authored-by: Tamo <irevoire@hotmail.fr>
203: Make the MatchingWords return the number of matching bytes r=Kerollmops a=LegendreM
Make the MatchingWords return the number of matching bytes using a custom Levenshtein algorithm.
Fix#138
Co-authored-by: many <maxime@meilisearch.com>
184: Transfer numbers and strings facets into the appropriate facet databases r=Kerollmops a=Kerollmops
This pull request is related to https://github.com/meilisearch/milli/issues/152 and changes the layout of the facets values, numbers and strings are now in dedicated databases and the user no more needs to define the type of the fields. No more conversion between the two types is done, numbers (floats and integers converted to f64) go to the facet float database and strings go to the strings facet database.
There is one related issue that I found regarding CSVs, the values in a CSV are always considered to be strings, [meilisearch/specifications#28](d916b57d74/text/0028-indexing-csv.md) fixes this issue by allowing the user to define the fields types using `:` in the "CSV Formatting Rules" section.
All previous tests on facets have been modified to pass again and I have also done hand-driven tests with the 115m songs dataset. Everything seems to be good!
Fixes#192.
Co-authored-by: Clément Renault <clement@meilisearch.com>
Co-authored-by: Kerollmops <clement@meilisearch.com>
200: Fix plane sweep algorithm r=Kerollmops a=LegendreM
Fix plain sweep algorithm after creating some tests on proximity.
Co-authored-by: many <maxime@meilisearch.com>
190: Make bucket candidates optionals r=Kerollmops a=LegendreM
Before the bucket candidates were the result of the facet filters or result of the query tree.
They will now be only the result of the query tree, making the number of candidates more consistent between the same request with or without facet filters.
Fix some clippy warnings.
Fix#186
Co-authored-by: many <maxime@meilisearch.com>
187: Fix fields distribution after documents merge r=Kerollmops a=shekhirin
Resolves https://github.com/meilisearch/milli/issues/174
The problem was with calculation of fields distribution before the merge in `output_from_sorter()`. So if you'd import two documents with the same primary key value, fields distribution will count it as two documents, while `output_from_sorter()` will merge these documents into one.
---
```console
➜ Downloads cat short_movies.json
[
{"id":"47474","title":"The Serpent's Egg","poster":"https://image.tmdb.org/t/p/w500/n7z0doFkXHcvo8QQWHLFnkEPXRU.jpg","overview":"The Serpent's Egg follows a week in the life of Abel Rosenberg, an out-of-work American circus acrobat living in poverty-stricken Berlin following Germany's defeat in World War I.","release_date":246844800,"genres":["Thriller","Drama","Mystery"]},
{"id":"47474","title":"The Serpent's Egg","poster":"https://image.tmdb.org/t/p/w500/n7z0doFkXHcvo8QQWHLFnkEPXRU.jpg","overview":"The Serpent's Egg follows a week in the life of Abel Rosenberg, an out-of-work American circus acrobat living in poverty-stricken Berlin following Germany's defeat in World War I.","release_date":246844800,"genres":["Thriller","Drama","Mystery"]}
]
➜ Downloads curl -X POST -H "Content-Type: text/json" --data-binary @short_movies.json 127.0.0.1:7700/indexes/movies/documents
{"updateId":0}
```
## Before
```console
➜ Downloads curl -s 127.0.0.1:7700/indexes/movies/stats | jq
{
"numberOfDocuments": 1,
"isIndexing": false,
"fieldsDistribution": {
"release_date": 2,
"poster": 2,
"title": 2,
"overview": 2,
"genres": 2,
"id": 2
}
}
```
## After
```console
➜ Downloads curl -s 127.0.0.1:7700/indexes/movies/stats | jq
{
"numberOfDocuments": 1,
"isIndexing": false,
"fieldsDistribution": {
"poster": 1,
"release_date": 1,
"title": 1,
"genres": 1,
"id": 1,
"overview": 1
}
}
```
Co-authored-by: Alexey Shekhirin <a.shekhirin@gmail.com>
183: remove tests on main r=Kerollmops a=MarinPostma
remove testing on main since we now use bors for merging.
Co-authored-by: Marin Postma <postma.marin@protonmail.com>
- pass excluded document to criteria to remove them in higher levels of the bucket-sort
- merge already returned document with excluded documents to avoid duplicas
Related to #125 and #112Fix#170
This reverts commit 12fb509d84.
We revert this commit because it's causing the bug #150.
The initial algorithm we implemented for the stop_words was:
1. remove the stop_words from the dataset
2. keep the stop_words in the query to see if we can generate new words by
integrating typos or if the word was a prefix
=> This was causing the bug since, in the case of “The hobbit”, we were
**always** looking for something starting with “t he” or “th e”
instead of ignoring the word completely.
For now we are going to fix the bug by completely ignoring the
stop_words in the query.
This could cause another problem were someone mistyped a normal word and
ended up typing a stop_word.
For example imagine someone searching for the music “Won't he do it”.
If that person misplace one space and write “Won' the do it” then we
will loose a part of the request.
One fix would be to update our query tree to something like that:
---------------------
OR
OR
TOLERANT hobbit # the first option is to ignore the stop_word
AND
CONSECUTIVE # the second option is to do as we are doing
EXACT t # currently
EXACT he
TOLERANT hobbit
---------------------
This would increase drastically the size of our query tree on request
with a lot of stop_words. For example think of “The Lord Of The Rings”.
For now whatsoever we decided we were going to ignore this problem and consider
that it doesn't reduce too much the relevancy of the search to do that
while it improves the performances.
fixes after review
bump the version of the tokenizer
implement a first version of the stop_words
The front must provide a BTreeSet containing the stop words
The stop_words are set at None if an empty Set is provided
add the stop-words in the http-ui interface
Use maplit in the test
and remove all the useless drop(rtxn) at the end of all tests
Integrate the stop_words in the querytree
remove the stop_words from the querytree except if it was a prefix or a typo
more fixes after review
The front must provide a BTreeSet containing the stop words
The stop_words are set at None if an empty Set is provided
add the stop-words in the http-ui interface
Use maplit in the test
and remove all the useless drop(rtxn) at the end of all tests
Add `the update_id` to the to the updates. The rationale is the
following:
- It allows for better tracability of the update events, thus improved
debugging and logging.
- The enigne is now aware of what he's already processed, and can return
it if asked. It may not make sense now, but in the future, the update
store may not work the same way, and this information about the state
of the engine will be desirable (distributed environement).
This allows us to:
- transform a CSV, a JSON or a JSON lines data type into the same
Grenad x Obkv streamable data type and creates the new FieldsIdsMap.
- Extract all the documents user ids in advance to be able to delete
the existing documents before re-indexing them.
- Keep the last documents with the same user id avoiding duplicates
in the same request.
This is a preparation for making the indexing fully parallel by making the
indexer only be aware of certain words for each threads to avoid postings lists
conflicts for each words
about:The requests and feedback regarding Language support are not managed in this repository. Please upvote the related discussion in our dedicated product repository or open a new one if it doesn't exist.
about:The feature requests and feedback regarding the already existing features are not managed in this repository. Please open a discussion in our dedicated product repository
about: ⚠️ Should only be used by the engine team ⚠️
title: ''
labels: ''
assignees: ''
---
Related product team resources: [PRD]() (_internal only_)
Related product discussion:
Related spec: WIP
## Motivation
<!---Copy/paste the information in PRD or briefly detail the product motivation. Ask product team if any hesitation.-->
## Usage
<!---Link to the public part of the PRD, or to the related product discussion for experimental features-->
## TODO
<!---Feel free to adapt this list with more technical/product steps-->
- [ ] Release a prototype
- [ ] If prototype validated, merge changes into `main`
- [ ] Update the spec
### Reminders when modifying the Setting API
<!--- Special steps to remind when adding a new index setting -->
- [ ] Ensure the new setting route is at least tested by the [`test_setting_routes` macro](https://github.com/meilisearch/meilisearch/blob/5204c0b60b384cbc79621b6b2176fca086069e8e/meilisearch/tests/settings/get_settings.rs#L276)
- [ ] Ensure Analytics are fully implemented
- [ ]`/settings/my-new-setting` configurated in the [`make_setting_routes` macro](https://github.com/meilisearch/meilisearch/blob/5204c0b60b384cbc79621b6b2176fca086069e8e/meilisearch/src/routes/indexes/settings.rs#L141-L165)
- [ ] global `/settings` route configurated in the [`update_all` function](https://github.com/meilisearch/meilisearch/blob/5204c0b60b384cbc79621b6b2176fca086069e8e/meilisearch/src/routes/indexes/settings.rs#L655-L751)
- [ ] Ensure the dump serializing is consistent with the `/settings` route serializing, e.g., enums case can be different (`camelCase` in route and `PascalCase` in the dump)
#### Special cases when adding a setting for an experimental feature
- [ ] ⚠️ API stability: The setting does not appear on the main settings route when the feature has never been enabled (e.g. mark it `Unset` when returned from the index in this situation. See [an example](https://github.com/meilisearch/meilisearch/blob/7a89abd2a025606a42f8b219e539117eb2eb029f/meilisearch-types/src/settings.rs#L608))
- [ ] The setting cannot be set when the feature is disabled, either by the main settings route or the subroute (see [`validate_settings` function](https://github.com/meilisearch/meilisearch/blob/7a89abd2a025606a42f8b219e539117eb2eb029f/meilisearch/src/routes/indexes/settings.rs#L811))
- [ ] If possible, the setting is reset when the feature is disabled (hard if it requires reindexing)
## Impacted teams
<!---Ping the related teams. Ask for the engine manager if any hesitation-->
<!---@meilisearch/docs-team when there is any API change, e.g. settings addition-->
# When the `latest` git tag is created with this [CI](../latest-git-tag.yml)
# we don't need to create a Docker `latest` image again.
# The `latest` Docker image push is already done in this CI when releasing a stable version of Meilisearch.
tags-ignore:
- latest
# Both `schedule` and `workflow_dispatch` build the nightly tag
schedule:
- cron:'0 23 * * *'# Every day at 11:00pm
workflow_dispatch:
jobs:
docker:
runs-on:docker
steps:
- uses:actions/checkout@v3
# If we are running a cron or manual job ('schedule' or 'workflow_dispatch' event), it means we are publishing the `nightly` tag, so not considered stable.
# If we have pushed a tag, and the tag has the v<nmumber>.<number>.<number> format, it means we are publishing an official release, so considered stable.
# In this situation, we need to set `output.stable` to create/update the following tags (additionally to the `vX.Y.Z` Docker tag):
# - a `vX.Y` (without patch version) Docker tag
# - a `latest` Docker tag
# For any other tag pushed, this is not considered stable.
- name:Define if stable and latest release
id:check-tag-format
env:
# To avoid request limit with the .github/scripts/is-latest-release.sh script
GITHUB_PATH:${{ secrets.MEILI_BOT_GH_PAT }}
run:|
escaped_tag=$(printf "%q" ${{ github.ref_name }})
echo "latest=false" >> $GITHUB_OUTPUT
if [[ ${{ github.event_name }} != 'push' ]]; then
echo "stable=false" >> $GITHUB_OUTPUT
elif [[ $escaped_tag =~ ^v[0-9]+\.[0-9]+\.[0-9]+$ ]]; then
@@ -4,17 +4,23 @@ First, thank you for contributing to Meilisearch! The goal of this document is t
Remember that there are many ways to contribute other than writing code: writing [tutorials or blog posts](https://github.com/meilisearch/awesome-meilisearch), improving [the documentation](https://github.com/meilisearch/documentation), submitting [bug reports](https://github.com/meilisearch/meilisearch/issues/new?assignees=&labels=&template=bug_report.md&title=) and [feature requests](https://github.com/meilisearch/product/discussions/categories/feedback-feature-proposal)...
The code in this repository is only concerned with managing multiple indexes, handling the update store, and exposing an HTTP API. Search and indexation are the domain of our core engine, [`milli`](https://github.com/meilisearch/milli), while tokenization is handled by [our `charabia` library](https://github.com/meilisearch/charabia/).
If Meilisearch does not offer optimized support for your language, please consider contributing to `charabia` by following the [CONTRIBUTING.md file](https://github.com/meilisearch/charabia/blob/main/CONTRIBUTING.md) and integrating your intended normalizer/segmenter.
## Table of Contents
- [Assumptions](#assumptions)
- [How to Contribute](#how-to-contribute)
- [Development Workflow](#development-workflow)
- [Git Guidelines](#git-guidelines)
- [Release Process (for internal team only)](#release-process-for-internal-team-only)
## Assumptions
1.**You're familiar with [Github](https://github.com) and the [Pull Requests](https://help.github.com/en/github/collaborating-with-issues-and-pull-requests/about-pull-requests)(PR) workflow.**
2.**You've read the Meilisearch [documentation](https://docs.meilisearch.com).**
3. **You know about the [Meilisearch community](https://docs.meilisearch.com/learn/what_is_meilisearch/contact.html).
1.**You're familiar with [GitHub](https://github.com) and the [Pull Requests (PR)](https://help.github.com/en/github/collaborating-with-issues-and-pull-requests/about-pull-requests) workflow.**
2.**You've read the Meilisearch [documentation](https://www.meilisearch.com/docs).**
3. **You know about the [Meilisearch community on Discord](https://discord.meilisearch.com).
Please use this for help.**
## How to Contribute
@@ -22,7 +28,7 @@ Remember that there are many ways to contribute other than writing code: writing
1. Ensure your change has an issue! Find an
[existing issue](https://github.com/meilisearch/meilisearch/issues/) or [open a new issue](https://github.com/meilisearch/meilisearch/issues/new).
* This is where you can get a feel if the change will be accepted or not.
2. Once approved, [fork the Meilisearch repository](https://help.github.com/en/github/getting-started-with-github/fork-a-repo) in your own Github account.
2. Once approved, [fork the Meilisearch repository](https://help.github.com/en/github/getting-started-with-github/fork-a-repo) in your own GitHub account.
3. [Create a new Git branch](https://help.github.com/en/github/collaborating-with-issues-and-pull-requests/creating-and-deleting-branches-within-your-repository)
4. Review the [Development Workflow](#development-workflow) section that describes the steps to maintain the repository.
5. Make your changes on your branch.
@@ -44,12 +50,37 @@ We recommend using the `--release` flag to test the full performance of Meilisea
cargo test
```
This command will be triggered to each PR as a requirement for merging it.
#### Snapshot-based tests
We are using [insta](https://insta.rs) to perform snapshot-based testing.
We recommend using the insta tooling (such as `cargo-insta`) to update the snapshots if they change following a PR.
New tests should use insta where possible rather than manual `assert` statements.
Furthermore, we provide some macros on top of insta, notably a way to use snapshot hashes instead of inline snapshots, saving a lot of space in the repository.
To effectively debug snapshot-based hashes, we recommend you export the `MEILI_TEST_FULL_SNAPS` environment variable so that snapshot are fully created locally:
```
export MEILI_TEST_FULL_SNAPS=true # add this to your .bashrc, .zshrc, ...
```
#### Test troubleshooting
If you get a "Too many open files" error you might want to increase the open file limit using this command:
```bash
ulimit -Sn 3000
```
#### Build tools
Meilisearch follows the [cargo xtask](https://github.com/matklad/cargo-xtask) workflow to provide some build tools.
Run `cargo xtask --help` from the root of the repository to find out what is available.
## Git Guidelines
### Git Branches
@@ -68,7 +99,7 @@ As minimal requirements, your commit message should:
We don't follow any other convention, but if you want to use one, we recommend [the Chris Beams one](https://chris.beams.io/posts/git-commit/).
### Github Pull Requests
### GitHub Pull Requests
Some notes on GitHub PRs:
@@ -78,6 +109,37 @@ Some notes on GitHub PRs:
The draft PRs are recommended when you want to show that you are working on something and make your work visible.
- The branch related to the PR must be **up-to-date with `main`** before merging. Fortunately, this project uses [Bors](https://github.com/bors-ng/bors-ng) to automatically enforce this requirement without the PR author having to rebase manually.
## Release Process (for internal team only)
Meilisearch tools follow the [Semantic Versioning Convention](https://semver.org/).
### Automation to rebase and Merge the PRs
This project integrates a bot that helps us manage pull requests merging.<br>
_[Read more about this](https://github.com/meilisearch/integration-guides/blob/main/resources/bors.md)._
### How to Publish a new Release
The full Meilisearch release process is described in [this guide](https://github.com/meilisearch/engine-team/blob/main/resources/meilisearch-release.md). Please follow it carefully before doing any release.
### How to publish a prototype
Depending on the developed feature, you might need to provide a prototyped version of Meilisearch to make it easier to test by the users.
This happens in two steps:
- [Release the prototype](https://github.com/meilisearch/engine-team/blob/main/resources/prototypes.md#how-to-publish-a-prototype)
- [Communicate about it](https://github.com/meilisearch/engine-team/blob/main/resources/prototypes.md#communication)
### Release assets
For each release, the following assets are created:
- Binaries for different platforms (Linux, MacOS, Windows and ARM architectures) are attached to the GitHub release
- Binaries are pushed to HomeBrew and APT (not published for RC)
- Docker tags are created/updated:
-`vX.Y.Z`
-`vX.Y` (not published for RC)
-`latest` (not published for RC)
<hr>
Thank you again for reading this through, we can not wait to begin to work with you if you made your way through this contributing guide ❤️
Search engine technologies are complex pieces of software that require thorough profiling tools. We chose to use [Puffin](https://github.com/EmbarkStudios/puffin), which the Rust gaming industry uses extensively. You can export and import the profiling reports using the top bar's _File_ menu options [in Puffin Viewer](https://github.com/embarkstudios/puffin#ui).

## Profiling the Indexing Process
When you enable [the `exportPuffinReports` experimental feature](https://www.meilisearch.com/docs/learn/experimental/overview) of Meilisearch, Puffin reports with the `.puffin` extension will be automatically exported to disk. When this option is enabled, the engine will automatically create a "frame" whenever it executes the `IndexScheduler::tick` method.
[Puffin Viewer](https://github.com/EmbarkStudios/puffin/tree/main/puffin_viewer) is used to analyze the reports. Those reports show areas where Meilisearch spent time during indexing.
Another piece of advice on the Puffin viewer UI interface is to consider the _Merge children with same ID_ option. It can hide the exact actual timings at which events were sent. Please turn it off when you see strange gaps on the Flamegraph. It can help.
## Profiling the Search Process
We still need to take the time to profile the search side of the engine with Puffin. It would require time to profile the filtering phase, query parsing, creation, and execution. We could even profile the Actix HTTP server.
The only issue we see is the framing system. Puffin requires a global frame-based profiling phase, which collides with Meilisearch's ability to accept and answer multiple requests on different threads simultaneously.
<palign="center">⚡ Lightning Fast, Ultra Relevant, and Typo-Tolerant Search Engine 🔍</p>
<palign="center">⚡ A lightning-fast search engine that fits effortlessly into your apps, websites, and workflow 🔍</p>
**Meilisearch** is a powerful, fast, open-source, easy to use and deploy search engine. Both searching and indexing are highly customizable. Features such as typo-tolerance, filters, and synonyms are provided out-of-the-box.
For more information about features go to [our documentation](https://docs.meilisearch.com/).
Meilisearch helps you shape a delightful search experience in a snap, offering features that work out-of-the-box to speed up your workflow.
- **Search-as-you-type:** find search results in less than 50 milliseconds
- **[Typo tolerance](https://www.meilisearch.com/docs/learn/configuration/typo_tolerance?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=features):** get relevant matches even when queries contain typos and misspellings
- **[Filtering](https://www.meilisearch.com/docs/learn/fine_tuning_results/filtering?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=features) and [faceted search](https://www.meilisearch.com/docs/learn/fine_tuning_results/faceted_search?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=features):** enhance your users' search experience with custom filters and build a faceted search interface in a few lines of code
- **[Sorting](https://www.meilisearch.com/docs/learn/fine_tuning_results/sorting?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=features):** sort results based on price, date, or pretty much anything else your users need
- **[Synonym support](https://www.meilisearch.com/docs/learn/configuration/synonyms?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=features):** configure synonyms to include more relevant content in your search results
- **[Geosearch](https://www.meilisearch.com/docs/learn/fine_tuning_results/geosearch?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=features):** filter and sort documents based on geographic data
- **[Extensive language support](https://www.meilisearch.com/docs/learn/what_is_meilisearch/language?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=features):** search datasets in any language, with optimized support for Chinese, Japanese, Hebrew, and languages using the Latin alphabet
- **[Security management](https://www.meilisearch.com/docs/learn/security/master_api_keys?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=features):** control which users can access what data with API keys that allow fine-grained permissions handling
- **[Multi-Tenancy](https://www.meilisearch.com/docs/learn/security/tenant_tokens?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=features):** personalize search results for any number of application tenants
- **Highly Customizable:** customize Meilisearch to your specific needs or use our out-of-the-box and hassle-free presets
- **[RESTful API](https://www.meilisearch.com/docs/reference/api/overview?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=features):** integrate Meilisearch in your technical stack with our plugins and SDKs
- **Easy to install, deploy, and maintain**
### Deploy the Server
## 📖 Documentation
#### Homebrew (Mac OS)
You can consult Meilisearch's documentation at [https://www.meilisearch.com/docs](https://www.meilisearch.com/docs/?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=docs).
```bash
brew update && brew install meilisearch
meilisearch
```
## 🚀 Getting started
#### Docker
For basic instructions on how to set up Meilisearch, add documents to an index, and search for documents, take a look at our [Quick Start](https://www.meilisearch.com/docs/learn/getting_started/quick_start?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=get-started) guide.
```bash
docker run -p 7700:7700 -v "$(pwd)/data.ms:/data.ms" getmeili/meilisearch
```
## ⚡ Supercharge your Meilisearch experience
#### Announcing a cloud-hosted Meilisearch
Say goodbye to server deployment and manual updates with [Meilisearch Cloud](https://www.meilisearch.com/cloud?utm_campaign=oss&utm_source=github&utm_medium=meilisearch). No credit card required.
Take a look at the complete [Meilisearch integration list](https://www.meilisearch.com/docs/learn/what_is_meilisearch/sdks?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=sdks-link).
#### Run on Digital Ocean
[](https://www.meilisearch.com/docs/learn/what_is_meilisearch/sdks?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=sdks-logos)
Experienced users will want to keep our [API Reference](https://www.meilisearch.com/docs/reference/api/overview?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=advanced) close at hand.
<imgsrc="https://platform.sh/images/deploy/lg-blue.svg"alt="Deploy on Platform.sh"width="180px"/>
</a>
We also offer a wide range of dedicated guides to all Meilisearch features, such as [filtering](https://www.meilisearch.com/docs/learn/fine_tuning_results/filtering?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=advanced), [sorting](https://www.meilisearch.com/docs/learn/fine_tuning_results/sorting?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=advanced), [geosearch](https://www.meilisearch.com/docs/learn/fine_tuning_results/geosearch?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=advanced), [API keys](https://www.meilisearch.com/docs/learn/security/master_api_keys?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=advanced), and [tenant tokens](https://www.meilisearch.com/docs/learn/security/tenant_tokens?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=advanced).
#### APT (Debian & Ubuntu)
Finally, for more in-depth information, refer to our articles explaining fundamental Meilisearch concepts such as [documents](https://www.meilisearch.com/docs/learn/core_concepts/documents?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=advanced) and [indexes](https://www.meilisearch.com/docs/learn/core_concepts/indexes?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=advanced).
Meilisearch collects **anonymized** data from users to help us improve our product. You can [deactivate this](https://www.meilisearch.com/docs/learn/what_is_meilisearch/telemetry?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=telemetry#how-to-disable-data-collection) whenever you want.
```bash
curl -L https://install.meilisearch.com | sh
./meilisearch
```
To request deletion of collected data, please write to us at [privacy@meilisearch.com](mailto:privacy@meilisearch.com). Don't forget to include your `Instance UID` in the message, as this helps us quickly find and delete your data.
#### Compile and run it from sources
If you want to know more about the kind of data we collect and what we use it for, check the [telemetry section](https://www.meilisearch.com/docs/learn/what_is_meilisearch/telemetry?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=telemetry#how-to-disable-data-collection) of our documentation.
If you have the latest stable Rust toolchain installed on your local system, clone the repository and change it to your working directory.
Meilisearch is a search engine created by [Meili](https://www.welcometothejungle.com/en/companies/meilisearch), a software development company based in France and with team members all over the world. Want to know more about us? [Check out our blog!](https://blog.meilisearch.com/?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=contact)
### Create an Index and Upload Some Documents
🗞 [Subscribe to our newsletter](https://meilisearch.us2.list-manage.com/subscribe?u=27870f7b71c908a8b359599fb&id=79582d828e) if you don't want to miss any updates! We promise we won't clutter your mailbox: we only send one edition every two months.
Let's create an index! If you need a sample dataset, use [this movie database](https://www.notion.so/meilisearch/A-movies-dataset-to-test-Meili-1cbf7c9cfa4247249c40edfa22d7ca87#b5ae399b81834705ba5420ac70358a65). You can also find it in the `datasets/` directory.
💌 Want to make a suggestion or give feedback? Here are some of the channels where you can reach us:
```bash
curl -L 'https://bit.ly/2PAcw9l' -o movies.json
```
- For feature requests, please visit our [product repository](https://github.com/meilisearch/product/discussions)
- Found a bug? Open an [issue](https://github.com/meilisearch/meilisearch/issues)!
- Want to be part of our Discord community? [Join us!](https://discord.meilisearch.com/?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=contact)
Now, you're ready to index some data.
Thank you for your support!
```bash
curl -i -X POST 'http://127.0.0.1:7700/indexes/movies/documents'\
--header 'content-type: application/json'\
--data-binary @movies.json
```
## 👩💻 Contributing
### Search for Documents
Meilisearch is, and will always be, open-source! If you want to contribute to the project, please take a look at [our contribution guidelines](CONTRIBUTING.md).
#### In command line
## 📦 Versioning
The search engine is now aware of your documents and can serve those via an HTTP server.
Meilisearch releases and their associated binaries are available [in this GitHub page](https://github.com/meilisearch/meilisearch/releases).
The [`jq` command-line tool](https://stedolan.github.io/jq/) can greatly help you read the server responses.
The binaries are versioned following [SemVer conventions](https://semver.org/). To know more, read our [versioning policy](https://github.com/meilisearch/engine-team/blob/main/resources/versioning-policy.md).
"overview":"Along with crime-fighting partner Robin and new recruit Batgirl, Batman battles the dual threat of frosty genius Mr. Freeze and homicidal horticulturalist Poison Ivy. Freeze plans to put Gotham City on ice, while Ivy tries to drive a wedge between the dynamic duo.",
"overview":"Adam West and Burt Ward returns to their iconic roles of Batman and Robin. Featuring the voices of Adam West, Burt Ward, and Julie Newmar, the film sees the superheroes going up against classic villains like The Joker, The Riddler, The Penguin and Catwoman, both in Gotham City… and in space.",
"release_date":1475888400
}
],
"nbHits":8,
"exhaustiveNbHits":false,
"query":"botman robin",
"limit":2,
"offset":0,
"processingTimeMs":2
}
```
#### Use the Web Interface
We also deliver an **out-of-the-box [web interface](https://github.com/meilisearch/mini-dashboard)** in which you can test Meilisearch interactively.
You can access the web interface in your web browser at the root of the server. The default URL is [http://127.0.0.1:7700](http://127.0.0.1:7700). All you need to do is open your web browser and enter Meilisearch’s address to visit it. This will lead you to a web page with a search bar that will allow you to search in the selected index.
| [See the gif above](#demo)
## Documentation
Now that your Meilisearch server is up and running, you can learn more about how to tune your search engine in [the documentation](https://docs.meilisearch.com).
## Contributing
Hey! We're glad you're thinking about contributing to Meilisearch! Feel free to pick an [issue labeled as `good first issue`](https://github.com/meilisearch/meilisearch/issues?q=is%3Aissue+is%3Aopen+label%3A%22good+first+issue%22), and to ask any question you need. Some points might not be clear and we are available to help you!
Also, we recommend following the [CONTRIBUTING](./CONTRIBUTING.md) to create your PR.
## Core engine and tokenizer
The code in this repository is only concerned with managing multiple indexes, handling the update store, and exposing an HTTP API.
Search and indexation are the domain of our core engine, [`milli`](https://github.com/meilisearch/milli), while tokenization is handled by [our `tokenizer` library](https://github.com/meilisearch/tokenizer/).
## Telemetry
Meilisearch collects anonymous data regarding general usage.
This helps us better understand developers' usage of Meilisearch features.
To find out more on what information we're retrieving, please see our documentation on [Telemetry](https://docs.meilisearch.com/learn/what_is_meilisearch/telemetry.html).
This program is optional, you can disable these analytics by using the `MEILI_NO_ANALYTICS` env variable.
## Feature request
The feature requests are not managed in this repository. Please visit our [dedicated repository](https://github.com/meilisearch/product) to see our work about the Meilisearch product.
If you have a feature request or any feedback about an existing feature, please open [a discussion](https://github.com/meilisearch/product/discussions).
Also, feel free to participate in the current discussions, we are looking forward to reading your comments.
Meilisearch is developed by [Meili](https://www.meilisearch.com), a young company. To know more about us, you can [read our blog](https://blog.meilisearch.com). Any suggestion or feedback is highly appreciated. Thank you for your support!
Differently from the binaries, crates in this repository are not currently available on [crates.io](https://crates.io/) and do not follow [SemVer conventions](https://semver.org).
- [Comparison between benchmarks](#comparison-between-benchmarks)
- [Datasets](#datasets)
## Run the benchmarks
### On our private server
The Meili team has self-hosted his own GitHub runner to run benchmarks on our dedicated bare metal server.
To trigger the benchmark workflow:
- Go to the `Actions` tab of this repository.
- Select the `Benchmarks` workflow on the left.
- Click on `Run workflow` in the blue banner.
- Select the branch on which you want to run the benchmarks and select the dataset you want (default: `songs`).
- Finally, click on `Run workflow`.
This GitHub workflow will run the benchmarks and push the `critcmp` report to a DigitalOcean Space (= S3).
The name of the uploaded file is displayed in the workflow.
_[More about critcmp](https://github.com/BurntSushi/critcmp)._
💡 To compare the just-uploaded benchmark with another one, check out the [next section](#comparison-between-benchmarks).
### On your machine
To run all the benchmarks (~5h):
```bash
cargo bench
```
To run only the `search_songs` (~1h), `search_wiki` (~3h), `search_geo` (~20m) or `indexing` (~2h) benchmark:
```bash
cargo bench --bench <dataset name>
```
By default, the benchmarks will be downloaded and uncompressed automatically in the target directory.<br>
If you don't want to download the datasets every time you update something on the code, you can specify a custom directory with the environment variable `MILLI_BENCH_DATASETS_PATH`:
```bash
mkdir ~/datasets
MILLI_BENCH_DATASETS_PATH=~/datasets cargo bench --bench search_songs # the four datasets are downloaded
touch build.rs
MILLI_BENCH_DATASETS_PATH=~/datasets cargo bench --bench songs # the code is compiled again but the datasets are not downloaded
```
## Comparison between benchmarks
The benchmark reports we push are generated with `critcmp`. Thus, we use `critcmp` to show the result of a benchmark, or compare results between multiple benchmarks.
We provide a script to download and display the comparison report.
_[Download the `smol-wiki` dataset](https://milli-benchmarks.fra1.digitaloceanspaces.com/datasets/smol-wiki-articles.csv.gz)._
### Movies
`movies` is a really small dataset we uses as our example in the [getting started](https://www.meilisearch.com/docs/learn/getting_started/quick_start)
_[Download the `movies` dataset](https://www.meilisearch.com/movies.json)._
### All Countries
`smol-all-countries` is a subset of the [`all-countries.csv` dataset](https://milli-benchmarks.fra1.digitaloceanspaces.com/datasets/all-countries.csv.gz)
It has been converted to jsonlines and then edited so it matches our format for the `_geo` field.
"the black saint and the sinner lady and the good doggo ",// four words to pop
"les liaisons dangeureuses 1793 ",// one word to pop
"The Disneyland Children's Sing-Alone song ",// two words to pop
"seven nation mummy ",// one word to pop
"7000 Danses / Le Baiser / je me trompe de mots ",// four words to pop
"Bring Your Daughter To The Slaughter but now this is not part of the title ",// nine words to pop
"whathavenotnsuchforth and a good amount of words to pop to match the first one ",// 13
],
criterion: Some(&["words"]),
..BASE_CONF
},
utils::Conf{
group_name: "asc",
criterion: Some(&["released-timestamp:desc"]),
..BASE_CONF
},
utils::Conf{
group_name: "desc",
criterion: Some(&["released-timestamp:desc"]),
..BASE_CONF
},
/* then we bench the asc and desc criterion on top of the default criterion */
utils::Conf{
group_name: "asc + default",
criterion: Some(&asc_default[..]),
..BASE_CONF
},
utils::Conf{
group_name: "desc + default",
criterion: Some(&desc_default[..]),
..BASE_CONF
},
/* we bench the filters with the default request */
utils::Conf{
group_name: "basic filter: <=",
filter: Some("released-timestamp <= 946728000"),// year 2000
..BASE_CONF
},
utils::Conf{
group_name: "basic filter: TO",
filter: Some("released-timestamp 946728000 TO 1262347200"),// year 2000 to 2010
..BASE_CONF
},
utils::Conf{
group_name: "big filter",
filter: Some("released-timestamp != 1262347200 AND (NOT (released-timestamp = 946728000)) AND (duration-float = 1 OR (duration-float 1.1 TO 1.5 AND released-timestamp > 315576000))"),
..BASE_CONF
},
/* the we bench some global / normal search with all the default criterion in the default
* order */
utils::Conf{
group_name: "basic placeholder",
queries: &[""],
..BASE_CONF
},
utils::Conf{
group_name: "basic without quote",
queries: &BASE_CONF
.queries
.iter()
.map(|s|s.trim())// we remove the space at the end of each request
# Gets the architecture by setting the $archi variable
# Gets the architecture by setting the $archi variable.
# Returns 0 in case of success, 1 otherwise.
get_archi(){
architecture=$(uname -m)
@@ -152,8 +74,9 @@ get_archi() {
archi='amd64'
;;
'arm64')
if[$os='macos'];then# MacOS M1
archi='amd64'
# macOS M1/M2
if[$os='macos'];then
archi='apple-silicon'
else
archi='aarch64'
fi
@@ -171,70 +94,74 @@ success_usage() {
printf"$GREEN%s\n$DEFAULT""Meilisearch $latest binary successfully downloaded as '$binary_name' file."
echo''
echo'Run it:'
echo' $ ./meilisearch'
echo"$ ./$PNAME"
echo'Usage:'
echo' $ ./meilisearch --help'
echo"$ ./$PNAME --help"
}
not_available_failure_usage(){
printf"$RED%s\n$DEFAULT"'ERROR: Meilisearch binary is not available for your OS distribution or your architecture yet.'
echo''
echo'However, you can easily compile the binary from the source files.'
echo'Follow the steps at the page ("Source" tab): https://docs.meilisearch.com/learn/getting_started/installation.html'
echo'Follow the steps at the page ("Source" tab): https://www.meilisearch.com/docs/learn/getting_started/installation'
}
fetch_release_failure_usage(){
echo''
printf"$RED%s\n$DEFAULT"'ERROR: Impossible to get the latest stable version of Meilisearch.'
echo'Please let us know about this issue: https://github.com/meilisearch/meilisearch/issues/new/choose'
echo''
echo'In the meantime, you can manually download the appropriate binary from the GitHub release assets here: https://github.com/meilisearch/meilisearch/releases/latest'
}
fill_release_variables(){
# Fill $latest variable.
if ! get_latest;then
fetch_release_failure_usage
exit1
fi
if["$latest"=''];then
fetch_release_failure_usage
exit1
fi
# Fill $os variable.
if ! get_os;then
not_available_failure_usage
exit1
fi
# Fill $archi variable.
if ! get_archi;then
not_available_failure_usage
exit1
fi
}
download_binary(){
fill_release_variables
echo"Downloading Meilisearch binary $latest for $os, architecture $archi..."
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.