4231: Fixed payload limit setting being ignored for delete documents by batch r=Kerollmops a=Karribalu
# Pull Request
## Related issue
Fixes#4224
## What does this PR do?
- Added http_payload_size_limit to JsonConfig to allow deleting documents in batches with a payload size greater than 2MB, which is the default limit set in the JsonConfig crate.
## PR checklist
Please check if your PR fulfills the following requirements:
- [Y] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [Y] Have you read the contributing guidelines?
- [Y] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: karribalu <karri.balu123456@gmail.com>
4090: Diff indexing r=ManyTheFish a=ManyTheFish
This pull request aims to reduce the indexing time by computing a difference between the data added to the index and the data removed from the index before writing in LMDB.
## Why focus on reducing the writings in LMDB?
The indexing in Meilisearch is split into 3 main phases:
1) The computing or the extraction of the data (Multi-threaded)
2) The writing of the data in LMDB (Mono-threaded)
3) The processing of the prefix databases (Mono-threaded)
see below:

Because the writing is mono-threaded, it represents a bottleneck in the indexing, reducing the number of writes in LMDB will reduce the pressure on the main thread and should reduce the global time spent on the indexing.
## Give Feedback
We created [a dedicated discussion](https://github.com/meilisearch/meilisearch/discussions/4196) for users to try this new feature and to give feedback on bugs or performance issues.
## Technical approach
### Part 1: merge the addition and the deletion process
This part:
a) Aims to reduce the time spent on indexing only the filterable/sortable fields of documents, for example:
- Updating the number of "likes" or "stars" of a song or a movie
- Updating the "stock count" or the "price" of a product
b) Aims to reduce the time spent on writing in LMDB which should reduce the global indexing time for the highly multi-threaded machines by reducing the writing bottleneck.
c) Aims to reduce the average time spent to delete documents without having to keep the soft-deleted documents implementation
- [x] Create a preprocessing function that creates the diff-based documents chuck (`OBKV<fid, OBKV<AddDel, value>>`)
- [x] and clearly separate the faceted fields and the searchable fields in two different chunks
- Change the parameters of the input extractor by taking an `OBKV<fid, OBKV<AddDel, value>>` instead of `OBKV<fid, value>`.
- [x] extract_docid_word_positions
- [x] extract_geo_points
- [x] extract_vector_points
- [x] extract_fid_docid_facet_values
- Adapt the searchable extractors to the new diff-chucks
- [x] extract_fid_word_count_docids
- [x] extract_word_pair_proximity_docids
- [x] extract_word_position_docids
- [x] extract_word_docids
- Adapt the facet extractors to the new diff-chucks
- [x] extract_facet_number_docids
- [x] extract_facet_string_docids
- [x] extract_fid_docid_facet_values
- [x] FacetsUpdate
- [x] Adapt the prefix database extractors ⚠️⚠️
- [x] Make the LMDB writer remove the document_ids to delete at the same time the new document_ids are added
- [x] Remove document deletion pipeline
- [x] remove `new_documents_ids` entirely and `replaced_documents_ids`
- [x] reuse extracted external id from transform instead of re-extracting in `TypedChunks::Documents`
- [x] Remove deletion pipeline after autobatcher
- [x] remove autobatcher deletion pipeline
- [x] everything uses `IndexOperation::DocumentOperation`
- [x] repair deletion by internal id for filter by delete
- [x] Improve the deletion via internal ids by avoiding iterating over the whole set of external document ids.
- [x] Remove soft-deleted documents
#### FIXME
- [x] field distribution is not correctly updated after deletion
- [x] missing documents in the tests of tokenizer_customization
### Part 2: Only compute the documents field by field
This part aims to reduce the global indexing time for any kind of partial document modification on any size of machine from the mono-threaded one to the highly multi-threaded one.
- [ ] Make the preprocessing function only send the fields that changed to the extractors
- [ ] remove the `word_docids` and `exact_word_docids` database and adapt the search (⚠️ could impact the search performances)
- [ ] replace the `word_pair_proximity_docids` database with a `word_pair_proximity_fid_docids` database and adapt the search (⚠️ could impact the search performances)
- [ ] Adapt the prefix database extractors ⚠️⚠️
## Technical Concerns
- The part 1 implementation could increase the indexing time for the smallest machines (with few threads) by increasing the extracting time (multi-threaded) more than the writing time (mono-threaded)
- The part 2 implementation needs to change the databases which could have a significant impact on the search performances
- The prefix databases are a bit special to process and may be a pain to adapt to the difference-based indexing
Co-authored-by: ManyTheFish <many@meilisearch.com>
Co-authored-by: Clément Renault <clement@meilisearch.com>
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
4203: Extract external document docids from docs on deletion by filter r=Kerollmops a=dureuill
This fixes some of the performance regression observed on `diff-indexing` when doing delete-by-filter with a filter matching many documents.
To delete 19 768 771 documents (hackernews dataset, all documents matching `type = comment`), here are the observed time:
|branch (commit sha1sum)|time|speed-down factor (lower is better)|
|--|--|--|
|`main` (48865470d7)|1212.885536s (~20min)|x1.0 (baseline)|
|`diff-indexing` (523519fdbf)|5385.550543s (90min)|x4.44|
|**`diff-indexing-extract-primary-key`**(f8289cd974)|2582.323324s (43min) | x2.13|
So we're still suffering a speed-down of x2.13, but that's much better than x4.44.
---
Changes:
- Refactor the logic of PrimaryKey extraction to a struct
- Add a trait to abstract the extraction of field id from a name between `DocumentBatch` and `FieldIdMap`.
- Add `Index::external_id_of` to get the external ids of a bitmap of internal ids.
- Use this new method to add new Transform and Batch methods to remove documents that are known to be from the DB.
- Modify delete-by-filter to use the new method
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
4205: Prevent search hang on the processing index r=Kerollmops a=dureuill
Fixes#4206, an issue originally [reported on Discord](https://discord.com/channels/1006923006964154428/1148983671026618579/1148983671026618579) where having parallel search requests on more indexes than the index cache capacity would cause search requests on the currently updating index to hang until the index is done updating.
## Test setup
- Create 20 empty indexes by sending settings to them
- repeatedly send placeholder search requests to each of the indexes in a loop
- Create another index and send a significant batch of documents to index.
- Attempt to perform a search request on that last index.
- Before this PR, the search request hangs while the index update task is processing
- After this PR, the search request respond immediately even while the index update task is processing
## Changes
- When getting the handle to an index for some potentially long running batches of tasks, save it in the index scheduler.
- Drop the handle from the index-scheduler when the task is done so that we don't leak indexes.
- When getting an index from outside the task queue processor, check if there is such an handle matching the requested index. If so, skip the cache entirely and clone the handle.
Co-authored-by: Louis Dureuil <louis.dureuil@xinra.net>
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
4204: Throw error when the vector search is sent with the wrong size r=Kerollmops a=dureuill
# Pull Request
## Related issue
Fixes#4201
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
4185: Bump Swatinem/rust-cache from 2.6.2 to 2.7.1 r=curquiza a=dependabot[bot]
Bumps [Swatinem/rust-cache](https://github.com/swatinem/rust-cache) from 2.6.2 to 2.7.1.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a href="https://github.com/swatinem/rust-cache/releases">Swatinem/rust-cache's releases</a>.</em></p>
<blockquote>
<h2>v2.7.0</h2>
<h2>What's Changed</h2>
<ul>
<li>Fix save-if documentation in readme by <a href="https://github.com/rukai"><code>`@rukai</code></a>` in <a href="https://redirect.github.com/Swatinem/rust-cache/pull/166">Swatinem/rust-cache#166</a></li>
<li>Support for <code>trybuild</code> and similar macro testing tools by <a href="https://github.com/neysofu"><code>`@neysofu</code></a>` in <a href="https://redirect.github.com/Swatinem/rust-cache/pull/168">Swatinem/rust-cache#168</a></li>
</ul>
<h2>New Contributors</h2>
<ul>
<li><a href="https://github.com/rukai"><code>`@rukai</code></a>` made their first contribution in <a href="https://redirect.github.com/Swatinem/rust-cache/pull/166">Swatinem/rust-cache#166</a></li>
<li><a href="https://github.com/neysofu"><code>`@neysofu</code></a>` made their first contribution in <a href="https://redirect.github.com/Swatinem/rust-cache/pull/168">Swatinem/rust-cache#168</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a href="https://github.com/Swatinem/rust-cache/compare/v2.6.2...v2.7.0">https://github.com/Swatinem/rust-cache/compare/v2.6.2...v2.7.0</a></p>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a href="https://github.com/Swatinem/rust-cache/blob/master/CHANGELOG.md">Swatinem/rust-cache's changelog</a>.</em></p>
<blockquote>
<h2>2.7.1</h2>
<ul>
<li>Update toml parser to fix parsing errors.</li>
</ul>
<h2>2.7.0</h2>
<ul>
<li>Properly cache <code>trybuild</code> tests.</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a href="3cf7f8cc28"><code>3cf7f8c</code></a> 2.7.1</li>
<li><a href="e03705e031"><code>e03705e</code></a> changelog</li>
<li><a href="b86d1c6caa"><code>b86d1c6</code></a> bump all the other dependencies too</li>
<li><a href="f27990c89a"><code>f27990c</code></a> Update Dependencies (<a href="https://redirect.github.com/swatinem/rust-cache/issues/172">#172</a>)</li>
<li><a href="a95ba19544"><code>a95ba19</code></a> 2.7.0</li>
<li><a href="82c8487d00"><code>82c8487</code></a> changelog</li>
<li><a href="67c46e7159"><code>67c46e7</code></a> Support for <code>trybuild</code> and similar macro testing tools (<a href="https://redirect.github.com/swatinem/rust-cache/issues/168">#168</a>)</li>
<li><a href="44b6087283"><code>44b6087</code></a> Fix save-if documentation in readme (<a href="https://redirect.github.com/swatinem/rust-cache/issues/166">#166</a>)</li>
<li>See full diff in <a href="https://github.com/swatinem/rust-cache/compare/v2.6.2...v2.7.1">compare view</a></li>
</ul>
</details>
<br />
[](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
You can trigger a rebase of this PR by commenting ``@dependabot` rebase`.
[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)
---
<details>
<summary>Dependabot commands and options</summary>
<br />
You can trigger Dependabot actions by commenting on this PR:
- ``@dependabot` rebase` will rebase this PR
- ``@dependabot` recreate` will recreate this PR, overwriting any edits that have been made to it
- ``@dependabot` merge` will merge this PR after your CI passes on it
- ``@dependabot` squash and merge` will squash and merge this PR after your CI passes on it
- ``@dependabot` cancel merge` will cancel a previously requested merge and block automerging
- ``@dependabot` reopen` will reopen this PR if it is closed
- ``@dependabot` close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- ``@dependabot` show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency
- ``@dependabot` ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- ``@dependabot` ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- ``@dependabot` ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
</details>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
4167: Introduce the `meilitool` command line interface r=Kerollmops a=Kerollmops
This PR introduces a small tool to help the Cloud team:
- Clear the tasks queue by removing all the tasks
- Dump a Meilisearch database without having to enqueue the task
- Access this `meilitool` binary from the Docker Image
## TODO
- [x] Modify the Docker File to ship with this new tool (`@curquiza,` could you review that, please?)
- [x] Clear the tasks queue by removing all the tasks
- [x] Add more logs to explain what is happening
- [x] Clear the `update_files` folder
- [x] Dump a Meilisearch database without having to enqueue the task
- [x] Add more logs to explain what is happening
- [x] Introduce a flag to skip dumping enqueued and processing tasks.
- [x] Dump the instance uid.
- [x] Dump the keys.
- [x] Dump the tasks with the update files.
- [x] Dump the index documents and settings.
- [ ] ~Dump the experimental features~
Co-authored-by: Clément Renault <clement@meilisearch.com>
4169: update charabia r=curquiza a=ManyTheFish
Update Charabia to v0.8.5 and add the new khmer tokenizer
Co-authored-by: ManyTheFish <many@meilisearch.com>
4132: Extract the creation and last updated timestamp from v2 dumps r=irevoire a=vivek-26
# Pull Request
## Related issue
Fixes#2989
## What does this PR do?
This PR -
- extracts the `created_at` and `updated_at` dates from v2 dumps.
- updates the unit tests.
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: Vivek Kumar <vivek.26@outlook.com>
4154: Update version for the next release (v1.5.0) in Cargo.toml r=curquiza a=meili-bot
⚠️ This PR is automatically generated. Check the new version is the expected one and Cargo.lock has been updated before merging.
Co-authored-by: curquiza <curquiza@users.noreply.github.com>
4126: Make the experimental route /metrics activable via HTTP r=dureuill a=braddotcoffee
# Pull Request
## Related issue
Closes#4086
## What does this PR do?
- [x] Make `/metrics` available via HTTP as described in #4086
- [x] The users can still launch Meilisearch using the `--experimental-enable-metrics` flag.
- [x] If the flag `--experimental-enable-metrics` is activated, a call to the `GET /experimental-features` route right after the launch will show `"metrics": true` even if the user has not called the `PATCH /experimental-features` route yet.
- [x] Even if the --experimental-enable-metrics flag is present at launch, calling the `PATCH /experimental-features` route with `"metrics": false` disables the experimental feature.
- [x] Update the spec
- I was unable to find docs in this repository to update about the `/experimental-features` endpoint. I'll happily update if you point me in the right direction!
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Co-authored-by: bwbonanno <bradfordbonanno@gmail.com>
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
4134: Bump rustix from 0.36.15 to 0.36.16 r=Kerollmops a=dependabot[bot]
Bumps [rustix](https://github.com/bytecodealliance/rustix) from 0.36.15 to 0.36.16.
<details>
<summary>Commits</summary>
<ul>
<li><a href="6534992521"><code>6534992</code></a> chore: Release rustix version 0.36.16</li>
<li><a href="4928cf7a38"><code>4928cf7</code></a> Disable riscv64 testing.</li>
<li><a href="8cc159c4c3"><code>8cc159c</code></a> Fix the <code>test_ttyname_ok</code> test when /dev/stdin is inaccessable. (<a href="https://redirect.github.com/bytecodealliance/rustix/issues/821">#821</a>)</li>
<li><a href="6dc7ba9478"><code>6dc7ba9</code></a> Downgrade dependencies and disable tests to compile under Rust 1.48.</li>
<li><a href="ded8986e7e"><code>ded8986</code></a> Disable MIPS in CI. (<a href="https://redirect.github.com/bytecodealliance/rustix/issues/793">#793</a>)</li>
<li><a href="739f9c3ba0"><code>739f9c3</code></a> Fixes for <code>Dir</code> on macOS, FreeBSD, and WASI.</li>
<li><a href="87481a97f4"><code>87481a9</code></a> Merge pull request from GHSA-c827-hfw6-qwvm</li>
<li>See full diff in <a href="https://github.com/bytecodealliance/rustix/compare/v0.36.15...v0.36.16">compare view</a></li>
</ul>
</details>
<br />
[](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting ``@dependabot` rebase`.
[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)
---
<details>
<summary>Dependabot commands and options</summary>
<br />
You can trigger Dependabot actions by commenting on this PR:
- ``@dependabot` rebase` will rebase this PR
- ``@dependabot` recreate` will recreate this PR, overwriting any edits that have been made to it
- ``@dependabot` merge` will merge this PR after your CI passes on it
- ``@dependabot` squash and merge` will squash and merge this PR after your CI passes on it
- ``@dependabot` cancel merge` will cancel a previously requested merge and block automerging
- ``@dependabot` reopen` will reopen this PR if it is closed
- ``@dependabot` close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- ``@dependabot` show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency
- ``@dependabot` ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- ``@dependabot` ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- ``@dependabot` ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
You can disable automated security fix PRs for this repo from the [Security Alerts page](https://github.com/meilisearch/meilisearch/network/alerts).
</details>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
4073: Simplify Puffin report exports r=ManyTheFish a=Kerollmops
This PR changes how we export Puffin reports by directly writing them to disk when the `exportPuffinReports` [experimental feature is enabled](https://www.meilisearch.com/docs/learn/experimental/overview) on the `/experimental-features` route. It also adds more puffing logging to the deletion phase and grenad helpers. The puffin reports are identified by the date and time at which they are exported.
## Todo List
- [x] Change the CLI flag to be an API experimental option.
- [x] Create [a PRD for this experimental feature (private)](https://www.notion.so/meilisearch/Export-Puffin-Reports-091df151e71c4edfb7d72f4bf995b3ea).
- [x] Create and complete [a product discussion](https://github.com/meilisearch/product/discussions/693) (copy/paste PROFILING markdown?).
- [x] Update the _PROFILING.md_ markdown file instructions.
- [x] Change the debug logs of the processing operation (visible in puffin viewer).
Co-authored-by: Clément Renault <clement@meilisearch.com>
Co-authored-by: Kerollmops <clement@meilisearch.com>
4101: Bump webpki from 0.22.1 to 0.22.2 r=curquiza a=dependabot[bot]
Bumps [webpki](https://github.com/briansmith/webpki) from 0.22.1 to 0.22.2.
<details>
<summary>Commits</summary>
<ul>
<li>See full diff in <a href="https://github.com/briansmith/webpki/commits">compare view</a></li>
</ul>
</details>
<br />
[](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting ``@dependabot` rebase`.
[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)
---
<details>
<summary>Dependabot commands and options</summary>
<br />
You can trigger Dependabot actions by commenting on this PR:
- ``@dependabot` rebase` will rebase this PR
- ``@dependabot` recreate` will recreate this PR, overwriting any edits that have been made to it
- ``@dependabot` merge` will merge this PR after your CI passes on it
- ``@dependabot` squash and merge` will squash and merge this PR after your CI passes on it
- ``@dependabot` cancel merge` will cancel a previously requested merge and block automerging
- ``@dependabot` reopen` will reopen this PR if it is closed
- ``@dependabot` close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- ``@dependabot` show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency
- ``@dependabot` ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- ``@dependabot` ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- ``@dependabot` ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
You can disable automated security fix PRs for this repo from the [Security Alerts page](https://github.com/meilisearch/meilisearch/network/alerts).
</details>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
4108: Fix bug where search with distinct attribute and no ranking, returns offset+limit hits r=curquiza a=vivek-26
# Pull Request
## Related issue
Fixes#4078
## What does this PR do?
This PR -
- Fixes bug where search with distinct attribute and no ranking, returns offset+limit hits.
- Adds unit and integration tests.
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: Vivek Kumar <vivek.26@outlook.com>
4089: Use a bufreader and bufwriter everytime there is a grenad<file> r=curquiza a=irevoire
# Pull Request
Wrap all the files we give to a grenad in a `BufReader` or `BufWriter`.
The dump import I tried in the issue went from 2h to 10 minutes on my machine.
I also ran a bunch of benchmarks on my machine, and we're faster by a few seconds everywhere but nothing huge.
-----
The one thing I’m afraid about is if we used to get the inner file in a grenad and then do a read right after without a seek at the beginning of the file or a reopen.
Since we now use a bufreader our read would return the bytes one buffer later and probably completely corrupt what we were supposed to read.
From what I see, it looks like it works, but I may have missed something, I don't know much about this part of the codebase.
This issue should not arise on the bufwriter, though, because if we're not able to write the content of the buffer I ensured that the `into_inner` of the bufwriter should return an internal error.
## Related issue
Fixes#4087
Co-authored-by: Tamo <tamo@meilisearch.com>
4112: Update version for the next release (v1.4.1) in Cargo.toml r=curquiza a=meili-bot
⚠️ This PR is automatically generated. Check the new version is the expected one and Cargo.lock has been updated before merging.
Co-authored-by: curquiza <curquiza@users.noreply.github.com>
4102: Introduce the first bot that shows benchmarks results r=curquiza a=Kerollmops
TBD
Co-authored-by: Kerollmops <clement@meilisearch.com>
Co-authored-by: Clément Renault <clement@meilisearch.com>
4074: Enable analytics in debug builds r=Kerollmops a=irevoire
# Pull Request
## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/4072
## What does this PR do?
- Stop disabling the analytics if meilisearch has been compiled in debug mode
Co-authored-by: Tamo <tamo@meilisearch.com>
4065: Dependency issue every 6 months r=curquiza a=curquiza
To avoid spending too much time on it (1 every two sprints)
If you disagree `@Kerollmops,` for security or any reason, please close the PR
Co-authored-by: Clémentine U. - curqui <clementine@meilisearch.com>
4044: Add more integrations to SDK CI r=curquiza a=curquiza
For the integration scope management, but also to anticipate bugs and breaking changes for engine team, we need to add more SDKs tests into the CI
Co-authored-by: curquiza <clementine@meilisearch.com>
Display docker image
Add strapi and firebase
Add rails and symfony tests
Remove strapi and firestore tests
Fix dotnet SDK CI
Use specific dart SDK version
Disable coverage for ruby SDK
Prevent pushing coverage information to codecov
Remove codecoverage token
Trigger Build
Trigger Build
Trigger Build
Trigger Build
Trigger Build
4056: Rewrite segment_analytics module with the destructuring syntax r=Kerollmops a=vivek-26
# Pull Request
## Related issue
Fixes#3928
## What does this PR do?
- This PR uses Rust Destructuring syntax in the `segment_analytics` module, such that adding or deleting fields causes an error at compile time.
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: Vivek Kumar <vivek.26@outlook.com>
4053: Fix the stats of the documents deletion by filter r=Kerollmops a=irevoire
# Pull Request
The issue was that the operation « DocumentDeletionByFilter » was not declared as an index operation. That means the index stats were not reprocessed after the application of the operation.
## Related issue
Fixes#4018
## What does this PR do?
- Move the `DocumentDeletionByFilter` internal operation into the category of the `IndexOperation`. This means that the stats will automatically be re-processed after a batch is processed.
- Update a test to ensure that the stats are valid after each operation
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: Tamo <tamo@meilisearch.com>
4051: Implement the snapshots on demand r=Kerollmops a=irevoire
# Pull Request
Private link: [PRD available here](https://www.notion.so/meilisearch/On-demand-snapshots-5676e542b905459d96eec228da133b00#847ff0cafeb64fe09e8ee7150852b474)
Specification here: https://github.com/meilisearch/specifications/pull/258
## Prototype
A prototype is available under the name: `prototype-snapshot-on-demand-0`.
## Related issue
Fixes#4052
## What does this PR do?
- Introduce a new route, `POST /snapshots` to create snapshots on demand
- Introduce a new api-key action `snapshot.create`
- Introduce a new analytic `Snapshot Created` sent every time a snapshot is created.
## Notes for the team
I made a prototype so users can test the feature before the v1.5 comes out. But we can merge the PR as-is.
Co-authored-by: Tamo <tamo@meilisearch.com>
3997: Refactor empty arrays/objects should return empty instead of null r=Kerollmops a=dogukanakkaya
# Pull Request
## What does this PR do?
At the moment if we select empty objects and array of object properties with dot notations like:
```json
{
"array": [],
"object": {}
}
```
```rs
GetDocumentOptions { fields: Some(vec!["array.name", "object.name"]) }
```
returns null if the array/object has no property yet.
I am not sure if this is expected or it's the correct behaviour but I add my document with a property that is assigned to an empty array/object, later on when I select it, returns null which is kinda weird and unexpected in my opinion.
This PR fixes that issue by returning an empty vector if the array is empty or an empty map if object is empty. This is not added for `permissive-json-pointer/src/lib.rs:224` because `create_array` loops over each item. Selecting a single property that is an object, in an array of objects would result other objects to be empty maps instead of none.
```json
"doggos": [
{
"jean": {
"race": {
"name": "bernese mountain",
}
}
},
{
"marc": {
"age": 4,
"race": {
"name": "golden retriever",
}
}
}
]
```
```rs
GetDocumentOptions { fields: Some(vec!["doggos.jean"]) }
```
Would result in `jean` object and an extra empty object for `marc`.
## PR checklist
Please check if your PR fulfills the following requirements:
- [ ] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: dogukanakkaya <doguakkaya27@hotmail.com>
4009: Bump rustls-webpki from 0.100.1 to 0.100.2 r=Kerollmops a=dependabot[bot]
Bumps [rustls-webpki](https://github.com/rustls/webpki) from 0.100.1 to 0.100.2.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a href="https://github.com/rustls/webpki/releases">rustls-webpki's releases</a>.</em></p>
<blockquote>
<h2>v/0.100.2</h2>
<h2>Release notes</h2>
<ul>
<li>certificate path building and verification is now capped at 100 signature validation operations to avoid the risk of CPU usage denial-of-service attack when validating crafted certificate chains producing quadratic runtime. This risk affected both clients, as well as servers that verified client certificates.</li>
</ul>
<h2>What's Changed</h2>
<ul>
<li>v0.100.2 prep by <a href="https://github.com/cpu"><code>`@cpu</code></a>` in <a href="https://redirect.github.com/rustls/webpki/pull/154">rustls/webpki#154</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a href="https://github.com/rustls/webpki/compare/v/0.100.1...v/0.100.2">https://github.com/rustls/webpki/compare/v/0.100.1...v/0.100.2</a></p>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a href="c8b821450b"><code>c8b8214</code></a> Bump MSRV to 1.60</li>
<li><a href="855752292e"><code>8557522</code></a> Avoid testing MSRV of dev-dependencies</li>
<li><a href="73a7f0c7d7"><code>73a7f0c</code></a> Cargo: version 0.100.1 -> 0.100.2</li>
<li><a href="4ea052366f"><code>4ea0523</code></a> verify_cert: enforce maximum number of signatures.</li>
<li>See full diff in <a href="https://github.com/rustls/webpki/compare/v/0.100.1...v/0.100.2">compare view</a></li>
</ul>
</details>
<br />
[](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting ``@dependabot` rebase`.
[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)
---
<details>
<summary>Dependabot commands and options</summary>
<br />
You can trigger Dependabot actions by commenting on this PR:
- ``@dependabot` rebase` will rebase this PR
- ``@dependabot` recreate` will recreate this PR, overwriting any edits that have been made to it
- ``@dependabot` merge` will merge this PR after your CI passes on it
- ``@dependabot` squash and merge` will squash and merge this PR after your CI passes on it
- ``@dependabot` cancel merge` will cancel a previously requested merge and block automerging
- ``@dependabot` reopen` will reopen this PR if it is closed
- ``@dependabot` close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- ``@dependabot` show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency
- ``@dependabot` ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- ``@dependabot` ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- ``@dependabot` ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
You can disable automated security fix PRs for this repo from the [Security Alerts page](https://github.com/meilisearch/meilisearch/network/alerts).
</details>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
The issue was that the operation « DocumentDeletionByFilter » was not
declared as an index operation. That means the indexes stats were not
reprocessed after the application of the operation.
4050: Bump webpki from 0.22.0 to 0.22.1 r=Kerollmops a=dependabot[bot]
Bumps [webpki](https://github.com/briansmith/webpki) from 0.22.0 to 0.22.1.
<details>
<summary>Commits</summary>
<ul>
<li>See full diff in <a href="https://github.com/briansmith/webpki/commits">compare view</a></li>
</ul>
</details>
<br />
[](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting ``@dependabot` rebase`.
[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)
---
<details>
<summary>Dependabot commands and options</summary>
<br />
You can trigger Dependabot actions by commenting on this PR:
- ``@dependabot` rebase` will rebase this PR
- ``@dependabot` recreate` will recreate this PR, overwriting any edits that have been made to it
- ``@dependabot` merge` will merge this PR after your CI passes on it
- ``@dependabot` squash and merge` will squash and merge this PR after your CI passes on it
- ``@dependabot` cancel merge` will cancel a previously requested merge and block automerging
- ``@dependabot` reopen` will reopen this PR if it is closed
- ``@dependabot` close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- ``@dependabot` show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency
- ``@dependabot` ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- ``@dependabot` ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- ``@dependabot` ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
You can disable automated security fix PRs for this repo from the [Security Alerts page](https://github.com/meilisearch/meilisearch/network/alerts).
</details>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* [Update] test-suite.yml
Added New run command for cargo tree without default features using if-then block
* [Updated] test-disabled-tokenization in test-suite.yml
* [Updated] test-suite.yml
* Update .github/workflows/test-suite.yml
---------
Co-authored-by: Clémentine U. - curqui <clementine@meilisearch.com>
4028: Fix highlighting bug when searching for a phrase with cropping r=ManyTheFish a=vivek-26
# Pull Request
## Related issue
Fixes#3975
## What does this PR do?
This PR -
- Fixes the bug where searching **only** for a phrase (containing multiple words) along with cropping, highlighted only the first word of the phrase.
- Adds unit test case for the above mentioned scenario.
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: Vivek Kumar <vivek.26@outlook.com>
4039: Fix multiple vectors dimensions r=ManyTheFish a=Kerollmops
This PR fixes#4035, making providing multiple vectors in documents possible. This is fixed by extracting the vectors from the non-flattened version of the documents.
Co-authored-by: Kerollmops <clement@meilisearch.com>
4038: Fix filter escaping issues r=ManyTheFish a=Kerollmops
This PR fixes#4034 by always escaping the sequences. Users must always put quotes (simple or double) to escape the filter values.
Co-authored-by: Kerollmops <clement@meilisearch.com>
4037: Update version for the next release (v1.3.3) in Cargo.toml r=curquiza a=meili-bot
⚠️ This PR is automatically generated. Check the new version is the expected one and Cargo.lock has been updated before merging.
Co-authored-by: curquiza <curquiza@users.noreply.github.com>
4025: Bump Swatinem/rust-cache from 2.5.1 to 2.6.2 r=curquiza a=dependabot[bot]
Bumps [Swatinem/rust-cache](https://github.com/swatinem/rust-cache) from 2.5.1 to 2.6.2.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a href="https://github.com/swatinem/rust-cache/releases">Swatinem/rust-cache's releases</a>.</em></p>
<blockquote>
<h2>v2.6.2</h2>
<h2>What's Changed</h2>
<ul>
<li>dep: Use <code>smol-toml</code> instead of <code>toml</code> by <a href="https://github.com/NobodyXu"><code>`@NobodyXu</code></a>` in <a href="https://redirect.github.com/Swatinem/rust-cache/pull/164">Swatinem/rust-cache#164</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a href="https://github.com/Swatinem/rust-cache/compare/v2...v2.6.2">https://github.com/Swatinem/rust-cache/compare/v2...v2.6.2</a></p>
<h2>v2.6.1</h2>
<ul>
<li>Fix hash contributions of <code>Cargo.lock</code>/<code>Cargo.toml</code> files.</li>
</ul>
<h2>v2.6.0</h2>
<h2>What's Changed</h2>
<ul>
<li>Add "buildjet" as a second <code>cache-provider</code> backend <a href="https://github.com/joroshiba"><code>`@joroshiba</code></a>` in <a href="https://redirect.github.com/Swatinem/rust-cache/pull/154">Swatinem/rust-cache#154</a></li>
<li>Clean up sparse registry index.</li>
<li>Do not clean up src of <code>-sys</code> crates.</li>
<li>Remove <code>.cargo/credentials.toml</code> before saving.</li>
</ul>
<h2>New Contributors</h2>
<ul>
<li><a href="https://github.com/joroshiba"><code>`@joroshiba</code></a>` made their first contribution in <a href="https://redirect.github.com/Swatinem/rust-cache/pull/154">Swatinem/rust-cache#154</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a href="https://github.com/Swatinem/rust-cache/compare/v2.5.1...v2.6.0">https://github.com/Swatinem/rust-cache/compare/v2.5.1...v2.6.0</a></p>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a href="https://github.com/Swatinem/rust-cache/blob/master/CHANGELOG.md">Swatinem/rust-cache's changelog</a>.</em></p>
<blockquote>
<h2>2.6.2</h2>
<ul>
<li>Fix <code>toml</code> parsing.</li>
</ul>
<h2>2.6.1</h2>
<ul>
<li>Fix hash contributions of <code>Cargo.lock</code>/<code>Cargo.toml</code> files.</li>
</ul>
<h2>2.6.0</h2>
<ul>
<li>Add "buildjet" as a second <code>cache-provider</code> backend.</li>
<li>Clean up sparse registry index.</li>
<li>Do not clean up src of <code>-sys</code> crates.</li>
<li>Remove <code>.cargo/credentials.toml</code> before saving.</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a href="e207df5d26"><code>e207df5</code></a> 2.6.2</li>
<li><a href="decb69d790"><code>decb69d</code></a> Update dependencies and add changelog</li>
<li><a href="ab6b2769d1"><code>ab6b276</code></a> dep: Use <code>smol-toml</code> instead of <code>toml</code> (<a href="https://redirect.github.com/swatinem/rust-cache/issues/164">#164</a>)</li>
<li><a href="578b235f6e"><code>578b235</code></a> 2.6.1</li>
<li><a href="5113490c3f"><code>5113490</code></a> prepare 2.6.1</li>
<li><a href="c0e052c18c"><code>c0e052c</code></a> Fix hashing of parsed <code>Cargo.toml</code> (<a href="https://redirect.github.com/swatinem/rust-cache/issues/160">#160</a>)</li>
<li><a href="4e0f4b19dd"><code>4e0f4b1</code></a> Fix typo in hashing parsed <code>Cargo.lock</code> (<a href="https://redirect.github.com/swatinem/rust-cache/issues/159">#159</a>)</li>
<li><a href="b919e1427f"><code>b919e14</code></a> feat: Add logging to <code>Cargo.lock</code>/<code>Cargo.toml</code> hashing (<a href="https://redirect.github.com/swatinem/rust-cache/issues/156">#156</a>)</li>
<li><a href="b8a6852b4f"><code>b8a6852</code></a> 2.6.0</li>
<li><a href="80c47cc945"><code>80c47cc</code></a> Clean up <code>credentials.toml</code></li>
<li>Additional commits viewable in <a href="https://github.com/swatinem/rust-cache/compare/v2.5.1...v2.6.2">compare view</a></li>
</ul>
</details>
<br />
[](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
You can trigger a rebase of this PR by commenting ``@dependabot` rebase`.
[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)
---
<details>
<summary>Dependabot commands and options</summary>
<br />
You can trigger Dependabot actions by commenting on this PR:
- ``@dependabot` rebase` will rebase this PR
- ``@dependabot` recreate` will recreate this PR, overwriting any edits that have been made to it
- ``@dependabot` merge` will merge this PR after your CI passes on it
- ``@dependabot` squash and merge` will squash and merge this PR after your CI passes on it
- ``@dependabot` cancel merge` will cancel a previously requested merge and block automerging
- ``@dependabot` reopen` will reopen this PR if it is closed
- ``@dependabot` close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- ``@dependabot` show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency
- ``@dependabot` ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- ``@dependabot` ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- ``@dependabot` ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
</details>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Search engine technologies are complex pieces of software that require thorough profiling tools. We chose to use [Puffin](https://github.com/EmbarkStudios/puffin), which the Rust gaming industry uses extensively. You can export and import the profiling reports using the top bar's _File_ menu options.
Search engine technologies are complex pieces of software that require thorough profiling tools. We chose to use [Puffin](https://github.com/EmbarkStudios/puffin), which the Rust gaming industry uses extensively. You can export and import the profiling reports using the top bar's _File_ menu options [in Puffin Viewer](https://github.com/embarkstudios/puffin#ui).

## Profiling the Indexing Process
When you enable the `profile-with-puffin` feature of Meilisearch, a Puffin HTTP server will run on Meilisearch and listen on the default _0.0.0.0:8585_ address. This server will record a "frame" whenever it executes the `IndexScheduler::tick` method.
When you enable [the `exportPuffinReports` experimental feature](https://www.meilisearch.com/docs/learn/experimental/overview) of Meilisearch, Puffin reports with the `.puffin` extension will be automatically exported to disk. When this option is enabled, the engine will automatically create a "frame" whenever it executes the `IndexScheduler::tick` method.
Once your Meilisearch is running and awaits new indexation operations, you must [install and run the `puffin_viewer` tool](https://github.com/EmbarkStudios/puffin/tree/main/puffin_viewer) to see the profiling results. I advise you to run the viewer with the `RUST_LOG=puffin_http::client=debug` environment variable to see the client trying to connect to your server.
[Puffin Viewer](https://github.com/EmbarkStudios/puffin/tree/main/puffin_viewer) is used to analyze the reports. Those reports show areas where Meilisearch spent time during indexing.
Another piece of advice on the Puffin viewer UI interface is to consider the _Merge children with same ID_ option. It can hide the exact actual timings at which events were sent. Please turn it off when you see strange gaps on the Flamegraph. It can help.
#[error("Query parameters to filter the tasks to cancel are missing. Available query parameters are: `uids`, `indexUids`, `statuses`, `types`, `canceledBy`, `beforeEnqueuedAt`, `afterEnqueuedAt`, `beforeStartedAt`, `afterStartedAt`, `beforeFinishedAt`, `afterFinishedAt`.")]
// for some reason the db sizes differ. this may be due to the compaction options we have
// set when performing the snapshot
//server.stats(),
// The original instance contains the snapshotCreation task, while the snapshotted-instance does not. For this reason we need to compare the task queue **after** the task 4
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.