Compare commits

..

167 Commits

Author SHA1 Message Date
curquiza
72b4b36516 Remove debug lines 2025-12-04 14:13:46 +01:00
curquiza
d795ade246 Add code samples to generated openAPI file 2025-12-04 14:13:46 +01:00
Louis Dureuil
82adabc5a0 Merge pull request #5861 from meilisearch/upgrade-tests
Declarative tests
2025-12-04 11:00:53 +00:00
Louis Dureuil
c9a22247d2 add hannoy test 2025-12-04 11:41:41 +01:00
Louis Dureuil
c535b8ddef Use variables to account for changes between local and CI 2025-12-04 09:47:37 +01:00
Louis Dureuil
8e89619aed Also evaluate variables in expected responses 2025-12-04 09:47:21 +01:00
Clément Renault
f617ca8e38 Merge pull request #6023 from meilisearch/curquiza-patch-1
Send notifications for Kubernetes integration when releasing
2025-12-04 07:00:50 +00:00
Louis Dureuil
959175ad2a switch to gh runner 2025-12-03 22:59:57 +01:00
Louis Dureuil
341ffbf5ef Modify bot message on db-change labeled PRs 2025-12-03 21:25:41 +01:00
Louis Dureuil
542f3073f4 Appease codeql 2025-12-03 21:25:41 +01:00
Louis Dureuil
0f134b079f hf-embed workload: add ranking scores 2025-12-03 21:25:41 +01:00
Louis Dureuil
9e7ae47355 Add missing sha 2025-12-03 21:25:41 +01:00
Louis Dureuil
1edf07df29 Add tests to CI 2025-12-03 21:25:40 +01:00
Louis Dureuil
88aa3cddde Support local builds of enterprise binaries 2025-12-03 21:25:40 +01:00
Louis Dureuil
e6846cb55a Rename and move the test instructions 2025-12-03 21:25:40 +01:00
Louis Dureuil
29b715e2f9 Update workloads 2025-12-03 21:25:40 +01:00
Louis Dureuil
f28dc5bd2b cleaning 2025-12-03 21:25:40 +01:00
Louis Dureuil
56d0b8ea54 Some cleaning 2025-12-03 21:25:40 +01:00
Louis Dureuil
514edb1b79 Add workloads 2025-12-03 21:25:40 +01:00
Louis Dureuil
cfb609d41d clippy 2025-12-03 21:25:40 +01:00
Louis Dureuil
11cb062067 fmt 2025-12-03 21:25:40 +01:00
Louis Dureuil
2ca4926ac5 Support editions, move to common 2025-12-03 21:25:40 +01:00
Louis Dureuil
834bd9b879 Fix uninitialization issue on unsupported platforms 2025-12-03 21:25:39 +01:00
Louis Dureuil
cac7e00983 Remove chrono 2025-12-03 21:25:39 +01:00
Mubelotix
e9300bac64 Add documentation 2025-12-03 21:25:39 +01:00
Mubelotix
b0da7864a4 Api key tests 2025-12-03 21:25:39 +01:00
Mubelotix
2b9d379feb Add variable registration mechanism 2025-12-03 21:25:39 +01:00
Mubelotix
8d585a04d4 Update movies workload 2025-12-03 21:25:39 +01:00
Mubelotix
0095a72fba Test for upgrade 2025-12-03 21:25:39 +01:00
Mubelotix
651339648c Fix processing time ms 2025-12-03 21:25:39 +01:00
Mubelotix
a489f4c172 Update issue template 2025-12-03 21:25:39 +01:00
Mubelotix
3b875ea00e Update movies 2025-12-03 21:25:39 +01:00
Mubelotix
9d269c499c Fix line feed at the end of files 2025-12-03 21:25:39 +01:00
Mubelotix
da35ae0a6e Update emojis 2025-12-03 21:25:38 +01:00
Mubelotix
61945b235d Add redaction system 2025-12-03 21:25:38 +01:00
Mubelotix
e936ac172d Fix compilation 2025-12-03 21:25:38 +01:00
Mubelotix
162a84cdbf Improve error detection 2025-12-03 21:25:38 +01:00
Mubelotix
92c63cf351 Improve diffing 2025-12-03 21:25:38 +01:00
Mubelotix
fca35b7476 Add upgrade system 2025-12-03 21:25:38 +01:00
Mubelotix
4056657a55 Refactor around meili_path 2025-12-03 21:25:38 +01:00
Mubelotix
685d227597 Move file to common 2025-12-03 21:25:38 +01:00
Mubelotix
49b9f6ff38 Remove useless data 2025-12-03 21:25:38 +01:00
Mubelotix
79d0a3fb97 Remove useless parameter 2025-12-03 21:25:38 +01:00
Mubelotix
313ef7e79b Add response updating logic 2025-12-03 21:25:37 +01:00
Mubelotix
256407be61 Fix asset version issues 2025-12-03 21:25:37 +01:00
Mubelotix
8b3943bd32 Do so that meilisearch versions get downloaded 2025-12-03 21:25:37 +01:00
Mubelotix
87b972d29a Implement test workload running logic 2025-12-03 21:25:37 +01:00
Mubelotix
09ab61b360 Continue integrating commands to tests 2025-12-03 21:25:37 +01:00
Mubelotix
2459f381b4 Remove dead code 2025-12-03 21:25:37 +01:00
Mubelotix
6442f02de4 Make commands common 2025-12-03 21:25:37 +01:00
Mubelotix
91c4d9ea79 Tag workloads 2025-12-03 21:25:37 +01:00
Mubelotix
92a4091da3 Create test workload 2025-12-03 21:25:37 +01:00
Mubelotix
29a337f0f9 Create the test function 2025-12-03 21:25:36 +01:00
Mubelotix
8c3cebadaa Create the test xtask command and args 2025-12-03 21:25:36 +01:00
Clément Renault
b566458aa2 Merge pull request #6027 from meilisearch/release-v1.28.2
Bring back changes from v1.28.2
2025-12-03 17:46:44 +00:00
Clément Renault
ae4344e359 Merge pull request #6004 from meilisearch/default-experimental-vector-store
Make Hannoy the default vector store
2025-12-03 17:16:46 +00:00
Kerollmops
b6cb384650 Fix settings tests 2025-12-03 17:52:52 +01:00
Clément Renault
2c3e3d856c Make hannoy the default vector store when creating an index 2025-12-03 17:52:52 +01:00
Clémentine
93e97f814c Add notifications for Kubernetes integration
Updated comments and conditions for notifying integration teams.
2025-12-03 17:49:46 +01:00
Kerollmops
e9350f033d Limit the number of retrieved task to one 2025-12-03 17:43:48 +01:00
Kerollmops
54c92fd6c0 Update the snapshots 2025-12-03 17:43:48 +01:00
Kerollmops
4f4df83a51 Bump the version to v1.28.2 2025-12-03 17:43:48 +01:00
Clément Renault
a51021cab7 Merge pull request #6026 from meilisearch/free-space
Fix the CI issues
2025-12-03 16:18:41 +00:00
Louis Dureuil
e33f4fdeae Attempt to eschew containers for ubuntu 2025-12-03 16:28:19 +01:00
Louis Dureuil
e407bca196 use feature as cache key 2025-12-03 16:24:48 +01:00
Louis Dureuil
cd24ea11b4 correctly clean space + remove test in debug 2025-12-03 16:12:08 +01:00
Louis Dureuil
ba578e7ab5 Fix ollama test following update on their side 2025-12-03 15:48:30 +01:00
Louis Dureuil
05a74d1e68 remove non-existing rust-toolchain action arguments 2025-12-03 15:37:51 +01:00
Louis Dureuil
41d61deb97 Make runners/containers more uniform 2025-12-03 15:34:57 +01:00
Louis Dureuil
bba292b01a Run ollama test on 22.04 2025-12-03 15:21:02 +01:00
Louis Dureuil
96923dff33 adjust test suite 2025-12-03 15:01:58 +01:00
Louis Dureuil
8f9c9305da set back the cache 2025-12-03 14:10:18 +01:00
Louis Dureuil
a9f309e1d1 Remove macos and windows from PRs 2025-12-03 13:54:02 +01:00
Louis Dureuil
e456a9acd8 Add the disk freeing to all ubuntu-22.04 jobs 2025-12-03 11:51:42 +01:00
Louis Dureuil
9b7d29466c Attempt to earn some free space... 2025-12-03 11:41:00 +01:00
Clément Renault
b0ef14b6f0 Merge pull request #5983 from meilisearch/new-searchable-settings-indexer
Support the searchable and exact attributes in the new Settings Indexer
2025-12-02 11:03:36 +00:00
Clément Renault
36febe2068 Merge pull request #6021 from meilisearch/skip-macos-windows-in-merge-queue
Skip the macOS and Windows CI in the merge queue
2025-12-02 08:29:06 +00:00
Kerollmops
6f14a6ec18 Skip the macOS and Windows CI in the merge queue 2025-12-01 16:59:55 +01:00
Clément Renault
1a45b19e7e Merge pull request #6020 from meilisearch/fix-release-ci-enterprise
Fix release CI after enterprise merge
2025-12-01 15:12:00 +00:00
Kerollmops
bd7525b166 Update the snapshots 2025-12-01 15:26:00 +01:00
Kerollmops
359757d939 Bump patch version 2025-12-01 15:25:56 +01:00
Paul de Nonancourt
1c6eea596c fix: Only trigger Cloud CI for enterprise edition 2025-12-01 15:08:23 +01:00
Paul de Nonancourt
693b6f483e fix: Update binary path for target x86_64 meilisearch release 2025-12-01 15:07:55 +01:00
Many the fish
818a4aa6d9 Merge pull request #6016 from EclipseAditya/fix-sort-on-empty-attribute-5998
Fix sort on /documents endpoint when field has no values
2025-12-01 13:50:05 +00:00
Clément Renault
ddadeb99e9 Merge pull request #6019 from meilisearch/bump-version
Bump version to v1.28
2025-12-01 10:26:51 +00:00
Kerollmops
b8d8be934a Update snapshots 2025-12-01 10:52:57 +01:00
Kerollmops
7175d70b8f List the version in the upgrades 2025-12-01 10:29:33 +01:00
Kerollmops
8a3e65ab6f Bump version to v1.28 2025-12-01 10:23:42 +01:00
EclipseAditya
4737e1a2a5 Fix rustfmt formatting issues 2025-11-30 06:02:02 +00:00
EclipseAditya
36522e951b Fix sort on /documents endpoint when field has no values 2025-11-28 15:22:57 +00:00
Kerollmops
fce046d84d Fix non-detected searchable attribute 2025-11-28 11:29:31 +01:00
Kerollmops
3fc507bb44 Introduce a test for when a new nested field becomes searchable 2025-11-28 11:29:31 +01:00
Kerollmops
fdbcd033fb Clean up the CI 2025-11-28 11:29:31 +01:00
Clément Renault
aaab49baca Fix a bug and improve code quality
Co-authored-by: Many the fish <many@meilisearch.com>
2025-11-28 11:29:31 +01:00
Kerollmops
0d0d6e8099 Update the proximity precision for the settings delta 2025-11-28 11:29:31 +01:00
Clément Renault
c1e351c92b Show available space 2025-11-28 11:29:31 +01:00
Clément Renault
67cab4cc9d Trigger the new settings indexer when changing the proximity precision 2025-11-28 11:29:31 +01:00
Clément Renault
f30a37b0fe Clear old word prefix fid docids entries when removing searchable fields 2025-11-28 11:29:31 +01:00
Clément Renault
a78a9f80dd Introduce the word pair proximity extractor 2025-11-28 11:29:31 +01:00
Clément Renault
439fee5434 Move the has_searchable_children function to the appropriate module 2025-11-28 11:29:31 +01:00
Clément Renault
9e858590e0 Rename the function to extract document words when a setting changes
Co-authored-By: Maxime Legendre <maxime@meilisearch.com>
2025-11-28 11:29:31 +01:00
Clément Renault
29eebd5f93 Merge the logic of the function detecting searchable children fields 2025-11-28 11:29:31 +01:00
Clément Renault
07da6edbdf Fix a bug when nested fields appear
Co-authored-by: Many the fish <many@meilisearch.com>
2025-11-28 11:29:31 +01:00
Clément Renault
22b83042e6 Add some comments
Co-authored-by: Many the fish <many@meilisearch.com>
2025-11-28 11:29:31 +01:00
Clément Renault
52ab13906a Fix a test trying to change settings with a wtxn 2025-11-28 11:29:31 +01:00
Clément Renault
29bec8efd4 Make sure the embedders supports changing searchables 2025-11-28 11:29:31 +01:00
Clément Renault
6947a8990b Make sure we don't crash on unreferenced fields 2025-11-28 11:29:31 +01:00
Clément Renault
fbb2bb0c73 Make clippy happy 2025-11-28 11:29:31 +01:00
Clément Renault
15918f53a9 Introduce new progress steps when deleting fid-based entries 2025-11-28 11:29:30 +01:00
Clément Renault
d7f5f3a0a3 Delete entries from fid-based databases when searchables are deleted 2025-11-28 11:29:30 +01:00
Clément Renault
1afbf35f27 Support exact attributes in the settings delta 2025-11-28 11:29:30 +01:00
Clément Renault
d7675233d5 Call the post processing in the new settings indexer 2025-11-28 11:29:30 +01:00
Clément Renault
c63c1ac32b Support exact attributes in the field metadata 2025-11-28 11:29:30 +01:00
Clément Renault
6171dcde0d Call the new searchable extractor 2025-11-28 11:29:30 +01:00
Clément Renault
04bc134324 Introduce the new searchable extractor 2025-11-28 11:29:30 +01:00
Clément Renault
8ff39d927d Enable the new settings indexer when the searchable or exact are updates 2025-11-28 11:29:30 +01:00
Clément Renault
ffd461c800 Merge pull request #6011 from meilisearch/enterprise-feature
Add support for conditional compilation of the EE
2025-11-27 20:43:09 +00:00
Clément Renault
9134d27980 Merge pull request #6013 from meilisearch/fix-sdk-tests
Fix SDK test to use EE
2025-11-27 19:24:37 +00:00
curquiza
f60242979f Fix SDK test to use EE 2025-11-27 17:51:27 +01:00
Clément Renault
d347417cfd Merge pull request #5956 from meilisearch/progress-trace-in-metrics
Expose batch progress traces on the metrics route
2025-11-27 16:05:13 +00:00
Paul de Nonancourt
55d54afd69 Build different community and enterprise Docker images in CI 2025-11-27 14:31:08 +01:00
Kerollmops
dca7679c47 Change the binary name format to suffix meilisearch with enterprise 2025-11-27 13:56:29 +01:00
Kerollmops
a34b692396 Remove Cross compilation file 2025-11-27 13:53:23 +01:00
Kerollmops
63829b62e9 Cleanup useless references to jemalloc 2025-11-27 13:53:23 +01:00
Kerollmops
44c8252ad5 Merge the publish binaries job 2025-11-27 13:53:23 +01:00
Kerollmops
19ae428890 Introduce a matrix for the tests CIs 2025-11-27 13:53:13 +01:00
Many the fish
7adcb657ae Merge pull request #6007 from meilisearch/update-charabia-v0.9.9
Update charabia v0.9.9
2025-11-27 12:37:55 +00:00
Louis Dureuil
9624768976 Add support for conditional compilation of the EE 2025-11-27 10:53:46 +01:00
Clément Renault
5025acfd2a Merge pull request #6012 from meilisearch/update-test-job-name
Remove version from the name of the test job in CI
2025-11-27 08:29:32 +00:00
Paul de Nonancourt
4bbfdccc3e Remove version from the label of the test 2025-11-26 16:25:15 +01:00
Many the fish
a5b24b54b8 Merge pull request #6002 from meilisearch/update-dependencies
Upgrade most of the dependencies
2025-11-26 13:26:27 +00:00
Clément Renault
461e69c143 Merge pull request #6003 from meilisearch/build-arm-images-on-arm-runner
Build x86 and ARM images on Github-hosted runners
2025-11-26 11:53:47 +00:00
Clément Renault
915aeafefe Update the workflow name 2025-11-26 11:33:23 +01:00
Louis Dureuil
408529d8b2 compile gemm-16 optimized for ARM compatibility
Co-Authored-By: Paul de Nonancourt <paul@meilisearch.com>
2025-11-26 10:49:10 +01:00
Paul de Nonancourt
1724ab6d94 Run tests on both arm64 and x86 Github-hosted runners 2025-11-26 10:49:10 +01:00
Paul de Nonancourt
49a500a342 Fix cosign digest signature 2025-11-26 10:49:10 +01:00
Paul de Nonancourt
f26eabcfa1 Merge manifests into multi-architecture Docker image 2025-11-26 10:49:10 +01:00
Paul de Nonancourt
b468c090f3 Build ARM64 and AMD64 images on Github-hosted runners 2025-11-26 10:49:10 +01:00
Clément Renault
c14114840e Remove container 2025-11-26 10:45:12 +01:00
ManyTheFish
7933d1f9ea Update charabia v0.9.9 2025-11-24 15:13:11 +01:00
Clément Renault
6f1d3f337b Merge pull request #6006 from meilisearch/bump-version
Bump version to v1.27.0
2025-11-24 12:28:44 +00:00
Clément Renault
9640706c5a Do a no-op when upgrading version 2025-11-24 10:43:27 +01:00
Clément Renault
01cd273a52 Update the snapshots 2025-11-24 10:40:06 +01:00
Clément Renault
ae87d1cab9 Bump version in Cargo.toml 2025-11-24 10:32:32 +01:00
Clément Renault
d5a5372aba Only provide the last batch info 2025-11-20 12:02:29 +01:00
Clément Renault
cf62af13e8 Merge pull request #6005 from meilisearch/clamp-max-batch-size
Clamp max batch size to 10 GiB
2025-11-20 10:45:23 +00:00
Clément Renault
0d5e176dc2 Bump some of the incompatible dependencies 2025-11-20 11:45:08 +01:00
Clément Renault
d6f36a773d Update the compatible dependencies 2025-11-20 11:45:08 +01:00
Many the fish
91cf94c196 Merge pull request #5999 from meilisearch/fix-document-fetch-sort
Fix the Document Fetch pagination bug when Sort is applied
2025-11-20 10:15:04 +00:00
Clément Renault
753ba39199 Update the documentation of the batch size 2025-11-20 10:33:02 +01:00
Clément Renault
3944c25853 Clamp the maximum batch size to maximum 10GiB 2025-11-20 10:29:50 +01:00
ManyTheFish
925bce5fbd Modify the test to test all the sort branches and fix the untested branch 2025-11-20 10:27:24 +01:00
ManyTheFish
62065ed30d Fix the pagination bug
where the last document of the previous page was duplicated as the first
document of the current page. This was due to a bug on the custom nth
function of the sort ranking rule skipping `n-1` documents instead of `n`
2025-11-20 10:27:24 +01:00
Clément Renault
97e6ae1957 Merge pull request #5994 from meilisearch/improve-s3-error-messages
Improve S3 upload by showing errors in the task queue
2025-11-19 16:58:02 +00:00
Clément Renault
5ed9be0789 Merge pull request #5990 from meilisearch/default-max-batch-size
Make the limit batched tasks total size defaults to half of the max indexing memory
2025-11-19 16:56:34 +00:00
Clément Renault
7597b1049f Merge pull request #6001 from meilisearch/update-windows-macos-ci
Update the macOS platform version in the CI
2025-11-19 16:12:52 +00:00
Clément Renault
d99150f21b Improve error message extraction
Co-authored-by: Many the fish <many@meilisearch.com>
2025-11-19 17:09:15 +01:00
Kerollmops
c9726674a0 Make the limit batched tasks total size default to half of max indexing
memory
2025-11-19 17:04:45 +01:00
Clément Renault
205f40b3b8 Update the macOS platform version to use version 14 2025-11-19 16:10:41 +01:00
Kerollmops
361580f451 Display the error message on failure 2025-11-17 09:21:18 +01:00
Kerollmops
a8d55562e9 Expose the three last batches timings 2025-11-03 16:01:05 +01:00
Kerollmops
40d649ec9e Update utoipa 2025-11-03 15:53:14 +01:00
Kerollmops
c272ac8204 Reset metrics values to keep current steps only 2025-11-03 15:41:54 +01:00
Kerollmops
e18c677f0e Expose the step currently running on the metrics route 2025-11-03 15:28:58 +01:00
Kerollmops
84a288da57 Simplify the auth filters 2025-11-03 15:11:28 +01:00
Kerollmops
cbfc325b56 Expose the metrics for the last finished batch and not the processing
one
2025-11-03 15:10:23 +01:00
Kerollmops
ea640b076e Expose batch progress traces on the metrics route 2025-10-24 14:36:21 +02:00
142 changed files with 6847 additions and 1894 deletions

View File

@@ -24,6 +24,11 @@ TBD
- [ ] If not, add the `no db change` label to your PR, and you're good to merge.
- [ ] If yes, add the `db change` label to your PR. You'll receive a message explaining what to do.
### Reminders when adding features
- [ ] Write unit tests using insta
- [ ] Write declarative integration tests in [workloads/tests](https://github.com/meilisearch/meilisearch/tree/main/workloads/test). Specify the routes to call and then call `cargo xtask test workloads/tests/YOUR_TEST.json --update-responses` so that responses are automatically filled.
### Reminders when modifying the API
- [ ] Update the openAPI file with utoipa:

View File

@@ -67,8 +67,6 @@ jobs:
ref: ${{ steps.comment-branch.outputs.head_ref }}
- uses: dtolnay/rust-toolchain@1.89
with:
profile: minimal
- name: Run benchmarks on PR ${{ github.event.issue.id }}
run: |

View File

@@ -13,8 +13,6 @@ jobs:
steps:
- uses: actions/checkout@v5
- uses: dtolnay/rust-toolchain@1.89
with:
profile: minimal
# Run benchmarks
- name: Run benchmarks - Dataset ${BENCH_NAME} - Branch main - Commit ${{ github.sha }}

View File

@@ -6,7 +6,7 @@ on:
env:
MESSAGE: |
### Hello, I'm a bot 🤖
### Hello, I'm a bot 🤖
You are receiving this message because you declared that this PR makes changes to the Meilisearch database.
Depending on the nature of the change, additional actions might be required on your part. The following sections detail these actions; please copy the relevant section into the description of your PR and make sure to perform the required actions.
@@ -19,6 +19,7 @@ env:
- [ ] Detail the changes to the DB format and why they are forward compatible
- [ ] Forward-compatibility: A database created before this PR and using the features touched by this PR can be opened by a Meilisearch built from the code of this PR.
- [ ] Declarative test: add a [declarative test containing a dumpless upgrade](https://github.com/meilisearch/meilisearch/blob/main/TESTING.md#typical-usage)
## This PR makes breaking changes
@@ -35,8 +36,7 @@ env:
- [ ] Write the code to go from the old database to the new one
- If the change happened in milli, the upgrade function should be written and called [here](https://github.com/meilisearch/meilisearch/blob/3fd86e8d76d7d468b0095d679adb09211ca3b6c0/crates/milli/src/update/upgrade/mod.rs#L24-L47)
- If the change happened in the index-scheduler, we've never done it yet, but the right place to do it should be [here](https://github.com/meilisearch/meilisearch/blob/3fd86e8d76d7d468b0095d679adb09211ca3b6c0/crates/index-scheduler/src/scheduler/process_upgrade/mod.rs#L13)
- [ ] Write an integration test [here](https://github.com/meilisearch/meilisearch/blob/main/crates/meilisearch/tests/upgrade/mod.rs) ensuring you can read the old database, upgrade to the new database, and read the new database as expected
- [ ] Declarative test: add a [declarative test containing a dumpless upgrade](https://github.com/meilisearch/meilisearch/blob/main/TESTING.md#typical-usage)
jobs:
add-comment:

View File

@@ -13,6 +13,12 @@ jobs:
image: ubuntu:22.04
steps:
- uses: actions/checkout@v5
- name: Clean space as per https://github.com/actions/virtual-environments/issues/709
run: |
sudo rm -rf "/opt/ghc" || true
sudo rm -rf "/usr/share/dotnet" || true
sudo rm -rf "/usr/local/lib/android" || true
sudo rm -rf "/usr/local/share/boost" || true
- name: Install needed dependencies
run: |
apt-get update && apt-get install -y curl

View File

@@ -13,8 +13,6 @@ jobs:
steps:
- uses: actions/checkout@v5
- uses: dtolnay/rust-toolchain@1.89
with:
profile: minimal
# Run benchmarks
- name: Run the fuzzer

View File

@@ -25,6 +25,12 @@ jobs:
run: |
apt-get update && apt-get install -y curl
apt-get install build-essential -y
- name: Clean space as per https://github.com/actions/virtual-environments/issues/709
run: |
sudo rm -rf "/opt/ghc" || true
sudo rm -rf "/usr/share/dotnet" || true
sudo rm -rf "/usr/local/lib/android" || true
sudo rm -rf "/usr/local/share/boost" || true
- uses: dtolnay/rust-toolchain@1.89
- name: Install cargo-deb
run: cargo install cargo-deb

View File

@@ -13,9 +13,6 @@ on:
- cron: '0 23 * * *' # Every day at 11:00pm
workflow_dispatch:
env:
REGISTRY_IMAGE: getmeili/meilisearch
jobs:
build:
runs-on: ${{ matrix.runner }}
@@ -23,11 +20,18 @@ jobs:
strategy:
matrix:
platform: [amd64, arm64]
edition: [community, enterprise]
include:
- platform: amd64
runner: ubuntu-24.04
- platform: arm64
runner: ubuntu-24.04-arm
- edition: community
registry: getmeili/meilisearch
feature-flag: ""
- edition: enterprise
registry: getmeili/meilisearch-enterprise
feature-flag: "--features enterprise"
permissions: {}
steps:
@@ -54,7 +58,7 @@ jobs:
id: meta
uses: docker/metadata-action@v5
with:
images: ${{ env.REGISTRY_IMAGE }}
images: ${{ matrix.registry }}
# Prevent `latest` from being updated for each new tag pushed.
# We need the `latest` and `vX.Y` tags to be pushed only for stable Meilisearch releases.
flavor: latest=false
@@ -71,12 +75,13 @@ jobs:
with:
platforms: linux/${{ matrix.platform }}
labels: ${{ steps.meta.outputs.labels }}
tags: ${{ env.REGISTRY_IMAGE }}
tags: ${{ matrix.registry }}
outputs: type=image,push-by-digest=true,name-canonical=true,push=true
build-args: |
COMMIT_SHA=${{ github.sha }}
COMMIT_DATE=${{ steps.build-metadata.outputs.date }}
GIT_TAG=${{ github.ref_name }}
EXTRA_ARGS=${{ matrix.feature-flag }}
- name: Export digest
run: |
@@ -87,13 +92,21 @@ jobs:
- name: Upload digest
uses: actions/upload-artifact@v4
with:
name: digests-${{ env.PLATFORM_PAIR }}
name: digests-${{ matrix.edition }}-${{ env.PLATFORM_PAIR }}
path: ${{ runner.temp }}/digests/*
if-no-files-found: error
retention-days: 1
merge:
runs-on: ubuntu-latest
strategy:
matrix:
edition: [community, enterprise]
include:
- edition: community
registry: getmeili/meilisearch
- edition: enterprise
registry: getmeili/meilisearch-enterprise
needs:
- build
@@ -147,7 +160,7 @@ jobs:
uses: actions/download-artifact@v4
with:
path: ${{ runner.temp }}/digests
pattern: digests-*
pattern: digests-${{ matrix.edition }}-*
merge-multiple: true
- name: Login to Docker Hub
@@ -163,7 +176,7 @@ jobs:
id: meta
uses: docker/metadata-action@v5
with:
images: ${{ env.REGISTRY_IMAGE }}
images: ${{ matrix.registry }}
# Prevent `latest` from being updated for each new tag pushed.
# We need the `latest` and `vX.Y` tags to be pushed only for stable Meilisearch releases.
flavor: latest=false
@@ -178,12 +191,12 @@ jobs:
working-directory: ${{ runner.temp }}/digests
run: |
docker buildx imagetools create $(jq -cr '.tags | map("-t " + .) | join(" ")' <<< "$DOCKER_METADATA_OUTPUT_JSON") \
$(printf '${{ env.REGISTRY_IMAGE }}@sha256:%s ' *)
$(printf '${{ matrix.registry }}@sha256:%s ' *)
- name: Inspect image
- name: Inspect image to fetch digest to sign
run: |
digest=$(docker buildx imagetools inspect ${{ env.REGISTRY_IMAGE }}:${{ steps.meta.outputs.version }})
echo "DIGEST=${platform}" >> $GITHUB_ENV
digest=$(docker buildx imagetools inspect --format='{{ json .Manifest }}' ${{ matrix.registry }}:${{ steps.meta.outputs.version }} | jq -r '.digest')
echo "DIGEST=${digest}" >> $GITHUB_ENV
- name: Sign the images with GitHub OIDC Token
env:
@@ -195,10 +208,10 @@ jobs:
done
cosign sign --yes ${images}
# /!\ Don't touch this without checking with Cloud team
- name: Send CI information to Cloud team
# /!\ Don't touch this without checking with engineers working on the Cloud code base on #discussion-engineering Slack channel
- name: Notify meilisearch-cloud
# Do not send if nightly build (i.e. 'schedule' or 'workflow_dispatch' event)
if: github.event_name == 'push'
if: ${{ (github.event_name == 'push') && (matrix.edition == 'enterprise') }}
uses: peter-evans/repository-dispatch@v3
with:
token: ${{ secrets.MEILI_BOT_GH_PAT }}
@@ -206,21 +219,13 @@ jobs:
event-type: cloud-docker-build
client-payload: '{ "meilisearch_version": "${{ github.ref_name }}", "stable": "${{ steps.check-tag-format.outputs.stable }}" }'
# Send notification to Swarmia to notify of a deployment: https://app.swarmia.com
# - name: 'Setup jq'
# uses: dcarbone/install-jq-action
# - name: Send deployment to Swarmia
# if: github.event_name == 'push' && success()
# run: |
# JSON_STRING=$( jq --null-input --compact-output \
# --arg version "${{ github.ref_name }}" \
# --arg appName "meilisearch" \
# --arg environment "production" \
# --arg commitSha "${{ github.sha }}" \
# --arg repositoryFullName "${{ github.repository }}" \
# '{"version": $version, "appName": $appName, "environment": $environment, "commitSha": $commitSha, "repositoryFullName": $repositoryFullName}' )
# curl -H "Authorization: ${{ secrets.SWARMIA_DEPLOYMENTS_AUTHORIZATION }}" \
# -H "Content-Type: application/json" \
# -d "$JSON_STRING" \
# https://hook.swarmia.com/deployments
# /!\ Don't touch this without checking with integration team members on #discussion-integrations Slack channel
- name: Notify meilisearch-kubernetes
# Do not send if nightly build (i.e. 'schedule' or 'workflow_dispatch' event), or if not stable
if: ${{ github.event_name == 'push' && matrix.edition == 'community' && steps.check-tag-format.outputs.stable == 'true' }}
uses: peter-evans/repository-dispatch@v3
with:
token: ${{ secrets.MEILI_BOT_GH_PAT }}
repository: meilisearch/meilisearch-kubernetes
event-type: meilisearch-release
client-payload: '{ "version": "${{ github.ref_name }}" }'

View File

@@ -32,157 +32,61 @@ jobs:
if: github.event_name == 'release' && steps.check-tag-format.outputs.stable == 'true'
run: bash .github/scripts/check-release.sh
publish-linux:
name: Publish binary for Linux
runs-on: ubuntu-latest
needs: check-version
container:
# Use ubuntu-22.04 to compile with glibc 2.35
image: ubuntu:22.04
steps:
- uses: actions/checkout@v5
- name: Install needed dependencies
run: |
apt-get update && apt-get install -y curl
apt-get install build-essential -y
- uses: dtolnay/rust-toolchain@1.89
- name: Build
run: cargo build --release --locked
# No need to upload binaries for dry run (cron or workflow_dispatch)
- name: Upload binaries to release
if: github.event_name == 'release'
uses: svenstaro/upload-release-action@2.11.2
with:
repo_token: ${{ secrets.MEILI_BOT_GH_PAT }}
file: target/release/meilisearch
asset_name: meilisearch-linux-amd64
tag: ${{ github.ref }}
publish-macos-windows:
name: Publish binary for ${{ matrix.os }}
publish-binaries:
name: Publish binary for ${{ matrix.release }} ${{ matrix.edition }} edition
runs-on: ${{ matrix.os }}
needs: check-version
strategy:
fail-fast: false
matrix:
os: [macos-13, windows-2022]
edition: [community, enterprise]
release:
[macos-amd64, macos-aarch64, windows, linux-amd64, linux-aarch64]
include:
- os: macos-13
artifact_name: meilisearch
asset_name: meilisearch-macos-amd64
- os: windows-2022
artifact_name: meilisearch.exe
asset_name: meilisearch-windows-amd64.exe
- edition: "community"
feature-flag: ""
edition-suffix: ""
- edition: "enterprise"
feature-flag: "--features enterprise"
edition-suffix: "enterprise-"
- release: macos-amd64
os: macos-15-intel
binary_path: release/meilisearch
asset_name: macos-amd64
extra-args: ""
- release: macos-aarch64
os: macos-14
binary_path: aarch64-apple-darwin/release/meilisearch
asset_name: macos-apple-silicon
extra-args: "--target aarch64-apple-darwin"
- release: windows
os: windows-2022
binary_path: release/meilisearch.exe
asset_name: windows-amd64.exe
extra-args: ""
- release: linux-amd64
os: ubuntu-22.04
binary_path: x86_64-unknown-linux-gnu/release/meilisearch
asset_name: linux-amd64
extra-args: "--target x86_64-unknown-linux-gnu"
- release: linux-aarch64
os: ubuntu-22.04-arm
binary_path: aarch64-unknown-linux-gnu/release/meilisearch
asset_name: linux-aarch64
extra-args: "--target aarch64-unknown-linux-gnu"
needs: check-version
steps:
- uses: actions/checkout@v5
- uses: dtolnay/rust-toolchain@1.89
- name: Build
run: cargo build --release --locked
run: cargo build --release --locked ${{ matrix.feature-flag }} ${{ matrix.extra-args }}
# No need to upload binaries for dry run (cron or workflow_dispatch)
- name: Upload binaries to release
if: github.event_name == 'release'
uses: svenstaro/upload-release-action@2.11.2
with:
repo_token: ${{ secrets.MEILI_BOT_GH_PAT }}
file: target/release/${{ matrix.artifact_name }}
asset_name: ${{ matrix.asset_name }}
tag: ${{ github.ref }}
publish-macos-apple-silicon:
name: Publish binary for macOS silicon
runs-on: macos-13
needs: check-version
strategy:
matrix:
include:
- target: aarch64-apple-darwin
asset_name: meilisearch-macos-apple-silicon
steps:
- name: Checkout repository
uses: actions/checkout@v5
- name: Installing Rust toolchain
uses: dtolnay/rust-toolchain@1.89
with:
profile: minimal
target: ${{ matrix.target }}
- name: Cargo build
uses: actions-rs/cargo@v1
with:
command: build
args: --release --target ${{ matrix.target }}
- name: Upload the binary to release
# No need to upload binaries for dry run (cron or workflow_dispatch)
if: github.event_name == 'release'
uses: svenstaro/upload-release-action@2.11.2
with:
repo_token: ${{ secrets.MEILI_BOT_GH_PAT }}
file: target/${{ matrix.target }}/release/meilisearch
asset_name: ${{ matrix.asset_name }}
tag: ${{ github.ref }}
publish-aarch64:
name: Publish binary for aarch64
runs-on: ubuntu-latest
needs: check-version
env:
DEBIAN_FRONTEND: noninteractive
container:
# Use ubuntu-22.04 to compile with glibc 2.35
image: ubuntu:22.04
strategy:
matrix:
include:
- target: aarch64-unknown-linux-gnu
asset_name: meilisearch-linux-aarch64
steps:
- name: Checkout repository
uses: actions/checkout@v5
- name: Install needed dependencies
run: |
apt-get update -y && apt upgrade -y
apt-get install -y curl build-essential gcc-aarch64-linux-gnu
- name: Set up Docker for cross compilation
run: |
apt-get install -y curl apt-transport-https ca-certificates software-properties-common
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | apt-key add -
add-apt-repository "deb [arch=$(dpkg --print-architecture)] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
apt-get update -y && apt-get install -y docker-ce
- name: Installing Rust toolchain
uses: dtolnay/rust-toolchain@1.89
with:
profile: minimal
target: ${{ matrix.target }}
- name: Configure target aarch64 GNU
## Environment variable is not passed using env:
## LD gold won't work with MUSL
# env:
# JEMALLOC_SYS_WITH_LG_PAGE: 16
# RUSTFLAGS: '-Clink-arg=-fuse-ld=gold'
run: |
echo '[target.aarch64-unknown-linux-gnu]' >> ~/.cargo/config
echo 'linker = "aarch64-linux-gnu-gcc"' >> ~/.cargo/config
echo 'JEMALLOC_SYS_WITH_LG_PAGE=16' >> $GITHUB_ENV
- name: Install a default toolchain that will be used to build cargo cross
run: |
rustup default stable
- name: Cargo build
uses: actions-rs/cargo@v1
with:
command: build
use-cross: true
args: --release --target ${{ matrix.target }}
env:
CROSS_DOCKER_IN_DOCKER: true
- name: List target output files
run: ls -lR ./target
- name: Upload the binary to release
# No need to upload binaries for dry run (cron or workflow_dispatch)
if: github.event_name == 'release'
uses: svenstaro/upload-release-action@2.11.2
with:
repo_token: ${{ secrets.MEILI_BOT_GH_PAT }}
file: target/${{ matrix.target }}/release/meilisearch
asset_name: ${{ matrix.asset_name }}
file: target/${{ matrix.binary_path }}
asset_name: meilisearch-${{ matrix.edition-suffix }}${{ matrix.asset_name }}
tag: ${{ github.ref }}
publish-openapi-file:

View File

@@ -68,7 +68,7 @@ jobs:
runs-on: ubuntu-latest
services:
meilisearch:
image: getmeili/meilisearch:${{ needs.define-docker-image.outputs.docker-image }}
image: getmeili/meilisearch-enterprise:${{ needs.define-docker-image.outputs.docker-image }}
env:
MEILI_MASTER_KEY: ${{ env.MEILI_MASTER_KEY }}
MEILI_NO_ANALYTICS: ${{ env.MEILI_NO_ANALYTICS }}
@@ -92,7 +92,7 @@ jobs:
runs-on: ubuntu-latest
services:
meilisearch:
image: getmeili/meilisearch:${{ needs.define-docker-image.outputs.docker-image }}
image: getmeili/meilisearch-enterprise:${{ needs.define-docker-image.outputs.docker-image }}
env:
MEILI_MASTER_KEY: ${{ env.MEILI_MASTER_KEY }}
MEILI_NO_ANALYTICS: ${{ env.MEILI_NO_ANALYTICS }}
@@ -122,7 +122,7 @@ jobs:
runs-on: ubuntu-latest
services:
meilisearch:
image: getmeili/meilisearch:${{ needs.define-docker-image.outputs.docker-image }}
image: getmeili/meilisearch-enterprise:${{ needs.define-docker-image.outputs.docker-image }}
env:
MEILI_MASTER_KEY: ${{ env.MEILI_MASTER_KEY }}
MEILI_NO_ANALYTICS: ${{ env.MEILI_NO_ANALYTICS }}
@@ -149,7 +149,7 @@ jobs:
runs-on: ubuntu-latest
services:
meilisearch:
image: getmeili/meilisearch:${{ needs.define-docker-image.outputs.docker-image }}
image: getmeili/meilisearch-enterprise:${{ needs.define-docker-image.outputs.docker-image }}
env:
MEILI_MASTER_KEY: ${{ env.MEILI_MASTER_KEY }}
MEILI_NO_ANALYTICS: ${{ env.MEILI_NO_ANALYTICS }}
@@ -184,7 +184,7 @@ jobs:
runs-on: ubuntu-latest
services:
meilisearch:
image: getmeili/meilisearch:${{ needs.define-docker-image.outputs.docker-image }}
image: getmeili/meilisearch-enterprise:${{ needs.define-docker-image.outputs.docker-image }}
env:
MEILI_MASTER_KEY: ${{ env.MEILI_MASTER_KEY }}
MEILI_NO_ANALYTICS: ${{ env.MEILI_NO_ANALYTICS }}
@@ -213,7 +213,7 @@ jobs:
runs-on: ubuntu-latest
services:
meilisearch:
image: getmeili/meilisearch:${{ needs.define-docker-image.outputs.docker-image }}
image: getmeili/meilisearch-enterprise:${{ needs.define-docker-image.outputs.docker-image }}
env:
MEILI_MASTER_KEY: ${{ env.MEILI_MASTER_KEY }}
MEILI_NO_ANALYTICS: ${{ env.MEILI_NO_ANALYTICS }}
@@ -238,7 +238,7 @@ jobs:
runs-on: ubuntu-latest
services:
meilisearch:
image: getmeili/meilisearch:${{ needs.define-docker-image.outputs.docker-image }}
image: getmeili/meilisearch-enterprise:${{ needs.define-docker-image.outputs.docker-image }}
env:
MEILI_MASTER_KEY: ${{ env.MEILI_MASTER_KEY }}
MEILI_NO_ANALYTICS: ${{ env.MEILI_NO_ANALYTICS }}
@@ -263,7 +263,7 @@ jobs:
runs-on: ubuntu-latest
services:
meilisearch:
image: getmeili/meilisearch:${{ needs.define-docker-image.outputs.docker-image }}
image: getmeili/meilisearch-enterprise:${{ needs.define-docker-image.outputs.docker-image }}
env:
MEILI_MASTER_KEY: ${{ env.MEILI_MASTER_KEY }}
MEILI_NO_ANALYTICS: ${{ env.MEILI_NO_ANALYTICS }}
@@ -284,7 +284,7 @@ jobs:
runs-on: ubuntu-latest
services:
meilisearch:
image: getmeili/meilisearch:${{ needs.define-docker-image.outputs.docker-image }}
image: getmeili/meilisearch-enterprise:${{ needs.define-docker-image.outputs.docker-image }}
env:
MEILI_MASTER_KEY: ${{ env.MEILI_MASTER_KEY }}
MEILI_NO_ANALYTICS: ${{ env.MEILI_NO_ANALYTICS }}
@@ -307,7 +307,7 @@ jobs:
runs-on: ubuntu-latest
services:
meilisearch:
image: getmeili/meilisearch:${{ needs.define-docker-image.outputs.docker-image }}
image: getmeili/meilisearch-enterprise:${{ needs.define-docker-image.outputs.docker-image }}
env:
MEILI_MASTER_KEY: ${{ env.MEILI_MASTER_KEY }}
MEILI_NO_ANALYTICS: ${{ env.MEILI_NO_ANALYTICS }}
@@ -338,7 +338,7 @@ jobs:
runs-on: ubuntu-latest
services:
meilisearch:
image: getmeili/meilisearch:${{ needs.define-docker-image.outputs.docker-image }}
image: getmeili/meilisearch-enterprise:${{ needs.define-docker-image.outputs.docker-image }}
env:
MEILI_MASTER_KEY: ${{ env.MEILI_MASTER_KEY }}
MEILI_NO_ANALYTICS: ${{ env.MEILI_NO_ANALYTICS }}
@@ -370,7 +370,7 @@ jobs:
runs-on: ubuntu-latest
services:
meilisearch:
image: getmeili/meilisearch:${{ needs.define-docker-image.outputs.docker-image }}
image: getmeili/meilisearch-enterprise:${{ needs.define-docker-image.outputs.docker-image }}
env:
MEILI_MASTER_KEY: ${{ env.MEILI_MASTER_KEY }}
MEILI_NO_ANALYTICS: ${{ env.MEILI_NO_ANALYTICS }}

View File

@@ -15,31 +15,40 @@ env:
jobs:
test-linux:
name: Tests on ubuntu-22.04
runs-on: ubuntu-latest
container:
# Use ubuntu-22.04 to compile with glibc 2.35
image: ubuntu:22.04
name: Tests on Ubuntu
runs-on: ${{ matrix.runner }}
strategy:
matrix:
runner: [ubuntu-22.04, ubuntu-22.04-arm]
features: ["", "--features enterprise"]
steps:
- uses: actions/checkout@v5
- name: Install needed dependencies
- name: check free space before
run: df -h
- name: Clean space as per https://github.com/actions/virtual-environments/issues/709
run: |
apt-get update && apt-get install -y curl
apt-get install build-essential -y
sudo rm -rf "/opt/ghc" || true
sudo rm -rf "/usr/share/dotnet" || true
sudo rm -rf "/usr/local/lib/android" || true
sudo rm -rf "/usr/local/share/boost" || true
- name: check free space after
run: df -h
- name: Setup test with Rust stable
uses: dtolnay/rust-toolchain@1.89
- name: Cache dependencies
uses: Swatinem/rust-cache@v2.8.0
- name: Run cargo check without any default features
with:
key: ${{ matrix.features }}
- name: Run cargo build without any default features
uses: actions-rs/cargo@v1
with:
command: build
args: --locked --release --no-default-features --all
args: --locked --no-default-features --all
- name: Run cargo test
uses: actions-rs/cargo@v1
with:
command: test
args: --locked --release --all
args: --locked --all ${{ matrix.features }}
test-others:
name: Tests on ${{ matrix.os }}
@@ -47,51 +56,58 @@ jobs:
strategy:
fail-fast: false
matrix:
os: [macos-13, windows-2022]
os: [macos-14, windows-2022]
features: ["", "--features enterprise"]
if: github.event_name == 'schedule' || github.event_name == 'workflow_dispatch'
steps:
- uses: actions/checkout@v5
- name: Cache dependencies
uses: Swatinem/rust-cache@v2.8.0
- uses: dtolnay/rust-toolchain@1.89
- name: Run cargo check without any default features
- name: Run cargo build without any default features
uses: actions-rs/cargo@v1
with:
command: build
args: --locked --release --no-default-features --all
args: --locked --no-default-features --all
- name: Run cargo test
uses: actions-rs/cargo@v1
with:
command: test
args: --locked --release --all
args: --locked --all ${{ matrix.features }}
test-all-features:
name: Tests almost all features
runs-on: ubuntu-latest
container:
# Use ubuntu-22.04 to compile with glibc 2.35
image: ubuntu:22.04
runs-on: ubuntu-22.04
if: github.event_name == 'schedule' || github.event_name == 'workflow_dispatch'
steps:
- uses: actions/checkout@v5
- name: Install needed dependencies
- name: Clean space as per https://github.com/actions/virtual-environments/issues/709
run: |
apt-get update
apt-get install --assume-yes build-essential curl
sudo rm -rf "/opt/ghc" || true
sudo rm -rf "/usr/share/dotnet" || true
sudo rm -rf "/usr/local/lib/android" || true
sudo rm -rf "/usr/local/share/boost" || true
- uses: dtolnay/rust-toolchain@1.89
- name: Run cargo build with almost all features
run: |
cargo build --workspace --locked --release --features "$(cargo xtask list-features --exclude-feature cuda,test-ollama)"
cargo build --workspace --locked --features "$(cargo xtask list-features --exclude-feature cuda,test-ollama)"
- name: Run cargo test with almost all features
run: |
cargo test --workspace --locked --release --features "$(cargo xtask list-features --exclude-feature cuda,test-ollama)"
cargo test --workspace --locked --features "$(cargo xtask list-features --exclude-feature cuda,test-ollama)"
ollama-ubuntu:
name: Test with Ollama
runs-on: ubuntu-latest
runs-on: ubuntu-22.04
env:
MEILI_TEST_OLLAMA_SERVER: "http://localhost:11434"
steps:
- uses: actions/checkout@v5
- name: Clean space as per https://github.com/actions/virtual-environments/issues/709
run: |
sudo rm -rf "/opt/ghc" || true
sudo rm -rf "/usr/share/dotnet" || true
sudo rm -rf "/usr/local/lib/android" || true
sudo rm -rf "/usr/local/share/boost" || true
- name: Install Ollama
run: |
curl -fsSL https://ollama.com/install.sh | sudo -E sh
@@ -115,20 +131,20 @@ jobs:
uses: actions-rs/cargo@v1
with:
command: test
args: --locked --release --all --features test-ollama ollama
args: --locked -p meilisearch --features test-ollama ollama
test-disabled-tokenization:
name: Test disabled tokenization
runs-on: ubuntu-latest
container:
image: ubuntu:22.04
runs-on: ubuntu-22.04
if: github.event_name == 'schedule' || github.event_name == 'workflow_dispatch'
steps:
- uses: actions/checkout@v5
- name: Install needed dependencies
- name: Clean space as per https://github.com/actions/virtual-environments/issues/709
run: |
apt-get update
apt-get install --assume-yes build-essential curl
sudo rm -rf "/opt/ghc" || true
sudo rm -rf "/usr/share/dotnet" || true
sudo rm -rf "/usr/local/lib/android" || true
sudo rm -rf "/usr/local/share/boost" || true
- uses: dtolnay/rust-toolchain@1.89
- name: Run cargo tree without default features and check lindera is not present
run: |
@@ -140,36 +156,42 @@ jobs:
run: |
cargo tree -f '{p} {f}' -e normal | grep lindera -qz
# We run tests in debug also, to make sure that the debug_assertions are hit
test-debug:
name: Run tests in debug
runs-on: ubuntu-latest
container:
# Use ubuntu-22.04 to compile with glibc 2.35
image: ubuntu:22.04
build:
name: Build in release
runs-on: ubuntu-22.04
steps:
- uses: actions/checkout@v5
- name: Install needed dependencies
- name: Clean space as per https://github.com/actions/virtual-environments/issues/709
run: |
apt-get update && apt-get install -y curl
apt-get install build-essential -y
sudo rm -rf "/opt/ghc" || true
sudo rm -rf "/usr/share/dotnet" || true
sudo rm -rf "/usr/local/lib/android" || true
sudo rm -rf "/usr/local/share/boost" || true
- uses: dtolnay/rust-toolchain@1.89
- name: Cache dependencies
uses: Swatinem/rust-cache@v2.8.0
- name: Run tests in debug
- name: Run cargo build in release
uses: actions-rs/cargo@v1
with:
command: test
args: --locked --all
command: build
args: --all-targets --release
clippy:
name: Run Clippy
runs-on: ubuntu-latest
runs-on: ubuntu-22.04
strategy:
matrix:
features: ["", "--features enterprise"]
steps:
- uses: actions/checkout@v5
- name: Clean space as per https://github.com/actions/virtual-environments/issues/709
run: |
sudo rm -rf "/opt/ghc" || true
sudo rm -rf "/usr/share/dotnet" || true
sudo rm -rf "/usr/local/lib/android" || true
sudo rm -rf "/usr/local/share/boost" || true
- uses: dtolnay/rust-toolchain@1.89
with:
profile: minimal
components: clippy
- name: Cache dependencies
uses: Swatinem/rust-cache@v2.8.0
@@ -177,18 +199,21 @@ jobs:
uses: actions-rs/cargo@v1
with:
command: clippy
args: --all-targets -- --deny warnings
args: --all-targets ${{ matrix.features }} -- --deny warnings
fmt:
name: Run Rustfmt
runs-on: ubuntu-latest
runs-on: ubuntu-22.04
steps:
- uses: actions/checkout@v5
- name: Clean space as per https://github.com/actions/virtual-environments/issues/709
run: |
sudo rm -rf "/opt/ghc" || true
sudo rm -rf "/usr/share/dotnet" || true
sudo rm -rf "/usr/local/lib/android" || true
sudo rm -rf "/usr/local/share/boost" || true
- uses: dtolnay/rust-toolchain@1.89
with:
profile: minimal
toolchain: nightly-2024-07-09
override: true
components: rustfmt
- name: Cache dependencies
uses: Swatinem/rust-cache@v2.8.0
@@ -199,3 +224,23 @@ jobs:
run: |
echo -ne "\n" > crates/benchmarks/benches/datasets_paths.rs
cargo fmt --all -- --check
declarative-tests:
name: Run declarative tests
runs-on: ubuntu-22.04-arm
permissions:
contents: read
steps:
- uses: actions/checkout@v5
- name: Clean space as per https://github.com/actions/virtual-environments/issues/709
run: |
sudo rm -rf "/opt/ghc" || true
sudo rm -rf "/usr/share/dotnet" || true
sudo rm -rf "/usr/local/lib/android" || true
sudo rm -rf "/usr/local/share/boost" || true
- uses: dtolnay/rust-toolchain@1.89
- name: Cache dependencies
uses: Swatinem/rust-cache@v2.8.0
- name: Run declarative tests
run: |
cargo xtask test workloads/tests/*.json

View File

@@ -18,9 +18,13 @@ jobs:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v5
- name: Clean space as per https://github.com/actions/virtual-environments/issues/709
run: |
sudo rm -rf "/opt/ghc" || true
sudo rm -rf "/usr/share/dotnet" || true
sudo rm -rf "/usr/local/lib/android" || true
sudo rm -rf "/usr/local/share/boost" || true
- uses: dtolnay/rust-toolchain@1.89
with:
profile: minimal
- name: Install sd
run: cargo install sd
- name: Update Cargo.toml file

View File

@@ -124,6 +124,7 @@ They are JSON files with the following structure (comments are not actually supp
{
// Name of the workload. Must be unique to the workload, as it will be used to group results on the dashboard.
"name": "hackernews.ndjson_1M,no-threads",
"type": "bench",
// Number of consecutive runs of the commands that should be performed.
// Each run uses a fresh instance of Meilisearch and a fresh database.
// Each run produces its own report file.

1875
Cargo.lock generated

File diff suppressed because it is too large

View File

@@ -23,7 +23,7 @@ members = [
]
[workspace.package]
version = "1.26.0"
version = "1.28.2"
authors = [
"Quentin de Quelen <quentin@dequelen.me>",
"Clément Renault <clement@meilisearch.com>",
@@ -50,3 +50,5 @@ opt-level = 3
opt-level = 3
[profile.dev.package.roaring]
opt-level = 3
[profile.dev.package.gemm-f16]
opt-level = 3

View File

@@ -1,7 +0,0 @@
[build.env]
passthrough = [
"RUST_BACKTRACE",
"CARGO_TERM_COLOR",
"RUSTFLAGS",
"JEMALLOC_SYS_WITH_LG_PAGE"
]

View File

@@ -8,16 +8,14 @@ WORKDIR /
ARG COMMIT_SHA
ARG COMMIT_DATE
ARG GIT_TAG
ARG EXTRA_ARGS
ENV VERGEN_GIT_SHA=${COMMIT_SHA} VERGEN_GIT_COMMIT_TIMESTAMP=${COMMIT_DATE} VERGEN_GIT_DESCRIBE=${GIT_TAG}
ENV RUSTFLAGS="-C target-feature=-crt-static"
COPY . .
RUN set -eux; \
apkArch="$(apk --print-arch)"; \
if [ "$apkArch" = "aarch64" ]; then \
export JEMALLOC_SYS_WITH_LG_PAGE=16; \
fi && \
cargo build --release -p meilisearch -p meilitool
cargo build --release -p meilisearch -p meilitool ${EXTRA_ARGS}
# Run
FROM alpine:3.22

326
TESTING.md Normal file
View File

@@ -0,0 +1,326 @@
# Declarative tests
Declarative tests ensure that Meilisearch features remain stable across versions.
While we already have unit tests, those are run against **temporary databases** that are created fresh each time and therefore never risk corruption.
Declarative tests instead **simulate the lifetime of a database**: they chain together API commands and binary-change instructions, verifying that database state and API responses remain consistent.
## Basic example
```jsonc
{
"type": "test",
"name": "api-keys",
"binary": { // the first command will run on the binary following this specification.
"source": "release", // get the binary as a release from GitHub
"version": "1.19.0", // version to fetch
"edition": "community" // edition to fetch
},
"commands": []
}
```
This example defines a no-op test (it does nothing).
If the file is saved at `workloads/tests/example.json`, you can run it with:
```bash
cargo xtask test workloads/tests/example.json
```
## Commands
Commands represent API requests sent to Meilisearch endpoints during a test.
They are executed sequentially, and their responses can be validated to ensure consistent behavior across upgrades.
```jsonc
{
"route": "keys",
"method": "POST",
"body": {
"inline": {
"actions": [
"search",
"documents.add"
],
"description": "Test API Key",
"expiresAt": null,
"indexes": [ "movies" ]
}
}
}
```
This command issues a `POST /keys` request, creating an API key with permissions to search and add documents in the `movies` index.
### Using assets in commands
To keep tests concise and reusable, you can define **assets** at the root of the workload file.
Assets are external data sources (such as datasets) that are cached between runs, making tests faster and easier to read.
```jsonc
{
"type": "test",
"name": "movies",
"binary": {
"source": "release",
"version": "1.19.0",
"edition": "community"
},
"assets": {
"movies.json": {
"local_location": null,
"remote_location": "https://milli-benchmarks.fra1.digitaloceanspaces.com/bench/datasets/movies.json",
"sha256": "5b6e4cb660bc20327776e8a33ea197b43d9ec84856710ead1cc87ab24df77de1"
}
},
"commands": [
{
"route": "indexes/movies/documents",
"method": "POST",
"body": {
"asset": "movies.json"
}
}
]
}
```
In this example:
- The `movies.json` dataset is defined as an asset, pointing to a remote URL.
- The SHA-256 checksum ensures integrity.
- The `POST /indexes/movies/documents` command uses this asset as the request body.
This makes the test much cleaner than inlining a large dataset directly into the command.
For asset handling, please refer to the [declarative benchmarks documentation](/BENCHMARKS.md#adding-new-assets).
### Asserting responses
Commands can specify both the **expected status code** and the **expected response body**.
```jsonc
{
"route": "indexes/movies/documents",
"method": "POST",
"body": {
"asset": "movies.json"
},
"expectedStatus": 202,
"expectedResponse": {
"enqueuedAt": "[timestamp]", // Set to a bracketed string to ignore the value
"indexUid": "movies",
"status": "enqueued",
"taskUid": 1,
"type": "documentAdditionOrUpdate"
},
"synchronous": "WaitForTask"
}
```
Manually writing `expectedResponse` fields can be tedious.
Instead, you can let the test runner populate them automatically:
```bash
# Run the workload to populate expected fields. Only adds the missing ones, doesn't change existing data
cargo xtask test workloads/tests/example.json --add-missing-responses
# OR
# Run the workload to populate expected fields. Updates all fields including existing ones
cargo xtask test workloads/tests/example.json --update-responses
```
This workflow is recommended:
1. Write the test without expected fields.
2. Run it with `--add-missing-responses` to capture the actual responses.
3. Review and commit the generated expectations.
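As a concrete sketch of that loop (reusing the hypothetical `workloads/tests/example.json` file from above):
```bash
# 1. Author the workload without expectedResponse fields, then capture the
#    responses returned by the current binary.
cargo xtask test workloads/tests/example.json --add-missing-responses
# 2. Review what was generated before committing it.
git diff workloads/tests/example.json
git add workloads/tests/example.json
# 3. From now on, plain runs assert against the committed expectations.
cargo xtask test workloads/tests/example.json
```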
## Changing binary
It is possible to insert an instruction to change the current Meilisearch instance from one binary specification to another during a test.
When executed, such an instruction will:
1. Stop the current Meilisearch instance.
2. Fetch the binary specified by the instruction.
3. Restart the server with the specified binary on the same database.
```jsonc
{
"type": "test",
"name": "movies",
"binary": {
"source": "release",
"version": "1.19.0", // start with version v1.19.0
"edition": "community"
},
"assets": {
"movies.json": {
"local_location": null,
"remote_location": "https://milli-benchmarks.fra1.digitaloceanspaces.com/bench/datasets/movies.json",
"sha256": "5b6e4cb660bc20327776e8a33ea197b43d9ec84856710ead1cc87ab24df77de1"
}
},
"commands": [
// setup some data
{
"route": "indexes/movies/documents",
"method": "POST",
"body": {
"asset": "movies.json"
}
},
// switch binary to v1.24.0
{
"binary": {
"source": "release",
"version": "1.24.0",
"edition": "community"
}
}
]
}
```
### Typical Usage
In most cases, the binary-change instruction will be used to upgrade a database.
- **Set up** some data using commands on an older version.
- **Upgrade** to the latest version.
- **Assert** that the data and API behavior remain correct after the upgrade.
To properly test the dumpless upgrade, one should typically:
1. Open the database without processing the update task: Use a `binary` instruction to switch to the desired version, passing `--experimental-dumpless-upgrade` and `--experimental-max-number-of-batched-tasks=0` as extra CLI arguments
2. Check that the search, stats and task queue still work.
3. Open the database and process the update task: Use a `binary` instruction to switch to the desired version, passing `--experimental-dumpless-upgrade` as the extra CLI argument. Use a `health` command to wait for the upgrade task to finish.
4. Check that the indexing, search, stats, and task queue still work.
```jsonc
{
"type": "test",
"name": "movies",
"binary": {
"source": "release",
"version": "1.12.0",
"edition": "community"
},
"commands": [
// 0. Run commands to populate the database
{
// ..
},
// 1. Open the database with new MS without processing the update task
{
"binary": {
"source": "build", // build the binary from the sources in the current git repository
"edition": "community",
"extraCliArgs": [
"--experimental-dumpless-upgrade", // allows to open with a newer MS
"--experimental-max-number-of-batched-tasks=0" // prevent processing of the update task
]
}
},
// 2. Check the search etc.
{
// ..
},
// 3. Open the database with new MS and process the update task
{
"binary": {
"source": "build", // build the binary from the sources in the current git repository
"edition": "community",
"extraCliArgs": [
"--experimental-dumpless-upgrade" // allows to open with a newer MS
// no `--experimental-max-number-of-batched-tasks=0`
]
}
},
// 4. Check the indexing, search, etc.
{
// ..
}
]
}
```
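The `health` command mentioned in step 3 could be as small as the sketch below; the `synchronous` value is an assumption about how the runner waits for the upgrade task to finish, not a documented requirement:
```jsonc
{
  "route": "health",
  "method": "GET",
  "expectedStatus": 200,
  // Assumption: waiting on the task queue here lets the upgrade task
  // complete before the next command runs.
  "synchronous": "WaitForTask"
}
```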
This ensures backward compatibility: databases created with older Meilisearch versions should remain functional and consistent after an upgrade.
## Variables
Sometimes a command needs to use a value returned by a **previous response**.
These values can be captured and reused using the register field.
```jsonc
{
"route": "keys",
"method": "POST",
"body": {
"inline": {
"actions": [
"search",
"documents.add"
],
"description": "Test API Key",
"expiresAt": null,
"indexes": [ "movies" ]
}
},
"expectedResponse": {
"key": "c6f64630bad2996b1f675007c8800168e14adf5d6a7bb1a400a6d2b158050eaf",
// ...
},
"register": {
"key": "/key"
},
"synchronous": "WaitForResponse"
}
```
The `register` field captures the value at the JSON path `/key` from the response.
Paths follow the **JSON Pointer (RFC 6901)** format.
Registered variables are available for all subsequent commands.
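Pointers can also index into arrays. A hypothetical command that stores the uid of the most recent task (the query-string route and the `task_id` name are illustrative, not taken from an existing workload) might look like this:
```jsonc
{
  "route": "tasks?limit=1",
  "method": "GET",
  "register": {
    // RFC 6901: "/results/0/uid" points at the "uid" field of the first
    // element of the "results" array in the response.
    "task_id": "/results/0/uid"
  }
}
```
The captured `task_id` can then be interpolated exactly like in the `tasks/{{ task_id }}` example below.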
Registered variables can be referenced by wrapping their name in double curly braces:
In the route/path:
```jsonc
{
"route": "tasks/{{ task_id }}",
"method": "GET"
}
```
In the request body:
```jsonc
{
"route": "indexes/movies/documents",
"method": "PATCH",
"body": {
"inline": {
"id": "{{ document_id }}",
"overview": "Shazam turns evil and the world is in danger.",
}
}
}
```
Or they can be referenced by their name (**without curly braces**) as an API key:
```jsonc
{
"route": "indexes/movies/documents",
"method": "POST",
"body": { /* ... */ },
"apiKeyVariable": "key" // The **content** of the key variable will be used as an API key
}
```

View File

@@ -11,27 +11,27 @@ edition.workspace = true
license.workspace = true
[dependencies]
anyhow = "1.0.98"
bumpalo = "3.18.1"
csv = "1.3.1"
memmap2 = "0.9.7"
anyhow = "1.0.100"
bumpalo = "3.19.0"
csv = "1.4.0"
memmap2 = "0.9.9"
milli = { path = "../milli" }
mimalloc = { version = "0.1.47", default-features = false }
serde_json = { version = "1.0.140", features = ["preserve_order"] }
tempfile = "3.20.0"
mimalloc = { version = "0.1.48", default-features = false }
serde_json = { version = "1.0.145", features = ["preserve_order"] }
tempfile = "3.23.0"
[dev-dependencies]
criterion = { version = "0.6.0", features = ["html_reports"] }
criterion = { version = "0.7.0", features = ["html_reports"] }
rand = "0.8.5"
rand_chacha = "0.3.1"
roaring = "0.10.12"
[build-dependencies]
anyhow = "1.0.98"
bytes = "1.10.1"
convert_case = "0.8.0"
flate2 = "1.1.2"
reqwest = { version = "0.12.20", features = ["blocking", "rustls-tls"], default-features = false }
anyhow = "1.0.100"
bytes = "1.11.0"
convert_case = "0.9.0"
flate2 = "1.1.5"
reqwest = { version = "0.12.24", features = ["blocking", "rustls-tls"], default-features = false }
[features]
default = ["milli/all-tokenizations"]

View File

@@ -11,8 +11,8 @@ license.workspace = true
# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
[dependencies]
time = { version = "0.3.41", features = ["parsing"] }
time = { version = "0.3.44", features = ["parsing"] }
[build-dependencies]
anyhow = "1.0.98"
anyhow = "1.0.100"
vergen-git2 = "1.0.7"

View File

@@ -11,24 +11,27 @@ readme.workspace = true
license.workspace = true
[dependencies]
anyhow = "1.0.98"
flate2 = "1.1.2"
anyhow = "1.0.100"
flate2 = "1.1.5"
http = "1.3.1"
meilisearch-types = { path = "../meilisearch-types" }
once_cell = "1.21.3"
regex = "1.11.1"
regex = "1.12.2"
roaring = { version = "0.10.12", features = ["serde"] }
serde = { version = "1.0.219", features = ["derive"] }
serde_json = { version = "1.0.140", features = ["preserve_order"] }
serde = { version = "1.0.228", features = ["derive"] }
serde_json = { version = "1.0.145", features = ["preserve_order"] }
tar = "0.4.44"
tempfile = "3.20.0"
thiserror = "2.0.12"
time = { version = "0.3.41", features = ["serde-well-known", "formatting", "parsing", "macros"] }
tempfile = "3.23.0"
thiserror = "2.0.17"
time = { version = "0.3.44", features = ["serde-well-known", "formatting", "parsing", "macros"] }
tracing = "0.1.41"
uuid = { version = "1.17.0", features = ["serde", "v4"] }
uuid = { version = "1.18.1", features = ["serde", "v4"] }
[dev-dependencies]
big_s = "1.0.2"
maplit = "1.0.2"
meili-snap = { path = "../meili-snap" }
meilisearch-types = { path = "../meilisearch-types" }
[features]
enterprise = ["meilisearch-types/enterprise"]

View File

@@ -262,13 +262,13 @@ pub(crate) mod test {
use big_s::S;
use maplit::{btreemap, btreeset};
use meilisearch_types::batches::{Batch, BatchEnqueuedAt, BatchStats};
use meilisearch_types::enterprise_edition::network::{Network, Remote};
use meilisearch_types::facet_values_sort::FacetValuesSort;
use meilisearch_types::features::RuntimeTogglableFeatures;
use meilisearch_types::index_uid_pattern::IndexUidPattern;
use meilisearch_types::keys::{Action, Key};
use meilisearch_types::milli::update::Setting;
use meilisearch_types::milli::{self, FilterableAttributesRule};
use meilisearch_types::network::{Network, Remote};
use meilisearch_types::settings::{Checked, FacetingSettings, Settings};
use meilisearch_types::task_view::DetailsView;
use meilisearch_types::tasks::{BatchStopReason, Details, Kind, Status};

View File

@@ -24,7 +24,7 @@ pub type Batch = meilisearch_types::batches::Batch;
pub type Key = meilisearch_types::keys::Key;
pub type ChatCompletionSettings = meilisearch_types::features::ChatCompletionSettings;
pub type RuntimeTogglableFeatures = meilisearch_types::features::RuntimeTogglableFeatures;
pub type Network = meilisearch_types::enterprise_edition::network::Network;
pub type Network = meilisearch_types::network::Network;
pub type Webhooks = meilisearch_types::webhooks::WebhooksDumpView;
// ===== Other types to clarify the code of the compat module

View File

@@ -5,9 +5,9 @@ use std::path::PathBuf;
use flate2::write::GzEncoder;
use flate2::Compression;
use meilisearch_types::batches::Batch;
use meilisearch_types::enterprise_edition::network::Network;
use meilisearch_types::features::{ChatCompletionSettings, RuntimeTogglableFeatures};
use meilisearch_types::keys::Key;
use meilisearch_types::network::Network;
use meilisearch_types::settings::{Checked, Settings};
use meilisearch_types::webhooks::WebhooksDumpView;
use serde_json::{Map, Value};

View File

@@ -11,7 +11,7 @@ edition.workspace = true
license.workspace = true
[dependencies]
tempfile = "3.20.0"
thiserror = "2.0.12"
tempfile = "3.23.0"
thiserror = "2.0.17"
tracing = "0.1.41"
uuid = { version = "1.17.0", features = ["serde", "v4"] }
uuid = { version = "1.18.1", features = ["serde", "v4"] }

View File

@@ -16,7 +16,7 @@ license.workspace = true
serde_json = "1.0"
[dev-dependencies]
criterion = { version = "0.6.0", features = ["html_reports"] }
criterion = { version = "0.7.0", features = ["html_reports"] }
[[bench]]
name = "benchmarks"

View File

@@ -11,12 +11,12 @@ edition.workspace = true
license.workspace = true
[dependencies]
arbitrary = { version = "1.4.1", features = ["derive"] }
bumpalo = "3.18.1"
clap = { version = "4.5.40", features = ["derive"] }
arbitrary = { version = "1.4.2", features = ["derive"] }
bumpalo = "3.19.0"
clap = { version = "4.5.52", features = ["derive"] }
either = "1.15.0"
fastrand = "2.3.0"
milli = { path = "../milli" }
serde = { version = "1.0.219", features = ["derive"] }
serde_json = { version = "1.0.140", features = ["preserve_order"] }
tempfile = "3.20.0"
serde = { version = "1.0.228", features = ["derive"] }
serde_json = { version = "1.0.145", features = ["preserve_order"] }
tempfile = "3.23.0"

View File

@@ -11,33 +11,33 @@ edition.workspace = true
license.workspace = true
[dependencies]
anyhow = "1.0.98"
anyhow = "1.0.100"
bincode = "1.3.3"
byte-unit = "5.1.6"
bytes = "1.10.1"
bumpalo = "3.18.1"
bytes = "1.11.0"
bumpalo = "3.19.0"
bumparaw-collections = "0.1.4"
convert_case = "0.8.0"
csv = "1.3.1"
convert_case = "0.9.0"
csv = "1.4.0"
derive_builder = "0.20.2"
dump = { path = "../dump" }
enum-iterator = "2.1.0"
enum-iterator = "2.3.0"
file-store = { path = "../file-store" }
flate2 = "1.1.2"
indexmap = "2.9.0"
flate2 = "1.1.5"
indexmap = "2.12.0"
meilisearch-auth = { path = "../meilisearch-auth" }
meilisearch-types = { path = "../meilisearch-types" }
memmap2 = "0.9.7"
memmap2 = "0.9.9"
page_size = "0.6.0"
rayon = "1.10.0"
rayon = "1.11.0"
roaring = { version = "0.10.12", features = ["serde"] }
serde = { version = "1.0.219", features = ["derive"] }
serde_json = { version = "1.0.140", features = ["preserve_order"] }
serde = { version = "1.0.228", features = ["derive"] }
serde_json = { version = "1.0.145", features = ["preserve_order"] }
tar = "0.4.44"
synchronoise = "1.0.1"
tempfile = "3.20.0"
thiserror = "2.0.12"
time = { version = "0.3.41", features = [
tempfile = "3.23.0"
thiserror = "2.0.17"
time = { version = "0.3.44", features = [
"serde-well-known",
"formatting",
"parsing",
@@ -45,11 +45,11 @@ time = { version = "0.3.41", features = [
] }
tracing = "0.1.41"
ureq = "2.12.1"
uuid = { version = "1.17.0", features = ["serde", "v4"] }
uuid = { version = "1.18.1", features = ["serde", "v4"] }
backoff = "0.4.0"
reqwest = { version = "0.12.23", features = ["rustls-tls", "http2"], default-features = false }
reqwest = { version = "0.12.24", features = ["rustls-tls", "http2"], default-features = false }
rusty-s3 = "0.8.1"
tokio = { version = "1.47.1", features = ["full"] }
tokio = { version = "1.48.0", features = ["full"] }
[dev-dependencies]
big_s = "1.0.2"

View File

@@ -1,9 +1,9 @@
use std::sync::{Arc, RwLock};
use meilisearch_types::enterprise_edition::network::Network;
use meilisearch_types::features::{InstanceTogglableFeatures, RuntimeTogglableFeatures};
use meilisearch_types::heed::types::{SerdeJson, Str};
use meilisearch_types::heed::{Database, Env, RwTxn, WithoutTls};
use meilisearch_types::network::Network;
use crate::error::FeatureNotEnabledError;
use crate::Result;

View File

@@ -54,7 +54,6 @@ pub use features::RoFeatures;
use flate2::bufread::GzEncoder;
use flate2::Compression;
use meilisearch_types::batches::Batch;
use meilisearch_types::enterprise_edition::network::Network;
use meilisearch_types::features::{
ChatCompletionSettings, InstanceTogglableFeatures, RuntimeTogglableFeatures,
};
@@ -67,6 +66,7 @@ use meilisearch_types::milli::vector::{
Embedder, EmbedderOptions, RuntimeEmbedder, RuntimeEmbedders, RuntimeFragment,
};
use meilisearch_types::milli::{self, Index};
use meilisearch_types::network::Network;
use meilisearch_types::task_view::TaskView;
use meilisearch_types::tasks::{KindWithContent, Task, TaskNetwork};
use meilisearch_types::webhooks::{Webhook, WebhooksDumpView, WebhooksView};

View File

@@ -438,12 +438,15 @@ async fn multipart_stream_to_s3(
db_name: String,
reader: std::io::PipeReader,
) -> Result<(), Error> {
use std::{collections::VecDeque, os::fd::OwnedFd, path::PathBuf};
use std::collections::VecDeque;
use std::io;
use std::os::fd::OwnedFd;
use std::path::PathBuf;
use bytes::{Bytes, BytesMut};
use reqwest::{Client, Response};
use rusty_s3::S3Action as _;
use rusty_s3::{actions::CreateMultipartUpload, Bucket, BucketError, Credentials, UrlStyle};
use rusty_s3::actions::CreateMultipartUpload;
use rusty_s3::{Bucket, BucketError, Credentials, S3Action as _, UrlStyle};
use tokio::task::JoinHandle;
let reader = OwnedFd::from(reader);
@@ -517,7 +520,6 @@ async fn multipart_stream_to_s3(
while buffer.len() < (s3_multipart_part_size as usize / 2) {
// Wait for the pipe to be readable
use std::io;
reader.readable().await?;
match reader.try_read_buf(&mut buffer) {
@@ -581,15 +583,17 @@ async fn multipart_stream_to_s3(
async move {
match client.post(url).body(body).send().await {
Ok(resp) if resp.status().is_client_error() => {
resp.error_for_status().map_err(backoff::Error::Permanent)
Err(backoff::Error::Permanent(Error::S3Error {
status: resp.status(),
body: resp.text().await.unwrap_or_default(),
}))
}
Ok(resp) => Ok(resp),
Err(e) => Err(backoff::Error::transient(e)),
Err(e) => Err(backoff::Error::transient(Error::S3HttpError(e))),
}
}
})
.await
.map_err(Error::S3HttpError)?;
.await?;
let status = resp.status();
let body = resp.text().await.map_err(|e| Error::S3Error { status, body: e.to_string() })?;

View File

@@ -6,7 +6,7 @@ source: crates/index-scheduler/src/scheduler/test_failure.rs
[]
----------------------------------------------------------------------
### All Tasks:
0 {uid: 0, batch_uid: 0, status: succeeded, details: { from: (1, 12, 0), to: (1, 26, 0) }, kind: UpgradeDatabase { from: (1, 12, 0) }}
0 {uid: 0, batch_uid: 0, status: succeeded, details: { from: (1, 12, 0), to: (1, 28, 2) }, kind: UpgradeDatabase { from: (1, 12, 0) }}
1 {uid: 1, batch_uid: 1, status: succeeded, details: { primary_key: Some("mouse"), old_new_uid: None, new_index_uid: None }, kind: IndexCreation { index_uid: "catto", primary_key: Some("mouse") }}
2 {uid: 2, batch_uid: 2, status: succeeded, details: { primary_key: Some("bone"), old_new_uid: None, new_index_uid: None }, kind: IndexCreation { index_uid: "doggo", primary_key: Some("bone") }}
3 {uid: 3, batch_uid: 3, status: failed, error: ResponseError { code: 200, message: "Index `doggo` already exists.", error_code: "index_already_exists", error_type: "invalid_request", error_link: "https://docs.meilisearch.com/errors#index_already_exists" }, details: { primary_key: Some("bone"), old_new_uid: None, new_index_uid: None }, kind: IndexCreation { index_uid: "doggo", primary_key: Some("bone") }}
@@ -57,7 +57,7 @@ girafo: { number_of_documents: 0, field_distribution: {} }
[timestamp] [4,]
----------------------------------------------------------------------
### All Batches:
0 {uid: 0, details: {"upgradeFrom":"v1.12.0","upgradeTo":"v1.26.0"}, stats: {"totalNbTasks":1,"status":{"succeeded":1},"types":{"upgradeDatabase":1},"indexUids":{}}, stop reason: "stopped after the last task of type `upgradeDatabase` because they cannot be batched with tasks of any other type.", }
0 {uid: 0, details: {"upgradeFrom":"v1.12.0","upgradeTo":"v1.28.2"}, stats: {"totalNbTasks":1,"status":{"succeeded":1},"types":{"upgradeDatabase":1},"indexUids":{}}, stop reason: "stopped after the last task of type `upgradeDatabase` because they cannot be batched with tasks of any other type.", }
1 {uid: 1, details: {"primaryKey":"mouse"}, stats: {"totalNbTasks":1,"status":{"succeeded":1},"types":{"indexCreation":1},"indexUids":{"catto":1}}, stop reason: "created batch containing only task with id 1 of type `indexCreation` that cannot be batched with any other task.", }
2 {uid: 2, details: {"primaryKey":"bone"}, stats: {"totalNbTasks":1,"status":{"succeeded":1},"types":{"indexCreation":1},"indexUids":{"doggo":1}}, stop reason: "created batch containing only task with id 2 of type `indexCreation` that cannot be batched with any other task.", }
3 {uid: 3, details: {"primaryKey":"bone"}, stats: {"totalNbTasks":1,"status":{"failed":1},"types":{"indexCreation":1},"indexUids":{"doggo":1}}, stop reason: "created batch containing only task with id 3 of type `indexCreation` that cannot be batched with any other task.", }

View File

@@ -6,7 +6,7 @@ source: crates/index-scheduler/src/scheduler/test_failure.rs
[]
----------------------------------------------------------------------
### All Tasks:
0 {uid: 0, status: enqueued, details: { from: (1, 12, 0), to: (1, 26, 0) }, kind: UpgradeDatabase { from: (1, 12, 0) }}
0 {uid: 0, status: enqueued, details: { from: (1, 12, 0), to: (1, 28, 2) }, kind: UpgradeDatabase { from: (1, 12, 0) }}
----------------------------------------------------------------------
### Status:
enqueued [0,]

View File

@@ -6,7 +6,7 @@ source: crates/index-scheduler/src/scheduler/test_failure.rs
[]
----------------------------------------------------------------------
### All Tasks:
0 {uid: 0, status: enqueued, details: { from: (1, 12, 0), to: (1, 26, 0) }, kind: UpgradeDatabase { from: (1, 12, 0) }}
0 {uid: 0, status: enqueued, details: { from: (1, 12, 0), to: (1, 28, 2) }, kind: UpgradeDatabase { from: (1, 12, 0) }}
1 {uid: 1, status: enqueued, details: { primary_key: Some("mouse"), old_new_uid: None, new_index_uid: None }, kind: IndexCreation { index_uid: "catto", primary_key: Some("mouse") }}
----------------------------------------------------------------------
### Status:

View File

@@ -6,7 +6,7 @@ source: crates/index-scheduler/src/scheduler/test_failure.rs
[]
----------------------------------------------------------------------
### All Tasks:
0 {uid: 0, batch_uid: 0, status: failed, error: ResponseError { code: 200, message: "Planned failure for tests.", error_code: "internal", error_type: "internal", error_link: "https://docs.meilisearch.com/errors#internal" }, details: { from: (1, 12, 0), to: (1, 26, 0) }, kind: UpgradeDatabase { from: (1, 12, 0) }}
0 {uid: 0, batch_uid: 0, status: failed, error: ResponseError { code: 200, message: "Planned failure for tests.", error_code: "internal", error_type: "internal", error_link: "https://docs.meilisearch.com/errors#internal" }, details: { from: (1, 12, 0), to: (1, 28, 2) }, kind: UpgradeDatabase { from: (1, 12, 0) }}
1 {uid: 1, status: enqueued, details: { primary_key: Some("mouse"), old_new_uid: None, new_index_uid: None }, kind: IndexCreation { index_uid: "catto", primary_key: Some("mouse") }}
----------------------------------------------------------------------
### Status:
@@ -37,7 +37,7 @@ catto [1,]
[timestamp] [0,]
----------------------------------------------------------------------
### All Batches:
0 {uid: 0, details: {"upgradeFrom":"v1.12.0","upgradeTo":"v1.26.0"}, stats: {"totalNbTasks":1,"status":{"failed":1},"types":{"upgradeDatabase":1},"indexUids":{}}, stop reason: "stopped after the last task of type `upgradeDatabase` because they cannot be batched with tasks of any other type.", }
0 {uid: 0, details: {"upgradeFrom":"v1.12.0","upgradeTo":"v1.28.2"}, stats: {"totalNbTasks":1,"status":{"failed":1},"types":{"upgradeDatabase":1},"indexUids":{}}, stop reason: "stopped after the last task of type `upgradeDatabase` because they cannot be batched with tasks of any other type.", }
----------------------------------------------------------------------
### Batch to tasks mapping:
0 [0,]

View File

@@ -6,7 +6,7 @@ source: crates/index-scheduler/src/scheduler/test_failure.rs
[]
----------------------------------------------------------------------
### All Tasks:
0 {uid: 0, batch_uid: 0, status: failed, error: ResponseError { code: 200, message: "Planned failure for tests.", error_code: "internal", error_type: "internal", error_link: "https://docs.meilisearch.com/errors#internal" }, details: { from: (1, 12, 0), to: (1, 26, 0) }, kind: UpgradeDatabase { from: (1, 12, 0) }}
0 {uid: 0, batch_uid: 0, status: failed, error: ResponseError { code: 200, message: "Planned failure for tests.", error_code: "internal", error_type: "internal", error_link: "https://docs.meilisearch.com/errors#internal" }, details: { from: (1, 12, 0), to: (1, 28, 2) }, kind: UpgradeDatabase { from: (1, 12, 0) }}
1 {uid: 1, status: enqueued, details: { primary_key: Some("mouse"), old_new_uid: None, new_index_uid: None }, kind: IndexCreation { index_uid: "catto", primary_key: Some("mouse") }}
2 {uid: 2, status: enqueued, details: { primary_key: Some("bone"), old_new_uid: None, new_index_uid: None }, kind: IndexCreation { index_uid: "doggo", primary_key: Some("bone") }}
----------------------------------------------------------------------
@@ -40,7 +40,7 @@ doggo [2,]
[timestamp] [0,]
----------------------------------------------------------------------
### All Batches:
0 {uid: 0, details: {"upgradeFrom":"v1.12.0","upgradeTo":"v1.26.0"}, stats: {"totalNbTasks":1,"status":{"failed":1},"types":{"upgradeDatabase":1},"indexUids":{}}, stop reason: "stopped after the last task of type `upgradeDatabase` because they cannot be batched with tasks of any other type.", }
0 {uid: 0, details: {"upgradeFrom":"v1.12.0","upgradeTo":"v1.28.2"}, stats: {"totalNbTasks":1,"status":{"failed":1},"types":{"upgradeDatabase":1},"indexUids":{}}, stop reason: "stopped after the last task of type `upgradeDatabase` because they cannot be batched with tasks of any other type.", }
----------------------------------------------------------------------
### Batch to tasks mapping:
0 [0,]

View File

@@ -6,7 +6,7 @@ source: crates/index-scheduler/src/scheduler/test_failure.rs
[]
----------------------------------------------------------------------
### All Tasks:
0 {uid: 0, batch_uid: 0, status: succeeded, details: { from: (1, 12, 0), to: (1, 26, 0) }, kind: UpgradeDatabase { from: (1, 12, 0) }}
0 {uid: 0, batch_uid: 0, status: succeeded, details: { from: (1, 12, 0), to: (1, 28, 2) }, kind: UpgradeDatabase { from: (1, 12, 0) }}
1 {uid: 1, status: enqueued, details: { primary_key: Some("mouse"), old_new_uid: None, new_index_uid: None }, kind: IndexCreation { index_uid: "catto", primary_key: Some("mouse") }}
2 {uid: 2, status: enqueued, details: { primary_key: Some("bone"), old_new_uid: None, new_index_uid: None }, kind: IndexCreation { index_uid: "doggo", primary_key: Some("bone") }}
3 {uid: 3, status: enqueued, details: { primary_key: Some("bone"), old_new_uid: None, new_index_uid: None }, kind: IndexCreation { index_uid: "doggo", primary_key: Some("bone") }}
@@ -43,7 +43,7 @@ doggo [2,3,]
[timestamp] [0,]
----------------------------------------------------------------------
### All Batches:
0 {uid: 0, details: {"upgradeFrom":"v1.12.0","upgradeTo":"v1.26.0"}, stats: {"totalNbTasks":1,"status":{"succeeded":1},"types":{"upgradeDatabase":1},"indexUids":{}}, stop reason: "stopped after the last task of type `upgradeDatabase` because they cannot be batched with tasks of any other type.", }
0 {uid: 0, details: {"upgradeFrom":"v1.12.0","upgradeTo":"v1.28.2"}, stats: {"totalNbTasks":1,"status":{"succeeded":1},"types":{"upgradeDatabase":1},"indexUids":{}}, stop reason: "stopped after the last task of type `upgradeDatabase` because they cannot be batched with tasks of any other type.", }
----------------------------------------------------------------------
### Batch to tasks mapping:
0 [0,]

View File

@@ -50,6 +50,8 @@ pub fn upgrade_index_scheduler(
(1, 24, _) => 0,
(1, 25, _) => 0,
(1, 26, _) => 0,
(1, 27, _) => 0,
(1, 28, _) => 0,
(major, minor, patch) => {
if major > current_major
|| (major == current_major && minor > current_minor)

View File

@@ -15,7 +15,7 @@ license.workspace = true
serde_json = "1.0"
[dev-dependencies]
criterion = "0.6.0"
criterion = "0.7.0"
[[bench]]
name = "depth"

View File

@@ -13,7 +13,7 @@ license.workspace = true
[dependencies]
# fixed version due to format breakages in v1.40
insta = { version = "=1.39.0", features = ["json", "redactions"] }
md5 = "0.7.0"
md5 = "0.8.0"
once_cell = "1.21"
regex-lite = "0.1.6"
uuid = { version = "1.17.0", features = ["v4"] }
regex-lite = "0.1.8"
uuid = { version = "1.18.1", features = ["v4"] }

View File

@@ -12,15 +12,15 @@ license.workspace = true
[dependencies]
base64 = "0.22.1"
enum-iterator = "2.1.0"
enum-iterator = "2.3.0"
hmac = "0.12.1"
maplit = "1.0.2"
meilisearch-types = { path = "../meilisearch-types" }
rand = "0.8.5"
roaring = { version = "0.10.12", features = ["serde"] }
serde = { version = "1.0.219", features = ["derive"] }
serde_json = { version = "1.0.140", features = ["preserve_order"] }
serde = { version = "1.0.228", features = ["derive"] }
serde_json = { version = "1.0.145", features = ["preserve_order"] }
sha2 = "0.10.9"
thiserror = "2.0.12"
time = { version = "0.3.41", features = ["serde-well-known", "formatting", "parsing", "macros"] }
uuid = { version = "1.17.0", features = ["serde", "v4"] }
thiserror = "2.0.17"
time = { version = "0.3.44", features = ["serde-well-known", "formatting", "parsing", "macros"] }
uuid = { version = "1.18.1", features = ["serde", "v4"] }

View File

@@ -11,38 +11,38 @@ edition.workspace = true
license.workspace = true
[dependencies]
actix-web = { version = "4.11.0", default-features = false }
anyhow = "1.0.98"
bumpalo = "3.18.1"
actix-web = { version = "4.12.0", default-features = false }
anyhow = "1.0.100"
bumpalo = "3.19.0"
bumparaw-collections = "0.1.4"
byte-unit = { version = "5.1.6", features = ["serde"] }
convert_case = "0.8.0"
csv = "1.3.1"
deserr = { version = "0.6.3", features = ["actix-web"] }
convert_case = "0.9.0"
csv = "1.4.0"
deserr = { version = "0.6.4", features = ["actix-web"] }
either = { version = "1.15.0", features = ["serde"] }
enum-iterator = "2.1.0"
enum-iterator = "2.3.0"
file-store = { path = "../file-store" }
flate2 = "1.1.2"
flate2 = "1.1.5"
fst = "0.4.7"
memmap2 = "0.9.7"
memmap2 = "0.9.9"
milli = { path = "../milli" }
roaring = { version = "0.10.12", features = ["serde"] }
rustc-hash = "2.1.1"
serde = { version = "1.0.219", features = ["derive"] }
serde = { version = "1.0.228", features = ["derive"] }
serde-cs = "0.2.4"
serde_json = { version = "1.0.140", features = ["preserve_order"] }
serde_json = { version = "1.0.145", features = ["preserve_order"] }
tar = "0.4.44"
tempfile = "3.20.0"
thiserror = "2.0.12"
time = { version = "0.3.41", features = [
tempfile = "3.23.0"
thiserror = "2.0.17"
time = { version = "0.3.44", features = [
"serde-well-known",
"formatting",
"parsing",
"macros",
] }
tokio = "1.45"
tokio = "1.48"
utoipa = { version = "5.4.0", features = ["macros"] }
uuid = { version = "1.17.0", features = ["serde", "v4"] }
uuid = { version = "1.18.1", features = ["serde", "v4"] }
[dev-dependencies]
# fixed version due to format breakages in v1.40
@@ -56,6 +56,9 @@ all-tokenizations = ["milli/all-tokenizations"]
# chinese specialized tokenization
chinese = ["milli/chinese"]
chinese-pinyin = ["milli/chinese-pinyin"]
enterprise = ["milli/enterprise"]
# hebrew specialized tokenization
hebrew = ["milli/hebrew"]
# japanese specialized tokenization

View File

@@ -0,0 +1,16 @@
pub mod network {
use milli::update::new::indexer::current_edition::sharding::Shards;
use crate::network::Network;
impl Network {
pub fn shards(&self) -> Option<Shards> {
None
}
pub fn sharding(&self) -> bool {
// always false in CE
false
}
}
}

View File

@@ -3,21 +3,9 @@
// Use of this source code is governed by the Business Source License 1.1,
// as found in the LICENSE-EE file or at <https://mariadb.com/bsl11>
use std::collections::BTreeMap;
use milli::update::new::indexer::enterprise_edition::sharding::Shards;
use serde::{Deserialize, Serialize};
#[derive(Serialize, Deserialize, Debug, Clone, PartialEq, Eq, Default)]
#[serde(rename_all = "camelCase")]
pub struct Network {
#[serde(default, rename = "self")]
pub local: Option<String>,
#[serde(default)]
pub remotes: BTreeMap<String, Remote>,
#[serde(default)]
pub sharding: bool,
}
use crate::network::Network;
impl Network {
pub fn shards(&self) -> Option<Shards> {
@@ -34,14 +22,8 @@ impl Network {
None
}
}
}
#[derive(Serialize, Deserialize, Debug, Clone, PartialEq, Eq)]
#[serde(rename_all = "camelCase")]
pub struct Remote {
pub url: String,
#[serde(default)]
pub search_api_key: Option<String>,
#[serde(default)]
pub write_api_key: Option<String>,
pub fn sharding(&self) -> bool {
self.sharding
}
}

View File

@@ -433,6 +433,7 @@ InvalidChatCompletionSearchQueryParamPrompt , InvalidRequest , BAD_REQU
InvalidChatCompletionSearchFilterParamPrompt , InvalidRequest , BAD_REQUEST ;
InvalidChatCompletionSearchIndexUidParamPrompt , InvalidRequest , BAD_REQUEST ;
InvalidChatCompletionPreQueryPrompt , InvalidRequest , BAD_REQUEST ;
RequiresEnterpriseEdition , InvalidRequest , UNAVAILABLE_FOR_LEGAL_REASONS ;
// Webhooks
InvalidWebhooks , InvalidRequest , BAD_REQUEST ;
InvalidWebhookUrl , InvalidRequest , BAD_REQUEST ;

View File

@@ -2,10 +2,17 @@
pub mod batch_view;
pub mod batches;
#[cfg(not(feature = "enterprise"))]
pub mod community_edition;
pub mod compression;
pub mod deserr;
pub mod document_formats;
#[cfg(feature = "enterprise")]
pub mod enterprise_edition;
#[cfg(not(feature = "enterprise"))]
pub use community_edition as current_edition;
#[cfg(feature = "enterprise")]
pub use enterprise_edition as current_edition;
pub mod error;
pub mod facet_values_sort;
pub mod features;
@@ -13,6 +20,7 @@ pub mod index_uid;
pub mod index_uid_pattern;
pub mod keys;
pub mod locales;
pub mod network;
pub mod settings;
pub mod star_or;
pub mod task_view;

View File

@@ -0,0 +1,23 @@
use serde::{Deserialize, Serialize};
use std::collections::BTreeMap;
#[derive(Serialize, Deserialize, Debug, Clone, PartialEq, Eq, Default)]
#[serde(rename_all = "camelCase")]
pub struct Network {
#[serde(default, rename = "self")]
pub local: Option<String>,
#[serde(default)]
pub remotes: BTreeMap<String, Remote>,
#[serde(default)]
pub sharding: bool,
}
#[derive(Serialize, Deserialize, Debug, Clone, PartialEq, Eq)]
#[serde(rename_all = "camelCase")]
pub struct Remote {
pub url: String,
#[serde(default)]
pub search_api_key: Option<String>,
#[serde(default)]
pub write_api_key: Option<String>,
}

View File

@@ -14,91 +14,91 @@ default-run = "meilisearch"
[dependencies]
actix-cors = "0.7.1"
actix-http = { version = "3.11.0", default-features = false, features = [
actix-http = { version = "3.11.2", default-features = false, features = [
"compress-brotli",
"compress-gzip",
"rustls-0_23",
] }
actix-utils = "3.0.1"
actix-web = { version = "4.11.0", default-features = false, features = [
actix-web = { version = "4.12.0", default-features = false, features = [
"macros",
"compress-brotli",
"compress-gzip",
"cookies",
"rustls-0_23",
] }
anyhow = { version = "1.0.98", features = ["backtrace"] }
bstr = "1.12.0"
anyhow = { version = "1.0.100", features = ["backtrace"] }
bstr = "1.12.1"
byte-unit = { version = "5.1.6", features = ["serde"] }
bytes = "1.10.1"
bumpalo = "3.18.1"
clap = { version = "4.5.40", features = ["derive", "env"] }
bytes = "1.11.0"
bumpalo = "3.19.0"
clap = { version = "4.5.52", features = ["derive", "env"] }
crossbeam-channel = "0.5.15"
deserr = { version = "0.6.3", features = ["actix-web"] }
deserr = { version = "0.6.4", features = ["actix-web"] }
dump = { path = "../dump" }
either = "1.15.0"
file-store = { path = "../file-store" }
flate2 = "1.1.2"
flate2 = "1.1.5"
fst = "0.4.7"
futures = "0.3.31"
futures-util = "0.3.31"
index-scheduler = { path = "../index-scheduler" }
indexmap = { version = "2.9.0", features = ["serde"] }
is-terminal = "0.4.16"
indexmap = { version = "2.12.0", features = ["serde"] }
is-terminal = "0.4.17"
itertools = "0.14.0"
jsonwebtoken = "9.3.1"
lazy_static = "1.5.0"
meilisearch-auth = { path = "../meilisearch-auth" }
meilisearch-types = { path = "../meilisearch-types" }
memmap2 = "0.9.7"
mimalloc = { version = "0.1.47", default-features = false }
memmap2 = "0.9.9"
mimalloc = { version = "0.1.48", default-features = false }
mime = "0.3.17"
num_cpus = "1.17.0"
obkv = "0.3.0"
once_cell = "1.21.3"
ordered-float = "5.0.0"
parking_lot = "0.12.4"
ordered-float = "5.1.0"
parking_lot = "0.12.5"
permissive-json-pointer = { path = "../permissive-json-pointer" }
pin-project-lite = "0.2.16"
platform-dirs = "0.3.0"
prometheus = { version = "0.14.0", features = ["process"] }
rand = "0.8.5"
rayon = "1.10.0"
regex = "1.11.1"
reqwest = { version = "0.12.20", features = [
rayon = "1.11.0"
regex = "1.12.2"
reqwest = { version = "0.12.24", features = [
"rustls-tls",
"json",
], default-features = false }
rustls = { version = "0.23.28", features = ["ring"], default-features = false }
rustls-pki-types = { version = "1.12.0", features = ["alloc"] }
rustls = { version = "0.23.35", features = ["ring"], default-features = false }
rustls-pki-types = { version = "1.13.0", features = ["alloc"] }
rustls-pemfile = "2.2.0"
segment = { version = "0.2.6" }
serde = { version = "1.0.219", features = ["derive"] }
serde_json = { version = "1.0.140", features = ["preserve_order"] }
serde = { version = "1.0.228", features = ["derive"] }
serde_json = { version = "1.0.145", features = ["preserve_order"] }
sha2 = "0.10.9"
siphasher = "1.0.1"
slice-group-by = "0.3.1"
static-files = { version = "0.2.5", optional = true }
sysinfo = "0.35.2"
static-files = { version = "0.3.1", optional = true }
sysinfo = "0.37.2"
tar = "0.4.44"
tempfile = "3.20.0"
thiserror = "2.0.12"
time = { version = "0.3.41", features = [
tempfile = "3.23.0"
thiserror = "2.0.17"
time = { version = "0.3.44", features = [
"serde-well-known",
"formatting",
"parsing",
"macros",
] }
tokio = { version = "1.45.1", features = ["full"] }
toml = "0.8.23"
uuid = { version = "1.18.0", features = ["serde", "v4", "v7"] }
tokio = { version = "1.48.0", features = ["full"] }
toml = "0.9.8"
uuid = { version = "1.18.1", features = ["serde", "v4", "v7"] }
serde_urlencoded = "0.7.1"
termcolor = "1.4.1"
url = { version = "2.5.4", features = ["serde"] }
url = { version = "2.5.7", features = ["serde"] }
tracing = "0.1.41"
tracing-subscriber = { version = "0.3.20", features = ["json"] }
tracing-trace = { version = "0.1.0", path = "../tracing-trace" }
tracing-actix-web = "0.7.18"
tracing-actix-web = "0.7.19"
build-info = { version = "1.7.0", path = "../build-info" }
roaring = "0.10.12"
mopa-maintained = "0.2.3"
@@ -114,35 +114,35 @@ utoipa = { version = "5.4.0", features = [
utoipa-scalar = { version = "0.3.0", optional = true, features = ["actix-web"] }
async-openai = { git = "https://github.com/meilisearch/async-openai", branch = "better-error-handling" }
secrecy = "0.10.3"
actix-web-lab = { version = "0.24.1", default-features = false }
actix-web-lab = { version = "0.24.3", default-features = false }
urlencoding = "2.1.3"
backoff = { version = "0.4.0", features = ["tokio"] }
humantime = { version = "2.3.0", default-features = false }
[dev-dependencies]
actix-rt = "2.10.0"
brotli = "8.0.1"
actix-rt = "2.11.0"
brotli = "8.0.2"
# fixed version due to format breakages in v1.40
insta = { version = "=1.39.0", features = ["redactions"] }
manifest-dir-macros = "0.1.18"
maplit = "1.0.2"
meili-snap = { path = "../meili-snap" }
temp-env = "0.3.6"
wiremock = "0.6.3"
wiremock = "0.6.5"
yaup = "0.3.1"
[build-dependencies]
anyhow = { version = "1.0.98", optional = true }
cargo_toml = { version = "0.22.1", optional = true }
anyhow = { version = "1.0.100", optional = true }
cargo_toml = { version = "0.22.3", optional = true }
hex = { version = "0.4.3", optional = true }
reqwest = { version = "0.12.20", features = [
reqwest = { version = "0.12.24", features = [
"blocking",
"rustls-tls",
], default-features = false, optional = true }
sha-1 = { version = "0.10.1", optional = true }
static-files = { version = "0.2.5", optional = true }
tempfile = { version = "3.20.0", optional = true }
zip = { version = "4.1.0", optional = true }
static-files = { version = "0.3.1", optional = true }
tempfile = { version = "3.23.0", optional = true }
zip = { version = "6.0.0", optional = true }
[features]
default = ["meilisearch-types/all-tokenizations", "mini-dashboard"]
@@ -160,6 +160,7 @@ mini-dashboard = [
]
chinese = ["meilisearch-types/chinese"]
chinese-pinyin = ["meilisearch-types/chinese-pinyin"]
enterprise = ["meilisearch-types/enterprise"]
hebrew = ["meilisearch-types/hebrew"]
japanese = ["meilisearch-types/japanese"]
korean = ["meilisearch-types/korean"]

View File

@@ -195,7 +195,7 @@ struct Infos {
experimental_enable_logs_route: bool,
experimental_reduce_indexing_memory_usage: bool,
experimental_max_number_of_batched_tasks: usize,
experimental_limit_batched_tasks_total_size: u64,
experimental_limit_batched_tasks_total_size: Option<u64>,
experimental_network: bool,
experimental_multimodal: bool,
experimental_chat_completions: bool,
@@ -359,7 +359,7 @@ impl Infos {
http_payload_size_limit,
experimental_max_number_of_batched_tasks,
experimental_limit_batched_tasks_total_size:
experimental_limit_batched_tasks_total_size.into(),
experimental_limit_batched_tasks_total_size.map(|size| size.as_u64()),
task_queue_webhook: task_webhook_url.is_some(),
task_webhook_authorization_header: task_webhook_authorization_header.is_some(),
log_level: log_level.to_string(),

View File

@@ -230,7 +230,17 @@ pub fn setup_meilisearch(
cleanup_enabled: !opt.experimental_replication_parameters,
max_number_of_tasks: 1_000_000,
max_number_of_batched_tasks: opt.experimental_max_number_of_batched_tasks,
batched_tasks_size_limit: opt.experimental_limit_batched_tasks_total_size.into(),
batched_tasks_size_limit: opt.experimental_limit_batched_tasks_total_size.map_or_else(
|| {
opt.indexer_options
.max_indexing_memory
// By default, we use half of the available memory to determine the size of batched tasks
.map_or(u64::MAX, |mem| mem.as_u64() / 2)
// And never exceed 10 GiB when we infer the limit
.min(10 * 1024 * 1024 * 1024)
},
|size| size.as_u64(),
),
index_growth_amount: byte_unit::Byte::from_str("10GiB").unwrap().as_u64() as usize,
index_count: DEFAULT_INDEX_COUNT,
instance_features: opt.to_instance_features(),

View File

@@ -1,7 +1,8 @@
use lazy_static::lazy_static;
use prometheus::{
opts, register_gauge, register_histogram_vec, register_int_counter_vec, register_int_gauge,
register_int_gauge_vec, Gauge, HistogramVec, IntCounterVec, IntGauge, IntGaugeVec,
opts, register_gauge, register_gauge_vec, register_histogram_vec, register_int_counter_vec,
register_int_gauge, register_int_gauge_vec, Gauge, GaugeVec, HistogramVec, IntCounterVec,
IntGauge, IntGaugeVec,
};
lazy_static! {
@@ -73,6 +74,20 @@ lazy_static! {
&["kind", "value"]
)
.expect("Can't create a metric");
pub static ref MEILISEARCH_BATCH_RUNNING_PROGRESS_TRACE: GaugeVec = register_gauge_vec!(
opts!("meilisearch_batch_running_progress_trace", "The currently running progress trace"),
&["batch_uid", "step_name"]
)
.expect("Can't create a metric");
pub static ref MEILISEARCH_LAST_FINISHED_BATCHES_PROGRESS_TRACE_MS: IntGaugeVec =
register_int_gauge_vec!(
opts!(
"meilisearch_last_finished_batches_progress_trace_ms",
"The last few batches progress trace in milliseconds"
),
&["batch_uid", "step_name"]
)
.expect("Can't create a metric");
pub static ref MEILISEARCH_LAST_UPDATE: IntGauge =
register_int_gauge!(opts!("meilisearch_last_update", "Meilisearch Last Update"))
.expect("Can't create a metric");

View File

@@ -473,11 +473,14 @@ pub struct Opt {
#[serde(default = "default_limit_batched_tasks")]
pub experimental_max_number_of_batched_tasks: usize,
/// Experimentally reduces the maximum total size, in bytes, of tasks that will be processed at once,
/// see: <https://github.com/orgs/meilisearch/discussions/801>
#[clap(long, env = MEILI_EXPERIMENTAL_LIMIT_BATCHED_TASKS_TOTAL_SIZE, default_value_t = default_limit_batched_tasks_total_size())]
#[serde(default = "default_limit_batched_tasks_total_size")]
pub experimental_limit_batched_tasks_total_size: Byte,
/// Experimentally controls the maximum total size, in bytes, of tasks that will be processed
/// simultaneously. When unspecified, defaults to half of the maximum indexing memory and
/// clamped to 10 GiB.
///
/// See: <https://github.com/orgs/meilisearch/discussions/801>
#[clap(long, env = MEILI_EXPERIMENTAL_LIMIT_BATCHED_TASKS_TOTAL_SIZE)]
#[serde(default)]
pub experimental_limit_batched_tasks_total_size: Option<Byte>,
/// Enables experimental caching of search query embeddings. The value represents the maximal number of entries in the cache of each
/// distinct embedder.
@@ -701,10 +704,12 @@ impl Opt {
MEILI_EXPERIMENTAL_MAX_NUMBER_OF_BATCHED_TASKS,
experimental_max_number_of_batched_tasks.to_string(),
);
export_to_env_if_not_present(
MEILI_EXPERIMENTAL_LIMIT_BATCHED_TASKS_TOTAL_SIZE,
experimental_limit_batched_tasks_total_size.to_string(),
);
if let Some(limit) = experimental_limit_batched_tasks_total_size {
export_to_env_if_not_present(
MEILI_EXPERIMENTAL_LIMIT_BATCHED_TASKS_TOTAL_SIZE,
limit.to_string(),
);
}
export_to_env_if_not_present(
MEILI_EXPERIMENTAL_EMBEDDING_CACHE_ENTRIES,
experimental_embedding_cache_entries.to_string(),
@@ -1273,10 +1278,6 @@ fn default_limit_batched_tasks() -> usize {
usize::MAX
}
fn default_limit_batched_tasks_total_size() -> Byte {
Byte::from_u64(u64::MAX)
}
fn default_embedding_cache_entries() -> usize {
0
}

View File

@@ -1,14 +1,14 @@
use crate::search::{Personalize, SearchResult};
use meilisearch_types::{
error::{Code, ErrorCode, ResponseError},
milli::TimeBudget,
};
use std::time::Duration;
use meilisearch_types::error::{Code, ErrorCode, ResponseError};
use meilisearch_types::milli::TimeBudget;
use rand::Rng;
use reqwest::Client;
use serde::{Deserialize, Serialize};
use std::time::Duration;
use tracing::{debug, info, warn};
use crate::search::{Personalize, SearchResult};
const COHERE_API_URL: &str = "https://api.cohere.ai/v1/rerank";
const MAX_RETRIES: u32 = 10;

View File

@@ -0,0 +1,39 @@
pub mod proxy {
use std::fs::File;
use actix_web::HttpRequest;
use index_scheduler::IndexScheduler;
use crate::error::MeilisearchHttpError;
pub enum Body<T: serde::Serialize> {
NdJsonPayload,
Inline(T),
None,
}
impl Body<()> {
pub fn with_ndjson_payload(_file: File) -> Self {
Self::NdJsonPayload
}
pub fn none() -> Self {
Self::None
}
}
pub const PROXY_ORIGIN_REMOTE_HEADER: &str = "Meili-Proxy-Origin-Remote";
pub const PROXY_ORIGIN_TASK_UID_HEADER: &str = "Meili-Proxy-Origin-TaskUid";
pub async fn proxy<T: serde::Serialize>(
_index_scheduler: &IndexScheduler,
_index_uid: &str,
_req: &HttpRequest,
_network: meilisearch_types::network::Network,
_body: Body<T>,
_task: &meilisearch_types::tasks::Task,
) -> Result<(), MeilisearchHttpError> {
Ok(())
}
}

View File

@@ -45,7 +45,7 @@ use crate::extractors::authentication::policies::*;
use crate::extractors::authentication::GuardedData;
use crate::extractors::payload::Payload;
use crate::extractors::sequential_extractor::SeqHandler;
use crate::routes::indexes::enterprise_edition::proxy::{proxy, Body};
use crate::routes::indexes::current_edition::proxy::{proxy, Body};
use crate::routes::indexes::search::fix_sort_query_parameters;
use crate::routes::{
get_task_id, is_dry_run, PaginationView, SummarizedTaskView, PAGINATION_DEFAULT_LIMIT,
@@ -367,7 +367,7 @@ pub async fn delete_document(
.await??
};
if network.sharding && !dry_run {
if network.sharding() && !dry_run {
proxy(&index_scheduler, &index_uid, &req, network, Body::none(), &task).await?;
}
@@ -1098,7 +1098,7 @@ async fn document_addition(
}
};
if network.sharding {
if network.sharding() {
if let Some(file) = file {
proxy(
&index_scheduler,
@@ -1222,7 +1222,7 @@ pub async fn delete_documents_batch(
.await??
};
if network.sharding && !dry_run {
if network.sharding() && !dry_run {
proxy(&index_scheduler, &index_uid, &req, network, Body::Inline(body), &task).await?;
}
@@ -1320,7 +1320,7 @@ pub async fn delete_documents_by_filter(
.await??
};
if network.sharding && !dry_run {
if network.sharding() && !dry_run {
proxy(&index_scheduler, &index_uid, &req, network, Body::Inline(filter), &task).await?;
}
@@ -1475,7 +1475,7 @@ pub async fn edit_documents_by_function(
.await??
};
if network.sharding && !dry_run {
if network.sharding() && !dry_run {
proxy(&index_scheduler, &index_uid, &req, network, Body::Inline(body), &task).await?;
}
@@ -1549,7 +1549,7 @@ pub async fn clear_all_documents(
.await??
};
if network.sharding && !dry_run {
if network.sharding() && !dry_run {
proxy(&index_scheduler, &index_uid, &req, network, Body::none(), &task).await?;
}

View File

@@ -52,7 +52,7 @@ pub async fn proxy<T: serde::Serialize>(
index_scheduler: &IndexScheduler,
index_uid: &str,
req: &HttpRequest,
network: meilisearch_types::enterprise_edition::network::Network,
network: meilisearch_types::network::Network,
body: Body<T>,
task: &meilisearch_types::tasks::Task,
) -> Result<(), MeilisearchHttpError> {

View File

@@ -30,7 +30,16 @@ use crate::Opt;
pub mod compact;
pub mod documents;
#[cfg(not(feature = "enterprise"))]
mod community_edition;
#[cfg(feature = "enterprise")]
mod enterprise_edition;
#[cfg(not(feature = "enterprise"))]
use community_edition as current_edition;
#[cfg(feature = "enterprise")]
use enterprise_edition as current_edition;
pub mod facet_search;
pub mod search;
mod search_analytics;
@@ -41,7 +50,7 @@ mod settings_analytics;
pub mod similar;
mod similar_analytics;
pub use enterprise_edition::proxy::{PROXY_ORIGIN_REMOTE_HEADER, PROXY_ORIGIN_TASK_UID_HEADER};
pub use current_edition::proxy::{PROXY_ORIGIN_REMOTE_HEADER, PROXY_ORIGIN_TASK_UID_HEADER};
#[derive(OpenApi)]
#[openapi(

View File

@@ -4,6 +4,7 @@ use index_scheduler::{IndexScheduler, Query};
use meilisearch_auth::AuthController;
use meilisearch_types::error::ResponseError;
use meilisearch_types::keys::actions;
use meilisearch_types::milli::progress::ProgressStepView;
use meilisearch_types::tasks::Status;
use prometheus::{Encoder, TextEncoder};
use time::OffsetDateTime;
@@ -38,6 +39,12 @@ pub fn configure(config: &mut web::ServiceConfig) {
# HELP meilisearch_db_size_bytes Meilisearch DB Size In Bytes
# TYPE meilisearch_db_size_bytes gauge
meilisearch_db_size_bytes 1130496
# HELP meilisearch_batch_running_progress_trace The currently running progress trace
# TYPE meilisearch_batch_running_progress_trace gauge
meilisearch_batch_running_progress_trace{batch_uid="0",step_name="document"} 0.710618582519409
meilisearch_batch_running_progress_trace{batch_uid="0",step_name="extracting word proximity"} 0.2222222222222222
meilisearch_batch_running_progress_trace{batch_uid="0",step_name="indexing"} 0.6666666666666666
meilisearch_batch_running_progress_trace{batch_uid="0",step_name="processing tasks"} 0
# HELP meilisearch_http_requests_total Meilisearch HTTP requests total
# TYPE meilisearch_http_requests_total counter
meilisearch_http_requests_total{method="GET",path="/metrics",status="400"} 1
@@ -61,6 +68,13 @@ meilisearch_http_response_time_seconds_bucket{method="GET",path="/metrics",le="1
meilisearch_http_response_time_seconds_bucket{method="GET",path="/metrics",le="+Inf"} 0
meilisearch_http_response_time_seconds_sum{method="GET",path="/metrics"} 0
meilisearch_http_response_time_seconds_count{method="GET",path="/metrics"} 0
# HELP meilisearch_last_finished_batches_progress_trace_ms The last few batches progress trace in milliseconds
# TYPE meilisearch_last_finished_batches_progress_trace_ms gauge
meilisearch_last_finished_batches_progress_trace_ms{batch_uid="0",step_name="processing tasks"} 19360
meilisearch_last_finished_batches_progress_trace_ms{batch_uid="0",step_name="processing tasks > computing document changes"} 368
meilisearch_last_finished_batches_progress_trace_ms{batch_uid="0",step_name="processing tasks > computing document changes > preparing payloads"} 367
meilisearch_last_finished_batches_progress_trace_ms{batch_uid="0",step_name="processing tasks > computing document changes > preparing payloads > payload"} 367
meilisearch_last_finished_batches_progress_trace_ms{batch_uid="0",step_name="processing tasks > indexing"} 18970
# HELP meilisearch_index_count Meilisearch Index Count
# TYPE meilisearch_index_count gauge
meilisearch_index_count 1
@@ -148,6 +162,50 @@ pub async fn get_metrics(
}
}
// Fetch and expose the current progressing step
crate::metrics::MEILISEARCH_BATCH_RUNNING_PROGRESS_TRACE.reset();
let (batches, _total) = index_scheduler.get_batches_from_authorized_indexes(
&Query { statuses: Some(vec![Status::Processing]), ..Query::default() },
auth_filters,
)?;
if let Some(batch) = batches.into_iter().next() {
let batch_uid = batch.uid.to_string();
if let Some(progress) = batch.progress {
for ProgressStepView { current_step, finished, total } in progress.steps {
crate::metrics::MEILISEARCH_BATCH_RUNNING_PROGRESS_TRACE
.with_label_values(&[batch_uid.as_str(), current_step.as_ref()])
// We return the completion ratio of the current step
.set(finished as f64 / total as f64);
}
}
}
crate::metrics::MEILISEARCH_LAST_FINISHED_BATCHES_PROGRESS_TRACE_MS.reset();
let (batches, _total) = index_scheduler.get_batches_from_authorized_indexes(
// Fetch the finished batches...
&Query {
statuses: Some(vec![Status::Succeeded, Status::Failed]),
limit: Some(1),
..Query::default()
},
auth_filters,
)?;
// ...and get the last batch only.
if let Some(batch) = batches.into_iter().next() {
let batch_uid = batch.uid.to_string();
for (step_name, duration_str) in batch.stats.progress_trace {
let Some(duration_str) = duration_str.as_str() else { continue };
match humantime::parse_duration(duration_str) {
Ok(duration) => {
crate::metrics::MEILISEARCH_LAST_FINISHED_BATCHES_PROGRESS_TRACE_MS
.with_label_values(&[&batch_uid, &step_name])
.set(duration.as_millis() as i64);
}
Err(e) => tracing::error!("Failed to parse duration: {e}"),
}
}
}
if let Some(last_update) = response.last_update {
crate::metrics::MEILISEARCH_LAST_UPDATE.set(last_update.unix_timestamp());
}

View File

@@ -7,7 +7,6 @@ use deserr::Deserr;
use index_scheduler::IndexScheduler;
use itertools::{EitherOrBoth, Itertools};
use meilisearch_types::deserr::DeserrJsonError;
use meilisearch_types::enterprise_edition::network::{Network as DbNetwork, Remote as DbRemote};
use meilisearch_types::error::deserr_codes::{
InvalidNetworkRemotes, InvalidNetworkSearchApiKey, InvalidNetworkSelf, InvalidNetworkSharding,
InvalidNetworkUrl, InvalidNetworkWriteApiKey,
@@ -15,6 +14,7 @@ use meilisearch_types::error::deserr_codes::{
use meilisearch_types::error::ResponseError;
use meilisearch_types::keys::actions;
use meilisearch_types::milli::update::Setting;
use meilisearch_types::network::{Network as DbNetwork, Remote as DbRemote};
use serde::Serialize;
use tracing::debug;
use utoipa::{OpenApi, ToSchema};
@@ -211,6 +211,16 @@ async fn patch_network(
let old_network = index_scheduler.network();
debug!(parameters = ?new_network, "Patch network");
#[cfg(not(feature = "enterprise"))]
if new_network.sharding.set().is_some() {
use meilisearch_types::error::Code;
return Err(ResponseError::from_msg(
"Meilisearch Enterprise Edition is required to set `network.sharding`".into(),
Code::RequiresEnterpriseEdition,
));
}
let merged_self = match new_network.local {
Setting::Set(new_self) => Some(new_self),
Setting::Reset => None,
@@ -312,6 +322,7 @@ async fn patch_network(
let merged_network =
DbNetwork { local: merged_self, remotes: merged_remotes, sharding: merged_sharding };
index_scheduler.put_network(merged_network.clone())?;
debug!(returns = ?merged_network, "Patch network");
Ok(HttpResponse::Ok().json(merged_network))

View File

@@ -9,12 +9,12 @@ use std::vec::{IntoIter, Vec};
use actix_http::StatusCode;
use index_scheduler::{IndexScheduler, RoFeatures};
use itertools::Itertools;
use meilisearch_types::enterprise_edition::network::{Network, Remote};
use meilisearch_types::error::ResponseError;
use meilisearch_types::milli::order_by_map::OrderByMap;
use meilisearch_types::milli::score_details::{ScoreDetails, WeightedScoreValue};
use meilisearch_types::milli::vector::Embedding;
use meilisearch_types::milli::{self, DocumentId, OrderBy, TimeBudget, DEFAULT_VALUES_PER_FACET};
use meilisearch_types::network::{Network, Remote};
use roaring::RoaringBitmap;
use tokio::task::JoinHandle;
use uuid::Uuid;

View File

@@ -1,6 +1,6 @@
pub use error::ProxySearchError;
use error::ReqwestErrorWithoutUrl;
use meilisearch_types::enterprise_edition::network::Remote;
use meilisearch_types::network::Remote;
use rand::Rng as _;
use reqwest::{Client, Response, StatusCode};
use serde::de::DeserializeOwned;

View File

@@ -18,10 +18,9 @@ use serde::{Deserialize, Serialize};
use utoipa::ToSchema;
use uuid::Uuid;
use crate::search::SearchMetadata;
use super::super::{ComputedFacets, FacetStats, HitsInfo, SearchHit, SearchQueryWithIndex};
use crate::milli::vector::Embedding;
use crate::search::SearchMetadata;
pub const DEFAULT_FEDERATED_WEIGHT: f64 = 1.0;

View File

@@ -1339,3 +1339,266 @@ async fn get_document_with_vectors() {
}
"###);
}
#[actix_rt::test]
async fn test_fetch_documents_pagination_with_sorting() {
let server = Server::new_shared();
let index = server.unique_index();
let (task, _code) = index.create(None).await;
server.wait_task(task.uid()).await.succeeded();
// Set name as sortable attribute
let (task, code) = index.update_settings_sortable_attributes(json!(["name"])).await;
assert_eq!(code, 202);
server.wait_task(task.uid()).await.succeeded();
let documents = json!((0..50)
.map(|i| json!({"id": i, "name": format!("doc_{:05}", std::cmp::min(i, 5))}))
.collect::<Vec<_>>());
// Add documents as described in the bug report
let (task, code) = index.add_documents(documents, None).await;
assert_eq!(code, 202);
server.wait_task(task.uid()).await.succeeded();
// Request 1 (first page): offset 0, limit 2
let (response, code) = index
.fetch_documents(json!({
"offset": 0,
"limit": 2,
"sort": ["name:asc"]
}))
.await;
assert_eq!(code, 200);
let results = response["results"].as_array().unwrap();
snapshot!(json_string!(results), @r###"
[
{
"id": 0,
"name": "doc_00000"
},
{
"id": 1,
"name": "doc_00001"
}
]
"###);
// Request 2 (second page): offset 2, limit 2
let (response, code) = index
.fetch_documents(json!({
"offset": 2,
"limit": 2,
"sort": ["name:asc"]
}))
.await;
assert_eq!(code, 200);
let results = response["results"].as_array().unwrap();
snapshot!(json_string!(results), @r###"
[
{
"id": 2,
"name": "doc_00002"
},
{
"id": 3,
"name": "doc_00003"
}
]
"###);
// Request 3 (third page): offset 4, limit 2
let (response, code) = index
.fetch_documents(json!({
"offset": 4,
"limit": 2,
"sort": ["name:asc"]
}))
.await;
assert_eq!(code, 200);
let results = response["results"].as_array().unwrap();
snapshot!(json_string!(results), @r###"
[
{
"id": 4,
"name": "doc_00004"
},
{
"id": 5,
"name": "doc_00005"
}
]
"###);
// Request 4 (fourth page): offset 6, limit 2
let (response, code) = index
.fetch_documents(json!({
"offset": 6,
"limit": 2,
"sort": ["name:asc"]
}))
.await;
assert_eq!(code, 200);
let results = response["results"].as_array().unwrap();
snapshot!(json_string!(results), @r###"
[
{
"id": 6,
"name": "doc_00005"
},
{
"id": 7,
"name": "doc_00005"
}
]
"###);
}
// <https://github.com/meilisearch/meilisearch/issues/5998>
#[actix_rt::test]
async fn get_document_sort_field_not_in_any_document() {
let server = Server::new_shared();
let index = server.unique_index();
let (task, _code) = index.create(None).await;
server.wait_task(task.uid()).await.succeeded();
let (task, _code) = index.update_settings_sortable_attributes(json!(["created_at"])).await;
server.wait_task(task.uid()).await.succeeded();
let documents = json!([
{ "id": 1, "name": "Document 1" },
{ "id": 2, "name": "Document 2" }
]);
let (task, _code) = index.add_documents(documents, None).await;
server.wait_task(task.uid()).await.succeeded();
let (response, code) = index
.fetch_documents(json!({
"sort": ["created_at:asc"]
}))
.await;
snapshot!(code, @"200 OK");
snapshot!(json_string!(response), @r###"
{
"results": [
{
"id": 1,
"name": "Document 1"
},
{
"id": 2,
"name": "Document 2"
}
],
"offset": 0,
"limit": 20,
"total": 2
}
"###);
}
#[actix_rt::test]
async fn get_document_sort_includes_docs_without_field() {
let server = Server::new_shared();
let index = server.unique_index();
let (task, _code) = index.create(None).await;
server.wait_task(task.uid()).await.succeeded();
let (task, _code) = index.update_settings_sortable_attributes(json!(["created_at"])).await;
server.wait_task(task.uid()).await.succeeded();
let documents = json!([
{ "id": 1, "name": "Doc without created_at" },
{ "id": 2, "name": "Doc with created_at", "created_at": "2025-01-15" },
{ "id": 3, "name": "Another doc without created_at" },
{ "id": 4, "name": "Another doc with created_at", "created_at": "2025-01-10" }
]);
let (task, _code) = index.add_documents(documents, None).await;
server.wait_task(task.uid()).await.succeeded();
let (response, code) = index
.fetch_documents(json!({
"sort": ["created_at:asc"]
}))
.await;
snapshot!(code, @"200 OK");
snapshot!(json_string!(response), @r###"
{
"results": [
{
"id": 4,
"name": "Another doc with created_at",
"created_at": "2025-01-10"
},
{
"id": 2,
"name": "Doc with created_at",
"created_at": "2025-01-15"
},
{
"id": 1,
"name": "Doc without created_at"
},
{
"id": 3,
"name": "Another doc without created_at"
}
],
"offset": 0,
"limit": 20,
"total": 4
}
"###);
}
#[actix_rt::test]
async fn get_document_sort_desc_includes_docs_without_field() {
let server = Server::new_shared();
let index = server.unique_index();
let (task, _code) = index.create(None).await;
server.wait_task(task.uid()).await.succeeded();
let (task, _code) = index.update_settings_sortable_attributes(json!(["priority"])).await;
server.wait_task(task.uid()).await.succeeded();
let documents = json!([
{ "id": 1, "name": "Low priority", "priority": 1 },
{ "id": 2, "name": "No priority" },
{ "id": 3, "name": "High priority", "priority": 10 }
]);
let (task, _code) = index.add_documents(documents, None).await;
server.wait_task(task.uid()).await.succeeded();
let (response, code) = index
.fetch_documents(json!({
"sort": ["priority:desc"]
}))
.await;
snapshot!(code, @"200 OK");
snapshot!(json_string!(response), @r###"
{
"results": [
{
"id": 3,
"name": "High priority",
"priority": 10
},
{
"id": 1,
"name": "Low priority",
"priority": 1
},
{
"id": 2,
"name": "No priority"
}
],
"offset": 0,
"limit": 20,
"total": 3
}
"###);
}

View File

@@ -3142,6 +3142,7 @@ fn fail(override_response_body: Option<&str>) -> ResponseTemplate {
}
}
#[cfg(feature = "enterprise")]
#[actix_rt::test]
async fn remote_auto_sharding() {
let ms0 = Server::new().await;
@@ -3161,7 +3162,6 @@ async fn remote_auto_sharding() {
snapshot!(json_string!(response["network"]), @"true");
// set self & sharding
let (response, code) = ms0.set_network(json!({"self": "ms0", "sharding": true})).await;
snapshot!(code, @"200 OK");
snapshot!(json_string!(response), @r###"
@@ -3462,6 +3462,30 @@ async fn remote_auto_sharding() {
"###);
}
#[cfg(not(feature = "enterprise"))]
#[actix_rt::test]
async fn sharding_not_enterprise() {
let ms0 = Server::new().await;
// enable feature
let (response, code) = ms0.set_features(json!({"network": true})).await;
snapshot!(code, @"200 OK");
snapshot!(json_string!(response["network"]), @"true");
let (response, code) = ms0.set_network(json!({"self": "ms0", "sharding": true})).await;
snapshot!(code, @"451 Unavailable For Legal Reasons");
snapshot!(json_string!(response), @r###"
{
"message": "Meilisearch Enterprise Edition is required to set `network.sharding`",
"code": "requires_enterprise_edition",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#requires_enterprise_edition"
}
"###);
}
#[cfg(feature = "enterprise")]
#[actix_rt::test]
async fn remote_auto_sharding_with_custom_metadata() {
let ms0 = Server::new().await;

View File

@@ -197,7 +197,7 @@ test_setting_routes!(
{
setting: vector_store,
update_verb: patch,
default_value: null
default_value: "experimental"
},
);

View File

@@ -2,6 +2,7 @@ mod chat;
mod distinct;
mod errors;
mod get_settings;
mod parent_seachable_fields;
mod prefix_search_settings;
mod proximity_settings;
mod tokenizer_customization;

View File

@@ -0,0 +1,114 @@
use meili_snap::{json_string, snapshot};
use once_cell::sync::Lazy;
use crate::common::Server;
use crate::json;
static DOCUMENTS: Lazy<crate::common::Value> = Lazy::new(|| {
json!([
{
"id": 1,
"meta": {
"title": "Soup of the day",
"description": "many the fish",
}
},
{
"id": 2,
"meta": {
"title": "Soup of day",
"description": "many the lazy fish",
}
},
{
"id": 3,
"meta": {
"title": "the Soup of day",
"description": "many the fish",
}
},
])
});
#[actix_rt::test]
async fn nested_field_becomes_searchable() {
let server = Server::new_shared();
let index = server.unique_index();
let (task, _status_code) = index.add_documents(DOCUMENTS.clone(), None).await;
server.wait_task(task.uid()).await.succeeded();
let (response, code) = index
.update_settings(json!({
"searchableAttributes": ["meta.title"]
}))
.await;
assert_eq!("202", code.as_str(), "{response:?}");
server.wait_task(response.uid()).await.succeeded();
// We expect no documents when searching for
// a nested non-searchable field
index
.search(json!({"q": "many fish"}), |response, code| {
snapshot!(code, @"200 OK");
snapshot!(json_string!(response["hits"]), @r###"[]"###);
})
.await;
let (response, code) = index
.update_settings(json!({
"searchableAttributes": ["meta.title", "meta.description"]
}))
.await;
assert_eq!("202", code.as_str(), "{response:?}");
server.wait_task(response.uid()).await.succeeded();
// We expect all the documents when the nested field becomes searchable
index
.search(json!({"q": "many fish"}), |response, code| {
snapshot!(code, @"200 OK");
snapshot!(json_string!(response["hits"]), @r###"
[
{
"id": 1,
"meta": {
"title": "Soup of the day",
"description": "many the fish"
}
},
{
"id": 3,
"meta": {
"title": "the Soup of day",
"description": "many the fish"
}
},
{
"id": 2,
"meta": {
"title": "Soup of day",
"description": "many the lazy fish"
}
}
]
"###);
})
.await;
let (response, code) = index
.update_settings(json!({
"searchableAttributes": ["meta.title"]
}))
.await;
assert_eq!("202", code.as_str(), "{response:?}");
server.wait_task(response.uid()).await.succeeded();
// We expect no documents when searching for
// a nested non-searchable field
index
.search(json!({"q": "many fish"}), |response, code| {
snapshot!(code, @"200 OK");
snapshot!(json_string!(response["hits"]), @r###"[]"###);
})
.await;
}
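
The test relies on nested fields being selected by dot paths in `searchableAttributes`. A rough sketch of that selection rule, assuming a simple prefix-plus-dot match (an assumption; the real matcher in milli also handles wildcard patterns):

```rust
// Simplified sketch, not the milli implementation: a field is searchable when it
// equals a declared attribute or is nested under one via a dot path.
fn is_searchable(field: &str, searchable_attributes: &[&str]) -> bool {
    searchable_attributes.iter().any(|attr| {
        field == *attr || field.strip_prefix(*attr).is_some_and(|rest| rest.starts_with('.'))
    })
}

fn main() {
    // Mirrors the settings used in the test above.
    assert!(is_searchable("meta.title", &["meta.title"]));
    assert!(!is_searchable("meta.description", &["meta.title"]));
    assert!(is_searchable("meta.description", &["meta.title", "meta.description"]));
}
```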

View File

@@ -43,7 +43,7 @@ async fn version_too_old() {
std::fs::write(db_path.join("VERSION"), "1.11.9999").unwrap();
let options = Opt { experimental_dumpless_upgrade: true, ..default_settings };
let err = Server::new_with_options(options).await.map(|_| ()).unwrap_err();
snapshot!(err, @"Database version 1.11.9999 is too old for the experimental dumpless upgrade feature. Please generate a dump using the v1.11.9999 and import it in the v1.26.0");
snapshot!(err, @"Database version 1.11.9999 is too old for the experimental dumpless upgrade feature. Please generate a dump using the v1.11.9999 and import it in the v1.28.2");
}
#[actix_rt::test]
@@ -58,7 +58,7 @@ async fn version_requires_downgrade() {
std::fs::write(db_path.join("VERSION"), format!("{major}.{minor}.{patch}")).unwrap();
let options = Opt { experimental_dumpless_upgrade: true, ..default_settings };
let err = Server::new_with_options(options).await.map(|_| ()).unwrap_err();
snapshot!(err, @"Database version 1.26.1 is higher than the Meilisearch version 1.26.0. Downgrade is not supported");
snapshot!(err, @"Database version 1.28.3 is higher than the Meilisearch version 1.28.2. Downgrade is not supported");
}
#[actix_rt::test]

View File

@@ -8,7 +8,7 @@ source: crates/meilisearch/tests/upgrade/v1_12/v1_12_0.rs
"progress": null,
"details": {
"upgradeFrom": "v1.12.0",
"upgradeTo": "v1.26.0"
"upgradeTo": "v1.28.2"
},
"stats": {
"totalNbTasks": 1,

View File

@@ -8,7 +8,7 @@ source: crates/meilisearch/tests/upgrade/v1_12/v1_12_0.rs
"progress": null,
"details": {
"upgradeFrom": "v1.12.0",
"upgradeTo": "v1.26.0"
"upgradeTo": "v1.28.2"
},
"stats": {
"totalNbTasks": 1,

View File

@@ -8,7 +8,7 @@ source: crates/meilisearch/tests/upgrade/v1_12/v1_12_0.rs
"progress": null,
"details": {
"upgradeFrom": "v1.12.0",
"upgradeTo": "v1.26.0"
"upgradeTo": "v1.28.2"
},
"stats": {
"totalNbTasks": 1,

View File

@@ -12,7 +12,7 @@ source: crates/meilisearch/tests/upgrade/v1_12/v1_12_0.rs
"canceledBy": null,
"details": {
"upgradeFrom": "v1.12.0",
"upgradeTo": "v1.26.0"
"upgradeTo": "v1.28.2"
},
"error": null,
"duration": "[duration]",

View File

@@ -12,7 +12,7 @@ source: crates/meilisearch/tests/upgrade/v1_12/v1_12_0.rs
"canceledBy": null,
"details": {
"upgradeFrom": "v1.12.0",
"upgradeTo": "v1.26.0"
"upgradeTo": "v1.28.2"
},
"error": null,
"duration": "[duration]",

View File

@@ -12,7 +12,7 @@ source: crates/meilisearch/tests/upgrade/v1_12/v1_12_0.rs
"canceledBy": null,
"details": {
"upgradeFrom": "v1.12.0",
"upgradeTo": "v1.26.0"
"upgradeTo": "v1.28.2"
},
"error": null,
"duration": "[duration]",

View File

@@ -8,7 +8,7 @@ source: crates/meilisearch/tests/upgrade/v1_12/v1_12_0.rs
"progress": null,
"details": {
"upgradeFrom": "v1.12.0",
"upgradeTo": "v1.26.0"
"upgradeTo": "v1.28.2"
},
"stats": {
"totalNbTasks": 1,

View File

@@ -12,7 +12,7 @@ source: crates/meilisearch/tests/upgrade/v1_12/v1_12_0.rs
"canceledBy": null,
"details": {
"upgradeFrom": "v1.12.0",
"upgradeTo": "v1.26.0"
"upgradeTo": "v1.28.2"
},
"error": null,
"duration": "[duration]",

View File

@@ -104,8 +104,8 @@ async fn binary_quantize_before_sending_documents() {
"manual": {
"embeddings": [
[
-1.0,
-1.0,
0.0,
0.0,
1.0
]
],
@@ -122,7 +122,7 @@ async fn binary_quantize_before_sending_documents() {
[
1.0,
1.0,
-1.0
0.0
]
],
"regenerate": false
@@ -191,8 +191,8 @@ async fn binary_quantize_after_sending_documents() {
"manual": {
"embeddings": [
[
-1.0,
-1.0,
0.0,
0.0,
1.0
]
],
@@ -209,7 +209,7 @@ async fn binary_quantize_after_sending_documents() {
[
1.0,
1.0,
-1.0
0.0
]
],
"regenerate": false

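The updated snapshots render binary-quantized components as `0.0`/`1.0` instead of `-1.0`/`1.0`. Assuming the usual sign-based quantization (an assumption, since the diff only shows the expected output), the mapping looks like this:

```rust
// Hedged sketch of sign-based binary quantization with a 0.0/1.0 rendering,
// consistent with the new expected embeddings above; not the actual milli code path.
fn binary_quantize(embedding: &[f32]) -> Vec<f32> {
    embedding.iter().map(|&v| if v > 0.0 { 1.0 } else { 0.0 }).collect()
}

fn main() {
    // Components at or below zero collapse to 0.0, positive ones to 1.0.
    assert_eq!(binary_quantize(&[-0.2, -0.7, 3.1]), vec![0.0, 0.0, 1.0]);
}
```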
View File

@@ -500,13 +500,6 @@ async fn test_both_apis() {
snapshot!(code, @"200 OK");
snapshot!(json_string!(response["hits"]), @r###"
[
{
"id": 0,
"name": "kefir",
"gender": "M",
"birthyear": 2023,
"breed": "Patou"
},
{
"id": 2,
"name": "Vénus",
@@ -527,6 +520,13 @@ async fn test_both_apis() {
"gender": "M",
"birthyear": 1995,
"breed": "Labrador Retriever"
},
{
"id": 0,
"name": "kefir",
"gender": "M",
"birthyear": 2023,
"breed": "Patou"
}
]
"###);
@@ -540,13 +540,6 @@ async fn test_both_apis() {
snapshot!(code, @"200 OK");
snapshot!(json_string!(response["hits"]), @r###"
[
{
"id": 0,
"name": "kefir",
"gender": "M",
"birthyear": 2023,
"breed": "Patou"
},
{
"id": 2,
"name": "Vénus",
@@ -567,6 +560,13 @@ async fn test_both_apis() {
"gender": "M",
"birthyear": 1995,
"breed": "Labrador Retriever"
},
{
"id": 0,
"name": "kefir",
"gender": "M",
"birthyear": 2023,
"breed": "Patou"
}
]
"###);
@@ -581,18 +581,11 @@ async fn test_both_apis() {
snapshot!(json_string!(response["hits"]), @r###"
[
{
"id": 0,
"name": "kefir",
"id": 1,
"name": "Intel",
"gender": "M",
"birthyear": 2023,
"breed": "Patou"
},
{
"id": 3,
"name": "Max",
"gender": "M",
"birthyear": 1995,
"breed": "Labrador Retriever"
"birthyear": 2011,
"breed": "Beagle"
},
{
"id": 2,
@@ -602,11 +595,18 @@ async fn test_both_apis() {
"breed": "Jack Russel Terrier"
},
{
"id": 1,
"name": "Intel",
"id": 3,
"name": "Max",
"gender": "M",
"birthyear": 2011,
"breed": "Beagle"
"birthyear": 1995,
"breed": "Labrador Retriever"
},
{
"id": 0,
"name": "kefir",
"gender": "M",
"birthyear": 2023,
"breed": "Patou"
}
]
"###);
@@ -621,18 +621,11 @@ async fn test_both_apis() {
snapshot!(json_string!(response["hits"]), @r###"
[
{
"id": 0,
"name": "kefir",
"id": 1,
"name": "Intel",
"gender": "M",
"birthyear": 2023,
"breed": "Patou"
},
{
"id": 3,
"name": "Max",
"gender": "M",
"birthyear": 1995,
"breed": "Labrador Retriever"
"birthyear": 2011,
"breed": "Beagle"
},
{
"id": 2,
@@ -642,11 +635,18 @@ async fn test_both_apis() {
"breed": "Jack Russel Terrier"
},
{
"id": 1,
"name": "Intel",
"id": 3,
"name": "Max",
"gender": "M",
"birthyear": 2011,
"breed": "Beagle"
"birthyear": 1995,
"breed": "Labrador Retriever"
},
{
"id": 0,
"name": "kefir",
"gender": "M",
"birthyear": 2023,
"breed": "Patou"
}
]
"###);
@@ -661,18 +661,11 @@ async fn test_both_apis() {
snapshot!(json_string!(response["hits"]), @r###"
[
{
"id": 0,
"name": "kefir",
"id": 1,
"name": "Intel",
"gender": "M",
"birthyear": 2023,
"breed": "Patou"
},
{
"id": 3,
"name": "Max",
"gender": "M",
"birthyear": 1995,
"breed": "Labrador Retriever"
"birthyear": 2011,
"breed": "Beagle"
},
{
"id": 2,
@@ -682,11 +675,18 @@ async fn test_both_apis() {
"breed": "Jack Russel Terrier"
},
{
"id": 1,
"name": "Intel",
"id": 3,
"name": "Max",
"gender": "M",
"birthyear": 2011,
"breed": "Beagle"
"birthyear": 1995,
"breed": "Labrador Retriever"
},
{
"id": 0,
"name": "kefir",
"gender": "M",
"birthyear": 2023,
"breed": "Patou"
}
]
"###);
@@ -701,18 +701,11 @@ async fn test_both_apis() {
snapshot!(json_string!(response["hits"]), @r###"
[
{
"id": 0,
"name": "kefir",
"id": 1,
"name": "Intel",
"gender": "M",
"birthyear": 2023,
"breed": "Patou"
},
{
"id": 3,
"name": "Max",
"gender": "M",
"birthyear": 1995,
"breed": "Labrador Retriever"
"birthyear": 2011,
"breed": "Beagle"
},
{
"id": 2,
@@ -722,11 +715,18 @@ async fn test_both_apis() {
"breed": "Jack Russel Terrier"
},
{
"id": 1,
"name": "Intel",
"id": 3,
"name": "Max",
"gender": "M",
"birthyear": 2011,
"breed": "Beagle"
"birthyear": 1995,
"breed": "Labrador Retriever"
},
{
"id": 0,
"name": "kefir",
"gender": "M",
"birthyear": 2023,
"breed": "Patou"
}
]
"###);

View File

@@ -9,15 +9,15 @@ edition.workspace = true
license.workspace = true
[dependencies]
anyhow = "1.0.98"
clap = { version = "4.5.40", features = ["derive"] }
anyhow = "1.0.100"
clap = { version = "4.5.52", features = ["derive"] }
dump = { path = "../dump" }
file-store = { path = "../file-store" }
indexmap = { version = "2.9.0", features = ["serde"] }
indexmap = { version = "2.12.0", features = ["serde"] }
meilisearch-auth = { path = "../meilisearch-auth" }
meilisearch-types = { path = "../meilisearch-types" }
serde = { version = "1.0.219", features = ["derive"] }
serde_json = { version = "1.0.140", features = ["preserve_order"] }
tempfile = "3.20.0"
time = { version = "0.3.41", features = ["formatting", "parsing", "alloc"] }
uuid = { version = "1.17.0", features = ["v4"], default-features = false }
serde = { version = "1.0.228", features = ["derive"] }
serde_json = { version = "1.0.145", features = ["preserve_order"] }
tempfile = "3.23.0"
time = { version = "0.3.44", features = ["formatting", "parsing", "alloc"] }
uuid = { version = "1.18.1", features = ["v4"], default-features = false }

View File

@@ -15,15 +15,15 @@ license.workspace = true
big_s = "1.0.2"
bimap = { version = "0.6.3", features = ["serde"] }
bincode = "1.3.3"
bstr = "1.12.0"
bytemuck = { version = "1.23.1", features = ["extern_crate_alloc"] }
bstr = "1.12.1"
bytemuck = { version = "1.24.0", features = ["extern_crate_alloc"] }
byteorder = "1.5.0"
charabia = { version = "0.9.8", default-features = false }
charabia = { version = "0.9.9", default-features = false }
cellulite = "0.3.1-nested-rtxns-2"
concat-arrays = "0.1.2"
convert_case = "0.8.0"
convert_case = "0.9.0"
crossbeam-channel = "0.5.15"
deserr = "0.6.3"
deserr = "0.6.4"
either = { version = "1.15.0", features = ["serde"] }
flatten-serde-json = { path = "../flatten-serde-json" }
fst = "0.4.7"
@@ -38,39 +38,39 @@ heed = { version = "0.22.1-nested-rtxns-6", default-features = false, features =
"serde-json",
"serde-bincode",
] }
indexmap = { version = "2.9.0", features = ["serde"] }
indexmap = { version = "2.12.0", features = ["serde"] }
json-depth-checker = { path = "../json-depth-checker" }
levenshtein_automata = { version = "0.2.1", features = ["fst_automaton"] }
memchr = "2.7.5"
memmap2 = "0.9.7"
memchr = "2.7.6"
memmap2 = "0.9.9"
obkv = "0.3.0"
once_cell = "1.21.3"
ordered-float = "5.0.0"
rayon = "1.10.0"
ordered-float = "5.1.0"
rayon = "1.11.0"
roaring = { version = "0.10.12", features = ["serde"] }
rstar = { version = "0.12.2", features = ["serde"] }
serde = { version = "1.0.219", features = ["derive"] }
serde_json = { version = "1.0.140", features = ["preserve_order", "raw_value"] }
serde = { version = "1.0.228", features = ["derive"] }
serde_json = { version = "1.0.145", features = ["preserve_order", "raw_value"] }
slice-group-by = "0.3.1"
smallstr = { version = "0.3.0", features = ["serde"] }
smallstr = { version = "0.3.1", features = ["serde"] }
smallvec = "1.15.1"
smartstring = "1.0.1"
tempfile = "3.20.0"
thiserror = "2.0.12"
time = { version = "0.3.41", features = [
tempfile = "3.23.0"
thiserror = "2.0.17"
time = { version = "0.3.44", features = [
"serde-well-known",
"formatting",
"parsing",
"macros",
] }
uuid = { version = "1.17.0", features = ["v4"] }
uuid = { version = "1.18.1", features = ["v4"] }
filter-parser = { path = "../filter-parser" }
# documents words self-join
itertools = "0.14.0"
csv = "1.3.1"
csv = "1.4.0"
candle-core = { version = "0.9.1" }
candle-transformers = { version = "0.9.1" }
candle-nn = { version = "0.9.1" }
@@ -81,9 +81,9 @@ hf-hub = { git = "https://github.com/dureuill/hf-hub.git", branch = "rust_tls",
"online",
] }
safetensors = "0.6.2"
tiktoken-rs = "0.7.0"
tiktoken-rs = "0.9.1"
liquid = "0.26.11"
rhai = { version = "1.22.2", features = [
rhai = { version = "1.23.6", features = [
"serde",
"no_module",
"no_custom_syntax",
@@ -95,14 +95,14 @@ hannoy = { version = "0.0.9-nested-rtxns-2", features = ["arroy"] }
rand = "0.8.5"
tracing = "0.1.41"
ureq = { version = "2.12.1", features = ["json"] }
url = "2.5.4"
hashbrown = "0.15.4"
bumpalo = "3.18.1"
url = "2.5.7"
hashbrown = "0.15.5"
bumpalo = "3.19.0"
bumparaw-collections = "0.1.4"
steppe = { version = "0.4", default-features = false }
thread_local = "1.1.9"
rustc-hash = "2.1.1"
enum-iterator = "2.1.0"
enum-iterator = "2.3.0"
bbqueue = { git = "https://github.com/meilisearch/bbqueue" }
flume = { version = "0.11.1", default-features = false }
utoipa = { version = "5.4.0", features = [
@@ -112,21 +112,21 @@ utoipa = { version = "5.4.0", features = [
"time",
"openapi_extensions",
] }
lru = "0.14.0"
twox-hash = { version = "2.1.1", default-features = false, features = [
lru = "0.16.2"
twox-hash = { version = "2.1.2", default-features = false, features = [
"std",
"xxhash3_64",
"xxhash64",
] }
geo-types = "0.7.16"
geo-types = "0.7.17"
zerometry = "0.3.0"
[dev-dependencies]
mimalloc = { version = "0.1.47", default-features = false }
mimalloc = { version = "0.1.48", default-features = false }
# fixed version due to format breakages in v1.40
insta = "=1.39.0"
maplit = "1.0.2"
md5 = "0.7.0"
md5 = "0.8.0"
meili-snap = { path = "../meili-snap" }
rand = { version = "0.8.5", features = ["small_rng"] }
@@ -141,6 +141,8 @@ lmdb-posix-sem = ["heed/posix-sem"]
chinese = ["charabia/chinese"]
chinese-pinyin = ["chinese", "charabia/chinese-normalization-pinyin"]
enterprise = []
# allow hebrew specialized tokenization
hebrew = ["charabia/hebrew"]

View File

@@ -87,7 +87,7 @@ impl Iterator for SortedDocumentsIterator<'_> {
};
// Otherwise don't directly iterate over children, skip them if we know we will go further
let mut to_skip = n - 1;
let mut to_skip = n;
while to_skip > 0 {
if let Err(e) = SortedDocumentsIterator::update_current(
current_child,
@@ -108,7 +108,7 @@ impl Iterator for SortedDocumentsIterator<'_> {
continue;
} else {
// The current iterator is large enough, so we can forward the call to it.
return inner.nth(to_skip + 1);
return inner.nth(to_skip);
}
}
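
The hunk above reworks the skip counter so it can be handed straight to `Iterator::nth`. As a reminder of the std semantics the new code leans on, `nth(0)` already consumes and returns the next element, so `nth(k)` skips exactly `k` elements:

```rust
fn main() {
    let mut it = [10, 20, 30, 40].into_iter();
    // nth(0) is equivalent to next(): it consumes and returns the first element.
    assert_eq!(it.nth(0), Some(10));
    // nth(2) skips 20 and 30 and returns 40.
    assert_eq!(it.nth(2), Some(40));
}
```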
@@ -240,15 +240,25 @@ impl<'ctx> SortedDocumentsIteratorBuilder<'ctx> {
) -> crate::Result<SortedDocumentsIterator<'ctx>> {
let size = candidates.len() as usize;
// Get documents that have this facet field
let faceted_candidates = index.exists_faceted_documents_ids(rtxn, field_id)?;
// Documents that don't have this facet field should be returned at the end
let not_faceted_candidates = &candidates - &faceted_candidates;
// Only sort candidates that have the facet field
let faceted_candidates = candidates & faceted_candidates;
let mut not_faceted_candidates = Some(not_faceted_candidates);
// Perform the sort on the first field
let (number_iter, string_iter) = if ascending {
let number_iter = ascending_facet_sort(rtxn, number_db, field_id, candidates.clone())?;
let string_iter = ascending_facet_sort(rtxn, string_db, field_id, candidates)?;
let number_iter =
ascending_facet_sort(rtxn, number_db, field_id, faceted_candidates.clone())?;
let string_iter = ascending_facet_sort(rtxn, string_db, field_id, faceted_candidates)?;
(itertools::Either::Left(number_iter), itertools::Either::Left(string_iter))
} else {
let number_iter = descending_facet_sort(rtxn, number_db, field_id, candidates.clone())?;
let string_iter = descending_facet_sort(rtxn, string_db, field_id, candidates)?;
let number_iter =
descending_facet_sort(rtxn, number_db, field_id, faceted_candidates.clone())?;
let string_iter = descending_facet_sort(rtxn, string_db, field_id, faceted_candidates)?;
(itertools::Either::Right(number_iter), itertools::Either::Right(string_iter))
};
@@ -256,17 +266,37 @@ impl<'ctx> SortedDocumentsIteratorBuilder<'ctx> {
// Create builders for the next level of the tree
let number_iter = number_iter.map(|r| r.map(|(d, _)| d));
let string_iter = string_iter.map(|r| r.map(|(d, _)| d));
let next_children = number_iter.chain(string_iter).map(move |r| {
Ok(SortedDocumentsIteratorBuilder {
index,
rtxn,
number_db,
string_db,
fields: next_fields,
candidates: r?,
geo_candidates,
// Chain faceted documents with non-faceted documents at the end
let next_children = number_iter
.chain(string_iter)
.map(move |r| {
Ok(SortedDocumentsIteratorBuilder {
index,
rtxn,
number_db,
string_db,
fields: next_fields,
candidates: r?,
geo_candidates,
})
})
});
.chain(std::iter::from_fn(move || {
// Once all faceted candidates have been processed, return the non-faceted ones
if let Some(not_faceted) = not_faceted_candidates.take() {
if !not_faceted.is_empty() {
return Some(Ok(SortedDocumentsIteratorBuilder {
index,
rtxn,
number_db,
string_db,
fields: next_fields,
candidates: not_faceted,
geo_candidates,
}));
}
}
None
}));
Ok(SortedDocumentsIterator::Branch {
current_child: None,
@@ -398,10 +428,14 @@ pub fn recursive_sort<'ctx>(
};
if let Some((field, ascending)) = field {
if is_faceted(&field, &sortable_fields) {
// The field may be in sortable_fields but not in fields_ids_map if no document
// has ever contained this field. In that case, we just skip this sort criterion
// since there are no values to sort by. Documents will be returned in their
// default order for this field.
if let Some(field_id) = fields_ids_map.id(&field) {
fields.push(AscDescId::Facet { field_id, ascending });
continue;
}
continue;
}
return Err(UserError::InvalidDocumentSortableAttribute {
field: field.to_string(),
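
The builder change above splits the candidates into documents that have the sort facet and documents that do not, sorts only the former, and appends the latter unchanged at the end. The set arithmetic, using `RoaringBitmap` as in milli but with the surrounding types stripped away:

```rust
use roaring::RoaringBitmap;

// Sketch of the partitioning step only; the real code feeds `faceted` into the
// ascending/descending facet sort and chains `not_faceted` as a final child.
fn partition(candidates: &RoaringBitmap, faceted_docids: &RoaringBitmap) -> (RoaringBitmap, RoaringBitmap) {
    let faceted = candidates & faceted_docids;
    let not_faceted = candidates - faceted_docids;
    (faceted, not_faceted)
}

fn main() {
    let candidates: RoaringBitmap = (0..6).collect();
    let faceted_docids: RoaringBitmap = [1, 3, 5, 9].into_iter().collect();
    let (faceted, not_faceted) = partition(&candidates, &faceted_docids);
    assert_eq!(faceted.iter().collect::<Vec<_>>(), vec![1, 3, 5]);
    assert_eq!(not_faceted.iter().collect::<Vec<_>>(), vec![0, 2, 4]);
}
```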

View File

@@ -18,6 +18,8 @@ use crate::{
pub struct Metadata {
/// The weight as defined in the FieldidsWeightsMap of the searchable attribute if it is searchable.
pub searchable: Option<Weight>,
/// The field is part of the exact attributes.
pub exact: bool,
/// The field is part of the sortable attributes.
pub sortable: bool,
/// The field is defined as the distinct attribute.
@@ -209,6 +211,7 @@ impl Metadata {
#[derive(Debug, Clone)]
pub struct MetadataBuilder {
searchable_attributes: Option<Vec<String>>,
exact_searchable_attributes: Vec<String>,
filterable_attributes: Vec<FilterableAttributesRule>,
sortable_attributes: HashSet<String>,
localized_attributes: Option<Vec<LocalizedAttributesRule>>,
@@ -220,15 +223,18 @@ impl MetadataBuilder {
pub fn from_index(index: &Index, rtxn: &RoTxn) -> Result<Self> {
let searchable_attributes = index
.user_defined_searchable_fields(rtxn)?
.map(|fields| fields.into_iter().map(|s| s.to_string()).collect());
.map(|fields| fields.into_iter().map(String::from).collect());
let exact_searchable_attributes =
index.exact_attributes(rtxn)?.into_iter().map(String::from).collect();
let filterable_attributes = index.filterable_attributes_rules(rtxn)?;
let sortable_attributes = index.sortable_fields(rtxn)?;
let localized_attributes = index.localized_attributes_rules(rtxn)?;
let distinct_attribute = index.distinct_field(rtxn)?.map(|s| s.to_string());
let distinct_attribute = index.distinct_field(rtxn)?.map(String::from);
let asc_desc_attributes = index.asc_desc_fields(rtxn)?;
Ok(Self::new(
searchable_attributes,
exact_searchable_attributes,
filterable_attributes,
sortable_attributes,
localized_attributes,
@@ -242,6 +248,7 @@ impl MetadataBuilder {
/// This is used for testing, prefer using `MetadataBuilder::from_index` instead.
pub fn new(
searchable_attributes: Option<Vec<String>>,
exact_searchable_attributes: Vec<String>,
filterable_attributes: Vec<FilterableAttributesRule>,
sortable_attributes: HashSet<String>,
localized_attributes: Option<Vec<LocalizedAttributesRule>>,
@@ -256,6 +263,7 @@ impl MetadataBuilder {
Self {
searchable_attributes,
exact_searchable_attributes,
filterable_attributes,
sortable_attributes,
localized_attributes,
@@ -269,6 +277,7 @@ impl MetadataBuilder {
// Vectors fields are not searchable, filterable, distinct or asc_desc
return Metadata {
searchable: None,
exact: false,
sortable: false,
distinct: false,
asc_desc: false,
@@ -296,6 +305,7 @@ impl MetadataBuilder {
// Geo fields are not searchable, distinct or asc_desc
return Metadata {
searchable: None,
exact: false,
sortable,
distinct: false,
asc_desc: false,
@@ -309,6 +319,7 @@ impl MetadataBuilder {
debug_assert!(!sortable, "geojson fields should not be sortable");
return Metadata {
searchable: None,
exact: false,
sortable,
distinct: false,
asc_desc: false,
@@ -329,6 +340,8 @@ impl MetadataBuilder {
None => Some(0),
};
let exact = self.exact_searchable_attributes.iter().any(|attr| is_faceted_by(field, attr));
let distinct =
self.distinct_attribute.as_ref().is_some_and(|distinct_field| field == distinct_field);
let asc_desc = self.asc_desc_attributes.contains(field);
@@ -343,6 +356,7 @@ impl MetadataBuilder {
Metadata {
searchable,
exact,
sortable,
distinct,
asc_desc,
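
The new `exact` flag in `Metadata` is derived from the index's exact attributes in the same way as the other per-field flags. A simplified illustration of the check, with `is_faceted_by` reduced to plain dot-path matching (the milli helper also deals with patterns):

```rust
// Simplified stand-in for milli's `is_faceted_by`: `field` is covered by `attr`
// when it is the attribute itself or nested under it.
fn is_faceted_by(field: &str, attr: &str) -> bool {
    field == attr || field.strip_prefix(attr).is_some_and(|rest| rest.starts_with('.'))
}

fn exact_flag(field: &str, exact_searchable_attributes: &[String]) -> bool {
    exact_searchable_attributes.iter().any(|attr| is_faceted_by(field, attr))
}

fn main() {
    let exact = vec!["title".to_string()];
    assert!(exact_flag("title", &exact));
    assert!(exact_flag("title.en", &exact));
    assert!(!exact_flag("description", &exact));
}
```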

View File

@@ -281,6 +281,9 @@ impl Index {
&mut wtxn,
(constants::VERSION_MAJOR, constants::VERSION_MINOR, constants::VERSION_PATCH),
)?;
// The database before v1.29 defaulted to using arroy, so we
// need to set it explicitly because the new default is hannoy.
this.put_vector_store(&mut wtxn, VectorStoreBackend::Hannoy)?;
}
wtxn.commit()?;

View File

@@ -8,17 +8,26 @@ use bumpalo::Bump;
use super::match_searchable_field;
use super::tokenize_document::{tokenizer_builder, DocumentTokenizer};
use crate::fields_ids_map::metadata::Metadata;
use crate::update::new::document::DocumentContext;
use crate::update::new::extract::cache::BalancedCaches;
use crate::update::new::extract::perm_json_p::contained_in;
use crate::update::new::extract::searchable::has_searchable_children;
use crate::update::new::indexer::document_changes::{
extract, DocumentChanges, Extractor, IndexingContext,
};
use crate::update::new::indexer::settings_changes::{
settings_change_extract, DocumentsIndentifiers, SettingsChangeExtractor,
};
use crate::update::new::ref_cell_ext::RefCellExt as _;
use crate::update::new::steps::IndexingStep;
use crate::update::new::thread_local::{FullySend, MostlySend, ThreadLocal};
use crate::update::new::DocumentChange;
use crate::{bucketed_position, DocumentId, FieldId, Result, MAX_POSITION_PER_ATTRIBUTE};
use crate::update::new::{DocumentChange, DocumentIdentifiers};
use crate::update::settings::SettingsDelta;
use crate::{
bucketed_position, DocumentId, FieldId, PatternMatch, Result, UserError,
MAX_POSITION_PER_ATTRIBUTE,
};
const MAX_COUNTED_WORDS: usize = 30;
@@ -34,6 +43,15 @@ pub struct WordDocidsBalancedCaches<'extractor> {
unsafe impl MostlySend for WordDocidsBalancedCaches<'_> {}
/// Whether to extract or skip fields during word extraction.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum FieldDbExtraction {
/// Extract the word and put it in to the fid-based databases.
Extract,
/// Do not store the word in the fid-based databases.
Skip,
}
impl<'extractor> WordDocidsBalancedCaches<'extractor> {
pub fn new_in(buckets: usize, max_memory: Option<usize>, alloc: &'extractor Bump) -> Self {
Self {
@@ -47,12 +65,14 @@ impl<'extractor> WordDocidsBalancedCaches<'extractor> {
}
}
#[allow(clippy::too_many_arguments)]
fn insert_add_u32(
&mut self,
field_id: FieldId,
position: u16,
word: &str,
exact: bool,
field_db_extraction: FieldDbExtraction,
docid: u32,
bump: &Bump,
) -> Result<()> {
@@ -66,11 +86,13 @@ impl<'extractor> WordDocidsBalancedCaches<'extractor> {
let buffer_size = word_bytes.len() + 1 + size_of::<FieldId>();
let mut buffer = BumpVec::with_capacity_in(buffer_size, bump);
buffer.clear();
buffer.extend_from_slice(word_bytes);
buffer.push(0);
buffer.extend_from_slice(&field_id.to_be_bytes());
self.word_fid_docids.insert_add_u32(&buffer, docid)?;
if field_db_extraction == FieldDbExtraction::Extract {
buffer.clear();
buffer.extend_from_slice(word_bytes);
buffer.push(0);
buffer.extend_from_slice(&field_id.to_be_bytes());
self.word_fid_docids.insert_add_u32(&buffer, docid)?;
}
let position = bucketed_position(position);
buffer.clear();
@@ -83,21 +105,26 @@ impl<'extractor> WordDocidsBalancedCaches<'extractor> {
self.flush_fid_word_count(&mut buffer)?;
}
self.fid_word_count
.entry(field_id)
.and_modify(|(_current_count, new_count)| *new_count.get_or_insert(0) += 1)
.or_insert((None, Some(1)));
if field_db_extraction == FieldDbExtraction::Extract {
self.fid_word_count
.entry(field_id)
.and_modify(|(_current_count, new_count)| *new_count.get_or_insert(0) += 1)
.or_insert((None, Some(1)));
}
self.current_docid = Some(docid);
Ok(())
}
#[allow(clippy::too_many_arguments)]
fn insert_del_u32(
&mut self,
field_id: FieldId,
position: u16,
word: &str,
exact: bool,
field_db_extraction: FieldDbExtraction,
docid: u32,
bump: &Bump,
) -> Result<()> {
@@ -111,11 +138,13 @@ impl<'extractor> WordDocidsBalancedCaches<'extractor> {
let buffer_size = word_bytes.len() + 1 + size_of::<FieldId>();
let mut buffer = BumpVec::with_capacity_in(buffer_size, bump);
buffer.clear();
buffer.extend_from_slice(word_bytes);
buffer.push(0);
buffer.extend_from_slice(&field_id.to_be_bytes());
self.word_fid_docids.insert_del_u32(&buffer, docid)?;
if field_db_extraction == FieldDbExtraction::Extract {
buffer.clear();
buffer.extend_from_slice(word_bytes);
buffer.push(0);
buffer.extend_from_slice(&field_id.to_be_bytes());
self.word_fid_docids.insert_del_u32(&buffer, docid)?;
}
let position = bucketed_position(position);
buffer.clear();
@@ -128,10 +157,12 @@ impl<'extractor> WordDocidsBalancedCaches<'extractor> {
self.flush_fid_word_count(&mut buffer)?;
}
self.fid_word_count
.entry(field_id)
.and_modify(|(current_count, _new_count)| *current_count.get_or_insert(0) += 1)
.or_insert((Some(1), None));
if field_db_extraction == FieldDbExtraction::Extract {
self.fid_word_count
.entry(field_id)
.and_modify(|(current_count, _new_count)| *current_count.get_or_insert(0) += 1)
.or_insert((Some(1), None));
}
self.current_docid = Some(docid);
@@ -325,6 +356,24 @@ impl WordDocidsExtractors {
exact_attributes.iter().any(|attr| contained_in(fname, attr))
|| disabled_typos_terms.is_exact(word)
};
let mut should_tokenize = |field_name: &str| {
let Some((field_id, meta)) = new_fields_ids_map.id_with_metadata_or_insert(field_name)
else {
return Err(UserError::AttributeLimitReached.into());
};
let pattern_match = if meta.is_searchable() {
PatternMatch::Match
} else {
// TODO: should be a match on the field_name using `match_field_legacy` function,
// but for legacy reasons we iterate over all the fields to fill the field_id_map.
PatternMatch::Parent
};
Ok((field_id, pattern_match))
};
match document_change {
DocumentChange::Deletion(inner) => {
let mut token_fn = |fname: &str, fid, pos, word: &str| {
@@ -333,13 +382,14 @@ impl WordDocidsExtractors {
pos,
word,
is_exact(fname, word),
FieldDbExtraction::Extract,
inner.docid(),
doc_alloc,
)
};
document_tokenizer.tokenize_document(
inner.current(rtxn, index, context.db_fields_ids_map)?,
new_fields_ids_map,
&mut should_tokenize,
&mut token_fn,
)?;
}
@@ -361,13 +411,14 @@ impl WordDocidsExtractors {
pos,
word,
is_exact(fname, word),
FieldDbExtraction::Extract,
inner.docid(),
doc_alloc,
)
};
document_tokenizer.tokenize_document(
inner.current(rtxn, index, context.db_fields_ids_map)?,
new_fields_ids_map,
&mut should_tokenize,
&mut token_fn,
)?;
@@ -377,13 +428,14 @@ impl WordDocidsExtractors {
pos,
word,
is_exact(fname, word),
FieldDbExtraction::Extract,
inner.docid(),
doc_alloc,
)
};
document_tokenizer.tokenize_document(
inner.merged(rtxn, index, context.db_fields_ids_map)?,
new_fields_ids_map,
&mut should_tokenize,
&mut token_fn,
)?;
}
@@ -394,13 +446,14 @@ impl WordDocidsExtractors {
pos,
word,
is_exact(fname, word),
FieldDbExtraction::Extract,
inner.docid(),
doc_alloc,
)
};
document_tokenizer.tokenize_document(
inner.inserted(),
new_fields_ids_map,
&mut should_tokenize,
&mut token_fn,
)?;
}
@@ -411,3 +464,292 @@ impl WordDocidsExtractors {
cached_sorter.flush_fid_word_count(&mut buffer)
}
}
pub struct WordDocidsSettingsExtractorsData<'a, SD> {
tokenizer: DocumentTokenizer<'a>,
max_memory_by_thread: Option<usize>,
buckets: usize,
settings_delta: &'a SD,
}
impl<'extractor, SD: SettingsDelta + Sync> SettingsChangeExtractor<'extractor>
for WordDocidsSettingsExtractorsData<'_, SD>
{
type Data = RefCell<Option<WordDocidsBalancedCaches<'extractor>>>;
fn init_data<'doc>(&'doc self, extractor_alloc: &'extractor Bump) -> crate::Result<Self::Data> {
Ok(RefCell::new(Some(WordDocidsBalancedCaches::new_in(
self.buckets,
self.max_memory_by_thread,
extractor_alloc,
))))
}
fn process<'doc>(
&'doc self,
documents: impl Iterator<Item = crate::Result<DocumentIdentifiers<'doc>>>,
context: &'doc DocumentContext<Self::Data>,
) -> crate::Result<()> {
for document in documents {
let document = document?;
SettingsChangeWordDocidsExtractors::extract_document_from_settings_change(
document,
context,
&self.tokenizer,
self.settings_delta,
)?;
}
Ok(())
}
}
pub struct SettingsChangeWordDocidsExtractors;
impl SettingsChangeWordDocidsExtractors {
pub fn run_extraction<'fid, 'indexer, 'index, 'extractor, SD, MSP>(
settings_delta: &SD,
documents: &'indexer DocumentsIndentifiers<'indexer>,
indexing_context: IndexingContext<'fid, 'indexer, 'index, MSP>,
extractor_allocs: &'extractor mut ThreadLocal<FullySend<Bump>>,
step: IndexingStep,
) -> Result<WordDocidsCaches<'extractor>>
where
SD: SettingsDelta + Sync,
MSP: Fn() -> bool + Sync,
{
// Warning: this is duplicated code from extract_word_pair_proximity_docids.rs
// TODO we need to read the new AND old settings to support changing global parameters
let rtxn = indexing_context.index.read_txn()?;
let stop_words = indexing_context.index.stop_words(&rtxn)?;
let allowed_separators = indexing_context.index.allowed_separators(&rtxn)?;
let allowed_separators: Option<Vec<_>> =
allowed_separators.as_ref().map(|s| s.iter().map(String::as_str).collect());
let dictionary = indexing_context.index.dictionary(&rtxn)?;
let dictionary: Option<Vec<_>> =
dictionary.as_ref().map(|s| s.iter().map(String::as_str).collect());
let mut builder = tokenizer_builder(
stop_words.as_ref(),
allowed_separators.as_deref(),
dictionary.as_deref(),
);
let tokenizer = builder.build();
let localized_attributes_rules =
indexing_context.index.localized_attributes_rules(&rtxn)?.unwrap_or_default();
let document_tokenizer = DocumentTokenizer {
tokenizer: &tokenizer,
localized_attributes_rules: &localized_attributes_rules,
max_positions_per_attributes: MAX_POSITION_PER_ATTRIBUTE,
};
let extractor_data = WordDocidsSettingsExtractorsData {
tokenizer: document_tokenizer,
max_memory_by_thread: indexing_context.grenad_parameters.max_memory_by_thread(),
buckets: rayon::current_num_threads(),
settings_delta,
};
let datastore = ThreadLocal::new();
{
let span = tracing::debug_span!(target: "indexing::documents::extract", "vectors");
let _entered = span.enter();
settings_change_extract(
documents,
&extractor_data,
indexing_context,
extractor_allocs,
&datastore,
step,
)?;
}
let mut merger = WordDocidsCaches::new();
for cache in datastore.into_iter().flat_map(RefCell::into_inner) {
merger.push(cache)?;
}
Ok(merger)
}
/// Extracts document words from a settings change.
fn extract_document_from_settings_change<SD: SettingsDelta>(
document: DocumentIdentifiers<'_>,
context: &DocumentContext<RefCell<Option<WordDocidsBalancedCaches>>>,
document_tokenizer: &DocumentTokenizer,
settings_delta: &SD,
) -> Result<()> {
let mut cached_sorter_ref = context.data.borrow_mut_or_yield();
let cached_sorter = cached_sorter_ref.as_mut().unwrap();
let doc_alloc = &context.doc_alloc;
let new_fields_ids_map = settings_delta.new_fields_ids_map();
let old_fields_ids_map = context.index.fields_ids_map_with_metadata(&context.rtxn)?;
let old_searchable = settings_delta.old_searchable_attributes().as_ref();
let new_searchable = settings_delta.new_searchable_attributes().as_ref();
let current_document = document.current(
&context.rtxn,
context.index,
old_fields_ids_map.as_fields_ids_map(),
)?;
#[derive(Debug, Clone, Copy, PartialEq)]
enum ActionToOperate {
ReindexAllFields,
// TODO improve by listing field prefixes
IndexAddedFields,
SkipDocument,
}
let mut action = ActionToOperate::SkipDocument;
// Here we do a preliminary check to determine the action to take.
// This check doesn't trigger the tokenizer as we never return
// PatternMatch::Match.
document_tokenizer.tokenize_document(
current_document,
&mut |field_name| {
let fid = new_fields_ids_map.id(field_name).expect("All fields IDs must exist");
// If the document must be reindexed, early return NoMatch to stop the scanning process.
if action == ActionToOperate::ReindexAllFields {
return Ok((fid, PatternMatch::NoMatch));
}
let old_field_metadata = old_fields_ids_map.metadata(fid).unwrap();
let new_field_metadata = new_fields_ids_map.metadata(fid).unwrap();
action = match (old_field_metadata, new_field_metadata) {
// At least one field is added or removed from the exact fields => ReindexAllFields
(Metadata { exact: old_exact, .. }, Metadata { exact: new_exact, .. })
if old_exact != new_exact =>
{
ActionToOperate::ReindexAllFields
}
// At least one field is removed from the searchable fields => ReindexAllFields
(Metadata { searchable: Some(_), .. }, Metadata { searchable: None, .. }) => {
ActionToOperate::ReindexAllFields
}
// At least one field is added in the searchable fields => IndexAddedFields
(Metadata { searchable: None, .. }, Metadata { searchable: Some(_), .. }) => {
// We can safely overwrite the action, because we early return when action is ReindexAllFields.
ActionToOperate::IndexAddedFields
}
_ => action,
};
Ok((fid, PatternMatch::Parent))
},
&mut |_, _, _, _| Ok(()),
)?;
// Early return when we don't need to index the document
if action == ActionToOperate::SkipDocument {
return Ok(());
}
let mut should_tokenize = |field_name: &str| {
let field_id = new_fields_ids_map.id(field_name).expect("All fields IDs must exist");
let old_field_metadata = old_fields_ids_map.metadata(field_id).unwrap();
let new_field_metadata = new_fields_ids_map.metadata(field_id).unwrap();
let pattern_match = match action {
ActionToOperate::ReindexAllFields => {
if old_field_metadata.is_searchable() || new_field_metadata.is_searchable() {
PatternMatch::Match
// If any old or new field is searchable then we need to iterate over all fields
// else if any field matches we need to iterate over all fields
} else if has_searchable_children(
field_name,
old_searchable.zip(new_searchable).map(|(old, new)| old.iter().chain(new)),
) {
PatternMatch::Parent
} else {
PatternMatch::NoMatch
}
}
ActionToOperate::IndexAddedFields => {
// Was not searchable but now is
if !old_field_metadata.is_searchable() && new_field_metadata.is_searchable() {
PatternMatch::Match
// If the field is now a parent of a searchable field
} else if has_searchable_children(field_name, new_searchable) {
PatternMatch::Parent
} else {
PatternMatch::NoMatch
}
}
ActionToOperate::SkipDocument => unreachable!(),
};
Ok((field_id, pattern_match))
};
let old_disabled_typos_terms = settings_delta.old_disabled_typos_terms();
let new_disabled_typos_terms = settings_delta.new_disabled_typos_terms();
let mut token_fn = |_field_name: &str, field_id, pos, word: &str| {
let old_field_metadata = old_fields_ids_map.metadata(field_id).unwrap();
let new_field_metadata = new_fields_ids_map.metadata(field_id).unwrap();
match (old_field_metadata, new_field_metadata) {
(
Metadata { searchable: Some(_), exact: old_exact, .. },
Metadata { searchable: None, .. },
) => cached_sorter.insert_del_u32(
field_id,
pos,
word,
old_exact || old_disabled_typos_terms.is_exact(word),
// We deleted the field globally
FieldDbExtraction::Skip,
document.docid(),
doc_alloc,
),
(
Metadata { searchable: None, .. },
Metadata { searchable: Some(_), exact: new_exact, .. },
) => cached_sorter.insert_add_u32(
field_id,
pos,
word,
new_exact || new_disabled_typos_terms.is_exact(word),
FieldDbExtraction::Extract,
document.docid(),
doc_alloc,
),
(Metadata { searchable: None, .. }, Metadata { searchable: None, .. }) => {
unreachable!()
}
(Metadata { exact: old_exact, .. }, Metadata { exact: new_exact, .. }) => {
cached_sorter.insert_del_u32(
field_id,
pos,
word,
old_exact || old_disabled_typos_terms.is_exact(word),
// The field has already been extracted
FieldDbExtraction::Skip,
document.docid(),
doc_alloc,
)?;
cached_sorter.insert_add_u32(
field_id,
pos,
word,
new_exact || new_disabled_typos_terms.is_exact(word),
// The field has already been extracted
FieldDbExtraction::Skip,
document.docid(),
doc_alloc,
)
}
}
};
// TODO we must tokenize twice when we change global parameters like stop words,
// the language settings, dictionary, separators, non-separators...
document_tokenizer.tokenize_document(
current_document,
&mut should_tokenize,
&mut token_fn,
)?;
Ok(())
}
}
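
The settings-change extractor above first scans the field metadata to pick one of three actions before tokenizing anything: reindex everything, index only the newly searchable fields, or skip the document. A condensed sketch of that decision table, with the metadata simplified to two booleans rather than the real `Metadata` type:

```rust
#[derive(Clone, Copy, PartialEq, Debug)]
enum Action {
    ReindexAllFields,
    IndexAddedFields,
    SkipDocument,
}

#[derive(Clone, Copy)]
struct FieldMeta {
    searchable: bool,
    exact: bool,
}

// `fields` pairs the old and new metadata of every field of the document.
fn decide(fields: &[(FieldMeta, FieldMeta)]) -> Action {
    let mut action = Action::SkipDocument;
    for (old, new) in fields {
        if old.exact != new.exact || (old.searchable && !new.searchable) {
            // Exact set changed or a field stopped being searchable: full reindex.
            return Action::ReindexAllFields;
        }
        if !old.searchable && new.searchable {
            // A field became searchable: only the added fields need indexing.
            action = Action::IndexAddedFields;
        }
    }
    action
}

fn main() {
    let unchanged = (FieldMeta { searchable: true, exact: false }, FieldMeta { searchable: true, exact: false });
    let added = (FieldMeta { searchable: false, exact: false }, FieldMeta { searchable: true, exact: false });
    assert_eq!(decide(&[unchanged]), Action::SkipDocument);
    assert_eq!(decide(&[unchanged, added]), Action::IndexAddedFields);
}
```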

View File

@@ -6,17 +6,24 @@ use bumpalo::Bump;
use super::match_searchable_field;
use super::tokenize_document::{tokenizer_builder, DocumentTokenizer};
use crate::fields_ids_map::metadata::Metadata;
use crate::proximity::ProximityPrecision::*;
use crate::proximity::{index_proximity, MAX_DISTANCE};
use crate::update::new::document::{Document, DocumentContext};
use crate::update::new::extract::cache::BalancedCaches;
use crate::update::new::indexer::document_changes::{
extract, DocumentChanges, Extractor, IndexingContext,
};
use crate::update::new::indexer::settings_change_extract;
use crate::update::new::indexer::settings_changes::{
DocumentsIndentifiers, SettingsChangeExtractor,
};
use crate::update::new::ref_cell_ext::RefCellExt as _;
use crate::update::new::steps::IndexingStep;
use crate::update::new::thread_local::{FullySend, ThreadLocal};
use crate::update::new::DocumentChange;
use crate::{FieldId, GlobalFieldsIdsMap, Result, MAX_POSITION_PER_ATTRIBUTE};
use crate::update::new::{DocumentChange, DocumentIdentifiers};
use crate::update::settings::SettingsDelta;
use crate::{FieldId, PatternMatch, Result, UserError, MAX_POSITION_PER_ATTRIBUTE};
pub struct WordPairProximityDocidsExtractorData<'a> {
tokenizer: DocumentTokenizer<'a>,
@@ -116,7 +123,7 @@ impl WordPairProximityDocidsExtractor {
// and to store the docids of the documents that have a number of words in a given field
// equal to or under than MAX_COUNTED_WORDS.
fn extract_document_change(
context: &DocumentContext<RefCell<BalancedCaches>>,
context: &DocumentContext<RefCell<BalancedCaches<'_>>>,
document_tokenizer: &DocumentTokenizer,
searchable_attributes: Option<&[&str]>,
document_change: DocumentChange,
@@ -147,8 +154,12 @@ impl WordPairProximityDocidsExtractor {
process_document_tokens(
document,
document_tokenizer,
new_fields_ids_map,
&mut word_positions,
&mut |field_name| {
new_fields_ids_map
.id_with_metadata_or_insert(field_name)
.ok_or(UserError::AttributeLimitReached.into())
},
&mut |(w1, w2), prox| {
del_word_pair_proximity.push(((w1, w2), prox));
},
@@ -170,8 +181,12 @@ impl WordPairProximityDocidsExtractor {
process_document_tokens(
document,
document_tokenizer,
new_fields_ids_map,
&mut word_positions,
&mut |field_name| {
new_fields_ids_map
.id_with_metadata_or_insert(field_name)
.ok_or(UserError::AttributeLimitReached.into())
},
&mut |(w1, w2), prox| {
del_word_pair_proximity.push(((w1, w2), prox));
},
@@ -180,8 +195,12 @@ impl WordPairProximityDocidsExtractor {
process_document_tokens(
document,
document_tokenizer,
new_fields_ids_map,
&mut word_positions,
&mut |field_name| {
new_fields_ids_map
.id_with_metadata_or_insert(field_name)
.ok_or(UserError::AttributeLimitReached.into())
},
&mut |(w1, w2), prox| {
add_word_pair_proximity.push(((w1, w2), prox));
},
@@ -192,8 +211,12 @@ impl WordPairProximityDocidsExtractor {
process_document_tokens(
document,
document_tokenizer,
new_fields_ids_map,
&mut word_positions,
&mut |field_name| {
new_fields_ids_map
.id_with_metadata_or_insert(field_name)
.ok_or(UserError::AttributeLimitReached.into())
},
&mut |(w1, w2), prox| {
add_word_pair_proximity.push(((w1, w2), prox));
},
@@ -257,8 +280,8 @@ fn drain_word_positions(
fn process_document_tokens<'doc>(
document: impl Document<'doc>,
document_tokenizer: &DocumentTokenizer,
fields_ids_map: &mut GlobalFieldsIdsMap,
word_positions: &mut VecDeque<(Rc<str>, u16)>,
field_id_and_metadata: &mut impl FnMut(&str) -> Result<(FieldId, Metadata)>,
word_pair_proximity: &mut impl FnMut((Rc<str>, Rc<str>), u8),
) -> Result<()> {
let mut field_id = None;
@@ -279,8 +302,248 @@ fn process_document_tokens<'doc>(
word_positions.push_back((Rc::from(word), pos));
Ok(())
};
document_tokenizer.tokenize_document(document, fields_ids_map, &mut token_fn)?;
let mut should_tokenize = |field_name: &str| {
let (field_id, meta) = field_id_and_metadata(field_name)?;
let pattern_match = if meta.is_searchable() {
PatternMatch::Match
} else {
// TODO: should be a match on the field_name using `match_field_legacy` function,
// but for legacy reasons we iterate over all the fields to fill the field_id_map.
PatternMatch::Parent
};
Ok((field_id, pattern_match))
};
document_tokenizer.tokenize_document(document, &mut should_tokenize, &mut token_fn)?;
drain_word_positions(word_positions, word_pair_proximity);
Ok(())
}
pub struct WordPairProximityDocidsSettingsExtractorsData<'a, SD> {
tokenizer: DocumentTokenizer<'a>,
max_memory_by_thread: Option<usize>,
buckets: usize,
settings_delta: &'a SD,
}
impl<'extractor, SD: SettingsDelta + Sync> SettingsChangeExtractor<'extractor>
for WordPairProximityDocidsSettingsExtractorsData<'_, SD>
{
type Data = RefCell<BalancedCaches<'extractor>>;
fn init_data<'doc>(&'doc self, extractor_alloc: &'extractor Bump) -> crate::Result<Self::Data> {
Ok(RefCell::new(BalancedCaches::new_in(
self.buckets,
self.max_memory_by_thread,
extractor_alloc,
)))
}
fn process<'doc>(
&'doc self,
documents: impl Iterator<Item = crate::Result<DocumentIdentifiers<'doc>>>,
context: &'doc DocumentContext<Self::Data>,
) -> crate::Result<()> {
for document in documents {
let document = document?;
SettingsChangeWordPairProximityDocidsExtractors::extract_document_from_settings_change(
document,
context,
&self.tokenizer,
self.settings_delta,
)?;
}
Ok(())
}
}
pub struct SettingsChangeWordPairProximityDocidsExtractors;
impl SettingsChangeWordPairProximityDocidsExtractors {
pub fn run_extraction<'fid, 'indexer, 'index, 'extractor, SD, MSP>(
settings_delta: &SD,
documents: &'indexer DocumentsIndentifiers<'indexer>,
indexing_context: IndexingContext<'fid, 'indexer, 'index, MSP>,
extractor_allocs: &'extractor mut ThreadLocal<FullySend<Bump>>,
step: IndexingStep,
) -> Result<Vec<BalancedCaches<'extractor>>>
where
SD: SettingsDelta + Sync,
MSP: Fn() -> bool + Sync,
{
// Warning: this is duplicated code from extract_word_docids.rs
let rtxn = indexing_context.index.read_txn()?;
let stop_words = indexing_context.index.stop_words(&rtxn)?;
let allowed_separators = indexing_context.index.allowed_separators(&rtxn)?;
let allowed_separators: Option<Vec<_>> =
allowed_separators.as_ref().map(|s| s.iter().map(String::as_str).collect());
let dictionary = indexing_context.index.dictionary(&rtxn)?;
let dictionary: Option<Vec<_>> =
dictionary.as_ref().map(|s| s.iter().map(String::as_str).collect());
let mut builder = tokenizer_builder(
stop_words.as_ref(),
allowed_separators.as_deref(),
dictionary.as_deref(),
);
let tokenizer = builder.build();
let localized_attributes_rules =
indexing_context.index.localized_attributes_rules(&rtxn)?.unwrap_or_default();
let document_tokenizer = DocumentTokenizer {
tokenizer: &tokenizer,
localized_attributes_rules: &localized_attributes_rules,
max_positions_per_attributes: MAX_POSITION_PER_ATTRIBUTE,
};
let extractor_data = WordPairProximityDocidsSettingsExtractorsData {
tokenizer: document_tokenizer,
max_memory_by_thread: indexing_context.grenad_parameters.max_memory_by_thread(),
buckets: rayon::current_num_threads(),
settings_delta,
};
let datastore = ThreadLocal::new();
{
let span = tracing::trace_span!(target: "indexing::documents::extract", "word_pair_proximity_docids_extraction");
let _entered = span.enter();
settings_change_extract(
documents,
&extractor_data,
indexing_context,
extractor_allocs,
&datastore,
step,
)?;
}
Ok(datastore.into_iter().map(RefCell::into_inner).collect())
}
/// Extracts document words from a settings change.
fn extract_document_from_settings_change<SD: SettingsDelta>(
document: DocumentIdentifiers<'_>,
context: &DocumentContext<RefCell<BalancedCaches<'_>>>,
document_tokenizer: &DocumentTokenizer,
settings_delta: &SD,
) -> Result<()> {
let mut cached_sorter = context.data.borrow_mut_or_yield();
let doc_alloc = &context.doc_alloc;
let new_fields_ids_map = settings_delta.new_fields_ids_map();
let old_fields_ids_map = settings_delta.old_fields_ids_map();
let old_proximity_precision = *settings_delta.old_proximity_precision();
let new_proximity_precision = *settings_delta.new_proximity_precision();
let current_document = document.current(
&context.rtxn,
context.index,
old_fields_ids_map.as_fields_ids_map(),
)?;
#[derive(Debug, Clone, Copy, PartialEq)]
enum ActionToOperate {
ReindexAllFields,
SkipDocument,
}
// TODO prefix_fid delete_old_fid_based_databases
let mut action = match (old_proximity_precision, new_proximity_precision) {
(ByAttribute, ByWord) => ActionToOperate::ReindexAllFields,
(_, _) => ActionToOperate::SkipDocument,
};
// Here we do a preliminary check to determine the action to take.
// This check doesn't trigger the tokenizer as we never return
// PatternMatch::Match.
if action != ActionToOperate::ReindexAllFields {
document_tokenizer.tokenize_document(
current_document,
&mut |field_name| {
let fid = new_fields_ids_map.id(field_name).expect("All fields IDs must exist");
// If the document must be reindexed, early return NoMatch to stop the scanning process.
if action == ActionToOperate::ReindexAllFields {
return Ok((fid, PatternMatch::NoMatch));
}
let old_field_metadata = old_fields_ids_map.metadata(fid).unwrap();
let new_field_metadata = new_fields_ids_map.metadata(fid).unwrap();
action = match (old_field_metadata, new_field_metadata) {
// At least one field is removed or added from the searchable fields
(
Metadata { searchable: Some(_), .. },
Metadata { searchable: None, .. },
)
| (
Metadata { searchable: None, .. },
Metadata { searchable: Some(_), .. },
) => ActionToOperate::ReindexAllFields,
_ => action,
};
Ok((fid, PatternMatch::Parent))
},
&mut |_, _, _, _| Ok(()),
)?;
}
// Early return when we don't need to index the document
if action == ActionToOperate::SkipDocument {
return Ok(());
}
let mut del_word_pair_proximity = bumpalo::collections::Vec::new_in(doc_alloc);
let mut add_word_pair_proximity = bumpalo::collections::Vec::new_in(doc_alloc);
// is a vecdequeue, and will be smol, so can stay on the heap for now
let mut word_positions: VecDeque<(Rc<str>, u16)> =
VecDeque::with_capacity(MAX_DISTANCE as usize);
process_document_tokens(
current_document,
// TODO Tokenize must be based on old settings
document_tokenizer,
&mut word_positions,
&mut |field_name| {
Ok(old_fields_ids_map.id_with_metadata(field_name).expect("All fields must exist"))
},
&mut |(w1, w2), prox| {
del_word_pair_proximity.push(((w1, w2), prox));
},
)?;
process_document_tokens(
current_document,
// TODO Tokenize must be based on new settings
document_tokenizer,
&mut word_positions,
&mut |field_name| {
Ok(new_fields_ids_map.id_with_metadata(field_name).expect("All fields must exist"))
},
&mut |(w1, w2), prox| {
add_word_pair_proximity.push(((w1, w2), prox));
},
)?;
let mut key_buffer = bumpalo::collections::Vec::new_in(doc_alloc);
del_word_pair_proximity.sort_unstable();
del_word_pair_proximity.dedup_by(|(k1, _), (k2, _)| k1 == k2);
for ((w1, w2), prox) in del_word_pair_proximity.iter() {
let key = build_key(*prox, w1, w2, &mut key_buffer);
cached_sorter.insert_del_u32(key, document.docid())?;
}
add_word_pair_proximity.sort_unstable();
add_word_pair_proximity.dedup_by(|(k1, _), (k2, _)| k1 == k2);
for ((w1, w2), prox) in add_word_pair_proximity.iter() {
let key = build_key(*prox, w1, w2, &mut key_buffer);
cached_sorter.insert_add_u32(key, document.docid())?;
}
Ok(())
}
}

View File

@@ -2,8 +2,12 @@ mod extract_word_docids;
mod extract_word_pair_proximity_docids;
mod tokenize_document;
pub use extract_word_docids::{WordDocidsCaches, WordDocidsExtractors};
pub use extract_word_pair_proximity_docids::WordPairProximityDocidsExtractor;
pub use extract_word_docids::{
SettingsChangeWordDocidsExtractors, WordDocidsCaches, WordDocidsExtractors,
};
pub use extract_word_pair_proximity_docids::{
SettingsChangeWordPairProximityDocidsExtractors, WordPairProximityDocidsExtractor,
};
use crate::attribute_patterns::{match_field_legacy, PatternMatch};
@@ -27,3 +31,17 @@ pub fn match_searchable_field(
selection
}
/// return `true` if the provided `field_name` is a parent of at least one of the fields contained in `searchable`,
/// or if `searchable` is `None`.
fn has_searchable_children<I, A>(field_name: &str, searchable: Option<I>) -> bool
where
I: IntoIterator<Item = A>,
A: AsRef<str>,
{
searchable.is_none_or(|fields| {
fields
.into_iter()
.any(|attr| match_field_legacy(attr.as_ref(), field_name) == PatternMatch::Parent)
})
}
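
The new `has_searchable_children` helper treats `None` as "everything is searchable" and otherwise checks whether any searchable pattern nests under the given field. A self-contained version with a toy `match_field_legacy` (the real one in milli also understands wildcard patterns), plus the cases it is meant to cover:

```rust
#[derive(PartialEq)]
enum PatternMatch {
    Match,
    Parent,
    NoMatch,
}

// Toy matcher: `Parent` means `field_name` is an ancestor of the pattern.
fn match_field_legacy(attr: &str, field_name: &str) -> PatternMatch {
    if attr == field_name {
        PatternMatch::Match
    } else if attr.strip_prefix(field_name).is_some_and(|rest| rest.starts_with('.')) {
        PatternMatch::Parent
    } else {
        PatternMatch::NoMatch
    }
}

fn has_searchable_children<I, A>(field_name: &str, searchable: Option<I>) -> bool
where
    I: IntoIterator<Item = A>,
    A: AsRef<str>,
{
    // `None` means no explicit searchable list, i.e. every field is searchable.
    searchable.map_or(true, |fields| {
        fields
            .into_iter()
            .any(|attr| match_field_legacy(attr.as_ref(), field_name) == PatternMatch::Parent)
    })
}

fn main() {
    assert!(has_searchable_children("meta", None::<[&str; 0]>));
    assert!(has_searchable_children("meta", Some(["meta.title"])));
    assert!(!has_searchable_children("meta", Some(["title"])));
}
```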

View File

@@ -8,10 +8,7 @@ use crate::update::new::document::Document;
use crate::update::new::extract::perm_json_p::{
seek_leaf_values_in_array, seek_leaf_values_in_object, Depth,
};
use crate::{
FieldId, GlobalFieldsIdsMap, InternalError, LocalizedAttributesRule, Result, UserError,
MAX_WORD_LENGTH,
};
use crate::{FieldId, InternalError, LocalizedAttributesRule, Result, MAX_WORD_LENGTH};
// todo: should be crate::proximity::MAX_DISTANCE but it has been forgotten
const MAX_DISTANCE: u32 = 8;
@@ -26,26 +23,25 @@ impl DocumentTokenizer<'_> {
pub fn tokenize_document<'doc>(
&self,
document: impl Document<'doc>,
field_id_map: &mut GlobalFieldsIdsMap,
should_tokenize: &mut impl FnMut(&str) -> Result<(FieldId, PatternMatch)>,
token_fn: &mut impl FnMut(&str, FieldId, u16, &str) -> Result<()>,
) -> Result<()> {
let mut field_position = HashMap::new();
let mut tokenize_field = |field_name: &str, _depth, value: &Value| {
let Some((field_id, meta)) = field_id_map.id_with_metadata_or_insert(field_name) else {
return Err(UserError::AttributeLimitReached.into());
};
if meta.is_searchable() {
self.tokenize_field(field_id, field_name, value, token_fn, &mut field_position)?;
}
// todo: should be a match on the field_name using `match_field_legacy` function,
// but for legacy reasons we iterate over all the fields to fill the field_id_map.
Ok(PatternMatch::Match)
};
for entry in document.iter_top_level_fields() {
let (field_name, value) = entry?;
if let (_, PatternMatch::NoMatch) = should_tokenize(field_name)? {
continue;
}
let mut tokenize_field = |field_name: &str, _depth, value: &Value| {
let (fid, pattern_match) = should_tokenize(field_name)?;
if pattern_match == PatternMatch::Match {
self.tokenize_field(fid, field_name, value, token_fn, &mut field_position)?;
}
Ok(pattern_match)
};
// parse json.
match serde_json::to_value(value).map_err(InternalError::SerdeJson)? {
Value::Object(object) => seek_leaf_values_in_object(
@@ -192,7 +188,7 @@ mod test {
use super::*;
use crate::fields_ids_map::metadata::{FieldIdMapWithMetadata, MetadataBuilder};
use crate::update::new::document::{DocumentFromVersions, Versions};
use crate::FieldsIdsMap;
use crate::{FieldsIdsMap, GlobalFieldsIdsMap, UserError};
#[test]
fn test_tokenize_document() {
@@ -231,6 +227,7 @@ mod test {
Default::default(),
Default::default(),
Default::default(),
Default::default(),
None,
None,
Default::default(),
@@ -251,15 +248,19 @@ mod test {
let document = Versions::single(document);
let document = DocumentFromVersions::new(&document);
let mut should_tokenize = |field_name: &str| {
let Some(field_id) = global_fields_ids_map.id_or_insert(field_name) else {
return Err(UserError::AttributeLimitReached.into());
};
Ok((field_id, PatternMatch::Match))
};
document_tokenizer
.tokenize_document(
document,
&mut global_fields_ids_map,
&mut |_fname, fid, pos, word| {
words.insert([fid, pos], word.to_string());
Ok(())
},
)
.tokenize_document(document, &mut should_tokenize, &mut |_fname, fid, pos, word| {
words.insert([fid, pos], word.to_string());
Ok(())
})
.unwrap();
snapshot!(format!("{:#?}", words), @r###"

View File

@@ -1,5 +1,6 @@
use std::cell::RefCell;
use std::fmt::Debug;
use std::sync::RwLock;
use bumpalo::collections::Vec as BVec;
use bumpalo::Bump;
@@ -27,7 +28,10 @@ use crate::vector::extractor::{
use crate::vector::session::{EmbedSession, Input, Metadata, OnEmbed};
use crate::vector::settings::ReindexAction;
use crate::vector::{Embedding, RuntimeEmbedder, RuntimeEmbedders, RuntimeFragment};
use crate::{DocumentId, FieldDistribution, InternalError, Result, ThreadPoolNoAbort, UserError};
use crate::{
DocumentId, FieldDistribution, GlobalFieldsIdsMap, InternalError, Result, ThreadPoolNoAbort,
UserError,
};
pub struct EmbeddingExtractor<'a, 'b> {
embedders: &'a RuntimeEmbedders,
@@ -321,6 +325,15 @@ impl<'extractor, SD: SettingsDelta + Sync> SettingsChangeExtractor<'extractor>
let old_embedders = self.settings_delta.old_embedders();
let unused_vectors_distribution = UnusedVectorsDistributionBump::new_in(&context.doc_alloc);
// We get a reference to the new and old fields ids maps but
// note that those are local versions where updates to them
// will not be reflected in the database. It's not an issue
// because new settings do not generate new fields.
let new_fields_ids_map = RwLock::new(self.settings_delta.new_fields_ids_map().clone());
let new_fields_ids_map = RefCell::new(GlobalFieldsIdsMap::new(&new_fields_ids_map));
let old_fields_ids_map = RwLock::new(self.settings_delta.old_fields_ids_map().clone());
let old_fields_ids_map = RefCell::new(GlobalFieldsIdsMap::new(&old_fields_ids_map));
let mut all_chunks = BVec::with_capacity_in(embedders.len(), &context.doc_alloc);
let embedder_configs = context.index.embedding_configs();
for (embedder_name, action) in self.settings_delta.embedder_actions().iter() {
@@ -396,6 +409,7 @@ impl<'extractor, SD: SettingsDelta + Sync> SettingsChangeExtractor<'extractor>
if !must_regenerate {
continue;
}
// we need to regenerate the prompts for the document
chunks.settings_change_autogenerated(
document.docid(),
@@ -406,7 +420,8 @@ impl<'extractor, SD: SettingsDelta + Sync> SettingsChangeExtractor<'extractor>
context.db_fields_ids_map,
)?,
self.settings_delta,
context.new_fields_ids_map,
&old_fields_ids_map,
&new_fields_ids_map,
&unused_vectors_distribution,
old_is_user_provided,
fragments_changed,
@@ -442,7 +457,8 @@ impl<'extractor, SD: SettingsDelta + Sync> SettingsChangeExtractor<'extractor>
context.db_fields_ids_map,
)?,
self.settings_delta,
context.new_fields_ids_map,
&old_fields_ids_map,
&new_fields_ids_map,
&unused_vectors_distribution,
old_is_user_provided,
true,
@@ -638,7 +654,8 @@ impl<'a, 'b, 'extractor> Chunks<'a, 'b, 'extractor> {
external_docid: &'a str,
document: D,
settings_delta: &SD,
fields_ids_map: &'a RefCell<crate::GlobalFieldsIdsMap>,
old_fields_ids_map: &'a RefCell<GlobalFieldsIdsMap<'a>>,
new_fields_ids_map: &'a RefCell<GlobalFieldsIdsMap<'a>>,
unused_vectors_distribution: &UnusedVectorsDistributionBump<'a>,
old_is_user_provided: bool,
full_reindex: bool,
@@ -733,10 +750,17 @@ impl<'a, 'b, 'extractor> Chunks<'a, 'b, 'extractor> {
old_embedder.as_ref().map(|old_embedder| &old_embedder.document_template)
};
let extractor =
DocumentTemplateExtractor::new(document_template, doc_alloc, fields_ids_map);
let extractor = DocumentTemplateExtractor::new(
document_template,
doc_alloc,
new_fields_ids_map,
);
let old_extractor = old_document_template.map(|old_document_template| {
DocumentTemplateExtractor::new(old_document_template, doc_alloc, fields_ids_map)
DocumentTemplateExtractor::new(
old_document_template,
doc_alloc,
old_fields_ids_map,
)
});
let metadata =
Metadata { docid, external_docid, extractor_id: extractor.extractor_id() };

View File

@@ -0,0 +1,9 @@
pub mod sharding {
pub struct Shards;
impl Shards {
pub fn must_process(&self, _docid: &str) -> bool {
true
}
}
}
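
This new module is the community-edition stand-in for the enterprise sharding logic: `must_process` always answers `true`, so every document is handled locally. A trivial usage sketch of that contract (the enterprise counterpart is presumably where document ids get routed across remote shards, which is an assumption based on the sharding tests above):

```rust
// Copy of the stub above plus a usage check.
pub mod sharding {
    pub struct Shards;

    impl Shards {
        pub fn must_process(&self, _docid: &str) -> bool {
            true
        }
    }
}

fn main() {
    let shards = sharding::Shards;
    // The community stub never filters anything out.
    assert!(shards.must_process("any-document-id"));
}
```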

View File

@@ -17,7 +17,7 @@ use super::guess_primary_key::retrieve_or_guess_primary_key;
use crate::documents::PrimaryKey;
use crate::progress::{AtomicPayloadStep, Progress};
use crate::update::new::document::{DocumentContext, Versions};
use crate::update::new::indexer::enterprise_edition::sharding::Shards;
use crate::update::new::indexer::current_edition::sharding::Shards;
use crate::update::new::steps::IndexingStep;
use crate::update::new::thread_local::MostlySend;
use crate::update::new::{DocumentIdentifiers, Insertion, Update};

View File

@@ -372,11 +372,10 @@ where
SD: SettingsDelta + Sync,
{
// Create the list of document ids to extract
let rtxn = indexing_context.index.read_txn()?;
let all_document_ids =
indexing_context.index.documents_ids(&rtxn)?.into_iter().collect::<Vec<_>>();
let primary_key =
primary_key_from_db(indexing_context.index, &rtxn, &indexing_context.db_fields_ids_map)?;
let index = indexing_context.index;
let rtxn = index.read_txn()?;
let all_document_ids = index.documents_ids(&rtxn)?.into_iter().collect::<Vec<_>>();
let primary_key = primary_key_from_db(index, &rtxn, &indexing_context.db_fields_ids_map)?;
let documents = DocumentsIndentifiers::new(&all_document_ids, primary_key);
let span =
@@ -391,6 +390,133 @@ where
extractor_allocs,
)?;
{
let WordDocidsCaches {
word_docids,
word_fid_docids,
exact_word_docids,
word_position_docids,
fid_word_count_docids,
} = {
let span = tracing::trace_span!(target: "indexing::documents::extract", "word_docids");
let _entered = span.enter();
SettingsChangeWordDocidsExtractors::run_extraction(
settings_delta,
&documents,
indexing_context,
extractor_allocs,
IndexingStep::ExtractingWords,
)?
};
indexing_context.progress.update_progress(IndexingStep::MergingWordCaches);
{
let span = tracing::trace_span!(target: "indexing::documents::merge", "word_docids");
let _entered = span.enter();
indexing_context.progress.update_progress(MergingWordCache::WordDocids);
merge_and_send_docids(
word_docids,
index.word_docids.remap_types(),
index,
extractor_sender.docids::<WordDocids>(),
&indexing_context.must_stop_processing,
)?;
}
{
let span =
tracing::trace_span!(target: "indexing::documents::merge", "word_fid_docids");
let _entered = span.enter();
indexing_context.progress.update_progress(MergingWordCache::WordFieldIdDocids);
merge_and_send_docids(
word_fid_docids,
index.word_fid_docids.remap_types(),
index,
extractor_sender.docids::<WordFidDocids>(),
&indexing_context.must_stop_processing,
)?;
}
{
let span =
tracing::trace_span!(target: "indexing::documents::merge", "exact_word_docids");
let _entered = span.enter();
indexing_context.progress.update_progress(MergingWordCache::ExactWordDocids);
merge_and_send_docids(
exact_word_docids,
index.exact_word_docids.remap_types(),
index,
extractor_sender.docids::<ExactWordDocids>(),
&indexing_context.must_stop_processing,
)?;
}
{
let span =
tracing::trace_span!(target: "indexing::documents::merge", "word_position_docids");
let _entered = span.enter();
indexing_context.progress.update_progress(MergingWordCache::WordPositionDocids);
merge_and_send_docids(
word_position_docids,
index.word_position_docids.remap_types(),
index,
extractor_sender.docids::<WordPositionDocids>(),
&indexing_context.must_stop_processing,
)?;
}
{
let span =
tracing::trace_span!(target: "indexing::documents::merge", "fid_word_count_docids");
let _entered = span.enter();
indexing_context.progress.update_progress(MergingWordCache::FieldIdWordCountDocids);
merge_and_send_docids(
fid_word_count_docids,
index.field_id_word_count_docids.remap_types(),
index,
extractor_sender.docids::<FidWordCountDocids>(),
&indexing_context.must_stop_processing,
)?;
}
}
// Run the proximity extraction only if the precision is ByWord.
let new_proximity_precision = settings_delta.new_proximity_precision();
if *new_proximity_precision == ProximityPrecision::ByWord {
let caches = {
let span = tracing::trace_span!(target: "indexing::documents::extract", "word_pair_proximity_docids");
let _entered = span.enter();
SettingsChangeWordPairProximityDocidsExtractors::run_extraction(
settings_delta,
&documents,
indexing_context,
extractor_allocs,
IndexingStep::ExtractingWordProximity,
)?
};
{
let span = tracing::trace_span!(target: "indexing::documents::merge", "word_pair_proximity_docids");
let _entered = span.enter();
indexing_context.progress.update_progress(IndexingStep::MergingWordProximity);
merge_and_send_docids(
caches,
index.word_pair_proximity_docids.remap_types(),
index,
extractor_sender.docids::<WordPairProximityDocids>(),
&indexing_context.must_stop_processing,
)?;
}
}
'vectors: {
if settings_delta.embedder_actions().is_empty() {
break 'vectors;

View File

@@ -1,4 +1,4 @@
use std::collections::BTreeMap;
use std::collections::{BTreeMap, BTreeSet};
use std::sync::atomic::AtomicBool;
use std::sync::{Arc, Once, RwLock};
use std::thread::{self, Builder};
@@ -8,9 +8,11 @@ use document_changes::{DocumentChanges, IndexingContext};
pub use document_deletion::DocumentDeletion;
pub use document_operation::{DocumentOperation, PayloadStats};
use hashbrown::HashMap;
use heed::{RoTxn, RwTxn};
use heed::types::DecodeIgnore;
use heed::{BytesDecode, Database, RoTxn, RwTxn};
pub use partial_dump::PartialDump;
pub use post_processing::recompute_word_fst_from_word_docids_database;
pub use settings_changes::settings_change_extract;
pub use update_by_function::UpdateByFunction;
pub use write::ChannelCongestion;
use write::{build_vectors, update_index, write_to_db};
@@ -20,18 +22,31 @@ use super::steps::IndexingStep;
use super::thread_local::ThreadLocal;
use crate::documents::PrimaryKey;
use crate::fields_ids_map::metadata::{FieldIdMapWithMetadata, MetadataBuilder};
use crate::heed_codec::StrBEU16Codec;
use crate::progress::{EmbedderStats, Progress};
use crate::proximity::ProximityPrecision;
use crate::update::new::steps::SettingsIndexerStep;
use crate::update::new::FacetFieldIdsDelta;
use crate::update::settings::SettingsDelta;
use crate::update::GrenadParameters;
use crate::vector::settings::{EmbedderAction, RemoveFragments, WriteBackToDocuments};
use crate::vector::{Embedder, RuntimeEmbedders, VectorStore};
use crate::{FieldsIdsMap, GlobalFieldsIdsMap, Index, InternalError, Result, ThreadPoolNoAbort};
use crate::{
Error, FieldsIdsMap, GlobalFieldsIdsMap, Index, InternalError, Result, ThreadPoolNoAbort,
};
#[cfg(not(feature = "enterprise"))]
pub mod community_edition;
pub(crate) mod de;
pub mod document_changes;
mod document_deletion;
mod document_operation;
#[cfg(feature = "enterprise")]
pub mod enterprise_edition;
#[cfg(not(feature = "enterprise"))]
pub use community_edition as current_edition;
#[cfg(feature = "enterprise")]
pub use enterprise_edition as current_edition;
mod extract;
mod guess_primary_key;
mod partial_dump;
@@ -235,6 +250,20 @@ where
SD: SettingsDelta + Sync,
{
delete_old_embedders_and_fragments(wtxn, index, settings_delta)?;
delete_old_fid_based_databases(wtxn, index, settings_delta, must_stop_processing, progress)?;
// Clear word_pair_proximity_docids when switching from ByWord to ByAttribute
let old_proximity_precision = settings_delta.old_proximity_precision();
let new_proximity_precision = settings_delta.new_proximity_precision();
if *old_proximity_precision == ProximityPrecision::ByWord
&& *new_proximity_precision == ProximityPrecision::ByAttribute
{
index.word_pair_proximity_docids.clear(wtxn)?;
}
// TODO delete useless searchable databases
// - Clear fid_prefix_* in the post processing
// - clear the prefix + fid_prefix if setting `PrefixSearch` is enabled
let mut bbbuffers = Vec::new();
let finished_extraction = AtomicBool::new(false);
@@ -293,6 +322,8 @@ where
.unwrap()
})?;
let global_fields_ids_map = GlobalFieldsIdsMap::new(&new_fields_ids_map);
let new_embedders = settings_delta.new_embedders();
let embedder_actions = settings_delta.embedder_actions();
let index_embedder_category_ids = settings_delta.new_embedder_category_id();
@@ -327,6 +358,18 @@ where
})
.unwrap()?;
pool.install(|| {
// WARN: when implementing the facets, don't forget this
let facet_field_ids_delta = FacetFieldIdsDelta::new(0, 0);
post_processing::post_process(
indexing_context,
wtxn,
global_fields_ids_map,
facet_field_ids_delta,
)
})
.unwrap()?;
indexing_context.progress.update_progress(IndexingStep::BuildingGeoJson);
index.cellulite.build(
wtxn,
@@ -456,6 +499,106 @@ where
Ok(())
}
/// Deletes entries referring to the provided
/// fids from the fid-based databases.
fn delete_old_fid_based_databases<SD, MSP>(
wtxn: &mut RwTxn<'_>,
index: &Index,
settings_delta: &SD,
must_stop_processing: &MSP,
progress: &Progress,
) -> Result<()>
where
SD: SettingsDelta + Sync,
MSP: Fn() -> bool + Sync,
{
let fids_to_delete: Option<BTreeSet<_>> = {
let rtxn = index.read_txn()?;
let fields_ids_map = index.fields_ids_map(&rtxn)?;
let old_searchable_attributes = settings_delta.old_searchable_attributes().as_ref();
let new_searchable_attributes = settings_delta.new_searchable_attributes().as_ref();
old_searchable_attributes.zip(new_searchable_attributes).map(|(old, new)| {
old.iter()
// Skip the field if it is still searchable
// or if it was never referenced in any document
.filter_map(|name| if new.contains(name) { None } else { fields_ids_map.id(name) })
.collect()
})
};
let Some(fids_to_delete) = fids_to_delete else {
return Ok(());
};
progress.update_progress(SettingsIndexerStep::DeletingOldWordFidDocids);
delete_old_word_fid_docids(wtxn, index.word_fid_docids, must_stop_processing, &fids_to_delete)?;
progress.update_progress(SettingsIndexerStep::DeletingOldFidWordCountDocids);
delete_old_fid_word_count_docids(wtxn, index, must_stop_processing, &fids_to_delete)?;
progress.update_progress(SettingsIndexerStep::DeletingOldWordPrefixFidDocids);
delete_old_word_fid_docids(
wtxn,
index.word_prefix_fid_docids,
must_stop_processing,
&fids_to_delete,
)?;
Ok(())
}
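The selection above boils down to: delete the id of every field that was searchable before, is no longer searchable, and actually received a field id. A small self-contained sketch with plain collections standing in for the index's fields-ids map (attribute names and ids are made up):

use std::collections::{BTreeSet, HashMap};

fn main() {
    // Stand-in for the index fields-ids map: name -> fid.
    let fields_ids_map: HashMap<&str, u16> =
        HashMap::from([("title", 0), ("overview", 1), ("genres", 2)]);

    // Hypothetical settings change: "overview" stops being searchable,
    // "release_date" becomes searchable but never appeared in a document.
    let old = vec!["title", "overview"];
    let new = vec!["title", "release_date"];

    let fids_to_delete: BTreeSet<u16> = old
        .iter()
        // Skip the field if it is still searchable
        // or if it was never referenced in any document.
        .filter_map(|name| {
            if new.contains(name) {
                None
            } else {
                fields_ids_map.get(name).copied()
            }
        })
        .collect();

    // Only the fid of "overview" ends up in the deletion set.
    assert_eq!(fids_to_delete, BTreeSet::from([1]));
}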
fn delete_old_word_fid_docids<'txn, MSP, DC>(
wtxn: &mut RwTxn<'txn>,
database: Database<StrBEU16Codec, DC>,
must_stop_processing: &MSP,
fids_to_delete: &BTreeSet<u16>,
) -> Result<(), Error>
where
MSP: Fn() -> bool + Sync,
DC: BytesDecode<'txn>,
{
let mut iter = database.iter_mut(wtxn)?.remap_data_type::<DecodeIgnore>();
while let Some(((_word, fid), ())) = iter.next().transpose()? {
// TODO should I call it that often?
if must_stop_processing() {
return Err(Error::InternalError(InternalError::AbortedIndexation));
}
if fids_to_delete.contains(&fid) {
// safety: We don't keep any references to the data.
unsafe { iter.del_current()? };
}
}
Ok(())
}
fn delete_old_fid_word_count_docids<MSP>(
wtxn: &mut RwTxn<'_>,
index: &Index,
must_stop_processing: &MSP,
fids_to_delete: &BTreeSet<u16>,
) -> Result<(), Error>
where
MSP: Fn() -> bool + Sync,
{
let db = index.field_id_word_count_docids.remap_data_type::<DecodeIgnore>();
for &fid_to_delete in fids_to_delete {
if must_stop_processing() {
return Err(Error::InternalError(InternalError::AbortedIndexation));
}
let mut iter = db.prefix_iter_mut(wtxn, &(fid_to_delete, 0))?;
while let Some(((fid, _word_count), ())) = iter.next().transpose()? {
debug_assert_eq!(fid, fid_to_delete);
// safety: We don't keep any references to the data.
unsafe { iter.del_current()? };
}
}
Ok(())
}
fn indexer_memory_settings(
current_num_threads: usize,
grenad_parameters: GrenadParameters,

View File

@@ -28,6 +28,9 @@ make_enum_progress! {
ChangingVectorStore,
UsingStableIndexer,
UsingExperimentalIndexer,
DeletingOldWordFidDocids,
DeletingOldFidWordCountDocids,
DeletingOldWordPrefixFidDocids,
}
}

View File

@@ -1589,33 +1589,33 @@ impl<'a, 't, 'i> Settings<'a, 't, 'i> {
// only use the new indexer when only the embedder possibly changed
if let Self {
searchable_fields: Setting::NotSet,
searchable_fields: _,
displayed_fields: Setting::NotSet,
filterable_fields: Setting::NotSet,
sortable_fields: Setting::NotSet,
criteria: Setting::NotSet,
stop_words: Setting::NotSet,
non_separator_tokens: Setting::NotSet,
separator_tokens: Setting::NotSet,
dictionary: Setting::NotSet,
stop_words: Setting::NotSet, // TODO (require force reindexing of searchables)
non_separator_tokens: Setting::NotSet, // TODO (require force reindexing of searchables)
separator_tokens: Setting::NotSet, // TODO (require force reindexing of searchables)
dictionary: Setting::NotSet, // TODO (require force reindexing of searchables)
distinct_field: Setting::NotSet,
synonyms: Setting::NotSet,
primary_key: Setting::NotSet,
authorize_typos: Setting::NotSet,
min_word_len_two_typos: Setting::NotSet,
min_word_len_one_typo: Setting::NotSet,
exact_words: Setting::NotSet,
exact_attributes: Setting::NotSet,
exact_words: Setting::NotSet, // TODO (require force reindexing of searchables)
exact_attributes: _,
max_values_per_facet: Setting::NotSet,
sort_facet_values_by: Setting::NotSet,
pagination_max_total_hits: Setting::NotSet,
proximity_precision: Setting::NotSet,
proximity_precision: _,
embedder_settings: _,
search_cutoff: Setting::NotSet,
localized_attributes_rules: Setting::NotSet,
prefix_search: Setting::NotSet,
localized_attributes_rules: Setting::NotSet, // TODO to start with
prefix_search: Setting::NotSet, // TODO continue with this
facet_search: Setting::NotSet,
disable_on_numbers: Setting::NotSet,
disable_on_numbers: Setting::NotSet, // TODO (require force reindexing of searchables)
chat: Setting::NotSet,
vector_store: Setting::NotSet,
wtxn: _,
@@ -1632,10 +1632,12 @@ impl<'a, 't, 'i> Settings<'a, 't, 'i> {
// Update index settings
let embedding_config_updates = self.update_embedding_configs()?;
self.update_user_defined_searchable_attributes()?;
self.update_exact_attributes()?;
self.update_proximity_precision()?;
let mut new_inner_settings =
InnerIndexSettings::from_index(self.index, self.wtxn, None)?;
new_inner_settings.recompute_searchables(self.wtxn, self.index)?;
// Note that we don't need to update the searchables here,
// as it will be done after the settings update.
let new_inner_settings = InnerIndexSettings::from_index(self.index, self.wtxn, None)?;
let primary_key_id = self
.index
@@ -2062,9 +2064,12 @@ impl InnerIndexSettings {
let sortable_fields = index.sortable_fields(rtxn)?;
let asc_desc_fields = index.asc_desc_fields(rtxn)?;
let distinct_field = index.distinct_field(rtxn)?.map(|f| f.to_string());
let user_defined_searchable_attributes = index
.user_defined_searchable_fields(rtxn)?
.map(|fields| fields.into_iter().map(|f| f.to_string()).collect());
let user_defined_searchable_attributes = match index.user_defined_searchable_fields(rtxn)? {
Some(fields) if fields.contains(&"*") => None,
Some(fields) => Some(fields.into_iter().map(|f| f.to_string()).collect()),
None => None,
};
let builder = MetadataBuilder::from_index(index, rtxn)?;
let fields_ids_map = FieldIdMapWithMetadata::new(fields_ids_map, builder);
let disabled_typos_terms = index.disabled_typos_terms(rtxn)?;
@@ -2578,8 +2583,20 @@ fn deserialize_sub_embedder(
/// Implement this trait for the settings delta type.
/// This is used in the new settings update flow and will make it easy to replace the old settings delta type: `InnerIndexSettingsDiff`.
pub trait SettingsDelta {
fn new_embedders(&self) -> &RuntimeEmbedders;
fn old_fields_ids_map(&self) -> &FieldIdMapWithMetadata;
fn new_fields_ids_map(&self) -> &FieldIdMapWithMetadata;
fn old_searchable_attributes(&self) -> &Option<Vec<String>>;
fn new_searchable_attributes(&self) -> &Option<Vec<String>>;
fn old_disabled_typos_terms(&self) -> &DisabledTyposTerms;
fn new_disabled_typos_terms(&self) -> &DisabledTyposTerms;
fn old_proximity_precision(&self) -> &ProximityPrecision;
fn new_proximity_precision(&self) -> &ProximityPrecision;
fn old_embedders(&self) -> &RuntimeEmbedders;
fn new_embedders(&self) -> &RuntimeEmbedders;
fn new_embedder_category_id(&self) -> &HashMap<String, u8>;
fn embedder_actions(&self) -> &BTreeMap<String, EmbedderAction>;
fn try_for_each_fragment_diff<F, E>(
@@ -2589,7 +2606,6 @@ pub trait SettingsDelta {
) -> std::result::Result<(), E>
where
F: FnMut(FragmentDiff) -> std::result::Result<(), E>;
fn new_fields_ids_map(&self) -> &FieldIdMapWithMetadata;
}
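The paired old/new accessors are meant to let the settings indexer compare both states without re-reading the index. A reduced, self-contained stand-in showing that pattern with only the proximity-precision methods (`SettingsDeltaLike`, `DummyDelta`, and `must_clear_proximity_db` are illustrative names, not crate items):

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ProximityPrecision {
    ByWord,
    ByAttribute,
}

// Reduced stand-in for the `SettingsDelta` trait: only the two accessors
// needed by this example.
trait SettingsDeltaLike {
    fn old_proximity_precision(&self) -> &ProximityPrecision;
    fn new_proximity_precision(&self) -> &ProximityPrecision;
}

struct DummyDelta {
    old: ProximityPrecision,
    new: ProximityPrecision,
}

impl SettingsDeltaLike for DummyDelta {
    fn old_proximity_precision(&self) -> &ProximityPrecision {
        &self.old
    }
    fn new_proximity_precision(&self) -> &ProximityPrecision {
        &self.new
    }
}

/// Mirrors the check done before clearing `word_pair_proximity_docids`:
/// the database only has to be wiped when going from ByWord to ByAttribute.
fn must_clear_proximity_db<SD: SettingsDeltaLike>(delta: &SD) -> bool {
    *delta.old_proximity_precision() == ProximityPrecision::ByWord
        && *delta.new_proximity_precision() == ProximityPrecision::ByAttribute
}

fn main() {
    let delta = DummyDelta {
        old: ProximityPrecision::ByWord,
        new: ProximityPrecision::ByAttribute,
    };
    assert!(must_clear_proximity_db(&delta));
}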
pub struct FragmentDiff<'a> {
@@ -2598,26 +2614,47 @@ pub struct FragmentDiff<'a> {
}
impl SettingsDelta for InnerIndexSettingsDiff {
fn new_embedders(&self) -> &RuntimeEmbedders {
&self.new.runtime_embedders
fn old_fields_ids_map(&self) -> &FieldIdMapWithMetadata {
&self.old.fields_ids_map
}
fn new_fields_ids_map(&self) -> &FieldIdMapWithMetadata {
&self.new.fields_ids_map
}
fn old_searchable_attributes(&self) -> &Option<Vec<String>> {
&self.old.user_defined_searchable_attributes
}
fn new_searchable_attributes(&self) -> &Option<Vec<String>> {
&self.new.user_defined_searchable_attributes
}
fn old_disabled_typos_terms(&self) -> &DisabledTyposTerms {
&self.old.disabled_typos_terms
}
fn new_disabled_typos_terms(&self) -> &DisabledTyposTerms {
&self.new.disabled_typos_terms
}
fn old_proximity_precision(&self) -> &ProximityPrecision {
&self.old.proximity_precision
}
fn new_proximity_precision(&self) -> &ProximityPrecision {
&self.new.proximity_precision
}
fn old_embedders(&self) -> &RuntimeEmbedders {
&self.old.runtime_embedders
}
fn new_embedders(&self) -> &RuntimeEmbedders {
&self.new.runtime_embedders
}
fn new_embedder_category_id(&self) -> &HashMap<String, u8> {
&self.new.embedder_category_id
}
fn embedder_actions(&self) -> &BTreeMap<String, EmbedderAction> {
&self.embedding_config_updates
}
fn new_fields_ids_map(&self) -> &FieldIdMapWithMetadata {
&self.new.fields_ids_map
}
fn try_for_each_fragment_diff<F, E>(
&self,
embedder_name: &str,

View File

@@ -14,28 +14,21 @@ fn set_and_reset_searchable_fields() {
let index = TempIndex::new();
// First we send 3 documents with ids from 1 to 3.
let mut wtxn = index.write_txn().unwrap();
index
.add_documents_using_wtxn(
&mut wtxn,
documents!([
{ "id": 1, "name": "kevin", "age": 23 },
{ "id": 2, "name": "kevina", "age": 21},
{ "id": 3, "name": "benoit", "age": 34 }
]),
)
.add_documents(documents!([
{ "id": 1, "name": "kevin", "age": 23 },
{ "id": 2, "name": "kevina", "age": 21},
{ "id": 3, "name": "benoit", "age": 34 }
]))
.unwrap();
// We change the searchable fields to be the "name" field only.
index
.update_settings_using_wtxn(&mut wtxn, |settings| {
.update_settings(|settings| {
settings.set_searchable_fields(vec!["name".into()]);
})
.unwrap();
wtxn.commit().unwrap();
db_snap!(index, fields_ids_map, @r###"
0 id |
1 name |

View File

@@ -44,6 +44,8 @@ const UPGRADE_FUNCTIONS: &[&dyn UpgradeIndex] = &[
&ToTargetNoOp { target: (1, 24, 0) },
&ToTargetNoOp { target: (1, 25, 0) },
&ToTargetNoOp { target: (1, 26, 0) },
&ToTargetNoOp { target: (1, 27, 0) },
&ToTargetNoOp { target: (1, 28, 0) },
// This is the last upgrade function; it will be called when the index is up to date.
// Any other upgrade function should be added before this one.
&ToCurrentNoOp {},
@@ -81,6 +83,8 @@ const fn start(from: (u32, u32, u32)) -> Option<usize> {
(1, 24, _) => function_index!(14),
(1, 25, _) => function_index!(15),
(1, 26, _) => function_index!(16),
(1, 27, _) => function_index!(17),
(1, 28, _) => function_index!(18),
// We deliberately don't add a placeholder with (VERSION_MAJOR, VERSION_MINOR, VERSION_PATCH) here to force manually
// considering dumpless upgrade.
(_major, _minor, _patch) => return None,
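For context, a minimal self-contained sketch of the dispatch this table implements: the on-disk version selects the first upgrade step to run, and every later step is applied in order (versions, arms, and step handling here are illustrative, not the crate's actual indices):

// Illustrative upgrade targets; the real table stores trait objects
// implementing `UpgradeIndex`, ending with a "to current version" step.
const UPGRADE_TARGETS: &[(u32, u32, u32)] = &[(1, 26, 0), (1, 27, 0), (1, 28, 0)];

/// Maps the on-disk version to the index of the first upgrade step to apply,
/// or `None` when no dumpless upgrade path is known.
const fn start(from: (u32, u32, u32)) -> Option<usize> {
    match from {
        (1, 25, _) => Some(0), // still has to go through the 1.26 step
        (1, 26, _) => Some(1),
        (1, 27, _) => Some(2),
        _ => None,
    }
}

fn main() {
    let from = (1, 26, 3);
    let Some(first) = start(from) else {
        panic!("no dumpless upgrade path from {from:?}");
    };
    for target in &UPGRADE_TARGETS[first..] {
        println!("upgrading to {target:?}");
    }
}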

View File

@@ -7,6 +7,8 @@ publish = false
[dependencies]
meilisearch = { path = "../meilisearch" , default-features = false}
serde_json = "1.0"
clap = { version = "4.5.40", features = ["derive"] }
anyhow = "1.0.98"
clap = { version = "4.5.52", features = ["derive"] }
anyhow = "1.0.100"
utoipa = "5.4.0"
reqwest = { version = "0.12", features = ["blocking"] }
regex = "1.10"

View File

@@ -1,10 +1,15 @@
use std::collections::HashMap;
use std::path::PathBuf;
use anyhow::Result;
use anyhow::{Context, Result};
use clap::Parser;
use meilisearch::routes::MeilisearchApi;
use regex::Regex;
use serde_json::{json, Value};
use utoipa::OpenApi;
const CODE_SAMPLES_URL: &str = "https://raw.githubusercontent.com/meilisearch/documentation/6a694aad651772cc3805fb39ab4f0d92273795de/.code-samples.meilisearch.yaml";
#[derive(Parser)]
#[command(name = "openapi-generator")]
#[command(about = "Generate OpenAPI specification for Meilisearch")]
@@ -16,6 +21,10 @@ struct Cli {
/// Pretty print the JSON output
#[arg(short, long)]
pretty: bool,
/// Skip fetching code samples (offline mode)
#[arg(long)]
no_code_samples: bool,
}
fn main() -> Result<()> {
@@ -24,14 +33,23 @@ fn main() -> Result<()> {
// Generate the OpenAPI specification
let openapi = MeilisearchApi::openapi();
// Convert to serde_json::Value for modification
let mut openapi_value: Value = serde_json::to_value(&openapi)?;
// Fetch and add code samples if not disabled
if !cli.no_code_samples {
let code_samples = fetch_code_samples()?;
add_code_samples_to_openapi(&mut openapi_value, &code_samples)?;
}
// Determine output path
let output_path = cli.output.unwrap_or_else(|| PathBuf::from("meilisearch.json"));
// Serialize to JSON
let json = if cli.pretty {
serde_json::to_string_pretty(&openapi)?
serde_json::to_string_pretty(&openapi_value)?
} else {
serde_json::to_string(&openapi)?
serde_json::to_string(&openapi_value)?
};
// Write to file
@@ -41,3 +59,244 @@ fn main() -> Result<()> {
Ok(())
}
/// Fetch and parse code samples from the documentation repository
fn fetch_code_samples() -> Result<HashMap<String, String>> {
let response = reqwest::blocking::get(CODE_SAMPLES_URL)
.context("Failed to fetch code samples")?
.text()
.context("Failed to read code samples response")?;
parse_code_samples_yaml(&response)
}
/// Parse the code samples YAML file
/// The format is:
/// ```yaml
/// # key_name
/// some_id: |-
/// curl \
/// -X GET 'URL'
/// ```
/// We extract the comment as the key and everything after `|-` as the value
fn parse_code_samples_yaml(content: &str) -> Result<HashMap<String, String>> {
let mut samples: HashMap<String, String> = HashMap::new();
let mut current_key: Option<String> = None;
let mut current_value: Vec<String> = Vec::new();
let mut in_code_block = false;
let mut base_indent: Option<usize> = None;
// Regex to match the start of a code sample (key: |-)
let code_start_re = Regex::new(r"^[a-zA-Z0-9_]+:\s*\|-\s*$")?;
// Regex to match comment lines that define keys
let comment_re = Regex::new(r"^#\s*([a-zA-Z0-9_]+)\s*$")?;
for line in content.lines() {
// Check if this is a comment line defining a new key
if let Some(caps) = comment_re.captures(line) {
// Save previous sample if exists
if let Some(key) = current_key.take() {
if !current_value.is_empty() {
samples.insert(key, current_value.join("\n").trim_end().to_string());
}
}
current_key = Some(caps[1].to_string());
current_value.clear();
in_code_block = false;
base_indent = None;
continue;
}
// Check if this starts a new code block
if code_start_re.is_match(line) {
in_code_block = true;
base_indent = None;
continue;
}
// If we're in a code block and have a current key, collect the content
if in_code_block && current_key.is_some() {
// Empty line or line with only whitespace
if line.trim().is_empty() {
// Only add empty lines if we've already started collecting
if !current_value.is_empty() {
current_value.push(String::new());
}
continue;
}
// Calculate indentation
let indent = line.len() - line.trim_start().len();
// Set base indent from first non-empty line
if base_indent.is_none() {
base_indent = Some(indent);
}
// If line has less indentation than base, we've exited the block
if indent < base_indent.unwrap_or(0) && !line.trim().is_empty() {
in_code_block = false;
continue;
}
// Remove base indentation and add to value
let base = base_indent.unwrap_or(0);
let dedented = if line.len() > base { &line[base..] } else { line.trim_start() };
current_value.push(dedented.to_string());
}
}
// Don't forget the last sample
if let Some(key) = current_key {
if !current_value.is_empty() {
samples.insert(key, current_value.join("\n").trim_end().to_string());
}
}
Ok(samples)
}
/// Convert an OpenAPI path to a code sample key
/// Path: /indexes/{index_uid}/documents/{document_id}
/// Method: GET
/// Key: get_indexes_indexUid_documents_documentId
fn path_to_key(path: &str, method: &str) -> String {
let method_lower = method.to_lowercase();
// Remove leading slash and convert path
let path_part = path
.trim_start_matches('/')
.split('/')
.map(|segment| {
if segment.starts_with('{') && segment.ends_with('}') {
// Convert {param_name} to camelCase
let param = &segment[1..segment.len() - 1];
to_camel_case(param)
} else {
// Keep path segments as-is, but replace hyphens with underscores
segment.replace('-', "_")
}
})
.collect::<Vec<_>>()
.join("_");
if path_part.is_empty() {
method_lower
} else {
format!("{}_{}", method_lower, path_part)
}
}
/// Convert snake_case to camelCase
fn to_camel_case(s: &str) -> String {
let mut result = String::new();
let mut capitalize_next = false;
for (i, c) in s.chars().enumerate() {
if c == '_' {
capitalize_next = true;
} else if capitalize_next {
result.push(c.to_ascii_uppercase());
capitalize_next = false;
} else if i == 0 {
result.push(c.to_ascii_lowercase());
} else {
result.push(c);
}
}
result
}
/// Add code samples to the OpenAPI specification
fn add_code_samples_to_openapi(
openapi: &mut Value,
code_samples: &HashMap<String, String>,
) -> Result<()> {
let paths = openapi
.get_mut("paths")
.and_then(|p| p.as_object_mut())
.context("OpenAPI spec missing 'paths' object")?;
for (path, path_item) in paths.iter_mut() {
let path_item = match path_item.as_object_mut() {
Some(p) => p,
None => continue,
};
let methods = ["get", "post", "put", "patch", "delete"];
for method in methods {
if let Some(operation) = path_item.get_mut(method) {
let key = path_to_key(path, method);
if let Some(sample) = code_samples.get(&key) {
// Create x-codeSamples array according to Redocly spec
let code_sample = json!([{
"lang": "cURL",
"source": sample
}]);
operation
.as_object_mut()
.map(|op| op.insert("x-codeSamples".to_string(), code_sample));
}
}
}
}
Ok(())
}
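After this pass, an operation with a matching sample carries an `x-codeSamples` extension. A hedged sketch of the expected shape for `GET /indexes`, assuming the documentation file provides a `get_indexes` sample (the `summary` value is made up):

use serde_json::json;

fn main() {
    // Expected shape of the decorated operation object; the actual sample
    // text comes from the fetched .code-samples.meilisearch.yaml file.
    let operation = json!({
        "summary": "List indexes",
        "x-codeSamples": [{
            "lang": "cURL",
            "source": "curl \\\n  -X GET 'MEILISEARCH_URL/indexes'"
        }]
    });
    println!("{}", serde_json::to_string_pretty(&operation).unwrap());
}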
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_path_to_key() {
assert_eq!(path_to_key("/indexes", "GET"), "get_indexes");
assert_eq!(path_to_key("/indexes/{index_uid}", "GET"), "get_indexes_indexUid");
assert_eq!(
path_to_key("/indexes/{index_uid}/documents", "POST"),
"post_indexes_indexUid_documents"
);
assert_eq!(
path_to_key("/indexes/{index_uid}/documents/{document_id}", "GET"),
"get_indexes_indexUid_documents_documentId"
);
assert_eq!(path_to_key("/indexes/{index_uid}/settings/stop-words", "GET"), "get_indexes_indexUid_settings_stop_words");
}
#[test]
fn test_to_camel_case() {
assert_eq!(to_camel_case("index_uid"), "indexUid");
assert_eq!(to_camel_case("document_id"), "documentId");
assert_eq!(to_camel_case("task_uid"), "taskUid");
}
#[test]
fn test_parse_code_samples() {
let yaml = r#"
# get_indexes
list_all_indexes_1: |-
curl \
-X GET 'MEILISEARCH_URL/indexes'
# post_indexes
create_an_index_1: |-
curl \
-X POST 'MEILISEARCH_URL/indexes' \
-H 'Content-Type: application/json' \
--data-binary '{
"uid": "movies"
}'
"#;
let samples = parse_code_samples_yaml(yaml).unwrap();
assert_eq!(samples.len(), 2);
assert!(samples.contains_key("get_indexes"));
assert!(samples.contains_key("post_indexes"));
assert!(samples["get_indexes"].contains("curl"));
assert!(samples["post_indexes"].contains("POST"));
}
}

View File

@@ -8,8 +8,8 @@ edition = "2021"
[dependencies]
color-spantrace = "0.3.0"
fxprof-processed-profile = "0.7.0"
serde = { version = "1.0.219", features = ["derive"] }
serde_json = "1.0.140"
serde = { version = "1.0.228", features = ["derive"] }
serde_json = "1.0.145"
tracing = "0.1.41"
tracing-error = "0.2.1"
tracing-subscriber = "0.3.20"
@@ -18,7 +18,7 @@ byte-unit = { version = "5.1.6", default-features = false, features = [
"byte",
"serde",
] }
tokio = { version = "1.45.1", features = ["sync"] }
tokio = { version = "1.48.0", features = ["sync"] }
[target.'cfg(any(target_os = "linux", target_os = "macos"))'.dependencies]
libproc = "0.14.10"
libproc = "0.14.11"

View File

@@ -11,27 +11,27 @@ license.workspace = true
# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
[dependencies]
anyhow = "1.0.98"
anyhow = "1.0.100"
build-info = { version = "1.7.0", path = "../build-info" }
cargo_metadata = "0.20.0"
clap = { version = "4.5.40", features = ["derive"] }
cargo_metadata = "0.23.1"
clap = { version = "4.5.52", features = ["derive"] }
futures-core = "0.3.31"
futures-util = "0.3.31"
reqwest = { version = "0.12.20", features = [
reqwest = { version = "0.12.24", features = [
"stream",
"json",
"rustls-tls",
], default-features = false }
serde = { version = "1.0.219", features = ["derive"] }
serde_json = "1.0.140"
serde = { version = "1.0.228", features = ["derive"] }
serde_json = "1.0.145"
sha2 = "0.10.9"
sysinfo = "0.35.2"
time = { version = "0.3.41", features = [
sysinfo = "0.37.2"
time = { version = "0.3.44", features = [
"serde",
"serde-human-readable",
"macros",
] }
tokio = { version = "1.45.1", features = [
tokio = { version = "1.48.0", features = [
"rt",
"net",
"time",
@@ -41,4 +41,5 @@ tokio = { version = "1.45.1", features = [
tracing = "0.1.41"
tracing-subscriber = "0.3.20"
tracing-trace = { version = "0.1.0", path = "../tracing-trace" }
uuid = { version = "1.17.0", features = ["v7", "serde"] }
uuid = { version = "1.18.1", features = ["v7", "serde"] }
similar-asserts = "1.7.0"

Some files were not shown because too many files have changed in this diff.