Compare commits


110 Commits

Author SHA1 Message Date
539edb5a64 Merge #3248
3248: Bump milli to v0.37.3 r=curquiza a=Kerollmops

This PR bumps the milli dependency from v0.37.2 to v0.37.3.

Co-authored-by: Kerollmops <clement@meilisearch.com>
2022-12-14 13:52:13 +00:00
312a8afadc Bump milli to v0.37.3 2022-12-14 14:33:43 +01:00
0ceb51a123 Merge #3240
3240: Update version for the next release (v0.30.3) in Cargo.toml files r=curquiza a=meili-bot

⚠️ This PR is automatically generated. Check the new version is the expected one before merging.

Co-authored-by: curquiza <curquiza@users.noreply.github.com>
2022-12-13 17:13:52 +00:00
d2a7f332c1 Update version for the next release (v0.30.3) in Cargo.toml files 2022-12-13 16:35:10 +00:00
4534825191 Merge #3237
3237: Fix the cli flags related to the import of dump and snapshot. r=dureuill a=irevoire

Some flags were badly applied + the database was wrongly deleted when it shouldn't have been.

To reduce the number of mistakes we might make, I added a bunch of comments + created a function that handles the import of an existing or empty database.


Here is the associated (working) code from the v0.29.3: https://github.com/meilisearch/meilisearch/blob/release-v0.29.3/meilisearch-lib/src/dump/mod.rs#L166-L191

Fix  #3238

Co-authored-by: Tamo <tamo@meilisearch.com>
2022-12-13 16:31:13 +00:00
bb9b3e0bbb rename the two new functions 2022-12-13 17:25:49 +01:00
79656585dd Fix typos
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2022-12-13 17:02:07 +01:00
bbe4a84ddc Fix the import of dumps and snapshot.
Some flags were badly applied + the database was wrongly deleted when it shouldn't have been
2022-12-13 16:39:05 +01:00
125f0b1522 Merge #3218
3218: Update version for the next release (v0.30.2) in Cargo.toml files r=curquiza a=meili-bot

⚠️ This PR is automatically generated. Check the new version is the expected one before merging.

Co-authored-by: curquiza <curquiza@users.noreply.github.com>
2022-12-08 13:28:00 +00:00
eb758343bd Merge #3217
3217: Bump milli to v0.37.2 r=Kerollmops a=curquiza

⚠️ Please ensure milli has been updated in EVERY Cargo.toml ⚠️

Co-authored-by: curquiza <clementine@meilisearch.com>
2022-12-08 13:01:44 +00:00
4b61ffae67 Update version for the next release (v0.30.2) in Cargo.toml files 2022-12-08 12:32:02 +00:00
b259bc686d Bump milli to v0.37.2 2022-12-08 13:23:44 +01:00
6ddde37850 Merge #3213
3213: Fix the instance-uid in the data.ms r=Kerollmops a=irevoire

We were writing the instance-uid as bytes instead of strings in the data.ms, and thus we were unable to parse it later. Also, it was less practical for our users to retrieve it and send it to us.

Fix #3214

Co-authored-by: Tamo <tamo@meilisearch.com>
2022-12-07 19:08:17 +00:00
ee7a4be95c Fix the instance-uid in the data.ms
We were writing the instance-uid as bytes instead of a string in the data.ms, and thus we were unable to parse it later.
Also, it was less practical for our users to retrieve it and send it to us.
2022-12-07 18:22:36 +01:00
58cd5d29e8 Merge #3202 #3203
3202: Bump milli to v0.37.1 r=curquiza a=Kerollmops

This PR bumps milli to v0.37.1 and fixes #3167, #3178, #3165, and #3021.

3203: Update version for the next release (v0.30.1) in Cargo.toml files r=Kerollmops a=meili-bot

⚠️ This PR is automatically generated. Check the new version is the expected one before merging.

Co-authored-by: Kerollmops <clement@meilisearch.com>
Co-authored-by: curquiza <curquiza@users.noreply.github.com>
2022-12-06 16:22:38 +00:00
d3d794e9ba Update version for the next release (v0.30.1) in Cargo.toml files 2022-12-06 16:20:31 +00:00
ef978a6106 Bump milli to v0.37.1 2022-12-06 17:11:03 +01:00
ee372099fd Merge #3160
3160: Clamp databases max size to the page size r=irevoire a=Kerollmops

This PR fixes #3150 (again #2662). We fix it again, here, as we entirely rewrote the index scheduler and forgot about this little detail.

`@irevoire` Can I have your input on where we create the indexes in the tests? I want to use a non-page-size rounded value in the tests. This way, we can see this issue in the tests next time.
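For context, a minimal sketch of the clamping idea (not the actual Meilisearch code; the `page_size` crate is an assumed dependency): LMDB only accepts map sizes that are a multiple of the OS page size, so the configured budget is rounded down to the nearest multiple.

```rust
// Hypothetical sketch: round a requested database size down to a multiple of the
// OS page size, because LMDB rejects map sizes that are not page-aligned.
fn clamp_to_page_size(requested_bytes: usize) -> usize {
    let page_size = page_size::get(); // `page_size` crate: OS page size in bytes
    requested_bytes / page_size * page_size
}

fn main() {
    // On a 4 KiB-page system, a 1_000_000-byte budget becomes 999_424 bytes.
    println!("{}", clamp_to_page_size(1_000_000));
}
```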

Co-authored-by: Kerollmops <clement@meilisearch.com>
2022-11-29 14:43:49 +00:00
e6f4a8a992 Clamp the databases size to the page size 2022-11-29 15:26:48 +01:00
b948bb5191 Make the tests use MB to trigger page size issues 2022-11-29 15:26:48 +01:00
27ac62ec33 Merge #3161
3161: « Fix » dump tests r=curquiza a=irevoire

This PR doesn't re-enable the dump tests.
It's a cherry-pick of https://github.com/meilisearch/meilisearch/pull/3149 that should avoid a conflict when we merge on `main` later

Close  #3153

Co-authored-by: Tamo <tamo@meilisearch.com>
2022-11-29 12:30:54 +00:00
0aec840d20 Fix the dump tests 2022-11-29 10:45:39 +01:00
0aa3f667d4 Merge #3136
3136: Fix no master key error r=Kerollmops a=ManyTheFish


Fixes #3135

Co-authored-by: ManyTheFish <many@meilisearch.com>
2022-11-24 15:25:11 +00:00
1eba5d45ea Check if the master key is missing before returning an error 2022-11-24 16:02:14 +01:00
cfa78418f2 Update tests 2022-11-24 15:51:15 +01:00
aaf5abbf1c Merge #3085
3085: refactorize the whole test suite r=irevoire a=irevoire

1. Make a call to assert_internally_consistent automatically when snapshotting the scheduler. There is no point in snapshotting something broken and expecting the dumb humans to notice.
2. Replace every possible call to assert_internally_consistent with a snapshot of the scheduler. It uses the same number of lines and ensures we never change something without noticing in any tests ever.
3. Name every snapshot: it's easier to debug when something goes wrong and easier to review in general.
4. Stop skipping breakpoints; it's too easy to miss something. Now you must explicitly show which path the scheduler is supposed to use.
5. Add a timeout on the channel.recv; it eases the process of writing tests. Now, when something fails, you get a failure instead of a deadlock (see the sketch below).
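A minimal sketch of point 5, assuming the tests receive scheduler breakpoints over a standard library mpsc channel (the payload type and the 10-second timeout are assumptions):

```rust
use std::sync::mpsc::Receiver;
use std::time::Duration;

// Bounded wait: if the scheduler never reaches the expected breakpoint, the test
// fails with a clear message instead of hanging forever on `recv()`.
fn recv_breakpoint(receiver: &Receiver<&'static str>) -> &'static str {
    receiver
        .recv_timeout(Duration::from_secs(10))
        .expect("the scheduler never reached the expected breakpoint")
}
```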

Co-authored-by: Irevoire <tamo@meilisearch.com>
2022-11-23 19:55:08 +00:00
f724f8adfe Merge #3122
3122: Display the `dumpUid` as `null` until we create it r=irevoire a=Kerollmops

This PR fixes #3117 by displaying the `DumpCreation` `dumpUid` details field as `null` until we compute the dump and the task is finished.

Co-authored-by: Kerollmops <clement@meilisearch.com>
2022-11-23 14:37:50 +00:00
cde2a96486 Display a null dumpUid until we computed the dump itself on disk 2022-11-23 15:16:58 +01:00
ceca386dc0 Merge #3114
3114: Update the analytics on the ranking rules r=irevoire a=irevoire

fix #3113

Co-authored-by: Irevoire <tamo@meilisearch.com>
2022-11-23 14:03:57 +00:00
370a45a58b send the ranking rules as a string because amplitude is too dumb to process an array as a single value 2022-11-23 14:56:22 +01:00
7093bae131 Update the dump test to check for the dumpUid dumpCreation task details 2022-11-23 14:48:39 +01:00
fb785dc5ac Add more analytics on the ranking rules positions 2022-11-23 12:51:34 +01:00
3a0b1a0c0e try to remove the flakiness of the failing test 2022-11-23 11:22:24 +01:00
af808462b6 update the snapshots after a rebase 2022-11-23 11:16:59 +01:00
2999ae3da4 makes clippy happy 2022-11-23 11:13:54 +01:00
23ec7db3f9 rebase on release-v0.30 2022-11-23 11:13:54 +01:00
f02e5cfaa6 refactorize the whole test suite
1. Make a call to assert_internally_consistent automatically when snapshotting the scheduler. There is no point in snapshotting something broken and expecting the dumb humans to notice.
2. Replace every possible call to assert_internally_consistent with a snapshot of the scheduler. It takes as many lines and ensures we never change something without noticing in any tests ever.
3. Name every snapshot: it's easier to debug when something goes wrong and easier to review in general.
4. Stop skipping breakpoints; it's too easy to miss something. Now you must explicitly show which path the scheduler is supposed to use.
5. Add a timeout on the channel.recv; it eases the process of writing tests. Now, when something fails, you get a failure instead of a deadlock.
2022-11-23 11:13:54 +01:00
8443554b1f Merge #3110
3110: Always display `deletedDocuments` in the `IndexDeletion` details r=ManyTheFish a=Kerollmops

This PR fixes #3108 by always displaying a `deletedDocuments` details info.

Co-authored-by: Kerollmops <clement@meilisearch.com>
2022-11-23 09:37:02 +00:00
84c782ce9a Fix the insta tests 2022-11-22 18:53:17 +01:00
1392a3b304 Merge #3106
3106: Fix linking error when building binaries for aarch64 r=curquiza a=Kerollmops

This PR tries to fix #3094. Please don't look too close. It will be horrifying 😱
You can look at [the status of the fix in the CI](https://github.com/meilisearch/meilisearch/actions/runs/3523723323/jobs/5908229083).

Co-authored-by: Kerollmops <clement@meilisearch.com>
2022-11-22 15:49:53 +00:00
2ec699a2e7 Always display details for the indexDeletion task 2022-11-22 15:14:28 +01:00
d02837f982 Don't use gold but the default linker 2022-11-22 14:48:57 +01:00
6784d17d0e Merge #3105
3105: Fix publish release CI r=Kerollmops a=curquiza

Fix the CI to avoid creating a release when the CI is triggered manually with `workflow_dispatch` -> only trigger the binary upload when the event is `release` instead of checking that it is different from `schedule`

Co-authored-by: curquiza <clementine@meilisearch.com>
2022-11-22 10:49:52 +00:00
415977a41e Fix publish release CI 2022-11-22 11:38:45 +01:00
a07e1f7a00 Merge #3099
3099: Add a dispatch to the publish binaries workflow r=curquiza a=Kerollmops

This PR adds a dispatch to the publish binaries workflow to help us debug #3094.

Co-authored-by: Kerollmops <clement@meilisearch.com>
2022-11-21 17:15:20 +00:00
b0460abf54 Add a dispatch to the publish binaries workflow 2022-11-21 18:07:47 +01:00
48e00cdf3f Merge #3096
3096: Fix total memory computation r=Kerollmops a=dureuill

# Pull Request

## Related issue
Fixes #3018 

## What does this PR do?
- Don't multiply the total memory value returned by sysinfo by 1024 anymore:
  According to the [changelog of sysinfo 0.26.0](https://github.com/GuillaumeGomez/sysinfo/blob/master/CHANGELOG.md#0260), units are now in bytes and not KBs.
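A hedged sketch of the resulting call, assuming sysinfo 0.26 (the exact call site in Meilisearch may differ):

```rust
use sysinfo::{System, SystemExt};

// Since sysinfo 0.26.0, `total_memory()` already returns bytes, so the old
// `* 1024` conversion (needed when the unit was KB) must be dropped.
fn total_memory_bytes() -> u64 {
    let mut sys = System::new_all();
    sys.refresh_memory();
    sys.total_memory()
}
```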

## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!


Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2022-11-21 15:08:51 +00:00
aaea5f87db Don't multiply total memory returned by sysinfo anymore
sysinfo now returns bytes rather than KB
2022-11-21 15:28:05 +01:00
44440bb8f4 Merge #3084
3084: Bump the `milli` and `grenad` dependencies r=irevoire a=Kerollmops

This PR bumps the direct milli dependency and the indirect grenad dependency.

Co-authored-by: Clément Renault <clement@meilisearch.com>
2022-11-17 16:21:25 +00:00
ec74fd6b44 Bump milli to version v0.37.0 2022-11-17 17:02:15 +01:00
d3bc0c6e93 Bump grenad to 0.4.4 2022-11-17 16:59:13 +01:00
262c8bf68b Merge #3083
3083: Add `workflow_dispatch` to flaky.yml r=irevoire a=curquiza

Following this PR: we bring the change into `release-v0.30.0` as well; otherwise, we cannot test.

<img width="346" alt="Capture d’écran 2022-11-17 à 16 55 12" src="https://user-images.githubusercontent.com/20380692/202494559-6032665c-e336-4d4b-8a84-8be5e2371370.png">


Co-authored-by: Clémentine Urquizar - curqui <clementine@meilisearch.com>
2022-11-17 15:57:26 +00:00
c4a669d056 Add workflow_dispatch to flaky.yml 2022-11-17 16:54:09 +01:00
dfaf845382 Merge #3080
3080: Rename `originalFilters` into `originalFilter` on the cancelation and deletion routes r=irevoire a=Kerollmops

This PR fixes https://github.com/meilisearch/meilisearch/issues/3079.

Co-authored-by: Clément Renault <clement@meilisearch.com>
2022-11-17 13:49:50 +00:00
1fe9fccf6e Merge #3081
3081: Rename `matchedDocuments` into `providedIds` r=ManyTheFish a=Kerollmops

This PR fixes https://github.com/meilisearch/meilisearch/issues/3078.

Co-authored-by: Clément Renault <clement@meilisearch.com>
2022-11-17 12:55:59 +00:00
9fe32e1e3b Rename matchedDocuments into providedIds 2022-11-17 13:48:23 +01:00
388305fcb6 Rename originalFilters into originalFilter 2022-11-17 13:44:00 +01:00
49bc45e0d4 Merge #3074
3074: add the analytics of the swap-indexes route r=irevoire a=irevoire

implements https://github.com/meilisearch/specifications/pull/192/files#diff-dbac052211a8ea4b2c5d068a6264380740b022efead552b4dfc9e4e8d961f0f4R276-R284

Co-authored-by: Tamo <tamo@meilisearch.com>
2022-11-17 11:29:35 +00:00
b478b18218 Merge #3071
3071: Analytics on the tasks route r=Kerollmops a=irevoire

Implement the missing analytics on the delete and cancel task routes.
+ Batch the analytics on the `GET tasks` route to avoid flooding ourselves while polling meilisearch.

Co-authored-by: Tamo <tamo@meilisearch.com>
2022-11-17 11:05:56 +00:00
877d1735b1 Merge #3077
3077: Don't remove DB if unreadable r=irevoire a=dureuill

# Pull Request

## Related issue
Related to #3069

Will fix it after merging into `main`

## What does this PR do?

### User standpoint

- When the DB cannot be read after opening it, Meilisearch exits with an error message like previously, but it now leaves the DB untouched
- When the DB cannot be read after importing it from a snapshot or dump, it is still removed, like previously.

### Dev standpoint

- Add a new local enum `OnFailure` that is used as a parameter to the `meilisearch_builder` closure to control whether to keep or remove the DB in case of failure.
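A minimal sketch of that idea (the builder signature, error type, and paths are assumptions, not the actual Meilisearch code):

```rust
use std::path::Path;

// Controls whether the on-disk database is removed when opening it fails.
enum OnFailure {
    RemoveDb, // used right after importing a snapshot or a dump
    KeepDb,   // used when opening a pre-existing database
}

fn open_db<F>(db_path: &Path, on_failure: OnFailure, builder: F) -> std::io::Result<()>
where
    F: FnOnce(&Path) -> std::io::Result<()>,
{
    builder(db_path).map_err(|err| {
        if matches!(on_failure, OnFailure::RemoveDb) {
            // The database was just created from an import, so it is safe to clean up.
            let _ = std::fs::remove_dir_all(db_path);
        }
        err
    })
}
```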

## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!


Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2022-11-17 09:38:27 +00:00
a1a29e92fd Stop removing the DB when failing to read it 2022-11-17 09:33:48 +01:00
fea969d5e5 Merge #3072
3072: Update the finite pagination analytics r=ManyTheFish a=irevoire

Implement this spec: https://github.com/meilisearch/specifications/pull/196/files?show-viewed-files=true&file-filters%5B%5D=#diff-dbac052211a8ea4b2c5d068a6264380740b022efead552b4dfc9e4e8d961f0f4R111

Co-authored-by: Tamo <tamo@meilisearch.com>
2022-11-17 08:04:58 +00:00
fcca7475fa add the analytics of the swap-indexes route 2022-11-16 19:10:01 +01:00
3fc1d7e67b Update the finite pagination analytics 2022-11-16 19:01:21 +01:00
f1884d6910 batch the tasks seen events 2022-11-16 18:45:19 +01:00
0e6394fafc add analytics on the task route
* Add all the missing fields of the new task query type
* Create a new analytics for the task deletion
* Create a new analytics for the task creation
2022-11-16 18:45:19 +01:00
637ca7b9fa Merge #3067
3067: Fix task details serialization r=Kerollmops a=ManyTheFish

# Pull Request

- document addition task details always contain the field `indexedDocuments`
  - value is set to `null` when the task is enqueued or processing
  - value is set to `0` when the task is canceled or failed
- the field `deletedDocuments` of the document deletion task details is set to `0` when the task is canceled or failed
- the field `deletedDocuments` of the document clearAll task details is set to `0` when the task is canceled or failed
- the field `deletedTasks` of the task deletion task details is set to `0` when the task is canceled or failed
- the field `canceledTasks` of the task cancelation task details is set to `0` when the task is canceled or failed

## Related issue
Fixes #3057
Fixes #3058


Co-authored-by: ManyTheFish <many@meilisearch.com>
2022-11-16 16:15:03 +00:00
25e39edc7e Fix tests 2022-11-16 16:45:20 +01:00
f908ae2ef4 Merge #3068
3068: Prepend question mark to the `originalFilters` of the task deletion and cancelation r=irevoire a=Kerollmops

This pull request fixes #3064 by prepending [the HTTP query question mark](https://en.wikipedia.org/wiki/Question_mark#Computing) to the `originalFilters` of the task deletion and cancelation.
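A trivial sketch of the change (the helper name is hypothetical): the stored filter string now starts with the HTTP query question mark.

```rust
// Hypothetical helper: keep the leading `?` so the stored value reads as a query
// string, e.g. "?statuses=processing,enqueued" instead of "statuses=processing,enqueued".
fn original_filters(query_string: &str) -> String {
    format!("?{query_string}")
}
```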

Co-authored-by: Kerollmops <clement@meilisearch.com>
2022-11-16 15:02:04 +00:00
8ddec58430 Merge #3061
3061: Name spawned threads r=irevoire a=dureuill

# Pull Request

## Related issue
None, this is to improve debuggability

## What does this PR do?
- This PR replaces the raw `thread::spawn(...)` calls by `thread::Builder::new().name(...).spawn(...).unwrap()` calls so that we can give meaningful names to threads. 
- This PR also setup the `rayon` thread pool to give a name to its threads.
- This improves debuggability, as the thread names are reported by debuggers:

<img width="411" alt="Capture d’écran 2022-11-16 à 10 26 27" src="https://user-images.githubusercontent.com/41078892/202141870-a88663aa-d2f8-494f-b4da-709fdbd072ba.png">

(screen showing vscode's debugger and its main/scheduler/indexing threads)
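A minimal sketch of both changes, assuming `rayon` as the thread-pool dependency (mentioned above) and with hypothetical thread names:

```rust
use std::thread;

fn spawn_named_threads() {
    // A named OS thread instead of a bare `thread::spawn(...)`.
    thread::Builder::new()
        .name("scheduler".to_string()) // hypothetical thread name
        .spawn(|| { /* scheduler loop */ })
        .unwrap();

    // Name the rayon worker threads as well.
    rayon::ThreadPoolBuilder::new()
        .thread_name(|index| format!("indexing-thread:{index}"))
        .build_global()
        .unwrap();
}
```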

## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!


Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2022-11-16 14:25:32 +00:00
b4d0403518 Merge #3065
3065: Implement the analytics on the health and version routes r=Kerollmops a=irevoire

Fix https://github.com/meilisearch/meilisearch/issues/2955

Must be merged after https://github.com/meilisearch/meilisearch/pull/3063

Co-authored-by: Tamo <tamo@meilisearch.com>
2022-11-16 14:01:43 +00:00
3525c964a7 Add the question mark to the task cancelation query filter 2022-11-16 14:30:35 +01:00
ed51df41e5 Add the question mark to the task deletion query filter 2022-11-16 14:28:30 +01:00
7f89e302a2 Merge #3049 #3063
3049: Plural on task query r=Kerollmops a=irevoire

Fixes #3045 and fixes #3046

3063: Update the analytics on the search r=Kerollmops a=irevoire

Partially fix https://github.com/meilisearch/meilisearch/issues/2955

Must be merged after https://github.com/meilisearch/meilisearch/pull/3060

Co-authored-by: Tamo <tamo@meilisearch.com>
2022-11-16 13:20:40 +00:00
c603e17b1d Merge #3060
3060: Add analytics on all the settings r=Kerollmops a=irevoire

Partially fix https://github.com/meilisearch/meilisearch/issues/2955

Must be merged after https://github.com/meilisearch/meilisearch/pull/3059

Co-authored-by: Tamo <tamo@meilisearch.com>
2022-11-16 12:48:23 +00:00
07b28ea8cf Fix task details serialization 2022-11-16 13:47:08 +01:00
10ab5f6a58 implements the analytics on the health and version routes 2022-11-16 13:06:10 +01:00
684b90066d update the analytics on the search route 2022-11-16 12:30:49 +01:00
93ab019304 update the distinct attributes to the spec update 2022-11-16 12:25:33 +01:00
1ef517f63d Merge #3059
3059: Create the analytics for the document deletion r=Kerollmops a=irevoire

Partially fix #2955

Co-authored-by: Tamo <tamo@meilisearch.com>
2022-11-16 10:44:59 +00:00
1a1ede96de Spawn rayon threads with names 2022-11-16 10:28:25 +01:00
93afeedcea Spawn threads with names 2022-11-16 09:50:47 +01:00
25d057b75e Add analytics on all the settings 2022-11-15 19:09:02 +01:00
b44c381c2a Store analytics for the documents deletions 2022-11-15 19:08:45 +01:00
51be75a264 Merge #3056
3056: refactor the way we send the CLI information + add the analytics for the config file and ssl usage r=Kerollmops a=irevoire

Partially fix #2955

Co-authored-by: Tamo <tamo@meilisearch.com>
2022-11-15 17:03:30 +00:00
4953b62712 reformat, sorry @kero 2022-11-15 17:57:27 +01:00
9473cccc27 add a comment over the new infos structure 2022-11-15 17:56:07 +01:00
9327db3e91 Apply suggestions from code review
Co-authored-by: Clément Renault <clement@meilisearch.com>
2022-11-15 17:53:23 +01:00
0fced6f270 refactor the way we send the CLI information + add the analytics for the config file and ssl usage 2022-11-15 16:32:31 +01:00
b4434dcad2 rename the error codes for the sake of consistency 2022-11-14 23:27:02 +01:00
d08d97bf43 fix the error messages and add tests 2022-11-14 23:27:02 +01:00
a8991ccb64 Merge #3036
3036: Bump milli to v0.35.1 r=irevoire a=Kerollmops

This PR bumps milli to v0.35.1 which brings some fixes. You can see [the changelog of milli on the release page](https://github.com/meilisearch/milli/releases/tag/v0.35.1).

Fixes #2905
Fixes #3004
Fixes #3000
Fixes #3021
Fixes #2945

Co-authored-by: Clément Renault <clement@meilisearch.com>
Co-authored-by: Kerollmops <clement@meilisearch.com>
2022-11-10 14:25:54 +00:00
761bd3aca4 Fix the new error messages 2022-11-10 12:04:25 +01:00
26ab6ab0cc Bump milli version to 0.35.1 2022-11-10 12:04:25 +01:00
379522ace3 Merge #3023
3023: Update error codes related to tasks cancelation + add canceledBy filter r=Kerollmops a=Kerollmops

<details>This PR changes the error codes [to follow the specification](https://github.com/meilisearch/specifications/pull/195).

 - [x] The `missing_filters` error code is renamed `missing_task_filters` to be more accurate and follow the `invalid_task_*` convention.
 - [x] The error code `invalid_task_uids_filter` is added.
 - [x] The error code `invalid_task_canceled_by_filter` is added.
 - [x] The error code `invalid_task_date_filter` is added.
      -  The error message is the same as for expires_at in the API Key  EXCEPT that it does not explicitly mention that a date must be given in the future.
</details>

Edit by `@loiclec` :
I have added a few more changes into this PR. The related issues are:

- Fixes https://github.com/meilisearch/meilisearch/issues/3029
- Implements https://github.com/meilisearch/meilisearch/issues/3026
- Fixes https://github.com/meilisearch/meilisearch/issues/2940
- Fixes https://github.com/meilisearch/meilisearch/issues/2939

Additionally:
- Fixes a bug where global tasks were returned by `GET /tasks` queries even if the user did not have the `index.*` API key action.
- Rename `originalQuery` to `originalFilters`
- Display `error: null` and `canceledBy: null` in the task views
- Allow using the star operator in the task filters in the `DELETE /tasks` and `POST /tasks/cancel` routes
- Make sure that the index scheduler keeps making progress even when a grave error occurs.


Co-authored-by: Kerollmops <clement@meilisearch.com>
Co-authored-by: Clément Renault <clement@meilisearch.com>
Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>
2022-11-10 10:51:41 +00:00
1d5f17a9ea Merge #3032
3032: Fix Index name parsing error message to fit the specification r=Kerollmops a=ManyTheFish

fixes #2924


Co-authored-by: ManyTheFish <many@meilisearch.com>
2022-11-08 16:01:53 +00:00
8bb260bf3e Fix Index name parsing error message to fit the specification 2022-11-08 16:35:58 +01:00
52b38bee9d Make rustfmt happy
>:-(
2022-11-08 15:45:53 +01:00
f5454dfa60 Make clippy happy
They're a happy clip now.
2022-11-08 15:31:08 +01:00
1e464e87fc Update more insta-snap tests 2022-11-08 13:39:52 +01:00
6126fc8d98 Rename original_query to original_filters everywhere 2022-11-08 13:18:18 +01:00
2fdd814e57 Update tests following task API changes 2022-11-08 13:18:18 +01:00
20fa103992 Add canceledBy task filter 2022-11-08 13:18:18 +01:00
d5638d2c27 Use more precise error codes/messages for the task routes
+ Allow star operator in delete/cancel tasks
+ rename originalQuery to originalFilters
+ Display error/canceled_by in task view even when they are = null
+ Rename task filter fields by using their plural forms
+ Prepare an error code for canceledBy filter
+ Only return global tasks if the API key action `index.*` is there
2022-11-08 13:18:17 +01:00
932414bf72 WIP Introduce the invalid_task_uid error code 2022-11-08 13:17:56 +01:00
b20025c01e Change the missing_filters error code into missing_task_filters 2022-11-08 13:17:29 +01:00
3999f74f78 Merge #3022
3022: Store the `started_at` for a task that is canceled when processing r=irevoire a=Kerollmops

This PR changes the current behavior of the displayed tasks. When a processing task is canceled, the `started_at` date time is kept and displayed to the user. Otherwise, if the task was just enqueued, the `started_at` remains `null`. If a task is processing when the engine is stopped with ctrl-c and then started again, the task becomes enqueued again, so if it is then canceled, its `started_at` will be `null`.

You can read more [in this discussion](https://github.com/meilisearch/specifications/pull/195/files#r1009602335).

Co-authored-by: Kerollmops <clement@meilisearch.com>
2022-11-03 20:48:24 +00:00
739b9f5505 Use the content of the ProcessingTasks in the tasks cancelation system 2022-11-03 11:09:59 +01:00
722a0da0c3 Merge #3019
3019: Fix error code of the "duplicate index found" error on the index swap route r=irevoire a=loiclec

According to the spec, the code of the error should be `duplicate_index_found`, but it was `bad_request` instead.

Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>
2022-11-03 09:49:55 +00:00
5704a1895d Fix error code of the "duplicate index found" error 2022-11-02 09:34:50 +01:00
1590 changed files with 30643 additions and 220011 deletions

View File

@@ -1,2 +0,0 @@
[alias]
xtask = "run --release --package xtask --"

View File

@@ -23,8 +23,7 @@ A clear and concise description of what you expected to happen.
**Screenshots**
If applicable, add screenshots to help explain your problem.
**Meilisearch version:**
[e.g. v0.20.0]
**Meilisearch version:** [e.g. v0.20.0]
**Additional context**
Additional information that may be relevant to the issue.

View File

@@ -1,54 +0,0 @@
---
name: New sprint issue
about: ⚠️ Should only be used by the engine team ⚠️
title: ''
labels: 'missing usage in PRD, impacts docs'
assignees: ''
---
Related product team resources: [PRD]() (_internal only_)
Related product discussion:
## Motivation
<!---Copy/paste the information in PRD or briefly detail the product motivation. Ask product team if any hesitation.-->
## Usage
<!---Link to the public part of the PRD, or to the related product discussion for experimental features-->
## TODO
<!---If necessary, create a list with technical/product steps-->
### Reminders when modifying the API
- [ ] Update the openAPI file with utoipa:
- [ ] If a new module has been introduced, create a new structure deriving [the OpenAPI proc-macro](https://docs.rs/utoipa/latest/utoipa/derive.OpenApi.html) and nest it in the main [openAPI structure](https://github.com/meilisearch/meilisearch/blob/f2185438eed60fa32d25b15480c5ee064f6fba4a/crates/meilisearch/src/routes/mod.rs#L64-L78).
- [ ] If a new route has been introduced, add the [path decorator](https://docs.rs/utoipa/latest/utoipa/attr.path.html) to it and add the route at the top of the file in its openAPI structure.
- [ ] If a structure which is deserialized or serialized in the API has been introduced or modified, it must derive the [`schema`](https://docs.rs/utoipa/latest/utoipa/macro.schema.html) or the [`IntoParams`](https://docs.rs/utoipa/latest/utoipa/derive.IntoParams.html) proc-macro.
If it's a **new** structure you must also add it to the big list of structures [in the main `OpenApi` structure](https://github.com/meilisearch/meilisearch/blob/f2185438eed60fa32d25b15480c5ee064f6fba4a/crates/meilisearch/src/routes/mod.rs#L88).
- [ ] Once everything is done, start Meilisearch with the swagger flag: `cargo run --features swagger`, open `http://localhost:7700/scalar` on your browser, and ensure everything works as expected.
- For more info, refer to [this presentation](https://pitch.com/v/generating-the-openapi-file-jrn3nh).
### Reminders when modifying the Setting API
<!--- Special steps to remind when adding a new index setting -->
- [ ] Ensure the new setting route is at least tested by the [`test_setting_routes` macro](https://github.com/meilisearch/meilisearch/blob/5204c0b60b384cbc79621b6b2176fca086069e8e/meilisearch/tests/settings/get_settings.rs#L276)
- [ ] Ensure Analytics are fully implemented
- [ ] `/settings/my-new-setting` configurated in the [`make_setting_routes` macro](https://github.com/meilisearch/meilisearch/blob/5204c0b60b384cbc79621b6b2176fca086069e8e/meilisearch/src/routes/indexes/settings.rs#L141-L165)
- [ ] global `/settings` route configurated in the [`update_all` function](https://github.com/meilisearch/meilisearch/blob/5204c0b60b384cbc79621b6b2176fca086069e8e/meilisearch/src/routes/indexes/settings.rs#L655-L751)
- [ ] Ensure the dump serializing is consistent with the `/settings` route serializing, e.g., enums case can be different (`camelCase` in route and `PascalCase` in the dump)
#### Special cases when adding a setting for an experimental feature
- [ ] ⚠️ API stability: The setting does not appear on the main settings route when the feature has never been enabled (e.g. mark it `Unset` when returned from the index in this situation. See [an example](https://github.com/meilisearch/meilisearch/blob/7a89abd2a025606a42f8b219e539117eb2eb029f/meilisearch-types/src/settings.rs#L608))
- [ ] The setting cannot be set when the feature is disabled, either by the main settings route or the subroute (see [`validate_settings` function](https://github.com/meilisearch/meilisearch/blob/7a89abd2a025606a42f8b219e539117eb2eb029f/meilisearch/src/routes/indexes/settings.rs#L811))
- [ ] If possible, the setting is reset when the feature is disabled (hard if it requires reindexing)
## Impacted teams
<!---Ping the related teams. Ask for the engine manager if any hesitation-->
<!---@meilisearch/docs-team when there is any API change, e.g. settings addition-->

View File

@@ -2,6 +2,7 @@
version: 2
updates:
- package-ecosystem: "github-actions"
directory: "/"
schedule:

View File

@@ -1,41 +1,28 @@
#!/usr/bin/env bash
set -eu -o pipefail
#!/bin/bash
check_tag() {
local expected=$1
local actual=$2
local filename=$3
if [[ $actual != $expected ]]; then
echo >&2 "Error: the current tag does not match the version in $filename: found $actual, expected $expected"
return 1
fi
# check_tag $current_tag $file_tag $file_name
function check_tag {
if [[ "$1" != "$2" ]]; then
echo "Error: the current tag does not match the version in $3: found $2 - expected $1"
ret=1
fi
}
read_version() {
grep '^version = ' | cut -d \" -f 2
}
if [[ -z "${GITHUB_REF:-}" ]]; then
echo >&2 "Error: GITHUB_REF is not set"
exit 1
fi
if [[ ! "$GITHUB_REF" =~ ^refs/tags/v[0-9]+\.[0-9]+\.[0-9]+(-[a-z0-9]+)?$ ]]; then
echo >&2 "Error: GITHUB_REF is not a valid tag: $GITHUB_REF"
exit 1
fi
current_tag=${GITHUB_REF#refs/tags/v}
ret=0
current_tag=${GITHUB_REF#'refs/tags/v'}
toml_tag="$(cat Cargo.toml | read_version)"
check_tag "$current_tag" "$toml_tag" Cargo.toml || ret=1
toml_files='*/Cargo.toml'
for toml_file in $toml_files;
do
file_tag="$(grep '^version = ' $toml_file | cut -d '=' -f 2 | tr -d '"' | tr -d ' ')"
check_tag $current_tag $file_tag $toml_file
done
lock_tag=$(grep -A 1 '^name = "meilisearch-auth"' Cargo.lock | read_version)
check_tag "$current_tag" "$lock_tag" Cargo.lock || ret=1
lock_file='Cargo.lock'
lock_tag=$(grep -A 1 'name = "meilisearch-auth"' $lock_file | grep version | cut -d '=' -f 2 | tr -d '"' | tr -d ' ')
check_tag $current_tag $lock_tag $lock_file
if (( ret == 0 )); then
echo 'OK'
if [[ "$ret" -eq 0 ]] ; then
echo 'OK'
fi
exit $ret

View File

@@ -1,41 +1,127 @@
#!/bin/sh
# Used in our CIs to publish the latest Docker image.
# Was used in our CIs to publish the latest docker image. Not used anymore, will be used again when v1 and v2 will be out and we will want to maintain multiple stable versions.
# Returns "true" or "false" (as a string) to be used in the `if` in GHA
# Checks if the current tag ($GITHUB_REF) corresponds to the latest release tag on GitHub
# Returns "true" or "false" (as a string).
# Checks if the current tag should be the latest (in terms of semver and not of release date).
# Ex: previous tag -> v2.1.1
# new tag -> v1.20.3
# The new tag (v1.20.3) should NOT be the latest
# So it returns "false", the `latest` tag should not be updated for the release v1.20.3 and still need to correspond to v2.1.1
GITHUB_API='https://api.github.com/repos/meilisearch/meilisearch/releases'
PNAME='meilisearch'
# GLOBAL
GREP_SEMVER_REGEXP='v\([0-9]*\)[.]\([0-9]*\)[.]\([0-9]*\)$' # i.e. v[number].[number].[number]
# FUNCTIONS
# Returns the version of the latest stable version of Meilisearch by setting the $latest variable.
# semverParseInto and semverLT from https://github.com/cloudflare/semver_bash/blob/master/semver.sh
# usage: semverParseInto version major minor patch special
# version: the string version
# major, minor, patch, special: will be assigned by the function
semverParseInto() {
local RE='[^0-9]*\([0-9]*\)[.]\([0-9]*\)[.]\([0-9]*\)\([0-9A-Za-z-]*\)'
#MAJOR
eval $2=`echo $1 | sed -e "s#$RE#\1#"`
#MINOR
eval $3=`echo $1 | sed -e "s#$RE#\2#"`
#MINOR
eval $4=`echo $1 | sed -e "s#$RE#\3#"`
#SPECIAL
eval $5=`echo $1 | sed -e "s#$RE#\4#"`
}
# usage: semverLT version1 version2
semverLT() {
local MAJOR_A=0
local MINOR_A=0
local PATCH_A=0
local SPECIAL_A=0
local MAJOR_B=0
local MINOR_B=0
local PATCH_B=0
local SPECIAL_B=0
semverParseInto $1 MAJOR_A MINOR_A PATCH_A SPECIAL_A
semverParseInto $2 MAJOR_B MINOR_B PATCH_B SPECIAL_B
if [ $MAJOR_A -lt $MAJOR_B ]; then
return 0
fi
if [ $MAJOR_A -le $MAJOR_B ] && [ $MINOR_A -lt $MINOR_B ]; then
return 0
fi
if [ $MAJOR_A -le $MAJOR_B ] && [ $MINOR_A -le $MINOR_B ] && [ $PATCH_A -lt $PATCH_B ]; then
return 0
fi
if [ "_$SPECIAL_A" == "_" ] && [ "_$SPECIAL_B" == "_" ] ; then
return 1
fi
if [ "_$SPECIAL_A" == "_" ] && [ "_$SPECIAL_B" != "_" ] ; then
return 1
fi
if [ "_$SPECIAL_A" != "_" ] && [ "_$SPECIAL_B" == "_" ] ; then
return 0
fi
if [ "_$SPECIAL_A" < "_$SPECIAL_B" ]; then
return 0
fi
return 1
}
# Returns the tag of the latest stable release (in terms of semver and not of release date)
get_latest() {
# temp_file is needed because the grep would start before the download is over
temp_file=$(mktemp -q /tmp/$PNAME.XXXXXXXXX)
latest_release="$GITHUB_API/latest"
temp_file='temp_file' # temp_file needed because the grep would start before the download is over
curl -s 'https://api.github.com/repos/meilisearch/meilisearch/releases' > "$temp_file"
releases=$(cat "$temp_file" | \
grep -E "tag_name|draft|prerelease" \
| tr -d ',"' | cut -d ':' -f2 | tr -d ' ')
# Returns a list of [tag_name draft_boolean prerelease_boolean ...]
# Ex: v0.10.1 false false v0.9.1-rc.1 false true v0.9.0 false false...
if [ $? -ne 0 ]; then
echo "$0: Can't create temp file."
exit 1
fi
if [ -z "$GITHUB_PAT" ]; then
curl -s "$latest_release" > "$temp_file" || return 1
else
curl -H "Authorization: token $GITHUB_PAT" -s "$latest_release" > "$temp_file" || return 1
fi
latest="$(cat "$temp_file" | grep '"tag_name":' | cut -d ':' -f2 | tr -d '"' | tr -d ',' | tr -d ' ')"
i=0
latest=""
current_tag=""
for release_info in $releases; do
if [ $i -eq 0 ]; then # Checking tag_name
if echo "$release_info" | grep -q "$GREP_SEMVER_REGEXP"; then # If it's not an alpha or beta release
current_tag=$release_info
else
current_tag=""
fi
i=1
elif [ $i -eq 1 ]; then # Checking draft boolean
if [ "$release_info" = "true" ]; then
current_tag=""
fi
i=2
elif [ $i -eq 2 ]; then # Checking prerelease boolean
if [ "$release_info" = "true" ]; then
current_tag=""
fi
i=0
if [ "$current_tag" != "" ]; then # If the current_tag is valid
if [ "$latest" = "" ]; then # If there is no latest yet
latest="$current_tag"
else
semverLT $current_tag $latest # Comparing latest and the current tag
if [ $? -eq 1 ]; then
latest="$current_tag"
fi
fi
fi
fi
done
rm -f "$temp_file"
return 0
echo $latest
}
# MAIN
current_tag="$(echo $GITHUB_REF | tr -d 'refs/tags/')"
get_latest
latest="$(get_latest)"
if [ "$current_tag" != "$latest" ]; then
# The current release tag is not the latest
@@ -44,5 +130,3 @@ else
# The current release tag is the latest
echo "true"
fi
exit 0

View File

@@ -1,28 +0,0 @@
name: Bench (manual)
on:
workflow_dispatch:
inputs:
workload:
description: 'The path to the workloads to execute (workloads/...)'
required: true
default: 'workloads/movies.json'
env:
WORKLOAD_NAME: ${{ github.event.inputs.workload }}
jobs:
benchmarks:
name: Run and upload benchmarks
runs-on: benchmarks
timeout-minutes: 180 # 3h
steps:
- uses: actions/checkout@v3
- uses: dtolnay/rust-toolchain@1.81
with:
profile: minimal
- name: Run benchmarks - workload ${WORKLOAD_NAME} - branch ${{ github.ref }} - commit ${{ github.sha }}
run: |
cargo xtask bench --api-key "${{ secrets.BENCHMARK_API_KEY }}" --dashboard-url "${{ vars.BENCHMARK_DASHBOARD_URL }}" --reason "Manual [Run #${{ github.run_id }}](https://github.com/meilisearch/meilisearch/actions/runs/${{ github.run_id }})" -- ${WORKLOAD_NAME}

View File

@@ -1,82 +0,0 @@
name: Bench (PR)
on:
issue_comment:
types: [created]
permissions:
issues: write
env:
GH_TOKEN: ${{ secrets.MEILI_BOT_GH_PAT }}
jobs:
run-benchmarks-on-comment:
if: startsWith(github.event.comment.body, '/bench')
name: Run and upload benchmarks
runs-on: benchmarks
timeout-minutes: 180 # 3h
steps:
- name: Check permissions
id: permission
env:
PR_AUTHOR: ${{github.event.issue.user.login }}
COMMENT_AUTHOR: ${{github.event.comment.user.login }}
REPOSITORY: ${{github.repository}}
PR_ID: ${{github.event.issue.number}}
run: |
PR_REPOSITORY=$(gh api /repos/"$REPOSITORY"/pulls/"$PR_ID" --jq .head.repo.full_name)
if $(gh api /repos/"$REPOSITORY"/collaborators/"$PR_AUTHOR"/permission --jq .user.permissions.push)
then
echo "::notice title=Authentication success::PR author authenticated"
else
echo "::error title=Authentication error::PR author doesn't have push permission on this repository"
exit 1
fi
if $(gh api /repos/"$REPOSITORY"/collaborators/"$COMMENT_AUTHOR"/permission --jq .user.permissions.push)
then
echo "::notice title=Authentication success::Comment author authenticated"
else
echo "::error title=Authentication error::Comment author doesn't have push permission on this repository"
exit 1
fi
if [ "$PR_REPOSITORY" = "$REPOSITORY" ]
then
echo "::notice title=Authentication success::PR started from main repository"
else
echo "::error title=Authentication error::PR started from a fork"
exit 1
fi
- name: Check for Command
id: command
uses: xt0rted/slash-command-action@v2
with:
command: bench
reaction-type: "rocket"
repo-token: ${{ env.GH_TOKEN }}
- uses: xt0rted/pull-request-comment-branch@v3
id: comment-branch
with:
repo_token: ${{ env.GH_TOKEN }}
- uses: actions/checkout@v3
if: success()
with:
fetch-depth: 0 # fetch full history to be able to get main commit sha
ref: ${{ steps.comment-branch.outputs.head_ref }}
- uses: dtolnay/rust-toolchain@1.81
with:
profile: minimal
- name: Run benchmarks on PR ${{ github.event.issue.id }}
run: |
cargo xtask bench --api-key "${{ secrets.BENCHMARK_API_KEY }}" \
--dashboard-url "${{ vars.BENCHMARK_DASHBOARD_URL }}" \
--reason "[Comment](${{ github.event.comment.html_url }}) on [#${{ github.event.issue.number }}](${{ github.event.issue.html_url }})" \
-- ${{ steps.command.outputs.command-arguments }} > benchlinks.txt
- name: Send comment in PR
run: |
gh pr comment ${{github.event.issue.number}} --body-file benchlinks.txt

View File

@@ -1,23 +0,0 @@
name: Indexing bench (push)
on:
push:
branches:
- main
jobs:
benchmarks:
name: Run and upload benchmarks
runs-on: benchmarks
timeout-minutes: 180 # 3h
steps:
- uses: actions/checkout@v3
- uses: dtolnay/rust-toolchain@1.81
with:
profile: minimal
# Run benchmarks
- name: Run benchmarks - Dataset ${BENCH_NAME} - Branch main - Commit ${{ github.sha }}
run: |
cargo xtask bench --api-key "${{ secrets.BENCHMARK_API_KEY }}" --dashboard-url "${{ vars.BENCHMARK_DASHBOARD_URL }}" --reason "Push on `main` [Run #${{ github.run_id }}](https://github.com/meilisearch/meilisearch/actions/runs/${{ github.run_id }})" -- workloads/*.json

View File

@@ -1,75 +0,0 @@
name: Benchmarks (manual)
on:
workflow_dispatch:
inputs:
dataset_name:
description: 'The name of the dataset used to benchmark (search_songs, search_wiki, search_geo or indexing)'
required: false
default: 'search_songs'
env:
BENCH_NAME: ${{ github.event.inputs.dataset_name }}
jobs:
benchmarks:
name: Run and upload benchmarks
runs-on: benchmarks
timeout-minutes: 4320 # 72h
steps:
- uses: actions/checkout@v3
- uses: dtolnay/rust-toolchain@1.81
with:
profile: minimal
# Set variables
- name: Set current branch name
shell: bash
run: echo "name=$(echo ${GITHUB_REF#refs/heads/})" >> $GITHUB_OUTPUT
id: current_branch
- name: Set normalized current branch name # Replace `/` by `_` in branch name to avoid issues when pushing to S3
shell: bash
run: echo "name=$(echo ${GITHUB_REF#refs/heads/} | tr '/' '_')" >> $GITHUB_OUTPUT
id: normalized_current_branch
- name: Set shorter commit SHA
shell: bash
run: echo "short=$(echo $GITHUB_SHA | cut -c1-8)" >> $GITHUB_OUTPUT
id: commit_sha
- name: Set file basename with format "dataset_branch_commitSHA"
shell: bash
run: echo "basename=$(echo ${BENCH_NAME}_${{ steps.normalized_current_branch.outputs.name }}_${{ steps.commit_sha.outputs.short }})" >> $GITHUB_OUTPUT
id: file
# Run benchmarks
- name: Run benchmarks - Dataset ${BENCH_NAME} - Branch ${{ steps.current_branch.outputs.name }} - Commit ${{ steps.commit_sha.outputs.short }}
run: |
cd crates/benchmarks
cargo bench --bench ${BENCH_NAME} -- --save-baseline ${{ steps.file.outputs.basename }}
# Generate critcmp files
- name: Install critcmp
uses: taiki-e/install-action@v2
with:
tool: critcmp
- name: Export cripcmp file
run: |
critcmp --export ${{ steps.file.outputs.basename }} > ${{ steps.file.outputs.basename }}.json
# Upload benchmarks
- name: Upload ${{ steps.file.outputs.basename }}.json to DO Spaces # DigitalOcean Spaces = S3
uses: BetaHuhn/do-spaces-action@v2
with:
access_key: ${{ secrets.DO_SPACES_ACCESS_KEY }}
secret_key: ${{ secrets.DO_SPACES_SECRET_KEY }}
space_name: ${{ secrets.DO_SPACES_SPACE_NAME }}
space_region: ${{ secrets.DO_SPACES_SPACE_REGION }}
source: ${{ steps.file.outputs.basename }}.json
out_dir: critcmp_results
# Helper
- name: 'README: compare with another benchmark'
run: |
echo "${{ steps.file.outputs.basename }}.json has just been pushed."
echo 'How to compare this benchmark with another one?'
echo ' - Check the available files with: ./benchmarks/scripts/list.sh'
echo " - Run the following command: ./benchmaks/scripts/compare.sh <file-to-compare-with> ${{ steps.file.outputs.basename }}.json"

View File

@@ -1,127 +0,0 @@
name: Benchmarks (PR)
on: issue_comment
permissions:
issues: write
env:
GH_TOKEN: ${{ secrets.MEILI_BOT_GH_PAT }}
jobs:
run-benchmarks-on-comment:
if: startsWith(github.event.comment.body, '/benchmark')
name: Run and upload benchmarks
runs-on: benchmarks
timeout-minutes: 4320 # 72h
steps:
- name: Check permissions
id: permission
env:
PR_AUTHOR: ${{github.event.issue.user.login }}
COMMENT_AUTHOR: ${{github.event.comment.user.login }}
REPOSITORY: ${{github.repository}}
PR_ID: ${{github.event.issue.number}}
run: |
PR_REPOSITORY=$(gh api /repos/"$REPOSITORY"/pulls/"$PR_ID" --jq .head.repo.full_name)
if $(gh api /repos/"$REPOSITORY"/collaborators/"$PR_AUTHOR"/permission --jq .user.permissions.push)
then
echo "::notice title=Authentication success::PR author authenticated"
else
echo "::error title=Authentication error::PR author doesn't have push permission on this repository"
exit 1
fi
if $(gh api /repos/"$REPOSITORY"/collaborators/"$COMMENT_AUTHOR"/permission --jq .user.permissions.push)
then
echo "::notice title=Authentication success::Comment author authenticated"
else
echo "::error title=Authentication error::Comment author doesn't have push permission on this repository"
exit 1
fi
if [ "$PR_REPOSITORY" = "$REPOSITORY" ]
then
echo "::notice title=Authentication success::PR started from main repository"
else
echo "::error title=Authentication error::PR started from a fork"
exit 1
fi
- uses: dtolnay/rust-toolchain@1.81
with:
profile: minimal
- name: Check for Command
id: command
uses: xt0rted/slash-command-action@v2
with:
command: benchmark
reaction-type: "eyes"
repo-token: ${{ env.GH_TOKEN }}
- uses: xt0rted/pull-request-comment-branch@v3
id: comment-branch
with:
repo_token: ${{ env.GH_TOKEN }}
- uses: actions/checkout@v3
if: success()
with:
fetch-depth: 0 # fetch full history to be able to get main commit sha
ref: ${{ steps.comment-branch.outputs.head_ref }}
# Set variables
- name: Set current branch name
shell: bash
run: echo "name=$(git rev-parse --abbrev-ref HEAD)" >> $GITHUB_OUTPUT
id: current_branch
- name: Set normalized current branch name # Replace `/` by `_` in branch name to avoid issues when pushing to S3
shell: bash
run: echo "name=$(git rev-parse --abbrev-ref HEAD | tr '/' '_')" >> $GITHUB_OUTPUT
id: normalized_current_branch
- name: Set shorter commit SHA
shell: bash
run: echo "short=$(echo $GITHUB_SHA | cut -c1-8)" >> $GITHUB_OUTPUT
id: commit_sha
- name: Set file basename with format "dataset_branch_commitSHA"
shell: bash
run: echo "basename=$(echo ${{ steps.command.outputs.command-arguments }}_${{ steps.normalized_current_branch.outputs.name }}_${{ steps.commit_sha.outputs.short }})" >> $GITHUB_OUTPUT
id: file
# Run benchmarks
- name: Run benchmarks - Dataset ${{ steps.command.outputs.command-arguments }} - Branch ${{ steps.current_branch.outputs.name }} - Commit ${{ steps.commit_sha.outputs.short }}
run: |
cd crates/benchmarks
cargo bench --bench ${{ steps.command.outputs.command-arguments }} -- --save-baseline ${{ steps.file.outputs.basename }}
# Generate critcmp files
- name: Install critcmp
uses: taiki-e/install-action@v2
with:
tool: critcmp
- name: Export cripcmp file
run: |
critcmp --export ${{ steps.file.outputs.basename }} > ${{ steps.file.outputs.basename }}.json
# Upload benchmarks
- name: Upload ${{ steps.file.outputs.basename }}.json to DO Spaces # DigitalOcean Spaces = S3
uses: BetaHuhn/do-spaces-action@v2
with:
access_key: ${{ secrets.DO_SPACES_ACCESS_KEY }}
secret_key: ${{ secrets.DO_SPACES_SECRET_KEY }}
space_name: ${{ secrets.DO_SPACES_SPACE_NAME }}
space_region: ${{ secrets.DO_SPACES_SPACE_REGION }}
source: ${{ steps.file.outputs.basename }}.json
out_dir: critcmp_results
# Compute the diff of the benchmarks and send a message on the GitHub PR
- name: Compute and send a message in the PR
env:
GITHUB_TOKEN: ${{ secrets.MEILI_BOT_GH_PAT }}
run: |
set -x
export base_ref=$(git merge-base origin/main ${{ steps.comment-branch.outputs.head_ref }} | head -c8)
export base_filename=$(echo ${{ steps.command.outputs.command-arguments }}_main_${base_ref}.json)
export bench_name=$(echo ${{ steps.command.outputs.command-arguments }})
echo "Here are your $bench_name benchmarks diff 👊" >> body.txt
echo '```' >> body.txt
./benchmarks/scripts/compare.sh $base_filename ${{ steps.file.outputs.basename }}.json >> body.txt
echo '```' >> body.txt
gh pr comment ${{ steps.current_branch.outputs.name }} --body-file body.txt

View File

@@ -1,77 +0,0 @@
name: Benchmarks of indexing (push)
on:
push:
branches:
- main
env:
INFLUX_TOKEN: ${{ secrets.INFLUX_TOKEN }}
BENCH_NAME: "indexing"
jobs:
benchmarks:
name: Run and upload benchmarks
runs-on: benchmarks
timeout-minutes: 4320 # 72h
steps:
- uses: actions/checkout@v3
- uses: dtolnay/rust-toolchain@1.81
with:
profile: minimal
# Set variables
- name: Set current branch name
shell: bash
run: echo "name=$(echo ${GITHUB_REF#refs/heads/})" >> $GITHUB_OUTPUT
id: current_branch
- name: Set normalized current branch name # Replace `/` by `_` in branch name to avoid issues when pushing to S3
shell: bash
run: echo "name=$(echo ${GITHUB_REF#refs/heads/} | tr '/' '_')" >> $GITHUB_OUTPUT
id: normalized_current_branch
- name: Set shorter commit SHA
shell: bash
run: echo "short=$(echo $GITHUB_SHA | cut -c1-8)" >> $GITHUB_OUTPUT
id: commit_sha
- name: Set file basename with format "dataset_branch_commitSHA"
shell: bash
run: echo "basename=$(echo ${BENCH_NAME}_${{ steps.normalized_current_branch.outputs.name }}_${{ steps.commit_sha.outputs.short }})" >> $GITHUB_OUTPUT
id: file
# Run benchmarks
- name: Run benchmarks - Dataset ${BENCH_NAME} - Branch ${{ steps.current_branch.outputs.name }} - Commit ${{ steps.commit_sha.outputs.short }}
run: |
cd crates/benchmarks
cargo bench --bench ${BENCH_NAME} -- --save-baseline ${{ steps.file.outputs.basename }}
# Generate critcmp files
- name: Install critcmp
uses: taiki-e/install-action@v2
with:
tool: critcmp
- name: Export cripcmp file
run: |
critcmp --export ${{ steps.file.outputs.basename }} > ${{ steps.file.outputs.basename }}.json
# Upload benchmarks
- name: Upload ${{ steps.file.outputs.basename }}.json to DO Spaces # DigitalOcean Spaces = S3
uses: BetaHuhn/do-spaces-action@v2
with:
access_key: ${{ secrets.DO_SPACES_ACCESS_KEY }}
secret_key: ${{ secrets.DO_SPACES_SECRET_KEY }}
space_name: ${{ secrets.DO_SPACES_SPACE_NAME }}
space_region: ${{ secrets.DO_SPACES_SPACE_REGION }}
source: ${{ steps.file.outputs.basename }}.json
out_dir: critcmp_results
# Upload benchmarks to influxdb
- name: Upload ${{ steps.file.outputs.basename }}.json to influxDB
run: telegraf --config https://eu-central-1-1.aws.cloud2.influxdata.com/api/v2/telegrafs/08b52e34a370b000 --once --debug
# Helper
- name: 'README: compare with another benchmark'
run: |
echo "${{ steps.file.outputs.basename }}.json has just been pushed."
echo 'How to compare this benchmark with another one?'
echo ' - Check the available files with: ./benchmarks/scripts/list.sh'
echo " - Run the following command: ./benchmaks/scipts/compare.sh <file-to-compare-with> ${{ steps.file.outputs.basename }}.json"

View File

@@ -1,76 +0,0 @@
name: Benchmarks of search for geo (push)
on:
push:
branches:
- main
env:
BENCH_NAME: "search_geo"
INFLUX_TOKEN: ${{ secrets.INFLUX_TOKEN }}
jobs:
benchmarks:
name: Run and upload benchmarks
runs-on: benchmarks
steps:
- uses: actions/checkout@v3
- uses: dtolnay/rust-toolchain@1.81
with:
profile: minimal
# Set variables
- name: Set current branch name
shell: bash
run: echo "name=$(echo ${GITHUB_REF#refs/heads/})" >> $GITHUB_OUTPUT
id: current_branch
- name: Set normalized current branch name # Replace `/` by `_` in branch name to avoid issues when pushing to S3
shell: bash
run: echo "name=$(echo ${GITHUB_REF#refs/heads/} | tr '/' '_')" >> $GITHUB_OUTPUT
id: normalized_current_branch
- name: Set shorter commit SHA
shell: bash
run: echo "short=$(echo $GITHUB_SHA | cut -c1-8)" >> $GITHUB_OUTPUT
id: commit_sha
- name: Set file basename with format "dataset_branch_commitSHA"
shell: bash
run: echo "basename=$(echo ${BENCH_NAME}_${{ steps.normalized_current_branch.outputs.name }}_${{ steps.commit_sha.outputs.short }})" >> $GITHUB_OUTPUT
id: file
# Run benchmarks
- name: Run benchmarks - Dataset ${BENCH_NAME} - Branch ${{ steps.current_branch.outputs.name }} - Commit ${{ steps.commit_sha.outputs.short }}
run: |
cd crates/benchmarks
cargo bench --bench ${BENCH_NAME} -- --save-baseline ${{ steps.file.outputs.basename }}
# Generate critcmp files
- name: Install critcmp
uses: taiki-e/install-action@v2
with:
tool: critcmp
- name: Export cripcmp file
run: |
critcmp --export ${{ steps.file.outputs.basename }} > ${{ steps.file.outputs.basename }}.json
# Upload benchmarks
- name: Upload ${{ steps.file.outputs.basename }}.json to DO Spaces # DigitalOcean Spaces = S3
uses: BetaHuhn/do-spaces-action@v2
with:
access_key: ${{ secrets.DO_SPACES_ACCESS_KEY }}
secret_key: ${{ secrets.DO_SPACES_SECRET_KEY }}
space_name: ${{ secrets.DO_SPACES_SPACE_NAME }}
space_region: ${{ secrets.DO_SPACES_SPACE_REGION }}
source: ${{ steps.file.outputs.basename }}.json
out_dir: critcmp_results
# Upload benchmarks to influxdb
- name: Upload ${{ steps.file.outputs.basename }}.json to influxDB
run: telegraf --config https://eu-central-1-1.aws.cloud2.influxdata.com/api/v2/telegrafs/08b52e34a370b000 --once --debug
# Helper
- name: 'README: compare with another benchmark'
run: |
echo "${{ steps.file.outputs.basename }}.json has just been pushed."
echo 'How to compare this benchmark with another one?'
echo ' - Check the available files with: ./benchmarks/scripts/list.sh'
echo " - Run the following command: ./benchmaks/scipts/compare.sh <file-to-compare-with> ${{ steps.file.outputs.basename }}.json"

View File

@@ -1,76 +0,0 @@
name: Benchmarks of search for songs (push)
on:
push:
branches:
- main
env:
BENCH_NAME: "search_songs"
INFLUX_TOKEN: ${{ secrets.INFLUX_TOKEN }}
jobs:
benchmarks:
name: Run and upload benchmarks
runs-on: benchmarks
steps:
- uses: actions/checkout@v3
- uses: dtolnay/rust-toolchain@1.81
with:
profile: minimal
# Set variables
- name: Set current branch name
shell: bash
run: echo "name=$(echo ${GITHUB_REF#refs/heads/})" >> $GITHUB_OUTPUT
id: current_branch
- name: Set normalized current branch name # Replace `/` by `_` in branch name to avoid issues when pushing to S3
shell: bash
run: echo "name=$(echo ${GITHUB_REF#refs/heads/} | tr '/' '_')" >> $GITHUB_OUTPUT
id: normalized_current_branch
- name: Set shorter commit SHA
shell: bash
run: echo "short=$(echo $GITHUB_SHA | cut -c1-8)" >> $GITHUB_OUTPUT
id: commit_sha
- name: Set file basename with format "dataset_branch_commitSHA"
shell: bash
run: echo "basename=$(echo ${BENCH_NAME}_${{ steps.normalized_current_branch.outputs.name }}_${{ steps.commit_sha.outputs.short }})" >> $GITHUB_OUTPUT
id: file
# Run benchmarks
- name: Run benchmarks - Dataset ${BENCH_NAME} - Branch ${{ steps.current_branch.outputs.name }} - Commit ${{ steps.commit_sha.outputs.short }}
run: |
cd crates/benchmarks
cargo bench --bench ${BENCH_NAME} -- --save-baseline ${{ steps.file.outputs.basename }}
# Generate critcmp files
- name: Install critcmp
uses: taiki-e/install-action@v2
with:
tool: critcmp
- name: Export cripcmp file
run: |
critcmp --export ${{ steps.file.outputs.basename }} > ${{ steps.file.outputs.basename }}.json
# Upload benchmarks
- name: Upload ${{ steps.file.outputs.basename }}.json to DO Spaces # DigitalOcean Spaces = S3
uses: BetaHuhn/do-spaces-action@v2
with:
access_key: ${{ secrets.DO_SPACES_ACCESS_KEY }}
secret_key: ${{ secrets.DO_SPACES_SECRET_KEY }}
space_name: ${{ secrets.DO_SPACES_SPACE_NAME }}
space_region: ${{ secrets.DO_SPACES_SPACE_REGION }}
source: ${{ steps.file.outputs.basename }}.json
out_dir: critcmp_results
# Upload benchmarks to influxdb
- name: Upload ${{ steps.file.outputs.basename }}.json to influxDB
run: telegraf --config https://eu-central-1-1.aws.cloud2.influxdata.com/api/v2/telegrafs/08b52e34a370b000 --once --debug
# Helper
- name: 'README: compare with another benchmark'
run: |
echo "${{ steps.file.outputs.basename }}.json has just been pushed."
echo 'How to compare this benchmark with another one?'
echo ' - Check the available files with: ./benchmarks/scripts/list.sh'
echo " - Run the following command: ./benchmaks/scipts/compare.sh <file-to-compare-with> ${{ steps.file.outputs.basename }}.json"

View File

@@ -1,76 +0,0 @@
name: Benchmarks of search for Wikipedia articles (push)
on:
push:
branches:
- main
env:
BENCH_NAME: "search_wiki"
INFLUX_TOKEN: ${{ secrets.INFLUX_TOKEN }}
jobs:
benchmarks:
name: Run and upload benchmarks
runs-on: benchmarks
steps:
- uses: actions/checkout@v3
- uses: dtolnay/rust-toolchain@1.81
with:
profile: minimal
# Set variables
- name: Set current branch name
shell: bash
run: echo "name=$(echo ${GITHUB_REF#refs/heads/})" >> $GITHUB_OUTPUT
id: current_branch
- name: Set normalized current branch name # Replace `/` by `_` in branch name to avoid issues when pushing to S3
shell: bash
run: echo "name=$(echo ${GITHUB_REF#refs/heads/} | tr '/' '_')" >> $GITHUB_OUTPUT
id: normalized_current_branch
- name: Set shorter commit SHA
shell: bash
run: echo "short=$(echo $GITHUB_SHA | cut -c1-8)" >> $GITHUB_OUTPUT
id: commit_sha
- name: Set file basename with format "dataset_branch_commitSHA"
shell: bash
run: echo "basename=$(echo ${BENCH_NAME}_${{ steps.normalized_current_branch.outputs.name }}_${{ steps.commit_sha.outputs.short }})" >> $GITHUB_OUTPUT
id: file
# Run benchmarks
- name: Run benchmarks - Dataset ${BENCH_NAME} - Branch ${{ steps.current_branch.outputs.name }} - Commit ${{ steps.commit_sha.outputs.short }}
run: |
cd crates/benchmarks
cargo bench --bench ${BENCH_NAME} -- --save-baseline ${{ steps.file.outputs.basename }}
# Generate critcmp files
- name: Install critcmp
uses: taiki-e/install-action@v2
with:
tool: critcmp
- name: Export cripcmp file
run: |
critcmp --export ${{ steps.file.outputs.basename }} > ${{ steps.file.outputs.basename }}.json
# Upload benchmarks
- name: Upload ${{ steps.file.outputs.basename }}.json to DO Spaces # DigitalOcean Spaces = S3
uses: BetaHuhn/do-spaces-action@v2
with:
access_key: ${{ secrets.DO_SPACES_ACCESS_KEY }}
secret_key: ${{ secrets.DO_SPACES_SECRET_KEY }}
space_name: ${{ secrets.DO_SPACES_SPACE_NAME }}
space_region: ${{ secrets.DO_SPACES_SPACE_REGION }}
source: ${{ steps.file.outputs.basename }}.json
out_dir: critcmp_results
# Upload benchmarks to influxdb
- name: Upload ${{ steps.file.outputs.basename }}.json to influxDB
run: telegraf --config https://eu-central-1-1.aws.cloud2.influxdata.com/api/v2/telegrafs/08b52e34a370b000 --once --debug
# Helper
- name: 'README: compare with another benchmark'
run: |
echo "${{ steps.file.outputs.basename }}.json has just been pushed."
echo 'How to compare this benchmark with another one?'
echo ' - Check the available files with: ./benchmarks/scripts/list.sh'
echo " - Run the following command: ./benchmaks/scipts/compare.sh <file-to-compare-with> ${{ steps.file.outputs.basename }}.json"


@ -1,100 +0,0 @@
name: PR Milestone Check
on:
pull_request:
types: [opened, reopened, edited, synchronize, milestoned, demilestoned]
branches:
- "main"
- "release-v*.*.*"
jobs:
check-milestone:
name: Check PR Milestone
runs-on: ubuntu-latest
steps:
- name: Checkout code
uses: actions/checkout@v3
- name: Validate PR milestone
uses: actions/github-script@v6
with:
github-token: ${{ secrets.GITHUB_TOKEN }}
script: |
// Get PR number directly from the event payload
const prNumber = context.payload.pull_request.number;
// Get PR details
const { data: prData } = await github.rest.pulls.get({
owner: 'meilisearch',
repo: 'meilisearch',
pull_number: prNumber
});
// Get base branch name
const baseBranch = prData.base.ref;
console.log(`Base branch: ${baseBranch}`);
// Get PR milestone
const prMilestone = prData.milestone;
if (!prMilestone) {
core.setFailed('PR must have a milestone assigned');
return;
}
console.log(`PR milestone: ${prMilestone.title}`);
// Validate milestone format: vx.y.z
const milestoneRegex = /^v\d+\.\d+\.\d+$/;
if (!milestoneRegex.test(prMilestone.title)) {
core.setFailed(`Milestone "${prMilestone.title}" does not follow the required format vx.y.z`);
return;
}
// For main branch PRs, check if the milestone is the highest one
if (baseBranch === 'main') {
// Get all milestones
const { data: milestones } = await github.rest.issues.listMilestones({
owner: 'meilisearch',
repo: 'meilisearch',
state: 'open',
sort: 'due_on',
direction: 'desc'
});
// Sort milestones by version number (vx.y.z)
const sortedMilestones = milestones
.filter(m => milestoneRegex.test(m.title))
.sort((a, b) => {
const versionA = a.title.substring(1).split('.').map(Number);
const versionB = b.title.substring(1).split('.').map(Number);
// Compare major version
if (versionA[0] !== versionB[0]) return versionB[0] - versionA[0];
// Compare minor version
if (versionA[1] !== versionB[1]) return versionB[1] - versionA[1];
// Compare patch version
return versionB[2] - versionA[2];
});
if (sortedMilestones.length === 0) {
core.setFailed('No valid milestones found in the repository. Please create at least one milestone with the format vx.y.z');
return;
}
const highestMilestone = sortedMilestones[0];
console.log(`Highest milestone: ${highestMilestone.title}`);
if (prMilestone.title !== highestMilestone.title) {
core.setFailed(`PRs targeting the main branch must use the highest milestone (${highestMilestone.title}), but this PR uses ${prMilestone.title}`);
return;
}
} else {
// For release branches, the milestone should match the branch version
const branchVersion = baseBranch.substring(8); // remove 'release-'
if (prMilestone.title !== branchVersion) {
core.setFailed(`PRs targeting release branch "${baseBranch}" must use the matching milestone "${branchVersion}", but this PR uses "${prMilestone.title}"`);
return;
}
}
console.log('PR milestone validation passed!');

.github/workflows/coverage.yml vendored Normal file

@ -0,0 +1,33 @@
---
on:
workflow_dispatch:
name: Execute code coverage
jobs:
nightly-coverage:
runs-on: ubuntu-18.04
steps:
- uses: actions/checkout@v3
- uses: actions-rs/toolchain@v1
with:
toolchain: nightly
override: true
- uses: actions-rs/cargo@v1
with:
command: clean
- uses: actions-rs/cargo@v1
with:
command: test
args: --all-features --no-fail-fast
env:
CARGO_INCREMENTAL: "0"
RUSTFLAGS: "-Zprofile -Ccodegen-units=1 -Cinline-threshold=0 -Clink-dead-code -Coverflow-checks=off -Cpanic=unwind -Zpanic_abort_tests"
- uses: actions-rs/grcov@v0.1
- name: Upload coverage to Codecov
uses: codecov/codecov-action@v3
with:
token: ${{ secrets.CODECOV_TOKEN }}
file: ${{ steps.coverage.outputs.report }}
yml: ./codecov.yml
fail_ci_if_error: true


@ -0,0 +1,23 @@
name: Create issue to upgrade dependencies
on:
schedule:
- cron: '0 0 1 */3 *'
workflow_dispatch:
jobs:
create-issue:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- name: Create an issue
uses: actions-ecosystem/action-create-issue@v1
with:
github_token: ${{ secrets.MEILI_BOT_GH_PAT }}
title: Upgrade dependencies
body: |
We need to update the dependencies of the Meilisearch repository, and, if possible, the dependencies of all the core-team repositories that Meilisearch depends on (milli, charabia, heed...).
⚠️ This issue should only be done at the beginning of the sprint!
labels: |
dependencies
maintenance


@ -1,24 +0,0 @@
name: Create issue to upgrade dependencies
on:
schedule:
# Run on the first day of the month, every 6 months
- cron: '0 0 1 */6 *'
workflow_dispatch:
jobs:
create-issue:
runs-on: ubuntu-latest
env:
ISSUE_TEMPLATE: issue-template.md
GH_TOKEN: ${{ secrets.MEILI_BOT_GH_PAT }}
steps:
- uses: actions/checkout@v3
- name: Download the issue template
run: curl -s https://raw.githubusercontent.com/meilisearch/engine-team/main/issue-templates/dependency-issue.md > $ISSUE_TEMPLATE
- name: Create issue
run: |
gh issue create \
--title 'Upgrade dependencies' \
--label 'dependencies,maintenance' \
--body-file $ISSUE_TEMPLATE


@ -1,30 +0,0 @@
name: Look for flaky tests
on:
workflow_dispatch:
schedule:
- cron: "0 12 * * FRI" # Every Friday at 12:00PM
jobs:
flaky:
runs-on: ubuntu-latest
container:
# Use ubuntu-22.04 to compile with glibc 2.35
image: ubuntu:22.04
steps:
- uses: actions/checkout@v3
- name: Install needed dependencies
run: |
apt-get update && apt-get install -y curl
apt-get install build-essential -y
- uses: dtolnay/rust-toolchain@1.81
- name: Install cargo-flaky
run: cargo install cargo-flaky
- name: Run cargo flaky in the dumps
run: cd crates/dump; cargo flaky -i 100 --release
- name: Run cargo flaky in the index-scheduler
run: cd crates/index-scheduler; cargo flaky -i 100 --release
- name: Run cargo flaky in the auth
run: cd crates/meilisearch-auth; cargo flaky -i 100 --release
- name: Run cargo flaky in meilisearch
run: cd crates/meilisearch; cargo flaky -i 100 --release

.github/workflows/flaky.yml vendored Normal file

@ -0,0 +1,16 @@
name: Look for flaky tests
on:
workflow_dispatch:
schedule:
- cron: "0 12 * * FRI" # Every Friday at 12:00PM
jobs:
flaky:
runs-on: ubuntu-18.04
steps:
- uses: actions/checkout@v3
- name: Install cargo-flaky
run: cargo install cargo-flaky
- name: Run cargo flaky 100 times
run: cargo flaky -i 100 --release


@ -1,22 +0,0 @@
name: Run the indexing fuzzer
on:
push:
branches:
- main
jobs:
fuzz:
name: Setup the action
runs-on: ubuntu-latest
timeout-minutes: 4320 # 72h
steps:
- uses: actions/checkout@v3
- uses: dtolnay/rust-toolchain@1.81
with:
profile: minimal
# Run benchmarks
- name: Run the fuzzer
run: |
cargo run --release --bin fuzz-indexing


@ -1,29 +0,0 @@
# Create or update a latest git tag when releasing a stable version of Meilisearch
name: Update latest git tag
on:
workflow_dispatch:
release:
types: [released]
jobs:
check-version:
name: Check the version validity
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- name: Check release validity
if: github.event_name == 'release'
run: bash .github/scripts/check-release.sh
update-latest-tag:
runs-on: ubuntu-latest
needs: check-version
steps:
- uses: actions/checkout@v3
- uses: rickstaa/action-create-tag@v1
with:
tag: "latest"
message: "Latest stable release of Meilisearch"
# Move the tag if `latest` already exists
force_push_tag: true
github_token: ${{ secrets.MEILI_BOT_GH_PAT }}


@ -3,8 +3,8 @@ name: Milestone's workflow
# /!\ No git flow is handled here
# For each Milestone created (not opened!), and if the release is NOT a patch release (only the patch changed)
# - the roadmap issue is created, see https://github.com/meilisearch/engine-team/blob/main/issue-templates/roadmap-issue.md
# - the changelog issue is created, see https://github.com/meilisearch/engine-team/blob/main/issue-templates/changelog-issue.md
# - the roadmap issue is created, see https://github.com/meilisearch/core-team/blob/main/issue-templates/roadmap-issue.md
# - the changelog issue is created, see https://github.com/meilisearch/core-team/blob/main/issue-templates/changelog-issue.md
# For each Milestone closed
# - the `release_version` label is created
@ -31,6 +31,8 @@ jobs:
runs-on: ubuntu-latest
outputs:
is-patch: ${{ steps.check-patch.outputs.is-patch }}
env:
MILESTONE_VERSION: ${{ github.event.milestone.title }}
steps:
- uses: actions/checkout@v3
- name: Check if this release is a patch release only
@ -39,10 +41,10 @@ jobs:
echo version: $MILESTONE_VERSION
if [[ $MILESTONE_VERSION =~ ^v[0-9]+\.[0-9]+\.0$ ]]; then
echo 'This is NOT a patch release'
echo "is-patch=false" >> $GITHUB_OUTPUT
echo ::set-output name=is-patch::false
elif [[ $MILESTONE_VERSION =~ ^v[0-9]+\.[0-9]+\.[0-9]+$ ]]; then
echo 'This is a patch release'
echo "is-patch=true" >> $GITHUB_OUTPUT
echo ::set-output name=is-patch::true
else
echo "Not a valid format of release, check the Milestone's title."
echo 'Should be vX.Y.Z'
@ -59,7 +61,7 @@ jobs:
steps:
- uses: actions/checkout@v3
- name: Download the issue template
run: curl -s https://raw.githubusercontent.com/meilisearch/engine-team/main/issue-templates/roadmap-issue.md > $ISSUE_TEMPLATE
run: curl -s https://raw.githubusercontent.com/meilisearch/core-team/main/issue-templates/roadmap-issue.md > $ISSUE_TEMPLATE
- name: Replace all empty occurrences in the templates
run: |
# Replace all <<version>> occurrences
@ -92,7 +94,7 @@ jobs:
steps:
- uses: actions/checkout@v3
- name: Download the issue template
run: curl -s https://raw.githubusercontent.com/meilisearch/engine-team/main/issue-templates/changelog-issue.md > $ISSUE_TEMPLATE
run: curl -s https://raw.githubusercontent.com/meilisearch/core-team/main/issue-templates/changelog-issue.md > $ISSUE_TEMPLATE
- name: Replace all empty occurrences in the templates
run: |
# Replace all <<version>> occurrences
@ -110,44 +112,6 @@ jobs:
--milestone $MILESTONE_VERSION \
--assignee curquiza
create-update-version-issue:
needs: get-release-version
# Create the update-version issue even if the release is a patch release
if: github.event.action == 'created'
runs-on: ubuntu-latest
env:
ISSUE_TEMPLATE: issue-template.md
steps:
- uses: actions/checkout@v3
- name: Download the issue template
run: curl -s https://raw.githubusercontent.com/meilisearch/engine-team/main/issue-templates/update-version-issue.md > $ISSUE_TEMPLATE
- name: Create the issue
run: |
gh issue create \
--title "Update version in Cargo.toml for $MILESTONE_VERSION" \
--label 'maintenance' \
--body-file $ISSUE_TEMPLATE \
--milestone $MILESTONE_VERSION
create-update-openapi-issue:
needs: get-release-version
# Create the openAPI issue if the release is not only a patch release
if: github.event.action == 'created' && needs.get-release-version.outputs.is-patch == 'false'
runs-on: ubuntu-latest
env:
ISSUE_TEMPLATE: issue-template.md
steps:
- uses: actions/checkout@v3
- name: Download the issue template
run: curl -s https://raw.githubusercontent.com/meilisearch/engine-team/main/issue-templates/update-openapi-issue.md > $ISSUE_TEMPLATE
- name: Create the issue
run: |
gh issue create \
--title "Update Open API file for $MILESTONE_VERSION" \
--label 'maintenance' \
--body-file $ISSUE_TEMPLATE \
--milestone $MILESTONE_VERSION
# ----------------
# MILESTONE CLOSED
# ----------------


@ -1,55 +0,0 @@
name: Publish to APT & Homebrew
on:
release:
types: [released]
jobs:
check-version:
name: Check the version validity
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- name: Check release validity
run: bash .github/scripts/check-release.sh
debian:
name: Publish debian package
runs-on: ubuntu-latest
needs: check-version
container:
# Use ubuntu-22.04 to compile with glibc 2.35
image: ubuntu:22.04
steps:
- name: Install needed dependencies
run: |
apt-get update && apt-get install -y curl
apt-get install build-essential -y
- uses: dtolnay/rust-toolchain@1.81
- name: Install cargo-deb
run: cargo install cargo-deb
- uses: actions/checkout@v3
- name: Build deb package
run: cargo deb -p meilisearch -o target/debian/meilisearch.deb
- name: Upload debian pkg to release
uses: svenstaro/upload-release-action@2.7.0
with:
repo_token: ${{ secrets.MEILI_BOT_GH_PAT }}
file: target/debian/meilisearch.deb
asset_name: meilisearch.deb
tag: ${{ github.ref }}
- name: Upload debian pkg to apt repository
run: curl -F package=@target/debian/meilisearch.deb https://${{ secrets.GEMFURY_PUSH_TOKEN }}@push.fury.io/meilisearch/
homebrew:
name: Bump Homebrew formula
runs-on: ubuntu-latest
needs: check-version
steps:
- name: Create PR to Homebrew
uses: mislav/bump-homebrew-formula-action@v3
with:
formula-name: meilisearch
formula-path: Formula/m/meilisearch.rb
env:
COMMITTER_TOKEN: ${{ secrets.HOMEBREW_COMMITTER_TOKEN }}


@ -1,12 +1,12 @@
name: Publish binaries to GitHub release
on:
workflow_dispatch:
schedule:
- cron: "0 2 * * *" # Every day at 2:00am
- cron: '0 2 * * *' # Every day at 2:00am
release:
types: [published]
name: Publish binaries to release
jobs:
check-version:
name: Check the version validity
@ -24,87 +24,73 @@ jobs:
escaped_tag=$(printf "%q" ${{ github.ref_name }})
if [[ $escaped_tag =~ ^v[0-9]+\.[0-9]+\.[0-9]+$ ]]; then
echo "stable=true" >> $GITHUB_OUTPUT
echo ::set-output name=stable::true
else
echo "stable=false" >> $GITHUB_OUTPUT
echo ::set-output name=stable::false
fi
- name: Check release validity
if: github.event_name == 'release' && steps.check-tag-format.outputs.stable == 'true'
run: bash .github/scripts/check-release.sh
publish-linux:
name: Publish binary for Linux
runs-on: ubuntu-latest
needs: check-version
container:
# Use ubuntu-22.04 to compile with glibc 2.35
image: ubuntu:22.04
steps:
- uses: actions/checkout@v3
- name: Install needed dependencies
run: |
apt-get update && apt-get install -y curl
apt-get install build-essential -y
- uses: dtolnay/rust-toolchain@1.81
- name: Build
run: cargo build --release --locked
# No need to upload binaries for dry run (cron)
- name: Upload binaries to release
if: github.event_name == 'release'
uses: svenstaro/upload-release-action@2.7.0
with:
repo_token: ${{ secrets.MEILI_BOT_GH_PAT }}
file: target/release/meilisearch
asset_name: meilisearch-linux-amd64
tag: ${{ github.ref }}
publish-macos-windows:
publish:
name: Publish binary for ${{ matrix.os }}
runs-on: ${{ matrix.os }}
needs: check-version
strategy:
fail-fast: false
matrix:
os: [macos-13, windows-2022]
os: [ubuntu-18.04, macos-latest, windows-latest]
include:
- os: macos-13
- os: ubuntu-18.04
artifact_name: meilisearch
asset_name: meilisearch-linux-amd64
- os: macos-latest
artifact_name: meilisearch
asset_name: meilisearch-macos-amd64
- os: windows-2022
- os: windows-latest
artifact_name: meilisearch.exe
asset_name: meilisearch-windows-amd64.exe
steps:
- uses: actions/checkout@v3
- uses: dtolnay/rust-toolchain@1.81
- name: Build
run: cargo build --release --locked
# No need to upload binaries for dry run (cron)
- name: Upload binaries to release
if: github.event_name == 'release'
uses: svenstaro/upload-release-action@2.7.0
with:
repo_token: ${{ secrets.MEILI_BOT_GH_PAT }}
file: target/release/${{ matrix.artifact_name }}
asset_name: ${{ matrix.asset_name }}
tag: ${{ github.ref }}
- uses: hecrj/setup-rust-action@master
with:
rust-version: stable
- uses: actions/checkout@v3
- name: Build
run: cargo build --release --locked
# No need to upload binaries for dry run (cron)
- name: Upload binaries to release
if: github.event_name == 'release'
uses: svenstaro/upload-release-action@v1-release
with:
repo_token: ${{ secrets.MEILI_BOT_GH_PAT }}
file: target/release/${{ matrix.artifact_name }}
asset_name: ${{ matrix.asset_name }}
tag: ${{ github.ref }}
publish-macos-apple-silicon:
name: Publish binary for macOS silicon
runs-on: macos-13
runs-on: ${{ matrix.os }}
needs: check-version
continue-on-error: false
strategy:
fail-fast: false
matrix:
include:
- target: aarch64-apple-darwin
- os: macos-latest
target: aarch64-apple-darwin
asset_name: meilisearch-macos-apple-silicon
steps:
- name: Checkout repository
uses: actions/checkout@v3
- name: Installing Rust toolchain
uses: dtolnay/rust-toolchain@1.81
uses: actions-rs/toolchain@v1
with:
toolchain: stable
profile: minimal
target: ${{ matrix.target }}
override: true
- name: Cargo build
uses: actions-rs/cargo@v1
with:
@ -113,7 +99,7 @@ jobs:
- name: Upload the binary to release
# No need to upload binaries for dry run (cron)
if: github.event_name == 'release'
uses: svenstaro/upload-release-action@2.7.0
uses: svenstaro/upload-release-action@v1-release
with:
repo_token: ${{ secrets.MEILI_BOT_GH_PAT }}
file: target/${{ matrix.target }}/release/meilisearch
@ -122,37 +108,43 @@ jobs:
publish-aarch64:
name: Publish binary for aarch64
runs-on: ubuntu-latest
runs-on: ${{ matrix.os }}
needs: check-version
env:
DEBIAN_FRONTEND: noninteractive
container:
# Use ubuntu-22.04 to compile with glibc 2.35
image: ubuntu:22.04
continue-on-error: false
strategy:
fail-fast: false
matrix:
include:
- target: aarch64-unknown-linux-gnu
- build: aarch64
os: ubuntu-18.04
target: aarch64-unknown-linux-gnu
linker: gcc-aarch64-linux-gnu
use-cross: true
asset_name: meilisearch-linux-aarch64
steps:
- name: Checkout repository
uses: actions/checkout@v3
- name: Install needed dependencies
run: |
apt-get update -y && apt upgrade -y
apt-get install -y curl build-essential gcc-aarch64-linux-gnu
- name: Set up Docker for cross compilation
run: |
apt-get install -y curl apt-transport-https ca-certificates software-properties-common
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | apt-key add -
add-apt-repository "deb [arch=$(dpkg --print-architecture)] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
apt-get update -y && apt-get install -y docker-ce
- name: Installing Rust toolchain
uses: dtolnay/rust-toolchain@1.81
uses: actions-rs/toolchain@v1
with:
toolchain: stable
profile: minimal
target: ${{ matrix.target }}
override: true
- name: APT update
run: |
sudo apt update
- name: Install target specific tools
if: matrix.use-cross
run: |
sudo apt-get install -y ${{ matrix.linker }}
- name: Configure target aarch64 GNU
if: matrix.target == 'aarch64-unknown-linux-gnu'
## Environment variable is not passed using env:
## LD gold won't work with MUSL
# env:
@ -162,23 +154,21 @@ jobs:
echo '[target.aarch64-unknown-linux-gnu]' >> ~/.cargo/config
echo 'linker = "aarch64-linux-gnu-gcc"' >> ~/.cargo/config
echo 'JEMALLOC_SYS_WITH_LG_PAGE=16' >> $GITHUB_ENV
- name: Install a default toolchain that will be used to build cargo cross
run: |
rustup default stable
- name: Cargo build
uses: actions-rs/cargo@v1
with:
command: build
use-cross: true
use-cross: ${{ matrix.use-cross }}
args: --release --target ${{ matrix.target }}
env:
CROSS_DOCKER_IN_DOCKER: true
- name: List target output files
run: ls -lR ./target
- name: Upload the binary to release
# No need to upload binaries for dry run (cron)
if: github.event_name == 'release'
uses: svenstaro/upload-release-action@2.7.0
uses: svenstaro/upload-release-action@v1-release
with:
repo_token: ${{ secrets.MEILI_BOT_GH_PAT }}
file: target/${{ matrix.target }}/release/meilisearch


@ -0,0 +1,49 @@
name: Publish deb pkg to GitHub release & APT repository & Homebrew
on:
release:
types: [released]
jobs:
check-version:
name: Check the version validity
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- name: Check release validity
run: bash .github/scripts/check-release.sh
debian:
name: Publish debian package
runs-on: ubuntu-18.04
needs: check-version
steps:
- uses: hecrj/setup-rust-action@master
with:
rust-version: stable
- name: Install cargo-deb
run: cargo install cargo-deb
- uses: actions/checkout@v3
- name: Build deb package
run: cargo deb -p meilisearch-http -o target/debian/meilisearch.deb
- name: Upload debian pkg to release
uses: svenstaro/upload-release-action@v1-release
with:
repo_token: ${{ secrets.MEILI_BOT_GH_PAT }}
file: target/debian/meilisearch.deb
asset_name: meilisearch.deb
tag: ${{ github.ref }}
- name: Upload debian pkg to apt repository
run: curl -F package=@target/debian/meilisearch.deb https://${{ secrets.GEMFURY_PUSH_TOKEN }}@push.fury.io/meilisearch/
homebrew:
name: Bump Homebrew formula
runs-on: ubuntu-18.04
needs: check-version
steps:
- name: Create PR to Homebrew
uses: mislav/bump-homebrew-formula-action@v1
with:
formula-name: meilisearch
env:
COMMITTER_TOKEN: ${{ secrets.HOMEBREW_COMMITTER_TOKEN }}


@ -1,17 +1,12 @@
name: Publish images to Docker Hub
---
on:
push:
# Will run for every tag pushed except `latest`
# When the `latest` git tag is created with this [CI](../latest-git-tag.yml)
# we don't need to create a Docker `latest` image again.
# The `latest` Docker image push is already done in this CI when releasing a stable version of Meilisearch.
tags-ignore:
- latest
# Both `schedule` and `workflow_dispatch` build the nightly tag
schedule:
- cron: '0 23 * * *' # Every day at 11:00pm
workflow_dispatch:
- cron: '0 4 * * *' # Every day at 4:00am
push:
tags:
- '*'
name: Publish tagged images to Docker Hub
jobs:
docker:
@ -19,86 +14,65 @@ jobs:
steps:
- uses: actions/checkout@v3
# If we are running a cron or manual job ('schedule' or 'workflow_dispatch' event), it means we are publishing the `nightly` tag, so not considered stable.
# If we have pushed a tag, and the tag has the v<number>.<number>.<number> format, it means we are publishing an official release, so considered stable.
# Check if the tag has the v<number>.<number>.<number> format. If yes, it means we are publishing an official release.
# In this situation, we need to set `output.stable` to create/update the following tags (additionally to the `vX.Y.Z` Docker tag):
# - a `vX.Y` (without patch version) Docker tag
# - a `latest` Docker tag
# For any other tag pushed, this is not considered stable.
- name: Define if stable and latest release
- name: Check tag format
if: github.event_name != 'schedule'
id: check-tag-format
env:
# To avoid request limit with the .github/scripts/is-latest-release.sh script
GITHUB_PATH: ${{ secrets.MEILI_BOT_GH_PAT }}
run: |
escaped_tag=$(printf "%q" ${{ github.ref_name }})
echo "latest=false" >> $GITHUB_OUTPUT
if [[ ${{ github.event_name }} != 'push' ]]; then
echo "stable=false" >> $GITHUB_OUTPUT
elif [[ $escaped_tag =~ ^v[0-9]+\.[0-9]+\.[0-9]+$ ]]; then
echo "stable=true" >> $GITHUB_OUTPUT
echo "latest=$(sh .github/scripts/is-latest-release.sh)" >> $GITHUB_OUTPUT
if [[ $escaped_tag =~ ^v[0-9]+\.[0-9]+\.[0-9]+$ ]]; then
echo ::set-output name=stable::true
else
echo "stable=false" >> $GITHUB_OUTPUT
echo ::set-output name=stable::false
fi
# Check only the validity of the tag for stable releases (not for pre-releases or other tags)
# Check only the validity of the tag for official releases (not for pre-releases or other tags)
- name: Check release validity
if: steps.check-tag-format.outputs.stable == 'true'
if: github.event_name != 'schedule' && steps.check-tag-format.outputs.stable == 'true'
run: bash .github/scripts/check-release.sh
- name: Set build-args for Docker buildx
id: build-metadata
run: |
# Extract commit date
commit_date=$(git show -s --format=%cd --date=iso-strict ${{ github.sha }})
echo "date=$commit_date" >> $GITHUB_OUTPUT
- name: Set up QEMU
uses: docker/setup-qemu-action@v3
uses: docker/setup-qemu-action@v2
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3
uses: docker/setup-buildx-action@v2
- name: Login to Docker Hub
uses: docker/login-action@v3
if: github.event_name != 'schedule'
uses: docker/login-action@v2
with:
username: ${{ secrets.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_TOKEN }}
- name: Docker meta
id: meta
uses: docker/metadata-action@v5
uses: docker/metadata-action@v4
with:
images: getmeili/meilisearch
# Prevent `latest` to be updated for each new tag pushed.
# We need latest and `vX.Y` tags to only be pushed for the stable Meilisearch releases.
# The latest and `vX.Y` tags are only pushed for the official Meilisearch releases
# See https://github.com/docker/metadata-action#latest-tag
flavor: latest=false
tags: |
type=ref,event=tag
type=raw,value=nightly,enable=${{ github.event_name != 'push' }}
type=semver,pattern=v{{major}}.{{minor}},enable=${{ steps.check-tag-format.outputs.stable == 'true' }}
type=semver,pattern=v{{major}},enable=${{ steps.check-tag-format.outputs.stable == 'true' }}
type=raw,value=latest,enable=${{ steps.check-tag-format.outputs.stable == 'true' && steps.check-tag-format.outputs.latest == 'true' }}
type=raw,value=latest,enable=${{ steps.check-tag-format.outputs.stable == 'true' }}
- name: Build and push
uses: docker/build-push-action@v6
uses: docker/build-push-action@v3
with:
push: true
# We do not push tags for the cron jobs, this is only for test purposes
push: ${{ github.event_name != 'schedule' }}
platforms: linux/amd64,linux/arm64
tags: ${{ steps.meta.outputs.tags }}
build-args: |
COMMIT_SHA=${{ github.sha }}
COMMIT_DATE=${{ steps.build-metadata.outputs.date }}
GIT_TAG=${{ github.ref_name }}
# /!\ Don't touch this without checking with Cloud team
- name: Send CI information to Cloud team
# Do not send if nightly build (i.e. 'schedule' or 'workflow_dispatch' event)
if: github.event_name == 'push'
uses: peter-evans/repository-dispatch@v3
if: github.event_name != 'schedule'
uses: peter-evans/repository-dispatch@v2
with:
token: ${{ secrets.MEILI_BOT_GH_PAT }}
repository: meilisearch/meilisearch-cloud

.github/workflows/rust.yml vendored Normal file

@ -0,0 +1,92 @@
name: Rust
on:
workflow_dispatch:
pull_request:
push:
# trying and staging branches are for Bors config
branches:
- trying
- staging
env:
CARGO_TERM_COLOR: always
RUST_BACKTRACE: 1
RUSTFLAGS: "-D warnings"
jobs:
tests:
name: Tests on ${{ matrix.os }}
runs-on: ${{ matrix.os }}
strategy:
fail-fast: false
matrix:
os: [ubuntu-18.04, macos-latest, windows-latest]
steps:
- uses: actions/checkout@v3
- name: Cache dependencies
uses: Swatinem/rust-cache@v2.0.0
- name: Run cargo check without any default features
uses: actions-rs/cargo@v1
with:
command: build
args: --locked --release --no-default-features
- name: Run cargo test
uses: actions-rs/cargo@v1
with:
command: test
args: --locked --release
# We run tests in debug also, to make sure that the debug_assertions are hit
test-debug:
name: Run tests in debug
runs-on: ubuntu-18.04
steps:
- uses: actions/checkout@v3
- uses: actions-rs/toolchain@v1
with:
profile: minimal
toolchain: stable
override: true
- name: Cache dependencies
uses: Swatinem/rust-cache@v2.0.0
- name: Run tests in debug
uses: actions-rs/cargo@v1
with:
command: test
args: --locked
clippy:
name: Run Clippy
runs-on: ubuntu-18.04
steps:
- uses: actions/checkout@v3
- uses: actions-rs/toolchain@v1
with:
profile: minimal
toolchain: stable
override: true
components: clippy
- name: Cache dependencies
uses: Swatinem/rust-cache@v2.0.0
- name: Run cargo clippy
uses: actions-rs/cargo@v1
with:
command: clippy
args: --all-targets -- --deny warnings
fmt:
name: Run Rustfmt
runs-on: ubuntu-18.04
steps:
- uses: actions/checkout@v3
- uses: actions-rs/toolchain@v1
with:
profile: minimal
toolchain: stable
override: true
components: rustfmt
- name: Cache dependencies
uses: Swatinem/rust-cache@v2.0.0
- name: Run cargo fmt
run: cargo fmt --all -- --check


@ -1,386 +0,0 @@
# If any test fails, the engine team should ensure the "breaking" changes are expected and contact the integration team
name: SDKs tests
on:
workflow_dispatch:
inputs:
docker_image:
description: 'The Meilisearch Docker image used'
required: false
default: nightly
schedule:
- cron: "0 6 * * MON" # Every Monday at 6:00AM
env:
MEILI_MASTER_KEY: 'masterKey'
MEILI_NO_ANALYTICS: 'true'
DISABLE_COVERAGE: 'true'
jobs:
define-docker-image:
runs-on: ubuntu-latest
outputs:
docker-image: ${{ steps.define-image.outputs.docker-image }}
steps:
- uses: actions/checkout@v4
- name: Define the Docker image we need to use
id: define-image
run: |
event=${{ github.event_name }}
echo "docker-image=nightly" >> $GITHUB_OUTPUT
if [[ $event == 'workflow_dispatch' ]]; then
echo "docker-image=${{ github.event.inputs.docker_image }}" >> $GITHUB_OUTPUT
fi
- name: Docker image is ${{ steps.define-image.outputs.docker-image }}
run: echo "Docker image is ${{ steps.define-image.outputs.docker-image }}"
##########
## SDKs ##
##########
meilisearch-dotnet-tests:
needs: define-docker-image
name: .NET SDK tests
runs-on: ubuntu-latest
env:
MEILISEARCH_VERSION: ${{ needs.define-docker-image.outputs.docker-image }}
steps:
- uses: actions/checkout@v4
with:
repository: meilisearch/meilisearch-dotnet
- name: Setup .NET Core
uses: actions/setup-dotnet@v4
with:
dotnet-version: "8.0.x"
- name: Install dependencies
run: dotnet restore
- name: Build
run: dotnet build --configuration Release --no-restore
- name: Meilisearch (latest version) setup with Docker
run: docker compose up -d
- name: Run tests
run: dotnet test --no-restore --verbosity normal
meilisearch-dart-tests:
needs: define-docker-image
name: Dart SDK tests
runs-on: ubuntu-latest
services:
meilisearch:
image: getmeili/meilisearch:${{ needs.define-docker-image.outputs.docker-image }}
env:
MEILI_MASTER_KEY: ${{ env.MEILI_MASTER_KEY }}
MEILI_NO_ANALYTICS: ${{ env.MEILI_NO_ANALYTICS }}
ports:
- '7700:7700'
steps:
- uses: actions/checkout@v4
with:
repository: meilisearch/meilisearch-dart
- uses: dart-lang/setup-dart@v1
with:
sdk: 'latest'
- name: Install dependencies
run: dart pub get
- name: Run integration tests
run: dart test --concurrency=4
meilisearch-go-tests:
needs: define-docker-image
name: Go SDK tests
runs-on: ubuntu-latest
services:
meilisearch:
image: getmeili/meilisearch:${{ needs.define-docker-image.outputs.docker-image }}
env:
MEILI_MASTER_KEY: ${{ env.MEILI_MASTER_KEY }}
MEILI_NO_ANALYTICS: ${{ env.MEILI_NO_ANALYTICS }}
ports:
- '7700:7700'
steps:
- name: Set up Go
uses: actions/setup-go@v5
with:
go-version: stable
- uses: actions/checkout@v4
with:
repository: meilisearch/meilisearch-go
- name: Get dependencies
run: |
go get -v -t -d ./...
if [ -f Gopkg.toml ]; then
curl https://raw.githubusercontent.com/golang/dep/master/install.sh | sh
dep ensure
fi
- name: Run integration tests
run: go test -v ./...
meilisearch-java-tests:
needs: define-docker-image
name: Java SDK tests
runs-on: ubuntu-latest
services:
meilisearch:
image: getmeili/meilisearch:${{ needs.define-docker-image.outputs.docker-image }}
env:
MEILI_MASTER_KEY: ${{ env.MEILI_MASTER_KEY }}
MEILI_NO_ANALYTICS: ${{ env.MEILI_NO_ANALYTICS }}
ports:
- '7700:7700'
steps:
- uses: actions/checkout@v4
with:
repository: meilisearch/meilisearch-java
- name: Set up Java
uses: actions/setup-java@v4
with:
java-version: 8
distribution: 'zulu'
cache: gradle
- name: Grant execute permission for gradlew
run: chmod +x gradlew
- name: Build and run unit and integration tests
run: ./gradlew build integrationTest
meilisearch-js-tests:
needs: define-docker-image
name: JS SDK tests
runs-on: ubuntu-latest
services:
meilisearch:
image: getmeili/meilisearch:${{ needs.define-docker-image.outputs.docker-image }}
env:
MEILI_MASTER_KEY: ${{ env.MEILI_MASTER_KEY }}
MEILI_NO_ANALYTICS: ${{ env.MEILI_NO_ANALYTICS }}
ports:
- '7700:7700'
steps:
- uses: actions/checkout@v4
with:
repository: meilisearch/meilisearch-js
- name: Setup node
uses: actions/setup-node@v4
with:
cache: 'yarn'
- name: Install dependencies
run: yarn --dev
- name: Run tests
run: yarn test
- name: Build project
run: yarn build
- name: Run ESM env
run: yarn test:env:esm
- name: Run Node.js env
run: yarn test:env:nodejs
- name: Run node typescript env
run: yarn test:env:node-ts
- name: Run Browser env
run: yarn test:env:browser
meilisearch-php-tests:
needs: define-docker-image
name: PHP SDK tests
runs-on: ubuntu-latest
services:
meilisearch:
image: getmeili/meilisearch:${{ needs.define-docker-image.outputs.docker-image }}
env:
MEILI_MASTER_KEY: ${{ env.MEILI_MASTER_KEY }}
MEILI_NO_ANALYTICS: ${{ env.MEILI_NO_ANALYTICS }}
ports:
- '7700:7700'
steps:
- uses: actions/checkout@v4
with:
repository: meilisearch/meilisearch-php
- name: Install PHP
uses: shivammathur/setup-php@v2
- name: Validate composer.json and composer.lock
run: composer validate
- name: Install dependencies
run: |
composer remove --dev friendsofphp/php-cs-fixer --no-update --no-interaction
composer update --prefer-dist --no-progress
- name: Run test suite - default HTTP client (Guzzle 7)
run: |
sh scripts/tests.sh
composer remove --dev guzzlehttp/guzzle http-interop/http-factory-guzzle
meilisearch-python-tests:
needs: define-docker-image
name: Python SDK tests
runs-on: ubuntu-latest
services:
meilisearch:
image: getmeili/meilisearch:${{ needs.define-docker-image.outputs.docker-image }}
env:
MEILI_MASTER_KEY: ${{ env.MEILI_MASTER_KEY }}
MEILI_NO_ANALYTICS: ${{ env.MEILI_NO_ANALYTICS }}
ports:
- '7700:7700'
steps:
- uses: actions/checkout@v4
with:
repository: meilisearch/meilisearch-python
- name: Set up Python
uses: actions/setup-python@v5
- name: Install pipenv
uses: dschep/install-pipenv-action@v1
- name: Install dependencies
run: pipenv install --dev --python=${{ matrix.python-version }}
- name: Test with pytest
run: pipenv run pytest
meilisearch-ruby-tests:
needs: define-docker-image
name: Ruby SDK tests
runs-on: ubuntu-latest
services:
meilisearch:
image: getmeili/meilisearch:${{ needs.define-docker-image.outputs.docker-image }}
env:
MEILI_MASTER_KEY: ${{ env.MEILI_MASTER_KEY }}
MEILI_NO_ANALYTICS: ${{ env.MEILI_NO_ANALYTICS }}
ports:
- '7700:7700'
steps:
- uses: actions/checkout@v4
with:
repository: meilisearch/meilisearch-ruby
- name: Set up Ruby 3
uses: ruby/setup-ruby@v1
with:
ruby-version: 3
- name: Install ruby dependencies
run: bundle install --with test
- name: Run test suite
run: bundle exec rspec
meilisearch-rust-tests:
needs: define-docker-image
name: Rust SDK tests
runs-on: ubuntu-latest
services:
meilisearch:
image: getmeili/meilisearch:${{ needs.define-docker-image.outputs.docker-image }}
env:
MEILI_MASTER_KEY: ${{ env.MEILI_MASTER_KEY }}
MEILI_NO_ANALYTICS: ${{ env.MEILI_NO_ANALYTICS }}
ports:
- '7700:7700'
steps:
- uses: actions/checkout@v4
with:
repository: meilisearch/meilisearch-rust
- name: Build
run: cargo build --verbose
- name: Run tests
run: cargo test --verbose
meilisearch-swift-tests:
needs: define-docker-image
name: Swift SDK tests
runs-on: ubuntu-latest
services:
meilisearch:
image: getmeili/meilisearch:${{ needs.define-docker-image.outputs.docker-image }}
env:
MEILI_MASTER_KEY: ${{ env.MEILI_MASTER_KEY }}
MEILI_NO_ANALYTICS: ${{ env.MEILI_NO_ANALYTICS }}
ports:
- '7700:7700'
steps:
- uses: actions/checkout@v4
with:
repository: meilisearch/meilisearch-swift
- name: Run tests
run: swift test
########################
## FRONT-END PLUGINS ##
########################
meilisearch-js-plugins-tests:
needs: define-docker-image
name: meilisearch-js-plugins tests
runs-on: ubuntu-latest
services:
meilisearch:
image: getmeili/meilisearch:${{ needs.define-docker-image.outputs.docker-image }}
env:
MEILI_MASTER_KEY: ${{ env.MEILI_MASTER_KEY }}
MEILI_NO_ANALYTICS: ${{ env.MEILI_NO_ANALYTICS }}
ports:
- '7700:7700'
steps:
- uses: actions/checkout@v4
with:
repository: meilisearch/meilisearch-js-plugins
- name: Setup node
uses: actions/setup-node@v4
with:
cache: yarn
- name: Install dependencies
run: yarn install
- name: Run tests
run: yarn test
- name: Build all the playgrounds and the packages
run: yarn build
########################
## BACK-END PLUGINS ###
########################
meilisearch-rails-tests:
needs: define-docker-image
name: meilisearch-rails tests
runs-on: ubuntu-latest
services:
meilisearch:
image: getmeili/meilisearch:${{ needs.define-docker-image.outputs.docker-image }}
env:
MEILI_MASTER_KEY: ${{ env.MEILI_MASTER_KEY }}
MEILI_NO_ANALYTICS: ${{ env.MEILI_NO_ANALYTICS }}
ports:
- '7700:7700'
steps:
- uses: actions/checkout@v4
with:
repository: meilisearch/meilisearch-rails
- name: Set up Ruby 3
uses: ruby/setup-ruby@v1
with:
ruby-version: 3
bundler-cache: true
- name: Run tests
run: bundle exec rspec
meilisearch-symfony-tests:
needs: define-docker-image
name: meilisearch-symfony tests
runs-on: ubuntu-latest
services:
meilisearch:
image: getmeili/meilisearch:${{ needs.define-docker-image.outputs.docker-image }}
env:
MEILI_MASTER_KEY: ${{ env.MEILI_MASTER_KEY }}
MEILI_NO_ANALYTICS: ${{ env.MEILI_NO_ANALYTICS }}
ports:
- '7700:7700'
steps:
- uses: actions/checkout@v4
with:
repository: meilisearch/meilisearch-symfony
- name: Install PHP
uses: shivammathur/setup-php@v2
with:
tools: composer:v2, flex
- name: Validate composer.json and composer.lock
run: composer validate
- name: Install dependencies
run: composer install --prefer-dist --no-progress --quiet
- name: Remove doctrine/annotations
run: composer remove --dev doctrine/annotations
- name: Run test suite
run: composer test:unit


@ -1,201 +0,0 @@
name: Test suite
on:
workflow_dispatch:
schedule:
# Everyday at 5:00am
- cron: "0 5 * * *"
pull_request:
merge_group:
env:
CARGO_TERM_COLOR: always
RUST_BACKTRACE: 1
RUSTFLAGS: "-D warnings"
jobs:
test-linux:
name: Tests on ubuntu-22.04
runs-on: ubuntu-latest
container:
# Use ubuntu-22.04 to compile with glibc 2.35
image: ubuntu:22.04
steps:
- uses: actions/checkout@v4
- name: Install needed dependencies
run: |
apt-get update && apt-get install -y curl
apt-get install build-essential -y
- name: Setup test with Rust stable
uses: dtolnay/rust-toolchain@1.81
- name: Cache dependencies
uses: Swatinem/rust-cache@v2.7.7
- name: Run cargo check without any default features
uses: actions-rs/cargo@v1
with:
command: build
args: --locked --release --no-default-features --all
- name: Run cargo test
uses: actions-rs/cargo@v1
with:
command: test
args: --locked --release --all
test-others:
name: Tests on ${{ matrix.os }}
runs-on: ${{ matrix.os }}
strategy:
fail-fast: false
matrix:
os: [macos-13, windows-2022]
steps:
- uses: actions/checkout@v3
- name: Cache dependencies
uses: Swatinem/rust-cache@v2.7.7
- uses: dtolnay/rust-toolchain@1.81
- name: Run cargo check without any default features
uses: actions-rs/cargo@v1
with:
command: build
args: --locked --release --no-default-features --all
- name: Run cargo test
uses: actions-rs/cargo@v1
with:
command: test
args: --locked --release --all
test-all-features:
name: Tests almost all features
runs-on: ubuntu-latest
container:
# Use ubuntu-22.04 to compile with glibc 2.35
image: ubuntu:22.04
if: github.event_name == 'schedule' || github.event_name == 'workflow_dispatch'
steps:
- uses: actions/checkout@v3
- name: Install needed dependencies
run: |
apt-get update
apt-get install --assume-yes build-essential curl
- uses: dtolnay/rust-toolchain@1.81
- name: Run cargo build with almost all features
run: |
cargo build --workspace --locked --release --features "$(cargo xtask list-features --exclude-feature cuda,test-ollama)"
- name: Run cargo test with almost all features
run: |
cargo test --workspace --locked --release --features "$(cargo xtask list-features --exclude-feature cuda,test-ollama)"
ollama-ubuntu:
name: Test with Ollama
runs-on: ubuntu-latest
env:
MEILI_TEST_OLLAMA_SERVER: "http://localhost:11434"
steps:
- uses: actions/checkout@v1
- name: Install Ollama
run: |
curl -fsSL https://ollama.com/install.sh | sudo -E sh
- name: Start serving
run: |
# Run it in the background, there is no way to daemonise at the moment
ollama serve &
# A short pause is required before the HTTP port is opened
sleep 5
# This endpoint blocks until ready
time curl -i http://localhost:11434
- name: Pull nomic-embed-text & all-minilm
run: |
ollama pull nomic-embed-text
ollama pull all-minilm
- name: Run cargo test
uses: actions-rs/cargo@v1
with:
command: test
args: --locked --release --all --features test-ollama ollama
test-disabled-tokenization:
name: Test disabled tokenization
runs-on: ubuntu-latest
container:
image: ubuntu:22.04
if: github.event_name == 'schedule' || github.event_name == 'workflow_dispatch'
steps:
- uses: actions/checkout@v3
- name: Install needed dependencies
run: |
apt-get update
apt-get install --assume-yes build-essential curl
- uses: dtolnay/rust-toolchain@1.81
- name: Run cargo tree without default features and check lindera is not present
run: |
if cargo tree -f '{p} {f}' -e normal --no-default-features | grep -qz lindera; then
echo "lindera has been found in the sources and it shouldn't"
exit 1
fi
- name: Run cargo tree with default features and check lindera is present
run: |
cargo tree -f '{p} {f}' -e normal | grep lindera -qz
# We run tests in debug also, to make sure that the debug_assertions are hit
test-debug:
name: Run tests in debug
runs-on: ubuntu-latest
container:
# Use ubuntu-22.04 to compile with glibc 2.35
image: ubuntu:22.04
steps:
- uses: actions/checkout@v3
- name: Install needed dependencies
run: |
apt-get update && apt-get install -y curl
apt-get install build-essential -y
- uses: dtolnay/rust-toolchain@1.81
- name: Cache dependencies
uses: Swatinem/rust-cache@v2.7.7
- name: Run tests in debug
uses: actions-rs/cargo@v1
with:
command: test
args: --locked --all
clippy:
name: Run Clippy
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- uses: dtolnay/rust-toolchain@1.81
with:
profile: minimal
components: clippy
- name: Cache dependencies
uses: Swatinem/rust-cache@v2.7.7
- name: Run cargo clippy
uses: actions-rs/cargo@v1
with:
command: clippy
args: --all-targets -- --deny warnings
fmt:
name: Run Rustfmt
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- uses: dtolnay/rust-toolchain@1.81
with:
profile: minimal
toolchain: nightly-2024-07-09
override: true
components: rustfmt
- name: Cache dependencies
uses: Swatinem/rust-cache@v2.7.7
- name: Run cargo fmt
# Since we never ran the `build.rs` script in the benchmark directory we are missing one auto-generated import file.
# Since we want to trigger (and fail) this action as fast as possible, instead of building the benchmark crate
# we are going to create an empty file where rustfmt expects it.
run: |
echo -ne "\n" > crates/benchmarks/benches/datasets_paths.rs
cargo fmt --all -- --check


@ -1,4 +1,4 @@
name: Update Meilisearch version in Cargo.toml
name: Update Meilisearch version in all Cargo.toml files
on:
workflow_dispatch:
@ -13,33 +13,35 @@ env:
GH_TOKEN: ${{ secrets.MEILI_BOT_GH_PAT }}
jobs:
update-version-cargo-toml:
name: Update version in Cargo.toml
runs-on: ubuntu-latest
name: Update version in Cargo.toml files
runs-on: ubuntu-18.04
steps:
- uses: actions/checkout@v3
- uses: dtolnay/rust-toolchain@1.81
- uses: actions-rs/toolchain@v1
with:
profile: minimal
toolchain: stable
override: true
- name: Install sd
run: cargo install sd
- name: Update Cargo.toml file
- name: Update Cargo.toml files
run: |
raw_new_version=$(echo $NEW_VERSION | cut -d 'v' -f 2)
new_string="version = \"$raw_new_version\""
sd '^version = "\d+.\d+.\w+"$' "$new_string" Cargo.toml
sd '^version = "\d+.\d+.\w+"$' "$new_string" */Cargo.toml
- name: Build Meilisearch to update Cargo.lock
run: cargo build
- name: Commit and push the changes to the ${{ env.NEW_BRANCH }} branch
uses: EndBug/add-and-commit@v9
with:
message: "Update version for the next release (${{ env.NEW_VERSION }}) in Cargo.toml"
message: "Update version for the next release (${{ env.NEW_VERSION }}) in Cargo.toml files"
new_branch: ${{ env.NEW_BRANCH }}
- name: Create the PR pointing to ${{ github.ref_name }}
run: |
gh pr create \
--title "Update version for the next release ($NEW_VERSION) in Cargo.toml" \
--body '⚠️ This PR is automatically generated. Check the new version is the expected one and Cargo.lock has been updated before merging.' \
--title "Update version for the next release ($NEW_VERSION) in Cargo.toml files" \
--body '⚠️ This PR is automatically generated. Check the new version is the expected one before merging.' \
--label 'skip changelog' \
--milestone $NEW_VERSION \
--base $GITHUB_REF_NAME
--milestone $NEW_VERSION

.gitignore vendored

@ -1,22 +1,16 @@
.idea/
.vscode/
/target
**/*.csv
**/*.json_lines
**/*.rs.bk
/*.mdb
/query-history.txt
/data.ms
/snapshots
/dumps
/bench
/_xtask_benchmark.ms
/benchmarks
# Snapshots
## ... large
*.full.snap
## ... unreviewed
*.snap.new
# Fuzzcheck data for the facet indexing fuzz test
crates/milli/fuzz/update::facet::incremental::fuzz::fuzz/


@ -1,392 +0,0 @@
# Benchmarks
Currently this repository hosts two kinds of benchmarks:
1. The older "milli benchmarks", that use [criterion](https://github.com/bheisler/criterion.rs) and live in the "benchmarks" directory.
2. The newer "bench" that are workload-based and so split between the [`workloads`](./workloads/) directory and the [`xtask::bench`](./xtask/src/bench/) module.
This document describes the newer "bench" benchmarks. For more details on the "milli benchmarks", see [benchmarks/README.md](./benchmarks/README.md).
## Design philosophy for the benchmarks
The newer "bench" benchmarks are **integration** benchmarks, in the sense that they spawn an actual Meilisearch server and measure its performance end-to-end, including HTTP request overhead.
Since this is prone to fluctuation, the benchmarks regain a bit of precision by measuring the runtime of the individual spans using the [logging machinery](./CONTRIBUTING.md#logging) of Meilisearch.
A span roughly translates to a function call. The benchmark runner collects all the spans by name using the [logs route](https://github.com/orgs/meilisearch/discussions/721) and sums their runtime. The processed results are then sent to the [benchmark dashboard](https://bench.meilisearch.dev), which is in charge of storing and presenting the data.
## Running the benchmarks
Benchmarks can run locally or in CI.
### Locally
#### With a local benchmark dashboard
The benchmarks dashboard lives in its [own repository](https://github.com/meilisearch/benchboard). We provide binaries for Ubuntu/Debian, but you can build from source for other platforms (macOS should work, as the dashboard was developed on that platform).
Run the `benchboard` binary to create a fresh database of results. By default it serves the results, and the API used to gather new results, on `http://localhost:9001`.
From the Meilisearch repository, you can then run benchmarks with:
```sh
cargo xtask bench -- workloads/my_workload_1.json ..
```
This command will build and run Meilisearch locally on port 7700, so make sure that this port is available.
To run benchmarks on a different commit, check out that commit with the usual git commands, as shown below.
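A minimal sketch, assuming `<commit-or-branch>` is the reference you want to benchmark and that the workload file exists on that reference:
```sh
# Check out the commit (or branch/tag) you want to benchmark
git checkout <commit-or-branch>
# Then run the workloads as usual
cargo xtask bench -- workloads/my_workload_1.json
```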
#### Without a local benchmark dashboard
To work with the raw results, you can also skip using a local benchmark dashboard.
Run:
```sh
cargo xtask bench --no-dashboard -- workloads/my_workload_1.json workloads/my_workload_2.json ..
```
For processing the results, look at [Looking at benchmark results/Without dashboard](#without-dashboard).
#### Sending a workload by hand
Sometimes you want to visualize the metrics of a workload that comes from a custom report.
It is not straightforward to make the benchboard accept a hand-made report as legitimate, but here are the commands you can run to upload your Firefox report to a running benchboard.
```bash
# Name this hostname whatever you want
echo '{ "hostname": "the-best-place" }' | xh PUT 'http://127.0.0.1:9001/api/v1/machine'
# You'll receive a UUID from this command that we will call $invocation_uuid
echo '{ "commit": { "sha1": "1234567", "commit_date": "2024-09-05 12:00:12.0 +00:00:00", "message": "A cool message" }, "machine_hostname": "the-best-place", "max_workloads": 1 }' | xh PUT 'http://127.0.0.1:9001/api/v1/invocation'
# Just use the UUID from the previous command
# and you'll receive another UUID that we will call $workload_uuid
echo '{ "invocation_uuid": "$invocation_uuid", "name": "toto", "max_runs": 1 }' | xh PUT 'http://127.0.0.1:9001/api/v1/workload'
# And now use your $workload_uuid and the content of your firefox report
# but don't forget to convert your firefox report from JSONLines into an object
echo '{ "workload_uuid": "$workload_uuid", "data": $REPORT_JSON_DATA }' | xh PUT 'http://127.0.0.1:9001/api/v1/run'
```
### In CI
We have dedicated runners to run workloads on CI. Currently, there are three ways of running the CI:
1. Automatically, on every push to `main`.
2. Manually, by clicking the [`Run workflow`](https://github.com/meilisearch/meilisearch/actions/workflows/bench-manual.yml) button and specifying the target reference (tag, commit or branch) as well as one or multiple workloads to run. The workloads must exist in the Meilisearch repository (conventionally, in the [`workloads`](./workloads/) directory) on the target reference. Globbing (e.g., `workloads/*.json`) works.
3. Manually on a PR, by posting a comment containing a `/bench` command, followed by one or multiple workloads to run. Globbing works. The workloads must exist in the Meilisearch repository in the branch of the PR.
```
/bench workloads/movies*.json /hackernews_1M.json
```
## Looking at benchmark results
### On the dashboard
Results are available on the global dashboard used by CI at <https://bench.meilisearch.dev> or on your [local dashboard](#with-a-local-benchmark-dashboard).
The dashboard homepage presents three sections:
1. The latest invocations (a call to `cargo xtask bench`, either local or by CI) with their reason (generally set to some helpful link in CI) and their status.
2. The latest workloads ran on `main`.
3. The latest workloads ran on other references.
By default, the workload shows the total runtime delta with the latest applicable commit on `main`. The latest applicable commit is the latest commit for workload invocations that do not originate on `main`, and the latest previous commit for workload invocations that originate on `main`.
You can explicitly request a detailed comparison by span with the `main` branch, the branch or origin, or any previous commit, by clicking the links at the bottom of the workload invocation.
In the detailed comparison view, the spans are sorted by improvements, regressions, stable (no statistically significant change) and unstable (the span runtime is comparable to its standard deviation).
You can click on the name of any span to get a box plot comparing the target commit with multiple commits of the selected branch.
### Without dashboard
After the workloads are done running, the reports will live in the Meilisearch repository, in the `bench/reports` directory (by default).
You can then convert these reports into other formats.
- To [Firefox profiler](https://profiler.firefox.com) format. Run:
```sh
cd bench/reports
cargo run --release --bin trace-to-firefox -- my_workload_1-0-trace.json
```
You can then upload the resulting `firefox-my_workload_1-0-trace.json` file to the online profiler.
## Designing benchmark workloads
Benchmark workloads conventionally live in the `workloads` directory of the Meilisearch repository.
They are JSON files with the following structure (comments are not actually supported; to write your own workload, remove them or copy an existing workload file):
```jsonc
{
// Name of the workload. Must be unique to the workload, as it will be used to group results on the dashboard.
"name": "hackernews.ndjson_1M,no-threads",
// Number of consecutive runs of the commands that should be performed.
// Each run uses a fresh instance of Meilisearch and a fresh database.
// Each run produces its own report file.
"run_count": 3,
// List of arguments to add to the Meilisearch command line.
"extra_cli_args": ["--max-indexing-threads=1"],
// An expression that can be parsed as a comma-separated list of targets and levels
// as described in [tracing_subscriber's documentation](https://docs.rs/tracing-subscriber/latest/tracing_subscriber/filter/targets/struct.Targets.html#examples).
// The expression is used to filter the spans that are measured for profiling purposes.
// Optional, defaults to "indexing::=trace" (for indexing workloads); another common value is
// "search::=trace".
"target": "indexing::=trace",
// List of named assets that can be used in the commands.
"assets": {
// name of the asset.
// Must be unique at the workload level.
// For better results, the same asset (same sha256) should have the same name across workloads.
// Having multiple assets with the same name and distinct hashes is supported across workloads,
// but will lead to superfluous downloads.
//
// Assets are stored in the `bench/assets/` directory by default.
"hackernews-100_000.ndjson": {
// If the asset exists in the local filesystem (in the Meilisearch repository, or for your local workloads),
// its file path can be specified here.
// `null` if the asset should be downloaded from a remote location.
"local_location": null,
// URL of the remote location where the asset can be downloaded.
// Use the `--assets-key` of the runner to pass an API key in the `Authorization: Bearer` header of the download requests.
// `null` if the asset should be imported from a local location.
// if both local and remote locations are specified, then the local one is tried first, then the remote one
// if the file is locally missing or its hash differs.
"remote_location": "https://milli-benchmarks.fra1.digitaloceanspaces.com/bench/datasets/hackernews/hackernews-100_000.ndjson",
// SHA256 of the asset.
// Optional, the `sha256` of the asset will be displayed during a run of the workload if it is missing.
// If present, the hash of the asset in the `bench/assets/` directory will be compared against this hash before
// running the workload. If the hashes differ, the asset will be downloaded anew.
"sha256": "60ecd23485d560edbd90d9ca31f0e6dba1455422f2a44e402600fbb5f7f1b213",
// Optional, one of "Auto", "Json", "NdJson" or "Raw".
// If missing, assumed to be "Auto".
// If "Auto", the format will be determined from the extension in the asset name.
"format": "NdJson"
},
"hackernews-200_000.ndjson": {
"local_location": null,
"remote_location": "https://milli-benchmarks.fra1.digitaloceanspaces.com/bench/datasets/hackernews/hackernews-200_000.ndjson",
"sha256": "785b0271fdb47cba574fab617d5d332276b835c05dd86e4a95251cf7892a1685"
},
"hackernews-300_000.ndjson": {
"local_location": null,
"remote_location": "https://milli-benchmarks.fra1.digitaloceanspaces.com/bench/datasets/hackernews/hackernews-300_000.ndjson",
"sha256": "de73c7154652eddfaf69cdc3b2f824d5c452f095f40a20a1c97bb1b5c4d80ab2"
},
"hackernews-400_000.ndjson": {
"local_location": null,
"remote_location": "https://milli-benchmarks.fra1.digitaloceanspaces.com/bench/datasets/hackernews/hackernews-400_000.ndjson",
"sha256": "c1b00a24689110f366447e434c201c086d6f456d54ed1c4995894102794d8fe7"
},
"hackernews-500_000.ndjson": {
"local_location": null,
"remote_location": "https://milli-benchmarks.fra1.digitaloceanspaces.com/bench/datasets/hackernews/hackernews-500_000.ndjson",
"sha256": "ae98f9dbef8193d750e3e2dbb6a91648941a1edca5f6e82c143e7996f4840083"
},
"hackernews-600_000.ndjson": {
"local_location": null,
"remote_location": "https://milli-benchmarks.fra1.digitaloceanspaces.com/bench/datasets/hackernews/hackernews-600_000.ndjson",
"sha256": "b495fdc72c4a944801f786400f22076ab99186bee9699f67cbab2f21f5b74dbe"
},
"hackernews-700_000.ndjson": {
"local_location": null,
"remote_location": "https://milli-benchmarks.fra1.digitaloceanspaces.com/bench/datasets/hackernews/hackernews-700_000.ndjson",
"sha256": "4b2c63974f3dabaa4954e3d4598b48324d03c522321ac05b0d583f36cb78a28b"
},
"hackernews-800_000.ndjson": {
"local_location": null,
"remote_location": "https://milli-benchmarks.fra1.digitaloceanspaces.com/bench/datasets/hackernews/hackernews-800_000.ndjson",
"sha256": "cb7b6afe0e6caa1be111be256821bc63b0771b2a0e1fad95af7aaeeffd7ba546"
},
"hackernews-900_000.ndjson": {
"local_location": null,
"remote_location": "https://milli-benchmarks.fra1.digitaloceanspaces.com/bench/datasets/hackernews/hackernews-900_000.ndjson",
"sha256": "e1154ddcd398f1c867758a93db5bcb21a07b9e55530c188a2917fdef332d3ba9"
},
"hackernews-1_000_000.ndjson": {
"local_location": null,
"remote_location": "https://milli-benchmarks.fra1.digitaloceanspaces.com/bench/datasets/hackernews/hackernews-1_000_000.ndjson",
"sha256": "27e25efd0b68b159b8b21350d9af76938710cb29ce0393fa71b41c4f3c630ffe"
}
},
// Core of the workload.
// A list of commands to run sequentially.
// Optional: A precommand is a request to the Meilisearch instance that is executed before the profiling runs.
"precommands": [
{
// Meilisearch route to call. `http://localhost:7700/` will be prepended.
"route": "indexes/movies/settings",
// HTTP method to call.
"method": "PATCH",
// If applicable, body of the request.
// Optional, if missing, the body will be empty.
"body": {
// One of "empty", "inline" or "asset".
// If using "empty", you can skip the entire "body" key.
"inline": {
// when "inline" is used, the body is the JSON object that is the value of the `"inline"` key.
"displayedAttributes": [
"title",
"by",
"score",
"time"
],
"searchableAttributes": [
"title"
],
"filterableAttributes": [
"by"
],
"sortableAttributes": [
"score",
"time"
]
}
},
// Whether to wait before running the next request.
// One of:
// - DontWait: run the next command without waiting for the response to this one.
// - WaitForResponse: run the next command as soon as the response from the server is received.
// - WaitForTask: run the next command once **all** the Meilisearch tasks created up to now have finished processing.
"synchronous": "WaitForTask"
}
],
// A command is a request to the Meilisearch instance that is executed while the profiling runs.
"commands": [
{
"route": "indexes/movies/documents",
"method": "POST",
"body": {
// When using "asset", use the name of an asset as value to use the content of that asset as body.
// the content type is derived of the format of the asset:
// "NdJson" => "application/x-ndjson"
// "Json" => "application/json"
// "Raw" => "application/octet-stream"
// See [AssetFormat::to_content_type](https://github.com/meilisearch/meilisearch/blob/7b670a4afadb132ac4a01b6403108700501a391d/xtask/src/bench/assets.rs#L30)
// for details and up-to-date list.
"asset": "hackernews-100_000.ndjson"
},
"synchronous": "WaitForTask"
},
{
"route": "indexes/movies/documents",
"method": "POST",
"body": {
"asset": "hackernews-200_000.ndjson"
},
"synchronous": "WaitForResponse"
},
{
"route": "indexes/movies/documents",
"method": "POST",
"body": {
"asset": "hackernews-300_000.ndjson"
},
"synchronous": "WaitForResponse"
},
{
"route": "indexes/movies/documents",
"method": "POST",
"body": {
"asset": "hackernews-400_000.ndjson"
},
"synchronous": "WaitForResponse"
},
{
"route": "indexes/movies/documents",
"method": "POST",
"body": {
"asset": "hackernews-500_000.ndjson"
},
"synchronous": "WaitForResponse"
},
{
"route": "indexes/movies/documents",
"method": "POST",
"body": {
"asset": "hackernews-600_000.ndjson"
},
"synchronous": "WaitForResponse"
},
{
"route": "indexes/movies/documents",
"method": "POST",
"body": {
"asset": "hackernews-700_000.ndjson"
},
"synchronous": "WaitForResponse"
},
{
"route": "indexes/movies/documents",
"method": "POST",
"body": {
"asset": "hackernews-800_000.ndjson"
},
"synchronous": "WaitForResponse"
},
{
"route": "indexes/movies/documents",
"method": "POST",
"body": {
"asset": "hackernews-900_000.ndjson"
},
"synchronous": "WaitForResponse"
},
{
"route": "indexes/movies/documents",
"method": "POST",
"body": {
"asset": "hackernews-1_000_000.ndjson"
},
"synchronous": "WaitForTask"
}
]
}
```
### Adding new assets
Assets reside in our DigitalOcean S3 space. Assuming you have team access to the DigitalOcean S3 space:
1. go to <https://cloud.digitalocean.com/spaces/milli-benchmarks?i=d1c552&path=bench%2Fdatasets%2F>
2. upload your dataset:
1. if your dataset is a single file, upload that single file using the "upload" button,
2. otherwise, create a folder using the "create folder" button, then inside that folder upload your individual files.
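If a workload file references the new asset, you will also need its SHA256 so the runner can verify the download (see the `sha256` field in the example above). A minimal sketch for computing it locally, assuming the dataset also lives under `bench/assets/` and using an illustrative file name:
```sh
# Linux (GNU coreutils)
sha256sum bench/assets/my-dataset.ndjson
# macOS
shasum -a 256 bench/assets/my-dataset.ndjson
```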
## Upgrading `https://bench.meilisearch.dev`
The URL of the server is in our password manager (look for "benchboard").
1. Make the needed modifications on the [benchboard repository](https://github.com/meilisearch/benchboard) and merge them to main.
2. Publish a new release to produce the Ubuntu/Debian binary.
3. Download the binary locally, send it to the server:
```
scp -6 ~/Downloads/benchboard root@\[<ipv6-address>\]:/bench/new-benchboard
```
Note that the IPv6 address must be between escaped square brackets for SCP.
4. SSH to the server:
```
ssh root@<ipv6-address>
```
Note that the IPv6 address must **NOT** be between escaped square brackets for SSH 🥲
5. On the server, set the correct permissions for the new binary:
```
chown bench:bench /bench/new-benchboard
chmod 700 /bench/new-benchboard
```
6. On the server, move the new binary to the location of the running binary (if unsure, start by making a backup of the running binary):
```
mv /bench/{new-,}benchboard
```
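If you want to keep a backup of the running binary first (as suggested in the previous step), a minimal sketch, assuming the current binary lives at `/bench/benchboard`:
```
cp /bench/benchboard /bench/benchboard.bak
```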
7. Restart the benchboard service.
```
systemctl restart benchboard
```
8. Check that the service runs correctly.
```
systemctl status benchboard
```
9. Check the availability of the service by going to <https://bench.meilisearch.dev> on your browser.


@ -4,23 +4,35 @@ First, thank you for contributing to Meilisearch! The goal of this document is t
Remember that there are many ways to contribute other than writing code: writing [tutorials or blog posts](https://github.com/meilisearch/awesome-meilisearch), improving [the documentation](https://github.com/meilisearch/documentation), submitting [bug reports](https://github.com/meilisearch/meilisearch/issues/new?assignees=&labels=&template=bug_report.md&title=) and [feature requests](https://github.com/meilisearch/product/discussions/categories/feedback-feature-proposal)...
Meilisearch can manage multiple indexes, handle the update store, and expose an HTTP API. Search and indexation are the domain of our core engine, [`milli`](https://github.com/meilisearch/meilisearch/tree/main/milli), while tokenization is handled by [our `charabia` library](https://github.com/meilisearch/charabia/).
The code in this repository is only concerned with managing multiple indexes, handling the update store, and exposing an HTTP API. Search and indexation are the domain of our core engine, [`milli`](https://github.com/meilisearch/milli), while tokenization is handled by [our `charabia` library](https://github.com/meilisearch/charabia/).
If Meilisearch does not offer optimized support for your language, please consider contributing to `charabia` by following the [CONTRIBUTING.md file](https://github.com/meilisearch/charabia/blob/main/CONTRIBUTING.md) and integrating your intended normalizer/segmenter.
## Table of Contents
- [Hacktoberfest 2022](#hacktoberfest-2022)
- [Assumptions](#assumptions)
- [How to Contribute](#how-to-contribute)
- [Development Workflow](#development-workflow)
- [Git Guidelines](#git-guidelines)
- [Release Process (for internal team only)](#release-process-for-internal-team-only)
## Hacktoberfest 2022
It's [Hacktoberfest month](https://hacktoberfest.com)! 🥳
Thanks so much for participating with Meilisearch this year!
1. We will follow the quality standards set by the organizers of Hacktoberfest (see detail on their [website](https://hacktoberfest.com/participation/#spam)). Our reviewers will not consider any PR that doesn't match that standard.
2. PRs reviews will take place from Monday to Thursday, during usual working hours, CEST time. If you submit outside of these hours, there's no need to panic; we will get around to your contribution.
3. There will be no issue assignment as we don't want people to ask to be assigned specific issues and never return, discouraging the volunteer contributors from opening a PR to fix this issue. We take the liberty to choose the PR that best fixes the issue, so we encourage you to get to it as soon as possible and do your best!
You can check out the longer, more complete guideline documentation [here](https://github.com/meilisearch/.github/blob/main/Hacktoberfest_2022_contributors_guidelines.md).
## Assumptions
1. **You're familiar with [GitHub](https://github.com) and the [Pull Requests (PR)](https://help.github.com/en/github/collaborating-with-issues-and-pull-requests/about-pull-requests) workflow.**
2. **You've read the Meilisearch [documentation](https://www.meilisearch.com/docs).**
3. **You know about the [Meilisearch community on Discord](https://discord.meilisearch.com).
1. **You're familiar with [GitHub](https://github.com) and the [Pull Requests](https://help.github.com/en/github/collaborating-with-issues-and-pull-requests/about-pull-requests)(PR) workflow.**
2. **You've read the Meilisearch [documentation](https://docs.meilisearch.com).**
3. **You know about the [Meilisearch community](https://docs.meilisearch.com/learn/what_is_meilisearch/contact.html).
Please use this for help.**
## How to Contribute
@ -52,78 +64,12 @@ cargo test
This command will be triggered for each PR as a requirement for merging it.
#### Faster build
You can set the `LINDERA_CACHE` environment variable to speed up your successive builds by up to 2 minutes.
It'll store some built artifacts in the directory of your choice.
We recommend using the standard `$HOME/.cache/lindera` directory:
```sh
export LINDERA_CACHE=$HOME/.cache/lindera
```
Furthermore, you can improve incremental compilation by setting the `MEILI_NO_VERGEN` environment variable.
Setting this variable will prevent the Meilisearch binary from being rebuilt each time the directory that hosts the Meilisearch repository changes.
Do not enable this environment variable for production builds (as it will break the `version` route, among other things).
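For instance, a minimal sketch for a development shell (assuming that setting the variable to any value is enough to enable the behavior):
```sh
export MEILI_NO_VERGEN=1 # do not set this for production builds
```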
#### Snapshot-based tests
We are using [insta](https://insta.rs) to perform snapshot-based testing.
We recommend using the insta tooling (such as `cargo-insta`) to update the snapshots if they change following a PR.
New tests should use insta where possible rather than manual `assert` statements.
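For instance, a typical review flow with the insta tooling might look like this (a sketch, assuming `cargo-insta` is not installed yet):
```sh
cargo install cargo-insta # one-time install of the insta tooling
cargo test                # run the tests; changed snapshots are recorded as pending
cargo insta review        # interactively accept or reject the updated snapshots
```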
Furthermore, we provide some macros on top of insta, notably a way to use snapshot hashes instead of inline snapshots, saving a lot of space in the repository.
To effectively debug snapshot-based hashes, we recommend you export the `MEILI_TEST_FULL_SNAPS` environment variable so that snapshots are fully created locally:
```sh
export MEILI_TEST_FULL_SNAPS=true # add this to your .bashrc, .zshrc, ...
```
#### Test troubleshooting
If you get a "Too many open files" error you might want to increase the open file limit using this command:
```bash
ulimit -Sn 3000
```
#### Build tools
Meilisearch follows the [cargo xtask](https://github.com/matklad/cargo-xtask) workflow to provide some build tools.
Run `cargo xtask --help` from the root of the repository to find out what is available.
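For example (a sketch; the exact set of xtask subcommands depends on your checkout):
```sh
cargo xtask --help       # list the available build tools
cargo xtask bench --help # inspect the benchmark runner options, if the bench task is present
```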
#### Update the openAPI file if the API changed
To update the openAPI file in the code, see [sprint_issue.md](https://github.com/meilisearch/meilisearch/blob/main/.github/ISSUE_TEMPLATE/sprint_issue.md#reminders-when-modifying-the-api).
If you want to update the openAPI file on the [open-api repository](https://github.com/meilisearch/open-api), see [update-openapi-issue.md](https://github.com/meilisearch/engine-team/blob/main/issue-templates/update-openapi-issue.md).
### Logging
Meilisearch uses [`tracing`](https://lib.rs/crates/tracing) for logging purposes. Tracing logs are structured and can be displayed as JSON to the end user, so prefer passing arguments as fields rather than interpolating them in the message.
Refer to the [documentation](https://docs.rs/tracing/0.1.40/tracing/index.html#using-the-macros) for the syntax of the spans and events.
Logging spans are used for 3 distinct purposes:
1. Regular logging
2. Profiling
3. Benchmarking
As a result, the spans should follow some rules:
- They should not be put on functions that are called too often. That is because opening and closing a span causes some overhead. For regular logging, avoid putting spans on functions that are taking less than a few hundred nanoseconds. For profiling or benchmarking, avoid putting spans on functions that are taking less than a few microseconds.
- For profiling and benchmarking, use the `TRACE` level.
- For profiling and benchmarking, use the following `target` prefixes:
- `indexing::` for spans used when profiling the indexing operations.
- `search::` for spans used when profiling the search operations.
### Benchmarking
See [BENCHMARKS.md](./BENCHMARKS.md)
## Git Guidelines
### Git Branches
@ -150,7 +96,7 @@ Some notes on GitHub PRs:
- The PR title should be accurate and descriptive of the changes.
- [Convert your PR as a draft](https://help.github.com/en/github/collaborating-with-issues-and-pull-requests/changing-the-stage-of-a-pull-request) if your changes are a work in progress: no one will review it until you pass your PR as ready for review.<br>
The draft PRs are recommended when you want to show that you are working on something and make your work visible.
- The branch related to the PR must be **up-to-date with `main`** before merging. Fortunately, this project uses [GitHub Merge Queues](https://github.blog/news-insights/product-news/github-merge-queue-is-generally-available/) to automatically enforce this requirement without the PR author having to rebase manually.
- The branch related to the PR must be **up-to-date with `main`** before merging. Fortunately, this project uses [Bors](https://github.com/bors-ng/bors-ng) to automatically enforce this requirement without the PR author having to rebase manually.
## Release Process (for internal team only)
@ -158,19 +104,12 @@ Meilisearch tools follow the [Semantic Versioning Convention](https://semver.org
### Automation to rebase and Merge the PRs
This project uses GitHub Merge Queues to help us manage pull request merging.
This project integrates a bot that helps us manage pull requests merging.<br>
_[Read more about this](https://github.com/meilisearch/integration-guides/blob/main/resources/bors.md)._
### How to Publish a new Release
The full Meilisearch release process is described in [this guide](https://github.com/meilisearch/engine-team/blob/main/resources/meilisearch-release.md). Please follow it carefully before doing any release.
### How to publish a prototype
Depending on the feature being developed, you might need to provide a prototyped version of Meilisearch so that users can test it more easily.
This happens in two steps:
- [Release the prototype](https://github.com/meilisearch/engine-team/blob/main/resources/prototypes.md#how-to-publish-a-prototype)
- [Communicate about it](https://github.com/meilisearch/engine-team/blob/main/resources/prototypes.md#communication)
The full Meilisearch release process is described in [this guide](https://github.com/meilisearch/core-team/blob/main/resources/meilisearch-release.md). Please follow it carefully before doing any release.
### Release assets

5615
Cargo.lock generated

File diff suppressed because it is too large.


@ -1,51 +1,21 @@
[workspace]
resolver = "2"
members = [
"crates/meilisearch",
"crates/meilitool",
"crates/meilisearch-types",
"crates/meilisearch-auth",
"crates/meili-snap",
"crates/index-scheduler",
"crates/dump",
"crates/file-store",
"crates/permissive-json-pointer",
"crates/milli",
"crates/filter-parser",
"crates/flatten-serde-json",
"crates/json-depth-checker",
"crates/benchmarks",
"crates/fuzzers",
"crates/tracing-trace",
"crates/xtask",
"crates/build-info",
"meilisearch-http",
"meilisearch-types",
"meilisearch-auth",
"meili-snap",
"index-scheduler",
"dump",
"file-store",
"permissive-json-pointer",
]
[workspace.package]
version = "1.14.0"
authors = [
"Quentin de Quelen <quentin@dequelen.me>",
"Clément Renault <clement@meilisearch.com>",
]
description = "Meilisearch HTTP server"
homepage = "https://meilisearch.com"
readme = "README.md"
edition = "2021"
license = "MIT"
[profile.release]
codegen-units = 1
# We now compile heed without the NDEBUG define for better performance.
# However, we still enable debug assertions for a better detection of
# disk corruption on the cloud or in OSS.
[profile.release.package.heed]
debug-assertions = true
[profile.dev.package.flate2]
opt-level = 3
[profile.dev.package.grenad]
opt-level = 3
[profile.dev.package.roaring]
[profile.dev.package.milli]
opt-level = 3


@ -1,14 +1,13 @@
# Compile
FROM rust:1.81.0-alpine3.20 AS compiler
FROM rust:alpine3.16 AS compiler
RUN apk add -q --no-cache build-base openssl-dev
RUN apk add -q --update-cache --no-cache build-base openssl-dev
WORKDIR /
WORKDIR /meilisearch
ARG COMMIT_SHA
ARG COMMIT_DATE
ARG GIT_TAG
ENV VERGEN_GIT_SHA=${COMMIT_SHA} VERGEN_GIT_COMMIT_TIMESTAMP=${COMMIT_DATE} VERGEN_GIT_DESCRIBE=${GIT_TAG}
ENV COMMIT_SHA=${COMMIT_SHA} COMMIT_DATE=${COMMIT_DATE}
ENV RUSTFLAGS="-C target-feature=-crt-static"
COPY . .
@ -17,21 +16,20 @@ RUN set -eux; \
if [ "$apkArch" = "aarch64" ]; then \
export JEMALLOC_SYS_WITH_LG_PAGE=16; \
fi && \
cargo build --release -p meilisearch -p meilitool
cargo build --release
# Run
FROM alpine:3.20
LABEL org.opencontainers.image.source="https://github.com/meilisearch/meilisearch"
FROM alpine:3.16
ENV MEILI_HTTP_ADDR 0.0.0.0:7700
ENV MEILI_SERVER_PROVIDER docker
RUN apk add -q --no-cache libgcc tini curl
RUN apk update --quiet \
&& apk add -q --no-cache libgcc tini curl
# add meilisearch and meilitool to the `/bin` so you can run it from anywhere
# and it's easy to find.
COPY --from=compiler /target/release/meilisearch /bin/meilisearch
COPY --from=compiler /target/release/meilitool /bin/meilitool
# add meilisearch to the `/bin` so you can run it from anywhere and it's easy
# to find.
COPY --from=compiler /meilisearch/target/release/meilisearch /bin/meilisearch
# To stay compatible with the older version of the container (pre v0.27.0) we're
# going to symlink the meilisearch binary in the path to `/meilisearch`
RUN ln -s /bin/meilisearch /meilisearch


@ -1,6 +1,6 @@
MIT License
Copyright (c) 2019-2025 Meili SAS
Copyright (c) 2019-2022 Meili SAS
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal


@ -1,19 +0,0 @@
# Profiling Meilisearch
Search engine technologies are complex pieces of software that require thorough profiling tools. We chose to use [Puffin](https://github.com/EmbarkStudios/puffin), which the Rust gaming industry uses extensively. You can export and import the profiling reports using the top bar's _File_ menu options [in Puffin Viewer](https://github.com/embarkstudios/puffin#ui).
![An example profiling with Puffin viewer](assets/profiling-example.png)
## Profiling the Indexing Process
When you enable [the `exportPuffinReports` experimental feature](https://www.meilisearch.com/docs/learn/experimental/overview) of Meilisearch, Puffin reports with the `.puffin` extension will be automatically exported to disk. When this option is enabled, the engine will automatically create a "frame" whenever it executes the `IndexScheduler::tick` method.
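For reference, a minimal sketch of enabling the feature over HTTP, assuming a local instance on the default port and that the experimental features route accepts this key:
```sh
curl \
  -X PATCH 'http://localhost:7700/experimental-features' \
  -H 'Content-Type: application/json' \
  --data-binary '{ "exportPuffinReports": true }'
```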
[Puffin Viewer](https://github.com/EmbarkStudios/puffin/tree/main/puffin_viewer) is used to analyze the reports. Those reports show areas where Meilisearch spent time during indexing.
Another piece of advice on the Puffin viewer UI is to consider the _Merge children with same ID_ option. It can hide the actual timings at which events were sent, so turn it off when you see strange gaps on the flamegraph.
## Profiling the Search Process
We still need to take the time to profile the search side of the engine with Puffin. It would require time to profile the filtering phase, query parsing, creation, and execution. We could even profile the Actix HTTP server.
The only issue we see is the framing system. Puffin requires a global frame-based profiling phase, which collides with Meilisearch's ability to accept and answer multiple requests on different threads simultaneously.

107
README.md

@ -1,105 +1,103 @@
<p align="center">
<a href="https://www.meilisearch.com/?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=logo#gh-light-mode-only" target="_blank">
<img src="assets/meilisearch-logo-light.svg?sanitize=true#gh-light-mode-only">
</a>
<a href="https://www.meilisearch.com/?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=logo#gh-dark-mode-only" target="_blank">
<img src="assets/meilisearch-logo-dark.svg?sanitize=true#gh-dark-mode-only">
</a>
<img src="assets/meilisearch-logo-light.svg?sanitize=true#gh-light-mode-only">
<img src="assets/meilisearch-logo-dark.svg?sanitize=true#gh-dark-mode-only">
</p>
<h4 align="center">
<a href="https://www.meilisearch.com/?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=nav">Website</a> |
<a href="https://www.meilisearch.com">Website</a> |
<a href="https://roadmap.meilisearch.com/tabs/1-under-consideration">Roadmap</a> |
<a href="https://www.meilisearch.com/pricing?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=nav">Meilisearch Cloud</a> |
<a href="https://blog.meilisearch.com/?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=nav">Blog</a> |
<a href="https://www.meilisearch.com/docs?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=nav">Documentation</a> |
<a href="https://www.meilisearch.com/docs/faq?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=nav">FAQ</a> |
<a href="https://discord.meilisearch.com/?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=nav">Discord</a>
<a href="https://blog.meilisearch.com">Blog</a> |
<a href="https://docs.meilisearch.com">Documentation</a> |
<a href="https://docs.meilisearch.com/faq/">FAQ</a> |
<a href="https://slack.meilisearch.com">Slack</a>
</h4>
<p align="center">
<a href="https://github.com/meilisearch/meilisearch/actions"><img src="https://github.com/meilisearch/meilisearch/workflows/Cargo%20test/badge.svg" alt="Build Status"></a>
<a href="https://deps.rs/repo/github/meilisearch/meilisearch"><img src="https://deps.rs/repo/github/meilisearch/meilisearch/status.svg" alt="Dependency status"></a>
<a href="https://github.com/meilisearch/meilisearch/blob/main/LICENSE"><img src="https://img.shields.io/badge/license-MIT-informational" alt="License"></a>
<a href="https://github.com/meilisearch/meilisearch/queue"><img alt="Merge Queues enabled" src="https://img.shields.io/badge/Merge_Queues-enabled-%2357cf60?logo=github"></a>
<a href="https://app.bors.tech/repositories/26457"><img src="https://bors.tech/images/badge_small.svg" alt="Bors enabled"></a>
</p>
<p align="center">⚡ A lightning-fast search engine that fits effortlessly into your apps, websites, and workflow 🔍</p>
[Meilisearch](https://www.meilisearch.com?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=intro) helps you shape a delightful search experience in a snap, offering features that work out of the box to speed up your workflow.
Meilisearch helps you shape a delightful search experience in a snap, offering features that work out-of-the-box to speed up your workflow.
<p align="center" name="demo">
<a href="https://where2watch.meilisearch.com/?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=demo-gif#gh-light-mode-only" target="_blank">
<a href="https://where2watch.meilisearch.com/#gh-light-mode-only" target="_blank">
<img src="assets/demo-light.gif#gh-light-mode-only" alt="A bright colored application for finding movies screening near the user">
</a>
<a href="https://where2watch.meilisearch.com/?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=demo-gif#gh-dark-mode-only" target="_blank">
<a href="https://where2watch.meilisearch.com/#gh-dark-mode-only" target="_blank">
<img src="assets/demo-dark.gif#gh-dark-mode-only" alt="A dark colored application for finding movies screening near the user">
</a>
</p>
## 🖥 Examples
🔥 [**Try it!**](https://where2watch.meilisearch.com/) 🔥
- [**Movies**](https://where2watch.meilisearch.com/?utm_campaign=oss&utm_source=github&utm_medium=organization) — An application to help you find streaming platforms to watch movies using [hybrid search](https://www.meilisearch.com/solutions/hybrid-search?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=demos).
- [**Ecommerce**](https://ecommerce.meilisearch.com/?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=demos) — Ecommerce website using disjunctive [facets](https://www.meilisearch.com/docs/learn/fine_tuning_results/faceted_search?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=demos), range and rating filtering, and pagination.
- [**Songs**](https://music.meilisearch.com/?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=demos) — Search through 47 million songs.
- [**SaaS**](https://saas.meilisearch.com/?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=demos) — Search for contacts, deals, and companies in this [multi-tenant](https://www.meilisearch.com/docs/learn/security/multitenancy_tenant_tokens?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=demos) CRM application.
## 🎃 Hacktoberfest
See the list of all our example apps in our [demos repository](https://github.com/meilisearch/demos).
It's Hacktoberfest 2022 @Meilisearch
[Hacktoberfest](https://hacktoberfest.com/) is a celebration of the open-source community. This year, and for the third time in a row, Meilisearch is participating in this fantastic event.
You'd like to contribute? Don't hesitate to check out our [contributing guidelines](./CONTRIBUTING.md).
## ✨ Features
- **Hybrid search:** Combine the best of both [semantic](https://www.meilisearch.com/docs/learn/experimental/vector_search?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=features) & full-text search to get the most relevant results
- **Search-as-you-type:** Find & display results in less than 50 milliseconds to provide an intuitive experience
- **[Typo tolerance](https://www.meilisearch.com/docs/learn/relevancy/typo_tolerance_settings?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=features):** get relevant matches even when queries contain typos and misspellings
- **[Filtering](https://www.meilisearch.com/docs/learn/fine_tuning_results/filtering?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=features) and [faceted search](https://www.meilisearch.com/docs/learn/fine_tuning_results/faceted_search?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=features):** enhance your users' search experience with custom filters and build a faceted search interface in a few lines of code
- **[Sorting](https://www.meilisearch.com/docs/learn/fine_tuning_results/sorting?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=features):** sort results based on price, date, or pretty much anything else your users need
- **[Synonym support](https://www.meilisearch.com/docs/learn/relevancy/synonyms?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=features):** configure synonyms to include more relevant content in your search results
- **[Geosearch](https://www.meilisearch.com/docs/learn/fine_tuning_results/geosearch?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=features):** filter and sort documents based on geographic data
- **[Extensive language support](https://www.meilisearch.com/docs/learn/what_is_meilisearch/language?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=features):** search datasets in any language, with optimized support for Chinese, Japanese, Hebrew, and languages using the Latin alphabet
- **[Security management](https://www.meilisearch.com/docs/learn/security/master_api_keys?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=features):** control which users can access what data with API keys that allow fine-grained permissions handling
- **[Multi-Tenancy](https://www.meilisearch.com/docs/learn/security/multitenancy_tenant_tokens?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=features):** personalize search results for any number of application tenants
- **Search-as-you-type:** find search results in less than 50 milliseconds
- **[Typo tolerance](https://docs.meilisearch.com/learn/getting_started/customizing_relevancy.html#typo-tolerance):** get relevant matches even when queries contain typos and misspellings
- **[Filtering and faceted search](https://docs.meilisearch.com/learn/advanced/filtering_and_faceted_search.html):** enhance your user's search experience with custom filters and build a faceted search interface in a few lines of code
- **[Sorting](https://docs.meilisearch.com/learn/advanced/sorting.html):** sort results based on price, date, or pretty much anything else your users need
- **[Synonym support](https://docs.meilisearch.com/learn/getting_started/customizing_relevancy.html#synonyms):** configure synonyms to include more relevant content in your search results
- **[Geosearch](https://docs.meilisearch.com/learn/advanced/geosearch.html):** filter and sort documents based on geographic data
- **[Extensive language support](https://docs.meilisearch.com/learn/what_is_meilisearch/language.html):** search datasets in any language, with optimized support for Chinese, Japanese, Hebrew, and languages using the Latin alphabet
- **[Security management](https://docs.meilisearch.com/learn/security/master_api_keys.html):** control which users can access what data with API keys that allow fine-grained permissions handling
- **[Multi-Tenancy](https://docs.meilisearch.com/learn/security/tenant_tokens.html):** personalize search results for any number of application tenants
- **Highly Customizable:** customize Meilisearch to your specific needs or use our out-of-the-box and hassle-free presets
- **[RESTful API](https://www.meilisearch.com/docs/reference/api/overview?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=features):** integrate Meilisearch in your technical stack with our plugins and SDKs
- **AI-ready:** works out of the box with [langchain](https://www.meilisearch.com/with/langchain) and the [model context protocol](https://github.com/meilisearch/meilisearch-mcp)
- **[RESTful API](https://docs.meilisearch.com/reference/api/overview.html):** integrate Meilisearch in your technical stack with our plugins and SDKs
- **Easy to install, deploy, and maintain**
## 📖 Documentation
You can consult Meilisearch's documentation at [meilisearch.com/docs](https://www.meilisearch.com/docs/?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=docs).
You can consult Meilisearch's documentation at [https://docs.meilisearch.com](https://docs.meilisearch.com/).
## 🚀 Getting started
For basic instructions on how to set up Meilisearch, add documents to an index, and search for documents, take a look at our [documentation](https://www.meilisearch.com/docs?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=get-started) guide.
For basic instructions on how to set up Meilisearch, add documents to an index, and search for documents, take a look at our [Quick Start](https://docs.meilisearch.com/learn/getting_started/quick_start.html) guide.
## 🌍 Supercharge your Meilisearch experience
You may also want to check out [Meilisearch 101](https://docs.meilisearch.com/learn/getting_started/filtering_and_sorting.html) for an introduction to some of Meilisearch's most popular features.
Say goodbye to server deployment and manual updates with [Meilisearch Cloud](https://www.meilisearch.com/cloud?utm_campaign=oss&utm_source=github&utm_medium=meilisearch). Additional features include analytics & monitoring in many regions around the world. No credit card is required.
## ☁️ Meilisearch cloud
Join the closed beta for Meilisearch cloud by filling out [this form](https://meilisearch.typeform.com/to/VI2cI2rv).
## 🧰 SDKs & integration tools
Install one of our SDKs in your project for seamless integration between Meilisearch and your favorite language or framework!
Take a look at the complete [Meilisearch integration list](https://www.meilisearch.com/docs/learn/what_is_meilisearch/sdks?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=sdks-link).
Take a look at the complete [Meilisearch integration list](https://docs.meilisearch.com/learn/what_is_meilisearch/sdks.html).
[![Logos belonging to different languages and frameworks supported by Meilisearch, including React, Ruby on Rails, Go, Rust, and PHP](assets/integrations.png)](https://www.meilisearch.com/docs/learn/what_is_meilisearch/sdks?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=sdks-logos)
![Logos belonging to different languages and frameworks supported by Meilisearch, including React, Ruby on Rails, Go, Rust, and PHP](assets/integrations.png)
## ⚙️ Advanced usage
Experienced users will want to keep our [API Reference](https://www.meilisearch.com/docs/reference/api/overview?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=advanced) close at hand.
Experienced users will want to keep our [API Reference](https://docs.meilisearch.com/reference/api) close at hand.
We also offer a wide range of dedicated guides to all Meilisearch features, such as [filtering](https://www.meilisearch.com/docs/learn/fine_tuning_results/filtering?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=advanced), [sorting](https://www.meilisearch.com/docs/learn/fine_tuning_results/sorting?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=advanced), [geosearch](https://www.meilisearch.com/docs/learn/fine_tuning_results/geosearch?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=advanced), [API keys](https://www.meilisearch.com/docs/learn/security/master_api_keys?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=advanced), and [tenant tokens](https://www.meilisearch.com/docs/learn/security/tenant_tokens?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=advanced).
We also offer a wide range of dedicated guides to all Meilisearch features, such as [filtering](https://docs.meilisearch.com/learn/advanced/filtering_and_faceted_search.html), [sorting](https://docs.meilisearch.com/learn/advanced/sorting.html), [geosearch](https://docs.meilisearch.com/learn/advanced/geosearch.html), [API keys](https://docs.meilisearch.com/learn/security/master_api_keys.html), and [tenant tokens](https://docs.meilisearch.com/learn/security/tenant_tokens.html).
Finally, for more in-depth information, refer to our articles explaining fundamental Meilisearch concepts such as [documents](https://www.meilisearch.com/docs/learn/core_concepts/documents?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=advanced) and [indexes](https://www.meilisearch.com/docs/learn/core_concepts/indexes?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=advanced).
Finally, for more in-depth information, refer to our articles explaining fundamental Meilisearch concepts such as [documents](https://docs.meilisearch.com/learn/core_concepts/documents.html) and [indexes](https://docs.meilisearch.com/learn/core_concepts/indexes.html).
## 📊 Telemetry
Meilisearch collects **anonymized** user data to help us improve our product. You can [deactivate this](https://www.meilisearch.com/docs/learn/what_is_meilisearch/telemetry?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=telemetry#how-to-disable-data-collection) whenever you want.
Meilisearch collects **anonymized** data from users to help us improve our product. You can [deactivate this](https://docs.meilisearch.com/learn/what_is_meilisearch/telemetry.html#how-to-disable-data-collection) whenever you want.
To request deletion of collected data, please write to us at [privacy@meilisearch.com](mailto:privacy@meilisearch.com). Remember to include your `Instance UID` in the message, as this helps us quickly find and delete your data.
To request deletion of collected data, please write to us at [privacy@meilisearch.com](mailto:privacy@meilisearch.com). Don't forget to include your `Instance UID` in the message, as this helps us quickly find and delete your data.
If you want to know more about the kind of data we collect and what we use it for, check the [telemetry section](https://www.meilisearch.com/docs/learn/what_is_meilisearch/telemetry?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=telemetry#how-to-disable-data-collection) of our documentation.
If you want to know more about the kind of data we collect and what we use it for, check the [telemetry section](https://docs.meilisearch.com/learn/what_is_meilisearch/telemetry.html) of our documentation.
## 📫 Get in touch!
Meilisearch is a search engine created by [Meili](https://www.meilisearch.com/careers), a software development company headquartered in France and with team members all over the world. Want to know more about us? [Check out our blog!](https://blog.meilisearch.com/?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=contact)
Meilisearch is a search engine created by [Meili](https://www.welcometothejungle.com/en/companies/meilisearch), a software development company based in France and with team members all over the world. Want to know more about us? [Check out our blog!](https://blog.meilisearch.com/)
🗞 [Subscribe to our newsletter](https://meilisearch.us2.list-manage.com/subscribe?u=27870f7b71c908a8b359599fb&id=79582d828e) if you don't want to miss any updates! We promise we won't clutter your mailbox: we only send one edition every two months.
@ -107,18 +105,7 @@ Meilisearch is a search engine created by [Meili]([https://www.welcometothejungl
- For feature requests, please visit our [product repository](https://github.com/meilisearch/product/discussions)
- Found a bug? Open an [issue](https://github.com/meilisearch/meilisearch/issues)!
- Want to be part of our Discord community? [Join us!](https://discord.meilisearch.com/?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=contact)
- Want to be part of our Slack community? [Join us!](https://slack.meilisearch.com/)
- For everything else, please check [this page listing some of the other places where you can find us](https://docs.meilisearch.com/learn/what_is_meilisearch/contact.html)
Thank you for your support!
## 👩‍💻 Contributing
Meilisearch is, and will always be, open-source! If you want to contribute to the project, please look at [our contribution guidelines](CONTRIBUTING.md).
## 📦 Versioning
Meilisearch releases and their associated binaries are available on the project's [releases page](https://github.com/meilisearch/meilisearch/releases).
The binaries are versioned following [SemVer conventions](https://semver.org/). To know more, read our [versioning policy](https://github.com/meilisearch/engine-team/blob/main/resources/versioning-policy.md).
Unlike the binaries, crates in this repository are not currently available on [crates.io](https://crates.io/) and do not follow [SemVer conventions](https://semver.org).

File diff suppressed because it is too large.


@ -1,6 +0,0 @@
<svg width="277" height="236" viewBox="0 0 277 236" fill="none" xmlns="http://www.w3.org/2000/svg">
<path fill-rule="evenodd" clip-rule="evenodd" d="M213.085 190L242.907 86H276.196L246.375 190H213.085Z" fill="#494949"/>
<path fill-rule="evenodd" clip-rule="evenodd" d="M0 190L29.8215 86H63.1111L33.2896 190H0Z" fill="#494949"/>
<path fill-rule="evenodd" clip-rule="evenodd" d="M124.986 0L57.5772 235.083L60.7752 236H90.6038L158.276 0H124.986Z" fill="#494949"/>
<path fill-rule="evenodd" clip-rule="evenodd" d="M195.273 0L127.601 236H160.891L228.563 0H195.273Z" fill="#494949"/>
</svg>

Binary image file not shown (1.2 MiB).


@ -1,19 +0,0 @@
global:
  scrape_interval: 15s # By default, scrape targets every 15 seconds.

  # Attach these labels to any time series or alerts when communicating with
  # external systems (federation, remote storage, Alertmanager).
  external_labels:
    monitor: 'codelab-monitor'

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'meilisearch'

    # Override the global default and scrape targets from this job every 5 seconds.
    scrape_interval: 5s

    static_configs:
      - targets: ['localhost:7700']

11
bors.toml Normal file

@ -0,0 +1,11 @@
status = [
'Tests on ubuntu-18.04',
'Tests on macos-latest',
'Tests on windows-latest',
'Run Clippy',
'Run Rustfmt',
'Run tests in debug',
]
pr_status = ['Milestone Check']
# 3 hours timeout
timeout-sec = 10800


@ -1,134 +1,135 @@
# This file shows the default configuration of Meilisearch.
# All variables are defined here: https://www.meilisearch.com/docs/learn/configuration/instance_options#environment-variables
# All variables are defined here: https://docs.meilisearch.com/learn/configuration/instance_options.html#environment-variables
# Designates the location where database files will be created and retrieved.
# https://www.meilisearch.com/docs/learn/configuration/instance_options#database-path
db_path = "./data.ms"
# Designates the location where database files will be created and retrieved.
# https://docs.meilisearch.com/learn/configuration/instance_options.html#database-path
# Configures the instance's environment. Value must be either `production` or `development`.
# https://www.meilisearch.com/docs/learn/configuration/instance_options#environment
env = "development"
# Configures the instance's environment. Value must be either `production` or `development`.
# https://docs.meilisearch.com/learn/configuration/instance_options.html#environment
# The address on which the HTTP server will listen.
http_addr = "localhost:7700"
# The address on which the HTTP server will listen.
# Sets the instance's master key, automatically protecting all routes except GET /health.
# https://www.meilisearch.com/docs/learn/configuration/instance_options#master-key
# master_key = "YOUR_MASTER_KEY_VALUE"
# Sets the instance's master key, automatically protecting all routes except GET /health.
# https://docs.meilisearch.com/learn/configuration/instance_options.html#master-key
# no_analytics = true
# Deactivates Meilisearch's built-in telemetry when provided.
# Meilisearch automatically collects data from all instances that do not opt out using this flag.
# All gathered data is used solely for the purpose of improving Meilisearch, and can be deleted at any time.
# https://www.meilisearch.com/docs/learn/configuration/instance_options#disable-analytics
# no_analytics = true
# https://docs.meilisearch.com/learn/configuration/instance_options.html#disable-analytics
# Sets the maximum size of accepted payloads.
# https://www.meilisearch.com/docs/learn/configuration/instance_options#payload-limit-size
http_payload_size_limit = "100 MB"
# Sets the maximum size of accepted payloads.
# https://docs.meilisearch.com/learn/configuration/instance_options.html#payload-limit-size
# Defines how much detail should be present in Meilisearch's logs.
# Meilisearch currently supports six log levels, listed in order of increasing verbosity: `OFF`, `ERROR`, `WARN`, `INFO`, `DEBUG`, `TRACE`
# https://www.meilisearch.com/docs/learn/configuration/instance_options#log-level
log_level = "INFO"
# Defines how much detail should be present in Meilisearch's logs.
# Meilisearch currently supports five log levels, listed in order of increasing verbosity: `ERROR`, `WARN`, `INFO`, `DEBUG`, `TRACE`
# https://docs.meilisearch.com/learn/configuration/instance_options.html#log-level
max_index_size = "100 GiB"
# Sets the maximum size of the index.
# https://docs.meilisearch.com/learn/configuration/instance_options.html#max-index-size
max_task_db_size = "100 GiB"
# Sets the maximum size of the task database.
# https://docs.meilisearch.com/learn/configuration/instance_options.html#max-task-db-size
# Sets the maximum amount of RAM Meilisearch can use when indexing.
# https://www.meilisearch.com/docs/learn/configuration/instance_options#max-indexing-memory
# max_indexing_memory = "2 GiB"
# Sets the maximum amount of RAM Meilisearch can use when indexing.
# https://docs.meilisearch.com/learn/configuration/instance_options.html#max-indexing-memory
# Sets the maximum number of threads Meilisearch can use during indexing.
# https://www.meilisearch.com/docs/learn/configuration/instance_options#max-indexing-threads
# max_indexing_threads = 4
# Sets the maximum number of threads Meilisearch can use during indexing.
# https://docs.meilisearch.com/learn/configuration/instance_options.html#max-indexing-threads
disable_auto_batching = false
# Deactivates auto-batching when provided.
# https://docs.meilisearch.com/learn/configuration/instance_options.html#disable-auto-batching
#############
### DUMPS ###
#############
dumps_dir = "dumps/"
# Sets the directory where Meilisearch will create dump files.
# https://www.meilisearch.com/docs/learn/configuration/instance_options#dump-directory
dump_dir = "dumps/"
# https://docs.meilisearch.com/learn/configuration/instance_options.html#dumps-destination
# Imports the dump file located at the specified path. Path must point to a .dump file.
# https://www.meilisearch.com/docs/learn/configuration/instance_options#import-dump
# import_dump = "./path/to/my/file.dump"
# Imports the dump file located at the specified path. Path must point to a .dump file.
# https://docs.meilisearch.com/learn/configuration/instance_options.html#import-dump
# Prevents Meilisearch from throwing an error when `import_dump` does not point to a valid dump file.
# https://www.meilisearch.com/docs/learn/configuration/instance_options#ignore-missing-dump
ignore_missing_dump = false
# Prevents Meilisearch from throwing an error when `import_dump` does not point to a valid dump file.
# https://docs.meilisearch.com/learn/configuration/instance_options.html#ignore-missing-dump
# Prevents a Meilisearch instance with an existing database from throwing an error when using `import_dump`.
# https://www.meilisearch.com/docs/learn/configuration/instance_options#ignore-dump-if-db-exists
ignore_dump_if_db_exists = false
# Prevents a Meilisearch instance with an existing database from throwing an error when using `import_dump`.
# https://docs.meilisearch.com/learn/configuration/instance_options.html#ignore-dump-if-db-exists
#################
### SNAPSHOTS ###
#################
# Enables scheduled snapshots when true, disable when false (the default).
# If the value is given as an integer, then enables the scheduled snapshot with the passed value as the interval
# between each snapshot, in seconds.
# https://www.meilisearch.com/docs/learn/configuration/instance_options#schedule-snapshot-creation
schedule_snapshot = false
# Activates scheduled snapshots when provided.
# https://docs.meilisearch.com/learn/configuration/instance_options.html#schedule-snapshot-creation
# Sets the directory where Meilisearch will store snapshots.
# https://www.meilisearch.com/docs/learn/configuration/instance_options#snapshot-destination
snapshot_dir = "snapshots/"
# Sets the directory where Meilisearch will store snapshots.
# https://docs.meilisearch.com/learn/configuration/instance_options.html#snapshot-destination
snapshot_interval_sec = 86400
# Defines the interval between each snapshot. Value must be given in seconds.
# https://docs.meilisearch.com/learn/configuration/instance_options.html#snapshot-interval
# Launches Meilisearch after importing a previously-generated snapshot at the given filepath.
# https://www.meilisearch.com/docs/learn/configuration/instance_options#import-snapshot
# import_snapshot = "./path/to/my/snapshot"
# Launches Meilisearch after importing a previously-generated snapshot at the given filepath.
# https://docs.meilisearch.com/learn/configuration/instance_options.html#import-snapshot
# Prevents a Meilisearch instance from throwing an error when `import_snapshot` does not point to a valid snapshot file.
# https://www.meilisearch.com/docs/learn/configuration/instance_options#ignore-missing-snapshot
ignore_missing_snapshot = false
# Prevents a Meilisearch instance from throwing an error when `import_snapshot` does not point to a valid snapshot file.
# https://docs.meilisearch.com/learn/configuration/instance_options.html#ignore-missing-snapshot
# Prevents a Meilisearch instance with an existing database from throwing an error when using `import_snapshot`.
# https://www.meilisearch.com/docs/learn/configuration/instance_options#ignore-snapshot-if-db-exists
ignore_snapshot_if_db_exists = false
# Prevents a Meilisearch instance with an existing database from throwing an error when using `import_snapshot`.
# https://docs.meilisearch.com/learn/configuration/instance_options.html#ignore-snapshot-if-db-exists
###########
### SSL ###
###########
# Enables client authentication in the specified path.
# https://www.meilisearch.com/docs/learn/configuration/instance_options#ssl-authentication-path
# ssl_auth_path = "./path/to/root"
# Enables client authentication in the specified path.
# https://docs.meilisearch.com/learn/configuration/instance_options.html#ssl-authentication-path
# Sets the server's SSL certificates.
# https://www.meilisearch.com/docs/learn/configuration/instance_options#ssl-certificates-path
# ssl_cert_path = "./path/to/certfile"
# Sets the server's SSL certificates.
# https://docs.meilisearch.com/learn/configuration/instance_options.html#ssl-certificates-path
# Sets the server's SSL key files.
# https://www.meilisearch.com/docs/learn/configuration/instance_options#ssl-key-path
# ssl_key_path = "./path/to/private-key"
# Sets the server's SSL key files.
# https://docs.meilisearch.com/learn/configuration/instance_options.html#ssl-key-path
# Sets the server's OCSP file.
# https://www.meilisearch.com/docs/learn/configuration/instance_options#ssl-ocsp-path
# ssl_ocsp_path = "./path/to/ocsp-file"
# Sets the server's OCSP file.
# https://docs.meilisearch.com/learn/configuration/instance_options.html#ssl-ocsp-path
# Makes SSL authentication mandatory.
# https://www.meilisearch.com/docs/learn/configuration/instance_options#ssl-require-auth
ssl_require_auth = false
# Makes SSL authentication mandatory.
# https://docs.meilisearch.com/learn/configuration/instance_options.html#ssl-require-auth
# Activates SSL session resumption.
# https://www.meilisearch.com/docs/learn/configuration/instance_options#ssl-resumption
ssl_resumption = false
# Activates SSL session resumption.
# https://docs.meilisearch.com/learn/configuration/instance_options.html#ssl-resumption
# Activates SSL tickets.
# https://www.meilisearch.com/docs/learn/configuration/instance_options#ssl-tickets
ssl_tickets = false
#############################
### Experimental features ###
#############################
# Experimental metrics feature. For more information, see: <https://github.com/meilisearch/meilisearch/discussions/3518>
# Enables the Prometheus metrics on the `GET /metrics` endpoint.
experimental_enable_metrics = false
# Experimental RAM reduction during indexing, do not use in production, see: <https://github.com/meilisearch/product/discussions/652>
experimental_reduce_indexing_memory_usage = false
# Experimentally reduces the maximum number of tasks that will be processed at once, see: <https://github.com/orgs/meilisearch/discussions/713>
# experimental_max_number_of_batched_tasks = 100
# Activates SSL tickets.
# https://docs.meilisearch.com/learn/configuration/instance_options.html#ssl-tickets


@ -1 +0,0 @@
benches/datasets_paths.rs


@ -1,53 +0,0 @@
[package]
name = "benchmarks"
publish = false
version.workspace = true
authors.workspace = true
description.workspace = true
homepage.workspace = true
readme.workspace = true
edition.workspace = true
license.workspace = true
[dependencies]
anyhow = "1.0.95"
bumpalo = "3.16.0"
csv = "1.3.1"
memmap2 = "0.9.5"
milli = { path = "../milli" }
mimalloc = { version = "0.1.43", default-features = false }
serde_json = { version = "1.0.135", features = ["preserve_order"] }
tempfile = "3.15.0"
[dev-dependencies]
criterion = { version = "0.5.1", features = ["html_reports"] }
rand = "0.8.5"
rand_chacha = "0.3.1"
roaring = "0.10.10"
[build-dependencies]
anyhow = "1.0.95"
bytes = "1.9.0"
convert_case = "0.6.0"
flate2 = "1.0.35"
reqwest = { version = "0.12.12", features = ["blocking", "rustls-tls"], default-features = false }
[features]
default = ["milli/all-tokenizations"]
[[bench]]
name = "search_songs"
harness = false
[[bench]]
name = "search_wiki"
harness = false
[[bench]]
name = "search_geo"
harness = false
[[bench]]
name = "indexing"
harness = false


@ -1,138 +0,0 @@
Benchmarks
==========
## TOC
- [Run the benchmarks](#run-the-benchmarks)
- [Comparison between benchmarks](#comparison-between-benchmarks)
- [Datasets](#datasets)
## Run the benchmarks
### On our private server
The Meili team self-hosts its own GitHub runner to run benchmarks on our dedicated bare metal server.
To trigger the benchmark workflow:
- Go to the `Actions` tab of this repository.
- Select the `Benchmarks` workflow on the left.
- Click on `Run workflow` in the blue banner.
- Select the branch on which you want to run the benchmarks and select the dataset you want (default: `songs`).
- Finally, click on `Run workflow`.
This GitHub workflow will run the benchmarks and push the `critcmp` report to a DigitalOcean Space (= S3).
The name of the uploaded file is displayed in the workflow.
_[More about critcmp](https://github.com/BurntSushi/critcmp)._
💡 To compare the just-uploaded benchmark with another one, check out the [next section](#comparison-between-benchmarks).
### On your machine
To run all the benchmarks (~5h):
```bash
cargo bench
```
To run only the `search_songs` (~1h), `search_wiki` (~3h), `search_geo` (~20m) or `indexing` (~2h) benchmark:
```bash
cargo bench --bench <dataset name>
```
By default, the datasets will be downloaded and uncompressed automatically in the target directory.<br>
If you don't want to download the datasets every time you update something on the code, you can specify a custom directory with the environment variable `MILLI_BENCH_DATASETS_PATH`:
```bash
mkdir ~/datasets
MILLI_BENCH_DATASETS_PATH=~/datasets cargo bench --bench search_songs # the four datasets are downloaded
touch build.rs
MILLI_BENCH_DATASETS_PATH=~/datasets cargo bench --bench search_songs # the code is compiled again but the datasets are not downloaded
```
## Comparison between benchmarks
The benchmark reports we push are generated with `critcmp`. Thus, we use `critcmp` to show the result of a benchmark, or compare results between multiple benchmarks.
We provide a script to download and display the comparison report.
Requirements:
- `grep`
- `curl`
- [`critcmp`](https://github.com/BurntSushi/critcmp)
List the available files in the DO Space:
```bash
./benchmarks/scripts/list.sh
```
```bash
songs_main_09a4321.json
songs_geosearch_24ec456.json
search_songs_main_cb45a10b.json
```
Run the comparison script:
```bash
# we get the result of ONE benchmark, this gives you an idea of how much time an operation took
./benchmarks/scripts/compare.sh songs_geosearch_24ec456.json
# we compare two benchmarks
./benchmarks/scripts/compare.sh songs_main_09a4321.json songs_geosearch_24ec456.json
# we compare three benchmarks
./benchmarks/scripts/compare.sh songs_main_09a4321.json songs_geosearch_24ec456.json search_songs_main_cb45a10b.json
```
## Datasets
The benchmarks use the following datasets:
- `smol-songs`
- `smol-wiki`
- `movies`
- `smol-all-countries`
### Songs
`smol-songs` is a subset of the [`songs.csv` dataset](https://milli-benchmarks.fra1.digitaloceanspaces.com/datasets/songs.csv.gz).
It was generated with this command:
```bash
xsv sample --seed 42 1000000 songs.csv -o smol-songs.csv
```
_[Download the generated `smol-songs` dataset](https://milli-benchmarks.fra1.digitaloceanspaces.com/datasets/smol-songs.csv.gz)._
### Wiki
`smol-wiki` is a subset of the [`wikipedia-articles.csv` dataset](https://milli-benchmarks.fra1.digitaloceanspaces.com/datasets/wiki-articles.csv.gz).
It was generated with the following command:
```bash
xsv sample --seed 42 500000 wiki-articles.csv -o smol-wiki-articles.csv
```
_[Download the `smol-wiki` dataset](https://milli-benchmarks.fra1.digitaloceanspaces.com/datasets/smol-wiki-articles.csv.gz)._
### Movies
`movies` is a really small dataset we use as our example in the [getting started guide](https://www.meilisearch.com/docs/learn/getting_started/quick_start).
_[Download the `movies` dataset](https://www.meilisearch.com/movies.json)._
### All Countries
`smol-all-countries` is a subset of the [`all-countries.csv` dataset](https://milli-benchmarks.fra1.digitaloceanspaces.com/datasets/all-countries.csv.gz)
It has been converted to jsonlines and then edited so it matches our format for the `_geo` field.
It was generated with the following command:
```bash
bat all-countries.csv.gz | gunzip | xsv sample --seed 42 1000000 | csv2json-lite | sd '"latitude":"(.*?)","longitude":"(.*?)"' '"_geo": { "lat": $1, "lng": $2 }' | sd '\[|\]|,$' '' | gzip > smol-all-countries.jsonl.gz
```
_[Download the `smol-all-countries` dataset](https://milli-benchmarks.fra1.digitaloceanspaces.com/datasets/smol-all-countries.jsonl.gz)._
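To check that the conversion produced the expected `_geo` shape, you can peek at the first converted document; a minimal sketch, assuming the generated file is in the current directory:
```bash
gunzip -c smol-all-countries.jsonl.gz | head -n 1 # each line should contain a "_geo": { "lat": ..., "lng": ... } object
```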

File diff suppressed because it is too large.

View File

@ -1,125 +0,0 @@
mod datasets_paths;
mod utils;
use criterion::{criterion_group, criterion_main};
use milli::{update::Settings, FilterableAttributesRule};
use utils::Conf;
#[cfg(not(windows))]
#[global_allocator]
static ALLOC: mimalloc::MiMalloc = mimalloc::MiMalloc;
fn base_conf(builder: &mut Settings) {
let displayed_fields =
["geonameid", "name", "asciiname", "alternatenames", "_geo", "population"]
.iter()
.map(|s| s.to_string())
.collect();
builder.set_displayed_fields(displayed_fields);
let searchable_fields =
["name", "alternatenames", "elevation"].iter().map(|s| s.to_string()).collect();
builder.set_searchable_fields(searchable_fields);
let filterable_fields = ["_geo", "population", "elevation"]
.iter()
.map(|s| FilterableAttributesRule::Field(s.to_string()))
.collect();
builder.set_filterable_fields(filterable_fields);
let sortable_fields =
["_geo", "population", "elevation"].iter().map(|s| s.to_string()).collect();
builder.set_sortable_fields(sortable_fields);
}
#[rustfmt::skip]
const BASE_CONF: Conf = Conf {
dataset: datasets_paths::SMOL_ALL_COUNTRIES,
dataset_format: "jsonl",
queries: &[
"",
],
configure: base_conf,
primary_key: Some("geonameid"),
..Conf::BASE
};
fn bench_geo(c: &mut criterion::Criterion) {
#[rustfmt::skip]
let confs = &[
// A basic placeholder with no geo
utils::Conf {
group_name: "placeholder with no geo",
..BASE_CONF
},
// Medium agglomeration: probably the most common use case
utils::Conf {
group_name: "asc sort from Lille",
sort: Some(vec!["_geoPoint(50.62999333378238, 3.086269263384099):asc"]),
..BASE_CONF
},
utils::Conf {
group_name: "desc sort from Lille",
sort: Some(vec!["_geoPoint(50.62999333378238, 3.086269263384099):desc"]),
..BASE_CONF
},
// Big agglomeration: a lot of documents close to our point
utils::Conf {
group_name: "asc sort from Tokyo",
sort: Some(vec!["_geoPoint(35.749512532692144, 139.61664952543356):asc"]),
..BASE_CONF
},
utils::Conf {
group_name: "desc sort from Tokyo",
sort: Some(vec!["_geoPoint(35.749512532692144, 139.61664952543356):desc"]),
..BASE_CONF
},
// The furthest point from any civilization
utils::Conf {
group_name: "asc sort from Point Nemo",
sort: Some(vec!["_geoPoint(-48.87561645055408, -123.39275749319793):asc"]),
..BASE_CONF
},
utils::Conf {
group_name: "desc sort from Point Nemo",
sort: Some(vec!["_geoPoint(-48.87561645055408, -123.39275749319793):desc"]),
..BASE_CONF
},
// Filters
utils::Conf {
group_name: "filter of 100km from Lille",
filter: Some("_geoRadius(50.62999333378238, 3.086269263384099, 100000)"),
..BASE_CONF
},
utils::Conf {
group_name: "filter of 1km from Lille",
filter: Some("_geoRadius(50.62999333378238, 3.086269263384099, 1000)"),
..BASE_CONF
},
utils::Conf {
group_name: "filter of 100km from Tokyo",
filter: Some("_geoRadius(35.749512532692144, 139.61664952543356, 100000)"),
..BASE_CONF
},
utils::Conf {
group_name: "filter of 1km from Tokyo",
filter: Some("_geoRadius(35.749512532692144, 139.61664952543356, 1000)"),
..BASE_CONF
},
utils::Conf {
group_name: "filter of 100km from Point Nemo",
filter: Some("_geoRadius(-48.87561645055408, -123.39275749319793, 100000)"),
..BASE_CONF
},
utils::Conf {
group_name: "filter of 1km from Point Nemo",
filter: Some("_geoRadius(-48.87561645055408, -123.39275749319793, 1000)"),
..BASE_CONF
},
];
utils::run_benches(c, confs);
}
criterion_group!(benches, bench_geo);
criterion_main!(benches);

View File

@ -1,197 +0,0 @@
mod datasets_paths;
mod utils;
use criterion::{criterion_group, criterion_main};
use milli::{update::Settings, FilterableAttributesRule};
use utils::Conf;
#[cfg(not(windows))]
#[global_allocator]
static ALLOC: mimalloc::MiMalloc = mimalloc::MiMalloc;
fn base_conf(builder: &mut Settings) {
let displayed_fields =
["id", "title", "album", "artist", "genre", "country", "released", "duration"]
.iter()
.map(|s| s.to_string())
.collect();
builder.set_displayed_fields(displayed_fields);
let searchable_fields = ["title", "album", "artist"].iter().map(|s| s.to_string()).collect();
builder.set_searchable_fields(searchable_fields);
let faceted_fields = ["released-timestamp", "duration-float", "genre", "country", "artist"]
.iter()
.map(|s| FilterableAttributesRule::Field(s.to_string()))
.collect();
builder.set_filterable_fields(faceted_fields);
}
#[rustfmt::skip]
const BASE_CONF: Conf = Conf {
dataset: datasets_paths::SMOL_SONGS,
queries: &[
"john ", // 9097
"david ", // 4794
"charles ", // 1957
"david bowie ", // 1200
"michael jackson ", // 600
"thelonious monk ", // 303
"charles mingus ", // 142
"marcus miller ", // 60
"tamo ", // 13
"Notstandskomitee ", // 4
],
configure: base_conf,
primary_key: Some("id"),
..Conf::BASE
};
fn bench_songs(c: &mut criterion::Criterion) {
let default_criterion: Vec<String> =
milli::default_criteria().iter().map(|criteria| criteria.to_string()).collect();
let default_criterion = default_criterion.iter().map(|s| s.as_str());
let asc_default: Vec<&str> =
std::iter::once("released-timestamp:asc").chain(default_criterion.clone()).collect();
let desc_default: Vec<&str> =
std::iter::once("released-timestamp:desc").chain(default_criterion.clone()).collect();
let basic_with_quote: Vec<String> = BASE_CONF
.queries
.iter()
.map(|s| {
s.trim().split(' ').map(|s| format!(r#""{}""#, s)).collect::<Vec<String>>().join(" ")
})
.collect();
let basic_with_quote: &[&str] =
&basic_with_quote.iter().map(|s| s.as_str()).collect::<Vec<&str>>();
#[rustfmt::skip]
let confs = &[
/* first we bench each criterion alone */
utils::Conf {
group_name: "proximity",
queries: &[
"black saint sinner lady ",
"les dangeureuses 1960 ",
"The Disneyland Sing-Along Chorus ",
"Under Great Northern Lights ",
"7000 Danses Un Jour Dans Notre Vie ",
],
criterion: Some(&["proximity"]),
optional_words: false,
..BASE_CONF
},
utils::Conf {
group_name: "typo",
queries: &[
"mongus ",
"thelonius monk ",
"Disnaylande ",
"the white striper ",
"indochie ",
"indochien ",
"klub des loopers ",
"fear of the duck ",
"michel depech ",
"stromal ",
"dire straights ",
"Arethla Franklin ",
],
criterion: Some(&["typo"]),
optional_words: false,
..BASE_CONF
},
utils::Conf {
group_name: "words",
queries: &[
"the black saint and the sinner lady and the good doggo ", // four words to pop
"les liaisons dangeureuses 1793 ", // one word to pop
"The Disneyland Children's Sing-Alone song ", // two words to pop
"seven nation mummy ", // one word to pop
"7000 Danses / Le Baiser / je me trompe de mots ", // four words to pop
"Bring Your Daughter To The Slaughter but now this is not part of the title ", // nine words to pop
"whathavenotnsuchforth and a good amount of words to pop to match the first one ", // 13
],
criterion: Some(&["words"]),
..BASE_CONF
},
utils::Conf {
group_name: "asc",
criterion: Some(&["released-timestamp:desc"]),
..BASE_CONF
},
utils::Conf {
group_name: "desc",
criterion: Some(&["released-timestamp:desc"]),
..BASE_CONF
},
/* then we bench the asc and desc criterion on top of the default criterion */
utils::Conf {
group_name: "asc + default",
criterion: Some(&asc_default[..]),
..BASE_CONF
},
utils::Conf {
group_name: "desc + default",
criterion: Some(&desc_default[..]),
..BASE_CONF
},
/* we bench the filters with the default request */
utils::Conf {
group_name: "basic filter: <=",
filter: Some("released-timestamp <= 946728000"), // year 2000
..BASE_CONF
},
utils::Conf {
group_name: "basic filter: TO",
filter: Some("released-timestamp 946728000 TO 1262347200"), // year 2000 to 2010
..BASE_CONF
},
utils::Conf {
group_name: "big filter",
filter: Some("released-timestamp != 1262347200 AND (NOT (released-timestamp = 946728000)) AND (duration-float = 1 OR (duration-float 1.1 TO 1.5 AND released-timestamp > 315576000))"),
..BASE_CONF
},
/* then we bench some global / normal search with all the default criterion in the default
* order */
utils::Conf {
group_name: "basic placeholder",
queries: &[""],
..BASE_CONF
},
utils::Conf {
group_name: "basic without quote",
queries: &BASE_CONF
.queries
.iter()
.map(|s| s.trim()) // we remove the space at the end of each request
.collect::<Vec<&str>>(),
..BASE_CONF
},
utils::Conf {
group_name: "basic with quote",
queries: basic_with_quote,
..BASE_CONF
},
utils::Conf {
group_name: "prefix search",
queries: &[
"s", // 500k+ results
"a", //
"b", //
"i", //
"x", // only 7k results
],
..BASE_CONF
},
];
utils::run_benches(c, confs);
}
criterion_group!(benches, bench_songs);
criterion_main!(benches);

View File

@ -1,130 +0,0 @@
mod datasets_paths;
mod utils;
use criterion::{criterion_group, criterion_main};
use milli::update::Settings;
use utils::Conf;
#[cfg(not(windows))]
#[global_allocator]
static ALLOC: mimalloc::MiMalloc = mimalloc::MiMalloc;
fn base_conf(builder: &mut Settings) {
let displayed_fields = ["title", "body", "url"].iter().map(|s| s.to_string()).collect();
builder.set_displayed_fields(displayed_fields);
let searchable_fields = ["title", "body"].iter().map(|s| s.to_string()).collect();
builder.set_searchable_fields(searchable_fields);
}
#[rustfmt::skip]
const BASE_CONF: Conf = Conf {
dataset: datasets_paths::SMOL_WIKI_ARTICLES,
queries: &[
"mingus ", // 46 candidates
"miles davis ", // 159
"rock and roll ", // 1007
"machine ", // 3448
"spain ", // 7002
"japan ", // 10.593
"france ", // 17.616
"film ", // 24.959
],
configure: base_conf,
..Conf::BASE
};
fn bench_songs(c: &mut criterion::Criterion) {
let basic_with_quote: Vec<String> = BASE_CONF
.queries
.iter()
.map(|s| {
s.trim().split(' ').map(|s| format!(r#""{}""#, s)).collect::<Vec<String>>().join(" ")
})
.collect();
let basic_with_quote: &[&str] =
&basic_with_quote.iter().map(|s| s.as_str()).collect::<Vec<&str>>();
#[rustfmt::skip]
let confs = &[
/* first we bench each criterion alone */
utils::Conf {
group_name: "proximity",
queries: &[
"herald sings ",
"april paris ",
"tea two ",
"diesel engine ",
],
criterion: Some(&["proximity"]),
optional_words: false,
..BASE_CONF
},
utils::Conf {
group_name: "typo",
queries: &[
"migrosoft ",
"linax ",
"Disnaylande ",
"phytogropher ",
"nympalidea ",
"aritmetric ",
"the fronce ",
"sisan ",
],
criterion: Some(&["typo"]),
optional_words: false,
..BASE_CONF
},
utils::Conf {
group_name: "words",
queries: &[
"the black saint and the sinner lady and the good doggo ", // four words to pop, 27 results
"Kameya Tokujirō mingus monk ", // two words to pop, 55
"Ulrich Hensel meilisearch milli ", // two words to pop, 306
"Idaho Bellevue pizza ", // one word to pop, 800
"Abraham machin ", // one word to pop, 1141
],
criterion: Some(&["words"]),
..BASE_CONF
},
/* then we bench some global / normal search with all the default criterion in the default
* order */
utils::Conf {
group_name: "basic placeholder",
queries: &[""],
..BASE_CONF
},
utils::Conf {
group_name: "basic without quote",
queries: &BASE_CONF
.queries
.iter()
.map(|s| s.trim()) // we remove the space at the end of each request
.collect::<Vec<&str>>(),
..BASE_CONF
},
utils::Conf {
group_name: "basic with quote",
queries: basic_with_quote,
..BASE_CONF
},
utils::Conf {
group_name: "prefix search",
queries: &[
"t", // 453k results
"c", // 405k
"g", // 318k
"j", // 227k
"q", // 71k
"x", // 17k
],
..BASE_CONF
},
];
utils::run_benches(c, confs);
}
criterion_group!(benches, bench_songs);
criterion_main!(benches);

View File

@ -1,336 +0,0 @@
#![allow(dead_code)]
use std::fs::{create_dir_all, remove_dir_all, File};
use std::io::{self, BufReader, BufWriter, Read};
use std::path::Path;
use std::str::FromStr as _;
use anyhow::Context;
use bumpalo::Bump;
use criterion::BenchmarkId;
use memmap2::Mmap;
use milli::heed::EnvOpenOptions;
use milli::progress::Progress;
use milli::update::new::indexer;
use milli::update::{IndexerConfig, Settings};
use milli::vector::EmbeddingConfigs;
use milli::{Criterion, Filter, Index, Object, TermsMatchingStrategy};
use serde_json::Value;
pub struct Conf<'a> {
/// where we are going to create our database.mmdb directory
/// each benchmark will first try to delete it and then recreate it
pub database_name: &'a str,
/// the dataset to be used, it must be an uncompressed csv
pub dataset: &'a str,
/// The format of the dataset
pub dataset_format: &'a str,
pub group_name: &'a str,
pub queries: &'a [&'a str],
/// here you can change which criterion are used and in which order.
/// - if you specify something all the base configuration will be thrown out
/// - if you don't specify anything (None) the default configuration will be kept
pub criterion: Option<&'a [&'a str]>,
/// the last chance to configure your database as you want
pub configure: fn(&mut Settings),
pub filter: Option<&'a str>,
pub sort: Option<Vec<&'a str>>,
/// enable or disable the optional words on the query
pub optional_words: bool,
/// primary key, if there is None we'll auto-generate docids for every documents
pub primary_key: Option<&'a str>,
}
impl Conf<'_> {
pub const BASE: Self = Conf {
database_name: "benches.mmdb",
dataset_format: "csv",
dataset: "",
group_name: "",
queries: &[],
criterion: None,
configure: |_| (),
filter: None,
sort: None,
optional_words: true,
primary_key: None,
};
}
pub fn base_setup(conf: &Conf) -> Index {
match remove_dir_all(conf.database_name) {
Ok(_) => (),
Err(e) if e.kind() == std::io::ErrorKind::NotFound => (),
Err(e) => panic!("{}", e),
}
create_dir_all(conf.database_name).unwrap();
let options = EnvOpenOptions::new();
let mut options = options.read_txn_without_tls();
options.map_size(100 * 1024 * 1024 * 1024); // 100 GB
options.max_readers(100);
let index = Index::new(options, conf.database_name, true).unwrap();
let config = IndexerConfig::default();
let mut wtxn = index.write_txn().unwrap();
let mut builder = Settings::new(&mut wtxn, &index, &config);
if let Some(primary_key) = conf.primary_key {
builder.set_primary_key(primary_key.to_string());
}
if let Some(criterion) = conf.criterion {
builder.reset_filterable_fields();
builder.reset_criteria();
builder.reset_stop_words();
let criterion = criterion.iter().map(|s| Criterion::from_str(s).unwrap()).collect();
builder.set_criteria(criterion);
}
(conf.configure)(&mut builder);
builder.execute(|_| (), || false).unwrap();
wtxn.commit().unwrap();
let config = IndexerConfig::default();
let mut wtxn = index.write_txn().unwrap();
let rtxn = index.read_txn().unwrap();
let db_fields_ids_map = index.fields_ids_map(&rtxn).unwrap();
let mut new_fields_ids_map = db_fields_ids_map.clone();
let documents = documents_from(conf.dataset, conf.dataset_format);
let mut indexer = indexer::DocumentOperation::new();
indexer.replace_documents(&documents).unwrap();
let indexer_alloc = Bump::new();
let (document_changes, _operation_stats, primary_key) = indexer
.into_changes(
&indexer_alloc,
&index,
&rtxn,
None,
&mut new_fields_ids_map,
&|| false,
Progress::default(),
)
.unwrap();
indexer::index(
&mut wtxn,
&index,
&milli::ThreadPoolNoAbortBuilder::new().build().unwrap(),
config.grenad_parameters(),
&db_fields_ids_map,
new_fields_ids_map,
primary_key,
&document_changes,
EmbeddingConfigs::default(),
&|| false,
&Progress::default(),
)
.unwrap();
wtxn.commit().unwrap();
drop(rtxn);
index
}
pub fn run_benches(c: &mut criterion::Criterion, confs: &[Conf]) {
for conf in confs {
let index = base_setup(conf);
let file_name = Path::new(conf.dataset).file_name().and_then(|f| f.to_str()).unwrap();
let name = format!("{}: {}", file_name, conf.group_name);
let mut group = c.benchmark_group(&name);
for &query in conf.queries {
group.bench_with_input(BenchmarkId::from_parameter(query), &query, |b, &query| {
b.iter(|| {
let rtxn = index.read_txn().unwrap();
let mut search = index.search(&rtxn);
search.query(query).terms_matching_strategy(TermsMatchingStrategy::default());
if let Some(filter) = conf.filter {
let filter = Filter::from_str(filter).unwrap().unwrap();
search.filter(filter);
}
if let Some(sort) = &conf.sort {
let sort = sort.iter().map(|sort| sort.parse().unwrap()).collect();
search.sort_criteria(sort);
}
let _ids = search.execute().unwrap();
});
});
}
group.finish();
index.prepare_for_closing().wait();
}
}
pub fn documents_from(filename: &str, filetype: &str) -> Mmap {
let file = File::open(filename)
.unwrap_or_else(|_| panic!("could not find the dataset in: {filename}"));
match filetype {
"csv" => documents_from_csv(file).unwrap(),
"json" => documents_from_json(file).unwrap(),
"jsonl" => documents_from_jsonl(file).unwrap(),
otherwise => panic!("invalid update format {otherwise:?}"),
}
}
fn documents_from_jsonl(file: File) -> anyhow::Result<Mmap> {
unsafe { Mmap::map(&file).map_err(Into::into) }
}
fn documents_from_json(file: File) -> anyhow::Result<Mmap> {
let reader = BufReader::new(file);
let documents: Vec<milli::Object> = serde_json::from_reader(reader)?;
let mut output = tempfile::tempfile().map(BufWriter::new)?;
for document in documents {
serde_json::to_writer(&mut output, &document)?;
}
let file = output.into_inner()?;
unsafe { Mmap::map(&file).map_err(Into::into) }
}
fn documents_from_csv(file: File) -> anyhow::Result<Mmap> {
let output = tempfile::tempfile()?;
let mut output = BufWriter::new(output);
let mut reader = csv::ReaderBuilder::new().from_reader(file);
let headers = reader.headers().context("while retrieving headers")?.clone();
let typed_fields: Vec<_> = headers.iter().map(parse_csv_header).collect();
let mut object: serde_json::Map<_, _> =
typed_fields.iter().map(|(k, _)| (k.to_string(), Value::Null)).collect();
let mut line = 0;
let mut record = csv::StringRecord::new();
while reader.read_record(&mut record).context("while reading a record")? {
// We increment here and not at the end of the loop
// to take the header offset into account.
line += 1;
// Reset the document values
object.iter_mut().for_each(|(_, v)| *v = Value::Null);
for (i, (name, atype)) in typed_fields.iter().enumerate() {
let value = &record[i];
let trimmed_value = value.trim();
let value = match atype {
AllowedType::Number if trimmed_value.is_empty() => Value::Null,
AllowedType::Number => {
match trimmed_value.parse::<i64>() {
Ok(integer) => Value::from(integer),
Err(_) => match trimmed_value.parse::<f64>() {
Ok(float) => Value::from(float),
Err(error) => {
anyhow::bail!("document format error on line {line}: {error}. For value: {value}")
}
},
}
}
AllowedType::Boolean if trimmed_value.is_empty() => Value::Null,
AllowedType::Boolean => match trimmed_value.parse::<bool>() {
Ok(bool) => Value::from(bool),
Err(error) => {
anyhow::bail!(
"document format error on line {line}: {error}. For value: {value}"
)
}
},
AllowedType::String if value.is_empty() => Value::Null,
AllowedType::String => Value::from(value),
};
*object.get_mut(name).expect("encountered an unknown field") = value;
}
serde_json::to_writer(&mut output, &object).context("while writing to disk")?;
}
let output = output.into_inner()?;
unsafe { Mmap::map(&output).map_err(Into::into) }
}
enum AllowedType {
String,
Boolean,
Number,
}
fn parse_csv_header(header: &str) -> (String, AllowedType) {
// if there are several separators we only split on the last one.
match header.rsplit_once(':') {
Some((field_name, field_type)) => match field_type {
"string" => (field_name.to_string(), AllowedType::String),
"boolean" => (field_name.to_string(), AllowedType::Boolean),
"number" => (field_name.to_string(), AllowedType::Number),
// if the pattern isn't recognized, we keep the whole field.
_otherwise => (header.to_string(), AllowedType::String),
},
None => (header.to_string(), AllowedType::String),
}
}
struct CSVDocumentDeserializer<R>
where
R: Read,
{
documents: csv::StringRecordsIntoIter<R>,
headers: Vec<(String, AllowedType)>,
}
impl<R: Read> CSVDocumentDeserializer<R> {
fn from_reader(reader: R) -> io::Result<Self> {
let mut records = csv::Reader::from_reader(reader);
let headers = records.headers()?.into_iter().map(parse_csv_header).collect();
Ok(Self { documents: records.into_records(), headers })
}
}
impl<R: Read> Iterator for CSVDocumentDeserializer<R> {
type Item = anyhow::Result<Object>;
fn next(&mut self) -> Option<Self::Item> {
let csv_document = self.documents.next()?;
match csv_document {
Ok(csv_document) => {
let mut document = Object::new();
for ((field_name, field_type), value) in
self.headers.iter().zip(csv_document.into_iter())
{
let parsed_value: anyhow::Result<Value> = match field_type {
AllowedType::Number => {
value.parse::<f64>().map(Value::from).map_err(Into::into)
}
AllowedType::Boolean => {
value.parse::<bool>().map(Value::from).map_err(Into::into)
}
AllowedType::String => Ok(Value::String(value.to_string())),
};
match parsed_value {
Ok(value) => drop(document.insert(field_name.to_string(), value)),
Err(_e) => {
return Some(Err(anyhow::anyhow!(
"Value '{}' is not a valid number",
value
)))
}
}
}
Some(Ok(document))
}
Err(e) => Some(Err(anyhow::anyhow!("Error parsing csv document: {}", e))),
}
}
}

View File

@ -1,115 +0,0 @@
use std::fs::File;
use std::io::{Cursor, Read, Seek, Write};
use std::path::{Path, PathBuf};
use std::{env, fs};
use bytes::Bytes;
use convert_case::{Case, Casing};
use flate2::read::GzDecoder;
use reqwest::IntoUrl;
const BASE_URL: &str = "https://milli-benchmarks.fra1.digitaloceanspaces.com/datasets";
const DATASET_SONGS: (&str, &str) = ("smol-songs", "csv");
const DATASET_SONGS_1_2: (&str, &str) = ("smol-songs-1_2", "csv");
const DATASET_SONGS_3_4: (&str, &str) = ("smol-songs-3_4", "csv");
const DATASET_SONGS_4_4: (&str, &str) = ("smol-songs-4_4", "csv");
const DATASET_WIKI: (&str, &str) = ("smol-wiki-articles", "csv");
const DATASET_WIKI_1_2: (&str, &str) = ("smol-wiki-articles-1_2", "csv");
const DATASET_WIKI_3_4: (&str, &str) = ("smol-wiki-articles-3_4", "csv");
const DATASET_WIKI_4_4: (&str, &str) = ("smol-wiki-articles-4_4", "csv");
const DATASET_MOVIES: (&str, &str) = ("movies", "json");
const DATASET_MOVIES_1_2: (&str, &str) = ("movies-1_2", "json");
const DATASET_MOVIES_3_4: (&str, &str) = ("movies-3_4", "json");
const DATASET_MOVIES_4_4: (&str, &str) = ("movies-4_4", "json");
const DATASET_NESTED_MOVIES: (&str, &str) = ("nested_movies", "json");
const DATASET_GEO: (&str, &str) = ("smol-all-countries", "jsonl");
const ALL_DATASETS: &[(&str, &str)] = &[
DATASET_SONGS,
DATASET_SONGS_1_2,
DATASET_SONGS_3_4,
DATASET_SONGS_4_4,
DATASET_WIKI,
DATASET_WIKI_1_2,
DATASET_WIKI_3_4,
DATASET_WIKI_4_4,
DATASET_MOVIES,
DATASET_MOVIES_1_2,
DATASET_MOVIES_3_4,
DATASET_MOVIES_4_4,
DATASET_NESTED_MOVIES,
DATASET_GEO,
];
/// The name of the environment variable used to select the path
/// of the directory containing the datasets
const BASE_DATASETS_PATH_KEY: &str = "MILLI_BENCH_DATASETS_PATH";
fn main() -> anyhow::Result<()> {
let out_dir = PathBuf::from(env::var(BASE_DATASETS_PATH_KEY).unwrap_or(env::var("OUT_DIR")?));
let benches_dir = PathBuf::from(env::var("CARGO_MANIFEST_DIR")?).join("benches");
let mut manifest_paths_file = File::create(benches_dir.join("datasets_paths.rs"))?;
write!(
manifest_paths_file,
r#"//! This file is generated by the build script.
//! Do not modify by hand, use the build.rs file.
#![allow(dead_code)]
"#
)?;
writeln!(manifest_paths_file)?;
for (dataset, extension) in ALL_DATASETS {
let out_path = out_dir.join(dataset);
let out_file = out_path.with_extension(extension);
writeln!(
&mut manifest_paths_file,
r#"pub const {}: &str = {:?};"#,
dataset.to_case(Case::ScreamingSnake),
out_file.display(),
)?;
if out_file.exists() {
eprintln!(
"The dataset {} already exists on the file system and will not be downloaded again",
out_path.display(),
);
continue;
}
let url = format!("{}/{}.{}.gz", BASE_URL, dataset, extension);
eprintln!("downloading: {}", url);
let bytes = retry(|| download_dataset(url.clone()), 10)?;
eprintln!("{} downloaded successfully", url);
eprintln!("uncompressing in {}", out_file.display());
uncompress_in_file(bytes, &out_file)?;
}
Ok(())
}
fn retry<Ok, Err>(fun: impl Fn() -> Result<Ok, Err>, times: usize) -> Result<Ok, Err> {
for _ in 0..times {
if let ok @ Ok(_) = fun() {
return ok;
}
}
fun()
}
fn download_dataset<U: IntoUrl>(url: U) -> anyhow::Result<Cursor<Bytes>> {
let bytes =
reqwest::blocking::Client::builder().timeout(None).build()?.get(url).send()?.bytes()?;
Ok(Cursor::new(bytes))
}
fn uncompress_in_file<R: Read + Seek, P: AsRef<Path>>(bytes: R, path: P) -> anyhow::Result<()> {
let path = path.as_ref();
let mut gz = GzDecoder::new(bytes);
let mut dataset = Vec::new();
gz.read_to_end(&mut dataset)?;
fs::write(path, dataset)?;
Ok(())
}

View File

@ -1,38 +0,0 @@
#!/usr/bin/env bash
# Requirements:
# - critcmp. See: https://github.com/BurntSushi/critcmp
# - curl
# Usage
# $ bash compare.sh json_file1 json_file2
# ex: bash compare.sh songs_main_09a4321.json songs_geosearch_24ec456.json
# Checking that critcmp is installed
command -v critcmp > /dev/null 2>&1
if [[ "$?" -ne 0 ]]; then
echo 'You must install critcmp to make this script work.'
echo 'See: https://github.com/BurntSushi/critcmp'
echo ' $ cargo install critcmp'
exit 1
fi
s3_url='https://milli-benchmarks.fra1.digitaloceanspaces.com/critcmp_results'
for file in $@
do
file_s3_url="$s3_url/$file"
file_local_path="/tmp/$file"
if [[ ! -f $file_local_path ]]; then
curl $file_s3_url --output $file_local_path --silent
if [[ "$?" -ne 0 ]]; then
echo 'curl command failed.'
exit 1
fi
fi
done
path_list=$(echo " $@" | sed 's/ / \/tmp\//g')
critcmp $path_list

View File

@ -1,14 +0,0 @@
#!/usr/bin/env bash
# Requirements:
# - curl
# - grep
res=$(curl -s https://milli-benchmarks.fra1.digitaloceanspaces.com | grep -o '<Key>[^<]\+' | cut -c 5- | grep critcmp_results/ | cut -c 18-)
for pattern in "$@"
do
res=$(echo "$res" | grep $pattern)
done
echo "$res"

View File

@ -1,5 +0,0 @@
//! This library is only used to isolate the benchmarks
//! from the original milli library.
//!
//! It does not include interesting functions for milli library
//! users only for milli contributors.

View File

@ -1,18 +0,0 @@
[package]
name = "build-info"
version.workspace = true
authors.workspace = true
description.workspace = true
homepage.workspace = true
readme.workspace = true
edition.workspace = true
license.workspace = true
# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
[dependencies]
time = { version = "0.3.37", features = ["parsing"] }
[build-dependencies]
anyhow = "1.0.95"
vergen-git2 = "1.0.2"

View File

@ -1,29 +0,0 @@
fn main() {
if let Err(err) = emit_git_variables() {
println!("cargo:warning=vergen: {}", err);
}
}
fn emit_git_variables() -> anyhow::Result<()> {
println!("cargo::rerun-if-env-changed=MEILI_NO_VERGEN");
let has_vergen =
!matches!(std::env::var_os("MEILI_NO_VERGEN"), Some(x) if x != "false" && x != "0");
anyhow::ensure!(has_vergen, "disabled via `MEILI_NO_VERGEN`");
// Note: any code that needs VERGEN_ environment variables should take care to define them manually in the Dockerfile and pass them
// in the corresponding GitHub workflow (publish_docker.yml).
// This is due to the Dockerfile building the binary outside of the git directory.
let mut builder = vergen_git2::Git2Builder::default();
builder.branch(true);
builder.commit_timestamp(true);
builder.commit_message(true);
builder.describe(true, true, None);
builder.sha(false);
let git2 = builder.build()?;
vergen_git2::Emitter::default().fail_on_error().add_instructions(&git2)?.emit()
}

View File

@ -1,203 +0,0 @@
use time::format_description::well_known::Iso8601;
#[derive(Debug, Clone)]
pub struct BuildInfo {
pub branch: Option<&'static str>,
pub describe: Option<DescribeResult>,
pub commit_sha1: Option<&'static str>,
pub commit_msg: Option<&'static str>,
pub commit_timestamp: Option<time::OffsetDateTime>,
}
impl BuildInfo {
pub fn from_build() -> Self {
let branch: Option<&'static str> = option_env!("VERGEN_GIT_BRANCH");
let describe = DescribeResult::from_build();
let commit_sha1 = option_env!("VERGEN_GIT_SHA");
let commit_msg = option_env!("VERGEN_GIT_COMMIT_MESSAGE");
let commit_timestamp = option_env!("VERGEN_GIT_COMMIT_TIMESTAMP");
let commit_timestamp = commit_timestamp.and_then(|commit_timestamp| {
time::OffsetDateTime::parse(commit_timestamp, &Iso8601::DEFAULT).ok()
});
Self { branch, describe, commit_sha1, commit_msg, commit_timestamp }
}
}
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum DescribeResult {
Prototype { name: &'static str },
Release { version: &'static str, major: u64, minor: u64, patch: u64 },
Prerelease { version: &'static str, major: u64, minor: u64, patch: u64, rc: u64 },
NotATag { describe: &'static str },
}
impl DescribeResult {
pub fn new(describe: &'static str) -> Self {
if let Some(name) = prototype_name(describe) {
Self::Prototype { name }
} else if let Some(release) = release_version(describe) {
release
} else if let Some(prerelease) = prerelease_version(describe) {
prerelease
} else {
Self::NotATag { describe }
}
}
pub fn from_build() -> Option<Self> {
let describe: &'static str = option_env!("VERGEN_GIT_DESCRIBE")?;
Some(Self::new(describe))
}
pub fn as_tag(&self) -> Option<&'static str> {
match self {
DescribeResult::Prototype { name } => Some(name),
DescribeResult::Release { version, .. } => Some(version),
DescribeResult::Prerelease { version, .. } => Some(version),
DescribeResult::NotATag { describe: _ } => None,
}
}
pub fn as_prototype(&self) -> Option<&'static str> {
match self {
DescribeResult::Prototype { name } => Some(name),
DescribeResult::Release { .. }
| DescribeResult::Prerelease { .. }
| DescribeResult::NotATag { .. } => None,
}
}
}
/// Parses the input as a prototype name.
///
/// Returns `Some(prototype_name)` if the following conditions are met on this value:
///
/// 1. starts with `prototype-`,
/// 2. ends with `-<some_number>`,
/// 3. does not end with `<some_number>-<some_number>`.
///
/// Otherwise, returns `None`.
fn prototype_name(describe: &'static str) -> Option<&'static str> {
if !describe.starts_with("prototype-") {
return None;
}
let mut rsplit_prototype = describe.rsplit('-');
// last component MUST be a number
rsplit_prototype.next()?.parse::<u64>().ok()?;
// the component before the last SHALL NOT be a number
rsplit_prototype.next()?.parse::<u64>().err()?;
Some(describe)
}
fn release_version(describe: &'static str) -> Option<DescribeResult> {
if !describe.starts_with('v') {
return None;
}
// full release version don't contain a `-`
if describe.contains('-') {
return None;
}
// full release version parse as vX.Y.Z, with X, Y, Z numbers.
let mut dots = describe[1..].split('.');
let major: u64 = dots.next()?.parse().ok()?;
let minor: u64 = dots.next()?.parse().ok()?;
let patch: u64 = dots.next()?.parse().ok()?;
if dots.next().is_some() {
return None;
}
Some(DescribeResult::Release { version: describe, major, minor, patch })
}
fn prerelease_version(describe: &'static str) -> Option<DescribeResult> {
// prerelease version is in the shape vM.N.P-rc.C
let mut hyphen = describe.rsplit('-');
let prerelease = hyphen.next()?;
if !prerelease.starts_with("rc.") {
return None;
}
let rc: u64 = prerelease[3..].parse().ok()?;
let release = hyphen.next()?;
let DescribeResult::Release { version: _, major, minor, patch } = release_version(release)?
else {
return None;
};
Some(DescribeResult::Prerelease { version: describe, major, minor, patch, rc })
}
#[cfg(test)]
mod test {
use super::DescribeResult;
fn assert_not_a_tag(describe: &'static str) {
assert_eq!(DescribeResult::NotATag { describe }, DescribeResult::new(describe))
}
fn assert_proto(describe: &'static str) {
assert_eq!(DescribeResult::Prototype { name: describe }, DescribeResult::new(describe))
}
fn assert_release(describe: &'static str, major: u64, minor: u64, patch: u64) {
assert_eq!(
DescribeResult::Release { version: describe, major, minor, patch },
DescribeResult::new(describe)
)
}
fn assert_prerelease(describe: &'static str, major: u64, minor: u64, patch: u64, rc: u64) {
assert_eq!(
DescribeResult::Prerelease { version: describe, major, minor, patch, rc },
DescribeResult::new(describe)
)
}
#[test]
fn not_a_tag() {
assert_not_a_tag("whatever-fuzzy");
assert_not_a_tag("whatever-fuzzy-5-ggg-dirty");
assert_not_a_tag("whatever-fuzzy-120-ggg-dirty");
// technically a tag, but not a proto nor a version, so not parsed as a tag
assert_not_a_tag("whatever");
// dirty version
assert_not_a_tag("v1.7.0-1-ggga-dirty");
assert_not_a_tag("v1.7.0-rc.1-1-ggga-dirty");
// after version
assert_not_a_tag("v1.7.0-1-ggga");
assert_not_a_tag("v1.7.0-rc.1-1-ggga");
// after proto
assert_not_a_tag("protoype-tag-0-1-ggga");
assert_not_a_tag("protoype-tag-0-1-ggga-dirty");
}
#[test]
fn prototype() {
assert_proto("prototype-tag-0");
assert_proto("prototype-tag-10");
assert_proto("prototype-long-name-tag-10");
}
#[test]
fn release() {
assert_release("v1.7.2", 1, 7, 2);
}
#[test]
fn prerelease() {
assert_prerelease("v1.7.2-rc.3", 1, 7, 2, 3);
}
}

View File

@ -1,34 +0,0 @@
[package]
name = "dump"
publish = false
version.workspace = true
authors.workspace = true
description.workspace = true
edition.workspace = true
homepage.workspace = true
readme.workspace = true
license.workspace = true
[dependencies]
anyhow = "1.0.95"
flate2 = "1.0.35"
http = "1.2.0"
meilisearch-types = { path = "../meilisearch-types" }
once_cell = "1.20.2"
regex = "1.11.1"
roaring = { version = "0.10.10", features = ["serde"] }
serde = { version = "1.0.217", features = ["derive"] }
serde_json = { version = "1.0.135", features = ["preserve_order"] }
tar = "0.4.43"
tempfile = "3.15.0"
thiserror = "2.0.9"
time = { version = "0.3.37", features = ["serde-well-known", "formatting", "parsing", "macros"] }
tracing = "0.1.41"
uuid = { version = "1.11.0", features = ["serde", "v4"] }
[dev-dependencies]
big_s = "1.0.2"
maplit = "1.0.2"
meili-snap = { path = "../meili-snap" }
meilisearch-types = { path = "../meilisearch-types" }

View File

@ -1,38 +0,0 @@
---
source: dump/src/reader/compat/v1_to_v2.rs
expression: products.settings().unwrap()
---
{
"displayedAttributes": [
"*"
],
"searchableAttributes": [
"*"
],
"filterableAttributes": [],
"sortableAttributes": [],
"rankingRules": [
"typo",
"words",
"proximity",
"attribute",
"exactness"
],
"stopWords": [],
"synonyms": {
"android": [
"phone",
"smartphone"
],
"iphone": [
"phone",
"smartphone"
],
"phone": [
"android",
"iphone",
"smartphone"
]
},
"distinctAttribute": null
}

View File

@ -1,31 +0,0 @@
---
source: dump/src/reader/compat/v1_to_v2.rs
expression: movies.settings().unwrap()
---
{
"displayedAttributes": [
"*"
],
"searchableAttributes": [
"*"
],
"filterableAttributes": [
"genres",
"id"
],
"sortableAttributes": [
"genres",
"id"
],
"rankingRules": [
"typo",
"words",
"proximity",
"attribute",
"exactness",
"release_date:asc"
],
"stopWords": [],
"synonyms": {},
"distinctAttribute": null
}

View File

@ -1,24 +0,0 @@
---
source: dump/src/reader/compat/v1_to_v2.rs
expression: spells.settings().unwrap()
---
{
"displayedAttributes": [
"*"
],
"searchableAttributes": [
"*"
],
"filterableAttributes": [],
"sortableAttributes": [],
"rankingRules": [
"typo",
"words",
"proximity",
"attribute",
"exactness"
],
"stopWords": [],
"synonyms": {},
"distinctAttribute": null
}

View File

@ -1,23 +0,0 @@
---
source: dump/src/reader/compat/v2_to_v3.rs
expression: movies2.settings().unwrap()
---
{
"displayedAttributes": [
"*"
],
"searchableAttributes": [
"*"
],
"filterableAttributes": [],
"rankingRules": [
"words",
"typo",
"proximity",
"attribute",
"exactness"
],
"stopWords": [],
"synonyms": {},
"distinctAttribute": null
}

View File

@ -1,23 +0,0 @@
---
source: dump/src/reader/compat/v2_to_v3.rs
expression: spells.settings().unwrap()
---
{
"displayedAttributes": [
"*"
],
"searchableAttributes": [
"*"
],
"filterableAttributes": [],
"rankingRules": [
"words",
"typo",
"proximity",
"attribute",
"exactness"
],
"stopWords": [],
"synonyms": {},
"distinctAttribute": null
}

View File

@ -1,37 +0,0 @@
---
source: dump/src/reader/compat/v2_to_v3.rs
expression: products.settings().unwrap()
---
{
"displayedAttributes": [
"*"
],
"searchableAttributes": [
"*"
],
"filterableAttributes": [],
"rankingRules": [
"words",
"typo",
"proximity",
"attribute",
"exactness"
],
"stopWords": [],
"synonyms": {
"android": [
"phone",
"smartphone"
],
"iphone": [
"phone",
"smartphone"
],
"phone": [
"android",
"iphone",
"smartphone"
]
},
"distinctAttribute": null
}

View File

@ -1,24 +0,0 @@
---
source: dump/src/reader/compat/v2_to_v3.rs
expression: movies.settings().unwrap()
---
{
"displayedAttributes": [
"*"
],
"searchableAttributes": [
"*"
],
"filterableAttributes": [],
"rankingRules": [
"words",
"typo",
"proximity",
"attribute",
"exactness",
"release_date:asc"
],
"stopWords": [],
"synonyms": {},
"distinctAttribute": null
}

View File

@ -1,25 +0,0 @@
---
source: dump/src/reader/compat/v3_to_v4.rs
expression: movies2.settings().unwrap()
---
{
"displayedAttributes": [
"*"
],
"searchableAttributes": [
"*"
],
"filterableAttributes": [],
"sortableAttributes": [],
"rankingRules": [
"words",
"typo",
"proximity",
"attribute",
"sort",
"exactness"
],
"stopWords": [],
"synonyms": {},
"distinctAttribute": null
}

View File

@ -1,25 +0,0 @@
---
source: dump/src/reader/compat/v3_to_v4.rs
expression: spells.settings().unwrap()
---
{
"displayedAttributes": [
"*"
],
"searchableAttributes": [
"*"
],
"filterableAttributes": [],
"sortableAttributes": [],
"rankingRules": [
"words",
"typo",
"proximity",
"attribute",
"sort",
"exactness"
],
"stopWords": [],
"synonyms": {},
"distinctAttribute": null
}

View File

@ -1,39 +0,0 @@
---
source: dump/src/reader/compat/v3_to_v4.rs
expression: products.settings().unwrap()
---
{
"displayedAttributes": [
"*"
],
"searchableAttributes": [
"*"
],
"filterableAttributes": [],
"sortableAttributes": [],
"rankingRules": [
"words",
"typo",
"proximity",
"attribute",
"sort",
"exactness"
],
"stopWords": [],
"synonyms": {
"android": [
"phone",
"smartphone"
],
"iphone": [
"phone",
"smartphone"
],
"phone": [
"android",
"iphone",
"smartphone"
]
},
"distinctAttribute": null
}

View File

@ -1,31 +0,0 @@
---
source: dump/src/reader/compat/v3_to_v4.rs
expression: movies.settings().unwrap()
---
{
"displayedAttributes": [
"*"
],
"searchableAttributes": [
"*"
],
"filterableAttributes": [
"genres",
"id"
],
"sortableAttributes": [
"release_date"
],
"rankingRules": [
"words",
"typo",
"proximity",
"attribute",
"sort",
"exactness",
"release_date:asc"
],
"stopWords": [],
"synonyms": {},
"distinctAttribute": null
}

View File

@ -1,56 +0,0 @@
---
source: dump/src/reader/compat/v4_to_v5.rs
expression: spells.settings().unwrap()
---
{
"displayedAttributes": "Reset",
"searchableAttributes": "Reset",
"filterableAttributes": {
"Set": []
},
"sortableAttributes": {
"Set": []
},
"rankingRules": {
"Set": [
"words",
"typo",
"proximity",
"attribute",
"sort",
"exactness"
]
},
"stopWords": {
"Set": []
},
"synonyms": {
"Set": {}
},
"distinctAttribute": "Reset",
"typoTolerance": {
"Set": {
"enabled": {
"Set": true
},
"minWordSizeForTypos": {
"Set": {
"oneTypo": {
"Set": 5
},
"twoTypos": {
"Set": 9
}
}
},
"disableOnWords": {
"Set": []
},
"disableOnAttributes": {
"Set": []
}
}
},
"faceting": "NotSet",
"pagination": "NotSet"
}

View File

@ -1,70 +0,0 @@
---
source: dump/src/reader/compat/v4_to_v5.rs
expression: products.settings().unwrap()
---
{
"displayedAttributes": "Reset",
"searchableAttributes": "Reset",
"filterableAttributes": {
"Set": []
},
"sortableAttributes": {
"Set": []
},
"rankingRules": {
"Set": [
"words",
"typo",
"proximity",
"attribute",
"sort",
"exactness"
]
},
"stopWords": {
"Set": []
},
"synonyms": {
"Set": {
"android": [
"phone",
"smartphone"
],
"iphone": [
"phone",
"smartphone"
],
"phone": [
"android",
"iphone",
"smartphone"
]
}
},
"distinctAttribute": "Reset",
"typoTolerance": {
"Set": {
"enabled": {
"Set": true
},
"minWordSizeForTypos": {
"Set": {
"oneTypo": {
"Set": 5
},
"twoTypos": {
"Set": 9
}
}
},
"disableOnWords": {
"Set": []
},
"disableOnAttributes": {
"Set": []
}
}
},
"faceting": "NotSet",
"pagination": "NotSet"
}

View File

@ -1,62 +0,0 @@
---
source: dump/src/reader/compat/v4_to_v5.rs
expression: movies.settings().unwrap()
---
{
"displayedAttributes": "Reset",
"searchableAttributes": "Reset",
"filterableAttributes": {
"Set": [
"genres",
"id"
]
},
"sortableAttributes": {
"Set": [
"release_date"
]
},
"rankingRules": {
"Set": [
"words",
"typo",
"proximity",
"attribute",
"sort",
"exactness",
"release_date:asc"
]
},
"stopWords": {
"Set": []
},
"synonyms": {
"Set": {}
},
"distinctAttribute": "Reset",
"typoTolerance": {
"Set": {
"enabled": {
"Set": true
},
"minWordSizeForTypos": {
"Set": {
"oneTypo": {
"Set": 5
},
"twoTypos": {
"Set": 9
}
}
},
"disableOnWords": {
"Set": []
},
"disableOnAttributes": {
"Set": []
}
}
},
"faceting": "NotSet",
"pagination": "NotSet"
}

View File

@ -1,40 +0,0 @@
---
source: dump/src/reader/compat/v5_to_v6.rs
expression: spells.settings().unwrap()
---
{
"displayedAttributes": [
"*"
],
"searchableAttributes": [
"*"
],
"filterableAttributes": [],
"sortableAttributes": [],
"rankingRules": [
"words",
"typo",
"proximity",
"attribute",
"sort",
"exactness"
],
"stopWords": [],
"synonyms": {},
"distinctAttribute": null,
"typoTolerance": {
"enabled": true,
"minWordSizeForTypos": {
"oneTypo": 5,
"twoTypos": 9
},
"disableOnWords": [],
"disableOnAttributes": []
},
"faceting": {
"maxValuesPerFacet": 100
},
"pagination": {
"maxTotalHits": 1000
}
}

View File

@ -1,54 +0,0 @@
---
source: dump/src/reader/compat/v5_to_v6.rs
expression: products.settings().unwrap()
---
{
"displayedAttributes": [
"*"
],
"searchableAttributes": [
"*"
],
"filterableAttributes": [],
"sortableAttributes": [],
"rankingRules": [
"words",
"typo",
"proximity",
"attribute",
"sort",
"exactness"
],
"stopWords": [],
"synonyms": {
"android": [
"phone",
"smartphone"
],
"iphone": [
"phone",
"smartphone"
],
"phone": [
"android",
"iphone",
"smartphone"
]
},
"distinctAttribute": null,
"typoTolerance": {
"enabled": true,
"minWordSizeForTypos": {
"oneTypo": 5,
"twoTypos": 9
},
"disableOnWords": [],
"disableOnAttributes": []
},
"faceting": {
"maxValuesPerFacet": 100
},
"pagination": {
"maxTotalHits": 1000
}
}

View File

@ -1,46 +0,0 @@
---
source: dump/src/reader/compat/v5_to_v6.rs
expression: movies.settings().unwrap()
---
{
"displayedAttributes": [
"*"
],
"searchableAttributes": [
"*"
],
"filterableAttributes": [
"genres",
"id"
],
"sortableAttributes": [
"release_date"
],
"rankingRules": [
"words",
"typo",
"proximity",
"attribute",
"sort",
"exactness",
"release_date:asc"
],
"stopWords": [],
"synonyms": {},
"distinctAttribute": null,
"typoTolerance": {
"enabled": true,
"minWordSizeForTypos": {
"oneTypo": 5,
"twoTypos": 9
},
"disableOnWords": [],
"disableOnAttributes": []
},
"faceting": {
"maxValuesPerFacet": 100
},
"pagination": {
"maxTotalHits": 1000
}
}

View File

@ -1,410 +0,0 @@
use std::str::FromStr;
use super::v2_to_v3::CompatV2ToV3;
use crate::reader::{v1, v2, Document};
use crate::Result;
pub struct CompatV1ToV2 {
pub from: v1::V1Reader,
}
impl CompatV1ToV2 {
pub fn new(v1: v1::V1Reader) -> Self {
Self { from: v1 }
}
pub fn to_v3(self) -> CompatV2ToV3 {
CompatV2ToV3::Compat(self)
}
pub fn version(&self) -> crate::Version {
self.from.version()
}
pub fn date(&self) -> Option<time::OffsetDateTime> {
self.from.date()
}
pub fn index_uuid(&self) -> Vec<v2::meta::IndexUuid> {
self.from
.index_uuid()
.into_iter()
.enumerate()
// we use the index of the index 😬 as UUID for the index, so that we can link the v2::Task to their index
.map(|(index, index_uuid)| v2::meta::IndexUuid {
uid: index_uuid.uid,
uuid: uuid::Uuid::from_u128(index as u128),
})
.collect()
}
pub fn indexes(&self) -> Result<impl Iterator<Item = Result<CompatIndexV1ToV2>> + '_> {
Ok(self.from.indexes()?.map(|index_reader| Ok(CompatIndexV1ToV2 { from: index_reader? })))
}
pub fn tasks(
&mut self,
) -> Box<dyn Iterator<Item = Result<(v2::Task, Option<v2::UpdateFile>)>> + '_> {
// Convert an error here to an iterator yielding the error
let indexes = match self.from.indexes() {
Ok(indexes) => indexes,
Err(err) => return Box::new(std::iter::once(Err(err))),
};
let it = indexes.enumerate().flat_map(
move |(index, index_reader)| -> Box<dyn Iterator<Item = _>> {
let index_reader = match index_reader {
Ok(index_reader) => index_reader,
Err(err) => return Box::new(std::iter::once(Err(err))),
};
Box::new(
index_reader
.tasks()
// Filter out the UpdateStatus::Customs variant that is not supported in v2
// and enqueued tasks, that don't contain the necessary update file in v1
.filter_map(move |task| -> Option<_> {
let task = match task {
Ok(task) => task,
Err(err) => return Some(Err(err)),
};
Some(Ok((
v2::Task {
uuid: uuid::Uuid::from_u128(index as u128),
update: Option::from(task)?,
},
None,
)))
}),
)
},
);
Box::new(it)
}
}
pub struct CompatIndexV1ToV2 {
pub from: v1::V1IndexReader,
}
impl CompatIndexV1ToV2 {
pub fn metadata(&self) -> &crate::IndexMetadata {
self.from.metadata()
}
pub fn documents(&mut self) -> Result<Box<dyn Iterator<Item = Result<Document>> + '_>> {
self.from.documents().map(|it| Box::new(it) as Box<dyn Iterator<Item = _>>)
}
pub fn settings(&mut self) -> Result<v2::settings::Settings<v2::settings::Checked>> {
Ok(v2::settings::Settings::<v2::settings::Unchecked>::from(self.from.settings()?).check())
}
}
impl From<v1::settings::Settings> for v2::Settings<v2::Unchecked> {
fn from(source: v1::settings::Settings) -> Self {
Self {
displayed_attributes: option_to_setting(source.displayed_attributes)
.map(|displayed| displayed.into_iter().collect()),
searchable_attributes: option_to_setting(source.searchable_attributes),
filterable_attributes: option_to_setting(source.attributes_for_faceting.clone())
.map(|filterable| filterable.into_iter().collect()),
sortable_attributes: option_to_setting(source.attributes_for_faceting)
.map(|sortable| sortable.into_iter().collect()),
ranking_rules: option_to_setting(source.ranking_rules).map(|ranking_rules| {
ranking_rules
.into_iter()
.filter_map(|ranking_rule| {
match v1::settings::RankingRule::from_str(&ranking_rule) {
Ok(ranking_rule) => {
let criterion: Option<v2::settings::Criterion> =
ranking_rule.into();
criterion.as_ref().map(ToString::to_string)
}
Err(()) => {
tracing::warn!(
"Could not import the following ranking rule: `{}`.",
ranking_rule
);
None
}
}
})
.collect()
}),
stop_words: option_to_setting(source.stop_words),
synonyms: option_to_setting(source.synonyms),
distinct_attribute: option_to_setting(source.distinct_attribute),
_kind: std::marker::PhantomData,
}
}
}
fn option_to_setting<T>(opt: Option<Option<T>>) -> v2::Setting<T> {
match opt {
Some(Some(t)) => v2::Setting::Set(t),
None => v2::Setting::NotSet,
Some(None) => v2::Setting::Reset,
}
}
impl From<v1::update::UpdateStatus> for Option<v2::updates::UpdateStatus> {
fn from(source: v1::update::UpdateStatus) -> Self {
use v1::update::UpdateStatus as UpdateStatusV1;
use v2::updates::UpdateStatus as UpdateStatusV2;
Some(match source {
UpdateStatusV1::Enqueued { content } => {
tracing::warn!(
"Cannot import task {} (importing enqueued tasks from v1 dumps is unsupported)",
content.update_id
);
tracing::warn!("Task will be skipped in the queue of imported tasks.");
return None;
}
UpdateStatusV1::Failed { content } => UpdateStatusV2::Failed(v2::updates::Failed {
from: v2::updates::Processing {
from: v2::updates::Enqueued {
update_id: content.update_id,
meta: Option::from(content.update_type)?,
enqueued_at: content.enqueued_at,
content: None,
},
started_processing_at: content.processed_at
- std::time::Duration::from_secs_f64(content.duration),
},
error: v2::ResponseError {
// error code is ignored by serialization, and so always default in deserialized v2 dumps
// that's a good thing, because we don't have them in v1 dump 😅
code: http::StatusCode::default(),
message: content.error.unwrap_or_default(),
// error codes are unchanged between v1 and v2
error_code: content.error_code.unwrap_or_default(),
// error types are unchanged between v1 and v2
error_type: content.error_type.unwrap_or_default(),
// error links are unchanged between v1 and v2
error_link: content.error_link.unwrap_or_default(),
},
failed_at: content.processed_at,
}),
UpdateStatusV1::Processed { content } => {
UpdateStatusV2::Processed(v2::updates::Processed {
success: match &content.update_type {
v1::update::UpdateType::ClearAll => {
v2::updates::UpdateResult::DocumentDeletion { deleted: u64::MAX }
}
v1::update::UpdateType::Customs => v2::updates::UpdateResult::Other,
v1::update::UpdateType::DocumentsAddition { number } => {
v2::updates::UpdateResult::DocumentsAddition(
v2::updates::DocumentAdditionResult { nb_documents: *number },
)
}
v1::update::UpdateType::DocumentsPartial { number } => {
v2::updates::UpdateResult::DocumentsAddition(
v2::updates::DocumentAdditionResult { nb_documents: *number },
)
}
v1::update::UpdateType::DocumentsDeletion { number } => {
v2::updates::UpdateResult::DocumentDeletion { deleted: *number as u64 }
}
v1::update::UpdateType::Settings { .. } => v2::updates::UpdateResult::Other,
},
processed_at: content.processed_at,
from: v2::updates::Processing {
from: v2::updates::Enqueued {
update_id: content.update_id,
meta: Option::from(content.update_type)?,
enqueued_at: content.enqueued_at,
content: None,
},
started_processing_at: content.processed_at
- std::time::Duration::from_secs_f64(content.duration),
},
})
}
})
}
}
impl From<v1::update::UpdateType> for Option<v2::updates::UpdateMeta> {
fn from(source: v1::update::UpdateType) -> Self {
Some(match source {
v1::update::UpdateType::ClearAll => v2::updates::UpdateMeta::ClearDocuments,
v1::update::UpdateType::Customs => {
tracing::warn!("Ignoring task with type 'Customs' that is no longer supported");
return None;
}
v1::update::UpdateType::DocumentsAddition { .. } => {
v2::updates::UpdateMeta::DocumentsAddition {
method: v2::updates::IndexDocumentsMethod::ReplaceDocuments,
format: v2::updates::UpdateFormat::Json,
primary_key: None,
}
}
v1::update::UpdateType::DocumentsPartial { .. } => {
v2::updates::UpdateMeta::DocumentsAddition {
method: v2::updates::IndexDocumentsMethod::UpdateDocuments,
format: v2::updates::UpdateFormat::Json,
primary_key: None,
}
}
v1::update::UpdateType::DocumentsDeletion { .. } => {
v2::updates::UpdateMeta::DeleteDocuments { ids: vec![] }
}
v1::update::UpdateType::Settings { settings } => {
v2::updates::UpdateMeta::Settings((*settings).into())
}
})
}
}
impl From<v1::settings::SettingsUpdate> for v2::Settings<v2::Unchecked> {
fn from(source: v1::settings::SettingsUpdate) -> Self {
let ranking_rules = v2::Setting::from(source.ranking_rules);
// go from the concrete types of v1 (RankingRule) to the concrete type of v2 (Criterion),
// and then back to string as this is what the settings manipulate
let ranking_rules = ranking_rules.map(|ranking_rules| {
ranking_rules
.into_iter()
// filter out the WordsPosition ranking rule that exists in v1 but not v2
.filter_map(Option::<v2::settings::Criterion>::from)
.map(|criterion| criterion.to_string())
.collect()
});
Self {
displayed_attributes: v2::Setting::from(source.displayed_attributes)
.map(|displayed_attributes| displayed_attributes.into_iter().collect()),
searchable_attributes: source.searchable_attributes.into(),
filterable_attributes: v2::Setting::from(source.attributes_for_faceting.clone())
.map(|attributes_for_faceting| attributes_for_faceting.into_iter().collect()),
sortable_attributes: v2::Setting::from(source.attributes_for_faceting)
.map(|attributes_for_faceting| attributes_for_faceting.into_iter().collect()),
ranking_rules,
stop_words: source.stop_words.into(),
synonyms: source.synonyms.into(),
distinct_attribute: source.distinct_attribute.into(),
_kind: std::marker::PhantomData,
}
}
}
impl From<v1::settings::RankingRule> for Option<v2::settings::Criterion> {
fn from(source: v1::settings::RankingRule) -> Self {
match source {
v1::settings::RankingRule::Typo => Some(v2::settings::Criterion::Typo),
v1::settings::RankingRule::Words => Some(v2::settings::Criterion::Words),
v1::settings::RankingRule::Proximity => Some(v2::settings::Criterion::Proximity),
v1::settings::RankingRule::Attribute => Some(v2::settings::Criterion::Attribute),
v1::settings::RankingRule::WordsPosition => {
tracing::warn!("Removing the 'WordsPosition' ranking rule that is no longer supported, please check the resulting ranking rules of your indexes");
None
}
v1::settings::RankingRule::Exactness => Some(v2::settings::Criterion::Exactness),
v1::settings::RankingRule::Asc(field_name) => {
Some(v2::settings::Criterion::Asc(field_name))
}
v1::settings::RankingRule::Desc(field_name) => {
Some(v2::settings::Criterion::Desc(field_name))
}
}
}
}
impl<T> From<v1::settings::UpdateState<T>> for v2::Setting<T> {
fn from(source: v1::settings::UpdateState<T>) -> Self {
match source {
v1::settings::UpdateState::Update(new_value) => v2::Setting::Set(new_value),
v1::settings::UpdateState::Clear => v2::Setting::Reset,
v1::settings::UpdateState::Nothing => v2::Setting::NotSet,
}
}
}
#[cfg(test)]
pub(crate) mod test {
use std::fs::File;
use std::io::BufReader;
use flate2::bufread::GzDecoder;
use meili_snap::insta;
use tempfile::TempDir;
use super::*;
#[test]
fn compat_v1_v2() {
let dump = File::open("tests/assets/v1.dump").unwrap();
let dir = TempDir::new().unwrap();
let mut dump = BufReader::new(dump);
let gz = GzDecoder::new(&mut dump);
let mut archive = tar::Archive::new(gz);
archive.unpack(dir.path()).unwrap();
let mut dump = v1::V1Reader::open(dir).unwrap().to_v2();
// top level infos
assert_eq!(dump.date(), None);
// tasks
let tasks = dump.tasks().collect::<Result<Vec<_>>>().unwrap();
let (tasks, update_files): (Vec<_>, Vec<_>) = tasks.into_iter().unzip();
meili_snap::snapshot_hash!(meili_snap::json_string!(tasks), @"2298010973ee98cf4670787314176a3a");
assert_eq!(update_files.len(), 9);
assert!(update_files[..].iter().all(|u| u.is_none())); // no update file in dumps v1
// indexes
let mut indexes = dump.indexes().unwrap().collect::<Result<Vec<_>>>().unwrap();
// the index are not ordered in any way by default
indexes.sort_by_key(|index| index.metadata().uid.to_string());
let mut products = indexes.pop().unwrap();
let mut movies = indexes.pop().unwrap();
let mut spells = indexes.pop().unwrap();
assert!(indexes.is_empty());
// products
insta::assert_json_snapshot!(products.metadata(), @r###"
{
"uid": "products",
"primaryKey": "sku",
"createdAt": "2022-10-02T13:23:39.976870431Z",
"updatedAt": "2022-10-02T13:27:54.353262482Z"
}
"###);
insta::assert_json_snapshot!(products.settings().unwrap());
let documents = products.documents().unwrap().collect::<Result<Vec<_>>>().unwrap();
assert_eq!(documents.len(), 10);
meili_snap::snapshot_hash!(format!("{:#?}", documents), @"b01c8371aea4c7171af0d4d846a2bdca");
// movies
insta::assert_json_snapshot!(movies.metadata(), @r###"
{
"uid": "movies",
"primaryKey": "id",
"createdAt": "2022-10-02T13:15:29.477512777Z",
"updatedAt": "2022-10-02T13:21:12.671204856Z"
}
"###);
insta::assert_json_snapshot!(movies.settings().unwrap());
let documents = movies.documents().unwrap().collect::<Result<Vec<_>>>().unwrap();
assert_eq!(documents.len(), 10);
meili_snap::snapshot_hash!(format!("{:#?}", documents), @"b63dbed5bbc059f3e32bc471ae699bf5");
// spells
insta::assert_json_snapshot!(spells.metadata(), @r###"
{
"uid": "dnd_spells",
"primaryKey": "index",
"createdAt": "2022-10-02T13:38:26.358882984Z",
"updatedAt": "2022-10-02T13:38:26.385609433Z"
}
"###);
insta::assert_json_snapshot!(spells.settings().unwrap());
let documents = spells.documents().unwrap().collect::<Result<Vec<_>>>().unwrap();
assert_eq!(documents.len(), 10);
meili_snap::snapshot_hash!(format!("{:#?}", documents), @"aa24c0cfc733d66c396237ad44263bed");
}
}

View File

@ -1,940 +0,0 @@
use std::fs::File;
use std::io::{BufReader, Read};
use flate2::bufread::GzDecoder;
use serde::Deserialize;
use tempfile::TempDir;
use self::compat::v4_to_v5::CompatV4ToV5;
use self::compat::v5_to_v6::{CompatIndexV5ToV6, CompatV5ToV6};
use self::v5::V5Reader;
use self::v6::{V6IndexReader, V6Reader};
use crate::{Result, Version};
mod compat;
mod v1;
mod v2;
mod v3;
mod v4;
mod v5;
mod v6;
pub type Document = serde_json::Map<String, serde_json::Value>;
pub type UpdateFile = dyn Iterator<Item = Result<Document>>;
#[allow(clippy::large_enum_variant)]
pub enum DumpReader {
Current(V6Reader),
Compat(CompatV5ToV6),
}
impl DumpReader {
pub fn open(dump: impl Read) -> Result<DumpReader> {
let path = TempDir::new()?;
let mut dump = BufReader::new(dump);
let gz = GzDecoder::new(&mut dump);
let mut archive = tar::Archive::new(gz);
archive.unpack(path.path())?;
#[derive(Deserialize)]
#[serde(rename_all = "camelCase")]
struct MetadataVersion {
pub dump_version: Version,
}
let mut meta_file = File::open(path.path().join("metadata.json"))?;
let MetadataVersion { dump_version } = serde_json::from_reader(&mut meta_file)?;
match dump_version {
Version::V1 => {
Ok(v1::V1Reader::open(path)?.to_v2().to_v3().to_v4().to_v5().to_v6().into())
}
Version::V2 => Ok(v2::V2Reader::open(path)?.to_v3().to_v4().to_v5().to_v6().into()),
Version::V3 => Ok(v3::V3Reader::open(path)?.to_v4().to_v5().to_v6().into()),
Version::V4 => Ok(v4::V4Reader::open(path)?.to_v5().to_v6().into()),
Version::V5 => Ok(v5::V5Reader::open(path)?.to_v6().into()),
Version::V6 => Ok(v6::V6Reader::open(path)?.into()),
}
}
pub fn version(&self) -> crate::Version {
match self {
DumpReader::Current(current) => current.version(),
DumpReader::Compat(compat) => compat.version(),
}
}
pub fn date(&self) -> Option<time::OffsetDateTime> {
match self {
DumpReader::Current(current) => current.date(),
DumpReader::Compat(compat) => compat.date(),
}
}
pub fn instance_uid(&self) -> Result<Option<uuid::Uuid>> {
match self {
DumpReader::Current(current) => current.instance_uid(),
DumpReader::Compat(compat) => compat.instance_uid(),
}
}
pub fn indexes(&self) -> Result<Box<dyn Iterator<Item = Result<DumpIndexReader>> + '_>> {
match self {
DumpReader::Current(current) => {
let indexes = Box::new(current.indexes()?.map(|res| res.map(DumpIndexReader::from)))
as Box<dyn Iterator<Item = Result<DumpIndexReader>> + '_>;
Ok(indexes)
}
DumpReader::Compat(compat) => {
let indexes = Box::new(compat.indexes()?.map(|res| res.map(DumpIndexReader::from)))
as Box<dyn Iterator<Item = Result<DumpIndexReader>> + '_>;
Ok(indexes)
}
}
}
pub fn tasks(
&mut self,
) -> Result<Box<dyn Iterator<Item = Result<(v6::Task, Option<Box<UpdateFile>>)>> + '_>> {
match self {
DumpReader::Current(current) => Ok(current.tasks()),
DumpReader::Compat(compat) => compat.tasks(),
}
}
pub fn batches(&mut self) -> Result<Box<dyn Iterator<Item = Result<v6::Batch>> + '_>> {
match self {
DumpReader::Current(current) => Ok(current.batches()),
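// Batches only exist in v6 dumps, so older dumps yield an empty iterator.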
DumpReader::Compat(_compat) => Ok(Box::new(std::iter::empty())),
}
}
pub fn keys(&mut self) -> Result<Box<dyn Iterator<Item = Result<v6::Key>> + '_>> {
match self {
DumpReader::Current(current) => Ok(current.keys()),
DumpReader::Compat(compat) => compat.keys(),
}
}
pub fn features(&self) -> Result<Option<v6::RuntimeTogglableFeatures>> {
match self {
DumpReader::Current(current) => Ok(current.features()),
DumpReader::Compat(compat) => compat.features(),
}
}
pub fn network(&self) -> Result<Option<&v6::Network>> {
match self {
DumpReader::Current(current) => Ok(current.network()),
DumpReader::Compat(compat) => compat.network(),
}
}
}
impl From<V6Reader> for DumpReader {
fn from(value: V6Reader) -> Self {
DumpReader::Current(value)
}
}
impl From<CompatV5ToV6> for DumpReader {
fn from(value: CompatV5ToV6) -> Self {
DumpReader::Compat(value)
}
}
impl From<V5Reader> for DumpReader {
fn from(value: V5Reader) -> Self {
DumpReader::Compat(value.to_v6())
}
}
impl From<CompatV4ToV5> for DumpReader {
fn from(value: CompatV4ToV5) -> Self {
DumpReader::Compat(value.to_v6())
}
}
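// Reader over a single index of a dump, either native v6 or wrapped in the v5-to-v6 compatibility layer.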
pub enum DumpIndexReader {
Current(v6::V6IndexReader),
Compat(Box<CompatIndexV5ToV6>),
}
impl DumpIndexReader {
pub fn new_v6(v6: v6::V6IndexReader) -> DumpIndexReader {
DumpIndexReader::Current(v6)
}
pub fn metadata(&self) -> &crate::IndexMetadata {
match self {
DumpIndexReader::Current(v6) => v6.metadata(),
DumpIndexReader::Compat(compat) => compat.metadata(),
}
}
pub fn documents(&mut self) -> Result<Box<dyn Iterator<Item = Result<Document>> + '_>> {
match self {
DumpIndexReader::Current(v6) => v6
.documents()
.map(|iter| Box::new(iter) as Box<dyn Iterator<Item = Result<Document>> + '_>),
DumpIndexReader::Compat(compat) => compat
.documents()
.map(|iter| Box::new(iter) as Box<dyn Iterator<Item = Result<Document>> + '_>),
}
}
pub fn settings(&mut self) -> Result<v6::Settings<v6::Checked>> {
match self {
DumpIndexReader::Current(v6) => v6.settings(),
DumpIndexReader::Compat(compat) => compat.settings(),
}
}
}
impl From<V6IndexReader> for DumpIndexReader {
fn from(value: V6IndexReader) -> Self {
DumpIndexReader::Current(value)
}
}
impl From<CompatIndexV5ToV6> for DumpIndexReader {
fn from(value: CompatIndexV5ToV6) -> Self {
DumpIndexReader::Compat(Box::new(value))
}
}
#[cfg(test)]
pub(crate) mod test {
use std::fs::File;
use meili_snap::insta;
use super::*;
use crate::reader::v6::RuntimeTogglableFeatures;
#[test]
fn import_dump_v6_with_vectors() {
// dump containing two indexes
//
// "vector", configured with an embedder
// contains:
// - one document with an overridden vector,
// - one document with a natural vector
// - one document with a _vectors map containing one additional embedder name and a natural vector
// - one document with a _vectors map containing one additional embedder name and an overridden vector
//
// "novector", no embedder
// contains:
// - a document without vector
// - a document with a random _vectors field
let dump = File::open("tests/assets/v6-with-vectors.dump").unwrap();
let mut dump = DumpReader::open(dump).unwrap();
// top level infos
insta::assert_snapshot!(dump.date().unwrap(), @"2024-05-16 15:51:34.151044 +00:00:00");
insta::assert_debug_snapshot!(dump.instance_uid().unwrap(), @"None");
// batches didn't exist at the time
let batches = dump.batches().unwrap().collect::<Result<Vec<_>>>().unwrap();
meili_snap::snapshot!(meili_snap::json_string!(batches), @"[]");
// tasks
let tasks = dump.tasks().unwrap().collect::<Result<Vec<_>>>().unwrap();
let (tasks, update_files): (Vec<_>, Vec<_>) = tasks.into_iter().unzip();
meili_snap::snapshot_hash!(meili_snap::json_string!(tasks), @"2b8a72d6bc6ba79980491966437daaf9");
assert_eq!(update_files.len(), 10);
assert!(update_files[0].is_none()); // the dump creation
assert!(update_files[1].is_none());
assert!(update_files[2].is_none());
assert!(update_files[3].is_none());
assert!(update_files[4].is_none());
assert!(update_files[5].is_none());
assert!(update_files[6].is_none());
assert!(update_files[7].is_none());
assert!(update_files[8].is_none());
assert!(update_files[9].is_none());
// indexes
let mut indexes = dump.indexes().unwrap().collect::<Result<Vec<_>>>().unwrap();
// the indexes are not ordered in any way by default
indexes.sort_by_key(|index| index.metadata().uid.to_string());
let mut vector_index = indexes.pop().unwrap();
let mut novector_index = indexes.pop().unwrap();
assert!(indexes.is_empty());
// vector
insta::assert_json_snapshot!(vector_index.metadata(), @r###"
{
"uid": "vector",
"primaryKey": "id",
"createdAt": "2024-05-16T15:33:17.240962Z",
"updatedAt": "2024-05-16T15:40:55.723052Z"
}
"###);
insta::assert_json_snapshot!(vector_index.settings().unwrap());
{
let documents: Result<Vec<_>> = vector_index.documents().unwrap().collect();
let mut documents = documents.unwrap();
assert_eq!(documents.len(), 4);
documents.sort_by_key(|doc| doc.get("id").unwrap().to_string());
{
let document = documents.pop().unwrap();
insta::assert_json_snapshot!(document);
}
{
let document = documents.pop().unwrap();
insta::assert_json_snapshot!(document);
}
{
let document = documents.pop().unwrap();
insta::assert_json_snapshot!(document);
}
{
let document = documents.pop().unwrap();
insta::assert_json_snapshot!(document);
}
}
// novector
insta::assert_json_snapshot!(novector_index.metadata(), @r###"
{
"uid": "novector",
"primaryKey": "id",
"createdAt": "2024-05-16T15:33:03.568055Z",
"updatedAt": "2024-05-16T15:33:07.530217Z"
}
"###);
insta::assert_json_snapshot!(novector_index.settings().unwrap().embedders, @"null");
{
let documents: Result<Vec<_>> = novector_index.documents().unwrap().collect();
let mut documents = documents.unwrap();
assert_eq!(documents.len(), 2);
documents.sort_by_key(|doc| doc.get("id").unwrap().to_string());
{
let document = documents.pop().unwrap();
insta::assert_json_snapshot!(document, @r###"
{
"id": "e1",
"other": "random1",
"_vectors": "toto"
}
"###);
}
{
let document = documents.pop().unwrap();
insta::assert_json_snapshot!(document, @r###"
{
"id": "e0",
"other": "random0"
}
"###);
}
}
assert_eq!(dump.features().unwrap().unwrap(), RuntimeTogglableFeatures::default());
assert_eq!(dump.network().unwrap(), None);
}
#[test]
fn import_dump_v6_experimental() {
let dump = File::open("tests/assets/v6-with-experimental.dump").unwrap();
let mut dump = DumpReader::open(dump).unwrap();
// top level infos
insta::assert_snapshot!(dump.date().unwrap(), @"2023-07-06 7:10:27.21958 +00:00:00");
insta::assert_debug_snapshot!(dump.instance_uid().unwrap(), @"None");
// batches didn't exist at the time
let batches = dump.batches().unwrap().collect::<Result<Vec<_>>>().unwrap();
meili_snap::snapshot!(meili_snap::json_string!(batches), @"[]");
// tasks
let tasks = dump.tasks().unwrap().collect::<Result<Vec<_>>>().unwrap();
let (tasks, update_files): (Vec<_>, Vec<_>) = tasks.into_iter().unzip();
meili_snap::snapshot_hash!(meili_snap::json_string!(tasks), @"3ddf6169b0a3703c5d770971f036fc5d");
assert_eq!(update_files.len(), 2);
assert!(update_files[0].is_none()); // the dump creation
assert!(update_files[1].is_none()); // the processed document addition
// keys
let keys = dump.keys().unwrap().collect::<Result<Vec<_>>>().unwrap();
meili_snap::snapshot_hash!(meili_snap::json_string!(keys), @"13c2da155e9729c2344688cab29af71d");
// indexes
let mut indexes = dump.indexes().unwrap().collect::<Result<Vec<_>>>().unwrap();
// the indexes are not ordered in any way by default
indexes.sort_by_key(|index| index.metadata().uid.to_string());
let mut test = indexes.pop().unwrap();
assert!(indexes.is_empty());
insta::assert_json_snapshot!(test.metadata(), @r###"
{
"uid": "test",
"primaryKey": "id",
"createdAt": "2023-07-06T07:07:41.364694Z",
"updatedAt": "2023-07-06T07:07:41.396114Z"
}
"###);
assert_eq!(test.documents().unwrap().count(), 1);
assert_eq!(dump.features().unwrap().unwrap(), RuntimeTogglableFeatures::default());
}
#[test]
fn import_dump_v6_network() {
let dump = File::open("tests/assets/v6-with-network.dump").unwrap();
let dump = DumpReader::open(dump).unwrap();
// top level infos
insta::assert_snapshot!(dump.date().unwrap(), @"2025-01-29 15:45:32.738676 +00:00:00");
insta::assert_debug_snapshot!(dump.instance_uid().unwrap(), @"None");
// network
let network = dump.network().unwrap().unwrap();
insta::assert_snapshot!(network.local.as_ref().unwrap(), @"ms-0");
insta::assert_snapshot!(network.remotes.get("ms-0").as_ref().unwrap().url, @"http://localhost:7700");
insta::assert_snapshot!(network.remotes.get("ms-0").as_ref().unwrap().search_api_key.is_none(), @"true");
insta::assert_snapshot!(network.remotes.get("ms-1").as_ref().unwrap().url, @"http://localhost:7701");
insta::assert_snapshot!(network.remotes.get("ms-1").as_ref().unwrap().search_api_key.is_none(), @"true");
insta::assert_snapshot!(network.remotes.get("ms-2").as_ref().unwrap().url, @"http://ms-5679.example.meilisearch.io");
insta::assert_snapshot!(network.remotes.get("ms-2").as_ref().unwrap().search_api_key.as_ref().unwrap(), @"foo");
}
#[test]
fn import_dump_v5() {
let dump = File::open("tests/assets/v5.dump").unwrap();
let mut dump = DumpReader::open(dump).unwrap();
// top level infos
insta::assert_snapshot!(dump.date().unwrap(), @"2022-10-04 15:55:10.344982459 +00:00:00");
insta::assert_snapshot!(dump.instance_uid().unwrap().unwrap(), @"9e15e977-f2ae-4761-943f-1eaf75fd736d");
// batches didn't exist at the time
let batches = dump.batches().unwrap().collect::<Result<Vec<_>>>().unwrap();
meili_snap::snapshot!(meili_snap::json_string!(batches), @"[]");
// tasks
let tasks = dump.tasks().unwrap().collect::<Result<Vec<_>>>().unwrap();
let (tasks, update_files): (Vec<_>, Vec<_>) = tasks.into_iter().unzip();
meili_snap::snapshot_hash!(meili_snap::json_string!(tasks), @"4b03e23e740b27bfb9d2a1faffe512e2");
assert_eq!(update_files.len(), 22);
assert!(update_files[0].is_none()); // the dump creation
assert!(update_files[1].is_some()); // the enqueued document addition
assert!(update_files[2..].iter().all(|u| u.is_none())); // everything already processed
// keys
let keys = dump.keys().unwrap().collect::<Result<Vec<_>>>().unwrap();
meili_snap::snapshot_hash!(meili_snap::json_string!(keys), @"c9d2b467fe2fca0b35580d8a999808fb");
// indexes
let mut indexes = dump.indexes().unwrap().collect::<Result<Vec<_>>>().unwrap();
// the indexes are not ordered in any way by default
indexes.sort_by_key(|index| index.metadata().uid.to_string());
let mut products = indexes.pop().unwrap();
let mut movies = indexes.pop().unwrap();
let mut spells = indexes.pop().unwrap();
assert!(indexes.is_empty());
// products
insta::assert_json_snapshot!(products.metadata(), @r###"
{
"uid": "products",
"primaryKey": "sku",
"createdAt": "2022-10-04T15:51:35.939396731Z",
"updatedAt": "2022-10-04T15:55:01.897325373Z"
}
"###);
insta::assert_json_snapshot!(products.settings().unwrap());
let documents = products.documents().unwrap().collect::<Result<Vec<_>>>().unwrap();
assert_eq!(documents.len(), 10);
meili_snap::snapshot_hash!(format!("{:#?}", documents), @"b01c8371aea4c7171af0d4d846a2bdca");
// movies
insta::assert_json_snapshot!(movies.metadata(), @r###"
{
"uid": "movies",
"primaryKey": "id",
"createdAt": "2022-10-04T15:51:35.291992167Z",
"updatedAt": "2022-10-04T15:55:10.33561842Z"
}
"###);
insta::assert_json_snapshot!(movies.settings().unwrap());
let documents = movies.documents().unwrap().collect::<Result<Vec<_>>>().unwrap();
assert_eq!(documents.len(), 200);
meili_snap::snapshot_hash!(format!("{:#?}", documents), @"e962baafd2fbae4cdd14e876053b0c5a");
// spells
insta::assert_json_snapshot!(spells.metadata(), @r###"
{
"uid": "dnd_spells",
"primaryKey": "index",
"createdAt": "2022-10-04T15:51:37.381094632Z",
"updatedAt": "2022-10-04T15:55:02.394503431Z"
}
"###);
insta::assert_json_snapshot!(spells.settings().unwrap());
let documents = spells.documents().unwrap().collect::<Result<Vec<_>>>().unwrap();
assert_eq!(documents.len(), 10);
meili_snap::snapshot_hash!(format!("{:#?}", documents), @"235016433dd04262c7f2da01d1e808ce");
assert_eq!(dump.features().unwrap(), None);
}
#[test]
fn import_dump_v4() {
let dump = File::open("tests/assets/v4.dump").unwrap();
let mut dump = DumpReader::open(dump).unwrap();
// top level infos
insta::assert_snapshot!(dump.date().unwrap(), @"2022-10-06 12:53:49.131989609 +00:00:00");
insta::assert_snapshot!(dump.instance_uid().unwrap().unwrap(), @"9e15e977-f2ae-4761-943f-1eaf75fd736d");
// batches didn't exist at the time
let batches = dump.batches().unwrap().collect::<Result<Vec<_>>>().unwrap();
meili_snap::snapshot!(meili_snap::json_string!(batches), @"[]");
// tasks
let tasks = dump.tasks().unwrap().collect::<Result<Vec<_>>>().unwrap();
let (tasks, update_files): (Vec<_>, Vec<_>) = tasks.into_iter().unzip();
meili_snap::snapshot_hash!(meili_snap::json_string!(tasks), @"c1b06a5ca60d5805483c16c5b3ff61ef");
assert_eq!(update_files.len(), 10);
assert!(update_files[0].is_some()); // the enqueued document addition
assert!(update_files[1..].iter().all(|u| u.is_none())); // everything already processed
// keys
let keys = dump.keys().unwrap().collect::<Result<Vec<_>>>().unwrap();
meili_snap::snapshot_hash!(meili_snap::json_string!(keys, { "[].uid" => "[uuid]" }), @"d751713988987e9331980363e24189ce");
// indexes
let mut indexes = dump.indexes().unwrap().collect::<Result<Vec<_>>>().unwrap();
// the indexes are not ordered in any way by default
indexes.sort_by_key(|index| index.metadata().uid.to_string());
let mut products = indexes.pop().unwrap();
let mut movies = indexes.pop().unwrap();
let mut spells = indexes.pop().unwrap();
assert!(indexes.is_empty());
// products
insta::assert_json_snapshot!(products.metadata(), @r###"
{
"uid": "products",
"primaryKey": "sku",
"createdAt": "2022-10-06T12:53:39.360187055Z",
"updatedAt": "2022-10-06T12:53:40.603035979Z"
}
"###);
insta::assert_json_snapshot!(products.settings().unwrap());
let documents = products.documents().unwrap().collect::<Result<Vec<_>>>().unwrap();
assert_eq!(documents.len(), 10);
meili_snap::snapshot_hash!(format!("{:#?}", documents), @"b01c8371aea4c7171af0d4d846a2bdca");
// movies
insta::assert_json_snapshot!(movies.metadata(), @r###"
{
"uid": "movies",
"primaryKey": "id",
"createdAt": "2022-10-06T12:53:38.710611568Z",
"updatedAt": "2022-10-06T12:53:49.785862546Z"
}
"###);
insta::assert_json_snapshot!(movies.settings().unwrap());
let documents = movies.documents().unwrap().collect::<Result<Vec<_>>>().unwrap();
assert_eq!(documents.len(), 110);
meili_snap::snapshot_hash!(format!("{:#?}", documents), @"786022a66ecb992c8a2a60fee070a5ab");
// spells
insta::assert_json_snapshot!(spells.metadata(), @r###"
{
"uid": "dnd_spells",
"primaryKey": "index",
"createdAt": "2022-10-06T12:53:40.831649057Z",
"updatedAt": "2022-10-06T12:53:41.116036186Z"
}
"###);
insta::assert_json_snapshot!(spells.settings().unwrap());
let documents = spells.documents().unwrap().collect::<Result<Vec<_>>>().unwrap();
assert_eq!(documents.len(), 10);
meili_snap::snapshot_hash!(format!("{:#?}", documents), @"235016433dd04262c7f2da01d1e808ce");
}
#[test]
fn import_dump_v3() {
let dump = File::open("tests/assets/v3.dump").unwrap();
let mut dump = DumpReader::open(dump).unwrap();
// top level infos
insta::assert_snapshot!(dump.date().unwrap(), @"2022-10-07 11:39:03.709153554 +00:00:00");
assert_eq!(dump.instance_uid().unwrap(), None);
// batches didn't exist at the time
let batches = dump.batches().unwrap().collect::<Result<Vec<_>>>().unwrap();
meili_snap::snapshot!(meili_snap::json_string!(batches), @"[]");
// tasks
let tasks = dump.tasks().unwrap().collect::<Result<Vec<_>>>().unwrap();
let (tasks, update_files): (Vec<_>, Vec<_>) = tasks.into_iter().unzip();
meili_snap::snapshot_hash!(meili_snap::json_string!(tasks), @"0e203b6095f7c68dbdf788321dcc8215");
assert_eq!(update_files.len(), 10);
assert!(update_files[0].is_some()); // the enqueued document addition
assert!(update_files[1..].iter().all(|u| u.is_none())); // everything already processed
// keys
let keys = dump.keys().unwrap().collect::<Result<Vec<_>>>().unwrap();
meili_snap::snapshot_hash!(meili_snap::json_string!(keys), @"d751713988987e9331980363e24189ce");
// indexes
let mut indexes = dump.indexes().unwrap().collect::<Result<Vec<_>>>().unwrap();
// the indexes are not ordered in any way by default
indexes.sort_by_key(|index| index.metadata().uid.to_string());
let mut products = indexes.pop().unwrap();
let mut movies2 = indexes.pop().unwrap();
let mut movies = indexes.pop().unwrap();
let mut spells = indexes.pop().unwrap();
assert!(indexes.is_empty());
// products
insta::assert_json_snapshot!(products.metadata(), { ".createdAt" => "[now]", ".updatedAt" => "[now]" }, @r###"
{
"uid": "products",
"primaryKey": "sku",
"createdAt": "[now]",
"updatedAt": "[now]"
}
"###);
insta::assert_json_snapshot!(products.settings().unwrap());
let documents = products.documents().unwrap().collect::<Result<Vec<_>>>().unwrap();
assert_eq!(documents.len(), 10);
meili_snap::snapshot_hash!(format!("{:#?}", documents), @"548284a84de510f71e88e6cdea495cf5");
// movies
insta::assert_json_snapshot!(movies.metadata(), { ".createdAt" => "[now]", ".updatedAt" => "[now]" }, @r###"
{
"uid": "movies",
"primaryKey": "id",
"createdAt": "[now]",
"updatedAt": "[now]"
}
"###);
insta::assert_json_snapshot!(movies.settings().unwrap());
let documents = movies.documents().unwrap().collect::<Result<Vec<_>>>().unwrap();
assert_eq!(documents.len(), 110);
meili_snap::snapshot_hash!(format!("{:#?}", documents), @"d153b5a81d8b3cdcbe1dec270b574022");
// movies2
insta::assert_json_snapshot!(movies2.metadata(), { ".createdAt" => "[now]", ".updatedAt" => "[now]" }, @r###"
{
"uid": "movies_2",
"primaryKey": null,
"createdAt": "[now]",
"updatedAt": "[now]"
}
"###);
insta::assert_json_snapshot!(movies2.settings().unwrap());
let documents = movies2.documents().unwrap().collect::<Result<Vec<_>>>().unwrap();
assert_eq!(documents.len(), 0);
meili_snap::snapshot_hash!(format!("{:#?}", documents), @"d751713988987e9331980363e24189ce");
// spells
insta::assert_json_snapshot!(spells.metadata(), { ".createdAt" => "[now]", ".updatedAt" => "[now]" }, @r###"
{
"uid": "dnd_spells",
"primaryKey": "index",
"createdAt": "[now]",
"updatedAt": "[now]"
}
"###);
insta::assert_json_snapshot!(spells.settings().unwrap());
let documents = spells.documents().unwrap().collect::<Result<Vec<_>>>().unwrap();
assert_eq!(documents.len(), 10);
meili_snap::snapshot_hash!(format!("{:#?}", documents), @"235016433dd04262c7f2da01d1e808ce");
}
#[test]
fn import_dump_v2() {
let dump = File::open("tests/assets/v2.dump").unwrap();
let mut dump = DumpReader::open(dump).unwrap();
// top level infos
insta::assert_snapshot!(dump.date().unwrap(), @"2022-10-09 20:27:59.904096267 +00:00:00");
assert_eq!(dump.instance_uid().unwrap(), None);
// batches didn't exist at the time
let batches = dump.batches().unwrap().collect::<Result<Vec<_>>>().unwrap();
meili_snap::snapshot!(meili_snap::json_string!(batches), @"[]");
// tasks
let tasks = dump.tasks().unwrap().collect::<Result<Vec<_>>>().unwrap();
let (tasks, update_files): (Vec<_>, Vec<_>) = tasks.into_iter().unzip();
meili_snap::snapshot_hash!(meili_snap::json_string!(tasks), @"d216c7f90f538ffbb2a059531d7ac89a");
assert_eq!(update_files.len(), 9);
assert!(update_files[0].is_some()); // the enqueued document addition
assert!(update_files[1..].iter().all(|u| u.is_none())); // everything already processed
// keys
let keys = dump.keys().unwrap().collect::<Result<Vec<_>>>().unwrap();
meili_snap::snapshot_hash!(meili_snap::json_string!(keys), @"d751713988987e9331980363e24189ce");
// indexes
let mut indexes = dump.indexes().unwrap().collect::<Result<Vec<_>>>().unwrap();
// the indexes are not ordered in any way by default
indexes.sort_by_key(|index| index.metadata().uid.to_string());
let mut products = indexes.pop().unwrap();
let mut movies2 = indexes.pop().unwrap();
let mut movies = indexes.pop().unwrap();
let mut spells = indexes.pop().unwrap();
assert!(indexes.is_empty());
// products
insta::assert_json_snapshot!(products.metadata(), @r###"
{
"uid": "products",
"primaryKey": "sku",
"createdAt": "2022-10-09T20:27:22.688964637Z",
"updatedAt": "2022-10-09T20:27:23.951017769Z"
}
"###);
insta::assert_json_snapshot!(products.settings().unwrap());
let documents = products.documents().unwrap().collect::<Result<Vec<_>>>().unwrap();
assert_eq!(documents.len(), 10);
meili_snap::snapshot_hash!(format!("{:#?}", documents), @"548284a84de510f71e88e6cdea495cf5");
// movies
insta::assert_json_snapshot!(movies.metadata(), @r###"
{
"uid": "movies",
"primaryKey": "id",
"createdAt": "2022-10-09T20:27:22.197788495Z",
"updatedAt": "2022-10-09T20:28:01.93111053Z"
}
"###);
insta::assert_json_snapshot!(movies.settings().unwrap());
let documents = movies.documents().unwrap().collect::<Result<Vec<_>>>().unwrap();
assert_eq!(documents.len(), 110);
meili_snap::snapshot_hash!(format!("{:#?}", documents), @"d153b5a81d8b3cdcbe1dec270b574022");
// movies2
insta::assert_json_snapshot!(movies2.metadata(), { ".createdAt" => "[now]", ".updatedAt" => "[now]" }, @r###"
{
"uid": "movies_2",
"primaryKey": null,
"createdAt": "[now]",
"updatedAt": "[now]"
}
"###);
insta::assert_json_snapshot!(movies2.settings().unwrap());
let documents = movies2.documents().unwrap().collect::<Result<Vec<_>>>().unwrap();
assert_eq!(documents.len(), 0);
meili_snap::snapshot_hash!(format!("{:#?}", documents), @"d751713988987e9331980363e24189ce");
// spells
insta::assert_json_snapshot!(spells.metadata(), @r###"
{
"uid": "dnd_spells",
"primaryKey": "index",
"createdAt": "2022-10-09T20:27:24.242683494Z",
"updatedAt": "2022-10-09T20:27:24.312809641Z"
}
"###);
insta::assert_json_snapshot!(spells.settings().unwrap());
let documents = spells.documents().unwrap().collect::<Result<Vec<_>>>().unwrap();
assert_eq!(documents.len(), 10);
meili_snap::snapshot_hash!(format!("{:#?}", documents), @"235016433dd04262c7f2da01d1e808ce");
}
#[test]
fn import_dump_v2_from_meilisearch_v0_22_0_issue_3435() {
let dump = File::open("tests/assets/v2-v0.22.0.dump").unwrap();
let mut dump = DumpReader::open(dump).unwrap();
// top level infos
insta::assert_snapshot!(dump.date().unwrap(), @"2023-01-30 16:26:09.247261 +00:00:00");
assert_eq!(dump.instance_uid().unwrap(), None);
// batches didn't exist at the time
let batches = dump.batches().unwrap().collect::<Result<Vec<_>>>().unwrap();
meili_snap::snapshot!(meili_snap::json_string!(batches), @"[]");
// tasks
let tasks = dump.tasks().unwrap().collect::<Result<Vec<_>>>().unwrap();
let (tasks, update_files): (Vec<_>, Vec<_>) = tasks.into_iter().unzip();
meili_snap::snapshot_hash!(meili_snap::json_string!(tasks), @"e27999f1112632222cb84f6cffff7c5f");
assert_eq!(update_files.len(), 8);
assert!(update_files[0..].iter().all(|u| u.is_none())); // everything already processed
// keys
let keys = dump.keys().unwrap().collect::<Result<Vec<_>>>().unwrap();
meili_snap::snapshot_hash!(meili_snap::json_string!(keys), @"d751713988987e9331980363e24189ce");
// indexes
let mut indexes = dump.indexes().unwrap().collect::<Result<Vec<_>>>().unwrap();
// the indexes are not ordered in any way by default
indexes.sort_by_key(|index| index.metadata().uid.to_string());
let mut products = indexes.pop().unwrap();
let mut movies = indexes.pop().unwrap();
let mut spells = indexes.pop().unwrap();
assert!(indexes.is_empty());
// products
insta::assert_json_snapshot!(products.metadata(), @r###"
{
"uid": "products",
"primaryKey": "sku",
"createdAt": "2023-01-30T16:25:56.595257Z",
"updatedAt": "2023-01-30T16:25:58.70348Z"
}
"###);
insta::assert_json_snapshot!(products.settings().unwrap());
let documents = products.documents().unwrap().collect::<Result<Vec<_>>>().unwrap();
assert_eq!(documents.len(), 10);
meili_snap::snapshot_hash!(format!("{:#?}", documents), @"548284a84de510f71e88e6cdea495cf5");
// movies
insta::assert_json_snapshot!(movies.metadata(), @r###"
{
"uid": "movies",
"primaryKey": "id",
"createdAt": "2023-01-30T16:25:56.192178Z",
"updatedAt": "2023-01-30T16:25:56.455714Z"
}
"###);
insta::assert_json_snapshot!(movies.settings().unwrap());
let documents = movies.documents().unwrap().collect::<Result<Vec<_>>>().unwrap();
assert_eq!(documents.len(), 10);
meili_snap::snapshot_hash!(format!("{:#?}", documents), @"0227598af846e574139ee0b80e03a720");
// spells
insta::assert_json_snapshot!(spells.metadata(), @r###"
{
"uid": "dnd_spells",
"primaryKey": "index",
"createdAt": "2023-01-30T16:25:58.876405Z",
"updatedAt": "2023-01-30T16:25:59.079906Z"
}
"###);
insta::assert_json_snapshot!(spells.settings().unwrap());
let documents = spells.documents().unwrap().collect::<Result<Vec<_>>>().unwrap();
assert_eq!(documents.len(), 10);
meili_snap::snapshot_hash!(format!("{:#?}", documents), @"235016433dd04262c7f2da01d1e808ce");
}
#[test]
fn import_dump_v1() {
let dump = File::open("tests/assets/v1.dump").unwrap();
let mut dump = DumpReader::open(dump).unwrap();
// top level infos
assert_eq!(dump.date(), None);
assert_eq!(dump.instance_uid().unwrap(), None);
// batches didn't exist at the time
let batches = dump.batches().unwrap().collect::<Result<Vec<_>>>().unwrap();
meili_snap::snapshot!(meili_snap::json_string!(batches), @"[]");
// tasks
let tasks = dump.tasks().unwrap().collect::<Result<Vec<_>>>().unwrap();
let (tasks, update_files): (Vec<_>, Vec<_>) = tasks.into_iter().unzip();
meili_snap::snapshot_hash!(meili_snap::json_string!(tasks), @"0155a664b0cf62aae23db5138b6b03d7");
assert_eq!(update_files.len(), 9);
assert!(update_files[..].iter().all(|u| u.is_none())); // no update file in dump v1
// keys
let keys = dump.keys().unwrap().collect::<Result<Vec<_>>>().unwrap();
meili_snap::snapshot!(meili_snap::json_string!(keys), @"[]");
meili_snap::snapshot_hash!(meili_snap::json_string!(keys), @"d751713988987e9331980363e24189ce");
// indexes
let mut indexes = dump.indexes().unwrap().collect::<Result<Vec<_>>>().unwrap();
// the indexes are not ordered in any way by default
indexes.sort_by_key(|index| index.metadata().uid.to_string());
let mut products = indexes.pop().unwrap();
let mut movies = indexes.pop().unwrap();
let mut spells = indexes.pop().unwrap();
assert!(indexes.is_empty());
// products
insta::assert_json_snapshot!(products.metadata(), @r###"
{
"uid": "products",
"primaryKey": "sku",
"createdAt": "2022-10-02T13:23:39.976870431Z",
"updatedAt": "2022-10-02T13:27:54.353262482Z"
}
"###);
insta::assert_json_snapshot!(products.settings().unwrap());
let documents = products.documents().unwrap().collect::<Result<Vec<_>>>().unwrap();
assert_eq!(documents.len(), 10);
meili_snap::snapshot_hash!(format!("{:#?}", documents), @"b01c8371aea4c7171af0d4d846a2bdca");
// movies
insta::assert_json_snapshot!(movies.metadata(), @r###"
{
"uid": "movies",
"primaryKey": "id",
"createdAt": "2022-10-02T13:15:29.477512777Z",
"updatedAt": "2022-10-02T13:21:12.671204856Z"
}
"###);
insta::assert_json_snapshot!(movies.settings().unwrap());
let documents = movies.documents().unwrap().collect::<Result<Vec<_>>>().unwrap();
assert_eq!(documents.len(), 10);
meili_snap::snapshot_hash!(format!("{:#?}", documents), @"b63dbed5bbc059f3e32bc471ae699bf5");
// spells
insta::assert_json_snapshot!(spells.metadata(), @r###"
{
"uid": "dnd_spells",
"primaryKey": "index",
"createdAt": "2022-10-02T13:38:26.358882984Z",
"updatedAt": "2022-10-02T13:38:26.385609433Z"
}
"###);
insta::assert_json_snapshot!(spells.settings().unwrap());
let documents = spells.documents().unwrap().collect::<Result<Vec<_>>>().unwrap();
assert_eq!(documents.len(), 10);
meili_snap::snapshot_hash!(format!("{:#?}", documents), @"aa24c0cfc733d66c396237ad44263bed");
}
}
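For orientation, a minimal sketch of how this reader is typically driven, mirroring the tests above; the dump path, the `dump::reader` import path, and the `list_dump` helper are illustrative assumptions, not part of the diff:

use std::fs::File;

use dump::reader::DumpReader; // assumed public path for the reader shown above

fn list_dump(path: &str) -> dump::Result<()> {
    // Open the dump; older formats are transparently upgraded to v6 by DumpReader::open.
    let mut dump = DumpReader::open(File::open(path)?)?;
    // Walk every index and count its documents.
    for index in dump.indexes()? {
        let mut index = index?;
        let count = index.documents()?.count();
        println!("{}: {} documents", index.metadata().uid, count);
    }
    // Tasks come paired with their optional update file.
    for task in dump.tasks()? {
        let (_task, _update_file) = task?;
    }
    Ok(())
}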

View File

@ -1,24 +0,0 @@
---
source: dump/src/reader/mod.rs
expression: spells.settings().unwrap()
---
{
"displayedAttributes": [
"*"
],
"searchableAttributes": [
"*"
],
"filterableAttributes": [],
"sortableAttributes": [],
"rankingRules": [
"typo",
"words",
"proximity",
"attribute",
"exactness"
],
"stopWords": [],
"synonyms": {},
"distinctAttribute": null
}

View File

@ -1,38 +0,0 @@
---
source: dump/src/reader/mod.rs
expression: products.settings().unwrap()
---
{
"displayedAttributes": [
"*"
],
"searchableAttributes": [
"*"
],
"filterableAttributes": [],
"sortableAttributes": [],
"rankingRules": [
"typo",
"words",
"proximity",
"attribute",
"exactness"
],
"stopWords": [],
"synonyms": {
"android": [
"phone",
"smartphone"
],
"iphone": [
"phone",
"smartphone"
],
"phone": [
"android",
"iphone",
"smartphone"
]
},
"distinctAttribute": null
}

View File

@ -1,31 +0,0 @@
---
source: dump/src/reader/mod.rs
expression: movies.settings().unwrap()
---
{
"displayedAttributes": [
"*"
],
"searchableAttributes": [
"*"
],
"filterableAttributes": [
"genres",
"id"
],
"sortableAttributes": [
"genres",
"id"
],
"rankingRules": [
"typo",
"words",
"proximity",
"attribute",
"exactness",
"release_date:asc"
],
"stopWords": [],
"synonyms": {},
"distinctAttribute": null
}

View File

@ -1,23 +0,0 @@
---
source: dump/src/reader/mod.rs
expression: movies2.settings().unwrap()
---
{
"displayedAttributes": [
"*"
],
"searchableAttributes": [
"*"
],
"filterableAttributes": [],
"rankingRules": [
"words",
"typo",
"proximity",
"attribute",
"exactness"
],
"stopWords": [],
"synonyms": {},
"distinctAttribute": null
}

View File

@ -1,23 +0,0 @@
---
source: dump/src/reader/mod.rs
expression: spells.settings().unwrap()
---
{
"displayedAttributes": [
"*"
],
"searchableAttributes": [
"*"
],
"filterableAttributes": [],
"rankingRules": [
"words",
"typo",
"proximity",
"attribute",
"exactness"
],
"stopWords": [],
"synonyms": {},
"distinctAttribute": null
}

View File

@ -1,37 +0,0 @@
---
source: dump/src/reader/mod.rs
expression: products.settings().unwrap()
---
{
"displayedAttributes": [
"*"
],
"searchableAttributes": [
"*"
],
"filterableAttributes": [],
"rankingRules": [
"words",
"typo",
"proximity",
"attribute",
"exactness"
],
"stopWords": [],
"synonyms": {
"android": [
"phone",
"smartphone"
],
"iphone": [
"phone",
"smartphone"
],
"phone": [
"android",
"iphone",
"smartphone"
]
},
"distinctAttribute": null
}

View File

@ -1,24 +0,0 @@
---
source: dump/src/reader/mod.rs
expression: movies.settings().unwrap()
---
{
"displayedAttributes": [
"*"
],
"searchableAttributes": [
"*"
],
"filterableAttributes": [],
"rankingRules": [
"words",
"typo",
"proximity",
"attribute",
"exactness",
"release_date:asc"
],
"stopWords": [],
"synonyms": {},
"distinctAttribute": null
}

View File

@ -1,25 +0,0 @@
---
source: dump/src/reader/mod.rs
expression: spells.settings().unwrap()
---
{
"displayedAttributes": [
"*"
],
"searchableAttributes": [
"*"
],
"filterableAttributes": [],
"sortableAttributes": [],
"rankingRules": [
"words",
"typo",
"proximity",
"attribute",
"sort",
"exactness"
],
"stopWords": [],
"synonyms": {},
"distinctAttribute": null
}

View File

@ -1,39 +0,0 @@
---
source: dump/src/reader/mod.rs
expression: products.settings().unwrap()
---
{
"displayedAttributes": [
"*"
],
"searchableAttributes": [
"*"
],
"filterableAttributes": [],
"sortableAttributes": [],
"rankingRules": [
"words",
"typo",
"proximity",
"attribute",
"sort",
"exactness"
],
"stopWords": [],
"synonyms": {
"android": [
"phone",
"smartphone"
],
"iphone": [
"phone",
"smartphone"
],
"phone": [
"android",
"iphone",
"smartphone"
]
},
"distinctAttribute": null
}

View File

@ -1,30 +0,0 @@
---
source: dump/src/reader/mod.rs
expression: movies.settings().unwrap()
---
{
"displayedAttributes": [
"*"
],
"searchableAttributes": [
"*"
],
"filterableAttributes": [
"genres",
"id"
],
"sortableAttributes": [
"release_date"
],
"rankingRules": [
"words",
"typo",
"proximity",
"attribute",
"exactness",
"release_date:asc"
],
"stopWords": [],
"synonyms": {},
"distinctAttribute": null
}

View File

@ -1,25 +0,0 @@
---
source: dump/src/reader/mod.rs
expression: movies2.settings().unwrap()
---
{
"displayedAttributes": [
"*"
],
"searchableAttributes": [
"*"
],
"filterableAttributes": [],
"sortableAttributes": [],
"rankingRules": [
"words",
"typo",
"proximity",
"attribute",
"sort",
"exactness"
],
"stopWords": [],
"synonyms": {},
"distinctAttribute": null
}

View File

@ -1,25 +0,0 @@
---
source: dump/src/reader/mod.rs
expression: spells.settings().unwrap()
---
{
"displayedAttributes": [
"*"
],
"searchableAttributes": [
"*"
],
"filterableAttributes": [],
"sortableAttributes": [],
"rankingRules": [
"words",
"typo",
"proximity",
"attribute",
"sort",
"exactness"
],
"stopWords": [],
"synonyms": {},
"distinctAttribute": null
}

View File

@ -1,39 +0,0 @@
---
source: dump/src/reader/mod.rs
expression: products.settings().unwrap()
---
{
"displayedAttributes": [
"*"
],
"searchableAttributes": [
"*"
],
"filterableAttributes": [],
"sortableAttributes": [],
"rankingRules": [
"words",
"typo",
"proximity",
"attribute",
"sort",
"exactness"
],
"stopWords": [],
"synonyms": {
"android": [
"phone",
"smartphone"
],
"iphone": [
"phone",
"smartphone"
],
"phone": [
"android",
"iphone",
"smartphone"
]
},
"distinctAttribute": null
}

View File

@ -1,31 +0,0 @@
---
source: dump/src/reader/mod.rs
expression: movies.settings().unwrap()
---
{
"displayedAttributes": [
"*"
],
"searchableAttributes": [
"*"
],
"filterableAttributes": [
"genres",
"id"
],
"sortableAttributes": [
"release_date"
],
"rankingRules": [
"words",
"typo",
"proximity",
"attribute",
"sort",
"exactness",
"release_date:asc"
],
"stopWords": [],
"synonyms": {},
"distinctAttribute": null
}

View File

@ -1,34 +0,0 @@
---
source: dump/src/reader/mod.rs
expression: spells.settings().unwrap()
---
{
"displayedAttributes": [
"*"
],
"searchableAttributes": [
"*"
],
"filterableAttributes": [],
"sortableAttributes": [],
"rankingRules": [
"words",
"typo",
"proximity",
"attribute",
"sort",
"exactness"
],
"stopWords": [],
"synonyms": {},
"distinctAttribute": null,
"typoTolerance": {
"enabled": true,
"minWordSizeForTypos": {
"oneTypo": 5,
"twoTypos": 9
},
"disableOnWords": [],
"disableOnAttributes": []
}
}

View File

@ -1,48 +0,0 @@
---
source: dump/src/reader/mod.rs
expression: products.settings().unwrap()
---
{
"displayedAttributes": [
"*"
],
"searchableAttributes": [
"*"
],
"filterableAttributes": [],
"sortableAttributes": [],
"rankingRules": [
"words",
"typo",
"proximity",
"attribute",
"sort",
"exactness"
],
"stopWords": [],
"synonyms": {
"android": [
"phone",
"smartphone"
],
"iphone": [
"phone",
"smartphone"
],
"phone": [
"android",
"iphone",
"smartphone"
]
},
"distinctAttribute": null,
"typoTolerance": {
"enabled": true,
"minWordSizeForTypos": {
"oneTypo": 5,
"twoTypos": 9
},
"disableOnWords": [],
"disableOnAttributes": []
}
}

View File

@ -1,40 +0,0 @@
---
source: dump/src/reader/mod.rs
expression: movies.settings().unwrap()
---
{
"displayedAttributes": [
"*"
],
"searchableAttributes": [
"*"
],
"filterableAttributes": [
"genres",
"id"
],
"sortableAttributes": [
"release_date"
],
"rankingRules": [
"words",
"typo",
"proximity",
"attribute",
"sort",
"exactness",
"release_date:asc"
],
"stopWords": [],
"synonyms": {},
"distinctAttribute": null,
"typoTolerance": {
"enabled": true,
"minWordSizeForTypos": {
"oneTypo": 5,
"twoTypos": 9
},
"disableOnWords": [],
"disableOnAttributes": []
}
}

View File

@ -1,40 +0,0 @@
---
source: dump/src/reader/mod.rs
expression: spells.settings().unwrap()
---
{
"displayedAttributes": [
"*"
],
"searchableAttributes": [
"*"
],
"filterableAttributes": [],
"sortableAttributes": [],
"rankingRules": [
"words",
"typo",
"proximity",
"attribute",
"sort",
"exactness"
],
"stopWords": [],
"synonyms": {},
"distinctAttribute": null,
"typoTolerance": {
"enabled": true,
"minWordSizeForTypos": {
"oneTypo": 5,
"twoTypos": 9
},
"disableOnWords": [],
"disableOnAttributes": []
},
"faceting": {
"maxValuesPerFacet": 100
},
"pagination": {
"maxTotalHits": 1000
}
}

Some files were not shown because too many files have changed in this diff.