Compare commits

...

112 Commits

Author SHA1 Message Date
79111230ee display fids at the end 2025-01-08 17:24:17 +01:00
d6957a8e5d ignore all string facets 2025-01-08 16:55:31 +01:00
e83c021755 When spilling on the next fid, no longer ignore children 2025-01-08 16:50:05 +01:00
7ec7200378 Check valid_facet_value as part of a filter of the iterator 2025-01-08 16:25:44 +01:00
6a577254fa No longer ignore the first child without parent 2025-01-08 16:25:30 +01:00
fd88c834c3 Modernize valid_lmdb_key 2025-01-08 15:22:11 +01:00
b4005593f4 Switch to an iterative algorithm for find_changed_parents 2025-01-08 14:57:14 +01:00
8ee3793259 Update after review 2025-01-08 13:58:14 +01:00
3648abbfd5 Remove unused FacetFieldIdOperation 2025-01-07 15:26:09 +01:00
4d2433de12 center groups 2025-01-06 18:23:35 +01:00
28cc6df7a3 Fix uselessly deep stack trace 2025-01-06 18:07:49 +01:00
34f4602ae8 Update snapshot 2025-01-06 16:55:12 +01:00
7a9290aaae Use new incremental facet indexing and enable sanity checks in debug 2025-01-06 15:08:48 +01:00
5d219587b8 Add new incremental facet indexing 2025-01-06 15:08:36 +01:00
6e9aa49893 add valid_facet_value utility function 2025-01-06 15:08:07 +01:00
6b3a2c7281 Add sanity checks for facet values 2025-01-06 15:07:55 +01:00
5908aec6cb Merge #5192
5192: Fix empty document addition r=irevoire a=irevoire

# Pull Request

## Related issue
Fixes #5190

## What does this PR do?
- Improve a test just to make sure this issue never arises again
- Fix the issue

For the reviewer: Calling `add_documents` with an empty `mmap` seems to work, but does it impact the performance in a significant way?

Co-authored-by: Tamo <tamo@meilisearch.com>
2024-12-31 17:11:10 +00:00
19f48c15fb Fix the addition of empty payload 2024-12-31 18:00:14 +01:00
47b484c07c update the test to ensure it works when specifying the primary key or not: it doesn't work 2024-12-31 17:24:32 +01:00
7d5e28b475 Merge #5193
5193: Update version for the next release (v1.12.1) in Cargo.toml r=irevoire a=meili-bot

⚠️ This PR is automatically generated. Check the new version is the expected one and Cargo.lock has been updated before merging.

Co-authored-by: curquiza <curquiza@users.noreply.github.com>
2024-12-31 09:40:31 +00:00
0648e06aa2 Update version for the next release (v1.12.1) in Cargo.toml 2024-12-30 17:36:46 +00:00
33921747b7 stop skipping empty tasks when adding documents 2024-12-30 17:48:25 +01:00
970a489dcc add a test reproducing the bug 2024-12-30 16:21:06 +01:00
ba11121cfc Merge #5159
5159: Fix the New Indexer Spilling r=irevoire a=Kerollmops

Fix two bugs in the merging of the spilled caches. Thanks to `@ManyTheFish` and `@irevoire` 👏

Co-authored-by: Kerollmops <clement@meilisearch.com>
Co-authored-by: ManyTheFish <many@meilisearch.com>
2024-12-12 17:16:53 +00:00
acdd5aa6ea Use the thread source id instead of the destination id
when filtering on the cache to merge
2024-12-12 18:12:00 +01:00
2f3cc8cdd2 Fix the merge_caches_sorted function 2024-12-12 16:15:37 +01:00
7a95fed23f Merge #5158
5158: Indexer edition 2024 fix facet fst r=Kerollmops a=ManyTheFish

# Pull Request
Fix a regression in the new indexer; when several filterable attributes containing strings were set, all the field IDs were shifted, and the last one was overwriting the previous FST.
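
To make the restored invariant concrete, here is a minimal, hypothetical sketch (not the actual milli code): one FST is built per facet field and stored under that field's own id, so the FST of the last filterable field can no longer overwrite the others.

```rust
use std::collections::BTreeMap;

use fst::SetBuilder;

/// Hypothetical helper: build one FST per facet field id so the FST of the
/// last filterable field can never overwrite the FSTs of the previous ones.
fn build_facet_fsts(
    // lexicographically sorted facet strings, per filterable field id
    facet_strings_per_field: &BTreeMap<u16, Vec<String>>,
) -> Result<BTreeMap<u16, fst::Set<Vec<u8>>>, fst::Error> {
    let mut fsts = BTreeMap::new();
    for (&field_id, strings) in facet_strings_per_field {
        let mut builder = SetBuilder::memory();
        for s in strings {
            builder.insert(s)?; // keys must already be in sorted order
        }
        // Stored under the source field id, not a shifted one.
        fsts.insert(field_id, builder.into_set());
    }
    Ok(fsts)
}
```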

## What does this PR do?
- Add a test reproducing the bug
- fix the bug

Co-authored-by: ManyTheFish <many@meilisearch.com>
2024-12-12 14:14:44 +00:00
961de4d34e Fix facet fst 2024-12-12 15:12:28 +01:00
18ce95dcbf Add test reproducing the bug 2024-12-12 14:56:45 +01:00
c177210b1b Merge #5152
5152: Make xtasks be able to use the specified binary r=dureuill a=Kerollmops

Makes it possible to specify the binary to run. This is useful for running PGO-optimized binaries.
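
A minimal sketch of the idea, assuming an illustrative `--binary-path` flag and clap-based argument parsing (the real xtask CLI may differ): run the given pre-built binary directly, otherwise fall back to compiling through cargo.

```rust
use std::path::PathBuf;
use std::process::Command;

use clap::Parser;

/// Illustrative arguments; the real xtask flags may differ.
#[derive(Parser)]
struct BenchArgs {
    /// Path to an already-built (e.g. PGO-optimized) Meilisearch binary.
    #[arg(long)]
    binary_path: Option<PathBuf>,
}

fn main() -> std::io::Result<()> {
    let args = BenchArgs::parse();
    let status = match &args.binary_path {
        // Run the specified binary directly, skipping compilation.
        Some(path) => Command::new(path).status()?,
        // Otherwise compile and run through cargo as before.
        None => Command::new("cargo").args(["run", "--release", "-p", "meilisearch"]).status()?,
    };
    std::process::exit(status.code().unwrap_or(1));
}
```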

Co-authored-by: Kerollmops <clement@meilisearch.com>
Co-authored-by: Clément Renault <clement@meilisearch.com>
2024-12-12 12:28:16 +00:00
1fc90fbacb Merge #5147
5147: Batch progress r=dureuill a=irevoire

# Pull Request

## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/5068

## What does this PR do?
- ...

## PR checklist
Please check if your PR fulfills the following requirements:
- [ ] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [ ] Have you read the contributing guidelines?
- [ ] Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!


Co-authored-by: Tamo <tamo@meilisearch.com>
2024-12-12 09:15:54 +00:00
6c72559457 Update the binary-path description
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-12-12 09:39:39 +01:00
1fdfa3f208 Change the exit code to 130 when Ctrl-Ced 2024-12-12 09:26:14 +01:00
0d0c18f519 rename the Step::name into Step::current_step 2024-12-11 18:41:03 +01:00
d12364c1e0 fix the tests 2024-12-11 18:30:48 +01:00
8cd3a1aa57 fmt 2024-12-11 18:18:40 +01:00
08fd026ebd fix warning 2024-12-11 18:18:13 +01:00
75d5cea624 use a with_capacity while allocating the progress view 2024-12-11 18:17:33 +01:00
ab9213fa94 ensure we never write the progress to the db 2024-12-11 18:16:20 +01:00
45d5d4bf40 make the progressview public 2024-12-11 18:15:33 +01:00
fa885e75b4 rename the send_progress to progress 2024-12-11 18:13:12 +01:00
29fc77ee5b remove useless print 2024-12-11 18:11:19 +01:00
ad4dc70720 rename the ComputingTheChanges to ComputingDocumentChanges in the edit document progress 2024-12-11 18:09:54 +01:00
5d682b4700 rename the ComputingTheChanges to ComputingDocumentChanges 2024-12-11 18:08:45 +01:00
f1beb60204 make the progress use payload instead of documents 2024-12-11 18:07:45 +01:00
85577e70cd reuse the enqueued 2024-12-11 18:05:34 +01:00
c5536c37b5 rename the atomic::name to unit_name 2024-12-11 18:03:06 +01:00
9245c89cfe move the macros to milli 2024-12-11 18:00:46 +01:00
eaabc1af2f Merge #5144
5144: Exactly 512 bytes docid fails r=Kerollmops a=dureuill

# Pull Request

## Related issue
Fixes #5050 

## What does this PR do?
- Return a user error rather than an internal one for docids of exactly 512 bytes (see the sketch below)
- Fix up the error message to indicate that docids of exactly 512 bytes are not supported
- Fix up the error message to reflect that index uids are actually limited to 400 bytes in length
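
A hedged sketch of the kind of length check involved, with a hypothetical error type (milli's actual `UserError` variants differ): only docids of at most 511 bytes are supported, so a 512-byte id must produce a user-facing error.

```rust
/// Hypothetical error type; milli uses its own UserError variants.
#[derive(Debug)]
struct InvalidDocumentId(String);

const MAX_DOCID_BYTES: usize = 511;

fn validate_docid(docid: &str) -> Result<(), InvalidDocumentId> {
    // A docid of exactly 512 bytes is one byte too long and must be a
    // user error, not an internal error.
    if docid.len() > MAX_DOCID_BYTES {
        return Err(InvalidDocumentId(format!(
            "Document identifier is {} bytes long, but at most {MAX_DOCID_BYTES} bytes are supported.",
            docid.len()
        )));
    }
    Ok(())
}
```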

## Impact

- Impacts docs: 
    - update [this paragraph](https://www.meilisearch.com/docs/learn/resources/known_limitations#length-of-primary-key-values) to say 511 bytes instead of 512 

Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-12-11 15:41:05 +00:00
04a24a9239 Kill Meilisearch with a TERM signal 2024-12-11 16:27:07 +01:00
1f54dfa883 update the macro to look more like an enum 2024-12-11 16:26:09 +01:00
786b0fabea implement the progress for almost all the tasks 2024-12-11 16:26:08 +01:00
26733c705d add progress for the task deletion and task cancelation 2024-12-11 16:25:02 +01:00
ab75f53efd update all snapshots 2024-12-11 16:25:02 +01:00
867e6a8f1d rename the send_progress field to progress since it's not sending anything 2024-12-11 16:25:01 +01:00
6f4823fc97 make the number of documents in the document tasks more incremental 2024-12-11 16:25:01 +01:00
df9b68f8ed initial implementation of the progress 2024-12-11 16:25:01 +01:00
5bc6391700 Merge #5153
5153: Return docid in case of errors while rendering the document template r=Kerollmops a=dureuill

Improves error message:

Before: 

```
ERROR index_scheduler: Batch failed Index `mieli`: user error: missing field in document: liquid: Unknown index
  with:
    variable=doc
    requested index=title
    available indexes=by, id, kids, parent, text, time, type
```

After:

```
ERROR index_scheduler: Batch failed Index `mieli`: user error: missing field in document `11345147`: liquid: Unknown index
  with:
    variable=doc
    requested index=title
    available indexes=by, id, kids, parent, text, time, type
```
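
A minimal illustration of how the docid gets attached to the message; the helper name is hypothetical and not Meilisearch's actual error type.

```rust
use std::fmt::Display;

/// Illustrative helper: prefix a template-rendering error with the docid so
/// the failing document can be located, as in the "after" message above.
fn render_error_with_docid(docid: &str, err: impl Display) -> String {
    format!("missing field in document `{docid}`: {err}")
}

fn main() {
    let err = "liquid: Unknown index";
    println!("{}", render_error_with_docid("11345147", err));
}
```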

Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-12-11 15:01:40 +00:00
eaa897d983 Avoid compiling when unnecessary 2024-12-11 15:57:16 +01:00
bfca54cc2c Return docid in case of errors while rendering the document template 2024-12-11 15:26:18 +01:00
04a62d2b97 Compile Meilisearch or run the dedicated binary file 2024-12-11 14:57:07 +01:00
8c19cb0a0b Merge #5146
5146: Offline upgrade v1.12 r=irevoire a=ManyTheFish

# Pull Request

## Related issue
Fixes #4978 

## What does this PR do?
- Add a `v1_11_to_v1_12` function to upgrade Meilisearch from v1.11 to v1.12
- Convert the update files from OBKV to ndjson format (see the sketch below)
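
A sketch of the output side of that conversion, assuming the documents have already been decoded from the OBKV update file into `serde_json::Value`s by existing (here unnamed) helpers: each document is written as one JSON line.

```rust
use std::io::{BufWriter, Write};

use serde_json::Value;

/// Write already-decoded documents as NDJSON: one JSON object per line.
/// Decoding the OBKV update file into `Value`s is assumed to be handled by
/// existing helpers and is not shown here.
fn write_ndjson<W: Write>(
    documents: impl IntoIterator<Item = Value>,
    writer: W,
) -> std::io::Result<()> {
    let mut writer = BufWriter::new(writer);
    for document in documents {
        // serde_json::Error converts into std::io::Error via the From impl.
        serde_json::to_writer(&mut writer, &document)?;
        writer.write_all(b"\n")?;
    }
    writer.flush()
}
```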


Co-authored-by: ManyTheFish <many@meilisearch.com>
Co-authored-by: Many the fish <many@meilisearch.com>
2024-12-11 13:39:14 +00:00
5c492031d9 Update crates/meilitool/src/upgrade/v1_12.rs
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-12-11 14:34:18 +01:00
fb1caa4724 Merge #5148
5148: Do not duplicate NDJson data when unnecessary r=dureuill a=Kerollmops

This PR improves the NDJSON support. Usually, we save all of the user's document content into a temporary file, validate its content, and then convert everything into NDJSON in the file store (update files in the tasks).

It is a waste of time when users are already sending NDJSON. So, this PR removes the last copy and directly stores the user content in the file store, validating it from the file store. If an issue arises, the file will not persist and will be dropped/deleted instead.

Related to #5078.
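
A minimal sketch of the idea, under the assumption that the NDJSON payload already sits in the file store: validate it line by line in place instead of rewriting it into a second file, and let the caller delete the file when validation fails. This is not Meilisearch's actual code.

```rust
use std::fs::File;
use std::io::{BufRead, BufReader};
use std::path::Path;

/// Validate an NDJSON payload where it already sits in the file store,
/// without copying it into a second file. On error the caller drops/deletes
/// the file instead of persisting it.
fn validate_ndjson_in_place(path: &Path) -> std::io::Result<u64> {
    let reader = BufReader::new(File::open(path)?);
    let mut documents = 0;
    for line in reader.lines() {
        let line = line?;
        if line.trim().is_empty() {
            continue; // tolerate blank lines
        }
        // serde_json::Error converts into std::io::Error via the From impl.
        serde_json::from_str::<serde_json::Value>(&line)?;
        documents += 1;
    }
    Ok(documents)
}
```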

Co-authored-by: Kerollmops <clement@meilisearch.com>
2024-12-11 13:00:50 +00:00
5622b9607d Wrap the read NDJSON pass into a tokio blocking 2024-12-11 12:18:36 +01:00
01bcc601be Use a nonrandom hasher when decoding JSON 2024-12-11 12:04:29 +01:00
93fbdc06d3 Use a nonrandom hasher when decoding NDJSON 2024-12-11 12:03:09 +01:00
69c931334f Fix the error messages categorization with invalid NDJson 2024-12-11 12:02:48 +01:00
d683f5980c Do not duplicate NDJson when unnecessary 2024-12-11 12:02:48 +01:00
f8ba112f66 Merge #5150
5150: Reintroduce the Document Addition Logs r=dureuill a=Kerollmops

This PR reintroduces lost tracing logs showing some information about the number of indexed documents.

Related to #5078. Resolves [this comment](https://github.com/meilisearch/meilisearch/pull/4900/files?show-deleted-files=true&show-viewed-files=true&file-filters%5B%5D=#r1852158338) and [this other one](https://github.com/meilisearch/meilisearch/pull/4900/files?show-deleted-files=true&show-viewed-files=true&file-filters%5B%5D=#r1852159073).
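
The kind of log being reintroduced, sketched with illustrative field names (the real targets and fields differ):

```rust
use tracing::info;

/// Illustrative only: the actual targets and field names in Meilisearch differ.
fn log_document_addition(indexed_documents: u64, total_documents: u64) {
    info!(indexed_documents, total_documents, "documents added to the index");
}
```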

Co-authored-by: Kerollmops <clement@meilisearch.com>
2024-12-11 10:48:48 +00:00
c614d0dd35 Add context when returning an error 2024-12-11 10:55:39 +01:00
479607e5dd Convert update files from OBKV to ndjson 2024-12-11 10:55:39 +01:00
bb00e70087 Reintroduce the document addition logs 2024-12-11 10:39:04 +01:00
e974be9518 Merge #5145
5145: Use bumparaw-collections in Meilisearch/milli r=dureuill a=Kerollmops

This PR is related to #5078. It uses the now published bumparaw-collections and (soon) makes the `RawMap` hasher nonrandom.
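
The general idea of a nonrandom hasher, shown with a plain `HashMap` and `rustc-hash`'s `FxHasher` rather than bumparaw-collections' `RawMap` (whose exact API is not reproduced here): a fixed seed makes hashing cheaper and reproducible across runs.

```rust
use std::collections::HashMap;
use std::hash::BuildHasherDefault;

use rustc_hash::FxHasher;

/// A map keyed with a fixed (non-random) hasher: no per-process random seed,
/// so hashing is cheaper and results are reproducible.
type FxHashMap<K, V> = HashMap<K, V, BuildHasherDefault<FxHasher>>;

fn field_ids() -> FxHashMap<String, u16> {
    let mut map = FxHashMap::default();
    map.insert("title".to_string(), 0);
    map.insert("overview".to_string(), 1);
    map
}
```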

Co-authored-by: Kerollmops <clement@meilisearch.com>
2024-12-10 15:51:01 +00:00
aeb6b74725 Make sure we use an FxHashBuilder on the Value 2024-12-10 15:52:22 +01:00
a751972c57 Prefer using a stable than a random hash builder 2024-12-10 14:25:53 +01:00
6b269795d2 Update bumparaw-collections to 0.1.2 2024-12-10 14:25:13 +01:00
d075be798a Fix tests 2024-12-10 13:39:07 +01:00
89637bcaaf Use bumparaw-collections in Meilisearch/milli 2024-12-10 11:52:20 +01:00
866ac91be3 Fix error messages 2024-12-10 11:06:58 +01:00
e610af36aa User failure for documents with docid of ==512 bytes 2024-12-10 11:06:24 +01:00
7cf6707ed3 Extend test to add the ==512 bytes case 2024-12-10 11:05:42 +01:00
1995040846 Merge #5142
5142: Try merge optimisation r=dureuill a=ManyTheFish

![Capture_decran_2024-12-09_a_11 59 42](https://github.com/user-attachments/assets/0dfc7e30-a603-4546-98d2-791990bdfcce)

Co-authored-by: ManyTheFish <many@meilisearch.com>
2024-12-09 14:48:26 +00:00
07f42e8057 Do not index a field count when no word is counted 2024-12-09 15:45:12 +01:00
71f59749dc Reduce union impact in merging 2024-12-09 15:44:06 +01:00
3b0b9967f6 Merge #5141
5141: Use the right amount of max memory and not impact the settings r=curquiza a=Kerollmops

Fixes #5132. Related to #5125.

Co-authored-by: Kerollmops <clement@meilisearch.com>
2024-12-09 10:40:46 +00:00
123b54a178 Merge #5056
5056: Attach index name in error message r=irevoire a=airycanon

# Pull Request

## Related issue
Fixes #4392 

## What does this PR do?
- ...

## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!


Co-authored-by: airycanon <airycanon@airycanon.me>
2024-12-09 09:59:12 +00:00
f5dd8dfc3e Rollback max memory usage changes 2024-12-09 10:26:30 +01:00
bcfed70888 Revert "Merge #5125"
This reverts commit 9a9383643f, reversing
changes made to cac355bfa7.
2024-12-09 10:08:02 +01:00
503ef3bbc9 Merge #5138
5138: Allow xtask bench to proceed without a commit message r=Kerollmops a=dureuill



Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-12-09 09:00:12 +00:00
08f2c696b0 Allow xtask bench to proceed without a commit message 2024-12-09 09:36:59 +01:00
b75f1f4c17 fix tests
# Conflicts:
#	crates/index-scheduler/src/batch.rs
#	crates/index-scheduler/src/snapshots/lib.rs/fail_in_process_batch_for_document_deletion/after_removing_the_documents.snap
#	crates/index-scheduler/src/snapshots/lib.rs/test_document_addition_with_bad_primary_key/fifth_task_succeeds.snap
#	crates/index-scheduler/src/snapshots/lib.rs/test_document_addition_with_bad_primary_key/fourth_task_fails.snap
#	crates/index-scheduler/src/snapshots/lib.rs/test_document_addition_with_multiple_primary_key/second_task_fails.snap
#	crates/index-scheduler/src/snapshots/lib.rs/test_document_addition_with_multiple_primary_key/third_task_fails.snap
#	crates/index-scheduler/src/snapshots/lib.rs/test_document_addition_with_multiple_primary_key_batch_wrong_key/second_and_third_tasks_fails.snap
#	crates/index-scheduler/src/snapshots/lib.rs/test_document_addition_with_set_and_null_primary_key_inference_works/all_other_tasks_succeeds.snap
#	crates/index-scheduler/src/snapshots/lib.rs/test_document_addition_with_set_and_null_primary_key_inference_works/second_task_fails.snap
#	crates/index-scheduler/src/snapshots/lib.rs/test_document_addition_with_set_and_null_primary_key_inference_works/third_task_succeeds.snap

# Conflicts:
#	crates/index-scheduler/src/batch.rs
#	crates/meilisearch/src/search/mod.rs
#	crates/meilisearch/tests/vector/mod.rs

# Conflicts:
#	crates/index-scheduler/src/batch.rs
2024-12-06 02:03:02 +08:00
95ed079761 attach index name in errors
# Conflicts:
#	crates/index-scheduler/src/batch.rs

# Conflicts:
#	crates/index-scheduler/src/batch.rs
#	crates/meilisearch/src/search/mod.rs
2024-12-06 01:12:13 +08:00
4a082683df Merge #5131
5131: Ignore documents whose selected fields didn't change r=dureuill a=dureuill

Attempts to improve the new indexer performance by ignoring documents whose selected fields didn't change:

- Add `Update::has_changed_for_fields` function (sketched after this list)
- Ignore documents whose searchable attributes didn't change for word docids and word pair proximity extraction
- Ignore documents whose faceted attributes didn't change for facet extraction
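
A hedged sketch of the check behind `Update::has_changed_for_fields`, using plain `serde_json` maps instead of milli's internal document types: extraction is skipped when none of the selected fields differ.

```rust
use serde_json::{Map, Value};

/// Sketch of the idea: the update is worth re-extracting only if at least
/// one of the selected fields differs between the old and new versions.
fn has_changed_for_fields(
    old: &Map<String, Value>,
    new: &Map<String, Value>,
    fields: &[&str],
) -> bool {
    fields.iter().any(|field| old.get(*field) != new.get(*field))
}
```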

Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-12-05 16:04:16 +00:00
26be5e0733 Merge #5123
5123: Fix batch details r=dureuill a=irevoire

# Pull Request

## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/5079
Fixes https://github.com/meilisearch/meilisearch/issues/5112

## What does this PR do?
- Mark the processing tasks as processing in the batch stats instead of enqueued
- Stop counting one extra task for all non-prioritized batches in the stats
- Add a test

Co-authored-by: Tamo <tamo@meilisearch.com>
2024-12-05 15:21:55 +00:00
bd5110a2fe Fix clippy warnings 2024-12-05 16:13:07 +01:00
fa8b9acdf6 Ignore documents that didn't change in facets 2024-12-05 16:12:52 +01:00
2b74d1824b Ignore documents that didn't change any field in word pair proximity 2024-12-05 15:56:22 +01:00
c77b00d3ac Don't extract word docids when no searchable changed 2024-12-05 15:51:58 +01:00
c77073efcc Update::has_changed_for_fields 2024-12-05 15:50:12 +01:00
1537323eb9 Merge #5119
5119: Settings opt out error msg r=Kerollmops a=ManyTheFish

# Pull Request

## Related issue
PRD: https://meilisearch.notion.site/API-usage-Settings-to-opt-out-indexing-features-fff4b06b651f8108ade3f858aeb16b14?pvs=4
## What does this PR do?

Add a new error code and message when the user tries a facet search on an index where the facet search is disabled:
```json
{
  "message": "The facet search is disabled for this index",
  "code": "facet_search_disabled",
  "type": "invalid_request",
  "link": "https://docs.meilisearch.com/errors#invalid_facet_search_disabled"
}
 ```


Co-authored-by: ManyTheFish <many@meilisearch.com>
2024-12-05 13:51:11 +00:00
a0a3b55700 Change error code 2024-12-05 14:48:29 +01:00
214b51de87 try to fix the snapshot on demand flaky test 2024-12-05 14:45:54 +01:00
95975944d7 fix the dumps missing the empty swap index tasks 2024-12-05 14:23:38 +01:00
9a9383643f Merge #5125
5125: Change the default max memory usage to 5% of the total memory r=ManyTheFish a=Kerollmops

After thorough testing, we found that giving 5% of the total available memory to allocate resident memory (caches and channels) is the best approach.

The main reason is that the new indexer is highly memory-map oriented, with LMDB, and reads the database while performing the indexing. So, by leaving as much memory as possible to LMDB and the OS, it performs the key-value store reads and all the other indexing operations faster by keeping more pages hot in the cache. In #5124, we also sorted the entries to merge to improve the read speed of LMDB.

This is common in database management systems: reading from disk is much faster when done in lexicographic order (the default sort order of the keys). The entries have a good chance of already being in the OS page cache from a previous read, and reading from disk is very slow compared to reading from memory.
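
The resulting default budget itself is simple arithmetic; a sketch, with the total memory assumed to come from the OS (e.g. via the sysinfo crate, which is an assumption here):

```rust
/// Default indexing memory budget: 5% of the machine's total memory,
/// leaving the rest to LMDB's memory-mapped pages and the OS page cache.
fn default_max_indexing_memory(total_memory_bytes: u64) -> u64 {
    total_memory_bytes / 20 // 5% == 1/20
}

fn main() {
    // Assumed to be obtained from the OS (e.g. with the sysinfo crate).
    let total = 32u64 * 1024 * 1024 * 1024; // a 32 GiB machine
    println!("{} MiB", default_max_indexing_memory(total) / (1024 * 1024));
}
```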

Co-authored-by: Kerollmops <clement@meilisearch.com>
2024-12-05 10:11:25 +00:00
9020a50df8 Change the default max memory usage to 5% of the total memory 2024-12-05 10:14:46 +01:00
7a2af06b1e update the impacted snapshots 2024-12-04 15:52:24 +01:00
cb0c3a5aad stop adding one enqueued tasks to all unprioritized batches 2024-12-04 15:48:28 +01:00
cbcf6c9ba3 make the processing tasks as processing in a batch 2024-12-04 14:48:48 +01:00
bf742d81cf add a test 2024-12-04 14:47:02 +01:00
fc1df5793c fix tests 2024-12-04 14:35:20 +01:00
953a82ca04 Add new error message 2024-12-04 11:15:29 +01:00
114 changed files with 3283 additions and 1104 deletions

90
Cargo.lock generated
View File

@ -496,7 +496,7 @@ source = "git+https://github.com/meilisearch/bbqueue#cbb87cc707b5af415ef203bdaf2
[[package]] [[package]]
name = "benchmarks" name = "benchmarks"
version = "1.12.0" version = "1.12.1"
dependencies = [ dependencies = [
"anyhow", "anyhow",
"bumpalo", "bumpalo",
@ -689,7 +689,7 @@ dependencies = [
[[package]] [[package]]
name = "build-info" name = "build-info"
version = "1.12.0" version = "1.12.1"
dependencies = [ dependencies = [
"anyhow", "anyhow",
"time", "time",
@ -706,6 +706,20 @@ dependencies = [
"serde", "serde",
] ]
[[package]]
name = "bumparaw-collections"
version = "0.1.4"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "4ce682bdc86c2e25ef5cd95881d9d6a1902214eddf74cf9ffea88fe1464377e8"
dependencies = [
"allocator-api2",
"bitpacking",
"bumpalo",
"hashbrown 0.15.1",
"serde",
"serde_json",
]
[[package]] [[package]]
name = "byte-unit" name = "byte-unit"
version = "5.1.4" version = "5.1.4"
@ -1650,7 +1664,7 @@ dependencies = [
[[package]] [[package]]
name = "dump" name = "dump"
version = "1.12.0" version = "1.12.1"
dependencies = [ dependencies = [
"anyhow", "anyhow",
"big_s", "big_s",
@ -1862,7 +1876,7 @@ checksum = "486f806e73c5707928240ddc295403b1b93c96a02038563881c4a2fd84b81ac4"
[[package]] [[package]]
name = "file-store" name = "file-store"
version = "1.12.0" version = "1.12.1"
dependencies = [ dependencies = [
"tempfile", "tempfile",
"thiserror", "thiserror",
@ -1884,7 +1898,7 @@ dependencies = [
[[package]] [[package]]
name = "filter-parser" name = "filter-parser"
version = "1.12.0" version = "1.12.1"
dependencies = [ dependencies = [
"insta", "insta",
"nom", "nom",
@ -1904,7 +1918,7 @@ dependencies = [
[[package]] [[package]]
name = "flatten-serde-json" name = "flatten-serde-json"
version = "1.12.0" version = "1.12.1"
dependencies = [ dependencies = [
"criterion", "criterion",
"serde_json", "serde_json",
@ -2043,7 +2057,7 @@ dependencies = [
[[package]] [[package]]
name = "fuzzers" name = "fuzzers"
version = "1.12.0" version = "1.12.1"
dependencies = [ dependencies = [
"arbitrary", "arbitrary",
"bumpalo", "bumpalo",
@ -2610,13 +2624,15 @@ checksum = "206ca75c9c03ba3d4ace2460e57b189f39f43de612c2f85836e65c929701bb2d"
[[package]] [[package]]
name = "index-scheduler" name = "index-scheduler"
version = "1.12.0" version = "1.12.1"
dependencies = [ dependencies = [
"anyhow", "anyhow",
"arroy 0.5.0 (registry+https://github.com/rust-lang/crates.io-index)", "arroy 0.5.0 (registry+https://github.com/rust-lang/crates.io-index)",
"big_s", "big_s",
"bincode", "bincode",
"bumpalo", "bumpalo",
"bumparaw-collections",
"convert_case 0.6.0",
"crossbeam-channel", "crossbeam-channel",
"csv", "csv",
"derive_builder 0.20.0", "derive_builder 0.20.0",
@ -2631,7 +2647,6 @@ dependencies = [
"meilisearch-types", "meilisearch-types",
"memmap2", "memmap2",
"page_size", "page_size",
"raw-collections",
"rayon", "rayon",
"roaring", "roaring",
"serde", "serde",
@ -2647,12 +2662,12 @@ dependencies = [
[[package]] [[package]]
name = "indexmap" name = "indexmap"
version = "2.2.6" version = "2.7.0"
source = "registry+https://github.com/rust-lang/crates.io-index" source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "168fb715dda47215e360912c096649d23d58bf392ac62f73919e831745e40f26" checksum = "62f822373a4fe84d4bb149bf54e584a7f4abec90e072ed49cda0edea5b95471f"
dependencies = [ dependencies = [
"equivalent", "equivalent",
"hashbrown 0.14.3", "hashbrown 0.15.1",
"serde", "serde",
] ]
@ -2807,7 +2822,7 @@ dependencies = [
[[package]] [[package]]
name = "json-depth-checker" name = "json-depth-checker"
version = "1.12.0" version = "1.12.1"
dependencies = [ dependencies = [
"criterion", "criterion",
"serde_json", "serde_json",
@ -3426,7 +3441,7 @@ checksum = "490cc448043f947bae3cbee9c203358d62dbee0db12107a74be5c30ccfd09771"
[[package]] [[package]]
name = "meili-snap" name = "meili-snap"
version = "1.12.0" version = "1.12.1"
dependencies = [ dependencies = [
"insta", "insta",
"md5", "md5",
@ -3435,7 +3450,7 @@ dependencies = [
[[package]] [[package]]
name = "meilisearch" name = "meilisearch"
version = "1.12.0" version = "1.12.1"
dependencies = [ dependencies = [
"actix-cors", "actix-cors",
"actix-http", "actix-http",
@ -3525,7 +3540,7 @@ dependencies = [
[[package]] [[package]]
name = "meilisearch-auth" name = "meilisearch-auth"
version = "1.12.0" version = "1.12.1"
dependencies = [ dependencies = [
"base64 0.22.1", "base64 0.22.1",
"enum-iterator", "enum-iterator",
@ -3544,11 +3559,12 @@ dependencies = [
[[package]] [[package]]
name = "meilisearch-types" name = "meilisearch-types"
version = "1.12.0" version = "1.12.1"
dependencies = [ dependencies = [
"actix-web", "actix-web",
"anyhow", "anyhow",
"bumpalo", "bumpalo",
"bumparaw-collections",
"convert_case 0.6.0", "convert_case 0.6.0",
"csv", "csv",
"deserr", "deserr",
@ -3561,8 +3577,8 @@ dependencies = [
"meili-snap", "meili-snap",
"memmap2", "memmap2",
"milli", "milli",
"raw-collections",
"roaring", "roaring",
"rustc-hash 2.1.0",
"serde", "serde",
"serde-cs", "serde-cs",
"serde_json", "serde_json",
@ -3576,16 +3592,19 @@ dependencies = [
[[package]] [[package]]
name = "meilitool" name = "meilitool"
version = "1.12.0" version = "1.12.1"
dependencies = [ dependencies = [
"anyhow", "anyhow",
"arroy 0.5.0 (git+https://github.com/meilisearch/arroy/?tag=DO-NOT-DELETE-upgrade-v04-to-v05)", "arroy 0.5.0 (git+https://github.com/meilisearch/arroy/?tag=DO-NOT-DELETE-upgrade-v04-to-v05)",
"clap", "clap",
"dump", "dump",
"file-store", "file-store",
"indexmap",
"meilisearch-auth", "meilisearch-auth",
"meilisearch-types", "meilisearch-types",
"serde", "serde",
"serde_json",
"tempfile",
"time", "time",
"uuid", "uuid",
] ]
@ -3608,7 +3627,7 @@ dependencies = [
[[package]] [[package]]
name = "milli" name = "milli"
version = "1.12.0" version = "1.12.1"
dependencies = [ dependencies = [
"allocator-api2", "allocator-api2",
"arroy 0.5.0 (registry+https://github.com/rust-lang/crates.io-index)", "arroy 0.5.0 (registry+https://github.com/rust-lang/crates.io-index)",
@ -3618,6 +3637,7 @@ dependencies = [
"bincode", "bincode",
"bstr", "bstr",
"bumpalo", "bumpalo",
"bumparaw-collections",
"bytemuck", "bytemuck",
"byteorder", "byteorder",
"candle-core", "candle-core",
@ -3656,13 +3676,12 @@ dependencies = [
"once_cell", "once_cell",
"ordered-float", "ordered-float",
"rand", "rand",
"raw-collections",
"rayon", "rayon",
"rayon-par-bridge", "rayon-par-bridge",
"rhai", "rhai",
"roaring", "roaring",
"rstar", "rstar",
"rustc-hash 2.0.0", "rustc-hash 2.1.0",
"serde", "serde",
"serde_json", "serde_json",
"slice-group-by", "slice-group-by",
@ -4064,7 +4083,7 @@ checksum = "e3148f5046208a5d56bcfc03053e3ca6334e51da8dfb19b6cdc8b306fae3283e"
[[package]] [[package]]
name = "permissive-json-pointer" name = "permissive-json-pointer"
version = "1.12.0" version = "1.12.1"
dependencies = [ dependencies = [
"big_s", "big_s",
"serde_json", "serde_json",
@ -4411,7 +4430,7 @@ dependencies = [
"bytes", "bytes",
"rand", "rand",
"ring", "ring",
"rustc-hash 2.0.0", "rustc-hash 2.1.0",
"rustls", "rustls",
"slab", "slab",
"thiserror", "thiserror",
@ -4487,19 +4506,6 @@ dependencies = [
"rand", "rand",
] ]
[[package]]
name = "raw-collections"
version = "0.1.0"
source = "git+https://github.com/meilisearch/raw-collections.git#15e5d7bdebc0c149b2a28b2454f307c717d07f8a"
dependencies = [
"allocator-api2",
"bitpacking",
"bumpalo",
"hashbrown 0.15.1",
"serde",
"serde_json",
]
[[package]] [[package]]
name = "raw-cpuid" name = "raw-cpuid"
version = "10.7.0" version = "10.7.0"
@ -4797,9 +4803,9 @@ checksum = "08d43f7aa6b08d49f382cde6a7982047c3426db949b1424bc4b7ec9ae12c6ce2"
[[package]] [[package]]
name = "rustc-hash" name = "rustc-hash"
version = "2.0.0" version = "2.1.0"
source = "registry+https://github.com/rust-lang/crates.io-index" source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "583034fd73374156e66797ed8e5b0d5690409c9226b22d87cb7f19821c05d152" checksum = "c7fb8039b3032c191086b10f11f319a6e99e1e82889c5cc6046f515c9db1d497"
[[package]] [[package]]
name = "rustc_version" name = "rustc_version"
@ -4968,9 +4974,9 @@ dependencies = [
[[package]] [[package]]
name = "serde_json" name = "serde_json"
version = "1.0.132" version = "1.0.133"
source = "registry+https://github.com/rust-lang/crates.io-index" source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "d726bfaff4b320266d395898905d0eba0345aae23b54aee3a737e260fd46db03" checksum = "c7fceb2473b9166b2294ef05efcb65a3db80803f0b03ef86a5fc88a2b85ee377"
dependencies = [ dependencies = [
"indexmap", "indexmap",
"itoa", "itoa",
@ -6480,7 +6486,7 @@ dependencies = [
[[package]] [[package]]
name = "xtask" name = "xtask"
version = "1.12.0" version = "1.12.1"
dependencies = [ dependencies = [
"anyhow", "anyhow",
"build-info", "build-info",

View File

@ -22,7 +22,7 @@ members = [
] ]
[workspace.package] [workspace.package]
version = "1.12.0" version = "1.12.1"
authors = [ authors = [
"Quentin de Quelen <quentin@dequelen.me>", "Quentin de Quelen <quentin@dequelen.me>",
"Clément Renault <clement@meilisearch.com>", "Clément Renault <clement@meilisearch.com>",

View File

@ -8,6 +8,7 @@ use bumpalo::Bump;
use criterion::{criterion_group, criterion_main, Criterion}; use criterion::{criterion_group, criterion_main, Criterion};
use milli::documents::PrimaryKey; use milli::documents::PrimaryKey;
use milli::heed::{EnvOpenOptions, RwTxn}; use milli::heed::{EnvOpenOptions, RwTxn};
use milli::progress::Progress;
use milli::update::new::indexer; use milli::update::new::indexer;
use milli::update::{IndexDocumentsMethod, IndexerConfig, Settings}; use milli::update::{IndexDocumentsMethod, IndexerConfig, Settings};
use milli::vector::EmbeddingConfigs; use milli::vector::EmbeddingConfigs;
@ -151,7 +152,7 @@ fn indexing_songs_default(c: &mut Criterion) {
None, None,
&mut new_fields_ids_map, &mut new_fields_ids_map,
&|| false, &|| false,
&|_progress| (), Progress::default(),
) )
.unwrap(); .unwrap();
@ -166,7 +167,7 @@ fn indexing_songs_default(c: &mut Criterion) {
&document_changes, &document_changes,
EmbeddingConfigs::default(), EmbeddingConfigs::default(),
&|| false, &|| false,
&|_| (), &Progress::default(),
) )
.unwrap(); .unwrap();
@ -218,7 +219,7 @@ fn reindexing_songs_default(c: &mut Criterion) {
None, None,
&mut new_fields_ids_map, &mut new_fields_ids_map,
&|| false, &|| false,
&|_progress| (), Progress::default(),
) )
.unwrap(); .unwrap();
@ -233,7 +234,7 @@ fn reindexing_songs_default(c: &mut Criterion) {
&document_changes, &document_changes,
EmbeddingConfigs::default(), EmbeddingConfigs::default(),
&|| false, &|| false,
&|_| (), &Progress::default(),
) )
.unwrap(); .unwrap();
@ -263,7 +264,7 @@ fn reindexing_songs_default(c: &mut Criterion) {
None, None,
&mut new_fields_ids_map, &mut new_fields_ids_map,
&|| false, &|| false,
&|_progress| (), Progress::default(),
) )
.unwrap(); .unwrap();
@ -278,7 +279,7 @@ fn reindexing_songs_default(c: &mut Criterion) {
&document_changes, &document_changes,
EmbeddingConfigs::default(), EmbeddingConfigs::default(),
&|| false, &|| false,
&|_| (), &Progress::default(),
) )
.unwrap(); .unwrap();
@ -332,7 +333,7 @@ fn deleting_songs_in_batches_default(c: &mut Criterion) {
None, None,
&mut new_fields_ids_map, &mut new_fields_ids_map,
&|| false, &|| false,
&|_progress| (), Progress::default(),
) )
.unwrap(); .unwrap();
@ -347,7 +348,7 @@ fn deleting_songs_in_batches_default(c: &mut Criterion) {
&document_changes, &document_changes,
EmbeddingConfigs::default(), EmbeddingConfigs::default(),
&|| false, &|| false,
&|_| (), &Progress::default(),
) )
.unwrap(); .unwrap();
@ -409,7 +410,7 @@ fn indexing_songs_in_three_batches_default(c: &mut Criterion) {
None, None,
&mut new_fields_ids_map, &mut new_fields_ids_map,
&|| false, &|| false,
&|_progress| (), Progress::default(),
) )
.unwrap(); .unwrap();
@ -424,7 +425,7 @@ fn indexing_songs_in_three_batches_default(c: &mut Criterion) {
&document_changes, &document_changes,
EmbeddingConfigs::default(), EmbeddingConfigs::default(),
&|| false, &|| false,
&|_| (), &Progress::default(),
) )
.unwrap(); .unwrap();
@ -454,7 +455,7 @@ fn indexing_songs_in_three_batches_default(c: &mut Criterion) {
None, None,
&mut new_fields_ids_map, &mut new_fields_ids_map,
&|| false, &|| false,
&|_progress| (), Progress::default(),
) )
.unwrap(); .unwrap();
@ -469,7 +470,7 @@ fn indexing_songs_in_three_batches_default(c: &mut Criterion) {
&document_changes, &document_changes,
EmbeddingConfigs::default(), EmbeddingConfigs::default(),
&|| false, &|| false,
&|_| (), &Progress::default(),
) )
.unwrap(); .unwrap();
@ -495,7 +496,7 @@ fn indexing_songs_in_three_batches_default(c: &mut Criterion) {
None, None,
&mut new_fields_ids_map, &mut new_fields_ids_map,
&|| false, &|| false,
&|_progress| (), Progress::default(),
) )
.unwrap(); .unwrap();
@ -510,7 +511,7 @@ fn indexing_songs_in_three_batches_default(c: &mut Criterion) {
&document_changes, &document_changes,
EmbeddingConfigs::default(), EmbeddingConfigs::default(),
&|| false, &|| false,
&|_| (), &Progress::default(),
) )
.unwrap(); .unwrap();
@ -563,7 +564,7 @@ fn indexing_songs_without_faceted_numbers(c: &mut Criterion) {
None, None,
&mut new_fields_ids_map, &mut new_fields_ids_map,
&|| false, &|| false,
&|_progress| (), Progress::default(),
) )
.unwrap(); .unwrap();
@ -578,7 +579,7 @@ fn indexing_songs_without_faceted_numbers(c: &mut Criterion) {
&document_changes, &document_changes,
EmbeddingConfigs::default(), EmbeddingConfigs::default(),
&|| false, &|| false,
&|_| (), &Progress::default(),
) )
.unwrap(); .unwrap();
@ -630,7 +631,7 @@ fn indexing_songs_without_faceted_fields(c: &mut Criterion) {
None, None,
&mut new_fields_ids_map, &mut new_fields_ids_map,
&|| false, &|| false,
&|_progress| (), Progress::default(),
) )
.unwrap(); .unwrap();
@ -645,7 +646,7 @@ fn indexing_songs_without_faceted_fields(c: &mut Criterion) {
&document_changes, &document_changes,
EmbeddingConfigs::default(), EmbeddingConfigs::default(),
&|| false, &|| false,
&|_| (), &Progress::default(),
) )
.unwrap(); .unwrap();
@ -697,7 +698,7 @@ fn indexing_wiki(c: &mut Criterion) {
None, None,
&mut new_fields_ids_map, &mut new_fields_ids_map,
&|| false, &|| false,
&|_progress| (), Progress::default(),
) )
.unwrap(); .unwrap();
@ -712,7 +713,7 @@ fn indexing_wiki(c: &mut Criterion) {
&document_changes, &document_changes,
EmbeddingConfigs::default(), EmbeddingConfigs::default(),
&|| false, &|| false,
&|_| (), &Progress::default(),
) )
.unwrap(); .unwrap();
@ -763,7 +764,7 @@ fn reindexing_wiki(c: &mut Criterion) {
None, None,
&mut new_fields_ids_map, &mut new_fields_ids_map,
&|| false, &|| false,
&|_progress| (), Progress::default(),
) )
.unwrap(); .unwrap();
@ -778,7 +779,7 @@ fn reindexing_wiki(c: &mut Criterion) {
&document_changes, &document_changes,
EmbeddingConfigs::default(), EmbeddingConfigs::default(),
&|| false, &|| false,
&|_| (), &Progress::default(),
) )
.unwrap(); .unwrap();
@ -808,7 +809,7 @@ fn reindexing_wiki(c: &mut Criterion) {
None, None,
&mut new_fields_ids_map, &mut new_fields_ids_map,
&|| false, &|| false,
&|_progress| (), Progress::default(),
) )
.unwrap(); .unwrap();
@ -823,7 +824,7 @@ fn reindexing_wiki(c: &mut Criterion) {
&document_changes, &document_changes,
EmbeddingConfigs::default(), EmbeddingConfigs::default(),
&|| false, &|| false,
&|_| (), &Progress::default(),
) )
.unwrap(); .unwrap();
@ -876,7 +877,7 @@ fn deleting_wiki_in_batches_default(c: &mut Criterion) {
None, None,
&mut new_fields_ids_map, &mut new_fields_ids_map,
&|| false, &|| false,
&|_progress| (), Progress::default(),
) )
.unwrap(); .unwrap();
@ -891,7 +892,7 @@ fn deleting_wiki_in_batches_default(c: &mut Criterion) {
&document_changes, &document_changes,
EmbeddingConfigs::default(), EmbeddingConfigs::default(),
&|| false, &|| false,
&|_| (), &Progress::default(),
) )
.unwrap(); .unwrap();
@ -953,7 +954,7 @@ fn indexing_wiki_in_three_batches(c: &mut Criterion) {
None, None,
&mut new_fields_ids_map, &mut new_fields_ids_map,
&|| false, &|| false,
&|_progress| (), Progress::default(),
) )
.unwrap(); .unwrap();
@ -968,7 +969,7 @@ fn indexing_wiki_in_three_batches(c: &mut Criterion) {
&document_changes, &document_changes,
EmbeddingConfigs::default(), EmbeddingConfigs::default(),
&|| false, &|| false,
&|_| (), &Progress::default(),
) )
.unwrap(); .unwrap();
@ -999,7 +1000,7 @@ fn indexing_wiki_in_three_batches(c: &mut Criterion) {
None, None,
&mut new_fields_ids_map, &mut new_fields_ids_map,
&|| false, &|| false,
&|_progress| (), Progress::default(),
) )
.unwrap(); .unwrap();
@ -1014,7 +1015,7 @@ fn indexing_wiki_in_three_batches(c: &mut Criterion) {
&document_changes, &document_changes,
EmbeddingConfigs::default(), EmbeddingConfigs::default(),
&|| false, &|| false,
&|_| (), &Progress::default(),
) )
.unwrap(); .unwrap();
@ -1041,7 +1042,7 @@ fn indexing_wiki_in_three_batches(c: &mut Criterion) {
None, None,
&mut new_fields_ids_map, &mut new_fields_ids_map,
&|| false, &|| false,
&|_progress| (), Progress::default(),
) )
.unwrap(); .unwrap();
@ -1056,7 +1057,7 @@ fn indexing_wiki_in_three_batches(c: &mut Criterion) {
&document_changes, &document_changes,
EmbeddingConfigs::default(), EmbeddingConfigs::default(),
&|| false, &|| false,
&|_| (), &Progress::default(),
) )
.unwrap(); .unwrap();
@ -1108,7 +1109,7 @@ fn indexing_movies_default(c: &mut Criterion) {
None, None,
&mut new_fields_ids_map, &mut new_fields_ids_map,
&|| false, &|| false,
&|_progress| (), Progress::default(),
) )
.unwrap(); .unwrap();
@ -1123,7 +1124,7 @@ fn indexing_movies_default(c: &mut Criterion) {
&document_changes, &document_changes,
EmbeddingConfigs::default(), EmbeddingConfigs::default(),
&|| false, &|| false,
&|_| (), &Progress::default(),
) )
.unwrap(); .unwrap();
@ -1174,7 +1175,7 @@ fn reindexing_movies_default(c: &mut Criterion) {
None, None,
&mut new_fields_ids_map, &mut new_fields_ids_map,
&|| false, &|| false,
&|_progress| (), Progress::default(),
) )
.unwrap(); .unwrap();
@ -1189,7 +1190,7 @@ fn reindexing_movies_default(c: &mut Criterion) {
&document_changes, &document_changes,
EmbeddingConfigs::default(), EmbeddingConfigs::default(),
&|| false, &|| false,
&|_| (), &Progress::default(),
) )
.unwrap(); .unwrap();
@ -1219,7 +1220,7 @@ fn reindexing_movies_default(c: &mut Criterion) {
None, None,
&mut new_fields_ids_map, &mut new_fields_ids_map,
&|| false, &|| false,
&|_progress| (), Progress::default(),
) )
.unwrap(); .unwrap();
@ -1234,7 +1235,7 @@ fn reindexing_movies_default(c: &mut Criterion) {
&document_changes, &document_changes,
EmbeddingConfigs::default(), EmbeddingConfigs::default(),
&|| false, &|| false,
&|_| (), &Progress::default(),
) )
.unwrap(); .unwrap();
@ -1287,7 +1288,7 @@ fn deleting_movies_in_batches_default(c: &mut Criterion) {
None, None,
&mut new_fields_ids_map, &mut new_fields_ids_map,
&|| false, &|| false,
&|_progress| (), Progress::default(),
) )
.unwrap(); .unwrap();
@ -1302,7 +1303,7 @@ fn deleting_movies_in_batches_default(c: &mut Criterion) {
&document_changes, &document_changes,
EmbeddingConfigs::default(), EmbeddingConfigs::default(),
&|| false, &|| false,
&|_| (), &Progress::default(),
) )
.unwrap(); .unwrap();
@ -1350,7 +1351,7 @@ fn delete_documents_from_ids(index: Index, document_ids_to_delete: Vec<RoaringBi
&document_changes, &document_changes,
EmbeddingConfigs::default(), EmbeddingConfigs::default(),
&|| false, &|| false,
&|_| (), &Progress::default(),
) )
.unwrap(); .unwrap();
@ -1400,7 +1401,7 @@ fn indexing_movies_in_three_batches(c: &mut Criterion) {
None, None,
&mut new_fields_ids_map, &mut new_fields_ids_map,
&|| false, &|| false,
&|_progress| (), Progress::default(),
) )
.unwrap(); .unwrap();
@ -1415,7 +1416,7 @@ fn indexing_movies_in_three_batches(c: &mut Criterion) {
&document_changes, &document_changes,
EmbeddingConfigs::default(), EmbeddingConfigs::default(),
&|| false, &|| false,
&|_| (), &Progress::default(),
) )
.unwrap(); .unwrap();
@ -1445,7 +1446,7 @@ fn indexing_movies_in_three_batches(c: &mut Criterion) {
None, None,
&mut new_fields_ids_map, &mut new_fields_ids_map,
&|| false, &|| false,
&|_progress| (), Progress::default(),
) )
.unwrap(); .unwrap();
@ -1460,7 +1461,7 @@ fn indexing_movies_in_three_batches(c: &mut Criterion) {
&document_changes, &document_changes,
EmbeddingConfigs::default(), EmbeddingConfigs::default(),
&|| false, &|| false,
&|_| (), &Progress::default(),
) )
.unwrap(); .unwrap();
@ -1486,7 +1487,7 @@ fn indexing_movies_in_three_batches(c: &mut Criterion) {
None, None,
&mut new_fields_ids_map, &mut new_fields_ids_map,
&|| false, &|| false,
&|_progress| (), Progress::default(),
) )
.unwrap(); .unwrap();
@ -1501,7 +1502,7 @@ fn indexing_movies_in_three_batches(c: &mut Criterion) {
&document_changes, &document_changes,
EmbeddingConfigs::default(), EmbeddingConfigs::default(),
&|| false, &|| false,
&|_| (), &Progress::default(),
) )
.unwrap(); .unwrap();
@ -1576,7 +1577,7 @@ fn indexing_nested_movies_default(c: &mut Criterion) {
None, None,
&mut new_fields_ids_map, &mut new_fields_ids_map,
&|| false, &|| false,
&|_progress| (), Progress::default(),
) )
.unwrap(); .unwrap();
@ -1591,7 +1592,7 @@ fn indexing_nested_movies_default(c: &mut Criterion) {
&document_changes, &document_changes,
EmbeddingConfigs::default(), EmbeddingConfigs::default(),
&|| false, &|| false,
&|_| (), &Progress::default(),
) )
.unwrap(); .unwrap();
@ -1667,7 +1668,7 @@ fn deleting_nested_movies_in_batches_default(c: &mut Criterion) {
None, None,
&mut new_fields_ids_map, &mut new_fields_ids_map,
&|| false, &|| false,
&|_progress| (), Progress::default(),
) )
.unwrap(); .unwrap();
@ -1682,7 +1683,7 @@ fn deleting_nested_movies_in_batches_default(c: &mut Criterion) {
&document_changes, &document_changes,
EmbeddingConfigs::default(), EmbeddingConfigs::default(),
&|| false, &|| false,
&|_| (), &Progress::default(),
) )
.unwrap(); .unwrap();
@ -1750,7 +1751,7 @@ fn indexing_nested_movies_without_faceted_fields(c: &mut Criterion) {
None, None,
&mut new_fields_ids_map, &mut new_fields_ids_map,
&|| false, &|| false,
&|_progress| (), Progress::default(),
) )
.unwrap(); .unwrap();
@ -1765,7 +1766,7 @@ fn indexing_nested_movies_without_faceted_fields(c: &mut Criterion) {
&document_changes, &document_changes,
EmbeddingConfigs::default(), EmbeddingConfigs::default(),
&|| false, &|| false,
&|_| (), &Progress::default(),
) )
.unwrap(); .unwrap();
@ -1817,7 +1818,7 @@ fn indexing_geo(c: &mut Criterion) {
None, None,
&mut new_fields_ids_map, &mut new_fields_ids_map,
&|| false, &|| false,
&|_progress| (), Progress::default(),
) )
.unwrap(); .unwrap();
@ -1832,7 +1833,7 @@ fn indexing_geo(c: &mut Criterion) {
&document_changes, &document_changes,
EmbeddingConfigs::default(), EmbeddingConfigs::default(),
&|| false, &|| false,
&|_| (), &Progress::default(),
) )
.unwrap(); .unwrap();
@ -1883,7 +1884,7 @@ fn reindexing_geo(c: &mut Criterion) {
None, None,
&mut new_fields_ids_map, &mut new_fields_ids_map,
&|| false, &|| false,
&|_progress| (), Progress::default(),
) )
.unwrap(); .unwrap();
@ -1898,7 +1899,7 @@ fn reindexing_geo(c: &mut Criterion) {
&document_changes, &document_changes,
EmbeddingConfigs::default(), EmbeddingConfigs::default(),
&|| false, &|| false,
&|_| (), &Progress::default(),
) )
.unwrap(); .unwrap();
@ -1928,7 +1929,7 @@ fn reindexing_geo(c: &mut Criterion) {
None, None,
&mut new_fields_ids_map, &mut new_fields_ids_map,
&|| false, &|| false,
&|_progress| (), Progress::default(),
) )
.unwrap(); .unwrap();
@ -1943,7 +1944,7 @@ fn reindexing_geo(c: &mut Criterion) {
&document_changes, &document_changes,
EmbeddingConfigs::default(), EmbeddingConfigs::default(),
&|| false, &|| false,
&|_| (), &Progress::default(),
) )
.unwrap(); .unwrap();
@ -1996,7 +1997,7 @@ fn deleting_geo_in_batches_default(c: &mut Criterion) {
None, None,
&mut new_fields_ids_map, &mut new_fields_ids_map,
&|| false, &|| false,
&|_progress| (), Progress::default(),
) )
.unwrap(); .unwrap();
@ -2011,7 +2012,7 @@ fn deleting_geo_in_batches_default(c: &mut Criterion) {
&document_changes, &document_changes,
EmbeddingConfigs::default(), EmbeddingConfigs::default(),
&|| false, &|| false,
&|_| (), &Progress::default(),
) )
.unwrap(); .unwrap();

View File

@ -10,6 +10,7 @@ use bumpalo::Bump;
use criterion::BenchmarkId; use criterion::BenchmarkId;
use memmap2::Mmap; use memmap2::Mmap;
use milli::heed::EnvOpenOptions; use milli::heed::EnvOpenOptions;
use milli::progress::Progress;
use milli::update::new::indexer; use milli::update::new::indexer;
use milli::update::{IndexDocumentsMethod, IndexerConfig, Settings}; use milli::update::{IndexDocumentsMethod, IndexerConfig, Settings};
use milli::vector::EmbeddingConfigs; use milli::vector::EmbeddingConfigs;
@ -110,7 +111,7 @@ pub fn base_setup(conf: &Conf) -> Index {
None, None,
&mut new_fields_ids_map, &mut new_fields_ids_map,
&|| false, &|| false,
&|_progress| (), Progress::default(),
) )
.unwrap(); .unwrap();
@ -125,7 +126,7 @@ pub fn base_setup(conf: &Conf) -> Index {
&document_changes, &document_changes,
EmbeddingConfigs::default(), EmbeddingConfigs::default(),
&|| false, &|| false,
&|_| (), &Progress::default(),
) )
.unwrap(); .unwrap();

View File

@ -136,6 +136,14 @@ pub struct File {
} }
impl File { impl File {
pub fn from_parts(path: PathBuf, file: Option<NamedTempFile>) -> Self {
Self { path, file }
}
pub fn into_parts(self) -> (PathBuf, Option<NamedTempFile>) {
(self.path, self.file)
}
pub fn dry_file() -> Result<Self> { pub fn dry_file() -> Result<Self> {
Ok(Self { path: PathBuf::new(), file: None }) Ok(Self { path: PathBuf::new(), file: None })
} }

View File

@ -10,6 +10,7 @@ use either::Either;
use fuzzers::Operation; use fuzzers::Operation;
use milli::documents::mmap_from_objects; use milli::documents::mmap_from_objects;
use milli::heed::EnvOpenOptions; use milli::heed::EnvOpenOptions;
use milli::progress::Progress;
use milli::update::new::indexer; use milli::update::new::indexer;
use milli::update::{IndexDocumentsMethod, IndexerConfig}; use milli::update::{IndexDocumentsMethod, IndexerConfig};
use milli::vector::EmbeddingConfigs; use milli::vector::EmbeddingConfigs;
@ -128,7 +129,7 @@ fn main() {
None, None,
&mut new_fields_ids_map, &mut new_fields_ids_map,
&|| false, &|| false,
&|_progress| (), Progress::default(),
) )
.unwrap(); .unwrap();
@ -143,7 +144,7 @@ fn main() {
&document_changes, &document_changes,
embedders, embedders,
&|| false, &|| false,
&|_| (), &Progress::default(),
) )
.unwrap(); .unwrap();

View File

@ -13,6 +13,9 @@ license.workspace = true
[dependencies] [dependencies]
anyhow = "1.0.86" anyhow = "1.0.86"
bincode = "1.3.3" bincode = "1.3.3"
bumpalo = "3.16.0"
bumparaw-collections = "0.1.2"
convert_case = "0.6.0"
csv = "1.3.0" csv = "1.3.0"
derive_builder = "0.20.0" derive_builder = "0.20.0"
dump = { path = "../dump" } dump = { path = "../dump" }
@ -21,8 +24,8 @@ file-store = { path = "../file-store" }
flate2 = "1.0.30" flate2 = "1.0.30"
meilisearch-auth = { path = "../meilisearch-auth" } meilisearch-auth = { path = "../meilisearch-auth" }
meilisearch-types = { path = "../meilisearch-types" } meilisearch-types = { path = "../meilisearch-types" }
memmap2 = "0.9.4"
page_size = "0.6.0" page_size = "0.6.0"
raw-collections = { git = "https://github.com/meilisearch/raw-collections.git", version = "0.1.0" }
rayon = "1.10.0" rayon = "1.10.0"
roaring = { version = "0.10.7", features = ["serde"] } roaring = { version = "0.10.7", features = ["serde"] }
serde = { version = "1.0.204", features = ["derive"] } serde = { version = "1.0.204", features = ["derive"] }
@ -30,7 +33,6 @@ serde_json = { version = "1.0.120", features = ["preserve_order"] }
synchronoise = "1.0.1" synchronoise = "1.0.1"
tempfile = "3.10.1" tempfile = "3.10.1"
thiserror = "1.0.61" thiserror = "1.0.61"
memmap2 = "0.9.4"
time = { version = "0.3.36", features = [ time = { version = "0.3.36", features = [
"serde-well-known", "serde-well-known",
"formatting", "formatting",
@ -40,7 +42,6 @@ time = { version = "0.3.36", features = [
tracing = "0.1.40" tracing = "0.1.40"
ureq = "2.10.0" ureq = "2.10.0"
uuid = { version = "1.10.0", features = ["serde", "v4"] } uuid = { version = "1.10.0", features = ["serde", "v4"] }
bumpalo = "3.16.0"
[dev-dependencies] [dev-dependencies]
arroy = "0.5.0" arroy = "0.5.0"

View File

@ -22,27 +22,26 @@ use std::ffi::OsStr;
use std::fmt; use std::fmt;
use std::fs::{self, File}; use std::fs::{self, File};
use std::io::BufWriter; use std::io::BufWriter;
use std::sync::atomic::{self, AtomicU64}; use std::sync::atomic::Ordering;
use std::time::Duration;
use bumpalo::collections::CollectIn; use bumpalo::collections::CollectIn;
use bumpalo::Bump; use bumpalo::Bump;
use dump::IndexMetadata; use dump::IndexMetadata;
use meilisearch_types::batches::BatchId; use meilisearch_types::batches::BatchId;
use meilisearch_types::error::Code;
use meilisearch_types::heed::{RoTxn, RwTxn}; use meilisearch_types::heed::{RoTxn, RwTxn};
use meilisearch_types::milli::documents::{obkv_to_object, DocumentsBatchReader, PrimaryKey}; use meilisearch_types::milli::documents::{obkv_to_object, DocumentsBatchReader, PrimaryKey};
use meilisearch_types::milli::heed::CompactionOption; use meilisearch_types::milli::heed::CompactionOption;
use meilisearch_types::milli::progress::Progress;
use meilisearch_types::milli::update::new::indexer::{self, UpdateByFunction}; use meilisearch_types::milli::update::new::indexer::{self, UpdateByFunction};
use meilisearch_types::milli::update::{IndexDocumentsMethod, Settings as MilliSettings}; use meilisearch_types::milli::update::{
DocumentAdditionResult, IndexDocumentsMethod, Settings as MilliSettings,
};
use meilisearch_types::milli::vector::parsed_vectors::{ use meilisearch_types::milli::vector::parsed_vectors::{
ExplicitVectors, VectorOrArrayOfVectors, RESERVED_VECTORS_FIELD_NAME, ExplicitVectors, VectorOrArrayOfVectors, RESERVED_VECTORS_FIELD_NAME,
}; };
use meilisearch_types::milli::{self, Filter, ThreadPoolNoAbortBuilder}; use meilisearch_types::milli::{self, Filter, ThreadPoolNoAbortBuilder};
use meilisearch_types::settings::{apply_settings_to_builder, Settings, Unchecked}; use meilisearch_types::settings::{apply_settings_to_builder, Settings, Unchecked};
use meilisearch_types::tasks::{ use meilisearch_types::tasks::{Details, IndexSwap, Kind, KindWithContent, Status, Task};
Details, IndexSwap, Kind, KindWithContent, Status, Task, TaskProgress,
};
use meilisearch_types::{compression, Index, VERSION_FILE_NAME}; use meilisearch_types::{compression, Index, VERSION_FILE_NAME};
use roaring::RoaringBitmap; use roaring::RoaringBitmap;
use time::macros::format_description; use time::macros::format_description;
@ -50,6 +49,13 @@ use time::OffsetDateTime;
use uuid::Uuid; use uuid::Uuid;
use crate::autobatcher::{self, BatchKind}; use crate::autobatcher::{self, BatchKind};
use crate::processing::{
AtomicBatchStep, AtomicDocumentStep, AtomicTaskStep, AtomicUpdateFileStep, CreateIndexProgress,
DeleteIndexProgress, DocumentDeletionProgress, DocumentEditionProgress,
DocumentOperationProgress, DumpCreationProgress, InnerSwappingTwoIndexes, SettingsProgress,
SnapshotCreationProgress, SwappingTheIndexes, TaskCancelationProgress, TaskDeletionProgress,
UpdateIndexProgress, VariableNameStep,
};
use crate::utils::{self, swap_index_uid_in_task, ProcessingBatch}; use crate::utils::{self, swap_index_uid_in_task, ProcessingBatch};
use crate::{Error, IndexScheduler, Result, TaskId}; use crate::{Error, IndexScheduler, Result, TaskId};
@ -497,7 +503,6 @@ impl IndexScheduler {
// 5. We make a batch from the unprioritised tasks. Start by taking the next enqueued task. // 5. We make a batch from the unprioritised tasks. Start by taking the next enqueued task.
let task_id = if let Some(task_id) = enqueued.min() { task_id } else { return Ok(None) }; let task_id = if let Some(task_id) = enqueued.min() { task_id } else { return Ok(None) };
let mut task = self.get_task(rtxn, task_id)?.ok_or(Error::CorruptedTaskQueue)?; let mut task = self.get_task(rtxn, task_id)?.ok_or(Error::CorruptedTaskQueue)?;
current_batch.processing(Some(&mut task));
// If the task is not associated with any index, verify that it is an index swap and // If the task is not associated with any index, verify that it is an index swap and
// create the batch directly. Otherwise, get the index name associated with the task // create the batch directly. Otherwise, get the index name associated with the task
@ -507,6 +512,7 @@ impl IndexScheduler {
index_name index_name
} else { } else {
assert!(matches!(&task.kind, KindWithContent::IndexSwap { swaps } if swaps.is_empty())); assert!(matches!(&task.kind, KindWithContent::IndexSwap { swaps } if swaps.is_empty()));
current_batch.processing(Some(&mut task));
return Ok(Some((Batch::IndexSwap { task }, current_batch))); return Ok(Some((Batch::IndexSwap { task }, current_batch)));
}; };
@ -560,11 +566,12 @@ impl IndexScheduler {
/// The list of tasks that were processed. The metadata of each task in the returned /// The list of tasks that were processed. The metadata of each task in the returned
/// list is updated accordingly, with the exception of its date fields /// list is updated accordingly, with the exception of its date fields
/// [`finished_at`](meilisearch_types::tasks::Task::finished_at) and [`started_at`](meilisearch_types::tasks::Task::started_at). /// [`finished_at`](meilisearch_types::tasks::Task::finished_at) and [`started_at`](meilisearch_types::tasks::Task::started_at).
#[tracing::instrument(level = "trace", skip(self, batch), target = "indexing::scheduler", fields(batch=batch.to_string()))] #[tracing::instrument(level = "trace", skip(self, batch, progress), target = "indexing::scheduler", fields(batch=batch.to_string()))]
pub(crate) fn process_batch( pub(crate) fn process_batch(
&self, &self,
batch: Batch, batch: Batch,
current_batch: &mut ProcessingBatch, current_batch: &mut ProcessingBatch,
progress: Progress,
) -> Result<Vec<Task>> { ) -> Result<Vec<Task>> {
#[cfg(test)] #[cfg(test)]
{ {
@ -584,8 +591,13 @@ impl IndexScheduler {
}; };
let rtxn = self.env.read_txn()?; let rtxn = self.env.read_txn()?;
let mut canceled_tasks = let mut canceled_tasks = self.cancel_matched_tasks(
self.cancel_matched_tasks(&rtxn, task.uid, current_batch, matched_tasks)?; &rtxn,
task.uid,
current_batch,
matched_tasks,
&progress,
)?;
task.status = Status::Succeeded; task.status = Status::Succeeded;
match &mut task.details { match &mut task.details {
@ -616,7 +628,8 @@ impl IndexScheduler {
} }
let mut wtxn = self.env.write_txn()?; let mut wtxn = self.env.write_txn()?;
let mut deleted_tasks = self.delete_matched_tasks(&mut wtxn, &matched_tasks)?; let mut deleted_tasks =
self.delete_matched_tasks(&mut wtxn, &matched_tasks, &progress)?;
wtxn.commit()?; wtxn.commit()?;
for task in tasks.iter_mut() { for task in tasks.iter_mut() {
@ -642,6 +655,8 @@ impl IndexScheduler {
Ok(tasks) Ok(tasks)
} }
Batch::SnapshotCreation(mut tasks) => { Batch::SnapshotCreation(mut tasks) => {
progress.update_progress(SnapshotCreationProgress::StartTheSnapshotCreation);
fs::create_dir_all(&self.snapshots_path)?; fs::create_dir_all(&self.snapshots_path)?;
let temp_snapshot_dir = tempfile::tempdir()?; let temp_snapshot_dir = tempfile::tempdir()?;
@ -662,6 +677,7 @@ impl IndexScheduler {
// two read operations as the task processing is synchronous. // two read operations as the task processing is synchronous.
// 2.1 First copy the LMDB env of the index-scheduler // 2.1 First copy the LMDB env of the index-scheduler
progress.update_progress(SnapshotCreationProgress::SnapshotTheIndexScheduler);
let dst = temp_snapshot_dir.path().join("tasks"); let dst = temp_snapshot_dir.path().join("tasks");
fs::create_dir_all(&dst)?; fs::create_dir_all(&dst)?;
self.env.copy_to_file(dst.join("data.mdb"), CompactionOption::Enabled)?; self.env.copy_to_file(dst.join("data.mdb"), CompactionOption::Enabled)?;
@ -674,27 +690,41 @@ impl IndexScheduler {
fs::create_dir_all(&update_files_dir)?; fs::create_dir_all(&update_files_dir)?;
// 2.4 Only copy the update files of the enqueued tasks // 2.4 Only copy the update files of the enqueued tasks
for task_id in self.get_status(&rtxn, Status::Enqueued)? { progress.update_progress(SnapshotCreationProgress::SnapshotTheUpdateFiles);
let enqueued = self.get_status(&rtxn, Status::Enqueued)?;
let (atomic, update_file_progress) =
AtomicUpdateFileStep::new(enqueued.len() as u32);
progress.update_progress(update_file_progress);
for task_id in enqueued {
let task = self.get_task(&rtxn, task_id)?.ok_or(Error::CorruptedTaskQueue)?; let task = self.get_task(&rtxn, task_id)?.ok_or(Error::CorruptedTaskQueue)?;
if let Some(content_uuid) = task.content_uuid() { if let Some(content_uuid) = task.content_uuid() {
let src = self.file_store.get_update_path(content_uuid); let src = self.file_store.get_update_path(content_uuid);
let dst = update_files_dir.join(content_uuid.to_string()); let dst = update_files_dir.join(content_uuid.to_string());
fs::copy(src, dst)?; fs::copy(src, dst)?;
} }
atomic.fetch_add(1, Ordering::Relaxed);
} }
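The update-file loop above shows the per-item counting pattern used throughout this PR: build an atomic counter sized to the number of items (`AtomicUpdateFileStep::new(enqueued.len() as u32)`), publish it through `update_progress`, and bump it with `fetch_add(1, Ordering::Relaxed)` as each item is handled. The helper types live in the new `processing` module and are not shown here; this is a std-only sketch of the assumed shape.

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::Arc;

/// Assumed shape of the `AtomicUpdateFileStep` / `AtomicTaskStep` helpers:
/// a shared counter plus the total, so a UI can render "n / total".
struct AtomicStep {
    current: Arc<AtomicU32>,
    total: u32,
}

impl AtomicStep {
    fn new(total: u32) -> (Arc<AtomicU32>, Self) {
        let current = Arc::new(AtomicU32::new(0));
        (current.clone(), Self { current, total })
    }
    fn render(&self) -> String {
        format!("{}/{}", self.current.load(Ordering::Relaxed), self.total)
    }
}

fn main() {
    let update_files = vec!["a.bin", "b.bin", "c.bin"];
    let (atomic, step) = AtomicStep::new(update_files.len() as u32);

    for file in &update_files {
        // copy the update file here…
        let _ = file;
        // …then bump the counter; Relaxed is enough, this is purely informative.
        atomic.fetch_add(1, Ordering::Relaxed);
    }
    assert_eq!(step.render(), "3/3");
}
```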
// 3. Snapshot every index // 3. Snapshot every index
for result in self.index_mapper.index_mapping.iter(&rtxn)? { progress.update_progress(SnapshotCreationProgress::SnapshotTheIndexes);
let index_mapping = self.index_mapper.index_mapping;
let nb_indexes = index_mapping.len(&rtxn)? as u32;
for (i, result) in index_mapping.iter(&rtxn)?.enumerate() {
let (name, uuid) = result?; let (name, uuid) = result?;
progress.update_progress(VariableNameStep::new(name, i as u32, nb_indexes));
let index = self.index_mapper.index(&rtxn, name)?; let index = self.index_mapper.index(&rtxn, name)?;
let dst = temp_snapshot_dir.path().join("indexes").join(uuid.to_string()); let dst = temp_snapshot_dir.path().join("indexes").join(uuid.to_string());
fs::create_dir_all(&dst)?; fs::create_dir_all(&dst)?;
index.copy_to_file(dst.join("data.mdb"), CompactionOption::Enabled)?; index
.copy_to_file(dst.join("data.mdb"), CompactionOption::Enabled)
.map_err(|e| Error::from_milli(e, Some(name.to_string())))?;
} }
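The per-index snapshot loop uses `VariableNameStep` for steps whose label is only known at runtime (the index name), together with a position/total pair. The real type is not in this excerpt; a plausible minimal version:

```rust
/// Assumed shape of `VariableNameStep`: a progress step whose label is only
/// known at runtime (here, the index name), plus its position in the batch.
struct VariableNameStep {
    name: String,
    current: u32,
    total: u32,
}

impl VariableNameStep {
    fn new(name: impl Into<String>, current: u32, total: u32) -> Self {
        Self { name: name.into(), current, total }
    }
    fn render(&self) -> String {
        format!("[{}/{}] {}", self.current + 1, self.total, self.name)
    }
}

fn main() {
    let indexes = ["movies", "products", "logs"];
    let total = indexes.len() as u32;
    for (i, name) in indexes.iter().enumerate() {
        let step = VariableNameStep::new(*name, i as u32, total);
        println!("snapshotting {}", step.render());
        // index.copy_to_file(…) would run here in the real scheduler.
    }
}
```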
drop(rtxn); drop(rtxn);
// 4. Snapshot the auth LMDB env // 4. Snapshot the auth LMDB env
progress.update_progress(SnapshotCreationProgress::SnapshotTheApiKeys);
let dst = temp_snapshot_dir.path().join("auth"); let dst = temp_snapshot_dir.path().join("auth");
fs::create_dir_all(&dst)?; fs::create_dir_all(&dst)?;
// TODO We can't use the open_auth_store_env function here but we should // TODO We can't use the open_auth_store_env function here but we should
@ -707,6 +737,7 @@ impl IndexScheduler {
auth.copy_to_file(dst.join("data.mdb"), CompactionOption::Enabled)?; auth.copy_to_file(dst.join("data.mdb"), CompactionOption::Enabled)?;
// 5. Copy and tarball the flat snapshot // 5. Copy and tarball the flat snapshot
progress.update_progress(SnapshotCreationProgress::CreateTheTarball);
// 5.1 Find the original name of the database // 5.1 Find the original name of the database
// TODO find a better way to get this path // TODO find a better way to get this path
let mut base_path = self.env.path().to_owned(); let mut base_path = self.env.path().to_owned();
@ -739,6 +770,7 @@ impl IndexScheduler {
Ok(tasks) Ok(tasks)
} }
Batch::Dump(mut task) => { Batch::Dump(mut task) => {
progress.update_progress(DumpCreationProgress::StartTheDumpCreation);
let started_at = OffsetDateTime::now_utc(); let started_at = OffsetDateTime::now_utc();
let (keys, instance_uid) = let (keys, instance_uid) =
if let KindWithContent::DumpCreation { keys, instance_uid } = &task.kind { if let KindWithContent::DumpCreation { keys, instance_uid } = &task.kind {
@ -749,6 +781,7 @@ impl IndexScheduler {
let dump = dump::DumpWriter::new(*instance_uid)?; let dump = dump::DumpWriter::new(*instance_uid)?;
// 1. dump the keys // 1. dump the keys
progress.update_progress(DumpCreationProgress::DumpTheApiKeys);
let mut dump_keys = dump.create_keys()?; let mut dump_keys = dump.create_keys()?;
for key in keys { for key in keys {
dump_keys.push_key(key)?; dump_keys.push_key(key)?;
@ -758,7 +791,13 @@ impl IndexScheduler {
let rtxn = self.env.read_txn()?; let rtxn = self.env.read_txn()?;
// 2. dump the tasks // 2. dump the tasks
progress.update_progress(DumpCreationProgress::DumpTheTasks);
let mut dump_tasks = dump.create_tasks_queue()?; let mut dump_tasks = dump.create_tasks_queue()?;
let (atomic, update_task_progress) =
AtomicTaskStep::new(self.all_tasks.len(&rtxn)? as u32);
progress.update_progress(update_task_progress);
for ret in self.all_tasks.iter(&rtxn)? { for ret in self.all_tasks.iter(&rtxn)? {
if self.must_stop_processing.get() { if self.must_stop_processing.get() {
return Err(Error::AbortedTask); return Err(Error::AbortedTask);
@ -791,50 +830,84 @@ impl IndexScheduler {
let content_file = self.file_store.get_update(content_file)?; let content_file = self.file_store.get_update(content_file)?;
let reader = DocumentsBatchReader::from_reader(content_file) let reader = DocumentsBatchReader::from_reader(content_file)
.map_err(milli::Error::from)?; .map_err(|e| Error::from_milli(e.into(), None))?;
let (mut cursor, documents_batch_index) = let (mut cursor, documents_batch_index) =
reader.into_cursor_and_fields_index(); reader.into_cursor_and_fields_index();
while let Some(doc) = while let Some(doc) = cursor
cursor.next_document().map_err(milli::Error::from)? .next_document()
.map_err(|e| Error::from_milli(e.into(), None))?
{ {
dump_content_file dump_content_file.push_document(
.push_document(&obkv_to_object(doc, &documents_batch_index)?)?; &obkv_to_object(doc, &documents_batch_index)
.map_err(|e| Error::from_milli(e, None))?,
)?;
} }
dump_content_file.flush()?; dump_content_file.flush()?;
} }
} }
atomic.fetch_add(1, Ordering::Relaxed);
} }
dump_tasks.flush()?; dump_tasks.flush()?;
// 3. Dump the indexes // 3. Dump the indexes
progress.update_progress(DumpCreationProgress::DumpTheIndexes);
let nb_indexes = self.index_mapper.index_mapping.len(&rtxn)? as u32;
let mut count = 0;
self.index_mapper.try_for_each_index(&rtxn, |uid, index| -> Result<()> { self.index_mapper.try_for_each_index(&rtxn, |uid, index| -> Result<()> {
progress.update_progress(VariableNameStep::new(
uid.to_string(),
count,
nb_indexes,
));
count += 1;
let rtxn = index.read_txn()?; let rtxn = index.read_txn()?;
let metadata = IndexMetadata { let metadata = IndexMetadata {
uid: uid.to_owned(), uid: uid.to_owned(),
primary_key: index.primary_key(&rtxn)?.map(String::from), primary_key: index.primary_key(&rtxn)?.map(String::from),
created_at: index.created_at(&rtxn)?, created_at: index
updated_at: index.updated_at(&rtxn)?, .created_at(&rtxn)
.map_err(|e| Error::from_milli(e, Some(uid.to_string())))?,
updated_at: index
.updated_at(&rtxn)
.map_err(|e| Error::from_milli(e, Some(uid.to_string())))?,
}; };
let mut index_dumper = dump.create_index(uid, &metadata)?; let mut index_dumper = dump.create_index(uid, &metadata)?;
let fields_ids_map = index.fields_ids_map(&rtxn)?; let fields_ids_map = index.fields_ids_map(&rtxn)?;
let all_fields: Vec<_> = fields_ids_map.iter().map(|(id, _)| id).collect(); let all_fields: Vec<_> = fields_ids_map.iter().map(|(id, _)| id).collect();
let embedding_configs = index.embedding_configs(&rtxn)?; let embedding_configs = index
.embedding_configs(&rtxn)
.map_err(|e| Error::from_milli(e, Some(uid.to_string())))?;
let nb_documents = index
.number_of_documents(&rtxn)
.map_err(|e| Error::from_milli(e, Some(uid.to_string())))?
as u32;
let (atomic, update_document_progress) = AtomicDocumentStep::new(nb_documents);
progress.update_progress(update_document_progress);
let documents = index
.all_documents(&rtxn)
.map_err(|e| Error::from_milli(e, Some(uid.to_string())))?;
// 3.1. Dump the documents // 3.1. Dump the documents
for ret in index.all_documents(&rtxn)? { for ret in documents {
if self.must_stop_processing.get() { if self.must_stop_processing.get() {
return Err(Error::AbortedTask); return Err(Error::AbortedTask);
} }
let (id, doc) = ret?; let (id, doc) =
ret.map_err(|e| Error::from_milli(e, Some(uid.to_string())))?;
let mut document = milli::obkv_to_json(&all_fields, &fields_ids_map, doc)?; let mut document =
milli::obkv_to_json(&all_fields, &fields_ids_map, doc)
.map_err(|e| Error::from_milli(e, Some(uid.to_string())))?;
'inject_vectors: { 'inject_vectors: {
let embeddings = index.embeddings(&rtxn, id)?; let embeddings = index
.embeddings(&rtxn, id)
.map_err(|e| Error::from_milli(e, Some(uid.to_string())))?;
if embeddings.is_empty() { if embeddings.is_empty() {
break 'inject_vectors; break 'inject_vectors;
@ -845,7 +918,7 @@ impl IndexScheduler {
.or_insert(serde_json::Value::Object(Default::default())); .or_insert(serde_json::Value::Object(Default::default()));
let serde_json::Value::Object(vectors) = vectors else { let serde_json::Value::Object(vectors) = vectors else {
return Err(milli::Error::UserError( let user_err = milli::Error::UserError(
milli::UserError::InvalidVectorsMapType { milli::UserError::InvalidVectorsMapType {
document_id: { document_id: {
if let Ok(Some(Ok(index))) = index if let Ok(Some(Ok(index))) = index
@ -859,8 +932,9 @@ impl IndexScheduler {
}, },
value: vectors.clone(), value: vectors.clone(),
}, },
) );
.into());
return Err(Error::from_milli(user_err, Some(uid.to_string())));
}; };
for (embedder_name, embeddings) in embeddings { for (embedder_name, embeddings) in embeddings {
@ -883,6 +957,7 @@ impl IndexScheduler {
} }
index_dumper.push_document(&document)?; index_dumper.push_document(&document)?;
atomic.fetch_add(1, Ordering::Relaxed);
} }
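The `'inject_vectors` block merges the embeddings stored in the index into the dumped document under the reserved `_vectors` field, one entry per embedder, and raises an `InvalidVectorsMapType` user error if that field exists but is not an object. A simplified serde_json version of the merge; the real code goes through `ExplicitVectors` and `RESERVED_VECTORS_FIELD_NAME`, so the literal `"_vectors"` below is an assumption for the sketch:

```rust
use serde_json::{json, Map, Value};

/// Merge the embeddings stored for a document into the document itself,
/// keyed by embedder name, before writing it to the dump.
fn inject_vectors(document: &mut Map<String, Value>, embeddings: &[(String, Vec<Vec<f32>>)]) {
    if embeddings.is_empty() {
        return;
    }
    let vectors = document
        .entry("_vectors".to_string())
        .or_insert(Value::Object(Map::new()));
    let Value::Object(vectors) = vectors else {
        // The scheduler reports this as an InvalidVectorsMapType user error.
        return;
    };
    for (embedder_name, embedder_embeddings) in embeddings {
        vectors.insert(embedder_name.clone(), json!(embedder_embeddings));
    }
}

fn main() {
    let mut doc = json!({ "id": 1, "title": "Dune" });
    let embeddings = vec![("default".to_string(), vec![vec![0.1_f32, 0.2, 0.3]])];
    inject_vectors(doc.as_object_mut().unwrap(), &embeddings);
    println!("{doc}");
}
```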
// 3.2. Dump the settings // 3.2. Dump the settings
@ -890,12 +965,14 @@ impl IndexScheduler {
index, index,
&rtxn, &rtxn,
meilisearch_types::settings::SecretPolicy::RevealSecrets, meilisearch_types::settings::SecretPolicy::RevealSecrets,
)?; )
.map_err(|e| Error::from_milli(e, Some(uid.to_string())))?;
index_dumper.settings(&settings)?; index_dumper.settings(&settings)?;
Ok(()) Ok(())
})?; })?;
// 4. Dump experimental feature settings // 4. Dump experimental feature settings
progress.update_progress(DumpCreationProgress::DumpTheExperimentalFeatures);
let features = self.features().runtime_features(); let features = self.features().runtime_features();
dump.create_experimental_features(features)?; dump.create_experimental_features(features)?;
@ -906,6 +983,7 @@ impl IndexScheduler {
if self.must_stop_processing.get() { if self.must_stop_processing.get() {
return Err(Error::AbortedTask); return Err(Error::AbortedTask);
} }
progress.update_progress(DumpCreationProgress::CompressTheDump);
let path = self.dumps_path.join(format!("{}.dump", dump_uid)); let path = self.dumps_path.join(format!("{}.dump", dump_uid));
let file = File::create(path)?; let file = File::create(path)?;
dump.persist_to(BufWriter::new(file))?; dump.persist_to(BufWriter::new(file))?;
@ -931,7 +1009,7 @@ impl IndexScheduler {
.set_currently_updating_index(Some((index_uid.clone(), index.clone()))); .set_currently_updating_index(Some((index_uid.clone(), index.clone())));
let mut index_wtxn = index.write_txn()?; let mut index_wtxn = index.write_txn()?;
let tasks = self.apply_index_operation(&mut index_wtxn, &index, op)?; let tasks = self.apply_index_operation(&mut index_wtxn, &index, op, progress)?;
{ {
let span = tracing::trace_span!(target: "indexing::scheduler", "commit"); let span = tracing::trace_span!(target: "indexing::scheduler", "commit");
@ -946,7 +1024,8 @@ impl IndexScheduler {
// the entire batch. // the entire batch.
let res = || -> Result<()> { let res = || -> Result<()> {
let index_rtxn = index.read_txn()?; let index_rtxn = index.read_txn()?;
let stats = crate::index_mapper::IndexStats::new(&index, &index_rtxn)?; let stats = crate::index_mapper::IndexStats::new(&index, &index_rtxn)
.map_err(|e| Error::from_milli(e, Some(index_uid.to_string())))?;
let mut wtxn = self.env.write_txn()?; let mut wtxn = self.env.write_txn()?;
self.index_mapper.store_stats_of(&mut wtxn, &index_uid, &stats)?; self.index_mapper.store_stats_of(&mut wtxn, &index_uid, &stats)?;
wtxn.commit()?; wtxn.commit()?;
@ -964,6 +1043,8 @@ impl IndexScheduler {
Ok(tasks) Ok(tasks)
} }
Batch::IndexCreation { index_uid, primary_key, task } => { Batch::IndexCreation { index_uid, primary_key, task } => {
progress.update_progress(CreateIndexProgress::CreatingTheIndex);
let wtxn = self.env.write_txn()?; let wtxn = self.env.write_txn()?;
if self.index_mapper.exists(&wtxn, &index_uid)? { if self.index_mapper.exists(&wtxn, &index_uid)? {
return Err(Error::IndexAlreadyExists(index_uid)); return Err(Error::IndexAlreadyExists(index_uid));
@ -973,9 +1054,11 @@ impl IndexScheduler {
self.process_batch( self.process_batch(
Batch::IndexUpdate { index_uid, primary_key, task }, Batch::IndexUpdate { index_uid, primary_key, task },
current_batch, current_batch,
progress,
) )
} }
Batch::IndexUpdate { index_uid, primary_key, mut task } => { Batch::IndexUpdate { index_uid, primary_key, mut task } => {
progress.update_progress(UpdateIndexProgress::UpdatingTheIndex);
let rtxn = self.env.read_txn()?; let rtxn = self.env.read_txn()?;
let index = self.index_mapper.index(&rtxn, &index_uid)?; let index = self.index_mapper.index(&rtxn, &index_uid)?;
@ -988,10 +1071,12 @@ impl IndexScheduler {
); );
builder.set_primary_key(primary_key); builder.set_primary_key(primary_key);
let must_stop_processing = self.must_stop_processing.clone(); let must_stop_processing = self.must_stop_processing.clone();
builder.execute( builder
|indexing_step| tracing::debug!(update = ?indexing_step), .execute(
|| must_stop_processing.get(), |indexing_step| tracing::debug!(update = ?indexing_step),
)?; || must_stop_processing.get(),
)
.map_err(|e| Error::from_milli(e, Some(index_uid.to_string())))?;
index_wtxn.commit()?; index_wtxn.commit()?;
} }
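`builder.execute` is handed `|| must_stop_processing.get()` so the long-running update can abort cooperatively. `MustStopProcessing` is the `Arc<AtomicBool>` newtype visible further down in `lib.rs`; the sketch below reproduces that flag-plus-closure pattern (the `must_stop` method name here is illustrative, only `get` and `reset` appear in the diff).

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// A shared abort flag, polled by long-running work through a closure.
#[derive(Clone, Default)]
struct MustStopProcessing(Arc<AtomicBool>);

impl MustStopProcessing {
    fn get(&self) -> bool {
        self.0.load(Ordering::Relaxed)
    }
    fn must_stop(&self) {
        self.0.store(true, Ordering::Relaxed);
    }
}

/// A long-running operation that checks the closure between units of work.
fn execute(work_items: u32, must_stop: impl Fn() -> bool) -> Result<u32, &'static str> {
    let mut done = 0;
    for _ in 0..work_items {
        if must_stop() {
            return Err("aborted");
        }
        done += 1;
    }
    Ok(done)
}

fn main() {
    let flag = MustStopProcessing::default();
    let for_closure = flag.clone();
    assert_eq!(execute(3, move || for_closure.get()), Ok(3));

    flag.must_stop();
    assert_eq!(execute(3, move || flag.get()), Err("aborted"));
}
```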
@ -1008,7 +1093,8 @@ impl IndexScheduler {
let res = || -> Result<()> { let res = || -> Result<()> {
let mut wtxn = self.env.write_txn()?; let mut wtxn = self.env.write_txn()?;
let index_rtxn = index.read_txn()?; let index_rtxn = index.read_txn()?;
let stats = crate::index_mapper::IndexStats::new(&index, &index_rtxn)?; let stats = crate::index_mapper::IndexStats::new(&index, &index_rtxn)
.map_err(|e| Error::from_milli(e, Some(index_uid.clone())))?;
self.index_mapper.store_stats_of(&mut wtxn, &index_uid, &stats)?; self.index_mapper.store_stats_of(&mut wtxn, &index_uid, &stats)?;
wtxn.commit()?; wtxn.commit()?;
Ok(()) Ok(())
@ -1025,13 +1111,16 @@ impl IndexScheduler {
Ok(vec![task]) Ok(vec![task])
} }
Batch::IndexDeletion { index_uid, index_has_been_created, mut tasks } => { Batch::IndexDeletion { index_uid, index_has_been_created, mut tasks } => {
progress.update_progress(DeleteIndexProgress::DeletingTheIndex);
let wtxn = self.env.write_txn()?; let wtxn = self.env.write_txn()?;
// it's possible that the index doesn't exist // it's possible that the index doesn't exist
let number_of_documents = || -> Result<u64> { let number_of_documents = || -> Result<u64> {
let index = self.index_mapper.index(&wtxn, &index_uid)?; let index = self.index_mapper.index(&wtxn, &index_uid)?;
let index_rtxn = index.read_txn()?; let index_rtxn = index.read_txn()?;
Ok(index.number_of_documents(&index_rtxn)?) index
.number_of_documents(&index_rtxn)
.map_err(|e| Error::from_milli(e, Some(index_uid.to_string())))
}() }()
.unwrap_or_default(); .unwrap_or_default();
@ -1056,6 +1145,8 @@ impl IndexScheduler {
Ok(tasks) Ok(tasks)
} }
Batch::IndexSwap { mut task } => { Batch::IndexSwap { mut task } => {
progress.update_progress(SwappingTheIndexes::EnsuringCorrectnessOfTheSwap);
let mut wtxn = self.env.write_txn()?; let mut wtxn = self.env.write_txn()?;
let swaps = if let KindWithContent::IndexSwap { swaps } = &task.kind { let swaps = if let KindWithContent::IndexSwap { swaps } = &task.kind {
swaps swaps
@ -1082,8 +1173,20 @@ impl IndexScheduler {
)); ));
} }
} }
for swap in swaps { progress.update_progress(SwappingTheIndexes::SwappingTheIndexes);
self.apply_index_swap(&mut wtxn, task.uid, &swap.indexes.0, &swap.indexes.1)?; for (step, swap) in swaps.iter().enumerate() {
progress.update_progress(VariableNameStep::new(
format!("swapping index {} and {}", swap.indexes.0, swap.indexes.1),
step as u32,
swaps.len() as u32,
));
self.apply_index_swap(
&mut wtxn,
&progress,
task.uid,
&swap.indexes.0,
&swap.indexes.1,
)?;
} }
wtxn.commit()?; wtxn.commit()?;
task.status = Status::Succeeded; task.status = Status::Succeeded;
@ -1093,7 +1196,15 @@ impl IndexScheduler {
} }
/// Swap the index `lhs` with the index `rhs`. /// Swap the index `lhs` with the index `rhs`.
fn apply_index_swap(&self, wtxn: &mut RwTxn, task_id: u32, lhs: &str, rhs: &str) -> Result<()> { fn apply_index_swap(
&self,
wtxn: &mut RwTxn,
progress: &Progress,
task_id: u32,
lhs: &str,
rhs: &str,
) -> Result<()> {
progress.update_progress(InnerSwappingTwoIndexes::RetrieveTheTasks);
// 1. Verify that both lhs and rhs are existing indexes // 1. Verify that both lhs and rhs are existing indexes
let index_lhs_exists = self.index_mapper.index_exists(wtxn, lhs)?; let index_lhs_exists = self.index_mapper.index_exists(wtxn, lhs)?;
if !index_lhs_exists { if !index_lhs_exists {
@ -1111,14 +1222,21 @@ impl IndexScheduler {
index_rhs_task_ids.remove_range(task_id..); index_rhs_task_ids.remove_range(task_id..);
// 3. before_name -> new_name in the task's KindWithContent // 3. before_name -> new_name in the task's KindWithContent
for task_id in &index_lhs_task_ids | &index_rhs_task_ids { progress.update_progress(InnerSwappingTwoIndexes::UpdateTheTasks);
let tasks_to_update = &index_lhs_task_ids | &index_rhs_task_ids;
let (atomic, task_progress) = AtomicTaskStep::new(tasks_to_update.len() as u32);
progress.update_progress(task_progress);
for task_id in tasks_to_update {
let mut task = self.get_task(wtxn, task_id)?.ok_or(Error::CorruptedTaskQueue)?; let mut task = self.get_task(wtxn, task_id)?.ok_or(Error::CorruptedTaskQueue)?;
swap_index_uid_in_task(&mut task, (lhs, rhs)); swap_index_uid_in_task(&mut task, (lhs, rhs));
self.all_tasks.put(wtxn, &task_id, &task)?; self.all_tasks.put(wtxn, &task_id, &task)?;
atomic.fetch_add(1, Ordering::Relaxed);
} }
// 4. remove the task from indexuid = before_name // 4. remove the task from indexuid = before_name
// 5. add the task to indexuid = after_name // 5. add the task to indexuid = after_name
progress.update_progress(InnerSwappingTwoIndexes::UpdateTheIndexesMetadata);
self.update_index(wtxn, lhs, |lhs_tasks| { self.update_index(wtxn, lhs, |lhs_tasks| {
*lhs_tasks -= &index_lhs_task_ids; *lhs_tasks -= &index_lhs_task_ids;
*lhs_tasks |= &index_rhs_task_ids; *lhs_tasks |= &index_rhs_task_ids;
@ -1140,7 +1258,7 @@ impl IndexScheduler {
/// The list of processed tasks. /// The list of processed tasks.
#[tracing::instrument( #[tracing::instrument(
level = "trace", level = "trace",
skip(self, index_wtxn, index), skip(self, index_wtxn, index, progress),
target = "indexing::scheduler" target = "indexing::scheduler"
)] )]
fn apply_index_operation<'i>( fn apply_index_operation<'i>(
@ -1148,48 +1266,18 @@ impl IndexScheduler {
index_wtxn: &mut RwTxn<'i>, index_wtxn: &mut RwTxn<'i>,
index: &'i Index, index: &'i Index,
operation: IndexOperation, operation: IndexOperation,
progress: Progress,
) -> Result<Vec<Task>> { ) -> Result<Vec<Task>> {
let indexer_alloc = Bump::new(); let indexer_alloc = Bump::new();
let started_processing_at = std::time::Instant::now(); let started_processing_at = std::time::Instant::now();
let secs_since_started_processing_at = AtomicU64::new(0);
const PRINT_SECS_DELTA: u64 = 5;
let processing_tasks = self.processing_tasks.clone();
let must_stop_processing = self.must_stop_processing.clone(); let must_stop_processing = self.must_stop_processing.clone();
let send_progress = |progress| {
let now = std::time::Instant::now();
let elapsed = secs_since_started_processing_at.load(atomic::Ordering::Relaxed);
let previous = started_processing_at + Duration::from_secs(elapsed);
let elapsed = now - previous;
if elapsed.as_secs() < PRINT_SECS_DELTA {
return;
}
secs_since_started_processing_at
.store((now - started_processing_at).as_secs(), atomic::Ordering::Relaxed);
let TaskProgress {
current_step,
finished_steps,
total_steps,
finished_substeps,
total_substeps,
} = processing_tasks.write().unwrap().update_progress(progress);
tracing::info!(
current_step,
finished_steps,
total_steps,
finished_substeps,
total_substeps
);
};
match operation { match operation {
IndexOperation::DocumentClear { mut tasks, .. } => { IndexOperation::DocumentClear { index_uid, mut tasks } => {
let count = milli::update::ClearDocuments::new(index_wtxn, index).execute()?; let count = milli::update::ClearDocuments::new(index_wtxn, index)
.execute()
.map_err(|e| Error::from_milli(e, Some(index_uid)))?;
let mut first_clear_found = false; let mut first_clear_found = false;
for task in &mut tasks { for task in &mut tasks {
@ -1209,12 +1297,13 @@ impl IndexScheduler {
Ok(tasks) Ok(tasks)
} }
IndexOperation::DocumentOperation { IndexOperation::DocumentOperation {
index_uid: _, index_uid,
primary_key, primary_key,
method, method,
operations, operations,
mut tasks, mut tasks,
} => { } => {
progress.update_progress(DocumentOperationProgress::RetrievingConfig);
// TODO: at some point, for better efficiency we might want to reuse the bumpalo for successive batches. // TODO: at some point, for better efficiency we might want to reuse the bumpalo for successive batches.
// this is made difficult by the fact we're doing private clones of the index scheduler and sending it // this is made difficult by the fact we're doing private clones of the index scheduler and sending it
// to a fresh thread. // to a fresh thread.
@ -1223,9 +1312,7 @@ impl IndexScheduler {
if let DocumentOperation::Add(content_uuid) = operation { if let DocumentOperation::Add(content_uuid) = operation {
let content_file = self.file_store.get_update(*content_uuid)?; let content_file = self.file_store.get_update(*content_uuid)?;
let mmap = unsafe { memmap2::Mmap::map(&content_file)? }; let mmap = unsafe { memmap2::Mmap::map(&content_file)? };
if !mmap.is_empty() { content_files.push(mmap);
content_files.push(mmap);
}
} }
} }
@ -1235,13 +1322,17 @@ impl IndexScheduler {
let mut content_files_iter = content_files.iter(); let mut content_files_iter = content_files.iter();
let mut indexer = indexer::DocumentOperation::new(method); let mut indexer = indexer::DocumentOperation::new(method);
let embedders = index.embedding_configs(index_wtxn)?; let embedders = index
let embedders = self.embedders(embedders)?; .embedding_configs(index_wtxn)
.map_err(|e| Error::from_milli(e, Some(index_uid.clone())))?;
let embedders = self.embedders(index_uid.clone(), embedders)?;
for operation in operations { for operation in operations {
match operation { match operation {
DocumentOperation::Add(_content_uuid) => { DocumentOperation::Add(_content_uuid) => {
let mmap = content_files_iter.next().unwrap(); let mmap = content_files_iter.next().unwrap();
indexer.add_documents(mmap)?; indexer
.add_documents(mmap)
.map_err(|e| Error::from_milli(e, Some(index_uid.clone())))?;
} }
DocumentOperation::Delete(document_ids) => { DocumentOperation::Delete(document_ids) => {
let document_ids: bumpalo::collections::vec::Vec<_> = document_ids let document_ids: bumpalo::collections::vec::Vec<_> = document_ids
@ -1266,19 +1357,22 @@ impl IndexScheduler {
} }
}; };
let (document_changes, operation_stats, primary_key) = indexer.into_changes( progress.update_progress(DocumentOperationProgress::ComputingDocumentChanges);
&indexer_alloc, let (document_changes, operation_stats, primary_key) = indexer
index, .into_changes(
&rtxn, &indexer_alloc,
primary_key.as_deref(), index,
&mut new_fields_ids_map, &rtxn,
&|| must_stop_processing.get(), primary_key.as_deref(),
&send_progress, &mut new_fields_ids_map,
)?; &|| must_stop_processing.get(),
progress.clone(),
)
.map_err(|e| Error::from_milli(e, Some(index_uid.clone())))?;
let mut addition = 0; let mut candidates_count = 0;
for (stats, task) in operation_stats.into_iter().zip(&mut tasks) { for (stats, task) in operation_stats.into_iter().zip(&mut tasks) {
addition += stats.document_count; candidates_count += stats.document_count;
match stats.error { match stats.error {
Some(error) => { Some(error) => {
task.status = Status::Failed; task.status = Status::Failed;
@ -1308,6 +1402,7 @@ impl IndexScheduler {
} }
} }
progress.update_progress(DocumentOperationProgress::Indexing);
if tasks.iter().any(|res| res.error.is_none()) { if tasks.iter().any(|res| res.error.is_none()) {
indexer::index( indexer::index(
index_wtxn, index_wtxn,
@ -1320,15 +1415,25 @@ impl IndexScheduler {
&document_changes, &document_changes,
embedders, embedders,
&|| must_stop_processing.get(), &|| must_stop_processing.get(),
&send_progress, &progress,
)?; )
.map_err(|e| Error::from_milli(e, Some(index_uid.clone())))?;
let addition = DocumentAdditionResult {
indexed_documents: candidates_count,
number_of_documents: index
.number_of_documents(index_wtxn)
.map_err(|err| Error::from_milli(err, Some(index_uid.clone())))?,
};
tracing::info!(indexing_result = ?addition, processed_in = ?started_processing_at.elapsed(), "document indexing done"); tracing::info!(indexing_result = ?addition, processed_in = ?started_processing_at.elapsed(), "document indexing done");
} }
Ok(tasks) Ok(tasks)
} }
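The arm now builds a `DocumentAdditionResult` for the log line itself: `indexed_documents` is the candidate count summed from the per-payload `operation_stats`, and `number_of_documents` is re-read from the index inside the same write transaction. A small stand-alone version of that summary (the numbers are made up):

```rust
/// Mirror of the summary logged after indexing: how many documents the
/// payloads asked to index vs. how many the index contains after the write.
/// Field names follow `DocumentAdditionResult`.
#[derive(Debug)]
struct DocumentAdditionResult {
    indexed_documents: u64,
    number_of_documents: u64,
}

fn main() {
    // Per-payload stats, simplified to a plain document count (hypothetical).
    let operation_stats = [120_u64, 0, 35];
    let candidates_count: u64 = operation_stats.iter().sum();

    // In the scheduler this comes from `index.number_of_documents(wtxn)`.
    let number_of_documents = 150_u64;

    let addition = DocumentAdditionResult {
        indexed_documents: candidates_count,
        number_of_documents,
    };
    println!("document indexing done: {addition:?}");
}
```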
IndexOperation::DocumentEdition { mut task, .. } => { IndexOperation::DocumentEdition { index_uid, mut task } => {
progress.update_progress(DocumentEditionProgress::RetrievingConfig);
let (filter, code) = if let KindWithContent::DocumentEdition { let (filter, code) = if let KindWithContent::DocumentEdition {
filter_expr, filter_expr,
context: _, context: _,
@ -1342,16 +1447,11 @@ impl IndexScheduler {
}; };
let candidates = match filter.as_ref().map(Filter::from_json) { let candidates = match filter.as_ref().map(Filter::from_json) {
Some(Ok(Some(filter))) => { Some(Ok(Some(filter))) => filter
filter.evaluate(index_wtxn, index).map_err(|err| match err { .evaluate(index_wtxn, index)
milli::Error::UserError(milli::UserError::InvalidFilter(_)) => { .map_err(|err| Error::from_milli(err, Some(index_uid.clone())))?,
Error::from(err).with_custom_error_code(Code::InvalidDocumentFilter)
}
e => e.into(),
})?
}
None | Some(Ok(None)) => index.documents_ids(index_wtxn)?, None | Some(Ok(None)) => index.documents_ids(index_wtxn)?,
Some(Err(e)) => return Err(e.into()), Some(Err(e)) => return Err(Error::from_milli(e, Some(index_uid.clone()))),
}; };
let (original_filter, context, function) = if let Some(Details::DocumentEdition { let (original_filter, context, function) = if let Some(Details::DocumentEdition {
@ -1386,8 +1486,9 @@ impl IndexScheduler {
// candidates not empty => index not empty => a primary key is set // candidates not empty => index not empty => a primary key is set
let primary_key = index.primary_key(&rtxn)?.unwrap(); let primary_key = index.primary_key(&rtxn)?.unwrap();
let primary_key = PrimaryKey::new_or_insert(primary_key, &mut new_fields_ids_map) let primary_key =
.map_err(milli::Error::from)?; PrimaryKey::new_or_insert(primary_key, &mut new_fields_ids_map)
.map_err(|err| Error::from_milli(err.into(), Some(index_uid.clone())))?;
let result_count = Ok((candidates.len(), candidates.len())) as Result<_>; let result_count = Ok((candidates.len(), candidates.len())) as Result<_>;
@ -1405,13 +1506,22 @@ impl IndexScheduler {
} }
}; };
let candidates_count = candidates.len();
progress.update_progress(DocumentEditionProgress::ComputingDocumentChanges);
let indexer = UpdateByFunction::new(candidates, context.clone(), code.clone()); let indexer = UpdateByFunction::new(candidates, context.clone(), code.clone());
let document_changes = let document_changes = pool
pool.install(|| indexer.into_changes(&primary_key)).unwrap()?; .install(|| {
indexer
let embedders = index.embedding_configs(index_wtxn)?; .into_changes(&primary_key)
let embedders = self.embedders(embedders)?; .map_err(|err| Error::from_milli(err, Some(index_uid.clone())))
})
.unwrap()?;
let embedders = index
.embedding_configs(index_wtxn)
.map_err(|err| Error::from_milli(err, Some(index_uid.clone())))?;
let embedders = self.embedders(index_uid.clone(), embedders)?;
progress.update_progress(DocumentEditionProgress::Indexing);
indexer::index( indexer::index(
index_wtxn, index_wtxn,
index, index,
@ -1423,10 +1533,18 @@ impl IndexScheduler {
&document_changes, &document_changes,
embedders, embedders,
&|| must_stop_processing.get(), &|| must_stop_processing.get(),
&send_progress, &progress,
)?; )
.map_err(|err| Error::from_milli(err, Some(index_uid.clone())))?;
// tracing::info!(indexing_result = ?addition, processed_in = ?started_processing_at.elapsed(), "document indexing done"); let addition = DocumentAdditionResult {
indexed_documents: candidates_count,
number_of_documents: index
.number_of_documents(index_wtxn)
.map_err(|err| Error::from_milli(err, Some(index_uid.clone())))?,
};
tracing::info!(indexing_result = ?addition, processed_in = ?started_processing_at.elapsed(), "document indexing done");
} }
match result_count { match result_count {
@ -1455,7 +1573,9 @@ impl IndexScheduler {
Ok(vec![task]) Ok(vec![task])
} }
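The edition pipeline computes the document changes inside a thread pool via `pool.install(|| indexer.into_changes(&primary_key))`. The pool is built with milli's `ThreadPoolNoAbortBuilder` wrapper, and the extra `.unwrap()` suggests its `install` returns a `Result`, unlike rayon's; the underlying mechanism is still rayon's `install`, which runs the closure so that any parallel iterators inside it are scheduled on that pool. A plain-rayon illustration:

```rust
use rayon::ThreadPoolBuilder;

fn main() -> Result<(), rayon::ThreadPoolBuildError> {
    let pool = ThreadPoolBuilder::new().num_threads(4).build()?;
    let ids: Vec<u32> = (0..1000).collect();

    // install() runs the closure inside the pool, so the parallel iterator
    // below is scheduled on that pool's threads rather than the global one.
    let sum: u64 = pool.install(|| {
        use rayon::prelude::*;
        ids.par_iter().map(|&id| id as u64 * 2).sum()
    });

    println!("{sum}");
    Ok(())
}
```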
IndexOperation::DocumentDeletion { mut tasks, index_uid: _ } => { IndexOperation::DocumentDeletion { mut tasks, index_uid } => {
progress.update_progress(DocumentDeletionProgress::RetrievingConfig);
let mut to_delete = RoaringBitmap::new(); let mut to_delete = RoaringBitmap::new();
let external_documents_ids = index.external_documents_ids(); let external_documents_ids = index.external_documents_ids();
@ -1476,35 +1596,23 @@ impl IndexScheduler {
deleted_documents: Some(will_be_removed), deleted_documents: Some(will_be_removed),
}); });
} }
KindWithContent::DocumentDeletionByFilter { index_uid: _, filter_expr } => { KindWithContent::DocumentDeletionByFilter { index_uid, filter_expr } => {
let before = to_delete.len(); let before = to_delete.len();
let filter = match Filter::from_json(filter_expr) { let filter = match Filter::from_json(filter_expr) {
Ok(filter) => filter, Ok(filter) => filter,
Err(err) => { Err(err) => {
// theoretically, this should be caught by deserr before reaching the index-scheduler and cannot happen // theoretically, this should be caught by deserr before reaching the index-scheduler and cannot happen
task.status = Status::Failed; task.status = Status::Failed;
task.error = match err { task.error = Some(
milli::Error::UserError( Error::from_milli(err, Some(index_uid.clone())).into(),
milli::UserError::InvalidFilterExpression { .. }, );
) => Some(
Error::from(err)
.with_custom_error_code(Code::InvalidDocumentFilter)
.into(),
),
e => Some(e.into()),
};
None None
} }
}; };
if let Some(filter) = filter { if let Some(filter) = filter {
let candidates = let candidates = filter
filter.evaluate(index_wtxn, index).map_err(|err| match err { .evaluate(index_wtxn, index)
milli::Error::UserError( .map_err(|err| Error::from_milli(err, Some(index_uid.clone())));
milli::UserError::InvalidFilter(_),
) => Error::from(err)
.with_custom_error_code(Code::InvalidDocumentFilter),
e => e.into(),
});
match candidates { match candidates {
Ok(candidates) => to_delete |= candidates, Ok(candidates) => to_delete |= candidates,
Err(err) => { Err(err) => {
@ -1540,8 +1648,9 @@ impl IndexScheduler {
// to_delete not empty => index not empty => primary key set // to_delete not empty => index not empty => primary key set
let primary_key = index.primary_key(&rtxn)?.unwrap(); let primary_key = index.primary_key(&rtxn)?.unwrap();
let primary_key = PrimaryKey::new_or_insert(primary_key, &mut new_fields_ids_map) let primary_key =
.map_err(milli::Error::from)?; PrimaryKey::new_or_insert(primary_key, &mut new_fields_ids_map)
.map_err(|err| Error::from_milli(err.into(), Some(index_uid.clone())))?;
if !tasks.iter().all(|res| res.error.is_some()) { if !tasks.iter().all(|res| res.error.is_some()) {
let local_pool; let local_pool;
@ -1557,12 +1666,17 @@ impl IndexScheduler {
} }
}; };
progress.update_progress(DocumentDeletionProgress::DeleteDocuments);
let mut indexer = indexer::DocumentDeletion::new(); let mut indexer = indexer::DocumentDeletion::new();
let candidates_count = to_delete.len();
indexer.delete_documents_by_docids(to_delete); indexer.delete_documents_by_docids(to_delete);
let document_changes = indexer.into_changes(&indexer_alloc, primary_key); let document_changes = indexer.into_changes(&indexer_alloc, primary_key);
let embedders = index.embedding_configs(index_wtxn)?; let embedders = index
let embedders = self.embedders(embedders)?; .embedding_configs(index_wtxn)
.map_err(|err| Error::from_milli(err, Some(index_uid.clone())))?;
let embedders = self.embedders(index_uid.clone(), embedders)?;
progress.update_progress(DocumentDeletionProgress::Indexing);
indexer::index( indexer::index(
index_wtxn, index_wtxn,
index, index,
@ -1574,15 +1688,24 @@ impl IndexScheduler {
&document_changes, &document_changes,
embedders, embedders,
&|| must_stop_processing.get(), &|| must_stop_processing.get(),
&send_progress, &progress,
)?; )
.map_err(|err| Error::from_milli(err, Some(index_uid.clone())))?;
// tracing::info!(indexing_result = ?addition, processed_in = ?started_processing_at.elapsed(), "document indexing done"); let addition = DocumentAdditionResult {
indexed_documents: candidates_count,
number_of_documents: index
.number_of_documents(index_wtxn)
.map_err(|err| Error::from_milli(err, Some(index_uid.clone())))?,
};
tracing::info!(indexing_result = ?addition, processed_in = ?started_processing_at.elapsed(), "document indexing done");
} }
Ok(tasks) Ok(tasks)
} }
IndexOperation::Settings { index_uid: _, settings, mut tasks } => { IndexOperation::Settings { index_uid, settings, mut tasks } => {
progress.update_progress(SettingsProgress::RetrievingAndMergingTheSettings);
let indexer_config = self.index_mapper.indexer_config(); let indexer_config = self.index_mapper.indexer_config();
let mut builder = milli::update::Settings::new(index_wtxn, index, indexer_config); let mut builder = milli::update::Settings::new(index_wtxn, index, indexer_config);
@ -1596,10 +1719,13 @@ impl IndexScheduler {
task.status = Status::Succeeded; task.status = Status::Succeeded;
} }
builder.execute( progress.update_progress(SettingsProgress::ApplyTheSettings);
|indexing_step| tracing::debug!(update = ?indexing_step), builder
|| must_stop_processing.get(), .execute(
)?; |indexing_step| tracing::debug!(update = ?indexing_step),
|| must_stop_processing.get(),
)
.map_err(|err| Error::from_milli(err, Some(index_uid.clone())))?;
Ok(tasks) Ok(tasks)
} }
@ -1616,12 +1742,14 @@ impl IndexScheduler {
index_uid: index_uid.clone(), index_uid: index_uid.clone(),
tasks: cleared_tasks, tasks: cleared_tasks,
}, },
progress.clone(),
)?; )?;
let settings_tasks = self.apply_index_operation( let settings_tasks = self.apply_index_operation(
index_wtxn, index_wtxn,
index, index,
IndexOperation::Settings { index_uid, settings, tasks: settings_tasks }, IndexOperation::Settings { index_uid, settings, tasks: settings_tasks },
progress,
)?; )?;
let mut tasks = settings_tasks; let mut tasks = settings_tasks;
@ -1638,15 +1766,18 @@ impl IndexScheduler {
&self, &self,
wtxn: &mut RwTxn, wtxn: &mut RwTxn,
matched_tasks: &RoaringBitmap, matched_tasks: &RoaringBitmap,
progress: &Progress,
) -> Result<RoaringBitmap> { ) -> Result<RoaringBitmap> {
progress.update_progress(TaskDeletionProgress::DeletingTasksDateTime);
// 1. Remove from this list the tasks that we are not allowed to delete // 1. Remove from this list the tasks that we are not allowed to delete
let enqueued_tasks = self.get_status(wtxn, Status::Enqueued)?; let enqueued_tasks = self.get_status(wtxn, Status::Enqueued)?;
let processing_tasks = &self.processing_tasks.read().unwrap().processing.clone(); let processing_tasks = &self.processing_tasks.read().unwrap().processing.clone();
let all_task_ids = self.all_task_ids(wtxn)?; let all_task_ids = self.all_task_ids(wtxn)?;
let mut to_delete_tasks = all_task_ids & matched_tasks; let mut to_delete_tasks = all_task_ids & matched_tasks;
to_delete_tasks -= processing_tasks; to_delete_tasks -= &**processing_tasks;
to_delete_tasks -= enqueued_tasks; to_delete_tasks -= &enqueued_tasks;
// 2. We now have a list of tasks to delete, delete them // 2. We now have a list of tasks to delete, delete them
@ -1657,6 +1788,8 @@ impl IndexScheduler {
// The tasks that have been removed *per batches*. // The tasks that have been removed *per batches*.
let mut affected_batches: HashMap<BatchId, RoaringBitmap> = HashMap::new(); let mut affected_batches: HashMap<BatchId, RoaringBitmap> = HashMap::new();
let (atomic_progress, task_progress) = AtomicTaskStep::new(to_delete_tasks.len() as u32);
progress.update_progress(task_progress);
for task_id in to_delete_tasks.iter() { for task_id in to_delete_tasks.iter() {
let task = self.get_task(wtxn, task_id)?.ok_or(Error::CorruptedTaskQueue)?; let task = self.get_task(wtxn, task_id)?.ok_or(Error::CorruptedTaskQueue)?;
@ -1680,22 +1813,35 @@ impl IndexScheduler {
if let Some(batch_uid) = task.batch_uid { if let Some(batch_uid) = task.batch_uid {
affected_batches.entry(batch_uid).or_default().insert(task_id); affected_batches.entry(batch_uid).or_default().insert(task_id);
} }
atomic_progress.fetch_add(1, Ordering::Relaxed);
} }
progress.update_progress(TaskDeletionProgress::DeletingTasksMetadata);
let (atomic_progress, task_progress) = AtomicTaskStep::new(
(affected_indexes.len() + affected_statuses.len() + affected_kinds.len()) as u32,
);
progress.update_progress(task_progress);
for index in affected_indexes.iter() { for index in affected_indexes.iter() {
self.update_index(wtxn, index, |bitmap| *bitmap -= &to_delete_tasks)?; self.update_index(wtxn, index, |bitmap| *bitmap -= &to_delete_tasks)?;
atomic_progress.fetch_add(1, Ordering::Relaxed);
} }
for status in affected_statuses.iter() { for status in affected_statuses.iter() {
self.update_status(wtxn, *status, |bitmap| *bitmap -= &to_delete_tasks)?; self.update_status(wtxn, *status, |bitmap| *bitmap -= &to_delete_tasks)?;
atomic_progress.fetch_add(1, Ordering::Relaxed);
} }
for kind in affected_kinds.iter() { for kind in affected_kinds.iter() {
self.update_kind(wtxn, *kind, |bitmap| *bitmap -= &to_delete_tasks)?; self.update_kind(wtxn, *kind, |bitmap| *bitmap -= &to_delete_tasks)?;
atomic_progress.fetch_add(1, Ordering::Relaxed);
} }
progress.update_progress(TaskDeletionProgress::DeletingTasks);
let (atomic_progress, task_progress) = AtomicTaskStep::new(to_delete_tasks.len() as u32);
progress.update_progress(task_progress);
for task in to_delete_tasks.iter() { for task in to_delete_tasks.iter() {
self.all_tasks.delete(wtxn, &task)?; self.all_tasks.delete(wtxn, &task)?;
atomic_progress.fetch_add(1, Ordering::Relaxed);
} }
for canceled_by in affected_canceled_by { for canceled_by in affected_canceled_by {
if let Some(mut tasks) = self.canceled_by.get(wtxn, &canceled_by)? { if let Some(mut tasks) = self.canceled_by.get(wtxn, &canceled_by)? {
@ -1707,6 +1853,9 @@ impl IndexScheduler {
} }
} }
} }
progress.update_progress(TaskDeletionProgress::DeletingBatches);
let (atomic_progress, batch_progress) = AtomicBatchStep::new(affected_batches.len() as u32);
progress.update_progress(batch_progress);
for (batch_id, to_delete_tasks) in affected_batches { for (batch_id, to_delete_tasks) in affected_batches {
if let Some(mut tasks) = self.batch_to_tasks_mapping.get(wtxn, &batch_id)? { if let Some(mut tasks) = self.batch_to_tasks_mapping.get(wtxn, &batch_id)? {
tasks -= &to_delete_tasks; tasks -= &to_delete_tasks;
@ -1748,6 +1897,7 @@ impl IndexScheduler {
} }
} }
} }
atomic_progress.fetch_add(1, Ordering::Relaxed);
} }
Ok(to_delete_tasks) Ok(to_delete_tasks)
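Besides the new progress phases, `delete_matched_tasks` keeps a `HashMap<BatchId, RoaringBitmap>` of the batches touched by the deleted tasks so it can shrink each batch→tasks mapping afterwards and forget batches that end up empty. A condensed, in-memory version of that bookkeeping (plain maps stand in for the LMDB databases):

```rust
use std::collections::HashMap;

use roaring::RoaringBitmap;

/// Group deleted tasks per batch, then shrink each batch's task set and drop
/// batches left empty. Ids are simplified to `u32`.
fn delete_tasks(
    to_delete: &RoaringBitmap,
    task_to_batch: &HashMap<u32, u32>,
    batch_to_tasks: &mut HashMap<u32, RoaringBitmap>,
) {
    // 1. Remember which batches are affected, like `affected_batches` above.
    let mut affected_batches: HashMap<u32, RoaringBitmap> = HashMap::new();
    for task_id in to_delete.iter() {
        if let Some(&batch_uid) = task_to_batch.get(&task_id) {
            affected_batches.entry(batch_uid).or_default().insert(task_id);
        }
    }

    // 2. Remove those tasks from each batch; forget batches left empty.
    for (batch_id, deleted) in affected_batches {
        let now_empty = if let Some(tasks) = batch_to_tasks.get_mut(&batch_id) {
            *tasks -= &deleted;
            tasks.is_empty()
        } else {
            false
        };
        if now_empty {
            batch_to_tasks.remove(&batch_id);
        }
    }
}

fn main() {
    let mut batch_to_tasks = HashMap::new();
    batch_to_tasks.insert(0_u32, (0..3).collect::<RoaringBitmap>());
    let task_to_batch = HashMap::from([(0_u32, 0_u32), (1, 0), (2, 0)]);

    let to_delete: RoaringBitmap = [0_u32, 1, 2].into_iter().collect();
    delete_tasks(&to_delete, &task_to_batch, &mut batch_to_tasks);
    assert!(batch_to_tasks.is_empty());
}
```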
@ -1762,21 +1912,36 @@ impl IndexScheduler {
cancel_task_id: TaskId, cancel_task_id: TaskId,
current_batch: &mut ProcessingBatch, current_batch: &mut ProcessingBatch,
matched_tasks: &RoaringBitmap, matched_tasks: &RoaringBitmap,
progress: &Progress,
) -> Result<Vec<Task>> { ) -> Result<Vec<Task>> {
progress.update_progress(TaskCancelationProgress::RetrievingTasks);
// 1. Remove from this list the tasks that we are not allowed to cancel // 1. Remove from this list the tasks that we are not allowed to cancel
// Notice that only the _enqueued_ ones are cancelable and we should // Notice that only the _enqueued_ ones are cancelable and we should
// have already aborted the indexation of the _processing_ ones // have already aborted the indexation of the _processing_ ones
let cancelable_tasks = self.get_status(rtxn, Status::Enqueued)?; let cancelable_tasks = self.get_status(rtxn, Status::Enqueued)?;
let tasks_to_cancel = cancelable_tasks & matched_tasks; let tasks_to_cancel = cancelable_tasks & matched_tasks;
// 2. We now have a list of tasks to cancel, cancel them let (task_progress, progress_obj) = AtomicTaskStep::new(tasks_to_cancel.len() as u32);
let mut tasks = self.get_existing_tasks(rtxn, tasks_to_cancel.iter())?; progress.update_progress(progress_obj);
// 2. We now have a list of tasks to cancel, cancel them
let mut tasks = self.get_existing_tasks(
rtxn,
tasks_to_cancel.iter().inspect(|_| {
task_progress.fetch_add(1, Ordering::Relaxed);
}),
)?;
progress.update_progress(TaskCancelationProgress::UpdatingTasks);
let (task_progress, progress_obj) = AtomicTaskStep::new(tasks_to_cancel.len() as u32);
progress.update_progress(progress_obj);
for task in tasks.iter_mut() { for task in tasks.iter_mut() {
task.status = Status::Canceled; task.status = Status::Canceled;
task.canceled_by = Some(cancel_task_id); task.canceled_by = Some(cancel_task_id);
task.details = task.details.as_ref().map(|d| d.to_failed()); task.details = task.details.as_ref().map(|d| d.to_failed());
current_batch.processing(Some(task)); current_batch.processing(Some(task));
task_progress.fetch_add(1, Ordering::Relaxed);
} }
Ok(tasks) Ok(tasks)
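`cancel_matched_tasks` now counts retrieval progress without a separate loop: the id iterator handed to `get_existing_tasks` is wrapped in `.inspect()`, so the shared counter advances as the ids are drained. The same trick with plain std types:

```rust
use std::sync::atomic::{AtomicU32, Ordering};

fn main() {
    let task_ids = [4_u32, 8, 15, 16, 23, 42];
    let retrieved = AtomicU32::new(0);

    // Stand-in for `get_existing_tasks`: it consumes the iterator of ids and
    // looks each task up; the `inspect` hook fires once per id it pulls.
    let tasks: Vec<String> = task_ids
        .iter()
        .inspect(|_| {
            retrieved.fetch_add(1, Ordering::Relaxed);
        })
        .map(|id| format!("task {id}"))
        .collect();

    assert_eq!(tasks.len(), 6);
    assert_eq!(retrieved.load(Ordering::Relaxed), 6);
}
```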

View File

@ -104,7 +104,7 @@ pub enum Error {
)] )]
InvalidTaskCanceledBy { canceled_by: String }, InvalidTaskCanceledBy { canceled_by: String },
#[error( #[error(
"{index_uid} is not a valid index uid. Index uid can be an integer or a string containing only alphanumeric characters, hyphens (-) and underscores (_), and can not be more than 512 bytes." "{index_uid} is not a valid index uid. Index uid can be an integer or a string containing only alphanumeric characters, hyphens (-) and underscores (_), and can not be more than 400 bytes."
)] )]
InvalidIndexUid { index_uid: String }, InvalidIndexUid { index_uid: String },
#[error("Task `{0}` not found.")] #[error("Task `{0}` not found.")]
@ -122,8 +122,11 @@ pub enum Error {
Dump(#[from] dump::Error), Dump(#[from] dump::Error),
#[error(transparent)] #[error(transparent)]
Heed(#[from] heed::Error), Heed(#[from] heed::Error),
#[error(transparent)] #[error("{}", match .index_uid {
Milli(#[from] milli::Error), Some(uid) if !uid.is_empty() => format!("Index `{}`: {error}", uid),
_ => format!("{error}")
})]
Milli { error: milli::Error, index_uid: Option<String> },
#[error("An unexpected crash occurred when processing the task.")] #[error("An unexpected crash occurred when processing the task.")]
ProcessBatchPanicked, ProcessBatchPanicked,
#[error(transparent)] #[error(transparent)]
@ -190,7 +193,7 @@ impl Error {
| Error::AbortedTask | Error::AbortedTask
| Error::Dump(_) | Error::Dump(_)
| Error::Heed(_) | Error::Heed(_)
| Error::Milli(_) | Error::Milli { .. }
| Error::ProcessBatchPanicked | Error::ProcessBatchPanicked
| Error::FileStore(_) | Error::FileStore(_)
| Error::IoError(_) | Error::IoError(_)
@ -209,6 +212,20 @@ impl Error {
pub fn with_custom_error_code(self, code: Code) -> Self { pub fn with_custom_error_code(self, code: Code) -> Self {
Self::WithCustomErrorCode(code, Box::new(self)) Self::WithCustomErrorCode(code, Box::new(self))
} }
pub fn from_milli(err: milli::Error, index_uid: Option<String>) -> Self {
match err {
milli::Error::UserError(milli::UserError::InvalidFilter(_)) => {
Self::Milli { error: err, index_uid }
.with_custom_error_code(Code::InvalidDocumentFilter)
}
milli::Error::UserError(milli::UserError::InvalidFilterExpression { .. }) => {
Self::Milli { error: err, index_uid }
.with_custom_error_code(Code::InvalidDocumentFilter)
}
_ => Self::Milli { error: err, index_uid },
}
}
} }
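The new `Milli { error, index_uid }` variant carries the index a milli error came from, and its `Display` prefixes the message with ``Index `uid`:`` when the uid is known; `from_milli` is the single place where invalid-filter user errors are additionally tagged with `Code::InvalidDocumentFilter`. A simplified model of the variant and its formatting (a `String` stands in for `milli::Error`, and the error-code tagging is left out):

```rust
use std::fmt;

/// Simplified model of the scheduler's `Milli { error, index_uid }` variant.
#[derive(Debug)]
enum SchedulerError {
    Milli { error: String, index_uid: Option<String> },
}

impl SchedulerError {
    /// Counterpart of `Error::from_milli`: attach the index uid at the point
    /// where a milli error crosses into the scheduler.
    fn from_milli(error: String, index_uid: Option<String>) -> Self {
        Self::Milli { error, index_uid }
    }
}

impl fmt::Display for SchedulerError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::Milli { error, index_uid } => match index_uid {
                Some(uid) if !uid.is_empty() => write!(f, "Index `{uid}`: {error}"),
                _ => write!(f, "{error}"),
            },
        }
    }
}

fn main() {
    let err = SchedulerError::from_milli("invalid filter".to_string(), Some("movies".into()));
    assert_eq!(err.to_string(), "Index `movies`: invalid filter");

    let err = SchedulerError::from_milli("invalid filter".to_string(), None);
    assert_eq!(err.to_string(), "invalid filter");
}
```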
impl ErrorCode for Error { impl ErrorCode for Error {
@ -236,7 +253,7 @@ impl ErrorCode for Error {
// TODO: not sure of the Code to use // TODO: not sure of the Code to use
Error::NoSpaceLeftInTaskQueue => Code::NoSpaceLeftOnDevice, Error::NoSpaceLeftInTaskQueue => Code::NoSpaceLeftOnDevice,
Error::Dump(e) => e.error_code(), Error::Dump(e) => e.error_code(),
Error::Milli(e) => e.error_code(), Error::Milli { error, .. } => error.error_code(),
Error::ProcessBatchPanicked => Code::Internal, Error::ProcessBatchPanicked => Code::Internal,
Error::Heed(e) => e.error_code(), Error::Heed(e) => e.error_code(),
Error::HeedTransaction(e) => e.error_code(), Error::HeedTransaction(e) => e.error_code(),

View File

@ -3,14 +3,13 @@ use std::path::Path;
use std::time::Duration; use std::time::Duration;
use meilisearch_types::heed::{EnvClosingEvent, EnvFlags, EnvOpenOptions}; use meilisearch_types::heed::{EnvClosingEvent, EnvFlags, EnvOpenOptions};
use meilisearch_types::milli::Index; use meilisearch_types::milli::{Index, Result};
use time::OffsetDateTime; use time::OffsetDateTime;
use uuid::Uuid; use uuid::Uuid;
use super::IndexStatus::{self, Available, BeingDeleted, Closing, Missing}; use super::IndexStatus::{self, Available, BeingDeleted, Closing, Missing};
use crate::clamp_to_page_size;
use crate::lru::{InsertionOutcome, LruMap}; use crate::lru::{InsertionOutcome, LruMap};
use crate::{clamp_to_page_size, Result};
/// Keep an internally consistent view of the open indexes in memory. /// Keep an internally consistent view of the open indexes in memory.
/// ///
/// This view is made of an LRU cache that will evict the least frequently used indexes when new indexes are opened. /// This view is made of an LRU cache that will evict the least frequently used indexes when new indexes are opened.

View File

@ -5,6 +5,7 @@ use std::{fs, thread};
use meilisearch_types::heed::types::{SerdeJson, Str}; use meilisearch_types::heed::types::{SerdeJson, Str};
use meilisearch_types::heed::{Database, Env, RoTxn, RwTxn}; use meilisearch_types::heed::{Database, Env, RoTxn, RwTxn};
use meilisearch_types::milli;
use meilisearch_types::milli::update::IndexerConfig; use meilisearch_types::milli::update::IndexerConfig;
use meilisearch_types::milli::{FieldDistribution, Index}; use meilisearch_types::milli::{FieldDistribution, Index};
use serde::{Deserialize, Serialize}; use serde::{Deserialize, Serialize};
@ -121,7 +122,7 @@ impl IndexStats {
/// # Parameters /// # Parameters
/// ///
/// - rtxn: a RO transaction for the index, obtained from `Index::read_txn()`. /// - rtxn: a RO transaction for the index, obtained from `Index::read_txn()`.
pub fn new(index: &Index, rtxn: &RoTxn) -> Result<Self> { pub fn new(index: &Index, rtxn: &RoTxn) -> milli::Result<Self> {
Ok(IndexStats { Ok(IndexStats {
number_of_documents: index.number_of_documents(rtxn)?, number_of_documents: index.number_of_documents(rtxn)?,
database_size: index.on_disk_size()?, database_size: index.on_disk_size()?,
@ -183,13 +184,18 @@ impl IndexMapper {
// Error if the UUIDv4 somehow already exists in the map, since it should be fresh. // Error if the UUIDv4 somehow already exists in the map, since it should be fresh.
// This is very unlikely to happen in practice. // This is very unlikely to happen in practice.
// TODO: it would be better to lazily create the index. But we need an Index::open function for milli. // TODO: it would be better to lazily create the index. But we need an Index::open function for milli.
let index = self.index_map.write().unwrap().create( let index = self
&uuid, .index_map
&index_path, .write()
date, .unwrap()
self.enable_mdb_writemap, .create(
self.index_base_map_size, &uuid,
)?; &index_path,
date,
self.enable_mdb_writemap,
self.index_base_map_size,
)
.map_err(|e| Error::from_milli(e, Some(uuid.to_string())))?;
wtxn.commit()?; wtxn.commit()?;
@ -357,7 +363,9 @@ impl IndexMapper {
}; };
let index_path = self.base_path.join(uuid.to_string()); let index_path = self.base_path.join(uuid.to_string());
// take the lock to reopen the environment. // take the lock to reopen the environment.
reopen.reopen(&mut self.index_map.write().unwrap(), &index_path)?; reopen
.reopen(&mut self.index_map.write().unwrap(), &index_path)
.map_err(|e| Error::from_milli(e, Some(uuid.to_string())))?;
continue; continue;
} }
BeingDeleted => return Err(Error::IndexNotFound(name.to_string())), BeingDeleted => return Err(Error::IndexNotFound(name.to_string())),
@ -372,13 +380,15 @@ impl IndexMapper {
Missing => { Missing => {
let index_path = self.base_path.join(uuid.to_string()); let index_path = self.base_path.join(uuid.to_string());
break index_map.create( break index_map
&uuid, .create(
&index_path, &uuid,
None, &index_path,
self.enable_mdb_writemap, None,
self.index_base_map_size, self.enable_mdb_writemap,
)?; self.index_base_map_size,
)
.map_err(|e| Error::from_milli(e, Some(uuid.to_string())))?;
} }
Available(index) => break index, Available(index) => break index,
Closing(_) => { Closing(_) => {
@ -460,6 +470,7 @@ impl IndexMapper {
let index = self.index(rtxn, index_uid)?; let index = self.index(rtxn, index_uid)?;
let index_rtxn = index.read_txn()?; let index_rtxn = index.read_txn()?;
IndexStats::new(&index, &index_rtxn) IndexStats::new(&index, &index_rtxn)
.map_err(|e| Error::from_milli(e, Some(uuid.to_string())))
} }
} }
} }

View File

@ -353,7 +353,7 @@ pub fn snapshot_canceled_by(rtxn: &RoTxn, db: Database<BEU32, RoaringBitmapCodec
pub fn snapshot_batch(batch: &Batch) -> String { pub fn snapshot_batch(batch: &Batch) -> String {
let mut snap = String::new(); let mut snap = String::new();
let Batch { uid, details, stats, started_at, finished_at } = batch; let Batch { uid, details, stats, started_at, finished_at, progress: _ } = batch;
if let Some(finished_at) = finished_at { if let Some(finished_at) = finished_at {
assert!(finished_at > started_at); assert!(finished_at > started_at);
} }
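`snapshot_batch` destructures `Batch` without a `..` catch-all, so adding the new `progress` field is a compile error until the snapshot helper explicitly ignores or renders it. A minimal illustration of why that style is useful in test helpers (the `Batch` fields below are trimmed down):

```rust
struct Batch {
    uid: u32,
    started_at: u64,
    finished_at: Option<u64>,
    progress: Option<String>,
}

fn snapshot_batch(batch: &Batch) -> String {
    // No `..` here: forget a field and this stops compiling.
    let Batch { uid, started_at, finished_at, progress: _ } = batch;
    format!("uid={uid} started_at={started_at} finished_at={finished_at:?}")
}

fn main() {
    let batch = Batch { uid: 0, started_at: 10, finished_at: Some(12), progress: None };
    println!("{}", snapshot_batch(&batch));
}
```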

View File

@ -26,6 +26,7 @@ mod index_mapper;
#[cfg(test)] #[cfg(test)]
mod insta_snapshot; mod insta_snapshot;
mod lru; mod lru;
mod processing;
mod utils; mod utils;
pub mod uuid_codec; pub mod uuid_codec;
@ -56,12 +57,12 @@ use meilisearch_types::heed::types::{SerdeBincode, SerdeJson, Str, I128};
use meilisearch_types::heed::{self, Database, Env, PutFlags, RoTxn, RwTxn}; use meilisearch_types::heed::{self, Database, Env, PutFlags, RoTxn, RwTxn};
use meilisearch_types::milli::documents::DocumentsBatchBuilder; use meilisearch_types::milli::documents::DocumentsBatchBuilder;
use meilisearch_types::milli::index::IndexEmbeddingConfig; use meilisearch_types::milli::index::IndexEmbeddingConfig;
use meilisearch_types::milli::update::new::indexer::document_changes::Progress;
use meilisearch_types::milli::update::IndexerConfig; use meilisearch_types::milli::update::IndexerConfig;
use meilisearch_types::milli::vector::{Embedder, EmbedderOptions, EmbeddingConfigs}; use meilisearch_types::milli::vector::{Embedder, EmbedderOptions, EmbeddingConfigs};
use meilisearch_types::milli::{self, CboRoaringBitmapCodec, Index, RoaringBitmapCodec, BEU32}; use meilisearch_types::milli::{self, CboRoaringBitmapCodec, Index, RoaringBitmapCodec, BEU32};
use meilisearch_types::task_view::TaskView; use meilisearch_types::task_view::TaskView;
use meilisearch_types::tasks::{Kind, KindWithContent, Status, Task, TaskProgress}; use meilisearch_types::tasks::{Kind, KindWithContent, Status, Task};
use processing::ProcessingTasks;
use rayon::current_num_threads; use rayon::current_num_threads;
use rayon::prelude::{IntoParallelIterator, ParallelIterator}; use rayon::prelude::{IntoParallelIterator, ParallelIterator};
use roaring::RoaringBitmap; use roaring::RoaringBitmap;
@ -72,7 +73,8 @@ use utils::{filter_out_references_to_newer_tasks, keep_ids_within_datetimes, map
use uuid::Uuid; use uuid::Uuid;
use crate::index_mapper::IndexMapper; use crate::index_mapper::IndexMapper;
use crate::utils::{check_index_swap_validity, clamp_to_page_size, ProcessingBatch}; use crate::processing::{AtomicTaskStep, BatchProgress};
use crate::utils::{check_index_swap_validity, clamp_to_page_size};
pub(crate) type BEI128 = I128<BE>; pub(crate) type BEI128 = I128<BE>;
@ -163,48 +165,6 @@ impl Query {
} }
} }
#[derive(Debug, Clone)]
pub struct ProcessingTasks {
batch: Option<ProcessingBatch>,
/// The list of tasks ids that are currently running.
processing: RoaringBitmap,
/// The progress on processing tasks
progress: Option<TaskProgress>,
}
impl ProcessingTasks {
/// Creates an empty `ProcessingTasks` struct.
fn new() -> ProcessingTasks {
ProcessingTasks { batch: None, processing: RoaringBitmap::new(), progress: None }
}
/// Stores the currently processing tasks, and the date time at which it started.
fn start_processing(&mut self, processing_batch: ProcessingBatch, processing: RoaringBitmap) {
self.batch = Some(processing_batch);
self.processing = processing;
}
fn update_progress(&mut self, progress: Progress) -> TaskProgress {
self.progress.get_or_insert_with(TaskProgress::default).update(progress)
}
/// Set the processing tasks to an empty list
fn stop_processing(&mut self) -> Self {
self.progress = None;
Self {
batch: std::mem::take(&mut self.batch),
processing: std::mem::take(&mut self.processing),
progress: None,
}
}
/// Returns `true` if at least one task that we must stop is currently processing.
fn must_cancel_processing_tasks(&self, canceled_tasks: &RoaringBitmap) -> bool {
!self.processing.is_disjoint(canceled_tasks)
}
}
#[derive(Default, Clone, Debug)] #[derive(Default, Clone, Debug)]
struct MustStopProcessing(Arc<AtomicBool>); struct MustStopProcessing(Arc<AtomicBool>);
@ -813,7 +773,7 @@ impl IndexScheduler {
let mut batch_tasks = RoaringBitmap::new(); let mut batch_tasks = RoaringBitmap::new();
for batch_uid in batch_uids { for batch_uid in batch_uids {
if processing_batch.as_ref().map_or(false, |batch| batch.uid == *batch_uid) { if processing_batch.as_ref().map_or(false, |batch| batch.uid == *batch_uid) {
batch_tasks |= &processing_tasks; batch_tasks |= &*processing_tasks;
} else { } else {
batch_tasks |= self.tasks_in_batch(rtxn, *batch_uid)?; batch_tasks |= self.tasks_in_batch(rtxn, *batch_uid)?;
} }
@ -827,13 +787,13 @@ impl IndexScheduler {
match status { match status {
// special case for Processing tasks // special case for Processing tasks
Status::Processing => { Status::Processing => {
status_tasks |= &processing_tasks; status_tasks |= &*processing_tasks;
} }
status => status_tasks |= &self.get_status(rtxn, *status)?, status => status_tasks |= &self.get_status(rtxn, *status)?,
}; };
} }
if !status.contains(&Status::Processing) { if !status.contains(&Status::Processing) {
tasks -= &processing_tasks; tasks -= &*processing_tasks;
} }
tasks &= status_tasks; tasks &= status_tasks;
} }
@ -882,7 +842,7 @@ impl IndexScheduler {
// Once we have filtered the two subsets, we put them back together and assign it back to `tasks`. // Once we have filtered the two subsets, we put them back together and assign it back to `tasks`.
tasks = { tasks = {
let (mut filtered_non_processing_tasks, mut filtered_processing_tasks) = let (mut filtered_non_processing_tasks, mut filtered_processing_tasks) =
(&tasks - &processing_tasks, &tasks & &processing_tasks); (&tasks - &*processing_tasks, &tasks & &*processing_tasks);
// special case for Processing tasks // special case for Processing tasks
// A closure that clears the filtered_processing_tasks if their started_at date falls outside the given bounds // A closure that clears the filtered_processing_tasks if their started_at date falls outside the given bounds
@ -1090,7 +1050,7 @@ impl IndexScheduler {
// Once we have filtered the two subsets, we put them back together and assign it back to `batches`. // Once we have filtered the two subsets, we put them back together and assign it back to `batches`.
batches = { batches = {
let (mut filtered_non_processing_batches, mut filtered_processing_batches) = let (mut filtered_non_processing_batches, mut filtered_processing_batches) =
(&batches - &processing.processing, &batches & &processing.processing); (&batches - &*processing.processing, &batches & &*processing.processing);
// special case for Processing batches // special case for Processing batches
// A closure that clears the filtered_processing_batches if their started_at date falls outside the given bounds // A closure that clears the filtered_processing_batches if their started_at date falls outside the given bounds
@ -1606,7 +1566,8 @@ impl IndexScheduler {
// We reset the must_stop flag to be sure that we don't stop processing tasks // We reset the must_stop flag to be sure that we don't stop processing tasks
self.must_stop_processing.reset(); self.must_stop_processing.reset();
self.processing_tasks let progress = self
.processing_tasks
.write() .write()
.unwrap() .unwrap()
// We can clone the processing batch here because we don't want its modification to affect the view of the processing batches // We can clone the processing batch here because we don't want its modification to affect the view of the processing batches
@ -1619,11 +1580,12 @@ impl IndexScheduler {
let res = { let res = {
let cloned_index_scheduler = self.private_clone(); let cloned_index_scheduler = self.private_clone();
let processing_batch = &mut processing_batch; let processing_batch = &mut processing_batch;
let progress = progress.clone();
std::thread::scope(|s| { std::thread::scope(|s| {
let handle = std::thread::Builder::new() let handle = std::thread::Builder::new()
.name(String::from("batch-operation")) .name(String::from("batch-operation"))
.spawn_scoped(s, move || { .spawn_scoped(s, move || {
cloned_index_scheduler.process_batch(batch, processing_batch) cloned_index_scheduler.process_batch(batch, processing_batch, progress)
}) })
.unwrap(); .unwrap();
handle.join().unwrap_or(Err(Error::ProcessBatchPanicked)) handle.join().unwrap_or(Err(Error::ProcessBatchPanicked))
@ -1636,6 +1598,7 @@ impl IndexScheduler {
#[cfg(test)] #[cfg(test)]
self.maybe_fail(tests::FailureLocation::AcquiringWtxn)?; self.maybe_fail(tests::FailureLocation::AcquiringWtxn)?;
progress.update_progress(BatchProgress::WritingTasksToDisk);
processing_batch.finished(); processing_batch.finished();
let mut wtxn = self.env.write_txn().map_err(Error::HeedTransaction)?; let mut wtxn = self.env.write_txn().map_err(Error::HeedTransaction)?;
let mut canceled = RoaringBitmap::new(); let mut canceled = RoaringBitmap::new();
@ -1645,12 +1608,15 @@ impl IndexScheduler {
#[cfg(test)] #[cfg(test)]
self.breakpoint(Breakpoint::ProcessBatchSucceeded); self.breakpoint(Breakpoint::ProcessBatchSucceeded);
let (task_progress, task_progress_obj) = AtomicTaskStep::new(tasks.len() as u32);
progress.update_progress(task_progress_obj);
let mut success = 0; let mut success = 0;
let mut failure = 0; let mut failure = 0;
let mut canceled_by = None; let mut canceled_by = None;
#[allow(unused_variables)] #[allow(unused_variables)]
for (i, mut task) in tasks.into_iter().enumerate() { for (i, mut task) in tasks.into_iter().enumerate() {
task_progress.fetch_add(1, Ordering::Relaxed);
processing_batch.update(&mut task); processing_batch.update(&mut task);
if task.status == Status::Canceled { if task.status == Status::Canceled {
canceled.insert(task.uid); canceled.insert(task.uid);
@ -1678,9 +1644,10 @@ impl IndexScheduler {
tracing::info!("A batch of tasks was successfully completed with {success} successful tasks and {failure} failed tasks."); tracing::info!("A batch of tasks was successfully completed with {success} successful tasks and {failure} failed tasks.");
} }
// If we have an abortion error we must stop the tick here and re-schedule tasks. // If we have an abortion error we must stop the tick here and re-schedule tasks.
Err(Error::Milli(milli::Error::InternalError( Err(Error::Milli {
milli::InternalError::AbortedIndexation, error: milli::Error::InternalError(milli::InternalError::AbortedIndexation),
))) ..
})
| Err(Error::AbortedTask) => { | Err(Error::AbortedTask) => {
#[cfg(test)] #[cfg(test)]
self.breakpoint(Breakpoint::AbortedIndexation); self.breakpoint(Breakpoint::AbortedIndexation);
@ -1699,9 +1666,10 @@ impl IndexScheduler {
// 2. close the associated environment // 2. close the associated environment
// 3. resize it // 3. resize it
// 4. re-schedule tasks // 4. re-schedule tasks
Err(Error::Milli(milli::Error::UserError( Err(Error::Milli {
milli::UserError::MaxDatabaseSizeReached, error: milli::Error::UserError(milli::UserError::MaxDatabaseSizeReached),
))) if index_uid.is_some() => { ..
}) if index_uid.is_some() => {
// fixme: add index_uid to match to avoid the unwrap // fixme: add index_uid to match to avoid the unwrap
let index_uid = index_uid.unwrap(); let index_uid = index_uid.unwrap();
// fixme: handle error more gracefully? not sure when this could happen // fixme: handle error more gracefully? not sure when this could happen
@ -1716,8 +1684,12 @@ impl IndexScheduler {
Err(err) => { Err(err) => {
#[cfg(test)] #[cfg(test)]
self.breakpoint(Breakpoint::ProcessBatchFailed); self.breakpoint(Breakpoint::ProcessBatchFailed);
let (task_progress, task_progress_obj) = AtomicTaskStep::new(ids.len() as u32);
progress.update_progress(task_progress_obj);
let error: ResponseError = err.into(); let error: ResponseError = err.into();
for id in ids.iter() { for id in ids.iter() {
task_progress.fetch_add(1, Ordering::Relaxed);
let mut task = self let mut task = self
.get_task(&wtxn, id) .get_task(&wtxn, id)
.map_err(|e| Error::TaskDatabaseUpdate(Box::new(e)))? .map_err(|e| Error::TaskDatabaseUpdate(Box::new(e)))?
@ -1943,6 +1915,7 @@ impl IndexScheduler {
// TODO: consider using a type alias or a struct embedder/template // TODO: consider using a type alias or a struct embedder/template
pub fn embedders( pub fn embedders(
&self, &self,
index_uid: String,
embedding_configs: Vec<IndexEmbeddingConfig>, embedding_configs: Vec<IndexEmbeddingConfig>,
) -> Result<EmbeddingConfigs> { ) -> Result<EmbeddingConfigs> {
let res: Result<_> = embedding_configs let res: Result<_> = embedding_configs
@ -1953,8 +1926,12 @@ impl IndexScheduler {
config: milli::vector::EmbeddingConfig { embedder_options, prompt, quantized }, config: milli::vector::EmbeddingConfig { embedder_options, prompt, quantized },
.. ..
}| { }| {
let prompt = let prompt = Arc::new(
Arc::new(prompt.try_into().map_err(meilisearch_types::milli::Error::from)?); prompt
.try_into()
.map_err(meilisearch_types::milli::Error::from)
.map_err(|err| Error::from_milli(err, Some(index_uid.clone())))?,
);
// optimistically return existing embedder // optimistically return existing embedder
{ {
let embedders = self.embedders.read().unwrap(); let embedders = self.embedders.read().unwrap();
@ -1970,7 +1947,9 @@ impl IndexScheduler {
let embedder = Arc::new( let embedder = Arc::new(
Embedder::new(embedder_options.clone()) Embedder::new(embedder_options.clone())
.map_err(meilisearch_types::milli::vector::Error::from) .map_err(meilisearch_types::milli::vector::Error::from)
.map_err(meilisearch_types::milli::Error::from)?, .map_err(|err| {
Error::from_milli(err.into(), Some(index_uid.clone()))
})?,
); );
{ {
let mut embedders = self.embedders.write().unwrap(); let mut embedders = self.embedders.write().unwrap();
@ -4319,10 +4298,35 @@ mod tests {
let proc = index_scheduler.processing_tasks.read().unwrap().clone(); let proc = index_scheduler.processing_tasks.read().unwrap().clone();
let query = Query { statuses: Some(vec![Status::Processing]), ..Default::default() }; let query = Query { statuses: Some(vec![Status::Processing]), ..Default::default() };
let (batches, _) = index_scheduler let (mut batches, _) = index_scheduler
.get_batch_ids_from_authorized_indexes(&rtxn, &proc, &query, &AuthFilter::default()) .get_batches_from_authorized_indexes(query.clone(), &AuthFilter::default())
.unwrap(); .unwrap();
snapshot!(snapshot_bitmap(&batches), @"[0,]"); // only the processing batch in the first tick assert_eq!(batches.len(), 1);
batches[0].started_at = OffsetDateTime::UNIX_EPOCH;
// Insta cannot snapshot our batches because the batch stats contain an enum as a key: https://github.com/mitsuhiko/insta/issues/689
let batch = serde_json::to_string_pretty(&batches[0]).unwrap();
snapshot!(batch, @r#"
{
"uid": 0,
"details": {
"primaryKey": "mouse"
},
"stats": {
"totalNbTasks": 1,
"status": {
"processing": 1
},
"types": {
"indexCreation": 1
},
"indexUids": {
"catto": 1
}
},
"startedAt": "1970-01-01T00:00:00Z",
"finishedAt": null
}
"#);
let query = Query { statuses: Some(vec![Status::Enqueued]), ..Default::default() }; let query = Query { statuses: Some(vec![Status::Enqueued]), ..Default::default() };
let (batches, _) = index_scheduler let (batches, _) = index_scheduler
@ -6146,7 +6150,7 @@ mod tests {
insta::assert_json_snapshot!(simple_hf_config.embedder_options); insta::assert_json_snapshot!(simple_hf_config.embedder_options);
let simple_hf_name = name.clone(); let simple_hf_name = name.clone();
let configs = index_scheduler.embedders(configs).unwrap(); let configs = index_scheduler.embedders("doggos".to_string(), configs).unwrap();
let (hf_embedder, _, _) = configs.get(&simple_hf_name).unwrap(); let (hf_embedder, _, _) = configs.get(&simple_hf_name).unwrap();
let beagle_embed = let beagle_embed =
hf_embedder.embed_one(S("Intel the beagle best doggo"), None).unwrap(); hf_embedder.embed_one(S("Intel the beagle best doggo"), None).unwrap();

View File

@ -0,0 +1,316 @@
use std::borrow::Cow;
use std::sync::Arc;
use enum_iterator::Sequence;
use meilisearch_types::milli::progress::{AtomicSubStep, NamedStep, Progress, ProgressView, Step};
use meilisearch_types::milli::{make_atomic_progress, make_enum_progress};
use roaring::RoaringBitmap;
use crate::utils::ProcessingBatch;
#[derive(Clone)]
pub struct ProcessingTasks {
pub batch: Option<Arc<ProcessingBatch>>,
/// The list of task ids that are currently running.
pub processing: Arc<RoaringBitmap>,
/// The progress on processing tasks
pub progress: Option<Progress>,
}
impl ProcessingTasks {
/// Creates an empty `ProcessingTasks` struct.
pub fn new() -> ProcessingTasks {
ProcessingTasks { batch: None, processing: Arc::new(RoaringBitmap::new()), progress: None }
}
pub fn get_progress_view(&self) -> Option<ProgressView> {
Some(self.progress.as_ref()?.as_progress_view())
}
/// Stores the currently processing tasks and the date and time at which processing started.
pub fn start_processing(
&mut self,
processing_batch: ProcessingBatch,
processing: RoaringBitmap,
) -> Progress {
self.batch = Some(Arc::new(processing_batch));
self.processing = Arc::new(processing);
let progress = Progress::default();
progress.update_progress(BatchProgress::ProcessingTasks);
self.progress = Some(progress.clone());
progress
}
/// Set the processing tasks to an empty list
pub fn stop_processing(&mut self) -> Self {
self.progress = None;
Self {
batch: std::mem::take(&mut self.batch),
processing: std::mem::take(&mut self.processing),
progress: None,
}
}
/// Returns `true` if at least one of the currently processing tasks must be stopped.
pub fn must_cancel_processing_tasks(&self, canceled_tasks: &RoaringBitmap) -> bool {
!self.processing.is_disjoint(canceled_tasks)
}
}
make_enum_progress! {
pub enum BatchProgress {
ProcessingTasks,
WritingTasksToDisk,
}
}
make_enum_progress! {
pub enum TaskCancelationProgress {
RetrievingTasks,
UpdatingTasks,
}
}
make_enum_progress! {
pub enum TaskDeletionProgress {
DeletingTasksDateTime,
DeletingTasksMetadata,
DeletingTasks,
DeletingBatches,
}
}
make_enum_progress! {
pub enum SnapshotCreationProgress {
StartTheSnapshotCreation,
SnapshotTheIndexScheduler,
SnapshotTheUpdateFiles,
SnapshotTheIndexes,
SnapshotTheApiKeys,
CreateTheTarball,
}
}
make_enum_progress! {
pub enum DumpCreationProgress {
StartTheDumpCreation,
DumpTheApiKeys,
DumpTheTasks,
DumpTheIndexes,
DumpTheExperimentalFeatures,
CompressTheDump,
}
}
make_enum_progress! {
pub enum CreateIndexProgress {
CreatingTheIndex,
}
}
make_enum_progress! {
pub enum UpdateIndexProgress {
UpdatingTheIndex,
}
}
make_enum_progress! {
pub enum DeleteIndexProgress {
DeletingTheIndex,
}
}
make_enum_progress! {
pub enum SwappingTheIndexes {
EnsuringCorrectnessOfTheSwap,
SwappingTheIndexes,
}
}
make_enum_progress! {
pub enum InnerSwappingTwoIndexes {
RetrieveTheTasks,
UpdateTheTasks,
UpdateTheIndexesMetadata,
}
}
make_enum_progress! {
pub enum DocumentOperationProgress {
RetrievingConfig,
ComputingDocumentChanges,
Indexing,
}
}
make_enum_progress! {
pub enum DocumentEditionProgress {
RetrievingConfig,
ComputingDocumentChanges,
Indexing,
}
}
make_enum_progress! {
pub enum DocumentDeletionProgress {
RetrievingConfig,
DeleteDocuments,
Indexing,
}
}
make_enum_progress! {
pub enum SettingsProgress {
RetrievingAndMergingTheSettings,
ApplyTheSettings,
}
}
make_atomic_progress!(Task alias AtomicTaskStep => "task" );
make_atomic_progress!(Document alias AtomicDocumentStep => "document" );
make_atomic_progress!(Batch alias AtomicBatchStep => "batch" );
make_atomic_progress!(UpdateFile alias AtomicUpdateFileStep => "update file" );
pub struct VariableNameStep {
name: String,
current: u32,
total: u32,
}
impl VariableNameStep {
pub fn new(name: impl Into<String>, current: u32, total: u32) -> Self {
Self { name: name.into(), current, total }
}
}
impl Step for VariableNameStep {
fn name(&self) -> Cow<'static, str> {
self.name.clone().into()
}
fn current(&self) -> u32 {
self.current
}
fn total(&self) -> u32 {
self.total
}
}
#[cfg(test)]
mod test {
use std::sync::atomic::Ordering;
use meili_snap::{json_string, snapshot};
use super::*;
#[test]
fn one_level() {
let mut processing = ProcessingTasks::new();
processing.start_processing(ProcessingBatch::new(0), RoaringBitmap::new());
snapshot!(json_string!(processing.get_progress_view()), @r#"
{
"steps": [
{
"currentStep": "processing tasks",
"finished": 0,
"total": 2
}
],
"percentage": 0.0
}
"#);
processing.progress.as_ref().unwrap().update_progress(BatchProgress::WritingTasksToDisk);
snapshot!(json_string!(processing.get_progress_view()), @r#"
{
"steps": [
{
"currentStep": "writing tasks to disk",
"finished": 1,
"total": 2
}
],
"percentage": 50.0
}
"#);
}
#[test]
fn task_progress() {
let mut processing = ProcessingTasks::new();
processing.start_processing(ProcessingBatch::new(0), RoaringBitmap::new());
let (atomic, tasks) = AtomicTaskStep::new(10);
processing.progress.as_ref().unwrap().update_progress(tasks);
snapshot!(json_string!(processing.get_progress_view()), @r#"
{
"steps": [
{
"currentStep": "processing tasks",
"finished": 0,
"total": 2
},
{
"currentStep": "task",
"finished": 0,
"total": 10
}
],
"percentage": 0.0
}
"#);
atomic.fetch_add(6, Ordering::Relaxed);
snapshot!(json_string!(processing.get_progress_view()), @r#"
{
"steps": [
{
"currentStep": "processing tasks",
"finished": 0,
"total": 2
},
{
"currentStep": "task",
"finished": 6,
"total": 10
}
],
"percentage": 30.000002
}
"#);
processing.progress.as_ref().unwrap().update_progress(BatchProgress::WritingTasksToDisk);
snapshot!(json_string!(processing.get_progress_view()), @r#"
{
"steps": [
{
"currentStep": "writing tasks to disk",
"finished": 1,
"total": 2
}
],
"percentage": 50.0
}
"#);
let (atomic, tasks) = AtomicTaskStep::new(5);
processing.progress.as_ref().unwrap().update_progress(tasks);
atomic.fetch_add(4, Ordering::Relaxed);
snapshot!(json_string!(processing.get_progress_view()), @r#"
{
"steps": [
{
"currentStep": "writing tasks to disk",
"finished": 1,
"total": 2
},
{
"currentStep": "task",
"finished": 4,
"total": 5
}
],
"percentage": 90.0
}
"#);
}
}
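
Taken together, the new `processing` module gives the scheduler three progress primitives: coarse phases declared with `make_enum_progress!`, per-item counters from `make_atomic_progress!`, and free-form named steps through `VariableNameStep`. A minimal sketch of how they compose, assuming it lives inside the index-scheduler crate; the step names and counts are illustrative only:

```rust
use std::sync::atomic::Ordering;

use meilisearch_types::milli::progress::Progress;

use crate::processing::{AtomicTaskStep, BatchProgress, VariableNameStep};

fn report_progress_sketch(task_count: u32) {
    let progress = Progress::default();

    // Coarse phase, driven by the `make_enum_progress!` enum above.
    progress.update_progress(BatchProgress::ProcessingTasks);

    // Fine-grained counter nested under the current phase.
    let (task_counter, task_step) = AtomicTaskStep::new(task_count);
    progress.update_progress(task_step);
    for _ in 0..task_count {
        task_counter.fetch_add(1, Ordering::Relaxed);
    }

    // Free-form step name, e.g. one entry per index being worked on.
    progress.update_progress(VariableNameStep::new("snapshotting index `catto`", 0, 2));

    // The same view the `/batches` route ends up exposing.
    let view = progress.as_progress_view();
    println!("{}", serde_json::to_string_pretty(&view).unwrap());
}
```

The `lib.rs` changes earlier follow exactly this pattern: `BatchProgress::WritingTasksToDisk` marks the phase and `AtomicTaskStep` counts the tasks inside it.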

View File

@ -5,7 +5,7 @@ snapshot_kind: text
### Autobatching Enabled = true ### Autobatching Enabled = true
### Processing batch Some(1): ### Processing batch Some(1):
[1,] [1,]
{uid: 1, details: {"receivedDocuments":2,"indexedDocuments":null}, stats: {"totalNbTasks":2,"status":{"enqueued":2},"types":{"documentAdditionOrUpdate":2},"indexUids":{"beavero":2}}, } {uid: 1, details: {"receivedDocuments":1,"indexedDocuments":null}, stats: {"totalNbTasks":1,"status":{"processing":1},"types":{"documentAdditionOrUpdate":1},"indexUids":{"beavero":1}}, }
---------------------------------------------------------------------- ----------------------------------------------------------------------
### All Tasks: ### All Tasks:
0 {uid: 0, batch_uid: 0, status: succeeded, details: { received_documents: 1, indexed_documents: Some(1) }, kind: DocumentAdditionOrUpdate { index_uid: "catto", primary_key: None, method: ReplaceDocuments, content_file: 00000000-0000-0000-0000-000000000000, documents_count: 1, allow_index_creation: true }} 0 {uid: 0, batch_uid: 0, status: succeeded, details: { received_documents: 1, indexed_documents: Some(1) }, kind: DocumentAdditionOrUpdate { index_uid: "catto", primary_key: None, method: ReplaceDocuments, content_file: 00000000-0000-0000-0000-000000000000, documents_count: 1, allow_index_creation: true }}

View File

@ -5,7 +5,7 @@ snapshot_kind: text
### Autobatching Enabled = true ### Autobatching Enabled = true
### Processing batch Some(1): ### Processing batch Some(1):
[1,] [1,]
{uid: 1, details: {"receivedDocuments":2,"indexedDocuments":null}, stats: {"totalNbTasks":2,"status":{"enqueued":2},"types":{"documentAdditionOrUpdate":2},"indexUids":{"beavero":2}}, } {uid: 1, details: {"receivedDocuments":1,"indexedDocuments":null}, stats: {"totalNbTasks":1,"status":{"processing":1},"types":{"documentAdditionOrUpdate":1},"indexUids":{"beavero":1}}, }
---------------------------------------------------------------------- ----------------------------------------------------------------------
### All Tasks: ### All Tasks:
0 {uid: 0, batch_uid: 0, status: succeeded, details: { received_documents: 1, indexed_documents: Some(1) }, kind: DocumentAdditionOrUpdate { index_uid: "catto", primary_key: None, method: ReplaceDocuments, content_file: 00000000-0000-0000-0000-000000000000, documents_count: 1, allow_index_creation: true }} 0 {uid: 0, batch_uid: 0, status: succeeded, details: { received_documents: 1, indexed_documents: Some(1) }, kind: DocumentAdditionOrUpdate { index_uid: "catto", primary_key: None, method: ReplaceDocuments, content_file: 00000000-0000-0000-0000-000000000000, documents_count: 1, allow_index_creation: true }}

View File

@ -5,7 +5,7 @@ snapshot_kind: text
### Autobatching Enabled = true ### Autobatching Enabled = true
### Processing batch Some(0): ### Processing batch Some(0):
[0,] [0,]
{uid: 0, details: {"dumpUid":null}, stats: {"totalNbTasks":1,"status":{"enqueued":1},"types":{"dumpCreation":1},"indexUids":{}}, } {uid: 0, details: {"dumpUid":null}, stats: {"totalNbTasks":1,"status":{"processing":1},"types":{"dumpCreation":1},"indexUids":{}}, }
---------------------------------------------------------------------- ----------------------------------------------------------------------
### All Tasks: ### All Tasks:
0 {uid: 0, status: enqueued, details: { dump_uid: None }, kind: DumpCreation { keys: [], instance_uid: None }} 0 {uid: 0, status: enqueued, details: { dump_uid: None }, kind: DumpCreation { keys: [], instance_uid: None }}

View File

@ -5,7 +5,7 @@ snapshot_kind: text
### Autobatching Enabled = true ### Autobatching Enabled = true
### Processing batch Some(0): ### Processing batch Some(0):
[0,] [0,]
{uid: 0, details: {"receivedDocuments":2,"indexedDocuments":null}, stats: {"totalNbTasks":2,"status":{"enqueued":2},"types":{"documentAdditionOrUpdate":2},"indexUids":{"catto":2}}, } {uid: 0, details: {"receivedDocuments":1,"indexedDocuments":null}, stats: {"totalNbTasks":1,"status":{"processing":1},"types":{"documentAdditionOrUpdate":1},"indexUids":{"catto":1}}, }
---------------------------------------------------------------------- ----------------------------------------------------------------------
### All Tasks: ### All Tasks:
0 {uid: 0, status: enqueued, details: { received_documents: 1, indexed_documents: None }, kind: DocumentAdditionOrUpdate { index_uid: "catto", primary_key: None, method: ReplaceDocuments, content_file: 00000000-0000-0000-0000-000000000000, documents_count: 1, allow_index_creation: true }} 0 {uid: 0, status: enqueued, details: { received_documents: 1, indexed_documents: None }, kind: DocumentAdditionOrUpdate { index_uid: "catto", primary_key: None, method: ReplaceDocuments, content_file: 00000000-0000-0000-0000-000000000000, documents_count: 1, allow_index_creation: true }}

View File

@ -5,7 +5,7 @@ snapshot_kind: text
### Autobatching Enabled = true ### Autobatching Enabled = true
### Processing batch Some(0): ### Processing batch Some(0):
[0,] [0,]
{uid: 0, details: {"receivedDocuments":2,"indexedDocuments":null}, stats: {"totalNbTasks":2,"status":{"enqueued":2},"types":{"documentAdditionOrUpdate":2},"indexUids":{"catto":2}}, } {uid: 0, details: {"receivedDocuments":1,"indexedDocuments":null}, stats: {"totalNbTasks":1,"status":{"processing":1},"types":{"documentAdditionOrUpdate":1},"indexUids":{"catto":1}}, }
---------------------------------------------------------------------- ----------------------------------------------------------------------
### All Tasks: ### All Tasks:
0 {uid: 0, status: enqueued, details: { received_documents: 1, indexed_documents: None }, kind: DocumentAdditionOrUpdate { index_uid: "catto", primary_key: None, method: ReplaceDocuments, content_file: 00000000-0000-0000-0000-000000000000, documents_count: 1, allow_index_creation: true }} 0 {uid: 0, status: enqueued, details: { received_documents: 1, indexed_documents: None }, kind: DocumentAdditionOrUpdate { index_uid: "catto", primary_key: None, method: ReplaceDocuments, content_file: 00000000-0000-0000-0000-000000000000, documents_count: 1, allow_index_creation: true }}

View File

@ -5,7 +5,7 @@ snapshot_kind: text
### Autobatching Enabled = true ### Autobatching Enabled = true
### Processing batch Some(0): ### Processing batch Some(0):
[0,] [0,]
{uid: 0, details: {"receivedDocuments":2,"indexedDocuments":null}, stats: {"totalNbTasks":2,"status":{"enqueued":2},"types":{"documentAdditionOrUpdate":2},"indexUids":{"catto":2}}, } {uid: 0, details: {"receivedDocuments":1,"indexedDocuments":null}, stats: {"totalNbTasks":1,"status":{"processing":1},"types":{"documentAdditionOrUpdate":1},"indexUids":{"catto":1}}, }
---------------------------------------------------------------------- ----------------------------------------------------------------------
### All Tasks: ### All Tasks:
0 {uid: 0, status: enqueued, details: { received_documents: 1, indexed_documents: None }, kind: DocumentAdditionOrUpdate { index_uid: "catto", primary_key: None, method: ReplaceDocuments, content_file: 00000000-0000-0000-0000-000000000000, documents_count: 1, allow_index_creation: true }} 0 {uid: 0, status: enqueued, details: { received_documents: 1, indexed_documents: None }, kind: DocumentAdditionOrUpdate { index_uid: "catto", primary_key: None, method: ReplaceDocuments, content_file: 00000000-0000-0000-0000-000000000000, documents_count: 1, allow_index_creation: true }}

View File

@ -5,7 +5,7 @@ snapshot_kind: text
### Autobatching Enabled = true ### Autobatching Enabled = true
### Processing batch Some(0): ### Processing batch Some(0):
[0,] [0,]
{uid: 0, details: {"receivedDocuments":2,"indexedDocuments":null}, stats: {"totalNbTasks":2,"status":{"enqueued":2},"types":{"documentAdditionOrUpdate":2},"indexUids":{"doggos":2}}, } {uid: 0, details: {"receivedDocuments":1,"indexedDocuments":null}, stats: {"totalNbTasks":1,"status":{"processing":1},"types":{"documentAdditionOrUpdate":1},"indexUids":{"doggos":1}}, }
---------------------------------------------------------------------- ----------------------------------------------------------------------
### All Tasks: ### All Tasks:
0 {uid: 0, status: enqueued, details: { received_documents: 1, indexed_documents: None }, kind: DocumentAdditionOrUpdate { index_uid: "doggos", primary_key: Some("id"), method: ReplaceDocuments, content_file: 00000000-0000-0000-0000-000000000000, documents_count: 1, allow_index_creation: true }} 0 {uid: 0, status: enqueued, details: { received_documents: 1, indexed_documents: None }, kind: DocumentAdditionOrUpdate { index_uid: "doggos", primary_key: Some("id"), method: ReplaceDocuments, content_file: 00000000-0000-0000-0000-000000000000, documents_count: 1, allow_index_creation: true }}

View File

@ -5,7 +5,7 @@ snapshot_kind: text
### Autobatching Enabled = true ### Autobatching Enabled = true
### Processing batch Some(0): ### Processing batch Some(0):
[0,] [0,]
{uid: 0, details: {"receivedDocuments":2,"indexedDocuments":null}, stats: {"totalNbTasks":2,"status":{"enqueued":2},"types":{"documentAdditionOrUpdate":2},"indexUids":{"doggos":2}}, } {uid: 0, details: {"receivedDocuments":1,"indexedDocuments":null}, stats: {"totalNbTasks":1,"status":{"processing":1},"types":{"documentAdditionOrUpdate":1},"indexUids":{"doggos":1}}, }
---------------------------------------------------------------------- ----------------------------------------------------------------------
### All Tasks: ### All Tasks:
0 {uid: 0, status: enqueued, details: { received_documents: 1, indexed_documents: None }, kind: DocumentAdditionOrUpdate { index_uid: "doggos", primary_key: Some("id"), method: ReplaceDocuments, content_file: 00000000-0000-0000-0000-000000000000, documents_count: 1, allow_index_creation: true }} 0 {uid: 0, status: enqueued, details: { received_documents: 1, indexed_documents: None }, kind: DocumentAdditionOrUpdate { index_uid: "doggos", primary_key: Some("id"), method: ReplaceDocuments, content_file: 00000000-0000-0000-0000-000000000000, documents_count: 1, allow_index_creation: true }}

View File

@ -9,8 +9,8 @@ source: crates/index-scheduler/src/lib.rs
0 {uid: 0, batch_uid: 0, status: succeeded, details: { settings: Settings { displayed_attributes: WildcardSetting(NotSet), searchable_attributes: WildcardSetting(NotSet), filterable_attributes: Set({"catto"}), sortable_attributes: NotSet, ranking_rules: NotSet, stop_words: NotSet, non_separator_tokens: NotSet, separator_tokens: NotSet, dictionary: NotSet, synonyms: NotSet, distinct_attribute: NotSet, proximity_precision: NotSet, typo_tolerance: NotSet, faceting: NotSet, pagination: NotSet, embedders: NotSet, search_cutoff_ms: NotSet, localized_attributes: NotSet, facet_search: NotSet, prefix_search: NotSet, _kind: PhantomData<meilisearch_types::settings::Unchecked> } }, kind: SettingsUpdate { index_uid: "doggos", new_settings: Settings { displayed_attributes: WildcardSetting(NotSet), searchable_attributes: WildcardSetting(NotSet), filterable_attributes: Set({"catto"}), sortable_attributes: NotSet, ranking_rules: NotSet, stop_words: NotSet, non_separator_tokens: NotSet, separator_tokens: NotSet, dictionary: NotSet, synonyms: NotSet, distinct_attribute: NotSet, proximity_precision: NotSet, typo_tolerance: NotSet, faceting: NotSet, pagination: NotSet, embedders: NotSet, search_cutoff_ms: NotSet, localized_attributes: NotSet, facet_search: NotSet, prefix_search: NotSet, _kind: PhantomData<meilisearch_types::settings::Unchecked> }, is_deletion: false, allow_index_creation: true }} 0 {uid: 0, batch_uid: 0, status: succeeded, details: { settings: Settings { displayed_attributes: WildcardSetting(NotSet), searchable_attributes: WildcardSetting(NotSet), filterable_attributes: Set({"catto"}), sortable_attributes: NotSet, ranking_rules: NotSet, stop_words: NotSet, non_separator_tokens: NotSet, separator_tokens: NotSet, dictionary: NotSet, synonyms: NotSet, distinct_attribute: NotSet, proximity_precision: NotSet, typo_tolerance: NotSet, faceting: NotSet, pagination: NotSet, embedders: NotSet, search_cutoff_ms: NotSet, localized_attributes: NotSet, facet_search: NotSet, prefix_search: NotSet, _kind: PhantomData<meilisearch_types::settings::Unchecked> } }, kind: SettingsUpdate { index_uid: "doggos", new_settings: Settings { displayed_attributes: WildcardSetting(NotSet), searchable_attributes: WildcardSetting(NotSet), filterable_attributes: Set({"catto"}), sortable_attributes: NotSet, ranking_rules: NotSet, stop_words: NotSet, non_separator_tokens: NotSet, separator_tokens: NotSet, dictionary: NotSet, synonyms: NotSet, distinct_attribute: NotSet, proximity_precision: NotSet, typo_tolerance: NotSet, faceting: NotSet, pagination: NotSet, embedders: NotSet, search_cutoff_ms: NotSet, localized_attributes: NotSet, facet_search: NotSet, prefix_search: NotSet, _kind: PhantomData<meilisearch_types::settings::Unchecked> }, is_deletion: false, allow_index_creation: true }}
1 {uid: 1, batch_uid: 1, status: succeeded, details: { received_documents: 3, indexed_documents: Some(3) }, kind: DocumentAdditionOrUpdate { index_uid: "doggos", primary_key: Some("id"), method: ReplaceDocuments, content_file: 00000000-0000-0000-0000-000000000000, documents_count: 3, allow_index_creation: true }} 1 {uid: 1, batch_uid: 1, status: succeeded, details: { received_documents: 3, indexed_documents: Some(3) }, kind: DocumentAdditionOrUpdate { index_uid: "doggos", primary_key: Some("id"), method: ReplaceDocuments, content_file: 00000000-0000-0000-0000-000000000000, documents_count: 3, allow_index_creation: true }}
2 {uid: 2, batch_uid: 2, status: succeeded, details: { received_document_ids: 1, deleted_documents: Some(1) }, kind: DocumentDeletion { index_uid: "doggos", documents_ids: ["1"] }} 2 {uid: 2, batch_uid: 2, status: succeeded, details: { received_document_ids: 1, deleted_documents: Some(1) }, kind: DocumentDeletion { index_uid: "doggos", documents_ids: ["1"] }}
3 {uid: 3, batch_uid: 2, status: failed, error: ResponseError { code: 200, message: "Invalid type for filter subexpression: expected: String, Array, found: true.", error_code: "invalid_document_filter", error_type: "invalid_request", error_link: "https://docs.meilisearch.com/errors#invalid_document_filter" }, details: { original_filter: true, deleted_documents: Some(0) }, kind: DocumentDeletionByFilter { index_uid: "doggos", filter_expr: Bool(true) }} 3 {uid: 3, batch_uid: 2, status: failed, error: ResponseError { code: 200, message: "Index `doggos`: Invalid type for filter subexpression: expected: String, Array, found: true.", error_code: "invalid_document_filter", error_type: "invalid_request", error_link: "https://docs.meilisearch.com/errors#invalid_document_filter" }, details: { original_filter: true, deleted_documents: Some(0) }, kind: DocumentDeletionByFilter { index_uid: "doggos", filter_expr: Bool(true) }}
4 {uid: 4, batch_uid: 2, status: failed, error: ResponseError { code: 200, message: "Attribute `id` is not filterable. Available filterable attributes are: `catto`.\n1:3 id = 2", error_code: "invalid_document_filter", error_type: "invalid_request", error_link: "https://docs.meilisearch.com/errors#invalid_document_filter" }, details: { original_filter: "id = 2", deleted_documents: Some(0) }, kind: DocumentDeletionByFilter { index_uid: "doggos", filter_expr: String("id = 2") }} 4 {uid: 4, batch_uid: 2, status: failed, error: ResponseError { code: 200, message: "Index `doggos`: Attribute `id` is not filterable. Available filterable attributes are: `catto`.\n1:3 id = 2", error_code: "invalid_document_filter", error_type: "invalid_request", error_link: "https://docs.meilisearch.com/errors#invalid_document_filter" }, details: { original_filter: "id = 2", deleted_documents: Some(0) }, kind: DocumentDeletionByFilter { index_uid: "doggos", filter_expr: String("id = 2") }}
5 {uid: 5, batch_uid: 2, status: succeeded, details: { original_filter: "catto EXISTS", deleted_documents: Some(1) }, kind: DocumentDeletionByFilter { index_uid: "doggos", filter_expr: String("catto EXISTS") }} 5 {uid: 5, batch_uid: 2, status: succeeded, details: { original_filter: "catto EXISTS", deleted_documents: Some(1) }, kind: DocumentDeletionByFilter { index_uid: "doggos", filter_expr: String("catto EXISTS") }}
---------------------------------------------------------------------- ----------------------------------------------------------------------
### Status: ### Status:

View File

@ -5,7 +5,7 @@ snapshot_kind: text
### Autobatching Enabled = true ### Autobatching Enabled = true
### Processing batch Some(0): ### Processing batch Some(0):
[0,] [0,]
{uid: 0, details: {"receivedDocuments":2,"indexedDocuments":null}, stats: {"totalNbTasks":2,"status":{"enqueued":2},"types":{"documentAdditionOrUpdate":2},"indexUids":{"doggos":2}}, } {uid: 0, details: {"receivedDocuments":1,"indexedDocuments":null}, stats: {"totalNbTasks":1,"status":{"processing":1},"types":{"documentAdditionOrUpdate":1},"indexUids":{"doggos":1}}, }
---------------------------------------------------------------------- ----------------------------------------------------------------------
### All Tasks: ### All Tasks:
0 {uid: 0, status: enqueued, details: { received_documents: 1, indexed_documents: None }, kind: DocumentAdditionOrUpdate { index_uid: "doggos", primary_key: Some("id"), method: ReplaceDocuments, content_file: 00000000-0000-0000-0000-000000000000, documents_count: 1, allow_index_creation: true }} 0 {uid: 0, status: enqueued, details: { received_documents: 1, indexed_documents: None }, kind: DocumentAdditionOrUpdate { index_uid: "doggos", primary_key: Some("id"), method: ReplaceDocuments, content_file: 00000000-0000-0000-0000-000000000000, documents_count: 1, allow_index_creation: true }}

View File

@ -5,7 +5,7 @@ snapshot_kind: text
### Autobatching Enabled = true ### Autobatching Enabled = true
### Processing batch Some(0): ### Processing batch Some(0):
[0,] [0,]
{uid: 0, details: {"receivedDocuments":2,"indexedDocuments":null}, stats: {"totalNbTasks":2,"status":{"enqueued":2},"types":{"documentAdditionOrUpdate":2},"indexUids":{"doggos":2}}, } {uid: 0, details: {"receivedDocuments":1,"indexedDocuments":null}, stats: {"totalNbTasks":1,"status":{"processing":1},"types":{"documentAdditionOrUpdate":1},"indexUids":{"doggos":1}}, }
---------------------------------------------------------------------- ----------------------------------------------------------------------
### All Tasks: ### All Tasks:
0 {uid: 0, status: enqueued, details: { received_documents: 1, indexed_documents: None }, kind: DocumentAdditionOrUpdate { index_uid: "doggos", primary_key: Some("id"), method: ReplaceDocuments, content_file: 00000000-0000-0000-0000-000000000000, documents_count: 1, allow_index_creation: true }} 0 {uid: 0, status: enqueued, details: { received_documents: 1, indexed_documents: None }, kind: DocumentAdditionOrUpdate { index_uid: "doggos", primary_key: Some("id"), method: ReplaceDocuments, content_file: 00000000-0000-0000-0000-000000000000, documents_count: 1, allow_index_creation: true }}

View File

@ -5,7 +5,7 @@ snapshot_kind: text
### Autobatching Enabled = true ### Autobatching Enabled = true
### Processing batch Some(0): ### Processing batch Some(0):
[0,] [0,]
{uid: 0, details: {"primaryKey":"id"}, stats: {"totalNbTasks":2,"status":{"enqueued":2},"types":{"indexCreation":2},"indexUids":{"index_a":2}}, } {uid: 0, details: {"primaryKey":"id"}, stats: {"totalNbTasks":1,"status":{"processing":1},"types":{"indexCreation":1},"indexUids":{"index_a":1}}, }
---------------------------------------------------------------------- ----------------------------------------------------------------------
### All Tasks: ### All Tasks:
0 {uid: 0, status: enqueued, details: { primary_key: Some("id") }, kind: IndexCreation { index_uid: "index_a", primary_key: Some("id") }} 0 {uid: 0, status: enqueued, details: { primary_key: Some("id") }, kind: IndexCreation { index_uid: "index_a", primary_key: Some("id") }}

View File

@ -5,7 +5,7 @@ snapshot_kind: text
### Autobatching Enabled = true ### Autobatching Enabled = true
### Processing batch Some(0): ### Processing batch Some(0):
[0,] [0,]
{uid: 0, details: {"primaryKey":"id"}, stats: {"totalNbTasks":2,"status":{"enqueued":2},"types":{"indexCreation":2},"indexUids":{"index_a":2}}, } {uid: 0, details: {"primaryKey":"id"}, stats: {"totalNbTasks":1,"status":{"processing":1},"types":{"indexCreation":1},"indexUids":{"index_a":1}}, }
---------------------------------------------------------------------- ----------------------------------------------------------------------
### All Tasks: ### All Tasks:
0 {uid: 0, status: enqueued, details: { primary_key: Some("id") }, kind: IndexCreation { index_uid: "index_a", primary_key: Some("id") }} 0 {uid: 0, status: enqueued, details: { primary_key: Some("id") }, kind: IndexCreation { index_uid: "index_a", primary_key: Some("id") }}

View File

@ -5,7 +5,7 @@ snapshot_kind: text
### Autobatching Enabled = true ### Autobatching Enabled = true
### Processing batch Some(0): ### Processing batch Some(0):
[0,] [0,]
{uid: 0, details: {"primaryKey":"id"}, stats: {"totalNbTasks":2,"status":{"enqueued":2},"types":{"indexCreation":2},"indexUids":{"index_a":2}}, } {uid: 0, details: {"primaryKey":"id"}, stats: {"totalNbTasks":1,"status":{"processing":1},"types":{"indexCreation":1},"indexUids":{"index_a":1}}, }
---------------------------------------------------------------------- ----------------------------------------------------------------------
### All Tasks: ### All Tasks:
0 {uid: 0, status: enqueued, details: { primary_key: Some("id") }, kind: IndexCreation { index_uid: "index_a", primary_key: Some("id") }} 0 {uid: 0, status: enqueued, details: { primary_key: Some("id") }, kind: IndexCreation { index_uid: "index_a", primary_key: Some("id") }}

View File

@ -5,7 +5,7 @@ snapshot_kind: text
### Autobatching Enabled = true ### Autobatching Enabled = true
### Processing batch Some(1): ### Processing batch Some(1):
[1,] [1,]
{uid: 1, details: {"primaryKey":"sheep"}, stats: {"totalNbTasks":2,"status":{"enqueued":2},"types":{"indexCreation":2},"indexUids":{"doggo":2}}, } {uid: 1, details: {"primaryKey":"sheep"}, stats: {"totalNbTasks":1,"status":{"processing":1},"types":{"indexCreation":1},"indexUids":{"doggo":1}}, }
---------------------------------------------------------------------- ----------------------------------------------------------------------
### All Tasks: ### All Tasks:
0 {uid: 0, batch_uid: 0, status: succeeded, details: { primary_key: Some("mouse") }, kind: IndexCreation { index_uid: "catto", primary_key: Some("mouse") }} 0 {uid: 0, batch_uid: 0, status: succeeded, details: { primary_key: Some("mouse") }, kind: IndexCreation { index_uid: "catto", primary_key: Some("mouse") }}

View File

@ -5,7 +5,7 @@ snapshot_kind: text
### Autobatching Enabled = true ### Autobatching Enabled = true
### Processing batch Some(0): ### Processing batch Some(0):
[3,] [3,]
{uid: 0, details: {"matchedTasks":2,"deletedTasks":null,"originalFilter":"test_query"}, stats: {"totalNbTasks":1,"status":{"enqueued":1},"types":{"taskDeletion":1},"indexUids":{}}, } {uid: 0, details: {"matchedTasks":2,"deletedTasks":null,"originalFilter":"test_query"}, stats: {"totalNbTasks":1,"status":{"processing":1},"types":{"taskDeletion":1},"indexUids":{}}, }
---------------------------------------------------------------------- ----------------------------------------------------------------------
### All Tasks: ### All Tasks:
0 {uid: 0, status: enqueued, details: { primary_key: Some("mouse") }, kind: IndexCreation { index_uid: "catto", primary_key: Some("mouse") }} 0 {uid: 0, status: enqueued, details: { primary_key: Some("mouse") }, kind: IndexCreation { index_uid: "catto", primary_key: Some("mouse") }}

View File

@ -67,7 +67,7 @@ impl ProcessingBatch {
task.batch_uid = Some(self.uid); task.batch_uid = Some(self.uid);
// We don't store the statuses in the map since they're all enqueued but we must // We don't store the statuses in the map since they're all enqueued but we must
// still store them in the stats since that can be displayed. // still store them in the stats since that can be displayed.
*self.stats.status.entry(task.status).or_default() += 1; *self.stats.status.entry(Status::Processing).or_default() += 1;
self.kinds.insert(task.kind.as_kind()); self.kinds.insert(task.kind.as_kind());
*self.stats.types.entry(task.kind.as_kind()).or_default() += 1; *self.stats.types.entry(task.kind.as_kind()).or_default() += 1;
@ -134,6 +134,7 @@ impl ProcessingBatch {
pub fn to_batch(&self) -> Batch { pub fn to_batch(&self) -> Batch {
Batch { Batch {
uid: self.uid, uid: self.uid,
progress: None,
details: self.details.clone(), details: self.details.clone(),
stats: self.stats.clone(), stats: self.stats.clone(),
started_at: self.started_at, started_at: self.started_at,
@ -187,6 +188,7 @@ impl IndexScheduler {
&batch.uid, &batch.uid,
&Batch { &Batch {
uid: batch.uid, uid: batch.uid,
progress: None,
details: batch.details, details: batch.details,
stats: batch.stats, stats: batch.stats,
started_at: batch.started_at, started_at: batch.started_at,
@ -273,7 +275,9 @@ impl IndexScheduler {
.into_iter() .into_iter()
.map(|batch_id| { .map(|batch_id| {
if Some(batch_id) == processing.batch.as_ref().map(|batch| batch.uid) { if Some(batch_id) == processing.batch.as_ref().map(|batch| batch.uid) {
Ok(processing.batch.as_ref().unwrap().to_batch()) let mut batch = processing.batch.as_ref().unwrap().to_batch();
batch.progress = processing.get_progress_view();
Ok(batch)
} else { } else {
self.get_batch(rtxn, batch_id) self.get_batch(rtxn, batch_id)
.and_then(|task| task.ok_or(Error::CorruptedTaskQueue)) .and_then(|task| task.ok_or(Error::CorruptedTaskQueue))
@ -287,7 +291,10 @@ impl IndexScheduler {
debug_assert!(old_task != *task); debug_assert!(old_task != *task);
debug_assert_eq!(old_task.uid, task.uid); debug_assert_eq!(old_task.uid, task.uid);
debug_assert!(old_task.batch_uid.is_none() && task.batch_uid.is_some()); debug_assert!(
old_task.batch_uid.is_none() && task.batch_uid.is_some(),
"\n==> old: {old_task:?}\n==> new: {task:?}"
);
if old_task.status != task.status { if old_task.status != task.status {
self.update_status(wtxn, old_task.status, |bitmap| { self.update_status(wtxn, old_task.status, |bitmap| {

View File

@ -24,8 +24,9 @@ flate2 = "1.0.30"
fst = "0.4.7" fst = "0.4.7"
memmap2 = "0.9.4" memmap2 = "0.9.4"
milli = { path = "../milli" } milli = { path = "../milli" }
raw-collections = { git = "https://github.com/meilisearch/raw-collections.git", version = "0.1.0" } bumparaw-collections = "0.1.2"
roaring = { version = "0.10.7", features = ["serde"] } roaring = { version = "0.10.7", features = ["serde"] }
rustc-hash = "2.1.0"
serde = { version = "1.0.204", features = ["derive"] } serde = { version = "1.0.204", features = ["derive"] }
serde-cs = "0.2.4" serde-cs = "0.2.4"
serde_json = "1.0.120" serde_json = "1.0.120"
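
The dependency swap above (`raw-collections` → `bumparaw-collections`, plus `rustc-hash`) surfaces later in `document_formats.rs` as a constructor that takes an explicit hasher. A small, self-contained sketch of that call, assuming the crate keeps the signature used in the diff:

```rust
use bumpalo::Bump;
use bumparaw_collections::RawMap;
use rustc_hash::FxBuildHasher;
use serde_json::value::RawValue;

fn main() {
    // Parse one top-level JSON object the way `read_json`/`read_ndjson` now do:
    // only the first level is materialized, inside a bump arena.
    let alloc = Bump::new();
    let raw: &RawValue =
        serde_json::from_str(r#"{"id": 1, "name": "kefir"}"#).expect("valid JSON");
    let map = RawMap::from_raw_value_and_hasher(raw, FxBuildHasher, &alloc)
        .expect("a JSON object at the top level");
    let _ = map;
}
```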

View File

@ -1,16 +1,16 @@
use milli::progress::ProgressView;
use serde::Serialize; use serde::Serialize;
use time::{Duration, OffsetDateTime}; use time::{Duration, OffsetDateTime};
use crate::{ use crate::batches::{Batch, BatchId, BatchStats};
batches::{Batch, BatchId, BatchStats}, use crate::task_view::DetailsView;
task_view::DetailsView, use crate::tasks::serialize_duration;
tasks::serialize_duration,
};
#[derive(Debug, Clone, Serialize)] #[derive(Debug, Clone, Serialize)]
#[serde(rename_all = "camelCase")] #[serde(rename_all = "camelCase")]
pub struct BatchView { pub struct BatchView {
pub uid: BatchId, pub uid: BatchId,
pub progress: Option<ProgressView>,
pub details: DetailsView, pub details: DetailsView,
pub stats: BatchStats, pub stats: BatchStats,
#[serde(serialize_with = "serialize_duration", default)] #[serde(serialize_with = "serialize_duration", default)]
@ -25,6 +25,7 @@ impl BatchView {
pub fn from_batch(batch: &Batch) -> Self { pub fn from_batch(batch: &Batch) -> Self {
Self { Self {
uid: batch.uid, uid: batch.uid,
progress: batch.progress.clone(),
details: batch.details.clone(), details: batch.details.clone(),
stats: batch.stats.clone(), stats: batch.stats.clone(),
duration: batch.finished_at.map(|finished_at| finished_at - batch.started_at), duration: batch.finished_at.map(|finished_at| finished_at - batch.started_at),

View File

@ -1,12 +1,11 @@
use std::collections::BTreeMap; use std::collections::BTreeMap;
use milli::progress::ProgressView;
use serde::{Deserialize, Serialize}; use serde::{Deserialize, Serialize};
use time::OffsetDateTime; use time::OffsetDateTime;
use crate::{ use crate::task_view::DetailsView;
task_view::DetailsView, use crate::tasks::{Kind, Status};
tasks::{Kind, Status},
};
pub type BatchId = u32; pub type BatchId = u32;
@ -15,6 +14,8 @@ pub type BatchId = u32;
pub struct Batch { pub struct Batch {
pub uid: BatchId, pub uid: BatchId,
#[serde(skip)]
pub progress: Option<ProgressView>,
pub details: DetailsView, pub details: DetailsView,
pub stats: BatchStats, pub stats: BatchStats,

View File

@ -4,10 +4,11 @@ use std::io::{self, BufWriter};
use std::marker::PhantomData; use std::marker::PhantomData;
use bumpalo::Bump; use bumpalo::Bump;
use bumparaw_collections::RawMap;
use memmap2::Mmap; use memmap2::Mmap;
use milli::documents::Error; use milli::documents::Error;
use milli::Object; use milli::Object;
use raw_collections::RawMap; use rustc_hash::FxBuildHasher;
use serde::de::{SeqAccess, Visitor}; use serde::de::{SeqAccess, Visitor};
use serde::{Deserialize, Deserializer}; use serde::{Deserialize, Deserializer};
use serde_json::error::Category; use serde_json::error::Category;
@ -220,7 +221,7 @@ pub fn read_json(input: &File, output: impl io::Write) -> Result<u64> {
let mut deserializer = serde_json::Deserializer::from_slice(&input); let mut deserializer = serde_json::Deserializer::from_slice(&input);
let res = array_each(&mut deserializer, |obj: &RawValue| { let res = array_each(&mut deserializer, |obj: &RawValue| {
doc_alloc.reset(); doc_alloc.reset();
let map = RawMap::from_raw_value(obj, &doc_alloc)?; let map = RawMap::from_raw_value_and_hasher(obj, FxBuildHasher, &doc_alloc)?;
to_writer(&mut out, &map) to_writer(&mut out, &map)
}); });
let count = match res { let count = match res {
@ -250,26 +251,25 @@ pub fn read_json(input: &File, output: impl io::Write) -> Result<u64> {
} }
} }
/// Reads NDJSON from file and write it in NDJSON in a file checking it along the way. /// Reads NDJSON from file and checks it.
pub fn read_ndjson(input: &File, output: impl io::Write) -> Result<u64> { pub fn read_ndjson(input: &File) -> Result<u64> {
// We memory map to be able to deserialize into a RawMap that // We memory map to be able to deserialize into a RawMap that
// does not allocate when possible and only materialize the first/top level. // does not allocate when possible and only materialize the first/top level.
let input = unsafe { Mmap::map(input).map_err(DocumentFormatError::Io)? }; let input = unsafe { Mmap::map(input).map_err(DocumentFormatError::Io)? };
let mut output = BufWriter::new(output);
let mut bump = Bump::with_capacity(1024 * 1024); let mut bump = Bump::with_capacity(1024 * 1024);
let mut count = 0; let mut count = 0;
for result in serde_json::Deserializer::from_slice(&input).into_iter() { for result in serde_json::Deserializer::from_slice(&input).into_iter() {
bump.reset(); bump.reset();
count += 1; match result {
result Ok(raw) => {
.and_then(|raw: &RawValue| {
// try to deserialize as a map // try to deserialize as a map
let map = RawMap::from_raw_value(raw, &bump)?; RawMap::from_raw_value_and_hasher(raw, FxBuildHasher, &bump)
to_writer(&mut output, &map) .map_err(|e| DocumentFormatError::from((PayloadType::Ndjson, e)))?;
}) count += 1;
.map_err(|e| DocumentFormatError::from((PayloadType::Ndjson, e)))?; }
Err(e) => return Err(DocumentFormatError::from((PayloadType::Ndjson, e))),
}
} }
Ok(count) Ok(count)
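
After this change `read_ndjson` is validation-only: it memory-maps the input, checks every document through the bump-allocated `RawMap`, and returns the document count instead of rewriting the payload to an output writer. A hedged sketch of the new calling convention (the helper name is ours); the route change further down calls it the same way inside `spawn_blocking`:

```rust
use std::fs::File;

use meilisearch_types::document_formats::{read_ndjson, DocumentFormatError};

// Hypothetical helper: count and validate the documents of an NDJSON payload
// already sitting on disk, without copying it anywhere.
fn count_ndjson_documents(file: &File) -> Result<u64, DocumentFormatError> {
    read_ndjson(file)
}
```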

View File

@ -279,6 +279,7 @@ InvalidSearchPage , InvalidRequest , BAD_REQUEST ;
InvalidSearchQ , InvalidRequest , BAD_REQUEST ; InvalidSearchQ , InvalidRequest , BAD_REQUEST ;
InvalidFacetSearchQuery , InvalidRequest , BAD_REQUEST ; InvalidFacetSearchQuery , InvalidRequest , BAD_REQUEST ;
InvalidFacetSearchName , InvalidRequest , BAD_REQUEST ; InvalidFacetSearchName , InvalidRequest , BAD_REQUEST ;
FacetSearchDisabled , InvalidRequest , BAD_REQUEST ;
InvalidSearchVector , InvalidRequest , BAD_REQUEST ; InvalidSearchVector , InvalidRequest , BAD_REQUEST ;
InvalidSearchShowMatchesPosition , InvalidRequest , BAD_REQUEST ; InvalidSearchShowMatchesPosition , InvalidRequest , BAD_REQUEST ;
InvalidSearchShowRankingScore , InvalidRequest , BAD_REQUEST ; InvalidSearchShowRankingScore , InvalidRequest , BAD_REQUEST ;
@ -549,7 +550,7 @@ impl fmt::Display for deserr_codes::InvalidSimilarId {
"the value of `id` is invalid. \ "the value of `id` is invalid. \
A document identifier can be of type integer or string, \ A document identifier can be of type integer or string, \
only composed of alphanumeric characters (a-z A-Z 0-9), hyphens (-) and underscores (_), \ only composed of alphanumeric characters (a-z A-Z 0-9), hyphens (-) and underscores (_), \
and can not be more than 512 bytes." and can not be more than 511 bytes."
) )
} }
} }

View File

@ -4,7 +4,6 @@ use std::fmt::{Display, Write};
use std::str::FromStr; use std::str::FromStr;
use enum_iterator::Sequence; use enum_iterator::Sequence;
use milli::update::new::indexer::document_changes::Progress;
use milli::update::IndexDocumentsMethod; use milli::update::IndexDocumentsMethod;
use milli::Object; use milli::Object;
use roaring::RoaringBitmap; use roaring::RoaringBitmap;
@ -41,62 +40,6 @@ pub struct Task {
pub kind: KindWithContent, pub kind: KindWithContent,
} }
#[derive(Clone, Copy, Debug, PartialEq, Eq, Serialize, Deserialize)]
#[serde(rename_all = "camelCase")]
pub struct TaskProgress {
pub current_step: &'static str,
pub finished_steps: u16,
pub total_steps: u16,
pub finished_substeps: Option<u32>,
pub total_substeps: Option<u32>,
}
impl Default for TaskProgress {
fn default() -> Self {
Self::new()
}
}
impl TaskProgress {
pub fn new() -> Self {
Self {
current_step: "start",
finished_steps: 0,
total_steps: 1,
finished_substeps: None,
total_substeps: None,
}
}
pub fn update(&mut self, progress: Progress) -> TaskProgress {
if self.finished_steps > progress.finished_steps {
return *self;
}
if self.current_step != progress.step_name {
self.current_step = progress.step_name
}
self.total_steps = progress.total_steps;
if self.finished_steps < progress.finished_steps {
self.finished_substeps = None;
self.total_substeps = None;
}
self.finished_steps = progress.finished_steps;
if let Some((finished_substeps, total_substeps)) = progress.finished_total_substep {
if let Some(task_finished_substeps) = self.finished_substeps {
if task_finished_substeps > finished_substeps {
return *self;
}
}
self.finished_substeps = Some(finished_substeps);
self.total_substeps = Some(total_substeps);
}
*self
}
}
impl Task { impl Task {
pub fn index_uid(&self) -> Option<&str> { pub fn index_uid(&self) -> Option<&str> {
use KindWithContent::*; use KindWithContent::*;

View File

@ -4,6 +4,7 @@ use byte_unit::{Byte, UnitType};
use meilisearch_types::document_formats::{DocumentFormatError, PayloadType}; use meilisearch_types::document_formats::{DocumentFormatError, PayloadType};
use meilisearch_types::error::{Code, ErrorCode, ResponseError}; use meilisearch_types::error::{Code, ErrorCode, ResponseError};
use meilisearch_types::index_uid::{IndexUid, IndexUidFormatError}; use meilisearch_types::index_uid::{IndexUid, IndexUidFormatError};
use meilisearch_types::milli;
use meilisearch_types::milli::OrderBy; use meilisearch_types::milli::OrderBy;
use serde_json::Value; use serde_json::Value;
use tokio::task::JoinError; use tokio::task::JoinError;
@ -62,8 +63,11 @@ pub enum MeilisearchHttpError {
HeedError(#[from] meilisearch_types::heed::Error), HeedError(#[from] meilisearch_types::heed::Error),
#[error(transparent)] #[error(transparent)]
IndexScheduler(#[from] index_scheduler::Error), IndexScheduler(#[from] index_scheduler::Error),
#[error(transparent)] #[error("{}", match .index_name {
Milli(#[from] meilisearch_types::milli::Error), Some(name) if !name.is_empty() => format!("Index `{}`: {error}", name),
_ => format!("{error}")
})]
Milli { error: milli::Error, index_name: Option<String> },
#[error(transparent)] #[error(transparent)]
Payload(#[from] PayloadError), Payload(#[from] PayloadError),
#[error(transparent)] #[error(transparent)]
@ -76,6 +80,12 @@ pub enum MeilisearchHttpError {
MissingSearchHybrid, MissingSearchHybrid,
} }
impl MeilisearchHttpError {
pub(crate) fn from_milli(error: milli::Error, index_name: Option<String>) -> Self {
Self::Milli { error, index_name }
}
}
impl ErrorCode for MeilisearchHttpError { impl ErrorCode for MeilisearchHttpError {
fn error_code(&self) -> Code { fn error_code(&self) -> Code {
match self { match self {
@ -95,7 +105,7 @@ impl ErrorCode for MeilisearchHttpError {
MeilisearchHttpError::SerdeJson(_) => Code::Internal, MeilisearchHttpError::SerdeJson(_) => Code::Internal,
MeilisearchHttpError::HeedError(_) => Code::Internal, MeilisearchHttpError::HeedError(_) => Code::Internal,
MeilisearchHttpError::IndexScheduler(e) => e.error_code(), MeilisearchHttpError::IndexScheduler(e) => e.error_code(),
MeilisearchHttpError::Milli(e) => e.error_code(), MeilisearchHttpError::Milli { error, .. } => error.error_code(),
MeilisearchHttpError::Payload(e) => e.error_code(), MeilisearchHttpError::Payload(e) => e.error_code(),
MeilisearchHttpError::FileStore(_) => Code::Internal, MeilisearchHttpError::FileStore(_) => Code::Internal,
MeilisearchHttpError::DocumentFormat(e) => e.error_code(), MeilisearchHttpError::DocumentFormat(e) => e.error_code(),
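
With `Milli` now a struct variant carrying an optional `index_name`, every milli error surfaced over HTTP can name the index it came from, which is what the updated snapshots earlier reflect (messages now start with "Index `doggos`: ..."). A hedged sketch of a unit test, ours rather than part of the PR, that would pin down that prefix; it assumes it lives in the same module so the `pub(crate)` constructor is in scope:

```rust
#[cfg(test)]
mod index_prefix_tests {
    use super::*;

    #[test]
    fn milli_error_is_prefixed_with_the_index_uid() {
        let err = MeilisearchHttpError::from_milli(
            milli::Error::UserError(milli::UserError::MaxDatabaseSizeReached),
            Some("doggos".to_string()),
        );
        // Only the prefix added by the new `Display` arm is asserted; the
        // exact wording of the inner error is left to milli.
        assert!(err.to_string().starts_with("Index `doggos`: "));
    }
}
```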

View File

@@ -395,6 +395,7 @@ fn import_dump(
     for index_reader in dump_reader.indexes()? {
         let mut index_reader = index_reader?;
         let metadata = index_reader.metadata();
+        let uid = metadata.uid.clone();
         tracing::info!("Importing index `{}`.", metadata.uid);
 
         let date = Some((metadata.created_at, metadata.updated_at));
@@ -432,7 +433,7 @@ fn import_dump(
         let reader = DocumentsBatchReader::from_reader(reader)?;
 
         let embedder_configs = index.embedding_configs(&wtxn)?;
-        let embedders = index_scheduler.embedders(embedder_configs)?;
+        let embedders = index_scheduler.embedders(uid, embedder_configs)?;
 
         let builder = milli::update::IndexDocuments::new(
             &mut wtxn,


@@ -129,6 +129,11 @@ async fn try_main() -> anyhow::Result<()> {
 
     print_launch_resume(&opt, analytics.clone(), config_read_from);
 
+    tokio::spawn(async move {
+        tokio::signal::ctrl_c().await.unwrap();
+        std::process::exit(130);
+    });
+
     run_http(index_scheduler, auth_controller, opt, log_handle, Arc::new(analytics)).await?;
 
     Ok(())
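
Editor's note: exit status 130 follows the 128 + signal-number convention for SIGINT (2), so shells report the process as interrupted by Ctrl-C. A standalone sketch of the same handler, assuming tokio with the `rt`, `macros`, `signal`, and `time` features enabled:

#[tokio::main]
async fn main() {
    // Detach a task that turns Ctrl-C into an immediate exit with the conventional 130 status.
    tokio::spawn(async move {
        tokio::signal::ctrl_c().await.unwrap();
        std::process::exit(130);
    });

    // The rest of the program keeps running; sleeping stands in for the HTTP server here.
    tokio::time::sleep(std::time::Duration::from_secs(3600)).await;
}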


@@ -1,18 +1,18 @@
-use actix_web::{
-    web::{self, Data},
-    HttpResponse,
-};
+use actix_web::web::{self, Data};
+use actix_web::HttpResponse;
 use deserr::actix_web::AwebQueryParameter;
 use index_scheduler::{IndexScheduler, Query};
-use meilisearch_types::{
-    batch_view::BatchView, batches::BatchId, deserr::DeserrQueryParamError, error::ResponseError,
-    keys::actions,
-};
+use meilisearch_types::batch_view::BatchView;
+use meilisearch_types::batches::BatchId;
+use meilisearch_types::deserr::DeserrQueryParamError;
+use meilisearch_types::error::ResponseError;
+use meilisearch_types::keys::actions;
 use serde::Serialize;
 
-use crate::extractors::{authentication::GuardedData, sequential_extractor::SeqHandler};
-
-use super::{tasks::TasksFilterQuery, ActionPolicy};
+use super::tasks::TasksFilterQuery;
+use super::ActionPolicy;
+use crate::extractors::authentication::GuardedData;
+use crate::extractors::sequential_extractor::SeqHandler;
 
 pub fn configure(cfg: &mut web::ServiceConfig) {
     cfg.service(web::resource("").route(web::get().to(SeqHandler(get_batches))))


@@ -1,5 +1,5 @@
 use std::collections::HashSet;
-use std::io::ErrorKind;
+use std::io::{ErrorKind, Seek as _};
 use std::marker::PhantomData;
 
 use actix_web::http::header::CONTENT_TYPE;
@@ -572,7 +572,7 @@ async fn document_addition(
     index_uid: IndexUid,
     primary_key: Option<String>,
     csv_delimiter: Option<u8>,
-    mut body: Payload,
+    body: Payload,
     method: IndexDocumentsMethod,
     task_id: Option<TaskId>,
     dry_run: bool,
@@ -609,54 +609,60 @@ async fn document_addition(
     };
 
     let (uuid, mut update_file) = index_scheduler.create_update_file(dry_run)?;
 
-    let temp_file = match tempfile() {
-        Ok(file) => file,
-        Err(e) => return Err(MeilisearchHttpError::Payload(ReceivePayload(Box::new(e)))),
-    };
-
-    let async_file = File::from_std(temp_file);
-    let mut buffer = BufWriter::new(async_file);
-
-    let mut buffer_write_size: usize = 0;
-    while let Some(result) = body.next().await {
-        let byte = result?;
-
-        if byte.is_empty() && buffer_write_size == 0 {
-            return Err(MeilisearchHttpError::MissingPayload(format));
-        }
-
-        match buffer.write_all(&byte).await {
-            Ok(()) => buffer_write_size += 1,
-            Err(e) => return Err(MeilisearchHttpError::Payload(ReceivePayload(Box::new(e)))),
-        }
-    }
-
-    if let Err(e) = buffer.flush().await {
-        return Err(MeilisearchHttpError::Payload(ReceivePayload(Box::new(e))));
-    }
-
-    if buffer_write_size == 0 {
-        return Err(MeilisearchHttpError::MissingPayload(format));
-    }
-
-    if let Err(e) = buffer.seek(std::io::SeekFrom::Start(0)).await {
-        return Err(MeilisearchHttpError::Payload(ReceivePayload(Box::new(e))));
-    }
-
-    let read_file = buffer.into_inner().into_std().await;
-
-    let documents_count = tokio::task::spawn_blocking(move || {
-        let documents_count = match format {
-            PayloadType::Json => read_json(&read_file, &mut update_file)?,
-            PayloadType::Csv { delimiter } => read_csv(&read_file, &mut update_file, delimiter)?,
-            PayloadType::Ndjson => read_ndjson(&read_file, &mut update_file)?,
-        };
-        // we NEED to persist the file here because we moved the `update_file` in another task.
-        update_file.persist()?;
-        Ok(documents_count)
-    })
-    .await;
+    let documents_count = match format {
+        PayloadType::Ndjson => {
+            let (path, file) = update_file.into_parts();
+            let file = match file {
+                Some(file) => {
+                    let (file, path) = file.into_parts();
+                    let mut file = copy_body_to_file(file, body, format).await?;
+                    file.rewind().map_err(|e| {
+                        index_scheduler::Error::FileStore(file_store::Error::IoError(e))
+                    })?;
+                    Some(tempfile::NamedTempFile::from_parts(file, path))
+                }
+                None => None,
+            };
+
+            let documents_count = tokio::task::spawn_blocking(move || {
+                let documents_count = file.as_ref().map_or(Ok(0), |ntf| {
+                    read_ndjson(ntf.as_file()).map_err(MeilisearchHttpError::DocumentFormat)
+                })?;
+
+                let update_file = file_store::File::from_parts(path, file);
+                update_file.persist()?;
+
+                Ok(documents_count)
+            })
+            .await?;
+
+            Ok(documents_count)
+        }
+        PayloadType::Json | PayloadType::Csv { delimiter: _ } => {
+            let temp_file = match tempfile() {
+                Ok(file) => file,
+                Err(e) => return Err(MeilisearchHttpError::Payload(ReceivePayload(Box::new(e)))),
+            };
+
+            let read_file = copy_body_to_file(temp_file, body, format).await?;
+            tokio::task::spawn_blocking(move || {
+                let documents_count = match format {
+                    PayloadType::Json => read_json(&read_file, &mut update_file)?,
+                    PayloadType::Csv { delimiter } => {
+                        read_csv(&read_file, &mut update_file, delimiter)?
+                    }
+                    PayloadType::Ndjson => {
+                        unreachable!("We already wrote the user content into the update file")
+                    }
+                };
+                // we NEED to persist the file here because we moved the `update_file` in another task.
+                update_file.persist()?;
+                Ok(documents_count)
+            })
+            .await
+        }
+    };
 
     let documents_count = match documents_count {
         Ok(Ok(documents_count)) => documents_count,
         // in this case the file could not have been persisted.
@@ -703,6 +709,39 @@ async fn document_addition(
     Ok(task.into())
 }
 
+async fn copy_body_to_file(
+    output: std::fs::File,
+    mut body: Payload,
+    format: PayloadType,
+) -> Result<std::fs::File, MeilisearchHttpError> {
+    let async_file = File::from_std(output);
+    let mut buffer = BufWriter::new(async_file);
+    let mut buffer_write_size: usize = 0;
+    while let Some(result) = body.next().await {
+        let byte = result?;
+
+        if byte.is_empty() && buffer_write_size == 0 {
+            return Err(MeilisearchHttpError::MissingPayload(format));
+        }
+
+        match buffer.write_all(&byte).await {
+            Ok(()) => buffer_write_size += 1,
+            Err(e) => return Err(MeilisearchHttpError::Payload(ReceivePayload(Box::new(e)))),
+        }
+    }
+    if let Err(e) = buffer.flush().await {
+        return Err(MeilisearchHttpError::Payload(ReceivePayload(Box::new(e))));
+    }
+    if buffer_write_size == 0 {
+        return Err(MeilisearchHttpError::MissingPayload(format));
+    }
+    if let Err(e) = buffer.seek(std::io::SeekFrom::Start(0)).await {
+        return Err(MeilisearchHttpError::Payload(ReceivePayload(Box::new(e))));
+    }
+    let read_file = buffer.into_inner().into_std().await;
+    Ok(read_file)
+}
+
 pub async fn delete_documents_batch(
     index_scheduler: GuardedData<ActionPolicy<{ actions::DOCUMENTS_DELETE }>, Data<IndexScheduler>>,
     index_uid: web::Path<String>,
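
Editor's note: a minimal sketch of the flow the handler above relies on, i.e. write the streamed body to a file on the async side, rewind it, then hand it to `tokio::task::spawn_blocking` for the blocking parse. It assumes `tokio`, `tempfile`, and `anyhow` as dependencies, and counting lines stands in for `read_ndjson`.

use std::io::{Seek, Write};

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    // Pretend this content arrived over the wire as an NDJSON payload.
    let mut file = tempfile::tempfile()?;
    file.write_all(b"{\"id\": 1}\n{\"id\": 2}\n")?;
    // Rewind before handing the file to the parsing task, exactly like the handler does.
    file.rewind()?;

    // Any blocking parse belongs in spawn_blocking so it does not stall the async executor.
    let count = tokio::task::spawn_blocking(move || -> anyhow::Result<usize> {
        use std::io::{BufRead, BufReader};
        Ok(BufReader::new(file).lines().count())
    })
    .await??;

    assert_eq!(count, 2);
    Ok(())
}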


@ -185,7 +185,8 @@ pub async fn search(
let index = index_scheduler.index(&index_uid)?; let index = index_scheduler.index(&index_uid)?;
let features = index_scheduler.features(); let features = index_scheduler.features();
let search_kind = search_kind(&search_query, &index_scheduler, &index, features)?; let search_kind =
search_kind(&search_query, &index_scheduler, index_uid.to_string(), &index, features)?;
let permit = search_queue.try_get_search_permit().await?; let permit = search_queue.try_get_search_permit().await?;
let search_result = tokio::task::spawn_blocking(move || { let search_result = tokio::task::spawn_blocking(move || {
perform_facet_search( perform_facet_search(


@@ -5,7 +5,7 @@ use actix_web::web::Data;
 use actix_web::{web, HttpRequest, HttpResponse};
 use deserr::actix_web::{AwebJson, AwebQueryParameter};
 use deserr::{DeserializeError, Deserr, ValuePointerRef};
-use index_scheduler::IndexScheduler;
+use index_scheduler::{Error, IndexScheduler};
 use meilisearch_types::deserr::query_params::Param;
 use meilisearch_types::deserr::{immutable_field_error, DeserrJsonError, DeserrQueryParamError};
 use meilisearch_types::error::deserr_codes::*;
@@ -107,7 +107,10 @@ pub async fn list_indexes(
         if !filters.is_index_authorized(uid) {
             return Ok(None);
         }
-        Ok(Some(IndexView::new(uid.to_string(), index)?))
+        Ok(Some(
+            IndexView::new(uid.to_string(), index)
+                .map_err(|e| Error::from_milli(e, Some(uid.to_string())))?,
+        ))
     })?;
     // Won't cause to open all indexes because IndexView doesn't keep the `Index` opened.
     let indexes: Vec<IndexView> = indexes.into_iter().flatten().collect();


@ -243,11 +243,19 @@ pub async fn search_with_url_query(
let index = index_scheduler.index(&index_uid)?; let index = index_scheduler.index(&index_uid)?;
let features = index_scheduler.features(); let features = index_scheduler.features();
let search_kind = search_kind(&query, index_scheduler.get_ref(), &index, features)?; let search_kind =
search_kind(&query, index_scheduler.get_ref(), index_uid.to_string(), &index, features)?;
let retrieve_vector = RetrieveVectors::new(query.retrieve_vectors, features)?; let retrieve_vector = RetrieveVectors::new(query.retrieve_vectors, features)?;
let permit = search_queue.try_get_search_permit().await?; let permit = search_queue.try_get_search_permit().await?;
let search_result = tokio::task::spawn_blocking(move || { let search_result = tokio::task::spawn_blocking(move || {
perform_search(&index, query, search_kind, retrieve_vector, index_scheduler.features()) perform_search(
index_uid.to_string(),
&index,
query,
search_kind,
retrieve_vector,
index_scheduler.features(),
)
}) })
.await; .await;
permit.drop().await; permit.drop().await;
@ -287,12 +295,20 @@ pub async fn search_with_post(
let features = index_scheduler.features(); let features = index_scheduler.features();
let search_kind = search_kind(&query, index_scheduler.get_ref(), &index, features)?; let search_kind =
search_kind(&query, index_scheduler.get_ref(), index_uid.to_string(), &index, features)?;
let retrieve_vectors = RetrieveVectors::new(query.retrieve_vectors, features)?; let retrieve_vectors = RetrieveVectors::new(query.retrieve_vectors, features)?;
let permit = search_queue.try_get_search_permit().await?; let permit = search_queue.try_get_search_permit().await?;
let search_result = tokio::task::spawn_blocking(move || { let search_result = tokio::task::spawn_blocking(move || {
perform_search(&index, query, search_kind, retrieve_vectors, index_scheduler.features()) perform_search(
index_uid.to_string(),
&index,
query,
search_kind,
retrieve_vectors,
index_scheduler.features(),
)
}) })
.await; .await;
permit.drop().await; permit.drop().await;
@ -314,6 +330,7 @@ pub async fn search_with_post(
pub fn search_kind( pub fn search_kind(
query: &SearchQuery, query: &SearchQuery,
index_scheduler: &IndexScheduler, index_scheduler: &IndexScheduler,
index_uid: String,
index: &milli::Index, index: &milli::Index,
features: RoFeatures, features: RoFeatures,
) -> Result<SearchKind, ResponseError> { ) -> Result<SearchKind, ResponseError> {
@ -332,7 +349,7 @@ pub fn search_kind(
(None, _, None) => Ok(SearchKind::KeywordOnly), (None, _, None) => Ok(SearchKind::KeywordOnly),
// hybrid.semantic_ratio == 1.0 => vector // hybrid.semantic_ratio == 1.0 => vector
(_, Some(HybridQuery { semantic_ratio, embedder }), v) if **semantic_ratio == 1.0 => { (_, Some(HybridQuery { semantic_ratio, embedder }), v) if **semantic_ratio == 1.0 => {
SearchKind::semantic(index_scheduler, index, embedder, v.map(|v| v.len())) SearchKind::semantic(index_scheduler, index_uid, index, embedder, v.map(|v| v.len()))
} }
// hybrid.semantic_ratio == 0.0 => keyword // hybrid.semantic_ratio == 0.0 => keyword
(_, Some(HybridQuery { semantic_ratio, embedder: _ }), _) if **semantic_ratio == 0.0 => { (_, Some(HybridQuery { semantic_ratio, embedder: _ }), _) if **semantic_ratio == 0.0 => {
@ -340,13 +357,14 @@ pub fn search_kind(
} }
// no query, hybrid, vector => semantic // no query, hybrid, vector => semantic
(None, Some(HybridQuery { semantic_ratio: _, embedder }), Some(v)) => { (None, Some(HybridQuery { semantic_ratio: _, embedder }), Some(v)) => {
SearchKind::semantic(index_scheduler, index, embedder, Some(v.len())) SearchKind::semantic(index_scheduler, index_uid, index, embedder, Some(v.len()))
} }
// query, no hybrid, no vector => keyword // query, no hybrid, no vector => keyword
(Some(_), None, None) => Ok(SearchKind::KeywordOnly), (Some(_), None, None) => Ok(SearchKind::KeywordOnly),
// query, hybrid, maybe vector => hybrid // query, hybrid, maybe vector => hybrid
(Some(_), Some(HybridQuery { semantic_ratio, embedder }), v) => SearchKind::hybrid( (Some(_), Some(HybridQuery { semantic_ratio, embedder }), v) => SearchKind::hybrid(
index_scheduler, index_scheduler,
index_uid,
index, index,
embedder, embedder,
**semantic_ratio, **semantic_ratio,


@ -103,8 +103,13 @@ async fn similar(
let index = index_scheduler.index(&index_uid)?; let index = index_scheduler.index(&index_uid)?;
let (embedder_name, embedder, quantized) = let (embedder_name, embedder, quantized) = SearchKind::embedder(
SearchKind::embedder(&index_scheduler, &index, &query.embedder, None)?; &index_scheduler,
index_uid.to_string(),
&index,
&query.embedder,
None,
)?;
tokio::task::spawn_blocking(move || { tokio::task::spawn_blocking(move || {
perform_similar( perform_similar(


@ -125,14 +125,28 @@ pub async fn multi_search_with_post(
}) })
.with_index(query_index)?; .with_index(query_index)?;
let search_kind = let index_uid_str = index_uid.to_string();
search_kind(&query, index_scheduler.get_ref(), &index, features)
.with_index(query_index)?; let search_kind = search_kind(
&query,
index_scheduler.get_ref(),
index_uid_str.clone(),
&index,
features,
)
.with_index(query_index)?;
let retrieve_vector = RetrieveVectors::new(query.retrieve_vectors, features) let retrieve_vector = RetrieveVectors::new(query.retrieve_vectors, features)
.with_index(query_index)?; .with_index(query_index)?;
let search_result = tokio::task::spawn_blocking(move || { let search_result = tokio::task::spawn_blocking(move || {
perform_search(&index, query, search_kind, retrieve_vector, features) perform_search(
index_uid_str.clone(),
&index,
query,
search_kind,
retrieve_vector,
features,
)
}) })
.await .await
.with_index(query_index)?; .with_index(query_index)?;


@ -560,7 +560,8 @@ pub fn perform_federated_search(
// use an immediately invoked lambda to capture the result without returning from the function // use an immediately invoked lambda to capture the result without returning from the function
let res: Result<(), ResponseError> = (|| { let res: Result<(), ResponseError> = (|| {
let search_kind = search_kind(&query, index_scheduler, &index, features)?; let search_kind =
search_kind(&query, index_scheduler, index_uid.to_string(), &index, features)?;
let canonicalization_kind = match (&search_kind, &query.q) { let canonicalization_kind = match (&search_kind, &query.q) {
(SearchKind::SemanticOnly { .. }, _) => { (SearchKind::SemanticOnly { .. }, _) => {
@ -636,7 +637,8 @@ pub fn perform_federated_search(
search.offset(0); search.offset(0);
search.limit(required_hit_count); search.limit(required_hit_count);
let (result, _semantic_hit_count) = super::search_from_kind(search_kind, search)?; let (result, _semantic_hit_count) =
super::search_from_kind(index_uid.to_string(), search_kind, search)?;
let format = AttributesFormat { let format = AttributesFormat {
attributes_to_retrieve: query.attributes_to_retrieve, attributes_to_retrieve: query.attributes_to_retrieve,
retrieve_vectors, retrieve_vectors,
@ -670,7 +672,10 @@ pub fn perform_federated_search(
let formatter_builder = HitMaker::formatter_builder(matching_words, tokenizer); let formatter_builder = HitMaker::formatter_builder(matching_words, tokenizer);
let hit_maker = HitMaker::new(&index, &rtxn, format, formatter_builder)?; let hit_maker =
HitMaker::new(&index, &rtxn, format, formatter_builder).map_err(|e| {
MeilisearchHttpError::from_milli(e, Some(index_uid.to_string()))
})?;
results_by_query.push(SearchResultByQuery { results_by_query.push(SearchResultByQuery {
federation_options, federation_options,


@ -19,7 +19,9 @@ use meilisearch_types::locales::Locale;
use meilisearch_types::milli::score_details::{ScoreDetails, ScoringStrategy}; use meilisearch_types::milli::score_details::{ScoreDetails, ScoringStrategy};
use meilisearch_types::milli::vector::parsed_vectors::ExplicitVectors; use meilisearch_types::milli::vector::parsed_vectors::ExplicitVectors;
use meilisearch_types::milli::vector::Embedder; use meilisearch_types::milli::vector::Embedder;
use meilisearch_types::milli::{FacetValueHit, OrderBy, SearchForFacetValues, TimeBudget}; use meilisearch_types::milli::{
FacetValueHit, InternalError, OrderBy, SearchForFacetValues, TimeBudget,
};
use meilisearch_types::settings::DEFAULT_PAGINATION_MAX_TOTAL_HITS; use meilisearch_types::settings::DEFAULT_PAGINATION_MAX_TOTAL_HITS;
use meilisearch_types::{milli, Document}; use meilisearch_types::{milli, Document};
use milli::tokenizer::{Language, TokenizerBuilder}; use milli::tokenizer::{Language, TokenizerBuilder};
@ -281,35 +283,38 @@ pub enum SearchKind {
impl SearchKind { impl SearchKind {
pub(crate) fn semantic( pub(crate) fn semantic(
index_scheduler: &index_scheduler::IndexScheduler, index_scheduler: &index_scheduler::IndexScheduler,
index_uid: String,
index: &Index, index: &Index,
embedder_name: &str, embedder_name: &str,
vector_len: Option<usize>, vector_len: Option<usize>,
) -> Result<Self, ResponseError> { ) -> Result<Self, ResponseError> {
let (embedder_name, embedder, quantized) = let (embedder_name, embedder, quantized) =
Self::embedder(index_scheduler, index, embedder_name, vector_len)?; Self::embedder(index_scheduler, index_uid, index, embedder_name, vector_len)?;
Ok(Self::SemanticOnly { embedder_name, embedder, quantized }) Ok(Self::SemanticOnly { embedder_name, embedder, quantized })
} }
pub(crate) fn hybrid( pub(crate) fn hybrid(
index_scheduler: &index_scheduler::IndexScheduler, index_scheduler: &index_scheduler::IndexScheduler,
index_uid: String,
index: &Index, index: &Index,
embedder_name: &str, embedder_name: &str,
semantic_ratio: f32, semantic_ratio: f32,
vector_len: Option<usize>, vector_len: Option<usize>,
) -> Result<Self, ResponseError> { ) -> Result<Self, ResponseError> {
let (embedder_name, embedder, quantized) = let (embedder_name, embedder, quantized) =
Self::embedder(index_scheduler, index, embedder_name, vector_len)?; Self::embedder(index_scheduler, index_uid, index, embedder_name, vector_len)?;
Ok(Self::Hybrid { embedder_name, embedder, quantized, semantic_ratio }) Ok(Self::Hybrid { embedder_name, embedder, quantized, semantic_ratio })
} }
pub(crate) fn embedder( pub(crate) fn embedder(
index_scheduler: &index_scheduler::IndexScheduler, index_scheduler: &index_scheduler::IndexScheduler,
index_uid: String,
index: &Index, index: &Index,
embedder_name: &str, embedder_name: &str,
vector_len: Option<usize>, vector_len: Option<usize>,
) -> Result<(String, Arc<Embedder>, bool), ResponseError> { ) -> Result<(String, Arc<Embedder>, bool), ResponseError> {
let embedder_configs = index.embedding_configs(&index.read_txn()?)?; let embedder_configs = index.embedding_configs(&index.read_txn()?)?;
let embedders = index_scheduler.embedders(embedder_configs)?; let embedders = index_scheduler.embedders(index_uid, embedder_configs)?;
let (embedder, _, quantized) = embedders let (embedder, _, quantized) = embedders
.get(embedder_name) .get(embedder_name)
@ -890,6 +895,7 @@ fn prepare_search<'t>(
} }
pub fn perform_search( pub fn perform_search(
index_uid: String,
index: &Index, index: &Index,
query: SearchQuery, query: SearchQuery,
search_kind: SearchKind, search_kind: SearchKind,
@ -916,7 +922,7 @@ pub fn perform_search(
used_negative_operator, used_negative_operator,
}, },
semantic_hit_count, semantic_hit_count,
) = search_from_kind(search_kind, search)?; ) = search_from_kind(index_uid, search_kind, search)?;
let SearchQuery { let SearchQuery {
q, q,
@@ -1069,17 +1075,27 @@ fn compute_facet_distribution_stats<S: AsRef<str>>(
 }
 
 pub fn search_from_kind(
+    index_uid: String,
     search_kind: SearchKind,
     search: milli::Search<'_>,
 ) -> Result<(milli::SearchResult, Option<u32>), MeilisearchHttpError> {
     let (milli_result, semantic_hit_count) = match &search_kind {
-        SearchKind::KeywordOnly => (search.execute()?, None),
+        SearchKind::KeywordOnly => {
+            let results = search
+                .execute()
+                .map_err(|e| MeilisearchHttpError::from_milli(e, Some(index_uid.to_string())))?;
+            (results, None)
+        }
         SearchKind::SemanticOnly { .. } => {
-            let results = search.execute()?;
+            let results = search
+                .execute()
+                .map_err(|e| MeilisearchHttpError::from_milli(e, Some(index_uid.to_string())))?;
             let semantic_hit_count = results.document_scores.len() as u32;
             (results, Some(semantic_hit_count))
         }
-        SearchKind::Hybrid { semantic_ratio, .. } => search.execute_hybrid(*semantic_ratio)?,
+        SearchKind::Hybrid { semantic_ratio, .. } => search
+            .execute_hybrid(*semantic_ratio)
+            .map_err(|e| MeilisearchHttpError::from_milli(e, Some(index_uid)))?,
     };
     Ok((milli_result, semantic_hit_count))
 }
@ -1181,7 +1197,7 @@ impl<'a> HitMaker<'a> {
rtxn: &'a RoTxn<'a>, rtxn: &'a RoTxn<'a>,
format: AttributesFormat, format: AttributesFormat,
mut formatter_builder: MatcherBuilder<'a>, mut formatter_builder: MatcherBuilder<'a>,
) -> Result<Self, MeilisearchHttpError> { ) -> milli::Result<Self> {
formatter_builder.crop_marker(format.crop_marker); formatter_builder.crop_marker(format.crop_marker);
formatter_builder.highlight_prefix(format.highlight_pre_tag); formatter_builder.highlight_prefix(format.highlight_pre_tag);
formatter_builder.highlight_suffix(format.highlight_post_tag); formatter_builder.highlight_suffix(format.highlight_post_tag);
@ -1276,11 +1292,7 @@ impl<'a> HitMaker<'a> {
}) })
} }
pub fn make_hit( pub fn make_hit(&self, id: u32, score: &[ScoreDetails]) -> milli::Result<SearchHit> {
&self,
id: u32,
score: &[ScoreDetails],
) -> Result<SearchHit, MeilisearchHttpError> {
let (_, obkv) = let (_, obkv) =
self.index.iter_documents(self.rtxn, std::iter::once(id))?.next().unwrap()?; self.index.iter_documents(self.rtxn, std::iter::once(id))?.next().unwrap()?;
@ -1323,7 +1335,10 @@ impl<'a> HitMaker<'a> {
.is_some_and(|conf| conf.user_provided.contains(id)); .is_some_and(|conf| conf.user_provided.contains(id));
let embeddings = let embeddings =
ExplicitVectors { embeddings: Some(vector.into()), regenerate: !user_provided }; ExplicitVectors { embeddings: Some(vector.into()), regenerate: !user_provided };
vectors.insert(name, serde_json::to_value(embeddings)?); vectors.insert(
name,
serde_json::to_value(embeddings).map_err(InternalError::SerdeJson)?,
);
} }
document.insert("_vectors".into(), vectors.into()); document.insert("_vectors".into(), vectors.into());
} }
@ -1369,7 +1384,7 @@ fn make_hits<'a>(
format: AttributesFormat, format: AttributesFormat,
matching_words: milli::MatchingWords, matching_words: milli::MatchingWords,
documents_ids_scores: impl Iterator<Item = (u32, &'a Vec<ScoreDetails>)> + 'a, documents_ids_scores: impl Iterator<Item = (u32, &'a Vec<ScoreDetails>)> + 'a,
) -> Result<Vec<SearchHit>, MeilisearchHttpError> { ) -> milli::Result<Vec<SearchHit>> {
let mut documents = Vec::new(); let mut documents = Vec::new();
let dictionary = index.dictionary(rtxn)?; let dictionary = index.dictionary(rtxn)?;
@ -1407,6 +1422,13 @@ pub fn perform_facet_search(
None => TimeBudget::default(), None => TimeBudget::default(),
}; };
if !index.facet_search(&rtxn)? {
return Err(ResponseError::from_msg(
"The facet search is disabled for this index".to_string(),
Code::FacetSearchDisabled,
));
}
// In the faceted search context, we want to use the intersection between the locales provided by the user // In the faceted search context, we want to use the intersection between the locales provided by the user
// and the locales of the facet string. // and the locales of the facet string.
// If the facet string is not localized, we **ignore** the locales provided by the user because the facet data has no locale. // If the facet string is not localized, we **ignore** the locales provided by the user because the facet data has no locale.
@ -1690,12 +1712,12 @@ fn make_document(
displayed_attributes: &BTreeSet<FieldId>, displayed_attributes: &BTreeSet<FieldId>,
field_ids_map: &FieldsIdsMap, field_ids_map: &FieldsIdsMap,
obkv: &obkv::KvReaderU16, obkv: &obkv::KvReaderU16,
) -> Result<Document, MeilisearchHttpError> { ) -> milli::Result<Document> {
let mut document = serde_json::Map::new(); let mut document = serde_json::Map::new();
// recreate the original json // recreate the original json
for (key, value) in obkv.iter() { for (key, value) in obkv.iter() {
let value = serde_json::from_slice(value)?; let value = serde_json::from_slice(value).map_err(InternalError::SerdeJson)?;
let key = field_ids_map.name(key).expect("Missing field name").to_string(); let key = field_ids_map.name(key).expect("Missing field name").to_string();
document.insert(key, value); document.insert(key, value);
@ -1720,7 +1742,7 @@ fn format_fields(
displayable_ids: &BTreeSet<FieldId>, displayable_ids: &BTreeSet<FieldId>,
locales: Option<&[Language]>, locales: Option<&[Language]>,
localized_attributes: &[LocalizedAttributesRule], localized_attributes: &[LocalizedAttributesRule],
) -> Result<(Option<MatchesPosition>, Document), MeilisearchHttpError> { ) -> milli::Result<(Option<MatchesPosition>, Document)> {
let mut matches_position = compute_matches.then(BTreeMap::new); let mut matches_position = compute_matches.then(BTreeMap::new);
let mut document = document.clone(); let mut document = document.clone();
@ -1898,7 +1920,7 @@ fn parse_filter_array(arr: &[Value]) -> Result<Option<Filter>, MeilisearchHttpEr
} }
} }
Ok(Filter::from_array(ands)?) Filter::from_array(ands).map_err(|e| MeilisearchHttpError::from_milli(e, None))
} }
#[cfg(test)] #[cfg(test)]


@ -284,6 +284,7 @@ async fn test_summarized_document_addition_or_update() {
@r#" @r#"
{ {
"uid": 0, "uid": 0,
"progress": null,
"details": { "details": {
"receivedDocuments": 1, "receivedDocuments": 1,
"indexedDocuments": 1 "indexedDocuments": 1
@ -314,6 +315,7 @@ async fn test_summarized_document_addition_or_update() {
@r#" @r#"
{ {
"uid": 1, "uid": 1,
"progress": null,
"details": { "details": {
"receivedDocuments": 1, "receivedDocuments": 1,
"indexedDocuments": 1 "indexedDocuments": 1
@ -349,6 +351,7 @@ async fn test_summarized_delete_documents_by_batch() {
@r#" @r#"
{ {
"uid": 0, "uid": 0,
"progress": null,
"details": { "details": {
"providedIds": 3, "providedIds": 3,
"deletedDocuments": 0 "deletedDocuments": 0
@ -380,6 +383,7 @@ async fn test_summarized_delete_documents_by_batch() {
@r#" @r#"
{ {
"uid": 2, "uid": 2,
"progress": null,
"details": { "details": {
"providedIds": 1, "providedIds": 1,
"deletedDocuments": 0 "deletedDocuments": 0
@ -416,6 +420,7 @@ async fn test_summarized_delete_documents_by_filter() {
@r#" @r#"
{ {
"uid": 0, "uid": 0,
"progress": null,
"details": { "details": {
"providedIds": 0, "providedIds": 0,
"deletedDocuments": 0, "deletedDocuments": 0,
@ -448,6 +453,7 @@ async fn test_summarized_delete_documents_by_filter() {
@r#" @r#"
{ {
"uid": 2, "uid": 2,
"progress": null,
"details": { "details": {
"providedIds": 0, "providedIds": 0,
"deletedDocuments": 0, "deletedDocuments": 0,
@ -480,6 +486,7 @@ async fn test_summarized_delete_documents_by_filter() {
@r#" @r#"
{ {
"uid": 4, "uid": 4,
"progress": null,
"details": { "details": {
"providedIds": 0, "providedIds": 0,
"deletedDocuments": 0, "deletedDocuments": 0,
@ -516,6 +523,7 @@ async fn test_summarized_delete_document_by_id() {
@r#" @r#"
{ {
"uid": 0, "uid": 0,
"progress": null,
"details": { "details": {
"providedIds": 1, "providedIds": 1,
"deletedDocuments": 0 "deletedDocuments": 0
@ -547,6 +555,7 @@ async fn test_summarized_delete_document_by_id() {
@r#" @r#"
{ {
"uid": 2, "uid": 2,
"progress": null,
"details": { "details": {
"providedIds": 1, "providedIds": 1,
"deletedDocuments": 0 "deletedDocuments": 0
@ -594,6 +603,7 @@ async fn test_summarized_settings_update() {
@r#" @r#"
{ {
"uid": 0, "uid": 0,
"progress": null,
"details": { "details": {
"displayedAttributes": [ "displayedAttributes": [
"doggos", "doggos",
@ -638,6 +648,7 @@ async fn test_summarized_index_creation() {
@r#" @r#"
{ {
"uid": 0, "uid": 0,
"progress": null,
"details": {}, "details": {},
"stats": { "stats": {
"totalNbTasks": 1, "totalNbTasks": 1,
@ -665,6 +676,7 @@ async fn test_summarized_index_creation() {
@r#" @r#"
{ {
"uid": 1, "uid": 1,
"progress": null,
"details": { "details": {
"primaryKey": "doggos" "primaryKey": "doggos"
}, },
@ -809,6 +821,7 @@ async fn test_summarized_index_update() {
@r#" @r#"
{ {
"uid": 0, "uid": 0,
"progress": null,
"details": {}, "details": {},
"stats": { "stats": {
"totalNbTasks": 1, "totalNbTasks": 1,
@ -836,6 +849,7 @@ async fn test_summarized_index_update() {
@r#" @r#"
{ {
"uid": 1, "uid": 1,
"progress": null,
"details": { "details": {
"primaryKey": "bones" "primaryKey": "bones"
}, },
@ -868,6 +882,7 @@ async fn test_summarized_index_update() {
@r#" @r#"
{ {
"uid": 3, "uid": 3,
"progress": null,
"details": {}, "details": {},
"stats": { "stats": {
"totalNbTasks": 1, "totalNbTasks": 1,
@ -895,6 +910,7 @@ async fn test_summarized_index_update() {
@r#" @r#"
{ {
"uid": 4, "uid": 4,
"progress": null,
"details": { "details": {
"primaryKey": "bones" "primaryKey": "bones"
}, },
@ -932,6 +948,7 @@ async fn test_summarized_index_swap() {
@r#" @r#"
{ {
"uid": 0, "uid": 0,
"progress": null,
"details": { "details": {
"swaps": [ "swaps": [
{ {
@ -972,6 +989,7 @@ async fn test_summarized_index_swap() {
@r#" @r#"
{ {
"uid": 3, "uid": 3,
"progress": null,
"details": { "details": {
"swaps": [ "swaps": [
{ {
@ -1014,6 +1032,7 @@ async fn test_summarized_batch_cancelation() {
@r#" @r#"
{ {
"uid": 1, "uid": 1,
"progress": null,
"details": { "details": {
"matchedTasks": 1, "matchedTasks": 1,
"canceledTasks": 0, "canceledTasks": 0,
@ -1051,6 +1070,7 @@ async fn test_summarized_batch_deletion() {
@r#" @r#"
{ {
"uid": 1, "uid": 1,
"progress": null,
"details": { "details": {
"matchedTasks": 1, "matchedTasks": 1,
"deletedTasks": 1, "deletedTasks": 1,
@ -1084,6 +1104,7 @@ async fn test_summarized_dump_creation() {
@r#" @r#"
{ {
"uid": 0, "uid": 0,
"progress": null,
"details": { "details": {
"dumpUid": "[dumpUid]" "dumpUid": "[dumpUid]"
}, },


@@ -52,6 +52,25 @@ impl Value {
         }
         self
     }
+
+    /// Return `true` if the `status` field is set to `failed`.
+    /// Panics if the `status` field doesn't exist.
+    #[track_caller]
+    pub fn is_fail(&self) -> bool {
+        if !self["status"].is_string() {
+            panic!("Called `is_fail` on {}", serde_json::to_string_pretty(&self.0).unwrap());
+        }
+        self["status"] == serde_json::Value::String(String::from("failed"))
+    }
+
+    // Panics if the json doesn't contain the `status` field set to "failed"
+    #[track_caller]
+    pub fn failed(&self) -> &Self {
+        if !self.is_fail() {
+            panic!("Called failed on {}", serde_json::to_string_pretty(&self.0).unwrap());
+        }
+        self
+    }
 }
 
 impl From<serde_json::Value> for Value {
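
Editor's note: a hedged sketch of how the new `failed()` helper reads in a test, reusing the suite's own `Server`, `index`, and `json!` helpers seen in the tests below; the scenario (an invalid document id that inference picks as primary key) and the test name are illustrative only.

#[actix_rt::test]
async fn document_addition_with_bad_id_fails() {
    let server = Server::new().await;
    let index = server.index("test");

    // "foo & bar" contains characters that are not allowed in a document identifier.
    let (task, _code) = index.add_documents(json!([{ "docid": "foo & bar" }]), None).await;
    let task = server.wait_task(task.uid()).await;

    // `failed()` panics unless the task ended with `status: "failed"`,
    // mirroring the existing `succeeded()` assertion style.
    task.failed();
}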


@ -1220,9 +1220,89 @@ async fn replace_document() {
#[actix_rt::test] #[actix_rt::test]
async fn add_no_documents() { async fn add_no_documents() {
let server = Server::new().await; let server = Server::new().await;
let index = server.index("test"); let index = server.index("kefir");
let (_response, code) = index.add_documents(json!([]), None).await; let (task, code) = index.add_documents(json!([]), None).await;
snapshot!(code, @"202 Accepted"); snapshot!(code, @"202 Accepted");
let task = server.wait_task(task.uid()).await;
let task = task.succeeded();
snapshot!(task, @r#"
{
"uid": "[uid]",
"batchUid": "[batch_uid]",
"indexUid": "kefir",
"status": "succeeded",
"type": "documentAdditionOrUpdate",
"canceledBy": null,
"details": {
"receivedDocuments": 0,
"indexedDocuments": 0
},
"error": null,
"duration": "[duration]",
"enqueuedAt": "[date]",
"startedAt": "[date]",
"finishedAt": "[date]"
}
"#);
let (task, _code) = index.add_documents(json!([]), Some("kefkef")).await;
let task = server.wait_task(task.uid()).await;
let task = task.succeeded();
snapshot!(task, @r#"
{
"uid": "[uid]",
"batchUid": "[batch_uid]",
"indexUid": "kefir",
"status": "succeeded",
"type": "documentAdditionOrUpdate",
"canceledBy": null,
"details": {
"receivedDocuments": 0,
"indexedDocuments": 0
},
"error": null,
"duration": "[duration]",
"enqueuedAt": "[date]",
"startedAt": "[date]",
"finishedAt": "[date]"
}
"#);
let (task, _code) = index.add_documents(json!([{ "kefkef": 1 }]), None).await;
let task = server.wait_task(task.uid()).await;
let task = task.succeeded();
snapshot!(task, @r#"
{
"uid": "[uid]",
"batchUid": "[batch_uid]",
"indexUid": "kefir",
"status": "succeeded",
"type": "documentAdditionOrUpdate",
"canceledBy": null,
"details": {
"receivedDocuments": 1,
"indexedDocuments": 1
},
"error": null,
"duration": "[duration]",
"enqueuedAt": "[date]",
"startedAt": "[date]",
"finishedAt": "[date]"
}
"#);
let (documents, _status) = index.get_all_documents(GetAllDocumentsOptions::default()).await;
snapshot!(documents, @r#"
{
"results": [
{
"kefkef": 1
}
],
"offset": 0,
"limit": 20,
"total": 1
}
"#);
} }
#[actix_rt::test] #[actix_rt::test]
@ -1264,15 +1344,18 @@ async fn error_add_documents_bad_document_id() {
let server = Server::new().await; let server = Server::new().await;
let index = server.index("test"); let index = server.index("test");
index.create(Some("docid")).await; index.create(Some("docid")).await;
// unsupported characters
let documents = json!([ let documents = json!([
{ {
"docid": "foo & bar", "docid": "foo & bar",
"content": "foobar" "content": "foobar"
} }
]); ]);
index.add_documents(documents, None).await; let (value, _code) = index.add_documents(documents, None).await;
index.wait_task(1).await; index.wait_task(value.uid()).await;
let (response, code) = index.get_task(1).await; let (response, code) = index.get_task(value.uid()).await;
snapshot!(code, @"200 OK"); snapshot!(code, @"200 OK");
snapshot!(json_string!(response, { ".duration" => "[duration]", ".enqueuedAt" => "[date]", ".startedAt" => "[date]", ".finishedAt" => "[date]" }), snapshot!(json_string!(response, { ".duration" => "[duration]", ".enqueuedAt" => "[date]", ".startedAt" => "[date]", ".finishedAt" => "[date]" }),
@r###" @r###"
@ -1288,7 +1371,81 @@ async fn error_add_documents_bad_document_id() {
"indexedDocuments": 0 "indexedDocuments": 0
}, },
"error": { "error": {
"message": "Document identifier `\"foo & bar\"` is invalid. A document identifier can be of type integer or string, only composed of alphanumeric characters (a-z A-Z 0-9), hyphens (-) and underscores (_), and can not be more than 512 bytes.", "message": "Document identifier `\"foo & bar\"` is invalid. A document identifier can be of type integer or string, only composed of alphanumeric characters (a-z A-Z 0-9), hyphens (-) and underscores (_), and can not be more than 511 bytes.",
"code": "invalid_document_id",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_document_id"
},
"duration": "[duration]",
"enqueuedAt": "[date]",
"startedAt": "[date]",
"finishedAt": "[date]"
}
"###);
// More than 512 bytes
let documents = json!([
{
"docid": "a".repeat(600),
"content": "foobar"
}
]);
let (value, _code) = index.add_documents(documents, None).await;
index.wait_task(value.uid()).await;
let (response, code) = index.get_task(value.uid()).await;
snapshot!(code, @"200 OK");
snapshot!(json_string!(response, { ".duration" => "[duration]", ".enqueuedAt" => "[date]", ".startedAt" => "[date]", ".finishedAt" => "[date]" }),
@r###"
{
"uid": 2,
"batchUid": 2,
"indexUid": "test",
"status": "failed",
"type": "documentAdditionOrUpdate",
"canceledBy": null,
"details": {
"receivedDocuments": 1,
"indexedDocuments": 0
},
"error": {
"message": "Document identifier `\"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\"` is invalid. A document identifier can be of type integer or string, only composed of alphanumeric characters (a-z A-Z 0-9), hyphens (-) and underscores (_), and can not be more than 511 bytes.",
"code": "invalid_document_id",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_document_id"
},
"duration": "[duration]",
"enqueuedAt": "[date]",
"startedAt": "[date]",
"finishedAt": "[date]"
}
"###);
// Exactly 512 bytes
let documents = json!([
{
"docid": "a".repeat(512),
"content": "foobar"
}
]);
let (value, _code) = index.add_documents(documents, None).await;
index.wait_task(value.uid()).await;
let (response, code) = index.get_task(value.uid()).await;
snapshot!(code, @"200 OK");
snapshot!(json_string!(response, { ".duration" => "[duration]", ".enqueuedAt" => "[date]", ".startedAt" => "[date]", ".finishedAt" => "[date]" }),
@r###"
{
"uid": 3,
"batchUid": 3,
"indexUid": "test",
"status": "failed",
"type": "documentAdditionOrUpdate",
"canceledBy": null,
"details": {
"receivedDocuments": 1,
"indexedDocuments": 0
},
"error": {
"message": "Document identifier `\"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\"` is invalid. A document identifier can be of type integer or string, only composed of alphanumeric characters (a-z A-Z 0-9), hyphens (-) and underscores (_), and can not be more than 511 bytes.",
"code": "invalid_document_id", "code": "invalid_document_id",
"type": "invalid_request", "type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_document_id" "link": "https://docs.meilisearch.com/errors#invalid_document_id"
@ -1681,7 +1838,7 @@ async fn add_documents_invalid_geo_field() {
"indexedDocuments": 0 "indexedDocuments": 0
}, },
"error": { "error": {
"message": "The `_geo` field in the document with the id: `\"11\"` is not an object. Was expecting an object with the `_geo.lat` and `_geo.lng` fields but instead got `\"foobar\"`.", "message": "Index `test`: The `_geo` field in the document with the id: `\"11\"` is not an object. Was expecting an object with the `_geo.lat` and `_geo.lng` fields but instead got `\"foobar\"`.",
"code": "invalid_document_geo_field", "code": "invalid_document_geo_field",
"type": "invalid_request", "type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_document_geo_field" "link": "https://docs.meilisearch.com/errors#invalid_document_geo_field"
@ -1719,7 +1876,7 @@ async fn add_documents_invalid_geo_field() {
"indexedDocuments": 0 "indexedDocuments": 0
}, },
"error": { "error": {
"message": "Could not find latitude nor longitude in the document with the id: `\"11\"`. Was expecting `_geo.lat` and `_geo.lng` fields.", "message": "Index `test`: Could not find latitude nor longitude in the document with the id: `\"11\"`. Was expecting `_geo.lat` and `_geo.lng` fields.",
"code": "invalid_document_geo_field", "code": "invalid_document_geo_field",
"type": "invalid_request", "type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_document_geo_field" "link": "https://docs.meilisearch.com/errors#invalid_document_geo_field"
@ -1757,7 +1914,7 @@ async fn add_documents_invalid_geo_field() {
"indexedDocuments": 0 "indexedDocuments": 0
}, },
"error": { "error": {
"message": "Could not find latitude nor longitude in the document with the id: `\"11\"`. Was expecting `_geo.lat` and `_geo.lng` fields.", "message": "Index `test`: Could not find latitude nor longitude in the document with the id: `\"11\"`. Was expecting `_geo.lat` and `_geo.lng` fields.",
"code": "invalid_document_geo_field", "code": "invalid_document_geo_field",
"type": "invalid_request", "type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_document_geo_field" "link": "https://docs.meilisearch.com/errors#invalid_document_geo_field"
@ -1795,7 +1952,7 @@ async fn add_documents_invalid_geo_field() {
"indexedDocuments": 0 "indexedDocuments": 0
}, },
"error": { "error": {
"message": "Could not find longitude in the document with the id: `\"11\"`. Was expecting a `_geo.lng` field.", "message": "Index `test`: Could not find longitude in the document with the id: `\"11\"`. Was expecting a `_geo.lng` field.",
"code": "invalid_document_geo_field", "code": "invalid_document_geo_field",
"type": "invalid_request", "type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_document_geo_field" "link": "https://docs.meilisearch.com/errors#invalid_document_geo_field"
@ -1833,7 +1990,7 @@ async fn add_documents_invalid_geo_field() {
"indexedDocuments": 0 "indexedDocuments": 0
}, },
"error": { "error": {
"message": "Could not find latitude in the document with the id: `\"11\"`. Was expecting a `_geo.lat` field.", "message": "Index `test`: Could not find latitude in the document with the id: `\"11\"`. Was expecting a `_geo.lat` field.",
"code": "invalid_document_geo_field", "code": "invalid_document_geo_field",
"type": "invalid_request", "type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_document_geo_field" "link": "https://docs.meilisearch.com/errors#invalid_document_geo_field"
@ -1871,7 +2028,7 @@ async fn add_documents_invalid_geo_field() {
"indexedDocuments": 0 "indexedDocuments": 0
}, },
"error": { "error": {
"message": "Could not find longitude in the document with the id: `\"11\"`. Was expecting a `_geo.lng` field.", "message": "Index `test`: Could not find longitude in the document with the id: `\"11\"`. Was expecting a `_geo.lng` field.",
"code": "invalid_document_geo_field", "code": "invalid_document_geo_field",
"type": "invalid_request", "type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_document_geo_field" "link": "https://docs.meilisearch.com/errors#invalid_document_geo_field"
@ -1909,7 +2066,7 @@ async fn add_documents_invalid_geo_field() {
"indexedDocuments": 0 "indexedDocuments": 0
}, },
"error": { "error": {
"message": "Could not find latitude in the document with the id: `\"11\"`. Was expecting a `_geo.lat` field.", "message": "Index `test`: Could not find latitude in the document with the id: `\"11\"`. Was expecting a `_geo.lat` field.",
"code": "invalid_document_geo_field", "code": "invalid_document_geo_field",
"type": "invalid_request", "type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_document_geo_field" "link": "https://docs.meilisearch.com/errors#invalid_document_geo_field"
@ -1947,7 +2104,7 @@ async fn add_documents_invalid_geo_field() {
"indexedDocuments": 0 "indexedDocuments": 0
}, },
"error": { "error": {
"message": "Could not parse latitude nor longitude in the document with the id: `\"11\"`. Was expecting finite numbers but instead got `false` and `true`.", "message": "Index `test`: Could not parse latitude nor longitude in the document with the id: `\"11\"`. Was expecting finite numbers but instead got `false` and `true`.",
"code": "invalid_document_geo_field", "code": "invalid_document_geo_field",
"type": "invalid_request", "type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_document_geo_field" "link": "https://docs.meilisearch.com/errors#invalid_document_geo_field"
@ -1985,7 +2142,7 @@ async fn add_documents_invalid_geo_field() {
"indexedDocuments": 0 "indexedDocuments": 0
}, },
"error": { "error": {
"message": "Could not find longitude in the document with the id: `\"11\"`. Was expecting a `_geo.lng` field.", "message": "Index `test`: Could not find longitude in the document with the id: `\"11\"`. Was expecting a `_geo.lng` field.",
"code": "invalid_document_geo_field", "code": "invalid_document_geo_field",
"type": "invalid_request", "type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_document_geo_field" "link": "https://docs.meilisearch.com/errors#invalid_document_geo_field"
@ -2023,7 +2180,7 @@ async fn add_documents_invalid_geo_field() {
"indexedDocuments": 0 "indexedDocuments": 0
}, },
"error": { "error": {
"message": "Could not find latitude in the document with the id: `\"11\"`. Was expecting a `_geo.lat` field.", "message": "Index `test`: Could not find latitude in the document with the id: `\"11\"`. Was expecting a `_geo.lat` field.",
"code": "invalid_document_geo_field", "code": "invalid_document_geo_field",
"type": "invalid_request", "type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_document_geo_field" "link": "https://docs.meilisearch.com/errors#invalid_document_geo_field"
@ -2061,7 +2218,7 @@ async fn add_documents_invalid_geo_field() {
"indexedDocuments": 0 "indexedDocuments": 0
}, },
"error": { "error": {
"message": "Could not parse latitude nor longitude in the document with the id: `\"11\"`. Was expecting finite numbers but instead got `\"doggo\"` and `\"doggo\"`.", "message": "Index `test`: Could not parse latitude nor longitude in the document with the id: `\"11\"`. Was expecting finite numbers but instead got `\"doggo\"` and `\"doggo\"`.",
"code": "invalid_document_geo_field", "code": "invalid_document_geo_field",
"type": "invalid_request", "type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_document_geo_field" "link": "https://docs.meilisearch.com/errors#invalid_document_geo_field"
@ -2099,7 +2256,7 @@ async fn add_documents_invalid_geo_field() {
"indexedDocuments": 0 "indexedDocuments": 0
}, },
"error": { "error": {
"message": "The `_geo` field in the document with the id: `\"11\"` contains the following unexpected fields: `{\"doggo\":\"are the best\"}`.", "message": "Index `test`: The `_geo` field in the document with the id: `\"11\"` contains the following unexpected fields: `{\"doggo\":\"are the best\"}`.",
"code": "invalid_document_geo_field", "code": "invalid_document_geo_field",
"type": "invalid_request", "type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_document_geo_field" "link": "https://docs.meilisearch.com/errors#invalid_document_geo_field"
@ -2138,7 +2295,7 @@ async fn add_documents_invalid_geo_field() {
"indexedDocuments": 0 "indexedDocuments": 0
}, },
"error": { "error": {
"message": "Could not parse longitude in the document with the id: `\"12\"`. Was expecting a finite number but instead got `null`.", "message": "Index `test`: Could not parse longitude in the document with the id: `\"12\"`. Was expecting a finite number but instead got `null`.",
"code": "invalid_document_geo_field", "code": "invalid_document_geo_field",
"type": "invalid_request", "type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_document_geo_field" "link": "https://docs.meilisearch.com/errors#invalid_document_geo_field"
@ -2175,7 +2332,7 @@ async fn add_documents_invalid_geo_field() {
"indexedDocuments": 0 "indexedDocuments": 0
}, },
"error": { "error": {
"message": "Could not parse latitude in the document with the id: `\"12\"`. Was expecting a finite number but instead got `null`.", "message": "Index `test`: Could not parse latitude in the document with the id: `\"12\"`. Was expecting a finite number but instead got `null`.",
"code": "invalid_document_geo_field", "code": "invalid_document_geo_field",
"type": "invalid_request", "type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_document_geo_field" "link": "https://docs.meilisearch.com/errors#invalid_document_geo_field"
@ -2212,7 +2369,7 @@ async fn add_documents_invalid_geo_field() {
"indexedDocuments": 0 "indexedDocuments": 0
}, },
"error": { "error": {
"message": "Could not parse latitude nor longitude in the document with the id: `\"13\"`. Was expecting finite numbers but instead got `null` and `null`.", "message": "Index `test`: Could not parse latitude nor longitude in the document with the id: `\"13\"`. Was expecting finite numbers but instead got `null` and `null`.",
"code": "invalid_document_geo_field", "code": "invalid_document_geo_field",
"type": "invalid_request", "type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_document_geo_field" "link": "https://docs.meilisearch.com/errors#invalid_document_geo_field"
@ -2279,7 +2436,7 @@ async fn add_invalid_geo_and_then_settings() {
] ]
}, },
"error": { "error": {
"message": "Could not parse latitude in the document with the id: `\"11\"`. Was expecting a finite number but instead got `null`.", "message": "Index `test`: Could not parse latitude in the document with the id: `\"11\"`. Was expecting a finite number but instead got `null`.",
"code": "invalid_document_geo_field", "code": "invalid_document_geo_field",
"type": "invalid_request", "type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_document_geo_field" "link": "https://docs.meilisearch.com/errors#invalid_document_geo_field"


@ -604,7 +604,7 @@ async fn delete_document_by_filter() {
"originalFilter": "\"doggo = bernese\"" "originalFilter": "\"doggo = bernese\""
}, },
"error": { "error": {
"message": "Attribute `doggo` is not filterable. This index does not have configured filterable attributes.\n1:6 doggo = bernese", "message": "Index `EMPTY_INDEX`: Attribute `doggo` is not filterable. This index does not have configured filterable attributes.\n1:6 doggo = bernese",
"code": "invalid_document_filter", "code": "invalid_document_filter",
"type": "invalid_request", "type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_document_filter" "link": "https://docs.meilisearch.com/errors#invalid_document_filter"
@ -636,7 +636,7 @@ async fn delete_document_by_filter() {
"originalFilter": "\"catto = jorts\"" "originalFilter": "\"catto = jorts\""
}, },
"error": { "error": {
"message": "Attribute `catto` is not filterable. Available filterable attributes are: `id`, `title`.\n1:6 catto = jorts", "message": "Index `SHARED_DOCUMENTS`: Attribute `catto` is not filterable. Available filterable attributes are: `id`, `title`.\n1:6 catto = jorts",
"code": "invalid_document_filter", "code": "invalid_document_filter",
"type": "invalid_request", "type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_document_filter" "link": "https://docs.meilisearch.com/errors#invalid_document_filter"


@ -172,7 +172,7 @@ async fn error_update_documents_bad_document_id() {
assert_eq!( assert_eq!(
response["error"]["message"], response["error"]["message"],
json!( json!(
r#"Document identifier `"foo & bar"` is invalid. A document identifier can be of type integer or string, only composed of alphanumeric characters (a-z A-Z 0-9), hyphens (-) and underscores (_), and can not be more than 512 bytes."# r#"Document identifier `"foo & bar"` is invalid. A document identifier can be of type integer or string, only composed of alphanumeric characters (a-z A-Z 0-9), hyphens (-) and underscores (_), and can not be more than 511 bytes."#
) )
); );
assert_eq!(response["error"]["code"], json!("invalid_document_id")); assert_eq!(response["error"]["code"], json!("invalid_document_id"));


@ -95,7 +95,7 @@ async fn error_update_existing_primary_key() {
let response = index.wait_task(2).await; let response = index.wait_task(2).await;
let expected_response = json!({ let expected_response = json!({
"message": "Index already has a primary key: `id`.", "message": "Index `test`: Index already has a primary key: `id`.",
"code": "index_primary_key_already_exists", "code": "index_primary_key_already_exists",
"type": "invalid_request", "type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#index_primary_key_already_exists" "link": "https://docs.meilisearch.com/errors#index_primary_key_already_exists"


@ -711,7 +711,7 @@ async fn filter_invalid_attribute_array() {
index.wait_task(task.uid()).await; index.wait_task(task.uid()).await;
let expected_response = json!({ let expected_response = json!({
"message": "Attribute `many` is not filterable. Available filterable attributes are: `title`.\n1:5 many = Glass", "message": format!("Index `{}`: Attribute `many` is not filterable. Available filterable attributes are: `title`.\n1:5 many = Glass", index.uid),
"code": "invalid_search_filter", "code": "invalid_search_filter",
"type": "invalid_request", "type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_search_filter" "link": "https://docs.meilisearch.com/errors#invalid_search_filter"
@ -733,7 +733,7 @@ async fn filter_invalid_attribute_string() {
index.wait_task(task.uid()).await; index.wait_task(task.uid()).await;
let expected_response = json!({ let expected_response = json!({
"message": "Attribute `many` is not filterable. Available filterable attributes are: `title`.\n1:5 many = Glass", "message": format!("Index `{}`: Attribute `many` is not filterable. Available filterable attributes are: `title`.\n1:5 many = Glass", index.uid),
"code": "invalid_search_filter", "code": "invalid_search_filter",
"type": "invalid_request", "type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_search_filter" "link": "https://docs.meilisearch.com/errors#invalid_search_filter"
@ -940,7 +940,7 @@ async fn sort_unsortable_attribute() {
index.wait_task(response.uid()).await.succeeded(); index.wait_task(response.uid()).await.succeeded();
let expected_response = json!({ let expected_response = json!({
"message": "Attribute `title` is not sortable. Available sortable attributes are: `id`.", "message": format!("Index `{}`: Attribute `title` is not sortable. Available sortable attributes are: `id`.", index.uid),
"code": "invalid_search_sort", "code": "invalid_search_sort",
"type": "invalid_request", "type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_search_sort" "link": "https://docs.meilisearch.com/errors#invalid_search_sort"
@ -998,7 +998,7 @@ async fn sort_unset_ranking_rule() {
index.wait_task(response.uid()).await.succeeded(); index.wait_task(response.uid()).await.succeeded();
let expected_response = json!({ let expected_response = json!({
"message": "You must specify where `sort` is listed in the rankingRules setting to use the sort parameter at search time.", "message": format!("Index `{}`: You must specify where `sort` is listed in the rankingRules setting to use the sort parameter at search time.", index.uid),
"code": "invalid_search_sort", "code": "invalid_search_sort",
"type": "invalid_request", "type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_search_sort" "link": "https://docs.meilisearch.com/errors#invalid_search_sort"
@ -1024,19 +1024,18 @@ async fn search_on_unknown_field() {
index.update_settings_searchable_attributes(json!(["id", "title"])).await; index.update_settings_searchable_attributes(json!(["id", "title"])).await;
index.wait_task(response.uid()).await.succeeded(); index.wait_task(response.uid()).await.succeeded();
let expected_response = json!({
"message": format!("Index `{}`: Attribute `unknown` is not searchable. Available searchable attributes are: `id, title`.", index.uid),
"code": "invalid_search_attributes_to_search_on",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_search_attributes_to_search_on"
});
index index
.search( .search(
json!({"q": "Captain Marvel", "attributesToSearchOn": ["unknown"]}), json!({"q": "Captain Marvel", "attributesToSearchOn": ["unknown"]}),
|response, code| { |response, code| {
snapshot!(code, @"400 Bad Request"); assert_eq!(response, expected_response);
snapshot!(json_string!(response), @r###" assert_eq!(code, 400);
{
"message": "Attribute `unknown` is not searchable. Available searchable attributes are: `id, title`.",
"code": "invalid_search_attributes_to_search_on",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_search_attributes_to_search_on"
}
"###);
}, },
) )
.await; .await;
@ -1050,19 +1049,18 @@ async fn search_on_unknown_field_plus_joker() {
index.update_settings_searchable_attributes(json!(["id", "title"])).await; index.update_settings_searchable_attributes(json!(["id", "title"])).await;
index.wait_task(response.uid()).await.succeeded(); index.wait_task(response.uid()).await.succeeded();
let expected_response = json!({
"message": format!("Index `{}`: Attribute `unknown` is not searchable. Available searchable attributes are: `id, title`.", index.uid),
"code": "invalid_search_attributes_to_search_on",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_search_attributes_to_search_on"
});
index index
.search( .search(
json!({"q": "Captain Marvel", "attributesToSearchOn": ["*", "unknown"]}), json!({"q": "Captain Marvel", "attributesToSearchOn": ["*", "unknown"]}),
|response, code| { |response, code| {
snapshot!(code, @"400 Bad Request"); assert_eq!(response, expected_response);
snapshot!(json_string!(response), @r###" assert_eq!(code, 400);
{
"message": "Attribute `unknown` is not searchable. Available searchable attributes are: `id, title`.",
"code": "invalid_search_attributes_to_search_on",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_search_attributes_to_search_on"
}
"###);
}, },
) )
.await; .await;
@ -1071,15 +1069,8 @@ async fn search_on_unknown_field_plus_joker() {
.search( .search(
json!({"q": "Captain Marvel", "attributesToSearchOn": ["unknown", "*"]}), json!({"q": "Captain Marvel", "attributesToSearchOn": ["unknown", "*"]}),
|response, code| { |response, code| {
snapshot!(code, @"400 Bad Request"); assert_eq!(response, expected_response);
snapshot!(json_string!(response), @r###" assert_eq!(code, 400);
{
"message": "Attribute `unknown` is not searchable. Available searchable attributes are: `id, title`.",
"code": "invalid_search_attributes_to_search_on",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_search_attributes_to_search_on"
}
"###);
}, },
) )
.await; .await;
@ -1092,47 +1083,44 @@ async fn distinct_at_search_time() {
let (task, _) = index.create(None).await; let (task, _) = index.create(None).await;
index.wait_task(task.uid()).await.succeeded(); index.wait_task(task.uid()).await.succeeded();
let expected_response = json!({
"message": format!("Index `{}`: Attribute `doggo.truc` is not filterable and thus, cannot be used as distinct attribute. This index does not have configured filterable attributes.", index.uid),
"code": "invalid_search_distinct",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_search_distinct"
});
let (response, code) = let (response, code) =
index.search_post(json!({"page": 0, "hitsPerPage": 2, "distinct": "doggo.truc"})).await; index.search_post(json!({"page": 0, "hitsPerPage": 2, "distinct": "doggo.truc"})).await;
snapshot!(code, @"400 Bad Request"); assert_eq!(response, expected_response);
snapshot!(response, @r###" assert_eq!(code, 400);
{
"message": "Attribute `doggo.truc` is not filterable and thus, cannot be used as distinct attribute. This index does not have configured filterable attributes.",
"code": "invalid_search_distinct",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_search_distinct"
}
"###);
let (task, _) = index.update_settings_filterable_attributes(json!(["color", "machin"])).await; let (task, _) = index.update_settings_filterable_attributes(json!(["color", "machin"])).await;
index.wait_task(task.uid()).await; index.wait_task(task.uid()).await;
let expected_response = json!({
"message": format!("Index `{}`: Attribute `doggo.truc` is not filterable and thus, cannot be used as distinct attribute. Available filterable attributes are: `color, machin`.", index.uid),
"code": "invalid_search_distinct",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_search_distinct"
});
let (response, code) = let (response, code) =
index.search_post(json!({"page": 0, "hitsPerPage": 2, "distinct": "doggo.truc"})).await; index.search_post(json!({"page": 0, "hitsPerPage": 2, "distinct": "doggo.truc"})).await;
snapshot!(code, @"400 Bad Request"); assert_eq!(response, expected_response);
snapshot!(response, @r###" assert_eq!(code, 400);
{
"message": "Attribute `doggo.truc` is not filterable and thus, cannot be used as distinct attribute. Available filterable attributes are: `color, machin`.",
"code": "invalid_search_distinct",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_search_distinct"
}
"###);
let (task, _) = index.update_settings_displayed_attributes(json!(["color"])).await; let (task, _) = index.update_settings_displayed_attributes(json!(["color"])).await;
index.wait_task(task.uid()).await; index.wait_task(task.uid()).await;
let expected_response = json!({
"message": format!("Index `{}`: Attribute `doggo.truc` is not filterable and thus, cannot be used as distinct attribute. Available filterable attributes are: `color, <..hidden-attributes>`.", index.uid),
"code": "invalid_search_distinct",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_search_distinct"
});
let (response, code) = let (response, code) =
index.search_post(json!({"page": 0, "hitsPerPage": 2, "distinct": "doggo.truc"})).await; index.search_post(json!({"page": 0, "hitsPerPage": 2, "distinct": "doggo.truc"})).await;
snapshot!(code, @"400 Bad Request"); assert_eq!(response, expected_response);
snapshot!(response, @r###" assert_eq!(code, 400);
{
"message": "Attribute `doggo.truc` is not filterable and thus, cannot be used as distinct attribute. Available filterable attributes are: `color, <..hidden-attributes>`.",
"code": "invalid_search_distinct",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_search_distinct"
}
"###);
let (response, code) = let (response, code) =
index.search_post(json!({"page": 0, "hitsPerPage": 2, "distinct": true})).await; index.search_post(json!({"page": 0, "hitsPerPage": 2, "distinct": true})).await;


@ -57,6 +57,116 @@ async fn simple_facet_search() {
assert_eq!(response["facetHits"].as_array().unwrap().len(), 1); assert_eq!(response["facetHits"].as_array().unwrap().len(), 1);
} }
#[actix_rt::test]
async fn simple_facet_search_on_movies() {
let server = Server::new().await;
let index = server.index("test");
let documents = json!([
{
"id": 1,
"title": "Carol",
"genres": [
"Romance",
"Drama"
],
"color": [
"red"
],
"platforms": [
"MacOS",
"Linux",
"Windows"
]
},
{
"id": 2,
"title": "Wonder Woman",
"genres": [
"Action",
"Adventure"
],
"color": [
"green"
],
"platforms": [
"MacOS"
]
},
{
"id": 3,
"title": "Life of Pi",
"genres": [
"Adventure",
"Drama"
],
"color": [
"blue"
],
"platforms": [
"Windows"
]
},
{
"id": 4,
"title": "Mad Max: Fury Road",
"genres": [
"Adventure",
"Science Fiction"
],
"color": [
"red"
],
"platforms": [
"MacOS",
"Linux"
]
},
{
"id": 5,
"title": "Moana",
"genres": [
"Fantasy",
"Action"
],
"color": [
"red"
],
"platforms": [
"Windows"
]
},
{
"id": 6,
"title": "Philadelphia",
"genres": [
"Drama"
],
"color": [
"blue"
],
"platforms": [
"MacOS",
"Linux",
"Windows"
]
}
]);
let (response, code) =
index.update_settings_filterable_attributes(json!(["genres", "color"])).await;
assert_eq!(202, code, "{:?}", response);
index.wait_task(response.uid()).await;
let (response, _code) = index.add_documents(documents, None).await;
index.wait_task(response.uid()).await;
let (response, code) =
index.facet_search(json!({"facetQuery": "", "facetName": "genres", "q": "" })).await;
assert_eq!(code, 200, "{}", response);
snapshot!(response["facetHits"], @r###"[{"value":"Action","count":2},{"value":"Adventure","count":3},{"value":"Drama","count":3},{"value":"Fantasy","count":1},{"value":"Romance","count":1},{"value":"Science Fiction","count":1}]"###);
}
#[actix_rt::test] #[actix_rt::test]
async fn advanced_facet_search() { async fn advanced_facet_search() {
let server = Server::new().await; let server = Server::new().await;
@ -221,8 +331,15 @@ async fn add_documents_and_deactivate_facet_search() {
let (response, code) = let (response, code) =
index.facet_search(json!({"facetName": "genres", "facetQuery": "a"})).await; index.facet_search(json!({"facetName": "genres", "facetQuery": "a"})).await;
assert_eq!(code, 200, "{}", response); assert_eq!(code, 400, "{}", response);
assert_eq!(dbg!(response)["facetHits"].as_array().unwrap().len(), 0); snapshot!(response, @r###"
{
"message": "The facet search is disabled for this index",
"code": "facet_search_disabled",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#facet_search_disabled"
}
"###);
} }
#[actix_rt::test] #[actix_rt::test]
@ -245,8 +362,15 @@ async fn deactivate_facet_search_and_add_documents() {
let (response, code) = let (response, code) =
index.facet_search(json!({"facetName": "genres", "facetQuery": "a"})).await; index.facet_search(json!({"facetName": "genres", "facetQuery": "a"})).await;
assert_eq!(code, 200, "{}", response); assert_eq!(code, 400, "{}", response);
assert_eq!(dbg!(response)["facetHits"].as_array().unwrap().len(), 0); snapshot!(response, @r###"
{
"message": "The facet search is disabled for this index",
"code": "facet_search_disabled",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#facet_search_disabled"
}
"###);
} }
#[actix_rt::test] #[actix_rt::test]


@ -1070,7 +1070,7 @@ async fn federation_one_query_error() {
snapshot!(code, @"400 Bad Request"); snapshot!(code, @"400 Bad Request");
snapshot!(json_string!(response), @r###" snapshot!(json_string!(response), @r###"
{ {
"message": "Inside `.queries[1]`: Attribute `title` is not filterable. This index does not have configured filterable attributes.\n1:6 title = toto", "message": "Inside `.queries[1]`: Index `nested`: Attribute `title` is not filterable. This index does not have configured filterable attributes.\n1:6 title = toto",
"code": "invalid_search_filter", "code": "invalid_search_filter",
"type": "invalid_request", "type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_search_filter" "link": "https://docs.meilisearch.com/errors#invalid_search_filter"
@ -1102,7 +1102,7 @@ async fn federation_one_query_sort_error() {
snapshot!(code, @"400 Bad Request"); snapshot!(code, @"400 Bad Request");
snapshot!(json_string!(response), @r###" snapshot!(json_string!(response), @r###"
{ {
"message": "Inside `.queries[1]`: Attribute `doggos` is not sortable. This index does not have configured sortable attributes.", "message": "Inside `.queries[1]`: Index `nested`: Attribute `doggos` is not sortable. This index does not have configured sortable attributes.",
"code": "invalid_search_sort", "code": "invalid_search_sort",
"type": "invalid_request", "type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_search_sort" "link": "https://docs.meilisearch.com/errors#invalid_search_sort"
@ -1166,7 +1166,7 @@ async fn federation_multiple_query_errors() {
snapshot!(code, @"400 Bad Request"); snapshot!(code, @"400 Bad Request");
snapshot!(json_string!(response), @r###" snapshot!(json_string!(response), @r###"
{ {
"message": "Inside `.queries[0]`: Attribute `title` is not filterable. This index does not have configured filterable attributes.\n1:6 title = toto", "message": "Inside `.queries[0]`: Index `test`: Attribute `title` is not filterable. This index does not have configured filterable attributes.\n1:6 title = toto",
"code": "invalid_search_filter", "code": "invalid_search_filter",
"type": "invalid_request", "type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_search_filter" "link": "https://docs.meilisearch.com/errors#invalid_search_filter"
@ -1198,7 +1198,7 @@ async fn federation_multiple_query_sort_errors() {
snapshot!(code, @"400 Bad Request"); snapshot!(code, @"400 Bad Request");
snapshot!(json_string!(response), @r###" snapshot!(json_string!(response), @r###"
{ {
"message": "Inside `.queries[0]`: Attribute `title` is not sortable. This index does not have configured sortable attributes.", "message": "Inside `.queries[0]`: Index `test`: Attribute `title` is not sortable. This index does not have configured sortable attributes.",
"code": "invalid_search_sort", "code": "invalid_search_sort",
"type": "invalid_request", "type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_search_sort" "link": "https://docs.meilisearch.com/errors#invalid_search_sort"
@ -1231,7 +1231,7 @@ async fn federation_multiple_query_errors_interleaved() {
snapshot!(code, @"400 Bad Request"); snapshot!(code, @"400 Bad Request");
snapshot!(json_string!(response), @r###" snapshot!(json_string!(response), @r###"
{ {
"message": "Inside `.queries[1]`: Attribute `doggos` is not filterable. This index does not have configured filterable attributes.\n1:7 doggos IN [intel, kefir]", "message": "Inside `.queries[1]`: Index `nested`: Attribute `doggos` is not filterable. This index does not have configured filterable attributes.\n1:7 doggos IN [intel, kefir]",
"code": "invalid_search_filter", "code": "invalid_search_filter",
"type": "invalid_request", "type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_search_filter" "link": "https://docs.meilisearch.com/errors#invalid_search_filter"
@ -1264,7 +1264,7 @@ async fn federation_multiple_query_sort_errors_interleaved() {
snapshot!(code, @"400 Bad Request"); snapshot!(code, @"400 Bad Request");
snapshot!(json_string!(response), @r###" snapshot!(json_string!(response), @r###"
{ {
"message": "Inside `.queries[1]`: Attribute `doggos` is not sortable. This index does not have configured sortable attributes.", "message": "Inside `.queries[1]`: Index `nested`: Attribute `doggos` is not sortable. This index does not have configured sortable attributes.",
"code": "invalid_search_sort", "code": "invalid_search_sort",
"type": "invalid_request", "type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_search_sort" "link": "https://docs.meilisearch.com/errors#invalid_search_sort"


@ -79,7 +79,7 @@ async fn similar_bad_id() {
snapshot!(code, @"400 Bad Request"); snapshot!(code, @"400 Bad Request");
snapshot!(json_string!(response), @r###" snapshot!(json_string!(response), @r###"
{ {
"message": "Invalid value at `.id`: the value of `id` is invalid. A document identifier can be of type integer or string, only composed of alphanumeric characters (a-z A-Z 0-9), hyphens (-) and underscores (_), and can not be more than 512 bytes.", "message": "Invalid value at `.id`: the value of `id` is invalid. A document identifier can be of type integer or string, only composed of alphanumeric characters (a-z A-Z 0-9), hyphens (-) and underscores (_), and can not be more than 511 bytes.",
"code": "invalid_similar_id", "code": "invalid_similar_id",
"type": "invalid_request", "type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_similar_id" "link": "https://docs.meilisearch.com/errors#invalid_similar_id"
@ -172,7 +172,7 @@ async fn similar_invalid_id() {
snapshot!(code, @"400 Bad Request"); snapshot!(code, @"400 Bad Request");
snapshot!(json_string!(response), @r###" snapshot!(json_string!(response), @r###"
{ {
"message": "Invalid value at `.id`: the value of `id` is invalid. A document identifier can be of type integer or string, only composed of alphanumeric characters (a-z A-Z 0-9), hyphens (-) and underscores (_), and can not be more than 512 bytes.", "message": "Invalid value at `.id`: the value of `id` is invalid. A document identifier can be of type integer or string, only composed of alphanumeric characters (a-z A-Z 0-9), hyphens (-) and underscores (_), and can not be more than 511 bytes.",
"code": "invalid_similar_id", "code": "invalid_similar_id",
"type": "invalid_request", "type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_similar_id" "link": "https://docs.meilisearch.com/errors#invalid_similar_id"


@ -129,11 +129,11 @@ async fn perform_on_demand_snapshot() {
index.load_test_set().await; index.load_test_set().await;
server.index("doggo").create(Some("bone")).await; let (task, _) = server.index("doggo").create(Some("bone")).await;
index.wait_task(2).await; index.wait_task(task.uid()).await.succeeded();
server.index("doggo").create(Some("bone")).await; let (task, _) = server.index("doggo").create(Some("bone")).await;
index.wait_task(2).await; index.wait_task(task.uid()).await.failed();
let (task, code) = server.create_snapshot().await; let (task, code) = server.create_snapshot().await;
snapshot!(code, @"202 Accepted"); snapshot!(code, @"202 Accepted");


@ -448,7 +448,7 @@ async fn test_summarized_delete_documents_by_filter() {
"originalFilter": "\"doggo = bernese\"" "originalFilter": "\"doggo = bernese\""
}, },
"error": { "error": {
"message": "Attribute `doggo` is not filterable. This index does not have configured filterable attributes.\n1:6 doggo = bernese", "message": "Index `test`: Attribute `doggo` is not filterable. This index does not have configured filterable attributes.\n1:6 doggo = bernese",
"code": "invalid_document_filter", "code": "invalid_document_filter",
"type": "invalid_request", "type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_document_filter" "link": "https://docs.meilisearch.com/errors#invalid_document_filter"


@ -318,7 +318,7 @@ async fn try_to_disable_binary_quantization() {
} }
}, },
"error": { "error": {
"message": "`.embedders.manual.binaryQuantized`: Cannot disable the binary quantization.\n - Note: Binary quantization is a lossy operation that cannot be reverted.\n - Hint: Add a new embedder that is non-quantized and regenerate the vectors.", "message": "Index `doggo`: `.embedders.manual.binaryQuantized`: Cannot disable the binary quantization.\n - Note: Binary quantization is a lossy operation that cannot be reverted.\n - Hint: Add a new embedder that is non-quantized and regenerate the vectors.",
"code": "invalid_settings_embedders", "code": "invalid_settings_embedders",
"type": "invalid_request", "type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_settings_embedders" "link": "https://docs.meilisearch.com/errors#invalid_settings_embedders"


@ -250,7 +250,7 @@ async fn user_provided_embeddings_error() {
"indexedDocuments": 0 "indexedDocuments": 0
}, },
"error": { "error": {
"message": "Bad embedder configuration in the document with id: `0`. Missing field `._vectors.manual.regenerate`\n - note: `._vectors.manual` must be an array of floats, an array of arrays of floats, or an object with field `regenerate`", "message": "Index `doggo`: Bad embedder configuration in the document with id: `0`. Missing field `._vectors.manual.regenerate`\n - note: `._vectors.manual` must be an array of floats, an array of arrays of floats, or an object with field `regenerate`",
"code": "invalid_vectors_type", "code": "invalid_vectors_type",
"type": "invalid_request", "type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_vectors_type" "link": "https://docs.meilisearch.com/errors#invalid_vectors_type"
@ -280,7 +280,7 @@ async fn user_provided_embeddings_error() {
"indexedDocuments": 0 "indexedDocuments": 0
}, },
"error": { "error": {
"message": "Bad embedder configuration in the document with id: `0`. Missing field `._vectors.manual.regenerate`\n - note: `._vectors.manual` must be an array of floats, an array of arrays of floats, or an object with field `regenerate`", "message": "Index `doggo`: Bad embedder configuration in the document with id: `0`. Missing field `._vectors.manual.regenerate`\n - note: `._vectors.manual` must be an array of floats, an array of arrays of floats, or an object with field `regenerate`",
"code": "invalid_vectors_type", "code": "invalid_vectors_type",
"type": "invalid_request", "type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_vectors_type" "link": "https://docs.meilisearch.com/errors#invalid_vectors_type"
@ -311,7 +311,7 @@ async fn user_provided_embeddings_error() {
"indexedDocuments": 0 "indexedDocuments": 0
}, },
"error": { "error": {
"message": "Bad embedder configuration in the document with id: `0`. Could not parse `._vectors.manual.regenerate`: invalid type: string \"yes please\", expected a boolean at line 1 column 26", "message": "Index `doggo`: Bad embedder configuration in the document with id: `0`. Could not parse `._vectors.manual.regenerate`: invalid type: string \"yes please\", expected a boolean at line 1 column 26",
"code": "invalid_vectors_type", "code": "invalid_vectors_type",
"type": "invalid_request", "type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_vectors_type" "link": "https://docs.meilisearch.com/errors#invalid_vectors_type"
@ -340,7 +340,7 @@ async fn user_provided_embeddings_error() {
"indexedDocuments": 0 "indexedDocuments": 0
}, },
"error": { "error": {
"message": "Bad embedder configuration in the document with id: `0`. Invalid value type at `._vectors.manual.embeddings`: expected null or an array, but found a boolean: `true`", "message": "Index `doggo`: Bad embedder configuration in the document with id: `0`. Invalid value type at `._vectors.manual.embeddings`: expected null or an array, but found a boolean: `true`",
"code": "invalid_vectors_type", "code": "invalid_vectors_type",
"type": "invalid_request", "type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_vectors_type" "link": "https://docs.meilisearch.com/errors#invalid_vectors_type"
@ -369,7 +369,7 @@ async fn user_provided_embeddings_error() {
"indexedDocuments": 0 "indexedDocuments": 0
}, },
"error": { "error": {
"message": "Bad embedder configuration in the document with id: `0`. Invalid value type at `._vectors.manual.embeddings[0]`: expected a number or an array, but found a boolean: `true`", "message": "Index `doggo`: Bad embedder configuration in the document with id: `0`. Invalid value type at `._vectors.manual.embeddings[0]`: expected a number or an array, but found a boolean: `true`",
"code": "invalid_vectors_type", "code": "invalid_vectors_type",
"type": "invalid_request", "type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_vectors_type" "link": "https://docs.meilisearch.com/errors#invalid_vectors_type"
@ -398,7 +398,7 @@ async fn user_provided_embeddings_error() {
"indexedDocuments": 0 "indexedDocuments": 0
}, },
"error": { "error": {
"message": "Bad embedder configuration in the document with id: `0`. Invalid value type at `._vectors.manual.embeddings[0][0]`: expected a number, but found a boolean: `true`", "message": "Index `doggo`: Bad embedder configuration in the document with id: `0`. Invalid value type at `._vectors.manual.embeddings[0][0]`: expected a number, but found a boolean: `true`",
"code": "invalid_vectors_type", "code": "invalid_vectors_type",
"type": "invalid_request", "type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_vectors_type" "link": "https://docs.meilisearch.com/errors#invalid_vectors_type"
@ -440,7 +440,7 @@ async fn user_provided_embeddings_error() {
"indexedDocuments": 0 "indexedDocuments": 0
}, },
"error": { "error": {
"message": "Bad embedder configuration in the document with id: `0`. Invalid value type at `._vectors.manual.embeddings[1]`: expected a number, but found an array: `[0.2,0.3]`", "message": "Index `doggo`: Bad embedder configuration in the document with id: `0`. Invalid value type at `._vectors.manual.embeddings[1]`: expected a number, but found an array: `[0.2,0.3]`",
"code": "invalid_vectors_type", "code": "invalid_vectors_type",
"type": "invalid_request", "type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_vectors_type" "link": "https://docs.meilisearch.com/errors#invalid_vectors_type"
@ -469,7 +469,7 @@ async fn user_provided_embeddings_error() {
"indexedDocuments": 0 "indexedDocuments": 0
}, },
"error": { "error": {
"message": "Bad embedder configuration in the document with id: `0`. Invalid value type at `._vectors.manual.embeddings[1]`: expected an array, but found a number: `0.3`", "message": "Index `doggo`: Bad embedder configuration in the document with id: `0`. Invalid value type at `._vectors.manual.embeddings[1]`: expected an array, but found a number: `0.3`",
"code": "invalid_vectors_type", "code": "invalid_vectors_type",
"type": "invalid_request", "type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_vectors_type" "link": "https://docs.meilisearch.com/errors#invalid_vectors_type"
@ -498,7 +498,7 @@ async fn user_provided_embeddings_error() {
"indexedDocuments": 0 "indexedDocuments": 0
}, },
"error": { "error": {
"message": "Bad embedder configuration in the document with id: `0`. Invalid value type at `._vectors.manual.embeddings[0][1]`: expected a number, but found a boolean: `true`", "message": "Index `doggo`: Bad embedder configuration in the document with id: `0`. Invalid value type at `._vectors.manual.embeddings[0][1]`: expected a number, but found a boolean: `true`",
"code": "invalid_vectors_type", "code": "invalid_vectors_type",
"type": "invalid_request", "type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_vectors_type" "link": "https://docs.meilisearch.com/errors#invalid_vectors_type"
@ -539,7 +539,7 @@ async fn user_provided_vectors_error() {
"indexedDocuments": 0 "indexedDocuments": 0
}, },
"error": { "error": {
"message": "While embedding documents for embedder `manual`: no vectors provided for document `40` and at least 4 other document(s)\n- Note: `manual` has `source: userProvided`, so documents must provide embeddings as an array in `_vectors.manual`.\n- Hint: opt-out for a document with `_vectors.manual: null`", "message": "Index `doggo`: While embedding documents for embedder `manual`: no vectors provided for document `40` and at least 4 other document(s)\n- Note: `manual` has `source: userProvided`, so documents must provide embeddings as an array in `_vectors.manual`.\n- Hint: opt-out for a document with `_vectors.manual: null`",
"code": "vector_embedding_error", "code": "vector_embedding_error",
"type": "invalid_request", "type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#vector_embedding_error" "link": "https://docs.meilisearch.com/errors#vector_embedding_error"
@ -569,7 +569,7 @@ async fn user_provided_vectors_error() {
"indexedDocuments": 0 "indexedDocuments": 0
}, },
"error": { "error": {
"message": "While embedding documents for embedder `manual`: no vectors provided for document `42`\n- Note: `manual` has `source: userProvided`, so documents must provide embeddings as an array in `_vectors.manual`.\n- Hint: try replacing `_vector` by `_vectors` in 1 document(s).", "message": "Index `doggo`: While embedding documents for embedder `manual`: no vectors provided for document `42`\n- Note: `manual` has `source: userProvided`, so documents must provide embeddings as an array in `_vectors.manual`.\n- Hint: try replacing `_vector` by `_vectors` in 1 document(s).",
"code": "vector_embedding_error", "code": "vector_embedding_error",
"type": "invalid_request", "type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#vector_embedding_error" "link": "https://docs.meilisearch.com/errors#vector_embedding_error"
@ -599,7 +599,7 @@ async fn user_provided_vectors_error() {
"indexedDocuments": 0 "indexedDocuments": 0
}, },
"error": { "error": {
"message": "While embedding documents for embedder `manual`: no vectors provided for document `42`\n- Note: `manual` has `source: userProvided`, so documents must provide embeddings as an array in `_vectors.manual`.\n- Hint: try replacing `_vectors.manaul` by `_vectors.manual` in 1 document(s).", "message": "Index `doggo`: While embedding documents for embedder `manual`: no vectors provided for document `42`\n- Note: `manual` has `source: userProvided`, so documents must provide embeddings as an array in `_vectors.manual`.\n- Hint: try replacing `_vectors.manaul` by `_vectors.manual` in 1 document(s).",
"code": "vector_embedding_error", "code": "vector_embedding_error",
"type": "invalid_request", "type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#vector_embedding_error" "link": "https://docs.meilisearch.com/errors#vector_embedding_error"


@ -713,7 +713,7 @@ async fn bad_api_key() {
} }
}, },
"error": { "error": {
"message": "While embedding documents for embedder `default`: user error: could not authenticate against OpenAI server\n - server replied with `{\"error\":{\"message\":\"Incorrect API key provided: Bearer doggo. You can find your API key at https://platform.openai.com/account/api-keys.\",\"type\":\"invalid_request_error\",\"param\":null,\"code\":\"invalid_api_key\"}}`\n - Hint: Check the `apiKey` parameter in the embedder configuration, and the `MEILI_OPENAI_API_KEY` and `OPENAI_API_KEY` environment variables", "message": "Index `doggo`: While embedding documents for embedder `default`: user error: could not authenticate against OpenAI server\n - server replied with `{\"error\":{\"message\":\"Incorrect API key provided: Bearer doggo. You can find your API key at https://platform.openai.com/account/api-keys.\",\"type\":\"invalid_request_error\",\"param\":null,\"code\":\"invalid_api_key\"}}`\n - Hint: Check the `apiKey` parameter in the embedder configuration, and the `MEILI_OPENAI_API_KEY` and `OPENAI_API_KEY` environment variables",
"code": "vector_embedding_error", "code": "vector_embedding_error",
"type": "invalid_request", "type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#vector_embedding_error" "link": "https://docs.meilisearch.com/errors#vector_embedding_error"
@ -757,7 +757,7 @@ async fn bad_api_key() {
} }
}, },
"error": { "error": {
"message": "While embedding documents for embedder `default`: user error: could not authenticate against OpenAI server\n - server replied with `{\"error\":{\"message\":\"You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.\",\"type\":\"invalid_request_error\",\"param\":null,\"code\":null}}`\n - Hint: Check the `apiKey` parameter in the embedder configuration, and the `MEILI_OPENAI_API_KEY` and `OPENAI_API_KEY` environment variables", "message": "Index `doggo`: While embedding documents for embedder `default`: user error: could not authenticate against OpenAI server\n - server replied with `{\"error\":{\"message\":\"You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.\",\"type\":\"invalid_request_error\",\"param\":null,\"code\":null}}`\n - Hint: Check the `apiKey` parameter in the embedder configuration, and the `MEILI_OPENAI_API_KEY` and `OPENAI_API_KEY` environment variables",
"code": "vector_embedding_error", "code": "vector_embedding_error",
"type": "invalid_request", "type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#vector_embedding_error" "link": "https://docs.meilisearch.com/errors#vector_embedding_error"


@ -985,7 +985,7 @@ async fn bad_settings() {
} }
}, },
"error": { "error": {
"message": "Error while generating embeddings: runtime error: could not determine model dimensions:\n - test embedding failed with runtime error: error extracting embeddings from the response:\n - in `response`, while extracting a single \"{{embedding}}\", expected `response` to be an array of numbers, but failed to parse server response:\n - invalid type: map, expected a sequence", "message": "Index `doggo`: Error while generating embeddings: runtime error: could not determine model dimensions:\n - test embedding failed with runtime error: error extracting embeddings from the response:\n - in `response`, while extracting a single \"{{embedding}}\", expected `response` to be an array of numbers, but failed to parse server response:\n - invalid type: map, expected a sequence",
"code": "vector_embedding_error", "code": "vector_embedding_error",
"type": "invalid_request", "type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#vector_embedding_error" "link": "https://docs.meilisearch.com/errors#vector_embedding_error"
@ -1025,7 +1025,7 @@ async fn bad_settings() {
"indexedDocuments": 0 "indexedDocuments": 0
}, },
"error": { "error": {
"message": "While embedding documents for embedder `rest`: runtime error: was expecting embeddings of dimension `2`, got embeddings of dimensions `3`", "message": "Index `doggo`: While embedding documents for embedder `rest`: runtime error: was expecting embeddings of dimension `2`, got embeddings of dimensions `3`",
"code": "vector_embedding_error", "code": "vector_embedding_error",
"type": "invalid_request", "type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#vector_embedding_error" "link": "https://docs.meilisearch.com/errors#vector_embedding_error"
@ -1178,7 +1178,7 @@ async fn server_returns_bad_request() {
} }
}, },
"error": { "error": {
"message": "Error while generating embeddings: runtime error: could not determine model dimensions:\n - test embedding failed with user error: sent a bad request to embedding server\n - Hint: check that the `request` in the embedder configuration matches the remote server's API\n - server replied with `{\"error\":\"Invalid request: invalid type: string \\\"test\\\", expected struct MultipleRequest at line 1 column 6\"}`", "message": "Index `doggo`: Error while generating embeddings: runtime error: could not determine model dimensions:\n - test embedding failed with user error: sent a bad request to embedding server\n - Hint: check that the `request` in the embedder configuration matches the remote server's API\n - server replied with `{\"error\":\"Invalid request: invalid type: string \\\"test\\\", expected struct MultipleRequest at line 1 column 6\"}`",
"code": "vector_embedding_error", "code": "vector_embedding_error",
"type": "invalid_request", "type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#vector_embedding_error" "link": "https://docs.meilisearch.com/errors#vector_embedding_error"
@ -1247,7 +1247,7 @@ async fn server_returns_bad_request() {
"indexedDocuments": 0 "indexedDocuments": 0
}, },
"error": { "error": {
"message": "While embedding documents for embedder `rest`: user error: sent a bad request to embedding server\n - Hint: check that the `request` in the embedder configuration matches the remote server's API\n - server replied with `{\"error\":\"Invalid request: invalid type: string \\\"name: kefir\\\\n\\\", expected struct MultipleRequest at line 1 column 15\"}`", "message": "Index `doggo`: While embedding documents for embedder `rest`: user error: sent a bad request to embedding server\n - Hint: check that the `request` in the embedder configuration matches the remote server's API\n - server replied with `{\"error\":\"Invalid request: invalid type: string \\\"name: kefir\\\\n\\\", expected struct MultipleRequest at line 1 column 15\"}`",
"code": "vector_embedding_error", "code": "vector_embedding_error",
"type": "invalid_request", "type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#vector_embedding_error" "link": "https://docs.meilisearch.com/errors#vector_embedding_error"
@ -1306,7 +1306,7 @@ async fn server_returns_bad_response() {
} }
}, },
"error": { "error": {
"message": "Error while generating embeddings: runtime error: could not determine model dimensions:\n - test embedding failed with runtime error: error extracting embeddings from the response:\n - in `response`, while extracting the array of \"{{embedding}}\"s, configuration expects `response` to be an array with at least 1 item(s) but server sent an object with 1 field(s)", "message": "Index `doggo`: Error while generating embeddings: runtime error: could not determine model dimensions:\n - test embedding failed with runtime error: error extracting embeddings from the response:\n - in `response`, while extracting the array of \"{{embedding}}\"s, configuration expects `response` to be an array with at least 1 item(s) but server sent an object with 1 field(s)",
"code": "vector_embedding_error", "code": "vector_embedding_error",
"type": "invalid_request", "type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#vector_embedding_error" "link": "https://docs.meilisearch.com/errors#vector_embedding_error"
@ -1362,7 +1362,7 @@ async fn server_returns_bad_response() {
} }
}, },
"error": { "error": {
"message": "Error while generating embeddings: runtime error: could not determine model dimensions:\n - test embedding failed with runtime error: error extracting embeddings from the response:\n - in `response`, while extracting item #0 from the array of \"{{embedding}}\"s, expected `response` to be an array of numbers, but failed to parse server response:\n - invalid type: map, expected a sequence", "message": "Index `doggo`: Error while generating embeddings: runtime error: could not determine model dimensions:\n - test embedding failed with runtime error: error extracting embeddings from the response:\n - in `response`, while extracting item #0 from the array of \"{{embedding}}\"s, expected `response` to be an array of numbers, but failed to parse server response:\n - invalid type: map, expected a sequence",
"code": "vector_embedding_error", "code": "vector_embedding_error",
"type": "invalid_request", "type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#vector_embedding_error" "link": "https://docs.meilisearch.com/errors#vector_embedding_error"
@ -1414,7 +1414,7 @@ async fn server_returns_bad_response() {
} }
}, },
"error": { "error": {
"message": "Error while generating embeddings: runtime error: could not determine model dimensions:\n - test embedding failed with runtime error: error extracting embeddings from the response:\n - in `response.output`, while extracting a single \"{{embedding}}\", expected `output` to be an array of numbers, but failed to parse server response:\n - invalid type: map, expected f32", "message": "Index `doggo`: Error while generating embeddings: runtime error: could not determine model dimensions:\n - test embedding failed with runtime error: error extracting embeddings from the response:\n - in `response.output`, while extracting a single \"{{embedding}}\", expected `output` to be an array of numbers, but failed to parse server response:\n - invalid type: map, expected f32",
"code": "vector_embedding_error", "code": "vector_embedding_error",
"type": "invalid_request", "type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#vector_embedding_error" "link": "https://docs.meilisearch.com/errors#vector_embedding_error"
@ -1478,7 +1478,7 @@ async fn server_returns_bad_response() {
} }
}, },
"error": { "error": {
"message": "Error while generating embeddings: runtime error: could not determine model dimensions:\n - test embedding failed with runtime error: error extracting embeddings from the response:\n - in `response.embedding`, while extracting item #0 from the array of \"{{embedding}}\"s, configuration expects `embedding` to be an object with key `data` but server sent an array of size 3", "message": "Index `doggo`: Error while generating embeddings: runtime error: could not determine model dimensions:\n - test embedding failed with runtime error: error extracting embeddings from the response:\n - in `response.embedding`, while extracting item #0 from the array of \"{{embedding}}\"s, configuration expects `embedding` to be an object with key `data` but server sent an array of size 3",
"code": "vector_embedding_error", "code": "vector_embedding_error",
"type": "invalid_request", "type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#vector_embedding_error" "link": "https://docs.meilisearch.com/errors#vector_embedding_error"
@ -1542,7 +1542,7 @@ async fn server_returns_bad_response() {
} }
}, },
"error": { "error": {
"message": "Error while generating embeddings: runtime error: could not determine model dimensions:\n - test embedding failed with runtime error: error extracting embeddings from the response:\n - in `response.output[0]`, while extracting a single \"{{embedding}}\", configuration expects key \"embeddings\", which is missing in response\n - Hint: item #0 inside `output` has key `embedding`, did you mean `response.output[0].embedding` in embedder configuration?", "message": "Index `doggo`: Error while generating embeddings: runtime error: could not determine model dimensions:\n - test embedding failed with runtime error: error extracting embeddings from the response:\n - in `response.output[0]`, while extracting a single \"{{embedding}}\", configuration expects key \"embeddings\", which is missing in response\n - Hint: item #0 inside `output` has key `embedding`, did you mean `response.output[0].embedding` in embedder configuration?",
"code": "vector_embedding_error", "code": "vector_embedding_error",
"type": "invalid_request", "type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#vector_embedding_error" "link": "https://docs.meilisearch.com/errors#vector_embedding_error"
@ -1908,7 +1908,7 @@ async fn server_custom_header() {
} }
}, },
"error": { "error": {
"message": "Error while generating embeddings: runtime error: could not determine model dimensions:\n - test embedding failed with user error: could not authenticate against embedding server\n - server replied with `{\"error\":\"missing header 'my-nonstandard-auth'\"}`\n - Hint: Check the `apiKey` parameter in the embedder configuration", "message": "Index `doggo`: Error while generating embeddings: runtime error: could not determine model dimensions:\n - test embedding failed with user error: could not authenticate against embedding server\n - server replied with `{\"error\":\"missing header 'my-nonstandard-auth'\"}`\n - Hint: Check the `apiKey` parameter in the embedder configuration",
"code": "vector_embedding_error", "code": "vector_embedding_error",
"type": "invalid_request", "type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#vector_embedding_error" "link": "https://docs.meilisearch.com/errors#vector_embedding_error"
@ -1951,7 +1951,7 @@ async fn server_custom_header() {
} }
}, },
"error": { "error": {
"message": "Error while generating embeddings: runtime error: could not determine model dimensions:\n - test embedding failed with user error: could not authenticate against embedding server\n - server replied with `{\"error\":\"thou shall not pass, Balrog\"}`\n - Hint: Check the `apiKey` parameter in the embedder configuration", "message": "Index `doggo`: Error while generating embeddings: runtime error: could not determine model dimensions:\n - test embedding failed with user error: could not authenticate against embedding server\n - server replied with `{\"error\":\"thou shall not pass, Balrog\"}`\n - Hint: Check the `apiKey` parameter in the embedder configuration",
"code": "vector_embedding_error", "code": "vector_embedding_error",
"type": "invalid_request", "type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#vector_embedding_error" "link": "https://docs.meilisearch.com/errors#vector_embedding_error"
@ -2099,7 +2099,7 @@ async fn searchable_reindex() {
] ]
}, },
"error": { "error": {
"message": "While embedding documents for embedder `rest`: error: received unexpected HTTP 404 from embedding server\n - server replied with `{\"error\":\"text not found\",\"text\":\"breed: patou\\n\"}`", "message": "Index `doggo`: While embedding documents for embedder `rest`: error: received unexpected HTTP 404 from embedding server\n - server replied with `{\"error\":\"text not found\",\"text\":\"breed: patou\\n\"}`",
"code": "vector_embedding_error", "code": "vector_embedding_error",
"type": "invalid_request", "type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#vector_embedding_error" "link": "https://docs.meilisearch.com/errors#vector_embedding_error"


@ -10,12 +10,15 @@ license.workspace = true
[dependencies] [dependencies]
anyhow = "1.0.86" anyhow = "1.0.86"
arroy_v04_to_v05 = { package = "arroy", git = "https://github.com/meilisearch/arroy/", tag = "DO-NOT-DELETE-upgrade-v04-to-v05" }
clap = { version = "4.5.9", features = ["derive"] } clap = { version = "4.5.9", features = ["derive"] }
dump = { path = "../dump" } dump = { path = "../dump" }
file-store = { path = "../file-store" } file-store = { path = "../file-store" }
indexmap = {version = "2.7.0", features = ["serde"]}
meilisearch-auth = { path = "../meilisearch-auth" } meilisearch-auth = { path = "../meilisearch-auth" }
meilisearch-types = { path = "../meilisearch-types" } meilisearch-types = { path = "../meilisearch-types" }
serde = { version = "1.0.209", features = ["derive"] } serde = { version = "1.0.209", features = ["derive"] }
serde_json = {version = "1.0.133", features = ["preserve_order"]}
tempfile = "3.14.0"
time = { version = "0.3.36", features = ["formatting", "parsing", "alloc"] } time = { version = "0.3.36", features = ["formatting", "parsing", "alloc"] }
uuid = { version = "1.10.0", features = ["v4"], default-features = false } uuid = { version = "1.10.0", features = ["v4"], default-features = false }
arroy_v04_to_v05 = { package = "arroy", git = "https://github.com/meilisearch/arroy/", tag = "DO-NOT-DELETE-upgrade-v04-to-v05" }


@ -73,7 +73,7 @@ enum Command {
/// ///
/// Supported upgrade paths: /// Supported upgrade paths:
/// ///
/// - v1.9.x -> v1.10.x -> v1.11.x /// - v1.9.x -> v1.10.x -> v1.11.x -> v1.12.x
OfflineUpgrade { OfflineUpgrade {
#[arg(long)] #[arg(long)]
target_version: String, target_version: String,


@ -1,13 +1,14 @@
mod v1_10; mod v1_10;
mod v1_11; mod v1_11;
mod v1_12;
mod v1_9; mod v1_9;
use std::path::{Path, PathBuf}; use std::path::{Path, PathBuf};
use anyhow::{bail, Context}; use anyhow::{bail, Context};
use meilisearch_types::versioning::create_version_file; use meilisearch_types::versioning::create_version_file;
use v1_10::v1_9_to_v1_10; use v1_10::v1_9_to_v1_10;
use v1_12::v1_11_to_v1_12;
use crate::upgrade::v1_11::v1_10_to_v1_11; use crate::upgrade::v1_11::v1_10_to_v1_11;
@ -22,6 +23,7 @@ impl OfflineUpgrade {
let upgrade_list = [ let upgrade_list = [
(v1_9_to_v1_10 as fn(&Path) -> Result<(), anyhow::Error>, "1", "10", "0"), (v1_9_to_v1_10 as fn(&Path) -> Result<(), anyhow::Error>, "1", "10", "0"),
(v1_10_to_v1_11, "1", "11", "0"), (v1_10_to_v1_11, "1", "11", "0"),
(v1_11_to_v1_12, "1", "12", "0"),
]; ];
let (current_major, current_minor, current_patch) = &self.current_version; let (current_major, current_minor, current_patch) = &self.current_version;
@ -33,6 +35,7 @@ impl OfflineUpgrade {
) { ) {
("1", "9", _) => 0, ("1", "9", _) => 0,
("1", "10", _) => 1, ("1", "10", _) => 1,
("1", "11", _) => 2,
_ => { _ => {
bail!("Unsupported current version {current_major}.{current_minor}.{current_patch}. Can only upgrade from v1.9 and v1.10") bail!("Unsupported current version {current_major}.{current_minor}.{current_patch}. Can only upgrade from v1.9 and v1.10")
} }
@ -43,6 +46,7 @@ impl OfflineUpgrade {
let ends_at = match (target_major.as_str(), target_minor.as_str(), target_patch.as_str()) { let ends_at = match (target_major.as_str(), target_minor.as_str(), target_patch.as_str()) {
("1", "10", _) => 0, ("1", "10", _) => 0,
("1", "11", _) => 1, ("1", "11", _) => 1,
("1", "12", _) => 2,
(major, _, _) if major.starts_with('v') => { (major, _, _) if major.starts_with('v') => {
bail!("Target version must not starts with a `v`. Instead of writing `v1.9.0` write `1.9.0` for example.") bail!("Target version must not starts with a `v`. Instead of writing `v1.9.0` write `1.9.0` for example.")
} }
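Taken together, the two match arms above mean that upgrading a v1.9.x database to a v1.12.x target selects index 0 as the starting point and `ends_at = 2`, so the three migrations run back to back. Below is a hedged, self-contained sketch of that chaining; the driver loop, the stub bodies, and the `./data.ms` path are illustrative assumptions, while the function names and version triplets come from this diff.

use std::path::Path;

// Stubs standing in for the real migrations listed in `upgrade_list` above.
fn v1_9_to_v1_10(_db_path: &Path) -> anyhow::Result<()> { Ok(()) }
fn v1_10_to_v1_11(_db_path: &Path) -> anyhow::Result<()> { Ok(()) }
fn v1_11_to_v1_12(_db_path: &Path) -> anyhow::Result<()> { Ok(()) }

fn main() -> anyhow::Result<()> {
    let upgrade_list = [
        (v1_9_to_v1_10 as fn(&Path) -> anyhow::Result<()>, "1", "10", "0"),
        (v1_10_to_v1_11, "1", "11", "0"),
        (v1_11_to_v1_12, "1", "12", "0"),
    ];
    // current version v1.9.x -> start index 0, target v1.12.x -> ends_at = 2
    let (start_at, ends_at) = (0, 2);
    let db_path = Path::new("./data.ms"); // hypothetical database directory
    for (upgrade, major, minor, patch) in &upgrade_list[start_at..=ends_at] {
        upgrade(db_path)?;
        println!("Migrated to v{major}.{minor}.{patch}");
    }
    Ok(())
}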


@ -1,18 +1,13 @@
use anyhow::bail;
use std::path::Path; use std::path::Path;
use anyhow::Context; use anyhow::{bail, Context};
use meilisearch_types::{ use meilisearch_types::heed::types::{SerdeJson, Str};
heed::{ use meilisearch_types::heed::{Database, Env, EnvOpenOptions, RoTxn, RwTxn, Unspecified};
types::{SerdeJson, Str}, use meilisearch_types::milli::index::{db_name, main_key};
Database, Env, EnvOpenOptions, RoTxn, RwTxn, Unspecified,
},
milli::index::{db_name, main_key},
};
use crate::{try_opening_database, try_opening_poly_database, uuid_codec::UuidCodec};
use super::v1_9; use super::v1_9;
use crate::uuid_codec::UuidCodec;
use crate::{try_opening_database, try_opening_poly_database};
pub type FieldDistribution = std::collections::BTreeMap<String, u64>; pub type FieldDistribution = std::collections::BTreeMap<String, u64>;


@ -7,12 +7,12 @@
use std::path::Path; use std::path::Path;
use anyhow::Context; use anyhow::Context;
use meilisearch_types::{ use meilisearch_types::heed::types::Str;
heed::{types::Str, Database, EnvOpenOptions}, use meilisearch_types::heed::{Database, EnvOpenOptions};
milli::index::db_name, use meilisearch_types::milli::index::db_name;
};
use crate::{try_opening_database, try_opening_poly_database, uuid_codec::UuidCodec}; use crate::uuid_codec::UuidCodec;
use crate::{try_opening_database, try_opening_poly_database};
pub fn v1_10_to_v1_11(db_path: &Path) -> anyhow::Result<()> { pub fn v1_10_to_v1_11(db_path: &Path) -> anyhow::Result<()> {
println!("Upgrading from v1.10.0 to v1.11.0"); println!("Upgrading from v1.10.0 to v1.11.0");


@ -0,0 +1,79 @@
//! The breaking changes that happened between the v1.11 and the v1.12 are:
//! - The new indexer changed the update files format from OBKV to ndjson. https://github.com/meilisearch/meilisearch/pull/4900
use std::io::BufWriter;
use std::path::Path;
use anyhow::Context;
use file_store::FileStore;
use indexmap::IndexMap;
use meilisearch_types::milli::documents::DocumentsBatchReader;
use serde_json::value::RawValue;
use tempfile::NamedTempFile;
pub fn v1_11_to_v1_12(db_path: &Path) -> anyhow::Result<()> {
println!("Upgrading from v1.11.0 to v1.12.0");
convert_update_files(db_path)?;
Ok(())
}
/// Convert the update files from OBKV to ndjson format.
///
/// 1) List all the update files using the file store.
/// 2) For each update file, read the update file into a DocumentsBatchReader.
/// 3) For each document in the update file, convert the document to a JSON object.
/// 4) Write the JSON object to a tmp file in the update files directory.
/// 5) Persist the tmp file replacing the old update file.
fn convert_update_files(db_path: &Path) -> anyhow::Result<()> {
let update_files_dir_path = db_path.join("update_files");
let file_store = FileStore::new(&update_files_dir_path).with_context(|| {
format!("while creating file store for update files dir {update_files_dir_path:?}")
})?;
for uuid in file_store.all_uuids().context("while retrieving uuids from file store")? {
let uuid = uuid.context("while retrieving uuid from file store")?;
let update_file_path = file_store.get_update_path(uuid);
let update_file = file_store
.get_update(uuid)
.with_context(|| format!("while getting update file for uuid {uuid:?}"))?;
let mut file =
NamedTempFile::new_in(&update_files_dir_path).map(BufWriter::new).with_context(
|| format!("while creating bufwriter for update file {update_file_path:?}"),
)?;
let reader = DocumentsBatchReader::from_reader(update_file).with_context(|| {
format!("while creating documents batch reader for update file {update_file_path:?}")
})?;
let (mut cursor, index) = reader.into_cursor_and_fields_index();
while let Some(document) = cursor.next_document().with_context(|| {
format!(
"while reading documents from batch reader for update file {update_file_path:?}"
)
})? {
let mut json_document = IndexMap::new();
for (fid, value) in document {
let field_name = index
.name(fid)
.with_context(|| format!("while getting field name for fid {fid} for update file {update_file_path:?}"))?;
let value: &RawValue = serde_json::from_slice(value)?;
json_document.insert(field_name, value);
}
serde_json::to_writer(&mut file, &json_document)?;
}
let file = file.into_inner().map_err(|e| e.into_error()).context(format!(
"while flushing update file bufwriter for update file {update_file_path:?}"
))?;
let _ = file
// atomically replace the obkv file with the rewritten NDJSON file
.persist(&update_file_path)
.with_context(|| format!("while persisting update file {update_file_path:?}"))?;
}
Ok(())
}
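Once converted, an update file is plain newline-delimited JSON, one object per document, so it can be streamed back with serde_json alone. A minimal sketch of reading such a file (the helper and its path argument are hypothetical, not part of the PR):

use std::fs::File;
use std::io::BufReader;

use serde_json::{Deserializer, Map, Value};

fn read_converted_update_file(path: &std::path::Path) -> anyhow::Result<()> {
    // Each document is a standalone JSON object; the stream deserializer walks them one by one.
    let reader = BufReader::new(File::open(path)?);
    for document in Deserializer::from_reader(reader).into_iter::<Map<String, Value>>() {
        let document = document?;
        println!("document with {} fields", document.len());
    }
    Ok(())
}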


@ -91,8 +91,8 @@ ureq = { version = "2.10.0", features = ["json"] }
url = "2.5.2" url = "2.5.2"
rayon-par-bridge = "0.1.0" rayon-par-bridge = "0.1.0"
hashbrown = "0.15.0" hashbrown = "0.15.0"
raw-collections = { git = "https://github.com/meilisearch/raw-collections.git", version = "0.1.0" }
bumpalo = "3.16.0" bumpalo = "3.16.0"
bumparaw-collections = "0.1.2"
thread_local = "1.1.8" thread_local = "1.1.8"
allocator-api2 = "0.2.18" allocator-api2 = "0.2.18"
rustc-hash = "2.0.0" rustc-hash = "2.0.0"


@ -280,7 +280,7 @@ fn starts_with(selector: &str, key: &str) -> bool {
pub fn validate_document_id_str(document_id: &str) -> Option<&str> { pub fn validate_document_id_str(document_id: &str) -> Option<&str> {
if document_id.is_empty() if document_id.is_empty()
|| document_id.len() > 512 || document_id.len() >= 512
|| !document_id.chars().all(|c| c.is_ascii_alphanumeric() || c == '-' || c == '_') || !document_id.chars().all(|c| c.is_ascii_alphanumeric() || c == '-' || c == '_')
{ {
None None
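With the comparison switched from `> 512` to `>= 512`, the longest accepted identifier is now 511 bytes, which is why the snapshot messages above now say 511. A small hypothetical check of the new boundary against `validate_document_id_str` (not a test from the PR):

#[test]
fn document_id_length_boundary() {
    // 511 bytes still passes validation; 512 bytes is now rejected.
    assert!(validate_document_id_str(&"a".repeat(511)).is_some());
    assert!(validate_document_id_str(&"a".repeat(512)).is_none());
}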


@ -114,7 +114,7 @@ pub enum UserError {
"Document identifier `{}` is invalid. \ "Document identifier `{}` is invalid. \
A document identifier can be of type integer or string, \ A document identifier can be of type integer or string, \
only composed of alphanumeric characters (a-z A-Z 0-9), hyphens (-) and underscores (_), \ only composed of alphanumeric characters (a-z A-Z 0-9), hyphens (-) and underscores (_), \
and can not be more than 512 bytes.", .document_id.to_string() and can not be more than 511 bytes.", .document_id.to_string()
)] )]
InvalidDocumentId { document_id: Value }, InvalidDocumentId { document_id: Value },
#[error("Invalid facet distribution, {}", format_invalid_filter_distribution(.invalid_facets_name, .valid_facets_name))] #[error("Invalid facet distribution, {}", format_invalid_filter_distribution(.invalid_facets_name, .valid_facets_name))]


@ -1734,6 +1734,7 @@ pub(crate) mod tests {
use crate::error::{Error, InternalError}; use crate::error::{Error, InternalError};
use crate::index::{DEFAULT_MIN_WORD_LEN_ONE_TYPO, DEFAULT_MIN_WORD_LEN_TWO_TYPOS}; use crate::index::{DEFAULT_MIN_WORD_LEN_ONE_TYPO, DEFAULT_MIN_WORD_LEN_TWO_TYPOS};
use crate::progress::Progress;
use crate::update::new::indexer; use crate::update::new::indexer;
use crate::update::settings::InnerIndexSettings; use crate::update::settings::InnerIndexSettings;
use crate::update::{ use crate::update::{
@ -1810,7 +1811,7 @@ pub(crate) mod tests {
None, None,
&mut new_fields_ids_map, &mut new_fields_ids_map,
&|| false, &|| false,
&|_progress| (), Progress::default(),
)?; )?;
if let Some(error) = operation_stats.into_iter().find_map(|stat| stat.error) { if let Some(error) = operation_stats.into_iter().find_map(|stat| stat.error) {
@ -1829,7 +1830,7 @@ pub(crate) mod tests {
&document_changes, &document_changes,
embedders, embedders,
&|| false, &|| false,
&|_| (), &Progress::default(),
) )
}) })
.unwrap()?; .unwrap()?;
@ -1901,7 +1902,7 @@ pub(crate) mod tests {
None, None,
&mut new_fields_ids_map, &mut new_fields_ids_map,
&|| false, &|| false,
&|_progress| (), Progress::default(),
)?; )?;
if let Some(error) = operation_stats.into_iter().find_map(|stat| stat.error) { if let Some(error) = operation_stats.into_iter().find_map(|stat| stat.error) {
@ -1920,7 +1921,7 @@ pub(crate) mod tests {
&document_changes, &document_changes,
embedders, embedders,
&|| false, &|| false,
&|_| (), &Progress::default(),
) )
}) })
.unwrap()?; .unwrap()?;
@ -1982,7 +1983,7 @@ pub(crate) mod tests {
None, None,
&mut new_fields_ids_map, &mut new_fields_ids_map,
&|| false, &|| false,
&|_progress| (), Progress::default(),
) )
.unwrap(); .unwrap();
@ -2001,7 +2002,7 @@ pub(crate) mod tests {
&document_changes, &document_changes,
embedders, embedders,
&|| should_abort.load(Relaxed), &|| should_abort.load(Relaxed),
&|_| (), &Progress::default(),
) )
}) })
.unwrap() .unwrap()

View File

@ -31,6 +31,7 @@ pub mod vector;
#[macro_use] #[macro_use]
pub mod snapshot_tests; pub mod snapshot_tests;
mod fieldids_weights_map; mod fieldids_weights_map;
pub mod progress;
use std::collections::{BTreeMap, HashMap}; use std::collections::{BTreeMap, HashMap};
use std::convert::{TryFrom, TryInto}; use std::convert::{TryFrom, TryInto};

View File

@ -0,0 +1,152 @@
use std::any::TypeId;
use std::borrow::Cow;
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::{Arc, RwLock};
use serde::Serialize;
pub trait Step: 'static + Send + Sync {
fn name(&self) -> Cow<'static, str>;
fn current(&self) -> u32;
fn total(&self) -> u32;
}
#[derive(Clone, Default)]
pub struct Progress {
steps: Arc<RwLock<Vec<(TypeId, Box<dyn Step>)>>>,
}
impl Progress {
pub fn update_progress<P: Step>(&self, sub_progress: P) {
let mut steps = self.steps.write().unwrap();
let step_type = TypeId::of::<P>();
if let Some(idx) = steps.iter().position(|(id, _)| *id == step_type) {
steps.truncate(idx);
}
steps.push((step_type, Box::new(sub_progress)));
}
// TODO: This code should be in meilisearch_types but cannot because milli can't depend on meilisearch_types
pub fn as_progress_view(&self) -> ProgressView {
let steps = self.steps.read().unwrap();
let mut percentage = 0.0;
let mut prev_factors = 1.0;
let mut step_view = Vec::with_capacity(steps.len());
for (_, step) in steps.iter() {
prev_factors *= step.total() as f32;
percentage += step.current() as f32 / prev_factors;
step_view.push(ProgressStepView {
current_step: step.name(),
finished: step.current(),
total: step.total(),
});
}
ProgressView { steps: step_view, percentage: percentage * 100.0 }
}
}
/// This trait lets you use the AtomicSubStep defined right below.
/// The name must be a constant that never changes, but that can't be enforced by the type system because it would make the trait non-object-safe.
/// By forcing the Default trait + the &'static str we make it harder to misuse the trait.
pub trait NamedStep: 'static + Send + Sync + Default {
fn name(&self) -> &'static str;
}
/// Structure to quickly define steps that need fast, lockless updates of their current position.
/// You can use this struct if:
/// - The name of the step doesn't change
/// - The total number of steps doesn't change
pub struct AtomicSubStep<Name: NamedStep> {
unit_name: Name,
current: Arc<AtomicU32>,
total: u32,
}
impl<Name: NamedStep> AtomicSubStep<Name> {
pub fn new(total: u32) -> (Arc<AtomicU32>, Self) {
let current = Arc::new(AtomicU32::new(0));
(current.clone(), Self { current, total, unit_name: Name::default() })
}
}
impl<Name: NamedStep> Step for AtomicSubStep<Name> {
fn name(&self) -> Cow<'static, str> {
self.unit_name.name().into()
}
fn current(&self) -> u32 {
self.current.load(Ordering::Relaxed)
}
fn total(&self) -> u32 {
self.total
}
}
#[macro_export]
macro_rules! make_enum_progress {
($visibility:vis enum $name:ident { $($variant:ident,)+ }) => {
#[repr(u8)]
#[derive(Debug, Clone, Copy, PartialEq, Eq, Sequence)]
#[allow(clippy::enum_variant_names)]
$visibility enum $name {
$($variant),+
}
impl Step for $name {
fn name(&self) -> Cow<'static, str> {
use convert_case::Casing;
match self {
$(
$name::$variant => stringify!($variant).from_case(convert_case::Case::Camel).to_case(convert_case::Case::Lower).into()
),+
}
}
fn current(&self) -> u32 {
*self as u32
}
fn total(&self) -> u32 {
Self::CARDINALITY as u32
}
}
};
}
#[macro_export]
macro_rules! make_atomic_progress {
($struct_name:ident alias $atomic_struct_name:ident => $step_name:literal) => {
#[derive(Default, Debug, Clone, Copy)]
pub struct $struct_name {}
impl NamedStep for $struct_name {
fn name(&self) -> &'static str {
$step_name
}
}
pub type $atomic_struct_name = AtomicSubStep<$struct_name>;
};
}
make_atomic_progress!(Document alias AtomicDocumentStep => "document" );
make_atomic_progress!(Payload alias AtomicPayloadStep => "payload" );
#[derive(Debug, Serialize, Clone)]
#[serde(rename_all = "camelCase")]
pub struct ProgressView {
pub steps: Vec<ProgressStepView>,
pub percentage: f32,
}
#[derive(Debug, Serialize, Clone)]
#[serde(rename_all = "camelCase")]
pub struct ProgressStepView {
pub current_step: Cow<'static, str>,
pub finished: u32,
pub total: u32,
}
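
A caller-side sketch of how these pieces compose, assuming it lives in the same crate so the `crate::progress` path resolves; the phase enum and the numbers are illustrative (the real indexer defines its enums via `make_enum_progress!`):

use std::borrow::Cow;
use std::sync::atomic::Ordering;

use crate::progress::{AtomicDocumentStep, Progress, Step};

// Illustrative two-phase step.
#[derive(Debug, Clone, Copy)]
enum IndexingPhase {
    ExtractingDocuments,
    WritingDatabase,
}

impl Step for IndexingPhase {
    fn name(&self) -> Cow<'static, str> {
        match self {
            IndexingPhase::ExtractingDocuments => "extracting documents".into(),
            IndexingPhase::WritingDatabase => "writing database".into(),
        }
    }
    fn current(&self) -> u32 {
        *self as u32
    }
    fn total(&self) -> u32 {
        2
    }
}

fn report(progress: &Progress) {
    progress.update_progress(IndexingPhase::ExtractingDocuments);
    // The atomic counter can be bumped from worker threads without any lock.
    let (counter, document_step) = AtomicDocumentStep::new(1000);
    progress.update_progress(document_step);
    counter.fetch_add(250, Ordering::Relaxed);
    // Weighted percentage: 0/2 for the phase, plus 250/(2 * 1000) for the documents = 12.5%.
    let view = progress.as_progress_view();
    println!("{}", serde_json::to_string_pretty(&view).unwrap());
}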

View File

@ -3,12 +3,13 @@ use std::collections::BTreeMap;
use std::fmt::{self, Debug}; use std::fmt::{self, Debug};
use bumpalo::Bump; use bumpalo::Bump;
use bumparaw_collections::{RawMap, RawVec, Value};
use liquid::model::{ use liquid::model::{
ArrayView, DisplayCow, KString, KStringCow, ObjectRender, ObjectSource, ScalarCow, State, ArrayView, DisplayCow, KString, KStringCow, ObjectRender, ObjectSource, ScalarCow, State,
Value as LiquidValue, Value as LiquidValue,
}; };
use liquid::{ObjectView, ValueView}; use liquid::{ObjectView, ValueView};
use raw_collections::{RawMap, RawVec}; use rustc_hash::FxBuildHasher;
use serde_json::value::RawValue; use serde_json::value::RawValue;
use crate::update::del_add::{DelAdd, KvReaderDelAdd}; use crate::update::del_add::{DelAdd, KvReaderDelAdd};
@ -195,7 +196,7 @@ impl<'doc, D: DocumentTrait<'doc> + Debug> ObjectView for ParseableDocument<'doc
} }
impl<'doc, D: DocumentTrait<'doc> + Debug> ValueView for ParseableDocument<'doc, D> { impl<'doc, D: DocumentTrait<'doc> + Debug> ValueView for ParseableDocument<'doc, D> {
fn as_debug(&self) -> &dyn fmt::Debug { fn as_debug(&self) -> &dyn Debug {
self self
} }
fn render(&self) -> liquid::model::DisplayCow<'_> { fn render(&self) -> liquid::model::DisplayCow<'_> {
@ -243,14 +244,13 @@ impl<'doc, D: DocumentTrait<'doc> + Debug> ValueView for ParseableDocument<'doc,
} }
} }
#[derive(Debug)]
struct ParseableValue<'doc> { struct ParseableValue<'doc> {
value: raw_collections::Value<'doc>, value: Value<'doc, FxBuildHasher>,
} }
impl<'doc> ParseableValue<'doc> { impl<'doc> ParseableValue<'doc> {
pub fn new(value: &'doc RawValue, doc_alloc: &'doc Bump) -> Self { pub fn new(value: &'doc RawValue, doc_alloc: &'doc Bump) -> Self {
let value = raw_collections::Value::from_raw_value(value, doc_alloc).unwrap(); let value = Value::from_raw_value_and_hasher(value, FxBuildHasher, doc_alloc).unwrap();
Self { value } Self { value }
} }
@ -260,19 +260,19 @@ impl<'doc> ParseableValue<'doc> {
} }
// transparent newtype for implementing ValueView // transparent newtype for implementing ValueView
#[repr(transparent)]
#[derive(Debug)] #[derive(Debug)]
struct ParseableMap<'doc>(RawMap<'doc>); #[repr(transparent)]
struct ParseableMap<'doc>(RawMap<'doc, FxBuildHasher>);
// transparent newtype for implementing ValueView // transparent newtype for implementing ValueView
#[repr(transparent)]
#[derive(Debug)] #[derive(Debug)]
#[repr(transparent)]
struct ParseableArray<'doc>(RawVec<'doc>); struct ParseableArray<'doc>(RawVec<'doc>);
impl<'doc> ParseableMap<'doc> { impl<'doc> ParseableMap<'doc> {
pub fn as_parseable<'a>(map: &'a RawMap<'doc>) -> &'a ParseableMap<'doc> { pub fn as_parseable<'a>(map: &'a RawMap<'doc, FxBuildHasher>) -> &'a ParseableMap<'doc> {
// SAFETY: repr(transparent) // SAFETY: repr(transparent)
unsafe { &*(map as *const RawMap as *const Self) } unsafe { &*(map as *const RawMap<FxBuildHasher> as *const Self) }
} }
} }
@ -447,8 +447,9 @@ impl<'doc> ValueView for ParseableValue<'doc> {
} }
fn render(&self) -> DisplayCow<'_> { fn render(&self) -> DisplayCow<'_> {
use raw_collections::value::Number; use bumparaw_collections::value::Number;
use raw_collections::Value; use bumparaw_collections::Value;
match &self.value { match &self.value {
Value::Null => LiquidValue::Nil.render(), Value::Null => LiquidValue::Nil.render(),
Value::Bool(v) => v.render(), Value::Bool(v) => v.render(),
@ -464,8 +465,9 @@ impl<'doc> ValueView for ParseableValue<'doc> {
} }
fn source(&self) -> DisplayCow<'_> { fn source(&self) -> DisplayCow<'_> {
use raw_collections::value::Number; use bumparaw_collections::value::Number;
use raw_collections::Value; use bumparaw_collections::Value;
match &self.value { match &self.value {
Value::Null => LiquidValue::Nil.source(), Value::Null => LiquidValue::Nil.source(),
Value::Bool(v) => ValueView::source(v), Value::Bool(v) => ValueView::source(v),
@ -481,8 +483,9 @@ impl<'doc> ValueView for ParseableValue<'doc> {
} }
fn type_name(&self) -> &'static str { fn type_name(&self) -> &'static str {
use raw_collections::value::Number; use bumparaw_collections::value::Number;
use raw_collections::Value; use bumparaw_collections::Value;
match &self.value { match &self.value {
Value::Null => LiquidValue::Nil.type_name(), Value::Null => LiquidValue::Nil.type_name(),
Value::Bool(v) => v.type_name(), Value::Bool(v) => v.type_name(),
@ -498,7 +501,8 @@ impl<'doc> ValueView for ParseableValue<'doc> {
} }
fn query_state(&self, state: State) -> bool { fn query_state(&self, state: State) -> bool {
use raw_collections::Value; use bumparaw_collections::Value;
match &self.value { match &self.value {
Value::Null => ValueView::query_state(&LiquidValue::Nil, state), Value::Null => ValueView::query_state(&LiquidValue::Nil, state),
Value::Bool(v) => ValueView::query_state(v, state), Value::Bool(v) => ValueView::query_state(v, state),
@ -515,7 +519,8 @@ impl<'doc> ValueView for ParseableValue<'doc> {
} }
fn to_kstr(&self) -> KStringCow<'_> { fn to_kstr(&self) -> KStringCow<'_> {
use raw_collections::Value; use bumparaw_collections::Value;
match &self.value { match &self.value {
Value::Null => ValueView::to_kstr(&LiquidValue::Nil), Value::Null => ValueView::to_kstr(&LiquidValue::Nil),
Value::Bool(v) => ValueView::to_kstr(v), Value::Bool(v) => ValueView::to_kstr(v),
@ -527,12 +532,14 @@ impl<'doc> ValueView for ParseableValue<'doc> {
} }
fn to_value(&self) -> LiquidValue { fn to_value(&self) -> LiquidValue {
use raw_collections::Value; use bumparaw_collections::value::Number;
use bumparaw_collections::Value;
match &self.value { match &self.value {
Value::Null => LiquidValue::Nil, Value::Null => LiquidValue::Nil,
Value::Bool(v) => LiquidValue::Scalar(liquid::model::ScalarCow::new(*v)), Value::Bool(v) => LiquidValue::Scalar(liquid::model::ScalarCow::new(*v)),
Value::Number(number) => match number { Value::Number(number) => match number {
raw_collections::value::Number::PosInt(number) => { Number::PosInt(number) => {
let number: i64 = match (*number).try_into() { let number: i64 = match (*number).try_into() {
Ok(number) => number, Ok(number) => number,
Err(_) => { Err(_) => {
@ -541,12 +548,8 @@ impl<'doc> ValueView for ParseableValue<'doc> {
}; };
LiquidValue::Scalar(ScalarCow::new(number)) LiquidValue::Scalar(ScalarCow::new(number))
} }
raw_collections::value::Number::NegInt(number) => { Number::NegInt(number) => LiquidValue::Scalar(ScalarCow::new(*number)),
LiquidValue::Scalar(ScalarCow::new(*number)) Number::Finite(number) => LiquidValue::Scalar(ScalarCow::new(*number)),
}
raw_collections::value::Number::Finite(number) => {
LiquidValue::Scalar(ScalarCow::new(*number))
}
}, },
Value::String(s) => LiquidValue::Scalar(liquid::model::ScalarCow::new(s.to_string())), Value::String(s) => LiquidValue::Scalar(liquid::model::ScalarCow::new(s.to_string())),
Value::Array(raw_vec) => ParseableArray::as_parseable(raw_vec).to_value(), Value::Array(raw_vec) => ParseableArray::as_parseable(raw_vec).to_value(),
@ -555,8 +558,9 @@ impl<'doc> ValueView for ParseableValue<'doc> {
} }
fn as_scalar(&self) -> Option<liquid::model::ScalarCow<'_>> { fn as_scalar(&self) -> Option<liquid::model::ScalarCow<'_>> {
use raw_collections::value::Number; use bumparaw_collections::value::Number;
use raw_collections::Value; use bumparaw_collections::Value;
match &self.value { match &self.value {
Value::Bool(v) => Some(liquid::model::ScalarCow::new(*v)), Value::Bool(v) => Some(liquid::model::ScalarCow::new(*v)),
Value::Number(number) => match number { Value::Number(number) => match number {
@ -576,34 +580,41 @@ impl<'doc> ValueView for ParseableValue<'doc> {
} }
fn is_scalar(&self) -> bool { fn is_scalar(&self) -> bool {
use raw_collections::Value; use bumparaw_collections::Value;
matches!(&self.value, Value::Bool(_) | Value::Number(_) | Value::String(_)) matches!(&self.value, Value::Bool(_) | Value::Number(_) | Value::String(_))
} }
fn as_array(&self) -> Option<&dyn liquid::model::ArrayView> { fn as_array(&self) -> Option<&dyn liquid::model::ArrayView> {
if let raw_collections::Value::Array(array) = &self.value { if let Value::Array(array) = &self.value {
return Some(ParseableArray::as_parseable(array) as _); return Some(ParseableArray::as_parseable(array) as _);
} }
None None
} }
fn is_array(&self) -> bool { fn is_array(&self) -> bool {
matches!(&self.value, raw_collections::Value::Array(_)) matches!(&self.value, bumparaw_collections::Value::Array(_))
} }
fn as_object(&self) -> Option<&dyn ObjectView> { fn as_object(&self) -> Option<&dyn ObjectView> {
if let raw_collections::Value::Object(object) = &self.value { if let Value::Object(object) = &self.value {
return Some(ParseableMap::as_parseable(object) as _); return Some(ParseableMap::as_parseable(object) as _);
} }
None None
} }
fn is_object(&self) -> bool { fn is_object(&self) -> bool {
matches!(&self.value, raw_collections::Value::Object(_)) matches!(&self.value, bumparaw_collections::Value::Object(_))
} }
fn is_nil(&self) -> bool { fn is_nil(&self) -> bool {
matches!(&self.value, raw_collections::Value::Null) matches!(&self.value, bumparaw_collections::Value::Null)
}
}
impl Debug for ParseableValue<'_> {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
f.debug_struct("ParseableValue").field("value", &self.value).finish()
} }
} }

View File

@ -38,6 +38,16 @@ pub struct RenderPromptError {
pub fault: FaultSource, pub fault: FaultSource,
} }
impl RenderPromptError { impl RenderPromptError {
pub(crate) fn missing_context_with_external_docid(
external_docid: String,
inner: liquid::Error,
) -> RenderPromptError {
Self {
kind: RenderPromptErrorKind::MissingContextWithExternalDocid(external_docid, inner),
fault: FaultSource::User,
}
}
pub(crate) fn missing_context(inner: liquid::Error) -> RenderPromptError { pub(crate) fn missing_context(inner: liquid::Error) -> RenderPromptError {
Self { kind: RenderPromptErrorKind::MissingContext(inner), fault: FaultSource::User } Self { kind: RenderPromptErrorKind::MissingContext(inner), fault: FaultSource::User }
} }
@ -47,6 +57,8 @@ impl RenderPromptError {
pub enum RenderPromptErrorKind { pub enum RenderPromptErrorKind {
#[error("missing field in document: {0}")] #[error("missing field in document: {0}")]
MissingContext(liquid::Error), MissingContext(liquid::Error),
#[error("missing field in document `{0}`: {1}")]
MissingContextWithExternalDocid(String, liquid::Error),
} }
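
A small sketch of how the new variant renders, assuming the usual thiserror derive on this enum and stubbing the inner error with `liquid::Error::with_msg`; the exact liquid error text will differ in practice:

fn main() {
    let inner = liquid::Error::with_msg("requested variable `title` is missing");
    let kind = RenderPromptErrorKind::MissingContextWithExternalDocid("movie-42".into(), inner);
    // Prints something like: missing field in document `movie-42`: <liquid error message>
    println!("{kind}");
}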
impl From<RenderPromptError> for crate::Error { impl From<RenderPromptError> for crate::Error {

View File

@ -119,6 +119,7 @@ impl Prompt {
'doc: 'a, // lifetime of the allocator, will live for an entire chunk of documents 'doc: 'a, // lifetime of the allocator, will live for an entire chunk of documents
>( >(
&self, &self,
external_docid: &str,
document: impl crate::update::new::document::Document<'a> + Debug, document: impl crate::update::new::document::Document<'a> + Debug,
field_id_map: &RefCell<GlobalFieldsIdsMap>, field_id_map: &RefCell<GlobalFieldsIdsMap>,
doc_alloc: &'doc Bump, doc_alloc: &'doc Bump,
@ -130,9 +131,12 @@ impl Prompt {
self.max_bytes.unwrap_or_else(default_max_bytes).get(), self.max_bytes.unwrap_or_else(default_max_bytes).get(),
doc_alloc, doc_alloc,
); );
self.template self.template.render_to(&mut rendered, &context).map_err(|liquid_error| {
.render_to(&mut rendered, &context) RenderPromptError::missing_context_with_external_docid(
.map_err(RenderPromptError::missing_context)?; external_docid.to_owned(),
liquid_error,
)
})?;
Ok(std::str::from_utf8(rendered.into_bump_slice()) Ok(std::str::from_utf8(rendered.into_bump_slice())
.expect("render can only write UTF-8 because all inputs and processing preserve utf-8")) .expect("render can only write UTF-8 because all inputs and processing preserve utf-8"))
} }

View File

@ -5,6 +5,7 @@ use bumpalo::Bump;
use heed::EnvOpenOptions; use heed::EnvOpenOptions;
use maplit::{btreemap, hashset}; use maplit::{btreemap, hashset};
use crate::progress::Progress;
use crate::update::new::indexer; use crate::update::new::indexer;
use crate::update::{IndexDocumentsMethod, IndexerConfig, Settings}; use crate::update::{IndexDocumentsMethod, IndexerConfig, Settings};
use crate::vector::EmbeddingConfigs; use crate::vector::EmbeddingConfigs;
@ -72,7 +73,7 @@ pub fn setup_search_index_with_criteria(criteria: &[Criterion]) -> Index {
None, None,
&mut new_fields_ids_map, &mut new_fields_ids_map,
&|| false, &|| false,
&|_progress| (), Progress::default(),
) )
.unwrap(); .unwrap();
@ -91,7 +92,7 @@ pub fn setup_search_index_with_criteria(criteria: &[Criterion]) -> Index {
&document_changes, &document_changes,
embedders, embedders,
&|| false, &|| false,
&|_| (), &Progress::default(),
) )
.unwrap(); .unwrap();

View File

@ -79,22 +79,29 @@ pub const FACET_MIN_LEVEL_SIZE: u8 = 5;
use std::collections::BTreeSet; use std::collections::BTreeSet;
use std::fs::File; use std::fs::File;
use std::io::BufReader; use std::io::BufReader;
use std::ops::Bound;
use grenad::Merger; use grenad::Merger;
use heed::types::{Bytes, DecodeIgnore}; use heed::types::{Bytes, DecodeIgnore};
use heed::BytesDecode as _;
use roaring::RoaringBitmap;
use time::OffsetDateTime; use time::OffsetDateTime;
use tracing::debug; use tracing::debug;
use self::incremental::FacetsUpdateIncremental; use self::incremental::FacetsUpdateIncremental;
use super::{FacetsUpdateBulk, MergeDeladdBtreesetString, MergeDeladdCboRoaringBitmaps}; use super::{FacetsUpdateBulk, MergeDeladdBtreesetString, MergeDeladdCboRoaringBitmaps};
use crate::facet::FacetType; use crate::facet::FacetType;
use crate::heed_codec::facet::{FacetGroupKey, FacetGroupKeyCodec, FacetGroupValueCodec}; use crate::heed_codec::facet::{
FacetGroupKey, FacetGroupKeyCodec, FacetGroupValueCodec, OrderedF64Codec,
};
use crate::heed_codec::BytesRefCodec; use crate::heed_codec::BytesRefCodec;
use crate::search::facet::get_highest_level;
use crate::update::del_add::{DelAdd, KvReaderDelAdd}; use crate::update::del_add::{DelAdd, KvReaderDelAdd};
use crate::{try_split_array_at, FieldId, Index, Result}; use crate::{try_split_array_at, FieldId, Index, Result};
pub mod bulk; pub mod bulk;
pub mod incremental; pub mod incremental;
pub mod new_incremental;
/// A builder used to add new elements to the `facet_id_string_docids` or `facet_id_f64_docids` databases. /// A builder used to add new elements to the `facet_id_string_docids` or `facet_id_f64_docids` databases.
/// ///
@ -646,3 +653,194 @@ mod comparison_bench {
} }
} }
} }
/// Run sanity checks on the specified fid tree
///
/// 1. No "orphan" child value, any child value has a parent
/// 2. Any docid in the child appears in the parent
/// 3. No docid in the parent is missing from all its children
/// 4. No group is bigger than max_group_size
/// 5. Less than 50% of groups are bigger than group_size
/// 6. Group size matches the number of children
/// 7. max_level is < 255
pub(crate) fn sanity_checks(
index: &Index,
rtxn: &heed::RoTxn,
field_id: FieldId,
facet_type: FacetType,
group_size: usize,
_min_level_size: usize, // might add a check on level size later
max_group_size: usize,
) -> Result<()> {
tracing::info!(%field_id, ?facet_type, "performing sanity checks");
let database = match facet_type {
FacetType::String => {
index.facet_id_string_docids.remap_key_type::<FacetGroupKeyCodec<BytesRefCodec>>()
}
FacetType::Number => {
index.facet_id_f64_docids.remap_key_type::<FacetGroupKeyCodec<BytesRefCodec>>()
}
};
let leaf_prefix: FacetGroupKey<&[u8]> = FacetGroupKey { field_id, level: 0, left_bound: &[] };
let leaf_it = database.prefix_iter(rtxn, &leaf_prefix)?;
let max_level = get_highest_level(rtxn, database, field_id)?;
if max_level == u8::MAX {
panic!("max_level == 255");
}
for leaf in leaf_it {
let (leaf_facet_value, leaf_docids) = leaf?;
let mut current_level = 0;
let mut current_parent_facet_value: Option<FacetGroupKey<&[u8]>> = None;
let mut current_parent_docids: Option<crate::heed_codec::facet::FacetGroupValue> = None;
loop {
current_level += 1;
if current_level >= max_level {
break;
}
let parent_key_right_bound = FacetGroupKey {
field_id,
level: current_level,
left_bound: leaf_facet_value.left_bound,
};
let (parent_facet_value, parent_docids) = database
.get_lower_than_or_equal_to(rtxn, &parent_key_right_bound)?
.expect("no parent found");
if parent_facet_value.level != current_level {
panic!(
"wrong parent level, found_level={}, expected_level={}",
parent_facet_value.level, current_level
);
}
if parent_facet_value.field_id != field_id {
panic!("wrong parent fid");
}
if parent_facet_value.left_bound > leaf_facet_value.left_bound {
panic!("wrong parent left bound");
}
if !leaf_docids.bitmap.is_subset(&parent_docids.bitmap) {
panic!(
"missing docids from leaf in parent, current_level={}, parent={}, child={}, missing={missing:?}, child_len={}, child={:?}",
current_level,
facet_to_string(parent_facet_value.left_bound, facet_type),
facet_to_string(leaf_facet_value.left_bound, facet_type),
leaf_docids.bitmap.len(),
leaf_docids.bitmap.clone(),
missing=leaf_docids.bitmap - parent_docids.bitmap,
)
}
if let Some(current_parent_facet_value) = current_parent_facet_value {
if current_parent_facet_value.field_id != parent_facet_value.field_id {
panic!("wrong parent parent fid");
}
if current_parent_facet_value.level + 1 != parent_facet_value.level {
panic!("wrong parent parent level");
}
if current_parent_facet_value.left_bound < parent_facet_value.left_bound {
panic!("wrong parent parent left bound");
}
}
if let Some(current_parent_docids) = current_parent_docids {
if !current_parent_docids.bitmap.is_subset(&parent_docids.bitmap) {
panic!("missing docids from intermediate node in parent, parent_level={}, parent={}, intermediate={}, missing={missing:?}, intermediate={:?}",
parent_facet_value.level,
facet_to_string(parent_facet_value.left_bound, facet_type),
facet_to_string(current_parent_facet_value.unwrap().left_bound, facet_type),
current_parent_docids.bitmap.clone(),
missing=current_parent_docids.bitmap - parent_docids.bitmap,
);
}
}
current_parent_facet_value = Some(parent_facet_value);
current_parent_docids = Some(parent_docids);
}
}
tracing::info!(%field_id, ?facet_type, "checked all leaves");
let mut current_level = max_level;
let mut greater_than_group = 0usize;
let mut total = 0usize;
loop {
if current_level == 0 {
break;
}
let child_level = current_level - 1;
tracing::info!(%field_id, ?facet_type, %current_level, "checked groups for level");
let level_groups_prefix: FacetGroupKey<&[u8]> =
FacetGroupKey { field_id, level: current_level, left_bound: &[] };
let mut level_groups_it = database.prefix_iter(rtxn, &level_groups_prefix)?.peekable();
'group_it: loop {
let Some(group) = level_groups_it.next() else { break 'group_it };
let (group_facet_value, group_docids) = group?;
let child_left_bound = group_facet_value.left_bound.to_owned();
let mut expected_docids = RoaringBitmap::new();
let mut expected_size = 0usize;
let right_bound = level_groups_it
.peek()
.and_then(|res| res.as_ref().ok())
.map(|(key, _)| key.left_bound);
let child_left_bound = FacetGroupKey {
field_id,
level: child_level,
left_bound: child_left_bound.as_slice(),
};
let child_left_bound = Bound::Included(&child_left_bound);
let child_right_bound;
let child_right_bound = if let Some(right_bound) = right_bound {
child_right_bound =
FacetGroupKey { field_id, level: child_level, left_bound: right_bound };
Bound::Excluded(&child_right_bound)
} else {
Bound::Unbounded
};
let children = database.range(rtxn, &(child_left_bound, child_right_bound))?;
for child in children {
let (child_facet_value, child_docids) = child?;
if child_facet_value.field_id != field_id {
break;
}
if child_facet_value.level != child_level {
break;
}
expected_size += 1;
expected_docids |= &child_docids.bitmap;
}
assert_eq!(expected_size, group_docids.size as usize);
assert!(expected_size <= max_group_size);
assert_eq!(expected_docids, group_docids.bitmap);
total += 1;
if expected_size > group_size {
greater_than_group += 1;
}
}
current_level -= 1;
}
if greater_than_group * 2 > total {
panic!("too many groups have a size > group_size");
}
tracing::info!("sanity checks OK");
Ok(())
}
fn facet_to_string(facet_value: &[u8], facet_type: FacetType) -> String {
match facet_type {
FacetType::String => bstr::BStr::new(facet_value).to_string(),
FacetType::Number => match OrderedF64Codec::bytes_decode(facet_value) {
Ok(value) => value.to_string(),
Err(e) => format!("error: {e} (bytes: {facet_value:?})"),
},
}
}
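
A hypothetical debug-only caller for these checks, assumed to live in the same module; the FACET_* constants and the fields_ids_map iteration mirror names used elsewhere in milli but should be treated as assumptions here:

#[cfg(debug_assertions)]
pub(crate) fn sanity_check_all_facets(index: &Index, rtxn: &heed::RoTxn) -> Result<()> {
    let fields_ids_map = index.fields_ids_map(rtxn)?;
    for (field_id, _name) in fields_ids_map.iter() {
        for facet_type in [FacetType::Number, FacetType::String] {
            // Fields that are not faceted simply have empty trees, so the checks are no-ops for them.
            sanity_checks(
                index,
                rtxn,
                field_id,
                facet_type,
                FACET_GROUP_SIZE as usize,
                FACET_MIN_LEVEL_SIZE as usize,
                FACET_MAX_GROUP_SIZE as usize,
            )?;
        }
    }
    Ok(())
}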

View File

@ -0,0 +1,498 @@
use std::ops::Bound;
use heed::types::{Bytes, DecodeIgnore};
use heed::{BytesDecode as _, Database, RwTxn};
use roaring::RoaringBitmap;
use crate::facet::FacetType;
use crate::heed_codec::facet::{
FacetGroupKey, FacetGroupKeyCodec, FacetGroupValue, FacetGroupValueCodec,
};
use crate::heed_codec::BytesRefCodec;
use crate::search::facet::get_highest_level;
use crate::update::valid_facet_value;
use crate::{FieldId, Index, Result};
pub struct FacetsUpdateIncremental {
inner: FacetsUpdateIncrementalInner,
delta_data: Vec<FacetFieldIdChange>,
}
struct FacetsUpdateIncrementalInner {
db: Database<FacetGroupKeyCodec<BytesRefCodec>, FacetGroupValueCodec>,
field_id: FieldId,
group_size: u8,
min_level_size: u8,
max_group_size: u8,
}
impl FacetsUpdateIncremental {
pub fn new(
index: &Index,
facet_type: FacetType,
field_id: FieldId,
delta_data: Vec<FacetFieldIdChange>,
group_size: u8,
min_level_size: u8,
max_group_size: u8,
) -> Self {
FacetsUpdateIncremental {
inner: FacetsUpdateIncrementalInner {
db: match facet_type {
FacetType::String => index
.facet_id_string_docids
.remap_key_type::<FacetGroupKeyCodec<BytesRefCodec>>(),
FacetType::Number => index
.facet_id_f64_docids
.remap_key_type::<FacetGroupKeyCodec<BytesRefCodec>>(),
},
field_id,
group_size,
min_level_size,
max_group_size,
},
delta_data,
}
}
#[tracing::instrument(level = "trace", skip_all, target = "indexing::facets::incremental")]
pub fn execute(mut self, wtxn: &mut RwTxn) -> Result<()> {
if self.delta_data.is_empty() {
return Ok(());
}
self.delta_data.sort_unstable_by(
|FacetFieldIdChange { facet_value: left, .. },
FacetFieldIdChange { facet_value: right, .. }| {
left.cmp(right)
// sort in **reverse** lexicographic order
.reverse()
},
);
self.inner.find_changed_parents(wtxn, self.delta_data)?;
self.inner.add_or_delete_level(wtxn)
}
}
impl FacetsUpdateIncrementalInner {
/// WARNING: `changed_children` must be sorted in **reverse** lexicographic order.
fn find_changed_parents(
&self,
wtxn: &mut RwTxn,
mut changed_children: Vec<FacetFieldIdChange>,
) -> Result<()> {
let mut changed_parents = vec![];
for child_level in 0u8..u8::MAX {
// child_level < u8::MAX by construction
let parent_level = child_level + 1;
let parent_level_left_bound: FacetGroupKey<&[u8]> =
FacetGroupKey { field_id: self.field_id, level: parent_level, left_bound: &[] };
let mut last_parent: Option<Box<[u8]>> = None;
let mut child_it = changed_children
// drain all changed children
.drain(..)
// keep only children whose value is valid in the LMDB sense
.filter(|child| valid_facet_value(&child.facet_value));
// `while let` rather than `for` because we advance `child_it` inside of the loop
'current_level: while let Some(child) = child_it.next() {
if let Some(last_parent) = &last_parent {
if &child.facet_value >= last_parent {
self.compute_parent_group(wtxn, child_level, child.facet_value)?;
continue 'current_level;
}
}
// need to find a new parent
let parent_key_prefix = FacetGroupKey {
field_id: self.field_id,
level: parent_level,
left_bound: &*child.facet_value,
};
let parent = self
.db
.remap_data_type::<DecodeIgnore>()
.rev_range(
wtxn,
&(
Bound::Excluded(&parent_level_left_bound),
Bound::Included(&parent_key_prefix),
),
)?
.next();
match parent {
Some(Ok((parent_key, _parent_value))) => {
// found parent, cache it for next keys
last_parent = Some(parent_key.left_bound.to_owned().into_boxed_slice());
// add to modified list for parent level
changed_parents.push(FacetFieldIdChange {
facet_value: parent_key.left_bound.to_owned().into_boxed_slice(),
});
self.compute_parent_group(wtxn, child_level, child.facet_value)?;
}
Some(Err(err)) => return Err(err.into()),
None => {
// no parent for that key
let mut parent_it = self
.db
.remap_data_type::<DecodeIgnore>()
.prefix_iter_mut(wtxn, &parent_level_left_bound)?;
match parent_it.next() {
// 1. left of the current left bound, or
Some(Ok((first_key, _first_value))) => {
// make sure we don't spill on the neighboring fid (level also included defensively)
if first_key.field_id != self.field_id
|| first_key.level != parent_level
{
// max level reached, exit
drop(parent_it);
self.compute_parent_group(
wtxn,
child_level,
child.facet_value,
)?;
for child in child_it.by_ref() {
self.compute_parent_group(
wtxn,
child_level,
child.facet_value,
)?;
}
return Ok(());
}
// remove old left bound
unsafe { parent_it.del_current()? };
drop(parent_it);
changed_parents.push(FacetFieldIdChange {
facet_value: child.facet_value.clone(),
});
self.compute_parent_group(wtxn, child_level, child.facet_value)?;
// pop all elements in order to visit the new left bound
let new_left_bound =
&mut changed_parents.last_mut().unwrap().facet_value;
for child in child_it.by_ref() {
new_left_bound.clone_from(&child.facet_value);
self.compute_parent_group(
wtxn,
child_level,
child.facet_value,
)?;
}
}
Some(Err(err)) => return Err(err.into()),
// 2. max level reached, exit
None => {
drop(parent_it);
self.compute_parent_group(wtxn, child_level, child.facet_value)?;
for child in child_it.by_ref() {
self.compute_parent_group(
wtxn,
child_level,
child.facet_value,
)?;
}
return Ok(());
}
}
}
}
}
if changed_parents.is_empty() {
return Ok(());
}
drop(child_it);
std::mem::swap(&mut changed_children, &mut changed_parents);
// changed_parents is now empty because changed_children was emptied by the drain
}
Ok(())
}
fn compute_parent_group(
&self,
wtxn: &mut RwTxn<'_>,
parent_level: u8,
parent_left_bound: Box<[u8]>,
) -> Result<()> {
let mut range_left_bound: Vec<u8> = parent_left_bound.into();
if parent_level == 0 {
return Ok(());
}
let child_level = parent_level - 1;
let parent_key = FacetGroupKey {
field_id: self.field_id,
level: parent_level,
left_bound: &*range_left_bound,
};
let child_right_bound = self
.db
.remap_data_type::<DecodeIgnore>()
.get_greater_than(wtxn, &parent_key)?
.and_then(
|(
FacetGroupKey {
level: right_level,
field_id: right_fid,
left_bound: right_bound,
},
_,
)| {
if parent_level != right_level || self.field_id != right_fid {
// there was a greater key, but with a greater level or fid, so not a sibling to the parent: ignore
return None;
}
Some(right_bound.to_owned())
},
);
let child_right_bound = match &child_right_bound {
Some(right_bound) => Bound::Excluded(FacetGroupKey {
left_bound: right_bound.as_slice(),
field_id: self.field_id,
level: child_level,
}),
None => Bound::Unbounded,
};
let child_left_key = FacetGroupKey {
field_id: self.field_id,
level: child_level,
left_bound: &*range_left_bound,
};
let mut child_left_bound = Bound::Included(child_left_key);
loop {
// do a first pass on the range to find the number of children
let child_count = self
.db
.remap_data_type::<DecodeIgnore>()
.range(wtxn, &(child_left_bound, child_right_bound))?
.take(self.max_group_size as usize * 2)
.count();
let mut child_it = self.db.range(wtxn, &(child_left_bound, child_right_bound))?;
// pick the right group_size depending on the number of children
let group_size = if child_count >= self.max_group_size as usize * 2 {
// more than twice the max_group_size => there will be space for at least 2 groups of max_group_size
self.max_group_size as usize
} else if child_count >= self.group_size as usize {
// size in [group_size, max_group_size * 2[
// divided by 2 it is between [group_size / 2, max_group_size[
// this ensures that the tree is balanced
child_count / 2
} else {
// take everything
child_count
};
let res: Result<_> = child_it
.by_ref()
.take(group_size)
// stop if we go to the next level or field id
.take_while(|res| match res {
Ok((child_key, _)) => {
child_key.field_id == self.field_id && child_key.level == child_level
}
Err(_) => true,
})
.try_fold(
(None, FacetGroupValue { size: 0, bitmap: Default::default() }),
|(bounds, mut group_value), child_res| {
let (child_key, child_value) = child_res?;
let bounds = match bounds {
Some((left_bound, _)) => Some((left_bound, child_key.left_bound)),
None => Some((child_key.left_bound, child_key.left_bound)),
};
// max_group_size <= u8::MAX
group_value.size += 1;
group_value.bitmap |= &child_value.bitmap;
Ok((bounds, group_value))
},
);
let (bounds, group_value) = res?;
let Some((group_left_bound, right_bound)) = bounds else {
let update_key = FacetGroupKey {
field_id: self.field_id,
level: parent_level,
left_bound: &*range_left_bound,
};
drop(child_it);
if let Bound::Included(_) = child_left_bound {
self.db.delete(wtxn, &update_key)?;
}
break;
};
drop(child_it);
let current_left_bound = group_left_bound.to_owned();
let delete_old_bound = match child_left_bound {
Bound::Included(bound) => {
if bound.left_bound != current_left_bound {
Some(range_left_bound.clone())
} else {
None
}
}
_ => None,
};
range_left_bound.clear();
range_left_bound.extend_from_slice(right_bound);
let child_left_key = FacetGroupKey {
field_id: self.field_id,
level: child_level,
left_bound: range_left_bound.as_slice(),
};
child_left_bound = Bound::Excluded(child_left_key);
if let Some(old_bound) = delete_old_bound {
let update_key = FacetGroupKey {
field_id: self.field_id,
level: parent_level,
left_bound: old_bound.as_slice(),
};
self.db.delete(wtxn, &update_key)?;
}
let update_key = FacetGroupKey {
field_id: self.field_id,
level: parent_level,
left_bound: current_left_bound.as_slice(),
};
if group_value.bitmap.is_empty() {
self.db.delete(wtxn, &update_key)?;
} else {
self.db.put(wtxn, &update_key, &group_value)?;
}
}
Ok(())
}
/// Check whether the highest level has exceeded `min_level_size` * `self.group_size`.
/// If it has, we must build an additional level above it.
/// Then check whether the highest level is under `min_level_size`.
/// If it is, we must remove the complete level.
pub(crate) fn add_or_delete_level(&self, txn: &mut RwTxn<'_>) -> Result<()> {
let highest_level = get_highest_level(txn, self.db, self.field_id)?;
let mut highest_level_prefix = vec![];
highest_level_prefix.extend_from_slice(&self.field_id.to_be_bytes());
highest_level_prefix.push(highest_level);
let size_highest_level =
self.db.remap_types::<Bytes, Bytes>().prefix_iter(txn, &highest_level_prefix)?.count();
if size_highest_level >= self.group_size as usize * self.min_level_size as usize {
self.add_level(txn, highest_level, &highest_level_prefix, size_highest_level)
} else if size_highest_level < self.min_level_size as usize && highest_level != 0 {
self.delete_level(txn, &highest_level_prefix)
} else {
Ok(())
}
}
/// Delete a level.
fn delete_level(&self, txn: &mut RwTxn<'_>, highest_level_prefix: &[u8]) -> Result<()> {
let mut to_delete = vec![];
let mut iter =
self.db.remap_types::<Bytes, Bytes>().prefix_iter(txn, highest_level_prefix)?;
for el in iter.by_ref() {
let (k, _) = el?;
to_delete.push(
FacetGroupKeyCodec::<BytesRefCodec>::bytes_decode(k)
.map_err(heed::Error::Encoding)?
.into_owned(),
);
}
drop(iter);
for k in to_delete {
self.db.delete(txn, &k.as_ref())?;
}
Ok(())
}
/// Build an additional level for the field id.
fn add_level(
&self,
txn: &mut RwTxn<'_>,
highest_level: u8,
highest_level_prefix: &[u8],
size_highest_level: usize,
) -> Result<()> {
let mut groups_iter = self
.db
.remap_types::<Bytes, FacetGroupValueCodec>()
.prefix_iter(txn, highest_level_prefix)?;
let nbr_new_groups = size_highest_level / self.group_size as usize;
let nbr_leftover_elements = size_highest_level % self.group_size as usize;
let mut to_add = vec![];
for _ in 0..nbr_new_groups {
let mut first_key = None;
let mut values = RoaringBitmap::new();
for _ in 0..self.group_size {
let (key_bytes, value_i) = groups_iter.next().unwrap()?;
let key_i = FacetGroupKeyCodec::<BytesRefCodec>::bytes_decode(key_bytes)
.map_err(heed::Error::Encoding)?;
if first_key.is_none() {
first_key = Some(key_i);
}
values |= value_i.bitmap;
}
let key = FacetGroupKey {
field_id: self.field_id,
level: highest_level + 1,
left_bound: first_key.unwrap().left_bound,
};
let value = FacetGroupValue { size: self.group_size, bitmap: values };
to_add.push((key.into_owned(), value));
}
// now we add the rest of the level, in case its size is > group_size * min_level_size
// this can indeed happen if the min_level_size parameter changes between two calls to `insert`
if nbr_leftover_elements > 0 {
let mut first_key = None;
let mut values = RoaringBitmap::new();
for _ in 0..nbr_leftover_elements {
let (key_bytes, value_i) = groups_iter.next().unwrap()?;
let key_i = FacetGroupKeyCodec::<BytesRefCodec>::bytes_decode(key_bytes)
.map_err(heed::Error::Encoding)?;
if first_key.is_none() {
first_key = Some(key_i);
}
values |= value_i.bitmap;
}
let key = FacetGroupKey {
field_id: self.field_id,
level: highest_level + 1,
left_bound: first_key.unwrap().left_bound,
};
// Note: nbr_leftover_elements can be cast to a u8 since it is bounded by `self.group_size`
// (itself a u8) when it is computed above.
let value = FacetGroupValue { size: nbr_leftover_elements as u8, bitmap: values };
to_add.push((key.into_owned(), value));
}
drop(groups_iter);
for (key, value) in to_add {
self.db.put(txn, &key.as_ref(), &value)?;
}
Ok(())
}
}
#[derive(Debug)]
pub struct FacetFieldIdChange {
pub facet_value: Box<[u8]>,
}
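
To make the sizing rules above concrete, a self-contained model of the two decisions; the defaults group_size = 4, max_group_size = 8 and min_level_size = 5 are assumptions mirroring the constants in facet/mod.rs:

// How many children go into the next parent group (mirrors compute_parent_group).
fn pick_group_size(child_count: usize, group_size: usize, max_group_size: usize) -> usize {
    if child_count >= max_group_size * 2 {
        max_group_size // room for at least two full groups of max_group_size
    } else if child_count >= group_size {
        child_count / 2 // lands in [group_size / 2, max_group_size), keeping the tree balanced
    } else {
        child_count // too few children: one, possibly undersized, group
    }
}

// Whether the highest level must grow or shrink (mirrors add_or_delete_level).
fn level_action(highest_level_size: usize, highest_level: u8, group_size: usize, min_level_size: usize) -> &'static str {
    if highest_level_size >= group_size * min_level_size {
        "add a level on top"
    } else if highest_level_size < min_level_size && highest_level != 0 {
        "delete the highest level"
    } else {
        "keep the tree as is"
    }
}

fn main() {
    assert_eq!(pick_group_size(20, 4, 8), 8);
    assert_eq!(pick_group_size(10, 4, 8), 5);
    assert_eq!(pick_group_size(3, 4, 8), 3);
    assert_eq!(level_action(20, 1, 4, 5), "add a level on top");
    assert_eq!(level_action(3, 2, 4, 5), "delete the highest level");
    assert_eq!(level_action(3, 0, 4, 5), "keep the tree as is");
}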

View File

@ -10,10 +10,14 @@ use fst::{IntoStreamer, Streamer};
pub use grenad_helpers::*; pub use grenad_helpers::*;
pub use merge_functions::*; pub use merge_functions::*;
use crate::MAX_WORD_LENGTH; use crate::MAX_LMDB_KEY_LENGTH;
pub fn valid_lmdb_key(key: impl AsRef<[u8]>) -> bool { pub fn valid_lmdb_key(key: impl AsRef<[u8]>) -> bool {
key.as_ref().len() <= MAX_WORD_LENGTH * 2 && !key.as_ref().is_empty() key.as_ref().len() <= MAX_LMDB_KEY_LENGTH - 3 && !key.as_ref().is_empty()
}
pub fn valid_facet_value(facet_value: impl AsRef<[u8]>) -> bool {
facet_value.as_ref().len() <= MAX_LMDB_KEY_LENGTH - 3 && !facet_value.as_ref().is_empty()
} }
/// Divides one slice into two at an index, returns `None` if mid is out of bounds. /// Divides one slice into two at an index, returns `None` if mid is out of bounds.
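
A boundary sketch for the `valid_facet_value` guard above, under the assumption that MAX_LMDB_KEY_LENGTH is 511; the 3 reserved bytes leave room for the field id (2 bytes) and level (1 byte) that prefix every facet key:

fn main() {
    // 511 - 3 = 508 bytes is the longest facet value that still fits once prefixed.
    assert!(valid_facet_value([b'a'; 508]));
    assert!(!valid_facet_value([b'a'; 509]));
    // Empty values are never valid LMDB keys.
    assert!(!valid_facet_value(b""));
}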

View File

@ -766,6 +766,7 @@ mod tests {
use crate::documents::mmap_from_objects; use crate::documents::mmap_from_objects;
use crate::index::tests::TempIndex; use crate::index::tests::TempIndex;
use crate::index::IndexEmbeddingConfig; use crate::index::IndexEmbeddingConfig;
use crate::progress::Progress;
use crate::search::TermsMatchingStrategy; use crate::search::TermsMatchingStrategy;
use crate::update::new::indexer; use crate::update::new::indexer;
use crate::update::Setting; use crate::update::Setting;
@ -1964,7 +1965,7 @@ mod tests {
None, None,
&mut new_fields_ids_map, &mut new_fields_ids_map,
&|| false, &|| false,
&|_progress| (), Progress::default(),
) )
.unwrap(); .unwrap();
@ -2148,7 +2149,7 @@ mod tests {
None, None,
&mut new_fields_ids_map, &mut new_fields_ids_map,
&|| false, &|| false,
&|_progress| (), Progress::default(),
) )
.unwrap(); .unwrap();
@ -2163,7 +2164,7 @@ mod tests {
&document_changes, &document_changes,
embedders, embedders,
&|| false, &|| false,
&|_| (), &Progress::default(),
) )
.unwrap(); .unwrap();
wtxn.commit().unwrap(); wtxn.commit().unwrap();
@ -2210,7 +2211,7 @@ mod tests {
None, None,
&mut new_fields_ids_map, &mut new_fields_ids_map,
&|| false, &|| false,
&|_progress| (), Progress::default(),
) )
.unwrap(); .unwrap();
@ -2225,7 +2226,7 @@ mod tests {
&document_changes, &document_changes,
embedders, embedders,
&|| false, &|| false,
&|_| (), &Progress::default(),
) )
.unwrap(); .unwrap();
wtxn.commit().unwrap(); wtxn.commit().unwrap();
@ -2263,7 +2264,7 @@ mod tests {
None, None,
&mut new_fields_ids_map, &mut new_fields_ids_map,
&|| false, &|| false,
&|_progress| (), Progress::default(),
) )
.unwrap(); .unwrap();
@ -2278,7 +2279,7 @@ mod tests {
&document_changes, &document_changes,
embedders, embedders,
&|| false, &|| false,
&|_| (), &Progress::default(),
) )
.unwrap(); .unwrap();
wtxn.commit().unwrap(); wtxn.commit().unwrap();
@ -2315,7 +2316,7 @@ mod tests {
None, None,
&mut new_fields_ids_map, &mut new_fields_ids_map,
&|| false, &|| false,
&|_progress| (), Progress::default(),
) )
.unwrap(); .unwrap();
@ -2330,7 +2331,7 @@ mod tests {
&document_changes, &document_changes,
embedders, embedders,
&|| false, &|| false,
&|_| (), &Progress::default(),
) )
.unwrap(); .unwrap();
wtxn.commit().unwrap(); wtxn.commit().unwrap();
@ -2369,7 +2370,7 @@ mod tests {
None, None,
&mut new_fields_ids_map, &mut new_fields_ids_map,
&|| false, &|| false,
&|_progress| (), Progress::default(),
) )
.unwrap(); .unwrap();
@ -2384,7 +2385,7 @@ mod tests {
&document_changes, &document_changes,
embedders, embedders,
&|| false, &|| false,
&|_| (), &Progress::default(),
) )
.unwrap(); .unwrap();
wtxn.commit().unwrap(); wtxn.commit().unwrap();
@ -2428,7 +2429,7 @@ mod tests {
None, None,
&mut new_fields_ids_map, &mut new_fields_ids_map,
&|| false, &|| false,
&|_progress| (), Progress::default(),
) )
.unwrap(); .unwrap();
@ -2443,7 +2444,7 @@ mod tests {
&document_changes, &document_changes,
embedders, embedders,
&|| false, &|| false,
&|_| (), &Progress::default(),
) )
.unwrap(); .unwrap();
wtxn.commit().unwrap(); wtxn.commit().unwrap();
@ -2480,7 +2481,7 @@ mod tests {
None, None,
&mut new_fields_ids_map, &mut new_fields_ids_map,
&|| false, &|| false,
&|_progress| (), Progress::default(),
) )
.unwrap(); .unwrap();
@ -2495,7 +2496,7 @@ mod tests {
&document_changes, &document_changes,
embedders, embedders,
&|| false, &|| false,
&|_| (), &Progress::default(),
) )
.unwrap(); .unwrap();
wtxn.commit().unwrap(); wtxn.commit().unwrap();
@ -2532,7 +2533,7 @@ mod tests {
None, None,
&mut new_fields_ids_map, &mut new_fields_ids_map,
&|| false, &|| false,
&|_progress| (), Progress::default(),
) )
.unwrap(); .unwrap();
@ -2547,7 +2548,7 @@ mod tests {
&document_changes, &document_changes,
embedders, embedders,
&|| false, &|| false,
&|_| (), &Progress::default(),
) )
.unwrap(); .unwrap();
wtxn.commit().unwrap(); wtxn.commit().unwrap();
@ -2726,7 +2727,7 @@ mod tests {
None, None,
&mut new_fields_ids_map, &mut new_fields_ids_map,
&|| false, &|| false,
&|_progress| (), Progress::default(),
) )
.unwrap(); .unwrap();
@ -2741,7 +2742,7 @@ mod tests {
&document_changes, &document_changes,
embedders, embedders,
&|| false, &|| false,
&|_| (), &Progress::default(),
) )
.unwrap(); .unwrap();
wtxn.commit().unwrap(); wtxn.commit().unwrap();
@ -2785,7 +2786,7 @@ mod tests {
None, None,
&mut new_fields_ids_map, &mut new_fields_ids_map,
&|| false, &|| false,
&|_progress| (), Progress::default(),
) )
.unwrap(); .unwrap();
@ -2800,7 +2801,7 @@ mod tests {
&document_changes, &document_changes,
embedders, embedders,
&|| false, &|| false,
&|_| (), &Progress::default(),
) )
.unwrap(); .unwrap();
wtxn.commit().unwrap(); wtxn.commit().unwrap();
@ -2841,7 +2842,7 @@ mod tests {
None, None,
&mut new_fields_ids_map, &mut new_fields_ids_map,
&|| false, &|| false,
&|_progress| (), Progress::default(),
) )
.unwrap(); .unwrap();
@ -2856,7 +2857,7 @@ mod tests {
&document_changes, &document_changes,
embedders, embedders,
&|| false, &|| false,
&|_| (), &Progress::default(),
) )
.unwrap(); .unwrap();
wtxn.commit().unwrap(); wtxn.commit().unwrap();

View File

@ -1,5 +1,5 @@
--- ---
source: milli/src/update/index_documents/mod.rs source: crates/milli/src/update/index_documents/mod.rs
--- ---
3 0 48.9021 1 [19, ] 3 0 48.9021 1 [19, ]
3 0 49.9314 1 [17, ] 3 0 49.9314 1 [17, ]
@ -15,6 +15,11 @@ source: milli/src/update/index_documents/mod.rs
3 0 50.7453 1 [7, ] 3 0 50.7453 1 [7, ]
3 0 50.8466 1 [10, ] 3 0 50.8466 1 [10, ]
3 0 51.0537 1 [9, ] 3 0 51.0537 1 [9, ]
3 1 48.9021 2 [17, 19, ]
3 1 50.1793 3 [13, 14, 15, ]
3 1 50.4502 4 [0, 3, 8, 12, ]
3 1 50.6312 2 [1, 2, ]
3 1 50.7453 3 [7, 9, 10, ]
4 0 2.271 1 [17, ] 4 0 2.271 1 [17, ]
4 0 2.3708 1 [19, ] 4 0 2.3708 1 [19, ]
4 0 2.7637 1 [14, ] 4 0 2.7637 1 [14, ]
@ -28,4 +33,3 @@ source: milli/src/update/index_documents/mod.rs
4 0 3.6957 1 [9, ] 4 0 3.6957 1 [9, ]
4 0 3.9623 1 [12, ] 4 0 3.9623 1 [12, ]
4 0 4.337 1 [10, ] 4 0 4.337 1 [10, ]

View File

@ -1,7 +1,8 @@
use std::collections::{BTreeMap, BTreeSet}; use std::collections::{BTreeMap, BTreeSet};
use bumparaw_collections::RawMap;
use heed::RoTxn; use heed::RoTxn;
use raw_collections::RawMap; use rustc_hash::FxBuildHasher;
use serde_json::value::RawValue; use serde_json::value::RawValue;
use super::vector_document::VectorDocument; use super::vector_document::VectorDocument;
@ -385,12 +386,12 @@ pub type Entry<'doc> = (&'doc str, &'doc RawValue);
#[derive(Debug)] #[derive(Debug)]
pub struct Versions<'doc> { pub struct Versions<'doc> {
data: RawMap<'doc>, data: RawMap<'doc, FxBuildHasher>,
} }
impl<'doc> Versions<'doc> { impl<'doc> Versions<'doc> {
pub fn multiple( pub fn multiple(
mut versions: impl Iterator<Item = Result<RawMap<'doc>>>, mut versions: impl Iterator<Item = Result<RawMap<'doc, FxBuildHasher>>>,
) -> Result<Option<Self>> { ) -> Result<Option<Self>> {
let Some(data) = versions.next() else { return Ok(None) }; let Some(data) = versions.next() else { return Ok(None) };
let mut data = data?; let mut data = data?;
@ -403,7 +404,7 @@ impl<'doc> Versions<'doc> {
Ok(Some(Self::single(data))) Ok(Some(Self::single(data)))
} }
pub fn single(version: RawMap<'doc>) -> Self { pub fn single(version: RawMap<'doc, FxBuildHasher>) -> Self {
Self { data: version } Self { data: version }
} }

View File

@ -1,7 +1,10 @@
use bumpalo::Bump; use bumpalo::Bump;
use heed::RoTxn; use heed::RoTxn;
use super::document::{DocumentFromDb, DocumentFromVersions, MergedDocument, Versions}; use super::document::{
Document as _, DocumentFromDb, DocumentFromVersions, MergedDocument, Versions,
};
use super::extract::perm_json_p;
use super::vector_document::{ use super::vector_document::{
MergedVectorDocument, VectorDocumentFromDb, VectorDocumentFromVersions, MergedVectorDocument, VectorDocumentFromDb, VectorDocumentFromVersions,
}; };
@ -164,6 +167,80 @@ impl<'doc> Update<'doc> {
} }
} }
/// Returns whether the updated version of the document is different from the current version for the passed subset of fields.
///
/// `true` if at least one top-level field that is exactly a member of `fields`, or a parent of a member of `fields`, changed.
/// Otherwise `false`.
pub fn has_changed_for_fields<'t, Mapper: FieldIdMapper>(
&self,
fields: Option<&[&str]>,
rtxn: &'t RoTxn,
index: &'t Index,
mapper: &'t Mapper,
) -> Result<bool> {
let mut changed = false;
let mut cached_current = None;
let mut updated_selected_field_count = 0;
for entry in self.updated().iter_top_level_fields() {
let (key, updated_value) = entry?;
if perm_json_p::select_field(key, fields, &[]) == perm_json_p::Selection::Skip {
continue;
}
updated_selected_field_count += 1;
let current = match cached_current {
Some(current) => current,
None => self.current(rtxn, index, mapper)?,
};
let current_value = current.top_level_field(key)?;
let Some(current_value) = current_value else {
changed = true;
break;
};
if current_value.get() != updated_value.get() {
changed = true;
break;
}
cached_current = Some(current);
}
if !self.has_deletion {
// no field deletion, so fields that don't appear in `updated` cannot have changed
return Ok(changed);
}
if changed {
return Ok(true);
}
// we saw all updated fields, and set `changed` if any field wasn't in `current`.
// so if there are as many fields in `current` as in `updated`, then nothing changed.
// If there are any more fields in `current`, then they are missing in `updated`.
let has_deleted_fields = {
let current = match cached_current {
Some(current) => current,
None => self.current(rtxn, index, mapper)?,
};
let mut current_selected_field_count = 0;
for entry in current.iter_top_level_fields() {
let (key, _) = entry?;
if perm_json_p::select_field(key, fields, &[]) == perm_json_p::Selection::Skip {
continue;
}
current_selected_field_count += 1;
}
current_selected_field_count != updated_selected_field_count
};
Ok(has_deleted_fields)
}
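
A self-contained toy model of the decision above, using plain maps instead of the real document types; everything here is illustrative and not part of the codebase:

use std::collections::BTreeMap;

fn changed_for_fields(
    current: &BTreeMap<&str, &str>,
    updated: &BTreeMap<&str, &str>,
    fields: &[&str],
    has_deletion: bool,
) -> bool {
    let mut updated_selected = 0;
    for (key, updated_value) in updated {
        if !fields.contains(key) {
            continue;
        }
        updated_selected += 1;
        match current.get(key) {
            Some(current_value) if current_value == updated_value => {}
            // Field is new or its value changed.
            _ => return true,
        }
    }
    if !has_deletion {
        // Without deletions, fields absent from `updated` cannot have changed.
        return false;
    }
    // Every selected updated field matched; a difference can now only come from a
    // selected field that exists in `current` but was deleted from `updated`.
    let current_selected = current.keys().filter(|key| fields.contains(*key)).count();
    current_selected != updated_selected
}

fn main() {
    let current = BTreeMap::from([("title", "Dune"), ("year", "1965"), ("genre", "sci-fi")]);
    let updated = BTreeMap::from([("title", "Dune"), ("year", "1965")]);
    // `genre` was deleted: only the field counts reveal it.
    assert!(changed_for_fields(&current, &updated, &["title", "year", "genre"], true));
    // If `genre` is not part of the selected fields, nothing changed.
    assert!(!changed_for_fields(&current, &updated, &["title", "year"], true));
}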
pub fn updated_vectors( pub fn updated_vectors(
&self, &self,
doc_alloc: &'doc Bump, doc_alloc: &'doc Bump,

View File

@ -69,12 +69,12 @@ use std::io::BufReader;
use std::{io, iter, mem}; use std::{io, iter, mem};
use bumpalo::Bump; use bumpalo::Bump;
use bumparaw_collections::bbbul::{BitPacker, BitPacker4x};
use bumparaw_collections::map::FrozenMap;
use bumparaw_collections::{Bbbul, FrozenBbbul};
use grenad::ReaderCursor; use grenad::ReaderCursor;
use hashbrown::hash_map::RawEntryMut; use hashbrown::hash_map::RawEntryMut;
use hashbrown::HashMap; use hashbrown::HashMap;
use raw_collections::bbbul::{BitPacker, BitPacker4x};
use raw_collections::map::FrozenMap;
use raw_collections::{Bbbul, FrozenBbbul};
use roaring::RoaringBitmap; use roaring::RoaringBitmap;
use rustc_hash::FxBuildHasher; use rustc_hash::FxBuildHasher;
@ -177,12 +177,12 @@ impl<'extractor> BalancedCaches<'extractor> {
Ok(()) Ok(())
} }
pub fn freeze(&mut self) -> Result<Vec<FrozenCache<'_, 'extractor>>> { pub fn freeze(&mut self, source_id: usize) -> Result<Vec<FrozenCache<'_, 'extractor>>> {
match &mut self.caches { match &mut self.caches {
InnerCaches::Normal(NormalCaches { caches }) => caches InnerCaches::Normal(NormalCaches { caches }) => caches
.iter_mut() .iter_mut()
.enumerate() .enumerate()
.map(|(bucket, map)| { .map(|(bucket_id, map)| {
// safety: we are transmuting the Bbbul into a FrozenBbbul // safety: we are transmuting the Bbbul into a FrozenBbbul
// that are the same size. // that are the same size.
let map = unsafe { let map = unsafe {
@ -201,14 +201,19 @@ impl<'extractor> BalancedCaches<'extractor> {
>, >,
>(map) >(map)
}; };
Ok(FrozenCache { bucket, cache: FrozenMap::new(map), spilled: Vec::new() }) Ok(FrozenCache {
source_id,
bucket_id,
cache: FrozenMap::new(map),
spilled: Vec::new(),
})
}) })
.collect(), .collect(),
InnerCaches::Spilling(SpillingCaches { caches, spilled_entries, .. }) => caches InnerCaches::Spilling(SpillingCaches { caches, spilled_entries, .. }) => caches
.iter_mut() .iter_mut()
.zip(mem::take(spilled_entries)) .zip(mem::take(spilled_entries))
.enumerate() .enumerate()
.map(|(bucket, (map, sorter))| { .map(|(bucket_id, (map, sorter))| {
let spilled = sorter let spilled = sorter
.into_reader_cursors()? .into_reader_cursors()?
.into_iter() .into_iter()
@ -234,7 +239,7 @@ impl<'extractor> BalancedCaches<'extractor> {
>, >,
>(map) >(map)
}; };
Ok(FrozenCache { bucket, cache: FrozenMap::new(map), spilled }) Ok(FrozenCache { source_id, bucket_id, cache: FrozenMap::new(map), spilled })
}) })
.collect(), .collect(),
} }
@ -440,7 +445,8 @@ fn spill_entry_to_sorter(
} }
pub struct FrozenCache<'a, 'extractor> { pub struct FrozenCache<'a, 'extractor> {
bucket: usize, bucket_id: usize,
source_id: usize,
cache: FrozenMap< cache: FrozenMap<
'a, 'a,
'extractor, 'extractor,
@ -457,9 +463,9 @@ pub fn transpose_and_freeze_caches<'a, 'extractor>(
let width = caches.first().map(BalancedCaches::buckets).unwrap_or(0); let width = caches.first().map(BalancedCaches::buckets).unwrap_or(0);
let mut bucket_caches: Vec<_> = iter::repeat_with(Vec::new).take(width).collect(); let mut bucket_caches: Vec<_> = iter::repeat_with(Vec::new).take(width).collect();
for thread_cache in caches { for (thread_index, thread_cache) in caches.iter_mut().enumerate() {
for frozen in thread_cache.freeze()? { for frozen in thread_cache.freeze(thread_index)? {
bucket_caches[frozen.bucket].push(frozen); bucket_caches[frozen.bucket_id].push(frozen);
} }
} }
@ -477,21 +483,16 @@ where
F: for<'a> FnMut(&'a [u8], DelAddRoaringBitmap) -> Result<()>, F: for<'a> FnMut(&'a [u8], DelAddRoaringBitmap) -> Result<()>,
{ {
let mut maps = Vec::new(); let mut maps = Vec::new();
let mut readers = Vec::new();
let mut current_bucket = None;
for FrozenCache { bucket, cache, ref mut spilled } in frozen {
assert_eq!(*current_bucket.get_or_insert(bucket), bucket);
maps.push(cache);
readers.append(spilled);
}
// First manage the spilled entries by looking into the HashMaps,
// merge them and mark them as dummy.
let mut heap = BinaryHeap::new(); let mut heap = BinaryHeap::new();
for (source_index, source) in readers.into_iter().enumerate() { let mut current_bucket = None;
let mut cursor = source.into_cursor()?; for FrozenCache { source_id, bucket_id, cache, spilled } in frozen {
if cursor.move_on_next()?.is_some() { assert_eq!(*current_bucket.get_or_insert(bucket_id), bucket_id);
heap.push(Entry { cursor, source_index }); maps.push((source_id, cache));
for reader in spilled {
let mut cursor = reader.into_cursor()?;
if cursor.move_on_next()?.is_some() {
heap.push(Entry { cursor, source_id });
}
} }
} }
@ -508,25 +509,29 @@ where
let mut output = DelAddRoaringBitmap::from_bytes(first_value)?; let mut output = DelAddRoaringBitmap::from_bytes(first_value)?;
while let Some(mut entry) = heap.peek_mut() { while let Some(mut entry) = heap.peek_mut() {
if let Some((key, _value)) = entry.cursor.current() { if let Some((key, value)) = entry.cursor.current() {
if first_key == key { if first_key != key {
let new = DelAddRoaringBitmap::from_bytes(first_value)?;
output = output.merge(new);
// When we are done with the current value of this entry, move the
// cursor forward and let the heap reorganize itself (on drop)
if entry.cursor.move_on_next()?.is_none() {
PeekMut::pop(entry);
}
} else {
break; break;
} }
let new = DelAddRoaringBitmap::from_bytes(value)?;
output = output.merge(new);
// When we are done with the current value of this entry, move the
// cursor forward and let the heap reorganize itself (on drop)
if entry.cursor.move_on_next()?.is_none() {
PeekMut::pop(entry);
}
} }
} }
// Once we merged all of the spilled bitmaps we must also // Once we merged all of the spilled bitmaps we must also
// fetch the entries from the non-spilled entries (the HashMaps). // fetch the entries from the non-spilled entries (the HashMaps).
for (map_index, map) in maps.iter_mut().enumerate() { for (source_id, map) in maps.iter_mut() {
if first_entry.source_index != map_index { debug_assert!(
!(map.get(first_key).is_some() && first_entry.source_id == *source_id),
"A thread should not have spiled a key that has been inserted in the cache"
);
if first_entry.source_id != *source_id {
if let Some(new) = map.get_mut(first_key) { if let Some(new) = map.get_mut(first_key) {
output.union_and_clear_bbbul(new); output.union_and_clear_bbbul(new);
} }
@ -538,12 +543,12 @@ where
// Don't forget to put the first entry back into the heap. // Don't forget to put the first entry back into the heap.
if first_entry.cursor.move_on_next()?.is_some() { if first_entry.cursor.move_on_next()?.is_some() {
heap.push(first_entry) heap.push(first_entry);
} }
} }
// Then manage the content on the HashMap entries that weren't taken (mem::take). // Then manage the content on the HashMap entries that weren't taken (mem::take).
while let Some(mut map) = maps.pop() { while let Some((_, mut map)) = maps.pop() {
// Make sure we don't try to work with entries already managed by the spilled // Make sure we don't try to work with entries already managed by the spilled
let mut ordered_entries: Vec<_> = let mut ordered_entries: Vec<_> =
map.iter_mut().filter(|(_, bbbul)| !bbbul.is_empty()).collect(); map.iter_mut().filter(|(_, bbbul)| !bbbul.is_empty()).collect();
@@ -553,7 +558,7 @@ where
                 let mut output = DelAddRoaringBitmap::empty();
                 output.union_and_clear_bbbul(bbbul);

-                for rhs in maps.iter_mut() {
+                for (_, rhs) in maps.iter_mut() {
                     if let Some(new) = rhs.get_mut(key) {
                         output.union_and_clear_bbbul(new);
                     }
@@ -569,14 +574,14 @@ where
 struct Entry<R> {
     cursor: ReaderCursor<R>,
-    source_index: usize,
+    source_id: usize,
 }

 impl<R> Ord for Entry<R> {
     fn cmp(&self, other: &Entry<R>) -> Ordering {
         let skey = self.cursor.current().map(|(k, _)| k);
         let okey = other.cursor.current().map(|(k, _)| k);
-        skey.cmp(&okey).then(self.source_index.cmp(&other.source_index)).reverse()
+        skey.cmp(&okey).then(self.source_id.cmp(&other.source_id)).reverse()
     }
 }
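For context, the Entry/heap machinery in this file implements a k-way merge over sorted cursors. Below is a self-contained sketch of the same pattern using plain sorted Vecs instead of grenad reader cursors; Source, merge and the integer payloads are illustrative only, and each source is assumed to hold distinct keys.

use std::cmp::Ordering;
use std::collections::binary_heap::PeekMut;
use std::collections::BinaryHeap;

/// One merge source: a sorted list of (key, value) pairs plus a cursor position.
struct Source {
    entries: Vec<(u32, u64)>, // sorted by key, keys distinct within one source
    pos: usize,
    source_id: usize,
}

impl Source {
    fn current(&self) -> Option<&(u32, u64)> {
        self.entries.get(self.pos)
    }
}

// Order the heap so the smallest current key comes out first (BinaryHeap is a
// max-heap, hence the final `.reverse()`), mirroring the Entry ordering above.
impl Ord for Source {
    fn cmp(&self, other: &Self) -> Ordering {
        let skey = self.current().map(|(k, _)| k);
        let okey = other.current().map(|(k, _)| k);
        skey.cmp(&okey).then(self.source_id.cmp(&other.source_id)).reverse()
    }
}
impl PartialOrd for Source {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}
impl PartialEq for Source {
    fn eq(&self, other: &Self) -> bool {
        self.cmp(other) == Ordering::Equal
    }
}
impl Eq for Source {}

/// Merge all sources, summing the values that share the same key.
fn merge(sources: Vec<Source>) -> Vec<(u32, u64)> {
    let mut heap: BinaryHeap<Source> =
        sources.into_iter().filter(|s| s.current().is_some()).collect();
    let mut out = Vec::new();

    while let Some(mut first) = heap.pop() {
        let (first_key, mut acc) = *first.current().unwrap();
        first.pos += 1;

        // Drain every other source that currently sits on the same key.
        while let Some(mut entry) = heap.peek_mut() {
            let current = entry.current().copied();
            match current {
                Some((key, value)) if key == first_key => {
                    acc += value;
                    entry.pos += 1;
                    if entry.current().is_none() {
                        PeekMut::pop(entry);
                    }
                }
                _ => break,
            }
        }

        out.push((first_key, acc));
        if first.current().is_some() {
            heap.push(first);
        }
    }
    out
}

fn main() {
    let a = Source { entries: vec![(1, 10), (3, 30)], pos: 0, source_id: 0 };
    let b = Source { entries: vec![(1, 1), (2, 2)], pos: 0, source_id: 1 };
    println!("{:?}", merge(vec![a, b])); // [(1, 11), (2, 2), (3, 30)]
}

Keeping the smallest current key at the top of the reversed max-heap is what lets the real code accumulate every source's posting list for one key before moving on to the next key.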

View File

@@ -16,10 +16,10 @@ use crate::update::del_add::DelAdd;
 use crate::update::new::channel::FieldIdDocidFacetSender;
 use crate::update::new::extract::perm_json_p;
 use crate::update::new::indexer::document_changes::{
-    extract, DocumentChangeContext, DocumentChanges, Extractor, IndexingContext, Progress,
+    extract, DocumentChangeContext, DocumentChanges, Extractor, IndexingContext,
 };
 use crate::update::new::ref_cell_ext::RefCellExt as _;
-use crate::update::new::steps::Step;
+use crate::update::new::steps::IndexingStep;
 use crate::update::new::thread_local::{FullySend, ThreadLocal};
 use crate::update::new::DocumentChange;
 use crate::update::GrenadParameters;
@@ -97,6 +97,15 @@ impl FacetedDocidsExtractor {
                 },
             ),
             DocumentChange::Update(inner) => {
+                if !inner.has_changed_for_fields(
+                    Some(attributes_to_extract),
+                    rtxn,
+                    index,
+                    context.db_fields_ids_map,
+                )? {
+                    return Ok(());
+                }
+
                 extract_document_facets(
                     attributes_to_extract,
                     inner.current(rtxn, index, context.db_fields_ids_map)?,
@@ -364,26 +373,16 @@ fn truncate_str(s: &str) -> &str {
 impl FacetedDocidsExtractor {
     #[tracing::instrument(level = "trace", skip_all, target = "indexing::extract::faceted")]
-    pub fn run_extraction<
-        'pl,
-        'fid,
-        'indexer,
-        'index,
-        'extractor,
-        DC: DocumentChanges<'pl>,
-        MSP,
-        SP,
-    >(
+    pub fn run_extraction<'pl, 'fid, 'indexer, 'index, 'extractor, DC: DocumentChanges<'pl>, MSP>(
         grenad_parameters: GrenadParameters,
         document_changes: &DC,
-        indexing_context: IndexingContext<'fid, 'indexer, 'index, MSP, SP>,
+        indexing_context: IndexingContext<'fid, 'indexer, 'index, MSP>,
         extractor_allocs: &'extractor mut ThreadLocal<FullySend<Bump>>,
         sender: &FieldIdDocidFacetSender,
-        step: Step,
+        step: IndexingStep,
     ) -> Result<Vec<BalancedCaches<'extractor>>>
     where
         MSP: Fn() -> bool + Sync,
-        SP: Fn(Progress) + Sync,
     {
         let index = indexing_context.index;
         let rtxn = index.read_txn()?;
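The guard added to the Update arm is the interesting part of this file: when none of the faceted fields changed between the stored and the incoming version of a document, the whole delete-then-reinsert pass can be skipped. A minimal sketch of that early-return pattern; the Document alias, the field list and the return value are illustrative, not the real milli types:

use std::collections::HashMap;

type Document = HashMap<String, String>;

/// True when at least one of the tracked fields differs between versions.
fn has_changed_for_fields(current: &Document, new: &Document, fields: &[&str]) -> bool {
    fields.iter().any(|f| current.get(*f) != new.get(*f))
}

/// Returns a dummy "number of write operations" to make the skip visible.
fn extract_update(current: &Document, new: &Document, faceted_fields: &[&str]) -> usize {
    // Mirrors the early return above: nothing to delete or re-add when the
    // faceted fields are identical in both versions.
    if !has_changed_for_fields(current, new, faceted_fields) {
        return 0;
    }
    // ... delete the old facet entries, then index the new ones ...
    2
}

fn main() {
    let old = Document::from([("genre".into(), "sci-fi".into()), ("views".into(), "10".into())]);
    let mut new = old.clone();
    new.insert("views".into(), "11".into()); // a non-faceted field changed

    assert_eq!(extract_update(&old, &new, &["genre"]), 0); // skipped entirely
    new.insert("genre".into(), "fantasy".into());
    assert_eq!(extract_update(&old, &new, &["genre"]), 2); // re-indexed
}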

View File

@@ -15,23 +15,22 @@ pub use geo::*;
 pub use searchable::*;
 pub use vectors::EmbeddingExtractor;

-use super::indexer::document_changes::{DocumentChanges, IndexingContext, Progress};
-use super::steps::Step;
+use super::indexer::document_changes::{DocumentChanges, IndexingContext};
+use super::steps::IndexingStep;
 use super::thread_local::{FullySend, ThreadLocal};
 use crate::update::GrenadParameters;
 use crate::Result;

 pub trait DocidsExtractor {
-    fn run_extraction<'pl, 'fid, 'indexer, 'index, 'extractor, DC: DocumentChanges<'pl>, MSP, SP>(
+    fn run_extraction<'pl, 'fid, 'indexer, 'index, 'extractor, DC: DocumentChanges<'pl>, MSP>(
         grenad_parameters: GrenadParameters,
         document_changes: &DC,
-        indexing_context: IndexingContext<'fid, 'indexer, 'index, MSP, SP>,
+        indexing_context: IndexingContext<'fid, 'indexer, 'index, MSP>,
         extractor_allocs: &'extractor mut ThreadLocal<FullySend<Bump>>,
-        step: Step,
+        step: IndexingStep,
     ) -> Result<Vec<BalancedCaches<'extractor>>>
     where
-        MSP: Fn() -> bool + Sync,
-        SP: Fn(Progress) + Sync;
+        MSP: Fn() -> bool + Sync;
 }

 /// TODO move in permissive json pointer

View File

@@ -11,10 +11,10 @@ use super::tokenize_document::{tokenizer_builder, DocumentTokenizer};
 use crate::update::new::extract::cache::BalancedCaches;
 use crate::update::new::extract::perm_json_p::contained_in;
 use crate::update::new::indexer::document_changes::{
-    extract, DocumentChangeContext, DocumentChanges, Extractor, IndexingContext, Progress,
+    extract, DocumentChangeContext, DocumentChanges, Extractor, IndexingContext,
 };
 use crate::update::new::ref_cell_ext::RefCellExt as _;
-use crate::update::new::steps::Step;
+use crate::update::new::steps::IndexingStep;
 use crate::update::new::thread_local::{FullySend, MostlySend, ThreadLocal};
 use crate::update::new::DocumentChange;
 use crate::update::GrenadParameters;
@@ -28,7 +28,7 @@ pub struct WordDocidsBalancedCaches<'extractor> {
     exact_word_docids: BalancedCaches<'extractor>,
     word_position_docids: BalancedCaches<'extractor>,
     fid_word_count_docids: BalancedCaches<'extractor>,
-    fid_word_count: HashMap<FieldId, (usize, usize)>,
+    fid_word_count: HashMap<FieldId, (Option<usize>, Option<usize>)>,
     current_docid: Option<DocumentId>,
 }
@@ -85,8 +85,8 @@ impl<'extractor> WordDocidsBalancedCaches<'extractor> {
         self.fid_word_count
             .entry(field_id)
-            .and_modify(|(_current_count, new_count)| *new_count += 1)
-            .or_insert((0, 1));
+            .and_modify(|(_current_count, new_count)| *new_count.get_or_insert(0) += 1)
+            .or_insert((None, Some(1)));
         self.current_docid = Some(docid);

         Ok(())
@@ -130,8 +130,8 @@ impl<'extractor> WordDocidsBalancedCaches<'extractor> {
         self.fid_word_count
             .entry(field_id)
-            .and_modify(|(current_count, _new_count)| *current_count += 1)
-            .or_insert((1, 0));
+            .and_modify(|(current_count, _new_count)| *current_count.get_or_insert(0) += 1)
+            .or_insert((Some(1), None));
         self.current_docid = Some(docid);
@@ -141,14 +141,18 @@ impl<'extractor> WordDocidsBalancedCaches<'extractor> {
     fn flush_fid_word_count(&mut self, buffer: &mut BumpVec<u8>) -> Result<()> {
         for (fid, (current_count, new_count)) in self.fid_word_count.drain() {
             if current_count != new_count {
-                if current_count <= MAX_COUNTED_WORDS {
+                if let Some(current_count) =
+                    current_count.filter(|current_count| *current_count <= MAX_COUNTED_WORDS)
+                {
                     buffer.clear();
                     buffer.extend_from_slice(&fid.to_be_bytes());
                     buffer.push(current_count as u8);
                     self.fid_word_count_docids
                         .insert_del_u32(buffer, self.current_docid.unwrap())?;
                 }
-                if new_count <= MAX_COUNTED_WORDS {
+                if let Some(new_count) =
+                    new_count.filter(|new_count| *new_count <= MAX_COUNTED_WORDS)
+                {
                     buffer.clear();
                     buffer.extend_from_slice(&fid.to_be_bytes());
                     buffer.push(new_count as u8);
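Replacing the (usize, usize) pair with (Option<usize>, Option<usize>) lets the cache distinguish "the field was absent on this side of the update" from "the field had zero counted words", so only the side that actually existed gets flushed. A compact sketch of the same bookkeeping; FidWordCount and the constant's value are stand-ins for the real milli types:

use std::collections::HashMap;

type FieldId = u16;
const MAX_COUNTED_WORDS: usize = 30; // assumed limit for the sketch; the real constant lives in milli

#[derive(Default)]
struct FidWordCount {
    // (deleted-version count, added-version count); `None` means the field did
    // not appear at all on that side, which is not the same as `Some(0)`.
    counts: HashMap<FieldId, (Option<usize>, Option<usize>)>,
}

impl FidWordCount {
    fn count_del_word(&mut self, fid: FieldId) {
        self.counts
            .entry(fid)
            .and_modify(|(current, _)| *current.get_or_insert(0) += 1)
            .or_insert((Some(1), None));
    }

    fn count_add_word(&mut self, fid: FieldId) {
        self.counts
            .entry(fid)
            .and_modify(|(_, new)| *new.get_or_insert(0) += 1)
            .or_insert((None, Some(1)));
    }

    /// Emit (fid, count, is_addition) entries only when the count actually
    /// changed and the side existed with a count under the indexable limit.
    fn flush(&mut self) -> Vec<(FieldId, usize, bool)> {
        let mut out = Vec::new();
        for (fid, (current, new)) in self.counts.drain() {
            if current == new {
                continue;
            }
            if let Some(c) = current.filter(|c| *c <= MAX_COUNTED_WORDS) {
                out.push((fid, c, false)); // deletion side
            }
            if let Some(c) = new.filter(|c| *c <= MAX_COUNTED_WORDS) {
                out.push((fid, c, true)); // addition side
            }
        }
        out
    }
}

fn main() {
    let mut counts = FidWordCount::default();
    counts.count_del_word(0); // the old version had one word in field 0
    counts.count_add_word(0);
    counts.count_add_word(0); // the new version has two
    println!("{:?}", counts.flush()); // [(0, 1, false), (0, 2, true)]
}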
@@ -235,25 +239,15 @@ impl<'a, 'extractor> Extractor<'extractor> for WordDocidsExtractorData<'a> {
 pub struct WordDocidsExtractors;

 impl WordDocidsExtractors {
-    pub fn run_extraction<
-        'pl,
-        'fid,
-        'indexer,
-        'index,
-        'extractor,
-        DC: DocumentChanges<'pl>,
-        MSP,
-        SP,
-    >(
+    pub fn run_extraction<'pl, 'fid, 'indexer, 'index, 'extractor, DC: DocumentChanges<'pl>, MSP>(
         grenad_parameters: GrenadParameters,
         document_changes: &DC,
-        indexing_context: IndexingContext<'fid, 'indexer, 'index, MSP, SP>,
+        indexing_context: IndexingContext<'fid, 'indexer, 'index, MSP>,
         extractor_allocs: &'extractor mut ThreadLocal<FullySend<Bump>>,
-        step: Step,
+        step: IndexingStep,
     ) -> Result<WordDocidsCaches<'extractor>>
     where
         MSP: Fn() -> bool + Sync,
-        SP: Fn(Progress) + Sync,
     {
         let index = indexing_context.index;
         let rtxn = index.read_txn()?;
@@ -351,6 +345,15 @@ impl WordDocidsExtractors {
                 )?;
             }
             DocumentChange::Update(inner) => {
+                if !inner.has_changed_for_fields(
+                    document_tokenizer.attribute_to_extract,
+                    &context.rtxn,
+                    context.index,
+                    context.db_fields_ids_map,
+                )? {
+                    return Ok(());
+                }
+
                 let mut token_fn = |fname: &str, fid, pos, word: &str| {
                     cached_sorter.insert_del_u32(
                         fid,

View File

@@ -70,6 +70,15 @@ impl SearchableExtractor for WordPairProximityDocidsExtractor {
                 )?;
             }
             DocumentChange::Update(inner) => {
+                if !inner.has_changed_for_fields(
+                    document_tokenizer.attribute_to_extract,
+                    rtxn,
+                    index,
+                    context.db_fields_ids_map,
+                )? {
+                    return Ok(());
+                }
+
                 let document = inner.current(rtxn, index, context.db_fields_ids_map)?;
                 process_document_tokens(
                     document,

View File

@@ -14,9 +14,9 @@ use tokenize_document::{tokenizer_builder, DocumentTokenizer};
 use super::cache::BalancedCaches;
 use super::DocidsExtractor;
 use crate::update::new::indexer::document_changes::{
-    extract, DocumentChangeContext, DocumentChanges, Extractor, IndexingContext, Progress,
+    extract, DocumentChangeContext, DocumentChanges, Extractor, IndexingContext,
 };
-use crate::update::new::steps::Step;
+use crate::update::new::steps::IndexingStep;
 use crate::update::new::thread_local::{FullySend, ThreadLocal};
 use crate::update::new::DocumentChange;
 use crate::update::GrenadParameters;
@@ -56,16 +56,15 @@ impl<'a, 'extractor, EX: SearchableExtractor + Sync> Extractor<'extractor>
 }

 pub trait SearchableExtractor: Sized + Sync {
-    fn run_extraction<'pl, 'fid, 'indexer, 'index, 'extractor, DC: DocumentChanges<'pl>, MSP, SP>(
+    fn run_extraction<'pl, 'fid, 'indexer, 'index, 'extractor, DC: DocumentChanges<'pl>, MSP>(
         grenad_parameters: GrenadParameters,
         document_changes: &DC,
-        indexing_context: IndexingContext<'fid, 'indexer, 'index, MSP, SP>,
+        indexing_context: IndexingContext<'fid, 'indexer, 'index, MSP>,
         extractor_allocs: &'extractor mut ThreadLocal<FullySend<Bump>>,
-        step: Step,
+        step: IndexingStep,
     ) -> Result<Vec<BalancedCaches<'extractor>>>
     where
         MSP: Fn() -> bool + Sync,
-        SP: Fn(Progress) + Sync,
     {
         let rtxn = indexing_context.index.read_txn()?;
         let stop_words = indexing_context.index.stop_words(&rtxn)?;
@@ -134,16 +133,15 @@ pub trait SearchableExtractor: Sized + Sync {
 }

 impl<T: SearchableExtractor> DocidsExtractor for T {
-    fn run_extraction<'pl, 'fid, 'indexer, 'index, 'extractor, DC: DocumentChanges<'pl>, MSP, SP>(
+    fn run_extraction<'pl, 'fid, 'indexer, 'index, 'extractor, DC: DocumentChanges<'pl>, MSP>(
         grenad_parameters: GrenadParameters,
         document_changes: &DC,
-        indexing_context: IndexingContext<'fid, 'indexer, 'index, MSP, SP>,
+        indexing_context: IndexingContext<'fid, 'indexer, 'index, MSP>,
         extractor_allocs: &'extractor mut ThreadLocal<FullySend<Bump>>,
-        step: Step,
+        step: IndexingStep,
     ) -> Result<Vec<BalancedCaches<'extractor>>>
     where
         MSP: Fn() -> bool + Sync,
-        SP: Fn(Progress) + Sync,
     {
         Self::run_extraction(
             grenad_parameters,
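The shape of this file is worth calling out: SearchableExtractor carries the shared driver as a default trait method, and the blanket impl<T: SearchableExtractor> DocidsExtractor for T forwards to it, so dropping the SP parameter in one place updates every extractor at once. A stripped-down sketch of that pattern; all types and the toy extraction logic are placeholders:

trait DocidsExtractor {
    fn run_extraction<MSP>(docs: &[&str], must_stop: MSP) -> Vec<String>
    where
        MSP: Fn() -> bool + Sync;
}

trait SearchableExtractor: Sized + Sync {
    fn extract_one(doc: &str) -> String;

    // Default method: shared driver loop, checking the stop flag between documents.
    fn run_extraction<MSP>(docs: &[&str], must_stop: MSP) -> Vec<String>
    where
        MSP: Fn() -> bool + Sync,
    {
        docs.iter().take_while(|_| !must_stop()).map(|doc| Self::extract_one(doc)).collect()
    }
}

// Blanket impl, as in the hunk above: every SearchableExtractor automatically
// provides the DocidsExtractor entry point by forwarding to its own method.
impl<T: SearchableExtractor> DocidsExtractor for T {
    fn run_extraction<MSP>(docs: &[&str], must_stop: MSP) -> Vec<String>
    where
        MSP: Fn() -> bool + Sync,
    {
        <T as SearchableExtractor>::run_extraction(docs, must_stop)
    }
}

struct WordExtractor;

impl SearchableExtractor for WordExtractor {
    fn extract_one(doc: &str) -> String {
        doc.to_uppercase()
    }
}

fn main() {
    let out = <WordExtractor as DocidsExtractor>::run_extraction(&["hello", "world"], || false);
    assert_eq!(out, vec!["HELLO".to_string(), "WORLD".to_string()]);
    println!("{out:?}");
}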

View File

@@ -176,9 +176,10 @@ pub fn tokenizer_builder<'a>(
 #[cfg(test)]
 mod test {
     use bumpalo::Bump;
+    use bumparaw_collections::RawMap;
     use charabia::TokenizerBuilder;
     use meili_snap::snapshot;
-    use raw_collections::RawMap;
+    use rustc_hash::FxBuildHasher;
     use serde_json::json;
     use serde_json::value::RawValue;
@@ -234,7 +235,7 @@ mod test {
         let bump = Bump::new();

         let document: &RawValue = serde_json::from_str(&document).unwrap();
-        let document = RawMap::from_raw_value(document, &bump).unwrap();
+        let document = RawMap::from_raw_value_and_hasher(document, FxBuildHasher, &bump).unwrap();
         let document = Versions::single(document);
         let document = DocumentFromVersions::new(&document);

View File

@@ -130,6 +130,7 @@ impl<'a, 'b, 'extractor> Extractor<'extractor> for EmbeddingExtractor<'a, 'b> {
                         );
                     } else if new_vectors.regenerate {
                         let new_rendered = prompt.render_document(
+                            update.external_document_id(),
                             update.current(
                                 &context.rtxn,
                                 context.index,
@@ -139,6 +140,7 @@ impl<'a, 'b, 'extractor> Extractor<'extractor> for EmbeddingExtractor<'a, 'b> {
                             &context.doc_alloc,
                         )?;
                         let old_rendered = prompt.render_document(
+                            update.external_document_id(),
                             update.merged(
                                 &context.rtxn,
                                 context.index,
@@ -158,6 +160,7 @@ impl<'a, 'b, 'extractor> Extractor<'extractor> for EmbeddingExtractor<'a, 'b> {
                         }
                     } else if old_vectors.regenerate {
                         let old_rendered = prompt.render_document(
+                            update.external_document_id(),
                             update.current(
                                 &context.rtxn,
                                 context.index,
@@ -167,6 +170,7 @@ impl<'a, 'b, 'extractor> Extractor<'extractor> for EmbeddingExtractor<'a, 'b> {
                             &context.doc_alloc,
                         )?;
                         let new_rendered = prompt.render_document(
+                            update.external_document_id(),
                             update.merged(
                                 &context.rtxn,
                                 context.index,
@@ -216,6 +220,7 @@ impl<'a, 'b, 'extractor> Extractor<'extractor> for EmbeddingExtractor<'a, 'b> {
                         );
                     } else if new_vectors.regenerate {
                         let rendered = prompt.render_document(
+                            insertion.external_document_id(),
                             insertion.inserted(),
                             context.new_fields_ids_map,
                             &context.doc_alloc,
@@ -229,6 +234,7 @@ impl<'a, 'b, 'extractor> Extractor<'extractor> for EmbeddingExtractor<'a, 'b> {
                         }
                     } else {
                         let rendered = prompt.render_document(
+                            insertion.external_document_id(),
                             insertion.inserted(),
                             context.new_fields_ids_map,
                             &context.doc_alloc,
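Every call site above now passes the external document id into prompt.render_document as its first argument. As a purely hypothetical illustration of why an id travels with a rendered prompt (so that whatever consumes the prompt, such as an embedder request or an error report, can be attributed to a concrete document), not the milli Prompt API:

/// Illustrative sketch only: a render step that carries the external document
/// id alongside the rendered prompt text.
struct RenderedPrompt {
    external_docid: String,
    prompt: String,
}

fn render_document(external_docid: &str, fields: &[(&str, &str)]) -> RenderedPrompt {
    let prompt = fields
        .iter()
        .map(|(name, value)| format!("{name}: {value}"))
        .collect::<Vec<_>>()
        .join("\n");
    RenderedPrompt { external_docid: external_docid.to_string(), prompt }
}

fn main() {
    let rendered = render_document("movie-42", &[("title", "Dune"), ("genre", "sci-fi")]);
    println!("{} =>\n{}", rendered.external_docid, rendered.prompt);
}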

View File

@@ -103,6 +103,8 @@ impl<'indexer> FacetSearchBuilder<'indexer> {
     #[tracing::instrument(level = "trace", skip_all, target = "indexing::facet_fst")]
     pub fn merge_and_write(self, index: &Index, wtxn: &mut RwTxn, rtxn: &RoTxn) -> Result<()> {
+        tracing::trace!("merge facet strings for facet search: {:?}", self.registered_facets);
+
         let reader = self.normalized_facet_string_docids_sorter.into_reader_cursors()?;
         let mut builder = grenad::MergerBuilder::new(MergeDeladdBtreesetString);
         builder.extend(reader);
@@ -118,12 +120,15 @@ impl<'indexer> FacetSearchBuilder<'indexer> {
                 BEU16StrCodec::bytes_decode(key).map_err(heed::Error::Encoding)?;

             if current_field_id != Some(field_id) {
-                if let Some(fst_merger_builder) = fst_merger_builder {
+                if let (Some(current_field_id), Some(fst_merger_builder)) =
+                    (current_field_id, fst_merger_builder)
+                {
                     let mmap = fst_merger_builder.build(&mut callback)?;
-                    index
-                        .facet_id_string_fst
-                        .remap_data_type::<Bytes>()
-                        .put(wtxn, &field_id, &mmap)?;
+                    index.facet_id_string_fst.remap_data_type::<Bytes>().put(
+                        wtxn,
+                        &current_field_id,
+                        &mmap,
+                    )?;
                 }

                 fst = index.facet_id_string_fst.get(rtxn, &field_id)?;
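The bug being fixed here is subtle: the FST accumulated for one field was flushed under the field id of the entry that triggered the flush, i.e. the next field. The general shape of the fix, flushing a buffer for the previous group when the key changes on a sorted stream, in a minimal form; BTreeMap and strings stand in for the FST builder and the LMDB put:

use std::collections::BTreeMap;

/// Flush accumulated values for the *previous* group when the key changes.
/// The write must use the field id the buffer was accumulated for, not the
/// field id of the entry that triggered the flush.
fn group_sorted(entries: &[(u16, &str)]) -> BTreeMap<u16, Vec<String>> {
    let mut out = BTreeMap::new();
    let mut current_field_id: Option<u16> = None;
    let mut buffer: Vec<String> = Vec::new();

    for (field_id, value) in entries {
        if current_field_id != Some(*field_id) {
            if let Some(previous) = current_field_id {
                // Write under `previous`, the id the buffer belongs to.
                out.insert(previous, std::mem::take(&mut buffer));
            }
            current_field_id = Some(*field_id);
        }
        buffer.push(value.to_string());
    }
    if let Some(previous) = current_field_id {
        out.insert(previous, buffer);
    }
    out
}

fn main() {
    let entries = [(0u16, "blue"), (0, "red"), (1, "large")];
    println!("{:?}", group_sorted(&entries));
    // {0: ["blue", "red"], 1: ["large"]}
}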

View File

@@ -1,6 +1,8 @@
 use std::ops::ControlFlow;

 use bumpalo::Bump;
+use bumparaw_collections::RawVec;
+use rustc_hash::FxBuildHasher;
 use serde::de::{DeserializeSeed, Deserializer as _, Visitor};
 use serde_json::value::RawValue;
@@ -360,7 +362,7 @@ impl<'a> DeserrRawValue<'a> {
 }

 pub struct DeserrRawVec<'a> {
-    vec: raw_collections::RawVec<'a>,
+    vec: RawVec<'a>,
     alloc: &'a Bump,
 }
@@ -379,7 +381,7 @@ impl<'a> deserr::Sequence for DeserrRawVec<'a> {
 }

 pub struct DeserrRawVecIter<'a> {
-    it: raw_collections::vec::iter::IntoIter<'a>,
+    it: bumparaw_collections::vec::iter::IntoIter<'a>,
     alloc: &'a Bump,
 }
@@ -393,7 +395,7 @@ impl<'a> Iterator for DeserrRawVecIter<'a> {
 }

 pub struct DeserrRawMap<'a> {
-    map: raw_collections::RawMap<'a>,
+    map: bumparaw_collections::RawMap<'a, FxBuildHasher>,
     alloc: &'a Bump,
 }
@@ -416,7 +418,7 @@ impl<'a> deserr::Map for DeserrRawMap<'a> {
 }

 pub struct DeserrRawMapIter<'a> {
-    it: raw_collections::map::iter::IntoIter<'a>,
+    it: bumparaw_collections::map::iter::IntoIter<'a>,
     alloc: &'a Bump,
 }
@@ -615,7 +617,7 @@ impl<'de> Visitor<'de> for DeserrRawValueVisitor<'de> {
     where
         A: serde::de::SeqAccess<'de>,
     {
-        let mut raw_vec = raw_collections::RawVec::new_in(self.alloc);
+        let mut raw_vec = RawVec::new_in(self.alloc);
         while let Some(next) = seq.next_element()? {
             raw_vec.push(next);
         }

View File

@@ -1,4 +1,5 @@
 use std::cell::{Cell, RefCell};
+use std::sync::atomic::Ordering;
 use std::sync::{Arc, RwLock};

 use bumpalo::Bump;
@@ -7,8 +8,9 @@ use rayon::iter::IndexedParallelIterator;

 use super::super::document_change::DocumentChange;
 use crate::fields_ids_map::metadata::FieldIdMapWithMetadata;
+use crate::progress::{AtomicDocumentStep, Progress};
 use crate::update::new::parallel_iterator_ext::ParallelIteratorExt as _;
-use crate::update::new::steps::Step;
+use crate::update::new::steps::IndexingStep;
 use crate::update::new::thread_local::{FullySend, MostlySend, ThreadLocal};
 use crate::{FieldsIdsMap, GlobalFieldsIdsMap, Index, InternalError, Result};
@@ -133,10 +135,8 @@ pub struct IndexingContext<
    'indexer, // covariant lifetime of objects that are borrowed during the entire indexing operation
    'index,   // covariant lifetime of the index
    MSP,
-   SP,
 > where
     MSP: Fn() -> bool + Sync,
-    SP: Fn(Progress) + Sync,
 {
     pub index: &'index Index,
     pub db_fields_ids_map: &'indexer FieldsIdsMap,
@@ -144,7 +144,7 @@ pub struct IndexingContext<
     pub doc_allocs: &'indexer ThreadLocal<FullySend<Cell<Bump>>>,
     pub fields_ids_map_store: &'indexer ThreadLocal<FullySend<RefCell<GlobalFieldsIdsMap<'fid>>>>,
     pub must_stop_processing: &'indexer MSP,
-    pub send_progress: &'indexer SP,
+    pub progress: &'indexer Progress,
 }

 impl<
@@ -152,18 +152,15 @@ impl<
         'indexer, // covariant lifetime of objects that are borrowed during the entire indexing operation
         'index,   // covariant lifetime of the index
         MSP,
-        SP,
     > Copy
     for IndexingContext<
         'fid,     // invariant lifetime of fields ids map
         'indexer, // covariant lifetime of objects that are borrowed during the entire indexing operation
         'index,   // covariant lifetime of the index
         MSP,
-        SP,
     >
 where
     MSP: Fn() -> bool + Sync,
-    SP: Fn(Progress) + Sync,
 {
 }
@@ -172,18 +169,15 @@ impl<
         'indexer, // covariant lifetime of objects that are borrowed during the entire indexing operation
         'index,   // covariant lifetime of the index
         MSP,
-        SP,
     > Clone
     for IndexingContext<
         'fid,     // invariant lifetime of fields ids map
         'indexer, // covariant lifetime of objects that are borrowed during the entire indexing operation
         'index,   // covariant lifetime of the index
         MSP,
-        SP,
     >
 where
     MSP: Fn() -> bool + Sync,
-    SP: Fn(Progress) + Sync,
 {
     fn clone(&self) -> Self {
         *self
@@ -202,7 +196,6 @@ pub fn extract<
     EX,
     DC: DocumentChanges<'pl>,
     MSP,
-    SP,
 >(
     document_changes: &DC,
     extractor: &EX,
@@ -213,18 +206,18 @@ pub fn extract<
         doc_allocs,
         fields_ids_map_store,
         must_stop_processing,
-        send_progress,
-    }: IndexingContext<'fid, 'indexer, 'index, MSP, SP>,
+        progress,
+    }: IndexingContext<'fid, 'indexer, 'index, MSP>,
     extractor_allocs: &'extractor mut ThreadLocal<FullySend<Bump>>,
     datastore: &'data ThreadLocal<EX::Data>,
-    step: Step,
+    step: IndexingStep,
 ) -> Result<()>
 where
     EX: Extractor<'extractor>,
     MSP: Fn() -> bool + Sync,
-    SP: Fn(Progress) + Sync,
 {
     tracing::trace!("We are resetting the extractor allocators");
+    progress.update_progress(step);
     // Clean up and reuse the extractor allocs
     for extractor_alloc in extractor_allocs.iter_mut() {
         tracing::trace!("\tWith {} bytes reset", extractor_alloc.0.allocated_bytes());
@@ -232,9 +225,11 @@ where
     }

     let total_documents = document_changes.len() as u32;
+    let (step, progress_step) = AtomicDocumentStep::new(total_documents);
+    progress.update_progress(progress_step);

     let pi = document_changes.iter(CHUNK_SIZE);
-    pi.enumerate().try_arc_for_each_try_init(
+    pi.try_arc_for_each_try_init(
         || {
             DocumentChangeContext::new(
                 index,
@@ -247,13 +242,10 @@ where
                 move |index_alloc| extractor.init_data(index_alloc),
             )
         },
-        |context, (finished_documents, items)| {
+        |context, items| {
             if (must_stop_processing)() {
                 return Err(Arc::new(InternalError::AbortedIndexation.into()));
             }
-            let finished_documents = (finished_documents * CHUNK_SIZE) as u32;
-            (send_progress)(Progress::from_step_substep(step, finished_documents, total_documents));

             // Clean up and reuse the document-specific allocator
             context.doc_alloc.reset();
@@ -264,6 +256,7 @@ where
             });

             let res = extractor.process(changes, context).map_err(Arc::new);
+            step.fetch_add(items.as_ref().len() as u32, Ordering::Relaxed);

             // send back the doc_alloc in the pool
             context.doc_allocs.get_or_default().0.set(std::mem::take(&mut context.doc_alloc));
@@ -271,32 +264,7 @@ where
             res
         },
     )?;
-
-    (send_progress)(Progress::from_step_substep(step, total_documents, total_documents));
+    step.store(total_documents, Ordering::Relaxed);

     Ok(())
 }
-
-pub struct Progress {
-    pub finished_steps: u16,
-    pub total_steps: u16,
-    pub step_name: &'static str,
-    pub finished_total_substep: Option<(u32, u32)>,
-}
-
-impl Progress {
-    pub fn from_step(step: Step) -> Self {
-        Self {
-            finished_steps: step.finished_steps(),
-            total_steps: Step::total_steps(),
-            step_name: step.name(),
-            finished_total_substep: None,
-        }
-    }
-    pub fn from_step_substep(step: Step, finished_substep: u32, total_substep: u32) -> Self {
-        Self {
-            finished_total_substep: Some((finished_substep, total_substep)),
-            ..Progress::from_step(step)
-        }
-    }
-}
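The old per-chunk (send_progress)(Progress::from_step_substep(...)) callback is replaced by a shared Progress value plus an atomic step counter that worker threads bump with fetch_add and that the caller finalizes with store. A self-contained sketch of that counter shape; DocumentStep is a stand-in, not the crate's AtomicDocumentStep:

use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::Arc;

/// Minimal stand-in for the shared progress counter: extractor threads bump an
/// atomic, and any observer can read a consistent (done, total) snapshot.
struct DocumentStep {
    done: AtomicU32,
    total: u32,
}

impl DocumentStep {
    fn new(total: u32) -> Arc<Self> {
        Arc::new(Self { done: AtomicU32::new(0), total })
    }

    fn fetch_add(&self, n: u32) {
        self.done.fetch_add(n, Ordering::Relaxed);
    }

    fn snapshot(&self) -> (u32, u32) {
        (self.done.load(Ordering::Relaxed), self.total)
    }
}

fn main() {
    let step = DocumentStep::new(1000);
    let handles: Vec<_> = (0..4)
        .map(|_| {
            let step = Arc::clone(&step);
            std::thread::spawn(move || {
                // Each worker accounts for its chunks as it finishes them.
                for _ in 0..25 {
                    step.fetch_add(10);
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    assert_eq!(step.snapshot(), (1000, 1000));
    println!("{:?}", step.snapshot());
}

Relaxed ordering is enough because the counter only feeds a progress display; joining the workers (or, in the real code, the parallel iterator finishing) provides the final synchronization before the caller stores the total.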

View File

@@ -92,11 +92,12 @@ mod test {
     use crate::fields_ids_map::metadata::{FieldIdMapWithMetadata, MetadataBuilder};
     use crate::index::tests::TempIndex;
+    use crate::progress::Progress;
     use crate::update::new::indexer::document_changes::{
         extract, DocumentChangeContext, Extractor, IndexingContext,
     };
     use crate::update::new::indexer::DocumentDeletion;
-    use crate::update::new::steps::Step;
+    use crate::update::new::steps::IndexingStep;
     use crate::update::new::thread_local::{MostlySend, ThreadLocal};
     use crate::update::new::DocumentChange;
     use crate::DocumentId;
@@ -164,7 +165,7 @@ mod test {
             doc_allocs: &doc_allocs,
             fields_ids_map_store: &fields_ids_map_store,
             must_stop_processing: &(|| false),
-            send_progress: &(|_progress| {}),
+            progress: &Progress::default(),
         };

         for _ in 0..3 {
@@ -176,7 +177,7 @@ mod test {
                 context,
                 &mut extractor_allocs,
                 &datastore,
-                Step::ExtractingDocuments,
+                IndexingStep::ExtractingDocuments,
             )
             .unwrap();

Some files were not shown because too many files have changed in this diff.