Compare commits

..

112 Commits

Author SHA1 Message Date
ManyTheFish
14a980e54e Add debugs 2024-12-12 09:38:49 +01:00
ManyTheFish
cbc453c6d1 Avoid cloning database bitmap when it's possible 2024-12-10 11:34:39 +01:00
ManyTheFish
2fb065b9fb Reduce merge allocations 2024-12-10 11:00:20 +01:00
ManyTheFish
07f42e8057 Do not index a filed count when no word is counted 2024-12-09 15:45:12 +01:00
ManyTheFish
71f59749dc Reduce union impact in merging 2024-12-09 15:44:06 +01:00
meili-bors[bot]
3b0b9967f6 Merge #5141
Some checks failed
Test suite / Tests on ${{ matrix.os }} (macos-13) (push) Waiting to run
Test suite / Tests almost all features (push) Has been skipped
Test suite / Test disabled tokenization (push) Has been skipped
Test suite / Tests on ubuntu-20.04 (push) Failing after 16s
Test suite / Run tests in debug (push) Failing after 14s
Test suite / Tests on ${{ matrix.os }} (windows-2022) (push) Failing after 44s
Test suite / Run Rustfmt (push) Successful in 9m52s
Test suite / Run Clippy (push) Successful in 1h2m24s
5141: Use the right amount of max memory and not impact the settings r=curquiza a=Kerollmops

Fixes #5132. Related to #5125.

Co-authored-by: Kerollmops <clement@meilisearch.com>
2024-12-09 10:40:46 +00:00
meili-bors[bot]
123b54a178 Merge #5056
5056: Attach index name in error message r=irevoire a=airycanon

# Pull Request

## Related issue
Fixes #4392 

## What does this PR do?
- ...

## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!


Co-authored-by: airycanon <airycanon@airycanon.me>
2024-12-09 09:59:12 +00:00
Kerollmops
f5dd8dfc3e Rollback max memory usage changes 2024-12-09 10:26:30 +01:00
Kerollmops
bcfed70888 Revert "Merge #5125"
This reverts commit 9a9383643f, reversing
changes made to cac355bfa7.
2024-12-09 10:08:02 +01:00
meili-bors[bot]
503ef3bbc9 Merge #5138
5138: Allow xtask bench to proceed without a commit message r=Kerollmops a=dureuill



Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-12-09 09:00:12 +00:00
Louis Dureuil
08f2c696b0 Allow xtask bench to proceed without a commit message 2024-12-09 09:36:59 +01:00
airycanon
b75f1f4c17 fix tests
# Conflicts:
#	crates/index-scheduler/src/batch.rs
#	crates/index-scheduler/src/snapshots/lib.rs/fail_in_process_batch_for_document_deletion/after_removing_the_documents.snap
#	crates/index-scheduler/src/snapshots/lib.rs/test_document_addition_with_bad_primary_key/fifth_task_succeeds.snap
#	crates/index-scheduler/src/snapshots/lib.rs/test_document_addition_with_bad_primary_key/fourth_task_fails.snap
#	crates/index-scheduler/src/snapshots/lib.rs/test_document_addition_with_multiple_primary_key/second_task_fails.snap
#	crates/index-scheduler/src/snapshots/lib.rs/test_document_addition_with_multiple_primary_key/third_task_fails.snap
#	crates/index-scheduler/src/snapshots/lib.rs/test_document_addition_with_multiple_primary_key_batch_wrong_key/second_and_third_tasks_fails.snap
#	crates/index-scheduler/src/snapshots/lib.rs/test_document_addition_with_set_and_null_primary_key_inference_works/all_other_tasks_succeeds.snap
#	crates/index-scheduler/src/snapshots/lib.rs/test_document_addition_with_set_and_null_primary_key_inference_works/second_task_fails.snap
#	crates/index-scheduler/src/snapshots/lib.rs/test_document_addition_with_set_and_null_primary_key_inference_works/third_task_succeeds.snap

# Conflicts:
#	crates/index-scheduler/src/batch.rs
#	crates/meilisearch/src/search/mod.rs
#	crates/meilisearch/tests/vector/mod.rs

# Conflicts:
#	crates/index-scheduler/src/batch.rs
2024-12-06 02:03:02 +08:00
airycanon
95ed079761 attach index name in errors
# Conflicts:
#	crates/index-scheduler/src/batch.rs

# Conflicts:
#	crates/index-scheduler/src/batch.rs
#	crates/meilisearch/src/search/mod.rs
2024-12-06 01:12:13 +08:00
meili-bors[bot]
4a082683df Merge #5131
Some checks failed
Test suite / Tests on ${{ matrix.os }} (windows-2022) (push) Failing after 21s
Test suite / Tests on ubuntu-20.04 (push) Failing after 10s
Test suite / Tests almost all features (push) Has been skipped
Test suite / Test disabled tokenization (push) Has been skipped
Test suite / Run tests in debug (push) Failing after 10s
Test suite / Run Rustfmt (push) Successful in 1m25s
Test suite / Run Clippy (push) Successful in 5m54s
Test suite / Tests on ${{ matrix.os }} (macos-13) (push) Has been cancelled
5131: Ignore documents whose selected fields didn't change r=dureuill a=dureuill

Attempts to improve the new indexer performance by ignoring documents whose selected fields didn't change:

- Add `Update::has_changed_for_fields` function
- Ignore documents whose searchable attributes didn't change for word docids and word pair proximity extraction
- Ignore documents whose faceted attributes didn't change for facet extraction

Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-12-05 16:04:16 +00:00
meili-bors[bot]
26be5e0733 Merge #5123
5123: Fix batch details r=dureuill a=irevoire

# Pull Request

## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/5079
Fixes https://github.com/meilisearch/meilisearch/issues/5112

## What does this PR do?
- Make the processing tasks actually processing in the stats of the batch instead of enqueued
- Stop counting one extra task for all non-prioritized batches in the stats
- Add a test

Co-authored-by: Tamo <tamo@meilisearch.com>
2024-12-05 15:21:55 +00:00
Louis Dureuil
bd5110a2fe Fix clippy warnings 2024-12-05 16:13:07 +01:00
Louis Dureuil
fa8b9acdf6 Ignore documents that didn't change in facets 2024-12-05 16:12:52 +01:00
Louis Dureuil
2b74d1824b Ignore documents that didn't change any field in word pair proximity 2024-12-05 15:56:22 +01:00
Louis Dureuil
c77b00d3ac Don't extract word docids when no searchable changed 2024-12-05 15:51:58 +01:00
Louis Dureuil
c77073efcc Update::has_changed_for_fields 2024-12-05 15:50:12 +01:00
meili-bors[bot]
1537323eb9 Merge #5119
5119: Settings opt out error msg r=Kerollmops a=ManyTheFish

# Pull Request

## Related issue
PRD: https://meilisearch.notion.site/API-usage-Settings-to-opt-out-indexing-features-fff4b06b651f8108ade3f858aeb16b14?pvs=4
## What does this PR do?

Add a new error code and message when the user tries a facet search on an index where the facet search is disabled:
```json
{
  "message": "The facet search is disabled for this index",
  "code": "facet_search_disabled",
  "type": "invalid_request",
  "link": "https://docs.meilisearch.com/errors#invalid_facet_search_disabled"
}
 ```


Co-authored-by: ManyTheFish <many@meilisearch.com>
2024-12-05 13:51:11 +00:00
ManyTheFish
a0a3b55700 Change error code 2024-12-05 14:48:29 +01:00
Tamo
214b51de87 try to fix the snapshot on demand flaky test 2024-12-05 14:45:54 +01:00
Tamo
95975944d7 fix the dumps missing the empty swap index tasks 2024-12-05 14:23:38 +01:00
meili-bors[bot]
9a9383643f Merge #5125
Some checks failed
Test suite / Tests on ${{ matrix.os }} (macos-13) (push) Waiting to run
Test suite / Tests on ${{ matrix.os }} (windows-2022) (push) Failing after 37s
Test suite / Tests on ubuntu-20.04 (push) Failing after 15s
Test suite / Tests almost all features (push) Has been skipped
Test suite / Test disabled tokenization (push) Has been skipped
Test suite / Run tests in debug (push) Failing after 12s
Test suite / Run Rustfmt (push) Successful in 2m14s
Test suite / Run Clippy (push) Successful in 12m4s
5125: Change the default max memory usage to 5% of the total memory r=ManyTheFish a=Kerollmops

After thorough testing, we found that giving 5% of the total available memory to allocate resident memory (caches and channels) is the best approach.

The main reason is that the new indexer is highly memory-map oriented, with LMDB, and reads the database while performing the indexation. So, by allowing the maximum amount of memory available to LMDB and the OS, it will perform the key-value store reads and all other indexation operations faster by keeping more pages hot in the cache. In #5124, we also sorted the entries to merge to improve the read speed of LMDB.

This is common in database management systems: Reading stuff on the disk is much faster when done in lexicographic order (the default sorted order of key values). The entries have a great chance of already being in the OS memory cache, as they were loaded in a previous read, and reading stuff on the disk is very slow compared to reading memory.

Co-authored-by: Kerollmops <clement@meilisearch.com>
2024-12-05 10:11:25 +00:00
meili-bors[bot]
cac355bfa7 Merge #5124
5124: Optimize Prefixes and Merges r=ManyTheFish a=Kerollmops

In this PR, we plan to optimize the read of LMDB to use read the entries in lexicographic order and better use the memory-mapping OS cache:

 - Optimize the prefix generation for word position docids (`@manythefish)`
 - Optimize the parallel merging of the caches to sort entries before merging the caches (`@kerollmops)`
 
## Benchmarks on 1cpu 2gb gpo3 (5k IOps)
 
Before on the tag meilisearch-v1.12.0-rc.3.

```
word_position_docids:merge_and_send_docids: 988s
compute_word_fst: 23.3s
word_pair_proximity_docids:merge_and_send_docids: 428s
compute_word_prefix_fid_docids:recompute_modified_prefixes: 76.3s
compute_word_prefix_position_docids:recompute_modified_prefixes:from_prefixes: 429s
```

After sorting the whole `HashMap`s in a `Vec` on this branch.

```
word_position_docids:merge_and_send_docids: 202s
compute_word_fst: 20.4s
word_pair_proximity_docids:merge_and_send_docids: 427s
compute_word_prefix_fid_docids:recompute_modified_prefixes: 65.5s
compute_word_prefix_position_docids:recompute_modified_prefixes:from_prefixes: 62.5s
```

Co-authored-by: ManyTheFish <many@meilisearch.com>
Co-authored-by: Kerollmops <clement@meilisearch.com>
2024-12-05 09:35:52 +00:00
Kerollmops
9020a50df8 Change the default max memory usage to 5% of the total memory 2024-12-05 10:14:46 +01:00
Kerollmops
52843123d4 Clean up and remove the non-sorted merge_caches function 2024-12-05 10:03:05 +01:00
meili-bors[bot]
6298db5bea Merge #5113
5113: Fix the Minimum BBQueue channel threshold r=Kerollmops a=Kerollmops



Co-authored-by: Kerollmops <clement@meilisearch.com>
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-12-05 09:01:02 +00:00
meili-bors[bot]
a003a0934a Merge #5121
Some checks failed
Test suite / Tests on ${{ matrix.os }} (macos-13) (push) Waiting to run
Test suite / Tests almost all features (push) Has been skipped
Test suite / Test disabled tokenization (push) Has been skipped
Test suite / Tests on ubuntu-20.04 (push) Failing after 11s
Test suite / Run tests in debug (push) Failing after 9s
Test suite / Tests on ${{ matrix.os }} (windows-2022) (push) Failing after 24s
Test suite / Run Rustfmt (push) Successful in 1m19s
Test suite / Run Clippy (push) Successful in 5m32s
5121: Make the tasks pulling timeout configurable r=dureuill a=Kerollmops



Co-authored-by: Kerollmops <clement@meilisearch.com>
2024-12-04 17:04:14 +00:00
Louis Dureuil
3a11e39c01 Force max_memory to a min of 100MiB 2024-12-04 17:53:30 +01:00
Louis Dureuil
5f896b1050 Fix geo when spilling 2024-12-04 17:51:12 +01:00
Kerollmops
d0c4e6da6b Make clippy happy 2024-12-04 17:39:10 +01:00
Kerollmops
2da5584bb5 Make the tasks pulling timeout configurable 2024-12-04 17:39:07 +01:00
meili-bors[bot]
b7eb802ae6 Merge #5120
5120: Add cross tasks r=Kerollmops a=ManyTheFish

Add 4 xtask bench workloads:
- `hackernews-add-new-documents`: adds new documents on a db already containing documents
- `hackernews-modify-facet-numbers`: modify filterable fields containing numbers of documents on a db already containing documents
- `hackernews-modify-facet-strings`: modify filterable fields containing strings of documents on a db already containing documents
- `hackernews-modify-searchables`: modify searchable fields of documents on a db already containing documents

Co-authored-by: ManyTheFish <many@meilisearch.com>
2024-12-04 16:16:57 +00:00
Kerollmops
2e32d0474c Lexicographically sort all the map to merge 2024-12-04 17:05:11 +01:00
Kerollmops
cb99ac6f7e Consume vec instead of draining 2024-12-04 17:00:22 +01:00
Kerollmops
be411435f5 Use the merge_caches_alt function in the docids merging 2024-12-04 16:37:29 +01:00
Kerollmops
29ef164530 Introduce a new semi ordered merge function 2024-12-04 16:33:35 +01:00
ManyTheFish
739c52a3cd Replace HashSets by BTreeSets for the prefixes 2024-12-04 16:16:48 +01:00
Tamo
7a2af06b1e update the impacted snapshots 2024-12-04 15:52:24 +01:00
Tamo
cb0c3a5aad stop adding one enqueued tasks to all unprioritized batches 2024-12-04 15:48:28 +01:00
ManyTheFish
8388698993 Fix dat hash 2024-12-04 15:09:10 +01:00
Tamo
cbcf6c9ba3 make the processing tasks as processing in a batch 2024-12-04 14:48:48 +01:00
Tamo
bf742d81cf add a test 2024-12-04 14:47:02 +01:00
ManyTheFish
7458f0386c fix asset name 2024-12-04 14:44:57 +01:00
ManyTheFish
fc1df5793c fix tests 2024-12-04 14:35:20 +01:00
meili-bors[bot]
3ded069042 Merge #5122
5122: Yield the BBQueue writing loop r=ManyTheFish a=Kerollmops

We prefer yielding to let the writing thread do its job instead of spin looping.

Co-authored-by: Kerollmops <clement@meilisearch.com>
2024-12-04 13:33:51 +00:00
Kerollmops
261d2ceb06 Yield the BBQueue writer instead of spin looping 2024-12-04 14:16:40 +01:00
ManyTheFish
1a17e2e572 fix formating 2024-12-04 13:57:06 +01:00
meili-bors[bot]
5b8cd68abe Merge #5110
5110: Increase margin on deletion of task r=dureuill a=irevoire

# Pull Request

## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/5077

## What does this PR do?
- Increase the margin we keep to enqueue task deletion

The issue was that we had not enough space on the reserved memory to write both the batch and the deletion task we just enqueued.
We could fix it only for this test as it’s not an issue in production where we have 10GiB of margin, but I thought it wasn’t a bad idea either to increase our margin a bit since we’re effectively writing more to lmdb.


Co-authored-by: Tamo <tamo@meilisearch.com>
2024-12-04 12:54:48 +00:00
ManyTheFish
5ce9acb0b9 Add workloads 2024-12-04 12:19:19 +01:00
ManyTheFish
953a82ca04 Add new error message 2024-12-04 11:15:29 +01:00
meili-bors[bot]
54341c2e80 Merge #5118
5118: Change the reserve and grant function to accept a closure r=ManyTheFish a=Kerollmops

This simplifies the usage of the grant and commits it at the right time, just after having written in it.

Co-authored-by: Kerollmops <clement@meilisearch.com>
2024-12-04 10:12:39 +00:00
Kerollmops
96831ed9bb Send the WakeUp message if necessary in the reserve function 2024-12-04 11:03:01 +01:00
Kerollmops
0459b1a242 Change the reserve and grant function to accept a closure 2024-12-04 10:32:25 +01:00
Kerollmops
8ecb726683 Fix the minimun BBQueue channel threshold 2024-12-03 15:49:11 +01:00
meili-bors[bot]
297e72e262 Merge #5111
Some checks failed
Test suite / Tests on ${{ matrix.os }} (macos-13) (push) Waiting to run
Test suite / Tests on ${{ matrix.os }} (windows-2022) (push) Failing after 43s
Test suite / Tests on ubuntu-20.04 (push) Failing after 11s
Test suite / Tests almost all features (push) Has been skipped
Test suite / Test disabled tokenization (push) Has been skipped
Test suite / Run tests in debug (push) Failing after 9s
Test suite / Run Clippy (push) Successful in 7m18s
Test suite / Run Rustfmt (push) Successful in 1m32s
5111: Update BBQueue repo to point to the Meilisearch org r=curquiza a=Kerollmops

This PR updates the milli dependencies to make BBQueue point to the Meilisearch org repo.

Co-authored-by: Clément Renault <clement@meilisearch.com>
2024-12-03 14:27:04 +00:00
Clément Renault
0ad2f57a92 Update bbqueue repo to point to the meilisearch org 2024-12-03 12:00:04 +01:00
Tamo
71d53f413f increase the margin allowed to delete task 2024-12-03 11:07:03 +01:00
meili-bors[bot]
054622bd16 Merge #5094
5094: Implement a bbqueue channel between the extractors and the writer r=dureuill a=Kerollmops

This PR switches from a bounded crossbeam channel only with allocated entries for the communication between the extractors and the writer to a [BBQueue](https://github.com/jamesmunns/bbqueue)-based system with a Single Producer Single Consumer kind of Circular/Ring Buffers channel.

 - [x] Implement the BBQueue channel system...
 - [x] with a crossbeam channel to wake up the receiver.
 - [x] Manage the BBQueue allocated memory dynamically.
 - [x] Support content that doesn't fit in the bbqueues.

Co-authored-by: Clément Renault <clement@meilisearch.com>
2024-12-03 08:00:55 +00:00
Louis Dureuil
e905a72d73 remove mimalloc on Windows 2024-12-02 18:13:56 +01:00
meili-bors[bot]
2e879c1df8 Merge #5109
Some checks failed
Test suite / Tests on ${{ matrix.os }} (macos-13) (push) Waiting to run
Test suite / Tests almost all features (push) Has been skipped
Test suite / Test disabled tokenization (push) Has been skipped
Test suite / Tests on ubuntu-20.04 (push) Failing after 11s
Test suite / Run tests in debug (push) Failing after 11s
Test suite / Tests on ${{ matrix.os }} (windows-2022) (push) Failing after 24s
Test suite / Run Rustfmt (push) Successful in 1m22s
Test suite / Run Clippy (push) Successful in 6m29s
5109: Fix autobatch r=dureuill a=dureuill

Fixes most SDK tests and flaky failures

Changes:

- Make sure that the settings are not autobatched with document operations, as the new indexer no longer supports this operating mode

Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-12-02 16:30:51 +00:00
Louis Dureuil
d040aff101 Stop allocating 1GiB for documents 2024-12-02 16:30:14 +01:00
meili-bors[bot]
5e30731cad Merge #5107
5107: While spamming the batches route we could see a processing batch becoming missing and then finished, this commit ensures the batches goes from processing to finished directly r=irevoire a=irevoire

# Pull Request

## Related issue
Fixes the failed tests from this PR: https://github.com/meilisearch/meilisearch-js/pull/1775
See [this message](https://meilisearch.slack.com/archives/CD7Q2UKGB/p1732784680450749) [private link] for more context

## What does this PR do?
- Ensure we never enter a state where a processing batches (only existing in RAM) becomes « Not found » by removing the processing batches AFTER writing them to disk
- This should also theoretically avoid an issue where a task could go from processing to enqueued and then finished


Co-authored-by: Tamo <tamo@meilisearch.com>
2024-12-02 14:36:29 +00:00
Tamo
beeb31ce41 Update crates/index-scheduler/src/lib.rs 2024-12-02 15:32:16 +01:00
Louis Dureuil
057143214d Fix warnings 2024-12-02 14:42:31 +01:00
Louis Dureuil
6a1d26a60c Update autobatching tests 2024-12-02 14:15:15 +01:00
Louis Dureuil
d78f4666a0 Fix autobatching of documents and settings 2024-12-02 12:25:01 +01:00
Tamo
a439fa3e1a While spamming the batches route we could see a processing batch becoming missing and then finished, this commit ensures the batches goes from processing to finished directly 2024-12-02 12:02:16 +01:00
Clément Renault
767259be7e Prefer returning a abort indexation rather than throwing a panic 2024-12-02 11:53:42 +01:00
Clément Renault
e9f34fb4b1 Make the frame consumer pulling fair 2024-12-02 11:49:01 +01:00
Clément Renault
d5c07ef7b3 Manage key length conversion error correctly 2024-12-02 11:03:00 +01:00
Clément Renault
5e218f3f4d Remove a sync_all (mark my words) 2024-12-02 11:03:00 +01:00
Clément Renault
bcab61ab1d Do spurious wake ups on the receiver side 2024-12-02 11:03:00 +01:00
Clément Renault
263c5a348e Move the spin looping for BBQueue frames into a dedicated function 2024-12-02 10:33:49 +01:00
Clément Renault
be7d2fbe63 Move the EntryHeader up in the file and document the safety related to the size 2024-12-02 10:19:11 +01:00
Clément Renault
f7f9a131e4 Improve copying bytes into aligned memory area 2024-12-02 10:15:58 +01:00
Clément Renault
5df5eb2db2 Clarify a method name 2024-12-02 10:10:48 +01:00
Clément Renault
30eb0e5b5b Rename recv and read methods to recv_action and recv_frame 2024-12-02 10:08:01 +01:00
Clément Renault
5b860cb989 Fix english in the doc 2024-12-02 10:06:35 +01:00
Clément Renault
76d0623b11 Reduce the number of unwraps 2024-12-02 10:05:06 +01:00
Clément Renault
db4eaf4d2d Rename serialize_into into serialize_into_writer 2024-12-02 10:03:27 +01:00
Clément Renault
13f21206a6 Call the serialize_into_writer method from the serialize_into one 2024-12-02 10:03:01 +01:00
Clément Renault
14ee7aa84c Make sure the BBQueue is at least 50 MiB 2024-11-28 18:02:48 +01:00
Clément Renault
8a35cd1743 Adjust the BBQueue buffers to use 2% instead of 10% 2024-11-28 16:00:15 +01:00
meili-bors[bot]
8d33af1dff Merge #5102
Some checks failed
Test suite / Tests almost all features (push) Has been skipped
Test suite / Test disabled tokenization (push) Has been skipped
Test suite / Tests on ubuntu-20.04 (push) Failing after 24s
Test suite / Tests on ${{ matrix.os }} (windows-2022) (push) Failing after 28s
Test suite / Run tests in debug (push) Failing after 28s
Test suite / Run Rustfmt (push) Successful in 3m52s
Test suite / Run Clippy (push) Successful in 9m8s
Test suite / Tests on ${{ matrix.os }} (macos-13) (push) Has been cancelled
5102: Update mini-dashboard to v0.2.16 version r=curquiza a=curquiza

Fixes https://github.com/meilisearch/meilisearch/issues/5093

Fixes this bug: https://github.com/meilisearch/mini-dashboard/issues/563

Co-authored-by: curquiza <clementine@meilisearch.com>
2024-11-28 14:57:27 +00:00
Clément Renault
3c7ac093d3 Take the BBQueue capacity into account in the max memory 2024-11-28 15:43:14 +01:00
meili-bors[bot]
d49d127863 Merge #5101
5101: Fix index settings opt out r=Kerollmops a=ManyTheFish

# Pull Request

## Related issue
Fixes #5099 

## What does this PR do?
- Refactor the settings implementation ensuring the routes are configured
- Add a test checking if all the routes are tested
- Refactor the tests to ease the modifications


Co-authored-by: ManyTheFish <many@meilisearch.com>
2024-11-28 14:23:33 +00:00
Clément Renault
b57dd5c58e Remove the Vector variant and use the Vectors 2024-11-28 15:20:43 +01:00
ManyTheFish
90b428a8c3 Apply change requests 2024-11-28 15:16:13 +01:00
Clément Renault
096a28656e Fix a bug around deleting all the vectors of a doc 2024-11-28 15:15:06 +01:00
curquiza
3dc87f5baa Update mini-dashboard to v0.2.16 version 2024-11-28 14:33:05 +01:00
Clément Renault
cc4bd54669 Correctly construct the Embeddings struct 2024-11-28 13:53:25 +01:00
ManyTheFish
5383f41bba Polish test_setting_routes! 2024-11-28 12:04:21 +01:00
Clément Renault
58eab9a018 Send large payload through crossbeam 2024-11-28 12:01:06 +01:00
ManyTheFish
9f36ffcbdb Polish make_setting_routes! 2024-11-28 11:44:09 +01:00
ManyTheFish
68c4717e21 Change the settings tests and macros to avoid oversights 2024-11-28 11:34:35 +01:00
Clément Renault
5c488e20cc Send the geo rtree through crossbeam channel 2024-11-27 18:03:45 +01:00
Clément Renault
da650f834e Plug the NoPanicThreadPool in the tests and benchmarks 2024-11-27 17:04:49 +01:00
Clément Renault
e83534a430 Fix the indexer::index to correctly use the rayon::ThreadPool 2024-11-27 16:27:43 +01:00
Clément Renault
98d4a2909e Fix the way we spawn the rayon threadpool 2024-11-27 16:05:44 +01:00
Clément Renault
a514ce472a Make clippy happy 2024-11-27 14:59:04 +01:00
Clément Renault
cc63802115 Modify and return the IndexEmbeddings to write them later 2024-11-27 14:58:03 +01:00
Clément Renault
acec45ad7c Send a WakeUp when writing data in the BBQueue buffers 2024-11-27 14:33:23 +01:00
Clément Renault
08d6413365 Fix result types 2024-11-27 14:32:42 +01:00
Clément Renault
70802eb7c7 Fix most issues with the lifetimes 2024-11-27 14:32:42 +01:00
Clément Renault
6ac5b3b136 Finish most of the channels types 2024-11-27 14:32:26 +01:00
Clément Renault
e1e76f39d0 Clean up dependencies 2024-11-27 14:30:34 +01:00
Clément Renault
2094ce8a9a Move the arroy building after the writing loop 2024-11-27 14:30:33 +01:00
Clément Renault
8442db8101 Implement mostly all senders 2024-11-27 14:16:35 +01:00
Clément Renault
79671c9faa Implement a first version of the bbqueue channels 2024-11-27 14:15:00 +01:00
101 changed files with 2979 additions and 1819 deletions

39
Cargo.lock generated
View File

@@ -489,6 +489,11 @@ version = "0.22.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "72b3254f16251a8381aa12e40e3c4d2f0199f8c6508fbecb9d91f575e0fbb8c6"
[[package]]
name = "bbqueue"
version = "0.5.1"
source = "git+https://github.com/meilisearch/bbqueue#cbb87cc707b5af415ef203bdaf2443e06ba0d6d4"
[[package]]
name = "benchmarks"
version = "1.12.0"
@@ -1246,19 +1251,6 @@ dependencies = [
"itertools 0.10.5",
]
[[package]]
name = "crossbeam"
version = "0.8.4"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "1137cd7e7fc0fb5d3c5a8678be38ec56e819125d8d7907411fe24ccb943faca8"
dependencies = [
"crossbeam-channel",
"crossbeam-deque",
"crossbeam-epoch",
"crossbeam-queue",
"crossbeam-utils",
]
[[package]]
name = "crossbeam-channel"
version = "0.5.13"
@@ -1918,6 +1910,15 @@ dependencies = [
"serde_json",
]
[[package]]
name = "flume"
version = "0.11.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "da0e4dd2a88388a1f4ccc7c9ce104604dab68d9f408dc34cd45823d5a9069095"
dependencies = [
"spin",
]
[[package]]
name = "fnv"
version = "1.0.7"
@@ -2616,7 +2617,7 @@ dependencies = [
"big_s",
"bincode",
"bumpalo",
"crossbeam",
"crossbeam-channel",
"csv",
"derive_builder 0.20.0",
"dump",
@@ -3611,6 +3612,7 @@ version = "1.12.0"
dependencies = [
"allocator-api2",
"arroy 0.5.0 (registry+https://github.com/rust-lang/crates.io-index)",
"bbqueue",
"big_s",
"bimap",
"bincode",
@@ -3630,6 +3632,7 @@ dependencies = [
"enum-iterator",
"filter-parser",
"flatten-serde-json",
"flume",
"fst",
"fxhash",
"geoutils",
@@ -4743,8 +4746,9 @@ dependencies = [
[[package]]
name = "roaring"
version = "0.10.6"
source = "git+https://github.com/RoaringBitmap/roaring-rs?branch=clone-iter-slice#8ff028e484fb6192a0acf5a669eaf18c30cada6e"
version = "0.10.7"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "f81dc953b2244ddd5e7860cb0bb2a790494b898ef321d4aff8e260efab60cc88"
dependencies = [
"bytemuck",
"byteorder",
@@ -5186,6 +5190,9 @@ name = "spin"
version = "0.9.8"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "6980e8d7511241f8acf4aebddbb1ff938df5eebe98691418c4468d0b72a96a67"
dependencies = [
"lock_api",
]
[[package]]
name = "spm_precompiled"

View File

@@ -43,6 +43,3 @@ opt-level = 3
opt-level = 3
[profile.dev.package.roaring]
opt-level = 3
[patch.crates-io]
roaring = { git = "https://github.com/RoaringBitmap/roaring-rs", branch = "clone-iter-slice" }

View File

@@ -24,7 +24,7 @@ tempfile = "3.14.0"
criterion = { version = "0.5.1", features = ["html_reports"] }
rand = "0.8.5"
rand_chacha = "0.3.1"
roaring = "0.10.6"
roaring = "0.10.7"
[build-dependencies]
anyhow = "1.0.86"

View File

@@ -16,6 +16,7 @@ use rand::seq::SliceRandom;
use rand_chacha::rand_core::SeedableRng;
use roaring::RoaringBitmap;
#[cfg(not(windows))]
#[global_allocator]
static ALLOC: mimalloc::MiMalloc = mimalloc::MiMalloc;
@@ -157,6 +158,7 @@ fn indexing_songs_default(c: &mut Criterion) {
indexer::index(
&mut wtxn,
&index,
&milli::ThreadPoolNoAbortBuilder::new().build().unwrap(),
config.grenad_parameters(),
&db_fields_ids_map,
new_fields_ids_map,
@@ -223,6 +225,7 @@ fn reindexing_songs_default(c: &mut Criterion) {
indexer::index(
&mut wtxn,
&index,
&milli::ThreadPoolNoAbortBuilder::new().build().unwrap(),
config.grenad_parameters(),
&db_fields_ids_map,
new_fields_ids_map,
@@ -267,6 +270,7 @@ fn reindexing_songs_default(c: &mut Criterion) {
indexer::index(
&mut wtxn,
&index,
&milli::ThreadPoolNoAbortBuilder::new().build().unwrap(),
config.grenad_parameters(),
&db_fields_ids_map,
new_fields_ids_map,
@@ -335,6 +339,7 @@ fn deleting_songs_in_batches_default(c: &mut Criterion) {
indexer::index(
&mut wtxn,
&index,
&milli::ThreadPoolNoAbortBuilder::new().build().unwrap(),
config.grenad_parameters(),
&db_fields_ids_map,
new_fields_ids_map,
@@ -411,6 +416,7 @@ fn indexing_songs_in_three_batches_default(c: &mut Criterion) {
indexer::index(
&mut wtxn,
&index,
&milli::ThreadPoolNoAbortBuilder::new().build().unwrap(),
config.grenad_parameters(),
&db_fields_ids_map,
new_fields_ids_map,
@@ -455,6 +461,7 @@ fn indexing_songs_in_three_batches_default(c: &mut Criterion) {
indexer::index(
&mut wtxn,
&index,
&milli::ThreadPoolNoAbortBuilder::new().build().unwrap(),
config.grenad_parameters(),
&db_fields_ids_map,
new_fields_ids_map,
@@ -495,6 +502,7 @@ fn indexing_songs_in_three_batches_default(c: &mut Criterion) {
indexer::index(
&mut wtxn,
&index,
&milli::ThreadPoolNoAbortBuilder::new().build().unwrap(),
config.grenad_parameters(),
&db_fields_ids_map,
new_fields_ids_map,
@@ -562,6 +570,7 @@ fn indexing_songs_without_faceted_numbers(c: &mut Criterion) {
indexer::index(
&mut wtxn,
&index,
&milli::ThreadPoolNoAbortBuilder::new().build().unwrap(),
config.grenad_parameters(),
&db_fields_ids_map,
new_fields_ids_map,
@@ -628,6 +637,7 @@ fn indexing_songs_without_faceted_fields(c: &mut Criterion) {
indexer::index(
&mut wtxn,
&index,
&milli::ThreadPoolNoAbortBuilder::new().build().unwrap(),
config.grenad_parameters(),
&db_fields_ids_map,
new_fields_ids_map,
@@ -694,6 +704,7 @@ fn indexing_wiki(c: &mut Criterion) {
indexer::index(
&mut wtxn,
&index,
&milli::ThreadPoolNoAbortBuilder::new().build().unwrap(),
config.grenad_parameters(),
&db_fields_ids_map,
new_fields_ids_map,
@@ -759,6 +770,7 @@ fn reindexing_wiki(c: &mut Criterion) {
indexer::index(
&mut wtxn,
&index,
&milli::ThreadPoolNoAbortBuilder::new().build().unwrap(),
config.grenad_parameters(),
&db_fields_ids_map,
new_fields_ids_map,
@@ -803,6 +815,7 @@ fn reindexing_wiki(c: &mut Criterion) {
indexer::index(
&mut wtxn,
&index,
&milli::ThreadPoolNoAbortBuilder::new().build().unwrap(),
config.grenad_parameters(),
&db_fields_ids_map,
new_fields_ids_map,
@@ -870,6 +883,7 @@ fn deleting_wiki_in_batches_default(c: &mut Criterion) {
indexer::index(
&mut wtxn,
&index,
&milli::ThreadPoolNoAbortBuilder::new().build().unwrap(),
config.grenad_parameters(),
&db_fields_ids_map,
new_fields_ids_map,
@@ -946,6 +960,7 @@ fn indexing_wiki_in_three_batches(c: &mut Criterion) {
indexer::index(
&mut wtxn,
&index,
&milli::ThreadPoolNoAbortBuilder::new().build().unwrap(),
config.grenad_parameters(),
&db_fields_ids_map,
new_fields_ids_map,
@@ -991,6 +1006,7 @@ fn indexing_wiki_in_three_batches(c: &mut Criterion) {
indexer::index(
&mut wtxn,
&index,
&milli::ThreadPoolNoAbortBuilder::new().build().unwrap(),
config.grenad_parameters(),
&db_fields_ids_map,
new_fields_ids_map,
@@ -1032,6 +1048,7 @@ fn indexing_wiki_in_three_batches(c: &mut Criterion) {
indexer::index(
&mut wtxn,
&index,
&milli::ThreadPoolNoAbortBuilder::new().build().unwrap(),
config.grenad_parameters(),
&db_fields_ids_map,
new_fields_ids_map,
@@ -1098,6 +1115,7 @@ fn indexing_movies_default(c: &mut Criterion) {
indexer::index(
&mut wtxn,
&index,
&milli::ThreadPoolNoAbortBuilder::new().build().unwrap(),
config.grenad_parameters(),
&db_fields_ids_map,
new_fields_ids_map,
@@ -1163,6 +1181,7 @@ fn reindexing_movies_default(c: &mut Criterion) {
indexer::index(
&mut wtxn,
&index,
&milli::ThreadPoolNoAbortBuilder::new().build().unwrap(),
config.grenad_parameters(),
&db_fields_ids_map,
new_fields_ids_map,
@@ -1207,6 +1226,7 @@ fn reindexing_movies_default(c: &mut Criterion) {
indexer::index(
&mut wtxn,
&index,
&milli::ThreadPoolNoAbortBuilder::new().build().unwrap(),
config.grenad_parameters(),
&db_fields_ids_map,
new_fields_ids_map,
@@ -1274,6 +1294,7 @@ fn deleting_movies_in_batches_default(c: &mut Criterion) {
indexer::index(
&mut wtxn,
&index,
&milli::ThreadPoolNoAbortBuilder::new().build().unwrap(),
config.grenad_parameters(),
&db_fields_ids_map,
new_fields_ids_map,
@@ -1321,6 +1342,7 @@ fn delete_documents_from_ids(index: Index, document_ids_to_delete: Vec<RoaringBi
indexer::index(
&mut wtxn,
&index,
&milli::ThreadPoolNoAbortBuilder::new().build().unwrap(),
config.grenad_parameters(),
&db_fields_ids_map,
new_fields_ids_map,
@@ -1385,6 +1407,7 @@ fn indexing_movies_in_three_batches(c: &mut Criterion) {
indexer::index(
&mut wtxn,
&index,
&milli::ThreadPoolNoAbortBuilder::new().build().unwrap(),
config.grenad_parameters(),
&db_fields_ids_map,
new_fields_ids_map,
@@ -1429,6 +1452,7 @@ fn indexing_movies_in_three_batches(c: &mut Criterion) {
indexer::index(
&mut wtxn,
&index,
&milli::ThreadPoolNoAbortBuilder::new().build().unwrap(),
config.grenad_parameters(),
&db_fields_ids_map,
new_fields_ids_map,
@@ -1469,6 +1493,7 @@ fn indexing_movies_in_three_batches(c: &mut Criterion) {
indexer::index(
&mut wtxn,
&index,
&milli::ThreadPoolNoAbortBuilder::new().build().unwrap(),
config.grenad_parameters(),
&db_fields_ids_map,
new_fields_ids_map,
@@ -1558,6 +1583,7 @@ fn indexing_nested_movies_default(c: &mut Criterion) {
indexer::index(
&mut wtxn,
&index,
&milli::ThreadPoolNoAbortBuilder::new().build().unwrap(),
config.grenad_parameters(),
&db_fields_ids_map,
new_fields_ids_map,
@@ -1648,6 +1674,7 @@ fn deleting_nested_movies_in_batches_default(c: &mut Criterion) {
indexer::index(
&mut wtxn,
&index,
&milli::ThreadPoolNoAbortBuilder::new().build().unwrap(),
config.grenad_parameters(),
&db_fields_ids_map,
new_fields_ids_map,
@@ -1730,6 +1757,7 @@ fn indexing_nested_movies_without_faceted_fields(c: &mut Criterion) {
indexer::index(
&mut wtxn,
&index,
&milli::ThreadPoolNoAbortBuilder::new().build().unwrap(),
config.grenad_parameters(),
&db_fields_ids_map,
new_fields_ids_map,
@@ -1796,6 +1824,7 @@ fn indexing_geo(c: &mut Criterion) {
indexer::index(
&mut wtxn,
&index,
&milli::ThreadPoolNoAbortBuilder::new().build().unwrap(),
config.grenad_parameters(),
&db_fields_ids_map,
new_fields_ids_map,
@@ -1861,6 +1890,7 @@ fn reindexing_geo(c: &mut Criterion) {
indexer::index(
&mut wtxn,
&index,
&milli::ThreadPoolNoAbortBuilder::new().build().unwrap(),
config.grenad_parameters(),
&db_fields_ids_map,
new_fields_ids_map,
@@ -1905,6 +1935,7 @@ fn reindexing_geo(c: &mut Criterion) {
indexer::index(
&mut wtxn,
&index,
&milli::ThreadPoolNoAbortBuilder::new().build().unwrap(),
config.grenad_parameters(),
&db_fields_ids_map,
new_fields_ids_map,
@@ -1972,6 +2003,7 @@ fn deleting_geo_in_batches_default(c: &mut Criterion) {
indexer::index(
&mut wtxn,
&index,
&milli::ThreadPoolNoAbortBuilder::new().build().unwrap(),
config.grenad_parameters(),
&db_fields_ids_map,
new_fields_ids_map,

View File

@@ -5,6 +5,7 @@ use criterion::{criterion_group, criterion_main};
use milli::update::Settings;
use utils::Conf;
#[cfg(not(windows))]
#[global_allocator]
static ALLOC: mimalloc::MiMalloc = mimalloc::MiMalloc;

View File

@@ -5,6 +5,7 @@ use criterion::{criterion_group, criterion_main};
use milli::update::Settings;
use utils::Conf;
#[cfg(not(windows))]
#[global_allocator]
static ALLOC: mimalloc::MiMalloc = mimalloc::MiMalloc;

View File

@@ -5,6 +5,7 @@ use criterion::{criterion_group, criterion_main};
use milli::update::Settings;
use utils::Conf;
#[cfg(not(windows))]
#[global_allocator]
static ALLOC: mimalloc::MiMalloc = mimalloc::MiMalloc;

View File

@@ -117,6 +117,7 @@ pub fn base_setup(conf: &Conf) -> Index {
indexer::index(
&mut wtxn,
&index,
&milli::ThreadPoolNoAbortBuilder::new().build().unwrap(),
config.grenad_parameters(),
&db_fields_ids_map,
new_fields_ids_map,

View File

@@ -17,7 +17,7 @@ http = "1.1.0"
meilisearch-types = { path = "../meilisearch-types" }
once_cell = "1.19.0"
regex = "1.10.5"
roaring = { version = "0.10.6", features = ["serde"] }
roaring = { version = "0.10.7", features = ["serde"] }
serde = { version = "1.0.204", features = ["derive"] }
serde_json = { version = "1.0.120", features = ["preserve_order"] }
tar = "0.4.41"

View File

@@ -135,6 +135,7 @@ fn main() {
indexer::index(
&mut wtxn,
&index,
&milli::ThreadPoolNoAbortBuilder::new().build().unwrap(),
indexer_config.grenad_parameters(),
&db_fields_ids_map,
new_fields_ids_map,

View File

@@ -24,7 +24,7 @@ meilisearch-types = { path = "../meilisearch-types" }
page_size = "0.6.0"
raw-collections = { git = "https://github.com/meilisearch/raw-collections.git", version = "0.1.0" }
rayon = "1.10.0"
roaring = { version = "0.10.6", features = ["serde"] }
roaring = { version = "0.10.7", features = ["serde"] }
serde = { version = "1.0.204", features = ["derive"] }
serde_json = { version = "1.0.120", features = ["preserve_order"] }
synchronoise = "1.0.1"
@@ -45,7 +45,7 @@ bumpalo = "3.16.0"
[dev-dependencies]
arroy = "0.5.0"
big_s = "1.0.2"
crossbeam = "0.8.4"
crossbeam-channel = "0.5.13"
insta = { version = "1.39.0", features = ["json", "redactions"] }
maplit = "1.0.2"
meili-snap = { path = "../meili-snap" }

View File

@@ -115,13 +115,6 @@ pub enum BatchKind {
allow_index_creation: bool,
settings_ids: Vec<TaskId>,
},
SettingsAndDocumentOperation {
settings_ids: Vec<TaskId>,
method: IndexDocumentsMethod,
allow_index_creation: bool,
primary_key: Option<String>,
operation_ids: Vec<TaskId>,
},
Settings {
allow_index_creation: bool,
settings_ids: Vec<TaskId>,
@@ -146,7 +139,6 @@ impl BatchKind {
match self {
BatchKind::DocumentOperation { allow_index_creation, .. }
| BatchKind::ClearAndSettings { allow_index_creation, .. }
| BatchKind::SettingsAndDocumentOperation { allow_index_creation, .. }
| BatchKind::Settings { allow_index_creation, .. } => Some(*allow_index_creation),
_ => None,
}
@@ -154,10 +146,7 @@ impl BatchKind {
fn primary_key(&self) -> Option<Option<&str>> {
match self {
BatchKind::DocumentOperation { primary_key, .. }
| BatchKind::SettingsAndDocumentOperation { primary_key, .. } => {
Some(primary_key.as_deref())
}
BatchKind::DocumentOperation { primary_key, .. } => Some(primary_key.as_deref()),
_ => None,
}
}
@@ -275,8 +264,7 @@ impl BatchKind {
Break(BatchKind::IndexDeletion { ids })
}
(
BatchKind::ClearAndSettings { settings_ids: mut ids, allow_index_creation: _, mut other }
| BatchKind::SettingsAndDocumentOperation { operation_ids: mut ids, method: _, allow_index_creation: _, primary_key: _, settings_ids: mut other },
BatchKind::ClearAndSettings { settings_ids: mut ids, allow_index_creation: _, mut other },
K::IndexDeletion,
) => {
ids.push(id);
@@ -356,15 +344,9 @@ impl BatchKind {
) => Break(this),
(
BatchKind::DocumentOperation { method, allow_index_creation, primary_key, operation_ids },
this @ BatchKind::DocumentOperation { .. },
K::Settings { .. },
) => Continue(BatchKind::SettingsAndDocumentOperation {
settings_ids: vec![id],
method,
allow_index_creation,
primary_key,
operation_ids,
}),
) => Break(this),
(BatchKind::DocumentDeletion { mut deletion_ids, includes_by_filter: _ }, K::DocumentClear) => {
deletion_ids.push(id);
@@ -477,63 +459,7 @@ impl BatchKind {
allow_index_creation,
})
}
(
BatchKind::SettingsAndDocumentOperation { settings_ids, method: _, mut operation_ids, allow_index_creation, primary_key: _ },
K::DocumentClear,
) => {
operation_ids.push(id);
Continue(BatchKind::ClearAndSettings {
settings_ids,
other: operation_ids,
allow_index_creation,
})
}
(
BatchKind::SettingsAndDocumentOperation { settings_ids, method: ReplaceDocuments, mut operation_ids, allow_index_creation, primary_key: _},
K::DocumentImport { method: ReplaceDocuments, primary_key: pk2, .. },
) => {
operation_ids.push(id);
Continue(BatchKind::SettingsAndDocumentOperation {
settings_ids,
method: ReplaceDocuments,
allow_index_creation,
primary_key: pk2,
operation_ids,
})
}
(
BatchKind::SettingsAndDocumentOperation { settings_ids, method: UpdateDocuments, allow_index_creation, primary_key: _, mut operation_ids },
K::DocumentImport { method: UpdateDocuments, primary_key: pk2, .. },
) => {
operation_ids.push(id);
Continue(BatchKind::SettingsAndDocumentOperation {
settings_ids,
method: UpdateDocuments,
allow_index_creation,
primary_key: pk2,
operation_ids,
})
}
// But we can't batch a settings and a doc op with another doc op
// this MUST be AFTER the two previous branch
(
this @ BatchKind::SettingsAndDocumentOperation { .. },
K::DocumentDeletion { .. } | K::DocumentImport { .. },
) => Break(this),
(
BatchKind::SettingsAndDocumentOperation { mut settings_ids, method, allow_index_creation,primary_key, operation_ids },
K::Settings { .. },
) => {
settings_ids.push(id);
Continue(BatchKind::SettingsAndDocumentOperation {
settings_ids,
method,
allow_index_creation,
primary_key,
operation_ids,
})
}
(
BatchKind::IndexCreation { .. }
| BatchKind::IndexDeletion { .. }
@@ -808,30 +734,30 @@ mod tests {
}
#[test]
fn document_addition_batch_with_settings() {
fn document_addition_doesnt_batch_with_settings() {
// simple case
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, None), settings(true)]), @"Some((SettingsAndDocumentOperation { settings_ids: [1], method: ReplaceDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(UpdateDocuments, true, None), settings(true)]), @"Some((SettingsAndDocumentOperation { settings_ids: [1], method: UpdateDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, None), settings(true)]), @"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(UpdateDocuments, true, None), settings(true)]), @"Some((DocumentOperation { method: UpdateDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0] }, true))");
// multiple settings and doc addition
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, None), doc_imp(ReplaceDocuments, true, None), settings(true), settings(true)]), @"Some((SettingsAndDocumentOperation { settings_ids: [2, 3], method: ReplaceDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0, 1] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, None), doc_imp(ReplaceDocuments, true, None), settings(true), settings(true)]), @"Some((SettingsAndDocumentOperation { settings_ids: [2, 3], method: ReplaceDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0, 1] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, None), doc_imp(ReplaceDocuments, true, None), settings(true), settings(true)]), @"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0, 1] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, None), doc_imp(ReplaceDocuments, true, None), settings(true), settings(true)]), @"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0, 1] }, true))");
// addition and setting unordered
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, None), settings(true), doc_imp(ReplaceDocuments, true, None), settings(true)]), @"Some((SettingsAndDocumentOperation { settings_ids: [1, 3], method: ReplaceDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0, 2] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(UpdateDocuments, true, None), settings(true), doc_imp(UpdateDocuments, true, None), settings(true)]), @"Some((SettingsAndDocumentOperation { settings_ids: [1, 3], method: UpdateDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0, 2] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, None), settings(true), doc_imp(ReplaceDocuments, true, None), settings(true)]), @"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(UpdateDocuments, true, None), settings(true), doc_imp(UpdateDocuments, true, None), settings(true)]), @"Some((DocumentOperation { method: UpdateDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0] }, true))");
// We ensure this kind of batch doesn't batch with forbidden operations
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, None), settings(true), doc_imp(UpdateDocuments, true, None)]), @"Some((SettingsAndDocumentOperation { settings_ids: [1], method: ReplaceDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(UpdateDocuments, true, None), settings(true), doc_imp(ReplaceDocuments, true, None)]), @"Some((SettingsAndDocumentOperation { settings_ids: [1], method: UpdateDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, None), settings(true), doc_del()]), @"Some((SettingsAndDocumentOperation { settings_ids: [1], method: ReplaceDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(UpdateDocuments, true, None), settings(true), doc_del()]), @"Some((SettingsAndDocumentOperation { settings_ids: [1], method: UpdateDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, None), settings(true), idx_create()]), @"Some((SettingsAndDocumentOperation { settings_ids: [1], method: ReplaceDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(UpdateDocuments, true, None), settings(true), idx_create()]), @"Some((SettingsAndDocumentOperation { settings_ids: [1], method: UpdateDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, None), settings(true), idx_update()]), @"Some((SettingsAndDocumentOperation { settings_ids: [1], method: ReplaceDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(UpdateDocuments, true, None), settings(true), idx_update()]), @"Some((SettingsAndDocumentOperation { settings_ids: [1], method: UpdateDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, None), settings(true), idx_swap()]), @"Some((SettingsAndDocumentOperation { settings_ids: [1], method: ReplaceDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(UpdateDocuments, true, None), settings(true), idx_swap()]), @"Some((SettingsAndDocumentOperation { settings_ids: [1], method: UpdateDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0] }, true))");
// Doesn't batch with other forbidden operations
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, None), settings(true), doc_imp(UpdateDocuments, true, None)]), @"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(UpdateDocuments, true, None), settings(true), doc_imp(ReplaceDocuments, true, None)]), @"Some((DocumentOperation { method: UpdateDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, None), settings(true), doc_del()]), @"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(UpdateDocuments, true, None), settings(true), doc_del()]), @"Some((DocumentOperation { method: UpdateDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, None), settings(true), idx_create()]), @"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(UpdateDocuments, true, None), settings(true), idx_create()]), @"Some((DocumentOperation { method: UpdateDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, None), settings(true), idx_update()]), @"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(UpdateDocuments, true, None), settings(true), idx_update()]), @"Some((DocumentOperation { method: UpdateDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, None), settings(true), idx_swap()]), @"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(UpdateDocuments, true, None), settings(true), idx_swap()]), @"Some((DocumentOperation { method: UpdateDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0] }, true))");
}
#[test]
@@ -859,8 +785,8 @@ mod tests {
debug_snapshot!(autobatch_from(true, None, [doc_clr(), settings(true)]), @"Some((DocumentClear { ids: [0] }, false))");
debug_snapshot!(autobatch_from(true, None, [settings(true), doc_clr(), settings(true)]), @"Some((ClearAndSettings { other: [1], allow_index_creation: true, settings_ids: [0, 2] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, None), settings(true), doc_clr()]), @"Some((ClearAndSettings { other: [0, 2], allow_index_creation: true, settings_ids: [1] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(UpdateDocuments, true, None), settings(true), doc_clr()]), @"Some((ClearAndSettings { other: [0, 2], allow_index_creation: true, settings_ids: [1] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, None), settings(true), doc_clr()]), @"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(UpdateDocuments, true, None), settings(true), doc_clr()]), @"Some((DocumentOperation { method: UpdateDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0] }, true))");
}
#[test]
@@ -907,50 +833,6 @@ mod tests {
debug_snapshot!(autobatch_from(false,None, [doc_clr(), idx_del()]), @"Some((IndexDeletion { ids: [0, 1] }, false))");
debug_snapshot!(autobatch_from(false,None, [settings(true), idx_del()]), @"Some((IndexDeletion { ids: [0, 1] }, true))");
debug_snapshot!(autobatch_from(false,None, [settings(false), idx_del()]), @"Some((IndexDeletion { ids: [0, 1] }, false))");
// Then the mixed cases.
// The index already exists, whatever is the right of the tasks it shouldn't change the result.
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, None), settings(true), idx_del()]), @"Some((IndexDeletion { ids: [0, 2, 1] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(UpdateDocuments, true, None), settings(true), idx_del()]), @"Some((IndexDeletion { ids: [0, 2, 1] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, None), settings(true), doc_clr(), idx_del()]), @"Some((IndexDeletion { ids: [1, 3, 0, 2] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(UpdateDocuments, true, None), settings(true), doc_clr(), idx_del()]), @"Some((IndexDeletion { ids: [1, 3, 0, 2] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments,false, None), settings(false), idx_del()]), @"Some((IndexDeletion { ids: [0, 2, 1] }, false))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(UpdateDocuments, false, None), settings(false), idx_del()]), @"Some((IndexDeletion { ids: [0, 2, 1] }, false))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments,false, None), settings(false), doc_clr(), idx_del()]), @"Some((IndexDeletion { ids: [1, 3, 0, 2] }, false))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(UpdateDocuments, false, None), settings(false), doc_clr(), idx_del()]), @"Some((IndexDeletion { ids: [1, 3, 0, 2] }, false))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments,false, None), settings(true), idx_del()]), @"Some((IndexDeletion { ids: [0, 2, 1] }, false))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(UpdateDocuments, false, None), settings(true), idx_del()]), @"Some((IndexDeletion { ids: [0, 2, 1] }, false))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments,false, None), settings(true), doc_clr(), idx_del()]), @"Some((IndexDeletion { ids: [1, 3, 0, 2] }, false))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(UpdateDocuments, false, None), settings(true), doc_clr(), idx_del()]), @"Some((IndexDeletion { ids: [1, 3, 0, 2] }, false))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments,true, None), settings(false), idx_del()]), @"Some((IndexDeletion { ids: [0, 2, 1] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(UpdateDocuments, true, None), settings(false), idx_del()]), @"Some((IndexDeletion { ids: [0, 2, 1] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments,true, None), settings(false), doc_clr(), idx_del()]), @"Some((IndexDeletion { ids: [1, 3, 0, 2] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(UpdateDocuments, true, None), settings(false), doc_clr(), idx_del()]), @"Some((IndexDeletion { ids: [1, 3, 0, 2] }, true))");
// When the index doesn't exists yet it's more complicated.
// Either the first task we encounter create it, in which case we can create a big batch with everything.
debug_snapshot!(autobatch_from(false,None, [doc_imp(ReplaceDocuments, true, None), settings(true), idx_del()]), @"Some((IndexDeletion { ids: [0, 2, 1] }, true))");
debug_snapshot!(autobatch_from(false,None, [doc_imp(UpdateDocuments, true, None), settings(true), idx_del()]), @"Some((IndexDeletion { ids: [0, 2, 1] }, true))");
debug_snapshot!(autobatch_from(false,None, [doc_imp(ReplaceDocuments, true, None), settings(true), doc_clr(), idx_del()]), @"Some((IndexDeletion { ids: [1, 3, 0, 2] }, true))");
debug_snapshot!(autobatch_from(false,None, [doc_imp(UpdateDocuments, true, None), settings(true), doc_clr(), idx_del()]), @"Some((IndexDeletion { ids: [1, 3, 0, 2] }, true))");
// The right of the tasks following isn't really important.
debug_snapshot!(autobatch_from(false,None, [doc_imp(ReplaceDocuments,true, None), settings(false), idx_del()]), @"Some((IndexDeletion { ids: [0, 2, 1] }, true))");
debug_snapshot!(autobatch_from(false,None, [doc_imp(UpdateDocuments, true, None), settings(false), idx_del()]), @"Some((IndexDeletion { ids: [0, 2, 1] }, true))");
debug_snapshot!(autobatch_from(false,None, [doc_imp(ReplaceDocuments,true, None), settings(false), doc_clr(), idx_del()]), @"Some((IndexDeletion { ids: [1, 3, 0, 2] }, true))");
debug_snapshot!(autobatch_from(false,None, [doc_imp(UpdateDocuments, true, None), settings(false), doc_clr(), idx_del()]), @"Some((IndexDeletion { ids: [1, 3, 0, 2] }, true))");
// Or, the second case; the first task doesn't create the index and thus we wants to batch it with only tasks that can't create an index.
// that can be a second task that don't have the right to create an index. Or anything that can't create an index like an index deletion, document deletion, document clear, etc.
// All theses tasks are going to throw an error `Index doesn't exist` once the batch is processed.
debug_snapshot!(autobatch_from(false,None, [doc_imp(ReplaceDocuments,false, None), settings(false), idx_del()]), @"Some((IndexDeletion { ids: [0, 2, 1] }, false))");
debug_snapshot!(autobatch_from(false,None, [doc_imp(UpdateDocuments, false, None), settings(false), idx_del()]), @"Some((IndexDeletion { ids: [0, 2, 1] }, false))");
debug_snapshot!(autobatch_from(false,None, [doc_imp(ReplaceDocuments,false, None), settings(false), doc_clr(), idx_del()]), @"Some((IndexDeletion { ids: [1, 3, 0, 2] }, false))");
debug_snapshot!(autobatch_from(false,None, [doc_imp(UpdateDocuments, false, None), settings(false), doc_clr(), idx_del()]), @"Some((IndexDeletion { ids: [1, 3, 0, 2] }, false))");
// The third and final case is when the first task doesn't create an index but is directly followed by a task creating an index. In this case we can't batch whit what
// follows because we first need to process the erronous batch.
debug_snapshot!(autobatch_from(false,None, [doc_imp(ReplaceDocuments,false, None), settings(true), idx_del()]), @"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: false, primary_key: None, operation_ids: [0] }, false))");
debug_snapshot!(autobatch_from(false,None, [doc_imp(UpdateDocuments, false, None), settings(true), idx_del()]), @"Some((DocumentOperation { method: UpdateDocuments, allow_index_creation: false, primary_key: None, operation_ids: [0] }, false))");
debug_snapshot!(autobatch_from(false,None, [doc_imp(ReplaceDocuments,false, None), settings(true), doc_clr(), idx_del()]), @"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: false, primary_key: None, operation_ids: [0] }, false))");
debug_snapshot!(autobatch_from(false,None, [doc_imp(UpdateDocuments, false, None), settings(true), doc_clr(), idx_del()]), @"Some((DocumentOperation { method: UpdateDocuments, allow_index_creation: false, primary_key: None, operation_ids: [0] }, false))");
}
#[test]
@@ -959,13 +841,13 @@ mod tests {
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, false, None), doc_imp(ReplaceDocuments, true, None)]), @"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: false, primary_key: None, operation_ids: [0, 1] }, false))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, None), doc_imp(ReplaceDocuments, true, None)]), @"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0, 1] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, false, None), doc_imp(ReplaceDocuments, false, None)]), @"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: false, primary_key: None, operation_ids: [0, 1] }, false))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, None), settings(true)]), @"Some((SettingsAndDocumentOperation { settings_ids: [1], method: ReplaceDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, false, None), settings(true)]), @"Some((SettingsAndDocumentOperation { settings_ids: [1], method: ReplaceDocuments, allow_index_creation: false, primary_key: None, operation_ids: [0] }, false))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, None), settings(true)]), @"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, false, None), settings(true)]), @"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: false, primary_key: None, operation_ids: [0] }, false))");
debug_snapshot!(autobatch_from(false,None, [doc_imp(ReplaceDocuments, false, None), doc_imp(ReplaceDocuments, true, None)]), @"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: false, primary_key: None, operation_ids: [0] }, false))");
debug_snapshot!(autobatch_from(false,None, [doc_imp(ReplaceDocuments, true, None), doc_imp(ReplaceDocuments, true, None)]), @"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0, 1] }, true))");
debug_snapshot!(autobatch_from(false,None, [doc_imp(ReplaceDocuments, false, None), doc_imp(ReplaceDocuments, false, None)]), @"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: false, primary_key: None, operation_ids: [0, 1] }, false))");
debug_snapshot!(autobatch_from(false,None, [doc_imp(ReplaceDocuments, true, None), settings(true)]), @"Some((SettingsAndDocumentOperation { settings_ids: [1], method: ReplaceDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0] }, true))");
debug_snapshot!(autobatch_from(false,None, [doc_imp(ReplaceDocuments, true, None), settings(true)]), @"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0] }, true))");
debug_snapshot!(autobatch_from(false,None, [doc_imp(ReplaceDocuments, false, None), settings(true)]), @"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: false, primary_key: None, operation_ids: [0] }, false))");
// batch deletion and addition

View File

@@ -29,7 +29,6 @@ use bumpalo::collections::CollectIn;
use bumpalo::Bump;
use dump::IndexMetadata;
use meilisearch_types::batches::BatchId;
use meilisearch_types::error::Code;
use meilisearch_types::heed::{RoTxn, RwTxn};
use meilisearch_types::milli::documents::{obkv_to_object, DocumentsBatchReader, PrimaryKey};
use meilisearch_types::milli::heed::CompactionOption;
@@ -104,7 +103,6 @@ pub(crate) enum IndexOperation {
index_uid: String,
primary_key: Option<String>,
method: IndexDocumentsMethod,
documents_counts: Vec<u64>,
operations: Vec<DocumentOperation>,
tasks: Vec<Task>,
},
@@ -130,19 +128,6 @@ pub(crate) enum IndexOperation {
index_uid: String,
cleared_tasks: Vec<Task>,
// The boolean indicates if it's a settings deletion or creation.
settings: Vec<(bool, Settings<Unchecked>)>,
settings_tasks: Vec<Task>,
},
SettingsAndDocumentOperation {
index_uid: String,
primary_key: Option<String>,
method: IndexDocumentsMethod,
documents_counts: Vec<u64>,
operations: Vec<DocumentOperation>,
document_import_tasks: Vec<Task>,
// The boolean indicates if it's a settings deletion or creation.
settings: Vec<(bool, Settings<Unchecked>)>,
settings_tasks: Vec<Task>,
@@ -174,12 +159,7 @@ impl Batch {
IndexOperation::DocumentEdition { task, .. } => {
RoaringBitmap::from_sorted_iter(std::iter::once(task.uid)).unwrap()
}
IndexOperation::SettingsAndDocumentOperation {
document_import_tasks: tasks,
settings_tasks: other,
..
}
| IndexOperation::DocumentClearAndSetting {
IndexOperation::DocumentClearAndSetting {
cleared_tasks: tasks,
settings_tasks: other,
..
@@ -239,8 +219,7 @@ impl IndexOperation {
| IndexOperation::DocumentDeletion { index_uid, .. }
| IndexOperation::DocumentClear { index_uid, .. }
| IndexOperation::Settings { index_uid, .. }
| IndexOperation::DocumentClearAndSetting { index_uid, .. }
| IndexOperation::SettingsAndDocumentOperation { index_uid, .. } => index_uid,
| IndexOperation::DocumentClearAndSetting { index_uid, .. } => index_uid,
}
}
}
@@ -262,9 +241,6 @@ impl fmt::Display for IndexOperation {
IndexOperation::DocumentClearAndSetting { .. } => {
f.write_str("IndexOperation::DocumentClearAndSetting")
}
IndexOperation::SettingsAndDocumentOperation { .. } => {
f.write_str("IndexOperation::SettingsAndDocumentOperation")
}
}
}
}
@@ -330,21 +306,14 @@ impl IndexScheduler {
})
.flatten();
let mut documents_counts = Vec::new();
let mut operations = Vec::new();
for task in tasks.iter() {
match task.kind {
KindWithContent::DocumentAdditionOrUpdate {
content_file,
documents_count,
..
} => {
documents_counts.push(documents_count);
KindWithContent::DocumentAdditionOrUpdate { content_file, .. } => {
operations.push(DocumentOperation::Add(content_file));
}
KindWithContent::DocumentDeletion { ref documents_ids, .. } => {
documents_counts.push(documents_ids.len() as u64);
operations.push(DocumentOperation::Delete(documents_ids.clone()));
}
_ => unreachable!(),
@@ -356,7 +325,6 @@ impl IndexScheduler {
index_uid,
primary_key,
method,
documents_counts,
operations,
tasks,
},
@@ -441,67 +409,6 @@ impl IndexScheduler {
must_create_index,
}))
}
BatchKind::SettingsAndDocumentOperation {
settings_ids,
method,
allow_index_creation,
primary_key,
operation_ids,
} => {
let settings = self.create_next_batch_index(
rtxn,
index_uid.clone(),
BatchKind::Settings { settings_ids, allow_index_creation },
current_batch,
must_create_index,
)?;
let document_import = self.create_next_batch_index(
rtxn,
index_uid.clone(),
BatchKind::DocumentOperation {
method,
allow_index_creation,
primary_key,
operation_ids,
},
current_batch,
must_create_index,
)?;
match (document_import, settings) {
(
Some(Batch::IndexOperation {
op:
IndexOperation::DocumentOperation {
primary_key,
documents_counts,
operations,
tasks: document_import_tasks,
..
},
..
}),
Some(Batch::IndexOperation {
op: IndexOperation::Settings { settings, tasks: settings_tasks, .. },
..
}),
) => Ok(Some(Batch::IndexOperation {
op: IndexOperation::SettingsAndDocumentOperation {
index_uid,
primary_key,
method,
documents_counts,
operations,
document_import_tasks,
settings,
settings_tasks,
},
must_create_index,
})),
_ => unreachable!(),
}
}
BatchKind::IndexCreation { id } => {
let mut task = self.get_task(rtxn, id)?.ok_or(Error::CorruptedTaskQueue)?;
current_batch.processing(Some(&mut task));
@@ -589,7 +496,6 @@ impl IndexScheduler {
// 5. We make a batch from the unprioritised tasks. Start by taking the next enqueued task.
let task_id = if let Some(task_id) = enqueued.min() { task_id } else { return Ok(None) };
let mut task = self.get_task(rtxn, task_id)?.ok_or(Error::CorruptedTaskQueue)?;
current_batch.processing(Some(&mut task));
// If the task is not associated with any index, verify that it is an index swap and
// create the batch directly. Otherwise, get the index name associated with the task
@@ -599,6 +505,7 @@ impl IndexScheduler {
index_name
} else {
assert!(matches!(&task.kind, KindWithContent::IndexSwap { swaps } if swaps.is_empty()));
current_batch.processing(Some(&mut task));
return Ok(Some((Batch::IndexSwap { task }, current_batch)));
};
@@ -781,7 +688,9 @@ impl IndexScheduler {
let index = self.index_mapper.index(&rtxn, name)?;
let dst = temp_snapshot_dir.path().join("indexes").join(uuid.to_string());
fs::create_dir_all(&dst)?;
index.copy_to_file(dst.join("data.mdb"), CompactionOption::Enabled)?;
index
.copy_to_file(dst.join("data.mdb"), CompactionOption::Enabled)
.map_err(|e| Error::from_milli(e, Some(name.to_string())))?;
}
drop(rtxn);
@@ -883,16 +792,19 @@ impl IndexScheduler {
let content_file = self.file_store.get_update(content_file)?;
let reader = DocumentsBatchReader::from_reader(content_file)
.map_err(milli::Error::from)?;
.map_err(|e| Error::from_milli(e.into(), None))?;
let (mut cursor, documents_batch_index) =
reader.into_cursor_and_fields_index();
while let Some(doc) =
cursor.next_document().map_err(milli::Error::from)?
while let Some(doc) = cursor
.next_document()
.map_err(|e| Error::from_milli(e.into(), None))?
{
dump_content_file
.push_document(&obkv_to_object(doc, &documents_batch_index)?)?;
dump_content_file.push_document(
&obkv_to_object(doc, &documents_batch_index)
.map_err(|e| Error::from_milli(e, None))?,
)?;
}
dump_content_file.flush()?;
}
@@ -906,27 +818,41 @@ impl IndexScheduler {
let metadata = IndexMetadata {
uid: uid.to_owned(),
primary_key: index.primary_key(&rtxn)?.map(String::from),
created_at: index.created_at(&rtxn)?,
updated_at: index.updated_at(&rtxn)?,
created_at: index
.created_at(&rtxn)
.map_err(|e| Error::from_milli(e, Some(uid.to_string())))?,
updated_at: index
.updated_at(&rtxn)
.map_err(|e| Error::from_milli(e, Some(uid.to_string())))?,
};
let mut index_dumper = dump.create_index(uid, &metadata)?;
let fields_ids_map = index.fields_ids_map(&rtxn)?;
let all_fields: Vec<_> = fields_ids_map.iter().map(|(id, _)| id).collect();
let embedding_configs = index.embedding_configs(&rtxn)?;
let embedding_configs = index
.embedding_configs(&rtxn)
.map_err(|e| Error::from_milli(e, Some(uid.to_string())))?;
let documents = index
.all_documents(&rtxn)
.map_err(|e| Error::from_milli(e, Some(uid.to_string())))?;
// 3.1. Dump the documents
for ret in index.all_documents(&rtxn)? {
for ret in documents {
if self.must_stop_processing.get() {
return Err(Error::AbortedTask);
}
let (id, doc) = ret?;
let (id, doc) =
ret.map_err(|e| Error::from_milli(e, Some(uid.to_string())))?;
let mut document = milli::obkv_to_json(&all_fields, &fields_ids_map, doc)?;
let mut document =
milli::obkv_to_json(&all_fields, &fields_ids_map, doc)
.map_err(|e| Error::from_milli(e, Some(uid.to_string())))?;
'inject_vectors: {
let embeddings = index.embeddings(&rtxn, id)?;
let embeddings = index
.embeddings(&rtxn, id)
.map_err(|e| Error::from_milli(e, Some(uid.to_string())))?;
if embeddings.is_empty() {
break 'inject_vectors;
@@ -937,7 +863,7 @@ impl IndexScheduler {
.or_insert(serde_json::Value::Object(Default::default()));
let serde_json::Value::Object(vectors) = vectors else {
return Err(milli::Error::UserError(
let user_err = milli::Error::UserError(
milli::UserError::InvalidVectorsMapType {
document_id: {
if let Ok(Some(Ok(index))) = index
@@ -951,8 +877,9 @@ impl IndexScheduler {
},
value: vectors.clone(),
},
)
.into());
);
return Err(Error::from_milli(user_err, Some(uid.to_string())));
};
for (embedder_name, embeddings) in embeddings {
@@ -982,7 +909,8 @@ impl IndexScheduler {
index,
&rtxn,
meilisearch_types::settings::SecretPolicy::RevealSecrets,
)?;
)
.map_err(|e| Error::from_milli(e, Some(uid.to_string())))?;
index_dumper.settings(&settings)?;
Ok(())
})?;
@@ -1038,7 +966,8 @@ impl IndexScheduler {
// the entire batch.
let res = || -> Result<()> {
let index_rtxn = index.read_txn()?;
let stats = crate::index_mapper::IndexStats::new(&index, &index_rtxn)?;
let stats = crate::index_mapper::IndexStats::new(&index, &index_rtxn)
.map_err(|e| Error::from_milli(e, Some(index_uid.to_string())))?;
let mut wtxn = self.env.write_txn()?;
self.index_mapper.store_stats_of(&mut wtxn, &index_uid, &stats)?;
wtxn.commit()?;
@@ -1080,10 +1009,12 @@ impl IndexScheduler {
);
builder.set_primary_key(primary_key);
let must_stop_processing = self.must_stop_processing.clone();
builder.execute(
|indexing_step| tracing::debug!(update = ?indexing_step),
|| must_stop_processing.get(),
)?;
builder
.execute(
|indexing_step| tracing::debug!(update = ?indexing_step),
|| must_stop_processing.get(),
)
.map_err(|e| Error::from_milli(e, Some(index_uid.to_string())))?;
index_wtxn.commit()?;
}
@@ -1100,7 +1031,8 @@ impl IndexScheduler {
let res = || -> Result<()> {
let mut wtxn = self.env.write_txn()?;
let index_rtxn = index.read_txn()?;
let stats = crate::index_mapper::IndexStats::new(&index, &index_rtxn)?;
let stats = crate::index_mapper::IndexStats::new(&index, &index_rtxn)
.map_err(|e| Error::from_milli(e, Some(index_uid.clone())))?;
self.index_mapper.store_stats_of(&mut wtxn, &index_uid, &stats)?;
wtxn.commit()?;
Ok(())
@@ -1123,7 +1055,9 @@ impl IndexScheduler {
let number_of_documents = || -> Result<u64> {
let index = self.index_mapper.index(&wtxn, &index_uid)?;
let index_rtxn = index.read_txn()?;
Ok(index.number_of_documents(&index_rtxn)?)
index
.number_of_documents(&index_rtxn)
.map_err(|e| Error::from_milli(e, Some(index_uid.to_string())))
}()
.unwrap_or_default();
@@ -1280,8 +1214,10 @@ impl IndexScheduler {
};
match operation {
IndexOperation::DocumentClear { mut tasks, .. } => {
let count = milli::update::ClearDocuments::new(index_wtxn, index).execute()?;
IndexOperation::DocumentClear { index_uid, mut tasks } => {
let count = milli::update::ClearDocuments::new(index_wtxn, index)
.execute()
.map_err(|e| Error::from_milli(e, Some(index_uid)))?;
let mut first_clear_found = false;
for task in &mut tasks {
@@ -1301,10 +1237,9 @@ impl IndexScheduler {
Ok(tasks)
}
IndexOperation::DocumentOperation {
index_uid: _,
index_uid,
primary_key,
method,
documents_counts: _,
operations,
mut tasks,
} => {
@@ -1328,13 +1263,17 @@ impl IndexScheduler {
let mut content_files_iter = content_files.iter();
let mut indexer = indexer::DocumentOperation::new(method);
let embedders = index.embedding_configs(index_wtxn)?;
let embedders = self.embedders(embedders)?;
let embedders = index
.embedding_configs(index_wtxn)
.map_err(|e| Error::from_milli(e, Some(index_uid.clone())))?;
let embedders = self.embedders(index_uid.clone(), embedders)?;
for operation in operations {
match operation {
DocumentOperation::Add(_content_uuid) => {
let mmap = content_files_iter.next().unwrap();
indexer.add_documents(mmap)?;
indexer
.add_documents(mmap)
.map_err(|e| Error::from_milli(e, Some(index_uid.clone())))?;
}
DocumentOperation::Delete(document_ids) => {
let document_ids: bumpalo::collections::vec::Vec<_> = document_ids
@@ -1351,20 +1290,25 @@ impl IndexScheduler {
let pool = match &indexer_config.thread_pool {
Some(pool) => pool,
None => {
local_pool = ThreadPoolNoAbortBuilder::new().build().unwrap();
local_pool = ThreadPoolNoAbortBuilder::new()
.thread_name(|i| format!("indexing-thread-{i}"))
.build()
.unwrap();
&local_pool
}
};
let (document_changes, operation_stats, primary_key) = indexer.into_changes(
&indexer_alloc,
index,
&rtxn,
primary_key.as_deref(),
&mut new_fields_ids_map,
&|| must_stop_processing.get(),
&send_progress,
)?;
let (document_changes, operation_stats, primary_key) = indexer
.into_changes(
&indexer_alloc,
index,
&rtxn,
primary_key.as_deref(),
&mut new_fields_ids_map,
&|| must_stop_processing.get(),
&send_progress,
)
.map_err(|e| Error::from_milli(e, Some(index_uid.clone())))?;
let mut addition = 0;
for (stats, task) in operation_stats.into_iter().zip(&mut tasks) {
@@ -1399,28 +1343,27 @@ impl IndexScheduler {
}
if tasks.iter().any(|res| res.error.is_none()) {
pool.install(|| {
indexer::index(
index_wtxn,
index,
indexer_config.grenad_parameters(),
&db_fields_ids_map,
new_fields_ids_map,
primary_key,
&document_changes,
embedders,
&|| must_stop_processing.get(),
&send_progress,
)
})
.unwrap()?;
indexer::index(
index_wtxn,
index,
pool,
indexer_config.grenad_parameters(),
&db_fields_ids_map,
new_fields_ids_map,
primary_key,
&document_changes,
embedders,
&|| must_stop_processing.get(),
&send_progress,
)
.map_err(|e| Error::from_milli(e, Some(index_uid.clone())))?;
tracing::info!(indexing_result = ?addition, processed_in = ?started_processing_at.elapsed(), "document indexing done");
}
Ok(tasks)
}
IndexOperation::DocumentEdition { mut task, .. } => {
IndexOperation::DocumentEdition { index_uid, mut task } => {
let (filter, code) = if let KindWithContent::DocumentEdition {
filter_expr,
context: _,
@@ -1434,16 +1377,11 @@ impl IndexScheduler {
};
let candidates = match filter.as_ref().map(Filter::from_json) {
Some(Ok(Some(filter))) => {
filter.evaluate(index_wtxn, index).map_err(|err| match err {
milli::Error::UserError(milli::UserError::InvalidFilter(_)) => {
Error::from(err).with_custom_error_code(Code::InvalidDocumentFilter)
}
e => e.into(),
})?
}
Some(Ok(Some(filter))) => filter
.evaluate(index_wtxn, index)
.map_err(|err| Error::from_milli(err, Some(index_uid.clone())))?,
None | Some(Ok(None)) => index.documents_ids(index_wtxn)?,
Some(Err(e)) => return Err(e.into()),
Some(Err(e)) => return Err(Error::from_milli(e, Some(index_uid.clone()))),
};
let (original_filter, context, function) = if let Some(Details::DocumentEdition {
@@ -1478,8 +1416,9 @@ impl IndexScheduler {
// candidates not empty => index not empty => a primary key is set
let primary_key = index.primary_key(&rtxn)?.unwrap();
let primary_key = PrimaryKey::new_or_insert(primary_key, &mut new_fields_ids_map)
.map_err(milli::Error::from)?;
let primary_key =
PrimaryKey::new_or_insert(primary_key, &mut new_fields_ids_map)
.map_err(|err| Error::from_milli(err.into(), Some(index_uid.clone())))?;
let result_count = Ok((candidates.len(), candidates.len())) as Result<_>;
@@ -1489,34 +1428,41 @@ impl IndexScheduler {
let pool = match &indexer_config.thread_pool {
Some(pool) => pool,
None => {
local_pool = ThreadPoolNoAbortBuilder::new().build().unwrap();
local_pool = ThreadPoolNoAbortBuilder::new()
.thread_name(|i| format!("indexing-thread-{i}"))
.build()
.unwrap();
&local_pool
}
};
pool.install(|| {
let indexer =
UpdateByFunction::new(candidates, context.clone(), code.clone());
let document_changes = indexer.into_changes(&primary_key)?;
let embedders = index.embedding_configs(index_wtxn)?;
let embedders = self.embedders(embedders)?;
let indexer = UpdateByFunction::new(candidates, context.clone(), code.clone());
let document_changes = pool
.install(|| {
indexer
.into_changes(&primary_key)
.map_err(|err| Error::from_milli(err, Some(index_uid.clone())))
})
.unwrap()?;
let embedders = index
.embedding_configs(index_wtxn)
.map_err(|err| Error::from_milli(err, Some(index_uid.clone())))?;
let embedders = self.embedders(index_uid.clone(), embedders)?;
indexer::index(
index_wtxn,
index,
indexer_config.grenad_parameters(),
&db_fields_ids_map,
new_fields_ids_map,
None, // cannot change primary key in DocumentEdition
&document_changes,
embedders,
&|| must_stop_processing.get(),
&send_progress,
)?;
Result::Ok(())
})
.unwrap()?;
indexer::index(
index_wtxn,
index,
pool,
indexer_config.grenad_parameters(),
&db_fields_ids_map,
new_fields_ids_map,
None, // cannot change primary key in DocumentEdition
&document_changes,
embedders,
&|| must_stop_processing.get(),
&send_progress,
)
.map_err(|err| Error::from_milli(err, Some(index_uid.clone())))?;
// tracing::info!(indexing_result = ?addition, processed_in = ?started_processing_at.elapsed(), "document indexing done");
}
@@ -1547,7 +1493,7 @@ impl IndexScheduler {
Ok(vec![task])
}
IndexOperation::DocumentDeletion { mut tasks, index_uid: _ } => {
IndexOperation::DocumentDeletion { mut tasks, index_uid } => {
let mut to_delete = RoaringBitmap::new();
let external_documents_ids = index.external_documents_ids();
@@ -1568,35 +1514,23 @@ impl IndexScheduler {
deleted_documents: Some(will_be_removed),
});
}
KindWithContent::DocumentDeletionByFilter { index_uid: _, filter_expr } => {
KindWithContent::DocumentDeletionByFilter { index_uid, filter_expr } => {
let before = to_delete.len();
let filter = match Filter::from_json(filter_expr) {
Ok(filter) => filter,
Err(err) => {
// theorically, this should be catched by deserr before reaching the index-scheduler and cannot happens
task.status = Status::Failed;
task.error = match err {
milli::Error::UserError(
milli::UserError::InvalidFilterExpression { .. },
) => Some(
Error::from(err)
.with_custom_error_code(Code::InvalidDocumentFilter)
.into(),
),
e => Some(e.into()),
};
task.error = Some(
Error::from_milli(err, Some(index_uid.clone())).into(),
);
None
}
};
if let Some(filter) = filter {
let candidates =
filter.evaluate(index_wtxn, index).map_err(|err| match err {
milli::Error::UserError(
milli::UserError::InvalidFilter(_),
) => Error::from(err)
.with_custom_error_code(Code::InvalidDocumentFilter),
e => e.into(),
});
let candidates = filter
.evaluate(index_wtxn, index)
.map_err(|err| Error::from_milli(err, Some(index_uid.clone())));
match candidates {
Ok(candidates) => to_delete |= candidates,
Err(err) => {
@@ -1632,8 +1566,9 @@ impl IndexScheduler {
// to_delete not empty => index not empty => primary key set
let primary_key = index.primary_key(&rtxn)?.unwrap();
let primary_key = PrimaryKey::new_or_insert(primary_key, &mut new_fields_ids_map)
.map_err(milli::Error::from)?;
let primary_key =
PrimaryKey::new_or_insert(primary_key, &mut new_fields_ids_map)
.map_err(|err| Error::from_milli(err.into(), Some(index_uid.clone())))?;
if !tasks.iter().all(|res| res.error.is_some()) {
let local_pool;
@@ -1641,7 +1576,10 @@ impl IndexScheduler {
let pool = match &indexer_config.thread_pool {
Some(pool) => pool,
None => {
local_pool = ThreadPoolNoAbortBuilder::new().build().unwrap();
local_pool = ThreadPoolNoAbortBuilder::new()
.thread_name(|i| format!("indexing-thread-{i}"))
.build()
.unwrap();
&local_pool
}
};
@@ -1649,31 +1587,32 @@ impl IndexScheduler {
let mut indexer = indexer::DocumentDeletion::new();
indexer.delete_documents_by_docids(to_delete);
let document_changes = indexer.into_changes(&indexer_alloc, primary_key);
let embedders = index.embedding_configs(index_wtxn)?;
let embedders = self.embedders(embedders)?;
let embedders = index
.embedding_configs(index_wtxn)
.map_err(|err| Error::from_milli(err, Some(index_uid.clone())))?;
let embedders = self.embedders(index_uid.clone(), embedders)?;
pool.install(|| {
indexer::index(
index_wtxn,
index,
indexer_config.grenad_parameters(),
&db_fields_ids_map,
new_fields_ids_map,
None, // document deletion never changes primary key
&document_changes,
embedders,
&|| must_stop_processing.get(),
&send_progress,
)
})
.unwrap()?;
indexer::index(
index_wtxn,
index,
pool,
indexer_config.grenad_parameters(),
&db_fields_ids_map,
new_fields_ids_map,
None, // document deletion never changes primary key
&document_changes,
embedders,
&|| must_stop_processing.get(),
&send_progress,
)
.map_err(|err| Error::from_milli(err, Some(index_uid.clone())))?;
// tracing::info!(indexing_result = ?addition, processed_in = ?started_processing_at.elapsed(), "document indexing done");
}
Ok(tasks)
}
IndexOperation::Settings { index_uid: _, settings, mut tasks } => {
IndexOperation::Settings { index_uid, settings, mut tasks } => {
let indexer_config = self.index_mapper.indexer_config();
let mut builder = milli::update::Settings::new(index_wtxn, index, indexer_config);
@@ -1687,50 +1626,15 @@ impl IndexScheduler {
task.status = Status::Succeeded;
}
builder.execute(
|indexing_step| tracing::debug!(update = ?indexing_step),
|| must_stop_processing.get(),
)?;
builder
.execute(
|indexing_step| tracing::debug!(update = ?indexing_step),
|| must_stop_processing.get(),
)
.map_err(|err| Error::from_milli(err, Some(index_uid.clone())))?;
Ok(tasks)
}
IndexOperation::SettingsAndDocumentOperation {
index_uid,
primary_key,
method,
documents_counts,
operations,
document_import_tasks,
settings,
settings_tasks,
} => {
let settings_tasks = self.apply_index_operation(
index_wtxn,
index,
IndexOperation::Settings {
index_uid: index_uid.clone(),
settings,
tasks: settings_tasks,
},
)?;
let mut import_tasks = self.apply_index_operation(
index_wtxn,
index,
IndexOperation::DocumentOperation {
index_uid,
primary_key,
method,
documents_counts,
operations,
tasks: document_import_tasks,
},
)?;
let mut tasks = settings_tasks;
tasks.append(&mut import_tasks);
Ok(tasks)
}
IndexOperation::DocumentClearAndSetting {
index_uid,
cleared_tasks,

View File

@@ -1,13 +1,12 @@
use std::fmt::Display;
use crate::TaskId;
use meilisearch_types::batches::BatchId;
use meilisearch_types::error::{Code, ErrorCode};
use meilisearch_types::tasks::{Kind, Status};
use meilisearch_types::{heed, milli};
use thiserror::Error;
use crate::TaskId;
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub enum DateField {
BeforeEnqueuedAt,
@@ -122,8 +121,11 @@ pub enum Error {
Dump(#[from] dump::Error),
#[error(transparent)]
Heed(#[from] heed::Error),
#[error(transparent)]
Milli(#[from] milli::Error),
#[error("{}", match .index_uid {
Some(uid) if !uid.is_empty() => format!("Index `{}`: {error}", uid),
_ => format!("{error}")
})]
Milli { error: milli::Error, index_uid: Option<String> },
#[error("An unexpected crash occurred when processing the task.")]
ProcessBatchPanicked,
#[error(transparent)]
@@ -190,7 +192,7 @@ impl Error {
| Error::AbortedTask
| Error::Dump(_)
| Error::Heed(_)
| Error::Milli(_)
| Error::Milli { .. }
| Error::ProcessBatchPanicked
| Error::FileStore(_)
| Error::IoError(_)
@@ -209,6 +211,20 @@ impl Error {
pub fn with_custom_error_code(self, code: Code) -> Self {
Self::WithCustomErrorCode(code, Box::new(self))
}
pub fn from_milli(err: milli::Error, index_uid: Option<String>) -> Self {
match err {
milli::Error::UserError(milli::UserError::InvalidFilter(_)) => {
Self::Milli { error: err, index_uid }
.with_custom_error_code(Code::InvalidDocumentFilter)
}
milli::Error::UserError(milli::UserError::InvalidFilterExpression { .. }) => {
Self::Milli { error: err, index_uid }
.with_custom_error_code(Code::InvalidDocumentFilter)
}
_ => Self::Milli { error: err, index_uid },
}
}
}
impl ErrorCode for Error {
@@ -236,7 +252,7 @@ impl ErrorCode for Error {
// TODO: not sure of the Code to use
Error::NoSpaceLeftInTaskQueue => Code::NoSpaceLeftOnDevice,
Error::Dump(e) => e.error_code(),
Error::Milli(e) => e.error_code(),
Error::Milli { error, .. } => error.error_code(),
Error::ProcessBatchPanicked => Code::Internal,
Error::Heed(e) => e.error_code(),
Error::HeedTransaction(e) => e.error_code(),

View File

@@ -3,14 +3,13 @@ use std::path::Path;
use std::time::Duration;
use meilisearch_types::heed::{EnvClosingEvent, EnvFlags, EnvOpenOptions};
use meilisearch_types::milli::Index;
use meilisearch_types::milli::{Index, Result};
use time::OffsetDateTime;
use uuid::Uuid;
use super::IndexStatus::{self, Available, BeingDeleted, Closing, Missing};
use crate::clamp_to_page_size;
use crate::lru::{InsertionOutcome, LruMap};
use crate::{clamp_to_page_size, Result};
/// Keep an internally consistent view of the open indexes in memory.
///
/// This view is made of an LRU cache that will evict the least frequently used indexes when new indexes are opened.

View File

@@ -3,8 +3,13 @@ use std::sync::{Arc, RwLock};
use std::time::Duration;
use std::{fs, thread};
use self::index_map::IndexMap;
use self::IndexStatus::{Available, BeingDeleted, Closing, Missing};
use crate::uuid_codec::UuidCodec;
use crate::{Error, Result};
use meilisearch_types::heed::types::{SerdeJson, Str};
use meilisearch_types::heed::{Database, Env, RoTxn, RwTxn};
use meilisearch_types::milli;
use meilisearch_types::milli::update::IndexerConfig;
use meilisearch_types::milli::{FieldDistribution, Index};
use serde::{Deserialize, Serialize};
@@ -12,11 +17,6 @@ use time::OffsetDateTime;
use tracing::error;
use uuid::Uuid;
use self::index_map::IndexMap;
use self::IndexStatus::{Available, BeingDeleted, Closing, Missing};
use crate::uuid_codec::UuidCodec;
use crate::{Error, Result};
mod index_map;
const INDEX_MAPPING: &str = "index-mapping";
@@ -121,7 +121,7 @@ impl IndexStats {
/// # Parameters
///
/// - rtxn: a RO transaction for the index, obtained from `Index::read_txn()`.
pub fn new(index: &Index, rtxn: &RoTxn) -> Result<Self> {
pub fn new(index: &Index, rtxn: &RoTxn) -> milli::Result<Self> {
Ok(IndexStats {
number_of_documents: index.number_of_documents(rtxn)?,
database_size: index.on_disk_size()?,
@@ -183,13 +183,18 @@ impl IndexMapper {
// Error if the UUIDv4 somehow already exists in the map, since it should be fresh.
// This is very unlikely to happen in practice.
// TODO: it would be better to lazily create the index. But we need an Index::open function for milli.
let index = self.index_map.write().unwrap().create(
&uuid,
&index_path,
date,
self.enable_mdb_writemap,
self.index_base_map_size,
)?;
let index = self
.index_map
.write()
.unwrap()
.create(
&uuid,
&index_path,
date,
self.enable_mdb_writemap,
self.index_base_map_size,
)
.map_err(|e| Error::from_milli(e, Some(uuid.to_string())))?;
wtxn.commit()?;
@@ -357,7 +362,9 @@ impl IndexMapper {
};
let index_path = self.base_path.join(uuid.to_string());
// take the lock to reopen the environment.
reopen.reopen(&mut self.index_map.write().unwrap(), &index_path)?;
reopen
.reopen(&mut self.index_map.write().unwrap(), &index_path)
.map_err(|e| Error::from_milli(e, Some(uuid.to_string())))?;
continue;
}
BeingDeleted => return Err(Error::IndexNotFound(name.to_string())),
@@ -372,13 +379,15 @@ impl IndexMapper {
Missing => {
let index_path = self.base_path.join(uuid.to_string());
break index_map.create(
&uuid,
&index_path,
None,
self.enable_mdb_writemap,
self.index_base_map_size,
)?;
break index_map
.create(
&uuid,
&index_path,
None,
self.enable_mdb_writemap,
self.index_base_map_size,
)
.map_err(|e| Error::from_milli(e, Some(uuid.to_string())))?;
}
Available(index) => break index,
Closing(_) => {
@@ -460,6 +469,7 @@ impl IndexMapper {
let index = self.index(rtxn, index_uid)?;
let index_rtxn = index.read_txn()?;
IndexStats::new(&index, &index_rtxn)
.map_err(|e| Error::from_milli(e, Some(uuid.to_string())))
}
}
}

View File

@@ -407,7 +407,7 @@ pub struct IndexScheduler {
///
/// See [self.breakpoint()](`IndexScheduler::breakpoint`) for an explanation.
#[cfg(test)]
test_breakpoint_sdr: crossbeam::channel::Sender<(Breakpoint, bool)>,
test_breakpoint_sdr: crossbeam_channel::Sender<(Breakpoint, bool)>,
/// A list of planned failures within the [`tick`](IndexScheduler::tick) method of the index scheduler.
///
@@ -476,7 +476,7 @@ impl IndexScheduler {
/// Create an index scheduler and start its run loop.
pub fn new(
options: IndexSchedulerOptions,
#[cfg(test)] test_breakpoint_sdr: crossbeam::channel::Sender<(Breakpoint, bool)>,
#[cfg(test)] test_breakpoint_sdr: crossbeam_channel::Sender<(Breakpoint, bool)>,
#[cfg(test)] planned_failures: Vec<(usize, tests::FailureLocation)>,
) -> Result<Self> {
std::fs::create_dir_all(&options.tasks_path)?;
@@ -1440,7 +1440,7 @@ impl IndexScheduler {
// if the task doesn't delete anything and 50% of the task queue is full, we must refuse to enqueue the incomming task
if !matches!(&kind, KindWithContent::TaskDeletion { tasks, .. } if !tasks.is_empty())
&& (self.env.non_free_pages_size()? * 100) / self.env.info().map_size as u64 > 50
&& (self.env.non_free_pages_size()? * 100) / self.env.info().map_size as u64 > 40
{
return Err(Error::NoSpaceLeftInTaskQueue);
}
@@ -1678,9 +1678,10 @@ impl IndexScheduler {
tracing::info!("A batch of tasks was successfully completed with {success} successful tasks and {failure} failed tasks.");
}
// If we have an abortion error we must stop the tick here and re-schedule tasks.
Err(Error::Milli(milli::Error::InternalError(
milli::InternalError::AbortedIndexation,
)))
Err(Error::Milli {
error: milli::Error::InternalError(milli::InternalError::AbortedIndexation),
..
})
| Err(Error::AbortedTask) => {
#[cfg(test)]
self.breakpoint(Breakpoint::AbortedIndexation);
@@ -1699,9 +1700,10 @@ impl IndexScheduler {
// 2. close the associated environment
// 3. resize it
// 4. re-schedule tasks
Err(Error::Milli(milli::Error::UserError(
milli::UserError::MaxDatabaseSizeReached,
))) if index_uid.is_some() => {
Err(Error::Milli {
error: milli::Error::UserError(milli::UserError::MaxDatabaseSizeReached),
..
}) if index_uid.is_some() => {
// fixme: add index_uid to match to avoid the unwrap
let index_uid = index_uid.unwrap();
// fixme: handle error more gracefully? not sure when this could happen
@@ -1738,11 +1740,8 @@ impl IndexScheduler {
}
}
self.processing_tasks.write().unwrap().stop_processing();
// We must re-add the canceled task so they're part of the same batch.
// processed.processing |= canceled;
ids |= canceled;
self.write_batch(&mut wtxn, processing_batch, &ids)?;
#[cfg(test)]
@@ -1750,6 +1749,10 @@ impl IndexScheduler {
wtxn.commit().map_err(Error::HeedTransaction)?;
// We should stop processing AFTER everything is processed and written to disk otherwise, a batch (which only lives in RAM) may appear in the processing task
// and then become « not found » for some time until the commit everything is written and the final commit is made.
self.processing_tasks.write().unwrap().stop_processing();
// Once the tasks are committed, we should delete all the update files associated ASAP to avoid leaking files in case of a restart
tracing::debug!("Deleting the update files");
@@ -1942,6 +1945,7 @@ impl IndexScheduler {
// TODO: consider using a type alias or a struct embedder/template
pub fn embedders(
&self,
index_uid: String,
embedding_configs: Vec<IndexEmbeddingConfig>,
) -> Result<EmbeddingConfigs> {
let res: Result<_> = embedding_configs
@@ -1952,8 +1956,12 @@ impl IndexScheduler {
config: milli::vector::EmbeddingConfig { embedder_options, prompt, quantized },
..
}| {
let prompt =
Arc::new(prompt.try_into().map_err(meilisearch_types::milli::Error::from)?);
let prompt = Arc::new(
prompt
.try_into()
.map_err(meilisearch_types::milli::Error::from)
.map_err(|err| Error::from_milli(err, Some(index_uid.clone())))?,
);
// optimistically return existing embedder
{
let embedders = self.embedders.read().unwrap();
@@ -1969,7 +1977,9 @@ impl IndexScheduler {
let embedder = Arc::new(
Embedder::new(embedder_options.clone())
.map_err(meilisearch_types::milli::vector::Error::from)
.map_err(meilisearch_types::milli::Error::from)?,
.map_err(|err| {
Error::from_milli(err.into(), Some(index_uid.clone()))
})?,
);
{
let mut embedders = self.embedders.write().unwrap();
@@ -2237,7 +2247,7 @@ mod tests {
use std::time::Instant;
use big_s::S;
use crossbeam::channel::RecvTimeoutError;
use crossbeam_channel::RecvTimeoutError;
use file_store::File;
use insta::assert_json_snapshot;
use maplit::btreeset;
@@ -2289,7 +2299,7 @@ mod tests {
configuration: impl Fn(&mut IndexSchedulerOptions),
) -> (Self, IndexSchedulerHandle) {
let tempdir = TempDir::new().unwrap();
let (sender, receiver) = crossbeam::channel::bounded(0);
let (sender, receiver) = crossbeam_channel::bounded(0);
let indexer_config = IndexerConfig { skip_index_budget: true, ..Default::default() };
@@ -2421,7 +2431,7 @@ mod tests {
pub struct IndexSchedulerHandle {
_tempdir: TempDir,
index_scheduler: IndexScheduler,
test_breakpoint_rcv: crossbeam::channel::Receiver<(Breakpoint, bool)>,
test_breakpoint_rcv: crossbeam_channel::Receiver<(Breakpoint, bool)>,
last_breakpoint: Breakpoint,
}
@@ -4318,10 +4328,35 @@ mod tests {
let proc = index_scheduler.processing_tasks.read().unwrap().clone();
let query = Query { statuses: Some(vec![Status::Processing]), ..Default::default() };
let (batches, _) = index_scheduler
.get_batch_ids_from_authorized_indexes(&rtxn, &proc, &query, &AuthFilter::default())
let (mut batches, _) = index_scheduler
.get_batches_from_authorized_indexes(query.clone(), &AuthFilter::default())
.unwrap();
snapshot!(snapshot_bitmap(&batches), @"[0,]"); // only the processing batch in the first tick
assert_eq!(batches.len(), 1);
batches[0].started_at = OffsetDateTime::UNIX_EPOCH;
// Insta cannot snapshot our batches because the batch stats contains an enum as key: https://github.com/mitsuhiko/insta/issues/689
let batch = serde_json::to_string_pretty(&batches[0]).unwrap();
snapshot!(batch, @r#"
{
"uid": 0,
"details": {
"primaryKey": "mouse"
},
"stats": {
"totalNbTasks": 1,
"status": {
"processing": 1
},
"types": {
"indexCreation": 1
},
"indexUids": {
"catto": 1
}
},
"startedAt": "1970-01-01T00:00:00Z",
"finishedAt": null
}
"#);
let query = Query { statuses: Some(vec![Status::Enqueued]), ..Default::default() };
let (batches, _) = index_scheduler
@@ -6145,7 +6180,7 @@ mod tests {
insta::assert_json_snapshot!(simple_hf_config.embedder_options);
let simple_hf_name = name.clone();
let configs = index_scheduler.embedders(configs).unwrap();
let configs = index_scheduler.embedders("doggos".to_string(), configs).unwrap();
let (hf_embedder, _, _) = configs.get(&simple_hf_name).unwrap();
let beagle_embed =
hf_embedder.embed_one(S("Intel the beagle best doggo"), None).unwrap();

View File

@@ -5,7 +5,7 @@ snapshot_kind: text
### Autobatching Enabled = true
### Processing batch Some(1):
[1,]
{uid: 1, details: {"receivedDocuments":2,"indexedDocuments":null}, stats: {"totalNbTasks":2,"status":{"enqueued":2},"types":{"documentAdditionOrUpdate":2},"indexUids":{"beavero":2}}, }
{uid: 1, details: {"receivedDocuments":1,"indexedDocuments":null}, stats: {"totalNbTasks":1,"status":{"processing":1},"types":{"documentAdditionOrUpdate":1},"indexUids":{"beavero":1}}, }
----------------------------------------------------------------------
### All Tasks:
0 {uid: 0, batch_uid: 0, status: succeeded, details: { received_documents: 1, indexed_documents: Some(1) }, kind: DocumentAdditionOrUpdate { index_uid: "catto", primary_key: None, method: ReplaceDocuments, content_file: 00000000-0000-0000-0000-000000000000, documents_count: 1, allow_index_creation: true }}

View File

@@ -5,7 +5,7 @@ snapshot_kind: text
### Autobatching Enabled = true
### Processing batch Some(1):
[1,]
{uid: 1, details: {"receivedDocuments":2,"indexedDocuments":null}, stats: {"totalNbTasks":2,"status":{"enqueued":2},"types":{"documentAdditionOrUpdate":2},"indexUids":{"beavero":2}}, }
{uid: 1, details: {"receivedDocuments":1,"indexedDocuments":null}, stats: {"totalNbTasks":1,"status":{"processing":1},"types":{"documentAdditionOrUpdate":1},"indexUids":{"beavero":1}}, }
----------------------------------------------------------------------
### All Tasks:
0 {uid: 0, batch_uid: 0, status: succeeded, details: { received_documents: 1, indexed_documents: Some(1) }, kind: DocumentAdditionOrUpdate { index_uid: "catto", primary_key: None, method: ReplaceDocuments, content_file: 00000000-0000-0000-0000-000000000000, documents_count: 1, allow_index_creation: true }}

View File

@@ -5,7 +5,7 @@ snapshot_kind: text
### Autobatching Enabled = true
### Processing batch Some(0):
[0,]
{uid: 0, details: {"dumpUid":null}, stats: {"totalNbTasks":1,"status":{"enqueued":1},"types":{"dumpCreation":1},"indexUids":{}}, }
{uid: 0, details: {"dumpUid":null}, stats: {"totalNbTasks":1,"status":{"processing":1},"types":{"dumpCreation":1},"indexUids":{}}, }
----------------------------------------------------------------------
### All Tasks:
0 {uid: 0, status: enqueued, details: { dump_uid: None }, kind: DumpCreation { keys: [], instance_uid: None }}

View File

@@ -5,7 +5,7 @@ snapshot_kind: text
### Autobatching Enabled = true
### Processing batch Some(0):
[0,]
{uid: 0, details: {"receivedDocuments":2,"indexedDocuments":null}, stats: {"totalNbTasks":2,"status":{"enqueued":2},"types":{"documentAdditionOrUpdate":2},"indexUids":{"catto":2}}, }
{uid: 0, details: {"receivedDocuments":1,"indexedDocuments":null}, stats: {"totalNbTasks":1,"status":{"processing":1},"types":{"documentAdditionOrUpdate":1},"indexUids":{"catto":1}}, }
----------------------------------------------------------------------
### All Tasks:
0 {uid: 0, status: enqueued, details: { received_documents: 1, indexed_documents: None }, kind: DocumentAdditionOrUpdate { index_uid: "catto", primary_key: None, method: ReplaceDocuments, content_file: 00000000-0000-0000-0000-000000000000, documents_count: 1, allow_index_creation: true }}

View File

@@ -5,7 +5,7 @@ snapshot_kind: text
### Autobatching Enabled = true
### Processing batch Some(0):
[0,]
{uid: 0, details: {"receivedDocuments":2,"indexedDocuments":null}, stats: {"totalNbTasks":2,"status":{"enqueued":2},"types":{"documentAdditionOrUpdate":2},"indexUids":{"catto":2}}, }
{uid: 0, details: {"receivedDocuments":1,"indexedDocuments":null}, stats: {"totalNbTasks":1,"status":{"processing":1},"types":{"documentAdditionOrUpdate":1},"indexUids":{"catto":1}}, }
----------------------------------------------------------------------
### All Tasks:
0 {uid: 0, status: enqueued, details: { received_documents: 1, indexed_documents: None }, kind: DocumentAdditionOrUpdate { index_uid: "catto", primary_key: None, method: ReplaceDocuments, content_file: 00000000-0000-0000-0000-000000000000, documents_count: 1, allow_index_creation: true }}

View File

@@ -5,7 +5,7 @@ snapshot_kind: text
### Autobatching Enabled = true
### Processing batch Some(0):
[0,]
{uid: 0, details: {"receivedDocuments":2,"indexedDocuments":null}, stats: {"totalNbTasks":2,"status":{"enqueued":2},"types":{"documentAdditionOrUpdate":2},"indexUids":{"catto":2}}, }
{uid: 0, details: {"receivedDocuments":1,"indexedDocuments":null}, stats: {"totalNbTasks":1,"status":{"processing":1},"types":{"documentAdditionOrUpdate":1},"indexUids":{"catto":1}}, }
----------------------------------------------------------------------
### All Tasks:
0 {uid: 0, status: enqueued, details: { received_documents: 1, indexed_documents: None }, kind: DocumentAdditionOrUpdate { index_uid: "catto", primary_key: None, method: ReplaceDocuments, content_file: 00000000-0000-0000-0000-000000000000, documents_count: 1, allow_index_creation: true }}

View File

@@ -5,7 +5,7 @@ snapshot_kind: text
### Autobatching Enabled = true
### Processing batch Some(0):
[0,]
{uid: 0, details: {"receivedDocuments":2,"indexedDocuments":null}, stats: {"totalNbTasks":2,"status":{"enqueued":2},"types":{"documentAdditionOrUpdate":2},"indexUids":{"doggos":2}}, }
{uid: 0, details: {"receivedDocuments":1,"indexedDocuments":null}, stats: {"totalNbTasks":1,"status":{"processing":1},"types":{"documentAdditionOrUpdate":1},"indexUids":{"doggos":1}}, }
----------------------------------------------------------------------
### All Tasks:
0 {uid: 0, status: enqueued, details: { received_documents: 1, indexed_documents: None }, kind: DocumentAdditionOrUpdate { index_uid: "doggos", primary_key: Some("id"), method: ReplaceDocuments, content_file: 00000000-0000-0000-0000-000000000000, documents_count: 1, allow_index_creation: true }}

View File

@@ -5,7 +5,7 @@ snapshot_kind: text
### Autobatching Enabled = true
### Processing batch Some(0):
[0,]
{uid: 0, details: {"receivedDocuments":2,"indexedDocuments":null}, stats: {"totalNbTasks":2,"status":{"enqueued":2},"types":{"documentAdditionOrUpdate":2},"indexUids":{"doggos":2}}, }
{uid: 0, details: {"receivedDocuments":1,"indexedDocuments":null}, stats: {"totalNbTasks":1,"status":{"processing":1},"types":{"documentAdditionOrUpdate":1},"indexUids":{"doggos":1}}, }
----------------------------------------------------------------------
### All Tasks:
0 {uid: 0, status: enqueued, details: { received_documents: 1, indexed_documents: None }, kind: DocumentAdditionOrUpdate { index_uid: "doggos", primary_key: Some("id"), method: ReplaceDocuments, content_file: 00000000-0000-0000-0000-000000000000, documents_count: 1, allow_index_creation: true }}

View File

@@ -9,8 +9,8 @@ source: crates/index-scheduler/src/lib.rs
0 {uid: 0, batch_uid: 0, status: succeeded, details: { settings: Settings { displayed_attributes: WildcardSetting(NotSet), searchable_attributes: WildcardSetting(NotSet), filterable_attributes: Set({"catto"}), sortable_attributes: NotSet, ranking_rules: NotSet, stop_words: NotSet, non_separator_tokens: NotSet, separator_tokens: NotSet, dictionary: NotSet, synonyms: NotSet, distinct_attribute: NotSet, proximity_precision: NotSet, typo_tolerance: NotSet, faceting: NotSet, pagination: NotSet, embedders: NotSet, search_cutoff_ms: NotSet, localized_attributes: NotSet, facet_search: NotSet, prefix_search: NotSet, _kind: PhantomData<meilisearch_types::settings::Unchecked> } }, kind: SettingsUpdate { index_uid: "doggos", new_settings: Settings { displayed_attributes: WildcardSetting(NotSet), searchable_attributes: WildcardSetting(NotSet), filterable_attributes: Set({"catto"}), sortable_attributes: NotSet, ranking_rules: NotSet, stop_words: NotSet, non_separator_tokens: NotSet, separator_tokens: NotSet, dictionary: NotSet, synonyms: NotSet, distinct_attribute: NotSet, proximity_precision: NotSet, typo_tolerance: NotSet, faceting: NotSet, pagination: NotSet, embedders: NotSet, search_cutoff_ms: NotSet, localized_attributes: NotSet, facet_search: NotSet, prefix_search: NotSet, _kind: PhantomData<meilisearch_types::settings::Unchecked> }, is_deletion: false, allow_index_creation: true }}
1 {uid: 1, batch_uid: 1, status: succeeded, details: { received_documents: 3, indexed_documents: Some(3) }, kind: DocumentAdditionOrUpdate { index_uid: "doggos", primary_key: Some("id"), method: ReplaceDocuments, content_file: 00000000-0000-0000-0000-000000000000, documents_count: 3, allow_index_creation: true }}
2 {uid: 2, batch_uid: 2, status: succeeded, details: { received_document_ids: 1, deleted_documents: Some(1) }, kind: DocumentDeletion { index_uid: "doggos", documents_ids: ["1"] }}
3 {uid: 3, batch_uid: 2, status: failed, error: ResponseError { code: 200, message: "Invalid type for filter subexpression: expected: String, Array, found: true.", error_code: "invalid_document_filter", error_type: "invalid_request", error_link: "https://docs.meilisearch.com/errors#invalid_document_filter" }, details: { original_filter: true, deleted_documents: Some(0) }, kind: DocumentDeletionByFilter { index_uid: "doggos", filter_expr: Bool(true) }}
4 {uid: 4, batch_uid: 2, status: failed, error: ResponseError { code: 200, message: "Attribute `id` is not filterable. Available filterable attributes are: `catto`.\n1:3 id = 2", error_code: "invalid_document_filter", error_type: "invalid_request", error_link: "https://docs.meilisearch.com/errors#invalid_document_filter" }, details: { original_filter: "id = 2", deleted_documents: Some(0) }, kind: DocumentDeletionByFilter { index_uid: "doggos", filter_expr: String("id = 2") }}
3 {uid: 3, batch_uid: 2, status: failed, error: ResponseError { code: 200, message: "Index `doggos`: Invalid type for filter subexpression: expected: String, Array, found: true.", error_code: "invalid_document_filter", error_type: "invalid_request", error_link: "https://docs.meilisearch.com/errors#invalid_document_filter" }, details: { original_filter: true, deleted_documents: Some(0) }, kind: DocumentDeletionByFilter { index_uid: "doggos", filter_expr: Bool(true) }}
4 {uid: 4, batch_uid: 2, status: failed, error: ResponseError { code: 200, message: "Index `doggos`: Attribute `id` is not filterable. Available filterable attributes are: `catto`.\n1:3 id = 2", error_code: "invalid_document_filter", error_type: "invalid_request", error_link: "https://docs.meilisearch.com/errors#invalid_document_filter" }, details: { original_filter: "id = 2", deleted_documents: Some(0) }, kind: DocumentDeletionByFilter { index_uid: "doggos", filter_expr: String("id = 2") }}
5 {uid: 5, batch_uid: 2, status: succeeded, details: { original_filter: "catto EXISTS", deleted_documents: Some(1) }, kind: DocumentDeletionByFilter { index_uid: "doggos", filter_expr: String("catto EXISTS") }}
----------------------------------------------------------------------
### Status:

View File

@@ -5,7 +5,7 @@ snapshot_kind: text
### Autobatching Enabled = true
### Processing batch Some(0):
[0,]
{uid: 0, details: {"receivedDocuments":2,"indexedDocuments":null}, stats: {"totalNbTasks":2,"status":{"enqueued":2},"types":{"documentAdditionOrUpdate":2},"indexUids":{"doggos":2}}, }
{uid: 0, details: {"receivedDocuments":1,"indexedDocuments":null}, stats: {"totalNbTasks":1,"status":{"processing":1},"types":{"documentAdditionOrUpdate":1},"indexUids":{"doggos":1}}, }
----------------------------------------------------------------------
### All Tasks:
0 {uid: 0, status: enqueued, details: { received_documents: 1, indexed_documents: None }, kind: DocumentAdditionOrUpdate { index_uid: "doggos", primary_key: Some("id"), method: ReplaceDocuments, content_file: 00000000-0000-0000-0000-000000000000, documents_count: 1, allow_index_creation: true }}

View File

@@ -5,7 +5,7 @@ snapshot_kind: text
### Autobatching Enabled = true
### Processing batch Some(0):
[0,]
{uid: 0, details: {"receivedDocuments":2,"indexedDocuments":null}, stats: {"totalNbTasks":2,"status":{"enqueued":2},"types":{"documentAdditionOrUpdate":2},"indexUids":{"doggos":2}}, }
{uid: 0, details: {"receivedDocuments":1,"indexedDocuments":null}, stats: {"totalNbTasks":1,"status":{"processing":1},"types":{"documentAdditionOrUpdate":1},"indexUids":{"doggos":1}}, }
----------------------------------------------------------------------
### All Tasks:
0 {uid: 0, status: enqueued, details: { received_documents: 1, indexed_documents: None }, kind: DocumentAdditionOrUpdate { index_uid: "doggos", primary_key: Some("id"), method: ReplaceDocuments, content_file: 00000000-0000-0000-0000-000000000000, documents_count: 1, allow_index_creation: true }}

View File

@@ -5,7 +5,7 @@ snapshot_kind: text
### Autobatching Enabled = true
### Processing batch Some(0):
[0,]
{uid: 0, details: {"primaryKey":"id"}, stats: {"totalNbTasks":2,"status":{"enqueued":2},"types":{"indexCreation":2},"indexUids":{"index_a":2}}, }
{uid: 0, details: {"primaryKey":"id"}, stats: {"totalNbTasks":1,"status":{"processing":1},"types":{"indexCreation":1},"indexUids":{"index_a":1}}, }
----------------------------------------------------------------------
### All Tasks:
0 {uid: 0, status: enqueued, details: { primary_key: Some("id") }, kind: IndexCreation { index_uid: "index_a", primary_key: Some("id") }}

View File

@@ -5,7 +5,7 @@ snapshot_kind: text
### Autobatching Enabled = true
### Processing batch Some(0):
[0,]
{uid: 0, details: {"primaryKey":"id"}, stats: {"totalNbTasks":2,"status":{"enqueued":2},"types":{"indexCreation":2},"indexUids":{"index_a":2}}, }
{uid: 0, details: {"primaryKey":"id"}, stats: {"totalNbTasks":1,"status":{"processing":1},"types":{"indexCreation":1},"indexUids":{"index_a":1}}, }
----------------------------------------------------------------------
### All Tasks:
0 {uid: 0, status: enqueued, details: { primary_key: Some("id") }, kind: IndexCreation { index_uid: "index_a", primary_key: Some("id") }}

View File

@@ -5,7 +5,7 @@ snapshot_kind: text
### Autobatching Enabled = true
### Processing batch Some(0):
[0,]
{uid: 0, details: {"primaryKey":"id"}, stats: {"totalNbTasks":2,"status":{"enqueued":2},"types":{"indexCreation":2},"indexUids":{"index_a":2}}, }
{uid: 0, details: {"primaryKey":"id"}, stats: {"totalNbTasks":1,"status":{"processing":1},"types":{"indexCreation":1},"indexUids":{"index_a":1}}, }
----------------------------------------------------------------------
### All Tasks:
0 {uid: 0, status: enqueued, details: { primary_key: Some("id") }, kind: IndexCreation { index_uid: "index_a", primary_key: Some("id") }}

View File

@@ -5,7 +5,7 @@ snapshot_kind: text
### Autobatching Enabled = true
### Processing batch Some(1):
[1,]
{uid: 1, details: {"primaryKey":"sheep"}, stats: {"totalNbTasks":2,"status":{"enqueued":2},"types":{"indexCreation":2},"indexUids":{"doggo":2}}, }
{uid: 1, details: {"primaryKey":"sheep"}, stats: {"totalNbTasks":1,"status":{"processing":1},"types":{"indexCreation":1},"indexUids":{"doggo":1}}, }
----------------------------------------------------------------------
### All Tasks:
0 {uid: 0, batch_uid: 0, status: succeeded, details: { primary_key: Some("mouse") }, kind: IndexCreation { index_uid: "catto", primary_key: Some("mouse") }}

View File

@@ -5,7 +5,7 @@ snapshot_kind: text
### Autobatching Enabled = true
### Processing batch Some(0):
[3,]
{uid: 0, details: {"matchedTasks":2,"deletedTasks":null,"originalFilter":"test_query"}, stats: {"totalNbTasks":1,"status":{"enqueued":1},"types":{"taskDeletion":1},"indexUids":{}}, }
{uid: 0, details: {"matchedTasks":2,"deletedTasks":null,"originalFilter":"test_query"}, stats: {"totalNbTasks":1,"status":{"processing":1},"types":{"taskDeletion":1},"indexUids":{}}, }
----------------------------------------------------------------------
### All Tasks:
0 {uid: 0, status: enqueued, details: { primary_key: Some("mouse") }, kind: IndexCreation { index_uid: "catto", primary_key: Some("mouse") }}

View File

@@ -67,7 +67,7 @@ impl ProcessingBatch {
task.batch_uid = Some(self.uid);
// We don't store the statuses in the map since they're all enqueued but we must
// still store them in the stats since that can be displayed.
*self.stats.status.entry(task.status).or_default() += 1;
*self.stats.status.entry(Status::Processing).or_default() += 1;
self.kinds.insert(task.kind.as_kind());
*self.stats.types.entry(task.kind.as_kind()).or_default() += 1;
@@ -106,7 +106,7 @@ impl ProcessingBatch {
self.stats.total_nb_tasks = 0;
}
/// Update the timestamp of the tasks and the inner structure of this sturcture.
/// Update the timestamp of the tasks and the inner structure of this structure.
pub fn update(&mut self, task: &mut Task) {
// We must re-set this value in case we're dealing with a task that has been added between
// the `processing` and `finished` state

View File

@@ -17,7 +17,7 @@ hmac = "0.12.1"
maplit = "1.0.2"
meilisearch-types = { path = "../meilisearch-types" }
rand = "0.8.5"
roaring = { version = "0.10.6", features = ["serde"] }
roaring = { version = "0.10.7", features = ["serde"] }
serde = { version = "1.0.204", features = ["derive"] }
serde_json = { version = "1.0.120", features = ["preserve_order"] }
sha2 = "0.10.8"

View File

@@ -25,7 +25,7 @@ fst = "0.4.7"
memmap2 = "0.9.4"
milli = { path = "../milli" }
raw-collections = { git = "https://github.com/meilisearch/raw-collections.git", version = "0.1.0" }
roaring = { version = "0.10.6", features = ["serde"] }
roaring = { version = "0.10.7", features = ["serde"] }
serde = { version = "1.0.204", features = ["derive"] }
serde-cs = "0.2.4"
serde_json = "1.0.120"

View File

@@ -214,7 +214,7 @@ pub fn read_json(input: &File, output: impl io::Write) -> Result<u64> {
// We memory map to be able to deserialize into a RawMap that
// does not allocate when possible and only materialize the first/top level.
let input = unsafe { Mmap::map(input).map_err(DocumentFormatError::Io)? };
let mut doc_alloc = Bump::with_capacity(1024 * 1024 * 1024); // 1MiB
let mut doc_alloc = Bump::with_capacity(1024 * 1024); // 1MiB
let mut out = BufWriter::new(output);
let mut deserializer = serde_json::Deserializer::from_slice(&input);

View File

@@ -279,6 +279,7 @@ InvalidSearchPage , InvalidRequest , BAD_REQUEST ;
InvalidSearchQ , InvalidRequest , BAD_REQUEST ;
InvalidFacetSearchQuery , InvalidRequest , BAD_REQUEST ;
InvalidFacetSearchName , InvalidRequest , BAD_REQUEST ;
FacetSearchDisabled , InvalidRequest , BAD_REQUEST ;
InvalidSearchVector , InvalidRequest , BAD_REQUEST ;
InvalidSearchShowMatchesPosition , InvalidRequest , BAD_REQUEST ;
InvalidSearchShowRankingScore , InvalidRequest , BAD_REQUEST ;

View File

@@ -103,7 +103,7 @@ tracing-subscriber = { version = "0.3.18", features = ["json"] }
tracing-trace = { version = "0.1.0", path = "../tracing-trace" }
tracing-actix-web = "0.7.11"
build-info = { version = "1.7.0", path = "../build-info" }
roaring = "0.10.2"
roaring = "0.10.7"
mopa-maintained = "0.2.3"
[dev-dependencies]
@@ -157,5 +157,5 @@ german = ["meilisearch-types/german"]
turkish = ["meilisearch-types/turkish"]
[package.metadata.mini-dashboard]
assets-url = "https://github.com/meilisearch/mini-dashboard/releases/download/v0.2.15/build.zip"
sha1 = "d057600b4a839a2e0c0be7a372cd1b2683f3ca7e"
assets-url = "https://github.com/meilisearch/mini-dashboard/releases/download/v0.2.16/build.zip"
sha1 = "68f83438a114aabbe76bc9fe480071e741996662"

View File

@@ -4,6 +4,7 @@ use byte_unit::{Byte, UnitType};
use meilisearch_types::document_formats::{DocumentFormatError, PayloadType};
use meilisearch_types::error::{Code, ErrorCode, ResponseError};
use meilisearch_types::index_uid::{IndexUid, IndexUidFormatError};
use meilisearch_types::milli;
use meilisearch_types::milli::OrderBy;
use serde_json::Value;
use tokio::task::JoinError;
@@ -62,8 +63,11 @@ pub enum MeilisearchHttpError {
HeedError(#[from] meilisearch_types::heed::Error),
#[error(transparent)]
IndexScheduler(#[from] index_scheduler::Error),
#[error(transparent)]
Milli(#[from] meilisearch_types::milli::Error),
#[error("{}", match .index_name {
Some(name) if !name.is_empty() => format!("Index `{}`: {error}", name),
_ => format!("{error}")
})]
Milli { error: milli::Error, index_name: Option<String> },
#[error(transparent)]
Payload(#[from] PayloadError),
#[error(transparent)]
@@ -76,6 +80,12 @@ pub enum MeilisearchHttpError {
MissingSearchHybrid,
}
impl MeilisearchHttpError {
pub(crate) fn from_milli(error: milli::Error, index_name: Option<String>) -> Self {
Self::Milli { error, index_name }
}
}
impl ErrorCode for MeilisearchHttpError {
fn error_code(&self) -> Code {
match self {
@@ -95,7 +105,7 @@ impl ErrorCode for MeilisearchHttpError {
MeilisearchHttpError::SerdeJson(_) => Code::Internal,
MeilisearchHttpError::HeedError(_) => Code::Internal,
MeilisearchHttpError::IndexScheduler(e) => e.error_code(),
MeilisearchHttpError::Milli(e) => e.error_code(),
MeilisearchHttpError::Milli { error, .. } => error.error_code(),
MeilisearchHttpError::Payload(e) => e.error_code(),
MeilisearchHttpError::FileStore(_) => Code::Internal,
MeilisearchHttpError::DocumentFormat(e) => e.error_code(),

View File

@@ -395,6 +395,7 @@ fn import_dump(
for index_reader in dump_reader.indexes()? {
let mut index_reader = index_reader?;
let metadata = index_reader.metadata();
let uid = metadata.uid.clone();
tracing::info!("Importing index `{}`.", metadata.uid);
let date = Some((metadata.created_at, metadata.updated_at));
@@ -432,7 +433,7 @@ fn import_dump(
let reader = DocumentsBatchReader::from_reader(reader)?;
let embedder_configs = index.embedding_configs(&wtxn)?;
let embedders = index_scheduler.embedders(embedder_configs)?;
let embedders = index_scheduler.embedders(uid, embedder_configs)?;
let builder = milli::update::IndexDocuments::new(
&mut wtxn,

View File

@@ -20,14 +20,14 @@ use meilisearch::{
LogStderrType, Opt, SubscriberForSecondLayer,
};
use meilisearch_auth::{generate_master_key, AuthController, MASTER_KEY_MIN_SIZE};
use mimalloc::MiMalloc;
use termcolor::{Color, ColorChoice, ColorSpec, StandardStream, WriteColor};
use tracing::level_filters::LevelFilter;
use tracing_subscriber::layer::SubscriberExt as _;
use tracing_subscriber::Layer;
#[cfg(not(windows))]
#[global_allocator]
static ALLOC: MiMalloc = MiMalloc;
static ALLOC: mimalloc::MiMalloc = mimalloc::MiMalloc;
fn default_log_route_layer() -> LogRouteType {
None.with_filter(tracing_subscriber::filter::Targets::new().with_target("", LevelFilter::OFF))

View File

@@ -185,7 +185,8 @@ pub async fn search(
let index = index_scheduler.index(&index_uid)?;
let features = index_scheduler.features();
let search_kind = search_kind(&search_query, &index_scheduler, &index, features)?;
let search_kind =
search_kind(&search_query, &index_scheduler, index_uid.to_string(), &index, features)?;
let permit = search_queue.try_get_search_permit().await?;
let search_result = tokio::task::spawn_blocking(move || {
perform_facet_search(

View File

@@ -5,7 +5,7 @@ use actix_web::web::Data;
use actix_web::{web, HttpRequest, HttpResponse};
use deserr::actix_web::{AwebJson, AwebQueryParameter};
use deserr::{DeserializeError, Deserr, ValuePointerRef};
use index_scheduler::IndexScheduler;
use index_scheduler::{Error, IndexScheduler};
use meilisearch_types::deserr::query_params::Param;
use meilisearch_types::deserr::{immutable_field_error, DeserrJsonError, DeserrQueryParamError};
use meilisearch_types::error::deserr_codes::*;
@@ -107,7 +107,10 @@ pub async fn list_indexes(
if !filters.is_index_authorized(uid) {
return Ok(None);
}
Ok(Some(IndexView::new(uid.to_string(), index)?))
Ok(Some(
IndexView::new(uid.to_string(), index)
.map_err(|e| Error::from_milli(e, Some(uid.to_string())))?,
))
})?;
// Won't cause to open all indexes because IndexView doesn't keep the `Index` opened.
let indexes: Vec<IndexView> = indexes.into_iter().flatten().collect();

View File

@@ -243,11 +243,19 @@ pub async fn search_with_url_query(
let index = index_scheduler.index(&index_uid)?;
let features = index_scheduler.features();
let search_kind = search_kind(&query, index_scheduler.get_ref(), &index, features)?;
let search_kind =
search_kind(&query, index_scheduler.get_ref(), index_uid.to_string(), &index, features)?;
let retrieve_vector = RetrieveVectors::new(query.retrieve_vectors, features)?;
let permit = search_queue.try_get_search_permit().await?;
let search_result = tokio::task::spawn_blocking(move || {
perform_search(&index, query, search_kind, retrieve_vector, index_scheduler.features())
perform_search(
index_uid.to_string(),
&index,
query,
search_kind,
retrieve_vector,
index_scheduler.features(),
)
})
.await;
permit.drop().await;
@@ -287,12 +295,20 @@ pub async fn search_with_post(
let features = index_scheduler.features();
let search_kind = search_kind(&query, index_scheduler.get_ref(), &index, features)?;
let search_kind =
search_kind(&query, index_scheduler.get_ref(), index_uid.to_string(), &index, features)?;
let retrieve_vectors = RetrieveVectors::new(query.retrieve_vectors, features)?;
let permit = search_queue.try_get_search_permit().await?;
let search_result = tokio::task::spawn_blocking(move || {
perform_search(&index, query, search_kind, retrieve_vectors, index_scheduler.features())
perform_search(
index_uid.to_string(),
&index,
query,
search_kind,
retrieve_vectors,
index_scheduler.features(),
)
})
.await;
permit.drop().await;
@@ -314,6 +330,7 @@ pub async fn search_with_post(
pub fn search_kind(
query: &SearchQuery,
index_scheduler: &IndexScheduler,
index_uid: String,
index: &milli::Index,
features: RoFeatures,
) -> Result<SearchKind, ResponseError> {
@@ -332,7 +349,7 @@ pub fn search_kind(
(None, _, None) => Ok(SearchKind::KeywordOnly),
// hybrid.semantic_ratio == 1.0 => vector
(_, Some(HybridQuery { semantic_ratio, embedder }), v) if **semantic_ratio == 1.0 => {
SearchKind::semantic(index_scheduler, index, embedder, v.map(|v| v.len()))
SearchKind::semantic(index_scheduler, index_uid, index, embedder, v.map(|v| v.len()))
}
// hybrid.semantic_ratio == 0.0 => keyword
(_, Some(HybridQuery { semantic_ratio, embedder: _ }), _) if **semantic_ratio == 0.0 => {
@@ -340,13 +357,14 @@ pub fn search_kind(
}
// no query, hybrid, vector => semantic
(None, Some(HybridQuery { semantic_ratio: _, embedder }), Some(v)) => {
SearchKind::semantic(index_scheduler, index, embedder, Some(v.len()))
SearchKind::semantic(index_scheduler, index_uid, index, embedder, Some(v.len()))
}
// query, no hybrid, no vector => keyword
(Some(_), None, None) => Ok(SearchKind::KeywordOnly),
// query, hybrid, maybe vector => hybrid
(Some(_), Some(HybridQuery { semantic_ratio, embedder }), v) => SearchKind::hybrid(
index_scheduler,
index_uid,
index,
embedder,
**semantic_ratio,

View File

@@ -17,6 +17,32 @@ use crate::extractors::authentication::GuardedData;
use crate::routes::{get_task_id, is_dry_run, SummarizedTaskView};
use crate::Opt;
/// This macro generates the routes for the settings.
///
/// It takes a list of settings and generates a module for each setting.
/// Each module contains the `get`, `update` and `delete` routes for the setting.
///
/// It also generates a `configure` function that configures the routes for the settings.
macro_rules! make_setting_routes {
($({route: $route:literal, update_verb: $update_verb:ident, value_type: $type:ty, err_type: $err_ty:ty, attr: $attr:ident, camelcase_attr: $camelcase_attr:literal, analytics: $analytics:ident},)*) => {
$(
make_setting_route!($route, $update_verb, $type, $err_ty, $attr, $camelcase_attr, $analytics);
)*
pub fn configure(cfg: &mut web::ServiceConfig) {
use crate::extractors::sequential_extractor::SeqHandler;
cfg.service(
web::resource("")
.route(web::patch().to(SeqHandler(update_all)))
.route(web::get().to(SeqHandler(get_all)))
.route(web::delete().to(SeqHandler(delete_all))))
$(.service($attr::resources()))*;
}
pub const ALL_SETTINGS_NAMES: &[&str] = &[$(stringify!($attr)),*];
};
}
#[macro_export]
macro_rules! make_setting_route {
($route:literal, $update_verb:ident, $type:ty, $err_ty:ty, $attr:ident, $camelcase_attr:literal, $analytics:ident) => {
@@ -153,279 +179,227 @@ macro_rules! make_setting_route {
};
}
make_setting_route!(
"/filterable-attributes",
put,
std::collections::BTreeSet<String>,
meilisearch_types::deserr::DeserrJsonError<
meilisearch_types::error::deserr_codes::InvalidSettingsFilterableAttributes,
>,
filterable_attributes,
"filterableAttributes",
FilterableAttributesAnalytics
);
make_setting_route!(
"/sortable-attributes",
put,
std::collections::BTreeSet<String>,
meilisearch_types::deserr::DeserrJsonError<
meilisearch_types::error::deserr_codes::InvalidSettingsSortableAttributes,
>,
sortable_attributes,
"sortableAttributes",
SortableAttributesAnalytics
);
make_setting_route!(
"/displayed-attributes",
put,
Vec<String>,
meilisearch_types::deserr::DeserrJsonError<
meilisearch_types::error::deserr_codes::InvalidSettingsDisplayedAttributes,
>,
displayed_attributes,
"displayedAttributes",
DisplayedAttributesAnalytics
);
make_setting_route!(
"/typo-tolerance",
patch,
meilisearch_types::settings::TypoSettings,
meilisearch_types::deserr::DeserrJsonError<
meilisearch_types::error::deserr_codes::InvalidSettingsTypoTolerance,
>,
typo_tolerance,
"typoTolerance",
TypoToleranceAnalytics
);
make_setting_route!(
"/searchable-attributes",
put,
Vec<String>,
meilisearch_types::deserr::DeserrJsonError<
meilisearch_types::error::deserr_codes::InvalidSettingsSearchableAttributes,
>,
searchable_attributes,
"searchableAttributes",
SearchableAttributesAnalytics
);
make_setting_route!(
"/stop-words",
put,
std::collections::BTreeSet<String>,
meilisearch_types::deserr::DeserrJsonError<
meilisearch_types::error::deserr_codes::InvalidSettingsStopWords,
>,
stop_words,
"stopWords",
StopWordsAnalytics
);
make_setting_route!(
"/non-separator-tokens",
put,
std::collections::BTreeSet<String>,
meilisearch_types::deserr::DeserrJsonError<
meilisearch_types::error::deserr_codes::InvalidSettingsNonSeparatorTokens,
>,
non_separator_tokens,
"nonSeparatorTokens",
NonSeparatorTokensAnalytics
);
make_setting_route!(
"/separator-tokens",
put,
std::collections::BTreeSet<String>,
meilisearch_types::deserr::DeserrJsonError<
meilisearch_types::error::deserr_codes::InvalidSettingsSeparatorTokens,
>,
separator_tokens,
"separatorTokens",
SeparatorTokensAnalytics
);
make_setting_route!(
"/dictionary",
put,
std::collections::BTreeSet<String>,
meilisearch_types::deserr::DeserrJsonError<
meilisearch_types::error::deserr_codes::InvalidSettingsDictionary,
>,
dictionary,
"dictionary",
DictionaryAnalytics
);
make_setting_route!(
"/synonyms",
put,
std::collections::BTreeMap<String, Vec<String>>,
meilisearch_types::deserr::DeserrJsonError<
meilisearch_types::error::deserr_codes::InvalidSettingsSynonyms,
>,
synonyms,
"synonyms",
SynonymsAnalytics
);
make_setting_route!(
"/distinct-attribute",
put,
String,
meilisearch_types::deserr::DeserrJsonError<
meilisearch_types::error::deserr_codes::InvalidSettingsDistinctAttribute,
>,
distinct_attribute,
"distinctAttribute",
DistinctAttributeAnalytics
);
make_setting_route!(
"/proximity-precision",
put,
meilisearch_types::settings::ProximityPrecisionView,
meilisearch_types::deserr::DeserrJsonError<
meilisearch_types::error::deserr_codes::InvalidSettingsProximityPrecision,
>,
proximity_precision,
"proximityPrecision",
ProximityPrecisionAnalytics
);
make_setting_route!(
"/localized-attributes",
put,
Vec<meilisearch_types::locales::LocalizedAttributesRuleView>,
meilisearch_types::deserr::DeserrJsonError<
meilisearch_types::error::deserr_codes::InvalidSettingsLocalizedAttributes,
>,
localized_attributes,
"localizedAttributes",
LocalesAnalytics
);
make_setting_route!(
"/ranking-rules",
put,
Vec<meilisearch_types::settings::RankingRuleView>,
meilisearch_types::deserr::DeserrJsonError<
meilisearch_types::error::deserr_codes::InvalidSettingsRankingRules,
>,
ranking_rules,
"rankingRules",
RankingRulesAnalytics
);
make_setting_route!(
"/faceting",
patch,
meilisearch_types::settings::FacetingSettings,
meilisearch_types::deserr::DeserrJsonError<
meilisearch_types::error::deserr_codes::InvalidSettingsFaceting,
>,
faceting,
"faceting",
FacetingAnalytics
);
make_setting_route!(
"/pagination",
patch,
meilisearch_types::settings::PaginationSettings,
meilisearch_types::deserr::DeserrJsonError<
meilisearch_types::error::deserr_codes::InvalidSettingsPagination,
>,
pagination,
"pagination",
PaginationAnalytics
);
make_setting_route!(
"/embedders",
patch,
std::collections::BTreeMap<String, Setting<meilisearch_types::milli::vector::settings::EmbeddingSettings>>,
meilisearch_types::deserr::DeserrJsonError<
meilisearch_types::error::deserr_codes::InvalidSettingsEmbedders,
>,
embedders,
"embedders",
EmbeddersAnalytics
);
make_setting_route!(
"/search-cutoff-ms",
put,
u64,
meilisearch_types::deserr::DeserrJsonError<
meilisearch_types::error::deserr_codes::InvalidSettingsSearchCutoffMs,
>,
search_cutoff_ms,
"searchCutoffMs",
SearchCutoffMsAnalytics
);
make_setting_route!(
"/facet-search",
put,
bool,
meilisearch_types::deserr::DeserrJsonError<
meilisearch_types::error::deserr_codes::InvalidSettingsFacetSearch,
>,
facet_search,
"facetSearch",
FacetSearchAnalytics
);
make_setting_route!(
"/prefix-search",
put,
meilisearch_types::settings::PrefixSearchSettings,
meilisearch_types::deserr::DeserrJsonError<
meilisearch_types::error::deserr_codes::InvalidSettingsPrefixSearch,
>,
prefix_search,
"prefixSearch",
PrefixSearchAnalytics
);
macro_rules! generate_configure {
($($mod:ident),*) => {
pub fn configure(cfg: &mut web::ServiceConfig) {
use crate::extractors::sequential_extractor::SeqHandler;
cfg.service(
web::resource("")
.route(web::patch().to(SeqHandler(update_all)))
.route(web::get().to(SeqHandler(get_all)))
.route(web::delete().to(SeqHandler(delete_all))))
$(.service($mod::resources()))*;
}
};
}
generate_configure!(
filterable_attributes,
sortable_attributes,
displayed_attributes,
localized_attributes,
searchable_attributes,
distinct_attribute,
proximity_precision,
stop_words,
separator_tokens,
non_separator_tokens,
dictionary,
synonyms,
ranking_rules,
typo_tolerance,
pagination,
faceting,
embedders,
search_cutoff_ms
make_setting_routes!(
{
route: "/filterable-attributes",
update_verb: put,
value_type: std::collections::BTreeSet<String>,
err_type: meilisearch_types::deserr::DeserrJsonError<
meilisearch_types::error::deserr_codes::InvalidSettingsFilterableAttributes,
>,
attr: filterable_attributes,
camelcase_attr: "filterableAttributes",
analytics: FilterableAttributesAnalytics
},
{
route: "/sortable-attributes",
update_verb: put,
value_type: std::collections::BTreeSet<String>,
err_type: meilisearch_types::deserr::DeserrJsonError<
meilisearch_types::error::deserr_codes::InvalidSettingsSortableAttributes,
>,
attr: sortable_attributes,
camelcase_attr: "sortableAttributes",
analytics: SortableAttributesAnalytics
},
{
route: "/displayed-attributes",
update_verb: put,
value_type: Vec<String>,
err_type: meilisearch_types::deserr::DeserrJsonError<
meilisearch_types::error::deserr_codes::InvalidSettingsDisplayedAttributes,
>,
attr: displayed_attributes,
camelcase_attr: "displayedAttributes",
analytics: DisplayedAttributesAnalytics
},
{
route: "/typo-tolerance",
update_verb: patch,
value_type: meilisearch_types::settings::TypoSettings,
err_type: meilisearch_types::deserr::DeserrJsonError<
meilisearch_types::error::deserr_codes::InvalidSettingsTypoTolerance,
>,
attr: typo_tolerance,
camelcase_attr: "typoTolerance",
analytics: TypoToleranceAnalytics
},
{
route: "/searchable-attributes",
update_verb: put,
value_type: Vec<String>,
err_type: meilisearch_types::deserr::DeserrJsonError<
meilisearch_types::error::deserr_codes::InvalidSettingsSearchableAttributes,
>,
attr: searchable_attributes,
camelcase_attr: "searchableAttributes",
analytics: SearchableAttributesAnalytics
},
{
route: "/stop-words",
update_verb: put,
value_type: std::collections::BTreeSet<String>,
err_type: meilisearch_types::deserr::DeserrJsonError<
meilisearch_types::error::deserr_codes::InvalidSettingsStopWords,
>,
attr: stop_words,
camelcase_attr: "stopWords",
analytics: StopWordsAnalytics
},
{
route: "/non-separator-tokens",
update_verb: put,
value_type: std::collections::BTreeSet<String>,
err_type: meilisearch_types::deserr::DeserrJsonError<
meilisearch_types::error::deserr_codes::InvalidSettingsNonSeparatorTokens,
>,
attr: non_separator_tokens,
camelcase_attr: "nonSeparatorTokens",
analytics: NonSeparatorTokensAnalytics
},
{
route: "/separator-tokens",
update_verb: put,
value_type: std::collections::BTreeSet<String>,
err_type: meilisearch_types::deserr::DeserrJsonError<
meilisearch_types::error::deserr_codes::InvalidSettingsSeparatorTokens,
>,
attr: separator_tokens,
camelcase_attr: "separatorTokens",
analytics: SeparatorTokensAnalytics
},
{
route: "/dictionary",
update_verb: put,
value_type: std::collections::BTreeSet<String>,
err_type: meilisearch_types::deserr::DeserrJsonError<
meilisearch_types::error::deserr_codes::InvalidSettingsDictionary,
>,
attr: dictionary,
camelcase_attr: "dictionary",
analytics: DictionaryAnalytics
},
{
route: "/synonyms",
update_verb: put,
value_type: std::collections::BTreeMap<String, Vec<String>>,
err_type: meilisearch_types::deserr::DeserrJsonError<
meilisearch_types::error::deserr_codes::InvalidSettingsSynonyms,
>,
attr: synonyms,
camelcase_attr: "synonyms",
analytics: SynonymsAnalytics
},
{
route: "/distinct-attribute",
update_verb: put,
value_type: String,
err_type: meilisearch_types::deserr::DeserrJsonError<
meilisearch_types::error::deserr_codes::InvalidSettingsDistinctAttribute,
>,
attr: distinct_attribute,
camelcase_attr: "distinctAttribute",
analytics: DistinctAttributeAnalytics
},
{
route: "/proximity-precision",
update_verb: put,
value_type: meilisearch_types::settings::ProximityPrecisionView,
err_type: meilisearch_types::deserr::DeserrJsonError<
meilisearch_types::error::deserr_codes::InvalidSettingsProximityPrecision,
>,
attr: proximity_precision,
camelcase_attr: "proximityPrecision",
analytics: ProximityPrecisionAnalytics
},
{
route: "/localized-attributes",
update_verb: put,
value_type: Vec<meilisearch_types::locales::LocalizedAttributesRuleView>,
err_type: meilisearch_types::deserr::DeserrJsonError<
meilisearch_types::error::deserr_codes::InvalidSettingsLocalizedAttributes,
>,
attr: localized_attributes,
camelcase_attr: "localizedAttributes",
analytics: LocalesAnalytics
},
{
route: "/ranking-rules",
update_verb: put,
value_type: Vec<meilisearch_types::settings::RankingRuleView>,
err_type: meilisearch_types::deserr::DeserrJsonError<
meilisearch_types::error::deserr_codes::InvalidSettingsRankingRules,
>,
attr: ranking_rules,
camelcase_attr: "rankingRules",
analytics: RankingRulesAnalytics
},
{
route: "/faceting",
update_verb: patch,
value_type: meilisearch_types::settings::FacetingSettings,
err_type: meilisearch_types::deserr::DeserrJsonError<
meilisearch_types::error::deserr_codes::InvalidSettingsFaceting,
>,
attr: faceting,
camelcase_attr: "faceting",
analytics: FacetingAnalytics
},
{
route: "/pagination",
update_verb: patch,
value_type: meilisearch_types::settings::PaginationSettings,
err_type: meilisearch_types::deserr::DeserrJsonError<
meilisearch_types::error::deserr_codes::InvalidSettingsPagination,
>,
attr: pagination,
camelcase_attr: "pagination",
analytics: PaginationAnalytics
},
{
route: "/embedders",
update_verb: patch,
value_type: std::collections::BTreeMap<String, Setting<meilisearch_types::milli::vector::settings::EmbeddingSettings>>,
err_type: meilisearch_types::deserr::DeserrJsonError<
meilisearch_types::error::deserr_codes::InvalidSettingsEmbedders,
>,
attr: embedders,
camelcase_attr: "embedders",
analytics: EmbeddersAnalytics
},
{
route: "/search-cutoff-ms",
update_verb: put,
value_type: u64,
err_type: meilisearch_types::deserr::DeserrJsonError<
meilisearch_types::error::deserr_codes::InvalidSettingsSearchCutoffMs,
>,
attr: search_cutoff_ms,
camelcase_attr: "searchCutoffMs",
analytics: SearchCutoffMsAnalytics
},
{
route: "/facet-search",
update_verb: put,
value_type: bool,
err_type: meilisearch_types::deserr::DeserrJsonError<
meilisearch_types::error::deserr_codes::InvalidSettingsFacetSearch,
>,
attr: facet_search,
camelcase_attr: "facetSearch",
analytics: FacetSearchAnalytics
},
{
route: "/prefix-search",
update_verb: put,
value_type: meilisearch_types::settings::PrefixSearchSettings,
err_type: meilisearch_types::deserr::DeserrJsonError<
meilisearch_types::error::deserr_codes::InvalidSettingsPrefixSearch,
>,
attr: prefix_search,
camelcase_attr: "prefixSearch",
analytics: PrefixSearchAnalytics
},
);
pub async fn update_all(

View File

@@ -103,8 +103,13 @@ async fn similar(
let index = index_scheduler.index(&index_uid)?;
let (embedder_name, embedder, quantized) =
SearchKind::embedder(&index_scheduler, &index, &query.embedder, None)?;
let (embedder_name, embedder, quantized) = SearchKind::embedder(
&index_scheduler,
index_uid.to_string(),
&index,
&query.embedder,
None,
)?;
tokio::task::spawn_blocking(move || {
perform_similar(

View File

@@ -125,14 +125,28 @@ pub async fn multi_search_with_post(
})
.with_index(query_index)?;
let search_kind =
search_kind(&query, index_scheduler.get_ref(), &index, features)
.with_index(query_index)?;
let index_uid_str = index_uid.to_string();
let search_kind = search_kind(
&query,
index_scheduler.get_ref(),
index_uid_str.clone(),
&index,
features,
)
.with_index(query_index)?;
let retrieve_vector = RetrieveVectors::new(query.retrieve_vectors, features)
.with_index(query_index)?;
let search_result = tokio::task::spawn_blocking(move || {
perform_search(&index, query, search_kind, retrieve_vector, features)
perform_search(
index_uid_str.clone(),
&index,
query,
search_kind,
retrieve_vector,
features,
)
})
.await
.with_index(query_index)?;

View File

@@ -560,7 +560,8 @@ pub fn perform_federated_search(
// use an immediately invoked lambda to capture the result without returning from the function
let res: Result<(), ResponseError> = (|| {
let search_kind = search_kind(&query, index_scheduler, &index, features)?;
let search_kind =
search_kind(&query, index_scheduler, index_uid.to_string(), &index, features)?;
let canonicalization_kind = match (&search_kind, &query.q) {
(SearchKind::SemanticOnly { .. }, _) => {
@@ -636,7 +637,8 @@ pub fn perform_federated_search(
search.offset(0);
search.limit(required_hit_count);
let (result, _semantic_hit_count) = super::search_from_kind(search_kind, search)?;
let (result, _semantic_hit_count) =
super::search_from_kind(index_uid.to_string(), search_kind, search)?;
let format = AttributesFormat {
attributes_to_retrieve: query.attributes_to_retrieve,
retrieve_vectors,
@@ -670,7 +672,10 @@ pub fn perform_federated_search(
let formatter_builder = HitMaker::formatter_builder(matching_words, tokenizer);
let hit_maker = HitMaker::new(&index, &rtxn, format, formatter_builder)?;
let hit_maker =
HitMaker::new(&index, &rtxn, format, formatter_builder).map_err(|e| {
MeilisearchHttpError::from_milli(e, Some(index_uid.to_string()))
})?;
results_by_query.push(SearchResultByQuery {
federation_options,

View File

@@ -19,7 +19,9 @@ use meilisearch_types::locales::Locale;
use meilisearch_types::milli::score_details::{ScoreDetails, ScoringStrategy};
use meilisearch_types::milli::vector::parsed_vectors::ExplicitVectors;
use meilisearch_types::milli::vector::Embedder;
use meilisearch_types::milli::{FacetValueHit, OrderBy, SearchForFacetValues, TimeBudget};
use meilisearch_types::milli::{
FacetValueHit, InternalError, OrderBy, SearchForFacetValues, TimeBudget,
};
use meilisearch_types::settings::DEFAULT_PAGINATION_MAX_TOTAL_HITS;
use meilisearch_types::{milli, Document};
use milli::tokenizer::{Language, TokenizerBuilder};
@@ -281,35 +283,38 @@ pub enum SearchKind {
impl SearchKind {
pub(crate) fn semantic(
index_scheduler: &index_scheduler::IndexScheduler,
index_uid: String,
index: &Index,
embedder_name: &str,
vector_len: Option<usize>,
) -> Result<Self, ResponseError> {
let (embedder_name, embedder, quantized) =
Self::embedder(index_scheduler, index, embedder_name, vector_len)?;
Self::embedder(index_scheduler, index_uid, index, embedder_name, vector_len)?;
Ok(Self::SemanticOnly { embedder_name, embedder, quantized })
}
pub(crate) fn hybrid(
index_scheduler: &index_scheduler::IndexScheduler,
index_uid: String,
index: &Index,
embedder_name: &str,
semantic_ratio: f32,
vector_len: Option<usize>,
) -> Result<Self, ResponseError> {
let (embedder_name, embedder, quantized) =
Self::embedder(index_scheduler, index, embedder_name, vector_len)?;
Self::embedder(index_scheduler, index_uid, index, embedder_name, vector_len)?;
Ok(Self::Hybrid { embedder_name, embedder, quantized, semantic_ratio })
}
pub(crate) fn embedder(
index_scheduler: &index_scheduler::IndexScheduler,
index_uid: String,
index: &Index,
embedder_name: &str,
vector_len: Option<usize>,
) -> Result<(String, Arc<Embedder>, bool), ResponseError> {
let embedder_configs = index.embedding_configs(&index.read_txn()?)?;
let embedders = index_scheduler.embedders(embedder_configs)?;
let embedders = index_scheduler.embedders(index_uid, embedder_configs)?;
let (embedder, _, quantized) = embedders
.get(embedder_name)
@@ -890,6 +895,7 @@ fn prepare_search<'t>(
}
pub fn perform_search(
index_uid: String,
index: &Index,
query: SearchQuery,
search_kind: SearchKind,
@@ -916,7 +922,7 @@ pub fn perform_search(
used_negative_operator,
},
semantic_hit_count,
) = search_from_kind(search_kind, search)?;
) = search_from_kind(index_uid, search_kind, search)?;
let SearchQuery {
q,
@@ -1069,17 +1075,27 @@ fn compute_facet_distribution_stats<S: AsRef<str>>(
}
pub fn search_from_kind(
index_uid: String,
search_kind: SearchKind,
search: milli::Search<'_>,
) -> Result<(milli::SearchResult, Option<u32>), MeilisearchHttpError> {
let (milli_result, semantic_hit_count) = match &search_kind {
SearchKind::KeywordOnly => (search.execute()?, None),
SearchKind::KeywordOnly => {
let results = search
.execute()
.map_err(|e| MeilisearchHttpError::from_milli(e, Some(index_uid.to_string())))?;
(results, None)
}
SearchKind::SemanticOnly { .. } => {
let results = search.execute()?;
let results = search
.execute()
.map_err(|e| MeilisearchHttpError::from_milli(e, Some(index_uid.to_string())))?;
let semantic_hit_count = results.document_scores.len() as u32;
(results, Some(semantic_hit_count))
}
SearchKind::Hybrid { semantic_ratio, .. } => search.execute_hybrid(*semantic_ratio)?,
SearchKind::Hybrid { semantic_ratio, .. } => search
.execute_hybrid(*semantic_ratio)
.map_err(|e| MeilisearchHttpError::from_milli(e, Some(index_uid)))?,
};
Ok((milli_result, semantic_hit_count))
}
@@ -1181,7 +1197,7 @@ impl<'a> HitMaker<'a> {
rtxn: &'a RoTxn<'a>,
format: AttributesFormat,
mut formatter_builder: MatcherBuilder<'a>,
) -> Result<Self, MeilisearchHttpError> {
) -> milli::Result<Self> {
formatter_builder.crop_marker(format.crop_marker);
formatter_builder.highlight_prefix(format.highlight_pre_tag);
formatter_builder.highlight_suffix(format.highlight_post_tag);
@@ -1276,11 +1292,7 @@ impl<'a> HitMaker<'a> {
})
}
pub fn make_hit(
&self,
id: u32,
score: &[ScoreDetails],
) -> Result<SearchHit, MeilisearchHttpError> {
pub fn make_hit(&self, id: u32, score: &[ScoreDetails]) -> milli::Result<SearchHit> {
let (_, obkv) =
self.index.iter_documents(self.rtxn, std::iter::once(id))?.next().unwrap()?;
@@ -1323,7 +1335,10 @@ impl<'a> HitMaker<'a> {
.is_some_and(|conf| conf.user_provided.contains(id));
let embeddings =
ExplicitVectors { embeddings: Some(vector.into()), regenerate: !user_provided };
vectors.insert(name, serde_json::to_value(embeddings)?);
vectors.insert(
name,
serde_json::to_value(embeddings).map_err(InternalError::SerdeJson)?,
);
}
document.insert("_vectors".into(), vectors.into());
}
@@ -1369,7 +1384,7 @@ fn make_hits<'a>(
format: AttributesFormat,
matching_words: milli::MatchingWords,
documents_ids_scores: impl Iterator<Item = (u32, &'a Vec<ScoreDetails>)> + 'a,
) -> Result<Vec<SearchHit>, MeilisearchHttpError> {
) -> milli::Result<Vec<SearchHit>> {
let mut documents = Vec::new();
let dictionary = index.dictionary(rtxn)?;
@@ -1407,6 +1422,13 @@ pub fn perform_facet_search(
None => TimeBudget::default(),
};
if !index.facet_search(&rtxn)? {
return Err(ResponseError::from_msg(
"The facet search is disabled for this index".to_string(),
Code::FacetSearchDisabled,
));
}
// In the faceted search context, we want to use the intersection between the locales provided by the user
// and the locales of the facet string.
// If the facet string is not localized, we **ignore** the locales provided by the user because the facet data has no locale.
@@ -1690,12 +1712,12 @@ fn make_document(
displayed_attributes: &BTreeSet<FieldId>,
field_ids_map: &FieldsIdsMap,
obkv: &obkv::KvReaderU16,
) -> Result<Document, MeilisearchHttpError> {
) -> milli::Result<Document> {
let mut document = serde_json::Map::new();
// recreate the original json
for (key, value) in obkv.iter() {
let value = serde_json::from_slice(value)?;
let value = serde_json::from_slice(value).map_err(InternalError::SerdeJson)?;
let key = field_ids_map.name(key).expect("Missing field name").to_string();
document.insert(key, value);
@@ -1720,7 +1742,7 @@ fn format_fields(
displayable_ids: &BTreeSet<FieldId>,
locales: Option<&[Language]>,
localized_attributes: &[LocalizedAttributesRule],
) -> Result<(Option<MatchesPosition>, Document), MeilisearchHttpError> {
) -> milli::Result<(Option<MatchesPosition>, Document)> {
let mut matches_position = compute_matches.then(BTreeMap::new);
let mut document = document.clone();
@@ -1898,7 +1920,7 @@ fn parse_filter_array(arr: &[Value]) -> Result<Option<Filter>, MeilisearchHttpEr
}
}
Ok(Filter::from_array(ands)?)
Filter::from_array(ands).map_err(|e| MeilisearchHttpError::from_milli(e, None))
}
#[cfg(test)]

View File

@@ -224,7 +224,7 @@ async fn list_batches_status_and_type_filtered() {
}
#[actix_rt::test]
async fn get_batch_filter_error() {
async fn list_batch_filter_error() {
let server = Server::new().await;
let (response, code) = server.batches_filter("lol=pied").await;

View File

@@ -52,6 +52,25 @@ impl Value {
}
self
}
/// Return `true` if the `status` field is set to `failed`.
/// Panic if the `status` field doesn't exists.
#[track_caller]
pub fn is_fail(&self) -> bool {
if !self["status"].is_string() {
panic!("Called `is_fail` on {}", serde_json::to_string_pretty(&self.0).unwrap());
}
self["status"] == serde_json::Value::String(String::from("failed"))
}
// Panic if the json doesn't contain the `status` field set to "succeeded"
#[track_caller]
pub fn failed(&self) -> &Self {
if !self.is_fail() {
panic!("Called failed on {}", serde_json::to_string_pretty(&self.0).unwrap());
}
self
}
}
impl From<serde_json::Value> for Value {

View File

@@ -1681,7 +1681,7 @@ async fn add_documents_invalid_geo_field() {
"indexedDocuments": 0
},
"error": {
"message": "The `_geo` field in the document with the id: `\"11\"` is not an object. Was expecting an object with the `_geo.lat` and `_geo.lng` fields but instead got `\"foobar\"`.",
"message": "Index `test`: The `_geo` field in the document with the id: `\"11\"` is not an object. Was expecting an object with the `_geo.lat` and `_geo.lng` fields but instead got `\"foobar\"`.",
"code": "invalid_document_geo_field",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_document_geo_field"
@@ -1719,7 +1719,7 @@ async fn add_documents_invalid_geo_field() {
"indexedDocuments": 0
},
"error": {
"message": "Could not find latitude nor longitude in the document with the id: `\"11\"`. Was expecting `_geo.lat` and `_geo.lng` fields.",
"message": "Index `test`: Could not find latitude nor longitude in the document with the id: `\"11\"`. Was expecting `_geo.lat` and `_geo.lng` fields.",
"code": "invalid_document_geo_field",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_document_geo_field"
@@ -1757,7 +1757,7 @@ async fn add_documents_invalid_geo_field() {
"indexedDocuments": 0
},
"error": {
"message": "Could not find latitude nor longitude in the document with the id: `\"11\"`. Was expecting `_geo.lat` and `_geo.lng` fields.",
"message": "Index `test`: Could not find latitude nor longitude in the document with the id: `\"11\"`. Was expecting `_geo.lat` and `_geo.lng` fields.",
"code": "invalid_document_geo_field",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_document_geo_field"
@@ -1795,7 +1795,7 @@ async fn add_documents_invalid_geo_field() {
"indexedDocuments": 0
},
"error": {
"message": "Could not find longitude in the document with the id: `\"11\"`. Was expecting a `_geo.lng` field.",
"message": "Index `test`: Could not find longitude in the document with the id: `\"11\"`. Was expecting a `_geo.lng` field.",
"code": "invalid_document_geo_field",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_document_geo_field"
@@ -1833,7 +1833,7 @@ async fn add_documents_invalid_geo_field() {
"indexedDocuments": 0
},
"error": {
"message": "Could not find latitude in the document with the id: `\"11\"`. Was expecting a `_geo.lat` field.",
"message": "Index `test`: Could not find latitude in the document with the id: `\"11\"`. Was expecting a `_geo.lat` field.",
"code": "invalid_document_geo_field",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_document_geo_field"
@@ -1871,7 +1871,7 @@ async fn add_documents_invalid_geo_field() {
"indexedDocuments": 0
},
"error": {
"message": "Could not find longitude in the document with the id: `\"11\"`. Was expecting a `_geo.lng` field.",
"message": "Index `test`: Could not find longitude in the document with the id: `\"11\"`. Was expecting a `_geo.lng` field.",
"code": "invalid_document_geo_field",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_document_geo_field"
@@ -1909,7 +1909,7 @@ async fn add_documents_invalid_geo_field() {
"indexedDocuments": 0
},
"error": {
"message": "Could not find latitude in the document with the id: `\"11\"`. Was expecting a `_geo.lat` field.",
"message": "Index `test`: Could not find latitude in the document with the id: `\"11\"`. Was expecting a `_geo.lat` field.",
"code": "invalid_document_geo_field",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_document_geo_field"
@@ -1947,7 +1947,7 @@ async fn add_documents_invalid_geo_field() {
"indexedDocuments": 0
},
"error": {
"message": "Could not parse latitude nor longitude in the document with the id: `\"11\"`. Was expecting finite numbers but instead got `false` and `true`.",
"message": "Index `test`: Could not parse latitude nor longitude in the document with the id: `\"11\"`. Was expecting finite numbers but instead got `false` and `true`.",
"code": "invalid_document_geo_field",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_document_geo_field"
@@ -1985,7 +1985,7 @@ async fn add_documents_invalid_geo_field() {
"indexedDocuments": 0
},
"error": {
"message": "Could not find longitude in the document with the id: `\"11\"`. Was expecting a `_geo.lng` field.",
"message": "Index `test`: Could not find longitude in the document with the id: `\"11\"`. Was expecting a `_geo.lng` field.",
"code": "invalid_document_geo_field",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_document_geo_field"
@@ -2023,7 +2023,7 @@ async fn add_documents_invalid_geo_field() {
"indexedDocuments": 0
},
"error": {
"message": "Could not find latitude in the document with the id: `\"11\"`. Was expecting a `_geo.lat` field.",
"message": "Index `test`: Could not find latitude in the document with the id: `\"11\"`. Was expecting a `_geo.lat` field.",
"code": "invalid_document_geo_field",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_document_geo_field"
@@ -2061,7 +2061,7 @@ async fn add_documents_invalid_geo_field() {
"indexedDocuments": 0
},
"error": {
"message": "Could not parse latitude nor longitude in the document with the id: `\"11\"`. Was expecting finite numbers but instead got `\"doggo\"` and `\"doggo\"`.",
"message": "Index `test`: Could not parse latitude nor longitude in the document with the id: `\"11\"`. Was expecting finite numbers but instead got `\"doggo\"` and `\"doggo\"`.",
"code": "invalid_document_geo_field",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_document_geo_field"
@@ -2099,7 +2099,7 @@ async fn add_documents_invalid_geo_field() {
"indexedDocuments": 0
},
"error": {
"message": "The `_geo` field in the document with the id: `\"11\"` contains the following unexpected fields: `{\"doggo\":\"are the best\"}`.",
"message": "Index `test`: The `_geo` field in the document with the id: `\"11\"` contains the following unexpected fields: `{\"doggo\":\"are the best\"}`.",
"code": "invalid_document_geo_field",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_document_geo_field"
@@ -2138,7 +2138,7 @@ async fn add_documents_invalid_geo_field() {
"indexedDocuments": 0
},
"error": {
"message": "Could not parse longitude in the document with the id: `\"12\"`. Was expecting a finite number but instead got `null`.",
"message": "Index `test`: Could not parse longitude in the document with the id: `\"12\"`. Was expecting a finite number but instead got `null`.",
"code": "invalid_document_geo_field",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_document_geo_field"
@@ -2175,7 +2175,7 @@ async fn add_documents_invalid_geo_field() {
"indexedDocuments": 0
},
"error": {
"message": "Could not parse latitude in the document with the id: `\"12\"`. Was expecting a finite number but instead got `null`.",
"message": "Index `test`: Could not parse latitude in the document with the id: `\"12\"`. Was expecting a finite number but instead got `null`.",
"code": "invalid_document_geo_field",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_document_geo_field"
@@ -2212,7 +2212,7 @@ async fn add_documents_invalid_geo_field() {
"indexedDocuments": 0
},
"error": {
"message": "Could not parse latitude nor longitude in the document with the id: `\"13\"`. Was expecting finite numbers but instead got `null` and `null`.",
"message": "Index `test`: Could not parse latitude nor longitude in the document with the id: `\"13\"`. Was expecting finite numbers but instead got `null` and `null`.",
"code": "invalid_document_geo_field",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_document_geo_field"
@@ -2279,7 +2279,7 @@ async fn add_invalid_geo_and_then_settings() {
]
},
"error": {
"message": "Could not parse latitude in the document with the id: `\"11\"`. Was expecting a finite number but instead got `null`.",
"message": "Index `test`: Could not parse latitude in the document with the id: `\"11\"`. Was expecting a finite number but instead got `null`.",
"code": "invalid_document_geo_field",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_document_geo_field"

View File

@@ -604,7 +604,7 @@ async fn delete_document_by_filter() {
"originalFilter": "\"doggo = bernese\""
},
"error": {
"message": "Attribute `doggo` is not filterable. This index does not have configured filterable attributes.\n1:6 doggo = bernese",
"message": "Index `EMPTY_INDEX`: Attribute `doggo` is not filterable. This index does not have configured filterable attributes.\n1:6 doggo = bernese",
"code": "invalid_document_filter",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_document_filter"
@@ -636,7 +636,7 @@ async fn delete_document_by_filter() {
"originalFilter": "\"catto = jorts\""
},
"error": {
"message": "Attribute `catto` is not filterable. Available filterable attributes are: `id`, `title`.\n1:6 catto = jorts",
"message": "Index `SHARED_DOCUMENTS`: Attribute `catto` is not filterable. Available filterable attributes are: `id`, `title`.\n1:6 catto = jorts",
"code": "invalid_document_filter",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_document_filter"

View File

@@ -95,7 +95,7 @@ async fn error_update_existing_primary_key() {
let response = index.wait_task(2).await;
let expected_response = json!({
"message": "Index already has a primary key: `id`.",
"message": "Index `test`: Index already has a primary key: `id`.",
"code": "index_primary_key_already_exists",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#index_primary_key_already_exists"

View File

@@ -711,7 +711,7 @@ async fn filter_invalid_attribute_array() {
index.wait_task(task.uid()).await;
let expected_response = json!({
"message": "Attribute `many` is not filterable. Available filterable attributes are: `title`.\n1:5 many = Glass",
"message": format!("Index `{}`: Attribute `many` is not filterable. Available filterable attributes are: `title`.\n1:5 many = Glass", index.uid),
"code": "invalid_search_filter",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_search_filter"
@@ -733,7 +733,7 @@ async fn filter_invalid_attribute_string() {
index.wait_task(task.uid()).await;
let expected_response = json!({
"message": "Attribute `many` is not filterable. Available filterable attributes are: `title`.\n1:5 many = Glass",
"message": format!("Index `{}`: Attribute `many` is not filterable. Available filterable attributes are: `title`.\n1:5 many = Glass", index.uid),
"code": "invalid_search_filter",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_search_filter"
@@ -940,7 +940,7 @@ async fn sort_unsortable_attribute() {
index.wait_task(response.uid()).await.succeeded();
let expected_response = json!({
"message": "Attribute `title` is not sortable. Available sortable attributes are: `id`.",
"message": format!("Index `{}`: Attribute `title` is not sortable. Available sortable attributes are: `id`.", index.uid),
"code": "invalid_search_sort",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_search_sort"
@@ -998,7 +998,7 @@ async fn sort_unset_ranking_rule() {
index.wait_task(response.uid()).await.succeeded();
let expected_response = json!({
"message": "You must specify where `sort` is listed in the rankingRules setting to use the sort parameter at search time.",
"message": format!("Index `{}`: You must specify where `sort` is listed in the rankingRules setting to use the sort parameter at search time.", index.uid),
"code": "invalid_search_sort",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_search_sort"
@@ -1024,19 +1024,18 @@ async fn search_on_unknown_field() {
index.update_settings_searchable_attributes(json!(["id", "title"])).await;
index.wait_task(response.uid()).await.succeeded();
let expected_response = json!({
"message": format!("Index `{}`: Attribute `unknown` is not searchable. Available searchable attributes are: `id, title`.", index.uid),
"code": "invalid_search_attributes_to_search_on",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_search_attributes_to_search_on"
});
index
.search(
json!({"q": "Captain Marvel", "attributesToSearchOn": ["unknown"]}),
|response, code| {
snapshot!(code, @"400 Bad Request");
snapshot!(json_string!(response), @r###"
{
"message": "Attribute `unknown` is not searchable. Available searchable attributes are: `id, title`.",
"code": "invalid_search_attributes_to_search_on",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_search_attributes_to_search_on"
}
"###);
assert_eq!(response, expected_response);
assert_eq!(code, 400);
},
)
.await;
@@ -1050,19 +1049,18 @@ async fn search_on_unknown_field_plus_joker() {
index.update_settings_searchable_attributes(json!(["id", "title"])).await;
index.wait_task(response.uid()).await.succeeded();
let expected_response = json!({
"message": format!("Index `{}`: Attribute `unknown` is not searchable. Available searchable attributes are: `id, title`.", index.uid),
"code": "invalid_search_attributes_to_search_on",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_search_attributes_to_search_on"
});
index
.search(
json!({"q": "Captain Marvel", "attributesToSearchOn": ["*", "unknown"]}),
|response, code| {
snapshot!(code, @"400 Bad Request");
snapshot!(json_string!(response), @r###"
{
"message": "Attribute `unknown` is not searchable. Available searchable attributes are: `id, title`.",
"code": "invalid_search_attributes_to_search_on",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_search_attributes_to_search_on"
}
"###);
assert_eq!(response, expected_response);
assert_eq!(code, 400);
},
)
.await;
@@ -1071,15 +1069,8 @@ async fn search_on_unknown_field_plus_joker() {
.search(
json!({"q": "Captain Marvel", "attributesToSearchOn": ["unknown", "*"]}),
|response, code| {
snapshot!(code, @"400 Bad Request");
snapshot!(json_string!(response), @r###"
{
"message": "Attribute `unknown` is not searchable. Available searchable attributes are: `id, title`.",
"code": "invalid_search_attributes_to_search_on",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_search_attributes_to_search_on"
}
"###);
assert_eq!(response, expected_response);
assert_eq!(code, 400);
},
)
.await;
@@ -1092,47 +1083,44 @@ async fn distinct_at_search_time() {
let (task, _) = index.create(None).await;
index.wait_task(task.uid()).await.succeeded();
let expected_response = json!({
"message": format!("Index `{}`: Attribute `doggo.truc` is not filterable and thus, cannot be used as distinct attribute. This index does not have configured filterable attributes.", index.uid),
"code": "invalid_search_distinct",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_search_distinct"
});
let (response, code) =
index.search_post(json!({"page": 0, "hitsPerPage": 2, "distinct": "doggo.truc"})).await;
snapshot!(code, @"400 Bad Request");
snapshot!(response, @r###"
{
"message": "Attribute `doggo.truc` is not filterable and thus, cannot be used as distinct attribute. This index does not have configured filterable attributes.",
"code": "invalid_search_distinct",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_search_distinct"
}
"###);
assert_eq!(response, expected_response);
assert_eq!(code, 400);
let (task, _) = index.update_settings_filterable_attributes(json!(["color", "machin"])).await;
index.wait_task(task.uid()).await;
let expected_response = json!({
"message": format!("Index `{}`: Attribute `doggo.truc` is not filterable and thus, cannot be used as distinct attribute. Available filterable attributes are: `color, machin`.", index.uid),
"code": "invalid_search_distinct",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_search_distinct"
});
let (response, code) =
index.search_post(json!({"page": 0, "hitsPerPage": 2, "distinct": "doggo.truc"})).await;
snapshot!(code, @"400 Bad Request");
snapshot!(response, @r###"
{
"message": "Attribute `doggo.truc` is not filterable and thus, cannot be used as distinct attribute. Available filterable attributes are: `color, machin`.",
"code": "invalid_search_distinct",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_search_distinct"
}
"###);
assert_eq!(response, expected_response);
assert_eq!(code, 400);
let (task, _) = index.update_settings_displayed_attributes(json!(["color"])).await;
index.wait_task(task.uid()).await;
let expected_response = json!({
"message": format!("Index `{}`: Attribute `doggo.truc` is not filterable and thus, cannot be used as distinct attribute. Available filterable attributes are: `color, <..hidden-attributes>`.", index.uid),
"code": "invalid_search_distinct",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_search_distinct"
});
let (response, code) =
index.search_post(json!({"page": 0, "hitsPerPage": 2, "distinct": "doggo.truc"})).await;
snapshot!(code, @"400 Bad Request");
snapshot!(response, @r###"
{
"message": "Attribute `doggo.truc` is not filterable and thus, cannot be used as distinct attribute. Available filterable attributes are: `color, <..hidden-attributes>`.",
"code": "invalid_search_distinct",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_search_distinct"
}
"###);
assert_eq!(response, expected_response);
assert_eq!(code, 400);
let (response, code) =
index.search_post(json!({"page": 0, "hitsPerPage": 2, "distinct": true})).await;

View File

@@ -221,8 +221,15 @@ async fn add_documents_and_deactivate_facet_search() {
let (response, code) =
index.facet_search(json!({"facetName": "genres", "facetQuery": "a"})).await;
assert_eq!(code, 200, "{}", response);
assert_eq!(dbg!(response)["facetHits"].as_array().unwrap().len(), 0);
assert_eq!(code, 400, "{}", response);
snapshot!(response, @r###"
{
"message": "The facet search is disabled for this index",
"code": "facet_search_disabled",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#facet_search_disabled"
}
"###);
}
#[actix_rt::test]
@@ -245,8 +252,15 @@ async fn deactivate_facet_search_and_add_documents() {
let (response, code) =
index.facet_search(json!({"facetName": "genres", "facetQuery": "a"})).await;
assert_eq!(code, 200, "{}", response);
assert_eq!(dbg!(response)["facetHits"].as_array().unwrap().len(), 0);
assert_eq!(code, 400, "{}", response);
snapshot!(response, @r###"
{
"message": "The facet search is disabled for this index",
"code": "facet_search_disabled",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#facet_search_disabled"
}
"###);
}
#[actix_rt::test]

View File

@@ -1070,7 +1070,7 @@ async fn federation_one_query_error() {
snapshot!(code, @"400 Bad Request");
snapshot!(json_string!(response), @r###"
{
"message": "Inside `.queries[1]`: Attribute `title` is not filterable. This index does not have configured filterable attributes.\n1:6 title = toto",
"message": "Inside `.queries[1]`: Index `nested`: Attribute `title` is not filterable. This index does not have configured filterable attributes.\n1:6 title = toto",
"code": "invalid_search_filter",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_search_filter"
@@ -1102,7 +1102,7 @@ async fn federation_one_query_sort_error() {
snapshot!(code, @"400 Bad Request");
snapshot!(json_string!(response), @r###"
{
"message": "Inside `.queries[1]`: Attribute `doggos` is not sortable. This index does not have configured sortable attributes.",
"message": "Inside `.queries[1]`: Index `nested`: Attribute `doggos` is not sortable. This index does not have configured sortable attributes.",
"code": "invalid_search_sort",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_search_sort"
@@ -1166,7 +1166,7 @@ async fn federation_multiple_query_errors() {
snapshot!(code, @"400 Bad Request");
snapshot!(json_string!(response), @r###"
{
"message": "Inside `.queries[0]`: Attribute `title` is not filterable. This index does not have configured filterable attributes.\n1:6 title = toto",
"message": "Inside `.queries[0]`: Index `test`: Attribute `title` is not filterable. This index does not have configured filterable attributes.\n1:6 title = toto",
"code": "invalid_search_filter",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_search_filter"
@@ -1198,7 +1198,7 @@ async fn federation_multiple_query_sort_errors() {
snapshot!(code, @"400 Bad Request");
snapshot!(json_string!(response), @r###"
{
"message": "Inside `.queries[0]`: Attribute `title` is not sortable. This index does not have configured sortable attributes.",
"message": "Inside `.queries[0]`: Index `test`: Attribute `title` is not sortable. This index does not have configured sortable attributes.",
"code": "invalid_search_sort",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_search_sort"
@@ -1231,7 +1231,7 @@ async fn federation_multiple_query_errors_interleaved() {
snapshot!(code, @"400 Bad Request");
snapshot!(json_string!(response), @r###"
{
"message": "Inside `.queries[1]`: Attribute `doggos` is not filterable. This index does not have configured filterable attributes.\n1:7 doggos IN [intel, kefir]",
"message": "Inside `.queries[1]`: Index `nested`: Attribute `doggos` is not filterable. This index does not have configured filterable attributes.\n1:7 doggos IN [intel, kefir]",
"code": "invalid_search_filter",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_search_filter"
@@ -1264,7 +1264,7 @@ async fn federation_multiple_query_sort_errors_interleaved() {
snapshot!(code, @"400 Bad Request");
snapshot!(json_string!(response), @r###"
{
"message": "Inside `.queries[1]`: Attribute `doggos` is not sortable. This index does not have configured sortable attributes.",
"message": "Inside `.queries[1]`: Index `nested`: Attribute `doggos` is not sortable. This index does not have configured sortable attributes.",
"code": "invalid_search_sort",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_search_sort"

View File

@@ -1,44 +1,185 @@
use std::collections::HashMap;
use once_cell::sync::Lazy;
use crate::common::{Server, Value};
use crate::common::Server;
use crate::json;
static DEFAULT_SETTINGS_VALUES: Lazy<HashMap<&'static str, Value>> = Lazy::new(|| {
let mut map = HashMap::new();
map.insert("displayed_attributes", json!(["*"]));
map.insert("searchable_attributes", json!(["*"]));
map.insert("localized_attributes", json!(null));
map.insert("filterable_attributes", json!([]));
map.insert("distinct_attribute", json!(null));
map.insert(
"ranking_rules",
json!(["words", "typo", "proximity", "attribute", "sort", "exactness"]),
);
map.insert("stop_words", json!([]));
map.insert("non_separator_tokens", json!([]));
map.insert("separator_tokens", json!([]));
map.insert("dictionary", json!([]));
map.insert("synonyms", json!({}));
map.insert(
"faceting",
json!({
"maxValuesPerFacet": json!(100),
"sortFacetValuesBy": {
"*": "alpha"
macro_rules! test_setting_routes {
($({setting: $setting:ident, update_verb: $update_verb:ident, default_value: $default_value:tt},) *) => {
$(
mod $setting {
use crate::common::Server;
#[actix_rt::test]
async fn get_unexisting_index() {
let server = Server::new().await;
let url = format!("/indexes/test/settings/{}",
stringify!($setting)
.chars()
.map(|c| if c == '_' { '-' } else { c })
.collect::<String>());
let (_response, code) = server.service.get(url).await;
assert_eq!(code, 404);
}
#[actix_rt::test]
async fn update_unexisting_index() {
let server = Server::new().await;
let url = format!("/indexes/test/settings/{}",
stringify!($setting)
.chars()
.map(|c| if c == '_' { '-' } else { c })
.collect::<String>());
let (response, code) = server.service.$update_verb(url, serde_json::Value::Null.into()).await;
assert_eq!(code, 202, "{}", response);
server.index("").wait_task(0).await;
let (response, code) = server.index("test").get().await;
assert_eq!(code, 200, "{}", response);
}
#[actix_rt::test]
async fn delete_unexisting_index() {
let server = Server::new().await;
let url = format!("/indexes/test/settings/{}",
stringify!($setting)
.chars()
.map(|c| if c == '_' { '-' } else { c })
.collect::<String>());
let (_, code) = server.service.delete(url).await;
assert_eq!(code, 202);
let response = server.index("").wait_task(0).await;
assert_eq!(response["status"], "failed");
}
#[actix_rt::test]
async fn get_default() {
let server = Server::new().await;
let index = server.index("test");
let (response, code) = index.create(None).await;
assert_eq!(code, 202, "{}", response);
index.wait_task(0).await;
let url = format!("/indexes/test/settings/{}",
stringify!($setting)
.chars()
.map(|c| if c == '_' { '-' } else { c })
.collect::<String>());
let (response, code) = server.service.get(url).await;
assert_eq!(code, 200, "{}", response);
let expected = crate::json!($default_value);
assert_eq!(expected, response);
}
}
}),
);
map.insert(
"pagination",
json!({
"maxTotalHits": json!(1000),
}),
);
map.insert("search_cutoff_ms", json!(null));
map
});
)*
#[actix_rt::test]
async fn all_setting_tested() {
let expected = std::collections::BTreeSet::from_iter(meilisearch::routes::indexes::settings::ALL_SETTINGS_NAMES.iter());
let tested = std::collections::BTreeSet::from_iter([$(stringify!($setting)),*].iter());
let diff: Vec<_> = expected.difference(&tested).collect();
assert!(diff.is_empty(), "Not all settings were tested, please add the following settings to the `test_setting_routes!` macro: {:?}", diff);
}
};
}
test_setting_routes!(
{
setting: filterable_attributes,
update_verb: put,
default_value: []
},
{
setting: displayed_attributes,
update_verb: put,
default_value: ["*"]
},
{
setting: localized_attributes,
update_verb: put,
default_value: null
},
{
setting: searchable_attributes,
update_verb: put,
default_value: ["*"]
},
{
setting: distinct_attribute,
update_verb: put,
default_value: null
},
{
setting: stop_words,
update_verb: put,
default_value: []
},
{
setting: separator_tokens,
update_verb: put,
default_value: []
},
{
setting: non_separator_tokens,
update_verb: put,
default_value: []
},
{
setting: dictionary,
update_verb: put,
default_value: []
},
{
setting: ranking_rules,
update_verb: put,
default_value: ["words", "typo", "proximity", "attribute", "sort", "exactness"]
},
{
setting: synonyms,
update_verb: put,
default_value: {}
},
{
setting: pagination,
update_verb: patch,
default_value: {"maxTotalHits": 1000}
},
{
setting: faceting,
update_verb: patch,
default_value: {"maxValuesPerFacet": 100, "sortFacetValuesBy": {"*": "alpha"}}
},
{
setting: search_cutoff_ms,
update_verb: put,
default_value: null
},
{
setting: embedders,
update_verb: patch,
default_value: null
},
{
setting: facet_search,
update_verb: put,
default_value: true
},
{
setting: prefix_search,
update_verb: put,
default_value: "indexingTime"
},
{
setting: proximity_precision,
update_verb: put,
default_value: "byWord"
},
{
setting: sortable_attributes,
update_verb: put,
default_value: []
},
{
setting: typo_tolerance,
update_verb: patch,
default_value: {"enabled": true, "minWordSizeForTypos": {"oneTypo": 5, "twoTypos": 9}, "disableOnWords": [], "disableOnAttributes": []}
},
);
#[actix_rt::test]
async fn get_settings_unexisting_index() {
@@ -342,93 +483,6 @@ async fn error_update_setting_unexisting_index_invalid_uid() {
"###);
}
macro_rules! test_setting_routes {
($($setting:ident $write_method:ident), *) => {
$(
mod $setting {
use crate::common::Server;
use super::DEFAULT_SETTINGS_VALUES;
#[actix_rt::test]
async fn get_unexisting_index() {
let server = Server::new().await;
let url = format!("/indexes/test/settings/{}",
stringify!($setting)
.chars()
.map(|c| if c == '_' { '-' } else { c })
.collect::<String>());
let (_response, code) = server.service.get(url).await;
assert_eq!(code, 404);
}
#[actix_rt::test]
async fn update_unexisting_index() {
let server = Server::new().await;
let url = format!("/indexes/test/settings/{}",
stringify!($setting)
.chars()
.map(|c| if c == '_' { '-' } else { c })
.collect::<String>());
let (response, code) = server.service.$write_method(url, serde_json::Value::Null.into()).await;
assert_eq!(code, 202, "{}", response);
server.index("").wait_task(0).await;
let (response, code) = server.index("test").get().await;
assert_eq!(code, 200, "{}", response);
}
#[actix_rt::test]
async fn delete_unexisting_index() {
let server = Server::new().await;
let url = format!("/indexes/test/settings/{}",
stringify!($setting)
.chars()
.map(|c| if c == '_' { '-' } else { c })
.collect::<String>());
let (_, code) = server.service.delete(url).await;
assert_eq!(code, 202);
let response = server.index("").wait_task(0).await;
assert_eq!(response["status"], "failed");
}
#[actix_rt::test]
async fn get_default() {
let server = Server::new().await;
let index = server.index("test");
let (response, code) = index.create(None).await;
assert_eq!(code, 202, "{}", response);
index.wait_task(0).await;
let url = format!("/indexes/test/settings/{}",
stringify!($setting)
.chars()
.map(|c| if c == '_' { '-' } else { c })
.collect::<String>());
let (response, code) = server.service.get(url).await;
assert_eq!(code, 200, "{}", response);
let expected = DEFAULT_SETTINGS_VALUES.get(stringify!($setting)).unwrap();
assert_eq!(expected, &response);
}
}
)*
};
}
test_setting_routes!(
filterable_attributes put,
displayed_attributes put,
localized_attributes put,
searchable_attributes put,
distinct_attribute put,
stop_words put,
separator_tokens put,
non_separator_tokens put,
dictionary put,
ranking_rules put,
synonyms put,
pagination patch,
faceting patch,
search_cutoff_ms put
);
#[actix_rt::test]
async fn error_set_invalid_ranking_rules() {
let server = Server::new().await;

View File

@@ -129,11 +129,11 @@ async fn perform_on_demand_snapshot() {
index.load_test_set().await;
server.index("doggo").create(Some("bone")).await;
index.wait_task(2).await;
let (task, _) = server.index("doggo").create(Some("bone")).await;
index.wait_task(task.uid()).await.succeeded();
server.index("doggo").create(Some("bone")).await;
index.wait_task(2).await;
let (task, _) = server.index("doggo").create(Some("bone")).await;
index.wait_task(task.uid()).await.failed();
let (task, code) = server.create_snapshot().await;
snapshot!(code, @"202 Accepted");

View File

@@ -448,7 +448,7 @@ async fn test_summarized_delete_documents_by_filter() {
"originalFilter": "\"doggo = bernese\""
},
"error": {
"message": "Attribute `doggo` is not filterable. This index does not have configured filterable attributes.\n1:6 doggo = bernese",
"message": "Index `test`: Attribute `doggo` is not filterable. This index does not have configured filterable attributes.\n1:6 doggo = bernese",
"code": "invalid_document_filter",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_document_filter"

View File

@@ -318,7 +318,7 @@ async fn try_to_disable_binary_quantization() {
}
},
"error": {
"message": "`.embedders.manual.binaryQuantized`: Cannot disable the binary quantization.\n - Note: Binary quantization is a lossy operation that cannot be reverted.\n - Hint: Add a new embedder that is non-quantized and regenerate the vectors.",
"message": "Index `doggo`: `.embedders.manual.binaryQuantized`: Cannot disable the binary quantization.\n - Note: Binary quantization is a lossy operation that cannot be reverted.\n - Hint: Add a new embedder that is non-quantized and regenerate the vectors.",
"code": "invalid_settings_embedders",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_settings_embedders"

View File

@@ -250,7 +250,7 @@ async fn user_provided_embeddings_error() {
"indexedDocuments": 0
},
"error": {
"message": "Bad embedder configuration in the document with id: `0`. Missing field `._vectors.manual.regenerate`\n - note: `._vectors.manual` must be an array of floats, an array of arrays of floats, or an object with field `regenerate`",
"message": "Index `doggo`: Bad embedder configuration in the document with id: `0`. Missing field `._vectors.manual.regenerate`\n - note: `._vectors.manual` must be an array of floats, an array of arrays of floats, or an object with field `regenerate`",
"code": "invalid_vectors_type",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_vectors_type"
@@ -280,7 +280,7 @@ async fn user_provided_embeddings_error() {
"indexedDocuments": 0
},
"error": {
"message": "Bad embedder configuration in the document with id: `0`. Missing field `._vectors.manual.regenerate`\n - note: `._vectors.manual` must be an array of floats, an array of arrays of floats, or an object with field `regenerate`",
"message": "Index `doggo`: Bad embedder configuration in the document with id: `0`. Missing field `._vectors.manual.regenerate`\n - note: `._vectors.manual` must be an array of floats, an array of arrays of floats, or an object with field `regenerate`",
"code": "invalid_vectors_type",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_vectors_type"
@@ -311,7 +311,7 @@ async fn user_provided_embeddings_error() {
"indexedDocuments": 0
},
"error": {
"message": "Bad embedder configuration in the document with id: `0`. Could not parse `._vectors.manual.regenerate`: invalid type: string \"yes please\", expected a boolean at line 1 column 26",
"message": "Index `doggo`: Bad embedder configuration in the document with id: `0`. Could not parse `._vectors.manual.regenerate`: invalid type: string \"yes please\", expected a boolean at line 1 column 26",
"code": "invalid_vectors_type",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_vectors_type"
@@ -340,7 +340,7 @@ async fn user_provided_embeddings_error() {
"indexedDocuments": 0
},
"error": {
"message": "Bad embedder configuration in the document with id: `0`. Invalid value type at `._vectors.manual.embeddings`: expected null or an array, but found a boolean: `true`",
"message": "Index `doggo`: Bad embedder configuration in the document with id: `0`. Invalid value type at `._vectors.manual.embeddings`: expected null or an array, but found a boolean: `true`",
"code": "invalid_vectors_type",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_vectors_type"
@@ -369,7 +369,7 @@ async fn user_provided_embeddings_error() {
"indexedDocuments": 0
},
"error": {
"message": "Bad embedder configuration in the document with id: `0`. Invalid value type at `._vectors.manual.embeddings[0]`: expected a number or an array, but found a boolean: `true`",
"message": "Index `doggo`: Bad embedder configuration in the document with id: `0`. Invalid value type at `._vectors.manual.embeddings[0]`: expected a number or an array, but found a boolean: `true`",
"code": "invalid_vectors_type",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_vectors_type"
@@ -398,7 +398,7 @@ async fn user_provided_embeddings_error() {
"indexedDocuments": 0
},
"error": {
"message": "Bad embedder configuration in the document with id: `0`. Invalid value type at `._vectors.manual.embeddings[0][0]`: expected a number, but found a boolean: `true`",
"message": "Index `doggo`: Bad embedder configuration in the document with id: `0`. Invalid value type at `._vectors.manual.embeddings[0][0]`: expected a number, but found a boolean: `true`",
"code": "invalid_vectors_type",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_vectors_type"
@@ -440,7 +440,7 @@ async fn user_provided_embeddings_error() {
"indexedDocuments": 0
},
"error": {
"message": "Bad embedder configuration in the document with id: `0`. Invalid value type at `._vectors.manual.embeddings[1]`: expected a number, but found an array: `[0.2,0.3]`",
"message": "Index `doggo`: Bad embedder configuration in the document with id: `0`. Invalid value type at `._vectors.manual.embeddings[1]`: expected a number, but found an array: `[0.2,0.3]`",
"code": "invalid_vectors_type",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_vectors_type"
@@ -469,7 +469,7 @@ async fn user_provided_embeddings_error() {
"indexedDocuments": 0
},
"error": {
"message": "Bad embedder configuration in the document with id: `0`. Invalid value type at `._vectors.manual.embeddings[1]`: expected an array, but found a number: `0.3`",
"message": "Index `doggo`: Bad embedder configuration in the document with id: `0`. Invalid value type at `._vectors.manual.embeddings[1]`: expected an array, but found a number: `0.3`",
"code": "invalid_vectors_type",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_vectors_type"
@@ -498,7 +498,7 @@ async fn user_provided_embeddings_error() {
"indexedDocuments": 0
},
"error": {
"message": "Bad embedder configuration in the document with id: `0`. Invalid value type at `._vectors.manual.embeddings[0][1]`: expected a number, but found a boolean: `true`",
"message": "Index `doggo`: Bad embedder configuration in the document with id: `0`. Invalid value type at `._vectors.manual.embeddings[0][1]`: expected a number, but found a boolean: `true`",
"code": "invalid_vectors_type",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_vectors_type"
@@ -539,7 +539,7 @@ async fn user_provided_vectors_error() {
"indexedDocuments": 0
},
"error": {
"message": "While embedding documents for embedder `manual`: no vectors provided for document `40` and at least 4 other document(s)\n- Note: `manual` has `source: userProvided`, so documents must provide embeddings as an array in `_vectors.manual`.\n- Hint: opt-out for a document with `_vectors.manual: null`",
"message": "Index `doggo`: While embedding documents for embedder `manual`: no vectors provided for document `40` and at least 4 other document(s)\n- Note: `manual` has `source: userProvided`, so documents must provide embeddings as an array in `_vectors.manual`.\n- Hint: opt-out for a document with `_vectors.manual: null`",
"code": "vector_embedding_error",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#vector_embedding_error"
@@ -569,7 +569,7 @@ async fn user_provided_vectors_error() {
"indexedDocuments": 0
},
"error": {
"message": "While embedding documents for embedder `manual`: no vectors provided for document `42`\n- Note: `manual` has `source: userProvided`, so documents must provide embeddings as an array in `_vectors.manual`.\n- Hint: try replacing `_vector` by `_vectors` in 1 document(s).",
"message": "Index `doggo`: While embedding documents for embedder `manual`: no vectors provided for document `42`\n- Note: `manual` has `source: userProvided`, so documents must provide embeddings as an array in `_vectors.manual`.\n- Hint: try replacing `_vector` by `_vectors` in 1 document(s).",
"code": "vector_embedding_error",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#vector_embedding_error"
@@ -599,7 +599,7 @@ async fn user_provided_vectors_error() {
"indexedDocuments": 0
},
"error": {
"message": "While embedding documents for embedder `manual`: no vectors provided for document `42`\n- Note: `manual` has `source: userProvided`, so documents must provide embeddings as an array in `_vectors.manual`.\n- Hint: try replacing `_vectors.manaul` by `_vectors.manual` in 1 document(s).",
"message": "Index `doggo`: While embedding documents for embedder `manual`: no vectors provided for document `42`\n- Note: `manual` has `source: userProvided`, so documents must provide embeddings as an array in `_vectors.manual`.\n- Hint: try replacing `_vectors.manaul` by `_vectors.manual` in 1 document(s).",
"code": "vector_embedding_error",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#vector_embedding_error"

View File

@@ -713,7 +713,7 @@ async fn bad_api_key() {
}
},
"error": {
"message": "While embedding documents for embedder `default`: user error: could not authenticate against OpenAI server\n - server replied with `{\"error\":{\"message\":\"Incorrect API key provided: Bearer doggo. You can find your API key at https://platform.openai.com/account/api-keys.\",\"type\":\"invalid_request_error\",\"param\":null,\"code\":\"invalid_api_key\"}}`\n - Hint: Check the `apiKey` parameter in the embedder configuration, and the `MEILI_OPENAI_API_KEY` and `OPENAI_API_KEY` environment variables",
"message": "Index `doggo`: While embedding documents for embedder `default`: user error: could not authenticate against OpenAI server\n - server replied with `{\"error\":{\"message\":\"Incorrect API key provided: Bearer doggo. You can find your API key at https://platform.openai.com/account/api-keys.\",\"type\":\"invalid_request_error\",\"param\":null,\"code\":\"invalid_api_key\"}}`\n - Hint: Check the `apiKey` parameter in the embedder configuration, and the `MEILI_OPENAI_API_KEY` and `OPENAI_API_KEY` environment variables",
"code": "vector_embedding_error",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#vector_embedding_error"
@@ -757,7 +757,7 @@ async fn bad_api_key() {
}
},
"error": {
"message": "While embedding documents for embedder `default`: user error: could not authenticate against OpenAI server\n - server replied with `{\"error\":{\"message\":\"You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.\",\"type\":\"invalid_request_error\",\"param\":null,\"code\":null}}`\n - Hint: Check the `apiKey` parameter in the embedder configuration, and the `MEILI_OPENAI_API_KEY` and `OPENAI_API_KEY` environment variables",
"message": "Index `doggo`: While embedding documents for embedder `default`: user error: could not authenticate against OpenAI server\n - server replied with `{\"error\":{\"message\":\"You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.\",\"type\":\"invalid_request_error\",\"param\":null,\"code\":null}}`\n - Hint: Check the `apiKey` parameter in the embedder configuration, and the `MEILI_OPENAI_API_KEY` and `OPENAI_API_KEY` environment variables",
"code": "vector_embedding_error",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#vector_embedding_error"

View File

@@ -985,7 +985,7 @@ async fn bad_settings() {
}
},
"error": {
"message": "Error while generating embeddings: runtime error: could not determine model dimensions:\n - test embedding failed with runtime error: error extracting embeddings from the response:\n - in `response`, while extracting a single \"{{embedding}}\", expected `response` to be an array of numbers, but failed to parse server response:\n - invalid type: map, expected a sequence",
"message": "Index `doggo`: Error while generating embeddings: runtime error: could not determine model dimensions:\n - test embedding failed with runtime error: error extracting embeddings from the response:\n - in `response`, while extracting a single \"{{embedding}}\", expected `response` to be an array of numbers, but failed to parse server response:\n - invalid type: map, expected a sequence",
"code": "vector_embedding_error",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#vector_embedding_error"
@@ -1025,7 +1025,7 @@ async fn bad_settings() {
"indexedDocuments": 0
},
"error": {
"message": "While embedding documents for embedder `rest`: runtime error: was expecting embeddings of dimension `2`, got embeddings of dimensions `3`",
"message": "Index `doggo`: While embedding documents for embedder `rest`: runtime error: was expecting embeddings of dimension `2`, got embeddings of dimensions `3`",
"code": "vector_embedding_error",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#vector_embedding_error"
@@ -1178,7 +1178,7 @@ async fn server_returns_bad_request() {
}
},
"error": {
"message": "Error while generating embeddings: runtime error: could not determine model dimensions:\n - test embedding failed with user error: sent a bad request to embedding server\n - Hint: check that the `request` in the embedder configuration matches the remote server's API\n - server replied with `{\"error\":\"Invalid request: invalid type: string \\\"test\\\", expected struct MultipleRequest at line 1 column 6\"}`",
"message": "Index `doggo`: Error while generating embeddings: runtime error: could not determine model dimensions:\n - test embedding failed with user error: sent a bad request to embedding server\n - Hint: check that the `request` in the embedder configuration matches the remote server's API\n - server replied with `{\"error\":\"Invalid request: invalid type: string \\\"test\\\", expected struct MultipleRequest at line 1 column 6\"}`",
"code": "vector_embedding_error",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#vector_embedding_error"
@@ -1247,7 +1247,7 @@ async fn server_returns_bad_request() {
"indexedDocuments": 0
},
"error": {
"message": "While embedding documents for embedder `rest`: user error: sent a bad request to embedding server\n - Hint: check that the `request` in the embedder configuration matches the remote server's API\n - server replied with `{\"error\":\"Invalid request: invalid type: string \\\"name: kefir\\\\n\\\", expected struct MultipleRequest at line 1 column 15\"}`",
"message": "Index `doggo`: While embedding documents for embedder `rest`: user error: sent a bad request to embedding server\n - Hint: check that the `request` in the embedder configuration matches the remote server's API\n - server replied with `{\"error\":\"Invalid request: invalid type: string \\\"name: kefir\\\\n\\\", expected struct MultipleRequest at line 1 column 15\"}`",
"code": "vector_embedding_error",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#vector_embedding_error"
@@ -1306,7 +1306,7 @@ async fn server_returns_bad_response() {
}
},
"error": {
"message": "Error while generating embeddings: runtime error: could not determine model dimensions:\n - test embedding failed with runtime error: error extracting embeddings from the response:\n - in `response`, while extracting the array of \"{{embedding}}\"s, configuration expects `response` to be an array with at least 1 item(s) but server sent an object with 1 field(s)",
"message": "Index `doggo`: Error while generating embeddings: runtime error: could not determine model dimensions:\n - test embedding failed with runtime error: error extracting embeddings from the response:\n - in `response`, while extracting the array of \"{{embedding}}\"s, configuration expects `response` to be an array with at least 1 item(s) but server sent an object with 1 field(s)",
"code": "vector_embedding_error",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#vector_embedding_error"
@@ -1362,7 +1362,7 @@ async fn server_returns_bad_response() {
}
},
"error": {
"message": "Error while generating embeddings: runtime error: could not determine model dimensions:\n - test embedding failed with runtime error: error extracting embeddings from the response:\n - in `response`, while extracting item #0 from the array of \"{{embedding}}\"s, expected `response` to be an array of numbers, but failed to parse server response:\n - invalid type: map, expected a sequence",
"message": "Index `doggo`: Error while generating embeddings: runtime error: could not determine model dimensions:\n - test embedding failed with runtime error: error extracting embeddings from the response:\n - in `response`, while extracting item #0 from the array of \"{{embedding}}\"s, expected `response` to be an array of numbers, but failed to parse server response:\n - invalid type: map, expected a sequence",
"code": "vector_embedding_error",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#vector_embedding_error"
@@ -1414,7 +1414,7 @@ async fn server_returns_bad_response() {
}
},
"error": {
"message": "Error while generating embeddings: runtime error: could not determine model dimensions:\n - test embedding failed with runtime error: error extracting embeddings from the response:\n - in `response.output`, while extracting a single \"{{embedding}}\", expected `output` to be an array of numbers, but failed to parse server response:\n - invalid type: map, expected f32",
"message": "Index `doggo`: Error while generating embeddings: runtime error: could not determine model dimensions:\n - test embedding failed with runtime error: error extracting embeddings from the response:\n - in `response.output`, while extracting a single \"{{embedding}}\", expected `output` to be an array of numbers, but failed to parse server response:\n - invalid type: map, expected f32",
"code": "vector_embedding_error",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#vector_embedding_error"
@@ -1478,7 +1478,7 @@ async fn server_returns_bad_response() {
}
},
"error": {
"message": "Error while generating embeddings: runtime error: could not determine model dimensions:\n - test embedding failed with runtime error: error extracting embeddings from the response:\n - in `response.embedding`, while extracting item #0 from the array of \"{{embedding}}\"s, configuration expects `embedding` to be an object with key `data` but server sent an array of size 3",
"message": "Index `doggo`: Error while generating embeddings: runtime error: could not determine model dimensions:\n - test embedding failed with runtime error: error extracting embeddings from the response:\n - in `response.embedding`, while extracting item #0 from the array of \"{{embedding}}\"s, configuration expects `embedding` to be an object with key `data` but server sent an array of size 3",
"code": "vector_embedding_error",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#vector_embedding_error"
@@ -1542,7 +1542,7 @@ async fn server_returns_bad_response() {
}
},
"error": {
"message": "Error while generating embeddings: runtime error: could not determine model dimensions:\n - test embedding failed with runtime error: error extracting embeddings from the response:\n - in `response.output[0]`, while extracting a single \"{{embedding}}\", configuration expects key \"embeddings\", which is missing in response\n - Hint: item #0 inside `output` has key `embedding`, did you mean `response.output[0].embedding` in embedder configuration?",
"message": "Index `doggo`: Error while generating embeddings: runtime error: could not determine model dimensions:\n - test embedding failed with runtime error: error extracting embeddings from the response:\n - in `response.output[0]`, while extracting a single \"{{embedding}}\", configuration expects key \"embeddings\", which is missing in response\n - Hint: item #0 inside `output` has key `embedding`, did you mean `response.output[0].embedding` in embedder configuration?",
"code": "vector_embedding_error",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#vector_embedding_error"
@@ -1908,7 +1908,7 @@ async fn server_custom_header() {
}
},
"error": {
"message": "Error while generating embeddings: runtime error: could not determine model dimensions:\n - test embedding failed with user error: could not authenticate against embedding server\n - server replied with `{\"error\":\"missing header 'my-nonstandard-auth'\"}`\n - Hint: Check the `apiKey` parameter in the embedder configuration",
"message": "Index `doggo`: Error while generating embeddings: runtime error: could not determine model dimensions:\n - test embedding failed with user error: could not authenticate against embedding server\n - server replied with `{\"error\":\"missing header 'my-nonstandard-auth'\"}`\n - Hint: Check the `apiKey` parameter in the embedder configuration",
"code": "vector_embedding_error",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#vector_embedding_error"
@@ -1951,7 +1951,7 @@ async fn server_custom_header() {
}
},
"error": {
"message": "Error while generating embeddings: runtime error: could not determine model dimensions:\n - test embedding failed with user error: could not authenticate against embedding server\n - server replied with `{\"error\":\"thou shall not pass, Balrog\"}`\n - Hint: Check the `apiKey` parameter in the embedder configuration",
"message": "Index `doggo`: Error while generating embeddings: runtime error: could not determine model dimensions:\n - test embedding failed with user error: could not authenticate against embedding server\n - server replied with `{\"error\":\"thou shall not pass, Balrog\"}`\n - Hint: Check the `apiKey` parameter in the embedder configuration",
"code": "vector_embedding_error",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#vector_embedding_error"
@@ -2099,7 +2099,7 @@ async fn searchable_reindex() {
]
},
"error": {
"message": "While embedding documents for embedder `rest`: error: received unexpected HTTP 404 from embedding server\n - server replied with `{\"error\":\"text not found\",\"text\":\"breed: patou\\n\"}`",
"message": "Index `doggo`: While embedding documents for embedder `rest`: error: received unexpected HTTP 404 from embedding server\n - server replied with `{\"error\":\"text not found\",\"text\":\"breed: patou\\n\"}`",
"code": "vector_embedding_error",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#vector_embedding_error"

View File

@@ -42,7 +42,7 @@ obkv = "0.3.0"
once_cell = "1.19.0"
ordered-float = "4.2.1"
rayon = "1.10.0"
roaring = { version = "0.10.6", features = ["serde"] }
roaring = { version = "0.10.7", features = ["serde"] }
rstar = { version = "0.12.0", features = ["serde"] }
serde = { version = "1.0.204", features = ["derive"] }
serde_json = { version = "1.0.120", features = ["preserve_order", "raw_value"] }
@@ -98,6 +98,8 @@ allocator-api2 = "0.2.18"
rustc-hash = "2.0.0"
uell = "0.1.0"
enum-iterator = "2.1.0"
bbqueue = { git = "https://github.com/meilisearch/bbqueue" }
flume = { version = "0.11.1", default-features = false }
[dev-dependencies]
mimalloc = { version = "0.1.43", default-features = false }

View File

@@ -3,6 +3,7 @@ use std::convert::Infallible;
use std::fmt::Write;
use std::{io, str};
use bstr::BString;
use heed::{Error as HeedError, MdbError};
use rayon::ThreadPoolBuildError;
use rhai::EvalAltResult;
@@ -62,9 +63,9 @@ pub enum InternalError {
#[error(transparent)]
Store(#[from] MdbError),
#[error("Cannot delete {key:?} from database {database_name}: {error}")]
StoreDeletion { database_name: &'static str, key: Vec<u8>, error: heed::Error },
StoreDeletion { database_name: &'static str, key: BString, error: heed::Error },
#[error("Cannot insert {key:?} and value with length {value_length} into database {database_name}: {error}")]
StorePut { database_name: &'static str, key: Vec<u8>, value_length: usize, error: heed::Error },
StorePut { database_name: &'static str, key: BString, value_length: usize, error: heed::Error },
#[error(transparent)]
Utf8(#[from] str::Utf8Error),
#[error("An indexation process was explicitly aborted")]

View File

@@ -97,7 +97,7 @@ impl<'a> heed::BytesEncode<'a> for FacetGroupValueCodec {
fn bytes_encode(value: &'a Self::EItem) -> Result<Cow<'a, [u8]>, BoxedError> {
let mut v = vec![value.size];
CboRoaringBitmapCodec::serialize_into(&value.bitmap, &mut v);
CboRoaringBitmapCodec::serialize_into_vec(&value.bitmap, &mut v);
Ok(Cow::Owned(v))
}
}

View File

@@ -27,18 +27,27 @@ impl CboRoaringBitmapCodec {
}
}
pub fn serialize_into(roaring: &RoaringBitmap, vec: &mut Vec<u8>) {
pub fn serialize_into_vec(roaring: &RoaringBitmap, vec: &mut Vec<u8>) {
Self::serialize_into_writer(roaring, vec).unwrap()
}
pub fn serialize_into_writer<W: io::Write>(
roaring: &RoaringBitmap,
mut writer: W,
) -> io::Result<()> {
if roaring.len() <= THRESHOLD as u64 {
// If the number of items (u32s) to encode is less than or equal to the threshold
// it means that it would weigh the same or less than the RoaringBitmap
// header, so we directly encode them using ByteOrder instead.
for integer in roaring {
vec.write_u32::<NativeEndian>(integer).unwrap();
writer.write_u32::<NativeEndian>(integer)?;
}
} else {
// Otherwise, we use the classic RoaringBitmapCodec that writes a header.
roaring.serialize_into(vec).unwrap();
roaring.serialize_into(writer)?;
}
Ok(())
}
pub fn deserialize_from(mut bytes: &[u8]) -> io::Result<RoaringBitmap> {
@@ -143,7 +152,7 @@ impl CboRoaringBitmapCodec {
return Ok(None);
}
Self::serialize_into(&previous, buffer);
Self::serialize_into_vec(&previous, buffer);
Ok(Some(&buffer[..]))
}
}
@@ -169,7 +178,7 @@ impl heed::BytesEncode<'_> for CboRoaringBitmapCodec {
fn bytes_encode(item: &Self::EItem) -> Result<Cow<'_, [u8]>, BoxedError> {
let mut vec = Vec::with_capacity(Self::serialized_size(item));
Self::serialize_into(item, &mut vec);
Self::serialize_into_vec(item, &mut vec);
Ok(Cow::Owned(vec))
}
}

View File

@@ -1821,6 +1821,7 @@ pub(crate) mod tests {
indexer::index(
wtxn,
&self.inner,
&crate::ThreadPoolNoAbortBuilder::new().build().unwrap(),
indexer_config.grenad_parameters(),
&db_fields_ids_map,
new_fields_ids_map,
@@ -1911,6 +1912,7 @@ pub(crate) mod tests {
indexer::index(
wtxn,
&self.inner,
&crate::ThreadPoolNoAbortBuilder::new().build().unwrap(),
indexer_config.grenad_parameters(),
&db_fields_ids_map,
new_fields_ids_map,
@@ -1991,6 +1993,7 @@ pub(crate) mod tests {
indexer::index(
&mut wtxn,
&index.inner,
&crate::ThreadPoolNoAbortBuilder::new().build().unwrap(),
indexer_config.grenad_parameters(),
&db_fields_ids_map,
new_fields_ids_map,

View File

@@ -1,6 +1,7 @@
#![cfg_attr(all(test, fuzzing), feature(no_coverage))]
#![allow(clippy::type_complexity)]
#[cfg(not(windows))]
#[cfg(test)]
#[global_allocator]
pub static ALLOC: mimalloc::MiMalloc = mimalloc::MiMalloc;

View File

@@ -83,6 +83,7 @@ pub fn setup_search_index_with_criteria(criteria: &[Criterion]) -> Index {
indexer::index(
&mut wtxn,
&index,
&crate::ThreadPoolNoAbortBuilder::new().build().unwrap(),
config.grenad_parameters(),
&db_fields_ids_map,
new_fields_ids_map,

View File

@@ -2155,6 +2155,7 @@ mod tests {
indexer::index(
&mut wtxn,
&index.inner,
&crate::ThreadPoolNoAbortBuilder::new().build().unwrap(),
indexer_config.grenad_parameters(),
&db_fields_ids_map,
new_fields_ids_map,
@@ -2216,6 +2217,7 @@ mod tests {
indexer::index(
&mut wtxn,
&index.inner,
&crate::ThreadPoolNoAbortBuilder::new().build().unwrap(),
indexer_config.grenad_parameters(),
&db_fields_ids_map,
new_fields_ids_map,
@@ -2268,6 +2270,7 @@ mod tests {
indexer::index(
&mut wtxn,
&index.inner,
&crate::ThreadPoolNoAbortBuilder::new().build().unwrap(),
indexer_config.grenad_parameters(),
&db_fields_ids_map,
new_fields_ids_map,
@@ -2319,6 +2322,7 @@ mod tests {
indexer::index(
&mut wtxn,
&index.inner,
&crate::ThreadPoolNoAbortBuilder::new().build().unwrap(),
indexer_config.grenad_parameters(),
&db_fields_ids_map,
new_fields_ids_map,
@@ -2372,6 +2376,7 @@ mod tests {
indexer::index(
&mut wtxn,
&index.inner,
&crate::ThreadPoolNoAbortBuilder::new().build().unwrap(),
indexer_config.grenad_parameters(),
&db_fields_ids_map,
new_fields_ids_map,
@@ -2430,6 +2435,7 @@ mod tests {
indexer::index(
&mut wtxn,
&index.inner,
&crate::ThreadPoolNoAbortBuilder::new().build().unwrap(),
indexer_config.grenad_parameters(),
&db_fields_ids_map,
new_fields_ids_map,
@@ -2481,6 +2487,7 @@ mod tests {
indexer::index(
&mut wtxn,
&index.inner,
&crate::ThreadPoolNoAbortBuilder::new().build().unwrap(),
indexer_config.grenad_parameters(),
&db_fields_ids_map,
new_fields_ids_map,
@@ -2532,6 +2539,7 @@ mod tests {
indexer::index(
&mut wtxn,
&index.inner,
&crate::ThreadPoolNoAbortBuilder::new().build().unwrap(),
indexer_config.grenad_parameters(),
&db_fields_ids_map,
new_fields_ids_map,
@@ -2725,6 +2733,7 @@ mod tests {
indexer::index(
&mut wtxn,
&index.inner,
&crate::ThreadPoolNoAbortBuilder::new().build().unwrap(),
indexer_config.grenad_parameters(),
&db_fields_ids_map,
new_fields_ids_map,
@@ -2783,6 +2792,7 @@ mod tests {
indexer::index(
&mut wtxn,
&index.inner,
&crate::ThreadPoolNoAbortBuilder::new().build().unwrap(),
indexer_config.grenad_parameters(),
&db_fields_ids_map,
new_fields_ids_map,
@@ -2838,6 +2848,7 @@ mod tests {
indexer::index(
&mut wtxn,
&index.inner,
&crate::ThreadPoolNoAbortBuilder::new().build().unwrap(),
indexer_config.grenad_parameters(),
&db_fields_ids_map,
new_fields_ids_map,

File diff suppressed because it is too large Load Diff

View File

@@ -1,7 +1,10 @@
use bumpalo::Bump;
use heed::RoTxn;
use super::document::{DocumentFromDb, DocumentFromVersions, MergedDocument, Versions};
use super::document::{
Document as _, DocumentFromDb, DocumentFromVersions, MergedDocument, Versions,
};
use super::extract::perm_json_p;
use super::vector_document::{
MergedVectorDocument, VectorDocumentFromDb, VectorDocumentFromVersions,
};
@@ -164,6 +167,80 @@ impl<'doc> Update<'doc> {
}
}
/// Returns whether the updated version of the document is different from the current version for the passed subset of fields.
///
/// `true` if at least one top-level-field that is a exactly a member of field or a parent of a member of field changed.
/// Otherwise `false`.
pub fn has_changed_for_fields<'t, Mapper: FieldIdMapper>(
&self,
fields: Option<&[&str]>,
rtxn: &'t RoTxn,
index: &'t Index,
mapper: &'t Mapper,
) -> Result<bool> {
let mut changed = false;
let mut cached_current = None;
let mut updated_selected_field_count = 0;
for entry in self.updated().iter_top_level_fields() {
let (key, updated_value) = entry?;
if perm_json_p::select_field(key, fields, &[]) == perm_json_p::Selection::Skip {
continue;
}
updated_selected_field_count += 1;
let current = match cached_current {
Some(current) => current,
None => self.current(rtxn, index, mapper)?,
};
let current_value = current.top_level_field(key)?;
let Some(current_value) = current_value else {
changed = true;
break;
};
if current_value.get() != updated_value.get() {
changed = true;
break;
}
cached_current = Some(current);
}
if !self.has_deletion {
// no field deletion, so fields that don't appear in `updated` cannot have changed
return Ok(changed);
}
if changed {
return Ok(true);
}
// we saw all updated fields, and set `changed` if any field wasn't in `current`.
// so if there are as many fields in `current` as in `updated`, then nothing changed.
// If there is any more fields in `current`, then they are missing in `updated`.
let has_deleted_fields = {
let current = match cached_current {
Some(current) => current,
None => self.current(rtxn, index, mapper)?,
};
let mut current_selected_field_count = 0;
for entry in current.iter_top_level_fields() {
let (key, _) = entry?;
if perm_json_p::select_field(key, fields, &[]) == perm_json_p::Selection::Skip {
continue;
}
current_selected_field_count += 1;
}
current_selected_field_count != updated_selected_field_count
};
Ok(has_deleted_fields)
}
pub fn updated_vectors(
&self,
doc_alloc: &'doc Bump,

View File

@@ -415,21 +415,21 @@ fn spill_entry_to_sorter(
match deladd {
DelAddRoaringBitmap { del: Some(del), add: None } => {
cbo_buffer.clear();
CboRoaringBitmapCodec::serialize_into(&del, cbo_buffer);
CboRoaringBitmapCodec::serialize_into_vec(&del, cbo_buffer);
value_writer.insert(DelAdd::Deletion, &cbo_buffer)?;
}
DelAddRoaringBitmap { del: None, add: Some(add) } => {
cbo_buffer.clear();
CboRoaringBitmapCodec::serialize_into(&add, cbo_buffer);
CboRoaringBitmapCodec::serialize_into_vec(&add, cbo_buffer);
value_writer.insert(DelAdd::Addition, &cbo_buffer)?;
}
DelAddRoaringBitmap { del: Some(del), add: Some(add) } => {
cbo_buffer.clear();
CboRoaringBitmapCodec::serialize_into(&del, cbo_buffer);
CboRoaringBitmapCodec::serialize_into_vec(&del, cbo_buffer);
value_writer.insert(DelAdd::Deletion, &cbo_buffer)?;
cbo_buffer.clear();
CboRoaringBitmapCodec::serialize_into(&add, cbo_buffer);
CboRoaringBitmapCodec::serialize_into_vec(&add, cbo_buffer);
value_writer.insert(DelAdd::Addition, &cbo_buffer)?;
}
DelAddRoaringBitmap { del: None, add: None } => return Ok(()),
@@ -466,12 +466,13 @@ pub fn transpose_and_freeze_caches<'a, 'extractor>(
Ok(bucket_caches)
}
/// Merges the caches that must be all associated to the same bucket.
/// Merges the caches that must be all associated to the same bucket
/// but make sure to sort the different buckets before performing the merges.
///
/// # Panics
///
/// - If the bucket IDs in these frozen caches are not exactly the same.
pub fn merge_caches<F>(frozen: Vec<FrozenCache>, mut f: F) -> Result<()>
pub fn merge_caches_sorted<F>(frozen: Vec<FrozenCache>, mut f: F) -> Result<()>
where
F: for<'a> FnMut(&'a [u8], DelAddRoaringBitmap) -> Result<()>,
{
@@ -543,12 +544,12 @@ where
// Then manage the content on the HashMap entries that weren't taken (mem::take).
while let Some(mut map) = maps.pop() {
for (key, bbbul) in map.iter_mut() {
// Make sure we don't try to work with entries already managed by the spilled
if bbbul.is_empty() {
continue;
}
// Make sure we don't try to work with entries already managed by the spilled
let mut ordered_entries: Vec<_> =
map.iter_mut().filter(|(_, bbbul)| !bbbul.is_empty()).collect();
ordered_entries.sort_unstable_by_key(|(key, _)| *key);
for (key, bbbul) in ordered_entries {
let mut output = DelAddRoaringBitmap::empty();
output.union_and_clear_bbbul(bbbul);

View File

@@ -12,13 +12,14 @@ use crate::update::new::thread_local::FullySend;
use crate::update::new::DocumentChange;
use crate::vector::EmbeddingConfigs;
use crate::Result;
pub struct DocumentsExtractor<'a> {
document_sender: &'a DocumentsSender<'a>,
pub struct DocumentsExtractor<'a, 'b> {
document_sender: DocumentsSender<'a, 'b>,
embedders: &'a EmbeddingConfigs,
}
impl<'a> DocumentsExtractor<'a> {
pub fn new(document_sender: &'a DocumentsSender<'a>, embedders: &'a EmbeddingConfigs) -> Self {
impl<'a, 'b> DocumentsExtractor<'a, 'b> {
pub fn new(document_sender: DocumentsSender<'a, 'b>, embedders: &'a EmbeddingConfigs) -> Self {
Self { document_sender, embedders }
}
}
@@ -29,7 +30,7 @@ pub struct DocumentExtractorData {
pub field_distribution_delta: HashMap<String, i64>,
}
impl<'a, 'extractor> Extractor<'extractor> for DocumentsExtractor<'a> {
impl<'a, 'b, 'extractor> Extractor<'extractor> for DocumentsExtractor<'a, 'b> {
type Data = FullySend<RefCell<DocumentExtractorData>>;
fn init_data(&self, _extractor_alloc: &'extractor Bump) -> Result<Self::Data> {

View File

@@ -25,14 +25,14 @@ use crate::update::new::DocumentChange;
use crate::update::GrenadParameters;
use crate::{DocumentId, FieldId, Index, Result, MAX_FACET_VALUE_LENGTH};
pub struct FacetedExtractorData<'a> {
pub struct FacetedExtractorData<'a, 'b> {
attributes_to_extract: &'a [&'a str],
sender: &'a FieldIdDocidFacetSender<'a>,
sender: &'a FieldIdDocidFacetSender<'a, 'b>,
grenad_parameters: GrenadParameters,
buckets: usize,
}
impl<'a, 'extractor> Extractor<'extractor> for FacetedExtractorData<'a> {
impl<'a, 'b, 'extractor> Extractor<'extractor> for FacetedExtractorData<'a, 'b> {
type Data = RefCell<BalancedCaches<'extractor>>;
fn init_data(&self, extractor_alloc: &'extractor Bump) -> Result<Self::Data> {
@@ -97,6 +97,15 @@ impl FacetedDocidsExtractor {
},
),
DocumentChange::Update(inner) => {
if !inner.has_changed_for_fields(
Some(attributes_to_extract),
rtxn,
index,
context.db_fields_ids_map,
)? {
return Ok(());
}
extract_document_facets(
attributes_to_extract,
inner.current(rtxn, index, context.db_fields_ids_map)?,
@@ -318,7 +327,7 @@ impl<'doc> DelAddFacetValue<'doc> {
docid: DocumentId,
sender: &FieldIdDocidFacetSender,
doc_alloc: &Bump,
) -> std::result::Result<(), crossbeam_channel::SendError<()>> {
) -> crate::Result<()> {
let mut buffer = bumpalo::collections::Vec::new_in(doc_alloc);
for ((fid, value), deladd) in self.strings {
if let Ok(s) = std::str::from_utf8(&value) {

View File

@@ -1,6 +1,6 @@
use std::cell::RefCell;
use std::fs::File;
use std::io::{self, BufReader, BufWriter, ErrorKind, Read, Write as _};
use std::io::{self, BufReader, BufWriter, ErrorKind, Read, Seek as _, Write as _};
use std::{iter, mem, result};
use bumpalo::Bump;
@@ -97,30 +97,34 @@ pub struct FrozenGeoExtractorData<'extractor> {
impl<'extractor> FrozenGeoExtractorData<'extractor> {
pub fn iter_and_clear_removed(
&mut self,
) -> impl IntoIterator<Item = io::Result<ExtractedGeoPoint>> + '_ {
mem::take(&mut self.removed)
) -> io::Result<impl IntoIterator<Item = io::Result<ExtractedGeoPoint>> + '_> {
Ok(mem::take(&mut self.removed)
.iter()
.copied()
.map(Ok)
.chain(iterator_over_spilled_geopoints(&mut self.spilled_removed))
.chain(iterator_over_spilled_geopoints(&mut self.spilled_removed)?))
}
pub fn iter_and_clear_inserted(
&mut self,
) -> impl IntoIterator<Item = io::Result<ExtractedGeoPoint>> + '_ {
mem::take(&mut self.inserted)
) -> io::Result<impl IntoIterator<Item = io::Result<ExtractedGeoPoint>> + '_> {
Ok(mem::take(&mut self.inserted)
.iter()
.copied()
.map(Ok)
.chain(iterator_over_spilled_geopoints(&mut self.spilled_inserted))
.chain(iterator_over_spilled_geopoints(&mut self.spilled_inserted)?))
}
}
fn iterator_over_spilled_geopoints(
spilled: &mut Option<BufReader<File>>,
) -> impl IntoIterator<Item = io::Result<ExtractedGeoPoint>> + '_ {
) -> io::Result<impl IntoIterator<Item = io::Result<ExtractedGeoPoint>> + '_> {
let mut spilled = spilled.take();
iter::from_fn(move || match &mut spilled {
if let Some(spilled) = &mut spilled {
spilled.rewind()?;
}
Ok(iter::from_fn(move || match &mut spilled {
Some(file) => {
let geopoint_bytes = &mut [0u8; mem::size_of::<ExtractedGeoPoint>()];
match file.read_exact(geopoint_bytes) {
@@ -130,7 +134,7 @@ fn iterator_over_spilled_geopoints(
}
}
None => None,
})
}))
}
impl<'extractor> Extractor<'extractor> for GeoExtractor {
@@ -157,7 +161,9 @@ impl<'extractor> Extractor<'extractor> for GeoExtractor {
let mut data_ref = context.data.borrow_mut_or_yield();
for change in changes {
if max_memory.map_or(false, |mm| context.extractor_alloc.allocated_bytes() >= mm) {
if data_ref.spilled_removed.is_none()
&& max_memory.map_or(false, |mm| context.extractor_alloc.allocated_bytes() >= mm)
{
// We must spill as we allocated too much memory
data_ref.spilled_removed = tempfile::tempfile().map(BufWriter::new).map(Some)?;
data_ref.spilled_inserted = tempfile::tempfile().map(BufWriter::new).map(Some)?;

View File

@@ -6,7 +6,9 @@ mod searchable;
mod vectors;
use bumpalo::Bump;
pub use cache::{merge_caches, transpose_and_freeze_caches, BalancedCaches, DelAddRoaringBitmap};
pub use cache::{
merge_caches_sorted, transpose_and_freeze_caches, BalancedCaches, DelAddRoaringBitmap,
};
pub use documents::*;
pub use faceted::*;
pub use geo::*;

View File

@@ -28,7 +28,7 @@ pub struct WordDocidsBalancedCaches<'extractor> {
exact_word_docids: BalancedCaches<'extractor>,
word_position_docids: BalancedCaches<'extractor>,
fid_word_count_docids: BalancedCaches<'extractor>,
fid_word_count: HashMap<FieldId, (usize, usize)>,
fid_word_count: HashMap<FieldId, (Option<usize>, Option<usize>)>,
current_docid: Option<DocumentId>,
}
@@ -85,8 +85,8 @@ impl<'extractor> WordDocidsBalancedCaches<'extractor> {
self.fid_word_count
.entry(field_id)
.and_modify(|(_current_count, new_count)| *new_count += 1)
.or_insert((0, 1));
.and_modify(|(_current_count, new_count)| *new_count.get_or_insert(0) += 1)
.or_insert((None, Some(1)));
self.current_docid = Some(docid);
Ok(())
@@ -130,8 +130,8 @@ impl<'extractor> WordDocidsBalancedCaches<'extractor> {
self.fid_word_count
.entry(field_id)
.and_modify(|(current_count, _new_count)| *current_count += 1)
.or_insert((1, 0));
.and_modify(|(current_count, _new_count)| *current_count.get_or_insert(0) += 1)
.or_insert((Some(1), None));
self.current_docid = Some(docid);
@@ -141,14 +141,18 @@ impl<'extractor> WordDocidsBalancedCaches<'extractor> {
fn flush_fid_word_count(&mut self, buffer: &mut BumpVec<u8>) -> Result<()> {
for (fid, (current_count, new_count)) in self.fid_word_count.drain() {
if current_count != new_count {
if current_count <= MAX_COUNTED_WORDS {
if let Some(current_count) =
current_count.filter(|current_count| *current_count <= MAX_COUNTED_WORDS)
{
buffer.clear();
buffer.extend_from_slice(&fid.to_be_bytes());
buffer.push(current_count as u8);
self.fid_word_count_docids
.insert_del_u32(buffer, self.current_docid.unwrap())?;
}
if new_count <= MAX_COUNTED_WORDS {
if let Some(new_count) =
new_count.filter(|new_count| *new_count <= MAX_COUNTED_WORDS)
{
buffer.clear();
buffer.extend_from_slice(&fid.to_be_bytes());
buffer.push(new_count as u8);
@@ -351,6 +355,15 @@ impl WordDocidsExtractors {
)?;
}
DocumentChange::Update(inner) => {
if !inner.has_changed_for_fields(
document_tokenizer.attribute_to_extract,
&context.rtxn,
context.index,
context.db_fields_ids_map,
)? {
return Ok(());
}
let mut token_fn = |fname: &str, fid, pos, word: &str| {
cached_sorter.insert_del_u32(
fid,

View File

@@ -70,6 +70,15 @@ impl SearchableExtractor for WordPairProximityDocidsExtractor {
)?;
}
DocumentChange::Update(inner) => {
if !inner.has_changed_for_fields(
document_tokenizer.attribute_to_extract,
rtxn,
index,
context.db_fields_ids_map,
)? {
return Ok(());
}
let document = inner.current(rtxn, index, context.db_fields_ids_map)?;
process_document_tokens(
document,

View File

@@ -18,17 +18,17 @@ use crate::vector::error::{
use crate::vector::{Embedder, Embedding, EmbeddingConfigs};
use crate::{DocumentId, FieldDistribution, InternalError, Result, ThreadPoolNoAbort, UserError};
pub struct EmbeddingExtractor<'a> {
pub struct EmbeddingExtractor<'a, 'b> {
embedders: &'a EmbeddingConfigs,
sender: &'a EmbeddingSender<'a>,
sender: EmbeddingSender<'a, 'b>,
possible_embedding_mistakes: PossibleEmbeddingMistakes,
threads: &'a ThreadPoolNoAbort,
}
impl<'a> EmbeddingExtractor<'a> {
impl<'a, 'b> EmbeddingExtractor<'a, 'b> {
pub fn new(
embedders: &'a EmbeddingConfigs,
sender: &'a EmbeddingSender<'a>,
sender: EmbeddingSender<'a, 'b>,
field_distribution: &'a FieldDistribution,
threads: &'a ThreadPoolNoAbort,
) -> Self {
@@ -43,7 +43,7 @@ pub struct EmbeddingExtractorData<'extractor>(
unsafe impl MostlySend for EmbeddingExtractorData<'_> {}
impl<'a, 'extractor> Extractor<'extractor> for EmbeddingExtractor<'a> {
impl<'a, 'b, 'extractor> Extractor<'extractor> for EmbeddingExtractor<'a, 'b> {
type Data = RefCell<EmbeddingExtractorData<'extractor>>;
fn init_data<'doc>(&'doc self, extractor_alloc: &'extractor Bump) -> crate::Result<Self::Data> {
@@ -259,7 +259,7 @@ impl<'a, 'extractor> Extractor<'extractor> for EmbeddingExtractor<'a> {
// Currently this is the case as:
// 1. BVec are inside of the bumaplo
// 2. All other fields are either trivial (u8) or references.
struct Chunks<'a, 'extractor> {
struct Chunks<'a, 'b, 'extractor> {
texts: BVec<'a, &'a str>,
ids: BVec<'a, DocumentId>,
@@ -270,11 +270,11 @@ struct Chunks<'a, 'extractor> {
possible_embedding_mistakes: &'a PossibleEmbeddingMistakes,
user_provided: &'a RefCell<EmbeddingExtractorData<'extractor>>,
threads: &'a ThreadPoolNoAbort,
sender: &'a EmbeddingSender<'a>,
sender: EmbeddingSender<'a, 'b>,
has_manual_generation: Option<&'a str>,
}
impl<'a, 'extractor> Chunks<'a, 'extractor> {
impl<'a, 'b, 'extractor> Chunks<'a, 'b, 'extractor> {
#[allow(clippy::too_many_arguments)]
pub fn new(
embedder: &'a Embedder,
@@ -284,7 +284,7 @@ impl<'a, 'extractor> Chunks<'a, 'extractor> {
user_provided: &'a RefCell<EmbeddingExtractorData<'extractor>>,
possible_embedding_mistakes: &'a PossibleEmbeddingMistakes,
threads: &'a ThreadPoolNoAbort,
sender: &'a EmbeddingSender<'a>,
sender: EmbeddingSender<'a, 'b>,
doc_alloc: &'a Bump,
) -> Self {
let capacity = embedder.prompt_count_in_chunk_hint() * embedder.chunk_count_hint();
@@ -368,7 +368,7 @@ impl<'a, 'extractor> Chunks<'a, 'extractor> {
possible_embedding_mistakes: &PossibleEmbeddingMistakes,
unused_vectors_distribution: &UnusedVectorsDistributionBump,
threads: &ThreadPoolNoAbort,
sender: &EmbeddingSender<'a>,
sender: EmbeddingSender<'a, 'b>,
has_manual_generation: Option<&'a str>,
) -> Result<()> {
if let Some(external_docid) = has_manual_generation {

View File

@@ -70,7 +70,7 @@ impl<
F: FnOnce(&'extractor Bump) -> Result<T>,
{
let doc_alloc =
doc_allocs.get_or(|| FullySend(Cell::new(Bump::with_capacity(1024 * 1024 * 1024))));
doc_allocs.get_or(|| FullySend(Cell::new(Bump::with_capacity(1024 * 1024))));
let doc_alloc = doc_alloc.0.take();
let fields_ids_map = fields_ids_map_store
.get_or(|| RefCell::new(GlobalFieldsIdsMap::new(new_fields_ids_map)).into());

View File

@@ -62,6 +62,7 @@ mod update_by_function;
pub fn index<'pl, 'indexer, 'index, DC, MSP, SP>(
wtxn: &mut RwTxn,
index: &'index Index,
pool: &ThreadPoolNoAbort,
grenad_parameters: GrenadParameters,
db_fields_ids_map: &'indexer FieldsIdsMap,
new_fields_ids_map: FieldsIdsMap,
@@ -76,9 +77,39 @@ where
MSP: Fn() -> bool + Sync,
SP: Fn(Progress) + Sync,
{
let (extractor_sender, writer_receiver) = extractor_writer_channel(10_000);
let mut bbbuffers = Vec::new();
let finished_extraction = AtomicBool::new(false);
// We reduce the actual memory used to 5%. The reason we do this here and not in Meilisearch
// is because we still use the old indexer for the settings and it is highly impacted by the
// max memory. So we keep the changes here and will remove these changes once we use the new
// indexer to also index settings. Related to #5125 and #5141.
let grenad_parameters = GrenadParameters {
max_memory: grenad_parameters.max_memory.map(|mm| mm * 5 / 100),
..grenad_parameters
};
// We compute and remove the allocated BBQueues buffers capacity from the indexing memory.
let minimum_capacity = 50 * 1024 * 1024 * pool.current_num_threads(); // 50 MiB
let (grenad_parameters, total_bbbuffer_capacity) = grenad_parameters.max_memory.map_or(
(grenad_parameters, 2 * minimum_capacity), // 100 MiB by thread by default
|max_memory| {
// 2% of the indexing memory
let total_bbbuffer_capacity = (max_memory / 100 / 2).max(minimum_capacity);
let new_grenad_parameters = GrenadParameters {
max_memory: Some(
max_memory.saturating_sub(total_bbbuffer_capacity).max(100 * 1024 * 1024),
),
..grenad_parameters
};
(new_grenad_parameters, total_bbbuffer_capacity)
},
);
let (extractor_sender, mut writer_receiver) = pool
.install(|| extractor_writer_bbqueue(&mut bbbuffers, total_bbbuffer_capacity, 1000))
.unwrap();
let metadata_builder = MetadataBuilder::from_index(index, wtxn)?;
let new_fields_ids_map = FieldIdMapWithMetadata::new(new_fields_ids_map, metadata_builder);
let new_fields_ids_map = RwLock::new(new_fields_ids_map);
@@ -96,6 +127,7 @@ where
send_progress,
};
let mut index_embeddings = index.embedding_configs(wtxn)?;
let mut field_distribution = index.field_distribution(wtxn)?;
let mut document_ids = index.documents_ids(wtxn)?;
@@ -107,261 +139,261 @@ where
let field_distribution = &mut field_distribution;
let document_ids = &mut document_ids;
let extractor_handle = Builder::new().name(S("indexer-extractors")).spawn_scoped(s, move || {
let span = tracing::trace_span!(target: "indexing::documents", parent: &indexer_span, "extract");
let _entered = span.enter();
let rtxn = index.read_txn()?;
// document but we need to create a function that collects and compresses documents.
let document_sender = extractor_sender.documents();
let document_extractor = DocumentsExtractor::new(&document_sender, embedders);
let datastore = ThreadLocal::with_capacity(rayon::current_num_threads());
{
let span = tracing::trace_span!(target: "indexing::documents::extract", parent: &indexer_span, "documents");
pool.install(move || {
let span = tracing::trace_span!(target: "indexing::documents", parent: &indexer_span, "extract");
let _entered = span.enter();
extract(document_changes,
&document_extractor,
indexing_context,
&mut extractor_allocs,
&datastore,
Step::ExtractingDocuments,
)?;
}
{
let span = tracing::trace_span!(target: "indexing::documents::merge", parent: &indexer_span, "documents");
let _entered = span.enter();
for document_extractor_data in datastore {
let document_extractor_data = document_extractor_data.0.into_inner();
for (field, delta) in document_extractor_data.field_distribution_delta {
let current = field_distribution.entry(field).or_default();
// adding the delta should never cause a negative result, as we are removing fields that previously existed.
*current = current.saturating_add_signed(delta);
let rtxn = index.read_txn()?;
// document but we need to create a function that collects and compresses documents.
let document_sender = extractor_sender.documents();
let document_extractor = DocumentsExtractor::new(document_sender, embedders);
let datastore = ThreadLocal::with_capacity(rayon::current_num_threads());
{
let span = tracing::trace_span!(target: "indexing::documents::extract", parent: &indexer_span, "documents");
let _entered = span.enter();
extract(
document_changes,
&document_extractor,
indexing_context,
&mut extractor_allocs,
&datastore,
Step::ExtractingDocuments,
)?;
}
{
let span = tracing::trace_span!(target: "indexing::documents::merge", parent: &indexer_span, "documents");
let _entered = span.enter();
for document_extractor_data in datastore {
let document_extractor_data = document_extractor_data.0.into_inner();
for (field, delta) in document_extractor_data.field_distribution_delta {
let current = field_distribution.entry(field).or_default();
// adding the delta should never cause a negative result, as we are removing fields that previously existed.
*current = current.saturating_add_signed(delta);
}
document_extractor_data.docids_delta.apply_to(document_ids);
}
document_extractor_data.docids_delta.apply_to(document_ids);
field_distribution.retain(|_, v| *v != 0);
}
field_distribution.retain(|_, v| *v != 0);
}
let facet_field_ids_delta;
let facet_field_ids_delta;
{
let caches = {
let span = tracing::trace_span!(target: "indexing::documents::extract", parent: &indexer_span, "faceted");
let _entered = span.enter();
{
let caches = {
let span = tracing::trace_span!(target: "indexing::documents::extract", parent: &indexer_span, "faceted");
let _entered = span.enter();
FacetedDocidsExtractor::run_extraction(
grenad_parameters,
document_changes,
indexing_context,
&mut extractor_allocs,
&extractor_sender.field_id_docid_facet_sender(),
Step::ExtractingFacets
)?
};
FacetedDocidsExtractor::run_extraction(
{
let span = tracing::trace_span!(target: "indexing::documents::merge", parent: &indexer_span, "faceted");
let _entered = span.enter();
facet_field_ids_delta = merge_and_send_facet_docids(
caches,
FacetDatabases::new(index),
index,
extractor_sender.facet_docids(),
)?;
}
}
{
let WordDocidsCaches {
word_docids,
word_fid_docids,
exact_word_docids,
word_position_docids,
fid_word_count_docids,
} = {
let span = tracing::trace_span!(target: "indexing::documents::extract", "word_docids");
let _entered = span.enter();
WordDocidsExtractors::run_extraction(
grenad_parameters,
document_changes,
indexing_context,
&mut extractor_allocs,
&extractor_sender.field_id_docid_facet_sender(),
Step::ExtractingFacets
Step::ExtractingWords
)?
};
};
{
let span = tracing::trace_span!(target: "indexing::documents::merge", parent: &indexer_span, "faceted");
let _entered = span.enter();
{
let span = tracing::trace_span!(target: "indexing::documents::merge", "word_docids");
let _entered = span.enter();
merge_and_send_docids(
word_docids,
index.word_docids.remap_types(),
index,
extractor_sender.docids::<WordDocids>(),
&indexing_context.must_stop_processing,
)?;
}
facet_field_ids_delta = merge_and_send_facet_docids(
caches,
FacetDatabases::new(index),
index,
extractor_sender.facet_docids(),
)?;
}
}
{
let span = tracing::trace_span!(target: "indexing::documents::merge", "word_fid_docids");
let _entered = span.enter();
merge_and_send_docids(
word_fid_docids,
index.word_fid_docids.remap_types(),
index,
extractor_sender.docids::<WordFidDocids>(),
&indexing_context.must_stop_processing,
)?;
}
{
{
let span = tracing::trace_span!(target: "indexing::documents::merge", "exact_word_docids");
let _entered = span.enter();
merge_and_send_docids(
exact_word_docids,
index.exact_word_docids.remap_types(),
index,
extractor_sender.docids::<ExactWordDocids>(),
&indexing_context.must_stop_processing,
)?;
}
{
let span = tracing::trace_span!(target: "indexing::documents::merge", "word_position_docids");
let _entered = span.enter();
merge_and_send_docids(
word_position_docids,
index.word_position_docids.remap_types(),
index,
extractor_sender.docids::<WordPositionDocids>(),
&indexing_context.must_stop_processing,
)?;
}
let WordDocidsCaches {
word_docids,
word_fid_docids,
exact_word_docids,
word_position_docids,
fid_word_count_docids,
} = {
let span = tracing::trace_span!(target: "indexing::documents::extract", "word_docids");
let _entered = span.enter();
WordDocidsExtractors::run_extraction(
grenad_parameters,
document_changes,
indexing_context,
&mut extractor_allocs,
Step::ExtractingWords
)?
};
{
let span = tracing::trace_span!(target: "indexing::documents::merge", "word_docids");
let _entered = span.enter();
merge_and_send_docids(
word_docids,
index.word_docids.remap_types(),
index,
extractor_sender.docids::<WordDocids>(),
&indexing_context.must_stop_processing,
)?;
{
let span = tracing::trace_span!(target: "indexing::documents::merge", "fid_word_count_docids");
let _entered = span.enter();
merge_and_send_docids(
fid_word_count_docids,
index.field_id_word_count_docids.remap_types(),
index,
extractor_sender.docids::<FidWordCountDocids>(),
&indexing_context.must_stop_processing,
)?;
}
}
{
let span = tracing::trace_span!(target: "indexing::documents::merge", "word_fid_docids");
let _entered = span.enter();
merge_and_send_docids(
word_fid_docids,
index.word_fid_docids.remap_types(),
index,
extractor_sender.docids::<WordFidDocids>(),
&indexing_context.must_stop_processing,
)?;
// run the proximity extraction only if the precision is by word
// this works only if the settings didn't change during this transaction.
let proximity_precision = index.proximity_precision(&rtxn)?.unwrap_or_default();
if proximity_precision == ProximityPrecision::ByWord {
let caches = {
let span = tracing::trace_span!(target: "indexing::documents::extract", "word_pair_proximity_docids");
let _entered = span.enter();
<WordPairProximityDocidsExtractor as DocidsExtractor>::run_extraction(
grenad_parameters,
document_changes,
indexing_context,
&mut extractor_allocs,
Step::ExtractingWordProximity,
)?
};
{
let span = tracing::trace_span!(target: "indexing::documents::merge", "word_pair_proximity_docids");
let _entered = span.enter();
merge_and_send_docids(
caches,
index.word_pair_proximity_docids.remap_types(),
index,
extractor_sender.docids::<WordPairProximityDocids>(),
&indexing_context.must_stop_processing,
)?;
}
}
{
let span = tracing::trace_span!(target: "indexing::documents::merge", "exact_word_docids");
let _entered = span.enter();
merge_and_send_docids(
exact_word_docids,
index.exact_word_docids.remap_types(),
index,
extractor_sender.docids::<ExactWordDocids>(),
&indexing_context.must_stop_processing,
)?;
}
'vectors: {
if index_embeddings.is_empty() {
break 'vectors;
}
{
let span = tracing::trace_span!(target: "indexing::documents::merge", "word_position_docids");
let _entered = span.enter();
merge_and_send_docids(
word_position_docids,
index.word_position_docids.remap_types(),
index,
extractor_sender.docids::<WordPositionDocids>(),
&indexing_context.must_stop_processing,
)?;
}
let embedding_sender = extractor_sender.embeddings();
let extractor = EmbeddingExtractor::new(embedders, embedding_sender, field_distribution, request_threads());
let mut datastore = ThreadLocal::with_capacity(rayon::current_num_threads());
{
let span = tracing::trace_span!(target: "indexing::documents::extract", "vectors");
let _entered = span.enter();
{
let span = tracing::trace_span!(target: "indexing::documents::merge", "fid_word_count_docids");
let _entered = span.enter();
merge_and_send_docids(
fid_word_count_docids,
index.field_id_word_count_docids.remap_types(),
index,
extractor_sender.docids::<FidWordCountDocids>(),
&indexing_context.must_stop_processing,
)?;
}
}
extract(
document_changes,
&extractor,
indexing_context,
&mut extractor_allocs,
&datastore,
Step::ExtractingEmbeddings,
)?;
}
{
let span = tracing::trace_span!(target: "indexing::documents::merge", "vectors");
let _entered = span.enter();
// run the proximity extraction only if the precision is by word
// this works only if the settings didn't change during this transaction.
let proximity_precision = index.proximity_precision(&rtxn)?.unwrap_or_default();
if proximity_precision == ProximityPrecision::ByWord {
let caches = {
let span = tracing::trace_span!(target: "indexing::documents::extract", "word_pair_proximity_docids");
let _entered = span.enter();
<WordPairProximityDocidsExtractor as DocidsExtractor>::run_extraction(
grenad_parameters,
document_changes,
indexing_context,
&mut extractor_allocs,
Step::ExtractingWordProximity,
)?
};
{
let span = tracing::trace_span!(target: "indexing::documents::merge", "word_pair_proximity_docids");
let _entered = span.enter();
merge_and_send_docids(
caches,
index.word_pair_proximity_docids.remap_types(),
index,
extractor_sender.docids::<WordPairProximityDocids>(),
&indexing_context.must_stop_processing,
)?;
}
}
'vectors: {
let mut index_embeddings = index.embedding_configs(&rtxn)?;
if index_embeddings.is_empty() {
break 'vectors;
}
let embedding_sender = extractor_sender.embeddings();
let extractor = EmbeddingExtractor::new(embedders, &embedding_sender, field_distribution, request_threads());
let mut datastore = ThreadLocal::with_capacity(rayon::current_num_threads());
{
let span = tracing::trace_span!(target: "indexing::documents::extract", "vectors");
let _entered = span.enter();
extract(document_changes, &extractor, indexing_context, &mut extractor_allocs, &datastore, Step::ExtractingEmbeddings)?;
}
{
let span = tracing::trace_span!(target: "indexing::documents::merge", "vectors");
let _entered = span.enter();
for config in &mut index_embeddings {
'data: for data in datastore.iter_mut() {
let data = &mut data.get_mut().0;
let Some(deladd) = data.remove(&config.name) else { continue 'data; };
deladd.apply_to(&mut config.user_provided);
for config in &mut index_embeddings {
'data: for data in datastore.iter_mut() {
let data = &mut data.get_mut().0;
let Some(deladd) = data.remove(&config.name) else { continue 'data; };
deladd.apply_to(&mut config.user_provided);
}
}
}
}
embedding_sender.finish(index_embeddings).unwrap();
}
'geo: {
let Some(extractor) = GeoExtractor::new(&rtxn, index, grenad_parameters)? else {
break 'geo;
};
let datastore = ThreadLocal::with_capacity(rayon::current_num_threads());
'geo: {
let Some(extractor) = GeoExtractor::new(&rtxn, index, grenad_parameters)? else {
break 'geo;
};
let datastore = ThreadLocal::with_capacity(rayon::current_num_threads());
{
let span = tracing::trace_span!(target: "indexing::documents::extract", "geo");
let _entered = span.enter();
{
let span = tracing::trace_span!(target: "indexing::documents::extract", "geo");
let _entered = span.enter();
extract(
document_changes,
&extractor,
indexing_context,
&mut extractor_allocs,
&datastore,
Step::WritingGeoPoints
)?;
}
extract(
document_changes,
&extractor,
indexing_context,
&mut extractor_allocs,
&datastore,
Step::WritingGeoPoints
merge_and_send_rtree(
datastore,
&rtxn,
index,
extractor_sender.geo(),
&indexing_context.must_stop_processing,
)?;
}
merge_and_send_rtree(
datastore,
&rtxn,
index,
extractor_sender.geo(),
&indexing_context.must_stop_processing,
)?;
}
(indexing_context.send_progress)(Progress::from_step(Step::WritingToDatabase));
(indexing_context.send_progress)(Progress::from_step(Step::WritingToDatabase));
finished_extraction.store(true, std::sync::atomic::Ordering::Relaxed);
finished_extraction.store(true, std::sync::atomic::Ordering::Relaxed);
Result::Ok(facet_field_ids_delta)
Result::Ok((facet_field_ids_delta, index_embeddings))
}).unwrap()
})?;
let global_fields_ids_map = GlobalFieldsIdsMap::new(&new_fields_ids_map);
let vector_arroy = index.vector_arroy;
let mut rng = rand::rngs::StdRng::seed_from_u64(42);
let indexer_span = tracing::Span::current();
let arroy_writers: Result<HashMap<_, _>> = embedders
.inner_as_ref()
@@ -384,7 +416,11 @@ where
})
.collect();
// Used by by the ArroySetVector to copy the embedding into an
// aligned memory area, required by arroy to accept a new vector.
let mut aligned_embedding = Vec::new();
let mut arroy_writers = arroy_writers?;
{
let span = tracing::trace_span!(target: "indexing::write_db", "all");
let _entered = span.enter();
@@ -392,110 +428,93 @@ where
let span = tracing::trace_span!(target: "indexing::write_db", "post_merge");
let mut _entered_post_merge = None;
for operation in writer_receiver {
while let Some(action) = writer_receiver.recv_action() {
if _entered_post_merge.is_none()
&& finished_extraction.load(std::sync::atomic::Ordering::Relaxed)
{
_entered_post_merge = Some(span.enter());
}
match operation {
WriterOperation::DbOperation(db_operation) => {
let database = db_operation.database(index);
let database_name = db_operation.database_name();
match db_operation.entry() {
EntryOperation::Delete(e) => match database.delete(wtxn, e.entry()) {
Ok(false) => unreachable!("We tried to delete an unknown key"),
Ok(_) => (),
Err(error) => {
return Err(Error::InternalError(
InternalError::StoreDeletion {
database_name,
key: e.entry().to_owned(),
error,
},
));
}
},
EntryOperation::Write(e) => {
if let Err(error) = database.put(wtxn, e.key(), e.value()) {
return Err(Error::InternalError(InternalError::StorePut {
database_name,
key: e.key().to_owned(),
value_length: e.value().len(),
error,
}));
}
}
match action {
ReceiverAction::WakeUp => (),
ReceiverAction::LargeEntry(LargeEntry { database, key, value }) => {
let database_name = database.database_name();
let database = database.database(index);
if let Err(error) = database.put(wtxn, &key, &value) {
return Err(Error::InternalError(InternalError::StorePut {
database_name,
key: bstr::BString::from(&key[..]),
value_length: value.len(),
error,
}));
}
}
WriterOperation::ArroyOperation(arroy_operation) => match arroy_operation {
ArroyOperation::DeleteVectors { docid } => {
for (
_embedder_index,
(_embedder_name, _embedder, writer, dimensions),
) in &mut arroy_writers
{
let dimensions = *dimensions;
writer.del_items(wtxn, dimensions, docid)?;
}
ReceiverAction::LargeVectors(large_vectors) => {
let LargeVectors { docid, embedder_id, .. } = large_vectors;
let (_, _, writer, dimensions) =
arroy_writers.get(&embedder_id).expect("requested a missing embedder");
let mut embeddings = Embeddings::new(*dimensions);
for embedding in large_vectors.read_embeddings(*dimensions) {
embeddings.push(embedding.to_vec()).unwrap();
}
ArroyOperation::SetVectors {
docid,
embedder_id,
embeddings: raw_embeddings,
} => {
let (_, _, writer, dimensions) = arroy_writers
.get(&embedder_id)
.expect("requested a missing embedder");
let mut embeddings = Embeddings::new(*dimensions);
for embedding in raw_embeddings {
embeddings.append(embedding).unwrap();
}
writer.del_items(wtxn, *dimensions, docid)?;
writer.add_items(wtxn, docid, &embeddings)?;
}
ArroyOperation::SetVector { docid, embedder_id, embedding } => {
let (_, _, writer, dimensions) = arroy_writers
.get(&embedder_id)
.expect("requested a missing embedder");
writer.del_items(wtxn, *dimensions, docid)?;
writer.add_item(wtxn, docid, &embedding)?;
}
ArroyOperation::Finish { configs } => {
let span = tracing::trace_span!(target: "indexing::vectors", parent: &indexer_span, "build");
let _entered = span.enter();
(indexing_context.send_progress)(Progress::from_step(
Step::WritingEmbeddingsToDatabase,
));
for (
_embedder_index,
(_embedder_name, _embedder, writer, dimensions),
) in &mut arroy_writers
{
let dimensions = *dimensions;
writer.build_and_quantize(
wtxn,
&mut rng,
dimensions,
false,
&indexing_context.must_stop_processing,
)?;
}
index.put_embedding_configs(wtxn, configs)?;
}
},
writer.del_items(wtxn, *dimensions, docid)?;
writer.add_items(wtxn, docid, &embeddings)?;
}
}
// Every time the is a message in the channel we search
// for new entries in the BBQueue buffers.
write_from_bbqueue(
&mut writer_receiver,
index,
wtxn,
&arroy_writers,
&mut aligned_embedding,
)?;
}
// Once the extractor/writer channel is closed
// we must process the remaining BBQueue messages.
write_from_bbqueue(
&mut writer_receiver,
index,
wtxn,
&arroy_writers,
&mut aligned_embedding,
)?;
}
(indexing_context.send_progress)(Progress::from_step(Step::WaitingForExtractors));
let facet_field_ids_delta = extractor_handle.join().unwrap()?;
let (facet_field_ids_delta, index_embeddings) = extractor_handle.join().unwrap()?;
'vectors: {
let span =
tracing::trace_span!(target: "indexing::vectors", parent: &indexer_span, "build");
let _entered = span.enter();
if index_embeddings.is_empty() {
break 'vectors;
}
(indexing_context.send_progress)(Progress::from_step(
Step::WritingEmbeddingsToDatabase,
));
let mut rng = rand::rngs::StdRng::seed_from_u64(42);
for (_index, (_embedder_name, _embedder, writer, dimensions)) in &mut arroy_writers {
let dimensions = *dimensions;
writer.build_and_quantize(
wtxn,
&mut rng,
dimensions,
false,
&indexing_context.must_stop_processing,
)?;
}
index.put_embedding_configs(wtxn, index_embeddings)?;
}
(indexing_context.send_progress)(Progress::from_step(Step::PostProcessingFacets));
@@ -537,6 +556,72 @@ where
Ok(())
}
/// A function dedicated to manage all the available BBQueue frames.
///
/// It reads all the available frames, do the corresponding database operations
/// and stops when no frame are available.
fn write_from_bbqueue(
writer_receiver: &mut WriterBbqueueReceiver<'_>,
index: &Index,
wtxn: &mut RwTxn<'_>,
arroy_writers: &HashMap<u8, (&str, &crate::vector::Embedder, ArroyWrapper, usize)>,
aligned_embedding: &mut Vec<f32>,
) -> crate::Result<()> {
while let Some(frame_with_header) = writer_receiver.recv_frame() {
match frame_with_header.header() {
EntryHeader::DbOperation(operation) => {
let database_name = operation.database.database_name();
let database = operation.database.database(index);
let frame = frame_with_header.frame();
match operation.key_value(frame) {
(key, Some(value)) => {
if let Err(error) = database.put(wtxn, key, value) {
return Err(Error::InternalError(InternalError::StorePut {
database_name,
key: key.into(),
value_length: value.len(),
error,
}));
}
}
(key, None) => match database.delete(wtxn, key) {
Ok(false) => {
unreachable!("We tried to delete an unknown key: {key:?}")
}
Ok(_) => (),
Err(error) => {
return Err(Error::InternalError(InternalError::StoreDeletion {
database_name,
key: key.into(),
error,
}));
}
},
}
}
EntryHeader::ArroyDeleteVector(ArroyDeleteVector { docid }) => {
for (_index, (_name, _embedder, writer, dimensions)) in arroy_writers {
let dimensions = *dimensions;
writer.del_items(wtxn, dimensions, docid)?;
}
}
EntryHeader::ArroySetVectors(asvs) => {
let ArroySetVectors { docid, embedder_id, .. } = asvs;
let frame = frame_with_header.frame();
let (_, _, writer, dimensions) =
arroy_writers.get(&embedder_id).expect("requested a missing embedder");
let mut embeddings = Embeddings::new(*dimensions);
let all_embeddings = asvs.read_all_embeddings_into_vec(frame, aligned_embedding);
embeddings.append(all_embeddings.to_vec()).unwrap();
writer.del_items(wtxn, *dimensions, docid)?;
writer.add_items(wtxn, docid, &embeddings)?;
}
}
}
Ok(())
}
#[tracing::instrument(level = "trace", skip_all, target = "indexing::prefix")]
fn compute_prefix_database(
index: &Index,

View File

@@ -9,8 +9,8 @@ use roaring::RoaringBitmap;
use super::channel::*;
use super::extract::{
merge_caches, transpose_and_freeze_caches, BalancedCaches, DelAddRoaringBitmap, FacetKind,
GeoExtractorData,
merge_caches_sorted, transpose_and_freeze_caches, BalancedCaches, DelAddRoaringBitmap,
FacetKind, GeoExtractorData,
};
use crate::{CboRoaringBitmapCodec, FieldId, GeoPoint, Index, InternalError, Result};
@@ -19,7 +19,7 @@ pub fn merge_and_send_rtree<'extractor, MSP>(
datastore: impl IntoIterator<Item = RefCell<GeoExtractorData<'extractor>>>,
rtxn: &RoTxn,
index: &Index,
geo_sender: GeoSender<'_>,
geo_sender: GeoSender<'_, '_>,
must_stop_processing: &MSP,
) -> Result<()>
where
@@ -34,7 +34,7 @@ where
}
let mut frozen = data.into_inner().freeze()?;
for result in frozen.iter_and_clear_removed() {
for result in frozen.iter_and_clear_removed()? {
let extracted_geo_point = result?;
let removed = rtree.remove(&GeoPoint::from(extracted_geo_point));
debug_assert!(removed.is_some());
@@ -42,7 +42,7 @@ where
debug_assert!(removed);
}
for result in frozen.iter_and_clear_inserted() {
for result in frozen.iter_and_clear_inserted()? {
let extracted_geo_point = result?;
rtree.insert(GeoPoint::from(extracted_geo_point));
let inserted = faceted.insert(extracted_geo_point.docid);
@@ -56,41 +56,59 @@ where
let rtree_mmap = unsafe { Mmap::map(&file)? };
geo_sender.set_rtree(rtree_mmap).unwrap();
geo_sender.set_geo_faceted(&faceted).unwrap();
geo_sender.set_geo_faceted(&faceted)?;
Ok(())
}
#[tracing::instrument(level = "trace", skip_all, target = "indexing::merge")]
pub fn merge_and_send_docids<'extractor, MSP>(
pub fn merge_and_send_docids<'extractor, MSP, D>(
mut caches: Vec<BalancedCaches<'extractor>>,
database: Database<Bytes, Bytes>,
index: &Index,
docids_sender: impl DocidsSender + Sync,
docids_sender: WordDocidsSender<D>,
must_stop_processing: &MSP,
) -> Result<()>
where
MSP: Fn() -> bool + Sync,
D: DatabaseType + Sync,
{
transpose_and_freeze_caches(&mut caches)?.into_par_iter().try_for_each(|frozen| {
let rtxn = index.read_txn()?;
let mut buffer = Vec::new();
if must_stop_processing() {
return Err(InternalError::AbortedIndexation.into());
}
merge_caches(frozen, |key, DelAddRoaringBitmap { del, add }| {
merge_caches_sorted(frozen, |key, DelAddRoaringBitmap { del, add }| {
let current = database.get(&rtxn, key)?;
match merge_cbo_bitmaps(current, del, add)? {
Operation::Write(bitmap) => {
let value = cbo_bitmap_serialize_into_vec(&bitmap, &mut buffer);
docids_sender.write(key, value).unwrap();
if let (Some(del), Some(current)) = (&del, &current) {
let current = CboRoaringBitmapCodec::deserialize_from(current).unwrap();
let diff = del - &current;
let external_ids = index.external_id_of(&rtxn, &diff).unwrap().into_iter().map(|id| id.unwrap()).collect::<Vec<_>>();
if !del.is_subset(&current) {
eprintln!(
"======================== {:?}: {} -> c: {:?} d: {:?} a: {:?} extra: {:?} extra_external_ids: {:?}",
D::DATABASE,
D::DATABASE.stringify_key(key),
&current,
del,
add,
diff,
external_ids
);
}
}
match merge_cbo_bitmaps(current, del, add) {
Ok(Operation::Write(bitmap)) => {
docids_sender.write(key, &bitmap)?;
Ok(())
}
Operation::Delete => {
docids_sender.delete(key).unwrap();
Ok(Operation::Delete) => {
docids_sender.delete(key)?;
Ok(())
}
Operation::Ignore => Ok(()),
Ok(Operation::Ignore) => Ok(()),
Err(e) => Err(e),
}
})
})
@@ -101,26 +119,24 @@ pub fn merge_and_send_facet_docids<'extractor>(
mut caches: Vec<BalancedCaches<'extractor>>,
database: FacetDatabases,
index: &Index,
docids_sender: impl DocidsSender + Sync,
docids_sender: FacetDocidsSender,
) -> Result<FacetFieldIdsDelta> {
transpose_and_freeze_caches(&mut caches)?
.into_par_iter()
.map(|frozen| {
let mut facet_field_ids_delta = FacetFieldIdsDelta::default();
let rtxn = index.read_txn()?;
let mut buffer = Vec::new();
merge_caches(frozen, |key, DelAddRoaringBitmap { del, add }| {
merge_caches_sorted(frozen, |key, DelAddRoaringBitmap { del, add }| {
let current = database.get_cbo_roaring_bytes_value(&rtxn, key)?;
match merge_cbo_bitmaps(current, del, add)? {
Operation::Write(bitmap) => {
facet_field_ids_delta.register_from_key(key);
let value = cbo_bitmap_serialize_into_vec(&bitmap, &mut buffer);
docids_sender.write(key, value).unwrap();
docids_sender.write(key, &bitmap)?;
Ok(())
}
Operation::Delete => {
facet_field_ids_delta.register_from_key(key);
docids_sender.delete(key).unwrap();
docids_sender.delete(key)?;
Ok(())
}
Operation::Ignore => Ok(()),
@@ -237,25 +253,47 @@ fn merge_cbo_bitmaps(
(None, Some(_del), Some(add)) => Ok(Operation::Write(add)),
(Some(_current), None, None) => Ok(Operation::Ignore), // but it's strange
(Some(current), None, Some(add)) => Ok(Operation::Write(current | add)),
(Some(current), Some(del), add) => {
(Some(current), Some(mut del), add) => {
debug_assert!(
del.is_subset(&current),
"del is not a subset of current, which must be impossible."
);
let output = match add {
Some(add) => (&current - del) | add,
None => &current - del,
Some(add) => {
del -= &add;
if del.is_empty() {
if add.is_subset(&current) {
// no changes, no allocation
None
} else {
// addition
Some(current | add)
}
} else {
if add.is_subset(&current) {
// deletion only, no union
Some(current - del)
} else {
// deletion and addition
Some((current - del) | add)
}
}
}
// deletion only, no union
None => Some(current - del),
};
if output.is_empty() {
Ok(Operation::Delete)
} else if current == output {
Ok(Operation::Ignore)
} else {
Ok(Operation::Write(output))
match output {
Some(output) => {
if output.is_empty() {
Ok(Operation::Delete)
} else {
Ok(Operation::Write(output))
}
}
None => Ok(Operation::Ignore),
}
}
}
}
/// TODO Return the slice directly from the serialize_into method
fn cbo_bitmap_serialize_into_vec<'b>(bitmap: &RoaringBitmap, buffer: &'b mut Vec<u8>) -> &'b [u8] {
buffer.clear();
CboRoaringBitmapCodec::serialize_into(bitmap, buffer);
buffer.as_slice()
}

View File

@@ -5,6 +5,7 @@ pub trait RefCellExt<T: ?Sized> {
&self,
) -> std::result::Result<RefMut<'_, T>, std::cell::BorrowMutError>;
#[track_caller]
fn borrow_mut_or_yield(&self) -> RefMut<'_, T> {
self.try_borrow_mut_or_yield().unwrap()
}

View File

@@ -11,8 +11,8 @@ pub enum Step {
ExtractingEmbeddings,
WritingGeoPoints,
WritingToDatabase,
WritingEmbeddingsToDatabase,
WaitingForExtractors,
WritingEmbeddingsToDatabase,
PostProcessingFacets,
PostProcessingWords,
Finalizing,
@@ -29,8 +29,8 @@ impl Step {
Step::ExtractingEmbeddings => "extracting embeddings",
Step::WritingGeoPoints => "writing geo points",
Step::WritingToDatabase => "writing to database",
Step::WritingEmbeddingsToDatabase => "writing embeddings to database",
Step::WaitingForExtractors => "waiting for extractors",
Step::WritingEmbeddingsToDatabase => "writing embeddings to database",
Step::PostProcessingFacets => "post-processing facets",
Step::PostProcessingWords => "post-processing words",
Step::Finalizing => "finalizing",

View File

@@ -1,4 +1,4 @@
use std::collections::HashSet;
use std::collections::BTreeSet;
use std::io::BufWriter;
use fst::{Set, SetBuilder, Streamer};
@@ -75,8 +75,8 @@ pub struct PrefixData {
#[derive(Debug)]
pub struct PrefixDelta {
pub modified: HashSet<Prefix>,
pub deleted: HashSet<Prefix>,
pub modified: BTreeSet<Prefix>,
pub deleted: BTreeSet<Prefix>,
}
struct PrefixFstBuilder {
@@ -86,7 +86,7 @@ struct PrefixFstBuilder {
prefix_fst_builders: Vec<SetBuilder<Vec<u8>>>,
current_prefix: Vec<Prefix>,
current_prefix_count: Vec<usize>,
modified_prefixes: HashSet<Prefix>,
modified_prefixes: BTreeSet<Prefix>,
current_prefix_is_modified: Vec<bool>,
}
@@ -110,7 +110,7 @@ impl PrefixFstBuilder {
prefix_fst_builders,
current_prefix: vec![Prefix::new(); max_prefix_length],
current_prefix_count: vec![0; max_prefix_length],
modified_prefixes: HashSet::new(),
modified_prefixes: BTreeSet::new(),
current_prefix_is_modified: vec![false; max_prefix_length],
})
}
@@ -180,7 +180,7 @@ impl PrefixFstBuilder {
let prefix_fst_mmap = unsafe { Mmap::map(&prefix_fst_file)? };
let new_prefix_fst = Set::new(&prefix_fst_mmap)?;
let old_prefix_fst = index.words_prefixes_fst(rtxn)?;
let mut deleted_prefixes = HashSet::new();
let mut deleted_prefixes = BTreeSet::new();
{
let mut deleted_prefixes_stream = old_prefix_fst.op().add(&new_prefix_fst).difference();
while let Some(prefix) = deleted_prefixes_stream.next() {

View File

@@ -1,5 +1,5 @@
use std::cell::RefCell;
use std::collections::HashSet;
use std::collections::BTreeSet;
use std::io::{BufReader, BufWriter, Read, Seek, Write};
use hashbrown::HashMap;
@@ -37,8 +37,8 @@ impl WordPrefixDocids {
fn execute(
self,
wtxn: &mut heed::RwTxn,
prefix_to_compute: &HashSet<Prefix>,
prefix_to_delete: &HashSet<Prefix>,
prefix_to_compute: &BTreeSet<Prefix>,
prefix_to_delete: &BTreeSet<Prefix>,
) -> Result<()> {
delete_prefixes(wtxn, &self.prefix_database, prefix_to_delete)?;
self.recompute_modified_prefixes(wtxn, prefix_to_compute)
@@ -48,7 +48,7 @@ impl WordPrefixDocids {
fn recompute_modified_prefixes(
&self,
wtxn: &mut RwTxn,
prefixes: &HashSet<Prefix>,
prefixes: &BTreeSet<Prefix>,
) -> Result<()> {
// We fetch the docids associated to the newly added word prefix fst only.
// And collect the CboRoaringBitmaps pointers in an HashMap.
@@ -76,7 +76,7 @@ impl WordPrefixDocids {
.union()?;
buffer.clear();
CboRoaringBitmapCodec::serialize_into(&output, buffer);
CboRoaringBitmapCodec::serialize_into_vec(&output, buffer);
index.push(PrefixEntry { prefix, serialized_length: buffer.len() });
file.write_all(buffer)
})?;
@@ -127,7 +127,7 @@ impl<'a, 'rtxn> FrozenPrefixBitmaps<'a, 'rtxn> {
pub fn from_prefixes(
database: Database<Bytes, CboRoaringBitmapCodec>,
rtxn: &'rtxn RoTxn,
prefixes: &'a HashSet<Prefix>,
prefixes: &'a BTreeSet<Prefix>,
) -> heed::Result<Self> {
let database = database.remap_data_type::<Bytes>();
@@ -173,8 +173,8 @@ impl WordPrefixIntegerDocids {
fn execute(
self,
wtxn: &mut heed::RwTxn,
prefix_to_compute: &HashSet<Prefix>,
prefix_to_delete: &HashSet<Prefix>,
prefix_to_compute: &BTreeSet<Prefix>,
prefix_to_delete: &BTreeSet<Prefix>,
) -> Result<()> {
delete_prefixes(wtxn, &self.prefix_database, prefix_to_delete)?;
self.recompute_modified_prefixes(wtxn, prefix_to_compute)
@@ -184,7 +184,7 @@ impl WordPrefixIntegerDocids {
fn recompute_modified_prefixes(
&self,
wtxn: &mut RwTxn,
prefixes: &HashSet<Prefix>,
prefixes: &BTreeSet<Prefix>,
) -> Result<()> {
// We fetch the docids associated to the newly added word prefix fst only.
// And collect the CboRoaringBitmaps pointers in an HashMap.
@@ -211,7 +211,7 @@ impl WordPrefixIntegerDocids {
.union()?;
buffer.clear();
CboRoaringBitmapCodec::serialize_into(&output, buffer);
CboRoaringBitmapCodec::serialize_into_vec(&output, buffer);
index.push(PrefixIntegerEntry { prefix, pos, serialized_length: buffer.len() });
file.write_all(buffer)?;
}
@@ -262,7 +262,7 @@ impl<'a, 'rtxn> FrozenPrefixIntegerBitmaps<'a, 'rtxn> {
pub fn from_prefixes(
database: Database<Bytes, CboRoaringBitmapCodec>,
rtxn: &'rtxn RoTxn,
prefixes: &'a HashSet<Prefix>,
prefixes: &'a BTreeSet<Prefix>,
) -> heed::Result<Self> {
let database = database.remap_data_type::<Bytes>();
@@ -291,7 +291,7 @@ unsafe impl<'a, 'rtxn> Sync for FrozenPrefixIntegerBitmaps<'a, 'rtxn> {}
fn delete_prefixes(
wtxn: &mut RwTxn,
prefix_database: &Database<Bytes, CboRoaringBitmapCodec>,
prefixes: &HashSet<Prefix>,
prefixes: &BTreeSet<Prefix>,
) -> Result<()> {
// We remove all the entries that are no more required in this word prefix docids database.
for prefix in prefixes {
@@ -309,8 +309,8 @@ fn delete_prefixes(
pub fn compute_word_prefix_docids(
wtxn: &mut RwTxn,
index: &Index,
prefix_to_compute: &HashSet<Prefix>,
prefix_to_delete: &HashSet<Prefix>,
prefix_to_compute: &BTreeSet<Prefix>,
prefix_to_delete: &BTreeSet<Prefix>,
grenad_parameters: GrenadParameters,
) -> Result<()> {
WordPrefixDocids::new(
@@ -325,8 +325,8 @@ pub fn compute_word_prefix_docids(
pub fn compute_exact_word_prefix_docids(
wtxn: &mut RwTxn,
index: &Index,
prefix_to_compute: &HashSet<Prefix>,
prefix_to_delete: &HashSet<Prefix>,
prefix_to_compute: &BTreeSet<Prefix>,
prefix_to_delete: &BTreeSet<Prefix>,
grenad_parameters: GrenadParameters,
) -> Result<()> {
WordPrefixDocids::new(
@@ -341,8 +341,8 @@ pub fn compute_exact_word_prefix_docids(
pub fn compute_word_prefix_fid_docids(
wtxn: &mut RwTxn,
index: &Index,
prefix_to_compute: &HashSet<Prefix>,
prefix_to_delete: &HashSet<Prefix>,
prefix_to_compute: &BTreeSet<Prefix>,
prefix_to_delete: &BTreeSet<Prefix>,
grenad_parameters: GrenadParameters,
) -> Result<()> {
WordPrefixIntegerDocids::new(
@@ -357,8 +357,8 @@ pub fn compute_word_prefix_fid_docids(
pub fn compute_word_prefix_position_docids(
wtxn: &mut RwTxn,
index: &Index,
prefix_to_compute: &HashSet<Prefix>,
prefix_to_delete: &HashSet<Prefix>,
prefix_to_compute: &BTreeSet<Prefix>,
prefix_to_delete: &BTreeSet<Prefix>,
grenad_parameters: GrenadParameters,
) -> Result<()> {
WordPrefixIntegerDocids::new(

View File

@@ -475,7 +475,7 @@ impl<F> Embeddings<F> {
Ok(())
}
/// Append a flat vector of embeddings a the end of the embeddings.
/// Append a flat vector of embeddings at the end of the embeddings.
///
/// If `embeddings.len() % self.dimension != 0`, then the append operation fails.
pub fn append(&mut self, mut embeddings: Vec<F>) -> Result<(), Vec<F>> {

View File

@@ -64,6 +64,7 @@ fn test_facet_distribution_with_no_facet_values() {
indexer::index(
&mut wtxn,
&index,
&milli::ThreadPoolNoAbortBuilder::new().build().unwrap(),
config.grenad_parameters(),
&db_fields_ids_map,
new_fields_ids_map,

View File

@@ -101,6 +101,7 @@ pub fn setup_search_index_with_criteria(criteria: &[Criterion]) -> Index {
indexer::index(
&mut wtxn,
&index,
&milli::ThreadPoolNoAbortBuilder::new().build().unwrap(),
config.grenad_parameters(),
&db_fields_ids_map,
new_fields_ids_map,

View File

@@ -333,6 +333,7 @@ fn criteria_ascdesc() {
indexer::index(
&mut wtxn,
&index,
&milli::ThreadPoolNoAbortBuilder::new().build().unwrap(),
config.grenad_parameters(),
&db_fields_ids_map,
new_fields_ids_map,

View File

@@ -142,6 +142,7 @@ fn test_typo_disabled_on_word() {
indexer::index(
&mut wtxn,
&index,
&milli::ThreadPoolNoAbortBuilder::new().build().unwrap(),
config.grenad_parameters(),
&db_fields_ids_map,
new_fields_ids_map,

View File

@@ -82,6 +82,10 @@ pub struct BenchDeriveArgs {
/// Reason for the benchmark invocation
#[arg(short, long)]
reason: Option<String>,
/// The maximum time in seconds we allow for fetching the task queue before timing out.
#[arg(long, default_value_t = 60)]
tasks_queue_timeout_secs: u64,
}
pub fn run(args: BenchDeriveArgs) -> anyhow::Result<()> {
@@ -127,7 +131,7 @@ pub fn run(args: BenchDeriveArgs) -> anyhow::Result<()> {
let meili_client = Client::new(
Some("http://127.0.0.1:7700".into()),
args.master_key.as_deref(),
Some(std::time::Duration::from_secs(60)),
Some(std::time::Duration::from_secs(args.tasks_queue_timeout_secs)),
)?;
// enter runtime
@@ -135,7 +139,7 @@ pub fn run(args: BenchDeriveArgs) -> anyhow::Result<()> {
rt.block_on(async {
dashboard_client.send_machine_info(&env).await?;
let commit_message = build_info.commit_msg.context("missing commit message")?.split('\n').next().unwrap();
let commit_message = build_info.commit_msg.unwrap_or_default().split('\n').next().unwrap();
let max_workloads = args.workload_file.len();
let reason: Option<&str> = args.reason.as_deref();
let invocation_uuid = dashboard_client.create_invocation(build_info.clone(), commit_message, env, max_workloads, reason).await?;

View File

@@ -16,6 +16,7 @@ struct ListFeaturesDeriveArgs {
#[command(author, version, about, long_about)]
#[command(name = "cargo xtask")]
#[command(bin_name = "cargo xtask")]
#[allow(clippy::large_enum_variant)] // please, that's enough...
enum Command {
ListFeatures(ListFeaturesDeriveArgs),
Bench(BenchDeriveArgs),

View File

@@ -0,0 +1,105 @@
{
"name": "hackernews.add_new_documents",
"run_count": 3,
"extra_cli_args": [],
"assets": {
"hackernews-01.ndjson": {
"local_location": null,
"remote_location": "https://milli-benchmarks.fra1.digitaloceanspaces.com/bench/datasets/hackernews/modification/01.ndjson",
"sha256": "cd3627b86c064d865b6754848ed0e73ef1d8142752a25e5f0765c3a1296dd3ae"
},
"hackernews-02.ndjson": {
"local_location": null,
"remote_location": "https://milli-benchmarks.fra1.digitaloceanspaces.com/bench/datasets/hackernews/modification/02.ndjson",
"sha256": "5d533b83bcf992201dace88b4d0c0be8b4df5225c6c4b763582d986844bcc23b"
},
"hackernews-03.ndjson": {
"local_location": null,
"remote_location": "https://milli-benchmarks.fra1.digitaloceanspaces.com/bench/datasets/hackernews/modification/03.ndjson",
"sha256": "f5f351a0d04a8a83643ace12cafa2b7ec8ca8cb7d46fd268e5126492a6c66f2a"
},
"hackernews-04.ndjson": {
"local_location": null,
"remote_location": "https://milli-benchmarks.fra1.digitaloceanspaces.com/bench/datasets/hackernews/modification/04.ndjson",
"sha256": "ac1915ee7ce53a6718548c255a6cc59969784b2570745dc5b739f714beda291a"
},
"hackernews-05.ndjson": {
"local_location": null,
"remote_location": "https://milli-benchmarks.fra1.digitaloceanspaces.com/bench/datasets/hackernews/modification/05.ndjson",
"sha256": "be31d5632602f798e62d1c10c83bdfda2b4deaa068477eacde05fdd247572b82"
}
},
"precommands": [
{
"route": "indexes/movies/settings",
"method": "PATCH",
"body": {
"inline": {
"displayedAttributes": [
"title",
"by",
"score",
"time",
"text"
],
"searchableAttributes": [
"title",
"text"
],
"filterableAttributes": [
"by",
"kids",
"parent"
],
"sortableAttributes": [
"score",
"time"
]
}
},
"synchronous": "WaitForTask"
},
{
"route": "indexes/movies/documents",
"method": "POST",
"body": {
"asset": "hackernews-01.ndjson"
},
"synchronous": "WaitForResponse"
},
{
"route": "indexes/movies/documents",
"method": "POST",
"body": {
"asset": "hackernews-02.ndjson"
},
"synchronous": "WaitForResponse"
},
{
"route": "indexes/movies/documents",
"method": "POST",
"body": {
"asset": "hackernews-03.ndjson"
},
"synchronous": "WaitForResponse"
},
{
"route": "indexes/movies/documents",
"method": "POST",
"body": {
"asset": "hackernews-04.ndjson"
},
"synchronous": "WaitForTask"
}
],
"commands": [
{
"route": "indexes/movies/documents",
"method": "POST",
"body": {
"asset": "hackernews-05.ndjson"
},
"synchronous": "WaitForTask"
}
]
}

View File

@@ -0,0 +1,111 @@
{
"name": "hackernews.modify_facet_numbers",
"run_count": 3,
"extra_cli_args": [],
"assets": {
"hackernews-01.ndjson": {
"local_location": null,
"remote_location": "https://milli-benchmarks.fra1.digitaloceanspaces.com/bench/datasets/hackernews/modification/01.ndjson",
"sha256": "cd3627b86c064d865b6754848ed0e73ef1d8142752a25e5f0765c3a1296dd3ae"
},
"hackernews-02.ndjson": {
"local_location": null,
"remote_location": "https://milli-benchmarks.fra1.digitaloceanspaces.com/bench/datasets/hackernews/modification/02.ndjson",
"sha256": "5d533b83bcf992201dace88b4d0c0be8b4df5225c6c4b763582d986844bcc23b"
},
"hackernews-03.ndjson": {
"local_location": null,
"remote_location": "https://milli-benchmarks.fra1.digitaloceanspaces.com/bench/datasets/hackernews/modification/03.ndjson",
"sha256": "f5f351a0d04a8a83643ace12cafa2b7ec8ca8cb7d46fd268e5126492a6c66f2a"
},
"hackernews-04.ndjson": {
"local_location": null,
"remote_location": "https://milli-benchmarks.fra1.digitaloceanspaces.com/bench/datasets/hackernews/modification/04.ndjson",
"sha256": "ac1915ee7ce53a6718548c255a6cc59969784b2570745dc5b739f714beda291a"
},
"hackernews-05.ndjson": {
"local_location": null,
"remote_location": "https://milli-benchmarks.fra1.digitaloceanspaces.com/bench/datasets/hackernews/modification/05.ndjson",
"sha256": "be31d5632602f798e62d1c10c83bdfda2b4deaa068477eacde05fdd247572b82"
},
"hackernews-02-modified-filters.ndjson": {
"local_location": null,
"remote_location": "https://milli-benchmarks.fra1.digitaloceanspaces.com/bench/datasets/hackernews/modification/02-modified-filters.ndjson",
"sha256": "7272cbfd41110d32d7fe168424a0000f07589bfe40f664652b34f4f20aaf3802"
}
},
"precommands": [
{
"route": "indexes/movies/settings",
"method": "PATCH",
"body": {
"inline": {
"displayedAttributes": [
"title",
"by",
"score",
"time",
"text"
],
"searchableAttributes": [
"title",
"text"
],
"filterableAttributes": [
"by",
"kids",
"parent"
],
"sortableAttributes": [
"score",
"time"
]
}
},
"synchronous": "WaitForTask"
},
{
"route": "indexes/movies/documents",
"method": "POST",
"body": {
"asset": "hackernews-01.ndjson"
},
"synchronous": "WaitForResponse"
},
{
"route": "indexes/movies/documents",
"method": "POST",
"body": {
"asset": "hackernews-02.ndjson"
},
"synchronous": "WaitForResponse"
},
{
"route": "indexes/movies/documents",
"method": "POST",
"body": {
"asset": "hackernews-03.ndjson"
},
"synchronous": "WaitForResponse"
},
{
"route": "indexes/movies/documents",
"method": "POST",
"body": {
"asset": "hackernews-04.ndjson"
},
"synchronous": "WaitForTask"
}
],
"commands": [
{
"route": "indexes/movies/documents",
"method": "POST",
"body": {
"asset": "hackernews-02-modified-filters.ndjson"
},
"synchronous": "WaitForTask"
}
]
}

View File

@@ -0,0 +1,111 @@
{
"name": "hackernews.modify_facet_strings",
"run_count": 3,
"extra_cli_args": [],
"assets": {
"hackernews-01.ndjson": {
"local_location": null,
"remote_location": "https://milli-benchmarks.fra1.digitaloceanspaces.com/bench/datasets/hackernews/modification/01.ndjson",
"sha256": "cd3627b86c064d865b6754848ed0e73ef1d8142752a25e5f0765c3a1296dd3ae"
},
"hackernews-02.ndjson": {
"local_location": null,
"remote_location": "https://milli-benchmarks.fra1.digitaloceanspaces.com/bench/datasets/hackernews/modification/02.ndjson",
"sha256": "5d533b83bcf992201dace88b4d0c0be8b4df5225c6c4b763582d986844bcc23b"
},
"hackernews-03.ndjson": {
"local_location": null,
"remote_location": "https://milli-benchmarks.fra1.digitaloceanspaces.com/bench/datasets/hackernews/modification/03.ndjson",
"sha256": "f5f351a0d04a8a83643ace12cafa2b7ec8ca8cb7d46fd268e5126492a6c66f2a"
},
"hackernews-04.ndjson": {
"local_location": null,
"remote_location": "https://milli-benchmarks.fra1.digitaloceanspaces.com/bench/datasets/hackernews/modification/04.ndjson",
"sha256": "ac1915ee7ce53a6718548c255a6cc59969784b2570745dc5b739f714beda291a"
},
"hackernews-05.ndjson": {
"local_location": null,
"remote_location": "https://milli-benchmarks.fra1.digitaloceanspaces.com/bench/datasets/hackernews/modification/05.ndjson",
"sha256": "be31d5632602f798e62d1c10c83bdfda2b4deaa068477eacde05fdd247572b82"
},
"hackernews-01-modified-filters.ndjson": {
"local_location": null,
"remote_location": "https://milli-benchmarks.fra1.digitaloceanspaces.com/bench/datasets/hackernews/modification/01-modified-filters.ndjson",
"sha256": "b80c245ce1b1df80b9b38800f677f3bd11947ebc62716fb108269d50e796c35c"
}
},
"precommands": [
{
"route": "indexes/movies/settings",
"method": "PATCH",
"body": {
"inline": {
"displayedAttributes": [
"title",
"by",
"score",
"time",
"text"
],
"searchableAttributes": [
"title",
"text"
],
"filterableAttributes": [
"by",
"kids",
"parent"
],
"sortableAttributes": [
"score",
"time"
]
}
},
"synchronous": "WaitForTask"
},
{
"route": "indexes/movies/documents",
"method": "POST",
"body": {
"asset": "hackernews-01.ndjson"
},
"synchronous": "WaitForResponse"
},
{
"route": "indexes/movies/documents",
"method": "POST",
"body": {
"asset": "hackernews-02.ndjson"
},
"synchronous": "WaitForResponse"
},
{
"route": "indexes/movies/documents",
"method": "POST",
"body": {
"asset": "hackernews-03.ndjson"
},
"synchronous": "WaitForResponse"
},
{
"route": "indexes/movies/documents",
"method": "POST",
"body": {
"asset": "hackernews-04.ndjson"
},
"synchronous": "WaitForTask"
}
],
"commands": [
{
"route": "indexes/movies/documents",
"method": "POST",
"body": {
"asset": "hackernews-01-modified-filters.ndjson"
},
"synchronous": "WaitForTask"
}
]
}

Some files were not shown because too many files have changed in this diff Show More