Compare commits

...

571 Commits

Author SHA1 Message Date
1cf14e765f Change implementation of MergedDocuments::iter_top_level_fields 2024-12-09 09:38:21 +01:00
4a082683df Merge #5131
Some checks failed
Test suite / Tests on ${{ matrix.os }} (windows-2022) (push) Failing after 21s
Test suite / Tests on ubuntu-20.04 (push) Failing after 10s
Test suite / Tests almost all features (push) Has been skipped
Test suite / Test disabled tokenization (push) Has been skipped
Test suite / Run tests in debug (push) Failing after 10s
Test suite / Run Rustfmt (push) Successful in 1m25s
Test suite / Run Clippy (push) Successful in 5m54s
Test suite / Tests on ${{ matrix.os }} (macos-13) (push) Has been cancelled
5131: Ignore documents whose selected fields didn't change r=dureuill a=dureuill

Attempts to improve the new indexer performance by ignoring documents whose selected fields didn't change:

- Add `Update::has_changed_for_fields` function
- Ignore documents whose searchable attributes didn't change for word docids and word pair proximity extraction
- Ignore documents whose faceted attributes didn't change for facet extraction

Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-12-05 16:04:16 +00:00
26be5e0733 Merge #5123
5123: Fix batch details r=dureuill a=irevoire

# Pull Request

## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/5079
Fixes https://github.com/meilisearch/meilisearch/issues/5112

## What does this PR do?
- Make the processing tasks actually processing in the stats of the batch instead of enqueued
- Stop counting one extra task for all non-prioritized batches in the stats
- Add a test

Co-authored-by: Tamo <tamo@meilisearch.com>
2024-12-05 15:21:55 +00:00
bd5110a2fe Fix clippy warnings 2024-12-05 16:13:07 +01:00
fa8b9acdf6 Ignore documents that didn't change in facets 2024-12-05 16:12:52 +01:00
2b74d1824b Ignore documents that didn't change any field in word pair proximity 2024-12-05 15:56:22 +01:00
c77b00d3ac Don't extract word docids when no searchable changed 2024-12-05 15:51:58 +01:00
c77073efcc Update::has_changed_for_fields 2024-12-05 15:50:12 +01:00
1537323eb9 Merge #5119
5119: Settings opt out error msg r=Kerollmops a=ManyTheFish

# Pull Request

## Related issue
PRD: https://meilisearch.notion.site/API-usage-Settings-to-opt-out-indexing-features-fff4b06b651f8108ade3f858aeb16b14?pvs=4
## What does this PR do?

Add a new error code and message when the user tries a facet search on an index where the facet search is disabled:
```json
{
  "message": "The facet search is disabled for this index",
  "code": "facet_search_disabled",
  "type": "invalid_request",
  "link": "https://docs.meilisearch.com/errors#invalid_facet_search_disabled"
}
 ```


Co-authored-by: ManyTheFish <many@meilisearch.com>
2024-12-05 13:51:11 +00:00
a0a3b55700 Change error code 2024-12-05 14:48:29 +01:00
214b51de87 try to fix the snapshot on demand flaky test 2024-12-05 14:45:54 +01:00
95975944d7 fix the dumps missing the empty swap index tasks 2024-12-05 14:23:38 +01:00
9a9383643f Merge #5125
Some checks failed
Test suite / Tests on ${{ matrix.os }} (macos-13) (push) Waiting to run
Test suite / Tests on ${{ matrix.os }} (windows-2022) (push) Failing after 37s
Test suite / Tests on ubuntu-20.04 (push) Failing after 15s
Test suite / Tests almost all features (push) Has been skipped
Test suite / Test disabled tokenization (push) Has been skipped
Test suite / Run tests in debug (push) Failing after 12s
Test suite / Run Rustfmt (push) Successful in 2m14s
Test suite / Run Clippy (push) Successful in 12m4s
5125: Change the default max memory usage to 5% of the total memory r=ManyTheFish a=Kerollmops

After thorough testing, we found that giving 5% of the total available memory to allocate resident memory (caches and channels) is the best approach.

The main reason is that the new indexer is highly memory-map oriented, with LMDB, and reads the database while performing the indexation. So, by allowing the maximum amount of memory available to LMDB and the OS, it will perform the key-value store reads and all other indexation operations faster by keeping more pages hot in the cache. In #5124, we also sorted the entries to merge to improve the read speed of LMDB.

This is common in database management systems: Reading stuff on the disk is much faster when done in lexicographic order (the default sorted order of key values). The entries have a great chance of already being in the OS memory cache, as they were loaded in a previous read, and reading stuff on the disk is very slow compared to reading memory.

Co-authored-by: Kerollmops <clement@meilisearch.com>
2024-12-05 10:11:25 +00:00
cac355bfa7 Merge #5124
5124: Optimize Prefixes and Merges r=ManyTheFish a=Kerollmops

In this PR, we plan to optimize the read of LMDB to use read the entries in lexicographic order and better use the memory-mapping OS cache:

 - Optimize the prefix generation for word position docids (`@manythefish)`
 - Optimize the parallel merging of the caches to sort entries before merging the caches (`@kerollmops)`
 
## Benchmarks on 1cpu 2gb gpo3 (5k IOps)
 
Before on the tag meilisearch-v1.12.0-rc.3.

```
word_position_docids:merge_and_send_docids: 988s
compute_word_fst: 23.3s
word_pair_proximity_docids:merge_and_send_docids: 428s
compute_word_prefix_fid_docids:recompute_modified_prefixes: 76.3s
compute_word_prefix_position_docids:recompute_modified_prefixes:from_prefixes: 429s
```

After sorting the whole `HashMap`s in a `Vec` on this branch.

```
word_position_docids:merge_and_send_docids: 202s
compute_word_fst: 20.4s
word_pair_proximity_docids:merge_and_send_docids: 427s
compute_word_prefix_fid_docids:recompute_modified_prefixes: 65.5s
compute_word_prefix_position_docids:recompute_modified_prefixes:from_prefixes: 62.5s
```

Co-authored-by: ManyTheFish <many@meilisearch.com>
Co-authored-by: Kerollmops <clement@meilisearch.com>
2024-12-05 09:35:52 +00:00
9020a50df8 Change the default max memory usage to 5% of the total memory 2024-12-05 10:14:46 +01:00
52843123d4 Clean up and remove the non-sorted merge_caches function 2024-12-05 10:03:05 +01:00
6298db5bea Merge #5113
5113: Fix the Minimum BBQueue channel threshold r=Kerollmops a=Kerollmops



Co-authored-by: Kerollmops <clement@meilisearch.com>
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-12-05 09:01:02 +00:00
a003a0934a Merge #5121
Some checks failed
Test suite / Tests on ${{ matrix.os }} (macos-13) (push) Waiting to run
Test suite / Tests almost all features (push) Has been skipped
Test suite / Test disabled tokenization (push) Has been skipped
Test suite / Tests on ubuntu-20.04 (push) Failing after 11s
Test suite / Run tests in debug (push) Failing after 9s
Test suite / Tests on ${{ matrix.os }} (windows-2022) (push) Failing after 24s
Test suite / Run Rustfmt (push) Successful in 1m19s
Test suite / Run Clippy (push) Successful in 5m32s
5121: Make the tasks pulling timeout configurable r=dureuill a=Kerollmops



Co-authored-by: Kerollmops <clement@meilisearch.com>
2024-12-04 17:04:14 +00:00
3a11e39c01 Force max_memory to a min of 100MiB 2024-12-04 17:53:30 +01:00
5f896b1050 Fix geo when spilling 2024-12-04 17:51:12 +01:00
d0c4e6da6b Make clippy happy 2024-12-04 17:39:10 +01:00
2da5584bb5 Make the tasks pulling timeout configurable 2024-12-04 17:39:07 +01:00
b7eb802ae6 Merge #5120
5120: Add cross tasks r=Kerollmops a=ManyTheFish

Add 4 xtask bench workloads:
- `hackernews-add-new-documents`: adds new documents on a db already containing documents
- `hackernews-modify-facet-numbers`: modify filterable fields containing numbers of documents on a db already containing documents
- `hackernews-modify-facet-strings`: modify filterable fields containing strings of documents on a db already containing documents
- `hackernews-modify-searchables`: modify searchable fields of documents on a db already containing documents

Co-authored-by: ManyTheFish <many@meilisearch.com>
2024-12-04 16:16:57 +00:00
2e32d0474c Lexicographically sort all the map to merge 2024-12-04 17:05:11 +01:00
cb99ac6f7e Consume vec instead of draining 2024-12-04 17:00:22 +01:00
be411435f5 Use the merge_caches_alt function in the docids merging 2024-12-04 16:37:29 +01:00
29ef164530 Introduce a new semi ordered merge function 2024-12-04 16:33:35 +01:00
739c52a3cd Replace HashSets by BTreeSets for the prefixes 2024-12-04 16:16:48 +01:00
7a2af06b1e update the impacted snapshots 2024-12-04 15:52:24 +01:00
cb0c3a5aad stop adding one enqueued tasks to all unprioritized batches 2024-12-04 15:48:28 +01:00
8388698993 Fix dat hash 2024-12-04 15:09:10 +01:00
cbcf6c9ba3 make the processing tasks as processing in a batch 2024-12-04 14:48:48 +01:00
bf742d81cf add a test 2024-12-04 14:47:02 +01:00
7458f0386c fix asset name 2024-12-04 14:44:57 +01:00
fc1df5793c fix tests 2024-12-04 14:35:20 +01:00
3ded069042 Merge #5122
5122: Yield the BBQueue writing loop r=ManyTheFish a=Kerollmops

We prefer yielding to let the writing thread do its job instead of spin looping.

Co-authored-by: Kerollmops <clement@meilisearch.com>
2024-12-04 13:33:51 +00:00
261d2ceb06 Yield the BBQueue writer instead of spin looping 2024-12-04 14:16:40 +01:00
1a17e2e572 fix formating 2024-12-04 13:57:06 +01:00
5b8cd68abe Merge #5110
5110: Increase margin on deletion of task r=dureuill a=irevoire

# Pull Request

## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/5077

## What does this PR do?
- Increase the margin we keep to enqueue task deletion

The issue was that we had not enough space on the reserved memory to write both the batch and the deletion task we just enqueued.
We could fix it only for this test as it’s not an issue in production where we have 10GiB of margin, but I thought it wasn’t a bad idea either to increase our margin a bit since we’re effectively writing more to lmdb.


Co-authored-by: Tamo <tamo@meilisearch.com>
2024-12-04 12:54:48 +00:00
5ce9acb0b9 Add workloads 2024-12-04 12:19:19 +01:00
953a82ca04 Add new error message 2024-12-04 11:15:29 +01:00
54341c2e80 Merge #5118
5118: Change the reserve and grant function to accept a closure r=ManyTheFish a=Kerollmops

This simplifies the usage of the grant and commits it at the right time, just after having written in it.

Co-authored-by: Kerollmops <clement@meilisearch.com>
2024-12-04 10:12:39 +00:00
96831ed9bb Send the WakeUp message if necessary in the reserve function 2024-12-04 11:03:01 +01:00
0459b1a242 Change the reserve and grant function to accept a closure 2024-12-04 10:32:25 +01:00
8ecb726683 Fix the minimun BBQueue channel threshold 2024-12-03 15:49:11 +01:00
297e72e262 Merge #5111
Some checks failed
Test suite / Tests on ${{ matrix.os }} (macos-13) (push) Waiting to run
Test suite / Tests on ${{ matrix.os }} (windows-2022) (push) Failing after 43s
Test suite / Tests on ubuntu-20.04 (push) Failing after 11s
Test suite / Tests almost all features (push) Has been skipped
Test suite / Test disabled tokenization (push) Has been skipped
Test suite / Run tests in debug (push) Failing after 9s
Test suite / Run Clippy (push) Successful in 7m18s
Test suite / Run Rustfmt (push) Successful in 1m32s
5111: Update BBQueue repo to point to the Meilisearch org r=curquiza a=Kerollmops

This PR updates the milli dependencies to make BBQueue point to the Meilisearch org repo.

Co-authored-by: Clément Renault <clement@meilisearch.com>
2024-12-03 14:27:04 +00:00
0ad2f57a92 Update bbqueue repo to point to the meilisearch org 2024-12-03 12:00:04 +01:00
71d53f413f increase the margin allowed to delete task 2024-12-03 11:07:03 +01:00
054622bd16 Merge #5094
5094: Implement a bbqueue channel between the extractors and the writer r=dureuill a=Kerollmops

This PR switches from a bounded crossbeam channel only with allocated entries for the communication between the extractors and the writer to a [BBQueue](https://github.com/jamesmunns/bbqueue)-based system with a Single Producer Single Consumer kind of Circular/Ring Buffers channel.

 - [x] Implement the BBQueue channel system...
 - [x] with a crossbeam channel to wake up the receiver.
 - [x] Manage the BBQueue allocated memory dynamically.
 - [x] Support content that doesn't fit in the bbqueues.

Co-authored-by: Clément Renault <clement@meilisearch.com>
2024-12-03 08:00:55 +00:00
e905a72d73 remove mimalloc on Windows 2024-12-02 18:13:56 +01:00
2e879c1df8 Merge #5109
Some checks failed
Test suite / Tests on ${{ matrix.os }} (macos-13) (push) Waiting to run
Test suite / Tests almost all features (push) Has been skipped
Test suite / Test disabled tokenization (push) Has been skipped
Test suite / Tests on ubuntu-20.04 (push) Failing after 11s
Test suite / Run tests in debug (push) Failing after 11s
Test suite / Tests on ${{ matrix.os }} (windows-2022) (push) Failing after 24s
Test suite / Run Rustfmt (push) Successful in 1m22s
Test suite / Run Clippy (push) Successful in 6m29s
5109: Fix autobatch r=dureuill a=dureuill

Fixes most SDK tests and flaky failures

Changes:

- Make sure that the settings are not autobatched with document operations, as the new indexer no longer supports this operating mode

Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-12-02 16:30:51 +00:00
d040aff101 Stop allocating 1GiB for documents 2024-12-02 16:30:14 +01:00
5e30731cad Merge #5107
5107: While spamming the batches route we could see a processing batch becoming missing and then finished, this commit ensures the batches goes from processing to finished directly r=irevoire a=irevoire

# Pull Request

## Related issue
Fixes the failed tests from this PR: https://github.com/meilisearch/meilisearch-js/pull/1775
See [this message](https://meilisearch.slack.com/archives/CD7Q2UKGB/p1732784680450749) [private link] for more context

## What does this PR do?
- Ensure we never enter a state where a processing batches (only existing in RAM) becomes « Not found » by removing the processing batches AFTER writing them to disk
- This should also theoretically avoid an issue where a task could go from processing to enqueued and then finished


Co-authored-by: Tamo <tamo@meilisearch.com>
2024-12-02 14:36:29 +00:00
beeb31ce41 Update crates/index-scheduler/src/lib.rs 2024-12-02 15:32:16 +01:00
057143214d Fix warnings 2024-12-02 14:42:31 +01:00
6a1d26a60c Update autobatching tests 2024-12-02 14:15:15 +01:00
d78f4666a0 Fix autobatching of documents and settings 2024-12-02 12:25:01 +01:00
a439fa3e1a While spamming the batches route we could see a processing batch becoming missing and then finished, this commit ensures the batches goes from processing to finished directly 2024-12-02 12:02:16 +01:00
767259be7e Prefer returning a abort indexation rather than throwing a panic 2024-12-02 11:53:42 +01:00
e9f34fb4b1 Make the frame consumer pulling fair 2024-12-02 11:49:01 +01:00
d5c07ef7b3 Manage key length conversion error correctly 2024-12-02 11:03:00 +01:00
5e218f3f4d Remove a sync_all (mark my words) 2024-12-02 11:03:00 +01:00
bcab61ab1d Do spurious wake ups on the receiver side 2024-12-02 11:03:00 +01:00
263c5a348e Move the spin looping for BBQueue frames into a dedicated function 2024-12-02 10:33:49 +01:00
be7d2fbe63 Move the EntryHeader up in the file and document the safety related to the size 2024-12-02 10:19:11 +01:00
f7f9a131e4 Improve copying bytes into aligned memory area 2024-12-02 10:15:58 +01:00
5df5eb2db2 Clarify a method name 2024-12-02 10:10:48 +01:00
30eb0e5b5b Rename recv and read methods to recv_action and recv_frame 2024-12-02 10:08:01 +01:00
5b860cb989 Fix english in the doc 2024-12-02 10:06:35 +01:00
76d0623b11 Reduce the number of unwraps 2024-12-02 10:05:06 +01:00
db4eaf4d2d Rename serialize_into into serialize_into_writer 2024-12-02 10:03:27 +01:00
13f21206a6 Call the serialize_into_writer method from the serialize_into one 2024-12-02 10:03:01 +01:00
14ee7aa84c Make sure the BBQueue is at least 50 MiB 2024-11-28 18:02:48 +01:00
8a35cd1743 Adjust the BBQueue buffers to use 2% instead of 10% 2024-11-28 16:00:15 +01:00
8d33af1dff Merge #5102
Some checks failed
Test suite / Tests almost all features (push) Has been skipped
Test suite / Test disabled tokenization (push) Has been skipped
Test suite / Tests on ubuntu-20.04 (push) Failing after 24s
Test suite / Tests on ${{ matrix.os }} (windows-2022) (push) Failing after 28s
Test suite / Run tests in debug (push) Failing after 28s
Test suite / Run Rustfmt (push) Successful in 3m52s
Test suite / Run Clippy (push) Successful in 9m8s
Test suite / Tests on ${{ matrix.os }} (macos-13) (push) Has been cancelled
5102: Update mini-dashboard to v0.2.16 version r=curquiza a=curquiza

Fixes https://github.com/meilisearch/meilisearch/issues/5093

Fixes this bug: https://github.com/meilisearch/mini-dashboard/issues/563

Co-authored-by: curquiza <clementine@meilisearch.com>
2024-11-28 14:57:27 +00:00
3c7ac093d3 Take the BBQueue capacity into account in the max memory 2024-11-28 15:43:14 +01:00
d49d127863 Merge #5101
5101: Fix index settings opt out r=Kerollmops a=ManyTheFish

# Pull Request

## Related issue
Fixes #5099 

## What does this PR do?
- Refactor the settings implementation ensuring the routes are configured
- Add a test checking if all the routes are tested
- Refactor the tests to ease the modifications


Co-authored-by: ManyTheFish <many@meilisearch.com>
2024-11-28 14:23:33 +00:00
b57dd5c58e Remove the Vector variant and use the Vectors 2024-11-28 15:20:43 +01:00
90b428a8c3 Apply change requests 2024-11-28 15:16:13 +01:00
096a28656e Fix a bug around deleting all the vectors of a doc 2024-11-28 15:15:06 +01:00
3dc87f5baa Update mini-dashboard to v0.2.16 version 2024-11-28 14:33:05 +01:00
cc4bd54669 Correctly construct the Embeddings struct 2024-11-28 13:53:25 +01:00
5383f41bba Polish test_setting_routes! 2024-11-28 12:04:21 +01:00
58eab9a018 Send large payload through crossbeam 2024-11-28 12:01:06 +01:00
9f36ffcbdb Polish make_setting_routes! 2024-11-28 11:44:09 +01:00
68c4717e21 Change the settings tests and macros to avoid oversights 2024-11-28 11:34:35 +01:00
5c488e20cc Send the geo rtree through crossbeam channel 2024-11-27 18:03:45 +01:00
da650f834e Plug the NoPanicThreadPool in the tests and benchmarks 2024-11-27 17:04:49 +01:00
e83534a430 Fix the indexer::index to correctly use the rayon::ThreadPool 2024-11-27 16:27:43 +01:00
98d4a2909e Fix the way we spawn the rayon threadpool 2024-11-27 16:05:44 +01:00
a514ce472a Make clippy happy 2024-11-27 14:59:04 +01:00
cc63802115 Modify and return the IndexEmbeddings to write them later 2024-11-27 14:58:03 +01:00
acec45ad7c Send a WakeUp when writing data in the BBQueue buffers 2024-11-27 14:33:23 +01:00
08d6413365 Fix result types 2024-11-27 14:32:42 +01:00
70802eb7c7 Fix most issues with the lifetimes 2024-11-27 14:32:42 +01:00
6ac5b3b136 Finish most of the channels types 2024-11-27 14:32:26 +01:00
e1e76f39d0 Clean up dependencies 2024-11-27 14:30:34 +01:00
2094ce8a9a Move the arroy building after the writing loop 2024-11-27 14:30:33 +01:00
8442db8101 Implement mostly all senders 2024-11-27 14:16:35 +01:00
79671c9faa Implement a first version of the bbqueue channels 2024-11-27 14:15:00 +01:00
a2f64f6552 Merge #5095
Some checks failed
Test suite / Tests on ${{ matrix.os }} (macos-13) (push) Waiting to run
Test suite / Tests almost all features (push) Has been skipped
Test suite / Test disabled tokenization (push) Has been skipped
Test suite / Tests on ubuntu-20.04 (push) Failing after 13s
Test suite / Run tests in debug (push) Failing after 12s
Test suite / Tests on ${{ matrix.os }} (windows-2022) (push) Failing after 40s
Test suite / Run Rustfmt (push) Successful in 1m46s
Test suite / Run Clippy (push) Successful in 9m55s
5095: Span to measure the part of db writes that is after the merge/extraction r=curquiza a=dureuill



Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-11-27 11:10:00 +00:00
fde2e0691c Merge #5098
5098: Update charabia v0.9.2 r=dureuill a=ManyTheFish

# Pull Request

## Related issue
Fixes #5097

Co-authored-by: ManyTheFish <many@meilisearch.com>
2024-11-27 10:28:04 +00:00
18a9af353c Update Charabia version to v0.9.2 2024-11-27 11:12:08 +01:00
aae0dc715d Merge #5063
5063: Fix pagination when embedding fails r=Kerollmops a=dureuill

# Pull Request

## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/5045

## What does this PR do?
- Use `return_keyword_results` function when embedding fails


Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-11-27 09:13:28 +00:00
d0b2c0a523 Merge #5091
Some checks failed
Test suite / Tests on ${{ matrix.os }} (macos-13) (push) Waiting to run
Test suite / Tests almost all features (push) Has been skipped
Test suite / Test disabled tokenization (push) Has been skipped
Test suite / Tests on ubuntu-20.04 (push) Failing after 11s
Test suite / Run tests in debug (push) Failing after 10s
Test suite / Tests on ${{ matrix.os }} (windows-2022) (push) Failing after 39s
Test suite / Run Rustfmt (push) Successful in 1m38s
Test suite / Run Clippy (push) Successful in 23m11s
5091: Settings opt out r=Kerollmops a=ManyTheFish

# Pull Request

Related PRD: https://www.notion.so/meilisearch/API-usage-Settings-to-opt-out-indexing-features-fff4b06b651f8108ade3f858aeb16b14?pvs=4

## Related issue
Fixes #4979 

- [x] Add setting opt-out
- [x] Add analytics
- [x] Add tests


Co-authored-by: ManyTheFish <many@meilisearch.com>
Co-authored-by: Many the fish <many@meilisearch.com>
2024-11-26 15:50:28 +00:00
2e896f30a5 Fix PR comments 2024-11-26 16:06:33 +01:00
8f57b4fdf4 Span to measure the part of db writes that is after the merge/extraction 2024-11-26 14:46:36 +01:00
f014e78684 Update crates/milli/src/index.rs
Co-authored-by: Clément Renault <clement@meilisearch.com>
2024-11-26 14:46:01 +01:00
9008ecda3d Update crates/meilisearch-types/src/settings.rs
Co-authored-by: Clément Renault <clement@meilisearch.com>
2024-11-26 14:44:24 +01:00
d7bcfb2d19 fix clippy 2024-11-26 14:04:16 +01:00
fb66fec398 Merge #5092
Some checks failed
Test suite / Tests on ${{ matrix.os }} (macos-13) (push) Waiting to run
Test suite / Tests almost all features (push) Has been skipped
Test suite / Test disabled tokenization (push) Has been skipped
Test suite / Tests on ubuntu-20.04 (push) Failing after 12s
Test suite / Run tests in debug (push) Failing after 11s
Test suite / Tests on ${{ matrix.os }} (windows-2022) (push) Failing after 23s
Test suite / Run Rustfmt (push) Successful in 1m41s
Test suite / Run Clippy (push) Successful in 5m36s
5092: Precise spans for new indexer r=dureuill a=dureuill

- Separate extract and merge spans
- Add span around commit

Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-11-26 10:59:40 +00:00
fa15be5bc4 Add span around commit 2024-11-26 09:45:48 +01:00
aa460819a7 Add more precise spans 2024-11-26 09:45:36 +01:00
e241f91285 Merge #5062
5062: Fix bugs for v1.12 r=Kerollmops a=ManyTheFish

# Pull Request

## Related issue
Fixes #4984
Fixes https://github.com/meilisearch/meilisearch/issues/4974
Fixes [SDK test](https://github.com/meilisearch/meilisearch/actions/runs/11886701996/job/33118278794)
## What does this PR do?
- add 3 tests
- fix bugs

Co-authored-by: ManyTheFish <many@meilisearch.com>
2024-11-26 08:10:50 +00:00
d66dc363ed Test and implement settings opt-out 2024-11-25 18:23:22 +01:00
5560452ef9 Merge #5089
5089: Improve error handling when writing into LMDB r=dureuill a=Kerollmops

This PR exposes two new internal error variants: `StoreDelete` and `StorePut`. So that the error messages are better when we fail at writing into LMDB.

Related to #5078

Co-authored-by: Clément Renault <clement@meilisearch.com>
2024-11-25 16:19:41 +00:00
d9df7e00e1 Merge #5090
5090: Use the published crates versions r=dureuill a=Kerollmops

This PR uses the published versions of the obkv, grenad, and roaring crates in milli and Meilisearch.

Related to #5078.


Co-authored-by: Clément Renault <clement@meilisearch.com>
2024-11-25 15:33:55 +00:00
b4fb2dabd4 Use the grenad rayon feature 2024-11-25 16:31:21 +01:00
5606679c53 Use the obkv and grenad crates.io versions 2024-11-25 16:24:59 +01:00
a3103f347e Fix the facet f64 database name 2024-11-25 16:05:31 +01:00
25aac45fc7 Expose better error messages 2024-11-25 15:54:43 +01:00
dd76eaaaec Merge #5076
Some checks failed
Test suite / Tests on ubuntu-20.04 (push) Failing after 11s
Test suite / Tests almost all features (push) Has been skipped
Test suite / Test disabled tokenization (push) Has been skipped
Test suite / Tests on ${{ matrix.os }} (windows-2022) (push) Failing after 24s
Test suite / Run tests in debug (push) Failing after 10s
Test suite / Run Clippy (push) Successful in 6m35s
Test suite / Run Rustfmt (push) Successful in 1m52s
Run the indexing fuzzer / Setup the action (push) Successful in 1h5m8s
Test suite / Tests on ${{ matrix.os }} (macos-13) (push) Has been cancelled
Indexing bench (push) / Run and upload benchmarks (push) Has been cancelled
Benchmarks of indexing (push) / Run and upload benchmarks (push) Has been cancelled
Benchmarks of search for geo (push) / Run and upload benchmarks (push) Has been cancelled
Benchmarks of search for songs (push) / Run and upload benchmarks (push) Has been cancelled
Benchmarks of search for Wikipedia articles (push) / Run and upload benchmarks (push) Has been cancelled
5076: Update version for the next release (v1.12.0) in Cargo.toml r=curquiza a=meili-bot

⚠️ This PR is automatically generated. Check the new version is the expected one and Cargo.lock has been updated before merging.

Co-authored-by: curquiza <curquiza@users.noreply.github.com>
2024-11-21 17:51:32 +00:00
98a785b0d7 Merge #5080
Some checks failed
Indexing bench (push) / Run and upload benchmarks (push) Waiting to run
Benchmarks of indexing (push) / Run and upload benchmarks (push) Waiting to run
Benchmarks of search for geo (push) / Run and upload benchmarks (push) Waiting to run
Benchmarks of search for songs (push) / Run and upload benchmarks (push) Waiting to run
Benchmarks of search for Wikipedia articles (push) / Run and upload benchmarks (push) Waiting to run
Run the indexing fuzzer / Setup the action (push) Successful in 1h5m43s
Test suite / Tests on ${{ matrix.os }} (macos-13) (push) Waiting to run
Publish binaries to GitHub release / Publish binary for ${{ matrix.os }} (meilisearch, meilisearch-macos-amd64, macos-13) (push) Waiting to run
Publish binaries to GitHub release / Publish binary for macOS silicon (meilisearch-macos-apple-silicon, aarch64-apple-darwin) (push) Waiting to run
Look for flaky tests / flaky (push) Failing after 21s
Test suite / Tests on ubuntu-20.04 (push) Failing after 10s
Test suite / Tests on ${{ matrix.os }} (windows-2022) (push) Failing after 22s
Test suite / Tests almost all features (push) Failing after 7s
Test suite / Test disabled tokenization (push) Failing after 7s
Test suite / Run tests in debug (push) Failing after 9s
Test suite / Run Rustfmt (push) Successful in 1m24s
Test suite / Run Clippy (push) Successful in 6m14s
Publish binaries to GitHub release / Check the version validity (push) Successful in 7s
Publish binaries to GitHub release / Publish binary for Linux (push) Failing after 9s
Publish binaries to GitHub release / Publish binary for ${{ matrix.os }} (meilisearch.exe, meilisearch-windows-amd64.exe, windows-2022) (push) Failing after 19s
Publish binaries to GitHub release / Publish binary for aarch64 (meilisearch-linux-aarch64, aarch64-unknown-linux-gnu) (push) Failing after 9s
5080: Fix getting a single batch through the GET route r=Kerollmops a=dureuill

# Pull Request

## Related issue
Fixes a bug where getting a single batch does not work

Related to #5070 


fix by `@Kerollmops` 

Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-11-21 17:08:46 +00:00
ba7500998e Fix getting a single batch through the GET route 2024-11-21 17:59:31 +01:00
19e6f675b3 Merge #4900
4900: Indexer edition 2024 r=Kerollmops a=dureuill

This PR is implementing the indexer edition 2024, largely inspired by [the ideas from this blog post](https://blog.kerollmops.com/meilisearch-is-too-slow).

Fixes https://github.com/meilisearch/meilisearch/issues/4985

## Features
- Stream-first approach to reading documents.
- Minimum disk write operations.
- RAM usage-first approach to avoid modifying common bitmaps on disk but in memory.
- Reduced LMDB fragmentation by writing entries only once...
- ...computing the final version of the entries in parallel...
- ...and storing them in write-optimized data structures before sending them to the BTree (LMDB).
- Indexing in multiple transactions to improve large dataset support (dumps).


Co-authored-by: ManyTheFish <many@meilisearch.com>
Co-authored-by: Clément Renault <clement@meilisearch.com>
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-11-21 16:19:10 +00:00
323ecbb885 Add span on document operation 2024-11-21 17:01:10 +01:00
ffb60cb885 Add comment explaining why we fixed the version of insta 2024-11-21 16:56:56 +01:00
dcc3caef0d Remove TopLevelMap 2024-11-21 16:56:46 +01:00
221e547e86 Slight changes 2024-11-21 16:47:44 +01:00
61d0615253 Document the geo point extractor 2024-11-21 16:47:08 +01:00
5727e00374 Remove useless geo skipped 2024-11-21 16:47:08 +01:00
9b60843831 Remove commented lines 2024-11-21 16:47:07 +01:00
36962b943b First batch of PR comment 2024-11-21 16:38:11 +01:00
32bcacefd5 Changes Document::len to Document::top_level_fields_count 2024-11-21 15:01:07 +01:00
4ed195426c remove unused stuff in global.rs 2024-11-21 15:01:07 +01:00
ff38f29981 Update crates/index-scheduler/src/batch.rs
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-11-21 14:18:39 +01:00
5899861ff0 Update version for the next release (v1.12.0) in Cargo.toml 2024-11-21 11:21:18 +00:00
94b260fd25 Remove orphan span 2024-11-21 12:12:07 +01:00
03ab6b39e7 Revert the change in run count for movies workload 2024-11-21 11:17:34 +01:00
ab2c83f868 Use the disk less when computing prefixes 2024-11-21 10:45:37 +01:00
9a08757a70 Merge #5070
Some checks failed
Test suite / Tests on ubuntu-20.04 (push) Failing after 12s
Test suite / Tests on ${{ matrix.os }} (windows-2022) (push) Failing after 11s
Test suite / Tests almost all features (push) Has been skipped
Test suite / Test disabled tokenization (push) Has been skipped
Test suite / Run tests in debug (push) Failing after 10s
Test suite / Run Clippy (push) Successful in 6m18s
Test suite / Run Rustfmt (push) Successful in 1m34s
Indexing bench (push) / Run and upload benchmarks (push) Waiting to run
Benchmarks of indexing (push) / Run and upload benchmarks (push) Waiting to run
Benchmarks of search for geo (push) / Run and upload benchmarks (push) Waiting to run
Benchmarks of search for songs (push) / Run and upload benchmarks (push) Waiting to run
Benchmarks of search for Wikipedia articles (push) / Run and upload benchmarks (push) Waiting to run
Run the indexing fuzzer / Setup the action (push) Successful in 1h4m33s
Test suite / Tests on ${{ matrix.os }} (macos-13) (push) Has been cancelled
5070: Improve the details and stats of the current batch processing r=Kerollmops a=irevoire

Small improvement we missed over https://github.com/meilisearch/meilisearch/pull/5060

The current batch processing had empty details and stats.

Co-authored-by: Tamo <tamo@meilisearch.com>
2024-11-20 16:56:01 +00:00
1f9692cd04 Increase map size for tests 2024-11-20 17:52:21 +01:00
1e694ae432 improve the count of the number of tasks in a batch 2024-11-20 17:48:26 +01:00
71807cac6d makes clippy happy 2024-11-20 17:40:58 +01:00
21a2264782 improve the details and stats of the current batch processing 2024-11-20 17:25:55 +01:00
bda2b41d11 update snaps after merge 2024-11-20 17:08:30 +01:00
6e6acfcf1b Merge branch 'main' into indexer-edition-2024 2024-11-20 16:59:58 +01:00
e0864f1b21 Separate side effect and debug asserts 2024-11-20 16:25:17 +01:00
a38344acb3 Replace eprintlns by tracing 2024-11-20 15:29:51 +01:00
4d616f8794 Parse every attributes and filter before tokenization 2024-11-20 15:15:25 +01:00
ff9c92c409 rename documents -> substep 2024-11-20 15:12:02 +01:00
8380ddbdcd Fix progress of into_changes 2024-11-20 15:10:09 +01:00
d4d8becfa7 Merge #5060
Some checks failed
Indexing bench (push) / Run and upload benchmarks (push) Waiting to run
Benchmarks of indexing (push) / Run and upload benchmarks (push) Waiting to run
Benchmarks of search for geo (push) / Run and upload benchmarks (push) Waiting to run
Benchmarks of search for songs (push) / Run and upload benchmarks (push) Waiting to run
Benchmarks of search for Wikipedia articles (push) / Run and upload benchmarks (push) Waiting to run
Publish binaries to GitHub release / Check the version validity (push) Successful in 11s
Publish binaries to GitHub release / Publish binary for ${{ matrix.os }} (meilisearch, meilisearch-macos-amd64, macos-13) (push) Waiting to run
Publish binaries to GitHub release / Publish binary for macOS silicon (meilisearch-macos-apple-silicon, aarch64-apple-darwin) (push) Waiting to run
Publish binaries to GitHub release / Publish binary for ${{ matrix.os }} (meilisearch.exe, meilisearch-windows-amd64.exe, windows-2022) (push) Failing after 21s
Publish binaries to GitHub release / Publish binary for Linux (push) Failing after 12s
Publish binaries to GitHub release / Publish binary for aarch64 (meilisearch-linux-aarch64, aarch64-unknown-linux-gnu) (push) Failing after 10s
Run the indexing fuzzer / Setup the action (push) Successful in 1h5m1s
Test suite / Tests on ubuntu-20.04 (push) Failing after 12s
Test suite / Tests on ${{ matrix.os }} (macos-13) (push) Waiting to run
Test suite / Tests almost all features (push) Failing after 9s
Test suite / Test disabled tokenization (push) Failing after 8s
Test suite / Run tests in debug (push) Failing after 10s
Test suite / Tests on ${{ matrix.os }} (windows-2022) (push) Failing after 40s
Test suite / Run Rustfmt (push) Successful in 1m28s
Test suite / Run Clippy (push) Successful in 5m29s
5060: Batch route r=Kerollmops a=irevoire

# Pull Request

See [usage](https://www.notion.so/meilisearch/Enhance-visibility-on-batched-tasks-1194b06b651f810b8fe0fab5d72846a8).

## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/4977

## What does this PR do?
- For more detailed information, see the PRD.
- Added a `batchUid` to the tasks (that's the cause of all the updates of the dumps):
  - For all enqueued tasks, it's set to `None`
  - For every other tasks it must be set to something
  - ⚠️ For all the tasks imported in a dump, the `batchUid` will be set to `None` as well.
- Add two new routes:
  - `GET /batches/:uid` - to query a batch by its id
  - `GET /batches` - to retrieve a list of batches. It accepts all the same query parameters that are available on the `GET /tasks` route
- Adds new databases to query the batches directly:
  - When doing a query against the batches, the rule of thumb is that we want to return a batch iif **at least one** task in it matches the provided filter.
  - We don't need a `canceledBy` batch specific database because we can just retrieve the task and if it's a `taskCancelation` retrieve its `batchUid`
- The task cancelation has been updated and simplified a bit:
  - Instead of updating the matching tasks on disk while processing the cancelation task, we instead retrieve the task and let the `tick` function do the work afterward.
  - In the `tick` function, we now have to take care of not missing any tasks
- All the tests applied to the tasks were duplicated and updated to works with the new batches routes
- The deletion of batches doesn't contain any tests because it's already tested in the deletion of tasks (and especially highlighted in the snapshots)


Currently, one part of the PRD is not implemented: it's the progress.

Co-authored-by: Tamo <tamo@meilisearch.com>
2024-11-20 14:07:48 +00:00
867138f166 Add SP to into_changes 2024-11-20 15:07:05 +01:00
567bd4538b Fxi the into_changes stop processing 2024-11-20 14:58:25 +01:00
84600a10d1 Add MSP to document_update.into_changes() 2024-11-20 14:53:37 +01:00
35bbe1c2a2 Add failing test on settings changes 2024-11-20 14:48:12 +01:00
7d64e8dbd3 Fix Windows compilation 2024-11-20 14:40:38 +01:00
ec06879d28 apply review changes 2024-11-20 14:40:36 +01:00
83d1f858c1 Update crates/index-scheduler/src/lib.rs
Co-authored-by: Clément Renault <clement@meilisearch.com>
2024-11-20 14:36:05 +01:00
cae8c89467 "fix" last warnings 2024-11-20 14:03:52 +01:00
a7ac590e9e implements the reverse query parameter for the batches 2024-11-20 13:29:52 +01:00
7cb8732b45 Introduce a new bincode internal error 2024-11-20 13:23:11 +01:00
8ad68dd708 stop leaking the update files of the canceled tasks 2024-11-20 13:17:54 +01:00
fe5d50969a Fix filed selector in extrators 2024-11-20 13:16:44 +01:00
56c7c5d5f0 Fix comments 2024-11-20 13:16:44 +01:00
4cdfdddd6d Fix one more 2024-11-20 13:16:43 +01:00
2afa33011a Fix tokenize_document 2024-11-20 13:16:43 +01:00
61feca1f41 More tests pass 2024-11-20 13:16:43 +01:00
f893b5153e Don't mark [""] as empty facet 2024-11-20 13:16:42 +01:00
ca779c21f9 facets: Handle boolean and skip empty strings 2024-11-20 13:16:42 +01:00
477077bdc2 Remove _vectors from fid map when there are no vectors in sight 2024-11-20 13:16:42 +01:00
b1f8aec348 Fix index_documents_check_exists_database 2024-11-20 13:16:41 +01:00
ba7f091db3 Use tokenizer on numbers and booleans 2024-11-20 13:16:41 +01:00
8049df125b Add depth to facet extraction so that null inside an array doesn't mark the entire field as null 2024-11-20 13:16:40 +01:00
50d1bd01df We no longer index geo lat and lng 2024-11-20 13:16:40 +01:00
a28d4f5d0c Fix setup_search_index_with_criteria 2024-11-20 13:16:40 +01:00
fc14f4bc66 Attempt to fix setup_search_index_with_criteria 2024-11-20 13:16:39 +01:00
5f8a82d6f5 Improve test 2024-11-20 13:16:39 +01:00
fe04e51a49 One more 2024-11-20 13:16:38 +01:00
01b27e40ad Fix a bit of the placeholder search tests 2024-11-20 13:16:38 +01:00
8076d98544 Fix stats_should_not_return_deleted_documents 2024-11-20 13:16:37 +01:00
9e951baad5 One more test passing 2024-11-20 13:16:37 +01:00
52f2fc4c46 Fail in case of user error in tests 2024-11-20 13:16:37 +01:00
3957917e0b Correctly count indexed documents 2024-11-20 13:16:36 +01:00
651c30899e Allow fetching embedders from inside tests 2024-11-20 13:16:36 +01:00
2c7a7fe4e8 Count the number of documents correctly 2024-11-20 13:16:35 +01:00
23f0c2c29b Generate internal ids only when needed 2024-11-20 13:16:35 +01:00
6641c3f59b Remove all autogenerated tests 2024-11-20 13:16:34 +01:00
07a72824b7 Subfields of _vectors are no longer part of the fid map 2024-11-20 13:16:34 +01:00
000eb55c4e fix one 2024-11-20 13:16:34 +01:00
b4bf7ce9b0 Increase the number of readers as the indexer uses readers too 2024-11-20 13:16:33 +01:00
1aef0e4037 documents! macro accepts a single object again 2024-11-20 13:16:33 +01:00
32d0e50a75 Fix all the benchmark compilation errors 2024-11-20 13:16:32 +01:00
df5884b0c1 Fix settings test 2024-11-20 13:16:32 +01:00
9e0eb5ebb0 Removed some warnings 2024-11-20 13:16:32 +01:00
3cf1352ae1 Fix the benchmark tests 2024-11-20 13:16:31 +01:00
aba8a0e9e0 Fix some tests but not all of them 2024-11-20 13:16:31 +01:00
670aff5553 Remove useless Transform methods 2024-11-20 13:16:08 +01:00
7e379b3d14 remove useless prints 2024-11-20 12:27:12 +01:00
56eacd221f update the tests after the rebase 2024-11-20 10:54:38 +01:00
bdb51a85fe now that the task cancelation shares their started at with all the tasks of their batch we don't need the trick of retrieving the previous batch anymore 2024-11-20 10:51:07 +01:00
b24a34830d fix the dump test -> the only change is that we now have a null batch_uid in all the tasks 2024-11-20 10:51:06 +01:00
e145d71a62 implements the two last TODOs 2024-11-20 10:51:06 +01:00
d9a4e69990 push a missing snapshot 2024-11-20 10:51:06 +01:00
b906e3ed70 improve the way we access the mutex 2024-11-20 10:51:06 +01:00
4abcd9c04e add some stats on the batches 2024-11-20 10:51:06 +01:00
229fa0f902 implements the batch details 2024-11-20 10:51:06 +01:00
5d10c2312b remove unused file 2024-11-20 10:51:06 +01:00
f1d38581e5 add the front end tests on the batches routes 2024-11-20 10:51:06 +01:00
62646af7b9 implements the automatic batch deletion 2024-11-20 10:51:06 +01:00
1fcb9526f5 fix the task cancelation 2024-11-20 10:51:06 +01:00
15eefa4fcc fixes a lot of small issue, the test about the cancellation is still failing 2024-11-20 10:51:05 +01:00
ad9763ffcd copy multiple task query tests to batches. Currently, they fails 2024-11-20 10:49:25 +01:00
d489f5635f add the mapping between the task and batches 2024-11-20 10:49:23 +01:00
a1251c3c83 Implements the get all batches route with filters working 2024-11-20 10:42:55 +01:00
6062914654 add the batch_id to the tasks 2024-11-20 10:42:54 +01:00
057fcb3993 Add indices field to _matchesPosition to specify where in an array a match comes from (#5005)
Some checks are pending
Indexing bench (push) / Run and upload benchmarks (push) Waiting to run
Benchmarks of indexing (push) / Run and upload benchmarks (push) Waiting to run
Benchmarks of search for geo (push) / Run and upload benchmarks (push) Waiting to run
Benchmarks of search for songs (push) / Run and upload benchmarks (push) Waiting to run
Benchmarks of search for Wikipedia articles (push) / Run and upload benchmarks (push) Waiting to run
Run the indexing fuzzer / Setup the action (push) Successful in 1h4m31s
* Remove unreachable code

* Add `indices` field to `MatchBounds`

For matches inside arrays, this field holds the indices of the array
elements that matched. For example, searching for `cat` inside
`{ "a": ["dog", "cat", "fox"] }` would return `indices: [1]`. For nested
arrays, this contains multiple indices, starting with the one for the
top-most array. For matches in fields without arrays, `indices` is not
serialized (does not exist) to save space.
2024-11-20 01:00:43 +01:00
41dbdd2d18 Fix filtered_placeholder_search_should_not_return_deleted_documents and word_scale_set_and_reset 2024-11-19 16:08:25 +01:00
bfefaf71c2 Progress displayed in logs 2024-11-19 09:32:52 +01:00
c782c09208 Move step to a dedicated mod and replace it with an enum 2024-11-18 18:22:13 +01:00
75943a5a9b Add TODO to remember replacing steps with an enum 2024-11-18 17:40:51 +01:00
c1d8ee2a8d Merge #5048
Some checks failed
Test suite / Tests almost all features (push) Has been skipped
Test suite / Test disabled tokenization (push) Has been skipped
Test suite / Tests on ubuntu-20.04 (push) Failing after 14s
Test suite / Run tests in debug (push) Failing after 24s
Test suite / Tests on ${{ matrix.os }} (windows-2022) (push) Failing after 57s
Test suite / Run Rustfmt (push) Successful in 1m36s
Test suite / Run Clippy (push) Successful in 6m8s
Indexing bench (push) / Run and upload benchmarks (push) Waiting to run
Benchmarks of indexing (push) / Run and upload benchmarks (push) Waiting to run
Benchmarks of search for geo (push) / Run and upload benchmarks (push) Waiting to run
Benchmarks of search for songs (push) / Run and upload benchmarks (push) Waiting to run
Benchmarks of search for Wikipedia articles (push) / Run and upload benchmarks (push) Waiting to run
Run the indexing fuzzer / Setup the action (push) Successful in 1h4m23s
Test suite / Tests on ${{ matrix.os }} (macos-13) (push) Has been cancelled
5048: Reverse the order of the task queue r=Kerollmops a=irevoire

# Pull Request

## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/5047

## What does this PR do?
- Provide a new parameter to reverse the order of the task queue
- Add tests
- Remove some unrelated tests that were duplicated in tests/tasks/mod.rs and tests/tasks/error.rs


Co-authored-by: Tamo <tamo@meilisearch.com>
2024-11-18 16:24:12 +00:00
04c38220ca Move MostlySend, ThreadLocal, FullySend to their own commit 2024-11-18 16:43:05 +01:00
5f93651cef fixes 2024-11-18 16:23:11 +01:00
510ca99996 Fixes #4974 2024-11-18 16:08:55 +01:00
8924d486db Add a test reproducing the bug 2024-11-18 16:08:55 +01:00
e0c3f3d560 Fix #4984 2024-11-18 16:08:53 +01:00
0a21d9bfb3 Fix double borrow of new fields id map 2024-11-18 15:56:01 +01:00
1f8b01a598 Fix snap since _vectors is no longer part of the field distributions 2024-11-18 12:50:59 +01:00
e736a74729 Remove infinite loop in import_vectors 2024-11-18 12:50:56 +01:00
e9d17136b2 Add deadline of 3 seconds to embedding requests made in the context of hybrid search 2024-11-18 12:15:11 +01:00
a05e448cf8 Add test 2024-11-18 12:15:11 +01:00
cd796b0f4b Fix SDK test 2024-11-18 11:46:00 +01:00
6570da3bcb Retry in case where the JSON deserialization fails 2024-11-18 11:33:09 +01:00
5b4c06c24c Plug the grenad max memory parameter 2024-11-18 11:28:04 +01:00
3a8051866a Use return_keyword_results function instead of returning raw keyword results when the embedder is broken 2024-11-18 11:17:15 +01:00
9150c8f052 Accept changes to vector format 2024-11-18 11:04:57 +01:00
c202f3dbe2 fix tests and revert change in behavior when primary_key_from_op != primary_key_from_db && index.is_empty() 2024-11-18 10:59:05 +01:00
677d7293f5 Fix a lot of primary key related tests 2024-11-18 10:59:05 +01:00
bd31ea2174 Check for at least one valid task after setting their statuses 2024-11-18 10:59:05 +01:00
83865d2ebd Expose intermediate errors when processing batches 2024-11-18 10:59:05 +01:00
72ba353498 reproduce sdk fail 2024-11-18 10:03:23 +01:00
4ff2b3c2ee Fix test on locales 2024-11-14 15:45:04 +01:00
91c58cfa38 Fix positional databases 2024-11-14 11:40:12 +01:00
9e8367f1e6 Move the rayon thread pool outside the extract method 2024-11-14 10:40:32 +01:00
0dd321afc7 reproduce #4984 2024-11-14 10:02:51 +01:00
0e3c5d91ab Document deletion test passes 2024-11-14 08:42:56 +01:00
695c2c6b99 Cosmetic fix 2024-11-14 08:42:39 +01:00
40dd25d6b2 Fix issue with Replace document method when adding and deleting a document in the same batch 2024-11-13 22:10:00 +01:00
8e5b1a3ec1 Compute the field distribution and convert _geo into an f64s 2024-11-13 17:44:05 +01:00
e627e182ce Fix facet strings 2024-11-13 17:43:02 +01:00
51b6293738 Add linear facet databases 2024-11-13 17:43:02 +01:00
b17896d899 Finialize the GeoExtractor 2024-11-13 17:43:02 +01:00
94fb55bb6f Merge #5049
Some checks failed
Test suite / Tests on ${{ matrix.os }} (macos-13) (push) Waiting to run
Test suite / Tests on ubuntu-20.04 (push) Failing after 59s
Test suite / Tests almost all features (push) Has been skipped
Test suite / Test disabled tokenization (push) Has been skipped
Test suite / Run tests in debug (push) Failing after 13s
Test suite / Tests on ${{ matrix.os }} (windows-2022) (push) Failing after 7m4s
Test suite / Run Clippy (push) Successful in 10m58s
Test suite / Run Rustfmt (push) Successful in 2m34s
Run the indexing fuzzer / Setup the action (push) Successful in 1h5m58s
Indexing bench (push) / Run and upload benchmarks (push) Has been cancelled
Benchmarks of indexing (push) / Run and upload benchmarks (push) Has been cancelled
Benchmarks of search for geo (push) / Run and upload benchmarks (push) Has been cancelled
Benchmarks of search for songs (push) / Run and upload benchmarks (push) Has been cancelled
Benchmarks of search for Wikipedia articles (push) / Run and upload benchmarks (push) Has been cancelled
5049: Fix the path used in the flaky tests CI r=irevoire a=Kerollmops

This PR fixes [the flaky tests CI](https://github.com/meilisearch/meilisearch/actions/runs/11741717787) path used.

Co-authored-by: Clément Renault <clement@meilisearch.com>
2024-11-13 10:26:50 +00:00
a01bc7b454 Fix error_document_field_limit_reached_in_one_document test 2024-11-13 10:34:54 +01:00
7accfea624 Don't short circuit when we encounter a semantic error while extracting fields and external docid 2024-11-13 10:33:59 +01:00
009709eace Fix the path used in the flaky tests CI 2024-11-13 09:52:10 +01:00
82dcaba6ca Fix test: somehow on main vectors where displayed even though retrieveVectors: false 2024-11-12 23:58:25 +01:00
cb1d6613dd Adjust snapshots 2024-11-12 23:26:30 +01:00
3b0cb5b487 Fix vector error messages 2024-11-12 23:26:16 +01:00
bfdcd1cf33 Space changes 2024-11-12 22:52:45 +01:00
1d13e804f7 Adjust test snapshots 2024-11-12 22:52:41 +01:00
c4e9f761e9 Emit better error messages when parsing vectors 2024-11-12 22:49:22 +01:00
8a6e61c77f InvalidVectorsEmbedderConf error takes a String rather than a deserr error 2024-11-12 22:47:57 +01:00
68bbf674c9 Make REST mock thread independent 2024-11-12 16:31:31 +01:00
980921e078 Vector fixes 2024-11-12 16:31:22 +01:00
1fcd5f091e Remove progress from task 2024-11-12 12:23:13 +01:00
6094bb299a Fix user_provided vectors 2024-11-12 10:15:55 +01:00
bef8fc6cf1 Fix hf embedder 2024-11-08 13:10:17 +01:00
e32677999f Adapt some snapshots 2024-11-08 00:06:33 +01:00
5185aa21b8 Know if your vectors are implicit when writing them back in documents + don't write empty _vectors 2024-11-08 00:05:36 +01:00
8a314ab81d Fix primary key fid order 2024-11-08 00:05:12 +01:00
4706a0eb49 Fix vector parsing 2024-11-07 23:26:20 +01:00
d97af4d8e6 fix field order of JSON documents 2024-11-07 22:36:52 +01:00
2eb1801e85 reverse the order of the task queue 2024-11-07 19:19:44 +01:00
a5d7ae23bd Merge #5044
5044: Adds new metrics to prometheus r=irevoire a=PedroTurik

not 100% confident in this solution, especially because i couldn't make the "Search Queue searches waiting" metric give me any value other than 0 with my local testing 😆. But i believe it solves the Issue.

# Pull Request

## Related issue
Fixes #4998 

## What does this PR do?
### Adds new metrics to prometheus;
- SearchQueue size, 
- SearchQueue searches running, 
- and Search Queue searches waiting.

## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?

Co-authored-by: Pedro Turik Firmino <pedroturik@gmail.com>
2024-11-07 17:05:43 +00:00
1f5d801271 Fix crashes in facet search indexing 2024-11-07 17:22:30 +01:00
7864530589 Make the word prefix integer multi-threaded 2024-11-07 16:39:14 +01:00
03886d0012 Applies optimizations to formatted integration tests (#5043) 2024-11-07 15:58:55 +01:00
700757c01f Adding a new step 2024-11-07 15:32:04 +01:00
01f8f30a7a Fix indentation 2024-11-07 15:08:56 +01:00
0e4e9e866a Move the RefCellExt trait in a dedicated module 2024-11-07 11:36:09 +01:00
1477b81d38 Support cancelation in merge and send 2024-11-07 11:23:49 +01:00
c9f478bc45 Fix bbbul merger 2024-11-07 10:53:46 +01:00
b427b9e88f Merge #5025
5025: test: improve performance of get_documents.rs r=irevoire a=PedroTurik

# Pull Request

## Related issue
Fixes one item from #4840 

## What does this PR do?
- Applies the changes recommended on the issue for `meilisearch/tests/documents/get_documents.rs`

## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!


Co-authored-by: Pedro Turik Firmino <pedroturik@gmail.com>
2024-11-07 09:46:34 +00:00
39366a67c4 Top level fields don't return vector fields 2024-11-07 10:39:58 +01:00
e2138170ad some warning fix 2024-11-07 10:06:07 +01:00
03650e3217 Reverse order of computation 2024-11-07 09:39:46 +01:00
8b95f5ccc6 Adds new metrics to prometheus: SearchQueue size, SearchQueue searches running, and Search Queue searches waiting. 2024-11-06 15:37:16 -03:00
10f49f0d75 Post processing of the merge 2024-11-06 17:50:12 +01:00
ee03743355 Merge branch 'indexer-edition-2024' into indexer-edition-2024-doc-chunks 2024-11-06 15:50:53 +01:00
10feeb88f2 Merge branch 'main' into indexer-edition-2024 2024-11-06 15:19:18 +01:00
a9ecbf0b64 Use the Bbbul crate in the cache to better control memory 2024-11-06 14:40:14 +01:00
6b67f9fc4c Merge #5030
5030: Bump Swatinem/rust-cache from 2.7.1 to 2.7.5 r=curquiza a=dependabot[bot]

Bumps [Swatinem/rust-cache](https://github.com/swatinem/rust-cache) from 2.7.1 to 2.7.5.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a href="https://github.com/swatinem/rust-cache/releases">Swatinem/rust-cache's releases</a>.</em></p>
<blockquote>
<h2>v2.7.5</h2>
<h2>What's Changed</h2>
<ul>
<li>Upgrade checkout action from version 3 to 4 by <a href="https://github.com/carsten-wenderdel"><code>`@​carsten-wenderdel</code></a>` in <a href="https://redirect.github.com/Swatinem/rust-cache/pull/190">Swatinem/rust-cache#190</a></li>
<li>fix: usage of <code>deprecated</code> version of <code>node</code> by <a href="https://github.com/hamirmahal"><code>`@​hamirmahal</code></a>` in <a href="https://redirect.github.com/Swatinem/rust-cache/pull/197">Swatinem/rust-cache#197</a></li>
<li>Only run macOsWorkaround() on macOS by <a href="https://github.com/heksesang"><code>`@​heksesang</code></a>` in <a href="https://redirect.github.com/Swatinem/rust-cache/pull/206">Swatinem/rust-cache#206</a></li>
<li>Support Cargo.lock format cargo-lock v4 by <a href="https://github.com/NobodyXu"><code>`@​NobodyXu</code></a>` in <a href="https://redirect.github.com/Swatinem/rust-cache/pull/211">Swatinem/rust-cache#211</a></li>
</ul>
<h2>New Contributors</h2>
<ul>
<li><a href="https://github.com/carsten-wenderdel"><code>`@​carsten-wenderdel</code></a>` made their first contribution in <a href="https://redirect.github.com/Swatinem/rust-cache/pull/190">Swatinem/rust-cache#190</a></li>
<li><a href="https://github.com/hamirmahal"><code>`@​hamirmahal</code></a>` made their first contribution in <a href="https://redirect.github.com/Swatinem/rust-cache/pull/197">Swatinem/rust-cache#197</a></li>
<li><a href="https://github.com/heksesang"><code>`@​heksesang</code></a>` made their first contribution in <a href="https://redirect.github.com/Swatinem/rust-cache/pull/206">Swatinem/rust-cache#206</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a href="https://github.com/Swatinem/rust-cache/compare/v2.7.3...v2.7.5">https://github.com/Swatinem/rust-cache/compare/v2.7.3...v2.7.5</a></p>
<h2>v2.7.3</h2>
<ul>
<li>Work around upstream problem that causes cache saving to hang for minutes.</li>
</ul>
<p><strong>Full Changelog</strong>: <a href="https://github.com/Swatinem/rust-cache/compare/v2.7.2...v2.7.3">https://github.com/Swatinem/rust-cache/compare/v2.7.2...v2.7.3</a></p>
<h2>v2.7.2</h2>
<h2>What's Changed</h2>
<ul>
<li>Update action runtime to <code>node20</code> by <a href="https://github.com/rhysd"><code>`@​rhysd</code></a>` in <a href="https://redirect.github.com/Swatinem/rust-cache/pull/175">Swatinem/rust-cache#175</a></li>
<li>Only key by <code>Cargo.toml</code> and <code>Cargo.lock</code> files of workspace members by <a href="https://github.com/max-heller"><code>`@​max-heller</code></a>` in <a href="https://redirect.github.com/Swatinem/rust-cache/pull/180">Swatinem/rust-cache#180</a></li>
</ul>
<h2>New Contributors</h2>
<ul>
<li><a href="https://github.com/rhysd"><code>`@​rhysd</code></a>` made their first contribution in <a href="https://redirect.github.com/Swatinem/rust-cache/pull/175">Swatinem/rust-cache#175</a></li>
<li><a href="https://github.com/max-heller"><code>`@​max-heller</code></a>` made their first contribution in <a href="https://redirect.github.com/Swatinem/rust-cache/pull/180">Swatinem/rust-cache#180</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a href="https://github.com/Swatinem/rust-cache/compare/v2.7.1...v2.7.2">https://github.com/Swatinem/rust-cache/compare/v2.7.1...v2.7.2</a></p>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a href="https://github.com/Swatinem/rust-cache/blob/master/CHANGELOG.md">Swatinem/rust-cache's changelog</a>.</em></p>
<blockquote>
<h1>Changelog</h1>
<h2>2.7.3</h2>
<ul>
<li>Work around upstream problem that causes cache saving to hang for minutes.</li>
</ul>
<h2>2.7.2</h2>
<ul>
<li>Only key by <code>Cargo.toml</code> and <code>Cargo.lock</code> files of workspace members.</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a href="82a92a6e8f"><code>82a92a6</code></a> 2.7.5</li>
<li><a href="598fe25fa1"><code>598fe25</code></a> update dependencies, rebuild</li>
<li><a href="8f842c2d45"><code>8f842c2</code></a> Support Cargo.lock format cargo-lock v4 (<a href="https://redirect.github.com/swatinem/rust-cache/issues/211">#211</a>)</li>
<li><a href="96a8d65dba"><code>96a8d65</code></a> Only run macOsWorkaround() on macOS (<a href="https://redirect.github.com/swatinem/rust-cache/issues/206">#206</a>)</li>
<li><a href="9bdad043e8"><code>9bdad04</code></a> fix: usage of <code>deprecated</code> version of <code>node</code> (<a href="https://redirect.github.com/swatinem/rust-cache/issues/197">#197</a>)</li>
<li><a href="f7a52f6914"><code>f7a52f6</code></a> &quot;add jsonpath test&quot;</li>
<li><a href="2bceda3912"><code>2bceda3</code></a> &quot;update dependencies&quot;</li>
<li><a href="640a22190e"><code>640a221</code></a> Upgrade checkout action from version 3 to 4 (<a href="https://redirect.github.com/swatinem/rust-cache/issues/190">#190</a>)</li>
<li><a href="1582741630"><code>1582741</code></a> update dependencies</li>
<li><a href="23bce251a8"><code>23bce25</code></a> 2.7.3</li>
<li>Additional commits viewable in <a href="https://github.com/swatinem/rust-cache/compare/v2.7.1...v2.7.5">compare view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=Swatinem/rust-cache&package-manager=github_actions&previous-version=2.7.1&new-version=2.7.5)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

You can trigger a rebase of this PR by commenting ``@dependabot` rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- ``@dependabot` rebase` will rebase this PR
- ``@dependabot` recreate` will recreate this PR, overwriting any edits that have been made to it
- ``@dependabot` merge` will merge this PR after your CI passes on it
- ``@dependabot` squash and merge` will squash and merge this PR after your CI passes on it
- ``@dependabot` cancel merge` will cancel a previously requested merge and block automerging
- ``@dependabot` reopen` will reopen this PR if it is closed
- ``@dependabot` close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- ``@dependabot` show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency
- ``@dependabot` ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- ``@dependabot` ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- ``@dependabot` ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)


</details>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-11-06 12:59:36 +00:00
2e4d4b398d Bump Swatinem/rust-cache from 2.7.1 to 2.7.5
Bumps [Swatinem/rust-cache](https://github.com/swatinem/rust-cache) from 2.7.1 to 2.7.5.
- [Release notes](https://github.com/swatinem/rust-cache/releases)
- [Changelog](https://github.com/Swatinem/rust-cache/blob/master/CHANGELOG.md)
- [Commits](https://github.com/swatinem/rust-cache/compare/v2.7.1...v2.7.5)

---
updated-dependencies:
- dependency-name: Swatinem/rust-cache
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-11-06 12:57:04 +00:00
da59a043ba Fixes formatting issues 2024-11-06 09:55:48 -03:00
da4d47b5d0 Fixes formatting issues 2024-11-06 09:54:20 -03:00
0507f5d99b Merge #4928
4928: Make matches consider phrases as a single `Match` r=ManyTheFish a=flevi29

# Pull Request

## Related issue
Fixes #4732

## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!


Co-authored-by: F. Levi <55688616+flevi29@users.noreply.github.com>
2024-11-06 08:22:01 +00:00
8b260de5a0 Reimplement facet search and facetr level and put them in dedidcated functions 2024-11-05 16:46:43 +01:00
be2a7c70f2 Merge #5037
5037: Fix the benchmarks r=Kerollmops a=irevoire

# Pull Request

## Related issue
https://github.com/meilisearch/meilisearch/pull/5016 broke all benchmarks. This PR fix the benchmarks


Co-authored-by: Tamo <tamo@meilisearch.com>
2024-11-05 15:37:55 +00:00
33b1f54b41 Progress, in the task queue 2024-11-05 16:23:02 +01:00
ede086bc30 Merge #5034
5034: Upgrade from v1 10 to v1 11 r=irevoire a=irevoire

# Pull Request

## Related issue
Parts of https://github.com/meilisearch/meilisearch/issues/4978

## What does this PR do?
- Move the code around the offline upgrade to its own module with a file per version
- Fix the upgrade from v1.9 to v1.10 because I couldn’t make it work anymore. It now uses a specified format instead of relying on cargo to get the right set of feature
- ☝️ must be checked against docker
- Provide an update path from v1.10 to v1.11. Most of the code is boilerplate in meilitool, the real code is located here: 053807bf38/src/lib.rs (L161-L269)


Co-authored-by: Tamo <tamo@meilisearch.com>
2024-11-05 14:49:56 +00:00
7415ef7ff5 Update crates/meilitool/src/upgrade/v1_11.rs
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-11-05 15:37:59 +01:00
a5d138ac34 use a tag while importing arroy instead of a loose branch or rev 2024-11-05 15:24:02 +01:00
0f74a93346 Update crates/meilitool/src/upgrade/v1_11.rs
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-11-05 15:14:02 +01:00
e4993aa705 Update crates/meilitool/src/upgrade/mod.rs
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-11-05 15:13:50 +01:00
66b7e0824e Update crates/meilitool/src/upgrade/mod.rs
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-11-05 15:13:40 +01:00
f193c3a67c Update crates/meilitool/src/main.rs
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-11-05 15:13:32 +01:00
9799812b27 fix the benchmarks 2024-11-05 15:08:01 +01:00
db55638714 Do not forget to recompute common prefixes 2024-11-05 11:26:46 +01:00
ad52c950ba Only run word pair proximity docids extraction if proximity_precision enables it 2024-11-05 11:08:47 +01:00
48ab898ca2 fix the datetime of v1.9 2024-11-05 10:30:53 +01:00
a5dc783ffa Merge with main branch 2024-11-05 10:56:17 +02:00
1b49b60486 Merge #5026
5026: test: improve performance of update_documents.rs  r=dureuill a=PedroTurik

# Pull Request

## Related issue
Fixes one item from #4840 

## What does this PR do?
- Applies the changes recommended on the issue for `meilisearch/tests/documents/update_documents.rs`

## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!


Co-authored-by: Pedro Turik Firmino <pedroturik@gmail.com>
2024-11-05 08:37:44 +00:00
d0b1ba20cb Improves usage of shared indexes 2024-11-04 17:26:50 -03:00
a1f228f662 remove the uneeded files after the rebase 2024-11-04 18:19:36 +01:00
99a9fde37f push back the removed files 2024-11-04 17:55:55 +01:00
106cc7fe3a fmt 2024-11-04 17:51:40 +01:00
4eef0cd332 fix the update from v1_9 to v1_10 by providing a custom datetime formatter myself 2024-11-04 17:47:10 +01:00
5f57306858 update the arroy version in meilitool 2024-11-04 17:47:10 +01:00
690eb42fc0 update the version of arroy 2024-11-04 17:47:10 +01:00
a9b61c8434 fix the version parsing and improve error handling 2024-11-04 17:47:10 +01:00
ddd03e9b37 implement the upgrade from v1.10 to v1.11 in meilitool 2024-11-04 17:47:10 +01:00
362836efb7 make an upgrade module where we'll be able to shove each version instead of putting everything in the same file 2024-11-04 17:47:10 +01:00
22229d3046 Merge #5022
5022: Briging changes from v1.11.0 back to main r=irevoire a=Kerollmops

Fixes https://github.com/meilisearch/meilisearch/issues/5035

...and fixing merge conflicts.

Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
Co-authored-by: meili-bors[bot] <89034592+meili-bors[bot]@users.noreply.github.com>
Co-authored-by: ManyTheFish <many@meilisearch.com>
Co-authored-by: curquiza <clementine@meilisearch.com>
2024-11-04 15:34:19 +00:00
186326fe40 update the macos version 2024-11-04 16:33:04 +01:00
cf6ad1ae5e Merge branch 'main' into tmp-release-v1.11.0 2024-11-04 16:14:44 +01:00
3658f57f93 Add progress 2024-11-04 15:10:40 +01:00
c79ca9679b Changes variable name to re-run CI 2024-11-02 18:25:33 -03:00
a77d5ea8c1 Pass embedders to documents 2024-10-30 14:03:29 +01:00
c9082130c8 support vectors or array of vectors 2024-10-30 13:50:51 +01:00
df5bc3c9fd Reintroduce vector errors 2024-10-30 10:55:57 +01:00
0f6a1dbce7 habemus field distribution 2024-10-30 10:06:46 +01:00
4ebedf4dc8 clippy fixes 2024-10-30 10:06:38 +01:00
b02a72c0c0 Applies optimizations to some integration tests 2024-10-29 19:30:11 -03:00
a934b0ac6a Applies optimizations to some integration tests 2024-10-29 18:49:06 -03:00
1075dd34bb Vectors 2024-10-29 17:43:36 +01:00
28274292d8 Merge #5021
5021: Update benchmarks to match the new crates subfolder r=dureuill a=Kerollmops



Co-authored-by: Clément Renault <clement@meilisearch.com>
2024-10-29 08:06:35 +00:00
7058959a46 Write into documents 2024-10-28 16:18:48 +01:00
9cbb2b066a WIP vector extraction 2024-10-28 14:23:54 +01:00
5efd70c251 Allow random access to fields in documents 2024-10-28 14:23:38 +01:00
65470e26e0 Document trait changes 2024-10-28 14:23:20 +01:00
bbb67ae0a8 todo channel 2024-10-28 14:23:02 +01:00
af9f96e2af Update older embedding 2024-10-28 14:22:45 +01:00
1960003805 Remove some warnings 2024-10-28 14:22:19 +01:00
2a91849660 Remove primary key from top id map 2024-10-28 14:21:50 +01:00
663deac236 Slight changes index scheduler 2024-10-28 14:21:39 +01:00
c8189e975c Add rendering based on document trait 2024-10-28 14:10:55 +01:00
9e7c455a01 GlobalFieldIdMap manages metadata 2024-10-28 14:09:48 +01:00
c22dc55694 Add embed_chunks_ref 2024-10-28 14:08:54 +01:00
50de3fba7b Update raw-collections 2024-10-28 14:07:23 +01:00
ee72f622c7 Update benchmarks to match the new crates subfolder 2024-10-28 14:06:46 +01:00
b0da626506 Merge #5016
5016: Hide code complexity into a subfolder r=Kerollmops a=Kerollmops

This PR moves the complexity and main code into a subfolder to make the main repository page more welcoming by reducing the number of visible files and showing the README earlier.

Co-authored-by: Clément Renault <clement@meilisearch.com>
2024-10-28 09:43:14 +00:00
3d29226a7f Merge pull request #5019 from meilisearch/indexer-edition-2024-bumpalo-in-extractors
Implement facet search extraction
2024-10-23 10:42:38 +02:00
f372ee505f Merge #5017
5017: Rollback the Meilisearch Kawaii logo r=curquiza a=Kerollmops

This PR reverts #4778 and brings back the official one. It's no longer the time to JOKE, OK !?

Co-authored-by: Clément Renault <clement@meilisearch.com>
2024-10-22 08:14:18 +00:00
89243f7df0 WIP vector extraction 2024-10-21 10:39:40 +02:00
9fe5122176 Fixup imports 2024-10-21 10:39:31 +02:00
aff8ca4397 Add raw versions of parsed vectors 2024-10-21 10:39:05 +02:00
1a3f4e719d Vector document trait 2024-10-21 10:38:21 +02:00
c278024709 Add vectors field and geo field to document trait 2024-10-21 10:37:40 +02:00
73e29ee155 EmbeddingSender stub 2024-10-21 10:35:56 +02:00
124b5c3df8 Update raw collections 2024-10-21 10:35:44 +02:00
60cc09abec Implement facet search exctraction 2024-10-21 09:28:49 +02:00
8ef8035bf2 Fix CI 2024-10-21 08:28:33 +02:00
3353bcd82d Revert "Change the Meilisearch logo to the kawaii version"
This reverts commit 13d1d78a2d.
2024-10-21 08:21:56 +02:00
9c1e54a2c8 Move crates under a sub folder to clean up the code 2024-10-21 08:18:43 +02:00
e51e6f902a Highlight partially cropped matches too 2024-10-19 13:42:02 +03:00
6c226a4580 Merge branch 'main' into change-matches-position-phrase-search 2024-10-17 21:25:42 +03:00
cd378e5bd2 Add chunking 2024-10-17 10:18:00 +02:00
c1fcb2ebc6 add some warning 2024-10-17 09:43:11 +02:00
0749633618 Don't sort in parallel in sorters of the new indexer 2024-10-17 09:30:18 +02:00
0647f75e6b Add borrow_mut_or_yield extension method 2024-10-16 17:36:41 +02:00
86a0097311 Use bumpalo in word docids 2024-10-16 14:04:44 +02:00
c75de1f391 Remove TODO 2024-10-16 11:18:59 +02:00
198238687f Guess and retrieve primary key correctly in batch 2024-10-16 09:27:18 +02:00
f9a6c624a7 Put primary key, and use provided key in operation 2024-10-16 09:27:00 +02:00
017757004e Add PrimaryKey::new_or_insert 2024-10-16 09:26:18 +02:00
152683083b Change document operation to use method in primary key 2024-10-15 14:08:37 +02:00
c283c95f6a Support nested primary keys 2024-10-15 14:08:37 +02:00
9a0e1dc375 Fix the prefix deletion 2024-10-15 11:20:09 +02:00
1e81d72b5f Use the fixed version of the Rhai crate 2024-10-14 18:18:59 +02:00
52b95c4e59 Make sure we edit the task statuses 2024-10-14 16:48:15 +02:00
7e1dc8439b Introduce the new update by function 2024-10-14 16:32:50 +02:00
96658ec775 Make de public 2024-10-14 15:41:58 +02:00
c01ee7b732 external changes 2024-10-14 15:41:58 +02:00
6ad3f57bc1 Changes to de 2024-10-14 15:41:58 +02:00
28d92c521a External docids to &'bump str 2024-10-14 15:41:58 +02:00
7df20d8282 Changes to primary key 2024-10-14 15:41:57 +02:00
b4102741e6 Fix duplicated fields when a document is modified 2024-10-14 14:59:40 +02:00
a525598ad6 Fix facet string indexing 2024-10-14 11:12:10 +02:00
4e97e38177 Serialize docids bitmap one time 2024-10-14 11:12:10 +02:00
d675e73af1 Finish prefix databases 2024-10-14 11:12:10 +02:00
a2fbf2ea21 set updated at at the end of the indexing 2024-10-14 11:05:25 +02:00
132916f62c Only run word pair proximity docids extraction if proximity_precision enables it 2024-10-14 11:05:25 +02:00
8371819114 Some clippy related fixes 2024-10-14 10:58:37 +02:00
6028d6ba43 Remove somme warnings 2024-10-10 22:42:37 +02:00
68a2502388 Introduce indexer level bumpalo 2024-10-10 22:23:05 +02:00
39b27e42be Plug the deletion pipeline 2024-10-08 16:04:19 +02:00
470c2272dd Show much more stats about the LRU caches 2024-10-08 15:29:24 +02:00
30f3c30389 Merge #4962
4962: test: improve performance of create_index.rs r=irevoire a=DerTimonius

# Pull Request

## Related issue
related to #4840 

## What does this PR do?
This PR follows the instructions in #4840 and improves the performance of `meilisearch/tests/index/create_index.rs`. The tests run locally, if they fail in the CI I'll try to fix them

## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!


Co-authored-by: Timon Jurschitsch <timon.jurschitsch@gmail.com>
2024-10-08 13:00:56 +00:00
d907d1b22d Merge #4990
4990: Add image source label to dockerfiles r=curquiza a=wuast94

To get changelogs shown with Renovate a docker container has to add the source label described in the OCI Image Format Specification.

For reference: https://github.com/renovatebot/renovate/blob/main/lib/modules/datasource/docker/readme.md

Co-authored-by: Marc <github@wuast24.de>
Co-authored-by: Clémentine <clementine@meilisearch.com>
2024-10-08 12:19:38 +00:00
ed267fa063 Apply suggestions from code review 2024-10-08 14:14:16 +02:00
6af55b1a80 Update Dockerfile 2024-10-08 11:59:43 +02:00
2230674c0a Merge branch 'fix-append-only-vec' into indexer-edition-2024 2024-10-08 10:32:45 +02:00
5b04189f7a remove flaky assert 2024-10-07 16:50:57 +02:00
eb09dfed04 Avoid reallocation with the ThreadLocal pool 2024-10-07 16:41:17 +02:00
83c09d0db0 Remove the now, useless AppendOnlyVec library 2024-10-07 16:38:45 +02:00
c0912aa685 add missing shared servers 2024-10-07 16:29:47 +02:00
af38f46621 Merge branch 'main' of https://github.com/meilisearch/meilisearch into test/improve-create-index 2024-10-07 16:27:57 +02:00
c11b7e5c0f Reduce number of cache created by using thread_local 2024-10-07 15:58:16 +02:00
03579aba13 Adjust test 2024-10-04 11:38:47 +03:00
c3de3a9ab7 Refactor 2024-10-04 11:30:31 +03:00
386ca86297 Merge #4963
4963: test: improve performance of delete_index.rs r=curquiza a=DerTimonius

# Pull Request

## Related issue
related to #4840

## What does this PR do?
This PR follows the instructions in #4840 and improves the performance of `meilisearch/tests/index/delete_index.rs`. The tests run locally, if they fail in the CI I'll try to fix them

## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!


Co-authored-by: Timon Jurschitsch <timon.jurschitsch@gmail.com>
2024-10-03 15:40:07 +00:00
dff2d54784 Merge pull request #4976 from meilisearch/fix-append-only-vec
Fix append only `Vec` by using a `LinkedList`
2024-10-03 17:26:00 +02:00
58d96fbea3 Rename Node parent to next 2024-10-03 16:15:05 +02:00
4665bfcb19 Move the parent assignation before the exchange operation 2024-10-03 16:14:23 +02:00
a7a01646cf Remove the useless Manually drop 2024-10-03 15:57:31 +02:00
0409a26cd8 Replace the concurrent vec by a linked list 2024-10-03 15:15:29 +02:00
8221c94e7f Split into multiple files, refactor 2024-10-03 15:37:51 +03:00
35f78b5423 TO REMOVE: usefull debug prints 2024-10-03 11:13:01 +02:00
14261f8f04 Integrate facet level bulk update
Only the facet bulk update has been added so far, the incremental must be completely rewritten

Factorize facet merging

Fix facet level extraction
2024-10-03 11:13:00 +02:00
774ed28539 Fix Prefix FST when a document is modified 2024-10-03 11:12:26 +02:00
d79f75f630 Compute and Write external-documents-ids database 2024-10-03 11:11:56 +02:00
c427d9e2ad Merge branch 'main' into change-matches-position-phrase-search 2024-10-03 10:42:34 +03:00
40336ce87d Fix and refactor crop_bounds 2024-10-03 10:40:14 +03:00
2a18917af3 add delete_index_fail function 2024-10-02 16:23:21 +02:00
ccf01c2471 Merge pull request #4969 from meilisearch/indexer-edition-2024-try-map
Indexer edition 2024 try map
2024-10-02 11:25:05 +02:00
37a9d64c44 Fix failing test, refactor 2024-10-01 22:52:01 +03:00
17571805b4 use shared servers 2024-10-01 17:27:27 +02:00
2654ce6e6c use shared servers 2024-10-01 17:01:47 +02:00
d9e4db9983 Refactor 2024-10-01 17:50:59 +03:00
6d16230f17 Refactor 2024-10-01 17:19:15 +03:00
b7a5ba100e Move the ParallelIteratorExt into the parallel_iterator_ext module 2024-10-01 11:11:52 +02:00
dead7a56a3 Keep the caches in the AppendOnlyVec 2024-10-01 11:11:39 +02:00
0a8cb471df Introduce the AppendOnlyVec struct for the parallel computing 2024-10-01 11:11:25 +02:00
00e045b249 Rename and use the try_arc_for_each_try_init method 2024-10-01 11:11:25 +02:00
d83c9a4074 Introduce the try_for_each_try_init method to be used with Arced Errors 2024-10-01 11:11:25 +02:00
f3356ddaa4 Fix the errors when using the try_map_try_init method 2024-10-01 11:11:10 +02:00
31de5c747e WIP using try_map_try_init 2024-10-01 11:10:53 +02:00
3843240940 Prefer using Ars instead of Options 2024-10-01 11:10:53 +02:00
8cb5e7437d try using try_map_try_init 2024-10-01 11:10:53 +02:00
5b776556fe Add ParallelIteratorExt 2024-10-01 11:10:53 +02:00
bb7a503e5d Compute prefix databases
We are now computing the prefix FST and a prefix delta in the Merger thread,
after all the databases are written, the main thread will recompute the prefix databases based on the prefix delta without needing any grenad temporary file anymore
2024-10-01 09:57:06 +02:00
eabc14c268 Refactor, handle more cases for phrases 2024-09-30 21:24:41 +03:00
e78da35287 Merge #4930
4930: Return `UserError::InvalidDocumentId` for primary keys with a length greater than 512 bytes r=curquiza a=flevi29

# Pull Request

## Related issue
Fixes #4843

## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!


Co-authored-by: F. Levi <55688616+flevi29@users.noreply.github.com>
2024-09-30 15:55:05 +00:00
64589278ac Appease *some* of clippy warnings 2024-09-30 16:08:29 +02:00
8df6daf308 Remove fid_wordcount_docids.rs 2024-09-30 11:52:31 +02:00
5b552caf42 Fix position in insertions 2024-09-30 11:46:32 +02:00
2b51a63418 Remove dead code 2024-09-30 11:42:36 +02:00
3d8024fb2b write the weighted fields ids map 2024-09-30 11:35:03 +02:00
4b0da0ff24 Fix inversion of field_id and position 2024-09-30 11:34:50 +02:00
079f2b5de0 Format error messages consistently 2024-09-30 11:34:31 +02:00
84b4219a4f test: improve delete_index.rs 2024-09-29 10:16:31 +02:00
5539a1904a test: improve performance of create_index.rs 2024-09-28 11:05:52 +02:00
00ccf53ffa Merge branch 'main' into change-matches-position-phrase-search 2024-09-27 15:52:05 +03:00
d20a39b959 Refactor find_best_match_interval 2024-09-27 15:44:30 +03:00
960060ebdf Fix fst builder when their is no previous FST 2024-09-25 16:53:00 +02:00
3d244451df Reduce the lru key size from 8 to 12 bytes 2024-09-25 16:14:13 +02:00
5f53935c8a Fix a bug in the Lru 2024-09-25 16:09:34 +02:00
29a7623c3f Fxi some logs 2024-09-25 15:57:50 +02:00
e97041f7d0 Replace the Lru free list by a simple increment 2024-09-25 15:55:52 +02:00
52d7f3ed1c Reduce the lru key size from 20 to 8 bytes 2024-09-25 15:37:13 +02:00
86d5e6d9ff Use the new Lru 2024-09-25 14:54:56 +02:00
759b9b1546 Introduce a new custom Lru 2024-09-25 14:49:12 +02:00
3f7a500f3b Build prefix fst 2024-09-25 14:36:06 +02:00
974272f2e9 Merge branch 'main' into indexer-edition-2024 2024-09-25 07:41:16 +02:00
7ad037841f Move the tracing info to eprintln 2024-09-24 18:21:58 +02:00
e0c7067355 Expose an IndexedParallelIterator to the index function 2024-09-24 17:24:59 +02:00
6e87332410 Change the way the FST is built 2024-09-24 16:28:31 +02:00
2d1caf27df Use eprintln to log 2024-09-24 15:59:50 +02:00
92678383d6 Update charabia 2024-09-24 15:37:56 +02:00
7f148c127c Measure the SmallVec efficacity 2024-09-24 15:32:15 +02:00
4ce5d3d66d Do not check before pushing in bitmaps 2024-09-24 09:43:16 +02:00
ff931edb55 Update roaring to inline max calls 2024-09-23 16:53:42 +02:00
42b093687d Introduce the new PushOptimizedBitmap 2024-09-23 16:38:21 +02:00
835c5f98f9 Remove the debug symbols 2024-09-23 15:49:24 +02:00
f00664247d Add more stats about the channel message sent 2024-09-23 15:13:52 +02:00
3c63d4a1e5 Fix charabia Zho 2024-09-23 14:50:17 +02:00
4551abf6d4 Update roaring to the latest version 2024-09-23 14:35:33 +02:00
193d7f5d34 Add the mutualized charabia normalization 2024-09-23 14:24:25 +02:00
013acb3d93 Measure merger writer channel contention 2024-09-23 11:07:59 +02:00
0ffeea5a52 Remove wrong comments 2024-09-19 09:06:40 +03:00
30aa1f6dea Merge with main 2024-09-18 11:03:33 +03:00
83113998f9 Add more test assertions 2024-09-18 10:35:23 +03:00
f7337affd6 Adjust tests to changes 2024-09-17 17:31:09 +03:00
e098cc8320 Make comparison simpler, add IndexUid error details similarly 2024-09-17 00:16:15 +03:00
ec815fa368 Format 2024-09-16 23:59:48 +03:00
4a922a176f Add test for > 512 byte ID 2024-09-16 23:53:34 +03:00
51bc7b3173 Update tests 2024-09-16 22:22:24 +03:00
f4ab1f168e Prefer using Rc<str> than String when cloning a lot 2024-09-16 15:41:29 +02:00
1a0e962299 Replace hashmap by vectors in wpp 2024-09-16 15:01:20 +02:00
f13e076b8a Use hashmap instead of Btree in wpp extractor 2024-09-16 14:40:40 +02:00
7ba49b849e Extract and write facet databases 2024-09-16 09:35:16 +02:00
993408d3ba Change closure to fn 2024-09-15 16:15:09 +03:00
dcb61f8b3a Return error for primary keys with a length greater than 512 bytes 2024-09-14 11:34:13 +03:00
51085206cc Misc adjustments 2024-09-14 10:14:07 +03:00
a2a16bf846 Move MatchPosition impl to Match, adjust counting score for phrases 2024-09-13 21:20:06 +03:00
cab63abc84 Improve MatchesPosition enum with an impl 2024-09-13 14:35:28 +03:00
65e3d61a95 Make use of helper function in one more place 2024-09-13 13:35:58 +03:00
cc6a2aec06 Improve changes to Matcher 2024-09-13 13:31:07 +03:00
f7652186e1 WIP geo fields 2024-09-12 18:01:02 +02:00
e7af499314 Improve changes to Matcher 2024-09-12 16:58:13 +03:00
b2f4e67c9a Do not store useless updates 2024-09-12 15:38:31 +02:00
ff5d3b59f5 Move the document id extraction to the primary key code 2024-09-12 12:01:42 +02:00
aa69308e45 Use a bufWriter to build word FSTs 2024-09-12 11:48:00 +02:00
eb9a20ff0b Fix fid_word_docids extraction 2024-09-12 11:08:18 +02:00
edcb4c60ba Change Matcher so that phrases are counted as one instead of word by word 2024-09-12 09:46:08 +03:00
0d868f36d7 Make sure we always use a BufWriter to write the update files 2024-09-11 18:38:04 +02:00
e7d9db078f Use the right key name when convertir from CSV to NDJSON 2024-09-11 18:27:00 +02:00
3e9198ebaa Support guessing primary key again 2024-09-11 17:25:40 +02:00
2a0ad0982f Fix the document counter 2024-09-11 15:59:36 +02:00
2b317c681b Build mergers in parallel 2024-09-11 11:49:26 +02:00
39b5990f64 Mutualize tokenization 2024-09-11 10:22:38 +02:00
3848adf5a2 Improve error management and simplify JSON read 2024-09-11 10:10:51 +02:00
b4de06259e Better CSV support 2024-09-11 10:02:00 +02:00
8287c2644f Support CSV again 2024-09-10 21:10:28 +01:00
c1c44a0b81 Impl serialize on TopLevelMap 2024-09-10 19:32:03 +01:00
04596f3616 Move the TopLevelMap into a dedicated module 2024-09-10 18:01:17 +01:00
24cb5839ad Move the document changes sorting logic to a new trait 2024-09-10 17:37:52 +01:00
8d97b7b28c Support JSON payloads again (not perfectly though) 2024-09-10 17:09:49 +01:00
f69688e8f7 Fix several warnings in extractors and remove unreachable macros 2024-09-09 14:52:50 +02:00
8fd0afaaaa Make sure we iterate over the payload documents in order 2024-09-06 08:09:08 +02:00
72c6a21a30 Use raw JSON to read the payloads 2024-09-05 20:08:23 +02:00
8412be4a7d Cleanup CowStr and TopLevelMap struct 2024-09-05 18:32:55 +02:00
10f09c531f add some commented code to read from json with raw values 2024-09-05 18:22:16 +02:00
8fd99b111b Add tracing timers logs 2024-09-05 18:00:22 +02:00
f6b3d1f9a5 Increase some channel sizes 2024-09-05 15:12:07 +02:00
73ce67862d Use the word pair proximity and fid word count docids extractors
Co-authored-by: ManyTheFish <many@meilisearch.com>
2024-09-05 10:56:22 +02:00
0fc02f7351 Move the facet extraction to dedicated modules 2024-09-05 10:32:27 +02:00
34f11e3380 Implement word count and word pair proximity extractors 2024-09-05 10:30:39 +02:00
27308eaab1 Import the facet extractors 2024-09-04 17:58:15 +02:00
b33ec9ba3f Introduce the FieldIdFacetIsNullDocidsExtractor 2024-09-04 17:50:08 +02:00
9c0a1cd9fd Introduce the FieldIdFacetExistsDocidsExtractor 2024-09-04 17:48:49 +02:00
0b061f1e70 Introduce the FieldIdFacetIsEmptyDocidsExtractor 2024-09-04 17:40:24 +02:00
19d937ab21 Introduce the facet extractors 2024-09-04 17:03:54 +02:00
1d59c19cd2 Send the WordsFst by using an Mmap 2024-09-04 14:30:09 +02:00
98e48371c3 Factorize some stuff 2024-09-04 12:17:13 +02:00
6d74fb0229 Introduce the WordFidWordDocids database 2024-09-04 11:40:55 +02:00
1eb75a1040 remove milli/src/update/new/extract/tokenize_document.rs 2024-09-04 11:40:26 +02:00
3b82d8b5b9 Fix the cache to serialize entries correctly 2024-09-04 10:55:36 +02:00
781a186f75 remove milli/src/update/new/extract/extract_word_docids.rs 2024-09-04 10:28:31 +02:00
6a399556b5 Implement more searchable extractor 2024-09-04 10:20:18 +02:00
27b4cab857 Extract and write the documents and words fst in the database 2024-09-04 09:59:19 +02:00
52d32b4ee9 Move the channel sender in the closure to stop the merger thread 2024-09-03 16:08:33 +02:00
da61408e52 Remove unimplemented from document changes 2024-09-03 15:14:16 +02:00
fe69385bd7 Fix tokenizer test 2024-09-03 14:24:37 +02:00
c1557734dc Use the GlobalFieldsIdsMap everywhere and write it to disk
Co-authored-by: Dureuill <louis@meilisearch.com>
Co-authored-by: ManyTheFish <many@meilisearch.com>
2024-09-03 12:01:01 +02:00
c50d3edc4a Integrate first searchable exctrator 2024-09-03 11:02:39 +02:00
5369bf4a62 Change some lifetimes 2024-09-02 19:51:22 +02:00
bcb1aa3d22 Find a temporary solution to par into iter on an HashMap
Spoiler: Do not use an HashMap but drain it into a Vec
2024-09-02 19:39:48 +02:00
9b7858fb90 Expose the new indexer 2024-09-02 15:21:59 +02:00
ab01679a8f Remove the useless option from the document changes 2024-09-02 15:21:00 +02:00
521775f788 I push for Many 2024-09-02 15:10:21 +02:00
72e7b7846e Renaming the indexers 2024-09-02 14:42:27 +02:00
6526ce1208 Fix the merging of documents 2024-09-02 14:41:20 +02:00
e639ec79d1 Move the indexers into their own modules 2024-09-02 10:42:19 +02:00
bb885a5810 Fix the merge for roaring bitmap 2024-09-01 23:20:19 +02:00
b625d31c7d Introduce the PartialDumpIndexer indexer that generates document ids in parallel 2024-08-30 15:07:21 +02:00
6487a67f2b Introduce the ConcurrentAvailableIds struct and rename the other to AvailableIds 2024-08-30 15:06:50 +02:00
271ce91b3b Add the rayon Threadpool to the index function parameter 2024-08-30 14:34:24 +02:00
54f2eb4507 Remove duplication of grenad merger 2024-08-30 14:34:05 +02:00
794ebcd582 Replace grenad with the new grenad various-improvement branch 2024-08-30 11:53:59 +02:00
b7c77c7a39 Use the latest version of the obkv crate 2024-08-30 11:53:59 +02:00
0c57cf7565 Replace obkv with the temporary new version of it 2024-08-30 11:53:58 +02:00
27df9e6c73 Introduce the indexer::index function that runs the indexation 2024-08-30 11:53:58 +02:00
45c060831e Introduce typed channels and the merger loop 2024-08-30 11:53:58 +02:00
874c1ac538 First channels types 2024-08-30 11:53:58 +02:00
e6ffa4d454 Implement the document merge function for the replace method 2024-08-30 11:53:58 +02:00
637a9c8bdd Implement the document merge function for the update method 2024-08-30 11:53:58 +02:00
c683fa98e6 WIP
Co-authored-by: Kerollmops <clement@meilisearch.com>
Co-authored-by: ManyTheFish <many@meilisearch.com>
2024-08-30 11:53:57 +02:00
1235 changed files with 36041 additions and 12942 deletions

View File

@ -43,7 +43,7 @@ jobs:
# Run benchmarks
- name: Run benchmarks - Dataset ${BENCH_NAME} - Branch ${{ steps.current_branch.outputs.name }} - Commit ${{ steps.commit_sha.outputs.short }}
run: |
cd benchmarks
cd crates/benchmarks
cargo bench --bench ${BENCH_NAME} -- --save-baseline ${{ steps.file.outputs.basename }}
# Generate critcmp files

View File

@ -88,7 +88,7 @@ jobs:
# Run benchmarks
- name: Run benchmarks - Dataset ${{ steps.command.outputs.command-arguments }} - Branch ${{ steps.current_branch.outputs.name }} - Commit ${{ steps.commit_sha.outputs.short }}
run: |
cd benchmarks
cd crates/benchmarks
cargo bench --bench ${{ steps.command.outputs.command-arguments }} -- --save-baseline ${{ steps.file.outputs.basename }}
# Generate critcmp files

View File

@ -41,7 +41,7 @@ jobs:
# Run benchmarks
- name: Run benchmarks - Dataset ${BENCH_NAME} - Branch ${{ steps.current_branch.outputs.name }} - Commit ${{ steps.commit_sha.outputs.short }}
run: |
cd benchmarks
cd crates/benchmarks
cargo bench --bench ${BENCH_NAME} -- --save-baseline ${{ steps.file.outputs.basename }}
# Generate critcmp files

View File

@ -40,7 +40,7 @@ jobs:
# Run benchmarks
- name: Run benchmarks - Dataset ${BENCH_NAME} - Branch ${{ steps.current_branch.outputs.name }} - Commit ${{ steps.commit_sha.outputs.short }}
run: |
cd benchmarks
cd crates/benchmarks
cargo bench --bench ${BENCH_NAME} -- --save-baseline ${{ steps.file.outputs.basename }}
# Generate critcmp files

View File

@ -40,7 +40,7 @@ jobs:
# Run benchmarks
- name: Run benchmarks - Dataset ${BENCH_NAME} - Branch ${{ steps.current_branch.outputs.name }} - Commit ${{ steps.commit_sha.outputs.short }}
run: |
cd benchmarks
cd crates/benchmarks
cargo bench --bench ${BENCH_NAME} -- --save-baseline ${{ steps.file.outputs.basename }}
# Generate critcmp files

View File

@ -40,7 +40,7 @@ jobs:
# Run benchmarks
- name: Run benchmarks - Dataset ${BENCH_NAME} - Branch ${{ steps.current_branch.outputs.name }} - Commit ${{ steps.commit_sha.outputs.short }}
run: |
cd benchmarks
cd crates/benchmarks
cargo bench --bench ${BENCH_NAME} -- --save-baseline ${{ steps.file.outputs.basename }}
# Generate critcmp files

View File

@ -21,10 +21,10 @@ jobs:
- name: Install cargo-flaky
run: cargo install cargo-flaky
- name: Run cargo flaky in the dumps
run: cd dump; cargo flaky -i 100 --release
run: cd crates/dump; cargo flaky -i 100 --release
- name: Run cargo flaky in the index-scheduler
run: cd index-scheduler; cargo flaky -i 100 --release
run: cd crates/index-scheduler; cargo flaky -i 100 --release
- name: Run cargo flaky in the auth
run: cd meilisearch-auth; cargo flaky -i 100 --release
run: cd crates/meilisearch-auth; cargo flaky -i 100 --release
- name: Run cargo flaky in meilisearch
run: cd meilisearch; cargo flaky -i 100 --release
run: cd crates/meilisearch; cargo flaky -i 100 --release

View File

@ -65,9 +65,9 @@ jobs:
strategy:
fail-fast: false
matrix:
os: [macos-12, windows-2022]
os: [macos-13, windows-2022]
include:
- os: macos-12
- os: macos-13
artifact_name: meilisearch
asset_name: meilisearch-macos-amd64
- os: windows-2022
@ -90,7 +90,7 @@ jobs:
publish-macos-apple-silicon:
name: Publish binary for macOS silicon
runs-on: macos-12
runs-on: macos-13
needs: check-version
strategy:
matrix:

View File

@ -33,7 +33,7 @@ jobs:
- name: Setup test with Rust stable
uses: dtolnay/rust-toolchain@1.79
- name: Cache dependencies
uses: Swatinem/rust-cache@v2.7.1
uses: Swatinem/rust-cache@v2.7.5
- name: Run cargo check without any default features
uses: actions-rs/cargo@v1
with:
@ -51,11 +51,11 @@ jobs:
strategy:
fail-fast: false
matrix:
os: [macos-12, windows-2022]
os: [macos-13, windows-2022]
steps:
- uses: actions/checkout@v3
- name: Cache dependencies
uses: Swatinem/rust-cache@v2.7.1
uses: Swatinem/rust-cache@v2.7.5
- uses: dtolnay/rust-toolchain@1.79
- name: Run cargo check without any default features
uses: actions-rs/cargo@v1
@ -127,7 +127,7 @@ jobs:
apt-get install build-essential -y
- uses: dtolnay/rust-toolchain@1.79
- name: Cache dependencies
uses: Swatinem/rust-cache@v2.7.1
uses: Swatinem/rust-cache@v2.7.5
- name: Run tests in debug
uses: actions-rs/cargo@v1
with:
@ -144,7 +144,7 @@ jobs:
profile: minimal
components: clippy
- name: Cache dependencies
uses: Swatinem/rust-cache@v2.7.1
uses: Swatinem/rust-cache@v2.7.5
- name: Run cargo clippy
uses: actions-rs/cargo@v1
with:
@ -163,11 +163,11 @@ jobs:
override: true
components: rustfmt
- name: Cache dependencies
uses: Swatinem/rust-cache@v2.7.1
uses: Swatinem/rust-cache@v2.7.5
- name: Run cargo fmt
# Since we never ran the `build.rs` script in the benchmark directory we are missing one auto-generated import file.
# Since we want to trigger (and fail) this action as fast as possible, instead of building the benchmark crate
# we are going to create an empty file where rustfmt expects it.
run: |
echo -ne "\n" > benchmarks/benches/datasets_paths.rs
echo -ne "\n" > crates/benchmarks/benches/datasets_paths.rs
cargo fmt --all -- --check

3
.gitignore vendored
View File

@ -5,7 +5,6 @@
**/*.json_lines
**/*.rs.bk
/*.mdb
/query-history.txt
/data.ms
/snapshots
/dumps
@ -19,4 +18,4 @@
*.snap.new
# Fuzzcheck data for the facet indexing fuzz test
milli/fuzz/update::facet::incremental::fuzz::fuzz/
crates/milli/fuzz/update::facet::incremental::fuzz::fuzz/

318
Cargo.lock generated
View File

@ -80,7 +80,7 @@ source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "e01ed3140b2f8d422c68afa1ed2e85d996ea619c988ac834d255db32138655cb"
dependencies = [
"quote",
"syn 2.0.60",
"syn 2.0.87",
]
[[package]]
@ -216,7 +216,7 @@ dependencies = [
"actix-router",
"proc-macro2",
"quote",
"syn 2.0.60",
"syn 2.0.87",
]
[[package]]
@ -296,9 +296,9 @@ dependencies = [
[[package]]
name = "allocator-api2"
version = "0.2.16"
version = "0.2.18"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "0942ffc6dcaadf03badf6e6a2d0228460359d5e34b57ccdc720b7382dfbd5ec5"
checksum = "5c6cb57a04249c6480766f7f7cef5467412af1490f8d1e243141daddada3264f"
[[package]]
name = "anes"
@ -404,6 +404,25 @@ dependencies = [
"thiserror",
]
[[package]]
name = "arroy"
version = "0.5.0"
source = "git+https://github.com/meilisearch/arroy/?tag=DO-NOT-DELETE-upgrade-v04-to-v05#053807bf38dc079f25b003f19fc30fbf3613f6e7"
dependencies = [
"bytemuck",
"byteorder",
"heed",
"log",
"memmap2",
"nohash",
"ordered-float",
"rand",
"rayon",
"roaring",
"tempfile",
"thiserror",
]
[[package]]
name = "assert-json-diff"
version = "2.0.2"
@ -422,7 +441,7 @@ checksum = "6e0c28dcc82d7c8ead5cb13beb15405b57b8546e93215673ff8ca0349a028107"
dependencies = [
"proc-macro2",
"quote",
"syn 2.0.60",
"syn 2.0.87",
]
[[package]]
@ -470,16 +489,23 @@ version = "0.22.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "72b3254f16251a8381aa12e40e3c4d2f0199f8c6508fbecb9d91f575e0fbb8c6"
[[package]]
name = "bbqueue"
version = "0.5.1"
source = "git+https://github.com/meilisearch/bbqueue#cbb87cc707b5af415ef203bdaf2443e06ba0d6d4"
[[package]]
name = "benchmarks"
version = "1.11.0"
version = "1.12.0"
dependencies = [
"anyhow",
"bumpalo",
"bytes",
"convert_case 0.6.0",
"criterion",
"csv",
"flate2",
"memmap2",
"milli",
"mimalloc",
"rand",
@ -487,6 +513,7 @@ dependencies = [
"reqwest",
"roaring",
"serde_json",
"tempfile",
]
[[package]]
@ -530,7 +557,7 @@ dependencies = [
"regex",
"rustc-hash 1.1.0",
"shlex",
"syn 2.0.60",
"syn 2.0.87",
]
[[package]]
@ -574,6 +601,15 @@ dependencies = [
"serde",
]
[[package]]
name = "bitpacking"
version = "0.9.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "4c1d3e2bfd8d06048a179f7b17afc3188effa10385e7b00dc65af6aae732ea92"
dependencies = [
"crunchy",
]
[[package]]
name = "bitvec"
version = "1.0.1"
@ -615,7 +651,7 @@ dependencies = [
"proc-macro-crate",
"proc-macro2",
"quote",
"syn 2.0.60",
"syn 2.0.87",
"syn_derive",
]
@ -653,7 +689,7 @@ dependencies = [
[[package]]
name = "build-info"
version = "1.11.0"
version = "1.12.0"
dependencies = [
"anyhow",
"time",
@ -665,6 +701,10 @@ name = "bumpalo"
version = "3.16.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "79296716171880943b8470b5f8d03aa55eb2e645a4874bdbb28adb49162e012c"
dependencies = [
"allocator-api2",
"serde",
]
[[package]]
name = "byte-unit"
@ -707,9 +747,9 @@ checksum = "2c676a478f63e9fa2dd5368a42f28bba0d6c560b775f38583c8bbaa7fcd67c9c"
[[package]]
name = "bytemuck"
version = "1.16.1"
version = "1.19.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "b236fc92302c97ed75b38da1f4917b5cdda4984745740f153a5d3059e48d725e"
checksum = "8334215b81e418a0a7bdb8ef0849474f40bb10c8b71f1c4ed315cff49f32494d"
dependencies = [
"bytemuck_derive",
]
@ -722,7 +762,7 @@ checksum = "4da9a32f3fed317401fa3c862968128267c3106685286e15d5aaa3d7389c2f60"
dependencies = [
"proc-macro2",
"quote",
"syn 2.0.60",
"syn 2.0.87",
]
[[package]]
@ -934,9 +974,9 @@ dependencies = [
[[package]]
name = "charabia"
version = "0.9.1"
version = "0.9.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "55ff52497324e7d168505a16949ae836c14595606fab94687238d2f6c8d4c798"
checksum = "cf8921fe4d53ab8f9e8f9b72ce6f91726cfc40fffab1243d27db406b5e2e9cc2"
dependencies = [
"aho-corasick",
"csv",
@ -1033,7 +1073,7 @@ dependencies = [
"heck 0.5.0",
"proc-macro2",
"quote",
"syn 2.0.60",
"syn 2.0.87",
]
[[package]]
@ -1211,19 +1251,6 @@ dependencies = [
"itertools 0.10.5",
]
[[package]]
name = "crossbeam"
version = "0.8.4"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "1137cd7e7fc0fb5d3c5a8678be38ec56e819125d8d7907411fe24ccb943faca8"
dependencies = [
"crossbeam-channel",
"crossbeam-deque",
"crossbeam-epoch",
"crossbeam-queue",
"crossbeam-utils",
]
[[package]]
name = "crossbeam-channel"
version = "0.5.13"
@ -1359,7 +1386,7 @@ dependencies = [
"proc-macro2",
"quote",
"strsim 0.11.1",
"syn 2.0.60",
"syn 2.0.87",
]
[[package]]
@ -1381,7 +1408,7 @@ checksum = "733cabb43482b1a1b53eee8583c2b9e8684d592215ea83efd305dd31bc2f0178"
dependencies = [
"darling_core 0.20.9",
"quote",
"syn 2.0.60",
"syn 2.0.87",
]
[[package]]
@ -1435,7 +1462,7 @@ checksum = "67e77553c4162a157adbf834ebae5b415acbecbeafc7a74b0e886657506a7611"
dependencies = [
"proc-macro2",
"quote",
"syn 2.0.60",
"syn 2.0.87",
]
[[package]]
@ -1477,7 +1504,7 @@ dependencies = [
"darling 0.20.9",
"proc-macro2",
"quote",
"syn 2.0.60",
"syn 2.0.87",
]
[[package]]
@ -1497,7 +1524,7 @@ source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "206868b8242f27cecce124c19fd88157fbd0dd334df2587f36417bafbc85097b"
dependencies = [
"derive_builder_core 0.20.0",
"syn 2.0.60",
"syn 2.0.87",
]
[[package]]
@ -1539,7 +1566,7 @@ dependencies = [
"convert_case 0.6.0",
"proc-macro2",
"quote",
"syn 2.0.60",
"syn 2.0.87",
]
[[package]]
@ -1603,7 +1630,7 @@ checksum = "97369cbbc041bc366949bc74d34658d6cda5621039731c6310521892a3a20ae0"
dependencies = [
"proc-macro2",
"quote",
"syn 2.0.60",
"syn 2.0.87",
]
[[package]]
@ -1623,7 +1650,7 @@ dependencies = [
[[package]]
name = "dump"
version = "1.11.0"
version = "1.12.0"
dependencies = [
"anyhow",
"big_s",
@ -1761,7 +1788,7 @@ dependencies = [
"heck 0.4.1",
"proc-macro2",
"quote",
"syn 2.0.60",
"syn 2.0.87",
]
[[package]]
@ -1781,7 +1808,7 @@ checksum = "a1ab991c1362ac86c61ab6f556cff143daa22e5a15e4e189df818b2fd19fe65b"
dependencies = [
"proc-macro2",
"quote",
"syn 2.0.60",
"syn 2.0.87",
]
[[package]]
@ -1829,13 +1856,13 @@ dependencies = [
[[package]]
name = "fastrand"
version = "2.1.0"
version = "2.2.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "9fc0510504f03c51ada170672ac806f1f105a88aa97a5281117e1ddc3368e51a"
checksum = "486f806e73c5707928240ddc295403b1b93c96a02038563881c4a2fd84b81ac4"
[[package]]
name = "file-store"
version = "1.11.0"
version = "1.12.0"
dependencies = [
"tempfile",
"thiserror",
@ -1857,7 +1884,7 @@ dependencies = [
[[package]]
name = "filter-parser"
version = "1.11.0"
version = "1.12.0"
dependencies = [
"insta",
"nom",
@ -1877,18 +1904,33 @@ dependencies = [
[[package]]
name = "flatten-serde-json"
version = "1.11.0"
version = "1.12.0"
dependencies = [
"criterion",
"serde_json",
]
[[package]]
name = "flume"
version = "0.11.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "da0e4dd2a88388a1f4ccc7c9ce104604dab68d9f408dc34cd45823d5a9069095"
dependencies = [
"spin",
]
[[package]]
name = "fnv"
version = "1.0.7"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "3f9eec918d3f24069decb9af1554cad7c880e2da24a9afd88aca000531ab82c1"
[[package]]
name = "foldhash"
version = "0.1.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "f81ec6369c545a7d40e4589b5597581fa1c441fe1cce96dd1de43159910a36a2"
[[package]]
name = "form_urlencoded"
version = "1.2.1"
@ -1966,7 +2008,7 @@ checksum = "87750cf4b7a4c0625b1529e4c543c2182106e4dedc60a2a6455e00d212c489ac"
dependencies = [
"proc-macro2",
"quote",
"syn 2.0.60",
"syn 2.0.87",
]
[[package]]
@ -2001,10 +2043,12 @@ dependencies = [
[[package]]
name = "fuzzers"
version = "1.11.0"
version = "1.12.0"
dependencies = [
"arbitrary",
"bumpalo",
"clap",
"either",
"fastrand",
"milli",
"serde",
@ -2221,12 +2265,13 @@ checksum = "d2fabcfbdc87f4758337ca535fb41a6d701b65693ce38287d856d1674551ec9b"
[[package]]
name = "grenad"
version = "0.4.7"
version = "0.5.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "350d89047298d3b1b40050acd11ab76e487b854a104b760ebc5a7f375093de77"
checksum = "0e2ac9baf835ee2a7f0622a5617792ced6f65af25994078c343d429431ef2bbc"
dependencies = [
"bytemuck",
"byteorder",
"either",
"rayon",
"tempfile",
]
@ -2317,6 +2362,18 @@ dependencies = [
"allocator-api2",
]
[[package]]
name = "hashbrown"
version = "0.15.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "3a9bfc1af68b1726ea47d3d5109de126281def866b33970e10fbab11b5dafab3"
dependencies = [
"allocator-api2",
"equivalent",
"foldhash",
"serde",
]
[[package]]
name = "heapless"
version = "0.8.0"
@ -2553,13 +2610,14 @@ checksum = "206ca75c9c03ba3d4ace2460e57b189f39f43de612c2f85836e65c929701bb2d"
[[package]]
name = "index-scheduler"
version = "1.11.0"
version = "1.12.0"
dependencies = [
"anyhow",
"arroy",
"arroy 0.5.0 (registry+https://github.com/rust-lang/crates.io-index)",
"big_s",
"bincode",
"crossbeam",
"bumpalo",
"crossbeam-channel",
"csv",
"derive_builder 0.20.0",
"dump",
@ -2571,7 +2629,9 @@ dependencies = [
"meili-snap",
"meilisearch-auth",
"meilisearch-types",
"memmap2",
"page_size",
"raw-collections",
"rayon",
"roaring",
"serde",
@ -2747,7 +2807,7 @@ dependencies = [
[[package]]
name = "json-depth-checker"
version = "1.11.0"
version = "1.12.0"
dependencies = [
"criterion",
"serde_json",
@ -2816,9 +2876,9 @@ dependencies = [
[[package]]
name = "libc"
version = "0.2.155"
version = "0.2.164"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "97b3888a4aecf77e811145cadf6eef5901f4782c53886191b2f693f24761847c"
checksum = "433bfe06b8c75da9b2e3fbea6e5329ff87748f0b144ef75306e674c3f6f7c13f"
[[package]]
name = "libgit2-sys"
@ -3202,9 +3262,9 @@ checksum = "0717cef1bc8b636c6e1c1bbdefc09e6322da8a9321966e8928ef80d20f7f770f"
[[package]]
name = "linux-raw-sys"
version = "0.4.12"
version = "0.4.14"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "c4cd1a83af159aa67994778be9070f0ae1bd732942279cabb14f86f986a21456"
checksum = "78b3ae25bc7c8c38cec158d1f2757ee79e9b3740fbc7ccf0e59e4b08d793fa89"
[[package]]
name = "liquid"
@ -3245,7 +3305,7 @@ checksum = "915f6d0a2963a27cd5205c1902f32ddfe3bc035816afd268cf88c0fc0f8d287e"
dependencies = [
"proc-macro2",
"quote",
"syn 2.0.60",
"syn 2.0.87",
]
[[package]]
@ -3349,7 +3409,7 @@ dependencies = [
"once_cell",
"proc-macro2",
"quote",
"syn 2.0.60",
"syn 2.0.87",
]
[[package]]
@ -3366,7 +3426,7 @@ checksum = "490cc448043f947bae3cbee9c203358d62dbee0db12107a74be5c30ccfd09771"
[[package]]
name = "meili-snap"
version = "1.11.0"
version = "1.12.0"
dependencies = [
"insta",
"md5",
@ -3375,7 +3435,7 @@ dependencies = [
[[package]]
name = "meilisearch"
version = "1.11.0"
version = "1.12.0"
dependencies = [
"actix-cors",
"actix-http",
@ -3465,7 +3525,7 @@ dependencies = [
[[package]]
name = "meilisearch-auth"
version = "1.11.0"
version = "1.12.0"
dependencies = [
"base64 0.22.1",
"enum-iterator",
@ -3484,10 +3544,11 @@ dependencies = [
[[package]]
name = "meilisearch-types"
version = "1.11.0"
version = "1.12.0"
dependencies = [
"actix-web",
"anyhow",
"bumpalo",
"convert_case 0.6.0",
"csv",
"deserr",
@ -3500,6 +3561,7 @@ dependencies = [
"meili-snap",
"memmap2",
"milli",
"raw-collections",
"roaring",
"serde",
"serde-cs",
@ -3514,9 +3576,10 @@ dependencies = [
[[package]]
name = "meilitool"
version = "1.11.0"
version = "1.12.0"
dependencies = [
"anyhow",
"arroy 0.5.0 (git+https://github.com/meilisearch/arroy/?tag=DO-NOT-DELETE-upgrade-v04-to-v05)",
"clap",
"dump",
"file-store",
@ -3535,9 +3598,9 @@ checksum = "78ca9ab1a0babb1e7d5695e3530886289c18cf2f87ec19a575a0abdce112e3a3"
[[package]]
name = "memmap2"
version = "0.9.4"
version = "0.9.5"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "fe751422e4a8caa417e13c3ea66452215d7d63e19e604f4980461212f3ae1322"
checksum = "fd3f7eed9d3848f8b98834af67102b720745c4ec028fcd0aa0239277e7de374f"
dependencies = [
"libc",
"stable_deref_trait",
@ -3545,13 +3608,16 @@ dependencies = [
[[package]]
name = "milli"
version = "1.11.0"
version = "1.12.0"
dependencies = [
"arroy",
"allocator-api2",
"arroy 0.5.0 (registry+https://github.com/rust-lang/crates.io-index)",
"bbqueue",
"big_s",
"bimap",
"bincode",
"bstr",
"bumpalo",
"bytemuck",
"byteorder",
"candle-core",
@ -3563,12 +3629,15 @@ dependencies = [
"csv",
"deserr",
"either",
"enum-iterator",
"filter-parser",
"flatten-serde-json",
"flume",
"fst",
"fxhash",
"geoutils",
"grenad",
"hashbrown 0.15.1",
"heed",
"hf-hub",
"indexmap",
@ -3587,11 +3656,13 @@ dependencies = [
"once_cell",
"ordered-float",
"rand",
"raw-collections",
"rayon",
"rayon-par-bridge",
"rhai",
"roaring",
"rstar",
"rustc-hash 2.0.0",
"serde",
"serde_json",
"slice-group-by",
@ -3600,10 +3671,12 @@ dependencies = [
"smartstring",
"tempfile",
"thiserror",
"thread_local",
"tiktoken-rs",
"time",
"tokenizers",
"tracing",
"uell",
"ureq",
"url",
"uuid",
@ -3679,7 +3752,7 @@ checksum = "371717c0a5543d6a800cac822eac735aa7d2d2fbb41002e9856a4089532dbdce"
dependencies = [
"proc-macro2",
"quote",
"syn 2.0.60",
"syn 2.0.87",
]
[[package]]
@ -3815,7 +3888,7 @@ dependencies = [
"proc-macro-crate",
"proc-macro2",
"quote",
"syn 2.0.60",
"syn 2.0.87",
]
[[package]]
@ -3844,9 +3917,9 @@ dependencies = [
[[package]]
name = "obkv"
version = "0.2.2"
version = "0.3.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "a2e27bcfe835a379d32352112f6b8dbae2d99d16a5fff42abe6e5ba5386c1e5a"
checksum = "ae4512a8f418ac322335255a72361b9ac927e106f4d7fe6ab4d8ac59cb01f7a9"
[[package]]
name = "once_cell"
@ -3991,7 +4064,7 @@ checksum = "e3148f5046208a5d56bcfc03053e3ca6334e51da8dfb19b6cdc8b306fae3283e"
[[package]]
name = "permissive-json-pointer"
version = "1.11.0"
version = "1.12.0"
dependencies = [
"big_s",
"serde_json",
@ -4027,7 +4100,7 @@ dependencies = [
"pest_meta",
"proc-macro2",
"quote",
"syn 2.0.60",
"syn 2.0.87",
]
[[package]]
@ -4081,7 +4154,7 @@ dependencies = [
"phf_shared",
"proc-macro2",
"quote",
"syn 2.0.60",
"syn 2.0.87",
]
[[package]]
@ -4110,7 +4183,7 @@ checksum = "266c042b60c9c76b8d53061e52b2e0d1116abc57cefc8c5cd671619a56ac3690"
dependencies = [
"proc-macro2",
"quote",
"syn 2.0.60",
"syn 2.0.87",
]
[[package]]
@ -4227,9 +4300,9 @@ dependencies = [
[[package]]
name = "proc-macro2"
version = "1.0.81"
version = "1.0.89"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "3d1597b0c024618f09a9c3b8655b7e430397a36d23fdafec26d6965e9eec3eba"
checksum = "f139b0662de085916d1fb67d2b4169d1addddda1919e696f3252b740b629986e"
dependencies = [
"unicode-ident",
]
@ -4414,6 +4487,19 @@ dependencies = [
"rand",
]
[[package]]
name = "raw-collections"
version = "0.1.0"
source = "git+https://github.com/meilisearch/raw-collections.git#15e5d7bdebc0c149b2a28b2454f307c717d07f8a"
dependencies = [
"allocator-api2",
"bitpacking",
"bumpalo",
"hashbrown 0.15.1",
"serde",
"serde_json",
]
[[package]]
name = "raw-cpuid"
version = "10.7.0"
@ -4611,7 +4697,7 @@ source = "git+https://github.com/rhaiscript/rhai?rev=ef3df63121d27aacd838f366f2b
dependencies = [
"proc-macro2",
"quote",
"syn 2.0.60",
"syn 2.0.87",
]
[[package]]
@ -4660,9 +4746,9 @@ dependencies = [
[[package]]
name = "roaring"
version = "0.10.6"
version = "0.10.7"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "8f4b84ba6e838ceb47b41de5194a60244fac43d9fe03b71dbe8c5a201081d6d1"
checksum = "f81dc953b2244ddd5e7860cb0bb2a790494b898ef321d4aff8e260efab60cc88"
dependencies = [
"bytemuck",
"byteorder",
@ -4726,9 +4812,9 @@ dependencies = [
[[package]]
name = "rustix"
version = "0.38.31"
version = "0.38.41"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "6ea3e1a662af26cd7a3ba09c0297a31af215563ecf42817c98df621387f4e949"
checksum = "d7f649912bc1495e167a6edee79151c84b1bad49748cb4f1f1167f459f6224f6"
dependencies = [
"bitflags 2.6.0",
"errno",
@ -4853,9 +4939,9 @@ checksum = "a3f0bf26fd526d2a95683cd0f87bf103b8539e2ca1ef48ce002d67aad59aa0b4"
[[package]]
name = "serde"
version = "1.0.209"
version = "1.0.214"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "99fce0ffe7310761ca6bf9faf5115afbc19688edd00171d81b1bb1b116c63e09"
checksum = "f55c3193aca71c12ad7890f1785d2b73e1b9f63a0bbc353c08ef26fe03fc56b5"
dependencies = [
"serde_derive",
]
@ -4871,23 +4957,24 @@ dependencies = [
[[package]]
name = "serde_derive"
version = "1.0.209"
version = "1.0.214"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "a5831b979fd7b5439637af1752d535ff49f4860c0f341d1baeb6faf0f4242170"
checksum = "de523f781f095e28fa605cdce0f8307e451cc0fd14e2eb4cd2e98a355b147766"
dependencies = [
"proc-macro2",
"quote",
"syn 2.0.60",
"syn 2.0.87",
]
[[package]]
name = "serde_json"
version = "1.0.120"
version = "1.0.132"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "4e0d21c9a8cae1235ad58a00c11cb40d4b1e5c784f1ef2c537876ed6ffd8b7c5"
checksum = "d726bfaff4b320266d395898905d0eba0345aae23b54aee3a737e260fd46db03"
dependencies = [
"indexmap",
"itoa",
"memchr",
"ryu",
"serde",
]
@ -5103,6 +5190,9 @@ name = "spin"
version = "0.9.8"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "6980e8d7511241f8acf4aebddbb1ff938df5eebe98691418c4468d0b72a96a67"
dependencies = [
"lock_api",
]
[[package]]
name = "spm_precompiled"
@ -5170,7 +5260,7 @@ dependencies = [
"proc-macro2",
"quote",
"rustversion",
"syn 2.0.60",
"syn 2.0.87",
]
[[package]]
@ -5192,9 +5282,9 @@ dependencies = [
[[package]]
name = "syn"
version = "2.0.60"
version = "2.0.87"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "909518bc7b1c9b779f1bbf07f2929d35af9f0f37e47c6e9ef7f9dddc1e1821f3"
checksum = "25aa4ce346d03a6dcd68dd8b4010bcb74e54e62c90c573f394c46eae99aba32d"
dependencies = [
"proc-macro2",
"quote",
@ -5210,7 +5300,7 @@ dependencies = [
"proc-macro-error",
"proc-macro2",
"quote",
"syn 2.0.60",
"syn 2.0.87",
]
[[package]]
@ -5236,7 +5326,7 @@ checksum = "c8af7666ab7b6390ab78131fb5b0fce11d6b7a6951602017c35fa82800708971"
dependencies = [
"proc-macro2",
"quote",
"syn 2.0.60",
"syn 2.0.87",
]
[[package]]
@ -5296,12 +5386,13 @@ dependencies = [
[[package]]
name = "tempfile"
version = "3.10.1"
version = "3.14.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "85b77fafb263dd9d05cbeac119526425676db3784113aa9295c88498cbf8bff1"
checksum = "28cce251fcbc87fac86a866eeb0d6c2d536fc16d06f184bb61aeae11aa4cee0c"
dependencies = [
"cfg-if",
"fastrand",
"once_cell",
"rustix",
"windows-sys 0.52.0",
]
@ -5341,14 +5432,14 @@ checksum = "46c3384250002a6d5af4d114f2845d37b57521033f30d5c3f46c4d70e1197533"
dependencies = [
"proc-macro2",
"quote",
"syn 2.0.60",
"syn 2.0.87",
]
[[package]]
name = "thread_local"
version = "1.1.7"
version = "1.1.8"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "3fdd6f064ccff2d6567adcb3873ca630700f00b5ad3f060c25b5dcfd9a4ce152"
checksum = "8b9ef9bad013ada3808854ceac7b46812a6465ba368859a37e2100283d2d719c"
dependencies = [
"cfg-if",
"once_cell",
@ -5493,7 +5584,7 @@ checksum = "5f5ae998a069d4b5aba8ee9dad856af7d520c3699e6159b185c2acd48155d39a"
dependencies = [
"proc-macro2",
"quote",
"syn 2.0.60",
"syn 2.0.87",
]
[[package]]
@ -5625,7 +5716,7 @@ checksum = "34704c8d6ebcbc939824180af020566b01a7c01f80641264eba0999f6c2b6be7"
dependencies = [
"proc-macro2",
"quote",
"syn 2.0.60",
"syn 2.0.87",
]
[[package]]
@ -5720,6 +5811,15 @@ version = "0.1.6"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "ed646292ffc8188ef8ea4d1e0e0150fb15a5c2e12ad9b8fc191ae7a8a7f3c4b9"
[[package]]
name = "uell"
version = "0.1.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "40de5982e28612e20330e77d81f1559b74f66caf3c7fc10b19ada4843f4b4fd7"
dependencies = [
"bumpalo",
]
[[package]]
name = "unescaper"
version = "0.1.5"
@ -5926,9 +6026,9 @@ dependencies = [
[[package]]
name = "wana_kana"
version = "3.0.0"
version = "4.0.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "477976a5c56fb7b014795df5a2ce08d2de8bcd4d5980844c5bd3978a7fd1c30b"
checksum = "a74666202acfcb4f9b995be2e3e9f7f530deb65e05a1407b8d0b30c9c451238a"
dependencies = [
"fnv",
"itertools 0.10.5",
@ -5971,7 +6071,7 @@ dependencies = [
"once_cell",
"proc-macro2",
"quote",
"syn 2.0.60",
"syn 2.0.87",
"wasm-bindgen-shared",
]
@ -6005,7 +6105,7 @@ checksum = "e94f17b526d0a461a191c78ea52bbce64071ed5c04c9ffe424dcb38f74171bb7"
dependencies = [
"proc-macro2",
"quote",
"syn 2.0.60",
"syn 2.0.87",
"wasm-bindgen-backend",
"wasm-bindgen-shared",
]
@ -6380,7 +6480,7 @@ dependencies = [
[[package]]
name = "xtask"
version = "1.11.0"
version = "1.12.0"
dependencies = [
"anyhow",
"build-info",
@ -6438,7 +6538,7 @@ checksum = "9e6936f0cce458098a201c245a11bef556c6a0181129c7034d10d76d1ec3a2b8"
dependencies = [
"proc-macro2",
"quote",
"syn 2.0.60",
"syn 2.0.87",
"synstructure",
]
@ -6459,7 +6559,7 @@ checksum = "9ce1b18ccd8e73a9321186f97e46f9f04b778851177567b1975109d26a08d2a6"
dependencies = [
"proc-macro2",
"quote",
"syn 2.0.60",
"syn 2.0.87",
]
[[package]]
@ -6479,7 +6579,7 @@ checksum = "e6a647510471d372f2e6c2e6b7219e44d8c574d24fdc11c610a61455782f18c3"
dependencies = [
"proc-macro2",
"quote",
"syn 2.0.60",
"syn 2.0.87",
"synstructure",
]
@ -6500,7 +6600,7 @@ checksum = "ce36e65b0d2999d2aafac989fb249189a141aee1f53c612c1f37d72631959f69"
dependencies = [
"proc-macro2",
"quote",
"syn 2.0.60",
"syn 2.0.87",
]
[[package]]

View File

@ -1,28 +1,28 @@
[workspace]
resolver = "2"
members = [
"meilisearch",
"meilitool",
"meilisearch-types",
"meilisearch-auth",
"meili-snap",
"index-scheduler",
"dump",
"file-store",
"permissive-json-pointer",
"milli",
"filter-parser",
"flatten-serde-json",
"json-depth-checker",
"benchmarks",
"fuzzers",
"tracing-trace",
"xtask",
"build-info",
"crates/meilisearch",
"crates/meilitool",
"crates/meilisearch-types",
"crates/meilisearch-auth",
"crates/meili-snap",
"crates/index-scheduler",
"crates/dump",
"crates/file-store",
"crates/permissive-json-pointer",
"crates/milli",
"crates/filter-parser",
"crates/flatten-serde-json",
"crates/json-depth-checker",
"crates/benchmarks",
"crates/fuzzers",
"crates/tracing-trace",
"crates/xtask",
"crates/build-info",
]
[workspace.package]
version = "1.11.0"
version = "1.12.0"
authors = [
"Quentin de Quelen <quentin@dequelen.me>",
"Clément Renault <clement@meilisearch.com>",
@ -43,24 +43,3 @@ opt-level = 3
opt-level = 3
[profile.dev.package.roaring]
opt-level = 3
[profile.dev.package.lindera-ipadic-builder]
opt-level = 3
[profile.dev.package.encoding]
opt-level = 3
[profile.dev.package.yada]
opt-level = 3
[profile.release.package.lindera-ipadic-builder]
opt-level = 3
[profile.release.package.encoding]
opt-level = 3
[profile.release.package.yada]
opt-level = 3
[profile.bench.package.lindera-ipadic-builder]
opt-level = 3
[profile.bench.package.encoding]
opt-level = 3
[profile.bench.package.yada]
opt-level = 3

View File

@ -21,6 +21,7 @@ RUN set -eux; \
# Run
FROM alpine:3.20
LABEL org.opencontainers.image.source="https://github.com/meilisearch/meilisearch"
ENV MEILI_HTTP_ADDR 0.0.0.0:7700
ENV MEILI_SERVER_PROVIDER docker

View File

@ -1,6 +1,9 @@
<p align="center">
<a href="https://www.meilisearch.com/?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=logo" target="_blank">
<img src="assets/meilisearch-logo-kawaii.png">
<a href="https://www.meilisearch.com/?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=logo#gh-light-mode-only" target="_blank">
<img src="assets/meilisearch-logo-light.svg?sanitize=true#gh-light-mode-only">
</a>
<a href="https://www.meilisearch.com/?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=logo#gh-dark-mode-only" target="_blank">
<img src="assets/meilisearch-logo-dark.svg?sanitize=true#gh-dark-mode-only">
</a>
</p>

Binary file not shown.

Before

Width:  |  Height:  |  Size: 98 KiB

File diff suppressed because it is too large Load Diff

View File

@ -1,6 +1,6 @@
status = [
'Tests on ubuntu-20.04',
'Tests on macos-12',
'Tests on macos-13',
'Tests on windows-2022',
'Run Clippy',
'Run Rustfmt',

View File

@ -12,16 +12,19 @@ license.workspace = true
[dependencies]
anyhow = "1.0.86"
bumpalo = "3.16.0"
csv = "1.3.0"
memmap2 = "0.9.5"
milli = { path = "../milli" }
mimalloc = { version = "0.1.43", default-features = false }
serde_json = { version = "1.0.120", features = ["preserve_order"] }
tempfile = "3.14.0"
[dev-dependencies]
criterion = { version = "0.5.1", features = ["html_reports"] }
rand = "0.8.5"
rand_chacha = "0.3.1"
roaring = "0.10.6"
roaring = "0.10.7"
[build-dependencies]
anyhow = "1.0.86"

File diff suppressed because it is too large Load Diff

View File

@ -5,6 +5,7 @@ use criterion::{criterion_group, criterion_main};
use milli::update::Settings;
use utils::Conf;
#[cfg(not(windows))]
#[global_allocator]
static ALLOC: mimalloc::MiMalloc = mimalloc::MiMalloc;

View File

@ -5,6 +5,7 @@ use criterion::{criterion_group, criterion_main};
use milli::update::Settings;
use utils::Conf;
#[cfg(not(windows))]
#[global_allocator]
static ALLOC: mimalloc::MiMalloc = mimalloc::MiMalloc;

View File

@ -5,6 +5,7 @@ use criterion::{criterion_group, criterion_main};
use milli::update::Settings;
use utils::Conf;
#[cfg(not(windows))]
#[global_allocator]
static ALLOC: mimalloc::MiMalloc = mimalloc::MiMalloc;

View File

@ -1,17 +1,18 @@
#![allow(dead_code)]
use std::fs::{create_dir_all, remove_dir_all, File};
use std::io::{self, BufRead, BufReader, Cursor, Read, Seek};
use std::num::ParseFloatError;
use std::io::{self, BufReader, BufWriter, Read};
use std::path::Path;
use std::str::FromStr;
use std::str::FromStr as _;
use anyhow::Context;
use bumpalo::Bump;
use criterion::BenchmarkId;
use milli::documents::{DocumentsBatchBuilder, DocumentsBatchReader};
use memmap2::Mmap;
use milli::heed::EnvOpenOptions;
use milli::update::{
IndexDocuments, IndexDocumentsConfig, IndexDocumentsMethod, IndexerConfig, Settings,
};
use milli::update::new::indexer;
use milli::update::{IndexDocumentsMethod, IndexerConfig, Settings};
use milli::vector::EmbeddingConfigs;
use milli::{Criterion, Filter, Index, Object, TermsMatchingStrategy};
use serde_json::Value;
@ -65,7 +66,7 @@ pub fn base_setup(conf: &Conf) -> Index {
let mut options = EnvOpenOptions::new();
options.map_size(100 * 1024 * 1024 * 1024); // 100 GB
options.max_readers(10);
options.max_readers(100);
let index = Index::new(options, conf.database_name).unwrap();
let config = IndexerConfig::default();
@ -92,18 +93,44 @@ pub fn base_setup(conf: &Conf) -> Index {
let config = IndexerConfig::default();
let mut wtxn = index.write_txn().unwrap();
let indexing_config = IndexDocumentsConfig {
autogenerate_docids: conf.primary_key.is_none(),
update_method: IndexDocumentsMethod::ReplaceDocuments,
..Default::default()
};
let builder =
IndexDocuments::new(&mut wtxn, &index, &config, indexing_config, |_| (), || false).unwrap();
let rtxn = index.read_txn().unwrap();
let db_fields_ids_map = index.fields_ids_map(&rtxn).unwrap();
let mut new_fields_ids_map = db_fields_ids_map.clone();
let documents = documents_from(conf.dataset, conf.dataset_format);
let (builder, user_error) = builder.add_documents(documents).unwrap();
user_error.unwrap();
builder.execute().unwrap();
let mut indexer = indexer::DocumentOperation::new(IndexDocumentsMethod::ReplaceDocuments);
indexer.add_documents(&documents).unwrap();
let indexer_alloc = Bump::new();
let (document_changes, _operation_stats, primary_key) = indexer
.into_changes(
&indexer_alloc,
&index,
&rtxn,
None,
&mut new_fields_ids_map,
&|| false,
&|_progress| (),
)
.unwrap();
indexer::index(
&mut wtxn,
&index,
&milli::ThreadPoolNoAbortBuilder::new().build().unwrap(),
config.grenad_parameters(),
&db_fields_ids_map,
new_fields_ids_map,
primary_key,
&document_changes,
EmbeddingConfigs::default(),
&|| false,
&|_| (),
)
.unwrap();
wtxn.commit().unwrap();
drop(rtxn);
index
}
@ -140,49 +167,96 @@ pub fn run_benches(c: &mut criterion::Criterion, confs: &[Conf]) {
}
}
pub fn documents_from(filename: &str, filetype: &str) -> DocumentsBatchReader<impl BufRead + Seek> {
let reader = File::open(filename)
.unwrap_or_else(|_| panic!("could not find the dataset in: {}", filename));
let reader = BufReader::new(reader);
let documents = match filetype {
"csv" => documents_from_csv(reader).unwrap(),
"json" => documents_from_json(reader).unwrap(),
"jsonl" => documents_from_jsonl(reader).unwrap(),
otherwise => panic!("invalid update format {:?}", otherwise),
};
DocumentsBatchReader::from_reader(Cursor::new(documents)).unwrap()
pub fn documents_from(filename: &str, filetype: &str) -> Mmap {
let file = File::open(filename)
.unwrap_or_else(|_| panic!("could not find the dataset in: {filename}"));
match filetype {
"csv" => documents_from_csv(file).unwrap(),
"json" => documents_from_json(file).unwrap(),
"jsonl" => documents_from_jsonl(file).unwrap(),
otherwise => panic!("invalid update format {otherwise:?}"),
}
}
fn documents_from_jsonl(reader: impl BufRead) -> anyhow::Result<Vec<u8>> {
let mut documents = DocumentsBatchBuilder::new(Vec::new());
fn documents_from_jsonl(file: File) -> anyhow::Result<Mmap> {
unsafe { Mmap::map(&file).map_err(Into::into) }
}
for result in serde_json::Deserializer::from_reader(reader).into_iter::<Object>() {
let object = result?;
documents.append_json_object(&object)?;
fn documents_from_json(file: File) -> anyhow::Result<Mmap> {
let reader = BufReader::new(file);
let documents: Vec<milli::Object> = serde_json::from_reader(reader)?;
let mut output = tempfile::tempfile().map(BufWriter::new)?;
for document in documents {
serde_json::to_writer(&mut output, &document)?;
}
documents.into_inner().map_err(Into::into)
let file = output.into_inner()?;
unsafe { Mmap::map(&file).map_err(Into::into) }
}
fn documents_from_json(reader: impl BufRead) -> anyhow::Result<Vec<u8>> {
let mut documents = DocumentsBatchBuilder::new(Vec::new());
fn documents_from_csv(file: File) -> anyhow::Result<Mmap> {
let output = tempfile::tempfile()?;
let mut output = BufWriter::new(output);
let mut reader = csv::ReaderBuilder::new().from_reader(file);
documents.append_json_array(reader)?;
let headers = reader.headers().context("while retrieving headers")?.clone();
let typed_fields: Vec<_> = headers.iter().map(parse_csv_header).collect();
let mut object: serde_json::Map<_, _> =
typed_fields.iter().map(|(k, _)| (k.to_string(), Value::Null)).collect();
documents.into_inner().map_err(Into::into)
}
let mut line = 0;
let mut record = csv::StringRecord::new();
while reader.read_record(&mut record).context("while reading a record")? {
// We increment here and not at the end of the loop
// to take the header offset into account.
line += 1;
fn documents_from_csv(reader: impl BufRead) -> anyhow::Result<Vec<u8>> {
let csv = csv::Reader::from_reader(reader);
// Reset the document values
object.iter_mut().for_each(|(_, v)| *v = Value::Null);
let mut documents = DocumentsBatchBuilder::new(Vec::new());
documents.append_csv(csv)?;
for (i, (name, atype)) in typed_fields.iter().enumerate() {
let value = &record[i];
let trimmed_value = value.trim();
let value = match atype {
AllowedType::Number if trimmed_value.is_empty() => Value::Null,
AllowedType::Number => {
match trimmed_value.parse::<i64>() {
Ok(integer) => Value::from(integer),
Err(_) => match trimmed_value.parse::<f64>() {
Ok(float) => Value::from(float),
Err(error) => {
anyhow::bail!("document format error on line {line}: {error}. For value: {value}")
}
},
}
}
AllowedType::Boolean if trimmed_value.is_empty() => Value::Null,
AllowedType::Boolean => match trimmed_value.parse::<bool>() {
Ok(bool) => Value::from(bool),
Err(error) => {
anyhow::bail!(
"document format error on line {line}: {error}. For value: {value}"
)
}
},
AllowedType::String if value.is_empty() => Value::Null,
AllowedType::String => Value::from(value),
};
documents.into_inner().map_err(Into::into)
*object.get_mut(name).expect("encountered an unknown field") = value;
}
serde_json::to_writer(&mut output, &object).context("while writing to disk")?;
}
let output = output.into_inner()?;
unsafe { Mmap::map(&output).map_err(Into::into) }
}
enum AllowedType {
String,
Boolean,
Number,
}
@ -191,8 +265,9 @@ fn parse_csv_header(header: &str) -> (String, AllowedType) {
match header.rsplit_once(':') {
Some((field_name, field_type)) => match field_type {
"string" => (field_name.to_string(), AllowedType::String),
"boolean" => (field_name.to_string(), AllowedType::Boolean),
"number" => (field_name.to_string(), AllowedType::Number),
// we may return an error in this case.
// if the pattern isn't recognized, we keep the whole field.
_otherwise => (header.to_string(), AllowedType::String),
},
None => (header.to_string(), AllowedType::String),
@ -230,10 +305,13 @@ impl<R: Read> Iterator for CSVDocumentDeserializer<R> {
for ((field_name, field_type), value) in
self.headers.iter().zip(csv_document.into_iter())
{
let parsed_value: Result<Value, ParseFloatError> = match field_type {
let parsed_value: anyhow::Result<Value> = match field_type {
AllowedType::Number => {
value.parse::<f64>().map(Value::from).map_err(Into::into)
}
AllowedType::Boolean => {
value.parse::<bool>().map(Value::from).map_err(Into::into)
}
AllowedType::String => Ok(Value::String(value.to_string())),
};

View File

@ -17,7 +17,7 @@ http = "1.1.0"
meilisearch-types = { path = "../meilisearch-types" }
once_cell = "1.19.0"
regex = "1.10.5"
roaring = { version = "0.10.6", features = ["serde"] }
roaring = { version = "0.10.7", features = ["serde"] }
serde = { version = "1.0.204", features = ["derive"] }
serde_json = { version = "1.0.120", features = ["preserve_order"] }
tar = "0.4.41"

View File

@ -1,6 +1,7 @@
#![allow(clippy::type_complexity)]
#![allow(clippy::wrong_self_convention)]
use meilisearch_types::batches::BatchId;
use meilisearch_types::error::ResponseError;
use meilisearch_types::keys::Key;
use meilisearch_types::milli::update::IndexDocumentsMethod;
@ -57,6 +58,9 @@ pub enum Version {
#[serde(rename_all = "camelCase")]
pub struct TaskDump {
pub uid: TaskId,
// The batch ID were introduced in v1.12, everything prior to this version will be `None`.
#[serde(default)]
pub batch_uid: Option<BatchId>,
#[serde(default)]
pub index_uid: Option<String>,
pub status: Status,
@ -143,6 +147,7 @@ impl From<Task> for TaskDump {
fn from(task: Task) -> Self {
TaskDump {
uid: task.uid,
batch_uid: task.batch_uid,
index_uid: task.index_uid().map(|uid| uid.to_string()),
status: task.status,
kind: task.kind.into(),
@ -287,6 +292,8 @@ pub(crate) mod test {
embedders: Setting::NotSet,
search_cutoff_ms: Setting::NotSet,
localized_attributes: Setting::NotSet,
facet_search: Setting::NotSet,
prefix_search: Setting::NotSet,
_kind: std::marker::PhantomData,
};
settings.check()
@ -297,6 +304,7 @@ pub(crate) mod test {
(
TaskDump {
uid: 0,
batch_uid: Some(0),
index_uid: Some(S("doggo")),
status: Status::Succeeded,
kind: KindDump::DocumentImport {
@ -320,6 +328,7 @@ pub(crate) mod test {
(
TaskDump {
uid: 1,
batch_uid: None,
index_uid: Some(S("doggo")),
status: Status::Enqueued,
kind: KindDump::DocumentImport {
@ -346,6 +355,7 @@ pub(crate) mod test {
(
TaskDump {
uid: 5,
batch_uid: None,
index_uid: Some(S("catto")),
status: Status::Enqueued,
kind: KindDump::IndexDeletion,

View File

@ -70,6 +70,7 @@ impl CompatV5ToV6 {
let task = v6::Task {
uid: task_view.uid,
batch_uid: None,
index_uid: task_view.index_uid,
status: match task_view.status {
v5::Status::Enqueued => v6::Status::Enqueued,
@ -381,6 +382,8 @@ impl<T> From<v5::Settings<T>> for v6::Settings<v6::Unchecked> {
embedders: v6::Setting::NotSet,
localized_attributes: v6::Setting::NotSet,
search_cutoff_ms: v6::Setting::NotSet,
facet_search: v6::Setting::NotSet,
prefix_search: v6::Setting::NotSet,
_kind: std::marker::PhantomData,
}
}
@ -449,7 +452,7 @@ pub(crate) mod test {
// tasks
let tasks = dump.tasks().unwrap().collect::<Result<Vec<_>>>().unwrap();
let (tasks, update_files): (Vec<_>, Vec<_>) = tasks.into_iter().unzip();
meili_snap::snapshot_hash!(meili_snap::json_string!(tasks), @"41f91d3a94911b2735ec41b07540df5c");
meili_snap::snapshot_hash!(meili_snap::json_string!(tasks), @"4b03e23e740b27bfb9d2a1faffe512e2");
assert_eq!(update_files.len(), 22);
assert!(update_files[0].is_none()); // the dump creation
assert!(update_files[1].is_some()); // the enqueued document addition

View File

@ -222,7 +222,7 @@ pub(crate) mod test {
// tasks
let tasks = dump.tasks().unwrap().collect::<Result<Vec<_>>>().unwrap();
let (tasks, update_files): (Vec<_>, Vec<_>) = tasks.into_iter().unzip();
meili_snap::snapshot_hash!(meili_snap::json_string!(tasks), @"278f63325ef06ca04d01df98d8207b94");
meili_snap::snapshot_hash!(meili_snap::json_string!(tasks), @"2b8a72d6bc6ba79980491966437daaf9");
assert_eq!(update_files.len(), 10);
assert!(update_files[0].is_none()); // the dump creation
assert!(update_files[1].is_none());
@ -345,7 +345,7 @@ pub(crate) mod test {
// tasks
let tasks = dump.tasks().unwrap().collect::<Result<Vec<_>>>().unwrap();
let (tasks, update_files): (Vec<_>, Vec<_>) = tasks.into_iter().unzip();
meili_snap::snapshot_hash!(meili_snap::json_string!(tasks), @"d45cd8571703e58ae53c7bd7ce3f5c22");
meili_snap::snapshot_hash!(meili_snap::json_string!(tasks), @"3ddf6169b0a3703c5d770971f036fc5d");
assert_eq!(update_files.len(), 2);
assert!(update_files[0].is_none()); // the dump creation
assert!(update_files[1].is_none()); // the processed document addition
@ -391,7 +391,7 @@ pub(crate) mod test {
// tasks
let tasks = dump.tasks().unwrap().collect::<Result<Vec<_>>>().unwrap();
let (tasks, update_files): (Vec<_>, Vec<_>) = tasks.into_iter().unzip();
meili_snap::snapshot_hash!(meili_snap::json_string!(tasks), @"41f91d3a94911b2735ec41b07540df5c");
meili_snap::snapshot_hash!(meili_snap::json_string!(tasks), @"4b03e23e740b27bfb9d2a1faffe512e2");
assert_eq!(update_files.len(), 22);
assert!(update_files[0].is_none()); // the dump creation
assert!(update_files[1].is_some()); // the enqueued document addition
@ -471,7 +471,7 @@ pub(crate) mod test {
// tasks
let tasks = dump.tasks().unwrap().collect::<Result<Vec<_>>>().unwrap();
let (tasks, update_files): (Vec<_>, Vec<_>) = tasks.into_iter().unzip();
meili_snap::snapshot_hash!(meili_snap::json_string!(tasks), @"c2445ddd1785528b80f2ba534d3bd00c");
meili_snap::snapshot_hash!(meili_snap::json_string!(tasks), @"c1b06a5ca60d5805483c16c5b3ff61ef");
assert_eq!(update_files.len(), 10);
assert!(update_files[0].is_some()); // the enqueued document addition
assert!(update_files[1..].iter().all(|u| u.is_none())); // everything already processed
@ -548,7 +548,7 @@ pub(crate) mod test {
// tasks
let tasks = dump.tasks().unwrap().collect::<Result<Vec<_>>>().unwrap();
let (tasks, update_files): (Vec<_>, Vec<_>) = tasks.into_iter().unzip();
meili_snap::snapshot_hash!(meili_snap::json_string!(tasks), @"cd12efd308fe3ed226356a727ab42ed3");
meili_snap::snapshot_hash!(meili_snap::json_string!(tasks), @"0e203b6095f7c68dbdf788321dcc8215");
assert_eq!(update_files.len(), 10);
assert!(update_files[0].is_some()); // the enqueued document addition
assert!(update_files[1..].iter().all(|u| u.is_none())); // everything already processed
@ -641,7 +641,7 @@ pub(crate) mod test {
// tasks
let tasks = dump.tasks().unwrap().collect::<Result<Vec<_>>>().unwrap();
let (tasks, update_files): (Vec<_>, Vec<_>) = tasks.into_iter().unzip();
meili_snap::snapshot_hash!(meili_snap::json_string!(tasks), @"bc616290adfe7d09a624cf6065ca9069");
meili_snap::snapshot_hash!(meili_snap::json_string!(tasks), @"d216c7f90f538ffbb2a059531d7ac89a");
assert_eq!(update_files.len(), 9);
assert!(update_files[0].is_some()); // the enqueued document addition
assert!(update_files[1..].iter().all(|u| u.is_none())); // everything already processed
@ -734,7 +734,7 @@ pub(crate) mod test {
// tasks
let tasks = dump.tasks().unwrap().collect::<Result<Vec<_>>>().unwrap();
let (tasks, update_files): (Vec<_>, Vec<_>) = tasks.into_iter().unzip();
meili_snap::snapshot_hash!(meili_snap::json_string!(tasks), @"2db37756d8af1fb7623436b76e8956a6");
meili_snap::snapshot_hash!(meili_snap::json_string!(tasks), @"e27999f1112632222cb84f6cffff7c5f");
assert_eq!(update_files.len(), 8);
assert!(update_files[0..].iter().all(|u| u.is_none())); // everything already processed
@ -810,7 +810,7 @@ pub(crate) mod test {
// tasks
let tasks = dump.tasks().unwrap().collect::<Result<Vec<_>>>().unwrap();
let (tasks, update_files): (Vec<_>, Vec<_>) = tasks.into_iter().unzip();
meili_snap::snapshot_hash!(meili_snap::json_string!(tasks), @"8df6eab075a44b3c1af6b726f9fd9a43");
meili_snap::snapshot_hash!(meili_snap::json_string!(tasks), @"0155a664b0cf62aae23db5138b6b03d7");
assert_eq!(update_files.len(), 9);
assert!(update_files[..].iter().all(|u| u.is_none())); // no update file in dump v1

Some files were not shown because too many files have changed in this diff Show More