36c27a18a1
implement the dry run ha parameter
2024-02-26 13:58:04 +01:00
1eb1c043b5
disable the auto deletion of tasks when the ha mode is enabled
2024-02-26 13:58:04 +01:00
eb25b07390
let you specify your task id
2024-02-26 13:56:31 +01:00
066a7a3cde
take only one read transaction per thread
2024-02-26 10:43:04 +01:00
91cdd502f8
When processing tasks, make the update file deletion atomic
2024-02-22 14:56:22 +01:00
3b6544db6d
Implement the experimental log mode cli flag
2024-02-13 18:09:15 +01:00
ef994d84d0
Change error messages and fix tests
2024-02-08 15:04:06 +01:00
f70a615ed9
update the github discussion links
2024-02-08 15:04:05 +01:00
7ff722b72e
get rid of the log dependencies everywhere
2024-02-08 15:04:05 +01:00
e23ec4886d
fix the tests and add tests on the experimental features
2024-02-08 15:04:03 +01:00
7793ba67a4
hide the route logs behind a feature flag
2024-02-08 15:03:33 +01:00
2f1abd2c03
nelson is not used anymore
2024-02-08 15:03:32 +01:00
02e6c8a440
Add tracing to index-scheduler
2024-02-08 15:03:31 +01:00
05edd85d75
Stabilize scoreDetails
2024-02-06 11:15:19 +01:00
5869ca7716
Upgrade all compatible dependencies
2024-01-16 15:05:03 +01:00
1ccde9bf0b
Merge #4316
4316: Autobatch the task deletions r=curquiza a=irevoire
# Pull Request
## Related issue
Fix part of https://github.com/meilisearch/meilisearch-support/issues/69
Fix #4315
## What does this PR do?
- Autobatch the task deletions
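A minimal sketch of what autobatching task deletions could look like, assuming a much simplified task model. `Task`, `TaskKind`, and `batch_task_deletions` are hypothetical names for illustration only and are not taken from the index-scheduler code:

```rust
/// Hypothetical, simplified task kinds; a real scheduler has many more.
#[derive(Debug, Clone, PartialEq, Eq)]
enum TaskKind {
    TaskDeletion,
    Other,
}

/// Hypothetical enqueued task: an id plus its kind.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Task {
    id: u32,
    kind: TaskKind,
}

/// Collects the ids of every enqueued task deletion so that they can be
/// processed together as one batch instead of one task at a time.
/// Returns `None` when there is nothing to batch.
fn batch_task_deletions(enqueued: &[Task]) -> Option<Vec<u32>> {
    let ids: Vec<u32> = enqueued
        .iter()
        .filter(|task| task.kind == TaskKind::TaskDeletion)
        .map(|task| task.id)
        .collect();
    if ids.is_empty() {
        None
    } else {
        Some(ids)
    }
}
```

With three enqueued tasks where the first and last are deletions, the function returns both ids in a single batch.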
Co-authored-by: Tamo <tamo@meilisearch.com>
2024-01-15 17:54:50 +00:00
b4d7d80ad9
autobatch the task deletions
2024-01-11 14:58:07 +01:00
97bb1ff9e2
Move currently_updating_index to IndexMapper
2024-01-09 15:37:27 +01:00
658ec6e0a4
Merge #4279
4279: Check experimental feature on setting update query rather than in the task. r=ManyTheFish a=dureuill
Improve the UX by checking for the vector store feature and returning an error synchronously when sending a setting update, rather than in the indexing task.
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2023-12-22 11:36:12 +00:00
ee54d3171e
Check experimental feature at query time
2023-12-21 15:26:12 +01:00
fa2b96b9a5
Add an Authorization Header along with the webhook calls
2023-12-19 12:18:45 +01:00
4fb25b8782
fix clippy
2023-12-19 10:35:51 +01:00
c83a33017e
stream and chunk the data
2023-12-19 10:35:51 +01:00
be72326c0a
gzip the tasks
2023-12-19 10:35:51 +01:00
0b2fff27f2
update and fix the test
2023-12-19 10:35:51 +01:00
3adbc2b942
return a task view instead of a task
2023-12-19 10:35:51 +01:00
fbea721378
add a first working test with actixweb
2023-12-19 10:35:51 +01:00
d78ad51082
Implement the webhook
2023-12-19 10:35:50 +01:00
9e1b458010
Merge branch 'main' into change-proximity-precision-settings
2023-12-18 09:08:47 +01:00
e0cc775dc4
Various changes
- DistributionShift in Search object (to be set from model in embed?)
- Fix issue where embedder index wasn't computed at search time
- Accept as default embedder either the "default" one, or the only embedder when there is only one
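The default-embedder rule in the last bullet can be sketched as follows. This is an illustration under assumptions: `Embedder` and `resolve_default_embedder` are hypothetical names, and the map of configured embedders is modeled with a plain `BTreeMap`:

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for an embedder configuration.
struct Embedder;

/// Picks the embedder to use when none is named explicitly:
/// the one called "default" if it exists, otherwise the sole configured
/// embedder. Returns `None` when the choice would be ambiguous or when
/// no embedder is configured.
fn resolve_default_embedder(embedders: &BTreeMap<String, Embedder>) -> Option<&str> {
    if embedders.contains_key("default") {
        return Some("default");
    }
    if embedders.len() == 1 {
        return embedders.keys().next().map(String::as_str);
    }
    None
}
```

Returning `None` for the ambiguous case leaves it to the caller to decide whether that is an error.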
2023-12-14 16:08:41 +01:00
922a640188
WIP multi embedders
fixed template bugs
2023-12-14 16:08:41 +01:00
abbe131084
Cosmetic change
2023-12-14 16:08:41 +01:00
13c2c6c16b
Small commit to add hybrid search and autoembedding
2023-12-14 16:07:48 +01:00
35e1981488
Remove proximityPrecision from the experimental feature
2023-12-14 15:52:42 +01:00
7e259cb0d2
Expose the --max-number-of-batched-tasks argument
2023-12-11 16:08:39 +01:00
1f4fc9c229
Make the feature experimental
2023-12-06 15:49:05 +01:00
6376c342c1
Merge #4223
4223: Update to heed 0.20 r=dureuill a=Kerollmops
This PR brings the v0.20-alpha.9 version of heed into Meilisearch. The main goal is to test it in a real environment to make the necessary changes if needed. We also want to merge it as soon as possible during the pre-release phase to ensure we catch bugs before the release.
Most of the calls to heed are the same as before, except:
- The `PolyDatabase` has been replaced with a `Database<Unspecified, Unspecified>`. We replaced the `get<T, U>()` calls with `remap<T, U>().get()` calls.
- The `Database` `append(...)` method has been replaced with a `put_with_flags(PutFlags::APPEND, ...)`.
- The `RwTxn<'e, 'p>` has been simplified into a `RwTxn<'e>`.
- The `BytesEncode/Decode` traits return a `Result<_, BoxedError>` instead of an `Option<_>`.
- We no longer need to wrap and unwrap the `BEU32` integer when storing/getting them from heed.
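To illustrate the codec change (decoding returns a `Result<_, BoxedError>` rather than an `Option<_>`), here is a std-only sketch. The `Decode` trait, `BigEndianU32`, and `InvalidLength` are hypothetical and defined locally; they only mirror the shape described above and are not heed's actual traits or types:

```rust
use std::convert::TryFrom;
use std::error::Error;
use std::fmt;

/// Local alias with the same shape as the boxed error mentioned above.
type BoxedError = Box<dyn Error + Send + Sync + 'static>;

/// A small dedicated error type, used instead of a string message.
#[derive(Debug, PartialEq, Eq)]
struct InvalidLength {
    expected: usize,
    got: usize,
}

impl fmt::Display for InvalidLength {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        write!(f, "invalid length: expected {} bytes, got {}", self.expected, self.got)
    }
}

impl Error for InvalidLength {}

/// Hypothetical decode trait for this sketch (an assumption, not heed's API).
trait Decode<'a> {
    type Item;
    fn decode(bytes: &'a [u8]) -> Result<Self::Item, BoxedError>;
}

/// Decodes a big-endian `u32` directly, with no wrapper integer type.
struct BigEndianU32;

impl<'a> Decode<'a> for BigEndianU32 {
    type Item = u32;

    fn decode(bytes: &'a [u8]) -> Result<u32, BoxedError> {
        // A wrong length becomes a typed error instead of a silent `None`.
        let array = <[u8; 4]>::try_from(bytes)
            .map_err(|_| InvalidLength { expected: 4, got: bytes.len() })?;
        Ok(u32::from_be_bytes(array))
    }
}
```

With a `Result`, the caller can report why decoding failed, which an `Option` cannot express.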
### TODO
- [x] Create actual, simple error types instead of using strings in the codecs.
### Follow-up work
- Move the codecs into another member crate (we depend on the uuid one in the meilitool crate).
- Display the internal decoding error in the `SerializationError` internal error variant.
Co-authored-by: Clément Renault <clement@meilisearch.com>
2023-11-28 13:39:44 +00:00
ec9b52d608
Rename copy_to_path to copy_to_file
2023-11-28 14:32:30 +01:00
34c67ac389
Remove the possibility to fail fetching the env info
2023-11-28 14:31:23 +01:00
5751f5c640
fix puffin in the index scheduler
2023-11-27 15:18:33 +01:00
0dbf1a16ff
Make clippy happy
2023-11-23 14:11:38 +01:00
462b4c0080
Fix the tests
2023-11-23 12:07:35 +01:00
0d4482625a
Make the changes to use heed v0.20-alpha.6
2023-11-23 11:43:58 +01:00
7cb7e37ba8
Merge branch 'main' into tmp-release-v1.5.0
2023-11-21 16:30:46 +01:00
33b7c574ea
Merge #4090
4090: Diff indexing r=ManyTheFish a=ManyTheFish
This pull request aims to reduce the indexing time by computing a difference between the data added to the index and the data removed from the index before writing in LMDB.
## Why focus on reducing the writings in LMDB?
The indexing in Meilisearch is split into 3 main phases:
1) The computing or the extraction of the data (Multi-threaded)
2) The writing of the data in LMDB (Mono-threaded)
3) The processing of the prefix databases (Mono-threaded)

Because the writing is mono-threaded, it represents a bottleneck in the indexing. Reducing the number of writes in LMDB will reduce the pressure on the main thread and should reduce the global time spent on the indexing.
## Give Feedback
We created [a dedicated discussion](https://github.com/meilisearch/meilisearch/discussions/4196) for users to try this new feature and to give feedback on bugs or performance issues.
## Technical approach
### Part 1: merge the addition and the deletion process
This part:
a) Aims to reduce the time spent on indexing only the filterable/sortable fields of documents, for example:
- Updating the number of "likes" or "stars" of a song or a movie
- Updating the "stock count" or the "price" of a product
b) Aims to reduce the time spent on writing in LMDB which should reduce the global indexing time for the highly multi-threaded machines by reducing the writing bottleneck.
c) Aims to reduce the average time spent to delete documents without having to keep the soft-deleted documents implementation
- [x] Create a preprocessing function that creates the diff-based documents chunk (`OBKV<fid, OBKV<AddDel, value>>`)
- [x] and clearly separate the faceted fields and the searchable fields in two different chunks
- Change the parameters of the input extractor by taking an `OBKV<fid, OBKV<AddDel, value>>` instead of `OBKV<fid, value>`.
- [x] extract_docid_word_positions
- [x] extract_geo_points
- [x] extract_vector_points
- [x] extract_fid_docid_facet_values
- Adapt the searchable extractors to the new diff-chunks
- [x] extract_fid_word_count_docids
- [x] extract_word_pair_proximity_docids
- [x] extract_word_position_docids
- [x] extract_word_docids
- Adapt the facet extractors to the new diff-chunks
- [x] extract_facet_number_docids
- [x] extract_facet_string_docids
- [x] extract_fid_docid_facet_values
- [x] FacetsUpdate
- [x] Adapt the prefix database extractors ⚠️ ⚠️
- [x] Make the LMDB writer remove the document_ids to delete at the same time the new document_ids are added
- [x] Remove document deletion pipeline
- [x] remove `new_documents_ids` entirely and `replaced_documents_ids`
- [x] reuse extracted external id from transform instead of re-extracting in `TypedChunks::Documents`
- [x] Remove deletion pipeline after autobatcher
- [x] remove autobatcher deletion pipeline
- [x] everything uses `IndexOperation::DocumentOperation`
- [x] repair deletion by internal id for delete by filter
- [x] Improve the deletion via internal ids by avoiding iterating over the whole set of external document ids.
- [x] Remove soft-deleted documents
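The diff-based chunk and the combined delete/add write from the checklist above can be sketched with standard collections. This is an illustration under assumptions, not the milli implementation: `BTreeMap` stands in for `OBKV`, values are plain strings, and `diff_document` and `merge_del_add` are hypothetical names. The sketch records both sides for every field; sending only the fields that changed is what Part 2 describes.

```rust
use std::collections::{BTreeMap, BTreeSet};

type FieldId = u16;

/// Which side of the diff a value belongs to.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum AddDel {
    Deletion,
    Addition,
}

/// Stand-in for `OBKV<fid, value>`.
type Document = BTreeMap<FieldId, String>;
/// Stand-in for `OBKV<fid, OBKV<AddDel, value>>`.
type DiffDocument = BTreeMap<FieldId, BTreeMap<AddDel, String>>;

/// Builds the diff-based version of a document: the old value of each field
/// goes on the `Deletion` side and the new value on the `Addition` side.
/// `old = None` models a new document, `new = None` a deleted one.
fn diff_document(old: Option<&Document>, new: Option<&Document>) -> DiffDocument {
    let mut diff = DiffDocument::new();
    if let Some(old) = old {
        for (fid, value) in old {
            diff.entry(*fid).or_default().insert(AddDel::Deletion, value.clone());
        }
    }
    if let Some(new) = new {
        for (fid, value) in new {
            diff.entry(*fid).or_default().insert(AddDel::Addition, value.clone());
        }
    }
    diff
}

/// Applies deletions and additions to a set of document ids in one pass,
/// so the result is produced once: (current - del) ∪ add.
fn merge_del_add(
    current: &BTreeSet<u32>,
    del: &BTreeSet<u32>,
    add: &BTreeSet<u32>,
) -> BTreeSet<u32> {
    current
        .difference(del)
        .copied()
        .chain(add.iter().copied())
        .collect()
}
```

Merging both sides before writing is what allows a single write per entry instead of one for the deletion and another for the addition.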
#### FIXME
- [x] field distribution is not correctly updated after deletion
- [x] missing documents in the tests of tokenizer_customization
### Part 2: Only compute the documents field by field
This part aims to reduce the global indexing time for any kind of partial document modification on any size of machine from the mono-threaded one to the highly multi-threaded one.
- [ ] Make the preprocessing function only send the fields that changed to the extractors
- [ ] remove the `word_docids` and `exact_word_docids` database and adapt the search (⚠️ could impact the search performances)
- [ ] replace the `word_pair_proximity_docids` database with a `word_pair_proximity_fid_docids` database and adapt the search (⚠️ could impact the search performances)
- [ ] Adapt the prefix database extractors ⚠️ ⚠️
## Technical Concerns
- The part 1 implementation could increase the indexing time for the smallest machines (with few threads) by increasing the extracting time (multi-threaded) more than the writing time (mono-threaded)
- The part 2 implementation needs to change the databases which could have a significant impact on the search performances
- The prefix databases are a bit special to process and may be a pain to adapt to the difference-based indexing
Co-authored-by: ManyTheFish <many@meilisearch.com>
Co-authored-by: Clément Renault <clement@meilisearch.com>
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2023-11-21 09:44:38 +00:00
5b57fbab08
makes the dump cancellable
2023-11-14 11:23:13 +01:00
a2d6dc8571
Fix typo, remove caching for the change of index
2023-11-13 10:44:36 +01:00
492fc086f0
cargo fmt
2023-11-12 21:53:11 +01:00
a2d0c73b41
Save the currently updating index so that the search can access it at all times
2023-11-10 10:52:03 +01:00
f8289cd974
Use it from delete-by-filter
2023-11-09 14:23:15 +01:00