6376c342c1
Merge #4223
...
4223: Update to heed 0.20 r=dureuill a=Kerollmops
This PR brings the v0.20-alpha.9 version of heed into Meilisearch. The main goal is to test it in a real environment and make the necessary changes if needed. We also want to merge it as soon as possible during the pre-release phase to ensure we catch bugs before the release.
Most of the calls to heed are the same as before, except the following (see the sketch after this list):
- The `PolyDatabase` has been replaced with a `Database<Unspecified, Unspecified>`. We replaced the `get<T, U>()` calls with `remap<T, U>().get()` calls.
- The `Database` `append(...)` method has been replaced with `put_with_flags(PutFlags::APPEND, ...)`.
- The `RwTxn<'e, 'p>` has been simplified into a `RwTxn<'e>`.
- The `BytesEncode/Decode` traits return a `Result<_, BoxedError>` instead of an `Option<_>`.
- We no longer need to wrap and unwrap `BEU32` integers when storing/getting them from heed.
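A minimal sketch of what these changes look like in practice; the `U32Codec` codec, the keys, and the `example` function below are hypothetical and only illustrate the new shapes of the traits and calls:
```rust
use std::borrow::Cow;

use heed::types::Str;
use heed::{BoxedError, BytesDecode, BytesEncode, Database, PutFlags, Unspecified};

/// Hypothetical codec, only here to show that `BytesEncode`/`BytesDecode`
/// now return a `Result<_, BoxedError>` instead of an `Option<_>`.
struct U32Codec;

impl<'a> BytesEncode<'a> for U32Codec {
    type EItem = u32;

    fn bytes_encode(item: &'a Self::EItem) -> Result<Cow<'a, [u8]>, BoxedError> {
        Ok(Cow::Owned(item.to_be_bytes().to_vec()))
    }
}

impl<'a> BytesDecode<'a> for U32Codec {
    type DItem = u32;

    fn bytes_decode(bytes: &'a [u8]) -> Result<Self::DItem, BoxedError> {
        // The slice-to-array error converts into a `BoxedError` instead of becoming `None`.
        let bytes = <[u8; 4]>::try_from(bytes)?;
        Ok(u32::from_be_bytes(bytes))
    }
}

fn example(
    db: Database<Unspecified, Unspecified>, // replaces the old `PolyDatabase`
    rtxn: &heed::RoTxn,
    wtxn: &mut heed::RwTxn, // `RwTxn` now carries a single lifetime
) -> heed::Result<()> {
    // `get::<T, U>()` becomes a remap followed by a plain `get()`.
    let _value = db.remap_types::<Str, U32Codec>().get(rtxn, "some-key")?;

    // `append(...)` becomes `put_with_flags(PutFlags::APPEND, ...)`.
    db.remap_types::<Str, U32Codec>()
        .put_with_flags(wtxn, PutFlags::APPEND, "zzz-last-key", &42)?;

    Ok(())
}
```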
### TODO
- [x] Create actual, simple error types instead of using strings in the codecs.
### Follow-up work
- Move the codecs into another member crate (we depend on the uuid one in the meilitool crate).
- Display the internal decoding error in the `SerializationError` internal error variant.
Co-authored-by: Clément Renault <clement@meilisearch.com>
2023-11-28 13:39:44 +00:00
ec9b52d608
Rename copy_to_path to copy_to_file
2023-11-28 14:32:30 +01:00
34c67ac389
Remove the possibility to fail fetching the env info
2023-11-28 14:31:23 +01:00
5751f5c640
fix puffin in the index scheduler
2023-11-27 15:18:33 +01:00
0dbf1a16ff
Make clippy happy
2023-11-23 14:11:38 +01:00
462b4c0080
Fix the tests
2023-11-23 12:07:35 +01:00
0d4482625a
Make the changes to use heed v0.20-alpha.6
2023-11-23 11:43:58 +01:00
7cb7e37ba8
Merge branch 'main' into tmp-release-v1.5.0
2023-11-21 16:30:46 +01:00
33b7c574ea
Merge #4090
...
4090: Diff indexing r=ManyTheFish a=ManyTheFish
This pull request aims to reduce the indexing time by computing a difference between the data added to the index and the data removed from the index before writing in LMDB.
## Why focus on reducing the writes in LMDB?
The indexing in Meilisearch is split into 3 main phases:
1) The computation or extraction of the data (multi-threaded)
2) The writing of the data in LMDB (mono-threaded)
3) The processing of the prefix databases (mono-threaded)
Because the writing is mono-threaded, it represents a bottleneck in the indexing; reducing the number of writes to LMDB reduces the pressure on the main thread and should reduce the overall time spent on indexing.
## Give Feedback
We created [a dedicated discussion](https://github.com/meilisearch/meilisearch/discussions/4196) for users to try this new feature and to give feedback on bugs or performance issues.
## Technical approach
### Part 1: merge the addition and the deletion process
This part:
a) Aims to reduce the time spent indexing documents when only their filterable/sortable fields change, for example:
- Updating the number of "likes" or "stars" of a song or a movie
- Updating the "stock count" or the "price" of a product
b) Aims to reduce the time spent writing to LMDB, which should reduce the global indexing time on highly multi-threaded machines by reducing the writing bottleneck.
c) Aims to reduce the average time spent deleting documents, without having to keep the soft-deleted documents implementation
- [x] Create a preprocessing function that creates the diff-based document chunks (`OBKV<fid, OBKV<AddDel, value>>`, illustrated in the sketch after this list)
- [x] and clearly separate the faceted fields and the searchable fields in two different chunks
- Change the parameters of the input extractors to take an `OBKV<fid, OBKV<AddDel, value>>` instead of an `OBKV<fid, value>`.
- [x] extract_docid_word_positions
- [x] extract_geo_points
- [x] extract_vector_points
- [x] extract_fid_docid_facet_values
- Adapt the searchable extractors to the new diff-chunks
- [x] extract_fid_word_count_docids
- [x] extract_word_pair_proximity_docids
- [x] extract_word_position_docids
- [x] extract_word_docids
- Adapt the facet extractors to the new diff-chunks
- [x] extract_facet_number_docids
- [x] extract_facet_string_docids
- [x] extract_fid_docid_facet_values
- [x] FacetsUpdate
- [x] Adapt the prefix database extractors ⚠️ ⚠️
- [x] Make the LMDB writer remove the document_ids to delete at the same time the new document_ids are added
- [x] Remove document deletion pipeline
- [x] remove `new_documents_ids` and `replaced_documents_ids` entirely
- [x] reuse extracted external id from transform instead of re-extracting in `TypedChunks::Documents`
- [x] Remove deletion pipeline after autobatcher
- [x] remove autobatcher deletion pipeline
- [x] everything uses `IndexOperation::DocumentOperation`
- [x] repair deletion by internal id for delete-by-filter
- [x] Improve the deletion via internal ids by avoiding iterating over the whole set of external document ids.
- [x] Remove soft-deleted documents
#### FIXME
- [x] field distribution is not correctly updated after deletion
- [x] missing documents in the tests of tokenizer_customization
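To make the chunk shape above more concrete, here is a rough illustration of the `OBKV<fid, OBKV<AddDel, value>>` idea; the `BTreeMap`-based types and the `changed_fields` helper are stand-ins for illustration only, not the actual obkv-backed milli structures:
```rust
use std::collections::BTreeMap;

/// Whether a value is the old (deleted) side or the new (added) side of a field.
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum DelAdd {
    Deletion,
    Addition,
}

type Fid = u16;

/// Stand-in for `OBKV<fid, OBKV<AddDel, value>>`: for every field id, the value
/// removed from the index and/or the value added to it.
type DiffDocument = BTreeMap<Fid, BTreeMap<DelAdd, Vec<u8>>>;

/// An extractor working on such a chunk only has to process the fields whose
/// two sides actually differ, instead of deleting and re-adding everything.
fn changed_fields(doc: &DiffDocument) -> impl Iterator<Item = Fid> + '_ {
    doc.iter()
        .filter(|(_, sides)| sides.get(&DelAdd::Deletion) != sides.get(&DelAdd::Addition))
        .map(|(fid, _)| *fid)
}
```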
### Part 2: Only compute the documents field by field
This part aims to reduce the global indexing time for any kind of partial document modification, on any size of machine, from mono-threaded ones to highly multi-threaded ones.
- [ ] Make the preprocessing function only send the fields that changed to the extractors
- [ ] remove the `word_docids` and `exact_word_docids` databases and adapt the search (⚠️ could impact search performance)
- [ ] replace the `word_pair_proximity_docids` database with a `word_pair_proximity_fid_docids` database and adapt the search (⚠️ could impact search performance)
- [ ] Adapt the prefix database extractors ⚠️ ⚠️
## Technical Concerns
- The part 1 implementation could increase the indexing time on the smallest machines (with few threads) by increasing the extraction time (multi-threaded) more than it reduces the writing time (mono-threaded)
- The part 2 implementation needs to change the databases, which could have a significant impact on search performance
- The prefix databases are a bit special to process and may be a pain to adapt to the difference-based indexing
Co-authored-by: ManyTheFish <many@meilisearch.com>
Co-authored-by: Clément Renault <clement@meilisearch.com>
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2023-11-21 09:44:38 +00:00
5b57fbab08
makes the dump cancellable
2023-11-14 11:23:13 +01:00
a2d6dc8571
Fix typo, remove caching for the change of index
2023-11-13 10:44:36 +01:00
492fc086f0
cargo fmt
2023-11-12 21:53:11 +01:00
a2d0c73b41
Save the currently updating index so that the search can access it at all times
2023-11-10 10:52:03 +01:00
f8289cd974
Use it from delete-by-filter
2023-11-09 14:23:15 +01:00
ef6fa10f7a
Remove IndexOperation::DocumentDeletion
2023-11-06 12:16:15 +01:00
cbaa54cafd
Fix clippy issues
2023-11-06 11:19:31 +01:00
e507ef5932
Slow the logging down
2023-11-01 13:49:32 +01:00
13416ccbf7
Introduce a new meilitool to help the cloud team
2023-10-30 14:30:20 +01:00
dfab6293c9
Use an LMDB database to store the external documents ids
2023-10-30 11:41:23 +01:00
652ac3052d
use new iterator in batch
2023-10-30 11:41:22 +01:00
c534a1b687
Stop using delete documents pipeline in batch runner
2023-10-30 11:41:22 +01:00
cf8dad1ca0
index_scheduler.features() is no longer fallible
2023-10-23 10:38:56 +02:00
dd619913da
Use RwLock to never persist cli state to db
2023-10-19 12:45:57 -07:00
d8c649b3cd
Return recoverable error if we fail to retrieve metrics state
2023-10-18 08:28:24 -07:00
12fc878640
Merge remote-tracking branch 'origin/main' into enable-metrics-http
2023-10-16 13:48:01 -07:00
689ec7c7ad
Make the experimental route /metrics activable via HTTP
2023-10-13 22:12:54 +00:00
3655d4bdca
Move the puffin file export logic into the run function
2023-10-13 13:11:30 +02:00
055ca3935b
Update index-scheduler/src/batch.rs
...
Co-authored-by: Tamo <tamo@meilisearch.com>
2023-10-13 13:11:30 +02:00
bf8fac6676
Fix the tests
2023-10-13 13:11:30 +02:00
f2a9e1ebbb
Improve the debugging experience in the puffin reports
2023-10-13 13:11:30 +02:00
513e61e9a3
Remove the experimental CLI flag
2023-10-13 13:11:29 +02:00
90a626bf80
Use the runtime feature to enable puffin report exporting
2023-10-13 13:11:29 +02:00
0d4acf2daa
Fix the metrics product URL
2023-10-13 13:11:29 +02:00
58db8d85ec
Add the exportPuffinReports option to the runtime features route
2023-10-13 13:11:29 +02:00
656dadabea
Expose an experimental flag to write the puffin reports to disk
2023-10-13 13:11:09 +02:00
34fac115d5
fix clippy
2023-09-11 17:15:57 +02:00
9258e5b5bf
Fix the stats of the documents deletion by filter
...
The issue was that the « DocumentDeletionByFilter » operation was not
declared as an index operation. That means the index stats were not
reprocessed after the operation was applied.
2023-09-11 14:04:10 +02:00
e4e49e63d0
Merge #3993
...
3993: Bringing back changes from v1.3.1 to `main` r=irevoire a=curquiza
Co-authored-by: irevoire <irevoire@users.noreply.github.com>
Co-authored-by: meili-bors[bot] <89034592+meili-bors[bot]@users.noreply.github.com>
Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: ManyTheFish <many@meilisearch.com>
2023-08-10 14:30:02 +00:00
fe819a9d80
fix the get stats method
...
It was not taking into account the processing tasks at all
2023-08-08 13:21:15 +02:00
b45c36cd71
Merge branch 'main' into tmp-release-v1.3.0
2023-08-01 15:05:17 +02:00
eef95de30e
First iteration on exposing puffin profiling
2023-07-18 17:38:13 +02:00
22762808ab
Fix the tests
2023-07-06 12:13:29 +02:00
86b834c9e4
Display the total number of tasks in the tasks route
2023-07-06 10:05:18 +02:00
aae099e330
Merge #3851
...
3851: Expose lastUpdate and isIndexing in /stats endpoint r=dureuill a=gentcys
# Pull Request
## Related issue
Fixes #3843
## What does this PR do?
- expose lastUpdate in `/stats` endpoint
- expose isIndexing in `/stats` endpoint
- add a method `is_task_processing` in index-scheduler/src/lib.rs (see the sketch below).
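A minimal sketch of what such a method could look like, assuming the scheduler keeps the ids of the currently processing tasks in a `RoaringBitmap` behind a lock; the types below are simplified stand-ins, not the actual index-scheduler code:
```rust
use std::sync::RwLock;

use roaring::RoaringBitmap;

/// Hypothetical, simplified scheduler state for illustration only.
struct IndexScheduler {
    /// Ids of the tasks currently being processed.
    processing_tasks: RwLock<RoaringBitmap>,
}

impl IndexScheduler {
    /// Returns `true` if at least one task is currently being processed.
    fn is_task_processing(&self) -> bool {
        !self.processing_tasks.read().unwrap().is_empty()
    }
}
```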
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: Cong Chen <cong.chen@ocrlabs.com>
Co-authored-by: ManyTheFish <many@meilisearch.com>
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2023-07-03 13:41:04 +00:00
71500a4e15
Update tests
2023-07-03 11:20:43 +02:00
324d448236
Format let-else
2023-07-03 10:20:28 +02:00
9859e65d2f
fix tests
2023-07-01 09:32:50 +08:00
3bdf01bc1c
Fix failed test
2023-06-30 17:39:23 +08:00
a5a31667b0
fix inverted result of is_task_processing()
2023-06-30 11:28:18 +08:00
e3fc7112bc
use RoaringBitmap::is_empty instead
2023-06-29 11:46:47 +08:00