Commit Graph

8888 Commits

Author SHA1 Message Date
d59b7db8d0 remove unused code 2023-11-20 10:10:45 +01:00
263e825619 Fix typos in comments 2023-11-20 10:06:29 +01:00
69354a6144 Add the benchmarck name to the bot message 2023-11-15 13:56:54 +01:00
b0adc73ce6 Merge pull request #4207 from meilisearch/diff-indexing-prefix-databases
Diff indexing prefix databases
prototype-diff-indexing-1
2023-11-14 16:04:05 +01:00
2b5d9042d1 Merge #4208
4208: Makes the dump cancellable r=Kerollmops a=irevoire

# Pull Request

Make the dump tasks cancellable even when they have already started processing.

## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/4157


Co-authored-by: Tamo <tamo@meilisearch.com>
2023-11-14 13:31:45 +00:00
5b57fbab08 makes the dump cancellable 2023-11-14 11:23:13 +01:00
72d3fa4898 Merge #4203
4203: Extract external document docids from docs on deletion by filter r=Kerollmops a=dureuill

This fixes some of the performance regression observed on `diff-indexing` when doing delete-by-filter with a filter matching many documents.

To delete 19 768 771 documents (hackernews dataset, all documents matching `type = comment`), here are the observed time:

|branch (commit sha1sum)|time|speed-down factor (lower is better)|
|--|--|--|
|`main` (48865470d7)|1212.885536s (~20min)|x1.0 (baseline)|
|`diff-indexing` (523519fdbf)|5385.550543s (90min)|x4.44|
|**`diff-indexing-extract-primary-key`**(f8289cd974)|2582.323324s (43min) | x2.13|

So we're still suffering a speed-down of x2.13, but that's much better than x4.44.

---

Changes:

- Refactor the logic of PrimaryKey extraction to a struct
- Add a trait to abstract the extraction of field id from a name between `DocumentBatch` and `FieldIdMap`.
- Add `Index::external_id_of` to get the external ids of a bitmap of internal ids.
- Use this new method to add new Transform and Batch methods to remove documents that are known to be from the DB.
- Modify delete-by-filter to use the new method

Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2023-11-13 13:02:10 +00:00
772964125d Factor removal of document from DB 2023-11-13 13:51:22 +01:00
378deb0bef Rename trait 2023-11-13 13:38:36 +01:00
1f36410541 Update tests 2023-11-13 13:36:39 +01:00
b11f85a635 Merge #4205
4205: Prevent search hang on the processing index r=Kerollmops a=dureuill

Fixes #4206, an issue originally [reported on Discord](https://discord.com/channels/1006923006964154428/1148983671026618579/1148983671026618579) where having parallel search requests on more indexes than the index cache capacity would cause search requests on the currently updating index to hang until the index is done updating.

## Test setup

- Create 20 empty indexes by sending settings to them
- repeatedly send placeholder search requests to each of the indexes in a loop
- Create another index and send a significant batch of documents to index.
- Attempt to perform a search request on that last index.
  - Before this PR, the search request hangs while the index update task is processing
  - After this PR, the search request respond immediately even while the index update task is processing

## Changes

- When getting the handle to an index for some potentially long running batches of tasks, save it in the index scheduler.
- Drop the handle from the index-scheduler when the task is done so that we don't leak indexes.
- When getting an index from outside the task queue processor, check if there is such an handle matching the requested index. If so, skip the cache entirely and clone the handle.

Co-authored-by: Louis Dureuil <louis.dureuil@xinra.net>
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
v1.5.0 v1.5.0-rc.3
2023-11-13 10:36:01 +00:00
a2d6dc8571 Fix typo, remove caching for the change of index 2023-11-13 10:44:36 +01:00
ee1701157f Merge #4204
4204: Throw error when the vector search is sent with the wrong size r=Kerollmops a=dureuill

# Pull Request

## Related issue
Fixes #4201 


Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2023-11-13 09:43:20 +00:00
8c649d8061 Throw error when the vector search is sent with the wrong size 2023-11-13 09:57:42 +01:00
492fc086f0 cargo fmt 2023-11-12 21:53:11 +01:00
a2d0c73b41 Save the currently updating index so that the search can access it at all times 2023-11-10 10:52:03 +01:00
264b10ec20 Fixup documentation 2023-11-09 16:23:20 +01:00
825257da76 Use more efficient method for deletion in benchmarks 2023-11-09 16:13:15 +01:00
f8289cd974 Use it from delete-by-filter 2023-11-09 14:23:15 +01:00
3053e01c05 Batch::remove_documents_from_db_no_batch 2023-11-09 14:23:02 +01:00
b11c2afac0 Index::external_id_of 2023-11-09 14:22:43 +01:00
9cef800b2a Enrich uses the new type 2023-11-09 14:22:05 +01:00
db2fb86b8b Extract PrimaryKey logic to a type 2023-11-09 14:19:16 +01:00
882ab9cc85 remove warnings 2023-11-09 11:35:33 +01:00
5a9c96e1db Compute word integer prefix cache 2023-11-09 11:34:26 +01:00
70ce40828c Compute word docids prefix cache 2023-11-08 17:01:00 +01:00
688266c83e Remove word pair proximity prefix cache and compute it at search time 2023-11-08 14:16:01 +01:00
6dab826908 Reactivate prefix databases 2023-11-08 13:58:01 +01:00
1e2fbc6a42 revert "REVERT ME: ignore prefix pair databases tests"
This reverts commit 1b2ea6cf19.
2023-11-08 11:50:52 +01:00
523519fdbf Merge pull request #4195 from meilisearch/diff-indexing-remove-from-batch
Remove `IndexOperation::DocumentDeletion`
2023-11-08 10:29:49 +01:00
ef6fa10f7a Remove IndexOperation::DocumentDeletion 2023-11-06 12:16:15 +01:00
620fee35f9 Fix benches prototype-diff-indexing-0 2023-11-06 11:56:46 +01:00
cbaa54cafd Fix clippy issues 2023-11-06 11:19:31 +01:00
1bccf2079e Correctly mark non-tests as non-tests 2023-11-06 11:03:56 +01:00
1b2ea6cf19 REVERT ME: ignore prefix pair databases tests 2023-11-06 10:46:22 +01:00
1ad1fcc8c8 Remove all warnings 2023-11-06 10:31:14 +01:00
48865470d7 Merge #4191
4191: Remove banner r=Kerollmops a=curquiza



Co-authored-by: Clémentine U. - curqui <clementine@meilisearch.com>
2023-11-02 17:14:23 +00:00
c810df4d9f Update README.md 2023-11-02 17:40:18 +01:00
87610a5f98 Don't try to delete a document that is not in the database 2023-11-02 16:49:03 +01:00
2544bc1416 Merge pull request #4160 from meilisearch/diff-indexing-vector-points
Diff Indexing for the vector points
2023-11-02 16:01:51 +01:00
ff522c919d Fix the vector extractions for the diff indexing 2023-11-02 15:58:08 +01:00
1c39459cf4 Merge pull request #4179 from meilisearch/diff-indexing-fix-nested-primary-key
Diff indexing fix nested primary key
2023-11-02 15:39:50 +01:00
bf0651f23c Implement iter method on ExternalDocumentsIds 2023-11-02 15:38:00 +01:00
5b20e625f3 fix merge 2023-11-02 15:31:37 +01:00
bc51d6157a Fix transform reindexing path 2023-11-02 15:26:20 +01:00
1b4ff991c0 update typed chunks 2023-11-02 15:26:20 +01:00
4b64c33aa2 update vector extractor 2023-11-02 15:26:20 +01:00
12323d610e Change the original document sorter key from the internal docid to a concatenation of the internal and the external docid 2023-11-02 15:26:20 +01:00
44e9033b3a Merge pull request #4181 from meilisearch/diff-indexing-parallel-transform
Use rayon to sort entries in parallel
2023-11-02 15:16:10 +01:00
4d864f0702 Always sort internal Sorter entries in parallel 2023-11-02 14:47:43 +01:00