199 Commits

Author SHA1 Message Date
ManyTheFish
689e69d6d2 Take into account PR messages 2025-03-10 13:46:33 +01:00
ManyTheFish
8ec0c322ea Apply PR requests related to Refactor the FieldIdMapWithMetadata 2025-03-06 11:42:53 +01:00
ManyTheFish
ae8d453868 Refactor Document indexing process (searchables)
**Changes:**
The searchable database extraction is now relying on the AttributePatterns and FieldIdMapWithMetadata to match the field to extract.
Remove the SearchableExtractor trait to make the code less complex.

**Impact:**
- Document Addition/modification searchable indexing
- Document deletion searchable indexing
2025-03-03 10:32:42 +01:00
ManyTheFish
95bccaf5f5 Refactor Document indexing process (Facets)
**Changes:**
The Documents changes now take a selector closure instead of a list of field to match the field to extract.
The seek_leaf_values_in_object function now uses a selector closure of a list of field to match the field to extract
The facet database extraction is now relying on the FilterableAttributesRule to match the field to extract.
The facet-search database extraction is now relying on the FieldIdMapWithMetadata to select the field to index.
The facet level database extraction is now relying on the FieldIdMapWithMetadata to select the field to index.

**Important:**
Because the filterable attributes are patterns now,
the fieldIdMap will only register the fields that exists in at least one document.
if a field doesn't exist in any document, it will not be registered even if it has been specified in the filterable fields.

**Impact:**
- Document Addition/modification facet indexing
- Document deletion facet indexing
2025-03-03 10:32:03 +01:00
ManyTheFish
d25953f322
fix clippy 2025-02-26 17:02:43 +01:00
ManyTheFish
9f3663e768
Implement Incremental document database stats computing 2025-02-26 17:01:35 +01:00
Kerollmops
76fd5d92d7
Clarify the tail writing to database 2025-02-20 17:35:23 +01:00
Kerollmops
245a55722a
Remove commented code 2025-02-20 16:48:18 +01:00
Kerollmops
05cc8c650c
Expose the write channel congestion in the batches 2025-02-19 15:47:54 +01:00
meili-bors[bot]
0f1aeb8eaa
Merge #5351
Some checks failed
Look for flaky tests / flaky (push) Failing after 19s
SDKs tests / define-docker-image (push) Failing after 5s
SDKs tests / .NET SDK tests (push) Has been skipped
SDKs tests / Dart SDK tests (push) Has been skipped
SDKs tests / Go SDK tests (push) Has been skipped
SDKs tests / Java SDK tests (push) Has been skipped
SDKs tests / JS SDK tests (push) Has been skipped
SDKs tests / PHP SDK tests (push) Has been skipped
SDKs tests / Python SDK tests (push) Has been skipped
SDKs tests / Ruby SDK tests (push) Has been skipped
SDKs tests / Rust SDK tests (push) Has been skipped
SDKs tests / Swift SDK tests (push) Has been skipped
SDKs tests / meilisearch-js-plugins tests (push) Has been skipped
SDKs tests / meilisearch-rails tests (push) Has been skipped
SDKs tests / meilisearch-symfony tests (push) Has been skipped
Publish binaries to GitHub release / Check the version validity (push) Successful in 9s
Publish binaries to GitHub release / Publish binary for aarch64 (meilisearch-linux-aarch64, aarch64-unknown-linux-gnu) (push) Failing after 2s
Publish binaries to GitHub release / Publish binary for Linux (push) Failing after 12s
Publish binaries to GitHub release / Publish binary for macos-13 (push) Has been cancelled
Publish binaries to GitHub release / Publish binary for windows-2022 (push) Has been cancelled
Publish binaries to GitHub release / Publish binary for macOS silicon (meilisearch-macos-apple-silicon, aarch64-apple-darwin) (push) Has been cancelled
Test suite / Tests on ubuntu-20.04 (push) Failing after 12s
Test suite / Test with Ollama (push) Failing after 7s
Test suite / Test disabled tokenization (push) Failing after 11s
Test suite / Run tests in debug (push) Failing after 11s
Test suite / Run Clippy (push) Failing after 17s
Test suite / Run Rustfmt (push) Successful in 1m51s
Test suite / Tests almost all features (push) Failing after 7m7s
Test suite / Tests on macos-13 (push) Has been cancelled
Test suite / Tests on windows-2022 (push) Has been cancelled
5351: Bring back v1.13.0 changes into main r=irevoire a=Kerollmops

This PR brings back the changes made in v1.13 into the main branch.

Co-authored-by: ManyTheFish <many@meilisearch.com>
Co-authored-by: Kerollmops <clement@meilisearch.com>
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
Co-authored-by: Clémentine <clementine@meilisearch.com>
Co-authored-by: meili-bors[bot] <89034592+meili-bors[bot]@users.noreply.github.com>
Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: Clément Renault <clement@meilisearch.com>
2025-02-18 08:05:02 +00:00
Louis Dureuil
b83275c9c5
Change the updated* functions to only_new functions, hopefully better communicating what they do 2025-02-11 15:27:10 +01:00
Louis Dureuil
d7f35ee3ba
Use merged document instead of updated 2025-02-11 15:27:10 +01:00
meili-bors[bot]
0c3e7fe963
Merge #5316
Some checks failed
Test suite / Tests on ubuntu-20.04 (push) Failing after 2s
Test suite / Tests almost all features (push) Has been skipped
Test suite / Test disabled tokenization (push) Has been skipped
Test suite / Run tests in debug (push) Failing after 16s
Test suite / Run Clippy (push) Failing after 12s
Test suite / Run Rustfmt (push) Failing after 32s
Test suite / Tests on macos-13 (push) Has been cancelled
Test suite / Tests on windows-2022 (push) Has been cancelled
5316: Fix the dumpless upgrade corruption r=dureuill a=irevoire

# Pull Request

## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/5280

## What does this PR do?
- Add a test that ensure we write the version in the index-scheduler even if we have a bug while writing the VERSION file
- Do what was described in the issue


Co-authored-by: Tamo <tamo@meilisearch.com>
2025-02-10 09:53:57 +00:00
Tamo
45f843ccb9 fmt 2025-02-10 10:46:42 +01:00
Kerollmops
2b0e17ede0
Make sure arroy is using the rayon thread-pool 2025-02-06 15:28:10 +01:00
meili-bors[bot]
796acd1aee
Merge #5288
Some checks failed
Test suite / Tests almost all features (push) Has been skipped
Test suite / Test disabled tokenization (push) Has been skipped
Test suite / Tests on ubuntu-20.04 (push) Failing after 13s
Test suite / Run tests in debug (push) Failing after 13s
Test suite / Run Clippy (push) Failing after 19s
Test suite / Tests on windows-2022 (push) Failing after 48s
Test suite / Run Rustfmt (push) Successful in 1m28s
Test suite / Tests on macos-13 (push) Has been cancelled
5288: Improve AI logging r=dureuill a=Kerollmops

This PR fixes #5285 and brings the changes from #5233 to simplify debugging indexation and search performance issues related to AI. The following texts can be found in the logs to debug and understand performance issues:

 - `embed_one: search` represents the time we spent waiting for the embedding generation, i.e., OpenAI, local HuggingFace, Ollama.
 - `filtered_universe: search::universe` the time spent filtering the documents.
 - ~`next_bucket: search::vector_sort` is the time spent finding the nearest neighbors (ANNs) in the vector store (arroy), locally~ was being triggered too many times.
 - `indexing::vectors` is the time arroy spends indexing the new vectors for a batch.
 - `documents::extract vectors` and `documents::merge vectors` to see the time spent generating and writing the embeddings.

Co-authored-by: Kerollmops <clement@meilisearch.com>
2025-02-04 10:20:45 +00:00
Tamo
d34f0b606c
Update crates/milli/src/update/new/document_change.rs 2025-02-03 12:08:52 +01:00
Kerollmops
acc400face
Support merging update and replacement operations 2025-02-03 11:47:17 +01:00
Kerollmops
8e6893ddbe
Make sure we correctly mix different document operations 2025-02-03 10:34:06 +01:00
Kerollmops
424c5bde40
Move the embedding computation and extraction log to debug 2025-01-29 16:40:36 +01:00
Kerollmops
cb1b7513af
Log the memory metrics only once 2025-01-29 15:21:52 +01:00
Clément Renault
a9d0f4a002
Improve english comments
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2025-01-29 15:16:40 +01:00
Kerollmops
db032079d8
Show indexation allocated memory 2025-01-29 14:21:02 +01:00
Clément Renault
a00796c46a
Improve the naming in the log message 2025-01-29 14:21:02 +01:00
Kerollmops
6112bd8caa
Display the channel congestion 2025-01-29 14:21:02 +01:00
Kerollmops
cec88cfc29
Measure the bbqueue congestion 2025-01-29 14:21:02 +01:00
Kerollmops
4a5923a55e
log the time arroy took to insert embeddings 2025-01-27 14:22:17 +01:00
Clément Renault
9b579069df
Comment the max grant of the bbqueue
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2025-01-24 12:18:32 +01:00
Louis Dureuil
f5a4a1c8b2
Give more RAM to bbqueue.
- bbqueue buffers used to have (5% * 2%) / num_threads
- they now have 5% / num_threads
2025-01-24 12:18:32 +01:00
Kerollmops
5ab4cdb1f3
Reduce the maximum grant possible we can store in the BBQueue 2025-01-24 12:18:32 +01:00
Louis Dureuil
d6063079af
Unify facet strings by their normalized value 2025-01-22 15:50:42 +01:00
Louis Dureuil
a6470a0c37
Improve error log 2025-01-22 15:50:41 +01:00
Louis Dureuil
8a54f14b8e
Demote panic to error log 2025-01-22 15:49:24 +01:00
Kerollmops
63c8cbae5b
Improve the panic message when deleting an unknown entry 2025-01-14 10:31:44 +01:00
Louis Dureuil
72ded27e98
Update after review 2025-01-14 10:24:50 +01:00
Louis Dureuil
4070895a21
Add support to upgrade to v1.12.3 in meilitool 2025-01-14 10:24:27 +01:00
Louis Dureuil
a21711f473
Fix test 2025-01-14 10:23:59 +01:00
meili-bors[bot]
247eaed872
Merge #5221
Some checks failed
Test suite / Tests on ${{ matrix.os }} (macos-13) (push) Waiting to run
Test suite / Tests on ${{ matrix.os }} (windows-2022) (push) Failing after 1m4s
Test suite / Tests on ubuntu-20.04 (push) Failing after 27s
Test suite / Tests almost all features (push) Has been skipped
Test suite / Test disabled tokenization (push) Has been skipped
Test suite / Run tests in debug (push) Failing after 20s
Test suite / Run Clippy (push) Successful in 8m45s
Test suite / Run Rustfmt (push) Successful in 2m31s
5221: Merge bitmaps by using `Extend::extend` r=Kerollmops a=Kerollmops

This PR tries to speed up the merging of bitmaps by using [the new `Extend::extend` implementation](https://github.com/RoaringBitmap/roaring-rs/pull/306).

Co-authored-by: Clément Renault <clement@meilisearch.com>
2025-01-13 13:43:28 +00:00
meili-bors[bot]
cc4aca78c4
Merge #5220
5220: Merge back changes of v1.12.2 in main r=dureuill a=dureuill



Co-authored-by: curquiza <curquiza@users.noreply.github.com>
Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: dureuill <dureuill@users.noreply.github.com>
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2025-01-13 10:54:36 +00:00
Clément Renault
00a03742ff
Prefer using extend when merging bitmaps than unions (less allocations) 2025-01-09 10:42:38 +01:00
Louis Dureuil
4aa7c8f7b1
Remove unused FacetFieldIdOperation 2025-01-09 10:36:37 +01:00
Louis Dureuil
c14967eeac
Use new incremental facet indexing and enable sanity checks in debug 2025-01-09 10:36:35 +01:00
Tamo
908adee6fc
Fix the addition of empty payload 2025-01-09 10:24:36 +01:00
Clément Renault
71e5605daa
Make clippy happy 2025-01-08 18:24:39 +01:00
Louis Dureuil
4275833bab
Rename compute.rs to post_process.rs 2025-01-07 15:31:20 +01:00
Louis Dureuil
de7f8c4406
refactor indexer mod 2025-01-07 15:29:02 +01:00
Gnosnay
44eb153619 Replace hardcoded string with constants 2024-12-28 20:35:55 +08:00
meili-bors[bot]
ba11121cfc
Merge #5159
Some checks failed
Test suite / Tests almost all features (push) Has been skipped
Test suite / Test disabled tokenization (push) Has been skipped
Test suite / Tests on ubuntu-20.04 (push) Failing after 11s
Test suite / Run tests in debug (push) Failing after 10s
Test suite / Tests on ${{ matrix.os }} (windows-2022) (push) Failing after 22s
Test suite / Run Rustfmt (push) Successful in 1m18s
Test suite / Run Clippy (push) Successful in 5m30s
Test suite / Tests on ${{ matrix.os }} (macos-13) (push) Has been cancelled
5159: Fix the New Indexer Spilling r=irevoire a=Kerollmops

Fix two bugs in the merging of the spilled caches. Thanks to `@ManyTheFish` and `@irevoire` 👏

Co-authored-by: Kerollmops <clement@meilisearch.com>
Co-authored-by: ManyTheFish <many@meilisearch.com>
2024-12-12 17:16:53 +00:00
ManyTheFish
acdd5aa6ea
Use the thread source id instead of the destination id
when filtering on the cache to merge
2024-12-12 18:12:00 +01:00
Kerollmops
2f3cc8cdd2
Fix the merge_caches_sorted function 2024-12-12 16:15:37 +01:00