Commit Graph

29 Commits

Author SHA1 Message Date
d80edead01 fix the old indexer 2025-07-17 18:21:48 +02:00
3f00f56f9f finish plugin cellulite to the new indexer 2025-07-16 00:10:40 +02:00
a921ee31ce Cellulite is almost in the new indexer. We must add the documentID to the geojson pipeline 2025-07-15 23:48:14 +02:00
f8232976ed Implement in new document indexer 2025-07-02 00:05:12 +02:00
d35b2d8d33 minor fixes 2025-06-30 09:52:06 +02:00
900be0ccad Extract or regenerate vectors related to settings changes 2025-06-26 18:14:48 +02:00
b025f1bcf1 Merge branch 'main' into release-v1.14.0-tmp 2025-04-14 12:35:47 +02:00
db7ce03763 Improve the performances of computing the size of the documents database 2025-03-26 17:40:12 +01:00
bf3a29b60d Document problematic case in test and acknowledge PR comment 2025-03-26 12:57:25 +01:00
43c8a206b4 detail comments 2025-03-25 13:07:17 +01:00
6b1c262b74 fix all tests 2025-03-25 12:43:15 +01:00
d71c6f3483 allow multiple embedding in per document per embedder to pass 2025-03-25 12:04:25 +01:00
e019ad7692 Display more detailed error message instead of panic 2025-03-21 15:41:31 +01:00
009c36a4d0 Add support for the progress API of arroy 2025-03-13 19:00:43 +01:00
d53225bf64 uses a random seed instead of 42 2025-03-13 12:43:31 +01:00
ef9d9f8481 set the memory in arroy 2025-03-13 11:29:00 +01:00
95bccaf5f5 Refactor Document indexing process (Facets)
**Changes:**
The Documents changes now take a selector closure instead of a list of field to match the field to extract.
The seek_leaf_values_in_object function now uses a selector closure of a list of field to match the field to extract
The facet database extraction is now relying on the FilterableAttributesRule to match the field to extract.
The facet-search database extraction is now relying on the FieldIdMapWithMetadata to select the field to index.
The facet level database extraction is now relying on the FieldIdMapWithMetadata to select the field to index.

**Important:**
Because the filterable attributes are patterns now,
the fieldIdMap will only register the fields that exists in at least one document.
if a field doesn't exist in any document, it will not be registered even if it has been specified in the filterable fields.

**Impact:**
- Document Addition/modification facet indexing
- Document deletion facet indexing
2025-03-03 10:32:03 +01:00
d25953f322 fix clippy 2025-02-26 17:02:43 +01:00
9f3663e768 Implement Incremental document database stats computing 2025-02-26 17:01:35 +01:00
245a55722a Remove commented code 2025-02-20 16:48:18 +01:00
05cc8c650c Expose the write channel congestion in the batches 2025-02-19 15:47:54 +01:00
796acd1aee Merge #5288
Some checks failed
Test suite / Tests almost all features (push) Has been skipped
Test suite / Test disabled tokenization (push) Has been skipped
Test suite / Tests on ubuntu-20.04 (push) Failing after 13s
Test suite / Run tests in debug (push) Failing after 13s
Test suite / Run Clippy (push) Failing after 19s
Test suite / Tests on windows-2022 (push) Failing after 48s
Test suite / Run Rustfmt (push) Successful in 1m28s
Test suite / Tests on macos-13 (push) Has been cancelled
5288: Improve AI logging r=dureuill a=Kerollmops

This PR fixes #5285 and brings the changes from #5233 to simplify debugging indexation and search performance issues related to AI. The following texts can be found in the logs to debug and understand performance issues:

 - `embed_one: search` represents the time we spent waiting for the embedding generation, i.e., OpenAI, local HuggingFace, Ollama.
 - `filtered_universe: search::universe` the time spent filtering the documents.
 - ~`next_bucket: search::vector_sort` is the time spent finding the nearest neighbors (ANNs) in the vector store (arroy), locally~ was being triggered too many times.
 - `indexing::vectors` is the time arroy spends indexing the new vectors for a batch.
 - `documents::extract vectors` and `documents::merge vectors` to see the time spent generating and writing the embeddings.

Co-authored-by: Kerollmops <clement@meilisearch.com>
2025-02-04 10:20:45 +00:00
a00796c46a Improve the naming in the log message 2025-01-29 14:21:02 +01:00
6112bd8caa Display the channel congestion 2025-01-29 14:21:02 +01:00
4a5923a55e log the time arroy took to insert embeddings 2025-01-27 14:22:17 +01:00
a6470a0c37 Improve error log 2025-01-22 15:50:41 +01:00
8a54f14b8e Demote panic to error log 2025-01-22 15:49:24 +01:00
63c8cbae5b Improve the panic message when deleting an unknown entry 2025-01-14 10:31:44 +01:00
de7f8c4406 refactor indexer mod 2025-01-07 15:29:02 +01:00