Commit Graph

148 Commits

Author SHA1 Message Date
796acd1aee Merge #5288
Some checks failed
Test suite / Tests almost all features (push) Has been skipped
Test suite / Test disabled tokenization (push) Has been skipped
Test suite / Tests on ubuntu-20.04 (push) Failing after 13s
Test suite / Run tests in debug (push) Failing after 13s
Test suite / Run Clippy (push) Failing after 19s
Test suite / Tests on windows-2022 (push) Failing after 48s
Test suite / Run Rustfmt (push) Successful in 1m28s
Test suite / Tests on macos-13 (push) Has been cancelled
5288: Improve AI logging r=dureuill a=Kerollmops

This PR fixes #5285 and brings the changes from #5233 to simplify debugging indexation and search performance issues related to AI. The following texts can be found in the logs to debug and understand performance issues:

 - `embed_one: search` represents the time we spent waiting for the embedding generation, i.e., OpenAI, local HuggingFace, Ollama.
 - `filtered_universe: search::universe` the time spent filtering the documents.
 - ~`next_bucket: search::vector_sort` is the time spent finding the nearest neighbors (ANNs) in the vector store (arroy), locally~ was being triggered too many times.
 - `indexing::vectors` is the time arroy spends indexing the new vectors for a batch.
 - `documents::extract vectors` and `documents::merge vectors` to see the time spent generating and writing the embeddings.

Co-authored-by: Kerollmops <clement@meilisearch.com>
2025-02-04 10:20:45 +00:00
acc400face Support merging update and replacement operations 2025-02-03 11:47:17 +01:00
8e6893ddbe Make sure we correctly mix different document operations 2025-02-03 10:34:06 +01:00
424c5bde40 Move the embedding computation and extraction log to debug 2025-01-29 16:40:36 +01:00
cb1b7513af Log the memory metrics only once 2025-01-29 15:21:52 +01:00
db032079d8 Show indexation allocated memory 2025-01-29 14:21:02 +01:00
a00796c46a Improve the naming in the log message 2025-01-29 14:21:02 +01:00
6112bd8caa Display the channel congestion 2025-01-29 14:21:02 +01:00
4a5923a55e log the time arroy took to insert embeddings 2025-01-27 14:22:17 +01:00
f5a4a1c8b2 Give more RAM to bbqueue.
- bbqueue buffers used to have (5% * 2%) / num_threads
- they now have 5% / num_threads
2025-01-24 12:18:32 +01:00
a6470a0c37 Improve error log 2025-01-22 15:50:41 +01:00
8a54f14b8e Demote panic to error log 2025-01-22 15:49:24 +01:00
63c8cbae5b Improve the panic message when deleting an unknown entry 2025-01-14 10:31:44 +01:00
c14967eeac Use new incremental facet indexing and enable sanity checks in debug 2025-01-09 10:36:35 +01:00
908adee6fc Fix the addition of empty payload 2025-01-09 10:24:36 +01:00
4275833bab Rename compute.rs to post_process.rs 2025-01-07 15:31:20 +01:00
de7f8c4406 refactor indexer mod 2025-01-07 15:29:02 +01:00
08fd026ebd fix warning 2024-12-11 18:18:13 +01:00
fa885e75b4 rename the send_progress in progress 2024-12-11 18:13:12 +01:00
f1beb60204 make the progress use payload instead of documents 2024-12-11 18:07:45 +01:00
867e6a8f1d rename the send_progress field to progress since it s not sending anything 2024-12-11 16:25:01 +01:00
6f4823fc97 make the number of document in the document tasks more incremental 2024-12-11 16:25:01 +01:00
df9b68f8ed inital implementation of the progress 2024-12-11 16:25:01 +01:00
a751972c57 Prefer using a stable than a random hash builder 2024-12-10 14:25:53 +01:00
89637bcaaf Use bumparaw-collections in Meilisearch/milli 2024-12-10 11:52:20 +01:00
f5dd8dfc3e Rollback max memory usage changes 2024-12-09 10:26:30 +01:00
3a11e39c01 Force max_memory to a min of 100MiB 2024-12-04 17:53:30 +01:00
8ecb726683 Fix the minimun BBQueue channel threshold 2024-12-03 15:49:11 +01:00
d040aff101 Stop allocating 1GiB for documents 2024-12-02 16:30:14 +01:00
d5c07ef7b3 Manage key length conversion error correctly 2024-12-02 11:03:00 +01:00
30eb0e5b5b Rename recv and read methods to recv_action and recv_frame 2024-12-02 10:08:01 +01:00
14ee7aa84c Make sure the BBQueue is at least 50 MiB 2024-11-28 18:02:48 +01:00
8a35cd1743 Adjust the BBQueue buffers to use 2% instead of 10% 2024-11-28 16:00:15 +01:00
3c7ac093d3 Take the BBQueue capacity into account in the max memory 2024-11-28 15:43:14 +01:00
b57dd5c58e Remove the Vector variant and use the Vectors 2024-11-28 15:20:43 +01:00
096a28656e Fix a bug around deleting all the vectors of a doc 2024-11-28 15:15:06 +01:00
cc4bd54669 Correctly construct the Embeddings struct 2024-11-28 13:53:25 +01:00
58eab9a018 Send large payload through crossbeam 2024-11-28 12:01:06 +01:00
e83534a430 Fix the indexer::index to correctly use the rayon::ThreadPool 2024-11-27 16:27:43 +01:00
98d4a2909e Fix the way we spawn the rayon threadpool 2024-11-27 16:05:44 +01:00
cc63802115 Modify and return the IndexEmbeddings to write them later 2024-11-27 14:58:03 +01:00
acec45ad7c Send a WakeUp when writing data in the BBQueue buffers 2024-11-27 14:33:23 +01:00
70802eb7c7 Fix most issues with the lifetimes 2024-11-27 14:32:42 +01:00
6ac5b3b136 Finish most of the channels types 2024-11-27 14:32:26 +01:00
2094ce8a9a Move the arroy building after the writing loop 2024-11-27 14:30:33 +01:00
8442db8101 Implement mostly all senders 2024-11-27 14:16:35 +01:00
a2f64f6552 Merge #5095
Some checks failed
Test suite / Tests on ${{ matrix.os }} (macos-13) (push) Waiting to run
Test suite / Tests almost all features (push) Has been skipped
Test suite / Test disabled tokenization (push) Has been skipped
Test suite / Tests on ubuntu-20.04 (push) Failing after 13s
Test suite / Run tests in debug (push) Failing after 12s
Test suite / Tests on ${{ matrix.os }} (windows-2022) (push) Failing after 40s
Test suite / Run Rustfmt (push) Successful in 1m46s
Test suite / Run Clippy (push) Successful in 9m55s
5095: Span to measure the part of db writes that is after the merge/extraction r=curquiza a=dureuill



Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-11-27 11:10:00 +00:00
d0b2c0a523 Merge #5091
Some checks failed
Test suite / Tests on ${{ matrix.os }} (macos-13) (push) Waiting to run
Test suite / Tests almost all features (push) Has been skipped
Test suite / Test disabled tokenization (push) Has been skipped
Test suite / Tests on ubuntu-20.04 (push) Failing after 11s
Test suite / Run tests in debug (push) Failing after 10s
Test suite / Tests on ${{ matrix.os }} (windows-2022) (push) Failing after 39s
Test suite / Run Rustfmt (push) Successful in 1m38s
Test suite / Run Clippy (push) Successful in 23m11s
5091: Settings opt out r=Kerollmops a=ManyTheFish

# Pull Request

Related PRD: https://www.notion.so/meilisearch/API-usage-Settings-to-opt-out-indexing-features-fff4b06b651f8108ade3f858aeb16b14?pvs=4

## Related issue
Fixes #4979 

- [x] Add setting opt-out
- [x] Add analytics
- [x] Add tests


Co-authored-by: ManyTheFish <many@meilisearch.com>
Co-authored-by: Many the fish <many@meilisearch.com>
2024-11-26 15:50:28 +00:00
8f57b4fdf4 Span to measure the part of db writes that is after the merge/extraction 2024-11-26 14:46:36 +01:00
aa460819a7 Add more precise spans 2024-11-26 09:45:36 +01:00