Commit Graph

410 Commits

Author SHA1 Message Date
meili-bors[bot]
885710a07b Merge #5341
5341: Embeddings stats r=ManyTheFish a=ManyTheFish

# Pull Request

## Related issue
Fixes #5321

## What does this PR do?
- Add embedding stats
- force dumpless upgrade to recompute stats
- add tests


Co-authored-by: ManyTheFish <many@meilisearch.com>
2025-02-12 15:46:37 +00:00
ManyTheFish
c55fdad2c3 Fix dumpless upgrade target version 2025-02-12 16:35:05 +01:00
ManyTheFish
8419ed52a1 fix clippy 2025-02-12 14:38:51 +01:00
Louis Dureuil
8e0d8d31f9 Add back timeout from v1.11.3 2025-02-12 11:53:00 +01:00
ManyTheFish
bd27fe7d02 force dumpless upgrade to recompute stats 2025-02-12 11:45:02 +01:00
ManyTheFish
41203f0931 Add embedders stats 2025-02-12 11:37:47 +01:00
Louis Dureuil
b83275c9c5 Change the updated* functions to only_new functions, hopefully better communicating what they do 2025-02-11 15:27:10 +01:00
Louis Dureuil
d7f35ee3ba Use merged document instead of updated 2025-02-11 15:27:10 +01:00
meili-bors[bot]
0c3e7fe963 Merge #5316
Some checks failed
Test suite / Tests on ubuntu-20.04 (push) Failing after 2s
Test suite / Tests almost all features (push) Has been skipped
Test suite / Test disabled tokenization (push) Has been skipped
Test suite / Run tests in debug (push) Failing after 16s
Test suite / Run Clippy (push) Failing after 12s
Test suite / Run Rustfmt (push) Failing after 32s
Test suite / Tests on macos-13 (push) Has been cancelled
Test suite / Tests on windows-2022 (push) Has been cancelled
5316: Fix the dumpless upgrade corruption r=dureuill a=irevoire

# Pull Request

## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/5280

## What does this PR do?
- Add a test that ensure we write the version in the index-scheduler even if we have a bug while writing the VERSION file
- Do what was described in the issue


Co-authored-by: Tamo <tamo@meilisearch.com>
2025-02-10 09:53:57 +00:00
Tamo
45f843ccb9 fmt 2025-02-10 10:46:42 +01:00
Kerollmops
2b0e17ede0 Make sure arroy is using the rayon thread-pool 2025-02-06 15:28:10 +01:00
Louis Dureuil
04ac0af54b Add WeightedScoreValues to be able to compare remote scores 2025-02-05 15:03:16 +01:00
Louis Dureuil
9996533364 Make search types serialize and deserialize so that reading from a proxy is possible 2025-02-05 15:03:16 +01:00
meili-bors[bot]
796acd1aee Merge #5288
Some checks failed
Test suite / Tests almost all features (push) Has been skipped
Test suite / Test disabled tokenization (push) Has been skipped
Test suite / Tests on ubuntu-20.04 (push) Failing after 13s
Test suite / Run tests in debug (push) Failing after 13s
Test suite / Run Clippy (push) Failing after 19s
Test suite / Tests on windows-2022 (push) Failing after 48s
Test suite / Run Rustfmt (push) Successful in 1m28s
Test suite / Tests on macos-13 (push) Has been cancelled
5288: Improve AI logging r=dureuill a=Kerollmops

This PR fixes #5285 and brings the changes from #5233 to simplify debugging indexation and search performance issues related to AI. The following texts can be found in the logs to debug and understand performance issues:

 - `embed_one: search` represents the time we spent waiting for the embedding generation, i.e., OpenAI, local HuggingFace, Ollama.
 - `filtered_universe: search::universe` the time spent filtering the documents.
 - ~`next_bucket: search::vector_sort` is the time spent finding the nearest neighbors (ANNs) in the vector store (arroy), locally~ was being triggered too many times.
 - `indexing::vectors` is the time arroy spends indexing the new vectors for a batch.
 - `documents::extract vectors` and `documents::merge vectors` to see the time spent generating and writing the embeddings.

Co-authored-by: Kerollmops <clement@meilisearch.com>
2025-02-04 10:20:45 +00:00
Kerollmops
cc8df5e11f Move back the search-side logging to tracing 2025-02-04 11:16:17 +01:00
meili-bors[bot]
ede74ccc42 Merge #5306
Some checks failed
Test suite / Tests on ubuntu-20.04 (push) Failing after 2s
Test suite / Tests almost all features (push) Has been skipped
Test suite / Test disabled tokenization (push) Has been skipped
Test suite / Run tests in debug (push) Failing after 2s
Test suite / Tests on windows-2022 (push) Failing after 24s
Test suite / Run Rustfmt (push) Successful in 1m33s
Test suite / Run Clippy (push) Successful in 6m20s
Test suite / Tests on macos-13 (push) Has been cancelled
5306: Fix internal error when passing `documentTemplateMaxBytes` to a source that doesn't support it r=ManyTheFish a=dureuill

# Pull Request

## Related issue
Fixes #5305 

## What does this PR do?
- add `DOCUMENT_TEMPLATE_MAX_BYTES` to `allowed_sources_for_field` and `allowed_fields_for_source` to prevent a panic


Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2025-02-04 08:46:13 +00:00
Tamo
d34f0b606c Update crates/milli/src/update/new/document_change.rs 2025-02-03 12:08:52 +01:00
Kerollmops
acc400face Support merging update and replacement operations 2025-02-03 11:47:17 +01:00
Kerollmops
aa2327591e Add more mixing updates and replacements tests 2025-02-03 10:34:07 +01:00
Kerollmops
60470bb647 Fix the tests to use the new replace/update documents 2025-02-03 10:34:07 +01:00
Kerollmops
8e6893ddbe Make sure we correctly mix different document operations 2025-02-03 10:34:06 +01:00
Kerollmops
7a9382b115 Better document the rayon limitation condition 2025-02-03 10:24:53 +01:00
Kerollmops
62dabeba5f Do not create too many rayon tasks when processing the settings 2025-02-03 10:24:52 +01:00
Kerollmops
48812229a9 Remove a log that would log too much 2025-02-03 10:24:52 +01:00
Louis Dureuil
96544bfa43 add DOCUMENT_TEMPLATE_MAX_BYTES to allowed_sources_for_field and allowed_fields_for_source 2025-02-03 09:59:17 +01:00
Kerollmops
aaefbfae1f Do not create too many rayon tasks 2025-01-30 16:36:12 +01:00
Kerollmops
97e17f52a1 Add more logs to see calls to the embedders 2025-01-30 16:36:12 +01:00
Kerollmops
424c5bde40 Move the embedding computation and extraction log to debug 2025-01-29 16:40:36 +01:00
Kerollmops
cb1b7513af Log the memory metrics only once 2025-01-29 15:21:52 +01:00
Clément Renault
a9d0f4a002 Improve english comments
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2025-01-29 15:16:40 +01:00
Kerollmops
db032079d8 Show indexation allocated memory 2025-01-29 14:21:02 +01:00
Clément Renault
a00796c46a Improve the naming in the log message 2025-01-29 14:21:02 +01:00
Kerollmops
6112bd8caa Display the channel congestion 2025-01-29 14:21:02 +01:00
Kerollmops
cec88cfc29 Measure the bbqueue congestion 2025-01-29 14:21:02 +01:00
Kerollmops
47f70e3d79 Debug the first vector sort fill buffer 2025-01-27 14:22:29 +01:00
Kerollmops
0f8eb3b506 Improve the logs of the search with AI 2025-01-27 14:22:22 +01:00
Kerollmops
4a5923a55e log the time arroy took to insert embeddings 2025-01-27 14:22:17 +01:00
Clément Renault
9b579069df Comment the max grant of the bbqueue
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2025-01-24 12:18:32 +01:00
Louis Dureuil
f5a4a1c8b2 Give more RAM to bbqueue.
- bbqueue buffers used to have (5% * 2%) / num_threads
- they now have 5% / num_threads
2025-01-24 12:18:32 +01:00
Kerollmops
5ab4cdb1f3 Reduce the maximum grant possible we can store in the BBQueue 2025-01-24 12:18:32 +01:00
Tamo
7197ced673 fix the bad index version on opening 2025-01-23 16:51:24 +01:00
Tamo
787472453d write the version of the index while upgrading it 2025-01-23 16:51:24 +01:00
Tamo
c27c923439 introduce a trait to upgrade the indexes 2025-01-23 16:51:23 +01:00
Tamo
41eeffd88d fmt 2025-01-23 16:51:20 +01:00
Tamo
20ac59c946 fix the field distribution when upgrading from the v1_12 2025-01-23 16:51:19 +01:00
Tamo
cfc1e193b6 update the test with the stats 2025-01-23 16:51:19 +01:00
Tamo
0cc25c7e4c add a large test importing a data.ms from the v1.12.0 2025-01-23 16:51:18 +01:00
Tamo
3ef7a478cd move the version check to the task queue 2025-01-23 16:48:32 +01:00
Tamo
e70ac35e02 fix bugs after rebase 2025-01-23 16:48:32 +01:00
Tamo
d3654906bf Add the new tasks with most of the job done 2025-01-23 16:48:32 +01:00