11832 Commits

Author SHA1 Message Date
Kerollmops
dfce20be21 Rename callTrace into progressTrace 2025-02-25 10:09:03 +01:00
Louis Dureuil
24fe6cd205 Fix multiple embeddings in hf 2025-02-24 16:24:04 +01:00
Louis Dureuil
e374b095a2 Fix tests 2025-02-24 14:11:26 +01:00
Louis Dureuil
9f3e4801b1 Refactor settings validation and introduce SubEmbedderSettings 2025-02-24 13:58:26 +01:00
Louis Dureuil
b85180fedb Error types 2025-02-24 13:58:26 +01:00
Louis Dureuil
3cdcc54a9e analytics 2025-02-24 13:58:26 +01:00
Louis Dureuil
294cf39cad Integrate composite embedder 2025-02-24 13:58:26 +01:00
Louis Dureuil
4a2643daa2 Rename embed_one to embed_search and embed_chunks* to embed_index* 2025-02-24 13:58:26 +01:00
Louis Dureuil
8d2d9066ba Add composite embedder 2025-02-24 13:58:26 +01:00
Louis Dureuil
526476e168 Move settings test to its own file 2025-02-24 13:58:26 +01:00
meili-bors[bot]
ea7bae9a71 Merge #5356
5356: Display the internal indexing steps with timings on the `/batches` route r=irevoire a=Kerollmops

This PR computes the durations of each step, stores them in a map, and prints them (for now).

```
"callTrace": {
    "processing tasks > retrieving config": "185.38µs",
    "processing tasks > computing document changes > preparing update file > payload": "23.11ms",
    "processing tasks > computing document changes > preparing update file": "23.26ms",
    "processing tasks > computing document changes": "24.06ms",
    "processing tasks > indexing > extracting documents > document": "15.13ms",
    "processing tasks > indexing > extracting documents": "15.13ms",
    "processing tasks > indexing > extracting facets > document": "5.70ms",
    "processing tasks > indexing > extracting facets": "5.72ms",
    "processing tasks > indexing > extracting words > document": "597.24ms",
    "processing tasks > indexing > extracting words": "597.25ms",
    "processing tasks > indexing > extracting word proximity > document": "1.14s",
    "processing tasks > indexing > extracting word proximity": "1.15s",
    "processing tasks > indexing > tail writing to database": "430.91ms",
    "processing tasks > indexing > waiting for extractors": "52.54µs",
    "processing tasks > indexing > writing embeddings to database": "47.79µs",
    "processing tasks > indexing > post-processing facets": "476.04µs",
    "processing tasks > indexing > post-processing words": "97.82ms",
    "processing tasks > indexing > finalizing": "67.41ms",
    "processing tasks > indexing": "2.40s",
    "processing tasks": "2.43s",
    "writing tasks to disk > task": "37.71µs",
    "writing tasks to disk": "67.13µs"
},
"writeChannelCongestion": {
    "attempts": 2608482,
    "blocking_attempts": 0,
    "blocking_ratio": 0.0
}
```
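The accumulation described above can be sketched in a few lines. This is a minimal illustration in Python, not the Meilisearch implementation: the `StepTrace` class, its `step` method, and `format_duration` are hypothetical names, and the formatting rules are only inferred from the sample output. Because a step is recorded when it finishes, a parent such as `processing tasks` naturally lands after its sub-steps.

```python
import time
from contextlib import contextmanager


class StepTrace:
    """Accumulates wall-clock durations keyed by a 'parent > child' path."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock
        self._stack = []     # names of the steps that are currently open
        self.durations = {}  # path -> seconds, in order of completion

    @contextmanager
    def step(self, name):
        self._stack.append(name)
        path = " > ".join(self._stack)
        start = self._clock()
        try:
            yield
        finally:
            elapsed = self._clock() - start
            # A step entered several times adds up under the same key.
            self.durations[path] = self.durations.get(path, 0.0) + elapsed
            self._stack.pop()


def format_duration(seconds):
    """Render seconds as s, ms or µs with two decimals (assumed format)."""
    if seconds >= 1.0:
        return f"{seconds:.2f}s"
    if seconds >= 1e-3:
        return f"{seconds * 1e3:.2f}ms"
    return f"{seconds * 1e6:.2f}µs"


trace = StepTrace()
with trace.step("processing tasks"):
    with trace.step("indexing"):
        pass
print({path: format_duration(s) for path, s in trace.durations.items()})
```

Keeping the raw seconds in the map and formatting only at display time avoids losing precision when the same step is entered more than once.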

## To Do
- [x] Update the batches PRD + delivery + tracking issue.
- [x] Store that in the batches to be visible from the `/batches` route.
- [x] Display the writer's congestion.
- [x] Display the info back in the logs too.
- [ ] (optional) Compute the size of each database by [using LMDB](https://docs.rs/heed/latest/heed/struct.DatabaseStat.html).
- [x] Push them in reverse order so that "processing tasks" comes after its sub-steps.
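
For the `writeChannelCongestion` figures shown earlier, a plausible reading is that `blocking_ratio` is `blocking_attempts` divided by `attempts`. That relationship is an assumption (the sample values, 0 out of 2608482, are consistent with it but do not prove it), and the function below is a hypothetical helper rather than code from the PR.

```python
def blocking_ratio(attempts, blocking_attempts):
    """Share of write attempts that had to block (assumed definition).

    Returns 0.0 when nothing was attempted, to avoid dividing by zero.
    """
    if attempts == 0:
        return 0.0
    return blocking_attempts / attempts


# Values taken from the sample output above.
print(blocking_ratio(2608482, 0))
```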


Co-authored-by: Kerollmops <clement@meilisearch.com>
2025-02-20 17:38:50 +00:00
Kerollmops
76fd5d92d7 Clarify the tail writing to database 2025-02-20 17:35:23 +01:00
Kerollmops
245a55722a Remove commented code 2025-02-20 16:48:18 +01:00
Kerollmops
434fad5327 Fix insta tests again 2025-02-20 16:41:48 +01:00
Kerollmops
243a5fa6a8 Log the call trace and congestion 2025-02-20 14:17:34 +01:00
Kerollmops
9d314ace09 Fix the insta tests 2025-02-20 11:51:58 +01:00
Kerollmops
1b1172ad16 Fix dump tests 2025-02-20 10:44:53 +01:00
Kerollmops
1d99c8465c Hide the batch stats to make insta pass 2025-02-20 10:16:54 +01:00
Kerollmops
05cc8c650c Expose the write channel congestion in the batches 2025-02-19 15:47:54 +01:00
Louis Dureuil
14e1459bf5 Document settings 2025-02-19 15:06:22 +01:00
Louis Dureuil
589bf30ec6 make clippy happy 2025-02-19 11:38:07 +01:00
Louis Dureuil
b367c71ad2 fixup test 2025-02-19 11:31:17 +01:00
Kerollmops
3ff1de0a21 Expose the call trace in the batch stats 2025-02-19 11:24:11 +01:00
Louis Dureuil
1005a60fb8 Fixup dump settings 2025-02-19 11:03:48 +01:00
Kerollmops
e9add14189 Reorder steps 2025-02-18 19:26:41 +01:00
Kerollmops
4a058a080e Simplify the name generation 2025-02-18 18:48:44 +01:00
Kerollmops
11a11fc870 Accumulate step durations from the progress system 2025-02-18 18:33:19 +01:00
Louis Dureuil
cd0dfa3f1b Fix test cases 2025-02-18 17:21:52 +01:00
Louis Dureuil
7b4ce468a6 Allow overriding pooling method 2025-02-18 17:12:23 +01:00
Louis Dureuil
11759c4be4 Support pooling 2025-02-18 16:10:51 +01:00
meili-bors[bot]
0f1aeb8eaa Merge #5351
5351: Bring back v1.13.0 changes into main r=irevoire a=Kerollmops

This PR brings back the changes made in v1.13 into the main branch.

Co-authored-by: ManyTheFish <many@meilisearch.com>
Co-authored-by: Kerollmops <clement@meilisearch.com>
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
Co-authored-by: Clémentine <clementine@meilisearch.com>
Co-authored-by: meili-bors[bot] <89034592+meili-bors[bot]@users.noreply.github.com>
Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: Clément Renault <clement@meilisearch.com>
2025-02-18 08:05:02 +00:00
meili-bors[bot]
5e7803632d Merge #5342
5342: Fix workload sha r=dureuill a=ManyTheFish

The dataset shasum was wrong for some workloads, which made `/bench workloads/*.json` crash.
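
A checksum mismatch of this kind can be caught before a workload runs. The sketch below assumes the recorded value is a SHA-256 hex digest, which the description does not state; `sha256_of_file` and `matches_expected` are hypothetical helpers, not part of the bench tooling.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in fixed-size chunks so large datasets need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_hex):
    """Compare the file's digest with the recorded one, ignoring case and whitespace."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

Calling `matches_expected(dataset_path, recorded_sha)` for each dataset before starting a run would report a stale checksum up front instead of crashing partway through.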

Co-authored-by: ManyTheFish <many@meilisearch.com>
v1.13.0-rc.3 v1.13.0
2025-02-12 16:27:09 +00:00
meili-bors[bot]
885710a07b Merge #5341
5341: Embeddings stats r=ManyTheFish a=ManyTheFish

# Pull Request

## Related issue
Fixes #5321

## What does this PR do?
- Add embedding stats
- Force dumpless upgrade to recompute stats
- Add tests


Co-authored-by: ManyTheFish <many@meilisearch.com>
2025-02-12 15:46:37 +00:00
ManyTheFish
c55fdad2c3 Fix dumpless upgrade target version 2025-02-12 16:35:05 +01:00
ManyTheFish
1caad4c4b0 Add multiple embeddings for the same embedder in tests 2025-02-12 16:13:34 +01:00
ManyTheFish
8419ed52a1 fix clippy 2025-02-12 14:38:51 +01:00
ManyTheFish
a65c52cc97 Convert dump test into snapshots 2025-02-12 14:14:10 +01:00
ManyTheFish
49e9655c24 Update snapshots 2025-02-12 14:05:32 +01:00
meili-bors[bot]
fa763ca5dc Merge #5339
5339: Add back timeout from v1.11.3 r=Kerollmops a=dureuill

# Pull Request

## Related issue
Fixes #5337

## What does this PR do?
- Fix regression compared with v1.11 by reintroducing the 30s timeout on all REST API calls.

Thanks to `@migueltarga` for reporting the issue.


Co-authored-by: Louis Dureuil <louis@meilisearch.com>
v1.13.0-rc.2
2025-02-12 12:50:27 +00:00
ManyTheFish
c7aeb554b2 Add tests 2025-02-12 13:37:41 +01:00
ManyTheFish
88d9d47928 Fix benchmark sha 2025-02-12 13:27:15 +01:00
Louis Dureuil
8e0d8d31f9 Add back timeout from v1.11.3 2025-02-12 11:53:00 +01:00
meili-bors[bot]
81a38099ec Merge #5336
5336: Meilitool Hair Dryer r=dureuill a=Kerollmops

This pull request introduces a new subcommand to hair dry a specific part of specific indexes. It is useful when [the memory-mapped pages are not hot in the cache](https://arc.net/l/quote/ixhcdwcq) and must be. Hair drying those interesting pages makes the search requests using the vector store much faster.

The previous technique used the "cat method," which consists of reading the whole LMDB data file and piping it into the null file descriptor. By doing that, the whole LMDB data file becomes hot in the cache. However, when the database is large, at least 30% of it consists of free and unused pages, and many other pages don't need to be hot, e.g., raw JSON documents or uninteresting parts of the inverted index.

This new subcommand reads all the Arroy pages of a given index to make them hot, and only those. More coming...
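
The principle of warming only a chosen region, instead of the whole file, can be shown with a plain memory-mapped file. This is a rough Python sketch, not the Meilitool code: the real subcommand walks the Arroy pages inside LMDB, whereas the hypothetical `warm_range` below simply touches one byte per OS page in a byte range so that the kernel faults those pages into the cache.

```python
import mmap


def warm_range(path, start=0, end=None):
    """Touch one byte per page in [start, end) of a memory-mapped file.

    Returns the number of pages touched. The file must not be empty,
    since an empty file cannot be memory-mapped.
    """
    touched = 0
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            end = len(mm) if end is None else min(end, len(mm))
            # Align the first read to the beginning of its page.
            offset = start - (start % mmap.PAGESIZE)
            while offset < end:
                _ = mm[offset]  # reading a single byte faults the page in
                touched += 1
                offset += mmap.PAGESIZE
    return touched
```

Calling `warm_range(path)` with no bounds touches every page, which is roughly what the "cat method" achieves; passing `start` and `end` restricts the work to the region of interest.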

The current algorithm is single-threaded and takes a lot of time. I am in the process of multithreading it. This is the time it takes to hair dry a 305GiB database with a single thread.

```
real    21m51.054s
user    0m3.155s
sys     0m19.393s
```

## To Do
- [ ] (optional) Do the reads in parallel.

Co-authored-by: Kerollmops <clement@meilisearch.com>
2025-02-12 10:45:16 +00:00
ManyTheFish
bd27fe7d02 force dumpless upgrade to recompute stats 2025-02-12 11:45:02 +01:00
ManyTheFish
41203f0931 Add embedders stats 2025-02-12 11:37:47 +01:00
Kerollmops
803a699b15 Remove unsafes 2025-02-12 10:46:45 +01:00
Kerollmops
246ad3b06e Display a progress percentage 2025-02-12 09:56:05 +01:00
Kerollmops
a21c440274 Bump Ubuntu from 20.04 to 22.04 2025-02-12 09:49:50 +01:00
meili-bors[bot]
c01d26ffd7 Merge #5324
5324: Mention utoipa in sprint issues r=curquiza a=irevoire

Update the sprint-issue template to mention the OpenAPI file and utoipa.

Let me know if something is not clear or missing.

Co-authored-by: Tamo <tamo@meilisearch.com>
2025-02-11 20:46:26 +00:00
meili-bors[bot]
225af069a9 Merge #5149
5149: Ensure the settings routes are now configurated when a new field is added to the Settings struct  r=curquiza a=MichaScant

# Pull Request
## Related issue
Fixes #5126 

## What does this PR do?
Ensures the settings routes are properly configured when a new field is added to the settings structure. The changes follow what was proposed in the original issue: any new field of the settings struct is added to the [make_settings_route! macro list](6298db5bea/crates/meilisearch/src/routes/indexes/settings.rs (L182-L403)) 
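
The underlying idea, failing fast when a settings field has no matching route, can be illustrated outside of Rust. The Python analogy below is only a sketch: the `Settings` dataclass, the `ROUTES` table, and `fields_without_route` are hypothetical and far smaller than the real struct, and this is not how the `make_settings_route!` macro works.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class Settings:
    # Hypothetical subset of fields, for illustration only.
    displayed_attributes: Optional[list] = None
    searchable_attributes: Optional[list] = None
    ranking_rules: Optional[list] = None


# Hypothetical field-to-route table; the paths are made up for the example.
ROUTES = {
    "displayed_attributes": "/displayed-attributes",
    "searchable_attributes": "/searchable-attributes",
    "ranking_rules": "/ranking-rules",
}


def fields_without_route(settings_cls, routes):
    """Return the names of settings fields that have no registered route."""
    return [f.name for f in fields(settings_cls) if f.name not in routes]


# A check like this in a test suite flags any field added without a route.
assert fields_without_route(Settings, ROUTES) == []
```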

## PR checklist
Please check if your PR fulfills the following requirements:
- [ ] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [ ] Have you read the contributing guidelines?
- [ ] Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!


Co-authored-by: michascant <89426143+MichaScant@users.noreply.github.com>
2025-02-11 20:10:29 +00:00