Commit Graph

10958 Commits

Author SHA1 Message Date
792be63567 Merge #5323
5323: exclude network time from processingMs r=Kerollmops a=dureuill



Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2025-02-06 16:35:44 +00:00
70aac71c63 exclude network time from processingMs 2025-02-06 17:18:36 +01:00
a562d6abc1 Merge #5322
5322: Make sure arroy is using the rayon thread-pool r=dureuill a=Kerollmops

This PR fixes #5249 by ensuring arroy uses the rayon thread pool.

Co-authored-by: Kerollmops <clement@meilisearch.com>
2025-02-06 15:28:47 +00:00
b7fdd9516c Merge #4970
4970: Create a new export documents meilitool subcommand r=dureuill a=Kerollmops

This subcommand can be useful for extracting documents from an existing database.

Co-authored-by: Kerollmops <clement@meilisearch.com>
2025-02-06 14:48:27 +00:00
5f2a1a4fd1 Skip the documents before fetching them 2025-02-06 15:40:22 +01:00
2b0e17ede0 Make sure arroy is using the rayon thread-pool 2025-02-06 15:28:10 +01:00
37092adc71 Show a bit of progress 2025-02-06 10:37:05 +01:00
86fcad788e Introduce a parameter to skip the first documents 2025-02-06 10:32:50 +01:00
2ea5c57871 Create a new export documents meilitool subcommand based on v1.12 2025-02-06 10:32:39 +01:00
78867b6852 Merge #5299
5299: Remote federated search r=dureuill a=dureuill

Fixes #4980 

- Usage: https://www.notion.so/meilisearch/API-usage-Remote-search-request-f64fae093abf409e9434c9b9c8fab6f3?pvs=25#1894b06b651f809a9f3dcc6b7189646e

- Changes database format:
  - Adds a new database key: the code is resilient to the case where the key is missing
  - Adds a new experimental feature: the code for experimental features is resilient to this case

Changes:

- Add experimental feature `proxySearch`
- Add network routes
- Dump support for network
- Add proxy search
- Add various tests

Co-authored-by: Louis Dureuil <louis@meilisearch.com>
v1.13.0-rc.1
2025-02-05 16:08:48 +00:00
b21b8e8f30 Remote search tests 2025-02-05 15:03:33 +01:00
4a9e5ae215 mv multi.rs -> multi/mod.rs 2025-02-05 15:03:33 +01:00
6e1865b75b network integration tests 2025-02-05 15:03:32 +01:00
64409a1de7 Test server: clear_api_key 2025-02-05 15:03:32 +01:00
1b81cab782 Add more analytics 2025-02-05 15:03:32 +01:00
88190b5602 Fix tests 2025-02-05 15:03:32 +01:00
0b27aa5138 Multi search reads header to know if it is being proxied 2025-02-05 15:03:32 +01:00
35160788d7 Proxy search requests 2025-02-05 15:03:32 +01:00
c3e5c3ba36 Allow rebuilding a SearchQueryWithIndex from its components 2025-02-05 15:03:16 +01:00
04ac0af54b Add WeightedScoreValues to be able to compare remote scores 2025-02-05 15:03:16 +01:00
9996533364 Make search types serialize and deserialize so that reading from a proxy is possible 2025-02-05 15:03:16 +01:00
3f6b334fc5 Route network 2025-02-05 15:03:16 +01:00
b30e5a7a35 Add new permissions 2025-02-05 15:03:16 +01:00
6d79cb23ba New error codes 2025-02-05 15:03:16 +01:00
e34afca6d7 Support network in dumps 2025-02-05 15:03:16 +01:00
4918b9ffb6 Network stored in DB 2025-02-05 15:03:15 +01:00
73474e7af0 Network types 2025-02-05 15:03:15 +01:00
7ae6dda03f Add new experimental feature 2025-02-05 15:01:04 +01:00
00e764b0d3 Merge #5314
5314: Activate used database size r=irevoire a=ManyTheFish

# Pull Request

Make the `/stats` route return `usedDatabaseSize`, the size actually used to store the "real" data in the database, rather than the disk size used by LMDB.


Co-authored-by: ManyTheFish <many@meilisearch.com>
2025-02-05 12:51:57 +00:00
4abf0db0b4 Activate used database size 2025-02-05 13:45:47 +01:00
acc885fd0a Merge #5312
5312: Send the OSS analytics once per day instead of once per hour r=ManyTheFish a=irevoire

# Pull Request

## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/5311

## What does this PR do?
- If the instance is OSS, we send the analytics once per day
- If the instance is on Meilisearch Cloud, we send the analytics once per hour
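
The rule above could be sketched roughly as follows. This is only an illustration of the described behaviour, not the code from the PR: the `InstanceKind` enum and the `analytics_interval` function are hypothetical names that do not come from the Meilisearch codebase.

```rust
use std::time::Duration;

/// Hypothetical type for illustration; not taken from the Meilisearch codebase.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum InstanceKind {
    /// A self-hosted open-source instance.
    Oss,
    /// An instance running on the hosted cloud offering.
    Cloud,
}

/// Returns how long to wait between two analytics reports
/// (hypothetical helper mirroring the rule stated in the PR description).
pub fn analytics_interval(kind: InstanceKind) -> Duration {
    match kind {
        // OSS instances report once per day.
        InstanceKind::Oss => Duration::from_secs(60 * 60 * 24),
        // Cloud instances keep reporting once per hour.
        InstanceKind::Cloud => Duration::from_secs(60 * 60),
    }
}
```

A caller would pass the returned `Duration` to whatever timer drives the reporting loop.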


Co-authored-by: Tamo <tamo@meilisearch.com>
2025-02-05 11:15:34 +00:00
61e8cfd4bc Send the OSS analytics once per day instead of once per hour 2025-02-04 15:39:00 +01:00
796acd1aee Merge #5288
5288: Improve AI logging r=dureuill a=Kerollmops

This PR fixes #5285 and brings the changes from #5233 to simplify debugging indexation and search performance issues related to AI. The following texts can be found in the logs to debug and understand performance issues:

 - `embed_one: search` is the time spent waiting for the embedding generation, e.g., OpenAI, local HuggingFace, Ollama.
 - `filtered_universe: search::universe` is the time spent filtering the documents.
 - ~`next_bucket: search::vector_sort` is the time spent finding the nearest neighbors (ANNs) in the vector store (arroy), locally~ (removed: it was being triggered too many times).
 - `indexing::vectors` is the time arroy spends indexing the new vectors for a batch.
 - `documents::extract vectors` and `documents::merge vectors` show the time spent generating and writing the embeddings.

Co-authored-by: Kerollmops <clement@meilisearch.com>
2025-02-04 10:20:45 +00:00
cc8df5e11f Move back the search-side logging to tracing 2025-02-04 11:16:17 +01:00
ede74ccc42 Merge #5306
5306: Fix internal error when passing `documentTemplateMaxBytes` to a source that doesn't support it r=ManyTheFish a=dureuill

# Pull Request

## Related issue
Fixes #5305 

## What does this PR do?
- Add `DOCUMENT_TEMPLATE_MAX_BYTES` to `allowed_sources_for_field` and `allowed_fields_for_source` to prevent a panic


Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2025-02-04 08:46:13 +00:00
6425451bbc Merge #5303
5303: Bring back changes from v1.12.8 into v1.13.0 r=Kerollmops a=Kerollmops

Fixes #5087 and other problems that you can find in the original PR #5294.

Co-authored-by: Kerollmops <clement@meilisearch.com>
2025-02-03 10:49:26 +00:00
fe46855462 Merge #5235
5235: Introduce a compaction subcommand in meilitool r=dureuill a=Kerollmops

This PR proposes a change to the meilitool helper, introducing the `compact-index` subcommand to reduce the size of the indexes.

While working on this tool, I discovered that the current heed `Env::copy_to_file` API is not very temp file friendly and [could be improved](https://github.com/meilisearch/heed/issues/306).

Co-authored-by: Kerollmops <clement@meilisearch.com>
Co-authored-by: Clément Renault <clement@meilisearch.com>
2025-02-03 10:11:01 +00:00
8e7d2d25f2 Only open indexes, do not create them 2025-02-03 10:50:38 +01:00
a436534515 Fix test 2025-02-03 10:36:34 +01:00
2385842537 Fix the imports 2025-02-03 10:29:09 +01:00
6a70c0ec92 Add a link to the experimental feature GitHub discussion 2025-02-03 10:24:53 +01:00
7a9382b115 Better document the rayon limitation condition 2025-02-03 10:24:53 +01:00
62dabeba5f Do not create too many rayon tasks when processing the settings 2025-02-03 10:24:52 +01:00
48812229a9 Remove a log that would log too much 2025-02-03 10:24:52 +01:00
915cc377fb Refine the env variable and the max readers 2025-02-03 10:24:52 +01:00
96544bfa43 add DOCUMENT_TEMPLATE_MAX_BYTES to allowed_sources_for_field and allowed_fields_for_source 2025-02-03 09:59:17 +01:00
09d474da63 Merge #5140
5140: Fix workload inversion r=dureuill a=ManyTheFish

The assets used by `workloads/hackernews-modify-facet-numbers.json`
and `workloads/hackernews-modify-facet-strings.json` were swapped; this PR fixes that.


Co-authored-by: ManyTheFish <many@meilisearch.com>
2025-02-03 08:22:22 +00:00
aaefbfae1f Do not create too many rayon tasks 2025-01-30 16:36:12 +01:00
97e17f52a1 Add more logs to see calls to the embedders 2025-01-30 16:36:12 +01:00
62ced0e3f1 Make cargo fmt happy 2025-01-30 11:09:54 +01:00