Commit Graph

9793 Commits

Author SHA1 Message Date
2d16d0aea1 Merge #4839
4839: In prometheus metrics return the route pattern instead of the real route when returning the HTTP requests total r=irevoire a=irevoire

# Pull Request

## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/4825

## What does this PR do?
- return the route pattern instead of the real route when returning the HTTP requests total


Co-authored-by: Tamo <tamo@meilisearch.com>
2024-08-05 10:14:51 +00:00
866922ecc3 Merge #4808
4808: Make the tests run faster r=irevoire a=irevoire

## Index-Scheduler

### Only check the consistency of the index-scheduler on snapshots when running in release mode

This saves 12s on the tests, and since the tests run in release mode in the CI, we don't lose any information.
From 28s to 16s

### We were snapshotting the index for no reason in `advance_till`, I removed this call

This saved an additional 8s on the tests, going from 16s to 8s.

----

After these two optimizations, the test suite as a whole executes 14% quicker

## Meilisearch integration tests

While profiling this test suite, nothing stands out. The only noticeable thing is that we're losing most of our time creating and dropping threads.
I made the theory that by sharing a single common instance between all integrations tests I would gain some time again.

In 355a7acd1c I saved another 15s by only testing this theory on the module that tests the error messages. 
But we can do it on many more tests. **We must take care of not making any test flaky, though**.

## Use two indexing threads

By moving from one to two indexing threads, we gain an additional 30% in performance.

# Conclusion

## Before

The execution of the test suite was taking around:
- 4m40s on my computer
- 15 minutes on the debug CI with cache
- 29 minutes on the Windows CI with cache

## After

The execution of the test suite is taking around:
- 2m20 on my computer
- 8 minutes on the debug CI with cache
- 29 minutes on the Windows CI with cache

## This means the test suite should now run ~50% faster on your computer; the CI may report errors twice faster, but we'll still wait for ~the same amount of time to merge a PR


Co-authored-by: Tamo <tamo@meilisearch.com>
2024-07-30 15:11:30 +00:00
f05ea04879 In prometheus metrics return the route pattern instead of the real route when returning the HTTP requests total 2024-07-30 16:24:49 +02:00
c457069367 ensure a test is 100% not flaky 2024-07-30 15:41:51 +02:00
bb1283222e make clippy happy 2024-07-30 15:10:56 +02:00
7a5a38f870 fix a sync issue on empty indexes 2024-07-30 15:09:12 +02:00
ded3cd0dd6 an additionnal 30% of perf for the tests 2024-07-30 15:03:54 +02:00
68f885f1c4 fix two snapshots 2024-07-30 14:45:59 +02:00
9372c34dab prepare the tests to share indexes with api key 2024-07-30 14:34:11 +02:00
6666c57880 reduce the number of thread spawned by milli 2024-07-30 14:34:10 +02:00
b53a019b07 fix the initialization problem over the shared indexes with documents 2024-07-30 14:24:57 +02:00
d262b1df32 craft an API over the Shared Server and Shared index to avoid hard to debug mistakes 2024-07-30 14:24:57 +02:00
ed795bc837 fmt 2024-07-30 14:24:57 +02:00
993264227d reuse an index with already indexed documents instead of reindexing from scratch 2024-07-30 14:24:57 +02:00
953d3a44bd make the new_shared function synchronous and stop indexing documents when it's not required 2024-07-30 14:24:57 +02:00
e5345fb0eb shave off 15s by providing a shared instance to the integration tests 2024-07-30 14:24:55 +02:00
2d9a055fb9 stops snapshotting in advance_till when we don't need to 2024-07-30 13:57:12 +02:00
110dc01f40 only check the consistency of the index-scheduler on snapshots when running in release mode 2024-07-30 13:57:12 +02:00
abe128476f Merge #4830
4830: Use the dtolnay's Rust Toolchain r=dureuill a=Kerollmops

Fixes the CI by using another rust-toolchain GitHub repo.

Note: the [helix-editor/rust-toolchain repository](https://github.com/helix-editor/rust-toolchain) has been deleted so we moved to the [dtolnay/rust-toolchain](https://github.com/dtolnay/rust-toolchain) one. However, the dtolnay's one doesn't support `rust-toolchain.toml` and the version is directly in the rust-toolchain@version. We keep the `rust-toolchain.toml` for local builds only.

Co-authored-by: Clément Renault <clement@meilisearch.com>
v1.10.0-rc.0
2024-07-29 08:33:59 +00:00
a663e408ad Move to the right rust toolchain version 2024-07-29 10:06:34 +02:00
986991277f Use the dtolnay rust toolchain 2024-07-29 10:00:40 +02:00
c2c1ba39ee Merge #4826
4826: Update Charabia v0.9.0 r=dureuill a=ManyTheFish

# Pull Request

## Related Changelog
https://github.com/meilisearch/charabia/releases/tag/v0.9.0

## Notable Change for Meilisearch
Adds all math symbols from https://www.compart.com/en/unicode/category/Sm to the default separator list.



Co-authored-by: ManyTheFish <many@meilisearch.com>
2024-07-25 14:08:38 +00:00
35567b2137 Update Charabia v0.9.0 2024-07-25 16:02:14 +02:00
00c97c7152 Merge #4818
4818: Custom headers and QoL improvements r=ManyTheFish a=dureuill

# Pull Request

## Related issue
Fixes #4734 
Depends on #4815 

## What does this PR do?
- Adds custom headers for rest embedders ([public usage](https://meilisearch.notion.site/v1-10-AI-search-changes-737c9d7d010d4dd685582bf5dab579e2#41354652885242c899def07e36a66d49))
- Quality of life: allow specifying `dimensions` for `ollama` embedders ([public usage](https://meilisearch.notion.site/v1-10-AI-search-changes-737c9d7d010d4dd685582bf5dab579e2#37218531431343dab3d2d3a9a1937e9d)). As for `rest` embedders, specifying `dimensions` disables the "test" embedding when the embedder is spawned.
- Improve error message again when indexing documents that don't have a vector for a user-provided vector
  1. Remove the contents of the document
  2. Display the docid of the first document that triggered the error
  3. Indicate how many documents in that chunk suffered from the same issue for that embedder


Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-07-25 13:33:11 +00:00
d4ea7cc2a9 fix clippy 👉👈 2024-07-25 12:10:32 +02:00
8532fe8afc Fix tests 2024-07-25 12:10:32 +02:00
2413592bbf Display docid when there are documents without manual embeddings for a manual embedder 2024-07-25 12:10:32 +02:00
553440632e Introduce Setting::some_or_not_set 2024-07-25 12:01:52 +02:00
7a347966da Allow explicit dimensions for ollama 2024-07-25 12:01:51 +02:00
6c598fa06d test custom headers 2024-07-25 12:01:51 +02:00
8338df0dbe Fix tests 2024-07-25 12:01:51 +02:00
4654d51e05 Add custom headers for REST embedder 2024-07-25 12:01:51 +02:00
22ef2d877f Ensure test server has a single indexing thread 2024-07-25 12:01:51 +02:00
76bc2c18e8 Merge #4819
4819: Language settings r=dureuill a=ManyTheFish

# Pull Request

## Related issue
Fixes #4749 

## What does this PR do?
- [Implement localized search](c0c6955c0d)
- [Implement localized attributes settings](bde827b055)

## Related PRD

- [PRD](https://www.notion.so/meilisearch/Define-language-settings-to-impact-relevancy-bee62e18b7584c4f87d18a7654855329)
- [Public usage](https://www.notion.so/meilisearch/v1-10-Language-settings-usage-26c5d98b553349d9abacbe7aff698e4e)


Co-authored-by: ManyTheFish <many@meilisearch.com>
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-07-25 09:00:33 +00:00
59115fd058 Fix tests 2024-07-25 10:52:57 +02:00
a918561ac1 Fix PR comments 2024-07-25 10:52:56 +02:00
70d71581ee fix clippy 2024-07-25 10:52:56 +02:00
4fbe048cbf Update Cargo.lock 2024-07-25 10:52:56 +02:00
e06fbcc607 Update snapshots 2024-07-25 10:52:56 +02:00
04fa44e7eb Implement localized attributes settings 2024-07-25 10:51:27 +02:00
90c0a6db7d Implement localized search 2024-07-25 10:51:27 +02:00
d82f8fd904 Add tests 2024-07-25 10:51:27 +02:00
cc02920f2b Update charabia 2024-07-25 10:51:27 +02:00
c26bd68de5 Merge #4815
4815: Rest embedder api mk2 r=ManyTheFish a=dureuill

# Pull Request

## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/4756

- [x] [REST API parameter names and behavior are unclear](https://github.com/meilisearch/documentation/pull/2824#issuecomment-2124073720)
  - unclear names are removed. There remain only two parameters: `request`, a template of what Meilisearch's request to the embedding server should be, and `response`, a template of what the embedding server's response to Meilisearch should look like
- [x] [Bad error message or bad default value when we don't specify the `query` parameter](85d8455c11/meilisearch/tests/vector/rest.rs (L105-L140))
  - The replacement for `query`, which is `request`, is now a mandatory parameter. Omitting it will result in the following error message : "`.embedders.rest`: Missing field `request` (note: this field is mandatory for source rest)", which is clear
- [x] [Bad error message when both `pathToEmbeddings` and `embeddingObject` are missing](2141cb3b69/meilisearch/tests/vector/rest.rs (L142-L178))
  - These parameters no longer exist. Now, the point of extraction is given directly by the location of an `{{embedding}}` placeholder in the `response` parameter.
- [x] [Unexpected error when we don't specify both `pathToEmbeddings` and `embeddingObject` (only once should be required)](2141cb3b69/meilisearch/tests/vector/rest.rs (L180-L260))
  - These parameters no longer exist. Now, the point of extraction is given directly by the location of an `{{embedding}}` placeholder in the `response` parameter.
- [x] [Should not panic when the dimensions specified do not work with the model](2141cb3b69/meilisearch/tests/vector/rest.rs (L262-L299))
  - This no longer panics, instead returns "While embedding documents for embedder `rest`: runtime error: was expecting embeddings of dimension `2`, got embeddings of dimensions `3`"
- [x] [Be more flexible on the type of data that is accepted](https://github.com/meilisearch/meilisearch/issues/4757#issuecomment-2201948531)
  - [x] Always accept arrays of embeddings even if `inputType` is set to `text`
    - This is controlled by the repeat placeholder `"{..}"`, an array of embeddings can be configured even if the input is not in an array.
  - [x] Accept arrays of result at the root level and texts/array of text at the root level.
    -  doable with `request: "{{text}}"` and `response: "{{embedding}}"` or `response: ["{{embedding}}"]` (see test `vector::rest::server_raw`)

## What does this PR do?
- [See public usage](https://meilisearch.notion.site/v1-10-AI-search-changes-737c9d7d010d4dd685582bf5dab579e2#8de842673ffa4a139210094a89c1ec3e)
- Add new `milli::vector::json_template` module to parse JSON templates with an injection placeholder and a repeat placeholder
- Change rest embedder to use two JSON templates
- Change ollama and openai embedders to use the new rest embedder
- Update settings
- Update and add tests

## Breaking change

> [!CAUTION]
> This PR is a breaking change to the REST embedder.
> Importing a dump containing a REST embedder configuration will fail in v1.10 with an error: "Error: unknown field `query`, expected one of `source`, `model`, `revision`, `apiKey`, `dimensions`, `documentTemplate`, `url`, `request`, `response`, `distribution` at line 1 column 752".

Upgrade procedure:

1. Remove any embedder with source "rest"
2. Create a dump
3. Import that dump in a v1.10
4. Re-add any removed embedder, using the new settings.

Co-authored-by: Louis Dureuil <louis@meilisearch.com>
Co-authored-by: Louis Dureuil <louis.dureuil@xinra.net>
Co-authored-by: Tamo <tamo@meilisearch.com>
2024-07-24 16:32:52 +00:00
80fdea9afc Merge pull request #4823 from meilisearch/explicit-check-bench
Explicitly check permissions when receiving a slash command
2024-07-24 17:34:07 +02:00
e3faacd160 Explicitly check permissions when receiving a slash command 2024-07-24 17:09:25 +02:00
988552e178 add tests on the rest embedder 2024-07-24 14:34:17 +02:00
0d8199f3b7 Change parameters in milli settings 2024-07-24 14:34:17 +02:00
4b74803dae Change parameters in vector settings 2024-07-24 14:34:17 +02:00
d731fa661b ollama and openai use new EmbedderOptions 2024-07-24 14:34:17 +02:00