9799 Commits

Author SHA1 Message Date
Louis Dureuil
2f10273d14 Group by normalized values, make sure you don't remove a value where there remains at least one value that normalizes towards it 2024-08-08 14:02:53 +02:00
meili-bors[bot]
b44e17c4c3 Merge #4858
4858: also intersect the universe for searchOnAttributes r=irevoire a=dureuill

# Pull Request

## Related issue
Fixes #4857 

## What does this PR do?
- intersect with the universe (which does not contain the filtered out ids) when looking up documents for words, even when using `searchOnAttributes`
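
The fix described above can be pictured with a small sketch. It is not the engine's code: `DocIds`, `word_docids_on_attributes` and `candidates` are hypothetical names, and a `BTreeSet<u32>` stands in for whatever bitmap type the engine uses. The point is only that the ids found for a word on the searched attributes are intersected with the universe before being used.

```rust
use std::collections::BTreeSet;

/// Sorted document ids. Stand-in for the engine's bitmap type (assumption).
type DocIds = BTreeSet<u32>;

/// Hypothetical helper: ids of documents containing a word, restricted to the
/// attributes the query is allowed to search on.
fn word_docids_on_attributes(per_attribute: &[(&str, DocIds)], searchable: &[&str]) -> DocIds {
    per_attribute
        .iter()
        .filter(|(attr, _)| searchable.contains(attr))
        .flat_map(|(_, ids)| ids.iter().copied())
        .collect()
}

/// Intersect with the universe so that filtered-out ids are never returned,
/// even on the attribute-restricted path.
fn candidates(per_attribute: &[(&str, DocIds)], searchable: &[&str], universe: &DocIds) -> DocIds {
    let found = word_docids_on_attributes(per_attribute, searchable);
    found.intersection(universe).copied().collect()
}
```

With a word present in documents 1, 2 and 3 on `title` and a universe of 2, 3 and 4, `candidates` keeps only 2 and 3: document 1 was filtered out and must not reappear.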


Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-08-07 13:15:26 +00:00
Louis Dureuil
e3ef0ae19e also intersect the universe for searchOnAttributes 2024-08-06 14:06:56 +02:00
meili-bors[bot]
57f7af77c7 Merge #4846
4846: Add OpenAI tests r=dureuill a=dureuill

# Pull Request

## Related issue
Part of fixing #4757 

## What does this PR do?
- OpenAI embedder: don't pass apiKey when it is empty (slightly improves error messages)
- rest embedder and rest-based embedders: specialize the authorization denied error message depending on the configuration source
- fix existing tests
- Adds assets containing prerecorded texts to embed and the embeddings obtained from OpenAI
- Adds an asset containing a tokenized long document and the embedding obtained from OpenAI for this token
- Uses the wiremock crate to mock the OpenAI API: parse the openai request, look up the response in assets, craft an openai response


Co-authored-by: Louis Dureuil <louis@meilisearch.com>
v1.10.0-rc.2
2024-08-05 10:49:28 +00:00
meili-bors[bot]
c817718e07 Merge #4853
4853: Fix rhai deletion r=irevoire a=dureuill

# Pull Request

## Related issue
Fixes #4849 

## What does this PR do?
- insert inside of the bitmap instead of pushing into it.
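
Why the distinction matters can be shown with a toy model. The sketch assumes the bitmap's `push` is an append-only operation that refuses ids not greater than the current maximum (which is how `RoaringBitmap::push` behaves in the `roaring` crate), whereas `insert` accepts ids in any order. `SortedIds` is a hypothetical stand-in, not the type used in the fix.

```rust
/// Toy sorted id set standing in for a compressed bitmap (assumption).
#[derive(Debug, Default, PartialEq)]
struct SortedIds(Vec<u32>);

impl SortedIds {
    /// Append-only fast path: stores `id` only when it is greater than the
    /// current maximum. Returns whether the id was stored.
    fn push(&mut self, id: u32) -> bool {
        match self.0.last() {
            Some(&max) if id <= max => false,
            _ => {
                self.0.push(id);
                true
            }
        }
    }

    /// General path: stores `id` at its sorted position.
    /// Returns false only when the id was already present.
    fn insert(&mut self, id: u32) -> bool {
        match self.0.binary_search(&id) {
            Ok(_) => false,
            Err(pos) => {
                self.0.insert(pos, id);
                true
            }
        }
    }
}
```

If ids arrive out of order, for example 5 then 2, the `push` path silently drops 2 while the `insert` path keeps both.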


Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-08-01 16:34:31 +00:00
Louis Dureuil
e64d0e0ca8 use insert instead of push for bitmaps 2024-08-01 18:32:45 +02:00
Louis Dureuil
21aa430b5e Fix openai tests 2024-07-31 17:57:55 +02:00
Louis Dureuil
8535dc0be2 Fix existing tests 2024-07-31 17:57:32 +02:00
Louis Dureuil
72b9005344 Redact uid for Value 2024-07-31 17:57:13 +02:00
meili-bors[bot]
420c33132c Merge #4850
4850: Use a fixed date format regardless of features r=irevoire a=dureuill

# Pull Request

## Related issue
Fixes #4844 

## What does this PR do?

Given the following script: 
```
cargo run -- --db-path meili.ms
sleep 3
curl -s -X POST http://127.0.0.1:7700/indexes -H 'Content-Type: application/json' --data-binary '{"uid": "movies", "primaryKey": "id"}'
sleep 3
cargo run -p meilisearch -- --db-path meili.ms
sleep 3
curl -s -X POST http://127.0.0.1:7700/indexes/movies/search -H 'Content-Type: application/json' --data-binary '{}'
```

- Before this PR, the final search returns a decoding error.
- After this PR, the search completes successfully

### Technical standpoint

This PR fixes two locations where the formatting of dates was dependent on the feature set of the `time` crate.

1. The `IndexStats` had two fields without the serialization format specified
2. More subtly, the index dates (`createdAt`, `updatedAt`) were using value remapping in the main DB to `SerdeJson<OffsetDateTime>`, which was using whatever default format was available. This was fixed by creating a local `OffsetDateTime` wrapper that specifies the serialization format.

Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-07-31 15:32:26 +00:00
Louis Dureuil
9ef710cad4 Use wrapper that forces the desired date format 2024-07-31 17:12:19 +02:00
Louis Dureuil
48f7329a83 Specify index_mapper on IndexStats 2024-07-31 17:11:28 +02:00
Louis Dureuil
ab1ec9ca21 Add tokenized test 2024-07-31 15:03:45 +02:00
Louis Dureuil
9d6efd92d2 new assets for tokenized test 2024-07-31 15:03:45 +02:00
Louis Dureuil
abdb337fd6 Add openai tests 2024-07-31 15:03:45 +02:00
Louis Dureuil
1c755c8899 Add openai responses 2024-07-31 15:03:45 +02:00
Louis Dureuil
3a42c3134e update tests after changing authorized error message 2024-07-31 15:03:45 +02:00
Louis Dureuil
5aa6cb3600 Specialize authorized error message depending on config source 2024-07-31 15:03:44 +02:00
Louis Dureuil
9b7764575b openai: don't pass apiKey when it is empty 2024-07-31 15:03:44 +02:00
meili-bors[bot]
25791e3f46 Merge #4836
4836: Attach declared localized-attributes subroutes r=dureuill a=dureuill

RC.0 unexpectedly doesn't contain the `GET /indexes/{indexUid}/localized-attributes` and `PUT /indexes/{indexUid}/localized-attributes` subroutes.

This PR makes them available.

Co-authored-by: Louis Dureuil <louis@meilisearch.com>
Co-authored-by: Tamo <tamo@meilisearch.com>
v1.10.0-rc.1
2024-07-30 19:01:54 +00:00
Tamo
b1b3a1a98b add a get, set and put test for the localized attributes setting 2024-07-30 15:51:02 +02:00
meili-bors[bot]
143d6cde10 Merge #4835
4835: Log error from main using tracing r=irevoire a=dureuill

Engine follow-up to https://github.com/meilisearch/meilisearch-support/issues/252#issuecomment-2251288276 (private link)

> `@meilisearch/engine-team` we need to open a PR to tracing::error! when an error occurs in the Meilisearch main. It would be nice to have it included in the second RC

<img width="1349" alt="Error logged when launching Meilisearch to import dump on path where the dump doesn't exist" src="https://github.com/user-attachments/assets/e5d2ae6e-f810-4029-9787-3b6ea9d47cfd">

---

<img width="1349" alt="Error logged when launching Meilisearch with a db path that is not writeable" src="https://github.com/user-attachments/assets/f672d78d-04b0-4d02-9402-259eaa6e2b62">



Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-07-30 13:43:50 +00:00
Louis Dureuil
9719dec443 Attach declared attributes-localized subroutes 2024-07-29 16:19:35 +02:00
Louis Dureuil
fa77a949aa Log error from main using tracing 2024-07-29 14:58:39 +02:00
meili-bors[bot]
abe128476f Merge #4830
4830: Use the dtolnay's Rust Toolchain r=dureuill a=Kerollmops

Fixes the CI by using another rust-toolchain GitHub repo.

Note: the [helix-editor/rust-toolchain repository](https://github.com/helix-editor/rust-toolchain) has been deleted, so we moved to the [dtolnay/rust-toolchain](https://github.com/dtolnay/rust-toolchain) one. However, the dtolnay one doesn't support `rust-toolchain.toml`, and the version is set directly in the `rust-toolchain@version` reference. We keep the `rust-toolchain.toml` for local builds only.

Co-authored-by: Clément Renault <clement@meilisearch.com>
v1.10.0-rc.0
2024-07-29 08:33:59 +00:00
Clément Renault
a663e408ad Move to the right rust toolchain version 2024-07-29 10:06:34 +02:00
Clément Renault
986991277f Use the dtolnay rust toolchain 2024-07-29 10:00:40 +02:00
meili-bors[bot]
c2c1ba39ee Merge #4826
4826: Update Charabia v0.9.0 r=dureuill a=ManyTheFish

# Pull Request

## Related Changelog
https://github.com/meilisearch/charabia/releases/tag/v0.9.0

## Notable Change for Meilisearch
Adds all math symbols from https://www.compart.com/en/unicode/category/Sm to the default separator list.



Co-authored-by: ManyTheFish <many@meilisearch.com>
2024-07-25 14:08:38 +00:00
ManyTheFish
35567b2137 Update Charabia v0.9.0 2024-07-25 16:02:14 +02:00
meili-bors[bot]
00c97c7152 Merge #4818
4818: Custom headers and QoL improvements r=ManyTheFish a=dureuill

# Pull Request

## Related issue
Fixes #4734 
Depends on #4815 

## What does this PR do?
- Adds custom headers for rest embedders ([public usage](https://meilisearch.notion.site/v1-10-AI-search-changes-737c9d7d010d4dd685582bf5dab579e2#41354652885242c899def07e36a66d49))
- Quality of life: allow specifying `dimensions` for `ollama` embedders ([public usage](https://meilisearch.notion.site/v1-10-AI-search-changes-737c9d7d010d4dd685582bf5dab579e2#37218531431343dab3d2d3a9a1937e9d)). As for `rest` embedders, specifying `dimensions` disables the "test" embedding when the embedder is spawned.
- Improve error message again when indexing documents that don't have a vector for a user-provided vector
  1. Remove the contents of the document
  2. Display the docid of the first document that triggered the error
  3. Indicate how many documents in that chunk suffered from the same issue for that embedder
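
The three points above amount to summarizing a chunk instead of dumping a document. A minimal sketch of that idea follows. The function name, its signature and the message wording are all hypothetical and do not reproduce the actual error text.

```rust
/// Hypothetical summary of the documents in one chunk that lack a
/// user-provided vector. Each entry is `(docid, has_vector)`.
/// Returns `None` when every document has a vector.
fn missing_vectors_error(embedder: &str, chunk: &[(&str, bool)]) -> Option<String> {
    // Iterate over the ids of the documents that lack a vector, in chunk order.
    let mut missing = chunk
        .iter()
        .filter(|(_, has_vector)| !*has_vector)
        .map(|(docid, _)| *docid);
    // Only the first offending docid is displayed; the document body is not.
    let first = missing.next()?;
    // The remaining offenders are only counted.
    let others = missing.count();
    Some(format!(
        "document `{first}` has no vector for user-provided embedder `{embedder}` \
         ({others} other document(s) in this chunk have the same issue)"
    ))
}
```

Reporting one docid plus a count keeps the message short even when a whole chunk is affected.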


Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-07-25 13:33:11 +00:00
Louis Dureuil
d4ea7cc2a9 fix clippy 👉👈 2024-07-25 12:10:32 +02:00
Louis Dureuil
8532fe8afc Fix tests 2024-07-25 12:10:32 +02:00
Louis Dureuil
2413592bbf Display docid when there are documents without manual embeddings for a manual embedder 2024-07-25 12:10:32 +02:00
Louis Dureuil
553440632e Introduce Setting::some_or_not_set 2024-07-25 12:01:52 +02:00
Louis Dureuil
7a347966da Allow explicit dimensions for ollama 2024-07-25 12:01:51 +02:00
Louis Dureuil
6c598fa06d test custom headers 2024-07-25 12:01:51 +02:00
Louis Dureuil
8338df0dbe Fix tests 2024-07-25 12:01:51 +02:00
Louis Dureuil
4654d51e05 Add custom headers for REST embedder 2024-07-25 12:01:51 +02:00
Louis Dureuil
22ef2d877f Ensure test server has a single indexing thread 2024-07-25 12:01:51 +02:00
meili-bors[bot]
76bc2c18e8 Merge #4819
4819: Language settings r=dureuill a=ManyTheFish

# Pull Request

## Related issue
Fixes #4749 

## What does this PR do?
- [Implement localized search](c0c6955c0d)
- [Implement localized attributes settings](bde827b055)

## Related PRD

- [PRD](https://www.notion.so/meilisearch/Define-language-settings-to-impact-relevancy-bee62e18b7584c4f87d18a7654855329)
- [Public usage](https://www.notion.so/meilisearch/v1-10-Language-settings-usage-26c5d98b553349d9abacbe7aff698e4e)


Co-authored-by: ManyTheFish <many@meilisearch.com>
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-07-25 09:00:33 +00:00
Louis Dureuil
59115fd058 Fix tests 2024-07-25 10:52:57 +02:00
ManyTheFish
a918561ac1 Fix PR comments 2024-07-25 10:52:56 +02:00
ManyTheFish
70d71581ee fix clippy 2024-07-25 10:52:56 +02:00
ManyTheFish
4fbe048cbf Update Cargo.lock 2024-07-25 10:52:56 +02:00
ManyTheFish
e06fbcc607 Update snapshots 2024-07-25 10:52:56 +02:00
ManyTheFish
04fa44e7eb Implement localized attributes settings 2024-07-25 10:51:27 +02:00
ManyTheFish
90c0a6db7d Implement localized search 2024-07-25 10:51:27 +02:00
ManyTheFish
d82f8fd904 Add tests 2024-07-25 10:51:27 +02:00
ManyTheFish
cc02920f2b Update charabia 2024-07-25 10:51:27 +02:00
meili-bors[bot]
c26bd68de5 Merge #4815
4815: Rest embedder api mk2 r=ManyTheFish a=dureuill

# Pull Request

## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/4756

- [x] [REST API parameter names and behavior are unclear](https://github.com/meilisearch/documentation/pull/2824#issuecomment-2124073720)
  - unclear names are removed. There remain only two parameters: `request`, a template of what Meilisearch's request to the embedding server should be, and `response`, a template of what the embedding server's response to Meilisearch should look like
- [x] [Bad error message or bad default value when we don't specify the `query` parameter](85d8455c11/meilisearch/tests/vector/rest.rs (L105-L140))
  - The replacement for `query`, which is `request`, is now a mandatory parameter. Omitting it will result in the following error message: "`.embedders.rest`: Missing field `request` (note: this field is mandatory for source rest)", which is clear
- [x] [Bad error message when both `pathToEmbeddings` and `embeddingObject` are missing](2141cb3b69/meilisearch/tests/vector/rest.rs (L142-L178))
  - These parameters no longer exist. Now, the point of extraction is given directly by the location of an `{{embedding}}` placeholder in the `response` parameter.
- [x] [Unexpected error when we don't specify both `pathToEmbeddings` and `embeddingObject` (only one should be required)](2141cb3b69/meilisearch/tests/vector/rest.rs (L180-L260))
  - These parameters no longer exist. Now, the point of extraction is given directly by the location of an `{{embedding}}` placeholder in the `response` parameter.
- [x] [Should not panic when the dimensions specified do not work with the model](2141cb3b69/meilisearch/tests/vector/rest.rs (L262-L299))
  - This no longer panics, instead returns "While embedding documents for embedder `rest`: runtime error: was expecting embeddings of dimension `2`, got embeddings of dimensions `3`"
- [x] [Be more flexible on the type of data that is accepted](https://github.com/meilisearch/meilisearch/issues/4757#issuecomment-2201948531)
  - [x] Always accept arrays of embeddings even if `inputType` is set to `text`
    - This is controlled by the repeat placeholder `"{..}"`: an array of embeddings can be configured even if the input is not in an array.
  - [x] Accept arrays of results at the root level and texts/arrays of texts at the root level.
    - doable with `request: "{{text}}"` and `response: "{{embedding}}"` or `response: ["{{embedding}}"]` (see test `vector::rest::server_raw`)

## What does this PR do?
- [See public usage](https://meilisearch.notion.site/v1-10-AI-search-changes-737c9d7d010d4dd685582bf5dab579e2#8de842673ffa4a139210094a89c1ec3e)
- Add new `milli::vector::json_template` module to parse JSON templates with an injection placeholder and a repeat placeholder
- Change rest embedder to use two JSON templates
- Change ollama and openai embedders to use the new rest embedder
- Update settings
- Update and add tests
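
To illustrate how the location of an `{{embedding}}` placeholder in the `response` template can serve as the point of extraction, here is a minimal sketch. It is not the `milli::vector::json_template` implementation: `Value`, `Step`, `placeholder_path` and `extract` are hypothetical names, no JSON parser is included, and the repeat placeholder is ignored.

```rust
/// Minimal JSON-like value, just enough for the sketch (assumption).
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Num(f64),
    Str(String),
    Arr(Vec<Value>),
    Obj(Vec<(String, Value)>),
}

/// One step of a path into a value: an object key or an array index.
#[derive(Debug, Clone, PartialEq)]
enum Step {
    Key(String),
    Index(usize),
}

/// Depth-first search for the `{{embedding}}` placeholder in the template.
/// Returns the path from the root to the placeholder, if there is one.
fn placeholder_path(template: &Value) -> Option<Vec<Step>> {
    match template {
        Value::Str(s) if s == "{{embedding}}" => Some(Vec::new()),
        Value::Arr(items) => items.iter().enumerate().find_map(|(i, item)| {
            let mut path = placeholder_path(item)?;
            path.insert(0, Step::Index(i));
            Some(path)
        }),
        Value::Obj(fields) => fields.iter().find_map(|(key, item)| {
            let mut path = placeholder_path(item)?;
            path.insert(0, Step::Key(key.clone()));
            Some(path)
        }),
        _ => None,
    }
}

/// Follow the path in an actual response and return the value found there,
/// or `None` when the response does not have the shape of the template.
fn extract<'a>(response: &'a Value, path: &[Step]) -> Option<&'a Value> {
    path.iter().try_fold(response, |current, step| match (current, step) {
        (Value::Obj(fields), Step::Key(key)) => {
            fields.iter().find(|(k, _)| k == key).map(|(_, v)| v)
        }
        (Value::Arr(items), Step::Index(i)) => items.get(*i),
        _ => None,
    })
}
```

For a template shaped like `{"data": [{"embedding": "{{embedding}}"}]}`, `placeholder_path` yields `data`, index `0`, `embedding`, and `extract` applied to a server response returns whatever sits at that location.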

## Breaking change

> [!CAUTION]
> This PR is a breaking change to the REST embedder.
> Importing a dump containing a REST embedder configuration will fail in v1.10 with an error: "Error: unknown field `query`, expected one of `source`, `model`, `revision`, `apiKey`, `dimensions`, `documentTemplate`, `url`, `request`, `response`, `distribution` at line 1 column 752".

Upgrade procedure:

1. Remove any embedder with source "rest"
2. Create a dump
3. Import that dump into a v1.10 instance
4. Re-add any removed embedder, using the new settings.

Co-authored-by: Louis Dureuil <louis@meilisearch.com>
Co-authored-by: Louis Dureuil <louis.dureuil@xinra.net>
Co-authored-by: Tamo <tamo@meilisearch.com>
2024-07-24 16:32:52 +00:00