Commit Graph

9404 Commits

Author SHA1 Message Date
meili-bors[bot]
d4cb0a885b Merge #4275
4275: Flatten settings r=dureuill a=dureuill

# Pull Request

## Related issue
Initial internal feedback seems to indicate that the current shape of the `embedders` setting is undesirable: it has too much depth.

This PR changes this by flattening the structure of the embedders to the following:

```json5
// NEW structure
"embedders": {
  // still starts with the embedder name
  "default": {
    "source": "huggingFace", // now a string
    // properties of the source are all at the same level as the source
    "model": "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2",
    "revision": "a9c555277f9bcf24f28fa5e092e665fc6f7c49cd",
    "documentTemplate": "A product titled '{{doc.title}}'" // now a string
  }
}
```

By comparison, the old structure was:

```json5
// PREVIOUS version, no longer working with this PR
"embedders": {
  // still starts with the embedder name
  "default": {
    "source": {
      "huggingFace": {
        "model": "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2",
        "revision": "a9c555277f9bcf24f28fa5e092e665fc6f7c49cd"
      },
    "documentTemplate": { 
      "template": "A product titled '{{doc.title}}'" // now a string
    }
  }
}
```

The fields that are accepted in the new version of the `embedders` setting are depending on the value of the `source` field:

```json5
// huggingFace
"embedders": {
   "default": {
    "source": "huggingFace",
    "model": "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2",
    "revision": "a9c555277f9bcf24f28fa5e092e665fc6f7c49cd",
    "documentTemplate": "A product titled '{{doc.title}}'"
  }
}

// openAi
"embedders": {
   "default": {
    "source": "openAi",
    "model": "text-embedding-ada-002",
    "apiKey": "open_ai_api_key",
    "documentTemplate": "A product titled '{{doc.title}}'"
  }
}

// userProvided
"embedders": {
   "default": {
    "source": "userProvided",
    "dimensions": 42, // mandatory
  }
}
```

## What does this PR do?
- Flatten the settings structure
- Validate the prompt earlier to return a synchronous error on setting change rather than in the failing task
- Make it an error to pass a field for the wrong source (see above for allowed fields for each source)
- Not changed: It is still an error not to pass `dimensions` to the `userProvided` embedder
- If `source` was specified in the settings, validate the setting early to return a synchronous error in case of a missing mandatory field for the userProvided source (dimensions) or a forbidden field for the specified source.
- If `source` was not specified in the settings, still validate the setting, but only at indexing time, by using the source stored in the DB.
- Resets all values if the source changes, even if the user did not reset them explicitly.

## PR checklist
Please check if your PR fulfills the following requirements:
- [ ] Change the public facing guide for using the API
- [ ] Change examples of use in the changelog


Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2023-12-21 09:58:01 +00:00
Morgane Dubus
f52dee2b3b Update Cargo.toml
Update mini-dashboard with v0.2.12
2023-12-21 09:53:13 +01:00
Louis Dureuil
0bf879fb88 Fix warning on rust stable 2023-12-20 17:48:09 +01:00
Louis Dureuil
6ff81de401 Fix tests 2023-12-20 17:16:46 +01:00
Louis Dureuil
2e4c9651df Validate settings in route 2023-12-20 17:16:46 +01:00
Louis Dureuil
ec9649c922 Add function to validate settings in Meilisearch, to be used in the routes 2023-12-20 17:16:46 +01:00
Louis Dureuil
9123370e90 Validate fused settings in settings task after fusing with existing setting 2023-12-20 17:16:46 +01:00
Louis Dureuil
14b396d302 Add new errors 2023-12-20 17:16:45 +01:00
Louis Dureuil
393216bf30 Flatten embedders settings 2023-12-20 17:16:43 +01:00
Louis Dureuil
e249e4db7b Change Setting::apply function signature 2023-12-20 17:15:24 +01:00
meili-bors[bot]
de2ca7006e Merge #4272
4272: Don't pass default revision when the model is explicitly set in config r=Kerollmops a=dureuill

# Pull Request

## Related issue
Fixes #4271 

## What does this PR do?

- When the `model` is explicitly set in the `embedders` setting, we reset the `revision` to `None`, such that if the user doesn't specify a revision, the head of the model repository is chosen. 
- Not changed: If the user specifies a revision, it applies, like previously. 
- Not changed: If the user doesn't specify a model, the default model with the default revision applies, like previously.

## Manual testing on a fresh DB

1. Enable experimental feature:
```sh
curl \
  -X PATCH 'http://localhost:7700/experimental-features/' \
  -H 'Content-Type: application/json' -H 'Authorization: Bearer foo' \
--data-binary '{ "vectorStore": true
  }'
```
2. Send settings with a specified model but no specified revision:
```sh
curl \
-X PATCH 'http://localhost:7700/indexes/products/settings' \
-H 'Content-Type: application/json' --data-binary \
'{ "embedders": { "default": { "source": { "huggingFace": { "model": "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2" } }, "documentTemplate": { "template": "A product titled '{{doc.title}}'"} } } }'
```
3. Check that the task was successful:
```sh
curl 'http://localhost:7700/tasks/0'

{"uid":0,"indexUid":"products","status":"succeeded","type":"settingsUpdate","canceledBy":null,"details":{"embedders":{"default":{"source":{"huggingFace":{"model":"sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2"}},"documentTemplate":{"template":"A product titled {{doc.title}}"}}}},"error":null,"duration":"PT0.001892S","enqueuedAt":"2023-12-20T09:17:01.73789Z","startedAt":"2023-12-20T09:17:01.73854Z","finishedAt":"2023-12-20T09:17:01.740432Z"}
```
4. Send documents to index:
```sh
curl 'https://localhost:7700/indexes/products/documents' -H 'Content-Type: application/json' --data-binary '{"id": 0, "title": "Best product"}'
```

Co-authored-by: Louis Dureuil <louis@meilisearch.com>
v1.6.0-rc.2
2023-12-20 14:27:51 +00:00
Louis Dureuil
333ce12eb2 Fixed issue where the default revision is always the one we picked for the default model 2023-12-20 10:17:49 +01:00
meili-bors[bot]
fb9db1eba6 Merge #4269
4269: Remove dependency that requires libstdc++ r=dureuill a=dureuill

Removes the dependency that caused the additional runtime dependency on libstdc++ by disabling the default features of the hf tokenizer.

## Discussion

- This removes a feature that is using a C++ dependency and is supposed to accelerate the tokenizer. As the tokenizer is likely to be a significant bottleneck for embedding texts using a HF model, this is an issue.
- We should at least rerun the movies vector indexing and check that it still works correctly and that it has a runtime in the ballpark of what it used to be.

Co-authored-by: Louis Dureuil <louis.dureuil@xinra.net>
v1.6.0-rc.1
2023-12-19 12:26:48 +00:00
Clément Renault
fa2b96b9a5 Add an Authorization Header along with the webhook calls prototype-task-queue-webhook-1 2023-12-19 12:18:45 +01:00
Tamo
19736cefe8 add the analytics 2023-12-19 10:36:04 +01:00
Tamo
4fb25b8782 fix clippy 2023-12-19 10:35:51 +01:00
Tamo
c83a33017e stream and chunk the data 2023-12-19 10:35:51 +01:00
Tamo
be72326c0a gzip the tasks 2023-12-19 10:35:51 +01:00
Tamo
547379abb0 parse the url correctly 2023-12-19 10:35:51 +01:00
Tamo
0b2fff27f2 update and fix the test 2023-12-19 10:35:51 +01:00
Tamo
3adbc2b942 return a task view instead of a task 2023-12-19 10:35:51 +01:00
Tamo
fbea721378 add a first working test with actixweb 2023-12-19 10:35:51 +01:00
Tamo
391eb72137 start writing a test with actix but it doesn't works 2023-12-19 10:35:50 +01:00
Tamo
d78ad51082 Implement the webhook 2023-12-19 10:35:50 +01:00
Tamo
1956045a06 add the option 2023-12-19 10:23:56 +01:00
Louis Dureuil
b2193e612f Revert "Add libstdc++ in Dockerfile" as it is no longer needed
This reverts commit 9df8cfc013.
2023-12-18 22:17:29 +01:00
Louis Dureuil
942d49314c Remove dependency that requires libstdc++ 2023-12-18 22:17:18 +01:00
meili-bors[bot]
9a846e82bc Merge #4268
4268: Add libstdc++ in Dockerfile r=curquiza a=sanders41

# Pull Request

## Related issue
Fixes #4267

## What does this PR do?
- Add libstdc++ in the Dockerfile

## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!


Co-authored-by: Paul Sanders <psanders1@gmail.com>
2023-12-18 18:35:53 +00:00
Paul Sanders
9df8cfc013 Add libstdc++ in Dockerfile 2023-12-18 13:05:46 -05:00
dependabot[bot]
d868131bb7 Bump rustls-webpki from 0.101.3 to 0.101.7
Bumps [rustls-webpki](https://github.com/rustls/webpki) from 0.101.3 to 0.101.7.
- [Release notes](https://github.com/rustls/webpki/releases)
- [Commits](https://github.com/rustls/webpki/compare/v/0.101.3...v/0.101.7)

---
updated-dependencies:
- dependency-name: rustls-webpki
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-12-18 14:57:38 +00:00
meili-bors[bot]
248aaa6d45 Merge #4262
4262: Update version for the next release (v1.6.0) in Cargo.toml r=curquiza a=meili-bot

⚠️ This PR is automatically generated. Check the new version is the expected one and Cargo.lock has been updated before merging.

Co-authored-by: curquiza <curquiza@users.noreply.github.com>
v1.6.0-rc.0
2023-12-18 14:00:19 +00:00
curquiza
50d6317ec0 Update version for the next release (v1.6.0) in Cargo.toml 2023-12-18 13:57:46 +00:00
meili-bors[bot]
b734bd9891 Merge #4261
4261: Set rust toolchain to 1.71.1 in dockerfile r=curquiza a=dureuill

Fixes docker [CI](https://github.com/meilisearch/meilisearch/actions/workflows/publish-docker-images.yml)

Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2023-12-18 12:32:26 +00:00
Louis Dureuil
9800d5a103 Set rust toolchain to 1.71.1 in dockerfile 2023-12-18 10:59:25 +01:00
meili-bors[bot]
7c4ed07617 Merge #4257
4257: Change proximity precision settings r=dureuill a=ManyTheFish

- [x] Add proximity_precision value into the analytics
- [x] Change the naming of `attributeScale` and `wordScale` into `byAttribute` and `byWord`
- [x] Remove proximityPrecision from the experimental feature

Co-authored-by: ManyTheFish <many@meilisearch.com>
Co-authored-by: Many the fish <many@meilisearch.com>
2023-12-18 09:07:28 +00:00
ManyTheFish
3a99a555a2 Fix experimental features snapshots in tests 2023-12-18 10:05:51 +01:00
Many the fish
9e1b458010 Merge branch 'main' into change-proximity-precision-settings 2023-12-18 09:08:47 +01:00
meili-bors[bot]
2aede03bc2 Merge #4226
4226: Hybrid search r=dureuill a=dureuill

Allows to perform hybrid search requests that combine the results of semantic and keyword search and automatically generate embeddings.

## How to use

See [feature description](https://meilisearch.notion.site/v1-6-Hybrid-Search-Embedders-ea42c82f90cc4bc0be1eeb917c1118c8)

## Changes

- work is based on #4213 
- milli::new search now takes an input universe directly, rather than computing it from a filter. This adds flexibility to require results on a subset of documents
- vector search is now a regular ranking rule (akin to sort and geosort) and reports its score as a ScoreDetail
- separate keyword search and vector search functions, vector search now respects (geo)sort ranking rules
- add automatic embedding
- add hybrid search

Co-authored-by: Louis Dureuil <louis@meilisearch.com>
Co-authored-by: ManyTheFish <many@meilisearch.com>
2023-12-14 16:24:56 +00:00
ManyTheFish
e741bc1c62 Add proximity_precision value into the analytics 2023-12-14 16:48:06 +01:00
ManyTheFish
6425996e36 Change the naming of attributeScale and wordScale into byAttribute and byWord 2023-12-14 16:31:00 +01:00
Louis Dureuil
eb5cb91da2 Switch default from hf to openai 2023-12-14 16:19:46 +01:00
Louis Dureuil
87bba98bd8 Various changes
- fixed seed for arroy
- check vector dimensions as soon as it is provided to search
- don't embed whitespace
2023-12-14 16:08:42 +01:00
Louis Dureuil
217105b7da hybrid search uses semantic ratio, error handling 2023-12-14 16:08:42 +01:00
ManyTheFish
1b7c164a55 Pass the semantic ratio to milli 2023-12-14 16:08:42 +01:00
ManyTheFish
f3f3944469 Fix error checking 2023-12-14 16:08:42 +01:00
ManyTheFish
93dcbf598d Deserialize semantic ratio 2023-12-14 16:08:42 +01:00
ManyTheFish
ac68f33194 Add simple test 2023-12-14 16:08:42 +01:00
ManyTheFish
9991152bbe Add TODOs 2023-12-14 16:08:42 +01:00
Louis Dureuil
a4536b1381 Small adjustments to respect the spec 2023-12-14 16:08:42 +01:00
Louis Dureuil
5b51cb04af Remove some settings 2023-12-14 16:08:42 +01:00