Commit Graph

95 Commits

Author SHA1 Message Date
Clément Renault
0c57cf7565 Replace obkv with the temporary new version of it 2024-08-30 11:53:58 +02:00
Louis Dureuil
5aa6cb3600 Specialize authorized error message depending on config source 2024-07-31 15:03:44 +02:00
Louis Dureuil
9b7764575b openai: don't pass apiKey when it is empty 2024-07-31 15:03:44 +02:00
Louis Dureuil
553440632e Introduce Setting::some_or_not_set 2024-07-25 12:01:52 +02:00
Louis Dureuil
7a347966da Allow explicit dimensions for ollama 2024-07-25 12:01:51 +02:00
Louis Dureuil
4654d51e05 Add custom headers for REST embedder 2024-07-25 12:01:51 +02:00
Louis Dureuil
4b74803dae Change parameters in vector settings 2024-07-24 14:34:17 +02:00
Louis Dureuil
d731fa661b ollama and openai use new EmbedderOptions 2024-07-24 14:34:17 +02:00
Louis Dureuil
a1beddd5d9 rest embedder: use json_template 2024-07-24 14:34:17 +02:00
Louis Dureuil
4109182ca4 Add json_template module 2024-07-24 14:34:12 +02:00
Louis Dureuil
1a297c048e Error changes 2024-07-24 14:34:12 +02:00
Louis Dureuil
303e601b87 HuggingFace: Clearer error message when a model is not supported 2024-07-23 15:13:22 +02:00
Louis Dureuil
24240934f9 Improve errors when indexing documents with a user provided embedder 2024-07-16 13:39:01 +02:00
Louis Dureuil
f4c94ac57f manual embedders: limit max size of errors to 250 2024-07-16 13:39:01 +02:00
Louis Dureuil
4087a88dbe rest|ollama|openai: increase tries to 10 + randomize retry duration 2024-07-16 13:39:00 +02:00
Louis Dureuil
5adacf2f45 OpenAI: embed only the first MAX_TOKENS tokens 2024-07-16 13:39:00 +02:00
Louis Dureuil
65d0c32aa7 Allow overriding OpenAI's url 2024-07-16 13:39:00 +02:00
hanbings
0a40a98bb6 Make milli use edition 2021 (#4770)
* Make milli use edition 2021

* Add lifetime annotations to milli.

* Run cargo fmt
2024-07-09 17:25:39 +02:00
Tamo
ce08dc509b add more tests and improve the location of the error 2024-06-27 11:51:45 +02:00
Tamo
1daaed163a Make _vectors.:embedding.regenerate mandatory + tests + error messages 2024-06-27 11:04:58 +02:00
Louis Dureuil
e35ef31738 Small changes following review 2024-06-13 14:20:48 +02:00
Louis Dureuil
3bc8f81abc user_provided => regenerate 2024-06-12 18:12:20 +02:00
Louis Dureuil
d0b05ae691 Add EmbedderAction to settings 2024-06-12 14:50:54 +02:00
Louis Dureuil
e9bf4eb100 Reformulate ParsedVectorsDiff in terms of VectorState 2024-06-12 14:11:44 +02:00
Louis Dureuil
b368105272 Add EmbedderConfigs::into_inner 2024-06-12 14:11:44 +02:00
Tamo
31a793d226 fix the regeneration of the embeddings in the search 2024-06-06 11:39:29 +02:00
Tamo
d85ab23b82 rename all occurrences of user_defined to user_provided for consistency 2024-06-06 11:39:29 +02:00
Tamo
b7349910d9 implements more review comments 2024-06-06 11:39:29 +02:00
Tamo
b867829ef1 remove useless dbg 2024-06-06 11:39:29 +02:00
Tamo
5d50850e12 always push the user defined vectors in arroy 2024-06-06 11:39:29 +02:00
Tamo
04f6523f3c expose a new parameter to retrieve the embedders at search time 2024-06-06 11:36:11 +02:00
Tamo
84e498299b Remove the vectors from the documents database 2024-06-06 11:36:11 +02:00
Louis Dureuil
d35278320e Add support functions for accessing arroy writers and readers 2024-05-28 15:27:43 +02:00
Louis Dureuil
3412e7fbcf "[]" is deserialized as 0 embedding rather than 1 embedding of dim 0 2024-05-22 12:25:21 +02:00
Louis Dureuil
16037e2169 Don't remove embedders that are not in the config from the document DB 2024-05-22 12:24:51 +02:00
Louis Dureuil
b17cb56dee Test array of vectors 2024-05-20 14:44:10 +02:00
Louis Dureuil
52d9cb6e5a Refactor vector indexing
- use the parsed_vectors module
- only parse `_vectors` once per document, instead of once per embedder per document
2024-05-20 10:36:17 +02:00
Louis Dureuil
98c811247e Add parsed vectors module 2024-05-20 10:25:59 +02:00
Louis Dureuil
f4dd73ec8c Destructure EmbedderOptions so we don't miss some options 2024-05-02 15:39:36 +02:00
meili-bors[bot]
c793b6ef6d Merge #4600
4600: Fix embedders api r=ManyTheFish a=ManyTheFish

# Pull Request

## Related issue
Fixes #4594
Fixes #4595


Co-authored-by: ManyTheFish <many@meilisearch.com>
2024-04-25 13:16:33 +00:00
Clément Renault
d4aeff92d0 Introduce the ThreadPoolNoAbort wrapper 2024-04-24 16:40:12 +02:00
ManyTheFish
9b76501875 Display set API key for Ollama embedder 2024-04-24 12:33:07 +02:00
meili-bors[bot]
b1844b0c27 Merge #4548
4548: v1.8 hybrid search changes r=dureuill a=dureuill

Implements the search changes from the [usage page](https://meilisearch.notion.site/v1-8-AI-search-API-usage-135552d6e85a4a52bc7109be82aeca42#40f24df3da694428a39cc8043c9cfc64)

### ⚠️ Breaking changes in an experimental feature:

- Removed the `_semanticScore`. Use the `_rankingScore` instead.
- Removed `vector` in the response of the search (output was too big).
- Removed all the vectors from the `vectorSort` ranking score details
  - target vector appearing in the name of the rule
  - matched vector appearing in the details of the rule

### Other user-facing changes

- Added `semanticHitCount`, indicating how many hits were returned from the semantic search. This is especially useful in the hybrid search.
- Embed lazily: Meilisearch no longer generates an embedding when the keyword results are "good enough".
- Graceful embedding failure in hybrid search: when doing hybrid search (`semanticRatio in ]0.0, 1.0[`), an embedding failure no longer causes the search request to fail. Instead, only the keyword search is performed. When doing a full vector search (`semanticRatio==1.0`), a failure to embed will still result in failing that search.
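
The lazy-embedding and graceful-failure rules above can be sketched as a small decision function. This is only an illustration under stated assumptions, not Meilisearch's implementation: `SearchPlan`, `EmbedError`, `plan_search`, and the `keyword_good_enough` flag are hypothetical names, and how "good enough" is actually decided is not specified here.

```rust
/// Hypothetical description of which search ends up running.
#[derive(Debug, PartialEq)]
enum SearchPlan {
    KeywordOnly,
    Hybrid,
    SemanticOnly,
}

/// Hypothetical embedding failure carrying a message.
#[derive(Debug, PartialEq)]
struct EmbedError(String);

/// Decide what to run for a given `semantic_ratio`.
/// `embed` is a closure so the embedder is only called when it is needed.
/// `keyword_good_enough` stands in for an unspecified quality check (assumption).
fn plan_search<F>(
    semantic_ratio: f32,
    keyword_good_enough: bool,
    embed: F,
) -> Result<SearchPlan, EmbedError>
where
    F: FnOnce() -> Result<Vec<f32>, EmbedError>,
{
    // A ratio of 0.0 is a pure keyword search: no embedding is required.
    if semantic_ratio <= 0.0 {
        return Ok(SearchPlan::KeywordOnly);
    }
    let full_vector = semantic_ratio >= 1.0;
    // Lazy embedding: in hybrid mode, skip the embedder when keyword results suffice.
    if !full_vector && keyword_good_enough {
        return Ok(SearchPlan::KeywordOnly);
    }
    match embed() {
        Ok(_) if full_vector => Ok(SearchPlan::SemanticOnly),
        Ok(_) => Ok(SearchPlan::Hybrid),
        // Full vector search cannot proceed without an embedding.
        Err(e) if full_vector => Err(e),
        // Hybrid search degrades to keyword search instead of failing.
        Err(_) => Ok(SearchPlan::KeywordOnly),
    }
}
```

In this sketch, a failing embedder with a ratio of `0.5` yields `Ok(SearchPlan::KeywordOnly)`, while the same failure with a ratio of `1.0` is returned as an error.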

Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-04-04 16:00:20 +00:00
Louis Dureuil
fabc9cf14a milli: add Embedder::embed_one 2024-04-04 15:57:29 +02:00
Louis Dureuil
00c4ed3bc2 milli: refactor getting embedder and embedder name 2024-04-04 15:57:29 +02:00
meili-bors[bot]
339a5e3431 Merge #4549
4549: Hugging Face embedder improvements r=dureuill a=dureuill

Architectural changes/Internal improvements

### 1. Prefer safetensors weights over pytorch weights when available

safetensors weights are memory mapped, which reduces memory usage of supported models.
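
A minimal sketch of the preference rule, assuming the weight files follow the conventional Hugging Face names `model.safetensors` and `pytorch_model.bin`. The `WeightSource` enum and the `pick_weights` function are hypothetical and are not taken from the milli code base.

```rust
/// Hypothetical tag for the weight format that gets loaded.
#[derive(Debug, PartialEq)]
enum WeightSource {
    Safetensors,
    Pytorch,
}

/// Pick a weight format from the files available for a model.
/// File names are assumptions based on common Hugging Face conventions.
fn pick_weights(available_files: &[&str]) -> Option<WeightSource> {
    if available_files.contains(&"model.safetensors") {
        // Preferred: safetensors weights can be memory mapped.
        Some(WeightSource::Safetensors)
    } else if available_files.contains(&"pytorch_model.bin") {
        // Fallback when no safetensors file is published.
        Some(WeightSource::Pytorch)
    } else {
        // Neither format is present, so the model cannot be loaded.
        None
    }
}
```

When both files are present, the function returns `Some(WeightSource::Safetensors)`, matching the "prefer safetensors when both are available" rule.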

### 2. Update candle

Updates candle to `0.4.1`, now pulled from crates.io, and the tokenizers to `v0.15.2` (still pulled from GitHub).

This might fix https://github.com/meilisearch/meilisearch/issues/4399 thanks to the now included https://github.com/huggingface/candle/issues/1454

Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-04-04 13:47:18 +00:00
Louis Dureuil
a1eccc762a Prefer safetensors to pytorch when both are available 2024-04-03 11:05:59 +02:00
Louis Dureuil
572fb3a51d Finer granularity for embedder needs reindex 2024-03-27 12:01:34 +01:00
Louis Dureuil
4ff0255783 remove unused function 2024-03-27 11:51:14 +01:00
Louis Dureuil
a25456120d Expose distribution in settings 2024-03-27 11:51:04 +01:00