Commit Graph

9940 Commits

Author SHA1 Message Date
Louis Dureuil
c2fb7afe59 fmt 2024-05-30 12:06:46 +02:00
ManyTheFish
3f1a510069 Add tests and fix matching strategy 2024-05-30 12:02:42 +02:00
Clément Renault
3a78e988da Reduce the number of complex calls to settings diff functions 2024-05-30 11:23:07 +02:00
Clément Renault
d9e5074189 Introduce a new way to determine the operations to perform on the fields 2024-05-30 11:23:07 +02:00
Clément Renault
bc210bdc00 Introduce a dedicated function to write proximity entries in database 2024-05-30 11:23:06 +02:00
Clément Renault
4bf83f701c Give the settings diff to the write_typed_chunk_into_index function 2024-05-30 11:23:06 +02:00
Clément Renault
db3887929f Fix an issue with settings diff and * in the searchable attributes 2024-05-30 11:22:50 +02:00
Clément Renault
9af103a88e Introducing a new into_del_add_obkv_conditional_operation function 2024-05-30 11:22:49 +02:00
Clément Renault
99211eb375 Introduce the SettingDiff only_additional_fields method 2024-05-30 11:22:49 +02:00
Louis Dureuil
41976b82b1 Tests for ranking_score_threshold 2024-05-30 11:22:26 +02:00
Louis Dureuil
c36410fcbf Analytics for ranking score threshold 2024-05-30 11:22:12 +02:00
Louis Dureuil
7ce2691374 Add ranking score threshold to similar API 2024-05-30 11:21:31 +02:00
Louis Dureuil
4f03b0cf5b Add ranking score threshold to similar 2024-05-30 11:20:50 +02:00
Louis Dureuil
c26db7878c Expose rankingScoreThreshold in API 2024-05-30 10:32:35 +02:00
meili-bors[bot]
06a9803544 Merge #4664
4664: Update README.md r=curquiza a=tpayet

Add hybrid & semantic as a feature

Co-authored-by: Thomas Payet <thomas@meilisearch.com>
2024-05-29 16:55:20 +00:00
Thomas Payet
b2588d8101 Update README.md
Add hybrid & semantic as a feature
2024-05-29 17:48:48 +02:00
meili-bors[bot]
62d27172f4 Merge #4663
4663: Bring back release v1.8.1 into main r=ManyTheFish a=ManyTheFish



Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: ManyTheFish <many@meilisearch.com>
Co-authored-by: meili-bors[bot] <89034592+meili-bors[bot]@users.noreply.github.com>
Co-authored-by: ManyTheFish <ManyTheFish@users.noreply.github.com>
Co-authored-by: Many the fish <many@meilisearch.com>
2024-05-29 14:47:38 +00:00
ManyTheFish
1ab88e10b9 Merge branch 'main' into merge-release-v1.8.1-in-main 2024-05-29 16:24:00 +02:00
ManyTheFish
6a4b2516aa WIP 2024-05-29 16:21:24 +02:00
Louis Dureuil
aac1d769a7 Add ranking_score_threshold to milli 2024-05-29 14:17:09 +02:00
ManyTheFish
abdc4afcca Implement Frequency matching strategy 2024-05-29 13:59:08 +02:00
meili-bors[bot]
75d5c0ae1f Merge #4647
4647: Feature: get similar documents r=dureuill a=dureuill

# Pull Request

## Related issue
Fixes #4610 

## What does this PR do?
[Usage](https://meilisearch.notion.site/Get-similar-documents-usage-540919ca755c4da0b7cdee273db3f290)

Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-05-29 11:42:23 +00:00
meili-bors[bot]
a88554216a Merge #4657
4657: Update version for the next release (v1.9.0) in Cargo.toml r=curquiza a=meili-bot

⚠️ This PR is automatically generated. Check the new version is the expected one and Cargo.lock has been updated before merging.

Co-authored-by: curquiza <curquiza@users.noreply.github.com>
2024-05-29 11:14:19 +00:00
Louis Dureuil
2cf3e1c80a Temporarily ignore perform snapshot test under Windows 2024-05-29 12:42:47 +02:00
Many the fish
e1fbfde6c4 Merge branch 'main' into merge-release-v1.8.1-in-main 2024-05-29 11:31:03 +02:00
ManyTheFish
27b75ec648 merge main into v1.8.1 2024-05-29 11:26:07 +02:00
curquiza
07fdb081a4 Update version for the next release (v1.9.0) in Cargo.toml 2024-05-28 14:19:40 +00:00
Louis Dureuil
ca006e38ec Basic tests 2024-05-28 15:28:19 +02:00
Louis Dureuil
e26bd87780 Error tests for similar routes 2024-05-28 15:28:19 +02:00
Louis Dureuil
c01e498a63 Test server can call similar 2024-05-28 15:28:19 +02:00
Louis Dureuil
ca6cc4654b Add similar route 2024-05-28 15:28:19 +02:00
Louis Dureuil
3bd9d2478c Add error codes 2024-05-28 15:27:43 +02:00
Louis Dureuil
54b15059a0 Analytics changes 2024-05-28 15:27:43 +02:00
Louis Dureuil
d35278320e Add support functions for accessing arroy writers and readers 2024-05-28 15:27:43 +02:00
Louis Dureuil
e172e938e7 add search rules directly takes the filter rather than the searchquery 2024-05-28 15:22:25 +02:00
Louis Dureuil
02b3d82c60 filtered_universe accepts index and txn instead of SearchContext 2024-05-28 15:22:12 +02:00
Louis Dureuil
fd2c95999d Change validate_document_id to public and remove extra layer of result 2024-05-28 15:21:19 +02:00
meili-bors[bot]
e248d2a1e6 Merge #4655
4655: Remove `exportPuffinReport` experimental feature r=Kerollmops a=Kerollmops

This PR fixes #4605 by removing every trace of Puffin. Puffin is a great tool, but we use a better approach to measuring performance.

Co-authored-by: Clément Renault <clement@meilisearch.com>
2024-05-28 07:01:16 +00:00
Clément Renault
487431a035 Fix tests 2024-05-27 16:12:20 +02:00
Clément Renault
b6d450d484 Remove puffin experimental feature 2024-05-27 15:59:28 +02:00
Clément Renault
dc949ab46a Remove puffin usage 2024-05-27 15:59:14 +02:00
Clément Renault
7f3e51349e Remove puffin for the dependencies 2024-05-27 15:53:06 +02:00
meili-bors[bot]
19acc65ad2 Merge #4646
4646: Reduce `Transform`'s disk usage r=Kerollmops a=Kerollmops

This PR implements what is described in #4485. It reduces the number of disk writes and disk usage.

Co-authored-by: Clément Renault <clement@meilisearch.com>
2024-05-23 16:06:50 +00:00
meili-bors[bot]
3a3ab17714 Merge #4651
4651: Allow to comment with the results of benchmark invocation r=Kerollmops a=dureuill



Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-05-23 15:32:09 +00:00
Louis Dureuil
eaf57056ca comment with the results of benchmarks 2024-05-23 15:34:39 +02:00
Louis Dureuil
e340705634 Change benchmark outputs
- logs to stderr instead of stdout
- prints links to the dashboard when there is a dashboard
2024-05-23 15:29:06 +02:00
Clément Renault
fe17c0f52e Construct the minimal OBKVs according to the settings diff 2024-05-23 11:23:57 +02:00
meili-bors[bot]
14bc80e3df Merge #4633
4633: Allow to mark vectors as "userProvided" r=Kerollmops a=dureuill

# Pull Request

## Related issue
Fixes #4606 

## What does this PR do?

[See usage in PRD](https://meilisearch.notion.site/v1-9-AI-search-changes-e90d6803eca8417aa70a1ac5d0225697#deb96fb0595947bda7d4a371100326eb)

- Extends the shape of the special `_vectors` field in documents.
    - Previously, the `_vectors` field had to be an object in which each key is the name of a configured embedder and each value is either `null`, an embedding (array of numbers), or an array of embeddings.
    - In this PR, the value of an embedder in the `_vectors` field can additionally be an object. The object has two fields:
      1. `embeddings`: `null`, an embedding (array of numbers), or an array of embeddings.
      2. `userProvided`: a boolean indicating if the vector was provided by the user.
    - The previous form `embedder_or_array_of_embedders` is semantically equivalent to:
    ```json
    {
        "embeddings": embedder_or_array_of_embedders,
        "userProvided": true
    }
    ```
- During the indexing step, the subfields and values of the `_vectors` field that have `userProvided` set to **false** are added in the vector DB, but not in the documents DB: that means that future modifications of the documents will trigger a regeneration of that particular vector using the document template.
- This allows **importing** embeddings as a one-shot process, while still retaining the ability to regenerate embeddings on document change.
- The dump process now uses this ability: it enriches the `_vectors` fields of documents with the embeddings that were autogenerated, marking them as not `userProvided`. This allows importing the vectors from a dump without regenerating them.
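
The two accepted shapes of an embedder entry can be illustrated with a minimal sketch. This is not Meilisearch's implementation: the helper names `normalize_vectors_entry` and `split_for_storage`, the embedder names `manual` and `default`, and the sample document are hypothetical, and the sketch assumes that an object-form entry always carries both `embeddings` and `userProvided`.

```python
import json


def normalize_vectors_entry(value):
    """Return the explicit {"embeddings", "userProvided"} form of one embedder entry.

    Hypothetical helper: it only mirrors the equivalence stated in the PR
    description, not the actual indexing code.
    """
    if isinstance(value, dict):
        # Object form: assumed here to contain both fields.
        return {
            "embeddings": value["embeddings"],
            "userProvided": bool(value["userProvided"]),
        }
    # Previous form (null, one embedding, or an array of embeddings)
    # is described as equivalent to userProvided = true.
    return {"embeddings": value, "userProvided": True}


def split_for_storage(vectors):
    """Partition embedder names by how the description says they are stored.

    - user_provided: entries marked as provided by the user
    - regenerable: entries with userProvided = false, which go to the vector DB
      only and are regenerated when the document changes
    """
    user_provided, regenerable = [], []
    for name, value in vectors.items():
        entry = normalize_vectors_entry(value)
        (user_provided if entry["userProvided"] else regenerable).append(name)
    return user_provided, regenerable


# Hypothetical document mixing the previous form and the object form.
doc = json.loads(
    """
    {
        "id": 1,
        "_vectors": {
            "manual": [0.1, 0.2],
            "default": {"embeddings": [[0.3, 0.4]], "userProvided": false}
        }
    }
    """
)

user_provided, regenerable = split_for_storage(doc["_vectors"])
print("user provided:", user_provided)
print("regenerable:", regenerable)
```

In this sketch, an entry written in the previous form normalizes to `userProvided: true`, while an entry such as the one a dump would produce lands in the regenerable group.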

### Tests

This PR adds the following tests

- Long-needed hybrid search tests of a simple hf embedder
- Dump test that imports vectors. Due to the difficulty of actually importing a dump in tests, we just read the dump and check it contains the expected content.
- Tests in the index-scheduler: these check that documents containing the same kind of instructions as in the dump are indexed as expected


Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-05-23 08:17:54 +00:00
Clément Renault
bc5663e673 FieldIdsMap no longer useful thanks to #4631 2024-05-22 16:06:15 +02:00
Louis Dureuil
8a941c0241 Smaller review changes 2024-05-22 14:44:42 +02:00