Commit Graph

2464 Commits

Author SHA1 Message Date
meili-bors[bot]
3c4c46377b Merge #4665
4665: Add missing Korean support r=ManyTheFish a=junhochoi

Some configuration is missing `korean` features and add a test case in `milli/src/search/mod.rs`.

# Pull Request

## Related issue

#3443 #3882 

## What does this PR do?
- Improvement on enabling Korean support

Inspired by the work (#3882) I tried to enable Korean features but have found some missing configurations.
This PR is add those missing configs (mostly Cargo.toml) and added one test case.

## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!


Co-authored-by: Junho Choi <jh.choi@catenoid.net>
2024-06-25 11:51:21 +00:00
Louis Dureuil
d75e0098c7 Fixes for Rust v1.79 2024-06-25 11:16:06 +02:00
Junho Choi
2e0ff56f3f Add missing Korean support
Some configuration is missing `korean` features and
add a test case in `milli/src/search/mod.rs`.
2024-06-25 12:45:21 +09:00
Tamo
1693332cab Update arroy and always build the tree that need to be built 2024-06-24 10:14:03 +02:00
meili-bors[bot]
ddd564665b Merge #4713
4713: Speed up facet distribution r=ManyTheFish a=Kerollmops

This PR is akin to #4682, but this time, the same logic is applied to the facets. Bitmaps are not decoded, and we do an intersection on the bytes with the search candidates instead of materializing the RoaringBitmap to destroy it just after the operation.

A prospect raised some slow requests when performing facet searches, and I found out that the disk optimization intersection wasn't performed on the facets.

Co-authored-by: Clément Renault <clement@meilisearch.com>
2024-06-24 05:23:46 +00:00
Clément Renault
9736e16a88 Make clippy happy 2024-06-20 13:02:44 +02:00
Clément Renault
6fa4da8ae7 Improve facet distribution speed in count mode 2024-06-20 12:58:51 +02:00
Clément Renault
19d7cdc20d Improve facet distribution speed in lexico mode 2024-06-20 12:57:08 +02:00
Louis Dureuil
a04041c8f2 Only spawn the pool once 2024-06-19 16:25:33 +02:00
meili-bors[bot]
e580d6b98f Merge #4693
4693: Introduce distinct attributes at search time r=irevoire a=Kerollmops

This PR fixes #4611.

### To Do
- [x] Remove the `distinguishableAttributes` settings (not even a commit about that).
- [x] Use the `filterableAttributes` to be able to use the `distinct` parameter at search.
- [x] Work on the errors and make tests.

Co-authored-by: Clément Renault <clement@meilisearch.com>
Co-authored-by: Tamo <tamo@meilisearch.com>
2024-06-18 07:45:03 +00:00
Tamo
43875e6758 fix bug around nested fields 2024-06-17 15:59:30 +02:00
meili-bors[bot]
e9bf4c43a4 Merge #4649
4649: Don't store the vectors in the documents database r=dureuill a=irevoire

# Pull Request

## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/4607

## What does this PR do?
- Ensure that anything falling under `_vectors` is NOT searchable, filterable or sortable
- [x] per embedder, add a roaring bitmap of documents that provide "userProvided" embeddings
- [x] in the indexing process in extract_vector_points, set the bit corresponding to the document depending on the "userProvided" subfield in the _vectors field.
- [x] in the document DB in typed chunks, when writing the _vectors field, remove all keys corresponding to an embedder

Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-06-17 12:32:03 +00:00
Louis Dureuil
0a8f50695e Fixes for Rust v1.79 2024-06-13 17:47:44 +02:00
Louis Dureuil
e35ef31738 Small changes following review 2024-06-13 14:20:48 +02:00
Louis Dureuil
3bc8f81abc user_provided => regenerate 2024-06-12 18:12:20 +02:00
Louis Dureuil
a89eea233b Fix vectors injection 2024-06-12 17:10:19 +02:00
Louis Dureuil
f5cf01e7d1 Rework extraction to use EmbedderAction 2024-06-12 14:50:55 +02:00
Louis Dureuil
d1dd7e5d09 In transform for removed embedders, write back their user provided vectors in documents, and clear the writers 2024-06-12 14:50:55 +02:00
Louis Dureuil
d18c1f77d7 Update embedder configs with a finer granularity
- no longer clear vector DB between any two embedder changes
2024-06-12 14:50:55 +02:00
Louis Dureuil
d0b05ae691 Add EmbedderAction to settings 2024-06-12 14:50:54 +02:00
Louis Dureuil
e9bf4eb100 Reformulate ParsedVectorsDiff in terms of VectorState 2024-06-12 14:11:44 +02:00
Louis Dureuil
b368105272 Add EmbedderConfigs::into_inner 2024-06-12 14:11:44 +02:00
meili-bors[bot]
e0eff08095 Merge #4685
4685: Fix ci tests r=dureuill a=ManyTheFish

# Pull Request
Make the all following CI succeed:
https://github.com/meilisearch/meilisearch/actions/runs/9477183091

## Related issue
Fixes #4629

## What does this PR do?
- Change the test behavior for `swedish-recomposition` feature flag
- Remove the `-v` parameter from grep

Co-authored-by: ManyTheFish <many@meilisearch.com>
Co-authored-by: Many the fish <many@meilisearch.com>
2024-06-12 07:58:33 +00:00
Clément Renault
39f60abd7d Add and modify distinct tests 2024-06-11 17:53:53 -04:00
Clément Renault
1991bd03da Distinct at search erases the distinct in the settings 2024-06-11 17:02:39 -04:00
Clément Renault
ee39309aae Improve errors and introduce a new InvalidSearchDistinct error code 2024-06-11 16:03:39 -04:00
Clément Renault
0d31be1494 Make the distinct work at search 2024-06-11 11:39:35 -04:00
Louis Dureuil
7cef2299cf Fix behavior when removing a document 2024-06-11 09:45:08 +02:00
ManyTheFish
57d066595b fix Tests almost all features 2024-06-06 17:24:50 +02:00
Clément Renault
75b2e02cd2 Log more stuff around filtering 2024-06-06 11:00:07 -04:00
Clément Renault
52d0d35b39 Revert "Reduce the universe while exploring the facet tree" because it's slower this way
This reverts commit 14026115f21409535772ede0ee4273f37848dd61.
2024-06-06 09:17:51 -04:00
Clément Renault
5432776132 Reduce the universe while exploring the facet tree 2024-06-06 09:17:51 -04:00
Clément Renault
66470b27e6 Use the MultiOps trait for IN operations 2024-06-06 09:17:51 -04:00
Clément Renault
0a9bd398c7 Improve the NOT operator to use the universe when possible 2024-06-06 09:17:51 -04:00
Clément Renault
7967e93c16 Skip evaluating when a universe is empty, nothing can be found 2024-06-06 09:17:51 -04:00
Clément Renault
a6f3a01c6a Expose the universe to do efficient intersections on deserialization 2024-06-06 09:17:51 -04:00
Clément Renault
4ca4a3f954 Make the CboRoaringBitmapCodec support intersection on deserialization 2024-06-06 09:17:51 -04:00
Clément Renault
e4a69c5ac3 Introduce the FacetGroupLazyValue type 2024-06-06 09:17:50 -04:00
Clément Renault
531e3d7d6a MultiOps trait for OR operations 2024-06-06 09:17:50 -04:00
Tamo
2cdcb703d9 fix the deletion of vectors and add a test 2024-06-06 11:39:29 +02:00
Tamo
31a793d226 fix the regeneration of the embeddings in the search 2024-06-06 11:39:29 +02:00
Tamo
d85ab23b82 rename all occurences of user_defined to user_provided for consistency 2024-06-06 11:39:29 +02:00
Tamo
b7349910d9 implements mor review comments 2024-06-06 11:39:29 +02:00
Tamo
376b3a19a7 makes clippy and fmt happy 2024-06-06 11:39:29 +02:00
Tamo
b867829ef1 remove useless dbg 2024-06-06 11:39:29 +02:00
Tamo
5d50850e12 always push the user defined vectors in arroy 2024-06-06 11:39:29 +02:00
Tamo
a73ccc78a6 forward the embedding config to the extractors 2024-06-06 11:39:28 +02:00
Tamo
9eb6f522ea wraps the index embedding config in a struct 2024-06-06 11:37:30 +02:00
Tamo
04f6523f3c expose a new parameter to retrieve the embedders at search time 2024-06-06 11:36:11 +02:00
Tamo
84e498299b Remove the vectors from the documents database 2024-06-06 11:36:11 +02:00