Commit Graph

9538 Commits

Author SHA1 Message Date
9736e16a88 Make clippy happy 2024-06-20 13:02:44 +02:00
6fa4da8ae7 Improve facet distribution speed in count mode 2024-06-20 12:58:51 +02:00
19d7cdc20d Improve facet distribution speed in lexico mode 2024-06-20 12:57:08 +02:00
e580d6b98f Merge #4693
4693: Introduce distinct attributes at search time r=irevoire a=Kerollmops

This PR fixes #4611.

### To Do
- [x] Remove the `distinguishableAttributes` settings (not even a commit about that).
- [x] Use the `filterableAttributes` to be able to use the `distinct` parameter at search.
- [x] Work on the errors and make tests.
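
The rule described in the list above (a search-time `distinct` must name a filterable attribute, and it takes precedence over the one in the settings) could be sketched roughly as follows. This is only an illustration: `effective_distinct` and `DistinctError` are hypothetical names and are not taken from the Meilisearch code base.

```rust
/// Hypothetical error type for this sketch only.
#[derive(Debug, PartialEq, Eq)]
enum DistinctError {
    /// The attribute requested at search time is not filterable.
    NotFilterable(String),
}

/// Picks the distinct attribute to apply to a search.
/// Assumption: a search-time value replaces the settings value entirely,
/// and it must appear in the filterable attributes.
fn effective_distinct<'a>(
    search: Option<&'a str>,
    settings: Option<&'a str>,
    filterable: &[&str],
) -> Result<Option<&'a str>, DistinctError> {
    match search {
        // The search parameter wins when it names a filterable attribute.
        Some(attr) if filterable.iter().any(|f| *f == attr) => Ok(Some(attr)),
        // Otherwise the request is rejected.
        Some(attr) => Err(DistinctError::NotFilterable(attr.to_string())),
        // No search-time value: fall back to the settings.
        None => Ok(settings),
    }
}
```

For example, `effective_distinct(Some("sku"), Some("id"), &["sku"])` selects `sku`, while an attribute missing from the filterable list yields an error.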

Co-authored-by: Clément Renault <clement@meilisearch.com>
Co-authored-by: Tamo <tamo@meilisearch.com>
v1.9.0-rc.3
2024-06-18 07:45:03 +00:00
8ba65e333b add snapshot files 2024-06-17 16:50:26 +02:00
43875e6758 fix bug around nested fields 2024-06-17 15:59:30 +02:00
d7844a6e45 add a bunch of tests on the errors of the distinct at search time 2024-06-17 15:37:32 +02:00
e9bf4c43a4 Merge #4649
4649: Don't store the vectors in the documents database r=dureuill a=irevoire

# Pull Request

## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/4607

## What does this PR do?
- Ensure that anything falling under `_vectors` is NOT searchable, filterable or sortable
- [x] per embedder, add a roaring bitmap of documents that provide "userProvided" embeddings
- [x] in the indexing process in extract_vector_points, set the bit corresponding to the document depending on the "userProvided" subfield in the _vectors field.
- [x] in the document DB in typed chunks, when writing the _vectors field, remove all keys corresponding to an embedder
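
As a rough sketch of the bookkeeping described above, the per-embedder set of documents with user-provided embeddings might look like the following. A `BTreeSet<u32>` stands in for the roaring bitmap so the example needs only the standard library; `UserProvided`, `record` and `is_user_provided` are hypothetical names, not the actual implementation.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// One set of document ids per embedder name.
/// Assumption: a BTreeSet replaces the roaring bitmap in this sketch.
#[derive(Default)]
struct UserProvided {
    per_embedder: BTreeMap<String, BTreeSet<u32>>,
}

impl UserProvided {
    /// Sets or clears the "bit" for `docid` depending on whether the
    /// document supplied its own embedding for `embedder`.
    fn record(&mut self, embedder: &str, docid: u32, user_provided: bool) {
        let set = self.per_embedder.entry(embedder.to_string()).or_default();
        if user_provided {
            set.insert(docid);
        } else {
            set.remove(&docid);
        }
    }

    /// Returns true when `docid` is marked as user-provided for `embedder`.
    fn is_user_provided(&self, embedder: &str, docid: u32) -> bool {
        self.per_embedder
            .get(embedder)
            .map_or(false, |set| set.contains(&docid))
    }
}
```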

Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-06-17 12:32:03 +00:00
a8a0854421 Update meilisearch/src/analytics/segment_analytics.rs 2024-06-17 14:30:50 +02:00
0a8f50695e Fixes for Rust v1.79 2024-06-13 17:47:44 +02:00
09d9b63e1c - test case where all vectors were generated
- update tests following changes in behavior from previous commit
2024-06-13 17:16:41 +02:00
b9b938c902 Change retrieveVectors behavior:
- when the feature is disabled, documents are never modified
- when the feature is enabled and `retrieveVectors` is disabled, `_vectors` is removed from documents
- when the feature is enabled and `retrieveVectors` is enabled, vectors from the vectors DB are merged with `_vectors` in documents

Additionally `_vectors` is never displayed when the `displayedAttributes` list does not contain either `*` or `_vectors`

- fixed an issue where `_vectors` was not injected when all vectors in the dataset were always generated
2024-06-13 17:13:36 +02:00
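
The three `retrieveVectors` cases and the `displayedAttributes` rule from the commit message above can be summarised as a small decision function. This is a minimal sketch with hypothetical names (`VectorsAction`, `vectors_action`, `vectors_displayable`); how the two rules combine in the real code is not stated in the message, so they are kept as separate functions here.

```rust
/// What to do with the `_vectors` field of a returned document.
#[derive(Debug, PartialEq, Eq)]
enum VectorsAction {
    /// Feature disabled: the document is left untouched.
    Keep,
    /// Feature enabled, `retrieveVectors` disabled: `_vectors` is removed.
    Remove,
    /// Feature enabled, `retrieveVectors` enabled: vectors from the
    /// vectors DB are merged with `_vectors` in the document.
    MergeFromDb,
}

/// Maps the two flags to one of the three behaviors listed above.
fn vectors_action(feature_enabled: bool, retrieve_vectors: bool) -> VectorsAction {
    match (feature_enabled, retrieve_vectors) {
        (false, _) => VectorsAction::Keep,
        (true, false) => VectorsAction::Remove,
        (true, true) => VectorsAction::MergeFromDb,
    }
}

/// `_vectors` may only be displayed when `displayedAttributes`
/// contains either `*` or `_vectors`.
fn vectors_displayable(displayed_attributes: &[&str]) -> bool {
    displayed_attributes
        .iter()
        .any(|attr| *attr == "*" || *attr == "_vectors")
}
```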
6bf07d969e add failing test 2024-06-13 15:49:42 +02:00
e35ef31738 Small changes following review 2024-06-13 14:20:48 +02:00
3f212a8202 Update tests 2024-06-12 18:13:34 +02:00
bc547dad6f Update dump file 2024-06-12 18:12:56 +02:00
3bc8f81abc user_provided => regenerate 2024-06-12 18:12:20 +02:00
a89eea233b Fix vectors injection 2024-06-12 17:10:19 +02:00
34fabed214 Add test for vector writeback 2024-06-12 17:09:34 +02:00
fca9fe39b3 Update test snapshots 2024-06-12 14:50:55 +02:00
f5cf01e7d1 Rework extraction to use EmbedderAction 2024-06-12 14:50:55 +02:00
d1dd7e5d09 In transform for removed embedders, write back their user provided vectors in documents, and clear the writers 2024-06-12 14:50:55 +02:00
d18c1f77d7 Update embedder configs with a finer granularity
- no longer clear vector DB between any two embedder changes
2024-06-12 14:50:55 +02:00
d0b05ae691 Add EmbedderAction to settings 2024-06-12 14:50:54 +02:00
e9bf4eb100 Reformulate ParsedVectorsDiff in terms of VectorState 2024-06-12 14:11:44 +02:00
b368105272 Add EmbedderConfigs::into_inner 2024-06-12 14:11:44 +02:00
e0eff08095 Merge #4685
4685: Fix ci tests r=dureuill a=ManyTheFish

# Pull Request
Make all of the following CI runs succeed:
https://github.com/meilisearch/meilisearch/actions/runs/9477183091

## Related issue
Fixes #4629

## What does this PR do?
- Change the test behavior for `swedish-recomposition` feature flag
- Remove the `-v` parameter from grep

Co-authored-by: ManyTheFish <many@meilisearch.com>
Co-authored-by: Many the fish <many@meilisearch.com>
2024-06-12 07:58:33 +00:00
304a9df52d Remove -v parameter 2024-06-12 07:22:24 +02:00
39f60abd7d Add and modify distinct tests 2024-06-11 17:53:53 -04:00
1991bd03da Distinct at search erases the distinct in the settings 2024-06-11 17:02:39 -04:00
ee39309aae Improve errors and introduce a new InvalidSearchDistinct error code 2024-06-11 16:03:39 -04:00
0d31be1494 Make the distinct work at search 2024-06-11 11:39:35 -04:00
3493093c4f add a batch of tests 2024-06-11 16:03:54 +02:00
7cef2299cf Fix behavior when removing a document 2024-06-11 09:45:08 +02:00
a838f39fce Merge #4682
4682: Speed Up Filter ANDs operations r=Kerollmops a=Kerollmops

This PR fixes #4659 and improves the way we do AND operations by using the latest [RoaringBitmap feature to do intersections with serialized bitmaps](https://github.com/RoaringBitmap/roaring-rs/pull/281). Doing so drastically reduces the time spent reading and copying bytes in memory when only a subset of the containers in the bitmap is used and kept.

### Some Example Results

With a 45M-document dataset running on a good NVMe, this example filter took 77ms before this PR and takes only 13ms with it (6x speedup):

```sql
artist = 'The Beatles' AND (duration 150 TO 500 OR duration NOT EXISTS) AND genres IN [Rock, 'Rock and Roll'] AND rating > 4 AND released_year 1960 TO 1990
```

By reordering the filter AND clauses we can reach a constant 8ms execution time. However, note that it is a manual operation. By comparison, the previous filter pipeline stays at a constant 45ms execution time with this reordered filter (6x speedup):

```sql
artist = 'The Beatles' AND genres IN [Rock, 'Rock and Roll'] AND released_year 1960 TO 1990 AND (duration 150 TO 500 OR duration NOT EXISTS)
```
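
To illustrate why the order of AND clauses matters, here is a minimal sketch that intersects candidate sets starting with the most selective one, so every later intersection works on a small accumulator. It is not the technique of this PR: the PR intersects against serialized bitmaps and leaves reordering to the user, whereas this sketch uses `BTreeSet<u32>` in place of roaring bitmaps and sorts by candidate count automatically. `intersect_clauses` is a hypothetical name.

```rust
use std::collections::BTreeSet;

/// Intersects the universe with every clause, smallest clause first.
/// Assumption: each clause has already been evaluated to a set of docids.
fn intersect_clauses(
    universe: &BTreeSet<u32>,
    mut clauses: Vec<BTreeSet<u32>>,
) -> BTreeSet<u32> {
    // Most selective clauses first, so the accumulator shrinks quickly.
    clauses.sort_by_key(|clause| clause.len());

    let mut acc = universe.clone();
    for clause in clauses {
        // Nothing can match anymore: skip the remaining clauses.
        if acc.is_empty() {
            break;
        }
        acc = acc.intersection(&clause).copied().collect();
    }
    acc
}
```

The result is the same whatever the order; only the amount of work per intersection changes.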

### To Do
- [x] Rebase on `release-v1.9.0`.
- [ ] ~Skip branches of the facet/filter tree when nothing is in common with the universe~ slower this way.
- [x] When the universe is required use the universe given in parameter if possible.

Co-authored-by: Clément Renault <clement@meilisearch.com>
2024-06-11 02:51:17 +00:00
600e97d9dc gate the retrieveVectors parameter behind the vectors feature flag 2024-06-10 18:26:12 +02:00
7add7d053c Merge #4689
4689: Bring back changes from v1.8.2 into v1.9.0 r=curquiza a=dureuill



Co-authored-by: dureuill <dureuill@users.noreply.github.com>
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
Co-authored-by: meili-bors[bot] <89034592+meili-bors[bot]@users.noreply.github.com>
2024-06-10 14:03:55 +00:00
7559dfc814 Merge tag 'v1.8.2' into release-v1.9.0 2024-06-10 15:07:34 +02:00
6c6c4732a1 Merge #4681
4681: Fix concurrency issue r=irevoire a=dureuill

# Pull Request

## Related issue
Fixes #4654 

## What does this PR do?
- Asynchronously drop permits


Co-authored-by: Louis Dureuil <louis@meilisearch.com>
v1.8.2
2024-06-10 09:36:08 +00:00
0502b17501 log the state of the index-scheduler in all failed tests 2024-06-10 10:52:49 +02:00
3976fe660e Merge #4688
4688: Update version for the next release (v1.8.2) in Cargo.toml r=dureuill a=meili-bot

⚠️ This PR is automatically generated. Check the new version is the expected one and Cargo.lock has been updated before merging.

Co-authored-by: dureuill <dureuill@users.noreply.github.com>
2024-06-10 08:28:34 +00:00
50f8218a5d Asynchronously drop permits 2024-06-10 10:19:57 +02:00
19585f1a4f Update version for the next release (v1.8.2) in Cargo.toml 2024-06-10 07:59:36 +00:00
8ec6e175e5 Replace roaring patch to the v0.10.5 2024-06-07 22:11:26 -04:00
57d066595b fix Tests almost all features 2024-06-06 17:24:50 +02:00
75b2e02cd2 Log more stuff around filtering 2024-06-06 11:00:07 -04:00
40f05fe156 Bump roaring to the latest commit 2024-06-06 10:59:55 -04:00
734d1c53ad fix a panic in yaup 2024-06-06 16:31:07 +02:00
52d0d35b39 Revert "Reduce the universe while exploring the facet tree" because it's slower this way
This reverts commit 14026115f21409535772ede0ee4273f37848dd61.
2024-06-06 09:17:51 -04:00
5432776132 Reduce the universe while exploring the facet tree 2024-06-06 09:17:51 -04:00