96be85396d
Use a VecDeque in the wpp database
2023-10-30 11:15:02 +01:00
df9e5c8651
Generalize usage of CboRoaringBitmap codec to ease the use
2023-10-30 11:15:02 +01:00
b541d48847
Add a buffer to the obkv writer
2023-10-30 11:15:02 +01:00
8ccf32d1a0
Compute word_fid_docids before word_docids and exact_word_docids
2023-10-30 11:15:02 +01:00
db1ca21231
Add puffin to the `sorter_into_reader` function
2023-10-30 11:15:00 +01:00
11ea5acff9
Fix
2023-10-30 11:13:10 +01:00
8d77736a67
Fix fid_word_docids
2023-10-30 11:13:10 +01:00
748b333161
Add useful debug assert before key insertion in the database
2023-10-30 11:13:10 +01:00
17b647dfe5
Wip
2023-10-30 11:13:08 +01:00
e7244aa485
fix warnings
2023-10-30 11:00:46 +01:00
2bae9550c8
Add explanatory comment
2023-10-23 12:06:28 +02:00
5fe7c4545a
compute all candidates correctly when skipping
2023-10-23 12:02:45 +02:00
5e0485d8dd
Merge #4131
4131: Reduce proximity range from 7 to 3 r=Kerollmops a=ManyTheFish
## Summary
This PR aims to reduce the impact of the proximity databases on the indexing time and on the database size by reducing the maximum distance between two words to be indexed in the proximity database.
## Stats
### Impact on database size and indexing time

### Impact on search relevancy
<details>
| dataset_name | host_name | Relevancy rate (precision) statistic | completion_rate 25.00% | completion_rate 50.00% | completion_rate 75.00% | completion_rate 100.00% |
|--------------|------------------|------------------------------------|-----------------|-----------------|-----------------|-----------------|
| FBIS | 1_4_0 | percentile-10 | 0.00% | 0.00% | 0.00% | 0.00% |
| FBIS | 1_4_0 | percentile-25 | 0.00% | 0.00% | 0.00% | 0.00% |
| FBIS | 1_4_0 | percentile-50 | 0.00% | 0.00% | 5.00% | 5.56% |
| FBIS | 1_4_0 | percentile-75 | 0.00% | 12.50% | 35.00% | 45.00% |
| FBIS | 1_4_0 | percentile-90 | 20.00% | 40.00% | | 100.00% |
| FBIS | 1_4_0 | average | 5.78% | 11.16% | 21.90% | 26.29% |
| FBIS | reduce_proximity | percentile-10 | 0.00% | 0.00% | 0.00% | 0.00% |
| FBIS | reduce_proximity | percentile-25 | 0.00% | 0.00% | 0.00% | 0.00% |
| FBIS | reduce_proximity | percentile-50 | 0.00% | 0.00% | 5.00% | 5.56% |
| FBIS | reduce_proximity | percentile-75 | 0.00% | 15.00% | 35.00% | 40.00% |
| FBIS | reduce_proximity | percentile-90 | 20.00% | 40.00% | 85.00% | 100.00% |
| FBIS | reduce_proximity | average | 5.55% | 11.34% | 21.75% | 26.14% |
| FR94 | 1_4_0 | percentile-10 | 0.00% | 0.00% | 0.00% | 0.00% |
| FR94 | 1_4_0 | percentile-25 | 0.00% | 0.00% | 0.00% | 0.00% |
| FR94 | 1_4_0 | percentile-50 | 0.00% | 0.00% | 0.00% | 0.00% |
| FR94 | 1_4_0 | percentile-75 | 0.00% | 5.00% | 15.00% | 42.11% |
| FR94 | 1_4_0 | percentile-90 | 15.00% | 54.55% | 100.00% | 100.00% |
| FR94 | 1_4_0 | average | 5.95% | 12.07% | 18.70% | 25.57% |
| FR94 | reduce_proximity | percentile-10 | 0.00% | 0.00% | 0.00% | 0.00% |
| FR94 | reduce_proximity | percentile-25 | 0.00% | 0.00% | 0.00% | 0.00% |
| FR94 | reduce_proximity | percentile-50 | 0.00% | 0.00% | 0.00% | 0.00% |
| FR94 | reduce_proximity | percentile-75 | 0.00% | 5.00% | 15.00% | 42.11% |
| FR94 | reduce_proximity | percentile-90 | 15.00% | 54.55% | 100.00% | 100.00% |
| FR94 | reduce_proximity | average | 5.79% | 12.00% | 18.70% | 25.53% |
| FT | 1_4_0 | percentile-10 | 0.00% | 0.00% | 0.00% | 0.00% |
| FT | 1_4_0 | percentile-25 | 0.00% | 0.00% | 0.00% | 0.00% |
| FT | 1_4_0 | percentile-50 | 0.00% | 0.00% | 5.00% | 10.00% |
| FT | 1_4_0 | percentile-75 | 0.00% | 15.00% | 30.00% | 40.00% |
| FT | 1_4_0 | percentile-90 | 20.00% | 50.00% | 65.00% | 100.00% |
| FT | 1_4_0 | average | 5.08% | 12.58% | 20.00% | 25.49% |
| FT | reduce_proximity | percentile-10 | 0.00% | 0.00% | 0.00% | 0.00% |
| FT | reduce_proximity | percentile-25 | 0.00% | 0.00% | 0.00% | 0.00% |
| FT | reduce_proximity | percentile-50 | 0.00% | 0.00% | 5.00% | 10.00% |
| FT | reduce_proximity | percentile-75 | 0.00% | 15.00% | 30.00% | 40.00% |
| FT | reduce_proximity | percentile-90 | 10.00% | 45.00% | 60.00% | 100.00% |
| FT | reduce_proximity | average | 5.01% | 12.64% | 20.10% | 25.53% |
| LAT | 1_4_0 | percentile-10 | 0.00% | 0.00% | 0.00% | 0.00% |
| LAT | 1_4_0 | percentile-25 | 0.00% | 0.00% | 0.00% | 0.00% |
| LAT | 1_4_0 | percentile-50 | 0.00% | 0.00% | 5.00% | 5.00% |
| LAT | 1_4_0 | percentile-75 | 5.00% | 15.00% | 30.00% | 30.00% |
| LAT | 1_4_0 | percentile-90 | 15.00% | 45.00% | 60.00% | 80.00% |
| LAT | 1_4_0 | average | 4.80% | 11.80% | 17.88% | 21.62% |
| LAT | reduce_proximity | percentile-10 | 0.00% | 0.00% | 0.00% | 0.00% |
| LAT | reduce_proximity | percentile-25 | 0.00% | 0.00% | 0.00% | 0.00% |
| LAT | reduce_proximity | percentile-50 | 0.00% | 0.00% | 5.00% | 5.00% |
| LAT | reduce_proximity | percentile-75 | 0.00% | 11.11% | 25.00% | 35.00% |
| LAT | reduce_proximity | percentile-90 | 15.00% | 45.00% | 55.00% | 80.00% |
| LAT | reduce_proximity | average | 4.43% | 11.23% | 17.32% | 21.45% |
</details>
### Impact on Search time
| dataset_name | host_name | completion_rate 25.00% | completion_rate 50.00% | completion_rate 75.00% | completion_rate 100.00% | Average |
|--------------|------------------|------------:|------------:|------------:|------------:|------------:|
| FBIS | 1_4_0 | 3.45 | 7.446666667 | 9.773489933 | 9.620300752 | 7.572614338 |
| FBIS | reduce_proximity | 2.983333333 | 5.316666667 | 6.911073826 | 7.637218045 | 5.712072968 |
| FR94 | 1_4_0 | 2.236666667 | 4.45 | 5.523489933 | 4.560150376 | 4.192576744 |
| FR94 | reduce_proximity | 2.09 | 3.991666667 | 4.981543624 | 4.266917293 | 3.832531896 |
| FT | 1_4_0 | 5.956666667 | 9.656666667 | 13.86912752 | 10.83270677 | 10.0787919 |
| FT | reduce_proximity | 4.51 | 5.981666667 | 7.701342282 | 6.766917293 | 6.23998156 |
| LAT | 1_4_0 | 5.856666667 | 9.233333333 | 12.98322148 | 10.78759398 | 9.715203865 |
| LAT | reduce_proximity | 6.91 | 6.706666667 | 8.463087248 | 8.265037594 | 7.586197877 |
## Technical approach
- Ensure the MAX_DISTANCE constant is used everywhere it is needed
- Reduce MAX_DISTANCE from 8 to 4
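A minimal sketch of what the MAX_DISTANCE bound means for the number of indexed word pairs, assuming proximity is the plain difference between word positions and that only pairs with a proximity strictly below MAX_DISTANCE are stored (so 4 keeps proximities 1 to 3 and 8 kept 1 to 7, matching the "7 to 3" commit title). `word_pair_proximities` is a hypothetical helper, not milli's extractor.

```rust
/// Hypothetical bound mirroring the value after this PR.
pub const MAX_DISTANCE: u32 = 4;

/// Returns (left word, right word, proximity) for every ordered pair of words
/// whose positions differ by less than `max_distance`.
/// Assumption: proximity is the plain position difference (no order penalty).
pub fn word_pair_proximities<'a>(words: &[&'a str], max_distance: u32) -> Vec<(&'a str, &'a str, u32)> {
    let mut pairs = Vec::new();
    for (i, &left) in words.iter().enumerate() {
        // Look ahead only inside the window i+1 ..= i+max_distance-1.
        for (j, &right) in words.iter().enumerate().skip(i + 1) {
            let proximity = (j - i) as u32;
            if proximity >= max_distance {
                break;
            }
            pairs.push((left, right, proximity));
        }
    }
    pairs
}
```

In this simplified model, each word produces at most 3 pairs instead of at most 7, so a 10-word text yields 24 pairs with MAX_DISTANCE = 4 versus 42 with MAX_DISTANCE = 8.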
## Related
TBD
Co-authored-by: ManyTheFish <many@meilisearch.com>
2023-10-18 14:56:08 +00:00
27eec21415
Fix tests
2023-10-18 16:03:22 +02:00
62dfd09dc6
Add more puffin logs to the deletion functions
2023-10-13 13:11:09 +02:00
f343ef5f2f
Merge #4108
4108: Fix bug where search with distinct attribute and no ranking, returns offset+limit hits r=curquiza a=vivek-26
# Pull Request
## Related issue
Fixes #4078
## What does this PR do?
This PR:
- Fixes bug where search with distinct attribute and no ranking, returns offset+limit hits.
- Adds unit and integration tests.
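A minimal sketch of the expected semantics, assuming the documents arrive already sorted and each carries exactly one value for the distinct attribute: duplicates are dropped first, then `offset` and `limit` are applied to the deduplicated list, so at most `limit` hits come back (the bug returned `offset + limit` hits). `paginate_distinct` is a hypothetical helper, not the code changed in this PR.

```rust
use std::collections::HashSet;

/// Keeps the first document for each distinct value, then applies offset/limit.
/// `docs` holds (document id, distinct attribute value) pairs in ranking order.
pub fn paginate_distinct(docs: &[(u32, &str)], offset: usize, limit: usize) -> Vec<u32> {
    let mut seen = HashSet::new();
    docs.iter()
        // `insert` returns false when this distinct value was already seen.
        .filter(|(_, distinct)| seen.insert(*distinct))
        .map(|(id, _)| *id)
        // Pagination happens after deduplication, never before.
        .skip(offset)
        .take(limit)
        .collect()
}
```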
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: Vivek Kumar <vivek.26@outlook.com>
2023-10-12 07:51:29 +00:00
d4da06ff47
fix bug where distinct search with no ranking returns offset+limit hits
2023-10-11 19:02:16 +05:30
c0f2724c2d
Get rid of the newly introduced error code in favor of an io::Error
2023-10-10 15:12:23 +02:00
d772073dfa
Use a BufReader every time there is a grenad<File>
2023-10-10 15:00:30 +02:00
43989fe2e4
Reduce proximity range from 7 to 3
2023-10-03 12:16:48 +02:00
487d493f49
Merge #4043
4043: Bring back hotfixes from v1.3.3 into v1.4.0 r=Kerollmops a=curquiza
Co-authored-by: curquiza <curquiza@users.noreply.github.com>
Co-authored-by: meili-bors[bot] <89034592+meili-bors[bot]@users.noreply.github.com>
Co-authored-by: Kerollmops <clement@meilisearch.com>
Co-authored-by: curquiza <clementine@meilisearch.com>
2023-09-11 12:27:34 +00:00
abfa7ded25
use a new temp index in the test
2023-09-08 12:32:47 +05:30
f2837aaec2
add another test case
2023-09-08 11:39:54 +05:30
11df155598
fix highlighting bug when searching for a phrase with cropping
2023-09-08 11:39:52 +05:30
256cf33bca
Merge #4039
4039: Fix multiple vectors dimensions r=ManyTheFish a=Kerollmops
This PR fixes #4035, making it possible to provide multiple vectors in documents. The fix extracts the vectors from the non-flattened version of the documents.
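To illustrate why flattening breaks multiple vectors, here is a toy model, assuming that the flattening step concatenates nested arrays into one array. The `Value` enum, `flatten`, and `extract_vectors` are hypothetical stand-ins, not milli's types or functions.

```rust
/// Hypothetical, minimal JSON-like value: numbers and arrays only.
#[derive(Debug, Clone)]
pub enum Value {
    Num(f32),
    Arr(Vec<Value>),
}

/// Concatenates every number found in nested arrays (a lossy, flattened view).
pub fn flatten(value: &Value, out: &mut Vec<f32>) {
    match value {
        Value::Num(n) => out.push(*n),
        Value::Arr(items) => {
            for item in items {
                flatten(item, out);
            }
        }
    }
}

/// Reads `_vectors` from the original (non-flattened) value:
/// an array of numbers is one vector, an array of arrays is several vectors.
pub fn extract_vectors(value: &Value) -> Option<Vec<Vec<f32>>> {
    let items = match value {
        Value::Arr(items) => items,
        Value::Num(_) => return None,
    };
    if items.is_empty() {
        return Some(Vec::new());
    }
    // An array of numbers is a single vector.
    if items.iter().all(|item| matches!(item, Value::Num(_))) {
        let mut single = Vec::new();
        flatten(value, &mut single);
        return Some(vec![single]);
    }
    // An array of arrays is several vectors, each flattened on its own.
    items
        .iter()
        .map(|item| match item {
            Value::Arr(_) => {
                let mut vector = Vec::new();
                flatten(item, &mut vector);
                Some(vector)
            }
            // Mixing bare numbers and arrays is rejected in this sketch.
            Value::Num(_) => None,
        })
        .collect()
}
```

With `[[0.1, 0.2], [0.3, 0.4]]`, the flattened view yields one 4-dimensional vector, while reading the original structure yields two 2-dimensional vectors.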
Co-authored-by: Kerollmops <clement@meilisearch.com>
2023-09-07 09:25:58 +00:00
679c0b0f97
Extract the vectors from the non-flattened version of the documents
2023-09-06 12:26:00 +02:00
e02d0064bd
Add a test case scenario
2023-09-06 12:26:00 +02:00
dc3d9c90d9
Merge #3994
3994: Fix synonyms with separators r=Kerollmops a=ManyTheFish
# Pull Request
## Related issue
Fixes #3977
## Available prototype
```
$ docker pull getmeili/meilisearch:prototype-fix-synonyms-with-separators-0
```
## What does this PR do?
- add a new test
- filter the empty synonyms after normalization
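A minimal sketch of the second bullet, assuming a toy normalizer that lowercases words and treats every non-alphanumeric character as a separator: a synonym made only of separators normalizes to an empty string and is skipped instead of being stored. `normalize` and `normalized_synonyms` are hypothetical; the real normalization is done by Meilisearch's tokenizer.

```rust
/// Toy normalizer (assumption): split on non-alphanumeric characters,
/// lowercase each word, and join the words with single spaces.
pub fn normalize(input: &str) -> String {
    input
        .split(|c: char| !c.is_alphanumeric())
        .filter(|word| !word.is_empty())
        .map(|word| word.to_lowercase())
        .collect::<Vec<_>>()
        .join(" ")
}

/// Normalizes every synonym and drops the ones that end up empty.
pub fn normalized_synonyms(synonyms: &[&str]) -> Vec<String> {
    synonyms
        .iter()
        .map(|s| normalize(s))
        .filter(|s| !s.is_empty())
        .collect()
}
```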
Co-authored-by: ManyTheFish <many@meilisearch.com>
2023-09-05 14:42:46 +00:00
66aa6d5871
Ignore tokens with empty normalized value during indexing process
2023-09-05 15:44:14 +02:00
8ac5b765bc
Fix synonyms normalization
2023-09-04 16:12:48 +02:00
085aad0a94
Add a test
2023-09-04 14:39:33 +02:00
ccf3ba3f32
Merge #4019
4019: Bringing back changes from `v1.3.2` onto `main` r=irevoire a=Kerollmops
Co-authored-by: Kerollmops <clement@meilisearch.com>
Co-authored-by: meili-bors[bot] <89034592+meili-bors[bot]@users.noreply.github.com>
Co-authored-by: irevoire <irevoire@users.noreply.github.com>
Co-authored-by: Clément Renault <clement@meilisearch.com>
2023-08-28 12:14:11 +00:00
8c0ebd1331
Update milli/src/search/new/bucket_sort.rs
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2023-08-23 16:40:39 +02:00
5130e06b41
Temporarily disable an assert in the ranking rules
2023-08-23 16:11:54 +02:00
914b125c5f
Merge #3945
3945: Do not leak field information on error r=Kerollmops a=vivek-26
# Pull Request
## Related issue
Fixes #3865
## What does this PR do?
This PR ensures that `InvalidSortableAttribute` and `InvalidFacetSearchFacetName` errors do not leak field information, i.e., fields that are not part of `displayedAttributes` in the settings are hidden from the error message.
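A minimal sketch of the idea, assuming the error message lists the valid attributes: the candidate list is filtered against `displayedAttributes` before the message is formatted, with `None` standing for the `["*"]` wildcard (every field displayed). `visible_fields`, `invalid_sort_message`, and the message wording are hypothetical, not Meilisearch's actual error text.

```rust
/// Returns the fields that may appear in an error message: only those that
/// are also displayed. `None` means every field is displayed.
pub fn visible_fields<'a>(fields: &[&'a str], displayed: Option<&[&str]>) -> Vec<&'a str> {
    fields
        .iter()
        .copied()
        .filter(|field| displayed.map_or(true, |d| d.iter().any(|shown| shown == field)))
        .collect()
}

/// Hypothetical error message built from the filtered list only,
/// so hidden fields never reach the user.
pub fn invalid_sort_message(attribute: &str, fields: &[&str], displayed: Option<&[&str]>) -> String {
    format!(
        "Attribute `{}` is not sortable. Available sortable attributes are: `{}`.",
        attribute,
        visible_fields(fields, displayed).join(", ")
    )
}
```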
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: Vivek Kumar <vivek.26@outlook.com>
2023-08-22 18:55:27 +00:00
c53841e166
Accept the null JSON value as the value of _vectors
2023-08-14 16:03:55 +02:00
e4e49e63d0
Merge #3993
3993: Bringing back changes from v1.3.1 to `main` r=irevoire a=curquiza
Co-authored-by: irevoire <irevoire@users.noreply.github.com>
Co-authored-by: meili-bors[bot] <89034592+meili-bors[bot]@users.noreply.github.com>
Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: ManyTheFish <many@meilisearch.com>
2023-08-10 14:30:02 +00:00
5a7c1bde84
Fix clippy
2023-08-10 11:27:56 +02:00
6b2d671be7
Fix PR comments
2023-08-10 10:44:07 +02:00
43c13faeda
Update milli/src/update/index_documents/extract/extract_docid_word_positions.rs
Co-authored-by: Tamo <tamo@meilisearch.com>
2023-08-10 10:05:03 +02:00
44c1900f36
Merge #3986
3986: Fix geo bounding box with strings r=ManyTheFish a=irevoire
# Pull Request
When sending a document with a geofield of type string (e.g., `{ "_geo": { "lat": 12, "lng": "13" }}`), the `_geoBoundingBox` filter would exclude this document.
This PR fixes this issue by automatically parsing the string value in case we're working on a geofield.
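A minimal sketch of the fix's idea, assuming a coordinate can arrive either as a JSON number or as a numeric string: the string is parsed into a float before the bounding-box comparison instead of being ignored. `GeoValue`, `as_coordinate`, and `in_bounding_box` are hypothetical names, and the (lat, lng) tuple convention for the corners is an assumption of this sketch.

```rust
/// Hypothetical raw coordinate as it may appear in a document.
pub enum GeoValue<'a> {
    Number(f64),
    Str(&'a str),
}

/// Converts a coordinate to f64, parsing strings such as "13" (after trimming).
/// Returns None when the string is not a valid number.
pub fn as_coordinate(value: &GeoValue<'_>) -> Option<f64> {
    match value {
        GeoValue::Number(n) => Some(*n),
        GeoValue::Str(s) => s.trim().parse::<f64>().ok(),
    }
}

/// Simplified bounding-box check without antimeridian handling.
/// `top_right` and `bottom_left` are (lat, lng) pairs.
pub fn in_bounding_box(
    lat: &GeoValue<'_>,
    lng: &GeoValue<'_>,
    top_right: (f64, f64),
    bottom_left: (f64, f64),
) -> bool {
    match (as_coordinate(lat), as_coordinate(lng)) {
        (Some(lat), Some(lng)) => {
            lat <= top_right.0 && lat >= bottom_left.0 && lng <= top_right.1 && lng >= bottom_left.1
        }
        // Unparseable coordinates never match.
        _ => false,
    }
}
```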
## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/3973
## What does this PR do?
- Automatically parse the facet value only if we're working on a geofield.
- Make insta work with snapshots in loops or closures executed multiple times (you may need to update your CLI if it panics after this PR: `cargo install cargo-insta`).
- Add one integration test in milli and one in meilisearch to ensure it keeps working.
- Add three snapshots for the dump that had mysteriously disappeared.
Co-authored-by: Tamo <tamo@meilisearch.com>
2023-08-09 07:58:15 +00:00
8dc5acf998
Try fix
2023-08-08 16:52:36 +02:00
35758db9ec
Truncate the normalized long facets used in search for facet value
2023-08-08 16:38:30 +02:00
4988199bb9
ensure the geoboundingbox works with strings and int geofields in milli and meilisearch
2023-08-08 16:29:25 +02:00
9d061cec26
automatically parse the filterable attribute to float if it's a geo field
2023-08-08 16:28:07 +02:00
4a21fecf67
Merge branch 'main' into settings-customizing-tokenization
2023-08-08 16:08:16 +02:00
dd57873f8e
hide fields not in the displayedAttributes list from errors
2023-08-05 16:03:10 +05:30
b45c36cd71
Merge branch 'main' into tmp-release-v1.3.0
2023-08-01 15:05:17 +02:00
9d5e3457e5
Fix clippy
2023-07-27 14:21:19 +02:00
939b2fc6fd
Merge #3949
3949: Fix score details casing r=Kerollmops a=ManyTheFish
# Pull Request
Fixes #3941
Co-authored-by: ManyTheFish <many@meilisearch.com>
2023-07-26 14:14:59 +00:00