Commit Graph

1894 Commits

Author SHA1 Message Date
ManyTheFish
96be85396d Use a vecDeque in wpp database 2023-10-30 11:15:02 +01:00
ManyTheFish
df9e5c8651 Generalize usage of CboRoaringBitmap codec to ease the use 2023-10-30 11:15:02 +01:00
ManyTheFish
b541d48847 Add buffer to the obkv writter 2023-10-30 11:15:02 +01:00
ManyTheFish
8ccf32d1a0 Compute word_fid_docids before word_docids and exact_word_docids 2023-10-30 11:15:02 +01:00
ManyTheFish
db1ca21231 add puffin in sorter into reeder function 2023-10-30 11:15:00 +01:00
ManyTheFish
11ea5acff9 Fix 2023-10-30 11:13:10 +01:00
ManyTheFish
8d77736a67 Fix fid_word_docids 2023-10-30 11:13:10 +01:00
ManyTheFish
748b333161 Add usefull debug assert before key insertion in database 2023-10-30 11:13:10 +01:00
ManyTheFish
17b647dfe5 Wip 2023-10-30 11:13:08 +01:00
meili-bors[bot]
5e0485d8dd Merge #4131
4131: Reduce proximity range from 7 to 3 r=Kerollmops a=ManyTheFish

## Summary
This PR aims to reduce the impact of the proximity databases on the indexing time and on the database size by reducing the maximum distance between two words to be indexed in the proximity database.

## Stats

### Impact on database size and indexing time
![Impact on datasets](https://github.com/meilisearch/meilisearch/assets/6482087/28ed3d96-bdde-41c1-bdac-e90c1b1dbb23)

### Impact on search relevancy

<details>

| dataset_name | host_name        | Relevancy rate (Precision) | completion_rate  25.00% | completion_rate 50.00% | completion_rate 75.00% | completion_rate 100.00% |
|--------------|------------------|------------------------------------|-----------------|-----------------|-----------------|-----------------|
| FBIS         | 1_4_0            | percentile-10 |           0.00% |           0.00% |           0.00% |           0.00% |
| FBIS         | 1_4_0            | percentile-25 |           0.00% |           0.00% |           0.00% |           0.00% |
| FBIS         | 1_4_0            | percentile-50 |           0.00% |           0.00% |           5.00% |           5.56% |
| FBIS         | 1_4_0            | percentile-75 |           0.00% |          12.50% |          35.00% |          45.00% |
| FBIS         | 1_4_0            | percentile-90 |          20.00% |          40.00% |                 |         100.00% |
| FBIS         | 1_4_0            | average       |           5.78% |          11.16% |          21.90% |          26.29% |
| FBIS         | reduce_proximity | percentile-10 |           0.00% |           0.00% |           0.00% |           0.00% |
| FBIS         | reduce_proximity | percentile-25 |           0.00% |           0.00% |           0.00% |           0.00% |
| FBIS         | reduce_proximity | percentile-50 |           0.00% |           0.00% |           5.00% |           5.56% |
| FBIS         | reduce_proximity | percentile-75 |           0.00% |          15.00% |          35.00% |          40.00% |
| FBIS         | reduce_proximity | percentile-90 |          20.00% |          40.00% |          85.00% |         100.00% |
| FBIS         | reduce_proximity | average       |           5.55% |          11.34% |          21.75% |          26.14% |
| FR94         | 1_4_0            | percentile-10 |           0.00% |           0.00% |           0.00% |           0.00% |
| FR94         | 1_4_0            | percentile-25 |           0.00% |           0.00% |           0.00% |           0.00% |
| FR94         | 1_4_0            | percentile-50 |           0.00% |           0.00% |           0.00% |           0.00% |
| FR94         | 1_4_0            | percentile-75 |           0.00% |           5.00% |          15.00% |          42.11% |
| FR94         | 1_4_0            | percentile-90 |          15.00% |          54.55% |         100.00% |         100.00% |
| FR94         | 1_4_0            | average       |           5.95% |          12.07% |          18.70% |          25.57% |
| FR94         | reduce_proximity | percentile-10 |           0.00% |           0.00% |           0.00% |           0.00% |
| FR94         | reduce_proximity | percentile-25 |           0.00% |           0.00% |           0.00% |           0.00% |
| FR94         | reduce_proximity | percentile-50 |           0.00% |           0.00% |           0.00% |           0.00% |
| FR94         | reduce_proximity | percentile-75 |           0.00% |           5.00% |          15.00% |          42.11% |
| FR94         | reduce_proximity | percentile-90 |          15.00% |          54.55% |         100.00% |         100.00% |
| FR94         | reduce_proximity | average       |           5.79% |          12.00% |          18.70% |          25.53% |
| FT           | 1_4_0            | percentile-10 |           0.00% |           0.00% |           0.00% |           0.00% |
| FT           | 1_4_0            | percentile-25 |           0.00% |           0.00% |           0.00% |           0.00% |
| FT           | 1_4_0            | percentile-50 |           0.00% |           0.00% |           5.00% |          10.00% |
| FT           | 1_4_0            | percentile-75 |           0.00% |          15.00% |          30.00% |          40.00% |
| FT           | 1_4_0            | percentile-90 |          20.00% |          50.00% |          65.00% |         100.00% |
| FT           | 1_4_0            | average       |           5.08% |          12.58% |          20.00% |          25.49% |
| FT           | reduce_proximity | percentile-10 |           0.00% |           0.00% |           0.00% |           0.00% |
| FT           | reduce_proximity | percentile-25 |           0.00% |           0.00% |           0.00% |           0.00% |
| FT           | reduce_proximity | percentile-50 |           0.00% |           0.00% |           5.00% |          10.00% |
| FT           | reduce_proximity | percentile-75 |           0.00% |          15.00% |          30.00% |          40.00% |
| FT           | reduce_proximity | percentile-90 |          10.00% |          45.00% |          60.00% |         100.00% |
| FT           | reduce_proximity | average       |           5.01% |          12.64% |          20.10% |          25.53% |
| LAT          | 1_4_0            | percentile-10 |           0.00% |           0.00% |           0.00% |           0.00% |
| LAT          | 1_4_0            | percentile-25 |           0.00% |           0.00% |           0.00% |           0.00% |
| LAT          | 1_4_0            | percentile-50 |           0.00% |           0.00% |           5.00% |           5.00% |
| LAT          | 1_4_0            | percentile-75 |           5.00% |          15.00% |          30.00% |          30.00% |
| LAT          | 1_4_0            | percentile-90 |          15.00% |          45.00% |          60.00% |          80.00% |
| LAT          | 1_4_0            | average       |           4.80% |          11.80% |          17.88% |          21.62% |
| LAT          | reduce_proximity | percentile-10 |           0.00% |           0.00% |           0.00% |           0.00% |
| LAT          | reduce_proximity | percentile-25 |           0.00% |           0.00% |           0.00% |           0.00% |
| LAT          | reduce_proximity | percentile-50 |           0.00% |           0.00% |           5.00% |           5.00% |
| LAT          | reduce_proximity | percentile-75 |           0.00% |          11.11% |          25.00% |          35.00% |
| LAT          | reduce_proximity | percentile-90 |          15.00% |          45.00% |          55.00% |          80.00% |
| LAT          | reduce_proximity | average       |           4.43% |          11.23% |          17.32% |          21.45% |

</details>

### Impact on Search time

| dataset_name | host_name        |      25.00% |      50.00% |      75.00% |     100.00% | Average     |
|--------------|------------------|------------:|------------:|------------:|------------:|-------------|
| FBIS         | 1_4_0            |        3.45 | 7.446666667 | 9.773489933 | 9.620300752 | 7.572614338 |
| FBIS         | reduce_proximity | 2.983333333 | 5.316666667 | 6.911073826 | 7.637218045 | 5.712072968 |
| FR94         | 1_4_0            | 2.236666667 |        4.45 | 5.523489933 | 4.560150376 | 4.192576744 |
| FR94         | reduce_proximity |        2.09 | 3.991666667 | 4.981543624 | 4.266917293 | 3.832531896 |
| FT           | 1_4_0            | 5.956666667 | 9.656666667 | 13.86912752 | 10.83270677 |  10.0787919 |
| FT           | reduce_proximity |        4.51 | 5.981666667 | 7.701342282 | 6.766917293 |  6.23998156 |
| LAT          | 1_4_0            | 5.856666667 | 9.233333333 | 12.98322148 | 10.78759398 | 9.715203865 |
| LAT          | reduce_proximity |        6.91 | 6.706666667 | 8.463087248 | 8.265037594 | 7.586197877 |

## Technical approach

- Ensure the MAX_DISTANCE constant is used everywhere needed
- Reduce the MAX_DISTANCE from 8 to 4

## Related

TBD

Co-authored-by: ManyTheFish <many@meilisearch.com>
2023-10-18 14:56:08 +00:00
ManyTheFish
27eec21415 Fix tests 2023-10-18 16:03:22 +02:00
Clément Renault
62dfd09dc6 Add more puffin logs to the deletion functions 2023-10-13 13:11:09 +02:00
meili-bors[bot]
f343ef5f2f Merge #4108
4108: Fix bug where search with distinct attribute and no ranking, returns offset+limit hits r=curquiza a=vivek-26

# Pull Request

## Related issue
Fixes #4078 

## What does this PR do?
This PR - 
- Fixes bug where search with distinct attribute and no ranking, returns offset+limit hits.
- Adds unit and integration tests.

## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!


Co-authored-by: Vivek Kumar <vivek.26@outlook.com>
2023-10-12 07:51:29 +00:00
Vivek Kumar
19ba129165 add unit test for distinct search with no ranking 2023-10-11 19:02:27 +05:30
Vivek Kumar
d4da06ff47 fix bug where distinct search with no ranking returns offset+limit hits 2023-10-11 19:02:16 +05:30
Tamo
c0f2724c2d get rids of the new introduced error code in favor of an io::Error 2023-10-10 15:12:23 +02:00
Tamo
d772073dfa use a bufreader everytime there is a grenad<file> 2023-10-10 15:00:30 +02:00
ManyTheFish
43989fe2e4 Reduce porximity range from 7 to 3 2023-10-03 12:16:48 +02:00
meili-bors[bot]
487d493f49 Merge #4043
4043: Bring back hotfixes from v1.3.3 into v1.4.0 r=Kerollmops a=curquiza



Co-authored-by: curquiza <curquiza@users.noreply.github.com>
Co-authored-by: meili-bors[bot] <89034592+meili-bors[bot]@users.noreply.github.com>
Co-authored-by: Kerollmops <clement@meilisearch.com>
Co-authored-by: curquiza <clementine@meilisearch.com>
2023-09-11 12:27:34 +00:00
Vivek Kumar
abfa7ded25 use a new temp index in the test 2023-09-08 12:32:47 +05:30
Vivek Kumar
f2837aaec2 add another test case 2023-09-08 11:39:54 +05:30
Vivek Kumar
11df155598 fix highlighting bug when searching for a phrase with cropping 2023-09-08 11:39:52 +05:30
meili-bors[bot]
256cf33bca Merge #4039
4039: Fix multiple vectors dimensions r=ManyTheFish a=Kerollmops

This PR fixes #4035, making providing multiple vectors in documents possible. This is fixed by extracting the vectors from the non-flattened version of the documents.

Co-authored-by: Kerollmops <clement@meilisearch.com>
2023-09-07 09:25:58 +00:00
Kerollmops
679c0b0f97 Extract the vectors from the non-flattened version of the documents 2023-09-06 12:26:00 +02:00
Kerollmops
e02d0064bd Add a test case scenario 2023-09-06 12:26:00 +02:00
meili-bors[bot]
dc3d9c90d9 Merge #3994
3994: Fix synonyms with separators r=Kerollmops a=ManyTheFish

# Pull Request

## Related issue
Fixes #3977

## Available prototype
```
$ docker pull getmeili/meilisearch:prototype-fix-synonyms-with-separators-0
```

## What does this PR do?
- add a new test
- filter the empty synonyms after normalization


Co-authored-by: ManyTheFish <many@meilisearch.com>
2023-09-05 14:42:46 +00:00
ManyTheFish
66aa6d5871 Ignore tokens with empty normalized value during indexing process 2023-09-05 15:44:14 +02:00
Kerollmops
8ac5b765bc Fix synonyms normalization 2023-09-04 16:12:48 +02:00
Kerollmops
085aad0a94 Add a test 2023-09-04 14:39:33 +02:00
meili-bors[bot]
ccf3ba3f32 Merge #4019
4019: Bringing back changes from `v1.3.2` onto `main` r=irevoire a=Kerollmops



Co-authored-by: Kerollmops <clement@meilisearch.com>
Co-authored-by: meili-bors[bot] <89034592+meili-bors[bot]@users.noreply.github.com>
Co-authored-by: irevoire <irevoire@users.noreply.github.com>
Co-authored-by: Clément Renault <clement@meilisearch.com>
2023-08-28 12:14:11 +00:00
Clément Renault
8c0ebd1331 Update milli/src/search/new/bucket_sort.rs
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2023-08-23 16:40:39 +02:00
Kerollmops
5130e06b41 Temporarily disable an assert in the ranking rules 2023-08-23 16:11:54 +02:00
meili-bors[bot]
914b125c5f Merge #3945
3945: Do not leak field information on error r=Kerollmops a=vivek-26

# Pull Request

## Related issue
Fixes #3865

## What does this PR do?
This PR ensures that `InvalidSortableAttribute`and `InvalidFacetSearchFacetName` errors do not leak field information i.e. fields which are not part of `displayedAttributes` in the settings are hidden from the error message.

## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!


Co-authored-by: Vivek Kumar <vivek.26@outlook.com>
2023-08-22 18:55:27 +00:00
Kerollmops
717b069907 Bump charabia to 0.8.3 2023-08-22 16:25:00 +02:00
Kerollmops
c53841e166 Accept the null JSON value as the value of _vectors 2023-08-14 16:03:55 +02:00
ManyTheFish
cab27c2ab4 upgrade indexmap = "2.0.0" 2023-08-10 18:09:02 +02:00
ManyTheFish
624fa9052f upgrade deserr = "0.6.0" 2023-08-10 18:09:02 +02:00
ManyTheFish
60c11dbdbd upgrade rstar - "0.11.0" 2023-08-10 18:09:02 +02:00
ManyTheFish
dacee40ebc upgrade memmap2 = "0.7.1" 2023-08-10 18:09:02 +02:00
ManyTheFish
cc2c19d4c3 upgrade itertools = "0.10.5" 2023-08-10 18:09:02 +02:00
meili-bors[bot]
e4e49e63d0 Merge #3993
3993: Bringing back changes from v1.3.1 to `main` r=irevoire a=curquiza



Co-authored-by: irevoire <irevoire@users.noreply.github.com>
Co-authored-by: meili-bors[bot] <89034592+meili-bors[bot]@users.noreply.github.com>
Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: ManyTheFish <many@meilisearch.com>
2023-08-10 14:30:02 +00:00
ManyTheFish
5a7c1bde84 Fix clippy 2023-08-10 11:27:56 +02:00
ManyTheFish
6b2d671be7 Fix PR comments 2023-08-10 10:44:07 +02:00
Many the fish
43c13faeda Update milli/src/update/index_documents/extract/extract_docid_word_positions.rs
Co-authored-by: Tamo <tamo@meilisearch.com>
2023-08-10 10:05:03 +02:00
meili-bors[bot]
44c1900f36 Merge #3986
3986: Fix geo bounding box with strings r=ManyTheFish a=irevoire

# Pull Request

When sending a document with one geofield of type string (i.e.: `{ "_geo": { "lat": 12, "lng": "13" }}`), the geobounding box would exclude this document.

This PR fixes this issue by automatically parsing the string value in case we're working on a geofield.

## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/3973

## What does this PR do?
- Automatically parse the facet value iif we're working on a geofield.
- Make insta works with snapshots in loops or closure executed multiple times. (you may need to update your cli if it panics after this PR: `cargo install cargo-insta`).
- Add one integration test in milli and in meilisearch to ensure it works forever.
- Add three snapshots for the dump that mysteriously disappeared I don't know how


Co-authored-by: Tamo <tamo@meilisearch.com>
2023-08-09 07:58:15 +00:00
ManyTheFish
8dc5acf998 Try fix 2023-08-08 16:52:36 +02:00
ManyTheFish
35758db9ec Truncate the the normalized long facets used in search for facet value 2023-08-08 16:38:30 +02:00
Tamo
4988199bb9 ensure the geoboundingbox works with strings and int geofields in milli and meilisearch 2023-08-08 16:29:25 +02:00
Tamo
9d061cec26 automatically parse the filterable attribute to float if it's a geo field 2023-08-08 16:28:07 +02:00
ManyTheFish
4a21fecf67 Merge branch 'main' into settings-customizing-tokenization 2023-08-08 16:08:16 +02:00