6079141ea6
snapshot the scores side by side with the score details
2024-03-19 18:30:14 +01:00
2c3af8e513
query the detailed score detail in the test
2024-03-19 18:09:02 +01:00
b8cda6c300
fix the search cutoff and add a test
2024-03-19 10:35:47 +01:00
4a467739cd
implements a first version of the cutoff without settings
2024-03-19 10:28:21 +01:00
5c95b5c933
chore: remove repetitive words
...
Signed-off-by: shuangcui <fliter@qq.com>
2024-03-14 21:28:55 +08:00
25f64ce7df
Replace logging timer by spans
2024-03-05 11:05:42 +01:00
452a343a2b
Fix imports
2024-02-28 18:09:40 +01:00
e773dfa9ba
get rid of log in milli and add logs for the bucket sort
2024-02-08 15:04:05 +01:00
5f5a486895
Reduce formatting time
2024-01-11 11:36:41 +01:00
5f4fc6c955
Add timer logs
2024-01-11 09:44:16 +01:00
9e1b458010
Merge branch 'main' into change-proximity-precision-settings
2023-12-18 09:08:47 +01:00
6425996e36
Change the naming of attributeScale and wordScale into byAttribute and byWord
2023-12-14 16:31:00 +01:00
806e5b6899
Tests pass
2023-12-14 16:08:41 +01:00
e0cc775dc4
Various changes
...
- DistributionShift in Search object (to be set from model in embed?)
- Fix issue where embedder index wasn't computed at search time
- Accept as default embedder either the "default" one, or the only embedder when there is only one
2023-12-14 16:08:41 +01:00
922a640188
WIP multi embedders
...
fixed template bugs
2023-12-14 16:08:41 +01:00
d4715e0c4d
Fix same vector sort bug
2023-12-14 16:08:41 +01:00
11e2a2c1aa
Fix geosort bug
2023-12-14 16:08:41 +01:00
65e49b7092
Remove stuff, add distribution shift (WIP)
2023-12-14 16:08:38 +01:00
cb4ebe163e
WIP
2023-12-14 16:07:49 +01:00
dde3a04679
WIP arroy integration
2023-12-14 16:07:49 +01:00
13c2c6c16b
Small commit to add hybrid search and autoembedding
2023-12-14 16:07:48 +01:00
56571f762a
Merge remote-tracking branch 'origin/main' into tmp-release-v1.5.1
2023-12-13 11:57:01 +01:00
467b49153d
Implement proximityPrecision setting on milli side
2023-12-06 15:49:02 +01:00
bddc168d83
List TODOs
2023-12-06 14:59:23 +01:00
3b3fa38f27
Put the restrict list in a sub-struct
2023-11-28 18:37:57 +01:00
d6c2ee15a9
Filter on attributes before computing the docids when attribute restriction is on
2023-11-28 14:55:29 +01:00
d32eb11329
Move to the v0.20.0-alpha.9 of heed
2023-11-27 11:52:22 +01:00
58dac8af42
Remove the panics and unwraps
2023-11-23 15:00:48 +01:00
0dbf1a16ff
Make clippy happy
2023-11-23 14:11:38 +01:00
0d4482625a
Make the changes to use heed v0.20-alpha.6
2023-11-23 11:43:58 +01:00
7cb7e37ba8
Merge branch 'main' into tmp-release-v1.5.0
2023-11-21 16:30:46 +01:00
1f36410541
Update tests
2023-11-13 13:36:39 +01:00
8c649d8061
Throw error when the vector search is sent with the wrong size
2023-11-13 09:57:42 +01:00
688266c83e
Remove word pair proximity prefix cache and compute it at search time
2023-11-08 14:16:01 +01:00
94206b0055
Update tests
2023-10-31 13:48:47 +01:00
1c5705c164
clean PR warnings
2023-10-30 11:22:05 +01:00
df9e5c8651
Generalize usage of CboRoaringBitmap codec to ease the use
2023-10-30 11:15:02 +01:00
17b647dfe5
Wip
2023-10-30 11:13:08 +01:00
e7244aa485
fix warnings
2023-10-30 11:00:46 +01:00
2bae9550c8
Add explanatory comment
2023-10-23 12:06:28 +02:00
5fe7c4545a
compute all candidates correctly when skipping
2023-10-23 12:02:45 +02:00
5e0485d8dd
Merge #4131
...
4131: Reduce proximity range from 7 to 3 r=Kerollmops a=ManyTheFish
## Summary
This PR aims to reduce the impact of the proximity databases on the indexing time and on the database size by reducing the maximum distance between two words to be indexed in the proximity database.
## Stats
### Impact on database size and indexing time

### Impact on search relevancy
<details>
| dataset_name | host_name | Relevancy rate (Precision) | completion_rate 25.00% | completion_rate 50.00% | completion_rate 75.00% | completion_rate 100.00% |
|--------------|------------------|------------------------------------|-----------------|-----------------|-----------------|-----------------|
| FBIS | 1_4_0 | percentile-10 | 0.00% | 0.00% | 0.00% | 0.00% |
| FBIS | 1_4_0 | percentile-25 | 0.00% | 0.00% | 0.00% | 0.00% |
| FBIS | 1_4_0 | percentile-50 | 0.00% | 0.00% | 5.00% | 5.56% |
| FBIS | 1_4_0 | percentile-75 | 0.00% | 12.50% | 35.00% | 45.00% |
| FBIS | 1_4_0 | percentile-90 | 20.00% | 40.00% | | 100.00% |
| FBIS | 1_4_0 | average | 5.78% | 11.16% | 21.90% | 26.29% |
| FBIS | reduce_proximity | percentile-10 | 0.00% | 0.00% | 0.00% | 0.00% |
| FBIS | reduce_proximity | percentile-25 | 0.00% | 0.00% | 0.00% | 0.00% |
| FBIS | reduce_proximity | percentile-50 | 0.00% | 0.00% | 5.00% | 5.56% |
| FBIS | reduce_proximity | percentile-75 | 0.00% | 15.00% | 35.00% | 40.00% |
| FBIS | reduce_proximity | percentile-90 | 20.00% | 40.00% | 85.00% | 100.00% |
| FBIS | reduce_proximity | average | 5.55% | 11.34% | 21.75% | 26.14% |
| FR94 | 1_4_0 | percentile-10 | 0.00% | 0.00% | 0.00% | 0.00% |
| FR94 | 1_4_0 | percentile-25 | 0.00% | 0.00% | 0.00% | 0.00% |
| FR94 | 1_4_0 | percentile-50 | 0.00% | 0.00% | 0.00% | 0.00% |
| FR94 | 1_4_0 | percentile-75 | 0.00% | 5.00% | 15.00% | 42.11% |
| FR94 | 1_4_0 | percentile-90 | 15.00% | 54.55% | 100.00% | 100.00% |
| FR94 | 1_4_0 | average | 5.95% | 12.07% | 18.70% | 25.57% |
| FR94 | reduce_proximity | percentile-10 | 0.00% | 0.00% | 0.00% | 0.00% |
| FR94 | reduce_proximity | percentile-25 | 0.00% | 0.00% | 0.00% | 0.00% |
| FR94 | reduce_proximity | percentile-50 | 0.00% | 0.00% | 0.00% | 0.00% |
| FR94 | reduce_proximity | percentile-75 | 0.00% | 5.00% | 15.00% | 42.11% |
| FR94 | reduce_proximity | percentile-90 | 15.00% | 54.55% | 100.00% | 100.00% |
| FR94 | reduce_proximity | average | 5.79% | 12.00% | 18.70% | 25.53% |
| FT | 1_4_0 | percentile-10 | 0.00% | 0.00% | 0.00% | 0.00% |
| FT | 1_4_0 | percentile-25 | 0.00% | 0.00% | 0.00% | 0.00% |
| FT | 1_4_0 | percentile-50 | 0.00% | 0.00% | 5.00% | 10.00% |
| FT | 1_4_0 | percentile-75 | 0.00% | 15.00% | 30.00% | 40.00% |
| FT | 1_4_0 | percentile-90 | 20.00% | 50.00% | 65.00% | 100.00% |
| FT | 1_4_0 | average | 5.08% | 12.58% | 20.00% | 25.49% |
| FT | reduce_proximity | percentile-10 | 0.00% | 0.00% | 0.00% | 0.00% |
| FT | reduce_proximity | percentile-25 | 0.00% | 0.00% | 0.00% | 0.00% |
| FT | reduce_proximity | percentile-50 | 0.00% | 0.00% | 5.00% | 10.00% |
| FT | reduce_proximity | percentile-75 | 0.00% | 15.00% | 30.00% | 40.00% |
| FT | reduce_proximity | percentile-90 | 10.00% | 45.00% | 60.00% | 100.00% |
| FT | reduce_proximity | average | 5.01% | 12.64% | 20.10% | 25.53% |
| LAT | 1_4_0 | percentile-10 | 0.00% | 0.00% | 0.00% | 0.00% |
| LAT | 1_4_0 | percentile-25 | 0.00% | 0.00% | 0.00% | 0.00% |
| LAT | 1_4_0 | percentile-50 | 0.00% | 0.00% | 5.00% | 5.00% |
| LAT | 1_4_0 | percentile-75 | 5.00% | 15.00% | 30.00% | 30.00% |
| LAT | 1_4_0 | percentile-90 | 15.00% | 45.00% | 60.00% | 80.00% |
| LAT | 1_4_0 | average | 4.80% | 11.80% | 17.88% | 21.62% |
| LAT | reduce_proximity | percentile-10 | 0.00% | 0.00% | 0.00% | 0.00% |
| LAT | reduce_proximity | percentile-25 | 0.00% | 0.00% | 0.00% | 0.00% |
| LAT | reduce_proximity | percentile-50 | 0.00% | 0.00% | 5.00% | 5.00% |
| LAT | reduce_proximity | percentile-75 | 0.00% | 11.11% | 25.00% | 35.00% |
| LAT | reduce_proximity | percentile-90 | 15.00% | 45.00% | 55.00% | 80.00% |
| LAT | reduce_proximity | average | 4.43% | 11.23% | 17.32% | 21.45% |
</details>
### Impact on Search time
| dataset_name | host_name | completion_rate 25.00% | completion_rate 50.00% | completion_rate 75.00% | completion_rate 100.00% | Average |
|--------------|------------------|------------:|------------:|------------:|------------:|-------------|
| FBIS | 1_4_0 | 3.45 | 7.446666667 | 9.773489933 | 9.620300752 | 7.572614338 |
| FBIS | reduce_proximity | 2.983333333 | 5.316666667 | 6.911073826 | 7.637218045 | 5.712072968 |
| FR94 | 1_4_0 | 2.236666667 | 4.45 | 5.523489933 | 4.560150376 | 4.192576744 |
| FR94 | reduce_proximity | 2.09 | 3.991666667 | 4.981543624 | 4.266917293 | 3.832531896 |
| FT | 1_4_0 | 5.956666667 | 9.656666667 | 13.86912752 | 10.83270677 | 10.0787919 |
| FT | reduce_proximity | 4.51 | 5.981666667 | 7.701342282 | 6.766917293 | 6.23998156 |
| LAT | 1_4_0 | 5.856666667 | 9.233333333 | 12.98322148 | 10.78759398 | 9.715203865 |
| LAT | reduce_proximity | 6.91 | 6.706666667 | 8.463087248 | 8.265037594 | 7.586197877 |
## Technical approach
- Ensure the MAX_DISTANCE constant is used everywhere needed
- Reduce the MAX_DISTANCE from 8 to 4
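The effect of lowering the constant can be illustrated with a minimal sketch. This is not milli's actual extraction code: the function `word_pair_proximities` and its tuple output are hypothetical, and the only assumption taken from the list above is that `MAX_DISTANCE` acts as an exclusive upper bound, so a value of 4 keeps word pairs at distances 1 to 3.

```rust
/// Assumed to be an exclusive bound: pairs at this distance or beyond are skipped.
const MAX_DISTANCE: u32 = 4;

/// Hypothetical helper: collects `(left, right, distance)` for every ordered
/// pair of words whose distance is strictly below `MAX_DISTANCE`.
fn word_pair_proximities<'a>(words: &[&'a str]) -> Vec<(&'a str, &'a str, u32)> {
    let mut pairs = Vec::new();
    for (i, &left) in words.iter().enumerate() {
        // Walk the words that follow `left`; adjacent words have distance 1.
        for (offset, &right) in words[i + 1..].iter().enumerate() {
            let distance = offset as u32 + 1;
            if distance >= MAX_DISTANCE {
                // Every later word is even further away, so stop early.
                break;
            }
            pairs.push((left, right, distance));
        }
    }
    pairs
}
```

With five consecutive words, this sketch keeps the pairs at distances 1, 2 and 3 and drops the first/last pair at distance 4. Fewer stored pairs is what would shrink a proximity database in such a scheme.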
## Related
TBD
Co-authored-by: ManyTheFish <many@meilisearch.com>
2023-10-18 14:56:08 +00:00
27eec21415
Fix tests
2023-10-18 16:03:22 +02:00
d4da06ff47
fix bug where distinct search with no ranking returns offset+limit hits
2023-10-11 19:02:16 +05:30
43989fe2e4
Reduce proximity range from 7 to 3
2023-10-03 12:16:48 +02:00
abfa7ded25
use a new temp index in the test
2023-09-08 12:32:47 +05:30
f2837aaec2
add another test case
2023-09-08 11:39:54 +05:30
11df155598
fix highlighting bug when searching for a phrase with cropping
2023-09-08 11:39:52 +05:30
ccf3ba3f32
Merge #4019
...
4019: Bringing back changes from `v1.3.2` onto `main` r=irevoire a=Kerollmops
Co-authored-by: Kerollmops <clement@meilisearch.com>
Co-authored-by: meili-bors[bot] <89034592+meili-bors[bot]@users.noreply.github.com>
Co-authored-by: irevoire <irevoire@users.noreply.github.com>
Co-authored-by: Clément Renault <clement@meilisearch.com>
2023-08-28 12:14:11 +00:00
8c0ebd1331
Update milli/src/search/new/bucket_sort.rs
...
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2023-08-23 16:40:39 +02:00