mirror of
https://github.com/meilisearch/meilisearch.git
synced 2025-07-28 01:01:00 +00:00
Merge #4131
4131: Reduce proximity range from 7 to 3 r=Kerollmops a=ManyTheFish ## Summary This PR aims to reduce the impact of the proximity databases on the indexing time and on the database size by reducing the maximum distance between two words to be indexed in the proximity database. ## Stats ### Impact on database size and indexing time  ### Impact on search relevancy <details> | dataset_name | host_name | Relevancy rate (Precision) | completion_rate 25.00% | completion_rate 50.00% | completion_rate 75.00% | completion_rate 100.00% | |--------------|------------------|------------------------------------|-----------------|-----------------|-----------------|-----------------| | FBIS | 1_4_0 | percentile-10 | 0.00% | 0.00% | 0.00% | 0.00% | | FBIS | 1_4_0 | percentile-25 | 0.00% | 0.00% | 0.00% | 0.00% | | FBIS | 1_4_0 | percentile-50 | 0.00% | 0.00% | 5.00% | 5.56% | | FBIS | 1_4_0 | percentile-75 | 0.00% | 12.50% | 35.00% | 45.00% | | FBIS | 1_4_0 | percentile-90 | 20.00% | 40.00% | | 100.00% | | FBIS | 1_4_0 | average | 5.78% | 11.16% | 21.90% | 26.29% | | FBIS | reduce_proximity | percentile-10 | 0.00% | 0.00% | 0.00% | 0.00% | | FBIS | reduce_proximity | percentile-25 | 0.00% | 0.00% | 0.00% | 0.00% | | FBIS | reduce_proximity | percentile-50 | 0.00% | 0.00% | 5.00% | 5.56% | | FBIS | reduce_proximity | percentile-75 | 0.00% | 15.00% | 35.00% | 40.00% | | FBIS | reduce_proximity | percentile-90 | 20.00% | 40.00% | 85.00% | 100.00% | | FBIS | reduce_proximity | average | 5.55% | 11.34% | 21.75% | 26.14% | | FR94 | 1_4_0 | percentile-10 | 0.00% | 0.00% | 0.00% | 0.00% | | FR94 | 1_4_0 | percentile-25 | 0.00% | 0.00% | 0.00% | 0.00% | | FR94 | 1_4_0 | percentile-50 | 0.00% | 0.00% | 0.00% | 0.00% | | FR94 | 1_4_0 | percentile-75 | 0.00% | 5.00% | 15.00% | 42.11% | | FR94 | 1_4_0 | percentile-90 | 15.00% | 54.55% | 100.00% | 100.00% | | FR94 | 1_4_0 | average | 5.95% | 12.07% | 18.70% | 25.57% | | FR94 | reduce_proximity | percentile-10 | 0.00% | 0.00% | 0.00% | 0.00% | | FR94 | reduce_proximity | percentile-25 | 0.00% | 0.00% | 0.00% | 0.00% | | FR94 | reduce_proximity | percentile-50 | 0.00% | 0.00% | 0.00% | 0.00% | | FR94 | reduce_proximity | percentile-75 | 0.00% | 5.00% | 15.00% | 42.11% | | FR94 | reduce_proximity | percentile-90 | 15.00% | 54.55% | 100.00% | 100.00% | | FR94 | reduce_proximity | average | 5.79% | 12.00% | 18.70% | 25.53% | | FT | 1_4_0 | percentile-10 | 0.00% | 0.00% | 0.00% | 0.00% | | FT | 1_4_0 | percentile-25 | 0.00% | 0.00% | 0.00% | 0.00% | | FT | 1_4_0 | percentile-50 | 0.00% | 0.00% | 5.00% | 10.00% | | FT | 1_4_0 | percentile-75 | 0.00% | 15.00% | 30.00% | 40.00% | | FT | 1_4_0 | percentile-90 | 20.00% | 50.00% | 65.00% | 100.00% | | FT | 1_4_0 | average | 5.08% | 12.58% | 20.00% | 25.49% | | FT | reduce_proximity | percentile-10 | 0.00% | 0.00% | 0.00% | 0.00% | | FT | reduce_proximity | percentile-25 | 0.00% | 0.00% | 0.00% | 0.00% | | FT | reduce_proximity | percentile-50 | 0.00% | 0.00% | 5.00% | 10.00% | | FT | reduce_proximity | percentile-75 | 0.00% | 15.00% | 30.00% | 40.00% | | FT | reduce_proximity | percentile-90 | 10.00% | 45.00% | 60.00% | 100.00% | | FT | reduce_proximity | average | 5.01% | 12.64% | 20.10% | 25.53% | | LAT | 1_4_0 | percentile-10 | 0.00% | 0.00% | 0.00% | 0.00% | | LAT | 1_4_0 | percentile-25 | 0.00% | 0.00% | 0.00% | 0.00% | | LAT | 1_4_0 | percentile-50 | 0.00% | 0.00% | 5.00% | 5.00% | | LAT | 1_4_0 | percentile-75 | 5.00% | 15.00% | 30.00% | 30.00% | | LAT | 1_4_0 | percentile-90 | 15.00% | 45.00% | 60.00% | 80.00% | | LAT | 1_4_0 | average | 4.80% | 11.80% | 17.88% | 21.62% | | LAT | reduce_proximity | percentile-10 | 0.00% | 0.00% | 0.00% | 0.00% | | LAT | reduce_proximity | percentile-25 | 0.00% | 0.00% | 0.00% | 0.00% | | LAT | reduce_proximity | percentile-50 | 0.00% | 0.00% | 5.00% | 5.00% | | LAT | reduce_proximity | percentile-75 | 0.00% | 11.11% | 25.00% | 35.00% | | LAT | reduce_proximity | percentile-90 | 15.00% | 45.00% | 55.00% | 80.00% | | LAT | reduce_proximity | average | 4.43% | 11.23% | 17.32% | 21.45% | </details> ### Impact on Search time | dataset_name | host_name | 25.00% | 50.00% | 75.00% | 100.00% | Average | |--------------|------------------|------------:|------------:|------------:|------------:|-------------| | FBIS | 1_4_0 | 3.45 | 7.446666667 | 9.773489933 | 9.620300752 | 7.572614338 | | FBIS | reduce_proximity | 2.983333333 | 5.316666667 | 6.911073826 | 7.637218045 | 5.712072968 | | FR94 | 1_4_0 | 2.236666667 | 4.45 | 5.523489933 | 4.560150376 | 4.192576744 | | FR94 | reduce_proximity | 2.09 | 3.991666667 | 4.981543624 | 4.266917293 | 3.832531896 | | FT | 1_4_0 | 5.956666667 | 9.656666667 | 13.86912752 | 10.83270677 | 10.0787919 | | FT | reduce_proximity | 4.51 | 5.981666667 | 7.701342282 | 6.766917293 | 6.23998156 | | LAT | 1_4_0 | 5.856666667 | 9.233333333 | 12.98322148 | 10.78759398 | 9.715203865 | | LAT | reduce_proximity | 6.91 | 6.706666667 | 8.463087248 | 8.265037594 | 7.586197877 | ## Technical approach - Ensure the MAX_DISTANCE constant is used everywhere needed - Reduce the MAX_DISTANCE from 8 to 4 ## Related TBD Co-authored-by: ManyTheFish <many@meilisearch.com>
This commit is contained in:
@ -20,7 +20,4 @@ source: milli/src/update/prefix_word_pairs/mod.rs
|
||||
3 at a [100, ]
|
||||
3 rings a [101, ]
|
||||
3 the a [101, ]
|
||||
4 at b [100, ]
|
||||
4 at be [100, ]
|
||||
4 bell a [101, ]
|
||||
|
||||
|
@ -30,10 +30,4 @@ source: milli/src/update/prefix_word_pairs/mod.rs
|
||||
3 bell 5 [101, ]
|
||||
3 rings am [101, ]
|
||||
3 the at [101, ]
|
||||
4 an house [100, ]
|
||||
4 at beautiful [100, ]
|
||||
4 bell am [101, ]
|
||||
4 the 5 [101, ]
|
||||
5 at house [100, ]
|
||||
5 the am [101, ]
|
||||
|
||||
|
@ -28,8 +28,4 @@ source: milli/src/update/prefix_word_pairs/mod.rs
|
||||
3 rings a [101, ]
|
||||
3 rings am [101, ]
|
||||
3 the a [101, ]
|
||||
4 at b [100, ]
|
||||
4 at be [100, ]
|
||||
4 bell a [101, ]
|
||||
4 bell am [101, ]
|
||||
|
||||
|
@ -7,5 +7,4 @@ source: milli/src/update/prefix_word_pairs/mod.rs
|
||||
2 bell a [51, ]
|
||||
3 rings a [51, ]
|
||||
3 the a [51, ]
|
||||
4 bell a [51, ]
|
||||
|
||||
|
@ -12,5 +12,4 @@ source: milli/src/update/prefix_word_pairs/mod.rs
|
||||
3 at a [50, ]
|
||||
3 rings a [51, ]
|
||||
3 the a [51, ]
|
||||
4 bell a [51, ]
|
||||
|
||||
|
@ -7,5 +7,4 @@ source: milli/src/update/prefix_word_pairs/mod.rs
|
||||
2 bell a [51, ]
|
||||
3 rings a [51, ]
|
||||
3 the a [51, ]
|
||||
4 bell a [51, ]
|
||||
|
||||
|
@ -12,5 +12,4 @@ source: milli/src/update/prefix_word_pairs/mod.rs
|
||||
3 at a [50, ]
|
||||
3 rings a [51, ]
|
||||
3 the a [51, ]
|
||||
4 bell a [51, ]
|
||||
|
||||
|
@ -12,5 +12,4 @@ source: milli/src/update/prefix_word_pairs/mod.rs
|
||||
3 at a [50, ]
|
||||
3 rings a [51, ]
|
||||
3 the a [51, ]
|
||||
4 bell a [51, ]
|
||||
|
||||
|
@ -16,6 +16,4 @@ source: milli/src/update/prefix_word_pairs/mod.rs
|
||||
3 at a [50, ]
|
||||
3 rings a [51, ]
|
||||
3 the a [51, ]
|
||||
4 at b [50, ]
|
||||
4 bell a [51, ]
|
||||
|
||||
|
@ -12,5 +12,4 @@ source: milli/src/update/prefix_word_pairs/mod.rs
|
||||
3 at a [50, ]
|
||||
3 rings a [51, ]
|
||||
3 the a [51, ]
|
||||
4 bell a [51, ]
|
||||
|
||||
|
Reference in New Issue
Block a user