ManyTheFish
ba90a5ec0e
update extract fid word count docids
2023-10-30 11:39:31 +01:00

Louis Dureuil
b26dc9aabe
Explanatory code comment
2023-10-30 11:39:31 +01:00

Louis Dureuil
66abac9364
Use specialized KvReaderDelAdd type

Co-authored-by: Clément Renault <clement@meilisearch.com>
2023-10-30 11:39:31 +01:00

Louis Dureuil
59f88c14b3
Simplify facet update after removing Index::faceted_documents_ids
2023-10-30 11:39:29 +01:00

Louis Dureuil
14832cb324
Remove Index::faceted_documents_ids
2023-10-30 11:37:32 +01:00

Louis Dureuil
04ec293024
Facet Incremental update
2023-10-30 11:37:30 +01:00

Louis Dureuil
f67ff3a738
Facets Bulk update
2023-10-30 11:36:40 +01:00

Clément Renault
560e8f5613
Introduce the CboRoaringBitmapCodec merge_deladd_into and use it
2023-10-30 11:34:55 +01:00

Clément Renault
2d3f15f82c
Introduce a function to only serialize the Add side of a DelAdd obkv
2023-10-30 11:34:55 +01:00

Clément Renault
40186bf403
Rename FieldIdWordCountDocids correctly
2023-10-30 11:34:50 +01:00

ManyTheFish
87e3d27878
update extract word pair proximity to support deladd obkvs
2023-10-30 11:34:02 +01:00

ManyTheFish
6bcf8b4f8c
update extract word position docids
2023-10-30 11:34:02 +01:00

ManyTheFish
46aa75abdb
update extract word docids
2023-10-30 11:34:02 +01:00

ManyTheFish
2597bbd107
Make the script language docids map take a tuple of roaring bitmaps expressing the deletions and the additions
2023-10-30 11:34:00 +01:00

Clément Renault
e2bc054604
Update extract_facet_string_docids to support deladd obkvs
2023-10-30 11:32:36 +01:00

Clément Renault
fcd3a1434d
Update extract_facet_number_docids to support deladd obkvs
2023-10-30 11:31:04 +01:00

Clément Renault
a82dee21e0
Rename docid_fid into fid_docid
2023-10-30 11:31:02 +01:00

Clément Renault
bc45c1206d
Implement all the facet extraction paths and simplify them
2023-10-30 11:29:08 +01:00

Clément Renault
6ae4100f07
Generate the DelAdd for is_null, is_empty, and exists
2023-10-30 11:29:08 +01:00

Clément Renault
0c47defeee
Work on fid docid facet values rewrite
2023-10-30 11:29:06 +01:00

ManyTheFish
313b16bec2
Support diff indexing on extract_docid_word_positions
2023-10-30 11:24:19 +01:00

ManyTheFish
1dd97578a8
Make the transform struct return diff-based documents obkvs
2023-10-30 11:22:07 +01:00

ManyTheFish
f5ef69293b
deactivate prefix dbs
2023-10-30 11:22:07 +01:00

ManyTheFish
1c5705c164
clean PR warnings
2023-10-30 11:22:05 +01:00

ManyTheFish
66c2c82a18
Split wpp in several sorters
2023-10-30 11:15:02 +01:00

ManyTheFish
28a8d0ccda
Fix word pair proximity
2023-10-30 11:15:02 +01:00

ManyTheFish
96be85396d
Use a VecDeque in wpp database
2023-10-30 11:15:02 +01:00

ManyTheFish
df9e5c8651
Generalize the usage of the CboRoaringBitmap codec to ease its use
2023-10-30 11:15:02 +01:00

ManyTheFish
b541d48847
Add buffer to the obkv writer
2023-10-30 11:15:02 +01:00

ManyTheFish
8ccf32d1a0
Compute word_fid_docids before word_docids and exact_word_docids
2023-10-30 11:15:02 +01:00

ManyTheFish
db1ca21231
add puffin in sorter into reader function
2023-10-30 11:15:00 +01:00

ManyTheFish
11ea5acff9
Fix
2023-10-30 11:13:10 +01:00

ManyTheFish
8d77736a67
Fix fid_word_docids
2023-10-30 11:13:10 +01:00

ManyTheFish
748b333161
Add useful debug assert before key insertion in database
2023-10-30 11:13:10 +01:00

ManyTheFish
17b647dfe5
Wip
2023-10-30 11:13:08 +01:00

meili-bors[bot]
5e0485d8dd
Merge #4131

4131: Reduce proximity range from 7 to 3 r=Kerollmops a=ManyTheFish
## Summary
This PR aims to reduce the impact of the proximity databases on the indexing time and on the database size by reducing the maximum distance between two words to be indexed in the proximity database.
## Stats
### Impact on database size and indexing time

### Impact on search relevancy
<details>
| dataset_name | host_name        | Relevancy rate (Precision) | completion_rate  25.00% | completion_rate 50.00% | completion_rate 75.00% | completion_rate 100.00% |
|--------------|------------------|------------------------------------|-----------------|-----------------|-----------------|-----------------|
| FBIS         | 1_4_0            | percentile-10 |           0.00% |           0.00% |           0.00% |           0.00% |
| FBIS         | 1_4_0            | percentile-25 |           0.00% |           0.00% |           0.00% |           0.00% |
| FBIS         | 1_4_0            | percentile-50 |           0.00% |           0.00% |           5.00% |           5.56% |
| FBIS         | 1_4_0            | percentile-75 |           0.00% |          12.50% |          35.00% |          45.00% |
| FBIS         | 1_4_0            | percentile-90 |          20.00% |          40.00% |                 |         100.00% |
| FBIS         | 1_4_0            | average       |           5.78% |          11.16% |          21.90% |          26.29% |
| FBIS         | reduce_proximity | percentile-10 |           0.00% |           0.00% |           0.00% |           0.00% |
| FBIS         | reduce_proximity | percentile-25 |           0.00% |           0.00% |           0.00% |           0.00% |
| FBIS         | reduce_proximity | percentile-50 |           0.00% |           0.00% |           5.00% |           5.56% |
| FBIS         | reduce_proximity | percentile-75 |           0.00% |          15.00% |          35.00% |          40.00% |
| FBIS         | reduce_proximity | percentile-90 |          20.00% |          40.00% |          85.00% |         100.00% |
| FBIS         | reduce_proximity | average       |           5.55% |          11.34% |          21.75% |          26.14% |
| FR94         | 1_4_0            | percentile-10 |           0.00% |           0.00% |           0.00% |           0.00% |
| FR94         | 1_4_0            | percentile-25 |           0.00% |           0.00% |           0.00% |           0.00% |
| FR94         | 1_4_0            | percentile-50 |           0.00% |           0.00% |           0.00% |           0.00% |
| FR94         | 1_4_0            | percentile-75 |           0.00% |           5.00% |          15.00% |          42.11% |
| FR94         | 1_4_0            | percentile-90 |          15.00% |          54.55% |         100.00% |         100.00% |
| FR94         | 1_4_0            | average       |           5.95% |          12.07% |          18.70% |          25.57% |
| FR94         | reduce_proximity | percentile-10 |           0.00% |           0.00% |           0.00% |           0.00% |
| FR94         | reduce_proximity | percentile-25 |           0.00% |           0.00% |           0.00% |           0.00% |
| FR94         | reduce_proximity | percentile-50 |           0.00% |           0.00% |           0.00% |           0.00% |
| FR94         | reduce_proximity | percentile-75 |           0.00% |           5.00% |          15.00% |          42.11% |
| FR94         | reduce_proximity | percentile-90 |          15.00% |          54.55% |         100.00% |         100.00% |
| FR94         | reduce_proximity | average       |           5.79% |          12.00% |          18.70% |          25.53% |
| FT           | 1_4_0            | percentile-10 |           0.00% |           0.00% |           0.00% |           0.00% |
| FT           | 1_4_0            | percentile-25 |           0.00% |           0.00% |           0.00% |           0.00% |
| FT           | 1_4_0            | percentile-50 |           0.00% |           0.00% |           5.00% |          10.00% |
| FT           | 1_4_0            | percentile-75 |           0.00% |          15.00% |          30.00% |          40.00% |
| FT           | 1_4_0            | percentile-90 |          20.00% |          50.00% |          65.00% |         100.00% |
| FT           | 1_4_0            | average       |           5.08% |          12.58% |          20.00% |          25.49% |
| FT           | reduce_proximity | percentile-10 |           0.00% |           0.00% |           0.00% |           0.00% |
| FT           | reduce_proximity | percentile-25 |           0.00% |           0.00% |           0.00% |           0.00% |
| FT           | reduce_proximity | percentile-50 |           0.00% |           0.00% |           5.00% |          10.00% |
| FT           | reduce_proximity | percentile-75 |           0.00% |          15.00% |          30.00% |          40.00% |
| FT           | reduce_proximity | percentile-90 |          10.00% |          45.00% |          60.00% |         100.00% |
| FT           | reduce_proximity | average       |           5.01% |          12.64% |          20.10% |          25.53% |
| LAT          | 1_4_0            | percentile-10 |           0.00% |           0.00% |           0.00% |           0.00% |
| LAT          | 1_4_0            | percentile-25 |           0.00% |           0.00% |           0.00% |           0.00% |
| LAT          | 1_4_0            | percentile-50 |           0.00% |           0.00% |           5.00% |           5.00% |
| LAT          | 1_4_0            | percentile-75 |           5.00% |          15.00% |          30.00% |          30.00% |
| LAT          | 1_4_0            | percentile-90 |          15.00% |          45.00% |          60.00% |          80.00% |
| LAT          | 1_4_0            | average       |           4.80% |          11.80% |          17.88% |          21.62% |
| LAT          | reduce_proximity | percentile-10 |           0.00% |           0.00% |           0.00% |           0.00% |
| LAT          | reduce_proximity | percentile-25 |           0.00% |           0.00% |           0.00% |           0.00% |
| LAT          | reduce_proximity | percentile-50 |           0.00% |           0.00% |           5.00% |           5.00% |
| LAT          | reduce_proximity | percentile-75 |           0.00% |          11.11% |          25.00% |          35.00% |
| LAT          | reduce_proximity | percentile-90 |          15.00% |          45.00% |          55.00% |          80.00% |
| LAT          | reduce_proximity | average       |           4.43% |          11.23% |          17.32% |          21.45% |
</details>
### Impact on Search time
| dataset_name | host_name        | completion_rate 25.00% | completion_rate 50.00% | completion_rate 75.00% | completion_rate 100.00% | Average     |
|--------------|------------------|------------:|------------:|------------:|------------:|-------------|
| FBIS         | 1_4_0            |        3.45 | 7.446666667 | 9.773489933 | 9.620300752 | 7.572614338 |
| FBIS         | reduce_proximity | 2.983333333 | 5.316666667 | 6.911073826 | 7.637218045 | 5.712072968 |
| FR94         | 1_4_0            | 2.236666667 |        4.45 | 5.523489933 | 4.560150376 | 4.192576744 |
| FR94         | reduce_proximity |        2.09 | 3.991666667 | 4.981543624 | 4.266917293 | 3.832531896 |
| FT           | 1_4_0            | 5.956666667 | 9.656666667 | 13.86912752 | 10.83270677 |  10.0787919 |
| FT           | reduce_proximity |        4.51 | 5.981666667 | 7.701342282 | 6.766917293 |  6.23998156 |
| LAT          | 1_4_0            | 5.856666667 | 9.233333333 | 12.98322148 | 10.78759398 | 9.715203865 |
| LAT          | reduce_proximity |        6.91 | 6.706666667 | 8.463087248 | 8.265037594 | 7.586197877 |
## Technical approach
- Ensure the MAX_DISTANCE constant is used everywhere needed
- Reduce the MAX_DISTANCE from 8 to 4
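A minimal sketch of what the smaller constant means for indexing, assuming that the proximity of a word pair is the forward distance between the two word positions and that pairs whose proximity reaches `MAX_DISTANCE` are not stored. The `word_pair_proximities` function and the left-to-right-only rule are hypothetical simplifications, not milli's actual extractor.

```rust
use std::collections::BTreeMap;

// Value after this PR. In this sketch, pairs at distance >= MAX_DISTANCE are
// skipped, so only proximities 1, 2 and 3 are indexed (previously 1..=7 with 8).
const MAX_DISTANCE: u32 = 4;

// Hypothetical extractor: for every ordered (left, right) word pair closer
// than MAX_DISTANCE, keep the smallest proximity seen in the document.
fn word_pair_proximities(words: &[&str]) -> BTreeMap<(String, String), u32> {
    let mut pairs = BTreeMap::new();
    for (i, left) in words.iter().enumerate() {
        for (j, right) in words.iter().enumerate().skip(i + 1) {
            let proximity = (j - i) as u32;
            if proximity >= MAX_DISTANCE {
                // Positions only grow from here, so no closer pair remains.
                break;
            }
            let best = pairs
                .entry((left.to_string(), right.to_string()))
                .or_insert(proximity);
            *best = (*best).min(proximity);
        }
    }
    pairs
}

fn main() {
    let words = ["the", "quick", "brown", "fox", "jumps"];
    let pairs = word_pair_proximities(&words);
    for ((left, right), proximity) in &pairs {
        println!("{left} {right} -> {proximity}");
    }
    // ("the", "jumps") is 4 positions apart, so it is not indexed: 9 pairs.
    println!("{} pairs", pairs.len());
}
```

In this sketch the inner loop looks at most `MAX_DISTANCE - 1` positions ahead, so going from 8 to 4 roughly halves the pairs emitted per word in long documents.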
## Related
TBD
Co-authored-by: ManyTheFish <many@meilisearch.com>
2023-10-18 14:56:08 +00:00

ManyTheFish
27eec21415
Fix tests
2023-10-18 16:03:22 +02:00

Clément Renault
62dfd09dc6
Add more puffin logs to the deletion functions
2023-10-13 13:11:09 +02:00

meili-bors[bot]
f343ef5f2f
Merge #4108

4108: Fix bug where a search with a distinct attribute and no ranking returns offset+limit hits r=curquiza a=vivek-26
# Pull Request
## Related issue
Fixes #4078
## What does this PR do?
This PR:
- Fixes a bug where a search with a distinct attribute and no ranking returns offset+limit hits.
- Adds unit and integration tests.
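The expected behavior can be illustrated with a small sketch, assuming hits are first deduplicated on the distinct attribute and only then paginated, so that at most `limit` hits come back after skipping `offset`. `Doc` and `paginate_distinct` are hypothetical names, not Meilisearch's search code.

```rust
use std::collections::HashSet;

// Hypothetical hit: a document id and the value of its distinct attribute.
struct Doc {
    id: u32,
    distinct_value: &'static str,
}

// Keeps the first document for each distinct value, then applies the
// pagination window: skip `offset` hits and return at most `limit` of them.
fn paginate_distinct(docs: &[Doc], offset: usize, limit: usize) -> Vec<u32> {
    let mut seen = HashSet::new();
    docs.iter()
        .filter(|doc| seen.insert(doc.distinct_value))
        .skip(offset)
        .take(limit)
        .map(|doc| doc.id)
        .collect()
}

fn main() {
    let docs = [
        Doc { id: 0, distinct_value: "a" },
        Doc { id: 1, distinct_value: "a" },
        Doc { id: 2, distinct_value: "b" },
        Doc { id: 3, distinct_value: "c" },
        Doc { id: 4, distinct_value: "c" },
        Doc { id: 5, distinct_value: "d" },
    ];
    // Distinct hits are 0, 2, 3, 5; offset 1 and limit 2 select 2 and 3.
    println!("{:?}", paginate_distinct(&docs, 1, 2));
}
```

In this sketch, the reported symptom would correspond to taking `offset + limit` distinct hits without skipping the first `offset` ones.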
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: Vivek Kumar <vivek.26@outlook.com>
2023-10-12 07:51:29 +00:00

Vivek Kumar
19ba129165
add unit test for distinct search with no ranking
2023-10-11 19:02:27 +05:30

Vivek Kumar
d4da06ff47
fix bug where distinct search with no ranking returns offset+limit hits
2023-10-11 19:02:16 +05:30

Tamo
c0f2724c2d
get rid of the newly introduced error code in favor of an io::Error
2023-10-10 15:12:23 +02:00

Tamo
d772073dfa
use a bufreader every time there is a grenad<file>
2023-10-10 15:00:30 +02:00

ManyTheFish
43989fe2e4
Reduce proximity range from 7 to 3
2023-10-03 12:16:48 +02:00

meili-bors[bot]
487d493f49
Merge #4043

4043: Bring back hotfixes from v1.3.3 into v1.4.0 r=Kerollmops a=curquiza
Co-authored-by: curquiza <curquiza@users.noreply.github.com>
Co-authored-by: meili-bors[bot] <89034592+meili-bors[bot]@users.noreply.github.com>
Co-authored-by: Kerollmops <clement@meilisearch.com>
Co-authored-by: curquiza <clementine@meilisearch.com>
2023-09-11 12:27:34 +00:00

Vivek Kumar
abfa7ded25
use a new temp index in the test
2023-09-08 12:32:47 +05:30

Vivek Kumar
f2837aaec2
add another test case
2023-09-08 11:39:54 +05:30

Vivek Kumar
11df155598
fix highlighting bug when searching for a phrase with cropping
2023-09-08 11:39:52 +05:30

meili-bors[bot]
256cf33bca
Merge #4039

4039: Fix multiple vectors dimensions r=ManyTheFish a=Kerollmops
This PR fixes #4035, making it possible to provide multiple vectors in documents. The fix extracts the vectors from the non-flattened version of the documents.
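To see how flattening can break vector dimensions, here is a sketch that assumes the flattening step concatenates nested arrays into a single array. Under that assumption, two 3-dimensional vectors would be read back as one 6-dimensional vector. The `flatten_vectors` and `vector_dimensions` helpers are hypothetical and use plain `Vec`s instead of JSON values.

```rust
// Hypothetical model of a flattened vectors field: nested arrays are
// concatenated, so the boundaries between vectors are lost.
fn flatten_vectors(vectors: &[Vec<f32>]) -> Vec<f32> {
    vectors.iter().flatten().copied().collect()
}

// Reading the non-flattened field keeps each vector, and its dimension, intact.
fn vector_dimensions(vectors: &[Vec<f32>]) -> Vec<usize> {
    vectors.iter().map(Vec::len).collect()
}

fn main() {
    let vectors = vec![vec![0.1, 0.2, 0.3], vec![0.4, 0.5, 0.6]];
    // Non-flattened: two vectors of dimension 3.
    println!("{:?}", vector_dimensions(&vectors));
    // Flattened: a single vector of dimension 6.
    println!("{}", flatten_vectors(&vectors).len());
}
```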
Co-authored-by: Kerollmops <clement@meilisearch.com>
2023-09-07 09:25:58 +00:00

Kerollmops
679c0b0f97
Extract the vectors from the non-flattened version of the documents
2023-09-06 12:26:00 +02:00