Louis Dureuil
946c762d28
WIP: reset documents in TypedChunk::Documents
2023-10-30 11:40:20 +01:00

Louis Dureuil
cda6ca1ee6
Remove TypedChunk::NewDocumentIds
2023-10-30 11:40:18 +01:00

Louis Dureuil
696fcf4d18
Fix document insertion into LMDB
2023-10-30 11:39:31 +01:00

ManyTheFish
476e4d3dbe
Use value buffer instead of the initial value when writing the final result in the sorter
2023-10-30 11:39:31 +01:00

Clément Renault
576fa9c6da
Remove useless comment
2023-10-30 11:39:31 +01:00

Kerollmops
77dcbff6b2
Remove and Insert the DelAdd geo points
2023-10-30 11:39:31 +01:00

Kerollmops
544440c363
Ignore geo fields when the Del and Add content is the same
2023-10-30 11:39:31 +01:00

Clément Renault
a3dae4db9b
Extract the geo fields DelAdd and generate a new DelAdd obkv with it
2023-10-30 11:39:31 +01:00

ManyTheFish
ba90a5ec0e
update extract fid word count docids
2023-10-30 11:39:31 +01:00

Louis Dureuil
b26dc9aabe
Explanatory code comment
2023-10-30 11:39:31 +01:00
							
Louis Dureuil
66abac9364
Use specialized KvReaderDelAdd type
Co-authored-by: Clément Renault <clement@meilisearch.com>
2023-10-30 11:39:31 +01:00

Louis Dureuil
59f88c14b3
Simplify facet update after removing Index::faceted_documents_ids
2023-10-30 11:39:29 +01:00

Louis Dureuil
14832cb324
Remove Index::faceted_documents_ids
2023-10-30 11:37:32 +01:00

Louis Dureuil
04ec293024
Facet Incremental update
2023-10-30 11:37:30 +01:00

Louis Dureuil
f67ff3a738
Facets Bulk update
2023-10-30 11:36:40 +01:00

Clément Renault
560e8f5613
Introduce the CboRoaringBitmapCodec merge_deladd_into and use it
2023-10-30 11:34:55 +01:00

Clément Renault
2d3f15f82c
Introduce a function to only serialize the Add side of a DelAdd obkv
2023-10-30 11:34:55 +01:00

Clément Renault
40186bf403
Rename FieldIdWordCountDocids correctly
2023-10-30 11:34:50 +01:00

ManyTheFish
87e3d27878
update extract word pair proximity to support deladd obkvs
2023-10-30 11:34:02 +01:00

ManyTheFish
6bcf8b4f8c
update extract word position docids
2023-10-30 11:34:02 +01:00

ManyTheFish
46aa75abdb
update extract word docids
2023-10-30 11:34:02 +01:00
							
ManyTheFish
2597bbd107
Make script language docids map taking a tuple of roaring bitmaps expressing the deletions and the additions
2023-10-30 11:34:00 +01:00

Clément Renault
e2bc054604
Update extract_facet_string_docids to support deladd obkvs
2023-10-30 11:32:36 +01:00

Clément Renault
fcd3a1434d
Update extract_facet_number_docids to support deladd obkvs
2023-10-30 11:31:04 +01:00

Clément Renault
a82dee21e0
Rename docid_fid into fid_docid
2023-10-30 11:31:02 +01:00

Clément Renault
bc45c1206d
Implement all the facet extraction paths and simplify them
2023-10-30 11:29:08 +01:00

Clément Renault
6ae4100f07
Generate the DelAdd for is_null, is_empty, and exists
2023-10-30 11:29:08 +01:00

Clément Renault
0c47defeee
Work on fid docid facet values rewrite
2023-10-30 11:29:06 +01:00

ManyTheFish
313b16bec2
Support diff indexing on extract_docid_word_positions
2023-10-30 11:24:19 +01:00

ManyTheFish
1dd97578a8
Make the transform struct return diff-based documents obkvs
2023-10-30 11:22:07 +01:00

ManyTheFish
f5ef69293b
deactivate prefix dbs
2023-10-30 11:22:07 +01:00

ManyTheFish
1c5705c164
clean PR warnings
2023-10-30 11:22:05 +01:00
							
ManyTheFish
66c2c82a18
Split wpp in several sorters
2023-10-30 11:15:02 +01:00

ManyTheFish
28a8d0ccda
Fix word pair proximity
2023-10-30 11:15:02 +01:00

ManyTheFish
96be85396d
Use a VecDeque in wpp database
2023-10-30 11:15:02 +01:00

ManyTheFish
df9e5c8651
Generalize usage of CboRoaringBitmap codec to ease the use
2023-10-30 11:15:02 +01:00

ManyTheFish
b541d48847
Add buffer to the obkv writer
2023-10-30 11:15:02 +01:00

ManyTheFish
8ccf32d1a0
Compute word_fid_docids before word_docids and exact_word_docids
2023-10-30 11:15:02 +01:00

ManyTheFish
db1ca21231
add puffin in sorter into reader function
2023-10-30 11:15:00 +01:00

ManyTheFish
11ea5acff9
Fix
2023-10-30 11:13:10 +01:00

ManyTheFish
8d77736a67
Fix fid_word_docids
2023-10-30 11:13:10 +01:00

ManyTheFish
748b333161
Add useful debug assert before key insertion in database
2023-10-30 11:13:10 +01:00

ManyTheFish
17b647dfe5
Wip
2023-10-30 11:13:08 +01:00

Tamo
e7244aa485
fix warnings
2023-10-30 11:00:46 +01:00

Louis Dureuil
2bae9550c8
Add explanatory comment
2023-10-23 12:06:28 +02:00

Vivek Kumar
5fe7c4545a
compute all candidates correctly when skipping
2023-10-23 12:02:45 +02:00
							
meili-bors[bot]
5e0485d8dd
Merge #4131
						
						4131: Reduce proximity range from 7 to 3 r=Kerollmops a=ManyTheFish
## Summary
This PR aims to reduce the impact of the proximity databases on the indexing time and on the database size by reducing the maximum distance between two words to be indexed in the proximity database.
## Stats
### Impact on database size and indexing time

### Impact on search relevancy
<details>
| dataset_name | host_name        | Relevancy rate (Precision) | completion_rate 25.00% | completion_rate 50.00% | completion_rate 75.00% | completion_rate 100.00% |
|--------------|------------------|------------------------------------|-----------------|-----------------|-----------------|-----------------|
| FBIS         | 1_4_0            | percentile-10 |           0.00% |           0.00% |           0.00% |           0.00% |
| FBIS         | 1_4_0            | percentile-25 |           0.00% |           0.00% |           0.00% |           0.00% |
| FBIS         | 1_4_0            | percentile-50 |           0.00% |           0.00% |           5.00% |           5.56% |
| FBIS         | 1_4_0            | percentile-75 |           0.00% |          12.50% |          35.00% |          45.00% |
| FBIS         | 1_4_0            | percentile-90 |          20.00% |          40.00% |                 |         100.00% |
| FBIS         | 1_4_0            | average       |           5.78% |          11.16% |          21.90% |          26.29% |
| FBIS         | reduce_proximity | percentile-10 |           0.00% |           0.00% |           0.00% |           0.00% |
| FBIS         | reduce_proximity | percentile-25 |           0.00% |           0.00% |           0.00% |           0.00% |
| FBIS         | reduce_proximity | percentile-50 |           0.00% |           0.00% |           5.00% |           5.56% |
| FBIS         | reduce_proximity | percentile-75 |           0.00% |          15.00% |          35.00% |          40.00% |
| FBIS         | reduce_proximity | percentile-90 |          20.00% |          40.00% |          85.00% |         100.00% |
| FBIS         | reduce_proximity | average       |           5.55% |          11.34% |          21.75% |          26.14% |
| FR94         | 1_4_0            | percentile-10 |           0.00% |           0.00% |           0.00% |           0.00% |
| FR94         | 1_4_0            | percentile-25 |           0.00% |           0.00% |           0.00% |           0.00% |
| FR94         | 1_4_0            | percentile-50 |           0.00% |           0.00% |           0.00% |           0.00% |
| FR94         | 1_4_0            | percentile-75 |           0.00% |           5.00% |          15.00% |          42.11% |
| FR94         | 1_4_0            | percentile-90 |          15.00% |          54.55% |         100.00% |         100.00% |
| FR94         | 1_4_0            | average       |           5.95% |          12.07% |          18.70% |          25.57% |
| FR94         | reduce_proximity | percentile-10 |           0.00% |           0.00% |           0.00% |           0.00% |
| FR94         | reduce_proximity | percentile-25 |           0.00% |           0.00% |           0.00% |           0.00% |
| FR94         | reduce_proximity | percentile-50 |           0.00% |           0.00% |           0.00% |           0.00% |
| FR94         | reduce_proximity | percentile-75 |           0.00% |           5.00% |          15.00% |          42.11% |
| FR94         | reduce_proximity | percentile-90 |          15.00% |          54.55% |         100.00% |         100.00% |
| FR94         | reduce_proximity | average       |           5.79% |          12.00% |          18.70% |          25.53% |
| FT           | 1_4_0            | percentile-10 |           0.00% |           0.00% |           0.00% |           0.00% |
| FT           | 1_4_0            | percentile-25 |           0.00% |           0.00% |           0.00% |           0.00% |
| FT           | 1_4_0            | percentile-50 |           0.00% |           0.00% |           5.00% |          10.00% |
| FT           | 1_4_0            | percentile-75 |           0.00% |          15.00% |          30.00% |          40.00% |
| FT           | 1_4_0            | percentile-90 |          20.00% |          50.00% |          65.00% |         100.00% |
| FT           | 1_4_0            | average       |           5.08% |          12.58% |          20.00% |          25.49% |
| FT           | reduce_proximity | percentile-10 |           0.00% |           0.00% |           0.00% |           0.00% |
| FT           | reduce_proximity | percentile-25 |           0.00% |           0.00% |           0.00% |           0.00% |
| FT           | reduce_proximity | percentile-50 |           0.00% |           0.00% |           5.00% |          10.00% |
| FT           | reduce_proximity | percentile-75 |           0.00% |          15.00% |          30.00% |          40.00% |
| FT           | reduce_proximity | percentile-90 |          10.00% |          45.00% |          60.00% |         100.00% |
| FT           | reduce_proximity | average       |           5.01% |          12.64% |          20.10% |          25.53% |
| LAT          | 1_4_0            | percentile-10 |           0.00% |           0.00% |           0.00% |           0.00% |
| LAT          | 1_4_0            | percentile-25 |           0.00% |           0.00% |           0.00% |           0.00% |
| LAT          | 1_4_0            | percentile-50 |           0.00% |           0.00% |           5.00% |           5.00% |
| LAT          | 1_4_0            | percentile-75 |           5.00% |          15.00% |          30.00% |          30.00% |
| LAT          | 1_4_0            | percentile-90 |          15.00% |          45.00% |          60.00% |          80.00% |
| LAT          | 1_4_0            | average       |           4.80% |          11.80% |          17.88% |          21.62% |
| LAT          | reduce_proximity | percentile-10 |           0.00% |           0.00% |           0.00% |           0.00% |
| LAT          | reduce_proximity | percentile-25 |           0.00% |           0.00% |           0.00% |           0.00% |
| LAT          | reduce_proximity | percentile-50 |           0.00% |           0.00% |           5.00% |           5.00% |
| LAT          | reduce_proximity | percentile-75 |           0.00% |          11.11% |          25.00% |          35.00% |
| LAT          | reduce_proximity | percentile-90 |          15.00% |          45.00% |          55.00% |          80.00% |
| LAT          | reduce_proximity | average       |           4.43% |          11.23% |          17.32% |          21.45% |
</details>
### Impact on Search time
| dataset_name | host_name        |      25.00% |      50.00% |      75.00% |     100.00% | Average     |
|--------------|------------------|------------:|------------:|------------:|------------:|-------------|
| FBIS         | 1_4_0            |        3.45 | 7.446666667 | 9.773489933 | 9.620300752 | 7.572614338 |
| FBIS         | reduce_proximity | 2.983333333 | 5.316666667 | 6.911073826 | 7.637218045 | 5.712072968 |
| FR94         | 1_4_0            | 2.236666667 |        4.45 | 5.523489933 | 4.560150376 | 4.192576744 |
| FR94         | reduce_proximity |        2.09 | 3.991666667 | 4.981543624 | 4.266917293 | 3.832531896 |
| FT           | 1_4_0            | 5.956666667 | 9.656666667 | 13.86912752 | 10.83270677 |  10.0787919 |
| FT           | reduce_proximity |        4.51 | 5.981666667 | 7.701342282 | 6.766917293 |  6.23998156 |
| LAT          | 1_4_0            | 5.856666667 | 9.233333333 | 12.98322148 | 10.78759398 | 9.715203865 |
| LAT          | reduce_proximity |        6.91 | 6.706666667 | 8.463087248 | 8.265037594 | 7.586197877 |
## Technical approach
- Ensure the MAX_DISTANCE constant is used everywhere needed
- Reduce the MAX_DISTANCE from 8 to 4
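
The effect of lowering the cap can be sketched as follows. This is a minimal illustration and not the code from this PR: the helper names `capped_distance` and `indexed_pairs` are hypothetical, the distance is simplified to an absolute difference of word positions, and it assumes a pair is only indexed when its capped distance is strictly below `MAX_DISTANCE` (which would explain a range of 3 for a constant of 4).

```rust
/// Assumed value after the change described above (previously 8).
const MAX_DISTANCE: u32 = 4;

/// Hypothetical helper: distance between two word positions, capped at MAX_DISTANCE.
fn capped_distance(left: u32, right: u32) -> u32 {
    left.abs_diff(right).min(MAX_DISTANCE)
}

/// Hypothetical helper: returns (left_index, right_index, distance) for every pair
/// of positions whose capped distance is strictly below MAX_DISTANCE.
/// Pairs at or beyond the cap are assumed not to be indexed at all.
fn indexed_pairs(positions: &[u32]) -> Vec<(usize, usize, u32)> {
    let mut pairs = Vec::new();
    for (i, &left) in positions.iter().enumerate() {
        for (j, &right) in positions.iter().enumerate().skip(i + 1) {
            let distance = capped_distance(left, right);
            if distance < MAX_DISTANCE {
                pairs.push((i, j, distance));
            }
        }
    }
    pairs
}
```

With a smaller cap, fewer pairs pass the filter, which is the mechanism by which the proximity database would shrink under these assumptions.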
## Related
TBD
Co-authored-by: ManyTheFish <many@meilisearch.com>
2023-10-18 14:56:08 +00:00
							
ManyTheFish
27eec21415
Fix tests
2023-10-18 16:03:22 +02:00

Clément Renault
62dfd09dc6
Add more puffin logs to the deletion functions
2023-10-13 13:11:09 +02:00

meili-bors[bot]
f343ef5f2f
Merge #4108
						
						4108: Fix bug where search with distinct attribute and no ranking, returns offset+limit hits r=curquiza a=vivek-26
# Pull Request
## Related issue
Fixes #4078
## What does this PR do?
This PR:
- Fixes a bug where a search with a distinct attribute and no ranking returns offset+limit hits.
- Adds unit and integration tests.
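
The expected behaviour can be sketched as follows. This is a minimal illustration under stated assumptions, not the code changed by this PR: the `Doc` type and the `distinct_page` function are hypothetical, and candidates are assumed to arrive already ordered. The point is that after deduplicating on the distinct attribute and skipping `offset` hits, at most `limit` hits are returned, never `offset + limit`.

```rust
use std::collections::HashSet;

/// Hypothetical document: an id plus the value of its distinct attribute.
#[derive(Debug)]
struct Doc {
    id: u32,
    distinct_value: String,
}

/// Keeps the first document for each distinct value, skips the first `offset`
/// of those, and returns the ids of at most `limit` documents.
fn distinct_page(candidates: &[Doc], offset: usize, limit: usize) -> Vec<u32> {
    let mut seen: HashSet<&str> = HashSet::new();
    let mut hits = Vec::new();
    let mut skipped = 0;
    for doc in candidates {
        // `insert` returns false when the distinct value was already seen.
        if !seen.insert(doc.distinct_value.as_str()) {
            continue;
        }
        // The offset counts distinct documents, not raw candidates.
        if skipped < offset {
            skipped += 1;
            continue;
        }
        // Stop once the page is full.
        if hits.len() == limit {
            break;
        }
        hits.push(doc.id);
    }
    hits
}
```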
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: Vivek Kumar <vivek.26@outlook.com>
2023-10-12 07:51:29 +00:00