meili-bors[bot]
19e6f675b3
Merge #4900
4900: Indexer edition 2024 r=Kerollmops a=dureuill
This PR implements the indexer edition 2024, largely inspired by [the ideas from this blog post](https://blog.kerollmops.com/meilisearch-is-too-slow).
Fixes https://github.com/meilisearch/meilisearch/issues/4985

## Features
- Stream-first approach to reading documents.
- Minimal disk write operations.
- RAM-first approach: common bitmaps are modified in memory rather than on disk.
- Reduced LMDB fragmentation by writing entries only once...
- ...computing the final version of the entries in parallel...
- ...and storing them in write-optimized data structures before sending them to the BTree (LMDB).
- Indexing in multiple transactions to improve large dataset support (dumps).
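A minimal Rust sketch of the general pattern behind the "write entries only once" bullets. This is a sketch under loud assumptions, not Meilisearch's code:

- `extract_chunk` and `index` are hypothetical names.
- A `BTreeSet<u32>` stands in for a roaring bitmap.
- A `BTreeMap` stands in for the LMDB B-tree.

Postings are built per chunk in RAM on parallel threads and merged into their final value. Each key is then written to the store exactly once, in sorted order.

```rust
use std::collections::{BTreeMap, BTreeSet, HashMap};
use std::thread;

/// Word -> document ids. `BTreeSet<u32>` stands in for a roaring bitmap (assumption).
type Postings = HashMap<String, BTreeSet<u32>>;

/// Hypothetical extractor: builds the postings of one chunk of (docid, text)
/// pairs entirely in memory, without touching the store.
fn extract_chunk(docs: &[(u32, &str)]) -> Postings {
    let mut postings = Postings::new();
    for &(docid, text) in docs {
        for word in text.split_whitespace() {
            postings.entry(word.to_lowercase()).or_default().insert(docid);
        }
    }
    postings
}

/// Hypothetical indexing entry point. `store` stands in for the LMDB B-tree.
fn index(chunks: &[Vec<(u32, &str)>], store: &mut BTreeMap<String, Vec<u32>>) {
    // Extract every chunk on its own thread; scoped threads may borrow `chunks`.
    let partials: Vec<Postings> = thread::scope(|s| {
        let handles: Vec<_> = chunks
            .iter()
            .map(|chunk| s.spawn(move || extract_chunk(chunk)))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    });

    // Compute the final version of every entry in memory first...
    let mut merged: BTreeMap<String, BTreeSet<u32>> = BTreeMap::new();
    for partial in partials {
        for (word, ids) in partial {
            merged.entry(word).or_default().extend(ids);
        }
    }

    // ...then union it with what the store already holds and write each key
    // exactly once, iterating in sorted key order.
    for (word, mut ids) in merged {
        if let Some(existing) = store.get(&word) {
            ids.extend(existing.iter().copied());
        }
        store.insert(word, ids.into_iter().collect());
    }
}
```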
Co-authored-by: ManyTheFish <many@meilisearch.com>
Co-authored-by: Clément Renault <clement@meilisearch.com>
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-11-21 16:19:10 +00:00
							
Louis Dureuil
323ecbb885
Add span on document operation
2024-11-21 17:01:10 +01:00
							
Louis Dureuil
ffb60cb885
Add comment explaining why we fixed the version of insta
2024-11-21 16:56:56 +01:00
							
Louis Dureuil
dcc3caef0d
Remove TopLevelMap
2024-11-21 16:56:46 +01:00
							
Louis Dureuil
221e547e86
Slight changes
2024-11-21 16:47:44 +01:00
							
Clément Renault
61d0615253
Document the geo point extractor
2024-11-21 16:47:08 +01:00
							
Clément Renault
5727e00374
Remove useless geo skipped
2024-11-21 16:47:08 +01:00
							
Clément Renault
9b60843831
Remove commented lines
2024-11-21 16:47:07 +01:00
							
ManyTheFish
36962b943b
First batch of PR comments
2024-11-21 16:38:11 +01:00
							
Louis Dureuil
32bcacefd5
Changes Document::len to Document::top_level_fields_count
2024-11-21 15:01:07 +01:00
							
Louis Dureuil
4ed195426c
remove unused stuff in global.rs
2024-11-21 15:01:07 +01:00
							
Many the fish
ff38f29981
Update crates/index-scheduler/src/batch.rs
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-11-21 14:18:39 +01:00
							
ManyTheFish
94b260fd25
Remove orphan span
2024-11-21 12:12:07 +01:00
							
Clément Renault
ab2c83f868
Use the disk less when computing prefixes
2024-11-21 10:45:37 +01:00
							
Louis Dureuil
1f9692cd04
Increase map size for tests
2024-11-20 17:52:21 +01:00
							
Tamo
1e694ae432
improve the count of the number of tasks in a batch
2024-11-20 17:48:26 +01:00
							
Tamo
71807cac6d
makes clippy happy
2024-11-20 17:40:58 +01:00
							
Tamo
21a2264782
improve the details and stats of the current batch processing
2024-11-20 17:25:55 +01:00
							
Louis Dureuil
bda2b41d11
update snaps after merge
2024-11-20 17:08:30 +01:00
							
Louis Dureuil
6e6acfcf1b
Merge branch 'main' into indexer-edition-2024
2024-11-20 16:59:58 +01:00
							
Louis Dureuil
e0864f1b21
Separate side effect and debug asserts
2024-11-20 16:25:17 +01:00
							
Clément Renault
a38344acb3
Replace eprintlns by tracing
2024-11-20 15:29:51 +01:00
							
ManyTheFish
4d616f8794
Parse all attributes and filter before tokenization
2024-11-20 15:15:25 +01:00
							
Louis Dureuil
ff9c92c409
rename documents -> substep
2024-11-20 15:12:02 +01:00
							
Clément Renault
8380ddbdcd
Fix progress of into_changes
2024-11-20 15:10:09 +01:00
							
Louis Dureuil
867138f166
Add SP to into_changes
2024-11-20 15:07:05 +01:00
							
Clément Renault
567bd4538b
Fix the into_changes stop processing
2024-11-20 14:58:25 +01:00
							
Louis Dureuil
84600a10d1
Add MSP to document_update.into_changes()
2024-11-20 14:53:37 +01:00
							
ManyTheFish
35bbe1c2a2
Add failing test on settings changes
2024-11-20 14:48:12 +01:00
							
Louis Dureuil
7d64e8dbd3
Fix Windows compilation
2024-11-20 14:40:38 +01:00
							
Tamo
ec06879d28
apply review changes
2024-11-20 14:40:36 +01:00
							
Tamo
83d1f858c1
Update crates/index-scheduler/src/lib.rs
Co-authored-by: Clément Renault <clement@meilisearch.com>
2024-11-20 14:36:05 +01:00
							
Louis Dureuil
cae8c89467
"fix" last warnings
2024-11-20 14:03:52 +01:00
							
Tamo
a7ac590e9e
implements the reverse query parameter for the batches
2024-11-20 13:29:52 +01:00
							
Clément Renault
7cb8732b45
Introduce a new bincode internal error
2024-11-20 13:23:11 +01:00
							
Tamo
8ad68dd708
stop leaking the update files of the canceled tasks
2024-11-20 13:17:54 +01:00
							
ManyTheFish
fe5d50969a
Fix field selector in extractors
2024-11-20 13:16:44 +01:00
							
Clément Renault
56c7c5d5f0
Fix comments
2024-11-20 13:16:44 +01:00
							
Louis Dureuil
4cdfdddd6d
Fix one more
2024-11-20 13:16:43 +01:00
							
Louis Dureuil
2afa33011a
Fix tokenize_document
2024-11-20 13:16:43 +01:00
							
Louis Dureuil
61feca1f41
More tests pass
2024-11-20 13:16:43 +01:00
							
Louis Dureuil
f893b5153e
Don't mark [""] as empty facet
2024-11-20 13:16:42 +01:00
							
Louis Dureuil
ca779c21f9
facets: Handle boolean and skip empty strings
2024-11-20 13:16:42 +01:00
							
Louis Dureuil
477077bdc2
Remove _vectors from fid map when there are no vectors in sight
2024-11-20 13:16:42 +01:00
							
ManyTheFish
b1f8aec348
Fix index_documents_check_exists_database
2024-11-20 13:16:41 +01:00
							
ManyTheFish
ba7f091db3
Use tokenizer on numbers and booleans
2024-11-20 13:16:41 +01:00
							
Louis Dureuil
8049df125b
Add depth to facet extraction so that null inside an array doesn't mark the entire field as null
2024-11-20 13:16:40 +01:00
							
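The depth idea from commit 8049df125b, as a hedged Rust sketch. The assumptions are:

- `Value`, `FacetEntry` and `extract_facet` are hypothetical names, not Meilisearch's code.
- The value type is a simplified stand-in for parsed JSON, with objects omitted.
- Facet values are flattened to strings for brevity.

A top-level `null` marks the field as null. A `null` nested inside an array is skipped, so `[1, null]` still yields the facet value `1`.

```rust
/// Simplified stand-in for a parsed JSON value (hypothetical; objects omitted).
#[derive(Debug)]
enum Value {
    Null,
    Bool(bool),
    Number(f64),
    String(String),
    Array(Vec<Value>),
}

/// What this sketch records for one field (hypothetical).
#[derive(Debug, Default, PartialEq)]
struct FacetEntry {
    is_null: bool,
    values: Vec<String>,
}

/// Walks `value`, tracking how many arrays deep it is. Only a `null` at
/// depth 0 marks the whole field as null; nulls nested in arrays are skipped.
fn extract_facet(value: &Value, depth: usize, out: &mut FacetEntry) {
    match value {
        Value::Null if depth == 0 => out.is_null = true,
        Value::Null => {} // e.g. the `null` in `[1, null]` is ignored
        Value::Bool(b) => out.values.push(b.to_string()),
        Value::Number(n) => out.values.push(n.to_string()),
        Value::String(s) => out.values.push(s.clone()),
        Value::Array(items) => {
            for item in items {
                extract_facet(item, depth + 1, out);
            }
        }
    }
}
```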
							
Clément Renault
50d1bd01df
We no longer index geo lat and lng
2024-11-20 13:16:40 +01:00
							
Louis Dureuil
a28d4f5d0c
Fix setup_search_index_with_criteria
2024-11-20 13:16:40 +01:00
							
Louis Dureuil
fc14f4bc66
Attempt to fix setup_search_index_with_criteria
2024-11-20 13:16:39 +01:00