Louis Dureuil 
							
						 
					 
					
						
						
							
						
						fa15be5bc4 
					 
					
						
						
							
							Add span around commit  
						
						
						
						
					 
					
						2024-11-26 09:45:48 +01:00 
						 
				 
			
				
					
						
							
							
								ManyTheFish 
							
						 
					 
					
						
						
							
						
						d66dc363ed 
					 
					
						
						
							
							Test and implement settings opt-out  
						
						
						
						
					 
					
						2024-11-25 18:23:22 +01:00 
						 
				 
			
				
					
						
							
							
								meili-bors[bot] 
							
						 
					 
					
						
						
							
						
						19e6f675b3 
					 
					
						
						
							
							Merge #4900
						
						
						
						
						4900: Indexer edition 2024 r=Kerollmops a=dureuill
This PR is implementing the indexer edition 2024, largely inspired by [the ideas from this blog post](https://blog.kerollmops.com/meilisearch-is-too-slow).
Fixes https://github.com/meilisearch/meilisearch/issues/4985 
## Features
- Stream-first approach to reading documents.
- Minimum disk write operations.
- RAM usage-first approach to avoid modifying common bitmaps on disk but in memory.
- Reduced LMDB fragmentation by writing entries only once...
- ...computing the final version of the entries in parallel...
- ...and storing them in write-optimized data structures before sending them to the BTree (LMDB).
- Indexing in multiple transactions to improve large dataset support (dumps).
Co-authored-by: ManyTheFish <many@meilisearch.com>
Co-authored-by: Clément Renault <clement@meilisearch.com>
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
						
						
					 
					
						2024-11-21 16:19:10 +00:00 
						 
				 
			
				
					
						
							
							
								Louis Dureuil 
							
						 
					 
					
						
						
							
						
						221e547e86 
					 
					
						
						
							
							Slight changes  
						
						
						
						
					 
					
						2024-11-21 16:47:44 +01:00 
						 
				 
			
				
					
						
							
							
								ManyTheFish 
							
						 
					 
					
						
						
							
						
						36962b943b 
					 
					
						
						
							
							First batch of PR comment  
						
						
						
						
					 
					
						2024-11-21 16:38:11 +01:00 
						 
				 
			
				
					
						
							
							
								Many the fish 
							
						 
					 
					
						
						
							
						
						ff38f29981 
					 
					
						
						
							
							Update crates/index-scheduler/src/batch.rs  
						
						
						
						
						Co-authored-by: Louis Dureuil <louis@meilisearch.com>
						
						
					 
					
						2024-11-21 14:18:39 +01:00 
						 
				 
			
				
					
						
							
							
								Louis Dureuil 
							
						 
					 
					
						
						
							
						
						1f9692cd04 
					 
					
						
						
							
							Increase map size for tests  
						
						
						
						
					 
					
						2024-11-20 17:52:21 +01:00 
						 
				 
			
				
					
						
							
							
								Tamo 
							
						 
					 
					
						
						
							
						
						1e694ae432 
					 
					
						
						
							
							improve the count of the number of tasks in a batch  
						
						
						
						
					 
					
						2024-11-20 17:48:26 +01:00 
						 
				 
			
				
					
						
							
							
								Tamo 
							
						 
					 
					
						
						
							
						
						71807cac6d 
					 
					
						
						
							
							makes clippy happy  
						
						
						
						
					 
					
						2024-11-20 17:40:58 +01:00 
						 
				 
			
				
					
						
							
							
								Tamo 
							
						 
					 
					
						
						
							
						
						21a2264782 
					 
					
						
						
							
							improve the details and stats of the current batch processing  
						
						
						
						
					 
					
						2024-11-20 17:25:55 +01:00 
						 
				 
			
				
					
						
							
							
								Louis Dureuil 
							
						 
					 
					
						
						
							
						
						6e6acfcf1b 
					 
					
						
						
							
							Merge branch 'main' into indexer-edition-2024  
						
						
						
						
					 
					
						2024-11-20 16:59:58 +01:00 
						 
				 
			
				
					
						
							
							
								Louis Dureuil 
							
						 
					 
					
						
						
							
						
						ff9c92c409 
					 
					
						
						
							
							rename documents -> substep  
						
						
						
						
					 
					
						2024-11-20 15:12:02 +01:00 
						 
				 
			
				
					
						
							
							
								Clément Renault 
							
						 
					 
					
						
						
							
						
						8380ddbdcd 
					 
					
						
						
							
							Fix progress of into_changes  
						
						
						
						
					 
					
						2024-11-20 15:10:09 +01:00 
						 
				 
			
				
					
						
							
							
								Louis Dureuil 
							
						 
					 
					
						
						
							
						
						84600a10d1 
					 
					
						
						
							
							Add MSP to document_update.into_changes()  
						
						
						
						
					 
					
						2024-11-20 14:53:37 +01:00 
						 
				 
			
				
					
						
							
							
								Tamo 
							
						 
					 
					
						
						
							
						
						ec06879d28 
					 
					
						
						
							
							apply review changes  
						
						
						
						
					 
					
						2024-11-20 14:40:36 +01:00 
						 
				 
			
				
					
						
							
							
								Tamo 
							
						 
					 
					
						
						
							
						
						83d1f858c1 
					 
					
						
						
							
							Update crates/index-scheduler/src/lib.rs  
						
						
						
						
						Co-authored-by: Clément Renault <clement@meilisearch.com>
						
						
					 
					
						2024-11-20 14:36:05 +01:00 
						 
				 
			
				
					
						
							
							
								Tamo 
							
						 
					 
					
						
						
							
						
						a7ac590e9e 
					 
					
						
						
							
							implements the reverse query parameter for the batches  
						
						
						
						
					 
					
						2024-11-20 13:29:52 +01:00 
						 
				 
			
				
					
						
							
							
								Tamo 
							
						 
					 
					
						
						
							
						
						8ad68dd708 
					 
					
						
						
							
							stop leaking the update files of the canceled tasks  
						
						
						
						
					 
					
						2024-11-20 13:17:54 +01:00 
						 
				 
			
				
					
						
							
							
								Tamo 
							
						 
					 
					
						
						
							
						
						7e379b3d14 
					 
					
						
						
							
							remove useless prints  
						
						
						
						
					 
					
						2024-11-20 12:27:12 +01:00 
						 
				 
			
				
					
						
							
							
								Tamo 
							
						 
					 
					
						
						
							
						
						bdb51a85fe 
					 
					
						
						
							
							now that the task cancelation shares their started at with all the tasks of their batch we don't need the trick of retrieving the previous batch anymore  
						
						
						
						
					 
					
						2024-11-20 10:51:07 +01:00 
						 
				 
			
				
					
						
							
							
								Tamo 
							
						 
					 
					
						
						
							
						
						e145d71a62 
					 
					
						
						
							
							implements the two last TODOs  
						
						
						
						
					 
					
						2024-11-20 10:51:06 +01:00 
						 
				 
			
				
					
						
							
							
								Tamo 
							
						 
					 
					
						
						
							
						
						d9a4e69990 
					 
					
						
						
							
							push a missing snapshot  
						
						
						
						
					 
					
						2024-11-20 10:51:06 +01:00 
						 
				 
			
				
					
						
							
							
								Tamo 
							
						 
					 
					
						
						
							
						
						b906e3ed70 
					 
					
						
						
							
							improve the way we access the mutex  
						
						
						
						
					 
					
						2024-11-20 10:51:06 +01:00 
						 
				 
			
				
					
						
							
							
								Tamo 
							
						 
					 
					
						
						
							
						
						4abcd9c04e 
					 
					
						
						
							
							add some stats on the batches  
						
						
						
						
					 
					
						2024-11-20 10:51:06 +01:00 
						 
				 
			
				
					
						
							
							
								Tamo 
							
						 
					 
					
						
						
							
						
						229fa0f902 
					 
					
						
						
							
							implements the batch details  
						
						
						
						
					 
					
						2024-11-20 10:51:06 +01:00 
						 
				 
			
				
					
						
							
							
								Tamo 
							
						 
					 
					
						
						
							
						
						62646af7b9 
					 
					
						
						
							
							implements the automatic batch deletion  
						
						
						
						
					 
					
						2024-11-20 10:51:06 +01:00 
						 
				 
			
				
					
						
							
							
								Tamo 
							
						 
					 
					
						
						
							
						
						1fcb9526f5 
					 
					
						
						
							
							fix the task cancelation  
						
						
						
						
					 
					
						2024-11-20 10:51:06 +01:00 
						 
				 
			
				
					
						
							
							
								Tamo 
							
						 
					 
					
						
						
							
						
						15eefa4fcc 
					 
					
						
						
							
							fixes a lot of small issue, the test about the cancellation is still failing  
						
						
						
						
					 
					
						2024-11-20 10:51:05 +01:00 
						 
				 
			
				
					
						
							
							
								Tamo 
							
						 
					 
					
						
						
							
						
						ad9763ffcd 
					 
					
						
						
							
							copy multiple task query tests to batches. Currently, they fails  
						
						
						
						
					 
					
						2024-11-20 10:49:25 +01:00 
						 
				 
			
				
					
						
							
							
								Tamo 
							
						 
					 
					
						
						
							
						
						d489f5635f 
					 
					
						
						
							
							add the mapping between the task and batches  
						
						
						
						
					 
					
						2024-11-20 10:49:23 +01:00 
						 
				 
			
				
					
						
							
							
								Tamo 
							
						 
					 
					
						
						
							
						
						a1251c3c83 
					 
					
						
						
							
							Implements the get all batches route with filters working  
						
						
						
						
					 
					
						2024-11-20 10:42:55 +01:00 
						 
				 
			
				
					
						
							
							
								Tamo 
							
						 
					 
					
						
						
							
						
						6062914654 
					 
					
						
						
							
							add the batch_id to the tasks  
						
						
						
						
					 
					
						2024-11-20 10:42:54 +01:00 
						 
				 
			
				
					
						
							
							
								Louis Dureuil 
							
						 
					 
					
						
						
							
						
						bfefaf71c2 
					 
					
						
						
							
							Progress displayed in logs  
						
						
						
						
					 
					
						2024-11-19 09:32:52 +01:00 
						 
				 
			
				
					
						
							
							
								Louis Dureuil 
							
						 
					 
					
						
						
							
						
						5f93651cef 
					 
					
						
						
							
							fixes  
						
						
						
						
					 
					
						2024-11-18 16:23:11 +01:00 
						 
				 
			
				
					
						
							
							
								Louis Dureuil 
							
						 
					 
					
						
						
							
						
						1f8b01a598 
					 
					
						
						
							
							Fix snap since _vectors is no longer part of the field distributions  
						
						
						
						
					 
					
						2024-11-18 12:50:59 +01:00 
						 
				 
			
				
					
						
							
							
								Louis Dureuil 
							
						 
					 
					
						
						
							
						
						e9d17136b2 
					 
					
						
						
							
							Add deadline of 3 seconds to embedding requests made in the context of hybrid search  
						
						
						
						
					 
					
						2024-11-18 12:15:11 +01:00 
						 
				 
			
				
					
						
							
							
								Clément Renault 
							
						 
					 
					
						
						
							
						
						5b4c06c24c 
					 
					
						
						
							
							Plug the grenad max memory parameter  
						
						
						
						
					 
					
						2024-11-18 11:28:04 +01:00 
						 
				 
			
				
					
						
							
							
								Louis Dureuil 
							
						 
					 
					
						
						
							
						
						c202f3dbe2 
					 
					
						
						
							
							fix tests and revert change in behavior when primary_key_from_op != primary_key_from_db && index.is_empty()  
						
						
						
						
					 
					
						2024-11-18 10:59:05 +01:00 
						 
				 
			
				
					
						
							
							
								Clément Renault 
							
						 
					 
					
						
						
							
						
						677d7293f5 
					 
					
						
						
							
							Fix a lot of primary key related tests  
						
						
						
						
					 
					
						2024-11-18 10:59:05 +01:00 
						 
				 
			
				
					
						
							
							
								Clément Renault 
							
						 
					 
					
						
						
							
						
						bd31ea2174 
					 
					
						
						
							
							Check for at least one valid task after setting their statuses  
						
						
						
						
					 
					
						2024-11-18 10:59:05 +01:00 
						 
				 
			
				
					
						
							
							
								Clément Renault 
							
						 
					 
					
						
						
							
						
						83865d2ebd 
					 
					
						
						
							
							Expose intermediate errors when processing batches  
						
						
						
						
					 
					
						2024-11-18 10:59:05 +01:00 
						 
				 
			
				
					
						
							
							
								Clément Renault 
							
						 
					 
					
						
						
							
						
						9e8367f1e6 
					 
					
						
						
							
							Move the rayon thread pool outside the extract method  
						
						
						
						
					 
					
						2024-11-14 10:40:32 +01:00 
						 
				 
			
				
					
						
							
							
								Louis Dureuil 
							
						 
					 
					
						
						
							
						
						1fcd5f091e 
					 
					
						
						
							
							Remove progress from task  
						
						
						
						
					 
					
						2024-11-12 12:23:13 +01:00 
						 
				 
			
				
					
						
							
							
								Louis Dureuil 
							
						 
					 
					
						
						
							
						
						e32677999f 
					 
					
						
						
							
							Adapt some snapshots  
						
						
						
						
					 
					
						2024-11-08 00:06:33 +01:00 
						 
				 
			
				
					
						
							
							
								Louis Dureuil 
							
						 
					 
					
						
						
							
						
						8a314ab81d 
					 
					
						
						
							
							Fix primary key fid order  
						
						
						
						
					 
					
						2024-11-08 00:05:12 +01:00 
						 
				 
			
				
					
						
							
							
								Tamo 
							
						 
					 
					
						
						
							
						
						2eb1801e85 
					 
					
						
						
							
							reverse the order of the task queue  
						
						
						
						
					 
					
						2024-11-07 19:19:44 +01:00 
						 
				 
			
				
					
						
							
							
								Louis Dureuil 
							
						 
					 
					
						
						
							
						
						ee03743355 
					 
					
						
						
							
							Merge branch 'indexer-edition-2024' into indexer-edition-2024-doc-chunks  
						
						
						
						
					 
					
						2024-11-06 15:50:53 +01:00 
						 
				 
			
				
					
						
							
							
								ManyTheFish 
							
						 
					 
					
						
						
							
						
						10feeb88f2 
					 
					
						
						
							
							Merge branch 'main' into indexer-edition-2024  
						
						
						
						
					 
					
						2024-11-06 15:19:18 +01:00 
						 
				 
			
				
					
						
							
							
								Tamo 
							
						 
					 
					
						
						
							
						
						cf6ad1ae5e 
					 
					
						
						
							
							Merge branch 'main' into tmp-release-v1.11.0  
						
						
						
						
					 
					
						2024-11-04 16:14:44 +01:00 
						 
				 
			
				
					
						
							
							
								Clément Renault 
							
						 
					 
					
						
						
							
						
						9c1e54a2c8 
					 
					
						
						
							
							Move crates under a sub folder to clean up the code  
						
						
						
						
					 
					
						2024-10-21 08:18:43 +02:00