mirror of
				https://github.com/meilisearch/meilisearch.git
				synced 2025-10-26 13:36:27 +00:00 
			
		
		
		
	Merge #3670
3670: Fix addition deletion bug r=irevoire a=irevoire The first commit of this PR is a revert of https://github.com/meilisearch/meilisearch/pull/3667. It re-enable the auto-batching of addition and deletion of tasks. No new changes have been introduced outside of `milli`. So all the changes you see on the autobatcher have actually already been reviewed. It fixes https://github.com/meilisearch/meilisearch/issues/3440. ### What was happening? The issue was that the `external_documents_ids` generated in the `transform` were used in a very strange way that wasn’t compatible with the deletion of documents. Instead of doing a clear merge between the external document IDs of the DB and the one returned by the transform + writing it on disk, we were doing some weird tricks with the soft-deleted to avoid writing the fst on disk as much as possible. The new algorithm may be a bit slower but is way more straightforward and doesn’t change depending on if the soft deletion was used or not. Here is a list of the changes introduced: 1. We now do a clear distinction between the `new_external_documents_ids` coming from the transform and only held on RAM and the `external_documents_ids` coming from the DB. 2. The `new_external_documents_ids` (coming out of the transform) are now represented as an `fst`. We don't need to struggle with the hard, soft distinction + the soft_deleted => That's easier to understand 3. When indexing documents, we merge the `external_documents_ids` coming from the DB and the `new_external_documents_ids` coming from the transform. ### Other things introduced in this PR Since we constantly have to write small, very specialized fuzzers for this kind of bug, we decided to push the one used to reproduce this bug. It's not perfect, but it's easy to improve in the future. It'll also run for as long as possible on every merge on the main branch. Co-authored-by: Tamo <tamo@meilisearch.com> Co-authored-by: Loïc Lecrenier <loic.lecrenier@icloud.com>
This commit is contained in:
		
							
								
								
									
										24
									
								
								.github/workflows/fuzzer-indexing.yml
									
									
									
									
										vendored
									
									
										Normal file
									
								
							
							
						
						
									
										24
									
								
								.github/workflows/fuzzer-indexing.yml
									
									
									
									
										vendored
									
									
										Normal file
									
								
							| @@ -0,0 +1,24 @@ | ||||
| name: Run the indexing fuzzer | ||||
|  | ||||
| on: | ||||
|   push: | ||||
|     branches: | ||||
|       - main | ||||
|  | ||||
| jobs: | ||||
|   fuzz: | ||||
|     name: Setup the action | ||||
|     runs-on: ubuntu-latest | ||||
|     timeout-minutes: 4320 # 72h | ||||
|     steps: | ||||
|       - uses: actions/checkout@v3 | ||||
|       - uses: actions-rs/toolchain@v1 | ||||
|         with: | ||||
|           profile: minimal | ||||
|           toolchain: stable | ||||
|           override: true | ||||
|  | ||||
|       # Run benchmarks | ||||
|       - name: Run the fuzzer | ||||
|         run: | | ||||
|           cargo run --release --bin fuzz-indexing | ||||
		Reference in New Issue
	
	Block a user