ManyTheFish 
							
						 
					 
					
						
						
							
						
						774ed28539 
					 
					
						
						
							
							Fix Prefix FST when a document is modified  
						
						
						
						
					 
					
						2024-10-03 11:12:26 +02:00 
						 
				 
			
				
					
						
							
							
								ManyTheFish 
							
						 
					 
					
						
						
							
						
						d79f75f630 
					 
					
						
						
							
							Compute and Write external-documents-ids database  
						
						
						
						
					 
					
						2024-10-03 11:11:56 +02:00 
						 
				 
			
				
					
						
							
							
								Clément Renault 
							
						 
					 
					
						
						
							
						
						b7a5ba100e 
					 
					
						
						
							
							Move the ParallelIteratorExt into the parallel_iterator_ext module  
						
						
						
						
					 
					
						2024-10-01 11:11:52 +02:00 
						 
				 
			
				
					
						
							
							
								Clément Renault 
							
						 
					 
					
						
						
							
						
						dead7a56a3 
					 
					
						
						
							
							Keep the caches in the AppendOnlyVec  
						
						
						
						
					 
					
						2024-10-01 11:11:39 +02:00 
						 
				 
			
				
					
						
							
							
								Clément Renault 
							
						 
					 
					
						
						
							
						
						0a8cb471df 
					 
					
						
						
							
							Introduce the AppendOnlyVec struct for the parallel computing  
						
						
						
						
					 
					
						2024-10-01 11:11:25 +02:00 
						 
				 
			
				
					
						
							
							
								Clément Renault 
							
						 
					 
					
						
						
							
						
						00e045b249 
					 
					
						
						
							
							Rename and use the try_arc_for_each_try_init method  
						
						
						
						
					 
					
						2024-10-01 11:11:25 +02:00 
						 
				 
			
				
					
						
							
							
								Clément Renault 
							
						 
					 
					
						
						
							
						
						d83c9a4074 
					 
					
						
						
							
							Introduce the try_for_each_try_init method to be used with Arced Errors  
						
						
						
						
					 
					
						2024-10-01 11:11:25 +02:00 
						 
				 
			
				
					
						
							
							
								Clément Renault 
							
						 
					 
					
						
						
							
						
						f3356ddaa4 
					 
					
						
						
							
							Fix the errors when using the try_map_try_init method  
						
						
						
						
					 
					
						2024-10-01 11:11:10 +02:00 
						 
				 
			
				
					
						
							
							
								Clément Renault 
							
						 
					 
					
						
						
							
						
						31de5c747e 
					 
					
						
						
							
							WIP using try_map_try_init  
						
						
						
						
					 
					
						2024-10-01 11:10:53 +02:00 
						 
				 
			
				
					
						
							
							
								Clément Renault 
							
						 
					 
					
						
						
							
						
						3843240940 
					 
					
						
						
							
							Prefer using Arcs instead of Options  
						
						
						
						
					 
					
						2024-10-01 11:10:53 +02:00 
						 
				 
			
				
					
						
							
							
								Louis Dureuil 
							
						 
					 
					
						
						
							
						
						8cb5e7437d 
					 
					
						
						
							
							try using try_map_try_init  
						
						
						
						
					 
					
						2024-10-01 11:10:53 +02:00 
						 
				 
			
				
					
						
							
							
								Louis Dureuil 
							
						 
					 
					
						
						
							
						
						5b776556fe 
					 
					
						
						
							
							Add ParallelIteratorExt  
						
						
						
						
					 
					
						2024-10-01 11:10:53 +02:00 
						 
				 
			
				
					
						
							
							
								ManyTheFish 
							
						 
					 
					
						
						
							
						
						bb7a503e5d 
					 
					
						
						
							
							Compute prefix databases  
						
						
						
						
						We now compute the prefix FST and a prefix delta in the Merger thread.
After all the databases are written, the main thread recomputes the prefix databases from the prefix delta, without needing any grenad temporary file. 
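
To illustrate what a "prefix delta" could look like, here is a minimal sketch. It is not the indexer's code: the function names, the `max_len` limit, and the representation of the delta as two sets (added and removed prefixes) are all assumptions made for this example, and the real indexer very likely applies more selective rules when choosing prefixes.

```python
def prefixes_of(words, max_len=4):
    """Return every prefix of length 1..max_len found in `words`.

    `max_len` is an illustrative limit, not a value taken from the commit.
    """
    found = set()
    for word in words:
        for end in range(1, min(len(word), max_len) + 1):
            found.add(word[:end])
    return found


def prefix_delta(old_words, new_words, max_len=4):
    """Compare the prefixes before and after an update.

    Returns (added, removed): prefixes that appeared and prefixes that vanished.
    Only these need their prefix entries recomputed or deleted.
    """
    old = prefixes_of(old_words, max_len)
    new = prefixes_of(new_words, max_len)
    return new - old, old - new


# Hypothetical vocabulary before and after a document update.
added, removed = prefix_delta({"cat", "car"}, {"car", "dog"})
```

With a delta like this, the prefixes shared by both versions (`c`, `ca`, `car` here) are left alone, which is the kind of saving the commit message describes.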
						
						
					 
					
						2024-10-01 09:57:06 +02:00 
						 
				 
			
				
					
						
							
							
								Louis Dureuil 
							
						 
					 
					
						
						
							
						
						64589278ac 
					 
					
						
						
							
							Appease *some* of clippy warnings  
						
						
						
						
					 
					
						2024-09-30 16:08:29 +02:00 
						 
				 
			
				
					
						
							
							
								ManyTheFish 
							
						 
					 
					
						
						
							
						
						8df6daf308 
					 
					
						
						
							
							Remove fid_wordcount_docids.rs  
						
						
						
						
					 
					
						2024-09-30 11:52:31 +02:00 
						 
				 
			
				
					
						
							
							
								ManyTheFish 
							
						 
					 
					
						
						
							
						
						5b552caf42 
					 
					
						
						
							
							Fix position in insertions  
						
						
						
						
					 
					
						2024-09-30 11:46:32 +02:00 
						 
				 
			
				
					
						
							
							
								ManyTheFish 
							
						 
					 
					
						
						
							
						
						2b51a63418 
					 
					
						
						
							
							Remove dead code  
						
						
						
						
					 
					
						2024-09-30 11:42:36 +02:00 
						 
				 
			
				
					
						
							
							
								Louis Dureuil 
							
						 
					 
					
						
						
							
						
						3d8024fb2b 
					 
					
						
						
							
							write the weighted fields ids map  
						
						
						
						
					 
					
						2024-09-30 11:35:03 +02:00 
						 
				 
			
				
					
						
							
							
								Louis Dureuil 
							
						 
					 
					
						
						
							
						
						4b0da0ff24 
					 
					
						
						
							
							Fix inversion of field_id and position  
						
						
						
						
					 
					
						2024-09-30 11:34:50 +02:00 
						 
				 
			
				
					
						
							
							
								Louis Dureuil 
							
						 
					 
					
						
						
							
						
						079f2b5de0 
					 
					
						
						
							
							Format error messages consistently  
						
						
						
						
					 
					
						2024-09-30 11:34:31 +02:00 
						 
				 
			
				
					
						
							
							
								ManyTheFish 
							
						 
					 
					
						
						
							
						
						960060ebdf 
					 
					
						
						
							
							Fix fst builder when there is no previous FST  
						
						
						
						
					 
					
						2024-09-25 16:53:00 +02:00 
						 
				 
			
				
					
						
							
							
								Clément Renault 
							
						 
					 
					
						
						
							
						
						3d244451df 
					 
					
						
						
							
							Reduce the lru key size from 8 to 12 bytes  
						
						
						
						
					 
					
						2024-09-25 16:14:13 +02:00 
						 
				 
			
				
					
						
							
							
								Clément Renault 
							
						 
					 
					
						
						
							
						
						5f53935c8a 
					 
					
						
						
							
							Fix a bug in the Lru  
						
						
						
						
					 
					
						2024-09-25 16:09:34 +02:00 
						 
				 
			
				
					
						
							
							
								Clément Renault 
							
						 
					 
					
						
						
							
						
						29a7623c3f 
					 
					
						
						
							
							Fix some logs  
						
						
						
						
					 
					
						2024-09-25 15:57:50 +02:00 
						 
				 
			
				
					
						
							
							
								Clément Renault 
							
						 
					 
					
						
						
							
						
						e97041f7d0 
					 
					
						
						
							
							Replace the Lru free list by a simple increment  
						
						
						
						
					 
					
						2024-09-25 15:55:52 +02:00 
						 
				 
			
				
					
						
							
							
								Clément Renault 
							
						 
					 
					
						
						
							
						
						52d7f3ed1c 
					 
					
						
						
							
							Reduce the lru key size from 20 to 8 bytes  
						
						
						
						
					 
					
						2024-09-25 15:37:13 +02:00 
						 
				 
			
				
					
						
							
							
								Clément Renault 
							
						 
					 
					
						
						
							
						
						86d5e6d9ff 
					 
					
						
						
							
							Use the new Lru  
						
						
						
						
					 
					
						2024-09-25 14:54:56 +02:00 
						 
				 
			
				
					
						
							
							
								Clément Renault 
							
						 
					 
					
						
						
							
						
						759b9b1546 
					 
					
						
						
							
							Introduce a new custom Lru  
						
						
						
						
					 
					
						2024-09-25 14:49:12 +02:00 
						 
				 
			
				
					
						
							
							
								ManyTheFish 
							
						 
					 
					
						
						
							
						
						3f7a500f3b 
					 
					
						
						
							
							Build prefix fst  
						
						
						
						
					 
					
						2024-09-25 14:36:06 +02:00 
						 
				 
			
				
					
						
							
							
								ManyTheFish 
							
						 
					 
					
						
						
							
						
						974272f2e9 
					 
					
						
						
							
							Merge branch 'main' into indexer-edition-2024  
						
						
						
						
					 
					
						2024-09-25 07:41:16 +02:00 
						 
				 
			
				
					
						
							
							
								Clément Renault 
							
						 
					 
					
						
						
							
						
						7ad037841f 
					 
					
						
						
							
							Move the tracing info to eprintln  
						
						
						
						
					 
					
						2024-09-24 18:21:58 +02:00 
						 
				 
			
				
					
						
							
							
								Clément Renault 
							
						 
					 
					
						
						
							
						
						e0c7067355 
					 
					
						
						
							
							Expose an IndexedParallelIterator to the index function  
						
						
						
						
					 
					
						2024-09-24 17:24:59 +02:00 
						 
				 
			
				
					
						
							
							
								ManyTheFish 
							
						 
					 
					
						
						
							
						
						6e87332410 
					 
					
						
						
							
							Change the way the FST is built  
						
						
						
						
					 
					
						2024-09-24 16:28:31 +02:00 
						 
				 
			
				
					
						
							
							
								Clément Renault 
							
						 
					 
					
						
						
							
						
						2d1caf27df 
					 
					
						
						
							
							Use eprintln to log  
						
						
						
						
					 
					
						2024-09-24 15:59:50 +02:00 
						 
				 
			
				
					
						
							
							
								Clément Renault 
							
						 
					 
					
						
						
							
						
						7f148c127c 
					 
					
						
						
							
							Measure the SmallVec efficiency  
						
						
						
						
					 
					
						2024-09-24 15:32:15 +02:00 
						 
				 
			
				
					
						
							
							
								Clément Renault 
							
						 
					 
					
						
						
							
						
						4ce5d3d66d 
					 
					
						
						
							
							Do not check before pushing in bitmaps  
						
						
						
						
					 
					
						2024-09-24 09:43:16 +02:00 
						 
				 
			
				
					
						
							
							
								Clément Renault 
							
						 
					 
					
						
						
							
						
						42b093687d 
					 
					
						
						
							
							Introduce the new PushOptimizedBitmap  
						
						
						
						
					 
					
						2024-09-23 16:38:21 +02:00 
						 
				 
			
				
					
						
							
							
								Clément Renault 
							
						 
					 
					
						
						
							
						
						f00664247d 
					 
					
						
						
							
							Add more stats about the channel message sent  
						
						
						
						
					 
					
						2024-09-23 15:13:52 +02:00 
						 
				 
			
				
					
						
							
							
								Clément Renault 
							
						 
					 
					
						
						
							
						
						193d7f5d34 
					 
					
						
						
							
							Add the mutualized charabia normalization  
						
						
						
						
					 
					
						2024-09-23 14:24:25 +02:00 
						 
				 
			
				
					
						
							
							
								Clément Renault 
							
						 
					 
					
						
						
							
						
						013acb3d93 
					 
					
						
						
							
							Measure merger writer channel contention  
						
						
						
						
					 
					
						2024-09-23 11:07:59 +02:00 
						 
				 
			
				
					
						
							
							
								meili-bors[bot] 
							
						 
					 
					
						
						
							
						
						462a2329f1 
					 
					
						
						
							
							Merge  #4941  
						
						
						
						
						4941: Implement the binary quantization in meilisearch r=irevoire a=irevoire
# Pull Request
## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/4873 
## What does this PR do?
- Add a setting for binary quantization
- Once enabled, binary quantization cannot be disabled
TODO:
- [ ] Missing a bunch of tests
Co-authored-by: Tamo <tamo@meilisearch.com> 
						
						
					 
					
						2024-09-19 15:50:24 +00:00 
						 
				 
			
				
					
						
							
							
								Tamo 
							
						 
					 
					
						
						
							
						
						f6483cf15d 
					 
					
						
						
							
							apply review comment  
						
						
						
						
					 
					
						2024-09-19 16:47:06 +02:00 
						 
				 
			
				
					
						
							
							
								meili-bors[bot] 
							
						 
					 
					
						
						
							
						
						bd34ed01d9 
					 
					
						
						
							
							Merge  #4945  
						
						
						
						
						4945: Add swedish in default pipelines r=dureuill a=ManyTheFish
# Summary
## Fix Swedish support
In Swedish, the characters `å`/`ä`/`ö` are entirely distinct from `a` and `o` and should not be normalized to the same character.
Because the Swedish specialized pipeline was not activated by default, these characters were normalized even with the following settings:
```json
{
  "localizedAttributes": [ { "locales": ["swe"], "attributePatterns": ["*"] } ]
}
```
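
As a rough illustration of the problem, the sketch below strips diacritics generically using Python's standard `unicodedata` module. This is not Charabia's code; `strip_diacritics` is a hypothetical helper that only shows how language-agnostic normalization can conflate distinct Swedish words.

```python
import unicodedata


def strip_diacritics(text):
    """Decompose characters (NFD) and drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# "får" (sheep) and "far" (father) are different Swedish words,
# but they become identical after generic diacritic stripping.
sheep = strip_diacritics("får")
father = strip_diacritics("far")
```

A Swedish-aware pipeline would keep `å`, `ä`, and `ö` intact so that such words stay distinguishable.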
## Update Charabia adding German support
German segmentation will now be activated using the setting:
```json
{
  "localizedAttributes": [ { "locales": ["deu"], "attributePatterns": ["*"] } ]
}
```
# TODO
- [x] Activate Swedish Pipeline
- [x] Add a test to avoid future regressions
- [x] Update Charabia
Co-authored-by: ManyTheFish <many@meilisearch.com> 
						
						
					 
					
						2024-09-19 14:42:03 +00:00 
						 
				 
			
				
					
						
							
							
								Tamo 
							
						 
					 
					
						
						
							
						
						74199f328d 
					 
					
						
						
							
							Make clippy happy  
						
						
						
						
					 
					
						2024-09-19 16:27:34 +02:00 
						 
				 
			
				
					
						
							
							
								Tamo 
							
						 
					 
					
						
						
							
						
						1113c42de0 
					 
					
						
						
							
							fix broken comments  
						
						
						
						
					 
					
						2024-09-19 16:18:36 +02:00 
						 
				 
			
				
					
						
							
							
								ManyTheFish 
							
						 
					 
					
						
						
							
						
						7d6768e4c4 
					 
					
						
						
							
							Add german tokenization pipeline  
						
						
						
						
					 
					
						2024-09-19 16:09:01 +02:00 
						 
				 
			
				
					
						
							
							
								ManyTheFish 
							
						 
					 
					
						
						
							
						
						f77661ec44 
					 
					
						
						
							
							Update Charabia v0.9.1  
						
						
						
						
					 
					
						2024-09-19 16:08:59 +02:00 
						 
				 
			
				
					
						
							
							
								Tamo 
							
						 
					 
					
						
						
							
						
						b8fd85a46d 
					 
					
						
						
							
							Get rid of useless collect before an iteration on the readers  
						
						
						
						
					 
					
						2024-09-19 15:57:38 +02:00 
						 
				 
			
				
					
						
							
							
								Tamo 
							
						 
					 
					
						
						
							
						
						fd43c6c404 
					 
					
						
						
							
							Improve the error message explaining you can't un-bq an embedder  
						
						
						
						
					 
					
						2024-09-19 15:51:29 +02:00 
						 
				 
			
				
					
						
							
							
								Tamo 
							
						 
					 
					
						
						
							
						
						2564ec1496 
					 
					
						
						
							
							Update milli/src/index.rs  
						
						
						
						
						Co-authored-by: Louis Dureuil <louis@meilisearch.com> 
						
						
					 
					
						2024-09-19 15:41:44 +02:00