mirror of https://github.com/meilisearch/meilisearch.git (synced 2025-10-25 13:06:27 +00:00)

	Merge #4845
4845: Fix perf regression facet strings r=ManyTheFish a=dureuill

Benchmarks between v1.9 and v1.10 show a performance regression of about x2 (+3dB regression) for most indexing workloads (+44s for hackernews).

[Benchmark interpretation in the engine weekly meeting](https://www.notion.so/meilisearch/Engine-weekly-4d49560d374c4a87b4e3d126a261d4a0?pvs=4#98a709683276450295fcfe1f8ea5cef3).

- Initial investigation pointed to #4819 as the origin of the regression.
- Further investigation points towards the hypernormalization of each facet value in `extract_facet_string_docids`.
- Most of the slowdown is in `normalize_facet_strings`, and more precisely in `detection.language()`.

This PR improves the situation (-10s compared with `main` for hackernews, so only a +34s regression compared with `v1.9`) by skipping normalization when it can be skipped.

I'm not sure how to fix the root cause though. Should we skip facet locale normalization for now?

Cc `@ManyTheFish`

---

Tentative resolution options:

1. Remove locale normalization from facets. I'm not sure why this is required; I believe we weren't doing this before, so maybe we can stop doing it again.
2. Don't do language detection when it can be avoided: this won't help with the regressions in the benchmarks, but maybe we can skip language detection when the locales contain only one language?
3. Use a faster language detection library: `@Kerollmops` told me about https://github.com/quickwit-oss/whichlang, which boasts x10 to x100 throughput compared with whatlang. Should we consider replacing whatlang with whichlang?

Now, I understand that whichlang supports fewer languages than whatlang, so I also suggest:

4. Use whichlang when the list of locales is empty (autodetection), or when it only contains locales that whichlang can detect. If the list of locales contains locales that whichlang *cannot* detect, **then** use whatlang instead.

---

> [!CAUTION]
> This PR contains a commit that adds detailed spans, which were used to detect which part of `extract_facet_string_docids` was taking too much time. As this commit adds spans that are called too often and adds 7s of overhead, it should be removed before landing.

Co-authored-by: Louis Dureuil <louis@meilisearch.com>
Co-authored-by: ManyTheFish <many@meilisearch.com>
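The selection rule proposed in option 4 can be sketched as follows. This is only an illustration of the decision, not code from the PR: the `Detector` enum, `choose_detector`, and the `FAST_SUPPORTED` list are hypothetical names, and the list is an arbitrary subset that would need to be checked against what whichlang actually detects.

```rust
/// Hypothetical marker for which detection backend to use.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Detector {
    /// Stands in for the faster library (whichlang in option 4).
    Fast,
    /// Stands in for the library with broader coverage (whatlang).
    Complete,
}

/// ASSUMPTION: illustrative subset only, not the real list of
/// languages supported by whichlang.
const FAST_SUPPORTED: &[&str] = &["eng", "fra", "deu", "spa", "jpn"];

/// Picks the fast detector when the locale list is empty (autodetection)
/// or contains only locales the fast detector supports; otherwise falls
/// back to the complete detector.
fn choose_detector(locales: &[&str]) -> Detector {
    // `all` returns true on an empty slice, which covers autodetection.
    if locales.iter().all(|locale| FAST_SUPPORTED.contains(locale)) {
        Detector::Fast
    } else {
        Detector::Complete
    }
}
```

With this rule, a single unsupported locale in the list is enough to fall back to the slower detector for that attribute.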
@@ -1369,12 +1369,18 @@ pub fn perform_facet_search(
         None => TimeBudget::default(),
     };
 
+    // In the faceted search context, we want to use the intersection between the locales provided by the user
+    // and the locales of the facet string.
+    // If the facet string is not localized, we **ignore** the locales provided by the user because the facet data has no locale.
+    // If the user does not provide locales, we use the locales of the facet string.
     let localized_attributes = index.localized_attributes_rules(&rtxn)?.unwrap_or_default();
-    let locales = locales.or_else(|| {
-        localized_attributes
+    let localized_attributes_locales =
+        localized_attributes.into_iter().find(|attr| attr.match_str(&facet_name));
+    let locales = localized_attributes_locales.map(|attr| {
+        attr.locales
             .into_iter()
-            .find(|attr| attr.match_str(&facet_name))
-            .map(|attr| attr.locales)
+            .filter(|locale| locales.as_ref().map_or(true, |locales| locales.contains(locale)))
+            .collect()
     });
 
     let (search, _, _, _) =
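
The locale resolution described in the new comments can be exercised in isolation with a minimal sketch. The `LocalizedRule` struct and `effective_locales` function below are hypothetical stand-ins for illustration, and the exact string comparison is an assumption that replaces the pattern matching done by `match_str` in the real code.

```rust
/// Hypothetical stand-in for a localized-attributes rule.
struct LocalizedRule {
    /// ASSUMPTION: matched by exact equality here, unlike `match_str`.
    attribute_pattern: String,
    locales: Vec<String>,
}

/// Returns the locales to use for a facet search on `facet_name`:
/// - `None` if the facet is not localized (user locales are ignored),
/// - the facet's own locales if the user provided none,
/// - otherwise the facet's locales that the user also listed.
fn effective_locales(
    rules: &[LocalizedRule],
    facet_name: &str,
    user_locales: Option<&[String]>,
) -> Option<Vec<String>> {
    rules
        .iter()
        .find(|rule| rule.attribute_pattern == facet_name)
        .map(|rule| {
            rule.locales
                .iter()
                // Keep every facet locale when the user gave no list.
                .filter(|locale| user_locales.map_or(true, |user| user.contains(*locale)))
                .cloned()
                .collect()
        })
}
```

Note that the intersection can be empty when the user's locales and the facet's locales do not overlap; the sketch returns `Some(vec![])` in that case rather than `None`.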