meilisearch

mirror of https://github.com/meilisearch/meilisearch.git synced 2025-10-29 06:56:45 +00:00

Author	SHA1	Message	Date
Louis Dureuil	580ea2f450	Pass the fields <-> ids map with metadata to render	2024-09-02 11:30:10 +02:00
meili-bors[bot]	ee62d9ce30	Merge #4845 4845: Fix perf regression facet strings r=ManyTheFish a=dureuill Benchmarks between v1.9 and v1.10 show a performance regression of about x2 (+3dB regression) for most indexing workloads (+44s for hackernews). [Benchmark interpretation in the engine weekly meeting](https://www.notion.so/meilisearch/Engine-weekly-4d49560d374c4a87b4e3d126a261d4a0?pvs=4#98a709683276450295fcfe1f8ea5cef3). - Initial investigation pointed to #4819 as the origin of the regression. - Further investigation points towards the hypernormalization of each facet value in `extract_facet_string_docids` - Most of the slowdown is in `normalize_facet_strings`, and precisely in `detection.language()`. This PR improves the situation (-10s compared with `main` for hackernews, so only +34s regression compared with `v1.9`) by skipping normalization when it can be skipped. I'm not sure how to fix the root cause though. Should we skip facet locale normalization for now? Cc `@ManyTheFish` --- Tentative resolution options: 1. remove locale normalization from facet. I'm not sure why this is required, I believe we weren't doing this before, so maybe we can stop doing that again. 2. don't do language detection when it can be helped: won't help with the regressions in benchmark, but maybe we can skip language detection when the locales contain only one language? 3. use a faster language detection library: `@Kerollmops` told me about https://github.com/quickwit-oss/whichlang which bolsters x10 to x100 throughput compared with whatlang. Should we consider replacing whatlang with whichlang? Now I understand whichlang supports fewer languages than whatlang, so I also suggest: 4. use whichlang when the list of locales is empty (autodetection), or when it only contains locales that whichlang can detect. If the list of locales contains locales that whichlang cannot detect, then use whatlang instead. --- > [!CAUTION] > this PR contains a commit that adds detailed spans, that were used to detect which part of `extract_facet_string_docids` was taking too much time. As this commit adds spans that are called too often and adds 7s overhead, it should be removed before landing. Co-authored-by: Louis Dureuil <louis@meilisearch.com> Co-authored-by: ManyTheFish <many@meilisearch.com>	2024-08-19 06:29:48 +00:00
ManyTheFish	0f965d3574	Remove hotloop's spans	2024-08-14 14:33:36 +02:00
ManyTheFish	ade54493ab	Only detect language for a facet if several locales have been specified by the user in the settings	2024-08-14 12:03:52 +02:00
Louis Dureuil	c3cdc407ec	Avoid unnecessary clone()	2024-08-08 14:57:02 +02:00
Louis Dureuil	2f10273d14	Group by normalized values, make sure you don't remove a value where there remains at still one value that normalizes towards it	2024-08-08 14:02:53 +02:00
Louis Dureuil	e64d0e0ca8	use insert instead of push for bitmaps	2024-08-01 18:32:45 +02:00
Louis Dureuil	0e68718027	Add detailed spans	2024-07-31 13:05:47 +02:00
Louis Dureuil	7c3fc8c655	Split settings and document facet string extractions	2024-07-31 10:57:46 +02:00
Louis Dureuil	8acd3f50bb	skip normalization when the locales and values are the same	2024-07-31 09:53:00 +02:00
Louis Dureuil	d4ea7cc2a9	fix clippy 👉👈	2024-07-25 12:10:32 +02:00
Louis Dureuil	2413592bbf	Display docid when there are documents without manual embeddings for a manual embedder	2024-07-25 12:10:32 +02:00
Louis Dureuil	553440632e	Introduce Setting::some_or_not_set	2024-07-25 12:01:52 +02:00
Louis Dureuil	7a347966da	Allow explicit `dimensions` for ollama	2024-07-25 12:01:51 +02:00
Louis Dureuil	4654d51e05	Add custom headers for REST embedder	2024-07-25 12:01:51 +02:00
ManyTheFish	a918561ac1	Fix PR comments	2024-07-25 10:52:56 +02:00
ManyTheFish	04fa44e7eb	Implement localized attributes settings	2024-07-25 10:51:27 +02:00
ManyTheFish	cc02920f2b	Update charabia	2024-07-25 10:51:27 +02:00
Tamo	988552e178	add tests on the rest embedder	2024-07-24 14:34:17 +02:00
Louis Dureuil	0d8199f3b7	Change parameters in milli settings	2024-07-24 14:34:17 +02:00
Louis Dureuil	24240934f9	Improve errors when indexing documents with a user provided embedder	2024-07-16 13:39:01 +02:00
Louis Dureuil	65d0c32aa7	Allow overriding OpenAI's url	2024-07-16 13:39:00 +02:00
Clément Renault	6e80364c50	Apply review comments	2024-07-11 11:00:27 +02:00
Clément Renault	837274f853	Restrict even more the Rhai engine	2024-07-10 16:30:18 +02:00
Clément Renault	aace587dd1	Create errors for the internal processing ones	2024-07-10 16:29:18 +02:00
Clément Renault	81ec0abad1	Use the new rayon-par-bridge library	2024-07-10 16:29:04 +02:00
Clément Renault	b67d385cf0	Parallelize the edition functions	2024-07-10 16:28:54 +02:00
Clément Renault	2eae2015d7	Support aborting documents edition by function	2024-07-10 16:28:15 +02:00
Clément Renault	33fa17bf12	Support deleting documents with functions	2024-07-10 16:28:15 +02:00
Clément Renault	400e6b93ce	Support user-provided context for documents edition	2024-07-10 16:28:15 +02:00
Clément Renault	f4add93043	Limit the number of script operations	2024-07-10 16:28:14 +02:00
Clément Renault	2fae96ac14	Show the actual number of actually edited documents	2024-07-10 16:28:14 +02:00
Clément Renault	45af18ae9c	Check the Rhai syntax before accepting the script	2024-07-10 16:28:13 +02:00
Clément Renault	2d97164d9f	It works perfectly with some Rhai	2024-07-10 16:28:13 +02:00
Clément Renault	efc156a4a4	Executing Lua works correctly	2024-07-10 16:27:36 +02:00
meili-bors[bot]	2099b4f0dd	Merge #4786 4786: Update dependencies r=Kerollmops a=irevoire # Pull Request ## Related issue Fixes #4753 ## What does this PR do? - Update all dependencies except rustls - [x] Release charabia - [x] Update charabia - [x] Double check that the docker build works after updating charabia Co-authored-by: Tamo <tamo@meilisearch.com> Co-authored-by: Clément Renault <clement@meilisearch.com>	2024-07-10 13:23:54 +00:00
Tamo	4d5005b01a	make clippy happy	2024-07-10 10:06:59 +02:00
hanbings	0a40a98bb6	Make milli use edition 2021 (#4770 ) * Make milli use edition 2021 * Add lifetime annotations to milli. * Run cargo fmt	2024-07-09 17:25:39 +02:00
Tamo	cd46ebd6b5	remove insta deprecating	2024-07-08 18:38:05 +02:00
Tamo	1693332cab	Update arroy and always build the tree that need to be built	2024-06-24 10:14:03 +02:00
meili-bors[bot]	ddd564665b	Merge #4713 4713: Speed up facet distribution r=ManyTheFish a=Kerollmops This PR is akin to #4682, but this time, the same logic is applied to the facets. Bitmaps are not decoded, and we do an intersection on the bytes with the search candidates instead of materializing the RoaringBitmap to destroy it just after the operation. A prospect raised some slow requests when performing facet searches, and I found out that the disk optimization intersection wasn't performed on the facets. Co-authored-by: Clément Renault <clement@meilisearch.com>	2024-06-24 05:23:46 +00:00
Clément Renault	9736e16a88	Make clippy happy	2024-06-20 13:02:44 +02:00
Louis Dureuil	a04041c8f2	Only spawn the pool once	2024-06-19 16:25:33 +02:00
Louis Dureuil	0a8f50695e	Fixes for Rust v1.79	2024-06-13 17:47:44 +02:00
Louis Dureuil	e35ef31738	Small changes following review	2024-06-13 14:20:48 +02:00
Louis Dureuil	3bc8f81abc	user_provided => regenerate	2024-06-12 18:12:20 +02:00
Louis Dureuil	a89eea233b	Fix vectors injection	2024-06-12 17:10:19 +02:00
Louis Dureuil	f5cf01e7d1	Rework extraction to use EmbedderAction	2024-06-12 14:50:55 +02:00
Louis Dureuil	d1dd7e5d09	In transform for removed embedders, write back their user provided vectors in documents, and clear the writers	2024-06-12 14:50:55 +02:00
Louis Dureuil	d18c1f77d7	Update embedder configs with a finer granularity - no longer clear vector DB between any two embedder changes	2024-06-12 14:50:55 +02:00

937 Commits