meilisearch

mirror of https://github.com/meilisearch/meilisearch.git synced 2025-09-09 06:06:30 +00:00

Author	SHA1	Message	Date
ManyTheFish	ae8d453868	Refactor Document indexing process (searchables) Changes: The searchable database extraction is now relying on the AttributePatterns and FieldIdMapWithMetadata to match the field to extract. Remove the SearchableExtractor trait to make the code less complex. Impact: - Document Addition/modification searchable indexing - Document deletion searchable indexing	2025-03-03 10:32:42 +01:00
ManyTheFish	95bccaf5f5	Refactor Document indexing process (Facets) Changes: The Documents changes now take a selector closure instead of a list of fields to match the field to extract. The seek_leaf_values_in_object function now uses a selector closure instead of a list of fields to match the field to extract. The facet database extraction is now relying on the FilterableAttributesRule to match the field to extract. The facet-search database extraction is now relying on the FieldIdMapWithMetadata to select the field to index. The facet level database extraction is now relying on the FieldIdMapWithMetadata to select the field to index. Important: Because the filterable attributes are now patterns, the fieldIdMap will only register the fields that exist in at least one document. If a field doesn't exist in any document, it will not be registered even if it has been specified in the filterable fields. Impact: - Document Addition/modification facet indexing - Document deletion facet indexing	2025-03-03 10:32:03 +01:00
ManyTheFish	9f3663e768	Implement Incremental document database stats computing	2025-02-26 17:01:35 +01:00
Louis Dureuil	b83275c9c5	Change the `updated*` functions to `only_new` functions, hopefully better communicating what they do	2025-02-11 15:27:10 +01:00
Louis Dureuil	d7f35ee3ba	Use merged document instead of updated	2025-02-11 15:27:10 +01:00
Louis Dureuil	d6063079af	Unify facet strings by their normalized value	2025-01-22 15:50:42 +01:00
Louis Dureuil	a21711f473	Fix test	2025-01-14 10:23:59 +01:00
Clément Renault	00a03742ff	Prefer using extend when merging bitmaps than unions (less allocations)	2025-01-09 10:42:38 +01:00
Louis Dureuil	de7f8c4406	refactor indexer mod	2025-01-07 15:29:02 +01:00
Gnosnay	44eb153619	Replace hardcoded string with constants	2024-12-28 20:35:55 +08:00
ManyTheFish	acdd5aa6ea	Use the thread source id instead of the destination id when filtering on the cache to merge	2024-12-12 18:12:00 +01:00
Kerollmops	2f3cc8cdd2	Fix the merge_caches_sorted function	2024-12-12 16:15:37 +01:00
meili-bors[bot]	1fc90fbacb	Merge #5147 5147: Batch progress r=dureuill a=irevoire # Pull Request ## Related issue Fixes https://github.com/meilisearch/meilisearch/issues/5068 ## What does this PR do? - ... Co-authored-by: Tamo <tamo@meilisearch.com>	2024-12-12 09:15:54 +00:00
Tamo	df9b68f8ed	initial implementation of the progress	2024-12-11 16:25:01 +01:00
Louis Dureuil	bfca54cc2c	Return docid in case of errors while rendering the document template	2024-12-11 15:26:18 +01:00
Kerollmops	a751972c57	Prefer using a stable than a random hash builder	2024-12-10 14:25:53 +01:00
Kerollmops	89637bcaaf	Use bumparaw-collections in Meilisearch/milli	2024-12-10 11:52:20 +01:00
ManyTheFish	07f42e8057	Do not index a field count when no word is counted	2024-12-09 15:45:12 +01:00
Louis Dureuil	bd5110a2fe	Fix clippy warnings	2024-12-05 16:13:07 +01:00
Louis Dureuil	fa8b9acdf6	Ignore documents that didn't change in facets	2024-12-05 16:12:52 +01:00
Louis Dureuil	2b74d1824b	Ignore documents that didn't change any field in word pair proximity	2024-12-05 15:56:22 +01:00
Louis Dureuil	c77b00d3ac	Don't extract word docids when no searchable changed	2024-12-05 15:51:58 +01:00
meili-bors[bot]	cac355bfa7	Merge #5124 5124: Optimize Prefixes and Merges r=ManyTheFish a=Kerollmops In this PR, we plan to optimize the reads of LMDB to read the entries in lexicographic order and better use the memory-mapping OS cache: - Optimize the prefix generation for word position docids (@manythefish) - Optimize the parallel merging of the caches to sort entries before merging the caches (@kerollmops) ## Benchmarks on 1cpu 2gb gpo3 (5k IOps) Before on the tag meilisearch-v1.12.0-rc.3. ``` word_position_docids:merge_and_send_docids: 988s compute_word_fst: 23.3s word_pair_proximity_docids:merge_and_send_docids: 428s compute_word_prefix_fid_docids:recompute_modified_prefixes: 76.3s compute_word_prefix_position_docids:recompute_modified_prefixes:from_prefixes: 429s ``` After sorting the whole `HashMap`s in a `Vec` on this branch. ``` word_position_docids:merge_and_send_docids: 202s compute_word_fst: 20.4s word_pair_proximity_docids:merge_and_send_docids: 427s compute_word_prefix_fid_docids:recompute_modified_prefixes: 65.5s compute_word_prefix_position_docids:recompute_modified_prefixes:from_prefixes: 62.5s ``` Co-authored-by: ManyTheFish <many@meilisearch.com> Co-authored-by: Kerollmops <clement@meilisearch.com>	2024-12-05 09:35:52 +00:00
Kerollmops	52843123d4	Clean up and remove the non-sorted merge_caches function	2024-12-05 10:03:05 +01:00
Louis Dureuil	5f896b1050	Fix geo when spilling	2024-12-04 17:51:12 +01:00
Kerollmops	2e32d0474c	Lexicographically sort all the map to merge	2024-12-04 17:05:11 +01:00
Kerollmops	cb99ac6f7e	Consume vec instead of draining	2024-12-04 17:00:22 +01:00
Kerollmops	be411435f5	Use the merge_caches_alt function in the docids merging	2024-12-04 16:37:29 +01:00
Kerollmops	29ef164530	Introduce a new semi ordered merge function	2024-12-04 16:33:35 +01:00
Clément Renault	db4eaf4d2d	Rename serialize_into into serialize_into_writer	2024-12-02 10:03:27 +01:00
Clément Renault	08d6413365	Fix result types	2024-11-27 14:32:42 +01:00
Clément Renault	70802eb7c7	Fix most issues with the lifetimes	2024-11-27 14:32:42 +01:00
Clément Renault	6ac5b3b136	Finish most of the channels types	2024-11-27 14:32:26 +01:00
Clément Renault	8442db8101	Implement mostly all senders	2024-11-27 14:16:35 +01:00
Louis Dureuil	221e547e86	Slight changes	2024-11-21 16:47:44 +01:00
Clément Renault	61d0615253	Document the geo point extractor	2024-11-21 16:47:08 +01:00
Clément Renault	5727e00374	Remove useless geo skipped	2024-11-21 16:47:08 +01:00
ManyTheFish	36962b943b	First batch of PR comment	2024-11-21 16:38:11 +01:00
Clément Renault	a38344acb3	Replace eprintlns by tracing	2024-11-20 15:29:51 +01:00
ManyTheFish	4d616f8794	Parse every attributes and filter before tokenization	2024-11-20 15:15:25 +01:00
ManyTheFish	fe5d50969a	Fix field selector in extractors	2024-11-20 13:16:44 +01:00
Clément Renault	56c7c5d5f0	Fix comments	2024-11-20 13:16:44 +01:00
Louis Dureuil	2afa33011a	Fix tokenize_document	2024-11-20 13:16:43 +01:00
Louis Dureuil	f893b5153e	Don't mark [""] as empty facet	2024-11-20 13:16:42 +01:00
Louis Dureuil	ca779c21f9	facets: Handle boolean and skip empty strings	2024-11-20 13:16:42 +01:00
ManyTheFish	b1f8aec348	Fix index_documents_check_exists_database	2024-11-20 13:16:41 +01:00
ManyTheFish	ba7f091db3	Use tokenizer on numbers and booleans	2024-11-20 13:16:41 +01:00
Louis Dureuil	8049df125b	Add depth to facet extraction so that null inside an array doesn't mark the entire field as null	2024-11-20 13:16:40 +01:00
ManyTheFish	41dbdd2d18	Fix filtered_placeholder_search_should_not_return_deleted_documents and word_scale_set_and_reset	2024-11-19 16:08:25 +01:00
Louis Dureuil	c782c09208	Move step to a dedicated mod and replace it with an enum	2024-11-18 18:22:13 +01:00


72 Commits