Commit Graph

350 Commits

Author SHA1 Message Date
aeb6b74725 Make sure we use an FxHashBuilder on the Value 2024-12-10 15:52:22 +01:00
a751972c57 Prefer using a stable rather than a random hash builder 2024-12-10 14:25:53 +01:00
6b269795d2 Update bumparaw-collections to 0.1.2 2024-12-10 14:25:13 +01:00
89637bcaaf Use bumparaw-collections in Meilisearch/milli 2024-12-10 11:52:20 +01:00
866ac91be3 Fix error messages 2024-12-10 11:06:58 +01:00
e610af36aa User failure for documents with docid of ==512 bytes 2024-12-10 11:06:24 +01:00
07f42e8057 Do not index a field count when no word is counted 2024-12-09 15:45:12 +01:00
71f59749dc Reduce union impact in merging 2024-12-09 15:44:06 +01:00
f5dd8dfc3e Rollback max memory usage changes 2024-12-09 10:26:30 +01:00
54e34beac6 Check attributes are filterable before evaluating search query 2024-12-07 21:13:13 +00:00
bd5110a2fe Fix clippy warnings 2024-12-05 16:13:07 +01:00
fa8b9acdf6 Ignore documents that didn't change in facets 2024-12-05 16:12:52 +01:00
2b74d1824b Ignore documents that didn't change any field in word pair proximity 2024-12-05 15:56:22 +01:00
c77b00d3ac Don't extract word docids when no searchable changed 2024-12-05 15:51:58 +01:00
c77073efcc Update::has_changed_for_fields 2024-12-05 15:50:12 +01:00
cac355bfa7 Merge #5124
5124: Optimize Prefixes and Merges r=ManyTheFish a=Kerollmops

In this PR, we optimize LMDB reads so that entries are read in lexicographic order, which makes better use of the OS cache backing the memory map:

 - Optimize the prefix generation for word position docids (@manythefish)
 - Optimize the parallel merging of the caches by sorting entries before merging the caches (@kerollmops)
 
## Benchmarks on 1cpu 2gb gpo3 (5k IOps)
 
Before, on the tag `meilisearch-v1.12.0-rc.3`:

```
word_position_docids:merge_and_send_docids: 988s
compute_word_fst: 23.3s
word_pair_proximity_docids:merge_and_send_docids: 428s
compute_word_prefix_fid_docids:recompute_modified_prefixes: 76.3s
compute_word_prefix_position_docids:recompute_modified_prefixes:from_prefixes: 429s
```

After, on this branch, with the entries of each `HashMap` sorted in a `Vec`:

```
word_position_docids:merge_and_send_docids: 202s
compute_word_fst: 20.4s
word_pair_proximity_docids:merge_and_send_docids: 427s
compute_word_prefix_fid_docids:recompute_modified_prefixes: 65.5s
compute_word_prefix_position_docids:recompute_modified_prefixes:from_prefixes: 62.5s
```
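
The sorting step behind these numbers can be illustrated with a minimal sketch. It is not the code from this PR: the `merge_sorted` function, its signature, and the key and value types are assumptions chosen for illustration. It flattens several in-memory caches into a `Vec`, sorts the entries by key bytes, and merges the values of equal keys, so a consumer would see keys in the same lexicographic order LMDB uses by default.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical helper (not from the PR): merges several caches into a
/// single list of entries ordered lexicographically by key.
fn merge_sorted(
    caches: Vec<HashMap<Vec<u8>, BTreeSet<u32>>>,
) -> Vec<(Vec<u8>, BTreeSet<u32>)> {
    // Flatten every map into one Vec of (key, docids) entries.
    let mut entries: Vec<(Vec<u8>, BTreeSet<u32>)> =
        caches.into_iter().flatten().collect();

    // `Vec<u8>` compares byte by byte, which gives a lexicographic order.
    entries.sort_unstable_by(|a, b| a.0.cmp(&b.0));

    // Equal keys are now adjacent, so merging is a single linear pass.
    let mut merged: Vec<(Vec<u8>, BTreeSet<u32>)> = Vec::with_capacity(entries.len());
    for (key, docids) in entries {
        if let Some((last_key, last_docids)) = merged.last_mut() {
            if *last_key == key {
                last_docids.extend(docids);
                continue;
            }
        }
        merged.push((key, docids));
    }
    merged
}
```

Writing or reading the resulting entries in order touches neighbouring pages of the memory map one after the other instead of jumping around at random, which is the effect the benchmarks above measure.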

Co-authored-by: ManyTheFish <many@meilisearch.com>
Co-authored-by: Kerollmops <clement@meilisearch.com>
2024-12-05 09:35:52 +00:00
52843123d4 Clean up and remove the non-sorted merge_caches function 2024-12-05 10:03:05 +01:00
6298db5bea Merge #5113
5113: Fix the Minimum BBQueue channel threshold r=Kerollmops a=Kerollmops



Co-authored-by: Kerollmops <clement@meilisearch.com>
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-12-05 09:01:02 +00:00
3a11e39c01 Force max_memory to a min of 100MiB 2024-12-04 17:53:30 +01:00
5f896b1050 Fix geo when spilling 2024-12-04 17:51:12 +01:00
2e32d0474c Lexicographically sort all the maps to merge 2024-12-04 17:05:11 +01:00
cb99ac6f7e Consume vec instead of draining 2024-12-04 17:00:22 +01:00
be411435f5 Use the merge_caches_alt function in the docids merging 2024-12-04 16:37:29 +01:00
29ef164530 Introduce a new semi ordered merge function 2024-12-04 16:33:35 +01:00
739c52a3cd Replace HashSets by BTreeSets for the prefixes 2024-12-04 16:16:48 +01:00
261d2ceb06 Yield the BBQueue writer instead of spin looping 2024-12-04 14:16:40 +01:00
96831ed9bb Send the WakeUp message if necessary in the reserve function 2024-12-04 11:03:01 +01:00
0459b1a242 Change the reserve and grant function to accept a closure 2024-12-04 10:32:25 +01:00
8ecb726683 Fix the minimum BBQueue channel threshold 2024-12-03 15:49:11 +01:00
0ad2f57a92 Update bbqueue repo to point to the meilisearch org 2024-12-03 12:00:04 +01:00
e905a72d73 remove mimalloc on Windows 2024-12-02 18:13:56 +01:00
d040aff101 Stop allocating 1GiB for documents 2024-12-02 16:30:14 +01:00
767259be7e Prefer returning an abort indexation rather than throwing a panic 2024-12-02 11:53:42 +01:00
e9f34fb4b1 Make the frame consumer pulling fair 2024-12-02 11:49:01 +01:00
d5c07ef7b3 Manage key length conversion error correctly 2024-12-02 11:03:00 +01:00
5e218f3f4d Remove a sync_all (mark my words) 2024-12-02 11:03:00 +01:00
bcab61ab1d Do spurious wake ups on the receiver side 2024-12-02 11:03:00 +01:00
263c5a348e Move the spin looping for BBQueue frames into a dedicated function 2024-12-02 10:33:49 +01:00
be7d2fbe63 Move the EntryHeader up in the file and document the safety related to the size 2024-12-02 10:19:11 +01:00
f7f9a131e4 Improve copying bytes into aligned memory area 2024-12-02 10:15:58 +01:00
5df5eb2db2 Clarify a method name 2024-12-02 10:10:48 +01:00
30eb0e5b5b Rename recv and read methods to recv_action and recv_frame 2024-12-02 10:08:01 +01:00
5b860cb989 Fix English in the doc 2024-12-02 10:06:35 +01:00
76d0623b11 Reduce the number of unwraps 2024-12-02 10:05:06 +01:00
db4eaf4d2d Rename serialize_into into serialize_into_writer 2024-12-02 10:03:27 +01:00
13f21206a6 Call the serialize_into_writer method from the serialize_into one 2024-12-02 10:03:01 +01:00
14ee7aa84c Make sure the BBQueue is at least 50 MiB 2024-11-28 18:02:48 +01:00
8a35cd1743 Adjust the BBQueue buffers to use 2% instead of 10% 2024-11-28 16:00:15 +01:00
3c7ac093d3 Take the BBQueue capacity into account in the max memory 2024-11-28 15:43:14 +01:00
b57dd5c58e Remove the Vector variant and use the Vectors 2024-11-28 15:20:43 +01:00