77 Commits

Author SHA1 Message Date
ManyTheFish
8790880589 Fix clippy 2025-03-11 15:22:39 +01:00
ManyTheFish
6d52c6e711 Merge branch 'main' into granular-filterable-attributes 2025-03-11 10:05:58 +01:00
ManyTheFish
689e69d6d2 Take into account PR messages 2025-03-10 13:46:33 +01:00
meili-bors[bot]
3fd86e8d76
Merge #5371
Some checks failed
Test suite / Tests on ubuntu-22.04 (push) Failing after 12s
Test suite / Tests almost all features (push) Has been skipped
Test suite / Test with Ollama (push) Failing after 8s
Test suite / Test disabled tokenization (push) Has been skipped
Test suite / Run tests in debug (push) Failing after 11s
Test suite / Run Clippy (push) Successful in 7m1s
Test suite / Run Rustfmt (push) Successful in 2m44s
Run the indexing fuzzer / Setup the action (push) Successful in 1h5m40s
Indexing bench (push) / Run and upload benchmarks (push) Has been cancelled
Benchmarks of indexing (push) / Run and upload benchmarks (push) Has been cancelled
Benchmarks of search for geo (push) / Run and upload benchmarks (push) Has been cancelled
Benchmarks of search for songs (push) / Run and upload benchmarks (push) Has been cancelled
Benchmarks of search for Wikipedia articles (push) / Run and upload benchmarks (push) Has been cancelled
Test suite / Tests on macos-13 (push) Has been cancelled
Test suite / Tests on windows-2022 (push) Has been cancelled
5371: Composite embedders r=irevoire a=dureuill

# Pull Request

## Related issue
Fixes #5343 

## What does this PR do?
- Implement [public usage](https://www.notion.so/meilisearch/Composite-embedder-usage-14a4b06b651f81859dc3df21e8cd02a0)
- Refactor the way we check if a parameter is mandatory/allowed/disallowed for a given source
- Take the "nesting context" into account for computer if a parameter is mandatory/allowed/disallowed
- Add tests checking all parameters with all sources, and made sure the results didn't change compared with v1.13

## Dumpless Upgrade

- This adds a new value for an existing parameter => compatible without change
- This adds new optional parameters => compatible without change

Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2025-03-05 17:18:11 +00:00
ManyTheFish
ae8d453868 Refactor Document indexing process (searchables)
**Changes:**
The searchable database extraction is now relying on the AttributePatterns and FieldIdMapWithMetadata to match the field to extract.
Remove the SearchableExtractor trait to make the code less complex.

**Impact:**
- Document Addition/modification searchable indexing
- Document deletion searchable indexing
2025-03-03 10:32:42 +01:00
ManyTheFish
95bccaf5f5 Refactor Document indexing process (Facets)
**Changes:**
The Documents changes now take a selector closure instead of a list of field to match the field to extract.
The seek_leaf_values_in_object function now uses a selector closure of a list of field to match the field to extract
The facet database extraction is now relying on the FilterableAttributesRule to match the field to extract.
The facet-search database extraction is now relying on the FieldIdMapWithMetadata to select the field to index.
The facet level database extraction is now relying on the FieldIdMapWithMetadata to select the field to index.

**Important:**
Because the filterable attributes are patterns now,
the fieldIdMap will only register the fields that exists in at least one document.
if a field doesn't exist in any document, it will not be registered even if it has been specified in the filterable fields.

**Impact:**
- Document Addition/modification facet indexing
- Document deletion facet indexing
2025-03-03 10:32:03 +01:00
ManyTheFish
9f3663e768
Implement Incremental document database stats computing 2025-02-26 17:01:35 +01:00
Louis Dureuil
4a2643daa2
Rename embed_one to embed_search and embed_chunks* to embed_index* 2025-02-24 13:58:26 +01:00
Louis Dureuil
b83275c9c5
Change the updated* functions to only_new functions, hopefully better communicating what they do 2025-02-11 15:27:10 +01:00
Louis Dureuil
d7f35ee3ba
Use merged document instead of updated 2025-02-11 15:27:10 +01:00
Louis Dureuil
d6063079af
Unify facet strings by their normalized value 2025-01-22 15:50:42 +01:00
Louis Dureuil
a21711f473
Fix test 2025-01-14 10:23:59 +01:00
Clément Renault
00a03742ff
Prefer using extend when merging bitmaps than unions (less allocations) 2025-01-09 10:42:38 +01:00
Louis Dureuil
de7f8c4406
refactor indexer mod 2025-01-07 15:29:02 +01:00
Gnosnay
44eb153619 Replace hardcoded string with constants 2024-12-28 20:35:55 +08:00
ManyTheFish
acdd5aa6ea
Use the thread source id instead of the destination id
when filtering on the cache to merge
2024-12-12 18:12:00 +01:00
Kerollmops
2f3cc8cdd2
Fix the merge_caches_sorted function 2024-12-12 16:15:37 +01:00
meili-bors[bot]
1fc90fbacb
Merge #5147
5147: Batch progress r=dureuill a=irevoire

# Pull Request

## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/5068

## What does this PR do?
- ...

## PR checklist
Please check if your PR fulfills the following requirements:
- [ ] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [ ] Have you read the contributing guidelines?
- [ ] Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!


Co-authored-by: Tamo <tamo@meilisearch.com>
2024-12-12 09:15:54 +00:00
Tamo
df9b68f8ed
inital implementation of the progress 2024-12-11 16:25:01 +01:00
Louis Dureuil
bfca54cc2c
Return docid in case of errors while rendering the document template 2024-12-11 15:26:18 +01:00
Kerollmops
a751972c57
Prefer using a stable than a random hash builder 2024-12-10 14:25:53 +01:00
Kerollmops
89637bcaaf
Use bumparaw-collections in Meilisearch/milli 2024-12-10 11:52:20 +01:00
ManyTheFish
07f42e8057 Do not index a filed count when no word is counted 2024-12-09 15:45:12 +01:00
Louis Dureuil
bd5110a2fe
Fix clippy warnings 2024-12-05 16:13:07 +01:00
Louis Dureuil
fa8b9acdf6
Ignore documents that didn't change in facets 2024-12-05 16:12:52 +01:00
Louis Dureuil
2b74d1824b
Ignore documents that didn't change any field in word pair proximity 2024-12-05 15:56:22 +01:00
Louis Dureuil
c77b00d3ac
Don't extract word docids when no searchable changed 2024-12-05 15:51:58 +01:00
meili-bors[bot]
cac355bfa7
Merge #5124
5124: Optimize Prefixes and Merges r=ManyTheFish a=Kerollmops

In this PR, we plan to optimize the read of LMDB to use read the entries in lexicographic order and better use the memory-mapping OS cache:

 - Optimize the prefix generation for word position docids (`@manythefish)`
 - Optimize the parallel merging of the caches to sort entries before merging the caches (`@kerollmops)`
 
## Benchmarks on 1cpu 2gb gpo3 (5k IOps)
 
Before on the tag meilisearch-v1.12.0-rc.3.

```
word_position_docids:merge_and_send_docids: 988s
compute_word_fst: 23.3s
word_pair_proximity_docids:merge_and_send_docids: 428s
compute_word_prefix_fid_docids:recompute_modified_prefixes: 76.3s
compute_word_prefix_position_docids:recompute_modified_prefixes:from_prefixes: 429s
```

After sorting the whole `HashMap`s in a `Vec` on this branch.

```
word_position_docids:merge_and_send_docids: 202s
compute_word_fst: 20.4s
word_pair_proximity_docids:merge_and_send_docids: 427s
compute_word_prefix_fid_docids:recompute_modified_prefixes: 65.5s
compute_word_prefix_position_docids:recompute_modified_prefixes:from_prefixes: 62.5s
```

Co-authored-by: ManyTheFish <many@meilisearch.com>
Co-authored-by: Kerollmops <clement@meilisearch.com>
2024-12-05 09:35:52 +00:00
Kerollmops
52843123d4
Clean up and remove the non-sorted merge_caches function 2024-12-05 10:03:05 +01:00
Louis Dureuil
5f896b1050
Fix geo when spilling 2024-12-04 17:51:12 +01:00
Kerollmops
2e32d0474c
Lexicographically sort all the map to merge 2024-12-04 17:05:11 +01:00
Kerollmops
cb99ac6f7e
Consume vec instead of draining 2024-12-04 17:00:22 +01:00
Kerollmops
be411435f5
Use the merge_caches_alt function in the docids merging 2024-12-04 16:37:29 +01:00
Kerollmops
29ef164530
Introduce a new semi ordered merge function 2024-12-04 16:33:35 +01:00
Clément Renault
db4eaf4d2d
Rename serialize_into into serialize_into_writer 2024-12-02 10:03:27 +01:00
Clément Renault
08d6413365
Fix result types 2024-11-27 14:32:42 +01:00
Clément Renault
70802eb7c7
Fix most issues with the lifetimes 2024-11-27 14:32:42 +01:00
Clément Renault
6ac5b3b136
Finish most of the channels types 2024-11-27 14:32:26 +01:00
Clément Renault
8442db8101
Implement mostly all senders 2024-11-27 14:16:35 +01:00
Louis Dureuil
221e547e86
Slight changes 2024-11-21 16:47:44 +01:00
Clément Renault
61d0615253
Document the geo point extractor 2024-11-21 16:47:08 +01:00
Clément Renault
5727e00374
Remove useless geo skipped 2024-11-21 16:47:08 +01:00
ManyTheFish
36962b943b First batch of PR comment 2024-11-21 16:38:11 +01:00
Clément Renault
a38344acb3
Replace eprintlns by tracing 2024-11-20 15:29:51 +01:00
ManyTheFish
4d616f8794 Parse every attributes and filter before tokenization 2024-11-20 15:15:25 +01:00
ManyTheFish
fe5d50969a
Fix filed selector in extrators 2024-11-20 13:16:44 +01:00
Clément Renault
56c7c5d5f0
Fix comments 2024-11-20 13:16:44 +01:00
Louis Dureuil
2afa33011a
Fix tokenize_document 2024-11-20 13:16:43 +01:00
Louis Dureuil
f893b5153e
Don't mark [""] as empty facet 2024-11-20 13:16:42 +01:00
Louis Dureuil
ca779c21f9
facets: Handle boolean and skip empty strings 2024-11-20 13:16:42 +01:00