Commit Graph

2640 Commits

Author SHA1 Message Date
Clément Renault
00e045b249 Rename and use the try_arc_for_each_try_init method 2024-10-01 11:11:25 +02:00
Clément Renault
d83c9a4074 Introduce the try_for_each_try_init method to be used with Arced Errors 2024-10-01 11:11:25 +02:00
Clément Renault
f3356ddaa4 Fix the errors when using the try_map_try_init method 2024-10-01 11:11:10 +02:00
Clément Renault
31de5c747e WIP using try_map_try_init 2024-10-01 11:10:53 +02:00
Clément Renault
3843240940 Prefer using Ars instead of Options 2024-10-01 11:10:53 +02:00
Louis Dureuil
8cb5e7437d try using try_map_try_init 2024-10-01 11:10:53 +02:00
Louis Dureuil
5b776556fe Add ParallelIteratorExt 2024-10-01 11:10:53 +02:00
ManyTheFish
bb7a503e5d Compute prefix databases
We are now computing the prefix FST and a prefix delta in the Merger thread,
after all the databases are written, the main thread will recompute the prefix databases based on the prefix delta without needing any grenad temporary file anymore
2024-10-01 09:57:06 +02:00
Louis Dureuil
64589278ac Appease *some* of clippy warnings 2024-09-30 16:08:29 +02:00
ManyTheFish
8df6daf308 Remove fid_wordcount_docids.rs 2024-09-30 11:52:31 +02:00
ManyTheFish
5b552caf42 Fix position in insertions 2024-09-30 11:46:32 +02:00
ManyTheFish
2b51a63418 Remove dead code 2024-09-30 11:42:36 +02:00
Louis Dureuil
3d8024fb2b write the weighted fields ids map 2024-09-30 11:35:03 +02:00
Louis Dureuil
4b0da0ff24 Fix inversion of field_id and position 2024-09-30 11:34:50 +02:00
Louis Dureuil
079f2b5de0 Format error messages consistently 2024-09-30 11:34:31 +02:00
ManyTheFish
960060ebdf Fix fst builder when their is no previous FST 2024-09-25 16:53:00 +02:00
Clément Renault
3d244451df Reduce the lru key size from 8 to 12 bytes 2024-09-25 16:14:13 +02:00
Clément Renault
5f53935c8a Fix a bug in the Lru 2024-09-25 16:09:34 +02:00
Clément Renault
29a7623c3f Fxi some logs 2024-09-25 15:57:50 +02:00
Clément Renault
e97041f7d0 Replace the Lru free list by a simple increment 2024-09-25 15:55:52 +02:00
Clément Renault
52d7f3ed1c Reduce the lru key size from 20 to 8 bytes 2024-09-25 15:37:13 +02:00
Clément Renault
86d5e6d9ff Use the new Lru 2024-09-25 14:54:56 +02:00
Clément Renault
759b9b1546 Introduce a new custom Lru 2024-09-25 14:49:12 +02:00
ManyTheFish
3f7a500f3b Build prefix fst 2024-09-25 14:36:06 +02:00
ManyTheFish
974272f2e9 Merge branch 'main' into indexer-edition-2024 2024-09-25 07:41:16 +02:00
Clément Renault
7ad037841f Move the tracing info to eprintln 2024-09-24 18:21:58 +02:00
Clément Renault
e0c7067355 Expose an IndexedParallelIterator to the index function 2024-09-24 17:24:59 +02:00
ManyTheFish
6e87332410 Change the way the FST is built 2024-09-24 16:28:31 +02:00
Clément Renault
2d1caf27df Use eprintln to log 2024-09-24 15:59:50 +02:00
Clément Renault
7f148c127c Measure the SmallVec efficacity 2024-09-24 15:32:15 +02:00
Clément Renault
4ce5d3d66d Do not check before pushing in bitmaps 2024-09-24 09:43:16 +02:00
Clément Renault
42b093687d Introduce the new PushOptimizedBitmap 2024-09-23 16:38:21 +02:00
Clément Renault
f00664247d Add more stats about the channel message sent 2024-09-23 15:13:52 +02:00
Clément Renault
193d7f5d34 Add the mutualized charabia normalization 2024-09-23 14:24:25 +02:00
Clément Renault
013acb3d93 Measure merger writer channel contention 2024-09-23 11:07:59 +02:00
meili-bors[bot]
462a2329f1 Merge #4941
4941: Implement the binary quantization in meilisearch r=irevoire a=irevoire

# Pull Request

## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/4873

## What does this PR do?
- Add a settings for the binary quantization
- Once enabled, the bq cannot be disabled

TODO:
- [ ] Missing a bunch of tests

Co-authored-by: Tamo <tamo@meilisearch.com>
2024-09-19 15:50:24 +00:00
Tamo
f6483cf15d apply review comment 2024-09-19 16:47:06 +02:00
meili-bors[bot]
bd34ed01d9 Merge #4945
4945: Add swedish in default pipelines r=dureuill a=ManyTheFish

# Summary
## Fix Swedish support

In Swedish the characters `å`/`ä`/`ö` are completely different than `a` or `o`  and should not be normalized as the same character.
because the Swedish specialized pipeline was not activated by default, these characters were normalized even with the settings:
```json
{
  "localizedAttributes": [ { "locales": ["swe"], "attributePatterns": ["*"] } ]
}
```

## Update Charabia adding German support

German segmentation will now be activated using the setting:
```json
{
  "localizedAttributes": [ { "locales": ["deu"], "attributePatterns": ["*"] } ]
}
```

# TODO

- [x] Activate Swedish Pipeline
- [x] Add a test to avoid future regressions
- [x] Update Charabia


Co-authored-by: ManyTheFish <many@meilisearch.com>
2024-09-19 14:42:03 +00:00
Tamo
74199f328d Make clippy happy 2024-09-19 16:27:34 +02:00
Tamo
1113c42de0 fix broken comments 2024-09-19 16:18:36 +02:00
ManyTheFish
7d6768e4c4 Add german tokenization pipeline 2024-09-19 16:09:01 +02:00
ManyTheFish
f77661ec44 Update Charabia v0.9.1 2024-09-19 16:08:59 +02:00
Tamo
b8fd85a46d Get rids of useless collect before an iteration on the readers 2024-09-19 15:57:38 +02:00
Tamo
fd43c6c404 Improve the error message explaining you can't un-bq an embedder 2024-09-19 15:51:29 +02:00
Tamo
2564ec1496 Update milli/src/index.rs
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-09-19 15:41:44 +02:00
Tamo
b6b73fe41c Update milli/src/update/settings.rs
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-09-19 15:41:14 +02:00
Tamo
6dde41cc46 stop using a local version of arroy and instead point to the git repo with the rev 2024-09-19 15:25:38 +02:00
Tamo
163f8023a1 remove debug println 2024-09-19 12:13:25 +02:00
Tamo
633537ccd7 fix updating documents without updating the settings 2024-09-19 12:00:58 +02:00
Tamo
3f6301dbc9 fix the missing embedder name in the error message when trying to disable the binary quantization 2024-09-19 12:00:58 +02:00