Commit Graph

259 Commits

Author SHA1 Message Date
e773dfa9ba get rids of log in milli and add logs for the bucket sort 2024-02-08 15:04:05 +01:00
5d7061682e Add tracing to milli 2024-02-08 15:03:31 +01:00
f692021bfc Implement PR comments 2024-01-22 10:25:56 +01:00
84f49d76cd Add cuda feature 2024-01-22 10:25:16 +01:00
01e2c3d6bb Bump arroy to v0.2.0 2024-01-16 16:45:55 +01:00
7f125bfb12 Update incompatible dependencies 2024-01-16 15:15:54 +01:00
5869ca7716 Upgrade all compatible dependencies 2024-01-16 15:05:03 +01:00
942d49314c Remove dependency that requires libstdc++ 2023-12-18 22:17:18 +01:00
65e49b7092 Remove stuff, add distribution shift (WIP) 2023-12-14 16:08:38 +01:00
dde3a04679 WIP arroy integration 2023-12-14 16:07:49 +01:00
13c2c6c16b Small commit to add hybrid search and autoembedding 2023-12-14 16:07:48 +01:00
21bcf32109 Add candle and hg_hub, updating a lot of deps in the process 2023-12-14 16:07:48 +01:00
170e063b80 Remove the actix-web dependency from milli 2023-11-28 17:19:57 +01:00
d32eb11329 Move to the v0.20.0-alpha.9 of heed 2023-11-27 11:52:22 +01:00
0d4482625a Make the changes to use heed v0.20-alpha.6 2023-11-23 11:43:58 +01:00
56a0d91ecd Update the heed dependency and lock file 2023-11-22 15:11:09 +01:00
7cb7e37ba8 Merge branch 'main' into tmp-release-v1.5.0 2023-11-21 16:30:46 +01:00
b10c060bf7 Cleanup TOML 2023-11-01 14:03:04 +01:00
c71b1d33ae Sort entries using rayon in the transform sorters 2023-11-01 11:07:16 +01:00
17b647dfe5 Wip 2023-10-30 11:13:08 +01:00
4c6fddb1cb update charabia 2023-10-26 17:01:10 +02:00
ccf3ba3f32 Merge #4019
4019: Bringing back changes from `v1.3.2` onto `main` r=irevoire a=Kerollmops



Co-authored-by: Kerollmops <clement@meilisearch.com>
Co-authored-by: meili-bors[bot] <89034592+meili-bors[bot]@users.noreply.github.com>
Co-authored-by: irevoire <irevoire@users.noreply.github.com>
Co-authored-by: Clément Renault <clement@meilisearch.com>
2023-08-28 12:14:11 +00:00
717b069907 Bump charabia to 0.8.3 2023-08-22 16:25:00 +02:00
cab27c2ab4 upgrade indexmap = "2.0.0" 2023-08-10 18:09:02 +02:00
624fa9052f upgrade deserr = "0.6.0" 2023-08-10 18:09:02 +02:00
60c11dbdbd upgrade rstar - "0.11.0" 2023-08-10 18:09:02 +02:00
dacee40ebc upgrade memmap2 = "0.7.1" 2023-08-10 18:09:02 +02:00
cc2c19d4c3 upgrade itertools = "0.10.5" 2023-08-10 18:09:02 +02:00
b45c36cd71 Merge branch 'main' into tmp-release-v1.3.0 2023-08-01 15:05:17 +02:00
151c31c18f Merge #3963
3963: Fix the milli crate r=ManyTheFish a=irevoire

Milli was using the serde feature of either without enabling it first; thus, it wasn't working.

It was working in meilisearch, though, because `meilisearch-types` was using the feature which enables it globally for all the other crates.

## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/3962

Co-authored-by: Tamo <tamo@meilisearch.com>
2023-07-31 09:32:08 +00:00
a8ad0902d3 Fix the milli crate
Milli was using the serde feature of either without enabling it first, thus it wasn't working
2023-07-31 11:08:27 +02:00
ba919b6123 fix: ⬆️ up mimalloc 2023-07-28 20:35:47 -07:00
d8b47b689e Use the new read-txn-no-tls heed feature 2023-07-26 15:45:15 +02:00
29ab54b259 Replace the hnsw crate by the instant-distance one 2023-07-25 12:37:35 +02:00
0497f93494 Update Charabia to the last version 2023-07-19 15:19:32 +02:00
eef95de30e First iteration on exposing puffin profiling 2023-07-18 17:38:13 +02:00
c106906f8f deactivate camelCase segmentation 2023-07-13 12:06:27 +02:00
a442af6a7c Update the features of the either dependency to compile milli successfully 2023-07-03 18:51:43 +02:00
661d1f90dc Merge #3866
3866: Update charabia v0.8.0 r=dureuill a=ManyTheFish

# Pull Request

Update Charabia:
- enhance Japanese segmentation
- enhance Latin Tokenization
  - words containing `_` are now properly segmented into several words
  - brackets `{([])}` are no more considered as context separators so word separated by brackets are now considered near together for the proximity ranking rule
- fixes #3815
- fixes #3778
- fixes [product#151](https://github.com/meilisearch/product/discussions/151)

> Important note: now the float numbers are segmented around the `.` so `3.22` is segmented as [`3`, `.`, `22`] but the middle dot isn't considered as a hard separator, which means that if we search `3.22` we find documents containing `3.22`

Co-authored-by: ManyTheFish <many@meilisearch.com>
2023-06-29 15:24:36 +00:00
84845de9ef Update Charabia 2023-06-29 15:56:32 +02:00
a385642ec3 Replace the BTreeMap by an IndexMap to return values in order 2023-06-29 14:33:31 +02:00
c79e82c62a Move back to the hnsw crate
This reverts commit 7a4b6c065482f988b01298642f4c18775503f92f.
2023-06-27 12:32:39 +02:00
268a9ef416 Move to the hgg crate 2023-06-27 12:32:38 +02:00
4571e512d2 Store the vectors in an HNSW in LMDB 2023-06-27 12:32:38 +02:00
34349faeae Create a new _vector extractor 2023-06-27 12:32:37 +02:00
45636d315c Merge #3670
3670: Fix addition deletion bug r=irevoire a=irevoire

The first commit of this PR is a revert of https://github.com/meilisearch/meilisearch/pull/3667. It re-enable the auto-batching of addition and deletion of tasks. No new changes have been introduced outside of `milli`. So all the changes you see on the autobatcher have actually already been reviewed.

It fixes https://github.com/meilisearch/meilisearch/issues/3440.

### What was happening?

The issue was that the `external_documents_ids` generated in the `transform` were used in a very strange way that wasn’t compatible with the deletion of documents.
Instead of doing a clear merge between the external document IDs of the DB and the one returned by the transform + writing it on disk, we were doing some weird tricks with the soft-deleted to avoid writing the fst on disk as much as possible.
The new algorithm may be a bit slower but is way more straightforward and doesn’t change depending on if the soft deletion was used or not. Here is a list of the changes introduced:
1. We now do a clear distinction between the `new_external_documents_ids` coming from the transform and only held on RAM and the `external_documents_ids` coming from the DB.
2. The `new_external_documents_ids` (coming out of the transform) are now represented as an `fst`. We don't need to struggle with the hard, soft distinction + the soft_deleted => That's easier to understand
3. When indexing documents, we merge the `external_documents_ids` coming from the DB and the `new_external_documents_ids` coming from the transform.

### Other things introduced in this  PR

Since we constantly have to write small, very specialized fuzzers for this kind of bug, we decided to push the one used to reproduce this bug.
It's not perfect, but it's easy to improve in the future.
It'll also run for as long as possible on every merge on the main branch.

Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: Loïc Lecrenier <loic.lecrenier@icloud.com>
2023-06-19 09:09:30 +00:00
6c6387d05e move the fuzzer to its own crate 2023-05-29 12:27:39 +02:00
602ad98cb8 improve the way we handle the fsts 2023-05-22 11:15:14 +02:00
4391cba6ca fix the addition + deletion bug 2023-05-17 18:28:57 +02:00
1a79fd0c3c Use the new heed v0.12.6 2023-05-15 11:42:30 +02:00