Commit Graph

252 Commits

Author SHA1 Message Date
Louis Dureuil
942d49314c Remove dependency that requires libstdc++ 2023-12-18 22:17:18 +01:00
Louis Dureuil
65e49b7092 Remove stuff, add distribution shift (WIP) 2023-12-14 16:08:38 +01:00
Louis Dureuil
dde3a04679 WIP arroy integration 2023-12-14 16:07:49 +01:00
Louis Dureuil
13c2c6c16b Small commit to add hybrid search and autoembedding 2023-12-14 16:07:48 +01:00
Louis Dureuil
21bcf32109 Add candle and hg_hub, updating a lot of deps in the process 2023-12-14 16:07:48 +01:00
Clément Renault
170e063b80 Remove the actix-web dependency from milli 2023-11-28 17:19:57 +01:00
Clément Renault
d32eb11329 Move to the v0.20.0-alpha.9 of heed 2023-11-27 11:52:22 +01:00
Clément Renault
0d4482625a Make the changes to use heed v0.20-alpha.6 2023-11-23 11:43:58 +01:00
Clément Renault
56a0d91ecd Update the heed dependency and lock file 2023-11-22 15:11:09 +01:00
Clément Renault
7cb7e37ba8 Merge branch 'main' into tmp-release-v1.5.0 2023-11-21 16:30:46 +01:00
Clément Renault
b10c060bf7 Cleanup TOML 2023-11-01 14:03:04 +01:00
Clément Renault
c71b1d33ae Sort entries using rayon in the transform sorters 2023-11-01 11:07:16 +01:00
ManyTheFish
17b647dfe5 Wip 2023-10-30 11:13:08 +01:00
ManyTheFish
4c6fddb1cb update charabia 2023-10-26 17:01:10 +02:00
meili-bors[bot]
ccf3ba3f32 Merge #4019
4019: Bringing back changes from `v1.3.2` onto `main` r=irevoire a=Kerollmops



Co-authored-by: Kerollmops <clement@meilisearch.com>
Co-authored-by: meili-bors[bot] <89034592+meili-bors[bot]@users.noreply.github.com>
Co-authored-by: irevoire <irevoire@users.noreply.github.com>
Co-authored-by: Clément Renault <clement@meilisearch.com>
2023-08-28 12:14:11 +00:00
Kerollmops
717b069907 Bump charabia to 0.8.3 2023-08-22 16:25:00 +02:00
ManyTheFish
cab27c2ab4 upgrade indexmap = "2.0.0" 2023-08-10 18:09:02 +02:00
ManyTheFish
624fa9052f upgrade deserr = "0.6.0" 2023-08-10 18:09:02 +02:00
ManyTheFish
60c11dbdbd upgrade rstar - "0.11.0" 2023-08-10 18:09:02 +02:00
ManyTheFish
dacee40ebc upgrade memmap2 = "0.7.1" 2023-08-10 18:09:02 +02:00
ManyTheFish
cc2c19d4c3 upgrade itertools = "0.10.5" 2023-08-10 18:09:02 +02:00
ManyTheFish
b45c36cd71 Merge branch 'main' into tmp-release-v1.3.0 2023-08-01 15:05:17 +02:00
meili-bors[bot]
151c31c18f Merge #3963
3963: Fix the milli crate r=ManyTheFish a=irevoire

Milli was using the serde feature of either without enabling it first; thus, it wasn't working.

It was working in meilisearch, though, because `meilisearch-types` was using the feature which enables it globally for all the other crates.

## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/3962

Co-authored-by: Tamo <tamo@meilisearch.com>
2023-07-31 09:32:08 +00:00
Tamo
a8ad0902d3 Fix the milli crate
Milli was using the serde feature of either without enabling it first, thus it wasn't working
2023-07-31 11:08:27 +02:00
ThatOneCalculator
ba919b6123 fix: ⬆️ up mimalloc 2023-07-28 20:35:47 -07:00
Clément Renault
d8b47b689e Use the new read-txn-no-tls heed feature 2023-07-26 15:45:15 +02:00
Kerollmops
29ab54b259 Replace the hnsw crate by the instant-distance one 2023-07-25 12:37:35 +02:00
ManyTheFish
0497f93494 Update Charabia to the last version 2023-07-19 15:19:32 +02:00
Kerollmops
eef95de30e First iteration on exposing puffin profiling 2023-07-18 17:38:13 +02:00
ManyTheFish
c106906f8f deactivate camelCase segmentation 2023-07-13 12:06:27 +02:00
Kerollmops
a442af6a7c Update the features of the either dependency to compile milli successfully 2023-07-03 18:51:43 +02:00
meili-bors[bot]
661d1f90dc Merge #3866
3866: Update charabia v0.8.0 r=dureuill a=ManyTheFish

# Pull Request

Update Charabia:
- enhance Japanese segmentation
- enhance Latin Tokenization
  - words containing `_` are now properly segmented into several words
  - brackets `{([])}` are no more considered as context separators so word separated by brackets are now considered near together for the proximity ranking rule
- fixes #3815
- fixes #3778
- fixes [product#151](https://github.com/meilisearch/product/discussions/151)

> Important note: now the float numbers are segmented around the `.` so `3.22` is segmented as [`3`, `.`, `22`] but the middle dot isn't considered as a hard separator, which means that if we search `3.22` we find documents containing `3.22`

Co-authored-by: ManyTheFish <many@meilisearch.com>
2023-06-29 15:24:36 +00:00
ManyTheFish
84845de9ef Update Charabia 2023-06-29 15:56:32 +02:00
Kerollmops
a385642ec3 Replace the BTreeMap by an IndexMap to return values in order 2023-06-29 14:33:31 +02:00
Kerollmops
c79e82c62a Move back to the hnsw crate
This reverts commit 7a4b6c065482f988b01298642f4c18775503f92f.
2023-06-27 12:32:39 +02:00
Kerollmops
268a9ef416 Move to the hgg crate 2023-06-27 12:32:38 +02:00
Clément Renault
4571e512d2 Store the vectors in an HNSW in LMDB 2023-06-27 12:32:38 +02:00
Clément Renault
34349faeae Create a new _vector extractor 2023-06-27 12:32:37 +02:00
meili-bors[bot]
45636d315c Merge #3670
3670: Fix addition deletion bug r=irevoire a=irevoire

The first commit of this PR is a revert of https://github.com/meilisearch/meilisearch/pull/3667. It re-enable the auto-batching of addition and deletion of tasks. No new changes have been introduced outside of `milli`. So all the changes you see on the autobatcher have actually already been reviewed.

It fixes https://github.com/meilisearch/meilisearch/issues/3440.

### What was happening?

The issue was that the `external_documents_ids` generated in the `transform` were used in a very strange way that wasn’t compatible with the deletion of documents.
Instead of doing a clear merge between the external document IDs of the DB and the one returned by the transform + writing it on disk, we were doing some weird tricks with the soft-deleted to avoid writing the fst on disk as much as possible.
The new algorithm may be a bit slower but is way more straightforward and doesn’t change depending on if the soft deletion was used or not. Here is a list of the changes introduced:
1. We now do a clear distinction between the `new_external_documents_ids` coming from the transform and only held on RAM and the `external_documents_ids` coming from the DB.
2. The `new_external_documents_ids` (coming out of the transform) are now represented as an `fst`. We don't need to struggle with the hard, soft distinction + the soft_deleted => That's easier to understand
3. When indexing documents, we merge the `external_documents_ids` coming from the DB and the `new_external_documents_ids` coming from the transform.

### Other things introduced in this  PR

Since we constantly have to write small, very specialized fuzzers for this kind of bug, we decided to push the one used to reproduce this bug.
It's not perfect, but it's easy to improve in the future.
It'll also run for as long as possible on every merge on the main branch.

Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: Loïc Lecrenier <loic.lecrenier@icloud.com>
2023-06-19 09:09:30 +00:00
Tamo
6c6387d05e move the fuzzer to its own crate 2023-05-29 12:27:39 +02:00
Tamo
602ad98cb8 improve the way we handle the fsts 2023-05-22 11:15:14 +02:00
Tamo
4391cba6ca fix the addition + deletion bug 2023-05-17 18:28:57 +02:00
Kerollmops
1a79fd0c3c Use the new heed v0.12.6 2023-05-15 11:42:30 +02:00
Kerollmops
c4a40e7110 Use the writemap flag to reduce the memory usage 2023-05-15 10:15:33 +02:00
Jakub Jirutka
13f1277637 Allow to disable specialized tokenizations (again)
In PR #2773, I added the `chinese`, `hebrew`, `japanese` and `thai`
feature flags to allow melisearch to be built without huge specialed
tokenizations that took up 90% of the melisearch binary size.
Unfortunately, due to some recent changes, this doesn't work anymore.
The problem lies in excessive use of the `default` feature flag, which
infects the dependency graph.

Instead of adding `default-features = false` here and there, it's easier
and more future-proof to not declare `default` in `milli` and
`meilisearch-types`. I've renamed it to `all-tokenizers`, which also
makes it a bit clearer what it's about.
2023-05-04 15:45:40 +02:00
Louis Dureuil
90bc230820 Merge remote-tracking branch 'origin/main' into search-refactor
Conflicts | resolution
----------|-----------
Cargo.lock | added mimalloc
Cargo.toml |  took origin/main version
milli/src/search/criteria/exactness.rs | deleted after checking it was only clippy changes
milli/src/search/query_tree.rs | deleted after checking it was only clippy changes
2023-05-03 12:19:06 +02:00
ManyTheFish
249053e514 Update feature flags 2023-04-26 14:59:25 +02:00
ManyTheFish
ff2cf2a5ae Update charabia in milli 2023-04-26 14:56:54 +02:00
Kerollmops
a109802d45 Upgrade the incompatible versions of the dependencies 2023-04-24 17:50:57 +02:00
Kerollmops
47b66e49b8 Upgrade the compatible versions of the dependencies 2023-04-24 17:50:52 +02:00