Commit Graph

1164 Commits

Author SHA1 Message Date
bors[bot]
16698f714b Merge #287
287: Add benchmarks for indexing r=Kerollmops a=irevoire

closes #274 
I don't really know how much time this will take on our bench machine. I'm afraid the wiki dataset will take a really long time to bench (it takes 1h30 on my computer).

If you are ok with it, I would like to merge this first PR since it introduces a first set of benchmarks and see how much time it takes in reality on our setup.

Co-authored-by: Tamo <tamo@meilisearch.com>
2021-07-07 15:41:15 +00:00
Tamo
931021fe57 add benchmarks for indexing 2021-07-07 13:09:05 +02:00
bors[bot]
4c9531bdf3 Merge #285
285: Support documents with at most 65536 fields r=Kerollmops a=Kerollmops

Fixes #248.

In this PR I updated the `obkv` crate. It now supports arbitrary key lengths, so I was able to use a `u16` to represent the fields instead of a single byte. It was impressively easy to update the whole codebase 🍡 🍔
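
As a rough illustration of the change described above, here is a minimal sketch of a two-byte field id key. The `FieldId` alias, the helper names, and the big-endian layout are assumptions made for this example; they are not taken from the `obkv` crate or from milli.

```rust
/// Hypothetical alias for the type that numbers the fields of a document.
type FieldId = u16;

/// Encodes a field id as a fixed-width key.
/// Big-endian is an assumption here: it keeps byte-wise ordering of keys
/// identical to numeric ordering of ids.
fn encode_field_id(id: FieldId) -> [u8; 2] {
    id.to_be_bytes()
}

/// Decodes a key produced by `encode_field_id`.
fn decode_field_id(key: [u8; 2]) -> FieldId {
    FieldId::from_be_bytes(key)
}

fn main() {
    // 300 does not fit in a single byte but round-trips through two bytes.
    let key = encode_field_id(300);
    assert_eq!(decode_field_id(key), 300);

    // Byte-wise comparison of keys agrees with numeric comparison of ids.
    assert!(encode_field_id(255) < encode_field_id(256));

    println!("{:?}", key);
}
```

With a single byte only ids 0 to 255 are representable; two bytes raise the limit to 65536 distinct fields, which matches the PR title.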

Co-authored-by: Kerollmops <clement@meilisearch.com>
2021-07-06 16:44:51 +00:00
Kerollmops
0a78107525 Fix the infos crate to make it read u16 field ids 2021-07-06 11:58:03 +02:00
Kerollmops
a9553af635 Add a test to check that we can index more than 256 fields 2021-07-06 11:58:03 +02:00
Kerollmops
838ed1cd32 Use a u16 field id instead of one byte 2021-07-06 11:58:03 +02:00
bors[bot]
cc54c41e30 Merge #283
283: Use the AlwaysFreePages flag when opening an index r=irevoire a=Kerollmops

We introduced a new flag in our fork of LMDB: the `AlwaysFreePages` flag forces LMDB to always free the single pages it uses before writing to disk instead of keeping them in a linked list.

Declaring this flag reduces the memory footprint (leak) we observe after indexing a lot of documents.

Fixes #279.

Co-authored-by: Kerollmops <clement@meilisearch.com>
2021-07-05 16:59:16 +00:00
bors[bot]
63db43cc7a Merge #284
284: [http-ui] Introduce the route `die` r=Kerollmops a=irevoire

This route just `exit`s the process. This can come in handy when you run `http-ui` inside another process (a profiler, for example) and you don't want to kill everything.

Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: Irevoire <tamo@meilisearch.com>
2021-07-05 15:47:53 +00:00
Irevoire
4562b278a8 remove a warning and add a log
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-07-05 17:46:02 +02:00
Tamo
a57e522a67 introduce a die route to let the program exit by itself 2021-07-05 17:38:10 +02:00
Kerollmops
91c5d0c042 Use the AlwaysFreePages flag when opening an index 2021-07-05 16:36:13 +02:00
bors[bot]
007fec21fc Merge #281
281: Bump to v0.7.2 r=ManyTheFish a=Kerollmops



Co-authored-by: Kerollmops <clement@meilisearch.com>
2021-07-05 09:00:26 +00:00
Kerollmops
a6b4069172 Bump to v0.7.2 2021-07-05 10:54:53 +02:00
bors[bot]
d7bc6a6999 Merge #280
280: Fix matching length in matching_words r=Kerollmops a=ManyTheFish

related to https://github.com/meilisearch/MeiliSearch/issues/1441

Co-authored-by: many <maxime@meilisearch.com>
2021-07-01 18:50:46 +00:00
many
9f62149b94 Fix matching length in matching_words 2021-07-01 19:03:28 +02:00
bors[bot]
f25f454bd4 Merge #275
275: Fix the benchmarks dependencies r=Kerollmops a=irevoire

Import exactly the same dependencies as milli instead of wildcards that can match anything.

Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: Irevoire <irevoire@protonmail.ch>
2021-07-01 11:07:01 +00:00
bors[bot]
885f243afc Merge #276
276: Fix the fmt of the auto-generated file r=Kerollmops a=irevoire

The file generated by the `build.rs` file of the benchmark was badly formatted and that was causing an issue with the git pre-commit hook I wrote [earlier](https://github.com/meilisearch/milli/blob/main/script/pre-commit)

Co-authored-by: Tamo <tamo@meilisearch.com>
2021-07-01 10:24:36 +00:00
Irevoire
ec87bf3dd5 Update benchmarks/Cargo.toml
Co-authored-by: Clément Renault <renault.cle@gmail.com>
2021-07-01 11:45:05 +02:00
Tamo
ef965aa3f3 fix the fmt of the auto-generated file 2021-07-01 11:43:09 +02:00
Tamo
fc09d77e89 fix the benchmarks dependencies 2021-07-01 11:38:30 +02:00
bors[bot]
056180e6c8 Merge #273
273: Update tokenizer version to v0.2.3 r=Kerollmops a=curquiza



Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2021-07-01 09:02:16 +00:00
Clémentine Urquizar
3c149d8a43 Update tokenizer version to v0.2.3 2021-06-30 18:41:35 +02:00
bors[bot]
b4dcdbf00d Merge #269 #271
269: Fix bug when inserting previously deleted documents r=Kerollmops a=Kerollmops

This PR fixes #268.

The issue was in the `ExternalDocumentsIds` implementation, in the specific case where an external document id was marked as deleted in the soft map.

The bug was due to a wrong assumption on my side about how the FST unions return the `IndexedValue`s. I thought the values returned in an array were in the same order as the FSTs given to the `OpBuilder`, but in fact [the `IndexedValue`'s `index` field indicates which FST each value comes from](https://docs.rs/fst/0.4.7/fst/map/struct.IndexedValue.html).
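
To make the pitfall concrete, here is a small standard-library sketch. It does not use the `fst` crate: the `union` helper and the local `IndexedValue` struct are stand-ins written for this example, and only the `index`/`value` field names mirror the linked documentation. It shows that a key present only in the second map yields a one-element array whose entry still carries `index == 1`, so the position in the array says nothing about the source.

```rust
use std::collections::BTreeMap;

/// Local stand-in with the same field names as `fst::map::IndexedValue`:
/// `index` identifies the source map, `value` is what that map stores.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct IndexedValue {
    index: usize,
    value: u64,
}

/// Hypothetical union over several sorted maps (not the `fst` API):
/// each key is paired with the values of only the maps that contain it.
fn union(maps: &[BTreeMap<&'static str, u64>]) -> BTreeMap<&'static str, Vec<IndexedValue>> {
    let mut out: BTreeMap<&'static str, Vec<IndexedValue>> = BTreeMap::new();
    for (index, map) in maps.iter().enumerate() {
        for (&key, &value) in map {
            out.entry(key).or_default().push(IndexedValue { index, value });
        }
    }
    out
}

fn main() {
    let hard = BTreeMap::from([("a", 1)]);
    let soft = BTreeMap::from([("a", 10), ("b", 20)]);
    let merged = union(&[hard, soft]);

    // "b" only exists in the second map: its single entry sits at position 0
    // of the array but carries `index == 1`.
    let b = &merged["b"];
    assert_eq!(b.len(), 1);
    assert_eq!(b[0], IndexedValue { index: 1, value: 20 });

    // Selecting by `index`, not by position, finds the soft-map value.
    let from_soft = b.iter().find(|iv| iv.index == 1).map(|iv| iv.value);
    println!("{:?}", from_soft);
}
```

Reading `b[1]` to get "the soft map value" would panic here, and reading `b[0]` as "the hard map value" would silently return the wrong entry; filtering on `index` avoids both.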

271: Remove the roaring operation functions warnings r=Kerollmops a=Kerollmops

In this PR we are just replacing the usages of the roaring operation functions with the new operators. This removes a lot of warnings.

Co-authored-by: Kerollmops <clement@meilisearch.com>
2021-06-30 12:34:55 +00:00
Kerollmops
32b7bd366f Remove the roaring operation functions warnings 2021-06-30 14:12:56 +02:00
bors[bot]
00e2845f0f Merge #270
270: Update milli version to v0.7.1 r=Kerollmops a=curquiza



Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2021-06-30 12:12:24 +00:00
Kerollmops
c92ef54466 Add a test for when we insert a previously deleted document 2021-06-30 14:00:01 +02:00
Kerollmops
28782ff99d Fix ExternalDocumentsIds struct when inserting previously deleted ids 2021-06-30 14:00:01 +02:00
Clémentine Urquizar
b489515f4d Update milli version to v0.7.1 2021-06-30 13:52:46 +02:00
Kerollmops
54889813ce Implement some debug functions on the ExternalDocumentsIds struct 2021-06-30 11:29:41 +02:00
Kerollmops
4bce66d5ff Make the Index::delete_* method private 2021-06-30 10:07:31 +02:00
bors[bot]
66e6ea56b8 Merge #267
267: Highlighting r=Kerollmops a=irevoire

closes #262 
I basically rewrote part of the Damerau-Levenshtein function we were using for the highlighting so that it accepts at most two errors from the user and stops on the third mistake.
It also supports UTF-8 now, so it should fix our issue.
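
For readers unfamiliar with the idea, the following is a generic sketch of a bounded Damerau-Levenshtein distance (the optimal string alignment variant) that gives up as soon as the error budget is exceeded. It is not the code from this PR: the function name `bounded_distance`, its signature, and the choice of variant are assumptions made for illustration. It iterates over `char`s rather than bytes, so a multi-byte UTF-8 character counts as a single unit.

```rust
/// Returns `Some(distance)` when `a` and `b` are within `max` edits
/// (insertion, deletion, substitution, adjacent transposition), else `None`.
fn bounded_distance(a: &str, b: &str, max: usize) -> Option<usize> {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();

    // A length gap larger than the budget can never be repaired.
    if a.len().max(b.len()) - a.len().min(b.len()) > max {
        return None;
    }

    // Three rolling rows of the dynamic-programming matrix.
    let mut prev2: Vec<usize> = vec![0; b.len() + 1];
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    let mut curr: Vec<usize> = vec![0; b.len() + 1];

    for i in 1..=a.len() {
        curr[0] = i;
        for j in 1..=b.len() {
            let cost = usize::from(a[i - 1] != b[j - 1]);
            let mut best = (prev[j] + 1) // deletion
                .min(curr[j - 1] + 1) // insertion
                .min(prev[j - 1] + cost); // substitution or match
            if i > 1 && j > 1 && a[i - 1] == b[j - 2] && a[i - 2] == b[j - 1] {
                best = best.min(prev2[j - 2] + 1); // adjacent transposition
            }
            curr[j] = best;
        }
        // Early exit: once every cell of a row exceeds the budget,
        // no later row can come back under it.
        if curr.iter().all(|&d| d > max) {
            return None;
        }
        std::mem::swap(&mut prev2, &mut prev);
        std::mem::swap(&mut prev, &mut curr);
    }

    let distance = prev[b.len()];
    if distance <= max {
        Some(distance)
    } else {
        None
    }
}

fn main() {
    // One adjacent transposition ("el" -> "le") stays within the budget.
    assert_eq!(bounded_distance("hello", "hlelo", 2), Some(1));
    // "é" is two bytes in UTF-8 but counts as a single substitution.
    assert_eq!(bounded_distance("héllo", "hello", 2), Some(1));
    // Three edits are needed, so the comparison is rejected.
    assert_eq!(bounded_distance("kitten", "sitting", 2), None);
}
```

Returning `None` instead of the full distance is a design choice of this sketch: a caller doing highlighting only needs to know whether a word is close enough to match, not how far away a rejected word is.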

Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: Irevoire <irevoire@protonmail.ch>
2021-06-30 05:43:50 +00:00
Irevoire
6044b80362 Update milli/src/search/matching_words.rs
Co-authored-by: Clément Renault <renault.cle@gmail.com>
2021-06-30 00:35:26 +02:00
Tamo
be75e738b1 add more tests 2021-06-29 16:24:58 +02:00
Tamo
56fceb1928 re-implement the Damerau-Levenshtein used for the highlighting 2021-06-29 15:36:03 +02:00
bors[bot]
9dbc8b2dd0 Merge #266
266: Bump LMDB to the latest version (v0.9.70) r=Kerollmops a=Kerollmops

By bumping to a new version of heed (from git, v0.12.0, not yet published), this PR fixes Windows disk reservation problems. This new version of heed changes the signatures of the `del/put_current` and `append` iterator methods by declaring them unsafe.

This PR also bumps milli itself to v0.7.0, as the heed/LMDB bump is a breaking change.

This PR must be merged after #264.

Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-06-28 17:11:41 +00:00
Clément Renault
80c6aaf1fd Bump milli to 0.7.0 2021-06-28 18:31:56 +02:00
Clément Renault
bdc5599b73 Bump heed to use the git repo with v0.12.0 2021-06-28 18:26:20 +02:00
Clément Renault
73384aec21 Merge pull request #264 from meilisearch/fix-heed-undefined-behavior
Fix the invalid heed usage
2021-06-28 18:23:49 +02:00
Clément Renault
0013236e5d Fix the LMDB and heed invalid interactions.
It is undefined behavior to keep a reference to the database while
modifying it. We were keeping references into the database and also
feeding the heed put_current methods with keys referenced inside
the database itself.

https://github.com/Kerollmops/heed/pull/108
2021-06-28 16:19:02 +02:00
Kerollmops
9e5f9a8a10 Add a test for the words level positions generation bug 2021-06-28 16:08:31 +02:00
bors[bot]
c38b0b883d Merge #257
257: Fix unconditional facet indexing r=Kerollmops a=Kerollmops

We were indexing every searchable field as filterable; this was a mistake.

Co-authored-by: Kerollmops <clement@meilisearch.com>
2021-06-23 15:32:46 +00:00
Kerollmops
98285b4b18 Bump milli to 0.6.0 2021-06-23 17:30:26 +02:00
Kerollmops
4fc8f06791 Rename faceted_fields into filterable_fields 2021-06-23 17:26:54 +02:00
Kerollmops
c31cadb54f Do not consider the searchable field as filterable 2021-06-23 17:26:54 +02:00
bors[bot]
41c4a5b60d Merge #246
246: Improve the ci r=Kerollmops a=irevoire

Rewrite the CI entirely:
- run the CI on Linux, macOS and Windows.
- run the CI on Rust stable, beta and nightly.
- add rustfmt to the CI.
- split the CI into multiple tasks so that the CI fails faster.

Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: Irevoire <tamo@meilisearch.com>
2021-06-23 12:52:39 +00:00
Irevoire
faa3cd3b71 Update bors.toml
Don't check nightly and beta channel

Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-06-23 14:30:33 +02:00
bors[bot]
2ab24c4f49 Merge #256
256: Update version for the next release (v0.5.1) r=Kerollmops a=curquiza



Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2021-06-23 12:29:57 +00:00
Clémentine Urquizar
9885fb4159 Update version for the next release (v0.5.1) 2021-06-23 14:05:20 +02:00
bors[bot]
66f55e3e6a Merge #255
255: Fix facet distribution error r=Kerollmops a=Kerollmops

This PR fixes two invalid behaviors and fixes #253:
 - We were ignoring the list of fields for which the user wanted a facet distribution.
 - We were not raising any error when a facet distribution was requested for a non-filterable field.

~For the latter behavior I need the help of @curquiza to help me choose the right error type.~

Co-authored-by: Kerollmops <clement@meilisearch.com>
2021-06-23 12:03:05 +00:00
Kerollmops
a6218a20ae Introduce a new InvalidFacetsDistribution user error 2021-06-23 13:56:19 +02:00