Commit Graph

1050 Commits

Author SHA1 Message Date
cb45a10bcd Merge #298
298: Rename the search benchmarks r=Kerollmops a=irevoire

And fix a bug. As always, I was not closing the env.

Co-authored-by: Tamo <tamo@meilisearch.com>
2021-07-29 15:33:15 +00:00
7eb2d71009 fix the benchmarks 2021-07-29 16:27:05 +02:00
976dc1f4bc prefix the search benchmarks with 'search' 2021-07-29 16:27:05 +02:00
1290edd58a Merge #297
297: Bump milli to v0.8.1 r=curquiza a=Kerollmops



Co-authored-by: Kerollmops <clement@meilisearch.com>
2021-07-29 14:19:41 +00:00
341c244965 Bump milli to v0.8.1 2021-07-29 15:56:36 +02:00
d962e46ed1 Merge #296
296: Fix invalid faceted documents ids buffer size r=Kerollmops a=Kerollmops

Fix a bug found by `@irevoire` when benchmarking the search.

Co-authored-by: Kerollmops <clement@meilisearch.com>
2021-07-29 13:52:34 +00:00
90514e03d1 Fix invalid faceted documents ids buffer size 2021-07-29 15:49:23 +02:00
200e98c211 Merge #293
293: Make sure that the relevancy is not impacted by other settings r=Kerollmops a=Kerollmops

Fix https://github.com/meilisearch/meilisearch/issues/1505.

fix https://github.com/meilisearch/MeiliSearch/issues/1529

Co-authored-by: Kerollmops <clement@meilisearch.com>
2021-07-27 16:04:52 +00:00
bc845324df Merge #295
295: Update version for the next release (v0.8.0) r=Kerollmops a=curquiza



Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2021-07-27 14:42:10 +00:00
6a141694da Update version for the next release (v0.8.0) 2021-07-27 16:38:42 +02:00
dc2b63abdf Introduce an empty FilterCondition variant to support unknown fields 2021-07-27 16:34:04 +02:00
4ab7ca0e83 Merge #288
288: Stop tracking the Cargo.lock and add cache + windows to the CI r=curquiza a=irevoire

We reuse the same `~/.cargo` and `./target` directory between each run on the same OS and rust toolchain.
The `key` to decide if we can use the cache or not is: `$OS_NAME-$RUST_TOOLCHAIN-$HASH(Cargo.toml)`

We also removed the `Cargo.lock` from this repository. Indeed, milli is a library and [should not track the `Cargo.lock`](https://doc.rust-lang.org/cargo/guide/cargo-toml-vs-cargo-lock.html)

And finally, we enabled the tests on `windows-latest`. Since `lmdb` has been updated, this is now possible.

Co-authored-by: Tamo <tamo@meilisearch.com>
2021-07-26 14:22:19 +00:00
0038b3848a add a simple github cache 2021-07-26 15:31:26 +02:00
88646a63a1 update bors 2021-07-26 15:31:00 +02:00
b12738cfe9 Use the right DB prefixes to store the faceted fields 2021-07-22 19:18:22 +02:00
7aa6cc9b04 Do not insert fields in the map when changing the settings 2021-07-22 18:40:12 +02:00
ee3a49cfba Merge #291
291: Fix a bug about zero bytes in the inputs r=irevoire a=Kerollmops

Ok, good news, after a little session of debugging with `@irevoire` we found out that the bug seems to be related to zeroes in the input update. The engine wasn't designed to accept those. The chosen solution is to update the tokenizer to remove those zeroes. We are waiting on https://github.com/meilisearch/tokenizer/pull/52 to be merged and a new version to be released.

It is not an undefined behavior, I repeat: it is a "normal" bug 🎉 👏

----

This PR tries to fix a bug where we use LMDB in the wrong way, leading to panic due to an undefined behavior on the Rust side. I thought [we fixed it in a previous PR](https://github.com/meilisearch/milli/pull/264) but we found out that _a similar_ bug was still present. `@bb` found a way to trigger this bug and helped us find the origin of it.

As I don't have a minimal reproducible example of this bug I bet on the unsafe `put_current` calls when we index new documents as the bug was trigger after a big indexation on a clean database, thus not triggering a deletion update. I only replaced the unsafe `put_current` with two safe calls to `get`/`put`.

I hope it helps and fixes the bug, only `@bb` can help us check that. I am not even sure how I can create a custom Docker image and expose it for testing purposes.

<details>
  <summary>The backtrace leading us to a panic in grenad.</summary>

```
meilisearch_1    | thread 'tokio-runtime-worker' panicked at 'assertion failed: key > &last_key', /root/.cargo/git/checkouts/grenad-e2cb77f65d31bb02/3adcb26/src/block_builder.rs:38:17
meilisearch_1    | stack backtrace:
meilisearch_1    |    0: rust_begin_unwind
meilisearch_1    |              at ./rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/std/src/panicking.rs:493:5
meilisearch_1    |    1: core::panicking::panic_fmt
meilisearch_1    |              at ./rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/core/src/panicking.rs:92:14
meilisearch_1    |    2: core::panicking::panic
meilisearch_1    |              at ./rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/core/src/panicking.rs:50:5
meilisearch_1    |    3: grenad::block_builder::BlockBuilder::insert
meilisearch_1    |              at ./root/.cargo/git/checkouts/grenad-e2cb77f65d31bb02/3adcb26/src/block_builder.rs:38:17
meilisearch_1    |    4: grenad::writer::Writer<W>::insert
meilisearch_1    |              at ./root/.cargo/git/checkouts/grenad-e2cb77f65d31bb02/3adcb26/src/writer.rs:92:12
meilisearch_1    |    5: milli::update::words_level_positions::write_level_entry
meilisearch_1    |              at ./root/.cargo/git/checkouts/milli-00376cd5db949a15/007fec2/milli/src/update/words_level_positions.rs:262:5
meilisearch_1    |    6: milli::update::words_level_positions::compute_positions_levels
meilisearch_1    |              at ./root/.cargo/git/checkouts/milli-00376cd5db949a15/007fec2/milli/src/update/words_level_positions.rs:211:13
meilisearch_1    |    7: milli::update::words_level_positions::WordsLevelPositions::execute
meilisearch_1    |              at ./root/.cargo/git/checkouts/milli-00376cd5db949a15/007fec2/milli/src/update/words_level_positions.rs:65:23
meilisearch_1    |    8: milli::update::index_documents::IndexDocuments::execute_raw
meilisearch_1    |              at ./root/.cargo/git/checkouts/milli-00376cd5db949a15/007fec2/milli/src/update/index_documents/mod.rs:831:9
meilisearch_1    |    9: milli::update::index_documents::IndexDocuments::execute
meilisearch_1    |              at ./root/.cargo/git/checkouts/milli-00376cd5db949a15/007fec2/milli/src/update/index_documents/mod.rs:372:9
meilisearch_1    |   10: meilisearch_http::index::updates::<impl meilisearch_http::index::Index>::update_documents_txn
meilisearch_1    |              at ./meilisearch/meilisearch-http/src/index/updates.rs:225:30
meilisearch_1    |   11: meilisearch_http::index::updates::<impl meilisearch_http::index::Index>::update_documents
meilisearch_1    |              at ./meilisearch/meilisearch-http/src/index/updates.rs:183:22
meilisearch_1    |   12: meilisearch_http::index::update_handler::UpdateHandler::handle_update
meilisearch_1    |              at ./meilisearch/meilisearch-http/src/index/update_handler.rs:75:18
meilisearch_1    |   13: meilisearch_http::index_controller::index_actor::actor::IndexActor<S>::handle_update::{{closure}}::{{closure}}
meilisearch_1    |              at ./meilisearch/meilisearch-http/src/index_controller/index_actor/actor.rs:174:35
meilisearch_1    |   14: <tokio::runtime::blocking::task::BlockingTask<T> as core::future::future::Future>::poll
meilisearch_1    |              at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/runtime/blocking/task.rs:42:21
meilisearch_1    |   15: tokio::runtime::task::core::CoreStage<T>::poll::{{closure}}
meilisearch_1    |              at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/runtime/task/core.rs:243:17
meilisearch_1    |   16: tokio::loom::std::unsafe_cell::UnsafeCell<T>::with_mut
meilisearch_1    |              at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/loom/std/unsafe_cell.rs:14:9
meilisearch_1    |   17: tokio::runtime::task::core::CoreStage<T>::poll
meilisearch_1    |              at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/runtime/task/core.rs:233:13
meilisearch_1    |   18: tokio::runtime::task::harness::poll_future::{{closure}}
meilisearch_1    |              at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/runtime/task/harness.rs:427:23
meilisearch_1    |   19: <std::panic::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once
meilisearch_1    |              at ./rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/std/src/panic.rs:344:9
meilisearch_1    |   20: std::panicking::try::do_call
meilisearch_1    |              at ./rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/std/src/panicking.rs:379:40
meilisearch_1    |   21: std::panicking::try
meilisearch_1    |              at ./rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/std/src/panicking.rs:343:19
meilisearch_1    |   22: std::panic::catch_unwind
meilisearch_1    |              at ./rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/std/src/panic.rs:431:14
meilisearch_1    |   23: tokio::runtime::task::harness::poll_future
meilisearch_1    |              at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/runtime/task/harness.rs:414:19
meilisearch_1    |   24: tokio::runtime::task::harness::Harness<T,S>::poll_inner
meilisearch_1    |              at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/runtime/task/harness.rs:89:9
meilisearch_1    |   25: tokio::runtime::task::harness::Harness<T,S>::poll
meilisearch_1    |              at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/runtime/task/harness.rs:59:15
meilisearch_1    |   26: tokio::runtime::task::raw::RawTask::poll
meilisearch_1    |              at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/runtime/task/raw.rs:66:18
meilisearch_1    |   27: tokio::runtime::task::Notified<S>::run
meilisearch_1    |              at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/runtime/task/mod.rs:171:9
meilisearch_1    |   28: tokio::runtime::blocking::pool::Inner::run
meilisearch_1    |              at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/runtime/blocking/pool.rs:265:17
meilisearch_1    |   29: tokio::runtime::blocking::pool::Spawner::spawn_thread::{{closure}}
meilisearch_1    |              at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/runtime/blocking/pool.rs:245:17
meilisearch_1    | note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
```

</details>

Co-authored-by: Kerollmops <clement@meilisearch.com>
2021-07-22 16:14:35 +00:00
0353fbb5df Bump the tokenizer version to v0.2.4 2021-07-22 17:14:45 +02:00
92c0a2cdc1 Add a test that triggers a panic when indexing zeroes 2021-07-22 17:14:44 +02:00
aa02a7fdd8 Add a test to check that we indeed impact the relevancy 2021-07-22 17:04:38 +02:00
77de82aaa4 Merge #254
254: Improve the facet string distribution speed r=Kerollmops a=Kerollmops

This pull request creates a data structure similar to the one we use for the faceted numbers, a tetratomic decision tree but this time for the facet strings. This PR also changes the facet distribution behavior by returning one of the original facet values, fixes #260.

This data structure defines bucket-like structures where documents ids are stored under their facet value and helps the search decide if it wants to move to a lower level under a given bucket or not, depending on if the current bucket contains interesting documents or not. The whole format, algorithm, and previous attempts are explained in the [`facet_string.rs` file](ec1cfdd42b/milli/src/search/facet/facet_string.rs).

Note that this data structure **could** be used to sort by string lexicographically, that hypothetically possible. We need more testing, in terms of performance and quality, as we will sort on lowercased versions of the facet values.

 - [x] Implement a faster and more precise way to fetch the facet distribution.
 - [x] Store and return the original facet string value. We currently return the lowercased version.

Co-authored-by: Kerollmops <clement@meilisearch.com>
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-07-21 15:34:40 +00:00
0227254a65 Return the original string values for the inverted facet index database 2021-07-21 16:59:39 +02:00
03a01166ba Display the original facet string value from the linear facet database 2021-07-21 16:59:39 +02:00
d23c250ad5 Fix a bound error in the facet string range construction 2021-07-21 16:59:39 +02:00
081278dfd6 Use the facet string levels when computing the facet distribution 2021-07-21 16:59:39 +02:00
5676b204dd Fix the facet string levels codecs 2021-07-21 16:59:38 +02:00
8c86348119 Indexing the facet strings levels 2021-07-21 16:59:38 +02:00
a7ae552ba7 Fix the FacetStringLevelZeroRange range when unbounded 2021-07-21 16:59:38 +02:00
757b2b502a Remove the FacetValueStringCodec 2021-07-21 16:59:38 +02:00
adfd4da24c Introduce the FacetStringIter iterator 2021-07-21 16:59:38 +02:00
a79661c6dc Introduce a lot of facet string helper iterators 2021-07-21 16:59:38 +02:00
851f979039 Describe the way we want to group the facet strings 2021-07-21 16:59:38 +02:00
f858f64b1f Move the facet number iterators into their own module 2021-07-21 16:59:37 +02:00
9f8095c069 Make sure that we don't keep a reference on the LMDB key when using put_current 2021-07-21 10:35:35 +02:00
fa44e95c91 Merge #290
290: Add a $HOME to the CI r=Kerollmops a=irevoire

This should fix this issue:
https://github.com/meilisearch/milli/runs/3104228432?check_suite_focus=true

I think a real fix would be to fix the configuration of our github runner but I don't know how to do it.
@curquiza could probably help us on that once she's back from vacation 😄 

Co-authored-by: Tamo <tamo@meilisearch.com>
2021-07-20 07:32:46 +00:00
0ab541627b add a $HOME var to the ci 2021-07-19 14:33:49 +02:00
16698f714b Merge #287
287: Add benchmarks for indexing r=Kerollmops a=irevoire

closes #274 
I don't really know how much time this will take on our bench machine. I'm afraid the wiki dataset will take a really long time to bench (it takes 1h30 on my computer).

If you are ok with it, I would like to merge this first PR since it introduces a first set of benchmarks and see how much time it takes in reality on our setup.

Co-authored-by: Tamo <tamo@meilisearch.com>
2021-07-07 15:41:15 +00:00
931021fe57 add benchmarks for indexing 2021-07-07 13:09:05 +02:00
4c9531bdf3 Merge #285
285: Support documents with at most 65536 fields r=Kerollmops a=Kerollmops

Fixes #248.

In this PR I updated the `obkv` crate, it now supports arbitrary key length and therefore I was able to use an `u16` to represent the fields instead of a single byte. It was impressively easy to update the whole codebase 🍡 🍔

Co-authored-by: Kerollmops <clement@meilisearch.com>
2021-07-06 16:44:51 +00:00
0a78107525 Fix the infos crate to make it read u16 field ids 2021-07-06 11:58:03 +02:00
a9553af635 Add a test to check that we can index more that 256 fields 2021-07-06 11:58:03 +02:00
838ed1cd32 Use an u16 field id instead of one byte 2021-07-06 11:58:03 +02:00
cc54c41e30 Merge #283
283: Use the AlwaysFreePages flag when opening an index r=irevoire a=Kerollmops

We introduced a new flag in our fork of LMDB, this `AlwaysFreePages` flag forces LMDB to always free the single pages it uses before writing to the disk instead of keeping them in a linked list.

Declaring this flag reduces the memory print (leak) we have on memory after indexing a lot of documents.

Fixes #279.

Co-authored-by: Kerollmops <clement@meilisearch.com>
2021-07-05 16:59:16 +00:00
63db43cc7a Merge #284
284: [http-ui] Introduce the route `die` r=Kerollmops a=irevoire

This route just `exit` the process. This can come in handy when you run `http-ui` inside of another process (a profiler for example), and you don't want to kill everything

Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: Irevoire <tamo@meilisearch.com>
2021-07-05 15:47:53 +00:00
4562b278a8 remove a warning and add a log
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-07-05 17:46:02 +02:00
a57e522a67 introduce a die route let the program exit itself alone 2021-07-05 17:38:10 +02:00
91c5d0c042 Use the AlwaysFreePages flag when opening an index 2021-07-05 16:36:13 +02:00
007fec21fc Merge #281
281: Bump to v0.7.2 r=ManyTheFish a=Kerollmops



Co-authored-by: Kerollmops <clement@meilisearch.com>
2021-07-05 09:00:26 +00:00
a6b4069172 Bump to v0.7.2 2021-07-05 10:54:53 +02:00
d7bc6a6999 Merge #280
280: Fix matching lenghth in matching_words r=Kerollmops a=ManyTheFish

related to https://github.com/meilisearch/MeiliSearch/issues/1441

Co-authored-by: many <maxime@meilisearch.com>
2021-07-01 18:50:46 +00:00