Commit Graph

958 Commits

Author SHA1 Message Date
bors[bot]
ee3a49cfba Merge #291
291: Fix a bug about zero bytes in the inputs r=irevoire a=Kerollmops

Ok, good news, after a little session of debugging with `@irevoire` we found out that the bug seems to be related to zeroes in the input update. The engine wasn't designed to accept those. The chosen solution is to update the tokenizer to remove those zeroes. We are waiting on https://github.com/meilisearch/tokenizer/pull/52 to be merged and a new version to be released.

It is not an undefined behavior, I repeat: it is a "normal" bug 🎉 👏

----

This PR tries to fix a bug where we use LMDB in the wrong way, leading to panic due to an undefined behavior on the Rust side. I thought [we fixed it in a previous PR](https://github.com/meilisearch/milli/pull/264) but we found out that _a similar_ bug was still present. `@bb` found a way to trigger this bug and helped us find the origin of it.

As I don't have a minimal reproducible example of this bug I bet on the unsafe `put_current` calls when we index new documents as the bug was trigger after a big indexation on a clean database, thus not triggering a deletion update. I only replaced the unsafe `put_current` with two safe calls to `get`/`put`.

I hope it helps and fixes the bug, only `@bb` can help us check that. I am not even sure how I can create a custom Docker image and expose it for testing purposes.

<details>
  <summary>The backtrace leading us to a panic in grenad.</summary>

```
meilisearch_1    | thread 'tokio-runtime-worker' panicked at 'assertion failed: key > &last_key', /root/.cargo/git/checkouts/grenad-e2cb77f65d31bb02/3adcb26/src/block_builder.rs:38:17
meilisearch_1    | stack backtrace:
meilisearch_1    |    0: rust_begin_unwind
meilisearch_1    |              at ./rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/std/src/panicking.rs:493:5
meilisearch_1    |    1: core::panicking::panic_fmt
meilisearch_1    |              at ./rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/core/src/panicking.rs:92:14
meilisearch_1    |    2: core::panicking::panic
meilisearch_1    |              at ./rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/core/src/panicking.rs:50:5
meilisearch_1    |    3: grenad::block_builder::BlockBuilder::insert
meilisearch_1    |              at ./root/.cargo/git/checkouts/grenad-e2cb77f65d31bb02/3adcb26/src/block_builder.rs:38:17
meilisearch_1    |    4: grenad::writer::Writer<W>::insert
meilisearch_1    |              at ./root/.cargo/git/checkouts/grenad-e2cb77f65d31bb02/3adcb26/src/writer.rs:92:12
meilisearch_1    |    5: milli::update::words_level_positions::write_level_entry
meilisearch_1    |              at ./root/.cargo/git/checkouts/milli-00376cd5db949a15/007fec2/milli/src/update/words_level_positions.rs:262:5
meilisearch_1    |    6: milli::update::words_level_positions::compute_positions_levels
meilisearch_1    |              at ./root/.cargo/git/checkouts/milli-00376cd5db949a15/007fec2/milli/src/update/words_level_positions.rs:211:13
meilisearch_1    |    7: milli::update::words_level_positions::WordsLevelPositions::execute
meilisearch_1    |              at ./root/.cargo/git/checkouts/milli-00376cd5db949a15/007fec2/milli/src/update/words_level_positions.rs:65:23
meilisearch_1    |    8: milli::update::index_documents::IndexDocuments::execute_raw
meilisearch_1    |              at ./root/.cargo/git/checkouts/milli-00376cd5db949a15/007fec2/milli/src/update/index_documents/mod.rs:831:9
meilisearch_1    |    9: milli::update::index_documents::IndexDocuments::execute
meilisearch_1    |              at ./root/.cargo/git/checkouts/milli-00376cd5db949a15/007fec2/milli/src/update/index_documents/mod.rs:372:9
meilisearch_1    |   10: meilisearch_http::index::updates::<impl meilisearch_http::index::Index>::update_documents_txn
meilisearch_1    |              at ./meilisearch/meilisearch-http/src/index/updates.rs:225:30
meilisearch_1    |   11: meilisearch_http::index::updates::<impl meilisearch_http::index::Index>::update_documents
meilisearch_1    |              at ./meilisearch/meilisearch-http/src/index/updates.rs:183:22
meilisearch_1    |   12: meilisearch_http::index::update_handler::UpdateHandler::handle_update
meilisearch_1    |              at ./meilisearch/meilisearch-http/src/index/update_handler.rs:75:18
meilisearch_1    |   13: meilisearch_http::index_controller::index_actor::actor::IndexActor<S>::handle_update::{{closure}}::{{closure}}
meilisearch_1    |              at ./meilisearch/meilisearch-http/src/index_controller/index_actor/actor.rs:174:35
meilisearch_1    |   14: <tokio::runtime::blocking::task::BlockingTask<T> as core::future::future::Future>::poll
meilisearch_1    |              at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/runtime/blocking/task.rs:42:21
meilisearch_1    |   15: tokio::runtime::task::core::CoreStage<T>::poll::{{closure}}
meilisearch_1    |              at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/runtime/task/core.rs:243:17
meilisearch_1    |   16: tokio::loom::std::unsafe_cell::UnsafeCell<T>::with_mut
meilisearch_1    |              at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/loom/std/unsafe_cell.rs:14:9
meilisearch_1    |   17: tokio::runtime::task::core::CoreStage<T>::poll
meilisearch_1    |              at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/runtime/task/core.rs:233:13
meilisearch_1    |   18: tokio::runtime::task::harness::poll_future::{{closure}}
meilisearch_1    |              at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/runtime/task/harness.rs:427:23
meilisearch_1    |   19: <std::panic::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once
meilisearch_1    |              at ./rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/std/src/panic.rs:344:9
meilisearch_1    |   20: std::panicking::try::do_call
meilisearch_1    |              at ./rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/std/src/panicking.rs:379:40
meilisearch_1    |   21: std::panicking::try
meilisearch_1    |              at ./rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/std/src/panicking.rs:343:19
meilisearch_1    |   22: std::panic::catch_unwind
meilisearch_1    |              at ./rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/std/src/panic.rs:431:14
meilisearch_1    |   23: tokio::runtime::task::harness::poll_future
meilisearch_1    |              at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/runtime/task/harness.rs:414:19
meilisearch_1    |   24: tokio::runtime::task::harness::Harness<T,S>::poll_inner
meilisearch_1    |              at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/runtime/task/harness.rs:89:9
meilisearch_1    |   25: tokio::runtime::task::harness::Harness<T,S>::poll
meilisearch_1    |              at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/runtime/task/harness.rs:59:15
meilisearch_1    |   26: tokio::runtime::task::raw::RawTask::poll
meilisearch_1    |              at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/runtime/task/raw.rs:66:18
meilisearch_1    |   27: tokio::runtime::task::Notified<S>::run
meilisearch_1    |              at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/runtime/task/mod.rs:171:9
meilisearch_1    |   28: tokio::runtime::blocking::pool::Inner::run
meilisearch_1    |              at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/runtime/blocking/pool.rs:265:17
meilisearch_1    |   29: tokio::runtime::blocking::pool::Spawner::spawn_thread::{{closure}}
meilisearch_1    |              at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/runtime/blocking/pool.rs:245:17
meilisearch_1    | note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
```

</details>

Co-authored-by: Kerollmops <clement@meilisearch.com>
2021-07-22 16:14:35 +00:00
Kerollmops
0353fbb5df Bump the tokenizer version to v0.2.4 2021-07-22 17:14:45 +02:00
Kerollmops
92c0a2cdc1 Add a test that triggers a panic when indexing zeroes 2021-07-22 17:14:44 +02:00
Kerollmops
aa02a7fdd8 Add a test to check that we indeed impact the relevancy 2021-07-22 17:04:38 +02:00
Clément Renault
0227254a65 Return the original string values for the inverted facet index database 2021-07-21 16:59:39 +02:00
Kerollmops
03a01166ba Display the original facet string value from the linear facet database 2021-07-21 16:59:39 +02:00
Clément Renault
d23c250ad5 Fix a bound error in the facet string range construction 2021-07-21 16:59:39 +02:00
Clément Renault
081278dfd6 Use the facet string levels when computing the facet distribution 2021-07-21 16:59:39 +02:00
Clément Renault
5676b204dd Fix the facet string levels codecs 2021-07-21 16:59:38 +02:00
Kerollmops
8c86348119 Indexing the facet strings levels 2021-07-21 16:59:38 +02:00
Kerollmops
a7ae552ba7 Fix the FacetStringLevelZeroRange range when unbounded 2021-07-21 16:59:38 +02:00
Kerollmops
757b2b502a Remove the FacetValueStringCodec 2021-07-21 16:59:38 +02:00
Kerollmops
adfd4da24c Introduce the FacetStringIter iterator 2021-07-21 16:59:38 +02:00
Kerollmops
a79661c6dc Introduce a lot of facet string helper iterators 2021-07-21 16:59:38 +02:00
Kerollmops
851f979039 Describe the way we want to group the facet strings 2021-07-21 16:59:38 +02:00
Kerollmops
f858f64b1f Move the facet number iterators into their own module 2021-07-21 16:59:37 +02:00
Kerollmops
9f8095c069 Make sure that we don't keep a reference on the LMDB key when using put_current 2021-07-21 10:35:35 +02:00
Kerollmops
a9553af635 Add a test to check that we can index more that 256 fields 2021-07-06 11:58:03 +02:00
Kerollmops
838ed1cd32 Use an u16 field id instead of one byte 2021-07-06 11:58:03 +02:00
Kerollmops
91c5d0c042 Use the AlwaysFreePages flag when opening an index 2021-07-05 16:36:13 +02:00
Kerollmops
a6b4069172 Bump to v0.7.2 2021-07-05 10:54:53 +02:00
many
9f62149b94 Fix matching lenghth in matching_words 2021-07-01 19:03:28 +02:00
Clémentine Urquizar
3c149d8a43 Update tokenizer version to v0.2.3 2021-06-30 18:41:35 +02:00
bors[bot]
b4dcdbf00d Merge #269 #271
269: Fix bug when inserting previously deleted documents r=Kerollmops a=Kerollmops

This PR fixes #268.

The issue was in the `ExternalDocumentsIds` implementation in the specific case that an external document id was in the soft map marked as deleted.

The bug was due to a wrong assumption on my side about how the FST unions were returning the `IndexedValue`s, I thought the values returned in an array were in the same order as the FSTs given to the `OpBuilder` but in fact, [the `IndexedValue`'s `index` field was here to indicate from which FST the values were coming from](https://docs.rs/fst/0.4.7/fst/map/struct.IndexedValue.html).

271: Remove the roaring operation functions warnings r=Kerollmops a=Kerollmops

In this PR we are just replacing the usages of the roaring operations function by the new operators. This removes a lot of warnings.

Co-authored-by: Kerollmops <clement@meilisearch.com>
2021-06-30 12:34:55 +00:00
Kerollmops
32b7bd366f Remove the roaring operation functions warnings 2021-06-30 14:12:56 +02:00
Kerollmops
c92ef54466 Add a test for when we insert a previously deleted document 2021-06-30 14:00:01 +02:00
Kerollmops
28782ff99d Fix ExternalDocumentsIds struct when inserting previously deleted ids 2021-06-30 14:00:01 +02:00
Clémentine Urquizar
b489515f4d Update milli version to v0.7.1 2021-06-30 13:52:46 +02:00
Kerollmops
54889813ce Implement some debug functions on the ExternalDocumentsIds struct 2021-06-30 11:29:41 +02:00
Kerollmops
4bce66d5ff Make the Index::delete_* method private 2021-06-30 10:07:31 +02:00
Irevoire
6044b80362 Update milli/src/search/matching_words.rs
Co-authored-by: Clément Renault <renault.cle@gmail.com>
2021-06-30 00:35:26 +02:00
Tamo
be75e738b1 add more tests 2021-06-29 16:24:58 +02:00
Tamo
56fceb1928 re-implement the Damerau-Levenshtein used for the highlighting 2021-06-29 15:36:03 +02:00
Clément Renault
80c6aaf1fd Bump milli to 0.7.0 2021-06-28 18:31:56 +02:00
Clément Renault
bdc5599b73 Bump heed to use the git repo with v0.12.0 2021-06-28 18:26:20 +02:00
Clément Renault
0013236e5d Fix the LMDB and heed invalid interactions.
It is undefined behavior to keep a reference to the database while
modifying it, we were keeping references in the database and also
feeding the heed put_current methods with keys referenced inside
the database itself.

https://github.com/Kerollmops/heed/pull/108
2021-06-28 16:19:02 +02:00
Kerollmops
9e5f9a8a10 Add a test for the words level positions generation bug 2021-06-28 16:08:31 +02:00
Kerollmops
98285b4b18 Bump milli to 0.6.0 2021-06-23 17:30:26 +02:00
Kerollmops
4fc8f06791 Rename faceted_fields into filterable_fields 2021-06-23 17:26:54 +02:00
Kerollmops
c31cadb54f Do not consider the searchable field as filterable 2021-06-23 17:26:54 +02:00
bors[bot]
2ab24c4f49 Merge #256
256: Update version for the next release (v0.5.1) r=Kerollmops a=curquiza



Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2021-06-23 12:29:57 +00:00
Clémentine Urquizar
9885fb4159 Update version for the next release (v0.5.1) 2021-06-23 14:05:20 +02:00
Kerollmops
a6218a20ae Introduce a new InvalidFacetsDistribution user error 2021-06-23 13:56:19 +02:00
Kerollmops
2364777838 Return an error for when a field distribution cannot be done 2021-06-23 11:50:49 +02:00
Kerollmops
aeaac743ff Replace an if let some by a match 2021-06-23 11:33:30 +02:00
Tamo
8d2a0b43ff run the formatter on the whole project a second time 2021-06-22 15:36:22 +02:00
Tamo
3d90b03d7b fix the limit
There was no check on the limit and thus, if a user especified a very large number this line could causes a panic
2021-06-22 14:52:13 +02:00
bors[bot]
5b6adc6d96 Merge #245
245: Warn for when a key is too large for LMDB r=Kerollmops a=Kerollmops

Closes #191, and resolves #140.

Co-authored-by: Kerollmops <clement@meilisearch.com>
2021-06-22 12:10:52 +00:00
Kerollmops
51dbb2e06d Warn for when a key is too large for LMDB 2021-06-22 11:51:36 +02:00
Kerollmops
aecbd14761 Improve the error message for InvalidDocumentId 2021-06-22 11:31:58 +02:00