Commit Graph

1405 Commits

Author SHA1 Message Date
cda4ba2bb6 Add document import tests 2022-12-05 12:02:49 +01:00
ae59d37b75 Improve insta-snap of the external document ids 2022-12-05 10:51:02 +01:00
f2cf981641 Add more tests and allow disabling of soft-deletion outside of tests
Also allow disabling soft-deletion in the IndexDocumentsConfig
2022-12-05 10:51:01 +01:00
50954d31fa feat: Re-export Span and Token to milli:: 2022-12-03 13:37:33 -05:00
d3731dda48 Merge #706
706: Limit the reindexing caused by updating settings when not needed r=curquiza a=GregoryConrad

## What does this PR do?
When updating index settings using `update::Settings`, sometimes a `reindex` of `update::Settings` is triggered when it doesn't need to be. This PR aims to prevent those unnecessary `reindex` calls.

For reference, here is a snippet from the current `execute` method in `update::Settings`:
```rust
// ...
if stop_words_updated
    || faceted_updated
    || synonyms_updated
    || searchable_updated
    || exact_attributes_updated
{
    self.reindex(&progress_callback, &should_abort, old_fields_ids_map)?;
}
```

- [x] `faceted_updated` - looks good as-is 
- [x] `stop_words_updated` - looks good as-is 
- [x] `synonyms_updated` - looks good as-is 
- [x] `searchable_updated` - fixed in this PR
- [x] `exact_attributes_updated` - fixed in this PR

## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!


Co-authored-by: Gregory Conrad <gregorysconrad@gmail.com>
2022-12-01 13:58:02 +00:00
5e754b3ee0 Merge #708
708: Reduce memory usage of the MatchingWords structure r=ManyTheFish a=loiclec

# Pull Request

## Related issue
Fixes (partially) https://github.com/meilisearch/meilisearch/issues/3115 

## What does this PR do?
1. Reduces the memory usage caused by the creation of a 10-word query tree by 20x. 
   This is done by deduplicating the `MatchingWord` values, which are heavy because of their inner DFA. The deduplication works by wrapping each `MatchingWord` in a reference-counted box and using a hash map to determine whether a  `MatchingWord` DFA already exists for a certain signature, or whether a new one needs to be built.
 
2. Avoid the worst-case scenario of creating a `MatchingWord` for extremely long words that cannot be indexed by milli.

Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>
2022-11-30 17:47:34 +00:00
e1612fcb01 Merge #712
712: Fix bulk facet indexing bug r=Kerollmops a=loiclec

# Pull Request

## Related issue
Fixes (partially, until merged into meilisearch) https://github.com/meilisearch/meilisearch/issues/3165

## What does this PR do?
Fixes a bug where indexing certain numbers of filterable attribute values in bulk led to corrupted facet databases. This was due to a lossy integer conversion which would ultimately prevent entire levels of the facet database to be written into LMDB.

More specifically, this change was made:
```diff
      - if cur_writer_len as u8 >= self.min_level_size {
      + if cur_writer_len >= self.min_level_size as usize {
```
I also checked other comparisons to `min_level_size` and other conversions such as `x as u8` in this part of the codebase.



Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>
2022-11-30 16:51:48 +00:00
9dd4b33a9a Fix bulk facet indexing bug 2022-11-30 14:27:36 +01:00
87e2bc3bed fix(reindex): reindex in a few more cases
Cases: whenever searchable_fields OR user_defined_searchable_fields is modified
2022-11-28 13:12:19 -05:00
61b58b115a Don't create partial matching words for synonyms in ngrams 2022-11-28 16:32:28 +01:00
d3182f3830 refactor: Change return type to keep consistency with others 2022-11-28 10:02:03 -05:00
f70856bab1 Remove memory usage test that fails when many tests are run in parallel 2022-11-28 12:55:28 +01:00
e2ebed62b1 Don't create partial matching words for synonyms, split words, phrases 2022-11-28 10:20:13 +01:00
8284bd760f Relax memory ordering of operations within the test CountingAlloc 2022-11-28 10:20:13 +01:00
8d0ace2d64 Avoid creating a MatchingWord for words that exceed the length limit 2022-11-28 10:20:13 +01:00
86c34a996b Deduplicate matching words 2022-11-28 10:20:13 +01:00
e0d24104a3 refactor: Rewrite another method chain to be more readable 2022-11-26 13:33:19 -05:00
2db738dbac refactor: rewrite method chain to be more readable 2022-11-26 13:26:39 -05:00
935a724c57 revert: Revert pass by reference API change 2022-11-24 10:08:23 -05:00
ed29cceae9 perf: Prevent reindex in searchable set case when not needed 2022-11-23 22:33:06 -05:00
bb9e33bf85 perf: Prevent reindex in searchable reset case when not needed 2022-11-23 22:01:46 -05:00
7c0e544839 feat: Add all_obkv_to_json function 2022-11-23 21:18:58 -05:00
d19c8672bb perf: limit reindex to when exact_attributes changes 2022-11-23 15:50:53 -05:00
57c9f03e51 Merge #697
697: Fix bug in prefix DB indexing r=loiclec a=loiclec

Where the batch's information was not properly updated in cases where only the proximity changed between two consecutive word pair proximities.

Closes partially https://github.com/meilisearch/meilisearch/issues/3043



Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>
2022-11-17 15:22:01 +00:00
cd5aaa3a9f Update version for the next release (v0.37.0) in Cargo.toml files 2022-11-17 12:50:07 +00:00
777eb3fa00 Add insta-snaps for test of bug 3043 2022-11-17 12:21:27 +01:00
0caadedd3b Make clippy happy 2022-11-17 12:17:53 +01:00
ac3baafbe8 Truncate facet values that are too long before indexing them 2022-11-17 11:29:42 +01:00
990a861241 Add test for indexing a document with a long facet value 2022-11-17 11:29:42 +01:00
d95d02cb8a Fix Facet Indexing bugs
1. Handle keys with variable length correctly

This fixes https://github.com/meilisearch/meilisearch/issues/3042 and
is easily reproducible with the updated fuzz tests, which now generate
keys with variable lengths.

2. Prevent adding facets to the database if their encoded value does
not satisfy `valid_lmdb_key`.

This fixes an indexing failure when a document had a filterable
attribute containing a value whose length is higher than ~500 bytes.
2022-11-17 11:29:42 +01:00
f00108d2ec Fix name of bug in reproduction test 2022-11-17 11:29:18 +01:00
f7c8730d09 Fix bug in prefix DB indexing
Where the batch's information was not properly updated in cases
where only the proximity changed between two consecutive word pair
proximities.

Closes https://github.com/meilisearch/meilisearch/issues/3043
2022-11-17 11:29:18 +01:00
6dc6a5d874 Force using vendored version of LMDB
- don't use lmdb master3 branch anymore
2022-11-14 17:17:51 +01:00
d00d2aab3f Update version for the next release (v0.36.0) in Cargo.toml files 2022-11-09 11:03:09 +00:00
f46a8ab2e2 Merge #693
693: use the lmdb-master.3 branch r=Kerollmops a=irevoire

After investigating https://github.com/meilisearch/meilisearch/issues/3017, we found out that it was due to lmdb and that, without any code change on our side, bumping using the lmdb-master-3 branch fix our issues.

But, we’re not really confident about what changed between the `mdb.master` and `mdb.master3` branches; thus this is a temporary change, and we hope we’ll be able to move to the new version of heed asap (either before the end of the pre-release or for the next release).

--------

The bug is hard to reproduce; I can reproduce it 100% of the time on my archlinux personal computer. But on a scaleway archlinux bare-metal machine, it doesn’t reproduce. It’s flaky on our test suite, but `@loiclec` was able to write a minimal test that reproduces it every time on macOS.
Basically, what happens is when there are multiple threads opening databases in a different directory at the same time.
If there are 10 or more threads running at the same time, lmdb starts throwing the `Invalid argument (os error 22)` error for no reason, we believe.
I would like to submit an issue to lmdb, but I don’t really have the time to write a test in C without heed currently.

`@hyc,` if you want to take a look at it, here is the repo that reproduces the issue on macOS: https://github.com/irevoire/heed-bug

Co-authored-by: Irevoire <tamo@meilisearch.com>
2022-11-09 09:42:38 +00:00
c7711daca3 use the lmdb-master.3 branch 2022-11-08 16:28:01 +01:00
bd12989610 Update version for the next release (v0.35.1) in Cargo.toml files 2022-11-08 14:31:39 +00:00
24a298a83c Merge #690
690: Fix soft deleted bug settings r=ManyTheFish a=Kerollmops



Co-authored-by: Kerollmops <clement@meilisearch.com>
2022-11-08 13:45:10 +00:00
d85cd9bf1a Merge #689
689: Handle non-finite floats consistently in filters r=irevoire a=dureuill

# Pull Request

## Related issue

Related meilisearch/meilisearch#3000

## What does this PR do?

### User

- Filters using `field = inf`, (or `infinite`, `NaN`) now match the value as a string rather than returning an internal error.
- Filters using `field < inf` (or other comparison operators) now return an invalid_filter error rather than returning an internal error, much like when using `field < aaa`.

### Implementation

- Add new `NonFiniteFloat` error variants to the filter-parser errors
- Add `Token::parse_as_finite_float` that can fail both when the string is not a float and when the float is not finite
- Refactor `Filter::inner_evaluate` to always use `parse_as_finite_float` instead of just `parse`
- Add corresponding tests

## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!


Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2022-11-08 13:24:38 +00:00
37b3c5c323 Fix transform to use all_documents and ignore soft_deleted documents 2022-11-08 14:23:16 +01:00
1b1ad1923b Add a test to check that we take care of soft deleted documents 2022-11-08 14:23:14 +01:00
a836b8e703 tests: Tests filter with non-finite floats 2022-11-08 13:56:55 +01:00
3328560788 fix: allow filters on = inf, = NaN, return InvalidFilter for < inf, < NaN
Fixes meilisearch/meilisearch#3000
2022-11-08 13:27:15 +01:00
abf1cf9cd5 Fix clippy errors 2022-11-04 09:27:46 +09:00
70465aa5ce Execute cargo fmt 2022-11-04 08:59:58 +09:00
3009981d31 Fix clippy errors
Add clippy job

Add clippy job to CI
2022-11-04 08:58:14 +09:00
6add470805 Merge #659
659: Fix clippy error to add clippy job on Ci r=Kerollmops a=unvalley

## Related PR
This PR is for #673 

## What does this PR do?
- ~~add `Run Clippy` job to CI (rust.yml)~~
- apply `cargo clippy --fix` command
- fix some `cargo clippy` error manually (but warnings still remain on tests)

## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?


Co-authored-by: unvalley <kirohi.code@gmail.com>
Co-authored-by: unvalley <38400669+unvalley@users.noreply.github.com>
2022-11-03 15:24:38 +00:00
13175f2339 refactor: match for filterCondition 2022-11-03 17:34:33 +09:00
a07f0a4a43 Delete facet_string_zero_bounds_value_codec.rs 2022-10-30 08:59:04 +05:30
2dec6e86e9 Delete facet_string_level_zero_value_codec.rs 2022-10-30 08:58:36 +05:30