Commit Graph

2285 Commits

Author SHA1 Message Date
282b2e3b98 Bump Swatinem/rust-cache from 2.0.1 to 2.2.0
Bumps [Swatinem/rust-cache](https://github.com/Swatinem/rust-cache) from 2.0.1 to 2.2.0.
- [Release notes](https://github.com/Swatinem/rust-cache/releases)
- [Changelog](https://github.com/Swatinem/rust-cache/blob/master/CHANGELOG.md)
- [Commits](https://github.com/Swatinem/rust-cache/compare/v2.0.1...v2.2.0)

---
updated-dependencies:
- dependency-name: Swatinem/rust-cache
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-12-01 10:02:54 +00:00
5e754b3ee0 Merge #708
708: Reduce memory usage of the MatchingWords structure r=ManyTheFish a=loiclec

# Pull Request

## Related issue
Fixes (partially) https://github.com/meilisearch/meilisearch/issues/3115 

## What does this PR do?
1. Reduces the memory usage caused by the creation of a 10-word query tree by 20x. 
   This is done by deduplicating the `MatchingWord` values, which are heavy because of their inner DFA. The deduplication works by wrapping each `MatchingWord` in a reference-counted box and using a hash map to determine whether a  `MatchingWord` DFA already exists for a certain signature, or whether a new one needs to be built.
 
2. Avoid the worst-case scenario of creating a `MatchingWord` for extremely long words that cannot be indexed by milli.

Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>
2022-11-30 17:47:34 +00:00
e1612fcb01 Merge #712
712: Fix bulk facet indexing bug r=Kerollmops a=loiclec

# Pull Request

## Related issue
Fixes (partially, until merged into meilisearch) https://github.com/meilisearch/meilisearch/issues/3165

## What does this PR do?
Fixes a bug where indexing certain numbers of filterable attribute values in bulk led to corrupted facet databases. This was due to a lossy integer conversion which would ultimately prevent entire levels of the facet database to be written into LMDB.

More specifically, this change was made:
```diff
      - if cur_writer_len as u8 >= self.min_level_size {
      + if cur_writer_len >= self.min_level_size as usize {
```
I also checked other comparisons to `min_level_size` and other conversions such as `x as u8` in this part of the codebase.



Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>
2022-11-30 16:51:48 +00:00
9dd4b33a9a Fix bulk facet indexing bug 2022-11-30 14:27:36 +01:00
de22116b3d Merge #711
711: Replace deprecated gh actions r=curquiza a=pnhatminh

# Pull Request

## Related issue
Fixes #678

## What does this PR do?
- Replace deprecated github action command with newly defined command.

## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!


Co-authored-by: Minh Pham <minh.pham@codelink.io>
2022-11-29 09:56:22 +00:00
5f78522044 Updagte 2022-11-29 10:11:38 +07:00
61b58b115a Don't create partial matching words for synonyms in ngrams 2022-11-28 16:32:28 +01:00
f698e6cfdf Merge #707
707: Add all_obkv_to_json function r=Kerollmops a=GregoryConrad

## What does this PR do?
When embedding milli in an application (other than Meilisearch), it often makes sense to not use the `displayed_attributes` functionality and instead just use milli as a full document store. Thus, this PR adds a function, `all_obkv_to_json`, to supplement the already exposed `milli::obkv_to_json` so that those embedding milli *do not* need to deal with `displayed_attributes` if they don't need to.

~This PR also introduces a slight breaking change: `obkv_to_json` now accepts a reference to `obkv::KvReaderU16` instead of taking ownership of it. As far as I can tell, this seems like a change for the better (`obkv_to_json` only acts upon `obkv` rather than consuming it), but I can change it back if you so desire.~ (reverted in [935a724](935a724c57))

## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!


Co-authored-by: Gregory Conrad <gregorysconrad@gmail.com>
2022-11-28 14:52:45 +00:00
f70856bab1 Remove memory usage test that fails when many tests are run in parallel 2022-11-28 12:55:28 +01:00
80588daae5 Fix compilation error in formatting benches 2022-11-28 10:27:15 +01:00
e2ebed62b1 Don't create partial matching words for synonyms, split words, phrases 2022-11-28 10:20:13 +01:00
8284bd760f Relax memory ordering of operations within the test CountingAlloc 2022-11-28 10:20:13 +01:00
8d0ace2d64 Avoid creating a MatchingWord for words that exceed the length limit 2022-11-28 10:20:13 +01:00
86c34a996b Deduplicate matching words 2022-11-28 10:20:13 +01:00
eba7af1d2c Replace deprecated gh actions 2022-11-27 06:47:08 +07:00
84dd2e4df1 Merge #710
710: Update Clippy to use Rust Stable r=irevoire a=Kerollmops

This PR changes the CI to use Rust stable for Clippy and Rustfmt. This way we will reduce the number of times we break the CI. [The version will only change every two months or so](https://www.whatrustisit.com/).

Co-authored-by: Clément Renault <clement@meilisearch.com>
2022-11-24 15:57:04 +00:00
3d06ea41ea Keep a nightly for rustfmt
Co-authored-by: Tamo <tamo@meilisearch.com>
2022-11-24 16:54:40 +01:00
3958db4b17 Update the CI to use Rust Stable 2022-11-24 16:26:48 +01:00
935a724c57 revert: Revert pass by reference API change 2022-11-24 10:08:23 -05:00
7c0e544839 feat: Add all_obkv_to_json function 2022-11-23 21:18:58 -05:00
57c9f03e51 Merge #697
697: Fix bug in prefix DB indexing r=loiclec a=loiclec

Where the batch's information was not properly updated in cases where only the proximity changed between two consecutive word pair proximities.

Closes partially https://github.com/meilisearch/meilisearch/issues/3043



Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>
2022-11-17 15:22:01 +00:00
467e742bd1 Merge #702
702: Update version for the next release (v0.37.0) in Cargo.toml files r=ManyTheFish a=meili-bot

⚠️ This PR is automatically generated. Check the new version is the expected one before merging.

Co-authored-by: curquiza <curquiza@users.noreply.github.com>
2022-11-17 12:54:27 +00:00
cd5aaa3a9f Update version for the next release (v0.37.0) in Cargo.toml files 2022-11-17 12:50:07 +00:00
8ceb199dca Merge #696
696: Fix Facet Indexing bugs r=Kerollmops a=loiclec

1. Handle keys with variable length correctly

Closes (partially) https://github.com/meilisearch/meilisearch/issues/3042 
This issue is now easily reproducible with the updated fuzz tests, which now generate keys with variable lengths.

2. Prevent adding facets to the database if their encoded value does not satisfy `valid_lmdb_key`.

Closes (partially) https://github.com/meilisearch/meilisearch/issues/2743
This fixes an indexing failure when a document had a filterable attribute containing a value whose length is higher than ~500 bytes. For now, this fix is just meant to prevent crashes. Better handling of long values of filterable attributes will be handled in a separate PR.


Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>
2022-11-17 11:56:16 +00:00
777eb3fa00 Add insta-snaps for test of bug 3043 2022-11-17 12:21:27 +01:00
0caadedd3b Make clippy happy 2022-11-17 12:17:53 +01:00
ac3baafbe8 Truncate facet values that are too long before indexing them 2022-11-17 11:29:42 +01:00
990a861241 Add test for indexing a document with a long facet value 2022-11-17 11:29:42 +01:00
d95d02cb8a Fix Facet Indexing bugs
1. Handle keys with variable length correctly

This fixes https://github.com/meilisearch/meilisearch/issues/3042 and
is easily reproducible with the updated fuzz tests, which now generate
keys with variable lengths.

2. Prevent adding facets to the database if their encoded value does
not satisfy `valid_lmdb_key`.

This fixes an indexing failure when a document had a filterable
attribute containing a value whose length is higher than ~500 bytes.
2022-11-17 11:29:42 +01:00
f00108d2ec Fix name of bug in reproduction test 2022-11-17 11:29:18 +01:00
f7c8730d09 Fix bug in prefix DB indexing
Where the batch's information was not properly updated in cases
where only the proximity changed between two consecutive word pair
proximities.

Closes https://github.com/meilisearch/meilisearch/issues/3043
2022-11-17 11:29:18 +01:00
a651397afc Merge #685
685: ci: Use pre-compiled binaries for faster CI r=irevoire a=azzamsa

# Pull Request

## Related issue
Fixes #<issue_number>

## What does this PR do?
- ...

## PR checklist
Please check if your PR fulfills the following requirements:
- [ ] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [ ] Have you read the contributing guidelines?
- [ ] Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!


Co-authored-by: azzamsa <me@azzamsa.com>
2022-11-16 16:39:39 +00:00
2000db8453 Merge #701
701: Remove Hacktoberfest sections r=curquiza a=meili-bot

_This PR is auto-generated._

Remove Hacktoberfest sections from CONTRIBUTING file.


Co-authored-by: meili-bot <74670311+meili-bot@users.noreply.github.com>
2022-11-15 15:17:18 +00:00
92cc3550d8 Update CONTRIBUTING.md 2022-11-15 16:16:40 +01:00
cd3bca06e9 Merge #699
699: Force vendoring of LMDB even if a system version is available r=Kerollmops a=dureuill

# Pull Request

## Related issue
Related to https://github.com/meilisearch/meilisearch/issues/3017: will fix once ported to milli and meilisearch.

## What does this PR do?
- Force using vendored version of LMDB
- **don't use lmdb master3 branch anymore**: this is a bit of a side effect of using a tag instead of branch for heed as a dependency, but it is wanted anyway for now as lmdb master3 was more of an experiment
- **modifies CI to run `cargo check` on the release rather than the debug artifacts**. This is an attempt to reduce the necessary disk space and avoid "out of space" failures.

## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!


Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2022-11-15 11:09:20 +00:00
87576cf26c Perform cargo check on the release artifacts 2022-11-15 10:25:02 +01:00
6dc6a5d874 Force using vendored version of LMDB
- don't use lmdb master3 branch anymore
2022-11-14 17:17:51 +01:00
e75829aded Merge #694
694: Update version for the next release (v0.36.0) in Cargo.toml files r=curquiza a=meili-bot

⚠️ This PR is automatically generated. Check the new version is the expected one before merging.

Co-authored-by: Kerollmops <Kerollmops@users.noreply.github.com>
2022-11-09 11:22:24 +00:00
d00d2aab3f Update version for the next release (v0.36.0) in Cargo.toml files 2022-11-09 11:03:09 +00:00
f46a8ab2e2 Merge #693
693: use the lmdb-master.3 branch r=Kerollmops a=irevoire

After investigating https://github.com/meilisearch/meilisearch/issues/3017, we found out that it was due to lmdb and that, without any code change on our side, bumping using the lmdb-master-3 branch fix our issues.

But, we’re not really confident about what changed between the `mdb.master` and `mdb.master3` branches; thus this is a temporary change, and we hope we’ll be able to move to the new version of heed asap (either before the end of the pre-release or for the next release).

--------

The bug is hard to reproduce; I can reproduce it 100% of the time on my archlinux personal computer. But on a scaleway archlinux bare-metal machine, it doesn’t reproduce. It’s flaky on our test suite, but `@loiclec` was able to write a minimal test that reproduces it every time on macOS.
Basically, what happens is when there are multiple threads opening databases in a different directory at the same time.
If there are 10 or more threads running at the same time, lmdb starts throwing the `Invalid argument (os error 22)` error for no reason, we believe.
I would like to submit an issue to lmdb, but I don’t really have the time to write a test in C without heed currently.

`@hyc,` if you want to take a look at it, here is the repo that reproduces the issue on macOS: https://github.com/irevoire/heed-bug

Co-authored-by: Irevoire <tamo@meilisearch.com>
2022-11-09 09:42:38 +00:00
c3b75bbe5d Merge #691
691: Update version for the next release (v0.35.1) in Cargo.toml files r=curquiza a=meili-bot

⚠️ This PR is automatically generated. Check the new version is the expected one before merging.

Co-authored-by: Kerollmops <Kerollmops@users.noreply.github.com>
2022-11-08 15:31:50 +00:00
c7711daca3 use the lmdb-master.3 branch 2022-11-08 16:28:01 +01:00
f18a4581f1 Merge #692
692: Update CONTRIBUTING.md r=Kerollmops a=curquiza



Co-authored-by: Clémentine Urquizar - curqui <clementine@meilisearch.com>
2022-11-08 14:51:30 +00:00
8ce8bbcdfc Update CONTRIBUTING.md 2022-11-08 15:49:45 +01:00
bd12989610 Update version for the next release (v0.35.1) in Cargo.toml files 2022-11-08 14:31:39 +00:00
24a298a83c Merge #690
690: Fix soft deleted bug settings r=ManyTheFish a=Kerollmops



Co-authored-by: Kerollmops <clement@meilisearch.com>
2022-11-08 13:45:10 +00:00
d85cd9bf1a Merge #689
689: Handle non-finite floats consistently in filters r=irevoire a=dureuill

# Pull Request

## Related issue

Related meilisearch/meilisearch#3000

## What does this PR do?

### User

- Filters using `field = inf`, (or `infinite`, `NaN`) now match the value as a string rather than returning an internal error.
- Filters using `field < inf` (or other comparison operators) now return an invalid_filter error rather than returning an internal error, much like when using `field < aaa`.

### Implementation

- Add new `NonFiniteFloat` error variants to the filter-parser errors
- Add `Token::parse_as_finite_float` that can fail both when the string is not a float and when the float is not finite
- Refactor `Filter::inner_evaluate` to always use `parse_as_finite_float` instead of just `parse`
- Add corresponding tests

## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!


Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2022-11-08 13:24:38 +00:00
37b3c5c323 Fix transform to use all_documents and ignore soft_deleted documents 2022-11-08 14:23:16 +01:00
1b1ad1923b Add a test to check that we take care of soft deleted documents 2022-11-08 14:23:14 +01:00
a836b8e703 tests: Tests filter with non-finite floats 2022-11-08 13:56:55 +01:00