Commit Graph

7825 Commits

Author SHA1 Message Date
bors[bot]
97fb64e40e Merge #747
747: Soft-deletion computation no longer depends on the mapsize r=irevoire a=dureuill

# Pull Request

## Related issue

Related to https://github.com/meilisearch/meilisearch/issues/3231: After removing `--max-index-size`, the `mapsize` will always be unrelated to the actual max size the user wants for their DB, so it doesn't make sense to use these values any longer.

This implements solution 2.3 from https://github.com/meilisearch/meilisearch/issues/3231#issuecomment-1348628824

## What does this PR do?

### User-visible

- Soft-deleted are no longer deleted when there is less than 10% of the mapsize available or when they take more than 10% of the mapsize
- Instead, they are deleted when they are more soft deleted than regular documents, or when they take more than 1GiB disk space (estimated).

### Implementation standpoint

1. Adds a `DeletionStrategy` struct to replace the boolean `disable_soft_deletion` that we had up until now. This enum allows us to specify that we want "always hard", "always soft", or to use the dynamic soft-deletion strategy (default).
2. Uses the current strategy when deleting documents, with the new heuristics being used in the `DeletionStrategy::Dynamic` variant.
3. Updates the tests to use the appropriate DeletionStrategy whenever needed (one of `AlwaysHard` or `AlwaysSoft` depending on the test)

Note to reviewers: this PR is optimized for a commit-by-commit review.

## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!


Co-authored-by: Louis Dureuil <louis@meilisearch.com>
Co-authored-by: Tamo <tamo@meilisearch.com>
2022-12-19 17:46:18 +00:00
curquiza
b3fce7c366 Remove useless continue-on-error 2022-12-19 18:39:35 +01:00
curquiza
5099a40484 Use ubuntu-18.04 container in publish CIs 2022-12-19 18:35:33 +01:00
Tamo
69edbf9f6d Update milli/src/update/delete_documents.rs 2022-12-19 18:23:50 +01:00
bors[bot]
8957251eed Merge #751
751: Update version for the next release (v0.38.0) in Cargo.toml files r=curquiza a=meili-bot

⚠️ This PR is automatically generated. Check the new version is the expected one before merging.

Co-authored-by: curquiza <curquiza@users.noreply.github.com>
2022-12-19 17:02:39 +00:00
curquiza
c72535531b Update version for the next release (v0.38.0) in Cargo.toml files 2022-12-19 16:35:38 +00:00
bors[bot]
19ee9a828f Merge #3262
3262: Clippy fixes after updating Rust to v1.66 r=curquiza a=dureuill

Ran `cargo clippy --fix`

Fixes the CI.


Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2022-12-19 14:05:59 +00:00
Louis Dureuil
869d331680 Clippy fixes after updating Rust to v1.66 2022-12-19 14:17:12 +01:00
curquiza
913eff5b2f Use ubuntu-18.04 container in rust tests 2022-12-19 10:46:29 +01:00
Louis Dureuil
916c23e7be Tests: rename snapshots 2022-12-19 10:07:17 +01:00
Louis Dureuil
ad9937c755 Fix tests after adding DeletionStrategy 2022-12-19 10:07:17 +01:00
Louis Dureuil
171c942282 Soft-deletion computation no longer takes into account the mapsize
Implemented solution 2.3 from https://github.com/meilisearch/meilisearch/issues/3231#issuecomment-1348628824
2022-12-19 10:07:17 +01:00
Louis Dureuil
e2ae3b24aa Hard or soft delete according to the deletion strategy 2022-12-19 10:00:13 +01:00
Louis Dureuil
fc7618d49b Add DeletionStrategy 2022-12-19 09:49:58 +01:00
amab8901
b4a73f2d74 Remove redundant date-setting 2022-12-16 08:32:44 +01:00
amab8901
4e175ae882 Replace Index::new_with_creation_dates(...) with Index::new(...) 2022-12-16 08:20:13 +01:00
amab8901
5a0a0468df Combine created and added into date 2022-12-16 08:11:12 +01:00
ManyTheFish
7f88c4ff2f Fix #1714 test 2022-12-15 18:22:28 +01:00
ManyTheFish
96d4242b93 Update charabia 2022-12-15 18:22:22 +01:00
ManyTheFish
60ebf0ea0b Add a specific test on finite pagination placeolder search with distinct attributes 2022-12-15 17:28:20 +01:00
bors[bot]
867279f2a4 Merge #3249
3249: Bring back changes from release-v0.30.3 to main r=curquiza a=curquiza

⚠️ ⚠️ I had to fix git conflicts, ensure I did not lose anything ⚠️ ⚠️ 

Co-authored-by: Kerollmops <clement@meilisearch.com>
Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2022-12-15 14:13:30 +00:00
bors[bot]
5114686394 Merge #743
743: Fix finite pagination with placeholder search r=Kerollmops a=ManyTheFish

this bug is reproducible on real datasets and is hard to isolate in a simple test.

related to: https://github.com/meilisearch/meilisearch/issues/3200

poke `@curquiza` 

Co-authored-by: ManyTheFish <many@meilisearch.com>
2022-12-15 09:31:47 +00:00
ManyTheFish
3322018c06 Fix placeholder search 2022-12-14 20:09:47 +01:00
Louis Dureuil
ce84a59873 Re-apply some changes from #3132 2022-12-14 20:02:39 +01:00
Tamo
d66bb3a53f rename the two new functions 2022-12-14 17:27:43 +01:00
Tamo
6c0b8edab5 Fix typos
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2022-12-14 17:27:37 +01:00
Tamo
fbbc6eaeca Fix the import of dumps and snapshot.
Some flags were badly applied + the database wrongly deleted when they shouldn't
2022-12-14 17:27:28 +01:00
Kerollmops
60c3bac108 Bump milli to v0.37.3 2022-12-14 17:25:40 +01:00
bors[bot]
9491fe0704 Merge #3247
3247: Re-add push in docker CI r=curquiza a=curquiza

I made a mistake here https://github.com/meilisearch/meilisearch/pull/3229, `push` is not `true` by default, see https://github.com/docker/build-push-action#customizing

Co-authored-by: Clémentine Urquizar - curqui <clementine@meilisearch.com>
2022-12-14 13:15:41 +00:00
Clémentine Urquizar - curqui
240c73d292 Re-add push 2022-12-14 14:05:25 +01:00
amab8901
d3eb8d2d5c Enable create_raw_index(...) to specify time 2022-12-14 10:44:25 +01:00
bors[bot]
0276d5212a Merge #728
728: Add some integration tests on the sort criterion r=ManyTheFish a=loiclec

This is simply an integration test ensuring that the sort criterion works properly. 

However, only one version of the algorithm is tested here (the iterative one). To test the version that uses the facet DB, one has to manually set the `CANDIDATES_THRESHOLD` constant to `0`. I have done that and ensured that the test still succeeds. However, in the future, we will probably want to have an option to force which algorithm is used at runtime, for testing purposes.


Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>
2022-12-14 09:27:12 +00:00
bors[bot]
660be071b5 Merge #3236
3236: Improves clarity of the code that receives payloads r=Kerollmops a=Kerollmops

This PR makes small changes to #3164. It improves the clarity and simplicity of some parts of the code.

Co-authored-by: Kerollmops <clement@meilisearch.com>
2022-12-13 18:20:24 +00:00
bors[bot]
89542d7d8b Merge #3241
3241: Remove core mention r=curquiza a=curquiza

No impact for the users or the team

Co-authored-by: curquiza <clementine@meilisearch.com>
2022-12-13 17:35:50 +00:00
curquiza
f62e7a3501 Remove core mention 2022-12-13 17:34:43 +01:00
Kerollmops
a08cc82983 Revert "Simplify the code when array_each failed"
This reverts commit 271685cceb.
2022-12-13 16:29:49 +01:00
bors[bot]
e2ffc3d69a Merge #741
741: Add test reproducing the bug fixed by #737 r=Kerollmops a=ManyTheFish

related to #737

Co-authored-by: ManyTheFish <many@meilisearch.com>
2022-12-13 15:02:19 +00:00
ManyTheFish
739da9fd4d Add test 2022-12-13 15:54:43 +01:00
bors[bot]
2af93966e0 Merge #740
740: Fix two nightly errors r=Kerollmops a=irevoire

Currently, we have these two errors on rust nightly. It would be nice to help rustc understand what's going on

```
error[E0658]: anonymous lifetimes in `impl Trait` are unstable
   --> filter-parser/src/lib.rs:173:53
    |
173 | fn ws<'a, O>(inner: impl FnMut(Span<'a>) -> IResult<O>) -> impl FnMut(Span<'a>) -> IResult<O> {
    |                                                     ^ expected named lifetime parameter
    |
    = help: add `#![feature(anonymous_lifetime_in_impl_trait)]` to the crate attributes to enable
help: consider introducing a named lifetime parameter
    |
173 | fn ws<'a, 'a, O>(inner: impl FnMut(Span<'a>) -> IResult<'a, O>) -> impl FnMut(Span<'a>) -> IResult<O> {
    |       +++                                               +++

error[E0658]: anonymous lifetimes in `impl Trait` are unstable
  --> filter-parser/src/error.rs:36:49
   |
36 |     mut parser: impl FnMut(Span<'a>) -> IResult<O>,
   |                                                 ^ expected named lifetime parameter
   |
   = help: add `#![feature(anonymous_lifetime_in_impl_trait)]` to the crate attributes to enable
help: consider introducing a named lifetime parameter
   |
35 ~ pub fn cut_with_err<'a, 'a, O>(
36 ~     mut parser: impl FnMut(Span<'a>) -> IResult<'a, O>,
   |

For more information about this error, try `rustc --explain E0658`.
error: could not compile `filter-parser` due to 2 previous errors
```

Co-authored-by: Tamo <tamo@meilisearch.com>
2022-12-13 14:33:40 +00:00
Kerollmops
7b2f2a4f9c Do only one convertion to u64 2022-12-13 15:31:55 +01:00
Tamo
2c47500bc3 fix two nightly errors 2022-12-13 15:29:52 +01:00
Kerollmops
5d5615ef45 Rename the ReceivePayload error variant 2022-12-13 15:07:35 +01:00
Kerollmops
526793b5b2 Handle empty arrays the same way we handle other arrays 2022-12-13 14:58:40 +01:00
Kerollmops
271685cceb Simplify the code when array_each failed 2022-12-13 14:58:05 +01:00
bors[bot]
1af590d3bc Merge #3234
3234: Update README.md r=curquiza a=tpayet

Change Slack link to Discord link

Co-authored-by: Thomas Payet <thomas@meilisearch.com>
2022-12-13 11:41:10 +00:00
bors[bot]
dab2634ca8 Merge #3164
3164: Improve the way we receive the documents payload r=Kerollmops a=jiangbo212

# Pull Request

## Related issue
Fixes #3037 

## What does this PR do?
- writing the playload to a temporary file via BufWritter
- deserialising the json tempporary file to an array of Objects by means of a memory map
- deserialising thie csv tempporary file by means of a memory map
- Adapted some read_json tests

## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!


Co-authored-by: jiangbo212 <peiyaoliukuan@gmail.com>
Co-authored-by: jiangbo212 <peiyaoliukuan@126.com>
2022-12-13 10:58:24 +00:00
bors[bot]
406ee31d1a Merge #737
737: Fix typo initial candidates computation r=Kerollmops a=ManyTheFish

When `Typo` criterion was after a different criterion than `Words` and the previous criterion wasn't returning any candidates at the first iteration of the bucket sort, then the `initial_candidates` were lost.

Now, `Typo`ensure to keep the `initial_candidates` between iterations.


related to https://github.com/meilisearch/meilisearch/issues/3200#issuecomment-1345179578
related to https://github.com/meilisearch/meilisearch/issues/3228

Co-authored-by: ManyTheFish <many@meilisearch.com>
2022-12-13 10:29:28 +00:00
ManyTheFish
2d8d0af1a6 Rename short name bc by ic for initial_candidates 2022-12-13 10:56:38 +01:00
Thomas Payet
8a7f90250c Update README.md
Change Slack link to Discord link
2022-12-13 10:46:05 +01:00
bors[bot]
e0a8f8cb5a Merge #734
734: Fix bug 2945/3021 (missing key in documents database) r=Kerollmops a=loiclec

# Pull Request

## Related issue
Fixes (partially, until merged into meilisearch) https://github.com/meilisearch/meilisearch/issues/2945 (until we integrate the new milli bump into meilisearch).

**Note that a dump will not be sufficient to upgrade from meilisearch v0.30.2 to meilisearch v0.30.3 due to this fix** because the bug could have caused the `documents` database to be corrupted. Instead, a full manual reimport of the documents will be necessary.

## What does this PR do?
There was a bug happening when:
1. A few documents are added to the index
2. Some of these documents are soft-deleted
3. New documents are added, replacing existing ones and triggering a hard-deletion

The `IndexDocuments::execute` method would then perform the hard-deletion but forget to change the `external_document_ids` structure appropriately. As a result, the `external_document_ids` would contain keys corresponding to documents that do no exist anymore.

To fix this bug, I split the `DeleteDocuments::execute` method into two: `execute_inner` and `execute`. 
- `execute_inner` returns a `DetailedDocumentDeletionResult` which says whether soft-deletion was used or not
- `execute` keeps the exact same signature and behaviour

Then, when deleting replaced documents inside `IndexDocuments::execute`, we call `DeleteDocuments::execute_inner` instead of `DeleteDocuments::execute`. If soft-deletion was used, nothing more is done. But if hard-deletion was used, we remove every reference to soft-deleted documents in the new `external_documents_ids` structure.

## Correctness

- Every other test still passes
- The reproduction test case now passes
- In a different branch ([`update-fuzz-test`](https://github.com/meilisearch/milli/pull/735)), I created a fuzz-test that reproduces the past two bugs. This fuzz test cannot find this bug through any combination of some hand-selected `DocumentAddition / DocumentDeletion / DocumentClear / SettingsUpdate` operations. In that test, each relevant operations can be executed with or without soft-deletion, and document additions can be done in batches, replacing or updating existing documents.



Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>
2022-12-13 09:45:57 +00:00