Commit Graph

7630 Commits

Author SHA1 Message Date
bors[bot]
9519e60f97 Merge #709
709: Optimise the `ExactWords` sub-criterion within `Exactness` r=loiclec a=loiclec

# Pull Request

## Related issue
Fixes (partially) https://github.com/meilisearch/meilisearch/issues/3116

## What does this PR do?
1. Reduces the algorithmic complexity of finding the documents containing N exact words from something that is exponential to something that is polynomial.
2. Cache intermediary results between different calls to the `exactness` criterion.

## Performance Results
On the `smol_songs.csv` dataset, a request containing 10 common words now takes about 60ms instead of 5 seconds to execute. For example, this is the case with this (admittedly nonsensical) request: `Rock You Hip Hop Folk World Country Electronic Love The`.


Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>
2023-01-02 12:28:30 +00:00
Loïc Lecrenier
b5df889dcb Apply review suggestions: simplify implementation of exactness criterion 2023-01-02 13:11:47 +01:00
bors[bot]
31155dce4c Merge #752
752: Simplify primary key inference r=irevoire a=dureuill

# Pull Request

## Related issue
Related to https://github.com/meilisearch/meilisearch/issues/3233

## What does this PR do?

### User PoV

- Change primary key inference to only consider a value as a candidate when it ends with "id", rather than when it simply contains "id".
- Change primary key inference to always fail when there are multiple candidates.
- Replace UserError::MissingPrimaryKey with `UserError::NoPrimaryKeyCandidateFound` and `UserError::MultiplePrimaryKeyCandidatesFound`

### Implementation-wise

- Remove uses of UserError::MissingPrimaryKey not pertaining to inference. This introduces a possible panicking path.

## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!


Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2023-01-02 11:44:22 +00:00
Loïc Lecrenier
8d36570958 Add explicit criterion impl strategy to proximity search tests 2023-01-02 10:37:01 +01:00
dependabot[bot]
939e7faf31 Bump taiki-e/install-action from 1 to 2
Bumps [taiki-e/install-action](https://github.com/taiki-e/install-action) from 1 to 2.
- [Release notes](https://github.com/taiki-e/install-action/releases)
- [Changelog](https://github.com/taiki-e/install-action/blob/main/CHANGELOG.md)
- [Commits](https://github.com/taiki-e/install-action/compare/v1...v2)

---
updated-dependencies:
- dependency-name: taiki-e/install-action
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-01-01 10:02:00 +00:00
bors[bot]
776acb5ed3 Merge #3276
3276: README: Replace Slack link with Discord r=dureuill a=shivaylamba

# Pull Request

## Related issue
Fixes #3275 

## What does this PR do?

Update Slack link with Discord link in the README

## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!


Co-authored-by: Shivay Lamba <shivaylamba@gmail.com>
2022-12-29 10:32:42 +00:00
Louis Dureuil
3e9834abff Change error message when the db version is incompatible with engine version. 2022-12-26 17:34:36 +01:00
Louis Dureuil
3cba476a9f Add --generate-master-key CLI option 2022-12-26 10:36:45 +01:00
Louis Dureuil
57e851d8a9 Check for key length 2022-12-26 10:36:45 +01:00
Shivay Lamba
9c45850bd2 README: Replace Slack link with Discord 2022-12-26 00:19:13 +05:30
funilrys
3e0e8164a3 fixup! Adjust + Cleanup changes. 2022-12-22 18:01:54 +01:00
funilrys
0bc4572905 Adjust + Cleanup changes.
Indeed, I missed some of the changed that were introduced by #3190.
2022-12-22 17:53:33 +01:00
funilrys
4e6c663a2e Release unecessary ownership. 2022-12-22 17:47:58 +01:00
funilrys
e2775c6f49 Remove unused object. 2022-12-22 17:47:58 +01:00
funilrys
c07a5932cb Apply fmt. 2022-12-22 17:47:58 +01:00
funilrys
528a944997 Reimplement v5 date extraction.
Indeed, before this patch the implementation wasn't correct.
2022-12-22 17:47:58 +01:00
funilrys
13fb5ce974 Re-Open tasks list when needed.
Indeed, before this patch we were using the reference instead of
"reopening" the task list each time we needed to access it.
Without this patch, all other usage of the task attribute will
break.
2022-12-22 17:47:57 +01:00
funilrys
a43a0712fa Add reader.v5.tasks.Task.updated_at.
There was no way to "quickly" get the update date.
2022-12-22 17:47:57 +01:00
funilrys
1be4619b91 Add reader.v5.tasks.Task.created_at.
There was no way to "quickly" get the creation date.
2022-12-22 17:47:57 +01:00
funilrys
cf50f85986 Add reader.v5.tasks.Task.processed_at.
There was no way to "quickly" get the processed date.
2022-12-22 17:47:57 +01:00
funilrys
61b3a29ff3 Extract the dates out of the dumpv5.
This patch possibly fixes #2986.

This patch introduces a way to fill the IndexMetadata.created_at
and IndexMetadata.updated_at keys from the tasks events.
This is done by reading the creation date of the first event
(created_at) and the creation date of the last event (updated_at).
2022-12-22 17:47:57 +01:00
Loïc Lecrenier
32c6062e65 Optimise exactness criterion
1. Cache some results between calls to next()
2. Compute the combinations of exact words more efficiently
2022-12-22 12:28:45 +01:00
Loïc Lecrenier
f097aafa1c Add unit test for prefix handling by the proximity criterion 2022-12-22 12:08:00 +01:00
Loïc Lecrenier
777b387dc4 Avoid a prefix-related worst-case scenario in the proximity criterion 2022-12-22 12:08:00 +01:00
Loïc Lecrenier
b0f3dc2c06 Interpret synonyms as phrases 2022-12-22 12:07:51 +01:00
Louis Dureuil
66e18eae79 auth: add generate_master_key function 2022-12-22 11:55:27 +01:00
amab8901
9a39c4e40d Get date from IndexMetaData 2022-12-22 11:46:17 +01:00
amab8901
df176aaf01 Insert dump_reader.date() into create_raw_index(_) argument 2022-12-21 15:16:31 +01:00
Louis Dureuil
4b166bea2b Add primary_key_inference test 2022-12-21 15:13:38 +01:00
Louis Dureuil
5943100754 Fix existing tests 2022-12-21 15:13:38 +01:00
Louis Dureuil
b24def3281 Add logging when inference took place.
Displays log message in the form:
```
[2022-12-21T09:19:42Z INFO  milli::update::index_documents::enrich] Primary key was not specified in index. Inferred to 'id'
```
2022-12-21 15:13:38 +01:00
Louis Dureuil
402dcd6b2f Simplify primary key inference 2022-12-21 15:13:38 +01:00
Louis Dureuil
13c95d25aa Remove uses of UserError::MissingPrimaryKey not related to inference 2022-12-21 15:13:36 +01:00
amab8901
0893b175dc Merge branch 'main' into 2983-forward-date-to-milli 2022-12-21 14:31:19 +01:00
amab8901
d5978d11e1 Refactor 2022-12-21 14:28:00 +01:00
bors[bot]
a8defb585b Merge #742
742: Add a "Criterion implementation strategy" parameter to Search r=irevoire a=loiclec

Add a parameter to search requests which determines the implementation strategy of the criteria. This can be either `set-based`, `iterative`, or `dynamic` (ie choosing between set-based or iterative at search time). See https://github.com/meilisearch/milli/issues/755 for more context about this change.


Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>
2022-12-21 12:18:49 +00:00
Loïc Lecrenier
339a4b0789 Make clippy happy 2022-12-21 12:49:34 +01:00
Loïc Lecrenier
904fd2f6d1 Add a search strategy option to the cli 2022-12-21 12:48:53 +01:00
Loïc Lecrenier
229405aeb9 Choose implementation strategy of criterion at runtime 2022-12-21 09:29:39 +01:00
jiangbo212
2780e365e2 test update and ndjson serde use from_slice 2022-12-21 14:31:45 +08:00
jiangbo212
bf2a401a05 serde ndjson fix 2022-12-21 11:27:15 +08:00
bors[bot]
9925309492 Merge #3263
3263: Handle most io error instead of tagging everything as an internal r=dureuill a=irevoire

Fix https://github.com/meilisearch/meilisearch/issues/2255
Fix https://github.com/meilisearch/meilisearch/issues/2785
Close https://github.com/meilisearch/milli/pull/580

- [x] Find a way to catch the `io::Error` contained in `serde_json::Error`: We can't: https://docs.rs/serde_json/latest/serde_json/struct.Error.html
- [x] Check the `grenad::Error` as well => the `grenad::Error::Io` error are correctly converted to a `milli::Error::Io` error 
- [x] Ensure the error code mean the same thing under windows

Co-authored-by: Tamo <tamo@meilisearch.com>
2022-12-20 17:15:53 +00:00
Tamo
9e0cce5ca4 Update dump/src/error.rs
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2022-12-20 18:08:51 +01:00
Tamo
336ea57384 Update dump/src/error.rs
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2022-12-20 18:08:44 +01:00
Tamo
c637bfba37 convert all the document format error due to io to io::Error 2022-12-20 17:49:38 +01:00
Tamo
3040172562 update the error message as well 2022-12-20 17:31:13 +01:00
bors[bot]
249e051cd4 Merge #750
750: Fix hard-deletion of an external id that was soft-deleted and then reimported - main r=irevoire a=loiclec

# Pull Request

## Related issue
Fixes (when merged into meilisearch) https://github.com/meilisearch/meilisearch/issues/3021

## What does this PR do?
There was a bug happening when:

1. Documents were added
2. Some of these documents were replaced using soft-deletion
3. A deletion of another non-replaced document takes place and triggers a hard-deletion
4. Documents with the same identifiers as the replaced documents are added again

Then, search results would return duplicate documents. No crash would happen at any time (this is the reason it wasn't caught by the previous fuzz test. I have updated the new one such that it also checks the result of a placeholder search request, which then finds the bug immediately).

The cause of the bug is: 

1. When a hard-deletion is triggered, we try to retrieve the external document id associated with each soft-deleted document id. 
2. Then, we take this list of external document ids and remove each of them from the `ExternalDocumentsIds` structure. 
3. However, this is not correct in case an existing (non-deleted) document shares the external id of a soft-deleted document. 
   
## Implementation of the fix
1. Before we process a permanent deletion, we update the list of soft-deleted document ids.
2. Then, the permanent deletion's job is to remove the soft-deleted documents from all data structures. Therefore, to update `ExternalDocumentsIds`, we can simply call the `delete_soft_deleted_documents_ids_from_fsts` method, which is faster and simpler.

## Correctness
A unit test was added to reproduce the bug. The new fuzz test, when adjusted to check the correctness of a placeholder search, could also instantly reproduce the bug, but now does not find any other problem.

Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>
2022-12-20 16:13:20 +00:00
Tamo
52aa34d984 remove an unused error handling file 2022-12-20 16:32:51 +01:00
Loïc Lecrenier
fc0e7382fe Fix hard-deletion of an external id that was soft-deleted 2022-12-20 15:33:31 +01:00
bors[bot]
2c86d42a44 Merge #3264
3264: Remove macos-latest and windows-latest usages r=curquiza a=curquiza

Related to https://github.com/meilisearch/meilisearch/issues/3109#issuecomment-1359151297

Remove the `macos-latest` and `windows-latest` to replace them with the specific version: this will avoid "surprises" in the future when GitHub changes the `latest` version.
This way, it will also allow us to let the documentation team know about the changes, since we will control the macOS/Windows version we support

Co-authored-by: curquiza <clementine@meilisearch.com>
2022-12-20 10:53:37 +00:00