Commit Graph

7143 Commits

Author SHA1 Message Date
bors[bot]
49f58b2c47 Merge #732
732: Interpret synonyms as phrases r=loiclec a=loiclec

# Pull Request

## Related issue
Fixes (when merged into meilisearch) https://github.com/meilisearch/meilisearch/issues/3125

## What does this PR do?
We now map multi-word synonyms to phrases instead of loose words. Such that the request:
```
btw I am going to nyc soon
```
is interpreted as (when the synonym interpretation is chosen for both `btw` and `nyc`):
```
"by the way" I am going to "New York City" soon
```
instead of:
```
by the way I am going to New York City soon
```

This prevents queries containing multi-word synonyms to exceed to word length limit and degrade the search performance.

In terms of relevancy, there is a debate to have. I personally think this could be considered an improvement, since it would be strange for a user to search for:
```
good DIY project
```
and have a result such as:
```
{
    "text": "whether it is a good project to do, you'll have to decide for yourself"
}
```
However, for synonyms such as `NYC -> New York City`, then we will stop matching documents where `New York` is separated from `City`. This is however solvable by adding an additional mapping: `NYC -> New York`.

## Performance

With the old behaviour, some long search requests making heavy uses of synonyms could take minutes to be executed. This is no longer the case, these search requests now take an average amount of time to be resolved.

Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>
2023-01-04 08:34:18 +00:00
bors[bot]
947f08793a Merge #3296
3296: Remove `--disable-auto-batching` CLI option r=gmourier a=loiclec

Fixes #3294 

The `index-scheduler` code is not modified, only the CLI options have changed.

Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>
2023-01-03 16:57:14 +00:00
bors[bot]
6a10e85707 Merge #736
736: Update charabia r=curquiza a=ManyTheFish

Update Charabia to the last version.

> We are now Romanizing Chinese characters into Pinyin.
> Note that we keep the accent because they are in fact never typed directly by the end-user, moreover, changing an accent leads to a different Chinese character, and I don't have sufficient knowledge to forecast the impact of removing accents in this context.

Co-authored-by: ManyTheFish <many@meilisearch.com>
2023-01-03 15:44:41 +00:00
Louis Dureuil
17dac72464 Characters -> bytes 2023-01-03 15:31:02 +01:00
Loïc Lecrenier
b821c72459 Remove --disable-auto-batching CLI option 2023-01-03 15:01:04 +01:00
Louis Dureuil
7b2575c646 Master Key -> master key 2023-01-03 14:45:23 +01:00
bors[bot]
c505fa9d7d Merge #758
758: Bump taiki-e/install-action from 1 to 2 r=curquiza a=dependabot[bot]

Bumps [taiki-e/install-action](https://github.com/taiki-e/install-action) from 1 to 2.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a href="https://github.com/taiki-e/install-action/releases">taiki-e/install-action's releases</a>.</em></p>
<blockquote>
<h2>2.0.0</h2>
<p>This release implements a mechanism to automatically track the latest version of the tool on our end. (<a href="https://github-redirect.dependabot.com/taiki-e/install-action/pull/27">#27</a>)
Hopefully, this will avoid situations such as &quot;new version of the tool has been released, but the maintainer has not been aware of it for a number of months&quot;.
This also makes it easier to add support for new tools.</p>
<p>This release also includes the following improvements:</p>
<ul>
<li>
<p>Verify SHA256 checksums for downloaded files in all tools installed from GH releases. (<a href="https://github-redirect.dependabot.com/taiki-e/install-action/pull/27">#27</a>)</p>
</li>
<li>
<p>Support omitting the patch/minor version in all tools installed from GH releases. (<a href="https://github-redirect.dependabot.com/taiki-e/install-action/pull/27">#27</a>)</p>
<p>For example:</p>
<pre lang="yaml"><code>- uses: taiki-e/install-action@v2
  with:
    tool: cargo-hack@0.5
</code></pre>
<p>You can also omit the minor version if the major version of tool is 1 or greater.</p>
</li>
<li>
<p>Support <code>just</code>. (<a href="https://github-redirect.dependabot.com/taiki-e/install-action/pull/34">#34</a>)</p>
</li>
<li>
<p>Support <code>dprint</code>. (<a href="https://github-redirect.dependabot.com/taiki-e/install-action/pull/34">#34</a>)</p>
</li>
</ul>
<p>Note: This release is considered a breaking change because installing on versions not yet recognized by the action or on pre-release versions will no longer work with this release. (They were never officially supported, but they could work before.) Please submit an issue if you need these supports again.</p>
<h2>1.17.3</h2>
<ul>
<li>Update <code>wasmtime@latest</code> to 4.0.0.</li>
</ul>
<h2>1.17.2</h2>
<ul>
<li>Update <code>mdbook@latest</code> to 0.4.25.</li>
</ul>
<h2>1.17.1</h2>
<ul>
<li>Update <code>mdbook@latest</code> to 0.4.23.</li>
<li>Support <code>mdbook</code> on Linux (musl).</li>
<li>Update <code>cargo-llvm-cov@latest</code> to 0.5.3.</li>
</ul>
<h2>1.17.0</h2>
<ul>
<li>Update <code>protoc@latest</code> to 3.21.12.</li>
<li>Support aarch64 self-hosted runners (Linux, macOS, Windows).</li>
<li>Improve support for Fedora/RHEL based containers/self-hosted runners.</li>
</ul>
<h2>1.16.0</h2>
<ul>
<li>
<p>Update <code>cargo-binstall@latest</code> to 0.18.1. (<a href="https://github-redirect.dependabot.com/taiki-e/install-action/pull/32">#32</a>, thanks <a href="https://github.com/NobodyXu"><code>`@​NobodyXu</code></a>)</p>`
</li>
<li>
<p>If the host environment lacks packages required for installation, such as <code>curl</code> or <code>tar</code>, install them if possible.</p>
<p>It is mainly intended to make the use of this action easy on containers or self-hosted runners, and currently supports Debian-based distributions (including Ubuntu) and Alpine.</p>
</li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a href="https://github.com/taiki-e/install-action/blob/main/CHANGELOG.md">taiki-e/install-action's changelog</a>.</em></p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a href="8ffc26aecd"><code>8ffc26a</code></a> Release 2.0.1</li>
<li><a href="82e9eb5996"><code>82e9eb5</code></a> Update DEVELOPMENT.md</li>
<li><a href="d3b7ad8380"><code>d3b7ad8</code></a> Update changelog</li>
<li><a href="f1a96ee3ed"><code>f1a96ee</code></a> Update changelog when manifest update</li>
<li><a href="46063c186c"><code>46063c1</code></a> Update cargo-minimal-versions</li>
<li><a href="048586d7a8"><code>048586d</code></a> Update cargo-hack</li>
<li><a href="d117b8d41a"><code>d117b8d</code></a> Remove outdated todo</li>
<li><a href="76828c33cd"><code>76828c3</code></a> Release 2.0.0</li>
<li><a href="eea8c318de"><code>eea8c31</code></a> Update readme and changelog</li>
<li><a href="ab0e193cf5"><code>ab0e193</code></a> Support dprint</li>
<li>Additional commits viewable in <a href="https://github.com/taiki-e/install-action/compare/v1...v2">compare view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=taiki-e/install-action&package-manager=github_actions&previous-version=1&new-version=2)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

You can trigger a rebase of this PR by commenting ``@dependabot` rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- ``@dependabot` rebase` will rebase this PR
- ``@dependabot` recreate` will recreate this PR, overwriting any edits that have been made to it
- ``@dependabot` merge` will merge this PR after your CI passes on it
- ``@dependabot` squash and merge` will squash and merge this PR after your CI passes on it
- ``@dependabot` cancel merge` will cancel a previously requested merge and block automerging
- ``@dependabot` reopen` will reopen this PR if it is closed
- ``@dependabot` close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- ``@dependabot` ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- ``@dependabot` ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- ``@dependabot` ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)


</details>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-01-03 10:44:36 +00:00
bors[bot]
ab655a85e8 Merge #3279
3279: Clarify error message when the db and engine versions are incompatible r=irevoire a=dureuill

# Pull Request

## Related issue

Related to https://github.com/meilisearch/meilisearch/issues/2752

## What does this PR do?
- Implements https://github.com/meilisearch/product/discussions/572#discussioncomment-4390616

## PR checklist
Please check if your PR fulfills the following requirements:
- [ ] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!


Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2023-01-02 17:18:11 +00:00
bors[bot]
6425e06cf2 Merge #3274
3274: Reject master keys that are less than 16 bytes and add `--generate-master-key` CLI option r=irevoire a=dureuill

# Pull Request

## Related issue
Fix #3272 
Fix #3287

## What does this PR do?

### User standpoint

---

- Adds a `--generate-master-key` CLI flag to generate a fresh Master Key and exit.

<img width="1351" alt="Capture d’écran 2022-12-22 à 14 18 58" src="https://user-images.githubusercontent.com/41078892/209142778-eab52eeb-eaa8-409b-897a-c0d5728c8aaa.png">

---

(relevant fragment of the `--help` message)

<img width="1351" alt="Capture d’écran 2022-12-22 à 14 19 40" src="https://user-images.githubusercontent.com/41078892/209142891-ebfa2ed6-f231-4f76-a3ae-b7542c7aef04.png">

---

- When `meilisearch` is started in the `development` environment and no Master Key has been provided, then the binary prints a warning before starting.

<img width="1351" alt="Capture d’écran 2022-12-22 à 14 14 49" src="https://user-images.githubusercontent.com/41078892/209142158-54eba3b7-bf71-4f3f-8840-0600b13a1a9f.png">

---

- When `meilisearch` is started in the `development` environment and the provided Master Key is shorter than 16 bytes, then the binary prints a warning before starting.

<img width="1351" alt="Capture d’écran 2022-12-22 à 14 15 58" src="https://user-images.githubusercontent.com/41078892/209142295-0209fe47-c03b-424f-a73f-cee9b633137a.png">

---

- When `meilisearch` is started in the `production` environment, and no Master Key is provided, the error message is altered to generate a fresh Master Key.

<img width="1351" alt="Capture d’écran 2022-12-22 à 17 29 02" src="https://user-images.githubusercontent.com/41078892/209180540-0def5798-15db-47f0-a6ec-8cfa081dea77.png">


---

- When `meilisearch` is started in the `production` environment, and the provided Master Key is shorter than 16 bytes, then the binary exits with an error.

<img width="1351" alt="Capture d’écran 2022-12-22 à 17 28 47" src="https://user-images.githubusercontent.com/41078892/209180567-fa54fe33-fbc4-4b9f-b281-7dfb7b33af85.png">


---

This implements the solution B described here: https://github.com/meilisearch/product/discussions/538#discussioncomment-4391346 

### Implementation standpoint

- Add a new `meilisearch-auth::generate_master_key` function that uses a Cryptographic Random Number Generator (CRNG) to fill a vector of 32 bytes before encoding these bytes as base64

## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!


Co-authored-by: Louis Dureuil <louis@meilisearch.com>
Co-authored-by: Tamo <tamo@meilisearch.com>
2023-01-02 16:00:40 +00:00
Tamo
1692f58b83 slightly update the message associated with the cli parameter + accept an env variable 2023-01-02 16:49:35 +01:00
Tamo
9ba4d0f921 update the error messages according to the spec 2023-01-02 16:43:23 +01:00
Tamo
4b6ffe0cd1 Update meilisearch-auth/src/lib.rs 2023-01-02 16:33:02 +01:00
bors[bot]
336c77aa45 Merge #3245
3245: Enable create_raw_index(...) to specify time r=irevoire a=amab8901

# Pull Request

## Related issue
Partially fixes #2983 

## What does this PR do?
- Enables [`create_raw_index`](660be071b5/index-scheduler/src/lib.rs (L868)) to specify time

## PR checklist
Please check if your PR fulfills the following requirements:
- [ X ] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [ X ] Have you read the contributing guidelines?
- [ X ] Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!


Co-authored-by: amab8901 <amab8901@protonmail.com>
2023-01-02 14:11:39 +00:00
bors[bot]
9519e60f97 Merge #709
709: Optimise the `ExactWords` sub-criterion within `Exactness` r=loiclec a=loiclec

# Pull Request

## Related issue
Fixes (partially) https://github.com/meilisearch/meilisearch/issues/3116

## What does this PR do?
1. Reduces the algorithmic complexity of finding the documents containing N exact words from something that is exponential to something that is polynomial.
2. Cache intermediary results between different calls to the `exactness` criterion.

## Performance Results
On the `smol_songs.csv` dataset, a request containing 10 common words now takes about 60ms instead of 5 seconds to execute. For example, this is the case with this (admittedly nonsensical) request: `Rock You Hip Hop Folk World Country Electronic Love The`.


Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>
2023-01-02 12:28:30 +00:00
Loïc Lecrenier
b5df889dcb Apply review suggestions: simplify implementation of exactness criterion 2023-01-02 13:11:47 +01:00
bors[bot]
31155dce4c Merge #752
752: Simplify primary key inference r=irevoire a=dureuill

# Pull Request

## Related issue
Related to https://github.com/meilisearch/meilisearch/issues/3233

## What does this PR do?

### User PoV

- Change primary key inference to only consider a value as a candidate when it ends with "id", rather than when it simply contains "id".
- Change primary key inference to always fail when there are multiple candidates.
- Replace UserError::MissingPrimaryKey with `UserError::NoPrimaryKeyCandidateFound` and `UserError::MultiplePrimaryKeyCandidatesFound`

### Implementation-wise

- Remove uses of UserError::MissingPrimaryKey not pertaining to inference. This introduces a possible panicking path.

## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!


Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2023-01-02 11:44:22 +00:00
Loïc Lecrenier
8d36570958 Add explicit criterion impl strategy to proximity search tests 2023-01-02 10:37:01 +01:00
dependabot[bot]
939e7faf31 Bump taiki-e/install-action from 1 to 2
Bumps [taiki-e/install-action](https://github.com/taiki-e/install-action) from 1 to 2.
- [Release notes](https://github.com/taiki-e/install-action/releases)
- [Changelog](https://github.com/taiki-e/install-action/blob/main/CHANGELOG.md)
- [Commits](https://github.com/taiki-e/install-action/compare/v1...v2)

---
updated-dependencies:
- dependency-name: taiki-e/install-action
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-01-01 10:02:00 +00:00
bors[bot]
776acb5ed3 Merge #3276
3276: README: Replace Slack link with Discord r=dureuill a=shivaylamba

# Pull Request

## Related issue
Fixes #3275 

## What does this PR do?

Update Slack link with Discord link in the README

## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!


Co-authored-by: Shivay Lamba <shivaylamba@gmail.com>
2022-12-29 10:32:42 +00:00
Louis Dureuil
3e9834abff Change error message when the db version is incompatible with engine version. 2022-12-26 17:34:36 +01:00
Louis Dureuil
3cba476a9f Add --generate-master-key CLI option 2022-12-26 10:36:45 +01:00
Louis Dureuil
57e851d8a9 Check for key length 2022-12-26 10:36:45 +01:00
Shivay Lamba
9c45850bd2 README: Replace Slack link with Discord 2022-12-26 00:19:13 +05:30
funilrys
3e0e8164a3 fixup! Adjust + Cleanup changes. 2022-12-22 18:01:54 +01:00
funilrys
0bc4572905 Adjust + Cleanup changes.
Indeed, I missed some of the changed that were introduced by #3190.
2022-12-22 17:53:33 +01:00
funilrys
4e6c663a2e Release unecessary ownership. 2022-12-22 17:47:58 +01:00
funilrys
e2775c6f49 Remove unused object. 2022-12-22 17:47:58 +01:00
funilrys
c07a5932cb Apply fmt. 2022-12-22 17:47:58 +01:00
funilrys
528a944997 Reimplement v5 date extraction.
Indeed, before this patch the implementation wasn't correct.
2022-12-22 17:47:58 +01:00
funilrys
13fb5ce974 Re-Open tasks list when needed.
Indeed, before this patch we were using the reference instead of
"reopening" the task list each time we needed to access it.
Without this patch, all other usage of the task attribute will
break.
2022-12-22 17:47:57 +01:00
funilrys
a43a0712fa Add reader.v5.tasks.Task.updated_at.
There was no way to "quickly" get the update date.
2022-12-22 17:47:57 +01:00
funilrys
1be4619b91 Add reader.v5.tasks.Task.created_at.
There was no way to "quickly" get the creation date.
2022-12-22 17:47:57 +01:00
funilrys
cf50f85986 Add reader.v5.tasks.Task.processed_at.
There was no way to "quickly" get the processed date.
2022-12-22 17:47:57 +01:00
funilrys
61b3a29ff3 Extract the dates out of the dumpv5.
This patch possibly fixes #2986.

This patch introduces a way to fill the IndexMetadata.created_at
and IndexMetadata.updated_at keys from the tasks events.
This is done by reading the creation date of the first event
(created_at) and the creation date of the last event (updated_at).
2022-12-22 17:47:57 +01:00
Loïc Lecrenier
32c6062e65 Optimise exactness criterion
1. Cache some results between calls to next()
2. Compute the combinations of exact words more efficiently
2022-12-22 12:28:45 +01:00
Loïc Lecrenier
f097aafa1c Add unit test for prefix handling by the proximity criterion 2022-12-22 12:08:00 +01:00
Loïc Lecrenier
777b387dc4 Avoid a prefix-related worst-case scenario in the proximity criterion 2022-12-22 12:08:00 +01:00
Loïc Lecrenier
b0f3dc2c06 Interpret synonyms as phrases 2022-12-22 12:07:51 +01:00
Louis Dureuil
66e18eae79 auth: add generate_master_key function 2022-12-22 11:55:27 +01:00
amab8901
9a39c4e40d Get date from IndexMetaData 2022-12-22 11:46:17 +01:00
amab8901
df176aaf01 Insert dump_reader.date() into create_raw_index(_) argument 2022-12-21 15:16:31 +01:00
Louis Dureuil
4b166bea2b Add primary_key_inference test 2022-12-21 15:13:38 +01:00
Louis Dureuil
5943100754 Fix existing tests 2022-12-21 15:13:38 +01:00
Louis Dureuil
b24def3281 Add logging when inference took place.
Displays log message in the form:
```
[2022-12-21T09:19:42Z INFO  milli::update::index_documents::enrich] Primary key was not specified in index. Inferred to 'id'
```
2022-12-21 15:13:38 +01:00
Louis Dureuil
402dcd6b2f Simplify primary key inference 2022-12-21 15:13:38 +01:00
Louis Dureuil
13c95d25aa Remove uses of UserError::MissingPrimaryKey not related to inference 2022-12-21 15:13:36 +01:00
amab8901
0893b175dc Merge branch 'main' into 2983-forward-date-to-milli 2022-12-21 14:31:19 +01:00
amab8901
d5978d11e1 Refactor 2022-12-21 14:28:00 +01:00
bors[bot]
a8defb585b Merge #742
742: Add a "Criterion implementation strategy" parameter to Search r=irevoire a=loiclec

Add a parameter to search requests which determines the implementation strategy of the criteria. This can be either `set-based`, `iterative`, or `dynamic` (ie choosing between set-based or iterative at search time). See https://github.com/meilisearch/milli/issues/755 for more context about this change.


Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>
2022-12-21 12:18:49 +00:00
Loïc Lecrenier
339a4b0789 Make clippy happy 2022-12-21 12:49:34 +01:00