Commit Graph

7792 Commits

Author SHA1 Message Date
Louis Dureuil
00746b32c0 Add Index::map_size 2023-01-10 11:16:51 +01:00
bors[bot]
e27bb8ab3e Merge #3246
3246: Implement most of the error handling enhancement planned for v1.0 r=irevoire a=irevoire

Fix #3095 and #2325
Close https://github.com/meilisearch/meilisearch/pull/2540

Implements most of https://github.com/meilisearch/specifications/pull/212

## Generic error message we re-implements (in deserr):

- [x] Json
  - [x] Incorrect value kind
  - [x] Missing field
  - [x] Unknown key
  - [x] Unexpected
  - [x] Reimplement the way we show the location

- [x] Query parameter
  - [x] Incorrect value kind
  - [x] Missing field
  - [x] Unknown key
  - [x] Unexpected

## Routes to implements:
- [x] Get search
- [x] Post search
- [x] Settings
- [x] Swap indexes
- [x] Task API
- [x] Documents ressource

Error codes to implements;
## Swap API

- [x] `duplicate_index_found` → `invalid_swap_duplicate_index_found`

## Search API

- [x] `invalid_search_q`
- [x] `invalid_search_offset`
- [x] `invalid_search_limit`
- [x] `invalid_search_page`
- [x] `invalid_search_hits_per_page`
- [x] `invalid_search_attributes_to_retrieve`
- [x] `invalid_search_attributes_to_crop`
- [x] `invalid_search_crop_length`
- [x] `invalid_search_attributes_to_highlight`
- [x] `invalid_search_show_matches_position`
- [x] `invalid_search_filter`
- [x] `invalid_search_sort`
- [x] `invalid_search_facets`
- [x] `invalid_search_highlight_pre_tag`
- [x] `invalid_search_highlight_post_tag`
- [x] `invalid_search_crop_marker`
- [x] `invalid_search_matching_strategy`

## Settings API

- [x] invalid_settings_displayed_attributes
- [x] invalid_settings_searchable_attributes
- [x] invalid_settings_filterable_attributes
- [x] invalid_settings_sortable_attributes
- [x] invalid_settings_ranking_rules
- [x] invalid_settings_stop_words
- [x] invalid_settings_synonyms
- [x] invalid_settings_distinct_attribute
- [x] Add invalid_settings_typo_tolerance
    - [x] ~~invalid_settings_typo_tolerance_min_word_size_for_typos~~ (Merge in **invalid_settings_typo_tolerance**)
- [x] invalid_settings_faceting
- [x] invalid_settings_pagination

## Task API

- [x] invalid_task_date_filer → invalid_task_before_enqueued_at_filter (for all date filter) ?

## Document Resource

- [x] ~~`primary_key_inference_failed` → `index_primary_key_`~~ This doesn't exists anymore after `@dureuill` PR's on the primary key inference

------------------

# Changes

# `code` property

## Swap API

- [x] `invalid_swap_duplicate_index_found`  [RENAME]
- [x] `invalid_swap_indexes`  [NEW]

## Index API

### POST

- [x] `missing_index_uid`  [NEW]

### POST/PATCH

- [x] `invalid_index_primary_key`  [NEW]

### GET

- [x] `invalid_index_limit`  [NEW]
- [x] `invalid_index_offset`  [NEW]

## Documents API

### GET

- [x] `fields` parameter error `bad_request` → `invalid_document_fields`  [NEW]
- [x] `limit` parameter error `bad_request` → `invalid_document_limit`  [NEW]
- [x] `offset` parameter error `bad_request` → `invalid_document_offset`  [NEW]

### POST/PUT

- [x] `?primaryKey` parameter error `bad_request` →  `invalid_index_primary_key`  [NEW]

## Keys API

### POST

- ~~`missing_parameter`~~
    - [x] `missing_api_key_actions`  [NEW]
    - [x] `missing_api_key_indexes`  [NEW]
    - [x] `missing_api_key_expires_at`  [NEW]

### GET

- [x] `limit` parameter `bad_request` → `invalid_api_key_limit`  [NEW]
- [x] `offset` parameter `bad_request` → `invalid_api_key_offset`  [NEW]

## Misc
- [x] ~~`invalid_geo_field`~~ → `invalid_document_geo_field`  [RENAME]

# `type` property

## `system`   [NEW]

- [x] `no_space_left_on_device` error code
- [x] `io_error` error code (**does not exist in the current spec, need a catch-up**)
- [x] `too_many_open_files` error code (**does not exist in the current spec, need a catch-up**)

Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>
v1.0.0-rc.0
2023-01-09 16:25:48 +00:00
Tamo
ff843881c5 remove the documentation of the query parameter extractor module 2023-01-09 15:14:48 +01:00
Loïc Lecrenier
ae08fba76e Remove forgotten comment 2023-01-09 13:45:03 +01:00
Loïc Lecrenier
af6d4b3031 Remove unused deserr extractor 2023-01-09 13:43:16 +01:00
Louis Dureuil
1cce613399 Fixup dumps-destination -> dump-directory section header in help link 2023-01-09 13:31:57 +01:00
Tamo
b03ee54fe0 makes clippy turbo-happy 2023-01-09 13:04:31 +01:00
Tamo
d17efb9ed6 use the published version of deserr 2023-01-09 12:51:10 +01:00
Loïc Lecrenier
9ab791bedc Update error codes on the api key routes 2023-01-09 12:30:25 +01:00
Loïc Lecrenier
96105a5e8d Update error codes on the documents/ routes 2023-01-09 12:30:25 +01:00
Tamo
e706628bb1 fix the error code of the swap index route 2023-01-06 14:48:25 +01:00
Tamo
3c630891bb fix the error code for the swap index 2023-01-05 21:25:20 +01:00
Tamo
97854274b4 rename the invalid_geo_field error code to invalid_document_geo_field 2023-01-05 21:08:19 +01:00
Tamo
0646f63404 implement the new type property for the system error 2023-01-05 21:06:50 +01:00
Tamo
ce3e8794a2 fix the tests after the rebase 2023-01-05 20:52:26 +01:00
Tamo
50ce0409bc Integrate deserr on the most important routes 2023-01-05 20:48:29 +01:00
bors[bot]
839b05c43d Merge #3305
3305: Remove hidden but usable CLI arguments r=Kerollmops a=Kerollmops

`@curquiza` found out that we were exposing some internal CLI arguments: `nb-max-chunks` and `log-every-n`. In this PR I removed those two, the only two ones that I found. Those options shouldn't be accessible as non-documented in the documentation or the `--help` message.

Fixes https://github.com/meilisearch/meilisearch/issues/3307

Co-authored-by: Clément Renault <clement@meilisearch.com>
2023-01-05 17:11:58 +00:00
bors[bot]
cc699fae40 Merge #3308
3308: Remove `--generate-master-key` option r=Kerollmops a=dureuill

# Pull Request

## Related issue

Related to https://github.com/meilisearch/specifications/pull/210#issuecomment-1372035525

## What does this PR do?
- Remove the short-lived `--generate-master-key` flag that was too beautiful for this world :D.

Removal of this option proceeds of the following reasoning:

1. It is the only option that starts meilisearch and then immediately exits
2. We are unsure if we want to keep it under this form in the future or switch to a subcommand.
3. Releasing this option in v1 would make it insta-stable.
5. The option is only marginally useful, as users will be presented with freshly generated key directly in the error messages if their master key is absent/too short.
6. If we remove this option now, we can still add it back in a future v1 release. If we add it now, we won't be able to remove it in any future v1 version.

## PR checklist
Please check if your PR fulfills the following requirements:
- [ ] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [ ] Have you read the contributing guidelines?
- [ ] Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!

### Impacts

this impacts the docs team as they would previously have had to document this option, and they may have wanted to use it in the user workflow.

Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2023-01-05 16:19:40 +00:00
Clément Renault
aa4b813237 Derive Default on IndexerOpts 2023-01-05 16:00:45 +01:00
Louis Dureuil
eb08a0fb0b Remove --generate-master-key option 2023-01-05 14:55:24 +01:00
Clément Renault
cda529c07b Remove hidden but usable CLI arguments 2023-01-05 14:25:41 +01:00
bors[bot]
1f8ddb366c Merge #3302
3302: Update insta snap tests for index dates of dump v5 r=curquiza a=loiclec

This PR simply updates the content of the insta snapshot test following https://github.com/meilisearch/meilisearch/pull/3013 . I manually verified that the dates in the snaps are indeed correct.

Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>
2023-01-05 12:58:10 +00:00
bors[bot]
8a3da0c2a7 Merge #3304
3304: Fix update cargo.toml workflow r=Kerollmops a=curquiza

Following https://github.com/meilisearch/meilisearch/pull/3224

Fixes #3219 

Co-authored-by: Clémentine Urquizar - curqui <clementine@meilisearch.com>
2023-01-05 12:16:57 +00:00
Clémentine Urquizar - curqui
c840d55e89 Fix update cargo.toml workflow 2023-01-05 12:56:02 +01:00
bors[bot]
c7a3992510 Merge #3303
3303: Update version for the next release (v1.0.0) in Cargo.toml files r=curquiza a=meili-bot

⚠️ This PR is automatically generated. Check the new version is the expected one before merging.

Co-authored-by: curquiza <curquiza@users.noreply.github.com>
2023-01-05 11:53:09 +00:00
curquiza
28408816ef Update version for the next release (v1.0.0) in Cargo.toml files 2023-01-05 11:45:15 +00:00
bors[bot]
0eaa8ca255 Merge #3266
3266: Improve the way we receive the documents payload- serde multiple ndjson fix r=curquiza a=jiangbo212

# Pull Request

## Related issue
Fixes #3037 

## Related PR
#3164 

## What does this PR do?
Sorry, This PR is mainly to fix the problems caused by my previously provided PR #3164. It causes multiple ndjson data deserialization failures
- Fix serde multiple ndjson data failures and add test to it
- Fix serde jsonarray error and againest serde it use `from_slice`. only use `from_slice` when serde error category is `data`, it indicate json data is a single json.

## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!


Co-authored-by: jiangbo212 <peiyaoliukuan@126.com>
2023-01-05 11:30:29 +00:00
bors[bot]
201bc633d2 Merge #3288
3288: Replace underscores with hyphens in documentation link to error code r=dureuill a=loiclec

# Pull Request

## Related issue
Fixes #3097 

## Implementation
Add a new dependency to `convert_case` (already used transitively by `deserr`) so that the link can be generated using:
```rust
    /// return the doc url associated with the error
    fn url(&self) -> String {
        format!(
            "https://docs.meilisearch.com/errors#{}",
            self.name().to_case(convert_case::Case::Kebab)
        )
    }
```

## Review
I'd like the reviewer to check whether it is expected that the content of some `dump` snapshot tests changed :-)

Co-authored-by: ManyTheFish <many@meilisearch.com>
Co-authored-by: bors[bot] <26634292+bors[bot]@users.noreply.github.com>
Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>
2023-01-05 11:08:57 +00:00
Loïc Lecrenier
ba839852f5 Update insta snap tests for index dates of dump v5 2023-01-05 11:45:40 +01:00
Louis Dureuil
be9786bed9 Change primary key inference error messages 2023-01-05 10:40:09 +01:00
Loïc Lecrenier
f9aa897ab5 Update insta tests 2023-01-05 10:19:19 +01:00
Loïc Lecrenier
2d74678b51 Replace underscores with hyphens in doc link to error code 2023-01-05 10:09:02 +01:00
bors[bot]
db7eaf23f4 Merge #3251
3251: Add a specific test on finite pagination placeolder search with disti… r=curquiza a=ManyTheFish

Add a specific test on finite pagination placeholder search with distinct attributes


related to https://github.com/meilisearch/milli/pull/743
related to https://github.com/meilisearch/meilisearch/issues/3200

poke `@curquiza` 

> note that the destination branch should be changed

Co-authored-by: ManyTheFish <many@meilisearch.com>
2023-01-05 09:06:53 +00:00
bors[bot]
32f7cfa5cb Merge #3295
3295: Adjust Master Key-related messages r=dureuill a=dureuill

# Pull Request

## Related issue
Follow up for #3272 

## What does this PR do?
- Consistently capitalize "master key" (instead of "Master Key" sometimes) (see https://github.com/meilisearch/specifications/pull/209#discussion_r1060081094)
- Clarify that the counted unit for master key length is bytes, not characters (see https://github.com/meilisearch/documentation/issues/2069#issuecomment-1368873167)

## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!


Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2023-01-05 08:43:23 +00:00
bors[bot]
a402fc4486 Merge #3013
3013: Extract the dates out of the dumpv5. r=loiclec a=funilrys

Hi there, 

please review this PR that tries to fix #2986. I'm still learning Rust and I found that #2986 is an excellent way for me to read and learn what others do with Rust. So please excuse my semantics ...

Stay safe and healthy.

---

# Pull Request

This patch possibly fixes #2986.

This patch introduces a way to fill the IndexMetadata.created_at and IndexMetadata.updated_at keys from the tasks events. This is done by reading the creation date of the first event (created_at) and the creation date of the last event (updated_at).


## Related issue
Fixes #2986

## What does this PR do?
- Extract the dates out of the dumpv5.

## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!


Co-authored-by: funilrys <contact@funilrys.com>
2023-01-05 08:23:52 +00:00
bors[bot]
502d9e4b24 Merge #3278
3278: Remove `--max-index-size` and `--max-task-db-size` flags r=Kerollmops a=dureuill

# Pull Request

## Related issue
Fixes #3231 

## What does this PR do?
- Remove `--max-index-size` and `--max-task-db-size` flags from the CLI, config file and environment variable
- Set the size of all indexes to **500GiB** and the size of the task DB to **10GiB**.  Reviewers might want to review these values carefully.

## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!


Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2023-01-04 16:44:27 +00:00
Louis Dureuil
a85ff1f690 Fix documentation
Co-authored-by: Clément Renault <clement@meilisearch.com>
2023-01-04 17:20:03 +01:00
Louis Dureuil
233372abea Remove --max-index-size and --max-task-db-size 2023-01-04 17:20:01 +01:00
bors[bot]
13d4ae264a Merge #3269
3269: Simplify primary key inference r=dureuill a=dureuill

# Pull Request

## Related issue
Related to https://github.com/meilisearch/meilisearch/issues/3233

## What does this PR do?
- Integrates https://github.com/meilisearch/milli/pull/752 in meilisearch
- Remove `Serialize` and `Deserialize` from `error::Code` as it is unused.
- No longer filter on `milli` logs when `--log-level` is "info".
  - `milli` only has the newly-added inference log at the `info` level (from greping `info` in the codebase)
  - the default value for `--log-level` is "INFO" and not "info" since `v0.30` so the filter is not active by default.
- updates milli to v0.38.0

## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!


Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2023-01-04 16:14:36 +00:00
bors[bot]
c766e06003 Merge #3281
3281: Merge `--schedule-snapshot` and `--snapshot-interval-sec` options r=dureuill a=dureuill

# Pull Request

## Related issue
Fixes #3131

## What does this PR do?
- Removes `--snapshot-interval-sec`
- `--schedule-snapshot` now accepts an optional integer value specifying the interval in seconds
- The config file no longer has a snapshot_interval_sec key.  Instead, the schedule_snapshot key now additionally accepts an integer value specifying the interval in seconds
- The env variable MEILI_SNAPSHOT_INTERVAL no longer exists
- The env variable MEILI_SCHEDULE_SNAPSHOT is always specified to the interval of the snapshot in seconds when defined. If snapshots are disabled the variable is undefined.

---

Relevant part of the `--help`

<img width="885" alt="Capture d’écran 2022-12-27 à 18 22 32" src="https://user-images.githubusercontent.com/41078892/209700626-1a1292c1-14e3-45b6-8265-e0adbd76ecf1.png">

---

### Tests

| `schedule_snapshot` in config.toml | `--schedule-snapshot` flag on CLI | `MEILI_SCHEDULE_SNAPSHOT` | `opt.schedule_snapshot` |
|--|--|--|--|
| missing | missing | missing | `Disabled`
| `false` | missing | missing | `Disabled`
| `true` | missing | missing | `Enabled(86400)`
| `1234` | missing | missing | `Enabled(1234)`
| missing | `--schedule-snapshot` | missing | `Enabled(86400)`
| `false` | `--schedule-snapshot` | missing | `Enabled(86400)` 
| missing | `--schedule-snapshot 2345` | missing | `Enabled(2345)`
| `false` | `--schedule-snapshot 2345` | missing | `Enabled(2345)`
| `true` | `--schedule-snapshot 2345` | missing | `Enabled(2345)`
| `1234` | `--schedule-snapshot 2345` | missing | `Enabled(2345)`
| `false` | `--schedule-snapshot 2345` | 3456 | `Enabled(2345)`
| `false` | `--schedule-snapshot` | 3456 | **`Enabled(86400)`**
| `1234` | missing | 3456 | `Enabled(3456)`
| `false` | missing | 3456 | `Enabled(3456)`


## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!


Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2023-01-04 14:25:47 +00:00
Louis Dureuil
fcbd47281b Fix tests 2023-01-04 14:24:20 +01:00
Louis Dureuil
b6d80293f7 Propagate new error codes from milli 2023-01-04 14:24:20 +01:00
Louis Dureuil
0e98a71a24 Update milli to v0.38 2023-01-04 14:24:20 +01:00
Louis Dureuil
5cb566b165 No longer filter out milli logs when --log-level is "info" 2023-01-04 14:24:20 +01:00
Louis Dureuil
9d46caba29 Code doesn't need to be serializable/deserializable 2023-01-04 14:16:22 +01:00
Louis Dureuil
c4aa5cc7d0 Merge --schedule-snapshot and --snapshot-interval-sec options 2023-01-04 14:13:54 +01:00
bors[bot]
12c3d432f9 Merge #3293
3293: Explicitly restrict log level options to those that are documented r=loiclec a=loiclec

Fixes https://github.com/meilisearch/meilisearch/issues/3292





Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>
2023-01-04 10:30:35 +00:00
bors[bot]
c3f4835e8e Merge #733
733: Avoid a prefix-related worst-case scenario in the proximity criterion r=loiclec a=loiclec

# Pull Request

## Related issue
Somewhat fixes (until merged into meilisearch) https://github.com/meilisearch/meilisearch/issues/3118

## What does this PR do?
When a query ends with a word and a prefix, such as:
```
word pr
```
Then we first determine whether `pre` *could possibly* be in the proximity prefix database before querying it. There are then three possibilities:

1. `pr` is not in any prefix cache because it is not the prefix of many words. We don't query the proximity prefix database. Instead, we list all the word derivations of `pre` through the FST and query the regular proximity databases.

2. `pr` is in the prefix cache but cannot be found in the proximity prefix databases. **In this case, we partially disable the proximity ranking rule for the pair `word pre`.** This is done as follows:
   1. Only find the documents where `word` is in proximity to `pre` **exactly** (no derivations)
   2. Otherwise, assume that their proximity in all the documents in which they coexist is >= 8

3. `pr` is in the prefix cache and can be found in the proximity prefix databases. In this case we simply query the proximity prefix databases.

Note that if a prefix is longer than 2 bytes, then it cannot be in the proximity prefix databases. Also, proximities larger than 4 are not present in these databases either. Therefore, the impact on relevancy is:

1. For common prefixes of one or two letters: we no longer distinguish between proximities from 4 to 8
2. For common prefixes of more than two letters: we no longer distinguish between any proximities
3. For uncommon prefixes: nothing changes

Regarding (1), it means that these two documents would be considered equally relevant according to the proximity rule for the query `heard pr` (IF `pr` is the prefix of more than 200 words in the dataset):
```json
[
    { "text": "I heard there is a faster proximity criterion" },
    { "text": "I heard there is a faster but less relevant proximity criterion" }
]
```

Regarding (2), it means that two documents would be considered equally relevant according to the proximity rule for the query "faster pro":
```json
[
    { "text": "I heard there is a faster but less relevant proximity criterion" }
    { "text": "I heard there is a faster proximity criterion" },
]
```
But the following document would be considered more relevant than the two documents above:
```json
{ "text": "I heard there is a faster swimmer who is competing in the pro section of the competition " }
```

Note, however, that this change of behaviour only occurs when using the set-based version of the proximity criterion. In cases where there are fewer than 1000 candidate documents when the proximity criterion is called, this PR does not change anything. 

---

## Performance

I couldn't use the existing search benchmarks to measure the impact of the PR, but I did some manual tests with the `songs` benchmark dataset.   

```
1. 10x 'a': 
	- 640ms ⟹ 630ms                  = no significant difference
2. 10x 'b':
	- set-based: 4.47s ⟹ 7.42        = bad, ~2x regression
	- dynamic: 1s ⟹ 870 ms           = no significant difference
3. 'Someone I l':
	- set-based: 250ms ⟹ 12 ms       = very good, x20 speedup
	- dynamic: 21ms ⟹ 11 ms          = good, x2 speedup 
4. 'billie e':
	- set-based: 623ms ⟹ 2ms         = very good, x300 speedup 
	- dynamic: ~4ms ⟹ 4ms            = no difference
5. 'billie ei':
	- set-based: 57ms ⟹ 20ms         = good, ~2x speedup
	- dynamic: ~4ms ⟹ ~2ms.          = no significant difference
6. 'i am getting o' 
	- set-based: 300ms ⟹ 60ms        = very good, 5x speedup
	- dynamic: 30ms ⟹ 6ms            = very good, 5x speedup
7. 'prologue 1 a 1:
	- set-based: 3.36s ⟹ 120ms       = very good, 30x speedup
	- dynamic: 200ms ⟹ 30ms          = very good, 6x speedup
8. 'prologue 1 a 10':
	- set-based: 590ms ⟹ 18ms        = very good, 30x speedup 
	- dynamic: 82ms ⟹ 35ms           = good, ~2x speedup
```

Performance is often significantly better, but there is also one regression in the set-based implementation with the query `b b b b b b b b b b`.

Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>
2023-01-04 09:00:50 +00:00
Loïc Lecrenier
d082ded7ad Explicitly restrict log level options to those that are documented
Fixes https://github.com/meilisearch/meilisearch/issues/3292
2023-01-04 09:40:24 +01:00
bors[bot]
49f58b2c47 Merge #732
732: Interpret synonyms as phrases r=loiclec a=loiclec

# Pull Request

## Related issue
Fixes (when merged into meilisearch) https://github.com/meilisearch/meilisearch/issues/3125

## What does this PR do?
We now map multi-word synonyms to phrases instead of loose words. Such that the request:
```
btw I am going to nyc soon
```
is interpreted as (when the synonym interpretation is chosen for both `btw` and `nyc`):
```
"by the way" I am going to "New York City" soon
```
instead of:
```
by the way I am going to New York City soon
```

This prevents queries containing multi-word synonyms to exceed to word length limit and degrade the search performance.

In terms of relevancy, there is a debate to have. I personally think this could be considered an improvement, since it would be strange for a user to search for:
```
good DIY project
```
and have a result such as:
```
{
    "text": "whether it is a good project to do, you'll have to decide for yourself"
}
```
However, for synonyms such as `NYC -> New York City`, then we will stop matching documents where `New York` is separated from `City`. This is however solvable by adding an additional mapping: `NYC -> New York`.

## Performance

With the old behaviour, some long search requests making heavy uses of synonyms could take minutes to be executed. This is no longer the case, these search requests now take an average amount of time to be resolved.

Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>
2023-01-04 08:34:18 +00:00