Commit Graph

8607 Commits

Author SHA1 Message Date
dependabot[bot]
40ad19ba9e Bump Swatinem/rust-cache from 2.4.0 to 2.5.0
Bumps [Swatinem/rust-cache](https://github.com/swatinem/rust-cache) from 2.4.0 to 2.5.0.
- [Release notes](https://github.com/swatinem/rust-cache/releases)
- [Changelog](https://github.com/Swatinem/rust-cache/blob/master/CHANGELOG.md)
- [Commits](https://github.com/swatinem/rust-cache/compare/v2.4.0...v2.5.0)

---
updated-dependencies:
- dependency-name: Swatinem/rust-cache
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-07-01 17:46:11 +00:00
Cong Chen
9859e65d2f fix tests 2023-07-01 09:32:50 +08:00
Cong Chen
3bdf01bc1c Fix failed test 2023-06-30 17:39:23 +08:00
Cong Chen
a5a31667b0 fix converse result of is_task_processing() 2023-06-30 11:28:18 +08:00
Clément Renault
cab4c4d7c9 Add a UTMs to the Cloud link 2023-06-29 17:59:59 +02:00
Clément Renault
4ec08e9430 Add a new link to the cloud pricing page 2023-06-29 17:38:10 +02:00
meili-bors[bot]
661d1f90dc Merge #3866
3866: Update charabia v0.8.0 r=dureuill a=ManyTheFish

# Pull Request

Update Charabia:
- enhance Japanese segmentation
- enhance Latin Tokenization
  - words containing `_` are now properly segmented into several words
  - brackets `{([])}` are no more considered as context separators so word separated by brackets are now considered near together for the proximity ranking rule
- fixes #3815
- fixes #3778
- fixes [product#151](https://github.com/meilisearch/product/discussions/151)

> Important note: now the float numbers are segmented around the `.` so `3.22` is segmented as [`3`, `.`, `22`] but the middle dot isn't considered as a hard separator, which means that if we search `3.22` we find documents containing `3.22`

Co-authored-by: ManyTheFish <many@meilisearch.com>
2023-06-29 15:24:36 +00:00
ManyTheFish
6ec7541026 Update inta snapshots 2023-06-29 17:18:39 +02:00
ManyTheFish
e8dee3ca65 Update lock file 2023-06-29 17:02:24 +02:00
ManyTheFish
a82c49ab08 Update test 2023-06-29 15:56:36 +02:00
ManyTheFish
84845de9ef Update Charabia 2023-06-29 15:56:32 +02:00
meili-bors[bot]
c9b3f80947 Merge #3780
3780: Be able to sort facet values by alpha or count r=dureuill a=Kerollmops

This PR introduces a new `sortFacetValuesBy` settings parameter to expose the facet distribution in either count or lexicographic/alpha order.

## Mini Spec of the `sortFacetValuesBy` Settings Parameter

This parameter can be set in the settings to change how the engine returns the facet values. There are two possible values to this parameter.

Please note that the current behavior changed a bit, and keys are returned in lexicographic order instead of undefined order. The previous order wasn't defined as we were using a `HashMap`, which returns entries in hash order (undefined), and we are now using an `IndexMap`, which returns them in insertion order (the order we actually want).

Also, note that there are performance issues when the dataset is enormous. Here are the timings of the engine running on my Macbook Pro M1 (16Go of RAM). [The dataset is 40 million songs file](https://www.notion.so/meilisearch/Songs-from-MusicBrainz-686e31b2bd3845898c7746f502a6e117), and the database size is about 50GiB. Even if you think 800ms is not that high, don't forget that the API is public, and anybody can ask for multiple facets in a single query.

| Search Kind | Get Facets | Max Values per Facet | Time for Alpha | Time for Count | Count but with #3788 |
|------------:|------------|----------------------|:--------------:|----------------|----------------------|
| Placeholder | genres     | default (100)        | 7ms            | 187ms          | 122ms                |
| Placeholder | genres     | 20                   | 6ms            | 124ms          | 75ms                 |
| Placeholder | album      | default (100)        | 9ms            | 808ms          | 677ms                |
| Placeholder | album      | 20                   | 8ms            | 579ms          | 446ms                |
| Placeholder | artist     | default (100)        | 9ms            | 462ms          | 344ms                |
| Placeholder | artist     | 20                   | 9ms            | 341ms          | 246ms                |

### Order Values in Alphanumeric Order

This is the default one. Values will be returned by lexicographic order, ascending from A to Z.

```bash
# First, update the settings
curl 'localhost:7700/indexes/movies/settings/facetting' \
  -H "Content-Type: application/json"  \
  -d '{ "sortFacetValuesBy": { "*": "alpha" } }'

# Then, ask for the facet distribution
curl 'localhost:7700/indexes/movies/search?facets=genres'
```

```json5
{
    "hits": [
        /* list of results */
    ],
    "query": "",
    "processingTimeMs": 0,
    "limit": 20,
    "offset": 0,
    "estimatedTotalHits": 1000,
    "facetDistribution": {
        "genres": {
            "Action": 3215,
            "Adventure": 1972,
            "Animation": 1577,
            "Comedy": 5883,
            "Crime": 1808,
            // ...
        }
    },
    "facetStats": {}
}
```

### Order Values in Count Order

Facet values are sorted by decreasing count. The count is the number of records containing this facet value in the query results.

```bash
# First, update the settings
curl 'localhost:7700/indexes/movies/settings/facetting' \
  -H "Content-Type: application/json"  \
  -d '{ "sortFacetValuesBy": { "*": "count" } }'

# Then, ask for the facet distribution
curl 'localhost:7700/indexes/movies/search?facets=genres'
```

```json5
{
    "hits": [
        /* list of results */
    ],
    "query": "",
    "processingTimeMs": 0,
    "limit": 20,
    "offset": 0,
    "estimatedTotalHits": 1000,
    "facetDistribution": {
        "genres": {
            "Drama": 7337,
            "Comedy": 5883,
            "Action": 3215,
            "Thriller": 3189,
            "Romance": 2507,
            // ...
        }
    },
    "facetStats": {}
}
```

## Todo List
 - [x] Add tests
 - [x] Send analytics when a user change the `sortFacetValuesBy`
 - [x] Create a prototype and announce it in https://github.com/meilisearch/product/discussions/519.

Co-authored-by: Kerollmops <clement@meilisearch.com>
Co-authored-by: Clément Renault <clement@meilisearch.com>
2023-06-29 12:43:25 +00:00
Clément Renault
09c5edf242 Cargo fmt 2023-06-29 14:37:18 +02:00
Clément Renault
4e85f91aee Add a non default value to the faceting settings of the dump tests 2023-06-29 14:33:33 +02:00
Clément Renault
7c157fc442 Document that the LevelEntry fields order is important 2023-06-29 14:33:32 +02:00
Clément Renault
0b97596c93 Replace unwraps with ? 2023-06-29 14:33:32 +02:00
Clément Renault
a0e0fce677 Simplify a Rust lifetime trick 2023-06-29 14:33:32 +02:00
Clément Renault
3c295c1ffc Fix typos 2023-06-29 14:33:32 +02:00
Clément Renault
b951830461 Add more tests 2023-06-29 14:33:32 +02:00
Clément Renault
9a13b72f25 Fix the tests 2023-06-29 14:33:32 +02:00
Clément Renault
1d8dfafd25 Add analytics when all facets are sorted by count and the number of modified ones 2023-06-29 14:33:31 +02:00
Kerollmops
eed9176e0c Also reset the sortFacetValuesBy when reseting the faceting settings 2023-06-29 14:33:31 +02:00
Kerollmops
b132e859f7 Make clippy happy 2023-06-29 14:33:31 +02:00
Kerollmops
9917bf046a Move the sortFacetValuesBy in the faceting settings 2023-06-29 14:33:31 +02:00
Kerollmops
d9fea0143f Make Clippy happy 2023-06-29 14:33:31 +02:00
Kerollmops
a385642ec3 Replace the BTreeMap by an IndexMap to return values in order 2023-06-29 14:33:31 +02:00
Kerollmops
34b2e98fe9 Expose a sortFacetValuesBy parameter to the user 2023-06-29 14:33:00 +02:00
Kerollmops
80bbd4b6f3 Clean and make the facet order configurable internally 2023-06-29 14:31:17 +02:00
Kerollmops
f42bef2f66 Make the search to always return the facets ordered by count 2023-06-29 14:31:17 +02:00
Kerollmops
bd3c026406 First to-test version of the algorithm 2023-06-29 14:31:17 +02:00
Kerollmops
84f8938f33 Rename facet distribution to be explicit on the order to find them 2023-06-29 14:31:15 +02:00
meili-bors[bot]
34a07110de Merge #3864
3864: Remove `/experimental-features` verbs that weren't in the PRD r=dureuill a=dureuill

Removes:

- POST `/experimental-features`
- DELETE `/experimental-features`

keeping only:

- PATCH `/experimental-features`
- GET `/experimental-features`

The two routes that are described in the PRD.

Following `@guimachiavelli's` [question](https://github.com/meilisearch/documentation/issues/2482#issuecomment-1611845372) about the POST route.

Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2023-06-29 09:43:14 +00:00
meili-bors[bot]
73bb080a26 Merge #3699
3699: Search for Facet Values r=Kerollmops a=Kerollmops

This PR introduces the first version of [the _Search for Facet Values_ feature](https://github.com/meilisearch/product/discussions/515) that allows a user to search for facets, by optionally using a prefix string and optionally specifying the `q` and `filter` original search parameters to restrict the candidates to search in.

The steps to merge it into Meilisearch will first start by providing prototype Docker images. This way users will be able to test the prototypes before using them.

The current route to use the _Search for Facet Values_ feature is the `POST /indexes/{index}/facet-search` where the body is a JSON object that looks like the following:
```json5
{
  "q": "spiderman", // optional
  "filter": "rating > 10", // optional
  "facetName": "genres",
  "facetQuery": "a" // optional
}
```

## What is missing?

 - [x] Send some analytics.
 - [x] Support the `matchingStrategy` parameter.
 - [x] Make sure that the errors are the right ones.
 - [x] Use the [Index typo tolerance settings](https://www.meilisearch.com/docs/learn/configuration/typo_tolerance#minwordsizefortypos) when matching facet values.
    - [x] minWordSizeForTypos.oneTypo
    - [x] minWordSizeForTypos.twoTypo
 - [x] Add tests
 - [x] Log the time it took to compute the results.
 - [x] Fix the compilation warnings.
 - [x] [Create an issue to fix potential performance issues when indexing](https://github.com/meilisearch/meilisearch/issues/3862).


Co-authored-by: Clément Renault <clement@meilisearch.com>
Co-authored-by: Kerollmops <clement@meilisearch.com>
2023-06-29 09:08:55 +00:00
Clément Renault
44b5b9e1a7 Improve the documentation of the FacetSearchQuery struct 2023-06-29 10:28:23 +02:00
Louis Dureuil
68356869c0 Remove /experimental-features verbs that weren't in the PRD 2023-06-29 10:02:55 +02:00
Cong Chen
e3fc7112bc use RoaringBitmap::is_empty instead 2023-06-29 11:46:47 +08:00
Louis Dureuil
605c1dd54a Fix analytics 2023-06-28 16:41:56 +02:00
Clément Renault
3e3f73ba1e Fix the analytics 2023-06-28 15:45:09 +02:00
Clément Renault
efbe7ce78b Clean the facet string FSTs when we clear the documents 2023-06-28 15:36:32 +02:00
Louis Dureuil
82e1f59f1e Add attributes_to_search_on 2023-06-28 15:28:24 +02:00
Clément Renault
362e9ff845 Add more tests 2023-06-28 15:28:24 +02:00
Clément Renault
32f2556d22 Move the additional_search_parameters_provided analytic inside facets 2023-06-28 15:06:09 +02:00
Kerollmops
63fd10aaa5 Fix the invalid facet name field error code 2023-06-28 15:06:09 +02:00
Kerollmops
29b40295b8 Ignore unknown facet search query parameters 2023-06-28 15:06:09 +02:00
Kerollmops
26f0fa678d Change the error message when a facet is not searchable 2023-06-28 15:06:09 +02:00
Kerollmops
60ddd53439 Return one of the original facet values when doing a facet search 2023-06-28 15:06:09 +02:00
Kerollmops
2bcd8d2983 Make sure the facet queries are normalized 2023-06-28 15:06:09 +02:00
Kerollmops
09079a4e88 Remove useless InvalidSearchFacet error 2023-06-28 15:06:09 +02:00
Kerollmops
904f6574bf Make rustfmt happy 2023-06-28 15:06:08 +02:00
Kerollmops
6fb8af423c Rename the hits and query output into facetHits and facetQuery respectively 2023-06-28 15:06:08 +02:00