Commit Graph

8283 Commits

Author SHA1 Message Date
Louis Dureuil
bb40ce6e35 Experimental features analytics match the spec 2023-07-10 08:57:53 +02:00
meili-bors[bot]
ff192bb480 Merge #3889
3889: Display the total number of tasks matching a filter/query r=dureuill a=Kerollmops

This PR returns a new field on the `/tasks` routes. The `total` field exposes the total number of tasks that match the given filter/query. It is useful for displaying information in a user interface and can help show when progress is made in processing tasks. For example, the total number of tasks on `/tasks?statuses=succeeded` will increase over time.

Fixes #3888.

- [ ] Update the specs for the `/tasks` route.

## How have I implemented it?

I found it much easier to run the task filtering system twice: once with the original `from` and `limit` parameters, and a second time without them. The second call returns the total number of tasks that match the query, not only the number of tasks on the current page.

So far, in terms of performance, there doesn't seem to be any issue. I tried different filters with something like 250k tasks. Note that there is a limit of 1M tasks in the queue.
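
The two-pass approach described above can be sketched as follows. This is a minimal, self-contained illustration and not the actual index-scheduler code: the `Task`, `Query`, `filter_tasks`, and `page_and_total` names are hypothetical, and tasks are assumed to be paginated from the newest `uid` downward.

```rust
/// Hypothetical, simplified task record; not Meilisearch's actual type.
#[derive(Debug, Clone, PartialEq)]
struct Task {
    uid: u32,
    status: &'static str,
}

/// Hypothetical query: `from` and `limit` are the pagination parameters.
struct Query {
    status: Option<&'static str>,
    from: Option<u32>,
    limit: Option<usize>,
}

/// Returns the matching tasks, newest first, honoring `from` and `limit` when set.
fn filter_tasks(tasks: &[Task], query: &Query) -> Vec<Task> {
    let mut matching: Vec<Task> = tasks
        .iter()
        .filter(|t| query.status.map_or(true, |s| t.status == s))
        .filter(|t| query.from.map_or(true, |from| t.uid <= from))
        .cloned()
        .collect();
    matching.sort_by(|a, b| b.uid.cmp(&a.uid));
    if let Some(limit) = query.limit {
        matching.truncate(limit);
    }
    matching
}

/// Runs the filter twice: once for the page, once without pagination for the total.
fn page_and_total(tasks: &[Task], query: &Query) -> (Vec<Task>, usize) {
    let page = filter_tasks(tasks, query);
    let unpaginated = Query { status: query.status, from: None, limit: None };
    let total = filter_tasks(tasks, &unpaginated).len();
    (page, total)
}

fn main() {
    // Ten tasks; even uids succeeded, odd uids failed.
    let tasks: Vec<Task> = (0..10)
        .map(|uid| Task { uid, status: if uid % 2 == 0 { "succeeded" } else { "failed" } })
        .collect();
    let query = Query { status: Some("succeeded"), from: Some(6), limit: Some(2) };
    let (page, total) = page_and_total(&tasks, &query);
    let uids: Vec<u32> = page.iter().map(|t| t.uid).collect();
    println!("page uids: {:?}, total: {}", uids, total);
}
```

The second pass costs one more traversal of the matching tasks, which is the trade-off the performance note above refers to.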

Co-authored-by: Clément Renault <clement@meilisearch.com>
v1.3.0-rc.1
2023-07-06 10:23:09 +00:00
Clément Renault
22762808ab Fix the tests 2023-07-06 12:13:29 +02:00
Clément Renault
86b834c9e4 Display the total number of tasks in the tasks route 2023-07-06 10:05:18 +02:00
meili-bors[bot]
886c8bb647 Merge #3891
3891: Fix the way we compute the 99th percentile r=dureuill a=Kerollmops

This PR fixes how we compute the 99th percentile: it avoids floats and performs the multiplication and division in the correct order, so the computed index stays within the buffer of timings. You can see the issue on [this rust playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2021).

When there is a very small number of successful requests, the 99th percentile calculation sometimes yields an index outside the buffer. In this example, `1`/`1.0` represents the number of timings collected (one). As you can see, the float computation gives the index `1.0`, which is out of bounds for a vector of only one value. This makes the engine generate a `null` value.

```rust
1 * 99 / 100 = 0 // with integers
0.99_f64 * (1.0 - 1.0) + 1.0 = 1.0 // with floats
```
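
As a runnable illustration of the integer computation above, here is a minimal sketch. The function names `percentile_99_index` and `percentile_99` are hypothetical and this is not necessarily how the analytics code is organized; it only shows that multiplying before dividing keeps the index inside the buffer.

```rust
/// Index of the 99th percentile in a sorted buffer of `len` timings,
/// computed with integers only: multiply first, then divide.
fn percentile_99_index(len: usize) -> usize {
    len * 99 / 100
}

/// Hypothetical helper: returns the 99th percentile timing, or `None` when
/// the buffer is empty. Sorts the buffer in place.
fn percentile_99(timings: &mut [u64]) -> Option<u64> {
    timings.sort_unstable();
    timings.get(percentile_99_index(timings.len())).copied()
}

fn main() {
    // The float formula quoted above, evaluated for a single timing.
    let float_index = 0.99_f64 * (1.0 - 1.0) + 1.0;
    println!("float index: {}", float_index);
    println!("integer index: {}", percentile_99_index(1));
    println!("p99 of one timing: {:?}", percentile_99(&mut [42]));
}
```

For any non-empty buffer, `len * 99 / 100` is strictly smaller than `len`, so the lookup never goes out of bounds.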

Co-authored-by: Clément Renault <clement@meilisearch.com>
2023-07-06 06:04:08 +00:00
meili-bors[bot]
b422e5fdc3 Merge #3890
3890: Fix the analytics of the sort facet values by count feature r=dureuill a=Kerollmops

This PR ensures we return the right analytics from the settings route.

Co-authored-by: Clément Renault <clement@meilisearch.com>
2023-07-06 05:24:40 +00:00
Clément Renault
d727ebee05 Fix the way we compute the 99th percentile 2023-07-05 17:53:09 +02:00
Clément Renault
da39a7b29e Return the right analytics 2023-07-05 17:27:51 +02:00
meili-bors[bot]
377fe33aac Merge #3885
3885: Exactness missing field r=dureuill a=dureuill

# Pull Request

Adds fields to score details that were [specified](c25d758264/text/0195-ranking-score.md (322-ranking-rule-specific-fields)), but missing in the implementation:

- `exactness.matchingWords`
- `exactness.maxMatchingWords` 


Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2023-07-04 15:14:53 +00:00
Louis Dureuil
55cd7738b9 Update snapshots 2023-07-04 16:31:01 +02:00
Louis Dureuil
48409c9183 Add missing exactness.matchingWords, exactness.maxMatchingWords 2023-07-04 16:31:01 +02:00
meili-bors[bot]
82650eaae1 Merge #3877
3877: update the total_received properties of multiple events r=dureuill a=dureuill

# Pull Request

## Related issue
Fixes #3814 

## What does this PR do?
- fix the name of `total_received` for several events


Co-authored-by: Tamo <tamo@meilisearch.com>
2023-07-03 19:49:53 +00:00
meili-bors[bot]
b8ca09c13f Merge #3878
3878: Remove unsafe `atty` dependency r=dureuill a=Kerollmops

This PR replaces the `atty` dependency with the `is-terminal` one. We do that to fix GHSA-g98v-hv3f-hcfr.

Co-authored-by: Kerollmops <clement@meilisearch.com>
2023-07-03 19:07:03 +00:00
Kerollmops
a442af6a7c Update the features of the either dependency to compile milli successfully 2023-07-03 18:51:43 +02:00
Kerollmops
e7f8daaf86 Update criterion to 0.5.1 to remove the atty dependency 2023-07-03 18:51:42 +02:00
Kerollmops
d1ff631df8 Replace the atty dependency with the is-terminal one 2023-07-03 18:51:42 +02:00
Tamo
202183adf8 update the total_received properties of multiple events 2023-07-03 15:57:09 +02:00
meili-bors[bot]
aae099e330 Merge #3851
3851: Expose lastUpdate and isIndexing in /stats endpoint r=dureuill a=gentcys

# Pull Request

## Related issue
Fixes #3843

## What does this PR do?
- expose lastUpdate in `/stats` endpoint
- expose isIndexing in `/stats` endpoint
- add a method `is_task_processing` in index-scheduler/src/lib.rs.

## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!


Co-authored-by: Cong Chen <cong.chen@ocrlabs.com>
Co-authored-by: ManyTheFish <many@meilisearch.com>
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2023-07-03 13:41:04 +00:00
Louis Dureuil
5387cf1718 Don't unwrap in case of error/missing last_update field 2023-07-03 15:32:11 +02:00
meili-bors[bot]
a0df4becf4 Merge #3867
3867: Add a new link to the cloud pricing page r=curquiza a=Kerollmops

This PR promotes the Cloud by adding a link to the Pricing page to the startup message!

<img width="1002" alt="Capture d’écran 2023-06-29 à 17 40 22" src="https://github.com/meilisearch/meilisearch/assets/3610253/b0528c24-fcc2-43ff-a6a1-3ed91716663b">

Co-authored-by: Clément Renault <clement@meilisearch.com>
v1.3.0-rc.0
2023-07-03 11:25:26 +00:00
meili-bors[bot]
e0a2f88fb0 Merge #3874
3874: Update version for the next release (v1.3.0) in Cargo.toml r=curquiza a=meili-bot

⚠️ This PR is automatically generated. Check the new version is the expected one and Cargo.lock has been updated before merging.

Co-authored-by: gillian-meilisearch <gillian-meilisearch@users.noreply.github.com>
2023-07-03 10:37:03 +00:00
meili-bors[bot]
e871906370 Merge #3876
3876: Fix invalid attributeToSearchOn error code r=Kerollmops a=ManyTheFish

Fix the invalid attributeToSearchOn error code to be consistent with the other search parameters' error codes:

error code `invalid_attributes_to_search_on` becomes `invalid_search_attributes_to_search_on`:
```diff
- invalid_attributes_to_search_on
+ invalid_search_attributes_to_search_on
```

related to #3772


Co-authored-by: ManyTheFish <many@meilisearch.com>
2023-07-03 10:06:30 +00:00
ManyTheFish
7a80c0dfb3 Fix invalid attributeToSearchOn error code to be consistent with the others search parameters error codes 2023-07-03 11:52:43 +02:00
ManyTheFish
71500a4e15 Update tests 2023-07-03 11:20:43 +02:00
meili-bors[bot]
a9f691f279 Merge #3873
3873: Format let-else ❤️ 🎉 r=Kerollmops a=dureuill

# Pull Request

Allows passing CI after landing of 6162f6f123

## What does this PR do?
- `cargo +nightly fmt`

Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2023-07-03 09:01:20 +00:00
gillian-meilisearch
1d40452057 Update version for the next release (v1.3.0) in Cargo.toml 2023-07-03 08:32:21 +00:00
Louis Dureuil
324d448236 Format let-else ❤️ 🎉 2023-07-03 10:20:28 +02:00
Cong Chen
9859e65d2f fix tests 2023-07-01 09:32:50 +08:00
Cong Chen
3bdf01bc1c Fix failed test 2023-06-30 17:39:23 +08:00
Cong Chen
a5a31667b0 fix converse result of is_task_processing() 2023-06-30 11:28:18 +08:00
Clément Renault
cab4c4d7c9 Add a UTMs to the Cloud link 2023-06-29 17:59:59 +02:00
Clément Renault
4ec08e9430 Add a new link to the cloud pricing page 2023-06-29 17:38:10 +02:00
meili-bors[bot]
661d1f90dc Merge #3866
3866: Update charabia v0.8.0 r=dureuill a=ManyTheFish

# Pull Request

Update Charabia:
- enhance Japanese segmentation
- enhance Latin Tokenization
  - words containing `_` are now properly segmented into several words
  - brackets `{([])}` are no longer considered context separators, so words separated by brackets are now considered close together for the proximity ranking rule
- fixes #3815
- fixes #3778
- fixes [product#151](https://github.com/meilisearch/product/discussions/151)

> Important note: float numbers are now segmented around the `.`, so `3.22` is segmented as [`3`, `.`, `22`]. However, the middle dot isn't considered a hard separator, which means that searching for `3.22` still finds documents containing `3.22`.

Co-authored-by: ManyTheFish <many@meilisearch.com>
2023-06-29 15:24:36 +00:00
ManyTheFish
6ec7541026 Update insta snapshots 2023-06-29 17:18:39 +02:00
ManyTheFish
e8dee3ca65 Update lock file 2023-06-29 17:02:24 +02:00
ManyTheFish
a82c49ab08 Update test 2023-06-29 15:56:36 +02:00
ManyTheFish
84845de9ef Update Charabia 2023-06-29 15:56:32 +02:00
meili-bors[bot]
c9b3f80947 Merge #3780
3780: Be able to sort facet values by alpha or count r=dureuill a=Kerollmops

This PR introduces a new `sortFacetValuesBy` settings parameter to expose the facet distribution in either count or lexicographic/alpha order.

## Mini Spec of the `sortFacetValuesBy` Settings Parameter

This parameter can be set in the settings to change how the engine returns the facet values. There are two possible values to this parameter.

Please note that the current behavior changed a bit, and keys are returned in lexicographic order instead of undefined order. The previous order wasn't defined as we were using a `HashMap`, which returns entries in hash order (undefined), and we are now using an `IndexMap`, which returns them in insertion order (the order we actually want).

Also, note that there are performance issues when the dataset is enormous. Here are the timings of the engine running on my MacBook Pro M1 (16 GB of RAM). [The dataset is a file of 40 million songs](https://www.notion.so/meilisearch/Songs-from-MusicBrainz-686e31b2bd3845898c7746f502a6e117), and the database size is about 50GiB. Even if you think 800ms is not that high, don't forget that the API is public, and anybody can ask for multiple facets in a single query.

| Search Kind | Get Facets | Max Values per Facet | Time for Alpha | Time for Count | Count but with #3788 |
|------------:|------------|----------------------|:--------------:|----------------|----------------------|
| Placeholder | genres     | default (100)        | 7ms            | 187ms          | 122ms                |
| Placeholder | genres     | 20                   | 6ms            | 124ms          | 75ms                 |
| Placeholder | album      | default (100)        | 9ms            | 808ms          | 677ms                |
| Placeholder | album      | 20                   | 8ms            | 579ms          | 446ms                |
| Placeholder | artist     | default (100)        | 9ms            | 462ms          | 344ms                |
| Placeholder | artist     | 20                   | 9ms            | 341ms          | 246ms                |

### Order Values in Alphanumeric Order

This is the default. Values are returned in lexicographic order, ascending from A to Z.

```bash
# First, update the settings
curl -X PATCH 'localhost:7700/indexes/movies/settings/faceting' \
  -H "Content-Type: application/json"  \
  -d '{ "sortFacetValuesBy": { "*": "alpha" } }'

# Then, ask for the facet distribution
curl 'localhost:7700/indexes/movies/search?facets=genres'
```

```json5
{
    "hits": [
        /* list of results */
    ],
    "query": "",
    "processingTimeMs": 0,
    "limit": 20,
    "offset": 0,
    "estimatedTotalHits": 1000,
    "facetDistribution": {
        "genres": {
            "Action": 3215,
            "Adventure": 1972,
            "Animation": 1577,
            "Comedy": 5883,
            "Crime": 1808,
            // ...
        }
    },
    "facetStats": {}
}
```

### Order Values in Count Order

Facet values are sorted by decreasing count. The count is the number of records containing this facet value in the query results.

```bash
# First, update the settings
curl -X PATCH 'localhost:7700/indexes/movies/settings/faceting' \
  -H "Content-Type: application/json"  \
  -d '{ "sortFacetValuesBy": { "*": "count" } }'

# Then, ask for the facet distribution
curl 'localhost:7700/indexes/movies/search?facets=genres'
```

```json5
{
    "hits": [
        /* list of results */
    ],
    "query": "",
    "processingTimeMs": 0,
    "limit": 20,
    "offset": 0,
    "estimatedTotalHits": 1000,
    "facetDistribution": {
        "genres": {
            "Drama": 7337,
            "Comedy": 5883,
            "Action": 3215,
            "Thriller": 3189,
            "Romance": 2507,
            // ...
        }
    },
    "facetStats": {}
}
```
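
To make the two orders concrete, here is a minimal sketch that sorts facet values either way. It is an illustration only, not the engine's implementation: the `OrderBy` enum and `sort_facet_values` function are hypothetical, the tie-breaking rule for equal counts is an assumption, and the counts are taken from the example responses above.

```rust
/// Hypothetical mirror of the two `sortFacetValuesBy` values.
#[derive(Clone, Copy)]
enum OrderBy {
    Alpha,
    Count,
}

/// Sorts (value, count) pairs and keeps at most `max_values` of them.
fn sort_facet_values(
    mut values: Vec<(String, u64)>,
    order_by: OrderBy,
    max_values: usize,
) -> Vec<(String, u64)> {
    match order_by {
        // Ascending lexicographic order on the facet value.
        OrderBy::Alpha => values.sort_by(|a, b| a.0.cmp(&b.0)),
        // Descending count; ties fall back to lexicographic order (an assumption).
        OrderBy::Count => {
            values.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(&b.0)))
        }
    }
    values.truncate(max_values);
    values
}

/// Genre counts copied from the example responses above.
fn genres() -> Vec<(String, u64)> {
    [
        ("Drama", 7337),
        ("Crime", 1808),
        ("Action", 3215),
        ("Romance", 2507),
        ("Comedy", 5883),
        ("Animation", 1577),
        ("Thriller", 3189),
        ("Adventure", 1972),
    ]
    .iter()
    .map(|(name, count)| (name.to_string(), *count))
    .collect()
}

fn main() {
    println!("alpha: {:?}", sort_facet_values(genres(), OrderBy::Alpha, 5));
    println!("count: {:?}", sort_facet_values(genres(), OrderBy::Count, 5));
}
```

Sorting in memory like this requires knowing the count of every value first, which hints at why the count order is slower than the alpha order in the timings table.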

## Todo List
 - [x] Add tests
 - [x] Send analytics when a user changes the `sortFacetValuesBy`
 - [x] Create a prototype and announce it in https://github.com/meilisearch/product/discussions/519.

Co-authored-by: Kerollmops <clement@meilisearch.com>
Co-authored-by: Clément Renault <clement@meilisearch.com>
2023-06-29 12:43:25 +00:00
Clément Renault
09c5edf242 Cargo fmt 2023-06-29 14:37:18 +02:00
Clément Renault
4e85f91aee Add a non default value to the faceting settings of the dump tests 2023-06-29 14:33:33 +02:00
Clément Renault
7c157fc442 Document that the LevelEntry fields order is important 2023-06-29 14:33:32 +02:00
Clément Renault
0b97596c93 Replace unwraps with ? 2023-06-29 14:33:32 +02:00
Clément Renault
a0e0fce677 Simplify a Rust lifetime trick 2023-06-29 14:33:32 +02:00
Clément Renault
3c295c1ffc Fix typos 2023-06-29 14:33:32 +02:00
Clément Renault
b951830461 Add more tests 2023-06-29 14:33:32 +02:00
Clément Renault
9a13b72f25 Fix the tests 2023-06-29 14:33:32 +02:00
Clément Renault
1d8dfafd25 Add analytics when all facets are sorted by count and the number of modified ones 2023-06-29 14:33:31 +02:00
Kerollmops
eed9176e0c Also reset the sortFacetValuesBy when resetting the faceting settings 2023-06-29 14:33:31 +02:00
Kerollmops
b132e859f7 Make clippy happy 2023-06-29 14:33:31 +02:00
Kerollmops
9917bf046a Move the sortFacetValuesBy in the faceting settings 2023-06-29 14:33:31 +02:00