Compare commits


193 Commits

Author SHA1 Message Date
7e2fd82e41 Use Language allow list in the highlighter 2023-03-08 12:44:16 +01:00
24c0775c67 Change indexing threshold 2023-03-08 12:36:04 +01:00
3092cf0448 Fix clippy errors 2023-03-08 10:53:42 +01:00
37d4551e8e Add a threshold filtering the Languages allowed to be detected at search time 2023-03-07 19:38:01 +01:00
da48506f15 Rerun extraction when language detection might have failed 2023-03-07 18:35:26 +01:00
370d88f626 Merge #3561
3561: Fix the snapshots permissions on unix system r=irevoire a=irevoire

# Pull Request

## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/3507

The snapshot permissions were wrong after v0.30 and the huge refactoring of the index scheduler.
This PR fixes the issue and adds a test on the permissions on unix.
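A minimal sketch of the kind of unix permission fix involved (the path and mode here are illustrative assumptions, not the actual index-scheduler code):

```rust
use std::fs::{self, Permissions};
use std::os::unix::fs::PermissionsExt;
use std::path::Path;

/// Make a freshly written snapshot readable once it is in place:
/// read/write for the owner, read-only for everyone else (0o644).
fn fix_snapshot_permissions(snapshot_path: &Path) -> std::io::Result<()> {
    fs::set_permissions(snapshot_path, Permissions::from_mode(0o644))
}
```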

Co-authored-by: Tamo <tamo@meilisearch.com>
2023-03-07 08:51:38 +00:00
d34faa8f9c put back the sleep as it was and fix the from 2023-03-06 18:09:09 +01:00
e5d0bef6d8 update a comment 2023-03-06 17:04:24 +01:00
e704728ee7 fix the snapshots permissions on unix system 2023-03-06 16:28:40 +01:00
c0ede6d152 Merge #3562
3562: Update version for the next release (v1.1.0) in Cargo.toml r=curquiza a=meili-bot

⚠️ This PR is automatically generated. Check the new version is the expected one and Cargo.lock has been updated before merging.

Co-authored-by: curquiza <curquiza@users.noreply.github.com>
2023-03-06 13:54:16 +00:00
577e7126f9 Update version for the next release (v1.1.0) in Cargo.toml 2023-03-06 13:52:54 +00:00
3d1046369c Merge #3529
3529: Add an analytics on the geo bounding box feature r=ManyTheFish a=irevoire

Fixes #3527

[The specification of the geoBoundingBox](https://github.com/meilisearch/specifications/pull/223) feature has been updated and now introduces new analytics to follow the usage of the geoBoundingBox feature in search requests.

Co-authored-by: Tamo <tamo@meilisearch.com>
2023-03-02 11:58:39 +00:00
4f1ccbc495 Merge #3525
3525: Fix phrase search containing stop words r=ManyTheFish a=ManyTheFish

# Summary
A search with a phrase containing only stop words was returning an HTTP 500 error.
This PR drops phrases made only of stop words before the search starts; a query whose phrases contain only stop words now behaves like a placeholder search.
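A minimal sketch of that filtering step (hypothetical helper, not the actual milli code):

```rust
use std::collections::HashSet;

/// Keep only the phrases that contain at least one non-stop word; phrases made
/// of stop words alone are dropped before the search starts, so the query can
/// degrade to a placeholder search instead of erroring.
fn retain_meaningful_phrases(
    phrases: Vec<Vec<String>>,
    stop_words: &HashSet<String>,
) -> Vec<Vec<String>> {
    phrases
        .into_iter()
        .filter(|phrase| phrase.iter().any(|word| !stop_words.contains(word)))
        .collect()
}
```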

fixes https://github.com/meilisearch/meilisearch/issues/3521

related v1.0.2 PR on milli: https://github.com/meilisearch/milli/pull/779



Co-authored-by: ManyTheFish <many@meilisearch.com>
2023-03-02 10:55:37 +00:00
37489fd495 Return an internal error in case the matching word is invalid 2023-03-01 19:05:16 +01:00
d9e19c89c5 Merge #3544
3544: Attempt to use default vram budget for faster startup r=Kerollmops a=dureuill

# Pull Request

## Related issue
Follow-up to #3382: addresses the added startup time on Windows/macOS.

## What does this PR do?
- Attempt to skip budget calculation by using "known good values" instead
- Perform dichotomic budget calculation as fallback only when the known value is not actually good.
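A minimal sketch of that strategy, assuming a `try_reserve` probe that stands in for checking whether a memory map of the given size can be opened (hypothetical helper, not the actual index-scheduler code):

```rust
/// Try the known-good default budget first; only bisect when it fails.
fn compute_budget(default_budget: usize, try_reserve: impl Fn(usize) -> bool) -> usize {
    if try_reserve(default_budget) {
        return default_budget; // fast path: skip the costly calculation entirely
    }
    // Fallback: dichotomic search between 0 and the default, with ~1 GiB resolution.
    let (mut lo, mut hi) = (0usize, default_budget);
    while hi - lo > (1 << 30) {
        let mid = lo + (hi - lo) / 2;
        if try_reserve(mid) {
            lo = mid;
        } else {
            hi = mid;
        }
    }
    lo
}
```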


Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2023-03-01 09:49:38 +00:00
18bf740ee2 Merge #3539
3539: Update migration link to the docs r=curquiza a=curquiza

Fixes https://github.com/meilisearch/meilisearch/issues/3449

Co-authored-by: curquiza <clementine@meilisearch.com>
2023-02-28 18:21:11 +00:00
0202ff8ab4 Attempt to use default budget for faster startup 2023-02-28 10:55:43 +01:00
fbe4ab158e Merge #3543
3543: config: case `experimental_enable_metrics` in snake_case r=dureuill a=dureuill

# Pull Request

Avoids "Error: unknown field `experimental-enable-metrics` at line 1 column 1" error when using the default config file.

Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2023-02-28 09:35:25 +00:00
92318ca573 config: case experimental_enable_metrics in snake_case 2023-02-27 17:14:06 +01:00
6ca7a109b9 Merge #3538
3538: Improve the api key of the metrics r=dureuill a=irevoire

Related to https://github.com/meilisearch/meilisearch/pull/3524#discussion_r1115903998
Update: https://github.com/meilisearch/meilisearch/issues/3523

Right after merging the PR, we changed our minds and decided to update the way we handle API keys on the metrics route.
Now, instead of bypassing all the rules applied to the API key, we forbid the use of the `/metrics` route if the key has any restriction on the indexes.
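A minimal sketch of that rule (hypothetical helper, not the actual route guard):

```rust
/// The `/metrics` route is only usable with a key that is authorized for all
/// indexes; any restriction on the indexes forbids it.
fn metrics_route_allowed(authorized_index_patterns: &[String]) -> bool {
    authorized_index_patterns.iter().any(|pattern| pattern == "*")
}
```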

Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2023-02-27 13:46:57 +00:00
d4d4702f1b Rephrase hint message 2023-02-27 13:46:16 +01:00
2648bbca25 Update migration link to the docs 2023-02-23 18:36:30 +01:00
562c86ea01 Merge #3519
3519: Update comments in version bump CI r=irevoire a=curquiza

Following https://github.com/meilisearch/meilisearch/pull/3499

Co-authored-by: curquiza <clementine@meilisearch.com>
2023-02-23 16:46:10 +00:00
7ae10abb6b fix the auth tests 2023-02-23 17:27:42 +01:00
dc533584c6 Forbid the usage of the metrics route if your API key has a limitation on the indexes 2023-02-23 17:13:22 +01:00
442c1e36de Merge #3537
3537: fix a bug where the filestore could try to parse its own tmp file and fail (main) r=irevoire a=curquiza



Co-authored-by: Tamo <tamo@meilisearch.com>
2023-02-23 15:58:05 +00:00
66b5e4b548 fix a bug where the filestore could try to parse its own tmp file and fail 2023-02-23 16:52:41 +01:00
89ac1015f3 Merge #3524
3524: Update the metrics route r=irevoire a=irevoire

Fixes #3523

Make the metrics available by default without a feature flag.
+ Rename the cli-flag to `experimental-enable-metrics`.

Co-authored-by: Tamo <tamo@meilisearch.com>
2023-02-23 15:11:10 +00:00
ca25904c26 Merge #3331
3331: Limit the number of concurrently opened indexes r=dureuill a=dureuill

# Pull Request

## Related issue
Relevant to #1841, fixes #3382

## What does this PR do?

### User standpoint

- Limit the number of concurrently opened indexes (currently, the number of indexes that can be concurrently opened is computed at startup)
- When too many indexes are opened, the least recently used one is closed and its virtual memory is released.
- This allows a user to have an arbitrary number of indexes of an arbitrary size

### Implementation standpoint

- Added an LRU cache map in `index-scheduler::lru` (a minimal sketch follows this list). A more complete implementation (e.g., with helper functions not used here) is available but would better fit a dedicated crate.
- Use the LRU cache map in the `IndexScheduler`. To simplify the lifecycle of indexes, they are never removed from the cache when they are in the middle of a resize or delete operation. To achieve this, an intermediate `Vec` stores the UUIDs of the indexes that are in the middle of such an operation.
- Upon creating the index scheduler object, compute the total virtual memory that is addressable by using a dichotomic search on the max size of an index. Use this as a base to compute the number of indexes that can be open with 2TiB per index. If the virtual memory address space is smaller than 2TiB, then only allow for 1 index of a fraction of that size.
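A minimal sketch of an `LruMap` in the spirit described above (illustrative only; the real implementation in `index-scheduler::lru` differs):

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Keys are ordered from least recently used (front) to most recently used (back).
struct LruMap<K, V> {
    capacity: usize,
    map: HashMap<K, V>,
    order: Vec<K>,
}

impl<K: Eq + Hash + Clone, V> LruMap<K, V> {
    fn new(capacity: usize) -> Self {
        Self { capacity, map: HashMap::new(), order: Vec::new() }
    }

    /// Gets a value and marks the key as most recently used.
    fn get(&mut self, key: &K) -> Option<&V> {
        if let Some(pos) = self.order.iter().position(|k| k == key) {
            let k = self.order.remove(pos);
            self.order.push(k);
        }
        self.map.get(key)
    }

    /// Inserts a value, evicting the least recently used entry when full.
    /// The evicted entry is returned so the caller can close it (e.g. an index).
    fn insert(&mut self, key: K, value: V) -> Option<(K, V)> {
        let mut evicted = None;
        if !self.map.contains_key(&key) && self.map.len() >= self.capacity {
            let lru = self.order.remove(0);
            evicted = self.map.remove(&lru).map(|v| (lru, v));
        }
        if let Some(pos) = self.order.iter().position(|k| k == &key) {
            self.order.remove(pos);
        }
        self.order.push(key.clone());
        self.map.insert(key, value);
        evicted
    }
}
```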

Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2023-02-23 14:20:52 +00:00
8a1b1a95f3 comment the right of the metrics 2023-02-23 13:59:01 +01:00
8d47d2d018 update the auth api after the rebase 2023-02-23 13:15:51 +01:00
5082cd5e67 update the config file to mention the experimental metrics feature 2023-02-23 12:26:22 +01:00
750a2b6842 Update meilisearch/src/option.rs
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2023-02-23 12:26:22 +01:00
bc7d4112d9 send the cli experimental feature in the analytics 2023-02-23 12:26:22 +01:00
88a18677d0 rename the metrics cli flag 2023-02-23 12:26:22 +01:00
68e30214ca remove the feature flag and reorganize the module slightly 2023-02-23 12:26:21 +01:00
b985b96e4e Merge #3530
3530: Fix highlighter bug r=Kerollmops a=ManyTheFish

# Pull Request

There was a highlighting issue with CJK characters: we were highlighting too many characters, and these additional characters were duplicated after the highlight tag.

## Related issue
Fixes #3517 
Fixes #3526 

## What does this PR do?
- add a test showcasing the bug
- fix the bug by activating the char_map creation of the tokenizer during the highlighting process
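A rough illustration of why the char map matters here, under an assumed layout where each entry stores the byte length of a character's normalized form next to the byte length of its original form (illustrative only; Charabia's actual structure may differ):

```rust
/// Hypothetical char map: (normalized byte length, original byte length) per character.
type CharMap = Vec<(u8, u8)>;

/// Translate a match length measured on the *normalized* text back into a
/// length in the *original* text, so the closing highlight tag is placed
/// after the right number of original characters instead of overrunning.
fn original_highlight_len(char_map: &CharMap, matched_normalized_bytes: usize) -> usize {
    let mut normalized = 0;
    let mut original = 0;
    for &(normalized_len, original_len) in char_map {
        if normalized >= matched_normalized_bytes {
            break;
        }
        normalized += normalized_len as usize;
        original += original_len as usize;
    }
    original
}
```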


Co-authored-by: ManyTheFish <many@meilisearch.com>
2023-02-23 10:59:43 +00:00
71e7900c67 move index_map to file 2023-02-23 11:29:11 +01:00
431782f3ee Move index_mapper to mod.rs 2023-02-23 11:29:11 +01:00
3db613ff77 Don't iterate all indexes manually 2023-02-23 11:29:09 +01:00
5822764be9 Skip computing index budget in tests 2023-02-23 11:23:39 +01:00
c63294f331 Switch to 2TiB default index size, updates documentation 2023-02-23 11:23:39 +01:00
a529bf160c Compute budget 2023-02-23 11:23:39 +01:00
f1119f2dc2 Add dichotomic search to utils 2023-02-23 11:23:39 +01:00
1db7d5d851 Add basic tests for index eviction and resize 2023-02-23 11:23:39 +01:00
80b060f920 Use LRU cache 2023-02-23 11:23:39 +01:00
fdf043580c Add LruMap 2023-02-23 11:23:38 +01:00
f62703cd67 Merge #3534
3534: Update the csv error code from InvalidIndexCsvDelimiter to InvalidDocumentCsvDelimiter r=Kerollmops a=irevoire

Fixes #3533

Co-authored-by: Tamo <tamo@meilisearch.com>
2023-02-23 07:05:12 +00:00
76f82c880d update the csv error code from InvalidIndexCsvDelimiter to InvalidDocumentCsvDelimiter 2023-02-22 19:26:48 +01:00
6eeba3a8ab Merge #3417
3417: Allow multiple searches in a single request r=irevoire a=dureuill

# Pull Request

## Related issue
Fixes #3427

## What does this PR do?

### User standpoint

- Adds a new `/multi-search` entry point (not to be confused with the existing `/{index_uid}/search` entry points) that accepts a POST whose body is an object containing an array of queries.
    - Each query must specify which index it acts on by providing its `indexUid`. Other parameters are identical to the ones in the existing search routes (`q`, `limit`, etc.).
    - The response is a JSON object containing an array of the results for each search query as if it had been performed using the `/{index_uid}/search` routes.

### Implementation standpoint

- Refactor authentication module:
  - Allow the tenant token to be checked even without an index in the URL
  - Add `meilisearch-auth` as a dependency of `index-scheduler` so that index authorization can be checked there while taking both the API key and the tenant token into account (the existing method returned the indexes allowed by the API key only as long as there was no tenant token)
  - Make `AuthFilter` an object with invariants, so its fields are now private
  - Use the methods of `AuthFilter` to know if an index is authorized rather than relying on its internal search rules (a minimal sketch of this check follows the list)
  - Make tenant token search rules optional and `None` when the `AuthFilter` was not built with a tenant token.
- Add a new `routes::index::search::multiple_search` module containing a post handler that performs the same work as the existing `routes::index::search` post handler, but in a loop.
  - Add various tests
  - Add authentication test suite 
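
A minimal sketch of the `is_index_authorized` check described above, with hypothetical fields (the real `AuthFilter` keeps its fields private and works on index patterns and actions, not plain strings):

```rust
struct AuthFilter {
    /// Indexes allowed by the API key ("*" meaning all of them).
    key_authorized_indexes: Vec<String>,
    /// Indexes allowed by the tenant token search rules, `None` when the
    /// filter was not built from a tenant token.
    tenant_token_indexes: Option<Vec<String>>,
}

impl AuthFilter {
    /// An index is authorized when the API key allows it and, if a tenant
    /// token is present, its search rules allow it too.
    fn is_index_authorized(&self, index: &str) -> bool {
        fn allows(indexes: &[String], index: &str) -> bool {
            indexes.iter().any(|i| i == "*" || i == index)
        }
        allows(&self.key_authorized_indexes, index)
            && self
                .tenant_token_indexes
                .as_deref()
                .map_or(true, |indexes| allows(indexes, index))
    }
}
```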

### Sample request


<details>
<summary>
Click to see request/response
</summary>

```json
~/datasets
❯ curl \
  -X POST 'http://localhost:7700/multi-search' \
  -H 'Content-Type: application/json' \
  --data-binary '{"queries": [{ "indexUid": "index-0", "q": "toto", "limit": 1 }, {"indexUid": "index-1", "q": "titi", "limit": 1}]}' | jsonxf
{ "results": [
  {
    "indexUid": "index-0",
    "hits": [
      {
        "id": 20480,
        "title": "Toto - 25th Anniversary - Live in Amsterdam",
        "overview": "Filmed in High Definition in Amsterdam on Toto's 25th Anniversary Tour in 2003, this stunning concert captures the band at their very best, reunited with original vocalist Bobby Kimball. The set combines all their hits with tracks from their latest album \"Through the Looking Glass\" and other live favorites, performed in front of a wildly enthusastic sell-out crowd. Extras include 35 minute behind-the-scenes film following the band through various stages of their world tour including footage from Japan, Thailand, South Korea, and France.  Toto celebrate their 25th anniversary with this blistering live concert, filmed in Amsterdam on February 25th, 2003. Proving they've still got exactly what it takes to move a crowd, the band perform a mixture of medley's, solo spots, and huge hits. Tracks include \"Rosanna,\" \"Africa,\" \"Hold The Line,\" a cover of the Beatles' \"While My Guitar Gently Weeps,\" and many more.",
        "genres": [
          "Music"
        ],
        "poster": "https://image.tmdb.org/t/p/w500/7SCbUPwoB8Z7VUIA1Rn1WWwjNiT.jpg",
        "release_date": 1064275200
      }
    ],
    "query": "toto",
    "processingTimeMs": 1,
    "limit": 1,
    "offset": 0,
    "estimatedTotalHits": 17
  },
  {
    "indexUid": "index-1",
    "hits": [
      {
        "id": 41212,
        "title": "Titicut Follies",
        "overview": "The film is a stark and graphic portrayal of the conditions that existed at the State Prison for the Criminally Insane at Bridgewater, Massachusetts. TITICUT FOLLIES documents the various ways the inmates are treated by the guards, social workers and psychiatrists.",
        "genres": [
          "Documentary"
        ],
        "poster": "https://image.tmdb.org/t/p/w500/2Ju5hn1ofOPeP1eRJtQWakiHuhW.jpg",
        "release_date": -70934400
      }
    ],
    "query": "titi",
    "processingTimeMs": 0,
    "limit": 1,
    "offset": 0,
    "estimatedTotalHits": 7
  }
]}
```

</details>


## PR checklist
Please check if your PR fulfills the following requirements:
- [ ] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [ ] Have you read the contributing guidelines?
- [ ] Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!


Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2023-02-22 17:23:45 +00:00
28d6a4466d Make the tokenizer create a char map during highlighting 2023-02-22 17:43:10 +01:00
1ba2fae3ae multi-search/authentication: Add authentication tests 2023-02-22 17:04:12 +01:00
28d6ab78de multi-search: Add multi search tests 2023-02-22 17:04:12 +01:00
3ba5dfb6ec multi-search: Add test server search method for multi search 2023-02-22 17:04:12 +01:00
a23fbf6c7b multi-search: Add search with an array of indexes 2023-02-22 17:04:12 +01:00
596a98f7c6 multi-search: Add basic analytics 2023-02-22 16:37:18 +01:00
14c4a222da Authentication: AuthFilter::allow_index_creation both check that the index is authorized and the IndexCreate action 2023-02-22 16:37:13 +01:00
690bb2e5cc Authentication: Make allow_index_creation a private field 2023-02-22 16:35:52 +01:00
d0f2c9c72e Authentication: Make search_rules optional in AuthFilter 2023-02-22 16:35:52 +01:00
42577403d8 Authentication: Directly pass the authfilter to the index scheduler 2023-02-22 16:35:52 +01:00
c8c5944094 Authentication: is_index_authorized takes into account API key indexes even with a tenant token 2023-02-22 16:35:52 +01:00
4b65851793 Authentication: Refactor authentication check to work for tenant token even without an index in URL
Callers need to manually check `is_index_authorized` when using the route without an index in URL
2023-02-22 16:35:51 +01:00
10d4a1a9af Make ResponseError code and message pub so that they can be modified 2023-02-22 16:35:51 +01:00
ad35edfa32 Add test 2023-02-22 15:47:15 +01:00
033417e9cc add an analytics on the geo bounding box feature 2023-02-22 15:35:26 +01:00
ac5a1e4c4b Merge #3423
3423: Add min and max facet stats r=dureuill a=dureuill

# Pull Request

## Related issue
Fixes #3426

## What does this PR do?

### User standpoint

- When using a `facets` parameter in search, the facets that have numeric values are displayed in a new section of the response called `facetStats` that contains, per facet, the numeric min and max values of the hits returned by the search (a small sketch of the computation follows the sample below).

<details>
<summary>
Sample request/response
</summary>

```json
❯ curl \
  -X POST 'http://localhost:7700/indexes/meteorites/search?facets=mass' \
  -H 'Content-Type: application/json' \
  --data-binary '{ "q": "LL6", "facets":["mass", "recclass"], "limit": 5 }' | jsonxf
{
  "hits": [
    {
      "name": "Niger (LL6)",
      "id": "16975",
      "nametype": "Valid",
      "recclass": "LL6",
      "mass": 3.3,
      "fall": "Fell"
    },
    {
      "name": "Appley Bridge",
      "id": "2318",
      "nametype": "Valid",
      "recclass": "LL6",
      "mass": 15000,
      "fall": "Fell",
      "_geo": {
        "lat": 53.58333,
        "lng": -2.71667
      }
    },
    {
      "name": "Athens",
      "id": "4885",
      "nametype": "Valid",
      "recclass": "LL6",
      "mass": 265,
      "fall": "Fell",
      "_geo": {
        "lat": 34.75,
        "lng": -87.0
      }
    },
    {
      "name": "Bandong",
      "id": "4935",
      "nametype": "Valid",
      "recclass": "LL6",
      "mass": 11500,
      "fall": "Fell",
      "_geo": {
        "lat": -6.91667,
        "lng": 107.6
      }
    },
    {
      "name": "Benguerir",
      "id": "30443",
      "nametype": "Valid",
      "recclass": "LL6",
      "mass": 25000,
      "fall": "Fell",
      "_geo": {
        "lat": 32.25,
        "lng": -8.15
      }
    }
  ],
  "query": "LL6",
  "processingTimeMs": 15,
  "limit": 5,
  "offset": 0,
  "estimatedTotalHits": 42,
  "facetDistribution": {
    "mass": {
      "110000": 1,
      "11500": 1,
      "1161": 1,
      "12000": 1,
      "1215.5": 1,
      "127000": 1,
      "15000": 1,
      "1676": 1,
      "1700": 1,
      "1710.5": 1,
      "18000": 1,
      "19000": 1,
      "220000": 1,
      "2220": 1,
      "22300": 1,
      "25000": 2,
      "265": 1,
      "271000": 1,
      "2840": 1,
      "3.3": 1,
      "3000": 1,
      "303": 1,
      "32000": 1,
      "34000": 1,
      "36.1": 1,
      "45000": 1,
      "460": 1,
      "478": 1,
      "483": 1,
      "5500": 2,
      "600": 1,
      "6000": 1,
      "67.8": 1,
      "678": 1,
      "680.5": 1,
      "6930": 1,
      "8": 1,
      "8300": 1,
      "840": 1,
      "8400": 1
    },
    "recclass": {
      "L/LL6": 3,
      "LL6": 39
    }
  },
  "facetStats": {
    "mass": {
      "min": 3.3,
      "max": 271000.0
    }
  }
}
```

</details>
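
A small illustrative sketch of how the per-facet min/max values shown in `facetStats` above can be computed from the numeric facet values of the returned hits (hypothetical helper, not the actual milli code):

```rust
use std::collections::HashMap;

/// Fold the numeric facet values of the returned hits into a (min, max) pair
/// per facet; facets without numeric values simply don't appear in the result.
fn facet_stats(facet_values: &HashMap<String, Vec<f64>>) -> HashMap<String, (f64, f64)> {
    facet_values
        .iter()
        .filter_map(|(facet, values)| {
            let mut iter = values.iter().copied();
            let first = iter.next()?;
            let (min, max) = iter.fold((first, first), |(min, max), v| (min.min(v), max.max(v)));
            Some((facet.clone(), (min, max)))
        })
        .collect()
}
```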

## PR checklist
Please check if your PR fulfills the following requirements:
- [ ] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [ ] Have you read the contributing guidelines?
- [ ] Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!


Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2023-02-22 13:06:43 +00:00
3eb9a08b5c Update comments in version bump CI 2023-02-21 19:14:59 +01:00
900bae3d9d keep phrases that have at least one word 2023-02-21 18:16:51 +01:00
28b7d73d4a Remove an inefficient part of a test on milli 2023-02-21 18:16:51 +01:00
6841f167b4 Add test 2023-02-21 18:02:52 +01:00
c88b6f331f Merge #3482
3482: Optimize meilisearch uffizzi build r=curquiza a=waveywaves

# Pull Request

## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/3476

## What does this PR do?
Even though the Docker cache was already being used for Uffizzi builds, it seems the cache layers weren't persisting. This commit moves the Meilisearch build outside the Dockerfile so that we can use the Rust cache action. We also build for the musl target so that the resulting Meilisearch binary can be used in the Uffizzi ttyd image, which is Alpine-based.

This brings the Meilisearch build time down to about 5 minutes; see for example https://github.com/waveywaves/meilisearch/actions/runs/4142776058

We also update the version of the Uffizzi action used here, which fixes another Uffizzi bug where the environments were not deployed. https://app.uffizzi.com/github.com/waveywaves/meilisearch/pull/2 was built as part of a test for this PR, so we can be sure that the deployment works well now.

## PR checklist
Please check if your PR fulfills the following requirements:
- [ ] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [ ] Have you read the contributing guidelines?
- [ ] Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!


Co-authored-by: Vibhav Bobade <vibhav.bobde@gmail.com>
2023-02-21 16:06:53 +00:00
09a94e0db3 optimize meilisearch uffizzi build
even though docker cache was being used earlier for uffizzi builds,
seems like the cache layers weren't persisting. This commit adds changes
to move meilisearch building outside the dockerfile so that we can
use the rust cache action. We are also building to the musl target
so that the binary for meilisearch which is created can be used for
the uffizzi ttyd image which uses alpine.
2023-02-21 17:25:28 +05:30
39407885c2 Merge #3347
3347: Enhance language detection r=irevoire a=ManyTheFish

## Summary

Some completely unrelated languages can share the same characters. In Meilisearch we detect languages using `whatlang`, which works well on large texts but fails on short search queries, leading to bad segmentation and normalization of the query.

This PR stores the languages detected during indexing in order to reduce the list of languages that can be detected during the search.

## Detail

- Create a 19th database mapping the detected scripts and languages to the documents in which they were detected
- Fill the newly created database during indexing
- Create an allow-list with this database and pass it to Charabia (a small sketch of the idea follows this list)
- Add a test ensuring that a Japanese request containing only kanji is detected as Japanese and not Chinese
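
A self-contained sketch of the idea, with hypothetical types standing in for the milli/Charabia ones: record which (script, language) pairs were detected per document at indexing time, then build an allow list from that data at search time.

```rust
use std::collections::{HashMap, HashSet};

#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
enum Script { Cj, Latin }
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
enum Language { Jpn, Cmn, Eng }

/// Maps every detected (script, language) pair to the documents it was detected in.
#[derive(Default)]
struct ScriptLanguageDocids(HashMap<(Script, Language), HashSet<u32>>);

impl ScriptLanguageDocids {
    /// Filled during indexing for every detected (script, language) pair.
    fn record(&mut self, script: Script, language: Language, docid: u32) {
        self.0.entry((script, language)).or_default().insert(docid);
    }

    /// Built at search time and handed to the tokenizer as an allow list, so a
    /// kanji-only query cannot be detected as Chinese if only Japanese was indexed.
    fn allow_list(&self) -> HashMap<Script, Vec<Language>> {
        let mut allowed: HashMap<Script, Vec<Language>> = HashMap::new();
        for &(script, language) in self.0.keys() {
            allowed.entry(script).or_default().push(language);
        }
        allowed
    }
}
```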

## Related issues
Fixes #2403
Fixes #3513

Co-authored-by: f3r10 <frledesma@outlook.com>
Co-authored-by: ManyTheFish <many@meilisearch.com>
Co-authored-by: Many the fish <many@meilisearch.com>
2023-02-21 10:52:13 +00:00
a3e41ba33e Merge #3496
3496: Fix metrics feature r=irevoire a=james-2001

# Pull Request

## Related issue

Resolves: #3469
See also: #2763

## What does this PR do?
As reported, the metrics feature was broken by still using an old reference to `meilisearch_auth::actions`. This commit switches to the new location, `meilisearch_types::keys::actions`.

The original issue was not *that* clear as to exactly what was broken, and the build logs have disappeared, but it seemed to just be this one-line fix. If this is not the case and I've missed the mark, let me know and I'll head back to the drawing board.

## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?

Co-authored-by: James <james.a.may.2001@gmail.com>
2023-02-21 10:13:11 +00:00
ce807d760b Fix formatting issue on Opt struct
tab in enable_metrics_route to fix cargo fmt issues

Resolves: #3469
See also: #2763
2023-02-21 09:45:18 +00:00
bbecab8948 fix clippy 2023-02-21 10:18:44 +01:00
5cff435bf6 Add feature flags to Opt structure
Resolves: #3469
See also: #2763
2023-02-21 07:41:41 +00:00
8aa808d51b Merge branch 'main' into enhance-language-detection 2023-02-20 18:14:34 +01:00
1e9ac00800 Merge #3505
3505: Csv delimiter r=irevoire a=irevoire

Fixes https://github.com/meilisearch/meilisearch/issues/3442
Closes https://github.com/meilisearch/meilisearch/pull/2803
Specified in https://github.com/meilisearch/specifications/pull/221

This PR is a reimplementation of https://github.com/meilisearch/meilisearch/pull/2803 on the new engine. Thanks for your idea and initial PR `@MixusMinimax`; sorry I couldn’t update/merge your PR. Way too many changes happened on the engine in the meantime.

**Attention to reviewers**: I had to update deserr to add support for deserializing `char`s

-------

It introduces four new error messages:
- Invalid value in parameter csvDelimiter: expected a string of one character, but found an empty string
- Invalid value in parameter csvDelimiter: expected a string of one character, but found the following string of 5 characters: doggo
- csv delimiter must be an ascii character. Found: 🍰 
- The Content-Type application/json does not support the use of a csv delimiter. The csv delimiter can only be used with the Content-Type text/csv.

And one new error code:
- `invalid_index_csv_delimiter`

The `invalid_content_type` error code is now also used when we encounter the `csvDelimiter` query parameter with a non-csv content type.
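
A minimal sketch of the validation these messages imply (hypothetical helper, not the actual deserr/meilisearch code): the delimiter must be exactly one character, and that character must be ASCII.

```rust
fn parse_csv_delimiter(s: &str) -> Result<u8, String> {
    let mut chars = s.chars();
    match (chars.next(), chars.next()) {
        (None, _) => Err("expected a string of one character, but found an empty string".into()),
        (Some(c), None) if c.is_ascii() => Ok(c as u8),
        (Some(c), None) => Err(format!("csv delimiter must be an ascii character. Found: {c}")),
        (Some(_), Some(_)) => Err(format!(
            "expected a string of one character, but found the following string of {} characters: {s}",
            s.chars().count()
        )),
    }
}
```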

Co-authored-by: Tamo <tamo@meilisearch.com>
2023-02-20 17:01:36 +00:00
b08a49a16e Merge #3319 #3470
3319: Transparently resize indexes on MaxDatabaseSizeReached errors r=Kerollmops a=dureuill

# Pull Request

## Related issue
Related to https://github.com/meilisearch/meilisearch/discussions/3280, depends on https://github.com/meilisearch/milli/pull/760

## What does this PR do?

### User standpoint

- Meilisearch no longer fails tasks that encounter the `milli::UserError(MaxDatabaseSizeReached)` error.
- Instead, these tasks are retried after increasing the maximum size allocated to the index where the failure occurred.

### Implementation standpoint

- Add `Batch::index_uid` to get the `index_uid` of a batch of tasks, if there is one
- `IndexMapper::create_or_open_index` now takes an additional `size` argument that allows (re)opening indexes with a size different from the base `IndexScheduler::index_size` field
- `IndexScheduler::tick` now returns a `Result<TickOutcome>` instead of a `Result<usize>`. This offers more explicit control over what the behavior should be with regard to the next tick.
- Add `IndexStatus::BeingResized` that contains a handle that a thread can use to wait for the resize operation to complete and for the index to be available again.
- Add `IndexMapper::resize_index` to increase the size of an index.
- In `IndexScheduler::tick`, intercept task batches that failed due to `MaxDatabaseSizeReached` and resize the index that caused the error, then request a new tick that will eventually handle the still enqueued task.
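
A rough sketch of that control flow, with hypothetical types standing in for the scheduler's (`TickOutcome` is named above; everything else is illustrative):

```rust
enum TickOutcome {
    TickAgain,     // something changed, tick again right away
    WaitForSignal, // nothing left to do, wait for a new task
}

enum TickError {
    MaxDatabaseSizeReached { index_uid: String },
    Other(String),
}

fn tick(
    process_batch: impl Fn() -> Result<usize, TickError>,
    resize_index: impl Fn(&str),
) -> Result<TickOutcome, TickError> {
    match process_batch() {
        Ok(processed_tasks) if processed_tasks > 0 => Ok(TickOutcome::TickAgain),
        Ok(_) => Ok(TickOutcome::WaitForSignal),
        Err(TickError::MaxDatabaseSizeReached { index_uid }) => {
            // Grow the index, then ask for a new tick that will retry the
            // still-enqueued tasks instead of marking them as failed.
            resize_index(&index_uid);
            Ok(TickOutcome::TickAgain)
        }
        Err(other) => Err(other),
    }
}
```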

## Testing the PR

The following diff can be applied to this branch to make testing the PR easier:

<details>


```diff
diff --git a/index-scheduler/src/index_mapper.rs b/index-scheduler/src/index_mapper.rs
index 553ab45a..022b2f00 100644
--- a/index-scheduler/src/index_mapper.rs
+++ b/index-scheduler/src/index_mapper.rs
@@ -228,13 +228,15 @@ impl IndexMapper {
 
         drop(lock);
 
+        std::thread::sleep_ms(2000);
+
         let current_size = index.map_size()?;
         let closing_event = index.prepare_for_closing();
-        log::info!("Resizing index {} from {} to {} bytes", name, current_size, current_size * 2);
+        log::error!("Resizing index {} from {} to {} bytes", name, current_size, current_size * 2);
 
         closing_event.wait();
 
-        log::info!("Resized index {} from {} to {} bytes", name, current_size, current_size * 2);
+        log::error!("Resized index {} from {} to {} bytes", name, current_size, current_size * 2);
 
         let index_path = self.base_path.join(uuid.to_string());
         let index = self.create_or_open_index(&index_path, None, 2 * current_size)?;
@@ -268,8 +270,10 @@ impl IndexMapper {
             match index {
                 Some(Available(index)) => break index,
                 Some(BeingResized(ref resize_operation)) => {
+                    log::error!("waiting for resize end");
                     // Deadlock: no lock taken while doing this operation.
                     resize_operation.wait();
+                    log::error!("trying our luck again!");
                     continue;
                 }
                 Some(BeingDeleted) => return Err(Error::IndexNotFound(name.to_string())),
diff --git a/index-scheduler/src/lib.rs b/index-scheduler/src/lib.rs
index 11b17d05..242dc095 100644
--- a/index-scheduler/src/lib.rs
+++ b/index-scheduler/src/lib.rs
@@ -908,6 +908,7 @@ impl IndexScheduler {
     ///
     /// Returns the number of processed tasks.
     fn tick(&self) -> Result<TickOutcome> {
+        log::error!("ticking!");
         #[cfg(test)]
         {
             *self.run_loop_iteration.write().unwrap() += 1;
diff --git a/meilisearch/src/main.rs b/meilisearch/src/main.rs
index 050c825a..63f312f6 100644
--- a/meilisearch/src/main.rs
+++ b/meilisearch/src/main.rs
@@ -25,7 +25,7 @@ fn setup(opt: &Opt) -> anyhow::Result<()> {
 
 #[actix_web::main]
 async fn main() -> anyhow::Result<()> {
-    let (opt, config_read_from) = Opt::try_build()?;
+    let (mut opt, config_read_from) = Opt::try_build()?;
 
     setup(&opt)?;
 
@@ -56,6 +56,8 @@ We generated a secure master key for you (you can safely copy this token):
         _ => (),
     }
 
+    opt.max_index_size = byte_unit::Byte::from_str("1MB").unwrap();
+
     let (index_scheduler, auth_controller) = setup_meilisearch(&opt)?;
 
     #[cfg(all(not(debug_assertions), feature = "analytics"))]
```
</details>

Mainly, these debug changes do the following:

- Set the default index size to 1MiB so that index resizes are initially frequent
- Turn some logs from info to error so that they can be displayed with `--log-level ERROR` (hiding the other infos)
- Add a long sleep between the beginning and the end of the resize so that we can observe the `BeingResized` index status (otherwise it would never come up in my tests)

## Open questions

- Is the growth factor of x2 the correct solution? For a `Vec` in memory it makes sense, but here we're manipulating quantities that are potentially in the order of 500GiBs. For bigger indexes it may make more sense to add at most e.g. 100GiB on each resize operation, avoiding big steps like 500GiB -> 1TiB.

## PR checklist
Please check if your PR fulfills the following requirements:
- [ ] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [ ] Have you read the contributing guidelines?
- [ ] Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!


3470: Autobatch addition and deletion r=irevoire a=irevoire

This PR adds the capability for Meilisearch to batch document additions and deletions together.

Fix https://github.com/meilisearch/meilisearch/issues/3440
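
A rough sketch of the batching rule, with hypothetical types (the real autobatcher handles many more cases): a deletion joins a pending addition batch only when that batch already has the right to create the index.

```rust
enum BatchKind {
    DocumentOperation { allow_index_creation: bool, operation_count: usize },
    // ...other batch kinds elided
}

/// Returns `true` when the deletion was merged into the pending addition
/// batch, `false` when it has to start its own batch.
fn try_batch_deletion_with_addition(batch: &mut BatchKind) -> bool {
    match batch {
        BatchKind::DocumentOperation { allow_index_creation: true, operation_count } => {
            *operation_count += 1;
            true
        }
        _ => false,
    }
}
```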

--------------

Things to check before merging:

- [x] What happens if we delete the same documents multiple times? -> add a test
- [x] What if a documentDeletion gets batched with a documentAddition but the index doesn't exist yet? It should not work

Co-authored-by: Louis Dureuil <louis@meilisearch.com>
Co-authored-by: Tamo <tamo@meilisearch.com>
2023-02-20 15:00:19 +00:00
23f4e82b53 Add test ensuring that Meilisearch works on kanji only requests 2023-02-20 15:43:29 +01:00
119e6d8811 Update milli/src/search/mod.rs
Co-authored-by: Tamo <tamo@meilisearch.com>
2023-02-20 15:33:10 +01:00
a8f6f108e0 Merge #3515
3515: Consider null as a valid geo field r=irevoire a=irevoire

Fix #3497
Associated spec: https://github.com/meilisearch/specifications/pull/222

Co-authored-by: Tamo <tamo@meilisearch.com>
2023-02-20 14:12:55 +00:00
1479050f7a apply review suggestions 2023-02-20 14:53:37 +01:00
97b8c32e22 Merge #3514
3514: Bump version of mini-dashboard to v0.2.6 r=irevoire a=bidoubiwa

Update the version of the mini-dashboard to v0.2.6.

See [release notes](https://github.com/meilisearch/mini-dashboard/releases/tag/v0.2.6).

Co-authored-by: Charlotte Vermandel <charlottevermandel@gmail.com>
2023-02-20 13:21:00 +00:00
cb8d5f2d4b Update Charabia to 0.7.1 2023-02-20 14:00:31 +01:00
35f6c624bc Make sure we don't leave the in memory hashmap in an inconsistent state 2023-02-20 13:55:32 +01:00
1116788475 Resize indexes when they're full 2023-02-20 13:55:32 +01:00
951a5b5832 Add IndexMapper::resize_index fn 2023-02-20 13:55:32 +01:00
1c670d7fa0 Add IndexStatus::BeingResized 2023-02-20 13:55:32 +01:00
6cc3797aa1 IndexScheduler::tick returns a TickOutcome 2023-02-20 13:55:31 +01:00
faf1e17a27 create_or_open_index takes a map_size argument 2023-02-20 13:55:31 +01:00
4c519c2ab3 Add Batch::index_uid 2023-02-20 13:55:31 +01:00
eb28d4c525 add facet test 2023-02-20 13:52:28 +01:00
9ac981d025 Remove some clippy type complexity warns by deboxing iters 2023-02-20 13:52:27 +01:00
74859ecd61 Add min and max facet stats 2023-02-20 13:52:27 +01:00
8ae441a4db Update usage of iterators 2023-02-20 13:52:27 +01:00
042d86cbb3 facet sort ascending/descending now also return the values 2023-02-20 13:52:27 +01:00
dd120e0e16 Bump version of mini-dashboard to v0.2.6 2023-02-20 13:45:57 +01:00
18796d6e6a Consider null as a valid geo object 2023-02-20 13:45:51 +01:00
c91bfeaf15 Merge #3467
3467: Identify builds git tagged with `prototype-...` in CLI and analytics r=curquiza a=dureuill

# Pull Request

## What does this PR do?

- Parses the last git tag to extract a prototype name if:
  - The current build uses exactly the prototype tag (not a commit after the tag)
  - The prototype tag name respects the following conditions:
    1. starts with `prototype-`
    2. ends with a number
    3. the hyphen-separated segment right before the number is not a number (required to reject commits after the tag).
- Display the prototype name in the launch summary in the CLI
- Send the prototype name to analytics if any
- Update prototypes instructions in CONTRIBUTING.md

|`VERGEN_GIT_SEMVER_LIGHTWEIGHT` value | Prototype |
|---|---|
| `Some("prototype-geo-bounding-box-0-139-gcde89018")` | `None` (does not end with a number) |
| `Some("prototype-geo-bounding-box-0-139-89018")` | `None` (before the last segment is a number) |
| `Some("prototype-geo-bounding-box-0")` | `Some("prototype-geo-bounding-box-0")` |
| `Some("prototype-geo-bounding-box")` | `None` (does not end with a number") |
| `Some("geo-bounding-box-0")` | `None` (does not start with "prototype") |
| `None` | `None` | 
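
A rough sketch of the tag check summarized in the table above (hypothetical helper, not the actual build-info code):

```rust
/// Accepts tags of the form `prototype-<name>-<N>` where `<name>` does not
/// itself end with a number (which would indicate a commit *after* the tag).
fn prototype_name(git_describe: &str) -> Option<&str> {
    let mut segments = git_describe.rsplit('-');
    let last = segments.next()?;
    let before_last = segments.next()?;
    let starts_ok = git_describe.starts_with("prototype-");
    let ends_with_number = last.parse::<u64>().is_ok();
    let previous_is_not_number = before_last.parse::<u64>().is_err();
    (starts_ok && ends_with_number && previous_is_not_number).then_some(git_describe)
}
```

For example, `prototype_name("prototype-geo-bounding-box-0")` returns the tag itself, while `prototype_name("prototype-geo-bounding-box-0-139-gcde89018")` returns `None`.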

Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2023-02-20 09:27:51 +00:00
91048d209d Fix metrics feature
Metrics feature was relying on old references. Refactored with inspiration from the `get_stats` method in `meilisearch/src/routes/lib.rs`. `enable_metrics_routes` added to options in `segment_analytics`.

Resolves: #3469
See also: #2763
2023-02-17 20:11:57 +00:00
28961b2ad1 Merge #3499
3499: Use the workspace inheritance r=Kerollmops a=irevoire

Use the workspace inheritance [introduced in rust 1.64](https://blog.rust-lang.org/2022/09/22/Rust-1.64.0.html#cargo-improvements-workspace-inheritance-and-multi-target-builds).

It allows us to define the version of Meilisearch once in the main `Cargo.toml` and let all the other `Cargo.toml` files use this version.

`@curquiza` I added you as a reviewer because I had to patch some CI scripts

And `@Kerollmops`, I had to bump the `cargo_toml` crate because our version was getting old and didn't support the feature yet.

Also, in another PR, I would like to unify some of our dependencies to ensure we always stay in sync between all our crates.

Co-authored-by: Tamo <tamo@meilisearch.com>
2023-02-17 09:52:29 +00:00
895ab2906c apply review suggestions 2023-02-16 18:42:47 +01:00
f11c7d4b62 cargo run execute meilisearch by default 2023-02-16 18:03:45 +01:00
e79f6f87f6 make cargo fmt&clippy happy 2023-02-16 18:00:40 +01:00
5367d8f05a add two tests on the indexing of csvs 2023-02-16 17:37:11 +01:00
52686da028 test various errors on the document resource 2023-02-16 17:37:10 +01:00
8c074f5028 implements the csv delimiter without tests
Co-authored-by: Maxi Barmetler <maxi.barmetler@gmail.com>
2023-02-16 17:35:36 +01:00
49e18da23e Do not escape tag name
$() syntax is not interpreted by the Dockerfile
2023-02-16 10:53:14 +01:00
54240db495 Add note in code so one does not forget next time 2023-02-16 10:53:14 +01:00
e1ed4bc750 Change Dockerfile to also pass the VERGEN_GIT_SEMVER_LIGHTWEIGHT when building 2023-02-16 10:53:14 +01:00
9bd1cfb3a3 Ignore -dirty flag 2023-02-16 10:53:14 +01:00
a341c94871 Update contributing.md 2023-02-16 10:53:14 +01:00
f46cf46b8c Add prototype to analytics if any 2023-02-16 10:53:14 +01:00
c3a30a5a91 If using a prototype, display its name at Meilisearch startup 2023-02-16 10:53:14 +01:00
143e3cf948 Merge #3490
3490: Fix attributes set candidates r=curquiza a=ManyTheFish

# Pull Request

Fix attributes set candidates for v1.1.0

## details

The attribute criterion was not returning the remaining candidates when its internal algorithm had been exhausted.
This loss of candidates in the attribute criterion led to the bug reported in the issue linked below.
After some investigation, it seems that it was the only criterion that had this behavior.

We are now returning the remaining candidates instead of an empty bitmap.

## Related issue

Fixes #3483
PR on milli for v1.0.1: https://github.com/meilisearch/milli/pull/777


Co-authored-by: ManyTheFish <many@meilisearch.com>
2023-02-15 17:38:07 +00:00
ab2adba183 update our CI scripts accordingly 2023-02-15 13:56:24 +01:00
74d1a67a99 Use the workspace inheritance feature of rust 1.64 2023-02-15 13:51:07 +01:00
91ce8a5e67 Merge #3492
3492: Bump deserr r=Kerollmops a=irevoire

Bump deserr to the latest version:
- We now use the default actix-web extractors that deserr provides (which were copy/pasted from meilisearch)
- We also use the default `JsonError` message provided by deserr instead of defining our own in meilisearch
- Finally, we get the new `did you mean?` error message. Fixes #3493

Co-authored-by: Tamo <tamo@meilisearch.com>
2023-02-15 10:05:05 +00:00
fd7ae1883b Merge #3495
3495: Add tests with rust nightly in CI r=curquiza a=ztkmkoo

# Pull Request

## Related issue
Fixes #3402 

## What does this PR do?
- add ci test with rust nightly
- make test with rust stable not run on schedule event

## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!


Co-authored-by: Kebron <ztkmkoo@gmail.com>
2023-02-15 07:53:17 +00:00
42a3cdca66 get rid of the unwrap_any function in favor of take_cf_content 2023-02-14 20:06:31 +01:00
a43765d454 use the pre-defined deserr extractors 2023-02-14 20:05:30 +01:00
769576fd94 get rid of the whole error_message module since it has been integrated into the last version of deserr 2023-02-14 20:05:27 +01:00
8fb7b1d10f bump deserr 2023-02-14 20:04:30 +01:00
d494c29768 Merge #3479
3479: Unify "Bad latitude" & "Bad longitude" errors r=irevoire a=cymruu

# Pull Request

## Related issue
Fix part of #3006

## What does this PR do?
- Moved `BadGeoLat`, `BadGeoLng`, and `BadGeoBoundingBoxTopIsBelowBottom` out of `FilterError` into the newly introduced error type `ParseGeoError`
- Renamed the `BadGeo` error to `ReservedGeo`
- Used the new `ParseGeoError` type in `FilterError` and `AscDescError`
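
A rough sketch of the resulting error shape (hypothetical definitions, not the exact milli ones): geo parsing failures get their own type that both filter and sort errors can wrap.

```rust
#[derive(Debug)]
enum ParseGeoError {
    BadGeoLat { lat: f64 },
    BadGeoLng { lng: f64 },
    BadGeoBoundingBoxTopIsBelowBottom { top: f64, bottom: f64 },
}

#[derive(Debug)]
enum FilterError {
    ParseGeo(ParseGeoError),
    // ...other variants elided; `AscDescError` gets a similar wrapper
}

impl From<ParseGeoError> for FilterError {
    fn from(error: ParseGeoError) -> Self {
        FilterError::ParseGeo(error)
    }
}
```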

Screenshot: 
![image](https://user-images.githubusercontent.com/2981598/217927231-fe23b6a3-2ea8-4145-98af-38eb61c4ff16.png)

I ran `cargo test --package milli -- --test-threads 1` and tests passed.
`--test-threads` was set to 1 because my OS complained about too many opened files.

## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [ ] Have you made sure that the title is accurate and descriptive of the changes?


Co-authored-by: Filip Bachul <filipbachul@gmail.com>
Co-authored-by: filip <filipbachul@gmail.com>
2023-02-14 18:35:51 +00:00
74dcfe9676 Fix a bug when you update a document that was already present in the db, deleted and then inserted again in the same transform 2023-02-14 19:09:40 +01:00
1b1703a609 make a small optimization to merge obkvs a little bit faster 2023-02-14 18:32:41 +01:00
62358bd31c Fix metrics feature
As reported the metrics feature was broken by still using and old reference to `meilisearch_auth::actions`. This commit switches to the new location, `meilisearch_types::keys::actions`.

Resolves: #3469
See also: #2763
2023-02-14 17:29:38 +00:00
fb5e4957a6 fix and test the early exit in case a grenad ends with a deletion 2023-02-14 18:23:57 +01:00
8de3c9f737 Update milli/src/update/index_documents/transform.rs
Co-authored-by: Clément Renault <clement@meilisearch.com>
2023-02-14 17:57:14 +01:00
43a19d0709 document the operation enum + the grenads 2023-02-14 17:55:26 +01:00
29d14bed90 get rid of the let/else syntax 2023-02-14 17:45:46 +01:00
f3b54337f9 Merge #3174
3174: Allow wildcards at the end of index names for API Keys and Tenant tokens r=irevoire a=Kerollmops

This PR introduces wildcards at the end of index names when identifying indexes in API keys and tenant tokens. It fixes #2788 and #2908. This PR is based on `@akhildevelops`' work.

Note that when a tenant token filter is chosen to restrict a search, it is always the most restrictive pattern that is chosen. If we have an index pattern _prod*_ that defines _filter1_ and _p*_ that defines _filter2_, the engine will choose _filter1_ over _filter2_, as it is defined for the more restrictive pattern, _prod*_. Restrictiveness is determined by 1. whether the pattern is exact (without _*_) and 2. the length of the pattern.
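
A minimal sketch of that rule (hypothetical helpers, not the actual meilisearch-auth code): a pattern matches either exactly or as a `*`-terminated prefix, and among matching patterns the exact ones win, then the longer ones.

```rust
/// A pattern matches either exactly or as a `*`-terminated prefix.
fn pattern_matches(pattern: &str, index: &str) -> bool {
    match pattern.strip_suffix('*') {
        Some(prefix) => index.starts_with(prefix),
        None => pattern == index,
    }
}

/// Restrictiveness key: exact patterns beat wildcard ones, longer patterns beat shorter ones.
fn restrictiveness(pattern: &str) -> (bool, usize) {
    (!pattern.contains('*'), pattern.len())
}

/// Among the patterns that match the index, pick the most restrictive one.
fn most_restrictive<'a>(patterns: &[&'a str], index: &str) -> Option<&'a str> {
    patterns
        .iter()
        .copied()
        .filter(|pattern| pattern_matches(pattern, index))
        .max_by_key(|pattern| restrictiveness(pattern))
}
```

With the patterns `prod*` and `p*` and the index `production`, `most_restrictive(&["prod*", "p*"], "production")` returns `Some("prod*")`, matching the _filter1_ over _filter2_ example above.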

It is a continuation of work that has already started and should close #2869.

Co-authored-by: Clément Renault <clement@meilisearch.com>
Co-authored-by: Kerollmops <clement@meilisearch.com>
2023-02-14 16:12:01 +00:00
7f3ae40204 Remove a useless comment regarding the index pattern error code 2023-02-14 17:09:20 +01:00
a53536836b fmt 2023-02-14 17:04:22 +01:00
b095325bf8 Add tests with rust nightly in CI 2023-02-14 15:33:12 +00:00
d7ad39ad77 fix: clippy error 2023-02-14 00:15:35 +01:00
849de089d2 add thiserror for AscDescError 2023-02-14 00:15:35 +01:00
7f25007d31 Update milli/src/asc_desc.rs
Co-authored-by: Tamo <irevoire@protonmail.ch>
2023-02-14 00:15:35 +01:00
c810af3ebf implement From<ParseGeoError> for AscDescError 2023-02-14 00:15:35 +01:00
c0b77773ba fmt asc_desc 2023-02-14 00:15:35 +01:00
7481559e8b move BadGeo to FilterError 2023-02-14 00:15:35 +01:00
83c765ce6c implement From<ParseGeoError> for FilterError 2023-02-14 00:15:35 +01:00
4c91037602 use ParseGeoError in sort parser 2023-02-14 00:15:35 +01:00
825923f6fc export ParseGeoError 2023-02-14 00:15:35 +01:00
e405702733 chore: introduce new error ParseGeoError type 2023-02-14 00:15:35 +01:00
6fa877efb0 Fix attributes set candidates 2023-02-13 17:49:52 +01:00
4b1cd10653 Return an internal error when index pattern should be valid 2023-02-13 17:49:42 +01:00
47748395dc Update an authentication comment
Co-authored-by: Many the fish <many@meilisearch.com>
2023-02-13 17:20:08 +01:00
ff595156d7 Merge #3480
3480: Gitignore vscode & jetbrains IDE folders r=curquiza a=AymanHamdoun

# Pull Request

## Related issue
There is no issue for it, and I couldn't find an appropriate category to create one.

## What does this PR do?
- It's just a .gitignore edit so that people who use VS Code and JetBrains IDEs (like IntelliJ) don't accidentally commit the folder the IDE generates to store local project configs. (I honestly wanted to fork the repo to add something else, but this bothered me enough to make a PR for it first.)

## PR checklist
Please check if your PR fulfills the following requirements:
- [ ✔️ ] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [✔️  ] Have you read the contributing guidelines?
- [✔️  ] Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!


Co-authored-by: Ayman <ayman.s.hamdoun@gmail.com>
2023-02-13 10:47:25 +00:00
8770088df3 remove idea folder 2023-02-10 11:45:02 +04:00
827c1c8447 edit gitignore to ignore .idea and .vscode folders 2023-02-10 11:42:19 +04:00
764df24b7d Make clippy happy (again) 2023-02-09 13:21:20 +01:00
4570d5bf3a Merge remote-tracking branch 'origin/main' into temp-wildcard 2023-02-09 13:14:05 +01:00
746b31c1ce makes clippy happy 2023-02-09 12:23:01 +01:00
eaad84bd1d fix the test to handle the document deletion correctly 2023-02-09 11:29:13 +01:00
c690c4fec4 Added and modified the current API Key and Tenant Token tests 2023-02-09 11:17:30 +01:00
ea9ac46f28 stop autobatching the deletion without the index creation right with the addition 2023-02-08 21:24:27 +01:00
93db755d57 add a test to ensure we correctly handle deleting the same document multiple times 2023-02-08 21:03:34 +01:00
93f130a400 fix all warnings 2023-02-08 20:57:35 +01:00
860c993ef7 Handle the autobatching of deletion and addition in the scheduler 2023-02-08 20:53:19 +01:00
67dda0678f cleanup the autobatcher a little bit 2023-02-08 18:10:59 +01:00
2db6347686 update the autobatcher to batch the addition and deletion together 2023-02-08 18:07:59 +01:00
421a9cf05e provide a new method on the transform to remove documents 2023-02-08 16:06:09 +01:00
7b4b57ecc8 Fix the current tests 2023-02-08 14:54:05 +01:00
8f64fba1ce rewrite the current transform to handle a new byte specifying the kind of operation it's merging 2023-02-08 12:53:38 +01:00
9882029fa4 Merge #3456
3456: Bump tokio from 1.24.1 to 1.24.2 r=curquiza a=dependabot[bot]

Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.24.1 to 1.24.2.
<details>
<summary>Commits</summary>
<ul>
<li>See full diff in <a href="https://github.com/tokio-rs/tokio/commits">compare view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=tokio&package-manager=cargo&previous-version=1.24.1&new-version=1.24.2)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
You can disable automated security fix PRs for this repo from the [Security Alerts page](https://github.com/meilisearch/meilisearch/network/alerts).

</details>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-02-07 13:28:42 +00:00
5f56e6dd58 Bump tokio from 1.24.1 to 1.24.2
Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.24.1 to 1.24.2.
- [Release notes](https://github.com/tokio-rs/tokio/releases)
- [Commits](https://github.com/tokio-rs/tokio/commits)

---
updated-dependencies:
- dependency-name: tokio
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-02-07 12:14:05 +00:00
0bc1a18f52 Use Languages list detected during indexing at search time 2023-02-01 18:57:43 +01:00
643d99e0f9 Add expectancy test 2023-02-01 18:39:54 +01:00
a36b1dbd70 Fix the tasks with the new patterns 2023-02-01 18:21:45 +01:00
d563ed8a39 Making it work with index uid patterns 2023-02-01 17:51:30 +01:00
064158e4e2 Update test 2023-02-01 15:34:01 +01:00
77d32d0ee8 Fix codec deserialization 2023-02-01 15:26:26 +01:00
f4569b04ad Update Charabia version 2023-02-01 15:26:26 +01:00
2922c5c899 Fix code format 2023-01-31 11:28:05 +01:00
7681be5367 Format code 2023-01-31 11:28:05 +01:00
50bc156257 Fix tests 2023-01-31 11:28:05 +01:00
d8207356f4 Skip script,language insertion if language is undetected 2023-01-31 11:28:05 +01:00
2d58b28f43 Improve script language codec 2023-01-31 11:28:05 +01:00
fd60a39f1c Format code 2023-01-31 11:28:05 +01:00
369c05732e Add test checking that deleted docids were removed from the script_language_docids database
2023-01-31 11:28:05 +01:00
34d04f3d3f Filter soft-deleted documents from the script_language_docids database 2023-01-31 11:28:05 +01:00
a27f329e3a Add tests for checking that detected script and language associated with document(s) were stored during indexing 2023-01-31 11:28:05 +01:00
b216ddba63 Delete and clear data from the new database 2023-01-31 11:28:05 +01:00
d97fb6117e Extract and index data 2023-01-31 11:28:05 +01:00
c45d1e3610 Create a new database on index and add a specialized codec for it 2023-01-31 11:28:05 +01:00
ec7de4bae7 Make it work for all routes including stats and index swaps 2023-01-25 16:12:40 +01:00
184b8afd9e Make it work in the CreateApiKey struct 2023-01-25 15:01:50 +01:00
29961b8c6b Make it work with the dumps 2023-01-25 14:41:36 +01:00
0b08413c98 Introduce the IndexUidPattern type 2023-01-25 14:22:17 +01:00
474d4ec498 Add tests for the index patterns 2023-01-25 14:22:16 +01:00
131 changed files with 7034 additions and 1971 deletions

View File

@@ -3,7 +3,7 @@
# check_tag $current_tag $file_tag $file_name
function check_tag {
if [[ "$1" != "$2" ]]; then
echo "Error: the current tag does not match the version in $3: found $2 - expected $1"
echo "Error: the current tag does not match the version in Cargo.toml: found $2 - expected $1"
ret=1
fi
}
@@ -11,12 +11,8 @@ function check_tag {
ret=0
current_tag=${GITHUB_REF#'refs/tags/v'}
toml_files='*/Cargo.toml'
for toml_file in $toml_files;
do
file_tag="$(grep '^version = ' $toml_file | cut -d '=' -f 2 | tr -d '"' | tr -d ' ')"
check_tag $current_tag $file_tag $toml_file
done
file_tag="$(grep '^version = ' Cargo.toml | cut -d '=' -f 2 | tr -d '"' | tr -d ' ')"
check_tag $current_tag $file_tag
lock_file='Cargo.lock'
lock_tag=$(grep -A 1 'name = "meilisearch-auth"' $lock_file | grep version | cut -d '=' -f 2 | tr -d '"' | tr -d ' ')

View File

@@ -1,23 +1,3 @@
# Compile
FROM rust:alpine3.16 AS compiler
RUN apk add -q --update-cache --no-cache build-base openssl-dev
WORKDIR /meilisearch
ARG COMMIT_SHA
ARG COMMIT_DATE
ENV COMMIT_SHA=${COMMIT_SHA} COMMIT_DATE=${COMMIT_DATE}
ENV RUSTFLAGS="-C target-feature=-crt-static"
COPY . .
RUN set -eux; \
apkArch="$(apk --print-arch)"; \
if [ "$apkArch" = "aarch64" ]; then \
export JEMALLOC_SYS_WITH_LG_PAGE=16; \
fi && \
cargo build --release
# Run
FROM uffizzi/ttyd:alpine
@@ -28,19 +8,11 @@ ENV MEILI_NO_ANALYTICS true
RUN apk update --quiet \
&& apk add -q --no-cache libgcc tini curl
# add meilisearch to the `/bin` so you can run it from anywhere and it's easy
# to find.
COPY --from=compiler /meilisearch/target/release/meilisearch /bin/meilisearch
# To stay compatible with the older version of the container (pre v0.27.0) we're
# going to symlink the meilisearch binary in the path to `/meilisearch`
COPY target/x86_64-unknown-linux-musl/release/meilisearch /bin/meilisearch
RUN ln -s /bin/meilisearch /meilisearch
# This directory should hold all the data related to meilisearch so we're going
# to move our PWD in there.
# We don't want to put the meilisearch binary
WORKDIR /meili_data
EXPOSE 7700/tcp
ENTRYPOINT ["tini", "--"]

View File

@@ -92,6 +92,7 @@ jobs:
build-args: |
COMMIT_SHA=${{ github.sha }}
COMMIT_DATE=${{ steps.build-metadata.outputs.date }}
GIT_TAG=${{ github.ref_name }}
# /!\ Don't touch this without checking with Cloud team
- name: Send CI information to Cloud team

View File

@@ -2,6 +2,9 @@ name: Rust
on:
workflow_dispatch:
schedule:
# Everyday at 5:00am
- cron: '0 5 * * *'
pull_request:
push:
# trying and staging branches are for Bors config
@@ -27,10 +30,18 @@ jobs:
run: |
apt-get update && apt-get install -y curl
apt-get install build-essential -y
- uses: actions-rs/toolchain@v1
- name: Run test with Rust stable
if: github.event_name != 'schedule'
uses: actions-rs/toolchain@v1
with:
toolchain: stable
override: true
- name: Run test with Rust nightly
if: github.event_name == 'schedule'
uses: actions-rs/toolchain@v1
with:
toolchain: nightly
override: true
# Disable cache due to disk space issues with Windows workers in CI
# - name: Cache dependencies
# uses: Swatinem/rust-cache@v2.2.0

View File

@@ -14,6 +14,26 @@ jobs:
- name: checkout
uses: actions/checkout@v3
- run: sudo apt-get install musl-tools
- uses: actions-rs/toolchain@v1
with:
toolchain: stable
override: true
target: x86_64-unknown-linux-musl
- name: Cache dependencies
uses: Swatinem/rust-cache@v2.2.0
- name: Run cargo check without any default features
uses: actions-rs/cargo@v1
with:
command: build
args: --target x86_64-unknown-linux-musl --release
- name: Remove dockerignore so we can use the target folder in our docker build
run: rm -f .dockerignore
- name: Set up QEMU
uses: docker/setup-qemu-action@v2

View File

@@ -82,7 +82,7 @@ jobs:
name: Use Remote Workflow to Preview on Uffizzi
needs:
- cache-compose-file
uses: UffizziCloud/preview-action/.github/workflows/reusable.yaml@desc
uses: UffizziCloud/preview-action/.github/workflows/reusable.yaml@v2
with:
# If this workflow was triggered by a PR close event, cache-key will be an empty string
# and this reusable workflow will delete the preview deployment.
@@ -95,8 +95,8 @@
`meilisearch` command. You should be able to access this instance of meilisearch running in
the preview from the link Meilisearch Endpoint link given below.
Web Terminal Endpoint : ${{ needs.cache-compose-file.outputs.expected-url }}
Meilisearch Endpoint : ${{ needs.cache-compose-file.outputs.expected-url }}/meilisearch
Web Terminal Endpoint : <uffizzi-url>
Meilisearch Endpoint : <uffizzi-url>/meilisearch
permissions:
contents: read
pull-requests: write

View File

@@ -1,4 +1,4 @@
name: Update Meilisearch version in all Cargo.toml files
name: Update Meilisearch version in Cargo.toml
on:
workflow_dispatch:
@@ -14,7 +14,7 @@ env:
jobs:
update-version-cargo-toml:
name: Update version in Cargo.toml files
name: Update version in Cargo.toml
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
@@ -25,23 +25,23 @@ jobs:
override: true
- name: Install sd
run: cargo install sd
- name: Update Cargo.toml files
- name: Update Cargo.toml file
run: |
raw_new_version=$(echo $NEW_VERSION | cut -d 'v' -f 2)
new_string="version = \"$raw_new_version\""
sd '^version = "\d+.\d+.\w+"$' "$new_string" */Cargo.toml
sd '^version = "\d+.\d+.\w+"$' "$new_string" Cargo.toml
- name: Build Meilisearch to update Cargo.lock
run: cargo build
- name: Commit and push the changes to the ${{ env.NEW_BRANCH }} branch
uses: EndBug/add-and-commit@v9
with:
message: "Update version for the next release (${{ env.NEW_VERSION }}) in Cargo.toml files"
message: "Update version for the next release (${{ env.NEW_VERSION }}) in Cargo.toml"
new_branch: ${{ env.NEW_BRANCH }}
- name: Create the PR pointing to ${{ github.ref_name }}
run: |
gh pr create \
--title "Update version for the next release ($NEW_VERSION) in Cargo.toml files" \
--body '⚠️ This PR is automatically generated. Check the new version is the expected one before merging.' \
--title "Update version for the next release ($NEW_VERSION) in Cargo.toml" \
--body '⚠️ This PR is automatically generated. Check the new version is the expected one and Cargo.lock has been updated before merging.' \
--label 'skip changelog' \
--milestone $NEW_VERSION \
--base $GITHUB_REF_NAME

2
.gitignore vendored
View File

@@ -1,3 +1,5 @@
.idea/
.vscode/
/target
**/*.csv
**/*.json_lines

View File

@@ -121,15 +121,19 @@ The full Meilisearch release process is described in [this guide](https://github
Depending on the developed feature, you might need to provide a prototyped version of Meilisearch to make it easier for users to test.
The prototype name must follow this convention: `prototype-X-Y` where
- `X` is the feature name formatted in `kebab-case`
- `X` is the feature name formatted in `kebab-case`. It should not end with a single number.
- `Y` is the version of the prototype, starting from `0`.
Example: `prototype-auto-resize-0`.
Example: `prototype-auto-resize-0`. </br>
❌ Bad example: `auto-resize-0`: lacks the `prototype` prefix. </br>
❌ Bad example: `prototype-auto-resize`: lacks the version suffix. </br>
❌ Bad example: `prototype-auto-resize-0-0`: feature name ends with a single number.
Steps to create a prototype:
1. In your terminal, go to the last commit of your branch (the one you want to provide as a prototype).
2. Create a tag following the convention: `git tag prototype-X-Y`
3. Run Meilisearch and check that its launch summary features a line: `Prototype: prototype-X-Y` (you may need to switch branches and back after tagging for this to work).
3. Push the tag: `git push origin prototype-X-Y`
4. Check the [Docker CI](https://github.com/meilisearch/meilisearch/actions/workflows/publish-docker-images.yml) is now running.
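To make the steps above concrete, here is a minimal sketch (illustrative only; `my-feature-branch` and `auto-resize` are placeholder names):
git checkout my-feature-branch            # the branch whose last commit you want to share
git tag prototype-auto-resize-0           # prototype-X-Y: kebab-case feature name plus version suffix
git push origin prototype-auto-resize-0   # the Docker CI workflow then runs for this tag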
@ -138,7 +142,7 @@ More information about [how to run Meilisearch with Docker](https://docs.meilise
⚙️ However, no binaries will be created. If the users do not use Docker, they can go to the `prototype-X-Y` tag in the Meilisearch repository and compile from the source code.
⚠️ When sharing a prototype with users, prevent them from using it in production. Prototypes are only for test purposes.
⚠️ When sharing a prototype with users, remind them to not use it in production. Prototypes are solely for test purposes.
### Release assets

262 Cargo.lock generated
View File

@ -36,9 +36,9 @@ dependencies = [
[[package]]
name = "actix-http"
version = "3.2.2"
version = "3.3.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "0c83abf9903e1f0ad9973cc4f7b9767fd5a03a583f51a5b7a339e07987cd2724"
checksum = "0070905b2c4a98d184c4e81025253cb192aa8a73827553f38e9410801ceb35bb"
dependencies = [
"actix-codec",
"actix-rt",
@ -46,7 +46,7 @@ dependencies = [
"actix-tls",
"actix-utils",
"ahash",
"base64 0.13.1",
"base64 0.21.0",
"bitflags",
"brotli",
"bytes",
@ -68,7 +68,10 @@ dependencies = [
"rand",
"sha1",
"smallvec",
"tokio",
"tokio-util",
"tracing",
"zstd 0.12.3+zstd.1.5.2",
]
[[package]]
@ -164,9 +167,9 @@ dependencies = [
[[package]]
name = "actix-web"
version = "4.2.1"
version = "4.3.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "d48f7b6534e06c7bfc72ee91db7917d4af6afe23e7d223b51e68fffbb21e96b9"
checksum = "464e0fddc668ede5f26ec1f9557a8d44eda948732f40c6b0ad79126930eb775f"
dependencies = [
"actix-codec",
"actix-http",
@ -407,7 +410,7 @@ checksum = "b645a089122eccb6111b4f81cbc1a49f5900ac4666bb93ac027feaecf15607bf"
[[package]]
name = "benchmarks"
version = "1.0.0"
version = "1.1.0"
dependencies = [
"anyhow",
"bytes",
@ -514,12 +517,6 @@ dependencies = [
"serde",
]
[[package]]
name = "build_const"
version = "0.2.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "b4ae4235e6dac0694637c763029ecea1a2ec9e4e06ec2729bd21ba4d9c863eb7"
[[package]]
name = "bumpalo"
version = "3.11.1"
@ -606,9 +603,9 @@ dependencies = [
[[package]]
name = "cargo_toml"
version = "0.13.3"
version = "0.14.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "497049e9477329f8f6a559972ee42e117487d01d1e8c2cc9f836ea6fa23a9e1a"
checksum = "2bfbc36312494041e2cdd5f06697b7e89d4b76f42773a0b5556ac290ff22acc2"
dependencies = [
"serde",
"toml",
@ -656,16 +653,19 @@ dependencies = [
[[package]]
name = "charabia"
version = "0.7.0"
version = "0.7.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "b57f9571f611796ea38e5a9c12e5ce37476f70397b032757f8dfe0c7b9bc5637"
checksum = "1ad3d9667a6b4e03813162c22c4d58235c2dc25d580d60837ce29199038341c9"
dependencies = [
"cow-utils",
"csv",
"deunicode",
"fst",
"irg-kvariants",
"jieba-rs",
"lindera",
"lindera-ipadic",
"lindera-ko-dic",
"once_cell",
"pinyin",
"serde",
@ -718,14 +718,9 @@ version = "3.2.23"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "71655c45cb9845d3270c9d6df84ebe72b4dad3c2ba3f7023ad47c144e4e473a5"
dependencies = [
"atty",
"bitflags",
"clap_derive 3.2.18",
"clap_lex 0.2.4",
"indexmap",
"once_cell",
"strsim",
"termcolor",
"textwrap",
]
@ -736,7 +731,7 @@ source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "a7db700bc935f9e43e88d00b0850dae18a63773cfbec6d8e070fccf7fef89a39"
dependencies = [
"bitflags",
"clap_derive 4.0.21",
"clap_derive",
"clap_lex 0.3.0",
"is-terminal",
"once_cell",
@ -744,19 +739,6 @@ dependencies = [
"termcolor",
]
[[package]]
name = "clap_derive"
version = "3.2.18"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "ea0c8bce528c4be4da13ea6fead8965e95b6073585a2f05204bd8f4119f82a65"
dependencies = [
"heck",
"proc-macro-error",
"proc-macro2",
"quote",
"syn",
]
[[package]]
name = "clap_derive"
version = "4.0.21"
@ -870,15 +852,6 @@ dependencies = [
"libc",
]
[[package]]
name = "crc"
version = "1.8.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "d663548de7f5cca343f1e0a48d14dcfb0e9eb4e079ec58883b7251539fa10aeb"
dependencies = [
"build_const",
]
[[package]]
name = "crc32fast"
version = "1.3.2"
@ -1110,20 +1083,26 @@ dependencies = [
[[package]]
name = "deserr"
version = "0.3.0"
version = "0.5.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "28380303ca15ec07e1d5b079baf19cf849b09edad5cab219c1c51b2bd07523de"
checksum = "c71c14985c842bf1e520b1ebcd22daff6aeece32f510e11f063cecf9b308c04b"
dependencies = [
"actix-http",
"actix-utils",
"actix-web",
"deserr-internal",
"futures",
"serde-cs",
"serde_json",
"serde_urlencoded",
"strsim",
]
[[package]]
name = "deserr-internal"
version = "0.3.0"
version = "0.5.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "860928cd8af78d223a3d70dd581f21d7c3de8aa2eecd938e0c0a399ded7c1451"
checksum = "cae1c51b191528c9e4e5d6cff671de94f61fcda1c206cc891251e0cf438c941a"
dependencies = [
"convert_case 0.5.0",
"proc-macro2",
@ -1171,7 +1150,7 @@ dependencies = [
[[package]]
name = "dump"
version = "1.0.0"
version = "1.1.0"
dependencies = [
"anyhow",
"big_s",
@ -1324,6 +1303,19 @@ dependencies = [
"termcolor",
]
[[package]]
name = "env_logger"
version = "0.10.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "85cdab6a89accf66733ad5a1693a4dcced6aeff64602b634530dd73c1f3ee9f0"
dependencies = [
"humantime",
"is-terminal",
"log",
"regex",
"termcolor",
]
[[package]]
name = "errno"
version = "0.2.8"
@ -1379,7 +1371,7 @@ dependencies = [
[[package]]
name = "file-store"
version = "1.0.0"
version = "1.1.0"
dependencies = [
"faux",
"tempfile",
@ -1401,7 +1393,7 @@ dependencies = [
[[package]]
name = "filter-parser"
version = "1.0.0"
version = "1.1.0"
dependencies = [
"insta",
"nom",
@ -1421,7 +1413,7 @@ dependencies = [
[[package]]
name = "flatten-serde-json"
version = "1.0.0"
version = "1.1.0"
dependencies = [
"criterion",
"serde_json",
@ -1898,7 +1890,7 @@ dependencies = [
[[package]]
name = "index-scheduler"
version = "1.0.0"
version = "1.1.0"
dependencies = [
"anyhow",
"big_s",
@ -1912,6 +1904,7 @@ dependencies = [
"insta",
"log",
"meili-snap",
"meilisearch-auth",
"meilisearch-types",
"nelson",
"page_size 0.5.0",
@ -1977,6 +1970,17 @@ version = "2.7.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "30e22bd8629359895450b59ea7a776c850561b96a3b1d31321c1949d9e6c9146"
[[package]]
name = "irg-kvariants"
version = "0.1.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "c73214298363629cf9dbfc93b426808865ee3c121029778cb31b1284104fdf78"
dependencies = [
"csv",
"once_cell",
"serde",
]
[[package]]
name = "is-terminal"
version = "0.4.2"
@ -2045,7 +2049,7 @@ dependencies = [
[[package]]
name = "json-depth-checker"
version = "1.0.0"
version = "1.1.0"
dependencies = [
"criterion",
"serde_json",
@ -2065,6 +2069,15 @@ dependencies = [
"simple_asn1",
]
[[package]]
name = "kanaria"
version = "0.2.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "c0f9d9652540055ac4fded998a73aca97d965899077ab1212587437da44196ff"
dependencies = [
"bitflags",
]
[[package]]
name = "language-tags"
version = "0.3.2"
@ -2134,14 +2147,15 @@ dependencies = [
[[package]]
name = "lindera"
version = "0.17.0"
version = "0.21.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "082ca91ac4d1557028ace9bfb8cee1500d156a4574dda93cfcdcf4caaebb9bd7"
checksum = "0f33a20bb9cbf95572b2d2f40d7040c8d8c7ad09ae20e1f6513db6ef2564dfc5"
dependencies = [
"anyhow",
"bincode",
"byteorder",
"encoding",
"kanaria",
"lindera-cc-cedict-builder",
"lindera-core",
"lindera-dictionary",
@ -2150,24 +2164,27 @@ dependencies = [
"lindera-ko-dic",
"lindera-ko-dic-builder",
"lindera-unidic-builder",
"regex",
"serde",
"serde_json",
"thiserror",
"unicode-blocks",
"unicode-normalization",
"yada",
]
[[package]]
name = "lindera-cc-cedict-builder"
version = "0.17.0"
version = "0.21.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "a8967615a6d85320ec2755e1435c36165467ba01a79026adc3f86dad1b668df3"
checksum = "60c3b379251edadbac7a5fdb31e482274e11dae6ab6cc789d0d86cf34369cf49"
dependencies = [
"anyhow",
"bincode",
"byteorder",
"clap 3.2.23",
"csv",
"encoding",
"env_logger",
"env_logger 0.10.0",
"glob",
"lindera-core",
"lindera-decompress",
@ -2176,16 +2193,28 @@ dependencies = [
]
[[package]]
name = "lindera-core"
version = "0.17.0"
name = "lindera-compress"
version = "0.21.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "0e8ed3cea13f73557a4574a179b1518670a3b70bfdad120521313b03cc89380e"
checksum = "a8d0ea3de5625e2381cac94e518d3b56103fde56bc0dce840fe875c1e871b125"
dependencies = [
"anyhow",
"flate2",
"lindera-decompress",
]
[[package]]
name = "lindera-core"
version = "0.21.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "2281747b98fdd46bcc54ce7fdb6870dad9f67ddb3dc086c47b6704f3e1178cd5"
dependencies = [
"anyhow",
"bincode",
"byteorder",
"encoding_rs",
"log",
"once_cell",
"serde",
"thiserror",
"yada",
@ -2193,20 +2222,20 @@ dependencies = [
[[package]]
name = "lindera-decompress"
version = "0.17.0"
version = "0.21.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "2badb41828f89cfa6452db0a66da77897c0a04478304de26c8b2b36613e08d43"
checksum = "52101bd454754c506305ab897af5ac2ae41fe91e3272c1ff5c6a02a089dfaefd"
dependencies = [
"anyhow",
"lzma-rs",
"flate2",
"serde",
]
[[package]]
name = "lindera-dictionary"
version = "0.17.0"
version = "0.21.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "e219722c9f56b920c231210e7c25d8b5d35b508e7a2fd69d368916c4b1c926f6"
checksum = "af1c6668848f1d30d216c99093a3ed3fe125c105fa12a4aeed5a1861dc01dd52"
dependencies = [
"anyhow",
"bincode",
@ -2216,15 +2245,16 @@ dependencies = [
[[package]]
name = "lindera-ipadic"
version = "0.17.0"
version = "0.21.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "2c8e87c8362c724e8188fb7d9b6d184cac15d01369295e9bff7812b630d57e3b"
checksum = "693098007200fa43fd5cdc9ca8740f371327369672ce812cd87a1f6344971e31"
dependencies = [
"bincode",
"byteorder",
"encoding",
"flate2",
"lindera-core",
"lindera-decompress",
"lindera-ipadic-builder",
"once_cell",
"tar",
@ -2232,19 +2262,19 @@ dependencies = [
[[package]]
name = "lindera-ipadic-builder"
version = "0.17.0"
version = "0.21.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "1439e95852e444a116424086dc64d709c90e8af269ff7d2c2c4020f666f8dfab"
checksum = "7b6b7240d097a8fc37ee8f90ebff02c4db0ba5325ecb0dacb6da3724596798c9"
dependencies = [
"anyhow",
"bincode",
"byteorder",
"clap 3.2.23",
"csv",
"encoding_rs",
"encoding_rs_io",
"env_logger",
"env_logger 0.10.0",
"glob",
"lindera-compress",
"lindera-core",
"lindera-decompress",
"log",
@ -2254,15 +2284,16 @@ dependencies = [
[[package]]
name = "lindera-ko-dic"
version = "0.17.0"
version = "0.21.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "cb15f949220da45872d774b7831bb030855ec083435c907499782f8558c8a203"
checksum = "abd3c5a4addeb61ca66788a3dd1fd51093e6cd8fea1d997042ada5aa60e8cc5e"
dependencies = [
"bincode",
"byteorder",
"encoding",
"flate2",
"lindera-core",
"lindera-decompress",
"lindera-ko-dic-builder",
"once_cell",
"tar",
@ -2270,18 +2301,18 @@ dependencies = [
[[package]]
name = "lindera-ko-dic-builder"
version = "0.17.0"
version = "0.21.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "fde5a7352f4754be4f741e90bf4dff38a12a6572ab3880d0cf688e1166b8d82b"
checksum = "512bb1393a9281e0b13704319d1343b7931416865852d9d7b7c0178431518326"
dependencies = [
"anyhow",
"bincode",
"byteorder",
"clap 3.2.23",
"csv",
"encoding",
"env_logger",
"env_logger 0.10.0",
"glob",
"lindera-compress",
"lindera-core",
"lindera-decompress",
"log",
@ -2290,17 +2321,16 @@ dependencies = [
[[package]]
name = "lindera-unidic-builder"
version = "0.17.0"
version = "0.21.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "f1451b2ed8a7184a5f815d84f99d358c1d67297305831453dfdc0eb5d08e22b5"
checksum = "7f575a27f8ba67c15fe16ebf7d277a0ac04e8c8a0f72670ebc2443da9d41c450"
dependencies = [
"anyhow",
"bincode",
"byteorder",
"clap 3.2.23",
"csv",
"encoding",
"env_logger",
"env_logger 0.10.0",
"glob",
"lindera-core",
"lindera-decompress",
@ -2389,16 +2419,6 @@ dependencies = [
"syn",
]
[[package]]
name = "lzma-rs"
version = "0.2.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "aba8ecb0450dfabce4ad72085eed0a75dffe8f21f7ada05638564ea9db2d7fb1"
dependencies = [
"byteorder",
"crc",
]
[[package]]
name = "manifest-dir-macros"
version = "0.1.16"
@ -2425,7 +2445,7 @@ checksum = "490cc448043f947bae3cbee9c203358d62dbee0db12107a74be5c30ccfd09771"
[[package]]
name = "meili-snap"
version = "1.0.0"
version = "1.1.0"
dependencies = [
"insta",
"md5",
@ -2434,7 +2454,7 @@ dependencies = [
[[package]]
name = "meilisearch"
version = "1.0.0"
version = "1.1.0"
dependencies = [
"actix-cors",
"actix-http",
@ -2457,7 +2477,7 @@ dependencies = [
"deserr",
"dump",
"either",
"env_logger",
"env_logger 0.9.3",
"file-store",
"flate2",
"fst",
@ -2522,11 +2542,12 @@ dependencies = [
[[package]]
name = "meilisearch-auth"
version = "1.0.0"
version = "1.1.0"
dependencies = [
"base64 0.13.1",
"enum-iterator",
"hmac",
"maplit",
"meilisearch-types",
"rand",
"roaring",
@ -2540,7 +2561,7 @@ dependencies = [
[[package]]
name = "meilisearch-types"
version = "1.0.0"
version = "1.1.0"
dependencies = [
"actix-web",
"anyhow",
@ -2594,7 +2615,7 @@ dependencies = [
[[package]]
name = "milli"
version = "1.0.0"
version = "1.1.0"
dependencies = [
"big_s",
"bimap",
@ -2948,7 +2969,7 @@ checksum = "478c572c3d73181ff3c2539045f6eb99e5491218eae919370993b890cdbdd98e"
[[package]]
name = "permissive-json-pointer"
version = "1.0.0"
version = "1.1.0"
dependencies = [
"big_s",
"serde_json",
@ -3867,9 +3888,9 @@ checksum = "cda74da7e1a664f795bb1f8a87ec406fb89a02522cf6e50620d016add6dbbf5c"
[[package]]
name = "tokio"
version = "1.24.1"
version = "1.24.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "1d9f76183f91ecfb55e1d7d5602bd1d979e38a3a522fe900241cf195624d67ae"
checksum = "597a12a59981d9e3c38d216785b0c37399f6e415e8d0712047620f189371b0bb"
dependencies = [
"autocfg",
"bytes",
@ -4001,6 +4022,12 @@ version = "0.3.8"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "099b7128301d285f79ddd55b9a83d5e6b9e97c92e0ea0daebee7263e932de992"
[[package]]
name = "unicode-blocks"
version = "0.1.5"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "9de2be6bad6f56ce8373d377e611cbb2265de3a656138065609ce82e217aad70"
[[package]]
name = "unicode-ident"
version = "1.0.6"
@ -4422,7 +4449,7 @@ dependencies = [
"pbkdf2",
"sha1",
"time",
"zstd",
"zstd 0.11.2+zstd.1.5.2",
]
[[package]]
@ -4431,7 +4458,16 @@ version = "0.11.2+zstd.1.5.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "20cc960326ece64f010d2d2107537f26dc589a6573a316bd5b1dba685fa5fde4"
dependencies = [
"zstd-safe",
"zstd-safe 5.0.2+zstd.1.5.2",
]
[[package]]
name = "zstd"
version = "0.12.3+zstd.1.5.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "76eea132fb024e0e13fd9c2f5d5d595d8a967aa72382ac2f9d39fcc95afd0806"
dependencies = [
"zstd-safe 6.0.4+zstd.1.5.4",
]
[[package]]
@ -4445,10 +4481,20 @@ dependencies = [
]
[[package]]
name = "zstd-sys"
version = "2.0.5+zstd.1.5.2"
name = "zstd-safe"
version = "6.0.4+zstd.1.5.4"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "edc50ffce891ad571e9f9afe5039c4837bede781ac4bb13052ed7ae695518596"
checksum = "7afb4b54b8910cf5447638cb54bf4e8a65cbedd783af98b98c62ffe91f185543"
dependencies = [
"libc",
"zstd-sys",
]
[[package]]
name = "zstd-sys"
version = "2.0.7+zstd.1.5.4"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "94509c3ba2fe55294d752b79842c530ccfab760192521df74a081a78d2b3c7f5"
dependencies = [
"cc",
"libc",

View File

@ -16,6 +16,15 @@ members = [
"benchmarks"
]
[workspace.package]
version = "1.1.0"
authors = ["Quentin de Quelen <quentin@dequelen.me>", "Clément Renault <clement@meilisearch.com>"]
description = "Meilisearch HTTP server"
homepage = "https://meilisearch.com"
readme = "README.md"
edition = "2021"
license = "MIT"
[profile.release]
codegen-units = 1

View File

@ -7,7 +7,8 @@ WORKDIR /meilisearch
ARG COMMIT_SHA
ARG COMMIT_DATE
ENV VERGEN_GIT_SHA=${COMMIT_SHA} VERGEN_GIT_COMMIT_TIMESTAMP=${COMMIT_DATE}
ARG GIT_TAG
ENV VERGEN_GIT_SHA=${COMMIT_SHA} VERGEN_GIT_COMMIT_TIMESTAMP=${COMMIT_DATE} VERGEN_GIT_SEMVER_LIGHTWEIGHT=${GIT_TAG}
ENV RUSTFLAGS="-C target-feature=-crt-static"
COPY . .
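For context only, and not something this diff specifies: a local build would pass the new argument alongside the existing ones, roughly as follows (the values fed to the build arguments are assumptions):
docker build \
  --build-arg COMMIT_SHA=$(git rev-parse HEAD) \
  --build-arg COMMIT_DATE=$(git show -s --format=%cI HEAD) \
  --build-arg GIT_TAG=$(git describe --tags --abbrev=0) \
  -t meilisearch:local .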

View File

@ -1,9 +1,15 @@
[package]
name = "benchmarks"
version = "1.0.0"
edition = "2018"
publish = false
version.workspace = true
authors.workspace = true
description.workspace = true
homepage.workspace = true
readme.workspace = true
edition.workspace = true
license.workspace = true
[dependencies]
anyhow = "1.0.65"
csv = "1.1.6"

View File

@ -29,7 +29,7 @@ fn bench_formatting(c: &mut criterion::Criterion) {
(vec![Rc::new(MatchingWord::new("thedoord".to_string(), 1, true).unwrap())], vec![0, 1, 2]),
(vec![Rc::new(MatchingWord::new("doord".to_string(), 1, true).unwrap())], vec![1, 2]),
]
), TokenizerBuilder::default().build()),
).unwrap(), TokenizerBuilder::default().build()),
},
];

View File

@ -118,3 +118,13 @@ ssl_resumption = false
ssl_tickets = false
# Activates SSL tickets.
# https://docs.meilisearch.com/learn/configuration/instance_options.html#ssl-tickets
#############################
### Experimental features ###
#############################
experimental_enable_metrics = false
# Experimental metrics feature. For more information, see: <https://github.com/meilisearch/meilisearch/discussions/3518>
# Enables the Prometheus metrics on the `GET /metrics` endpoint.
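As a quick usage note (not part of the config file above; the CLI flag spelling is an assumption derived from this option name): once the option is enabled, the endpoint can be checked with curl.
./meilisearch --experimental-enable-metrics   # or set experimental_enable_metrics = true in this config file
curl http://localhost:7700/metrics            # returns Prometheus-formatted metrics (an API key may be required)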

View File

@ -1,7 +1,14 @@
[package]
name = "dump"
version = "1.0.0"
edition = "2021"
publish = false
version.workspace = true
authors.workspace = true
description.workspace = true
edition.workspace = true
homepage.workspace = true
readme.workspace = true
license.workspace = true
[dependencies]
anyhow = "1.0.65"

View File

@ -203,12 +203,11 @@ pub(crate) mod test {
use big_s::S;
use maplit::btreeset;
use meilisearch_types::index_uid::IndexUid;
use meilisearch_types::index_uid_pattern::IndexUidPattern;
use meilisearch_types::keys::{Action, Key};
use meilisearch_types::milli::update::Setting;
use meilisearch_types::milli::{self};
use meilisearch_types::settings::{Checked, Settings};
use meilisearch_types::star_or::StarOr;
use meilisearch_types::tasks::{Details, Status};
use serde_json::{json, Map, Value};
use time::macros::datetime;
@ -341,7 +340,7 @@ pub(crate) mod test {
name: Some(S("doggos_key")),
uid: Uuid::from_str("9f8a34da-b6b2-42f0-939b-dbd4c3448655").unwrap(),
actions: vec![Action::DocumentsAll],
indexes: vec![StarOr::Other(IndexUid::from_str("doggos").unwrap())],
indexes: vec![IndexUidPattern::from_str("doggos").unwrap()],
expires_at: Some(datetime!(4130-03-14 12:21 UTC)),
created_at: datetime!(1960-11-15 0:00 UTC),
updated_at: datetime!(2022-11-10 0:00 UTC),
@ -351,7 +350,7 @@ pub(crate) mod test {
name: Some(S("master_key")),
uid: Uuid::from_str("4622f717-1c00-47bb-a494-39d76a49b591").unwrap(),
actions: vec![Action::All],
indexes: vec![StarOr::Star],
indexes: vec![IndexUidPattern::all()],
expires_at: None,
created_at: datetime!(0000-01-01 00:01 UTC),
updated_at: datetime!(1964-05-04 17:25 UTC),

View File

@ -181,10 +181,8 @@ impl CompatV5ToV6 {
.indexes
.into_iter()
.map(|index| match index {
v5::StarOr::Star => v6::StarOr::Star,
v5::StarOr::Other(uid) => {
v6::StarOr::Other(v6::IndexUid::new_unchecked(uid.as_str()))
}
v5::StarOr::Star => v6::IndexUidPattern::all(),
v5::StarOr::Other(uid) => v6::IndexUidPattern::new_unchecked(uid.as_str()),
})
.collect(),
expires_at: key.expires_at,

View File

@ -34,8 +34,7 @@ pub type PaginationSettings = meilisearch_types::settings::PaginationSettings;
// everything related to the api keys
pub type Action = meilisearch_types::keys::Action;
pub type StarOr<T> = meilisearch_types::star_or::StarOr<T>;
pub type IndexUid = meilisearch_types::index_uid::IndexUid;
pub type IndexUidPattern = meilisearch_types::index_uid_pattern::IndexUidPattern;
// everything related to the errors
pub type ResponseError = meilisearch_types::error::ResponseError;

View File

@ -1,7 +1,14 @@
[package]
name = "file-store"
version = "1.0.0"
edition = "2021"
publish = false
version.workspace = true
authors.workspace = true
description.workspace = true
homepage.workspace = true
readme.workspace = true
edition.workspace = true
license.workspace = true
[dependencies]
tempfile = "3.3.0"

View File

@ -116,10 +116,20 @@ impl FileStore {
/// List the Uuids of the files in the FileStore
pub fn all_uuids(&self) -> Result<impl Iterator<Item = Result<Uuid>>> {
Ok(self.path.read_dir()?.map(|entry| {
Ok(Uuid::from_str(
entry?.file_name().to_str().ok_or(Error::CouldNotParseFileNameAsUtf8)?,
)?)
Ok(self.path.read_dir()?.filter_map(|entry| {
let file_name = match entry {
Ok(entry) => entry.file_name(),
Err(e) => return Some(Err(e.into())),
};
let file_name = match file_name.to_str() {
Some(file_name) => file_name,
None => return Some(Err(Error::CouldNotParseFileNameAsUtf8)),
};
if file_name.starts_with('.') {
None
} else {
Some(Uuid::from_str(file_name).map_err(|e| e.into()))
}
}))
}
}
@ -135,3 +145,34 @@ impl File {
Ok(())
}
}
#[cfg(test)]
mod test {
use std::io::Write;
use tempfile::TempDir;
use super::*;
#[test]
fn all_uuids() {
let dir = TempDir::new().unwrap();
let fs = FileStore::new(dir.path()).unwrap();
let (uuid, mut file) = fs.new_update().unwrap();
file.write_all(b"Hello world").unwrap();
file.persist().unwrap();
let all_uuids = fs.all_uuids().unwrap().collect::<Result<Vec<_>>>().unwrap();
assert_eq!(all_uuids, vec![uuid]);
let (uuid2, file) = fs.new_update().unwrap();
let all_uuids = fs.all_uuids().unwrap().collect::<Result<Vec<_>>>().unwrap();
assert_eq!(all_uuids, vec![uuid]);
file.persist().unwrap();
let mut all_uuids = fs.all_uuids().unwrap().collect::<Result<Vec<_>>>().unwrap();
all_uuids.sort();
let mut expected = vec![uuid, uuid2];
expected.sort();
assert_eq!(all_uuids, expected);
}
}

View File

@ -1,10 +1,16 @@
[package]
name = "filter-parser"
version = "1.0.0"
edition = "2021"
description = "The parser for the Meilisearch filter syntax"
publish = false
version.workspace = true
authors.workspace = true
# description.workspace = true
homepage.workspace = true
readme.workspace = true
edition.workspace = true
license.workspace = true
[dependencies]
nom = "7.1.1"
nom_locate = "4.0.0"

View File

@ -1,11 +1,17 @@
[package]
name = "flatten-serde-json"
version = "1.0.0"
edition = "2021"
description = "Flatten serde-json objects like elastic search"
readme = "README.md"
publish = false
version.workspace = true
authors.workspace = true
# description.workspace = true
homepage.workspace = true
# readme.workspace = true
edition.workspace = true
license.workspace = true
[dependencies]
serde_json = "1.0"

View File

@ -1,7 +1,14 @@
[package]
name = "index-scheduler"
version = "1.0.0"
edition = "2021"
publish = false
version.workspace = true
authors.workspace = true
description.workspace = true
homepage.workspace = true
readme.workspace = true
edition.workspace = true
license.workspace = true
[dependencies]
anyhow = "1.0.64"
@ -12,6 +19,7 @@ dump = { path = "../dump" }
enum-iterator = "1.1.3"
file-store = { path = "../file-store" }
log = "0.4.14"
meilisearch-auth = { path = "../meilisearch-auth" }
meilisearch-types = { path = "../meilisearch-types" }
page_size = "0.5.0"
roaring = { version = "0.10.0", features = ["serde"] }

View File

@ -88,11 +88,11 @@ pub enum BatchKind {
DocumentClear {
ids: Vec<TaskId>,
},
DocumentImport {
DocumentOperation {
method: IndexDocumentsMethod,
allow_index_creation: bool,
primary_key: Option<String>,
import_ids: Vec<TaskId>,
operation_ids: Vec<TaskId>,
},
DocumentDeletion {
deletion_ids: Vec<TaskId>,
@ -102,12 +102,12 @@ pub enum BatchKind {
allow_index_creation: bool,
settings_ids: Vec<TaskId>,
},
SettingsAndDocumentImport {
SettingsAndDocumentOperation {
settings_ids: Vec<TaskId>,
method: IndexDocumentsMethod,
allow_index_creation: bool,
primary_key: Option<String>,
import_ids: Vec<TaskId>,
operation_ids: Vec<TaskId>,
},
Settings {
allow_index_creation: bool,
@ -131,9 +131,9 @@ impl BatchKind {
#[rustfmt::skip]
fn allow_index_creation(&self) -> Option<bool> {
match self {
BatchKind::DocumentImport { allow_index_creation, .. }
BatchKind::DocumentOperation { allow_index_creation, .. }
| BatchKind::ClearAndSettings { allow_index_creation, .. }
| BatchKind::SettingsAndDocumentImport { allow_index_creation, .. }
| BatchKind::SettingsAndDocumentOperation { allow_index_creation, .. }
| BatchKind::Settings { allow_index_creation, .. } => Some(*allow_index_creation),
_ => None,
}
@ -141,8 +141,8 @@ impl BatchKind {
fn primary_key(&self) -> Option<Option<&str>> {
match self {
BatchKind::DocumentImport { primary_key, .. }
| BatchKind::SettingsAndDocumentImport { primary_key, .. } => {
BatchKind::DocumentOperation { primary_key, .. }
| BatchKind::SettingsAndDocumentOperation { primary_key, .. } => {
Some(primary_key.as_deref())
}
_ => None,
@ -173,22 +173,22 @@ impl BatchKind {
if primary_key.is_none() || pk.is_none() || primary_key == pk.as_deref() =>
{
(
Continue(BatchKind::DocumentImport {
Continue(BatchKind::DocumentOperation {
method,
allow_index_creation,
primary_key: pk,
import_ids: vec![task_id],
operation_ids: vec![task_id],
}),
allow_index_creation,
)
}
// if the primary key set in the task was different than ours we should stop and make this batch fail asap.
K::DocumentImport { method, allow_index_creation, primary_key } => (
Break(BatchKind::DocumentImport {
Break(BatchKind::DocumentOperation {
method,
allow_index_creation,
primary_key,
import_ids: vec![task_id],
operation_ids: vec![task_id],
}),
allow_index_creation,
),
@ -249,7 +249,7 @@ impl BatchKind {
(
BatchKind::DocumentClear { mut ids }
| BatchKind::DocumentDeletion { deletion_ids: mut ids }
| BatchKind::DocumentImport { method: _, allow_index_creation: _, primary_key: _, import_ids: mut ids }
| BatchKind::DocumentOperation { method: _, allow_index_creation: _, primary_key: _, operation_ids: mut ids }
| BatchKind::Settings { allow_index_creation: _, settings_ids: mut ids },
K::IndexDeletion,
) => {
@ -258,7 +258,7 @@ impl BatchKind {
}
(
BatchKind::ClearAndSettings { settings_ids: mut ids, allow_index_creation: _, mut other }
| BatchKind::SettingsAndDocumentImport { import_ids: mut ids, method: _, allow_index_creation: _, primary_key: _, settings_ids: mut other },
| BatchKind::SettingsAndDocumentOperation { operation_ids: mut ids, method: _, allow_index_creation: _, primary_key: _, settings_ids: mut other },
K::IndexDeletion,
) => {
ids.push(id);
@ -278,63 +278,108 @@ impl BatchKind {
K::DocumentImport { .. } | K::Settings { .. },
) => Break(this),
(
BatchKind::DocumentImport { method: _, allow_index_creation: _, primary_key: _, import_ids: mut ids },
BatchKind::DocumentOperation { method: _, allow_index_creation: _, primary_key: _, mut operation_ids },
K::DocumentClear,
) => {
ids.push(id);
Continue(BatchKind::DocumentClear { ids })
operation_ids.push(id);
Continue(BatchKind::DocumentClear { ids: operation_ids })
}
// we can autobatch the same kind of document additions / updates
(
BatchKind::DocumentImport { method: ReplaceDocuments, allow_index_creation, primary_key: _, mut import_ids },
BatchKind::DocumentOperation { method: ReplaceDocuments, allow_index_creation, primary_key: _, mut operation_ids },
K::DocumentImport { method: ReplaceDocuments, primary_key: pk, .. },
) => {
import_ids.push(id);
Continue(BatchKind::DocumentImport {
operation_ids.push(id);
Continue(BatchKind::DocumentOperation {
method: ReplaceDocuments,
allow_index_creation,
import_ids,
operation_ids,
primary_key: pk,
})
}
(
BatchKind::DocumentImport { method: UpdateDocuments, allow_index_creation, primary_key: _, mut import_ids },
BatchKind::DocumentOperation { method: UpdateDocuments, allow_index_creation, primary_key: _, mut operation_ids },
K::DocumentImport { method: UpdateDocuments, primary_key: pk, .. },
) => {
import_ids.push(id);
Continue(BatchKind::DocumentImport {
operation_ids.push(id);
Continue(BatchKind::DocumentOperation {
method: UpdateDocuments,
allow_index_creation,
primary_key: pk,
import_ids,
operation_ids,
})
}
(
BatchKind::DocumentOperation { method, allow_index_creation, primary_key, mut operation_ids },
K::DocumentDeletion,
) => {
operation_ids.push(id);
Continue(BatchKind::DocumentOperation {
method,
allow_index_creation,
primary_key,
operation_ids,
})
}
// but we can't autobatch documents if it's not the same kind
// this match branch MUST be AFTER the previous one
(
this @ BatchKind::DocumentImport { .. },
K::DocumentDeletion | K::DocumentImport { .. },
this @ BatchKind::DocumentOperation { .. },
K::DocumentImport { .. },
) => Break(this),
(
BatchKind::DocumentImport { method, allow_index_creation, primary_key, import_ids },
BatchKind::DocumentOperation { method, allow_index_creation, primary_key, operation_ids },
K::Settings { .. },
) => Continue(BatchKind::SettingsAndDocumentImport {
) => Continue(BatchKind::SettingsAndDocumentOperation {
settings_ids: vec![id],
method,
allow_index_creation,
primary_key,
import_ids,
operation_ids,
}),
(BatchKind::DocumentDeletion { mut deletion_ids }, K::DocumentClear) => {
deletion_ids.push(id);
Continue(BatchKind::DocumentClear { ids: deletion_ids })
}
(this @ BatchKind::DocumentDeletion { .. }, K::DocumentImport { .. }) => Break(this),
// we can autobatch the deletion and import if the index already exists
(
BatchKind::DocumentDeletion { mut deletion_ids },
K::DocumentImport { method, allow_index_creation, primary_key }
) if index_already_exists => {
deletion_ids.push(id);
Continue(BatchKind::DocumentOperation {
method,
allow_index_creation,
primary_key,
operation_ids: deletion_ids,
})
}
// we can autobatch the deletion and import if both can't create an index
(
BatchKind::DocumentDeletion { mut deletion_ids },
K::DocumentImport { method, allow_index_creation, primary_key }
) if !allow_index_creation => {
deletion_ids.push(id);
Continue(BatchKind::DocumentOperation {
method,
allow_index_creation,
primary_key,
operation_ids: deletion_ids,
})
}
// we can't autobatch a deletion and an import if the index does not exist but would be created by an addition
(
this @ BatchKind::DocumentDeletion { .. },
K::DocumentImport { .. }
) => {
Break(this)
}
(BatchKind::DocumentDeletion { mut deletion_ids }, K::DocumentDeletion) => {
deletion_ids.push(id);
Continue(BatchKind::DocumentDeletion { deletion_ids })
@ -403,60 +448,60 @@ impl BatchKind {
})
}
(
BatchKind::SettingsAndDocumentImport { settings_ids, method: _, import_ids: mut other, allow_index_creation, primary_key: _ },
BatchKind::SettingsAndDocumentOperation { settings_ids, method: _, mut operation_ids, allow_index_creation, primary_key: _ },
K::DocumentClear,
) => {
other.push(id);
operation_ids.push(id);
Continue(BatchKind::ClearAndSettings {
settings_ids,
other,
other: operation_ids,
allow_index_creation,
})
}
(
BatchKind::SettingsAndDocumentImport { settings_ids, method: ReplaceDocuments, mut import_ids, allow_index_creation, primary_key: _},
BatchKind::SettingsAndDocumentOperation { settings_ids, method: ReplaceDocuments, mut operation_ids, allow_index_creation, primary_key: _},
K::DocumentImport { method: ReplaceDocuments, primary_key: pk2, .. },
) => {
import_ids.push(id);
Continue(BatchKind::SettingsAndDocumentImport {
operation_ids.push(id);
Continue(BatchKind::SettingsAndDocumentOperation {
settings_ids,
method: ReplaceDocuments,
allow_index_creation,
primary_key: pk2,
import_ids,
operation_ids,
})
}
(
BatchKind::SettingsAndDocumentImport { settings_ids, method: UpdateDocuments, allow_index_creation, primary_key: _, mut import_ids },
BatchKind::SettingsAndDocumentOperation { settings_ids, method: UpdateDocuments, allow_index_creation, primary_key: _, mut operation_ids },
K::DocumentImport { method: UpdateDocuments, primary_key: pk2, .. },
) => {
import_ids.push(id);
Continue(BatchKind::SettingsAndDocumentImport {
operation_ids.push(id);
Continue(BatchKind::SettingsAndDocumentOperation {
settings_ids,
method: UpdateDocuments,
allow_index_creation,
primary_key: pk2,
import_ids,
operation_ids,
})
}
// But we can't batch a settings and a doc op with another doc op
// this MUST be AFTER the two previous branches
(
this @ BatchKind::SettingsAndDocumentImport { .. },
this @ BatchKind::SettingsAndDocumentOperation { .. },
K::DocumentDeletion | K::DocumentImport { .. },
) => Break(this),
(
BatchKind::SettingsAndDocumentImport { mut settings_ids, method, allow_index_creation,primary_key, import_ids },
BatchKind::SettingsAndDocumentOperation { mut settings_ids, method, allow_index_creation,primary_key, operation_ids },
K::Settings { .. },
) => {
settings_ids.push(id);
Continue(BatchKind::SettingsAndDocumentImport {
Continue(BatchKind::SettingsAndDocumentOperation {
settings_ids,
method,
allow_index_creation,
primary_key,
import_ids,
operation_ids,
})
}
(
@ -588,29 +633,29 @@ mod tests {
fn autobatch_simple_operation_together() {
// we can autobatch one or multiple `ReplaceDocuments` together.
// if the index exists.
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, None)]), @"Some((DocumentImport { method: ReplaceDocuments, allow_index_creation: true, primary_key: None, import_ids: [0] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, false, None)]), @"Some((DocumentImport { method: ReplaceDocuments, allow_index_creation: false, primary_key: None, import_ids: [0] }, false))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, None), doc_imp( ReplaceDocuments, true , None), doc_imp(ReplaceDocuments, true , None)]), @"Some((DocumentImport { method: ReplaceDocuments, allow_index_creation: true, primary_key: None, import_ids: [0, 1, 2] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, false, None), doc_imp( ReplaceDocuments, false , None), doc_imp(ReplaceDocuments, false , None)]), @"Some((DocumentImport { method: ReplaceDocuments, allow_index_creation: false, primary_key: None, import_ids: [0, 1, 2] }, false))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, None)]), @"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, false, None)]), @"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: false, primary_key: None, operation_ids: [0] }, false))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, None), doc_imp( ReplaceDocuments, true , None), doc_imp(ReplaceDocuments, true , None)]), @"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0, 1, 2] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, false, None), doc_imp( ReplaceDocuments, false , None), doc_imp(ReplaceDocuments, false , None)]), @"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: false, primary_key: None, operation_ids: [0, 1, 2] }, false))");
// if it doesn't exists.
debug_snapshot!(autobatch_from(false,None, [doc_imp(ReplaceDocuments, true, None)]), @"Some((DocumentImport { method: ReplaceDocuments, allow_index_creation: true, primary_key: None, import_ids: [0] }, true))");
debug_snapshot!(autobatch_from(false,None, [doc_imp(ReplaceDocuments, false, None)]), @"Some((DocumentImport { method: ReplaceDocuments, allow_index_creation: false, primary_key: None, import_ids: [0] }, false))");
debug_snapshot!(autobatch_from(false,None, [doc_imp(ReplaceDocuments, true, None), doc_imp( ReplaceDocuments, true , None), doc_imp(ReplaceDocuments, true , None)]), @"Some((DocumentImport { method: ReplaceDocuments, allow_index_creation: true, primary_key: None, import_ids: [0, 1, 2] }, true))");
debug_snapshot!(autobatch_from(false,None, [doc_imp(ReplaceDocuments, false, None), doc_imp( ReplaceDocuments, true , None), doc_imp(ReplaceDocuments, true , None)]), @"Some((DocumentImport { method: ReplaceDocuments, allow_index_creation: false, primary_key: None, import_ids: [0] }, false))");
debug_snapshot!(autobatch_from(false,None, [doc_imp(ReplaceDocuments, true, None)]), @"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0] }, true))");
debug_snapshot!(autobatch_from(false,None, [doc_imp(ReplaceDocuments, false, None)]), @"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: false, primary_key: None, operation_ids: [0] }, false))");
debug_snapshot!(autobatch_from(false,None, [doc_imp(ReplaceDocuments, true, None), doc_imp( ReplaceDocuments, true , None), doc_imp(ReplaceDocuments, true , None)]), @"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0, 1, 2] }, true))");
debug_snapshot!(autobatch_from(false,None, [doc_imp(ReplaceDocuments, false, None), doc_imp( ReplaceDocuments, true , None), doc_imp(ReplaceDocuments, true , None)]), @"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: false, primary_key: None, operation_ids: [0] }, false))");
// we can autobatch one or multiple `UpdateDocuments` together.
// if the index exists.
debug_snapshot!(autobatch_from(true, None, [doc_imp(UpdateDocuments, true, None)]), @"Some((DocumentImport { method: UpdateDocuments, allow_index_creation: true, primary_key: None, import_ids: [0] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(UpdateDocuments, true, None), doc_imp(UpdateDocuments, true, None), doc_imp(UpdateDocuments, true, None)]), @"Some((DocumentImport { method: UpdateDocuments, allow_index_creation: true, primary_key: None, import_ids: [0, 1, 2] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(UpdateDocuments, false, None)]), @"Some((DocumentImport { method: UpdateDocuments, allow_index_creation: false, primary_key: None, import_ids: [0] }, false))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(UpdateDocuments, false, None), doc_imp(UpdateDocuments, false, None), doc_imp(UpdateDocuments, false, None)]), @"Some((DocumentImport { method: UpdateDocuments, allow_index_creation: false, primary_key: None, import_ids: [0, 1, 2] }, false))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(UpdateDocuments, true, None)]), @"Some((DocumentOperation { method: UpdateDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(UpdateDocuments, true, None), doc_imp(UpdateDocuments, true, None), doc_imp(UpdateDocuments, true, None)]), @"Some((DocumentOperation { method: UpdateDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0, 1, 2] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(UpdateDocuments, false, None)]), @"Some((DocumentOperation { method: UpdateDocuments, allow_index_creation: false, primary_key: None, operation_ids: [0] }, false))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(UpdateDocuments, false, None), doc_imp(UpdateDocuments, false, None), doc_imp(UpdateDocuments, false, None)]), @"Some((DocumentOperation { method: UpdateDocuments, allow_index_creation: false, primary_key: None, operation_ids: [0, 1, 2] }, false))");
// if it doesn't exists.
debug_snapshot!(autobatch_from(false,None, [doc_imp(UpdateDocuments, true, None)]), @"Some((DocumentImport { method: UpdateDocuments, allow_index_creation: true, primary_key: None, import_ids: [0] }, true))");
debug_snapshot!(autobatch_from(false,None, [doc_imp(UpdateDocuments, true, None), doc_imp(UpdateDocuments, true, None), doc_imp(UpdateDocuments, true, None)]), @"Some((DocumentImport { method: UpdateDocuments, allow_index_creation: true, primary_key: None, import_ids: [0, 1, 2] }, true))");
debug_snapshot!(autobatch_from(false,None, [doc_imp(UpdateDocuments, false, None)]), @"Some((DocumentImport { method: UpdateDocuments, allow_index_creation: false, primary_key: None, import_ids: [0] }, false))");
debug_snapshot!(autobatch_from(false,None, [doc_imp(UpdateDocuments, false, None), doc_imp(UpdateDocuments, false, None), doc_imp(UpdateDocuments, false, None)]), @"Some((DocumentImport { method: UpdateDocuments, allow_index_creation: false, primary_key: None, import_ids: [0, 1, 2] }, false))");
debug_snapshot!(autobatch_from(false,None, [doc_imp(UpdateDocuments, true, None)]), @"Some((DocumentOperation { method: UpdateDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0] }, true))");
debug_snapshot!(autobatch_from(false,None, [doc_imp(UpdateDocuments, true, None), doc_imp(UpdateDocuments, true, None), doc_imp(UpdateDocuments, true, None)]), @"Some((DocumentOperation { method: UpdateDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0, 1, 2] }, true))");
debug_snapshot!(autobatch_from(false,None, [doc_imp(UpdateDocuments, false, None)]), @"Some((DocumentOperation { method: UpdateDocuments, allow_index_creation: false, primary_key: None, operation_ids: [0] }, false))");
debug_snapshot!(autobatch_from(false,None, [doc_imp(UpdateDocuments, false, None), doc_imp(UpdateDocuments, false, None), doc_imp(UpdateDocuments, false, None)]), @"Some((DocumentOperation { method: UpdateDocuments, allow_index_creation: false, primary_key: None, operation_ids: [0, 1, 2] }, false))");
// we can autobatch one or multiple DocumentDeletion together
debug_snapshot!(autobatch_from(true, None, [doc_del()]), @"Some((DocumentDeletion { deletion_ids: [0] }, false))");
@ -628,56 +673,83 @@ mod tests {
debug_snapshot!(autobatch_from(false,None, [settings(true), settings(true), settings(true)]), @"Some((Settings { allow_index_creation: true, settings_ids: [0, 1, 2] }, true))");
debug_snapshot!(autobatch_from(false,None, [settings(false)]), @"Some((Settings { allow_index_creation: false, settings_ids: [0] }, false))");
debug_snapshot!(autobatch_from(false,None, [settings(false), settings(false), settings(false)]), @"Some((Settings { allow_index_creation: false, settings_ids: [0, 1, 2] }, false))");
// We can autobatch document addition with document deletion
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, None), doc_del()]), @"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0, 1] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(UpdateDocuments, true, None), doc_del()]), @"Some((DocumentOperation { method: UpdateDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0, 1] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, false, None), doc_del()]), @"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: false, primary_key: None, operation_ids: [0, 1] }, false))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(UpdateDocuments, false, None), doc_del()]), @"Some((DocumentOperation { method: UpdateDocuments, allow_index_creation: false, primary_key: None, operation_ids: [0, 1] }, false))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, Some("catto")), doc_del()]), @r###"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: true, primary_key: Some("catto"), operation_ids: [0, 1] }, true))"###);
debug_snapshot!(autobatch_from(true, None, [doc_imp(UpdateDocuments, true, Some("catto")), doc_del()]), @r###"Some((DocumentOperation { method: UpdateDocuments, allow_index_creation: true, primary_key: Some("catto"), operation_ids: [0, 1] }, true))"###);
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, false, Some("catto")), doc_del()]), @r###"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: false, primary_key: Some("catto"), operation_ids: [0, 1] }, false))"###);
debug_snapshot!(autobatch_from(true, None, [doc_imp(UpdateDocuments, false, Some("catto")), doc_del()]), @r###"Some((DocumentOperation { method: UpdateDocuments, allow_index_creation: false, primary_key: Some("catto"), operation_ids: [0, 1] }, false))"###);
debug_snapshot!(autobatch_from(false, None, [doc_imp(ReplaceDocuments, true, None), doc_del()]), @"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0, 1] }, true))");
debug_snapshot!(autobatch_from(false, None, [doc_imp(UpdateDocuments, true, None), doc_del()]), @"Some((DocumentOperation { method: UpdateDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0, 1] }, true))");
debug_snapshot!(autobatch_from(false, None, [doc_imp(ReplaceDocuments, false, None), doc_del()]), @"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: false, primary_key: None, operation_ids: [0, 1] }, false))");
debug_snapshot!(autobatch_from(false, None, [doc_imp(UpdateDocuments, false, None), doc_del()]), @"Some((DocumentOperation { method: UpdateDocuments, allow_index_creation: false, primary_key: None, operation_ids: [0, 1] }, false))");
debug_snapshot!(autobatch_from(false, None, [doc_imp(ReplaceDocuments, true, Some("catto")), doc_del()]), @r###"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: true, primary_key: Some("catto"), operation_ids: [0, 1] }, true))"###);
debug_snapshot!(autobatch_from(false, None, [doc_imp(UpdateDocuments, true, Some("catto")), doc_del()]), @r###"Some((DocumentOperation { method: UpdateDocuments, allow_index_creation: true, primary_key: Some("catto"), operation_ids: [0, 1] }, true))"###);
debug_snapshot!(autobatch_from(false, None, [doc_imp(ReplaceDocuments, false, Some("catto")), doc_del()]), @r###"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: false, primary_key: Some("catto"), operation_ids: [0, 1] }, false))"###);
debug_snapshot!(autobatch_from(false, None, [doc_imp(UpdateDocuments, false, Some("catto")), doc_del()]), @r###"Some((DocumentOperation { method: UpdateDocuments, allow_index_creation: false, primary_key: Some("catto"), operation_ids: [0, 1] }, false))"###);
// And the other way around
debug_snapshot!(autobatch_from(true, None, [doc_del(), doc_imp(ReplaceDocuments, true, None)]), @"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0, 1] }, false))");
debug_snapshot!(autobatch_from(true, None, [doc_del(), doc_imp(UpdateDocuments, true, None)]), @"Some((DocumentOperation { method: UpdateDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0, 1] }, false))");
debug_snapshot!(autobatch_from(true, None, [doc_del(), doc_imp(ReplaceDocuments, false, None)]), @"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: false, primary_key: None, operation_ids: [0, 1] }, false))");
debug_snapshot!(autobatch_from(true, None, [doc_del(), doc_imp(UpdateDocuments, false, None)]), @"Some((DocumentOperation { method: UpdateDocuments, allow_index_creation: false, primary_key: None, operation_ids: [0, 1] }, false))");
debug_snapshot!(autobatch_from(true, None, [doc_del(), doc_imp(ReplaceDocuments, true, Some("catto"))]), @r###"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: true, primary_key: Some("catto"), operation_ids: [0, 1] }, false))"###);
debug_snapshot!(autobatch_from(true, None, [doc_del(), doc_imp(UpdateDocuments, true, Some("catto"))]), @r###"Some((DocumentOperation { method: UpdateDocuments, allow_index_creation: true, primary_key: Some("catto"), operation_ids: [0, 1] }, false))"###);
debug_snapshot!(autobatch_from(true, None, [doc_del(), doc_imp(ReplaceDocuments, false, Some("catto"))]), @r###"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: false, primary_key: Some("catto"), operation_ids: [0, 1] }, false))"###);
debug_snapshot!(autobatch_from(true, None, [doc_del(), doc_imp(UpdateDocuments, false, Some("catto"))]), @r###"Some((DocumentOperation { method: UpdateDocuments, allow_index_creation: false, primary_key: Some("catto"), operation_ids: [0, 1] }, false))"###);
debug_snapshot!(autobatch_from(false, None, [doc_del(), doc_imp(ReplaceDocuments, false, None)]), @"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: false, primary_key: None, operation_ids: [0, 1] }, false))");
debug_snapshot!(autobatch_from(false, None, [doc_del(), doc_imp(UpdateDocuments, false, None)]), @"Some((DocumentOperation { method: UpdateDocuments, allow_index_creation: false, primary_key: None, operation_ids: [0, 1] }, false))");
debug_snapshot!(autobatch_from(false, None, [doc_del(), doc_imp(ReplaceDocuments, false, Some("catto"))]), @r###"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: false, primary_key: Some("catto"), operation_ids: [0, 1] }, false))"###);
debug_snapshot!(autobatch_from(false, None, [doc_del(), doc_imp(UpdateDocuments, false, Some("catto"))]), @r###"Some((DocumentOperation { method: UpdateDocuments, allow_index_creation: false, primary_key: Some("catto"), operation_ids: [0, 1] }, false))"###);
}
#[test]
fn simple_document_operation_dont_autobatch_with_other() {
// addition, updates and deletion can't batch together
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, None), doc_imp(UpdateDocuments, true, None)]), @"Some((DocumentImport { method: ReplaceDocuments, allow_index_creation: true, primary_key: None, import_ids: [0] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, None), doc_del()]), @"Some((DocumentImport { method: ReplaceDocuments, allow_index_creation: true, primary_key: None, import_ids: [0] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(UpdateDocuments, true, None), doc_imp(ReplaceDocuments, true, None)]), @"Some((DocumentImport { method: UpdateDocuments, allow_index_creation: true, primary_key: None, import_ids: [0] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(UpdateDocuments, true, None), doc_del()]), @"Some((DocumentImport { method: UpdateDocuments, allow_index_creation: true, primary_key: None, import_ids: [0] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_del(), doc_imp(ReplaceDocuments, true, None)]), @"Some((DocumentDeletion { deletion_ids: [0] }, false))");
debug_snapshot!(autobatch_from(true, None, [doc_del(), doc_imp(UpdateDocuments, true, None)]), @"Some((DocumentDeletion { deletion_ids: [0] }, false))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, None), doc_imp(UpdateDocuments, true, None)]), @"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(UpdateDocuments, true, None), doc_imp(ReplaceDocuments, true, None)]), @"Some((DocumentOperation { method: UpdateDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, None), idx_create()]), @"Some((DocumentImport { method: ReplaceDocuments, allow_index_creation: true, primary_key: None, import_ids: [0] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(UpdateDocuments, true, None), idx_create()]), @"Some((DocumentImport { method: UpdateDocuments, allow_index_creation: true, primary_key: None, import_ids: [0] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, None), idx_create()]), @"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(UpdateDocuments, true, None), idx_create()]), @"Some((DocumentOperation { method: UpdateDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_del(), idx_create()]), @"Some((DocumentDeletion { deletion_ids: [0] }, false))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, None), idx_update()]), @"Some((DocumentImport { method: ReplaceDocuments, allow_index_creation: true, primary_key: None, import_ids: [0] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(UpdateDocuments, true, None), idx_update()]), @"Some((DocumentImport { method: UpdateDocuments, allow_index_creation: true, primary_key: None, import_ids: [0] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, None), idx_update()]), @"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(UpdateDocuments, true, None), idx_update()]), @"Some((DocumentOperation { method: UpdateDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_del(), idx_update()]), @"Some((DocumentDeletion { deletion_ids: [0] }, false))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, None), idx_swap()]), @"Some((DocumentImport { method: ReplaceDocuments, allow_index_creation: true, primary_key: None, import_ids: [0] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(UpdateDocuments, true, None), idx_swap()]), @"Some((DocumentImport { method: UpdateDocuments, allow_index_creation: true, primary_key: None, import_ids: [0] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, None), idx_swap()]), @"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(UpdateDocuments, true, None), idx_swap()]), @"Some((DocumentOperation { method: UpdateDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_del(), idx_swap()]), @"Some((DocumentDeletion { deletion_ids: [0] }, false))");
}
#[test]
fn document_addition_batch_with_settings() {
// simple case
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, None), settings(true)]), @"Some((SettingsAndDocumentImport { settings_ids: [1], method: ReplaceDocuments, allow_index_creation: true, primary_key: None, import_ids: [0] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(UpdateDocuments, true, None), settings(true)]), @"Some((SettingsAndDocumentImport { settings_ids: [1], method: UpdateDocuments, allow_index_creation: true, primary_key: None, import_ids: [0] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, None), settings(true)]), @"Some((SettingsAndDocumentOperation { settings_ids: [1], method: ReplaceDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(UpdateDocuments, true, None), settings(true)]), @"Some((SettingsAndDocumentOperation { settings_ids: [1], method: UpdateDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0] }, true))");
// multiple settings and doc addition
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, None), doc_imp(ReplaceDocuments, true, None), settings(true), settings(true)]), @"Some((SettingsAndDocumentImport { settings_ids: [2, 3], method: ReplaceDocuments, allow_index_creation: true, primary_key: None, import_ids: [0, 1] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, None), doc_imp(ReplaceDocuments, true, None), settings(true), settings(true)]), @"Some((SettingsAndDocumentImport { settings_ids: [2, 3], method: ReplaceDocuments, allow_index_creation: true, primary_key: None, import_ids: [0, 1] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, None), doc_imp(ReplaceDocuments, true, None), settings(true), settings(true)]), @"Some((SettingsAndDocumentOperation { settings_ids: [2, 3], method: ReplaceDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0, 1] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, None), doc_imp(ReplaceDocuments, true, None), settings(true), settings(true)]), @"Some((SettingsAndDocumentOperation { settings_ids: [2, 3], method: ReplaceDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0, 1] }, true))");
// addition and setting unordered
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, None), settings(true), doc_imp(ReplaceDocuments, true, None), settings(true)]), @"Some((SettingsAndDocumentImport { settings_ids: [1, 3], method: ReplaceDocuments, allow_index_creation: true, primary_key: None, import_ids: [0, 2] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(UpdateDocuments, true, None), settings(true), doc_imp(UpdateDocuments, true, None), settings(true)]), @"Some((SettingsAndDocumentImport { settings_ids: [1, 3], method: UpdateDocuments, allow_index_creation: true, primary_key: None, import_ids: [0, 2] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, None), settings(true), doc_imp(ReplaceDocuments, true, None), settings(true)]), @"Some((SettingsAndDocumentOperation { settings_ids: [1, 3], method: ReplaceDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0, 2] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(UpdateDocuments, true, None), settings(true), doc_imp(UpdateDocuments, true, None), settings(true)]), @"Some((SettingsAndDocumentOperation { settings_ids: [1, 3], method: UpdateDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0, 2] }, true))");
// We ensure this kind of batch doesn't batch with forbidden operations
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, None), settings(true), doc_imp(UpdateDocuments, true, None)]), @"Some((SettingsAndDocumentImport { settings_ids: [1], method: ReplaceDocuments, allow_index_creation: true, primary_key: None, import_ids: [0] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(UpdateDocuments, true, None), settings(true), doc_imp(ReplaceDocuments, true, None)]), @"Some((SettingsAndDocumentImport { settings_ids: [1], method: UpdateDocuments, allow_index_creation: true, primary_key: None, import_ids: [0] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, None), settings(true), doc_del()]), @"Some((SettingsAndDocumentImport { settings_ids: [1], method: ReplaceDocuments, allow_index_creation: true, primary_key: None, import_ids: [0] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(UpdateDocuments, true, None), settings(true), doc_del()]), @"Some((SettingsAndDocumentImport { settings_ids: [1], method: UpdateDocuments, allow_index_creation: true, primary_key: None, import_ids: [0] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, None), settings(true), idx_create()]), @"Some((SettingsAndDocumentImport { settings_ids: [1], method: ReplaceDocuments, allow_index_creation: true, primary_key: None, import_ids: [0] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(UpdateDocuments, true, None), settings(true), idx_create()]), @"Some((SettingsAndDocumentImport { settings_ids: [1], method: UpdateDocuments, allow_index_creation: true, primary_key: None, import_ids: [0] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, None), settings(true), idx_update()]), @"Some((SettingsAndDocumentImport { settings_ids: [1], method: ReplaceDocuments, allow_index_creation: true, primary_key: None, import_ids: [0] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(UpdateDocuments, true, None), settings(true), idx_update()]), @"Some((SettingsAndDocumentImport { settings_ids: [1], method: UpdateDocuments, allow_index_creation: true, primary_key: None, import_ids: [0] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, None), settings(true), idx_swap()]), @"Some((SettingsAndDocumentImport { settings_ids: [1], method: ReplaceDocuments, allow_index_creation: true, primary_key: None, import_ids: [0] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(UpdateDocuments, true, None), settings(true), idx_swap()]), @"Some((SettingsAndDocumentImport { settings_ids: [1], method: UpdateDocuments, allow_index_creation: true, primary_key: None, import_ids: [0] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, None), settings(true), doc_imp(UpdateDocuments, true, None)]), @"Some((SettingsAndDocumentOperation { settings_ids: [1], method: ReplaceDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(UpdateDocuments, true, None), settings(true), doc_imp(ReplaceDocuments, true, None)]), @"Some((SettingsAndDocumentOperation { settings_ids: [1], method: UpdateDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, None), settings(true), doc_del()]), @"Some((SettingsAndDocumentOperation { settings_ids: [1], method: ReplaceDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(UpdateDocuments, true, None), settings(true), doc_del()]), @"Some((SettingsAndDocumentOperation { settings_ids: [1], method: UpdateDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, None), settings(true), idx_create()]), @"Some((SettingsAndDocumentOperation { settings_ids: [1], method: ReplaceDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(UpdateDocuments, true, None), settings(true), idx_create()]), @"Some((SettingsAndDocumentOperation { settings_ids: [1], method: UpdateDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, None), settings(true), idx_update()]), @"Some((SettingsAndDocumentOperation { settings_ids: [1], method: ReplaceDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(UpdateDocuments, true, None), settings(true), idx_update()]), @"Some((SettingsAndDocumentOperation { settings_ids: [1], method: UpdateDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, None), settings(true), idx_swap()]), @"Some((SettingsAndDocumentOperation { settings_ids: [1], method: ReplaceDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(UpdateDocuments, true, None), settings(true), idx_swap()]), @"Some((SettingsAndDocumentOperation { settings_ids: [1], method: UpdateDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0] }, true))");
}
#[test]
@ -789,67 +861,73 @@ mod tests {
debug_snapshot!(autobatch_from(false,None, [doc_imp(UpdateDocuments, false, None), settings(false), doc_clr(), idx_del()]), @"Some((IndexDeletion { ids: [1, 3, 0, 2] }, false))");
// The third and final case is when the first task doesn't create an index but is directly followed by a task creating an index. In this case we can't batch with what
// follows because we first need to process the erroneous batch.
debug_snapshot!(autobatch_from(false,None, [doc_imp(ReplaceDocuments,false, None), settings(true), idx_del()]), @"Some((DocumentImport { method: ReplaceDocuments, allow_index_creation: false, primary_key: None, import_ids: [0] }, false))");
debug_snapshot!(autobatch_from(false,None, [doc_imp(UpdateDocuments, false, None), settings(true), idx_del()]), @"Some((DocumentImport { method: UpdateDocuments, allow_index_creation: false, primary_key: None, import_ids: [0] }, false))");
debug_snapshot!(autobatch_from(false,None, [doc_imp(ReplaceDocuments,false, None), settings(true), doc_clr(), idx_del()]), @"Some((DocumentImport { method: ReplaceDocuments, allow_index_creation: false, primary_key: None, import_ids: [0] }, false))");
debug_snapshot!(autobatch_from(false,None, [doc_imp(UpdateDocuments, false, None), settings(true), doc_clr(), idx_del()]), @"Some((DocumentImport { method: UpdateDocuments, allow_index_creation: false, primary_key: None, import_ids: [0] }, false))");
debug_snapshot!(autobatch_from(false,None, [doc_imp(ReplaceDocuments,false, None), settings(true), idx_del()]), @"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: false, primary_key: None, operation_ids: [0] }, false))");
debug_snapshot!(autobatch_from(false,None, [doc_imp(UpdateDocuments, false, None), settings(true), idx_del()]), @"Some((DocumentOperation { method: UpdateDocuments, allow_index_creation: false, primary_key: None, operation_ids: [0] }, false))");
debug_snapshot!(autobatch_from(false,None, [doc_imp(ReplaceDocuments,false, None), settings(true), doc_clr(), idx_del()]), @"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: false, primary_key: None, operation_ids: [0] }, false))");
debug_snapshot!(autobatch_from(false,None, [doc_imp(UpdateDocuments, false, None), settings(true), doc_clr(), idx_del()]), @"Some((DocumentOperation { method: UpdateDocuments, allow_index_creation: false, primary_key: None, operation_ids: [0] }, false))");
}
#[test]
fn allowed_and_disallowed_index_creation() {
// `DocumentImport` tasks that are allowed to create an index can't be batched with ones that are not, except if the index already exists.
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, false, None), doc_imp(ReplaceDocuments, true, None)]), @"Some((DocumentImport { method: ReplaceDocuments, allow_index_creation: false, primary_key: None, import_ids: [0, 1] }, false))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, None), doc_imp(ReplaceDocuments, true, None)]), @"Some((DocumentImport { method: ReplaceDocuments, allow_index_creation: true, primary_key: None, import_ids: [0, 1] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, false, None), doc_imp(ReplaceDocuments, false, None)]), @"Some((DocumentImport { method: ReplaceDocuments, allow_index_creation: false, primary_key: None, import_ids: [0, 1] }, false))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, None), settings(true)]), @"Some((SettingsAndDocumentImport { settings_ids: [1], method: ReplaceDocuments, allow_index_creation: true, primary_key: None, import_ids: [0] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, false, None), settings(true)]), @"Some((SettingsAndDocumentImport { settings_ids: [1], method: ReplaceDocuments, allow_index_creation: false, primary_key: None, import_ids: [0] }, false))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, false, None), doc_imp(ReplaceDocuments, true, None)]), @"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: false, primary_key: None, operation_ids: [0, 1] }, false))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, None), doc_imp(ReplaceDocuments, true, None)]), @"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0, 1] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, false, None), doc_imp(ReplaceDocuments, false, None)]), @"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: false, primary_key: None, operation_ids: [0, 1] }, false))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, None), settings(true)]), @"Some((SettingsAndDocumentOperation { settings_ids: [1], method: ReplaceDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, false, None), settings(true)]), @"Some((SettingsAndDocumentOperation { settings_ids: [1], method: ReplaceDocuments, allow_index_creation: false, primary_key: None, operation_ids: [0] }, false))");
debug_snapshot!(autobatch_from(false,None, [doc_imp(ReplaceDocuments, false, None), doc_imp(ReplaceDocuments, true, None)]), @"Some((DocumentImport { method: ReplaceDocuments, allow_index_creation: false, primary_key: None, import_ids: [0] }, false))");
debug_snapshot!(autobatch_from(false,None, [doc_imp(ReplaceDocuments, true, None), doc_imp(ReplaceDocuments, true, None)]), @"Some((DocumentImport { method: ReplaceDocuments, allow_index_creation: true, primary_key: None, import_ids: [0, 1] }, true))");
debug_snapshot!(autobatch_from(false,None, [doc_imp(ReplaceDocuments, false, None), doc_imp(ReplaceDocuments, false, None)]), @"Some((DocumentImport { method: ReplaceDocuments, allow_index_creation: false, primary_key: None, import_ids: [0, 1] }, false))");
debug_snapshot!(autobatch_from(false,None, [doc_imp(ReplaceDocuments, true, None), settings(true)]), @"Some((SettingsAndDocumentImport { settings_ids: [1], method: ReplaceDocuments, allow_index_creation: true, primary_key: None, import_ids: [0] }, true))");
debug_snapshot!(autobatch_from(false,None, [doc_imp(ReplaceDocuments, false, None), settings(true)]), @"Some((DocumentImport { method: ReplaceDocuments, allow_index_creation: false, primary_key: None, import_ids: [0] }, false))");
debug_snapshot!(autobatch_from(false,None, [doc_imp(ReplaceDocuments, false, None), doc_imp(ReplaceDocuments, true, None)]), @"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: false, primary_key: None, operation_ids: [0] }, false))");
debug_snapshot!(autobatch_from(false,None, [doc_imp(ReplaceDocuments, true, None), doc_imp(ReplaceDocuments, true, None)]), @"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0, 1] }, true))");
debug_snapshot!(autobatch_from(false,None, [doc_imp(ReplaceDocuments, false, None), doc_imp(ReplaceDocuments, false, None)]), @"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: false, primary_key: None, operation_ids: [0, 1] }, false))");
debug_snapshot!(autobatch_from(false,None, [doc_imp(ReplaceDocuments, true, None), settings(true)]), @"Some((SettingsAndDocumentOperation { settings_ids: [1], method: ReplaceDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0] }, true))");
debug_snapshot!(autobatch_from(false,None, [doc_imp(ReplaceDocuments, false, None), settings(true)]), @"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: false, primary_key: None, operation_ids: [0] }, false))");
// batch deletion and addition
debug_snapshot!(autobatch_from(false, None, [doc_del(), doc_imp(ReplaceDocuments, true, Some("catto"))]), @"Some((DocumentDeletion { deletion_ids: [0] }, false))");
debug_snapshot!(autobatch_from(false, None, [doc_del(), doc_imp(UpdateDocuments, true, Some("catto"))]), @"Some((DocumentDeletion { deletion_ids: [0] }, false))");
debug_snapshot!(autobatch_from(false, None, [doc_del(), doc_imp(ReplaceDocuments, true, None)]), @"Some((DocumentDeletion { deletion_ids: [0] }, false))");
debug_snapshot!(autobatch_from(false, None, [doc_del(), doc_imp(UpdateDocuments, true, None)]), @"Some((DocumentDeletion { deletion_ids: [0] }, false))");
}
#[test]
fn autobatch_primary_key() {
// ==> If I have a pk
// With a single update
debug_snapshot!(autobatch_from(true, Some("id"), [doc_imp(ReplaceDocuments, true, None)]), @"Some((DocumentImport { method: ReplaceDocuments, allow_index_creation: true, primary_key: None, import_ids: [0] }, true))");
debug_snapshot!(autobatch_from(true, Some("id"), [doc_imp(ReplaceDocuments, true, Some("id"))]), @r###"Some((DocumentImport { method: ReplaceDocuments, allow_index_creation: true, primary_key: Some("id"), import_ids: [0] }, true))"###);
debug_snapshot!(autobatch_from(true, Some("id"), [doc_imp(ReplaceDocuments, true, Some("other"))]), @r###"Some((DocumentImport { method: ReplaceDocuments, allow_index_creation: true, primary_key: Some("other"), import_ids: [0] }, true))"###);
debug_snapshot!(autobatch_from(true, Some("id"), [doc_imp(ReplaceDocuments, true, None)]), @"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0] }, true))");
debug_snapshot!(autobatch_from(true, Some("id"), [doc_imp(ReplaceDocuments, true, Some("id"))]), @r###"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: true, primary_key: Some("id"), operation_ids: [0] }, true))"###);
debug_snapshot!(autobatch_from(true, Some("id"), [doc_imp(ReplaceDocuments, true, Some("other"))]), @r###"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: true, primary_key: Some("other"), operation_ids: [0] }, true))"###);
// With multiple updates
debug_snapshot!(autobatch_from(true, Some("id"), [doc_imp(ReplaceDocuments, true, None), doc_imp(ReplaceDocuments, true, None)]), @"Some((DocumentImport { method: ReplaceDocuments, allow_index_creation: true, primary_key: None, import_ids: [0, 1] }, true))");
debug_snapshot!(autobatch_from(true, Some("id"), [doc_imp(ReplaceDocuments, true, None), doc_imp(ReplaceDocuments, true, Some("id"))]), @r###"Some((DocumentImport { method: ReplaceDocuments, allow_index_creation: true, primary_key: Some("id"), import_ids: [0, 1] }, true))"###);
debug_snapshot!(autobatch_from(true, Some("id"), [doc_imp(ReplaceDocuments, true, None), doc_imp(ReplaceDocuments, true, Some("id")), doc_imp(ReplaceDocuments, true, None)]), @r###"Some((DocumentImport { method: ReplaceDocuments, allow_index_creation: true, primary_key: Some("id"), import_ids: [0, 1] }, true))"###);
debug_snapshot!(autobatch_from(true, Some("id"), [doc_imp(ReplaceDocuments, true, None), doc_imp(ReplaceDocuments, true, Some("other"))]), @"Some((DocumentImport { method: ReplaceDocuments, allow_index_creation: true, primary_key: None, import_ids: [0] }, true))");
debug_snapshot!(autobatch_from(true, Some("id"), [doc_imp(ReplaceDocuments, true, None), doc_imp(ReplaceDocuments, true, Some("other")), doc_imp(ReplaceDocuments, true, None)]), @"Some((DocumentImport { method: ReplaceDocuments, allow_index_creation: true, primary_key: None, import_ids: [0] }, true))");
debug_snapshot!(autobatch_from(true, Some("id"), [doc_imp(ReplaceDocuments, true, None), doc_imp(ReplaceDocuments, true, Some("other")), doc_imp(ReplaceDocuments, true, Some("id"))]), @"Some((DocumentImport { method: ReplaceDocuments, allow_index_creation: true, primary_key: None, import_ids: [0] }, true))");
debug_snapshot!(autobatch_from(true, Some("id"), [doc_imp(ReplaceDocuments, true, None), doc_imp(ReplaceDocuments, true, None)]), @"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0, 1] }, true))");
debug_snapshot!(autobatch_from(true, Some("id"), [doc_imp(ReplaceDocuments, true, None), doc_imp(ReplaceDocuments, true, Some("id"))]), @r###"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: true, primary_key: Some("id"), operation_ids: [0, 1] }, true))"###);
debug_snapshot!(autobatch_from(true, Some("id"), [doc_imp(ReplaceDocuments, true, None), doc_imp(ReplaceDocuments, true, Some("id")), doc_imp(ReplaceDocuments, true, None)]), @r###"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: true, primary_key: Some("id"), operation_ids: [0, 1] }, true))"###);
debug_snapshot!(autobatch_from(true, Some("id"), [doc_imp(ReplaceDocuments, true, None), doc_imp(ReplaceDocuments, true, Some("other"))]), @"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0] }, true))");
debug_snapshot!(autobatch_from(true, Some("id"), [doc_imp(ReplaceDocuments, true, None), doc_imp(ReplaceDocuments, true, Some("other")), doc_imp(ReplaceDocuments, true, None)]), @"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0] }, true))");
debug_snapshot!(autobatch_from(true, Some("id"), [doc_imp(ReplaceDocuments, true, None), doc_imp(ReplaceDocuments, true, Some("other")), doc_imp(ReplaceDocuments, true, Some("id"))]), @"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0] }, true))");
debug_snapshot!(autobatch_from(true, Some("id"), [doc_imp(ReplaceDocuments, true, Some("id")), doc_imp(ReplaceDocuments, true, None)]), @r###"Some((DocumentImport { method: ReplaceDocuments, allow_index_creation: true, primary_key: Some("id"), import_ids: [0] }, true))"###);
debug_snapshot!(autobatch_from(true, Some("id"), [doc_imp(ReplaceDocuments, true, Some("id")), doc_imp(ReplaceDocuments, true, Some("id"))]), @r###"Some((DocumentImport { method: ReplaceDocuments, allow_index_creation: true, primary_key: Some("id"), import_ids: [0, 1] }, true))"###);
debug_snapshot!(autobatch_from(true, Some("id"), [doc_imp(ReplaceDocuments, true, Some("id")), doc_imp(ReplaceDocuments, true, Some("id")), doc_imp(ReplaceDocuments, true, None)]), @r###"Some((DocumentImport { method: ReplaceDocuments, allow_index_creation: true, primary_key: Some("id"), import_ids: [0, 1] }, true))"###);
debug_snapshot!(autobatch_from(true, Some("id"), [doc_imp(ReplaceDocuments, true, Some("id")), doc_imp(ReplaceDocuments, true, Some("other"))]), @r###"Some((DocumentImport { method: ReplaceDocuments, allow_index_creation: true, primary_key: Some("id"), import_ids: [0] }, true))"###);
debug_snapshot!(autobatch_from(true, Some("id"), [doc_imp(ReplaceDocuments, true, Some("id")), doc_imp(ReplaceDocuments, true, Some("other")), doc_imp(ReplaceDocuments, true, None)]), @r###"Some((DocumentImport { method: ReplaceDocuments, allow_index_creation: true, primary_key: Some("id"), import_ids: [0] }, true))"###);
debug_snapshot!(autobatch_from(true, Some("id"), [doc_imp(ReplaceDocuments, true, Some("id")), doc_imp(ReplaceDocuments, true, Some("other")), doc_imp(ReplaceDocuments, true, Some("id"))]), @r###"Some((DocumentImport { method: ReplaceDocuments, allow_index_creation: true, primary_key: Some("id"), import_ids: [0] }, true))"###);
debug_snapshot!(autobatch_from(true, Some("id"), [doc_imp(ReplaceDocuments, true, Some("id")), doc_imp(ReplaceDocuments, true, None)]), @r###"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: true, primary_key: Some("id"), operation_ids: [0] }, true))"###);
debug_snapshot!(autobatch_from(true, Some("id"), [doc_imp(ReplaceDocuments, true, Some("id")), doc_imp(ReplaceDocuments, true, Some("id"))]), @r###"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: true, primary_key: Some("id"), operation_ids: [0, 1] }, true))"###);
debug_snapshot!(autobatch_from(true, Some("id"), [doc_imp(ReplaceDocuments, true, Some("id")), doc_imp(ReplaceDocuments, true, Some("id")), doc_imp(ReplaceDocuments, true, None)]), @r###"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: true, primary_key: Some("id"), operation_ids: [0, 1] }, true))"###);
debug_snapshot!(autobatch_from(true, Some("id"), [doc_imp(ReplaceDocuments, true, Some("id")), doc_imp(ReplaceDocuments, true, Some("other"))]), @r###"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: true, primary_key: Some("id"), operation_ids: [0] }, true))"###);
debug_snapshot!(autobatch_from(true, Some("id"), [doc_imp(ReplaceDocuments, true, Some("id")), doc_imp(ReplaceDocuments, true, Some("other")), doc_imp(ReplaceDocuments, true, None)]), @r###"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: true, primary_key: Some("id"), operation_ids: [0] }, true))"###);
debug_snapshot!(autobatch_from(true, Some("id"), [doc_imp(ReplaceDocuments, true, Some("id")), doc_imp(ReplaceDocuments, true, Some("other")), doc_imp(ReplaceDocuments, true, Some("id"))]), @r###"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: true, primary_key: Some("id"), operation_ids: [0] }, true))"###);
debug_snapshot!(autobatch_from(true, Some("id"), [doc_imp(ReplaceDocuments, true, Some("other")), doc_imp(ReplaceDocuments, true, None)]), @r###"Some((DocumentImport { method: ReplaceDocuments, allow_index_creation: true, primary_key: Some("other"), import_ids: [0] }, true))"###);
debug_snapshot!(autobatch_from(true, Some("id"), [doc_imp(ReplaceDocuments, true, Some("other")), doc_imp(ReplaceDocuments, true, Some("id"))]), @r###"Some((DocumentImport { method: ReplaceDocuments, allow_index_creation: true, primary_key: Some("other"), import_ids: [0] }, true))"###);
debug_snapshot!(autobatch_from(true, Some("id"), [doc_imp(ReplaceDocuments, true, Some("other")), doc_imp(ReplaceDocuments, true, Some("id")), doc_imp(ReplaceDocuments, true, None)]), @r###"Some((DocumentImport { method: ReplaceDocuments, allow_index_creation: true, primary_key: Some("other"), import_ids: [0] }, true))"###);
debug_snapshot!(autobatch_from(true, Some("id"), [doc_imp(ReplaceDocuments, true, Some("other")), doc_imp(ReplaceDocuments, true, Some("other"))]), @r###"Some((DocumentImport { method: ReplaceDocuments, allow_index_creation: true, primary_key: Some("other"), import_ids: [0] }, true))"###);
debug_snapshot!(autobatch_from(true, Some("id"), [doc_imp(ReplaceDocuments, true, Some("other")), doc_imp(ReplaceDocuments, true, Some("other")), doc_imp(ReplaceDocuments, true, None)]), @r###"Some((DocumentImport { method: ReplaceDocuments, allow_index_creation: true, primary_key: Some("other"), import_ids: [0] }, true))"###);
debug_snapshot!(autobatch_from(true, Some("id"), [doc_imp(ReplaceDocuments, true, Some("other")), doc_imp(ReplaceDocuments, true, Some("other")), doc_imp(ReplaceDocuments, true, Some("id"))]), @r###"Some((DocumentImport { method: ReplaceDocuments, allow_index_creation: true, primary_key: Some("other"), import_ids: [0] }, true))"###);
debug_snapshot!(autobatch_from(true, Some("id"), [doc_imp(ReplaceDocuments, true, Some("other")), doc_imp(ReplaceDocuments, true, None)]), @r###"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: true, primary_key: Some("other"), operation_ids: [0] }, true))"###);
debug_snapshot!(autobatch_from(true, Some("id"), [doc_imp(ReplaceDocuments, true, Some("other")), doc_imp(ReplaceDocuments, true, Some("id"))]), @r###"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: true, primary_key: Some("other"), operation_ids: [0] }, true))"###);
debug_snapshot!(autobatch_from(true, Some("id"), [doc_imp(ReplaceDocuments, true, Some("other")), doc_imp(ReplaceDocuments, true, Some("id")), doc_imp(ReplaceDocuments, true, None)]), @r###"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: true, primary_key: Some("other"), operation_ids: [0] }, true))"###);
debug_snapshot!(autobatch_from(true, Some("id"), [doc_imp(ReplaceDocuments, true, Some("other")), doc_imp(ReplaceDocuments, true, Some("other"))]), @r###"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: true, primary_key: Some("other"), operation_ids: [0] }, true))"###);
debug_snapshot!(autobatch_from(true, Some("id"), [doc_imp(ReplaceDocuments, true, Some("other")), doc_imp(ReplaceDocuments, true, Some("other")), doc_imp(ReplaceDocuments, true, None)]), @r###"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: true, primary_key: Some("other"), operation_ids: [0] }, true))"###);
debug_snapshot!(autobatch_from(true, Some("id"), [doc_imp(ReplaceDocuments, true, Some("other")), doc_imp(ReplaceDocuments, true, Some("other")), doc_imp(ReplaceDocuments, true, Some("id"))]), @r###"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: true, primary_key: Some("other"), operation_ids: [0] }, true))"###);
// ==> If I don't have a pk
// With a single update
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, None)]), @"Some((DocumentImport { method: ReplaceDocuments, allow_index_creation: true, primary_key: None, import_ids: [0] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, Some("id"))]), @r###"Some((DocumentImport { method: ReplaceDocuments, allow_index_creation: true, primary_key: Some("id"), import_ids: [0] }, true))"###);
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, Some("other"))]), @r###"Some((DocumentImport { method: ReplaceDocuments, allow_index_creation: true, primary_key: Some("other"), import_ids: [0] }, true))"###);
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, None)]), @"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, Some("id"))]), @r###"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: true, primary_key: Some("id"), operation_ids: [0] }, true))"###);
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, Some("other"))]), @r###"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: true, primary_key: Some("other"), operation_ids: [0] }, true))"###);
// With multiple updates
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, None), doc_imp(ReplaceDocuments, true, None)]), @"Some((DocumentImport { method: ReplaceDocuments, allow_index_creation: true, primary_key: None, import_ids: [0, 1] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, None), doc_imp(ReplaceDocuments, true, Some("id"))]), @"Some((DocumentImport { method: ReplaceDocuments, allow_index_creation: true, primary_key: None, import_ids: [0] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, Some("id")), doc_imp(ReplaceDocuments, true, None)]), @r###"Some((DocumentImport { method: ReplaceDocuments, allow_index_creation: true, primary_key: Some("id"), import_ids: [0] }, true))"###);
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, None), doc_imp(ReplaceDocuments, true, None)]), @"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0, 1] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, None), doc_imp(ReplaceDocuments, true, Some("id"))]), @"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: true, primary_key: None, operation_ids: [0] }, true))");
debug_snapshot!(autobatch_from(true, None, [doc_imp(ReplaceDocuments, true, Some("id")), doc_imp(ReplaceDocuments, true, None)]), @r###"Some((DocumentOperation { method: ReplaceDocuments, allow_index_creation: true, primary_key: Some("id"), operation_ids: [0] }, true))"###);
}
}

View File

@ -28,8 +28,7 @@ use meilisearch_types::heed::{RoTxn, RwTxn};
use meilisearch_types::milli::documents::{obkv_to_object, DocumentsBatchReader};
use meilisearch_types::milli::heed::CompactionOption;
use meilisearch_types::milli::update::{
DocumentAdditionResult, DocumentDeletionResult, IndexDocumentsConfig, IndexDocumentsMethod,
Settings as MilliSettings,
DocumentDeletionResult, IndexDocumentsConfig, IndexDocumentsMethod, Settings as MilliSettings,
};
use meilisearch_types::milli::{self, BEU32};
use meilisearch_types::settings::{apply_settings_to_builder, Settings, Unchecked};
@ -86,15 +85,21 @@ pub(crate) enum Batch {
},
}
#[derive(Debug)]
pub(crate) enum DocumentOperation {
Add(Uuid),
Delete(Vec<String>),
}
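// Illustrative sketch, not part of this diff: assuming a content file UUID and a list of
// user-provided document ids, a mixed run of addition and deletion tasks is flattened into
// an ordered `Vec<DocumentOperation>` that the batch later replays in order.
#[allow(dead_code)]
fn example_operations(content_uuid: Uuid, ids_to_delete: Vec<String>) -> Vec<DocumentOperation> {
    vec![DocumentOperation::Add(content_uuid), DocumentOperation::Delete(ids_to_delete)]
}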
/// A [batch](Batch) that combines multiple tasks operating on an index.
#[derive(Debug)]
pub(crate) enum IndexOperation {
DocumentImport {
DocumentOperation {
index_uid: String,
primary_key: Option<String>,
method: IndexDocumentsMethod,
documents_counts: Vec<u64>,
content_files: Vec<Uuid>,
operations: Vec<DocumentOperation>,
tasks: Vec<Task>,
},
DocumentDeletion {
@ -121,13 +126,13 @@ pub(crate) enum IndexOperation {
settings: Vec<(bool, Settings<Unchecked>)>,
settings_tasks: Vec<Task>,
},
SettingsAndDocumentImport {
SettingsAndDocumentOperation {
index_uid: String,
primary_key: Option<String>,
method: IndexDocumentsMethod,
documents_counts: Vec<u64>,
content_files: Vec<Uuid>,
operations: Vec<DocumentOperation>,
document_import_tasks: Vec<Task>,
// The boolean indicates if it's a settings deletion or creation.
@ -149,13 +154,13 @@ impl Batch {
tasks.iter().map(|task| task.uid).collect()
}
Batch::IndexOperation { op, .. } => match op {
IndexOperation::DocumentImport { tasks, .. }
IndexOperation::DocumentOperation { tasks, .. }
| IndexOperation::DocumentDeletion { tasks, .. }
| IndexOperation::Settings { tasks, .. }
| IndexOperation::DocumentClear { tasks, .. } => {
tasks.iter().map(|task| task.uid).collect()
}
IndexOperation::SettingsAndDocumentImport {
IndexOperation::SettingsAndDocumentOperation {
document_import_tasks: tasks,
settings_tasks: other,
..
@ -169,17 +174,33 @@ impl Batch {
Batch::IndexSwap { task } => vec![task.uid],
}
}
/// Return the index UID associated with this batch
pub fn index_uid(&self) -> Option<&str> {
use Batch::*;
match self {
TaskCancelation { .. }
| TaskDeletion(_)
| SnapshotCreation(_)
| Dump(_)
| IndexSwap { .. } => None,
IndexOperation { op, .. } => Some(op.index_uid()),
IndexCreation { index_uid, .. }
| IndexUpdate { index_uid, .. }
| IndexDeletion { index_uid, .. } => Some(index_uid),
}
}
}
impl IndexOperation {
pub fn index_uid(&self) -> &str {
match self {
IndexOperation::DocumentImport { index_uid, .. }
IndexOperation::DocumentOperation { index_uid, .. }
| IndexOperation::DocumentDeletion { index_uid, .. }
| IndexOperation::DocumentClear { index_uid, .. }
| IndexOperation::Settings { index_uid, .. }
| IndexOperation::DocumentClearAndSetting { index_uid, .. }
| IndexOperation::SettingsAndDocumentImport { index_uid, .. } => index_uid,
| IndexOperation::SettingsAndDocumentOperation { index_uid, .. } => index_uid,
}
}
}
@ -206,17 +227,22 @@ impl IndexScheduler {
},
must_create_index,
})),
BatchKind::DocumentImport { method, import_ids, .. } => {
let tasks = self.get_existing_tasks(rtxn, import_ids)?;
let primary_key = match &tasks[0].kind {
KindWithContent::DocumentAdditionOrUpdate { primary_key, .. } => {
primary_key.clone()
}
_ => unreachable!(),
};
BatchKind::DocumentOperation { method, operation_ids, .. } => {
let tasks = self.get_existing_tasks(rtxn, operation_ids)?;
let primary_key = tasks
.iter()
.find_map(|task| match task.kind {
KindWithContent::DocumentAdditionOrUpdate { ref primary_key, .. } => {
// we want to stop on the first document addition
Some(primary_key.clone())
}
KindWithContent::DocumentDeletion { .. } => None,
_ => unreachable!(),
})
.flatten();
let mut documents_counts = Vec::new();
let mut content_files = Vec::new();
let mut operations = Vec::new();
for task in tasks.iter() {
match task.kind {
@ -226,19 +252,23 @@ impl IndexScheduler {
..
} => {
documents_counts.push(documents_count);
content_files.push(content_file);
operations.push(DocumentOperation::Add(content_file));
}
KindWithContent::DocumentDeletion { ref documents_ids, .. } => {
documents_counts.push(documents_ids.len() as u64);
operations.push(DocumentOperation::Delete(documents_ids.clone()));
}
_ => unreachable!(),
}
}
Ok(Some(Batch::IndexOperation {
op: IndexOperation::DocumentImport {
op: IndexOperation::DocumentOperation {
index_uid,
primary_key,
method,
documents_counts,
content_files,
operations,
tasks,
},
must_create_index,
@ -322,12 +352,12 @@ impl IndexScheduler {
must_create_index,
}))
}
BatchKind::SettingsAndDocumentImport {
BatchKind::SettingsAndDocumentOperation {
settings_ids,
method,
allow_index_creation,
primary_key,
import_ids,
operation_ids,
} => {
let settings = self.create_next_batch_index(
rtxn,
@ -339,11 +369,11 @@ impl IndexScheduler {
let document_import = self.create_next_batch_index(
rtxn,
index_uid.clone(),
BatchKind::DocumentImport {
BatchKind::DocumentOperation {
method,
allow_index_creation,
primary_key,
import_ids,
operation_ids,
},
must_create_index,
)?;
@ -352,10 +382,10 @@ impl IndexScheduler {
(
Some(Batch::IndexOperation {
op:
IndexOperation::DocumentImport {
IndexOperation::DocumentOperation {
primary_key,
documents_counts,
content_files,
operations,
tasks: document_import_tasks,
..
},
@ -366,12 +396,12 @@ impl IndexScheduler {
..
}),
) => Ok(Some(Batch::IndexOperation {
op: IndexOperation::SettingsAndDocumentImport {
op: IndexOperation::SettingsAndDocumentOperation {
index_uid,
primary_key,
method,
documents_counts,
content_files,
operations,
document_import_tasks,
settings,
settings_tasks,
@ -645,9 +675,6 @@ impl IndexScheduler {
}
// 3. Snapshot every indexes
// TODO we are opening all of the indexes it can be too much we should unload all
// of the indexes we are trying to open. It would be even better to only unload
// the ones that were opened by us. Or maybe use a LRU in the index mapper.
for result in self.index_mapper.index_mapping.iter(&rtxn)? {
let (name, uuid) = result?;
let index = self.index_mapper.index(&rtxn, name)?;
@ -684,6 +711,14 @@ impl IndexScheduler {
// 5.3 Change the permission to make the snapshot readonly
let mut permissions = file.metadata()?.permissions();
permissions.set_readonly(true);
#[cfg(unix)]
{
use std::os::unix::fs::PermissionsExt;
#[allow(clippy::non_octal_unix_permissions)]
// r--r--r-- (0o444): read-only for user, group and others
permissions.set_mode(0b100100100);
}
file.set_permissions(permissions)?;
for task in &mut tasks {
@ -758,15 +793,15 @@ impl IndexScheduler {
dump_tasks.flush()?;
// 3. Dump the indexes
for (uid, index) in self.index_mapper.indexes(&rtxn)? {
self.index_mapper.try_for_each_index(&rtxn, |uid, index| -> Result<()> {
let rtxn = index.read_txn()?;
let metadata = IndexMetadata {
uid: uid.clone(),
uid: uid.to_owned(),
primary_key: index.primary_key(&rtxn)?.map(String::from),
created_at: index.created_at(&rtxn)?,
updated_at: index.updated_at(&rtxn)?,
};
let mut index_dumper = dump.create_index(&uid, &metadata)?;
let mut index_dumper = dump.create_index(uid, &metadata)?;
let fields_ids_map = index.fields_ids_map(&rtxn)?;
let all_fields: Vec<_> = fields_ids_map.iter().map(|(id, _)| id).collect();
@ -779,9 +814,10 @@ impl IndexScheduler {
}
// 3.2. Dump the settings
let settings = meilisearch_types::settings::settings(&index, &rtxn)?;
let settings = meilisearch_types::settings::settings(index, &rtxn)?;
index_dumper.settings(&settings)?;
}
Ok(())
})?;
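// Note: iterating with `try_for_each_index` (rather than collecting `indexes(&rtxn)?`)
// presumably lets the index mapper hand indexes out one at a time, so the dump no longer
// needs to keep every index open at once.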
let dump_uid = started_at.format(format_description!(
"[year repr:full][month repr:numerical][day padding:zero]-[hour padding:zero][minute padding:zero][second padding:zero][subsecond digits:3]"
@ -987,12 +1023,12 @@ impl IndexScheduler {
Ok(tasks)
}
IndexOperation::DocumentImport {
IndexOperation::DocumentOperation {
index_uid: _,
primary_key,
method,
documents_counts,
content_files,
documents_counts: _,
operations,
mut tasks,
} => {
let mut primary_key_has_been_set = false;
@ -1037,26 +1073,82 @@ impl IndexScheduler {
|| must_stop_processing.get(),
)?;
let mut results = Vec::new();
for content_uuid in content_files.into_iter() {
let content_file = self.file_store.get_update(content_uuid)?;
let reader = DocumentsBatchReader::from_reader(content_file)
.map_err(milli::Error::from)?;
let (new_builder, user_result) = builder.add_documents(reader)?;
builder = new_builder;
for (operation, task) in operations.into_iter().zip(tasks.iter_mut()) {
match operation {
DocumentOperation::Add(content_uuid) => {
let content_file = self.file_store.get_update(content_uuid)?;
let reader = DocumentsBatchReader::from_reader(content_file)
.map_err(milli::Error::from)?;
let (new_builder, user_result) = builder.add_documents(reader)?;
builder = new_builder;
let user_result = match user_result {
Ok(count) => Ok(DocumentAdditionResult {
indexed_documents: count,
number_of_documents: count, // TODO: this is wrong, we should use the value stored in the Details.
}),
Err(e) => Err(milli::Error::from(e)),
};
let received_documents =
if let Some(Details::DocumentAdditionOrUpdate {
received_documents,
..
}) = task.details
{
received_documents
} else {
// In the case of a `documentAdditionOrUpdate` the details MUST be set
unreachable!();
};
results.push(user_result);
match user_result {
Ok(count) => {
task.status = Status::Succeeded;
task.details = Some(Details::DocumentAdditionOrUpdate {
received_documents,
indexed_documents: Some(count),
})
}
Err(e) => {
task.status = Status::Failed;
task.details = Some(Details::DocumentAdditionOrUpdate {
received_documents,
indexed_documents: Some(0),
});
task.error = Some(milli::Error::from(e).into());
}
}
}
DocumentOperation::Delete(document_ids) => {
let (new_builder, user_result) =
builder.remove_documents(document_ids)?;
builder = new_builder;
let provided_ids =
if let Some(Details::DocumentDeletion { provided_ids, .. }) =
task.details
{
provided_ids
} else {
// In the case of a `documentDeletion` the details MUST be set
unreachable!();
};
match user_result {
Ok(count) => {
task.status = Status::Succeeded;
task.details = Some(Details::DocumentDeletion {
provided_ids,
deleted_documents: Some(count),
});
}
Err(e) => {
task.status = Status::Failed;
task.details = Some(Details::DocumentDeletion {
provided_ids,
deleted_documents: Some(0),
});
task.error = Some(milli::Error::from(e).into());
}
}
}
}
}
if results.iter().any(|res| res.is_ok()) {
if !tasks.iter().all(|res| res.error.is_some()) {
let addition = builder.execute()?;
info!("document addition done: {:?}", addition);
} else if primary_key_has_been_set {
@ -1071,29 +1163,6 @@ impl IndexScheduler {
)?;
}
for (task, (ret, count)) in
tasks.iter_mut().zip(results.into_iter().zip(documents_counts))
{
match ret {
Ok(DocumentAdditionResult { indexed_documents, number_of_documents }) => {
task.status = Status::Succeeded;
task.details = Some(Details::DocumentAdditionOrUpdate {
received_documents: number_of_documents,
indexed_documents: Some(indexed_documents),
});
}
Err(error) => {
task.status = Status::Failed;
task.details = Some(Details::DocumentAdditionOrUpdate {
received_documents: count,
// if there was an error we indexed 0 documents.
indexed_documents: Some(0),
});
task.error = Some(error.into())
}
}
}
Ok(tasks)
}
IndexOperation::DocumentDeletion { index_uid: _, documents, mut tasks } => {
@ -1136,12 +1205,12 @@ impl IndexScheduler {
Ok(tasks)
}
IndexOperation::SettingsAndDocumentImport {
IndexOperation::SettingsAndDocumentOperation {
index_uid,
primary_key,
method,
documents_counts,
content_files,
operations,
document_import_tasks,
settings,
settings_tasks,
@ -1159,12 +1228,12 @@ impl IndexScheduler {
let mut import_tasks = self.apply_index_operation(
index_wtxn,
index,
IndexOperation::DocumentImport {
IndexOperation::DocumentOperation {
index_uid,
primary_key,
method,
documents_counts,
content_files,
operations,
tasks: document_import_tasks,
},
)?;

View File

@ -1,250 +0,0 @@
use std::collections::hash_map::Entry;
use std::collections::HashMap;
use std::path::{Path, PathBuf};
use std::sync::{Arc, RwLock};
use std::{fs, thread};
use log::error;
use meilisearch_types::heed::types::Str;
use meilisearch_types::heed::{Database, Env, EnvOpenOptions, RoTxn, RwTxn};
use meilisearch_types::milli::update::IndexerConfig;
use meilisearch_types::milli::Index;
use time::OffsetDateTime;
use uuid::Uuid;
use self::IndexStatus::{Available, BeingDeleted};
use crate::uuid_codec::UuidCodec;
use crate::{clamp_to_page_size, Error, Result};
const INDEX_MAPPING: &str = "index-mapping";
/// Structure managing meilisearch's indexes.
///
/// It is responsible for:
/// 1. Creating new indexes
/// 2. Opening indexes and storing references to these opened indexes
/// 3. Accessing indexes through their uuid
/// 4. Mapping a user-defined name to each index uuid.
#[derive(Clone)]
pub struct IndexMapper {
/// Keep track of the opened indexes. Used mainly by the index resolver.
index_map: Arc<RwLock<HashMap<Uuid, IndexStatus>>>,
/// Map an index name with an index uuid currently available on disk.
pub(crate) index_mapping: Database<Str, UuidCodec>,
/// Path to the folder where the LMDB environments of each index are.
base_path: PathBuf,
index_size: usize,
pub indexer_config: Arc<IndexerConfig>,
}
/// Whether the index is available for use or is forbidden to be inserted back in the index map
#[allow(clippy::large_enum_variant)]
#[derive(Clone)]
pub enum IndexStatus {
/// Do not insert it back in the index map as it is currently being deleted.
BeingDeleted,
/// You can use the index without worrying about anything.
Available(Index),
}
impl IndexMapper {
pub fn new(
env: &Env,
base_path: PathBuf,
index_size: usize,
indexer_config: IndexerConfig,
) -> Result<Self> {
Ok(Self {
index_map: Arc::default(),
index_mapping: env.create_database(Some(INDEX_MAPPING))?,
base_path,
index_size,
indexer_config: Arc::new(indexer_config),
})
}
/// Create or open an index in the specified path.
/// The path *must* exist or an error will be returned.
fn create_or_open_index(
&self,
path: &Path,
date: Option<(OffsetDateTime, OffsetDateTime)>,
) -> Result<Index> {
let mut options = EnvOpenOptions::new();
options.map_size(clamp_to_page_size(self.index_size));
options.max_readers(1024);
if let Some((created, updated)) = date {
Ok(Index::new_with_creation_dates(options, path, created, updated)?)
} else {
Ok(Index::new(options, path)?)
}
}
/// Get or create the index.
pub fn create_index(
&self,
mut wtxn: RwTxn,
name: &str,
date: Option<(OffsetDateTime, OffsetDateTime)>,
) -> Result<Index> {
match self.index(&wtxn, name) {
Ok(index) => {
wtxn.commit()?;
Ok(index)
}
Err(Error::IndexNotFound(_)) => {
let uuid = Uuid::new_v4();
self.index_mapping.put(&mut wtxn, name, &uuid)?;
let index_path = self.base_path.join(uuid.to_string());
fs::create_dir_all(&index_path)?;
let index = self.create_or_open_index(&index_path, date)?;
wtxn.commit()?;
// TODO: it would be better to lazily create the index. But we need an Index::open function for milli.
if let Some(BeingDeleted) =
self.index_map.write().unwrap().insert(uuid, Available(index.clone()))
{
panic!("Uuid v4 conflict.");
}
Ok(index)
}
error => error,
}
}
/// Removes the index from the mapping table and the in-memory index map
/// but keeps the associated tasks.
pub fn delete_index(&self, mut wtxn: RwTxn, name: &str) -> Result<()> {
let uuid = self
.index_mapping
.get(&wtxn, name)?
.ok_or_else(|| Error::IndexNotFound(name.to_string()))?;
// Once we have retrieved the UUID of the index we remove it from the mapping table.
assert!(self.index_mapping.delete(&mut wtxn, name)?);
wtxn.commit()?;
// We remove the index from the in-memory index map.
let mut lock = self.index_map.write().unwrap();
let closing_event = match lock.insert(uuid, BeingDeleted) {
Some(Available(index)) => Some(index.prepare_for_closing()),
_ => None,
};
drop(lock);
let index_map = self.index_map.clone();
let index_path = self.base_path.join(uuid.to_string());
let index_name = name.to_string();
thread::Builder::new()
.name(String::from("index_deleter"))
.spawn(move || {
// We first wait to be sure that the previously opened index is effectively closed.
// This can take a lot of time, which is why we do it in a separate thread.
if let Some(closing_event) = closing_event {
closing_event.wait();
}
// Then we remove the content from disk.
if let Err(e) = fs::remove_dir_all(&index_path) {
error!(
"An error happened when deleting the index {} ({}): {}",
index_name, uuid, e
);
}
// Finally we remove the entry from the index map.
assert!(matches!(index_map.write().unwrap().remove(&uuid), Some(BeingDeleted)));
})
.unwrap();
Ok(())
}
pub fn exists(&self, rtxn: &RoTxn, name: &str) -> Result<bool> {
Ok(self.index_mapping.get(rtxn, name)?.is_some())
}
/// Return an index, may open it if it wasn't already opened.
pub fn index(&self, rtxn: &RoTxn, name: &str) -> Result<Index> {
let uuid = self
.index_mapping
.get(rtxn, name)?
.ok_or_else(|| Error::IndexNotFound(name.to_string()))?;
// we clone here to drop the lock before entering the match
let index = self.index_map.read().unwrap().get(&uuid).cloned();
let index = match index {
Some(Available(index)) => index,
Some(BeingDeleted) => return Err(Error::IndexNotFound(name.to_string())),
// since we're lazy, it's possible that the index has not been opened yet.
None => {
let mut index_map = self.index_map.write().unwrap();
// between the read lock and the write lock it's possible that someone
// already opened the index (e.g. if two searches happen at the same time),
// thus before opening it we check a second time
// whether it's already there.
// Since there is a good chance it's not already there we can use
// the entry method.
match index_map.entry(uuid) {
Entry::Vacant(entry) => {
let index_path = self.base_path.join(uuid.to_string());
let index = self.create_or_open_index(&index_path, None)?;
entry.insert(Available(index.clone()));
index
}
Entry::Occupied(entry) => match entry.get() {
Available(index) => index.clone(),
BeingDeleted => return Err(Error::IndexNotFound(name.to_string())),
},
}
}
};
Ok(index)
}
/// Return all indexes, may open them if they weren't already opened.
pub fn indexes(&self, rtxn: &RoTxn) -> Result<Vec<(String, Index)>> {
self.index_mapping
.iter(rtxn)?
.map(|ret| {
ret.map_err(Error::from).and_then(|(name, _)| {
self.index(rtxn, name).map(|index| (name.to_string(), index))
})
})
.collect()
}
/// Swap two index names.
pub fn swap(&self, wtxn: &mut RwTxn, lhs: &str, rhs: &str) -> Result<()> {
let lhs_uuid = self
.index_mapping
.get(wtxn, lhs)?
.ok_or_else(|| Error::IndexNotFound(lhs.to_string()))?;
let rhs_uuid = self
.index_mapping
.get(wtxn, rhs)?
.ok_or_else(|| Error::IndexNotFound(rhs.to_string()))?;
self.index_mapping.put(wtxn, lhs, &rhs_uuid)?;
self.index_mapping.put(wtxn, rhs, &lhs_uuid)?;
Ok(())
}
pub fn index_exists(&self, rtxn: &RoTxn, name: &str) -> Result<bool> {
Ok(self.index_mapping.get(rtxn, name)?.is_some())
}
pub fn indexer_config(&self) -> &IndexerConfig {
&self.indexer_config
}
}

View File

@ -0,0 +1,370 @@
/// The map size to use when we don't succeed in reading it from the index.
const DEFAULT_MAP_SIZE: usize = 10 * 1024 * 1024 * 1024; // 10 GiB
use std::collections::BTreeMap;
use std::path::Path;
use std::time::Duration;
use meilisearch_types::heed::{EnvClosingEvent, EnvOpenOptions};
use meilisearch_types::milli::Index;
use time::OffsetDateTime;
use uuid::Uuid;
use super::IndexStatus::{self, Available, BeingDeleted, Closing, Missing};
use crate::lru::{InsertionOutcome, LruMap};
use crate::{clamp_to_page_size, Result};
/// Keep an internally consistent view of the open indexes in memory.
///
/// This view is made of an LRU cache that will evict the least recently used indexes when new indexes are opened.
/// Indexes that are being closed (for resizing or due to cache eviction) or deleted cannot be evicted from the cache and
/// are stored separately.
///
/// This view provides operations to change the state of the index as it is known in memory:
/// open an index (making it available for queries), close an index (specifying the new size it should be opened with),
/// delete an index.
///
/// External consistency with the other bits of data of an index is provided by the `IndexMapper` parent structure.
pub struct IndexMap {
/// A LRU map of indexes that are in the open state and available for queries.
available: LruMap<Uuid, Index>,
/// A map of indexes that are not available for queries, either because they are being deleted
/// or because they are being closed.
///
/// If they are being deleted, the UUID points to `None`.
unavailable: BTreeMap<Uuid, Option<ClosingIndex>>,
/// A monotonically increasing generation number, used to differentiate between multiple successive index closing requests.
///
/// Because multiple readers could be waiting on an index to close, the following could theoretically happen:
///
/// 1. Multiple readers wait for the index closing to occur.
/// 2. One of them "wins the race", takes the lock and then removes the index that finished closing from the map.
/// 3. The index is reopened, but must be closed again (such as being resized again).
/// 4. One reader that "lost the race" in (2) wakes up and tries to take the lock and remove the index from the map.
///
/// In that situation, the index may or may not have finished closing. The `generation` field makes it possible to remember which
/// closing request was made, so the reader that "lost the race" has the old generation and will need to wait again for the index
/// to close.
generation: usize,
}
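// Illustrative sketch, not part of this diff: the lifecycle documented above for a single
// entry, assuming a hypothetical uuid and index path.
#[allow(dead_code)]
fn example_lifecycle(map: &mut IndexMap, uuid: Uuid, path: &Path) -> Result<()> {
    // Missing -> Available: open the index and insert it in the LRU.
    let _index = map.create(&uuid, path, None, DEFAULT_MAP_SIZE)?;
    // Available -> Closing: evict it from the LRU, remembering the map size to reopen it with.
    map.close_for_resize(&uuid, 0);
    Ok(())
}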
#[derive(Clone)]
pub struct ClosingIndex {
uuid: Uuid,
closing_event: EnvClosingEvent,
map_size: usize,
generation: usize,
}
impl ClosingIndex {
/// Waits for the index to be definitely closed.
///
/// To avoid blocking, users should relinquish their locks to the IndexMap before calling this function.
///
/// After the index is physically closed, the in memory map must still be updated to take this into account.
/// To do so, a `ReopenableIndex` is returned, that can be used to either definitely close or definitely open
/// the index without waiting anymore.
pub fn wait_timeout(self, timeout: Duration) -> Option<ReopenableIndex> {
self.closing_event.wait_timeout(timeout).then_some(ReopenableIndex {
uuid: self.uuid,
map_size: self.map_size,
generation: self.generation,
})
}
}
pub struct ReopenableIndex {
uuid: Uuid,
map_size: usize,
generation: usize,
}
impl ReopenableIndex {
/// Attempts to reopen the index, which can result in the index being reopened again or not
/// (e.g. if another thread already opened and closed the index again).
///
/// Use get again on the IndexMap to get the updated status.
///
/// Fails if the underlying index creation fails.
///
/// # Status table
///
/// | Previous Status | New Status |
/// |-----------------|----------------------------------------------|
/// | Missing | Missing |
/// | BeingDeleted | BeingDeleted |
/// | Closing | Available or Closing depending on generation |
/// | Available | Available |
///
pub fn reopen(self, map: &mut IndexMap, path: &Path) -> Result<()> {
if let Closing(reopen) = map.get(&self.uuid) {
if reopen.generation != self.generation {
return Ok(());
}
map.unavailable.remove(&self.uuid);
map.create(&self.uuid, path, None, self.map_size)?;
}
Ok(())
}
/// Attempts to close the index, which may or may not result in the index being closed
/// (e.g. if another thread already reopened the index again).
///
/// Use get again on the IndexMap to get the updated status.
///
/// # Status table
///
/// | Previous Status | New Status |
/// |-----------------|--------------------------------------------|
/// | Missing | Missing |
/// | BeingDeleted | BeingDeleted |
/// | Closing | Missing or Closing depending on generation |
/// | Available | Available |
pub fn close(self, map: &mut IndexMap) {
if let Closing(reopen) = map.get(&self.uuid) {
if reopen.generation != self.generation {
return;
}
map.unavailable.remove(&self.uuid);
}
}
}
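// Illustrative sketch, not part of this diff: how a caller is expected to use the closing
// handle, assuming a hypothetical index path. Reopening is skipped automatically when a
// newer closing request (a higher generation) has superseded this one.
#[allow(dead_code)]
fn example_wait_then_reopen(map: &mut IndexMap, closing: ClosingIndex, path: &Path) -> Result<()> {
    if let Some(reopenable) = closing.wait_timeout(Duration::from_secs(5)) {
        reopenable.reopen(map, path)?;
    }
    Ok(())
}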
impl IndexMap {
pub fn new(cap: usize) -> IndexMap {
Self { unavailable: Default::default(), available: LruMap::new(cap), generation: 0 }
}
/// Gets the current status of an index in the map.
///
/// If the index is available it can be accessed from the returned status.
pub fn get(&self, uuid: &Uuid) -> IndexStatus {
self.available
.get(uuid)
.map(|index| Available(index.clone()))
.unwrap_or_else(|| self.get_unavailable(uuid))
}
fn get_unavailable(&self, uuid: &Uuid) -> IndexStatus {
match self.unavailable.get(uuid) {
Some(Some(reopen)) => Closing(reopen.clone()),
Some(None) => BeingDeleted,
None => Missing,
}
}
/// Attempts to create a new index that didn't exist before.
///
/// # Status table
///
/// | Previous Status | New Status |
/// |-----------------|------------|
/// | Missing | Available |
/// | BeingDeleted | panics |
/// | Closing | panics |
/// | Available | panics |
///
pub fn create(
&mut self,
uuid: &Uuid,
path: &Path,
date: Option<(OffsetDateTime, OffsetDateTime)>,
map_size: usize,
) -> Result<Index> {
if !matches!(self.get_unavailable(uuid), Missing) {
panic!("Attempt to open an index that was unavailable");
}
let index = create_or_open_index(path, date, map_size)?;
match self.available.insert(*uuid, index.clone()) {
InsertionOutcome::InsertedNew => (),
InsertionOutcome::Evicted(evicted_uuid, evicted_index) => {
self.close(evicted_uuid, evicted_index, 0);
}
InsertionOutcome::Replaced(_) => {
panic!("Attempt to open an index that was already opened")
}
}
Ok(index)
}
/// Increases the current generation. See documentation for this field.
///
/// In the unlikely event that 2^64 generations have been exhausted, we simply wrap around.
///
/// For this to cause an issue, a reader would have to be stopped right after it got a `ReopenableIndex` and before it takes the lock
/// to remove it from the unavailable map, and then be kept in this frozen state while 2^64 other index closings happen.
///
/// This seems practically impossible to achieve.
fn next_generation(&mut self) -> usize {
self.generation = self.generation.wrapping_add(1);
self.generation
}
/// Attempts to close an index.
///
/// # Status table
///
/// | Previous Status | New Status |
/// |-----------------|---------------|
/// | Missing | Missing |
/// | BeingDeleted | BeingDeleted |
/// | Closing | Closing |
/// | Available | Closing |
///
pub fn close_for_resize(&mut self, uuid: &Uuid, map_size_growth: usize) {
let Some(index) = self.available.remove(uuid) else { return; };
self.close(*uuid, index, map_size_growth);
}
fn close(&mut self, uuid: Uuid, index: Index, map_size_growth: usize) {
let map_size = index.map_size().unwrap_or(DEFAULT_MAP_SIZE) + map_size_growth;
let closing_event = index.prepare_for_closing();
let generation = self.next_generation();
self.unavailable
.insert(uuid, Some(ClosingIndex { uuid, closing_event, map_size, generation }));
}
/// Attempts to delete an index.
///
/// `end_deletion` must be called just after.
///
/// # Status table
///
/// | Previous Status | New Status | Return value |
/// |-----------------|--------------|-----------------------------|
/// | Missing | BeingDeleted | Ok(None) |
/// | BeingDeleted | BeingDeleted | Err(None) |
/// | Closing | Closing | Err(Some(reopen)) |
/// | Available | BeingDeleted | Ok(Some(env_closing_event)) |
pub fn start_deletion(
&mut self,
uuid: &Uuid,
) -> std::result::Result<Option<EnvClosingEvent>, Option<ClosingIndex>> {
if let Some(index) = self.available.remove(uuid) {
self.unavailable.insert(*uuid, None);
return Ok(Some(index.prepare_for_closing()));
}
match self.unavailable.remove(uuid) {
Some(Some(reopen)) => Err(Some(reopen)),
Some(None) => Err(None),
None => Ok(None),
}
}
/// Marks that an index deletion finished.
///
/// Must be used after calling `start_deletion`.
///
/// # Status table
///
/// | Previous Status | New Status |
/// |-----------------|------------|
/// | Missing | Missing |
/// | BeingDeleted | Missing |
/// | Closing | panics |
/// | Available | panics |
pub fn end_deletion(&mut self, uuid: &Uuid) {
assert!(
self.available.get(uuid).is_none(),
"Attempt to finish deletion of an index that was not being deleted"
);
// Do not panic if the index was Missing or BeingDeleted
assert!(
!matches!(self.unavailable.remove(uuid), Some(Some(_))),
"Attempt to finish deletion of an index that was being closed"
);
}
}
/// Create or open an index in the specified path.
/// The path *must* exist or an error will be returned.
fn create_or_open_index(
path: &Path,
date: Option<(OffsetDateTime, OffsetDateTime)>,
map_size: usize,
) -> Result<Index> {
let mut options = EnvOpenOptions::new();
options.map_size(clamp_to_page_size(map_size));
options.max_readers(1024);
if let Some((created, updated)) = date {
Ok(Index::new_with_creation_dates(options, path, created, updated)?)
} else {
Ok(Index::new(options, path)?)
}
}
/// Putting the tests of the LRU down there so we have access to the cache's private members
#[cfg(test)]
mod tests {
use meilisearch_types::heed::Env;
use meilisearch_types::Index;
use uuid::Uuid;
use super::super::IndexMapper;
use crate::tests::IndexSchedulerHandle;
use crate::utils::clamp_to_page_size;
use crate::IndexScheduler;
impl IndexMapper {
fn test() -> (Self, Env, IndexSchedulerHandle) {
let (index_scheduler, handle) = IndexScheduler::test(true, vec![]);
(index_scheduler.index_mapper, index_scheduler.env, handle)
}
}
fn check_first_unavailable(mapper: &IndexMapper, expected_uuid: Uuid, is_closing: bool) {
let index_map = mapper.index_map.read().unwrap();
let (uuid, state) = index_map.unavailable.first_key_value().unwrap();
assert_eq!(uuid, &expected_uuid);
assert_eq!(state.is_some(), is_closing);
}
#[test]
fn evict_indexes() {
let (mapper, env, _handle) = IndexMapper::test();
let mut uuids = vec![];
// LRU cap + 1
for i in 0..(5 + 1) {
let index_name = format!("index-{i}");
let wtxn = env.write_txn().unwrap();
mapper.create_index(wtxn, &index_name, None).unwrap();
let txn = env.read_txn().unwrap();
uuids.push(mapper.index_mapping.get(&txn, &index_name).unwrap().unwrap());
}
// index-0 was evicted
check_first_unavailable(&mapper, uuids[0], true);
// get back the evicted index
let wtxn = env.write_txn().unwrap();
mapper.create_index(wtxn, "index-0", None).unwrap();
// Least recently used is now index-1
check_first_unavailable(&mapper, uuids[1], true);
}
#[test]
fn resize_index() {
let (mapper, env, _handle) = IndexMapper::test();
let index = mapper.create_index(env.write_txn().unwrap(), "index", None).unwrap();
assert_index_size(index, mapper.index_base_map_size);
mapper.resize_index(&env.read_txn().unwrap(), "index").unwrap();
let index = mapper.create_index(env.write_txn().unwrap(), "index", None).unwrap();
assert_index_size(index, mapper.index_base_map_size + mapper.index_growth_amount);
mapper.resize_index(&env.read_txn().unwrap(), "index").unwrap();
let index = mapper.create_index(env.write_txn().unwrap(), "index", None).unwrap();
assert_index_size(index, mapper.index_base_map_size + mapper.index_growth_amount * 2);
}
fn assert_index_size(index: Index, expected: usize) {
let expected = clamp_to_page_size(expected);
let index_map_size = index.map_size().unwrap();
assert_eq!(index_map_size, expected);
}
}
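To make the generation protocol above concrete, here is a minimal sketch (an editor's illustration, not part of the original file) of how a reader that found an index in the `Closing` state could use `wait_timeout` and `reopen`. The helper `wait_then_reopen` is hypothetical; the 6-second timeout mirrors the one used by `IndexMapper`, and the lock is assumed to be the `RwLock<IndexMap>` it holds.

#[allow(dead_code)]
fn wait_then_reopen(
    index_map: &std::sync::RwLock<IndexMap>,
    uuid: &Uuid,
    reopen: ClosingIndex,
    path: &Path,
) -> Result<Option<Index>> {
    // Relinquish all locks before waiting, as documented on `wait_timeout`.
    let Some(reopen) = reopen.wait_timeout(Duration::from_secs(6)) else {
        // Timed out: the caller should read the status again and retry.
        return Ok(None);
    };
    // `reopen` is a no-op if another closing request bumped the generation in the meantime,
    // which is exactly the "lost the race" scenario described above.
    reopen.reopen(&mut index_map.write().unwrap(), path)?;
    match index_map.read().unwrap().get(uuid) {
        Available(index) => Ok(Some(index)),
        // Any other status: the caller loops and re-reads the status.
        _ => Ok(None),
    }
}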


@ -0,0 +1,370 @@
use std::path::PathBuf;
use std::sync::{Arc, RwLock};
use std::time::Duration;
use std::{fs, thread};
use log::error;
use meilisearch_types::heed::types::Str;
use meilisearch_types::heed::{Database, Env, RoTxn, RwTxn};
use meilisearch_types::milli::update::IndexerConfig;
use meilisearch_types::milli::Index;
use time::OffsetDateTime;
use uuid::Uuid;
use self::index_map::IndexMap;
use self::IndexStatus::{Available, BeingDeleted, Closing, Missing};
use crate::uuid_codec::UuidCodec;
use crate::{Error, Result};
mod index_map;
const INDEX_MAPPING: &str = "index-mapping";
/// Structure managing meilisearch's indexes.
///
/// It is responsible for:
/// 1. Creating new indexes
/// 2. Opening indexes and storing references to these opened indexes
/// 3. Accessing indexes through their uuid
/// 4. Mapping a user-defined name to each index uuid.
///
/// # Implementation notes
///
/// An index exists as 3 bits of data:
/// 1. The index data on disk, that can exist in 3 states: Missing, Present, or BeingDeleted.
/// 2. The persistent database containing the association between the index' name and its UUID,
/// that can exist in 2 states: Missing or Present.
/// 3. The state of the index in the in-memory `IndexMap`, that can exist in multiple states:
/// - Missing
/// - Available
/// - Closing (because an index needs resizing or was evicted from the cache)
/// - BeingDeleted
///
/// All of this data should be kept consistent between index operations, which is achieved by the `IndexMapper`
/// with the use of the following primitives:
/// - A RwLock on the `IndexMap`.
/// - Transactions on the association database.
/// - ClosingEvent signals emitted when closing an environment.
#[derive(Clone)]
pub struct IndexMapper {
/// Keep track of the opened indexes. Used mainly by the index resolver.
index_map: Arc<RwLock<IndexMap>>,
/// Map an index name with an index uuid currently available on disk.
pub(crate) index_mapping: Database<Str, UuidCodec>,
/// Path to the folder where the LMDB environments of each index are.
base_path: PathBuf,
/// The map size an index is opened with on the first time.
index_base_map_size: usize,
/// The quantity by which the map size of an index is incremented upon reopening, in bytes.
index_growth_amount: usize,
pub indexer_config: Arc<IndexerConfig>,
}
/// Whether the index is available for use or is forbidden to be inserted back in the index map
#[allow(clippy::large_enum_variant)]
#[derive(Clone)]
pub enum IndexStatus {
/// Not currently in the index map.
Missing,
/// Do not insert it back in the index map as it is currently being deleted.
BeingDeleted,
/// Temporarily do not insert the index in the index map as it is currently being resized/evicted from the map.
Closing(index_map::ClosingIndex),
/// You can use the index without worrying about anything.
Available(Index),
}
impl IndexMapper {
pub fn new(
env: &Env,
base_path: PathBuf,
index_base_map_size: usize,
index_growth_amount: usize,
index_count: usize,
indexer_config: IndexerConfig,
) -> Result<Self> {
Ok(Self {
index_map: Arc::new(RwLock::new(IndexMap::new(index_count))),
index_mapping: env.create_database(Some(INDEX_MAPPING))?,
base_path,
index_base_map_size,
index_growth_amount,
indexer_config: Arc::new(indexer_config),
})
}
/// Get or create the index.
pub fn create_index(
&self,
mut wtxn: RwTxn,
name: &str,
date: Option<(OffsetDateTime, OffsetDateTime)>,
) -> Result<Index> {
match self.index(&wtxn, name) {
Ok(index) => {
wtxn.commit()?;
Ok(index)
}
Err(Error::IndexNotFound(_)) => {
let uuid = Uuid::new_v4();
self.index_mapping.put(&mut wtxn, name, &uuid)?;
let index_path = self.base_path.join(uuid.to_string());
fs::create_dir_all(&index_path)?;
// Error if the UUIDv4 somehow already exists in the map, since it should be fresh.
// This is very unlikely to happen in practice.
// TODO: it would be better to lazily create the index. But we need an Index::open function for milli.
let index = self.index_map.write().unwrap().create(
&uuid,
&index_path,
date,
self.index_base_map_size,
)?;
wtxn.commit()?;
Ok(index)
}
error => error,
}
}
/// Removes the index from the mapping table and the in-memory index map
/// but keeps the associated tasks.
pub fn delete_index(&self, mut wtxn: RwTxn, name: &str) -> Result<()> {
let uuid = self
.index_mapping
.get(&wtxn, name)?
.ok_or_else(|| Error::IndexNotFound(name.to_string()))?;
// Once we retrieved the UUID of the index we remove it from the mapping table.
assert!(self.index_mapping.delete(&mut wtxn, name)?);
wtxn.commit()?;
let mut tries = 0;
// Attempts to remove the index from the in-memory index map in a loop.
//
// If the index is currently being closed, we will wait for it to be closed and retry getting it in a subsequent
// loop iteration.
//
// We make 100 attempts before giving up.
// This could happen in the following situations:
//
// 1. There is a bug preventing the index from being correctly closed, or us from detecting this.
// 2. A user of the index is keeping it open for more than 600 seconds. This could happen e.g. during a pathological search.
// This can not be caused by indexation because deleting an index happens in the scheduler itself, so cannot be concurrent with indexation.
//
// In these situations, reporting the error through a panic is in order.
let closing_event = loop {
let mut lock = self.index_map.write().unwrap();
match lock.start_deletion(&uuid) {
Ok(env_closing) => break env_closing,
Err(Some(reopen)) => {
// drop the lock here so that we don't synchronously wait for the index to close.
drop(lock);
tries += 1;
if tries >= 100 {
panic!("Too many attempts to close index {name} prior to deletion.")
}
let reopen = if let Some(reopen) = reopen.wait_timeout(Duration::from_secs(6)) {
reopen
} else {
continue;
};
reopen.close(&mut self.index_map.write().unwrap());
continue;
}
Err(None) => return Ok(()),
}
};
let index_map = self.index_map.clone();
let index_path = self.base_path.join(uuid.to_string());
let index_name = name.to_string();
thread::Builder::new()
.name(String::from("index_deleter"))
.spawn(move || {
// We first wait to be sure that the previously opened index is effectively closed.
// This can take a lot of time, this is why we do that in a separate thread.
if let Some(closing_event) = closing_event {
closing_event.wait();
}
// Then we remove the content from disk.
if let Err(e) = fs::remove_dir_all(&index_path) {
error!(
"An error happened when deleting the index {} ({}): {}",
index_name, uuid, e
);
}
// Finally we remove the entry from the index map.
index_map.write().unwrap().end_deletion(&uuid);
})
.unwrap();
Ok(())
}
pub fn exists(&self, rtxn: &RoTxn, name: &str) -> Result<bool> {
Ok(self.index_mapping.get(rtxn, name)?.is_some())
}
/// Resizes the maximum size of the specified index to the double of its current maximum size.
///
/// This operation involves closing the underlying environment and so can take a long time to complete.
///
/// # Panics
///
/// - If the Index corresponding to the passed name is concurrently being deleted/resized or cannot be found in the
/// in memory hash map.
pub fn resize_index(&self, rtxn: &RoTxn, name: &str) -> Result<()> {
let uuid = self
.index_mapping
.get(rtxn, name)?
.ok_or_else(|| Error::IndexNotFound(name.to_string()))?;
// We remove the index from the in-memory index map.
self.index_map.write().unwrap().close_for_resize(&uuid, self.index_growth_amount);
Ok(())
}
/// Return an index, may open it if it wasn't already opened.
pub fn index(&self, rtxn: &RoTxn, name: &str) -> Result<Index> {
let uuid = self
.index_mapping
.get(rtxn, name)?
.ok_or_else(|| Error::IndexNotFound(name.to_string()))?;
let mut tries = 0;
// attempts to open the index in a loop.
//
// If the index is currently being closed, we will wait for it to be closed and retry getting it in a subsequent
// loop iteration.
//
// We make 100 attempts before giving up.
// This could happen in the following situations:
//
// 1. There is a bug preventing the index from being correctly closed, or us from detecting it was.
// 2. A user of the index is keeping it open for more than 600 seconds. This could happen e.g. during a long indexation,
// a pathological search, and so on.
//
// In these situations, reporting the error through a panic is in order.
let index = loop {
tries += 1;
if tries > 100 {
panic!("Too many spurious wake ups while trying to open the index {name}");
}
// we get the index here to drop the lock before entering the match
let index = self.index_map.read().unwrap().get(&uuid);
match index {
Available(index) => break index,
Closing(reopen) => {
// Avoiding deadlocks: no lock taken while doing this operation.
let reopen = if let Some(reopen) = reopen.wait_timeout(Duration::from_secs(6)) {
reopen
} else {
continue;
};
let index_path = self.base_path.join(uuid.to_string());
// take the lock to reopen the environment.
reopen.reopen(&mut self.index_map.write().unwrap(), &index_path)?;
continue;
}
BeingDeleted => return Err(Error::IndexNotFound(name.to_string())),
// since we're lazy, it's possible that the index has not been opened yet.
Missing => {
let mut index_map = self.index_map.write().unwrap();
// Between dropping the read lock and taking the write lock, it is possible
// that someone already opened the index (e.g. if two searches happen
// at the same time), so before opening it we check a second time
// whether it is already there.
match index_map.get(&uuid) {
Missing => {
let index_path = self.base_path.join(uuid.to_string());
break index_map.create(
&uuid,
&index_path,
None,
self.index_base_map_size,
)?;
}
Available(index) => break index,
Closing(_) => {
// the reopening will be handled in the next loop iteration
continue;
}
BeingDeleted => return Err(Error::IndexNotFound(name.to_string())),
}
}
}
};
Ok(index)
}
/// Attempts `f` for each index that exists in the index mapper.
///
/// It is preferable to use this function rather than a loop that opens all indexes, as a way to avoid having all indexes opened,
/// which is unsupported in general.
///
/// Since `f` is allowed to return a result, and `Index` is cloneable, it is still possible to wrongly build e.g. a vector of
/// all the indexes, but this function makes that harder and thus less likely to happen accidentally.
pub fn try_for_each_index<U, V>(
&self,
rtxn: &RoTxn,
mut f: impl FnMut(&str, &Index) -> Result<U>,
) -> Result<V>
where
V: FromIterator<U>,
{
self.index_mapping
.iter(rtxn)?
.map(|res| {
res.map_err(Error::from)
.and_then(|(name, _)| self.index(rtxn, name).and_then(|index| f(name, &index)))
})
.collect()
}
/// Return the name of all indexes without opening them.
pub fn index_names(&self, rtxn: &RoTxn) -> Result<Vec<String>> {
self.index_mapping
.iter(rtxn)?
.map(|res| res.map_err(Error::from).map(|(name, _)| name.to_string()))
.collect()
}
/// Swap two index names.
pub fn swap(&self, wtxn: &mut RwTxn, lhs: &str, rhs: &str) -> Result<()> {
let lhs_uuid = self
.index_mapping
.get(wtxn, lhs)?
.ok_or_else(|| Error::IndexNotFound(lhs.to_string()))?;
let rhs_uuid = self
.index_mapping
.get(wtxn, rhs)?
.ok_or_else(|| Error::IndexNotFound(rhs.to_string()))?;
self.index_mapping.put(wtxn, lhs, &rhs_uuid)?;
self.index_mapping.put(wtxn, rhs, &lhs_uuid)?;
Ok(())
}
pub fn index_exists(&self, rtxn: &RoTxn, name: &str) -> Result<bool> {
Ok(self.index_mapping.get(rtxn, name)?.is_some())
}
pub fn indexer_config(&self) -> &IndexerConfig {
&self.indexer_config
}
}
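As an illustration of the `try_for_each_index` contract documented above, the following sketch (an editor's addition, not present in the original file) collects each index's name together with its document count while opening indexes one at a time; `document_counts` is a hypothetical helper and `number_of_documents` is milli's `Index` accessor.

#[allow(dead_code)]
fn document_counts(mapper: &IndexMapper, rtxn: &RoTxn) -> Result<Vec<(String, u64)>> {
    mapper.try_for_each_index(rtxn, |name, index| {
        // Each `Index` lives only for the duration of this closure,
        // so at most one extra index is kept open at any time.
        let index_rtxn = index.read_txn()?;
        Ok((name.to_string(), index.number_of_documents(&index_rtxn)?))
    })
}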


@ -254,6 +254,6 @@ pub fn snapshot_canceled_by(
snap
}
pub fn snapshot_index_mapper(rtxn: &RoTxn, mapper: &IndexMapper) -> String {
let names = mapper.indexes(rtxn).unwrap().into_iter().map(|(n, _)| n).collect::<Vec<_>>();
let names = mapper.index_names(rtxn).unwrap();
format!("{names:?}")
}


@ -24,6 +24,7 @@ pub mod error;
mod index_mapper;
#[cfg(test)]
mod insta_snapshot;
mod lru;
mod utils;
mod uuid_codec;
@ -31,7 +32,7 @@ pub type Result<T> = std::result::Result<T, Error>;
pub type TaskId = u32;
use std::ops::{Bound, RangeBounds};
use std::path::PathBuf;
use std::path::{Path, PathBuf};
use std::sync::atomic::AtomicBool;
use std::sync::atomic::Ordering::Relaxed;
use std::sync::{Arc, RwLock};
@ -229,8 +230,12 @@ pub struct IndexSchedulerOptions {
pub dumps_path: PathBuf,
/// The maximum size, in bytes, of the task index.
pub task_db_size: usize,
/// The maximum size, in bytes, of each meilisearch index.
pub index_size: usize,
/// The size, in bytes, with which each meilisearch index is opened the first time.
pub index_base_map_size: usize,
/// The size, in bytes, by which the map size of an index is increased when it is resized due to being full.
pub index_growth_amount: usize,
/// The number of indexes that can be concurrently opened in memory.
pub index_count: usize,
/// Configuration used during indexing for each meilisearch index.
pub indexer_config: IndexerConfig,
/// Set to `true` iff the index scheduler is allowed to automatically
@ -360,9 +365,25 @@ impl IndexScheduler {
std::fs::create_dir_all(&options.indexes_path)?;
std::fs::create_dir_all(&options.dumps_path)?;
let task_db_size = clamp_to_page_size(options.task_db_size);
let budget = if options.indexer_config.skip_index_budget {
IndexBudget {
map_size: options.index_base_map_size,
index_count: options.index_count,
task_db_size,
}
} else {
Self::index_budget(
&options.tasks_path,
options.index_base_map_size,
task_db_size,
options.index_count,
)
};
let env = heed::EnvOpenOptions::new()
.max_dbs(10)
.map_size(clamp_to_page_size(options.task_db_size))
.map_size(budget.task_db_size)
.open(options.tasks_path)?;
let file_store = FileStore::new(&options.update_file_path)?;
@ -382,7 +403,9 @@ impl IndexScheduler {
index_mapper: IndexMapper::new(
&env,
options.indexes_path,
options.index_size,
budget.map_size,
options.index_growth_amount,
budget.index_count,
options.indexer_config,
)?,
env,
@ -406,6 +429,75 @@ impl IndexScheduler {
Ok(this)
}
fn index_budget(
tasks_path: &Path,
base_map_size: usize,
mut task_db_size: usize,
max_index_count: usize,
) -> IndexBudget {
#[cfg(windows)]
const DEFAULT_BUDGET: usize = 6 * 1024 * 1024 * 1024 * 1024; // 6 TiB, 1 index
#[cfg(not(windows))]
const DEFAULT_BUDGET: usize = 80 * 1024 * 1024 * 1024 * 1024; // 80 TiB, 18 indexes
let budget = if Self::is_good_heed(tasks_path, DEFAULT_BUDGET) {
DEFAULT_BUDGET
} else {
log::debug!("determining budget with dichotomic search");
utils::dichotomic_search(DEFAULT_BUDGET / 2, |map_size| {
Self::is_good_heed(tasks_path, map_size)
})
};
log::debug!("memmap budget: {budget}B");
let mut budget = budget / 2;
if task_db_size > (budget / 2) {
task_db_size = clamp_to_page_size(budget * 2 / 5);
log::debug!(
"Decreasing max size of task DB to {task_db_size}B due to constrained memory space"
);
}
budget -= task_db_size;
// won't be mutated again
let budget = budget;
let task_db_size = task_db_size;
log::debug!("index budget: {budget}B");
let mut index_count = budget / base_map_size;
if index_count < 2 {
// take a bit less than half of the budget to make sure we can always afford to open an index
let map_size = (budget * 2) / 5;
// single index of max budget
log::debug!("1 index of {map_size}B can be opened simultaneously.");
return IndexBudget { map_size, index_count: 1, task_db_size };
}
// give us some space for an additional index when the cache is already full
// decrement is OK because index_count >= 2.
index_count -= 1;
if index_count > max_index_count {
index_count = max_index_count;
}
log::debug!("Up to {index_count} indexes of {base_map_size}B opened simultaneously.");
IndexBudget { map_size: base_map_size, index_count, task_db_size }
}
fn is_good_heed(tasks_path: &Path, map_size: usize) -> bool {
if let Ok(env) =
heed::EnvOpenOptions::new().map_size(clamp_to_page_size(map_size)).open(tasks_path)
{
env.prepare_for_closing().wait();
true
} else {
// We're treating all errors equally here, not only allocation errors.
// This means there's a possibility for the budget to be lowered due to errors other than allocation errors.
// For persistent errors, this is OK as long as the task db is then reopened normally without ignoring the error this time.
// For transient errors, this could lead to an instance with too low a budget.
// However transient errors are: 1) less likely than persistent errors 2) likely to cause other issues down the line anyway.
false
}
}
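// Worked example of the arithmetic above (editor's illustration with hypothetical numbers;
// `clamp_to_page_size` rounding omitted): suppose the probing settles on a 32 GiB memmap budget,
// the caller asked for a 10 GiB task DB, a 2 GiB base map size and `max_index_count = 20`.
//   budget       = 32 GiB / 2          = 16 GiB
//   task_db_size = 16 GiB * 2 / 5      = 6.4 GiB  (10 GiB > 16 GiB / 2, so it is shrunk)
//   budget       = 16 GiB - 6.4 GiB    = 9.6 GiB
//   index_count  = 9.6 GiB / 2 GiB - 1 = 3        (one slot kept spare, well under the cap of 20)
// giving IndexBudget { map_size: 2 GiB, index_count: 3, task_db_size: 6.4 GiB }.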
pub fn read_txn(&self) -> Result<RoTxn> {
self.env.read_txn().map_err(|e| e.into())
}
@ -422,12 +514,12 @@ impl IndexScheduler {
#[cfg(test)]
run.breakpoint(Breakpoint::Init);
loop {
run.wake_up.wait();
run.wake_up.wait();
loop {
match run.tick() {
Ok(0) => (),
Ok(_) => run.wake_up.signal(),
Ok(TickOutcome::TickAgain(_)) => (),
Ok(TickOutcome::WaitForSignal) => run.wake_up.wait(),
Err(e) => {
log::error!("{}", e);
// Wait one second when an irrecoverable error occurs.
@ -440,7 +532,6 @@ impl IndexScheduler {
) {
std::thread::sleep(Duration::from_secs(1));
}
run.wake_up.signal();
}
}
}
@ -460,15 +551,42 @@ impl IndexScheduler {
///
/// * If the index wasn't opened before, the index will be opened.
/// * If the index doesn't exist on disk, the `IndexNotFoundError` is thrown.
///
/// ### Note
///
/// As an `Index` requires a large swath of the virtual memory address space, correct usage of an `Index` means not
/// keeping its handle around for too long.
///
/// Some configurations also can't reasonably open multiple indexes at once.
/// If you need to fetch information from or perform an action on all indexes,
/// see the `try_for_each_index` function.
pub fn index(&self, name: &str) -> Result<Index> {
let rtxn = self.env.read_txn()?;
self.index_mapper.index(&rtxn, name)
}
/// Return and open all the indexes.
pub fn indexes(&self) -> Result<Vec<(String, Index)>> {
/// Return the name of all indexes without opening them.
pub fn index_names(&self) -> Result<Vec<String>> {
let rtxn = self.env.read_txn()?;
self.index_mapper.indexes(&rtxn)
self.index_mapper.index_names(&rtxn)
}
/// Attempts `f` for each index known to the index scheduler.
///
/// It is preferable to use this function rather than a loop that opens all indexes, as a way to avoid having all indexes opened,
/// which is unsupported in general.
///
/// Since `f` is allowed to return a result, and `Index` is cloneable, it is still possible to wrongly build e.g. a vector of
/// all the indexes, but this function makes that harder and thus less likely to happen accidentally.
///
/// If many indexes exist, this operation can take time to complete (on the order of seconds for 1000 indexes) as it needs to open
/// all the indexes.
pub fn try_for_each_index<U, V>(&self, f: impl FnMut(&str, &Index) -> Result<U>) -> Result<V>
where
V: FromIterator<U>,
{
let rtxn = self.env.read_txn()?;
self.index_mapper.try_for_each_index(&rtxn, f)
}
/// Return the task ids matched by the given query from the index scheduler's point of view.
@ -630,13 +748,13 @@ impl IndexScheduler {
&self,
rtxn: &RoTxn,
query: &Query,
authorized_indexes: &Option<Vec<String>>,
filters: &meilisearch_auth::AuthFilter,
) -> Result<RoaringBitmap> {
let mut tasks = self.get_task_ids(rtxn, query)?;
// If the query contains a list of index uid or there is a finite list of authorized indexes,
// then we must exclude all the kinds that aren't associated to one and only one index.
if query.index_uids.is_some() || authorized_indexes.is_some() {
if query.index_uids.is_some() || !filters.all_indexes_authorized() {
for kind in enum_iterator::all::<Kind>().filter(|kind| !kind.related_to_one_index()) {
tasks -= self.get_kind(rtxn, kind)?;
}
@ -644,11 +762,11 @@ impl IndexScheduler {
// Any task that is internally associated with a non-authorized index
// must be discarded.
if let Some(authorized_indexes) = authorized_indexes {
if !filters.all_indexes_authorized() {
let all_indexes_iter = self.index_tasks.iter(rtxn)?;
for result in all_indexes_iter {
let (index, index_tasks) = result?;
if !authorized_indexes.contains(&index.to_owned()) {
if !filters.is_index_authorized(index) {
tasks -= index_tasks;
}
}
@ -668,12 +786,11 @@ impl IndexScheduler {
pub fn get_tasks_from_authorized_indexes(
&self,
query: Query,
authorized_indexes: Option<Vec<String>>,
filters: &meilisearch_auth::AuthFilter,
) -> Result<Vec<Task>> {
let rtxn = self.env.read_txn()?;
let tasks =
self.get_task_ids_from_authorized_indexes(&rtxn, &query, &authorized_indexes)?;
let tasks = self.get_task_ids_from_authorized_indexes(&rtxn, &query, filters)?;
let tasks = self.get_existing_tasks(
&rtxn,
@ -764,8 +881,8 @@ impl IndexScheduler {
Ok(task)
}
/// Register a new task comming from a dump in the scheduler.
/// By takinig a mutable ref we're pretty sure no one will ever import a dump while actix is running.
/// Register a new task coming from a dump in the scheduler.
/// By taking a mutable ref we're pretty sure no one will ever import a dump while actix is running.
pub fn register_dumped_task(
&mut self,
task: TaskDump,
@ -926,7 +1043,7 @@ impl IndexScheduler {
/// 5. Reset the in-memory list of processed tasks.
///
/// Returns the number of processed tasks.
fn tick(&self) -> Result<usize> {
fn tick(&self) -> Result<TickOutcome> {
#[cfg(test)]
{
*self.run_loop_iteration.write().unwrap() += 1;
@ -937,8 +1054,9 @@ impl IndexScheduler {
let batch =
match self.create_next_batch(&rtxn).map_err(|e| Error::CreateBatch(Box::new(e)))? {
Some(batch) => batch,
None => return Ok(0),
None => return Ok(TickOutcome::WaitForSignal),
};
let index_uid = batch.index_uid().map(ToOwned::to_owned);
drop(rtxn);
// 1. store the starting date with the bitmap of processing tasks.
@ -1009,7 +1127,23 @@ impl IndexScheduler {
// the `started_at` date times and `processings` of the current processing tasks.
// This date time is used by the task cancelation to store the right `started_at`
// date in the task on disk.
return Ok(0);
return Ok(TickOutcome::TickAgain(0));
}
// If an index said it was full, we need to:
// 1. identify which index is full
// 2. close the associated environment
// 3. resize it
// 4. re-schedule tasks
Err(Error::Milli(milli::Error::UserError(
milli::UserError::MaxDatabaseSizeReached,
))) if index_uid.is_some() => {
// fixme: add index_uid to match to avoid the unwrap
let index_uid = index_uid.unwrap();
// fixme: handle error more gracefully? not sure when this could happen
self.index_mapper.resize_index(&wtxn, &index_uid)?;
wtxn.abort().map_err(Error::HeedTransaction)?;
return Ok(TickOutcome::TickAgain(0));
}
// In case of a failure we must get back and patch all the tasks with the error.
Err(err) => {
@ -1049,7 +1183,7 @@ impl IndexScheduler {
#[cfg(test)]
self.breakpoint(Breakpoint::AfterProcessing);
Ok(processed_tasks)
Ok(TickOutcome::TickAgain(processed_tasks))
}
pub(crate) fn delete_persisted_task_data(&self, task: &Task) -> Result<()> {
@ -1084,6 +1218,26 @@ impl IndexScheduler {
}
}
/// The outcome of calling the [`IndexScheduler::tick`] function.
pub enum TickOutcome {
/// The scheduler should immediately attempt another `tick`.
///
/// The `usize` field contains the number of processed tasks.
TickAgain(usize),
/// The scheduler should wait for an external signal before attempting another `tick`.
WaitForSignal,
}
/// How many indexes we can afford to have open simultaneously.
struct IndexBudget {
/// Map size of an index.
map_size: usize,
/// Maximum number of simultaneously opened indexes.
index_count: usize,
/// For very constrained systems we might need to reduce the base task_db_size so we can accept at least one index.
task_db_size: usize,
}
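// A simplified driver loop (editor's sketch, not the crate's actual `run` method) showing how the
// two `TickOutcome` variants are meant to be consumed: keep ticking while work remains, and block
// on the wake-up signal otherwise. Error handling is reduced to a log and a pause.
#[allow(dead_code)]
fn drive_sketch(scheduler: &IndexScheduler) {
    loop {
        match scheduler.tick() {
            // Tasks were processed (possibly zero, e.g. after an index resize):
            // immediately try to batch whatever is enqueued next.
            Ok(TickOutcome::TickAgain(_processed)) => (),
            // Nothing left to batch: wait until `register` signals new work.
            Ok(TickOutcome::WaitForSignal) => scheduler.wake_up.wait(),
            Err(e) => {
                log::error!("{}", e);
                std::thread::sleep(Duration::from_secs(1));
            }
        }
    }
}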
#[cfg(test)]
mod tests {
use std::io::{BufWriter, Seek, Write};
@ -1093,7 +1247,9 @@ mod tests {
use crossbeam::channel::RecvTimeoutError;
use file_store::File;
use meili_snap::snapshot;
use meilisearch_auth::AuthFilter;
use meilisearch_types::document_formats::DocumentFormatError;
use meilisearch_types::index_uid_pattern::IndexUidPattern;
use meilisearch_types::milli::obkv_to_json;
use meilisearch_types::milli::update::IndexDocumentsMethod::{
ReplaceDocuments, UpdateDocuments,
@ -1127,6 +1283,8 @@ mod tests {
let tempdir = TempDir::new().unwrap();
let (sender, receiver) = crossbeam::channel::bounded(0);
let indexer_config = IndexerConfig { skip_index_budget: true, ..Default::default() };
let options = IndexSchedulerOptions {
version_file_path: tempdir.path().join(VERSION_FILE_NAME),
auth_path: tempdir.path().join("auth"),
@ -1136,8 +1294,10 @@ mod tests {
snapshots_path: tempdir.path().join("snapshots"),
dumps_path: tempdir.path().join("dumps"),
task_db_size: 1000 * 1000, // 1 MB, we don't use MiB on purpose.
index_size: 1000 * 1000, // 1 MB, we don't use MiB on purpose.
indexer_config: IndexerConfig::default(),
index_base_map_size: 1000 * 1000, // 1 MB, we don't use MiB on purpose.
index_growth_amount: 1000 * 1000, // 1 MB
index_count: 5,
indexer_config,
autobatching_enabled,
};
@ -1679,6 +1839,105 @@ mod tests {
snapshot!(snapshot_index_scheduler(&index_scheduler), name: "both_task_succeeded");
}
#[test]
fn document_addition_and_document_deletion() {
let (index_scheduler, mut handle) = IndexScheduler::test(true, vec![]);
let content = r#"[
{ "id": 1, "doggo": "jean bob" },
{ "id": 2, "catto": "jorts" },
{ "id": 3, "doggo": "bork" }
]"#;
let (uuid, mut file) = index_scheduler.create_update_file_with_uuid(0).unwrap();
let documents_count = read_json(content.as_bytes(), file.as_file_mut()).unwrap();
file.persist().unwrap();
index_scheduler
.register(KindWithContent::DocumentAdditionOrUpdate {
index_uid: S("doggos"),
primary_key: Some(S("id")),
method: ReplaceDocuments,
content_file: uuid,
documents_count,
allow_index_creation: true,
})
.unwrap();
snapshot!(snapshot_index_scheduler(&index_scheduler), name: "registered_the_first_task");
index_scheduler
.register(KindWithContent::DocumentDeletion {
index_uid: S("doggos"),
documents_ids: vec![S("1"), S("2")],
})
.unwrap();
snapshot!(snapshot_index_scheduler(&index_scheduler), name: "registered_the_second_task");
handle.advance_one_successful_batch(); // The addition AND deletion should've been batched together
snapshot!(snapshot_index_scheduler(&index_scheduler), name: "after_processing_the_batch");
let index = index_scheduler.index("doggos").unwrap();
let rtxn = index.read_txn().unwrap();
let field_ids_map = index.fields_ids_map(&rtxn).unwrap();
let field_ids = field_ids_map.ids().collect::<Vec<_>>();
let documents = index
.all_documents(&rtxn)
.unwrap()
.map(|ret| obkv_to_json(&field_ids, &field_ids_map, ret.unwrap().1).unwrap())
.collect::<Vec<_>>();
snapshot!(serde_json::to_string_pretty(&documents).unwrap(), name: "documents");
}
#[test]
fn document_deletion_and_document_addition() {
let (index_scheduler, mut handle) = IndexScheduler::test(true, vec![]);
index_scheduler
.register(KindWithContent::DocumentDeletion {
index_uid: S("doggos"),
documents_ids: vec![S("1"), S("2")],
})
.unwrap();
snapshot!(snapshot_index_scheduler(&index_scheduler), name: "registered_the_first_task");
let content = r#"[
{ "id": 1, "doggo": "jean bob" },
{ "id": 2, "catto": "jorts" },
{ "id": 3, "doggo": "bork" }
]"#;
let (uuid, mut file) = index_scheduler.create_update_file_with_uuid(0).unwrap();
let documents_count = read_json(content.as_bytes(), file.as_file_mut()).unwrap();
file.persist().unwrap();
index_scheduler
.register(KindWithContent::DocumentAdditionOrUpdate {
index_uid: S("doggos"),
primary_key: Some(S("id")),
method: ReplaceDocuments,
content_file: uuid,
documents_count,
allow_index_creation: true,
})
.unwrap();
snapshot!(snapshot_index_scheduler(&index_scheduler), name: "registered_the_second_task");
// The deletion should have failed because it can't create an index
handle.advance_one_failed_batch();
snapshot!(snapshot_index_scheduler(&index_scheduler), name: "after_failing_the_deletion");
// The addition should work
handle.advance_one_successful_batch();
snapshot!(snapshot_index_scheduler(&index_scheduler), name: "after_last_successful_addition");
let index = index_scheduler.index("doggos").unwrap();
let rtxn = index.read_txn().unwrap();
let field_ids_map = index.fields_ids_map(&rtxn).unwrap();
let field_ids = field_ids_map.ids().collect::<Vec<_>>();
let documents = index
.all_documents(&rtxn)
.unwrap()
.map(|ret| obkv_to_json(&field_ids, &field_ids_map, ret.unwrap().1).unwrap())
.collect::<Vec<_>>();
snapshot!(serde_json::to_string_pretty(&documents).unwrap(), name: "documents");
}
#[test]
fn do_not_batch_task_of_different_indexes() {
let (index_scheduler, mut handle) = IndexScheduler::test(true, vec![]);
@ -2245,38 +2504,45 @@ mod tests {
let rtxn = index_scheduler.env.read_txn().unwrap();
let query = Query { limit: Some(0), ..Default::default() };
let tasks =
index_scheduler.get_task_ids_from_authorized_indexes(&rtxn, &query, &None).unwrap();
let tasks = index_scheduler
.get_task_ids_from_authorized_indexes(&rtxn, &query, &AuthFilter::default())
.unwrap();
snapshot!(snapshot_bitmap(&tasks), @"[]");
let query = Query { limit: Some(1), ..Default::default() };
let tasks =
index_scheduler.get_task_ids_from_authorized_indexes(&rtxn, &query, &None).unwrap();
let tasks = index_scheduler
.get_task_ids_from_authorized_indexes(&rtxn, &query, &AuthFilter::default())
.unwrap();
snapshot!(snapshot_bitmap(&tasks), @"[2,]");
let query = Query { limit: Some(2), ..Default::default() };
let tasks =
index_scheduler.get_task_ids_from_authorized_indexes(&rtxn, &query, &None).unwrap();
let tasks = index_scheduler
.get_task_ids_from_authorized_indexes(&rtxn, &query, &AuthFilter::default())
.unwrap();
snapshot!(snapshot_bitmap(&tasks), @"[1,2,]");
let query = Query { from: Some(1), ..Default::default() };
let tasks =
index_scheduler.get_task_ids_from_authorized_indexes(&rtxn, &query, &None).unwrap();
let tasks = index_scheduler
.get_task_ids_from_authorized_indexes(&rtxn, &query, &AuthFilter::default())
.unwrap();
snapshot!(snapshot_bitmap(&tasks), @"[0,1,]");
let query = Query { from: Some(2), ..Default::default() };
let tasks =
index_scheduler.get_task_ids_from_authorized_indexes(&rtxn, &query, &None).unwrap();
let tasks = index_scheduler
.get_task_ids_from_authorized_indexes(&rtxn, &query, &AuthFilter::default())
.unwrap();
snapshot!(snapshot_bitmap(&tasks), @"[0,1,2,]");
let query = Query { from: Some(1), limit: Some(1), ..Default::default() };
let tasks =
index_scheduler.get_task_ids_from_authorized_indexes(&rtxn, &query, &None).unwrap();
let tasks = index_scheduler
.get_task_ids_from_authorized_indexes(&rtxn, &query, &AuthFilter::default())
.unwrap();
snapshot!(snapshot_bitmap(&tasks), @"[1,]");
let query = Query { from: Some(1), limit: Some(2), ..Default::default() };
let tasks =
index_scheduler.get_task_ids_from_authorized_indexes(&rtxn, &query, &None).unwrap();
let tasks = index_scheduler
.get_task_ids_from_authorized_indexes(&rtxn, &query, &AuthFilter::default())
.unwrap();
snapshot!(snapshot_bitmap(&tasks), @"[0,1,]");
}
@ -2301,21 +2567,24 @@ mod tests {
let rtxn = index_scheduler.env.read_txn().unwrap();
let query = Query { statuses: Some(vec![Status::Processing]), ..Default::default() };
let tasks =
index_scheduler.get_task_ids_from_authorized_indexes(&rtxn, &query, &None).unwrap();
let tasks = index_scheduler
.get_task_ids_from_authorized_indexes(&rtxn, &query, &AuthFilter::default())
.unwrap();
snapshot!(snapshot_bitmap(&tasks), @"[0,]"); // only the processing tasks in the first tick
let query = Query { statuses: Some(vec![Status::Enqueued]), ..Default::default() };
let tasks =
index_scheduler.get_task_ids_from_authorized_indexes(&rtxn, &query, &None).unwrap();
let tasks = index_scheduler
.get_task_ids_from_authorized_indexes(&rtxn, &query, &AuthFilter::default())
.unwrap();
snapshot!(snapshot_bitmap(&tasks), @"[1,2,]"); // only the enqueued tasks in the first tick
let query = Query {
statuses: Some(vec![Status::Enqueued, Status::Processing]),
..Default::default()
};
let tasks =
index_scheduler.get_task_ids_from_authorized_indexes(&rtxn, &query, &None).unwrap();
let tasks = index_scheduler
.get_task_ids_from_authorized_indexes(&rtxn, &query, &AuthFilter::default())
.unwrap();
snapshot!(snapshot_bitmap(&tasks), @"[0,1,2,]"); // both enqueued and processing tasks in the first tick
let query = Query {
@ -2323,8 +2592,9 @@ mod tests {
after_started_at: Some(start_time),
..Default::default()
};
let tasks =
index_scheduler.get_task_ids_from_authorized_indexes(&rtxn, &query, &None).unwrap();
let tasks = index_scheduler
.get_task_ids_from_authorized_indexes(&rtxn, &query, &AuthFilter::default())
.unwrap();
// both enqueued and processing tasks in the first tick, but limited to those with a started_at
// that comes after the start of the test, which should excludes the enqueued tasks
snapshot!(snapshot_bitmap(&tasks), @"[0,]");
@ -2334,8 +2604,9 @@ mod tests {
before_started_at: Some(start_time),
..Default::default()
};
let tasks =
index_scheduler.get_task_ids_from_authorized_indexes(&rtxn, &query, &None).unwrap();
let tasks = index_scheduler
.get_task_ids_from_authorized_indexes(&rtxn, &query, &AuthFilter::default())
.unwrap();
// both enqueued and processing tasks in the first tick, but limited to those with a started_at
// that comes before the start of the test, which should excludes all of them
snapshot!(snapshot_bitmap(&tasks), @"[]");
@ -2346,8 +2617,9 @@ mod tests {
before_started_at: Some(start_time + Duration::minutes(1)),
..Default::default()
};
let tasks =
index_scheduler.get_task_ids_from_authorized_indexes(&rtxn, &query, &None).unwrap();
let tasks = index_scheduler
.get_task_ids_from_authorized_indexes(&rtxn, &query, &AuthFilter::default())
.unwrap();
// both enqueued and processing tasks in the first tick, but limited to those with a started_at
// that comes after the start of the test and before one minute after the start of the test,
// which should exclude the enqueued tasks and include the only processing task
@ -2372,8 +2644,9 @@ mod tests {
before_started_at: Some(start_time + Duration::minutes(1)),
..Default::default()
};
let tasks =
index_scheduler.get_task_ids_from_authorized_indexes(&rtxn, &query, &None).unwrap();
let tasks = index_scheduler
.get_task_ids_from_authorized_indexes(&rtxn, &query, &AuthFilter::default())
.unwrap();
// both succeeded and processing tasks in the first tick, but limited to those with a started_at
// that comes after the start of the test and before one minute after the start of the test,
// which should include all tasks
@ -2384,8 +2657,9 @@ mod tests {
before_started_at: Some(start_time),
..Default::default()
};
let tasks =
index_scheduler.get_task_ids_from_authorized_indexes(&rtxn, &query, &None).unwrap();
let tasks = index_scheduler
.get_task_ids_from_authorized_indexes(&rtxn, &query, &AuthFilter::default())
.unwrap();
// both succeeded and processing tasks in the first tick, but limited to those with a started_at
// that comes before the start of the test, which should exclude all tasks
snapshot!(snapshot_bitmap(&tasks), @"[]");
@ -2396,8 +2670,9 @@ mod tests {
before_started_at: Some(second_start_time + Duration::minutes(1)),
..Default::default()
};
let tasks =
index_scheduler.get_task_ids_from_authorized_indexes(&rtxn, &query, &None).unwrap();
let tasks = index_scheduler
.get_task_ids_from_authorized_indexes(&rtxn, &query, &AuthFilter::default())
.unwrap();
// both succeeded and processing tasks in the first tick, but limited to those with a started_at
// that comes after the start of the second part of the test and before one minute after the
// second start of the test, which should exclude all tasks
@ -2415,8 +2690,9 @@ mod tests {
let rtxn = index_scheduler.env.read_txn().unwrap();
let tasks =
index_scheduler.get_task_ids_from_authorized_indexes(&rtxn, &query, &None).unwrap();
let tasks = index_scheduler
.get_task_ids_from_authorized_indexes(&rtxn, &query, &AuthFilter::default())
.unwrap();
// we run the same query to verify that, and indeed find that the last task is matched
snapshot!(snapshot_bitmap(&tasks), @"[2,]");
@ -2426,8 +2702,9 @@ mod tests {
before_started_at: Some(second_start_time + Duration::minutes(1)),
..Default::default()
};
let tasks =
index_scheduler.get_task_ids_from_authorized_indexes(&rtxn, &query, &None).unwrap();
let tasks = index_scheduler
.get_task_ids_from_authorized_indexes(&rtxn, &query, &AuthFilter::default())
.unwrap();
// enqueued, succeeded, or processing tasks started after the second part of the test, should
// again only return the last task
snapshot!(snapshot_bitmap(&tasks), @"[2,]");
@ -2437,8 +2714,9 @@ mod tests {
// now the last task should have failed
snapshot!(snapshot_index_scheduler(&index_scheduler), name: "end");
let tasks =
index_scheduler.get_task_ids_from_authorized_indexes(&rtxn, &query, &None).unwrap();
let tasks = index_scheduler
.get_task_ids_from_authorized_indexes(&rtxn, &query, &AuthFilter::default())
.unwrap();
// so running the last query should return nothing
snapshot!(snapshot_bitmap(&tasks), @"[]");
@ -2448,8 +2726,9 @@ mod tests {
before_started_at: Some(second_start_time + Duration::minutes(1)),
..Default::default()
};
let tasks =
index_scheduler.get_task_ids_from_authorized_indexes(&rtxn, &query, &None).unwrap();
let tasks = index_scheduler
.get_task_ids_from_authorized_indexes(&rtxn, &query, &AuthFilter::default())
.unwrap();
// but the same query on failed tasks should return the last task
snapshot!(snapshot_bitmap(&tasks), @"[2,]");
@ -2459,8 +2738,9 @@ mod tests {
before_started_at: Some(second_start_time + Duration::minutes(1)),
..Default::default()
};
let tasks =
index_scheduler.get_task_ids_from_authorized_indexes(&rtxn, &query, &None).unwrap();
let tasks = index_scheduler
.get_task_ids_from_authorized_indexes(&rtxn, &query, &AuthFilter::default())
.unwrap();
// but the same query on failed tasks should return the last task
snapshot!(snapshot_bitmap(&tasks), @"[2,]");
@ -2471,8 +2751,9 @@ mod tests {
before_started_at: Some(second_start_time + Duration::minutes(1)),
..Default::default()
};
let tasks =
index_scheduler.get_task_ids_from_authorized_indexes(&rtxn, &query, &None).unwrap();
let tasks = index_scheduler
.get_task_ids_from_authorized_indexes(&rtxn, &query, &AuthFilter::default())
.unwrap();
// same query but with an invalid uid
snapshot!(snapshot_bitmap(&tasks), @"[]");
@ -2483,8 +2764,9 @@ mod tests {
before_started_at: Some(second_start_time + Duration::minutes(1)),
..Default::default()
};
let tasks =
index_scheduler.get_task_ids_from_authorized_indexes(&rtxn, &query, &None).unwrap();
let tasks = index_scheduler
.get_task_ids_from_authorized_indexes(&rtxn, &query, &AuthFilter::default())
.unwrap();
// same query but with a valid uid
snapshot!(snapshot_bitmap(&tasks), @"[2,]");
}
@ -2514,14 +2796,21 @@ mod tests {
let rtxn = index_scheduler.env.read_txn().unwrap();
let query = Query { index_uids: Some(vec!["catto".to_owned()]), ..Default::default() };
let tasks =
index_scheduler.get_task_ids_from_authorized_indexes(&rtxn, &query, &None).unwrap();
let tasks = index_scheduler
.get_task_ids_from_authorized_indexes(&rtxn, &query, &AuthFilter::default())
.unwrap();
// only the first task associated with catto is returned, the indexSwap tasks are excluded!
snapshot!(snapshot_bitmap(&tasks), @"[0,]");
let query = Query { index_uids: Some(vec!["catto".to_owned()]), ..Default::default() };
let tasks = index_scheduler
.get_task_ids_from_authorized_indexes(&rtxn, &query, &Some(vec!["doggo".to_owned()]))
.get_task_ids_from_authorized_indexes(
&rtxn,
&query,
&AuthFilter::with_allowed_indexes(
vec![IndexUidPattern::new_unchecked("doggo")].into_iter().collect(),
),
)
.unwrap();
// we have asked for only the tasks associated with catto, but are only authorized to retrieve the tasks
// associated with doggo -> empty result
@ -2529,7 +2818,13 @@ mod tests {
let query = Query::default();
let tasks = index_scheduler
.get_task_ids_from_authorized_indexes(&rtxn, &query, &Some(vec!["doggo".to_owned()]))
.get_task_ids_from_authorized_indexes(
&rtxn,
&query,
&AuthFilter::with_allowed_indexes(
vec![IndexUidPattern::new_unchecked("doggo")].into_iter().collect(),
),
)
.unwrap();
// we asked for all the tasks, but we are only authorized to retrieve the doggo tasks
// -> only the index creation of doggo should be returned
@ -2540,7 +2835,14 @@ mod tests {
.get_task_ids_from_authorized_indexes(
&rtxn,
&query,
&Some(vec!["catto".to_owned(), "doggo".to_owned()]),
&AuthFilter::with_allowed_indexes(
vec![
IndexUidPattern::new_unchecked("catto"),
IndexUidPattern::new_unchecked("doggo"),
]
.into_iter()
.collect(),
),
)
.unwrap();
// we asked for all the tasks, but we are only authorized to retrieve the doggo and catto tasks
@ -2548,8 +2850,9 @@ mod tests {
snapshot!(snapshot_bitmap(&tasks), @"[0,1,]");
let query = Query::default();
let tasks =
index_scheduler.get_task_ids_from_authorized_indexes(&rtxn, &query, &None).unwrap();
let tasks = index_scheduler
.get_task_ids_from_authorized_indexes(&rtxn, &query, &AuthFilter::default())
.unwrap();
// we asked for all the tasks with all index authorized -> all tasks returned
snapshot!(snapshot_bitmap(&tasks), @"[0,1,2,3,]");
}
@ -2580,15 +2883,22 @@ mod tests {
let rtxn = index_scheduler.read_txn().unwrap();
let query = Query { canceled_by: Some(vec![task_cancelation.uid]), ..Query::default() };
let tasks =
index_scheduler.get_task_ids_from_authorized_indexes(&rtxn, &query, &None).unwrap();
let tasks = index_scheduler
.get_task_ids_from_authorized_indexes(&rtxn, &query, &AuthFilter::default())
.unwrap();
// 0 is not returned because it was not canceled, 3 is not returned because it is the uid of the
// taskCancelation itself
snapshot!(snapshot_bitmap(&tasks), @"[1,2,]");
let query = Query { canceled_by: Some(vec![task_cancelation.uid]), ..Query::default() };
let tasks = index_scheduler
.get_task_ids_from_authorized_indexes(&rtxn, &query, &Some(vec!["doggo".to_string()]))
.get_task_ids_from_authorized_indexes(
&rtxn,
&query,
&AuthFilter::with_allowed_indexes(
vec![IndexUidPattern::new_unchecked("doggo")].into_iter().collect(),
),
)
.unwrap();
// Return only 1 because the user is not authorized to see task 2
snapshot!(snapshot_bitmap(&tasks), @"[1,]");

index-scheduler/src/lru.rs (Normal file, 203 lines)

@ -0,0 +1,203 @@
//! Thread-safe `Vec`-backed LRU cache using [`std::sync::atomic::AtomicU64`] for synchronization.
use std::sync::atomic::{AtomicU64, Ordering};
/// Thread-safe `Vec`-backed LRU cache
#[derive(Debug)]
pub struct Lru<T> {
data: Vec<(AtomicU64, T)>,
generation: AtomicU64,
cap: usize,
}
impl<T> Lru<T> {
/// Creates a new LRU cache with the specified capacity.
///
/// The capacity is allocated up-front, and will never change through a [`Self::put`] operation.
///
/// # Panics
///
/// - If the capacity is 0.
/// - If the capacity exceeds `isize::MAX` bytes.
pub fn new(cap: usize) -> Self {
assert_ne!(cap, 0, "The capacity of a cache cannot be 0");
Self {
// Note: since the element of the vector contains an AtomicU64, it is definitely not zero-sized so cap will never be usize::MAX.
data: Vec::with_capacity(cap),
generation: AtomicU64::new(0),
cap,
}
}
/// The capacity of this LRU cache, that is the maximum number of elements it can hold before evicting elements from the cache.
///
/// The cache will contain at most this number of elements at any given time.
pub fn capacity(&self) -> usize {
self.cap
}
fn next_generation(&self) -> u64 {
// Acquire so this "happens-before" any potential store to a data cell (with Release ordering)
let generation = self.generation.fetch_add(1, Ordering::Acquire);
generation + 1
}
fn next_generation_mut(&mut self) -> u64 {
let generation = self.generation.get_mut();
*generation += 1;
*generation
}
/// Add a value in the cache, evicting an older value if necessary.
///
/// If a value was evicted from the cache, it is returned.
///
/// # Complexity
///
/// - If the cache is full, then linear in the capacity.
/// - Otherwise constant.
pub fn put(&mut self, value: T) -> Option<T> {
// no need for a memory fence: we assume that whichever mechanism provides us synchronization
// (very probably, a RwLock) takes care of fencing for us.
let next_generation = self.next_generation_mut();
let evicted = if self.is_full() { self.pop() } else { None };
self.data.push((AtomicU64::new(next_generation), value));
evicted
}
/// Evict the oldest value from the cache.
///
/// If the cache is empty, `None` will be returned.
///
/// # Complexity
///
/// - Linear in the capacity of the cache.
pub fn pop(&mut self) -> Option<T> {
// Don't use `Iterator::min_by_key` that provides shared references to its elements,
// so that we can get an exclusive one.
// This allows handling the `AtomicU64`s as normal integers without using atomic instructions.
let mut min_generation_index = None;
for (index, (generation, _)) in self.data.iter_mut().enumerate() {
let generation = *generation.get_mut();
if let Some((_, min_generation)) = min_generation_index {
if min_generation > generation {
min_generation_index = Some((index, generation));
}
} else {
min_generation_index = Some((index, generation))
}
}
min_generation_index.map(|(min_index, _)| self.data.swap_remove(min_index).1)
}
/// The current number of elements in the cache.
///
/// This value is guaranteed to be less than or equal to [`Self::capacity`].
pub fn len(&self) -> usize {
self.data.len()
}
/// Returns `true` if putting any additional element in the cache would cause the eviction of an element.
pub fn is_full(&self) -> bool {
self.len() == self.capacity()
}
}
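// Editor's sketch (not part of the original file) of the raw `Lru<T>` behaviour described above:
// once the cache is full, `put` evicts the entry with the smallest generation, i.e. the least
// recently inserted entry, and `pop` removes the oldest remaining one.
#[cfg(test)]
#[test]
fn lru_eviction_sketch() {
    let mut lru = Lru::new(2);
    assert_eq!(lru.put("first"), None);
    assert_eq!(lru.put("second"), None);
    // Capacity reached: the third `put` evicts "first", which has the smallest generation.
    assert_eq!(lru.put("third"), Some("first"));
    // `pop` then removes the oldest remaining entry, "second".
    assert_eq!(lru.pop(), Some("second"));
    assert_eq!(lru.len(), 1);
}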
pub struct LruMap<K, V>(Lru<(K, V)>);
impl<K, V> LruMap<K, V>
where
K: Eq,
{
/// Creates a new LRU cache map with the specified capacity.
///
/// The capacity is allocated up-front, and will never change through a [`Self::insert`] operation.
///
/// # Panics
///
/// - If the capacity is 0.
/// - If the capacity exceeds `isize::MAX` bytes.
pub fn new(cap: usize) -> Self {
Self(Lru::new(cap))
}
/// Gets a value in the cache map by its key.
///
/// If no value matches, `None` will be returned.
///
/// # Complexity
///
/// - Linear in the capacity of the cache.
pub fn get(&self, key: &K) -> Option<&V> {
for (generation, (candidate, value)) in self.0.data.iter() {
if key == candidate {
generation.store(self.0.next_generation(), Ordering::Release);
return Some(value);
}
}
None
}
/// Gets a value in the cache map by its key.
///
/// If no value matches, `None` will be returned.
///
/// # Complexity
///
/// - Linear in the capacity of the cache.
pub fn get_mut(&mut self, key: &K) -> Option<&mut V> {
let next_generation = self.0.next_generation_mut();
for (generation, (candidate, value)) in self.0.data.iter_mut() {
if key == candidate {
*generation.get_mut() = next_generation;
return Some(value);
}
}
None
}
/// Inserts a value in the cache map by its key, replacing any existing value and returning any evicted value.
///
/// # Complexity
///
/// - Linear in the capacity of the cache.
pub fn insert(&mut self, key: K, mut value: V) -> InsertionOutcome<K, V> {
match self.get_mut(&key) {
Some(old_value) => {
std::mem::swap(old_value, &mut value);
InsertionOutcome::Replaced(value)
}
None => match self.0.put((key, value)) {
Some((key, value)) => InsertionOutcome::Evicted(key, value),
None => InsertionOutcome::InsertedNew,
},
}
}
/// Removes an element from the cache map by its key, returning its value.
///
/// Returns `None` if there was no element with this key in the cache.
///
/// # Complexity
///
/// - Linear in the capacity of the cache.
pub fn remove(&mut self, key: &K) -> Option<V> {
for (index, (_, (candidate, _))) in self.0.data.iter_mut().enumerate() {
if key == candidate {
return Some(self.0.data.swap_remove(index).1 .1);
}
}
None
}
}
/// The result of an insertion in a LRU map.
pub enum InsertionOutcome<K, V> {
/// The key was not in the cache, the key-value pair has been inserted.
InsertedNew,
/// The key was not in the cache and an old key-value pair was evicted from the cache to make room for its insertion.
Evicted(K, V),
/// The key was already in the cache map, its value has been updated.
Replaced(V),
}
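Finally, a brief usage sketch of `LruMap` (an editor's illustration, not in the original file), showing the access-driven eviction that `IndexMap` relies on: touching an entry with `get` refreshes its generation, so a later `insert` evicts the untouched entry instead.

#[cfg(test)]
#[test]
fn lru_map_usage_sketch() {
    let mut map = LruMap::new(2);
    assert!(matches!(map.insert("a", 1), InsertionOutcome::InsertedNew));
    assert!(matches!(map.insert("b", 2), InsertionOutcome::InsertedNew));
    // Touch "a" so that "b" becomes the least recently used entry.
    assert_eq!(map.get(&"a"), Some(&1));
    // The map is full: inserting "c" evicts "b", not the freshly used "a".
    assert!(matches!(map.insert("c", 3), InsertionOutcome::Evicted("b", 2)));
    // Inserting an existing key replaces its value instead of evicting anything.
    assert!(matches!(map.insert("a", 10), InsertionOutcome::Replaced(1)));
}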


@ -0,0 +1,42 @@
---
source: index-scheduler/src/lib.rs
---
### Autobatching Enabled = true
### Processing Tasks:
[]
----------------------------------------------------------------------
### All Tasks:
0 {uid: 0, status: succeeded, details: { received_documents: 3, indexed_documents: Some(3) }, kind: DocumentAdditionOrUpdate { index_uid: "doggos", primary_key: Some("id"), method: ReplaceDocuments, content_file: 00000000-0000-0000-0000-000000000000, documents_count: 3, allow_index_creation: true }}
1 {uid: 1, status: succeeded, details: { received_document_ids: 2, deleted_documents: Some(2) }, kind: DocumentDeletion { index_uid: "doggos", documents_ids: ["1", "2"] }}
----------------------------------------------------------------------
### Status:
enqueued []
succeeded [0,1,]
----------------------------------------------------------------------
### Kind:
"documentAdditionOrUpdate" [0,]
"documentDeletion" [1,]
----------------------------------------------------------------------
### Index Tasks:
doggos [0,1,]
----------------------------------------------------------------------
### Index Mapper:
["doggos"]
----------------------------------------------------------------------
### Canceled By:
----------------------------------------------------------------------
### Enqueued At:
[timestamp] [0,]
[timestamp] [1,]
----------------------------------------------------------------------
### Started At:
[timestamp] [0,1,]
----------------------------------------------------------------------
### Finished At:
[timestamp] [0,1,]
----------------------------------------------------------------------
### File Store:
----------------------------------------------------------------------


@ -0,0 +1,9 @@
---
source: index-scheduler/src/lib.rs
---
[
{
"id": 3,
"doggo": "bork"
}
]


@ -0,0 +1,37 @@
---
source: index-scheduler/src/lib.rs
---
### Autobatching Enabled = true
### Processing Tasks:
[]
----------------------------------------------------------------------
### All Tasks:
0 {uid: 0, status: enqueued, details: { received_documents: 3, indexed_documents: None }, kind: DocumentAdditionOrUpdate { index_uid: "doggos", primary_key: Some("id"), method: ReplaceDocuments, content_file: 00000000-0000-0000-0000-000000000000, documents_count: 3, allow_index_creation: true }}
----------------------------------------------------------------------
### Status:
enqueued [0,]
----------------------------------------------------------------------
### Kind:
"documentAdditionOrUpdate" [0,]
----------------------------------------------------------------------
### Index Tasks:
doggos [0,]
----------------------------------------------------------------------
### Index Mapper:
[]
----------------------------------------------------------------------
### Canceled By:
----------------------------------------------------------------------
### Enqueued At:
[timestamp] [0,]
----------------------------------------------------------------------
### Started At:
----------------------------------------------------------------------
### Finished At:
----------------------------------------------------------------------
### File Store:
00000000-0000-0000-0000-000000000000
----------------------------------------------------------------------

View File

@ -0,0 +1,40 @@
---
source: index-scheduler/src/lib.rs
---
### Autobatching Enabled = true
### Processing Tasks:
[]
----------------------------------------------------------------------
### All Tasks:
0 {uid: 0, status: enqueued, details: { received_documents: 3, indexed_documents: None }, kind: DocumentAdditionOrUpdate { index_uid: "doggos", primary_key: Some("id"), method: ReplaceDocuments, content_file: 00000000-0000-0000-0000-000000000000, documents_count: 3, allow_index_creation: true }}
1 {uid: 1, status: enqueued, details: { received_document_ids: 2, deleted_documents: None }, kind: DocumentDeletion { index_uid: "doggos", documents_ids: ["1", "2"] }}
----------------------------------------------------------------------
### Status:
enqueued [0,1,]
----------------------------------------------------------------------
### Kind:
"documentAdditionOrUpdate" [0,]
"documentDeletion" [1,]
----------------------------------------------------------------------
### Index Tasks:
doggos [0,1,]
----------------------------------------------------------------------
### Index Mapper:
[]
----------------------------------------------------------------------
### Canceled By:
----------------------------------------------------------------------
### Enqueued At:
[timestamp] [0,]
[timestamp] [1,]
----------------------------------------------------------------------
### Started At:
----------------------------------------------------------------------
### Finished At:
----------------------------------------------------------------------
### File Store:
00000000-0000-0000-0000-000000000000
----------------------------------------------------------------------

View File

@ -0,0 +1,43 @@
---
source: index-scheduler/src/lib.rs
---
### Autobatching Enabled = true
### Processing Tasks:
[]
----------------------------------------------------------------------
### All Tasks:
0 {uid: 0, status: failed, error: ResponseError { code: 200, message: "Index `doggos` not found.", error_code: "index_not_found", error_type: "invalid_request", error_link: "https://docs.meilisearch.com/errors#index_not_found" }, details: { received_document_ids: 2, deleted_documents: Some(0) }, kind: DocumentDeletion { index_uid: "doggos", documents_ids: ["1", "2"] }}
1 {uid: 1, status: enqueued, details: { received_documents: 3, indexed_documents: None }, kind: DocumentAdditionOrUpdate { index_uid: "doggos", primary_key: Some("id"), method: ReplaceDocuments, content_file: 00000000-0000-0000-0000-000000000000, documents_count: 3, allow_index_creation: true }}
----------------------------------------------------------------------
### Status:
enqueued [1,]
failed [0,]
----------------------------------------------------------------------
### Kind:
"documentAdditionOrUpdate" [1,]
"documentDeletion" [0,]
----------------------------------------------------------------------
### Index Tasks:
doggos [0,1,]
----------------------------------------------------------------------
### Index Mapper:
[]
----------------------------------------------------------------------
### Canceled By:
----------------------------------------------------------------------
### Enqueued At:
[timestamp] [0,]
[timestamp] [1,]
----------------------------------------------------------------------
### Started At:
[timestamp] [0,]
----------------------------------------------------------------------
### Finished At:
[timestamp] [0,]
----------------------------------------------------------------------
### File Store:
00000000-0000-0000-0000-000000000000
----------------------------------------------------------------------

View File

@ -0,0 +1,45 @@
---
source: index-scheduler/src/lib.rs
---
### Autobatching Enabled = true
### Processing Tasks:
[]
----------------------------------------------------------------------
### All Tasks:
0 {uid: 0, status: failed, error: ResponseError { code: 200, message: "Index `doggos` not found.", error_code: "index_not_found", error_type: "invalid_request", error_link: "https://docs.meilisearch.com/errors#index_not_found" }, details: { received_document_ids: 2, deleted_documents: Some(0) }, kind: DocumentDeletion { index_uid: "doggos", documents_ids: ["1", "2"] }}
1 {uid: 1, status: succeeded, details: { received_documents: 3, indexed_documents: Some(3) }, kind: DocumentAdditionOrUpdate { index_uid: "doggos", primary_key: Some("id"), method: ReplaceDocuments, content_file: 00000000-0000-0000-0000-000000000000, documents_count: 3, allow_index_creation: true }}
----------------------------------------------------------------------
### Status:
enqueued []
succeeded [1,]
failed [0,]
----------------------------------------------------------------------
### Kind:
"documentAdditionOrUpdate" [1,]
"documentDeletion" [0,]
----------------------------------------------------------------------
### Index Tasks:
doggos [0,1,]
----------------------------------------------------------------------
### Index Mapper:
["doggos"]
----------------------------------------------------------------------
### Canceled By:
----------------------------------------------------------------------
### Enqueued At:
[timestamp] [0,]
[timestamp] [1,]
----------------------------------------------------------------------
### Started At:
[timestamp] [0,]
[timestamp] [1,]
----------------------------------------------------------------------
### Finished At:
[timestamp] [0,]
[timestamp] [1,]
----------------------------------------------------------------------
### File Store:
----------------------------------------------------------------------

View File

@ -0,0 +1,17 @@
---
source: index-scheduler/src/lib.rs
---
[
{
"id": 1,
"doggo": "jean bob"
},
{
"id": 2,
"catto": "jorts"
},
{
"id": 3,
"doggo": "bork"
}
]

View File

@ -0,0 +1,36 @@
---
source: index-scheduler/src/lib.rs
---
### Autobatching Enabled = true
### Processing Tasks:
[]
----------------------------------------------------------------------
### All Tasks:
0 {uid: 0, status: enqueued, details: { received_document_ids: 2, deleted_documents: None }, kind: DocumentDeletion { index_uid: "doggos", documents_ids: ["1", "2"] }}
----------------------------------------------------------------------
### Status:
enqueued [0,]
----------------------------------------------------------------------
### Kind:
"documentDeletion" [0,]
----------------------------------------------------------------------
### Index Tasks:
doggos [0,]
----------------------------------------------------------------------
### Index Mapper:
[]
----------------------------------------------------------------------
### Canceled By:
----------------------------------------------------------------------
### Enqueued At:
[timestamp] [0,]
----------------------------------------------------------------------
### Started At:
----------------------------------------------------------------------
### Finished At:
----------------------------------------------------------------------
### File Store:
----------------------------------------------------------------------

View File

@ -0,0 +1,40 @@
---
source: index-scheduler/src/lib.rs
---
### Autobatching Enabled = true
### Processing Tasks:
[]
----------------------------------------------------------------------
### All Tasks:
0 {uid: 0, status: enqueued, details: { received_document_ids: 2, deleted_documents: None }, kind: DocumentDeletion { index_uid: "doggos", documents_ids: ["1", "2"] }}
1 {uid: 1, status: enqueued, details: { received_documents: 3, indexed_documents: None }, kind: DocumentAdditionOrUpdate { index_uid: "doggos", primary_key: Some("id"), method: ReplaceDocuments, content_file: 00000000-0000-0000-0000-000000000000, documents_count: 3, allow_index_creation: true }}
----------------------------------------------------------------------
### Status:
enqueued [0,1,]
----------------------------------------------------------------------
### Kind:
"documentAdditionOrUpdate" [1,]
"documentDeletion" [0,]
----------------------------------------------------------------------
### Index Tasks:
doggos [0,1,]
----------------------------------------------------------------------
### Index Mapper:
[]
----------------------------------------------------------------------
### Canceled By:
----------------------------------------------------------------------
### Enqueued At:
[timestamp] [0,]
[timestamp] [1,]
----------------------------------------------------------------------
### Started At:
----------------------------------------------------------------------
### Finished At:
----------------------------------------------------------------------
### File Store:
00000000-0000-0000-0000-000000000000
----------------------------------------------------------------------

View File

@ -439,20 +439,29 @@ impl IndexScheduler {
provided_ids: received_document_ids,
deleted_documents,
} => {
if let Some(deleted_documents) = deleted_documents {
assert_eq!(status, Status::Succeeded);
assert!(deleted_documents <= received_document_ids as u64);
assert_eq!(kind.as_kind(), Kind::DocumentDeletion);
assert_eq!(kind.as_kind(), Kind::DocumentDeletion);
let (index_uid, documents_ids) =
if let KindWithContent::DocumentDeletion {
ref index_uid,
ref documents_ids,
} = kind
{
(index_uid, documents_ids)
} else {
unreachable!()
};
assert_eq!(&task_index_uid.unwrap(), index_uid);
match &kind {
KindWithContent::DocumentDeletion { index_uid, documents_ids } => {
assert_eq!(&task_index_uid.unwrap(), index_uid);
assert!(documents_ids.len() >= received_document_ids);
}
_ => panic!(),
match status {
Status::Enqueued | Status::Processing => (),
Status::Succeeded => {
assert!(deleted_documents.unwrap() <= received_document_ids as u64);
assert!(documents_ids.len() == received_document_ids);
}
Status::Failed | Status::Canceled => {
assert!(deleted_documents == Some(0));
assert!(documents_ids.len() == received_document_ids);
}
} else {
assert_ne!(status, Status::Succeeded);
}
}
Details::ClearAll { deleted_documents } => {
@ -529,3 +538,37 @@ impl IndexScheduler {
}
}
}
pub fn dichotomic_search(start_point: usize, mut is_good: impl FnMut(usize) -> bool) -> usize {
let mut biggest_good = None;
let mut smallest_bad = None;
let mut current = start_point;
loop {
let is_good = is_good(current);
(biggest_good, smallest_bad, current) = match (biggest_good, smallest_bad, is_good) {
(None, None, false) => (None, Some(current), current / 2),
(None, None, true) => (Some(current), None, current * 2),
(None, Some(smallest_bad), true) => {
(Some(current), Some(smallest_bad), (current + smallest_bad) / 2)
}
(None, Some(_), false) => (None, Some(current), current / 2),
(Some(_), None, true) => (Some(current), None, current * 2),
(Some(biggest_good), None, false) => {
(Some(biggest_good), Some(current), (biggest_good + current) / 2)
}
(Some(_), Some(smallest_bad), true) => {
(Some(current), Some(smallest_bad), (smallest_bad + current) / 2)
}
(Some(biggest_good), Some(_), false) => {
(Some(biggest_good), Some(current), (biggest_good + current) / 2)
}
};
if current == 0 {
return current;
}
if smallest_bad.is_some() && biggest_good.is_some() && biggest_good >= Some(current) {
return current;
}
}
}
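A hedged usage sketch (not in the diff): with a monotonic predicate, `dichotomic_search` doubles the probe while it is accepted, halves it while it is rejected, and settles on the largest accepted value; the closure below stands in for a check such as "can this budget actually be mapped?".
#[test]
fn dichotomic_search_converges_on_an_accepted_value() {
    // The returned value is always one the predicate accepted (or 0 when even
    // the smallest probe fails), so for `n <= 700` it can never exceed 700.
    let budget = dichotomic_search(1024, |n| n <= 700);
    assert!(budget > 0 && budget <= 700);
}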

View File

@ -1,10 +1,16 @@
[package]
name = "json-depth-checker"
version = "1.0.0"
edition = "2021"
description = "A library that indicates if a JSON must be flattened"
publish = false
version.workspace = true
authors.workspace = true
# description.workspace = true
homepage.workspace = true
readme.workspace = true
edition.workspace = true
license.workspace = true
[dependencies]
serde_json = "1.0"

View File

@ -1,7 +1,14 @@
[package]
name = "meili-snap"
version = "1.0.0"
edition = "2021"
publish = false
version.workspace = true
authors.workspace = true
description.workspace = true
homepage.workspace = true
readme.workspace = true
edition.workspace = true
license.workspace = true
[dependencies]
insta = { version = "^1.19.1", features = ["json", "redactions"] }

View File

@ -1,12 +1,20 @@
[package]
name = "meilisearch-auth"
version = "1.0.0"
edition = "2021"
publish = false
version.workspace = true
authors.workspace = true
description.workspace = true
homepage.workspace = true
readme.workspace = true
edition.workspace = true
license.workspace = true
[dependencies]
base64 = "0.13.1"
enum-iterator = "1.1.3"
hmac = "0.12.1"
maplit = "1.0.2"
meilisearch-types = { path = "../meilisearch-types" }
rand = "0.8.5"
roaring = { version = "0.10.0", features = ["serde"] }

View File

@ -7,9 +7,10 @@ use std::path::Path;
use std::sync::Arc;
use error::{AuthControllerError, Result};
use maplit::hashset;
use meilisearch_types::index_uid_pattern::IndexUidPattern;
use meilisearch_types::keys::{Action, CreateApiKey, Key, PatchApiKey};
use meilisearch_types::milli::update::Setting;
use meilisearch_types::star_or::StarOr;
use serde::{Deserialize, Serialize};
pub use store::open_auth_store_env;
use store::{generate_key_as_hexa, HeedAuthStore};
@ -84,34 +85,13 @@ impl AuthController {
uid: Uuid,
search_rules: Option<SearchRules>,
) -> Result<AuthFilter> {
let mut filters = AuthFilter::default();
let key = self
.store
.get_api_key(uid)?
.ok_or_else(|| AuthControllerError::ApiKeyNotFound(uid.to_string()))?;
let key = self.get_key(uid)?;
if !key.indexes.iter().any(|i| i == &StarOr::Star) {
filters.search_rules = match search_rules {
// Intersect search_rules with parent key authorized indexes.
Some(search_rules) => SearchRules::Map(
key.indexes
.into_iter()
.filter_map(|index| {
search_rules.get_index_search_rules(&format!("{index}")).map(
|index_search_rules| (index.to_string(), Some(index_search_rules)),
)
})
.collect(),
),
None => SearchRules::Set(key.indexes.into_iter().map(|x| x.to_string()).collect()),
};
} else if let Some(search_rules) = search_rules {
filters.search_rules = search_rules;
}
let key_authorized_indexes = SearchRules::Set(key.indexes.into_iter().collect());
filters.allow_index_creation = self.is_key_authorized(uid, Action::IndexesAdd, None)?;
let allow_index_creation = self.is_key_authorized(uid, Action::IndexesAdd, None)?;
Ok(filters)
Ok(AuthFilter { search_rules, key_authorized_indexes, allow_index_creation })
}
pub fn list_keys(&self) -> Result<Vec<Key>> {
@ -150,9 +130,7 @@ impl AuthController {
.get_expiration_date(uid, action, None)?
.or(match index {
// else check if the key has access to the requested index.
Some(index) => {
self.store.get_expiration_date(uid, action, Some(index.as_bytes()))?
}
Some(index) => self.store.get_expiration_date(uid, action, Some(index))?,
// or to any index if no index has been requested.
None => self.store.prefix_first_expiration_date(uid, action)?,
}) {
@ -178,13 +156,59 @@ impl AuthController {
}
pub struct AuthFilter {
pub search_rules: SearchRules,
pub allow_index_creation: bool,
search_rules: Option<SearchRules>,
key_authorized_indexes: SearchRules,
allow_index_creation: bool,
}
impl Default for AuthFilter {
fn default() -> Self {
Self { search_rules: SearchRules::default(), allow_index_creation: true }
Self {
search_rules: None,
key_authorized_indexes: SearchRules::default(),
allow_index_creation: true,
}
}
}
impl AuthFilter {
#[inline]
pub fn allow_index_creation(&self, index: &str) -> bool {
self.allow_index_creation && self.is_index_authorized(index)
}
pub fn with_allowed_indexes(allowed_indexes: HashSet<IndexUidPattern>) -> Self {
Self {
search_rules: None,
key_authorized_indexes: SearchRules::Set(allowed_indexes),
allow_index_creation: false,
}
}
pub fn all_indexes_authorized(&self) -> bool {
self.key_authorized_indexes.all_indexes_authorized()
&& self
.search_rules
.as_ref()
.map(|search_rules| search_rules.all_indexes_authorized())
.unwrap_or(true)
}
pub fn is_index_authorized(&self, index: &str) -> bool {
self.key_authorized_indexes.is_index_authorized(index)
&& self
.search_rules
.as_ref()
.map(|search_rules| search_rules.is_index_authorized(index))
.unwrap_or(true)
}
pub fn get_index_search_rules(&self, index: &str) -> Option<IndexSearchRules> {
if !self.is_index_authorized(index) {
return None;
}
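// Prefer the optional search rules when they are set; otherwise fall back to
// the index patterns authorized by the key itself.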
let search_rules = self.search_rules.as_ref().unwrap_or(&self.key_authorized_indexes);
search_rules.get_index_search_rules(index)
}
}
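A hedged sketch of the new `AuthFilter` behaviour (the `doggos-*` pattern and the test itself are illustrative, not part of the diff): a filter restricted to a single uid pattern authorizes only matching indexes and never allows index creation.
#[test]
fn restricted_filter_only_authorizes_matching_indexes() {
    use std::str::FromStr;
    let filter = AuthFilter::with_allowed_indexes(
        maplit::hashset! { IndexUidPattern::from_str("doggos-*").unwrap() },
    );
    assert!(filter.is_index_authorized("doggos-2023"));
    assert!(!filter.is_index_authorized("cattos"));
    // `with_allowed_indexes` always forbids index creation.
    assert!(!filter.allow_index_creation("doggos-2023"));
    assert!(!filter.all_indexes_authorized());
}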
@ -192,63 +216,61 @@ impl Default for AuthFilter {
#[derive(Debug, Serialize, Deserialize, Clone)]
#[serde(untagged)]
pub enum SearchRules {
Set(HashSet<String>),
Map(HashMap<String, Option<IndexSearchRules>>),
Set(HashSet<IndexUidPattern>),
Map(HashMap<IndexUidPattern, Option<IndexSearchRules>>),
}
impl Default for SearchRules {
fn default() -> Self {
Self::Set(Some("*".to_string()).into_iter().collect())
Self::Set(hashset! { IndexUidPattern::all() })
}
}
impl SearchRules {
pub fn is_index_authorized(&self, index: &str) -> bool {
fn is_index_authorized(&self, index: &str) -> bool {
match self {
Self::Set(set) => set.contains("*") || set.contains(index),
Self::Map(map) => map.contains_key("*") || map.contains_key(index),
Self::Set(set) => {
set.contains("*")
|| set.contains(index)
|| set.iter().any(|pattern| pattern.matches_str(index))
}
Self::Map(map) => {
map.contains_key("*")
|| map.contains_key(index)
|| map.keys().any(|pattern| pattern.matches_str(index))
}
}
}
pub fn get_index_search_rules(&self, index: &str) -> Option<IndexSearchRules> {
fn get_index_search_rules(&self, index: &str) -> Option<IndexSearchRules> {
match self {
Self::Set(set) => {
if set.contains("*") || set.contains(index) {
Self::Set(_) => {
if self.is_index_authorized(index) {
Some(IndexSearchRules::default())
} else {
None
}
}
Self::Map(map) => {
map.get(index).or_else(|| map.get("*")).map(|isr| isr.clone().unwrap_or_default())
// We must take the most restrictive rule of this set of index uid pattern rules.
map.iter()
.filter(|(pattern, _)| pattern.matches_str(index))
.max_by_key(|(pattern, _)| (pattern.is_exact(), pattern.len()))
.and_then(|(_, rule)| rule.clone())
}
}
}
/// Return the list of indexes such that `self.is_index_authorized(index) == true`,
/// or `None` if all indexes satisfy this condition.
pub fn authorized_indexes(&self) -> Option<Vec<String>> {
fn all_indexes_authorized(&self) -> bool {
match self {
SearchRules::Set(set) => {
if set.contains("*") {
None
} else {
Some(set.iter().cloned().collect())
}
}
SearchRules::Map(map) => {
if map.contains_key("*") {
None
} else {
Some(map.keys().cloned().collect())
}
}
SearchRules::Set(set) => set.contains("*"),
SearchRules::Map(map) => map.contains_key("*"),
}
}
}
impl IntoIterator for SearchRules {
type Item = (String, IndexSearchRules);
type Item = (IndexUidPattern, IndexSearchRules);
type IntoIter = Box<dyn Iterator<Item = Self::Item>>;
fn into_iter(self) -> Self::IntoIter {

View File

@ -5,20 +5,21 @@ use std::convert::{TryFrom, TryInto};
use std::fs::create_dir_all;
use std::path::Path;
use std::str;
use std::str::FromStr;
use std::sync::Arc;
use hmac::{Hmac, Mac};
use meilisearch_types::index_uid_pattern::IndexUidPattern;
use meilisearch_types::keys::KeyId;
use meilisearch_types::milli;
use meilisearch_types::milli::heed::types::{ByteSlice, DecodeIgnore, SerdeJson};
use meilisearch_types::milli::heed::{Database, Env, EnvOpenOptions, RwTxn};
use meilisearch_types::star_or::StarOr;
use sha2::Sha256;
use time::OffsetDateTime;
use uuid::fmt::Hyphenated;
use uuid::Uuid;
use super::error::Result;
use super::error::{AuthControllerError, Result};
use super::{Action, Key};
const AUTH_STORE_SIZE: usize = 1_073_741_824; //1GiB
@ -129,7 +130,7 @@ impl HeedAuthStore {
}
}
let no_index_restriction = key.indexes.contains(&StarOr::Star);
let no_index_restriction = key.indexes.iter().any(|p| p.matches_all());
for action in actions {
if no_index_restriction {
// If there is no index restriction we put None.
@ -214,11 +215,28 @@ impl HeedAuthStore {
&self,
uid: Uuid,
action: Action,
index: Option<&[u8]>,
index: Option<&str>,
) -> Result<Option<Option<OffsetDateTime>>> {
let rtxn = self.env.read_txn()?;
let tuple = (&uid, &action, index);
Ok(self.action_keyid_index_expiration.get(&rtxn, &tuple)?)
let tuple = (&uid, &action, index.map(|s| s.as_bytes()));
match self.action_keyid_index_expiration.get(&rtxn, &tuple)? {
Some(expiration) => Ok(Some(expiration)),
None => {
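// No entry for this exact index: fall back to scanning every (key, action, pattern)
// entry and return the expiration date of the first pattern matching the requested index.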
let tuple = (&uid, &action, None);
for result in self.action_keyid_index_expiration.prefix_iter(&rtxn, &tuple)? {
let ((_, _, index_uid_pattern), expiration) = result?;
if let Some((pattern, index)) = index_uid_pattern.zip(index) {
let index_uid_pattern = str::from_utf8(pattern)?;
let pattern = IndexUidPattern::from_str(index_uid_pattern)
.map_err(|e| AuthControllerError::Internal(Box::new(e)))?;
if pattern.matches_str(index) {
return Ok(Some(expiration));
}
}
}
Ok(None)
}
}
}
pub fn prefix_first_expiration_date(

View File

@ -1,15 +1,21 @@
[package]
name = "meilisearch-types"
version = "1.0.0"
authors = ["marin <postma.marin@protonmail.com>"]
edition = "2021"
publish = false
version.workspace = true
authors.workspace = true
description.workspace = true
homepage.workspace = true
readme.workspace = true
edition.workspace = true
license.workspace = true
[dependencies]
actix-web = { version = "4.2.1", default-features = false }
anyhow = "1.0.65"
convert_case = "0.6.0"
csv = "1.1.6"
deserr = "0.3.0"
deserr = "0.5.0"
either = { version = "1.6.1", features = ["serde"] }
enum-iterator = "1.1.3"
file-store = { path = "../file-store" }
@ -25,7 +31,7 @@ tar = "0.4.38"
tempfile = "3.3.0"
thiserror = "1.0.30"
time = { version = "0.3.7", features = ["serde-well-known", "formatting", "parsing", "macros"] }
tokio = "1.0"
tokio = "1.24"
uuid = { version = "1.1.2", features = ["serde", "v4"] }
[dev-dependencies]

View File

@ -1,328 +0,0 @@
/*!
This module implements the error messages of deserialization errors.
We try to:
1. Give a human-readable description of where the error originated.
2. Use the correct terms depending on the format of the request (json/query param)
3. Categorise the type of the error (e.g. missing field, wrong value type, unexpected error, etc.)
*/
use deserr::{ErrorKind, IntoValue, ValueKind, ValuePointerRef};
use super::{DeserrJsonError, DeserrQueryParamError};
use crate::error::{Code, ErrorCode};
/// Return a description of the given location in a Json, preceded by the given article.
/// e.g. `at .key1[8].key2`. If the location is the origin, the given article will not be
/// included in the description.
pub fn location_json_description(location: ValuePointerRef, article: &str) -> String {
fn rec(location: ValuePointerRef) -> String {
match location {
ValuePointerRef::Origin => String::new(),
ValuePointerRef::Key { key, prev } => rec(*prev) + "." + key,
ValuePointerRef::Index { index, prev } => format!("{}[{index}]", rec(*prev)),
}
}
match location {
ValuePointerRef::Origin => String::new(),
_ => {
format!("{article} `{}`", rec(location))
}
}
}
/// Return a description of the list of value kinds for a Json payload.
fn value_kinds_description_json(kinds: &[ValueKind]) -> String {
// Rank each value kind so that they can be sorted (and deduplicated)
// Having a predictable order helps with pattern matching
fn order(kind: &ValueKind) -> u8 {
match kind {
ValueKind::Null => 0,
ValueKind::Boolean => 1,
ValueKind::Integer => 2,
ValueKind::NegativeInteger => 3,
ValueKind::Float => 4,
ValueKind::String => 5,
ValueKind::Sequence => 6,
ValueKind::Map => 7,
}
}
// Return a description of a single value kind, preceded by an article
fn single_description(kind: &ValueKind) -> &'static str {
match kind {
ValueKind::Null => "null",
ValueKind::Boolean => "a boolean",
ValueKind::Integer => "a positive integer",
ValueKind::NegativeInteger => "a negative integer",
ValueKind::Float => "a number",
ValueKind::String => "a string",
ValueKind::Sequence => "an array",
ValueKind::Map => "an object",
}
}
fn description_rec(kinds: &[ValueKind], count_items: &mut usize, message: &mut String) {
let (msg_part, rest): (_, &[ValueKind]) = match kinds {
[] => (String::new(), &[]),
[ValueKind::Integer | ValueKind::NegativeInteger, ValueKind::Float, rest @ ..] => {
("a number".to_owned(), rest)
}
[ValueKind::Integer, ValueKind::NegativeInteger, ValueKind::Float, rest @ ..] => {
("a number".to_owned(), rest)
}
[ValueKind::Integer, ValueKind::NegativeInteger, rest @ ..] => {
("an integer".to_owned(), rest)
}
[a] => (single_description(a).to_owned(), &[]),
[a, rest @ ..] => (single_description(a).to_owned(), rest),
};
if rest.is_empty() {
if *count_items == 0 {
message.push_str(&msg_part);
} else if *count_items == 1 {
message.push_str(&format!(" or {msg_part}"));
} else {
message.push_str(&format!(", or {msg_part}"));
}
} else {
if *count_items == 0 {
message.push_str(&msg_part);
} else {
message.push_str(&format!(", {msg_part}"));
}
*count_items += 1;
description_rec(rest, count_items, message);
}
}
let mut kinds = kinds.to_owned();
kinds.sort_by_key(order);
kinds.dedup();
if kinds.is_empty() {
// Should not happen ideally
"a different value".to_owned()
} else {
let mut message = String::new();
description_rec(kinds.as_slice(), &mut 0, &mut message);
message
}
}
/// Return the JSON string of the value preceded by a description of its kind
fn value_description_with_kind_json(v: &serde_json::Value) -> String {
match v.kind() {
ValueKind::Null => "null".to_owned(),
kind => {
format!(
"{}: `{}`",
value_kinds_description_json(&[kind]),
serde_json::to_string(v).unwrap()
)
}
}
}
impl<C: Default + ErrorCode> deserr::DeserializeError for DeserrJsonError<C> {
fn error<V: IntoValue>(
_self_: Option<Self>,
error: deserr::ErrorKind<V>,
location: ValuePointerRef,
) -> Result<Self, Self> {
let mut message = String::new();
message.push_str(&match error {
ErrorKind::IncorrectValueKind { actual, accepted } => {
let expected = value_kinds_description_json(accepted);
let received = value_description_with_kind_json(&serde_json::Value::from(actual));
let location = location_json_description(location, " at");
format!("Invalid value type{location}: expected {expected}, but found {received}")
}
ErrorKind::MissingField { field } => {
let location = location_json_description(location, " inside");
format!("Missing field `{field}`{location}")
}
ErrorKind::UnknownKey { key, accepted } => {
let location = location_json_description(location, " inside");
format!(
"Unknown field `{}`{location}: expected one of {}",
key,
accepted
.iter()
.map(|accepted| format!("`{}`", accepted))
.collect::<Vec<String>>()
.join(", ")
)
}
ErrorKind::UnknownValue { value, accepted } => {
let location = location_json_description(location, " at");
format!(
"Unknown value `{}`{location}: expected one of {}",
value,
accepted
.iter()
.map(|accepted| format!("`{}`", accepted))
.collect::<Vec<String>>()
.join(", "),
)
}
ErrorKind::Unexpected { msg } => {
let location = location_json_description(location, " at");
format!("Invalid value{location}: {msg}")
}
});
Err(DeserrJsonError::new(message, C::default().error_code()))
}
}
pub fn immutable_field_error(field: &str, accepted: &[&str], code: Code) -> DeserrJsonError {
let msg = format!(
"Immutable field `{field}`: expected one of {}",
accepted
.iter()
.map(|accepted| format!("`{}`", accepted))
.collect::<Vec<String>>()
.join(", ")
);
DeserrJsonError::new(msg, code)
}
/// Return a description of the given location in query parameters, preceded by the
/// given article. e.g. `at key5[2]`. If the location is the origin, the given article
/// will not be included in the description.
pub fn location_query_param_description(location: ValuePointerRef, article: &str) -> String {
fn rec(location: ValuePointerRef) -> String {
match location {
ValuePointerRef::Origin => String::new(),
ValuePointerRef::Key { key, prev } => {
if matches!(prev, ValuePointerRef::Origin) {
key.to_owned()
} else {
rec(*prev) + "." + key
}
}
ValuePointerRef::Index { index, prev } => format!("{}[{index}]", rec(*prev)),
}
}
match location {
ValuePointerRef::Origin => String::new(),
_ => {
format!("{article} `{}`", rec(location))
}
}
}
impl<C: Default + ErrorCode> deserr::DeserializeError for DeserrQueryParamError<C> {
fn error<V: IntoValue>(
_self_: Option<Self>,
error: deserr::ErrorKind<V>,
location: ValuePointerRef,
) -> Result<Self, Self> {
let mut message = String::new();
message.push_str(&match error {
ErrorKind::IncorrectValueKind { actual, accepted } => {
let expected = value_kinds_description_query_param(accepted);
let received = value_description_with_kind_query_param(actual);
let location = location_query_param_description(location, " for parameter");
format!("Invalid value type{location}: expected {expected}, but found {received}")
}
ErrorKind::MissingField { field } => {
let location = location_query_param_description(location, " inside");
format!("Missing parameter `{field}`{location}")
}
ErrorKind::UnknownKey { key, accepted } => {
let location = location_query_param_description(location, " inside");
format!(
"Unknown parameter `{}`{location}: expected one of {}",
key,
accepted
.iter()
.map(|accepted| format!("`{}`", accepted))
.collect::<Vec<String>>()
.join(", ")
)
}
ErrorKind::UnknownValue { value, accepted } => {
let location = location_query_param_description(location, " for parameter");
format!(
"Unknown value `{}`{location}: expected one of {}",
value,
accepted
.iter()
.map(|accepted| format!("`{}`", accepted))
.collect::<Vec<String>>()
.join(", "),
)
}
ErrorKind::Unexpected { msg } => {
let location = location_query_param_description(location, " in parameter");
format!("Invalid value{location}: {msg}")
}
});
Err(DeserrQueryParamError::new(message, C::default().error_code()))
}
}
/// Return a description of the list of value kinds for query parameters
/// Since query parameters are always treated as strings, we always return
/// "a string" for now.
fn value_kinds_description_query_param(_accepted: &[ValueKind]) -> String {
"a string".to_owned()
}
fn value_description_with_kind_query_param<V: IntoValue>(actual: deserr::Value<V>) -> String {
match actual {
deserr::Value::Null => "null".to_owned(),
deserr::Value::Boolean(x) => format!("a boolean: `{x}`"),
deserr::Value::Integer(x) => format!("an integer: `{x}`"),
deserr::Value::NegativeInteger(x) => {
format!("an integer: `{x}`")
}
deserr::Value::Float(x) => {
format!("a number: `{x}`")
}
deserr::Value::String(x) => {
format!("a string: `{x}`")
}
deserr::Value::Sequence(_) => "multiple values".to_owned(),
deserr::Value::Map(_) => "multiple parameters".to_owned(),
}
}
#[cfg(test)]
mod tests {
use deserr::ValueKind;
use crate::deserr::error_messages::value_kinds_description_json;
#[test]
fn test_value_kinds_description_json() {
insta::assert_display_snapshot!(value_kinds_description_json(&[]), @"a different value");
insta::assert_display_snapshot!(value_kinds_description_json(&[ValueKind::Boolean]), @"a boolean");
insta::assert_display_snapshot!(value_kinds_description_json(&[ValueKind::Integer]), @"a positive integer");
insta::assert_display_snapshot!(value_kinds_description_json(&[ValueKind::NegativeInteger]), @"a negative integer");
insta::assert_display_snapshot!(value_kinds_description_json(&[ValueKind::Integer]), @"a positive integer");
insta::assert_display_snapshot!(value_kinds_description_json(&[ValueKind::String]), @"a string");
insta::assert_display_snapshot!(value_kinds_description_json(&[ValueKind::Sequence]), @"an array");
insta::assert_display_snapshot!(value_kinds_description_json(&[ValueKind::Map]), @"an object");
insta::assert_display_snapshot!(value_kinds_description_json(&[ValueKind::Integer, ValueKind::Boolean]), @"a boolean or a positive integer");
insta::assert_display_snapshot!(value_kinds_description_json(&[ValueKind::Null, ValueKind::Integer]), @"null or a positive integer");
insta::assert_display_snapshot!(value_kinds_description_json(&[ValueKind::Sequence, ValueKind::NegativeInteger]), @"a negative integer or an array");
insta::assert_display_snapshot!(value_kinds_description_json(&[ValueKind::Integer, ValueKind::Float]), @"a number");
insta::assert_display_snapshot!(value_kinds_description_json(&[ValueKind::Integer, ValueKind::Float, ValueKind::NegativeInteger]), @"a number");
insta::assert_display_snapshot!(value_kinds_description_json(&[ValueKind::Integer, ValueKind::Float, ValueKind::NegativeInteger, ValueKind::Null]), @"null or a number");
insta::assert_display_snapshot!(value_kinds_description_json(&[ValueKind::Boolean, ValueKind::Integer, ValueKind::Float, ValueKind::NegativeInteger, ValueKind::Null]), @"null, a boolean, or a number");
insta::assert_display_snapshot!(value_kinds_description_json(&[ValueKind::Null, ValueKind::Boolean, ValueKind::Integer, ValueKind::Float, ValueKind::NegativeInteger, ValueKind::Null]), @"null, a boolean, or a number");
}
}

View File

@ -1,18 +1,19 @@
use std::convert::Infallible;
use std::fmt;
use std::marker::PhantomData;
use std::ops::ControlFlow;
use deserr::{DeserializeError, MergeWithError, ValuePointerRef};
use deserr::errors::{JsonError, QueryParamError};
use deserr::{take_cf_content, DeserializeError, IntoValue, MergeWithError, ValuePointerRef};
use crate::error::deserr_codes::{self, *};
use crate::error::deserr_codes::*;
use crate::error::{
unwrap_any, Code, DeserrParseBoolError, DeserrParseIntError, ErrorCode, InvalidTaskDateError,
Code, DeserrParseBoolError, DeserrParseIntError, ErrorCode, InvalidTaskDateError,
ParseOffsetDateTimeError,
};
use crate::index_uid::IndexUidFormatError;
use crate::tasks::{ParseTaskKindError, ParseTaskStatusError};
pub mod error_messages;
pub mod query_params;
/// Marker type for the Json format
@ -20,8 +21,8 @@ pub struct DeserrJson;
/// Marker type for the Query Parameter format
pub struct DeserrQueryParam;
pub type DeserrJsonError<C = deserr_codes::BadRequest> = DeserrError<DeserrJson, C>;
pub type DeserrQueryParamError<C = deserr_codes::BadRequest> = DeserrError<DeserrQueryParam, C>;
pub type DeserrJsonError<C = BadRequest> = DeserrError<DeserrJson, C>;
pub type DeserrQueryParamError<C = BadRequest> = DeserrError<DeserrQueryParam, C>;
/// A request deserialization error.
///
@ -37,6 +38,7 @@ impl<Format, C: Default + ErrorCode> DeserrError<Format, C> {
Self { msg, code, _phantom: PhantomData }
}
}
impl<Format, C: Default + ErrorCode> std::fmt::Debug for DeserrError<Format, C> {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
f.debug_struct("DeserrError").field("msg", &self.msg).field("code", &self.code).finish()
@ -49,6 +51,16 @@ impl<Format, C: Default + ErrorCode> std::fmt::Display for DeserrError<Format, C
}
}
impl<F, C: Default + ErrorCode> actix_web::ResponseError for DeserrError<F, C> {
fn status_code(&self) -> actix_web::http::StatusCode {
self.code.http()
}
fn error_response(&self) -> actix_web::HttpResponse<actix_web::body::BoxBody> {
crate::error::ResponseError::from_msg(self.msg.to_string(), self.code).error_response()
}
}
impl<Format, C: Default + ErrorCode> std::error::Error for DeserrError<Format, C> {}
impl<Format, C: Default + ErrorCode> ErrorCode for DeserrError<Format, C> {
fn error_code(&self) -> Code {
@ -64,8 +76,8 @@ impl<Format, C1: Default + ErrorCode, C2: Default + ErrorCode>
_self_: Option<Self>,
other: DeserrError<Format, C2>,
_merge_location: ValuePointerRef,
) -> Result<Self, Self> {
Err(DeserrError { msg: other.msg, code: other.code, _phantom: PhantomData })
) -> ControlFlow<Self, Self> {
ControlFlow::Break(DeserrError { msg: other.msg, code: other.code, _phantom: PhantomData })
}
}
@ -74,17 +86,56 @@ impl<Format, C: Default + ErrorCode> MergeWithError<Infallible> for DeserrError<
_self_: Option<Self>,
_other: Infallible,
_merge_location: ValuePointerRef,
) -> Result<Self, Self> {
) -> ControlFlow<Self, Self> {
unreachable!()
}
}
impl<C: Default + ErrorCode> DeserializeError for DeserrJsonError<C> {
fn error<V: IntoValue>(
_self_: Option<Self>,
error: deserr::ErrorKind<V>,
location: ValuePointerRef,
) -> ControlFlow<Self, Self> {
ControlFlow::Break(DeserrJsonError::new(
take_cf_content(JsonError::error(None, error, location)).to_string(),
C::default().error_code(),
))
}
}
impl<C: Default + ErrorCode> DeserializeError for DeserrQueryParamError<C> {
fn error<V: IntoValue>(
_self_: Option<Self>,
error: deserr::ErrorKind<V>,
location: ValuePointerRef,
) -> ControlFlow<Self, Self> {
ControlFlow::Break(DeserrQueryParamError::new(
take_cf_content(QueryParamError::error(None, error, location)).to_string(),
C::default().error_code(),
))
}
}
pub fn immutable_field_error(field: &str, accepted: &[&str], code: Code) -> DeserrJsonError {
let msg = format!(
"Immutable field `{field}`: expected one of {}",
accepted
.iter()
.map(|accepted| format!("`{}`", accepted))
.collect::<Vec<String>>()
.join(", ")
);
DeserrJsonError::new(msg, code)
}
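For illustration (not in the diff): a minimal sketch of the message this helper produces; the accepted-field list is made up, and the assertion assumes `Display` surfaces the stored message.
#[test]
fn immutable_api_key_field_message() {
    let error = immutable_field_error(
        "createdAt",
        &["description", "name", "indexes"],
        Code::ImmutableApiKeyCreatedAt,
    );
    // Expected wording, assuming `Display` prints the message:
    // Immutable field `createdAt`: expected one of `description`, `name`, `indexes`
    assert!(error.to_string().starts_with("Immutable field `createdAt`"));
}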
// Implement a convenience function to build a `missing_field` error
macro_rules! make_missing_field_convenience_builder {
($err_code:ident, $fn_name:ident) => {
impl DeserrJsonError<$err_code> {
pub fn $fn_name(field: &str, location: ValuePointerRef) -> Self {
let x = unwrap_any(Self::error::<Infallible>(
let x = deserr::take_cf_content(Self::error::<Infallible>(
None,
deserr::ErrorKind::MissingField { field },
location,
@ -112,7 +163,7 @@ macro_rules! merge_with_error_impl_take_error_message {
_self_: Option<Self>,
other: $err_type,
merge_location: ValuePointerRef,
) -> Result<Self, Self> {
) -> ControlFlow<Self, Self> {
DeserrError::<Format, C>::error::<Infallible>(
None,
deserr::ErrorKind::Unexpected { msg: other.to_string() },

View File

@ -15,10 +15,9 @@ use std::convert::Infallible;
use std::ops::Deref;
use std::str::FromStr;
use deserr::{DeserializeError, DeserializeFromValue, MergeWithError, ValueKind};
use deserr::{DeserializeError, Deserr, MergeWithError, ValueKind};
use super::{DeserrParseBoolError, DeserrParseIntError};
use crate::error::unwrap_any;
use crate::index_uid::IndexUid;
use crate::tasks::{Kind, Status};
@ -38,7 +37,7 @@ impl<T> Deref for Param<T> {
}
}
impl<T, E> DeserializeFromValue<E> for Param<T>
impl<T, E> Deserr<E> for Param<T>
where
E: DeserializeError + MergeWithError<T::Err>,
T: FromQueryParameter,
@ -50,9 +49,9 @@ where
match value {
deserr::Value::String(s) => match T::from_query_param(&s) {
Ok(x) => Ok(Param(x)),
Err(e) => Err(unwrap_any(E::merge(None, e, location))),
Err(e) => Err(deserr::take_cf_content(E::merge(None, e, location))),
},
_ => Err(unwrap_any(E::error(
_ => Err(deserr::take_cf_content(E::error(
None,
deserr::ErrorKind::IncorrectValueKind {
actual: value,

View File

@ -19,7 +19,7 @@ type Result<T> = std::result::Result<T, DocumentFormatError>;
pub enum PayloadType {
Ndjson,
Json,
Csv,
Csv { delimiter: u8 },
}
impl fmt::Display for PayloadType {
@ -27,7 +27,7 @@ impl fmt::Display for PayloadType {
match self {
PayloadType::Ndjson => f.write_str("ndjson"),
PayloadType::Json => f.write_str("json"),
PayloadType::Csv => f.write_str("csv"),
PayloadType::Csv { .. } => f.write_str("csv"),
}
}
}
@ -105,11 +105,11 @@ impl ErrorCode for DocumentFormatError {
}
/// Reads CSV from input and write an obkv batch to writer.
pub fn read_csv(file: &File, writer: impl Write + Seek) -> Result<u64> {
pub fn read_csv(file: &File, writer: impl Write + Seek, delimiter: u8) -> Result<u64> {
let mut builder = DocumentsBatchBuilder::new(writer);
let mmap = unsafe { MmapOptions::new().map(file)? };
let csv = csv::Reader::from_reader(mmap.as_ref());
builder.append_csv(csv).map_err(|e| (PayloadType::Csv, e))?;
let csv = csv::ReaderBuilder::new().delimiter(delimiter).from_reader(mmap.as_ref());
builder.append_csv(csv).map_err(|e| (PayloadType::Csv { delimiter }, e))?;
let count = builder.documents_count();
let _ = builder.into_inner().map_err(DocumentFormatError::Io)?;
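A hedged usage sketch (not in the diff): feeding a semicolon-separated payload through the new `delimiter` parameter; the helper name and the in-memory cursor are illustrative.
fn import_semicolon_csv(file: &File) -> Result<u64> {
    // Write the resulting obkv batch into an in-memory buffer.
    let buffer = std::io::Cursor::new(Vec::new());
    read_csv(file, buffer, b';')
}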

View File

@ -11,8 +11,8 @@ use serde::{Deserialize, Serialize};
#[serde(rename_all = "camelCase")]
pub struct ResponseError {
#[serde(skip)]
code: StatusCode,
message: String,
pub code: StatusCode,
pub message: String,
#[serde(rename = "code")]
error_code: String,
#[serde(rename = "type")]
@ -127,7 +127,7 @@ macro_rules! make_error_codes {
}
impl Code {
/// return the HTTP status code associated with the `Code`
fn http(&self) -> StatusCode {
pub fn http(&self) -> StatusCode {
match self {
$(
Code::$code_ident => StatusCode::$status
@ -212,6 +212,7 @@ InvalidApiKeyName , InvalidRequest , BAD_REQUEST ;
InvalidApiKeyOffset , InvalidRequest , BAD_REQUEST ;
InvalidApiKeyUid , InvalidRequest , BAD_REQUEST ;
InvalidContentType , InvalidRequest , UNSUPPORTED_MEDIA_TYPE ;
InvalidDocumentCsvDelimiter , InvalidRequest , BAD_REQUEST ;
InvalidDocumentFields , InvalidRequest , BAD_REQUEST ;
InvalidDocumentGeoField , InvalidRequest , BAD_REQUEST ;
InvalidDocumentId , InvalidRequest , BAD_REQUEST ;
@ -381,14 +382,6 @@ impl ErrorCode for io::Error {
}
}
/// Unwrap a result, either its Ok or Err value.
pub fn unwrap_any<T>(any: Result<T, T>) -> T {
match any {
Ok(any) => any,
Err(any) => any,
}
}
/// Deserialization when `deserr` cannot parse an API key date.
#[derive(Debug)]
pub struct ParseOffsetDateTimeError(pub String);

View File

@ -2,14 +2,14 @@ use std::error::Error;
use std::fmt;
use std::str::FromStr;
use deserr::DeserializeFromValue;
use deserr::Deserr;
use crate::error::{Code, ErrorCode};
/// An index uid is composed of only ascii alphanumeric characters, - and _, between 1 and 400
/// bytes long
#[derive(Debug, Clone, PartialEq, Eq, DeserializeFromValue)]
#[deserr(from(String) = IndexUid::try_from -> IndexUidFormatError)]
#[derive(Debug, Clone, PartialEq, Eq, Deserr)]
#[deserr(try_from(String) = IndexUid::try_from -> IndexUidFormatError)]
pub struct IndexUid(String);
impl IndexUid {

View File

@ -0,0 +1,124 @@
use std::borrow::Borrow;
use std::error::Error;
use std::fmt;
use std::ops::Deref;
use std::str::FromStr;
use deserr::Deserr;
use serde::{Deserialize, Serialize};
use crate::error::{Code, ErrorCode};
use crate::index_uid::{IndexUid, IndexUidFormatError};
/// An index uid pattern is composed of only ascii alphanumeric characters, - and _, between 1 and 400
/// bytes long and optionally ending with a *.
#[derive(Serialize, Deserialize, Deserr, Debug, Clone, PartialEq, Eq, Hash)]
#[deserr(try_from(&String) = FromStr::from_str -> IndexUidPatternFormatError)]
pub struct IndexUidPattern(String);
impl IndexUidPattern {
pub fn new_unchecked(s: impl AsRef<str>) -> Self {
Self(s.as_ref().to_string())
}
/// Matches any index name.
pub fn all() -> Self {
IndexUidPattern::from_str("*").unwrap()
}
/// Returns `true` if it matches any index.
pub fn matches_all(&self) -> bool {
self.0 == "*"
}
/// Returns `true` if the pattern matches a specific index name.
pub fn is_exact(&self) -> bool {
!self.0.ends_with('*')
}
/// Returns whether this index uid matches this index uid pattern.
pub fn matches(&self, uid: &IndexUid) -> bool {
self.matches_str(uid.as_str())
}
/// Returns whether this string matches this index uid pattern.
pub fn matches_str(&self, uid: &str) -> bool {
match self.0.strip_suffix('*') {
Some(prefix) => uid.starts_with(prefix),
None => self.0 == uid,
}
}
}
impl Deref for IndexUidPattern {
type Target = str;
fn deref(&self) -> &Self::Target {
&self.0
}
}
impl Borrow<str> for IndexUidPattern {
fn borrow(&self) -> &str {
&self.0
}
}
impl TryFrom<String> for IndexUidPattern {
type Error = IndexUidPatternFormatError;
fn try_from(uid: String) -> Result<Self, Self::Error> {
let result = match uid.strip_suffix('*') {
Some("") => Ok(IndexUidPattern(uid)),
Some(prefix) => IndexUid::from_str(prefix).map(|_| IndexUidPattern(uid)),
None => IndexUid::try_from(uid).map(IndexUid::into_inner).map(IndexUidPattern),
};
match result {
Ok(index_uid_pattern) => Ok(index_uid_pattern),
Err(IndexUidFormatError { invalid_uid }) => {
Err(IndexUidPatternFormatError { invalid_uid })
}
}
}
}
impl FromStr for IndexUidPattern {
type Err = IndexUidPatternFormatError;
fn from_str(uid: &str) -> Result<IndexUidPattern, IndexUidPatternFormatError> {
uid.to_string().try_into()
}
}
impl From<IndexUidPattern> for String {
fn from(IndexUidPattern(uid): IndexUidPattern) -> Self {
uid
}
}
#[derive(Debug)]
pub struct IndexUidPatternFormatError {
pub invalid_uid: String,
}
impl fmt::Display for IndexUidPatternFormatError {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
write!(
f,
"`{}` is not a valid index uid pattern. Index uid patterns \
can be an integer or a string containing only alphanumeric \
characters, hyphens (-), underscores (_), and \
optionally ending with a star (*).",
self.invalid_uid,
)
}
}
impl Error for IndexUidPatternFormatError {}
impl ErrorCode for IndexUidPatternFormatError {
fn error_code(&self) -> Code {
Code::InvalidIndexUid
}
}
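A hedged sketch (not in the diff) of the matching semantics documented above: an exact pattern matches only itself, a trailing `*` matches any uid sharing the prefix, and `*` alone matches everything.
#[test]
fn pattern_matching_semantics() {
    let exact = IndexUidPattern::from_str("movies").unwrap();
    let prefixed = IndexUidPattern::from_str("movies-*").unwrap();
    assert!(exact.is_exact() && exact.matches_str("movies"));
    assert!(!exact.matches_str("movies-2023"));
    assert!(!prefixed.is_exact() && prefixed.matches_str("movies-2023"));
    assert!(!prefixed.matches_str("series"));
    assert!(IndexUidPattern::all().matches_all());
}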

View File

@ -2,7 +2,7 @@ use std::convert::Infallible;
use std::hash::Hash;
use std::str::FromStr;
use deserr::{DeserializeError, DeserializeFromValue, ValuePointerRef};
use deserr::{DeserializeError, Deserr, MergeWithError, ValuePointerRef};
use enum_iterator::Sequence;
use milli::update::Setting;
use serde::{Deserialize, Serialize};
@ -11,31 +11,44 @@ use time::macros::{format_description, time};
use time::{Date, OffsetDateTime, PrimitiveDateTime};
use uuid::Uuid;
use crate::deserr::error_messages::immutable_field_error;
use crate::deserr::DeserrJsonError;
use crate::deserr::{immutable_field_error, DeserrError, DeserrJsonError};
use crate::error::deserr_codes::*;
use crate::error::{unwrap_any, Code, ParseOffsetDateTimeError};
use crate::index_uid::IndexUid;
use crate::star_or::StarOr;
use crate::error::{Code, ErrorCode, ParseOffsetDateTimeError};
use crate::index_uid_pattern::{IndexUidPattern, IndexUidPatternFormatError};
pub type KeyId = Uuid;
#[derive(Debug, DeserializeFromValue)]
impl<C: Default + ErrorCode> MergeWithError<IndexUidPatternFormatError> for DeserrJsonError<C> {
fn merge(
_self_: Option<Self>,
other: IndexUidPatternFormatError,
merge_location: deserr::ValuePointerRef,
) -> std::ops::ControlFlow<Self, Self> {
DeserrError::error::<Infallible>(
None,
deserr::ErrorKind::Unexpected { msg: other.to_string() },
merge_location,
)
}
}
#[derive(Debug, Deserr)]
#[deserr(error = DeserrJsonError, rename_all = camelCase, deny_unknown_fields)]
pub struct CreateApiKey {
#[deserr(default, error = DeserrJsonError<InvalidApiKeyDescription>)]
pub description: Option<String>,
#[deserr(default, error = DeserrJsonError<InvalidApiKeyName>)]
pub name: Option<String>,
#[deserr(default = Uuid::new_v4(), error = DeserrJsonError<InvalidApiKeyUid>, from(&String) = Uuid::from_str -> uuid::Error)]
#[deserr(default = Uuid::new_v4(), error = DeserrJsonError<InvalidApiKeyUid>, try_from(&String) = Uuid::from_str -> uuid::Error)]
pub uid: KeyId,
#[deserr(error = DeserrJsonError<InvalidApiKeyActions>, missing_field_error = DeserrJsonError::missing_api_key_actions)]
pub actions: Vec<Action>,
#[deserr(error = DeserrJsonError<InvalidApiKeyIndexes>, missing_field_error = DeserrJsonError::missing_api_key_indexes)]
pub indexes: Vec<StarOr<IndexUid>>,
#[deserr(error = DeserrJsonError<InvalidApiKeyExpiresAt>, from(Option<String>) = parse_expiration_date -> ParseOffsetDateTimeError, missing_field_error = DeserrJsonError::missing_api_key_expires_at)]
pub indexes: Vec<IndexUidPattern>,
#[deserr(error = DeserrJsonError<InvalidApiKeyExpiresAt>, try_from(Option<String>) = parse_expiration_date -> ParseOffsetDateTimeError, missing_field_error = DeserrJsonError::missing_api_key_expires_at)]
pub expires_at: Option<OffsetDateTime>,
}
impl CreateApiKey {
pub fn to_key(self) -> Key {
let CreateApiKey { description, name, uid, actions, indexes, expires_at } = self;
@ -65,7 +78,7 @@ fn deny_immutable_fields_api_key(
"expiresAt" => immutable_field_error(field, accepted, Code::ImmutableApiKeyExpiresAt),
"createdAt" => immutable_field_error(field, accepted, Code::ImmutableApiKeyCreatedAt),
"updatedAt" => immutable_field_error(field, accepted, Code::ImmutableApiKeyUpdatedAt),
_ => unwrap_any(DeserrJsonError::<BadRequest>::error::<Infallible>(
_ => deserr::take_cf_content(DeserrJsonError::<BadRequest>::error::<Infallible>(
None,
deserr::ErrorKind::UnknownKey { key: field, accepted },
location,
@ -73,7 +86,7 @@ fn deny_immutable_fields_api_key(
}
}
#[derive(Debug, DeserializeFromValue)]
#[derive(Debug, Deserr)]
#[deserr(error = DeserrJsonError, rename_all = camelCase, deny_unknown_fields = deny_immutable_fields_api_key)]
pub struct PatchApiKey {
#[deserr(default, error = DeserrJsonError<InvalidApiKeyDescription>)]
@ -90,7 +103,7 @@ pub struct Key {
pub name: Option<String>,
pub uid: KeyId,
pub actions: Vec<Action>,
pub indexes: Vec<StarOr<IndexUid>>,
pub indexes: Vec<IndexUidPattern>,
#[serde(with = "time::serde::rfc3339::option")]
pub expires_at: Option<OffsetDateTime>,
#[serde(with = "time::serde::rfc3339")]
@ -108,7 +121,7 @@ impl Key {
description: Some("Use it for anything that is not a search operation. Caution! Do not expose it on a public frontend".to_string()),
uid,
actions: vec![Action::All],
indexes: vec![StarOr::Star],
indexes: vec![IndexUidPattern::all()],
expires_at: None,
created_at: now,
updated_at: now,
@ -123,7 +136,7 @@ impl Key {
description: Some("Use it to search from the frontend".to_string()),
uid,
actions: vec![Action::Search],
indexes: vec![StarOr::Star],
indexes: vec![IndexUidPattern::all()],
expires_at: None,
created_at: now,
updated_at: now,
@ -168,9 +181,7 @@ fn parse_expiration_date(
}
}
#[derive(
Copy, Clone, Serialize, Deserialize, Debug, Eq, PartialEq, Hash, Sequence, DeserializeFromValue,
)]
#[derive(Copy, Clone, Serialize, Deserialize, Debug, Eq, PartialEq, Hash, Sequence, Deserr)]
#[repr(u8)]
pub enum Action {
#[serde(rename = "*")]

View File

@ -3,6 +3,7 @@ pub mod deserr;
pub mod document_formats;
pub mod error;
pub mod index_uid;
pub mod index_uid_pattern;
pub mod keys;
pub mod settings;
pub mod star_or;

View File

@ -3,9 +3,10 @@ use std::convert::Infallible;
use std::fmt;
use std::marker::PhantomData;
use std::num::NonZeroUsize;
use std::ops::ControlFlow;
use std::str::FromStr;
use deserr::{DeserializeError, DeserializeFromValue, ErrorKind, MergeWithError, ValuePointerRef};
use deserr::{DeserializeError, Deserr, ErrorKind, MergeWithError, ValuePointerRef};
use fst::IntoStreamer;
use milli::update::Setting;
use milli::{Criterion, CriterionError, Index, DEFAULT_VALUES_PER_FACET};
@ -13,7 +14,6 @@ use serde::{Deserialize, Serialize, Serializer};
use crate::deserr::DeserrJsonError;
use crate::error::deserr_codes::*;
use crate::error::unwrap_any;
/// The maximum number of results that the engine
/// will be able to return in one search call.
@ -41,7 +41,7 @@ pub struct Checked;
#[derive(Clone, Default, Debug, Serialize, Deserialize, PartialEq, Eq)]
pub struct Unchecked;
impl<E> DeserializeFromValue<E> for Unchecked
impl<E> Deserr<E> for Unchecked
where
E: DeserializeError,
{
@ -59,13 +59,13 @@ fn validate_min_word_size_for_typo_setting<E: DeserializeError>(
) -> Result<MinWordSizeTyposSetting, E> {
if let (Setting::Set(one), Setting::Set(two)) = (s.one_typo, s.two_typos) {
if one > two {
return Err(unwrap_any(E::error::<Infallible>(None, ErrorKind::Unexpected { msg: format!("`minWordSizeForTypos` setting is invalid. `oneTypo` and `twoTypos` fields should be between `0` and `255`, and `twoTypos` should be greater or equals to `oneTypo` but found `oneTypo: {one}` and twoTypos: {two}`.") }, location)));
return Err(deserr::take_cf_content(E::error::<Infallible>(None, ErrorKind::Unexpected { msg: format!("`minWordSizeForTypos` setting is invalid. `oneTypo` and `twoTypos` fields should be between `0` and `255`, and `twoTypos` should be greater than or equal to `oneTypo` but found `oneTypo: {one}` and `twoTypos: {two}`.") }, location)));
}
}
Ok(s)
}
#[derive(Debug, Clone, Default, Serialize, Deserialize, PartialEq, Eq, DeserializeFromValue)]
#[derive(Debug, Clone, Default, Serialize, Deserialize, PartialEq, Eq, Deserr)]
#[serde(deny_unknown_fields, rename_all = "camelCase")]
#[deserr(deny_unknown_fields, rename_all = camelCase, validate = validate_min_word_size_for_typo_setting -> DeserrJsonError<InvalidSettingsTypoTolerance>)]
pub struct MinWordSizeTyposSetting {
@ -77,7 +77,7 @@ pub struct MinWordSizeTyposSetting {
pub two_typos: Setting<u8>,
}
#[derive(Debug, Clone, Default, Serialize, Deserialize, PartialEq, Eq, DeserializeFromValue)]
#[derive(Debug, Clone, Default, Serialize, Deserialize, PartialEq, Eq, Deserr)]
#[serde(deny_unknown_fields, rename_all = "camelCase")]
#[deserr(deny_unknown_fields, rename_all = camelCase, where_predicate = __Deserr_E: deserr::MergeWithError<DeserrJsonError<InvalidSettingsTypoTolerance>>)]
pub struct TypoSettings {
@ -95,7 +95,7 @@ pub struct TypoSettings {
pub disable_on_attributes: Setting<BTreeSet<String>>,
}
#[derive(Debug, Clone, Default, Serialize, Deserialize, PartialEq, Eq, DeserializeFromValue)]
#[derive(Debug, Clone, Default, Serialize, Deserialize, PartialEq, Eq, Deserr)]
#[serde(deny_unknown_fields, rename_all = "camelCase")]
#[deserr(rename_all = camelCase, deny_unknown_fields)]
pub struct FacetingSettings {
@ -104,7 +104,7 @@ pub struct FacetingSettings {
pub max_values_per_facet: Setting<usize>,
}
#[derive(Debug, Clone, Default, Serialize, Deserialize, PartialEq, Eq, DeserializeFromValue)]
#[derive(Debug, Clone, Default, Serialize, Deserialize, PartialEq, Eq, Deserr)]
#[serde(deny_unknown_fields, rename_all = "camelCase")]
#[deserr(rename_all = camelCase, deny_unknown_fields)]
pub struct PaginationSettings {
@ -118,7 +118,7 @@ impl MergeWithError<milli::CriterionError> for DeserrJsonError<InvalidSettingsRa
_self_: Option<Self>,
other: milli::CriterionError,
merge_location: ValuePointerRef,
) -> Result<Self, Self> {
) -> ControlFlow<Self, Self> {
Self::error::<Infallible>(
None,
ErrorKind::Unexpected { msg: other.to_string() },
@ -130,7 +130,7 @@ impl MergeWithError<milli::CriterionError> for DeserrJsonError<InvalidSettingsRa
/// Holds all the settings for an index. `T` can either be `Checked` if they represent settings
/// whose validity is guaranteed, or `Unchecked` if they need to be validated. In the latter case, a
/// call to `check` will return a `Settings<Checked>` from a `Settings<Unchecked>`.
#[derive(Debug, Clone, Default, Serialize, Deserialize, PartialEq, Eq, DeserializeFromValue)]
#[derive(Debug, Clone, Default, Serialize, Deserialize, PartialEq, Eq, Deserr)]
#[serde(
deny_unknown_fields,
rename_all = "camelCase",
@ -509,8 +509,8 @@ pub fn settings(
})
}
#[derive(Debug, Clone, PartialEq, Eq, DeserializeFromValue)]
#[deserr(from(&String) = FromStr::from_str -> CriterionError)]
#[derive(Debug, Clone, PartialEq, Eq, Deserr)]
#[deserr(try_from(&String) = FromStr::from_str -> CriterionError)]
pub enum RankingRuleView {
/// Sorted by decreasing number of matched query terms.
/// Query words at the front of an attribute is considered better than if it was at the back.

View File

@ -1,13 +1,13 @@
use std::fmt;
use std::marker::PhantomData;
use std::ops::ControlFlow;
use std::str::FromStr;
use deserr::{DeserializeError, DeserializeFromValue, MergeWithError, ValueKind};
use deserr::{DeserializeError, Deserr, MergeWithError, ValueKind};
use serde::de::Visitor;
use serde::{Deserialize, Deserializer, Serialize, Serializer};
use crate::deserr::query_params::FromQueryParameter;
use crate::error::unwrap_any;
/// A type that tries to match either a star (*) or
/// any other thing that implements `FromStr`.
@ -111,7 +111,7 @@ where
}
}
impl<T, E> DeserializeFromValue<E> for StarOr<T>
impl<T, E> Deserr<E> for StarOr<T>
where
T: FromStr,
E: DeserializeError + MergeWithError<T::Err>,
@ -127,11 +127,11 @@ where
} else {
match T::from_str(&v) {
Ok(parsed) => Ok(StarOr::Other(parsed)),
Err(e) => Err(unwrap_any(E::merge(None, e, location))),
Err(e) => Err(deserr::take_cf_content(E::merge(None, e, location))),
}
}
}
_ => Err(unwrap_any(E::error::<V>(
_ => Err(deserr::take_cf_content(E::error::<V>(
None,
deserr::ErrorKind::IncorrectValueKind {
actual: value,
@ -191,7 +191,7 @@ where
}
}
impl<T, E> DeserializeFromValue<E> for OptionStarOr<T>
impl<T, E> Deserr<E> for OptionStarOr<T>
where
E: DeserializeError + MergeWithError<T::Err>,
T: FromQueryParameter,
@ -205,10 +205,10 @@ where
"*" => Ok(OptionStarOr::Star),
s => match T::from_query_param(s) {
Ok(x) => Ok(OptionStarOr::Other(x)),
Err(e) => Err(unwrap_any(E::merge(None, e, location))),
Err(e) => Err(deserr::take_cf_content(E::merge(None, e, location))),
},
},
_ => Err(unwrap_any(E::error::<V>(
_ => Err(deserr::take_cf_content(E::error::<V>(
None,
deserr::ErrorKind::IncorrectValueKind {
actual: value,
@ -271,7 +271,7 @@ impl<T> OptionStarOrList<T> {
}
}
impl<T, E> DeserializeFromValue<E> for OptionStarOrList<T>
impl<T, E> Deserr<E> for OptionStarOrList<T>
where
E: DeserializeError + MergeWithError<T::Err>,
T: FromQueryParameter,
@ -299,7 +299,10 @@ where
Err(e) => {
let location =
if len_cs > 1 { location.push_index(i) } else { location };
error = Some(E::merge(error, e, location)?);
error = match E::merge(error, e, location) {
ControlFlow::Continue(e) => Some(e),
ControlFlow::Break(e) => return Err(e),
};
}
}
}
@ -314,7 +317,7 @@ where
Ok(OptionStarOrList::List(els))
}
}
_ => Err(unwrap_any(E::error::<V>(
_ => Err(deserr::take_cf_content(E::error::<V>(
None,
deserr::ErrorKind::IncorrectValueKind {
actual: value,

View File
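
The recurring mechanical change in this file is the replacement of the old `unwrap_any` helper with `deserr::take_cf_content`, because `MergeWithError::merge` (and `DeserializeError::error`) now return a `ControlFlow` instead of a `Result`. A small sketch of the accumulate-or-abort pattern used in `OptionStarOrList` above (the helper name is made up):

```rust
use std::ops::ControlFlow;

use deserr::{MergeWithError, ValuePointerRef};

// Merge a freshly encountered error into the accumulator:
// `Continue` keeps deserializing to collect more errors, `Break` aborts right away.
fn accumulate<E, F>(acc: Option<E>, new: F, location: ValuePointerRef<'_>) -> Result<Option<E>, E>
where
    E: MergeWithError<F>,
{
    match E::merge(acc, new, location) {
        ControlFlow::Continue(e) => Ok(Some(e)),
        ControlFlow::Break(e) => Err(e),
    }
}
```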

@ -46,7 +46,7 @@ pub fn check_version_file(db_path: &Path) -> anyhow::Result<()> {
pub enum VersionFileError {
#[error(
"Meilisearch (v{}) failed to infer the version of the database.
To update Meilisearch please follow our guide on https://docs.meilisearch.com/learn/advanced/updating.html.",
To update Meilisearch please follow our guide on https://docs.meilisearch.com/learn/update_and_migration/updating.html.",
env!("CARGO_PKG_VERSION").to_string()
)]
MissingVersionFile,
@ -54,7 +54,7 @@ pub enum VersionFileError {
MalformedVersionFile,
#[error(
"Your database version ({major}.{minor}.{patch}) is incompatible with your current engine version ({}).\n\
To migrate data between Meilisearch versions, please follow our guide on https://docs.meilisearch.com/learn/advanced/updating.html.",
To migrate data between Meilisearch versions, please follow our guide on https://docs.meilisearch.com/learn/update_and_migration/updating.html.",
env!("CARGO_PKG_VERSION").to_string()
)]
VersionMismatch { major: String, minor: String, patch: String },

View File

@ -1,10 +1,16 @@
[package]
authors = ["Quentin de Quelen <quentin@dequelen.me>", "Clément Renault <clement@meilisearch.com>"]
description = "Meilisearch HTTP server"
edition = "2021"
license = "MIT"
name = "meilisearch"
version = "1.0.0"
publish = false
version.workspace = true
authors.workspace = true
description.workspace = true
homepage.workspace = true
readme.workspace = true
edition.workspace = true
license.workspace = true
default-run = "meilisearch"
[dependencies]
actix-cors = "0.6.3"
@ -19,7 +25,7 @@ byte-unit = { version = "4.0.14", default-features = false, features = ["std", "
bytes = "1.2.1"
clap = { version = "4.0.9", features = ["derive", "env"] }
crossbeam-channel = "0.5.6"
deserr = "0.3.0"
deserr = "0.5.0"
dump = { path = "../dump" }
either = "1.8.0"
env_logger = "0.9.1"
@ -46,7 +52,7 @@ parking_lot = "0.12.1"
permissive-json-pointer = { path = "../permissive-json-pointer" }
pin-project-lite = "0.2.9"
platform-dirs = "0.3.0"
prometheus = { version = "0.13.2", features = ["process"], optional = true }
prometheus = { version = "0.13.2", features = ["process"] }
rand = "0.8.5"
rayon = "1.5.3"
regex = "1.6.0"
@ -65,7 +71,7 @@ tar = "0.4.38"
tempfile = "3.3.0"
thiserror = "1.0.37"
time = { version = "0.3.15", features = ["serde-well-known", "formatting", "parsing", "macros"] }
tokio = { version = "1.21.2", features = ["full"] }
tokio = { version = "1.24.2", features = ["full"] }
tokio-stream = "0.1.10"
toml = "0.5.9"
uuid = { version = "1.1.2", features = ["serde", "v4"] }
@ -90,7 +96,7 @@ yaup = "0.2.1"
[build-dependencies]
anyhow = { version = "1.0.65", optional = true }
cargo_toml = { version = "0.13.0", optional = true }
cargo_toml = { version = "0.14.0", optional = true }
hex = { version = "0.4.3", optional = true }
reqwest = { version = "0.11.12", features = ["blocking", "rustls-tls"], default-features = false, optional = true }
sha-1 = { version = "0.10.0", optional = true }
@ -101,7 +107,6 @@ zip = { version = "0.6.2", optional = true }
[features]
default = ["analytics", "meilisearch-types/default", "mini-dashboard"]
metrics = ["prometheus"]
analytics = ["segment"]
mini-dashboard = ["actix-web-static-files", "static-files", "anyhow", "cargo_toml", "hex", "reqwest", "sha-1", "tempfile", "zip"]
chinese = ["meilisearch-types/chinese"]
@ -110,5 +115,5 @@ japanese = ["meilisearch-types/japanese"]
thai = ["meilisearch-types/thai"]
[package.metadata.mini-dashboard]
assets-url = "https://github.com/meilisearch/mini-dashboard/releases/download/v0.2.5/build.zip"
sha1 = "6fe959b78511b32e9ff857fd9fd31740633b9fce"
assets-url = "https://github.com/meilisearch/mini-dashboard/releases/download/v0.2.6/build.zip"
sha1 = "dce0aba16bceab5549edf9f01de89858800f7422"

View File
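
The package metadata above now uses Cargo's workspace inheritance (`key.workspace = true`). The workspace-root manifest is not part of this diff; under that assumption, the shared fields would be declared there roughly like this (all values are illustrative):

```toml
# Hypothetical [workspace.package] table in the repository-root Cargo.toml.
[workspace.package]
version = "1.1.0"
authors = ["Quentin de Quelen <quentin@dequelen.me>", "Clément Renault <clement@meilisearch.com>"]
description = "Meilisearch HTTP server"
homepage = "https://meilisearch.com"
readme = "README.md"
edition = "2021"
license = "MIT"
```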

@ -1,11 +1,12 @@
use vergen::{vergen, Config, SemverKind};
fn main() {
// Note: any code that needs VERGEN_ environment variables should take care to define them manually in the Dockerfile and pass them
// in the corresponding GitHub workflow (publish_docker.yml).
// This is due to the Dockerfile building the binary outside of the git directory.
let mut config = Config::default();
// allow using non-annotated tags
*config.git_mut().semver_kind_mut() = SemverKind::Lightweight;
// add -dirty suffix when we're not right on the tag
*config.git_mut().semver_dirty_mut() = Some("-dirty");
if let Err(e) = vergen(config) {
println!("cargo:warning=vergen: {}", e);

View File
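
With `SemverKind::Lightweight` and the `-dirty` suffix configured above, vergen bakes the description of the current commit into the `VERGEN_GIT_SEMVER_LIGHTWEIGHT` compile-time variable. A hedged illustration of the values it can take and how it is read back elsewhere in this diff (the tag name is invented):

```rust
// Reads the variable emitted by build.rs; `None` when vergen only printed a warning.
fn git_describe() -> Option<&'static str> {
    // Typical values with the configuration above:
    //   "prototype-search-revamp-0"        -> built right on the lightweight tag
    //   "prototype-search-revamp-0-dirty"  -> built from a commit that is not exactly on the tag
    option_env!("VERGEN_GIT_SEMVER_LIGHTWEIGHT")
}
```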

@ -9,7 +9,7 @@ use actix_web::HttpRequest;
use byte_unit::Byte;
use http::header::CONTENT_TYPE;
use index_scheduler::IndexScheduler;
use meilisearch_auth::{AuthController, SearchRules};
use meilisearch_auth::{AuthController, AuthFilter};
use meilisearch_types::InstanceUid;
use once_cell::sync::Lazy;
use regex::Regex;
@ -224,6 +224,7 @@ impl super::Analytics for SegmentAnalytics {
#[derive(Debug, Clone, Serialize)]
struct Infos {
env: String,
experimental_enable_metrics: bool,
db_path: bool,
import_dump: bool,
dump_dir: bool,
@ -256,6 +257,7 @@ impl From<Opt> for Infos {
// Thus we must not insert `..` at the end.
let Opt {
db_path,
experimental_enable_metrics,
http_addr,
master_key: _,
env,
@ -290,12 +292,14 @@ impl From<Opt> for Infos {
ScheduleSnapshot::Enabled(interval) => Some(interval),
};
let IndexerOpts { max_indexing_memory, max_indexing_threads } = indexer_options;
let IndexerOpts { max_indexing_memory, max_indexing_threads, skip_index_budget: _ } =
indexer_options;
// We're going to override every piece of sensitive information.
// We consider information sensitive if it contains a path, an address, or a key.
Self {
env,
experimental_enable_metrics,
db_path: db_path != PathBuf::from("./data.ms"),
import_dump: import_dump.is_some(),
dump_dir: dump_dir != PathBuf::from("dumps/"),
@ -407,7 +411,7 @@ impl Segment {
auth_controller: AuthController,
) {
if let Ok(stats) =
create_all_stats(index_scheduler.into(), auth_controller, &SearchRules::default())
create_all_stats(index_scheduler.into(), auth_controller, &AuthFilter::default())
{
// Replace the version number with the prototype name if any.
let version = if let Some(prototype) = crate::prototype_name() {
@ -435,7 +439,7 @@ impl Segment {
let post_search = std::mem::take(&mut self.post_search_aggregator)
.into_event(&self.user, "Documents Searched POST");
let post_multi_search = std::mem::take(&mut self.post_multi_search_aggregator)
.into_event(&self.user, "Documents Searched By Array of Queries POST");
.into_event(&self.user, "Documents Searched by Multi-Search POST");
let add_documents = std::mem::take(&mut self.add_documents_aggregator)
.into_event(&self.user, "Documents Added");
let delete_documents = std::mem::take(&mut self.delete_documents_aggregator)
@ -496,6 +500,7 @@ pub struct SearchAggregator {
// filter
filter_with_geo_radius: bool,
filter_with_geo_bounding_box: bool,
// every time a request has a filter, this field must be incremented by the number of terms it contains
filter_sum_of_criteria_terms: usize,
// every time a request has a filter, this field must be incremented by one
@ -563,6 +568,7 @@ impl SearchAggregator {
let stringified_filters = filter.to_string();
ret.filter_with_geo_radius = stringified_filters.contains("_geoRadius(");
ret.filter_with_geo_bounding_box = stringified_filters.contains("_geoBoundingBox(");
ret.filter_sum_of_criteria_terms = RE.split(&stringified_filters).count();
}
@ -622,6 +628,7 @@ impl SearchAggregator {
// filter
self.filter_with_geo_radius |= other.filter_with_geo_radius;
self.filter_with_geo_bounding_box |= other.filter_with_geo_bounding_box;
self.filter_sum_of_criteria_terms =
self.filter_sum_of_criteria_terms.saturating_add(other.filter_sum_of_criteria_terms);
self.filter_total_number_of_criteria = self
@ -689,6 +696,7 @@ impl SearchAggregator {
},
"filter": {
"with_geoRadius": self.filter_with_geo_radius,
"with_geoBoundingBox": self.filter_with_geo_bounding_box,
"avg_criteria_number": format!("{:.2}", self.filter_sum_of_criteria_terms as f64 / self.filter_total_number_of_criteria as f64),
"most_used_syntax": self.used_syntax.iter().max_by_key(|(_, v)| *v).map(|(k, _)| json!(k)).unwrap_or_else(|| json!(null)),
},
@ -755,7 +763,8 @@ impl MultiSearchAggregator {
let user_agents = extract_user_agents(request).into_iter().collect();
let distinct_indexes: HashSet<_> = query.iter().map(|query| &query.index_uid).collect();
let distinct_indexes: HashSet<_> =
query.iter().map(|query| query.index_uid.as_str()).collect();
Self {
timestamp,

View File

@ -11,6 +11,8 @@ pub enum MeilisearchHttpError {
#[error("A Content-Type header is missing. Accepted values for the Content-Type header are: {}",
.0.iter().map(|s| format!("`{}`", s)).collect::<Vec<_>>().join(", "))]
MissingContentType(Vec<String>),
#[error("The Content-Type `{0}` does not support the use of a csv delimiter. The csv delimiter can only be used with the Content-Type `text/csv`.")]
CsvDelimiterWithWrongContentType(String),
#[error(
"The Content-Type `{0}` is invalid. Accepted values for the Content-Type header are: {}",
.1.iter().map(|s| format!("`{}`", s)).collect::<Vec<_>>().join(", ")
@ -52,6 +54,7 @@ impl ErrorCode for MeilisearchHttpError {
fn error_code(&self) -> Code {
match self {
MeilisearchHttpError::MissingContentType(_) => Code::MissingContentType,
MeilisearchHttpError::CsvDelimiterWithWrongContentType(_) => Code::InvalidContentType,
MeilisearchHttpError::MissingPayload(_) => Code::MissingPayload,
MeilisearchHttpError::InvalidContentType(_, _) => Code::InvalidContentType,
MeilisearchHttpError::DocumentNotFound(_) => Code::DocumentNotFound,

View File

@ -136,6 +136,13 @@ pub mod policies {
use crate::extractors::authentication::Policy;
enum TenantTokenOutcome {
NotATenantToken,
Invalid,
Expired,
Valid(Uuid, SearchRules),
}
fn tenant_token_validation() -> Validation {
let mut validation = Validation::default();
validation.validate_exp = false;
@ -164,29 +171,42 @@ pub mod policies {
pub struct ActionPolicy<const A: u8>;
impl<const A: u8> Policy for ActionPolicy<A> {
/// Attempts to grant authentication from a bearer token (that can be a tenant token or an API key), the requested Action,
/// and a list of requested indexes.
///
/// If the bearer token is not allowed for the specified indexes and action, returns `None`.
/// Otherwise, returns an object containing the generated permissions: the search filters to add to a search, and the list of allowed indexes
/// (that may contain more indexes than requested).
fn authenticate(
auth: AuthController,
token: &str,
index: Option<&str>,
) -> Option<AuthFilter> {
// authenticate if token is the master key.
// master key can only have access to keys routes.
// if master key is None only keys routes are inaccessible.
// Without a master key, all routes are accessible except the key-related routes.
if auth.get_master_key().map_or_else(|| !is_keys_action(A), |mk| mk == token) {
return Some(AuthFilter::default());
}
// Tenant token
if let Some(filters) = ActionPolicy::<A>::authenticate_tenant_token(&auth, token, index)
{
return Some(filters);
} else if let Some(action) = Action::from_repr(A) {
// API key
if let Ok(Some(uid)) = auth.get_optional_uid_from_encoded_key(token.as_bytes()) {
if let Ok(true) = auth.is_key_authorized(uid, action, index) {
return auth.get_key_filters(uid, None).ok();
let (key_uuid, search_rules) =
match ActionPolicy::<A>::authenticate_tenant_token(&auth, token) {
TenantTokenOutcome::Valid(key_uuid, search_rules) => {
(key_uuid, Some(search_rules))
}
}
TenantTokenOutcome::Expired => return None,
TenantTokenOutcome::Invalid => return None,
TenantTokenOutcome::NotATenantToken => {
(auth.get_optional_uid_from_encoded_key(token.as_bytes()).ok()??, None)
}
};
// check that the indexes are allowed
let action = Action::from_repr(A)?;
let auth_filter = auth.get_key_filters(key_uuid, search_rules).ok()?;
if auth.is_key_authorized(key_uuid, action, index).unwrap_or(false)
&& index.map(|index| auth_filter.is_index_authorized(index)).unwrap_or(true)
{
return Some(auth_filter);
}
None
@ -194,46 +214,43 @@ pub mod policies {
}
impl<const A: u8> ActionPolicy<A> {
fn authenticate_tenant_token(
auth: &AuthController,
token: &str,
index: Option<&str>,
) -> Option<AuthFilter> {
fn authenticate_tenant_token(auth: &AuthController, token: &str) -> TenantTokenOutcome {
// Only search action can be accessed by a tenant token.
if A != actions::SEARCH {
return None;
return TenantTokenOutcome::NotATenantToken;
}
let uid = extract_key_id(token)?;
// check if parent key is authorized to do the action.
if auth.is_key_authorized(uid, Action::Search, index).ok()? {
// Check if tenant token is valid.
let key = auth.generate_key(uid)?;
let data = decode::<Claims>(
token,
&DecodingKey::from_secret(key.as_bytes()),
&tenant_token_validation(),
)
.ok()?;
let uid = if let Some(uid) = extract_key_id(token) {
uid
} else {
return TenantTokenOutcome::NotATenantToken;
};
// Check index access if an index restriction is provided.
if let Some(index) = index {
if !data.claims.search_rules.is_index_authorized(index) {
return None;
}
// Check if tenant token is valid.
let key = if let Some(key) = auth.generate_key(uid) {
key
} else {
return TenantTokenOutcome::Invalid;
};
let data = if let Ok(data) = decode::<Claims>(
token,
&DecodingKey::from_secret(key.as_bytes()),
&tenant_token_validation(),
) {
data
} else {
return TenantTokenOutcome::Invalid;
};
// Check if token is expired.
if let Some(exp) = data.claims.exp {
if OffsetDateTime::now_utc().unix_timestamp() > exp {
return TenantTokenOutcome::Expired;
}
// Check if token is expired.
if let Some(exp) = data.claims.exp {
if OffsetDateTime::now_utc().unix_timestamp() > exp {
return None;
}
}
return auth.get_key_filters(uid, Some(data.claims.search_rules)).ok();
}
None
TenantTokenOutcome::Valid(uid, data.claims.search_rules)
}
}

View File
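
As the updated doc comment explains, `ActionPolicy::<A>::authenticate` now resolves any kind of bearer token to an `AuthFilter` and performs the index check through that filter. A hypothetical call site inside the crate (the index name is made up, and the `Policy` trait must be in scope for the method to resolve):

```rust
use meilisearch_auth::AuthController;
use meilisearch_types::keys::actions;

use crate::extractors::authentication::policies::ActionPolicy;
use crate::extractors::authentication::Policy;

// Returns true when `token` (API key or tenant token) may search the `movies` index.
fn can_search_movies(auth: AuthController, token: &str) -> bool {
    match ActionPolicy::<{ actions::SEARCH }>::authenticate(auth, token, Some("movies")) {
        Some(auth_filter) => auth_filter.is_index_authorized("movies"),
        None => false,
    }
}
```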

@ -1,78 +0,0 @@
use std::fmt::Debug;
use std::future::Future;
use std::marker::PhantomData;
use std::pin::Pin;
use std::task::{Context, Poll};
use actix_web::dev::Payload;
use actix_web::web::Json;
use actix_web::{FromRequest, HttpRequest};
use deserr::{DeserializeError, DeserializeFromValue};
use futures::ready;
use meilisearch_types::error::{ErrorCode, ResponseError};
/// Extractor for typed data from Json request payloads
/// deserialised by deserr.
///
/// # Extractor
/// To extract typed data from a request body, the inner type `T` must implement the
/// [`deserr::DeserializeFromError<E>`] trait. The inner type `E` must implement the
/// [`ErrorCode`](meilisearch_error::ErrorCode) trait.
#[derive(Debug)]
pub struct ValidatedJson<T, E>(pub T, PhantomData<*const E>);
impl<T, E> ValidatedJson<T, E> {
pub fn new(data: T) -> Self {
ValidatedJson(data, PhantomData)
}
pub fn into_inner(self) -> T {
self.0
}
}
impl<T, E> FromRequest for ValidatedJson<T, E>
where
E: DeserializeError + ErrorCode + std::error::Error + 'static,
T: DeserializeFromValue<E>,
{
type Error = actix_web::Error;
type Future = ValidatedJsonExtractFut<T, E>;
#[inline]
fn from_request(req: &HttpRequest, payload: &mut Payload) -> Self::Future {
ValidatedJsonExtractFut {
fut: Json::<serde_json::Value>::from_request(req, payload),
_phantom: PhantomData,
}
}
}
pub struct ValidatedJsonExtractFut<T, E> {
fut: <Json<serde_json::Value> as FromRequest>::Future,
_phantom: PhantomData<*const (T, E)>,
}
impl<T, E> Future for ValidatedJsonExtractFut<T, E>
where
T: DeserializeFromValue<E>,
E: DeserializeError + ErrorCode + std::error::Error + 'static,
{
type Output = Result<ValidatedJson<T, E>, actix_web::Error>;
fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
let ValidatedJsonExtractFut { fut, .. } = self.get_mut();
let fut = Pin::new(fut);
let res = ready!(fut.poll(cx));
let res = match res {
Err(err) => Err(err),
Ok(data) => match deserr::deserialize::<_, _, E>(data.into_inner()) {
Ok(data) => Ok(ValidatedJson::new(data)),
Err(e) => Err(ResponseError::from(e).into()),
},
};
Poll::Ready(res)
}
}

View File
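
This hand-rolled extractor is superseded by `deserr::actix_web::AwebJson`, which performs the same two-step extraction (buffer the JSON, then run it through deserr) and is what the route handlers below switch to. A minimal sketch of a handler using it, with an invented payload type:

```rust
use actix_web::HttpResponse;
use deserr::actix_web::AwebJson;
use deserr::Deserr;
use meilisearch_types::deserr::DeserrJsonError;
use meilisearch_types::error::ResponseError;

// Invented payload type; deserialization errors are turned into a response by the extractor itself.
#[derive(Debug, Deserr)]
#[deserr(error = DeserrJsonError, rename_all = camelCase, deny_unknown_fields)]
struct CreateWidget {
    name: String,
}

async fn create_widget(
    body: AwebJson<CreateWidget, DeserrJsonError>,
) -> Result<HttpResponse, ResponseError> {
    // The handler only ever sees an already-validated `CreateWidget`.
    let CreateWidget { name } = body.into_inner();
    Ok(HttpResponse::Created().json(serde_json::json!({ "name": name })))
}
```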

@ -1,6 +1,4 @@
pub mod payload;
#[macro_use]
pub mod authentication;
pub mod json;
pub mod query_parameters;
pub mod sequential_extractor;

View File

@ -1,70 +0,0 @@
//! A module to parse query parameter with deserr
use std::marker::PhantomData;
use std::{fmt, ops};
use actix_http::Payload;
use actix_utils::future::{err, ok, Ready};
use actix_web::{FromRequest, HttpRequest};
use deserr::{DeserializeError, DeserializeFromValue};
use meilisearch_types::error::{Code, ErrorCode, ResponseError};
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
pub struct QueryParameter<T, E>(pub T, PhantomData<*const E>);
impl<T, E> QueryParameter<T, E> {
/// Unwrap into inner `T` value.
pub fn into_inner(self) -> T {
self.0
}
}
impl<T, E> QueryParameter<T, E>
where
T: DeserializeFromValue<E>,
E: DeserializeError + ErrorCode + std::error::Error + 'static,
{
pub fn from_query(query_str: &str) -> Result<Self, actix_web::Error> {
let value = serde_urlencoded::from_str::<serde_json::Value>(query_str)
.map_err(|e| ResponseError::from_msg(e.to_string(), Code::BadRequest))?;
match deserr::deserialize::<_, _, E>(value) {
Ok(data) => Ok(QueryParameter(data, PhantomData)),
Err(e) => Err(ResponseError::from(e).into()),
}
}
}
impl<T, E> ops::Deref for QueryParameter<T, E> {
type Target = T;
fn deref(&self) -> &T {
&self.0
}
}
impl<T, E> ops::DerefMut for QueryParameter<T, E> {
fn deref_mut(&mut self) -> &mut T {
&mut self.0
}
}
impl<T: fmt::Display, E> fmt::Display for QueryParameter<T, E> {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
self.0.fmt(f)
}
}
impl<T, E> FromRequest for QueryParameter<T, E>
where
T: DeserializeFromValue<E>,
E: DeserializeError + ErrorCode + std::error::Error + 'static,
{
type Error = actix_web::Error;
type Future = Ready<Result<Self, actix_web::Error>>;
#[inline]
fn from_request(req: &HttpRequest, _: &mut Payload) -> Self::Future {
QueryParameter::from_query(req.query_string()).map(ok).unwrap_or_else(err)
}
}

View File
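
Its query-string counterpart disappears for the same reason: `deserr::actix_web::AwebQueryParameter` replaces it in every route below. A minimal sketch reusing the `Param` wrapper and the error codes that the api-key route relies on later in this diff (the struct and handler themselves are invented):

```rust
use actix_web::HttpResponse;
use deserr::actix_web::AwebQueryParameter;
use deserr::Deserr;
use meilisearch_types::deserr::query_params::Param;
use meilisearch_types::deserr::DeserrQueryParamError;
use meilisearch_types::error::deserr_codes::{InvalidApiKeyLimit, InvalidApiKeyOffset};
use meilisearch_types::error::ResponseError;

#[derive(Debug, Deserr)]
#[deserr(error = DeserrQueryParamError, rename_all = camelCase, deny_unknown_fields)]
struct ListWidgets {
    #[deserr(default, error = DeserrQueryParamError<InvalidApiKeyOffset>)]
    offset: Param<usize>,
    #[deserr(default = Param(20), error = DeserrQueryParamError<InvalidApiKeyLimit>)]
    limit: Param<usize>,
}

async fn list_widgets(
    params: AwebQueryParameter<ListWidgets, DeserrQueryParamError>,
) -> Result<HttpResponse, ResponseError> {
    let ListWidgets { offset, limit } = params.into_inner();
    Ok(HttpResponse::Ok().json(serde_json::json!({ "offset": offset.0, "limit": limit.0 })))
}
```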

@ -4,15 +4,12 @@ pub mod error;
pub mod analytics;
#[macro_use]
pub mod extractors;
pub mod metrics;
pub mod middleware;
pub mod option;
pub mod routes;
pub mod search;
#[cfg(feature = "metrics")]
pub mod metrics;
#[cfg(feature = "metrics")]
pub mod route_metrics;
use std::fs::File;
use std::io::{BufReader, BufWriter};
use std::path::Path;
@ -25,7 +22,7 @@ use actix_http::body::MessageBody;
use actix_web::dev::{ServiceFactory, ServiceResponse};
use actix_web::error::JsonPayloadError;
use actix_web::web::Data;
use actix_web::{middleware, web, HttpRequest};
use actix_web::{web, HttpRequest};
use analytics::Analytics;
use anyhow::bail;
use error::PayloadError;
@ -45,6 +42,34 @@ use option::ScheduleSnapshot;
use crate::error::MeilisearchHttpError;
/// Default number of simultaneously opened indexes.
///
/// This value is used when dynamic computation of how many indexes can be opened at once was skipped (e.g., in tests).
///
/// Lower for Windows that dedicates a smaller virtual address space to processes.
///
/// The value was chosen this way:
///
/// - Windows provides a small virtual address space of about 10TiB to processes.
/// - The chosen value allows for indexes to use the default map size of 2TiB safely.
#[cfg(windows)]
const DEFAULT_INDEX_COUNT: usize = 4;
/// Default number of simultaneously opened indexes.
///
/// This value is used when dynamic computation of how many indexes can be opened at once was skipped (e.g., in tests).
///
/// The higher, the better for avoiding reopening indexes.
///
/// The value was chosen this way:
///
/// - Opening an index consumes a file descriptor.
/// - The default on many unices is about 256 file descriptors for a process.
/// - 100 is a little bit less than half this value.
/// - The chosen value allows for indexes to use the default map size of 2TiB safely.
#[cfg(not(windows))]
const DEFAULT_INDEX_COUNT: usize = 20;
/// Check if a db is empty. It does not provide any information on the
/// validity of the data in it.
/// We consider a database as non empty when it's a non empty directory.
@ -86,13 +111,13 @@ pub fn create_app(
analytics.clone(),
)
})
.configure(routes::configure)
.configure(|cfg| routes::configure(cfg, opt.experimental_enable_metrics))
.configure(|s| dashboard(s, enable_dashboard));
#[cfg(feature = "metrics")]
let app = app.configure(|s| configure_metrics_route(s, opt.enable_metrics_route));
#[cfg(feature = "metrics")]
let app = app.wrap(Condition::new(opt.enable_metrics_route, route_metrics::RouteMetrics));
let app = app.wrap(actix_web::middleware::Condition::new(
opt.experimental_enable_metrics,
middleware::RouteMetrics,
));
app.wrap(
Cors::default()
.send_wildcard()
@ -101,9 +126,9 @@ pub fn create_app(
.allow_any_method()
.max_age(86_400), // 24h
)
.wrap(middleware::Logger::default())
.wrap(middleware::Compress::default())
.wrap(middleware::NormalizePath::new(middleware::TrailingSlash::Trim))
.wrap(actix_web::middleware::Logger::default())
.wrap(actix_web::middleware::Compress::default())
.wrap(actix_web::middleware::NormalizePath::new(actix_web::middleware::TrailingSlash::Trim))
}
enum OnFailure {
@ -205,9 +230,11 @@ fn open_or_create_database_unchecked(
snapshots_path: opt.snapshot_dir.clone(),
dumps_path: opt.dump_dir.clone(),
task_db_size: opt.max_task_db_size.get_bytes() as usize,
index_size: opt.max_index_size.get_bytes() as usize,
index_base_map_size: opt.max_index_size.get_bytes() as usize,
indexer_config: (&opt.indexer_options).try_into()?,
autobatching_enabled: true,
index_growth_amount: byte_unit::Byte::from_str("10GiB").unwrap().get_bytes() as usize,
index_count: DEFAULT_INDEX_COUNT,
})?)
};
@ -419,15 +446,6 @@ pub fn dashboard(config: &mut web::ServiceConfig, _enable_frontend: bool) {
config.service(web::resource("/").route(web::get().to(routes::running)));
}
#[cfg(feature = "metrics")]
pub fn configure_metrics_route(config: &mut web::ServiceConfig, enable_metrics_route: bool) {
if enable_metrics_route {
config.service(
web::resource("/metrics").route(web::get().to(crate::route_metrics::get_metrics)),
);
}
}
/// Parses the output of
/// [`VERGEN_GIT_SEMVER_LIGHTWEIGHT`](https://docs.rs/vergen/latest/vergen/struct.Git.html#instructions)
/// as a prototype name.
@ -435,18 +453,13 @@ pub fn configure_metrics_route(config: &mut web::ServiceConfig, enable_metrics_r
/// Returns `Some(prototype_name)` if the following conditions are met on this value:
///
/// 1. starts with `prototype-`,
/// 2. does not end with `dirty-`,
/// 3. ends with `-<some_number>`,
/// 4. does not end with `<some_number>-<some_number>`.
/// 2. ends with `-<some_number>`,
/// 3. does not end with `<some_number>-<some_number>`.
///
/// Otherwise, returns `None`.
pub fn prototype_name() -> Option<&'static str> {
let prototype: &'static str = option_env!("VERGEN_GIT_SEMVER_LIGHTWEIGHT")?;
if prototype.ends_with("-dirty") {
return None;
}
if !prototype.starts_with("prototype-") {
return None;
}

View File
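
The doc comment on `prototype_name` above (truncated in this hunk) now states three rules for the value coming from `VERGEN_GIT_SEMVER_LIGHTWEIGHT`. A standalone restatement of those rules, written as an illustrative checker rather than the function's actual body:

```rust
// Returns true when `semver` matches the documented prototype shape:
// it starts with `prototype-`, ends with `-<number>`, and that trailing number is not
// itself preceded by another `-<number>` segment.
fn looks_like_prototype(semver: &str) -> bool {
    let rest = match semver.strip_prefix("prototype-") {
        Some(rest) => rest,
        None => return false,
    };
    let mut segments = rest.rsplit('-');
    let last_is_number = segments.next().map_or(false, |s| s.parse::<u64>().is_ok());
    let previous_is_number = segments.next().map_or(false, |s| s.parse::<u64>().is_ok());
    last_is_number && !previous_is_number
}
```

Under these rules a made-up tag like `prototype-geo-revamp-0` would be accepted, while `prototype-geo-revamp-0-1` would not.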

@ -1,40 +1,11 @@
//! Contains all the custom middleware used in meilisearch
use std::future::{ready, Ready};
use actix_web::dev::{self, Service, ServiceRequest, ServiceResponse, Transform};
use actix_web::http::header;
use actix_web::{Error, HttpResponse};
use actix_web::Error;
use futures_util::future::LocalBoxFuture;
use meilisearch_auth::actions;
use meilisearch_lib::MeiliSearch;
use meilisearch_types::error::ResponseError;
use prometheus::{Encoder, HistogramTimer, TextEncoder};
use crate::extractors::authentication::policies::ActionPolicy;
use crate::extractors::authentication::GuardedData;
pub async fn get_metrics(
meilisearch: GuardedData<ActionPolicy<{ actions::METRICS_GET }>, MeiliSearch>,
) -> Result<HttpResponse, ResponseError> {
let search_rules = &meilisearch.filters().search_rules;
let response = meilisearch.get_all_stats(search_rules).await?;
crate::metrics::MEILISEARCH_DB_SIZE_BYTES.set(response.database_size as i64);
crate::metrics::MEILISEARCH_INDEX_COUNT.set(response.indexes.len() as i64);
for (index, value) in response.indexes.iter() {
crate::metrics::MEILISEARCH_INDEX_DOCS_COUNT
.with_label_values(&[index])
.set(value.number_of_documents as i64);
}
let encoder = TextEncoder::new();
let mut buffer = vec![];
encoder.encode(&prometheus::gather(), &mut buffer).expect("Failed to encode metrics");
let response = String::from_utf8(buffer).expect("Failed to convert bytes to string");
Ok(HttpResponse::Ok().insert_header(header::ContentType(mime::TEXT_PLAIN)).body(response))
}
use prometheus::HistogramTimer;
pub struct RouteMetrics;

View File

@ -47,8 +47,7 @@ const MEILI_IGNORE_MISSING_DUMP: &str = "MEILI_IGNORE_MISSING_DUMP";
const MEILI_IGNORE_DUMP_IF_DB_EXISTS: &str = "MEILI_IGNORE_DUMP_IF_DB_EXISTS";
const MEILI_DUMP_DIR: &str = "MEILI_DUMP_DIR";
const MEILI_LOG_LEVEL: &str = "MEILI_LOG_LEVEL";
#[cfg(feature = "metrics")]
const MEILI_ENABLE_METRICS_ROUTE: &str = "MEILI_ENABLE_METRICS_ROUTE";
const MEILI_EXPERIMENTAL_ENABLE_METRICS: &str = "MEILI_EXPERIMENTAL_ENABLE_METRICS";
const DEFAULT_CONFIG_FILE_PATH: &str = "./config.toml";
const DEFAULT_DB_PATH: &str = "./data.ms";
@ -65,11 +64,11 @@ const MEILI_MAX_INDEXING_THREADS: &str = "MEILI_MAX_INDEXING_THREADS";
const DEFAULT_LOG_EVERY_N: usize = 100_000;
// Each environment (index and task-db) is taking space in the virtual address space.
//
// The size of the virtual address space is limited by the OS. About 100TB for Linux and about 10TB for Windows.
// This means that the number of indexes is limited to about 200 for Linux and about 20 for Windows.
pub const INDEX_SIZE: u64 = 536_870_912_000; // 500 GiB
pub const TASK_DB_SIZE: u64 = 10_737_418_240; // 10 GiB
// Ideally, indexes can occupy 2TiB each to avoid having to manually resize them.
// The actual size of the virtual address space is computed at startup to determine how many 2TiB indexes can be
// opened simultaneously.
pub const INDEX_SIZE: u64 = 2 * 1024 * 1024 * 1024 * 1024; // 2 TiB
pub const TASK_DB_SIZE: u64 = 10 * 1024 * 1024 * 1024; // 10 GiB
#[derive(Debug, Default, Clone, Copy, Serialize, Deserialize)]
#[serde(rename_all = "UPPERCASE")]
@ -287,11 +286,12 @@ pub struct Opt {
#[serde(default)]
pub log_level: LogLevel,
/// Enables Prometheus metrics and /metrics route.
#[cfg(feature = "metrics")]
#[clap(long, env = MEILI_ENABLE_METRICS_ROUTE)]
/// Experimental metrics feature. For more information, see: <https://github.com/meilisearch/meilisearch/discussions/3518>
///
/// Enables the Prometheus metrics on the `GET /metrics` endpoint.
#[clap(long, env = MEILI_EXPERIMENTAL_ENABLE_METRICS)]
#[serde(default)]
pub enable_metrics_route: bool,
pub experimental_enable_metrics: bool,
#[serde(flatten)]
#[clap(flatten)]
@ -384,8 +384,7 @@ impl Opt {
config_file_path: _,
#[cfg(all(not(debug_assertions), feature = "analytics"))]
no_analytics,
#[cfg(feature = "metrics")]
enable_metrics_route,
experimental_enable_metrics: enable_metrics_route,
} = self;
export_to_env_if_not_present(MEILI_DB_PATH, db_path);
export_to_env_if_not_present(MEILI_HTTP_ADDR, http_addr);
@ -423,13 +422,10 @@ impl Opt {
export_to_env_if_not_present(MEILI_DUMP_DIR, dump_dir);
export_to_env_if_not_present(MEILI_LOG_LEVEL, log_level.to_string());
#[cfg(feature = "metrics")]
{
export_to_env_if_not_present(
MEILI_ENABLE_METRICS_ROUTE,
enable_metrics_route.to_string(),
);
}
export_to_env_if_not_present(
MEILI_EXPERIMENTAL_ENABLE_METRICS,
enable_metrics_route.to_string(),
);
indexer_options.export_to_env();
}
@ -494,12 +490,21 @@ pub struct IndexerOpts {
#[clap(long, env = MEILI_MAX_INDEXING_THREADS, default_value_t)]
#[serde(default)]
pub max_indexing_threads: MaxThreads,
/// Whether or not we want to determine the budget of virtual memory address space we have available dynamically
/// (the default), or statically.
///
/// Determining the budget of virtual memory address space dynamically takes some time on some systems (such as macOS)
/// and may make tests non-deterministic, so we want to skip it in tests.
#[clap(skip)]
#[serde(skip)]
pub skip_index_budget: bool,
}
impl IndexerOpts {
/// Exports the values to their corresponding env vars if they are not set.
pub fn export_to_env(self) {
let IndexerOpts { max_indexing_memory, max_indexing_threads } = self;
let IndexerOpts { max_indexing_memory, max_indexing_threads, skip_index_budget: _ } = self;
if let Some(max_indexing_memory) = max_indexing_memory.0 {
export_to_env_if_not_present(
MEILI_MAX_INDEXING_MEMORY,
@ -527,6 +532,7 @@ impl TryFrom<&IndexerOpts> for IndexerConfig {
max_memory: other.max_indexing_memory.map(|b| b.get_bytes() as usize),
thread_pool: Some(thread_pool),
max_positions_per_attributes: None,
skip_index_budget: other.skip_index_budget,
..Default::default()
})
}

View File

@ -1,7 +1,8 @@
use std::str;
use actix_web::{web, HttpRequest, HttpResponse};
use deserr::DeserializeFromValue;
use deserr::actix_web::{AwebJson, AwebQueryParameter};
use deserr::Deserr;
use meilisearch_auth::error::AuthControllerError;
use meilisearch_auth::AuthController;
use meilisearch_types::deserr::query_params::Param;
@ -16,8 +17,6 @@ use uuid::Uuid;
use super::PAGINATION_DEFAULT_LIMIT;
use crate::extractors::authentication::policies::*;
use crate::extractors::authentication::GuardedData;
use crate::extractors::json::ValidatedJson;
use crate::extractors::query_parameters::QueryParameter;
use crate::extractors::sequential_extractor::SeqHandler;
use crate::routes::Pagination;
@ -37,7 +36,7 @@ pub fn configure(cfg: &mut web::ServiceConfig) {
pub async fn create_api_key(
auth_controller: GuardedData<ActionPolicy<{ actions::KEYS_CREATE }>, AuthController>,
body: ValidatedJson<CreateApiKey, DeserrJsonError>,
body: AwebJson<CreateApiKey, DeserrJsonError>,
_req: HttpRequest,
) -> Result<HttpResponse, ResponseError> {
let v = body.into_inner();
@ -51,7 +50,7 @@ pub async fn create_api_key(
Ok(HttpResponse::Created().json(res))
}
#[derive(DeserializeFromValue, Debug, Clone, Copy)]
#[derive(Deserr, Debug, Clone, Copy)]
#[deserr(error = DeserrQueryParamError, rename_all = camelCase, deny_unknown_fields)]
pub struct ListApiKeys {
#[deserr(default, error = DeserrQueryParamError<InvalidApiKeyOffset>)]
@ -59,6 +58,7 @@ pub struct ListApiKeys {
#[deserr(default = Param(PAGINATION_DEFAULT_LIMIT), error = DeserrQueryParamError<InvalidApiKeyLimit>)]
pub limit: Param<usize>,
}
impl ListApiKeys {
fn as_pagination(self) -> Pagination {
Pagination { offset: self.offset.0, limit: self.limit.0 }
@ -67,7 +67,7 @@ impl ListApiKeys {
pub async fn list_api_keys(
auth_controller: GuardedData<ActionPolicy<{ actions::KEYS_GET }>, AuthController>,
list_api_keys: QueryParameter<ListApiKeys, DeserrQueryParamError>,
list_api_keys: AwebQueryParameter<ListApiKeys, DeserrQueryParamError>,
) -> Result<HttpResponse, ResponseError> {
let paginate = list_api_keys.into_inner().as_pagination();
let page_view = tokio::task::spawn_blocking(move || -> Result<_, AuthControllerError> {
@ -104,7 +104,7 @@ pub async fn get_api_key(
pub async fn patch_api_key(
auth_controller: GuardedData<ActionPolicy<{ actions::KEYS_UPDATE }>, AuthController>,
body: ValidatedJson<PatchApiKey, DeserrJsonError>,
body: AwebJson<PatchApiKey, DeserrJsonError>,
path: web::Path<AuthParam>,
) -> Result<HttpResponse, ResponseError> {
let key = path.into_inner().key;

View File

@ -4,15 +4,16 @@ use actix_web::http::header::CONTENT_TYPE;
use actix_web::web::Data;
use actix_web::{web, HttpMessage, HttpRequest, HttpResponse};
use bstr::ByteSlice;
use deserr::DeserializeFromValue;
use deserr::actix_web::AwebQueryParameter;
use deserr::Deserr;
use futures::StreamExt;
use index_scheduler::IndexScheduler;
use log::debug;
use meilisearch_types::deserr::query_params::Param;
use meilisearch_types::deserr::{DeserrJsonError, DeserrQueryParamError};
use meilisearch_types::deserr::DeserrQueryParamError;
use meilisearch_types::document_formats::{read_csv, read_json, read_ndjson, PayloadType};
use meilisearch_types::error::deserr_codes::*;
use meilisearch_types::error::ResponseError;
use meilisearch_types::error::{Code, ResponseError};
use meilisearch_types::heed::RoTxn;
use meilisearch_types::index_uid::IndexUid;
use meilisearch_types::milli::update::IndexDocumentsMethod;
@ -33,7 +34,6 @@ use crate::error::PayloadError::ReceivePayload;
use crate::extractors::authentication::policies::*;
use crate::extractors::authentication::GuardedData;
use crate::extractors::payload::Payload;
use crate::extractors::query_parameters::QueryParameter;
use crate::extractors::sequential_extractor::SeqHandler;
use crate::routes::{PaginationView, SummarizedTaskView, PAGINATION_DEFAULT_LIMIT};
@ -67,7 +67,7 @@ pub fn configure(cfg: &mut web::ServiceConfig) {
cfg.service(
web::resource("")
.route(web::get().to(SeqHandler(get_all_documents)))
.route(web::post().to(SeqHandler(add_documents)))
.route(web::post().to(SeqHandler(replace_documents)))
.route(web::put().to(SeqHandler(update_documents)))
.route(web::delete().to(SeqHandler(clear_all_documents))),
)
@ -80,7 +80,7 @@ pub fn configure(cfg: &mut web::ServiceConfig) {
);
}
#[derive(Debug, DeserializeFromValue)]
#[derive(Debug, Deserr)]
#[deserr(error = DeserrQueryParamError, rename_all = camelCase, deny_unknown_fields)]
pub struct GetDocument {
#[deserr(default, error = DeserrQueryParamError<InvalidDocumentFields>)]
@ -90,7 +90,7 @@ pub struct GetDocument {
pub async fn get_document(
index_scheduler: GuardedData<ActionPolicy<{ actions::DOCUMENTS_GET }>, Data<IndexScheduler>>,
document_param: web::Path<DocumentParam>,
params: QueryParameter<GetDocument, DeserrQueryParamError>,
params: AwebQueryParameter<GetDocument, DeserrQueryParamError>,
) -> Result<HttpResponse, ResponseError> {
let DocumentParam { index_uid, document_id } = document_param.into_inner();
let index_uid = IndexUid::try_from(index_uid)?;
@ -125,7 +125,7 @@ pub async fn delete_document(
Ok(HttpResponse::Accepted().json(task))
}
#[derive(Debug, DeserializeFromValue)]
#[derive(Debug, Deserr)]
#[deserr(error = DeserrQueryParamError, rename_all = camelCase, deny_unknown_fields)]
pub struct BrowseQuery {
#[deserr(default, error = DeserrQueryParamError<InvalidDocumentOffset>)]
@ -139,7 +139,7 @@ pub struct BrowseQuery {
pub async fn get_all_documents(
index_scheduler: GuardedData<ActionPolicy<{ actions::DOCUMENTS_GET }>, Data<IndexScheduler>>,
index_uid: web::Path<String>,
params: QueryParameter<BrowseQuery, DeserrQueryParamError>,
params: AwebQueryParameter<BrowseQuery, DeserrQueryParamError>,
) -> Result<HttpResponse, ResponseError> {
let index_uid = IndexUid::try_from(index_uid.into_inner())?;
debug!("called with params: {:?}", params);
@ -155,17 +155,32 @@ pub async fn get_all_documents(
Ok(HttpResponse::Ok().json(ret))
}
#[derive(Deserialize, Debug, DeserializeFromValue)]
#[deserr(error = DeserrJsonError, rename_all = camelCase, deny_unknown_fields)]
#[derive(Deserialize, Debug, Deserr)]
#[deserr(error = DeserrQueryParamError, rename_all = camelCase, deny_unknown_fields)]
pub struct UpdateDocumentsQuery {
#[deserr(default, error = DeserrJsonError<InvalidIndexPrimaryKey>)]
#[deserr(default, error = DeserrQueryParamError<InvalidIndexPrimaryKey>)]
pub primary_key: Option<String>,
#[deserr(default, try_from(char) = from_char_csv_delimiter -> DeserrQueryParamError<InvalidDocumentCsvDelimiter>, error = DeserrQueryParamError<InvalidDocumentCsvDelimiter>)]
pub csv_delimiter: Option<u8>,
}
pub async fn add_documents(
fn from_char_csv_delimiter(
c: char,
) -> Result<Option<u8>, DeserrQueryParamError<InvalidDocumentCsvDelimiter>> {
if c.is_ascii() {
Ok(Some(c as u8))
} else {
Err(DeserrQueryParamError::new(
format!("csv delimiter must be an ascii character. Found: `{}`", c),
Code::InvalidDocumentCsvDelimiter,
))
}
}
pub async fn replace_documents(
index_scheduler: GuardedData<ActionPolicy<{ actions::DOCUMENTS_ADD }>, Data<IndexScheduler>>,
index_uid: web::Path<String>,
params: QueryParameter<UpdateDocumentsQuery, DeserrJsonError>,
params: AwebQueryParameter<UpdateDocumentsQuery, DeserrQueryParamError>,
body: Payload,
req: HttpRequest,
analytics: web::Data<dyn Analytics>,
@ -177,12 +192,13 @@ pub async fn add_documents(
analytics.add_documents(&params, index_scheduler.index(&index_uid).is_err(), &req);
let allow_index_creation = index_scheduler.filters().allow_index_creation;
let allow_index_creation = index_scheduler.filters().allow_index_creation(&index_uid);
let task = document_addition(
extract_mime_type(&req)?,
index_scheduler,
index_uid,
params.primary_key,
params.csv_delimiter,
body,
IndexDocumentsMethod::ReplaceDocuments,
allow_index_creation,
@ -195,7 +211,7 @@ pub async fn add_documents(
pub async fn update_documents(
index_scheduler: GuardedData<ActionPolicy<{ actions::DOCUMENTS_ADD }>, Data<IndexScheduler>>,
index_uid: web::Path<String>,
params: QueryParameter<UpdateDocumentsQuery, DeserrJsonError>,
params: AwebQueryParameter<UpdateDocumentsQuery, DeserrQueryParamError>,
body: Payload,
req: HttpRequest,
analytics: web::Data<dyn Analytics>,
@ -203,15 +219,17 @@ pub async fn update_documents(
let index_uid = IndexUid::try_from(index_uid.into_inner())?;
debug!("called with params: {:?}", params);
let params = params.into_inner();
analytics.update_documents(&params, index_scheduler.index(&index_uid).is_err(), &req);
let allow_index_creation = index_scheduler.filters().allow_index_creation;
let allow_index_creation = index_scheduler.filters().allow_index_creation(&index_uid);
let task = document_addition(
extract_mime_type(&req)?,
index_scheduler,
index_uid,
params.into_inner().primary_key,
params.primary_key,
params.csv_delimiter,
body,
IndexDocumentsMethod::UpdateDocuments,
allow_index_creation,
@ -221,26 +239,43 @@ pub async fn update_documents(
Ok(HttpResponse::Accepted().json(task))
}
#[allow(clippy::too_many_arguments)]
async fn document_addition(
mime_type: Option<Mime>,
index_scheduler: GuardedData<ActionPolicy<{ actions::DOCUMENTS_ADD }>, Data<IndexScheduler>>,
index_uid: IndexUid,
primary_key: Option<String>,
csv_delimiter: Option<u8>,
mut body: Payload,
method: IndexDocumentsMethod,
allow_index_creation: bool,
) -> Result<SummarizedTaskView, MeilisearchHttpError> {
let format = match mime_type.as_ref().map(|m| (m.type_().as_str(), m.subtype().as_str())) {
Some(("application", "json")) => PayloadType::Json,
Some(("application", "x-ndjson")) => PayloadType::Ndjson,
Some(("text", "csv")) => PayloadType::Csv,
Some((type_, subtype)) => {
let format = match (
mime_type.as_ref().map(|m| (m.type_().as_str(), m.subtype().as_str())),
csv_delimiter,
) {
(Some(("application", "json")), None) => PayloadType::Json,
(Some(("application", "x-ndjson")), None) => PayloadType::Ndjson,
(Some(("text", "csv")), None) => PayloadType::Csv { delimiter: b',' },
(Some(("text", "csv")), Some(delimiter)) => PayloadType::Csv { delimiter },
(Some(("application", "json")), Some(_)) => {
return Err(MeilisearchHttpError::CsvDelimiterWithWrongContentType(String::from(
"application/json",
)))
}
(Some(("application", "x-ndjson")), Some(_)) => {
return Err(MeilisearchHttpError::CsvDelimiterWithWrongContentType(String::from(
"application/x-ndjson",
)))
}
(Some((type_, subtype)), _) => {
return Err(MeilisearchHttpError::InvalidContentType(
format!("{}/{}", type_, subtype),
ACCEPTED_CONTENT_TYPE.clone(),
))
}
None => {
(None, _) => {
return Err(MeilisearchHttpError::MissingContentType(ACCEPTED_CONTENT_TYPE.clone()))
}
};
@ -285,7 +320,9 @@ async fn document_addition(
let documents_count = tokio::task::spawn_blocking(move || {
let documents_count = match format {
PayloadType::Json => read_json(&read_file, update_file.as_file_mut())?,
PayloadType::Csv => read_csv(&read_file, update_file.as_file_mut())?,
PayloadType::Csv { delimiter } => {
read_csv(&read_file, update_file.as_file_mut(), delimiter)?
}
PayloadType::Ndjson => read_ndjson(&read_file, update_file.as_file_mut())?,
};
// we NEED to persist the file here because we moved the `update_file` into another task.

View File
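
The new `csvDelimiter` query parameter above only makes sense together with the `text/csv` content type; other content types now get the dedicated `CsvDelimiterWithWrongContentType` error, and plain CSV keeps defaulting to a comma. A hypothetical client-side illustration (URL, index name, payload, and the use of `reqwest` with its `blocking` feature are assumptions, not part of this PR):

```rust
fn send_semicolon_separated_csv() -> Result<(), Box<dyn std::error::Error>> {
    let csv = "id;title\n1;Carol\n2;Wonderland\n";
    // POST /indexes/{index_uid}/documents?csvDelimiter=;  with a text/csv body.
    let response = reqwest::blocking::Client::new()
        .post("http://localhost:7700/indexes/movies/documents")
        .query(&[("csvDelimiter", ";")])
        .header("Content-Type", "text/csv")
        .body(csv)
        .send()?;
    // 202 Accepted with a summarized task on success, an error status if the delimiter is rejected.
    println!("{}", response.status());
    Ok(())
}
```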

@ -2,14 +2,14 @@ use std::convert::Infallible;
use actix_web::web::Data;
use actix_web::{web, HttpRequest, HttpResponse};
use deserr::{DeserializeError, DeserializeFromValue, ValuePointerRef};
use deserr::actix_web::{AwebJson, AwebQueryParameter};
use deserr::{DeserializeError, Deserr, ValuePointerRef};
use index_scheduler::IndexScheduler;
use log::debug;
use meilisearch_types::deserr::error_messages::immutable_field_error;
use meilisearch_types::deserr::query_params::Param;
use meilisearch_types::deserr::{DeserrJsonError, DeserrQueryParamError};
use meilisearch_types::deserr::{immutable_field_error, DeserrJsonError, DeserrQueryParamError};
use meilisearch_types::error::deserr_codes::*;
use meilisearch_types::error::{unwrap_any, Code, ResponseError};
use meilisearch_types::error::{Code, ResponseError};
use meilisearch_types::index_uid::IndexUid;
use meilisearch_types::milli::{self, FieldDistribution, Index};
use meilisearch_types::tasks::KindWithContent;
@ -21,8 +21,6 @@ use super::{Pagination, SummarizedTaskView, PAGINATION_DEFAULT_LIMIT};
use crate::analytics::Analytics;
use crate::extractors::authentication::policies::*;
use crate::extractors::authentication::{AuthenticationError, GuardedData};
use crate::extractors::json::ValidatedJson;
use crate::extractors::query_parameters::QueryParameter;
use crate::extractors::sequential_extractor::SeqHandler;
pub mod documents;
@ -63,6 +61,8 @@ pub struct IndexView {
impl IndexView {
fn new(uid: String, index: &Index) -> Result<IndexView, milli::Error> {
// It is important that this function does not keep the Index handle or a clone of it, because
// `list_indexes` relies on this property to avoid opening all indexes at once.
let rtxn = index.read_txn()?;
Ok(IndexView {
uid,
@ -73,7 +73,7 @@ impl IndexView {
}
}
#[derive(DeserializeFromValue, Debug, Clone, Copy)]
#[derive(Deserr, Debug, Clone, Copy)]
#[deserr(error = DeserrQueryParamError, rename_all = camelCase, deny_unknown_fields)]
pub struct ListIndexes {
#[deserr(default, error = DeserrQueryParamError<InvalidIndexOffset>)]
@ -89,23 +89,27 @@ impl ListIndexes {
pub async fn list_indexes(
index_scheduler: GuardedData<ActionPolicy<{ actions::INDEXES_GET }>, Data<IndexScheduler>>,
paginate: QueryParameter<ListIndexes, DeserrQueryParamError>,
paginate: AwebQueryParameter<ListIndexes, DeserrQueryParamError>,
) -> Result<HttpResponse, ResponseError> {
let search_rules = &index_scheduler.filters().search_rules;
let indexes: Vec<_> = index_scheduler.indexes()?;
let indexes = indexes
.into_iter()
.filter(|(name, _)| search_rules.is_index_authorized(name))
.map(|(name, index)| IndexView::new(name, &index))
.collect::<Result<Vec<_>, _>>()?;
let filters = index_scheduler.filters();
let indexes: Vec<Option<IndexView>> =
index_scheduler.try_for_each_index(|uid, index| -> Result<Option<IndexView>, _> {
if !filters.is_index_authorized(uid) {
return Ok(None);
}
Ok(Some(IndexView::new(uid.to_string(), index)?))
})?;
// Won't cause all indexes to be opened, because IndexView doesn't keep the `Index` open.
// error when trying to fix it: the trait `ExactSizeIterator` is not implemented for `Flatten<IntoIter<Option<IndexView>>>`
#[allow(clippy::needless_collect)]
let indexes: Vec<IndexView> = indexes.into_iter().flatten().collect();
let ret = paginate.as_pagination().auto_paginate_sized(indexes.into_iter());
debug!("returns: {:?}", ret);
Ok(HttpResponse::Ok().json(ret))
}
#[derive(DeserializeFromValue, Debug)]
#[derive(Deserr, Debug)]
#[deserr(error = DeserrJsonError, rename_all = camelCase, deny_unknown_fields)]
pub struct IndexCreateRequest {
#[deserr(error = DeserrJsonError<InvalidIndexUid>, missing_field_error = DeserrJsonError::missing_index_uid)]
@ -116,13 +120,13 @@ pub struct IndexCreateRequest {
pub async fn create_index(
index_scheduler: GuardedData<ActionPolicy<{ actions::INDEXES_CREATE }>, Data<IndexScheduler>>,
body: ValidatedJson<IndexCreateRequest, DeserrJsonError>,
body: AwebJson<IndexCreateRequest, DeserrJsonError>,
req: HttpRequest,
analytics: web::Data<dyn Analytics>,
) -> Result<HttpResponse, ResponseError> {
let IndexCreateRequest { primary_key, uid } = body.into_inner();
let allow_index_creation = index_scheduler.filters().search_rules.is_index_authorized(&uid);
let allow_index_creation = index_scheduler.filters().allow_index_creation(&uid);
if allow_index_creation {
analytics.publish(
"Index Created".to_string(),
@ -149,7 +153,7 @@ fn deny_immutable_fields_index(
"uid" => immutable_field_error(field, accepted, Code::ImmutableIndexUid),
"createdAt" => immutable_field_error(field, accepted, Code::ImmutableIndexCreatedAt),
"updatedAt" => immutable_field_error(field, accepted, Code::ImmutableIndexUpdatedAt),
_ => unwrap_any(DeserrJsonError::<BadRequest>::error::<Infallible>(
_ => deserr::take_cf_content(DeserrJsonError::<BadRequest>::error::<Infallible>(
None,
deserr::ErrorKind::UnknownKey { key: field, accepted },
location,
@ -157,7 +161,7 @@ fn deny_immutable_fields_index(
}
}
#[derive(DeserializeFromValue, Debug)]
#[derive(Deserr, Debug)]
#[deserr(error = DeserrJsonError, rename_all = camelCase, deny_unknown_fields = deny_immutable_fields_index)]
pub struct UpdateIndexRequest {
#[deserr(default, error = DeserrJsonError<InvalidIndexPrimaryKey>)]
@ -181,7 +185,7 @@ pub async fn get_index(
pub async fn update_index(
index_scheduler: GuardedData<ActionPolicy<{ actions::INDEXES_UPDATE }>, Data<IndexScheduler>>,
index_uid: web::Path<String>,
body: ValidatedJson<UpdateIndexRequest, DeserrJsonError>,
body: AwebJson<UpdateIndexRequest, DeserrJsonError>,
req: HttpRequest,
analytics: web::Data<dyn Analytics>,
) -> Result<HttpResponse, ResponseError> {

View File

@ -1,5 +1,6 @@
use actix_web::web::Data;
use actix_web::{web, HttpRequest, HttpResponse};
use deserr::actix_web::{AwebJson, AwebQueryParameter};
use index_scheduler::IndexScheduler;
use log::debug;
use meilisearch_types::deserr::query_params::Param;
@ -13,8 +14,6 @@ use serde_json::Value;
use crate::analytics::{Analytics, SearchAggregator};
use crate::extractors::authentication::policies::*;
use crate::extractors::authentication::GuardedData;
use crate::extractors::json::ValidatedJson;
use crate::extractors::query_parameters::QueryParameter;
use crate::extractors::sequential_extractor::SeqHandler;
use crate::search::{
add_search_rules, perform_search, MatchingStrategy, SearchQuery, DEFAULT_CROP_LENGTH,
@ -30,7 +29,7 @@ pub fn configure(cfg: &mut web::ServiceConfig) {
);
}
#[derive(Debug, deserr::DeserializeFromValue)]
#[derive(Debug, deserr::Deserr)]
#[deserr(error = DeserrQueryParamError, rename_all = camelCase, deny_unknown_fields)]
pub struct SearchQueryGet {
#[deserr(default, error = DeserrQueryParamError<InvalidSearchQ>)]
@ -129,7 +128,7 @@ fn fix_sort_query_parameters(sort_query: &str) -> Vec<String> {
pub async fn search_with_url_query(
index_scheduler: GuardedData<ActionPolicy<{ actions::SEARCH }>, Data<IndexScheduler>>,
index_uid: web::Path<String>,
params: QueryParameter<SearchQueryGet, DeserrQueryParamError>,
params: AwebQueryParameter<SearchQueryGet, DeserrQueryParamError>,
req: HttpRequest,
analytics: web::Data<dyn Analytics>,
) -> Result<HttpResponse, ResponseError> {
@ -139,9 +138,7 @@ pub async fn search_with_url_query(
let mut query: SearchQuery = params.into_inner().into();
// Tenant token search_rules.
if let Some(search_rules) =
index_scheduler.filters().search_rules.get_index_search_rules(&index_uid)
{
if let Some(search_rules) = index_scheduler.filters().get_index_search_rules(&index_uid) {
add_search_rules(&mut query, search_rules);
}
@ -163,7 +160,7 @@ pub async fn search_with_url_query(
pub async fn search_with_post(
index_scheduler: GuardedData<ActionPolicy<{ actions::SEARCH }>, Data<IndexScheduler>>,
index_uid: web::Path<String>,
params: ValidatedJson<SearchQuery, DeserrJsonError>,
params: AwebJson<SearchQuery, DeserrJsonError>,
req: HttpRequest,
analytics: web::Data<dyn Analytics>,
) -> Result<HttpResponse, ResponseError> {
@ -173,9 +170,7 @@ pub async fn search_with_post(
debug!("search called with params: {:?}", query);
// Tenant token search_rules.
if let Some(search_rules) =
index_scheduler.filters().search_rules.get_index_search_rules(&index_uid)
{
if let Some(search_rules) = index_scheduler.filters().get_index_search_rules(&index_uid) {
add_search_rules(&mut query, search_rules);
}

View File

@ -1,5 +1,6 @@
use actix_web::web::Data;
use actix_web::{web, HttpRequest, HttpResponse};
use deserr::actix_web::AwebJson;
use index_scheduler::IndexScheduler;
use log::debug;
use meilisearch_types::deserr::DeserrJsonError;
@ -12,7 +13,6 @@ use serde_json::json;
use crate::analytics::Analytics;
use crate::extractors::authentication::policies::*;
use crate::extractors::authentication::GuardedData;
use crate::extractors::json::ValidatedJson;
use crate::routes::SummarizedTaskView;
#[macro_export]
@ -45,7 +45,8 @@ macro_rules! make_setting_route {
let new_settings = Settings { $attr: Setting::Reset.into(), ..Default::default() };
let allow_index_creation = index_scheduler.filters().allow_index_creation;
let allow_index_creation =
index_scheduler.filters().allow_index_creation(&index_uid);
let task = KindWithContent::SettingsUpdate {
index_uid: index_uid.to_string(),
@ -68,7 +69,7 @@ macro_rules! make_setting_route {
Data<IndexScheduler>,
>,
index_uid: actix_web::web::Path<String>,
body: $crate::routes::indexes::ValidatedJson<Option<$type>, $err_ty>,
body: deserr::actix_web::AwebJson<Option<$type>, $err_ty>,
req: HttpRequest,
$analytics_var: web::Data<dyn Analytics>,
) -> std::result::Result<HttpResponse, ResponseError> {
@ -86,7 +87,8 @@ macro_rules! make_setting_route {
..Default::default()
};
let allow_index_creation = index_scheduler.filters().allow_index_creation;
let allow_index_creation =
index_scheduler.filters().allow_index_creation(&index_uid);
let task = KindWithContent::SettingsUpdate {
index_uid: index_uid.to_string(),
@ -468,7 +470,7 @@ generate_configure!(
pub async fn update_all(
index_scheduler: GuardedData<ActionPolicy<{ actions::SETTINGS_UPDATE }>, Data<IndexScheduler>>,
index_uid: web::Path<String>,
body: ValidatedJson<Settings<Unchecked>, DeserrJsonError>,
body: AwebJson<Settings<Unchecked>, DeserrJsonError>,
req: HttpRequest,
analytics: web::Data<dyn Analytics>,
) -> Result<HttpResponse, ResponseError> {
@ -560,7 +562,7 @@ pub async fn update_all(
Some(&req),
);
let allow_index_creation = index_scheduler.filters().allow_index_creation;
let allow_index_creation = index_scheduler.filters().allow_index_creation(&index_uid);
let index_uid = IndexUid::try_from(index_uid.into_inner())?.into_inner();
let task = KindWithContent::SettingsUpdate {
index_uid,
@ -596,7 +598,7 @@ pub async fn delete_all(
let new_settings = Settings::cleared().into_unchecked();
let allow_index_creation = index_scheduler.filters().allow_index_creation;
let allow_index_creation = index_scheduler.filters().allow_index_creation(&index_uid);
let index_uid = IndexUid::try_from(index_uid.into_inner())?.into_inner();
let task = KindWithContent::SettingsUpdate {
index_uid,

View File

@ -0,0 +1,50 @@
use actix_web::http::header;
use actix_web::web::{self, Data};
use actix_web::HttpResponse;
use index_scheduler::IndexScheduler;
use meilisearch_auth::AuthController;
use meilisearch_types::error::ResponseError;
use meilisearch_types::keys::actions;
use prometheus::{Encoder, TextEncoder};
use crate::extractors::authentication::policies::ActionPolicy;
use crate::extractors::authentication::{AuthenticationError, GuardedData};
use crate::routes::create_all_stats;
pub fn configure(config: &mut web::ServiceConfig) {
config.service(web::resource("").route(web::get().to(get_metrics)));
}
pub async fn get_metrics(
index_scheduler: GuardedData<ActionPolicy<{ actions::METRICS_GET }>, Data<IndexScheduler>>,
auth_controller: GuardedData<ActionPolicy<{ actions::METRICS_GET }>, AuthController>,
) -> Result<HttpResponse, ResponseError> {
let auth_filters = index_scheduler.filters();
if !auth_filters.all_indexes_authorized() {
let mut error = ResponseError::from(AuthenticationError::InvalidToken);
error
.message
.push_str(" The API key for the `/metrics` route must allow access to all indexes.");
return Err(error);
}
let response =
create_all_stats((*index_scheduler).clone(), (*auth_controller).clone(), auth_filters)?;
crate::metrics::MEILISEARCH_DB_SIZE_BYTES.set(response.database_size as i64);
crate::metrics::MEILISEARCH_INDEX_COUNT.set(response.indexes.len() as i64);
for (index, value) in response.indexes.iter() {
crate::metrics::MEILISEARCH_INDEX_DOCS_COUNT
.with_label_values(&[index])
.set(value.number_of_documents as i64);
}
let encoder = TextEncoder::new();
let mut buffer = vec![];
encoder.encode(&prometheus::gather(), &mut buffer).expect("Failed to encode metrics");
let response = String::from_utf8(buffer).expect("Failed to convert bytes to string");
Ok(HttpResponse::Ok().insert_header(header::ContentType(mime::TEXT_PLAIN)).body(response))
}

View File
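
The handler above reads gauges from a `crate::metrics` module that is not part of this diff. Below is a sketch of how such gauges are commonly declared with the `prometheus` crate, assuming `lazy_static` is available; the metric names and help strings are guesses, only the Rust identifiers come from the handler:

```rust
use lazy_static::lazy_static;
use prometheus::{register_int_gauge, register_int_gauge_vec, IntGauge, IntGaugeVec};

lazy_static! {
    // Total size of the databases on disk, set from the stats' `database_size`.
    pub static ref MEILISEARCH_DB_SIZE_BYTES: IntGauge =
        register_int_gauge!("meilisearch_db_size_bytes", "Meilisearch database size in bytes")
            .expect("Can't create the db size gauge");
    // Number of indexes currently known to the index scheduler.
    pub static ref MEILISEARCH_INDEX_COUNT: IntGauge =
        register_int_gauge!("meilisearch_index_count", "Number of indexes")
            .expect("Can't create the index count gauge");
    // Document count per index, labeled by index uid.
    pub static ref MEILISEARCH_INDEX_DOCS_COUNT: IntGaugeVec = register_int_gauge_vec!(
        "meilisearch_index_docs_count",
        "Number of documents per index",
        &["index"]
    )
    .expect("Can't create the docs count gauge");
}
```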

@ -17,13 +17,17 @@ use crate::analytics::Analytics;
use crate::extractors::authentication::policies::*;
use crate::extractors::authentication::GuardedData;
const PAGINATION_DEFAULT_LIMIT: usize = 20;
mod api_key;
mod dump;
pub mod indexes;
mod metrics;
mod multi_search;
mod swap_indexes;
pub mod tasks;
pub fn configure(cfg: &mut web::ServiceConfig) {
pub fn configure(cfg: &mut web::ServiceConfig, enable_metrics: bool) {
cfg.service(web::scope("/tasks").configure(tasks::configure))
.service(web::resource("/health").route(web::get().to(get_health)))
.service(web::scope("/keys").configure(api_key::configure))
@ -31,11 +35,13 @@ pub fn configure(cfg: &mut web::ServiceConfig) {
.service(web::resource("/stats").route(web::get().to(get_stats)))
.service(web::resource("/version").route(web::get().to(get_version)))
.service(web::scope("/indexes").configure(indexes::configure))
.service(web::scope("/search").configure(multi_search::configure))
.service(web::scope("/multi-search").configure(multi_search::configure))
.service(web::scope("/swap-indexes").configure(swap_indexes::configure));
}
const PAGINATION_DEFAULT_LIMIT: usize = 20;
if enable_metrics {
cfg.service(web::scope("/metrics").configure(metrics::configure));
}
}
#[derive(Debug, Serialize)]
#[serde(rename_all = "camelCase")]
@ -60,6 +66,7 @@ impl From<Task> for SummarizedTaskView {
}
}
}
pub struct Pagination {
pub offset: usize,
pub limit: usize,
@ -237,10 +244,9 @@ async fn get_stats(
analytics: web::Data<dyn Analytics>,
) -> Result<HttpResponse, ResponseError> {
analytics.publish("Stats Seen".to_string(), json!({ "per_index_uid": false }), Some(&req));
let search_rules = &index_scheduler.filters().search_rules;
let filters = index_scheduler.filters();
let stats =
create_all_stats((*index_scheduler).clone(), (*auth_controller).clone(), search_rules)?;
let stats = create_all_stats((*index_scheduler).clone(), (*auth_controller).clone(), filters)?;
debug!("returns: {:?}", stats);
Ok(HttpResponse::Ok().json(stats))
@ -249,20 +255,20 @@ async fn get_stats(
pub fn create_all_stats(
index_scheduler: Data<IndexScheduler>,
auth_controller: AuthController,
search_rules: &meilisearch_auth::SearchRules,
filters: &meilisearch_auth::AuthFilter,
) -> Result<Stats, ResponseError> {
let mut last_task: Option<OffsetDateTime> = None;
let mut indexes = BTreeMap::new();
let mut database_size = 0;
let processing_task = index_scheduler.get_tasks_from_authorized_indexes(
Query { statuses: Some(vec![Status::Processing]), limit: Some(1), ..Query::default() },
search_rules.authorized_indexes(),
filters,
)?;
// accumulate the size of each indexes
let processing_index = processing_task.first().and_then(|task| task.index_uid());
for (name, index) in index_scheduler.indexes()? {
if !search_rules.is_index_authorized(&name) {
continue;
index_scheduler.try_for_each_index(|name, index| {
if !filters.is_index_authorized(name) {
return Ok(());
}
database_size += index.on_disk_size()?;
@ -277,8 +283,9 @@ pub fn create_all_stats(
let updated_at = index.updated_at(&rtxn)?;
last_task = last_task.map_or(Some(updated_at), |last| Some(last.max(updated_at)));
indexes.insert(name, stats);
}
indexes.insert(name.to_string(), stats);
Ok(())
})?;
database_size += index_scheduler.size()?;
database_size += auth_controller.size()?;
@ -327,5 +334,3 @@ pub async fn get_health(
Ok(HttpResponse::Ok().json(serde_json::json!({ "status": "available" })))
}
mod multi_search;

View File

@ -1,54 +1,88 @@
use actix_http::StatusCode;
use actix_web::web::{self, Data};
use actix_web::{HttpRequest, HttpResponse};
use deserr::actix_web::AwebJson;
use index_scheduler::IndexScheduler;
use log::debug;
use meilisearch_types::deserr::DeserrJsonError;
use meilisearch_types::error::ResponseError;
use meilisearch_types::keys::actions;
use serde::Serialize;
use crate::analytics::{Analytics, MultiSearchAggregator};
use crate::extractors::authentication::policies::ActionPolicy;
use crate::extractors::authentication::GuardedData;
use crate::extractors::json::ValidatedJson;
use crate::extractors::authentication::{AuthenticationError, GuardedData};
use crate::extractors::sequential_extractor::SeqHandler;
use crate::search::{
add_search_rules, perform_search, SearchQueryWithIndex, SearchResultWithIndex,
};
pub fn configure(cfg: &mut web::ServiceConfig) {
cfg.service(web::resource("").route(web::post().to(SeqHandler(search_with_post))));
cfg.service(web::resource("").route(web::post().to(SeqHandler(multi_search_with_post))));
}
pub async fn search_with_post(
#[derive(Serialize)]
struct SearchResults {
results: Vec<SearchResultWithIndex>,
}
#[derive(Debug, deserr::Deserr)]
#[deserr(error = DeserrJsonError, rename_all = camelCase, deny_unknown_fields)]
pub struct SearchQueries {
queries: Vec<SearchQueryWithIndex>,
}
pub async fn multi_search_with_post(
index_scheduler: GuardedData<ActionPolicy<{ actions::SEARCH }>, Data<IndexScheduler>>,
params: ValidatedJson<Vec<SearchQueryWithIndex>, DeserrJsonError>,
params: AwebJson<SearchQueries, DeserrJsonError>,
req: HttpRequest,
analytics: web::Data<dyn Analytics>,
) -> Result<HttpResponse, ResponseError> {
let queries = params.into_inner();
let queries = params.into_inner().queries;
let mut multi_aggregate = MultiSearchAggregator::from_queries(&queries, &req);
let search_results: Result<_, ResponseError> = (|| {
// Explicitly expect a `(ResponseError, usize)` as the error type rather than `ResponseError` alone,
// so that `?` does not compile unless `with_index` is used, ensuring the query index cannot be
// forgotten when the code changes.
let search_results: Result<_, (ResponseError, usize)> = (|| {
async {
let mut search_results = Vec::with_capacity(queries.len());
for (index_uid, mut query) in
queries.into_iter().map(SearchQueryWithIndex::into_index_query)
for (query_index, (index_uid, mut query)) in
queries.into_iter().map(SearchQueryWithIndex::into_index_query).enumerate()
{
debug!("search called with params: {:?}", query);
debug!("multi-search #{query_index}: called with params: {:?}", query);
// Tenant token search_rules.
// Check index from API key
if !index_scheduler.filters().is_index_authorized(&index_uid) {
return Err(AuthenticationError::InvalidToken).with_index(query_index);
}
// Apply search rules from tenant token
if let Some(search_rules) =
index_scheduler.filters().search_rules.get_index_search_rules(&index_uid)
index_scheduler.filters().get_index_search_rules(&index_uid)
{
add_search_rules(&mut query, search_rules);
}
let index = index_scheduler.index(&index_uid)?;
let index = index_scheduler
.index(&index_uid)
.map_err(|err| {
let mut err = ResponseError::from(err);
// Patch the HTTP status code to 400 as it defaults to 404 for `index_not_found`, but
// here the missing resource is not part of the URL.
err.code = StatusCode::BAD_REQUEST;
err
})
.with_index(query_index)?;
let search_result =
tokio::task::spawn_blocking(move || perform_search(&index, query)).await?;
tokio::task::spawn_blocking(move || perform_search(&index, query))
.await
.with_index(query_index)?;
search_results.push(SearchResultWithIndex { index_uid, result: search_result? });
search_results.push(SearchResultWithIndex {
index_uid: index_uid.into_inner(),
result: search_result.with_index(query_index)?,
});
}
Ok(search_results)
}
@ -60,9 +94,29 @@ pub async fn search_with_post(
}
analytics.post_multi_search(multi_aggregate);
let search_results = search_results?;
let search_results = search_results.map_err(|(mut err, query_index)| {
// Add the query index that failed as context for the error message.
// We do this only here, and not directly in the `WithIndex` trait, so that `with_index` returns a
// different result type and we can benefit from static typing.
err.message = format!("Inside `.queries[{query_index}]`: {}", err.message);
err
})?;
debug!("returns: {:?}", search_results);
Ok(HttpResponse::Ok().json(search_results))
Ok(HttpResponse::Ok().json(SearchResults { results: search_results }))
}
/// Local `Result` extension trait to avoid `map_err` boilerplate.
trait WithIndex {
type T;
/// Convert the error type inside the `Result` into a `ResponseError`, and pair it with the given `usize` query index.
fn with_index(self, index: usize) -> Result<Self::T, (ResponseError, usize)>;
}
impl<T, E: Into<ResponseError>> WithIndex for Result<T, E> {
type T = T;
fn with_index(self, index: usize) -> Result<T, (ResponseError, usize)> {
self.map_err(|err| (err.into(), index))
}
}
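The `(ResponseError, usize)` error pair and the `WithIndex` helper are easier to see in isolation. The following is a standalone sketch of the same pattern (the error type and the per-query parsing are invented for the example; nothing here is Meilisearch code):

#[derive(Debug)]
struct ResponseError(String);

impl From<std::num::ParseIntError> for ResponseError {
    fn from(err: std::num::ParseIntError) -> Self {
        ResponseError(err.to_string())
    }
}

/// Local `Result` extension: convert the error and tag it with the query index.
trait WithIndex {
    type T;
    fn with_index(self, index: usize) -> Result<Self::T, (ResponseError, usize)>;
}

impl<T, E: Into<ResponseError>> WithIndex for Result<T, E> {
    type T = T;
    fn with_index(self, index: usize) -> Result<T, (ResponseError, usize)> {
        self.map_err(|err| (err.into(), index))
    }
}

fn run(queries: &[&str]) -> Result<Vec<i64>, (ResponseError, usize)> {
    let mut results = Vec::with_capacity(queries.len());
    for (query_index, query) in queries.iter().enumerate() {
        // `?` only type-checks after `.with_index(...)`, so the index cannot be forgotten.
        results.push(query.parse::<i64>().with_index(query_index)?);
    }
    Ok(results)
}

fn main() {
    match run(&["1", "oops", "3"]) {
        Ok(values) => println!("{values:?}"),
        Err((err, query_index)) => {
            println!("Inside `.queries[{query_index}]`: {}", err.0)
        }
    }
}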


@ -1,6 +1,7 @@
use actix_web::web::Data;
use actix_web::{web, HttpRequest, HttpResponse};
use deserr::DeserializeFromValue;
use deserr::actix_web::AwebJson;
use deserr::Deserr;
use index_scheduler::IndexScheduler;
use meilisearch_types::deserr::DeserrJsonError;
use meilisearch_types::error::deserr_codes::InvalidSwapIndexes;
@ -14,14 +15,13 @@ use crate::analytics::Analytics;
use crate::error::MeilisearchHttpError;
use crate::extractors::authentication::policies::*;
use crate::extractors::authentication::{AuthenticationError, GuardedData};
use crate::extractors::json::ValidatedJson;
use crate::extractors::sequential_extractor::SeqHandler;
pub fn configure(cfg: &mut web::ServiceConfig) {
cfg.service(web::resource("").route(web::post().to(SeqHandler(swap_indexes))));
}
#[derive(DeserializeFromValue, Debug, Clone, PartialEq, Eq)]
#[derive(Deserr, Debug, Clone, PartialEq, Eq)]
#[deserr(error = DeserrJsonError, rename_all = camelCase, deny_unknown_fields)]
pub struct SwapIndexesPayload {
#[deserr(error = DeserrJsonError<InvalidSwapIndexes>, missing_field_error = DeserrJsonError::missing_swap_indexes)]
@ -30,7 +30,7 @@ pub struct SwapIndexesPayload {
pub async fn swap_indexes(
index_scheduler: GuardedData<ActionPolicy<{ actions::INDEXES_SWAP }>, Data<IndexScheduler>>,
params: ValidatedJson<Vec<SwapIndexesPayload>, DeserrJsonError>,
params: AwebJson<Vec<SwapIndexesPayload>, DeserrJsonError>,
req: HttpRequest,
analytics: web::Data<dyn Analytics>,
) -> Result<HttpResponse, ResponseError> {
@ -42,7 +42,7 @@ pub async fn swap_indexes(
}),
Some(&req),
);
let search_rules = &index_scheduler.filters().search_rules;
let filters = index_scheduler.filters();
let mut swaps = vec![];
for SwapIndexesPayload { indexes } in params.into_iter() {
@ -53,7 +53,7 @@ pub async fn swap_indexes(
return Err(MeilisearchHttpError::SwapIndexPayloadWrongLength(indexes).into());
}
};
if !search_rules.is_index_authorized(lhs) || !search_rules.is_index_authorized(rhs) {
if !filters.is_index_authorized(lhs) || !filters.is_index_authorized(rhs) {
return Err(AuthenticationError::InvalidToken.into());
}
swaps.push(IndexSwap { indexes: (lhs.to_string(), rhs.to_string()) });


@ -1,6 +1,7 @@
use actix_web::web::Data;
use actix_web::{web, HttpRequest, HttpResponse};
use deserr::DeserializeFromValue;
use deserr::actix_web::AwebQueryParameter;
use deserr::Deserr;
use index_scheduler::{IndexScheduler, Query, TaskId};
use meilisearch_types::deserr::query_params::Param;
use meilisearch_types::deserr::DeserrQueryParamError;
@ -23,7 +24,6 @@ use super::SummarizedTaskView;
use crate::analytics::Analytics;
use crate::extractors::authentication::policies::*;
use crate::extractors::authentication::GuardedData;
use crate::extractors::query_parameters::QueryParameter;
use crate::extractors::sequential_extractor::SeqHandler;
const DEFAULT_LIMIT: u32 = 20;
@ -162,7 +162,7 @@ impl From<Details> for DetailsView {
}
}
#[derive(Debug, DeserializeFromValue)]
#[derive(Debug, Deserr)]
#[deserr(error = DeserrQueryParamError, rename_all = camelCase, deny_unknown_fields)]
pub struct TasksFilterQuery {
#[deserr(default = Param(DEFAULT_LIMIT), error = DeserrQueryParamError<InvalidTaskLimit>)]
@ -181,19 +181,20 @@ pub struct TasksFilterQuery {
#[deserr(default, error = DeserrQueryParamError<InvalidIndexUid>)]
pub index_uids: OptionStarOrList<IndexUid>,
#[deserr(default, error = DeserrQueryParamError<InvalidTaskAfterEnqueuedAt>, from(OptionStarOr<String>) = deserialize_date_after -> InvalidTaskDateError)]
#[deserr(default, error = DeserrQueryParamError<InvalidTaskAfterEnqueuedAt>, try_from(OptionStarOr<String>) = deserialize_date_after -> InvalidTaskDateError)]
pub after_enqueued_at: OptionStarOr<OffsetDateTime>,
#[deserr(default, error = DeserrQueryParamError<InvalidTaskBeforeEnqueuedAt>, from(OptionStarOr<String>) = deserialize_date_before -> InvalidTaskDateError)]
#[deserr(default, error = DeserrQueryParamError<InvalidTaskBeforeEnqueuedAt>, try_from(OptionStarOr<String>) = deserialize_date_before -> InvalidTaskDateError)]
pub before_enqueued_at: OptionStarOr<OffsetDateTime>,
#[deserr(default, error = DeserrQueryParamError<InvalidTaskAfterStartedAt>, from(OptionStarOr<String>) = deserialize_date_after -> InvalidTaskDateError)]
#[deserr(default, error = DeserrQueryParamError<InvalidTaskAfterStartedAt>, try_from(OptionStarOr<String>) = deserialize_date_after -> InvalidTaskDateError)]
pub after_started_at: OptionStarOr<OffsetDateTime>,
#[deserr(default, error = DeserrQueryParamError<InvalidTaskBeforeStartedAt>, from(OptionStarOr<String>) = deserialize_date_before -> InvalidTaskDateError)]
#[deserr(default, error = DeserrQueryParamError<InvalidTaskBeforeStartedAt>, try_from(OptionStarOr<String>) = deserialize_date_before -> InvalidTaskDateError)]
pub before_started_at: OptionStarOr<OffsetDateTime>,
#[deserr(default, error = DeserrQueryParamError<InvalidTaskAfterFinishedAt>, from(OptionStarOr<String>) = deserialize_date_after -> InvalidTaskDateError)]
#[deserr(default, error = DeserrQueryParamError<InvalidTaskAfterFinishedAt>, try_from(OptionStarOr<String>) = deserialize_date_after -> InvalidTaskDateError)]
pub after_finished_at: OptionStarOr<OffsetDateTime>,
#[deserr(default, error = DeserrQueryParamError<InvalidTaskBeforeFinishedAt>, from(OptionStarOr<String>) = deserialize_date_before -> InvalidTaskDateError)]
#[deserr(default, error = DeserrQueryParamError<InvalidTaskBeforeFinishedAt>, try_from(OptionStarOr<String>) = deserialize_date_before -> InvalidTaskDateError)]
pub before_finished_at: OptionStarOr<OffsetDateTime>,
}
impl TasksFilterQuery {
fn into_query(self) -> Query {
Query {
@ -235,7 +236,7 @@ impl TaskDeletionOrCancelationQuery {
}
}
#[derive(Debug, DeserializeFromValue)]
#[derive(Debug, Deserr)]
#[deserr(error = DeserrQueryParamError, rename_all = camelCase, deny_unknown_fields)]
pub struct TaskDeletionOrCancelationQuery {
#[deserr(default, error = DeserrQueryParamError<InvalidTaskUids>)]
@ -249,19 +250,20 @@ pub struct TaskDeletionOrCancelationQuery {
#[deserr(default, error = DeserrQueryParamError<InvalidIndexUid>)]
pub index_uids: OptionStarOrList<IndexUid>,
#[deserr(default, error = DeserrQueryParamError<InvalidTaskAfterEnqueuedAt>, from(OptionStarOr<String>) = deserialize_date_after -> InvalidTaskDateError)]
#[deserr(default, error = DeserrQueryParamError<InvalidTaskAfterEnqueuedAt>, try_from(OptionStarOr<String>) = deserialize_date_after -> InvalidTaskDateError)]
pub after_enqueued_at: OptionStarOr<OffsetDateTime>,
#[deserr(default, error = DeserrQueryParamError<InvalidTaskBeforeEnqueuedAt>, from(OptionStarOr<String>) = deserialize_date_before -> InvalidTaskDateError)]
#[deserr(default, error = DeserrQueryParamError<InvalidTaskBeforeEnqueuedAt>, try_from(OptionStarOr<String>) = deserialize_date_before -> InvalidTaskDateError)]
pub before_enqueued_at: OptionStarOr<OffsetDateTime>,
#[deserr(default, error = DeserrQueryParamError<InvalidTaskAfterStartedAt>, from(OptionStarOr<String>) = deserialize_date_after -> InvalidTaskDateError)]
#[deserr(default, error = DeserrQueryParamError<InvalidTaskAfterStartedAt>, try_from(OptionStarOr<String>) = deserialize_date_after -> InvalidTaskDateError)]
pub after_started_at: OptionStarOr<OffsetDateTime>,
#[deserr(default, error = DeserrQueryParamError<InvalidTaskBeforeStartedAt>, from(OptionStarOr<String>) = deserialize_date_before -> InvalidTaskDateError)]
#[deserr(default, error = DeserrQueryParamError<InvalidTaskBeforeStartedAt>, try_from(OptionStarOr<String>) = deserialize_date_before -> InvalidTaskDateError)]
pub before_started_at: OptionStarOr<OffsetDateTime>,
#[deserr(default, error = DeserrQueryParamError<InvalidTaskAfterFinishedAt>, from(OptionStarOr<String>) = deserialize_date_after -> InvalidTaskDateError)]
#[deserr(default, error = DeserrQueryParamError<InvalidTaskAfterFinishedAt>, try_from(OptionStarOr<String>) = deserialize_date_after -> InvalidTaskDateError)]
pub after_finished_at: OptionStarOr<OffsetDateTime>,
#[deserr(default, error = DeserrQueryParamError<InvalidTaskBeforeFinishedAt>, from(OptionStarOr<String>) = deserialize_date_before -> InvalidTaskDateError)]
#[deserr(default, error = DeserrQueryParamError<InvalidTaskBeforeFinishedAt>, try_from(OptionStarOr<String>) = deserialize_date_before -> InvalidTaskDateError)]
pub before_finished_at: OptionStarOr<OffsetDateTime>,
}
impl TaskDeletionOrCancelationQuery {
fn into_query(self) -> Query {
Query {
@ -284,7 +286,7 @@ impl TaskDeletionOrCancelationQuery {
async fn cancel_tasks(
index_scheduler: GuardedData<ActionPolicy<{ actions::TASKS_CANCEL }>, Data<IndexScheduler>>,
params: QueryParameter<TaskDeletionOrCancelationQuery, DeserrQueryParamError>,
params: AwebQueryParameter<TaskDeletionOrCancelationQuery, DeserrQueryParamError>,
req: HttpRequest,
analytics: web::Data<dyn Analytics>,
) -> Result<HttpResponse, ResponseError> {
@ -317,7 +319,7 @@ async fn cancel_tasks(
let tasks = index_scheduler.get_task_ids_from_authorized_indexes(
&index_scheduler.read_txn()?,
&query,
&index_scheduler.filters().search_rules.authorized_indexes(),
index_scheduler.filters(),
)?;
let task_cancelation =
KindWithContent::TaskCancelation { query: format!("?{}", req.query_string()), tasks };
@ -330,7 +332,7 @@ async fn cancel_tasks(
async fn delete_tasks(
index_scheduler: GuardedData<ActionPolicy<{ actions::TASKS_DELETE }>, Data<IndexScheduler>>,
params: QueryParameter<TaskDeletionOrCancelationQuery, DeserrQueryParamError>,
params: AwebQueryParameter<TaskDeletionOrCancelationQuery, DeserrQueryParamError>,
req: HttpRequest,
analytics: web::Data<dyn Analytics>,
) -> Result<HttpResponse, ResponseError> {
@ -362,7 +364,7 @@ async fn delete_tasks(
let tasks = index_scheduler.get_task_ids_from_authorized_indexes(
&index_scheduler.read_txn()?,
&query,
&index_scheduler.filters().search_rules.authorized_indexes(),
index_scheduler.filters(),
)?;
let task_deletion =
KindWithContent::TaskDeletion { query: format!("?{}", req.query_string()), tasks };
@ -383,7 +385,7 @@ pub struct AllTasks {
async fn get_tasks(
index_scheduler: GuardedData<ActionPolicy<{ actions::TASKS_GET }>, Data<IndexScheduler>>,
params: QueryParameter<TasksFilterQuery, DeserrQueryParamError>,
params: AwebQueryParameter<TasksFilterQuery, DeserrQueryParamError>,
req: HttpRequest,
analytics: web::Data<dyn Analytics>,
) -> Result<HttpResponse, ResponseError> {
@ -396,10 +398,7 @@ async fn get_tasks(
let query = params.into_query();
let mut tasks_results: Vec<TaskView> = index_scheduler
.get_tasks_from_authorized_indexes(
query,
index_scheduler.filters().search_rules.authorized_indexes(),
)?
.get_tasks_from_authorized_indexes(query, index_scheduler.filters())?
.into_iter()
.map(|t| TaskView::from_task(&t))
.collect();
@ -437,12 +436,8 @@ async fn get_task(
let query = index_scheduler::Query { uids: Some(vec![task_uid]), ..Query::default() };
if let Some(task) = index_scheduler
.get_tasks_from_authorized_indexes(
query,
index_scheduler.filters().search_rules.authorized_indexes(),
)?
.first()
if let Some(task) =
index_scheduler.get_tasks_from_authorized_indexes(query, index_scheduler.filters())?.first()
{
let task_view = TaskView::from_task(task);
Ok(HttpResponse::Ok().json(task_view))
@ -498,7 +493,7 @@ pub fn deserialize_date_before(
#[cfg(test)]
mod tests {
use deserr::DeserializeFromValue;
use deserr::Deserr;
use meili_snap::snapshot;
use meilisearch_types::deserr::DeserrQueryParamError;
use meilisearch_types::error::{Code, ResponseError};
@ -507,7 +502,7 @@ mod tests {
fn deserr_query_params<T>(j: &str) -> Result<T, ResponseError>
where
T: DeserializeFromValue<DeserrQueryParamError>,
T: Deserr<DeserrQueryParamError>,
{
let value = serde_urlencoded::from_str::<serde_json::Value>(j)
.map_err(|e| ResponseError::from_msg(e.to_string(), Code::BadRequest))?;


@ -3,11 +3,12 @@ use std::collections::{BTreeMap, BTreeSet, HashSet};
use std::str::FromStr;
use std::time::Instant;
use deserr::DeserializeFromValue;
use deserr::Deserr;
use either::Either;
use meilisearch_auth::IndexSearchRules;
use meilisearch_types::deserr::DeserrJsonError;
use meilisearch_types::error::deserr_codes::*;
use meilisearch_types::index_uid::IndexUid;
use meilisearch_types::settings::DEFAULT_PAGINATION_MAX_TOTAL_HITS;
use meilisearch_types::{milli, Document};
use milli::tokenizer::TokenizerBuilder;
@ -30,7 +31,7 @@ pub const DEFAULT_CROP_MARKER: fn() -> String = || "…".to_string();
pub const DEFAULT_HIGHLIGHT_PRE_TAG: fn() -> String = || "<em>".to_string();
pub const DEFAULT_HIGHLIGHT_POST_TAG: fn() -> String = || "</em>".to_string();
#[derive(Debug, Clone, Default, PartialEq, Eq, DeserializeFromValue)]
#[derive(Debug, Clone, Default, PartialEq, Eq, Deserr)]
#[deserr(error = DeserrJsonError, rename_all = camelCase, deny_unknown_fields)]
pub struct SearchQuery {
#[deserr(default, error = DeserrJsonError<InvalidSearchQ>)]
@ -79,10 +80,11 @@ impl SearchQuery {
// This struct contains the fields of `SearchQuery` inline.
// This is because neither deserr nor serde supports `flatten` when using `deny_unknown_fields`.
// The `From<SearchQueryWithIndex>` implementation ensures both structs remain up to date.
#[derive(Debug, deserr::DeserializeFromValue)]
#[derive(Debug, Clone, PartialEq, Eq, Deserr)]
#[deserr(error = DeserrJsonError, rename_all = camelCase, deny_unknown_fields)]
pub struct SearchQueryWithIndex {
pub index_uid: String,
#[deserr(error = DeserrJsonError<InvalidIndexUid>, missing_field_error = DeserrJsonError::missing_index_uid)]
pub index_uid: IndexUid,
#[deserr(default, error = DeserrJsonError<InvalidSearchQ>)]
pub q: Option<String>,
#[deserr(default = DEFAULT_SEARCH_OFFSET(), error = DeserrJsonError<InvalidSearchOffset>)]
@ -120,7 +122,7 @@ pub struct SearchQueryWithIndex {
}
impl SearchQueryWithIndex {
pub fn into_index_query(self) -> (String, SearchQuery) {
pub fn into_index_query(self) -> (IndexUid, SearchQuery) {
let SearchQueryWithIndex {
index_uid,
q,
@ -168,7 +170,7 @@ impl SearchQueryWithIndex {
}
}
#[derive(Debug, Clone, PartialEq, Eq, DeserializeFromValue)]
#[derive(Debug, Clone, PartialEq, Eq, Deserr)]
#[deserr(rename_all = camelCase)]
pub enum MatchingStrategy {
/// Remove query words from last to first
@ -202,7 +204,7 @@ pub struct SearchHit {
pub matches_position: Option<MatchesPosition>,
}
#[derive(Serialize, Debug, Clone, PartialEq, Eq)]
#[derive(Serialize, Debug, Clone, PartialEq)]
#[serde(rename_all = "camelCase")]
pub struct SearchResult {
pub hits: Vec<SearchHit>,
@ -212,9 +214,11 @@ pub struct SearchResult {
pub hits_info: HitsInfo,
#[serde(skip_serializing_if = "Option::is_none")]
pub facet_distribution: Option<BTreeMap<String, BTreeMap<String, u64>>>,
#[serde(skip_serializing_if = "Option::is_none")]
pub facet_stats: Option<BTreeMap<String, FacetStats>>,
}
#[derive(Serialize, Debug, Clone, PartialEq, Eq)]
#[derive(Serialize, Debug, Clone, PartialEq)]
#[serde(rename_all = "camelCase")]
pub struct SearchResultWithIndex {
pub index_uid: String,
@ -231,6 +235,12 @@ pub enum HitsInfo {
OffsetLimit { limit: usize, offset: usize, estimated_total_hits: usize },
}
#[derive(Serialize, Debug, Clone, PartialEq)]
pub struct FacetStats {
pub min: f64,
pub max: f64,
}
/// Incorporate search rules in search query
pub fn add_search_rules(query: &mut SearchQuery, rules: IndexSearchRules) {
query.filter = match (query.filter.take(), rules.filter) {
@ -365,9 +375,15 @@ pub fn perform_search(
&displayed_ids,
);
let tokenizer = TokenizerBuilder::default().build();
let mut tokenizer_builder = TokenizerBuilder::default();
tokenizer_builder.create_char_map(true);
let mut formatter_builder = MatcherBuilder::new(matching_words, tokenizer);
let script_lang_map = index.script_language(&rtxn)?;
if !script_lang_map.is_empty() {
tokenizer_builder.allow_list(&script_lang_map);
}
let mut formatter_builder = MatcherBuilder::new(matching_words, tokenizer_builder.build());
formatter_builder.crop_marker(query.crop_marker);
formatter_builder.highlight_prefix(query.highlight_pre_tag);
formatter_builder.highlight_suffix(query.highlight_post_tag);
@ -422,7 +438,7 @@ pub fn perform_search(
HitsInfo::OffsetLimit { limit: query.limit, offset, estimated_total_hits: number_of_hits }
};
let facet_distribution = match query.facets {
let (facet_distribution, facet_stats) = match query.facets {
Some(ref fields) => {
let mut facet_distribution = index.facets_distribution(&rtxn);
@ -436,18 +452,23 @@ pub fn perform_search(
facet_distribution.facets(fields);
}
let distribution = facet_distribution.candidates(candidates).execute()?;
Some(distribution)
let stats = facet_distribution.compute_stats()?;
(Some(distribution), Some(stats))
}
None => None,
None => (None, None),
};
let facet_stats = facet_stats.map(|stats| {
stats.into_iter().map(|(k, (min, max))| (k, FacetStats { min, max })).collect()
});
let result = SearchResult {
hits: documents,
hits_info,
query: query.q.clone().unwrap_or_default(),
processing_time_ms: before_search.elapsed().as_millis(),
facet_distribution,
facet_stats,
};
Ok(result)
}
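The new `facetStats` field pairs each numeric facet with the min and max value found among the matching documents. A standalone sketch of that computation (plain Rust over in-memory values, not milli's actual implementation) looks like this:

use std::collections::BTreeMap;

#[derive(Debug, PartialEq)]
struct FacetStats {
    min: f64,
    max: f64,
}

/// For each facet, keep the smallest and largest numeric value seen.
fn compute_facet_stats(values: &BTreeMap<String, Vec<f64>>) -> BTreeMap<String, FacetStats> {
    values
        .iter()
        .filter(|(_, facet_values)| !facet_values.is_empty())
        .map(|(facet, facet_values)| {
            let min = facet_values.iter().copied().fold(f64::INFINITY, f64::min);
            let max = facet_values.iter().copied().fold(f64::NEG_INFINITY, f64::max);
            (facet.clone(), FacetStats { min, max })
        })
        .collect()
}

fn main() {
    let mut values = BTreeMap::new();
    values.insert("price".to_string(), vec![9.99, 4.5, 12.0]);
    values.insert("rank".to_string(), vec![3.0]);
    let stats = compute_facet_stats(&values);
    // price -> FacetStats { min: 4.5, max: 12.0 }, rank -> FacetStats { min: 3.0, max: 3.0 }
    println!("{stats:?}");
}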


@ -377,7 +377,7 @@ async fn error_add_api_key_invalid_index_uids() {
meili_snap::snapshot!(code, @"400 Bad Request");
meili_snap::snapshot!(meili_snap::json_string!(response, { ".createdAt" => "[ignored]", ".updatedAt" => "[ignored]" }), @r###"
{
"message": "Invalid value at `.indexes[0]`: `invalid index # / \\name with spaces` is not a valid index uid. Index uid can be an integer or a string containing only alphanumeric characters, hyphens (-) and underscores (_).",
"message": "Invalid value at `.indexes[0]`: `invalid index # / \\name with spaces` is not a valid index uid pattern. Index uid patterns can be an integer or a string containing only alphanumeric characters, hyphens (-), underscores (_), and optionally end with a star (*).",
"code": "invalid_api_key_indexes",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_api_key_indexes"


@ -10,7 +10,8 @@ use crate::common::Server;
pub static AUTHORIZATIONS: Lazy<HashMap<(&'static str, &'static str), HashSet<&'static str>>> =
Lazy::new(|| {
let mut authorizations = hashmap! {
let authorizations = hashmap! {
("POST", "/multi-search") => hashset!{"search", "*"},
("POST", "/indexes/products/search") => hashset!{"search", "*"},
("GET", "/indexes/products/search") => hashset!{"search", "*"},
("POST", "/indexes/products/documents") => hashset!{"documents.add", "documents.*", "*"},
@ -51,6 +52,7 @@ pub static AUTHORIZATIONS: Lazy<HashMap<(&'static str, &'static str), HashSet<&'
("GET", "/stats") => hashset!{"stats.get", "stats.*", "*"},
("POST", "/dumps") => hashset!{"dumps.create", "dumps.*", "*"},
("GET", "/version") => hashset!{"version", "*"},
("GET", "/metrics") => hashset!{"metrics.get", "metrics.*", "*"},
("PATCH", "/keys/mykey/") => hashset!{"keys.update", "*"},
("GET", "/keys/mykey/") => hashset!{"keys.get", "*"},
("DELETE", "/keys/mykey/") => hashset!{"keys.delete", "*"},
@ -58,10 +60,6 @@ pub static AUTHORIZATIONS: Lazy<HashMap<(&'static str, &'static str), HashSet<&'
("GET", "/keys") => hashset!{"keys.get", "*"},
};
if cfg!(feature = "metrics") {
authorizations.insert(("GET", "/metrics"), hashset! {"metrics.get", "metrics.*", "*"});
}
authorizations
});
@ -77,12 +75,22 @@ static INVALID_RESPONSE: Lazy<Value> = Lazy::new(|| {
})
});
static INVALID_METRICS_RESPONSE: Lazy<Value> = Lazy::new(|| {
json!({"message": "The provided API key is invalid. The API key for the `/metrics` route must allow access to all indexes.",
"code": "invalid_api_key",
"type": "auth",
"link": "https://docs.meilisearch.com/errors#invalid_api_key"
})
});
const MASTER_KEY: &str = "MASTER_KEY";
#[actix_rt::test]
async fn error_access_expired_key() {
use std::{thread, time};
let mut server = Server::new_auth().await;
server.use_api_key("MASTER_KEY");
server.use_api_key(MASTER_KEY);
let content = json!({
"indexes": ["products"],
@ -111,7 +119,7 @@ async fn error_access_expired_key() {
#[actix_rt::test]
async fn error_access_unauthorized_index() {
let mut server = Server::new_auth().await;
server.use_api_key("MASTER_KEY");
server.use_api_key(MASTER_KEY);
let content = json!({
"indexes": ["sales"],
@ -144,7 +152,7 @@ async fn error_access_unauthorized_action() {
for ((method, route), action) in AUTHORIZATIONS.iter() {
// create a new API key letting only the needed action.
server.use_api_key("MASTER_KEY");
server.use_api_key(MASTER_KEY);
let content = json!({
"indexes": ["products"],
@ -168,7 +176,7 @@ async fn error_access_unauthorized_action() {
#[actix_rt::test]
async fn access_authorized_master_key() {
let mut server = Server::new_auth().await;
server.use_api_key("MASTER_KEY");
server.use_api_key(MASTER_KEY);
// master key must have access to all routes.
for ((method, route), _) in AUTHORIZATIONS.iter() {
@ -185,7 +193,7 @@ async fn access_authorized_restricted_index() {
for ((method, route), actions) in AUTHORIZATIONS.iter() {
for action in actions {
// create a new API key letting only the needed action.
server.use_api_key("MASTER_KEY");
server.use_api_key(MASTER_KEY);
let content = json!({
"indexes": ["products"],
@ -202,15 +210,28 @@ async fn access_authorized_restricted_index() {
let (response, code) = server.dummy_request(method, route).await;
assert_ne!(
response,
INVALID_RESPONSE.clone(),
"on route: {:?} - {:?} with action: {:?}",
method,
route,
action
);
assert_ne!(code, 403);
// The metrics route MUST have no limitation on the indexes
if *route == "/metrics" {
assert_eq!(
response,
INVALID_METRICS_RESPONSE.clone(),
"on route: {:?} - {:?} with action: {:?}",
method,
route,
action
);
assert_eq!(code, 403);
} else {
assert_ne!(
response,
INVALID_RESPONSE.clone(),
"on route: {:?} - {:?} with action: {:?}",
method,
route,
action
);
assert_ne!(code, 403);
}
}
}
}
@ -222,7 +243,7 @@ async fn access_authorized_no_index_restriction() {
for ((method, route), actions) in AUTHORIZATIONS.iter() {
for action in actions {
// create a new API key letting only the needed action.
server.use_api_key("MASTER_KEY");
server.use_api_key(MASTER_KEY);
let content = json!({
"indexes": ["*"],
@ -255,7 +276,7 @@ async fn access_authorized_no_index_restriction() {
#[actix_rt::test]
async fn access_authorized_stats_restricted_index() {
let mut server = Server::new_auth().await;
server.use_admin_key("MASTER_KEY").await;
server.use_admin_key(MASTER_KEY).await;
// create index `test`
let index = server.index("test");
@ -295,7 +316,7 @@ async fn access_authorized_stats_restricted_index() {
#[actix_rt::test]
async fn access_authorized_stats_no_index_restriction() {
let mut server = Server::new_auth().await;
server.use_admin_key("MASTER_KEY").await;
server.use_admin_key(MASTER_KEY).await;
// create index `test`
let index = server.index("test");
@ -335,7 +356,7 @@ async fn access_authorized_stats_no_index_restriction() {
#[actix_rt::test]
async fn list_authorized_indexes_restricted_index() {
let mut server = Server::new_auth().await;
server.use_admin_key("MASTER_KEY").await;
server.use_admin_key(MASTER_KEY).await;
// create index `test`
let index = server.index("test");
@ -376,7 +397,7 @@ async fn list_authorized_indexes_restricted_index() {
#[actix_rt::test]
async fn list_authorized_indexes_no_index_restriction() {
let mut server = Server::new_auth().await;
server.use_admin_key("MASTER_KEY").await;
server.use_admin_key(MASTER_KEY).await;
// create index `test`
let index = server.index("test");
@ -414,10 +435,194 @@ async fn list_authorized_indexes_no_index_restriction() {
assert!(response.iter().any(|index| index["uid"] == "test"));
}
#[actix_rt::test]
async fn access_authorized_index_patterns() {
let mut server = Server::new_auth().await;
server.use_admin_key(MASTER_KEY).await;
// create products_1 index
let index_1 = server.index("products_1");
let (response, code) = index_1.create(Some("id")).await;
assert_eq!(202, code, "{:?}", &response);
// create products index
let index_ = server.index("products");
let (response, code) = index_.create(Some("id")).await;
assert_eq!(202, code, "{:?}", &response);
// create a key with all document actions on indexes matching the products_* pattern.
let content = json!({
"indexes": ["products_*"],
"actions": ["documents.*"],
"expiresAt": (OffsetDateTime::now_utc() + Duration::hours(1)).format(&Rfc3339).unwrap(),
});
// Register the key
let (response, code) = server.add_api_key(content).await;
assert_eq!(201, code, "{:?}", &response);
assert!(response["key"].is_string());
// use created key.
let key = response["key"].as_str().unwrap();
server.use_api_key(key);
// refer to products_1 and products with modified api key.
let index_1 = server.index("products_1");
let index_ = server.index("products");
// try to create an index via the add documents route
let documents = json!([
{
"id": 1,
"content": "foo",
}
]);
// Adding document to products_1 index. Should succeed with 202
let (response, code) = index_1.add_documents(documents.clone(), None).await;
assert_eq!(202, code, "{:?}", &response);
let task_id = response["taskUid"].as_u64().unwrap();
// Adding document to products index. Should Fail with 403 -- invalid_api_key
let (response, code) = index_.add_documents(documents, None).await;
assert_eq!(403, code, "{:?}", &response);
server.use_api_key(MASTER_KEY);
// refer to products_1 with modified api key.
let index_1 = server.index("products_1");
index_1.wait_task(task_id).await;
let (response, code) = index_1.get_task(task_id).await;
assert_eq!(200, code, "{:?}", &response);
assert_eq!(response["status"], "succeeded");
}
#[actix_rt::test]
async fn raise_error_non_authorized_index_patterns() {
let mut server = Server::new_auth().await;
server.use_admin_key(MASTER_KEY).await;
// create products_1 index
let product_1_index = server.index("products_1");
let (response, code) = product_1_index.create(Some("id")).await;
assert_eq!(202, code, "{:?}", &response);
// create products_2 index
let product_2_index = server.index("products_2");
let (response, code) = product_2_index.create(Some("id")).await;
assert_eq!(202, code, "{:?}", &response);
// create test index
let test_index = server.index("test");
let (response, code) = test_index.create(Some("id")).await;
assert_eq!(202, code, "{:?}", &response);
// create a key with all document actions on indexes matching the products_* pattern.
let content = json!({
"indexes": ["products_*"],
"actions": ["documents.*"],
"expiresAt": (OffsetDateTime::now_utc() + Duration::hours(1)).format(&Rfc3339).unwrap(),
});
// Register the key
let (response, code) = server.add_api_key(content).await;
assert_eq!(201, code, "{:?}", &response);
assert!(response["key"].is_string());
// use created key.
let key = response["key"].as_str().unwrap();
server.use_api_key(key);
// refer to products_1 and products_2 with modified api key.
let product_1_index = server.index("products_1");
let product_2_index = server.index("products_2");
// refer to test index
let test_index = server.index("test");
// try to create an index via the add documents route
let documents = json!([
{
"id": 1,
"content": "foo",
}
]);
// Adding document to products_1 index. Should succeed with 202
let (response, code) = product_1_index.add_documents(documents.clone(), None).await;
assert_eq!(202, code, "{:?}", &response);
let task1_id = response["taskUid"].as_u64().unwrap();
// Adding document to products_2 index. Should succeed with 202
let (response, code) = product_2_index.add_documents(documents.clone(), None).await;
assert_eq!(202, code, "{:?}", &response);
let task2_id = response["taskUid"].as_u64().unwrap();
// Adding document to test index. Should Fail with 403 -- invalid_api_key
let (response, code) = test_index.add_documents(documents, None).await;
assert_eq!(403, code, "{:?}", &response);
server.use_api_key(MASTER_KEY);
// refer to products_1 with modified api key.
let product_1_index = server.index("products_1");
// refer to products_2 with modified api key.
let product_2_index = server.index("products_2");
product_1_index.wait_task(task1_id).await;
product_2_index.wait_task(task2_id).await;
let (response, code) = product_1_index.get_task(task1_id).await;
assert_eq!(200, code, "{:?}", &response);
assert_eq!(response["status"], "succeeded");
let (response, code) = product_1_index.get_task(task2_id).await;
assert_eq!(200, code, "{:?}", &response);
assert_eq!(response["status"], "succeeded");
}
#[actix_rt::test]
async fn pattern_indexes() {
// Create server with master key
let mut server = Server::new_auth().await;
server.use_admin_key(MASTER_KEY).await;
// indexes.* actions constrained to the products_* index pattern
let content = json!({
"indexes": ["products_*"],
"actions": ["indexes.*"],
"expiresAt": (OffsetDateTime::now_utc() + Duration::hours(1)).format(&Rfc3339).unwrap(),
});
// Generate and use the api key
let (response, code) = server.add_api_key(content).await;
assert_eq!(201, code, "{:?}", &response);
let key = response["key"].as_str().expect("Key is not string");
server.use_api_key(key);
// Create Index products_1 using generated api key
let products_1 = server.index("products_1");
let (response, code) = products_1.create(Some("id")).await;
assert_eq!(202, code, "{:?}", &response);
// Fail to create products_* using generated api key
let products_1 = server.index("products_*");
let (response, code) = products_1.create(Some("id")).await;
assert_eq!(400, code, "{:?}", &response);
// Fail to create test_1 using generated api key
let products_1 = server.index("test_1");
let (response, code) = products_1.create(Some("id")).await;
assert_eq!(403, code, "{:?}", &response);
}
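The tests above rely on index uid patterns: per the error messages elsewhere in this diff, a pattern is either a plain uid or a prefix ending with `*`. A tiny sketch of that matching rule (an assumption inferred from the error message, not the actual authorization code):

/// A pattern either matches a uid exactly, or, when it ends with `*`,
/// matches every uid that shares the prefix before the star.
fn pattern_matches(pattern: &str, index_uid: &str) -> bool {
    match pattern.strip_suffix('*') {
        Some(prefix) => index_uid.starts_with(prefix),
        None => pattern == index_uid,
    }
}

fn main() {
    assert!(pattern_matches("products_*", "products_1"));
    assert!(pattern_matches("products_*", "products_2"));
    assert!(!pattern_matches("products_*", "test"));
    assert!(pattern_matches("products", "products"));
    assert!(!pattern_matches("products", "products_1"));
    println!("all pattern checks passed");
}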
#[actix_rt::test]
async fn list_authorized_tasks_restricted_index() {
let mut server = Server::new_auth().await;
server.use_admin_key("MASTER_KEY").await;
server.use_admin_key(MASTER_KEY).await;
// create index `test`
let index = server.index("test");
@ -446,7 +651,6 @@ async fn list_authorized_tasks_restricted_index() {
let (response, code) = server.service.get("/tasks").await;
assert_eq!(200, code, "{:?}", &response);
println!("{}", response);
let response = response["results"].as_array().unwrap();
// key should have access on `products` index.
assert!(response.iter().any(|task| task["indexUid"] == "products"));
@ -458,7 +662,7 @@ async fn list_authorized_tasks_restricted_index() {
#[actix_rt::test]
async fn list_authorized_tasks_no_index_restriction() {
let mut server = Server::new_auth().await;
server.use_admin_key("MASTER_KEY").await;
server.use_admin_key(MASTER_KEY).await;
// create index `test`
let index = server.index("test");
@ -499,7 +703,7 @@ async fn list_authorized_tasks_no_index_restriction() {
#[actix_rt::test]
async fn error_creating_index_without_action() {
let mut server = Server::new_auth().await;
server.use_api_key("MASTER_KEY");
server.use_api_key(MASTER_KEY);
// create key with access on all indexes.
let content = json!({
@ -587,7 +791,7 @@ async fn lazy_create_index() {
];
for content in contents {
server.use_api_key("MASTER_KEY");
server.use_api_key(MASTER_KEY);
let (response, code) = server.add_api_key(content).await;
assert_eq!(201, code, "{:?}", &response);
assert!(response["key"].is_string());
@ -643,14 +847,114 @@ async fn lazy_create_index() {
}
}
#[actix_rt::test]
async fn lazy_create_index_from_pattern() {
let mut server = Server::new_auth().await;
// create keys with access restricted to the products_* index pattern.
let contents = vec![
json!({
"indexes": ["products_*"],
"actions": ["*"],
"expiresAt": "2050-11-13T00:00:00Z"
}),
json!({
"indexes": ["products_*"],
"actions": ["indexes.*", "documents.*", "settings.*", "tasks.*"],
"expiresAt": "2050-11-13T00:00:00Z"
}),
json!({
"indexes": ["products_*"],
"actions": ["indexes.create", "documents.add", "settings.update", "tasks.get"],
"expiresAt": "2050-11-13T00:00:00Z"
}),
];
for content in contents {
server.use_api_key(MASTER_KEY);
let (response, code) = server.add_api_key(content).await;
assert_eq!(201, code, "{:?}", &response);
assert!(response["key"].is_string());
// use created key.
let key = response["key"].as_str().unwrap();
server.use_api_key(key);
// try to create an index via the add documents route
let index = server.index("products_1");
let test = server.index("test");
let documents = json!([
{
"id": 1,
"content": "foo",
}
]);
let (response, code) = index.add_documents(documents.clone(), None).await;
assert_eq!(202, code, "{:?}", &response);
let task_id = response["taskUid"].as_u64().unwrap();
index.wait_task(task_id).await;
let (response, code) = index.get_task(task_id).await;
assert_eq!(200, code, "{:?}", &response);
assert_eq!(response["status"], "succeeded");
// Fail to create test index
let (response, code) = test.add_documents(documents, None).await;
assert_eq!(403, code, "{:?}", &response);
// try to create an index via the update settings route
let index = server.index("products_2");
let settings = json!({ "distinctAttribute": "test"});
let (response, code) = index.update_settings(settings).await;
assert_eq!(202, code, "{:?}", &response);
let task_id = response["taskUid"].as_u64().unwrap();
index.wait_task(task_id).await;
let (response, code) = index.get_task(task_id).await;
assert_eq!(200, code, "{:?}", &response);
assert_eq!(response["status"], "succeeded");
// Fail to create test index
let index = server.index("test");
let settings = json!({ "distinctAttribute": "test"});
let (response, code) = index.update_settings(settings).await;
assert_eq!(403, code, "{:?}", &response);
// try to create an index via a specialized settings route
let index = server.index("products_3");
let (response, code) = index.update_distinct_attribute(json!("test")).await;
assert_eq!(202, code, "{:?}", &response);
let task_id = response["taskUid"].as_u64().unwrap();
index.wait_task(task_id).await;
let (response, code) = index.get_task(task_id).await;
assert_eq!(200, code, "{:?}", &response);
assert_eq!(response["status"], "succeeded");
// Fail to create test index
let index = server.index("test");
let settings = json!({ "distinctAttribute": "test"});
let (response, code) = index.update_settings(settings).await;
assert_eq!(403, code, "{:?}", &response);
}
}
#[actix_rt::test]
async fn error_creating_index_without_index() {
let mut server = Server::new_auth().await;
server.use_api_key("MASTER_KEY");
server.use_api_key(MASTER_KEY);
// create a key restricted to the `unexpected` index and the `products_*` pattern.
let content = json!({
"indexes": ["unexpected"],
"indexes": ["unexpected","products_*"],
"actions": ["*"],
"expiresAt": "2050-11-13T00:00:00Z"
});
@ -690,4 +994,32 @@ async fn error_creating_index_without_index() {
let index = server.index("test3");
let (response, code) = index.create(None).await;
assert_eq!(403, code, "{:?}", &response);
// try to create an index via the add documents route
let index = server.index("products");
let documents = json!([
{
"id": 1,
"content": "foo",
}
]);
let (response, code) = index.add_documents(documents, None).await;
assert_eq!(403, code, "{:?}", &response);
// try to create an index via the update settings route
let index = server.index("products");
let settings = json!({ "distinctAttribute": "test"});
let (response, code) = index.update_settings(settings).await;
assert_eq!(403, code, "{:?}", &response);
// try to create an index via a specialized settings route
let index = server.index("products");
let (response, code) = index.update_distinct_attribute(json!("test")).await;
assert_eq!(403, code, "{:?}", &response);
// try to create an index via the create index route
let index = server.index("products");
let (response, code) = index.create(None).await;
assert_eq!(403, code, "{:?}", &response);
}


@ -120,7 +120,7 @@ async fn create_api_key_bad_indexes() {
snapshot!(code, @"400 Bad Request");
snapshot!(json_string!(response), @r###"
{
"message": "Invalid value at `.indexes[0]`: `good doggo` is not a valid index uid. Index uid can be an integer or a string containing only alphanumeric characters, hyphens (-) and underscores (_).",
"message": "Invalid value at `.indexes[0]`: `good doggo` is not a valid index uid pattern. Index uid patterns can be an integer or a string containing only alphanumeric characters, hyphens (-), underscores (_), and optionally end with a star (*).",
"code": "invalid_api_key_indexes",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_api_key_indexes"
@ -138,7 +138,7 @@ async fn create_api_key_bad_expires_at() {
snapshot!(code, @"400 Bad Request");
snapshot!(json_string!(response), @r###"
{
"message": "Unknown field `expires_at`: expected one of `description`, `name`, `uid`, `actions`, `indexes`, `expiresAt`",
"message": "Unknown field `expires_at`: did you mean `expiresAt`? expected one of `description`, `name`, `uid`, `actions`, `indexes`, `expiresAt`",
"code": "bad_request",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#bad_request"
@ -150,7 +150,7 @@ async fn create_api_key_bad_expires_at() {
snapshot!(code, @"400 Bad Request");
snapshot!(json_string!(response), @r###"
{
"message": "Unknown field `expires_at`: expected one of `description`, `name`, `uid`, `actions`, `indexes`, `expiresAt`",
"message": "Unknown field `expires_at`: did you mean `expiresAt`? expected one of `description`, `name`, `uid`, `actions`, `indexes`, `expiresAt`",
"code": "bad_request",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#bad_request"


@ -4,6 +4,8 @@ mod errors;
mod payload;
mod tenant_token;
mod tenant_token_multi_search;
use actix_web::http::StatusCode;
use serde_json::{json, Value};


@ -82,6 +82,11 @@ static ACCEPTED_KEYS: Lazy<Vec<Value>> = Lazy::new(|| {
"actions": ["search"],
"expiresAt": (OffsetDateTime::now_utc() + Duration::days(1)).format(&Rfc3339).unwrap()
}),
json!({
"indexes": ["sal*", "prod*"],
"actions": ["search"],
"expiresAt": (OffsetDateTime::now_utc() + Duration::days(1)).format(&Rfc3339).unwrap()
}),
]
});
@ -104,6 +109,11 @@ static REFUSED_KEYS: Lazy<Vec<Value>> = Lazy::new(|| {
"actions": ["*"],
"expiresAt": (OffsetDateTime::now_utc() + Duration::days(1)).format(&Rfc3339).unwrap()
}),
json!({
"indexes": ["prod*", "p*"],
"actions": ["*"],
"expiresAt": (OffsetDateTime::now_utc() + Duration::days(1)).format(&Rfc3339).unwrap()
}),
json!({
"indexes": ["products"],
"actions": ["search"],
@ -245,6 +255,10 @@ async fn search_authorized_simple_token() {
"searchRules" => json!(["sales"]),
"exp" => Value::Null
},
hashmap! {
"searchRules" => json!(["sa*"]),
"exp" => Value::Null
},
];
compute_authorized_search!(tenant_tokens, {}, 5);
@ -351,11 +365,19 @@ async fn filter_search_authorized_filter_token() {
}),
"exp" => json!((OffsetDateTime::now_utc() + Duration::hours(1)).unix_timestamp())
},
hashmap! {
"searchRules" => json!({
"*": {},
"sal*": {"filter": ["color = blue"]}
}),
"exp" => json!((OffsetDateTime::now_utc() + Duration::hours(1)).unix_timestamp())
},
];
compute_authorized_search!(tenant_tokens, "color = yellow", 1);
}
/// Tests that these tenant tokens are incompatible with the REFUSED_KEYS defined above.
#[actix_rt::test]
async fn error_search_token_forbidden_parent_key() {
let tenant_tokens = vec![
@ -383,6 +405,10 @@ async fn error_search_token_forbidden_parent_key() {
"searchRules" => json!(["sales"]),
"exp" => json!((OffsetDateTime::now_utc() + Duration::hours(1)).unix_timestamp())
},
hashmap! {
"searchRules" => json!(["sali*", "s*", "sales*"]),
"exp" => json!((OffsetDateTime::now_utc() + Duration::hours(1)).unix_timestamp())
},
];
compute_forbidden_search!(tenant_tokens, REFUSED_KEYS);

File diff suppressed because it is too large

@ -30,7 +30,7 @@ impl Index<'_> {
.post_str(
url,
include_str!("../assets/test_set.json"),
("content-type", "application/json"),
vec![("content-type", "application/json")],
)
.await;
assert_eq!(code, 202);
@ -46,7 +46,7 @@ impl Index<'_> {
.post_str(
url,
include_str!("../assets/test_set.ndjson"),
("content-type", "application/x-ndjson"),
vec![("content-type", "application/x-ndjson")],
)
.await;
assert_eq!(code, 202);
@ -96,6 +96,21 @@ impl Index<'_> {
self.service.post_encoded(url, documents, self.encoder).await
}
pub async fn raw_add_documents(
&self,
payload: &str,
content_type: Option<&str>,
query_parameter: &str,
) -> (Value, StatusCode) {
let url = format!("/indexes/{}/documents{}", urlencode(self.uid.as_ref()), query_parameter);
if let Some(content_type) = content_type {
self.service.post_str(url, payload, vec![("Content-Type", content_type)]).await
} else {
self.service.post_str(url, payload, Vec::new()).await
}
}
pub async fn update_documents(
&self,
documents: Value,
@ -110,6 +125,21 @@ impl Index<'_> {
self.service.put_encoded(url, documents, self.encoder).await
}
pub async fn raw_update_documents(
&self,
payload: &str,
content_type: Option<&str>,
query_parameter: &str,
) -> (Value, StatusCode) {
let url = format!("/indexes/{}/documents{}", urlencode(self.uid.as_ref()), query_parameter);
if let Some(content_type) = content_type {
self.service.put_str(url, payload, vec![("Content-Type", content_type)]).await
} else {
self.service.put_str(url, payload, Vec::new()).await
}
}
pub async fn wait_task(&self, update_id: u64) -> Value {
// try several times to get status, or panic to not wait forever
let url = format!("/tasks/{}", update_id);


@ -103,8 +103,8 @@ impl Server {
Index { uid: uid.as_ref().to_string(), service: &self.service, encoder }
}
pub async fn search(&self, queries: Value) -> (Value, StatusCode) {
self.service.post("/search", queries).await
pub async fn multi_search(&self, queries: Value) -> (Value, StatusCode) {
self.service.post("/multi-search", queries).await
}
pub async fn list_indexes_raw(&self, parameters: &str) -> (Value, StatusCode) {
@ -205,10 +205,10 @@ pub fn default_settings(dir: impl AsRef<Path>) -> Opt {
indexer_options: IndexerOpts {
// memory has to be unlimited because several Meilisearch instances are running in the test context.
max_indexing_memory: MaxMemory::unlimited(),
skip_index_budget: true,
..Parser::parse_from(None as Option<&str>)
},
#[cfg(feature = "metrics")]
enable_metrics_route: true,
experimental_enable_metrics: true,
..Parser::parse_from(None as Option<&str>)
}
}


@ -34,17 +34,18 @@ impl Service {
self.request(req).await
}
/// Send a test post request from a text body, with a `content-type:application/json` header.
/// Send a test post request from a text body.
pub async fn post_str(
&self,
url: impl AsRef<str>,
body: impl AsRef<str>,
header: (&str, &str),
headers: Vec<(&str, &str)>,
) -> (Value, StatusCode) {
let req = test::TestRequest::post()
.uri(url.as_ref())
.set_payload(body.as_ref().to_string())
.insert_header(header);
let mut req =
test::TestRequest::post().uri(url.as_ref()).set_payload(body.as_ref().to_string());
for header in headers {
req = req.insert_header(header);
}
self.request(req).await
}
@ -57,6 +58,21 @@ impl Service {
self.put_encoded(url, body, Encoder::Plain).await
}
/// Send a test put request from a text body.
pub async fn put_str(
&self,
url: impl AsRef<str>,
body: impl AsRef<str>,
headers: Vec<(&str, &str)>,
) -> (Value, StatusCode) {
let mut req =
test::TestRequest::put().uri(url.as_ref()).set_payload(body.as_ref().to_string());
for header in headers {
req = req.insert_header(header);
}
self.request(req).await
}
pub async fn put_encoded(
&self,
url: impl AsRef<str>,


@ -216,6 +216,133 @@ async fn add_single_document_with_every_encoding() {
}
}
#[actix_rt::test]
async fn add_csv_document() {
let server = Server::new().await;
let index = server.index("pets");
let document = "#id,name,race
0,jean,bernese mountain
1,jorts,orange cat";
let (response, code) = index.raw_update_documents(document, Some("text/csv"), "").await;
snapshot!(code, @"202 Accepted");
snapshot!(json_string!(response, { ".enqueuedAt" => "[date]" }), @r###"
{
"taskUid": 0,
"indexUid": "pets",
"status": "enqueued",
"type": "documentAdditionOrUpdate",
"enqueuedAt": "[date]"
}
"###);
let response = index.wait_task(response["taskUid"].as_u64().unwrap()).await;
snapshot!(json_string!(response, { ".enqueuedAt" => "[date]", ".startedAt" => "[date]", ".finishedAt" => "[date]", ".duration" => "[duration]" }), @r###"
{
"uid": 0,
"indexUid": "pets",
"status": "succeeded",
"type": "documentAdditionOrUpdate",
"canceledBy": null,
"details": {
"receivedDocuments": 2,
"indexedDocuments": 2
},
"error": null,
"duration": "[duration]",
"enqueuedAt": "[date]",
"startedAt": "[date]",
"finishedAt": "[date]"
}
"###);
let (documents, code) = index.get_all_documents(GetAllDocumentsOptions::default()).await;
snapshot!(code, @"200 OK");
snapshot!(json_string!(documents), @r###"
{
"results": [
{
"#id": "0",
"name": "jean",
"race": "bernese mountain"
},
{
"#id": "1",
"name": "jorts",
"race": "orange cat"
}
],
"offset": 0,
"limit": 20,
"total": 2
}
"###);
}
#[actix_rt::test]
async fn add_csv_document_with_custom_delimiter() {
let server = Server::new().await;
let index = server.index("pets");
let document = "#id|name|race
0|jean|bernese mountain
1|jorts|orange cat";
let (response, code) =
index.raw_update_documents(document, Some("text/csv"), "?csvDelimiter=|").await;
snapshot!(code, @"202 Accepted");
snapshot!(json_string!(response, { ".enqueuedAt" => "[date]" }), @r###"
{
"taskUid": 0,
"indexUid": "pets",
"status": "enqueued",
"type": "documentAdditionOrUpdate",
"enqueuedAt": "[date]"
}
"###);
let response = index.wait_task(response["taskUid"].as_u64().unwrap()).await;
snapshot!(json_string!(response, { ".enqueuedAt" => "[date]", ".startedAt" => "[date]", ".finishedAt" => "[date]", ".duration" => "[duration]" }), @r###"
{
"uid": 0,
"indexUid": "pets",
"status": "succeeded",
"type": "documentAdditionOrUpdate",
"canceledBy": null,
"details": {
"receivedDocuments": 2,
"indexedDocuments": 2
},
"error": null,
"duration": "[duration]",
"enqueuedAt": "[date]",
"startedAt": "[date]",
"finishedAt": "[date]"
}
"###);
let (documents, code) = index.get_all_documents(GetAllDocumentsOptions::default()).await;
snapshot!(code, @"200 OK");
snapshot!(json_string!(documents), @r###"
{
"results": [
{
"#id": "0",
"name": "jean",
"race": "bernese mountain"
},
{
"#id": "1",
"name": "jorts",
"race": "orange cat"
}
],
"offset": 0,
"limit": 20,
"total": 2
}
"###);
}
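For reference, the pipe-delimited payload used above can be parsed the same way server-side. Here is a standalone sketch using the `csv` crate with a custom delimiter (an illustration of the `csvDelimiter` parameter, not Meilisearch's ingestion code):

use csv::ReaderBuilder;

fn main() -> Result<(), csv::Error> {
    let payload = "#id|name|race\n0|jean|bernese mountain\n1|jorts|orange cat";
    // `delimiter` takes a single byte, mirroring the one-ASCII-character
    // constraint the API enforces on `csvDelimiter`.
    let mut reader = ReaderBuilder::new().delimiter(b'|').from_reader(payload.as_bytes());
    println!("headers: {:?}", reader.headers()?);
    for record in reader.records() {
        println!("record: {:?}", record?);
    }
    Ok(())
}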
/// any other content-type must be refused
#[actix_rt::test]
async fn error_add_documents_test_bad_content_types() {
@ -1027,6 +1154,53 @@ async fn error_document_field_limit_reached() {
@"");
}
#[actix_rt::test]
async fn add_documents_with_geo_field() {
let server = Server::new().await;
let index = server.index("doggo");
index.update_settings(json!({"sortableAttributes": ["_geo"]})).await;
let documents = json!([
{
"id": "1",
},
{
"id": "2",
"_geo": null,
},
{
"id": "3",
"_geo": { "lat": 1, "lng": 1 },
},
{
"id": "4",
"_geo": { "lat": "1", "lng": "1" },
},
]);
index.add_documents(documents, None).await;
let response = index.wait_task(1).await;
snapshot!(json_string!(response, { ".duration" => "[duration]", ".enqueuedAt" => "[date]", ".startedAt" => "[date]", ".finishedAt" => "[date]" }),
@r###"
{
"uid": 1,
"indexUid": "doggo",
"status": "succeeded",
"type": "documentAdditionOrUpdate",
"canceledBy": null,
"details": {
"receivedDocuments": 4,
"indexedDocuments": 4
},
"error": null,
"duration": "[duration]",
"enqueuedAt": "[date]",
"startedAt": "[date]",
"finishedAt": "[date]"
}
"###);
}
#[actix_rt::test]
async fn add_documents_invalid_geo_field() {
let server = Server::new().await;


@ -1,5 +1,6 @@
use meili_snap::*;
use serde_json::json;
use urlencoding::encode;
use crate::common::Server;
@ -97,3 +98,323 @@ async fn delete_documents_batch() {
}
"###);
}
#[actix_rt::test]
async fn replace_documents_missing_payload() {
let server = Server::new().await;
let index = server.index("test");
let (response, code) = index.raw_add_documents("", Some("application/json"), "").await;
snapshot!(code, @"400 Bad Request");
snapshot!(json_string!(response), @r###"
{
"message": "A json payload is missing.",
"code": "missing_payload",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#missing_payload"
}
"###);
let (response, code) = index.raw_add_documents("", Some("application/x-ndjson"), "").await;
snapshot!(code, @"400 Bad Request");
snapshot!(json_string!(response), @r###"
{
"message": "A ndjson payload is missing.",
"code": "missing_payload",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#missing_payload"
}
"###);
let (response, code) = index.raw_add_documents("", Some("text/csv"), "").await;
snapshot!(code, @"400 Bad Request");
snapshot!(json_string!(response), @r###"
{
"message": "A csv payload is missing.",
"code": "missing_payload",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#missing_payload"
}
"###);
}
#[actix_rt::test]
async fn update_documents_missing_payload() {
let server = Server::new().await;
let index = server.index("test");
let (response, code) = index.raw_update_documents("", Some("application/json"), "").await;
snapshot!(code, @"400 Bad Request");
snapshot!(json_string!(response), @r###"
{
"message": "A json payload is missing.",
"code": "missing_payload",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#missing_payload"
}
"###);
let (response, code) = index.raw_update_documents("", Some("application/x-ndjson"), "").await;
snapshot!(code, @"400 Bad Request");
snapshot!(json_string!(response), @r###"
{
"message": "A ndjson payload is missing.",
"code": "missing_payload",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#missing_payload"
}
"###);
let (response, code) = index.raw_update_documents("", Some("text/csv"), "").await;
snapshot!(code, @"400 Bad Request");
snapshot!(json_string!(response), @r###"
{
"message": "A csv payload is missing.",
"code": "missing_payload",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#missing_payload"
}
"###);
}
#[actix_rt::test]
async fn replace_documents_missing_content_type() {
let server = Server::new().await;
let index = server.index("test");
let (response, code) = index.raw_add_documents("", None, "").await;
snapshot!(code, @"415 Unsupported Media Type");
snapshot!(json_string!(response), @r###"
{
"message": "A Content-Type header is missing. Accepted values for the Content-Type header are: `application/json`, `application/x-ndjson`, `text/csv`",
"code": "missing_content_type",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#missing_content_type"
}
"###);
// even with a csv delimiter specified, this error is triggered first
let (response, code) = index.raw_add_documents("", None, "?csvDelimiter=;").await;
snapshot!(code, @"415 Unsupported Media Type");
snapshot!(json_string!(response), @r###"
{
"message": "A Content-Type header is missing. Accepted values for the Content-Type header are: `application/json`, `application/x-ndjson`, `text/csv`",
"code": "missing_content_type",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#missing_content_type"
}
"###);
}
#[actix_rt::test]
async fn update_documents_missing_content_type() {
let server = Server::new().await;
let index = server.index("test");
let (response, code) = index.raw_update_documents("", None, "").await;
snapshot!(code, @"415 Unsupported Media Type");
snapshot!(json_string!(response), @r###"
{
"message": "A Content-Type header is missing. Accepted values for the Content-Type header are: `application/json`, `application/x-ndjson`, `text/csv`",
"code": "missing_content_type",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#missing_content_type"
}
"###);
// even with a csv delimiter specified, this error is triggered first
let (response, code) = index.raw_update_documents("", None, "?csvDelimiter=;").await;
snapshot!(code, @"415 Unsupported Media Type");
snapshot!(json_string!(response), @r###"
{
"message": "A Content-Type header is missing. Accepted values for the Content-Type header are: `application/json`, `application/x-ndjson`, `text/csv`",
"code": "missing_content_type",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#missing_content_type"
}
"###);
}
#[actix_rt::test]
async fn replace_documents_bad_content_type() {
let server = Server::new().await;
let index = server.index("test");
let (response, code) = index.raw_add_documents("", Some("doggo"), "").await;
snapshot!(code, @"415 Unsupported Media Type");
snapshot!(json_string!(response), @r###"
{
"message": "The Content-Type `doggo` is invalid. Accepted values for the Content-Type header are: `application/json`, `application/x-ndjson`, `text/csv`",
"code": "invalid_content_type",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_content_type"
}
"###);
}
#[actix_rt::test]
async fn update_documents_bad_content_type() {
let server = Server::new().await;
let index = server.index("test");
let (response, code) = index.raw_update_documents("", Some("doggo"), "").await;
snapshot!(code, @"415 Unsupported Media Type");
snapshot!(json_string!(response), @r###"
{
"message": "The Content-Type `doggo` is invalid. Accepted values for the Content-Type header are: `application/json`, `application/x-ndjson`, `text/csv`",
"code": "invalid_content_type",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_content_type"
}
"###);
}
#[actix_rt::test]
async fn replace_documents_bad_csv_delimiter() {
let server = Server::new().await;
let index = server.index("test");
let (response, code) =
index.raw_add_documents("", Some("application/json"), "?csvDelimiter").await;
snapshot!(code, @"400 Bad Request");
snapshot!(json_string!(response), @r###"
{
"message": "Invalid value in parameter `csvDelimiter`: expected a string of one character, but found an empty string",
"code": "invalid_document_csv_delimiter",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_document_csv_delimiter"
}
"###);
let (response, code) =
index.raw_add_documents("", Some("application/json"), "?csvDelimiter=doggo").await;
snapshot!(code, @"400 Bad Request");
snapshot!(json_string!(response), @r###"
{
"message": "Invalid value in parameter `csvDelimiter`: expected a string of one character, but found the following string of 5 characters: `doggo`",
"code": "invalid_document_csv_delimiter",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_document_csv_delimiter"
}
"###);
let (response, code) = index
.raw_add_documents("", Some("application/json"), &format!("?csvDelimiter={}", encode("🍰")))
.await;
snapshot!(code, @"400 Bad Request");
snapshot!(json_string!(response), @r###"
{
"message": "csv delimiter must be an ascii character. Found: `🍰`",
"code": "invalid_document_csv_delimiter",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_document_csv_delimiter"
}
"###);
}
#[actix_rt::test]
async fn update_documents_bad_csv_delimiter() {
let server = Server::new().await;
let index = server.index("test");
let (response, code) =
index.raw_update_documents("", Some("application/json"), "?csvDelimiter").await;
snapshot!(code, @"400 Bad Request");
snapshot!(json_string!(response), @r###"
{
"message": "Invalid value in parameter `csvDelimiter`: expected a string of one character, but found an empty string",
"code": "invalid_document_csv_delimiter",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_document_csv_delimiter"
}
"###);
let (response, code) =
index.raw_update_documents("", Some("application/json"), "?csvDelimiter=doggo").await;
snapshot!(code, @"400 Bad Request");
snapshot!(json_string!(response), @r###"
{
"message": "Invalid value in parameter `csvDelimiter`: expected a string of one character, but found the following string of 5 characters: `doggo`",
"code": "invalid_document_csv_delimiter",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_document_csv_delimiter"
}
"###);
let (response, code) = index
.raw_update_documents(
"",
Some("application/json"),
&format!("?csvDelimiter={}", encode("🍰")),
)
.await;
snapshot!(code, @"400 Bad Request");
snapshot!(json_string!(response), @r###"
{
"message": "csv delimiter must be an ascii character. Found: `🍰`",
"code": "invalid_document_csv_delimiter",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_document_csv_delimiter"
}
"###);
}
#[actix_rt::test]
async fn replace_documents_csv_delimiter_with_bad_content_type() {
let server = Server::new().await;
let index = server.index("test");
let (response, code) =
index.raw_add_documents("", Some("application/json"), "?csvDelimiter=a").await;
snapshot!(code, @"415 Unsupported Media Type");
snapshot!(json_string!(response), @r###"
{
"message": "The Content-Type `application/json` does not support the use of a csv delimiter. The csv delimiter can only be used with the Content-Type `text/csv`.",
"code": "invalid_content_type",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_content_type"
}
"###);
let (response, code) =
index.raw_add_documents("", Some("application/x-ndjson"), "?csvDelimiter=a").await;
snapshot!(code, @"415 Unsupported Media Type");
snapshot!(json_string!(response), @r###"
{
"message": "The Content-Type `application/x-ndjson` does not support the use of a csv delimiter. The csv delimiter can only be used with the Content-Type `text/csv`.",
"code": "invalid_content_type",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_content_type"
}
"###);
}
#[actix_rt::test]
async fn update_documents_csv_delimiter_with_bad_content_type() {
let server = Server::new().await;
let index = server.index("test");
let (response, code) =
index.raw_update_documents("", Some("application/json"), "?csvDelimiter=a").await;
snapshot!(code, @"415 Unsupported Media Type");
snapshot!(json_string!(response), @r###"
{
"message": "The Content-Type `application/json` does not support the use of a csv delimiter. The csv delimiter can only be used with the Content-Type `text/csv`.",
"code": "invalid_content_type",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_content_type"
}
"###);
let (response, code) =
index.raw_update_documents("", Some("application/x-ndjson"), "?csvDelimiter=a").await;
snapshot!(code, @"415 Unsupported Media Type");
snapshot!(json_string!(response), @r###"
{
"message": "The Content-Type `application/x-ndjson` does not support the use of a csv delimiter. The csv delimiter can only be used with the Content-Type `text/csv`.",
"code": "invalid_content_type",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_content_type"
}
"###);
}

View File

@@ -442,3 +442,37 @@ async fn displayedattr_2_smol() {
)
.await;
}
#[cfg(feature = "default")]
#[actix_rt::test]
async fn test_cjk_highlight() {
let server = Server::new().await;
let index = server.index("test");
let documents = json!([
{ "id": 0, "title": "この度、クーポンで無料で頂きました。" },
{ "id": 1, "title": "大卫到了扫罗那里" },
]);
index.add_documents(documents, None).await;
index.wait_task(0).await;
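// With the default features enabled (hence the cfg gate above), CJK text is segmented,
// so the single-character query `で` below can match and be highlighted inside the Japanese title.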
index
.search(json!({"q": "", "attributesToHighlight": ["title"]}), |response, code| {
assert_eq!(code, 200, "{}", response);
assert_eq!(
response["hits"][0]["_formatted"]["title"],
json!("この度、クーポン<em>で</em>無料<em>で</em>頂きました。")
);
})
.await;
index
.search(json!({"q": "大卫", "attributesToHighlight": ["title"]}), |response, code| {
assert_eq!(code, 200, "{}", response);
assert_eq!(
response["hits"][0]["_formatted"]["title"],
json!("<em>大卫</em>到了扫罗那里")
);
})
.await;
}

View File

@@ -149,6 +149,49 @@ async fn simple_search() {
.await;
}
#[actix_rt::test]
async fn phrase_search_with_stop_word() {
// related to https://github.com/meilisearch/meilisearch/issues/3521
let server = Server::new().await;
let index = server.index("test");
let (_, code) = index.update_settings(json!({"stopWords": ["the", "of"]})).await;
meili_snap::snapshot!(code, @"202 Accepted");
let documents = DOCUMENTS.clone();
index.add_documents(documents, None).await;
index.wait_task(1).await;
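// "to" and "the" are configured as stop words above, so both quoted phrases should be
// dropped and the query should behave like `how train`, yielding the single hit asserted below.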
index
.search(json!({"q": "how \"to\" train \"the" }), |response, code| {
assert_eq!(code, 200, "{}", response);
assert_eq!(response["hits"].as_array().unwrap().len(), 1);
})
.await;
}
#[cfg(feature = "default")]
#[actix_rt::test]
async fn test_kanji_language_detection() {
let server = Server::new().await;
let index = server.index("test");
let documents = json!([
{ "id": 0, "title": "The quick (\"brown\") fox can't jump 32.3 feet, right? Brr, it's 29.3°F!" },
{ "id": 1, "title": "東京のお寿司。" },
{ "id": 2, "title": "הַשּׁוּעָל הַמָּהִיר (״הַחוּם״) לֹא יָכוֹל לִקְפֹּץ 9.94 מֶטְרִים, נָכוֹן? ברר, 1.5°C- בַּחוּץ!" }
]);
index.add_documents(documents, None).await;
index.wait_task(0).await;
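// The query below appears only in the Japanese title (id 1), so exactly one hit is expected;
// the English and Hebrew documents are there to exercise the language detection.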
index
.search(json!({"q": "東京"}), |response, code| {
assert_eq!(code, 200, "{}", response);
assert_eq!(response["hits"].as_array().unwrap().len(), 1);
})
.await;
}
#[actix_rt::test]
async fn search_multiple_params() {
let server = Server::new().await;

View File

@@ -8,20 +8,40 @@ use crate::common::Server;
async fn search_empty_list() {
let server = Server::new().await;
let (response, code) = server.search(json!([])).await;
let (response, code) = server.multi_search(json!({"queries": []})).await;
snapshot!(code, @"200 OK");
snapshot!(json_string!(response), @"[]");
snapshot!(json_string!(response), @r###"
{
"results": []
}
"###);
}
#[actix_rt::test]
async fn search_json_object() {
let server = Server::new().await;
let (response, code) = server.search(json!({})).await;
let (response, code) = server.multi_search(json!({})).await;
snapshot!(code, @"400 Bad Request");
snapshot!(json_string!(response), @r###"
{
"message": "Invalid value type: expected an array, but found an object: `{}`",
"message": "Missing field `queries`",
"code": "bad_request",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#bad_request"
}
"###);
}
#[actix_rt::test]
async fn search_json_array() {
let server = Server::new().await;
let (response, code) = server.multi_search(json!([])).await;
snapshot!(code, @"400 Bad Request");
snapshot!(json_string!(response), @r###"
{
"message": "Invalid value type: expected an object, but found an array: `[]`",
"code": "bad_request",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#bad_request"
@@ -39,13 +59,13 @@ async fn simple_search_single_index() {
index.wait_task(0).await;
let (response, code) = server
.search(json!([
.multi_search(json!({"queries": [
{"indexUid" : "test", "q": "glass"},
{"indexUid": "test", "q": "captain"},
]))
]}))
.await;
snapshot!(code, @"200 OK");
insta::assert_json_snapshot!(response, { "[].processingTimeMs" => "[time]" }, @r###"
insta::assert_json_snapshot!(response["results"], { "[].processingTimeMs" => "[time]" }, @r###"
[
{
"indexUid": "test",
@@ -79,6 +99,56 @@ async fn simple_search_single_index() {
"###);
}
#[actix_rt::test]
async fn simple_search_missing_index_uid() {
let server = Server::new().await;
let index = server.index("test");
let documents = DOCUMENTS.clone();
index.add_documents(documents, None).await;
index.wait_task(0).await;
let (response, code) = server
.multi_search(json!({"queries": [
{"q": "glass"},
]}))
.await;
snapshot!(code, @"400 Bad Request");
insta::assert_json_snapshot!(response, @r###"
{
"message": "Missing field `indexUid` inside `.queries[0]`",
"code": "missing_index_uid",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#missing_index_uid"
}
"###);
}
#[actix_rt::test]
async fn simple_search_illegal_index_uid() {
let server = Server::new().await;
let index = server.index("test");
let documents = DOCUMENTS.clone();
index.add_documents(documents, None).await;
index.wait_task(0).await;
let (response, code) = server
.multi_search(json!({"queries": [
{"indexUid": "", "q": "glass"},
]}))
.await;
snapshot!(code, @"400 Bad Request");
insta::assert_json_snapshot!(response, @r###"
{
"message": "Invalid value at `.queries[0].indexUid`: `hé` is not a valid index uid. Index uid can be an integer or a string containing only alphanumeric characters, hyphens (-) and underscores (_).",
"code": "invalid_index_uid",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_index_uid"
}
"###);
}
#[actix_rt::test]
async fn simple_search_two_indexes() {
let server = Server::new().await;
@@ -94,13 +164,13 @@ async fn simple_search_two_indexes() {
index.wait_task(1).await;
let (response, code) = server
.search(json!([
.multi_search(json!({"queries": [
{"indexUid" : "test", "q": "glass"},
{"indexUid": "nested", "q": "pesti"},
]))
]}))
.await;
snapshot!(code, @"200 OK");
insta::assert_json_snapshot!(response, { "[].processingTimeMs" => "[time]" }, @r###"
insta::assert_json_snapshot!(response["results"], { "[].processingTimeMs" => "[time]" }, @r###"
[
{
"indexUid": "test",
@@ -171,15 +241,15 @@ async fn search_one_index_doesnt_exist() {
index.wait_task(0).await;
let (response, code) = server
.search(json!([
.multi_search(json!({"queries": [
{"indexUid" : "test", "q": "glass"},
{"indexUid": "nested", "q": "pesti"},
]))
]}))
.await;
snapshot!(code, @"404 Not Found");
snapshot!(code, @"400 Bad Request");
snapshot!(json_string!(response), @r###"
{
"message": "Index `nested` not found.",
"message": "Inside `.queries[1]`: Index `nested` not found.",
"code": "index_not_found",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#index_not_found"
@@ -192,15 +262,15 @@ async fn search_multiple_indexes_dont_exist() {
let server = Server::new().await;
let (response, code) = server
.search(json!([
.multi_search(json!({"queries": [
{"indexUid" : "test", "q": "glass"},
{"indexUid": "nested", "q": "pesti"},
]))
]}))
.await;
snapshot!(code, @"404 Not Found");
snapshot!(code, @"400 Bad Request");
snapshot!(json_string!(response), @r###"
{
"message": "Index `test` not found.",
"message": "Inside `.queries[0]`: Index `test` not found.",
"code": "index_not_found",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#index_not_found"
@@ -224,15 +294,15 @@ async fn search_one_query_error() {
index.wait_task(1).await;
let (response, code) = server
.search(json!([
.multi_search(json!({"queries": [
{"indexUid" : "test", "q": "glass", "facets": ["title"]},
{"indexUid": "nested", "q": "pesti"},
]))
]}))
.await;
snapshot!(code, @"400 Bad Request");
snapshot!(json_string!(response), @r###"
{
"message": "Invalid facet distribution, this index does not have configured filterable attributes.",
"message": "Inside `.queries[0]`: Invalid facet distribution, this index does not have configured filterable attributes.",
"code": "invalid_search_facets",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_search_facets"
@@ -256,15 +326,15 @@ async fn search_multiple_query_errors() {
index.wait_task(1).await;
let (response, code) = server
.search(json!([
.multi_search(json!({"queries": [
{"indexUid" : "test", "q": "glass", "facets": ["title"]},
{"indexUid": "nested", "q": "pesti", "facets": ["doggos"]},
]))
]}))
.await;
snapshot!(code, @"400 Bad Request");
snapshot!(json_string!(response), @r###"
{
"message": "Invalid facet distribution, this index does not have configured filterable attributes.",
"message": "Inside `.queries[0]`: Invalid facet distribution, this index does not have configured filterable attributes.",
"code": "invalid_search_facets",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_search_facets"

View File

@@ -1,8 +1,8 @@
use std::time::Duration;
use actix_rt::time::sleep;
use meilisearch::option::ScheduleSnapshot;
use meilisearch::Opt;
use tokio::time::sleep;
use crate::common::server::default_settings;
use crate::common::{GetAllDocumentsOptions, Server};
@@ -23,21 +23,20 @@ macro_rules! verify_snapshot {
};
let (snapshot, _) = test(snapshot.clone()).await;
let (orig, _) = test(orig.clone()).await;
assert_eq!(snapshot, orig);
assert_eq!(snapshot, orig, "Got \n{}\nWhile expecting:\n{}", serde_json::to_string_pretty(&snapshot).unwrap(), serde_json::to_string_pretty(&orig).unwrap());
}
)*
};
}
#[actix_rt::test]
#[ignore] // TODO: unignore
async fn perform_snapshot() {
let temp = tempfile::tempdir().unwrap();
let snapshot_dir = tempfile::tempdir().unwrap();
let options = Opt {
snapshot_dir: snapshot_dir.path().to_owned(),
schedule_snapshot: ScheduleSnapshot::Enabled(1),
schedule_snapshot: ScheduleSnapshot::Enabled(2),
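// The interval above is assumed to be in seconds (matching the --schedule-snapshot CLI option): one snapshot every 2 seconds.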
..default_settings(temp.path())
};
@@ -61,6 +60,16 @@ async fn perform_snapshot() {
let temp = tempfile::tempdir().unwrap();
let snapshot_path = snapshot_dir.path().to_owned().join("db.snapshot");
#[cfg_attr(windows, allow(unused))]
let snapshot_meta = std::fs::metadata(&snapshot_path).unwrap();
#[cfg(unix)]
{
use std::os::unix::fs::PermissionsExt;
let mode = snapshot_meta.permissions().mode();
// The lower nine bits of the mode are the rwxrwxrwx permission mask.
meili_snap::snapshot!(format!("{:b}", mode), @"1000000100100100");
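// Illustrative addition: the same check in octal is easier to read than the binary
// snapshot above: 0b1000000100100100 == 0o100444, i.e. a regular file whose
// permission bits are r--r--r--.
assert_eq!(format!("{:o}", mode), "100444");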
}
let options = Opt { import_snapshot: Some(snapshot_path), ..default_settings(temp.path()) };
@@ -71,7 +80,10 @@ async fn perform_snapshot() {
// For some reason the db sizes differ. This may be due to the compaction options we have
// set when performing the snapshot.
//server.stats(),
server.tasks(),
// The original instance contains the snapshotCreation task, while the snapshotted instance does not. For this reason we need to compare the task queue **after** task 4.
server.tasks_filter("?from=2"),
server.index("test").get_all_documents(GetAllDocumentsOptions::default()),
server.index("test").settings(),
server.index("test1").get_all_documents(GetAllDocumentsOptions::default()),

View File

@@ -1,18 +1,25 @@
[package]
name = "milli"
version = "1.0.0"
authors = ["Kerollmops <clement@meilisearch.com>"]
edition = "2018"
publish = false
version.workspace = true
authors.workspace = true
description.workspace = true
homepage.workspace = true
readme.workspace = true
# edition.workspace = true
license.workspace = true
[dependencies]
bimap = { version = "0.6.2", features = ["serde"] }
bincode = "1.3.3"
bstr = "1.0.1"
byteorder = "1.4.3"
charabia = { version = "0.7.0", default-features = false }
charabia = { version = "0.7.1", default-features = false }
concat-arrays = "0.1.2"
crossbeam-channel = "0.5.6"
deserr = "0.3.0"
deserr = "0.5.0"
either = "1.8.0"
flatten-serde-json = { path = "../flatten-serde-json" }
fst = "0.4.7"

View File

@@ -7,45 +7,31 @@ use serde::{Deserialize, Serialize};
use thiserror::Error;
use crate::error::is_reserved_keyword;
use crate::search::facet::BadGeoError;
use crate::{CriterionError, Error, UserError};
/// This error type is never supposed to be shown to the end user.
/// You must always cast it to a sort error or a criterion error.
#[derive(Debug)]
#[derive(Error, Debug)]
pub enum AscDescError {
InvalidLatitude,
InvalidLongitude,
#[error(transparent)]
GeoError(BadGeoError),
#[error("Invalid syntax for the asc/desc parameter: expected expression ending by `:asc` or `:desc`, found `{name}`.")]
InvalidSyntax { name: String },
#[error("`{name}` is a reserved keyword and thus can't be used as a asc/desc rule.")]
ReservedKeyword { name: String },
}
impl fmt::Display for AscDescError {
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
match self {
Self::InvalidLatitude => {
write!(f, "Latitude must be contained between -90 and 90 degrees.",)
}
Self::InvalidLongitude => {
write!(f, "Longitude must be contained between -180 and 180 degrees.",)
}
Self::InvalidSyntax { name } => {
write!(f, "Invalid syntax for the asc/desc parameter: expected expression ending by `:asc` or `:desc`, found `{}`.", name)
}
Self::ReservedKeyword { name } => {
write!(
f,
"`{}` is a reserved keyword and thus can't be used as a asc/desc rule.",
name
)
}
}
impl From<BadGeoError> for AscDescError {
fn from(geo_error: BadGeoError) -> Self {
AscDescError::GeoError(geo_error)
}
}
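// A minimal sketch of the new conversion chain (illustrative only, relying on the
// `From` impls in this diff): a `BadGeoError` is wrapped into `AscDescError` and can
// then be surfaced as a `SortError::ParseGeoError`.
//
//     let asc_desc: AscDescError = BadGeoError::Lat(200.0).into();
//     let sort: SortError = asc_desc.into();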
impl From<AscDescError> for CriterionError {
fn from(error: AscDescError) -> Self {
match error {
AscDescError::InvalidLatitude | AscDescError::InvalidLongitude => {
AscDescError::GeoError(_) => {
CriterionError::ReservedNameForSort { name: "_geoPoint".to_string() }
}
AscDescError::InvalidSyntax { name } => CriterionError::InvalidName { name },
@@ -85,9 +71,9 @@ impl FromStr for Member {
.map_err(|_| AscDescError::ReservedKeyword { name: text.to_string() })
})?;
if !(-90.0..=90.0).contains(&lat) {
return Err(AscDescError::InvalidLatitude)?;
return Err(BadGeoError::Lat(lat))?;
} else if !(-180.0..=180.0).contains(&lng) {
return Err(AscDescError::InvalidLongitude)?;
return Err(BadGeoError::Lng(lng))?;
}
Ok(Member::Geo([lat, lng]))
}
@@ -162,10 +148,8 @@ impl FromStr for AscDesc {
#[derive(Error, Debug)]
pub enum SortError {
#[error("{}", AscDescError::InvalidLatitude)]
InvalidLatitude,
#[error("{}", AscDescError::InvalidLongitude)]
InvalidLongitude,
#[error(transparent)]
ParseGeoError { error: BadGeoError },
#[error("Invalid syntax for the geo parameter: expected expression formated like \
`_geoPoint(latitude, longitude)` and ending by `:asc` or `:desc`, found `{name}`.")]
BadGeoPointUsage { name: String },
@@ -184,8 +168,7 @@ impl From<AscDescError> for SortError {
impl From<AscDescError> for SortError {
fn from(error: AscDescError) -> Self {
match error {
AscDescError::InvalidLatitude => SortError::InvalidLatitude,
AscDescError::InvalidLongitude => SortError::InvalidLongitude,
AscDescError::GeoError(error) => SortError::ParseGeoError { error },
AscDescError::InvalidSyntax { name } => SortError::InvalidName { name },
AscDescError::ReservedKeyword { name } if name.starts_with("_geoPoint") => {
SortError::BadGeoPointUsage { name }
@@ -277,11 +260,11 @@ mod tests {
),
("_geoPoint(35, 85, 75):asc", ReservedKeyword { name: S("_geoPoint(35, 85, 75)") }),
("_geoPoint(18):asc", ReservedKeyword { name: S("_geoPoint(18)") }),
("_geoPoint(200, 200):asc", InvalidLatitude),
("_geoPoint(90.000001, 0):asc", InvalidLatitude),
("_geoPoint(0, -180.000001):desc", InvalidLongitude),
("_geoPoint(159.256, 130):asc", InvalidLatitude),
("_geoPoint(12, -2021):desc", InvalidLongitude),
("_geoPoint(200, 200):asc", GeoError(BadGeoError::Lat(200.))),
("_geoPoint(90.000001, 0):asc", GeoError(BadGeoError::Lat(90.000001))),
("_geoPoint(0, -180.000001):desc", GeoError(BadGeoError::Lng(-180.000001))),
("_geoPoint(159.256, 130):asc", GeoError(BadGeoError::Lat(159.256))),
("_geoPoint(12, -2021):desc", GeoError(BadGeoError::Lng(-2021.))),
];
for (req, expected_error) in invalid_req {

View File

@@ -59,6 +59,8 @@ pub enum InternalError {
Utf8(#[from] str::Utf8Error),
#[error("An indexation process was explicitly aborted.")]
AbortedIndexation,
#[error("The matching words list contains at least one invalid member.")]
InvalidMatchingWords,
}
#[derive(Error, Debug)]

Some files were not shown because too many files have changed in this diff.