Commit Graph

7998 Commits

Author SHA1 Message Date
Tamo
8a1b1a95f3 comment the right of the metrics 2023-02-23 13:59:01 +01:00
Tamo
8d47d2d018 update the auth api after the rebase 2023-02-23 13:15:51 +01:00
Tamo
5082cd5e67 update the config file to mention the experimental metrics feature 2023-02-23 12:26:22 +01:00
Tamo
750a2b6842 Update meilisearch/src/option.rs
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2023-02-23 12:26:22 +01:00
Tamo
bc7d4112d9 send the cli experimental feature in the analytics 2023-02-23 12:26:22 +01:00
Tamo
88a18677d0 rename the metrics cli flag 2023-02-23 12:26:22 +01:00
Tamo
68e30214ca remove the feature flag and reorganize the module slightly 2023-02-23 12:26:21 +01:00
bors[bot]
b985b96e4e Merge #3530
3530: Fix highlighter bug r=Kerollmops a=ManyTheFish

# Pull Request

There was a highlighting issue on CJK's character, we were highlighting too many characters and these additional characters were duplicated after the highlight tag.

## Related issue
Fixes #3517 
Fixes #3526 

## What does this PR do?
- add a test showcasing the bug
- fix the bug by activating the char_map creation of the tokenizer during the highlighting process


Co-authored-by: ManyTheFish <many@meilisearch.com>
2023-02-23 10:59:43 +00:00
Louis Dureuil
71e7900c67 move index_map to file 2023-02-23 11:29:11 +01:00
Louis Dureuil
431782f3ee Move index_mapper to mod.rs 2023-02-23 11:29:11 +01:00
Louis Dureuil
3db613ff77 Don't iterate all indexes manually 2023-02-23 11:29:09 +01:00
Louis Dureuil
5822764be9 Skip computing index budget in tests 2023-02-23 11:23:39 +01:00
Louis Dureuil
c63294f331 Switch to 2TiB default index size, updates documentation 2023-02-23 11:23:39 +01:00
Louis Dureuil
a529bf160c Compute budget 2023-02-23 11:23:39 +01:00
Louis Dureuil
f1119f2dc2 Add dichotomic search to utils 2023-02-23 11:23:39 +01:00
Louis Dureuil
1db7d5d851 Add basic tests for index eviction and resize 2023-02-23 11:23:39 +01:00
Louis Dureuil
80b060f920 Use LRU cache 2023-02-23 11:23:39 +01:00
Louis Dureuil
fdf043580c Add LruMap 2023-02-23 11:23:38 +01:00
bors[bot]
f62703cd67 Merge #3534
3534: Update the csv error code from InvalidIndexCsvDelimiter to InvalidDocumentCsvDelimiter r=Kerollmops a=irevoire

Fixes #3533

Co-authored-by: Tamo <tamo@meilisearch.com>
2023-02-23 07:05:12 +00:00
Tamo
76f82c880d update the csv error code from InvalidIndexCsvDelimiter to InvalidDocumentCsvDelimiter 2023-02-22 19:26:48 +01:00
bors[bot]
6eeba3a8ab Merge #3417
3417: Allow multiple searches in a single request r=irevoire a=dureuill

# Pull Request

## Related issue
Fixes #3427

## What does this PR do?

### User standpoint

- Adds a new `/multi-search` entry point (not to be confused with the existing `/{index_uid}/search` entry points) that accepts a POST whose body is an object containing an array of queries.
    - Each query must specify on which index it acts by providing its `indexUid`. Other parameters are identical to the one in the existing search routes (`q`, `limit`, etc.).
    - The response is a JSON object containing an array of the results for each search query as if it had been performed using the `/{index_uid}/search` routes.

### Implementation standpoint

- Refactor authentication module:
  -  Allow tenant token to be checked even without an index in URL
  - Add `meilisearch-auth` as a dependency to `index-scheduler` so as to have a working method of checking if the indexes are authorized there that takes into account both the API key and the tenant token (existing method relied on a behavior that was returning the allowed indexes from the API key as long as there weren't any tenant token)
  - Make `AuthFilter` an object with invariants and so its fields are now private
  - Use the methods of `AuthFilter` to know if an index is authorized rather than relying on its internal search rules.
  - Make tenant token search rules optional and `None` when the `AuthFilter` was not built with a tenant token.
- Add a new `routes::index::search::multiple_search` module containing a post handler that performs the same work as the existing `routes::index::search` post handler, but in a loop.
  - Add various tests
  - Add authentication test suite 

### Sample request


<details>
<summary>
Click to see request/response
</summary>

```json
~/datasets
❯ curl \
  -X POST 'http://localhost:7700/multi-search' \
  -H 'Content-Type: application/json' \
  --data-binary '{"queries": [{ "indexUid": "index-0", "q": "toto", "limit": 1 }, {"indexUid": "index-1", "q": "titi", "limit": 1}]}' | jsonxf
{ "results": [
  {
    "indexUid": "index-0",
    "hits": [
      {
        "id": 20480,
        "title": "Toto - 25th Anniversary - Live in Amsterdam",
        "overview": "Filmed in High Definition in Amsterdam on Toto's 25th Anniversary Tour in 2003, this stunning concert captures the band at their very best, reunited with original vocalist Bobby Kimball. The set combines all their hits with tracks from their latest album \"Through the Looking Glass\" and other live favorites, performed in front of a wildly enthusastic sell-out crowd. Extras include 35 minute behind-the-scenes film following the band through various stages of their world tour including footage from Japan, Thailand, South Korea, and France.  Toto celebrate their 25th anniversary with this blistering live concert, filmed in Amsterdam on February 25th, 2003. Proving they've still got exactly what it takes to move a crowd, the band perform a mixture of medley's, solo spots, and huge hits. Tracks include \"Rosanna,\" \"Africa,\" \"Hold The Line,\" a cover of the Beatles' \"While My Guitar Gently Weeps,\" and many more.",
        "genres": [
          "Music"
        ],
        "poster": "https://image.tmdb.org/t/p/w500/7SCbUPwoB8Z7VUIA1Rn1WWwjNiT.jpg",
        "release_date": 1064275200
      }
    ],
    "query": "toto",
    "processingTimeMs": 1,
    "limit": 1,
    "offset": 0,
    "estimatedTotalHits": 17
  },
  {
    "indexUid": "index-1",
    "hits": [
      {
        "id": 41212,
        "title": "Titicut Follies",
        "overview": "The film is a stark and graphic portrayal of the conditions that existed at the State Prison for the Criminally Insane at Bridgewater, Massachusetts. TITICUT FOLLIES documents the various ways the inmates are treated by the guards, social workers and psychiatrists.",
        "genres": [
          "Documentary"
        ],
        "poster": "https://image.tmdb.org/t/p/w500/2Ju5hn1ofOPeP1eRJtQWakiHuhW.jpg",
        "release_date": -70934400
      }
    ],
    "query": "titi",
    "processingTimeMs": 0,
    "limit": 1,
    "offset": 0,
    "estimatedTotalHits": 7
  }
]}
```

</details>


## PR checklist
Please check if your PR fulfills the following requirements:
- [ ] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [ ] Have you read the contributing guidelines?
- [ ] Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!


Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2023-02-22 17:23:45 +00:00
ManyTheFish
28d6a4466d Make the tokenizer creating a char map during highlighting 2023-02-22 17:43:10 +01:00
Louis Dureuil
1ba2fae3ae multi-search/authentication: Add authentication tests 2023-02-22 17:04:12 +01:00
Louis Dureuil
28d6ab78de multi-search: Add multi search tests 2023-02-22 17:04:12 +01:00
Louis Dureuil
3ba5dfb6ec multi-search: Add test server search method for multi search 2023-02-22 17:04:12 +01:00
Louis Dureuil
a23fbf6c7b multi-search: Add search with an array of indexes 2023-02-22 17:04:12 +01:00
Louis Dureuil
596a98f7c6 multi-search: Add basic analytics 2023-02-22 16:37:18 +01:00
Louis Dureuil
14c4a222da Authentication: AuthFilter::allow_index_creation both check that the index is authorized and the IndexCreate action 2023-02-22 16:37:13 +01:00
Louis Dureuil
690bb2e5cc Authentication: Make allow_index_creation a private field 2023-02-22 16:35:52 +01:00
Louis Dureuil
d0f2c9c72e Authentication: Make search_rules optional in AuthFilter 2023-02-22 16:35:52 +01:00
Louis Dureuil
42577403d8 Authentication: Directly pass the authfilter to the index scheduler 2023-02-22 16:35:52 +01:00
Louis Dureuil
c8c5944094 Authentication: is_index_authorized takes into account API key indexes even with a tenant token 2023-02-22 16:35:52 +01:00
Louis Dureuil
4b65851793 Authentication: Refactor authentication check to work for tenant token even without an index in URL
Callers need to manually check `is_index_authorized` when using the route without an index in URL
2023-02-22 16:35:51 +01:00
Louis Dureuil
10d4a1a9af Make ResponseError code and message pub so that they can be modified 2023-02-22 16:35:51 +01:00
ManyTheFish
ad35edfa32 Add test 2023-02-22 15:47:15 +01:00
Tamo
033417e9cc add an analytics on the geo bounding box feature 2023-02-22 15:35:26 +01:00
bors[bot]
ac5a1e4c4b Merge #3423
3423: Add min and max facet stats r=dureuill a=dureuill

# Pull Request

## Related issue
Fixes #3426

## What does this PR do?

### User standpoint

- When using a `facets` parameter in search, the facets that have numeric values are displayed in a new section of the response called `facetStats` that contains, per facet, the numeric min and max value of the hits returned by the search.

<details>
<summary>
Sample request/response
</summary>

```json
❯ curl \
  -X POST 'http://localhost:7700/indexes/meteorites/search?facets=mass' \
  -H 'Content-Type: application/json' \
  --data-binary '{ "q": "LL6", "facets":["mass", "recclass"], "limit": 5 }' | jsonxf
{
  "hits": [
    {
      "name": "Niger (LL6)",
      "id": "16975",
      "nametype": "Valid",
      "recclass": "LL6",
      "mass": 3.3,
      "fall": "Fell"
    },
    {
      "name": "Appley Bridge",
      "id": "2318",
      "nametype": "Valid",
      "recclass": "LL6",
      "mass": 15000,
      "fall": "Fell",
      "_geo": {
        "lat": 53.58333,
        "lng": -2.71667
      }
    },
    {
      "name": "Athens",
      "id": "4885",
      "nametype": "Valid",
      "recclass": "LL6",
      "mass": 265,
      "fall": "Fell",
      "_geo": {
        "lat": 34.75,
        "lng": -87.0
      }
    },
    {
      "name": "Bandong",
      "id": "4935",
      "nametype": "Valid",
      "recclass": "LL6",
      "mass": 11500,
      "fall": "Fell",
      "_geo": {
        "lat": -6.91667,
        "lng": 107.6
      }
    },
    {
      "name": "Benguerir",
      "id": "30443",
      "nametype": "Valid",
      "recclass": "LL6",
      "mass": 25000,
      "fall": "Fell",
      "_geo": {
        "lat": 32.25,
        "lng": -8.15
      }
    }
  ],
  "query": "LL6",
  "processingTimeMs": 15,
  "limit": 5,
  "offset": 0,
  "estimatedTotalHits": 42,
  "facetDistribution": {
    "mass": {
      "110000": 1,
      "11500": 1,
      "1161": 1,
      "12000": 1,
      "1215.5": 1,
      "127000": 1,
      "15000": 1,
      "1676": 1,
      "1700": 1,
      "1710.5": 1,
      "18000": 1,
      "19000": 1,
      "220000": 1,
      "2220": 1,
      "22300": 1,
      "25000": 2,
      "265": 1,
      "271000": 1,
      "2840": 1,
      "3.3": 1,
      "3000": 1,
      "303": 1,
      "32000": 1,
      "34000": 1,
      "36.1": 1,
      "45000": 1,
      "460": 1,
      "478": 1,
      "483": 1,
      "5500": 2,
      "600": 1,
      "6000": 1,
      "67.8": 1,
      "678": 1,
      "680.5": 1,
      "6930": 1,
      "8": 1,
      "8300": 1,
      "840": 1,
      "8400": 1
    },
    "recclass": {
      "L/LL6": 3,
      "LL6": 39
    }
  },
  "facetStats": {
    "mass": {
      "min": 3.3,
      "max": 271000.0
    }
  }
}
```

</details>

## PR checklist
Please check if your PR fulfills the following requirements:
- [ ] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [ ] Have you read the contributing guidelines?
- [ ] Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!


Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2023-02-22 13:06:43 +00:00
curquiza
3eb9a08b5c Update comments in version bump CI 2023-02-21 19:14:59 +01:00
ManyTheFish
900bae3d9d keep phrases that has at least one word 2023-02-21 18:16:51 +01:00
ManyTheFish
28b7d73d4a Remove an unefficient part of a test on milli 2023-02-21 18:16:51 +01:00
ManyTheFish
6841f167b4 Add test 2023-02-21 18:02:52 +01:00
bors[bot]
c88b6f331f Merge #3482
3482: Optimize meilisearch uffizzi build r=curquiza a=waveywaves

# Pull Request

## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/3476

## What does this PR do?
even though docker cache was being used earlier for uffizzi builds, seems like the cache layers weren't persisting. This commit adds changes to move meilisearch building outside the dockerfile so that we can use the rust cache action. We are also building to the musl target so that the binary for meilisearch which is created can be used for the uffizzi ttyd image which uses alpine.

Meilisearch build time brought to 5 mins example https://github.com/waveywaves/meilisearch/actions/runs/4142776058

we also update the version of uffizzi action used here which fixes another uffizzi bug where the environments are not deployed. https://app.uffizzi.com/github.com/waveywaves/meilisearch/pull/2 was built as a part of a test for this PR and we can be sure that the deployment works well now.

## PR checklist
Please check if your PR fulfills the following requirements:
- [ ] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [ ] Have you read the contributing guidelines?
- [ ] Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!


Co-authored-by: Vibhav Bobade <vibhav.bobde@gmail.com>
2023-02-21 16:06:53 +00:00
Vibhav Bobade
09a94e0db3 optimize meilisearch uffizzi build
even though docker cache was being used earlier for uffizzi builds,
seems like the cache layers weren't persisting. This commit adds changes
to move meilisearch building outside the dockerfile so that we can
use the rust cache action. We are also building to the musl target
so that the binary for meilisearch which is created can be used for
the uffizzi ttyd image which uses alpine.
2023-02-21 17:25:28 +05:30
bors[bot]
39407885c2 Merge #3347
3347: Enhance language detection r=irevoire a=ManyTheFish

## Summary

Some completely unrelated Languages can share the same characters, in Meilisearch we detect the Languages using `whatlang`, which works well on large texts but fails on small search queries leading to a bad segmentation and normalization of the query.

This PR now stores the Languages detected during the indexing in order to reduce the Languages list that can be detected during the search.

## Detail

- Create a 19th database mapping the scripts and the Languages detected with the documents where the Language is detected
- Fill the newly created database during indexing
- Create an allow-list with this database and pass it to Charabia
- Add a test ensuring that a Japanese request containing kanjis only is detected as Japanese and not Chinese

## Related issues
Fixes #2403
Fixes #3513

Co-authored-by: f3r10 <frledesma@outlook.com>
Co-authored-by: ManyTheFish <many@meilisearch.com>
Co-authored-by: Many the fish <many@meilisearch.com>
2023-02-21 10:52:13 +00:00
bors[bot]
a3e41ba33e Merge #3496
3496: Fix metrics feature r=irevoire a=james-2001

# Pull Request

## Related issue

Resolves: #3469
See also: #2763

## What does this PR do?
As reported the metrics feature was broken by still using and old reference to `meilisearch_auth::actions`. This commit switches to the new location, `meilisearch_types::keys::actions`.

The original issue was not *that* clear as to exactly what was broken, and the build logs have disappeared, but it seemed to just be this one line fix. If this is not the case and I've missed the mark let me know, and i'll head back to the drawing board.

## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?

Co-authored-by: James <james.a.may.2001@gmail.com>
2023-02-21 10:13:11 +00:00
James
ce807d760b Fix formatting issue on Opt struct
tab in enable_metrics_route to fix cargo fmt issues

Resolves: #3469
See also: #2763
2023-02-21 09:45:18 +00:00
ManyTheFish
bbecab8948 fix clippy 2023-02-21 10:18:44 +01:00
James
5cff435bf6 Add feature flags to Opt structure
Resolves: #3469
See also: #2763
2023-02-21 07:41:41 +00:00
ManyTheFish
8aa808d51b Merge branch 'main' into enhance-language-detection 2023-02-20 18:14:34 +01:00
bors[bot]
1e9ac00800 Merge #3505
3505: Csv delimiter r=irevoire a=irevoire

Fixes https://github.com/meilisearch/meilisearch/issues/3442
Closes https://github.com/meilisearch/meilisearch/pull/2803
Specified in https://github.com/meilisearch/specifications/pull/221

This PR is a reimplementation of https://github.com/meilisearch/meilisearch/pull/2803, on the new engine. Thanks for your idea and initial PR `@MixusMinimax;` sorry I couldn’t update/merge your PR. Way too many changes happened on the engine in the meantime.

**Attention to reviewer**; I had to update deserr to implement the support of deserializing `char`s

-------

It introduces four new error messages;
- Invalid value in parameter csvDelimiter: expected a string of one character, but found an empty string
- Invalid value in parameter csvDelimiter: expected a string of one character, but found the following string of 5 characters: doggo
- csv delimiter must be an ascii character. Found: 🍰 
- The Content-Type application/json does not support the use of a csv delimiter. The csv delimiter can only be used with the Content-Type text/csv.

And one error code;
- `invalid_index_csv_delimiter`

The `invalid_content_type` error code is now also used when we encounter the `csvDelimiter` query parameter with a non-csv content type.

Co-authored-by: Tamo <tamo@meilisearch.com>
2023-02-20 17:01:36 +00:00