Commit Graph

7526 Commits

Author SHA1 Message Date
ManyTheFish
24c0775c67 Change indexing threshold 2023-03-08 12:36:04 +01:00
ManyTheFish
3092cf0448 Fix clippy errors 2023-03-08 10:53:42 +01:00
ManyTheFish
37d4551e8e Add a threshold filtering the Languages allowed to be detected at search time 2023-03-07 19:38:01 +01:00
ManyTheFish
da48506f15 Rerun extraction when language detection might have failed 2023-03-07 18:35:26 +01:00
bors[bot]
370d88f626 Merge #3561
3561: Fix the snapshots permissions on unix system r=irevoire a=irevoire

# Pull Request

## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/3507

The snapshot permissions were wrong after the v0.30 and the huge refacto of the index scheduler.
Fix this issue + add a test on the permissions on unix

Co-authored-by: Tamo <tamo@meilisearch.com>
2023-03-07 08:51:38 +00:00
Tamo
d34faa8f9c put back the sleep as it was and fix the from 2023-03-06 18:09:09 +01:00
Tamo
e5d0bef6d8 update a comment 2023-03-06 17:04:24 +01:00
Tamo
e704728ee7 fix the snapshots permissions on unix system 2023-03-06 16:28:40 +01:00
bors[bot]
c0ede6d152 Merge #3562
3562: Update version for the next release (v1.1.0) in Cargo.toml r=curquiza a=meili-bot

⚠️ This PR is automatically generated. Check the new version is the expected one and Cargo.lock has been updated before merging.

Co-authored-by: curquiza <curquiza@users.noreply.github.com>
v1.1.0-rc.0
2023-03-06 13:54:16 +00:00
curquiza
577e7126f9 Update version for the next release (v1.1.0) in Cargo.toml 2023-03-06 13:52:54 +00:00
bors[bot]
3d1046369c Merge #3529
3529: Add an analytics on the geo bounding box feature r=ManyTheFish a=irevoire

Fixes #3527

[The specification of the geoBoundingBox](https://github.com/meilisearch/specifications/pull/223) feature has been updated and now introduces a new analytics to follow the usage of the geoBoundingBox feature in the search requests.

Co-authored-by: Tamo <tamo@meilisearch.com>
2023-03-02 11:58:39 +00:00
bors[bot]
4f1ccbc495 Merge #3525
3525: Fix phrase search containing stop words r=ManyTheFish a=ManyTheFish

# Summary
A search with a phrase containing only stop words was returning an HTTP error 500,
this PR filters the phrase containing only stop words dropping them before the search starts, a query with a phrase containing only stop words now behaves like a placeholder search.

fixes https://github.com/meilisearch/meilisearch/issues/3521

related v1.0.2 PR on milli: https://github.com/meilisearch/milli/pull/779



Co-authored-by: ManyTheFish <many@meilisearch.com>
2023-03-02 10:55:37 +00:00
ManyTheFish
37489fd495 Return an internal error in the case of matching word is invalid 2023-03-01 19:05:16 +01:00
bors[bot]
d9e19c89c5 Merge #3544
3544: Attempt to use default vram budget for faster startup r=Kerollmops a=dureuill

# Pull Request

## Related issue
Follow-up to #3382: addresses the added startup time on Windows/macOS.

## What does this PR do?
- Attempt to skip budget calculation by using "known good values" instead
- Perform dichotomic budget calculation as fallback only when the known value is not actually good.


Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2023-03-01 09:49:38 +00:00
bors[bot]
18bf740ee2 Merge #3539
3539: Update migration link to the docs r=curquiza a=curquiza

Fixes https://github.com/meilisearch/meilisearch/issues/3449

Co-authored-by: curquiza <clementine@meilisearch.com>
2023-02-28 18:21:11 +00:00
Louis Dureuil
0202ff8ab4 Attempt to use default budget for faster startup 2023-02-28 10:55:43 +01:00
bors[bot]
fbe4ab158e Merge #3543
3543: config: case `experimental_enable_metrics` in snake_case r=dureuill a=dureuill

# Pull Request

Avoids "Error: unknown field `experimental-enable-metrics` at line 1 column 1" error when using the default config file.

Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2023-02-28 09:35:25 +00:00
Louis Dureuil
92318ca573 config: case experimental_enable_metrics in snake_case 2023-02-27 17:14:06 +01:00
bors[bot]
6ca7a109b9 Merge #3538
3538: Improve the api key of the metrics r=dureuill a=irevoire

Related to https://github.com/meilisearch/meilisearch/pull/3524#discussion_r1115903998
Update: https://github.com/meilisearch/meilisearch/issues/3523

Right after merging the PR, we changed our minds and decided to update the way we handle the API keys on the metrics route.
Now instead of bypassing all the applied rules of the API key, we forbid the usage of the `/metrics` route if you have any restrictions on the indexes.

Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2023-02-27 13:46:57 +00:00
Louis Dureuil
d4d4702f1b Rephrase hint message 2023-02-27 13:46:16 +01:00
curquiza
2648bbca25 Update migration link to the docs 2023-02-23 18:36:30 +01:00
bors[bot]
562c86ea01 Merge #3519
3519: Update comments in version bump CI r=irevoire a=curquiza

Following https://github.com/meilisearch/meilisearch/pull/3499

Co-authored-by: curquiza <clementine@meilisearch.com>
2023-02-23 16:46:10 +00:00
Tamo
7ae10abb6b fix the auth tests 2023-02-23 17:27:42 +01:00
Tamo
dc533584c6 Forbid the usage of the metrics route if your API key have a limitation on the indexes 2023-02-23 17:13:22 +01:00
bors[bot]
442c1e36de Merge #3537
3537: fix a bug where the filestore could try to parse its own tmp file and fail (main) r=irevoire a=curquiza



Co-authored-by: Tamo <tamo@meilisearch.com>
2023-02-23 15:58:05 +00:00
Tamo
66b5e4b548 fix a bug where the filestore could try to parse its own tmp file and fail 2023-02-23 16:52:41 +01:00
bors[bot]
89ac1015f3 Merge #3524
3524: Update the metrics route r=irevoire a=irevoire

Fixes #3523

Make the metrics available by default without a feature flag.
+ Rename the cli-flag to `experimental-enable-metrics`.

Co-authored-by: Tamo <tamo@meilisearch.com>
2023-02-23 15:11:10 +00:00
bors[bot]
ca25904c26 Merge #3331
3331: Limit the number of concurrently opened indexes r=dureuill a=dureuill

# Pull Request

## Related issue
Relevant to #1841, fixes #3382

## What does this PR do?

### User standpoint

- Limit the number of concurrently opened indexes (currently, the number of indexes that can be concurrently opened is computed at startup)
- When too many an index is opened, the least recently used one is closed and its virtual memory released.
- This allows a user to have an arbitrary number of indexes of an arbitrary size

### Implementation standpoint

- Added a LRU cache map in `index-scheduler::lru`. A more complete implementation  (eg with helper functions not used here) is available but would better fit a dedicated crate.
- Use the LRU cache map in the `IndexScheduler`. To simplify the lifecycle of indexes, they are never removed from the cache when they are in the middle of a resize or delete operation. To achieve this, an intermediate `Vec` stores the UUIDs of the indexes that are in the middle of such an operation.
- Upon creating the index scheduler object, compute the total virtual memory that is adressable by using a dichotomic search on the max size of an index. Use this as a base to compute the number of indexes that can be open with 2TiB per index. If the virtual memory address space is lower than 2TiB, then only allow for 1 index of a fraction of that size.

Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2023-02-23 14:20:52 +00:00
Tamo
8a1b1a95f3 comment the right of the metrics 2023-02-23 13:59:01 +01:00
Tamo
8d47d2d018 update the auth api after the rebase 2023-02-23 13:15:51 +01:00
Tamo
5082cd5e67 update the config file to mention the experimental metrics feature 2023-02-23 12:26:22 +01:00
Tamo
750a2b6842 Update meilisearch/src/option.rs
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2023-02-23 12:26:22 +01:00
Tamo
bc7d4112d9 send the cli experimental feature in the analytics 2023-02-23 12:26:22 +01:00
Tamo
88a18677d0 rename the metrics cli flag 2023-02-23 12:26:22 +01:00
Tamo
68e30214ca remove the feature flag and reorganize the module slightly 2023-02-23 12:26:21 +01:00
bors[bot]
b985b96e4e Merge #3530
3530: Fix highlighter bug r=Kerollmops a=ManyTheFish

# Pull Request

There was a highlighting issue on CJK's character, we were highlighting too many characters and these additional characters were duplicated after the highlight tag.

## Related issue
Fixes #3517 
Fixes #3526 

## What does this PR do?
- add a test showcasing the bug
- fix the bug by activating the char_map creation of the tokenizer during the highlighting process


Co-authored-by: ManyTheFish <many@meilisearch.com>
2023-02-23 10:59:43 +00:00
Louis Dureuil
71e7900c67 move index_map to file 2023-02-23 11:29:11 +01:00
Louis Dureuil
431782f3ee Move index_mapper to mod.rs 2023-02-23 11:29:11 +01:00
Louis Dureuil
3db613ff77 Don't iterate all indexes manually 2023-02-23 11:29:09 +01:00
Louis Dureuil
5822764be9 Skip computing index budget in tests 2023-02-23 11:23:39 +01:00
Louis Dureuil
c63294f331 Switch to 2TiB default index size, updates documentation 2023-02-23 11:23:39 +01:00
Louis Dureuil
a529bf160c Compute budget 2023-02-23 11:23:39 +01:00
Louis Dureuil
f1119f2dc2 Add dichotomic search to utils 2023-02-23 11:23:39 +01:00
Louis Dureuil
1db7d5d851 Add basic tests for index eviction and resize 2023-02-23 11:23:39 +01:00
Louis Dureuil
80b060f920 Use LRU cache 2023-02-23 11:23:39 +01:00
Louis Dureuil
fdf043580c Add LruMap 2023-02-23 11:23:38 +01:00
bors[bot]
f62703cd67 Merge #3534
3534: Update the csv error code from InvalidIndexCsvDelimiter to InvalidDocumentCsvDelimiter r=Kerollmops a=irevoire

Fixes #3533

Co-authored-by: Tamo <tamo@meilisearch.com>
2023-02-23 07:05:12 +00:00
Tamo
76f82c880d update the csv error code from InvalidIndexCsvDelimiter to InvalidDocumentCsvDelimiter 2023-02-22 19:26:48 +01:00
bors[bot]
6eeba3a8ab Merge #3417
3417: Allow multiple searches in a single request r=irevoire a=dureuill

# Pull Request

## Related issue
Fixes #3427

## What does this PR do?

### User standpoint

- Adds a new `/multi-search` entry point (not to be confused with the existing `/{index_uid}/search` entry points) that accepts a POST whose body is an object containing an array of queries.
    - Each query must specify on which index it acts by providing its `indexUid`. Other parameters are identical to the one in the existing search routes (`q`, `limit`, etc.).
    - The response is a JSON object containing an array of the results for each search query as if it had been performed using the `/{index_uid}/search` routes.

### Implementation standpoint

- Refactor authentication module:
  -  Allow tenant token to be checked even without an index in URL
  - Add `meilisearch-auth` as a dependency to `index-scheduler` so as to have a working method of checking if the indexes are authorized there that takes into account both the API key and the tenant token (existing method relied on a behavior that was returning the allowed indexes from the API key as long as there weren't any tenant token)
  - Make `AuthFilter` an object with invariants and so its fields are now private
  - Use the methods of `AuthFilter` to know if an index is authorized rather than relying on its internal search rules.
  - Make tenant token search rules optional and `None` when the `AuthFilter` was not built with a tenant token.
- Add a new `routes::index::search::multiple_search` module containing a post handler that performs the same work as the existing `routes::index::search` post handler, but in a loop.
  - Add various tests
  - Add authentication test suite 

### Sample request


<details>
<summary>
Click to see request/response
</summary>

```json
~/datasets
❯ curl \
  -X POST 'http://localhost:7700/multi-search' \
  -H 'Content-Type: application/json' \
  --data-binary '{"queries": [{ "indexUid": "index-0", "q": "toto", "limit": 1 }, {"indexUid": "index-1", "q": "titi", "limit": 1}]}' | jsonxf
{ "results": [
  {
    "indexUid": "index-0",
    "hits": [
      {
        "id": 20480,
        "title": "Toto - 25th Anniversary - Live in Amsterdam",
        "overview": "Filmed in High Definition in Amsterdam on Toto's 25th Anniversary Tour in 2003, this stunning concert captures the band at their very best, reunited with original vocalist Bobby Kimball. The set combines all their hits with tracks from their latest album \"Through the Looking Glass\" and other live favorites, performed in front of a wildly enthusastic sell-out crowd. Extras include 35 minute behind-the-scenes film following the band through various stages of their world tour including footage from Japan, Thailand, South Korea, and France.  Toto celebrate their 25th anniversary with this blistering live concert, filmed in Amsterdam on February 25th, 2003. Proving they've still got exactly what it takes to move a crowd, the band perform a mixture of medley's, solo spots, and huge hits. Tracks include \"Rosanna,\" \"Africa,\" \"Hold The Line,\" a cover of the Beatles' \"While My Guitar Gently Weeps,\" and many more.",
        "genres": [
          "Music"
        ],
        "poster": "https://image.tmdb.org/t/p/w500/7SCbUPwoB8Z7VUIA1Rn1WWwjNiT.jpg",
        "release_date": 1064275200
      }
    ],
    "query": "toto",
    "processingTimeMs": 1,
    "limit": 1,
    "offset": 0,
    "estimatedTotalHits": 17
  },
  {
    "indexUid": "index-1",
    "hits": [
      {
        "id": 41212,
        "title": "Titicut Follies",
        "overview": "The film is a stark and graphic portrayal of the conditions that existed at the State Prison for the Criminally Insane at Bridgewater, Massachusetts. TITICUT FOLLIES documents the various ways the inmates are treated by the guards, social workers and psychiatrists.",
        "genres": [
          "Documentary"
        ],
        "poster": "https://image.tmdb.org/t/p/w500/2Ju5hn1ofOPeP1eRJtQWakiHuhW.jpg",
        "release_date": -70934400
      }
    ],
    "query": "titi",
    "processingTimeMs": 0,
    "limit": 1,
    "offset": 0,
    "estimatedTotalHits": 7
  }
]}
```

</details>


## PR checklist
Please check if your PR fulfills the following requirements:
- [ ] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [ ] Have you read the contributing guidelines?
- [ ] Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!


Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2023-02-22 17:23:45 +00:00
ManyTheFish
28d6a4466d Make the tokenizer creating a char map during highlighting 2023-02-22 17:43:10 +01:00