Commit Graph

7985 Commits

Author SHA1 Message Date
Tamo
67a583bedf handle the panic happening in milli 2023-05-29 13:39:26 +02:00
Tamo
99e9057684 rename the indexing fuzzer to fuzz-indexing so it doesn't collide with other binary name when being called from the root of the workspace 2023-05-29 13:07:06 +02:00
Tamo
8d40d300a5 rename the fuzzer to indexing 2023-05-29 12:37:24 +02:00
Tamo
6c6387d05e move the fuzzer to its own crate 2023-05-29 12:27:39 +02:00
Tamo
002f42875f fix the fuzzer 2023-05-23 11:42:40 +02:00
Tamo
22213dc604 push the fuzzer 2023-05-23 09:14:26 +02:00
Tamo
602ad98cb8 improve the way we handle the fsts 2023-05-22 11:15:14 +02:00
Tamo
7f619ff0e4 get rids of the now unused soft_deletion_used parameter 2023-05-22 10:33:49 +02:00
Tamo
4391cba6ca fix the addition + deletion bug 2023-05-17 18:28:57 +02:00
Tamo
d7ddf4925e Revert "Disable autobatching of additions and deletions"
This reverts commit a94e78ffb0.
2023-05-17 14:25:50 +02:00
meili-bors[bot]
918ce1dd67 Merge #3731
3731: Move comments above keys in config.toml r=curquiza a=jirutka

The current style is very unusual, confusing and breaks compatibility with tools for parsing config files including comments. Everyone writes comments above the items to which they refer (maybe except pythonists), so let's stick to that.


Co-authored-by: Jakub Jirutka <jakub@jirutka.cz>
2023-05-09 09:24:36 +00:00
meili-bors[bot]
4a4210c116 Merge #3734
3734: Update version for the next release (v1.2.0) in Cargo.toml r=curquiza a=meili-bot

⚠️ This PR is automatically generated. Check the new version is the expected one and Cargo.lock has been updated before merging.

Co-authored-by: curquiza <curquiza@users.noreply.github.com>
v1.2.0-rc.0
2023-05-09 07:35:48 +00:00
curquiza
3533d4f2bb Update version for the next release (v1.2.0) in Cargo.toml 2023-05-08 17:52:33 +00:00
meili-bors[bot]
eace6df91b Merge #3726
3726: Fix prefix highlighting r=loiclec a=ManyTheFish

The prefix queries were not properly highlighted, this PR now highlights only the start of a word when it matched with a prefix

Co-authored-by: ManyTheFish <many@meilisearch.com>
Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>
2023-05-08 07:46:46 +00:00
Loïc Lecrenier
83ab8cf4e5 Remove dbg!(..) expression in highlighter tests 2023-05-08 09:45:23 +02:00
Jakub Jirutka
8095f21999 Move comments above keys in config.toml
The current style is very unusual, confusing and breaks compatibility
with tools for parsing config files including comments. Everyone writes
comments above the items to which they refer (maybe except pythonists),
so let's stick to that.
2023-05-06 18:10:54 +02:00
ManyTheFish
cd2573fcc3 Fix prefix highlighting 2023-05-04 16:53:50 +02:00
meili-bors[bot]
9f7981df28 Merge #3687
3687: Allow to disable specialized tokenizations (again) r=Kerollmops a=jirutka

In PR #2773, I added the `chinese`, `hebrew`, `japanese` and `thai` feature flags to allow melisearch to be built without huge specialed tokenizations that took up 90% of the melisearch binary size. Unfortunately, due to some recent changes, this doesn't work anymore. The problem lies in excessive use of the `default` feature flag, which infects the dependency graph.

Instead of adding `default-features = false` here and there, it's easier and more future-proof to not declare `default` in `milli` and `meilisearch-types`. I've renamed it to `all-tokenizers`, which also makes it a bit clearer what it's about.


Co-authored-by: Jakub Jirutka <jakub@jirutka.cz>
2023-05-04 14:48:01 +00:00
Jakub Jirutka
e615fa5ec6 Fix unused_imports warning in milli when japanese is not enabled 2023-05-04 15:46:11 +02:00
Jakub Jirutka
13f1277637 Allow to disable specialized tokenizations (again)
In PR #2773, I added the `chinese`, `hebrew`, `japanese` and `thai`
feature flags to allow melisearch to be built without huge specialed
tokenizations that took up 90% of the melisearch binary size.
Unfortunately, due to some recent changes, this doesn't work anymore.
The problem lies in excessive use of the `default` feature flag, which
infects the dependency graph.

Instead of adding `default-features = false` here and there, it's easier
and more future-proof to not declare `default` in `milli` and
`meilisearch-types`. I've renamed it to `all-tokenizers`, which also
makes it a bit clearer what it's about.
2023-05-04 15:45:40 +02:00
meili-bors[bot]
4919774f2e Merge #3570
3570: Get documents by filter r=irevoire a=dureuill

# Pull Request

## Related issue

Associated spec: https://github.com/meilisearch/specifications/pull/234

None really, this is more of an extension of #3477: since after this issue we'll be able to delete documents by filter, it makes sense to also be able to get documents by filter. 

## What does this PR do?

### User standpoint

- Add a new `filter` URL parameter to `GET /indexes/{:indexUid}/documents` and a new `POST /indexes/{:indexUid}/documents/fetch` route with the same `offset, limit, fields, filter` 

### Implementation standpoint

-  Add a new `Index::iter_documents` method to iterate on a set of documents rather than return a vector of these documents.
- Rewrite the other `Index::*documents` methods to use the new `Index::iter_documents` method.

## Usage

<details>
<summary>
Sample request and response
</summary>

```
curl -X POST 'http://localhost:7700/indexes/index-1101/documents/fetch' -H 'Content-Type: application/json' --data-binary '{ "filter": "genres = Comedy", "limit": 3, "offset": 8000}' | jsonxf
```

```json
{
  "results": [
    {
      "id": 326126,
      "title": "Bad Exorcists",
      "overview": "A trio of awkward teens intend to win a horror festival by making their own movie, but wind up getting their actress possessed in the process.",
      "genres": [
        "Horror",
        "Comedy"
      ],
      "poster": "https://image.tmdb.org/t/p/w500/lwd65kPbjFacAw3QSXiwSsW6cFU.jpg",
      "release_date": 1425081600
    },
    {
      "id": 326215,
      "title": "Ooops! Noah is Gone...",
      "overview": "It's the end of the world. A flood is coming. Luckily for Dave and his son Finny, a couple of clumsy Nestrians, an Ark has been built to save all animals. But as it turns out, Nestrians aren't allowed. Sneaking on board with the involuntary help of Hazel and her daughter Leah, two Grymps, they think they're safe. Until the curious kids fall off the Ark. Now Finny and Leah struggle to survive the flood and hungry predators and attempt to reach the top of a mountain, while Dave and Hazel must put aside their differences, turn the Ark around and save their kids. It's definitely not going to be smooth sailing.",
      "genres": [
        "Animation",
        "Adventure",
        "Comedy",
        "Family"
      ],
      "poster": "https://image.tmdb.org/t/p/w500/gEJXHgpiKh89Vwjc4XUY5CIgUdB.jpg",
      "release_date": 1427328000
    },
    {
      "id": 326241,
      "title": "For Here or to Go?",
      "overview": "An aspiring Indian tech entrepreneur in the Silicon Valley finds himself unexpectedly battling the bizarre American immigration system to keep his dream alive or prepare to return home forever.",
      "genres": [
        "Drama",
        "Comedy"
      ],
      "poster": "https://image.tmdb.org/t/p/w500/ff8WaA7ItBgl36kdT232i0d0Fnq.jpg",
      "release_date": 1490918400
    }
  ],
  "offset": 8000,
  "limit": 3,
  "total": 9331
}
```

<img width="1348" alt="Capture d’écran 2023-03-08 à 10 09 04" src="https://user-images.githubusercontent.com/41078892/223670905-6932b79b-f9b8-4a41-b59e-be2171705b7d.png">



</details>

# Draft status

- [ ] Route naming: having one route be `GET /indexes/{:indexUid}/documents` and the other `POST /indexes/{:indexUid}/documents/fetch` is suboptimal (also, technically a breaking change for documents with `fetch` as uid?), but `POST /indexes/{:indexUid}/documents` is already used to insert documents.

Co-authored-by: Louis Dureuil <louis@meilisearch.com>
Co-authored-by: Tamo <tamo@meilisearch.com>
2023-05-04 12:54:26 +00:00
Tamo
a3da680ce6 Update meilisearch/tests/documents/errors.rs
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2023-05-04 14:51:17 +02:00
Tamo
11e394dba1 merge the document fetch and get error codes 2023-05-04 15:39:49 +02:00
Tamo
469d2f2a9c fix the fields field of the POST fetch document API 2023-05-04 15:34:09 +02:00
Tamo
ce6507d20c improve the test of the get document by filter 2023-05-04 15:34:09 +02:00
Tamo
b92da5d15a add a big test on the get document by filter of the get route 2023-05-04 15:34:09 +02:00
Tamo
ed3dfbe729 add error codes and tests 2023-05-04 15:34:08 +02:00
Louis Dureuil
441641397b Implement document get with filters 2023-05-04 15:32:34 +02:00
Louis Dureuil
a35d3fc708 Add Index::iter_documents 2023-05-04 15:31:54 +02:00
Louis Dureuil
745c1a2668 Make parse_filter pub 2023-05-04 15:31:53 +02:00
meili-bors[bot]
a95128df6b Merge #3550
3550: Delete documents by filter r=irevoire a=dureuill

# Prototype `prototype-delete-by-filter-0`

Usage:
A new route is available under `POST /indexes/{index_uid}/documents/delete` that allows you to delete your documents by filter.
The expected payload looks like that:
```json
{
  "filter": "doggo = bernese",
}
```

It'll then enqueue a task in your task queue that'll delete all the documents matching this filter once it's processed.
Here is an example of the associated details;
```json
  "details": {
    "deletedDocuments": 53,
    "originalFilter": "\"doggo = bernese\""
  }
```

----------


# Pull Request

## Related issue
Related to https://github.com/meilisearch/meilisearch/issues/3477

## What does this PR do?

### User standpoint

- Modifies the `/indexes/{:indexUid}/documents/delete-batch` route to accept either the existing array of documents ids, or a JSON object with a `filter` field representing a filter to apply. If that latter variant is used, any document matching the filter will be deleted.

### Implementation standpoint

- (processing time version) Adds a new BatchKind that is not autobatchable and that performs the delete by filter
- Reuse the `documentDeletion` task with a new `originalFilter` detail that replaces the `providedIds` detail.

## Example

<details>
<summary>Sample request, response and task result</summary>

Request:

```
curl \
  -X POST 'http://localhost:7700/indexes/index-10/documents/delete-batch' \
  -H 'Content-Type: application/json' \
  --data-binary '{ "filter" : "mass = 600"}'
```

Response:

```
{
  "taskUid": 3902,
  "indexUid": "index-10",
  "status": "enqueued",
  "type": "documentDeletion",
  "enqueuedAt": "2023-02-28T20:50:31.667502Z"
}
```

Task log:

```json
    {
      "uid": 3906,
      "indexUid": "index-12",
      "status": "succeeded",
      "type": "documentDeletion",
      "canceledBy": null,
      "details": {
        "deletedDocuments": 3,
        "originalFilter": "\"mass = 600\""
      },
      "error": null,
      "duration": "PT0.001819S",
      "enqueuedAt": "2023-03-07T08:57:20.11387Z",
      "startedAt": "2023-03-07T08:57:20.115895Z",
      "finishedAt": "2023-03-07T08:57:20.117714Z"
    }
```

</details>

## Draft status

- [ ] Error handling
- [ ] Analytics
- [ ] Do we want to reuse the `delete-batch` route in this way, or create a new route instead?
- [ ] Should the filter be applied at request time or when the deletion task is processed? 
  - The first commit in this PR applies the filter at request time, meaning that even if a document is modified in a way that no longer matches the filter in a later update, it will be deleted as long as the deletion task is processed after that update. 
  - The other commits in this PR apply the filter only when the asynchronous deletion task is processed, meaning that documents that match the filter at processing time are deleted even if they didn't match the filter at request time.
- [ ] If keeping the filter at request time, find a more elegant way to recover the user document ids from the internal document ids. The current way implemented in the first commit of this PR involves getting all the documents matching the filter, looking for the value of their primary key, and turning it into a string by copy-pasting routines found in milli...
- [ ] Security consideration, if any
- [ ] Fix the tests (but waiting until product questions are resolved)
- [ ] Add delete by filter specific tests



Co-authored-by: Louis Dureuil <louis@meilisearch.com>
Co-authored-by: Tamo <tamo@meilisearch.com>
2023-05-04 10:44:41 +00:00
meili-bors[bot]
e0537c3870 Merge #3720
3720: Change links of docs everywhere r=curquiza a=curquiza

Completely fixes #3668 

Co-authored-by: curquiza <clementine@meilisearch.com>
2023-05-04 10:07:41 +00:00
meili-bors[bot]
da220294f6 Merge #3639
3639: Add a dedicated error variant for planned failures in index scheduler tests r=Kerollmops a=Sufflope

# Pull Request

## Related issue
Fixes #3086

## What does this PR do?
- Add a dedicated test variant in test cfg to avoid reusing a misleading existing error

## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!


Co-authored-by: Jean-Sébastien Bour <jean-sebastien@bour.name>
2023-05-04 09:33:57 +00:00
meili-bors[bot]
78e611f282 Merge #3693
3693: Implement the auto deletion of tasks r=dureuill a=irevoire

Fixes https://github.com/meilisearch/meilisearch/issues/3622

This PR should be the definite fix for #3622.

It adds a limit (1M) to the maximum number of tasks the task queue can hold.
Once the task queue reaches this limit (1M of tasks are in the task queue, whatever their status is), meilisearch will schedule a task deletion that tries to delete the oldest 100k tasks.
If meilisearch can't delete 100k tasks because some of them are not yet finished, it will delete as many tasks as possible.

Once the limit is reached, you're still able to register new tasks. The engine will only stop you from adding new tasks once [the other hard limit](https://github.com/meilisearch/meilisearch/pull/3659) of 10GiB of tasks is reached (that's between 5M and 15M of tasks depending on your workflow).

-------

Technically;
- We only try to schedule our task deletion when calling the tick function but before creating a new batch. This means we never enqueue a task we're not going to process ~right away.
- If our task deletion doesn't delete anything, we don't enqueue it and log a warn the user that the engine is not working properly

Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2023-05-04 08:30:22 +00:00
Louis Dureuil
d8381eb790 Fix originalFilter 2023-05-04 10:07:59 +02:00
Louis Dureuil
b212aef5db add one nanosecond to generated filter so as to generate a filter that would have matched the last task to delete 2023-05-04 09:56:48 +02:00
meili-bors[bot]
6bf66f35be Merge #3721
3721: Use new bors URL of our self hosted bors instance r=curquiza a=curquiza



Co-authored-by: curquiza <clementine@meilisearch.com>
2023-05-04 07:53:39 +00:00
Louis Dureuil
52ab114f6c Fix test on macOS: 50 tasks would result in the test consistently failing on a local macOS 2023-05-04 00:06:49 +02:00
Tamo
dcbfecf42c make the generated filter valid 2023-05-04 00:06:49 +02:00
Tamo
9ca6f59546 Update index-scheduler/src/lib.rs
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2023-05-04 00:06:49 +02:00
Tamo
aa7537a11e make the autodeletion work with a fixed number of tasks and update the tests 2023-05-04 00:06:49 +02:00
Tamo
972bb2831c log when meilisearch need to delete tasks 2023-05-04 00:06:49 +02:00
Tamo
f9ddd32545 implement the auto-deletion of tasks 2023-05-04 00:06:49 +02:00
Louis Dureuil
d5059520aa Fix typo 2023-05-03 22:27:03 +02:00
Louis Dureuil
1c3642c9b2 Fix deletion per filter analytics 2023-05-03 22:26:51 +02:00
Tamo
d2d2bacaf2 add a test on the complex filter 2023-05-03 20:07:08 +02:00
curquiza
30edba3497 Update links of the docs 2023-05-03 19:14:57 +02:00
Louis Dureuil
84e7bd9342 Fix test after rebase on filter additions 2023-05-03 17:51:28 +02:00
Louis Dureuil
2b74e4d116 Fix test 2023-05-03 17:41:50 +02:00
Tamo
b5fe0b2b07 fix the details 2023-05-03 17:41:50 +02:00