Commit Graph

2231 Commits

Author SHA1 Message Date
Clément Renault
25faef67d0 Remove the database setup in the filter_depth test 2021-12-09 11:57:53 +01:00
Clément Renault
65519bc04b Test that empty filters return a None 2021-12-09 11:57:53 +01:00
Clément Renault
ef59762d8e Prefer returning None instead of the Empty Filter state 2021-12-09 11:57:52 +01:00
bors[bot]
80dcfd5c3e Merge #421
421: Introduce the depth method on FilterCondition r=Kerollmops a=Kerollmops

This PR introduces the depth method on the FilterCondition type to be able to react to it. It is meant to be used to reject filters that go too deep and can make the engine stack overflow.

Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-12-09 10:28:52 +00:00
Clément Renault
ee856a7a46 Limit the max filter depth to 2000 2021-12-07 17:36:45 +01:00
Clément Renault
32bd9f091f Detect the filters that are too deep and return an error 2021-12-07 17:20:11 +01:00
Clément Renault
90f49eab6d Check the filter max depth limit and reject the invalid ones 2021-12-07 16:32:48 +01:00
Clément Renault
49c2db9485 Change the depth function to return the token depth 2021-12-07 16:06:10 +01:00
Clément Renault
57502fcf6a Introduce the depth method on FilterCondition 2021-12-06 17:35:20 +01:00
bors[bot]
c83b77304a Merge #420
420: Update milli 0.21.0 r=ManyTheFish a=ManyTheFish

Update all modules to 0.21.0

Co-authored-by: many <maxime@meilisearch.com>
2021-11-30 17:22:12 +00:00
many
1b3923b5ce Update all packages to 0.21.0 2021-11-29 12:17:59 +01:00
bors[bot]
26629a3f9e Merge #419
419:  fix word pair proximity indexing r=ManyTheFish a=ManyTheFish

# Pull Request

Sort positions before iterating over them during word pair proximity extraction.

fixes [Meilisearch#1913](https://github.com/meilisearch/MeiliSearch/issues/1913)

Co-authored-by: many <maxime@meilisearch.com>
2021-11-23 10:21:05 +00:00
many
8970246bc4 Sort positions before iterating over them during word pair proximity extraction 2021-11-22 18:16:54 +01:00
bors[bot]
cc32519a2d Merge #418
418: change visibility of DocumentDeletionResult r=Kerollmops a=MarinPostma

Change the visibility of `DocumentDeletionResult`, so its fields can be accesses from outside milli.


Co-authored-by: Marin Postma <postma.marin@protonmail.com>
2021-11-22 14:45:55 +00:00
Marin Postma
6e977dd8e8 change visibility of DocumentDeletionResult 2021-11-22 15:44:44 +01:00
bors[bot]
68f1db123a Merge #416
416: Update tokenizer v0.2.6 r=Kerollmops a=ManyTheFish



Co-authored-by: many <maxime@meilisearch.com>
2021-11-18 16:01:11 +00:00
many
35f9499638 Export tokenizer from milli 2021-11-18 16:57:12 +01:00
many
64ef5869d7 Update tokenizer v0.2.6 2021-11-18 16:56:05 +01:00
bors[bot]
2c14efa8a2 Merge #409
409: remove update_id in UpdateBuilder r=ManyTheFish a=MarinPostma

Removing the `update_id` from `UpdateBuidler`, since it serves no purpose. I had introduced it when working in HA some time ago, but I think there are better ways to do it now, so it can be removed an stop being in our way.

Co-authored-by: Marin Postma <postma.marin@protonmail.com>
2021-11-16 14:59:09 +00:00
Marin Postma
6eb47ab792 remove update_id in UpdateBuilder 2021-11-16 13:07:04 +01:00
bors[bot]
21b78f3926 Merge #414
414: improve update result types r=ManyTheFish a=MarinPostma

Inprove the returned meta when performing document additions and deletions:

- On document addition return the number of indexed documents and the total number of documents in the index after the indexion
- On document deletion return the number of deleted documents, and the remaining number of documents in the index after the deletion is performed


I also fixed a potential bug when performing a document deletion and the primary key couldn't be found: before we assumed that the db was empty and returned that no documents were deleted, but since we checked before that the db wasn't empty, entering this branch is actually a bug, and now returns a 'MissingPrimaryKey' error.


Co-authored-by: Marin Postma <postma.marin@protonmail.com>
2021-11-15 09:06:10 +00:00
Marin Postma
09b4281cff improve document addition returned metaimprove document addition
returned metaimprove document addition returned metaimprove document
addition returned metaimprove document addition returned metaimprove
document addition returned metaimprove document addition returned
metaimprove document addition returned meta
2021-11-10 14:08:36 +01:00
Marin Postma
721fc294be improve document deletion returned meta
returns both the remaining number of documents and the number of deleted
documents.
2021-11-10 14:08:18 +01:00
bors[bot]
8dff08d772 Merge #400
400: Rewrite the filter parser and add a lot of tests r=irevoire a=irevoire

This PR is a complete rewrite of #358, which was reverted in #403.
You can already try this PR in Meilisearch here https://github.com/meilisearch/MeiliSearch/pull/1880.

Since writing a parser is quite complicated, I moved all the logic to another workspace called `filter_parser`.
In this workspace, we don't know anything about milli, the filterable fields / field ID or anything.
As you can see in its `cargo.toml`, it has only three dependencies entirely focused on the parsing part:
```
nom = "7.0.0"
nom_locate = "4.0.0"
```

But introducing this new workspace made some changes necessary on the “AST”. Now the parser only returns `Tokens` (a simple `&str` with a bit of context). Everything is interpreted when we execute the filter later in milli.
This crate provides a new error type for all filter related errors.

---------
## Errors

Currently, we have multiple kinds of errors. Sometimes we are generating errors looking like that: (for `name = truc`)
```
Attribute `name` is not filterable. Available filterable attributes are: ``.
```
While sometimes pest was generating errors looking like that:
```
Invalid syntax for the filter parameter: ` --> 1:7
  |
1 | name =
  |       ^---
  |
  = expected word`.
```

Which most people were seeing like that: (for `name =`)
```
Invalid syntax for the filter parameter: ` --> 1:7\n  |\n1 | name =\n  |       ^---\n  |\n  = expected word`.
```

-----------

With this PR, the error format is unified between all errors.
All errors follow this more straightforward format:
```
The error message.
[from char]:[to char] filter
```

This should be way easier to read when embedded in the JSON for a human. And it should also allow us to parse the errors easily and provide highlighting or something with a frontend playground.

Here is an example of the two previous errors with the new format:
For `name = truc`:
```
Attribute `name` is not filterable. Available filterable attributes are: ``.
1:4 name = truc
```
Or in one line:
```
Attribute `name` is not filterable. Available filterable attributes are: ``.\n1:4 name = truc
```

And for `name =`:
```
Was expecting a value but instead got nothing.
7:7 name =
```
Or in one line:
```
Was expecting a value but instead got nothing.\n7:7 name =
```

Also, since we now have control over the parser, we can generate more explicit error messages so a lot of new errors have been created. I tried to be as helpful as possible for the user; here is a little overview of the new error message you can get when misusing a filter:
```
Expression `"truc` is missing the following closing delimiter: `"`.
8:13 name = "truc
```
The `_geoRadius` filter is an operation and can't be used as a value.
8:30 name = _geoRadius(12, 13, 14)
```
etc

## Tests
A lot of tests have been written in the `filter_parser` crate. I think there is a unit test for every part of the syntax. 
But since we can never be sure we covered all the cases, I also fuzzed the new parser A LOT (for ±8 hours on 20 threads). And the code to fuzz the parser is included in the workspace, so if one day we need to change something to the syntax, we'll be able to re-use it by simply running:
```
cargo fuzz run --release parse
```

## Milli
I renamed the type and module `filter_condition.rs` / `FilterCondition` to `filter.rs` / `Filter`.

Co-authored-by: Tamo <tamo@meilisearch.com>
2021-11-09 16:09:34 +00:00
Irevoire
7c3017734a re-ignore the ! symbol when generating a good error message 2021-11-09 17:08:04 +01:00
Tamo
bff48681d2 Re-order the operator
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-11-09 17:05:36 +01:00
Irevoire
519d6b2bf3 remove the ! syntax for the not 2021-11-09 16:47:54 +01:00
Irevoire
73df873f44 fix typos 2021-11-09 16:41:10 +01:00
Irevoire
99197387af fix the test with the new escaped format 2021-11-09 16:41:10 +01:00
Tamo
f28600031d Rename the filter_parser crate into filter-parser
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-11-09 16:41:10 +01:00
Irevoire
0ea0146e04 implement deref &str on the tokens 2021-11-09 11:34:10 +01:00
Irevoire
a211a9cdcd update the error format so it can be easily parsed by someone else 2021-11-09 11:19:30 +01:00
Irevoire
9b24f83456 in case of error return a range of chars position instead of one line and column 2021-11-09 10:27:29 +01:00
Tamo
2c6d08c519 Simplify the tokens to only wrap one span and no inner value
Co-authored-by: marin <postma.marin@protonmail.com>
2021-11-09 10:12:20 +01:00
Irevoire
18eb4b9c51 fix spaces in the bnf 2021-11-09 01:04:50 +01:00
Tamo
cf98bf37d0 Simplify some closure
Co-authored-by: marin <postma.marin@protonmail.com>
2021-11-09 01:03:02 +01:00
Tamo
bc9daf9041 update the bnf
Co-authored-by: marin <postma.marin@protonmail.com>
2021-11-09 01:00:42 +01:00
Tamo
9c36e497d9 Rename the key_component into a value_component
Co-authored-by: marin <postma.marin@protonmail.com>
2021-11-09 00:59:44 +01:00
Irevoire
6515838d35 improve the readability of the _geoPoint thingy in the value 2021-11-09 00:57:46 +01:00
Tamo
ea52aff6dc Rename the ExtendNomError trait to NomErrorExt
Co-authored-by: marin <postma.marin@protonmail.com>
2021-11-09 00:52:17 +01:00
Irevoire
ef0d5a8240 flatten a match 2021-11-09 00:49:13 +01:00
Tamo
15bd14297e Remove useless closure
Co-authored-by: marin <postma.marin@protonmail.com>
2021-11-09 00:45:46 +01:00
Irevoire
21d115dcbb remove greedy-error 2021-11-08 17:53:41 +01:00
Irevoire
959ca66125 improve the error diagnostic when parsing values 2021-11-08 15:58:21 +01:00
Tamo
7483c7513a fix the filterable fields 2021-11-07 01:52:19 +01:00
Tamo
e5af3ac65c rename the filter_condition.rs to filter.rs 2021-11-06 16:37:55 +01:00
Tamo
6831c23449 merge with main 2021-11-06 16:34:30 +01:00
Tamo
5c01e9bf7c fix the benchmarks 2021-11-06 16:03:49 +01:00
Tamo
075d9c97c0 re-implement the equality between tokens to only compare the inner value 2021-11-06 16:02:27 +01:00
Tamo
b249989bef fix most of the tests 2021-11-06 01:32:12 +01:00