Commit Graph

1421 Commits

Author SHA1 Message Date
many
1b3923b5ce Update all packages to 0.21.0 2021-11-29 12:17:59 +01:00
bors[bot]
26629a3f9e Merge #419
419:  fix word pair proximity indexing r=ManyTheFish a=ManyTheFish

# Pull Request

Sort positions before iterating over them during word pair proximity extraction.

fixes [Meilisearch#1913](https://github.com/meilisearch/MeiliSearch/issues/1913)

Co-authored-by: many <maxime@meilisearch.com>
2021-11-23 10:21:05 +00:00
many
8970246bc4 Sort positions before iterating over them during word pair proximity extraction 2021-11-22 18:16:54 +01:00
bors[bot]
cc32519a2d Merge #418
418: change visibility of DocumentDeletionResult r=Kerollmops a=MarinPostma

Change the visibility of `DocumentDeletionResult`, so its fields can be accesses from outside milli.


Co-authored-by: Marin Postma <postma.marin@protonmail.com>
2021-11-22 14:45:55 +00:00
Marin Postma
6e977dd8e8 change visibility of DocumentDeletionResult 2021-11-22 15:44:44 +01:00
bors[bot]
68f1db123a Merge #416
416: Update tokenizer v0.2.6 r=Kerollmops a=ManyTheFish



Co-authored-by: many <maxime@meilisearch.com>
2021-11-18 16:01:11 +00:00
many
35f9499638 Export tokenizer from milli 2021-11-18 16:57:12 +01:00
many
64ef5869d7 Update tokenizer v0.2.6 2021-11-18 16:56:05 +01:00
bors[bot]
2c14efa8a2 Merge #409
409: remove update_id in UpdateBuilder r=ManyTheFish a=MarinPostma

Removing the `update_id` from `UpdateBuidler`, since it serves no purpose. I had introduced it when working in HA some time ago, but I think there are better ways to do it now, so it can be removed an stop being in our way.

Co-authored-by: Marin Postma <postma.marin@protonmail.com>
2021-11-16 14:59:09 +00:00
Marin Postma
6eb47ab792 remove update_id in UpdateBuilder 2021-11-16 13:07:04 +01:00
bors[bot]
21b78f3926 Merge #414
414: improve update result types r=ManyTheFish a=MarinPostma

Inprove the returned meta when performing document additions and deletions:

- On document addition return the number of indexed documents and the total number of documents in the index after the indexion
- On document deletion return the number of deleted documents, and the remaining number of documents in the index after the deletion is performed


I also fixed a potential bug when performing a document deletion and the primary key couldn't be found: before we assumed that the db was empty and returned that no documents were deleted, but since we checked before that the db wasn't empty, entering this branch is actually a bug, and now returns a 'MissingPrimaryKey' error.


Co-authored-by: Marin Postma <postma.marin@protonmail.com>
2021-11-15 09:06:10 +00:00
Marin Postma
09b4281cff improve document addition returned metaimprove document addition
returned metaimprove document addition returned metaimprove document
addition returned metaimprove document addition returned metaimprove
document addition returned metaimprove document addition returned
metaimprove document addition returned meta
2021-11-10 14:08:36 +01:00
Marin Postma
721fc294be improve document deletion returned meta
returns both the remaining number of documents and the number of deleted
documents.
2021-11-10 14:08:18 +01:00
bors[bot]
8dff08d772 Merge #400
400: Rewrite the filter parser and add a lot of tests r=irevoire a=irevoire

This PR is a complete rewrite of #358, which was reverted in #403.
You can already try this PR in Meilisearch here https://github.com/meilisearch/MeiliSearch/pull/1880.

Since writing a parser is quite complicated, I moved all the logic to another workspace called `filter_parser`.
In this workspace, we don't know anything about milli, the filterable fields / field ID or anything.
As you can see in its `cargo.toml`, it has only three dependencies entirely focused on the parsing part:
```
nom = "7.0.0"
nom_locate = "4.0.0"
```

But introducing this new workspace made some changes necessary on the “AST”. Now the parser only returns `Tokens` (a simple `&str` with a bit of context). Everything is interpreted when we execute the filter later in milli.
This crate provides a new error type for all filter related errors.

---------
## Errors

Currently, we have multiple kinds of errors. Sometimes we are generating errors looking like that: (for `name = truc`)
```
Attribute `name` is not filterable. Available filterable attributes are: ``.
```
While sometimes pest was generating errors looking like that:
```
Invalid syntax for the filter parameter: ` --> 1:7
  |
1 | name =
  |       ^---
  |
  = expected word`.
```

Which most people were seeing like that: (for `name =`)
```
Invalid syntax for the filter parameter: ` --> 1:7\n  |\n1 | name =\n  |       ^---\n  |\n  = expected word`.
```

-----------

With this PR, the error format is unified between all errors.
All errors follow this more straightforward format:
```
The error message.
[from char]:[to char] filter
```

This should be way easier to read when embedded in the JSON for a human. And it should also allow us to parse the errors easily and provide highlighting or something with a frontend playground.

Here is an example of the two previous errors with the new format:
For `name = truc`:
```
Attribute `name` is not filterable. Available filterable attributes are: ``.
1:4 name = truc
```
Or in one line:
```
Attribute `name` is not filterable. Available filterable attributes are: ``.\n1:4 name = truc
```

And for `name =`:
```
Was expecting a value but instead got nothing.
7:7 name =
```
Or in one line:
```
Was expecting a value but instead got nothing.\n7:7 name =
```

Also, since we now have control over the parser, we can generate more explicit error messages so a lot of new errors have been created. I tried to be as helpful as possible for the user; here is a little overview of the new error message you can get when misusing a filter:
```
Expression `"truc` is missing the following closing delimiter: `"`.
8:13 name = "truc
```
The `_geoRadius` filter is an operation and can't be used as a value.
8:30 name = _geoRadius(12, 13, 14)
```
etc

## Tests
A lot of tests have been written in the `filter_parser` crate. I think there is a unit test for every part of the syntax. 
But since we can never be sure we covered all the cases, I also fuzzed the new parser A LOT (for ±8 hours on 20 threads). And the code to fuzz the parser is included in the workspace, so if one day we need to change something to the syntax, we'll be able to re-use it by simply running:
```
cargo fuzz run --release parse
```

## Milli
I renamed the type and module `filter_condition.rs` / `FilterCondition` to `filter.rs` / `Filter`.

Co-authored-by: Tamo <tamo@meilisearch.com>
2021-11-09 16:09:34 +00:00
Irevoire
7c3017734a re-ignore the ! symbol when generating a good error message 2021-11-09 17:08:04 +01:00
Tamo
bff48681d2 Re-order the operator
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-11-09 17:05:36 +01:00
Irevoire
519d6b2bf3 remove the ! syntax for the not 2021-11-09 16:47:54 +01:00
Irevoire
73df873f44 fix typos 2021-11-09 16:41:10 +01:00
Irevoire
99197387af fix the test with the new escaped format 2021-11-09 16:41:10 +01:00
Tamo
f28600031d Rename the filter_parser crate into filter-parser
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-11-09 16:41:10 +01:00
Irevoire
0ea0146e04 implement deref &str on the tokens 2021-11-09 11:34:10 +01:00
Irevoire
a211a9cdcd update the error format so it can be easily parsed by someone else 2021-11-09 11:19:30 +01:00
Irevoire
9b24f83456 in case of error return a range of chars position instead of one line and column 2021-11-09 10:27:29 +01:00
Tamo
2c6d08c519 Simplify the tokens to only wrap one span and no inner value
Co-authored-by: marin <postma.marin@protonmail.com>
2021-11-09 10:12:20 +01:00
Irevoire
18eb4b9c51 fix spaces in the bnf 2021-11-09 01:04:50 +01:00
Tamo
cf98bf37d0 Simplify some closure
Co-authored-by: marin <postma.marin@protonmail.com>
2021-11-09 01:03:02 +01:00
Tamo
bc9daf9041 update the bnf
Co-authored-by: marin <postma.marin@protonmail.com>
2021-11-09 01:00:42 +01:00
Tamo
9c36e497d9 Rename the key_component into a value_component
Co-authored-by: marin <postma.marin@protonmail.com>
2021-11-09 00:59:44 +01:00
Irevoire
6515838d35 improve the readability of the _geoPoint thingy in the value 2021-11-09 00:57:46 +01:00
Tamo
ea52aff6dc Rename the ExtendNomError trait to NomErrorExt
Co-authored-by: marin <postma.marin@protonmail.com>
2021-11-09 00:52:17 +01:00
Irevoire
ef0d5a8240 flatten a match 2021-11-09 00:49:13 +01:00
Tamo
15bd14297e Remove useless closure
Co-authored-by: marin <postma.marin@protonmail.com>
2021-11-09 00:45:46 +01:00
Irevoire
21d115dcbb remove greedy-error 2021-11-08 17:53:41 +01:00
Irevoire
959ca66125 improve the error diagnostic when parsing values 2021-11-08 15:58:21 +01:00
Tamo
7483c7513a fix the filterable fields 2021-11-07 01:52:19 +01:00
Tamo
e5af3ac65c rename the filter_condition.rs to filter.rs 2021-11-06 16:37:55 +01:00
Tamo
6831c23449 merge with main 2021-11-06 16:34:30 +01:00
Tamo
5c01e9bf7c fix the benchmarks 2021-11-06 16:03:49 +01:00
Tamo
075d9c97c0 re-implement the equality between tokens to only compare the inner value 2021-11-06 16:02:27 +01:00
Tamo
b249989bef fix most of the tests 2021-11-06 01:32:12 +01:00
Tamo
070ec9bd97 small update on the README 2021-11-05 17:45:20 +01:00
Tamo
27a6a26b4b makes the parse function part of the filter_parser 2021-11-05 10:46:54 +01:00
Tamo
76d961cc77 implements the last errors 2021-11-04 17:42:06 +01:00
Tamo
8234f9fdf3 recreate most filter error except for the geosearch 2021-11-04 17:24:55 +01:00
Tamo
7328ffb034 stop panicking in case of internal error 2021-11-04 16:20:53 +01:00
Tamo
3e5550c910 clean the errors 2021-11-04 16:12:17 +01:00
Tamo
72a9071203 fix typo 2021-11-04 16:03:52 +01:00
Tamo
07a5ffb04c update http-ui 2021-11-04 15:52:22 +01:00
Tamo
a58bc5bebb update milli with the new parser_filter 2021-11-04 15:02:36 +01:00
Tamo
b1a0110a47 update the main 2021-11-04 14:48:39 +01:00