Commit Graph

979 Commits

Author SHA1 Message Date
Clément Renault
f04cd19886 Introduce a max prefix length parameter to the word prefix pair proximity update 2022-01-25 17:04:23 +01:00
Clément Renault
1514dfa1b7 Introduce a max proximity parameter to the word prefix pair proximity update 2022-01-25 17:04:23 +01:00
Clément Renault
23ea3ad738 Remove the useless threshold when computing the word prefix pair proximity 2022-01-25 17:04:23 +01:00
Clément Renault
e3c34684c6 Fix a bug where we were skipping most of the prefix pairs 2022-01-25 17:04:23 +01:00
bors[bot]
fd177b63f8 Merge #423
423: Remove an unused file r=irevoire a=irevoire

This empty file is not included anywhere

Co-authored-by: Tamo <tamo@meilisearch.com>
2022-01-19 14:18:05 +00:00
Marin Postma
0c84a40298 document batch support
reusable transform

rework update api

add indexer config

fix tests

review changes

Co-authored-by: Clément Renault <clement@meilisearch.com>

fmt
2022-01-19 12:40:20 +01:00
Tamo
01968d7ca7 ensure we get no documents and no error when filtering on an empty db 2022-01-18 11:40:30 +01:00
bors[bot]
8f4499090b Merge #433
433: fix(filter): Fix two bugs. r=Kerollmops a=irevoire

- Stop lowercasing the field when looking in the field id map
- When a field id does not exist, it means there are currently zero
  documents containing this field, so we return an empty RoaringBitmap
  instead of throwing an internal error

Will fix https://github.com/meilisearch/MeiliSearch/issues/2082 once meilisearch is released

Co-authored-by: Tamo <tamo@meilisearch.com>
2022-01-17 14:06:53 +00:00
Tamo
d1ac40ea14 fix(filter): Fix two bugs.
- Stop lowercasing the field when looking in the field id map
- When a field id does not exist, it means there are currently zero
  documents containing this field, so we return an empty RoaringBitmap
  instead of throwing an internal error
2022-01-17 13:51:46 +01:00
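
The two fixes described in this commit can be sketched roughly as follows. This is not milli's code: the `Index` struct and its `docs_with_field` method are hypothetical, and a `BTreeSet<u32>` stands in for `RoaringBitmap` so that the sketch needs only the standard library.

```rust
use std::collections::{BTreeSet, HashMap};

/// Stand-in for `RoaringBitmap` (assumption: a sorted set of document ids).
type DocIds = BTreeSet<u32>;

/// Hypothetical index: field names map to ids, field ids map to documents.
struct Index {
    field_ids: HashMap<String, u16>,
    docs_by_field: HashMap<u16, DocIds>,
}

impl Index {
    /// Returns the ids of the documents containing `field`.
    /// The name is looked up as given (no lowercasing), and an unknown
    /// field yields an empty set instead of an error.
    fn docs_with_field(&self, field: &str) -> DocIds {
        match self.field_ids.get(field) {
            Some(id) => self.docs_by_field.get(id).cloned().unwrap_or_default(),
            None => DocIds::new(),
        }
    }
}
```

With a field registered as `Genre`, looking up `genre` or any unregistered name gives back an empty set, which a filter can treat as "no matching documents".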
Samyak S Sarnayak
2d7607734e Run cargo fmt on matching_words.rs 2022-01-17 13:04:33 +05:30
Samyak S Sarnayak
5ab505be33 Fix highlight by replacing num_graphemes_from_bytes
num_graphemes_from_bytes has been renamed in the tokenizer to
num_chars_from_bytes.

Highlight now works correctly!
2022-01-17 13:02:55 +05:30
Samyak S Sarnayak
e752bd06f7 Fix matching_words tests to compile successfully
The tests still fail due to a bug in https://github.com/meilisearch/tokenizer/pull/59
2022-01-17 11:37:45 +05:30
Samyak S Sarnayak
30247d70cd Fix search highlight for non-unicode chars
The `matching_bytes` function takes a `&Token` now and:
- gets the number of bytes to highlight (unchanged).
- uses `Token.num_graphemes_from_bytes` to get the number of grapheme
  clusters to highlight.

In essence, the `matching_bytes` function returns the number of matching
grapheme clusters instead of bytes. Should this function be renamed
then?

Added proper highlighting in the HTTP UI:
- requires dependency on `unicode-segmentation` to extract grapheme
  clusters from tokens
- `<mark>` tag is put around only the matched part
    - before this change, the entire word was highlighted even if only a
      part of it matched
2022-01-17 11:37:44 +05:30
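
A minimal sketch of the idea in this commit: convert a matched byte length into a character count, then mark only that part of the word. Both function names are hypothetical, and the sketch counts `char`s (Unicode scalar values) with the standard library only. Counting true grapheme clusters would need a crate such as `unicode-segmentation`, as the commit notes.

```rust
/// Hypothetical helper: number of `char`s covered by the first `num_bytes`
/// bytes of `word`. Returns `None` when `num_bytes` is out of range or
/// does not fall on a char boundary.
fn chars_in_byte_prefix(word: &str, num_bytes: usize) -> Option<usize> {
    word.get(..num_bytes).map(|prefix| prefix.chars().count())
}

/// Wraps only the matched prefix of `word` in `<mark>` tags.
/// Falls back to the unmarked word if the byte length is invalid.
fn highlight_prefix(word: &str, num_bytes: usize) -> String {
    match chars_in_byte_prefix(word, num_bytes) {
        Some(n) => {
            let matched: String = word.chars().take(n).collect();
            let rest: String = word.chars().skip(n).collect();
            format!("<mark>{}</mark>{}", matched, rest)
        }
        None => word.to_string(),
    }
}
```

For the word `école`, whose first letter takes two bytes, a three-byte match covers two characters, so only `éc` ends up inside the `<mark>` tag.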
Tamo
98a365aaae store the geopoint in three dimensions 2021-12-14 12:21:24 +01:00
Tamo
d671d6f0f1 remove an unused file 2021-12-13 19:27:34 +01:00
Clément Renault
25faef67d0 Remove the database setup in the filter_depth test 2021-12-09 11:57:53 +01:00
Clément Renault
65519bc04b Test that empty filters return a None 2021-12-09 11:57:53 +01:00
Clément Renault
ef59762d8e Prefer returning None instead of the Empty Filter state 2021-12-09 11:57:52 +01:00
Clément Renault
ee856a7a46 Limit the max filter depth to 2000 2021-12-07 17:36:45 +01:00
Clément Renault
32bd9f091f Detect the filters that are too deep and return an error 2021-12-07 17:20:11 +01:00
Clément Renault
90f49eab6d Check the filter max depth limit and reject the invalid ones 2021-12-07 16:32:48 +01:00
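
The depth check introduced by the three commits above might look like the sketch below. The `Filter` enum, `exceeds_depth`, and `check_depth` are hypothetical names, not milli's actual types. Only the limit of 2000 comes from the commit messages.

```rust
/// Limit taken from the commit message "Limit the max filter depth to 2000".
const MAX_FILTER_DEPTH: usize = 2000;

/// Hypothetical filter tree used only for this sketch.
enum Filter {
    Condition(String),
    And(Box<Filter>, Box<Filter>),
    Or(Box<Filter>, Box<Filter>),
}

impl Filter {
    /// Returns `true` as soon as a branch nested deeper than `limit` is found.
    /// A bare condition has depth 0; each `And`/`Or` adds one level.
    fn exceeds_depth(&self, limit: usize) -> bool {
        match self {
            Filter::Condition(_) => false,
            Filter::And(l, r) | Filter::Or(l, r) => {
                limit == 0 || l.exceeds_depth(limit - 1) || r.exceeds_depth(limit - 1)
            }
        }
    }
}

/// Rejects filters nested deeper than `MAX_FILTER_DEPTH`.
fn check_depth(filter: &Filter) -> Result<(), String> {
    if filter.exceeds_depth(MAX_FILTER_DEPTH) {
        Err(format!("filter is nested deeper than {} levels", MAX_FILTER_DEPTH))
    } else {
        Ok(())
    }
}
```

Passing the remaining budget down the recursion bounds the traversal itself, so an overly deep filter is rejected before the walk can go much past the limit.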
many
8970246bc4 Sort positions before iterating over them during word pair proximity extraction 2021-11-22 18:16:54 +01:00
Marin Postma
6e977dd8e8 change visibility of DocumentDeletionResult 2021-11-22 15:44:44 +01:00
many
35f9499638 Export tokenizer from milli 2021-11-18 16:57:12 +01:00
Marin Postma
6eb47ab792 remove update_id in UpdateBuilder 2021-11-16 13:07:04 +01:00
Marin Postma
09b4281cff improve document addition returned meta
2021-11-10 14:08:36 +01:00
Marin Postma
721fc294be improve document deletion returned meta
returns both the remaining number of documents and the number of deleted
documents.
2021-11-10 14:08:18 +01:00
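
As an illustration of the deletion commit above, a result type carrying both counts could be shaped like this. The struct name, its field names, and the `delete_documents` function are assumptions made for the sketch, with a `BTreeSet<u32>` standing in for the document store.

```rust
use std::collections::BTreeSet;

/// Hypothetical result type; field names are assumptions for this sketch.
#[derive(Debug, PartialEq)]
struct DeletionResult {
    deleted_documents: u64,
    remaining_documents: u64,
}

/// Removes the given ids from `store` and reports both how many documents
/// were actually deleted and how many remain afterwards.
fn delete_documents(store: &mut BTreeSet<u32>, to_delete: &[u32]) -> DeletionResult {
    let mut deleted = 0;
    for id in to_delete {
        // `remove` returns true only if the id was present.
        if store.remove(id) {
            deleted += 1;
        }
    }
    DeletionResult {
        deleted_documents: deleted,
        remaining_documents: store.len() as u64,
    }
}
```

Returning both numbers lets a caller tell a deletion request that matched nothing apart from one that emptied the store.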
Irevoire
0ea0146e04 implement deref &str on the tokens 2021-11-09 11:34:10 +01:00
Tamo
7483c7513a fix the filterable fields 2021-11-07 01:52:19 +01:00
Tamo
e5af3ac65c rename the filter_condition.rs to filter.rs 2021-11-06 16:37:55 +01:00
Tamo
6831c23449 merge with main 2021-11-06 16:34:30 +01:00
Tamo
b249989bef fix most of the tests 2021-11-06 01:32:12 +01:00
Tamo
27a6a26b4b makes the parse function part of the filter_parser 2021-11-05 10:46:54 +01:00
Tamo
76d961cc77 implements the last errors 2021-11-04 17:42:06 +01:00
Tamo
8234f9fdf3 recreate most filter error except for the geosearch 2021-11-04 17:24:55 +01:00
Tamo
07a5ffb04c update http-ui 2021-11-04 15:52:22 +01:00
Tamo
a58bc5bebb update milli with the new parser_filter 2021-11-04 15:02:36 +01:00
many
7b3bac46a0 Change Attribute and Ranking rules errors 2021-11-04 13:19:32 +01:00
many
0c0038488c Change last error messages 2021-11-03 11:24:06 +01:00
Tamo
76a2adb7c3 re-enable the tests in the parser and start the creation of an error type 2021-11-02 17:35:17 +01:00
bors[bot]
08ae47e475 Merge #405
405: Change some error messages r=ManyTheFish a=ManyTheFish



Co-authored-by: many <maxime@meilisearch.com>
2021-10-28 13:35:55 +00:00
many
9f1e0d2a49 Refine asc/desc error messages 2021-10-28 14:47:17 +02:00
many
ed6db19681 Fix PR comments 2021-10-28 11:18:32 +02:00
marin postma
183d3dada7 return document count from builder 2021-10-28 10:33:04 +02:00
many
2be755ce75 Lower error check, already checked in meilisearch 2021-10-27 19:50:41 +02:00
many
3599df77f0 Change some error messages 2021-10-27 19:33:01 +02:00
bors[bot]
d7943fe225 Merge #402
402: Optimize document transform r=MarinPostma a=MarinPostma

This PR optimizes the transformation of document additions into the obkv format. Instead of accepting any serializable object, we treat JSON and CSV specifically:
- For JSON, we build a serde `Visitor` that transforms the JSON straight into obkv without an intermediate representation.
- For CSV, we write the lines directly into the obkv, applying other optimizations as well.

Co-authored-by: marin postma <postma.marin@protonmail.com>
2021-10-26 09:55:28 +00:00
marin postma
baddd80069 implement review suggestions 2021-10-25 18:29:12 +02:00
marin postma
f9445c1d90 return float parsing error context in csv 2021-10-25 17:27:10 +02:00
Clémentine Urquizar
208903ddde Revert "Replacing pest with nom " 2021-10-25 11:58:00 +02:00