Commit Graph

643 Commits

Author SHA1 Message Date
e1cc025cbd Merge #440
440: fix(fuzzer): fix the fuzzer after #430 r=Kerollmops a=irevoire



Co-authored-by: Tamo <tamo@meilisearch.com>
2022-01-25 16:33:57 +00:00
fb51d511be fix(fuzzer): fix the fuzzer after #430 2022-01-25 12:08:47 +01:00
9f2ff71581 Merge #434
434: bump milli to v0.22.0 r=curquiza a=irevoire

This is breaking because of this PR:
98a365aaae

Should we create a special branch to release only the [patch](https://github.com/meilisearch/milli/pull/433) for https://github.com/meilisearch/MeiliSearch/issues/2082 (which is non-breaking)?

Co-authored-by: Tamo <tamo@meilisearch.com>
2022-01-24 17:31:20 +00:00
fd177b63f8 Merge #423
423: Remove an unused file r=irevoire a=irevoire

This empty file is not included anywhere

Co-authored-by: Tamo <tamo@meilisearch.com>
2022-01-19 14:18:05 +00:00
0c84a40298 document batch support
Squashed sub-commits:
- reusable transform
- rework update api
- add indexer config
- fix tests
- review changes
- fmt

Co-authored-by: Clément Renault <clement@meilisearch.com>
2022-01-19 12:40:20 +01:00
01968d7ca7 ensure we get no documents and no error when filtering on an empty db 2022-01-18 11:40:30 +01:00
367f403693 bump milli 2022-01-17 16:41:34 +01:00
8f4499090b Merge #433
433: fix(filter): Fix two bugs. r=Kerollmops a=irevoire

- Stop lowercasing the field when looking in the field id map
- When a field id does not exist, it means there are currently zero documents
  containing this field, so we return an empty RoaringBitmap
  instead of throwing an internal error

Will fix https://github.com/meilisearch/MeiliSearch/issues/2082 once meilisearch is released

Co-authored-by: Tamo <tamo@meilisearch.com>
2022-01-17 14:06:53 +00:00
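
A minimal sketch, with hypothetical names (`docids_for_field`, `field_id_map`, `docids_by_field_id`), of the two behaviors described in #433 above; it illustrates the fix, not milli's actual filter code:

```rust
use std::collections::HashMap;

use roaring::RoaringBitmap;

/// Resolve the documents containing `field`.
/// - The field name is looked up as-is, without lowercasing it first.
/// - An unknown field means that zero documents currently contain it,
///   so an empty bitmap is returned instead of an internal error.
fn docids_for_field(
    field_id_map: &HashMap<String, u16>,
    docids_by_field_id: &HashMap<u16, RoaringBitmap>,
    field: &str,
) -> RoaringBitmap {
    match field_id_map.get(field) {
        Some(fid) => docids_by_field_id
            .get(fid)
            .cloned()
            .unwrap_or_else(RoaringBitmap::new),
        None => RoaringBitmap::new(),
    }
}
```
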
4c516c00da Merge #426
426: Fix search highlight for non-unicode chars r=ManyTheFish a=Samyak2

# Pull Request

## What does this PR do?
Fixes https://github.com/meilisearch/MeiliSearch/issues/1480

## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?

## Changes

The `matching_bytes` function takes a `&Token` now and:
- gets the number of bytes to highlight (unchanged).
- uses `Token.num_graphemes_from_bytes` to get the number of grapheme clusters to highlight.

In essence, the `matching_bytes` function now returns the number of matching grapheme clusters instead of bytes.

Added proper highlighting in the HTTP UI:
- requires dependency on `unicode-segmentation` to extract grapheme clusters from tokens
- `<mark>` tag is put around only the matched part
    - before this change, the entire word was highlighted even if only a part of it matched

## Questions

Since `matching_bytes` no longer returns a number of bytes but a number of grapheme clusters, should it be renamed to something like `matching_chars` or `matching_graphemes`? Will this break the API?

Thank you very much `@ManyTheFish` for helping 😄 

Co-authored-by: Samyak S Sarnayak <samyak201@gmail.com>
2022-01-17 13:39:00 +00:00
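
A hedged sketch of the highlighting change described in #426, assuming the `unicode-segmentation` crate; `highlight_word` is a hypothetical helper, and `matching_graphemes` stands for the count now returned by `matching_bytes`:

```rust
use unicode_segmentation::UnicodeSegmentation;

/// Wrap only the matched prefix of `word` in a <mark> tag.
/// `matching_graphemes` is the number of grapheme clusters that matched.
fn highlight_word(word: &str, matching_graphemes: usize) -> String {
    // Byte offset where the matched grapheme clusters end.
    let byte_end = word
        .grapheme_indices(true)
        .nth(matching_graphemes)
        .map(|(i, _)| i)
        .unwrap_or(word.len());

    format!("<mark>{}</mark>{}", &word[..byte_end], &word[byte_end..])
}

// Only the matched part is highlighted, e.g.:
// assert_eq!(highlight_word("Gödel", 2), "<mark>Gö</mark>del");
```
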
d1ac40ea14 fix(filter): Fix two bugs.
- Stop lowercasing the field when looking in the field id map
- When a field id does not exist, it means there are currently zero documents
  containing this field, so we return an empty RoaringBitmap
  instead of throwing an internal error
2022-01-17 13:51:46 +01:00
2d7607734e Run cargo fmt on matching_words.rs 2022-01-17 13:04:33 +05:30
5ab505be33 Fix highlight by replacing num_graphemes_from_bytes
num_graphemes_from_bytes has been renamed in the tokenizer to
num_chars_from_bytes.

Highlight now works correctly!
2022-01-17 13:02:55 +05:30
c10f58b7bd Update tokenizer to v0.2.7 2022-01-17 13:02:00 +05:30
e752bd06f7 Fix matching_words tests to compile successfully
The tests still fail due to a bug in https://github.com/meilisearch/tokenizer/pull/59
2022-01-17 11:37:45 +05:30
30247d70cd Fix search highlight for non-unicode chars
The `matching_bytes` function takes a `&Token` now and:
- gets the number of bytes to highlight (unchanged).
- uses `Token.num_graphemes_from_bytes` to get the number of grapheme
  clusters to highlight.

In essence, the `matching_bytes` function returns the number of matching
grapheme clusters instead of bytes. Should this function be renamed
then?

Added proper highlighting in the HTTP UI:
- requires dependency on `unicode-segmentation` to extract grapheme
  clusters from tokens
- `<mark>` tag is put around only the matched part
    - before this change, the entire word was highlighted even if only a
      part of it matched
2022-01-17 11:37:44 +05:30
0605c0ac68 apply review comments 2022-01-13 18:51:08 +01:00
b22c80106f add some settings to the fuzzed milli and use the published version of arbitrary json 2022-01-13 15:35:24 +01:00
c94952e25d update the readme + dependencies 2022-01-12 18:30:11 +01:00
e1053989c0 add a fuzzer on milli 2022-01-12 17:57:54 +01:00
98a365aaae store the geopoint in three dimensions 2021-12-14 12:21:24 +01:00
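
A hedged illustration of what storing a geopoint in three dimensions can mean: latitude and longitude in degrees projected onto the unit sphere. This is a common projection, not necessarily the exact formula used in the commit above.

```rust
/// Convert (lat, lng) in degrees into a point on the unit sphere, so that
/// distances behave correctly around the poles and the antimeridian.
fn lat_lng_to_xyz(lat: f64, lng: f64) -> [f64; 3] {
    let (lat, lng) = (lat.to_radians(), lng.to_radians());
    [
        lat.cos() * lng.cos(), // x
        lat.cos() * lng.sin(), // y
        lat.sin(),             // z
    ]
}
```
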
d671d6f0f1 remove an unused file 2021-12-13 19:27:34 +01:00
25faef67d0 Remove the database setup in the filter_depth test 2021-12-09 11:57:53 +01:00
65519bc04b Test that empty filters return a None 2021-12-09 11:57:53 +01:00
ef59762d8e Prefer returning None instead of the Empty Filter state 2021-12-09 11:57:52 +01:00
ee856a7a46 Limit the max filter depth to 2000 2021-12-07 17:36:45 +01:00
32bd9f091f Detect the filters that are too deep and return an error 2021-12-07 17:20:11 +01:00
90f49eab6d Check the filter max depth limit and reject the invalid ones 2021-12-07 16:32:48 +01:00
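
A minimal sketch of the depth check behind the three commits above, using a hypothetical, simplified `FilterCondition` AST and a `MAX_FILTER_DEPTH` constant of 2000; the real parser likely tracks depth while parsing rather than after building the tree:

```rust
/// Hypothetical, simplified filter AST used only for illustration.
enum FilterCondition {
    Leaf,
    And(Box<FilterCondition>, Box<FilterCondition>),
    Or(Box<FilterCondition>, Box<FilterCondition>),
}

const MAX_FILTER_DEPTH: usize = 2000;

/// Reject filter trees that are nested too deeply.
fn check_depth(filter: &FilterCondition) -> Result<usize, String> {
    fn depth(f: &FilterCondition) -> usize {
        match f {
            FilterCondition::Leaf => 1,
            FilterCondition::And(l, r) | FilterCondition::Or(l, r) => {
                1 + depth(l).max(depth(r))
            }
        }
    }

    let d = depth(filter);
    if d > MAX_FILTER_DEPTH {
        Err(format!("filter is too deep: {} levels (max {})", d, MAX_FILTER_DEPTH))
    } else {
        Ok(d)
    }
}
```
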
1b3923b5ce Update all packages to 0.21.0 2021-11-29 12:17:59 +01:00
8970246bc4 Sort positions before iterating over them during word pair proximity extraction 2021-11-22 18:16:54 +01:00
6e977dd8e8 change visibility of DocumentDeletionResult 2021-11-22 15:44:44 +01:00
35f9499638 Export tokenizer from milli 2021-11-18 16:57:12 +01:00
64ef5869d7 Update tokenizer v0.2.6 2021-11-18 16:56:05 +01:00
6eb47ab792 remove update_id in UpdateBuilder 2021-11-16 13:07:04 +01:00
09b4281cff improve document addition returned meta 2021-11-10 14:08:36 +01:00
721fc294be improve document deletion returned meta
returns both the remaining number of documents and the number of deleted
documents.
2021-11-10 14:08:18 +01:00
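
The deletion metadata described above might look like the following sketch; the field names are assumed for illustration, not copied from milli's actual `DocumentDeletionResult`:

```rust
/// Hypothetical shape of the metadata returned after deleting documents:
/// how many were deleted and how many remain in the index.
#[derive(Debug, Clone, Copy)]
pub struct DocumentDeletionResult {
    pub deleted_documents: u64,
    pub remaining_documents: u64,
}
```
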
f28600031d Rename the filter_parser crate into filter-parser
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-11-09 16:41:10 +01:00
0ea0146e04 implement deref &str on the tokens 2021-11-09 11:34:10 +01:00
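
A hypothetical sketch of what implementing `Deref<Target = str>` on a parser token can look like; the real `Token` type in filter-parser is more involved and is not reproduced here:

```rust
use std::ops::Deref;

/// Simplified stand-in for the parser's token type.
struct Token<'a> {
    inner: &'a str,
}

impl<'a> Deref for Token<'a> {
    type Target = str;

    fn deref(&self) -> &Self::Target {
        self.inner
    }
}

// With Deref, a token can be passed wherever a &str is expected,
// e.g. `some_fn(&token)` where `some_fn` takes `&str`.
```
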
7483c7513a fix the filterable fields 2021-11-07 01:52:19 +01:00
e5af3ac65c rename the filter_condition.rs to filter.rs 2021-11-06 16:37:55 +01:00
6831c23449 merge with main 2021-11-06 16:34:30 +01:00
b249989bef fix most of the tests 2021-11-06 01:32:12 +01:00
27a6a26b4b makes the parse function part of the filter_parser 2021-11-05 10:46:54 +01:00
76d961cc77 implements the last errors 2021-11-04 17:42:06 +01:00
8234f9fdf3 recreate most filter error except for the geosearch 2021-11-04 17:24:55 +01:00
07a5ffb04c update http-ui 2021-11-04 15:52:22 +01:00
a58bc5bebb update milli with the new parser_filter 2021-11-04 15:02:36 +01:00
743ed9f57f Bump milli version 2021-11-04 14:04:21 +01:00
7b3bac46a0 Change Attribute and Ranking rules errors 2021-11-04 13:19:32 +01:00
702589104d Update version for the next release (v0.20.1) 2021-11-03 14:20:01 +01:00
0c0038488c Change last error messages 2021-11-03 11:24:06 +01:00