Commit Graph

371 Commits

Author SHA1 Message Date
74d1914a64 Merge #535
535: Reintroduce the max values by facet limit r=ManyTheFish a=Kerollmops

This PR reintroduces the max values by facet limit. It is related to https://github.com/meilisearch/meilisearch/issues/2349.

~I would like some help deciding whether to keep the default of 100 max values in milli and set up the `FacetDistribution` settings in Meilisearch to use 1000 as the new value; I expose `max_values_by_facet` for this purpose.~

I changed the default value to 1000 and the max to 10000, thank you `@ManyTheFish` for the help!
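
As a rough illustration of how a default of 1000 and a maximum of 10000 might interact, here is a minimal sketch. The constant names and the `effective_max_values_by_facet` helper are hypothetical, and clamping oversized requests to the maximum is an assumption rather than something stated in this PR:

```rust
/// Hypothetical constants mirroring the values mentioned above.
const DEFAULT_VALUES_PER_FACET: usize = 1000;
const MAX_VALUES_PER_FACET: usize = 10000;

/// Resolves the limit to apply: falls back to the default when nothing is
/// requested, and clamps to the maximum (the clamping is an assumption).
fn effective_max_values_by_facet(requested: Option<usize>) -> usize {
    requested
        .unwrap_or(DEFAULT_VALUES_PER_FACET)
        .min(MAX_VALUES_PER_FACET)
}
```

Under these assumptions, an unset limit resolves to 1000 and a request for 50000 values resolves to 10000.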

Co-authored-by: Kerollmops <clement@meilisearch.com>
2022-06-01 14:30:50 +00:00
25fc576696 review changes 2022-05-24 14:15:33 +02:00
69dc4de80f change &Option<Set> to Option<&Set> 2022-05-24 12:14:55 +02:00
ac975cc747 cache context's exact words 2022-05-24 09:43:17 +02:00
8993fec8a3 return optional exact words 2022-05-24 09:15:49 +02:00
cd7c6e19ed Reintroduce the max values by facet limit 2022-05-18 15:57:57 +02:00
137434a1c8 Add some implementation on MatchBounds 2022-05-17 15:57:09 +02:00
9db86aac51 Merge #518
518: Return facets even when there is no value associated to it r=Kerollmops a=Kerollmops

This PR is related to https://github.com/meilisearch/meilisearch/issues/2352 and should fix the issue once Meilisearch is updated to include this PR.

Co-authored-by: Kerollmops <clement@meilisearch.com>
2022-04-28 09:04:36 +00:00
7d1c2d97bf Return facets even when there is no values associated to it 2022-04-26 17:59:53 +02:00
5c29258e8e fix cargo warnings 2022-04-26 17:33:11 +02:00
ea4bb9402f Merge #483
483: Enhance matching words r=Kerollmops a=ManyTheFish

# Summary

Enhance the milli word matcher so that it handles both match computation and cropping.

# Implementation

## Computing best matches for cropping

Previously, we considered the first match in the attribute to be the best one. This was accurate when only one word was searched, but it missed the target when more than one word was searched.

Now we search for the best interval of matches to crop around. The chosen interval is the one:
1) that has the highest count of unique matches
> for example, if we have a query `split the world`, then the interval `the split the split the` has 5 matches but only 2 unique matches (1 for `split` and 1 for `the`), whereas the interval `split of the world` has 3 matches and 3 unique matches. So the interval `split of the world` is considered better.
2) that has the minimum distance between matches
> for example, if we have a query `split the world`, then the interval `split of the world` has a distance of 3 (2 between `split` and `the`, and 1 between `the` and `world`), whereas the interval `split the world` has a distance of 2. So the interval `split the world` is considered better.
3) that has the highest count of ordered matches
> for example, if we have a query `split the world`, then the interval `the world split` has 2 ordered words, whereas the interval `split the world` has 3. So the interval `split the world` is considered better.
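
The three criteria above can be sketched as a score tuple compared lexicographically. This is a simplified illustration, not the code from this PR: `Match`, `find_matches`, and `interval_score` are hypothetical names, tokenization is plain whitespace splitting with exact matching, and counting ordered words as "one plus the number of adjacent pairs in query order" is one interpretation chosen to agree with the examples.

```rust
/// One matched token: its position in the interval and the index of the
/// query word it matched. (Hypothetical type for this sketch.)
#[derive(Debug, Clone, Copy)]
struct Match {
    word_position: usize,
    query_index: usize,
}

/// Finds query-word matches in a whitespace-tokenized interval
/// (exact, case-sensitive comparison, which is a simplification).
fn find_matches(query: &[&str], interval: &str) -> Vec<Match> {
    interval
        .split_whitespace()
        .enumerate()
        .filter_map(|(word_position, word)| {
            query
                .iter()
                .position(|q| *q == word)
                .map(|query_index| Match { word_position, query_index })
        })
        .collect()
}

/// Returns (unique matches, negated distance, ordered words).
/// The distance is negated so that a larger tuple always means a better interval.
fn interval_score(matches: &[Match]) -> (usize, i64, usize) {
    // 1) count the distinct query words that were matched
    let mut ids: Vec<usize> = matches.iter().map(|m| m.query_index).collect();
    ids.sort_unstable();
    ids.dedup();
    let unique = ids.len();

    let mut distance = 0i64;
    let mut ordered = if matches.is_empty() { 0 } else { 1 };
    for pair in matches.windows(2) {
        // 2) sum the gaps between consecutive matches
        distance += (pair[1].word_position - pair[0].word_position) as i64;
        // 3) count consecutive matches that follow the query order
        if pair[1].query_index > pair[0].query_index {
            ordered += 1;
        }
    }
    (unique, -distance, ordered)
}
```

With the query `split the world`, this sketch scores `split of the world` as `(3, -3, 3)`, `split the world` as `(3, -2, 3)`, and `the world split` as `(3, -2, 2)`, so tuple comparison ranks `split the world` first, as in the examples.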

## Cropping around the best matches interval

Previously, we cropped around the interval without checking the context.

Now we crop around words that are in the same context as the matching words.
This means that we prefer to keep words that are farther from the matching words but in the same phrase over words that are nearer but separated by a dot.

> For instance, for the matching word `Split` the text:
`Natalie risk her future. Split The World is a book written by Emily Henry. I never read it.`
will be cropped like:
`…. Split The World is a book written by Emily Henry. …`
and not like:
`Natalie risk her future. Split The World is a book …`
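
A minimal sketch of this context-aware cropping, under simplifying assumptions: words are split on whitespace, a word ending with `.` closes a phrase, the window grows to the right before the left, and the `…` crop markers are left out. The `crop_in_context` function is hypothetical and is not the implementation from this PR.

```rust
/// Grows a window of `crop_size` words around the word at `match_index`,
/// preferring words from the same phrase before crossing a dot.
fn crop_in_context(text: &str, match_index: usize, crop_size: usize) -> String {
    let words: Vec<&str> = text.split_whitespace().collect();
    if words.is_empty() || match_index >= words.len() || crop_size == 0 {
        return String::new();
    }
    // A word that ends with a dot closes its phrase (simplifying assumption).
    let ends_sentence = |i: usize| words[i].ends_with('.');

    let (mut start, mut end) = (match_index, match_index);
    while end - start + 1 < crop_size {
        let can_right = end + 1 < words.len();
        let can_left = start > 0;
        if can_right && !ends_sentence(end) {
            // the next word is still in the same phrase
            end += 1;
        } else if can_left && !ends_sentence(start - 1) {
            // the previous word is still in the same phrase
            start -= 1;
        } else if can_right {
            // the phrase is exhausted: cross the dot on the right
            end += 1;
        } else if can_left {
            // nothing left on the right: cross the dot on the left
            start -= 1;
        } else {
            break;
        }
    }
    words[start..=end].join(" ")
}
```

For the text above, with the match on `Split` (word index 4) and a crop size of 10 words, this sketch returns `Split The World is a book written by Emily Henry.` rather than a window that starts at `Natalie`.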


Co-authored-by: ManyTheFish <many@meilisearch.com>
2022-04-19 11:42:32 +00:00
f1115e274f Use Copy impl of FormatOption instead of cloning 2022-04-19 10:35:50 +02:00
dda28d7415 exclude excluded candidates from search result candidates 2022-04-13 12:10:35 +02:00
bbb6728d2f add distinct attributes to cli 2022-04-13 12:10:35 +02:00
5809d3ae0d Add first benchmarks on formatting 2022-04-12 16:31:58 +02:00
827cedcd15 Add format option structure 2022-04-12 13:42:14 +02:00
011f8210ed Make compute_matches more rust idiomatic 2022-04-12 10:19:02 +02:00
a16de5de84 Simplify format and remove intermediate function 2022-04-08 11:20:41 +02:00
a769e09dfa Make token_crop_bounds more rust idiomatic 2022-04-07 20:15:14 +02:00
c8ed1675a7 Add some documentation 2022-04-07 17:32:13 +02:00
b1905dfa24 Make split_best_frequency return references instead of owned data 2022-04-07 17:05:44 +02:00
4f3ce6d9cd nested fields 2022-04-07 16:58:46 +02:00
fa7d3a37c0 Make some cleaning and add comments 2022-04-05 17:48:56 +02:00
3bb1e35ada Fix match count 2022-04-05 17:48:45 +02:00
56e0edd621 Put crop markers directly around words 2022-04-05 17:41:32 +02:00
a93cd8c61c Fix prefix highlight with special chars 2022-04-05 17:41:32 +02:00
b3f0f39106 Make some cleaning 2022-04-05 17:41:32 +02:00
6dc345bc53 Test and Fix prefix highlight 2022-04-05 17:41:32 +02:00
bd30ee97b8 Keep separators at start of the cropped string 2022-04-05 17:41:32 +02:00
29c5f76d7f Use new matcher in http-ui 2022-04-05 17:41:32 +02:00
734d0899d3 Publish Matcher 2022-04-05 17:41:32 +02:00
4428cb5909 Add some tests and fix some corner cases 2022-04-05 17:41:32 +02:00
844f546a8b Add matches algorithm V1 2022-04-05 17:41:32 +02:00
3be1790803 Add crop algorithm with naive match algorithm 2022-04-05 17:41:32 +02:00
d96e72e5dc Create formatter with some tests 2022-04-05 17:41:32 +02:00
6b2c2509b2 fix bug in exact search 2022-04-04 20:54:03 +02:00
56b4f5dce2 add exact prefix to query_docids 2022-04-04 20:54:03 +02:00
21ae4143b1 add exact_word_prefix to Context 2022-04-04 20:54:03 +02:00
c4c6e35352 query exact_word_docids in resolve_query_tree 2022-04-04 20:54:02 +02:00
c882d8daf0 add test for exact words 2022-04-04 20:54:01 +02:00
7e9d56a9e7 disable typos on exact words 2022-04-04 20:54:01 +02:00
0fd55db21c fmt 2022-04-04 20:10:55 +02:00
559e46be5e fix bad rebase bug 2022-04-04 20:10:55 +02:00
8b1e5d9c6d add test for exact words 2022-04-04 20:10:55 +02:00
774fa8f065 disable typos on exact words 2022-04-04 20:10:55 +02:00
853b4a520f fmt 2022-04-04 10:41:46 +02:00
fdaf45aab2 replace hardcoded value with constant in TestContext 2022-04-04 10:41:46 +02:00
950a740bd4 refactor typos for readability 2022-04-04 10:41:46 +02:00
66020cd923 rename min_word_len* to use plain letter numbers 2022-04-04 10:41:46 +02:00
286dd7b2e4 rename min_word_len_2_typo 2022-04-01 11:17:03 +02:00