Commit Graph

600 Commits

Author SHA1 Message Date
07003704a8 Merge branch 'filter/field-exist' 2022-07-21 14:51:41 +02:00
cbb3b25459 Fix(Search): Fix phrase search candidates computation
This bug is an old bug but was hidden by the proximity criterion,
Phrase search were always returning an empty candidates list.

Before the fix, we were trying to find any words[n] near words[n]
instead of finding  any words[n] near words[n+1], for example:

for a phrase search '"Hello world"' we were searching for "hello" near "hello" first, instead of "hello" near "world".
2022-07-21 10:04:30 +02:00
941af58239 Merge #561
561: Enriched documents batch reader r=curquiza a=Kerollmops

~This PR is based on #555 and must be rebased on main after it has been merged to ease the review.~
This PR contains the work in #555 and can be merged on main as soon as reviewed and approved.

- [x] Create an `EnrichedDocumentsBatchReader` that contains the external documents id.
- [x] Extract the primary key name and make it accessible in the `EnrichedDocumentsBatchReader`.
- [x] Use the external id from the `EnrichedDocumentsBatchReader` in the `Transform::read_documents`.
- [x] Remove the `update_primary_key` from the _transform.rs_ file.
- [x] Really generate the auto-generated documents ids.
- [x] Insert the (auto-generated) document ids in the document while processing it in `Transform::read_documents`.

Co-authored-by: Kerollmops <clement@meilisearch.com>
2022-07-21 07:08:50 +00:00
d0eee5ff7a Fix compiler error 2022-07-19 13:54:30 +02:00
dc64170a69 Improve syntax of EXISTS filter, allow “value NOT EXISTS” 2022-07-19 10:07:33 +02:00
72452f0cb2 Implements the EXIST filter operator 2022-07-19 10:07:33 +02:00
2d79720f5d Update milli/src/search/matches/mod.rs 2022-07-18 17:48:04 +02:00
8ddb4e750b Update milli/src/search/matches/mod.rs 2022-07-18 17:47:39 +02:00
a277daa1f2 Update milli/src/search/matches/mod.rs 2022-07-18 17:47:13 +02:00
fb794c6b5e Update milli/src/search/matches/mod.rs 2022-07-18 17:46:00 +02:00
1237cfc249 Update milli/src/search/matches/mod.rs 2022-07-18 17:45:37 +02:00
d7fd5c58cd Update milli/src/search/matches/mod.rs 2022-07-18 17:45:06 +02:00
e261ef64d7 Update milli/src/search/matches/mod.rs
Co-authored-by: Clément Renault <clement@meilisearch.com>
2022-07-18 10:18:51 +02:00
1da4ab5918 Update milli/src/search/matches/mod.rs
Co-authored-by: Clément Renault <clement@meilisearch.com>
2022-07-18 10:18:03 +02:00
399eec5c01 Fix the indexation tests 2022-07-12 14:55:51 +02:00
e8297ad27e Fix the tests for the new DocumentsBatchBuilder/Reader 2022-07-12 14:52:56 +02:00
5d79617a56 Chores: Enhance smart-crop code comments 2022-07-07 16:28:09 +02:00
3b309f654a Fasten the document deletion
When a document deletion occurs, instead of deleting the document we mark it as deleted
in the new “soft deleted” bitmap. It is then removed from the search, and all the other
endpoints.
2022-07-05 15:30:33 +02:00
3ff03a3f5f Fix not equal filter when field contains both number and strings 2022-06-27 15:55:17 +03:00
d2f84a9d9e Improve the estimatedNbHits when distinct is enabled 2022-06-22 11:39:21 +02:00
a0ab90a4d7 Avoid having an ending separator before crop marker 2022-06-16 18:23:57 +02:00
f1d848bb9a Merge #552
552: Fix escaped quotes in filter r=Kerollmops a=irevoire

Will fix https://github.com/meilisearch/meilisearch/issues/2380

The issue was that in the evaluation of the filter, I was using the deref implementation instead of calling the `value` method of my token.

To avoid the problem happening again, I removed the deref implementation; now, you need to either call the `lexeme` or the `value` methods but can't rely on a « default » implementation to get a string out of a token.

Co-authored-by: Tamo <tamo@meilisearch.com>
2022-06-09 14:56:44 +00:00
90afde435b fix escaped quotes in filter 2022-06-09 16:03:49 +02:00
69931e50d2 Add the max_values_by_facet setting to the database 2022-06-08 17:54:56 +02:00
2a505503b3 Change the number of facet values returned by default to 100 2022-06-08 15:58:57 +02:00
bae4007447 Remove the hard limit on the number of facet values returned 2022-06-08 15:58:57 +02:00
d212dc6b8b Remove useless newline 2022-06-02 18:22:56 +02:00
7aabe42ae0 Refactor matching words 2022-06-02 17:59:04 +02:00
86ac8568e6 Use Charabia in milli 2022-06-02 16:59:11 +02:00
74d1914a64 Merge #535
535: Reintroduce the max values by facet limit r=ManyTheFish a=Kerollmops

This PR reintroduces the max values by facet limit this is related to https://github.com/meilisearch/meilisearch/issues/2349.

~I would like some help in deciding on whether I keep the default 100 max values in milli and set up the `FacetDistribution` settings in Meilisearch to use 1000 as the new value, I expose the `max_values_by_facet` for this purpose.~

I changed the default value to 1000 and the max to 10000, thank you `@ManyTheFish` for the help!

Co-authored-by: Kerollmops <clement@meilisearch.com>
2022-06-01 14:30:50 +00:00
25fc576696 review changes 2022-05-24 14:15:33 +02:00
69dc4de80f change &Option<Set> to Option<&Set> 2022-05-24 12:14:55 +02:00
ac975cc747 cache context's exact words 2022-05-24 09:43:17 +02:00
8993fec8a3 return optional exact words 2022-05-24 09:15:49 +02:00
cd7c6e19ed Reintroduce the max values by facet limit 2022-05-18 15:57:57 +02:00
137434a1c8 Add some implementation on MatchBounds 2022-05-17 15:57:09 +02:00
9db86aac51 Merge #518
518: Return facets even when there is no value associated to it r=Kerollmops a=Kerollmops

This PR is related to https://github.com/meilisearch/meilisearch/issues/2352 and should fix the issue when Meilisearch is up-to-date with this PR.

Co-authored-by: Kerollmops <clement@meilisearch.com>
2022-04-28 09:04:36 +00:00
7d1c2d97bf Return facets even when there is no values associated to it 2022-04-26 17:59:53 +02:00
5c29258e8e fix cargo warnings 2022-04-26 17:33:11 +02:00
ea4bb9402f Merge #483
483: Enhance matching words r=Kerollmops a=ManyTheFish

# Summary

Enhance milli word-matcher making it handle match computing and cropping.

# Implementation

## Computing best matches for cropping

Before we were considering that the first match of the attribute was the best one, this was accurate when only one word was searched but was missing the target when more than one word was searched.

Now we are searching for the best matches interval to crop around, the chosen interval is the one:
1) that have the highest count of unique matches
> for example, if we have a query `split the world`, then the interval `the split the split the` has 5 matches but only 2 unique matches (1 for `split` and 1 for `the`) where the interval `split of the world` has 3 matches and 3 unique matches. So the interval `split of the world` is considered better.
2) that have the minimum distance between matches
> for example, if we have a query `split the world`, then the interval `split of the world` has a distance of 3 (2 between `split` and `the`, and 1 between `the` and `world`) where the interval `split the world` has a distance of 2. So the interval `split the world` is considered better.
3) that have the highest count of ordered matches
> for example, if we have a query `split the world`, then the interval `the world split` has 2 ordered words where the interval `split the world` has 3. So the interval `split the world` is considered better.

## Cropping around the best matches interval

Before we were cropping around the interval without checking the context.

Now we are cropping around words in the same context as matching words.
This means that we will keep words that are farther from the matching words but are in the same phrase, than words that are nearer but separated by a dot.

> For instance, for the matching word `Split` the text:
`Natalie risk her future. Split The World is a book written by Emily Henry. I never read it.`
will be cropped like:
`…. Split The World is a book written by Emily Henry. …`
and  not like:
`Natalie risk her future. Split The World is a book …`


Co-authored-by: ManyTheFish <many@meilisearch.com>
2022-04-19 11:42:32 +00:00
f1115e274f Use Copy impl of FormatOption instead of clonning 2022-04-19 10:35:50 +02:00
dda28d7415 exclude excluded canditates from search result candidates 2022-04-13 12:10:35 +02:00
bbb6728d2f add distinct attributes to cli 2022-04-13 12:10:35 +02:00
5809d3ae0d Add first benchmarks on formatting 2022-04-12 16:31:58 +02:00
827cedcd15 Add format option structure 2022-04-12 13:42:14 +02:00
011f8210ed Make compute_matches more rust idiomatic 2022-04-12 10:19:02 +02:00
a16de5de84 Symplify format and remove intermediate function 2022-04-08 11:20:41 +02:00
a769e09dfa Make token_crop_bounds more rust idiomatic 2022-04-07 20:15:14 +02:00
c8ed1675a7 Add some documentation 2022-04-07 17:32:13 +02:00
b1905dfa24 Make split_best_frequency returns references instead of owned data 2022-04-07 17:05:44 +02:00