Commit Graph

109 Commits

Author SHA1 Message Date
2d88089129 Remove unused term matching strategies 2023-03-20 09:41:55 +01:00
8aa808d51b Merge branch 'main' into enhance-language-detection 2023-02-20 18:14:34 +01:00
119e6d8811 Update milli/src/search/mod.rs
Co-authored-by: Tamo <tamo@meilisearch.com>
2023-02-20 15:33:10 +01:00
7a38fe624f throw an error if the top left corner is found below the bottom right corner 2023-02-06 17:50:47 +01:00
0bc1a18f52 Use Languages list detected during indexing at search time 2023-02-01 18:57:43 +01:00
643d99e0f9 Add expectancy test 2023-02-01 18:39:54 +01:00
229405aeb9 Choose implementation strategy of criterion at runtime 2022-12-21 09:29:39 +01:00
55724f2412 Introduce an initial candidates set that makes the difference between an exhaustive count and an estimation 2022-12-08 09:41:34 +01:00
cb8442a119 Further unify facet databases of f64s and strings 2022-10-26 13:47:04 +02:00
e8a156d682 Reorganise facets database indexing code 2022-10-26 13:46:46 +02:00
c3f49f766d Prepare refactor of facets database
2022-10-26 13:46:14 +02:00
f11a4087da Merge #665
665: Fixing piles of clippy errors. r=ManyTheFish a=ehiggs

## Related issue
No issue fixed. Simply cleaning up some code for clippy on the march towards a clean build when #659 is merged.

## What does this PR do?
Most of these are calling clone when the struct supports Copy.

Many are using & and &mut on `self` when the function they are called from already has an immutable or mutable borrow so this isn't needed.

I tried to stay away from actual changes or places where I'd have to name fresh variables.

## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?

Co-authored-by: Ewan Higgs <ewan.higgs@gmail.com>
2022-10-20 07:19:46 +00:00
6f55e7844c Add some code comments 2022-10-17 14:41:57 +02:00
d71bc1e69f Compute an exact count when using distinct 2022-10-17 14:13:44 +02:00
a396806343 Add settings to force milli to exhaustively compute the total number of hits 2022-10-17 14:13:44 +02:00
beb987d3d1 Fixing piles of clippy errors.
Most of these are calling clone when the struct supports Copy.

Many are using & and &mut on `self` when the function they are called
from already has an immutable or mutable borrow so this isn't needed.

I tried to stay away from actual changes or places where I'd have to
name fresh variables.
2022-10-13 22:02:54 +02:00
5391e3842c replace optional_words by term_matching_strategy 2022-08-22 17:47:19 +02:00
9640976c79 Rename TermMatchingPolicies 2022-08-18 17:36:08 +02:00
3b309f654a Fasten the document deletion
When a document deletion occurs, instead of deleting the document we mark it as deleted
in the new “soft deleted” bitmap. It is then removed from the search, and all the other
endpoints.
2022-07-05 15:30:33 +02:00
d2f84a9d9e Improve the estimatedNbHits when distinct is enabled 2022-06-22 11:39:21 +02:00
69931e50d2 Add the max_values_by_facet setting to the database 2022-06-08 17:54:56 +02:00
86ac8568e6 Use Charabia in milli 2022-06-02 16:59:11 +02:00
ac975cc747 cache context's exact words 2022-05-24 09:43:17 +02:00
ea4bb9402f Merge #483
483: Enhance matching words r=Kerollmops a=ManyTheFish

# Summary

Enhance milli word-matcher making it handle match computing and cropping.

# Implementation

## Computing best matches for cropping

Previously, we considered the first match of the attribute to be the best one. This was accurate when only one word was searched, but missed the target when more than one word was searched.

Now we search for the best interval of matches to crop around. The chosen interval is the one:
1) that has the highest count of unique matches
> For example, given the query `split the world`, the interval `the split the split the` has 5 matches but only 2 unique matches (1 for `split` and 1 for `the`), whereas the interval `split of the world` has 3 matches and 3 unique matches. So the interval `split of the world` is considered better.
2) that has the minimum distance between matches
> For example, given the query `split the world`, the interval `split of the world` has a distance of 3 (2 between `split` and `the`, and 1 between `the` and `world`), whereas the interval `split the world` has a distance of 2. So the interval `split the world` is considered better.
3) that has the highest count of ordered matches
> For example, given the query `split the world`, the interval `the world split` has 2 ordered words, whereas the interval `split the world` has 3. So the interval `split the world` is considered better.
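
The three criteria above can be read as a lexicographic comparison. The following is only a minimal sketch of that reading, not milli's actual matcher: the `Match` type, the `interval_score` function, and the exact way ordered matches are counted (one for the first match, plus one for each consecutive pair that follows the query order) are assumptions made for illustration.

```rust
use std::cmp::Reverse;
use std::collections::HashSet;

/// Hypothetical representation of one matched word inside a candidate interval.
#[derive(Debug, Clone, Copy)]
struct Match {
    /// Index of the query word that matched (0 for `split`, 1 for `the`, 2 for `world`).
    query_idx: usize,
    /// Position of the matched word in the text, counted in words.
    position: usize,
}

/// Scores an interval whose matches are assumed to be sorted by position.
/// Tuples compare lexicographically, so a greater score means a better interval.
fn interval_score(matches: &[Match]) -> (usize, Reverse<usize>, usize) {
    // 1) Count of unique query words matched (higher is better).
    let unique = matches
        .iter()
        .map(|m| m.query_idx)
        .collect::<HashSet<_>>()
        .len();

    // 2) Sum of distances between consecutive matches (lower is better,
    //    hence the `Reverse` wrapper in the returned tuple).
    let distance: usize = matches
        .windows(2)
        .map(|w| w[1].position.abs_diff(w[0].position))
        .sum();

    // 3) Count of ordered matches (assumed definition): the first match counts
    //    for one, then each match that directly follows the previous query word.
    let ordered = if matches.is_empty() {
        0
    } else {
        1 + matches
            .windows(2)
            .filter(|w| w[1].query_idx == w[0].query_idx + 1)
            .count()
    };

    (unique, Reverse(distance), ordered)
}
```

With such a score, the best interval among several candidates could be picked with `max_by_key`, since the tuple already encodes the priority of the three criteria.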

## Cropping around the best matches interval

Previously, we cropped around the interval without checking the context.

Now we crop around words that are in the same context as the matching words.
This means that we prefer words that are farther from the matching words but in the same phrase over words that are nearer but separated by a period.

> For instance, for the matching word `Split` the text:
`Natalie risk her future. Split The World is a book written by Emily Henry. I never read it.`
will be cropped like:
`…. Split The World is a book written by Emily Henry. …`
and not like:
`Natalie risk her future. Split The World is a book …`
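
The context rule can be illustrated with a much simplified sketch. It assumes that a context is a sentence ending with a period and ignores the crop length entirely; the `sentence_containing` helper is hypothetical and is not the cropping code introduced by this PR.

```rust
/// Returns the sentence (delimited by `.`) that contains `word`, if any.
/// The comparison is case-sensitive and ignores surrounding punctuation.
fn sentence_containing<'a>(text: &'a str, word: &str) -> Option<&'a str> {
    text.split_inclusive('.')
        // Each piece keeps its closing period; drop the surrounding whitespace.
        .map(str::trim)
        .find(|sentence| {
            sentence
                .split_whitespace()
                .any(|w| w.trim_matches(|c: char| !c.is_alphanumeric()) == word)
        })
}
```

Applied to the example text above with the matching word `Split`, this selects the sentence `Split The World is a book written by Emily Henry.` as the context to crop around, rather than a window that crosses into the preceding sentence.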


Co-authored-by: ManyTheFish <many@meilisearch.com>
2022-04-19 11:42:32 +00:00
dda28d7415 exclude excluded candidates from search result candidates 2022-04-13 12:10:35 +02:00
bbb6728d2f add distinct attributes to cli 2022-04-13 12:10:35 +02:00
5809d3ae0d Add first benchmarks on formatting 2022-04-12 16:31:58 +02:00
827cedcd15 Add format option structure 2022-04-12 13:42:14 +02:00
4f3ce6d9cd nested fields 2022-04-07 16:58:46 +02:00
3bb1e35ada Fix match count 2022-04-05 17:48:45 +02:00
b3f0f39106 Make some cleaning 2022-04-05 17:41:32 +02:00
734d0899d3 Publish Matcher 2022-04-05 17:41:32 +02:00
d96e72e5dc Create formatter with some tests 2022-04-05 17:41:32 +02:00
9fe40df960 add word derivations tests 2022-04-01 11:05:18 +02:00
d5ddc6b080 fix 2 typos word derivation bug 2022-04-01 10:51:22 +02:00
6ef3bb9d83 fmt 2022-03-31 14:06:23 +02:00
f782fe2062 add authorize_typo_test 2022-03-31 10:08:39 +02:00
c4653347fd add authorize typo setting 2022-03-31 10:05:44 +02:00
3f24555c3d custom fst automatons 2022-03-15 17:38:35 +01:00
628c835a22 fix tests 2022-03-15 17:38:34 +01:00
7541ab99cd review changes 2022-02-02 12:59:01 +01:00
d0aabde502 optimize 2 typos case 2022-02-02 12:56:09 +01:00
55e6cb9c7b typos on first letter counts as 2 2022-02-02 12:56:09 +01:00
6831c23449 merge with main 2021-11-06 16:34:30 +01:00
a58bc5bebb update milli with the new parser_filter 2021-11-04 15:02:36 +01:00
ed6db19681 Fix PR comments 2021-10-28 11:18:32 +02:00
208903ddde Revert "Replacing pest with nom " 2021-10-25 11:58:00 +02:00
e25ca9776f start updating the exposed function to makes other modules happy 2021-10-22 17:23:22 +02:00
c27870e765 integrate a first version without any error handling 2021-10-22 14:33:18 +02:00
01dedde1c9 update some names and move some parser out of the lib.rs 2021-10-22 01:59:38 +02:00