Commit Graph

112 Commits

SHA1 Message Date
334098a7e0 Add index snapshot test helper function 2022-08-10 15:53:46 +02:00
07003704a8 Merge branch 'filter/field-exist' 2022-07-21 14:51:41 +02:00
30bd4db0fc Simplify indexing task for facet_exists_docids database 2022-07-19 10:07:33 +02:00
fcfc4caf8c Move the Object type in the lib.rs file and use it everywhere 2022-07-12 14:55:51 +02:00
69931e50d2 Add the max_values_by_facet setting to the database 2022-06-08 17:54:56 +02:00
86ac8568e6 Use Charabia in milli 2022-06-02 16:59:11 +02:00
ea4bb9402f Merge #483
483: Enhance matching words r=Kerollmops a=ManyTheFish

# Summary

Enhance milli word-matcher making it handle match computing and cropping.

# Implementation

## Computing best matches for cropping

Previously, we considered the first match in the attribute to be the best one. This was accurate when only one word was searched, but missed the target when the query contained several words.

Now we search for the best interval of matches to crop around. The chosen interval is the one:
1) that has the highest count of unique matches
> for example, with the query `split the world`, the interval `the split the split the` has 5 matches but only 2 unique matches (1 for `split` and 1 for `the`), whereas the interval `split of the world` has 3 matches and 3 unique matches. So the interval `split of the world` is considered better.
2) that has the smallest distance between matches
> for example, with the query `split the world`, the interval `split of the world` has a distance of 3 (2 between `split` and `the`, and 1 between `the` and `world`), whereas the interval `split the world` has a distance of 2. So the interval `split the world` is considered better.
3) that has the highest count of ordered matches
> for example, with the query `split the world`, the interval `the world split` has 2 ordered words, whereas the interval `split the world` has 3. So the interval `split the world` is considered better.
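The three criteria above form a lexicographic ranking: compare unique matches first, break ties on distance, then on ordered matches. A minimal Rust sketch of that ranking follows. The `Match` struct and the `score` and `best_interval` functions are hypothetical names, not milli's actual API, and counting ordered matches as "1 plus the number of consecutive steps where the query index increases" is an assumption that agrees with the `the world split` example.

```rust
use std::cmp::Reverse;
use std::collections::HashSet;

/// Hypothetical match: token position in the attribute and the index, in the
/// query, of the word it matched. Positions are assumed to be sorted ascending.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Match {
    pub position: usize,
    pub query_index: usize,
}

impl Match {
    pub fn new(position: usize, query_index: usize) -> Self {
        Match { position, query_index }
    }
}

/// Ranking key of an interval: more unique matches first, then a smaller total
/// distance between consecutive matches, then more ordered matches.
pub fn score(interval: &[Match]) -> (usize, Reverse<usize>, usize) {
    let unique = interval
        .iter()
        .map(|m| m.query_index)
        .collect::<HashSet<_>>()
        .len();
    let distance: usize = interval
        .windows(2)
        .map(|w| w[1].position - w[0].position)
        .sum();
    // Assumption: count 1 for the first match, plus 1 for every consecutive
    // pair whose query index increases.
    let ordered = if interval.is_empty() {
        0
    } else {
        1 + interval
            .windows(2)
            .filter(|w| w[1].query_index > w[0].query_index)
            .count()
    };
    (unique, Reverse(distance), ordered)
}

/// Picks the candidate interval with the highest `score`.
pub fn best_interval<'a>(candidates: &[&'a [Match]]) -> Option<&'a [Match]> {
    candidates.iter().copied().max_by_key(|c| score(c))
}
```

Wrapping the distance in `Reverse` lets a single tuple comparison express "higher is better" for two criteria and "lower is better" for the other. With the query `split the world` encoded as `split = 0`, `the = 1`, `world = 2`, the sketch prefers `split the world` over both `split of the world` and `the world split`, as in the examples.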

## Cropping around the best matches interval

Previously, we cropped around the interval without considering the context.

Now we crop around the words that share the context of the matching words.
This means we prefer keeping words that are farther from the matching words but in the same sentence over words that are closer but separated by a period.

> For instance, for the matching word `Split`, the text:
`Natalie risk her future. Split The World is a book written by Emily Henry. I never read it.`
will be cropped like:
`…. Split The World is a book written by Emily Henry. …`
and not like:
`Natalie risk her future. Split The World is a book …`
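A deliberately simplified Rust sketch of the sentence-aware idea: it keeps the whole sentence containing the first occurrence of the matching word and replaces the text removed before and after it with `…`. The function name `crop_to_sentence` is hypothetical, sentence boundaries are naively approximated by `.` characters, and any crop-length limit is ignored, so this only illustrates the "same context" preference, not the full cropping logic.

```rust
/// Hypothetical helper: returns the sentence that contains the first occurrence
/// of `word`, with `…` marking any text removed before or after it.
/// Sentences are naively delimited by `.`; there is no crop-length limit.
pub fn crop_to_sentence(text: &str, word: &str) -> Option<String> {
    let pos = text.find(word)?;
    // The sentence starts right after the last ". " preceding the match.
    let start = text[..pos].rfind(". ").map(|i| i + 2).unwrap_or(0);
    // The sentence ends at (and includes) the first '.' after the match.
    let end = text[pos..]
        .find('.')
        .map(|i| pos + i + 1)
        .unwrap_or(text.len());

    let mut cropped = String::new();
    if start > 0 {
        cropped.push_str("… ");
    }
    cropped.push_str(text[start..end].trim());
    if end < text.len() {
        cropped.push_str(" …");
    }
    Some(cropped)
}
```

On the example text with the matching word `Split`, this returns `… Split The World is a book written by Emily Henry. …`: the farther words of the same sentence are kept, while the nearer words of the previous sentence are dropped.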


Co-authored-by: ManyTheFish <many@meilisearch.com>
2022-04-19 11:42:32 +00:00
5809d3ae0d Add first benchmarks on formatting 2022-04-12 16:31:58 +02:00
827cedcd15 Add format option structure 2022-04-12 13:42:14 +02:00
4f3ce6d9cd nested fields 2022-04-07 16:58:46 +02:00
b3f0f39106 Clean up some code 2022-04-05 17:41:32 +02:00
29c5f76d7f Use new matcher in http-ui 2022-04-05 17:41:32 +02:00
5e08fac729 fixes for rustfmt pass 2022-03-14 19:22:41 +05:30
92e2e09434 exporting heed to avoid having different versions of Heed in Meilisearch 2022-03-14 01:01:58 +05:30
98a365aaae store the geopoint in three dimensions 2021-12-14 12:21:24 +01:00
35f9499638 Export tokenizer from milli 2021-11-18 16:57:12 +01:00
a58bc5bebb update milli with the new parser_filter 2021-11-04 15:02:36 +01:00
e25ca9776f start updating the exposed function to make other modules happy 2021-10-22 17:23:22 +02:00
c27870e765 integrate a first version without any error handling 2021-10-22 14:33:18 +02:00
59cc59e93e Merge #358
358: Replacing pest with nom r=Kerollmops a=CNLHC

Co-authored-by: 刘瀚骋 <cn_lhc@qq.com>
2021-10-16 20:44:38 +00:00
360c5ff3df Remove limit of 1000 positions per attribute
Instead of using an arbitrary limit, we encode the absolute position in a u32,
using one strong u16 for the field id and a weak u16 for the relative position in the attribute.
2021-10-12 10:10:50 +02:00
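The packing described in the commit above can be sketched in Rust as follows. The function names are hypothetical, `FieldId` is assumed to be a `u16`, and "strong" and "weak" are assumed to mean the high-order and low-order 16 bits respectively, so that sorting absolute positions groups them by field id first and by relative position second.

```rust
/// Assumed 16-bit field identifier.
pub type FieldId = u16;

/// Packs a field id (high 16 bits) and a relative position (low 16 bits)
/// into a single u32 absolute position.
pub fn absolute_from_relative(field_id: FieldId, relative: u16) -> u32 {
    (u32::from(field_id) << 16) | u32::from(relative)
}

/// Splits an absolute position back into its field id and relative position.
pub fn relative_from_absolute(absolute: u32) -> (FieldId, u16) {
    ((absolute >> 16) as u16, (absolute & 0xFFFF) as u16)
}
```

Under this layout each attribute can address 65,536 relative positions (0 through 65,535) instead of 1,000, and the round trip through both functions is lossless.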
f7796edc7e remove everything about pest 2021-10-12 13:30:40 +08:00
3296bb243c Simplify word level position DB into a word position DB 2021-10-05 12:15:02 +02:00
c7cb816ae1 simplify the error handling of the sort syntax for meilisearch 2021-09-27 19:07:22 +02:00
023446ecf3 create a smaller and easier to maintain CriterionError type 2021-09-22 16:37:41 +02:00
86e272856a create an asc_desc error type that is never supposed to be returned to the end user 2021-09-22 16:37:41 +02:00
257e621d40 create an asc_desc module 2021-09-22 16:37:41 +02:00
aa6c5df0bc Implement documents format
document reader transform

remove update format

support document sequences

fix document transform

clean transform

improve error handling

add documents! macro

fix transform bug

fix tests

remove csv dependency

Add comments on the transform process

replace search cli

fmt

review edits

fix http ui

fix clippy warnings

Revert "fix clippy warnings"

This reverts commit a1ce3cd96e603633dbf43e9e0b12b2453c9c5620.

fix review comments

remove smallvec in transform loop

review edits
2021-09-21 16:58:33 +02:00
cfc62a1c15 use geoutils instead of haversine 2021-09-09 18:11:38 +02:00
3fc145c254 if we have no rtree we return all other provided documents 2021-09-09 17:44:09 +02:00
a84f3a8b31 Apply suggestions from code review
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-09-09 15:09:35 +02:00
e5ef0cad9a use meters in the filters 2021-09-08 18:24:09 +02:00
f0b74637dc fix all the tests 2021-09-08 18:24:09 +02:00
8d9c2c4425 create a new db with getters and setters 2021-09-08 17:51:07 +02:00
1d314328f0 Plug new indexer 2021-09-01 16:48:36 +02:00
3aaf1d62f3 Publish grenad CompressionType type in milli 2021-09-01 16:42:08 +02:00
af65485ba7 Reexport the grenad CompressionType from milli 2021-08-24 18:15:31 +02:00
89d0758713 Revert "Revert "Sort at query time"" 2021-08-24 11:55:16 +02:00
922f9fd4d5 Revert "Sort at query time" 2021-08-20 18:09:17 +02:00
d1df0d20f9 Add integration test of SortBy criterion 2021-08-18 16:21:51 +02:00
838ed1cd32 Use a u16 field id instead of one byte 2021-07-06 11:58:03 +02:00
daef43f504 Rename FieldsDistribution into FieldDistribution 2021-06-21 15:57:41 +02:00
d08cfda796 convert the field_distribution to a BTreeMap and avoid counting the same documents twice 2021-06-17 18:31:54 +02:00
969adaefdf rename fields_distribution to field_distribution 2021-06-17 15:16:20 +02:00
70bee7d405 re-export remaining error types
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-06-17 11:49:03 +02:00
abbebad669 change sub errors visibility 2021-06-17 11:44:01 +02:00
9716fb3b36 format the whole project 2021-06-16 18:33:33 +02:00
f0e804afd5 Rename the FieldIdMapMissingEntry from_db_name field to process 2021-06-15 11:13:04 +02:00
312c2d1d8e Use the Error enum everywhere in the project 2021-06-14 16:58:38 +02:00
23fcf7920e Introduce a basic version of the InternalError struct 2021-06-14 16:48:51 +02:00