Commit Graph

131 Commits

Author SHA1 Message Date
c27ea2677f Rewrite cheapest path algorithm and empty path cache
It is now much simpler and has much better performance.
2023-03-20 09:41:56 +01:00
600e3dd1c5 Remove warnings 2023-03-20 09:41:56 +01:00
6c659dc12f Use MiMalloc in milli tests 2023-03-20 09:41:37 +01:00
229405aeb9 Choose implementation strategy of criterion at runtime 2022-12-21 09:29:39 +01:00
50954d31fa feat: Re-export Span and Token to milli:: 2022-12-03 13:37:33 -05:00
5e754b3ee0 Merge #708
708: Reduce memory usage of the MatchingWords structure r=ManyTheFish a=loiclec

# Pull Request

## Related issue
Fixes (partially) https://github.com/meilisearch/meilisearch/issues/3115 

## What does this PR do?
1. Reduces the memory usage caused by the creation of a 10-word query tree by 20x. 
   This is done by deduplicating the `MatchingWord` values, which are heavy because of their inner DFA. The deduplication works by wrapping each `MatchingWord` in a reference-counted box and using a hash map to determine whether a  `MatchingWord` DFA already exists for a certain signature, or whether a new one needs to be built.
 
2. Avoid the worst-case scenario of creating a `MatchingWord` for extremely long words that cannot be indexed by milli.

Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>
2022-11-30 17:47:34 +00:00
8d0ace2d64 Avoid creating a MatchingWord for words that exceed the length limit 2022-11-28 10:20:13 +01:00
935a724c57 revert: Revert pass by reference API change 2022-11-24 10:08:23 -05:00
7c0e544839 feat: Add all_obkv_to_json function 2022-11-23 21:18:58 -05:00
c7322f704c Fix cargo clippy errors
Dont apply clippy for tests for now

Fix clippy warnings of filter-parser package

parent 8352febd646ec4bcf56a44161e5c4dce0e55111f
author unvalley <38400669+unvalley@users.noreply.github.com> 1666325847 +0900
committer unvalley <kirohi.code@gmail.com> 1666791316 +0900

Update .github/workflows/rust.yml

Co-authored-by: Clémentine Urquizar - curqui <clementine@meilisearch.com>

Allow clippy lint too_many_argments

Allow clippy lint needless_collect

Allow clippy lint too_many_arguments and type_complexity

Fix for clippy warnings comparison_chains

Fix for clippy warnings vec_init_then_push

Allow clippy lint should_implement_trait

Allow clippy lint drop_non_drop

Fix lifetime clipy warnings in filter-paprser

Execute cargo fmt

Fix clippy remaining warnings

Fix clippy remaining warnings again and allow lint on each place
2022-10-27 01:04:23 +09:00
54c0cf93fe Merge remote-tracking branch 'origin/main' into facet-levels-refactor 2022-10-26 15:13:34 +02:00
86d9f50b9c Fix bugs in incremental facet indexing with variable parameters
e.g. add one facet value incrementally with a group_size = X and then
add another one with group_size = Y

It is not actually possible to do so with the public API of milli,
but I wanted to make sure the algorithm worked well in those cases
anyway.

The bugs were found by fuzzing the code with fuzzcheck, which I've added
to milli as a conditional dev-dependency. But it can be removed later.
2022-10-26 13:47:04 +02:00
9d27ac8a2e Ignore too many arguments to functions. 2022-10-25 21:22:53 +02:00
42cdc38c7b Allow weird ranges like 1..=0 to pass clippy.
Everything else is just a warning and exit code will be 0.
2022-10-25 21:12:59 +02:00
d76d0cb1bf Merge branch 'main' into word-pair-proximity-docids-refactor 2022-10-24 15:23:00 +02:00
1dbbd8694f Rename StrStrU8Codec to U8StrStrCodec and reorder its fields 2022-10-18 10:37:34 +02:00
beb987d3d1 Fixing piles of clippy errors.
Most of these are calling clone when the struct supports Copy.

Many are using & and &mut on `self` when the function they are called
from already has an immutable or mutable borrow so this isn't needed.

I tried to stay away from actual changes or places where I'd have to
name fresh variables.
2022-10-13 22:02:54 +02:00
9640976c79 Rename TermMatchingPolicies 2022-08-18 17:36:08 +02:00
306593144d Refactor word prefix pair proximity indexation 2022-08-17 11:59:00 +02:00
334098a7e0 Add index snapshot test helper function 2022-08-10 15:53:46 +02:00
07003704a8 Merge branch 'filter/field-exist' 2022-07-21 14:51:41 +02:00
30bd4db0fc Simplify indexing task for facet_exists_docids database 2022-07-19 10:07:33 +02:00
fcfc4caf8c Move the Object type in the lib.rs file and use it everywhere 2022-07-12 14:55:51 +02:00
69931e50d2 Add the max_values_by_facet setting to the database 2022-06-08 17:54:56 +02:00
86ac8568e6 Use Charabia in milli 2022-06-02 16:59:11 +02:00
ea4bb9402f Merge #483
483: Enhance matching words r=Kerollmops a=ManyTheFish

# Summary

Enhance milli word-matcher making it handle match computing and cropping.

# Implementation

## Computing best matches for cropping

Before we were considering that the first match of the attribute was the best one, this was accurate when only one word was searched but was missing the target when more than one word was searched.

Now we are searching for the best matches interval to crop around, the chosen interval is the one:
1) that have the highest count of unique matches
> for example, if we have a query `split the world`, then the interval `the split the split the` has 5 matches but only 2 unique matches (1 for `split` and 1 for `the`) where the interval `split of the world` has 3 matches and 3 unique matches. So the interval `split of the world` is considered better.
2) that have the minimum distance between matches
> for example, if we have a query `split the world`, then the interval `split of the world` has a distance of 3 (2 between `split` and `the`, and 1 between `the` and `world`) where the interval `split the world` has a distance of 2. So the interval `split the world` is considered better.
3) that have the highest count of ordered matches
> for example, if we have a query `split the world`, then the interval `the world split` has 2 ordered words where the interval `split the world` has 3. So the interval `split the world` is considered better.

## Cropping around the best matches interval

Before we were cropping around the interval without checking the context.

Now we are cropping around words in the same context as matching words.
This means that we will keep words that are farther from the matching words but are in the same phrase, than words that are nearer but separated by a dot.

> For instance, for the matching word `Split` the text:
`Natalie risk her future. Split The World is a book written by Emily Henry. I never read it.`
will be cropped like:
`…. Split The World is a book written by Emily Henry. …`
and  not like:
`Natalie risk her future. Split The World is a book …`


Co-authored-by: ManyTheFish <many@meilisearch.com>
2022-04-19 11:42:32 +00:00
5809d3ae0d Add first benchmarks on formatting 2022-04-12 16:31:58 +02:00
827cedcd15 Add format option structure 2022-04-12 13:42:14 +02:00
4f3ce6d9cd nested fields 2022-04-07 16:58:46 +02:00
b3f0f39106 Make some cleaning 2022-04-05 17:41:32 +02:00
29c5f76d7f Use new matcher in http-ui 2022-04-05 17:41:32 +02:00
5e08fac729 fixes for rustfmt pass 2022-03-14 19:22:41 +05:30
92e2e09434 exporting heed to avoid having different versions of Heed in Meilisearch 2022-03-14 01:01:58 +05:30
98a365aaae store the geopoint in three dimensions 2021-12-14 12:21:24 +01:00
35f9499638 Export tokenizer from milli 2021-11-18 16:57:12 +01:00
a58bc5bebb update milli with the new parser_filter 2021-11-04 15:02:36 +01:00
e25ca9776f start updating the exposed function to makes other modules happy 2021-10-22 17:23:22 +02:00
c27870e765 integrate a first version without any error handling 2021-10-22 14:33:18 +02:00
59cc59e93e Merge #358
358: Replacing pest with nom  r=Kerollmops a=CNLHC



Co-authored-by: 刘瀚骋 <cn_lhc@qq.com>
2021-10-16 20:44:38 +00:00
360c5ff3df Remove limit of 1000 position per attribute
Instead of using an arbitrary limit we encode the absolute position in a u32
using one strong u16 for the field id and a weak u16 for the relative position in the attribute.
2021-10-12 10:10:50 +02:00
f7796edc7e remove everything about pest 2021-10-12 13:30:40 +08:00
3296bb243c Simplify word level position DB into a word position DB 2021-10-05 12:15:02 +02:00
c7cb816ae1 simplify the error handling of the sort syntax for meilisearch 2021-09-27 19:07:22 +02:00
023446ecf3 create a smaller and easier to maintain CriterionError type 2021-09-22 16:37:41 +02:00
86e272856a create an asc_desc error type that is never supposed to be returned to the end user 2021-09-22 16:37:41 +02:00
257e621d40 create an asc_desc module 2021-09-22 16:37:41 +02:00
aa6c5df0bc Implement documents format
document reader transform

remove update format

support document sequences

fix document transform

clean transform

improve error handling

add documents! macro

fix transform bug

fix tests

remove csv dependency

Add comments on the transform process

replace search cli

fmt

review edits

fix http ui

fix clippy warnings

Revert "fix clippy warnings"

This reverts commit a1ce3cd96e603633dbf43e9e0b12b2453c9c5620.

fix review comments

remove smallvec in transform loop

review edits
2021-09-21 16:58:33 +02:00
cfc62a1c15 use geoutils instead of haversine 2021-09-09 18:11:38 +02:00
3fc145c254 if we have no rtree we return all other provided documents 2021-09-09 17:44:09 +02:00
a84f3a8b31 Apply suggestions from code review
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-09-09 15:09:35 +02:00