Commit Graph

1313 Commits

Author SHA1 Message Date
36bd66281d Add method to create a new Index with specific creation dates 2022-10-25 14:37:56 +02:00
9a569d73d1 Minor code style change 2022-10-24 15:30:43 +02:00
be302fd250 Remove outdated workaround for duplicate words in phrase search 2022-10-24 15:27:06 +02:00
d76d0cb1bf Merge branch 'main' into word-pair-proximity-docids-refactor 2022-10-24 15:23:00 +02:00
f3874d58b9 Update version for the next release (v0.34.0) in Cargo.toml files 2022-10-24 10:13:25 +00:00
a983129613 Apply suggestions from code review 2022-10-20 09:49:37 +02:00
f11a4087da Merge #665
665: Fixing piles of clippy errors. r=ManyTheFish a=ehiggs

## Related issue
No issue fixed. Simply cleaning up some code for clippy on the march towards a clean build when #659 is merged.

## What does this PR do?
Most of these are calling clone when the struct supports Copy.

Many are using & and &mut on `self` when the function they are called from already has an immutable or mutable borrow so this isn't needed.

I tried to stay away from actual changes or places where I'd have to name fresh variables.
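
The two patterns described above can be sketched as follows. This is a minimal illustration, not code from the PR: `DocId`, `Counter`, and `total` are hypothetical names, and the lints referred to are assumed to be `clippy::clone_on_copy` and `clippy::needless_borrow`.

```rust
// Hypothetical types for illustration only; they are not taken from milli.
#[derive(Debug, Clone, Copy, PartialEq)]
struct DocId(u32);

struct Counter {
    seen: Vec<DocId>,
}

// A free function that only needs a shared borrow of the counter.
fn total(counter: &Counter) -> usize {
    counter.seen.len()
}

impl Counter {
    fn record_twice(&mut self, id: DocId) {
        // Before: `self.seen.push(id.clone());` (clone_on_copy).
        // `DocId` is `Copy`, so passing it by value already copies it.
        self.seen.push(id);
        self.seen.push(id);
    }

    fn len(&self) -> usize {
        // Before: `total(&self)` (needless_borrow).
        // `self` is already a `&Counter`, so it can be passed as-is.
        total(self)
    }
}
```

Both rewrites leave behavior unchanged, which matches the PR's stated goal of avoiding "actual changes".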

## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?

Co-authored-by: Ewan Higgs <ewan.higgs@gmail.com>
2022-10-20 07:19:46 +00:00
176ffd23f5 Fix compile error after rebasing wppd-refactor 2022-10-18 10:40:26 +02:00
ab2f6f3aa4 Refine some details in word_prefix_pair_proximity indexing code 2022-10-18 10:37:34 +02:00
e6e76fbefe Improve performance of resolve_phrase at the cost of some relevancy 2022-10-18 10:37:34 +02:00
178d00f93a Cargo fmt 2022-10-18 10:37:34 +02:00
830a7c0c7a Use resolve_phrase function for exactness criteria as well 2022-10-18 10:37:34 +02:00
18d578dfc4 Adjust some algorithms using DBs of word pair proximities 2022-10-18 10:37:34 +02:00
072b576514 Fix proximity value in keys of prefix_word_pair_proximity_docids 2022-10-18 10:37:34 +02:00
6c3a5d69e1 Update snapshots 2022-10-18 10:37:34 +02:00
a7de4f5b85 Don't add swapped word pairs to the word_pair_proximity_docids db 2022-10-18 10:37:34 +02:00
264a04922d Add prefix_word_pair_proximity database
Similar to the word_prefix_pair_proximity one but instead the keys are:
(proximity, prefix, word2)
2022-10-18 10:37:34 +02:00
1dbbd8694f Rename StrStrU8Codec to U8StrStrCodec and reorder its fields 2022-10-18 10:37:34 +02:00
bdeb47305e Change encoding of word_pair_proximity DB to (proximity, word1, word2)
Same for word_prefix_pair_proximity
2022-10-18 10:37:34 +02:00
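
A rough sketch of what a `(proximity, word1, word2)` key could look like as bytes. The byte layout here (one proximity byte, then `word1`, a `0` separator, then `word2`) is an assumption made for illustration, and `encode_key` / `decode_key` are hypothetical helpers rather than the actual `U8StrStrCodec` implementation.

```rust
/// Encode a key as: [proximity][word1 bytes][0][word2 bytes].
/// The layout is an assumption for this sketch, not the confirmed on-disk format.
fn encode_key(proximity: u8, word1: &str, word2: &str) -> Vec<u8> {
    let mut bytes = Vec::with_capacity(1 + word1.len() + 1 + word2.len());
    bytes.push(proximity);
    bytes.extend_from_slice(word1.as_bytes());
    bytes.push(0); // separator between the two words
    bytes.extend_from_slice(word2.as_bytes());
    bytes
}

/// Decode a key produced by `encode_key`. Returns `None` on malformed input.
fn decode_key(bytes: &[u8]) -> Option<(u8, &str, &str)> {
    let (&proximity, rest) = bytes.split_first()?;
    let sep = rest.iter().position(|&b| b == 0)?;
    let word1 = std::str::from_utf8(&rest[..sep]).ok()?;
    let word2 = std::str::from_utf8(&rest[sep + 1..]).ok()?;
    Some((proximity, word1, word2))
}
```

With the proximity byte first, keys that share a proximity sort next to each other in a byte-ordered key-value store, so all pairs at a given proximity can presumably be read with a single prefix scan.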
81919a35a2 Update milli/src/search/criteria/initial.rs
Co-authored-by: Clément Renault <clement@meilisearch.com>
2022-10-17 18:23:20 +02:00
516e838eb4 Update milli/src/search/criteria/initial.rs
Co-authored-by: Clément Renault <clement@meilisearch.com>
2022-10-17 18:23:15 +02:00
fc03e53615 Add a test to check that we can abort an indexation 2022-10-17 17:28:03 +02:00
6603437cb1 Introduce an indexation abortion function when indexing documents 2022-10-17 17:28:03 +02:00
6f55e7844c Add some code comments 2022-10-17 14:41:57 +02:00
cf203b7fde Take the filter into account when computing the page candidates 2022-10-17 14:13:44 +02:00
d71bc1e69f Compute an exact count when using distinct 2022-10-17 14:13:44 +02:00
a396806343 Add settings to force milli to exhaustively compute the total number of hits 2022-10-17 14:13:44 +02:00
4c481a8947 Upgrade all dependencies 2022-10-17 13:05:56 +02:00
beb987d3d1 Fixing piles of clippy errors.
Most of these are calling clone when the struct supports Copy.

Many are using & and &mut on `self` when the function they are called
from already has an immutable or mutable borrow so this isn't needed.

I tried to stay away from actual changes or places where I'd have to
name fresh variables.
2022-10-13 22:02:54 +02:00
f30979d021 Merge #662
662: Enhance word splitting strategy r=ManyTheFish a=akki1306

# Pull Request

## Related issue
Fixes #648 

## What does this PR do?
- Changes [split_best_frequency](55d889522b/milli/src/search/query_tree.rs (L282-L301)) to use the frequency of word pairs that appear next to each other (proximity value of 1) instead of the frequency of the individual words. The split whose word pair has the highest frequency is chosen.
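
The strategy can be sketched roughly as below. This is a simplified illustration and not the code from `query_tree.rs`: the `pair_frequency` closure is a hypothetical stand-in for a lookup of how many documents contain the two words at proximity 1.

```rust
/// Try every split point of `word` and keep the (left, right) pair that the
/// hypothetical `pair_frequency` lookup reports as most frequent.
/// Returns `None` when no split forms a known word pair.
fn split_best_frequency<'a>(
    word: &'a str,
    pair_frequency: impl Fn(&str, &str) -> u64,
) -> Option<(&'a str, &'a str)> {
    let mut best: Option<(u64, &str, &str)> = None;
    // Skip index 0 so that the left part is never empty.
    for (i, _) in word.char_indices().skip(1) {
        let (left, right) = word.split_at(i);
        let freq = pair_frequency(left, right);
        // Ignore unknown pairs; on ties the earliest split is kept.
        if freq != 0 && best.map_or(true, |(old, _, _)| freq > old) {
            best = Some((freq, left, right));
        }
    }
    best.map(|(_, left, right)| (left, right))
}
```

Scoring the pair rather than each word separately avoids splits where both halves are common words that never occur side by side.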

## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!

Co-authored-by: Akshay Kulkarni <akshayk.gj@gmail.com>
2022-10-13 08:14:22 +00:00
85f3028317 remove underscore and introduce back word_documents_count 2022-10-13 13:21:59 +05:30
8195fc6141 revert removal of word_documents_count method 2022-10-13 13:14:27 +05:30
32f825d442 move default implementation of word_pair_frequency to TestContext 2022-10-13 12:57:50 +05:30
ff8b2d4422 formatting 2022-10-13 12:44:08 +05:30
6cb8b46900 use word_pair_frequency and remove word_documents_count 2022-10-13 12:43:11 +05:30
8c9245149e format file 2022-10-12 15:27:56 +05:30
63e79a9039 update comment 2022-10-12 13:36:48 +05:30
7f9680f0a0 Enhance word splitting strategy 2022-10-12 13:18:23 +05:30
6fbf5dac68 Simplify documents! macro to reduce compile times 2022-10-12 09:22:05 +02:00
762e320c35 Add proximity calculation for the same word 2022-10-07 12:59:12 +02:00
00c02d00f3 Add missing logging timer to extractors 2022-09-30 22:17:06 +05:30
d94339a858 Merge #636
636: Remove unused `infos`, `http-ui`, and `milli/fuzz` crates r=ManyTheFish a=loiclec

We haven't used the `infos/`, `http-ui/` and `milli/fuzz/` crates in a long time. They are not properly maintained and probably do not work correctly anymore.

This PR removes these crates entirely from the workspace to reduce the amount of code we need to maintain.

Co-authored-by: Loïc Lecrenier <loic@meilisearch.com>
2022-09-14 12:39:57 +00:00
15d478cf4d Merge #635
635: Use an unstable algorithm for `grenad::Sorter` when possible r=Kerollmops a=loiclec

# Pull Request
## What does this PR do?

Use an unstable algorithm to sort the internal vector used by `grenad::Sorter` whenever possible to speed up indexing.

In practice, every time the merge function creates a `RoaringBitmap`, we use an unstable sort. For every other merge function, such as `keep_first`, `keep_last`, etc., a stable sort is used.
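
Why the choice of sort can depend on the merge function is sketched below using only the standard library. This is an illustration under assumptions: `BTreeSet<u32>` stands in for `RoaringBitmap`, and `merge_union` / `merge_keep_first` are hypothetical helpers, not the `grenad` API.

```rust
use std::collections::BTreeSet;

type Entry = (String, u32);

/// Union-style merge: the order of entries with equal keys does not affect
/// the result, so an unstable sort is safe.
fn merge_union(mut entries: Vec<Entry>) -> Vec<(String, BTreeSet<u32>)> {
    entries.sort_unstable_by(|a, b| a.0.cmp(&b.0));
    let mut out: Vec<(String, BTreeSet<u32>)> = Vec::new();
    for (key, value) in entries {
        if let Some((last_key, set)) = out.last_mut() {
            if *last_key == key {
                set.insert(value);
                continue;
            }
        }
        out.push((key, BTreeSet::from([value])));
    }
    out
}

/// Keep-first merge: the result depends on insertion order among equal keys,
/// so the sort must be stable.
fn merge_keep_first(mut entries: Vec<Entry>) -> Vec<Entry> {
    entries.sort_by(|a, b| a.0.cmp(&b.0)); // `sort_by` is stable
    // `dedup_by` removes the later element of each equal-key run.
    entries.dedup_by(|later, earlier| later.0 == earlier.0);
    entries
}
```

A set union gives the same result whatever order the equal-key entries arrive in, whereas `keep_first` would return a different value if an unstable sort reordered them.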


Co-authored-by: Loïc Lecrenier <loic@meilisearch.com>
2022-09-14 12:00:52 +00:00
add96f921b Remove unused infos/ http-ui/ and fuzz/ crates 2022-09-14 06:55:01 +02:00
753e76d451 Update version for the next release (v0.33.4) in Cargo.toml files 2022-09-13 13:55:50 +00:00
3794962330 Use an unstable algorithm for grenad::Sorter when possible 2022-09-13 14:49:53 +02:00
d4d7c9d577 We avoid skipping errors in the indexing pipeline 2022-09-13 14:03:00 +02:00
8cd5200f48 Make charabia languages configurable 2022-09-08 12:21:43 +02:00
5e07ea79c2 Make charabia default feature optional 2022-09-07 20:54:31 +02:00
077dcd2002 Update version for the next release (v0.33.3) in Cargo.toml files 2022-09-07 15:48:53 +00:00