Commit Graph

2028 Commits

Author SHA1 Message Date
Kerollmops
bdc4263883 Introduce the validate_documents_batch function 2022-07-12 14:55:51 +02:00
Kerollmops
6d0498df24 Fix the fuzz tests 2022-07-12 14:52:56 +02:00
Kerollmops
e8297ad27e Fix the tests for the new DocumentsBatchBuilder/Reader 2022-07-12 14:52:56 +02:00
Kerollmops
419ce3966c Rework the DocumentsBatchBuilder/Reader to use grenad 2022-07-12 14:52:55 +02:00
Kerollmops
eb63af1f10 Update grenad to 0.4.2 2022-07-12 14:52:55 +02:00
Kerollmops
048e174efb Do not allocate when parsing CSV headers 2022-07-12 14:52:55 +02:00
ManyTheFish
5d79617a56 Chores: Enhance smart-crop code comments 2022-07-07 16:28:09 +02:00
bors[bot]
ebddfdb9a3 Merge #578
578: Bump uuid to 1.1.2 r=ManyTheFish a=Kerollmops

Just to [align the version with Meilisearch](https://github.com/meilisearch/meilisearch/pull/2584).

Co-authored-by: Kerollmops <clement@meilisearch.com>
2022-07-05 14:56:08 +00:00
Kerollmops
1bfdcfc84f Bump uuid to 1.1.2 2022-07-05 16:23:36 +02:00
Tamo
250be9fe6c put the threshold back to 10k 2022-07-05 15:57:44 +02:00
Tamo
b61efd09fc Makes the internal soft deleted error a UserError 2022-07-05 15:34:45 +02:00
Tamo
eaf28b0628 Apply review suggestions
Co-authored-by: Clément Renault <clement@meilisearch.com>
2022-07-05 15:30:33 +02:00
Tamo
3b309f654a Fasten the document deletion
When a document deletion occurs, instead of deleting the document, we mark it as deleted
in the new “soft deleted” bitmap. It is then excluded from search and from all the other
endpoints.
2022-07-05 15:30:33 +02:00
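A minimal sketch of the soft-deletion idea described in this commit, written in Rust (milli's language). It assumes a toy in-memory index: a `BTreeSet<u32>` stands in for the “soft deleted” bitmap, and `SoftDeleteIndex`, `add`, `delete`, `search` and `get` are hypothetical names, not milli's API. Deleting only marks the internal id, and every read path filters marked ids out.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical toy index (not milli's `Index`): a BTreeSet<u32> stands in
/// for the "soft deleted" bitmap.
struct SoftDeleteIndex {
    documents: BTreeMap<u32, String>,
    soft_deleted: BTreeSet<u32>,
}

impl SoftDeleteIndex {
    fn new() -> Self {
        SoftDeleteIndex { documents: BTreeMap::new(), soft_deleted: BTreeSet::new() }
    }

    fn add(&mut self, id: u32, body: &str) {
        self.documents.insert(id, body.to_string());
    }

    /// Marks the document as deleted without touching its stored data.
    /// Returns false if the id is unknown or already soft deleted.
    fn delete(&mut self, id: u32) -> bool {
        self.documents.contains_key(&id) && self.soft_deleted.insert(id)
    }

    /// Search skips soft-deleted ids.
    fn search(&self, needle: &str) -> Vec<u32> {
        self.documents
            .iter()
            .filter(|(id, _)| !self.soft_deleted.contains(*id))
            .filter(|(_, body)| body.contains(needle))
            .map(|(id, _)| *id)
            .collect()
    }

    /// Direct fetches skip them too, so the deletion is visible everywhere.
    fn get(&self, id: u32) -> Option<&str> {
        if self.soft_deleted.contains(&id) {
            None
        } else {
            self.documents.get(&id).map(String::as_str)
        }
    }
}
```

The trade-off is that a deletion becomes a cheap bitmap update, while the stored data stays on disk until something removes it for real.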
Tamo
446439e8be bump charabia 2022-07-05 12:19:30 +02:00
Dmytro Gordon
3ff03a3f5f Fix not equal filter when field contains both number and strings 2022-06-27 15:55:17 +03:00
Kerollmops
cc48992e79 Bump the milli version to 0.31.1 2022-06-22 17:05:51 +02:00
Kerollmops
238692a8e7 Introduce the copy_to_path method on the Index 2022-06-22 16:49:47 +02:00
bors[bot]
290a40b7a5 Merge #564
564: Rename the limitedTo parameter into maxTotalHits r=curquiza a=Kerollmops

This PR is related to https://github.com/meilisearch/meilisearch/issues/2542, it renames the `limitedTo` parameter to `maxTotalHits`.

Co-authored-by: Kerollmops <clement@meilisearch.com>
2022-06-22 13:48:33 +00:00
bors[bot]
d546f6f40e Merge #563
563: Improve the `estimatedNbHits` when a `distinctAttribute` is specified r=irevoire a=Kerollmops

This PR is related to https://github.com/meilisearch/meilisearch/issues/2532 but doesn't fix it entirely. It improves it by computing the excluded documents (the ones with an already-seen distinct value) before stopping the loop. I think not doing so was a mistake, and it should always have been this way.

The reason it doesn't fix the issue is that Meilisearch is lazy, to make sure it doesn't compute too many things and take too much time to answer. When we deduplicate the documents by their distinct value, we must do it on the fly: every time we see a new document, we check that its distinct value doesn't collide with that of an already returned document.

The reason we see the correct result when enough documents are fetched is that we were lucky enough to see all of the possible distinct values in the dataset, so all of the deduplication was done and no duplicate document can be returned.

To always get a correct `estimatedNbHits`, we would have to do a pass over the whole set of possible distinct values for the distinct attribute and compute a big intersection, which could cost a lot of CPU cycles.

Co-authored-by: Kerollmops <clement@meilisearch.com>
2022-06-22 12:39:44 +00:00
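A minimal sketch of the lazy, on-the-fly deduplication described in this PR and why the estimate can stay off. It assumes a simplified model: `Doc` and `search_with_distinct` are hypothetical names, a `HashSet` of already-seen distinct values replaces milli's real data structures, and the estimate is taken as the number of candidates minus the exclusions counted before the loop stops.

```rust
use std::collections::HashSet;

/// Hypothetical candidate: internal id plus its value for the distinct attribute.
struct Doc {
    id: u32,
    distinct: &'static str,
}

/// Lazily returns up to `limit` documents, deduplicated on the fly by their
/// distinct value, plus an estimated number of hits.
fn search_with_distinct(candidates: &[Doc], limit: usize) -> (Vec<u32>, usize) {
    let mut seen: HashSet<&str> = HashSet::new();
    let mut returned = Vec::new();
    let mut excluded = 0;
    for doc in candidates {
        // Lazy: stop as soon as enough documents have been returned.
        if returned.len() == limit {
            break;
        }
        if seen.insert(doc.distinct) {
            returned.push(doc.id);
        } else {
            // Already-seen distinct value: the document is excluded and counted right away.
            excluded += 1;
        }
    }
    // Only the exclusions met before stopping are known, so this can overestimate.
    (returned, candidates.len() - excluded)
}
```

With five candidates whose distinct values are a, a, b, b, c, asking for two documents stops after meeting a single duplicate and estimates 4 hits although only 3 distinct values exist; asking for three walks the whole set and gets the exact 3, matching the "lucky" case described above.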
Kerollmops
f5c3b951bc Bump the milli version to 0.31.0 2022-06-22 12:08:16 +02:00
Kerollmops
d7c248042b Rename the limitedTo parameter into maxTotalHits 2022-06-22 12:00:48 +02:00
Kerollmops
d2f84a9d9e Improve the estimatedNbHits when distinct is enabled 2022-06-22 11:39:21 +02:00
bors[bot]
4f547eff02 Merge #560
560: Update version for next release (v0.30.0) r=curquiza a=curquiza



Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2022-06-20 12:37:01 +00:00
Clémentine Urquizar
31f749b5d8 Update version for next release (v0.30.0) 2022-06-20 12:09:57 +02:00
ManyTheFish
a0ab90a4d7 Avoid having an ending separator before crop marker 2022-06-16 18:23:57 +02:00
ManyTheFish
177154828c Extends deletion tests 2022-06-13 17:34:16 +02:00
ManyTheFish
0d1d354052 Ensure that Index methods are not bypassed by Meilisearch 2022-06-13 17:34:11 +02:00
bors[bot]
f1d848bb9a Merge #552
552: Fix escaped quotes in filter r=Kerollmops a=irevoire

Will fix https://github.com/meilisearch/meilisearch/issues/2380

The issue was that in the evaluation of the filter, I was using the deref implementation instead of calling the `value` method of my token.

To prevent the problem from happening again, I removed the deref implementation; now, you need to call either the `lexeme` or the `value` method and can't rely on a « default » implementation to get a string out of a token.

Co-authored-by: Tamo <tamo@meilisearch.com>
2022-06-09 14:56:44 +00:00
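A minimal sketch of the distinction this PR relies on, assuming a simplified token that only stores a slice of the filter string. The `lexeme` and `value` method names come from the PR description; the `Token` struct, its fields and the unescaping rule (a backslash before a quote is dropped) are assumptions for illustration, not milli's implementation. Note that there is deliberately no `Deref` impl, so callers must pick one of the two methods.

```rust
/// Hypothetical token: a slice of the original filter expression.
/// No `Deref<Target = str>` impl on purpose.
struct Token<'a> {
    lexeme: &'a str,
}

impl<'a> Token<'a> {
    fn new(lexeme: &'a str) -> Self {
        Token { lexeme }
    }

    /// The raw text exactly as written in the filter, escapes included.
    fn lexeme(&self) -> &'a str {
        self.lexeme
    }

    /// The value to compare against documents: escaped quotes are unescaped.
    fn value(&self) -> String {
        let mut out = String::with_capacity(self.lexeme.len());
        let mut chars = self.lexeme.chars();
        while let Some(c) = chars.next() {
            if c == '\\' {
                match chars.next() {
                    // `\"` and `\'` become a plain quote.
                    Some(q @ ('"' | '\'')) => out.push(q),
                    // Any other escape is kept verbatim.
                    Some(other) => {
                        out.push('\\');
                        out.push(other);
                    }
                    None => out.push('\\'),
                }
            } else {
                out.push(c);
            }
        }
        out
    }
}
```

Comparing a document field against `lexeme()` would keep the backslashes and never match a value containing real quotes, which is the kind of bug described above.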
Tamo
676187ba43 bump milli version 2022-06-09 16:53:32 +02:00
Tamo
90afde435b fix escaped quotes in filter 2022-06-09 16:03:49 +02:00
Kerollmops
445d5474cc Add the pagination_limited_to setting to the database 2022-06-08 18:14:27 +02:00
Kerollmops
69931e50d2 Add the max_values_by_facet setting to the database 2022-06-08 17:54:56 +02:00
Kerollmops
52a494bd3b Add the new pagination.limited_to and faceting.max_values_per_facet settings 2022-06-08 17:15:36 +02:00
bors[bot]
9580b9de79 Merge #549
549: Bump the version to 0.29.2 r=curquiza a=Kerollmops



Co-authored-by: Kerollmops <clement@meilisearch.com>
2022-06-08 14:29:47 +00:00
Kerollmops
56ee9cc21f Bump the version to 0.29.2 2022-06-08 16:00:06 +02:00
Kerollmops
2a505503b3 Change the number of facet values returned by default to 100 2022-06-08 15:58:57 +02:00
Kerollmops
bae4007447 Remove the hard limit on the number of facet values returned 2022-06-08 15:58:57 +02:00
bors[bot]
7313d6c533 Merge #547
547: Update version for next release (v0.29.1) r=Kerollmops a=curquiza

A new milli version will be released once this PR is merged https://github.com/meilisearch/milli/pull/543

Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2022-06-08 10:20:24 +00:00
Clémentine Urquizar
478dbfa45a Update version for next release (v0.29.1) 2022-06-07 18:59:33 +02:00
Tamo
d0aaa7ff00 Fix wrong internal ids assignments 2022-06-07 15:49:33 +02:00
ad hoc
31776fdc3f add failing test 2022-06-07 15:49:33 +02:00
bors[bot]
05ae6dbfa4 Merge #541
541: Update version for next release (v0.29.0) r=ManyTheFish a=curquiza

The version needs to be updated since #540 was merged and is breaking.

Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2022-06-02 16:53:28 +00:00
ManyTheFish
d212dc6b8b Remove useless newline 2022-06-02 18:22:56 +02:00
Clémentine Urquizar
6ce1c6487a Update version for next release (v0.29.0) 2022-06-02 18:07:55 +02:00
ManyTheFish
7aabe42ae0 Refactor matching words 2022-06-02 17:59:04 +02:00
ManyTheFish
86ac8568e6 Use Charabia in milli 2022-06-02 16:59:11 +02:00
ManyTheFish
192e024ada Add Charabia in Cargo.toml 2022-06-02 16:59:07 +02:00
Clémentine Urquizar
c19c17eddb Update version to v0.28.1 2022-06-01 18:31:02 +02:00
bors[bot]
74d1914a64 Merge #535
535: Reintroduce the max values by facet limit r=ManyTheFish a=Kerollmops

This PR reintroduces the max values by facet limit; it is related to https://github.com/meilisearch/meilisearch/issues/2349.

~I would like some help in deciding on whether I keep the default 100 max values in milli and set up the `FacetDistribution` settings in Meilisearch to use 1000 as the new value, I expose the `max_values_by_facet` for this purpose.~

I changed the default value to 1000 and the max to 10000, thank you `@ManyTheFish` for the help!

Co-authored-by: Kerollmops <clement@meilisearch.com>
2022-06-01 14:30:50 +00:00
bors[bot]
582930dbbb Merge #538
538: speedup exact words r=Kerollmops a=MarinPostma

This PR makes `exact_words` return an `Option` instead of an empty set, since set creation is costly, as noticed by `@kerollmops`.

I was not convinced that this was the cause of all of the performance drop we measured, and then realized that the methods that initialized it were called recursively, which caused initialization times to add up. While the first fix solves the issue when not using exact words, using exact words remained way more expensive than it should be. To address this issue, the exact words are cached in the `Context`, so they are only initialized once.


Co-authored-by: ad hoc <postma.marin@protonmail.com>
2022-05-30 08:20:34 +00:00
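A minimal sketch of the two fixes described in #538, assuming a simplified context: `exact_words` returns `None` instead of building an empty set, and the result is computed once and cached with `std::cell::OnceCell` (stable since Rust 1.70). `SearchContext`, `stored_exact_words`, `loads` and the recursive `count_exact` helper are hypothetical names; the `loads` counter only exists to show that initialization happens once even under recursion.

```rust
use std::cell::{Cell, OnceCell};
use std::collections::BTreeSet;

/// Hypothetical search context; `stored_exact_words` stands in for the setting read from the index.
struct SearchContext {
    stored_exact_words: Vec<String>,
    exact_words: OnceCell<Option<BTreeSet<String>>>,
    loads: Cell<usize>,
}

impl SearchContext {
    fn new(stored: &[&str]) -> Self {
        SearchContext {
            stored_exact_words: stored.iter().map(|w| w.to_string()).collect(),
            exact_words: OnceCell::new(),
            loads: Cell::new(0),
        }
    }

    /// Builds the set at most once; no set at all when there are no exact words.
    fn exact_words(&self) -> Option<&BTreeSet<String>> {
        self.exact_words
            .get_or_init(|| {
                self.loads.set(self.loads.get() + 1);
                if self.stored_exact_words.is_empty() {
                    None
                } else {
                    Some(self.stored_exact_words.iter().cloned().collect())
                }
            })
            .as_ref()
    }

    /// Hypothetical recursive caller, like the query-building code the PR mentions.
    fn count_exact(&self, words: &[&str]) -> usize {
        match words.split_first() {
            None => 0,
            Some((first, rest)) => {
                let hit = self.exact_words().map_or(false, |set| set.contains(*first));
                usize::from(hit) + self.count_exact(rest)
            }
        }
    }
}
```

Every recursive call goes through the cache, so the set is built once per context instead of once per call.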