Commit Graph

1841 Commits

Author SHA1 Message Date
Kerollmops
2ceeb51c37 Support the auto-generated ids when validating documents 2022-07-12 14:55:51 +02:00
Kerollmops
399eec5c01 Fix the indexation tests 2022-07-12 14:55:51 +02:00
Kerollmops
fcfc4caf8c Move the Object type in the lib.rs file and use it everywhere 2022-07-12 14:55:51 +02:00
Kerollmops
0146175fe6 Introduce the validate_documents_batch function 2022-07-12 14:55:51 +02:00
Kerollmops
cefffde9af Improve the .gitignore of the fuzz crate 2022-07-12 14:55:51 +02:00
Kerollmops
bdc4263883 Introduce the validate_documents_batch function 2022-07-12 14:55:51 +02:00
Kerollmops
a97d4d63b9 Fix the benchmarks 2022-07-12 14:55:50 +02:00
Kerollmops
f29114f94a Fix http-ui to fit with the new DocumentsBatchBuilder/Reader structs 2022-07-12 14:52:56 +02:00
Kerollmops
a4ceef9624 Fix the cli for the new DocumentsBatchBuilder/Reader structs 2022-07-12 14:52:56 +02:00
Kerollmops
6d0498df24 Fix the fuzz tests 2022-07-12 14:52:56 +02:00
Kerollmops
e8297ad27e Fix the tests for the new DocumentsBatchBuilder/Reader 2022-07-12 14:52:56 +02:00
Kerollmops
419ce3966c Rework the DocumentsBatchBuilder/Reader to use grenad 2022-07-12 14:52:55 +02:00
Kerollmops
eb63af1f10 Update grenad to 0.4.2 2022-07-12 14:52:55 +02:00
Kerollmops
048e174efb Do not allocate when parsing CSV headers 2022-07-12 14:52:55 +02:00
bors[bot]
ce90fc628a Merge #583
583: Use BufReader to read datasets in benchmarks r=ManyTheFish a=loiclec

## What does this PR do?
Ensure that the datasets used by the benchmarks are read efficiently by using a `BufReader`.

## Why?
Using a `BufReader` is more representative of how `meilisearch` works. It will also make performance comparisons between different branches of `milli` more  accurate.




Co-authored-by: Loïc Lecrenier <loic@meilisearch.com>
2022-07-07 08:13:07 +00:00
Loïc Lecrenier
aae03356cb Use BufReader to read datasets in benchmarks 2022-07-06 18:20:15 +02:00
bors[bot]
ebddfdb9a3 Merge #578
578: Bump uuid to 1.1.2 r=ManyTheFish a=Kerollmops

Just to [align the version with Meilisearch](https://github.com/meilisearch/meilisearch/pull/2584).

Co-authored-by: Kerollmops <clement@meilisearch.com>
2022-07-05 14:56:08 +00:00
bors[bot]
eeba196053 Merge #572
572: Add reindexing benchmarks r=Kerollmops a=irevoire

With #557 coming, we should add benchmarks that measure our impact on the reindexing process.

Co-authored-by: Tamo <tamo@meilisearch.com>
2022-07-05 14:43:01 +00:00
Kerollmops
1bfdcfc84f Bump uuid to 1.1.2 2022-07-05 16:23:36 +02:00
bors[bot]
dd1e606f13 Merge #557
557: Fasten documents deletion and update r=Kerollmops a=irevoire

When a document deletion occurs, instead of deleting the document we mark it as deleted in the new “soft deleted” bitmap. It is then removed from the search and all the other endpoints.

I ran the benchmarks against main;
```
% ./compare.sh indexing_main_83ad1aaf.json indexing_fasten-document-deletion_abab51fb.json
group                                                                     indexing_fasten-document-deletion_abab51fb    indexing_main_83ad1aaf
-----                                                                     ------------------------------------------    ----------------------
indexing/-geo-delete-facetedNumber-facetedGeo-searchable-                 1.05      2.0±0.40ms        ? ?/sec           1.00  1904.9±190.00µs        ? ?/sec
indexing/-movies-delete-facetedString-facetedNumber-searchable-           1.00     10.3±2.64ms        ? ?/sec           961.61      9.9±0.12s        ? ?/sec
indexing/-movies-delete-facetedString-facetedNumber-searchable-nested-    1.00     15.1±3.90ms        ? ?/sec           554.63      8.4±0.12s        ? ?/sec
indexing/-songs-delete-facetedString-facetedNumber-searchable-            1.00     45.1±7.53ms        ? ?/sec           710.15     32.0±0.10s        ? ?/sec
indexing/-wiki-delete-searchable-                                         1.00    277.8±7.97ms        ? ?/sec           1946.57    540.8±3.15s        ? ?/sec
indexing/Indexing geo_point                                               1.00      12.0±0.20s        ? ?/sec           1.03      12.4±0.19s        ? ?/sec
indexing/Indexing movies in three batches                                 1.00      19.3±0.30s        ? ?/sec           1.01      19.4±0.16s        ? ?/sec
indexing/Indexing movies with default settings                            1.00      18.8±0.09s        ? ?/sec           1.00      18.9±0.10s        ? ?/sec
indexing/Indexing nested movies with default settings                     1.00      25.9±0.19s        ? ?/sec           1.00      25.9±0.12s        ? ?/sec
indexing/Indexing nested movies without any facets                        1.00      24.8±0.17s        ? ?/sec           1.00      24.8±0.18s        ? ?/sec
indexing/Indexing songs in three batches with default settings            1.00      65.9±0.96s        ? ?/sec           1.03      67.8±0.82s        ? ?/sec
indexing/Indexing songs with default settings                             1.00      58.8±1.11s        ? ?/sec           1.02      59.9±2.09s        ? ?/sec
indexing/Indexing songs without any facets                                1.00      53.4±0.72s        ? ?/sec           1.01      54.2±0.88s        ? ?/sec
indexing/Indexing songs without faceted numbers                           1.00      57.9±1.17s        ? ?/sec           1.01      58.3±1.20s        ? ?/sec
indexing/Indexing wiki                                                    1.00   1065.2±13.26s        ? ?/sec           1.00   1065.8±12.66s        ? ?/sec
indexing/Indexing wiki in three batches                                   1.00    1182.4±6.20s        ? ?/sec           1.01    1190.8±8.48s        ? ?/sec
```

Most things do not change, we lost 0.1ms on the indexing of geo point (I don’t get why), and then we are between 500 and 1900 times faster when we delete documents.


Co-authored-by: Tamo <tamo@meilisearch.com>
2022-07-05 14:14:38 +00:00
Tamo
250be9fe6c put the threshold back to 10k 2022-07-05 15:57:44 +02:00
bors[bot]
62692c171d Merge #577
577: Fix deserialisation of NDJson documents in benchmarks r=irevoire a=loiclec

Previously, the first document in the NDJson file was read over and over again. So the `geo_point` benchmark was not working properly: it only indexed one document.

Co-authored-by: Loïc Lecrenier <loic@meilisearch.com>
2022-07-05 13:54:47 +00:00
Loïc Lecrenier
9bc7627e27 Fix deserialisation of NDJson documents in benchmarks 2022-07-05 15:51:06 +02:00
Tamo
b61efd09fc Makes the internal soft deleted error a UserError 2022-07-05 15:34:45 +02:00
Tamo
eaf28b0628 Apply review suggestions
Co-authored-by: Clément Renault <clement@meilisearch.com>
2022-07-05 15:30:33 +02:00
Tamo
3b309f654a Fasten the document deletion
When a document deletion occurs, instead of deleting the document we mark it as deleted
in the new “soft deleted” bitmap. It is then removed from the search, and all the other
endpoints.
2022-07-05 15:30:33 +02:00
Tamo
2700d8dc67 Add reindexing benchmarks 2022-07-05 14:46:46 +02:00
bors[bot]
77c837fc1b Merge #575
575: Bump charabia r=loiclec a=irevoire

This fix #573

Co-authored-by: Tamo <tamo@meilisearch.com>
2022-07-05 11:53:57 +00:00
Tamo
446439e8be bump charabia 2022-07-05 12:19:30 +02:00
bors[bot]
c6f4775fde Merge #568
568: Fix not equal filter when field contains both number and strings r=Kerollmops a=GraDKh

Related to https://github.com/meilisearch/meilisearch/issues/2516
Looks like the issue should be moved to this repo, but I'm not sure what the right procedure for it.

Co-authored-by: Dmytro Gordon <dmytro@bigstream.co>
2022-06-28 08:46:23 +00:00
Dmytro Gordon
3ff03a3f5f Fix not equal filter when field contains both number and strings 2022-06-27 15:55:17 +03:00
bors[bot]
83ad1aaf05 Merge #567
567: Bump the milli version to 0.31.1 r=curquiza a=Kerollmops



Co-authored-by: Kerollmops <clement@meilisearch.com>
2022-06-22 15:07:03 +00:00
Kerollmops
cc48992e79 Bump the milli version to 0.31.1 2022-06-22 17:05:51 +02:00
bors[bot]
68bb170732 Merge #566
566: Introduce the copy_to_path method on the Index r=irevoire a=Kerollmops

Meilisearch needs this method to do snapshots.

Co-authored-by: Kerollmops <clement@meilisearch.com>
2022-06-22 14:52:19 +00:00
Kerollmops
238692a8e7 Introduce the copy_to_path method on the Index 2022-06-22 16:49:47 +02:00
bors[bot]
290a40b7a5 Merge #564
564: Rename the limitedTo parameter into maxTotalHits r=curquiza a=Kerollmops

This PR is related to https://github.com/meilisearch/meilisearch/issues/2542, it renames the `limitedTo` parameter into `maxTotalHits`.

Co-authored-by: Kerollmops <clement@meilisearch.com>
2022-06-22 13:48:33 +00:00
bors[bot]
d546f6f40e Merge #563
563: Improve the `estimatedNbHits` when a `distinctAttribute` is specified r=irevoire a=Kerollmops

This PR is related to https://github.com/meilisearch/meilisearch/issues/2532 but it doesn't fix it entirely. It improves it by computing the excluded documents (the ones with an already-seen distinct value) before stopping the loop, I think it was a mistake and should always have been this way.

The reason it doesn't fix the issue is that Meilisearch is lazy, just to be sure not to compute too many things and answer by taking too much time. When we deduplicate the documents by their distinct value we must do it along the water, everytime we see a new document we check that its distinct value of it doesn't collide with an already returned document. 

The reason we can see the correct result when enough documents are fetched is that we were lucky to see all of the different distinct values possible in the dataset and all of the deduplication was done, no document can be returned.

If we wanted to implement that to have a correct `extimatedNbHits` every time we should have done a pass on the whole set of possible distinct values for the distinct attribute and do a big intersection, this could cost a lot of CPU cycles.

Co-authored-by: Kerollmops <clement@meilisearch.com>
2022-06-22 12:39:44 +00:00
bors[bot]
38a8d3cae1 Merge #565
565: Bump the milli version to 0.31.0 r=curquiza a=Kerollmops



Co-authored-by: Kerollmops <clement@meilisearch.com>
2022-06-22 10:09:41 +00:00
Kerollmops
f5c3b951bc Bump the milli version to 0.31.0 2022-06-22 12:08:16 +02:00
Kerollmops
d7c248042b Rename the limitedTo parameter into maxTotalHits 2022-06-22 12:00:48 +02:00
Kerollmops
d2f84a9d9e Improve the estimatedNbHits when distinct is enabled 2022-06-22 11:39:21 +02:00
bors[bot]
4f547eff02 Merge #560
560: Update version for next release (v0.30.0) r=curquiza a=curquiza



Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2022-06-20 12:37:01 +00:00
bors[bot]
64b833410c Merge #559
559: Avoid having an ending separator before crop marker r=irevoire a=ManyTheFish

related to https://github.com/meilisearch/meilisearch/issues/2528


Co-authored-by: ManyTheFish <many@meilisearch.com>
2022-06-20 11:06:52 +00:00
Clémentine Urquizar
31f749b5d8 Update version for next release (v0.30.0) 2022-06-20 12:09:57 +02:00
ManyTheFish
a0ab90a4d7 Avoid having an ending separator before crop marker 2022-06-16 18:23:57 +02:00
bors[bot]
a59ae19842 Merge #558
558: Deletion benchmarks r=ManyTheFish a=ManyTheFish

Add benchmarks on the deletion and start rethinking benchmark names.

Co-authored-by: ManyTheFish <many@meilisearch.com>
2022-06-16 09:34:37 +00:00
ManyTheFish
2652310f2a Change delete benchmark names 2022-06-16 10:32:58 +02:00
ManyTheFish
adbb0ff318 Add deletion benchmarks 2022-06-16 10:17:58 +02:00
bors[bot]
0a5d1a445e Merge #554
554: Enhance tests for soft deletetion r=irevoire a=ManyTheFish

#### tests: (skip in changelog)
- [x] placeholder search shouldn’t return soft deleted
- [x] search shouldn’t return soft deleted
- [x] filtered placeholder search shouldn’t return soft deleted
- [x] geo-filtered placeholder search shouldn’t return soft deleted
- [x] documents list/get shouldn’t return soft deleted
- [x] stats shouldn’t count soft deleted

#### other: (API breaking)
- [x] ensure that Index methods are not bypassed by Meilisearch


Poke `@irevoire,` we may merge this into your branch.

Co-authored-by: ManyTheFish <many@meilisearch.com>
2022-06-14 09:49:37 +00:00
ManyTheFish
447195a27a Replace format by to_string 2022-06-14 10:32:44 +02:00