Commit Graph

764 Commits

Author SHA1 Message Date
bors[bot]
50f6524ff2 Merge #579
579: Stop reindexing already indexed documents r=ManyTheFish a=irevoire

```
 % ./compare.sh indexing_stop-reindexing-unchanged-documents_cb5a1669.json indexing_main_eeba1960.json
group                                                                     indexing_main_eeba1960                 indexing_stop-reindexing-unchanged-documents_cb5a1669
-----                                                                     ----------------------                 -----------------------------------------------------
indexing/-geo-delete-facetedNumber-facetedGeo-searchable-                 1.03      2.0±0.22ms        ? ?/sec    1.00  1955.4±336.24µs        ? ?/sec
indexing/-movies-delete-facetedString-facetedNumber-searchable-           1.08     11.0±2.93ms        ? ?/sec    1.00     10.2±4.04ms        ? ?/sec
indexing/-movies-delete-facetedString-facetedNumber-searchable-nested-    1.00     15.1±3.89ms        ? ?/sec    1.14     17.1±5.18ms        ? ?/sec
indexing/-songs-delete-facetedString-facetedNumber-searchable-            1.26    59.2±12.01ms        ? ?/sec    1.00     47.1±8.52ms        ? ?/sec
indexing/-wiki-delete-searchable-                                         1.08   316.6±31.53ms        ? ?/sec    1.00   293.6±17.00ms        ? ?/sec
indexing/Indexing geo_point                                               1.01      60.9±0.31s        ? ?/sec    1.00      60.6±0.36s        ? ?/sec
indexing/Indexing movies in three batches                                 1.04      20.0±0.30s        ? ?/sec    1.00      19.2±0.25s        ? ?/sec
indexing/Indexing movies with default settings                            1.02      19.1±0.18s        ? ?/sec    1.00      18.7±0.24s        ? ?/sec
indexing/Indexing nested movies with default settings                     1.02      26.2±0.29s        ? ?/sec    1.00      25.9±0.22s        ? ?/sec
indexing/Indexing nested movies without any facets                        1.02      25.3±0.32s        ? ?/sec    1.00      24.7±0.26s        ? ?/sec
indexing/Indexing songs in three batches with default settings            1.00      66.7±0.41s        ? ?/sec    1.01      67.1±0.86s        ? ?/sec
indexing/Indexing songs with default settings                             1.00      58.3±0.90s        ? ?/sec    1.01      58.8±1.32s        ? ?/sec
indexing/Indexing songs without any facets                                1.00      54.5±1.43s        ? ?/sec    1.01      55.2±1.29s        ? ?/sec
indexing/Indexing songs without faceted numbers                           1.00      57.9±1.20s        ? ?/sec    1.01      58.4±0.93s        ? ?/sec
indexing/Indexing wiki                                                    1.00   1052.0±10.95s        ? ?/sec    1.02   1069.4±20.38s        ? ?/sec
indexing/Indexing wiki in three batches                                   1.00    1193.1±8.83s        ? ?/sec    1.00    1189.5±9.40s        ? ?/sec
indexing/Reindexing geo_point                                             3.22      67.5±0.73s        ? ?/sec    1.00      21.0±0.16s        ? ?/sec
indexing/Reindexing movies with default settings                          3.75      19.4±0.28s        ? ?/sec    1.00       5.2±0.05s        ? ?/sec
indexing/Reindexing songs with default settings                           8.90      61.4±0.91s        ? ?/sec    1.00       6.9±0.07s        ? ?/sec
indexing/Reindexing wiki                                                  1.00   1748.2±35.68s        ? ?/sec    1.00   1750.5±18.53s        ? ?/sec
```

tldr: We do not lose any performance on the normal indexing benchmark, but we get between 3 and 8 times faster on the reindexing benchmarks 👍 

Co-authored-by: Tamo <tamo@meilisearch.com>
2022-08-04 08:10:37 +00:00
ManyTheFish
d6f9a60a32 fix: Remove whitespace trimming during document id validation
fix #592
2022-08-03 11:38:40 +02:00
Tamo
7fc35c5586 remove the useless prints 2022-08-02 10:31:22 +02:00
Tamo
f156d7dd3b Stop reindexing already indexed documents 2022-08-02 10:31:20 +02:00
Loïc Lecrenier
07003704a8 Merge branch 'filter/field-exist' 2022-07-21 14:51:41 +02:00
Loïc Lecrenier
1506683705 Avoid using too much memory when indexing facet-exists-docids 2022-07-19 14:42:35 +02:00
Loïc Lecrenier
aed8c69bcb Refactor indexation of the "facet-id-exists-docids" database
The idea is to directly create a sorted and merged list of bitmaps
in the form of a BTreeMap<FieldId, RoaringBitmap> instead of creating
a grenad::Reader where the keys are field_id and the values are docids.

Then we send that BTreeMap to the thing that handles TypedChunks, which
inserts its content into the database.
2022-07-19 10:07:33 +02:00
Loïc Lecrenier
1eb1e73bb3 Add integration tests for the EXISTS filter 2022-07-19 10:07:33 +02:00
Loïc Lecrenier
80b962b4f4 Run cargo fmt 2022-07-19 10:07:33 +02:00
Loïc Lecrenier
c17d616250 Refactor index_documents_check_exists_database tests 2022-07-19 10:07:33 +02:00
Loïc Lecrenier
30bd4db0fc Simplify indexing task for facet_exists_docids database 2022-07-19 10:07:33 +02:00
Loïc Lecrenier
392472f4bb Apply suggestions from code review
Co-authored-by: Tamo <tamo@meilisearch.com>
2022-07-19 10:07:33 +02:00
Loïc Lecrenier
453d593ce8 Add a database containing the docids where each field exists 2022-07-19 10:07:33 +02:00
Loïc Lecrenier
fc9f3f31e7 Change DocumentsBatchReader to access cursor and index at same time
Otherwise it is not possible to iterate over all documents while
using the fields index at the same time.
2022-07-18 16:08:14 +02:00
Loïc Lecrenier
ab1571cdec Simplify Transform::read_documents, enabled by enriched documents reader 2022-07-18 12:45:47 +02:00
Kerollmops
448114cc1c Fix the benchmarks with the new indexation API 2022-07-12 15:22:09 +02:00
Kerollmops
25e768f31c Fix another issue with the nested primary key selector 2022-07-12 15:14:07 +02:00
Kerollmops
192793ee38 Add some tests to check for the nested documents ids 2022-07-12 15:14:07 +02:00
Kerollmops
dc61105554 Fix the nested document id fetching function 2022-07-12 15:14:06 +02:00
Kerollmops
2eec290424 Check the validity of the latitute and longitude numbers 2022-07-12 15:14:06 +02:00
Kerollmops
5d149d631f Remove tests for a function that no more exists 2022-07-12 15:14:06 +02:00
Kerollmops
0bbcc7b180 Expose the DocumentId struct to be sure to inject the generated ids 2022-07-12 15:14:06 +02:00
Kerollmops
d1a4da9812 Generate a real UUIDv4 when ids are auto-generated 2022-07-12 15:14:06 +02:00
Kerollmops
c8ebf0de47 Rename the validate function as an enriching function 2022-07-12 15:14:06 +02:00
Kerollmops
905af2a2e9 Use the primary key and external id in the transform 2022-07-12 15:14:05 +02:00
Kerollmops
742543091e Constify the default primary key name 2022-07-12 14:55:52 +02:00
Kerollmops
5f1bfb73ee Extract the primary key name and make it accessible 2022-07-12 14:55:52 +02:00
Kerollmops
6a0a0ae94f Make the Transform read from an EnrichedDocumentsBatchReader 2022-07-12 14:55:52 +02:00
Kerollmops
8ebf5eed0d Make the nested primary key work 2022-07-12 14:55:52 +02:00
Kerollmops
19eb3b4708 Make sur that we do not accept floats as documents ids 2022-07-12 14:55:52 +02:00
Kerollmops
2ceeb51c37 Support the auto-generated ids when validating documents 2022-07-12 14:55:51 +02:00
Kerollmops
399eec5c01 Fix the indexation tests 2022-07-12 14:55:51 +02:00
Kerollmops
fcfc4caf8c Move the Object type in the lib.rs file and use it everywhere 2022-07-12 14:55:51 +02:00
Kerollmops
0146175fe6 Introduce the validate_documents_batch function 2022-07-12 14:55:51 +02:00
Kerollmops
bdc4263883 Introduce the validate_documents_batch function 2022-07-12 14:55:51 +02:00
Kerollmops
e8297ad27e Fix the tests for the new DocumentsBatchBuilder/Reader 2022-07-12 14:52:56 +02:00
bors[bot]
ebddfdb9a3 Merge #578
578: Bump uuid to 1.1.2 r=ManyTheFish a=Kerollmops

Just to [align the version with Meilisearch](https://github.com/meilisearch/meilisearch/pull/2584).

Co-authored-by: Kerollmops <clement@meilisearch.com>
2022-07-05 14:56:08 +00:00
Kerollmops
1bfdcfc84f Bump uuid to 1.1.2 2022-07-05 16:23:36 +02:00
Tamo
250be9fe6c put the threshold back to 10k 2022-07-05 15:57:44 +02:00
Tamo
eaf28b0628 Apply review suggestions
Co-authored-by: Clément Renault <clement@meilisearch.com>
2022-07-05 15:30:33 +02:00
Tamo
3b309f654a Fasten the document deletion
When a document deletion occurs, instead of deleting the document we mark it as deleted
in the new “soft deleted” bitmap. It is then removed from the search, and all the other
endpoints.
2022-07-05 15:30:33 +02:00
Kerollmops
d7c248042b Rename the limitedTo parameter into maxTotalHits 2022-06-22 12:00:48 +02:00
ManyTheFish
177154828c Extends deletion tests 2022-06-13 17:34:16 +02:00
Kerollmops
445d5474cc Add the pagination_limited_to setting to the database 2022-06-08 18:14:27 +02:00
Kerollmops
69931e50d2 Add the max_values_by_facet setting to the database 2022-06-08 17:54:56 +02:00
Kerollmops
52a494bd3b Add the new pagination.limited_to and faceting.max_values_per_facet settings 2022-06-08 17:15:36 +02:00
Tamo
d0aaa7ff00 Fix wrong internal ids assignments 2022-06-07 15:49:33 +02:00
ad hoc
31776fdc3f add failing test 2022-06-07 15:49:33 +02:00
ManyTheFish
86ac8568e6 Use Charabia in milli 2022-06-02 16:59:11 +02:00
ad hoc
8993fec8a3 return optional exact words 2022-05-24 09:15:49 +02:00