Commit Graph

10662 Commits

Author SHA1 Message Date
Tamo
099abefc6d Merge branch 'main' into metrics/prometheus-setup 2022-08-22 09:56:15 +02:00
mohandasspat
a05101af4d clippy & fmt fixed 2022-08-22 13:21:22 +05:30
mohandasspat
109540011a conflict fixes 2022-08-22 13:21:22 +05:30
mohandasspat
2f92169e48 clippy issue in metrics fixed 2022-08-22 13:21:22 +05:30
Pavo-Tusker
a58b00d8f1 Update meilisearch-http/src/option.rs
Co-authored-by: Tamo <irevoire@protonmail.ch>
2022-08-22 13:21:22 +05:30
mohandasspat
2b8f3c26ec Changed prometheus metrics feature as optional 2022-08-22 13:21:22 +05:30
mohandasspat
0b6ca73790 review fixes 2022-08-22 13:21:22 +05:30
Pavo-Tusker
1f1482e97c Update meilisearch-http/src/routes/mod.rs
Co-authored-by: Tamo <irevoire@protonmail.ch>
2022-08-22 13:21:22 +05:30
mohandasspat
25fecf9360 clippy & rustfmt fixed 2022-08-22 13:21:22 +05:30
mohandasspat
4bee0565e8 prometheus and grafana dashboards implemented 2022-08-22 13:21:22 +05:30
mohandasspat
d5da063666 clippy & fmt fixed 2022-08-22 10:52:09 +05:30
mohandasspat
43bb5176a9 conflict fixes 2022-08-22 10:30:07 +05:30
Irevoire
e7624abe63 share heed between all sub-crates 2022-08-19 11:23:41 +02:00
ManyTheFish
993aa1321c Fix query tree building 2022-08-18 17:56:06 +02:00
ManyTheFish
bff9653050 Fix remove count 2022-08-18 17:36:30 +02:00
ManyTheFish
9640976c79 Rename TermMatchingPolicies 2022-08-18 17:36:08 +02:00
bors[bot]
a0734c991c Merge #2674
2674: Add analytics on the stats routes r=ManyTheFish a=irevoire

# Pull Request

## What does this PR do?
Implements https://github.com/meilisearch/specifications/pull/169

## PR checklist
Please check if your PR fulfills the following requirements:
- [ ] Does this PR fix an existing issue?
- [ ] Have you read the contributing guidelines?
- [ ] Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!


Co-authored-by: Irevoire <tamo@meilisearch.com>
2022-08-18 14:19:56 +00:00
bors[bot]
cb29d7d124 Merge #2678
2678: Accept either an array of documents or a single document r=irevoire a=Kerollmops

# Pull Request

## What does this PR do?
Fixes #2671

## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?


Co-authored-by: Clément Renault <clement@meilisearch.com>
2022-08-18 14:00:01 +00:00
Clément Renault
e32d5ef2b3 Fix the test with an uncomprehensible user error message 2022-08-18 14:37:44 +02:00
bors[bot]
60a7221827 Merge #609
609: Retry downloading the benchmarks datasets r=Kerollmops a=irevoire

Downloading the benchmarks datasets is failing [more and more](https://github.com/meilisearch/milli/pull/607#pullrequestreview-1076023074) often; thus, instead of fixing the issue, I thought we could retry multiple times.


Co-authored-by: Irevoire <tamo@meilisearch.com>
2022-08-18 11:47:09 +00:00
bors[bot]
afc10acd19 Merge #596
596: Filter operators: NOT + IN[..] r=irevoire a=loiclec

# Pull Request

## What does this PR do?
Implements the changes described in https://github.com/meilisearch/meilisearch/issues/2580
It is based on top of #556 

Co-authored-by: Loïc Lecrenier <loic@meilisearch.com>
2022-08-18 11:24:32 +00:00
Loïc Lecrenier
c7a86b56ef Fix filter parser compilation error 2022-08-18 13:16:56 +02:00
Loïc Lecrenier
9b6602cba2 Avoid cloning FilterCondition in filter array parsing 2022-08-18 13:06:57 +02:00
Loïc Lecrenier
8a271223a9 Change a macro_rules to a function in filter parser 2022-08-18 13:03:55 +02:00
bors[bot]
ee69ede1ce Merge #2677
2677: Hide the batch_uid field from the tasks route r=Kerollmops a=Kerollmops

# Pull Request

## What does this PR do?

Fixes #2676

## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?


Co-authored-by: Clément Renault <clement@meilisearch.com>
2022-08-18 10:01:09 +00:00
Clément Renault
9b2036ac05 Accept either an array of documents or a single document 2022-08-18 11:55:14 +02:00
Loïc Lecrenier
dd34dbaca5 Add more filter parser tests 2022-08-18 11:55:01 +02:00
Loïc Lecrenier
5d74ebd5e5 Cargo fmt 2022-08-18 11:36:38 +02:00
Clément Renault
5c543f9d94 Add a test for single document upload 2022-08-18 11:33:22 +02:00
Loïc Lecrenier
9af69c151b Limit the maximum depth of filters
This should have no impact on the user but is there to safeguard
meilisearch against malicious inputs.
2022-08-18 11:31:38 +02:00
Clément Renault
0c03ed3c1e Hide the batch_uid field from the tasks route 2022-08-18 11:15:21 +02:00
Loïc Lecrenier
c51dcad51b Don't recompute filterable fields in evaluation of IN[] filter 2022-08-18 10:59:21 +02:00
Loïc Lecrenier
98f0da6b38 Simplify representation of nested NOT filters 2022-08-18 10:58:24 +02:00
Loïc Lecrenier
b030efdc83 Fix parsing of IN[] filter followed by whitespace + factorise its impl 2022-08-18 10:58:04 +02:00
Irevoire
84a784834e retry downloading the benchmarks datasets 2022-08-17 19:25:05 +02:00
bors[bot]
79094bcbcf Merge #607
607: Better threshold r=Kerollmops a=irevoire

# Pull Request

## What does this PR do?
Fixes #570 

This PR tries to improve the threshold used to trigger the real deletion of documents.
The deletion is now triggered in two cases;
- 10% of the total available space is used by soft deleted documents
- 90% of the total available space is used.

In this context, « total available space » means the `map_size` of lmdb.
And the size used by the soft deleted documents is actually an estimation. We can't determine precisely the size used by one document thus what we do is; take the total space used, divide it by the number of documents + soft deleted documents to estimate the size of one average document. Then multiply the size of one avg document by the number of soft deleted document.

--------

<img width="808" alt="image" src="https://user-images.githubusercontent.com/7032172/185083075-92cf379e-8ae1-4bfc-9ca6-93b54e6ab4e9.png">

Here we can see we have a ~10GB drift in the end between the space used by the soft deleted and the real space used by the documents.
Personally I don’t think that's a big issue because once the red line reach 90GB everything will be freed but now you know.

If you have an idea on how to improve this estimation I would love to hear it.
It look like the difference is linear so maybe we could simply multiply the current estimation by two?

Co-authored-by: Irevoire <tamo@meilisearch.com>
2022-08-17 16:31:04 +00:00
mohandasspat
54a0b47c2b clippy issue in metrics fixed 2022-08-17 21:08:28 +05:30
Loïc Lecrenier
497f9817a2 Use snapshot testing for the filter parser 2022-08-17 17:35:01 +02:00
Pavo-Tusker
947fb5c956 Update meilisearch-http/src/option.rs
Co-authored-by: Tamo <irevoire@protonmail.ch>
2022-08-17 20:57:07 +05:30
mohandasspat
cd18459484 Changed prometheus metrics feature as optional 2022-08-17 20:56:15 +05:30
mohandasspat
225d9936ed review fixes 2022-08-17 20:55:29 +05:30
Pavo-Tusker
93daa4c464 Update meilisearch-http/src/routes/mod.rs
Co-authored-by: Tamo <irevoire@protonmail.ch>
2022-08-17 20:55:29 +05:30
mohandasspat
d08c77706c clippy & rustfmt fixed 2022-08-17 20:55:29 +05:30
mohandasspat
de58ccd4ba prometheus and grafana dashboards implemented 2022-08-17 20:54:39 +05:30
Irevoire
4aae07d5f5 expose the size methods 2022-08-17 17:07:38 +02:00
Irevoire
e96b852107 bump heed 2022-08-17 17:05:50 +02:00
Loïc Lecrenier
238a7be58d Fix filter parser handling of keywords and surrounding spaces
Now the following fragments are allowed:

AND(field =

AND'field' =

AND"field" =
2022-08-17 16:53:40 +02:00
Irevoire
62240b7e19 add analytics on the stats routes 2022-08-17 16:12:26 +02:00
Loïc Lecrenier
b09a8f1b91 Filters: add explicit error message when using a keyword as value 2022-08-17 16:07:00 +02:00
bors[bot]
087da5621a Merge #587
587: Word prefix pair proximity docids indexation refactor r=Kerollmops a=loiclec

# Pull Request

## What does this PR do?
Refactor the code of `WordPrefixPairProximityDocIds` to make it much faster, fix a bug, and add a unit test.

## Why is it faster?
Because we avoid using a sorter to insert the (`word1`, `prefix`, `proximity`) keys and their associated bitmaps, and thus we don't have to sort a potentially very big set of data. I have also added a couple of other optimisations: 

1. reusing allocations
2. using a prefix trie instead of an array of prefixes to get all the prefixes of a word
3. inserting directly into the database instead of putting the data in an intermediary grenad when possible. Also avoid checking for pre-existing values in the database when we know for certain that they do not exist. 

## What bug was fixed?
When reindexing, the `new_prefix_fst_words` prefixes may look like:
```
["ant",  "axo", "bor"]
```
which we group by first letter:
```
[["ant", "axo"], ["bor"]]
```

Later in the code, if we have the word2 "axolotl", we try to find which subarray of prefixes contains its prefixes. This check is done with `word2.starts_with(subarray_prefixes[0])`, but `"axolotl".starts_with("ant")` is false, and thus we wrongly think that there are no prefixes in `new_prefix_fst_words` that are prefixes of `axolotl`.

## StrStrU8Codec
I had to change the encoding of `StrStrU8Codec` to make the second string null-terminated as well. I don't think this should be a problem, but I may have missed some nuances about the impacts of this change.

## Requests when reviewing this PR
I have explained what the code does in the module documentation of `word_pair_proximity_prefix_docids`. It would be nice if someone could read it and give their opinion on whether it is a clear explanation or not. 

I also have a couple questions regarding the code itself:
- Should we clean up and factor out the `PrefixTrieNode` code to try and make broader use of it outside this module? For now, the prefixes undergo a few transformations: from FST, to array, to prefix trie. It seems like it could be simplified.
- I wrote a function called `write_into_lmdb_database_without_merging`. (1) Are we okay with such a function existing? (2) Should it be in `grenad_helpers` instead?

## Benchmark Results

We reduce the time it takes to index about 8% in most cases, but it varies between -3% and -20%. 

```
group                                                                     indexing_main_ce90fc62                  indexing_word-prefix-pair-proximity-docids-refactor_cbad2023
-----                                                                     ----------------------                  ------------------------------------------------------------
indexing/-geo-delete-facetedNumber-facetedGeo-searchable-                 1.00  1893.0±233.03µs        ? ?/sec    1.01  1921.2±260.79µs        ? ?/sec
indexing/-movies-delete-facetedString-facetedNumber-searchable-           1.05      9.4±3.51ms        ? ?/sec     1.00      9.0±2.14ms        ? ?/sec
indexing/-movies-delete-facetedString-facetedNumber-searchable-nested-    1.22    18.3±11.42ms        ? ?/sec     1.00     15.0±5.79ms        ? ?/sec
indexing/-songs-delete-facetedString-facetedNumber-searchable-            1.00     41.4±4.20ms        ? ?/sec     1.28    53.0±13.97ms        ? ?/sec
indexing/-wiki-delete-searchable-                                         1.00   285.6±18.12ms        ? ?/sec     1.03   293.1±16.09ms        ? ?/sec
indexing/Indexing geo_point                                               1.03      60.8±0.45s        ? ?/sec     1.00      58.8±0.68s        ? ?/sec
indexing/Indexing movies in three batches                                 1.14      16.5±0.30s        ? ?/sec     1.00      14.5±0.24s        ? ?/sec
indexing/Indexing movies with default settings                            1.11      13.7±0.07s        ? ?/sec     1.00      12.3±0.28s        ? ?/sec
indexing/Indexing nested movies with default settings                     1.10      10.6±0.11s        ? ?/sec     1.00       9.6±0.15s        ? ?/sec
indexing/Indexing nested movies without any facets                        1.11       9.4±0.15s        ? ?/sec     1.00       8.5±0.10s        ? ?/sec
indexing/Indexing songs in three batches with default settings            1.18      66.2±0.39s        ? ?/sec     1.00      56.0±0.67s        ? ?/sec
indexing/Indexing songs with default settings                             1.07      58.7±1.26s        ? ?/sec     1.00      54.7±1.71s        ? ?/sec
indexing/Indexing songs without any facets                                1.08      53.1±0.88s        ? ?/sec     1.00      49.3±1.43s        ? ?/sec
indexing/Indexing songs without faceted numbers                           1.08      57.7±1.33s        ? ?/sec     1.00      53.3±0.98s        ? ?/sec
indexing/Indexing wiki                                                    1.06   1051.1±21.46s        ? ?/sec     1.00    989.6±24.55s        ? ?/sec
indexing/Indexing wiki in three batches                                   1.20    1184.8±8.93s        ? ?/sec     1.00     989.7±7.06s        ? ?/sec
indexing/Reindexing geo_point                                             1.04      67.5±0.75s        ? ?/sec     1.00      64.9±0.32s        ? ?/sec
indexing/Reindexing movies with default settings                          1.12      13.9±0.17s        ? ?/sec     1.00      12.4±0.13s        ? ?/sec
indexing/Reindexing songs with default settings                           1.05      60.6±0.84s        ? ?/sec     1.00      57.5±0.99s        ? ?/sec
indexing/Reindexing wiki                                                  1.07   1725.0±17.92s        ? ?/sec     1.00    1611.4±9.90s        ? ?/sec
```

Co-authored-by: Loïc Lecrenier <loic@meilisearch.com>
2022-08-17 14:06:12 +00:00