Commit Graph

1182 Commits

Author SHA1 Message Date
6cc975704d Add some documentation to facets.rs 2022-08-17 12:59:52 +02:00
93252769af Apply review suggestions 2022-08-17 12:41:22 +02:00
196f79115a Run cargo fmt 2022-08-17 12:28:33 +02:00
d10d78d520 Add integration tests for the IN filter 2022-08-17 12:28:33 +02:00
ca97cb0eda Implement the IN filter operator 2022-08-17 12:28:33 +02:00
cc7415bb31 Simplify FilterCondition code, made possible by the new NOT operator 2022-08-17 12:28:33 +02:00
44744d9e67 Implement the simplified NOT operator 2022-08-17 12:28:33 +02:00
01675771d5 Reimplement != filter to select all docids not selected by = 2022-08-17 12:28:33 +02:00
258c3dd563 Make AND+OR filters n-ary (store a vector of subfilters instead of 2)
NOTE: The token_at_depth is method is a bit useless now, as the only
cases where there would be a toke at depth 1000 are the cases where
the parser already stack-overflowed earlier.

Example: (((((... (x=1) ...)))))
2022-08-17 12:28:33 +02:00
39687908f1 Add documentation and comments to facets.rs 2022-08-17 12:26:49 +02:00
8d4b21a005 Switch string facet levels indexation to new algo
Write the algorithm once for both numbers and strings
2022-08-17 12:26:49 +02:00
cf0cd92ed4 Refactor Facets::execute to increase performance 2022-08-17 12:26:49 +02:00
cd2635ccfc Merge #602
602: Use mimalloc as the default allocator r=Kerollmops a=loiclec

## What does this PR do?
Use mimalloc as the global allocator for milli's benchmarks on macOS.

## Why?
On Linux, we use jemalloc, which is a very fast allocator. But on macOS, we currently use the system allocator, which is very slow. In practice, this difference in allocator speed means that it is difficult to gain insight into milli's performance by running benchmarks locally on the Mac.

By using mimalloc, which is another excellent allocator, we reduce the speed difference between the two platforms.

Co-authored-by: Loïc Lecrenier <loic@meilisearch.com>
2022-08-17 10:26:13 +00:00
78d9f0622d cargo fmt 2022-08-17 12:21:24 +02:00
4f9edf13d7 Remove commented-out function 2022-08-17 12:21:24 +02:00
405555b401 Add some documentation to PrefixTrieNode 2022-08-17 12:21:24 +02:00
1bc4788e59 Remove cached Allocations struct from wpppd indexing 2022-08-17 12:18:22 +02:00
ef75a77464 Fix undefined behaviour caused by reusing key from the database
New full snapshot:
---
source: milli/src/update/word_prefix_pair_proximity_docids.rs
---
5                a    1  [101, ]
5                a    2  [101, ]
5                am   1  [101, ]
5                b    4  [101, ]
5                be   4  [101, ]
am               a    3  [101, ]
amazing          a    1  [100, ]
amazing          a    2  [100, ]
amazing          a    3  [100, ]
amazing          an   1  [100, ]
amazing          an   2  [100, ]
amazing          b    2  [100, ]
amazing          be   2  [100, ]
an               a    1  [100, ]
an               a    2  [100, 202, ]
an               am   1  [100, ]
an               an   2  [100, ]
an               b    3  [100, ]
an               be   3  [100, ]
and              a    2  [100, ]
and              a    3  [100, ]
and              a    4  [100, ]
and              am   2  [100, ]
and              an   3  [100, ]
and              b    1  [100, ]
and              be   1  [100, ]
at               a    1  [100, 202, ]
at               a    2  [100, 101, ]
at               a    3  [100, ]
at               am   2  [100, 101, ]
at               an   1  [100, 202, ]
at               an   3  [100, ]
at               b    3  [101, ]
at               b    4  [100, ]
at               be   3  [101, ]
at               be   4  [100, ]
beautiful        a    2  [100, ]
beautiful        a    3  [100, ]
beautiful        a    4  [100, ]
beautiful        am   3  [100, ]
beautiful        an   2  [100, ]
beautiful        an   4  [100, ]
bell             a    2  [101, ]
bell             a    4  [101, ]
bell             am   4  [101, ]
extraordinary    a    2  [202, ]
extraordinary    a    3  [202, ]
extraordinary    an   2  [202, ]
house            a    3  [100, 202, ]
house            a    4  [100, 202, ]
house            am   4  [100, ]
house            an   3  [100, 202, ]
house            b    2  [100, ]
house            be   2  [100, ]
rings            a    1  [101, ]
rings            a    3  [101, ]
rings            am   3  [101, ]
rings            b    2  [101, ]
rings            be   2  [101, ]
the              a    3  [101, ]
the              b    1  [101, ]
the              be   1  [101, ]
2022-08-17 12:17:45 +02:00
7309111433 Don't run block code in doc tests of word_pair_proximity_docids 2022-08-17 12:17:18 +02:00
f6f8f543e1 Run cargo fmt 2022-08-17 12:17:18 +02:00
34c991ea02 Add newlines in documentation of word_prefix_pair_proximity_docids 2022-08-17 12:17:18 +02:00
06f3fd8c6d Add more comments to WordPrefixPairProximityDocids::execute 2022-08-17 12:17:18 +02:00
474500362c Update wpppd snapshots
New snapshot (yes, it's wrong as well, it will get fixed later):

---
source: milli/src/update/word_prefix_pair_proximity_docids.rs
---
5                a    1  [101, ]
5                a    2  [101, ]
5                am   1  [101, ]
5                b    4  [101, ]
5                be   4  [101, ]
am               a    3  [101, ]
amazing          a    1  [100, ]
amazing          a    2  [100, ]
amazing          a    3  [100, ]
amazing          an   1  [100, ]
amazing          an   2  [100, ]
amazing          b    2  [100, ]
amazing          be   2  [100, ]
an               a    1  [100, ]
an               a    2  [100, 202, ]
an               am   1  [100, ]
an               b    3  [100, ]
an               be   3  [100, ]
and              a    2  [100, ]
and              a    3  [100, ]
and              a    4  [100, ]
and              b    1  [100, ]
and              be   1  [100, ]
                 d\0  0  [100, 202, ]
an               an   2  [100, ]
and              am   2  [100, ]
and              an   3  [100, ]
at               a    2  [100, 101, ]
at               a    3  [100, ]
at               am   2  [100, 101, ]
at               an   1  [100, 202, ]
at               an   3  [100, ]
at               b    3  [101, ]
at               b    4  [100, ]
at               be   3  [101, ]
at               be   4  [100, ]
beautiful        a    2  [100, ]
beautiful        a    3  [100, ]
beautiful        a    4  [100, ]
beautiful        am   3  [100, ]
beautiful        an   2  [100, ]
beautiful        an   4  [100, ]
bell             a    2  [101, ]
bell             a    4  [101, ]
bell             am   4  [101, ]
extraordinary    a    2  [202, ]
extraordinary    a    3  [202, ]
extraordinary    an   2  [202, ]
house            a    4  [100, 202, ]
house            a    4  [100, ]
house            am   4  [100, ]
house            an   3  [100, 202, ]
house            b    2  [100, ]
house            be   2  [100, ]
rings            a    1  [101, ]
rings            a    3  [101, ]
rings            am   3  [101, ]
rings            b    2  [101, ]
rings            be   2  [101, ]
the              a    3  [101, ]
the              b    1  [101, ]
the              be   1  [101, ]
2022-08-17 12:17:18 +02:00
ea4a96761c Move content of readme for WordPrefixPairProximityDocids into the code 2022-08-17 12:05:37 +02:00
220921628b Simplify and document WordPrefixPairProximityDocIds::execute 2022-08-17 11:59:19 +02:00
044356d221 Optimise WordPrefixPairProximityDocIds merge operation 2022-08-17 11:59:18 +02:00
d350114159 Add tests for WordPrefixPairProximityDocIds 2022-08-17 11:59:15 +02:00
86807ca848 Refactor word prefix pair proximity indexation further 2022-08-17 11:59:13 +02:00
306593144d Refactor word prefix pair proximity indexation 2022-08-17 11:59:00 +02:00
20be69e1b9 Always use mimalloc as the global allocator 2022-08-16 20:09:36 +02:00
dea00311b6 Add type annotations to remove compiler error 2022-08-16 09:19:30 +02:00
6f49126223 Fix db_snap macro with inline parameter 2022-08-10 15:55:22 +02:00
12920f2a4f Fix paths of snapshot tests 2022-08-10 15:53:46 +02:00
4b7fd4dfae Update insta version 2022-08-10 15:53:46 +02:00
ce560fdcb5 Add documentation for db_snap! 2022-08-10 15:53:46 +02:00
748bb86b5b cargo fmt 2022-08-10 15:53:46 +02:00
051f24f674 Switch to snapshot tests for search/matches/mod.rs 2022-08-10 15:53:46 +02:00
d2e01528a6 Switch to snapshot tests for search/criteria/typo.rs 2022-08-10 15:53:46 +02:00
a9c7d82693 Switch to snapshot tests for search/criteria/attribute.rs 2022-08-10 15:53:46 +02:00
4bba2f41d7 Switch to snapshot tests for query_tree.rs 2022-08-10 15:53:46 +02:00
8ac24d3114 Cargo fmt + fix compiler warnings/error 2022-08-10 15:53:46 +02:00
6066256689 Add snapshot tests for indexing of word_prefix_pair_proximity_docids 2022-08-10 15:53:46 +02:00
3a734af159 Add snapshot tests for Facets::execute 2022-08-10 15:53:46 +02:00
b9907997e4 Remove old snapshot tests code 2022-08-10 15:53:46 +02:00
ef889ade5d Refactor snapshot tests 2022-08-10 15:53:46 +02:00
334098a7e0 Add index snapshot test helper function 2022-08-10 15:53:46 +02:00
b389be48a0 Factorize phrase computation 2022-08-08 10:37:31 +02:00
58cb1c1bda Simplify unit tests in facet/filter.rs 2022-08-04 12:03:44 +02:00
acff17fb88 Simplify indexing tests 2022-08-04 12:03:13 +02:00
21284cf235 Merge #556
556: Add EXISTS filter r=loiclec a=loiclec

## What does this PR do?

Fixes issue [#2484](https://github.com/meilisearch/meilisearch/issues/2484) in the meilisearch repo.

It creates a `field EXISTS` filter which selects all documents containing the `field` key. 
For example, with the following documents:
```json
[{
	"id": 0,
	"colour": []
},
{
	"id": 1,
	"colour": ["blue", "green"]
},
{
	"id": 2,
	"colour": 145238
},
{
	"id": 3,
	"colour": null
},
{
	"id": 4,
	"colour": {
		"green": []
	}
},
{
	"id": 5,
	"colour": {}
},
{
	"id": 6
}]
```
Then the filter `colour EXISTS` selects the ids `[0, 1, 2, 3, 4, 5]`. The filter `colour NOT EXISTS` selects `[6]`.

## Details
There is a new database named `facet-id-exists-docids`. Its keys are field ids and its values are bitmaps of all the document ids where the corresponding field exists.

To create this database, the indexing part of milli had to be adapted. The implementation there is basically copy/pasted from the code handling the `facet-id-f64-docids` database, with appropriate modifications in place.

There was an issue involving the flattening of documents during (re)indexing. Previously, the following JSON:
```json
{
    "id": 0,
    "colour": [],
    "size": {}
}
```
would be flattened to:
```json
{
    "id": 0
}
```
prior to being given to the extraction pipeline.

This transformation would lose the information that is needed to populate the `facet-id-exists-docids` database. Therefore, I have also changed the implementation of the `flatten-serde-json` crate. Now, as it traverses the Json, it keeps track of which key was encountered. Then, at the end, if a previously encountered key is not present in the flattened object, it adds that key to the object with an empty array as value. For example:
```json
{
    "id": 0,
    "colour": {
        "green": [],
        "blue": 1
    },
    "size": {}
} 
```
becomes
```json
{
    "id": 0,
    "colour": [],
    "colour.green": [],
    "colour.blue": 1,
    "size": []
} 
```


Co-authored-by: Kerollmops <clement@meilisearch.com>
2022-08-04 09:46:06 +00:00