Commit Graph

159 Commits

Author SHA1 Message Date
c15c076da9 DB BREAKING: Count the number of words in field_id_word_count_docids 2023-06-08 12:07:11 +02:00
8628a0c856 Remove docid_word_positions_db + fix deletion bug
That would happen when a word was deleted from all exact attributes
but not all regular attributes.
2023-06-07 10:52:50 +02:00
90bc230820 Merge remote-tracking branch 'origin/main' into search-refactor
Conflicts | resolution
----------|-----------
Cargo.lock | added mimalloc
Cargo.toml |  took origin/main version
milli/src/search/criteria/exactness.rs | deleted after checking it was only clippy changes
milli/src/search/query_tree.rs | deleted after checking it was only clippy changes
2023-05-03 12:19:06 +02:00
414b3fae89 Merge #3571
3571: Introduce two filters to select documents with `null` and empty fields r=irevoire a=Kerollmops

# Pull Request

## Related issue
This PR implements the `X IS NULL`, `X IS NOT NULL`, `X IS EMPTY`, `X IS NOT EMPTY` filters that [this comment](https://github.com/meilisearch/product/discussions/539#discussioncomment-5115884) is describing in a very detailed manner.

## What does this PR do?

### `IS NULL` and `IS NOT NULL`

This PR will be exposed as a prototype for now. Below is the copy/pasted version of a spec that defines this filter.

- `IS NULL` matches fields that `EXISTS` AND `= IS NULL`
- `IS NOT NULL` matches fields that `NOT EXISTS` OR `!= IS NULL`

1. `{"name": "A", "price": null}`
2. `{"name": "A", "price": 10}`
3. `{"name": "A"}`

`price IS NULL` would match 1
`price IS NOT NULL` or `NOT price IS NULL` would match 2,3
`price EXISTS` would match 1, 2
`price NOT EXISTS` or `NOT price EXISTS` would match 3

common query : `(price EXISTS) AND (price IS NOT NULL)` would match 2

### `IS EMPTY` and `IS NOT EMPTY`

- `IS EMPTY` matches Array `[]`, Object `{}`, or String `""` fields that `EXISTS` and are empty
- `IS NOT EMPTY` matches fields that `NOT EXISTS` OR are not empty.

1. `{"name": "A", "tags": null}`
2. `{"name": "A", "tags": [null]}`
3. `{"name": "A", "tags": []}`
4. `{"name": "A", "tags": ["hello","world"]}`
5. `{"name": "A", "tags": [""]}`
6. `{"name": "A"}`
7. `{"name": "A", "tags": {}}`
8. `{"name": "A", "tags": {"t1":"v1"}}`
9. `{"name": "A", "tags": {"t1":""}}`
10. `{"name": "A", "tags": ""}`

`tags IS EMPTY` would match 3,7,10
`tags IS NOT EMPTY` or `NOT tags IS EMPTY` would match 1,2,4,5,6,8,9
`tags IS NULL` would match 1
`tags IS NOT NULL` or `NOT tags IS NULL` would match 2,3,4,5,6,7,8,9,10
`tags EXISTS` would match 1,2,3,4,5,7,8,9,10
`tags NOT EXISTS` or `NOT tags EXISTS` would match 6

common query : `(tags EXISTS) AND (tags IS NOT NULL) AND (tags IS NOT EMPTY)` would match 2,4,5,8,9

## What should the reviewer do?

- Check that I tested the filters
- Check that I deleted the ids of the documents when deleting documents


Co-authored-by: Clément Renault <clement@meilisearch.com>
Co-authored-by: Kerollmops <clement@meilisearch.com>
2023-04-27 13:14:00 +00:00
84d9c731f8 Fix bug in encoding of word_position_docids and word_fid_docids 2023-04-24 09:59:30 +02:00
a81165f0d8 Merge remote-tracking branch 'origin/main' into search-refactor 2023-04-07 10:15:55 +02:00
130d2061bd Fix indexing of word_position_docid and fid 2023-04-06 17:50:39 +02:00
66ddee4390 Fix word_position_docids indexing 2023-04-06 17:50:39 +02:00
e58426109a Fix panics and issues in exactness graph ranking rule 2023-04-06 17:50:39 +02:00
996619b22a Increase position by 8 on hard separator when building query terms 2023-04-06 17:50:39 +02:00
efea1e5837 Fix facet normalization 2023-03-29 12:02:24 +02:00
9b2653427d Split position DB into fid and relative position DB 2023-03-23 09:22:01 +01:00
1a9c58a7ab Fix a bug with the new flattening rules 2023-03-15 16:56:44 +01:00
ea016d97af Implementing an IS EMPTY filter 2023-03-15 14:12:34 +01:00
2f8eb4f54a last PR fixes 2023-03-09 15:34:36 +01:00
0ad53784e7 Create a new struct to reduce the type complexity 2023-03-09 13:21:21 +01:00
e106b16148 Fix a typo in a variable
Co-authored-by: Louis Dureuil <louis@meilisearch.com>

aaa
2023-03-09 13:08:02 +01:00
5deea631ea fix clippy too many arguments 2023-03-09 11:19:13 +01:00
b4b859ec8c Fix typos 2023-03-09 10:58:35 +01:00
7dc04747fd Make clippy happy 2023-03-08 17:37:08 +01:00
43ff236df8 Write the NULL facet values in the database 2023-03-08 16:49:53 +01:00
19ab4d1a15 Classify the NULL fields values in the facet extractor 2023-03-08 16:49:31 +01:00
24c0775c67 Change indexing threshold 2023-03-08 12:36:04 +01:00
3092cf0448 Fix clippy errors 2023-03-08 10:53:42 +01:00
da48506f15 Rerun extraction when language detection might have failed 2023-03-07 18:35:26 +01:00
bbecab8948 fix clippy 2023-02-21 10:18:44 +01:00
8aa808d51b Merge branch 'main' into enhance-language-detection 2023-02-20 18:14:34 +01:00
18796d6e6a Consider null as a valid geo object 2023-02-20 13:45:51 +01:00
d8207356f4 Skip script,language insertion if language is undetected 2023-01-31 11:28:05 +01:00
fd60a39f1c Format code 2023-01-31 11:28:05 +01:00
d97fb6117e Extract and index data 2023-01-31 11:28:05 +01:00
d1fc42b53a Use compatibility decomposition normalizer in facets 2023-01-18 15:02:13 +01:00
8d0ace2d64 Avoid creating a MatchingWord for words that exceed the length limit 2022-11-28 10:20:13 +01:00
ac3baafbe8 Truncate facet values that are too long before indexing them 2022-11-17 11:29:42 +01:00
d95d02cb8a Fix Facet Indexing bugs
1. Handle keys with variable length correctly

This fixes https://github.com/meilisearch/meilisearch/issues/3042 and
is easily reproducible with the updated fuzz tests, which now generate
keys with variable lengths.

2. Prevent adding facets to the database if their encoded value does
not satisfy `valid_lmdb_key`.

This fixes an indexing failure when a document had a filterable
attribute containing a value whose length is higher than ~500 bytes.
2022-11-17 11:29:42 +01:00
70465aa5ce Execute cargo fmt 2022-11-04 08:59:58 +09:00
3009981d31 Fix clippy errors
Add clippy job

Add clippy job to CI
2022-11-04 08:58:14 +09:00
c7322f704c Fix cargo clippy errors
Dont apply clippy for tests for now

Fix clippy warnings of filter-parser package

parent 8352febd646ec4bcf56a44161e5c4dce0e55111f
author unvalley <38400669+unvalley@users.noreply.github.com> 1666325847 +0900
committer unvalley <kirohi.code@gmail.com> 1666791316 +0900

Update .github/workflows/rust.yml

Co-authored-by: Clémentine Urquizar - curqui <clementine@meilisearch.com>

Allow clippy lint too_many_argments

Allow clippy lint needless_collect

Allow clippy lint too_many_arguments and type_complexity

Fix for clippy warnings comparison_chains

Fix for clippy warnings vec_init_then_push

Allow clippy lint should_implement_trait

Allow clippy lint drop_non_drop

Fix lifetime clipy warnings in filter-paprser

Execute cargo fmt

Fix clippy remaining warnings

Fix clippy remaining warnings again and allow lint on each place
2022-10-27 01:04:23 +09:00
54c0cf93fe Merge remote-tracking branch 'origin/main' into facet-levels-refactor 2022-10-26 15:13:34 +02:00
a034a1e628 Move StrRefCodec and ByteSliceRefCodec to their own files 2022-10-26 13:47:46 +02:00
51961e1064 Polish some details 2022-10-26 13:47:04 +02:00
b1ab09196c Remove outdated TODOs 2022-10-26 13:47:04 +02:00
9026867d17 Give same interface to bulk and incremental facet indexing types
+ cargo fmt, oops, sorry for the bad history :(
2022-10-26 13:47:04 +02:00
485a72306d Refactor facet-related codecs 2022-10-26 13:47:04 +02:00
afdf87f6f7 Fix bugs in asc/desc criterion and facet indexing 2022-10-26 13:47:04 +02:00
a7201ece04 cargo fmt 2022-10-26 13:47:04 +02:00
61252248fb Fix some facet indexing bugs 2022-10-26 13:47:04 +02:00
85824ee203 Try to make facet indexing incremental 2022-10-26 13:47:04 +02:00
39a4a0a362 Reintroduce filter range search and facet extractors 2022-10-26 13:46:14 +02:00
c3f49f766d Prepare refactor of facets database
Prepare refactor of facets database
2022-10-26 13:46:14 +02:00