Compare commits


93 Commits

Author SHA1 Message Date
Clément Renault
f46c2de607 Fix the tests 2023-09-15 12:01:29 +02:00
Tamo
388d78f70e update the description of the cli argument 2023-09-15 12:01:29 +02:00
Clément Renault
b252900470 Expose a new flag to limit the number of batched tasks 2023-09-15 12:01:28 +02:00
meili-bors[bot]
8822ca234e Merge #4057
4057: Fix stats delete by filter for v1.3.4 r=irevoire a=curquiza

Fixes https://github.com/meilisearch/meilisearch/issues/4018 for v1.3.4

Co-authored-by: Tamo <tamo@meilisearch.com>
2023-09-12 13:34:39 +00:00
Tamo
d23abc8771 fix clippy 2023-09-12 11:26:48 +02:00
Tamo
036b846e4d Fix the stats of the documents deletion by filter
The issue was that the `DocumentDeletionByFilter` operation was not
declared as an index operation, which meant the index stats were not
reprocessed after the operation was applied.
2023-09-12 11:26:41 +02:00
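
For context, a minimal sketch of what "declared as an index operation" means here (simplified, hypothetical types — not the actual scheduler code; see the index-scheduler diff at the bottom of this page): the variant moves from the top-level `Batch` enum into `IndexOperation`, whose batches go through the path that reprocesses index stats.

```rust
// Simplified sketch with hypothetical types, not the actual scheduler code.
#[derive(Debug)]
enum IndexOperation {
    // Moved here from the top-level `Batch` enum, so the scheduler now treats
    // deletion-by-filter like any other index operation and refreshes the
    // index stats after applying it.
    IndexDocumentDeletionByFilter { index_uid: String },
}

#[derive(Debug)]
enum Batch {
    IndexOperation { op: IndexOperation, must_create_index: bool },
    Dump, // ...other non-index batches: snapshots, task deletion, etc.
}

fn main() {
    let batch = Batch::IndexOperation {
        op: IndexOperation::IndexDocumentDeletionByFilter { index_uid: "movies".into() },
        must_create_index: false,
    };
    // Only `Batch::IndexOperation` goes through the stats-refresh path.
    println!("{}", matches!(batch, Batch::IndexOperation { .. }));
}
```
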
meili-bors[bot]
9889390d13 Merge #4055
4055: Update version for the next release (v1.3.4) in Cargo.toml r=curquiza a=meili-bot

⚠️ This PR is automatically generated. Check the new version is the expected one and Cargo.lock has been updated before merging.

Co-authored-by: curquiza <curquiza@users.noreply.github.com>
2023-09-11 17:04:31 +00:00
curquiza
8e2bb29cf1 Update version for the next release (v1.3.4) in Cargo.toml 2023-09-11 16:20:03 +00:00
meili-bors[bot]
256cf33bca Merge #4039
4039: Fix multiple vectors dimensions r=ManyTheFish a=Kerollmops

This PR fixes #4035, making it possible to provide multiple vectors in documents. It does so by extracting the vectors from the non-flattened version of the documents.

Co-authored-by: Kerollmops <clement@meilisearch.com>
2023-09-07 09:25:58 +00:00
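
As a rough illustration of the fix (a sketch with `serde_json`, not milli's actual extraction code): the non-flattened document still distinguishes a single embedding from an array of embeddings, whereas flattening destroys that nesting.

```rust
// Illustrative sketch: read `_vectors` from the original, non-flattened
// document, where it may hold one embedding or an array of embeddings.
use serde_json::{json, Value};

fn extract_vectors(doc: &Value) -> Option<Vec<Vec<f64>>> {
    let vectors = doc.get("_vectors")?;
    let to_embedding = |v: &Value| -> Option<Vec<f64>> {
        v.as_array()?.iter().map(Value::as_f64).collect()
    };
    match vectors {
        // one embedding: `[0.1, 0.2]`
        Value::Array(a) if a.first().map_or(false, Value::is_number) => {
            Some(vec![to_embedding(vectors)?])
        }
        // several embeddings: `[[0.1, 0.2], [0.3, 0.4]]`
        Value::Array(a) => a.iter().map(to_embedding).collect(),
        _ => None,
    }
}

fn main() {
    let doc = json!({ "id": 1, "_vectors": [[0.1, 0.2], [0.3, 0.4]] });
    assert_eq!(extract_vectors(&doc), Some(vec![vec![0.1, 0.2], vec![0.3, 0.4]]));
}
```
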
meili-bors[bot]
9945cbf9db Merge #4038
4038: Fix filter escaping issues r=ManyTheFish a=Kerollmops

This PR fixes #4034 by always unescaping the escaped sequences. Users must always use quotes (single or double) to escape the filter values.

Co-authored-by: Kerollmops <clement@meilisearch.com>
2023-09-06 12:29:29 +00:00
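
A minimal sketch of the unescaping step this PR introduces, using the `unescaper` crate added to the filter-parser (see the diff below); the helper name is hypothetical.

```rust
// Sketch of the unescaping step; `unescaper` is the crate added by this PR.
fn unescape_filter_value(raw: &str) -> Result<String, unescaper::Error> {
    // `\\` becomes `\`, `\"` becomes `"`, `\n` a newline, etc.
    unescaper::unescape(raw)
}

fn main() {
    assert_eq!(unescape_filter_value(r#"Hello \"world\""#).unwrap(), r#"Hello "world""#);
    assert_eq!(unescape_filter_value(r#"foo\\bar"#).unwrap(), r"foo\bar");
}
```
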
Kerollmops
03d0f628bd Use the unescaper crate to unescape any char sequence 2023-09-06 13:59:45 +02:00
Kerollmops
ea78060916 Fix tests that were supposed to escape characters 2023-09-06 13:59:45 +02:00
Kerollmops
b42d48187a Add a test case scenario 2023-09-06 13:59:44 +02:00
Kerollmops
679c0b0f97 Extract the vectors from the non-flattened version of the documents 2023-09-06 12:26:00 +02:00
Kerollmops
e02d0064bd Add a test case scenario 2023-09-06 12:26:00 +02:00
meili-bors[bot]
7ef3572f11 Merge #4037
4037: Update version for the next release (v1.3.3) in Cargo.toml r=curquiza a=meili-bot

⚠️ This PR is automatically generated. Check the new version is the expected one and Cargo.lock has been updated before merging.

Co-authored-by: curquiza <curquiza@users.noreply.github.com>
2023-09-06 09:50:58 +00:00
curquiza
93285041a9 Update version for the next release (v1.3.3) in Cargo.toml 2023-09-06 09:23:20 +00:00
meili-bors[bot]
cdb4b3e024 Merge #4013
4013: Fix the ranking rule by temporarily disabling an assert in the bucket sort algorithm r=Kerollmops a=Kerollmops

This PR temporarily disables an assertion that was making the search crash. [I created a tracking issue](https://github.com/meilisearch/meilisearch/issues/4012) to find a better way to fix this.

It no longer reverts a20e4d447c, which seemed to generate unreachable graphs and make the bucket sort ranking algorithm panic by entering an unreachable state. We discussed this in the comments below.

Temporarily fixes #4002, fixes #4006, and fixes #3995.

---

It took me approximately 2 days to find the first bad commit just because I'm bad at `git bisect` x `bash`, i.e. [I misused `%1` with `$!` to kill the most recently backgrounded job](https://unix.stackexchange.com/a/340084/212574)...

<details>
  <summary>Here is the script I used to find the invalid commit</summary>

```bash
#!/usr/bin/env bash

set -x

# remove the data
rm -rf data.ms

# build meilisearch
cargo build --release
# ignore this commit if it doesn't compile
if [[ $? != 0 ]]; then
    exit 125
fi

# index the dump and start from it
./target/release/meilisearch \
--http-addr 'localhost:7705' \
--import-dump "$HOME/Downloads/modified-20230822-083016113.dump" &

# wait 5 sec while it indexes the docs
sleep 5

# check if the server crashes on requests
echo '{
    "q": "rtx 305",
    "attributesToHighlight": [
        "*"
    ],
    "highlightPreTag": "<ais-highlight-0000000000>",
    "highlightPostTag": "</ais-highlight-0000000000>",
    "limit": 21,
    "offset": 0
}' | xh 'localhost:7705/indexes/arvutitark_local_orderables/search'

last_exit_code=$?

# Now kill Meilisearch
kill $!

# Revert the potentially modified Cargo.lock
git checkout .

exit $last_exit_code
```
</details>

Co-authored-by: Kerollmops <clement@meilisearch.com>
Co-authored-by: Clément Renault <clement@meilisearch.com>
2023-08-23 15:30:56 +00:00
Clément Renault
8c0ebd1331 Update milli/src/search/new/bucket_sort.rs
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2023-08-23 16:40:39 +02:00
Kerollmops
5130e06b41 Temporarily disable an assert in the ranking rules 2023-08-23 16:11:54 +02:00
Clément Renault
08e27ef73f Merge pull request #4008 from meilisearch/fix-highlighting-panic
Bump charabia to 0.8.3
2023-08-23 11:56:45 +02:00
Kerollmops
717b069907 Bump charabia to 0.8.3 2023-08-22 16:25:00 +02:00
meili-bors[bot]
7ea154673a Merge #4000
4000: Update version for the next release (v1.3.2) in Cargo.toml r=irevoire a=meili-bot

⚠️ This PR is automatically generated. Check the new version is the expected one and Cargo.lock has been updated before merging.

Co-authored-by: irevoire <irevoire@users.noreply.github.com>
2023-08-16 10:41:33 +00:00
irevoire
b947f3bb9d Update version for the next release (v1.3.2) in Cargo.toml 2023-08-16 08:20:36 +00:00
meili-bors[bot]
4c35817c5f Merge #3998
3998: Accept the `null` JSON value as a value of the `_vectors` field r=irevoire a=Kerollmops

This PR fixes #3979 by accepting `null` JSON values in the `_vectors` fields provided by the user.

Can the reviewer please verify that I am merging into the right branch?
I think we must create a new _release-v1.3.2_.

Co-authored-by: Kerollmops <clement@meilisearch.com>
2023-08-16 08:12:24 +00:00
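
A minimal serde sketch of the accepted shape (the struct and field types here are assumptions, not meilisearch's actual types): an `Option` field maps JSON `null` to `None` instead of rejecting the document.

```rust
// Minimal serde sketch with assumed types: `Option` deserializes JSON `null`
// to `None` instead of returning an error.
use serde::Deserialize;

#[derive(Deserialize, Debug)]
struct Document {
    id: u64,
    #[serde(default, rename = "_vectors")]
    vectors: Option<Vec<Vec<f32>>>,
}

fn main() {
    let doc: Document = serde_json::from_str(r#"{ "id": 1, "_vectors": null }"#).unwrap();
    assert!(doc.vectors.is_none()); // `null` is now accepted, like a missing field
}
```
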
Kerollmops
c53841e166 Accept the null JSON value as the value of _vectors 2023-08-14 16:03:55 +02:00
meili-bors[bot]
ef3d098b4d Merge #3976
3976: Fix the get stats method r=ManyTheFish a=irevoire

# Pull Request

- The get stats method of the index-scheduler was not taking the processing tasks into account at all. It was therefore returning a wrong number of enqueued tasks and zero processing tasks.
- Added a test
- Currently, this method is **ONLY** used to compute the `meilisearch_nb_tasks` field of the **experimental feature** metrics.

## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/3972


Co-authored-by: Tamo <tamo@meilisearch.com>
2023-08-10 10:55:50 +00:00
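
A toy sketch of the corrected accounting (assumed shapes, plain `HashSet`s instead of the scheduler's task database): the enqueued count must exclude the tasks currently being processed, which are reported as processing instead.

```rust
use std::collections::HashSet;

fn status_counts(enqueued: &HashSet<u32>, processing: &HashSet<u32>) -> (usize, usize) {
    // Before the fix, the stats ignored `processing` entirely, returning
    // (enqueued.len(), 0) even while a batch was running.
    let enqueued_only = enqueued.difference(processing).count();
    (enqueued_only, processing.len())
}

fn main() {
    let enqueued: HashSet<u32> = [1, 2, 3].into();
    let processing: HashSet<u32> = [1].into();
    assert_eq!(status_counts(&enqueued, &processing), (2, 1));
}
```
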
meili-bors[bot]
44c1900f36 Merge #3986
3986: Fix geo bounding box with strings r=ManyTheFish a=irevoire

# Pull Request

When sending a document with a geofield of type string (e.g. `{ "_geo": { "lat": 12, "lng": "13" } }`), the geo bounding box would exclude this document.

This PR fixes this issue by automatically parsing the string value whenever we're working on a geofield.

## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/3973

## What does this PR do?
- Automatically parse the facet value if we're working on a geofield.
- Make insta work with snapshots in loops or closures executed multiple times (you may need to update your CLI if it panics after this PR: `cargo install cargo-insta`).
- Add one integration test in milli and in meilisearch to ensure it works forever.
- Add back three dump snapshots that mysteriously disappeared; I don't know how.


Co-authored-by: Tamo <tamo@meilisearch.com>
2023-08-09 07:58:15 +00:00
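
A sketch of the lenient parsing described above (hypothetical helper, not milli's actual code): accept both JSON numbers and numeric strings for geofield coordinates.

```rust
// Illustrative sketch: accept numbers and numeric strings for `_geo` fields.
use serde_json::{json, Value};

fn parse_geo_coord(value: &Value) -> Option<f64> {
    match value {
        Value::Number(n) => n.as_f64(),
        // The fix: a coordinate given as a string is parsed into an f64
        // instead of the document being silently excluded.
        Value::String(s) => s.parse().ok(),
        _ => None,
    }
}

fn main() {
    let geo = json!({ "lat": 12, "lng": "13" });
    assert_eq!(parse_geo_coord(&geo["lat"]), Some(12.0));
    assert_eq!(parse_geo_coord(&geo["lng"]), Some(13.0));
}
```
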
meili-bors[bot]
04671d0751 Merge #3981
3981: Truncate the normalized long facets used in the search for facet value r=irevoire a=ManyTheFish

# Pull Request
 Truncate the normalized long facets used in the search for facet value

## targeted release

v1.3.1

## Related issue
Fixes #3978


Co-authored-by: ManyTheFish <many@meilisearch.com>
2023-08-08 15:07:07 +00:00
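
A sketch of the truncation idea (the constant and helper are illustrative; the real limit and code differ): cut the normalized facet value at a maximum byte length, on a UTF-8 char boundary, before using it in the search for facet values.

```rust
// Illustrative: truncate on a char boundary to stay valid UTF-8.
const MAX_FACET_VALUE_LENGTH: usize = 8; // hypothetical; the real limit differs

fn truncate_facet_value(normalized: &str) -> &str {
    let mut end = normalized.len().min(MAX_FACET_VALUE_LENGTH);
    while !normalized.is_char_boundary(end) {
        end -= 1;
    }
    &normalized[..end]
}

fn main() {
    // 'é' is 2 bytes, so the 8-byte cut lands exactly after "brévent".
    assert_eq!(truncate_facet_value("brévent-chamonix"), "brévent");
}
```
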
Tamo
4f4c669d50 add back some dump snapshots that disappeared. it's completely unrelated to this PR 2023-08-08 16:58:14 +02:00
ManyTheFish
35758db9ec Truncate the normalized long facets used in search for facet value 2023-08-08 16:38:30 +02:00
Tamo
4988199bb9 ensure the geoboundingbox works with strings and int geofields in milli and meilisearch 2023-08-08 16:29:25 +02:00
Tamo
83991ee770 enable the multi-snapshot attribute in insta. This will let us use insta in loops 2023-08-08 16:28:38 +02:00
Tamo
9d061cec26 automatically parse the filterable attribute to float if it's a geo field 2023-08-08 16:28:07 +02:00
Tamo
fe819a9d80 fix the get stats method
It was not taking into account the processing tasks at all
2023-08-08 13:21:15 +02:00
meili-bors[bot]
e338ceb97f Merge #3982
3982: Update version for the next release (v1.3.1) in Cargo.toml r=irevoire a=meili-bot

⚠️ This PR is automatically generated. Check the new version is the expected one and Cargo.lock has been updated before merging.

Co-authored-by: irevoire <irevoire@users.noreply.github.com>
2023-08-08 10:30:56 +00:00
irevoire
75c87d5391 Update version for the next release (v1.3.1) in Cargo.toml 2023-08-08 10:30:06 +00:00
meili-bors[bot]
5b0157c6c6 Merge #3955
3955: Update mini-dashboard to version 0.2.11 r=curquiza a=bidoubiwa

# Pull Request

## What does this PR do?
- Updates the mini-dashboard to version [0.2.11](https://github.com/meilisearch/mini-dashboard/releases/tag/v0.2.11)

## PR checklist
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!


Co-authored-by: Charlotte Vermandel <charlottevermandel@gmail.com>
2023-07-27 11:59:55 +00:00
Charlotte Vermandel
3b9a87c790 Update mini-dashboard to version 0.2.11 2023-07-27 13:16:32 +02:00
meili-bors[bot]
3a3414270d Merge #3952
3952: Use the new safe `read-txn-no-tls` heed feature r=ManyTheFish a=Kerollmops

[We recently found out](https://github.com/meilisearch/heed/issues/191#issuecomment-1650280513) that the `read-sync-txn` heed feature was invalid and must be removed from this crate. We were declaring it in milli/meilisearch but, fortunately, not sharing the `RoTxn`s across threads 😮‍💨

[I recently introduced the `read-txn-no-tls` heed feature](https://github.com/meilisearch/heed/pull/194), which implements `RoTxn: Send` and allows multiple read transactions on a single thread (which we use).

This PR removes the `sync-read-txn` heed feature from the _Cargo.toml_ file. I will fix this in heed v0.20.0 and will file a RustSec advisory in the meantime.

Co-authored-by: Clément Renault <clement@meilisearch.com>
2023-07-26 16:40:58 +00:00
meili-bors[bot]
d06e0905db Merge #3953
3953: Update UTM campaign r=curquiza a=macraig

# Pull Request

## What does this PR do?
Redirect CTAs to Cloud landing page



Co-authored-by: María <maria@Marias-MacBook-Pro.local>
2023-07-26 15:20:40 +00:00
meili-bors[bot]
939b2fc6fd Merge #3949
3949: Fix score details casing r=Kerollmops a=ManyTheFish

# Pull Request

Fixes #3941


Co-authored-by: ManyTheFish <many@meilisearch.com>
2023-07-26 14:14:59 +00:00
María
fae61372be Redirect CTAs to Cloud landing page 2023-07-26 15:54:43 +02:00
Clément Renault
d8b47b689e Use the new read-txn-no-tls heed feature 2023-07-26 15:45:15 +02:00
meili-bors[bot]
be72be7c0d Merge #3942
3942: Normalize for the search the facets values r=ManyTheFish a=Kerollmops

This PR improves and fixes the search for facet values feature. Searching for _bre_ wasn't returning facet values like _brévent_ or _brô_.

The issue was related to the fact that facets are normalized but not in the same way as the `searchableAttributes` are. We decided to normalize them further and add another intermediate database where the key is the normalized facet value, and the value is a set of the non-normalized facets. We then use these non-normalized ones to get the correct counts by fetching the associated databases.

### What's missing in this PR?
 - [x] Apply the change to the whole set of `SearchForFacetValue::execute` conditions.
 - [x] Factorize the code that does an intermediate normalized value fetch in a function.
 - [x] Add or modify the search for facet value test.

Co-authored-by: Clément Renault <clement@meilisearch.com>
Co-authored-by: Kerollmops <clement@meilisearch.com>
2023-07-25 14:37:17 +00:00
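
A toy model of the intermediate database described above, using a `BTreeMap` in place of an LMDB database: the key is the further-normalized facet value and the value is the set of original values, which are then used to fetch the correct counts. The normalization here is a stand-in.

```rust
use std::collections::{BTreeMap, BTreeSet};

fn normalize(s: &str) -> String {
    // Stand-in for the real normalization (lowercasing + deunicoding).
    s.to_lowercase().replace('é', "e").replace('ô', "o")
}

fn main() {
    let mut index: BTreeMap<String, BTreeSet<String>> = BTreeMap::new();
    for original in ["Brévent", "Brô", "Brevent"] {
        index.entry(normalize(original)).or_default().insert(original.to_string());
    }
    // Searching for "bre" now matches "Brévent" through its normalized form.
    let matches: Vec<_> = index
        .range("bre".to_string()..)
        .take_while(|(k, _)| k.starts_with("bre"))
        .flat_map(|(_, originals)| originals.iter())
        .collect();
    assert_eq!(matches, ["Brevent", "Brévent"]);
}
```
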
ManyTheFish
88559a2d54 Fix score details casing 2023-07-25 15:49:33 +02:00
Clément Renault
59201a7852 Use snapshot instead of asserts
Co-authored-by: Many the fish <many@meilisearch.com>
2023-07-25 15:34:05 +02:00
meili-bors[bot]
9e3e69373e Merge #3948
3948: Fix hnsw internal panic by using another library r=ManyTheFish a=Kerollmops

This pull request fixes #3923. The issue concerns the `hnsw` crate panicking due to a wrong call to the `[T]::copy_from_slice` function.

I decided to switch the library to `instant-distance`, which is maintained [by someone trustworthy](https://lib.rs/~djc) who maintains a lot of very important crates.

- [x] Make Clippy happy with the first commit.
- [x] Reproduce the #3923 bug without this patch
- [x] Check if the bug disappeared with this PR.
- [x] Test with [the Algolia e-commerce dataset](https://www.notion.so/meilisearch/Algolia-Ecommerce-c5fa3b5f23a7485295df7e87306d5859).

Co-authored-by: Kerollmops <clement@meilisearch.com>
2023-07-25 13:28:25 +00:00
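
For reference, a minimal usage sketch of `instant-distance` (based on the crate's documented API; the `Point` impl and data here are assumptions): the caller defines the distance function, builds the index, and iterates over the nearest neighbors.

```rust
use instant_distance::{Builder, Search};

#[derive(Clone, Copy, Debug)]
struct Point(f32, f32);

impl instant_distance::Point for Point {
    // The caller supplies the metric; here, plain Euclidean distance.
    fn distance(&self, other: &Self) -> f32 {
        ((self.0 - other.0).powi(2) + (self.1 - other.1).powi(2)).sqrt()
    }
}

fn main() {
    let points = vec![Point(0.0, 0.0), Point(1.0, 1.0), Point(5.0, 5.0)];
    let values = vec!["a", "b", "c"];
    let map = Builder::default().build(points, values);
    let mut search = Search::default();
    let closest = map.search(&Point(1.1, 0.9), &mut search).next().unwrap();
    println!("closest value: {}", closest.value); // "b"
}
```
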
Kerollmops
29ab54b259 Replace the hnsw crate by the instant-distance one 2023-07-25 12:37:35 +02:00
Kerollmops
86d8bb3a3e Make clippy happy (again) 2023-07-25 10:30:50 +02:00
Kerollmops
0e2a5951b4 Add more advanced tests 2023-07-24 18:04:58 +02:00
Kerollmops
691a536893 Implement the facet search with the normalized index 2023-07-24 17:56:17 +02:00
Clément Renault
df528b41d8 Normalize for the search the facets values 2023-07-20 17:57:07 +02:00
meili-bors[bot]
2452ec55b4 Merge #3940
3940: Update mini dashboard v0.2.9 r=gillian-meilisearch a=bidoubiwa

# Pull Request


## What does this PR do?
- Updates the mini-dashboard to version [0.2.9](https://github.com/meilisearch/mini-dashboard/releases/tag/v0.2.9)

## PR checklist
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!


Co-authored-by: Charlotte Vermandel <charlottevermandel@gmail.com>
2023-07-20 15:08:59 +00:00
Charlotte Vermandel
54ae1b5a67 Update mini-dashboard to version 0.2.9 2023-07-20 14:11:17 +02:00
meili-bors[bot]
3070a20580 Merge #3937
3937: Update Charabia to the last version r=Kerollmops a=ManyTheFish

# Pull Request

## Related issue
Fixes #3924

## What does this PR do?
- Update Charabia


Co-authored-by: ManyTheFish <many@meilisearch.com>
2023-07-19 14:57:38 +00:00
ManyTheFish
0497f93494 Update Charabia to the last version 2023-07-19 15:19:32 +02:00
meili-bors[bot]
d5ab750627 Merge #3935
3935: Update mini-dashboard to version 0.2.8 r=Kerollmops a=bidoubiwa

# Pull Request


## What does this PR do?
- Updates the mini-dashboard to version [0.2.8](https://github.com/meilisearch/mini-dashboard/releases/tag/v0.2.8)

## PR checklist
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!


Co-authored-by: Charlotte Vermandel <charlottevermandel@gmail.com>
2023-07-18 12:59:29 +00:00
Charlotte Vermandel
2afd10f96d Update mini-dashboard to version 0.2.8 2023-07-18 14:49:36 +02:00
meili-bors[bot]
2d2619bd90 Merge #3933
3933: Stop computing the update files size r=ManyTheFish a=Kerollmops

This PR, related to #3934, removes the part that computes the total size of the `data.ms/update_files` folder, which can take a lot of time when many updates must be processed.

It is not an API-breaking change, but it does change the results we show to the user: the `databaseSize` field returned by the `/stats` endpoint will be smaller.

Co-authored-by: Kerollmops <clement@meilisearch.com>
2023-07-18 12:02:08 +00:00
Kerollmops
516d2df862 Stop computing the update files size 2023-07-18 11:51:30 +02:00
meili-bors[bot]
c76b488ab1 Merge #3929
3929: Fix a panic when sorting geo fields represented by strings r=Kerollmops a=Kerollmops

This PR fixes #3927 by retrieving and parsing the original string values into f64s. I also added a test to ensure we don't break it in a future version.

Co-authored-by: Kerollmops <clement@meilisearch.com>
2023-07-18 09:13:22 +00:00
Kerollmops
d383afc82b Fix the geo sort when lat and lng are strings 2023-07-17 18:28:04 +02:00
Kerollmops
f9d94c5845 Test geo sort with string lat/lng 2023-07-17 18:28:03 +02:00
meili-bors[bot]
7745cc9d3c Merge #3921
3921: Deactivate camel case segmentation r=dureuill a=ManyTheFish

# Pull Request
This PR deactivates camel-case segmentation to restore the ability to accept typos on camel-cased words

## Related issue
Fixes #3869
Fixes #3818

## What does this PR do?
- deactivates camelcase segmentation

related to #3919



Co-authored-by: ManyTheFish <many@meilisearch.com>
2023-07-13 11:00:14 +00:00
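
Purely for illustration (this is not Charabia's implementation): camel-case segmentation splits on lowercase-to-uppercase transitions, so a camel-cased word is indexed as several short tokens, and a typo crossing the boundary can no longer match the word as a whole.

```rust
// Illustrative splitter: "camelCase" -> ["camel", "Case"], which is why a
// typo such as "camelCqse" stops matching once segmentation is applied.
fn camel_case_segment(word: &str) -> Vec<String> {
    let mut parts = vec![String::new()];
    let mut prev_lower = false;
    for c in word.chars() {
        if prev_lower && c.is_uppercase() {
            parts.push(String::new());
        }
        parts.last_mut().unwrap().push(c);
        prev_lower = c.is_lowercase();
    }
    parts
}

fn main() {
    assert_eq!(camel_case_segment("camelCase"), ["camel", "Case"]);
}
```
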
meili-bors[bot]
657f24ec5f Merge #3907
3907: Add telemetry for define field to search on at query time r=dureuill a=ManyTheFish

Add "attributes_to_search_on" telemetry usage counter:
```json
"attributes_to_search_on": {
   "total_number_of_use": 12,
},
```

This counts the number of search queries in which the user uses the `attributesToSearchOn` field.

related to https://github.com/meilisearch/specifications/pull/251

## reviewers:

- `@macraig` for validating the telemetry's name
- `@dureuill` for validating the code

Co-authored-by: ManyTheFish <many@meilisearch.com>
2023-07-13 10:14:00 +00:00
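
A sketch of the aggregation (the struct is an assumption; the saturating add matches commit 359b90288d below): each analytics batch merges its counter without risking an overflow.

```rust
// Hypothetical aggregator: each search request bumps the counter when
// `attributesToSearchOn` is used; batches merge with a saturating add.
#[derive(Default)]
struct SearchAggregator {
    total_number_of_use: u64,
}

impl SearchAggregator {
    fn aggregate(&mut self, other: &SearchAggregator) {
        self.total_number_of_use =
            self.total_number_of_use.saturating_add(other.total_number_of_use);
    }
}

fn main() {
    let mut total = SearchAggregator::default();
    total.aggregate(&SearchAggregator { total_number_of_use: u64::MAX });
    total.aggregate(&SearchAggregator { total_number_of_use: 1 });
    assert_eq!(total.total_number_of_use, u64::MAX); // saturated, no overflow panic
}
```
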
ManyTheFish
c106906f8f deactivate camelCase segmentation 2023-07-13 12:06:27 +02:00
ManyTheFish
9c0691156f Add tests 2023-07-13 11:53:13 +02:00
ManyTheFish
359b90288d Use saturating add 2023-07-13 11:38:28 +02:00
ManyTheFish
13e3f8faae Fix typo 2023-07-13 11:34:50 +02:00
meili-bors[bot]
fd7c66fd62 Merge #3915
3915: `attributesToSearchOn` supports wildcards r=ManyTheFish a=dureuill

# Pull Request

## Related issue

Fixes #3912  and #3911 

## What does this PR do?
- Adding `*` in the list of `attributesToSearchOn` allows searching on all the `searchableAttributes`.
- If `searchableAttributes` contains `"*"`, then any attribute is accepted in the `attributesToSearchOn` list.


Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2023-07-13 09:33:10 +00:00
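
A sketch of the two wildcard rules above (hypothetical helper, simplified): `*` in `attributesToSearchOn` expands to all searchable attributes, and a `searchableAttributes` of `["*"]` accepts any requested attribute.

```rust
fn resolve_attributes_to_search_on(
    searchable: &[&str],
    requested: &[&str],
) -> Result<Vec<String>, String> {
    // `*` in the request expands to every searchable attribute.
    if requested.contains(&"*") {
        return Ok(searchable.iter().map(|s| s.to_string()).collect());
    }
    // `searchableAttributes: ["*"]` accepts any requested attribute.
    let accept_any = searchable.contains(&"*");
    requested
        .iter()
        .map(|attr| {
            if accept_any || searchable.contains(attr) {
                Ok(attr.to_string())
            } else {
                Err(format!("`{attr}` is not a searchable attribute"))
            }
        })
        .collect()
}

fn main() {
    let searchable = ["title", "overview"];
    assert_eq!(
        resolve_attributes_to_search_on(&searchable, &["*"]).unwrap(),
        ["title", "overview"]
    );
    assert!(resolve_attributes_to_search_on(&searchable, &["genre"]).is_err());
}
```
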
Louis Dureuil
183f23f40d More relevant test
Co-authored-by: Many the fish <many@meilisearch.com>
2023-07-12 16:06:15 +02:00
Louis Dureuil
16c8437b28 Update tests 2023-07-12 11:21:19 +02:00
Louis Dureuil
4310928803 Fixes #3912 2023-07-12 10:08:56 +02:00
Louis Dureuil
74315b4ea8 Fixes #3911 2023-07-12 10:08:29 +02:00
meili-bors[bot]
177e6e27f9 Merge #3901
3901: Fix experimental analytics r=curquiza a=dureuill

# Pull Request

## Related issue
Fixes https://github.com/meilisearch/specifications/pull/250#discussion_r1253191583

## What does this PR do?
- `snake_case` instead of `camelCase` for feature fields


Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2023-07-10 16:22:59 +00:00
meili-bors[bot]
50afe724ae Merge #3909
3909: Effectively send the `vector.max_vector_size` telemetry r=curquiza a=Kerollmops

This PR effectively aggregates and sends the `vector.max_vector_size` analytics value.

Co-authored-by: Kerollmops <clement@meilisearch.com>
2023-07-10 15:44:30 +00:00
Kerollmops
012c960fad Send the vector.max_vector_size telemetry 2023-07-10 16:50:37 +02:00
meili-bors[bot]
76f6d3357e Merge #3908
3908: Allow a comma-separated value to the `vector` argument in GET search r=Kerollmops a=dureuill

# Pull Request

For request:

```
 curl \
  -X GET 'http://localhost:7700/indexes/movies/search?vector=0.123,1.124,244'
```

Before PR: 

```
{"message":"Invalid value type for parameter `vector`: expected a string, but found a string: `0,1,2`","code":"invalid_search_vector","type":"invalid_request","link":"https://docs.meilisearch.com/errors#invalid_search_vector"}%
```

After PR:

```
{"hits":[],"query":"","vector":[0.123,1.124,244.0],"processingTimeMs":0,"limit":20,"offset":0,"estimatedTotalHits":1000}%
```

cc `@gmourier` `@bidoubiwa` 


Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2023-07-10 14:25:44 +00:00
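
A sketch of the parsing this enables (hypothetical helper): split the comma-separated query-string value and parse each component as a float.

```rust
// Illustrative: "0.123,1.124,244" -> vec![0.123, 1.124, 244.0]
fn parse_vector_param(raw: &str) -> Result<Vec<f32>, std::num::ParseFloatError> {
    raw.split(',').map(|s| s.trim().parse::<f32>()).collect()
}

fn main() {
    assert_eq!(parse_vector_param("0.123,1.124,244").unwrap(), vec![0.123, 1.124, 244.0]);
}
```
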
Louis Dureuil
d59e969c16 Allow a comma-separated value to the vector argument in GET search 2023-07-10 16:16:34 +02:00
meili-bors[bot]
eb7a1aa7af Merge #3904
3904: Sort by lexicographic order after normalization r=dureuill a=dureuill

# Pull Request

## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/3893

## What does this PR do?
- Re-sort stop words after normalization so they're not sent out-of-order to the FST


Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2023-07-10 12:12:05 +00:00
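
A sketch of the fix using the `fst` crate, which the engine uses for stop words (the normalization below is a stand-in): normalization can change a word's lexicographic position, and `fst::Set` requires its input in sorted order, so the words must be re-sorted after normalizing.

```rust
fn normalize(w: &str) -> String {
    // Stand-in for the real normalization (lowercasing + deunicoding).
    w.to_lowercase().replace('é', "e")
}

fn main() {
    // ["Z", "é"] is sorted, but its normalized form ["z", "e"] is not:
    // without re-sorting, `fst::Set::from_iter` returns an out-of-order error.
    let mut normalized: Vec<String> = ["Z", "é"].iter().map(|w| normalize(w)).collect();
    normalized.sort();
    normalized.dedup();
    let set = fst::Set::from_iter(normalized).expect("sorted input");
    assert!(set.contains("e") && set.contains("z"));
}
```
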
ManyTheFish
c30a14cb97 Add telemetry 2023-07-10 13:12:12 +02:00
meili-bors[bot]
a3ca8412ce Merge #3906
3906: Add "scoring.*" analytics to multi search route r=Kerollmops a=dureuill

# Pull Request

## Related issue
Fixes https://github.com/meilisearch/specifications/pull/252#discussion_r1254375746 by implementing (3): multi search now returns the "score.show_ranking_rule" and "score.show_ranking_rule_details" analytics.


Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2023-07-10 09:51:30 +00:00
Louis Dureuil
106f98aa72 Add "scoring.*" analytics to multi search route 2023-07-10 11:45:43 +02:00
Louis Dureuil
40fa59d64c Sort by lexicographic order after normalization 2023-07-10 09:26:59 +02:00
Louis Dureuil
bb40ce6e35 Experimental features analytics match the spec 2023-07-10 08:57:53 +02:00
meili-bors[bot]
0c8dbf6fa6 Merge #3897
3897: Add automated tests for `/experimental-features` route r=Kerollmops a=dureuill

# Pull Request

## What does this PR do?
- Make `RuntimeTogglableFeatures` `Eq`
- Add various tests for the `/experimental-features` route
  - Integration tests for the route itself
  - Integration tests for the effect of enabling `scoreDetails` and `vectorStore` through this route.
  - Dump integration tests


Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2023-07-06 13:37:56 +00:00
Louis Dureuil
dd6519b64f Dump tests 2023-07-06 14:22:29 +02:00
Louis Dureuil
da02a9cf32 Make RuntimeTogglableFeatures Eq 2023-07-06 14:20:58 +02:00
Louis Dureuil
2d3cec11a7 Search integration test to check score details and vector store 2023-07-06 09:02:02 +02:00
Louis Dureuil
76e1ee9988 integration test on "/experimental-features" route 2023-07-06 09:01:28 +02:00
Louis Dureuil
222615d3df Allow to get/set features in integration test server 2023-07-06 09:01:05 +02:00
Louis Dureuil
11d024c613 Authentication tests 2023-07-06 09:00:51 +02:00
63 changed files with 2067 additions and 580 deletions

Cargo.lock (generated)

@@ -405,7 +405,7 @@ checksum = "16e62a023e7c117e27523144c5d2459f4397fcc3cab0085af8e2224f643a0193"
dependencies = [
"proc-macro2",
"quote",
"syn 2.0.18",
"syn 2.0.26",
]
[[package]]
@@ -416,7 +416,7 @@ checksum = "b9ccdd8f2a161be9bd5c023df56f1b2a0bd1d83872ae53b71a84a12c9bf6e842"
dependencies = [
"proc-macro2",
"quote",
"syn 2.0.18",
"syn 2.0.26",
]
[[package]]
@@ -469,7 +469,7 @@ checksum = "8c3c1a368f70d6cf7302d78f8f7093da241fb8e8807c05cc9e51a125895a6d5b"
[[package]]
name = "benchmarks"
version = "1.3.0"
version = "1.3.4"
dependencies = [
"anyhow",
"bytes",
@@ -603,7 +603,7 @@ checksum = "fdde5c9cd29ebd706ce1b35600920a33550e402fc998a2e53ad3b42c3c47a192"
dependencies = [
"proc-macro2",
"quote",
"syn 2.0.18",
"syn 2.0.26",
]
[[package]]
@@ -700,16 +700,15 @@ dependencies = [
[[package]]
name = "charabia"
version = "0.8.1"
version = "0.8.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "bb49850f555eb71aa6fc6d4d79420e81f4d89fa56e0e9c0f6d19aace2f56c554"
checksum = "098219a776307414866165a03a9cc68c1578764fe3616fe979e1c280790ddd73"
dependencies = [
"aho-corasick",
"cow-utils",
"csv",
"deunicode",
"either",
"finl_unicode",
"fst",
"irg-kvariants",
"jieba-rs",
@@ -795,7 +794,7 @@ dependencies = [
"heck",
"proc-macro2",
"quote",
"syn 2.0.18",
"syn 2.0.26",
]
[[package]]
@@ -1022,9 +1021,9 @@ dependencies = [
[[package]]
name = "csv"
version = "1.2.1"
version = "1.2.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "0b015497079b9a9d69c02ad25de6c0a6edef051ea6360a327d0bd05802ef64ad"
checksum = "626ae34994d3d8d668f4269922248239db4ae42d538b14c398b74a52208e8086"
dependencies = [
"csv-core",
"itoa",
@@ -1198,15 +1197,9 @@ dependencies = [
"winapi",
]
[[package]]
name = "doc-comment"
version = "0.3.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "fea41bba32d969b513997752735605054bc0dfa92b4c56bf1189f2e174be7a10"
[[package]]
name = "dump"
version = "1.3.0"
version = "1.3.4"
dependencies = [
"anyhow",
"big_s",
@@ -1343,7 +1336,7 @@ checksum = "eecf8589574ce9b895052fa12d69af7a233f99e6107f5cb8dd1044f2a17bfdcb"
dependencies = [
"proc-macro2",
"quote",
"syn 2.0.18",
"syn 2.0.26",
]
[[package]]
@@ -1359,6 +1352,12 @@ dependencies = [
"termcolor",
]
[[package]]
name = "equivalent"
version = "1.0.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "5443807d6dff69373d433ab9ef5378ad8df50ca6298caf15de6e52e24aaf54d5"
[[package]]
name = "errno"
version = "0.3.1"
@@ -1414,7 +1413,7 @@ dependencies = [
[[package]]
name = "file-store"
version = "1.3.0"
version = "1.3.4"
dependencies = [
"faux",
"tempfile",
@@ -1436,19 +1435,14 @@ dependencies = [
[[package]]
name = "filter-parser"
version = "1.3.0"
version = "1.3.4"
dependencies = [
"insta",
"nom",
"nom_locate",
"unescaper",
]
[[package]]
name = "finl_unicode"
version = "1.2.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "8fcfdc7a0362c9f4444381a9e697c79d435fe65b52a37466fc2c1184cee9edc6"
[[package]]
name = "flate2"
version = "1.0.26"
@@ -1461,7 +1455,7 @@ dependencies = [
[[package]]
name = "flatten-serde-json"
version = "1.3.0"
version = "1.3.4"
dependencies = [
"criterion",
"serde_json",
@@ -1544,7 +1538,7 @@ checksum = "89ca545a94061b6365f2c7355b4b32bd20df3ff95f02da9329b34ccc3bd6ee72"
dependencies = [
"proc-macro2",
"quote",
"syn 2.0.18",
"syn 2.0.26",
]
[[package]]
@@ -1579,7 +1573,7 @@ dependencies = [
[[package]]
name = "fuzzers"
version = "1.3.0"
version = "1.3.4"
dependencies = [
"arbitrary",
"clap",
@@ -1686,7 +1680,7 @@ dependencies = [
"futures-sink",
"futures-util",
"http",
"indexmap",
"indexmap 1.9.3",
"slab",
"tokio",
"tokio-util",
@@ -1708,15 +1702,6 @@ dependencies = [
"byteorder",
]
[[package]]
name = "hashbrown"
version = "0.11.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "ab5ef0d4909ef3724cc8cce6ccc8572c5c817592e9285f5464f8e86f8bd3726e"
dependencies = [
"ahash 0.7.6",
]
[[package]]
name = "hashbrown"
version = "0.12.3"
@@ -1726,6 +1711,12 @@ dependencies = [
"ahash 0.7.6",
]
[[package]]
name = "hashbrown"
version = "0.14.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "2c6201b9ff9fd90a5a3bac2e56a830d0caa509576f0e503818ee82c181b3437a"
[[package]]
name = "heapless"
version = "0.7.16"
@@ -1747,8 +1738,8 @@ checksum = "95505c38b4572b2d910cecb0281560f54b440a19336cbbcb27bf6ce6adc6f5a8"
[[package]]
name = "heed"
version = "0.12.5"
source = "git+https://github.com/meilisearch/heed?tag=v0.12.6#8c5b94225fc949c02bb7b900cc50ffaf6b584b1e"
version = "0.12.7"
source = "git+https://github.com/meilisearch/heed?tag=v0.12.7#061a5276b1f336f5f3302bee291e336041d88632"
dependencies = [
"byteorder",
"heed-traits",
@@ -1765,12 +1756,12 @@ dependencies = [
[[package]]
name = "heed-traits"
version = "0.7.0"
source = "git+https://github.com/meilisearch/heed?tag=v0.12.6#8c5b94225fc949c02bb7b900cc50ffaf6b584b1e"
source = "git+https://github.com/meilisearch/heed?tag=v0.12.7#061a5276b1f336f5f3302bee291e336041d88632"
[[package]]
name = "heed-types"
version = "0.7.2"
source = "git+https://github.com/meilisearch/heed?tag=v0.12.6#8c5b94225fc949c02bb7b900cc50ffaf6b584b1e"
source = "git+https://github.com/meilisearch/heed?tag=v0.12.7#061a5276b1f336f5f3302bee291e336041d88632"
dependencies = [
"bincode",
"heed-traits",
@@ -1809,22 +1800,6 @@ dependencies = [
"digest",
]
[[package]]
name = "hnsw"
version = "0.11.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "2b9740ebf8769ec4ad6762cc951ba18f39bba6dfbc2fbbe46285f7539af79752"
dependencies = [
"ahash 0.7.6",
"hashbrown 0.11.2",
"libm",
"num-traits",
"rand_core",
"serde",
"smallvec",
"space",
]
[[package]]
name = "http"
version = "0.2.9"
@@ -1920,7 +1895,7 @@ dependencies = [
[[package]]
name = "index-scheduler"
version = "1.3.0"
version = "1.3.4"
dependencies = [
"anyhow",
"big_s",
@@ -1959,6 +1934,16 @@ dependencies = [
"serde",
]
[[package]]
name = "indexmap"
version = "2.0.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "d5477fe2230a79769d8dc68e0eabf5437907c0457a5614a9e8dddb67f65eb65d"
dependencies = [
"equivalent",
"hashbrown 0.14.0",
]
[[package]]
name = "inout"
version = "0.1.3"
@@ -1993,6 +1978,21 @@ dependencies = [
"cfg-if",
]
[[package]]
name = "instant-distance"
version = "0.6.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "8c619cdaa30bb84088963968bee12a45ea5fbbf355f2c021bcd15589f5ca494a"
dependencies = [
"num_cpus",
"ordered-float",
"parking_lot",
"rand",
"rayon",
"serde",
"serde-big-array",
]
[[package]]
name = "io-lifetimes"
version = "1.0.11"
@@ -2082,7 +2082,7 @@ dependencies = [
[[package]]
name = "json-depth-checker"
version = "1.3.0"
version = "1.3.4"
dependencies = [
"criterion",
"serde_json",
@@ -2171,9 +2171,9 @@ dependencies = [
[[package]]
name = "lindera-cc-cedict-builder"
version = "0.25.0"
version = "0.27.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "4c6bf79b29a90bcd22036e494d6cc9ac3abe9ab604b21f3258ba6dc1ce501801"
checksum = "2d2e8f2ca97ddf952fe340642511b9c14b373cb2eef711d526bb8ef2ca0969b8"
dependencies = [
"anyhow",
"bincode",
@@ -2190,9 +2190,9 @@ dependencies = [
[[package]]
name = "lindera-compress"
version = "0.25.0"
version = "0.27.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "8f2e99e67736352bbb6ed1c273643975822505067ca32194b0981040bc50527a"
checksum = "f72b460559bcbe8a9cee85ea4a5056133ed3abf373031191589236e656d65b59"
dependencies = [
"anyhow",
"flate2",
@@ -2201,9 +2201,9 @@ dependencies = [
[[package]]
name = "lindera-core"
version = "0.25.0"
version = "0.27.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "7c3935e966409156f22cb4b334b21b0dce84b7aa1cad62214b466489d249c8e5"
checksum = "f586eb8a9393c32d5525e0e9336a3727bd1329674740097126f3b0bff8a1a1ea"
dependencies = [
"anyhow",
"bincode",
@@ -2218,9 +2218,9 @@ dependencies = [
[[package]]
name = "lindera-decompress"
version = "0.25.0"
version = "0.27.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "7476406abb63c49d7f59c88b9b868ee8d2981495ea7e2c3ad129902f9916b3c6"
checksum = "1fb1facd8da698072fcc7338bd757730db53d59f313f44dd583fa03681dcc0e1"
dependencies = [
"anyhow",
"flate2",
@@ -2229,9 +2229,9 @@ dependencies = [
[[package]]
name = "lindera-dictionary"
version = "0.25.0"
version = "0.27.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "808b7d2b3cabc25a4022526d484a4cfd1d5924dc76a26e0379707698841acef2"
checksum = "ec7be7410b1da7017a8948986b87af67082f605e9a716f0989790d795d677f0c"
dependencies = [
"anyhow",
"bincode",
@@ -2249,9 +2249,9 @@ dependencies = [
[[package]]
name = "lindera-ipadic-builder"
version = "0.25.0"
version = "0.27.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "31f373a280958c930e5ee4a1e4db3a0ee0542afaf02d3b5cacb8cab4e298648e"
checksum = "705d07f8a45d04fd95149f7ad41a26d1f9e56c9c00402be6f9dd05e3d88b99c6"
dependencies = [
"anyhow",
"bincode",
@@ -2270,9 +2270,9 @@ dependencies = [
[[package]]
name = "lindera-ipadic-neologd-builder"
version = "0.25.0"
version = "0.27.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "92eff98e9ed1a7a412b91709c2343457a04ef02fa0c27c27e3a5892f5591eae9"
checksum = "633a93983ba13fba42328311a501091bd4a7aff0c94ae9eaa9d4733dd2b0468a"
dependencies = [
"anyhow",
"bincode",
@@ -2291,9 +2291,9 @@ dependencies = [
[[package]]
name = "lindera-ko-dic"
version = "0.25.0"
version = "0.27.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "74c6d5bf7d8092bd6d10de7a5d74b70ea7cf234586235b0d6cdb903b05a6c9e2"
checksum = "a428e0d316b6c86f51bd919479692bc41ad840dba266ebc044663970f431ea18"
dependencies = [
"bincode",
"byteorder",
@@ -2308,9 +2308,9 @@ dependencies = [
[[package]]
name = "lindera-ko-dic-builder"
version = "0.25.0"
version = "0.27.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "f0a4add6d3c1e41ec9e2690d33e287d0223fb59a30ccee4980c23f31368cae1e"
checksum = "2a5288704c6b8a069c0a1705c38758e836497698b50453373ab3d56c6f9a7ef8"
dependencies = [
"anyhow",
"bincode",
@@ -2328,9 +2328,9 @@ dependencies = [
[[package]]
name = "lindera-tokenizer"
version = "0.25.0"
version = "0.27.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "cb6a8acbd068019d1cdac7316f0dcb87f8e33ede2b13aa237f45114f9750afb8"
checksum = "106ba439b2e87529d9bbedbb88d69f635baba1195c26502b308f55a85885fc81"
dependencies = [
"bincode",
"byteorder",
@@ -2343,9 +2343,9 @@ dependencies = [
[[package]]
name = "lindera-unidic"
version = "0.25.0"
version = "0.27.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "14abf0613d350b30d3b0406a33b1de8fa8d829f26516909421702174785991c8"
checksum = "3399b6dcfe1701333451d184ff3c677f433b320153427b146360c9e4bd8cb816"
dependencies = [
"bincode",
"byteorder",
@@ -2360,9 +2360,9 @@ dependencies = [
[[package]]
name = "lindera-unidic-builder"
version = "0.25.0"
version = "0.27.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "e204ed53d9bd63227d1e6a6c1f122ca039e00a8634ac32e7fb0281eeec8615c4"
checksum = "b698227fdaeac32289173ab389b990d4eb00a40cbc9912020f69a0c491dabf55"
dependencies = [
"anyhow",
"bincode",
@@ -2442,9 +2442,9 @@ dependencies = [
[[package]]
name = "log"
version = "0.4.18"
version = "0.4.19"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "518ef76f2f87365916b142844c16d8fefd85039bc5699050210a7778ee1cd1de"
checksum = "b06a4cde4c0f271a446782e3eff8de789548ce57dbc8eca9292c27f4a42004b4"
[[package]]
name = "logging_timer"
@@ -2477,7 +2477,7 @@ dependencies = [
"once_cell",
"proc-macro2",
"quote",
"syn 2.0.18",
"syn 2.0.26",
]
[[package]]
@@ -2494,7 +2494,7 @@ checksum = "490cc448043f947bae3cbee9c203358d62dbee0db12107a74be5c30ccfd09771"
[[package]]
name = "meili-snap"
version = "1.3.0"
version = "1.3.4"
dependencies = [
"insta",
"md5",
@@ -2503,7 +2503,7 @@ dependencies = [
[[package]]
name = "meilisearch"
version = "1.3.0"
version = "1.3.4"
dependencies = [
"actix-cors",
"actix-http",
@@ -2534,7 +2534,7 @@ dependencies = [
"hex",
"http",
"index-scheduler",
"indexmap",
"indexmap 1.9.3",
"insta",
"is-terminal",
"itertools",
@@ -2592,7 +2592,7 @@ dependencies = [
[[package]]
name = "meilisearch-auth"
version = "1.3.0"
version = "1.3.4"
dependencies = [
"base64 0.21.2",
"enum-iterator",
@@ -2611,7 +2611,7 @@ dependencies = [
[[package]]
name = "meilisearch-types"
version = "1.3.0"
version = "1.3.4"
dependencies = [
"actix-web",
"anyhow",
@@ -2665,7 +2665,7 @@ dependencies = [
[[package]]
name = "milli"
version = "1.3.0"
version = "1.3.4"
dependencies = [
"big_s",
"bimap",
@@ -2686,9 +2686,9 @@ dependencies = [
"geoutils",
"grenad",
"heed",
"hnsw",
"indexmap",
"indexmap 1.9.3",
"insta",
"instant-distance",
"itertools",
"json-depth-checker",
"levenshtein_automata",
@@ -2712,7 +2712,6 @@ dependencies = [
"smallstr",
"smallvec",
"smartstring",
"space",
"tempfile",
"thiserror",
"time",
@@ -2873,9 +2872,9 @@ checksum = "f69e48cd7c8e5bb52a1da1287fdbfd877c32673176583ce664cd63b201aba385"
[[package]]
name = "once_cell"
version = "1.17.1"
version = "1.18.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "b7e5500299e16ebb147ae15a00a942af264cf3688f47923b8fc2cd5858f23ad3"
checksum = "dd8b5dd2ae5ed71462c540258bedcb51965123ad7e7ccf4b9a8cafaa4a63576d"
[[package]]
name = "oorandom"
@@ -2996,7 +2995,7 @@ checksum = "478c572c3d73181ff3c2539045f6eb99e5491218eae919370993b890cdbdd98e"
[[package]]
name = "permissive-json-pointer"
version = "1.3.0"
version = "1.3.4"
dependencies = [
"big_s",
"serde_json",
@@ -3032,7 +3031,7 @@ dependencies = [
"pest_meta",
"proc-macro2",
"quote",
"syn 2.0.18",
"syn 2.0.26",
]
[[package]]
@@ -3177,9 +3176,9 @@ dependencies = [
[[package]]
name = "proc-macro2"
version = "1.0.59"
version = "1.0.66"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "6aeca18b86b413c660b781aa319e4e2648a3e6f9eadc9b47e9038e6fe9f3451b"
checksum = "18fb31db3f9bddb2ea821cde30a9f70117e3f119938b5ee630b7403aa6e2ead9"
dependencies = [
"unicode-ident",
]
@@ -3222,9 +3221,9 @@ checksum = "106dd99e98437432fed6519dedecfade6a06a73bb7b2a1e019fdd2bee5778d94"
[[package]]
name = "quote"
version = "1.0.28"
version = "1.0.31"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "1b9ab9c7eadfd8df19006f1cf1a4aed13540ed5cbc047010ece5826e10825488"
checksum = "5fe8a65d69dd0808184ebb5f836ab526bb259db23c657efa38711b1072ee47f0"
dependencies = [
"proc-macro2",
]
@@ -3585,13 +3584,22 @@ checksum = "bebd363326d05ec3e2f532ab7660680f3b02130d780c299bca73469d521bc0ed"
[[package]]
name = "serde"
version = "1.0.163"
version = "1.0.171"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "2113ab51b87a539ae008b5c6c02dc020ffa39afd2d83cffcb3f4eb2722cebec2"
checksum = "30e27d1e4fd7659406c492fd6cfaf2066ba8773de45ca75e855590f856dc34a9"
dependencies = [
"serde_derive",
]
[[package]]
name = "serde-big-array"
version = "0.5.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "11fc7cc2c76d73e0f27ee52abbd64eec84d46f370c88371120433196934e4b7f"
dependencies = [
"serde",
]
[[package]]
name = "serde-cs"
version = "0.2.4"
@@ -3603,22 +3611,22 @@ dependencies = [
[[package]]
name = "serde_derive"
version = "1.0.163"
version = "1.0.171"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "8c805777e3930c8883389c602315a24224bcc738b63905ef87cd1420353ea93e"
checksum = "389894603bd18c46fa56231694f8d827779c0951a667087194cf9de94ed24682"
dependencies = [
"proc-macro2",
"quote",
"syn 2.0.18",
"syn 2.0.26",
]
[[package]]
name = "serde_json"
version = "1.0.96"
version = "1.0.103"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "057d394a50403bcac12672b2b18fb387ab6d289d957dab67dd201875391e52f1"
checksum = "d03b412469450d4404fe8499a268edd7f8b79fecb074b0d812ad64ca21f4031b"
dependencies = [
"indexmap",
"indexmap 2.0.0",
"itoa",
"ryu",
"serde",
@@ -3741,9 +3749,6 @@ name = "smallvec"
version = "1.10.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "a507befe795404456341dfab10cef66ead4c041f62b8b11bbb92bffe5d0953e0"
dependencies = [
"serde",
]
[[package]]
name = "smartstring"
@@ -3766,16 +3771,6 @@ dependencies = [
"winapi",
]
[[package]]
name = "space"
version = "0.17.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "c5ab9701ae895386d13db622abf411989deff7109b13b46b6173bb4ce5c1d123"
dependencies = [
"doc-comment",
"num-traits",
]
[[package]]
name = "spin"
version = "0.5.2"
@@ -3839,9 +3834,9 @@ dependencies = [
[[package]]
name = "syn"
version = "2.0.18"
version = "2.0.26"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "32d41677bcbe24c20c52e7c70b0d8db04134c5d1066bf98662e2871ad200ea3e"
checksum = "45c3457aacde3c65315de5031ec191ce46604304d2446e803d71ade03308d970"
dependencies = [
"proc-macro2",
"quote",
@@ -3928,22 +3923,22 @@ dependencies = [
[[package]]
name = "thiserror"
version = "1.0.40"
version = "1.0.43"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "978c9a314bd8dc99be594bc3c175faaa9794be04a5a5e153caba6915336cebac"
checksum = "a35fc5b8971143ca348fa6df4f024d4d55264f3468c71ad1c2f365b0a4d58c42"
dependencies = [
"thiserror-impl",
]
[[package]]
name = "thiserror-impl"
version = "1.0.40"
version = "1.0.43"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "f9456a42c5b0d803c8cd86e73dd7cc9edd429499f37a3550d286d5e86720569f"
checksum = "463fe12d7993d3b327787537ce8dd4dfa058de32fc2b195ef3cde03dc4771e8f"
dependencies = [
"proc-macro2",
"quote",
"syn 2.0.18",
"syn 2.0.26",
]
[[package]]
@@ -4025,7 +4020,7 @@ checksum = "630bdcf245f78637c13ec01ffae6187cca34625e8c63150d424b59e55af2675e"
dependencies = [
"proc-macro2",
"quote",
"syn 2.0.18",
"syn 2.0.26",
]
[[package]]
@@ -4101,7 +4096,7 @@ version = "0.19.10"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "2380d56e8670370eee6566b0bfd4265f65b3f432e8c6d85623f728d4fa31f739"
dependencies = [
"indexmap",
"indexmap 1.9.3",
"serde",
"serde_spanned",
"toml_datetime",
@@ -4153,6 +4148,15 @@ version = "0.1.5"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "9e79c4d996edb816c91e4308506774452e55e95c3c9de07b6729e17e15a5ef81"
[[package]]
name = "unescaper"
version = "0.1.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "a96a44ae11e25afb520af4534fd7b0bd8cd613e35a78def813b8cf41631fa3c8"
dependencies = [
"thiserror",
]
[[package]]
name = "unicase"
version = "2.6.0"
@@ -4350,7 +4354,7 @@ dependencies = [
"once_cell",
"proc-macro2",
"quote",
"syn 2.0.18",
"syn 2.0.26",
"wasm-bindgen-shared",
]
@@ -4384,7 +4388,7 @@ checksum = "e128beba882dd1eb6200e1dc92ae6c5dbaa4311aa7bb211ca035779e5efc39f8"
dependencies = [
"proc-macro2",
"quote",
"syn 2.0.18",
"syn 2.0.26",
"wasm-bindgen-backend",
"wasm-bindgen-shared",
]


@@ -18,7 +18,7 @@ members = [
]
[workspace.package]
version = "1.3.0"
version = "1.3.4"
authors = ["Quentin de Quelen <quentin@dequelen.me>", "Clément Renault <clement@meilisearch.com>"]
description = "Meilisearch HTTP server"
homepage = "https://meilisearch.com"


@@ -61,7 +61,7 @@ You may also want to check out [Meilisearch 101](https://www.meilisearch.com/doc
## ⚡ Supercharge your Meilisearch experience
Say goodbye to server deployment and manual updates with [Meilisearch Cloud](https://www.meilisearch.com/pricing?utm_campaign=oss&utm_source=engine&utm_medium=meilisearch). No credit card required.
Say goodbye to server deployment and manual updates with [Meilisearch Cloud](https://www.meilisearch.com/cloud?utm_campaign=oss&utm_source=github&utm_medium=meilisearch). No credit card required.
## 🧰 SDKs & integration tools


@@ -210,6 +210,7 @@ pub(crate) mod test {
use big_s::S;
use maplit::{btreemap, btreeset};
use meilisearch_types::facet_values_sort::FacetValuesSort;
use meilisearch_types::features::RuntimeTogglableFeatures;
use meilisearch_types::index_uid_pattern::IndexUidPattern;
use meilisearch_types::keys::{Action, Key};
use meilisearch_types::milli;
@@ -418,7 +419,10 @@ pub(crate) mod test {
}
keys.flush().unwrap();
// ========== TODO: create features here
// ========== experimental features
let features = create_test_features();
dump.create_experimental_features(features).unwrap();
// create the dump
let mut file = tempfile::tempfile().unwrap();
@@ -428,6 +432,10 @@ pub(crate) mod test {
file
}
fn create_test_features() -> RuntimeTogglableFeatures {
RuntimeTogglableFeatures { vector_store: true, ..Default::default() }
}
#[test]
fn test_creating_and_read_dump() {
let mut file = create_test_dump();
@@ -472,5 +480,9 @@ pub(crate) mod test {
for (key, expected) in dump.keys().unwrap().zip(create_test_api_keys()) {
assert_eq!(key.unwrap(), expected);
}
// ==== checking the features
let expected = create_test_features();
assert_eq!(dump.features().unwrap().unwrap(), expected);
}
}


@@ -195,8 +195,53 @@ pub(crate) mod test {
use meili_snap::insta;
use super::*;
use crate::reader::v6::RuntimeTogglableFeatures;
// TODO: add `features` to tests
#[test]
fn import_dump_v6_experimental() {
let dump = File::open("tests/assets/v6-with-experimental.dump").unwrap();
let mut dump = DumpReader::open(dump).unwrap();
// top level infos
insta::assert_display_snapshot!(dump.date().unwrap(), @"2023-07-06 7:10:27.21958 +00:00:00");
insta::assert_debug_snapshot!(dump.instance_uid().unwrap(), @"None");
// tasks
let tasks = dump.tasks().unwrap().collect::<Result<Vec<_>>>().unwrap();
let (tasks, update_files): (Vec<_>, Vec<_>) = tasks.into_iter().unzip();
meili_snap::snapshot_hash!(meili_snap::json_string!(tasks), @"d45cd8571703e58ae53c7bd7ce3f5c22");
assert_eq!(update_files.len(), 2);
assert!(update_files[0].is_none()); // the dump creation
assert!(update_files[1].is_none()); // the processed document addition
// keys
let keys = dump.keys().unwrap().collect::<Result<Vec<_>>>().unwrap();
meili_snap::snapshot_hash!(meili_snap::json_string!(keys), @"13c2da155e9729c2344688cab29af71d");
// indexes
let mut indexes = dump.indexes().unwrap().collect::<Result<Vec<_>>>().unwrap();
// the index are not ordered in any way by default
indexes.sort_by_key(|index| index.metadata().uid.to_string());
let mut test = indexes.pop().unwrap();
assert!(indexes.is_empty());
insta::assert_json_snapshot!(test.metadata(), @r###"
{
"uid": "test",
"primaryKey": "id",
"createdAt": "2023-07-06T07:07:41.364694Z",
"updatedAt": "2023-07-06T07:07:41.396114Z"
}
"###);
assert_eq!(test.documents().unwrap().count(), 1);
assert_eq!(
dump.features().unwrap().unwrap(),
RuntimeTogglableFeatures { vector_store: true, ..Default::default() }
);
}
#[test]
fn import_dump_v5() {
@@ -274,6 +319,8 @@ pub(crate) mod test {
let documents = spells.documents().unwrap().collect::<Result<Vec<_>>>().unwrap();
assert_eq!(documents.len(), 10);
meili_snap::snapshot_hash!(format!("{:#?}", documents), @"235016433dd04262c7f2da01d1e808ce");
assert_eq!(dump.features().unwrap(), None);
}
#[test]


@@ -0,0 +1,24 @@
---
source: dump/src/reader/mod.rs
expression: spells.settings().unwrap()
---
{
"displayedAttributes": [
"*"
],
"searchableAttributes": [
"*"
],
"filterableAttributes": [],
"sortableAttributes": [],
"rankingRules": [
"typo",
"words",
"proximity",
"attribute",
"exactness"
],
"stopWords": [],
"synonyms": {},
"distinctAttribute": null
}


@@ -0,0 +1,38 @@
---
source: dump/src/reader/mod.rs
expression: products.settings().unwrap()
---
{
"displayedAttributes": [
"*"
],
"searchableAttributes": [
"*"
],
"filterableAttributes": [],
"sortableAttributes": [],
"rankingRules": [
"typo",
"words",
"proximity",
"attribute",
"exactness"
],
"stopWords": [],
"synonyms": {
"android": [
"phone",
"smartphone"
],
"iphone": [
"phone",
"smartphone"
],
"phone": [
"android",
"iphone",
"smartphone"
]
},
"distinctAttribute": null
}


@@ -0,0 +1,31 @@
---
source: dump/src/reader/mod.rs
expression: movies.settings().unwrap()
---
{
"displayedAttributes": [
"*"
],
"searchableAttributes": [
"*"
],
"filterableAttributes": [
"genres",
"id"
],
"sortableAttributes": [
"genres",
"id"
],
"rankingRules": [
"typo",
"words",
"proximity",
"attribute",
"exactness",
"release_date:asc"
],
"stopWords": [],
"synonyms": {},
"distinctAttribute": null
}


@@ -292,6 +292,7 @@ pub(crate) mod test {
│ ├---- update_files/
│ │ └---- 1.jsonl
│ └---- queue.jsonl
├---- experimental-features.json
├---- instance_uid.uuid
├---- keys.jsonl
└---- metadata.json

Binary file not shown.


@@ -14,6 +14,7 @@ license.workspace = true
[dependencies]
nom = "7.1.3"
nom_locate = "4.1.0"
unescaper = "0.1.2"
[dev-dependencies]
insta = "1.29.0"


@@ -62,6 +62,7 @@ pub enum ErrorKind<'a> {
MisusedGeoRadius,
MisusedGeoBoundingBox,
InvalidPrimary,
InvalidEscapedNumber,
ExpectedEof,
ExpectedValue(ExpectedValueKind),
MalformedValue,
@@ -147,6 +148,9 @@ impl<'a> Display for Error<'a> {
let text = if input.trim().is_empty() { "but instead got nothing.".to_string() } else { format!("at `{}`.", escaped_input) };
writeln!(f, "Was expecting an operation `=`, `!=`, `>=`, `>`, `<=`, `<`, `IN`, `NOT IN`, `TO`, `EXISTS`, `NOT EXISTS`, `IS NULL`, `IS NOT NULL`, `IS EMPTY`, `IS NOT EMPTY`, `_geoRadius`, or `_geoBoundingBox` {}", text)?
}
ErrorKind::InvalidEscapedNumber => {
writeln!(f, "Found an invalid escaped sequence number: `{}`.", escaped_input)?
}
ErrorKind::ExpectedEof => {
writeln!(f, "Found unexpected characters at the end of the filter: `{}`. You probably forgot an `OR` or an `AND` rule.", escaped_input)?
}


@@ -472,8 +472,81 @@ pub fn parse_filter(input: Span) -> IResult<FilterCondition> {
terminated(|input| parse_expression(input, 0), eof)(input)
}
impl<'a> std::fmt::Display for FilterCondition<'a> {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
match self {
FilterCondition::Not(filter) => {
write!(f, "NOT ({filter})")
}
FilterCondition::Condition { fid, op } => {
write!(f, "{fid} {op}")
}
FilterCondition::In { fid, els } => {
write!(f, "{fid} IN[")?;
for el in els {
write!(f, "{el}, ")?;
}
write!(f, "]")
}
FilterCondition::Or(els) => {
write!(f, "OR[")?;
for el in els {
write!(f, "{el}, ")?;
}
write!(f, "]")
}
FilterCondition::And(els) => {
write!(f, "AND[")?;
for el in els {
write!(f, "{el}, ")?;
}
write!(f, "]")
}
FilterCondition::GeoLowerThan { point, radius } => {
write!(f, "_geoRadius({}, {}, {})", point[0], point[1], radius)
}
FilterCondition::GeoBoundingBox {
top_right_point: top_left_point,
bottom_left_point: bottom_right_point,
} => {
write!(
f,
"_geoBoundingBox([{}, {}], [{}, {}])",
top_left_point[0],
top_left_point[1],
bottom_right_point[0],
bottom_right_point[1]
)
}
}
}
}
impl<'a> std::fmt::Display for Condition<'a> {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
match self {
Condition::GreaterThan(token) => write!(f, "> {token}"),
Condition::GreaterThanOrEqual(token) => write!(f, ">= {token}"),
Condition::Equal(token) => write!(f, "= {token}"),
Condition::NotEqual(token) => write!(f, "!= {token}"),
Condition::Null => write!(f, "IS NULL"),
Condition::Empty => write!(f, "IS EMPTY"),
Condition::Exists => write!(f, "EXISTS"),
Condition::LowerThan(token) => write!(f, "< {token}"),
Condition::LowerThanOrEqual(token) => write!(f, "<= {token}"),
Condition::Between { from, to } => write!(f, "{from} TO {to}"),
}
}
}
impl<'a> std::fmt::Display for Token<'a> {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
write!(f, "{{{}}}", self.value())
}
}
#[cfg(test)]
pub mod tests {
use FilterCondition as Fc;
use super::*;
/// Create a raw [Token]. You must specify the string that appear BEFORE your element followed by your element
@@ -485,14 +558,22 @@ pub mod tests {
unsafe { Span::new_from_raw_offset(offset, lines as u32, value, "") }.into()
}
fn p(s: &str) -> impl std::fmt::Display + '_ {
Fc::parse(s).unwrap().unwrap()
}
#[test]
fn parse_escaped() {
insta::assert_display_snapshot!(p(r#"title = 'foo\\'"#), @r#"{title} = {foo\}"#);
insta::assert_display_snapshot!(p(r#"title = 'foo\\\\'"#), @r#"{title} = {foo\\}"#);
insta::assert_display_snapshot!(p(r#"title = 'foo\\\\\\'"#), @r#"{title} = {foo\\\}"#);
insta::assert_display_snapshot!(p(r#"title = 'foo\\\\\\\\'"#), @r#"{title} = {foo\\\\}"#);
// but it also works with other sequencies
insta::assert_display_snapshot!(p(r#"title = 'foo\x20\n\t\"\'"'"#), @"{title} = {foo \n\t\"\'\"}");
}
#[test]
fn parse() {
use FilterCondition as Fc;
fn p(s: &str) -> impl std::fmt::Display + '_ {
Fc::parse(s).unwrap().unwrap()
}
// Test equal
insta::assert_display_snapshot!(p("channel = Ponce"), @"{channel} = {Ponce}");
insta::assert_display_snapshot!(p("subscribers = 12"), @"{subscribers} = {12}");
@@ -852,74 +933,3 @@ pub mod tests {
assert_eq!(token.value(), s);
}
}
impl<'a> std::fmt::Display for FilterCondition<'a> {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
match self {
FilterCondition::Not(filter) => {
write!(f, "NOT ({filter})")
}
FilterCondition::Condition { fid, op } => {
write!(f, "{fid} {op}")
}
FilterCondition::In { fid, els } => {
write!(f, "{fid} IN[")?;
for el in els {
write!(f, "{el}, ")?;
}
write!(f, "]")
}
FilterCondition::Or(els) => {
write!(f, "OR[")?;
for el in els {
write!(f, "{el}, ")?;
}
write!(f, "]")
}
FilterCondition::And(els) => {
write!(f, "AND[")?;
for el in els {
write!(f, "{el}, ")?;
}
write!(f, "]")
}
FilterCondition::GeoLowerThan { point, radius } => {
write!(f, "_geoRadius({}, {}, {})", point[0], point[1], radius)
}
FilterCondition::GeoBoundingBox {
top_right_point: top_left_point,
bottom_left_point: bottom_right_point,
} => {
write!(
f,
"_geoBoundingBox([{}, {}], [{}, {}])",
top_left_point[0],
top_left_point[1],
bottom_right_point[0],
bottom_right_point[1]
)
}
}
}
}
impl<'a> std::fmt::Display for Condition<'a> {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
match self {
Condition::GreaterThan(token) => write!(f, "> {token}"),
Condition::GreaterThanOrEqual(token) => write!(f, ">= {token}"),
Condition::Equal(token) => write!(f, "= {token}"),
Condition::NotEqual(token) => write!(f, "!= {token}"),
Condition::Null => write!(f, "IS NULL"),
Condition::Empty => write!(f, "IS EMPTY"),
Condition::Exists => write!(f, "EXISTS"),
Condition::LowerThan(token) => write!(f, "< {token}"),
Condition::LowerThanOrEqual(token) => write!(f, "<= {token}"),
Condition::Between { from, to } => write!(f, "{from} TO {to}"),
}
}
}
impl<'a> std::fmt::Display for Token<'a> {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
write!(f, "{{{}}}", self.value())
}
}

View File

@@ -171,7 +171,24 @@ pub fn parse_value(input: Span) -> IResult<Token> {
})
})?;
Ok((input, value))
match unescaper::unescape(value.value()) {
Ok(content) => {
if content.len() != value.value().len() {
Ok((input, Token::new(value.original_span(), Some(content))))
} else {
Ok((input, value))
}
}
Err(unescaper::Error::IncompleteStr(_)) => Err(nom::Err::Incomplete(nom::Needed::Unknown)),
Err(unescaper::Error::ParseIntError { .. }) => Err(nom::Err::Error(Error::new_from_kind(
value.original_span(),
ErrorKind::InvalidEscapedNumber,
))),
Err(unescaper::Error::InvalidChar { .. }) => Err(nom::Err::Error(Error::new_from_kind(
value.original_span(),
ErrorKind::MalformedValue,
))),
}
}
fn is_value_component(c: char) -> bool {
@@ -318,17 +335,17 @@ pub mod test {
("\"cha'nnel\"", "cha'nnel", false),
("I'm tamo", "I", false),
// escaped thing but not quote
(r#""\\""#, r#"\\"#, false),
(r#""\\\\\\""#, r#"\\\\\\"#, false),
(r#""aa\\aa""#, r#"aa\\aa"#, false),
(r#""\\""#, r#"\"#, true),
(r#""\\\\\\""#, r#"\\\"#, true),
(r#""aa\\aa""#, r#"aa\aa"#, true),
// with double quote
(r#""Hello \"world\"""#, r#"Hello "world""#, true),
(r#""Hello \\\"world\\\"""#, r#"Hello \\"world\\""#, true),
(r#""Hello \\\"world\\\"""#, r#"Hello \"world\""#, true),
(r#""I'm \"super\" tamo""#, r#"I'm "super" tamo"#, true),
(r#""\"\"""#, r#""""#, true),
// with simple quote
(r#"'Hello \'world\''"#, r#"Hello 'world'"#, true),
(r#"'Hello \\\'world\\\''"#, r#"Hello \\'world\\'"#, true),
(r#"'Hello \\\'world\\\''"#, r#"Hello \'world\'"#, true),
(r#"'I\'m "super" tamo'"#, r#"I'm "super" tamo"#, true),
(r#"'\'\''"#, r#"''"#, true),
];
@@ -350,7 +367,14 @@ pub mod test {
"Filter `{}` was not supposed to be escaped",
input
);
assert_eq!(token.value(), expected, "Filter `{}` failed.", input);
assert_eq!(
token.value(),
expected,
"Filter `{}` failed by giving `{}` instead of `{}`.",
input,
token.value(),
expected
);
}
}


@@ -67,10 +67,6 @@ pub(crate) enum Batch {
op: IndexOperation,
must_create_index: bool,
},
IndexDocumentDeletionByFilter {
index_uid: String,
task: Task,
},
IndexCreation {
index_uid: String,
primary_key: Option<String>,
@@ -114,6 +110,10 @@ pub(crate) enum IndexOperation {
documents: Vec<Vec<String>>,
tasks: Vec<Task>,
},
IndexDocumentDeletionByFilter {
index_uid: String,
task: Task,
},
DocumentClear {
index_uid: String,
tasks: Vec<Task>,
@@ -155,7 +155,6 @@ impl Batch {
| Batch::TaskDeletion(task)
| Batch::Dump(task)
| Batch::IndexCreation { task, .. }
| Batch::IndexDocumentDeletionByFilter { task, .. }
| Batch::IndexUpdate { task, .. } => vec![task.uid],
Batch::SnapshotCreation(tasks) | Batch::IndexDeletion { tasks, .. } => {
tasks.iter().map(|task| task.uid).collect()
@@ -167,6 +166,7 @@ impl Batch {
| IndexOperation::DocumentClear { tasks, .. } => {
tasks.iter().map(|task| task.uid).collect()
}
IndexOperation::IndexDocumentDeletionByFilter { task, .. } => vec![task.uid],
IndexOperation::SettingsAndDocumentOperation {
document_import_tasks: tasks,
settings_tasks: other,
@@ -194,8 +194,7 @@ impl Batch {
IndexOperation { op, .. } => Some(op.index_uid()),
IndexCreation { index_uid, .. }
| IndexUpdate { index_uid, .. }
| IndexDeletion { index_uid, .. }
| IndexDocumentDeletionByFilter { index_uid, .. } => Some(index_uid),
| IndexDeletion { index_uid, .. } => Some(index_uid),
}
}
}
@@ -205,6 +204,7 @@ impl IndexOperation {
match self {
IndexOperation::DocumentOperation { index_uid, .. }
| IndexOperation::DocumentDeletion { index_uid, .. }
| IndexOperation::IndexDocumentDeletionByFilter { index_uid, .. }
| IndexOperation::DocumentClear { index_uid, .. }
| IndexOperation::Settings { index_uid, .. }
| IndexOperation::DocumentClearAndSetting { index_uid, .. }
@@ -239,9 +239,12 @@ impl IndexScheduler {
let task = self.get_task(rtxn, id)?.ok_or(Error::CorruptedTaskQueue)?;
match &task.kind {
KindWithContent::DocumentDeletionByFilter { index_uid, .. } => {
Ok(Some(Batch::IndexDocumentDeletionByFilter {
index_uid: index_uid.clone(),
task,
Ok(Some(Batch::IndexOperation {
op: IndexOperation::IndexDocumentDeletionByFilter {
index_uid: index_uid.clone(),
task,
},
must_create_index: false,
}))
}
_ => unreachable!(),
@@ -534,7 +537,9 @@ impl IndexScheduler {
let index_tasks = self.index_tasks(rtxn, index_name)? & enqueued;
// If autobatching is disabled we only take one task at a time.
let tasks_limit = if self.autobatching_enabled { usize::MAX } else { 1 };
// Otherwise, we batch at most `maximum_number_of_batched_tasks` tasks at once.
let tasks_limit =
if self.autobatching_enabled { self.maximum_number_of_batched_tasks } else { 1 };
let enqueued = index_tasks
.into_iter()
@@ -891,51 +896,6 @@ impl IndexScheduler {
Ok(tasks)
}
Batch::IndexDocumentDeletionByFilter { mut task, index_uid: _ } => {
let (index_uid, filter) =
if let KindWithContent::DocumentDeletionByFilter { index_uid, filter_expr } =
&task.kind
{
(index_uid, filter_expr)
} else {
unreachable!()
};
let index = {
let rtxn = self.env.read_txn()?;
self.index_mapper.index(&rtxn, index_uid)?
};
let deleted_documents = delete_document_by_filter(filter, index);
let original_filter = if let Some(Details::DocumentDeletionByFilter {
original_filter,
deleted_documents: _,
}) = task.details
{
original_filter
} else {
// In the case of a `documentDeletionByFilter`, the details MUST be set
unreachable!();
};
match deleted_documents {
Ok(deleted_documents) => {
task.status = Status::Succeeded;
task.details = Some(Details::DocumentDeletionByFilter {
original_filter,
deleted_documents: Some(deleted_documents),
});
}
Err(e) => {
task.status = Status::Failed;
task.details = Some(Details::DocumentDeletionByFilter {
original_filter,
deleted_documents: Some(0),
});
task.error = Some(e.into());
}
}
Ok(vec![task])
}
Batch::IndexCreation { index_uid, primary_key, task } => {
let wtxn = self.env.write_txn()?;
if self.index_mapper.exists(&wtxn, &index_uid)? {
@@ -1292,6 +1252,47 @@ impl IndexScheduler {
Ok(tasks)
}
IndexOperation::IndexDocumentDeletionByFilter { mut task, index_uid: _ } => {
let filter =
if let KindWithContent::DocumentDeletionByFilter { filter_expr, .. } =
&task.kind
{
filter_expr
} else {
unreachable!()
};
let deleted_documents = delete_document_by_filter(index_wtxn, filter, index);
let original_filter = if let Some(Details::DocumentDeletionByFilter {
original_filter,
deleted_documents: _,
}) = task.details
{
original_filter
} else {
// In the case of a `documentDeletionByFilter`, the details MUST be set
unreachable!();
};
match deleted_documents {
Ok(deleted_documents) => {
task.status = Status::Succeeded;
task.details = Some(Details::DocumentDeletionByFilter {
original_filter,
deleted_documents: Some(deleted_documents),
});
}
Err(e) => {
task.status = Status::Failed;
task.details = Some(Details::DocumentDeletionByFilter {
original_filter,
deleted_documents: Some(0),
});
task.error = Some(e.into());
}
}
Ok(vec![task])
}
IndexOperation::Settings { index_uid: _, settings, mut tasks } => {
let indexer_config = self.index_mapper.indexer_config();
let mut builder = milli::update::Settings::new(index_wtxn, index, indexer_config);
@@ -1491,23 +1492,22 @@ impl IndexScheduler {
}
}
fn delete_document_by_filter(filter: &serde_json::Value, index: Index) -> Result<u64> {
fn delete_document_by_filter<'a>(
wtxn: &mut RwTxn<'a, '_>,
filter: &serde_json::Value,
index: &'a Index,
) -> Result<u64> {
let filter = Filter::from_json(filter)?;
Ok(if let Some(filter) = filter {
let mut wtxn = index.write_txn()?;
let candidates = filter.evaluate(&wtxn, &index).map_err(|err| match err {
let candidates = filter.evaluate(wtxn, index).map_err(|err| match err {
milli::Error::UserError(milli::UserError::InvalidFilter(_)) => {
Error::from(err).with_custom_error_code(Code::InvalidDocumentFilter)
}
e => e.into(),
})?;
let mut delete_operation = DeleteDocuments::new(&mut wtxn, &index)?;
let mut delete_operation = DeleteDocuments::new(wtxn, index)?;
delete_operation.delete_documents(&candidates);
let deleted_documents =
delete_operation.execute().map(|result| result.deleted_documents)?;
wtxn.commit()?;
deleted_documents
delete_operation.execute().map(|result| result.deleted_documents)?
} else {
0
})
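The signature change above is the crux: `delete_document_by_filter` now borrows the caller's write transaction instead of opening and committing its own, so the deletion and any follow-up bookkeeping land in a single commit. A hedged sketch of the resulting calling pattern (the stats step is hypothetical and named only for illustration):

// Sketch: one RwTxn shared across the whole batch, committed once.
fn run_deletion_batch(index: &Index, filter: &serde_json::Value) -> Result<u64> {
    let mut wtxn = index.write_txn()?;
    let deleted = delete_document_by_filter(&mut wtxn, filter, index)?;
    // recompute_index_stats(&mut wtxn, index)?; // hypothetical follow-up that
    //                                           // must observe the deletion
    wtxn.commit()?; // a single commit at the batch boundary
    Ok(deleted)
}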


@@ -15,6 +15,7 @@ pub fn snapshot_index_scheduler(scheduler: &IndexScheduler) -> String {
let IndexScheduler {
autobatching_enabled,
maximum_number_of_batched_tasks: _,
must_stop_processing: _,
processing_tasks,
file_store,


@@ -253,6 +253,9 @@ pub struct IndexSchedulerOptions {
/// Set to `true` iff the index scheduler is allowed to automatically
/// batch tasks together, to process multiple tasks at once.
pub autobatching_enabled: bool,
/// If the autobatcher is allowed to batch tasks automatically,
/// it will batch at most this number of tasks at once.
pub maximum_number_of_batched_tasks: usize,
/// The maximum number of tasks stored in the task queue before starting
/// to auto schedule task deletions.
pub max_number_of_tasks: usize,
@@ -310,6 +313,9 @@ pub struct IndexScheduler {
/// Whether auto-batching is enabled or not.
pub(crate) autobatching_enabled: bool,
/// The maximum number of tasks that will be batched together.
pub(crate) maximum_number_of_batched_tasks: usize,
/// The max number of tasks allowed before the scheduler starts to delete
/// the finished tasks automatically.
pub(crate) max_number_of_tasks: usize,
@@ -363,6 +369,7 @@ impl IndexScheduler {
index_mapper: self.index_mapper.clone(),
wake_up: self.wake_up.clone(),
autobatching_enabled: self.autobatching_enabled,
maximum_number_of_batched_tasks: self.maximum_number_of_batched_tasks,
max_number_of_tasks: self.max_number_of_tasks,
snapshots_path: self.snapshots_path.clone(),
dumps_path: self.dumps_path.clone(),
@@ -458,6 +465,7 @@ impl IndexScheduler {
// we want to start the loop right away in case Meilisearch was interrupted (Ctrl+C) while processing tasks
wake_up: Arc::new(SignalEvent::auto(true)),
autobatching_enabled: options.autobatching_enabled,
maximum_number_of_batched_tasks: options.maximum_number_of_batched_tasks,
max_number_of_tasks: options.max_number_of_tasks,
dumps_path: options.dumps_path,
snapshots_path: options.snapshots_path,
@@ -790,10 +798,19 @@ impl IndexScheduler {
let mut res = BTreeMap::new();
let processing_tasks = { self.processing_tasks.read().unwrap().processing.len() };
res.insert(
"statuses".to_string(),
enum_iterator::all::<Status>()
.map(|s| Ok((s.to_string(), self.get_status(&rtxn, s)?.len())))
.map(|s| {
let tasks = self.get_status(&rtxn, s)?.len();
match s {
Status::Enqueued => Ok((s.to_string(), tasks - processing_tasks)),
Status::Processing => Ok((s.to_string(), processing_tasks)),
s => Ok((s.to_string(), tasks)),
}
})
.collect::<Result<BTreeMap<String, u64>>>()?,
);
res.insert(
@@ -1578,6 +1595,7 @@ mod tests {
index_count: 5,
indexer_config,
autobatching_enabled: true,
maximum_number_of_batched_tasks: usize::MAX,
max_number_of_tasks: 1_000_000,
instance_features: Default::default(),
};
@@ -4129,4 +4147,154 @@ mod tests {
snapshot!(json_string!(tasks, { "[].enqueuedAt" => "[date]", "[].startedAt" => "[date]", "[].finishedAt" => "[date]", ".**.original_filter" => "[filter]", ".**.query" => "[query]" }), name: "everything_has_been_processed");
drop(rtxn);
}
#[test]
fn basic_get_stats() {
let (index_scheduler, mut handle) = IndexScheduler::test(true, vec![]);
let kind = index_creation_task("catto", "mouse");
let _task = index_scheduler.register(kind).unwrap();
let kind = index_creation_task("doggo", "sheep");
let _task = index_scheduler.register(kind).unwrap();
let kind = index_creation_task("whalo", "fish");
let _task = index_scheduler.register(kind).unwrap();
snapshot!(json_string!(index_scheduler.get_stats().unwrap()), @r###"
{
"indexes": {
"catto": 1,
"doggo": 1,
"whalo": 1
},
"statuses": {
"canceled": 0,
"enqueued": 3,
"failed": 0,
"processing": 0,
"succeeded": 0
},
"types": {
"documentAdditionOrUpdate": 0,
"documentDeletion": 0,
"dumpCreation": 0,
"indexCreation": 3,
"indexDeletion": 0,
"indexSwap": 0,
"indexUpdate": 0,
"settingsUpdate": 0,
"snapshotCreation": 0,
"taskCancelation": 0,
"taskDeletion": 0
}
}
"###);
handle.advance_till([Start, BatchCreated]);
snapshot!(json_string!(index_scheduler.get_stats().unwrap()), @r###"
{
"indexes": {
"catto": 1,
"doggo": 1,
"whalo": 1
},
"statuses": {
"canceled": 0,
"enqueued": 2,
"failed": 0,
"processing": 1,
"succeeded": 0
},
"types": {
"documentAdditionOrUpdate": 0,
"documentDeletion": 0,
"dumpCreation": 0,
"indexCreation": 3,
"indexDeletion": 0,
"indexSwap": 0,
"indexUpdate": 0,
"settingsUpdate": 0,
"snapshotCreation": 0,
"taskCancelation": 0,
"taskDeletion": 0
}
}
"###);
handle.advance_till([
InsideProcessBatch,
InsideProcessBatch,
ProcessBatchSucceeded,
AfterProcessing,
Start,
BatchCreated,
]);
snapshot!(json_string!(index_scheduler.get_stats().unwrap()), @r###"
{
"indexes": {
"catto": 1,
"doggo": 1,
"whalo": 1
},
"statuses": {
"canceled": 0,
"enqueued": 1,
"failed": 0,
"processing": 1,
"succeeded": 1
},
"types": {
"documentAdditionOrUpdate": 0,
"documentDeletion": 0,
"dumpCreation": 0,
"indexCreation": 3,
"indexDeletion": 0,
"indexSwap": 0,
"indexUpdate": 0,
"settingsUpdate": 0,
"snapshotCreation": 0,
"taskCancelation": 0,
"taskDeletion": 0
}
}
"###);
// process one more batch: another task should move from enqueued to succeeded
handle.advance_till([
InsideProcessBatch,
InsideProcessBatch,
ProcessBatchSucceeded,
AfterProcessing,
Start,
BatchCreated,
]);
snapshot!(json_string!(index_scheduler.get_stats().unwrap()), @r###"
{
"indexes": {
"catto": 1,
"doggo": 1,
"whalo": 1
},
"statuses": {
"canceled": 0,
"enqueued": 0,
"failed": 0,
"processing": 1,
"succeeded": 2
},
"types": {
"documentAdditionOrUpdate": 0,
"documentDeletion": 0,
"dumpCreation": 0,
"indexCreation": 3,
"indexDeletion": 0,
"indexSwap": 0,
"indexUpdate": 0,
"settingsUpdate": 0,
"snapshotCreation": 0,
"taskCancelation": 0,
"taskDeletion": 0
}
}
"###);
}
}


@@ -167,7 +167,9 @@ macro_rules! snapshot {
let (settings, snap_name, _) = $crate::default_snapshot_settings_for_test(test_name, Some(&snap_name));
settings.bind(|| {
let snap = format!("{}", $value);
meili_snap::insta::assert_snapshot!(format!("{}", snap_name), snap);
insta::allow_duplicates! {
meili_snap::insta::assert_snapshot!(format!("{}", snap_name), snap);
}
});
};
($value:expr, @$inline:literal) => {
@@ -176,7 +178,9 @@ macro_rules! snapshot {
let (settings, _, _) = $crate::default_snapshot_settings_for_test("", Some("_dummy_argument"));
settings.bind(|| {
let snap = format!("{}", $value);
meili_snap::insta::assert_snapshot!(snap, @$inline);
insta::allow_duplicates! {
meili_snap::insta::assert_snapshot!(snap, @$inline);
}
});
};
($value:expr) => {
@@ -194,11 +198,37 @@ macro_rules! snapshot {
let (settings, snap_name, _) = $crate::default_snapshot_settings_for_test(test_name, None);
settings.bind(|| {
let snap = format!("{}", $value);
meili_snap::insta::assert_snapshot!(format!("{}", snap_name), snap);
insta::allow_duplicates! {
meili_snap::insta::assert_snapshot!(format!("{}", snap_name), snap);
}
});
};
}
/// Create a string from the value by serializing it as Json, optionally
/// redacting some parts of it.
///
/// The second argument to the macro can be an object expression for redaction.
/// It's in the form { selector => replacement }. For more information about redactions
/// refer to the redactions feature in the `insta` guide.
#[macro_export]
macro_rules! json_string {
($value:expr, {$($k:expr => $v:expr),*$(,)?}) => {
{
let (_, snap) = meili_snap::insta::_prepare_snapshot_for_redaction!($value, {$($k => $v),*}, Json, File);
snap
}
};
($value:expr) => {{
let value = meili_snap::insta::_macro_support::serialize_value(
&$value,
meili_snap::insta::_macro_support::SerializationFormat::Json,
meili_snap::insta::_macro_support::SnapshotLocation::File
);
value
}};
}
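For illustration, a usage sketch of `json_string!` with one redaction, in the same shape the search tests below use (the selector syntax comes from `insta`'s redactions feature):

#[test]
fn json_string_redaction_sketch() {
    // Replace the volatile field with a stable placeholder before snapshotting.
    let response = serde_json::json!({ "hits": [], "processingTimeMs": 12 });
    let snap = json_string!(response, { ".processingTimeMs" => "[time]" });
    assert!(snap.contains("[time]"));
}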
#[cfg(test)]
mod tests {
use crate as meili_snap;
@@ -250,27 +280,3 @@ mod tests {
}
}
}
/// Create a string from the value by serializing it as Json, optionally
/// redacting some parts of it.
///
/// The second argument to the macro can be an object expression for redaction.
/// It's in the form { selector => replacement }. For more information about redactions
/// refer to the redactions feature in the `insta` guide.
#[macro_export]
macro_rules! json_string {
($value:expr, {$($k:expr => $v:expr),*$(,)?}) => {
{
let (_, snap) = meili_snap::insta::_prepare_snapshot_for_redaction!($value, {$($k => $v),*}, Json, File);
snap
}
};
($value:expr) => {{
let value = meili_snap::insta::_macro_support::serialize_value(
&$value,
meili_snap::insta::_macro_support::SerializationFormat::Json,
meili_snap::insta::_macro_support::SnapshotLocation::File
);
value
}};
}


@@ -1,6 +1,6 @@
use serde::{Deserialize, Serialize};
#[derive(Serialize, Deserialize, Debug, Clone, Copy, Default)]
#[derive(Serialize, Deserialize, Debug, Clone, Copy, Default, PartialEq, Eq)]
#[serde(rename_all = "camelCase", default)]
pub struct RuntimeTogglableFeatures {
pub score_details: bool,


@@ -141,5 +141,5 @@ thai = ["meilisearch-types/thai"]
greek = ["meilisearch-types/greek"]
[package.metadata.mini-dashboard]
assets-url = "https://github.com/meilisearch/mini-dashboard/releases/download/v0.2.7/build.zip"
sha1 = "28b45bf772c84f9a6e16bc1689b393bfce8da7d6"
assets-url = "https://github.com/meilisearch/mini-dashboard/releases/download/v0.2.11/build.zip"
sha1 = "83cd44ed1e5f97ecb581dc9f958a63f4ccc982d9"


@@ -285,6 +285,7 @@ impl From<Opt> for Infos {
db_path,
experimental_enable_metrics,
experimental_reduce_indexing_memory_usage,
experimental_limit_batched_tasks: _,
http_addr,
master_key: _,
env,
@@ -574,6 +575,10 @@ pub struct SearchAggregator {
filter_total_number_of_criteria: usize,
used_syntax: HashMap<String, usize>,
// attributes_to_search_on
// incremented every time a search is done using attributes_to_search_on
attributes_to_search_on_total_number_of_uses: usize,
// q
// The maximum number of terms in a q request
max_terms_number: usize,
@@ -647,6 +652,11 @@ impl SearchAggregator {
ret.filter_sum_of_criteria_terms = RE.split(&stringified_filters).count();
}
// attributes_to_search_on
if query.attributes_to_search_on.is_some() {
ret.attributes_to_search_on_total_number_of_uses = 1;
}
if let Some(ref q) = query.q {
ret.max_terms_number = q.split_whitespace().count();
}
@@ -720,9 +730,18 @@ impl SearchAggregator {
let used_syntax = self.used_syntax.entry(key).or_insert(0);
*used_syntax = used_syntax.saturating_add(value);
}
// attributes_to_search_on
self.attributes_to_search_on_total_number_of_uses = self
.attributes_to_search_on_total_number_of_uses
.saturating_add(other.attributes_to_search_on_total_number_of_uses);
// q
self.max_terms_number = self.max_terms_number.max(other.max_terms_number);
// vector
self.max_vector_size = self.max_vector_size.max(other.max_vector_size);
// pagination
self.max_limit = self.max_limit.max(other.max_limit);
self.max_offset = self.max_offset.max(other.max_offset);
@@ -786,9 +805,15 @@ impl SearchAggregator {
"avg_criteria_number": format!("{:.2}", self.filter_sum_of_criteria_terms as f64 / self.filter_total_number_of_criteria as f64),
"most_used_syntax": self.used_syntax.iter().max_by_key(|(_, v)| *v).map(|(k, _)| json!(k)).unwrap_or_else(|| json!(null)),
},
"attributes_to_search_on": {
"total_number_of_uses": self.attributes_to_search_on_total_number_of_uses,
},
"q": {
"max_terms_number": self.max_terms_number,
},
"vector": {
"max_vector_size": self.max_vector_size,
},
"pagination": {
"max_limit": self.max_limit,
"max_offset": self.max_offset,
@@ -843,6 +868,10 @@ pub struct MultiSearchAggregator {
// sum of the number of search queries in the requests, use with total_received to compute an average
total_search_count: usize,
// scoring
show_ranking_score: bool,
show_ranking_score_details: bool,
// context
user_agents: HashSet<String>,
}
@@ -856,6 +885,9 @@ impl MultiSearchAggregator {
let distinct_indexes: HashSet<_> =
query.iter().map(|query| query.index_uid.as_str()).collect();
let show_ranking_score = query.iter().any(|query| query.show_ranking_score);
let show_ranking_score_details = query.iter().any(|query| query.show_ranking_score_details);
Self {
timestamp,
total_received: 1,
@@ -863,6 +895,8 @@ impl MultiSearchAggregator {
total_distinct_index_count: distinct_indexes.len(),
total_single_index: if distinct_indexes.len() == 1 { 1 } else { 0 },
total_search_count: query.len(),
show_ranking_score,
show_ranking_score_details,
user_agents,
}
}
@@ -884,6 +918,9 @@ impl MultiSearchAggregator {
this.total_distinct_index_count.saturating_add(other.total_distinct_index_count);
let total_single_index = this.total_single_index.saturating_add(other.total_single_index);
let total_search_count = this.total_search_count.saturating_add(other.total_search_count);
let show_ranking_score = this.show_ranking_score || other.show_ranking_score;
let show_ranking_score_details =
this.show_ranking_score_details || other.show_ranking_score_details;
let mut user_agents = this.user_agents;
for user_agent in other.user_agents.into_iter() {
@@ -899,6 +936,8 @@ impl MultiSearchAggregator {
total_single_index,
total_search_count,
user_agents,
show_ranking_score,
show_ranking_score_details,
// do not add _ or ..Default::default() here
};
@@ -925,6 +964,10 @@ impl MultiSearchAggregator {
"searches": {
"total_search_count": self.total_search_count,
"avg_search_count": (self.total_search_count as f64) / (self.total_received as f64),
},
"scoring": {
"show_ranking_score": self.show_ranking_score,
"show_ranking_score_details": self.show_ranking_score_details,
}
});


@@ -236,6 +236,7 @@ fn open_or_create_database_unchecked(
enable_mdb_writemap: opt.experimental_reduce_indexing_memory_usage,
indexer_config: (&opt.indexer_options).try_into()?,
autobatching_enabled: true,
maximum_number_of_batched_tasks: opt.experimental_limit_batched_tasks,
max_number_of_tasks: 1_000_000,
index_growth_amount: byte_unit::Byte::from_str("10GiB").unwrap().get_bytes() as usize,
index_count: DEFAULT_INDEX_COUNT,


@@ -187,7 +187,7 @@ Anonymous telemetry:\t\"Enabled\""
}
eprintln!();
eprintln!("Check out Meilisearch Cloud!\thttps://cloud.meilisearch.com/login?utm_campaign=oss&utm_source=engine&utm_medium=cli");
eprintln!("Check out Meilisearch Cloud!\thttps://www.meilisearch.com/cloud?utm_campaign=oss&utm_source=engine&utm_medium=cli");
eprintln!("Documentation:\t\t\thttps://www.meilisearch.com/docs");
eprintln!("Source code:\t\t\thttps://github.com/meilisearch/meilisearch");
eprintln!("Discord:\t\t\thttps://discord.meilisearch.com");


@@ -51,6 +51,7 @@ const MEILI_LOG_LEVEL: &str = "MEILI_LOG_LEVEL";
const MEILI_EXPERIMENTAL_ENABLE_METRICS: &str = "MEILI_EXPERIMENTAL_ENABLE_METRICS";
const MEILI_EXPERIMENTAL_REDUCE_INDEXING_MEMORY_USAGE: &str =
"MEILI_EXPERIMENTAL_REDUCE_INDEXING_MEMORY_USAGE";
const MEILI_EXPERIMENTAL_LIMIT_BATCHED_TASKS: &str = "MEILI_EXPERIMENTAL_LIMIT_BATCHED_TASKS";
const DEFAULT_CONFIG_FILE_PATH: &str = "./config.toml";
const DEFAULT_DB_PATH: &str = "./data.ms";
@@ -301,6 +302,11 @@ pub struct Opt {
#[serde(default)]
pub experimental_reduce_indexing_memory_usage: bool,
/// Experimental limit to the number of tasks per batch
#[clap(long, env = MEILI_EXPERIMENTAL_LIMIT_BATCHED_TASKS, default_value_t = default_limit_batched_tasks())]
#[serde(default = "default_limit_batched_tasks")]
pub experimental_limit_batched_tasks: usize,
#[serde(flatten)]
#[clap(flatten)]
pub indexer_options: IndexerOpts,
@@ -393,7 +399,8 @@ impl Opt {
#[cfg(all(not(debug_assertions), feature = "analytics"))]
no_analytics,
experimental_enable_metrics: enable_metrics_route,
experimental_reduce_indexing_memory_usage: reduce_indexing_memory_usage,
experimental_reduce_indexing_memory_usage,
experimental_limit_batched_tasks,
} = self;
export_to_env_if_not_present(MEILI_DB_PATH, db_path);
export_to_env_if_not_present(MEILI_HTTP_ADDR, http_addr);
@@ -437,7 +444,11 @@ impl Opt {
);
export_to_env_if_not_present(
MEILI_EXPERIMENTAL_REDUCE_INDEXING_MEMORY_USAGE,
reduce_indexing_memory_usage.to_string(),
experimental_reduce_indexing_memory_usage.to_string(),
);
export_to_env_if_not_present(
MEILI_EXPERIMENTAL_LIMIT_BATCHED_TASKS,
experimental_limit_batched_tasks.to_string(),
);
indexer_options.export_to_env();
}
@@ -739,6 +750,10 @@ fn default_dump_dir() -> PathBuf {
PathBuf::from(DEFAULT_DUMP_DIR)
}
fn default_limit_batched_tasks() -> usize {
usize::MAX
}
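A hedged usage note for the new option: it should be reachable through the environment variable declared above, or through the CLI flag that clap derives from the field name (the kebab-case spelling below is an assumption, not taken from the diff):

// Hypothetical invocations:
//   MEILI_EXPERIMENTAL_LIMIT_BATCHED_TASKS=100 ./meilisearch
//   ./meilisearch --experimental-limit-batched-tasks 100

#[test]
fn default_limit_is_unbounded() {
    // Left unset, the default preserves the previous unlimited batching.
    assert_eq!(default_limit_batched_tasks(), usize::MAX);
}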
/// Indicates if a snapshot was scheduled, and if yes with which interval.
#[derive(Debug, Default, Copy, Clone, Deserialize, Serialize)]
pub enum ScheduleSnapshot {

View File

@@ -64,7 +64,20 @@ async fn patch_features(
vector_store: new_features.0.vector_store.unwrap_or(old_features.vector_store),
};
analytics.publish("Experimental features Updated".to_string(), json!(new_features), Some(&req));
// explicitly destructure for analytics rather than using the `Serialize` implementation, because
// it renames fields to camelCase, which we don't want for analytics.
// **Do not** ignore fields with `..` or `_` here, because we want to add them in the future.
let meilisearch_types::features::RuntimeTogglableFeatures { score_details, vector_store } =
new_features;
analytics.publish(
"Experimental features Updated".to_string(),
json!({
"score_details": score_details,
"vector_store": vector_store,
}),
Some(&req),
);
index_scheduler.put_runtime_features(new_features)?;
Ok(HttpResponse::Ok().json(new_features))
}


@@ -35,7 +35,7 @@ pub struct SearchQueryGet {
#[deserr(default, error = DeserrQueryParamError<InvalidSearchQ>)]
q: Option<String>,
#[deserr(default, error = DeserrQueryParamError<InvalidSearchVector>)]
vector: Option<Vec<f32>>,
vector: Option<CS<f32>>,
#[deserr(default = Param(DEFAULT_SEARCH_OFFSET()), error = DeserrQueryParamError<InvalidSearchOffset>)]
offset: Param<usize>,
#[deserr(default = Param(DEFAULT_SEARCH_LIMIT()), error = DeserrQueryParamError<InvalidSearchLimit>)]
@@ -88,7 +88,7 @@ impl From<SearchQueryGet> for SearchQuery {
Self {
q: other.q,
vector: other.vector,
vector: other.vector.map(CS::into_inner),
offset: other.offset.0,
limit: other.limit.0,
page: other.page.as_deref().copied(),
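Switching `vector` to `CS<f32>` lets the GET route accept a comma-separated list such as `?vector=1.0,2.0,3.0`, which `into_inner()` then turns into the plain `Vec<f32>` used by `SearchQuery`. A standalone sketch of that parsing, assuming serde-cs behaves like a split-and-parse (this is not its actual implementation):

// Sketch of comma-separated float parsing, independent of serde-cs.
fn parse_comma_separated(raw: &str) -> Result<Vec<f32>, std::num::ParseFloatError> {
    raw.split(',').map(|part| part.trim().parse::<f32>()).collect()
}

#[test]
fn parses_three_components() {
    assert_eq!(parse_comma_separated("1.0,2.0,3.0").unwrap(), vec![1.0, 2.0, 3.0]);
}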


@@ -284,9 +284,6 @@ pub fn create_all_stats(
used_database_size += index_scheduler.used_size()?;
database_size += auth_controller.size()?;
used_database_size += auth_controller.used_size()?;
let update_file_size = index_scheduler.compute_update_file_size()?;
database_size += update_file_size;
used_database_size += update_file_size;
let stats = Stats { database_size, used_database_size, last_update: last_task, indexes };
Ok(stats)


@@ -666,6 +666,7 @@ fn compute_semantic_score(query: &[f32], vectors: Value) -> milli::Result<Option
.map_err(InternalError::SerdeJson)?;
Ok(vectors
.into_iter()
.flatten()
.map(|v| OrderedFloat(dot_product_similarity(query, &v)))
.max()
.map(OrderedFloat::into_inner))
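Since documents may now carry several vectors, the added `.flatten()` makes the semantic score the maximum similarity over all of them. A self-contained sketch of that reduction, with a plain dot product standing in for `dot_product_similarity`:

// Score a document by its best-matching vector.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

fn best_semantic_score(query: &[f32], doc_vectors: &[Vec<f32>]) -> Option<f32> {
    doc_vectors.iter().map(|v| dot(query, v)).max_by(|a, b| a.total_cmp(b))
}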


@@ -61,6 +61,8 @@ pub static AUTHORIZATIONS: Lazy<HashMap<(&'static str, &'static str), HashSet<&'
("DELETE", "/keys/mykey/") => hashset!{"keys.delete", "*"},
("POST", "/keys") => hashset!{"keys.create", "*"},
("GET", "/keys") => hashset!{"keys.get", "*"},
("GET", "/experimental-features") => hashset!{"experimental.get", "*"},
("PATCH", "/experimental-features") => hashset!{"experimental.update", "*"},
};
authorizations


@@ -189,6 +189,14 @@ impl Server {
let url = format!("/tasks/{}", update_id);
self.service.get(url).await
}
pub async fn get_features(&self) -> (Value, StatusCode) {
self.service.get("/experimental-features").await
}
pub async fn set_features(&self, value: Value) -> (Value, StatusCode) {
self.service.patch("/experimental-features", value).await
}
}
pub fn default_settings(dir: impl AsRef<Path>) -> Opt {


@@ -154,6 +154,19 @@ async fn delete_document_by_filter() {
)
.await;
index.wait_task(1).await;
let (stats, _) = index.stats().await;
snapshot!(json_string!(stats), @r###"
{
"numberOfDocuments": 4,
"isIndexing": false,
"fieldDistribution": {
"color": 3,
"id": 4
}
}
"###);
let (response, code) =
index.delete_document_by_filter(json!({ "filter": "color = blue"})).await;
snapshot!(code, @"202 Accepted");
@@ -188,6 +201,18 @@ async fn delete_document_by_filter() {
}
"###);
let (stats, _) = index.stats().await;
snapshot!(json_string!(stats), @r###"
{
"numberOfDocuments": 2,
"isIndexing": false,
"fieldDistribution": {
"color": 1,
"id": 2
}
}
"###);
let (documents, code) = index.get_all_documents(GetAllDocumentsOptions::default()).await;
snapshot!(code, @"200 OK");
snapshot!(json_string!(documents), @r###"
@@ -241,6 +266,18 @@ async fn delete_document_by_filter() {
}
"###);
let (stats, _) = index.stats().await;
snapshot!(json_string!(stats), @r###"
{
"numberOfDocuments": 1,
"isIndexing": false,
"fieldDistribution": {
"color": 1,
"id": 1
}
}
"###);
let (documents, code) = index.get_all_documents(GetAllDocumentsOptions::default()).await;
snapshot!(code, @"200 OK");
snapshot!(json_string!(documents), @r###"


@@ -0,0 +1,109 @@
use serde_json::json;
use crate::common::Server;
/// Feature name to test against.
/// This will have to be replaced by a different feature once this one is stabilized.
/// All tests that need to set a feature can make use of this constant.
const FEATURE_NAME: &str = "vectorStore";
#[actix_rt::test]
async fn experimental_features() {
let server = Server::new().await;
let (response, code) = server.get_features().await;
meili_snap::snapshot!(code, @"200 OK");
meili_snap::snapshot!(meili_snap::json_string!(response), @r###"
{
"scoreDetails": false,
"vectorStore": false
}
"###);
let (response, code) = server.set_features(json!({FEATURE_NAME: true})).await;
meili_snap::snapshot!(code, @"200 OK");
meili_snap::snapshot!(meili_snap::json_string!(response), @r###"
{
"scoreDetails": false,
"vectorStore": true
}
"###);
let (response, code) = server.get_features().await;
meili_snap::snapshot!(code, @"200 OK");
meili_snap::snapshot!(meili_snap::json_string!(response), @r###"
{
"scoreDetails": false,
"vectorStore": true
}
"###);
// sending null does not change the value
let (response, code) = server.set_features(json!({FEATURE_NAME: null})).await;
meili_snap::snapshot!(code, @"200 OK");
meili_snap::snapshot!(meili_snap::json_string!(response), @r###"
{
"scoreDetails": false,
"vectorStore": true
}
"###);
// not sending the field does not change the value
let (response, code) = server.set_features(json!({})).await;
meili_snap::snapshot!(code, @"200 OK");
meili_snap::snapshot!(meili_snap::json_string!(response), @r###"
{
"scoreDetails": false,
"vectorStore": true
}
"###);
}
#[actix_rt::test]
async fn errors() {
let server = Server::new().await;
// Sending a feature not in the list is an error
let (response, code) = server.set_features(json!({"NotAFeature": true})).await;
meili_snap::snapshot!(code, @"400 Bad Request");
meili_snap::snapshot!(meili_snap::json_string!(response), @r###"
{
"message": "Unknown field `NotAFeature`: expected one of `scoreDetails`, `vectorStore`",
"code": "bad_request",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#bad_request"
}
"###);
// The type must be a bool, not a number
let (response, code) = server.set_features(json!({FEATURE_NAME: 42})).await;
meili_snap::snapshot!(code, @"400 Bad Request");
meili_snap::snapshot!(meili_snap::json_string!(response), @r###"
{
"message": "Invalid value type at `.vectorStore`: expected a boolean, but found a positive integer: `42`",
"code": "bad_request",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#bad_request"
}
"###);
// The type must be a bool, not a string
let (response, code) = server.set_features(json!({FEATURE_NAME: "true"})).await;
meili_snap::snapshot!(code, @"400 Bad Request");
meili_snap::snapshot!(meili_snap::json_string!(response), @r###"
{
"message": "Invalid value type at `.vectorStore`: expected a boolean, but found a string: `\"true\"`",
"code": "bad_request",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#bad_request"
}
"###);
}


@@ -3,6 +3,7 @@ mod common;
mod dashboard;
mod documents;
mod dumps;
mod features;
mod index;
mod search;
mod settings;


@@ -968,9 +968,12 @@ async fn sort_unset_ranking_rule() {
async fn search_on_unknown_field() {
let server = Server::new().await;
let index = server.index("test");
index.update_settings_searchable_attributes(json!(["id", "title"])).await;
index.wait_task(0).await;
let documents = DOCUMENTS.clone();
index.add_documents(documents, None).await;
index.wait_task(0).await;
index.wait_task(1).await;
index
.search(
@@ -989,3 +992,49 @@ async fn search_on_unknown_field() {
)
.await;
}
#[actix_rt::test]
async fn search_on_unknown_field_plus_joker() {
let server = Server::new().await;
let index = server.index("test");
index.update_settings_searchable_attributes(json!(["id", "title"])).await;
index.wait_task(0).await;
let documents = DOCUMENTS.clone();
index.add_documents(documents, None).await;
index.wait_task(1).await;
index
.search(
json!({"q": "Captain Marvel", "attributesToSearchOn": ["*", "unknown"]}),
|response, code| {
snapshot!(code, @"400 Bad Request");
snapshot!(json_string!(response), @r###"
{
"message": "Attribute `unknown` is not searchable. Available searchable attributes are: `id, title`.",
"code": "invalid_search_attributes_to_search_on",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_search_attributes_to_search_on"
}
"###);
},
)
.await;
index
.search(
json!({"q": "Captain Marvel", "attributesToSearchOn": ["unknown", "*"]}),
|response, code| {
snapshot!(code, @"400 Bad Request");
snapshot!(json_string!(response), @r###"
{
"message": "Attribute `unknown` is not searchable. Available searchable attributes are: `id, title`.",
"code": "invalid_search_attributes_to_search_on",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_search_attributes_to_search_on"
}
"###);
},
)
.await;
}


@@ -1,3 +1,4 @@
use meili_snap::snapshot;
use once_cell::sync::Lazy;
use serde_json::{json, Value};
@@ -56,6 +57,54 @@ async fn simple_facet_search() {
assert_eq!(response["facetHits"].as_array().unwrap().len(), 1);
}
#[actix_rt::test]
async fn advanced_facet_search() {
let server = Server::new().await;
let index = server.index("test");
let documents = DOCUMENTS.clone();
index.update_settings_filterable_attributes(json!(["genres"])).await;
index.update_settings_typo_tolerance(json!({ "enabled": false })).await;
index.add_documents(documents, None).await;
index.wait_task(2).await;
let (response, code) =
index.facet_search(json!({"facetName": "genres", "facetQuery": "adventre"})).await;
snapshot!(code, @"200 OK");
snapshot!(response["facetHits"].as_array().unwrap().len(), @"0");
let (response, code) =
index.facet_search(json!({"facetName": "genres", "facetQuery": "àdventure"})).await;
snapshot!(code, @"200 OK");
snapshot!(response["facetHits"].as_array().unwrap().len(), @"1");
}
#[actix_rt::test]
async fn more_advanced_facet_search() {
let server = Server::new().await;
let index = server.index("test");
let documents = DOCUMENTS.clone();
index.update_settings_filterable_attributes(json!(["genres"])).await;
index.update_settings_typo_tolerance(json!({ "disableOnWords": ["adventre"] })).await;
index.add_documents(documents, None).await;
index.wait_task(2).await;
let (response, code) =
index.facet_search(json!({"facetName": "genres", "facetQuery": "adventre"})).await;
snapshot!(code, @"200 OK");
snapshot!(response["facetHits"].as_array().unwrap().len(), @"0");
let (response, code) =
index.facet_search(json!({"facetName": "genres", "facetQuery": "adventure"})).await;
snapshot!(code, @"200 OK");
snapshot!(response["facetHits"].as_array().unwrap().len(), @"1");
}
#[actix_rt::test]
async fn non_filterable_facet_search_error() {
let server = Server::new().await;


@@ -0,0 +1,119 @@
use meili_snap::{json_string, snapshot};
use once_cell::sync::Lazy;
use serde_json::{json, Value};
use crate::common::Server;
pub(self) static DOCUMENTS: Lazy<Value> = Lazy::new(|| {
json!([
{
"id": 1,
"name": "Taco Truck",
"address": "444 Salsa Street, Burritoville",
"type": "Mexican",
"rating": 9,
"_geo": {
"lat": 34.0522,
"lng": -118.2437
}
},
{
"id": 2,
"name": "La Bella Italia",
"address": "456 Elm Street, Townsville",
"type": "Italian",
"rating": 9,
"_geo": {
"lat": "45.4777599",
"lng": "9.1967508"
}
},
{
"id": 3,
"name": "Crêpe Truck",
"address": "2 Billig Avenue, Rouenville",
"type": "French",
"rating": 10
}
])
});
#[actix_rt::test]
async fn geo_sort_with_geo_strings() {
let server = Server::new().await;
let index = server.index("test");
let documents = DOCUMENTS.clone();
index.update_settings_filterable_attributes(json!(["_geo"])).await;
index.update_settings_sortable_attributes(json!(["_geo"])).await;
index.add_documents(documents, None).await;
index.wait_task(2).await;
index
.search(
json!({
"filter": "_geoRadius(45.472735, 9.184019, 10000)",
"sort": ["_geoPoint(0.0, 0.0):asc"]
}),
|response, code| {
assert_eq!(code, 200, "{}", response);
},
)
.await;
}
#[actix_rt::test]
async fn geo_bounding_box_with_string_and_number() {
let server = Server::new().await;
let index = server.index("test");
let documents = DOCUMENTS.clone();
index.update_settings_filterable_attributes(json!(["_geo"])).await;
index.update_settings_sortable_attributes(json!(["_geo"])).await;
index.add_documents(documents, None).await;
index.wait_task(2).await;
index
.search(
json!({
"filter": "_geoBoundingBox([89, 179], [-89, -179])",
}),
|response, code| {
assert_eq!(code, 200, "{}", response);
snapshot!(json_string!(response, { ".processingTimeMs" => "[time]" }), @r###"
{
"hits": [
{
"id": 1,
"name": "Taco Truck",
"address": "444 Salsa Street, Burritoville",
"type": "Mexican",
"rating": 9,
"_geo": {
"lat": 34.0522,
"lng": -118.2437
}
},
{
"id": 2,
"name": "La Bella Italia",
"address": "456 Elm Street, Townsville",
"type": "Italian",
"rating": 9,
"_geo": {
"lat": "45.4777599",
"lng": "9.1967508"
}
}
],
"query": "",
"processingTimeMs": "[time]",
"limit": 20,
"offset": 0,
"estimatedTotalHits": 2
}
"###);
},
)
.await;
}


@@ -4,6 +4,7 @@
mod errors;
mod facet_search;
mod formatted;
mod geo;
mod multi;
mod pagination;
mod restrict_searchable;
@@ -752,3 +753,354 @@ async fn faceting_max_values_per_facet() {
)
.await;
}
#[actix_rt::test]
async fn experimental_feature_score_details() {
let server = Server::new().await;
let index = server.index("test");
let documents = DOCUMENTS.clone();
index.add_documents(json!(documents), None).await;
index.wait_task(0).await;
index
.search(
json!({
"q": "train dragon",
"showRankingScoreDetails": true,
}),
|response, code| {
meili_snap::snapshot!(code, @"400 Bad Request");
meili_snap::snapshot!(meili_snap::json_string!(response), @r###"
{
"message": "Computing score details requires enabling the `score details` experimental feature. See https://github.com/meilisearch/product/discussions/674",
"code": "feature_not_enabled",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#feature_not_enabled"
}
"###);
},
)
.await;
let (response, code) = server.set_features(json!({"scoreDetails": true})).await;
meili_snap::snapshot!(code, @"200 OK");
meili_snap::snapshot!(response["scoreDetails"], @"true");
index
.search(
json!({
"q": "train dragon",
"showRankingScoreDetails": true,
}),
|response, code| {
meili_snap::snapshot!(code, @"200 OK");
meili_snap::snapshot!(meili_snap::json_string!(response["hits"]), @r###"
[
{
"title": "How to Train Your Dragon: The Hidden World",
"id": "166428",
"_rankingScoreDetails": {
"words": {
"order": 0,
"matchingWords": 2,
"maxMatchingWords": 2,
"score": 1.0
},
"typo": {
"order": 1,
"typoCount": 0,
"maxTypoCount": 2,
"score": 1.0
},
"proximity": {
"order": 2,
"score": 0.875
},
"attribute": {
"order": 3,
"attributeRankingOrderScore": 1.0,
"queryWordDistanceScore": 0.8095238095238095,
"score": 0.9365079365079364
},
"exactness": {
"order": 4,
"matchType": "noExactMatch",
"matchingWords": 2,
"maxMatchingWords": 2,
"score": 0.3333333333333333
}
}
}
]
"###);
},
)
.await;
}
#[actix_rt::test]
async fn experimental_feature_vector_store() {
let server = Server::new().await;
let index = server.index("test");
let documents = DOCUMENTS.clone();
index.add_documents(json!(documents), None).await;
index.wait_task(0).await;
let (response, code) = index
.search_post(json!({
"vector": [1.0, 2.0, 3.0],
}))
.await;
meili_snap::snapshot!(code, @"400 Bad Request");
meili_snap::snapshot!(meili_snap::json_string!(response), @r###"
{
"message": "Passing `vector` as a query parameter requires enabling the `vector store` experimental feature. See https://github.com/meilisearch/product/discussions/677",
"code": "feature_not_enabled",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#feature_not_enabled"
}
"###);
let (response, code) = server.set_features(json!({"vectorStore": true})).await;
meili_snap::snapshot!(code, @"200 OK");
meili_snap::snapshot!(response["vectorStore"], @"true");
let (response, code) = index
.search_post(json!({
"vector": [1.0, 2.0, 3.0],
}))
.await;
meili_snap::snapshot!(code, @"200 OK");
meili_snap::snapshot!(meili_snap::json_string!(response["hits"]), @"[]");
}
#[cfg(feature = "default")]
#[actix_rt::test]
async fn camelcased_words() {
let server = Server::new().await;
let index = server.index("test");
// related to https://github.com/meilisearch/meilisearch/issues/3818
let documents = json!([
{ "id": 0, "title": "DeLonghi" },
{ "id": 1, "title": "delonghi" },
{ "id": 2, "title": "TestAB" },
{ "id": 3, "title": "TestAb" },
{ "id": 4, "title": "testab" },
]);
index.add_documents(documents, None).await;
index.wait_task(0).await;
index
.search(json!({"q": "deLonghi"}), |response, code| {
meili_snap::snapshot!(code, @"200 OK");
meili_snap::snapshot!(meili_snap::json_string!(response["hits"]), @r###"
[
{
"id": 0,
"title": "DeLonghi"
},
{
"id": 1,
"title": "delonghi"
}
]
"###);
})
.await;
index
.search(json!({"q": "dellonghi"}), |response, code| {
meili_snap::snapshot!(code, @"200 OK");
meili_snap::snapshot!(meili_snap::json_string!(response["hits"]), @r###"
[
{
"id": 0,
"title": "DeLonghi"
},
{
"id": 1,
"title": "delonghi"
}
]
"###);
})
.await;
index
.search(json!({"q": "testa"}), |response, code| {
meili_snap::snapshot!(code, @"200 OK");
meili_snap::snapshot!(meili_snap::json_string!(response["hits"]), @r###"
[
{
"id": 2,
"title": "TestAB"
},
{
"id": 3,
"title": "TestAb"
},
{
"id": 4,
"title": "testab"
}
]
"###);
})
.await;
index
.search(json!({"q": "testab"}), |response, code| {
meili_snap::snapshot!(code, @"200 OK");
meili_snap::snapshot!(meili_snap::json_string!(response["hits"]), @r###"
[
{
"id": 2,
"title": "TestAB"
},
{
"id": 3,
"title": "TestAb"
},
{
"id": 4,
"title": "testab"
}
]
"###);
})
.await;
index
.search(json!({"q": "TestaB"}), |response, code| {
meili_snap::snapshot!(code, @"200 OK");
meili_snap::snapshot!(meili_snap::json_string!(response["hits"]), @r###"
[
{
"id": 2,
"title": "TestAB"
},
{
"id": 3,
"title": "TestAb"
},
{
"id": 4,
"title": "testab"
}
]
"###);
})
.await;
index
.search(json!({"q": "Testab"}), |response, code| {
meili_snap::snapshot!(code, @"200 OK");
meili_snap::snapshot!(meili_snap::json_string!(response["hits"]), @r###"
[
{
"id": 2,
"title": "TestAB"
},
{
"id": 3,
"title": "TestAb"
},
{
"id": 4,
"title": "testab"
}
]
"###);
})
.await;
index
.search(json!({"q": "TestAb"}), |response, code| {
meili_snap::snapshot!(code, @"200 OK");
meili_snap::snapshot!(meili_snap::json_string!(response["hits"]), @r###"
[
{
"id": 2,
"title": "TestAB"
},
{
"id": 3,
"title": "TestAb"
},
{
"id": 4,
"title": "testab"
}
]
"###);
})
.await;
// with Typos
index
.search(json!({"q": "dellonghi"}), |response, code| {
meili_snap::snapshot!(code, @"200 OK");
meili_snap::snapshot!(meili_snap::json_string!(response["hits"]), @r###"
[
{
"id": 0,
"title": "DeLonghi"
},
{
"id": 1,
"title": "delonghi"
}
]
"###);
})
.await;
index
.search(json!({"q": "TetsAB"}), |response, code| {
meili_snap::snapshot!(code, @"200 OK");
meili_snap::snapshot!(meili_snap::json_string!(response["hits"]), @r###"
[
{
"id": 2,
"title": "TestAB"
},
{
"id": 3,
"title": "TestAb"
},
{
"id": 4,
"title": "testab"
}
]
"###);
})
.await;
}

View File

@@ -49,6 +49,76 @@ async fn simple_search_on_title() {
.await;
}
#[actix_rt::test]
async fn search_no_searchable_attribute_set() {
let server = Server::new().await;
let index = index_with_documents(&server, &SIMPLE_SEARCH_DOCUMENTS).await;
index
.search(
json!({"q": "Captain Marvel", "attributesToSearchOn": ["unknown"]}),
|response, code| {
snapshot!(code, @"200 OK");
snapshot!(response["hits"].as_array().unwrap().len(), @"0");
},
)
.await;
index.update_settings_searchable_attributes(json!(["*"])).await;
index.wait_task(1).await;
index
.search(
json!({"q": "Captain Marvel", "attributesToSearchOn": ["unknown"]}),
|response, code| {
snapshot!(code, @"200 OK");
snapshot!(response["hits"].as_array().unwrap().len(), @"0");
},
)
.await;
index.update_settings_searchable_attributes(json!(["*"])).await;
index.wait_task(2).await;
index
.search(
json!({"q": "Captain Marvel", "attributesToSearchOn": ["unknown", "title"]}),
|response, code| {
snapshot!(code, @"200 OK");
snapshot!(response["hits"].as_array().unwrap().len(), @"2");
},
)
.await;
}
#[actix_rt::test]
async fn search_on_all_attributes() {
let server = Server::new().await;
let index = index_with_documents(&server, &SIMPLE_SEARCH_DOCUMENTS).await;
index
.search(json!({"q": "Captain Marvel", "attributesToSearchOn": ["*"]}), |response, code| {
snapshot!(code, @"200 OK");
snapshot!(response["hits"].as_array().unwrap().len(), @"3");
})
.await;
}
#[actix_rt::test]
async fn search_on_all_attributes_restricted_set() {
let server = Server::new().await;
let index = index_with_documents(&server, &SIMPLE_SEARCH_DOCUMENTS).await;
index.update_settings_searchable_attributes(json!(["title"])).await;
index.wait_task(1).await;
index
.search(json!({"q": "Captain Marvel", "attributesToSearchOn": ["*"]}), |response, code| {
snapshot!(code, @"200 OK");
snapshot!(response["hits"].as_array().unwrap().len(), @"2");
})
.await;
}
#[actix_rt::test]
async fn simple_prefix_search_on_title() {
let server = Server::new().await;
@@ -240,7 +310,7 @@ async fn exactness_ranking_rule_order() {
},
{
"title": "Captain Marvel",
"desc": "CaptainMarvel",
"desc": "Captain the Marvel",
"id": "2",
}]),
)


@@ -17,7 +17,7 @@ bincode = "1.3.3"
bstr = "1.4.0"
bytemuck = { version = "1.13.1", features = ["extern_crate_alloc"] }
byteorder = "1.4.3"
charabia = { version = "0.8.1", default-features = false }
charabia = { version = "0.8.3", default-features = false }
concat-arrays = "0.1.2"
crossbeam-channel = "0.5.8"
deserr = "0.5.0"
@@ -29,12 +29,11 @@ geoutils = "0.5.1"
grenad = { version = "0.4.4", default-features = false, features = [
"tempfile",
] }
heed = { git = "https://github.com/meilisearch/heed", tag = "v0.12.6", default-features = false, features = [
"lmdb",
"sync-read-txn",
heed = { git = "https://github.com/meilisearch/heed", tag = "v0.12.7", default-features = false, features = [
"lmdb", "read-txn-no-tls"
] }
hnsw = { version = "0.11.0", features = ["serde1"] }
indexmap = { version = "1.9.3", features = ["serde"] }
instant-distance = { version = "0.6.1", features = ["with-serde"] }
json-depth-checker = { path = "../json-depth-checker" }
levenshtein_automata = { version = "0.2.1", features = ["fst_automaton"] }
memmap2 = "0.5.10"
@@ -48,7 +47,6 @@ rstar = { version = "0.10.0", features = ["serde"] }
serde = { version = "1.0.160", features = ["derive"] }
serde_json = { version = "1.0.95", features = ["preserve_order"] }
slice-group-by = "0.3.0"
space = "0.17.0"
smallstr = { version = "0.3.0", features = ["serde"] }
smallvec = "1.10.0"
smartstring = "1.0.1"
@@ -81,7 +79,7 @@ md5 = "0.7.0"
rand = { version = "0.8.5", features = ["small_rng"] }
[features]
all-tokenizations = ["charabia/default"]
all-tokenizations = ["charabia/chinese", "charabia/hebrew", "charabia/japanese", "charabia/thai", "charabia/korean", "charabia/greek"]
# Use POSIX semaphores instead of SysV semaphores in LMDB
# For more information on this feature, see heed's Cargo.toml


@@ -1,20 +1,36 @@
use std::ops;
use instant_distance::Point;
use serde::{Deserialize, Serialize};
use space::Metric;
#[derive(Debug, Default, Clone, Copy, Serialize, Deserialize)]
pub struct DotProduct;
use crate::normalize_vector;
impl Metric<Vec<f32>> for DotProduct {
type Unit = u32;
#[derive(Debug, Default, Clone, Serialize, Deserialize)]
pub struct NDotProductPoint(Vec<f32>);
// Following <https://docs.rs/space/0.17.0/space/trait.Metric.html>.
//
// Here is a playground that validates the ordering of the bit representation of floats in the range 0.0..=1.0:
// <https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=6c59e31a3cc5036b32edf51e8937b56e>
fn distance(&self, a: &Vec<f32>, b: &Vec<f32>) -> Self::Unit {
let dist = 1.0 - dot_product_similarity(a, b);
impl NDotProductPoint {
pub fn new(point: Vec<f32>) -> Self {
NDotProductPoint(normalize_vector(point))
}
pub fn into_inner(self) -> Vec<f32> {
self.0
}
}
impl ops::Deref for NDotProductPoint {
type Target = [f32];
fn deref(&self) -> &Self::Target {
self.0.as_slice()
}
}
impl Point for NDotProductPoint {
fn distance(&self, other: &Self) -> f32 {
let dist = 1.0 - dot_product_similarity(&self.0, &other.0);
debug_assert!(!dist.is_nan());
dist.to_bits()
dist
}
}
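The new point type normalizes on construction, so `1.0 - dot(a, b)` is a well-behaved distance in `[0.0, 2.0]` for unit vectors and no longer needs the bit-casting trick of the old `Metric` impl. A sketch of the math under that assumption:

// Normalize once at construction time...
fn normalize(mut v: Vec<f32>) -> Vec<f32> {
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm > 0.0 {
        for x in &mut v {
            *x /= norm;
        }
    }
    v
}

// ...then the distance is simply one minus the dot product.
fn ndot_distance(a: &[f32], b: &[f32]) -> f32 {
    1.0 - a.iter().zip(b).map(|(x, y)| x * y).sum::<f32>()
}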


@@ -0,0 +1,27 @@
use std::borrow::Cow;
use std::convert::TryInto;
use std::str;
pub struct BEU16StrCodec;
impl<'a> heed::BytesDecode<'a> for BEU16StrCodec {
type DItem = (u16, &'a str);
fn bytes_decode(bytes: &'a [u8]) -> Option<Self::DItem> {
let (n_bytes, str_bytes) = bytes.split_at(2);
let n = n_bytes.try_into().map(u16::from_be_bytes).ok()?;
let s = str::from_utf8(str_bytes).ok()?;
Some((n, s))
}
}
impl<'a> heed::BytesEncode<'a> for BEU16StrCodec {
type EItem = (u16, &'a str);
fn bytes_encode((n, s): &Self::EItem) -> Option<Cow<[u8]>> {
let mut bytes = Vec::with_capacity(s.len() + 2);
bytes.extend_from_slice(&n.to_be_bytes());
bytes.extend_from_slice(s.as_bytes());
Some(Cow::Owned(bytes))
}
}
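A round-trip sketch for the codec above, relying only on the heed `BytesEncode`/`BytesDecode` signatures visible in the impls (two big-endian bytes for the `u16`, then the raw UTF-8 of the string):

use heed::{BytesDecode, BytesEncode};

#[test]
fn beu16_str_round_trip() {
    let bytes = BEU16StrCodec::bytes_encode(&(42u16, "genres")).unwrap();
    let (n, s) = BEU16StrCodec::bytes_decode(&bytes).unwrap();
    assert_eq!((n, s), (42, "genres"));
}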


@@ -1,3 +1,4 @@
mod beu16_str_codec;
mod beu32_str_codec;
mod byte_slice_ref;
pub mod facet;
@@ -14,6 +15,7 @@ mod str_str_u8_codec;
pub use byte_slice_ref::ByteSliceRefCodec;
pub use str_ref::StrRefCodec;
pub use self::beu16_str_codec::BEU16StrCodec;
pub use self::beu32_str_codec::BEU32StrCodec;
pub use self::field_id_word_count_codec::FieldIdWordCountCodec;
pub use self::fst_set_codec::FstSetCodec;


@@ -1,5 +1,5 @@
use std::borrow::Cow;
use std::collections::{HashMap, HashSet};
use std::collections::{BTreeSet, HashMap, HashSet};
use std::fs::File;
use std::mem::size_of;
use std::path::Path;
@@ -8,12 +8,11 @@ use charabia::{Language, Script};
use heed::flags::Flags;
use heed::types::*;
use heed::{CompactionOption, Database, PolyDatabase, RoTxn, RwTxn};
use rand_pcg::Pcg32;
use roaring::RoaringBitmap;
use rstar::RTree;
use time::OffsetDateTime;
use crate::distance::DotProduct;
use crate::distance::NDotProductPoint;
use crate::error::{InternalError, UserError};
use crate::facet::FacetType;
use crate::fields_ids_map::FieldsIdsMap;
@@ -21,7 +20,9 @@ use crate::heed_codec::facet::{
FacetGroupKeyCodec, FacetGroupValueCodec, FieldDocIdFacetF64Codec, FieldDocIdFacetStringCodec,
FieldIdCodec, OrderedF64Codec,
};
use crate::heed_codec::{FstSetCodec, ScriptLanguageCodec, StrBEU16Codec, StrRefCodec};
use crate::heed_codec::{
BEU16StrCodec, FstSetCodec, ScriptLanguageCodec, StrBEU16Codec, StrRefCodec,
};
use crate::readable_slices::ReadableSlices;
use crate::{
default_criteria, CboRoaringBitmapCodec, Criterion, DocumentId, ExternalDocumentsIds,
@@ -31,7 +32,7 @@ use crate::{
};
/// The HNSW data-structure that we serialize, fill and search in.
pub type Hnsw = hnsw::Hnsw<DotProduct, Vec<f32>, Pcg32, 12, 24>;
pub type Hnsw = instant_distance::Hnsw<NDotProductPoint>;
pub const DEFAULT_MIN_WORD_LEN_ONE_TYPO: u8 = 5;
pub const DEFAULT_MIN_WORD_LEN_TWO_TYPOS: u8 = 9;
@@ -96,6 +97,7 @@ pub mod db_name {
pub const FACET_ID_IS_NULL_DOCIDS: &str = "facet-id-is-null-docids";
pub const FACET_ID_IS_EMPTY_DOCIDS: &str = "facet-id-is-empty-docids";
pub const FACET_ID_STRING_DOCIDS: &str = "facet-id-string-docids";
pub const FACET_ID_NORMALIZED_STRING_STRINGS: &str = "facet-id-normalized-string-strings";
pub const FACET_ID_STRING_FST: &str = "facet-id-string-fst";
pub const FIELD_ID_DOCID_FACET_F64S: &str = "field-id-docid-facet-f64s";
pub const FIELD_ID_DOCID_FACET_STRINGS: &str = "field-id-docid-facet-strings";
@@ -157,6 +159,8 @@ pub struct Index {
pub facet_id_f64_docids: Database<FacetGroupKeyCodec<OrderedF64Codec>, FacetGroupValueCodec>,
/// Maps the facet field id and ranges of strings with the docids that corresponds to them.
pub facet_id_string_docids: Database<FacetGroupKeyCodec<StrRefCodec>, FacetGroupValueCodec>,
/// Maps the facet field id of the normalized-for-search string facets with their original versions.
pub facet_id_normalized_string_strings: Database<BEU16StrCodec, SerdeJson<BTreeSet<String>>>,
/// Maps the facet field id of the string facets with an FST containing all the facets values.
pub facet_id_string_fst: Database<OwnedType<BEU16>, FstSetCodec>,
@@ -181,7 +185,7 @@ impl Index {
) -> Result<Index> {
use db_name::*;
options.max_dbs(24);
options.max_dbs(25);
unsafe { options.flag(Flags::MdbAlwaysFreePages) };
let env = options.open(path)?;
@@ -211,6 +215,8 @@ impl Index {
let facet_id_f64_docids = env.create_database(&mut wtxn, Some(FACET_ID_F64_DOCIDS))?;
let facet_id_string_docids =
env.create_database(&mut wtxn, Some(FACET_ID_STRING_DOCIDS))?;
let facet_id_normalized_string_strings =
env.create_database(&mut wtxn, Some(FACET_ID_NORMALIZED_STRING_STRINGS))?;
let facet_id_string_fst = env.create_database(&mut wtxn, Some(FACET_ID_STRING_FST))?;
let facet_id_exists_docids =
env.create_database(&mut wtxn, Some(FACET_ID_EXISTS_DOCIDS))?;
@@ -246,6 +252,7 @@ impl Index {
field_id_word_count_docids,
facet_id_f64_docids,
facet_id_string_docids,
facet_id_normalized_string_strings,
facet_id_string_fst,
facet_id_exists_docids,
facet_id_is_null_docids,
@@ -1711,11 +1718,11 @@ pub(crate) mod tests {
.unwrap();
index
.add_documents(documents!([
{ "id": 0, "_geo": { "lat": 0, "lng": 0 } },
{ "id": 1, "_geo": { "lat": 0, "lng": -175 } },
{ "id": 2, "_geo": { "lat": 0, "lng": 175 } },
{ "id": 0, "_geo": { "lat": "0", "lng": "0" } },
{ "id": 1, "_geo": { "lat": 0, "lng": "-175" } },
{ "id": 2, "_geo": { "lat": "0", "lng": 175 } },
{ "id": 3, "_geo": { "lat": 85, "lng": 0 } },
{ "id": 4, "_geo": { "lat": -85, "lng": 0 } },
{ "id": 4, "_geo": { "lat": "-85", "lng": "0" } },
]))
.unwrap();


@@ -51,9 +51,10 @@ pub use self::error::{
pub use self::external_documents_ids::ExternalDocumentsIds;
pub use self::fields_ids_map::FieldsIdsMap;
pub use self::heed_codec::{
BEU32StrCodec, BoRoaringBitmapCodec, BoRoaringBitmapLenCodec, CboRoaringBitmapCodec,
CboRoaringBitmapLenCodec, FieldIdWordCountCodec, ObkvCodec, RoaringBitmapCodec,
RoaringBitmapLenCodec, StrBEU32Codec, U8StrStrCodec, UncheckedU8StrStrCodec,
BEU16StrCodec, BEU32StrCodec, BoRoaringBitmapCodec, BoRoaringBitmapLenCodec,
CboRoaringBitmapCodec, CboRoaringBitmapLenCodec, FieldIdWordCountCodec, ObkvCodec,
RoaringBitmapCodec, RoaringBitmapLenCodec, StrBEU32Codec, U8StrStrCodec,
UncheckedU8StrStrCodec,
};
pub use self::index::Index;
pub use self::search::{
@@ -96,7 +97,7 @@ const MAX_LMDB_KEY_LENGTH: usize = 500;
///
/// This number is determined by the keys of the different facet databases
/// with a margin of safety added.
pub const MAX_FACET_VALUE_LENGTH: usize = MAX_LMDB_KEY_LENGTH - 20;
pub const MAX_FACET_VALUE_LENGTH: usize = MAX_LMDB_KEY_LENGTH - 32;
/// The maximum length a word can be
pub const MAX_WORD_LENGTH: usize = MAX_LMDB_KEY_LENGTH / 2;
@@ -292,15 +293,15 @@ pub fn normalize_facet(original: &str) -> String {
#[derive(serde::Serialize, serde::Deserialize, Debug)]
#[serde(transparent)]
pub struct VectorOrArrayOfVectors {
#[serde(with = "either::serde_untagged")]
inner: either::Either<Vec<f32>, Vec<Vec<f32>>>,
#[serde(with = "either::serde_untagged_optional")]
inner: Option<either::Either<Vec<f32>, Vec<Vec<f32>>>>,
}
impl VectorOrArrayOfVectors {
pub fn into_array_of_vectors(self) -> Vec<Vec<f32>> {
match self.inner {
either::Either::Left(vector) => vec![vector],
either::Either::Right(vectors) => vectors,
pub fn into_array_of_vectors(self) -> Option<Vec<Vec<f32>>> {
match self.inner? {
either::Either::Left(vector) => Some(vec![vector]),
either::Either::Right(vectors) => Some(vectors),
}
}
}
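With the inner `Either` now wrapped in an `Option`, three JSON shapes are accepted: `null`/absent, a single vector, and an array of vectors. A sketch of the expected normalization (types as defined above):

#[test]
fn vectors_normalization_sketch() {
    // A single vector is wrapped into an array of one vector...
    let single: VectorOrArrayOfVectors =
        serde_json::from_value(serde_json::json!([1.0, 2.0])).unwrap();
    assert_eq!(single.into_array_of_vectors(), Some(vec![vec![1.0, 2.0]]));

    // ...and `null` deserializes to `None` instead of failing.
    let none: VectorOrArrayOfVectors =
        serde_json::from_value(serde_json::json!(null)).unwrap();
    assert_eq!(none.into_array_of_vectors(), None);
}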


@@ -84,7 +84,7 @@ impl ScoreDetails {
// For now, fid is a virtual rule always followed by the "position" rule
let fid_details = serde_json::json!({
"order": order,
"attribute_ranking_order_score": fid.local_score(),
"attributeRankingOrderScore": fid.local_score(),
});
details_map.insert("attribute".into(), fid_details);
order += 1;
@@ -102,7 +102,7 @@ impl ScoreDetails {
};
attribute_details
.insert("query_word_distance_score".into(), position.local_score().into());
.insert("queryWordDistanceScore".into(), position.local_score().into());
let score = Rank::global_score([fid_details, *position].iter().copied());
attribute_details.insert("score".into(), score.into());


@@ -1,5 +1,8 @@
use std::fmt;
use std::ops::ControlFlow;
use charabia::normalizer::NormalizerOption;
use charabia::Normalize;
use fst::automaton::{Automaton, Str};
use fst::{IntoStreamer, Streamer};
use levenshtein_automata::{LevenshteinAutomatonBuilder as LevBuilder, DFA};
@@ -14,8 +17,8 @@ use crate::error::UserError;
use crate::heed_codec::facet::{FacetGroupKey, FacetGroupValue};
use crate::score_details::{ScoreDetails, ScoringStrategy};
use crate::{
execute_search, normalize_facet, AscDesc, DefaultSearchLogger, DocumentId, FieldId, Index,
Result, SearchContext, BEU16,
execute_search, AscDesc, DefaultSearchLogger, DocumentId, FieldId, Index, Result,
SearchContext, BEU16,
};
// Building these factories is not free.
@@ -301,29 +304,28 @@ impl<'a> SearchForFacetValues<'a> {
match self.query.as_ref() {
Some(query) => {
let query = normalize_facet(query);
let query = query.as_str();
let options = NormalizerOption { lossy: true, ..Default::default() };
let query = query.normalize(&options);
let query = query.as_ref();
let authorize_typos = self.search_query.index.authorize_typos(rtxn)?;
let field_authorizes_typos =
!self.search_query.index.exact_attributes_ids(rtxn)?.contains(&fid);
if authorize_typos && field_authorizes_typos {
let mut results = vec![];
let exact_words_fst = self.search_query.index.exact_words(rtxn)?;
if exact_words_fst.map_or(false, |fst| fst.contains(query)) {
let key = FacetGroupKey { field_id: fid, level: 0, left_bound: query };
if let Some(FacetGroupValue { bitmap, .. }) =
index.facet_id_string_docids.get(rtxn, &key)?
{
let count = search_candidates.intersection_len(&bitmap);
if count != 0 {
let value = self
.one_original_value_of(fid, query, bitmap.min().unwrap())?
.unwrap_or_else(|| query.to_string());
results.push(FacetValueHit { value, count });
}
let mut results = vec![];
if fst.contains(query) {
self.fetch_original_facets_using_normalized(
fid,
query,
query,
&search_candidates,
&mut results,
)?;
}
Ok(results)
} else {
let one_typo = self.search_query.index.min_word_len_one_typo(rtxn)?;
let two_typos = self.search_query.index.min_word_len_two_typos(rtxn)?;
@@ -338,60 +340,41 @@ impl<'a> SearchForFacetValues<'a> {
};
let mut stream = fst.search(automaton).into_stream();
let mut length = 0;
let mut results = vec![];
while let Some(facet_value) = stream.next() {
let value = std::str::from_utf8(facet_value)?;
-let key = FacetGroupKey { field_id: fid, level: 0, left_bound: value };
-let docids = match index.facet_id_string_docids.get(rtxn, &key)? {
-Some(FacetGroupValue { bitmap, .. }) => bitmap,
-None => {
-error!(
-"the facet value is missing from the facet database: {key:?}"
-);
-continue;
-}
-};
-let count = search_candidates.intersection_len(&docids);
-if count != 0 {
-let value = self
-.one_original_value_of(fid, value, docids.min().unwrap())?
-.unwrap_or_else(|| query.to_string());
-results.push(FacetValueHit { value, count });
-length += 1;
-}
-if length >= MAX_NUMBER_OF_FACETS {
+if self
+.fetch_original_facets_using_normalized(
+fid,
+value,
+query,
+&search_candidates,
+&mut results,
+)?
+.is_break()
+{
break;
}
}
}
Ok(results)
}
} else {
let automaton = Str::new(query).starts_with();
let mut stream = fst.search(automaton).into_stream();
let mut results = vec![];
-let mut length = 0;
while let Some(facet_value) = stream.next() {
let value = std::str::from_utf8(facet_value)?;
-let key = FacetGroupKey { field_id: fid, level: 0, left_bound: value };
-let docids = match index.facet_id_string_docids.get(rtxn, &key)? {
-Some(FacetGroupValue { bitmap, .. }) => bitmap,
-None => {
-error!(
-"the facet value is missing from the facet database: {key:?}"
-);
-continue;
-}
-};
-let count = search_candidates.intersection_len(&docids);
-if count != 0 {
-let value = self
-.one_original_value_of(fid, value, docids.min().unwrap())?
-.unwrap_or_else(|| query.to_string());
-results.push(FacetValueHit { value, count });
-length += 1;
-}
-if length >= MAX_NUMBER_OF_FACETS {
+if self
+.fetch_original_facets_using_normalized(
+fid,
+value,
+query,
+&search_candidates,
+&mut results,
+)?
+.is_break()
+{
break;
}
}
@@ -401,7 +384,6 @@ impl<'a> SearchForFacetValues<'a> {
}
None => {
let mut results = vec![];
-let mut length = 0;
let prefix = FacetGroupKey { field_id: fid, level: 0, left_bound: "" };
for result in index.facet_id_string_docids.prefix_iter(rtxn, &prefix)? {
let (FacetGroupKey { left_bound, .. }, FacetGroupValue { bitmap, .. }) =
@@ -412,9 +394,8 @@ impl<'a> SearchForFacetValues<'a> {
.one_original_value_of(fid, left_bound, bitmap.min().unwrap())?
.unwrap_or_else(|| left_bound.to_string());
results.push(FacetValueHit { value, count });
-length += 1;
}
-if length >= MAX_NUMBER_OF_FACETS {
+if results.len() >= MAX_NUMBER_OF_FACETS {
break;
}
}
@@ -422,6 +403,50 @@ impl<'a> SearchForFacetValues<'a> {
}
}
}
fn fetch_original_facets_using_normalized(
&self,
fid: FieldId,
value: &str,
query: &str,
search_candidates: &RoaringBitmap,
results: &mut Vec<FacetValueHit>,
) -> Result<ControlFlow<()>> {
let index = self.search_query.index;
let rtxn = self.search_query.rtxn;
let database = index.facet_id_normalized_string_strings;
let key = (fid, value);
let original_strings = match database.get(rtxn, &key)? {
Some(original_strings) => original_strings,
None => {
error!("the facet value is missing from the facet database: {key:?}");
return Ok(ControlFlow::Continue(()));
}
};
for original in original_strings {
let key = FacetGroupKey { field_id: fid, level: 0, left_bound: original.as_str() };
let docids = match index.facet_id_string_docids.get(rtxn, &key)? {
Some(FacetGroupValue { bitmap, .. }) => bitmap,
None => {
error!("the facet value is missing from the facet database: {key:?}");
return Ok(ControlFlow::Continue(()));
}
};
let count = search_candidates.intersection_len(&docids);
if count != 0 {
let value = self
.one_original_value_of(fid, &original, docids.min().unwrap())?
.unwrap_or_else(|| query.to_string());
results.push(FacetValueHit { value, count });
}
if results.len() >= MAX_NUMBER_OF_FACETS {
return Ok(ControlFlow::Break(()));
}
}
Ok(ControlFlow::Continue(()))
}
}
#[derive(Debug, Clone, serde::Serialize, PartialEq)]
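
Throughout this file, the query is now normalized with the same lossy charabia options applied to facet values at indexing time, so lookups hit the normalized-strings database and its FST. A hedged sketch of what that normalization does, assuming charabia's `Normalize` impl for `&str` as used above (exact output depends on the charabia version):

```rust
use charabia::normalizer::NormalizerOption;
use charabia::Normalize;

fn main() {
    let options = NormalizerOption { lossy: true, ..Default::default() };
    // With the lossy option this lowercases and strips diacritics,
    // e.g. "Café" -> "cafe", so the query can match normalized facet values.
    let normalized = "Café".normalize(&options);
    println!("{}", normalized.as_ref());
}
```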

View File

@@ -91,11 +91,12 @@ pub fn bucket_sort<'ctx, Q: RankingRuleQueryTrait>(
/// Update the universes accordingly and inform the logger.
macro_rules! back {
() => {
-assert!(
-ranking_rule_universes[cur_ranking_rule_index].is_empty(),
-"The ranking rule {} did not sort its bucket exhaustively",
-ranking_rules[cur_ranking_rule_index].id()
-);
+// FIXME: temporarily disabled assert: see <https://github.com/meilisearch/meilisearch/pull/4013>
+// assert!(
+// ranking_rule_universes[cur_ranking_rule_index].is_empty(),
+// "The ranking rule {} did not sort its bucket exhaustively",
+// ranking_rules[cur_ranking_rule_index].id()
+// );
logger.end_iteration_ranking_rule(
cur_ranking_rule_index,
ranking_rules[cur_ranking_rule_index].as_ref(),

View File

@@ -100,7 +100,7 @@ fn facet_number_values<'a>(
}
/// Return an iterator over each string value in the given field of the given document.
-fn facet_string_values<'a>(
+pub fn facet_string_values<'a>(
docid: u32,
field_id: u16,
index: &Index,

View File

@@ -6,6 +6,7 @@ use heed::{RoPrefix, RoTxn};
use roaring::RoaringBitmap;
use rstar::RTree;
use super::facet_string_values;
use super::ranking_rules::{RankingRule, RankingRuleOutput, RankingRuleQueryTrait};
use crate::heed_codec::facet::{FieldDocIdFacetCodec, OrderedF64Codec};
use crate::score_details::{self, ScoreDetails};
@@ -157,23 +158,7 @@ impl<Q: RankingRuleQueryTrait> GeoSort<Q> {
let mut documents = self
.geo_candidates
.iter()
-.map(|id| -> Result<_> {
-Ok((
-id,
-[
-facet_number_values(id, lat, ctx.index, ctx.txn)?
-.next()
-.expect("A geo faceted document doesn't contain any lat")?
-.0
-.2,
-facet_number_values(id, lng, ctx.index, ctx.txn)?
-.next()
-.expect("A geo faceted document doesn't contain any lng")?
-.0
-.2,
-],
-))
-})
+.map(|id| -> Result<_> { Ok((id, geo_value(id, lat, lng, ctx.index, ctx.txn)?)) })
.collect::<Result<Vec<(u32, [f64; 2])>>>()?;
// computing the distance between two points is expensive thus we cache the result
documents
@@ -185,6 +170,37 @@ impl<Q: RankingRuleQueryTrait> GeoSort<Q> {
}
}
/// Extracts the lat and long values from a single document.
///
/// If it is not able to find it in the facet number index, it will extract it
/// from the facet string index and parse it as an f64 (matching how the geo extraction behaves).
fn geo_value(
docid: u32,
field_lat: u16,
field_lng: u16,
index: &Index,
rtxn: &RoTxn,
) -> Result<[f64; 2]> {
let extract_geo = |geo_field: u16| -> Result<f64> {
match facet_number_values(docid, geo_field, index, rtxn)?.next() {
Some(Ok(((_, _, geo), ()))) => Ok(geo),
Some(Err(e)) => Err(e.into()),
None => match facet_string_values(docid, geo_field, index, rtxn)?.next() {
Some(Ok((_, geo))) => {
Ok(geo.parse::<f64>().expect("cannot parse geo field as f64"))
}
Some(Err(e)) => Err(e.into()),
None => panic!("A geo faceted document doesn't contain any lat or lng"),
},
}
};
let lat = extract_geo(field_lat)?;
let lng = extract_geo(field_lng)?;
Ok([lat, lng])
}
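
A reduced stand-in for the fallback order `geo_value` implements: read the coordinate from the facet number index first, and only then fall back to the facet string index parsed as `f64`. Both helpers below are hypothetical:

```rust
// Pretend the coordinate was indexed as a string rather than a number.
fn number_facet(_docid: u32) -> Option<f64> {
    None
}

fn string_facet(_docid: u32) -> Option<&'static str> {
    Some("12.5")
}

// Numbers first, then strings parsed as f64 -- the order geo_value uses.
fn coordinate(docid: u32) -> Option<f64> {
    number_facet(docid).or_else(|| string_facet(docid)?.parse().ok())
}

fn main() {
    assert_eq!(coordinate(0), Some(12.5));
}
```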
impl<'ctx, Q: RankingRuleQueryTrait> RankingRule<'ctx, Q> for GeoSort<Q> {
fn id(&self) -> String {
"geo_sort".to_owned()

View File

@@ -28,7 +28,7 @@ use db_cache::DatabaseCache;
use exact_attribute::ExactAttribute;
use graph_based_ranking_rule::{Exactness, Fid, Position, Proximity, Typo};
use heed::RoTxn;
-use hnsw::Searcher;
+use instant_distance::Search;
use interner::{DedupInterner, Interner};
pub use logger::visual::VisualSearchLogger;
pub use logger::{DefaultSearchLogger, SearchLogger};
@@ -40,18 +40,18 @@ use ranking_rules::{
use resolve_query_graph::{compute_query_graph_docids, PhraseDocIdsCache};
use roaring::RoaringBitmap;
use sort::Sort;
-use space::Neighbor;
+use self::distinct::facet_string_values;
use self::geo_sort::GeoSort;
pub use self::geo_sort::Strategy as GeoSortStrategy;
use self::graph_based_ranking_rule::Words;
use self::interner::Interned;
+use crate::distance::NDotProductPoint;
use crate::error::FieldIdMapMissingEntry;
use crate::score_details::{ScoreDetails, ScoringStrategy};
use crate::search::new::distinct::apply_distinct_rule;
use crate::{
-normalize_vector, AscDesc, DocumentId, Filter, Index, Member, Result, TermsMatchingStrategy,
-UserError, BEU32,
+AscDesc, DocumentId, Filter, Index, Member, Result, TermsMatchingStrategy, UserError, BEU32,
};
/// A structure used throughout the execution of a search query.
@@ -85,7 +85,12 @@ impl<'ctx> SearchContext<'ctx> {
let searchable_names = self.index.searchable_fields(self.txn)?;
let mut restricted_fids = Vec::new();
let mut contains_wildcard = false;
for field_name in searchable_attributes {
if field_name == "*" {
contains_wildcard = true;
continue;
}
let searchable_contains_name =
searchable_names.as_ref().map(|sn| sn.iter().any(|name| name == field_name));
let fid = match (fids_map.id(field_name), searchable_contains_name) {
@@ -99,8 +104,10 @@ impl<'ctx> SearchContext<'ctx> {
}
.into())
}
+// The field is not searchable, but the searchableAttributes are set to * => ignore field
+(None, None) => continue,
// The field is not searchable => User error
-_otherwise => {
+(_fid, Some(false)) => {
let mut valid_fields: BTreeSet<_> =
fids_map.names().map(String::from).collect();
@@ -132,7 +139,7 @@ impl<'ctx> SearchContext<'ctx> {
restricted_fids.push(fid);
}
-self.restricted_fids = Some(restricted_fids);
+self.restricted_fids = (!contains_wildcard).then_some(restricted_fids);
Ok(())
}
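
The last line carries the behavioral change: when `searchableAttributes` contains `*`, no field restriction should apply at all, and `bool::then_some` expresses that as `None` (unrestricted) versus `Some(fids)` (restrict to these fields). A tiny illustration:

```rust
fn main() {
    let restricted = vec![1u16, 2];
    for contains_wildcard in [false, true] {
        // Mirrors `self.restricted_fids = (!contains_wildcard).then_some(restricted_fids);`
        let restricted_fids = (!contains_wildcard).then_some(restricted.clone());
        // With a wildcard this prints None: the search is not field-restricted.
        println!("wildcard: {contains_wildcard} -> {restricted_fids:?}");
    }
}
```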
@@ -437,29 +444,31 @@ pub fn execute_search(
check_sort_criteria(ctx, sort_criteria.as_ref())?;
if let Some(vector) = vector {
-let mut searcher = Searcher::new();
-let hnsw = ctx.index.vector_hnsw(ctx.txn)?.unwrap_or_default();
-let ef = hnsw.len().min(100);
-let mut dest = vec![Neighbor { index: 0, distance: 0 }; ef];
-let vector = normalize_vector(vector.clone());
-let neighbors = hnsw.nearest(&vector, ef, &mut searcher, &mut dest[..]);
+let mut search = Search::default();
+let docids = match ctx.index.vector_hnsw(ctx.txn)? {
+Some(hnsw) => {
+let vector = NDotProductPoint::new(vector.clone());
+let neighbors = hnsw.search(&vector, &mut search);
-let mut docids = Vec::new();
-let mut uniq_docids = RoaringBitmap::new();
-for Neighbor { index, distance: _ } in neighbors.iter() {
-let index = BEU32::new(*index as u32);
-let docid = ctx.index.vector_id_docid.get(ctx.txn, &index)?.unwrap().get();
-if universe.contains(docid) && uniq_docids.insert(docid) {
-docids.push(docid);
-if docids.len() == (from + length) {
-break;
+let mut docids = Vec::new();
+let mut uniq_docids = RoaringBitmap::new();
+for instant_distance::Item { distance: _, pid, point: _ } in neighbors {
+let index = BEU32::new(pid.into_inner());
+let docid = ctx.index.vector_id_docid.get(ctx.txn, &index)?.unwrap().get();
+if universe.contains(docid) && uniq_docids.insert(docid) {
+docids.push(docid);
+if docids.len() == (from + length) {
+break;
}
}
}
}
}
-// return the nearest documents that are also part of the candidates
-// along with a dummy list of scores that are useless in this context.
-let docids: Vec<_> = docids.into_iter().skip(from).take(length).collect();
+// return the nearest documents that are also part of the candidates
+// along with a dummy list of scores that are useless in this context.
+docids.into_iter().skip(from).take(length).collect()
+}
+None => Vec::new(),
+};
return Ok(PartialSearchResult {
candidates: universe,
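
The neighbor loop keeps the first occurrence of each document in distance order, since one document may own several vectors. The `contains`/`insert` pair does all the work; a reduced sketch with plain `u32` ids:

```rust
use roaring::RoaringBitmap;

fn main() {
    // Universe of candidate documents, and neighbors in distance order;
    // docid 7 owns two vectors, so it shows up twice.
    let universe = RoaringBitmap::from_iter([3u32, 7, 9]);
    let neighbors = [7u32, 1, 7, 9];

    let mut docids = Vec::new();
    let mut uniq = RoaringBitmap::new();
    for docid in neighbors {
        // `insert` returns false when the docid is already present, so
        // duplicates and non-candidates are both skipped in one condition.
        if universe.contains(docid) && uniq.insert(docid) {
            docids.push(docid);
        }
    }
    assert_eq!(docids, vec![7, 9]);
}
```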

View File

@@ -34,6 +34,7 @@ impl<'t, 'u, 'i> ClearDocuments<'t, 'u, 'i> {
script_language_docids,
facet_id_f64_docids,
facet_id_string_docids,
facet_id_normalized_string_strings,
facet_id_string_fst,
facet_id_exists_docids,
facet_id_is_null_docids,
@@ -92,6 +93,7 @@ impl<'t, 'u, 'i> ClearDocuments<'t, 'u, 'i> {
word_prefix_fid_docids.clear(self.wtxn)?;
script_language_docids.clear(self.wtxn)?;
facet_id_f64_docids.clear(self.wtxn)?;
facet_id_normalized_string_strings.clear(self.wtxn)?;
facet_id_string_fst.clear(self.wtxn)?;
facet_id_exists_docids.clear(self.wtxn)?;
facet_id_is_null_docids.clear(self.wtxn)?;

View File

@@ -4,10 +4,9 @@ use std::collections::{BTreeSet, HashMap, HashSet};
use fst::IntoStreamer;
use heed::types::{ByteSlice, DecodeIgnore, Str, UnalignedSlice};
use heed::{BytesDecode, BytesEncode, Database, RwIter};
-use hnsw::Searcher;
+use instant_distance::PointId;
use roaring::RoaringBitmap;
use serde::{Deserialize, Serialize};
-use space::KnnPoints;
use time::OffsetDateTime;
use super::facet::delete::FacetsDelete;
@@ -237,6 +236,7 @@ impl<'t, 'u, 'i> DeleteDocuments<'t, 'u, 'i> {
word_prefix_fid_docids,
facet_id_f64_docids: _,
facet_id_string_docids: _,
facet_id_normalized_string_strings: _,
facet_id_string_fst: _,
field_id_docid_facet_f64s: _,
field_id_docid_facet_strings: _,
@@ -436,24 +436,24 @@ impl<'t, 'u, 'i> DeleteDocuments<'t, 'u, 'i> {
// An ugly and slow way to remove the vectors from the HNSW
// It basically reconstructs the HNSW from scratch without editing the current one.
-let current_hnsw = self.index.vector_hnsw(self.wtxn)?.unwrap_or_default();
-if !current_hnsw.is_empty() {
-let mut new_hnsw = Hnsw::default();
-let mut searcher = Searcher::new();
-let mut new_vector_id_docids = Vec::new();
+if let Some(current_hnsw) = self.index.vector_hnsw(self.wtxn)? {
+let mut points = Vec::new();
+let mut docids = Vec::new();
for result in vector_id_docid.iter(self.wtxn)? {
let (vector_id, docid) = result?;
if !self.to_delete_docids.contains(docid.get()) {
-let vector = current_hnsw.get_point(vector_id.get() as usize).clone();
-let vector_id = new_hnsw.insert(vector, &mut searcher);
-new_vector_id_docids.push((vector_id as u32, docid));
+let pid = PointId::from(vector_id.get());
+let vector = current_hnsw[pid].clone();
+points.push(vector);
+docids.push(docid);
}
}
+let (new_hnsw, pids) = Hnsw::builder().build_hnsw(points);
vector_id_docid.clear(self.wtxn)?;
-for (vector_id, docid) in new_vector_id_docids {
-vector_id_docid.put(self.wtxn, &BEU32::new(vector_id), &docid)?;
+for (pid, docid) in pids.into_iter().zip(docids) {
+vector_id_docid.put(self.wtxn, &BEU32::new(pid.into_inner()), &docid)?;
}
self.index.put_vector_hnsw(self.wtxn, &new_hnsw)?;
}
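
Because instant-distance builds an immutable index, deleting vectors means rebuilding: collect the surviving points, call `build_hnsw`, and re-map the returned `PointId`s to document ids. A standalone sketch of that round trip (the two-float `P` point type is made up for the example, not milli's dot-product point):

```rust
use instant_distance::{Builder, Point, Search};

#[derive(Clone, Debug)]
struct P([f32; 2]);

impl Point for P {
    fn distance(&self, other: &Self) -> f32 {
        // Squared Euclidean distance is enough for a sketch.
        (self.0[0] - other.0[0]).powi(2) + (self.0[1] - other.0[1]).powi(2)
    }
}

fn main() {
    let docids = [10u32, 20, 30];
    let points = vec![P([0.0, 0.0]), P([1.0, 0.0]), P([0.0, 1.0])];

    // `build_hnsw` returns the index plus one PointId per input point, in
    // input order, which is what lets us rebuild the pid -> docid mapping.
    let (hnsw, pids) = Builder::default().build_hnsw(points);
    let map: Vec<(u32, u32)> =
        pids.iter().zip(docids).map(|(pid, docid)| (pid.into_inner(), docid)).collect();

    let mut search = Search::default();
    let nearest = hnsw.search(&P([0.9, 0.1]), &mut search).next().unwrap();
    println!("nearest pid: {:?}, map: {map:?}", nearest.pid);
}
```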

View File

@@ -76,9 +76,14 @@ pub const FACET_MAX_GROUP_SIZE: u8 = 8;
pub const FACET_GROUP_SIZE: u8 = 4;
pub const FACET_MIN_LEVEL_SIZE: u8 = 5;
+use std::collections::BTreeSet;
use std::fs::File;
+use std::iter::FromIterator;
-use heed::types::DecodeIgnore;
+use charabia::normalizer::{Normalize, NormalizerOption};
+use grenad::{CompressionType, SortAlgorithm};
+use heed::types::{ByteSlice, DecodeIgnore, SerdeJson};
+use heed::BytesEncode;
use log::debug;
use time::OffsetDateTime;
@@ -87,7 +92,9 @@ use super::FacetsUpdateBulk;
use crate::facet::FacetType;
use crate::heed_codec::facet::{FacetGroupKey, FacetGroupKeyCodec, FacetGroupValueCodec};
use crate::heed_codec::ByteSliceRefCodec;
-use crate::{Index, Result, BEU16};
+use crate::update::index_documents::create_sorter;
+use crate::update::merge_btreeset_string;
+use crate::{BEU16StrCodec, Index, Result, BEU16, MAX_FACET_VALUE_LENGTH};
pub mod bulk;
pub mod delete;
@@ -159,26 +166,78 @@ impl<'i> FacetsUpdate<'i> {
incremental_update.execute(wtxn)?;
}
// We clear the list of normalized-for-search facets
// and the previous FSTs to compute everything from scratch
self.index.facet_id_normalized_string_strings.clear(wtxn)?;
self.index.facet_id_string_fst.clear(wtxn)?;
// As we can't use the same write transaction to read and write in two different databases
// we must create a temporary sorter that we will write into LMDB afterward.
// As multiple unnormalized facet values can become the same normalized facet value
// we must merge them together.
let mut sorter = create_sorter(
SortAlgorithm::Unstable,
merge_btreeset_string,
CompressionType::None,
None,
None,
None,
);
// We iterate on the list of original, semi-normalized, facet values
// and normalize them for search, inserting them in LMDB in any given order.
let options = NormalizerOption { lossy: true, ..Default::default() };
let database = self.index.facet_id_string_docids.remap_data_type::<DecodeIgnore>();
for result in database.iter(wtxn)? {
let (facet_group_key, ()) = result?;
if let FacetGroupKey { field_id, level: 0, left_bound } = facet_group_key {
let mut normalized_facet = left_bound.normalize(&options);
let normalized_truncated_facet: String;
if normalized_facet.len() > MAX_FACET_VALUE_LENGTH {
normalized_truncated_facet = normalized_facet
.char_indices()
.take_while(|(idx, _)| *idx < MAX_FACET_VALUE_LENGTH)
.map(|(_, c)| c)
.collect();
normalized_facet = normalized_truncated_facet.into();
}
let set = BTreeSet::from_iter(std::iter::once(left_bound));
let key = (field_id, normalized_facet.as_ref());
let key = BEU16StrCodec::bytes_encode(&key).ok_or(heed::Error::Encoding)?;
let val = SerdeJson::bytes_encode(&set).ok_or(heed::Error::Encoding)?;
sorter.insert(key, val)?;
}
}
// In this loop we don't need to take care of merging bitmaps
// as the grenad sorter already merged them for us.
let mut merger_iter = sorter.into_stream_merger_iter()?;
while let Some((key_bytes, btreeset_bytes)) = merger_iter.next()? {
self.index
.facet_id_normalized_string_strings
.remap_types::<ByteSlice, ByteSlice>()
.put(wtxn, key_bytes, btreeset_bytes)?;
}
// We compute one FST by string facet
let mut text_fsts = vec![];
let mut current_fst: Option<(u16, fst::SetBuilder<Vec<u8>>)> = None;
-let database = self.index.facet_id_string_docids.remap_data_type::<DecodeIgnore>();
+let database =
+self.index.facet_id_normalized_string_strings.remap_data_type::<DecodeIgnore>();
for result in database.iter(wtxn)? {
-let (facet_group_key, _) = result?;
-if let FacetGroupKey { field_id, level: 0, left_bound } = facet_group_key {
-current_fst = match current_fst.take() {
-Some((fid, fst_builder)) if fid != field_id => {
-let fst = fst_builder.into_set();
-text_fsts.push((fid, fst));
-Some((field_id, fst::SetBuilder::memory()))
-}
-Some((field_id, fst_builder)) => Some((field_id, fst_builder)),
-None => Some((field_id, fst::SetBuilder::memory())),
-};
-if let Some((_, fst_builder)) = current_fst.as_mut() {
-fst_builder.insert(left_bound)?;
+let ((field_id, normalized_facet), _) = result?;
+current_fst = match current_fst.take() {
+Some((fid, fst_builder)) if fid != field_id => {
+let fst = fst_builder.into_set();
+text_fsts.push((fid, fst));
+Some((field_id, fst::SetBuilder::memory()))
+}
+Some((field_id, fst_builder)) => Some((field_id, fst_builder)),
+None => Some((field_id, fst::SetBuilder::memory())),
+};
+if let Some((_, fst_builder)) = current_fst.as_mut() {
+fst_builder.insert(normalized_facet)?;
}
}
@@ -187,9 +246,6 @@ impl<'i> FacetsUpdate<'i> {
text_fsts.push((field_id, fst));
}
-// We remove all of the previous FSTs that were in this database
-self.index.facet_id_string_fst.clear(wtxn)?;
// We write those FSTs in LMDB now
for (field_id, fst) in text_fsts {
self.index.facet_id_string_fst.put(wtxn, &BEU16::new(field_id), &fst)?;
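
A detail worth keeping in mind when reading the FST loop above: `fst::SetBuilder` requires keys in lexicographic order, which is exactly what iterating the LMDB database provides. A standalone sketch of the build-then-query cycle the facet search relies on:

```rust
use fst::automaton::{Automaton, Str};
use fst::{IntoStreamer, SetBuilder};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Keys must arrive in lexicographic order, as an LMDB iteration guarantees.
    let mut builder = SetBuilder::memory();
    for facet in ["gaufre", "pain", "tarte"] {
        builder.insert(facet)?;
    }
    let set = builder.into_set();

    // A prefix query through an automaton, as the facet search issues.
    let hits: Vec<String> =
        set.search(Str::new("pa").starts_with()).into_stream().into_strs()?;
    assert_eq!(hits, vec!["pain"]);
    Ok(())
}
```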

View File

@@ -44,7 +44,7 @@ pub fn extract_facet_string_docids<R: io::Read + io::Seek>(
if normalised_value.len() > MAX_FACET_VALUE_LENGTH {
normalised_truncated_value = normalised_value
.char_indices()
-.take_while(|(idx, _)| idx + 4 < MAX_FACET_VALUE_LENGTH)
+.take_while(|(idx, _)| *idx < MAX_FACET_VALUE_LENGTH)
.map(|(_, c)| c)
.collect();
normalised_value = normalised_truncated_value.as_str();
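
The corrected predicate keeps every character whose byte offset starts before the limit, rather than the previous `idx + 4` guard. In isolation, with a made-up 8-byte limit:

```rust
// Keep every char that *starts* before MAX bytes; a multi-byte char that
// begins just under the limit is kept whole, so the result always ends on
// a char boundary and never slices through one.
const MAX: usize = 8;

fn truncate(s: &str) -> String {
    s.char_indices().take_while(|(idx, _)| *idx < MAX).map(|(_, c)| c).collect()
}

fn main() {
    // 'é' and 'à' are 2 bytes each: 'v' (byte offset 7) is the last char
    // starting before byte 8, so the result is "déjà-v".
    assert_eq!(truncate("déjà-vu!!"), "déjà-v");
}
```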

View File

@@ -28,11 +28,13 @@ pub struct ExtractedFacetValues {
///
/// Returns the generated grenad reader containing the docid, the fid, and the original value as key
/// and the normalized value as value extracted from the given chunk of documents.
/// We need the fid of the geofields to correctly parse them as numbers if they were sent as strings initially.
#[logging_timer::time]
pub fn extract_fid_docid_facet_values<R: io::Read + io::Seek>(
obkv_documents: grenad::Reader<R>,
indexer: GrenadParameters,
faceted_fields: &HashSet<FieldId>,
geo_fields_ids: Option<(FieldId, FieldId)>,
) -> Result<ExtractedFacetValues> {
let max_memory = indexer.max_memory_by_thread();
@@ -82,7 +84,10 @@ pub fn extract_fid_docid_facet_values<R: io::Read + io::Seek>(
let value = from_slice(field_bytes).map_err(InternalError::SerdeJson)?;
-match extract_facet_values(&value) {
+match extract_facet_values(
+&value,
+geo_fields_ids.map_or(false, |(lat, lng)| field_id == lat || field_id == lng),
+) {
FilterableValues::Null => {
facet_is_null_docids.entry(field_id).or_default().insert(document);
}
@@ -175,12 +180,13 @@ enum FilterableValues {
Values { numbers: Vec<f64>, strings: Vec<(String, String)> },
}
-fn extract_facet_values(value: &Value) -> FilterableValues {
+fn extract_facet_values(value: &Value, geo_field: bool) -> FilterableValues {
fn inner_extract_facet_values(
value: &Value,
can_recurse: bool,
output_numbers: &mut Vec<f64>,
output_strings: &mut Vec<(String, String)>,
+geo_field: bool,
) {
match value {
Value::Null => (),
@@ -191,13 +197,30 @@ fn extract_facet_values(value: &Value) -> FilterableValues {
}
}
Value::String(original) => {
// if we're working on a geofield it MUST be something we can parse or else there was an internal error
// in the enrich pipeline. But since the enrich pipeline worked, we want to avoid crashing at all costs.
if geo_field {
if let Ok(float) = original.parse() {
output_numbers.push(float);
} else {
log::warn!(
"Internal error, could not parse a geofield that has been validated. Please open an issue."
)
}
}
let normalized = crate::normalize_facet(original);
output_strings.push((normalized, original.clone()));
}
Value::Array(values) => {
if can_recurse {
for value in values {
-inner_extract_facet_values(value, false, output_numbers, output_strings);
+inner_extract_facet_values(
+value,
+false,
+output_numbers,
+output_strings,
+geo_field,
+);
}
}
}
@@ -213,7 +236,7 @@ fn extract_facet_values(value: &Value) -> FilterableValues {
otherwise => {
let mut numbers = Vec::new();
let mut strings = Vec::new();
-inner_extract_facet_values(otherwise, true, &mut numbers, &mut strings);
+inner_extract_facet_values(otherwise, true, &mut numbers, &mut strings, geo_field);
FilterableValues::Values { numbers, strings }
}
}

View File

@@ -33,7 +33,7 @@ pub fn extract_vector_points<R: io::Read + io::Seek>(
// lazily get it when needed
let document_id = || -> Value {
let document_id = obkv.get(primary_key_id).unwrap();
-serde_json::from_slice(document_id).unwrap()
+from_slice(document_id).unwrap()
};
// first we retrieve the _vectors field
@@ -50,12 +50,14 @@ pub fn extract_vector_points<R: io::Read + io::Seek>(
}
};
-for (i, vector) in vectors.into_iter().enumerate().take(u16::MAX as usize) {
-let index = u16::try_from(i).unwrap();
-let mut key = docid_bytes.to_vec();
-key.extend_from_slice(&index.to_be_bytes());
-let bytes = cast_slice(&vector);
-writer.insert(key, bytes)?;
+if let Some(vectors) = vectors {
+for (i, vector) in vectors.into_iter().enumerate().take(u16::MAX as usize) {
+let index = u16::try_from(i).unwrap();
+let mut key = docid_bytes.to_vec();
+key.extend_from_slice(&index.to_be_bytes());
+let bytes = cast_slice(&vector);
+writer.insert(key, bytes)?;
+}
}
}
// else => the `_vectors` object was `null`, there is nothing to do
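
The wrapping `if let` is the fix itself, but the key construction is what makes several vectors per document possible: each vector is stored under its own key of docid bytes plus a big-endian `u16` position. A reduced sketch:

```rust
// Builds the per-vector key used above: 4 bytes of document id followed
// by a big-endian u16 position of the vector inside that document.
fn vector_key(docid_bytes: [u8; 4], i: usize) -> Vec<u8> {
    let index = u16::try_from(i).unwrap();
    let mut key = docid_bytes.to_vec();
    key.extend_from_slice(&index.to_be_bytes());
    key
}

fn main() {
    let first = vector_key(42u32.to_be_bytes(), 0);
    let second = vector_key(42u32.to_be_bytes(), 1);
    assert_eq!(first.len(), 6);
    // Same document, two distinct keys -> two stored vectors.
    assert_ne!(first, second);
}
```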

View File

@@ -55,7 +55,13 @@ pub(crate) fn data_from_obkv_documents(
original_obkv_chunks
.par_bridge()
.map(|original_documents_chunk| {
-send_original_documents_data(original_documents_chunk, lmdb_writer_sx.clone())
+send_original_documents_data(
+original_documents_chunk,
+indexer,
+lmdb_writer_sx.clone(),
+vectors_field_id,
+primary_key_id,
+)
})
.collect::<Result<()>>()?;
@@ -72,7 +78,6 @@ pub(crate) fn data_from_obkv_documents(
&faceted_fields,
primary_key_id,
geo_fields_ids,
-vectors_field_id,
&stop_words,
max_positions_per_attributes,
)
@@ -257,11 +262,33 @@ fn spawn_extraction_task<FE, FS, M>(
/// - documents
fn send_original_documents_data(
original_documents_chunk: Result<grenad::Reader<File>>,
indexer: GrenadParameters,
lmdb_writer_sx: Sender<Result<TypedChunk>>,
vectors_field_id: Option<FieldId>,
primary_key_id: FieldId,
) -> Result<()> {
let original_documents_chunk =
original_documents_chunk.and_then(|c| unsafe { as_cloneable_grenad(&c) })?;
if let Some(vectors_field_id) = vectors_field_id {
let documents_chunk_cloned = original_documents_chunk.clone();
let lmdb_writer_sx_cloned = lmdb_writer_sx.clone();
rayon::spawn(move || {
let result = extract_vector_points(
documents_chunk_cloned,
indexer,
primary_key_id,
vectors_field_id,
);
let _ = match result {
Ok(vector_points) => {
lmdb_writer_sx_cloned.send(Ok(TypedChunk::VectorPoints(vector_points)))
}
Err(error) => lmdb_writer_sx_cloned.send(Err(error)),
};
});
}
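
Vector extraction now runs against the original, non-flattened documents chunk on a rayon thread that reports back over the writer channel. Reduced to its skeleton, and assuming a crossbeam channel like the `Sender` used here:

```rust
use crossbeam_channel::unbounded;

fn main() {
    let (sender, receiver) = unbounded::<Result<Vec<f32>, String>>();

    let chunk = vec![1.0f32, 2.0, 3.0]; // stand-in for the cloned documents chunk
    rayon::spawn(move || {
        // The extraction result, success or failure, goes through the channel;
        // `let _ =` mirrors the code above: a closed channel is not fatal here.
        let extracted: Result<Vec<f32>, String> = Ok(chunk);
        let _ = sender.send(extracted);
    });

    println!("{:?}", receiver.recv().unwrap());
}
```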
// TODO: create a custom internal error
lmdb_writer_sx.send(Ok(TypedChunk::Documents(original_documents_chunk))).unwrap();
Ok(())
@@ -283,7 +310,6 @@ fn send_and_extract_flattened_documents_data(
faceted_fields: &HashSet<FieldId>,
primary_key_id: FieldId,
geo_fields_ids: Option<(FieldId, FieldId)>,
-vectors_field_id: Option<FieldId>,
stop_words: &Option<fst::Set<&[u8]>>,
max_positions_per_attributes: Option<u32>,
) -> Result<(
@@ -312,25 +338,6 @@ fn send_and_extract_flattened_documents_data(
});
}
-if let Some(vectors_field_id) = vectors_field_id {
-let documents_chunk_cloned = flattened_documents_chunk.clone();
-let lmdb_writer_sx_cloned = lmdb_writer_sx.clone();
-rayon::spawn(move || {
-let result = extract_vector_points(
-documents_chunk_cloned,
-indexer,
-primary_key_id,
-vectors_field_id,
-);
-let _ = match result {
-Ok(vector_points) => {
-lmdb_writer_sx_cloned.send(Ok(TypedChunk::VectorPoints(vector_points)))
-}
-Err(error) => lmdb_writer_sx_cloned.send(Err(error)),
-};
-});
-}
let (docid_word_positions_chunk, docid_fid_facet_values_chunks): (Result<_>, Result<_>) =
rayon::join(
|| {
@@ -366,6 +373,7 @@ fn send_and_extract_flattened_documents_data(
flattened_documents_chunk.clone(),
indexer,
faceted_fields,
geo_fields_ids,
)?;
// send docid_fid_facet_numbers_chunk to DB writer

View File

@@ -1,4 +1,5 @@
use std::borrow::Cow;
use std::collections::BTreeSet;
use std::io;
use std::result::Result as StdResult;
@@ -44,6 +45,27 @@ pub fn merge_roaring_bitmaps<'a>(_key: &[u8], values: &[Cow<'a, [u8]>]) -> Resul
}
}
pub fn merge_btreeset_string<'a>(_key: &[u8], values: &[Cow<'a, [u8]>]) -> Result<Cow<'a, [u8]>> {
if values.len() == 1 {
Ok(values[0].clone())
} else {
// TODO improve the perf by using a `#[borrow] Cow<str>`.
let strings: BTreeSet<String> = values
.iter()
.map(AsRef::as_ref)
.map(serde_json::from_slice::<BTreeSet<String>>)
.map(StdResult::unwrap)
.reduce(|mut current, new| {
for x in new {
current.insert(x);
}
current
})
.unwrap();
Ok(Cow::Owned(serde_json::to_vec(&strings).unwrap()))
}
}
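
A hedged sketch of what `merge_btreeset_string` computes: the union of all the JSON-serialized `BTreeSet<String>` values accumulated under one key, mirroring the `reduce` step above:

```rust
use std::collections::BTreeSet;

fn main() {
    let a = serde_json::to_vec(&BTreeSet::from(["Tarte".to_string()])).unwrap();
    let b = serde_json::to_vec(&BTreeSet::from(["TARTE".to_string(), "Tarte".to_string()]))
        .unwrap();

    let merged = [a, b]
        .iter()
        .map(|bytes| serde_json::from_slice::<BTreeSet<String>>(bytes).unwrap())
        .reduce(|mut current, new| {
            // Insert every string of the new set into the accumulated one.
            for x in new {
                current.insert(x);
            }
            current
        })
        .unwrap();

    // Both original spellings survive; duplicates collapse.
    assert_eq!(merged.len(), 2);
}
```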
pub fn keep_first<'a>(_key: &[u8], values: &[Cow<'a, [u8]>]) -> Result<Cow<'a, [u8]>> {
Ok(values[0].clone())
}

View File

@@ -13,9 +13,9 @@ pub use grenad_helpers::{
GrenadParameters, MergeableReader,
};
pub use merge_functions::{
-concat_u32s_array, keep_first, keep_latest_obkv, merge_cbo_roaring_bitmaps,
-merge_obkvs_and_operations, merge_roaring_bitmaps, merge_two_obkvs, serialize_roaring_bitmap,
-MergeFn,
+concat_u32s_array, keep_first, keep_latest_obkv, merge_btreeset_string,
+merge_cbo_roaring_bitmaps, merge_obkvs_and_operations, merge_roaring_bitmaps, merge_two_obkvs,
+serialize_roaring_bitmap, MergeFn,
};
use crate::MAX_WORD_LENGTH;

View File

@@ -26,7 +26,7 @@ pub use self::enrich::{
};
pub use self::helpers::{
as_cloneable_grenad, create_sorter, create_writer, fst_stream_into_hashset,
-fst_stream_into_vec, merge_cbo_roaring_bitmaps, merge_roaring_bitmaps,
+fst_stream_into_vec, merge_btreeset_string, merge_cbo_roaring_bitmaps, merge_roaring_bitmaps,
sorter_into_lmdb_database, valid_lmdb_key, writer_into_reader, ClonableMmap, MergeFn,
};
use self::helpers::{grenad_obkv_into_chunks, GrenadParameters};
@@ -2519,6 +2519,25 @@ mod tests {
db_snap!(index, word_position_docids, 3, @"74f556b91d161d997a89468b4da1cb8f");
}
/// Index documents that contain different numbers of vectors.
/// Vectors must be of the same length.
#[test]
fn test_multiple_vectors() {
let index = TempIndex::new();
index.add_documents(documents!([{"id": 0, "_vectors": [[0, 1, 2], [3, 4, 5]] }])).unwrap();
index.add_documents(documents!([{"id": 1, "_vectors": [6, 7, 8] }])).unwrap();
index
.add_documents(
documents!([{"id": 2, "_vectors": [[9, 10, 11], [12, 13, 14], [15, 16, 17]] }]),
)
.unwrap();
let rtxn = index.read_txn().unwrap();
let res = index.search(&rtxn).vector([0.0, 1.0, 2.0]).execute().unwrap();
assert_eq!(res.documents_ids.len(), 3);
}
#[test]
fn reproduce_the_bug() {
/*

View File

@@ -9,22 +9,19 @@ use charabia::{Language, Script};
use grenad::MergerBuilder;
use heed::types::ByteSlice;
use heed::RwTxn;
-use hnsw::Searcher;
use roaring::RoaringBitmap;
-use space::KnnPoints;
use super::helpers::{
self, merge_ignore_values, serialize_roaring_bitmap, valid_lmdb_key, CursorClonableMmap,
};
use super::{ClonableMmap, MergeFn};
+use crate::distance::NDotProductPoint;
use crate::error::UserError;
use crate::facet::FacetType;
+use crate::index::Hnsw;
use crate::update::facet::FacetsUpdate;
use crate::update::index_documents::helpers::{as_cloneable_grenad, try_split_array_at};
-use crate::{
-lat_lng_to_xyz, normalize_vector, CboRoaringBitmapCodec, DocumentId, GeoPoint, Index, Result,
-BEU32,
-};
+use crate::{lat_lng_to_xyz, CboRoaringBitmapCodec, DocumentId, GeoPoint, Index, Result, BEU32};
pub(crate) enum TypedChunk {
FieldIdDocidFacetStrings(grenad::Reader<CursorClonableMmap>),
@@ -230,17 +227,20 @@ pub(crate) fn write_typed_chunk_into_index(
index.put_geo_faceted_documents_ids(wtxn, &geo_faceted_docids)?;
}
TypedChunk::VectorPoints(vector_points) => {
-let mut hnsw = index.vector_hnsw(wtxn)?.unwrap_or_default();
-let mut searcher = Searcher::new();
-let mut expected_dimensions = match index.vector_id_docid.iter(wtxn)?.next() {
-Some(result) => {
-let (vector_id, _) = result?;
-Some(hnsw.get_point(vector_id.get() as usize).len())
-}
-None => None,
+let (pids, mut points): (Vec<_>, Vec<_>) = match index.vector_hnsw(wtxn)? {
+Some(hnsw) => hnsw.iter().map(|(pid, point)| (pid, point.clone())).unzip(),
+None => Default::default(),
};
+// Convert the PointIds into DocumentIds
+let mut docids = Vec::new();
+for pid in pids {
+let docid =
+index.vector_id_docid.get(wtxn, &BEU32::new(pid.into_inner()))?.unwrap();
+docids.push(docid.get());
+}
+let mut expected_dimensions = points.get(0).map(|p| p.len());
let mut cursor = vector_points.into_cursor()?;
while let Some((key, value)) = cursor.move_on_next()? {
// convert the key back to a u32 (4 bytes)
@@ -256,12 +256,26 @@ pub(crate) fn write_typed_chunk_into_index(
return Err(UserError::InvalidVectorDimensions { expected, found })?;
}
-let vector = normalize_vector(vector);
-let vector_id = hnsw.insert(vector, &mut searcher) as u32;
-index.vector_id_docid.put(wtxn, &BEU32::new(vector_id), &BEU32::new(docid))?;
+points.push(NDotProductPoint::new(vector));
+docids.push(docid);
}
log::debug!("There are {} entries in the HNSW so far", hnsw.len());
index.put_vector_hnsw(wtxn, &hnsw)?;
assert_eq!(docids.len(), points.len());
let hnsw_length = points.len();
let (new_hnsw, pids) = Hnsw::builder().build_hnsw(points);
index.vector_id_docid.clear(wtxn)?;
for (docid, pid) in docids.into_iter().zip(pids) {
index.vector_id_docid.put(
wtxn,
&BEU32::new(pid.into_inner()),
&BEU32::new(docid),
)?;
}
log::debug!("There are {} entries in the HNSW so far", hnsw_length);
index.put_vector_hnsw(wtxn, &new_hnsw)?;
}
TypedChunk::ScriptLanguageDocids(hash_pair) => {
let mut buffer = Vec::new();

View File

@@ -4,8 +4,9 @@ pub use self::delete_documents::{DeleteDocuments, DeletionStrategy, DocumentDele
pub use self::facet::bulk::FacetsUpdateBulk;
pub use self::facet::incremental::FacetsUpdateIncrementalInner;
pub use self::index_documents::{
-merge_cbo_roaring_bitmaps, merge_roaring_bitmaps, DocumentAdditionResult, DocumentId,
-IndexDocuments, IndexDocumentsConfig, IndexDocumentsMethod, MergeFn,
+merge_btreeset_string, merge_cbo_roaring_bitmaps, merge_roaring_bitmaps,
+DocumentAdditionResult, DocumentId, IndexDocuments, IndexDocumentsConfig, IndexDocumentsMethod,
+MergeFn,
};
pub use self::indexer_config::IndexerConfig;
pub use self::prefix_word_pairs::{

View File

@@ -425,13 +425,14 @@ impl<'a, 't, 'u, 'i> Settings<'a, 't, 'u, 'i> {
let current = self.index.stop_words(self.wtxn)?;
// Apply an unlossy normalization on stop_words
-let stop_words = stop_words
+let stop_words: BTreeSet<String> = stop_words
.iter()
-.map(|w| w.as_str().normalize(&Default::default()).into_owned());
+.map(|w| w.as_str().normalize(&Default::default()).into_owned())
+.collect();
// since we can't compare a BTreeSet with an FST we are going to convert the
// BTreeSet to an FST and then compare the two FSTs byte by byte.
-let fst = fst::Set::from_iter(stop_words)?;
+let fst = fst::Set::from_iter(stop_words.into_iter())?;
// Does the new FST differ from the previous one?
if current