Compare commits

...

49 Commits

Author SHA1 Message Date
3fa40a3f9c Change the error codes of the faceting settings 2023-06-22 18:15:15 +02:00
5675847a5f Move the sortFacetValuesBy in the faceting settings 2023-06-22 17:40:10 +02:00
1c77117d02 Make Clippy happy 2023-05-29 16:02:54 +02:00
26dc415d9e Replace the BTreeMap by an IndexMap to return values in order 2023-05-29 15:47:45 +02:00
89a4e7cee4 Expose a sortFacetValuesBy parameter to the user 2023-05-29 15:32:09 +02:00
f2040e50b2 Clean and make the facet order configurable internally 2023-05-29 15:09:41 +02:00
2b62e85622 Make the search to always return the facets ordered by count 2023-05-29 11:52:57 +02:00
c13e3d5c8a First to-test version of the algorithm 2023-05-25 12:28:26 +02:00
73a8018eb1 Rename facet distribution to be explicit on the order to find them 2023-05-25 10:59:04 +02:00
087866d59f Merge #3775
3775: Last error code changes on the new get/delete documents routes r=dureuill a=irevoire

# Pull Request

## Related issue
Fixes #3774

## What does this PR do?
Following the specification: https://github.com/meilisearch/specifications/pull/236

1. Get rid of the `invalid_document_delete_filter` and always use the `invalid_document_filter`
2. Introduce a new `missing_document_filter` instead of returning `invalid_document_delete_filter` (that’s consistent with all the other routes that have a mandatory parameter)
3. Always return the `original_filter` in the details (potentially set to `null`) instead of hiding it if it wasn’t used


Co-authored-by: Tamo <tamo@meilisearch.com>
2023-05-24 10:07:41 +00:00
9111f5176f get rid of the invalid document delete filter in favor of the invalid document filter 2023-05-24 11:53:16 +02:00
b9dd092a62 make the details return null in the originalFilter field if no filter was provided + add a big test on the details 2023-05-24 11:48:22 +02:00
ca99bc3188 implement the missing document filter error code when deleting documents 2023-05-24 11:29:20 +02:00
2e49d6aec1 Merge #3768
3768: Fix bugs in graph-based ranking rules + make `words` a graph-based ranking rule r=dureuill a=loiclec

This PR contains three changes:

## 1. Don't call the `words` ranking rule if the term matching strategy is `All`

This is because the purpose of `words` is only to remove nodes from the query graph. It would never do any useful work when the matching strategy was `All`. Remember that the universe was already computed before by computing all the docids corresponding to the "maximally reduced" query graph, which, in the case of `All`, is equal to the original graph.

## 2. The `words` ranking rule is replaced by a graph-based ranking rule. 

This is for three reasons:

1. **performance**: graph-based ranking rules benefit from a lot of optimisations by default, which ensures that they are never too slow. The previous implementation of `words` could call `compute_query_graph_docids` many times if some words had to be removed from the query, which would be quite expensive. I was especially worried about its performance in cases where it is placed right after the `sort` ranking rule. Furthermore, `compute_query_graph_docids` would clone a lot of bitmaps many times unnecessarily.

2. **consistency**: every other ranking rule (except `sort`) is graph-based. It makes sense to implement `words` like that as well. It will automatically benefit from all the features, optimisations, and bug fixes that all the other ranking rules get.

3. **surfacing bugs**: as the first ranking rule to be called (most of the time), I'd like `words` to behave the same as the other ranking rules so that we can quickly detect bugs in our graph algorithms. This actually already happened, which is why this PR also contains a bug fix.

## 3. Fix the `update_all_costs_before_nodes` function

It is a bit difficult to explain what was wrong, but I'll try. The bug happened when we had graphs like:
<img width="730" alt="Screenshot 2023-05-16 at 10 58 57" src="https://github.com/meilisearch/meilisearch/assets/6040237/40db1a68-d852-4e89-99d5-0d65757242a7">
and we gave the node `is` as argument.

Then, we'd walk backwards from the node breadth-first. We'd update the costs of:
1. `sun`
2. `thesun`
3. `start`
4. `the`

which is an incorrect order. The correct order is:

1. `sun`
2. `thesun`
3. `the`
4. `start`

That is, we can only update the cost of a node when all of its successors have either already been visited or were not affected by the update to the node passed as argument. To solve this bug, I factored out the graph-traversal logic into a `traverse_breadth_first_backward` function.


Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2023-05-23 13:28:08 +00:00
51043f78f0 Remove trailing whitespace 2023-05-23 15:27:25 +02:00
a490a11325 Add explanatory comment on the way we're recomputing costs 2023-05-23 15:24:24 +02:00
101f5a20d2 Merge #3757
3757: Adjust the cost of edges in the `position` ranking rule by bucketing positions more aggressively r=loiclec a=loiclec

This PR significantly improves the performance of the `position` ranking rule when:
1. a query contains many words
2. the `position` ranking rule needs to be called many times
3. the score of the documents according to `position` is high

These conditions greatly increase:
1. the number of edge traversals that are needed to find a valid path from the `start` node to the `end` node
2. the number of edges that need to be deleted from the graph, and therefore the number of times that we need to recompute all the possible costs from START to END

As a result, a majority of the search time is spent in `visit_condition`, `visit_node`, and `update_all_costs_before_node`. This is frustrating because it often happens when the "universe" given to the rule consists of only a handful of document ids.

By limiting the number of possible edges between two nodes from `20` to `10`, we:
1. reduce the number of possible costs from START to END
2. reduce the number of edges that will be deleted 
3. make it faster to update the costs after deleting an edge
4. reduce the number of buckets that need to be computed

In terms of relevancy, I don't think we lose or gain much. We still prefer terms that are in a lower positions, with decreasing precision as we go further. The previous choice of bucketing wasn't chosen in a principled way, and neither is this one. They both "feel" right to me.


Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>
Co-authored-by: meili-bors[bot] <89034592+meili-bors[bot]@users.noreply.github.com>
2023-05-17 11:43:59 +00:00
6ce1ce77e6 Merge #3738
3738: Add analytics on the get documents resource r=dureuill a=irevoire

# Pull Request

## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/3737
Related spec https://github.com/meilisearch/specifications/pull/234

## What does this PR do?
Add the analytics for the following routes:
- `GET` - `/indexes/:uid/documents`
- `GET` - `/indexes/:uid/documents/:doc_id`
- `POST` - `/indexes/:uid/documents/fetch`

These analytics are aggregated between two events:
- `Documents Fetched GET`
- `Documents Fetched POST`

That shares the same payload:
 Property name | Description | Example |
|---------------|-------------|---------|
| `requests.total_received` | Total number of request received in this batch | 325 |
| `per_document_id` | `false` | false |
| `per_filter` | `true` if `POST /indexes/:indexUid/documents/fetch` endpoint was used with a filter in this batch, otherwise `false` | false |
| `pagination.max_limit` | Highest value given for the `limit` parameter in this batch | 60 |
| `pagination.max_offset` | Highest value given for the `offset` parameter in this batch | 1000 |

Co-authored-by: Tamo <tamo@meilisearch.com>
2023-05-16 19:37:41 +00:00
ec8f685d84 Fix bug in cheapest path algorithm 2023-05-16 17:01:30 +02:00
5758268866 Don't compute split_words for phrases 2023-05-16 17:01:18 +02:00
4d037e6693 Merge #3759
3759: Invalid error code when parsing filters r=dureuill a=irevoire

# Pull Request

## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/3753

## What does this PR do?
Fix the error code in case the error comes from the evaluate of the filter for the get, fetch and delete documents routes.


Co-authored-by: Tamo <tamo@meilisearch.com>
2023-05-16 12:55:06 +00:00
96da5130a4 fix the error code in case of not filterable attributes on the get / delete documents by filter routes 2023-05-16 13:56:18 +02:00
3e19702de6 Update snapshot tests 2023-05-16 12:22:46 +02:00
1e762d151f Merge #3755
3755: Re-add final dot r=curquiza a=ManyTheFish

I removed the final dot of the error message in my last PR, this one re-adds it.

related to https://github.com/meilisearch/meilisearch/pull/3749

> Oups 😬 

Co-authored-by: ManyTheFish <many@meilisearch.com>
2023-05-16 10:10:58 +00:00
0b38f211ac test the new introduced route 2023-05-16 12:07:44 +02:00
f6524a6858 Adjust costs of edges in position ranking rule
To ensure good performance
2023-05-16 11:28:56 +02:00
65ad8cce36 Merge #3741
3741: Add ngram support to the highlighter r=ManyTheFish a=loiclec

This PR fixes a bug introduced by the search refactor, where ngrams were not highlighted. 

The solution was to add the ngrams to the vector of `LocatedQueryTerm` that is given to the `MatchingWords` structure.

Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>
2023-05-16 09:03:31 +00:00
42650f82e8 Re-add final dot 2023-05-16 10:57:26 +02:00
a37da36766 Implement words as a graph-based ranking rule and fix some bugs 2023-05-16 10:42:11 +02:00
85d96d35a8 Highlight ngram matches as well 2023-05-16 10:39:36 +02:00
bf66e97b48 Merge #3749
3749: Fix back: sort error message r=ManyTheFish a=ManyTheFish

This PR reintroduces the error message modified in https://github.com/meilisearch/milli/pull/375.
However, this added double-quotes around `sort` in the message. I don't think another message contains double-quotes, so I have added a separate commit replacing the double-quotes with back-ticks, which seems more consistent with the other error messages, this last change can be reverted easily.

## Detailed changes
#### v1.2-rc0
```
The sort ranking rule must be specified in the ranking rules settings to use the sort parameter at search time.
```
#### [Reintroduce fix (previous and expected behavior)](23d1c86825)
```
You must specify where "sort" is listed in the rankingRules setting to use the sort parameter at search time
```
#### [Replace double-quotes with back-ticks (my suggestion)](4d691d071a)
```
You must specify where `sort` is listed in the rankingRules setting to use the sort parameter at search time
```

## Related

Fixes #3722

## Reviewers

- technical review: `@irevoire`
- to validate the replacement: `@macraig`

Co-authored-by: ManyTheFish <many@meilisearch.com>
2023-05-15 14:55:51 +00:00
a7ea5ec748 Merge #3651
3651: Use the writemap flag to reduce the memory usage r=irevoire a=Kerollmops

This draft PR is showing some stats about the memory usage of Meilisearch when [the LMDB `MDB_WRITEMAP` flag](3947014aed/libraries/liblmdb/lmdb.h (L573-L581)) is enabled and when it is not. As you can see there is a reduction of about 50% of the memory usage pick. The dataset used was [the Wikipedia one](https://www.notion.so/meilisearch/Wikipedia-8b1486e4b17547c5bda485d2d97767a0) with the first 30 000 first CSV documents without settings. This PR depends on https://github.com/meilisearch/heed/pull/168.

I just [opened a discussion](https://github.com/meilisearch/product/discussions/652) for people to understand the tradeoffs and give their feedback.

- [x] Create an experiment flag `--experimental-reduce-indexing-memory-usage`.
- [x] Add it to the config file.
- [x] Explain the tradeoff and copy/link the LMDB documentation in the help message.
- [x] Add analytics about the experimental flag.
- [x] Document that this flag cannot be used on Windows, ~~or hide it~~.

<details>
  <summary>The command I used to run the tests</summary>

#### Sign the binary to be able to use Instruments / xcrun
```sh
codesign -s - -f --entitlements ~/ent.plist target/release/meilisearch
```

where `ent.plist` contains:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
    <dict>
        <key>com.apple.security.get-task-allow</key>
        <true/>
    </dict>
</plist>
```

#### Run Meilisearch in measure-mode
```sh
xcrun xctrace record --template 'Allocations' --launch -- target/release/meilisearch --max-indexing-memory 0MiB
```

#### Send the wiki dataset available on notion.so / Public
```sh
for f in 0.csv 15000.csv; do echo sending $f; xh 'localhost:7700/indexes/wiki/documents' 'content-type:text/csv' `@$f;` done
```

#### Wait for the task to finish
```sh
watch --color xh --pretty all 'localhost:7700/tasks?statuses=processing'
```
</details>

Keep in mind that I tested that with the Instruments Apple tools on an iMac 5k 2019. More benchmarks must be done, especially on the indexation speed, as the flag is told to slow down writing into databases bigger that the amount of memory.

On the left Meilisearch is running without the flag. On the right, it is running with the flag.

<p align="center">
<img align="left" width="45%" alt="Instrument showing the memory usage of Meilisearch without the MDB_WRITEMAP flag" src="https://user-images.githubusercontent.com/3610253/234299524-7607f1df-6fc1-45d3-bd3d-4f9388002857.png">
<img align="right" width="45%" alt="Instrument showing the memory usage of Meilisearch with the MDB_WRITEMAP flag" src="https://user-images.githubusercontent.com/3610253/234299534-6cc3ae58-8bd9-426c-aa79-4c78f9e88b94.png">
</p>

Co-authored-by: Kerollmops <clement@meilisearch.com>
Co-authored-by: Clément Renault <clement@meilisearch.com>
2023-05-15 14:10:07 +00:00
dc7ba77e57 Add the option in the config file 2023-05-15 16:07:43 +02:00
13f870e993 Fix typos and documentation issues 2023-05-15 15:11:45 +02:00
1a79fd0c3c Use the new heed v0.12.6 2023-05-15 11:42:30 +02:00
f759ec7fad Expose a flag to enable the MDB_WRITEMAP flag 2023-05-15 11:38:43 +02:00
4d691d071a Change double-quotes by back-ticks in sort error message 2023-05-15 11:10:36 +02:00
23d1c86825 Re-introduce the sort error message fix 2023-05-15 11:07:23 +02:00
c4a40e7110 Use the writemap flag to reduce the memory usage 2023-05-15 10:15:33 +02:00
e01980c6f4 Merge #3739
3739: fix: update `payload_too_large` error message to include human readable maximum acceptable payload size r=Kerollmops a=cymruu


# Pull Request

## Related issue
Fixes #3736 

## What does this PR do?
- update `payload_too_large` error message as requested in ticket

## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!


Co-authored-by: Filip Bachul <filipbachul@gmail.com>
2023-05-11 09:37:19 +00:00
25209a3590 introduce remaining field in Payload 2023-05-10 20:55:18 +02:00
3064ea6495 fix: update payload_too_large error message to include human readable maximum acceptable payload size 2023-05-10 18:16:59 +02:00
46ec8a97e9 rename the analytics according to the spec 2023-05-10 14:28:30 +02:00
c42a65a297 Update meilisearch/src/analytics/segment_analytics.rs
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2023-05-10 14:28:30 +02:00
d08f8690d2 add analytics on the get documents resource 2023-05-10 14:28:30 +02:00
ad5f25d880 Merge #3742
3742: Compute split words derivations of terms that don't accept typos r=ManyTheFish a=loiclec

Allows looking for the split-word derivation for short words in the user's query (like `the -> "t he"` or `door -> do or`) as well as for 3grams.

Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>
2023-05-10 12:12:52 +00:00
4d352a21ac Compute split words derivations of terms that don't accept typos 2023-05-10 13:31:19 +02:00
4a4210c116 Merge #3734
3734: Update version for the next release (v1.2.0) in Cargo.toml r=curquiza a=meili-bot

⚠️ This PR is automatically generated. Check the new version is the expected one and Cargo.lock has been updated before merging.

Co-authored-by: curquiza <curquiza@users.noreply.github.com>
2023-05-09 07:35:48 +00:00
3533d4f2bb Update version for the next release (v1.2.0) in Cargo.toml 2023-05-08 17:52:33 +00:00
59 changed files with 1396 additions and 616 deletions

33
Cargo.lock generated
View File

@ -463,7 +463,7 @@ checksum = "b645a089122eccb6111b4f81cbc1a49f5900ac4666bb93ac027feaecf15607bf"
[[package]]
name = "benchmarks"
version = "1.1.1"
version = "1.2.0"
dependencies = [
"anyhow",
"bytes",
@ -1209,7 +1209,7 @@ dependencies = [
[[package]]
name = "dump"
version = "1.1.1"
version = "1.2.0"
dependencies = [
"anyhow",
"big_s",
@ -1428,7 +1428,7 @@ dependencies = [
[[package]]
name = "file-store"
version = "1.1.1"
version = "1.2.0"
dependencies = [
"faux",
"tempfile",
@ -1450,7 +1450,7 @@ dependencies = [
[[package]]
name = "filter-parser"
version = "1.1.1"
version = "1.2.0"
dependencies = [
"insta",
"nom",
@ -1476,7 +1476,7 @@ dependencies = [
[[package]]
name = "flatten-serde-json"
version = "1.1.1"
version = "1.2.0"
dependencies = [
"criterion",
"serde_json",
@ -1794,7 +1794,7 @@ checksum = "95505c38b4572b2d910cecb0281560f54b440a19336cbbcb27bf6ce6adc6f5a8"
[[package]]
name = "heed"
version = "0.12.5"
source = "git+https://github.com/meilisearch/heed?tag=v0.12.5#4158a6c484752afaaf9e2530a6ee0e7ab0f24ee8"
source = "git+https://github.com/meilisearch/heed?tag=v0.12.6#8c5b94225fc949c02bb7b900cc50ffaf6b584b1e"
dependencies = [
"byteorder",
"heed-traits",
@ -1811,12 +1811,12 @@ dependencies = [
[[package]]
name = "heed-traits"
version = "0.7.0"
source = "git+https://github.com/meilisearch/heed?tag=v0.12.5#4158a6c484752afaaf9e2530a6ee0e7ab0f24ee8"
source = "git+https://github.com/meilisearch/heed?tag=v0.12.6#8c5b94225fc949c02bb7b900cc50ffaf6b584b1e"
[[package]]
name = "heed-types"
version = "0.7.2"
source = "git+https://github.com/meilisearch/heed?tag=v0.12.5#4158a6c484752afaaf9e2530a6ee0e7ab0f24ee8"
source = "git+https://github.com/meilisearch/heed?tag=v0.12.6#8c5b94225fc949c02bb7b900cc50ffaf6b584b1e"
dependencies = [
"bincode",
"heed-traits",
@ -1959,7 +1959,7 @@ dependencies = [
[[package]]
name = "index-scheduler"
version = "1.1.1"
version = "1.2.0"
dependencies = [
"anyhow",
"big_s",
@ -2113,7 +2113,7 @@ dependencies = [
[[package]]
name = "json-depth-checker"
version = "1.1.1"
version = "1.2.0"
dependencies = [
"criterion",
"serde_json",
@ -2539,7 +2539,7 @@ checksum = "490cc448043f947bae3cbee9c203358d62dbee0db12107a74be5c30ccfd09771"
[[package]]
name = "meili-snap"
version = "1.1.1"
version = "1.2.0"
dependencies = [
"insta",
"md5",
@ -2548,7 +2548,7 @@ dependencies = [
[[package]]
name = "meilisearch"
version = "1.1.1"
version = "1.2.0"
dependencies = [
"actix-cors",
"actix-http",
@ -2636,7 +2636,7 @@ dependencies = [
[[package]]
name = "meilisearch-auth"
version = "1.1.1"
version = "1.2.0"
dependencies = [
"base64 0.21.0",
"enum-iterator",
@ -2655,7 +2655,7 @@ dependencies = [
[[package]]
name = "meilisearch-types"
version = "1.1.1"
version = "1.2.0"
dependencies = [
"actix-web",
"anyhow",
@ -2709,7 +2709,7 @@ dependencies = [
[[package]]
name = "milli"
version = "1.1.1"
version = "1.2.0"
dependencies = [
"big_s",
"bimap",
@ -2730,6 +2730,7 @@ dependencies = [
"geoutils",
"grenad",
"heed",
"indexmap",
"insta",
"itertools",
"json-depth-checker",
@ -3064,7 +3065,7 @@ checksum = "478c572c3d73181ff3c2539045f6eb99e5491218eae919370993b890cdbdd98e"
[[package]]
name = "permissive-json-pointer"
version = "1.1.1"
version = "1.2.0"
dependencies = [
"big_s",
"serde_json",

View File

@ -17,7 +17,7 @@ members = [
]
[workspace.package]
version = "1.1.1"
version = "1.2.0"
authors = ["Quentin de Quelen <quentin@dequelen.me>", "Clément Renault <clement@meilisearch.com>"]
description = "Meilisearch HTTP server"
homepage = "https://meilisearch.com"

View File

@ -126,3 +126,6 @@ ssl_tickets = false
experimental_enable_metrics = false
# Experimental metrics feature. For more information, see: <https://github.com/meilisearch/meilisearch/discussions/3518>
# Enables the Prometheus metrics on the `GET /metrics` endpoint.
experimental_reduce_indexing_memory_usage = false
# Experimental RAM reduction during indexing, do not use in production, see: <https://github.com/meilisearch/product/discussions/652>

View File

@ -358,6 +358,7 @@ impl<T> From<v5::Settings<T>> for v6::Settings<v6::Unchecked> {
faceting: match settings.faceting {
v5::Setting::Set(faceting) => v6::Setting::Set(v6::FacetingSettings {
max_values_per_facet: faceting.max_values_per_facet.into(),
sort_facet_values_by: v6::Setting::NotSet,
}),
v5::Setting::Reset => v6::Setting::Reset,
v5::Setting::NotSet => v6::Setting::NotSet,

View File

@ -24,6 +24,7 @@ use std::io::BufWriter;
use dump::IndexMetadata;
use log::{debug, error, info};
use meilisearch_types::error::Code;
use meilisearch_types::heed::{RoTxn, RwTxn};
use meilisearch_types::milli::documents::{obkv_to_object, DocumentsBatchReader};
use meilisearch_types::milli::heed::CompactionOption;
@ -1491,7 +1492,12 @@ fn delete_document_by_filter(filter: &serde_json::Value, index: Index) -> Result
Ok(if let Some(filter) = filter {
let mut wtxn = index.write_txn()?;
let candidates = filter.evaluate(&wtxn, &index)?;
let candidates = filter.evaluate(&wtxn, &index).map_err(|err| match err {
milli::Error::UserError(milli::UserError::InvalidFilter(_)) => {
Error::from(err).with_custom_error_code(Code::InvalidDocumentFilter)
}
e => e.into(),
})?;
let mut delete_operation = DeleteDocuments::new(&mut wtxn, &index)?;
delete_operation.delete_documents(&candidates);
let deleted_documents =

View File

@ -46,6 +46,8 @@ impl From<DateField> for Code {
#[allow(clippy::large_enum_variant)]
#[derive(Error, Debug)]
pub enum Error {
#[error("{1}")]
WithCustomErrorCode(Code, Box<Self>),
#[error("Index `{0}` not found.")]
IndexNotFound(String),
#[error("Index `{0}` already exists.")]
@ -144,6 +146,7 @@ impl Error {
pub fn is_recoverable(&self) -> bool {
match self {
Error::IndexNotFound(_)
| Error::WithCustomErrorCode(_, _)
| Error::IndexAlreadyExists(_)
| Error::SwapDuplicateIndexFound(_)
| Error::SwapDuplicateIndexesFound(_)
@ -176,11 +179,16 @@ impl Error {
Error::PlannedFailure => false,
}
}
pub fn with_custom_error_code(self, code: Code) -> Self {
Self::WithCustomErrorCode(code, Box::new(self))
}
}
impl ErrorCode for Error {
fn error_code(&self) -> Code {
match self {
Error::WithCustomErrorCode(code, _) => *code,
Error::IndexNotFound(_) => Code::IndexNotFound,
Error::IndexAlreadyExists(_) => Code::IndexAlreadyExists,
Error::SwapDuplicateIndexesFound(_) => Code::InvalidSwapDuplicateIndexFound,

View File

@ -5,6 +5,7 @@ use std::collections::BTreeMap;
use std::path::Path;
use std::time::Duration;
use meilisearch_types::heed::flags::Flags;
use meilisearch_types::heed::{EnvClosingEvent, EnvOpenOptions};
use meilisearch_types::milli::Index;
use time::OffsetDateTime;
@ -53,6 +54,7 @@ pub struct IndexMap {
pub struct ClosingIndex {
uuid: Uuid,
closing_event: EnvClosingEvent,
enable_mdb_writemap: bool,
map_size: usize,
generation: usize,
}
@ -68,6 +70,7 @@ impl ClosingIndex {
pub fn wait_timeout(self, timeout: Duration) -> Option<ReopenableIndex> {
self.closing_event.wait_timeout(timeout).then_some(ReopenableIndex {
uuid: self.uuid,
enable_mdb_writemap: self.enable_mdb_writemap,
map_size: self.map_size,
generation: self.generation,
})
@ -76,6 +79,7 @@ impl ClosingIndex {
pub struct ReopenableIndex {
uuid: Uuid,
enable_mdb_writemap: bool,
map_size: usize,
generation: usize,
}
@ -103,7 +107,7 @@ impl ReopenableIndex {
return Ok(());
}
map.unavailable.remove(&self.uuid);
map.create(&self.uuid, path, None, self.map_size)?;
map.create(&self.uuid, path, None, self.enable_mdb_writemap, self.map_size)?;
}
Ok(())
}
@ -170,16 +174,17 @@ impl IndexMap {
uuid: &Uuid,
path: &Path,
date: Option<(OffsetDateTime, OffsetDateTime)>,
enable_mdb_writemap: bool,
map_size: usize,
) -> Result<Index> {
if !matches!(self.get_unavailable(uuid), Missing) {
panic!("Attempt to open an index that was unavailable");
}
let index = create_or_open_index(path, date, map_size)?;
let index = create_or_open_index(path, date, enable_mdb_writemap, map_size)?;
match self.available.insert(*uuid, index.clone()) {
InsertionOutcome::InsertedNew => (),
InsertionOutcome::Evicted(evicted_uuid, evicted_index) => {
self.close(evicted_uuid, evicted_index, 0);
self.close(evicted_uuid, evicted_index, enable_mdb_writemap, 0);
}
InsertionOutcome::Replaced(_) => {
panic!("Attempt to open an index that was already opened")
@ -212,17 +217,30 @@ impl IndexMap {
/// | Closing | Closing |
/// | Available | Closing |
///
pub fn close_for_resize(&mut self, uuid: &Uuid, map_size_growth: usize) {
pub fn close_for_resize(
&mut self,
uuid: &Uuid,
enable_mdb_writemap: bool,
map_size_growth: usize,
) {
let Some(index) = self.available.remove(uuid) else { return; };
self.close(*uuid, index, map_size_growth);
self.close(*uuid, index, enable_mdb_writemap, map_size_growth);
}
fn close(&mut self, uuid: Uuid, index: Index, map_size_growth: usize) {
fn close(
&mut self,
uuid: Uuid,
index: Index,
enable_mdb_writemap: bool,
map_size_growth: usize,
) {
let map_size = index.map_size().unwrap_or(DEFAULT_MAP_SIZE) + map_size_growth;
let closing_event = index.prepare_for_closing();
let generation = self.next_generation();
self.unavailable
.insert(uuid, Some(ClosingIndex { uuid, closing_event, map_size, generation }));
self.unavailable.insert(
uuid,
Some(ClosingIndex { uuid, closing_event, enable_mdb_writemap, map_size, generation }),
);
}
/// Attempts to delete and index.
@ -282,11 +300,15 @@ impl IndexMap {
fn create_or_open_index(
path: &Path,
date: Option<(OffsetDateTime, OffsetDateTime)>,
enable_mdb_writemap: bool,
map_size: usize,
) -> Result<Index> {
let mut options = EnvOpenOptions::new();
options.map_size(clamp_to_page_size(map_size));
options.max_readers(1024);
if enable_mdb_writemap {
unsafe { options.flag(Flags::MdbWriteMap) };
}
if let Some((created, updated)) = date {
Ok(Index::new_with_creation_dates(options, path, created, updated)?)

View File

@ -66,6 +66,8 @@ pub struct IndexMapper {
index_base_map_size: usize,
/// The quantity by which the map size of an index is incremented upon reopening, in bytes.
index_growth_amount: usize,
/// Whether we open a meilisearch index with the MDB_WRITEMAP option or not.
enable_mdb_writemap: bool,
pub indexer_config: Arc<IndexerConfig>,
}
@ -123,15 +125,22 @@ impl IndexMapper {
index_base_map_size: usize,
index_growth_amount: usize,
index_count: usize,
enable_mdb_writemap: bool,
indexer_config: IndexerConfig,
) -> Result<Self> {
let mut wtxn = env.write_txn()?;
let index_mapping = env.create_database(&mut wtxn, Some(INDEX_MAPPING))?;
let index_stats = env.create_database(&mut wtxn, Some(INDEX_STATS))?;
wtxn.commit()?;
Ok(Self {
index_map: Arc::new(RwLock::new(IndexMap::new(index_count))),
index_mapping: env.create_database(Some(INDEX_MAPPING))?,
index_stats: env.create_database(Some(INDEX_STATS))?,
index_mapping,
index_stats,
base_path,
index_base_map_size,
index_growth_amount,
enable_mdb_writemap,
indexer_config: Arc::new(indexer_config),
})
}
@ -162,6 +171,7 @@ impl IndexMapper {
&uuid,
&index_path,
date,
self.enable_mdb_writemap,
self.index_base_map_size,
)?;
@ -273,7 +283,11 @@ impl IndexMapper {
.ok_or_else(|| Error::IndexNotFound(name.to_string()))?;
// We remove the index from the in-memory index map.
self.index_map.write().unwrap().close_for_resize(&uuid, self.index_growth_amount);
self.index_map.write().unwrap().close_for_resize(
&uuid,
self.enable_mdb_writemap,
self.index_growth_amount,
);
Ok(())
}
@ -338,6 +352,7 @@ impl IndexMapper {
&uuid,
&index_path,
None,
self.enable_mdb_writemap,
self.index_base_map_size,
)?;
}

View File

@ -233,6 +233,8 @@ pub struct IndexSchedulerOptions {
pub task_db_size: usize,
/// The size, in bytes, with which a meilisearch index is opened the first time of each meilisearch index.
pub index_base_map_size: usize,
/// Whether we open a meilisearch index with the MDB_WRITEMAP option or not.
pub enable_mdb_writemap: bool,
/// The size, in bytes, by which the map size of an index is increased when it resized due to being full.
pub index_growth_amount: usize,
/// The number of indexes that can be concurrently opened in memory.
@ -374,6 +376,11 @@ impl IndexScheduler {
std::fs::create_dir_all(&options.indexes_path)?;
std::fs::create_dir_all(&options.dumps_path)?;
if cfg!(windows) && options.enable_mdb_writemap {
// programmer error if this happens: in normal use passing the option on Windows is an error in main
panic!("Windows doesn't support the MDB_WRITEMAP LMDB option");
}
let task_db_size = clamp_to_page_size(options.task_db_size);
let budget = if options.indexer_config.skip_index_budget {
IndexBudget {
@ -396,25 +403,37 @@ impl IndexScheduler {
.open(options.tasks_path)?;
let file_store = FileStore::new(&options.update_file_path)?;
let mut wtxn = env.write_txn()?;
let all_tasks = env.create_database(&mut wtxn, Some(db_name::ALL_TASKS))?;
let status = env.create_database(&mut wtxn, Some(db_name::STATUS))?;
let kind = env.create_database(&mut wtxn, Some(db_name::KIND))?;
let index_tasks = env.create_database(&mut wtxn, Some(db_name::INDEX_TASKS))?;
let canceled_by = env.create_database(&mut wtxn, Some(db_name::CANCELED_BY))?;
let enqueued_at = env.create_database(&mut wtxn, Some(db_name::ENQUEUED_AT))?;
let started_at = env.create_database(&mut wtxn, Some(db_name::STARTED_AT))?;
let finished_at = env.create_database(&mut wtxn, Some(db_name::FINISHED_AT))?;
wtxn.commit()?;
// allow unreachable_code to get rids of the warning in the case of a test build.
let this = Self {
must_stop_processing: MustStopProcessing::default(),
processing_tasks: Arc::new(RwLock::new(ProcessingTasks::new())),
file_store,
all_tasks: env.create_database(Some(db_name::ALL_TASKS))?,
status: env.create_database(Some(db_name::STATUS))?,
kind: env.create_database(Some(db_name::KIND))?,
index_tasks: env.create_database(Some(db_name::INDEX_TASKS))?,
canceled_by: env.create_database(Some(db_name::CANCELED_BY))?,
enqueued_at: env.create_database(Some(db_name::ENQUEUED_AT))?,
started_at: env.create_database(Some(db_name::STARTED_AT))?,
finished_at: env.create_database(Some(db_name::FINISHED_AT))?,
all_tasks,
status,
kind,
index_tasks,
canceled_by,
enqueued_at,
started_at,
finished_at,
index_mapper: IndexMapper::new(
&env,
options.indexes_path,
budget.map_size,
options.index_growth_amount,
budget.index_count,
options.enable_mdb_writemap,
options.indexer_config,
)?,
env,
@ -1471,6 +1490,7 @@ mod tests {
dumps_path: tempdir.path().join("dumps"),
task_db_size: 1000 * 1000, // 1 MB, we don't use MiB on purpose.
index_base_map_size: 1000 * 1000, // 1 MB, we don't use MiB on purpose.
enable_mdb_writemap: false,
index_growth_amount: 1000 * 1000, // 1 MB
index_count: 5,
indexer_config,

View File

@ -55,9 +55,11 @@ impl HeedAuthStore {
let path = path.as_ref().join(AUTH_DB_PATH);
create_dir_all(&path)?;
let env = Arc::new(open_auth_store_env(path.as_ref())?);
let keys = env.create_database(Some(KEY_DB_NAME))?;
let mut wtxn = env.write_txn()?;
let keys = env.create_database(&mut wtxn, Some(KEY_DB_NAME))?;
let action_keyid_index_expiration =
env.create_database(Some(KEY_ID_ACTION_INDEX_EXPIRATION_DB_NAME))?;
env.create_database(&mut wtxn, Some(KEY_ID_ACTION_INDEX_EXPIRATION_DB_NAME))?;
wtxn.commit()?;
Ok(Self { env, keys, action_keyid_index_expiration, should_close_on_drop: true })
}

View File

@ -150,6 +150,7 @@ make_missing_field_convenience_builder!(MissingApiKeyActions, missing_api_key_ac
make_missing_field_convenience_builder!(MissingApiKeyExpiresAt, missing_api_key_expires_at);
make_missing_field_convenience_builder!(MissingApiKeyIndexes, missing_api_key_indexes);
make_missing_field_convenience_builder!(MissingSwapIndexes, missing_swap_indexes);
make_missing_field_convenience_builder!(MissingDocumentFilter, missing_document_filter);
// Integrate a sub-error into a [`DeserrError`] by taking its error message but using
// the default error code (C) from `Self`

View File

@ -175,120 +175,122 @@ macro_rules! make_error_codes {
// An exhaustive list of all the error codes used by meilisearch.
make_error_codes! {
ApiKeyAlreadyExists , InvalidRequest , CONFLICT ;
ApiKeyNotFound , InvalidRequest , NOT_FOUND ;
BadParameter , InvalidRequest , BAD_REQUEST;
BadRequest , InvalidRequest , BAD_REQUEST;
DatabaseSizeLimitReached , Internal , INTERNAL_SERVER_ERROR;
DocumentNotFound , InvalidRequest , NOT_FOUND;
DumpAlreadyProcessing , InvalidRequest , CONFLICT;
DumpNotFound , InvalidRequest , NOT_FOUND;
DumpProcessFailed , Internal , INTERNAL_SERVER_ERROR;
DuplicateIndexFound , InvalidRequest , BAD_REQUEST;
ImmutableApiKeyActions , InvalidRequest , BAD_REQUEST;
ImmutableApiKeyCreatedAt , InvalidRequest , BAD_REQUEST;
ImmutableApiKeyExpiresAt , InvalidRequest , BAD_REQUEST;
ImmutableApiKeyIndexes , InvalidRequest , BAD_REQUEST;
ImmutableApiKeyKey , InvalidRequest , BAD_REQUEST;
ImmutableApiKeyUid , InvalidRequest , BAD_REQUEST;
ImmutableApiKeyUpdatedAt , InvalidRequest , BAD_REQUEST;
ImmutableIndexCreatedAt , InvalidRequest , BAD_REQUEST;
ImmutableIndexUid , InvalidRequest , BAD_REQUEST;
ImmutableIndexUpdatedAt , InvalidRequest , BAD_REQUEST;
IndexAlreadyExists , InvalidRequest , CONFLICT ;
IndexCreationFailed , Internal , INTERNAL_SERVER_ERROR;
IndexNotFound , InvalidRequest , NOT_FOUND;
IndexPrimaryKeyAlreadyExists , InvalidRequest , BAD_REQUEST ;
IndexPrimaryKeyMultipleCandidatesFound, InvalidRequest , BAD_REQUEST;
IndexPrimaryKeyNoCandidateFound , InvalidRequest , BAD_REQUEST ;
Internal , Internal , INTERNAL_SERVER_ERROR ;
InvalidApiKey , Auth , FORBIDDEN ;
InvalidApiKeyActions , InvalidRequest , BAD_REQUEST ;
InvalidApiKeyDescription , InvalidRequest , BAD_REQUEST ;
InvalidApiKeyExpiresAt , InvalidRequest , BAD_REQUEST ;
InvalidApiKeyIndexes , InvalidRequest , BAD_REQUEST ;
InvalidApiKeyLimit , InvalidRequest , BAD_REQUEST ;
InvalidApiKeyName , InvalidRequest , BAD_REQUEST ;
InvalidApiKeyOffset , InvalidRequest , BAD_REQUEST ;
InvalidApiKeyUid , InvalidRequest , BAD_REQUEST ;
InvalidContentType , InvalidRequest , UNSUPPORTED_MEDIA_TYPE ;
InvalidDocumentCsvDelimiter , InvalidRequest , BAD_REQUEST ;
InvalidDocumentFields , InvalidRequest , BAD_REQUEST ;
InvalidDocumentFilter , InvalidRequest , BAD_REQUEST ;
InvalidDocumentGeoField , InvalidRequest , BAD_REQUEST ;
InvalidDocumentId , InvalidRequest , BAD_REQUEST ;
InvalidDocumentLimit , InvalidRequest , BAD_REQUEST ;
InvalidDocumentOffset , InvalidRequest , BAD_REQUEST ;
InvalidDocumentDeleteFilter , InvalidRequest , BAD_REQUEST ;
InvalidIndexLimit , InvalidRequest , BAD_REQUEST ;
InvalidIndexOffset , InvalidRequest , BAD_REQUEST ;
InvalidIndexPrimaryKey , InvalidRequest , BAD_REQUEST ;
InvalidIndexUid , InvalidRequest , BAD_REQUEST ;
InvalidSearchAttributesToCrop , InvalidRequest , BAD_REQUEST ;
InvalidSearchAttributesToHighlight , InvalidRequest , BAD_REQUEST ;
InvalidSearchAttributesToRetrieve , InvalidRequest , BAD_REQUEST ;
InvalidSearchCropLength , InvalidRequest , BAD_REQUEST ;
InvalidSearchCropMarker , InvalidRequest , BAD_REQUEST ;
InvalidSearchFacets , InvalidRequest , BAD_REQUEST ;
InvalidSearchFilter , InvalidRequest , BAD_REQUEST ;
InvalidSearchHighlightPostTag , InvalidRequest , BAD_REQUEST ;
InvalidSearchHighlightPreTag , InvalidRequest , BAD_REQUEST ;
InvalidSearchHitsPerPage , InvalidRequest , BAD_REQUEST ;
InvalidSearchLimit , InvalidRequest , BAD_REQUEST ;
InvalidSearchMatchingStrategy , InvalidRequest , BAD_REQUEST ;
InvalidSearchOffset , InvalidRequest , BAD_REQUEST ;
InvalidSearchPage , InvalidRequest , BAD_REQUEST ;
InvalidSearchQ , InvalidRequest , BAD_REQUEST ;
InvalidSearchShowMatchesPosition , InvalidRequest , BAD_REQUEST ;
InvalidSearchSort , InvalidRequest , BAD_REQUEST ;
InvalidSettingsDisplayedAttributes , InvalidRequest , BAD_REQUEST ;
InvalidSettingsDistinctAttribute , InvalidRequest , BAD_REQUEST ;
InvalidSettingsFaceting , InvalidRequest , BAD_REQUEST ;
InvalidSettingsFilterableAttributes , InvalidRequest , BAD_REQUEST ;
InvalidSettingsPagination , InvalidRequest , BAD_REQUEST ;
InvalidSettingsRankingRules , InvalidRequest , BAD_REQUEST ;
InvalidSettingsSearchableAttributes , InvalidRequest , BAD_REQUEST ;
InvalidSettingsSortableAttributes , InvalidRequest , BAD_REQUEST ;
InvalidSettingsStopWords , InvalidRequest , BAD_REQUEST ;
InvalidSettingsSynonyms , InvalidRequest , BAD_REQUEST ;
InvalidSettingsTypoTolerance , InvalidRequest , BAD_REQUEST ;
InvalidState , Internal , INTERNAL_SERVER_ERROR ;
InvalidStoreFile , Internal , INTERNAL_SERVER_ERROR ;
InvalidSwapDuplicateIndexFound , InvalidRequest , BAD_REQUEST ;
InvalidSwapIndexes , InvalidRequest , BAD_REQUEST ;
InvalidTaskAfterEnqueuedAt , InvalidRequest , BAD_REQUEST ;
InvalidTaskAfterFinishedAt , InvalidRequest , BAD_REQUEST ;
InvalidTaskAfterStartedAt , InvalidRequest , BAD_REQUEST ;
InvalidTaskBeforeEnqueuedAt , InvalidRequest , BAD_REQUEST ;
InvalidTaskBeforeFinishedAt , InvalidRequest , BAD_REQUEST ;
InvalidTaskBeforeStartedAt , InvalidRequest , BAD_REQUEST ;
InvalidTaskCanceledBy , InvalidRequest , BAD_REQUEST ;
InvalidTaskFrom , InvalidRequest , BAD_REQUEST ;
InvalidTaskLimit , InvalidRequest , BAD_REQUEST ;
InvalidTaskStatuses , InvalidRequest , BAD_REQUEST ;
InvalidTaskTypes , InvalidRequest , BAD_REQUEST ;
InvalidTaskUids , InvalidRequest , BAD_REQUEST ;
IoError , System , UNPROCESSABLE_ENTITY;
MalformedPayload , InvalidRequest , BAD_REQUEST ;
MaxFieldsLimitExceeded , InvalidRequest , BAD_REQUEST ;
MissingApiKeyActions , InvalidRequest , BAD_REQUEST ;
MissingApiKeyExpiresAt , InvalidRequest , BAD_REQUEST ;
MissingApiKeyIndexes , InvalidRequest , BAD_REQUEST ;
MissingAuthorizationHeader , Auth , UNAUTHORIZED ;
MissingContentType , InvalidRequest , UNSUPPORTED_MEDIA_TYPE ;
MissingDocumentId , InvalidRequest , BAD_REQUEST ;
MissingIndexUid , InvalidRequest , BAD_REQUEST ;
MissingMasterKey , Auth , UNAUTHORIZED ;
MissingPayload , InvalidRequest , BAD_REQUEST ;
MissingSwapIndexes , InvalidRequest , BAD_REQUEST ;
MissingTaskFilters , InvalidRequest , BAD_REQUEST ;
NoSpaceLeftOnDevice , System , UNPROCESSABLE_ENTITY;
PayloadTooLarge , InvalidRequest , PAYLOAD_TOO_LARGE ;
TaskNotFound , InvalidRequest , NOT_FOUND ;
TooManyOpenFiles , System , UNPROCESSABLE_ENTITY ;
UnretrievableDocument , Internal , BAD_REQUEST ;
UnretrievableErrorCode , InvalidRequest , BAD_REQUEST ;
UnsupportedMediaType , InvalidRequest , UNSUPPORTED_MEDIA_TYPE
ApiKeyAlreadyExists , InvalidRequest , CONFLICT ;
ApiKeyNotFound , InvalidRequest , NOT_FOUND ;
BadParameter , InvalidRequest , BAD_REQUEST;
BadRequest , InvalidRequest , BAD_REQUEST;
DatabaseSizeLimitReached , Internal , INTERNAL_SERVER_ERROR;
DocumentNotFound , InvalidRequest , NOT_FOUND;
DumpAlreadyProcessing , InvalidRequest , CONFLICT;
DumpNotFound , InvalidRequest , NOT_FOUND;
DumpProcessFailed , Internal , INTERNAL_SERVER_ERROR;
DuplicateIndexFound , InvalidRequest , BAD_REQUEST;
ImmutableApiKeyActions , InvalidRequest , BAD_REQUEST;
ImmutableApiKeyCreatedAt , InvalidRequest , BAD_REQUEST;
ImmutableApiKeyExpiresAt , InvalidRequest , BAD_REQUEST;
ImmutableApiKeyIndexes , InvalidRequest , BAD_REQUEST;
ImmutableApiKeyKey , InvalidRequest , BAD_REQUEST;
ImmutableApiKeyUid , InvalidRequest , BAD_REQUEST;
ImmutableApiKeyUpdatedAt , InvalidRequest , BAD_REQUEST;
ImmutableIndexCreatedAt , InvalidRequest , BAD_REQUEST;
ImmutableIndexUid , InvalidRequest , BAD_REQUEST;
ImmutableIndexUpdatedAt , InvalidRequest , BAD_REQUEST;
IndexAlreadyExists , InvalidRequest , CONFLICT ;
IndexCreationFailed , Internal , INTERNAL_SERVER_ERROR;
IndexNotFound , InvalidRequest , NOT_FOUND;
IndexPrimaryKeyAlreadyExists , InvalidRequest , BAD_REQUEST ;
IndexPrimaryKeyMultipleCandidatesFound , InvalidRequest , BAD_REQUEST;
IndexPrimaryKeyNoCandidateFound , InvalidRequest , BAD_REQUEST ;
Internal , Internal , INTERNAL_SERVER_ERROR ;
InvalidApiKey , Auth , FORBIDDEN ;
InvalidApiKeyActions , InvalidRequest , BAD_REQUEST ;
InvalidApiKeyDescription , InvalidRequest , BAD_REQUEST ;
InvalidApiKeyExpiresAt , InvalidRequest , BAD_REQUEST ;
InvalidApiKeyIndexes , InvalidRequest , BAD_REQUEST ;
InvalidApiKeyLimit , InvalidRequest , BAD_REQUEST ;
InvalidApiKeyName , InvalidRequest , BAD_REQUEST ;
InvalidApiKeyOffset , InvalidRequest , BAD_REQUEST ;
InvalidApiKeyUid , InvalidRequest , BAD_REQUEST ;
InvalidContentType , InvalidRequest , UNSUPPORTED_MEDIA_TYPE ;
InvalidDocumentCsvDelimiter , InvalidRequest , BAD_REQUEST ;
InvalidDocumentFields , InvalidRequest , BAD_REQUEST ;
MissingDocumentFilter , InvalidRequest , BAD_REQUEST ;
InvalidDocumentFilter , InvalidRequest , BAD_REQUEST ;
InvalidDocumentGeoField , InvalidRequest , BAD_REQUEST ;
InvalidDocumentId , InvalidRequest , BAD_REQUEST ;
InvalidDocumentLimit , InvalidRequest , BAD_REQUEST ;
InvalidDocumentOffset , InvalidRequest , BAD_REQUEST ;
InvalidIndexLimit , InvalidRequest , BAD_REQUEST ;
InvalidIndexOffset , InvalidRequest , BAD_REQUEST ;
InvalidIndexPrimaryKey , InvalidRequest , BAD_REQUEST ;
InvalidIndexUid , InvalidRequest , BAD_REQUEST ;
InvalidSearchAttributesToCrop , InvalidRequest , BAD_REQUEST ;
InvalidSearchAttributesToHighlight , InvalidRequest , BAD_REQUEST ;
InvalidSearchAttributesToRetrieve , InvalidRequest , BAD_REQUEST ;
InvalidSearchCropLength , InvalidRequest , BAD_REQUEST ;
InvalidSearchCropMarker , InvalidRequest , BAD_REQUEST ;
InvalidSearchFacets , InvalidRequest , BAD_REQUEST ;
InvalidSearchFilter , InvalidRequest , BAD_REQUEST ;
InvalidSearchHighlightPostTag , InvalidRequest , BAD_REQUEST ;
InvalidSearchHighlightPreTag , InvalidRequest , BAD_REQUEST ;
InvalidSearchHitsPerPage , InvalidRequest , BAD_REQUEST ;
InvalidSearchLimit , InvalidRequest , BAD_REQUEST ;
InvalidSearchMatchingStrategy , InvalidRequest , BAD_REQUEST ;
InvalidSearchOffset , InvalidRequest , BAD_REQUEST ;
InvalidSearchPage , InvalidRequest , BAD_REQUEST ;
InvalidSearchQ , InvalidRequest , BAD_REQUEST ;
InvalidSearchShowMatchesPosition , InvalidRequest , BAD_REQUEST ;
InvalidSearchSort , InvalidRequest , BAD_REQUEST ;
InvalidSettingsDisplayedAttributes , InvalidRequest , BAD_REQUEST ;
InvalidSettingsDistinctAttribute , InvalidRequest , BAD_REQUEST ;
InvalidSettingsFilterableAttributes , InvalidRequest , BAD_REQUEST ;
InvalidSettingsPagination , InvalidRequest , BAD_REQUEST ;
InvalidSettingsRankingRules , InvalidRequest , BAD_REQUEST ;
InvalidSettingsSearchableAttributes , InvalidRequest , BAD_REQUEST ;
InvalidSettingsSortableAttributes , InvalidRequest , BAD_REQUEST ;
InvalidSettingsStopWords , InvalidRequest , BAD_REQUEST ;
InvalidSettingsSynonyms , InvalidRequest , BAD_REQUEST ;
InvalidSettingsTypoTolerance , InvalidRequest , BAD_REQUEST ;
InvalidSettingsFaceting , InvalidRequest , BAD_REQUEST ;
InvalidSettingsFacetingMaxValuesPerFacet , InvalidRequest , BAD_REQUEST ;
InvalidSettingsFacetingSortFacetValuesBy , InvalidRequest , BAD_REQUEST ;
InvalidState , Internal , INTERNAL_SERVER_ERROR ;
InvalidStoreFile , Internal , INTERNAL_SERVER_ERROR ;
InvalidSwapDuplicateIndexFound , InvalidRequest , BAD_REQUEST ;
InvalidSwapIndexes , InvalidRequest , BAD_REQUEST ;
InvalidTaskAfterEnqueuedAt , InvalidRequest , BAD_REQUEST ;
InvalidTaskAfterFinishedAt , InvalidRequest , BAD_REQUEST ;
InvalidTaskAfterStartedAt , InvalidRequest , BAD_REQUEST ;
InvalidTaskBeforeEnqueuedAt , InvalidRequest , BAD_REQUEST ;
InvalidTaskBeforeFinishedAt , InvalidRequest , BAD_REQUEST ;
InvalidTaskBeforeStartedAt , InvalidRequest , BAD_REQUEST ;
InvalidTaskCanceledBy , InvalidRequest , BAD_REQUEST ;
InvalidTaskFrom , InvalidRequest , BAD_REQUEST ;
InvalidTaskLimit , InvalidRequest , BAD_REQUEST ;
InvalidTaskStatuses , InvalidRequest , BAD_REQUEST ;
InvalidTaskTypes , InvalidRequest , BAD_REQUEST ;
InvalidTaskUids , InvalidRequest , BAD_REQUEST ;
IoError , System , UNPROCESSABLE_ENTITY;
MalformedPayload , InvalidRequest , BAD_REQUEST ;
MaxFieldsLimitExceeded , InvalidRequest , BAD_REQUEST ;
MissingApiKeyActions , InvalidRequest , BAD_REQUEST ;
MissingApiKeyExpiresAt , InvalidRequest , BAD_REQUEST ;
MissingApiKeyIndexes , InvalidRequest , BAD_REQUEST ;
MissingAuthorizationHeader , Auth , UNAUTHORIZED ;
MissingContentType , InvalidRequest , UNSUPPORTED_MEDIA_TYPE ;
MissingDocumentId , InvalidRequest , BAD_REQUEST ;
MissingIndexUid , InvalidRequest , BAD_REQUEST ;
MissingMasterKey , Auth , UNAUTHORIZED ;
MissingPayload , InvalidRequest , BAD_REQUEST ;
MissingSwapIndexes , InvalidRequest , BAD_REQUEST ;
MissingTaskFilters , InvalidRequest , BAD_REQUEST ;
NoSpaceLeftOnDevice , System , UNPROCESSABLE_ENTITY;
PayloadTooLarge , InvalidRequest , PAYLOAD_TOO_LARGE ;
TaskNotFound , InvalidRequest , NOT_FOUND ;
TooManyOpenFiles , System , UNPROCESSABLE_ENTITY ;
UnretrievableDocument , Internal , BAD_REQUEST ;
UnretrievableErrorCode , InvalidRequest , BAD_REQUEST ;
UnsupportedMediaType , InvalidRequest , UNSUPPORTED_MEDIA_TYPE
}
impl ErrorCode for JoinError {

View File

@ -0,0 +1,33 @@
use deserr::Deserr;
use milli::OrderBy;
use serde::{Deserialize, Serialize};
#[derive(Debug, Default, Copy, Clone, PartialEq, Eq, Serialize, Deserialize, Deserr)]
#[serde(rename_all = "camelCase")]
#[deserr(rename_all = camelCase)]
pub enum FacetValuesSort {
/// Facet values are sorted in alphabetical order, ascending from A to Z.
#[default]
Alpha,
/// Facet values are sorted by decreasing count.
/// The count is the number of records containing this facet value in the results of the query.
Count,
}
impl From<FacetValuesSort> for OrderBy {
fn from(val: FacetValuesSort) -> Self {
match val {
FacetValuesSort::Alpha => OrderBy::Lexicographic,
FacetValuesSort::Count => OrderBy::Count,
}
}
}
impl From<OrderBy> for FacetValuesSort {
fn from(val: OrderBy) -> Self {
match val {
OrderBy::Lexicographic => FacetValuesSort::Alpha,
OrderBy::Count => FacetValuesSort::Count,
}
}
}

View File

@ -2,6 +2,7 @@ pub mod compression;
pub mod deserr;
pub mod document_formats;
pub mod error;
pub mod facet_values_sort;
pub mod index_uid;
pub mod index_uid_pattern;
pub mod keys;

View File

@ -14,6 +14,7 @@ use serde::{Deserialize, Serialize, Serializer};
use crate::deserr::DeserrJsonError;
use crate::error::deserr_codes::*;
use crate::facet_values_sort::FacetValuesSort;
/// The maximimum number of results that the engine
/// will be able to return in one search call.
@ -97,11 +98,14 @@ pub struct TypoSettings {
#[derive(Debug, Clone, Default, Serialize, Deserialize, PartialEq, Eq, Deserr)]
#[serde(deny_unknown_fields, rename_all = "camelCase")]
#[deserr(rename_all = camelCase, deny_unknown_fields)]
#[deserr(deny_unknown_fields, rename_all = camelCase, where_predicate = __Deserr_E: deserr::MergeWithError<DeserrJsonError<InvalidSettingsFaceting>> + deserr::MergeWithError<DeserrJsonError<InvalidSettingsFacetingMaxValuesPerFacet>> + deserr::MergeWithError<DeserrJsonError<InvalidSettingsFacetingSortFacetValuesBy>>)]
pub struct FacetingSettings {
#[serde(default, skip_serializing_if = "Setting::is_not_set")]
#[deserr(default)]
#[deserr(default, error = DeserrJsonError<InvalidSettingsFacetingMaxValuesPerFacet>)]
pub max_values_per_facet: Setting<usize>,
#[serde(default, skip_serializing_if = "Setting::is_not_set")]
#[deserr(default, error = DeserrJsonError<InvalidSettingsFacetingSortFacetValuesBy>)]
pub sort_facet_values_by: Setting<BTreeMap<String, FacetValuesSort>>,
}
#[derive(Debug, Clone, Default, Serialize, Deserialize, PartialEq, Eq, Deserr)]
@ -398,12 +402,21 @@ pub fn apply_settings_to_builder(
Setting::NotSet => (),
}
match settings.faceting {
Setting::Set(ref value) => match value.max_values_per_facet {
Setting::Set(val) => builder.set_max_values_per_facet(val),
Setting::Reset => builder.reset_max_values_per_facet(),
Setting::NotSet => (),
},
match &settings.faceting {
Setting::Set(FacetingSettings { max_values_per_facet, sort_facet_values_by }) => {
match max_values_per_facet {
Setting::Set(val) => builder.set_max_values_per_facet(*val),
Setting::Reset => builder.reset_max_values_per_facet(),
Setting::NotSet => (),
}
match sort_facet_values_by {
Setting::Set(val) => builder.set_sort_facet_values_by(
val.iter().map(|(name, order)| (name.clone(), (*order).into())).collect(),
),
Setting::Reset => builder.reset_sort_facet_values_by(),
Setting::NotSet => (),
}
}
Setting::Reset => builder.reset_max_values_per_facet(),
Setting::NotSet => (),
}
@ -476,6 +489,13 @@ pub fn settings(
max_values_per_facet: Setting::Set(
index.max_values_per_facet(rtxn)?.unwrap_or(DEFAULT_VALUES_PER_FACET),
),
sort_facet_values_by: Setting::Set(
index
.sort_facet_values_by(rtxn)?
.into_iter()
.map(|(name, sort)| (name, sort.into()))
.collect(),
),
};
let pagination = PaginationSettings {

View File

@ -14,14 +14,27 @@ default-run = "meilisearch"
[dependencies]
actix-cors = "0.6.4"
actix-http = { version = "3.3.1", default-features = false, features = ["compress-brotli", "compress-gzip", "rustls"] }
actix-web = { version = "4.3.1", default-features = false, features = ["macros", "compress-brotli", "compress-gzip", "cookies", "rustls"] }
actix-http = { version = "3.3.1", default-features = false, features = [
"compress-brotli",
"compress-gzip",
"rustls",
] }
actix-web = { version = "4.3.1", default-features = false, features = [
"macros",
"compress-brotli",
"compress-gzip",
"cookies",
"rustls",
] }
actix-web-static-files = { git = "https://github.com/kilork/actix-web-static-files.git", rev = "2d3b6160", optional = true }
anyhow = { version = "1.0.70", features = ["backtrace"] }
async-stream = "0.3.5"
async-trait = "0.1.68"
bstr = "1.4.0"
byte-unit = { version = "4.0.19", default-features = false, features = ["std", "serde"] }
byte-unit = { version = "4.0.19", default-features = false, features = [
"std",
"serde",
] }
bytes = "1.4.0"
clap = { version = "4.2.1", features = ["derive", "env"] }
crossbeam-channel = "0.5.8"
@ -56,7 +69,10 @@ prometheus = { version = "0.13.3", features = ["process"] }
rand = "0.8.5"
rayon = "1.7.0"
regex = "1.7.3"
reqwest = { version = "0.11.16", features = ["rustls-tls", "json"], default-features = false }
reqwest = { version = "0.11.16", features = [
"rustls-tls",
"json",
], default-features = false }
rustls = "0.20.8"
rustls-pemfile = "1.0.2"
segment = { version = "0.2.2", optional = true }
@ -70,7 +86,12 @@ sysinfo = "0.28.4"
tar = "0.4.38"
tempfile = "3.5.0"
thiserror = "1.0.40"
time = { version = "0.3.20", features = ["serde-well-known", "formatting", "parsing", "macros"] }
time = { version = "0.3.20", features = [
"serde-well-known",
"formatting",
"parsing",
"macros",
] }
tokio = { version = "1.27.0", features = ["full"] }
tokio-stream = "0.1.12"
toml = "0.7.3"
@ -89,7 +110,7 @@ brotli = "3.3.4"
insta = "1.29.0"
manifest-dir-macros = "0.1.16"
maplit = "1.0.2"
meili-snap = {path = "../meili-snap"}
meili-snap = { path = "../meili-snap" }
temp-env = "0.3.3"
urlencoding = "2.1.2"
yaup = "0.2.1"
@ -98,7 +119,10 @@ yaup = "0.2.1"
anyhow = { version = "1.0.70", optional = true }
cargo_toml = { version = "0.15.2", optional = true }
hex = { version = "0.4.3", optional = true }
reqwest = { version = "0.11.16", features = ["blocking", "rustls-tls"], default-features = false, optional = true }
reqwest = { version = "0.11.16", features = [
"blocking",
"rustls-tls",
], default-features = false, optional = true }
sha-1 = { version = "0.10.1", optional = true }
static-files = { version = "0.2.3", optional = true }
tempfile = { version = "3.5.0", optional = true }
@ -108,7 +132,17 @@ zip = { version = "0.6.4", optional = true }
[features]
default = ["analytics", "meilisearch-types/all-tokenizations", "mini-dashboard"]
analytics = ["segment"]
mini-dashboard = ["actix-web-static-files", "static-files", "anyhow", "cargo_toml", "hex", "reqwest", "sha-1", "tempfile", "zip"]
mini-dashboard = [
"actix-web-static-files",
"static-files",
"anyhow",
"cargo_toml",
"hex",
"reqwest",
"sha-1",
"tempfile",
"zip",
]
chinese = ["meilisearch-types/chinese"]
hebrew = ["meilisearch-types/hebrew"]
japanese = ["meilisearch-types/japanese"]

View File

@ -5,7 +5,7 @@ use actix_web::HttpRequest;
use meilisearch_types::InstanceUid;
use serde_json::Value;
use super::{find_user_id, Analytics, DocumentDeletionKind};
use super::{find_user_id, Analytics, DocumentDeletionKind, DocumentFetchKind};
use crate::routes::indexes::documents::UpdateDocumentsQuery;
use crate::routes::tasks::TasksFilterQuery;
use crate::Opt;
@ -71,6 +71,8 @@ impl Analytics for MockAnalytics {
_request: &HttpRequest,
) {
}
fn get_fetch_documents(&self, _documents_query: &DocumentFetchKind, _request: &HttpRequest) {}
fn post_fetch_documents(&self, _documents_query: &DocumentFetchKind, _request: &HttpRequest) {}
fn get_tasks(&self, _query: &TasksFilterQuery, _request: &HttpRequest) {}
fn health_seen(&self, _request: &HttpRequest) {}
}

View File

@ -67,6 +67,12 @@ pub enum DocumentDeletionKind {
PerFilter,
}
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub enum DocumentFetchKind {
PerDocumentId,
Normal { with_filter: bool, limit: usize, offset: usize },
}
pub trait Analytics: Sync + Send {
fn instance_uid(&self) -> Option<&InstanceUid>;
@ -90,6 +96,12 @@ pub trait Analytics: Sync + Send {
request: &HttpRequest,
);
// this method should be called to aggregate a fetch documents request
fn get_fetch_documents(&self, documents_query: &DocumentFetchKind, request: &HttpRequest);
// this method should be called to aggregate a fetch documents request
fn post_fetch_documents(&self, documents_query: &DocumentFetchKind, request: &HttpRequest);
// this method should be called to aggregate a add documents request
fn delete_documents(&self, kind: DocumentDeletionKind, request: &HttpRequest);

View File

@ -23,7 +23,9 @@ use tokio::select;
use tokio::sync::mpsc::{self, Receiver, Sender};
use uuid::Uuid;
use super::{config_user_id_path, DocumentDeletionKind, MEILISEARCH_CONFIG_PATH};
use super::{
config_user_id_path, DocumentDeletionKind, DocumentFetchKind, MEILISEARCH_CONFIG_PATH,
};
use crate::analytics::Analytics;
use crate::option::{default_http_addr, IndexerOpts, MaxMemory, MaxThreads, ScheduleSnapshot};
use crate::routes::indexes::documents::UpdateDocumentsQuery;
@ -72,6 +74,8 @@ pub enum AnalyticsMsg {
AggregateAddDocuments(DocumentsAggregator),
AggregateDeleteDocuments(DocumentsDeletionAggregator),
AggregateUpdateDocuments(DocumentsAggregator),
AggregateGetFetchDocuments(DocumentsFetchAggregator),
AggregatePostFetchDocuments(DocumentsFetchAggregator),
AggregateTasks(TasksAggregator),
AggregateHealth(HealthAggregator),
}
@ -139,6 +143,8 @@ impl SegmentAnalytics {
add_documents_aggregator: DocumentsAggregator::default(),
delete_documents_aggregator: DocumentsDeletionAggregator::default(),
update_documents_aggregator: DocumentsAggregator::default(),
get_fetch_documents_aggregator: DocumentsFetchAggregator::default(),
post_fetch_documents_aggregator: DocumentsFetchAggregator::default(),
get_tasks_aggregator: TasksAggregator::default(),
health_aggregator: HealthAggregator::default(),
});
@ -205,6 +211,16 @@ impl super::Analytics for SegmentAnalytics {
let _ = self.sender.try_send(AnalyticsMsg::AggregateUpdateDocuments(aggregate));
}
fn get_fetch_documents(&self, documents_query: &DocumentFetchKind, request: &HttpRequest) {
let aggregate = DocumentsFetchAggregator::from_query(documents_query, request);
let _ = self.sender.try_send(AnalyticsMsg::AggregateGetFetchDocuments(aggregate));
}
fn post_fetch_documents(&self, documents_query: &DocumentFetchKind, request: &HttpRequest) {
let aggregate = DocumentsFetchAggregator::from_query(documents_query, request);
let _ = self.sender.try_send(AnalyticsMsg::AggregatePostFetchDocuments(aggregate));
}
fn get_tasks(&self, query: &TasksFilterQuery, request: &HttpRequest) {
let aggregate = TasksAggregator::from_query(query, request);
let _ = self.sender.try_send(AnalyticsMsg::AggregateTasks(aggregate));
@ -225,6 +241,7 @@ impl super::Analytics for SegmentAnalytics {
struct Infos {
env: String,
experimental_enable_metrics: bool,
experimental_reduce_indexing_memory_usage: bool,
db_path: bool,
import_dump: bool,
dump_dir: bool,
@ -258,6 +275,7 @@ impl From<Opt> for Infos {
let Opt {
db_path,
experimental_enable_metrics,
experimental_reduce_indexing_memory_usage,
http_addr,
master_key: _,
env,
@ -300,6 +318,7 @@ impl From<Opt> for Infos {
Self {
env,
experimental_enable_metrics,
experimental_reduce_indexing_memory_usage,
db_path: db_path != PathBuf::from("./data.ms"),
import_dump: import_dump.is_some(),
dump_dir: dump_dir != PathBuf::from("dumps/"),
@ -338,6 +357,8 @@ pub struct Segment {
add_documents_aggregator: DocumentsAggregator,
delete_documents_aggregator: DocumentsDeletionAggregator,
update_documents_aggregator: DocumentsAggregator,
get_fetch_documents_aggregator: DocumentsFetchAggregator,
post_fetch_documents_aggregator: DocumentsFetchAggregator,
get_tasks_aggregator: TasksAggregator,
health_aggregator: HealthAggregator,
}
@ -400,6 +421,8 @@ impl Segment {
Some(AnalyticsMsg::AggregateAddDocuments(agreg)) => self.add_documents_aggregator.aggregate(agreg),
Some(AnalyticsMsg::AggregateDeleteDocuments(agreg)) => self.delete_documents_aggregator.aggregate(agreg),
Some(AnalyticsMsg::AggregateUpdateDocuments(agreg)) => self.update_documents_aggregator.aggregate(agreg),
Some(AnalyticsMsg::AggregateGetFetchDocuments(agreg)) => self.get_fetch_documents_aggregator.aggregate(agreg),
Some(AnalyticsMsg::AggregatePostFetchDocuments(agreg)) => self.post_fetch_documents_aggregator.aggregate(agreg),
Some(AnalyticsMsg::AggregateTasks(agreg)) => self.get_tasks_aggregator.aggregate(agreg),
Some(AnalyticsMsg::AggregateHealth(agreg)) => self.health_aggregator.aggregate(agreg),
None => (),
@ -450,6 +473,10 @@ impl Segment {
.into_event(&self.user, "Documents Deleted");
let update_documents = std::mem::take(&mut self.update_documents_aggregator)
.into_event(&self.user, "Documents Updated");
let get_fetch_documents = std::mem::take(&mut self.get_fetch_documents_aggregator)
.into_event(&self.user, "Documents Fetched GET");
let post_fetch_documents = std::mem::take(&mut self.post_fetch_documents_aggregator)
.into_event(&self.user, "Documents Fetched POST");
let get_tasks =
std::mem::take(&mut self.get_tasks_aggregator).into_event(&self.user, "Tasks Seen");
let health =
@ -473,6 +500,12 @@ impl Segment {
if let Some(update_documents) = update_documents {
let _ = self.batcher.push(update_documents).await;
}
if let Some(get_fetch_documents) = get_fetch_documents {
let _ = self.batcher.push(get_fetch_documents).await;
}
if let Some(post_fetch_documents) = post_fetch_documents {
let _ = self.batcher.push(post_fetch_documents).await;
}
if let Some(get_tasks) = get_tasks {
let _ = self.batcher.push(get_tasks).await;
}
@ -1135,3 +1168,76 @@ impl HealthAggregator {
})
}
}
#[derive(Default, Serialize)]
pub struct DocumentsFetchAggregator {
#[serde(skip)]
timestamp: Option<OffsetDateTime>,
// context
#[serde(rename = "user-agent")]
user_agents: HashSet<String>,
#[serde(rename = "requests.max_limit")]
total_received: usize,
// a call on ../documents/:doc_id
per_document_id: bool,
// if a filter was used
per_filter: bool,
// pagination
#[serde(rename = "pagination.max_limit")]
max_limit: usize,
#[serde(rename = "pagination.max_offset")]
max_offset: usize,
}
impl DocumentsFetchAggregator {
pub fn from_query(query: &DocumentFetchKind, request: &HttpRequest) -> Self {
let (limit, offset) = match query {
DocumentFetchKind::PerDocumentId => (1, 0),
DocumentFetchKind::Normal { limit, offset, .. } => (*limit, *offset),
};
Self {
timestamp: Some(OffsetDateTime::now_utc()),
user_agents: extract_user_agents(request).into_iter().collect(),
total_received: 1,
per_document_id: matches!(query, DocumentFetchKind::PerDocumentId),
per_filter: matches!(query, DocumentFetchKind::Normal { with_filter, .. } if *with_filter),
max_limit: limit,
max_offset: offset,
}
}
/// Aggregate one [DocumentsFetchAggregator] into another.
pub fn aggregate(&mut self, other: Self) {
if self.timestamp.is_none() {
self.timestamp = other.timestamp;
}
for user_agent in other.user_agents {
self.user_agents.insert(user_agent);
}
self.total_received = self.total_received.saturating_add(other.total_received);
self.per_document_id |= other.per_document_id;
self.per_filter |= other.per_filter;
self.max_limit = self.max_limit.max(other.max_limit);
self.max_offset = self.max_offset.max(other.max_offset);
}
pub fn into_event(self, user: &User, event_name: &str) -> Option<Track> {
// if we had no timestamp it means we never encountered any events and
// thus we don't need to send this event.
let timestamp = self.timestamp?;
Some(Track {
timestamp: Some(timestamp),
user: user.clone(),
event: event_name.to_string(),
properties: serde_json::to_value(self).ok()?,
..Default::default()
})
}
}

View File

@ -1,5 +1,6 @@
use actix_web as aweb;
use aweb::error::{JsonPayloadError, QueryPayloadError};
use byte_unit::Byte;
use meilisearch_types::document_formats::{DocumentFormatError, PayloadType};
use meilisearch_types::error::{Code, ErrorCode, ResponseError};
use meilisearch_types::index_uid::{IndexUid, IndexUidFormatError};
@ -26,8 +27,8 @@ pub enum MeilisearchHttpError {
InvalidExpression(&'static [&'static str], Value),
#[error("A {0} payload is missing.")]
MissingPayload(PayloadType),
#[error("The provided payload reached the size limit.")]
PayloadTooLarge,
#[error("The provided payload reached the size limit. The maximum accepted payload size is {}.", Byte::from_bytes(*.0 as u64).get_appropriate_unit(true))]
PayloadTooLarge(usize),
#[error("Two indexes must be given for each swap. The list `[{}]` contains {} indexes.",
.0.iter().map(|uid| format!("\"{uid}\"")).collect::<Vec<_>>().join(", "), .0.len()
)]
@ -60,9 +61,9 @@ impl ErrorCode for MeilisearchHttpError {
MeilisearchHttpError::MissingPayload(_) => Code::MissingPayload,
MeilisearchHttpError::InvalidContentType(_, _) => Code::InvalidContentType,
MeilisearchHttpError::DocumentNotFound(_) => Code::DocumentNotFound,
MeilisearchHttpError::EmptyFilter => Code::InvalidDocumentDeleteFilter,
MeilisearchHttpError::EmptyFilter => Code::InvalidDocumentFilter,
MeilisearchHttpError::InvalidExpression(_, _) => Code::InvalidSearchFilter,
MeilisearchHttpError::PayloadTooLarge => Code::PayloadTooLarge,
MeilisearchHttpError::PayloadTooLarge(_) => Code::PayloadTooLarge,
MeilisearchHttpError::SwapIndexPayloadWrongLength(_) => Code::InvalidSwapIndexes,
MeilisearchHttpError::IndexUid(e) => e.error_code(),
MeilisearchHttpError::SerdeJson(_) => Code::Internal,

View File

@ -11,6 +11,7 @@ use crate::error::MeilisearchHttpError;
pub struct Payload {
payload: Decompress<dev::Payload>,
limit: usize,
remaining: usize,
}
pub struct PayloadConfig {
@ -43,6 +44,7 @@ impl FromRequest for Payload {
ready(Ok(Payload {
payload: Decompress::from_headers(payload.take(), req.headers()),
limit,
remaining: limit,
}))
}
}
@ -54,12 +56,14 @@ impl Stream for Payload {
fn poll_next(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Option<Self::Item>> {
match Pin::new(&mut self.payload).poll_next(cx) {
Poll::Ready(Some(result)) => match result {
Ok(bytes) => match self.limit.checked_sub(bytes.len()) {
Ok(bytes) => match self.remaining.checked_sub(bytes.len()) {
Some(new_limit) => {
self.limit = new_limit;
self.remaining = new_limit;
Poll::Ready(Some(Ok(bytes)))
}
None => Poll::Ready(Some(Err(MeilisearchHttpError::PayloadTooLarge))),
None => {
Poll::Ready(Some(Err(MeilisearchHttpError::PayloadTooLarge(self.limit))))
}
},
x => Poll::Ready(Some(x.map_err(MeilisearchHttpError::from))),
},

View File

@ -232,6 +232,7 @@ fn open_or_create_database_unchecked(
dumps_path: opt.dump_dir.clone(),
task_db_size: opt.max_task_db_size.get_bytes() as usize,
index_base_map_size: opt.max_index_size.get_bytes() as usize,
enable_mdb_writemap: opt.experimental_reduce_indexing_memory_usage,
indexer_config: (&opt.indexer_options).try_into()?,
autobatching_enabled: true,
max_number_of_tasks: 1_000_000,

View File

@ -29,6 +29,11 @@ fn setup(opt: &Opt) -> anyhow::Result<()> {
async fn main() -> anyhow::Result<()> {
let (opt, config_read_from) = Opt::try_build()?;
anyhow::ensure!(
!(cfg!(windows) && opt.experimental_reduce_indexing_memory_usage),
"The `experimental-reduce-indexing-memory-usage` flag is not supported on Windows"
);
setup(&opt)?;
match (opt.env.as_ref(), &opt.master_key) {

View File

@ -48,6 +48,8 @@ const MEILI_IGNORE_DUMP_IF_DB_EXISTS: &str = "MEILI_IGNORE_DUMP_IF_DB_EXISTS";
const MEILI_DUMP_DIR: &str = "MEILI_DUMP_DIR";
const MEILI_LOG_LEVEL: &str = "MEILI_LOG_LEVEL";
const MEILI_EXPERIMENTAL_ENABLE_METRICS: &str = "MEILI_EXPERIMENTAL_ENABLE_METRICS";
const MEILI_EXPERIMENTAL_REDUCE_INDEXING_MEMORY_USAGE: &str =
"MEILI_EXPERIMENTAL_REDUCE_INDEXING_MEMORY_USAGE";
const DEFAULT_CONFIG_FILE_PATH: &str = "./config.toml";
const DEFAULT_DB_PATH: &str = "./data.ms";
@ -293,6 +295,11 @@ pub struct Opt {
#[serde(default)]
pub experimental_enable_metrics: bool,
/// Experimental RAM reduction during indexing, do not use in production, see: <https://github.com/meilisearch/product/discussions/652>
#[clap(long, env = MEILI_EXPERIMENTAL_REDUCE_INDEXING_MEMORY_USAGE)]
#[serde(default)]
pub experimental_reduce_indexing_memory_usage: bool,
#[serde(flatten)]
#[clap(flatten)]
pub indexer_options: IndexerOpts,
@ -385,6 +392,7 @@ impl Opt {
#[cfg(all(not(debug_assertions), feature = "analytics"))]
no_analytics,
experimental_enable_metrics: enable_metrics_route,
experimental_reduce_indexing_memory_usage: reduce_indexing_memory_usage,
} = self;
export_to_env_if_not_present(MEILI_DB_PATH, db_path);
export_to_env_if_not_present(MEILI_HTTP_ADDR, http_addr);
@ -426,6 +434,10 @@ impl Opt {
MEILI_EXPERIMENTAL_ENABLE_METRICS,
enable_metrics_route.to_string(),
);
export_to_env_if_not_present(
MEILI_EXPERIMENTAL_REDUCE_INDEXING_MEMORY_USAGE,
reduce_indexing_memory_usage.to_string(),
);
indexer_options.export_to_env();
}

View File

@ -29,7 +29,7 @@ use tempfile::tempfile;
use tokio::fs::File;
use tokio::io::{AsyncSeekExt, AsyncWriteExt, BufWriter};
use crate::analytics::{Analytics, DocumentDeletionKind};
use crate::analytics::{Analytics, DocumentDeletionKind, DocumentFetchKind};
use crate::error::MeilisearchHttpError;
use crate::error::PayloadError::ReceivePayload;
use crate::extractors::authentication::policies::*;
@ -97,10 +97,14 @@ pub async fn get_document(
index_scheduler: GuardedData<ActionPolicy<{ actions::DOCUMENTS_GET }>, Data<IndexScheduler>>,
document_param: web::Path<DocumentParam>,
params: AwebQueryParameter<GetDocument, DeserrQueryParamError>,
req: HttpRequest,
analytics: web::Data<dyn Analytics>,
) -> Result<HttpResponse, ResponseError> {
let DocumentParam { index_uid, document_id } = document_param.into_inner();
let index_uid = IndexUid::try_from(index_uid)?;
analytics.get_fetch_documents(&DocumentFetchKind::PerDocumentId, &req);
let GetDocument { fields } = params.into_inner();
let attributes_to_retrieve = fields.merge_star_and_none();
@ -161,16 +165,31 @@ pub async fn documents_by_query_post(
index_scheduler: GuardedData<ActionPolicy<{ actions::DOCUMENTS_GET }>, Data<IndexScheduler>>,
index_uid: web::Path<String>,
body: AwebJson<BrowseQuery, DeserrJsonError>,
req: HttpRequest,
analytics: web::Data<dyn Analytics>,
) -> Result<HttpResponse, ResponseError> {
debug!("called with body: {:?}", body);
documents_by_query(&index_scheduler, index_uid, body.into_inner())
let body = body.into_inner();
analytics.post_fetch_documents(
&DocumentFetchKind::Normal {
with_filter: body.filter.is_some(),
limit: body.limit,
offset: body.offset,
},
&req,
);
documents_by_query(&index_scheduler, index_uid, body)
}
pub async fn get_documents(
index_scheduler: GuardedData<ActionPolicy<{ actions::DOCUMENTS_GET }>, Data<IndexScheduler>>,
index_uid: web::Path<String>,
params: AwebQueryParameter<BrowseQueryGet, DeserrQueryParamError>,
req: HttpRequest,
analytics: web::Data<dyn Analytics>,
) -> Result<HttpResponse, ResponseError> {
debug!("called with params: {:?}", params);
@ -191,6 +210,15 @@ pub async fn get_documents(
filter,
};
analytics.get_fetch_documents(
&DocumentFetchKind::Normal {
with_filter: query.filter.is_some(),
limit: query.limit,
offset: query.offset,
},
&req,
);
documents_by_query(&index_scheduler, index_uid, query)
}
@ -458,7 +486,7 @@ pub async fn delete_documents_batch(
#[derive(Debug, Deserr)]
#[deserr(error = DeserrJsonError, rename_all = camelCase, deny_unknown_fields)]
pub struct DocumentDeletionByFilter {
#[deserr(error = DeserrJsonError<InvalidDocumentDeleteFilter>)]
#[deserr(error = DeserrJsonError<InvalidDocumentFilter>, missing_field_error = DeserrJsonError::missing_document_filter)]
filter: Value,
}
@ -480,8 +508,8 @@ pub async fn delete_documents_by_filter(
|| -> Result<_, ResponseError> {
Ok(crate::search::parse_filter(&filter)?.ok_or(MeilisearchHttpError::EmptyFilter)?)
}()
// and whatever was the error, the error code should always be an InvalidDocumentDeleteFilter
.map_err(|err| ResponseError::from_msg(err.message, Code::InvalidDocumentDeleteFilter))?;
// and whatever was the error, the error code should always be an InvalidDocumentFilter
.map_err(|err| ResponseError::from_msg(err.message, Code::InvalidDocumentFilter))?;
let task = KindWithContent::DocumentDeletionByFilter { index_uid, filter_expr: filter };
let task: SummarizedTaskView =
@ -540,7 +568,12 @@ fn retrieve_documents<S: AsRef<str>>(
};
let candidates = if let Some(filter) = filter {
filter.evaluate(&rtxn, index)?
filter.evaluate(&rtxn, index).map_err(|err| match err {
milli::Error::UserError(milli::UserError::InvalidFilter(_)) => {
ResponseError::from_msg(err.to_string(), Code::InvalidDocumentFilter)
}
e => e.into(),
})?
} else {
index.documents_ids(&rtxn)?
};

View File

@ -407,6 +407,7 @@ make_setting_route!(
json!({
"faceting": {
"max_values_per_facet": setting.as_ref().and_then(|s| s.max_values_per_facet.set()),
"sort_facet_values_by": setting.as_ref().and_then(|s| s.sort_facet_values_by.clone().set()),
},
}),
Some(req),
@ -545,6 +546,10 @@ pub async fn update_all(
.as_ref()
.set()
.and_then(|s| s.max_values_per_facet.as_ref().set()),
"sort_facet_values_by": new_settings.faceting
.as_ref()
.set()
.and_then(|s| s.sort_facet_values_by.as_ref().set()),
},
"pagination": {
"max_total_hits": new_settings.pagination

View File

@ -99,7 +99,7 @@ pub struct DetailsView {
#[serde(skip_serializing_if = "Option::is_none")]
pub deleted_tasks: Option<Option<u64>>,
#[serde(skip_serializing_if = "Option::is_none")]
pub original_filter: Option<String>,
pub original_filter: Option<Option<String>>,
#[serde(skip_serializing_if = "Option::is_none")]
pub dump_uid: Option<Option<String>>,
#[serde(skip_serializing_if = "Option::is_none")]
@ -131,12 +131,13 @@ impl From<Details> for DetailsView {
} => DetailsView {
provided_ids: Some(received_document_ids),
deleted_documents: Some(deleted_documents),
original_filter: Some(None),
..DetailsView::default()
},
Details::DocumentDeletionByFilter { original_filter, deleted_documents } => {
DetailsView {
provided_ids: Some(0),
original_filter: Some(original_filter),
original_filter: Some(Some(original_filter)),
deleted_documents: Some(deleted_documents),
..DetailsView::default()
}
@ -148,7 +149,7 @@ impl From<Details> for DetailsView {
DetailsView {
matched_tasks: Some(matched_tasks),
canceled_tasks: Some(canceled_tasks),
original_filter: Some(original_filter),
original_filter: Some(Some(original_filter)),
..DetailsView::default()
}
}
@ -156,7 +157,7 @@ impl From<Details> for DetailsView {
DetailsView {
matched_tasks: Some(matched_tasks),
deleted_tasks: Some(deleted_tasks),
original_filter: Some(original_filter),
original_filter: Some(Some(original_filter)),
..DetailsView::default()
}
}

View File

@ -5,6 +5,7 @@ use std::time::Instant;
use deserr::Deserr;
use either::Either;
use indexmap::IndexMap;
use meilisearch_auth::IndexSearchRules;
use meilisearch_types::deserr::DeserrJsonError;
use meilisearch_types::error::deserr_codes::*;
@ -213,7 +214,7 @@ pub struct SearchResult {
#[serde(flatten)]
pub hits_info: HitsInfo,
#[serde(skip_serializing_if = "Option::is_none")]
pub facet_distribution: Option<BTreeMap<String, BTreeMap<String, u64>>>,
pub facet_distribution: Option<BTreeMap<String, IndexMap<String, u64>>>,
#[serde(skip_serializing_if = "Option::is_none")]
pub facet_stats: Option<BTreeMap<String, FacetStats>>,
}
@ -448,10 +449,30 @@ pub fn perform_search(
.unwrap_or(DEFAULT_VALUES_PER_FACET);
facet_distribution.max_values_per_facet(max_values_by_facet);
let sort_facet_values_by =
index.sort_facet_values_by(&rtxn).map_err(milli::Error::from)?;
let default_sort_facet_values_by =
sort_facet_values_by.get("*").copied().unwrap_or_default();
if fields.iter().all(|f| f != "*") {
let fields: Vec<_> = fields
.into_iter()
.map(|n| {
(
n,
sort_facet_values_by
.get(n)
.copied()
.unwrap_or(default_sort_facet_values_by),
)
})
.collect();
facet_distribution.facets(fields);
}
let distribution = facet_distribution.candidates(candidates).execute()?;
let distribution = facet_distribution
.candidates(candidates)
.default_order_by(default_sort_facet_values_by)
.execute()?;
let stats = facet_distribution.compute_stats()?;
(Some(distribution), Some(stats))
}

View File

@ -16,8 +16,11 @@ pub static AUTHORIZATIONS: Lazy<HashMap<(&'static str, &'static str), HashSet<&'
("GET", "/indexes/products/search") => hashset!{"search", "*"},
("POST", "/indexes/products/documents") => hashset!{"documents.add", "documents.*", "*"},
("GET", "/indexes/products/documents") => hashset!{"documents.get", "documents.*", "*"},
("POST", "/indexes/products/documents/fetch") => hashset!{"documents.get", "documents.*", "*"},
("GET", "/indexes/products/documents/0") => hashset!{"documents.get", "documents.*", "*"},
("DELETE", "/indexes/products/documents/0") => hashset!{"documents.delete", "documents.*", "*"},
("POST", "/indexes/products/documents/delete-batch") => hashset!{"documents.delete", "documents.*", "*"},
("POST", "/indexes/products/documents/delete") => hashset!{"documents.delete", "documents.*", "*"},
("GET", "/tasks") => hashset!{"tasks.get", "tasks.*", "*"},
("DELETE", "/tasks") => hashset!{"tasks.delete", "tasks.*", "*"},
("GET", "/tasks?indexUid=products") => hashset!{"tasks.get", "tasks.*", "*"},

View File

@ -1781,7 +1781,7 @@ async fn error_add_documents_payload_size() {
snapshot!(json_string!(response, { ".duration" => "[duration]", ".enqueuedAt" => "[date]", ".startedAt" => "[date]", ".finishedAt" => "[date]" }),
@r###"
{
"message": "The provided payload reached the size limit.",
"message": "The provided payload reached the size limit. The maximum accepted payload size is 10.00 MiB.",
"code": "payload_too_large",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#payload_too_large"

View File

@ -180,9 +180,9 @@ async fn get_all_documents_bad_filter() {
snapshot!(json_string!(response), @r###"
{
"message": "Attribute `doggo` is not filterable. This index does not have configured filterable attributes.\n1:6 doggo=bernese",
"code": "invalid_search_filter",
"code": "invalid_document_filter",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_search_filter"
"link": "https://docs.meilisearch.com/errors#invalid_document_filter"
}
"###);
}
@ -547,9 +547,9 @@ async fn delete_document_by_filter() {
snapshot!(json_string!(response), @r###"
{
"message": "Invalid syntax for the filter parameter: `expected String, Array, found: true`.",
"code": "invalid_document_delete_filter",
"code": "invalid_document_filter",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_document_delete_filter"
"link": "https://docs.meilisearch.com/errors#invalid_document_filter"
}
"###);
@ -559,9 +559,9 @@ async fn delete_document_by_filter() {
snapshot!(json_string!(response), @r###"
{
"message": "Was expecting an operation `=`, `!=`, `>=`, `>`, `<=`, `<`, `IN`, `NOT IN`, `TO`, `EXISTS`, `NOT EXISTS`, `IS NULL`, `IS NOT NULL`, `IS EMPTY`, `IS NOT EMPTY`, `_geoRadius`, or `_geoBoundingBox` at `hello`.\n1:6 hello",
"code": "invalid_document_delete_filter",
"code": "invalid_document_filter",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_document_delete_filter"
"link": "https://docs.meilisearch.com/errors#invalid_document_filter"
}
"###);
@ -571,9 +571,21 @@ async fn delete_document_by_filter() {
snapshot!(json_string!(response), @r###"
{
"message": "Sending an empty filter is forbidden.",
"code": "invalid_document_delete_filter",
"code": "invalid_document_filter",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_document_delete_filter"
"link": "https://docs.meilisearch.com/errors#invalid_document_filter"
}
"###);
// do not send any filter
let (response, code) = index.delete_document_by_filter(json!({})).await;
snapshot!(code, @"400 Bad Request");
snapshot!(json_string!(response), @r###"
{
"message": "Missing field `filter`",
"code": "missing_document_filter",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#missing_document_filter"
}
"###);
@ -630,9 +642,9 @@ async fn delete_document_by_filter() {
},
"error": {
"message": "Attribute `doggo` is not filterable. This index does not have configured filterable attributes.\n1:6 doggo = bernese",
"code": "invalid_search_filter",
"code": "invalid_document_filter",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_search_filter"
"link": "https://docs.meilisearch.com/errors#invalid_document_filter"
},
"duration": "[duration]",
"enqueuedAt": "[date]",
@ -664,9 +676,9 @@ async fn delete_document_by_filter() {
},
"error": {
"message": "Attribute `catto` is not filterable. Available filterable attributes are: `doggo`.\n1:6 catto = jorts",
"code": "invalid_search_filter",
"code": "invalid_document_filter",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_search_filter"
"link": "https://docs.meilisearch.com/errors#invalid_document_filter"
},
"duration": "[duration]",
"enqueuedAt": "[date]",
@ -748,4 +760,27 @@ async fn fetch_document_by_filter() {
"link": "https://docs.meilisearch.com/errors#invalid_document_filter"
}
"###);
let (response, code) = index.get_document_by_filter(json!({ "filter": "cool doggo" })).await;
snapshot!(code, @"400 Bad Request");
snapshot!(json_string!(response), @r###"
{
"message": "Was expecting an operation `=`, `!=`, `>=`, `>`, `<=`, `<`, `IN`, `NOT IN`, `TO`, `EXISTS`, `NOT EXISTS`, `IS NULL`, `IS NOT NULL`, `IS EMPTY`, `IS NOT EMPTY`, `_geoRadius`, or `_geoBoundingBox` at `cool doggo`.\n1:11 cool doggo",
"code": "invalid_document_filter",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_document_filter"
}
"###);
let (response, code) =
index.get_document_by_filter(json!({ "filter": "doggo = bernese" })).await;
snapshot!(code, @"400 Bad Request");
snapshot!(json_string!(response), @r###"
{
"message": "Attribute `doggo` is not filterable. Available filterable attributes are: `color`.\n1:6 doggo = bernese",
"code": "invalid_document_filter",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_document_filter"
}
"###);
}

View File

@ -946,7 +946,7 @@ async fn sort_unset_ranking_rule() {
index.wait_task(1).await;
let expected_response = json!({
"message": "The sort ranking rule must be specified in the ranking rules settings to use the sort parameter at search time.",
"message": "You must specify where `sort` is listed in the rankingRules setting to use the sort parameter at search time.",
"code": "invalid_search_sort",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_search_sort"

View File

@ -413,7 +413,7 @@ async fn test_summarized_document_addition_or_update() {
}
#[actix_web::test]
async fn test_summarized_delete_batch() {
async fn test_summarized_delete_documents_by_batch() {
let server = Server::new().await;
let index = server.index("test");
index.delete_batch(vec![1, 2, 3]).await;
@ -430,7 +430,8 @@ async fn test_summarized_delete_batch() {
"canceledBy": null,
"details": {
"providedIds": 3,
"deletedDocuments": 0
"deletedDocuments": 0,
"originalFilter": null
},
"error": {
"message": "Index `test` not found.",
@ -460,7 +461,8 @@ async fn test_summarized_delete_batch() {
"canceledBy": null,
"details": {
"providedIds": 1,
"deletedDocuments": 0
"deletedDocuments": 0,
"originalFilter": null
},
"error": null,
"duration": "[duration]",
@ -472,7 +474,100 @@ async fn test_summarized_delete_batch() {
}
#[actix_web::test]
async fn test_summarized_delete_document() {
async fn test_summarized_delete_documents_by_filter() {
let server = Server::new().await;
let index = server.index("test");
index.delete_document_by_filter(json!({ "filter": "doggo = bernese" })).await;
index.wait_task(0).await;
let (task, _) = index.get_task(0).await;
assert_json_snapshot!(task,
{ ".duration" => "[duration]", ".enqueuedAt" => "[date]", ".startedAt" => "[date]", ".finishedAt" => "[date]" },
@r###"
{
"uid": 0,
"indexUid": "test",
"status": "failed",
"type": "documentDeletion",
"canceledBy": null,
"details": {
"providedIds": 0,
"deletedDocuments": 0,
"originalFilter": "\"doggo = bernese\""
},
"error": {
"message": "Index `test` not found.",
"code": "index_not_found",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#index_not_found"
},
"duration": "[duration]",
"enqueuedAt": "[date]",
"startedAt": "[date]",
"finishedAt": "[date]"
}
"###);
index.create(None).await;
index.delete_document_by_filter(json!({ "filter": "doggo = bernese" })).await;
index.wait_task(2).await;
let (task, _) = index.get_task(2).await;
assert_json_snapshot!(task,
{ ".duration" => "[duration]", ".enqueuedAt" => "[date]", ".startedAt" => "[date]", ".finishedAt" => "[date]" },
@r###"
{
"uid": 2,
"indexUid": "test",
"status": "failed",
"type": "documentDeletion",
"canceledBy": null,
"details": {
"providedIds": 0,
"deletedDocuments": 0,
"originalFilter": "\"doggo = bernese\""
},
"error": {
"message": "Attribute `doggo` is not filterable. This index does not have configured filterable attributes.\n1:6 doggo = bernese",
"code": "invalid_document_filter",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_document_filter"
},
"duration": "[duration]",
"enqueuedAt": "[date]",
"startedAt": "[date]",
"finishedAt": "[date]"
}
"###);
index.update_settings(json!({ "filterableAttributes": ["doggo"] })).await;
index.delete_document_by_filter(json!({ "filter": "doggo = bernese" })).await;
index.wait_task(4).await;
let (task, _) = index.get_task(4).await;
assert_json_snapshot!(task,
{ ".duration" => "[duration]", ".enqueuedAt" => "[date]", ".startedAt" => "[date]", ".finishedAt" => "[date]" },
@r###"
{
"uid": 4,
"indexUid": "test",
"status": "succeeded",
"type": "documentDeletion",
"canceledBy": null,
"details": {
"providedIds": 0,
"deletedDocuments": 0,
"originalFilter": "\"doggo = bernese\""
},
"error": null,
"duration": "[duration]",
"enqueuedAt": "[date]",
"startedAt": "[date]",
"finishedAt": "[date]"
}
"###);
}
#[actix_web::test]
async fn test_summarized_delete_document_by_id() {
let server = Server::new().await;
let index = server.index("test");
index.delete_document(1).await;
@ -489,7 +584,8 @@ async fn test_summarized_delete_document() {
"canceledBy": null,
"details": {
"providedIds": 1,
"deletedDocuments": 0
"deletedDocuments": 0,
"originalFilter": null
},
"error": {
"message": "Index `test` not found.",
@ -519,7 +615,8 @@ async fn test_summarized_delete_document() {
"canceledBy": null,
"details": {
"providedIds": 1,
"deletedDocuments": 0
"deletedDocuments": 0,
"originalFilter": null
},
"error": null,
"duration": "[duration]",

View File

@ -25,8 +25,14 @@ flatten-serde-json = { path = "../flatten-serde-json" }
fst = "0.4.7"
fxhash = "0.2.1"
geoutils = "0.5.1"
grenad = { version = "0.4.4", default-features = false, features = ["tempfile"] }
heed = { git = "https://github.com/meilisearch/heed", tag = "v0.12.5", default-features = false, features = ["lmdb", "sync-read-txn"] }
grenad = { version = "0.4.4", default-features = false, features = [
"tempfile",
] }
heed = { git = "https://github.com/meilisearch/heed", tag = "v0.12.6", default-features = false, features = [
"lmdb",
"sync-read-txn",
] }
indexmap = { version = "1.9.3", features = ["serde"] }
json-depth-checker = { path = "../json-depth-checker" }
levenshtein_automata = { version = "0.2.1", features = ["fst_automaton"] }
memmap2 = "0.5.10"
@ -39,12 +45,17 @@ rstar = { version = "0.10.0", features = ["serde"] }
serde = { version = "1.0.160", features = ["derive"] }
serde_json = { version = "1.0.95", features = ["preserve_order"] }
slice-group-by = "0.3.0"
smallstr = { version = "0.3.0", features = ["serde"] }
smallstr = { version = "0.3.0", features = ["serde"] }
smallvec = "1.10.0"
smartstring = "1.0.1"
tempfile = "3.5.0"
thiserror = "1.0.40"
time = { version = "0.3.20", features = ["serde-well-known", "formatting", "parsing", "macros"] }
time = { version = "0.3.20", features = [
"serde-well-known",
"formatting",
"parsing",
"macros",
] }
uuid = { version = "1.3.1", features = ["v4"] }
filter-parser = { path = "../filter-parser" }
@ -63,13 +74,13 @@ big_s = "1.0.2"
insta = "1.29.0"
maplit = "1.0.2"
md5 = "0.7.0"
rand = {version = "0.8.5", features = ["small_rng"] }
rand = { version = "0.8.5", features = ["small_rng"] }
[target.'cfg(fuzzing)'.dev-dependencies]
fuzzcheck = "0.12.1"
[features]
all-tokenizations = [ "charabia/default" ]
all-tokenizations = ["charabia/default"]
# Use POSIX semaphores instead of SysV semaphores in LMDB
# For more information on this feature, see heed's Cargo.toml

View File

@ -126,7 +126,7 @@ only composed of alphanumeric characters (a-z A-Z 0-9), hyphens (-) and undersco
InvalidSortableAttribute { field: String, valid_fields: BTreeSet<String> },
#[error("{}", HeedError::BadOpenOptions)]
InvalidLmdbOpenOptions,
#[error("The sort ranking rule must be specified in the ranking rules settings to use the sort parameter at search time.")]
#[error("You must specify where `sort` is listed in the rankingRules setting to use the sort parameter at search time.")]
SortRankingRuleMissing,
#[error("The database file is in an invalid state.")]
InvalidStoreFile,

View File

@ -23,8 +23,8 @@ use crate::heed_codec::{ScriptLanguageCodec, StrBEU16Codec, StrRefCodec};
use crate::{
default_criteria, BEU32StrCodec, BoRoaringBitmapCodec, CboRoaringBitmapCodec, Criterion,
DocumentId, ExternalDocumentsIds, FacetDistribution, FieldDistribution, FieldId,
FieldIdWordCountCodec, GeoPoint, ObkvCodec, Result, RoaringBitmapCodec, RoaringBitmapLenCodec,
Search, U8StrStrCodec, BEU16, BEU32,
FieldIdWordCountCodec, GeoPoint, ObkvCodec, OrderBy, Result, RoaringBitmapCodec,
RoaringBitmapLenCodec, Search, U8StrStrCodec, BEU16, BEU32,
};
pub const DEFAULT_MIN_WORD_LEN_ONE_TYPO: u8 = 5;
@ -62,6 +62,7 @@ pub mod main_key {
pub const EXACT_WORDS: &str = "exact-words";
pub const EXACT_ATTRIBUTES: &str = "exact-attributes";
pub const MAX_VALUES_PER_FACET: &str = "max-values-per-facet";
pub const SORT_FACET_VALUES_BY: &str = "sort-facet-values-by";
pub const PAGINATION_MAX_TOTAL_HITS: &str = "pagination-max-total-hits";
}
@ -170,33 +171,46 @@ impl Index {
unsafe { options.flag(Flags::MdbAlwaysFreePages) };
let env = options.open(path)?;
let main = env.create_poly_database(Some(MAIN))?;
let word_docids = env.create_database(Some(WORD_DOCIDS))?;
let exact_word_docids = env.create_database(Some(EXACT_WORD_DOCIDS))?;
let word_prefix_docids = env.create_database(Some(WORD_PREFIX_DOCIDS))?;
let exact_word_prefix_docids = env.create_database(Some(EXACT_WORD_PREFIX_DOCIDS))?;
let docid_word_positions = env.create_database(Some(DOCID_WORD_POSITIONS))?;
let word_pair_proximity_docids = env.create_database(Some(WORD_PAIR_PROXIMITY_DOCIDS))?;
let script_language_docids = env.create_database(Some(SCRIPT_LANGUAGE_DOCIDS))?;
let mut wtxn = env.write_txn()?;
let main = env.create_poly_database(&mut wtxn, Some(MAIN))?;
let word_docids = env.create_database(&mut wtxn, Some(WORD_DOCIDS))?;
let exact_word_docids = env.create_database(&mut wtxn, Some(EXACT_WORD_DOCIDS))?;
let word_prefix_docids = env.create_database(&mut wtxn, Some(WORD_PREFIX_DOCIDS))?;
let exact_word_prefix_docids =
env.create_database(&mut wtxn, Some(EXACT_WORD_PREFIX_DOCIDS))?;
let docid_word_positions = env.create_database(&mut wtxn, Some(DOCID_WORD_POSITIONS))?;
let word_pair_proximity_docids =
env.create_database(&mut wtxn, Some(WORD_PAIR_PROXIMITY_DOCIDS))?;
let script_language_docids =
env.create_database(&mut wtxn, Some(SCRIPT_LANGUAGE_DOCIDS))?;
let word_prefix_pair_proximity_docids =
env.create_database(Some(WORD_PREFIX_PAIR_PROXIMITY_DOCIDS))?;
env.create_database(&mut wtxn, Some(WORD_PREFIX_PAIR_PROXIMITY_DOCIDS))?;
let prefix_word_pair_proximity_docids =
env.create_database(Some(PREFIX_WORD_PAIR_PROXIMITY_DOCIDS))?;
let word_position_docids = env.create_database(Some(WORD_POSITION_DOCIDS))?;
let word_fid_docids = env.create_database(Some(WORD_FIELD_ID_DOCIDS))?;
let field_id_word_count_docids = env.create_database(Some(FIELD_ID_WORD_COUNT_DOCIDS))?;
let word_prefix_position_docids = env.create_database(Some(WORD_PREFIX_POSITION_DOCIDS))?;
let word_prefix_fid_docids = env.create_database(Some(WORD_PREFIX_FIELD_ID_DOCIDS))?;
let facet_id_f64_docids = env.create_database(Some(FACET_ID_F64_DOCIDS))?;
let facet_id_string_docids = env.create_database(Some(FACET_ID_STRING_DOCIDS))?;
let facet_id_exists_docids = env.create_database(Some(FACET_ID_EXISTS_DOCIDS))?;
let facet_id_is_null_docids = env.create_database(Some(FACET_ID_IS_NULL_DOCIDS))?;
let facet_id_is_empty_docids = env.create_database(Some(FACET_ID_IS_EMPTY_DOCIDS))?;
env.create_database(&mut wtxn, Some(PREFIX_WORD_PAIR_PROXIMITY_DOCIDS))?;
let word_position_docids = env.create_database(&mut wtxn, Some(WORD_POSITION_DOCIDS))?;
let word_fid_docids = env.create_database(&mut wtxn, Some(WORD_FIELD_ID_DOCIDS))?;
let field_id_word_count_docids =
env.create_database(&mut wtxn, Some(FIELD_ID_WORD_COUNT_DOCIDS))?;
let word_prefix_position_docids =
env.create_database(&mut wtxn, Some(WORD_PREFIX_POSITION_DOCIDS))?;
let word_prefix_fid_docids =
env.create_database(&mut wtxn, Some(WORD_PREFIX_FIELD_ID_DOCIDS))?;
let facet_id_f64_docids = env.create_database(&mut wtxn, Some(FACET_ID_F64_DOCIDS))?;
let facet_id_string_docids =
env.create_database(&mut wtxn, Some(FACET_ID_STRING_DOCIDS))?;
let facet_id_exists_docids =
env.create_database(&mut wtxn, Some(FACET_ID_EXISTS_DOCIDS))?;
let facet_id_is_null_docids =
env.create_database(&mut wtxn, Some(FACET_ID_IS_NULL_DOCIDS))?;
let facet_id_is_empty_docids =
env.create_database(&mut wtxn, Some(FACET_ID_IS_EMPTY_DOCIDS))?;
let field_id_docid_facet_f64s = env.create_database(Some(FIELD_ID_DOCID_FACET_F64S))?;
let field_id_docid_facet_f64s =
env.create_database(&mut wtxn, Some(FIELD_ID_DOCID_FACET_F64S))?;
let field_id_docid_facet_strings =
env.create_database(Some(FIELD_ID_DOCID_FACET_STRINGS))?;
let documents = env.create_database(Some(DOCUMENTS))?;
env.create_database(&mut wtxn, Some(FIELD_ID_DOCID_FACET_STRINGS))?;
let documents = env.create_database(&mut wtxn, Some(DOCUMENTS))?;
wtxn.commit()?;
Index::set_creation_dates(&env, main, created_at, updated_at)?;
@ -1221,6 +1235,31 @@ impl Index {
self.main.delete::<_, Str>(txn, main_key::MAX_VALUES_PER_FACET)
}
pub fn sort_facet_values_by(&self, txn: &RoTxn) -> heed::Result<HashMap<String, OrderBy>> {
let mut orders = self
.main
.get::<_, Str, SerdeJson<HashMap<String, OrderBy>>>(
txn,
main_key::SORT_FACET_VALUES_BY,
)?
.unwrap_or_default();
// Insert the default ordering if it is not already overwritten by the user.
orders.entry("*".to_string()).or_insert(OrderBy::Lexicographic);
Ok(orders)
}
pub(crate) fn put_sort_facet_values_by(
&self,
txn: &mut RwTxn,
val: &HashMap<String, OrderBy>,
) -> heed::Result<()> {
self.main.put::<_, Str, SerdeJson<_>>(txn, main_key::SORT_FACET_VALUES_BY, &val)
}
pub(crate) fn delete_sort_facet_values_by(&self, txn: &mut RwTxn) -> heed::Result<bool> {
self.main.delete::<_, Str>(txn, main_key::SORT_FACET_VALUES_BY)
}
pub fn pagination_max_total_hits(&self, txn: &RoTxn) -> heed::Result<Option<usize>> {
self.main.get::<_, Str, OwnedType<usize>>(txn, main_key::PAGINATION_MAX_TOTAL_HITS)
}
@ -1459,9 +1498,9 @@ pub(crate) mod tests {
db_snap!(index, field_distribution,
@r###"
age 1
id 2
name 2
age 1
id 2
name 2
"###
);
@ -1479,9 +1518,9 @@ pub(crate) mod tests {
db_snap!(index, field_distribution,
@r###"
age 1
id 2
name 2
age 1
id 2
name 2
"###
);
@ -1495,9 +1534,9 @@ pub(crate) mod tests {
db_snap!(index, field_distribution,
@r###"
has_dog 1
id 2
name 2
has_dog 1
id 2
name 2
"###
);
}

View File

@ -99,8 +99,8 @@ pub use self::heed_codec::{
};
pub use self::index::Index;
pub use self::search::{
FacetDistribution, Filter, FormatOptions, MatchBounds, MatcherBuilder, MatchingWords, Search,
SearchResult, TermsMatchingStrategy, DEFAULT_VALUES_PER_FACET,
FacetDistribution, Filter, FormatOptions, MatchBounds, MatcherBuilder, MatchingWords, OrderBy,
Search, SearchResult, TermsMatchingStrategy, DEFAULT_VALUES_PER_FACET,
};
pub type Result<T> = std::result::Result<T, error::Error>;

View File

@ -1,19 +1,22 @@
use std::collections::{BTreeMap, HashSet};
use std::collections::{BTreeMap, HashMap, HashSet};
use std::ops::ControlFlow;
use std::{fmt, mem};
use heed::types::ByteSlice;
use heed::BytesDecode;
use indexmap::IndexMap;
use roaring::RoaringBitmap;
use serde::{Deserialize, Serialize};
use crate::error::UserError;
use crate::facet::FacetType;
use crate::heed_codec::facet::{
FacetGroupKeyCodec, FacetGroupValueCodec, FieldDocIdFacetF64Codec, FieldDocIdFacetStringCodec,
OrderedF64Codec,
FacetGroupKeyCodec, FieldDocIdFacetF64Codec, FieldDocIdFacetStringCodec, OrderedF64Codec,
};
use crate::heed_codec::{ByteSliceRefCodec, StrRefCodec};
use crate::search::facet::facet_distribution_iter;
use crate::search::facet::facet_distribution_iter::{
count_iterate_over_facet_distribution, lexicographically_iterate_over_facet_distribution,
};
use crate::{FieldId, Index, Result};
/// The default number of values by facets that will
@ -24,10 +27,21 @@ pub const DEFAULT_VALUES_PER_FACET: usize = 100;
/// the system to choose between one algorithm or another.
const CANDIDATES_THRESHOLD: u64 = 3000;
/// How should we fetch the facets?
#[derive(Debug, Default, Clone, Copy, PartialEq, Eq, Serialize, Deserialize)]
pub enum OrderBy {
/// By lexicographic order...
#[default]
Lexicographic,
/// Or by number of docids in common?
Count,
}
pub struct FacetDistribution<'a> {
facets: Option<HashSet<String>>,
facets: Option<HashMap<String, OrderBy>>,
candidates: Option<RoaringBitmap>,
max_values_per_facet: usize,
default_order_by: OrderBy,
rtxn: &'a heed::RoTxn<'a>,
index: &'a Index,
}
@ -38,13 +52,22 @@ impl<'a> FacetDistribution<'a> {
facets: None,
candidates: None,
max_values_per_facet: DEFAULT_VALUES_PER_FACET,
default_order_by: OrderBy::default(),
rtxn,
index,
}
}
pub fn facets<I: IntoIterator<Item = A>, A: AsRef<str>>(&mut self, names: I) -> &mut Self {
self.facets = Some(names.into_iter().map(|s| s.as_ref().to_string()).collect());
pub fn facets<I: IntoIterator<Item = (A, OrderBy)>, A: AsRef<str>>(
&mut self,
names_ordered_by: I,
) -> &mut Self {
self.facets = Some(
names_ordered_by
.into_iter()
.map(|(name, order_by)| (name.as_ref().to_string(), order_by))
.collect(),
);
self
}
@ -53,6 +76,11 @@ impl<'a> FacetDistribution<'a> {
self
}
pub fn default_order_by(&mut self, order_by: OrderBy) -> &mut Self {
self.default_order_by = order_by;
self
}
pub fn candidates(&mut self, candidates: RoaringBitmap) -> &mut Self {
self.candidates = Some(candidates);
self
@ -65,7 +93,7 @@ impl<'a> FacetDistribution<'a> {
field_id: FieldId,
facet_type: FacetType,
candidates: &RoaringBitmap,
distribution: &mut BTreeMap<String, u64>,
distribution: &mut IndexMap<String, u64>,
) -> heed::Result<()> {
match facet_type {
FacetType::Number => {
@ -134,9 +162,15 @@ impl<'a> FacetDistribution<'a> {
&self,
field_id: FieldId,
candidates: &RoaringBitmap,
distribution: &mut BTreeMap<String, u64>,
order_by: OrderBy,
distribution: &mut IndexMap<String, u64>,
) -> heed::Result<()> {
facet_distribution_iter::iterate_over_facet_distribution(
let search_function = match order_by {
OrderBy::Lexicographic => lexicographically_iterate_over_facet_distribution,
OrderBy::Count => count_iterate_over_facet_distribution,
};
search_function(
self.rtxn,
self.index
.facet_id_f64_docids
@ -159,9 +193,15 @@ impl<'a> FacetDistribution<'a> {
&self,
field_id: FieldId,
candidates: &RoaringBitmap,
distribution: &mut BTreeMap<String, u64>,
order_by: OrderBy,
distribution: &mut IndexMap<String, u64>,
) -> heed::Result<()> {
facet_distribution_iter::iterate_over_facet_distribution(
let search_function = match order_by {
OrderBy::Lexicographic => lexicographically_iterate_over_facet_distribution,
OrderBy::Count => count_iterate_over_facet_distribution,
};
search_function(
self.rtxn,
self.index
.facet_id_string_docids
@ -189,93 +229,48 @@ impl<'a> FacetDistribution<'a> {
)
}
/// Placeholder search, a.k.a. no candidates were specified. We iterate throught the
/// facet values one by one and iterate on the facet level 0 for numbers.
fn facet_values_from_raw_facet_database(
fn facet_values(
&self,
field_id: FieldId,
) -> heed::Result<BTreeMap<String, u64>> {
let mut distribution = BTreeMap::new();
let db = self.index.facet_id_f64_docids;
let mut prefix = vec![];
prefix.extend_from_slice(&field_id.to_be_bytes());
prefix.push(0); // read values from level 0 only
let iter = db
.as_polymorph()
.prefix_iter::<_, ByteSlice, ByteSlice>(self.rtxn, prefix.as_slice())?
.remap_types::<FacetGroupKeyCodec<OrderedF64Codec>, FacetGroupValueCodec>();
for result in iter {
let (key, value) = result?;
distribution.insert(key.left_bound.to_string(), value.bitmap.len());
if distribution.len() == self.max_values_per_facet {
break;
}
}
let iter = self
.index
.facet_id_string_docids
.as_polymorph()
.prefix_iter::<_, ByteSlice, ByteSlice>(self.rtxn, prefix.as_slice())?
.remap_types::<FacetGroupKeyCodec<StrRefCodec>, FacetGroupValueCodec>();
for result in iter {
let (key, value) = result?;
let docid = value.bitmap.iter().next().unwrap();
let key: (FieldId, _, &'a str) = (field_id, docid, key.left_bound);
let original_string =
self.index.field_id_docid_facet_strings.get(self.rtxn, &key)?.unwrap().to_owned();
distribution.insert(original_string, value.bitmap.len());
if distribution.len() == self.max_values_per_facet {
break;
}
}
Ok(distribution)
}
fn facet_values(&self, field_id: FieldId) -> heed::Result<BTreeMap<String, u64>> {
order_by: OrderBy,
) -> heed::Result<IndexMap<String, u64>> {
use FacetType::{Number, String};
match self.candidates {
Some(ref candidates) => {
let mut distribution = IndexMap::new();
match (order_by, &self.candidates) {
(OrderBy::Lexicographic, Some(cnd)) if cnd.len() <= CANDIDATES_THRESHOLD => {
// Classic search, candidates were specified, we must return facet values only related
// to those candidates. We also enter here for facet strings for performance reasons.
let mut distribution = BTreeMap::new();
if candidates.len() <= CANDIDATES_THRESHOLD {
self.facet_distribution_from_documents(
field_id,
Number,
candidates,
&mut distribution,
)?;
self.facet_distribution_from_documents(
field_id,
String,
candidates,
&mut distribution,
)?;
} else {
self.facet_numbers_distribution_from_facet_levels(
field_id,
candidates,
&mut distribution,
)?;
self.facet_strings_distribution_from_facet_levels(
field_id,
candidates,
&mut distribution,
)?;
}
Ok(distribution)
self.facet_distribution_from_documents(field_id, Number, cnd, &mut distribution)?;
self.facet_distribution_from_documents(field_id, String, cnd, &mut distribution)?;
}
None => self.facet_values_from_raw_facet_database(field_id),
}
_ => {
let universe;
let candidates;
match &self.candidates {
Some(cnd) => candidates = cnd,
None => {
universe = self.index.documents_ids(self.rtxn)?;
candidates = &universe;
}
}
self.facet_numbers_distribution_from_facet_levels(
field_id,
candidates,
order_by,
&mut distribution,
)?;
self.facet_strings_distribution_from_facet_levels(
field_id,
candidates,
order_by,
&mut distribution,
)?;
}
};
Ok(distribution)
}
pub fn compute_stats(&self) -> Result<BTreeMap<String, (f64, f64)>> {
@ -291,6 +286,7 @@ impl<'a> FacetDistribution<'a> {
Some(facets) => {
let invalid_fields: HashSet<_> = facets
.iter()
.map(|(name, _)| name)
.filter(|facet| !crate::is_faceted(facet, &filterable_fields))
.collect();
if !invalid_fields.is_empty() {
@ -300,7 +296,7 @@ impl<'a> FacetDistribution<'a> {
}
.into());
} else {
facets.clone()
facets.into_iter().map(|(name, _)| name).cloned().collect()
}
}
None => filterable_fields,
@ -337,7 +333,7 @@ impl<'a> FacetDistribution<'a> {
Ok(distribution)
}
pub fn execute(&self) -> Result<BTreeMap<String, BTreeMap<String, u64>>> {
pub fn execute(&self) -> Result<BTreeMap<String, IndexMap<String, u64>>> {
let fields_ids_map = self.index.fields_ids_map(self.rtxn)?;
let filterable_fields = self.index.filterable_fields(self.rtxn)?;
@ -345,6 +341,7 @@ impl<'a> FacetDistribution<'a> {
Some(ref facets) => {
let invalid_fields: HashSet<_> = facets
.iter()
.map(|(name, _)| name)
.filter(|facet| !crate::is_faceted(facet, &filterable_fields))
.collect();
if !invalid_fields.is_empty() {
@ -354,7 +351,7 @@ impl<'a> FacetDistribution<'a> {
}
.into());
} else {
facets.clone()
facets.into_iter().map(|(name, _)| name).cloned().collect()
}
}
None => filterable_fields,
@ -363,7 +360,13 @@ impl<'a> FacetDistribution<'a> {
let mut distribution = BTreeMap::new();
for (fid, name) in fields_ids_map.iter() {
if crate::is_faceted(name, &fields) {
let values = self.facet_values(fid)?;
let order_by = self
.facets
.as_ref()
.map(|facets| facets.get(name).copied())
.flatten()
.unwrap_or(self.default_order_by);
let values = self.facet_values(fid, order_by)?;
distribution.insert(name.to_string(), values);
}
}
@ -374,13 +377,20 @@ impl<'a> FacetDistribution<'a> {
impl fmt::Debug for FacetDistribution<'_> {
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
let FacetDistribution { facets, candidates, max_values_per_facet, rtxn: _, index: _ } =
self;
let FacetDistribution {
facets,
candidates,
max_values_per_facet,
default_order_by,
rtxn: _,
index: _,
} = self;
f.debug_struct("FacetDistribution")
.field("facets", facets)
.field("candidates", candidates)
.field("max_values_per_facet", max_values_per_facet)
.field("default_order_by", default_order_by)
.finish()
}
}
@ -392,7 +402,7 @@ mod tests {
use crate::documents::documents_batch_reader_from_objects;
use crate::index::tests::TempIndex;
use crate::{milli_snap, FacetDistribution};
use crate::{milli_snap, FacetDistribution, OrderBy};
#[test]
fn few_candidates_few_facet_values() {
@ -417,14 +427,14 @@ mod tests {
let txn = index.read_txn().unwrap();
let map = FacetDistribution::new(&txn, &index)
.facets(std::iter::once("colour"))
.facets(std::iter::once(("colour", OrderBy::default())))
.execute()
.unwrap();
milli_snap!(format!("{map:?}"), @r###"{"colour": {"Blue": 2, "RED": 1}}"###);
let map = FacetDistribution::new(&txn, &index)
.facets(std::iter::once("colour"))
.facets(std::iter::once(("colour", OrderBy::default())))
.candidates([0, 1, 2].iter().copied().collect())
.execute()
.unwrap();
@ -432,7 +442,7 @@ mod tests {
milli_snap!(format!("{map:?}"), @r###"{"colour": {"Blue": 2, "RED": 1}}"###);
let map = FacetDistribution::new(&txn, &index)
.facets(std::iter::once("colour"))
.facets(std::iter::once(("colour", OrderBy::default())))
.candidates([1, 2].iter().copied().collect())
.execute()
.unwrap();
@ -443,7 +453,7 @@ mod tests {
milli_snap!(format!("{map:?}"), @r###"{"colour": {" blue": 1, "RED": 1}}"###);
let map = FacetDistribution::new(&txn, &index)
.facets(std::iter::once("colour"))
.facets(std::iter::once(("colour", OrderBy::default())))
.candidates([2].iter().copied().collect())
.execute()
.unwrap();
@ -451,7 +461,7 @@ mod tests {
milli_snap!(format!("{map:?}"), @r###"{"colour": {"RED": 1}}"###);
let map = FacetDistribution::new(&txn, &index)
.facets(std::iter::once("colour"))
.facets(std::iter::once(("colour", OrderBy::default())))
.candidates([0, 1, 2].iter().copied().collect())
.max_values_per_facet(1)
.execute()
@ -489,14 +499,14 @@ mod tests {
let txn = index.read_txn().unwrap();
let map = FacetDistribution::new(&txn, &index)
.facets(std::iter::once("colour"))
.facets(std::iter::once(("colour", OrderBy::default())))
.execute()
.unwrap();
milli_snap!(format!("{map:?}"), @r###"{"colour": {"Blue": 4000, "Red": 6000}}"###);
let map = FacetDistribution::new(&txn, &index)
.facets(std::iter::once("colour"))
.facets(std::iter::once(("colour", OrderBy::default())))
.max_values_per_facet(1)
.execute()
.unwrap();
@ -504,7 +514,7 @@ mod tests {
milli_snap!(format!("{map:?}"), @r###"{"colour": {"Blue": 4000}}"###);
let map = FacetDistribution::new(&txn, &index)
.facets(std::iter::once("colour"))
.facets(std::iter::once(("colour", OrderBy::default())))
.candidates((0..10_000).collect())
.execute()
.unwrap();
@ -512,7 +522,7 @@ mod tests {
milli_snap!(format!("{map:?}"), @r###"{"colour": {"Blue": 4000, "Red": 6000}}"###);
let map = FacetDistribution::new(&txn, &index)
.facets(std::iter::once("colour"))
.facets(std::iter::once(("colour", OrderBy::default())))
.candidates((0..5_000).collect())
.execute()
.unwrap();
@ -520,7 +530,7 @@ mod tests {
milli_snap!(format!("{map:?}"), @r###"{"colour": {"Blue": 2000, "Red": 3000}}"###);
let map = FacetDistribution::new(&txn, &index)
.facets(std::iter::once("colour"))
.facets(std::iter::once(("colour", OrderBy::default())))
.candidates((0..5_000).collect())
.execute()
.unwrap();
@ -528,7 +538,7 @@ mod tests {
milli_snap!(format!("{map:?}"), @r###"{"colour": {"Blue": 2000, "Red": 3000}}"###);
let map = FacetDistribution::new(&txn, &index)
.facets(std::iter::once("colour"))
.facets(std::iter::once(("colour", OrderBy::default())))
.candidates((0..5_000).collect())
.max_values_per_facet(1)
.execute()
@ -566,14 +576,14 @@ mod tests {
let txn = index.read_txn().unwrap();
let map = FacetDistribution::new(&txn, &index)
.facets(std::iter::once("colour"))
.facets(std::iter::once(("colour", OrderBy::default())))
.execute()
.unwrap();
milli_snap!(format!("{map:?}"), "no_candidates", @"ac9229ed5964d893af96a7076e2f8af5");
let map = FacetDistribution::new(&txn, &index)
.facets(std::iter::once("colour"))
.facets(std::iter::once(("colour", OrderBy::default())))
.max_values_per_facet(2)
.execute()
.unwrap();
@ -581,7 +591,7 @@ mod tests {
milli_snap!(format!("{map:?}"), "no_candidates_with_max_2", @r###"{"colour": {"0": 10, "1": 10}}"###);
let map = FacetDistribution::new(&txn, &index)
.facets(std::iter::once("colour"))
.facets(std::iter::once(("colour", OrderBy::default())))
.candidates((0..10_000).collect())
.execute()
.unwrap();
@ -589,7 +599,7 @@ mod tests {
milli_snap!(format!("{map:?}"), "candidates_0_10_000", @"ac9229ed5964d893af96a7076e2f8af5");
let map = FacetDistribution::new(&txn, &index)
.facets(std::iter::once("colour"))
.facets(std::iter::once(("colour", OrderBy::default())))
.candidates((0..5_000).collect())
.execute()
.unwrap();
@ -626,14 +636,14 @@ mod tests {
let txn = index.read_txn().unwrap();
let map = FacetDistribution::new(&txn, &index)
.facets(std::iter::once("colour"))
.facets(std::iter::once(("colour", OrderBy::default())))
.compute_stats()
.unwrap();
milli_snap!(format!("{map:?}"), "no_candidates", @"{}");
let map = FacetDistribution::new(&txn, &index)
.facets(std::iter::once("colour"))
.facets(std::iter::once(("colour", OrderBy::default())))
.candidates((0..1000).collect())
.compute_stats()
.unwrap();
@ -641,7 +651,7 @@ mod tests {
milli_snap!(format!("{map:?}"), "candidates_0_1000", @r###"{"colour": (0.0, 999.0)}"###);
let map = FacetDistribution::new(&txn, &index)
.facets(std::iter::once("colour"))
.facets(std::iter::once(("colour", OrderBy::default())))
.candidates((217..777).collect())
.compute_stats()
.unwrap();
@ -678,14 +688,14 @@ mod tests {
let txn = index.read_txn().unwrap();
let map = FacetDistribution::new(&txn, &index)
.facets(std::iter::once("colour"))
.facets(std::iter::once(("colour", OrderBy::default())))
.compute_stats()
.unwrap();
milli_snap!(format!("{map:?}"), "no_candidates", @"{}");
let map = FacetDistribution::new(&txn, &index)
.facets(std::iter::once("colour"))
.facets(std::iter::once(("colour", OrderBy::default())))
.candidates((0..1000).collect())
.compute_stats()
.unwrap();
@ -693,7 +703,7 @@ mod tests {
milli_snap!(format!("{map:?}"), "candidates_0_1000", @r###"{"colour": (0.0, 1999.0)}"###);
let map = FacetDistribution::new(&txn, &index)
.facets(std::iter::once("colour"))
.facets(std::iter::once(("colour", OrderBy::default())))
.candidates((217..777).collect())
.compute_stats()
.unwrap();
@ -730,14 +740,14 @@ mod tests {
let txn = index.read_txn().unwrap();
let map = FacetDistribution::new(&txn, &index)
.facets(std::iter::once("colour"))
.facets(std::iter::once(("colour", OrderBy::default())))
.compute_stats()
.unwrap();
milli_snap!(format!("{map:?}"), "no_candidates", @"{}");
let map = FacetDistribution::new(&txn, &index)
.facets(std::iter::once("colour"))
.facets(std::iter::once(("colour", OrderBy::default())))
.candidates((0..1000).collect())
.compute_stats()
.unwrap();
@ -745,7 +755,7 @@ mod tests {
milli_snap!(format!("{map:?}"), "candidates_0_1000", @r###"{"colour": (0.0, 999.0)}"###);
let map = FacetDistribution::new(&txn, &index)
.facets(std::iter::once("colour"))
.facets(std::iter::once(("colour", OrderBy::default())))
.candidates((217..777).collect())
.compute_stats()
.unwrap();
@ -786,14 +796,14 @@ mod tests {
let txn = index.read_txn().unwrap();
let map = FacetDistribution::new(&txn, &index)
.facets(std::iter::once("colour"))
.facets(std::iter::once(("colour", OrderBy::default())))
.compute_stats()
.unwrap();
milli_snap!(format!("{map:?}"), "no_candidates", @"{}");
let map = FacetDistribution::new(&txn, &index)
.facets(std::iter::once("colour"))
.facets(std::iter::once(("colour", OrderBy::default())))
.candidates((0..1000).collect())
.compute_stats()
.unwrap();
@ -801,7 +811,7 @@ mod tests {
milli_snap!(format!("{map:?}"), "candidates_0_1000", @r###"{"colour": (0.0, 1998.0)}"###);
let map = FacetDistribution::new(&txn, &index)
.facets(std::iter::once("colour"))
.facets(std::iter::once(("colour", OrderBy::default())))
.candidates((217..777).collect())
.compute_stats()
.unwrap();

View File

@ -1,3 +1,5 @@
use std::cmp::Reverse;
use std::collections::BinaryHeap;
use std::ops::ControlFlow;
use heed::Result;
@ -19,7 +21,7 @@ use crate::DocumentId;
///
/// The return value of the closure is a `ControlFlow<()>` which indicates whether we should
/// keep iterating over the different facet values or stop.
pub fn iterate_over_facet_distribution<'t, CB>(
pub fn lexicographically_iterate_over_facet_distribution<'t, CB>(
rtxn: &'t heed::RoTxn<'t>,
db: heed::Database<FacetGroupKeyCodec<ByteSliceRefCodec>, FacetGroupValueCodec>,
field_id: u16,
@ -29,7 +31,7 @@ pub fn iterate_over_facet_distribution<'t, CB>(
where
CB: FnMut(&'t [u8], u64, DocumentId) -> Result<ControlFlow<()>>,
{
let mut fd = FacetDistribution { rtxn, db, field_id, callback };
let mut fd = LexicographicFacetDistribution { rtxn, db, field_id, callback };
let highest_level = get_highest_level(
rtxn,
db.remap_key_type::<FacetGroupKeyCodec<ByteSliceRefCodec>>(),
@ -44,7 +46,99 @@ where
}
}
struct FacetDistribution<'t, CB>
pub fn count_iterate_over_facet_distribution<'t, CB>(
rtxn: &'t heed::RoTxn<'t>,
db: heed::Database<FacetGroupKeyCodec<ByteSliceRefCodec>, FacetGroupValueCodec>,
field_id: u16,
candidates: &RoaringBitmap,
mut callback: CB,
) -> Result<()>
where
CB: FnMut(&'t [u8], u64, DocumentId) -> Result<ControlFlow<()>>,
{
#[derive(Debug, PartialOrd, Ord, PartialEq, Eq)]
struct LevelEntry<'t> {
/// The number of candidates in this entry.
count: u64,
/// The key level of the entry.
level: Reverse<u8>,
/// The left bound key.
left_bound: &'t [u8],
/// The number of keys we must look for after `left_bound`.
group_size: u8,
/// Any docid in the set of matching documents. Used to find the original facet string.
any_docid: u32,
}
// Represents the list of keys that we must explore.
let mut heap = BinaryHeap::new();
let highest_level = get_highest_level(
rtxn,
db.remap_key_type::<FacetGroupKeyCodec<ByteSliceRefCodec>>(),
field_id,
)?;
if let Some(first_bound) = get_first_facet_value::<ByteSliceRefCodec>(rtxn, db, field_id)? {
// We first fill the heap with values from the highest level
let starting_key =
FacetGroupKey { field_id, level: highest_level, left_bound: first_bound };
for el in db.range(rtxn, &(&starting_key..)).unwrap().take(usize::MAX) {
let (key, value) = el.unwrap();
// The range is unbounded on the right and the group size for the highest level is MAX,
// so we need to check that we are not iterating over the next field id
if key.field_id != field_id {
break;
}
let intersection = value.bitmap & candidates;
let count = intersection.len();
if count != 0 {
heap.push(LevelEntry {
count,
level: Reverse(key.level),
left_bound: key.left_bound,
group_size: value.size,
any_docid: intersection.min().unwrap(),
});
}
}
while let Some(LevelEntry { count, level, left_bound, group_size, any_docid }) = heap.pop()
{
if let Reverse(0) = level {
match (callback)(left_bound, count, any_docid)? {
ControlFlow::Continue(_) => (),
ControlFlow::Break(_) => return Ok(()),
}
} else {
let starting_key = FacetGroupKey { field_id, level: level.0 - 1, left_bound };
for el in db.range(rtxn, &(&starting_key..)).unwrap().take(group_size as usize) {
let (key, value) = el.unwrap();
// The range is unbounded on the right and the group size for the highest level is MAX,
// so we need to check that we are not iterating over the next field id
if key.field_id != field_id {
break;
}
let intersection = value.bitmap & candidates;
let count = intersection.len();
if count != 0 {
heap.push(LevelEntry {
count,
level: Reverse(key.level),
left_bound: key.left_bound,
group_size: value.size,
any_docid: intersection.min().unwrap(),
});
}
}
}
}
}
Ok(())
}
/// Iterate over the facets values by lexicographic order.
struct LexicographicFacetDistribution<'t, CB>
where
CB: FnMut(&'t [u8], u64, DocumentId) -> Result<ControlFlow<()>>,
{
@ -54,7 +148,7 @@ where
callback: CB,
}
impl<'t, CB> FacetDistribution<'t, CB>
impl<'t, CB> LexicographicFacetDistribution<'t, CB>
where
CB: FnMut(&'t [u8], u64, DocumentId) -> Result<ControlFlow<()>>,
{
@ -86,6 +180,7 @@ where
}
Ok(ControlFlow::Continue(()))
}
fn iterate(
&mut self,
candidates: &RoaringBitmap,
@ -116,7 +211,7 @@ where
value.size as usize,
)?;
match cf {
ControlFlow::Continue(_) => {}
ControlFlow::Continue(_) => (),
ControlFlow::Break(_) => return Ok(ControlFlow::Break(())),
}
}
@ -132,7 +227,7 @@ mod tests {
use heed::BytesDecode;
use roaring::RoaringBitmap;
use super::iterate_over_facet_distribution;
use super::lexicographically_iterate_over_facet_distribution;
use crate::heed_codec::facet::OrderedF64Codec;
use crate::milli_snap;
use crate::search::facet::tests::{get_random_looking_index, get_simple_index};
@ -144,7 +239,7 @@ mod tests {
let txn = index.env.read_txn().unwrap();
let candidates = (0..=255).collect::<RoaringBitmap>();
let mut results = String::new();
iterate_over_facet_distribution(
lexicographically_iterate_over_facet_distribution(
&txn,
index.content,
0,
@ -161,6 +256,7 @@ mod tests {
txn.commit().unwrap();
}
}
#[test]
fn filter_distribution_all_stop_early() {
let indexes = [get_simple_index(), get_random_looking_index()];
@ -169,7 +265,7 @@ mod tests {
let candidates = (0..=255).collect::<RoaringBitmap>();
let mut results = String::new();
let mut nbr_facets = 0;
iterate_over_facet_distribution(
lexicographically_iterate_over_facet_distribution(
&txn,
index.content,
0,

View File

@ -4,7 +4,7 @@ use heed::types::{ByteSlice, DecodeIgnore};
use heed::{BytesDecode, RoTxn};
use roaring::RoaringBitmap;
pub use self::facet_distribution::{FacetDistribution, DEFAULT_VALUES_PER_FACET};
pub use self::facet_distribution::{FacetDistribution, OrderBy, DEFAULT_VALUES_PER_FACET};
pub use self::filter::{BadGeoError, Filter};
use crate::heed_codec::facet::{FacetGroupKeyCodec, FacetGroupValueCodec, OrderedF64Codec};
use crate::heed_codec::ByteSliceRefCodec;

View File

@ -4,7 +4,7 @@ use levenshtein_automata::{LevenshteinAutomatonBuilder as LevBuilder, DFA};
use once_cell::sync::Lazy;
use roaring::bitmap::RoaringBitmap;
pub use self::facet::{FacetDistribution, Filter, DEFAULT_VALUES_PER_FACET};
pub use self::facet::{FacetDistribution, Filter, OrderBy, DEFAULT_VALUES_PER_FACET};
pub use self::new::matches::{FormatOptions, MatchBounds, Matcher, MatcherBuilder, MatchingWords};
use self::new::PartialSearchResult;
use crate::{

View File

@ -46,7 +46,7 @@ use super::logger::SearchLogger;
use super::query_graph::QueryNode;
use super::ranking_rule_graph::{
ConditionDocIdsCache, DeadEndsCache, ExactnessGraph, FidGraph, PositionGraph, ProximityGraph,
RankingRuleGraph, RankingRuleGraphTrait, TypoGraph,
RankingRuleGraph, RankingRuleGraphTrait, TypoGraph, WordsGraph,
};
use super::small_bitmap::SmallBitmap;
use super::{QueryGraph, RankingRule, RankingRuleOutput, SearchContext};
@ -54,6 +54,12 @@ use crate::search::new::query_term::LocatedQueryTermSubset;
use crate::search::new::ranking_rule_graph::PathVisitor;
use crate::{Result, TermsMatchingStrategy};
pub type Words = GraphBasedRankingRule<WordsGraph>;
impl GraphBasedRankingRule<WordsGraph> {
pub fn new(terms_matching_strategy: TermsMatchingStrategy) -> Self {
Self::new_with_id("words".to_owned(), Some(terms_matching_strategy))
}
}
pub type Proximity = GraphBasedRankingRule<ProximityGraph>;
impl GraphBasedRankingRule<ProximityGraph> {
pub fn new(terms_matching_strategy: Option<TermsMatchingStrategy>) -> Self {

View File

@ -4,7 +4,6 @@ use std::io::{BufWriter, Write};
use std::path::{Path, PathBuf};
use std::time::Instant;
// use rand::random;
use roaring::RoaringBitmap;
use crate::search::new::interner::Interned;
@ -13,6 +12,7 @@ use crate::search::new::query_term::LocatedQueryTermSubset;
use crate::search::new::ranking_rule_graph::{
Edge, FidCondition, FidGraph, PositionCondition, PositionGraph, ProximityCondition,
ProximityGraph, RankingRuleGraph, RankingRuleGraphTrait, TypoCondition, TypoGraph,
WordsCondition, WordsGraph,
};
use crate::search::new::ranking_rules::BoxRankingRule;
use crate::search::new::{QueryGraph, QueryNode, RankingRule, SearchContext, SearchLogger};
@ -24,11 +24,12 @@ pub enum SearchEvents {
RankingRuleSkipBucket { ranking_rule_idx: usize, bucket_len: u64 },
RankingRuleEndIteration { ranking_rule_idx: usize, universe_len: u64 },
ExtendResults { new: Vec<u32> },
WordsGraph { query_graph: QueryGraph },
ProximityGraph { graph: RankingRuleGraph<ProximityGraph> },
ProximityPaths { paths: Vec<Vec<Interned<ProximityCondition>>> },
TypoGraph { graph: RankingRuleGraph<TypoGraph> },
TypoPaths { paths: Vec<Vec<Interned<TypoCondition>>> },
WordsGraph { graph: RankingRuleGraph<WordsGraph> },
WordsPaths { paths: Vec<Vec<Interned<WordsCondition>>> },
FidGraph { graph: RankingRuleGraph<FidGraph> },
FidPaths { paths: Vec<Vec<Interned<FidCondition>>> },
PositionGraph { graph: RankingRuleGraph<PositionGraph> },
@ -139,8 +140,11 @@ impl SearchLogger<QueryGraph> for VisualSearchLogger {
let Some(location) = self.location.last() else { return };
match location {
Location::Words => {
if let Some(query_graph) = state.downcast_ref::<QueryGraph>() {
self.events.push(SearchEvents::WordsGraph { query_graph: query_graph.clone() });
if let Some(graph) = state.downcast_ref::<RankingRuleGraph<WordsGraph>>() {
self.events.push(SearchEvents::WordsGraph { graph: graph.clone() });
}
if let Some(paths) = state.downcast_ref::<Vec<Vec<Interned<WordsCondition>>>>() {
self.events.push(SearchEvents::WordsPaths { paths: paths.clone() });
}
}
Location::Typo => {
@ -329,7 +333,6 @@ impl<'ctx> DetailedLoggerFinish<'ctx> {
SearchEvents::ExtendResults { new } => {
self.write_extend_results(new)?;
}
SearchEvents::WordsGraph { query_graph } => self.write_words_graph(query_graph)?,
SearchEvents::ProximityGraph { graph } => self.write_rr_graph(&graph)?,
SearchEvents::ProximityPaths { paths } => {
self.write_rr_graph_paths::<ProximityGraph>(paths)?;
@ -338,6 +341,10 @@ impl<'ctx> DetailedLoggerFinish<'ctx> {
SearchEvents::TypoPaths { paths } => {
self.write_rr_graph_paths::<TypoGraph>(paths)?;
}
SearchEvents::WordsGraph { graph } => self.write_rr_graph(&graph)?,
SearchEvents::WordsPaths { paths } => {
self.write_rr_graph_paths::<WordsGraph>(paths)?;
}
SearchEvents::FidGraph { graph } => self.write_rr_graph(&graph)?,
SearchEvents::FidPaths { paths } => {
self.write_rr_graph_paths::<FidGraph>(paths)?;
@ -455,7 +462,7 @@ fill: \"#B6E2D3\"
shape: class
max_nbr_typo: {}",
term_subset.description(ctx),
term_subset.max_nbr_typos(ctx)
term_subset.max_typo_cost(ctx)
)?;
for w in term_subset.all_single_words_except_prefix_db(ctx)? {
@ -482,13 +489,6 @@ fill: \"#B6E2D3\"
}
Ok(())
}
fn write_words_graph(&mut self, qg: QueryGraph) -> Result<()> {
self.make_new_file_for_internal_state_if_needed()?;
self.write_query_graph(&qg)?;
Ok(())
}
fn write_rr_graph<R: RankingRuleGraphTrait>(
&mut self,
graph: &RankingRuleGraph<R>,

View File

@ -15,11 +15,7 @@ mod resolve_query_graph;
mod small_bitmap;
mod exact_attribute;
// TODO: documentation + comments
// implementation is currently an adaptation of the previous implementation to fit with the new model
mod sort;
// TODO: documentation + comments
mod words;
#[cfg(test)]
mod tests;
@ -43,10 +39,10 @@ use ranking_rules::{
use resolve_query_graph::{compute_query_graph_docids, PhraseDocIdsCache};
use roaring::RoaringBitmap;
use sort::Sort;
use words::Words;
use self::geo_sort::GeoSort;
pub use self::geo_sort::Strategy as GeoSortStrategy;
use self::graph_based_ranking_rule::Words;
use self::interner::Interned;
use crate::search::new::distinct::apply_distinct_rule;
use crate::{AscDesc, DocumentId, Filter, Index, Member, Result, TermsMatchingStrategy, UserError};
@ -202,6 +198,11 @@ fn get_ranking_rules_for_query_graph_search<'ctx>(
let mut sorted_fields = HashSet::new();
let mut geo_sorted = false;
// Don't add the `words` ranking rule if the term matching strategy is `All`
if matches!(terms_matching_strategy, TermsMatchingStrategy::All) {
words = true;
}
let mut ranking_rules: Vec<BoxRankingRule<QueryGraph>> = vec![];
let settings_ranking_rules = ctx.index.criteria(ctx.txn)?;
for rr in settings_ranking_rules {

View File

@ -28,16 +28,14 @@ pub enum ZeroOrOneTypo {
impl Interned<QueryTerm> {
pub fn compute_fully_if_needed(self, ctx: &mut SearchContext) -> Result<()> {
let s = ctx.term_interner.get_mut(self);
if s.max_nbr_typos == 0 {
s.one_typo = Lazy::Init(OneTypoTerm::default());
s.two_typo = Lazy::Init(TwoTypoTerm::default());
} else if s.max_nbr_typos == 1 && s.one_typo.is_uninit() {
if s.max_levenshtein_distance <= 1 && s.one_typo.is_uninit() {
assert!(s.two_typo.is_uninit());
// Initialize one_typo subterm even if max_nbr_typo is 0 because of split words
self.initialize_one_typo_subterm(ctx)?;
let s = ctx.term_interner.get_mut(self);
assert!(s.one_typo.is_init());
s.two_typo = Lazy::Init(TwoTypoTerm::default());
} else if s.max_nbr_typos > 1 && s.two_typo.is_uninit() {
} else if s.max_levenshtein_distance > 1 && s.two_typo.is_uninit() {
assert!(s.two_typo.is_uninit());
self.initialize_one_and_two_typo_subterm(ctx)?;
let s = ctx.term_interner.get_mut(self);
@ -187,7 +185,7 @@ pub fn partially_initialized_term_from_word(
original: ctx.word_interner.insert(word.to_owned()),
ngram_words: None,
is_prefix: false,
max_nbr_typos: 0,
max_levenshtein_distance: 0,
zero_typo: <_>::default(),
one_typo: Lazy::Init(<_>::default()),
two_typo: Lazy::Init(<_>::default()),
@ -258,7 +256,7 @@ pub fn partially_initialized_term_from_word(
Ok(QueryTerm {
original: word_interned,
ngram_words: None,
max_nbr_typos: max_typo,
max_levenshtein_distance: max_typo,
is_prefix,
zero_typo,
one_typo: Lazy::Uninit,
@ -277,7 +275,16 @@ fn find_split_words(ctx: &mut SearchContext, word: &str) -> Result<Option<Intern
impl Interned<QueryTerm> {
fn initialize_one_typo_subterm(self, ctx: &mut SearchContext) -> Result<()> {
let self_mut = ctx.term_interner.get_mut(self);
let QueryTerm { original, is_prefix, one_typo, .. } = self_mut;
let allows_split_words = self_mut.allows_split_words();
let QueryTerm {
original,
is_prefix,
one_typo,
max_levenshtein_distance: max_nbr_typos,
..
} = self_mut;
let original = *original;
let is_prefix = *is_prefix;
// let original_str = ctx.word_interner.get(*original).to_owned();
@ -286,26 +293,33 @@ impl Interned<QueryTerm> {
}
let mut one_typo_words = BTreeSet::new();
find_zero_one_typo_derivations(ctx, original, is_prefix, |derived_word, nbr_typos| {
match nbr_typos {
ZeroOrOneTypo::Zero => {}
ZeroOrOneTypo::One => {
if one_typo_words.len() < limits::MAX_ONE_TYPO_COUNT {
one_typo_words.insert(derived_word);
} else {
return Ok(ControlFlow::Break(()));
if *max_nbr_typos > 0 {
find_zero_one_typo_derivations(ctx, original, is_prefix, |derived_word, nbr_typos| {
match nbr_typos {
ZeroOrOneTypo::Zero => {}
ZeroOrOneTypo::One => {
if one_typo_words.len() < limits::MAX_ONE_TYPO_COUNT {
one_typo_words.insert(derived_word);
} else {
return Ok(ControlFlow::Break(()));
}
}
}
}
Ok(ControlFlow::Continue(()))
})?;
let original_str = ctx.word_interner.get(original).to_owned();
let split_words = find_split_words(ctx, original_str.as_str())?;
Ok(ControlFlow::Continue(()))
})?;
}
let split_words = if allows_split_words {
let original_str = ctx.word_interner.get(original).to_owned();
find_split_words(ctx, original_str.as_str())?
} else {
None
};
let self_mut = ctx.term_interner.get_mut(self);
// Only add the split words to the derivations if:
// 1. the term is not an ngram; OR
// 1. the term is neither an ngram nor a phrase; OR
// 2. the term is an ngram, but the split words are different from the ngram's component words
let split_words = if let Some((ngram_words, split_words)) =
self_mut.ngram_words.as_ref().zip(split_words.as_ref())
@ -327,7 +341,13 @@ impl Interned<QueryTerm> {
}
fn initialize_one_and_two_typo_subterm(self, ctx: &mut SearchContext) -> Result<()> {
let self_mut = ctx.term_interner.get_mut(self);
let QueryTerm { original, is_prefix, two_typo, .. } = self_mut;
let QueryTerm {
original,
is_prefix,
two_typo,
max_levenshtein_distance: max_nbr_typos,
..
} = self_mut;
let original_str = ctx.word_interner.get(*original).to_owned();
if two_typo.is_init() {
return Ok(());
@ -335,34 +355,37 @@ impl Interned<QueryTerm> {
let mut one_typo_words = BTreeSet::new();
let mut two_typo_words = BTreeSet::new();
find_zero_one_two_typo_derivations(
*original,
*is_prefix,
ctx.index.words_fst(ctx.txn)?,
&mut ctx.word_interner,
|derived_word, nbr_typos| {
if one_typo_words.len() >= limits::MAX_ONE_TYPO_COUNT
&& two_typo_words.len() >= limits::MAX_TWO_TYPOS_COUNT
{
// No chance we will add either one- or two-typo derivations anymore, stop iterating.
return Ok(ControlFlow::Break(()));
}
match nbr_typos {
NumberOfTypos::Zero => {}
NumberOfTypos::One => {
if one_typo_words.len() < limits::MAX_ONE_TYPO_COUNT {
one_typo_words.insert(derived_word);
if *max_nbr_typos > 0 {
find_zero_one_two_typo_derivations(
*original,
*is_prefix,
ctx.index.words_fst(ctx.txn)?,
&mut ctx.word_interner,
|derived_word, nbr_typos| {
if one_typo_words.len() >= limits::MAX_ONE_TYPO_COUNT
&& two_typo_words.len() >= limits::MAX_TWO_TYPOS_COUNT
{
// No chance we will add either one- or two-typo derivations anymore, stop iterating.
return Ok(ControlFlow::Break(()));
}
match nbr_typos {
NumberOfTypos::Zero => {}
NumberOfTypos::One => {
if one_typo_words.len() < limits::MAX_ONE_TYPO_COUNT {
one_typo_words.insert(derived_word);
}
}
NumberOfTypos::Two => {
if two_typo_words.len() < limits::MAX_TWO_TYPOS_COUNT {
two_typo_words.insert(derived_word);
}
}
}
NumberOfTypos::Two => {
if two_typo_words.len() < limits::MAX_TWO_TYPOS_COUNT {
two_typo_words.insert(derived_word);
}
}
}
Ok(ControlFlow::Continue(()))
},
)?;
Ok(ControlFlow::Continue(()))
},
)?;
}
let split_words = find_split_words(ctx, original_str.as_str())?;
let self_mut = ctx.term_interner.get_mut(self);

View File

@ -43,7 +43,7 @@ pub struct QueryTermSubset {
pub struct QueryTerm {
original: Interned<String>,
ngram_words: Option<Vec<Interned<String>>>,
max_nbr_typos: u8,
max_levenshtein_distance: u8,
is_prefix: bool,
zero_typo: ZeroTypoTerm,
// May not be computed yet
@ -342,10 +342,16 @@ impl QueryTermSubset {
}
None
}
pub fn max_nbr_typos(&self, ctx: &SearchContext) -> u8 {
pub fn max_typo_cost(&self, ctx: &SearchContext) -> u8 {
let t = ctx.term_interner.get(self.original);
match t.max_nbr_typos {
0 => 0,
match t.max_levenshtein_distance {
0 => {
if t.allows_split_words() {
1
} else {
0
}
}
1 => {
if self.one_typo_subset.is_empty() {
0
@ -438,6 +444,9 @@ impl QueryTerm {
self.zero_typo.is_empty() && one_typo.is_empty() && two_typo.is_empty()
}
fn allows_split_words(&self) -> bool {
self.zero_typo.phrase.is_none()
}
}
impl Interned<QueryTerm> {

View File

@ -217,7 +217,7 @@ pub fn make_ngram(
original: ngram_str_interned,
ngram_words: Some(words_interned),
is_prefix,
max_nbr_typos,
max_levenshtein_distance: max_nbr_typos,
zero_typo: term.zero_typo,
one_typo: Lazy::Uninit,
two_typo: Lazy::Uninit,
@ -271,7 +271,7 @@ impl PhraseBuilder {
QueryTerm {
original: ctx.word_interner.insert(phrase_desc),
ngram_words: None,
max_nbr_typos: 0,
max_levenshtein_distance: 0,
is_prefix: false,
zero_typo: ZeroTypoTerm {
phrase: Some(phrase),

View File

@ -205,18 +205,12 @@ impl<G: RankingRuleGraphTrait> VisitorState<G> {
impl<G: RankingRuleGraphTrait> RankingRuleGraph<G> {
pub fn find_all_costs_to_end(&self) -> MappedInterner<QueryNode, Vec<u64>> {
let mut costs_to_end = self.query_graph.nodes.map(|_| vec![]);
let mut enqueued = SmallBitmap::new(self.query_graph.nodes.len());
let mut node_stack = VecDeque::new();
*costs_to_end.get_mut(self.query_graph.end_node) = vec![0];
for prev_node in self.query_graph.nodes.get(self.query_graph.end_node).predecessors.iter() {
node_stack.push_back(prev_node);
enqueued.insert(prev_node);
}
while let Some(cur_node) = node_stack.pop_front() {
self.traverse_breadth_first_backward(self.query_graph.end_node, |cur_node| {
if cur_node == self.query_graph.end_node {
*costs_to_end.get_mut(self.query_graph.end_node) = vec![0];
return;
}
let mut self_costs = Vec::<u64>::new();
let cur_node_edges = &self.edges_of_node.get(cur_node);
@ -232,13 +226,7 @@ impl<G: RankingRuleGraphTrait> RankingRuleGraph<G> {
self_costs.dedup();
*costs_to_end.get_mut(cur_node) = self_costs;
for prev_node in self.query_graph.nodes.get(cur_node).predecessors.iter() {
if !enqueued.contains(prev_node) {
node_stack.push_back(prev_node);
enqueued.insert(prev_node);
}
}
}
});
costs_to_end
}
@ -247,17 +235,12 @@ impl<G: RankingRuleGraphTrait> RankingRuleGraph<G> {
node_with_removed_outgoing_conditions: Interned<QueryNode>,
costs: &mut MappedInterner<QueryNode, Vec<u64>>,
) {
let mut enqueued = SmallBitmap::new(self.query_graph.nodes.len());
let mut node_stack = VecDeque::new();
enqueued.insert(node_with_removed_outgoing_conditions);
node_stack.push_back(node_with_removed_outgoing_conditions);
'main_loop: while let Some(cur_node) = node_stack.pop_front() {
// Traverse the graph backward from the target node, recomputing the cost for each of its predecessors.
// We first check that no other node is contributing the same total cost to a predecessor before removing
// the cost from the predecessor.
self.traverse_breadth_first_backward(node_with_removed_outgoing_conditions, |cur_node| {
let mut costs_to_remove = FxHashSet::default();
for c in costs.get(cur_node) {
costs_to_remove.insert(*c);
}
costs_to_remove.extend(costs.get(cur_node).iter().copied());
let cur_node_edges = &self.edges_of_node.get(cur_node);
for edge_idx in cur_node_edges.iter() {
@ -265,22 +248,75 @@ impl<G: RankingRuleGraphTrait> RankingRuleGraph<G> {
for cost in costs.get(edge.dest_node).iter() {
costs_to_remove.remove(&(*cost + edge.cost as u64));
if costs_to_remove.is_empty() {
continue 'main_loop;
return;
}
}
}
if costs_to_remove.is_empty() {
continue 'main_loop;
return;
}
let mut new_costs = BTreeSet::from_iter(costs.get(cur_node).iter().copied());
for c in costs_to_remove {
new_costs.remove(&c);
}
*costs.get_mut(cur_node) = new_costs.into_iter().collect();
});
}
/// Traverse the graph backwards from the given node such that every time
/// a node is visited, we are guaranteed that all its successors either:
/// 1. have already been visited; OR
/// 2. were not reachable from the given node
pub fn traverse_breadth_first_backward(
&self,
from: Interned<QueryNode>,
mut visit: impl FnMut(Interned<QueryNode>),
) {
let mut reachable = SmallBitmap::for_interned_values_in(&self.query_graph.nodes);
{
// go backward to get the set of all reachable nodes from the given node
// the nodes that are not reachable will be set as `visited`
let mut stack = VecDeque::new();
let mut enqueued = SmallBitmap::for_interned_values_in(&self.query_graph.nodes);
enqueued.insert(from);
stack.push_back(from);
while let Some(n) = stack.pop_front() {
if reachable.contains(n) {
continue;
}
reachable.insert(n);
for prev_node in self.query_graph.nodes.get(n).predecessors.iter() {
if !enqueued.contains(prev_node) && !reachable.contains(prev_node) {
stack.push_back(prev_node);
enqueued.insert(prev_node);
}
}
}
};
let mut unreachable_or_visited =
SmallBitmap::for_interned_values_in(&self.query_graph.nodes);
for (n, _) in self.query_graph.nodes.iter() {
if !reachable.contains(n) {
unreachable_or_visited.insert(n);
}
}
let mut enqueued = SmallBitmap::for_interned_values_in(&self.query_graph.nodes);
let mut stack = VecDeque::new();
enqueued.insert(from);
stack.push_back(from);
while let Some(cur_node) = stack.pop_front() {
if !self.query_graph.nodes.get(cur_node).successors.is_subset(&unreachable_or_visited) {
stack.push_back(cur_node);
continue;
}
unreachable_or_visited.insert(cur_node);
visit(cur_node);
for prev_node in self.query_graph.nodes.get(cur_node).predecessors.iter() {
if !enqueued.contains(prev_node) {
node_stack.push_back(prev_node);
if !enqueued.contains(prev_node) && !unreachable_or_visited.contains(prev_node) {
stack.push_back(prev_node);
enqueued.insert(prev_node);
}
}

View File

@ -20,6 +20,8 @@ mod position;
mod proximity;
/// Implementation of the `typo` ranking rule
mod typo;
/// Implementation of the `words` ranking rule
mod words;
use std::collections::BTreeSet;
use std::hash::Hash;
@ -33,6 +35,7 @@ pub use position::{PositionCondition, PositionGraph};
pub use proximity::{ProximityCondition, ProximityGraph};
use roaring::RoaringBitmap;
pub use typo::{TypoCondition, TypoGraph};
pub use words::{WordsCondition, WordsGraph};
use super::interner::{DedupInterner, FixedSizeInterner, Interned, MappedInterner};
use super::query_term::LocatedQueryTermSubset;

View File

@ -111,23 +111,16 @@ impl RankingRuleGraphTrait for PositionGraph {
fn cost_from_position(sum_positions: u32) -> u32 {
match sum_positions {
0 | 1 | 2 | 3 => sum_positions,
4 | 5 => 4,
6 | 7 => 5,
8 | 9 => 6,
10 | 11 => 7,
12 | 13 => 8,
14 | 15 => 9,
16 | 17..=24 => 10,
25..=32 => 11,
33..=64 => 12,
65..=128 => 13,
129..=256 => 14,
257..=512 => 15,
513..=1024 => 16,
1025..=2048 => 17,
2049..=4096 => 18,
4097..=8192 => 19,
_ => 20,
0 => 0,
1 => 1,
2..=4 => 2,
5..=7 => 3,
8..=11 => 4,
12..=16 => 5,
17..=24 => 6,
25..=64 => 7,
65..=256 => 8,
257..=1024 => 9,
_ => 10,
}
}

View File

@ -50,7 +50,7 @@ impl RankingRuleGraphTrait for TypoGraph {
// 3-gram -> equivalent to 2 typos
let base_cost = if term.term_ids.len() == 1 { 0 } else { term.term_ids.len() as u32 };
for nbr_typos in 0..=term.term_subset.max_nbr_typos(ctx) {
for nbr_typos in 0..=term.term_subset.max_typo_cost(ctx) {
let mut term = term.clone();
match nbr_typos {
0 => {

View File

@ -0,0 +1,49 @@
use roaring::RoaringBitmap;
use super::{ComputedCondition, RankingRuleGraphTrait};
use crate::search::new::interner::{DedupInterner, Interned};
use crate::search::new::query_term::LocatedQueryTermSubset;
use crate::search::new::resolve_query_graph::compute_query_term_subset_docids;
use crate::search::new::SearchContext;
use crate::Result;
#[derive(Clone, PartialEq, Eq, Hash)]
pub struct WordsCondition {
term: LocatedQueryTermSubset,
}
pub enum WordsGraph {}
impl RankingRuleGraphTrait for WordsGraph {
type Condition = WordsCondition;
fn resolve_condition(
ctx: &mut SearchContext,
condition: &Self::Condition,
universe: &RoaringBitmap,
) -> Result<ComputedCondition> {
let WordsCondition { term, .. } = condition;
// maybe compute_query_term_subset_docids should accept a universe as argument
let mut docids = compute_query_term_subset_docids(ctx, &term.term_subset)?;
docids &= universe;
Ok(ComputedCondition {
docids,
universe_len: universe.len(),
start_term_subset: None,
end_term_subset: term.clone(),
})
}
fn build_edges(
_ctx: &mut SearchContext,
conditions_interner: &mut DedupInterner<Self::Condition>,
_from: Option<&LocatedQueryTermSubset>,
to_term: &LocatedQueryTermSubset,
) -> Result<Vec<(u32, Interned<Self::Condition>)>> {
Ok(vec![(
to_term.term_ids.len() as u32,
conditions_interner.insert(WordsCondition { term: to_term.clone() }),
)])
}
}

View File

@ -138,7 +138,7 @@ fn test_attribute_position_simple() {
s.terms_matching_strategy(TermsMatchingStrategy::All);
s.query("quick brown");
let SearchResult { documents_ids, .. } = s.execute().unwrap();
insta::assert_snapshot!(format!("{documents_ids:?}"), @"[10, 11, 12, 13, 3, 4, 2, 1, 0, 6, 8, 7, 9, 5]");
insta::assert_snapshot!(format!("{documents_ids:?}"), @"[10, 11, 12, 13, 2, 3, 4, 1, 0, 6, 8, 7, 9, 5]");
}
#[test]
fn test_attribute_position_repeated() {
@ -163,7 +163,7 @@ fn test_attribute_position_different_fields() {
s.terms_matching_strategy(TermsMatchingStrategy::All);
s.query("quick brown");
let SearchResult { documents_ids, .. } = s.execute().unwrap();
insta::assert_snapshot!(format!("{documents_ids:?}"), @"[10, 11, 12, 13, 3, 4, 2, 1, 0, 6, 8, 7, 9, 5]");
insta::assert_snapshot!(format!("{documents_ids:?}"), @"[10, 11, 12, 13, 2, 3, 4, 1, 0, 6, 8, 7, 9, 5]");
}
#[test]
@ -176,5 +176,5 @@ fn test_attribute_position_ngrams() {
s.terms_matching_strategy(TermsMatchingStrategy::All);
s.query("quick brown");
let SearchResult { documents_ids, .. } = s.execute().unwrap();
insta::assert_snapshot!(format!("{documents_ids:?}"), @"[10, 11, 12, 13, 3, 4, 2, 1, 0, 6, 8, 7, 9, 5]");
insta::assert_snapshot!(format!("{documents_ids:?}"), @"[10, 11, 12, 13, 2, 3, 4, 1, 0, 6, 8, 7, 9, 5]");
}

View File

@ -3,9 +3,9 @@ This module tests the following properties:
1. Two consecutive words from a query can be combined into a "2gram"
2. Three consecutive words from a query can be combined into a "3gram"
3. A word from the query can be split into two consecutive words (split words)
3. A word from the query can be split into two consecutive words (split words), no matter how short it is
4. A 2gram can be split into two words
5. A 3gram cannot be split into two words
5. A 3gram can be split into two words
6. 2grams can contain up to 1 typo
7. 3grams cannot have typos
8. 2grams and 3grams can be prefix tolerant
@ -14,6 +14,7 @@ This module tests the following properties:
11. Disabling typo tolerance does not disable ngram tolerance
12. Prefix tolerance is disabled for the last word if a space follows it
13. Ngrams cannot be formed by combining a phrase and a word or two phrases
14. Split words are not disabled by the `disableOnAttribute` or `disableOnWords` typo settings
*/
use crate::index::tests::TempIndex;
@ -56,6 +57,10 @@ fn create_index() -> TempIndex {
{
"id": 5,
"text": "sunflowering is not a verb"
},
{
"id": 6,
"text": "xy z"
}
]))
.unwrap();
@ -263,10 +268,11 @@ fn test_disable_split_words() {
s.query("sunflower ");
let SearchResult { documents_ids, .. } = s.execute().unwrap();
// no document containing `sun flower`
insta::assert_snapshot!(format!("{documents_ids:?}"), @"[3]");
insta::assert_snapshot!(format!("{documents_ids:?}"), @"[1, 3]");
let texts = collect_field_values(&index, &txn, "text", &documents_ids);
insta::assert_debug_snapshot!(texts, @r###"
[
"\"the sun flower is tall\"",
"\"the sunflower is tall\"",
]
"###);
@ -307,10 +313,11 @@ fn test_3gram_no_split_words() {
let SearchResult { documents_ids, .. } = s.execute().unwrap();
// no document with `sun flower`
insta::assert_snapshot!(format!("{documents_ids:?}"), @"[2, 3, 5]");
insta::assert_snapshot!(format!("{documents_ids:?}"), @"[1, 2, 3, 5]");
let texts = collect_field_values(&index, &txn, "text", &documents_ids);
insta::assert_debug_snapshot!(texts, @r###"
[
"\"the sun flower is tall\"",
"\"the sunflowers are pretty\"",
"\"the sunflower is tall\"",
"\"sunflowering is not a verb\"",
@ -369,3 +376,50 @@ fn test_no_ngram_phrases() {
]
"###);
}
#[test]
fn test_short_split_words() {
let index = create_index();
let txn = index.read_txn().unwrap();
let mut s = Search::new(&txn, &index);
s.terms_matching_strategy(TermsMatchingStrategy::All);
s.query("xyz");
let SearchResult { documents_ids, .. } = s.execute().unwrap();
insta::assert_snapshot!(format!("{documents_ids:?}"), @"[6]");
let texts = collect_field_values(&index, &txn, "text", &documents_ids);
insta::assert_debug_snapshot!(texts, @r###"
[
"\"xy z\"",
]
"###);
}
#[test]
fn test_split_words_never_disabled() {
let index = create_index();
index
.update_settings(|s| {
s.set_exact_words(["sunflower"].iter().map(ToString::to_string).collect());
s.set_exact_attributes(["text"].iter().map(ToString::to_string).collect());
})
.unwrap();
let txn = index.read_txn().unwrap();
let mut s = Search::new(&txn, &index);
s.terms_matching_strategy(TermsMatchingStrategy::All);
s.query("the sunflower is tall");
let SearchResult { documents_ids, .. } = s.execute().unwrap();
insta::assert_snapshot!(format!("{documents_ids:?}"), @"[1, 3]");
let texts = collect_field_values(&index, &txn, "text", &documents_ids);
insta::assert_debug_snapshot!(texts, @r###"
[
"\"the sun flower is tall\"",
"\"the sunflower is tall\"",
]
"###);
}

View File

@ -9,7 +9,7 @@ This module tests the following properties:
6. A typo on the first letter of a word counts as two typos
7. Phrases are not typo tolerant
8. 2grams can have 1 typo if they are larger than `min_word_len_two_typos`
9. 3grams are not typo tolerant
9. 3grams are not typo tolerant (but they can be split into two words)
10. The `typo` ranking rule assumes the role of the `words` ranking rule implicitly
if `words` doesn't exist before it.
11. The `typo` ranking rule places documents with the same number of typos in the same bucket
@ -287,16 +287,17 @@ fn test_typo_exact_word() {
]
"###);
// exact words do not disable prefix (sunflowering OK, but no sunflowar or sun flower)
// exact words do not disable prefix (sunflowering OK, but no sunflowar)
let mut s = Search::new(&txn, &index);
s.terms_matching_strategy(TermsMatchingStrategy::All);
s.query("network interconnection sunflower");
let SearchResult { documents_ids, .. } = s.execute().unwrap();
insta::assert_snapshot!(format!("{documents_ids:?}"), @"[16, 18]");
insta::assert_snapshot!(format!("{documents_ids:?}"), @"[16, 17, 18]");
let texts = collect_field_values(&index, &txn, "text", &documents_ids);
insta::assert_debug_snapshot!(texts, @r###"
[
"\"network interconnection sunflower\"",
"\"network interconnection sun flower\"",
"\"network interconnection sunflowering\"",
]
"###);

View File

@ -1,87 +0,0 @@
use roaring::RoaringBitmap;
use super::logger::SearchLogger;
use super::query_graph::QueryNode;
use super::resolve_query_graph::compute_query_graph_docids;
use super::small_bitmap::SmallBitmap;
use super::{QueryGraph, RankingRule, RankingRuleOutput, SearchContext};
use crate::{Result, TermsMatchingStrategy};
pub struct Words {
exhausted: bool, // TODO: remove
query_graph: Option<QueryGraph>,
nodes_to_remove: Vec<SmallBitmap<QueryNode>>,
terms_matching_strategy: TermsMatchingStrategy,
}
impl Words {
pub fn new(terms_matching_strategy: TermsMatchingStrategy) -> Self {
Self {
exhausted: true,
query_graph: None,
nodes_to_remove: vec![],
terms_matching_strategy,
}
}
}
impl<'ctx> RankingRule<'ctx, QueryGraph> for Words {
fn id(&self) -> String {
"words".to_owned()
}
fn start_iteration(
&mut self,
ctx: &mut SearchContext<'ctx>,
_logger: &mut dyn SearchLogger<QueryGraph>,
_universe: &RoaringBitmap,
parent_query_graph: &QueryGraph,
) -> Result<()> {
self.exhausted = false;
self.query_graph = Some(parent_query_graph.clone());
self.nodes_to_remove = match self.terms_matching_strategy {
TermsMatchingStrategy::Last => {
let mut ns = parent_query_graph.removal_order_for_terms_matching_strategy_last(ctx);
ns.reverse();
ns
}
TermsMatchingStrategy::All => {
vec![]
}
};
Ok(())
}
fn next_bucket(
&mut self,
ctx: &mut SearchContext<'ctx>,
logger: &mut dyn SearchLogger<QueryGraph>,
universe: &RoaringBitmap,
) -> Result<Option<RankingRuleOutput<QueryGraph>>> {
if self.exhausted {
return Ok(None);
}
let Some(query_graph) = &mut self.query_graph else { panic!() };
logger.log_internal_state(query_graph);
let this_bucket = compute_query_graph_docids(ctx, query_graph, universe)?;
let child_query_graph = query_graph.clone();
if self.nodes_to_remove.is_empty() {
self.exhausted = true;
} else {
let nodes_to_remove = self.nodes_to_remove.pop().unwrap();
query_graph.remove_nodes_keep_edges(&nodes_to_remove.iter().collect::<Vec<_>>());
}
Ok(Some(RankingRuleOutput { query: child_query_graph, candidates: this_bucket }))
}
fn end_iteration(
&mut self,
_ctx: &mut SearchContext<'ctx>,
_logger: &mut dyn SearchLogger<QueryGraph>,
) {
self.exhausted = true;
self.nodes_to_remove = vec![];
self.query_graph = None;
}
}

View File

@ -261,7 +261,9 @@ pub(crate) mod test_helpers {
let options = options.map_size(4096 * 4 * 1000 * 100);
let tempdir = tempfile::TempDir::new().unwrap();
let env = options.open(tempdir.path()).unwrap();
let content = env.create_database(None).unwrap();
let mut wtxn = env.write_txn().unwrap();
let content = env.create_database(&mut wtxn, None).unwrap();
wtxn.commit().unwrap();
FacetIndex {
content,

View File

@ -14,7 +14,7 @@ use crate::error::UserError;
use crate::index::{DEFAULT_MIN_WORD_LEN_ONE_TYPO, DEFAULT_MIN_WORD_LEN_TWO_TYPOS};
use crate::update::index_documents::IndexDocumentsMethod;
use crate::update::{IndexDocuments, UpdateIndexingStep};
use crate::{FieldsIdsMap, Index, Result};
use crate::{FieldsIdsMap, Index, OrderBy, Result};
#[derive(Debug, Clone, PartialEq, Eq, Copy)]
pub enum Setting<T> {
@ -122,6 +122,7 @@ pub struct Settings<'a, 't, 'u, 'i> {
/// Attributes on which typo tolerance is disabled.
exact_attributes: Setting<HashSet<String>>,
max_values_per_facet: Setting<usize>,
sort_facet_values_by: Setting<HashMap<String, OrderBy>>,
pagination_max_total_hits: Setting<usize>,
}
@ -149,6 +150,7 @@ impl<'a, 't, 'u, 'i> Settings<'a, 't, 'u, 'i> {
min_word_len_one_typo: Setting::NotSet,
exact_attributes: Setting::NotSet,
max_values_per_facet: Setting::NotSet,
sort_facet_values_by: Setting::NotSet,
pagination_max_total_hits: Setting::NotSet,
indexer_config,
}
@ -275,6 +277,14 @@ impl<'a, 't, 'u, 'i> Settings<'a, 't, 'u, 'i> {
self.max_values_per_facet = Setting::Reset;
}
pub fn set_sort_facet_values_by(&mut self, value: HashMap<String, OrderBy>) {
self.sort_facet_values_by = Setting::Set(value);
}
pub fn reset_sort_facet_values_by(&mut self) {
self.sort_facet_values_by = Setting::Reset;
}
pub fn set_pagination_max_total_hits(&mut self, value: usize) {
self.pagination_max_total_hits = Setting::Set(value);
}
@ -680,6 +690,20 @@ impl<'a, 't, 'u, 'i> Settings<'a, 't, 'u, 'i> {
Ok(())
}
fn update_sort_facet_values_by(&mut self) -> Result<()> {
match self.sort_facet_values_by.as_ref() {
Setting::Set(value) => {
self.index.put_sort_facet_values_by(self.wtxn, value)?;
}
Setting::Reset => {
self.index.delete_sort_facet_values_by(self.wtxn)?;
}
Setting::NotSet => (),
}
Ok(())
}
fn update_pagination_max_total_hits(&mut self) -> Result<()> {
match self.pagination_max_total_hits {
Setting::Set(max) => {
@ -714,6 +738,7 @@ impl<'a, 't, 'u, 'i> Settings<'a, 't, 'u, 'i> {
self.update_min_typo_word_len()?;
self.update_exact_words()?;
self.update_max_values_per_facet()?;
self.update_sort_facet_values_by()?;
self.update_pagination_max_total_hits()?;
// If there is new faceted fields we indicate that we must reindex as we must
@ -1515,6 +1540,7 @@ mod tests {
exact_words,
exact_attributes,
max_values_per_facet,
sort_facet_values_by,
pagination_max_total_hits,
} = settings;
assert!(matches!(searchable_fields, Setting::NotSet));
@ -1532,6 +1558,7 @@ mod tests {
assert!(matches!(exact_words, Setting::NotSet));
assert!(matches!(exact_attributes, Setting::NotSet));
assert!(matches!(max_values_per_facet, Setting::NotSet));
assert!(matches!(sort_facet_values_by, Setting::NotSet));
assert!(matches!(pagination_max_total_hits, Setting::NotSet));
})
.unwrap();

View File

@ -5,7 +5,7 @@ use heed::EnvOpenOptions;
use maplit::hashset;
use milli::documents::{DocumentsBatchBuilder, DocumentsBatchReader};
use milli::update::{IndexDocuments, IndexDocumentsConfig, IndexerConfig, Settings};
use milli::{FacetDistribution, Index, Object};
use milli::{FacetDistribution, Index, Object, OrderBy};
use serde_json::Deserializer;
#[test]
@ -63,12 +63,12 @@ fn test_facet_distribution_with_no_facet_values() {
let txn = index.read_txn().unwrap();
let mut distrib = FacetDistribution::new(&txn, &index);
distrib.facets(vec!["genres"]);
distrib.facets(vec![("genres", OrderBy::default())]);
let result = distrib.execute().unwrap();
assert_eq!(result["genres"].len(), 0);
let mut distrib = FacetDistribution::new(&txn, &index);
distrib.facets(vec!["tags"]);
distrib.facets(vec![("tags", OrderBy::default())]);
let result = distrib.execute().unwrap();
assert_eq!(result["tags"].len(), 2);
}