Commit Graph

8222 Commits

Author SHA1 Message Date
73bb080a26 Merge #3699
3699: Search for Facet Values r=Kerollmops a=Kerollmops

This PR introduces the first version of [the _Search for Facet Values_ feature](https://github.com/meilisearch/product/discussions/515) that allows a user to search for facets, by optionally using a prefix string and optionally specifying the `q` and `filter` original search parameters to restrict the candidates to search in.

The steps to merge it into Meilisearch will first start by providing prototype Docker images. This way users will be able to test the prototypes before using them.

The current route to use the _Search for Facet Values_ feature is the `POST /indexes/{index}/facet-search` where the body is a JSON object that looks like the following:
```json5
{
  "q": "spiderman", // optional
  "filter": "rating > 10", // optional
  "facetName": "genres",
  "facetQuery": "a" // optional
}
```

## What is missing?

 - [x] Send some analytics.
 - [x] Support the `matchingStrategy` parameter.
 - [x] Make sure that the errors are the right ones.
 - [x] Use the [Index typo tolerance settings](https://www.meilisearch.com/docs/learn/configuration/typo_tolerance#minwordsizefortypos) when matching facet values.
    - [x] minWordSizeForTypos.oneTypo
    - [x] minWordSizeForTypos.twoTypo
 - [x] Add tests
 - [x] Log the time it took to compute the results.
 - [x] Fix the compilation warnings.
 - [x] [Create an issue to fix potential performance issues when indexing](https://github.com/meilisearch/meilisearch/issues/3862).


Co-authored-by: Clément Renault <clement@meilisearch.com>
Co-authored-by: Kerollmops <clement@meilisearch.com>
2023-06-29 09:08:55 +00:00
44b5b9e1a7 Improve the documentation of the FacetSearchQuery struct 2023-06-29 10:28:23 +02:00
605c1dd54a Fix analytics 2023-06-28 16:41:56 +02:00
3e3f73ba1e Fix the analytics 2023-06-28 15:45:09 +02:00
efbe7ce78b Clean the facet string FSTs when we clear the documents 2023-06-28 15:36:32 +02:00
82e1f59f1e Add attributes_to_search_on 2023-06-28 15:28:24 +02:00
362e9ff845 Add more tests 2023-06-28 15:28:24 +02:00
32f2556d22 Move the additional_search_parameters_provided analytic inside facets 2023-06-28 15:06:09 +02:00
63fd10aaa5 Fix the invalid facet name field error code 2023-06-28 15:06:09 +02:00
29b40295b8 Ignore unknown facet search query parameters 2023-06-28 15:06:09 +02:00
26f0fa678d Change the error message when a facet is not searchable 2023-06-28 15:06:09 +02:00
60ddd53439 Return one of the original facet values when doing a facet search 2023-06-28 15:06:09 +02:00
2bcd8d2983 Make sure the facet queries are normalized 2023-06-28 15:06:09 +02:00
09079a4e88 Remove useless InvalidSearchFacet error 2023-06-28 15:06:09 +02:00
904f6574bf Make rustfmt happy 2023-06-28 15:06:08 +02:00
6fb8af423c Rename the hits and query output into facetHits and facetQuery respectively 2023-06-28 15:06:08 +02:00
cb0bb399fa Fix the error code returned when the facetName field is missing 2023-06-28 15:06:08 +02:00
41760a9306 Introduce a new invalid_facet_search_facet_name error code 2023-06-28 15:06:07 +02:00
e9a3029c30 Use the right field id to write the string facet values FST 2023-06-28 15:01:51 +02:00
ed0ff47551 Return an empty list of results if attribute is set as filterable 2023-06-28 15:01:51 +02:00
e1b8fb48ee Use the minWordSizeForTypos index settings 2023-06-28 15:01:51 +02:00
87e22e436a Fix compilation issues 2023-06-28 15:01:51 +02:00
0252cfe8b6 Simplify the placeholder search of the facet-search route 2023-06-28 15:01:50 +02:00
f35ad96afa Use the disableOnAttributes parameter on the facet-search route 2023-06-28 15:01:50 +02:00
2ceb781c73 Use the disableOnWords parameter on the facet-search route 2023-06-28 15:01:50 +02:00
7bd67543dd Support the typoTolerant.enabled parameter 2023-06-28 15:01:50 +02:00
8e86eb91bb Log an error when a facet value is missing from the database 2023-06-28 15:01:50 +02:00
55c17aa38b Rename the SearchForFacetValues struct 2023-06-28 15:01:50 +02:00
aadbe88048 Return an internal error when a field id is missing 2023-06-28 15:01:50 +02:00
f36de2115f Make clippy happy 2023-06-28 15:01:50 +02:00
702041b7e1 Improve the returned errors from the facet-search route 2023-06-28 15:01:48 +02:00
a05074e675 Fix the max number of facets to be returned to 100 2023-06-28 14:58:42 +02:00
93f30e65a9 Return the correct response JSON object from the facet-search route 2023-06-28 14:58:42 +02:00
893592c5e9 Send analytics about the facet-search route 2023-06-28 14:58:42 +02:00
e81809aae7 Make the search for facet work 2023-06-28 14:58:41 +02:00
ce7e7f12c8 Introduce the facet search route 2023-06-28 14:58:41 +02:00
addb21f110 Restrict the number of facet search results to 1000 2023-06-28 14:58:41 +02:00
c34de05106 Introduce the SearchForFacetValue struct 2023-06-28 14:58:41 +02:00
15a4c05379 Store the facet string values in multiple FSTs 2023-06-28 14:58:41 +02:00
9deeec88e0 Merge #3861
3861: Add "meilisearch" prefix to last metrics that were missing it r=Kerollmops a=dureuill

# Pull Request

## Related issue
Related to #3790 

## What does this PR do?
- change implementation to follow the spec on metrics name
- regenerate grafana dashboard from the code

## PR checklist
Please check if your PR fulfills the following requirements:
- [ ] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [ ] Have you read the contributing guidelines?
- [ ] Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!


Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2023-06-28 09:28:31 +00:00
167ac55a2d Update dashboard generated from grafana 2023-06-28 11:22:16 +02:00
ea68ccd034 prefix http_* metrics by meilisearch 2023-06-28 11:21:50 +02:00
d4f10800f2 Merge #3834
3834: Define searchable fields at runtime r=Kerollmops a=ManyTheFish

## Summary
This feature allows the end-user to search in one or multiple attributes using the search parameter `attributesToSearchOn`:

```json
{
  "q": "Captain Marvel",
  "attributesToSearchOn": ["title"]
}
```

This feature act like a filter, forcing Meilisearch to only return the documents containing the requested words in the attributes-to-search-on. Note that, with the matching strategy `last`, Meilisearch will only ensure that the first word is in the attributes-to-search-on, but, the retrieved documents will be ordered taking into account the word contained in the attributes-to-search-on. 

## Trying the prototype

A dedicated docker image has been released for this feature:

#### last prototype version:

```bash
docker pull getmeili/meilisearch:prototype-define-searchable-fields-at-search-time-1
```

#### others prototype versions:

```bash
docker pull getmeili/meilisearch:prototype-define-searchable-fields-at-search-time-0
```

## Technical Detail

The attributes-to-search-on list is given to the search context, then, the search context uses the `fid_word_docids`database using only the allowed field ids instead of the global `word_docids` database. This is the same for the prefix databases.
The database cache is updated with the merged values, meaning that the union of the field-id-database values is only made if the requested key is missing from the cache.

### Relevancy limits

Almost all ranking rules behave as expected when ordering the documents.
Only `proximity` could miss-order documents if all the searched words are in the restricted attribute but a better proximity is found in an ignored attribute in a document that should be ranked lower. I put below a failing test showing it:
```rust
#[actix_rt::test]
async fn proximity_ranking_rule_order() {
    let server = Server::new().await;
    let index = index_with_documents(
        &server,
        &json!([
        {
            "title": "Captain super mega cool. A Marvel story",
            // Perfect distance between words in an ignored attribute
            "desc": "Captain Marvel",
            "id": "1",
        },
        {
            "title": "Captain America from Marvel",
            "desc": "a Shazam ersatz",
            "id": "2",
        }]),
    )
    .await;

    // Document 2 should appear before document 1.
    index
        .search(json!({"q": "Captain Marvel", "attributesToSearchOn": ["title"], "attributesToRetrieve": ["id"]}), |response, code| {
            assert_eq!(code, 200, "{}", response);
            assert_eq!(
                response["hits"],
                json!([
                    {"id": "2"},
                    {"id": "1"},
                ])
            );
        })
        .await;
}
```

Fixing this would force us to create a `fid_word_pair_proximity_docids` and a `fid_word_prefix_pair_proximity_docids` databases which may multiply the keys of `word_pair_proximity_docids` and `word_prefix_pair_proximity_docids` by the number of attributes in the searchable_attributes list. If we think we should fix this test, I'll suggest doing it in another PR.

## Related

Fixes #3772

Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: ManyTheFish <many@meilisearch.com>
2023-06-28 08:19:23 +00:00
dc293911ad Merge #3745
3745: tests: add unit test for `PayloadTooLarge` error r=curquiza a=cymruu

# Pull Request
Add a unit test for the `Payload`, which verifies that a request with a payload that is too large is rejected with the appropriate message.
This was requested in this PR https://github.com/meilisearch/meilisearch/pull/3739

## Related issue
https://github.com/meilisearch/meilisearch/pull/3739

## What does this PR do?
- Adds requested test

## PR checklist
Please check if your PR fulfills the following requirements:
- [ ] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [ ] Have you read the contributing guidelines?
- [ ] Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!


Co-authored-by: Filip Bachul <filipbachul@gmail.com>
2023-06-27 14:58:23 +00:00
9d68e6969e Merge #3859
3859: Merge all analytics events pertaining to updating the experimental features r=Kerollmops a=dureuill

Follow-up to #3850 

Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2023-06-27 13:26:01 +00:00
b4b686d253 Merge all analytics events pertaining to updating the experimental features 2023-06-27 15:16:23 +02:00
98ec476198 Merge #3855
3855: Change and add links to the Cloud r=Kerollmops a=dureuill

- add cloud link in banner
- add utm to existing links following https://github.com/meilisearch/integration-guides/issues/277#issuecomment-1592054536

Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2023-06-27 12:29:36 +00:00
c47b8a8bfe Fix typo
Co-authored-by: Guillaume Mourier <guillaume@meilisearch.com>
2023-06-27 14:27:54 +02:00
054f81a021 Make message consistent with the one in integration repos 2023-06-27 14:20:55 +02:00
d8ea688481 Merge #3825
3825: Accept semantic vectors and allow users to query nearest neighbors r=Kerollmops a=Kerollmops

This Pull Request brings a new feature to the current API. The engine accepts a new `_vectors` field akin to the `_geo` one. This vector is stored in Meilisearch and can be retrieved via search. This work is the first step toward hybrid search, bringing the best of both worlds: keyword and semantic search ❤️‍🔥

## ToDo
 - [x] Make it possible to get the `limit` nearest neighbors from a user-generated vector by using the `vector` field of search route.
 - [x] Delete the documents and vectors from the HNSW-related data structures.
     - [x] Do it the slow and ugly way (we need to be able to iterate over all the values).
     - [ ] Do it the efficient way (Wait for a new method or implement it myself).
 - [ ] ~~Move from the `hnsw` crate to the hgg one~~ The hgg crate is too slow.
   Meilisearch takes approximately 88s to answer a query. It is related to the time it takes to deserialize the `Hgg` data structure or search in it. I didn't take the time to measure precisely. We moved back to the hnsw crate which takes approximately 40ms to answer.
   - [ ] ~~Wait for a fix for https://github.com/rust-cv/hgg/issues/4.~~
 - [x] Fix the current dot product function.
 - [x] Fill in the other `SearchResult` fields.
 - [x] Remove the `hnsw` dependency of the meilisearch crate.
 - [x] Fix the pages by taking the offset into account.
 - [x] Release a first prototype https://github.com/meilisearch/product/discussions/621#discussioncomment-6183647
 - [x] Make the pagination and filtering faster and more correct.
 - [x] Return the original vector in the output search results (like `query`).
 - [x] Return an `_semanticSimilarity` field in the documents (it's a dot product)
   - [x] Return this score even if the `_vectors` field is not displayed
   - [x] Rename the field `_semanticScore`.
   - [ ] Return the `_geoDistance` value even if the `_geo` field is not displayed
 - [x] Store the HNSW on possibly multiple LMDB values.
   - [ ] Measure it and make it faster if needed
   - [ ] Export the `ReadableSlices` type into a small external crate
 - [x] Accept an `_vectors` field instead of the `_vector` one.
 - [x] Normalize all vectors.
 - [ ] Remove the `_vectors` field from the default searchable attributes (as we do with `_geo`?).
 - [ ] Correctly compute the candidates by remembering the documents having a valid `_vectors` field.
 - [ ] Return the right errors:
     - [ ] Return an error when the query vector is not the same length as the vectors in the HNSW.
     - [ ] We must return the user document id that triggered the vector dimension issue.
     - [x] If an indexation error occurs.
     - [ ] Fix the error codes when using the search route.
 - [ ] ~~Introduce some settings:~~
    We currently ensure that the vector length is consistent over the whole set of documents and return an error for when a vector dimension doesn't follow the current number of dimensions.
     - [ ] The length of the vector the user will provide.
     - [ ] The distance function (we only support dot as of now).
 - [ ] Introduce other distance functions
    - [ ] Euclidean
    - [ ] Dot Product
    - [ ] Cosine
    - [ ] Make them SIMD optimized
    - [ ] Give credit to qdrant
 - [ ] Add tests.
 - [ ] Write a mini spec.
 - [ ] Release it in v1.3 as an experimental feature.

Co-authored-by: Clément Renault <clement@meilisearch.com>
Co-authored-by: Kerollmops <clement@meilisearch.com>
2023-06-27 11:17:07 +00:00