Commit Graph

9427 Commits

Author SHA1 Message Date
Tamo
c9ac7f2e7e update heed to latest version 2024-05-20 15:19:00 +02:00
Tamo
7e251b43d4 Revert "Stream documents" 2024-05-20 15:09:45 +02:00
Louis Dureuil
9969f7a638 Add test on index-scheduler 2024-05-20 14:44:10 +02:00
Louis Dureuil
b17cb56dee Test array of vectors 2024-05-20 14:44:10 +02:00
Louis Dureuil
afcd7b9f0c Test hybrid search with hf embedder 2024-05-20 14:44:10 +02:00
ManyTheFish
fc7e817221 Index geo points based on the settings differences 2024-05-20 12:27:26 +02:00
Tamo
0f78703b85 add a test reproducing the bug 2024-05-20 10:58:08 +02:00
Louis Dureuil
30cf972987 Add test with a dump 2024-05-20 10:36:18 +02:00
Louis Dureuil
d05d49ffd8 Fix tests 2024-05-20 10:36:18 +02:00
Louis Dureuil
0462ebbe58 Don't write an empty _vectors field 2024-05-20 10:36:18 +02:00
Louis Dureuil
2f7a8a4efb Don't write vectors that weren't autogenerated in document DB 2024-05-20 10:36:18 +02:00
Louis Dureuil
02714ef5ed Add vectors from vector DB in dump 2024-05-20 10:36:18 +02:00
Louis Dureuil
52d9cb6e5a Refactor vector indexing
- use the parsed_vectors module
- only parse `_vectors` once per document, instead of once per embedder per document
2024-05-20 10:36:17 +02:00
Louis Dureuil
261de888b7 Add function to get the embeddings of a document in an index 2024-05-20 10:36:17 +02:00
Louis Dureuil
98c811247e Add parsed vectors module 2024-05-20 10:25:59 +02:00
meili-bors[bot]
59ecf1cea7 Merge #4544
4544: Stream documents r=curquiza a=irevoire

# Pull Request

## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/4383


### Perf
2M hackernews:

main:
Time to retrieve: 7s
RAM consumption: 2+GiB

stream:
Time to retrieve: 4.7s
RAM consumption: Too small

Co-authored-by: Tamo <tamo@meilisearch.com>
2024-05-17 14:49:08 +00:00
Tamo
273c6e8c5c uses the latest version of heed to get rid of unsafe code 2024-05-16 18:31:32 +02:00
Tamo
897d25780e update milli to latest version 2024-05-16 18:31:32 +02:00
Tamo
c85d1752dd keep the same rtxn to compute the filters on the documents and to stream the documents later on 2024-05-16 18:31:32 +02:00
Tamo
8e6ffbfc6f stream documents 2024-05-16 18:31:32 +02:00
meili-bors[bot]
7c19c072fa Merge #4631
4631: Split the field id map from the weight of each fields r=Kerollmops a=irevoire

# Pull Request

## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/4484

## What does this PR do?
- Make the (internal) searchable fields database always contain the searchable fields (instead of None when the user-defined searchable fields were not defined)
- Introduce a new « fieldids_weights_map » that does the mapping between a fieldId and its Weight
- Ensure that when two searchable fields are swapped, the field ID map doesn't change anymore (and thus, doesn't re-index)
- Use the weight instead of the order of the searchable fields in the attribute ranking rule at search time
- When no searchable attributes are defined, make all their weights equal to zero
- When a field is declared as searchable and contains nested fields, all its subfields share the same weight
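
The mapping described in the list above can be sketched roughly as follows. This is a minimal illustration and not the actual milli implementation: the function name `fieldids_weights`, the `FIELDS` list, and the use of field names rather than numeric field IDs are all assumptions made for readability.

```python
# Hypothetical list of every field path known to the index (nested fields use dots).
FIELDS = ["id", "name", "doggo", "doggo.name", "doggo.surname", "catto"]


def fieldids_weights(searchable, all_fields):
    """Map each searchable field to a weight (lower means more important).

    searchable: the user-defined searchable attributes, in order, or None
                when the user did not define any.
    all_fields: every field path known to the index.
    """
    if searchable is None:
        # No searchable attributes defined: every field gets the same weight.
        return {field: 0 for field in all_fields}

    weights = {}
    for weight, attribute in enumerate(searchable):
        for field in all_fields:
            # A declared attribute and all of its nested subfields share one weight.
            if field == attribute or field.startswith(attribute + "."):
                weights.setdefault(field, weight)
    return weights


declared = fieldids_weights(["name", "doggo", "catto"], FIELDS)
swapped = fieldids_weights(["catto", "doggo", "name"], FIELDS)
undefined = fieldids_weights(None, FIELDS)
```

In this sketch, swapping two searchable attributes only changes the values in the returned map; the list of fields itself is untouched, which is the property that lets the swap avoid a re-index.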

## Impact on relevancy

### When no searchable attributes are declared

When no searchable attributes are declared, all the fields have the same importance, instead of arbitrarily giving more importance to the fields encountered earliest in the life of the index.

This means that before this PR, sending the following JSON:
```json
[
  { "id": 0, "name": "kefir", "color": "white" },
  { "id": 1, "name": "white", "last name": "spirit" }
]
```

Would make the field `name` more important than the field `color` or `last name`.
This means that searching for `white` would make the document `1` automatically higher ranked than the document `0`.

After this PR, all the fields have the same weight, and none are considered more important than others.
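
As a toy illustration of the change described above, the sketch below scores the two example documents for the query `white` under both schemes. It assumes a deliberately simplified model in which a document's score is the lowest weight among the fields whose value equals the query; the real attribute ranking rule is more involved, and every function name here is hypothetical.

```python
DOCS = [
    {"id": 0, "name": "kefir", "color": "white"},
    {"id": 1, "name": "white", "last name": "spirit"},
]


def first_seen_weights(docs):
    """Pre-PR behaviour as described above: weight follows order of first encounter."""
    weights = {}
    for doc in docs:
        for field in doc:
            weights.setdefault(field, len(weights))
    return weights


def equal_weights(docs):
    """Post-PR behaviour when no searchable attributes are declared."""
    return {field: 0 for doc in docs for field in doc}


def scores(docs, query, weights):
    """Return {document id: lowest weight of a matching field} (lower ranks higher)."""
    result = {}
    for doc in docs:
        matching = [weights[f] for f, value in doc.items() if str(value) == query]
        if matching:
            result[doc["id"]] = min(matching)
    return result


before = scores(DOCS, "white", first_seen_weights(DOCS))
after = scores(DOCS, "white", equal_weights(DOCS))
```

Under these assumptions, `before` gives document `1` a better (lower) score than document `0` because `name` was seen before `color`, while `after` gives both documents the same score.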

### When a nested field is made searchable

The second behavior change introduced by this PR concerns documents with nested fields, for example:

```json
{
  "id": 0,
  "name": "tamo",
  "doggo": {
    "name": "kefir",
    "surname": "le kef"
  },
  "catto": "gromez"
}
```

Previously, defining the searchable attributes as: `["tamo", "doggo", "catto"]` was actually defining the « real » searchable attributes in the engine as: `["tamo", "doggo", "catto", "doggo.name", "doggo.surname"]`, which means that `doggo.name` and `doggo.surname` were _NOT_ where the user expected them and had completely different weights than `doggo`.
In this PR all the weights have been unified, and the « real » searchable fields look like this:
```json
[ "tamo", "doggo", "doggo.name", "doggo.surname", "catto"]
   ^^^^    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^    ^^^^^
Weight 0                 Weight 1                  Weight 2
```

Co-authored-by: Tamo <tamo@meilisearch.com>
2024-05-16 09:59:24 +00:00
Tamo
673b6e1dc0 fix a flaky test 2024-05-16 11:28:14 +02:00
Tamo
f2d0a59f1d when no searchable attributes are defined, make all the weights equal to zero 2024-05-16 01:06:33 +02:00
Tamo
c78a2fa4f5 rename method and variable around the attributes to search on feature 2024-05-15 18:04:42 +02:00
Tamo
5542f1d9f1 get back to what we were doing before in the DB cache and with the restricted field id 2024-05-15 18:00:39 +02:00
Tamo
ad4d8502b3 stops storing the whole fieldids weights map when no searchable are defined 2024-05-15 17:16:10 +02:00
Tamo
7ec4e2a3fb apply all style review comments 2024-05-15 15:02:26 +02:00
Tamo
9fffb8e83d make clippy happy 2024-05-14 17:36:32 +02:00
Tamo
caa6a7149a make the attribute ranking rule use the weights and fix the tests 2024-05-14 17:36:32 +02:00
Tamo
a0082c4df9 add a failing test on the attribute ranking rule 2024-05-14 17:00:02 +02:00
Tamo
b0afe0972e stop updating the fields ids map when fields are only swapped 2024-05-14 17:00:02 +02:00
Tamo
9ecde41853 add a test on the current behaviour 2024-05-14 17:00:02 +02:00
Tamo
685f452fb2 Fix the indexing of the searchable 2024-05-14 17:00:02 +02:00
Tamo
4e4a1ddff7 gate a test behind the required feature 2024-05-14 17:00:02 +02:00
Tamo
c22460045c Stops returning an option in the internal searchable fields 2024-05-14 17:00:02 +02:00
meili-bors[bot]
76bb6d565c Merge #4624
4624: Add "precommands" to benchmark r=dureuill a=dureuill

# Pull Request

## Related issue
Helps for https://github.com/meilisearch/meilisearch/issues/4493

## What does this PR do?
- Add support for precommands for cargo xtask bench
- update benchmark docs
- update workload files


Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-05-13 08:27:56 +00:00
Louis Dureuil
9d3ff11b21 Modify existing workload files to use precommands 2024-05-07 14:03:14 +02:00
Louis Dureuil
43763eb98a Document precommands 2024-05-07 12:26:22 +02:00
Louis Dureuil
2a0ece814c Add precommands to workloads 2024-05-07 12:23:36 +02:00
meili-bors[bot]
95fcd17373 Merge #4622
4622: Bump Rustls to non-vulnerable versions r=Kerollmops a=Kerollmops

This PR fixes #4599 by bumping the Rustls dependency to v0.21.12 and [ureq to v2.9.7](https://github.com/algesten/ureq/blob/main/CHANGELOG.md#297) (which bumps Rustls to v0.22.4).

Co-authored-by: Clément Renault <clement@meilisearch.com>
2024-05-07 09:47:30 +00:00
Clément Renault
ac4bc143c4 Bump ureq to v2.9.7 2024-05-07 10:39:38 +02:00
Clément Renault
f33a1282f8 Bump Rustls to v0.21.12 2024-05-07 10:31:39 +02:00
meili-bors[bot]
4d5971f343 Merge #4621
4621: Bring back changes from v1.8.0 into main r=curquiza a=curquiza



Co-authored-by: ManyTheFish <many@meilisearch.com>
Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: meili-bors[bot] <89034592+meili-bors[bot]@users.noreply.github.com>
Co-authored-by: Clément Renault <clement@meilisearch.com>
2024-05-06 13:46:39 +00:00
meili-bors[bot]
ecb5c506b3 Merge #4619
4619: Use http path pattern instead of full path in metrics r=irevoire a=gh2k

# Pull Request

## Related issue

Fixes #3983 

## What does this PR do?

- This records only the HTTP pattern in metrics instead of the full path

An alternative solution was proposed in #4145, but it doesn't really fix the root cause of the issue. The problem I'm experiencing at my end is that, with the full path, the number of labels is far too high to be useful. It is normal practice to use the path with variable placeholders instead of the fully-expanded path.

The example given in the ticket was endpoints under `/tasks`, but this can also be a very significant problem under `/indexes/{index-uid}/documents`. e.g.:
<img width="1510" alt="Screenshot 2024-05-03 at 12 14 36" src="https://github.com/meilisearch/meilisearch/assets/6530014/1df2ec19-5f69-4164-90d2-f65c59f9b544">

This patch replaces the fully-expanded path with the matched pattern.

The linked PR also mentions paths under other routes, e.g. `/static`, but this feels like a separate concern and these can be stripped out at the Prometheus end by filters if they are unwanted. The most important thing is to make the paths usable so that we can still get stats on e.g. the number of document deletes we see.
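
The effect on label cardinality can be sketched as below. This is only an illustration of the idea, not the code from this patch: the route table, the placeholder names other than `{index-uid}`, the helper names, and the fallback to the raw path for unmatched requests are all assumptions.

```python
import re
from collections import Counter

# Hypothetical route table; the placeholder syntax mirrors
# `/indexes/{index-uid}/documents` from the description above.
ROUTE_PATTERNS = [
    "/indexes/{index-uid}/documents/{document-id}",
    "/indexes/{index-uid}/documents",
    "/tasks/{task-uid}",
    "/tasks",
]


def compile_pattern(pattern):
    """Turn each {placeholder} into a wildcard matching one path segment."""
    literal_parts = re.split(r"\{[^}]+\}", pattern)
    return re.compile("[^/]+".join(re.escape(part) for part in literal_parts))


COMPILED = [(pattern, compile_pattern(pattern)) for pattern in ROUTE_PATTERNS]


def metric_label(path):
    """Return the matched route pattern to use as the metric label."""
    for pattern, regex in COMPILED:
        if regex.fullmatch(path):
            return pattern
    return path  # assumption: unmatched paths keep their raw form in this sketch


requests = [
    "/indexes/movies/documents/1",
    "/indexes/movies/documents/2",
    "/indexes/books/documents/7",
    "/tasks/41",
    "/tasks/42",
]
labels_by_full_path = Counter(requests)
labels_by_pattern = Counter(metric_label(path) for path in requests)
```

With the sample requests, labelling by full path yields one label per request, whereas labelling by pattern collapses them into two labels whose counts remain meaningful, for example the number of requests that hit a single-document route.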

## PR checklist

Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!


Co-authored-by: Simon Detheridge <s@sd.ai>
Co-authored-by: Tamo <tamo@meilisearch.com>
2024-05-06 09:37:32 +00:00
Tamo
3698aef66b fix warning 2024-05-06 11:36:37 +02:00
Simon Detheridge
7f5ab3cef5 Use http path pattern instead of full path in metrics 2024-05-03 12:29:31 +01:00
meili-bors[bot]
c668043c4f Merge #4617
4617: Destructure `EmbedderOptions` so we don't miss some options r=dureuill a=dureuill

# Pull Request

## Related issue
#4595 was caused by the code not destructuring the embedder options.


## What does this PR do?
This PR adds the missing `url` parameter for ollama, and makes sure a similar issue cannot happen in the future.



Co-authored-by: Louis Dureuil <louis@meilisearch.com>
v1.8.0
2024-05-02 14:55:32 +00:00
Louis Dureuil
5a305bfdea Remove unused struct 2024-05-02 16:14:37 +02:00
Louis Dureuil
f4dd73ec8c Destructure EmbedderOptions so we don't miss some options 2024-05-02 15:39:36 +02:00
meili-bors[bot]
66dce4600d Merge #4603
4603: Update charabia v0.8.10 r=Kerollmops a=ManyTheFish

- Update Charabia v0.8.10
- Add `swedish-recomposition` as an optional feature flag

Co-authored-by: ManyTheFish <many@meilisearch.com>
2024-04-30 13:04:02 +00:00