Commit Graph

9364 Commits

Author SHA1 Message Date
261de888b7 Add function to get the embeddings of a document in an index 2024-05-20 10:36:17 +02:00
98c811247e Add parsed vectors module 2024-05-20 10:25:59 +02:00
59ecf1cea7 Merge #4544
4544: Stream documents r=curquiza a=irevoire

# Pull Request

## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/4383


### Perf
2M hackernews:

main:
Time to retrieve: 7s
RAM consumption: 2+GiB

stream:
Time to retrieve: 4.7s
RAM consumption: Too small

Co-authored-by: Tamo <tamo@meilisearch.com>
2024-05-17 14:49:08 +00:00
273c6e8c5c uses the latest version of heed to get rid of unsafe code 2024-05-16 18:31:32 +02:00
897d25780e update milli to latest version 2024-05-16 18:31:32 +02:00
c85d1752dd keep the same rtxn to compute the filters on the documents and to stream the documents later on 2024-05-16 18:31:32 +02:00
8e6ffbfc6f stream documents 2024-05-16 18:31:32 +02:00
7c19c072fa Merge #4631
4631: Split the field id map from the weight of each fields r=Kerollmops a=irevoire

# Pull Request

## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/4484

## What does this PR do?
- Make the (internal) searchable fields database always contain the searchable fields (instead of None when the user-defined searchable fields were not defined)
- Introduce a new « fieldids_weights_map » that does the mapping between a fieldId and its Weight
- Ensure that when two searchable fields are swapped, the field ID map doesn't change anymore (and thus, doesn't re-index)
- Uses the weight instead of the order of the searchable fields in the attribute ranking rule at search time
- When no searchable attributes are defined, make all their weights equal to zero
- When a field is declared as searchable and contains nested fields, all its subfields share the same weight

## Impact on relevancy

### When no searchable attributes are declared

When no searchable attributes are declared, all the fields have the same importance instead of randomly giving more importance to the field we've encountered « the most early » in the life of the index.

This means that before this PR, send the following json:
```json
[
  { "id": 0, "name": "kefir", "color": "white" },
  { "id": 1, "name": "white", "last name": "spirit" }
]
```

Would make the field `name` more important than the field `color` or `last name`.
This means that searching for `white` would make the document `1` automatically higher ranked than the document `0`.

After this PR, all the fields have the same weight, and none are considered more important than others.

### When a nested field is made searchable

The second behavior change that happened with this PR is in the case you're sending this document, for example:

```json
{
  "id": 0,
  "name": "tamo",
  "doggo": {
    "name": "kefir",
    "surname": "le kef"
  },
  "catto": "gromez"
}
```

Previously, defining the searchable attributes as: `["tamo", "doggo", "catto"]` was actually defining the « real » searchable attributes in the engine as: `["tamo", "doggo", "catto", "doggo.name", "doggo.surname"]`, which means that `doggo.name` and `doggo.surname` were _NOT_ where the user expected them and had completely different weights than `doggo`.
In this PR all the weights have been unified, and the « real » searchable fields look like this:
```json
[ "tamo", "doggo", "doggo.name", "doggo.surname", "catto"]
   ^^^^    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^    ^^^^^
Weight 0                 Weight 1                  Weight 2

Co-authored-by: Tamo <tamo@meilisearch.com>
2024-05-16 09:59:24 +00:00
673b6e1dc0 fix a flaky test 2024-05-16 11:28:14 +02:00
f2d0a59f1d when no searchable attributes are defined, makes all the weight equals to zero 2024-05-16 01:06:33 +02:00
c78a2fa4f5 rename method and variable around the attributes to search on feature 2024-05-15 18:04:42 +02:00
5542f1d9f1 get back to what we were doingb efore in the DB cache and with the restricted field id 2024-05-15 18:00:39 +02:00
ad4d8502b3 stops storing the whole fieldids weights map when no searchable are defined 2024-05-15 17:16:10 +02:00
7ec4e2a3fb apply all style review comments 2024-05-15 15:02:26 +02:00
9fffb8e83d make clippy happy 2024-05-14 17:36:32 +02:00
caa6a7149a make the attribute ranking rule use the weights and fix the tests 2024-05-14 17:36:32 +02:00
a0082c4df9 add a failing test on the attribute ranking rule 2024-05-14 17:00:02 +02:00
b0afe0972e stop updating the fields ids map when fields are only swapped 2024-05-14 17:00:02 +02:00
9ecde41853 add a test on the current behaviour 2024-05-14 17:00:02 +02:00
685f452fb2 Fix the indexing of the searchable 2024-05-14 17:00:02 +02:00
4e4a1ddff7 gate a test behind the required feature 2024-05-14 17:00:02 +02:00
c22460045c Stops returning an option in the internal searchable fields 2024-05-14 17:00:02 +02:00
76bb6d565c Merge #4624
4624: Add "precommands" to benchmark r=dureuill a=dureuill

# Pull Request

## Related issue
Helps for https://github.com/meilisearch/meilisearch/issues/4493

## What does this PR do?
- Add support for precommands for cargo xtask bench
- update benchmark docs
- update workload files


Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-05-13 08:27:56 +00:00
9d3ff11b21 Modify existing workload files to use precommands 2024-05-07 14:03:14 +02:00
43763eb98a Document precommands 2024-05-07 12:26:22 +02:00
2a0ece814c Add precommands to workloads 2024-05-07 12:23:36 +02:00
95fcd17373 Merge #4622
4622: Bump Rustls to non-vulnerable versions r=Kerollmops a=Kerollmops

This PR Fixes #4599 by bumping the Rustls dependency to v0.21.12 and [ureq to v2.9.7](https://github.com/algesten/ureq/blob/main/CHANGELOG.md#297) (which bump rustls to v0.22.4).

Co-authored-by: Clément Renault <clement@meilisearch.com>
2024-05-07 09:47:30 +00:00
ac4bc143c4 Bump ureq to v2.9.7 2024-05-07 10:39:38 +02:00
f33a1282f8 Bump Rustls to v0.21.12 2024-05-07 10:31:39 +02:00
4d5971f343 Merge #4621
4621: Bring back changes from v1.8.0 into main r=curquiza a=curquiza



Co-authored-by: ManyTheFish <many@meilisearch.com>
Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: meili-bors[bot] <89034592+meili-bors[bot]@users.noreply.github.com>
Co-authored-by: Clément Renault <clement@meilisearch.com>
2024-05-06 13:46:39 +00:00
ecb5c506b3 Merge #4619
4619: Use http path pattern instead of full path in metrics r=irevoire a=gh2k

# Pull Request

## Related issue

Fixes #3983 

## What does this PR do?

- This records only the HTTP pattern in metrics instead of the full path

An alternative solution was proposed in #4145, but this doesn't really fix the root cause of the issue. The problem I'm experiencing at my end is that by using the full path, the number of labels is far too high to be useful. It is normal practice to use the path with variable placeholders, instead of the fully-expanded path.

The example given in the ticket was endpoints under `/tasks`, but this can also be a very significant problem under `/indexes/{index-uid}/documents`. e.g.:
<img width="1510" alt="Screenshot 2024-05-03 at 12 14 36" src="https://github.com/meilisearch/meilisearch/assets/6530014/1df2ec19-5f69-4164-90d2-f65c59f9b544">

This patch replaces the fully-expanded path with the matched pattern.

The linked PR also mentions paths under other routes, e.g. `/static`, but this feels like a separate concern and these can be stripped out at the Prometheus end by filters if they are unwanted. The most important thing is to make the paths usable so that we can still get stats on e.g. the number of document deletes we see.

## PR checklist

Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!


Co-authored-by: Simon Detheridge <s@sd.ai>
Co-authored-by: Tamo <tamo@meilisearch.com>
2024-05-06 09:37:32 +00:00
3698aef66b fix warning 2024-05-06 11:36:37 +02:00
7f5ab3cef5 Use http path pattern instead of full path in metrics 2024-05-03 12:29:31 +01:00
c668043c4f Merge #4617
4617: Destructure `EmbedderOptions` so we don't miss some options r=dureuill a=dureuill

# Pull Request

## Related issue
#4595 was caused by the code not destructuring the embedder options.


## What does this PR do?
This PR adds the missing `url` parameter for ollama, and makes sure similar issue cannot happen in the future



Co-authored-by: Louis Dureuil <louis@meilisearch.com>
v1.8.0
2024-05-02 14:55:32 +00:00
5a305bfdea Remove unused struct 2024-05-02 16:14:37 +02:00
f4dd73ec8c Destructure EmbedderOptions so we don't miss some options 2024-05-02 15:39:36 +02:00
66dce4600d Merge #4603
4603: Update charabia v0.8.10 r=Kerollmops a=ManyTheFish

- Update Charabia v0.8.10
- Add `swedish-recomposition` as an optional feature flag

Co-authored-by: ManyTheFish <many@meilisearch.com>
2024-04-30 13:04:02 +00:00
fe51ceca6d Update lock file 2024-04-30 14:33:37 +02:00
88174b8ae4 Update charabia v0.8.10 2024-04-30 14:30:23 +02:00
ebca29f3de Merge #4597
4597: Fix embeddings settings update r=ManyTheFish a=ManyTheFish

# Pull Request
- add some conditions reducing the work done when changing the settings
- add some benchmarks on embedders

## Related issue
Fixes #4585


Co-authored-by: ManyTheFish <many@meilisearch.com>
v1.8.0-rc.2
2024-04-25 16:37:28 +00:00
c793b6ef6d Merge #4600
4600: Fix embedders api r=ManyTheFish a=ManyTheFish

# Pull Request

## Related issue
Fixes #4594
Fixes #4595


Co-authored-by: ManyTheFish <many@meilisearch.com>
2024-04-25 13:16:33 +00:00
cbbfff3594 Remove debuging prints 2024-04-25 10:37:18 +02:00
dbcf50589b Fix clippy 2024-04-25 10:36:10 +02:00
3e5cd027a5 Merge #4593
4593: Stop crashing when panic occurs in thread pool r=ManyTheFish a=Kerollmops

This PR fixes #4362 by introducing a new boolean to catch panics in the rayon thread pool. The boolean is read after performing the operations in rayon, and the indexation process is stopped. This first version doesn't expose the panic message but marks the task as failed.

The current implementation exposes a `ThreadPoolNoAbort` wrapper. The `rayon::ThreadPool` has been wrapped to check that nothing went wrong after running the `ThreadPool::install` function. An atomic boolean and some `store/load` logic make the system work efficiently.

Before, Meilisearch was completely crashing...

<img width="1563" alt="Capture d’écran 2024-04-22 à 15 49 02" src="https://github.com/meilisearch/meilisearch/assets/3610253/ce114917-a881-4fbb-85df-c195fcf0c7cb">

Now, it handles the panics correctly and marks the task as failed.

<img width="1558" alt="Capture d’écran 2024-04-22 à 15 42 14" src="https://github.com/meilisearch/meilisearch/assets/3610253/8bd031ef-5e8f-4a12-a91e-c823597a2344">


Co-authored-by: Clément Renault <clement@meilisearch.com>
2024-04-24 16:27:08 +00:00
7468c1cf8d Introduce WildcardSetting that are serialized as wildcards by default 2024-04-24 18:15:03 +02:00
d4aeff92d0 Introduce the ThreadPoolNoAbort wrapper 2024-04-24 16:40:12 +02:00
e87cb373de Avoid intermediate serializing when displaying settings 2024-04-24 12:33:07 +02:00
9b76501875 Display set API key for Ollama embedder 2024-04-24 12:33:07 +02:00
6247e95dc3 Add benchmark for embeddings prototype-embeddings-opti-0 2024-04-23 17:42:20 +02:00
b3173d0423 Remove useless dots in the error messages 2024-04-22 18:09:33 +02:00