Compare commits

145 Commits

Author SHA1 Message Date
Tamo
7415ef7ff5 Update crates/meilitool/src/upgrade/v1_11.rs
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-11-05 15:37:59 +01:00
Tamo
a5d138ac34 use a tag while importing arroy instead of a loose branch or rev 2024-11-05 15:24:02 +01:00
Tamo
0f74a93346 Update crates/meilitool/src/upgrade/v1_11.rs
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-11-05 15:14:02 +01:00
Tamo
e4993aa705 Update crates/meilitool/src/upgrade/mod.rs
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-11-05 15:13:50 +01:00
Tamo
66b7e0824e Update crates/meilitool/src/upgrade/mod.rs
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-11-05 15:13:40 +01:00
Tamo
f193c3a67c Update crates/meilitool/src/main.rs
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-11-05 15:13:32 +01:00
Tamo
48ab898ca2 fix the datetime of v1.9 2024-11-05 10:30:53 +01:00
Tamo
a1f228f662 remove the unneeded files after the rebase 2024-11-04 18:19:36 +01:00
Tamo
99a9fde37f push back the removed files 2024-11-04 17:55:55 +01:00
Tamo
106cc7fe3a fmt 2024-11-04 17:51:40 +01:00
Tamo
4eef0cd332 fix the update from v1_9 to v1_10 by providing a custom datetime formatter myself 2024-11-04 17:47:10 +01:00
Tamo
5f57306858 update the arroy version in meilitool 2024-11-04 17:47:10 +01:00
Tamo
690eb42fc0 update the version of arroy 2024-11-04 17:47:10 +01:00
Tamo
a9b61c8434 fix the version parsing and improve error handling 2024-11-04 17:47:10 +01:00
Tamo
ddd03e9b37 implement the upgrade from v1.10 to v1.11 in meilitool 2024-11-04 17:47:10 +01:00
Tamo
362836efb7 make an upgrade module where we'll be able to shove each version instead of putting everything in the same file 2024-11-04 17:47:10 +01:00
meili-bors[bot]
22229d3046 Merge #5022
5022: Bringing changes from v1.11.0 back to main r=irevoire a=Kerollmops

Fixes https://github.com/meilisearch/meilisearch/issues/5035

...and fixing merge conflicts.

Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
Co-authored-by: meili-bors[bot] <89034592+meili-bors[bot]@users.noreply.github.com>
Co-authored-by: ManyTheFish <many@meilisearch.com>
Co-authored-by: curquiza <clementine@meilisearch.com>
2024-11-04 15:34:19 +00:00
Tamo
186326fe40 update the macos version 2024-11-04 16:33:04 +01:00
Tamo
cf6ad1ae5e Merge branch 'main' into tmp-release-v1.11.0 2024-11-04 16:14:44 +01:00
meili-bors[bot]
28274292d8 Merge #5021
5021: Update benchmarks to match the new crates subfolder r=dureuill a=Kerollmops



Co-authored-by: Clément Renault <clement@meilisearch.com>
2024-10-29 08:06:35 +00:00
Clément Renault
ee72f622c7 Update benchmarks to match the new crates subfolder 2024-10-28 14:06:46 +01:00
meili-bors[bot]
b0da626506 Merge #5016
5016: Hide code complexity into a subfolder r=Kerollmops a=Kerollmops

This PR moves the complexity and main code into a subfolder to make the main repository page more welcoming by reducing the number of visible files and showing the README earlier.

Co-authored-by: Clément Renault <clement@meilisearch.com>
2024-10-28 09:43:14 +00:00
meili-bors[bot]
f372ee505f Merge #5017
5017: Rollback the Meilisearch Kawaii logo r=curquiza a=Kerollmops

This PR reverts #4778 and brings back the official one. It's no longer the time to JOKE, OK !?

Co-authored-by: Clément Renault <clement@meilisearch.com>
2024-10-22 08:14:18 +00:00
meili-bors[bot]
3753f87fd8 Merge #5011
5011: Revamp analytics r=ManyTheFish a=irevoire

# Pull Request

## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/5009

## What does this PR do?
- Force every analytics event to go through a trait that forces you to handle aggregation correctly
- Share the code that retrieves the `user-agent`, `timestamp` and `requests.total_received` between all aggregates, so there is no mistake
- Get rid of the separate channels for each kind of event in favor of an any map
- Ensure that we never [send an empty event ever again](https://github.com/meilisearch/meilisearch/pull/5001)
- Merge all the sub-settings routes into a global « Settings Updated » event.
- Fix: when using one of the three following features, we were not sending any analytics if they were set from the global route
  - /non-separator-tokens
  - /separator-tokens
  - /dictionary
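
A minimal sketch of the aggregation idea described above, not the code from this PR. The `Aggregate` trait, the `AnyMap` type and the `SettingsUpdated` fields are hypothetical names chosen for illustration. It assumes one slot per event type in a map keyed by `TypeId`, with the most recent setting kept when the same setting is received several times.

```rust
use std::any::{Any, TypeId};
use std::collections::HashMap;

/// Hypothetical trait: every event type must say how two batches merge.
pub trait Aggregate: Sized + 'static {
    /// Name under which the merged event would be sent.
    fn event_name(&self) -> &'static str;
    /// Merge `new` (received later) into `self` (received earlier).
    fn aggregate(self, new: Self) -> Self;
}

/// Hypothetical "any map": one pending aggregate per event type.
#[derive(Default)]
pub struct AnyMap {
    events: HashMap<TypeId, Box<dyn Any>>,
}

impl AnyMap {
    /// Store `event`, merging it with the pending aggregate of the same type.
    pub fn publish<A: Aggregate>(&mut self, event: A) {
        let key = TypeId::of::<A>();
        let merged = match self.events.remove(&key) {
            Some(old) => {
                // The slot is keyed by the TypeId of `A`, so the downcast succeeds.
                let old = *old.downcast::<A>().expect("slot holds the keyed type");
                old.aggregate(event)
            }
            None => event,
        };
        self.events.insert(key, Box::new(merged));
    }

    /// Read the pending aggregate for one event type, if any.
    pub fn get<A: Aggregate>(&self) -> Option<&A> {
        self.events.get(&TypeId::of::<A>()).and_then(|b| b.downcast_ref::<A>())
    }
}

/// Hypothetical event; the field names are illustrative only.
#[derive(Debug, PartialEq)]
pub struct SettingsUpdated {
    pub total_received: usize,
    pub typo_tolerance: Option<bool>,
}

impl Aggregate for SettingsUpdated {
    fn event_name(&self) -> &'static str {
        "Settings Updated"
    }

    fn aggregate(self, new: Self) -> Self {
        SettingsUpdated {
            total_received: self.total_received + new.total_received,
            // Keep the last value received, falling back to the earlier one.
            typo_tolerance: new.typo_tolerance.or(self.typo_tolerance),
        }
    }
}
```

Because the trait is the only way into the map, a new event type cannot be published without defining its merge behaviour.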

Co-authored-by: Tamo <tamo@meilisearch.com>
2024-10-21 15:08:49 +00:00
Clément Renault
8ef8035bf2 Fix CI 2024-10-21 08:28:33 +02:00
Clément Renault
3353bcd82d Revert "Change the Meilisearch logo to the kawaii version"
This reverts commit 13d1d78a2d.
2024-10-21 08:21:56 +02:00
Clément Renault
9c1e54a2c8 Move crates under a sub folder to clean up the code 2024-10-21 08:18:43 +02:00
Tamo
5675585fe8 move all the searches structures to new modules 2024-10-20 17:54:43 +02:00
Tamo
af589c85ec reverse all the settings to keep the last one received instead of the first one received in case we receive the same setting multiple times 2024-10-20 17:40:31 +02:00
Tamo
ac919df37d simplify the trait a bit more by getting rid of the downcast_aggregate method 2024-10-20 17:36:29 +02:00
Tamo
73b5722896 rename the other parameter of the aggregate method to new to avoid confusion 2024-10-20 17:31:35 +02:00
Tamo
c94679bde6 apply review comments 2024-10-20 17:24:12 +02:00
Tamo
89e2d2b2b9 fix the doctest 2024-10-17 13:55:49 +02:00
Tamo
3a7a20c716 remove the segment feature and always import segment 2024-10-17 11:21:14 +02:00
Tamo
fa1db6b721 fix the tests 2024-10-17 09:55:30 +02:00
Tamo
1ab6fec903 send all experimental features in the info event including the runtime one 2024-10-17 09:49:21 +02:00
Tamo
18ac4032aa Remove the experimental feature seen 2024-10-17 09:35:11 +02:00
Tamo
d9115b74f0 move the analytics settings code to a dedicated file 2024-10-17 09:32:54 +02:00
Tamo
0fde49640a make clippy happy 2024-10-17 09:18:25 +02:00
Tamo
4ee65d870e remove a lot of unused code 2024-10-17 09:14:34 +02:00
Tamo
ef77c7699b add the required shared values between all the events and fix the timestamp 2024-10-17 09:06:23 +02:00
Tamo
7382fb21e4 fix the main 2024-10-17 08:38:11 +02:00
Tamo
e4ace98004 fix all the routes + move to a better version of mopa 2024-10-17 01:04:25 +02:00
Tamo
aa7a34ffe8 make the aggregate method send 2024-10-17 00:43:34 +02:00
Tamo
6728cfbfac fix the analytics 2024-10-17 00:38:18 +02:00
Tamo
ea6883189e finish the analytics in all the routes 2024-10-16 21:17:06 +02:00
Tamo
fdeb47fb54 implements all routes 2024-10-16 17:16:33 +02:00
Tamo
e66fccc3f2 get rid of the analytics closure 2024-10-16 15:51:48 +02:00
Tamo
73e87c152a rewrite most of the analytics especially the settings 2024-10-16 15:43:27 +02:00
meili-bors[bot]
75b2f22add Merge #5008
5008: Display vectors when no custom vectors were ever provided r=irevoire a=dureuill

# Pull Request

## Related issue
Fixes the issue reported on [Discord](https://discord.com/channels/1006923006964154428/1294653031958446080/1295336784896589967).

## What does this PR do?
- Normal behavior of Meilisearch is to hide `_vectors` even when `retrieveVectors: true` when there is an explicit list of displayed attributes that does not contain vectors
- However, this relied on the field id for the `_vectors` field to exist, which wasn't the case when no `_vectors` was manually provided to documents. This would often be the case for people using autoembedders such as the OpenAI integration.
- This PR fixes the behavior by looking for the `_vectors` string in the `displayedAttributes` when there is no `_vectors` fid.
- This PR also adds a test for this specific situation, that would fail before the PR, and pass after the PR
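
The decision described in the bullets above can be sketched roughly as follows. This is an illustration under assumptions, not the code of the PR: the function name, its parameters and the use of `None` for "no explicit list of displayed attributes" are hypothetical.

```rust
/// Hypothetical helper deciding whether `_vectors` may be returned.
///
/// * `displayed_attributes`: `None` means no explicit list was configured.
/// * `vectors_fid`: field id of `_vectors`, absent when no document ever
///   provided `_vectors` manually (for example with autoembedders).
/// * `displayed_fids`: field ids of the explicitly displayed attributes.
pub fn should_return_vectors(
    retrieve_vectors: bool,
    displayed_attributes: Option<&[&str]>,
    vectors_fid: Option<u16>,
    displayed_fids: &[u16],
) -> bool {
    if !retrieve_vectors {
        return false;
    }
    let Some(displayed) = displayed_attributes else {
        // No explicit list: nothing hides `_vectors`.
        return true;
    };
    match vectors_fid {
        // Previous behaviour: rely on the field id being displayed.
        Some(fid) => displayed_fids.contains(&fid),
        // Fallback described in the PR: look for the `_vectors` string itself.
        None => displayed.iter().any(|name| *name == "_vectors"),
    }
}
```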


Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-10-15 13:08:47 +00:00
Louis Dureuil
5a74d4729c Add test failing before this PR, OK now 2024-10-14 16:23:28 +02:00
Louis Dureuil
e44e7b5e81 Fix retrieveVectors when explicitly passed in displayed attributes without any document containing _vectors 2024-10-14 16:17:19 +02:00
meili-bors[bot]
a0b3887709 Merge #5006
5006: Bring back changes from v1.10.3 r=Kerollmops a=irevoire

# Pull Request

## Related issue
Port the following PR to the latest version: https://github.com/meilisearch/meilisearch/pull/5000
See its description for more information

Co-authored-by: Tamo <tamo@meilisearch.com>
2024-10-14 14:06:35 +00:00
Tamo
4b4a6c7863 Update meilisearch/src/option.rs
Co-authored-by: Clément Renault <clement@meilisearch.com>
2024-10-14 14:39:34 +02:00
Tamo
3085092e04 Update meilisearch/src/option.rs
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-10-14 14:39:34 +02:00
Tamo
c4efd1df4e Update meilisearch/src/option.rs
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-10-14 14:39:34 +02:00
Tamo
c32282acb1 improve doc 2024-10-14 14:39:34 +02:00
Tamo
92070a3578 Implement the experimental drop search after and nb search per core 2024-10-14 14:39:33 +02:00
meili-bors[bot]
a90563df3f Merge #5001
5001: Do not send empty edit document by function r=Kerollmops a=irevoire

# Pull Request

We realized that we had a huge usage of the feature from users who hadn't enabled the feature at all. That shouldn't be possible.
After a big investigation with `@gmourier` 
![image](https://github.com/user-attachments/assets/eae3e851-dc5b-4616-80ee-7237a4871522)
We found the issue, it was in the engine

## What does this PR do?
- Do not send the edit by function event to segment if no event was received during this batch

Co-authored-by: Tamo <tamo@meilisearch.com>
2024-10-11 08:27:16 +00:00
Tamo
466604725e Do not send empty edit document by function 2024-10-10 23:47:15 +02:00
meili-bors[bot]
995394a516 Merge #4993
4993: Update mini-dashboard r=ManyTheFish a=curquiza

Remove the forced capitalized attribute name

Co-authored-by: curquiza <clementine@meilisearch.com>
2024-10-10 05:57:45 +00:00
curquiza
6e37ae8619 Update mini-dashboard 2024-10-09 19:13:14 +02:00
meili-bors[bot]
657c645603 Merge #4992
4992: fix the bad experimental search queue size r=dureuill a=irevoire

# Pull Request

## Related issue
Fixes #4991 

## What does this PR do?
- Set the right default value for the experimental search queue size in the config file


Co-authored-by: Tamo <tamo@meilisearch.com>
2024-10-09 10:45:48 +00:00
Tamo
7f5d0837c3 fix the bad experimental search queue size 2024-10-09 11:46:57 +02:00
meili-bors[bot]
30f3c30389 Merge #4962
4962: test: improve performance of create_index.rs r=irevoire a=DerTimonius

# Pull Request

## Related issue
related to #4840 

## What does this PR do?
This PR follows the instructions in #4840 and improves the performance of `meilisearch/tests/index/create_index.rs`. The tests run locally, if they fail in the CI I'll try to fix them

## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!


Co-authored-by: Timon Jurschitsch <timon.jurschitsch@gmail.com>
2024-10-08 13:00:56 +00:00
meili-bors[bot]
d907d1b22d Merge #4990
4990: Add image source label to dockerfiles r=curquiza a=wuast94

To get changelogs shown with Renovate a docker container has to add the source label described in the OCI Image Format Specification.

For reference: https://github.com/renovatebot/renovate/blob/main/lib/modules/datasource/docker/readme.md

Co-authored-by: Marc <github@wuast24.de>
Co-authored-by: Clémentine <clementine@meilisearch.com>
2024-10-08 12:19:38 +00:00
Clémentine
ed267fa063 Apply suggestions from code review 2024-10-08 14:14:16 +02:00
Marc
6af55b1a80 Update Dockerfile 2024-10-08 11:59:43 +02:00
Timon Jurschitsch
5b04189f7a remove flaky assert 2024-10-07 16:50:57 +02:00
Timon Jurschitsch
c0912aa685 add missing shared servers 2024-10-07 16:29:47 +02:00
Timon Jurschitsch
af38f46621 Merge branch 'main' of https://github.com/meilisearch/meilisearch into test/improve-create-index 2024-10-07 16:27:57 +02:00
meili-bors[bot]
386ca86297 Merge #4963
4963: test: improve performance of delete_index.rs r=curquiza a=DerTimonius

# Pull Request

## Related issue
related to #4840

## What does this PR do?
This PR follows the instructions in #4840 and improves the performance of `meilisearch/tests/index/delete_index.rs`. The tests run locally, if they fail in the CI I'll try to fix them

## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!


Co-authored-by: Timon Jurschitsch <timon.jurschitsch@gmail.com>
2024-10-03 15:40:07 +00:00
Timon Jurschitsch
2a18917af3 add delete_index_fail function 2024-10-02 16:23:21 +02:00
meili-bors[bot]
0566f2549d Merge #4972
4972: Add binary quantized to error messages r=irevoire a=dureuill

was missing in error messages

Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-10-02 09:23:55 +00:00
Louis Dureuil
0c2661ea90 Fix tests 2024-10-02 11:20:29 +02:00
Louis Dureuil
62dfbd6255 Add binary quantized to allowed fields for source adds its sources 2024-10-02 11:20:02 +02:00
meili-bors[bot]
cc669f90d5 Merge #4971
4971: update arroy r=dureuill a=irevoire

# Pull Request

Fix part of https://github.com/meilisearch/meilisearch/issues/3715


## What does this PR do?
- Update arroy to the latest version; most changes are maintenance changes
- The performance of adding vectors to arroy should slightly improve
- Forward the build cancellation function to arroy so it can stop building trees when we have to stop an indexing process
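
A rough sketch of what forwarding a cancellation function can look like. It does not use the arroy API: `build_trees`, `BuildError` and `stop_flag_checker` are hypothetical names, and the loop only stands in for the real tree-building work.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Hypothetical error returned when the build is interrupted.
#[derive(Debug, PartialEq, Eq)]
pub enum BuildError {
    Cancelled { trees_built: usize },
}

/// Stand-in for a tree builder: checks `cancel` before starting each tree.
pub fn build_trees<F: Fn() -> bool>(n_trees: usize, cancel: F) -> Result<usize, BuildError> {
    let mut trees_built = 0;
    for _ in 0..n_trees {
        if cancel() {
            return Err(BuildError::Cancelled { trees_built });
        }
        // The real work of building one tree would happen here.
        trees_built += 1;
    }
    Ok(trees_built)
}

/// Turn a shared stop flag into the closure handed to the builder.
pub fn stop_flag_checker(flag: &AtomicBool) -> impl Fn() -> bool + '_ {
    move || flag.load(Ordering::Relaxed)
}
```

The caller keeps ownership of the flag, so the indexing process can set it from another thread while the build is running.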


Co-authored-by: Tamo <tamo@meilisearch.com>
2024-10-02 05:53:51 +00:00
Tamo
b1dc10e771 uses the new cancellation method in arroy 2024-10-01 17:45:49 +02:00
Tamo
4b598fa648 update arroy 2024-10-01 17:31:12 +02:00
Timon Jurschitsch
17571805b4 use shared servers 2024-10-01 17:27:27 +02:00
Timon Jurschitsch
2654ce6e6c use shared servers 2024-10-01 17:01:47 +02:00
meili-bors[bot]
e78da35287 Merge #4930
4930: Return `UserError::InvalidDocumentId` for primary keys with a length greater than 512 bytes r=curquiza a=flevi29

# Pull Request

## Related issue
Fixes #4843
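
For illustration, a length check of this kind could look like the sketch below. The 512-byte threshold comes from the PR title; the `validate_document_id` function and the shape of the `UserError` variant are assumptions, not the actual milli definitions.

```rust
/// Hypothetical error type mirroring the variant named in the PR title.
#[derive(Debug, PartialEq, Eq)]
pub enum UserError {
    InvalidDocumentId { document_id: String },
}

/// Threshold taken from the PR title (assumed to be inclusive).
pub const MAX_DOCUMENT_ID_BYTES: usize = 512;

/// Reject primary key values longer than the limit, measured in bytes.
pub fn validate_document_id(document_id: &str) -> Result<(), UserError> {
    // `str::len` counts bytes, not characters, which is what the limit is about.
    if document_id.len() > MAX_DOCUMENT_ID_BYTES {
        return Err(UserError::InvalidDocumentId { document_id: document_id.to_string() });
    }
    Ok(())
}
```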

## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!


Co-authored-by: F. Levi <55688616+flevi29@users.noreply.github.com>
2024-09-30 15:55:05 +00:00
Timon Jurschitsch
84b4219a4f test: improve delete_index.rs 2024-09-29 10:16:31 +02:00
Timon Jurschitsch
5539a1904a test: improve performance of create_index.rs 2024-09-28 11:05:52 +02:00
meili-bors[bot]
71b364286b Merge #4957
4957: Update charabia feature flags r=dureuill a=ManyTheFish

# Pull Request

Add charabia's `turkish` feature flag into Meilisearch default tokenization flag



[All tests pipeline](https://github.com/meilisearch/meilisearch/actions/runs/11030036031)

Co-authored-by: ManyTheFish <many@meilisearch.com>
2024-09-26 20:19:21 +00:00
meili-bors[bot]
86183e0807 Merge #4960
4960: Update rhai r=dureuill a=irevoire

# Pull Request

## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/4956

A fix has been implemented in https://github.com/rhaiscript/rhai/issues/916

## What does this PR do?
- Use the latest version of rhai containing the fix

Co-authored-by: Tamo <tamo@meilisearch.com>
2024-09-26 15:03:01 +00:00
Tamo
78a4b7949d update rhai to a version that shouldn’t panic 2024-09-26 15:04:03 +02:00
ManyTheFish
dc2cb58cf1 use charabia default for all-tokenization 2024-09-25 11:12:30 +02:00
ManyTheFish
e9580fe619 Add turkish normalization 2024-09-25 11:03:17 +02:00
meili-bors[bot]
8205254f4c Merge #4955
4955: Upgrade "batch failed" log to error level r=irevoire a=dureuill

# Pull Request

## Related issue
Fixes #4916 


Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-09-25 08:18:44 +00:00
meili-bors[bot]
efdc5739d7 Merge #4953
4953: Move the multi arroy index logic to the arroy wrapper r=irevoire a=irevoire

# Pull Request

## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/4948

## What does this PR do?
- Make the `ArroyWrapper` we introduced in the last PR handle all the embeddings for a specific docid itself.


Co-authored-by: Tamo <tamo@meilisearch.com>
2024-09-24 15:02:24 +00:00
Tamo
b31e9bea26 while retrieving the readers on an arroywrapper, stops at the first empty reader 2024-09-24 16:33:17 +02:00
Tamo
7f048b9732 early exit in the clear and contains 2024-09-24 15:02:38 +02:00
Tamo
8b4e2c7b17 Remove now unused method 2024-09-24 15:00:25 +02:00
Tamo
645a55317a merge the build and quantize method 2024-09-24 14:54:24 +02:00
meili-bors[bot]
8caf97db86 Merge #4954
4954: Fix bench by adding embedder r=ManyTheFish a=dureuill

Fix benchmark workloads following breaking change on embedders

Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-09-24 12:53:34 +00:00
Tamo
b8a74e0464 fix comments 2024-09-24 10:59:15 +02:00
Tamo
fd8447c521 fix the del items thing 2024-09-24 10:52:05 +02:00
Tamo
f2d187ba3e rename the index method to embedder_index 2024-09-24 10:39:40 +02:00
Tamo
79d8a7a51a rename the embedder index for clarity 2024-09-24 10:36:28 +02:00
Louis Dureuil
86da0e83fe Upgrade "batch failed" log to ERROR level 2024-09-24 10:02:53 +02:00
Louis Dureuil
0704fb71e9 Fix bench by adding embedder 2024-09-24 09:56:47 +02:00
Tamo
1e4d4e69c4 finish the arroywrapper 2024-09-23 18:56:15 +02:00
Tamo
6ba4baecbf first ugly step 2024-09-23 15:15:26 +02:00
meili-bors[bot]
7f20c13f3f Merge #4943
4943: Correct broken links in README r=curquiza a=iornstein

# Pull Request

## Related issue
Fixes #4942

## What does this PR do?
- Corrects some broken links in the README. My suspicion is that some of these documentation articles were moved around without someone updating links in the README.

## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)? _(well the contributing guidelines led me to create an issue first)_
- [x] Have you read the contributing guidelines? _yes_
- [x] Have you made sure that the title is accurate and descriptive of the changes? _yes_

Thank you so much for contributing to Meilisearch!


Co-authored-by: Ian Ornstein <ian.ornstein@gmail.com>
2024-09-19 19:22:04 +00:00
meili-bors[bot]
462a2329f1 Merge #4941
4941: Implement the binary quantization in meilisearch r=irevoire a=irevoire

# Pull Request

## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/4873

## What does this PR do?
- Add a setting for the binary quantization
- Once enabled, the bq cannot be disabled

TODO:
- [ ] Missing a bunch of tests

Co-authored-by: Tamo <tamo@meilisearch.com>
2024-09-19 15:50:24 +00:00
Tamo
afa3ae0cbd WIP 2024-09-19 17:42:52 +02:00
Tamo
f6483cf15d apply review comment 2024-09-19 16:47:06 +02:00
meili-bors[bot]
bd34ed01d9 Merge #4945
4945: Add swedish in default pipelines r=dureuill a=ManyTheFish

# Summary
## Fix Swedish support

In Swedish, the characters `å`/`ä`/`ö` are completely different from `a` or `o` and should not be normalized to the same character.
Because the Swedish specialized pipeline was not activated by default, these characters were normalized even with the settings:
```json
{
  "localizedAttributes": [ { "locales": ["swe"], "attributePatterns": ["*"] } ]
}
```
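
To make the effect concrete, here is a toy normalizer. It does not use Charabia and is not how its pipelines are implemented; `normalize` and its `locale` parameter are hypothetical, and the only point is that the Swedish locale must skip the folding of `å`/`ä`/`ö`.

```rust
/// Toy normalizer: lowercases and, outside Swedish, folds å/ä/ö to a/o.
/// Assumes precomposed characters (no combining diacritics).
pub fn normalize(text: &str, locale: Option<&str>) -> String {
    let keep_swedish_letters = locale == Some("swe");
    text.chars()
        .flat_map(char::to_lowercase)
        .map(|c| match c {
            'å' | 'ä' if !keep_swedish_letters => 'a',
            'ö' if !keep_swedish_letters => 'o',
            other => other,
        })
        .collect()
}
```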

## Update Charabia adding German support

German segmentation will now be activated using the setting:
```json
{
  "localizedAttributes": [ { "locales": ["deu"], "attributePatterns": ["*"] } ]
}
```

# TODO

- [x] Activate Swedish Pipeline
- [x] Add a test to avoid future regressions
- [x] Update Charabia


Co-authored-by: ManyTheFish <many@meilisearch.com>
2024-09-19 14:42:03 +00:00
Tamo
74199f328d Make clippy happy 2024-09-19 16:27:34 +02:00
Tamo
1113c42de0 fix broken comments 2024-09-19 16:18:36 +02:00
ManyTheFish
465afe01b2 Add test for German 2024-09-19 16:09:01 +02:00
ManyTheFish
7d6768e4c4 Add german tokenization pipeline 2024-09-19 16:09:01 +02:00
ManyTheFish
f77661ec44 Update Charabia v0.9.1 2024-09-19 16:08:59 +02:00
Tamo
b8fd85a46d Get rid of useless collect before an iteration on the readers 2024-09-19 15:57:38 +02:00
Tamo
fd43c6c404 Improve the error message explaining you can't un-bq an embedder 2024-09-19 15:51:29 +02:00
Tamo
2564ec1496 Update milli/src/index.rs
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-09-19 15:41:44 +02:00
Tamo
b6b73fe41c Update milli/src/update/settings.rs
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-09-19 15:41:14 +02:00
Tamo
6dde41cc46 stop using a local version of arroy and instead point to the git repo with the rev 2024-09-19 15:25:38 +02:00
Tamo
163f8023a1 remove debug println 2024-09-19 12:13:25 +02:00
Tamo
2b120b89e4 update the test now that the embedder must be specified 2024-09-19 12:08:59 +02:00
Tamo
84f842233d snapshots the embedder settings in the dump import with vector test 2024-09-19 12:00:58 +02:00
Tamo
633537ccd7 fix updating documents without updating the settings 2024-09-19 12:00:58 +02:00
Tamo
e8d7c00d30 add a test on the settings value 2024-09-19 12:00:58 +02:00
Tamo
3f6301dbc9 fix the missing embedder name in the error message when trying to disable the binary quantization 2024-09-19 12:00:58 +02:00
Tamo
ca71b63ed1 adds integration tests 2024-09-19 12:00:58 +02:00
Tamo
2b6952eda1 rename the ArroyReader to an ArroyWrapper since it can read and write 2024-09-19 12:00:58 +02:00
Tamo
79f29eed3c fix the tests and the arroy_readers method 2024-09-19 12:00:58 +02:00
Tamo
cc45e264ca implement the binary quantization in meilisearch 2024-09-19 12:00:56 +02:00
meili-bors[bot]
5f474a640d Merge #4938
4938: Remove default embedder r=ManyTheFish a=dureuill

# Pull Request

## Related issue
Fixes #4738 

## What does this PR do?

[See public usage](https://meilisearch.notion.site/v1-11-AI-search-changes-0e37727193884a70999f254fa953ce6e#1044b06b651f80edb9d4ef6dc367bad0)

- Remove `hybrid.embedder` boolean from analytics because embedder is now mandatory and so the boolean would always be `true`
- Rework search kind so that a search without query but with vector is a vector search regardless of (non-zero) semantic ratio
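
The reworked rule can be sketched as below. The `SearchKind` enum and the `search_kind` function here are simplified, hypothetical versions written for illustration; in particular, treating a missing or zero semantic ratio as a keyword search is an assumption of this sketch.

```rust
/// Simplified, hypothetical classification of a search request.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum SearchKind {
    Keyword,
    Semantic,
    Hybrid { semantic_ratio: f32 },
}

/// Pick the kind of search from the query, the vector and the semantic ratio.
pub fn search_kind(
    query: Option<&str>,
    vector: Option<&[f32]>,
    semantic_ratio: Option<f32>,
) -> SearchKind {
    // Assumption: no ratio, or a ratio of zero, means a plain keyword search.
    let ratio = match semantic_ratio {
        Some(r) if r > 0.0 => r,
        _ => return SearchKind::Keyword,
    };
    match (query, vector) {
        // The rework: a vector without a query is a vector search,
        // whatever the (non-zero) semantic ratio is.
        (None, Some(_)) => SearchKind::Semantic,
        _ if ratio >= 1.0 => SearchKind::Semantic,
        _ => SearchKind::Hybrid { semantic_ratio: ratio },
    }
}
```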


Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-09-19 09:17:14 +00:00
ManyTheFish
bbaee3dbc6 Add Swedish pipeline in all-tokenization feature 2024-09-19 08:34:51 +02:00
ManyTheFish
877717cb26 Add a test using Swedish documents 2024-09-19 08:34:04 +02:00
Ian Ornstein
716817122a Correct broken links in README 2024-09-18 16:30:29 -05:00
Louis Dureuil
1120a5296c Update tests 2024-09-17 16:30:43 +02:00
Louis Dureuil
a35a339c3d Touchup error message 2024-09-17 16:30:43 +02:00
Louis Dureuil
cac5836f6f Remove hybrid.embedder boolean from analytics because embedder is now mandatory 2024-09-17 16:30:43 +02:00
Louis Dureuil
5239ae0297 Rework search kind so that a search without query but with vector is a vector search regardless of semantic ratio 2024-09-17 16:30:43 +02:00
Louis Dureuil
2fdb1d8018 SearchQueryGet can fail 2024-09-17 16:30:43 +02:00
Louis Dureuil
3c5e363554 Remove default embedders 2024-09-17 16:30:43 +02:00
Louis Dureuil
da0dd6febf Make embedder mandatory 2024-09-17 16:30:43 +02:00
F. Levi
e098cc8320 Make comparison simpler, add IndexUid error details similarly 2024-09-17 00:16:15 +03:00
F. Levi
ec815fa368 Format 2024-09-16 23:59:48 +03:00
F. Levi
4a922a176f Add test for > 512 byte ID 2024-09-16 23:53:34 +03:00
F. Levi
51bc7b3173 Update tests 2024-09-16 22:22:24 +03:00
F. Levi
dcb61f8b3a Return error for primary keys with a length greater than 512 bytes 2024-09-14 11:34:13 +03:00
1087 changed files with 6501 additions and 4861 deletions

View File

@@ -40,7 +40,7 @@ jobs:
 # Run benchmarks
 - name: Run benchmarks - Dataset ${BENCH_NAME} - Branch ${{ steps.current_branch.outputs.name }} - Commit ${{ steps.commit_sha.outputs.short }}
 run: |
-cd benchmarks
+cd crates/benchmarks
 cargo bench --bench ${BENCH_NAME} -- --save-baseline ${{ steps.file.outputs.basename }}
 # Generate critcmp files

View File

@@ -65,9 +65,9 @@ jobs:
 strategy:
 fail-fast: false
 matrix:
-os: [macos-12, windows-2022]
+os: [macos-13, windows-2022]
 include:
-- os: macos-12
+- os: macos-13
 artifact_name: meilisearch
 asset_name: meilisearch-macos-amd64
 - os: windows-2022
@@ -90,7 +90,7 @@ jobs:
 publish-macos-apple-silicon:
 name: Publish binary for macOS silicon
-runs-on: macos-12
+runs-on: macos-13
 needs: check-version
 strategy:
 matrix:

View File

@@ -51,7 +51,7 @@ jobs:
 strategy:
 fail-fast: false
 matrix:
-os: [macos-12, windows-2022]
+os: [macos-13, windows-2022]
 steps:
 - uses: actions/checkout@v3
 - name: Cache dependencies
@@ -169,5 +169,5 @@ jobs:
 # Since we want to trigger (and fail) this action as fast as possible, instead of building the benchmark crate
 # we are going to create an empty file where rustfmt expects it.
 run: |
-echo -ne "\n" > benchmarks/benches/datasets_paths.rs
+echo -ne "\n" > crates/benchmarks/benches/datasets_paths.rs
 cargo fmt --all -- --check

3
.gitignore vendored
View File

@@ -5,7 +5,6 @@
**/*.json_lines
**/*.rs.bk
/*.mdb
/query-history.txt
/data.ms
/snapshots
/dumps
@@ -19,4 +18,4 @@
 *.snap.new
 # Fuzzcheck data for the facet indexing fuzz test
-milli/fuzz/update::facet::incremental::fuzz::fuzz/
+crates/milli/fuzz/update::facet::incremental::fuzz::fuzz/

60
Cargo.lock generated
View File

@@ -386,15 +386,35 @@ checksum = "96d30a06541fbafbc7f82ed10c06164cfbd2c401138f6addd8404629c4b16711"
[[package]]
name = "arroy"
version = "0.4.0"
version = "0.5.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "2ece9e5347e7fdaaea3181dec7f916677ad5f3fcbac183648ce1924eb4aeef9a"
checksum = "dfc5f272f38fa063bbff0a7ab5219404e221493de005e2b4078c62d626ef567e"
dependencies = [
"bytemuck",
"byteorder",
"heed",
"log",
"memmap2",
"nohash",
"ordered-float",
"rand",
"rayon",
"roaring",
"tempfile",
"thiserror",
]
[[package]]
name = "arroy"
version = "0.5.0"
source = "git+https://github.com/meilisearch/arroy/?tag=DO-NOT-DELETE-upgrade-v04-to-v05#053807bf38dc079f25b003f19fc30fbf3613f6e7"
dependencies = [
"bytemuck",
"byteorder",
"heed",
"log",
"memmap2",
"nohash",
"ordered-float",
"rand",
"rayon",
@@ -706,9 +726,9 @@ checksum = "2c676a478f63e9fa2dd5368a42f28bba0d6c560b775f38583c8bbaa7fcd67c9c"
 [[package]]
 name = "bytemuck"
-version = "1.16.1"
+version = "1.19.0"
 source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "b236fc92302c97ed75b38da1f4917b5cdda4984745740f153a5d3059e48d725e"
+checksum = "8334215b81e418a0a7bdb8ef0849474f40bb10c8b71f1c4ed315cff49f32494d"
 dependencies = [
 "bytemuck_derive",
 ]
@@ -933,9 +953,9 @@ dependencies = [
 [[package]]
 name = "charabia"
-version = "0.9.0"
+version = "0.9.1"
 source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "03cd8f290cae94934cdd0103c14c2de9faf2d7d85be0d24d511af2bf1b14119d"
+checksum = "55ff52497324e7d168505a16949ae836c14595606fab94687238d2f6c8d4c798"
 dependencies = [
 "aho-corasick",
 "csv",
@@ -2555,7 +2575,7 @@ name = "index-scheduler"
 version = "1.11.0"
 dependencies = [
 "anyhow",
-"arroy",
+"arroy 0.5.0 (registry+https://github.com/rust-lang/crates.io-index)",
 "big_s",
 "bincode",
 "crossbeam",
@@ -2838,7 +2858,7 @@ source = "registry+https://github.com/rust-lang/crates.io-index"
 checksum = "e310b3a6b5907f99202fcdb4960ff45b93735d7c7d96b760fcff8db2dc0e103d"
 dependencies = [
 "cfg-if",
-"windows-targets 0.48.1",
+"windows-targets 0.52.4",
 ]
 [[package]]
@@ -3414,6 +3434,7 @@ dependencies = [
 "meilisearch-types",
 "mimalloc",
 "mime",
+"mopa-maintained",
 "num_cpus",
 "obkv",
 "once_cell",
@@ -3515,6 +3536,7 @@ name = "meilitool"
 version = "1.11.0"
 dependencies = [
 "anyhow",
+"arroy 0.5.0 (git+https://github.com/meilisearch/arroy/?tag=DO-NOT-DELETE-upgrade-v04-to-v05)",
 "clap",
 "dump",
 "file-store",
@@ -3545,7 +3567,7 @@ dependencies = [
 name = "milli"
 version = "1.11.0"
 dependencies = [
-"arroy",
+"arroy 0.5.0 (registry+https://github.com/rust-lang/crates.io-index)",
 "big_s",
 "bimap",
 "bincode",
@@ -3680,12 +3702,24 @@ dependencies = [
"syn 2.0.60",
]
[[package]]
name = "mopa-maintained"
version = "0.2.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "79b7f3e22167862cc7c95b21a6f326c22e4bf40da59cbf000b368a310173ba11"
[[package]]
name = "mutually_exclusive_features"
version = "0.0.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "6d02c0b00610773bb7fc61d85e13d86c7858cbdf00e1a120bfc41bc055dbaa0e"
[[package]]
name = "nohash"
version = "0.2.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "a0f889fb66f7acdf83442c35775764b51fed3c606ab9cee51500dbde2cf528ca"
[[package]]
name = "nom"
version = "7.1.3"
@@ -4575,9 +4609,8 @@ dependencies = [
 [[package]]
 name = "rhai"
-version = "1.19.0"
-source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "61797318be89b1a268a018a92a7657096d83f3ecb31418b9e9c16dcbb043b702"
+version = "1.20.0"
+source = "git+https://github.com/rhaiscript/rhai?rev=ef3df63121d27aacd838f366f2b83fd65f20a1e4#ef3df63121d27aacd838f366f2b83fd65f20a1e4"
 dependencies = [
 "ahash 0.8.11",
 "bitflags 2.6.0",
@@ -4594,8 +4627,7 @@ dependencies = [
 [[package]]
 name = "rhai_codegen"
 version = "2.2.0"
-source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "a5a11a05ee1ce44058fa3d5961d05194fdbe3ad6b40f904af764d81b86450e6b"
+source = "git+https://github.com/rhaiscript/rhai?rev=ef3df63121d27aacd838f366f2b83fd65f20a1e4#ef3df63121d27aacd838f366f2b83fd65f20a1e4"
 dependencies = [
 "proc-macro2",
 "quote",

View File

@@ -1,24 +1,24 @@
 [workspace]
 resolver = "2"
 members = [
-"meilisearch",
-"meilitool",
-"meilisearch-types",
-"meilisearch-auth",
-"meili-snap",
-"index-scheduler",
-"dump",
-"file-store",
-"permissive-json-pointer",
-"milli",
-"filter-parser",
-"flatten-serde-json",
-"json-depth-checker",
-"benchmarks",
-"fuzzers",
-"tracing-trace",
-"xtask",
-"build-info",
+"crates/meilisearch",
+"crates/meilitool",
+"crates/meilisearch-types",
+"crates/meilisearch-auth",
+"crates/meili-snap",
+"crates/index-scheduler",
+"crates/dump",
+"crates/file-store",
+"crates/permissive-json-pointer",
+"crates/milli",
+"crates/filter-parser",
+"crates/flatten-serde-json",
+"crates/json-depth-checker",
+"crates/benchmarks",
+"crates/fuzzers",
+"crates/tracing-trace",
+"crates/xtask",
+"crates/build-info",
 ]
 [workspace.package]

View File

@@ -21,6 +21,7 @@ RUN set -eux; \
 # Run
 FROM alpine:3.20
+LABEL org.opencontainers.image.source="https://github.com/meilisearch/meilisearch"
 ENV MEILI_HTTP_ADDR 0.0.0.0:7700
 ENV MEILI_SERVER_PROVIDER docker

View File

@@ -1,6 +1,9 @@
 <p align="center">
-<a href="https://www.meilisearch.com/?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=logo" target="_blank">
-<img src="assets/meilisearch-logo-kawaii.png">
+<a href="https://www.meilisearch.com/?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=logo#gh-light-mode-only" target="_blank">
+<img src="assets/meilisearch-logo-light.svg?sanitize=true#gh-light-mode-only">
 </a>
+<a href="https://www.meilisearch.com/?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=logo#gh-dark-mode-only" target="_blank">
+<img src="assets/meilisearch-logo-dark.svg?sanitize=true#gh-dark-mode-only">
+</a>
 </p>
@@ -45,14 +48,14 @@ See the list of all our example apps in our [demos repository](https://github.co
 ## ✨ Features
 - **Hybrid search:** Combine the best of both [semantic](https://www.meilisearch.com/docs/learn/experimental/vector_search?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=features) & full-text search to get the most relevant results
 - **Search-as-you-type:** Find & display results in less than 50 milliseconds to provide an intuitive experience
-- **[Typo tolerance](https://www.meilisearch.com/docs/learn/configuration/typo_tolerance?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=features):** get relevant matches even when queries contain typos and misspellings
+- **[Typo tolerance](https://www.meilisearch.com/docs/learn/relevancy/typo_tolerance_settings?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=features):** get relevant matches even when queries contain typos and misspellings
 - **[Filtering](https://www.meilisearch.com/docs/learn/fine_tuning_results/filtering?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=features) and [faceted search](https://www.meilisearch.com/docs/learn/fine_tuning_results/faceted_search?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=features):** enhance your users' search experience with custom filters and build a faceted search interface in a few lines of code
 - **[Sorting](https://www.meilisearch.com/docs/learn/fine_tuning_results/sorting?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=features):** sort results based on price, date, or pretty much anything else your users need
-- **[Synonym support](https://www.meilisearch.com/docs/learn/configuration/synonyms?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=features):** configure synonyms to include more relevant content in your search results
+- **[Synonym support](https://www.meilisearch.com/docs/learn/relevancy/synonyms?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=features):** configure synonyms to include more relevant content in your search results
 - **[Geosearch](https://www.meilisearch.com/docs/learn/fine_tuning_results/geosearch?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=features):** filter and sort documents based on geographic data
- **[Extensive language support](https://www.meilisearch.com/docs/learn/what_is_meilisearch/language?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=features):** search datasets in any language, with optimized support for Chinese, Japanese, Hebrew, and languages using the Latin alphabet
- **[Security management](https://www.meilisearch.com/docs/learn/security/master_api_keys?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=features):** control which users can access what data with API keys that allow fine-grained permissions handling
- **[Multi-Tenancy](https://www.meilisearch.com/docs/learn/security/tenant_tokens?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=features):** personalize search results for any number of application tenants
- **[Multi-Tenancy](https://www.meilisearch.com/docs/learn/security/multitenancy_tenant_tokens?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=features):** personalize search results for any number of application tenants
- **Highly Customizable:** customize Meilisearch to your specific needs or use our out-of-the-box and hassle-free presets
- **[RESTful API](https://www.meilisearch.com/docs/reference/api/overview?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=features):** integrate Meilisearch in your technical stack with our plugins and SDKs
- **Easy to install, deploy, and maintain**

Binary file not shown.

@@ -1,6 +1,6 @@
status = [
'Tests on ubuntu-20.04',
'Tests on macos-12',
'Tests on macos-13',
'Tests on windows-2022',
'Run Clippy',
'Run Rustfmt',

@@ -255,6 +255,8 @@ pub(crate) mod test {
}
"###);
insta::assert_json_snapshot!(vector_index.settings().unwrap());
{
let documents: Result<Vec<_>> = vector_index.documents().unwrap().collect();
let mut documents = documents.unwrap();

@@ -0,0 +1,56 @@
---
source: dump/src/reader/mod.rs
expression: vector_index.settings().unwrap()
---
{
"displayedAttributes": [
"*"
],
"searchableAttributes": [
"*"
],
"filterableAttributes": [],
"sortableAttributes": [],
"rankingRules": [
"words",
"typo",
"proximity",
"attribute",
"sort",
"exactness"
],
"stopWords": [],
"nonSeparatorTokens": [],
"separatorTokens": [],
"dictionary": [],
"synonyms": {},
"distinctAttribute": null,
"proximityPrecision": "byWord",
"typoTolerance": {
"enabled": true,
"minWordSizeForTypos": {
"oneTypo": 5,
"twoTypos": 9
},
"disableOnWords": [],
"disableOnAttributes": []
},
"faceting": {
"maxValuesPerFacet": 100,
"sortFacetValuesBy": {
"*": "alpha"
}
},
"pagination": {
"maxTotalHits": 1000
},
"embedders": {
"default": {
"source": "huggingFace",
"model": "BAAI/bge-base-en-v1.5",
"revision": "617ca489d9e86b49b8167676d8220688b99db36e",
"documentTemplate": "{% for field in fields %} {{ field.name }}: {{ field.value }}\n{% endfor %}"
}
},
"searchCutoffMs": null
}

Some files were not shown because too many files have changed in this diff.