Commit Graph

7520 Commits

Author SHA1 Message Date
ManyTheFish
064158e4e2 Update test 2023-02-01 15:34:01 +01:00
ManyTheFish
77d32d0ee8 Fix codec deserialization 2023-02-01 15:26:26 +01:00
ManyTheFish
f4569b04ad Update Charabia version 2023-02-01 15:26:26 +01:00
bors[bot]
5e12af88e2 Merge #3445
3445: Bump milli to v0.41.1 r=curquiza a=dureuill

# Pull Request

## Related issue

Fixes #3438.

## What does this PR do?
- Bump milli to [v0.41.1](https://github.com/meilisearch/milli/releases/tag/v0.41.1) that includes a bugfix for #3438 

Co-authored-by: Louis Dureuil <louis@meilisearch.com>
v1.0.0-rc.3 v1.0.0
2023-02-01 11:07:46 +00:00
Louis Dureuil
231067a1c4 Bump milli to v0.41.1 2023-02-01 11:53:39 +01:00
Vibhav Bobade
2a1a7ef00a Integrate Uffizzi 2023-02-01 13:06:27 +05:30
bors[bot]
758b4acea7 Merge #776
776: Reduce incremental indexing time of `words_prefix_position_docids` DB r=curquiza a=loiclec

Fixes partially https://github.com/meilisearch/milli/issues/605

The `words_prefix_position_docids` can easily contain millions of entries. Thus, iterating
over it can be very expensive. But we do so needlessly for every document addition tasks.

It can sometimes cause indexing performance issues when :
- a user sends many `documentAdditionOrUpdate` tasks that cannot be all batched together (for example if they are interspersed with `documentDeletion` tasks)
- the documents contain long, diverse text fields, thus increasing the number of entries in `words_prefix_position_docids`
- the index has accumulated many soft-deleted documents, further increasing the size of `words_prefix_position_docids`
- the machine running Meilisearch does not have great IO performance (e.g. slow SSD, or quota-limited by the cloud provider)

Note, before approving  the PR: the only changed file should be `milli/src/update/words_prefix_position_docids.rs`.

Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>
2023-01-31 15:52:28 +00:00
bors[bot]
20f8184c06 Merge #3441
3441: Fix import of dump v2 r=dureuill a=irevoire

# Pull Request
This bug was introduced because of a mistake we did earlier: We said the last version to export dump v2 was the v0.21.0 while it was the v0.22.0.
To fix the bug I updated our whole v2 reader to use the code from meilisearch v0.22.0.
Also:
- Import the bugged dump in the tests
- Test the import of this dump in the v2 reader and current reader

## Related issue
Fixes #3435


Co-authored-by: Tamo <tamo@meilisearch.com>
2023-01-31 13:23:57 +00:00
bors[bot]
2f8ebd0501 Merge #3439
3439: Add git config about ownership in Docker CI r=curquiza a=curquiza

The docker CI si failing because of git usage: https://github.com/meilisearch/meilisearch/actions/runs/4053334082/jobs/6973827940

<img width="960" alt="Capture d’écran 2023-01-31 à 12 12 44" src="https://user-images.githubusercontent.com/20380692/215745119-b866bcf2-7077-48e4-b018-7a2085b23680.png">


> fatal: detected dubious ownership in repository at '/home/meili/actions-runner/_work/meilisearch/meilisearch'

I made some research and I found out this https://github.com/actions/runner-images/issues/6775

Co-authored-by: curquiza <clementine@meilisearch.com>
2023-01-31 12:58:59 +00:00
Tamo
6be9a828fa makes clippy happy 2023-01-31 13:03:28 +01:00
Tamo
4b7b2d6a90 fix the import of dump v2 generated by meilisearch v0.22.0 2023-01-31 13:03:28 +01:00
bors[bot]
a4e8158239 Merge #774
774: Update version for the next release (v0.41.1) in Cargo.toml files r=curquiza a=meili-bot

⚠️ This PR is automatically generated. Check the new version is the expected one before merging.

Co-authored-by: curquiza <curquiza@users.noreply.github.com>
2023-01-31 11:51:42 +00:00
bors[bot]
151e52c481 Merge #3433
3433: Add prototype guide to CONTRIBUTING.md r=curquiza a=curquiza



Co-authored-by: curquiza <clementine@meilisearch.com>
Co-authored-by: Clémentine Urquizar - curqui <clementine@meilisearch.com>
2023-01-31 11:25:46 +00:00
curquiza
e269027cdd Add git config about ownershio in Docker CI 2023-01-31 12:04:41 +01:00
Loïc Lecrenier
a2690ea8d4 Reduce incremental indexing time of words_prefix_position_docids DB
This database can easily contain millions of entries. Thus, iterating
over it can be very expensive.

For regular `documentAdditionOrUpdate` tasks, `del_prefix_fst_words`
will always be empty. Thus, we can save a significant amount of time
by adding this `if !del_prefix_fst_words.is_empty()` condition.

The code's behaviour remains completely unchanged.
2023-01-31 11:42:24 +01:00
bors[bot]
33f61d2cd4 Merge #775
775: Fix clippy for Rust 1.67, allow `uninlined_format_args` r=dureuill a=dureuill

# Pull Request

milli part of https://github.com/meilisearch/meilisearch/pull/3437

Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2023-01-31 10:29:24 +00:00
bors[bot]
544b581b15 Merge #3437
3437: Make clippy happy for Rust 1.67, allow uninlined_format_args r=Kerollmops a=dureuill

# Pull Request

This PR is the equivalent of #3434 for the `release-v1.0.0` branch.

See #3434 for more information.

Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2023-01-31 10:29:12 +00:00
f3r10
2922c5c899 Fix code format 2023-01-31 11:28:05 +01:00
f3r10
7681be5367 Format code 2023-01-31 11:28:05 +01:00
f3r10
50bc156257 Fix tests 2023-01-31 11:28:05 +01:00
f3r10
d8207356f4 Skip script,language insertion if language is undetected 2023-01-31 11:28:05 +01:00
f3r10
2d58b28f43 Improve script language codec 2023-01-31 11:28:05 +01:00
f3r10
fd60a39f1c Format code 2023-01-31 11:28:05 +01:00
f3r10
369c05732e Add test checking if from script_language_docids database were removed
deleted docids
2023-01-31 11:28:05 +01:00
f3r10
34d04f3d3f Filter from script_language_docids database soft deleted documents 2023-01-31 11:28:05 +01:00
f3r10
a27f329e3a Add tests for checking that detected script and language associated with document(s) were stored during indexing 2023-01-31 11:28:05 +01:00
f3r10
b216ddba63 Delete and clear data from the new database 2023-01-31 11:28:05 +01:00
f3r10
d97fb6117e Extract and index data 2023-01-31 11:28:05 +01:00
f3r10
c45d1e3610 Create a new database on index and add a specialized codec for it 2023-01-31 11:28:05 +01:00
Louis Dureuil
5c0668afcf clippy: allow uninlined_format_args 2023-01-31 11:13:47 +01:00
Louis Dureuil
20f05efb3c clippy: needless_lifetimes 2023-01-31 11:12:59 +01:00
Louis Dureuil
cbf029f64c clippy: --fix 2023-01-31 11:12:59 +01:00
curquiza
bffabf9cc6 Update version for the next release (v0.41.1) in Cargo.toml files 2023-01-31 09:56:22 +00:00
bors[bot]
f647b20818 Merge #3434
3434: Make clippy happy for Rust 1.67, allow `uninlined_format_args` r=Kerollmops a=dureuill

# Pull Request

This PR allows `uninlined_format_args` in CI for clippy.

This is due to https://github.com/rust-lang/rust-clippy/issues/10087, which in particular has correctness issues wrt edition 2018 crates, and is a big change altogether. https://github.com/rust-lang/rust-clippy/pull/10265 is already open in order to change the category of this lint to "pedantic", meaning that if this latter PR merges, a future Rust release will accept our code unmodified wrt uninlined format arguments.

As a result, this PR introduces the following changes:

1. Allow `uninlined_format_args` in the clippy command in CI.
2. Use rewind rather than seek(0)
3. Remove lifetimes that clippy deems needless.

Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2023-01-31 09:45:12 +00:00
Louis Dureuil
924d5d4c11 clippy: remove needless lifetimes 2023-01-31 10:40:48 +01:00
Louis Dureuil
771a367b97 clippy: use rewind instead of seek 0 2023-01-31 10:40:48 +01:00
Louis Dureuil
07603373f3 clippy: allow uninlined_format_args 2023-01-31 10:15:07 +01:00
Louis Dureuil
d91f8fc493 clippy: Allow uninlined_format_args in CI 2023-01-31 09:56:26 +01:00
Louis Dureuil
3296cf7ae6 clippy: remove needless lifetimes 2023-01-31 09:32:40 +01:00
Louis Dureuil
89675e5f15 clippy: Replace seek 0 by rewind 2023-01-31 09:32:40 +01:00
Louis Dureuil
47b7d515ed Add more detailed contribution instructions for tests 2023-01-30 17:39:05 +01:00
Clémentine Urquizar - curqui
2ba4629938 Update CONTRIBUTING.md
Co-authored-by: Many the fish <many@meilisearch.com>
2023-01-30 15:56:30 +01:00
curquiza
982dd76042 Improve readability 2023-01-30 14:36:22 +01:00
curquiza
3505ee47f8 Add volume to docker command 2023-01-30 14:33:50 +01:00
curquiza
b2d25c07d7 Add guide to create a proto 2023-01-30 14:31:36 +01:00
Clémentine Urquizar - curqui
b9d8bd77fc Update README.md
Co-authored-by: gui machiavelli <hey@guimachiavelli.com>
2023-01-26 18:14:00 +01:00
Clémentine Urquizar - curqui
8a66ba01d8 Update README.md
Co-authored-by: gui machiavelli <hey@guimachiavelli.com>
2023-01-26 18:13:53 +01:00
Clémentine Urquizar - curqui
8a6d548041 Update README.md
Co-authored-by: gui machiavelli <hey@guimachiavelli.com>
2023-01-26 18:13:08 +01:00
Clémentine Urquizar - curqui
b452358124 Update README.md
Co-authored-by: gui machiavelli <hey@guimachiavelli.com>
2023-01-26 18:12:56 +01:00
bors[bot]
bfb1f9279b Merge #3420 #3422
3420: Add image hyperlink in README.md r=curquiza a=gregsadetsky

# Pull Request

## What does this PR do?
- tiny README.md improvement: under "SDKs & integration tools", add a hyperlink to the image with all of the language logos so that clicking the image leads to the integrations page. Otherwise, right now, clicking this image leads to the image file in the repo, which is not really useful.

## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [X] Have you read the contributing guidelines?
- [X] Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!


3422: Remove cache from test CI r=dureuill a=curquiza

Comment the lines in CIs where we use the test CIs
We indeed have cache issues (lack of space on the machine) when running our test CIs
https://github.com/meilisearch/meilisearch/pull/3403

Co-authored-by: Greg Sadetsky <lepetitg@gmail.com>
Co-authored-by: Clémentine Urquizar - curqui <clementine@meilisearch.com>
Co-authored-by: curquiza <clementine@meilisearch.com>
2023-01-25 16:25:53 +00:00