Commit Graph

2228 Commits

Author SHA1 Message Date
Akshay Kulkarni
32f825d442 move default implementation of word_pair_frequency to TestContext 2022-10-13 12:57:50 +05:30
Akshay Kulkarni
ff8b2d4422 formatting 2022-10-13 12:44:08 +05:30
Akshay Kulkarni
6cb8b46900 use word_pair_frequency and remove word_documents_count 2022-10-13 12:43:11 +05:30
Akshay Kulkarni
8c9245149e format file 2022-10-12 15:27:56 +05:30
bors[bot]
2000f7958d Merge #604
604: Speed up debug builds r=Kerollmops a=loiclec

Note: this draft PR is based on https://github.com/meilisearch/milli/pull/601 , for no particular reason.

## What does this PR do?
Make a series of changes with the goal of speeding up debug builds:

1. Add an `all_languages` feature which compiles charabia with its `default` features activated.
The `all_languages` feature is activated by default, but running:
```
cargo build --no-default-features
```
on `milli` is now much faster.

2. Reduce the debug optimisation level from 3 to 0, except for a few critical dependencies.

3. Compile the build dependencies quicker as well. Previously, all build dependencies were compiled with `opt-level = 3`. Now, only the critical build dependencies are compiled with optimisations.

4. Reduce the amount of code generated by the `documents!` macro.

5. Make the "progress update" closure provided to indexing functions a trait object instead of a generic parameter. This avoids monomorphising the indexing code multiple times needlessly.
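To illustrate item 5, here is a minimal sketch of the difference between a generic closure parameter and a trait object. The names `Progress`, `index_generic`, and `index_dyn` are hypothetical and chosen for illustration only; they are not milli's actual API.

```rust
// Hypothetical names: `Progress`, `index_generic`, and `index_dyn` are
// illustrative only and do not come from milli.

/// A progress event reported while indexing (illustrative).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Progress {
    pub done: usize,
    pub total: usize,
}

/// Generic version: the compiler emits one copy of this function for
/// every distinct closure type that callers pass in.
pub fn index_generic<F: Fn(Progress)>(docs: &[&str], progress: F) -> usize {
    for (i, _doc) in docs.iter().enumerate() {
        progress(Progress { done: i + 1, total: docs.len() });
    }
    docs.len()
}

/// Trait-object version: compiled once; the closure is invoked through
/// dynamic dispatch, so no per-caller copy of the body is generated.
pub fn index_dyn(docs: &[&str], progress: &dyn Fn(Progress)) -> usize {
    for (i, _doc) in docs.iter().enumerate() {
        progress(Progress { done: i + 1, total: docs.len() });
    }
    docs.len()
}
```

The trade-off is one indirect call per progress update in exchange for less generated code, which is usually negligible when the callback is called far less often than the work it reports on.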

## Results
Initial build times on my computer before and after these changes:
|        | cargo check | cargo check --no-default-features | cargo test | cargo test --lib | cargo test --no-default-features | cargo test --lib --no-default-features |
|--------|-------------|-----------------------------------|------------|------------------|----------------------------------|----------------------------------------|
| before | 1m05s       | 1m05s                             | 2m06s      | 1m47s            | 2m06s                            | 1m47s                                  |
| after  | 28.9s       | 13.1s                             | 40s        | 38s              | 23s                              | 21s                                    |



Co-authored-by: Loïc Lecrenier <loic@meilisearch.com>
2022-10-12 08:54:48 +00:00
Akshay Kulkarni
63e79a9039 update comment 2022-10-12 13:36:48 +05:30
Akshay Kulkarni
7f9680f0a0 Enhance word splitting strategy 2022-10-12 13:18:23 +05:30
Loïc Lecrenier
53503f09ca Make milli's default features optional in other executable targets 2022-10-12 09:22:05 +02:00
Loïc Lecrenier
6fbf5dac68 Simplify documents! macro to reduce compile times 2022-10-12 09:22:05 +02:00
Loïc Lecrenier
98fc093823 Optimize a few performance sensitive dependencies on debug builds 2022-10-12 09:22:05 +02:00
Loïc Lecrenier
5cfb5df31e Set opt-level to 0 for debug builds
But speed up compile times by optimising build dependencies of lindera
2022-10-12 09:22:05 +02:00
bors[bot]
55d889522b Merge #658
658: Add proximity calculation for the same word r=ManyTheFish a=msvaljek

# Pull Request

## Related issue
Fixes https://github.com/meilisearch/milli/issues/647

## What does this PR do?
- During [the increase of the current word position](d94339a858/milli/src/update/index_documents/extract/extract_word_pair_proximity_docids.rs (L129-L135)) we extract the proximity between the current position and the next one.

## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!


Co-authored-by: msvaljek <marko.svaljek@commercetools.com>
2022-10-10 13:33:58 +00:00
msvaljek
762e320c35 Add proximity calculation for the same word 2022-10-07 12:59:12 +02:00
bors[bot]
358aa337ea Merge #657
657: Fix link in Hacktoberfest section r=curquiza a=meili-bot

_This PR is auto-generated._

Fix link in CONTRIBUTING.md.
Following [this PR](https://github.com/meilisearch/meilisearch/pull/2845) and [this issue](https://github.com/meilisearch/meilisearch/issues/2840).


Co-authored-by: meili-bot <74670311+meili-bot@users.noreply.github.com>
2022-10-05 17:19:33 +00:00
meili-bot
1764a33690 Update CONTRIBUTING.md 2022-10-05 19:19:03 +02:00
bors[bot]
a90d7e4cc7 Merge #654
654: Re-upload milli's logo r=curquiza a=jeertmans

# Pull Request

## Related issue
None

## What does this PR do?
Apparently, some [commit](add96f921b) deleted the logo file, and updated the `src` path. It seems to me that this was an error, and that the logo file should have been moved, not deleted.

This fixes the problem of seeing this (see image) instead of the actual logo.
![image](https://user-images.githubusercontent.com/27275099/193786803-e0d11a59-48fa-4331-bd92-48457969d766.png)


## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?


Co-authored-by: Jérome Eertmans <jeertmans@icloud.com>
2022-10-04 10:56:33 +00:00
Jérome Eertmans
aec220ab63 chore: move logo to (new) assets folder 2022-10-04 12:20:24 +02:00
Jérome Eertmans
4348c49656 fix: re-upload milli's logo
The logo was deleted with this [commit](add96f921b).
2022-10-04 11:33:19 +02:00
bors[bot]
a18de9b5f0 Merge #650
650: Add missing logging timer to extractors r=Kerollmops a=vishalsodani

# Pull Request

## What does this PR do?
#645 

## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!


Co-authored-by: vishalsodani <vishalsodani@rediffmail.com>
2022-10-04 07:25:47 +00:00
bors[bot]
f9c2dacf33 Merge #653
653: Fix #652 - Change Spelling of `author` in `README.md` r=curquiza a=anirudhRowjee

# Pull Request

## What does this PR do?
Fixes #652
- Changes spellings of `au{hor` to `author`
- Minor formatting changes in Markdown

## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?


Co-authored-by: Anirudh Rowjee <ani.rowjee@gmail.com>
2022-10-03 08:20:48 +00:00
Anirudh Rowjee
7d247353d0 [docs] contd - fix #652, revert capitalization of 'Meilisearch' 2022-10-03 09:52:20 +05:30
Anirudh Rowjee
bc502ee125 [docs] Fixed #652, changes spelling of author 2022-10-03 09:38:59 +05:30
vishalsodani
00c02d00f3 Add missing logging timer to extractors 2022-09-30 22:17:06 +05:30
bors[bot]
804db03e41 Merge #649
649: Update Hacktoberfest section in CONTRIBUTING.md r=curquiza a=meili-bot

_This PR is auto-generated._

Following: af850854e4

Update Hacktoberfest section in CONTRIBUTING.md with the global guideline information.


Co-authored-by: meili-bot <74670311+meili-bot@users.noreply.github.com>
2022-09-29 15:50:20 +00:00
meili-bot
26efdf4dd9 Update CONTRIBUTING.md 2022-09-29 16:00:15 +02:00
bors[bot]
4b903719a0 Merge #643
643: Add Hacktoberfest section to CONTRIBUTING.md r=curquiza a=meili-bot

_This PR is auto-generated._

Add Hacktoberfest section to CONTRIBUTING.md


Co-authored-by: meili-bot <74670311+meili-bot@users.noreply.github.com>
2022-09-22 16:44:51 +00:00
meili-bot
ed3d87f061 Update CONTRIBUTING.md 2022-09-22 18:43:42 +02:00
bors[bot]
a3622eda46 Merge #642
642: Remove LTO in release profile r=Kerollmops a=loiclec

Since we can't enable it in Meilisearch (see https://github.com/meilisearch/meilisearch/pull/2717 ), we should not enable it in milli either. The goal is for milli's benchmarks to accurately represent its performance within Meilisearch.


Co-authored-by: Loïc Lecrenier <loic@meilisearch.com>
2022-09-21 09:14:46 +00:00
Loïc Lecrenier
513a38f07b Remove LTO in release profile
Since we can't enable it in Meilisearch, there is no point in having it
enabled in milli
2022-09-21 10:44:33 +02:00
bors[bot]
e1e025c319 Merge #641
641: Remove `helpers` crate r=Kerollmops a=loiclec

# Pull Request

## What does this PR do?
Remove the `helpers` crate, because (I think) we don't use it. This should have been part of https://github.com/meilisearch/milli/pull/636 , but I forgot about it then :)





Co-authored-by: Loïc Lecrenier <loic@meilisearch.com>
2022-09-21 08:36:05 +00:00
Loïc Lecrenier
b6fe6838d3 Remove helpers crate 2022-09-21 10:25:36 +02:00
bors[bot]
d94339a858 Merge #636
636: Remove unused `infos`, `http-ui`, and `milli/fuzz` crates r=ManyTheFish a=loiclec

We haven't used the `infos/`, `http-ui/` and `milli/fuzz/` crates in a long time. They are not properly maintained and probably do not work correctly anymore.

This PR removes these crates entirely from the workspace to reduce the amount of code we need to maintain.

Co-authored-by: Loïc Lecrenier <loic@meilisearch.com>
2022-09-14 12:39:57 +00:00
bors[bot]
15d478cf4d Merge #635
635: Use an unstable algorithm for `grenad::Sorter` when possible r=Kerollmops a=loiclec

# Pull Request
## What does this PR do?

Use an unstable algorithm to sort the internal vector used by `grenad::Sorter` whenever possible to speed up indexing.

In practice, every time the merge function creates a `RoaringBitmap`, we use an unstable sort. For every other merge function, such as `keep_first`, `keep_last`, etc., a stable sort is used.
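The reasoning can be sketched as follows, under stated assumptions: a merge that unions values into a set gives the same result whatever the relative order of entries with equal keys, so an unstable sort is safe, whereas a "keep first" merge depends on that order and needs a stable sort. The functions `sort_entries`, `merge_union`, and `merge_keep_first` are hypothetical and are not grenad's API; `BTreeSet` stands in for `RoaringBitmap` to keep the sketch dependency-free.

```rust
use std::collections::BTreeSet;

// Hypothetical sketch: none of these functions are grenad's API, and
// `BTreeSet<u32>` is a stand-in for `RoaringBitmap`.

/// Sort entries by key. Use the (typically faster) unstable sort only
/// when the merge function does not depend on the order of equal keys.
pub fn sort_entries(entries: &mut [(String, u32)], order_sensitive_merge: bool) {
    if order_sensitive_merge {
        // Stable: entries with equal keys keep their insertion order.
        entries.sort_by(|a, b| a.0.cmp(&b.0));
    } else {
        // Unstable: equal keys may be reordered.
        entries.sort_unstable_by(|a, b| a.0.cmp(&b.0));
    }
}

/// Order-insensitive merge: union all values that share a key.
pub fn merge_union(entries: &[(String, u32)]) -> Vec<(String, BTreeSet<u32>)> {
    let mut out: Vec<(String, BTreeSet<u32>)> = Vec::new();
    for (key, value) in entries {
        if let Some((last_key, set)) = out.last_mut() {
            if *last_key == *key {
                set.insert(*value);
                continue;
            }
        }
        out.push((key.clone(), BTreeSet::from([*value])));
    }
    out
}

/// Order-sensitive merge: keep only the first value seen for each key.
pub fn merge_keep_first(entries: &[(String, u32)]) -> Vec<(String, u32)> {
    let mut out: Vec<(String, u32)> = Vec::new();
    for (key, value) in entries {
        if out.last().map_or(true, |(last_key, _)| last_key != key) {
            out.push((key.clone(), *value));
        }
    }
    out
}
```

Both merge functions expect their input to be already sorted by key, which is why the choice of sort algorithm is the only thing that changes between the two cases.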


Co-authored-by: Loïc Lecrenier <loic@meilisearch.com>
2022-09-14 12:00:52 +00:00
Loïc Lecrenier
add96f921b Remove unused infos/ http-ui/ and fuzz/ crates 2022-09-14 06:55:01 +02:00
bors[bot]
4fc6331cb6 Merge #638
638: Update version for the next release (v0.33.4) in Cargo.toml files r=curquiza a=meili-bot

⚠️ This PR is automatically generated. Check the new version is the expected one before merging.

Co-authored-by: curquiza <curquiza@users.noreply.github.com>
2022-09-13 13:56:53 +00:00
curquiza
753e76d451 Update version for the next release (v0.33.4) in Cargo.toml files 2022-09-13 13:55:50 +00:00
Loïc Lecrenier
3794962330 Use an unstable algorithm for grenad::Sorter when possible 2022-09-13 14:49:53 +02:00
bors[bot]
2865b063ad Merge #637
637: We avoid skipping errors in the indexing pipeline r=ManyTheFish a=Kerollmops

This PR is related to https://github.com/meilisearch/meilisearch/issues/2764 and should fix it when merged into Meilisearch.

Co-authored-by: Kerollmops <clement@meilisearch.com>
2022-09-13 12:12:05 +00:00
Kerollmops
d4d7c9d577 We avoid skipping errors in the indexing pipeline 2022-09-13 14:03:00 +02:00
bors[bot]
f8697075ea Merge #632
632: Make charabia default feature optional r=ManyTheFish a=vincent-herlemont

# Pull Request

## What does this PR do?
Fixes [#627](https://github.com/meilisearch/milli/issues/627#issuecomment-1239769122)

Thank you so much for contributing to Meilisearch!


Co-authored-by: Vincent Herlemont <vincent@herlemont.fr>
2022-09-08 14:33:26 +00:00
bors[bot]
7cd0aea1d3 Merge #633
633: Upgrade ubuntu-18.04 to 20.04 r=Kerollmops a=curquiza

Ubuntu-18.04 is going to be deprecated by GitHub
https://github.com/actions/runner-images/issues/6002

Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2022-09-08 14:08:28 +00:00
Clémentine Urquizar
69b2d31b71 Upgrade ubuntu-18.04 to 20.04 2022-09-08 14:58:06 +02:00
Vincent Herlemont
8cd5200f48 Make charabia languages configurable 2022-09-08 12:21:43 +02:00
bors[bot]
99b45a7820 Merge #631
631: Revert "Remove Bors required test for Windows" r=Kerollmops a=curquiza

Reverts meilisearch/milli#612

Because the issue does not seem to be there!

Closes https://github.com/meilisearch/milli/issues/614

Co-authored-by: Clémentine Urquizar - curqui <clementine@meilisearch.com>
2022-09-07 21:07:44 +00:00
Vincent Herlemont
5e07ea79c2 Make charabia default feature optional 2022-09-07 20:54:31 +02:00
Clémentine Urquizar - curqui
3af3d3f7d9 Revert "Remove Bors required test for Windows" 2022-09-07 18:36:10 +02:00
bors[bot]
549fa12d5a Merge #629
629: Update version for the next release (v0.33.3) in Cargo.toml files r=curquiza a=meili-bot

⚠️ This PR is automatically generated. Check the new version is the expected one before merging.

Co-authored-by: curquiza <curquiza@users.noreply.github.com>
2022-09-07 15:55:04 +00:00
curquiza
077dcd2002 Update version for the next release (v0.33.3) in Cargo.toml files 2022-09-07 15:48:53 +00:00
bors[bot]
2907928d93 Merge #628
628: Make sure that long words are ignored r=ManyTheFish a=Kerollmops

This PR is related to https://github.com/meilisearch/meilisearch/issues/2743 and is fixing it.

Co-authored-by: Kerollmops <clement@meilisearch.com>
2022-09-07 13:04:59 +00:00
Kerollmops
fe3973a51c Make sure that long words are correctly skipped 2022-09-07 15:03:32 +02:00