8357 Commits

Author SHA1 Message Date
0xflotus
85a24775c5 Update README.md 2023-06-23 12:25:53 +02:00
0xflotus
6b0e9b9a7f Update README.md 2023-06-23 12:20:43 +02:00
0xflotus
b18c57ea7f docs: fixed some broken links
Some of the links in the README file were broken.
2023-06-23 12:18:43 +02:00
Cong Chen
6d4981ec25 Expose lastUpdate and isIndexing in /stats endpoint 2023-06-23 07:24:25 +08:00
meili-bors[bot]
040b5a5b6f Merge #3842
3842: fix some typos r=dureuill a=cuishuang

# Pull Request

## Related issue
Fixes #<issue_number>

## What does this PR do?
- fix some typos

## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!


Co-authored-by: cui fliter <imcusg@gmail.com>
2023-06-22 18:01:10 +00:00
cui fliter
530a3e2df3 fix some typos
Signed-off-by: cui fliter <imcusg@gmail.com>
2023-06-22 21:59:00 +08:00
Louis Dureuil
11d32ad192 Add very light analytics for scoring 2023-06-22 12:39:14 +02:00
Louis Dureuil
d26e9a96ec Add score details to new search tests 2023-06-22 12:39:14 +02:00
Louis Dureuil
49c8bc4de6 Fix tests 2023-06-22 12:39:14 +02:00
Louis Dureuil
da833eb095 Expose the scores and detailed scores in the API 2023-06-22 12:39:14 +02:00
Louis Dureuil
701d44bd91 Store the scores for each bucket
Remove optimization where ranking rules are not executed on buckets of a single document
when the score needs to be computed
2023-06-22 12:39:14 +02:00
Louis Dureuil
c621a250a7 Score for graph based ranking rules
Count phrases in matchingWords and maxMatchingWords
2023-06-22 12:39:14 +02:00
Louis Dureuil
8939e85f60 Add rank_to_score for graph based ranking rules 2023-06-22 12:39:14 +02:00
Louis Dureuil
fa41d2489e Score for sort 2023-06-22 12:39:14 +02:00
Louis Dureuil
59c5b992c2 Score for geosort 2023-06-22 12:39:14 +02:00
Louis Dureuil
2ea8194c18 Score for exact_attributes 2023-06-22 12:39:14 +02:00
Louis Dureuil
421df64602 RankingRuleOutput now contains a Score 2023-06-22 12:39:14 +02:00
Louis Dureuil
c0fca6f884 Add score_details 2023-06-22 12:39:14 +02:00
Filip Bachul
9015a8e8d9 Merge branch 'main' into cymruu/payload-unit-test 2023-06-21 09:26:50 +02:00
meili-bors[bot]
28404d56b7 Merge #3799
3799: Fix error messages in `check-release.sh` r=curquiza a=vvv

- `check_tag`: Report file name correctly. Use named variables.
- Introduce `read_version` helper function. Simplify the implementation.
- Show meaningful error message if `GITHUB_REF` is not set or its format is incorrect.

Co-authored-by: Valeriy V. Vorotyntsev <valery.vv@gmail.com>
2023-06-20 13:35:33 +00:00
meili-bors[bot]
262c1f2baf Merge #3844
3844: Fix SDK CI (again) r=curquiza a=curquiza

Following this PR: https://github.com/meilisearch/meilisearch/pull/3813

Sorry @Kerollmops, here is (I hope) the latest fix 🙏 The tests I made last time were not sufficient. I really did a lot this time. I hope I have not missed anything.



Co-authored-by: curquiza <clementine@meilisearch.com>
2023-06-20 13:01:07 +00:00
Valeriy V. Vorotyntsev
cfed349aa3 Fix error messages in check-release.sh
- `check_tag`: Report file name correctly. Use named variables.
- Introduce `read_version` helper function. Simplify the implementation.
- Show meaningful error message if `GITHUB_REF` is not set or its format
  is incorrect.
2023-06-20 13:58:09 +03:00
Louis Dureuil
f050634b1e add virtual conditions to fid and position to always have the max cost 2023-06-20 10:07:18 +02:00
Louis Dureuil
becf1f066a Change how the cost of removing words is computed 2023-06-20 09:45:43 +02:00
Louis Dureuil
701d299369 Remove out-of-date comment 2023-06-20 09:45:42 +02:00
Louis Dureuil
a20e4d447c Position now takes into account the distance to the position of the word in the query
it used to be based on the distance to the position 0
2023-06-20 09:45:42 +02:00
Louis Dureuil
af57c3c577 Proximity costs 0 for documents that are perfectly matching 2023-06-20 09:45:42 +02:00
Louis Dureuil
0c40ef6911 Fix sort id 2023-06-20 09:45:42 +02:00
curquiza
bbc9f68ff5 Use the input from the previous job instead of the workflow dispatch 2023-06-19 18:49:15 +02:00
meili-bors[bot]
45636d315c Merge #3670
3670: Fix addition deletion bug r=irevoire a=irevoire

The first commit of this PR is a revert of https://github.com/meilisearch/meilisearch/pull/3667. It re-enables the auto-batching of addition and deletion of tasks. No new changes have been introduced outside of `milli`. So all the changes you see on the autobatcher have actually already been reviewed.

It fixes https://github.com/meilisearch/meilisearch/issues/3440.

### What was happening?

The issue was that the `external_documents_ids` generated in the `transform` were used in a very strange way that wasn’t compatible with the deletion of documents.
Instead of doing a clear merge between the external document IDs of the DB and the ones returned by the transform + writing it on disk, we were doing some weird tricks with the soft-deleted to avoid writing the fst on disk as much as possible.
The new algorithm may be a bit slower but is way more straightforward and doesn’t change depending on whether the soft deletion was used or not. Here is a list of the changes introduced:
1. We now make a clear distinction between the `new_external_documents_ids` coming from the transform and only held in RAM and the `external_documents_ids` coming from the DB.
2. The `new_external_documents_ids` (coming out of the transform) are now represented as an `fst`. We no longer need to struggle with the hard/soft distinction or the soft_deleted, which makes it easier to understand.
3. When indexing documents, we merge the `external_documents_ids` coming from the DB and the `new_external_documents_ids` coming from the transform.
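
The merge in step 3 can be pictured with a minimal sketch. It is not the `milli` implementation: a `BTreeMap` stands in for the `fst`, and the `ExternalIds` alias, the `merge_external_ids` function, and the rule that entries from the transform override entries from the DB are all assumptions made for illustration.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for the fst: external document ID -> internal document ID.
type ExternalIds = BTreeMap<String, u32>;

/// Merge the IDs read from the DB with the IDs produced by the transform.
/// Assumption: when both sides contain the same external ID, the entry
/// coming from the transform wins.
fn merge_external_ids(db_ids: &ExternalIds, new_ids: &ExternalIds) -> ExternalIds {
    // Start from what is already stored on disk.
    let mut merged = db_ids.clone();
    // Add or override with what the transform produced in RAM.
    for (external_id, internal_id) in new_ids {
        merged.insert(external_id.clone(), *internal_id);
    }
    merged
}
```

The point of the sketch is that the result depends only on the two inputs, not on whether soft deletion was used beforehand.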

### Other things introduced in this PR

Since we constantly have to write small, very specialized fuzzers for this kind of bug, we decided to push the one used to reproduce this bug.
It's not perfect, but it's easy to improve in the future.
It'll also run for as long as possible on every merge on the main branch.

Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: Loïc Lecrenier <loic.lecrenier@icloud.com>
2023-06-19 09:09:30 +00:00
meili-bors[bot]
cb9d78fc7f Merge #3835
3835: Add more documentation to graph-based ranking rule algorithms + comment cleanup r=Kerollmops a=loiclec

In addition to documenting the `cheapest_path.rs` file, this PR cleans up a few outdated comments as well as some TODOs. These TODOs have been moved to https://github.com/meilisearch/meilisearch/issues/3776



Co-authored-by: Loïc Lecrenier <loic.lecrenier@icloud.com>
2023-06-15 15:30:24 +00:00
meili-bors[bot]
01d2ee5cc1 Merge #3836
3836: Remove trailing whitespace in snapshots r=dureuill a=dureuill

# Pull Request

## Related issue

No issue, maintenance

## What does this PR do?
- Remove trailing whitespace in snapshots by adding a trailing `|` at the end of lines that would previously end with fixed-width integers
- This allows contributors whose editor is configured to remove trailing whitespace not to modify the tests when changing an unrelated part of the file containing the tests
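
As a rough illustration of the idea, here is a sketch of a snapshot row formatter. The `snapshot_row` function, the label, and the five-character column width are hypothetical and not taken from the test suite; the sketch only shows why a closing `|` keeps a padded integer from leaving trailing whitespace.

```rust
/// Render one snapshot row: a label followed by a fixed-width integer column.
/// The value is left-aligned in five characters, so small values are padded
/// with spaces on the right; the closing `|` ensures the line never ends
/// with whitespace.
fn snapshot_row(label: &str, value: u32) -> String {
    format!("{}: {:<5}|", label, value)
}
```

Without the final `|`, a row for a small value would end in padding spaces that an editor configured to trim whitespace would silently remove.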


Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2023-06-14 13:00:52 +00:00
Louis Dureuil
e0c4682758 Fix tests 2023-06-14 13:30:52 +02:00
Louis Dureuil
d9b4b39922 Add trailing pipe to the snapshots so it doesn't end with trailing whitespace 2023-06-14 13:30:52 +02:00
Loïc Lecrenier
2da86b31a6 Remove comments and add documentation 2023-06-14 12:39:42 +02:00
Loïc Lecrenier
4e81445d42 Stop the fuzzer after an hour 2023-06-12 15:30:51 +02:00
meili-bors[bot]
4829348d6e Merge #3813
3813: Fix SDK CI for scheduled jobs r=curquiza a=curquiza

The SDK CI does not run for the scheduled job (`cron`) every day, and only works for manual triggers.

I added a job to define the Docker image we use depending on the event: `workflow_dispatch` = manual triggering, or `scheduled` = cron jobs

Co-authored-by: curquiza <clementine@meilisearch.com>
2023-06-12 08:41:03 +00:00
meili-bors[bot]
047d22fcb1 Merge #3824
3824: Changes the way words are counted in the word count DB r=ManyTheFish a=dureuill

# Pull Request

## Related issue

Fixes https://github.com/meilisearch/meilisearch/issues/3823

## What does this PR do?

- Apply offset when parsing query that is consistent with the indexing

### DB breaking changes

- Count the number of words in `field_id_word_count_docids`
- Raise the limit of word count for storing the entry in the DB from 10 to 30

Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2023-06-08 13:26:05 +00:00
Louis Dureuil
a2a3b8c973 Fix offset difference between query and indexing for hard separators 2023-06-08 12:07:12 +02:00
Louis Dureuil
9f37b61666 DB BREAKING: raise limit of word count from 10 to 30. 2023-06-08 12:07:12 +02:00
Louis Dureuil
c15c076da9 DB BREAKING: Count the number of words in field_id_word_count_docids 2023-06-08 12:07:11 +02:00
meili-bors[bot]
9dcf1da59d Merge #3819
3819: Remove the `docid_word_positions` database r=Kerollmops a=loiclec

Remove the `docid_word_positions` database, which was only used during deletion operations. In the process, also fixes https://github.com/meilisearch/meilisearch/issues/3816




Co-authored-by: Loïc Lecrenier <loic.lecrenier@icloud.com>
2023-06-07 09:53:25 +00:00
Loïc Lecrenier
8628a0c856 Remove docid_word_positions_db + fix deletion bug
That would happen when a word was deleted from all exact attributes
but not all regular attributes.
2023-06-07 10:52:50 +02:00
meili-bors[bot]
c1e3cc04b0 Merge #3811
3811: Bring back changes from `release-v1.2.0` to `main` r=Kerollmops a=curquiza



Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>
Co-authored-by: meili-bors[bot] <89034592+meili-bors[bot]@users.noreply.github.com>
Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: Filip Bachul <filipbachul@gmail.com>
Co-authored-by: Kerollmops <clement@meilisearch.com>
Co-authored-by: ManyTheFish <many@meilisearch.com>
Co-authored-by: Clément Renault <clement@meilisearch.com>
2023-06-06 13:10:24 +00:00
meili-bors[bot]
d96d8bb0dd Merge #3789
3789: Improve the metrics r=dureuill a=irevoire

# Pull Request

## Related issue
Implements https://github.com/meilisearch/meilisearch/issues/3790
Associated specification: https://github.com/meilisearch/specifications/pull/242

## Be cautious; it's DB-breaking 😱 

While reviewing and after merging this PR, be cautious; if you already have a `data.ms` and run meilisearch with this code on it, it won't work because we need to cache new information in the index stats (which are backed up on disk). You'll get internal errors.

### About the breaking-change label

We only break the API of the metrics route, which does not pose any problem since it's experimental.

## What does this PR do?
- Create a method to get the « facet distribution » of the task queue.
- Prefix all the metrics by `meilisearch_`
- Add the real database size used by meilisearch
- Add metrics on the task queue
- Update the grafana dashboard to these new changes
- Move the dashboard to the `assets` directory
- Provide a new prometheus file to scrape meilisearch easily

Co-authored-by: Tamo <tamo@meilisearch.com>
2023-06-06 11:44:54 +00:00
Tamo
4a3405afec comment the stats method 2023-06-06 12:59:58 +02:00
Tamo
3cfd653db1 Apply suggestions from code review
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2023-06-06 11:38:41 +02:00
curquiza
b6b6a80b76 Fix SDK CI for scheduled jobs 2023-06-06 10:38:05 +02:00
Clémentine U. - curqui
f3e2f79290 Merge branch 'main' into tmp-release-v1.2.0 2023-06-05 18:36:28 +02:00
meili-bors[bot]
f517274d1f Merge #3788
3788: Use `RoaringBitmap::deserialize_unchecked_from` to reduce the deserialization time r=irevoire a=Kerollmops

This pull request replaces the `RoaringBitmap::deserialize_from` methods with `deserialize_unchecked_from` to avoid doing too many checks. We know the written bitmaps are valid as we do not disable the checks during the indexation phase.

I did a small test with #3780 and discovered that the deserialization time changed from 32% to 9.46% when using these changes. It seems it was low-hanging fruit hidden behind a leaf.
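
The trade-off can be sketched with a toy format. The code below does not use the `roaring` crate: `deserialize_checked` and `deserialize_unchecked` are hypothetical functions over a made-up encoding (little-endian `u32` values that are expected to be strictly increasing), meant only to show what skipping validation on trusted data looks like.

```rust
/// Decode little-endian u32 values without validating their ordering.
/// Only appropriate for bytes we wrote ourselves through a checked path.
fn deserialize_unchecked(bytes: &[u8]) -> Result<Vec<u32>, String> {
    if bytes.len() % 4 != 0 {
        return Err("length is not a multiple of 4".to_string());
    }
    Ok(bytes
        .chunks_exact(4)
        .map(|c| u32::from_le_bytes([c[0], c[1], c[2], c[3]]))
        .collect())
}

/// Decode the same format, then verify that values are strictly increasing.
/// The extra pass over the data is the cost the unchecked variant avoids.
fn deserialize_checked(bytes: &[u8]) -> Result<Vec<u32>, String> {
    let values = deserialize_unchecked(bytes)?;
    if values.windows(2).any(|w| w[0] >= w[1]) {
        return Err("values are not strictly increasing".to_string());
    }
    Ok(values)
}
```

The unchecked variant happily returns invalid data, so it is only safe when the writer already guaranteed validity, which is the argument made above for bitmaps produced during indexation.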

Co-authored-by: Kerollmops <clement@meilisearch.com>
2023-06-05 09:20:30 +00:00