Commit Graph

11464 Commits

Author SHA1 Message Date
mpostma
55e6cb9c7b typos on first letter counts as 2 2022-02-02 12:56:09 +01:00
mpostma
642c01d0dc set max typos on ngram to 1 2022-02-02 12:56:08 +01:00
bors[bot]
9448ca58aa Merge #2005
2005: auto batching r=MarinPostma a=MarinPostma

This pr implements auto batching. The basic functioning of this is that all updates that can be batched together are batched together while the previous batch is being processed.

For now, the only updates that can be batched together are the document addition updates (both update and replace), for a single index.

The batching is disabled by default for multiple reasons:
- We need more experimentation with the scheduling techniques
- Right now, if one task fails in a batch, the whole batch fails. We need more permissive error handling when processing document indexation.

There are four CLI options, for now, to interact with how the batch is scheduled:
- `enable-autobatching`: enable the autobatching feature.
- `debounce-duration-sec`: When an update is received, wait that number of seconds before batching and performing the updates. Defaults to 0s.
- `max-batch-size`: the maximum number of tasks per batch, defaults to unlimited.
- `max-documents-per-batch`: the maximum number of documents in a batch, defaults to unlimited. The batch will always contain a least 1 task, no matter the number of documents in that task.

# Implementation

The current implementation is made of 3 major components:

## TaskStore
The `TaskStore` contains all the tasks. When a task is pushed, it is directly registered to the task store.

## Scheduler
The scheduler is in charge of making the batches. At its core, there is a `TaskQueue` and a job queue. `Job`s are always processed first. They are *volatile* tasks, that is, they don't have a TaskId and are not persisted to disk. Snapshots and dumps are examples of Jobs.

If no `Job` is available for processing, then the scheduler attempts to make a `Task` batch from the `TaskQueue`. The first step is to gather new tasks from the `TaskStore` to populate the `TaskQueue`. When this is done, we can prepare our batch. The `TaskQueue` is itself a `BinaryHeap` of `Tasklist`. Each `index_uid` is associated with a `TaskList` that contains all the updates associated with that index uid. Each `TaskList` in the `TaskQueue` is ordered by the id of its first task.

When preparing a batch, the `TaskList` at the top of the `TaskQueue` is popped, and the tasks are popped from the list to make the next batch. If there are remaining tasks in the list, the list is inserted back in the `TaskQueue`.

## UpdateLoop
The `UpdateLoop` role is to perform batch sequentially. Each time updates are pushed to the update store, the scheduler is notified, and will in turn notify the update loop that work can be performed. When notified, the update loop waits some time to wait for more incoming update and then asks the scheduler for the next batch to perform and perform it. When it is done, the status of the task is put back into the store, and the next batch is processed.


Co-authored-by: mpostma <postma.marin@protonmail.com>
2022-02-02 11:04:30 +00:00
ad hoc
d852dc0d2b fix phrase search 2022-02-01 20:21:33 +01:00
mpostma
c9a236b0af feat(lib): auto-batching 2022-02-01 18:06:20 +01:00
bors[bot]
622c15e825 Merge #2096
2096: feat(auth): Tenant token r=Kerollmops a=ManyTheFish

Make meilisearch support JWT authentication signed with meilisearch API keys
using HS256, HS384 or HS512 algorithms.

Related spec: [specifications#89](https://github.com/meilisearch/specifications/pull/89) [rendered](https://github.com/meilisearch/specifications/blob/scoped-api-keys/text/0089-tenant-tokens.md)
Fix #1991 


Co-authored-by: ManyTheFish <many@meilisearch.com>
2022-01-27 10:38:41 +00:00
Kerollmops
fb79c32430 Compute the new, common and, deleted prefix words fst once 2022-01-27 11:00:18 +01:00
bors[bot]
054598734a Merge #2120
2120: Bring `stable` into `main` r=curquiza a=curquiza

I forgot to do it, tell me `@Kerollmops` or `@irevoire` if it's useful or not. I would say yes, otherwise I will have conflict when I will try to bring `main` into `stable` for the next release. Maybe I'm wrong

Co-authored-by: Irevoire <tamo@meilisearch.com>
Co-authored-by: mpostma <postma.marin@protonmail.com>
Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: bors[bot] <26634292+bors[bot]@users.noreply.github.com>
Co-authored-by: Clémentine Urquizar - curqui <clementine@meilisearch.com>
2022-01-27 09:35:21 +00:00
Clément Renault
51d1e64b23 Remove, now useless, the WriteMethod enum 2022-01-27 10:08:35 +01:00
Clément Renault
e9c02173cf Rework the WordsPrefixPositionDocids update to compute a subset of the database 2022-01-27 10:08:35 +01:00
Clément Renault
dbba5fd461 Create a function to simplify the word prefix pair proximity docids compute 2022-01-27 10:08:35 +01:00
Clément Renault
e760e02737 Fix the computation of the newly added and common prefix pair proximity words 2022-01-27 10:08:35 +01:00
Clément Renault
d59e559317 Fix the computation of the newly added and common prefix words 2022-01-27 10:08:34 +01:00
Clément Renault
2ec8542105 Rework the WordPrefixDocids update to compute a subset of the database 2022-01-27 10:08:34 +01:00
Clément Renault
28692f65be Rework the WordPrefixDocids update to compute a subset of the database 2022-01-27 10:08:34 +01:00
Clément Renault
5404bc02dd Move the fst_stream_into_hashset method in the helper methods 2022-01-27 10:06:00 +01:00
Clément Renault
c90fa95f93 Only compute the word prefix pairs on the created word pair proximities 2022-01-27 10:06:00 +01:00
Clément Renault
822f67e9ad Bring the newly created word pair proximity docids 2022-01-27 10:06:00 +01:00
Clément Renault
d28f18658e Retrieve the previous version of the words prefixes FST 2022-01-27 10:05:59 +01:00
ManyTheFish
7ca647f0d0 feat(auth): Implement Tenant token
Make meilisearch support JWT authentication signed with meilisearch API keys
using HS256, HS384 or HS512 algorithms.

Related spec: https://github.com/meilisearch/specifications/pull/89
Fix #1991
2022-01-27 08:25:39 +01:00
bors[bot]
38d23546a5 Merge #431
431: Fix and improve word prefix pair proximity r=ManyTheFish a=Kerollmops

This PR first fixes the algorithm we used to select and compute the word prefix pair proximity database. The previous version was skipping nearly all of the prefixes. The issue is that this fix made this method to take more time and we were trying to reduce the time spent in it.

With `@ManyTheFish` we found out that we could skip some of the work we were doing by:
 - discarding the prefixes that were shorter than a specific threshold (default: 2).
 - discarding the word prefix pairs with proximity bigger than a specific threshold (default: 4).
 - remove the unused threshold that was specifying a minimum amount of word docids to merge.

We will take more time to do some more optimization, like stop clearing and recomputing from scratch the database, we will compute the subsets of keys to create, keep and merge. This change is a little bit more complex than what this PR does.

I keep this PR as a draft as I want to further test the real gain if it is enough or not if it is valid or not. I advise reviewers to review commit by commit to see the changes bit by bit, reviewing the whole PR can be hard.

Co-authored-by: Clément Renault <clement@meilisearch.com>
2022-01-27 07:04:56 +00:00
Clémentine Urquizar - curqui
aa50fcb1f0 Merge branch 'main' into stable 2022-01-26 20:17:41 +01:00
Clémentine Urquizar - curqui
b408de0761 Merge pull request #2117 from meilisearch/rebranding
Changes related to the rebranding
2022-01-26 19:58:54 +01:00
Tamo
72d9c5ee5c fix(rebranding): Update the ascii art (#2118) 2022-01-26 18:53:07 +01:00
bors[bot]
c63f945093 Merge #441
441: Changes related to the rebranding r=curquiza a=meili-bot

_This PR is auto-generated._

 - [X] Change the name `MeiliSearch` to `Meilisearch` in README.
 - [x] ⚠️ Ensure the bot did not update part you don’t want it to update, especially in the code examples in the Getting started.
 - [x] Please, ensure there is no other "MeiliSearch". For example, in the comments or in the tests name.
 - [x] Put the new logo on the README if needed -> still using the milli logo so far


Co-authored-by: meili-bot <74670311+meili-bot@users.noreply.github.com>
Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2022-01-26 17:07:37 +00:00
Clémentine Urquizar
2b7440d4b5 Fix some typo 2022-01-26 17:56:18 +01:00
Clémentine Urquizar - curqui
3bc6a18bcd Update README.md
Co-authored-by: Clément Renault <clement@meilisearch.com>
2022-01-26 17:54:51 +01:00
Clémentine Urquizar - curqui
db56d6cb11 Update download-latest.sh 2022-01-26 17:54:22 +01:00
Clémentine Urquizar - curqui
a5759139bf Update CONTRIBUTING.md
Co-authored-by: Clément Renault <clement@meilisearch.com>
2022-01-26 17:51:38 +01:00
Clémentine Urquizar
0f213f2202 Replace MeiliSearch by Meilisearch 2022-01-26 17:49:55 +01:00
Clémentine Urquizar
de808a391a Replace meilisearch by Meilisearch 2022-01-26 17:48:22 +01:00
Clémentine Urquizar
8a959da120 Update MeiliSearch into Meilisearch everywhere 2022-01-26 17:43:16 +01:00
Clémentine Urquizar
0a78750465 Replace meilisearch by Meilisearch 2022-01-26 17:35:56 +01:00
Clémentine Urquizar
372f4fc924 Replace logo 2022-01-26 17:34:31 +01:00
meili-bot
0d282e3cc5 Update README.md 2022-01-26 16:33:16 +01:00
meili-bot
ae5b401e74 Update README.md 2022-01-26 16:31:04 +01:00
meili-bot
c562655be7 Update CONTRIBUTING.md 2022-01-26 16:31:03 +01:00
bors[bot]
d342c3c357 Merge #438
438: CLI improvements r=Kerollmops a=MarinPostma

I've made the following changes to the cli:
- `settings-update` become `settings`, with two subcommands: `update` and `show`.
- `document-addition` becomes `documents` with a subcommands: `add` (I'll add a feature to list documents later)
- `search` now has an interactive mode `-i`
- search return the number of documents and the time it took to perform the search.


Co-authored-by: mpostma <postma.marin@protonmail.com>
2022-01-26 15:18:20 +00:00
bors[bot]
c8bb54cd94 Merge #2098
2098: feat(dump): Provide the same cli options as the snapshots r=MarinPostma a=irevoire

Add two cli options for the dump:
- `--ignore-missing-dump`
- `--ignore-dump-if-db-exists`

Fix #2087

Co-authored-by: Tamo <tamo@meilisearch.com>
2022-01-26 14:32:23 +00:00
bors[bot]
8c80326dd5 Merge #2086
2086: feat(analytics): send the whole set of cli options instead of only the snapshot r=MarinPostma a=irevoire

Fixes #2088 

Co-authored-by: Tamo <tamo@meilisearch.com>
2022-01-26 13:53:20 +00:00
Tamo
bad4bed439 feat(dump): Provide the same cli options as the snapshots
Add two cli options for the dump:
- `--ignore-missing-dump`
- `--ignore-dump-if-db-exists`

Fix #2087
2022-01-26 14:34:06 +01:00
Tamo
7828da15c3 feat(analytics): send the whole set of cli options instead of only the snapshot 2022-01-26 13:52:41 +01:00
Clément Renault
f9b214f34e Apply suggestions from code review
Co-authored-by: Many <legendre.maxime.isn@gmail.com>
2022-01-26 11:28:11 +01:00
bors[bot]
e1cc025cbd Merge #440
440: fix(fuzzer): fix the fuzzer after #430 r=Kerollmops a=irevoire



Co-authored-by: Tamo <tamo@meilisearch.com>
2022-01-25 16:33:57 +00:00
Clément Renault
f04cd19886 Introduce a max prefix length parameter to the word prefix pair proximity update 2022-01-25 17:04:23 +01:00
Clément Renault
1514dfa1b7 Introduce a max proximity parameter to the word prefix pair proximity update 2022-01-25 17:04:23 +01:00
Clément Renault
23ea3ad738 Remove the useless threshold when computing the word prefix pair proximity 2022-01-25 17:04:23 +01:00
Clément Renault
e3c34684c6 Fix a bug where we were skipping most of the prefix pairs 2022-01-25 17:04:23 +01:00
mpostma
b5f01b52c7 cli improvements 2022-01-25 14:08:30 +01:00
Tamo
fb51d511be fix(fuzzer): fix the fuzzer after #430 2022-01-25 12:08:47 +01:00