Commit Graph

7147 Commits

Author SHA1 Message Date
42cb94e1f4 fix: docker image failed to boot on arm64 node v0.1.0-test 2022-02-14 13:23:20 +08:00
6e7a0cc65d Merge #2157
2157: fix(auth): fix env being closed when dumping r=Kerollmops a=MarinPostma

When creating a dump, the auth store environment would be closed on drop, so subsequent dumps couldn't reopen the environment. I have added a flag in the environment to prevent the closing of the environment on drop when dumping.


Co-authored-by: ad hoc <postma.marin@protonmail.com>
2022-02-09 15:04:19 +00:00
23eba82038 fix(auth): fix env being closed when dumping 2022-02-09 13:56:26 +01:00
001b9acb63 Merge #2155
2155: Fix curl command in download-latest script r=Kerollmops a=curquiza

Fixes #2151 

Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2022-02-08 15:29:51 +00:00
af65ccfd6a Fix curl command in download-latest script 2022-02-08 16:28:26 +01:00
0c7251475d Merge #2150
2150: Bump milli to v0.22.1 r=curquiza a=curquiza

Fixes https://github.com/meilisearch/meilisearch/issues/2138 and https://github.com/meilisearch/meilisearch/issues/2123 by bumping milli to [v0.22.1](https://github.com/meilisearch/milli/releases/tag/v0.22.1)

Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2022-02-08 15:05:06 +00:00
1a87b2f37d Bump milli to v0.22.1 2022-02-08 11:21:44 +01:00
ea15ad6c34 Merge #447
447: Update version for the next release (v0.22.1) r=curquiza a=curquiza



Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2022-02-07 17:44:09 +00:00
d03b3ceb58 Update version for the next release (v0.22.1) 2022-02-07 18:39:29 +01:00
5d58cb7449 Merge #442
442: fix phrase search r=curquiza a=MarinPostma

Run the exact match search on 7 words windows instead of only two. This makes false positive very very unlikely, and impossible on phrase query that are less than seven words.


Co-authored-by: ad hoc <postma.marin@protonmail.com>
2022-02-07 16:18:20 +00:00
752a0e13ad Merge #2136
2136: Refactoring CI regarding ARM binary publish r=curquiza a=curquiza

Fixes https://github.com/meilisearch/meilisearch/issues/1909

- Remove CI file to publish aarch64 binary and put the logic into `publish-binary.yml`
- Remove the job to publish armv8 binary
- Fix download-latest script accordingly
- Adapt dowload-latest with the specific case of the MacOS m1

Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
Co-authored-by: meili-bot <74670311+meili-bot@users.noreply.github.com>
2022-02-07 16:07:46 +00:00
ccaca33446 Add --fail-with-body flag to curl in script 2022-02-07 16:16:49 +01:00
2a90e805a2 Fix script 2022-02-07 16:05:48 +01:00
c4a2d70d19 Fix error handler for curl command in script 2022-02-07 16:01:50 +01:00
f7e4a0177d Update download-latest.sh
Co-authored-by: Clément Renault <clement@meilisearch.com>
2022-02-07 13:53:30 +01:00
cca65499de Merge #2145
2145: Update LICENSE r=meili-bot a=curquiza



Co-authored-by: Clémentine Urquizar - curqui <clementine@meilisearch.com>
2022-02-07 12:51:50 +00:00
c5a996aa78 Merge #446
446: Update LICENSE r=Kerollmops a=curquiza



Co-authored-by: Clémentine Urquizar - curqui <clementine@meilisearch.com>
2022-02-07 09:47:39 +00:00
80fa7dbbfa Update LICENSE 2022-02-05 18:29:47 +01:00
1279c38ac9 Update LICENSE 2022-02-05 18:29:11 +01:00
c24b1e5250 Merge #2135
2135: bug(auth): Make API keys accept Null descriptions r=curquiza a=ManyTheFish

Fix #2116


Co-authored-by: ManyTheFish <many@meilisearch.com>
2022-02-03 15:26:11 +00:00
267d14c28d Merge #445
445: allow null values in csv r=Kerollmops a=MarinPostma

This pr allows null values in csv:
- if the field is of type string, then an empty field is considered null (`,,`), anything other is turned into a string (i.e `, ,` is a single whitespace string)
- if the field is of type number, when the trimmed field is empty, we consider the value null (i.e `,,`, `, ,` are both null numbers) otherwise we try to parse the number.


Co-authored-by: ad hoc <postma.marin@protonmail.com>
2022-02-03 15:11:32 +00:00
bd2262ceea allow null values in csv 2022-02-03 16:03:01 +01:00
13de251047 rewrite word pair distance gathering 2022-02-03 15:57:20 +01:00
78cf8f1f9f Fix typo 2022-02-02 19:32:20 +01:00
fda4f229bb Merge #417
417: Change chunk size to 4MiB to fit more the end user usage r=Kerollmops a=ManyTheFish

Reverts meilisearch/milli#379

We made several indexing tests using different sizes of datasets (5 datasets from 9MiB to 100MiB) on several typologies of VMs (`XS: 1GiB RAM, 1 VCPU`, `S: 2GiB RAM, 2 VCPU`, `M: 4GiB RAM, 3 VCPU`, `L: 8GiB RAM, 4 VCPU`).
The result of these tests shows that the `4MiB` chunk size seems to be the best size compared to other chunk sizes (`2Mib`, `4MiB`, `8Mib`, `16Mib`,  `32Mib`, `64Mib`, `128Mib`).

below is the average time per chunk size:

![Capture d’écran 2021-09-27 à 14 27 50](https://user-images.githubusercontent.com/6482087/134909368-ef0bc45e-68d5-49d1-aaf9-91113b7c410f.png)

<details>
<summary>Detailled data</summary>
<br>

![Capture d’écran 2021-09-27 à 14 39 48](https://user-images.githubusercontent.com/6482087/134909952-a36b1457-bbbd-4a6c-bbe5-519e4b926b5a.png)
</br>
</details> 


Co-authored-by: Many <many@meilisearch.com>
2022-02-02 18:30:59 +00:00
1da7277817 Fix dowload-latest.sh according to the new name of the binary 2022-02-02 19:25:52 +01:00
c71c95feb0 Refactor CIs to publish aaarch64 binary 2022-02-02 19:25:28 +01:00
2468ebb76b Merge #444
444: Fix the parsing of ndjson requests to index more than the first line r=Kerollmops a=Kerollmops

This PR correctly uses the `BufRead` trait to read every line of the content instead of just the first one. This bug was only affecting the http-ui test crate.

Co-authored-by: Kerollmops <clement@meilisearch.com>
2022-02-02 17:59:44 +00:00
3bee31e6c7 bug(auth): Make API keys accept Null descriptions 2022-02-02 18:18:17 +01:00
9142ba9dd4 Fix the parsing of ndjson requests to index more than the first line 2022-02-02 17:55:13 +01:00
d59bcea749 Revert "Revert "Change chunk size to 4MiB to fit more the end user usage"" 2022-02-02 17:01:13 +01:00
7541ab99cd review changes 2022-02-02 12:59:01 +01:00
d0aabde502 optimize 2 typos case 2022-02-02 12:56:09 +01:00
55e6cb9c7b typos on first letter counts as 2 2022-02-02 12:56:09 +01:00
642c01d0dc set max typos on ngram to 1 2022-02-02 12:56:08 +01:00
9448ca58aa Merge #2005
2005: auto batching r=MarinPostma a=MarinPostma

This pr implements auto batching. The basic functioning of this is that all updates that can be batched together are batched together while the previous batch is being processed.

For now, the only updates that can be batched together are the document addition updates (both update and replace), for a single index.

The batching is disabled by default for multiple reasons:
- We need more experimentation with the scheduling techniques
- Right now, if one task fails in a batch, the whole batch fails. We need more permissive error handling when processing document indexation.

There are four CLI options, for now, to interact with how the batch is scheduled:
- `enable-autobatching`: enable the autobatching feature.
- `debounce-duration-sec`: When an update is received, wait that number of seconds before batching and performing the updates. Defaults to 0s.
- `max-batch-size`: the maximum number of tasks per batch, defaults to unlimited.
- `max-documents-per-batch`: the maximum number of documents in a batch, defaults to unlimited. The batch will always contain a least 1 task, no matter the number of documents in that task.

# Implementation

The current implementation is made of 3 major components:

## TaskStore
The `TaskStore` contains all the tasks. When a task is pushed, it is directly registered to the task store.

## Scheduler
The scheduler is in charge of making the batches. At its core, there is a `TaskQueue` and a job queue. `Job`s are always processed first. They are *volatile* tasks, that is, they don't have a TaskId and are not persisted to disk. Snapshots and dumps are examples of Jobs.

If no `Job` is available for processing, then the scheduler attempts to make a `Task` batch from the `TaskQueue`. The first step is to gather new tasks from the `TaskStore` to populate the `TaskQueue`. When this is done, we can prepare our batch. The `TaskQueue` is itself a `BinaryHeap` of `Tasklist`. Each `index_uid` is associated with a `TaskList` that contains all the updates associated with that index uid. Each `TaskList` in the `TaskQueue` is ordered by the id of its first task.

When preparing a batch, the `TaskList` at the top of the `TaskQueue` is popped, and the tasks are popped from the list to make the next batch. If there are remaining tasks in the list, the list is inserted back in the `TaskQueue`.

## UpdateLoop
The `UpdateLoop` role is to perform batch sequentially. Each time updates are pushed to the update store, the scheduler is notified, and will in turn notify the update loop that work can be performed. When notified, the update loop waits some time to wait for more incoming update and then asks the scheduler for the next batch to perform and perform it. When it is done, the status of the task is put back into the store, and the next batch is processed.


Co-authored-by: mpostma <postma.marin@protonmail.com>
2022-02-02 11:04:30 +00:00
d852dc0d2b fix phrase search 2022-02-01 20:21:33 +01:00
c9a236b0af feat(lib): auto-batching 2022-02-01 18:06:20 +01:00
622c15e825 Merge #2096
2096: feat(auth): Tenant token r=Kerollmops a=ManyTheFish

Make meilisearch support JWT authentication signed with meilisearch API keys
using HS256, HS384 or HS512 algorithms.

Related spec: [specifications#89](https://github.com/meilisearch/specifications/pull/89) [rendered](https://github.com/meilisearch/specifications/blob/scoped-api-keys/text/0089-tenant-tokens.md)
Fix #1991 


Co-authored-by: ManyTheFish <many@meilisearch.com>
2022-01-27 10:38:41 +00:00
fb79c32430 Compute the new, common and, deleted prefix words fst once 2022-01-27 11:00:18 +01:00
054598734a Merge #2120
2120: Bring `stable` into `main` r=curquiza a=curquiza

I forgot to do it, tell me `@Kerollmops` or `@irevoire` if it's useful or not. I would say yes, otherwise I will have conflict when I will try to bring `main` into `stable` for the next release. Maybe I'm wrong

Co-authored-by: Irevoire <tamo@meilisearch.com>
Co-authored-by: mpostma <postma.marin@protonmail.com>
Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: bors[bot] <26634292+bors[bot]@users.noreply.github.com>
Co-authored-by: Clémentine Urquizar - curqui <clementine@meilisearch.com>
2022-01-27 09:35:21 +00:00
51d1e64b23 Remove, now useless, the WriteMethod enum 2022-01-27 10:08:35 +01:00
e9c02173cf Rework the WordsPrefixPositionDocids update to compute a subset of the database 2022-01-27 10:08:35 +01:00
dbba5fd461 Create a function to simplify the word prefix pair proximity docids compute 2022-01-27 10:08:35 +01:00
e760e02737 Fix the computation of the newly added and common prefix pair proximity words 2022-01-27 10:08:35 +01:00
d59e559317 Fix the computation of the newly added and common prefix words 2022-01-27 10:08:34 +01:00
2ec8542105 Rework the WordPrefixDocids update to compute a subset of the database 2022-01-27 10:08:34 +01:00
28692f65be Rework the WordPrefixDocids update to compute a subset of the database 2022-01-27 10:08:34 +01:00
5404bc02dd Move the fst_stream_into_hashset method in the helper methods 2022-01-27 10:06:00 +01:00
c90fa95f93 Only compute the word prefix pairs on the created word pair proximities 2022-01-27 10:06:00 +01:00