Compare commits

...

173 Commits

Author SHA1 Message Date
3172c96042 Merge #1795
1795: Update milli version r=irevoire a=curquiza

Closes #1788 

Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2021-10-11 14:27:46 +00:00
60473637fe Update milli version 2021-10-11 16:21:19 +02:00
dffd90b966 Merge #1783
1783: Fix too many open file error r=ManyTheFish a=ManyTheFish

- The prepare_for_closing() function wasn't called when an index was deleted; we now call it
- The index wasn't deleted when we couldn't insert the `uid` into `index_uuid_store`; we now clean it up

Fix #1736

Co-authored-by: many <maxime@meilisearch.com>
2021-10-07 15:48:07 +00:00
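To illustrate the first point, here is a minimal Rust sketch of closing a heed environment before removing an index's files on deletion. The `Index` struct and the exact call sequence are assumptions for illustration, not MeiliSearch's actual code.

```rust
use std::path::PathBuf;

// Hypothetical index handle; only the fields needed for the sketch.
struct Index {
    path: PathBuf,
    env: heed::Env,
}

fn delete_index(index: Index) -> anyhow::Result<()> {
    // Ask heed to close the environment and wait until it is effectively
    // closed, so its file descriptors are released instead of leaking
    // ("too many open files") across repeated index deletions.
    let closing_event = index.env.prepare_for_closing();
    closing_event.wait();
    // Only then remove the index files from disk.
    std::fs::remove_dir_all(&index.path)?;
    Ok(())
}
```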
a92a0c3ed3 Log the error instead of returning it when deletion fails 2021-10-07 17:38:22 +02:00
0774b1efa5 Close index's heed environment when index is deleted 2021-10-07 17:09:41 +02:00
7fc7eb7457 Make sure to remove newly created index if uid is already taken 2021-10-07 16:49:21 +02:00
5e3e108143 Merge #1769
1769: Enforce `Content-Type` header for routes requiring a body payload r=MarinPostma a=irevoire

closes #1665 

Co-authored-by: Tamo <tamo@meilisearch.com>
2021-10-06 15:43:50 +00:00
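A minimal actix-web sketch of the kind of check this enforces, rejecting body-carrying requests whose `Content-Type` is missing or unsupported. This is an illustration only, not the exact MeiliSearch implementation; the accepted mime types and error bodies are assumptions.

```rust
use actix_web::{HttpRequest, HttpResponse};

// Accepted content types for payload routes, kept as a constant.
const ACCEPTED_CONTENT_TYPES: &[&str] = &["application/json", "application/x-ndjson", "text/csv"];

fn check_content_type(req: &HttpRequest) -> Result<(), HttpResponse> {
    match req.headers().get("Content-Type").and_then(|v| v.to_str().ok()) {
        // Known mime type: let the request through.
        Some(ct) if ACCEPTED_CONTENT_TYPES.iter().any(|a| ct.starts_with(a)) => Ok(()),
        // Present but unsupported.
        Some(ct) => Err(HttpResponse::UnsupportedMediaType()
            .body(format!("invalid content type: {}", ct))),
        // Body route called without any Content-Type.
        None => Err(HttpResponse::UnsupportedMediaType().body("missing content type")),
    }
}
```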
66dbd3cd34 makes clippy happy 2021-10-06 17:39:04 +02:00
9a1e44dc78 Apply suggestion
- remove the payload_error_handler in favor of a PayloadError::from

- merge the two match branches into one

- make the accepted content types a const instead of recalculating them for every error
2021-10-06 17:15:47 +02:00
37b267ffb3 duplicate the post document tests with the put verb 2021-10-06 17:15:47 +02:00
dfa199f98f add content-type tests
fix typo

Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2021-10-06 17:15:47 +02:00
c6d107a05f makes the content-type mandatory for every route 2021-10-06 17:15:47 +02:00
ddbcf449da Merge #1763
1763: Index tests r=MarinPostma a=MarinPostma

This PR aims to test more thoroughly the usage of indexes in the MeiliSearch database by writing unit tests.

work included:
- [x] Create index mock and stub methods
- [x] Test snapshot creation
- [x] Test Dumps
- [x] Test search

Co-authored-by: mpostma <postma.marin@protonmail.com>
2021-10-06 14:39:53 +00:00
9fa61439b1 fix clippy warning & unsafety 2021-10-06 14:51:46 +02:00
a38215de98 edit documentation 2021-10-06 14:35:18 +02:00
85b5260d9d simple search unit test 2021-10-06 14:20:05 +02:00
4b4ebad9a9 test dumps 2021-10-06 14:10:26 +02:00
ece4c739f4 update store tests 2021-10-06 14:10:26 +02:00
85ae34cf9f test snapshots 2021-10-06 14:10:23 +02:00
0448f0ce56 handle panic in stubs 2021-10-06 14:09:04 +02:00
4835d82a0b implement index mock 2021-10-06 14:09:01 +02:00
2190764162 Merge #1768
1768: Fix auth error r=irevoire a=MarinPostma

Fix a small auth error that set the token in the invalid-token error to "hello". This was invisible to the user because the invalid token is not returned.

thank you hawk-eye `@irevoire` 

Co-authored-by: mpostma <postma.marin@protonmail.com>
2021-10-05 15:16:14 +00:00
3b91764587 fix auth error 2021-10-05 09:09:40 +02:00
0c3ec549f8 Merge #1764
1764: Bump Milli to improve geosearch errors r=curquiza a=irevoire

closes #1734 

`@curquiza,` your two examples still don't work because a filter must be composed of multiple operations; look at my screenshot to see what works and what doesn't.
Is this ok? 🤔 

`@gmourier,`
What do you think?

![image](https://user-images.githubusercontent.com/7032172/135846911-588f652d-16db-4d88-89fd-148640bac0f7.png)


And here is a screenshot with all the new errors that have been implemented

![image](https://user-images.githubusercontent.com/7032172/135854851-da469fef-0dd0-4ff1-b15e-89934ed8fb6f.png)

Co-authored-by: Tamo <tamo@meilisearch.com>
2021-10-04 17:30:22 +00:00
fca686e7f8 bump meilisearch 2021-10-04 13:52:37 +02:00
607e28749a Merge #1755
1755: Fix mini dashboard r=curquiza a=anirudhRowjee

This commit is a fix to issue #1750.
As part of the changes to solve this issue, the following changes have
been made:
1. Route registration for static assets has been modified
2. The `mut` keyword on the `scope` has been removed.

Co-authored-by: Anirudh Rowjee <ani.rowjee@gmail.com>
2021-10-04 09:56:21 +00:00
bffab21b10 Changes
1. Removed redundant scope registration
2021-10-04 14:47:05 +05:30
151f691609 Fixes #1750
This commit is a fix to issue #1750.
As part of the changes to solve this issue, the following changes have
been made:
1. Route registration for static assets has been modified
2. The `mut` keyword on the `scope` has been removed.
2021-10-02 15:24:04 +05:30
81993b6a15 Merge #1747
1747: Add new error types for document additions r=curquiza a=MarinPostma

Adds the missing errors for the documents routes, as specified.

close #1691
close #1690


Co-authored-by: mpostma <postma.marin@protonmail.com>
2021-09-30 15:16:44 +00:00
4eb3817b03 missing payload error 2021-09-30 16:58:13 +02:00
18cb514073 invalid content type error 2021-09-30 16:58:13 +02:00
ddd40d87a7 malformed payload error 2021-09-30 16:58:13 +02:00
137272b8de empty content type error 2021-09-30 16:58:13 +02:00
e400ae900d Merge #1746
1746: Do not commit transaction on failed updates r=irevoire a=Kerollmops

This PR fixes MeiliSearch always committing transactions even when an update was invalid and the whole transaction should have been discarded. This was the source of a bug where an invalid update (with an invalid primary key) created an index with the specified primary key, when it should instead have failed and done nothing on the server.

Fixes #1735.

Co-authored-by: Kerollmops <clement@meilisearch.com>
2021-09-30 13:46:43 +00:00
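A hedged sketch of the pattern: the heed write transaction is committed only when the whole update succeeds; otherwise it is dropped and every change is discarded. `apply_update` and the payload type are hypothetical stand-ins for the real update logic.

```rust
use anyhow::Result;
use heed::{Env, RwTxn};

/// Hypothetical stand-in for the code that writes an update into the index.
fn apply_update(wtxn: &mut RwTxn, payload: &serde_json::Value) -> Result<u64> {
    let _ = (wtxn, payload);
    Ok(0)
}

fn process_update(env: &Env, payload: &serde_json::Value) -> Result<u64> {
    let mut wtxn = env.write_txn()?;
    match apply_update(&mut wtxn, payload) {
        Ok(processed) => {
            // Persist the changes only when the update succeeded.
            wtxn.commit()?;
            Ok(processed)
        }
        Err(e) => {
            // Dropping the transaction without committing aborts it, so an
            // invalid update (e.g. a bad primary key) no longer creates an
            // index as a side effect.
            drop(wtxn);
            Err(e)
        }
    }
}
```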
c388dca5ec Check that invalid updates do not create an index with a primary key 2021-09-30 15:46:04 +02:00
6a691db7f8 Do not commit transaction on failed updates 2021-09-30 15:46:03 +02:00
ed783b67ca Merge #1742
1742: Create dumps v3 r=irevoire a=MarinPostma

The introduction of the obkv document format changed the format of updates by removing the need to store the document format of an addition (it is no longer needed since updates are stored in the obkv format). This broke the dumps, which this PR solves by introducing a third version of the dumps.

A v2 compat layer has been created that supports importing v2 dumps into MeiliSearch. This allowed the compat code that existed elsewhere in MeiliSearch to be moved into the v2 module. The asc/desc patching is now only done for forward compatibility when loading a v2 dump, and v3 writes asc/desc to the dump with the new syntax.




Co-authored-by: mpostma <postma.marin@protonmail.com>
2021-09-30 13:17:30 +00:00
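Roughly, loading a dump now dispatches on its version, with the v2 compat layer patching the legacy syntax on the fly; a sketch with illustrative module and function names:

```rust
use std::path::Path;
use anyhow::Result;

/// Illustrative dump versions; v3 is introduced by this PR.
enum DumpVersion {
    V1,
    V2,
    V3,
}

fn load_dump(src: &Path, version: DumpVersion) -> Result<()> {
    match version {
        // Pre-existing v1 loader.
        DumpVersion::V1 => compat::load_v1(src),
        // Compat layer: imports a v2 dump and rewrites the legacy
        // `asc(field)` / `desc(field)` ranking rules to the new syntax.
        DumpVersion::V2 => compat::load_v2(src),
        // Dumps written by this version already use the new syntax.
        DumpVersion::V3 => load_v3(src),
    }
}

mod compat {
    use std::path::Path;
    use anyhow::Result;

    pub fn load_v1(_src: &Path) -> Result<()> { Ok(()) }
    pub fn load_v2(_src: &Path) -> Result<()> { Ok(()) }
}

fn load_v3(_src: &Path) -> Result<()> { Ok(()) }
```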
ee372a7b30 implement new dump v2 2021-09-30 14:49:13 +02:00
66f39aaa92 fix dump v3 2021-09-30 14:49:13 +02:00
03af99650d fix dumpv1 2021-09-30 14:49:13 +02:00
44a2ff07b1 Merge #1697
1697: Make exec binary for M1 mac available for download r=irevoire a=k-nasa

## Why

fix: https://github.com/meilisearch/MeiliSearch/issues/1661

Currently, getting an executable for M1 Macs via `download-latest.sh` is not supported.


## What

Download the x86 binary when running `download-latest.sh` on an M1 Mac, because M1 Macs can execute binaries targeting x86.

## Proof

I verified it like this.
I got an executable binary on an M1 Mac 💡 

```sh
:) % arch
arm64

:) % ./download-latest.sh
Downloading MeiliSearch binary v0.21.1 for macos, architecture amd64...
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   631  100   631    0     0   2035      0 --:--:-- --:--:-- --:--:--  2035
100 43.5M  100 43.5M    0     0  7826k      0  0:00:05  0:00:05 --:--:-- 9123k
MeiliSearch binary successfully downloaded as 'meilisearch' file.

Run it:
    $ ./meilisearch
Usage:
    $ ./meilisearch --help

:) % file ./meilisearch
./meilisearch: Mach-O 64-bit executable x86_64

:) % ./meilisearch --help # this is executable
meilisearch-http 0.21.1
...
...
```

Co-authored-by: k-nasa <htilcs1115@gmail.com>
Co-authored-by: nasa <htilcs1115@gmail.com>
2021-09-30 11:33:48 +00:00
fb95540394 Merge #1748
1748: Add a link to join the cloud-hosted beta r=MarinPostma a=gmourier

The product team would like to add a link to communicate and invite users to fill out the form to test the closed beta of our cloud solution.

We have done the same thing on the documentation side https://github.com/meilisearch/documentation/pull/1148. 😇

Co-authored-by: Guillaume Mourier <guillaume@meilisearch.com>
2021-09-30 09:42:22 +00:00
be00fafb29 Add a link to join the cloud-hosted beta 2021-09-30 11:28:51 +02:00
0bc376a17b Merge #1738
1738: Update Dockerfile following the refactor r=MarinPostma a=curquiza



Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2021-09-30 08:08:19 +00:00
05d5de47cb Merge #1737
1737: Update version for the next release (v0.23.0) r=irevoire a=curquiza



Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2021-09-29 22:06:46 +00:00
d3eb604d66 Merge #1740
1740: Add Hacktoberfest section to CONTRIBUTING.md r=curquiza a=meili-bot

_This PR is auto-generated._

Add Hacktoberfest section to CONTRIBUTING.md


Co-authored-by: meili-bot <74670311+meili-bot@users.noreply.github.com>
2021-09-29 17:55:47 +00:00
d6db210ef3 Update CONTRIBUTING.md 2021-09-29 19:54:12 +02:00
80ca42922f Merge #1739
1739: Fix add document Content-Type r=curquiza a=MarinPostma

change the `Content-Type` guards of the document addition routes to match the specification.


Co-authored-by: mpostma <postma.marin@protonmail.com>
2021-09-29 17:13:45 +00:00
fe5df6d06f fix payload content type guards 2021-09-29 19:04:47 +02:00
535442179e Update Dockerfile following the refactor 2021-09-29 18:54:14 +02:00
b17dae9ac0 Update version for the next release (v0.23.0) 2021-09-29 18:40:35 +02:00
5fad37aebd Merge #1711
1711: MeiliSearch refactor introducing OBKV format r=MarinPostma a=MarinPostma

This PR refactors multiple components of MeiliSearch and introduces the obkv document format to MeiliSearch.

- [x] Split meilisearch-http and meilisearch-lib
- [x] Replace `IndexActor` and `UuidResolver` with `IndexResolver`
- [x] Remove mentions to Actor
- [x] Remove Actor traits to simplify code
- [x] Integrate obkv document format
- [x] Remove `Data`
- [x] Restore all routes
- [x] Replace `Box<dyn error>` with `anyhow::Error`
- [x] Introduce update file store
- [x] Update file store error handling
- [x] Fix dumps
- [x] Fix snapshots
- [x] Fix tests
- [x] Update module documentation
- [x] add csv support (feat `@ManyTheFish` #1729 )
- [x] add jsonl support
- [x] integrate geosearch (feat `@irevoire` #1725) 

Partially implements #1691 and #1690. The error handling is very basic for now; I will finish it in the next PR.

Some unit tests have been disabled; I will re-enable them ASAP, but they need a bit more work.

close #1531 


P.S: sorry for this monstrous PR :'(

Co-authored-by: mpostma <postma.marin@protonmail.com>
Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: many <maxime@meilisearch.com>
2021-09-29 14:38:55 +00:00
311933614e bump milli to v0.17.0 2021-09-29 15:44:54 +02:00
8fa6502b16 review changes 2021-09-29 14:17:41 +02:00
1f537e1b60 jsonl support 2021-09-29 11:28:02 +02:00
5bac65f8b8 add missing content type errors 2021-09-29 09:55:35 +02:00
911630000f split csv and json document routes 2021-09-29 00:12:25 +02:00
6e8a3fe8de move csv parsing to document_formats 2021-09-28 22:58:48 +02:00
2a14948123 Use an existing revision of milli 2021-09-28 22:30:34 +02:00
61e5eed493 Call csv specialized function 2021-09-28 22:29:26 +02:00
d30830a55c Add csv deserializer for documents 2021-09-28 22:28:13 +02:00
102c46f88b clippy + fmt 2021-09-28 22:22:59 +02:00
5fa9bc67d7 remove unused dependencies 2021-09-28 22:16:18 +02:00
3503fbf7fe re-export milli from meilisearch_lib 2021-09-28 22:08:03 +02:00
1cc733f801 fix get_info 2021-09-28 22:02:04 +02:00
7a27cbcc78 rename RegisterUpdate to store::Update 2021-09-28 20:20:13 +02:00
6f8e670dee move json reader to document_formats module 2021-09-28 20:13:26 +02:00
df4e9f4e1e restore dump v1 2021-09-28 19:49:25 +02:00
3747f5bdd8 replace unwraps with correct error 2021-09-28 19:29:14 +02:00
56766cffc3 remove module level doc 2021-09-28 18:58:56 +02:00
692c676625 fix tests 2021-09-28 18:57:36 +02:00
ddfd7def35 add a TODO while waiting for the tests to be fixed 2021-09-28 18:17:56 +02:00
bcaee4d179 fix uuid store size 2021-09-28 18:17:56 +02:00
539a57026d fix the sort error messages 2021-09-28 14:50:26 +02:00
654f49ccec [WIP] put milli on branch main 2021-09-28 14:50:26 +02:00
c1376a9f2a add the geosearch to Meilisearch 2021-09-28 14:50:26 +02:00
9ac999ca59 remove uuid resolver and index actor 2021-09-28 12:00:35 +02:00
6a1964f146 restore dumps 2021-09-28 11:59:55 +02:00
90018755c5 restore snapshots 2021-09-27 16:48:03 +02:00
95211e2665 Merge #1703
1703: Trigger CodeCoverage manually instead of on each PR r=irevoire a=curquiza

Since no one is using it on PRs right now, we would rather get a code coverage report on demand (triggered manually) than on each PR.

Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2021-09-27 13:55:05 +00:00
7cd94e5486 Merge #1724
1724: Redo CONTRIBUTING.md r=curquiza a=curquiza

- Update `Development` section
- Update the `Git Guidelines` section
- Remove `Benchmarking & Profiling` -> done on the milli side at the moment
- Remove `Humans` -> synchronization job done by the manager of the core team at the moment
- Remove `Changelog` section -> done by the manager and the docs team 
- Remove `Documentation` section -> job done by the manager to synchronize both teams.

Fixes #1723 at the same time

Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2021-09-27 13:20:29 +00:00
35ef6a9204 Update CONTRIBUTING.md
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-09-27 14:42:56 +02:00
41272e7148 Update CONTRIBUTING.md
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-09-27 14:42:38 +02:00
8ff39d8432 Update CONTRIBUTING.md
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-09-27 14:42:28 +02:00
e22f57cae5 Update CONTRIBUTING.md
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-09-27 14:32:47 +02:00
67afb0e3fe Update CONTRIBUTING.md
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-09-27 14:30:53 +02:00
f56f31c277 Update CONTRIBUTING.md
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-09-27 14:30:42 +02:00
b7c4754be2 Update CONTRIBUTING.md
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-09-27 14:30:37 +02:00
3118f32221 Update CONTRIBUTING.md
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-09-27 14:30:30 +02:00
c9cc504e7b Update CONTRIBUTING.md 2021-09-27 14:18:15 +02:00
b9d189bf12 restore document deletion routes 2021-09-24 15:21:07 +02:00
c32012c44a restore settings updates 2021-09-24 14:55:57 +02:00
dfce44fa3b rename data to meilisearch 2021-09-24 12:03:16 +02:00
42a6260b65 introduce index resolver 2021-09-24 11:53:11 +02:00
5353be74c3 refactor index actor 2021-09-22 15:07:04 +02:00
12542bf922 refactor update actor 2021-09-22 11:52:50 +02:00
def737edee refactor uuid resolver 2021-09-22 10:49:59 +02:00
60518449fc split meilisearch-http and meilisearch-lib 2021-09-21 13:23:22 +02:00
09d4e37044 split data and api keys 2021-09-20 15:31:03 +02:00
e14640e530 refactor meilisearch 2021-09-20 14:54:20 +02:00
7f734f0a18 get x86 binary 2021-09-20 20:57:47 +09:00
cd2f886234 Update download-latest.sh
Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2021-09-19 23:46:36 +09:00
dda4fe10b3 Update download-latest.sh
Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2021-09-19 23:46:31 +09:00
148a896cca Merge #1666
1666: Proposed fix for ARM binary on RHEL r=irevoire a=kappa-wingman

https://github.com/meilisearch/MeiliSearch/issues/1410

Co-authored-by: Kappa Wingman <64772920+kappa-wingman@users.noreply.github.com>
2021-09-15 14:16:25 +00:00
e8eb589d6e Merge #1659
1659: deps: unify pest dependency r=MarinPostma a=happysalada

MeiliSearch depends on two different versions of pest.
This can be problematic for some build systems (e.g. NixOS).
Since the pest repo hasn't received an update in a while, use the later of the two pest versions in the meantime.

Context: this has been discussed previously in https://github.com/meilisearch/MeiliSearch/issues/1273
MeiliSearch has been selected by NGI to be packaged for NixOS. A patch can be applied to make the changes proposed in this PR. This PR intends to see how the MeiliSearch maintainers feel about the patch.

What was done:
- Add an override for the pest dependency in Cargo.toml.
- Recreate the Cargo.lock with `cargo update`. This had the side effect of updating some dependencies.

I ran the tests on Darwin. My machine is quite old, so I had 8 failures due to timeouts. None of the failures look like they are due to the new dependencies.

Checking the pest repo, there are some recent commits, but there is no firm date for when a new release might happen.

If this gets accepted, there is no need to do a new release; NixOS can just target the new commit.

If you feel it's too much pain for not enough gain, no worries at all!

Co-authored-by: happysalada <raphael@megzari.com>
2021-09-15 07:56:03 +00:00
770b6d25ae deps: unify pest dependency 2021-09-15 12:15:44 +09:00
be10d90f07 Trigger CodeCoverage manually instead of on each PR 2021-09-14 18:59:20 +02:00
fd3fa1ef45 Merge #1692
1692: Use tikv-jemallocator instead of jemallocator r=curquiza a=felixonmars

`jemallocator` has been abandoned for nearly two years, and `rustc`
itself moved to use `tikv-jemallocator` instead:
3965773ae7

Let's switch to a better maintained version.

Co-authored-by: Felix Yan <felixonmars@archlinux.org>
2021-09-14 16:26:27 +00:00
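For reference, the allocator is typically wired up with a global-allocator declaration like the sketch below (gated to Linux, matching the target-specific dependency); the actual MeiliSearch source may place it differently.

```rust
// Use jemalloc, via the maintained tikv-jemallocator crate, as the global
// allocator on Linux only.
#[cfg(target_os = "linux")]
#[global_allocator]
static ALLOC: tikv_jemallocator::Jemalloc = tikv_jemallocator::Jemalloc;
```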
a57943b77e Use tikv-jemallocator instead of jemallocator
`jemallocator` has been abandoned for nearly two years, and `rustc`
itself moved to use `tikv-jemallocator` instead:
3965773ae7

Let's switch to a better maintained version.
2021-09-14 18:30:24 +03:00
6fafdb7711 Merge #1651 #1676 #1684
1651: Use reset_sortable_fields r=Kerollmops a=shekhirin

Resolves https://github.com/meilisearch/MeiliSearch/issues/1635

1676: Add curl binary to final stage image r=curquiza a=ook

Reference: #1673 

Changes: * add the `curl` binary to the final MeiliSearch Docker image.

For metrics: thanks to Docker's layer management, this addition actually shrinks the image from 319MB to 315MB:

```
☁  MeiliSearch [feature/1673-add-curl-to-docker-image]  docker image ls
REPOSITORY                                                         TAG                  IMAGE ID       CREATED         SIZE
getmeili/meilisearch                                               0.22.0_ook_1673      938e239ad989   2 hours ago     315MB
getmeili/meilisearch                                               latest               258fa3aa1230   6 days ago      319MB
```

1684: bump dependencies r=MarinPostma a=MarinPostma

Bump meilisearch dependencies.

We still depend on custom patches that have been upgraded along the way.

Co-authored-by: Alexey Shekhirin <a.shekhirin@gmail.com>
Co-authored-by: Thomas Lecavelier <thomas@followanalytics.com>
Co-authored-by: mpostma <postma.marin@protonmail.com>
2021-09-13 13:20:29 +00:00
0f7625e29a bump dependencies 2021-09-13 15:17:08 +02:00
7afccfc92d Merge #1683
1683: Better dependencies cache for CI r=curquiza a=shekhirin



Co-authored-by: Alexey Shekhirin <a.shekhirin@gmail.com>
2021-09-13 12:48:35 +00:00
6e59da26d4 Merge #1700
1700: `stable` into `main` after v0.22.0 r=MarinPostma a=curquiza



Co-authored-by: many <maxime@meilisearch.com>
Co-authored-by: Kerollmops <clement@meilisearch.com>
Co-authored-by: bors[bot] <26634292+bors[bot]@users.noreply.github.com>
Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2021-09-13 12:32:23 +00:00
928930ddd5 Merge #1699
1699: Bump milli: fix some crashes r=ManyTheFish a=curquiza



Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2021-09-13 10:17:10 +00:00
6d2f7af642 Bump milli: fix some crashes 2021-09-13 12:14:54 +02:00
5b995ba080 feat: Make exec binary for M1 mac available for download 2021-09-11 23:11:33 +09:00
c101b2a5cb Merge #1686
1686: Bump milli r=curquiza a=irevoire

 fixes #1685, #1678, #1671, #1677 and #1680

Co-authored-by: Tamo <tamo@meilisearch.com>
2021-09-08 16:31:02 +00:00
971c361e0f Merge #1682
1682: Change the format of custom ranking rules when importing old dumps r=curquiza a=Kerollmops

This PR changes the format of the custom ranking rules from `asc(price)` to `price:asc`, as the format changed between v0.21 and v0.22. The dumps now correctly import the custom ranking rules.

This PR also changes the previous default ranking rules (without sort) to the new default ranking rules (with the new sort).

Co-authored-by: Kerollmops <clement@meilisearch.com>
2021-09-08 16:20:10 +00:00
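A minimal sketch of the kind of conversion this implies when importing an old dump; the helper name is illustrative, not the actual code.

```rust
/// Convert a v0.21-style custom ranking rule such as `asc(price)` into the
/// v0.22 syntax `price:asc`; anything else is returned unchanged.
fn convert_custom_ranking_rule(rule: &str) -> String {
    for order in ["asc", "desc"] {
        let inner = rule
            .strip_prefix(order)
            .and_then(|rest| rest.strip_prefix('('))
            .and_then(|rest| rest.strip_suffix(')'));
        if let Some(field) = inner {
            return format!("{}:{}", field, order);
        }
    }
    rule.to_string()
}

#[test]
fn converts_legacy_syntax() {
    assert_eq!(convert_custom_ranking_rule("asc(price)"), "price:asc");
    assert_eq!(convert_custom_ranking_rule("desc(release_date)"), "release_date:desc");
    assert_eq!(convert_custom_ranking_rule("typo"), "typo");
}
```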
be50b2bec6 Change the format of custom ranking rules when importing v2 dumps 2021-09-08 17:56:21 +02:00
49c918defa bump milli 2021-09-08 17:41:47 +02:00
d595623162 Merge #1669
1669: Fix windows integration tests r=MarinPostma a=ManyTheFish

Set the max_memory value to unlimited during tests:
because tests run several MeiliSearch instances in parallel,
we overestimate the value for max_memory, making the tests crash on Windows

Co-authored-by: many <maxime@meilisearch.com>
2021-09-08 12:44:50 +00:00
169e739634 Remove useless indexer options 2021-09-08 13:40:05 +02:00
08138c7c23 Use set indexer options instead of create a default one 2021-09-08 13:40:00 +02:00
bbb012ad0f chore(ci): use smarter dependencies cache 2021-09-07 19:56:38 +03:00
331d28102f Change the format of custom ranking rules when importing v1 dumps 2021-09-07 17:16:40 +02:00
efa69875d9 refactor(http): use reset_sortable_fields 2021-09-07 15:04:44 +03:00
59797bfc7b meilisearch/MeiliSearch#1673 Add curl binary to final stage image 2021-09-06 19:35:23 +02:00
c0f9c891f5 Set max_memory value to unlimited during tests
because tests run several meilisearch instances in parallel,
we overestimate the value for max_memory, making the tests crash on Windows
2021-09-06 14:38:10 +02:00
33514b28be Merge pull request #1588 from meilisearch/test-new-indexer
Integrate the new indexer
2021-09-06 10:21:42 +02:00
e3a913e03f Merge #1660
1660: Update version for the next release (v0.22.0) r=Kerollmops a=curquiza



Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2021-09-02 16:43:32 +00:00
7e80337e5b Bump milli to v0.12.0 2021-09-02 18:19:12 +02:00
8d4723d91b Update lock file 2021-09-02 18:19:12 +02:00
4cdf680a81 Make the MaxMemory use the default value when undefined 2021-09-02 18:19:11 +02:00
63e67f72e3 Update tokenizer and new milli version 2021-09-02 18:19:00 +02:00
0cd66c3a89 Bump the milli version 2021-09-02 18:19:00 +02:00
b092a624ed Introduce the MaxMemory struct that defaults to 2/3 of the available memory 2021-09-02 18:18:59 +02:00
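A hedged sketch of how such a default can be computed with the `sysinfo` crate (the struct shape and the unit returned by `total_memory()` are assumptions for this sketch):

```rust
use byte_unit::Byte;
use sysinfo::{RefreshKind, System, SystemExt};

/// Indexer memory budget; `None` means "no limit".
struct MaxMemory(Option<Byte>);

impl Default for MaxMemory {
    fn default() -> Self {
        let sys = System::new_with_specifics(RefreshKind::new().with_memory());
        // sysinfo reports total memory in kilobytes here (assumption).
        let total_bytes = sys.total_memory() as u128 * 1024;
        // Default to two thirds of the machine's total memory.
        MaxMemory(Some(Byte::from_bytes(total_bytes * 2 / 3)))
    }
}
```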
24e84d7ca1 Test new indexer 2021-09-02 18:11:20 +02:00
14f9056349 Merge #1662
1662: Fix link in download script r=irevoire a=curquiza



Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2021-09-02 11:27:14 +00:00
b444e2e074 Proposed fix for ARM binary on RHEL
https://github.com/meilisearch/MeiliSearch/issues/1410
2021-09-02 01:01:46 +00:00
723cb4d520 Fix link in download script 2021-09-01 15:57:11 +02:00
90116155b4 Update version for the next release (v0.22.0) 2021-09-01 12:33:30 +02:00
0d01c0e935 Merge #1658
1658: Remove COMMIT_SHA and COMMIT_DATE build arg from the Docker CIs r=irevoire a=curquiza

Since `@irevoire` added the `.git` folder in the Dockerfile, there is no need to compute `COMMIT_SHA` and `COMMIT_DATE` in the CI.
Can you confirm `@irevoire?`

Also, update some CIs using `checkout@v1` to `checkout@v2`

Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2021-08-31 15:15:24 +00:00
e002509bf2 Remove COMMIT_SHA and COMMIT_DATE build arg 2021-08-31 17:01:58 +02:00
19c5c74291 Merge #1652 #1654 #1657
1652: Remove dependabot r=MarinPostma a=curquiza

Fixes #1649 

Dependabot for vulnerability and security updates is still activated.

1654: Add Script for Windows r=MarinPostma a=singh08prashant

fixes #1570 

changes:

1. added detection of Windows OS running Git Bash to the script
2. appended `.exe` to `$release_file` for Windows, as listed [here](https://github.com/meilisearch/MeiliSearch/releases/)
3. removed the global `$BINARY_NAME='meilisearch'` as Windows requires an `.exe` file

1657: Bring vergen hotfix from `stable` to `main` r=MarinPostma a=curquiza



Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
Co-authored-by: singh08prashant <singh08prashant@gmail.com>
Co-authored-by: Kerollmops <clement@meilisearch.com>
Co-authored-by: bors[bot] <26634292+bors[bot]@users.noreply.github.com>
2021-08-31 14:31:42 +00:00
b6fec60243 Merge #1656
1656: Remove unused Arc import r=MarinPostma a=Kerollmops

This PR removes a warning introduced by #1606, which removed Sentry (it was using an `Arc`) but forgot to remove the corresponding import; we remove it here.

Co-authored-by: Kerollmops <clement@meilisearch.com>
2021-08-31 14:04:16 +00:00
9d0fa8112b Remove unused Arc import 2021-08-31 14:50:36 +02:00
d30f5b1bef add script for git-bash 2021-08-31 08:34:21 +05:30
70df41bc62 Remove dependabot 2021-08-30 16:51:50 +02:00
23ccf4429e Merge #1639
1639: Add new mini-dashboard gif r=curquiza a=CaroFG



Co-authored-by: CaroFG <48251481+CaroFG@users.noreply.github.com>
Co-authored-by: CaroFG <carolina.ferreira131@gmail.com>
2021-08-26 15:58:39 +00:00
bf4e799dba Update README.md
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-08-26 17:47:29 +02:00
cb695bdec3 Update README with new gif 2021-08-26 17:43:41 +02:00
be70eb881a Remove old gif 2021-08-26 17:42:56 +02:00
867c277088 Add files via upload 2021-08-26 16:40:44 +02:00
96f72f009a Merge #1615
1615: Integrate the query time sort feature r=Kerollmops a=Kerollmops

This pull request integrates the sort at query time feature that was implemented on the milli side https://github.com/meilisearch/milli/pull/320. It follows the specification file https://github.com/meilisearch/specifications/blob/develop/text/0055-sort.md.

A bunch of tests have been added to ensure that the search works correctly and that the settings are fine too!

Co-authored-by: Kerollmops <clement@meilisearch.com>
2021-08-26 14:09:38 +00:00
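For reference, a hypothetical search payload using the new parameter, following the sort specification linked above (index and field names are illustrative, and the sorted fields must be declared in `sortableAttributes`):

```rust
use serde_json::json;

fn main() {
    // Body of a POST /indexes/movies/search request, sketched with serde_json.
    let body = json!({
        "q": "science fiction",
        // Query-time sort, applied through the `sort` ranking rule.
        "sort": ["price:asc", "release_date:desc"],
        "limit": 20
    });
    println!("{}", body);
}
```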
cf4a466b6b Make sure that the order of the filterableAttributes is constant 2021-08-26 11:06:05 +02:00
087e4626ce Make sure that the order of the sortableAttributes is constant 2021-08-26 11:06:04 +02:00
64462c842b Test the search with sort time queries with POST and GET methods 2021-08-25 17:39:25 +02:00
e0f73fe742 Introduce the sort search parameter 2021-08-25 17:39:25 +02:00
ea4c831de0 Integrate the sortable-attributes into the settings 2021-08-25 17:39:25 +02:00
51387b2c80 Introduce the new invalid sortable error codes 2021-08-25 17:29:30 +02:00
2d8dd87cad Merge #1623
1623: Use Setting enum r=Kerollmops a=shekhirin

Resolves https://github.com/meilisearch/MeiliSearch/issues/1620

Co-authored-by: Alexey Shekhirin <a.shekhirin@gmail.com>
2021-08-25 14:58:40 +00:00
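For context, a sketch of what a `Setting`-style enum looks like; this mirrors the idea rather than milli's exact definition. Unlike an `Option`, it can distinguish "reset this setting to its default" from "the field was absent from the payload".

```rust
/// Three-state setting value used when applying partial settings updates.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Setting<T> {
    /// The user explicitly provided a value.
    Set(T),
    /// The user sent `null`: reset the setting to its default.
    Reset,
    /// The field was not present in the payload: leave the setting as is.
    NotSet,
}

impl<T> Default for Setting<T> {
    fn default() -> Self {
        Setting::NotSet
    }
}
```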
d9dd2a038b refactor(http): use Setting enum 2021-08-25 17:43:46 +03:00
1227ce8091 Merge #1622
1622: Update README to welcome the contribution again r=Kerollmops a=curquiza



Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2021-08-25 13:08:08 +00:00
cd63c80be8 Merge #1616
1616: Remove sentry r=Kerollmops a=irevoire

closes #1606 

Co-authored-by: Irevoire <tamo@meilisearch.com>
2021-08-25 11:40:30 +00:00
e0a5eebe79 Update README to welcome the contribution again 2021-08-24 20:31:05 +02:00
850069af75 Merge #1610
1610: Fix Docker CI for `latest` tag r=irevoire a=curquiza

Fixes https://github.com/meilisearch/MeiliSearch/issues/1608

Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2021-08-24 11:46:04 +00:00
672fcee8aa remove sentry 2021-08-24 12:38:31 +02:00
d9b023c11f Update publish-docker-latest.yml 2021-08-23 19:27:48 +02:00
6b228f56cb Merge #1607
1607: Merge changes in `stable` into `main` r=Kerollmops a=curquiza

Containing all the fixes since v0.21.0rc0

Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: Irevoire <tamo@meilisearch.com>
Co-authored-by: many <maxime@meilisearch.com>
Co-authored-by: Kerollmops <clement@meilisearch.com>
Co-authored-by: bors[bot] <26634292+bors[bot]@users.noreply.github.com>
Co-authored-by: mpostma <postma.marin@protonmail.com>
Co-authored-by: Charlotte Vermandel <charlottevermandel@gmail.com>
2021-08-23 16:27:46 +00:00
96839c48c9 Direct users to milli for the core library in the README (#1520)
* Update README.md

* Update README.md

* Update README.md

* Update README.md

Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>

* Update README.md

Co-authored-by: gui machiavelli <hey@guimachiavelli.com>

Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
Co-authored-by: gui machiavelli <hey@guimachiavelli.com>
2021-08-19 16:24:12 +02:00
63daa8b15a Update README.md (#1568) 2021-08-09 16:38:52 +02:00
92913e1eb8 Add information about product repo (#1567)
* Add information about product repo

* Update README.md

Co-authored-by: Guillaume Mourier <guillaume@meilisearch.com>

Co-authored-by: Guillaume Mourier <guillaume@meilisearch.com>
2021-08-09 14:56:43 +02:00
418be3daa8 Update issue templates (#1564) 2021-08-09 10:51:02 +02:00
108 changed files with 6764 additions and 5415 deletions

10
.github/ISSUE_TEMPLATE/config.yml vendored Normal file
View File

@ -0,0 +1,10 @@
contact_links:
- name: Feature request
url: https://github.com/meilisearch/product/discussions/categories/feedback-feature-proposal
about: The feature requests are not managed in this repository, please open a discussion in our dedicated product repository
- name: Documentation issue
url: https://github.com/meilisearch/documentation/issues/new
about: For documentation issues, open an issue or a PR in the documentation repository
- name: Support questions & other
url: https://github.com/meilisearch/MeiliSearch/discussions/new
about: For any other question, open a discussion in this repository

View File

@ -1,20 +0,0 @@
---
name: Feature request
about: Suggest an idea for this project
title: ''
labels: ''
assignees: ''
---
**Is your feature request related to a problem? Please describe.**
A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]
**Describe the solution you'd like**
A clear and concise description of what you want to happen.
**Describe alternatives you've considered**
A clear and concise description of any alternative solutions or features you've considered.
**Additional context**
Add any other context or screenshots about the feature request here.

View File

@ -1,40 +0,0 @@
---
name: Tracking issue
about: Template for a tracking issue
title: ''
labels: tracking-issue
assignees: ''
---
# Summary
One paragraph to explain the feature.
# Motivations
Why are we doing this? What use cases does it support? What is the expected outcome?
# Explanation
Explain the proposal like it was the final documentation of this proposal.
- What is changing for end-users.
- How it works.
- What is breaking?
- Examples.
# Implementation
Explain the technical specificities that will need to be known or done in order to implement this proposal.
## Steps
Describe each step to create the feature with its associated issue/PR.
# Related
- [ ] Validated by the team (@people needed)
- [ ] Test added
- [ ] [Documentation](https://github.com/meilisearch/documentation/issues/#xxx) //Change xxx or remove the line
- [ ] [SDK/Integrations](https://github.com/meilisearch/integration-guides/issues/#xxx) //Change xxx or remove the line

View File

@ -1,6 +0,0 @@
version: 2
updates:
- package-ecosystem: "cargo"
directory: "/"
schedule:
interval: "monthly"

View File

@ -1,7 +1,6 @@
---
on:
pull_request:
types: [review_requested, ready_for_review]
workflow_dispatch:
name: Execute code coverage

View File

@ -26,7 +26,7 @@ jobs:
- uses: hecrj/setup-rust-action@master
with:
rust-version: stable
- uses: actions/checkout@v1
- uses: actions/checkout@v2
- name: Build
run: cargo build --release --locked
- name: Upload binaries to release
@ -41,12 +41,14 @@ jobs:
name: Publish for ARMv8
runs-on: ubuntu-18.04
steps:
- uses: actions/checkout@v1.0.0
- uses: uraimo/run-on-arch-action@v1.0.7
- uses: actions/checkout@v2
- uses: uraimo/run-on-arch-action@v2.1.1
id: runcmd
with:
architecture: aarch64 # aka ARMv8
distribution: ubuntu18.04
arch: aarch64 # aka ARMv8
distro: ubuntu18.04
env: |
JEMALLOC_SYS_WITH_LG_PAGE: 16
run: |
apt update
apt install -y curl gcc make

View File

@ -14,7 +14,7 @@ jobs:
rust-version: stable
- name: Install cargo-deb
run: cargo install cargo-deb
- uses: actions/checkout@v1
- uses: actions/checkout@v2
- name: Build deb package
run: cargo deb -p meilisearch-http -o target/debian/meilisearch.deb
- name: Upload debian pkg to release

View File

@ -13,9 +13,6 @@ jobs:
- name: Check if current release is latest
run: echo "##[set-output name=is_latest;]$(sh .github/is-latest-release.sh)"
id: release
- name: Set COMMIT_DATE env variable
run: |
echo "COMMIT_DATE=$( git log --pretty=format:'%ad' -n1 --date=short )" >> $GITHUB_ENV
- name: Publish to Registry
if: steps.release.outputs.is_latest == 'true'
uses: elgohr/Publish-Docker-Github-Action@master
@ -23,5 +20,3 @@ jobs:
name: getmeili/meilisearch
username: ${{ secrets.DOCKER_USERNAME }}
password: ${{ secrets.DOCKER_PASSWORD }}
tag_names: true
buildargs: COMMIT_SHA,COMMIT_DATE

View File

@ -10,10 +10,7 @@ jobs:
build:
runs-on: ubuntu-18.04
steps:
- uses: actions/checkout@v1
- name: Set COMMIT_DATE env variable
run: |
echo "COMMIT_DATE=$( git log --pretty=format:'%ad' -n1 --date=short )" >> $GITHUB_ENV
- uses: actions/checkout@v2
- name: Publish to Registry
uses: elgohr/Publish-Docker-Github-Action@master
env:
@ -23,4 +20,3 @@ jobs:
username: ${{ secrets.DOCKER_USERNAME }}
password: ${{ secrets.DOCKER_PASSWORD }}
tag_names: true
buildargs: COMMIT_SHA,COMMIT_DATE

View File

@ -23,12 +23,7 @@ jobs:
steps:
- uses: actions/checkout@v2
- name: Cache dependencies
uses: actions/cache@v2
with:
path: |
~/.cargo
./target
key: ${{ matrix.os }}-${{ hashFiles('Cargo.lock') }}
uses: Swatinem/rust-cache@v1.3.0
- name: Run cargo check without any default features
uses: actions-rs/cargo@v1
with:
@ -45,19 +40,14 @@ jobs:
runs-on: ubuntu-18.04
steps:
- uses: actions/checkout@v2
- name: Cache dependencies
uses: actions/cache@v2
with:
path: |
~/.cargo
./target
key: ${{ matrix.os }}-${{ hashFiles('Cargo.lock') }}
- uses: actions-rs/toolchain@v1
with:
profile: minimal
toolchain: stable
override: true
components: clippy
- name: Cache dependencies
uses: Swatinem/rust-cache@v1.3.0
- name: Run cargo clippy
uses: actions-rs/cargo@v1
with:
@ -69,18 +59,13 @@ jobs:
runs-on: ubuntu-18.04
steps:
- uses: actions/checkout@v2
- name: Cache dependencies
uses: actions/cache@v2
with:
path: |
~/.cargo
./target
key: ${{ matrix.os }}-${{ hashFiles('Cargo.lock') }}
- uses: actions-rs/toolchain@v1
with:
profile: minimal
toolchain: nightly
override: true
components: rustfmt
- name: Cache dependencies
uses: Swatinem/rust-cache@v1.3.0
- name: Run cargo fmt
run: cargo fmt --all -- --check

View File

@ -1,112 +1,91 @@
# Contributing
First, thank you for contributing to MeiliSearch! The goal of this document is to
provide everything you need to start contributing to MeiliSearch. The
following TOC is sorted progressively, starting with the basics and
expanding into more specifics.
First, thank you for contributing to MeiliSearch! The goal of this document is to provide everything you need to start contributing to MeiliSearch.
<!-- MarkdownTOC autolink="true" style="ordered" indent=" " -->
- [Hacktoberfest](#hacktoberfest)
- [Assumptions](#assumptions)
- [How to Contribute](#how-to-contribute)
- [Development Workflow](#development-workflow)
- [Git Guidelines](#git-guidelines)
1. [Assumptions](#assumptions)
1. [Your First Contribution](#your-first-contribution)
1. [Change Control](#change-control)
1. [Git Branches](#git-branches)
1. [Git Commits](#git-commits)
1. [Style](#style)
1. [Github Pull Requests](#github-pull-requests)
1. [Reviews & Approvals](#reviews--approvals)
1. [Merge Style](#merge-style)
1. [CI](#ci)
1. [Development](#development)
1. [Setup](#setup)
1. [Testing](#testing)
1. [Benchmarking](#benchmarking--profiling)
1. [Humans](#humans)
1. [Documentation](#documentation)
1. [Changelog](#changelog)
## Hacktoberfest
<!-- /MarkdownTOC -->
It's [Hacktoberfest month](https://blog.meilisearch.com/contribute-hacktoberfest-2021/)! 🥳
🚀 If your PR gets accepted it will count into your participation to Hacktoberfest!
✅ To be accepted, a PR must either have been merged, approved, or tagged with the `hacktoberfest-accepted` label.
🧐 Don't forget to check the [quality standards](https://hacktoberfest.digitalocean.com/resources/qualitystandards), otherwise your PR could be marked as `spam` or `invalid`, and it will not be counted toward your participation in Hacktoberfest.
## Assumptions
1. **You're familiar with [Github](https://github.com) and the [pull request](https://help.github.com/en/github/collaborating-with-issues-and-pull-requests/about-pull-requests)
workflow.**
2. **You've read the MeiliSearch [docs](https://docs.meilisearch.com).**
1. **You're familiar with [Github](https://github.com) and the [Pull Requests](https://help.github.com/en/github/collaborating-with-issues-and-pull-requests/about-pull-requests)(PR) workflow.**
2. **You've read the MeiliSearch [documentation](https://docs.meilisearch.com).**
3. **You know about the [MeiliSearch community](https://docs.meilisearch.com/learn/what_is_meilisearch/contact.html).
Please use this for help.**
## Your First Contribution
## How to Contribute
1. Ensure your change has an issue! Find an
[existing issue](https://github.com/meilisearch/meilisearch/issues/) or [open a new issue](https://github.com/meilisearch/meilisearch/issues/new).
* This is where you can get a feel if the change will be accepted or not.
2. Once approved, [fork the MeiliSearch repository](https://help.github.com/en/github/getting-started-with-github/fork-a-repo) in your own
Github account.
2. Once approved, [fork the MeiliSearch repository](https://help.github.com/en/github/getting-started-with-github/fork-a-repo) in your own Github account.
3. [Create a new Git branch](https://help.github.com/en/github/collaborating-with-issues-and-pull-requests/creating-and-deleting-branches-within-your-repository)
4. Review the MeiliSearch [workflow](#workflow) and [development](#development).
5. Make your changes.
6. [Submit the branch as a pull request](https://help.github.com/en/github/collaborating-with-issues-and-pull-requests/creating-a-pull-request-from-a-fork) to the main MeiliSearch
repo. A MeiliSearch team member should comment and/or review your pull request
with a few days. Although, depending on the circumstances, it may take
longer.
4. Review the [Development Workflow](#development-workflow) section that describes the steps to maintain the repository.
5. Make your changes on your branch.
6. [Submit the branch as a Pull Request](https://help.github.com/en/github/collaborating-with-issues-and-pull-requests/creating-a-pull-request-from-a-fork) pointing to the `main` branch of the MeiliSearch repository. A maintainer should comment and/or review your Pull Request within a few days. Although depending on the circumstances, it may take longer.
## Change Control
## Development Workflow
### Setup and run MeiliSearch
```bash
cargo run --release
```
We recommend using the `--release` flag to test the full performance of MeiliSearch.
### Test
```bash
cargo test
```
If you get a "Too many open files" error you might want to increase the open file limit using this command:
```bash
ulimit -Sn 3000
```
## Git Guidelines
### Git Branches
_All_ changes must be made in a branch and submitted as [pull requests](#pull-requests).
MeiliSearch does not adopt any type of branch naming style, but please use something
descriptive of your changes.
All changes must be made in a branch and submitted as PR.
We do not enforce any branch naming style, but please use something descriptive of your changes.
### Git Commits
#### Style
As minimal requirements, your commit message should:
- be capitalized
- not finish by a dot or any other punctuation character (!,?)
- start with a verb so that we can read your commit message this way: "This commit will ...", where "..." is the commit message.
e.g.: "Fix the home page button" or "Add more tests for create_index method"
Please ensure your commits are small and focused; they should tell a story of
your change. This helps reviewers to follow your changes, especially for more
complex changes.
Familiarise yourself with [How to Write a Git Commit Message](https://chris.beams.io/posts/git-commit/).
We don't follow any other convention, but if you want to use one, we recommend [the Chris Beams one](https://chris.beams.io/posts/git-commit/).
### Github Pull Requests
Once your changes are ready you must submit your branch as a pull request.
Some notes on GitHub PRs:
#### Reviews & Approvals
- All PRs must be reviewed and approved by at least one maintainer.
- The PR title should be accurate and descriptive of the changes.
- [Convert your PR as a draft](https://help.github.com/en/github/collaborating-with-issues-and-pull-requests/changing-the-stage-of-a-pull-request) if your changes are a work in progress: no one will review it until you pass your PR as ready for review.<br>
The draft PRs are recommended when you want to show that you are working on something and make your work visible.
- The branch related to the PR must be **up-to-date with `main`** before merging. Fortunately, this project uses [Bors](https://github.com/bors-ng/bors-ng) to automatically enforce this requirement without the PR author having to rebase manually.
All pull requests must be reviewed and approved by at least one MeiliSearch team
member.
<hr>
#### Merge Style
All pull requests are squashed and merged. We generally discourage large pull
requests that are over 300-500 lines of diff. If you would like to propose
a change that is larger we suggest coming onto our chat channel and
discuss it with one of our engineers. This way we can talk through the
solution and discuss if a change that large is even needed! This overall
will produce a quicker response to the change and likely produce code that
aligns better with our process.
## Development
### Setup
See the [MeiliSearch Docs](https://docs.meilisearch.com/reference/features/installation.html) for how to set up a development environment.
### Benchmarking & Profiling
We do not yet do any benchmarking, nor have we formalised our profiling. If you'd like to work on this please get in touch!
## Humans
After making your change, you'll want to prepare it for MeiliSearch users (mostly humans). This usually entails updating documentation and announcing your feature.
### Documentation
Documentation is very important to MeiliSearch. All contributions that
alter user-facing behavior MUST include documentation changes. Please see
[GitHub.com/meilisearch/documentation](https://github.com/meilisearch/documentation) for more info.
### Changelog
Until we have guidelines in place, updating the [`Changelog`](/CHANGELOG.md) is solely the responsibility of MeiliSearch team members.
Thank you again for reading this through, we can not wait to begin to work with you if you made your way through this contributing guide ❤️

1260
Cargo.lock generated

File diff suppressed because it is too large

View File

@ -2,7 +2,12 @@
members = [
"meilisearch-http",
"meilisearch-error",
"meilisearch-lib",
]
resolver = "2"
[profile.release]
debug = true
[patch.crates-io]
pest = { git = "https://github.com/pest-parser/pest.git", rev = "51fd1d49f1041f7839975664ef71fe15c7dcaf67" }

View File

@ -14,6 +14,7 @@ COPY Cargo.toml .
COPY meilisearch-error/Cargo.toml meilisearch-error/
COPY meilisearch-http/Cargo.toml meilisearch-http/
COPY meilisearch-lib/Cargo.toml meilisearch-lib/
ENV RUSTFLAGS="-C target-feature=-crt-static"
@ -34,7 +35,7 @@ RUN $HOME/.cargo/bin/cargo build --release
# Run
FROM alpine:3.14
RUN apk add -q --no-cache libgcc tini
RUN apk add -q --no-cache libgcc tini curl
COPY --from=compiler /meilisearch/target/release/meilisearch .

View File

@ -29,7 +29,7 @@
For more information about features go to [our documentation](https://docs.meilisearch.com/).
<p align="center">
<img src="assets/trumen_quick_loop.gif" alt="Web interface gif" />
<img src="assets/trumen-fast.gif" alt="Web interface gif" />
</p>
## ✨ Features
@ -61,6 +61,10 @@ meilisearch
docker run -p 7700:7700 -v "$(pwd)/data.ms:/data.ms" getmeili/meilisearch
```
#### Announcing a cloud-hosted MeiliSearch
Join the closed beta by filling out this [form](https://meilisearch.typeform.com/to/FtnzvZfh).
#### Try MeiliSearch in our Sandbox
Create a MeiliSearch instance in [MeiliSearch Sandbox](https://sandbox.meilisearch.com/). This instance is free, and will be active for 48 hours.
@ -108,7 +112,7 @@ Let's create an index! If you need a sample dataset, use [this movie database](h
curl -L 'https://bit.ly/2PAcw9l' -o movies.json
```
Now, you're ready to index some data.
Now, you're ready to index some data.
```bash
curl -i -X POST 'http://127.0.0.1:7700/indexes/movies/documents' \
@ -169,8 +173,15 @@ Now that your MeiliSearch server is up and running, you can learn more about how
## Contributing
Hey! We're glad you're thinking about contributing to MeiliSearch! However, we are currently working on a huge refactor and accepting PRs on this repository wouldn't be productive. We are sorry about this! Be sure we are doing our best so that you can contribute to MeiliSearch again as soon as possible ❤️
Hey! We're glad you're thinking about contributing to MeiliSearch! Feel free to pick an [issue labeled as `good first issue`](https://github.com/meilisearch/MeiliSearch/issues?q=is%3Aissue+is%3Aopen+label%3A%22good+first+issue%22), and to ask any question you need. Some points might not be clear and we are available to help you!
Also, we recommend following the [CONTRIBUTING](./CONTRIBUTING.md) to create your PR.
## Core engine and tokenizer
The code in this repository is only concerned with managing multiple indexes, handling the update store, and exposing an HTTP API.
Search and indexation are the domain of our core engine, [`milli`](https://github.com/meilisearch/milli), while tokenization is handled by [our `tokenizer` library](https://github.com/meilisearch/tokenizer/).
## Telemetry
MeiliSearch collects anonymous data regarding general usage.
@ -178,15 +189,17 @@ This helps us better understand developers' usage of MeiliSearch features.
To see what information we're retrieving, please see the complete list [on the dedicated issue](https://github.com/meilisearch/MeiliSearch/issues/720).
We also use Sentry to make us crash and error reports. If you want to know more about what Sentry collects, please visit their [privacy policy website](https://sentry.io/privacy/).
This program is optional, you can disable these analytics by using the `MEILI_NO_ANALYTICS` env variable.
## Feature request
The feature requests are not managed in this repository. Please visit our [dedicated repository](https://github.com/meilisearch/product) to see our work about the MeiliSearch product.
If you have a feature request or any feedback about an existing feature, please open [a discussion](https://github.com/meilisearch/product/discussions).
Also, feel free to participate in the current discussions, we are looking forward to reading your comments.
## 💌 Contact
Feel free to contact us with any questions you may have:
* 🆕 Join our [GitHub Discussions forum](https://github.com/meilisearch/MeiliSearch/discussions)
* Join our [Slack community](https://slack.meilisearch.com/).
* By opening an issue.
Please visit [this page](https://docs.meilisearch.com/learn/what_is_meilisearch/contact.html#contact-us).
MeiliSearch is developed by [Meili](https://www.meilisearch.com), a young company. To know more about us, you can [read our blog](https://blog.meilisearch.com). Any suggestion or feedback is highly appreciated. Thank you for your support!

BIN
assets/trumen-fast.gif Normal file
Binary files not shown: the new gif (1.4 MiB) is added and the previous web-interface gif (1.2 MiB) is removed.

View File

@ -6,7 +6,6 @@ GREEN='\033[32m'
DEFAULT='\033[0m'
# GLOBALS
BINARY_NAME='meilisearch'
GREP_SEMVER_REGEXP='v\([0-9]*\)[.]\([0-9]*\)[.]\([0-9]*\)$' # i.e. v[number].[number].[number]
# FUNCTIONS
@ -127,6 +126,9 @@ get_os() {
'Linux')
os='linux'
;;
'MINGW'*)
os='windows'
;;
*)
return 1
esac
@ -138,7 +140,7 @@ get_os() {
get_archi() {
architecture=$(uname -m)
case "$architecture" in
'x86_64' | 'amd64')
'x86_64' | 'amd64' | 'arm64')
archi='amd64'
;;
'aarch64')
@ -163,7 +165,7 @@ failure_usage() {
printf "$RED%s\n$DEFAULT" 'ERROR: MeiliSearch binary is not available for your OS distribution or your architecture yet.'
echo ''
echo 'However, you can easily compile the binary from the source files.'
echo 'Follow the steps at the page ("Source" tab): https://docs.meilisearch.com/guides/advanced_guides/installation.html'
echo 'Follow the steps at the page ("Source" tab): https://docs.meilisearch.com/learn/getting_started/installation.html'
}
# MAIN
@ -180,7 +182,17 @@ if ! get_archi; then
fi
echo "Downloading MeiliSearch binary $latest for $os, architecture $archi..."
release_file="meilisearch-$os-$archi"
case "$os" in
'windows')
release_file="meilisearch-$os-$archi.exe"
BINARY_NAME='meilisearch.exe'
;;
*)
release_file="meilisearch-$os-$archi"
BINARY_NAME='meilisearch'
esac
link="https://github.com/meilisearch/MeiliSearch/releases/download/$latest/$release_file"
curl -OL "$link"
mv "$release_file" "$BINARY_NAME"

View File

@ -1,8 +1,9 @@
[package]
name = "meilisearch-error"
version = "0.21.1"
version = "0.23.0"
authors = ["marin <postma.marin@protonmail.com>"]
edition = "2018"
[dependencies]
actix-http = "=3.0.0-beta.6"
actix-http = "=3.0.0-beta.10"
serde = { version = "1.0.130", features = ["derive"] }

View File

@ -1,6 +1,7 @@
use std::fmt;
use actix_http::http::StatusCode;
use serde::{Deserialize, Serialize};
pub trait ErrorCode: std::error::Error {
fn error_code(&self) -> Code;
@ -45,6 +46,7 @@ impl fmt::Display for ErrorType {
}
}
#[derive(Serialize, Deserialize, Debug, Clone, Copy)]
pub enum Code {
// index related error
CreateIndex,
@ -63,11 +65,14 @@ pub enum Code {
Facet,
Filter,
Sort,
BadParameter,
BadRequest,
DocumentNotFound,
Internal,
InvalidGeoField,
InvalidRankingRule,
InvalidToken,
MissingAuthorizationHeader,
NotFound,
@ -78,6 +83,11 @@ pub enum Code {
DumpAlreadyInProgress,
DumpProcessFailed,
InvalidContentType,
MissingContentType,
MalformedPayload,
MissingPayload,
}
impl Code {
@ -105,6 +115,8 @@ impl Code {
PrimaryKeyAlreadyPresent => {
ErrCode::invalid("primary_key_already_present", StatusCode::BAD_REQUEST)
}
// invalid ranking rule
InvalidRankingRule => ErrCode::invalid("invalid_request", StatusCode::BAD_REQUEST),
// invalid document
MaxFieldsLimitExceeded => {
@ -116,11 +128,16 @@ impl Code {
Facet => ErrCode::invalid("invalid_facet", StatusCode::BAD_REQUEST),
// error related to filters
Filter => ErrCode::invalid("invalid_filter", StatusCode::BAD_REQUEST),
// error related to sorts
Sort => ErrCode::invalid("invalid_sort", StatusCode::BAD_REQUEST),
BadParameter => ErrCode::invalid("bad_parameter", StatusCode::BAD_REQUEST),
BadRequest => ErrCode::invalid("bad_request", StatusCode::BAD_REQUEST),
DocumentNotFound => ErrCode::invalid("document_not_found", StatusCode::NOT_FOUND),
Internal => ErrCode::internal("internal", StatusCode::INTERNAL_SERVER_ERROR),
InvalidGeoField => {
ErrCode::authentication("invalid_geo_field", StatusCode::BAD_REQUEST)
}
InvalidToken => ErrCode::authentication("invalid_token", StatusCode::FORBIDDEN),
MissingAuthorizationHeader => {
ErrCode::authentication("missing_authorization_header", StatusCode::UNAUTHORIZED)
@ -142,6 +159,14 @@ impl Code {
DumpProcessFailed => {
ErrCode::internal("dump_process_failed", StatusCode::INTERNAL_SERVER_ERROR)
}
MissingContentType => {
ErrCode::invalid("missing_content_type", StatusCode::UNSUPPORTED_MEDIA_TYPE)
}
MalformedPayload => ErrCode::invalid("malformed_payload", StatusCode::BAD_REQUEST),
InvalidContentType => {
ErrCode::invalid("invalid_content_type", StatusCode::UNSUPPORTED_MEDIA_TYPE)
}
MissingPayload => ErrCode::invalid("missing_payload", StatusCode::BAD_REQUEST),
}
}

View File

@ -4,102 +4,82 @@ description = "MeiliSearch HTTP server"
edition = "2018"
license = "MIT"
name = "meilisearch-http"
version = "0.21.1"
version = "0.23.0"
[[bin]]
name = "meilisearch"
path = "src/main.rs"
[build-dependencies]
actix-web-static-files = { git = "https://github.com/MarinPostma/actix-web-static-files.git", rev = "6db8c3e", optional = true }
anyhow = { version = "*", optional = true }
cargo_toml = { version = "0.9.0", optional = true }
actix-web-static-files = { git = "https://github.com/MarinPostma/actix-web-static-files.git", rev = "39d8006", optional = true }
anyhow = { version = "1.0.43", optional = true }
cargo_toml = { version = "0.9", optional = true }
hex = { version = "0.4.3", optional = true }
reqwest = { version = "0.11.3", features = ["blocking", "rustls-tls"], default-features = false, optional = true }
sha-1 = { version = "0.9.4", optional = true }
tempfile = { version = "3.1.0", optional = true }
reqwest = { version = "0.11.4", features = ["blocking", "rustls-tls"], default-features = false, optional = true }
sha-1 = { version = "0.9.8", optional = true }
tempfile = { version = "3.2.0", optional = true }
vergen = { version = "5.1.15", default-features = false, features = ["git"] }
zip = { version = "0.5.12", optional = true }
zip = { version = "0.5.13", optional = true }
[dependencies]
actix-cors = { git = "https://github.com/MarinPostma/actix-extras.git", rev = "2dac1a4"}
actix-http = { version = "=3.0.0-beta.6" }
actix-service = "2.0.0"
actix-web = { version = "=4.0.0-beta.6", features = ["rustls"] }
actix-web-static-files = { git = "https://github.com/MarinPostma/actix-web-static-files.git", rev = "6db8c3e", optional = true }
anyhow = "1.0.36"
async-stream = "0.3.0"
async-trait = "0.1.42"
arc-swap = "1.2.0"
byte-unit = { version = "4.0.9", default-features = false, features = ["std"] }
bytes = "0.6.0"
actix-cors = { git = "https://github.com/MarinPostma/actix-extras.git", rev = "963ac94d" }
actix-web = { version = "4.0.0-beta.9", features = ["rustls"] }
actix-web-static-files = { git = "https://github.com/MarinPostma/actix-web-static-files.git", rev = "39d8006", optional = true }
anyhow = { version = "1.0.43", features = ["backtrace"] }
async-stream = "0.3.2"
async-trait = "0.1.51"
arc-swap = "1.3.2"
byte-unit = { version = "4.0.12", default-features = false, features = ["std"] }
bytes = "1.1.0"
chrono = { version = "0.4.19", features = ["serde"] }
crossbeam-channel = "0.5.0"
crossbeam-channel = "0.5.1"
either = "1.6.1"
env_logger = "0.8.2"
flate2 = "1.0.19"
fst = "0.4.5"
futures = "0.3.7"
futures-util = "0.3.8"
grenad = { git = "https://github.com/Kerollmops/grenad.git", rev = "3adcb26" }
env_logger = "0.9.0"
flate2 = "1.0.21"
fst = "0.4.7"
futures = "0.3.17"
futures-util = "0.3.17"
heed = { git = "https://github.com/Kerollmops/heed", tag = "v0.12.1" }
http = "0.2.1"
indexmap = { version = "1.3.2", features = ["serde-1"] }
itertools = "0.10.0"
log = "0.4.8"
main_error = "0.1.0"
http = "0.2.4"
indexmap = { version = "1.7.0", features = ["serde-1"] }
itertools = "0.10.1"
log = "0.4.14"
meilisearch-lib = { path = "../meilisearch-lib" }
meilisearch-error = { path = "../meilisearch-error" }
meilisearch-tokenizer = { git = "https://github.com/meilisearch/tokenizer.git", tag = "v0.2.5" }
memmap = "0.7.0"
milli = { git = "https://github.com/meilisearch/milli.git", tag = "v0.10.2" }
mime = "0.3.16"
num_cpus = "1.13.0"
once_cell = "1.5.2"
parking_lot = "0.11.1"
rand = "0.7.3"
rayon = "1.5.0"
regex = "1.4.2"
rustls = "0.19"
serde = { version = "1.0", features = ["derive"] }
serde_json = { version = "1.0.59", features = ["preserve_order"] }
sha2 = "0.9.1"
siphasher = "0.3.2"
once_cell = "1.8.0"
parking_lot = "0.11.2"
rand = "0.8.4"
rayon = "1.5.1"
regex = "1.5.4"
rustls = "0.19.1"
serde = { version = "1.0.130", features = ["derive"] }
serde_json = { version = "1.0.67", features = ["preserve_order"] }
sha2 = "0.9.6"
siphasher = "0.3.7"
slice-group-by = "0.2.6"
structopt = "0.3.20"
tar = "0.4.29"
tempfile = "3.1.0"
thiserror = "1.0.24"
tokio = { version = "1", features = ["full"] }
uuid = { version = "0.8.2", features = ["serde"] }
structopt = "0.3.23"
tar = "0.4.37"
tempfile = "3.2.0"
thiserror = "1.0.28"
tokio = { version = "1.11.0", features = ["full"] }
uuid = { version = "0.8.2", features = ["serde"] }
walkdir = "2.3.2"
obkv = "0.2.0"
pin-project = "1.0.7"
whoami = { version = "1.1.2", optional = true }
reqwest = { version = "0.11.3", features = ["json", "rustls-tls"], default-features = false, optional = true }
serdeval = "0.1.0"
[dependencies.sentry]
default-features = false
features = [
"backtrace",
"contexts",
"panic",
"reqwest",
"rustls",
"log",
]
optional = true
version = "0.22.0"
pin-project = "1.0.8"
whoami = { version = "1.1.3", optional = true }
reqwest = { version = "0.11.4", features = ["json", "rustls-tls"], default-features = false, optional = true }
sysinfo = "0.20.2"
tokio-stream = "0.1.7"
[dev-dependencies]
actix-rt = "2.1.0"
assert-json-diff = { branch = "master", git = "https://github.com/qdequele/assert-json-diff" }
mockall = "0.9.1"
actix-rt = "2.2.0"
paste = "1.0.5"
serde_url_params = "0.2.1"
tempdir = "0.3.7"
urlencoding = "1.1.1"
urlencoding = "2.1.0"
[features]
mini-dashboard = [
@ -112,11 +92,11 @@ mini-dashboard = [
"tempfile",
"zip",
]
analytics = ["sentry", "whoami", "reqwest"]
analytics = ["whoami", "reqwest"]
default = ["analytics", "mini-dashboard"]
[target.'cfg(target_os = "linux")'.dependencies]
jemallocator = "0.3.2"
tikv-jemallocator = "0.4.1"
[package.metadata.mini-dashboard]
assets-url = "https://github.com/meilisearch/mini-dashboard/releases/download/v0.1.4/build.zip"


@ -2,10 +2,10 @@ use std::hash::{Hash, Hasher};
use std::time::{Duration, Instant, SystemTime, UNIX_EPOCH};
use log::debug;
use meilisearch_lib::MeiliSearch;
use serde::Serialize;
use siphasher::sip::SipHasher;
use crate::Data;
use crate::Opt;
const AMPLITUDE_API_KEY: &str = "f7fba398780e06d8fe6666a9be7e3d47";
@ -18,8 +18,8 @@ struct EventProperties {
}
impl EventProperties {
async fn from(data: Data) -> anyhow::Result<EventProperties> {
let stats = data.index_controller.get_all_stats().await?;
async fn from(data: MeiliSearch) -> anyhow::Result<EventProperties> {
let stats = data.get_all_stats().await?;
let database_size = stats.database_size;
let last_update_timestamp = stats.last_update.map(|u| u.timestamp());
@ -62,7 +62,7 @@ struct AmplitudeRequest<'a> {
events: Vec<Event<'a>>,
}
pub async fn analytics_sender(data: Data, opt: Opt) {
pub async fn analytics_sender(data: MeiliSearch, opt: Opt) {
let username = whoami::username();
let hostname = whoami::hostname();
let platform = whoami::platform();


@ -1,133 +0,0 @@
use std::ops::Deref;
use std::sync::Arc;
use sha2::Digest;
use crate::index::{Checked, Settings};
use crate::index_controller::{
error::Result, DumpInfo, IndexController, IndexMetadata, IndexSettings, IndexStats, Stats,
};
use crate::option::Opt;
pub mod search;
mod updates;
#[derive(Clone)]
pub struct Data {
inner: Arc<DataInner>,
}
impl Deref for Data {
type Target = DataInner;
fn deref(&self) -> &Self::Target {
&self.inner
}
}
pub struct DataInner {
pub index_controller: IndexController,
pub api_keys: ApiKeys,
options: Opt,
}
#[derive(Clone)]
pub struct ApiKeys {
pub public: Option<String>,
pub private: Option<String>,
pub master: Option<String>,
}
impl ApiKeys {
pub fn generate_missing_api_keys(&mut self) {
if let Some(master_key) = &self.master {
if self.private.is_none() {
let key = format!("{}-private", master_key);
let sha = sha2::Sha256::digest(key.as_bytes());
self.private = Some(format!("{:x}", sha));
}
if self.public.is_none() {
let key = format!("{}-public", master_key);
let sha = sha2::Sha256::digest(key.as_bytes());
self.public = Some(format!("{:x}", sha));
}
}
}
}
impl Data {
pub fn new(options: Opt) -> anyhow::Result<Data> {
let path = options.db_path.clone();
let index_controller = IndexController::new(&path, &options)?;
let mut api_keys = ApiKeys {
master: options.clone().master_key,
private: None,
public: None,
};
api_keys.generate_missing_api_keys();
let inner = DataInner {
index_controller,
api_keys,
options,
};
let inner = Arc::new(inner);
Ok(Data { inner })
}
pub async fn settings(&self, uid: String) -> Result<Settings<Checked>> {
self.index_controller.settings(uid).await
}
pub async fn list_indexes(&self) -> Result<Vec<IndexMetadata>> {
self.index_controller.list_indexes().await
}
pub async fn index(&self, uid: String) -> Result<IndexMetadata> {
self.index_controller.get_index(uid).await
}
pub async fn create_index(
&self,
uid: String,
primary_key: Option<String>,
) -> Result<IndexMetadata> {
let settings = IndexSettings {
uid: Some(uid),
primary_key,
};
let meta = self.index_controller.create_index(settings).await?;
Ok(meta)
}
pub async fn get_index_stats(&self, uid: String) -> Result<IndexStats> {
Ok(self.index_controller.get_index_stats(uid).await?)
}
pub async fn get_all_stats(&self) -> Result<Stats> {
Ok(self.index_controller.get_all_stats().await?)
}
pub async fn create_dump(&self) -> Result<DumpInfo> {
Ok(self.index_controller.create_dump().await?)
}
pub async fn dump_status(&self, uid: String) -> Result<DumpInfo> {
Ok(self.index_controller.dump_info(uid).await?)
}
#[inline]
pub fn http_payload_size_limit(&self) -> usize {
self.options.http_payload_size_limit.get_bytes() as usize
}
#[inline]
pub fn api_keys(&self) -> &ApiKeys {
&self.api_keys
}
}


@ -1,34 +0,0 @@
use serde_json::{Map, Value};
use super::Data;
use crate::index::{SearchQuery, SearchResult};
use crate::index_controller::error::Result;
impl Data {
pub async fn search(&self, index: String, search_query: SearchQuery) -> Result<SearchResult> {
self.index_controller.search(index, search_query).await
}
pub async fn retrieve_documents(
&self,
index: String,
offset: usize,
limit: usize,
attributes_to_retrieve: Option<Vec<String>>,
) -> Result<Vec<Map<String, Value>>> {
self.index_controller
.documents(index, offset, limit, attributes_to_retrieve)
.await
}
pub async fn retrieve_document(
&self,
index: String,
document_id: String,
attributes_to_retrieve: Option<Vec<String>>,
) -> Result<Map<String, Value>> {
self.index_controller
.document(index, document_id, attributes_to_retrieve)
.await
}
}


@ -1,80 +0,0 @@
use milli::update::{IndexDocumentsMethod, UpdateFormat};
use crate::extractors::payload::Payload;
use crate::index::{Checked, Settings};
use crate::index_controller::{error::Result, IndexMetadata, IndexSettings, UpdateStatus};
use crate::Data;
impl Data {
pub async fn add_documents(
&self,
index: String,
method: IndexDocumentsMethod,
format: UpdateFormat,
stream: Payload,
primary_key: Option<String>,
) -> Result<UpdateStatus> {
let update_status = self
.index_controller
.add_documents(index, method, format, stream, primary_key)
.await?;
Ok(update_status)
}
pub async fn update_settings(
&self,
index: String,
settings: Settings<Checked>,
create: bool,
) -> Result<UpdateStatus> {
let update = self
.index_controller
.update_settings(index, settings, create)
.await?;
Ok(update)
}
pub async fn clear_documents(&self, index: String) -> Result<UpdateStatus> {
let update = self.index_controller.clear_documents(index).await?;
Ok(update)
}
pub async fn delete_documents(
&self,
index: String,
document_ids: Vec<String>,
) -> Result<UpdateStatus> {
let update = self
.index_controller
.delete_documents(index, document_ids)
.await?;
Ok(update)
}
pub async fn delete_index(&self, index: String) -> Result<()> {
self.index_controller.delete_index(index).await?;
Ok(())
}
pub async fn get_update_status(&self, index: String, uid: u64) -> Result<UpdateStatus> {
self.index_controller.update_status(index, uid).await
}
pub async fn get_updates_status(&self, index: String) -> Result<Vec<UpdateStatus>> {
self.index_controller.all_update_status(index).await
}
pub async fn update_index(
&self,
uid: String,
primary_key: Option<String>,
new_uid: Option<String>,
) -> Result<IndexMetadata> {
let settings = IndexSettings {
uid: new_uid,
primary_key,
};
self.index_controller.update_index(uid, settings).await
}
}


@ -3,13 +3,39 @@ use std::fmt;
use actix_web as aweb;
use actix_web::body::Body;
use actix_web::dev::BaseHttpResponseBuilder;
use actix_web::http::StatusCode;
use actix_web::HttpResponseBuilder;
use aweb::error::{JsonPayloadError, QueryPayloadError};
use meilisearch_error::{Code, ErrorCode};
use milli::UserError;
use serde::{Deserialize, Serialize};
#[derive(Debug, thiserror::Error)]
pub enum MeilisearchHttpError {
#[error("A Content-Type header is missing. Accepted values for the Content-Type header are: {}",
.0.iter().map(|s| format!("\"{}\"", s)).collect::<Vec<_>>().join(", "))]
MissingContentType(Vec<String>),
#[error(
"The Content-Type \"{0}\" is invalid. Accepted values for the Content-Type header are: {}",
.1.iter().map(|s| format!("\"{}\"", s)).collect::<Vec<_>>().join(", ")
)]
InvalidContentType(String, Vec<String>),
}
impl ErrorCode for MeilisearchHttpError {
fn error_code(&self) -> Code {
match self {
MeilisearchHttpError::MissingContentType(_) => Code::MissingContentType,
MeilisearchHttpError::InvalidContentType(_, _) => Code::InvalidContentType,
}
}
}
impl From<MeilisearchHttpError> for aweb::Error {
fn from(other: MeilisearchHttpError) -> Self {
aweb::Error::from(ResponseError::from(other))
}
}
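As a rough illustration of the new Content-Type errors above: the list of accepted values is rendered into the message by quoting and joining the media types. A standalone sketch of that formatting (simplified, not the actual route code):

// Standalone sketch of how the accepted Content-Type list is rendered in the
// MissingContentType / InvalidContentType messages above.
fn format_accepted(accepted: &[String]) -> String {
    accepted
        .iter()
        .map(|s| format!("\"{}\"", s))
        .collect::<Vec<_>>()
        .join(", ")
}

fn main() {
    let accepted = vec!["application/json".to_string()];
    println!(
        "The Content-Type \"text/plain\" is invalid. Accepted values for the Content-Type header are: {}",
        format_accepted(&accepted)
    );
}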
#[derive(Debug, Serialize, Deserialize, Clone)]
#[serde(rename_all = "camelCase")]
pub struct ResponseError {
@ -43,9 +69,9 @@ where
}
impl aweb::error::ResponseError for ResponseError {
fn error_response(&self) -> aweb::BaseHttpResponse<Body> {
fn error_response(&self) -> aweb::HttpResponse<Body> {
let json = serde_json::to_vec(self).unwrap();
BaseHttpResponseBuilder::new(self.status_code())
HttpResponseBuilder::new(self.status_code())
.content_type("application/json")
.body(json)
}
@ -55,60 +81,6 @@ impl aweb::error::ResponseError for ResponseError {
}
}
macro_rules! internal_error {
($target:ty : $($other:path), *) => {
$(
impl From<$other> for $target {
fn from(other: $other) -> Self {
Self::Internal(Box::new(other))
}
}
)*
}
}
#[derive(Debug)]
pub struct MilliError<'a>(pub &'a milli::Error);
impl Error for MilliError<'_> {}
impl fmt::Display for MilliError<'_> {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
self.0.fmt(f)
}
}
impl ErrorCode for MilliError<'_> {
fn error_code(&self) -> Code {
match self.0 {
milli::Error::InternalError(_) => Code::Internal,
milli::Error::IoError(_) => Code::Internal,
milli::Error::UserError(ref error) => {
match error {
// TODO: wait for spec for new error codes.
UserError::Csv(_)
| UserError::SerdeJson(_)
| UserError::MaxDatabaseSizeReached
| UserError::InvalidCriterionName { .. }
| UserError::InvalidDocumentId { .. }
| UserError::InvalidStoreFile
| UserError::NoSpaceLeftOnDevice
| UserError::DocumentLimitReached => Code::Internal,
UserError::AttributeLimitReached => Code::MaxFieldsLimitExceeded,
UserError::InvalidFilter(_) => Code::Filter,
UserError::InvalidFilterAttribute(_) => Code::Filter,
UserError::MissingDocumentId { .. } => Code::MissingDocumentId,
UserError::MissingPrimaryKey => Code::MissingPrimaryKey,
UserError::PrimaryKeyCannotBeChanged => Code::PrimaryKeyAlreadyPresent,
UserError::PrimaryKeyCannotBeReset => Code::PrimaryKeyAlreadyPresent,
UserError::UnknownInternalDocumentId { .. } => Code::DocumentNotFound,
UserError::InvalidFacetsDistribution { .. } => Code::BadRequest,
}
}
}
}
}
impl fmt::Display for PayloadError {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
match self {
@ -130,7 +102,7 @@ impl ErrorCode for PayloadError {
fn error_code(&self) -> Code {
match self {
PayloadError::Json(err) => match err {
JsonPayloadError::Overflow => Code::PayloadTooLarge,
JsonPayloadError::Overflow { .. } => Code::PayloadTooLarge,
JsonPayloadError::ContentType => Code::UnsupportedMediaType,
JsonPayloadError::Payload(aweb::error::PayloadError::Overflow) => {
Code::PayloadTooLarge
@ -159,9 +131,8 @@ impl From<QueryPayloadError> for PayloadError {
}
}
pub fn payload_error_handler<E>(err: E) -> ResponseError
where
E: Into<PayloadError>,
{
err.into().into()
impl From<PayloadError> for aweb::Error {
fn from(other: PayloadError) -> Self {
aweb::Error::from(ResponseError::from(other))
}
}


@ -145,7 +145,7 @@ impl<P: Policy + 'static, D: 'static + Clone> FromRequest for GuardedData<P, D>
fn from_request(
req: &actix_web::HttpRequest,
_payload: &mut actix_http::Payload,
_payload: &mut actix_web::dev::Payload,
) -> Self::Future {
match req.app_data::<Self::Config>() {
Some(config) => match config {
@ -168,7 +168,8 @@ impl<P: Policy + 'static, D: 'static + Clone> FromRequest for GuardedData<P, D>
None => err(AuthenticationError::IrretrievableState.into()),
}
} else {
err(AuthenticationError::InvalidToken(String::from("hello")).into())
let token = token.to_str().unwrap_or("unknown").to_string();
err(AuthenticationError::InvalidToken(token).into())
}
}
None => err(AuthenticationError::MissingAuthorizationHeader.into()),


@ -1,7 +1,7 @@
use std::pin::Pin;
use std::task::{Context, Poll};
use actix_http::error::PayloadError;
use actix_web::error::PayloadError;
use actix_web::{dev, web, FromRequest, HttpRequest};
use futures::future::{ready, Ready};
use futures::Stream;


@ -1,4 +1,3 @@
pub mod compression;
mod env;
pub use env::EnvSizer;


@ -1,92 +0,0 @@
use std::fs::File;
use crate::index::Index;
use grenad::CompressionType;
use milli::update::UpdateBuilder;
use rayon::ThreadPool;
use crate::index_controller::UpdateMeta;
use crate::index_controller::{Failed, Processed, Processing};
use crate::option::IndexerOpts;
pub struct UpdateHandler {
max_nb_chunks: Option<usize>,
chunk_compression_level: Option<u32>,
thread_pool: ThreadPool,
log_frequency: usize,
max_memory: usize,
linked_hash_map_size: usize,
chunk_compression_type: CompressionType,
chunk_fusing_shrink_size: u64,
}
impl UpdateHandler {
pub fn new(opt: &IndexerOpts) -> anyhow::Result<Self> {
let thread_pool = rayon::ThreadPoolBuilder::new()
.num_threads(opt.indexing_jobs.unwrap_or(num_cpus::get() / 2))
.build()?;
Ok(Self {
max_nb_chunks: opt.max_nb_chunks,
chunk_compression_level: opt.chunk_compression_level,
thread_pool,
log_frequency: opt.log_every_n,
max_memory: opt.max_memory.get_bytes() as usize,
linked_hash_map_size: opt.linked_hash_map_size,
chunk_compression_type: opt.chunk_compression_type,
chunk_fusing_shrink_size: opt.chunk_fusing_shrink_size.get_bytes(),
})
}
pub fn update_builder(&self, update_id: u64) -> UpdateBuilder {
// We prepare the update by using the update builder.
let mut update_builder = UpdateBuilder::new(update_id);
if let Some(max_nb_chunks) = self.max_nb_chunks {
update_builder.max_nb_chunks(max_nb_chunks);
}
if let Some(chunk_compression_level) = self.chunk_compression_level {
update_builder.chunk_compression_level(chunk_compression_level);
}
update_builder.thread_pool(&self.thread_pool);
update_builder.log_every_n(self.log_frequency);
update_builder.max_memory(self.max_memory);
update_builder.linked_hash_map_size(self.linked_hash_map_size);
update_builder.chunk_compression_type(self.chunk_compression_type);
update_builder.chunk_fusing_shrink_size(self.chunk_fusing_shrink_size);
update_builder
}
pub fn handle_update(
&self,
meta: Processing,
content: Option<File>,
index: Index,
) -> Result<Processed, Failed> {
use UpdateMeta::*;
let update_id = meta.id();
let update_builder = self.update_builder(update_id);
let result = match meta.meta() {
DocumentsAddition {
method,
format,
primary_key,
} => index.update_documents(
*format,
*method,
content,
update_builder,
primary_key.as_deref(),
),
ClearDocuments => index.clear_documents(update_builder),
DeleteDocuments { ids } => index.delete_documents(ids, update_builder),
Settings(settings) => index.update_settings(&settings.clone().check(), update_builder),
};
match result {
Ok(result) => Ok(meta.process(result)),
Err(e) => Err(meta.fail(e.into())),
}
}
}


@ -1,382 +0,0 @@
use std::collections::{BTreeMap, BTreeSet, HashSet};
use std::io;
use std::marker::PhantomData;
use std::num::NonZeroUsize;
use flate2::read::GzDecoder;
use log::{debug, info, trace};
use milli::update::{IndexDocumentsMethod, UpdateBuilder, UpdateFormat};
use serde::{Deserialize, Serialize, Serializer};
use crate::index_controller::UpdateResult;
use super::error::Result;
use super::{deserialize_some, Index};
fn serialize_with_wildcard<S>(
field: &Option<Option<Vec<String>>>,
s: S,
) -> std::result::Result<S::Ok, S::Error>
where
S: Serializer,
{
let wildcard = vec!["*".to_string()];
s.serialize_some(&field.as_ref().map(|o| o.as_ref().unwrap_or(&wildcard)))
}
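A minimal, standalone sketch of the wildcard mapping performed by `serialize_with_wildcard` above, assuming the intent is that a reset field (`Some(None)`) is exposed as `["*"]`:

// Sketch only: mirrors the Option<Option<Vec<String>>> -> wildcard mapping.
fn wildcard_view(field: &Option<Option<Vec<String>>>) -> Option<Vec<String>> {
    let wildcard = vec!["*".to_string()];
    field.as_ref().map(|inner| inner.clone().unwrap_or(wildcard))
}

fn main() {
    // A reset field (Some(None)) is exposed as the wildcard list.
    assert_eq!(wildcard_view(&Some(None)), Some(vec!["*".to_string()]));
    // An explicit list is kept as-is.
    assert_eq!(
        wildcard_view(&Some(Some(vec!["title".to_string()]))),
        Some(vec!["title".to_string()])
    );
    // An untouched field stays untouched.
    assert_eq!(wildcard_view(&None), None);
}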
#[derive(Clone, Default, Debug, Serialize)]
pub struct Checked;
#[derive(Clone, Default, Debug, Serialize, Deserialize)]
pub struct Unchecked;
#[derive(Debug, Clone, Default, Serialize, Deserialize)]
#[serde(deny_unknown_fields)]
#[serde(rename_all = "camelCase")]
#[serde(bound(serialize = "T: Serialize", deserialize = "T: Deserialize<'static>"))]
pub struct Settings<T> {
#[serde(
default,
deserialize_with = "deserialize_some",
serialize_with = "serialize_with_wildcard",
skip_serializing_if = "Option::is_none"
)]
pub displayed_attributes: Option<Option<Vec<String>>>,
#[serde(
default,
deserialize_with = "deserialize_some",
serialize_with = "serialize_with_wildcard",
skip_serializing_if = "Option::is_none"
)]
pub searchable_attributes: Option<Option<Vec<String>>>,
#[serde(
default,
deserialize_with = "deserialize_some",
skip_serializing_if = "Option::is_none"
)]
pub filterable_attributes: Option<Option<HashSet<String>>>,
#[serde(
default,
deserialize_with = "deserialize_some",
skip_serializing_if = "Option::is_none"
)]
pub ranking_rules: Option<Option<Vec<String>>>,
#[serde(
default,
deserialize_with = "deserialize_some",
skip_serializing_if = "Option::is_none"
)]
pub stop_words: Option<Option<BTreeSet<String>>>,
#[serde(
default,
deserialize_with = "deserialize_some",
skip_serializing_if = "Option::is_none"
)]
pub synonyms: Option<Option<BTreeMap<String, Vec<String>>>>,
#[serde(
default,
deserialize_with = "deserialize_some",
skip_serializing_if = "Option::is_none"
)]
pub distinct_attribute: Option<Option<String>>,
#[serde(skip)]
pub _kind: PhantomData<T>,
}
impl Settings<Checked> {
pub fn cleared() -> Settings<Checked> {
Settings {
displayed_attributes: Some(None),
searchable_attributes: Some(None),
filterable_attributes: Some(None),
ranking_rules: Some(None),
stop_words: Some(None),
synonyms: Some(None),
distinct_attribute: Some(None),
_kind: PhantomData,
}
}
pub fn into_unchecked(self) -> Settings<Unchecked> {
let Self {
displayed_attributes,
searchable_attributes,
filterable_attributes,
ranking_rules,
stop_words,
synonyms,
distinct_attribute,
..
} = self;
Settings {
displayed_attributes,
searchable_attributes,
filterable_attributes,
ranking_rules,
stop_words,
synonyms,
distinct_attribute,
_kind: PhantomData,
}
}
}
impl Settings<Unchecked> {
pub fn check(mut self) -> Settings<Checked> {
let displayed_attributes = match self.displayed_attributes.take() {
Some(Some(fields)) => {
if fields.iter().any(|f| f == "*") {
Some(None)
} else {
Some(Some(fields))
}
}
otherwise => otherwise,
};
let searchable_attributes = match self.searchable_attributes.take() {
Some(Some(fields)) => {
if fields.iter().any(|f| f == "*") {
Some(None)
} else {
Some(Some(fields))
}
}
otherwise => otherwise,
};
Settings {
displayed_attributes,
searchable_attributes,
filterable_attributes: self.filterable_attributes,
ranking_rules: self.ranking_rules,
stop_words: self.stop_words,
synonyms: self.synonyms,
distinct_attribute: self.distinct_attribute,
_kind: PhantomData,
}
}
}
#[derive(Debug, Clone, Serialize, Deserialize)]
#[serde(deny_unknown_fields)]
#[serde(rename_all = "camelCase")]
pub struct Facets {
pub level_group_size: Option<NonZeroUsize>,
pub min_level_size: Option<NonZeroUsize>,
}
impl Index {
pub fn update_documents(
&self,
format: UpdateFormat,
method: IndexDocumentsMethod,
content: Option<impl io::Read>,
update_builder: UpdateBuilder,
primary_key: Option<&str>,
) -> Result<UpdateResult> {
let mut txn = self.write_txn()?;
let result = self.update_documents_txn(
&mut txn,
format,
method,
content,
update_builder,
primary_key,
)?;
txn.commit()?;
Ok(result)
}
pub fn update_documents_txn<'a, 'b>(
&'a self,
txn: &mut heed::RwTxn<'a, 'b>,
format: UpdateFormat,
method: IndexDocumentsMethod,
content: Option<impl io::Read>,
update_builder: UpdateBuilder,
primary_key: Option<&str>,
) -> Result<UpdateResult> {
trace!("performing document addition");
// Set the primary key if not set already, ignore if already set.
if let (None, Some(primary_key)) = (self.primary_key(txn)?, primary_key) {
let mut builder = UpdateBuilder::new(0).settings(txn, self);
builder.set_primary_key(primary_key.to_string());
builder.execute(|_, _| ())?;
}
let mut builder = update_builder.index_documents(txn, self);
builder.update_format(format);
builder.index_documents_method(method);
let indexing_callback =
|indexing_step, update_id| debug!("update {}: {:?}", update_id, indexing_step);
let gzipped = false;
let addition = match content {
Some(content) if gzipped => {
builder.execute(GzDecoder::new(content), indexing_callback)?
}
Some(content) => builder.execute(content, indexing_callback)?,
None => builder.execute(std::io::empty(), indexing_callback)?,
};
info!("document addition done: {:?}", addition);
Ok(UpdateResult::DocumentsAddition(addition))
}
pub fn clear_documents(&self, update_builder: UpdateBuilder) -> Result<UpdateResult> {
// We must use the write transaction of the update here.
let mut wtxn = self.write_txn()?;
let builder = update_builder.clear_documents(&mut wtxn, self);
let _count = builder.execute()?;
wtxn.commit()
.and(Ok(UpdateResult::Other))
.map_err(Into::into)
}
pub fn update_settings_txn<'a, 'b>(
&'a self,
txn: &mut heed::RwTxn<'a, 'b>,
settings: &Settings<Checked>,
update_builder: UpdateBuilder,
) -> Result<UpdateResult> {
// We must use the write transaction of the update here.
let mut builder = update_builder.settings(txn, self);
if let Some(ref names) = settings.searchable_attributes {
match names {
Some(names) => builder.set_searchable_fields(names.clone()),
None => builder.reset_searchable_fields(),
}
}
if let Some(ref names) = settings.displayed_attributes {
match names {
Some(names) => builder.set_displayed_fields(names.clone()),
None => builder.reset_displayed_fields(),
}
}
if let Some(ref facet_types) = settings.filterable_attributes {
let facet_types = facet_types.clone().unwrap_or_else(HashSet::new);
builder.set_filterable_fields(facet_types);
}
if let Some(ref criteria) = settings.ranking_rules {
match criteria {
Some(criteria) => builder.set_criteria(criteria.clone()),
None => builder.reset_criteria(),
}
}
if let Some(ref stop_words) = settings.stop_words {
match stop_words {
Some(stop_words) => builder.set_stop_words(stop_words.clone()),
None => builder.reset_stop_words(),
}
}
if let Some(ref synonyms) = settings.synonyms {
match synonyms {
Some(synonyms) => builder.set_synonyms(synonyms.clone().into_iter().collect()),
None => builder.reset_synonyms(),
}
}
if let Some(ref distinct_attribute) = settings.distinct_attribute {
match distinct_attribute {
Some(attr) => builder.set_distinct_field(attr.clone()),
None => builder.reset_distinct_field(),
}
}
builder.execute(|indexing_step, update_id| {
debug!("update {}: {:?}", update_id, indexing_step)
})?;
Ok(UpdateResult::Other)
}
pub fn update_settings(
&self,
settings: &Settings<Checked>,
update_builder: UpdateBuilder,
) -> Result<UpdateResult> {
let mut txn = self.write_txn()?;
let result = self.update_settings_txn(&mut txn, settings, update_builder)?;
txn.commit()?;
Ok(result)
}
pub fn delete_documents(
&self,
document_ids: &[String],
update_builder: UpdateBuilder,
) -> Result<UpdateResult> {
let mut txn = self.write_txn()?;
let mut builder = update_builder.delete_documents(&mut txn, self)?;
// We ignore nonexistent document ids
document_ids.iter().for_each(|id| {
builder.delete_external_id(id);
});
let deleted = builder.execute()?;
txn.commit()
.and(Ok(UpdateResult::DocumentDeletion { deleted }))
.map_err(Into::into)
}
}
#[cfg(test)]
mod test {
use super::*;
#[test]
fn test_setting_check() {
// test no changes
let settings = Settings {
displayed_attributes: Some(Some(vec![String::from("hello")])),
searchable_attributes: Some(Some(vec![String::from("hello")])),
filterable_attributes: None,
ranking_rules: None,
stop_words: None,
synonyms: None,
distinct_attribute: None,
_kind: PhantomData::<Unchecked>,
};
let checked = settings.clone().check();
assert_eq!(settings.displayed_attributes, checked.displayed_attributes);
assert_eq!(
settings.searchable_attributes,
checked.searchable_attributes
);
// test wildcard
// test no changes
let settings = Settings {
displayed_attributes: Some(Some(vec![String::from("*")])),
searchable_attributes: Some(Some(vec![String::from("hello"), String::from("*")])),
filterable_attributes: None,
ranking_rules: None,
stop_words: None,
synonyms: None,
distinct_attribute: None,
_kind: PhantomData::<Unchecked>,
};
let checked = settings.check();
assert_eq!(checked.displayed_attributes, Some(None));
assert_eq!(checked.searchable_attributes, Some(None));
}
}


@ -1,2 +0,0 @@
pub mod v1;
pub mod v2;


@ -1,192 +0,0 @@
use std::collections::{BTreeMap, BTreeSet};
use std::fs::{create_dir_all, File};
use std::io::BufRead;
use std::marker::PhantomData;
use std::path::Path;
use std::sync::Arc;
use heed::EnvOpenOptions;
use log::{error, info, warn};
use milli::update::{IndexDocumentsMethod, UpdateFormat};
use serde::{Deserialize, Serialize};
use uuid::Uuid;
use crate::index_controller::{self, uuid_resolver::HeedUuidStore, IndexMetadata};
use crate::{
index::{deserialize_some, update_handler::UpdateHandler, Index, Unchecked},
option::IndexerOpts,
};
#[derive(Serialize, Deserialize, Debug)]
#[serde(rename_all = "camelCase")]
pub struct MetadataV1 {
db_version: String,
indexes: Vec<IndexMetadata>,
}
impl MetadataV1 {
pub fn load_dump(
self,
src: impl AsRef<Path>,
dst: impl AsRef<Path>,
size: usize,
indexer_options: &IndexerOpts,
) -> anyhow::Result<()> {
info!(
"Loading dump, dump database version: {}, dump version: V1",
self.db_version
);
let uuid_store = HeedUuidStore::new(&dst)?;
for index in self.indexes {
let uuid = Uuid::new_v4();
uuid_store.insert(index.uid.clone(), uuid)?;
let src = src.as_ref().join(index.uid);
load_index(
&src,
&dst,
uuid,
index.meta.primary_key.as_deref(),
size,
indexer_options,
)?;
}
Ok(())
}
}
// These are the settings used in legacy meilisearch (<v0.21.0).
#[derive(Default, Clone, Serialize, Deserialize, Debug)]
#[serde(rename_all = "camelCase", deny_unknown_fields)]
struct Settings {
#[serde(default, deserialize_with = "deserialize_some")]
pub ranking_rules: Option<Option<Vec<String>>>,
#[serde(default, deserialize_with = "deserialize_some")]
pub distinct_attribute: Option<Option<String>>,
#[serde(default, deserialize_with = "deserialize_some")]
pub searchable_attributes: Option<Option<Vec<String>>>,
#[serde(default, deserialize_with = "deserialize_some")]
pub displayed_attributes: Option<Option<BTreeSet<String>>>,
#[serde(default, deserialize_with = "deserialize_some")]
pub stop_words: Option<Option<BTreeSet<String>>>,
#[serde(default, deserialize_with = "deserialize_some")]
pub synonyms: Option<Option<BTreeMap<String, Vec<String>>>>,
#[serde(default, deserialize_with = "deserialize_some")]
pub attributes_for_faceting: Option<Option<Vec<String>>>,
}
fn load_index(
src: impl AsRef<Path>,
dst: impl AsRef<Path>,
uuid: Uuid,
primary_key: Option<&str>,
size: usize,
indexer_options: &IndexerOpts,
) -> anyhow::Result<()> {
let index_path = dst.as_ref().join(&format!("indexes/index-{}", uuid));
create_dir_all(&index_path)?;
let mut options = EnvOpenOptions::new();
options.map_size(size);
let index = milli::Index::new(options, index_path)?;
let index = Index(Arc::new(index));
// extract `settings.json` file and import content
let settings = import_settings(&src)?;
let settings: index_controller::Settings<Unchecked> = settings.into();
let mut txn = index.write_txn()?;
let handler = UpdateHandler::new(indexer_options)?;
index.update_settings_txn(&mut txn, &settings.check(), handler.update_builder(0))?;
let file = File::open(&src.as_ref().join("documents.jsonl"))?;
let mut reader = std::io::BufReader::new(file);
reader.fill_buf()?;
if !reader.buffer().is_empty() {
index.update_documents_txn(
&mut txn,
UpdateFormat::JsonStream,
IndexDocumentsMethod::ReplaceDocuments,
Some(reader),
handler.update_builder(0),
primary_key,
)?;
}
txn.commit()?;
// Finally, we extract the original milli::Index and close it
Arc::try_unwrap(index.0)
.map_err(|_e| "Couldn't close the index properly")
.unwrap()
.prepare_for_closing()
.wait();
// Updates are ignored in dumps V1.
Ok(())
}
/// we need to **always** be able to convert the old settings to the settings currently being used
impl From<Settings> for index_controller::Settings<Unchecked> {
fn from(settings: Settings) -> Self {
Self {
distinct_attribute: settings.distinct_attribute,
// we need to convert the old `BTreeSet<String>` into a `Vec<String>`
displayed_attributes: settings.displayed_attributes.map(|o| o.map(|vec| vec.into_iter().collect())),
searchable_attributes: settings.searchable_attributes,
// we previously had a `Vec<String>` but now we have a `HashMap<String, String>`
// representing the name of the faceted field + the type of the field. Since the type
// was not known in the V1 of the dump we are just going to assume everything is a
// String
filterable_attributes: settings.attributes_for_faceting.map(|o| o.map(|vec| vec.into_iter().collect())),
// we need to keep only the ranking rules that are still supported
ranking_rules: settings.ranking_rules.map(|o| o.map(|vec| vec.into_iter().filter(|criterion| {
match criterion.as_str() {
"words" | "typo" | "proximity" | "attribute" | "exactness" => true,
s if s.starts_with("asc") || s.starts_with("desc") => true,
"wordsPosition" => {
warn!("The criteria `attribute` and `wordsPosition` have been merged into a single criterion `attribute` so `wordsPositon` will be ignored");
false
}
s => {
error!("Unknown criterion found in the dump: `{}`, it will be ignored", s);
false
}
}
}).collect())),
// we need to rebuild the stop words into a `BTreeSet<String>`
stop_words: settings.stop_words.map(|o| o.map(|vec| vec.into_iter().collect())),
// we need to rebuild the synonyms into a `BTreeMap<String, Vec<String>>`
synonyms: settings.synonyms.map(|o| o.map(|vec| vec.into_iter().collect())),
_kind: PhantomData,
}
}
}
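The ranking-rule filtering above can be read in isolation; here is a standalone sketch (simplified, without the logging) of how a legacy v1 rule list would be reduced:

// Sketch (standalone, not the actual loader): filtering legacy v1 ranking rules
// the same way the From<Settings> impl above does.
fn filter_legacy_rules(rules: Vec<String>) -> Vec<String> {
    rules
        .into_iter()
        .filter(|criterion| match criterion.as_str() {
            "words" | "typo" | "proximity" | "attribute" | "exactness" => true,
            s if s.starts_with("asc") || s.starts_with("desc") => true,
            // "wordsPosition" was merged into "attribute"; anything else is unknown.
            _ => false,
        })
        .collect()
}

fn main() {
    let legacy = vec![
        "typo".to_string(),
        "wordsPosition".to_string(),
        "asc(price)".to_string(),
        "release_date".to_string(),
    ];
    assert_eq!(
        filter_legacy_rules(legacy),
        vec!["typo".to_string(), "asc(price)".to_string()]
    );
}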
/// Extract Settings from `settings.json` file present at provided `dir_path`
fn import_settings(dir_path: impl AsRef<Path>) -> anyhow::Result<Settings> {
let path = dir_path.as_ref().join("settings.json");
let file = File::open(path)?;
let reader = std::io::BufReader::new(file);
let metadata = serde_json::from_reader(reader)?;
Ok(metadata)
}
#[cfg(test)]
mod test {
use super::*;
#[test]
fn settings_format_regression() {
let settings = Settings::default();
assert_eq!(
r##"{"rankingRules":null,"distinctAttribute":null,"searchableAttributes":null,"displayedAttributes":null,"stopWords":null,"synonyms":null,"attributesForFaceting":null}"##,
serde_json::to_string(&settings).unwrap()
);
}
}


@ -1,59 +0,0 @@
use std::path::Path;
use chrono::{DateTime, Utc};
use log::info;
use serde::{Deserialize, Serialize};
use crate::index::Index;
use crate::index_controller::{update_actor::UpdateStore, uuid_resolver::HeedUuidStore};
use crate::option::IndexerOpts;
#[derive(Serialize, Deserialize, Debug)]
#[serde(rename_all = "camelCase")]
pub struct MetadataV2 {
db_version: String,
index_db_size: usize,
update_db_size: usize,
dump_date: DateTime<Utc>,
}
impl MetadataV2 {
pub fn new(index_db_size: usize, update_db_size: usize) -> Self {
Self {
db_version: env!("CARGO_PKG_VERSION").to_string(),
index_db_size,
update_db_size,
dump_date: Utc::now(),
}
}
pub fn load_dump(
self,
src: impl AsRef<Path>,
dst: impl AsRef<Path>,
index_db_size: usize,
update_db_size: usize,
indexing_options: &IndexerOpts,
) -> anyhow::Result<()> {
info!(
"Loading dump from {}, dump database version: {}, dump version: V2",
self.dump_date, self.db_version
);
info!("Loading index database.");
HeedUuidStore::load_dump(src.as_ref(), &dst)?;
info!("Loading updates.");
UpdateStore::load_dump(&src, &dst, update_db_size)?;
info!("Loading indexes.");
let indexes_path = src.as_ref().join("indexes");
let indexes = indexes_path.read_dir()?;
for index in indexes {
let index = index?;
Index::load_dump(&index.path(), &dst, index_db_size, indexing_options)?;
}
Ok(())
}
}


@ -1,203 +0,0 @@
use std::fs::File;
use std::path::{Path, PathBuf};
use anyhow::Context;
use chrono::{DateTime, Utc};
use log::{info, trace, warn};
#[cfg(test)]
use mockall::automock;
use serde::{Deserialize, Serialize};
use tokio::fs::create_dir_all;
use loaders::v1::MetadataV1;
use loaders::v2::MetadataV2;
pub use actor::DumpActor;
pub use handle_impl::*;
pub use message::DumpMsg;
use super::{update_actor::UpdateActorHandle, uuid_resolver::UuidResolverHandle};
use crate::index_controller::dump_actor::error::DumpActorError;
use crate::{helpers::compression, option::IndexerOpts};
use error::Result;
mod actor;
pub mod error;
mod handle_impl;
mod loaders;
mod message;
const META_FILE_NAME: &str = "metadata.json";
#[async_trait::async_trait]
#[cfg_attr(test, automock)]
pub trait DumpActorHandle {
/// Start the creation of a dump
/// Implementation: [handle_impl::DumpActorHandleImpl::create_dump]
async fn create_dump(&self) -> Result<DumpInfo>;
/// Return the status of an already created dump
/// Implementation: [handle_impl::DumpActorHandleImpl::dump_info]
async fn dump_info(&self, uid: String) -> Result<DumpInfo>;
}
#[derive(Debug, Serialize, Deserialize)]
#[serde(tag = "dumpVersion")]
pub enum Metadata {
V1(MetadataV1),
V2(MetadataV2),
}
impl Metadata {
pub fn new_v2(index_db_size: usize, update_db_size: usize) -> Self {
let meta = MetadataV2::new(index_db_size, update_db_size);
Self::V2(meta)
}
}
#[derive(Debug, Serialize, Deserialize, PartialEq, Clone)]
#[serde(rename_all = "snake_case")]
pub enum DumpStatus {
Done,
InProgress,
Failed,
}
#[derive(Debug, Serialize, Clone)]
#[serde(rename_all = "camelCase")]
pub struct DumpInfo {
pub uid: String,
pub status: DumpStatus,
#[serde(skip_serializing_if = "Option::is_none")]
pub error: Option<String>,
started_at: DateTime<Utc>,
#[serde(skip_serializing_if = "Option::is_none")]
finished_at: Option<DateTime<Utc>>,
}
impl DumpInfo {
pub fn new(uid: String, status: DumpStatus) -> Self {
Self {
uid,
status,
error: None,
started_at: Utc::now(),
finished_at: None,
}
}
pub fn with_error(&mut self, error: String) {
self.status = DumpStatus::Failed;
self.finished_at = Some(Utc::now());
self.error = Some(error);
}
pub fn done(&mut self) {
self.finished_at = Some(Utc::now());
self.status = DumpStatus::Done;
}
pub fn dump_already_in_progress(&self) -> bool {
self.status == DumpStatus::InProgress
}
}
pub fn load_dump(
dst_path: impl AsRef<Path>,
src_path: impl AsRef<Path>,
index_db_size: usize,
update_db_size: usize,
indexer_opts: &IndexerOpts,
) -> anyhow::Result<()> {
let tmp_src = tempfile::tempdir_in(".")?;
let tmp_src_path = tmp_src.path();
compression::from_tar_gz(&src_path, tmp_src_path)?;
let meta_path = tmp_src_path.join(META_FILE_NAME);
let mut meta_file = File::open(&meta_path)?;
let meta: Metadata = serde_json::from_reader(&mut meta_file)?;
let dst_dir = dst_path
.as_ref()
.parent()
.with_context(|| format!("Invalid db path: {}", dst_path.as_ref().display()))?;
let tmp_dst = tempfile::tempdir_in(dst_dir)?;
match meta {
Metadata::V1(meta) => {
meta.load_dump(&tmp_src_path, tmp_dst.path(), index_db_size, indexer_opts)?
}
Metadata::V2(meta) => meta.load_dump(
&tmp_src_path,
tmp_dst.path(),
index_db_size,
update_db_size,
indexer_opts,
)?,
}
// Persist and atomically rename the db
let persisted_dump = tmp_dst.into_path();
if dst_path.as_ref().exists() {
warn!("Overwriting database at {}", dst_path.as_ref().display());
std::fs::remove_dir_all(&dst_path)?;
}
std::fs::rename(&persisted_dump, &dst_path)?;
Ok(())
}
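The tail of `load_dump` above follows a common "stage into a temp dir, then swap into place" pattern; a standalone sketch with made-up paths:

// Sketch only: remove any previous database, then rename the staged directory
// into its final location, mirroring the end of load_dump above.
use std::fs;
use std::path::Path;

fn swap_into_place(staged: &Path, dst: &Path) -> std::io::Result<()> {
    if dst.exists() {
        // Equivalent of the "Overwriting database" branch above.
        fs::remove_dir_all(dst)?;
    }
    fs::rename(staged, dst)
}

fn main() -> std::io::Result<()> {
    let staged = Path::new("staged.db.example");
    let dst = Path::new("data.ms.example");
    fs::create_dir_all(staged)?;
    swap_into_place(staged, dst)?;
    fs::remove_dir_all(dst)
}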
struct DumpTask<U, P> {
path: PathBuf,
uuid_resolver: U,
update_handle: P,
uid: String,
update_db_size: usize,
index_db_size: usize,
}
impl<U, P> DumpTask<U, P>
where
U: UuidResolverHandle + Send + Sync + Clone + 'static,
P: UpdateActorHandle + Send + Sync + Clone + 'static,
{
async fn run(self) -> Result<()> {
trace!("Performing dump.");
create_dir_all(&self.path).await?;
let path_clone = self.path.clone();
let temp_dump_dir =
tokio::task::spawn_blocking(|| tempfile::TempDir::new_in(path_clone)).await??;
let temp_dump_path = temp_dump_dir.path().to_owned();
let meta = Metadata::new_v2(self.index_db_size, self.update_db_size);
let meta_path = temp_dump_path.join(META_FILE_NAME);
let mut meta_file = File::create(&meta_path)?;
serde_json::to_writer(&mut meta_file, &meta)?;
let uuids = self.uuid_resolver.dump(temp_dump_path.clone()).await?;
self.update_handle
.dump(uuids, temp_dump_path.clone())
.await?;
let dump_path = tokio::task::spawn_blocking(move || -> Result<PathBuf> {
let temp_dump_file = tempfile::NamedTempFile::new_in(&self.path)?;
compression::to_tar_gz(temp_dump_path, temp_dump_file.path())
.map_err(|e| DumpActorError::Internal(e.into()))?;
let dump_path = self.path.join(self.uid).with_extension("dump");
temp_dump_file.persist(&dump_path)?;
Ok(dump_path)
})
.await??;
info!("Created dump in {:?}.", dump_path);
Ok(())
}
}


@ -1,348 +0,0 @@
use std::fs::File;
use std::path::PathBuf;
use std::sync::Arc;
use async_stream::stream;
use futures::stream::StreamExt;
use heed::CompactionOption;
use log::debug;
use milli::update::UpdateBuilder;
use tokio::task::spawn_blocking;
use tokio::{fs, sync::mpsc};
use uuid::Uuid;
use crate::index::{
update_handler::UpdateHandler, Checked, Document, SearchQuery, SearchResult, Settings,
};
use crate::index_controller::{
get_arc_ownership_blocking, Failed, IndexStats, Processed, Processing,
};
use crate::option::IndexerOpts;
use super::error::{IndexActorError, Result};
use super::{IndexMeta, IndexMsg, IndexSettings, IndexStore};
pub const CONCURRENT_INDEX_MSG: usize = 10;
pub struct IndexActor<S> {
receiver: Option<mpsc::Receiver<IndexMsg>>,
update_handler: Arc<UpdateHandler>,
store: S,
}
impl<S: IndexStore + Sync + Send> IndexActor<S> {
pub fn new(receiver: mpsc::Receiver<IndexMsg>, store: S) -> anyhow::Result<Self> {
let options = IndexerOpts::default();
let update_handler = UpdateHandler::new(&options)?;
let update_handler = Arc::new(update_handler);
let receiver = Some(receiver);
Ok(Self {
receiver,
update_handler,
store,
})
}
/// `run` polls the write_receiver and read_receiver concurrently: messages sent
/// through the read channel are processed concurrently, while the messages sent through the write
/// channel are processed one at a time.
pub async fn run(mut self) {
let mut receiver = self
.receiver
.take()
.expect("Index Actor must have a inbox at this point.");
let stream = stream! {
loop {
match receiver.recv().await {
Some(msg) => yield msg,
None => break,
}
}
};
stream
.for_each_concurrent(Some(CONCURRENT_INDEX_MSG), |msg| self.handle_message(msg))
.await;
}
async fn handle_message(&self, msg: IndexMsg) {
use IndexMsg::*;
match msg {
CreateIndex {
uuid,
primary_key,
ret,
} => {
let _ = ret.send(self.handle_create_index(uuid, primary_key).await);
}
Update {
ret,
meta,
data,
uuid,
} => {
let _ = ret.send(self.handle_update(uuid, meta, data).await);
}
Search { ret, query, uuid } => {
let _ = ret.send(self.handle_search(uuid, query).await);
}
Settings { ret, uuid } => {
let _ = ret.send(self.handle_settings(uuid).await);
}
Documents {
ret,
uuid,
attributes_to_retrieve,
offset,
limit,
} => {
let _ = ret.send(
self.handle_fetch_documents(uuid, offset, limit, attributes_to_retrieve)
.await,
);
}
Document {
uuid,
attributes_to_retrieve,
doc_id,
ret,
} => {
let _ = ret.send(
self.handle_fetch_document(uuid, doc_id, attributes_to_retrieve)
.await,
);
}
Delete { uuid, ret } => {
let _ = ret.send(self.handle_delete(uuid).await);
}
GetMeta { uuid, ret } => {
let _ = ret.send(self.handle_get_meta(uuid).await);
}
UpdateIndex {
uuid,
index_settings,
ret,
} => {
let _ = ret.send(self.handle_update_index(uuid, index_settings).await);
}
Snapshot { uuid, path, ret } => {
let _ = ret.send(self.handle_snapshot(uuid, path).await);
}
Dump { uuid, path, ret } => {
let _ = ret.send(self.handle_dump(uuid, path).await);
}
GetStats { uuid, ret } => {
let _ = ret.send(self.handle_get_stats(uuid).await);
}
}
}
async fn handle_search(&self, uuid: Uuid, query: SearchQuery) -> Result<SearchResult> {
let index = self
.store
.get(uuid)
.await?
.ok_or(IndexActorError::UnexistingIndex)?;
let result = spawn_blocking(move || index.perform_search(query)).await??;
Ok(result)
}
async fn handle_create_index(
&self,
uuid: Uuid,
primary_key: Option<String>,
) -> Result<IndexMeta> {
let index = self.store.create(uuid, primary_key).await?;
let meta = spawn_blocking(move || IndexMeta::new(&index)).await??;
Ok(meta)
}
async fn handle_update(
&self,
uuid: Uuid,
meta: Processing,
data: Option<File>,
) -> Result<std::result::Result<Processed, Failed>> {
debug!("Processing update {}", meta.id());
let update_handler = self.update_handler.clone();
let index = match self.store.get(uuid).await? {
Some(index) => index,
None => self.store.create(uuid, None).await?,
};
Ok(spawn_blocking(move || update_handler.handle_update(meta, data, index)).await?)
}
async fn handle_settings(&self, uuid: Uuid) -> Result<Settings<Checked>> {
let index = self
.store
.get(uuid)
.await?
.ok_or(IndexActorError::UnexistingIndex)?;
let result = spawn_blocking(move || index.settings()).await??;
Ok(result)
}
async fn handle_fetch_documents(
&self,
uuid: Uuid,
offset: usize,
limit: usize,
attributes_to_retrieve: Option<Vec<String>>,
) -> Result<Vec<Document>> {
let index = self
.store
.get(uuid)
.await?
.ok_or(IndexActorError::UnexistingIndex)?;
let result =
spawn_blocking(move || index.retrieve_documents(offset, limit, attributes_to_retrieve))
.await??;
Ok(result)
}
async fn handle_fetch_document(
&self,
uuid: Uuid,
doc_id: String,
attributes_to_retrieve: Option<Vec<String>>,
) -> Result<Document> {
let index = self
.store
.get(uuid)
.await?
.ok_or(IndexActorError::UnexistingIndex)?;
let result =
spawn_blocking(move || index.retrieve_document(doc_id, attributes_to_retrieve))
.await??;
Ok(result)
}
async fn handle_delete(&self, uuid: Uuid) -> Result<()> {
let index = self.store.delete(uuid).await?;
if let Some(index) = index {
tokio::task::spawn(async move {
let index = index.0;
let store = get_arc_ownership_blocking(index).await;
spawn_blocking(move || {
store.prepare_for_closing().wait();
debug!("Index closed");
});
});
}
Ok(())
}
async fn handle_get_meta(&self, uuid: Uuid) -> Result<IndexMeta> {
match self.store.get(uuid).await? {
Some(index) => {
let meta = spawn_blocking(move || IndexMeta::new(&index)).await??;
Ok(meta)
}
None => Err(IndexActorError::UnexistingIndex),
}
}
async fn handle_update_index(
&self,
uuid: Uuid,
index_settings: IndexSettings,
) -> Result<IndexMeta> {
let index = self
.store
.get(uuid)
.await?
.ok_or(IndexActorError::UnexistingIndex)?;
let result = spawn_blocking(move || match index_settings.primary_key {
Some(primary_key) => {
let mut txn = index.write_txn()?;
if index.primary_key(&txn)?.is_some() {
return Err(IndexActorError::ExistingPrimaryKey);
}
let mut builder = UpdateBuilder::new(0).settings(&mut txn, &index);
builder.set_primary_key(primary_key);
builder.execute(|_, _| ())?;
let meta = IndexMeta::new_txn(&index, &txn)?;
txn.commit()?;
Ok(meta)
}
None => {
let meta = IndexMeta::new(&index)?;
Ok(meta)
}
})
.await??;
Ok(result)
}
async fn handle_snapshot(&self, uuid: Uuid, mut path: PathBuf) -> Result<()> {
use tokio::fs::create_dir_all;
path.push("indexes");
create_dir_all(&path).await?;
if let Some(index) = self.store.get(uuid).await? {
let mut index_path = path.join(format!("index-{}", uuid));
create_dir_all(&index_path).await?;
index_path.push("data.mdb");
spawn_blocking(move || -> Result<()> {
// Get write txn to wait for ongoing write transaction before snapshot.
let _txn = index.write_txn()?;
index
.env
.copy_to_path(index_path, CompactionOption::Enabled)?;
Ok(())
})
.await??;
}
Ok(())
}
/// Create a `documents.jsonl` and a `settings.json` in `path/uid/` with a dump of all the
/// documents and all the settings.
async fn handle_dump(&self, uuid: Uuid, path: PathBuf) -> Result<()> {
let index = self
.store
.get(uuid)
.await?
.ok_or(IndexActorError::UnexistingIndex)?;
let path = path.join(format!("indexes/index-{}/", uuid));
fs::create_dir_all(&path).await?;
tokio::task::spawn_blocking(move || index.dump(path)).await??;
Ok(())
}
async fn handle_get_stats(&self, uuid: Uuid) -> Result<IndexStats> {
let index = self
.store
.get(uuid)
.await?
.ok_or(IndexActorError::UnexistingIndex)?;
spawn_blocking(move || {
let rtxn = index.read_txn()?;
Ok(IndexStats {
size: index.size(),
number_of_documents: index.number_of_documents(&rtxn)?,
is_indexing: None,
field_distribution: index.field_distribution(&rtxn)?,
})
})
.await?
}
}


@ -1,48 +0,0 @@
use meilisearch_error::{Code, ErrorCode};
use crate::{error::MilliError, index::error::IndexError};
pub type Result<T> = std::result::Result<T, IndexActorError>;
#[derive(thiserror::Error, Debug)]
pub enum IndexActorError {
#[error("{0}")]
IndexError(#[from] IndexError),
#[error("Index already exists")]
IndexAlreadyExists,
#[error("Index not found")]
UnexistingIndex,
#[error("A primary key is already present. It's impossible to update it")]
ExistingPrimaryKey,
#[error("Internal Error: {0}")]
Internal(Box<dyn std::error::Error + Send + Sync + 'static>),
#[error("{0}")]
Milli(#[from] milli::Error),
}
macro_rules! internal_error {
($($other:path), *) => {
$(
impl From<$other> for IndexActorError {
fn from(other: $other) -> Self {
Self::Internal(Box::new(other))
}
}
)*
}
}
internal_error!(heed::Error, tokio::task::JoinError, std::io::Error);
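For reference, the `internal_error!` invocation above is equivalent to writing one `From` impl per listed type; a standalone sketch of a single expansion, using a simplified stand-in error (`MyError` is hypothetical, not from the codebase):

// Sketch of what one internal_error! expansion amounts to.
#[derive(Debug)]
enum MyError {
    Internal(Box<dyn std::error::Error + Send + Sync + 'static>),
}

impl From<std::io::Error> for MyError {
    fn from(other: std::io::Error) -> Self {
        Self::Internal(Box::new(other))
    }
}

fn main() {
    let err: MyError = std::io::Error::from(std::io::ErrorKind::NotFound).into();
    println!("{:?}", err);
}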
impl ErrorCode for IndexActorError {
fn error_code(&self) -> Code {
match self {
IndexActorError::IndexError(e) => e.error_code(),
IndexActorError::IndexAlreadyExists => Code::IndexAlreadyExists,
IndexActorError::UnexistingIndex => Code::IndexNotFound,
IndexActorError::ExistingPrimaryKey => Code::PrimaryKeyAlreadyPresent,
IndexActorError::Internal(_) => Code::Internal,
IndexActorError::Milli(e) => MilliError(e).error_code(),
}
}
}


@ -1,159 +0,0 @@
use std::path::{Path, PathBuf};
use tokio::sync::{mpsc, oneshot};
use uuid::Uuid;
use crate::{
index::Checked,
index_controller::{IndexSettings, IndexStats, Processing},
};
use crate::{
index::{Document, SearchQuery, SearchResult, Settings},
index_controller::{Failed, Processed},
};
use super::error::Result;
use super::{IndexActor, IndexActorHandle, IndexMeta, IndexMsg, MapIndexStore};
#[derive(Clone)]
pub struct IndexActorHandleImpl {
sender: mpsc::Sender<IndexMsg>,
}
#[async_trait::async_trait]
impl IndexActorHandle for IndexActorHandleImpl {
async fn create_index(&self, uuid: Uuid, primary_key: Option<String>) -> Result<IndexMeta> {
let (ret, receiver) = oneshot::channel();
let msg = IndexMsg::CreateIndex {
ret,
uuid,
primary_key,
};
let _ = self.sender.send(msg).await;
receiver.await.expect("IndexActor has been killed")
}
async fn update(
&self,
uuid: Uuid,
meta: Processing,
data: Option<std::fs::File>,
) -> Result<std::result::Result<Processed, Failed>> {
let (ret, receiver) = oneshot::channel();
let msg = IndexMsg::Update {
ret,
meta,
data,
uuid,
};
let _ = self.sender.send(msg).await;
Ok(receiver.await.expect("IndexActor has been killed")?)
}
async fn search(&self, uuid: Uuid, query: SearchQuery) -> Result<SearchResult> {
let (ret, receiver) = oneshot::channel();
let msg = IndexMsg::Search { uuid, query, ret };
let _ = self.sender.send(msg).await;
Ok(receiver.await.expect("IndexActor has been killed")?)
}
async fn settings(&self, uuid: Uuid) -> Result<Settings<Checked>> {
let (ret, receiver) = oneshot::channel();
let msg = IndexMsg::Settings { uuid, ret };
let _ = self.sender.send(msg).await;
Ok(receiver.await.expect("IndexActor has been killed")?)
}
async fn documents(
&self,
uuid: Uuid,
offset: usize,
limit: usize,
attributes_to_retrieve: Option<Vec<String>>,
) -> Result<Vec<Document>> {
let (ret, receiver) = oneshot::channel();
let msg = IndexMsg::Documents {
uuid,
ret,
offset,
attributes_to_retrieve,
limit,
};
let _ = self.sender.send(msg).await;
Ok(receiver.await.expect("IndexActor has been killed")?)
}
async fn document(
&self,
uuid: Uuid,
doc_id: String,
attributes_to_retrieve: Option<Vec<String>>,
) -> Result<Document> {
let (ret, receiver) = oneshot::channel();
let msg = IndexMsg::Document {
uuid,
ret,
doc_id,
attributes_to_retrieve,
};
let _ = self.sender.send(msg).await;
Ok(receiver.await.expect("IndexActor has been killed")?)
}
async fn delete(&self, uuid: Uuid) -> Result<()> {
let (ret, receiver) = oneshot::channel();
let msg = IndexMsg::Delete { uuid, ret };
let _ = self.sender.send(msg).await;
Ok(receiver.await.expect("IndexActor has been killed")?)
}
async fn get_index_meta(&self, uuid: Uuid) -> Result<IndexMeta> {
let (ret, receiver) = oneshot::channel();
let msg = IndexMsg::GetMeta { uuid, ret };
let _ = self.sender.send(msg).await;
Ok(receiver.await.expect("IndexActor has been killed")?)
}
async fn update_index(&self, uuid: Uuid, index_settings: IndexSettings) -> Result<IndexMeta> {
let (ret, receiver) = oneshot::channel();
let msg = IndexMsg::UpdateIndex {
uuid,
index_settings,
ret,
};
let _ = self.sender.send(msg).await;
Ok(receiver.await.expect("IndexActor has been killed")?)
}
async fn snapshot(&self, uuid: Uuid, path: PathBuf) -> Result<()> {
let (ret, receiver) = oneshot::channel();
let msg = IndexMsg::Snapshot { uuid, path, ret };
let _ = self.sender.send(msg).await;
Ok(receiver.await.expect("IndexActor has been killed")?)
}
async fn dump(&self, uuid: Uuid, path: PathBuf) -> Result<()> {
let (ret, receiver) = oneshot::channel();
let msg = IndexMsg::Dump { uuid, path, ret };
let _ = self.sender.send(msg).await;
Ok(receiver.await.expect("IndexActor has been killed")?)
}
async fn get_index_stats(&self, uuid: Uuid) -> Result<IndexStats> {
let (ret, receiver) = oneshot::channel();
let msg = IndexMsg::GetStats { uuid, ret };
let _ = self.sender.send(msg).await;
Ok(receiver.await.expect("IndexActor has been killed")?)
}
}
impl IndexActorHandleImpl {
pub fn new(path: impl AsRef<Path>, index_size: usize) -> anyhow::Result<Self> {
let (sender, receiver) = mpsc::channel(100);
let store = MapIndexStore::new(path, index_size);
let actor = IndexActor::new(receiver, store)?;
tokio::task::spawn(actor.run());
Ok(Self { sender })
}
}
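The handle methods above all follow the same request/response shape: send a message carrying a `oneshot` sender into the actor's mailbox, then await the reply. A compact, standalone sketch of that pattern (names like `Handle` and `Msg::Ping` are illustrative, not from the codebase):

// Hedged sketch of the handle/actor pattern used above, built on tokio channels.
use tokio::sync::{mpsc, oneshot};

enum Msg {
    Ping { ret: oneshot::Sender<String> },
}

#[derive(Clone)]
struct Handle {
    sender: mpsc::Sender<Msg>,
}

impl Handle {
    async fn ping(&self) -> String {
        // The handle packs a oneshot sender into the message and awaits the reply.
        let (ret, receiver) = oneshot::channel();
        let _ = self.sender.send(Msg::Ping { ret }).await;
        receiver.await.expect("actor has been killed")
    }
}

async fn run_actor(mut receiver: mpsc::Receiver<Msg>) {
    // The actor owns the receiving end and answers each message on its oneshot channel.
    while let Some(msg) = receiver.recv().await {
        match msg {
            Msg::Ping { ret } => {
                let _ = ret.send("pong".to_string());
            }
        }
    }
}

#[tokio::main]
async fn main() {
    let (sender, receiver) = mpsc::channel(100);
    tokio::spawn(run_actor(receiver));
    let handle = Handle { sender };
    assert_eq!(handle.ping().await, "pong");
}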


@ -1,74 +0,0 @@
use std::path::PathBuf;
use tokio::sync::oneshot;
use uuid::Uuid;
use super::error::Result as IndexResult;
use crate::index::{Checked, Document, SearchQuery, SearchResult, Settings};
use crate::index_controller::{Failed, IndexStats, Processed, Processing};
use super::{IndexMeta, IndexSettings};
#[allow(clippy::large_enum_variant)]
pub enum IndexMsg {
CreateIndex {
uuid: Uuid,
primary_key: Option<String>,
ret: oneshot::Sender<IndexResult<IndexMeta>>,
},
Update {
uuid: Uuid,
meta: Processing,
data: Option<std::fs::File>,
ret: oneshot::Sender<IndexResult<Result<Processed, Failed>>>,
},
Search {
uuid: Uuid,
query: SearchQuery,
ret: oneshot::Sender<IndexResult<SearchResult>>,
},
Settings {
uuid: Uuid,
ret: oneshot::Sender<IndexResult<Settings<Checked>>>,
},
Documents {
uuid: Uuid,
attributes_to_retrieve: Option<Vec<String>>,
offset: usize,
limit: usize,
ret: oneshot::Sender<IndexResult<Vec<Document>>>,
},
Document {
uuid: Uuid,
attributes_to_retrieve: Option<Vec<String>>,
doc_id: String,
ret: oneshot::Sender<IndexResult<Document>>,
},
Delete {
uuid: Uuid,
ret: oneshot::Sender<IndexResult<()>>,
},
GetMeta {
uuid: Uuid,
ret: oneshot::Sender<IndexResult<IndexMeta>>,
},
UpdateIndex {
uuid: Uuid,
index_settings: IndexSettings,
ret: oneshot::Sender<IndexResult<IndexMeta>>,
},
Snapshot {
uuid: Uuid,
path: PathBuf,
ret: oneshot::Sender<IndexResult<()>>,
},
Dump {
uuid: Uuid,
path: PathBuf,
ret: oneshot::Sender<IndexResult<()>>,
},
GetStats {
uuid: Uuid,
ret: oneshot::Sender<IndexResult<IndexStats>>,
},
}


@ -1,169 +0,0 @@
use std::fs::File;
use std::path::PathBuf;
use chrono::{DateTime, Utc};
#[cfg(test)]
use mockall::automock;
use serde::{Deserialize, Serialize};
use uuid::Uuid;
use actor::IndexActor;
pub use actor::CONCURRENT_INDEX_MSG;
pub use handle_impl::IndexActorHandleImpl;
use message::IndexMsg;
use store::{IndexStore, MapIndexStore};
use crate::index::{Checked, Document, Index, SearchQuery, SearchResult, Settings};
use crate::index_controller::{Failed, IndexStats, Processed, Processing};
use error::Result;
use super::IndexSettings;
mod actor;
pub mod error;
mod handle_impl;
mod message;
mod store;
#[derive(Debug, Serialize, Deserialize, Clone)]
#[serde(rename_all = "camelCase")]
pub struct IndexMeta {
created_at: DateTime<Utc>,
pub updated_at: DateTime<Utc>,
pub primary_key: Option<String>,
}
impl IndexMeta {
fn new(index: &Index) -> Result<Self> {
let txn = index.read_txn()?;
Self::new_txn(index, &txn)
}
fn new_txn(index: &Index, txn: &heed::RoTxn) -> Result<Self> {
let created_at = index.created_at(txn)?;
let updated_at = index.updated_at(txn)?;
let primary_key = index.primary_key(txn)?.map(String::from);
Ok(Self {
created_at,
updated_at,
primary_key,
})
}
}
#[async_trait::async_trait]
#[cfg_attr(test, automock)]
pub trait IndexActorHandle {
async fn create_index(&self, uuid: Uuid, primary_key: Option<String>) -> Result<IndexMeta>;
async fn update(
&self,
uuid: Uuid,
meta: Processing,
data: Option<File>,
) -> Result<std::result::Result<Processed, Failed>>;
async fn search(&self, uuid: Uuid, query: SearchQuery) -> Result<SearchResult>;
async fn settings(&self, uuid: Uuid) -> Result<Settings<Checked>>;
async fn documents(
&self,
uuid: Uuid,
offset: usize,
limit: usize,
attributes_to_retrieve: Option<Vec<String>>,
) -> Result<Vec<Document>>;
async fn document(
&self,
uuid: Uuid,
doc_id: String,
attributes_to_retrieve: Option<Vec<String>>,
) -> Result<Document>;
async fn delete(&self, uuid: Uuid) -> Result<()>;
async fn get_index_meta(&self, uuid: Uuid) -> Result<IndexMeta>;
async fn update_index(&self, uuid: Uuid, index_settings: IndexSettings) -> Result<IndexMeta>;
async fn snapshot(&self, uuid: Uuid, path: PathBuf) -> Result<()>;
async fn dump(&self, uuid: Uuid, path: PathBuf) -> Result<()>;
async fn get_index_stats(&self, uuid: Uuid) -> Result<IndexStats>;
}
#[cfg(test)]
mod test {
use std::sync::Arc;
use super::*;
#[async_trait::async_trait]
/// Useful for passing around an `Arc<MockIndexActorHandle>` in tests.
impl IndexActorHandle for Arc<MockIndexActorHandle> {
async fn create_index(&self, uuid: Uuid, primary_key: Option<String>) -> Result<IndexMeta> {
self.as_ref().create_index(uuid, primary_key).await
}
async fn update(
&self,
uuid: Uuid,
meta: Processing,
data: Option<std::fs::File>,
) -> Result<std::result::Result<Processed, Failed>> {
self.as_ref().update(uuid, meta, data).await
}
async fn search(&self, uuid: Uuid, query: SearchQuery) -> Result<SearchResult> {
self.as_ref().search(uuid, query).await
}
async fn settings(&self, uuid: Uuid) -> Result<Settings<Checked>> {
self.as_ref().settings(uuid).await
}
async fn documents(
&self,
uuid: Uuid,
offset: usize,
limit: usize,
attributes_to_retrieve: Option<Vec<String>>,
) -> Result<Vec<Document>> {
self.as_ref()
.documents(uuid, offset, limit, attributes_to_retrieve)
.await
}
async fn document(
&self,
uuid: Uuid,
doc_id: String,
attributes_to_retrieve: Option<Vec<String>>,
) -> Result<Document> {
self.as_ref()
.document(uuid, doc_id, attributes_to_retrieve)
.await
}
async fn delete(&self, uuid: Uuid) -> Result<()> {
self.as_ref().delete(uuid).await
}
async fn get_index_meta(&self, uuid: Uuid) -> Result<IndexMeta> {
self.as_ref().get_index_meta(uuid).await
}
async fn update_index(
&self,
uuid: Uuid,
index_settings: IndexSettings,
) -> Result<IndexMeta> {
self.as_ref().update_index(uuid, index_settings).await
}
async fn snapshot(&self, uuid: Uuid, path: PathBuf) -> Result<()> {
self.as_ref().snapshot(uuid, path).await
}
async fn dump(&self, uuid: Uuid, path: PathBuf) -> Result<()> {
self.as_ref().dump(uuid, path).await
}
async fn get_index_stats(&self, uuid: Uuid) -> Result<IndexStats> {
self.as_ref().get_index_stats(uuid).await
}
}
}


@ -1,441 +0,0 @@
use std::collections::BTreeMap;
use std::path::Path;
use std::sync::Arc;
use std::time::Duration;
use actix_web::web::Bytes;
use chrono::{DateTime, Utc};
use futures::stream::StreamExt;
use log::error;
use log::info;
use milli::FieldDistribution;
use serde::{Deserialize, Serialize};
use tokio::sync::mpsc;
use tokio::time::sleep;
use uuid::Uuid;
use dump_actor::DumpActorHandle;
pub use dump_actor::{DumpInfo, DumpStatus};
use index_actor::IndexActorHandle;
use snapshot::{load_snapshot, SnapshotService};
use update_actor::UpdateActorHandle;
pub use updates::*;
use uuid_resolver::{error::UuidResolverError, UuidResolverHandle};
use crate::extractors::payload::Payload;
use crate::index::{Checked, Document, SearchQuery, SearchResult, Settings};
use crate::option::Opt;
use error::Result;
use self::dump_actor::load_dump;
use self::error::IndexControllerError;
mod dump_actor;
pub mod error;
pub mod index_actor;
mod snapshot;
mod update_actor;
mod updates;
mod uuid_resolver;
#[derive(Debug, Serialize, Deserialize, Clone)]
#[serde(rename_all = "camelCase")]
pub struct IndexMetadata {
#[serde(skip)]
pub uuid: Uuid,
pub uid: String,
name: String,
#[serde(flatten)]
pub meta: index_actor::IndexMeta,
}
#[derive(Clone, Debug)]
pub struct IndexSettings {
pub uid: Option<String>,
pub primary_key: Option<String>,
}
#[derive(Serialize, Debug)]
#[serde(rename_all = "camelCase")]
pub struct IndexStats {
#[serde(skip)]
pub size: u64,
pub number_of_documents: u64,
/// Whether the current index is performing an update. It is initially `None` when the
/// index returns it, since it is the `UpdateStore` that knows what index is currently indexing. It is
/// later set to either true or false when we retrieve the information from the `UpdateStore`.
pub is_indexing: Option<bool>,
pub field_distribution: FieldDistribution,
}
#[derive(Clone)]
pub struct IndexController {
uuid_resolver: uuid_resolver::UuidResolverHandleImpl,
index_handle: index_actor::IndexActorHandleImpl,
update_handle: update_actor::UpdateActorHandleImpl<Bytes>,
dump_handle: dump_actor::DumpActorHandleImpl,
}
#[derive(Serialize, Debug)]
#[serde(rename_all = "camelCase")]
pub struct Stats {
pub database_size: u64,
pub last_update: Option<DateTime<Utc>>,
pub indexes: BTreeMap<String, IndexStats>,
}
impl IndexController {
pub fn new(path: impl AsRef<Path>, options: &Opt) -> anyhow::Result<Self> {
let index_size = options.max_index_size.get_bytes() as usize;
let update_store_size = options.max_index_size.get_bytes() as usize;
if let Some(ref path) = options.import_snapshot {
info!("Loading from snapshot {:?}", path);
load_snapshot(
&options.db_path,
path,
options.ignore_snapshot_if_db_exists,
options.ignore_missing_snapshot,
)?;
} else if let Some(ref src_path) = options.import_dump {
load_dump(
&options.db_path,
src_path,
options.max_index_size.get_bytes() as usize,
options.max_udb_size.get_bytes() as usize,
&options.indexer_options,
)?;
}
std::fs::create_dir_all(&path)?;
let uuid_resolver = uuid_resolver::UuidResolverHandleImpl::new(&path)?;
let index_handle = index_actor::IndexActorHandleImpl::new(&path, index_size)?;
let update_handle = update_actor::UpdateActorHandleImpl::new(
index_handle.clone(),
&path,
update_store_size,
)?;
let dump_handle = dump_actor::DumpActorHandleImpl::new(
&options.dumps_dir,
uuid_resolver.clone(),
update_handle.clone(),
options.max_index_size.get_bytes() as usize,
options.max_udb_size.get_bytes() as usize,
)?;
if options.schedule_snapshot {
let snapshot_service = SnapshotService::new(
uuid_resolver.clone(),
update_handle.clone(),
Duration::from_secs(options.snapshot_interval_sec),
options.snapshot_dir.clone(),
options
.db_path
.file_name()
.map(|n| n.to_owned().into_string().expect("invalid path"))
.unwrap_or_else(|| String::from("data.ms")),
);
tokio::task::spawn(snapshot_service.run());
}
Ok(Self {
uuid_resolver,
index_handle,
update_handle,
dump_handle,
})
}
pub async fn add_documents(
&self,
uid: String,
method: milli::update::IndexDocumentsMethod,
format: milli::update::UpdateFormat,
payload: Payload,
primary_key: Option<String>,
) -> Result<UpdateStatus> {
let perform_update = |uuid| async move {
let meta = UpdateMeta::DocumentsAddition {
method,
format,
primary_key,
};
let (sender, receiver) = mpsc::channel(10);
// It is necessary to spawn a local task to send the payload to the update handle to
// prevent deadlocking between the update_handle::update that waits for the update to be
// registered and the update_actor that waits for the payload to be sent to it.
tokio::task::spawn_local(async move {
payload
.for_each(|r| async {
let _ = sender.send(r).await;
})
.await
});
// This must be done *AFTER* spawning the task.
self.update_handle.update(meta, receiver, uuid).await
};
match self.uuid_resolver.get(uid).await {
Ok(uuid) => Ok(perform_update(uuid).await?),
Err(UuidResolverError::UnexistingIndex(name)) => {
let uuid = Uuid::new_v4();
let status = perform_update(uuid).await?;
// ignore if index creation fails now, since it may already have been created
let _ = self.index_handle.create_index(uuid, None).await;
self.uuid_resolver.insert(name, uuid).await?;
Ok(status)
}
Err(e) => Err(e.into()),
}
}
pub async fn clear_documents(&self, uid: String) -> Result<UpdateStatus> {
let uuid = self.uuid_resolver.get(uid).await?;
let meta = UpdateMeta::ClearDocuments;
let (_, receiver) = mpsc::channel(1);
let status = self.update_handle.update(meta, receiver, uuid).await?;
Ok(status)
}
pub async fn delete_documents(
&self,
uid: String,
documents: Vec<String>,
) -> Result<UpdateStatus> {
let uuid = self.uuid_resolver.get(uid).await?;
let meta = UpdateMeta::DeleteDocuments { ids: documents };
let (_, receiver) = mpsc::channel(1);
let status = self.update_handle.update(meta, receiver, uuid).await?;
Ok(status)
}
pub async fn update_settings(
&self,
uid: String,
settings: Settings<Checked>,
create: bool,
) -> Result<UpdateStatus> {
let perform_update = |uuid| async move {
let meta = UpdateMeta::Settings(settings.into_unchecked());
// Nothing to send, drop the sender right away, so as not to block the update actor.
let (_, receiver) = mpsc::channel(1);
self.update_handle.update(meta, receiver, uuid).await
};
match self.uuid_resolver.get(uid).await {
Ok(uuid) => Ok(perform_update(uuid).await?),
Err(UuidResolverError::UnexistingIndex(name)) if create => {
let uuid = Uuid::new_v4();
let status = perform_update(uuid).await?;
// ignore if index creation fails now, since it may already have been created
let _ = self.index_handle.create_index(uuid, None).await;
self.uuid_resolver.insert(name, uuid).await?;
Ok(status)
}
Err(e) => Err(e.into()),
}
}
pub async fn create_index(&self, index_settings: IndexSettings) -> Result<IndexMetadata> {
let IndexSettings { uid, primary_key } = index_settings;
let uid = uid.ok_or(IndexControllerError::MissingUid)?;
let uuid = Uuid::new_v4();
let meta = self.index_handle.create_index(uuid, primary_key).await?;
self.uuid_resolver.insert(uid.clone(), uuid).await?;
let meta = IndexMetadata {
uuid,
name: uid.clone(),
uid,
meta,
};
Ok(meta)
}
pub async fn delete_index(&self, uid: String) -> Result<()> {
let uuid = self.uuid_resolver.delete(uid).await?;
// We remove the index from the resolver synchronously, and effectively perform the index
// deletion as a background task.
let update_handle = self.update_handle.clone();
let index_handle = self.index_handle.clone();
tokio::spawn(async move {
if let Err(e) = update_handle.delete(uuid).await {
error!("Error while deleting index: {}", e);
}
if let Err(e) = index_handle.delete(uuid).await {
error!("Error while deleting index: {}", e);
}
});
Ok(())
}
pub async fn update_status(&self, uid: String, id: u64) -> Result<UpdateStatus> {
let uuid = self.uuid_resolver.get(uid).await?;
let result = self.update_handle.update_status(uuid, id).await?;
Ok(result)
}
pub async fn all_update_status(&self, uid: String) -> Result<Vec<UpdateStatus>> {
let uuid = self.uuid_resolver.get(uid).await?;
let result = self.update_handle.get_all_updates_status(uuid).await?;
Ok(result)
}
pub async fn list_indexes(&self) -> Result<Vec<IndexMetadata>> {
let uuids = self.uuid_resolver.list().await?;
let mut ret = Vec::new();
for (uid, uuid) in uuids {
let meta = self.index_handle.get_index_meta(uuid).await?;
let meta = IndexMetadata {
uuid,
name: uid.clone(),
uid,
meta,
};
ret.push(meta);
}
Ok(ret)
}
pub async fn settings(&self, uid: String) -> Result<Settings<Checked>> {
let uuid = self.uuid_resolver.get(uid.clone()).await?;
let settings = self.index_handle.settings(uuid).await?;
Ok(settings)
}
pub async fn documents(
&self,
uid: String,
offset: usize,
limit: usize,
attributes_to_retrieve: Option<Vec<String>>,
) -> Result<Vec<Document>> {
let uuid = self.uuid_resolver.get(uid.clone()).await?;
let documents = self
.index_handle
.documents(uuid, offset, limit, attributes_to_retrieve)
.await?;
Ok(documents)
}
pub async fn document(
&self,
uid: String,
doc_id: String,
attributes_to_retrieve: Option<Vec<String>>,
) -> Result<Document> {
let uuid = self.uuid_resolver.get(uid.clone()).await?;
let document = self
.index_handle
.document(uuid, doc_id, attributes_to_retrieve)
.await?;
Ok(document)
}
pub async fn update_index(
&self,
uid: String,
mut index_settings: IndexSettings,
) -> Result<IndexMetadata> {
if index_settings.uid.is_some() {
index_settings.uid.take();
}
let uuid = self.uuid_resolver.get(uid.clone()).await?;
let meta = self.index_handle.update_index(uuid, index_settings).await?;
let meta = IndexMetadata {
uuid,
name: uid.clone(),
uid,
meta,
};
Ok(meta)
}
pub async fn search(&self, uid: String, query: SearchQuery) -> Result<SearchResult> {
let uuid = self.uuid_resolver.get(uid).await?;
let result = self.index_handle.search(uuid, query).await?;
Ok(result)
}
pub async fn get_index(&self, uid: String) -> Result<IndexMetadata> {
let uuid = self.uuid_resolver.get(uid.clone()).await?;
let meta = self.index_handle.get_index_meta(uuid).await?;
let meta = IndexMetadata {
uuid,
name: uid.clone(),
uid,
meta,
};
Ok(meta)
}
pub async fn get_uuids_size(&self) -> Result<u64> {
Ok(self.uuid_resolver.get_size().await?)
}
pub async fn get_index_stats(&self, uid: String) -> Result<IndexStats> {
let uuid = self.uuid_resolver.get(uid).await?;
let update_infos = self.update_handle.get_info().await?;
let mut stats = self.index_handle.get_index_stats(uuid).await?;
// Check if the currently indexing update is from our index.
stats.is_indexing = Some(Some(uuid) == update_infos.processing);
Ok(stats)
}
pub async fn get_all_stats(&self) -> Result<Stats> {
let update_infos = self.update_handle.get_info().await?;
let mut database_size = self.get_uuids_size().await? + update_infos.size;
let mut last_update: Option<DateTime<_>> = None;
let mut indexes = BTreeMap::new();
for index in self.list_indexes().await? {
let mut index_stats = self.index_handle.get_index_stats(index.uuid).await?;
database_size += index_stats.size;
last_update = last_update.map_or(Some(index.meta.updated_at), |last| {
Some(last.max(index.meta.updated_at))
});
index_stats.is_indexing = Some(Some(index.uuid) == update_infos.processing);
indexes.insert(index.uid, index_stats);
}
Ok(Stats {
database_size,
last_update,
indexes,
})
}
pub async fn create_dump(&self) -> Result<DumpInfo> {
Ok(self.dump_handle.create_dump().await?)
}
pub async fn dump_info(&self, uid: String) -> Result<DumpInfo> {
Ok(self.dump_handle.dump_info(uid).await?)
}
}
pub async fn get_arc_ownership_blocking<T>(mut item: Arc<T>) -> T {
loop {
match Arc::try_unwrap(item) {
Ok(item) => return item,
Err(item_arc) => {
item = item_arc;
sleep(Duration::from_millis(100)).await;
continue;
}
}
}
}

View File

@ -1,268 +0,0 @@
use std::path::{Path, PathBuf};
use std::time::Duration;
use anyhow::bail;
use log::{error, info, trace};
use tokio::fs;
use tokio::task::spawn_blocking;
use tokio::time::sleep;
use super::update_actor::UpdateActorHandle;
use super::uuid_resolver::UuidResolverHandle;
use crate::helpers::compression;
pub struct SnapshotService<U, R> {
uuid_resolver_handle: R,
update_handle: U,
snapshot_period: Duration,
snapshot_path: PathBuf,
db_name: String,
}
impl<U, R> SnapshotService<U, R>
where
U: UpdateActorHandle,
R: UuidResolverHandle,
{
pub fn new(
uuid_resolver_handle: R,
update_handle: U,
snapshot_period: Duration,
snapshot_path: PathBuf,
db_name: String,
) -> Self {
Self {
uuid_resolver_handle,
update_handle,
snapshot_period,
snapshot_path,
db_name,
}
}
pub async fn run(self) {
info!(
"Snapshot scheduled every {}s.",
self.snapshot_period.as_secs()
);
loop {
if let Err(e) = self.perform_snapshot().await {
error!("Error while performing snapshot: {}", e);
}
sleep(self.snapshot_period).await;
}
}
async fn perform_snapshot(&self) -> anyhow::Result<()> {
trace!("Performing snapshot.");
let snapshot_dir = self.snapshot_path.clone();
fs::create_dir_all(&snapshot_dir).await?;
let temp_snapshot_dir =
spawn_blocking(move || tempfile::tempdir_in(snapshot_dir)).await??;
let temp_snapshot_path = temp_snapshot_dir.path().to_owned();
let uuids = self
.uuid_resolver_handle
.snapshot(temp_snapshot_path.clone())
.await?;
if uuids.is_empty() {
return Ok(());
}
self.update_handle
.snapshot(uuids, temp_snapshot_path.clone())
.await?;
let snapshot_dir = self.snapshot_path.clone();
let snapshot_path = self
.snapshot_path
.join(format!("{}.snapshot", self.db_name));
let snapshot_path = spawn_blocking(move || -> anyhow::Result<PathBuf> {
let temp_snapshot_file = tempfile::NamedTempFile::new_in(snapshot_dir)?;
let temp_snapshot_file_path = temp_snapshot_file.path().to_owned();
compression::to_tar_gz(temp_snapshot_path, temp_snapshot_file_path)?;
temp_snapshot_file.persist(&snapshot_path)?;
Ok(snapshot_path)
})
.await??;
trace!("Created snapshot in {:?}.", snapshot_path);
Ok(())
}
}
pub fn load_snapshot(
db_path: impl AsRef<Path>,
snapshot_path: impl AsRef<Path>,
ignore_snapshot_if_db_exists: bool,
ignore_missing_snapshot: bool,
) -> anyhow::Result<()> {
if !db_path.as_ref().exists() && snapshot_path.as_ref().exists() {
match compression::from_tar_gz(snapshot_path, &db_path) {
Ok(()) => Ok(()),
Err(e) => {
// clean created db folder
std::fs::remove_dir_all(&db_path)?;
Err(e)
}
}
} else if db_path.as_ref().exists() && !ignore_snapshot_if_db_exists {
bail!(
"database already exists at {:?}, try to delete it or rename it",
db_path
.as_ref()
.canonicalize()
.unwrap_or_else(|_| db_path.as_ref().to_owned())
)
} else if !snapshot_path.as_ref().exists() && !ignore_missing_snapshot {
bail!(
"snapshot doesn't exist at {:?}",
snapshot_path
.as_ref()
.canonicalize()
.unwrap_or_else(|_| snapshot_path.as_ref().to_owned())
)
} else {
Ok(())
}
}
#[cfg(test)]
mod test {
use std::iter::FromIterator;
use std::{collections::HashSet, sync::Arc};
use futures::future::{err, ok};
use rand::Rng;
use tokio::time::timeout;
use uuid::Uuid;
use super::*;
use crate::index_controller::index_actor::MockIndexActorHandle;
use crate::index_controller::update_actor::{
error::UpdateActorError, MockUpdateActorHandle, UpdateActorHandleImpl,
};
use crate::index_controller::uuid_resolver::{
error::UuidResolverError, MockUuidResolverHandle,
};
#[actix_rt::test]
async fn test_normal() {
let mut rng = rand::thread_rng();
let uuids_num: usize = rng.gen_range(5, 10);
let uuids = (0..uuids_num)
.map(|_| Uuid::new_v4())
.collect::<HashSet<_>>();
let mut uuid_resolver = MockUuidResolverHandle::new();
let uuids_clone = uuids.clone();
uuid_resolver
.expect_snapshot()
.times(1)
.returning(move |_| Box::pin(ok(uuids_clone.clone())));
let uuids_clone = uuids.clone();
let mut index_handle = MockIndexActorHandle::new();
index_handle
.expect_snapshot()
.withf(move |uuid, _path| uuids_clone.contains(uuid))
.times(uuids_num)
.returning(move |_, _| Box::pin(ok(())));
let dir = tempfile::tempdir_in(".").unwrap();
let handle = Arc::new(index_handle);
let update_handle =
UpdateActorHandleImpl::<Vec<u8>>::new(handle.clone(), dir.path(), 4096 * 100).unwrap();
let snapshot_path = tempfile::tempdir_in(".").unwrap();
let snapshot_service = SnapshotService::new(
uuid_resolver,
update_handle,
Duration::from_millis(100),
snapshot_path.path().to_owned(),
"data.ms".to_string(),
);
snapshot_service.perform_snapshot().await.unwrap();
}
#[actix_rt::test]
async fn error_performing_uuid_snapshot() {
let mut uuid_resolver = MockUuidResolverHandle::new();
uuid_resolver
.expect_snapshot()
.times(1)
// arbitrary error
.returning(|_| Box::pin(err(UuidResolverError::NameAlreadyExist)));
let update_handle = MockUpdateActorHandle::new();
let snapshot_path = tempfile::tempdir_in(".").unwrap();
let snapshot_service = SnapshotService::new(
uuid_resolver,
update_handle,
Duration::from_millis(100),
snapshot_path.path().to_owned(),
"data.ms".to_string(),
);
assert!(snapshot_service.perform_snapshot().await.is_err());
// Nothing was written to the file
assert!(!snapshot_path.path().join("data.ms.snapshot").exists());
}
#[actix_rt::test]
async fn error_performing_index_snapshot() {
let uuid = Uuid::new_v4();
let mut uuid_resolver = MockUuidResolverHandle::new();
uuid_resolver
.expect_snapshot()
.times(1)
.returning(move |_| Box::pin(ok(HashSet::from_iter(Some(uuid)))));
let mut update_handle = MockUpdateActorHandle::new();
update_handle
.expect_snapshot()
// arbitrary error
.returning(|_, _| Box::pin(err(UpdateActorError::UnexistingUpdate(0))));
let snapshot_path = tempfile::tempdir_in(".").unwrap();
let snapshot_service = SnapshotService::new(
uuid_resolver,
update_handle,
Duration::from_millis(100),
snapshot_path.path().to_owned(),
"data.ms".to_string(),
);
assert!(snapshot_service.perform_snapshot().await.is_err());
// Nothing was written to the file
assert!(!snapshot_path.path().join("data.ms.snapshot").exists());
}
#[actix_rt::test]
async fn test_loop() {
let mut uuid_resolver = MockUuidResolverHandle::new();
uuid_resolver
.expect_snapshot()
// we expect the function to be called between 2 and 3 times in the given interval.
.times(2..4)
// arbitrary error, to short-circuit the function
.returning(move |_| Box::pin(err(UuidResolverError::NameAlreadyExist)));
let update_handle = MockUpdateActorHandle::new();
let snapshot_path = tempfile::tempdir_in(".").unwrap();
let snapshot_service = SnapshotService::new(
uuid_resolver,
update_handle,
Duration::from_millis(100),
snapshot_path.path().to_owned(),
"data.ms".to_string(),
);
let _ = timeout(Duration::from_millis(300), snapshot_service.run()).await;
}
}

View File

@ -1,270 +0,0 @@
use std::collections::HashSet;
use std::io::SeekFrom;
use std::path::{Path, PathBuf};
use std::sync::atomic::AtomicBool;
use std::sync::Arc;
use async_stream::stream;
use futures::StreamExt;
use log::trace;
use serdeval::*;
use tokio::fs;
use tokio::io::AsyncWriteExt;
use tokio::sync::mpsc;
use uuid::Uuid;
use super::error::{Result, UpdateActorError};
use super::{PayloadData, UpdateMsg, UpdateStore, UpdateStoreInfo};
use crate::index_controller::index_actor::IndexActorHandle;
use crate::index_controller::{UpdateMeta, UpdateStatus};
pub struct UpdateActor<D, I> {
path: PathBuf,
store: Arc<UpdateStore>,
inbox: Option<mpsc::Receiver<UpdateMsg<D>>>,
index_handle: I,
must_exit: Arc<AtomicBool>,
}
impl<D, I> UpdateActor<D, I>
where
D: AsRef<[u8]> + Sized + 'static,
I: IndexActorHandle + Clone + Send + Sync + 'static,
{
pub fn new(
update_db_size: usize,
inbox: mpsc::Receiver<UpdateMsg<D>>,
path: impl AsRef<Path>,
index_handle: I,
) -> anyhow::Result<Self> {
let path = path.as_ref().join("updates");
std::fs::create_dir_all(&path)?;
let mut options = heed::EnvOpenOptions::new();
options.map_size(update_db_size);
let must_exit = Arc::new(AtomicBool::new(false));
let store = UpdateStore::open(options, &path, index_handle.clone(), must_exit.clone())?;
std::fs::create_dir_all(path.join("update_files"))?;
let inbox = Some(inbox);
Ok(Self {
path,
store,
inbox,
index_handle,
must_exit,
})
}
pub async fn run(mut self) {
use UpdateMsg::*;
trace!("Started update actor.");
let mut inbox = self
.inbox
.take()
.expect("A receiver should be present by now.");
let must_exit = self.must_exit.clone();
let stream = stream! {
loop {
let msg = inbox.recv().await;
if must_exit.load(std::sync::atomic::Ordering::Relaxed) {
break;
}
match msg {
Some(msg) => yield msg,
None => break,
}
}
};
stream
.for_each_concurrent(Some(10), |msg| async {
match msg {
Update {
uuid,
meta,
data,
ret,
} => {
let _ = ret.send(self.handle_update(uuid, meta, data).await);
}
ListUpdates { uuid, ret } => {
let _ = ret.send(self.handle_list_updates(uuid).await);
}
GetUpdate { uuid, ret, id } => {
let _ = ret.send(self.handle_get_update(uuid, id).await);
}
Delete { uuid, ret } => {
let _ = ret.send(self.handle_delete(uuid).await);
}
Snapshot { uuids, path, ret } => {
let _ = ret.send(self.handle_snapshot(uuids, path).await);
}
GetInfo { ret } => {
let _ = ret.send(self.handle_get_info().await);
}
Dump { uuids, path, ret } => {
let _ = ret.send(self.handle_dump(uuids, path).await);
}
}
})
.await;
}
async fn handle_update(
&self,
uuid: Uuid,
meta: UpdateMeta,
payload: mpsc::Receiver<PayloadData<D>>,
) -> Result<UpdateStatus> {
let file_path = match meta {
UpdateMeta::DocumentsAddition { .. } => {
let update_file_id = uuid::Uuid::new_v4();
let path = self
.path
.join(format!("update_files/update_{}", update_file_id));
let mut file = fs::OpenOptions::new()
.read(true)
.write(true)
.create(true)
.open(&path)
.await?;
async fn write_to_file<D>(
file: &mut fs::File,
mut payload: mpsc::Receiver<PayloadData<D>>,
) -> Result<usize>
where
D: AsRef<[u8]> + Sized + 'static,
{
let mut file_len = 0;
while let Some(bytes) = payload.recv().await {
let bytes = bytes?;
file_len += bytes.as_ref().len();
file.write_all(bytes.as_ref()).await?;
}
file.flush().await?;
Ok(file_len)
}
let file_len = write_to_file(&mut file, payload).await;
match file_len {
Ok(len) if len > 0 => {
let file = file.into_std().await;
Some((file, update_file_id))
}
Err(e) => {
fs::remove_file(&path).await?;
return Err(e);
}
_ => {
fs::remove_file(&path).await?;
None
}
}
}
_ => None,
};
let update_store = self.store.clone();
tokio::task::spawn_blocking(move || {
use std::io::{BufReader, Seek};
// If the payload is empty, ignore the check.
let update_uuid = if let Some((mut file, uuid)) = file_path {
// set the file back to the beginning
file.seek(SeekFrom::Start(0))?;
// Check that the json payload is valid:
let reader = BufReader::new(&mut file);
// Validate that the payload is in the correct format.
let _: Seq<Map<Str, Any>> = serde_json::from_reader(reader)
.map_err(|e| UpdateActorError::InvalidPayload(Box::new(e)))?;
Some(uuid)
} else {
None
};
// The payload is valid, we can register it to the update store.
let status = update_store
.register_update(meta, update_uuid, uuid)
.map(UpdateStatus::Enqueued)?;
Ok(status)
})
.await?
}
async fn handle_list_updates(&self, uuid: Uuid) -> Result<Vec<UpdateStatus>> {
let update_store = self.store.clone();
tokio::task::spawn_blocking(move || {
let result = update_store.list(uuid)?;
Ok(result)
})
.await?
}
async fn handle_get_update(&self, uuid: Uuid, id: u64) -> Result<UpdateStatus> {
let store = self.store.clone();
tokio::task::spawn_blocking(move || {
let result = store
.meta(uuid, id)?
.ok_or(UpdateActorError::UnexistingUpdate(id))?;
Ok(result)
})
.await?
}
async fn handle_delete(&self, uuid: Uuid) -> Result<()> {
let store = self.store.clone();
tokio::task::spawn_blocking(move || store.delete_all(uuid)).await??;
Ok(())
}
async fn handle_snapshot(&self, uuids: HashSet<Uuid>, path: PathBuf) -> Result<()> {
let index_handle = self.index_handle.clone();
let update_store = self.store.clone();
tokio::task::spawn_blocking(move || update_store.snapshot(&uuids, &path, index_handle))
.await??;
Ok(())
}
async fn handle_dump(&self, uuids: HashSet<Uuid>, path: PathBuf) -> Result<()> {
let index_handle = self.index_handle.clone();
let update_store = self.store.clone();
tokio::task::spawn_blocking(move || -> Result<()> {
update_store.dump(&uuids, path.to_path_buf(), index_handle)?;
Ok(())
})
.await??;
Ok(())
}
async fn handle_get_info(&self) -> Result<UpdateStoreInfo> {
let update_store = self.store.clone();
let info = tokio::task::spawn_blocking(move || -> Result<UpdateStoreInfo> {
let info = update_store.get_info()?;
Ok(info)
})
.await??;
Ok(info)
}
}

View File

@ -1,61 +0,0 @@
use std::error::Error;
use meilisearch_error::{Code, ErrorCode};
use crate::index_controller::index_actor::error::IndexActorError;
pub type Result<T> = std::result::Result<T, UpdateActorError>;
#[derive(Debug, thiserror::Error)]
#[allow(clippy::large_enum_variant)]
pub enum UpdateActorError {
#[error("Update {0} not found.")]
UnexistingUpdate(u64),
#[error("Internal error: {0}")]
Internal(Box<dyn Error + Send + Sync + 'static>),
#[error("{0}")]
IndexActor(#[from] IndexActorError),
#[error(
"update store was shut down due to a fatal error, please check your logs for more info."
)]
FatalUpdateStoreError,
#[error("{0}")]
InvalidPayload(Box<dyn Error + Send + Sync + 'static>),
#[error("{0}")]
PayloadError(#[from] actix_web::error::PayloadError),
}
impl<T> From<tokio::sync::mpsc::error::SendError<T>> for UpdateActorError {
fn from(_: tokio::sync::mpsc::error::SendError<T>) -> Self {
Self::FatalUpdateStoreError
}
}
impl From<tokio::sync::oneshot::error::RecvError> for UpdateActorError {
fn from(_: tokio::sync::oneshot::error::RecvError) -> Self {
Self::FatalUpdateStoreError
}
}
internal_error!(
UpdateActorError: heed::Error,
std::io::Error,
serde_json::Error,
tokio::task::JoinError
);
impl ErrorCode for UpdateActorError {
fn error_code(&self) -> Code {
match self {
UpdateActorError::UnexistingUpdate(_) => Code::NotFound,
UpdateActorError::Internal(_) => Code::Internal,
UpdateActorError::IndexActor(e) => e.error_code(),
UpdateActorError::FatalUpdateStoreError => Code::Internal,
UpdateActorError::InvalidPayload(_) => Code::BadRequest,
UpdateActorError::PayloadError(error) => match error {
actix_http::error::PayloadError::Overflow => Code::PayloadTooLarge,
_ => Code::Internal,
},
}
}
}

View File

@ -1,103 +0,0 @@
use std::collections::HashSet;
use std::path::{Path, PathBuf};
use tokio::sync::{mpsc, oneshot};
use uuid::Uuid;
use crate::index_controller::{IndexActorHandle, UpdateStatus};
use super::error::Result;
use super::{PayloadData, UpdateActor, UpdateActorHandle, UpdateMeta, UpdateMsg, UpdateStoreInfo};
#[derive(Clone)]
pub struct UpdateActorHandleImpl<D> {
sender: mpsc::Sender<UpdateMsg<D>>,
}
impl<D> UpdateActorHandleImpl<D>
where
D: AsRef<[u8]> + Sized + 'static + Sync + Send,
{
pub fn new<I>(
index_handle: I,
path: impl AsRef<Path>,
update_store_size: usize,
) -> anyhow::Result<Self>
where
I: IndexActorHandle + Clone + Send + Sync + 'static,
{
let path = path.as_ref().to_owned();
let (sender, receiver) = mpsc::channel(100);
let actor = UpdateActor::new(update_store_size, receiver, path, index_handle)?;
tokio::task::spawn(actor.run());
Ok(Self { sender })
}
}
#[async_trait::async_trait]
impl<D> UpdateActorHandle for UpdateActorHandleImpl<D>
where
D: AsRef<[u8]> + Sized + 'static + Sync + Send,
{
type Data = D;
async fn get_all_updates_status(&self, uuid: Uuid) -> Result<Vec<UpdateStatus>> {
let (ret, receiver) = oneshot::channel();
let msg = UpdateMsg::ListUpdates { uuid, ret };
self.sender.send(msg).await?;
receiver.await?
}
async fn update_status(&self, uuid: Uuid, id: u64) -> Result<UpdateStatus> {
let (ret, receiver) = oneshot::channel();
let msg = UpdateMsg::GetUpdate { uuid, id, ret };
self.sender.send(msg).await?;
receiver.await?
}
async fn delete(&self, uuid: Uuid) -> Result<()> {
let (ret, receiver) = oneshot::channel();
let msg = UpdateMsg::Delete { uuid, ret };
self.sender.send(msg).await?;
receiver.await?
}
async fn snapshot(&self, uuids: HashSet<Uuid>, path: PathBuf) -> Result<()> {
let (ret, receiver) = oneshot::channel();
let msg = UpdateMsg::Snapshot { uuids, path, ret };
self.sender.send(msg).await?;
receiver.await?
}
async fn dump(&self, uuids: HashSet<Uuid>, path: PathBuf) -> Result<()> {
let (ret, receiver) = oneshot::channel();
let msg = UpdateMsg::Dump { uuids, path, ret };
self.sender.send(msg).await?;
receiver.await?
}
async fn get_info(&self) -> Result<UpdateStoreInfo> {
let (ret, receiver) = oneshot::channel();
let msg = UpdateMsg::GetInfo { ret };
self.sender.send(msg).await?;
receiver.await?
}
async fn update(
&self,
meta: UpdateMeta,
data: mpsc::Receiver<PayloadData<Self::Data>>,
uuid: Uuid,
) -> Result<UpdateStatus> {
let (ret, receiver) = oneshot::channel();
let msg = UpdateMsg::Update {
uuid,
data,
meta,
ret,
};
self.sender.send(msg).await?;
receiver.await?
}
}

View File

@ -1,43 +0,0 @@
use std::collections::HashSet;
use std::path::PathBuf;
use tokio::sync::{mpsc, oneshot};
use uuid::Uuid;
use super::error::Result;
use super::{PayloadData, UpdateMeta, UpdateStatus, UpdateStoreInfo};
pub enum UpdateMsg<D> {
Update {
uuid: Uuid,
meta: UpdateMeta,
data: mpsc::Receiver<PayloadData<D>>,
ret: oneshot::Sender<Result<UpdateStatus>>,
},
ListUpdates {
uuid: Uuid,
ret: oneshot::Sender<Result<Vec<UpdateStatus>>>,
},
GetUpdate {
uuid: Uuid,
ret: oneshot::Sender<Result<UpdateStatus>>,
id: u64,
},
Delete {
uuid: Uuid,
ret: oneshot::Sender<Result<()>>,
},
Snapshot {
uuids: HashSet<Uuid>,
path: PathBuf,
ret: oneshot::Sender<Result<()>>,
},
Dump {
uuids: HashSet<Uuid>,
path: PathBuf,
ret: oneshot::Sender<Result<()>>,
},
GetInfo {
ret: oneshot::Sender<Result<UpdateStoreInfo>>,
},
}

View File

@ -1,44 +0,0 @@
use std::{collections::HashSet, path::PathBuf};
use actix_http::error::PayloadError;
use tokio::sync::mpsc;
use uuid::Uuid;
use crate::index_controller::{UpdateMeta, UpdateStatus};
use actor::UpdateActor;
use error::Result;
use message::UpdateMsg;
pub use handle_impl::UpdateActorHandleImpl;
pub use store::{UpdateStore, UpdateStoreInfo};
mod actor;
pub mod error;
mod handle_impl;
mod message;
pub mod store;
type PayloadData<D> = std::result::Result<D, PayloadError>;
#[cfg(test)]
use mockall::automock;
#[async_trait::async_trait]
#[cfg_attr(test, automock(type Data=Vec<u8>;))]
pub trait UpdateActorHandle {
type Data: AsRef<[u8]> + Sized + 'static + Sync + Send;
async fn get_all_updates_status(&self, uuid: Uuid) -> Result<Vec<UpdateStatus>>;
async fn update_status(&self, uuid: Uuid, id: u64) -> Result<UpdateStatus>;
async fn delete(&self, uuid: Uuid) -> Result<()>;
async fn snapshot(&self, uuid: HashSet<Uuid>, path: PathBuf) -> Result<()>;
async fn dump(&self, uuids: HashSet<Uuid>, path: PathBuf) -> Result<()>;
async fn get_info(&self) -> Result<UpdateStoreInfo>;
async fn update(
&self,
meta: UpdateMeta,
data: mpsc::Receiver<PayloadData<Self::Data>>,
uuid: Uuid,
) -> Result<UpdateStatus>;
}

View File

@ -1,184 +0,0 @@
use std::{
collections::HashSet,
fs::{create_dir_all, File},
io::{BufRead, BufReader, Write},
path::{Path, PathBuf},
};
use heed::{EnvOpenOptions, RoTxn};
use serde::{Deserialize, Serialize};
use uuid::Uuid;
use super::{Result, State, UpdateStore};
use crate::index_controller::{
index_actor::IndexActorHandle, update_actor::store::update_uuid_to_file_path, Enqueued,
UpdateStatus,
};
#[derive(Serialize, Deserialize)]
struct UpdateEntry {
uuid: Uuid,
update: UpdateStatus,
}
impl UpdateStore {
pub fn dump(
&self,
uuids: &HashSet<Uuid>,
path: PathBuf,
handle: impl IndexActorHandle,
) -> Result<()> {
let state_lock = self.state.write();
state_lock.swap(State::Dumping);
// txn must *always* be acquired after the state lock, or it will deadlock.
let txn = self.env.write_txn()?;
let dump_path = path.join("updates");
create_dir_all(&dump_path)?;
self.dump_updates(&txn, uuids, &dump_path)?;
let fut = dump_indexes(uuids, handle, &path);
tokio::runtime::Handle::current().block_on(fut)?;
state_lock.swap(State::Idle);
Ok(())
}
fn dump_updates(
&self,
txn: &RoTxn,
uuids: &HashSet<Uuid>,
path: impl AsRef<Path>,
) -> Result<()> {
let dump_data_path = path.as_ref().join("data.jsonl");
let mut dump_data_file = File::create(dump_data_path)?;
let update_files_path = path.as_ref().join(super::UPDATE_DIR);
create_dir_all(&update_files_path)?;
self.dump_pending(txn, uuids, &mut dump_data_file, &path)?;
self.dump_completed(txn, uuids, &mut dump_data_file)?;
Ok(())
}
fn dump_pending(
&self,
txn: &RoTxn,
uuids: &HashSet<Uuid>,
mut file: &mut File,
dst_path: impl AsRef<Path>,
) -> Result<()> {
let pendings = self.pending_queue.iter(txn)?.lazily_decode_data();
for pending in pendings {
let ((_, uuid, _), data) = pending?;
if uuids.contains(&uuid) {
let update = data.decode()?;
if let Some(ref update_uuid) = update.content {
let src = super::update_uuid_to_file_path(&self.path, *update_uuid);
let dst = super::update_uuid_to_file_path(&dst_path, *update_uuid);
std::fs::copy(src, dst)?;
}
let update_json = UpdateEntry {
uuid,
update: update.into(),
};
serde_json::to_writer(&mut file, &update_json)?;
file.write_all(b"\n")?;
}
}
Ok(())
}
fn dump_completed(
&self,
txn: &RoTxn,
uuids: &HashSet<Uuid>,
mut file: &mut File,
) -> Result<()> {
let updates = self.updates.iter(txn)?.lazily_decode_data();
for update in updates {
let ((uuid, _), data) = update?;
if uuids.contains(&uuid) {
let update = data.decode()?;
let update_json = UpdateEntry { uuid, update };
serde_json::to_writer(&mut file, &update_json)?;
file.write_all(b"\n")?;
}
}
Ok(())
}
pub fn load_dump(
src: impl AsRef<Path>,
dst: impl AsRef<Path>,
db_size: usize,
) -> anyhow::Result<()> {
let dst_update_path = dst.as_ref().join("updates/");
create_dir_all(&dst_update_path)?;
let mut options = EnvOpenOptions::new();
options.map_size(db_size as usize);
let (store, _) = UpdateStore::new(options, &dst_update_path)?;
let src_update_path = src.as_ref().join("updates");
let update_data = File::open(&src_update_path.join("data.jsonl"))?;
let mut update_data = BufReader::new(update_data);
std::fs::create_dir_all(dst_update_path.join("update_files/"))?;
let mut wtxn = store.env.write_txn()?;
let mut line = String::new();
loop {
match update_data.read_line(&mut line) {
Ok(0) => break,
Ok(_) => {
let UpdateEntry { uuid, update } = serde_json::from_str(&line)?;
store.register_raw_updates(&mut wtxn, &update, uuid)?;
// Copy the associated update file if it exists
if let UpdateStatus::Enqueued(Enqueued {
content: Some(uuid),
..
}) = update
{
let src = update_uuid_to_file_path(&src_update_path, uuid);
let dst = update_uuid_to_file_path(&dst_update_path, uuid);
std::fs::copy(src, dst)?;
}
}
_ => break,
}
line.clear();
}
wtxn.commit()?;
Ok(())
}
}
async fn dump_indexes(
uuids: &HashSet<Uuid>,
handle: impl IndexActorHandle,
path: impl AsRef<Path>,
) -> Result<()> {
for uuid in uuids {
handle.dump(*uuid, path.as_ref().to_owned()).await?;
}
Ok(())
}

View File

@ -1,98 +0,0 @@
use std::{collections::HashSet, path::PathBuf};
use log::{trace, warn};
use tokio::sync::mpsc;
use uuid::Uuid;
use super::{error::UuidResolverError, Result, UuidResolveMsg, UuidStore};
pub struct UuidResolverActor<S> {
inbox: mpsc::Receiver<UuidResolveMsg>,
store: S,
}
impl<S: UuidStore> UuidResolverActor<S> {
pub fn new(inbox: mpsc::Receiver<UuidResolveMsg>, store: S) -> Self {
Self { inbox, store }
}
pub async fn run(mut self) {
use UuidResolveMsg::*;
trace!("uuid resolver started");
loop {
match self.inbox.recv().await {
Some(Get { uid: name, ret }) => {
let _ = ret.send(self.handle_get(name).await);
}
Some(Delete { uid: name, ret }) => {
let _ = ret.send(self.handle_delete(name).await);
}
Some(List { ret }) => {
let _ = ret.send(self.handle_list().await);
}
Some(Insert { ret, uuid, name }) => {
let _ = ret.send(self.handle_insert(name, uuid).await);
}
Some(SnapshotRequest { path, ret }) => {
let _ = ret.send(self.handle_snapshot(path).await);
}
Some(GetSize { ret }) => {
let _ = ret.send(self.handle_get_size().await);
}
Some(DumpRequest { path, ret }) => {
let _ = ret.send(self.handle_dump(path).await);
}
// all senders have been dropped, need to quit.
None => break,
}
}
warn!("exiting uuid resolver loop");
}
async fn handle_get(&self, uid: String) -> Result<Uuid> {
self.store
.get_uuid(uid.clone())
.await?
.ok_or(UuidResolverError::UnexistingIndex(uid))
}
async fn handle_delete(&self, uid: String) -> Result<Uuid> {
self.store
.delete(uid.clone())
.await?
.ok_or(UuidResolverError::UnexistingIndex(uid))
}
async fn handle_list(&self) -> Result<Vec<(String, Uuid)>> {
let result = self.store.list().await?;
Ok(result)
}
async fn handle_snapshot(&self, path: PathBuf) -> Result<HashSet<Uuid>> {
self.store.snapshot(path).await
}
async fn handle_dump(&self, path: PathBuf) -> Result<HashSet<Uuid>> {
self.store.dump(path).await
}
async fn handle_insert(&self, uid: String, uuid: Uuid) -> Result<()> {
if !is_index_uid_valid(&uid) {
return Err(UuidResolverError::BadlyFormatted(uid));
}
self.store.insert(uid, uuid).await?;
Ok(())
}
async fn handle_get_size(&self) -> Result<u64> {
self.store.get_size().await
}
}
fn is_index_uid_valid(uid: &str) -> bool {
uid.chars()
.all(|x| x.is_ascii_alphanumeric() || x == '-' || x == '_')
}

View File

@ -1,34 +0,0 @@
use meilisearch_error::{Code, ErrorCode};
pub type Result<T> = std::result::Result<T, UuidResolverError>;
#[derive(Debug, thiserror::Error)]
pub enum UuidResolverError {
#[error("Index already exists.")]
NameAlreadyExist,
#[error("Index \"{0}\" not found.")]
UnexistingIndex(String),
#[error("Index must have a valid uid; Index uid can be of type integer or string only composed of alphanumeric characters, hyphens (-) and underscores (_).")]
BadlyFormatted(String),
#[error("Internal error: {0}")]
Internal(Box<dyn std::error::Error + Sync + Send + 'static>),
}
internal_error!(
UuidResolverError: heed::Error,
uuid::Error,
std::io::Error,
tokio::task::JoinError,
serde_json::Error
);
impl ErrorCode for UuidResolverError {
fn error_code(&self) -> Code {
match self {
UuidResolverError::NameAlreadyExist => Code::IndexAlreadyExists,
UuidResolverError::UnexistingIndex(_) => Code::IndexNotFound,
UuidResolverError::BadlyFormatted(_) => Code::InvalidIndexUid,
UuidResolverError::Internal(_) => Code::Internal,
}
}
}

View File

@ -1,87 +0,0 @@
use std::collections::HashSet;
use std::path::{Path, PathBuf};
use tokio::sync::{mpsc, oneshot};
use uuid::Uuid;
use super::{HeedUuidStore, Result, UuidResolveMsg, UuidResolverActor, UuidResolverHandle};
#[derive(Clone)]
pub struct UuidResolverHandleImpl {
sender: mpsc::Sender<UuidResolveMsg>,
}
impl UuidResolverHandleImpl {
pub fn new(path: impl AsRef<Path>) -> Result<Self> {
let (sender, receiver) = mpsc::channel(100);
let store = HeedUuidStore::new(path)?;
let actor = UuidResolverActor::new(receiver, store);
tokio::spawn(actor.run());
Ok(Self { sender })
}
}
#[async_trait::async_trait]
impl UuidResolverHandle for UuidResolverHandleImpl {
async fn get(&self, name: String) -> Result<Uuid> {
let (ret, receiver) = oneshot::channel();
let msg = UuidResolveMsg::Get { uid: name, ret };
let _ = self.sender.send(msg).await;
Ok(receiver
.await
.expect("Uuid resolver actor has been killed")?)
}
async fn delete(&self, name: String) -> Result<Uuid> {
let (ret, receiver) = oneshot::channel();
let msg = UuidResolveMsg::Delete { uid: name, ret };
let _ = self.sender.send(msg).await;
Ok(receiver
.await
.expect("Uuid resolver actor has been killed")?)
}
async fn list(&self) -> Result<Vec<(String, Uuid)>> {
let (ret, receiver) = oneshot::channel();
let msg = UuidResolveMsg::List { ret };
let _ = self.sender.send(msg).await;
Ok(receiver
.await
.expect("Uuid resolver actor has been killed")?)
}
async fn insert(&self, name: String, uuid: Uuid) -> Result<()> {
let (ret, receiver) = oneshot::channel();
let msg = UuidResolveMsg::Insert { ret, name, uuid };
let _ = self.sender.send(msg).await;
Ok(receiver
.await
.expect("Uuid resolver actor has been killed")?)
}
async fn snapshot(&self, path: PathBuf) -> Result<HashSet<Uuid>> {
let (ret, receiver) = oneshot::channel();
let msg = UuidResolveMsg::SnapshotRequest { path, ret };
let _ = self.sender.send(msg).await;
Ok(receiver
.await
.expect("Uuid resolver actor has been killed")?)
}
async fn get_size(&self) -> Result<u64> {
let (ret, receiver) = oneshot::channel();
let msg = UuidResolveMsg::GetSize { ret };
let _ = self.sender.send(msg).await;
Ok(receiver
.await
.expect("Uuid resolver actor has been killed")?)
}
async fn dump(&self, path: PathBuf) -> Result<HashSet<Uuid>> {
let (ret, receiver) = oneshot::channel();
let msg = UuidResolveMsg::DumpRequest { ret, path };
let _ = self.sender.send(msg).await;
Ok(receiver
.await
.expect("Uuid resolver actor has been killed")?)
}
}

View File

@ -1,35 +0,0 @@
mod actor;
pub mod error;
mod handle_impl;
mod message;
pub mod store;
use std::collections::HashSet;
use std::path::PathBuf;
use uuid::Uuid;
use actor::UuidResolverActor;
use error::Result;
use message::UuidResolveMsg;
use store::UuidStore;
#[cfg(test)]
use mockall::automock;
pub use handle_impl::UuidResolverHandleImpl;
pub use store::HeedUuidStore;
const UUID_STORE_SIZE: usize = 1_073_741_824; //1GiB
#[async_trait::async_trait]
#[cfg_attr(test, automock)]
pub trait UuidResolverHandle {
async fn get(&self, name: String) -> Result<Uuid>;
async fn insert(&self, name: String, uuid: Uuid) -> Result<()>;
async fn delete(&self, name: String) -> Result<Uuid>;
async fn list(&self) -> Result<Vec<(String, Uuid)>>;
async fn snapshot(&self, path: PathBuf) -> Result<HashSet<Uuid>>;
async fn get_size(&self) -> Result<u64>;
async fn dump(&self, path: PathBuf) -> Result<HashSet<Uuid>>;
}

View File

@ -1,87 +1,138 @@
//! # MeiliSearch
//! Hello there, future contributors. If you are here and see this code, it's probably because you want to add a super new fancy feature in MeiliSearch or fix a bug and first of all, thank you for that!
//!
//! To help you in this task, we'll try to do a little overview of the project.
//! ## Milli
//! [Milli](https://github.com/meilisearch/milli) is the core library of MeiliSearch. It's where we actually index documents and perform searches. Its purpose is to do these two tasks as fast as possible. You can give an update to milli, and it'll use as many cores as provided to perform it as fast as possible. Nothing more. You can perform searches at the same time (search only uses one core).
//! As you can see, we're missing quite a lot of features here; milli does not handle multiple indexes, it can't queue updates, it doesn't provide any web / API frontend, it doesn't implement dumps or snapshots, etc...
//!
//! ## `Index` module
//! The [index] module is what encapsulates one milli index. It abstracts over its transaction and isolates a task that can be run in a thread. This is the unit of interaction with milli.
//! If you add a feature to milli, you'll probably need to add it in this module too before exposing it to the rest of meilisearch.
//!
//! ## `IndexController` module
//! To handle multiple indexes, we created an [index_controller]. It's in charge of creating new indexes, keeping references to all its indexes, forwarding asynchronous updates to its indexes, and providing an API to search in its indexes synchronously.
//! To achieve this goal, we use an [actor model](https://en.wikipedia.org/wiki/Actor_model).
//!
//! ### The actor model
//! Every actor is composed of at least three files:
//! - `mod.rs` declares and imports all the files used by the actor. It also describes the interface (i.e. all the methods) used to interact with the actor. If you are not modifying anything inside an actor, this is usually all you need to see.
//! - `handle_impl.rs` implements the interface described in `mod.rs`; in reality, there is no code logic in this file. Every method only wraps its parameters in a structure that is sent to the actor. This is useful for testing and futureproofing.
//! - `message.rs` contains an enum that describes all the interactions you can have with the actor.
//! - `actor.rs` is used to create and execute the actor. It's where we write the loop that looks for new messages and actually performs the tasks.
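//!
//! To make this layout concrete, here is a deliberately simplified, hypothetical actor (the `PingMsg`, `PingActor` and `PingHandle` names are made up for illustration and are not part of MeiliSearch): the message carries a `oneshot` channel used to send the answer back, the actor loops over its `mpsc` inbox, and the handle only wraps its parameters into a message.
//!
//! ```ignore
//! use tokio::sync::{mpsc, oneshot};
//!
//! // message.rs: every variant carries a `ret` channel to send the answer back.
//! enum PingMsg {
//!     Ping { ret: oneshot::Sender<String> },
//! }
//!
//! // actor.rs: owns the receiving end of the channel and loops over incoming messages.
//! struct PingActor {
//!     inbox: mpsc::Receiver<PingMsg>,
//! }
//!
//! impl PingActor {
//!     async fn run(mut self) {
//!         while let Some(msg) = self.inbox.recv().await {
//!             match msg {
//!                 PingMsg::Ping { ret } => {
//!                     let _ = ret.send("pong".to_string());
//!                 }
//!             }
//!         }
//!     }
//! }
//!
//! // handle_impl.rs: the handle has no logic, it only forwards messages to the actor.
//! #[derive(Clone)]
//! struct PingHandle {
//!     sender: mpsc::Sender<PingMsg>,
//! }
//!
//! impl PingHandle {
//!     fn new() -> Self {
//!         let (sender, inbox) = mpsc::channel(100);
//!         tokio::spawn(PingActor { inbox }.run());
//!         Self { sender }
//!     }
//!
//!     async fn ping(&self) -> String {
//!         let (ret, receiver) = oneshot::channel();
//!         let _ = self.sender.send(PingMsg::Ping { ret }).await;
//!         receiver.await.expect("ping actor has been killed")
//!     }
//! }
//! ```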
//!
//! MeiliSearch currently uses four actors:
//! - [`uuid_resolver`](index_controller/uuid_resolver/index.html) holds the association between the user-provided index names and the internal [`uuid`](https://en.wikipedia.org/wiki/Universally_unique_identifier) representation we use.
//! - [`index_actor`](index_controller::index_actor) is our representation of multiple indexes. Any request made to MeiliSearch that needs to talk to milli will pass through this actor.
//! - [`update_actor`](index_controller/update_actor/index.html) is in charge of index updates. Since updates can take a long time to receive and process, we need to (see the sketch after this list):
//! 1. Store them as fast as possible so we can continue to receive other updates even if nothing has been processed
//! 2. Feed the `index_actor` with a new update every time it finishes its current job.
//! - [`dump_actor`](index_controller/dump_actor/index.html) this actor handles the [dumps](https://docs.meilisearch.com/reference/api/dump.html). It needs to contact all the other actors and create a dump of everything that is currently happening.
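//!
//! As a simplified sketch of the two points above (hypothetical code, not the real `UpdateStore`, which persists updates on disk): registering an update only enqueues it, while a single worker drains the queue in order.
//!
//! ```ignore
//! use std::collections::VecDeque;
//! use std::sync::{Arc, Mutex};
//! use tokio::sync::Notify;
//!
//! #[derive(Default)]
//! struct UpdateQueue {
//!     pending: Mutex<VecDeque<String>>,
//!     wake: Notify,
//! }
//!
//! impl UpdateQueue {
//!     // Registering is cheap: push the update and wake the worker.
//!     fn register(&self, update: String) {
//!         self.pending.lock().unwrap().push_back(update);
//!         self.wake.notify_one();
//!     }
//!
//!     // A single worker feeds the index with one update at a time, in order.
//!     async fn run_worker(self: Arc<Self>) {
//!         loop {
//!             let next = self.pending.lock().unwrap().pop_front();
//!             match next {
//!                 Some(update) => println!("processing {}", update),
//!                 None => self.wake.notified().await,
//!             }
//!         }
//!     }
//! }
//! ```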
//!
//! ## Data module
//! The [data] module provides a unified interface to communicate with the index controller and other services (snapshots, dumps, ...) and initializes the MeiliSearch instance.
//!
//! ## HTTP server
//! To handle the web and API part, we are using [actix-web](https://docs.rs/actix-web/); you can find all routes in the [routes] module.
//! Currently, the configuration of actix-web is made in the [lib.rs](crate).
//! Most of the routes use [extractors] to handle the authentication.
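//!
//! As a hypothetical illustration of how a route is wired on the actix-web `ServiceConfig` (the `health` handler below is made up, not a real MeiliSearch route):
//!
//! ```ignore
//! use actix_web::{web, HttpResponse};
//!
//! async fn health() -> HttpResponse {
//!     HttpResponse::Ok().body("ok")
//! }
//!
//! pub fn configure(cfg: &mut web::ServiceConfig) {
//!     // Registered on the ServiceConfig, in the same spirit as `routes::configure`.
//!     cfg.service(web::resource("/health").route(web::get().to(health)));
//! }
//! ```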
#![allow(rustdoc::private_intra_doc_links)]
pub mod data;
#[macro_use]
pub mod error;
#[macro_use]
pub mod extractors;
pub mod helpers;
mod index;
mod index_controller;
pub mod option;
pub mod routes;
#[cfg(all(not(debug_assertions), feature = "analytics"))]
pub mod analytics;
pub mod helpers;
pub mod option;
pub mod routes;
use std::path::Path;
use std::time::Duration;
use crate::error::MeilisearchHttpError;
use crate::extractors::authentication::AuthConfig;
pub use self::data::Data;
use actix_web::error::JsonPayloadError;
use error::PayloadError;
use http::header::CONTENT_TYPE;
pub use option::Opt;
use actix_web::web;
use actix_web::{web, HttpRequest};
use extractors::authentication::policies::*;
use extractors::payload::PayloadConfig;
use meilisearch_lib::MeiliSearch;
use sha2::Digest;
pub fn configure_data(config: &mut web::ServiceConfig, data: Data) {
let http_payload_size_limit = data.http_payload_size_limit();
#[derive(Clone)]
pub struct ApiKeys {
pub public: Option<String>,
pub private: Option<String>,
pub master: Option<String>,
}
impl ApiKeys {
pub fn generate_missing_api_keys(&mut self) {
if let Some(master_key) = &self.master {
if self.private.is_none() {
let key = format!("{}-private", master_key);
let sha = sha2::Sha256::digest(key.as_bytes());
self.private = Some(format!("{:x}", sha));
}
if self.public.is_none() {
let key = format!("{}-public", master_key);
let sha = sha2::Sha256::digest(key.as_bytes());
self.public = Some(format!("{:x}", sha));
}
}
}
}
pub fn setup_meilisearch(opt: &Opt) -> anyhow::Result<MeiliSearch> {
let mut meilisearch = MeiliSearch::builder();
meilisearch
.set_max_index_size(opt.max_index_size.get_bytes() as usize)
.set_max_update_store_size(opt.max_udb_size.get_bytes() as usize)
.set_ignore_missing_snapshot(opt.ignore_missing_snapshot)
.set_ignore_snapshot_if_db_exists(opt.ignore_snapshot_if_db_exists)
.set_dump_dst(opt.dumps_dir.clone())
.set_snapshot_interval(Duration::from_secs(opt.snapshot_interval_sec))
.set_snapshot_dir(opt.snapshot_dir.clone());
if let Some(ref path) = opt.import_snapshot {
meilisearch.set_import_snapshot(path.clone());
}
if let Some(ref path) = opt.import_dump {
meilisearch.set_dump_src(path.clone());
}
if opt.schedule_snapshot {
meilisearch.set_schedule_snapshot();
}
meilisearch.build(opt.db_path.clone(), opt.indexer_options.clone())
}
/// Cleans and sets up the temporary file folder in the database directory. This must be done after
/// the meilisearch instance has been created, so as not to interfere with the snapshot and dump loading.
pub fn setup_temp_dir(db_path: impl AsRef<Path>) -> anyhow::Result<()> {
// Set the tempfile directory in the current db path, to avoid cross-device references. Also
// remove any previously outstanding files found there
//
// TODO: if two processes open the same db, one might delete the other's tmpdir. We need to make
// sure that no one is using it before deleting it.
let temp_path = db_path.as_ref().join("tmp");
// Ignore error if tempdir doesn't exist
let _ = std::fs::remove_dir_all(&temp_path);
std::fs::create_dir_all(&temp_path)?;
if cfg!(windows) {
std::env::set_var("TMP", temp_path);
} else {
std::env::set_var("TMPDIR", temp_path);
}
Ok(())
}
pub fn configure_data(config: &mut web::ServiceConfig, data: MeiliSearch, opt: &Opt) {
let http_payload_size_limit = opt.http_payload_size_limit.get_bytes() as usize;
config
.data(data.clone())
.app_data(data)
.app_data(
web::JsonConfig::default()
.limit(http_payload_size_limit)
.content_type(|_mime| true) // Accept all mime types
.error_handler(|err, _req| error::payload_error_handler(err).into()),
.content_type(|mime| mime == mime::APPLICATION_JSON)
.error_handler(|err, req: &HttpRequest| match err {
JsonPayloadError::ContentType => match req.headers().get(CONTENT_TYPE) {
Some(content_type) => MeilisearchHttpError::InvalidContentType(
content_type.to_str().unwrap_or("unknown").to_string(),
vec![mime::APPLICATION_JSON.to_string()],
)
.into(),
None => MeilisearchHttpError::MissingContentType(vec![
mime::APPLICATION_JSON.to_string(),
])
.into(),
},
err => PayloadError::from(err).into(),
}),
)
.app_data(PayloadConfig::new(http_payload_size_limit))
.app_data(
web::QueryConfig::default()
.error_handler(|err, _req| error::payload_error_handler(err).into()),
web::QueryConfig::default().error_handler(|err, _req| PayloadError::from(err).into()),
);
}
pub fn configure_auth(config: &mut web::ServiceConfig, data: &Data) {
let keys = data.api_keys();
pub fn configure_auth(config: &mut web::ServiceConfig, opts: &Opt) {
let mut keys = ApiKeys {
master: opts.master_key.clone(),
private: None,
public: None,
};
keys.generate_missing_api_keys();
let auth_config = if let Some(ref master_key) = keys.master {
let private_key = keys.private.as_ref().unwrap();
let public_key = keys.public.as_ref().unwrap();
@ -97,7 +148,7 @@ pub fn configure_auth(config: &mut web::ServiceConfig, data: &Data) {
AuthConfig::NoAuth
};
config.app_data(auth_config);
config.app_data(auth_config).app_data(keys);
}
#[cfg(feature = "mini-dashboard")]
@ -111,7 +162,6 @@ pub fn dashboard(config: &mut web::ServiceConfig, enable_frontend: bool) {
if enable_frontend {
let generated = generated::generate();
let mut scope = web::scope("/");
// Generate routes for mini-dashboard assets
for (path, resource) in generated.into_iter() {
let Resource {
@ -123,12 +173,11 @@ pub fn dashboard(config: &mut web::ServiceConfig, enable_frontend: bool) {
web::get().to(move || HttpResponse::Ok().content_type(mime_type).body(data)),
));
} else {
scope = scope.service(web::resource(path).route(
config.service(web::resource(path).route(
web::get().to(move || HttpResponse::Ok().content_type(mime_type).body(data)),
));
}
}
config.service(scope);
} else {
config.service(web::resource("/").route(web::get().to(routes::running)));
}
@ -141,17 +190,18 @@ pub fn dashboard(config: &mut web::ServiceConfig, _enable_frontend: bool) {
#[macro_export]
macro_rules! create_app {
($data:expr, $enable_frontend:expr) => {{
($data:expr, $enable_frontend:expr, $opt:expr) => {{
use actix_cors::Cors;
use actix_web::middleware::TrailingSlash;
use actix_web::App;
use actix_web::{middleware, web};
use meilisearch_http::error::{MeilisearchHttpError, ResponseError};
use meilisearch_http::routes;
use meilisearch_http::{configure_auth, configure_data, dashboard};
App::new()
.configure(|s| configure_data(s, $data.clone()))
.configure(|s| configure_auth(s, &$data))
.configure(|s| configure_data(s, $data.clone(), &$opt))
.configure(|s| configure_auth(s, &$opt))
.configure(routes::configure)
.configure(|s| dashboard(s, $enable_frontend))
.wrap(

View File

@ -1,26 +1,19 @@
use std::env;
use actix_web::HttpServer;
use main_error::MainError;
use meilisearch_http::{create_app, Data, Opt};
use meilisearch_http::{create_app, setup_meilisearch, Opt};
use meilisearch_lib::MeiliSearch;
use structopt::StructOpt;
#[cfg(all(not(debug_assertions), feature = "analytics"))]
use meilisearch_http::analytics;
#[cfg(all(not(debug_assertions), feature = "analytics"))]
use std::sync::Arc;
#[cfg(target_os = "linux")]
#[global_allocator]
static ALLOC: jemallocator::Jemalloc = jemallocator::Jemalloc;
#[cfg(all(not(debug_assertions), feature = "analytics"))]
const SENTRY_DSN: &str = "https://5ddfa22b95f241198be2271aaf028653@sentry.io/3060337";
#[actix_web::main]
async fn main() -> Result<(), MainError> {
let opt = Opt::from_args();
static ALLOC: tikv_jemallocator::Jemalloc = tikv_jemallocator::Jemalloc;
/// Does all the setup before MeiliSearch is launched.
fn setup(opt: &Opt) -> anyhow::Result<()> {
let mut log_builder = env_logger::Builder::new();
log_builder.parse_filters(&opt.log_level);
if opt.log_level == "info" {
@ -28,67 +21,53 @@ async fn main() -> Result<(), MainError> {
log_builder.filter_module("milli", log::LevelFilter::Warn);
}
match opt.env.as_ref() {
"production" => {
if opt.master_key.is_none() {
return Err(
"In production mode, the environment variable MEILI_MASTER_KEY is mandatory"
.into(),
);
}
#[cfg(all(not(debug_assertions), feature = "analytics"))]
if !opt.no_analytics {
let logger =
sentry::integrations::log::SentryLogger::with_dest(log_builder.build());
log::set_boxed_logger(Box::new(logger))
.map(|()| log::set_max_level(log::LevelFilter::Info))
.unwrap();
let sentry = sentry::init(sentry::ClientOptions {
release: sentry::release_name!(),
dsn: Some(SENTRY_DSN.parse()?),
before_send: Some(Arc::new(|event| {
if matches!(event.message, Some(ref msg) if msg.to_lowercase().contains("no space left on device"))
{
None
} else {
Some(event)
}
})),
..Default::default()
});
// sentry must stay alive as long as possible
std::mem::forget(sentry);
} else {
log_builder.init();
}
}
"development" => {
log_builder.init();
}
_ => unreachable!(),
}
let data = Data::new(opt.clone())?;
#[cfg(all(not(debug_assertions), feature = "analytics"))]
if !opt.no_analytics {
let analytics_data = data.clone();
let analytics_opt = opt.clone();
tokio::task::spawn(analytics::analytics_sender(analytics_data, analytics_opt));
}
print_launch_resume(&opt, &data);
run_http(data, opt).await?;
log_builder.init();
Ok(())
}
async fn run_http(data: Data, opt: Opt) -> Result<(), Box<dyn std::error::Error>> {
#[actix_web::main]
async fn main() -> anyhow::Result<()> {
let opt = Opt::from_args();
setup(&opt)?;
match opt.env.as_ref() {
"production" => {
if opt.master_key.is_none() {
anyhow::bail!(
"In production mode, the environment variable MEILI_MASTER_KEY is mandatory"
)
}
}
"development" => (),
_ => unreachable!(),
}
let meilisearch = setup_meilisearch(&opt)?;
// Set up the temp directory inside the db folder. This is important, since temporary files
// can't be persisted across filesystem boundaries.
meilisearch_http::setup_temp_dir(&opt.db_path)?;
#[cfg(all(not(debug_assertions), feature = "analytics"))]
if !opt.no_analytics {
let analytics_data = meilisearch.clone();
let analytics_opt = opt.clone();
tokio::task::spawn(analytics::analytics_sender(analytics_data, analytics_opt));
}
print_launch_resume(&opt);
run_http(meilisearch, opt).await?;
Ok(())
}
async fn run_http(data: MeiliSearch, opt: Opt) -> anyhow::Result<()> {
let _enable_dashboard = &opt.env == "development";
let http_server = HttpServer::new(move || create_app!(data, _enable_dashboard))
let opt_clone = opt.clone();
let http_server = HttpServer::new(move || create_app!(data, _enable_dashboard, opt_clone))
// Disabling signals allows the server to terminate immediately when a user enters CTRL-C
.disable_signals();
@ -98,12 +77,12 @@ async fn run_http(data: Data, opt: Opt) -> Result<(), Box<dyn std::error::Error>
.run()
.await?;
} else {
http_server.bind(opt.http_addr)?.run().await?;
http_server.bind(&opt.http_addr)?.run().await?;
}
Ok(())
}
pub fn print_launch_resume(opt: &Opt, data: &Data) {
pub fn print_launch_resume(opt: &Opt) {
let commit_sha = option_env!("VERGEN_GIT_SHA").unwrap_or("unknown");
let commit_date = option_env!("VERGEN_GIT_COMMIT_TIMESTAMP").unwrap_or("unknown");
@ -148,7 +127,7 @@ Anonymous telemetry: \"Enabled\""
eprintln!();
if data.api_keys().master.is_some() {
if opt.master_key.is_some() {
eprintln!("A Master Key has been set. Requests to MeiliSearch won't be authorized unless you provide an authentication key.");
} else {
eprintln!("No master key found; The server will accept unidentified requests. \

View File

@ -1,10 +1,10 @@
use std::fs;
use std::io::{BufReader, Read};
use std::path::PathBuf;
use std::sync::Arc;
use std::{error, fs};
use byte_unit::Byte;
use grenad::CompressionType;
use meilisearch_lib::options::IndexerOpts;
use rustls::internal::pemfile::{certs, pkcs8_private_keys, rsa_private_keys};
use rustls::{
AllowAnyAnonymousOrAuthenticatedClient, AllowAnyAuthenticatedClient, NoClientAuth,
@ -12,74 +12,6 @@ use rustls::{
};
use structopt::StructOpt;
#[derive(Debug, Clone, StructOpt)]
pub struct IndexerOpts {
/// The amount of documents to skip before printing
/// a log regarding the indexing advancement.
#[structopt(long, default_value = "100000")] // 100k
pub log_every_n: usize,
/// Grenad max number of chunks in bytes.
#[structopt(long)]
pub max_nb_chunks: Option<usize>,
/// The maximum amount of memory to use for the Grenad buffer. It is recommended
/// to use something like 80%-90% of the available memory.
///
/// It is automatically split by the number of jobs e.g. if you use 7 jobs
/// and 7 GB of max memory, each thread will use a maximum of 1 GB.
#[structopt(long, default_value = "7 GiB")]
pub max_memory: Byte,
/// Size of the linked hash map cache when indexing.
/// The bigger it is, the faster the indexing is but the more memory it takes.
#[structopt(long, default_value = "500")]
pub linked_hash_map_size: usize,
/// The name of the compression algorithm to use when compressing intermediate
/// Grenad chunks while indexing documents.
///
/// Choosing a fast algorithm will make the indexing faster but may consume more memory.
#[structopt(long, default_value = "snappy", possible_values = &["snappy", "zlib", "lz4", "lz4hc", "zstd"])]
pub chunk_compression_type: CompressionType,
/// The level of compression of the chosen algorithm.
#[structopt(long, requires = "chunk-compression-type")]
pub chunk_compression_level: Option<u32>,
/// The number of bytes to remove from the beginning of the chunks while reading/sorting
/// or merging them.
///
/// File fusing must only be enabled on file systems that support `FALLOC_FL_COLLAPSE_RANGE`
/// (i.e. ext4 and XFS). File fusing will only work if the `enable-chunk-fusing` flag is set.
#[structopt(long, default_value = "4 GiB")]
pub chunk_fusing_shrink_size: Byte,
/// Whether to enable chunk fusing; this reduces the amount of disk space used.
#[structopt(long)]
pub enable_chunk_fusing: bool,
/// Number of parallel jobs for indexing, defaults to # of CPUs.
#[structopt(long)]
pub indexing_jobs: Option<usize>,
}
impl Default for IndexerOpts {
fn default() -> Self {
Self {
log_every_n: 100_000,
max_nb_chunks: None,
max_memory: Byte::from_str("1GiB").unwrap(),
linked_hash_map_size: 500,
chunk_compression_type: CompressionType::None,
chunk_compression_level: None,
chunk_fusing_shrink_size: Byte::from_str("4GiB").unwrap(),
enable_chunk_fusing: false,
indexing_jobs: None,
}
}
}
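As an illustrative aside on the max_memory doc comment above: the budget is split evenly across the indexing jobs. A minimal, self-contained sketch of that split (the per_job_budget helper is hypothetical, not part of this code):

fn per_job_budget(max_memory_bytes: u64, indexing_jobs: usize) -> u64 {
    // Even split of the indexing memory budget across jobs.
    max_memory_bytes / indexing_jobs.max(1) as u64
}

#[test]
fn splits_memory_evenly() {
    // 7 GiB over 7 jobs -> 1 GiB per job, as the doc comment describes.
    assert_eq!(per_job_budget(7 << 30, 7), 1 << 30);
}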
const POSSIBLE_ENV: [&str; 2] = ["development", "production"];
#[derive(Debug, Clone, StructOpt)]
@ -197,7 +129,7 @@ pub struct Opt {
}
impl Opt {
pub fn get_ssl_config(&self) -> Result<Option<rustls::ServerConfig>, Box<dyn error::Error>> {
pub fn get_ssl_config(&self) -> anyhow::Result<Option<rustls::ServerConfig>> {
if let (Some(cert_path), Some(key_path)) = (&self.ssl_cert_path, &self.ssl_key_path) {
let client_auth = match &self.ssl_auth_path {
Some(auth_path) => {
@ -223,7 +155,7 @@ impl Opt {
let ocsp = load_ocsp(&self.ssl_ocsp_path)?;
config
.set_single_cert_with_ocsp_and_sct(certs, privkey, ocsp, vec![])
.map_err(|_| "bad certificates/private key")?;
.map_err(|_| anyhow::anyhow!("bad certificates/private key"))?;
if self.ssl_resumption {
config.set_persistence(rustls::ServerSessionMemoryCache::new(256));
@ -240,25 +172,31 @@ impl Opt {
}
}
fn load_certs(filename: PathBuf) -> Result<Vec<rustls::Certificate>, Box<dyn error::Error>> {
let certfile = fs::File::open(filename).map_err(|_| "cannot open certificate file")?;
fn load_certs(filename: PathBuf) -> anyhow::Result<Vec<rustls::Certificate>> {
let certfile =
fs::File::open(filename).map_err(|_| anyhow::anyhow!("cannot open certificate file"))?;
let mut reader = BufReader::new(certfile);
Ok(certs(&mut reader).map_err(|_| "cannot read certificate file")?)
certs(&mut reader).map_err(|_| anyhow::anyhow!("cannot read certificate file"))
}
fn load_private_key(filename: PathBuf) -> Result<rustls::PrivateKey, Box<dyn error::Error>> {
fn load_private_key(filename: PathBuf) -> anyhow::Result<rustls::PrivateKey> {
let rsa_keys = {
let keyfile =
fs::File::open(filename.clone()).map_err(|_| "cannot open private key file")?;
let keyfile = fs::File::open(filename.clone())
.map_err(|_| anyhow::anyhow!("cannot open private key file"))?;
let mut reader = BufReader::new(keyfile);
rsa_private_keys(&mut reader).map_err(|_| "file contains invalid rsa private key")?
rsa_private_keys(&mut reader)
.map_err(|_| anyhow::anyhow!("file contains invalid rsa private key"))?
};
let pkcs8_keys = {
let keyfile = fs::File::open(filename).map_err(|_| "cannot open private key file")?;
let keyfile = fs::File::open(filename)
.map_err(|_| anyhow::anyhow!("cannot open private key file"))?;
let mut reader = BufReader::new(keyfile);
pkcs8_private_keys(&mut reader)
.map_err(|_| "file contains invalid pkcs8 private key (encrypted keys not supported)")?
pkcs8_private_keys(&mut reader).map_err(|_| {
anyhow::anyhow!(
"file contains invalid pkcs8 private key (encrypted keys not supported)"
)
})?
};
// prefer to load pkcs8 keys
@ -270,14 +208,14 @@ fn load_private_key(filename: PathBuf) -> Result<rustls::PrivateKey, Box<dyn err
}
}
fn load_ocsp(filename: &Option<PathBuf>) -> Result<Vec<u8>, Box<dyn error::Error>> {
fn load_ocsp(filename: &Option<PathBuf>) -> anyhow::Result<Vec<u8>> {
let mut ret = Vec::new();
if let Some(ref name) = filename {
fs::File::open(name)
.map_err(|_| "cannot open ocsp file")?
.map_err(|_| anyhow::anyhow!("cannot open ocsp file"))?
.read_to_end(&mut ret)
.map_err(|_| "cannot read oscp file")?;
.map_err(|_| anyhow::anyhow!("cannot read oscp file"))?;
}
Ok(ret)

View File

@ -1,18 +1,20 @@
use actix_web::{web, HttpResponse};
use log::debug;
use meilisearch_lib::MeiliSearch;
use serde::{Deserialize, Serialize};
use crate::error::ResponseError;
use crate::extractors::authentication::{policies::*, GuardedData};
use crate::Data;
pub fn configure(cfg: &mut web::ServiceConfig) {
cfg.service(web::resource("").route(web::post().to(create_dump)))
.service(web::resource("/{dump_uid}/status").route(web::get().to(get_dump_status)));
}
pub async fn create_dump(data: GuardedData<Private, Data>) -> Result<HttpResponse, ResponseError> {
let res = data.create_dump().await?;
pub async fn create_dump(
meilisearch: GuardedData<Private, MeiliSearch>,
) -> Result<HttpResponse, ResponseError> {
let res = meilisearch.create_dump().await?;
debug!("returns: {:?}", res);
Ok(HttpResponse::Accepted().json(res))
@ -30,10 +32,10 @@ struct DumpParam {
}
async fn get_dump_status(
data: GuardedData<Private, Data>,
meilisearch: GuardedData<Private, MeiliSearch>,
path: web::Path<DumpParam>,
) -> Result<HttpResponse, ResponseError> {
let res = data.dump_status(path.dump_uid.clone()).await?;
let res = meilisearch.dump_info(path.dump_uid.clone()).await?;
debug!("returns: {:?}", res);
Ok(HttpResponse::Ok().json(res))

View File

@ -1,51 +1,33 @@
use actix_web::{web, HttpResponse};
use actix_web::error::PayloadError;
use actix_web::web::Bytes;
use actix_web::{web, HttpRequest, HttpResponse};
use futures::{Stream, StreamExt};
use log::debug;
use milli::update::{IndexDocumentsMethod, UpdateFormat};
use meilisearch_lib::index_controller::{DocumentAdditionFormat, Update};
use meilisearch_lib::milli::update::IndexDocumentsMethod;
use meilisearch_lib::MeiliSearch;
use once_cell::sync::Lazy;
use serde::Deserialize;
use serde_json::Value;
use tokio::sync::mpsc;
use crate::error::ResponseError;
use crate::error::{MeilisearchHttpError, ResponseError};
use crate::extractors::authentication::{policies::*, GuardedData};
use crate::extractors::payload::Payload;
use crate::routes::IndexParam;
use crate::Data;
const DEFAULT_RETRIEVE_DOCUMENTS_OFFSET: usize = 0;
const DEFAULT_RETRIEVE_DOCUMENTS_LIMIT: usize = 20;
/*
macro_rules! guard_content_type {
($fn_name:ident, $guard_value:literal) => {
fn $fn_name(head: &actix_web::dev::RequestHead) -> bool {
if let Some(content_type) = head.headers.get("Content-Type") {
content_type
.to_str()
.map(|v| v.contains($guard_value))
.unwrap_or(false)
} else {
false
}
/// This is required because Payload is neither Sync nor Send
fn payload_to_stream(mut payload: Payload) -> impl Stream<Item = Result<Bytes, PayloadError>> {
let (snd, recv) = mpsc::channel(1);
tokio::task::spawn_local(async move {
while let Some(data) = payload.next().await {
let _ = snd.send(data).await;
}
};
}
guard_content_type!(guard_json, "application/json");
*/
fn guard_json(head: &actix_web::dev::RequestHead) -> bool {
if let Some(_content_type) = head.headers.get("Content-Type") {
// CURRENTLY AND FOR THIS RELEASE ONLY WE DECIDED TO INTERPRET ALL CONTENT-TYPES AS JSON
true
/*
content_type
.to_str()
.map(|v| v.contains("application/json"))
.unwrap_or(false)
*/
} else {
// if no content-type is specified we still accept the data as json!
true
}
});
tokio_stream::wrappers::ReceiverStream::new(recv)
}
#[derive(Deserialize)]
@ -58,8 +40,8 @@ pub fn configure(cfg: &mut web::ServiceConfig) {
cfg.service(
web::resource("")
.route(web::get().to(get_all_documents))
.route(web::post().guard(guard_json).to(add_documents))
.route(web::put().guard(guard_json).to(update_documents))
.route(web::post().to(add_documents))
.route(web::put().to(update_documents))
.route(web::delete().to(clear_all_documents)),
)
// this route needs to be before the /documents/{document_id} to match properly
@ -72,24 +54,29 @@ pub fn configure(cfg: &mut web::ServiceConfig) {
}
pub async fn get_document(
data: GuardedData<Public, Data>,
meilisearch: GuardedData<Public, MeiliSearch>,
path: web::Path<DocumentParam>,
) -> Result<HttpResponse, ResponseError> {
let index = path.index_uid.clone();
let id = path.document_id.clone();
let document = data
.retrieve_document(index, id, None as Option<Vec<String>>)
let document = meilisearch
.document(index, id, None as Option<Vec<String>>)
.await?;
debug!("returns: {:?}", document);
Ok(HttpResponse::Ok().json(document))
}
pub async fn delete_document(
data: GuardedData<Private, Data>,
meilisearch: GuardedData<Private, MeiliSearch>,
path: web::Path<DocumentParam>,
) -> Result<HttpResponse, ResponseError> {
let update_status = data
.delete_documents(path.index_uid.clone(), vec![path.document_id.clone()])
let DocumentParam {
document_id,
index_uid,
} = path.into_inner();
let update = Update::DeleteDocuments(vec![document_id]);
let update_status = meilisearch
.register_update(index_uid, update, false)
.await?;
debug!("returns: {:?}", update_status);
Ok(HttpResponse::Accepted().json(serde_json::json!({ "updateId": update_status.id() })))
@ -104,7 +91,7 @@ pub struct BrowseQuery {
}
pub async fn get_all_documents(
data: GuardedData<Public, Data>,
meilisearch: GuardedData<Public, MeiliSearch>,
path: web::Path<IndexParam>,
params: web::Query<BrowseQuery>,
) -> Result<HttpResponse, ResponseError> {
@ -120,8 +107,8 @@ pub async fn get_all_documents(
Some(names)
});
let documents = data
.retrieve_documents(
let documents = meilisearch
.documents(
path.index_uid.clone(),
params.offset.unwrap_or(DEFAULT_RETRIEVE_DOCUMENTS_OFFSET),
params.limit.unwrap_or(DEFAULT_RETRIEVE_DOCUMENTS_LIMIT),
@ -138,54 +125,98 @@ pub struct UpdateDocumentsQuery {
primary_key: Option<String>,
}
/// Route used when the payload type is "application/json"
/// Used to add or replace documents
pub async fn add_documents(
data: GuardedData<Private, Data>,
meilisearch: GuardedData<Private, MeiliSearch>,
path: web::Path<IndexParam>,
params: web::Query<UpdateDocumentsQuery>,
body: Payload,
req: HttpRequest,
) -> Result<HttpResponse, ResponseError> {
debug!("called with params: {:?}", params);
let update_status = data
.add_documents(
path.into_inner().index_uid,
IndexDocumentsMethod::ReplaceDocuments,
UpdateFormat::Json,
body,
params.primary_key.clone(),
)
.await?;
document_addition(
req.headers()
.get("Content-type")
.map(|s| s.to_str().unwrap_or("unknown")),
meilisearch,
path.into_inner().index_uid,
params.into_inner().primary_key,
body,
IndexDocumentsMethod::ReplaceDocuments,
)
.await
}
pub async fn update_documents(
meilisearch: GuardedData<Private, MeiliSearch>,
path: web::Path<IndexParam>,
params: web::Query<UpdateDocumentsQuery>,
body: Payload,
req: HttpRequest,
) -> Result<HttpResponse, ResponseError> {
debug!("called with params: {:?}", params);
document_addition(
req.headers()
.get("Content-type")
.map(|s| s.to_str().unwrap_or("unknown")),
meilisearch,
path.into_inner().index_uid,
params.into_inner().primary_key,
body,
IndexDocumentsMethod::UpdateDocuments,
)
.await
}
/// Route used when the payload type is "application/json"
/// Used to add or replace documents
async fn document_addition(
content_type: Option<&str>,
meilisearch: GuardedData<Private, MeiliSearch>,
index_uid: String,
primary_key: Option<String>,
body: Payload,
method: IndexDocumentsMethod,
) -> Result<HttpResponse, ResponseError> {
static ACCEPTED_CONTENT_TYPE: Lazy<Vec<String>> = Lazy::new(|| {
vec![
"application/json".to_string(),
"application/x-ndjson".to_string(),
"application/csv".to_string(),
]
});
let format = match content_type {
Some("application/json") => DocumentAdditionFormat::Json,
Some("application/x-ndjson") => DocumentAdditionFormat::Ndjson,
Some("text/csv") => DocumentAdditionFormat::Csv,
Some(other) => {
return Err(MeilisearchHttpError::InvalidContentType(
other.to_string(),
ACCEPTED_CONTENT_TYPE.clone(),
)
.into())
}
None => {
return Err(
MeilisearchHttpError::MissingContentType(ACCEPTED_CONTENT_TYPE.clone()).into(),
)
}
};
let update = Update::DocumentAddition {
payload: Box::new(payload_to_stream(body)),
primary_key,
method,
format,
};
let update_status = meilisearch.register_update(index_uid, update, true).await?;
debug!("returns: {:?}", update_status);
Ok(HttpResponse::Accepted().json(serde_json::json!({ "updateId": update_status.id() })))
}
/// Route used when the payload type is "application/json"
/// Used to add or replace documents
pub async fn update_documents(
data: GuardedData<Private, Data>,
path: web::Path<IndexParam>,
params: web::Query<UpdateDocumentsQuery>,
body: Payload,
) -> Result<HttpResponse, ResponseError> {
debug!("called with params: {:?}", params);
let update = data
.add_documents(
path.into_inner().index_uid,
IndexDocumentsMethod::UpdateDocuments,
UpdateFormat::Json,
body,
params.primary_key.clone(),
)
.await?;
debug!("returns: {:?}", update);
Ok(HttpResponse::Accepted().json(serde_json::json!({ "updateId": update.id() })))
}
pub async fn delete_documents(
data: GuardedData<Private, Data>,
meilisearch: GuardedData<Private, MeiliSearch>,
path: web::Path<IndexParam>,
body: web::Json<Vec<Value>>,
) -> Result<HttpResponse, ResponseError> {
@ -199,16 +230,22 @@ pub async fn delete_documents(
})
.collect();
let update_status = data.delete_documents(path.index_uid.clone(), ids).await?;
let update = Update::DeleteDocuments(ids);
let update_status = meilisearch
.register_update(path.into_inner().index_uid, update, false)
.await?;
debug!("returns: {:?}", update_status);
Ok(HttpResponse::Accepted().json(serde_json::json!({ "updateId": update_status.id() })))
}
pub async fn clear_all_documents(
data: GuardedData<Private, Data>,
meilisearch: GuardedData<Private, MeiliSearch>,
path: web::Path<IndexParam>,
) -> Result<HttpResponse, ResponseError> {
let update_status = data.clear_documents(path.index_uid.clone()).await?;
let update = Update::ClearDocuments;
let update_status = meilisearch
.register_update(path.into_inner().index_uid, update, false)
.await?;
debug!("returns: {:?}", update_status);
Ok(HttpResponse::Accepted().json(serde_json::json!({ "updateId": update_status.id() })))
}

View File

@ -1,12 +1,13 @@
use actix_web::{web, HttpResponse};
use chrono::{DateTime, Utc};
use log::debug;
use meilisearch_lib::index_controller::IndexSettings;
use meilisearch_lib::MeiliSearch;
use serde::{Deserialize, Serialize};
use crate::error::ResponseError;
use crate::extractors::authentication::{policies::*, GuardedData};
use crate::routes::IndexParam;
use crate::Data;
pub mod documents;
pub mod search;
@ -35,7 +36,9 @@ pub fn configure(cfg: &mut web::ServiceConfig) {
);
}
pub async fn list_indexes(data: GuardedData<Private, Data>) -> Result<HttpResponse, ResponseError> {
pub async fn list_indexes(
data: GuardedData<Private, MeiliSearch>,
) -> Result<HttpResponse, ResponseError> {
let indexes = data.list_indexes().await?;
debug!("returns: {:?}", indexes);
Ok(HttpResponse::Ok().json(indexes))
@ -49,11 +52,11 @@ pub struct IndexCreateRequest {
}
pub async fn create_index(
data: GuardedData<Private, Data>,
meilisearch: GuardedData<Private, MeiliSearch>,
body: web::Json<IndexCreateRequest>,
) -> Result<HttpResponse, ResponseError> {
let body = body.into_inner();
let meta = data.create_index(body.uid, body.primary_key).await?;
let meta = meilisearch.create_index(body.uid, body.primary_key).await?;
Ok(HttpResponse::Created().json(meta))
}
@ -75,41 +78,45 @@ pub struct UpdateIndexResponse {
}
pub async fn get_index(
data: GuardedData<Private, Data>,
meilisearch: GuardedData<Private, MeiliSearch>,
path: web::Path<IndexParam>,
) -> Result<HttpResponse, ResponseError> {
let meta = data.index(path.index_uid.clone()).await?;
let meta = meilisearch.get_index(path.index_uid.clone()).await?;
debug!("returns: {:?}", meta);
Ok(HttpResponse::Ok().json(meta))
}
pub async fn update_index(
data: GuardedData<Private, Data>,
meilisearch: GuardedData<Private, MeiliSearch>,
path: web::Path<IndexParam>,
body: web::Json<UpdateIndexRequest>,
) -> Result<HttpResponse, ResponseError> {
debug!("called with params: {:?}", body);
let body = body.into_inner();
let meta = data
.update_index(path.into_inner().index_uid, body.primary_key, body.uid)
let settings = IndexSettings {
uid: body.uid,
primary_key: body.primary_key,
};
let meta = meilisearch
.update_index(path.into_inner().index_uid, settings)
.await?;
debug!("returns: {:?}", meta);
Ok(HttpResponse::Ok().json(meta))
}
pub async fn delete_index(
data: GuardedData<Private, Data>,
meilisearch: GuardedData<Private, MeiliSearch>,
path: web::Path<IndexParam>,
) -> Result<HttpResponse, ResponseError> {
data.delete_index(path.index_uid.clone()).await?;
meilisearch.delete_index(path.index_uid.clone()).await?;
Ok(HttpResponse::NoContent().finish())
}
pub async fn get_index_stats(
data: GuardedData<Private, Data>,
meilisearch: GuardedData<Private, MeiliSearch>,
path: web::Path<IndexParam>,
) -> Result<HttpResponse, ResponseError> {
let response = data.get_index_stats(path.index_uid.clone()).await?;
let response = meilisearch.get_index_stats(path.index_uid.clone()).await?;
debug!("returns: {:?}", response);
Ok(HttpResponse::Ok().json(response))

View File

@ -1,15 +1,13 @@
use std::collections::{BTreeSet, HashSet};
use actix_web::{web, HttpResponse};
use log::debug;
use meilisearch_lib::index::{default_crop_length, SearchQuery, DEFAULT_SEARCH_LIMIT};
use meilisearch_lib::MeiliSearch;
use serde::Deserialize;
use serde_json::Value;
use crate::error::ResponseError;
use crate::extractors::authentication::{policies::*, GuardedData};
use crate::index::{default_crop_length, SearchQuery, DEFAULT_SEARCH_LIMIT};
use crate::routes::IndexParam;
use crate::Data;
pub fn configure(cfg: &mut web::ServiceConfig) {
cfg.service(
@ -31,6 +29,7 @@ pub struct SearchQueryGet {
crop_length: usize,
attributes_to_highlight: Option<String>,
filter: Option<String>,
sort: Option<String>,
#[serde(default = "Default::default")]
matches: bool,
facets_distribution: Option<String>,
@ -40,19 +39,19 @@ impl From<SearchQueryGet> for SearchQuery {
fn from(other: SearchQueryGet) -> Self {
let attributes_to_retrieve = other
.attributes_to_retrieve
.map(|attrs| attrs.split(',').map(String::from).collect::<BTreeSet<_>>());
.map(|attrs| attrs.split(',').map(String::from).collect());
let attributes_to_crop = other
.attributes_to_crop
.map(|attrs| attrs.split(',').map(String::from).collect::<Vec<_>>());
.map(|attrs| attrs.split(',').map(String::from).collect());
let attributes_to_highlight = other
.attributes_to_highlight
.map(|attrs| attrs.split(',').map(String::from).collect::<HashSet<_>>());
.map(|attrs| attrs.split(',').map(String::from).collect());
let facets_distribution = other
.facets_distribution
.map(|attrs| attrs.split(',').map(String::from).collect::<Vec<_>>());
.map(|attrs| attrs.split(',').map(String::from).collect());
let filter = match other.filter {
Some(f) => match serde_json::from_str(&f) {
@ -62,6 +61,8 @@ impl From<SearchQueryGet> for SearchQuery {
None => None,
};
let sort = other.sort.map(|attr| fix_sort_query_parameters(&attr));
Self {
q: other.q,
offset: other.offset,
@ -71,20 +72,49 @@ impl From<SearchQueryGet> for SearchQuery {
crop_length: other.crop_length,
attributes_to_highlight,
filter,
sort,
matches: other.matches,
facets_distribution,
}
}
}
// TODO: TAMO: split on :asc, and :desc, instead of doing some weird things
/// Transform the sort query parameter into something that matches the post expected format.
fn fix_sort_query_parameters(sort_query: &str) -> Vec<String> {
let mut sort_parameters = Vec::new();
let mut merge = false;
for current_sort in sort_query.trim_matches('"').split(',').map(|s| s.trim()) {
if current_sort.starts_with("_geoPoint(") {
sort_parameters.push(current_sort.to_string());
merge = true;
} else if merge && !sort_parameters.is_empty() {
sort_parameters
.last_mut()
.unwrap()
.push_str(&format!(",{}", current_sort));
if current_sort.ends_with("):desc") || current_sort.ends_with("):asc") {
merge = false;
}
} else {
sort_parameters.push(current_sort.to_string());
merge = false;
}
}
sort_parameters
}
pub async fn search_with_url_query(
data: GuardedData<Public, Data>,
meilisearch: GuardedData<Public, MeiliSearch>,
path: web::Path<IndexParam>,
params: web::Query<SearchQueryGet>,
) -> Result<HttpResponse, ResponseError> {
debug!("called with params: {:?}", params);
let query = params.into_inner().into();
let search_result = data.search(path.into_inner().index_uid, query).await?;
let search_result = meilisearch
.search(path.into_inner().index_uid, query)
.await?;
// Tests that the nb_hits is always set to false
#[cfg(test)]
@ -95,12 +125,12 @@ pub async fn search_with_url_query(
}
pub async fn search_with_post(
data: GuardedData<Public, Data>,
meilisearch: GuardedData<Public, MeiliSearch>,
path: web::Path<IndexParam>,
params: web::Json<SearchQuery>,
) -> Result<HttpResponse, ResponseError> {
debug!("search called with params: {:?}", params);
let search_result = data
let search_result = meilisearch
.search(path.into_inner().index_uid, params.into_inner())
.await?;
@ -111,3 +141,42 @@ pub async fn search_with_post(
debug!("returns: {:?}", search_result);
Ok(HttpResponse::Ok().json(search_result))
}
#[cfg(test)]
mod test {
use super::*;
#[test]
fn test_fix_sort_query_parameters() {
let sort = fix_sort_query_parameters("_geoPoint(12, 13):asc");
assert_eq!(sort, vec!["_geoPoint(12,13):asc".to_string()]);
let sort = fix_sort_query_parameters("doggo:asc,_geoPoint(12.45,13.56):desc");
assert_eq!(
sort,
vec![
"doggo:asc".to_string(),
"_geoPoint(12.45,13.56):desc".to_string(),
]
);
let sort = fix_sort_query_parameters(
"doggo:asc , _geoPoint(12.45, 13.56, 2590352):desc , catto:desc",
);
assert_eq!(
sort,
vec![
"doggo:asc".to_string(),
"_geoPoint(12.45,13.56,2590352):desc".to_string(),
"catto:desc".to_string(),
]
);
let sort = fix_sort_query_parameters("doggo:asc , _geoPoint(1, 2), catto:desc");
// This is ugly but eh, I don't want to write a full parser just for this unused route
assert_eq!(
sort,
vec![
"doggo:asc".to_string(),
"_geoPoint(1,2),catto:desc".to_string(),
]
);
}
}

View File

@ -1,10 +1,12 @@
use actix_web::{web, HttpResponse};
use log::debug;
use actix_web::{web, HttpResponse};
use meilisearch_lib::index::{Settings, Unchecked};
use meilisearch_lib::index_controller::Update;
use meilisearch_lib::MeiliSearch;
use crate::error::ResponseError;
use crate::extractors::authentication::{policies::*, GuardedData};
use crate::index::Settings;
use crate::Data;
use crate::{error::ResponseError, index::Unchecked};
#[macro_export]
macro_rules! make_setting_route {
@ -13,45 +15,50 @@ macro_rules! make_setting_route {
use log::debug;
use actix_web::{web, HttpResponse, Resource};
use crate::data;
use meilisearch_lib::milli::update::Setting;
use meilisearch_lib::{MeiliSearch, index::Settings, index_controller::Update};
use crate::error::ResponseError;
use crate::index::Settings;
use crate::extractors::authentication::{GuardedData, policies::*};
pub async fn delete(
data: GuardedData<Private, data::Data>,
meilisearch: GuardedData<Private, MeiliSearch>,
index_uid: web::Path<String>,
) -> Result<HttpResponse, ResponseError> {
use crate::index::Settings;
let settings = Settings {
$attr: Some(None),
$attr: Setting::Reset,
..Default::default()
};
let update_status = data.update_settings(index_uid.into_inner(), settings, false).await?;
let update = Update::Settings(settings);
let update_status = meilisearch.register_update(index_uid.into_inner(), update, false).await?;
debug!("returns: {:?}", update_status);
Ok(HttpResponse::Accepted().json(serde_json::json!({ "updateId": update_status.id() })))
}
pub async fn update(
data: GuardedData<Private, data::Data>,
meilisearch: GuardedData<Private, MeiliSearch>,
index_uid: actix_web::web::Path<String>,
body: actix_web::web::Json<Option<$type>>,
) -> std::result::Result<HttpResponse, ResponseError> {
let settings = Settings {
$attr: Some(body.into_inner()),
$attr: match body.into_inner() {
Some(inner_body) => Setting::Set(inner_body),
None => Setting::Reset
},
..Default::default()
};
let update_status = data.update_settings(index_uid.into_inner(), settings, true).await?;
let update = Update::Settings(settings);
let update_status = meilisearch.register_update(index_uid.into_inner(), update, true).await?;
debug!("returns: {:?}", update_status);
Ok(HttpResponse::Accepted().json(serde_json::json!({ "updateId": update_status.id() })))
}
pub async fn get(
data: GuardedData<Private, data::Data>,
meilisearch: GuardedData<Private, MeiliSearch>,
index_uid: actix_web::web::Path<String>,
) -> std::result::Result<HttpResponse, ResponseError> {
let settings = data.settings(index_uid.into_inner()).await?;
let settings = meilisearch.settings(index_uid.into_inner()).await?;
debug!("returns: {:?}", settings);
let mut json = serde_json::json!(&settings);
let val = json[$camelcase_attr].take();
@ -70,11 +77,18 @@ macro_rules! make_setting_route {
make_setting_route!(
"/filterable-attributes",
std::collections::HashSet<String>,
std::collections::BTreeSet<String>,
filterable_attributes,
"filterableAttributes"
);
make_setting_route!(
"/sortable-attributes",
std::collections::BTreeSet<String>,
sortable_attributes,
"sortableAttributes"
);
make_setting_route!(
"/displayed-attributes",
Vec<String>,
@ -127,6 +141,7 @@ macro_rules! generate_configure {
generate_configure!(
filterable_attributes,
sortable_attributes,
displayed_attributes,
searchable_attributes,
distinct_attribute,
@ -136,13 +151,15 @@ generate_configure!(
);
pub async fn update_all(
data: GuardedData<Private, Data>,
meilisearch: GuardedData<Private, MeiliSearch>,
index_uid: web::Path<String>,
body: web::Json<Settings<Unchecked>>,
) -> Result<HttpResponse, ResponseError> {
let settings = body.into_inner().check();
let update_result = data
.update_settings(index_uid.into_inner(), settings, true)
let settings = body.into_inner();
let update = Update::Settings(settings);
let update_result = meilisearch
.register_update(index_uid.into_inner(), update, true)
.await?;
let json = serde_json::json!({ "updateId": update_result.id() });
debug!("returns: {:?}", json);
@ -150,7 +167,7 @@ pub async fn update_all(
}
pub async fn get_all(
data: GuardedData<Private, Data>,
data: GuardedData<Private, MeiliSearch>,
index_uid: web::Path<String>,
) -> Result<HttpResponse, ResponseError> {
let settings = data.settings(index_uid.into_inner()).await?;
@ -159,12 +176,14 @@ pub async fn get_all(
}
pub async fn delete_all(
data: GuardedData<Private, Data>,
data: GuardedData<Private, MeiliSearch>,
index_uid: web::Path<String>,
) -> Result<HttpResponse, ResponseError> {
let settings = Settings::cleared();
let update = Update::Settings(settings.into_unchecked());
let update_result = data
.update_settings(index_uid.into_inner(), settings, false)
.register_update(index_uid.into_inner(), update, false)
.await?;
let json = serde_json::json!({ "updateId": update_result.id() });
debug!("returns: {:?}", json);

View File

@ -1,12 +1,12 @@
use actix_web::{web, HttpResponse};
use chrono::{DateTime, Utc};
use log::debug;
use meilisearch_lib::MeiliSearch;
use serde::{Deserialize, Serialize};
use crate::error::ResponseError;
use crate::extractors::authentication::{policies::*, GuardedData};
use crate::routes::{IndexParam, UpdateStatusResponse};
use crate::Data;
pub fn configure(cfg: &mut web::ServiceConfig) {
cfg.service(web::resource("").route(web::get().to(get_all_updates_status)))
@ -37,12 +37,12 @@ pub struct UpdateParam {
}
pub async fn get_update_status(
data: GuardedData<Private, Data>,
meilisearch: GuardedData<Private, MeiliSearch>,
path: web::Path<UpdateParam>,
) -> Result<HttpResponse, ResponseError> {
let params = path.into_inner();
let meta = data
.get_update_status(params.index_uid, params.update_id)
let meta = meilisearch
.update_status(params.index_uid, params.update_id)
.await?;
let meta = UpdateStatusResponse::from(meta);
debug!("returns: {:?}", meta);
@ -50,10 +50,12 @@ pub async fn get_update_status(
}
pub async fn get_all_updates_status(
data: GuardedData<Private, Data>,
meilisearch: GuardedData<Private, MeiliSearch>,
path: web::Path<IndexParam>,
) -> Result<HttpResponse, ResponseError> {
let metas = data.get_updates_status(path.into_inner().index_uid).await?;
let metas = meilisearch
.all_update_status(path.into_inner().index_uid)
.await?;
let metas = metas
.into_iter()
.map(UpdateStatusResponse::from)

View File

@ -3,13 +3,15 @@ use std::time::Duration;
use actix_web::{web, HttpResponse};
use chrono::{DateTime, Utc};
use log::debug;
use meilisearch_lib::index_controller::updates::status::{UpdateResult, UpdateStatus};
use serde::{Deserialize, Serialize};
use meilisearch_lib::index::{Settings, Unchecked};
use meilisearch_lib::{MeiliSearch, Update};
use crate::error::ResponseError;
use crate::extractors::authentication::{policies::*, GuardedData};
use crate::index::{Settings, Unchecked};
use crate::index_controller::{UpdateMeta, UpdateResult, UpdateStatus};
use crate::Data;
use crate::ApiKeys;
mod dump;
mod indexes;
@ -48,9 +50,9 @@ pub enum UpdateType {
impl From<&UpdateStatus> for UpdateType {
fn from(other: &UpdateStatus) -> Self {
use milli::update::IndexDocumentsMethod::*;
use meilisearch_lib::milli::update::IndexDocumentsMethod::*;
match other.meta() {
UpdateMeta::DocumentsAddition { method, .. } => {
Update::DocumentAddition { method, .. } => {
let number = match other {
UpdateStatus::Processed(processed) => match processed.success {
UpdateResult::DocumentsAddition(ref addition) => {
@ -67,13 +69,13 @@ impl From<&UpdateStatus> for UpdateType {
_ => unreachable!(),
}
}
UpdateMeta::ClearDocuments => UpdateType::ClearAll,
UpdateMeta::DeleteDocuments { ids } => UpdateType::DocumentsDeletion {
number: Some(ids.len()),
},
UpdateMeta::Settings(settings) => UpdateType::Settings {
Update::Settings(settings) => UpdateType::Settings {
settings: settings.clone(),
},
Update::ClearDocuments => UpdateType::ClearAll,
Update::DeleteDocuments(ids) => UpdateType::DocumentsDeletion {
number: Some(ids.len()),
},
}
}
}
@ -186,15 +188,17 @@ impl From<UpdateStatus> for UpdateStatusResponse {
let duration = Duration::from_millis(duration as u64).as_secs_f64();
let update_id = failed.id();
let response = failed.error;
let processed_at = failed.failed_at;
let enqueued_at = failed.from.from.enqueued_at;
let response = failed.into();
let content = FailedUpdateResult {
update_id,
update_type,
response,
duration,
enqueued_at: failed.from.from.enqueued_at,
processed_at: failed.failed_at,
enqueued_at,
processed_at,
};
UpdateStatusResponse::Failed { content }
}
@ -229,8 +233,10 @@ pub async fn running() -> HttpResponse {
HttpResponse::Ok().json(serde_json::json!({ "status": "MeiliSearch is running" }))
}
async fn get_stats(data: GuardedData<Private, Data>) -> Result<HttpResponse, ResponseError> {
let response = data.get_all_stats().await?;
async fn get_stats(
meilisearch: GuardedData<Private, MeiliSearch>,
) -> Result<HttpResponse, ResponseError> {
let response = meilisearch.get_all_stats().await?;
debug!("returns: {:?}", response);
Ok(HttpResponse::Ok().json(response))
@ -244,7 +250,7 @@ struct VersionResponse {
pkg_version: String,
}
async fn get_version(_data: GuardedData<Private, Data>) -> HttpResponse {
async fn get_version(_meilisearch: GuardedData<Private, MeiliSearch>) -> HttpResponse {
let commit_sha = option_env!("VERGEN_GIT_SHA").unwrap_or("unknown");
let commit_date = option_env!("VERGEN_GIT_COMMIT_TIMESTAMP").unwrap_or("unknown");
@ -261,8 +267,8 @@ struct KeysResponse {
public: Option<String>,
}
pub async fn list_keys(data: GuardedData<Admin, Data>) -> HttpResponse {
let api_keys = data.api_keys.clone();
pub async fn list_keys(meilisearch: GuardedData<Admin, ApiKeys>) -> HttpResponse {
let api_keys = (*meilisearch).clone();
HttpResponse::Ok().json(&KeysResponse {
private: api_keys.private,
public: api_keys.public,
@ -276,17 +282,16 @@ pub async fn get_health() -> Result<HttpResponse, ResponseError> {
#[cfg(test)]
mod test {
use super::*;
use crate::data::Data;
use crate::extractors::authentication::GuardedData;
/// A type implemented for a route that uses an authentication policy `Policy`.
///
/// This trait is used for regression testing of route authentication policies.
trait Is<Policy, T> {}
trait Is<Policy, Data, T> {}
macro_rules! impl_is_policy {
($($param:ident)*) => {
impl<Policy, Func, $($param,)* Res> Is<Policy, (($($param,)*), Res)> for Func
impl<Policy, Func, Data, $($param,)* Res> Is<Policy, Data, (($($param,)*), Res)> for Func
where Func: Fn(GuardedData<Policy, Data>, $($param,)*) -> Res {}
};
@ -306,7 +311,7 @@ mod test {
($($policy:ident => { $($route:expr,)*})*) => {
#[test]
fn test_auth() {
$($(let _: &dyn Is<$policy, _> = &$route;)*)*
$($(let _: &dyn Is<$policy, _, _> = &$route;)*)*
}
};
}

View File

@ -2,12 +2,14 @@ use std::path::Path;
use actix_web::http::StatusCode;
use byte_unit::{Byte, ByteUnit};
use meilisearch_http::setup_meilisearch;
use meilisearch_lib::options::{IndexerOpts, MaxMemory};
use once_cell::sync::Lazy;
use serde_json::Value;
use tempdir::TempDir;
use tempfile::TempDir;
use urlencoding::encode;
use meilisearch_http::data::Data;
use meilisearch_http::option::{IndexerOpts, Opt};
use meilisearch_http::option::Opt;
use super::index::Index;
use super::service::Service;
@ -15,17 +17,28 @@ use super::service::Service;
pub struct Server {
pub service: Service,
// hold ownership of the tempdir while we use the server instance.
_dir: Option<tempdir::TempDir>,
_dir: Option<TempDir>,
}
static TEST_TEMP_DIR: Lazy<TempDir> = Lazy::new(|| TempDir::new().unwrap());
impl Server {
pub async fn new() -> Self {
let dir = TempDir::new("meilisearch").unwrap();
let dir = TempDir::new().unwrap();
let opt = default_settings(dir.path());
if cfg!(windows) {
std::env::set_var("TMP", TEST_TEMP_DIR.path());
} else {
std::env::set_var("TMPDIR", TEST_TEMP_DIR.path());
}
let data = Data::new(opt).unwrap();
let service = Service(data);
let options = default_settings(dir.path());
let meilisearch = setup_meilisearch(&options).unwrap();
let service = Service {
meilisearch,
options,
};
Server {
service,
@ -33,9 +46,12 @@ impl Server {
}
}
pub async fn new_with_options(opt: Opt) -> Self {
let data = Data::new(opt).unwrap();
let service = Service(data);
pub async fn new_with_options(options: Opt) -> Self {
let meilisearch = setup_meilisearch(&options).unwrap();
let service = Service {
meilisearch,
options,
};
Server {
service,
@ -46,7 +62,7 @@ impl Server {
/// Returns a view to an index. There is no guarantee that the index exists.
pub fn index(&self, uid: impl AsRef<str>) -> Index<'_> {
Index {
uid: encode(uid.as_ref()),
uid: encode(uid.as_ref()).to_string(),
service: &self.service,
}
}
@ -90,7 +106,11 @@ pub fn default_settings(dir: impl AsRef<Path>) -> Opt {
schedule_snapshot: false,
snapshot_interval_sec: 0,
import_dump: None,
indexer_options: IndexerOpts::default(),
indexer_options: IndexerOpts {
// memory has to be unlimited because several Meilisearch instances run in the test context.
max_memory: MaxMemory::unlimited(),
..Default::default()
},
log_level: "off".into(),
}
}

View File

@ -1,14 +1,17 @@
use actix_web::{http::StatusCode, test};
use meilisearch_lib::MeiliSearch;
use serde_json::Value;
use meilisearch_http::create_app;
use meilisearch_http::data::Data;
use meilisearch_http::{create_app, Opt};
pub struct Service(pub Data);
pub struct Service {
pub meilisearch: MeiliSearch,
pub options: Opt,
}
impl Service {
pub async fn post(&self, url: impl AsRef<str>, body: Value) -> (Value, StatusCode) {
let app = test::init_service(create_app!(&self.0, true)).await;
let app = test::init_service(create_app!(&self.meilisearch, true, &self.options)).await;
let req = test::TestRequest::post()
.uri(url.as_ref())
@ -28,7 +31,7 @@ impl Service {
url: impl AsRef<str>,
body: impl AsRef<str>,
) -> (Value, StatusCode) {
let app = test::init_service(create_app!(&self.0, true)).await;
let app = test::init_service(create_app!(&self.meilisearch, true, &self.options)).await;
let req = test::TestRequest::post()
.uri(url.as_ref())
@ -44,7 +47,7 @@ impl Service {
}
pub async fn get(&self, url: impl AsRef<str>) -> (Value, StatusCode) {
let app = test::init_service(create_app!(&self.0, true)).await;
let app = test::init_service(create_app!(&self.meilisearch, true, &self.options)).await;
let req = test::TestRequest::get().uri(url.as_ref()).to_request();
let res = test::call_service(&app, req).await;
@ -56,7 +59,7 @@ impl Service {
}
pub async fn put(&self, url: impl AsRef<str>, body: Value) -> (Value, StatusCode) {
let app = test::init_service(create_app!(&self.0, true)).await;
let app = test::init_service(create_app!(&self.meilisearch, true, &self.options)).await;
let req = test::TestRequest::put()
.uri(url.as_ref())
@ -71,7 +74,7 @@ impl Service {
}
pub async fn delete(&self, url: impl AsRef<str>) -> (Value, StatusCode) {
let app = test::init_service(create_app!(&self.0, true)).await;
let app = test::init_service(create_app!(&self.meilisearch, true, &self.options)).await;
let req = test::TestRequest::delete().uri(url.as_ref()).to_request();
let res = test::call_service(&app, req).await;

View File

@ -0,0 +1,111 @@
#![allow(dead_code)]
mod common;
use crate::common::Server;
use actix_web::test;
use meilisearch_http::create_app;
use serde_json::{json, Value};
#[actix_rt::test]
async fn strict_json_bad_content_type() {
let routes = [
// all the POST routes except the dump route, which can be called without any body or content-type,
// and the search route, which does not require strict JSON
"/indexes",
"/indexes/doggo/documents/delete-batch",
"/indexes/doggo/search",
"/indexes/doggo/settings",
"/indexes/doggo/settings/displayed-attributes",
"/indexes/doggo/settings/distinct-attribute",
"/indexes/doggo/settings/filterable-attributes",
"/indexes/doggo/settings/ranking-rules",
"/indexes/doggo/settings/searchable-attributes",
"/indexes/doggo/settings/sortable-attributes",
"/indexes/doggo/settings/stop-words",
"/indexes/doggo/settings/synonyms",
];
let bad_content_types = [
"application/csv",
"application/x-ndjson",
"application/x-www-form-urlencoded",
"text/plain",
"json",
"application",
"json/application",
];
let document = "{}";
let server = Server::new().await;
let app = test::init_service(create_app!(
&server.service.meilisearch,
true,
&server.service.options
))
.await;
for route in routes {
// Good content-type; we probably get an error since we didn't send anything in the JSON,
// so we only ensure we didn't get a bad media type error.
let req = test::TestRequest::post()
.uri(route)
.set_payload(document)
.insert_header(("content-type", "application/json"))
.to_request();
let res = test::call_service(&app, req).await;
let status_code = res.status();
assert_ne!(status_code, 415,
"calling the route `{}` with a content-type of json isn't supposed to throw a bad media type error", route);
// No content-type.
let req = test::TestRequest::post()
.uri(route)
.set_payload(document)
.to_request();
let res = test::call_service(&app, req).await;
let status_code = res.status();
let body = test::read_body(res).await;
let response: Value = serde_json::from_slice(&body).unwrap_or_default();
assert_eq!(status_code, 415, "calling the route `{}` without content-type is supposed to throw a bad media type error", route);
assert_eq!(
response,
json!({
"message": r#"A Content-Type header is missing. Accepted values for the Content-Type header are: "application/json""#,
"errorCode": "missing_content_type",
"errorType": "invalid_request_error",
"errorLink": "https://docs.meilisearch.com/errors#missing_content_type",
}),
"when calling the route `{}` with no content-type",
route,
);
for bad_content_type in bad_content_types {
// Always bad content-type
let req = test::TestRequest::post()
.uri(route)
.set_payload(document.to_string())
.insert_header(("content-type", bad_content_type))
.to_request();
let res = test::call_service(&app, req).await;
let status_code = res.status();
let body = test::read_body(res).await;
let response: Value = serde_json::from_slice(&body).unwrap_or_default();
assert_eq!(status_code, 415);
let expected_error_message = format!(
r#"The Content-Type "{}" is invalid. Accepted values for the Content-Type header are: "application/json""#,
bad_content_type
);
assert_eq!(
response,
json!({
"message": expected_error_message,
"errorCode": "invalid_content_type",
"errorType": "invalid_request_error",
"errorLink": "https://docs.meilisearch.com/errors#invalid_content_type",
}),
"when calling the route `{}` with a content-type of `{}`",
route,
bad_content_type,
);
}
}
}

View File

@ -16,7 +16,13 @@ async fn add_documents_test_json_content_types() {
// this is what is expected and should work
let server = Server::new().await;
let app = test::init_service(create_app!(&server.service.0, true)).await;
let app = test::init_service(create_app!(
&server.service.meilisearch,
true,
&server.service.options
))
.await;
// post
let req = test::TestRequest::post()
.uri("/indexes/dog/documents")
.set_payload(document.to_string())
@ -28,6 +34,19 @@ async fn add_documents_test_json_content_types() {
let response: Value = serde_json::from_slice(&body).unwrap_or_default();
assert_eq!(status_code, 202);
assert_eq!(response, json!({ "updateId": 0 }));
// put
let req = test::TestRequest::put()
.uri("/indexes/dog/documents")
.set_payload(document.to_string())
.insert_header(("content-type", "application/json"))
.to_request();
let res = test::call_service(&app, req).await;
let status_code = res.status();
let body = test::read_body(res).await;
let response: Value = serde_json::from_slice(&body).unwrap_or_default();
assert_eq!(status_code, 202);
assert_eq!(response, json!({ "updateId": 1 }));
}
/// a missing content type is still supposed to be accepted as json
@ -41,7 +60,13 @@ async fn add_documents_test_no_content_types() {
]);
let server = Server::new().await;
let app = test::init_service(create_app!(&server.service.0, true)).await;
let app = test::init_service(create_app!(
&server.service.meilisearch,
true,
&server.service.options
))
.await;
// post
let req = test::TestRequest::post()
.uri("/indexes/dog/documents")
.set_payload(document.to_string())
@ -53,11 +78,23 @@ async fn add_documents_test_no_content_types() {
let response: Value = serde_json::from_slice(&body).unwrap_or_default();
assert_eq!(status_code, 202);
assert_eq!(response, json!({ "updateId": 0 }));
// put
let req = test::TestRequest::put()
.uri("/indexes/dog/documents")
.set_payload(document.to_string())
.insert_header(("content-type", "application/json"))
.to_request();
let res = test::call_service(&app, req).await;
let status_code = res.status();
let body = test::read_body(res).await;
let response: Value = serde_json::from_slice(&body).unwrap_or_default();
assert_eq!(status_code, 202);
assert_eq!(response, json!({ "updateId": 1 }));
}
/// any other content-type must be refused
#[actix_rt::test]
#[ignore]
async fn add_documents_test_bad_content_types() {
let document = json!([
{
@ -67,7 +104,13 @@ async fn add_documents_test_bad_content_types() {
]);
let server = Server::new().await;
let app = test::init_service(create_app!(&server.service.0, true)).await;
let app = test::init_service(create_app!(
&server.service.meilisearch,
true,
&server.service.options
))
.await;
// post
let req = test::TestRequest::post()
.uri("/indexes/dog/documents")
.set_payload(document.to_string())
@ -76,8 +119,32 @@ async fn add_documents_test_bad_content_types() {
let res = test::call_service(&app, req).await;
let status_code = res.status();
let body = test::read_body(res).await;
assert_eq!(status_code, 405);
assert!(body.is_empty());
let response: Value = serde_json::from_slice(&body).unwrap_or_default();
assert_eq!(status_code, 415);
assert_eq!(
response["message"],
json!(
r#"The Content-Type "text/plain" is invalid. Accepted values for the Content-Type header are: "application/json", "application/x-ndjson", "application/csv""#
)
);
// put
let req = test::TestRequest::put()
.uri("/indexes/dog/documents")
.set_payload(document.to_string())
.insert_header(("content-type", "text/plain"))
.to_request();
let res = test::call_service(&app, req).await;
let status_code = res.status();
let body = test::read_body(res).await;
let response: Value = serde_json::from_slice(&body).unwrap_or_default();
assert_eq!(status_code, 415);
assert_eq!(
response["message"],
json!(
r#"The Content-Type "text/plain" is invalid. Accepted values for the Content-Type header are: "application/json", "application/x-ndjson", "application/csv""#
)
);
}
#[actix_rt::test]
@ -137,8 +204,8 @@ async fn document_add_create_index_bad_uid() {
async fn document_update_create_index_bad_uid() {
let server = Server::new().await;
let index = server.index("883 fj!");
let (_response, code) = index.update_documents(json!([]), None).await;
assert_eq!(code, 400);
let (response, code) = index.update_documents(json!([]), None).await;
assert_eq!(code, 400, "{}", response);
}
#[actix_rt::test]

View File

@ -1,5 +1,5 @@
use crate::common::Server;
use serde_json::Value;
use serde_json::{json, Value};
#[actix_rt::test]
async fn create_index_no_primary_key() {
@ -33,6 +33,22 @@ async fn create_index_with_primary_key() {
assert_eq!(response.as_object().unwrap().len(), 5);
}
#[actix_rt::test]
async fn create_index_with_invalid_primary_key() {
let document = json!([ { "id": 2, "title": "Pride and Prejudice" } ]);
let server = Server::new().await;
let index = server.index("movies");
let (_response, code) = index.add_documents(document, Some("title")).await;
assert_eq!(code, 202);
index.wait_update_id(0).await;
let (response, code) = index.get().await;
assert_eq!(code, 200);
assert_eq!(response["primaryKey"], Value::Null);
}
// TODO: partial test since we are testing error, and error is not yet fully implemented in
// transplant
#[actix_rt::test]

View File

@ -146,6 +146,80 @@ async fn search_with_filter_array_notation() {
assert_eq!(response["hits"].as_array().unwrap().len(), 3);
}
#[actix_rt::test]
async fn search_with_sort_on_numbers() {
let server = Server::new().await;
let index = server.index("test");
index
.update_settings(json!({"sortableAttributes": ["id"]}))
.await;
let documents = DOCUMENTS.clone();
index.add_documents(documents, None).await;
index.wait_update_id(1).await;
index
.search(
json!({
"sort": ["id:asc"]
}),
|response, code| {
assert_eq!(code, 200, "{}", response);
assert_eq!(response["hits"].as_array().unwrap().len(), 5);
},
)
.await;
}
#[actix_rt::test]
async fn search_with_sort_on_strings() {
let server = Server::new().await;
let index = server.index("test");
index
.update_settings(json!({"sortableAttributes": ["title"]}))
.await;
let documents = DOCUMENTS.clone();
index.add_documents(documents, None).await;
index.wait_update_id(1).await;
index
.search(
json!({
"sort": ["title:desc"]
}),
|response, code| {
assert_eq!(code, 200, "{}", response);
assert_eq!(response["hits"].as_array().unwrap().len(), 5);
},
)
.await;
}
#[actix_rt::test]
async fn search_with_multiple_sort() {
let server = Server::new().await;
let index = server.index("test");
index
.update_settings(json!({"sortableAttributes": ["id", "title"]}))
.await;
let documents = DOCUMENTS.clone();
index.add_documents(documents, None).await;
index.wait_update_id(1).await;
let (response, code) = index
.search_post(json!({
"sort": ["id:asc", "title:desc"]
}))
.await;
assert_eq!(code, 200, "{}", response);
assert_eq!(response["hits"].as_array().unwrap().len(), 5);
}
#[actix_rt::test]
async fn search_facet_distribution() {
let server = Server::new().await;

View File

@ -13,7 +13,14 @@ static DEFAULT_SETTINGS_VALUES: Lazy<HashMap<&'static str, Value>> = Lazy::new(|
map.insert("distinct_attribute", json!(Value::Null));
map.insert(
"ranking_rules",
json!(["words", "typo", "proximity", "attribute", "exactness"]),
json!([
"words",
"typo",
"proximity",
"attribute",
"sort",
"exactness"
]),
);
map.insert("stop_words", json!([]));
map.insert("synonyms", json!({}));
@ -23,8 +30,8 @@ static DEFAULT_SETTINGS_VALUES: Lazy<HashMap<&'static str, Value>> = Lazy::new(|
#[actix_rt::test]
async fn get_settings_unexisting_index() {
let server = Server::new().await;
let (_response, code) = server.index("test").settings().await;
assert_eq!(code, 404)
let (response, code) = server.index("test").settings().await;
assert_eq!(code, 404, "{}", response)
}
#[actix_rt::test]
@ -35,14 +42,22 @@ async fn get_settings() {
let (response, code) = index.settings().await;
assert_eq!(code, 200);
let settings = response.as_object().unwrap();
assert_eq!(settings.keys().len(), 7);
assert_eq!(settings.keys().len(), 8);
assert_eq!(settings["displayedAttributes"], json!(["*"]));
assert_eq!(settings["searchableAttributes"], json!(["*"]));
assert_eq!(settings["filterableAttributes"], json!([]));
assert_eq!(settings["sortableAttributes"], json!([]));
assert_eq!(settings["distinctAttribute"], json!(null));
assert_eq!(
settings["rankingRules"],
json!(["words", "typo", "proximity", "attribute", "exactness"])
json!([
"words",
"typo",
"proximity",
"attribute",
"sort",
"exactness"
])
);
assert_eq!(settings["stopWords"], json!([]));
}
@ -152,8 +167,8 @@ async fn update_setting_unexisting_index() {
async fn update_setting_unexisting_index_invalid_uid() {
let server = Server::new().await;
let index = server.index("test##! ");
let (_response, code) = index.update_settings(json!({})).await;
assert_eq!(code, 400);
let (response, code) = index.update_settings(json!({})).await;
assert_eq!(code, 400, "{}", response);
}
macro_rules! test_setting_routes {

View File

@ -9,8 +9,8 @@ use meilisearch_http::Opt;
#[actix_rt::test]
async fn perform_snapshot() {
let temp = tempfile::tempdir_in(".").unwrap();
let snapshot_dir = tempfile::tempdir_in(".").unwrap();
let temp = tempfile::tempdir().unwrap();
let snapshot_dir = tempfile::tempdir().unwrap();
let options = Opt {
snapshot_dir: snapshot_dir.path().to_owned(),
@ -29,7 +29,7 @@ async fn perform_snapshot() {
sleep(Duration::from_secs(2)).await;
let temp = tempfile::tempdir_in(".").unwrap();
let temp = tempfile::tempdir().unwrap();
let snapshot_path = snapshot_dir
.path()

View File

@ -0,0 +1,64 @@
[package]
name = "meilisearch-lib"
version = "0.23.0"
edition = "2018"
# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
[dependencies]
actix-web = { version = "4.0.0-beta.9", features = ["rustls"] }
actix-web-static-files = { git = "https://github.com/MarinPostma/actix-web-static-files.git", rev = "39d8006", optional = true }
anyhow = { version = "1.0.43", features = ["backtrace"] }
async-stream = "0.3.2"
async-trait = "0.1.51"
arc-swap = "1.3.2"
byte-unit = { version = "4.0.12", default-features = false, features = ["std"] }
bytes = "1.1.0"
chrono = { version = "0.4.19", features = ["serde"] }
csv = "1.1.6"
crossbeam-channel = "0.5.1"
either = "1.6.1"
flate2 = "1.0.21"
fst = "0.4.7"
futures = "0.3.17"
futures-util = "0.3.17"
heed = { git = "https://github.com/Kerollmops/heed", tag = "v0.12.1" }
http = "0.2.4"
indexmap = { version = "1.7.0", features = ["serde-1"] }
itertools = "0.10.1"
lazy_static = "1.4.0"
log = "0.4.14"
meilisearch-error = { path = "../meilisearch-error" }
meilisearch-tokenizer = { git = "https://github.com/meilisearch/tokenizer.git", tag = "v0.2.5" }
memmap = "0.7.0"
milli = { git = "https://github.com/meilisearch/milli.git", tag = "v0.17.2" }
mime = "0.3.16"
num_cpus = "1.13.0"
once_cell = "1.8.0"
parking_lot = "0.11.2"
rand = "0.8.4"
rayon = "1.5.1"
regex = "1.5.4"
rustls = "0.19.1"
serde = { version = "1.0.130", features = ["derive"] }
serde_json = { version = "1.0.67", features = ["preserve_order"] }
siphasher = "0.3.7"
slice-group-by = "0.2.6"
structopt = "0.3.23"
tar = "0.4.37"
tempfile = "3.2.0"
thiserror = "1.0.28"
tokio = { version = "1.11.0", features = ["full"] }
uuid = { version = "0.8.2", features = ["serde"] }
walkdir = "2.3.2"
obkv = "0.2.0"
pin-project = "1.0.8"
whoami = { version = "1.1.3", optional = true }
reqwest = { version = "0.11.4", features = ["json", "rustls-tls"], default-features = false, optional = true }
sysinfo = "0.20.2"
derivative = "2.2.0"
[dev-dependencies]
actix-rt = "2.2.0"
mockall = "0.10.2"
paste = "1.0.5"

View File

@ -0,0 +1,374 @@
use std::fmt;
use std::io::{self, Read, Result as IoResult, Seek, Write};
use csv::{Reader as CsvReader, StringRecordsIntoIter};
use meilisearch_error::{Code, ErrorCode};
use milli::documents::DocumentBatchBuilder;
use serde_json::{Deserializer, Map, Value};
type Result<T> = std::result::Result<T, DocumentFormatError>;
#[derive(Debug)]
pub enum PayloadType {
Ndjson,
Json,
Csv,
}
impl fmt::Display for PayloadType {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
match self {
PayloadType::Ndjson => write!(f, "ndjson"),
PayloadType::Json => write!(f, "json"),
PayloadType::Csv => write!(f, "csv"),
}
}
}
#[derive(thiserror::Error, Debug)]
pub enum DocumentFormatError {
#[error("Internal error: {0}")]
Internal(Box<dyn std::error::Error + Send + Sync + 'static>),
#[error("{0}. The {1} payload provided is malformed.")]
MalformedPayload(
Box<dyn std::error::Error + Send + Sync + 'static>,
PayloadType,
),
}
impl ErrorCode for DocumentFormatError {
fn error_code(&self) -> Code {
match self {
DocumentFormatError::Internal(_) => Code::Internal,
DocumentFormatError::MalformedPayload(_, _) => Code::MalformedPayload,
}
}
}
internal_error!(DocumentFormatError: milli::documents::Error, io::Error);
macro_rules! malformed {
($type:path, $e:expr) => {
$e.map_err(|e| DocumentFormatError::MalformedPayload(Box::new(e), $type))
};
}
pub fn read_csv(input: impl Read, writer: impl Write + Seek) -> Result<()> {
let mut builder = DocumentBatchBuilder::new(writer).unwrap();
let iter = CsvDocumentIter::from_reader(input)?;
for doc in iter {
let doc = doc?;
builder.add_documents(doc).unwrap();
}
builder.finish().unwrap();
Ok(())
}
/// read jsonl from input and write an obkv batch to writer.
pub fn read_ndjson(input: impl Read, writer: impl Write + Seek) -> Result<()> {
let mut builder = DocumentBatchBuilder::new(writer)?;
let stream = Deserializer::from_reader(input).into_iter::<Map<String, Value>>();
for value in stream {
let value = malformed!(PayloadType::Ndjson, value)?;
builder.add_documents(&value)?;
}
builder.finish()?;
Ok(())
}
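A minimal usage sketch for read_ndjson, assuming the function above is in scope (illustrative only; the test name is hypothetical):

#[test]
fn read_ndjson_sketch() {
    use std::io::Cursor;

    // Two documents, one JSON object per line.
    let input: &[u8] = br#"{"id": 1, "title": "Pride and Prejudice"}
{"id": 2, "title": "Le Petit Prince"}"#;
    let mut writer = Cursor::new(Vec::new());
    read_ndjson(input, &mut writer).unwrap();
    // `writer` now holds the obkv document batch that the indexer consumes.
    assert!(!writer.into_inner().is_empty());
}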
/// read json from input and write an obkv batch to writer.
pub fn read_json(input: impl Read, writer: impl Write + Seek) -> Result<()> {
let mut builder = DocumentBatchBuilder::new(writer).unwrap();
let documents: Vec<Map<String, Value>> =
malformed!(PayloadType::Json, serde_json::from_reader(input))?;
builder.add_documents(documents).unwrap();
builder.finish().unwrap();
Ok(())
}
enum AllowedType {
String,
Number,
}
fn parse_csv_header(header: &str) -> (String, AllowedType) {
// if there are several separators we only split on the last one.
match header.rsplit_once(':') {
Some((field_name, field_type)) => match field_type {
"string" => (field_name.to_string(), AllowedType::String),
"number" => (field_name.to_string(), AllowedType::Number),
// if the pattern isn't recognized, we keep the whole field.
_otherwise => (header.to_string(), AllowedType::String),
},
None => (header.to_string(), AllowedType::String),
}
}
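A minimal sketch of the typed-header convention handled above, assuming parse_csv_header and AllowedType are in scope (illustrative only; the test name is hypothetical):

#[test]
fn typed_header_sketch() {
    // A recognized type suffix is stripped from the field name.
    let (name, ty) = parse_csv_header("pop:number");
    assert_eq!(name, "pop");
    assert!(matches!(ty, AllowedType::Number));

    // An unrecognized suffix keeps the whole header as the field name, typed as a string.
    let (name, ty) = parse_csv_header("pop:integer");
    assert_eq!(name, "pop:integer");
    assert!(matches!(ty, AllowedType::String));
}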
pub struct CsvDocumentIter<R>
where
R: Read,
{
documents: StringRecordsIntoIter<R>,
headers: Vec<(String, AllowedType)>,
}
impl<R: Read> CsvDocumentIter<R> {
pub fn from_reader(reader: R) -> IoResult<Self> {
let mut records = CsvReader::from_reader(reader);
let headers = records
.headers()?
.into_iter()
.map(parse_csv_header)
.collect();
Ok(Self {
documents: records.into_records(),
headers,
})
}
}
impl<R: Read> Iterator for CsvDocumentIter<R> {
type Item = Result<Map<String, Value>>;
fn next(&mut self) -> Option<Self::Item> {
let csv_document = self.documents.next()?;
match csv_document {
Ok(csv_document) => {
let mut document = Map::new();
for ((field_name, field_type), value) in
self.headers.iter().zip(csv_document.into_iter())
{
let parsed_value = match field_type {
AllowedType::Number => {
malformed!(PayloadType::Csv, value.parse::<f64>().map(Value::from))
}
AllowedType::String => Ok(Value::String(value.to_string())),
};
match parsed_value {
Ok(value) => drop(document.insert(field_name.to_string(), value)),
Err(e) => return Some(Err(e)),
}
}
Some(Ok(document))
}
Err(e) => Some(Err(DocumentFormatError::MalformedPayload(
Box::new(e),
PayloadType::Csv,
))),
}
}
}
#[cfg(test)]
mod test {
use serde_json::json;
use super::*;
#[test]
fn simple_csv_document() {
let documents = r#"city,country,pop
"Boston","United States","4628910""#;
let mut csv_iter = CsvDocumentIter::from_reader(documents.as_bytes()).unwrap();
assert_eq!(
Value::Object(csv_iter.next().unwrap().unwrap()),
json!({
"city": "Boston",
"country": "United States",
"pop": "4628910",
})
);
}
#[test]
fn coma_in_field() {
let documents = r#"city,country,pop
"Boston","United, States","4628910""#;
let mut csv_iter = CsvDocumentIter::from_reader(documents.as_bytes()).unwrap();
assert_eq!(
Value::Object(csv_iter.next().unwrap().unwrap()),
json!({
"city": "Boston",
"country": "United, States",
"pop": "4628910",
})
);
}
#[test]
fn quote_in_field() {
let documents = r#"city,country,pop
"Boston","United"" States","4628910""#;
let mut csv_iter = CsvDocumentIter::from_reader(documents.as_bytes()).unwrap();
assert_eq!(
Value::Object(csv_iter.next().unwrap().unwrap()),
json!({
"city": "Boston",
"country": "United\" States",
"pop": "4628910",
})
);
}
#[test]
fn integer_in_field() {
let documents = r#"city,country,pop:number
"Boston","United States","4628910""#;
let mut csv_iter = CsvDocumentIter::from_reader(documents.as_bytes()).unwrap();
assert_eq!(
Value::Object(csv_iter.next().unwrap().unwrap()),
json!({
"city": "Boston",
"country": "United States",
"pop": 4628910.0,
})
);
}
#[test]
fn float_in_field() {
let documents = r#"city,country,pop:number
"Boston","United States","4628910.01""#;
let mut csv_iter = CsvDocumentIter::from_reader(documents.as_bytes()).unwrap();
assert_eq!(
Value::Object(csv_iter.next().unwrap().unwrap()),
json!({
"city": "Boston",
"country": "United States",
"pop": 4628910.01,
})
);
}
#[test]
fn several_colon_in_header() {
let documents = r#"city:love:string,country:state,pop
"Boston","United States","4628910""#;
let mut csv_iter = CsvDocumentIter::from_reader(documents.as_bytes()).unwrap();
assert_eq!(
Value::Object(csv_iter.next().unwrap().unwrap()),
json!({
"city:love": "Boston",
"country:state": "United States",
"pop": "4628910",
})
);
}
#[test]
fn ending_by_colon_in_header() {
let documents = r#"city:,country,pop
"Boston","United States","4628910""#;
let mut csv_iter = CsvDocumentIter::from_reader(documents.as_bytes()).unwrap();
assert_eq!(
Value::Object(csv_iter.next().unwrap().unwrap()),
json!({
"city:": "Boston",
"country": "United States",
"pop": "4628910",
})
);
}
#[test]
fn starting_by_colon_in_header() {
let documents = r#":city,country,pop
"Boston","United States","4628910""#;
let mut csv_iter = CsvDocumentIter::from_reader(documents.as_bytes()).unwrap();
assert_eq!(
Value::Object(csv_iter.next().unwrap().unwrap()),
json!({
":city": "Boston",
"country": "United States",
"pop": "4628910",
})
);
}
#[ignore]
#[test]
fn starting_by_colon_in_header2() {
let documents = r#":string,country,pop
"Boston","United States","4628910""#;
let mut csv_iter = CsvDocumentIter::from_reader(documents.as_bytes()).unwrap();
assert!(csv_iter.next().unwrap().is_err());
}
#[test]
fn double_colon_in_header() {
let documents = r#"city::string,country,pop
"Boston","United States","4628910""#;
let mut csv_iter = CsvDocumentIter::from_reader(documents.as_bytes()).unwrap();
assert_eq!(
Value::Object(csv_iter.next().unwrap().unwrap()),
json!({
"city:": "Boston",
"country": "United States",
"pop": "4628910",
})
);
}
#[test]
fn bad_type_in_header() {
let documents = r#"city,country:number,pop
"Boston","United States","4628910""#;
let mut csv_iter = CsvDocumentIter::from_reader(documents.as_bytes()).unwrap();
assert!(csv_iter.next().unwrap().is_err());
}
#[test]
fn bad_column_count1() {
let documents = r#"city,country,pop
"Boston","United States","4628910", "too much""#;
let mut csv_iter = CsvDocumentIter::from_reader(documents.as_bytes()).unwrap();
assert!(csv_iter.next().unwrap().is_err());
}
#[test]
fn bad_column_count2() {
let documents = r#"city,country,pop
"Boston","United States""#;
let mut csv_iter = CsvDocumentIter::from_reader(documents.as_bytes()).unwrap();
assert!(csv_iter.next().unwrap().is_err());
}
}
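// A hedged usage sketch (not part of the original change): `read_csv` turns a
// typed-header CSV payload into the obkv batch format that milli's
// `DocumentBatchReader` consumes. `Cursor` here stands in for any `Write + Seek`
// target such as a temporary file.
#[cfg(test)]
mod usage_sketch {
    use std::io::Cursor;
    use super::*;
    #[test]
    fn csv_to_document_batch() {
        let csv = "city,pop:number\n\"Boston\",\"4628910\"";
        let mut batch = Cursor::new(Vec::new());
        // The `pop:number` header makes the value a JSON number (4628910.0) in the batch.
        read_csv(csv.as_bytes(), &mut batch).unwrap();
        assert!(!batch.get_ref().is_empty());
    }
}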

View File

@ -0,0 +1,62 @@
use std::error::Error;
use std::fmt;
use meilisearch_error::{Code, ErrorCode};
use milli::UserError;
macro_rules! internal_error {
($target:ty : $($other:path), *) => {
$(
impl From<$other> for $target {
fn from(other: $other) -> Self {
Self::Internal(Box::new(other))
}
}
)*
}
}
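// A hedged sketch of what one expansion of the `internal_error!` macro above looks
// like: for every source type listed in an invocation, a `From` impl is generated
// that wraps the error into the target's `Internal` variant. `SketchError` below is
// a local stand-in so the expansion can be shown on its own.
#[cfg(test)]
mod internal_error_expansion_sketch {
    #[derive(Debug)]
    enum SketchError {
        Internal(Box<dyn std::error::Error + Send + Sync + 'static>),
    }
    // Hand-written equivalent of `internal_error!(SketchError: std::io::Error);`
    impl From<std::io::Error> for SketchError {
        fn from(other: std::io::Error) -> Self {
            Self::Internal(Box::new(other))
        }
    }
}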
#[derive(Debug)]
pub struct MilliError<'a>(pub &'a milli::Error);
impl Error for MilliError<'_> {}
impl fmt::Display for MilliError<'_> {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
self.0.fmt(f)
}
}
impl ErrorCode for MilliError<'_> {
fn error_code(&self) -> Code {
match self.0 {
milli::Error::InternalError(_) => Code::Internal,
milli::Error::IoError(_) => Code::Internal,
milli::Error::UserError(ref error) => {
match error {
// TODO: wait for spec for new error codes.
UserError::SerdeJson(_)
| UserError::MaxDatabaseSizeReached
| UserError::InvalidDocumentId { .. }
| UserError::InvalidStoreFile
| UserError::NoSpaceLeftOnDevice
| UserError::DocumentLimitReached => Code::Internal,
UserError::AttributeLimitReached => Code::MaxFieldsLimitExceeded,
UserError::InvalidFilter(_) => Code::Filter,
UserError::InvalidFilterAttribute(_) => Code::Filter,
UserError::MissingDocumentId { .. } => Code::MissingDocumentId,
UserError::MissingPrimaryKey => Code::MissingPrimaryKey,
UserError::PrimaryKeyCannotBeChanged => Code::PrimaryKeyAlreadyPresent,
UserError::PrimaryKeyCannotBeReset => Code::PrimaryKeyAlreadyPresent,
UserError::SortRankingRuleMissing => Code::Sort,
UserError::UnknownInternalDocumentId { .. } => Code::DocumentNotFound,
UserError::InvalidFacetsDistribution { .. } => Code::BadRequest,
UserError::InvalidSortableAttribute { .. } => Code::Sort,
UserError::CriterionError(_) => Code::InvalidRankingRule,
UserError::InvalidGeoField { .. } => Code::InvalidGeoField,
UserError::SortError(_) => Code::Sort,
}
}
}
}
}

View File

@ -1,18 +1,19 @@
use std::fs::{create_dir_all, File};
use std::io::{BufRead, BufReader, Write};
use std::io::{BufReader, Seek, SeekFrom, Write};
use std::path::Path;
use std::sync::Arc;
use anyhow::{bail, Context};
use heed::RoTxn;
use anyhow::Context;
use heed::{EnvOpenOptions, RoTxn};
use indexmap::IndexMap;
use milli::update::{IndexDocumentsMethod, UpdateFormat::JsonStream};
use milli::documents::DocumentBatchReader;
use serde::{Deserialize, Serialize};
use crate::option::IndexerOpts;
use crate::document_formats::read_ndjson;
use crate::index::update_handler::UpdateHandler;
use crate::index::updates::apply_settings_to_builder;
use super::error::Result;
use super::{update_handler::UpdateHandler, Index, Settings, Unchecked};
use super::{index::Index, Settings, Unchecked};
#[derive(Serialize, Deserialize)]
struct DumpMeta {
@ -27,6 +28,11 @@ impl Index {
pub fn dump(&self, path: impl AsRef<Path>) -> Result<()> {
// Acquire a write txn to make sure any ongoing write is finished before we start.
let txn = self.env.write_txn()?;
let path = path
.as_ref()
.join(format!("indexes/{}", self.uuid.to_string()));
create_dir_all(&path)?;
self.dump_documents(&txn, &path)?;
self.dump_meta(&txn, &path)?;
@ -81,7 +87,7 @@ impl Index {
src: impl AsRef<Path>,
dst: impl AsRef<Path>,
size: usize,
indexing_options: &IndexerOpts,
update_handler: &UpdateHandler,
) -> anyhow::Result<()> {
let dir_name = src
.as_ref()
@ -92,42 +98,54 @@ impl Index {
create_dir_all(&dst_dir_path)?;
let meta_path = src.as_ref().join(META_FILE_NAME);
let mut meta_file = File::open(meta_path)?;
let meta_file = File::open(meta_path)?;
let DumpMeta {
settings,
primary_key,
} = serde_json::from_reader(&mut meta_file)?;
} = serde_json::from_reader(meta_file)?;
let settings = settings.check();
let index = Self::open(&dst_dir_path, size)?;
let mut options = EnvOpenOptions::new();
options.map_size(size);
let index = milli::Index::new(options, &dst_dir_path)?;
let mut txn = index.write_txn()?;
let handler = UpdateHandler::new(indexing_options)?;
// Apply settings first
let builder = update_handler.update_builder(0);
let mut builder = builder.settings(&mut txn, &index);
index.update_settings_txn(&mut txn, &settings, handler.update_builder(0))?;
if let Some(primary_key) = primary_key {
builder.set_primary_key(primary_key);
}
apply_settings_to_builder(&settings, &mut builder);
builder.execute(|_, _| ())?;
let document_file_path = src.as_ref().join(DATA_FILE_NAME);
let reader = File::open(&document_file_path)?;
let mut reader = BufReader::new(reader);
reader.fill_buf()?;
// If the document file is empty, we don't perform the document addition, to prevent
// a primary key error from being thrown.
if !reader.buffer().is_empty() {
index.update_documents_txn(
&mut txn,
JsonStream,
IndexDocumentsMethod::UpdateDocuments,
Some(reader),
handler.update_builder(0),
primary_key.as_deref(),
)?;
let reader = BufReader::new(File::open(&document_file_path)?);
let mut tmp_doc_file = tempfile::tempfile()?;
read_ndjson(reader, &mut tmp_doc_file)?;
tmp_doc_file.seek(SeekFrom::Start(0))?;
let documents_reader = DocumentBatchReader::from_reader(tmp_doc_file)?;
// If the document file is empty, we don't perform the document addition, to prevent
// a primary key error from being thrown.
if !documents_reader.is_empty() {
let builder = update_handler
.update_builder(0)
.index_documents(&mut txn, &index);
builder.execute(documents_reader, |_, _| ())?;
}
txn.commit()?;
match Arc::try_unwrap(index.0) {
Ok(inner) => inner.prepare_for_closing().wait(),
Err(_) => bail!("Could not close index properly."),
}
index.prepare_for_closing().wait();
Ok(())
}

View File

@ -17,6 +17,8 @@ pub enum IndexError {
Facet(#[from] FacetError),
#[error("{0}")]
Milli(#[from] milli::Error),
#[error("A primary key is already present. It's impossible to update it")]
ExistingPrimaryKey,
}
internal_error!(
@ -33,6 +35,7 @@ impl ErrorCode for IndexError {
IndexError::DocumentNotFound(_) => Code::DocumentNotFound,
IndexError::Facet(e) => e.error_code(),
IndexError::Milli(e) => MilliError(e).error_code(),
IndexError::ExistingPrimaryKey => Code::PrimaryKeyAlreadyPresent,
}
}
}

View File

@ -5,61 +5,130 @@ use std::ops::Deref;
use std::path::Path;
use std::sync::Arc;
use chrono::{DateTime, Utc};
use heed::{EnvOpenOptions, RoTxn};
use milli::{obkv_to_json, FieldId};
use serde::{de::Deserializer, Deserialize};
use milli::update::Setting;
use milli::{obkv_to_json, FieldDistribution, FieldId};
use serde::{Deserialize, Serialize};
use serde_json::{Map, Value};
use uuid::Uuid;
use crate::helpers::EnvSizer;
use error::Result;
use crate::index_controller::update_file_store::UpdateFileStore;
use crate::EnvSizer;
pub use search::{default_crop_length, SearchQuery, SearchResult, DEFAULT_SEARCH_LIMIT};
pub use updates::{Checked, Facets, Settings, Unchecked};
use self::error::IndexError;
pub mod error;
pub mod update_handler;
mod dump;
mod search;
mod updates;
use super::error::IndexError;
use super::error::Result;
use super::update_handler::UpdateHandler;
use super::{Checked, Settings};
pub type Document = Map<String, Value>;
#[derive(Clone)]
pub struct Index(pub Arc<milli::Index>);
#[derive(Debug, Serialize, Deserialize, Clone)]
#[serde(rename_all = "camelCase")]
pub struct IndexMeta {
created_at: DateTime<Utc>,
pub updated_at: DateTime<Utc>,
pub primary_key: Option<String>,
}
impl IndexMeta {
pub fn new(index: &Index) -> Result<Self> {
let txn = index.read_txn()?;
Self::new_txn(index, &txn)
}
pub fn new_txn(index: &Index, txn: &heed::RoTxn) -> Result<Self> {
let created_at = index.created_at(txn)?;
let updated_at = index.updated_at(txn)?;
let primary_key = index.primary_key(txn)?.map(String::from);
Ok(Self {
created_at,
updated_at,
primary_key,
})
}
}
#[derive(Serialize, Debug)]
#[serde(rename_all = "camelCase")]
pub struct IndexStats {
#[serde(skip)]
pub size: u64,
pub number_of_documents: u64,
/// Whether the current index is performing an update. It is initially `None` when the
/// index returns it, since it is the `UpdateStore` that knows which index is currently indexing. It is
/// later set to either true or false when we retrieve the information from the `UpdateStore`.
pub is_indexing: Option<bool>,
pub field_distribution: FieldDistribution,
}
#[derive(Clone, derivative::Derivative)]
#[derivative(Debug)]
pub struct Index {
pub uuid: Uuid,
#[derivative(Debug = "ignore")]
pub inner: Arc<milli::Index>,
#[derivative(Debug = "ignore")]
pub update_file_store: Arc<UpdateFileStore>,
#[derivative(Debug = "ignore")]
pub update_handler: Arc<UpdateHandler>,
}
impl Deref for Index {
type Target = milli::Index;
fn deref(&self) -> &Self::Target {
self.0.as_ref()
self.inner.as_ref()
}
}
pub fn deserialize_some<'de, T, D>(deserializer: D) -> std::result::Result<Option<T>, D::Error>
where
T: Deserialize<'de>,
D: Deserializer<'de>,
{
Deserialize::deserialize(deserializer).map(Some)
}
impl Index {
pub fn open(path: impl AsRef<Path>, size: usize) -> Result<Self> {
pub fn open(
path: impl AsRef<Path>,
size: usize,
update_file_store: Arc<UpdateFileStore>,
uuid: Uuid,
update_handler: Arc<UpdateHandler>,
) -> Result<Self> {
create_dir_all(&path)?;
let mut options = EnvOpenOptions::new();
options.map_size(size);
let index = milli::Index::new(options, &path)?;
Ok(Index(Arc::new(index)))
let inner = Arc::new(milli::Index::new(options, &path)?);
Ok(Index {
inner,
update_file_store,
uuid,
update_handler,
})
}
pub fn inner(&self) -> &milli::Index {
&self.inner
}
pub fn stats(&self) -> Result<IndexStats> {
let rtxn = self.read_txn()?;
Ok(IndexStats {
size: self.size(),
number_of_documents: self.number_of_documents(&rtxn)?,
is_indexing: None,
field_distribution: self.field_distribution(&rtxn)?,
})
}
pub fn meta(&self) -> Result<IndexMeta> {
IndexMeta::new(self)
}
pub fn settings(&self) -> Result<Settings<Checked>> {
let txn = self.read_txn()?;
self.settings_txn(&txn)
}
pub fn uuid(&self) -> Uuid {
self.uuid
}
pub fn settings_txn(&self, txn: &RoTxn) -> Result<Settings<Checked>> {
let displayed_attributes = self
.displayed_fields(txn)?
@ -71,6 +140,8 @@ impl Index {
let filterable_attributes = self.filterable_fields(txn)?.into_iter().collect();
let sortable_attributes = self.sortable_fields(txn)?.into_iter().collect();
let criteria = self
.criteria(txn)?
.into_iter()
@ -100,13 +171,23 @@ impl Index {
.collect();
Ok(Settings {
displayed_attributes: Some(displayed_attributes),
searchable_attributes: Some(searchable_attributes),
filterable_attributes: Some(Some(filterable_attributes)),
ranking_rules: Some(Some(criteria)),
stop_words: Some(Some(stop_words)),
distinct_attribute: Some(distinct_field),
synonyms: Some(Some(synonyms)),
displayed_attributes: match displayed_attributes {
Some(attrs) => Setting::Set(attrs),
None => Setting::Reset,
},
searchable_attributes: match searchable_attributes {
Some(attrs) => Setting::Set(attrs),
None => Setting::Reset,
},
filterable_attributes: Setting::Set(filterable_attributes),
sortable_attributes: Setting::Set(sortable_attributes),
ranking_rules: Setting::Set(criteria),
stop_words: Setting::Set(stop_words),
distinct_attribute: match distinct_field {
Some(field) => Setting::Set(field),
None => Setting::Reset,
},
synonyms: Setting::Set(synonyms),
_kind: PhantomData,
})
}
@ -191,4 +272,15 @@ impl Index {
displayed_fields_ids.retain(|fid| attributes_to_retrieve_ids.contains(fid));
Ok(displayed_fields_ids)
}
pub fn snapshot(&self, path: impl AsRef<Path>) -> Result<()> {
let mut dst = path.as_ref().join(format!("indexes/{}/", self.uuid));
create_dir_all(&dst)?;
dst.push("data.mdb");
let _txn = self.write_txn()?;
self.inner
.env
.copy_to_path(dst, heed::CompactionOption::Enabled)?;
Ok(())
}
}

View File

@ -0,0 +1,365 @@
pub use search::{default_crop_length, SearchQuery, SearchResult, DEFAULT_SEARCH_LIMIT};
pub use updates::{apply_settings_to_builder, Checked, Facets, Settings, Unchecked};
mod dump;
pub mod error;
mod search;
pub mod update_handler;
mod updates;
#[allow(clippy::module_inception)]
mod index;
pub use index::{Document, IndexMeta, IndexStats};
#[cfg(not(test))]
pub use index::Index;
#[cfg(test)]
pub use test::MockIndex as Index;
/// The index::test module provides means of mocking an index instance. It can be used throughout the
/// code for unit testing, in places where an index would normally be used.
#[cfg(test)]
pub mod test {
use std::any::Any;
use std::collections::HashMap;
use std::panic::{RefUnwindSafe, UnwindSafe};
use std::path::Path;
use std::path::PathBuf;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, Mutex};
use serde_json::{Map, Value};
use uuid::Uuid;
use crate::index_controller::update_file_store::UpdateFileStore;
use crate::index_controller::updates::status::{Failed, Processed, Processing};
use super::error::Result;
use super::index::Index;
use super::update_handler::UpdateHandler;
use super::{Checked, IndexMeta, IndexStats, SearchQuery, SearchResult, Settings};
pub struct Stub<A, R> {
name: String,
times: Mutex<Option<usize>>,
stub: Box<dyn Fn(A) -> R + Sync + Send>,
invalidated: AtomicBool,
}
impl<A, R> Drop for Stub<A, R> {
fn drop(&mut self) {
if !self.invalidated.load(Ordering::Relaxed) {
let lock = self.times.lock().unwrap();
if let Some(n) = *lock {
assert_eq!(n, 0, "{} not called enough times", self.name);
}
}
}
}
impl<A, R> Stub<A, R> {
fn invalidate(&self) {
self.invalidated.store(true, Ordering::Relaxed);
}
}
impl<A: UnwindSafe, R> Stub<A, R> {
fn call(&self, args: A) -> R {
let mut lock = self.times.lock().unwrap();
match *lock {
Some(0) => panic!("{} called too many times", self.name),
Some(ref mut times) => {
*times -= 1;
}
None => (),
}
// Since we add assertions in the drop implementation for Stub, a panic can occur while we
// are already unwinding from another panic, causing a hard abort of the program. To handle
// that, we catch the panic and set the stub as invalidated so the assertions aren't run during the drop.
impl<'a, A, R> RefUnwindSafe for StubHolder<'a, A, R> {}
struct StubHolder<'a, A, R>(&'a (dyn Fn(A) -> R + Sync + Send));
let stub = StubHolder(self.stub.as_ref());
match std::panic::catch_unwind(|| (stub.0)(args)) {
Ok(r) => r,
Err(panic) => {
self.invalidate();
std::panic::resume_unwind(panic);
}
}
}
}
#[derive(Debug, Default)]
struct StubStore {
inner: Arc<Mutex<HashMap<String, Box<dyn Any + Sync + Send>>>>,
}
impl StubStore {
pub fn insert<A: 'static, R: 'static>(&self, name: String, stub: Stub<A, R>) {
let mut lock = self.inner.lock().unwrap();
lock.insert(name, Box::new(stub));
}
pub fn get<A, B>(&self, name: &str) -> Option<&Stub<A, B>> {
let mut lock = self.inner.lock().unwrap();
match lock.get_mut(name) {
Some(s) => {
let s = s.as_mut() as *mut dyn Any as *mut Stub<A, B>;
Some(unsafe { &mut *s })
}
None => None,
}
}
}
pub struct StubBuilder<'a, A, R> {
name: String,
store: &'a StubStore,
times: Option<usize>,
_f: std::marker::PhantomData<fn(A) -> R>,
}
impl<'a, A: 'static, R: 'static> StubBuilder<'a, A, R> {
/// Asserts the stub has been called exactly `times` times.
#[must_use]
pub fn times(mut self, times: usize) -> Self {
self.times = Some(times);
self
}
/// Asserts the stub has been called exactly once.
#[must_use]
pub fn once(mut self) -> Self {
self.times = Some(1);
self
}
/// Sets the function that will be called when the stub is invoked. This needs to be called to
/// actually build the stub and register it in the stub store.
pub fn then(self, f: impl Fn(A) -> R + Sync + Send + 'static) {
let times = Mutex::new(self.times);
let stub = Stub {
stub: Box::new(f),
times,
name: self.name.clone(),
invalidated: AtomicBool::new(false),
};
self.store.insert(self.name, stub);
}
}
/// `Mocker` allows stubbing method calls on any struct. You can register stubs by calling
/// `Mocker::when` and retrieve them in the proxy implementation with `Mocker::get`.
#[derive(Debug, Default)]
pub struct Mocker {
store: StubStore,
}
impl Mocker {
pub fn when<A, R>(&self, name: &str) -> StubBuilder<A, R> {
StubBuilder {
name: name.to_string(),
store: &self.store,
times: None,
_f: std::marker::PhantomData,
}
}
pub fn get<A, R>(&self, name: &str) -> &Stub<A, R> {
match self.store.get(name) {
Some(stub) => stub,
None => {
// A panic here would cause the stubs to get dropped, and panic in turn. To prevent
// that, we forget them and let them be cleaned up by the OS later. This is not
// optimal, but it is still better than nested panics.
let mut stubs = self.store.inner.lock().unwrap();
let stubs = std::mem::take(&mut *stubs);
std::mem::forget(stubs);
panic!("unexpected call to {}", name)
}
}
}
}
#[derive(Debug, Clone)]
pub enum MockIndex {
Vrai(Index),
Faux(Arc<Mocker>),
}
impl MockIndex {
pub fn faux(faux: Mocker) -> Self {
Self::Faux(Arc::new(faux))
}
pub fn open(
path: impl AsRef<Path>,
size: usize,
update_file_store: Arc<UpdateFileStore>,
uuid: Uuid,
update_handler: Arc<UpdateHandler>,
) -> Result<Self> {
let index = Index::open(path, size, update_file_store, uuid, update_handler)?;
Ok(Self::Vrai(index))
}
pub fn load_dump(
src: impl AsRef<Path>,
dst: impl AsRef<Path>,
size: usize,
update_handler: &UpdateHandler,
) -> anyhow::Result<()> {
Index::load_dump(src, dst, size, update_handler)?;
Ok(())
}
pub fn handle_update(&self, update: Processing) -> std::result::Result<Processed, Failed> {
match self {
MockIndex::Vrai(index) => index.handle_update(update),
MockIndex::Faux(faux) => faux.get("handle_update").call(update),
}
}
pub fn uuid(&self) -> Uuid {
match self {
MockIndex::Vrai(index) => index.uuid(),
MockIndex::Faux(faux) => faux.get("uuid").call(()),
}
}
pub fn stats(&self) -> Result<IndexStats> {
match self {
MockIndex::Vrai(index) => index.stats(),
MockIndex::Faux(_) => todo!(),
}
}
pub fn meta(&self) -> Result<IndexMeta> {
match self {
MockIndex::Vrai(index) => index.meta(),
MockIndex::Faux(_) => todo!(),
}
}
pub fn settings(&self) -> Result<Settings<Checked>> {
match self {
MockIndex::Vrai(index) => index.settings(),
MockIndex::Faux(_) => todo!(),
}
}
pub fn retrieve_documents<S: AsRef<str>>(
&self,
offset: usize,
limit: usize,
attributes_to_retrieve: Option<Vec<S>>,
) -> Result<Vec<Map<String, Value>>> {
match self {
MockIndex::Vrai(index) => {
index.retrieve_documents(offset, limit, attributes_to_retrieve)
}
MockIndex::Faux(_) => todo!(),
}
}
pub fn retrieve_document<S: AsRef<str>>(
&self,
doc_id: String,
attributes_to_retrieve: Option<Vec<S>>,
) -> Result<Map<String, Value>> {
match self {
MockIndex::Vrai(index) => index.retrieve_document(doc_id, attributes_to_retrieve),
MockIndex::Faux(_) => todo!(),
}
}
pub fn size(&self) -> u64 {
match self {
MockIndex::Vrai(index) => index.size(),
MockIndex::Faux(_) => todo!(),
}
}
pub fn snapshot(&self, path: impl AsRef<Path>) -> Result<()> {
match self {
MockIndex::Vrai(index) => index.snapshot(path),
MockIndex::Faux(faux) => faux.get("snapshot").call(path.as_ref()),
}
}
pub fn inner(&self) -> &milli::Index {
match self {
MockIndex::Vrai(index) => index.inner(),
MockIndex::Faux(_) => todo!(),
}
}
pub fn update_primary_key(&self, primary_key: Option<String>) -> Result<IndexMeta> {
match self {
MockIndex::Vrai(index) => index.update_primary_key(primary_key),
MockIndex::Faux(_) => todo!(),
}
}
pub fn perform_search(&self, query: SearchQuery) -> Result<SearchResult> {
match self {
MockIndex::Vrai(index) => index.perform_search(query),
MockIndex::Faux(faux) => faux.get("perform_search").call(query),
}
}
pub fn dump(&self, path: impl AsRef<Path>) -> Result<()> {
match self {
MockIndex::Vrai(index) => index.dump(path),
MockIndex::Faux(faux) => faux.get("dump").call(path.as_ref()),
}
}
}
#[test]
fn test_faux_index() {
let faux = Mocker::default();
faux.when("snapshot")
.times(2)
.then(|_: &Path| -> Result<()> { Ok(()) });
let index = MockIndex::faux(faux);
let path = PathBuf::from("hello");
index.snapshot(&path).unwrap();
index.snapshot(&path).unwrap();
}
#[test]
#[should_panic]
fn test_faux_unexisting_method_stub() {
let faux = Mocker::default();
let index = MockIndex::faux(faux);
let path = PathBuf::from("hello");
index.snapshot(&path).unwrap();
index.snapshot(&path).unwrap();
}
#[test]
#[should_panic]
fn test_faux_panic() {
let faux = Mocker::default();
faux.when("snapshot")
.times(2)
.then(|_: &Path| -> Result<()> {
panic!();
});
let index = MockIndex::faux(faux);
let path = PathBuf::from("hello");
index.snapshot(&path).unwrap();
index.snapshot(&path).unwrap();
}
}

View File

@ -1,23 +1,25 @@
use std::collections::{BTreeMap, BTreeSet, HashSet};
use std::str::FromStr;
use std::time::Instant;
use either::Either;
use heed::RoTxn;
use indexmap::IndexMap;
use meilisearch_tokenizer::{Analyzer, AnalyzerConfig, Token};
use milli::{FieldId, FieldsIdsMap, FilterCondition, MatchingWords};
use milli::{AscDesc, FieldId, FieldsIdsMap, FilterCondition, MatchingWords, SortError};
use regex::Regex;
use serde::{Deserialize, Serialize};
use serde_json::Value;
use serde_json::{json, Value};
use crate::index::error::FacetError;
use super::error::Result;
use super::Index;
use super::error::{IndexError, Result};
use super::index::Index;
pub type Document = IndexMap<String, Value>;
type MatchesInfo = BTreeMap<String, Vec<MatchInfo>>;
#[derive(Serialize, Debug, Clone)]
#[derive(Serialize, Debug, Clone, PartialEq)]
pub struct MatchInfo {
start: usize,
length: usize,
@ -33,7 +35,7 @@ pub const fn default_crop_length() -> usize {
DEFAULT_CROP_LENGTH
}
#[derive(Deserialize, Debug)]
#[derive(Deserialize, Debug, Clone, PartialEq)]
#[serde(rename_all = "camelCase", deny_unknown_fields)]
pub struct SearchQuery {
pub q: Option<String>,
@ -49,10 +51,11 @@ pub struct SearchQuery {
#[serde(default = "Default::default")]
pub matches: bool,
pub filter: Option<Value>,
pub sort: Option<Vec<String>>,
pub facets_distribution: Option<Vec<String>>,
}
#[derive(Debug, Clone, Serialize)]
#[derive(Debug, Clone, Serialize, PartialEq)]
pub struct SearchHit {
#[serde(flatten)]
pub document: Document,
@ -62,7 +65,7 @@ pub struct SearchHit {
pub matches_info: Option<MatchesInfo>,
}
#[derive(Serialize, Debug)]
#[derive(Serialize, Debug, Clone, PartialEq)]
#[serde(rename_all = "camelCase")]
pub struct SearchResult {
pub hits: Vec<SearchHit>,
@ -104,6 +107,17 @@ impl Index {
}
}
if let Some(ref sort) = query.sort {
let sort = match sort.iter().map(|s| AscDesc::from_str(s)).collect() {
Ok(sorts) => sorts,
Err(asc_desc_error) => {
return Err(IndexError::Milli(SortError::from(asc_desc_error).into()))
}
};
search.sort_criteria(sort);
}
let milli::SearchResult {
documents_ids,
matching_words,
@ -176,7 +190,7 @@ impl Index {
let documents_iter = self.documents(&rtxn, documents_ids)?;
for (_id, obkv) in documents_iter {
let document = make_document(&to_retrieve_ids, &fields_ids_map, obkv)?;
let mut document = make_document(&to_retrieve_ids, &fields_ids_map, obkv)?;
let matches_info = query
.matches
@ -190,6 +204,10 @@ impl Index {
&formatted_options,
)?;
if let Some(sort) = query.sort.as_ref() {
insert_geo_distance(sort, &mut document);
}
let hit = SearchHit {
document,
formatted,
@ -230,6 +248,25 @@ impl Index {
}
}
fn insert_geo_distance(sorts: &[String], document: &mut Document) {
lazy_static::lazy_static! {
static ref GEO_REGEX: Regex =
Regex::new(r"_geoPoint\(\s*([[:digit:].\-]+)\s*,\s*([[:digit:].\-]+)\s*\)").unwrap();
};
if let Some(capture_group) = sorts.iter().find_map(|sort| GEO_REGEX.captures(sort)) {
// TODO: TAMO: milli encountered an internal error, what do we want to do?
let base = [
capture_group[1].parse().unwrap(),
capture_group[2].parse().unwrap(),
];
let geo_point = &document.get("_geo").unwrap_or(&json!(null));
if let Some((lat, lng)) = geo_point["lat"].as_f64().zip(geo_point["lng"].as_f64()) {
let distance = milli::distance_between_two_points(&base, &[lat, lng]);
document.insert("_geoDistance".to_string(), json!(distance.round() as usize));
}
}
}
fn compute_matches<A: AsRef<[u8]>>(
matcher: &impl Matcher,
document: &Document,
@ -645,7 +682,7 @@ fn parse_filter_array(
}
}
Ok(FilterCondition::from_array(txn, &index.0, ands)?)
Ok(FilterCondition::from_array(txn, index, ands)?)
}
#[cfg(test)]
@ -1315,4 +1352,65 @@ mod test {
r##"{"about": [MatchInfo { start: 0, length: 6 }, MatchInfo { start: 31, length: 7 }, MatchInfo { start: 191, length: 7 }, MatchInfo { start: 225, length: 7 }, MatchInfo { start: 233, length: 6 }], "color": [MatchInfo { start: 0, length: 3 }]}"##
);
}
#[test]
fn test_insert_geo_distance() {
let value: Document = serde_json::from_str(
r#"{
"_geo": {
"lat": 50.629973371633746,
"lng": 3.0569447399419567
},
"city": "Lille",
"id": "1"
}"#,
)
.unwrap();
let sorters = &["_geoPoint(50.629973371633746,3.0569447399419567):desc".to_string()];
let mut document = value.clone();
insert_geo_distance(sorters, &mut document);
assert_eq!(document.get("_geoDistance"), Some(&json!(0)));
let sorters = &["_geoPoint(50.629973371633746, 3.0569447399419567):asc".to_string()];
let mut document = value.clone();
insert_geo_distance(sorters, &mut document);
assert_eq!(document.get("_geoDistance"), Some(&json!(0)));
let sorters =
&["_geoPoint( 50.629973371633746 , 3.0569447399419567 ):desc".to_string()];
let mut document = value.clone();
insert_geo_distance(sorters, &mut document);
assert_eq!(document.get("_geoDistance"), Some(&json!(0)));
let sorters = &[
"prix:asc",
"villeneuve:desc",
"_geoPoint(50.629973371633746, 3.0569447399419567):asc",
"ubu:asc",
]
.map(|s| s.to_string());
let mut document = value.clone();
insert_geo_distance(sorters, &mut document);
assert_eq!(document.get("_geoDistance"), Some(&json!(0)));
// only the first geoPoint is used to compute the distance
let sorters = &[
"chien:desc",
"_geoPoint(50.629973371633746, 3.0569447399419567):asc",
"pangolin:desc",
"_geoPoint(100.0, -80.0):asc",
"chat:asc",
]
.map(|s| s.to_string());
let mut document = value.clone();
insert_geo_distance(sorters, &mut document);
assert_eq!(document.get("_geoDistance"), Some(&json!(0)));
// there was no _geoPoint so nothing is inserted in the document
let sorters = &["chien:asc".to_string()];
let mut document = value;
insert_geo_distance(sorters, &mut document);
assert_eq!(document.get("_geoDistance"), None);
}
}

View File

@ -0,0 +1,49 @@
use milli::update::UpdateBuilder;
use milli::CompressionType;
use rayon::ThreadPool;
use crate::options::IndexerOpts;
pub struct UpdateHandler {
max_nb_chunks: Option<usize>,
chunk_compression_level: Option<u32>,
thread_pool: ThreadPool,
log_frequency: usize,
max_memory: Option<usize>,
chunk_compression_type: CompressionType,
}
impl UpdateHandler {
pub fn new(opt: &IndexerOpts) -> anyhow::Result<Self> {
let thread_pool = rayon::ThreadPoolBuilder::new()
.num_threads(opt.indexing_jobs.unwrap_or(num_cpus::get() / 2))
.build()?;
Ok(Self {
max_nb_chunks: opt.max_nb_chunks,
chunk_compression_level: opt.chunk_compression_level,
thread_pool,
log_frequency: opt.log_every_n,
max_memory: opt.max_memory.map(|m| m.get_bytes() as usize),
chunk_compression_type: opt.chunk_compression_type,
})
}
pub fn update_builder(&self, update_id: u64) -> UpdateBuilder {
// We prepare the update by using the update builder.
let mut update_builder = UpdateBuilder::new(update_id);
if let Some(max_nb_chunks) = self.max_nb_chunks {
update_builder.max_nb_chunks(max_nb_chunks);
}
if let Some(chunk_compression_level) = self.chunk_compression_level {
update_builder.chunk_compression_level(chunk_compression_level);
}
update_builder.thread_pool(&self.thread_pool);
update_builder.log_every_n(self.log_frequency);
if let Some(max_memory) = self.max_memory {
update_builder.max_memory(max_memory);
}
update_builder.chunk_compression_type(self.chunk_compression_type);
update_builder
}
}

View File

@ -0,0 +1,392 @@
use std::collections::{BTreeMap, BTreeSet};
use std::marker::PhantomData;
use std::num::NonZeroUsize;
use log::{debug, info, trace};
use milli::documents::DocumentBatchReader;
use milli::update::{IndexDocumentsMethod, Setting, UpdateBuilder};
use serde::{Deserialize, Serialize, Serializer};
use uuid::Uuid;
use crate::index_controller::updates::status::{Failed, Processed, Processing, UpdateResult};
use crate::Update;
use super::error::{IndexError, Result};
use super::index::{Index, IndexMeta};
fn serialize_with_wildcard<S>(
field: &Setting<Vec<String>>,
s: S,
) -> std::result::Result<S::Ok, S::Error>
where
S: Serializer,
{
let wildcard = vec!["*".to_string()];
match field {
Setting::Set(value) => Some(value),
Setting::Reset => Some(&wildcard),
Setting::NotSet => None,
}
.serialize(s)
}
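// A hedged illustration (not part of the original change) of the behaviour of
// `serialize_with_wildcard` above: list settings that were reset are reported back
// to API clients as the `["*"]` wildcard, while settings that were never touched
// serialize to nothing (their fields are also skipped via `skip_serializing_if`).
#[allow(dead_code)]
fn wildcard_view(field: &Setting<Vec<String>>) -> Option<Vec<String>> {
    match field {
        Setting::Set(values) => Some(values.clone()),  // serialized as the list itself
        Setting::Reset => Some(vec!["*".to_string()]), // serialized as ["*"]
        Setting::NotSet => None,                       // serialized as null / skipped
    }
}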
#[derive(Clone, Default, Debug, Serialize)]
pub struct Checked;
#[derive(Clone, Default, Debug, Serialize, Deserialize)]
pub struct Unchecked;
/// Holds all the settings for an index. `T` can either be `Checked` if it represents settings
/// whose validity is guaranteed, or `Unchecked` if they still need to be validated. In the latter case,
/// a call to `check` will turn a `Settings<Unchecked>` into a `Settings<Checked>`.
#[derive(Debug, Clone, Default, Serialize, Deserialize)]
#[serde(deny_unknown_fields)]
#[serde(rename_all = "camelCase")]
#[serde(bound(serialize = "T: Serialize", deserialize = "T: Deserialize<'static>"))]
pub struct Settings<T> {
#[serde(
default,
serialize_with = "serialize_with_wildcard",
skip_serializing_if = "Setting::is_not_set"
)]
pub displayed_attributes: Setting<Vec<String>>,
#[serde(
default,
serialize_with = "serialize_with_wildcard",
skip_serializing_if = "Setting::is_not_set"
)]
pub searchable_attributes: Setting<Vec<String>>,
#[serde(default, skip_serializing_if = "Setting::is_not_set")]
pub filterable_attributes: Setting<BTreeSet<String>>,
#[serde(default, skip_serializing_if = "Setting::is_not_set")]
pub sortable_attributes: Setting<BTreeSet<String>>,
#[serde(default, skip_serializing_if = "Setting::is_not_set")]
pub ranking_rules: Setting<Vec<String>>,
#[serde(default, skip_serializing_if = "Setting::is_not_set")]
pub stop_words: Setting<BTreeSet<String>>,
#[serde(default, skip_serializing_if = "Setting::is_not_set")]
pub synonyms: Setting<BTreeMap<String, Vec<String>>>,
#[serde(default, skip_serializing_if = "Setting::is_not_set")]
pub distinct_attribute: Setting<String>,
#[serde(skip)]
pub _kind: PhantomData<T>,
}
impl Settings<Checked> {
pub fn cleared() -> Settings<Checked> {
Settings {
displayed_attributes: Setting::Reset,
searchable_attributes: Setting::Reset,
filterable_attributes: Setting::Reset,
sortable_attributes: Setting::Reset,
ranking_rules: Setting::Reset,
stop_words: Setting::Reset,
synonyms: Setting::Reset,
distinct_attribute: Setting::Reset,
_kind: PhantomData,
}
}
pub fn into_unchecked(self) -> Settings<Unchecked> {
let Self {
displayed_attributes,
searchable_attributes,
filterable_attributes,
sortable_attributes,
ranking_rules,
stop_words,
synonyms,
distinct_attribute,
..
} = self;
Settings {
displayed_attributes,
searchable_attributes,
filterable_attributes,
sortable_attributes,
ranking_rules,
stop_words,
synonyms,
distinct_attribute,
_kind: PhantomData,
}
}
}
impl Settings<Unchecked> {
pub fn check(self) -> Settings<Checked> {
let displayed_attributes = match self.displayed_attributes {
Setting::Set(fields) => {
if fields.iter().any(|f| f == "*") {
Setting::Reset
} else {
Setting::Set(fields)
}
}
otherwise => otherwise,
};
let searchable_attributes = match self.searchable_attributes {
Setting::Set(fields) => {
if fields.iter().any(|f| f == "*") {
Setting::Reset
} else {
Setting::Set(fields)
}
}
otherwise => otherwise,
};
Settings {
displayed_attributes,
searchable_attributes,
filterable_attributes: self.filterable_attributes,
sortable_attributes: self.sortable_attributes,
ranking_rules: self.ranking_rules,
stop_words: self.stop_words,
synonyms: self.synonyms,
distinct_attribute: self.distinct_attribute,
_kind: PhantomData,
}
}
}
#[derive(Debug, Clone, Serialize, Deserialize)]
#[serde(deny_unknown_fields)]
#[serde(rename_all = "camelCase")]
pub struct Facets {
pub level_group_size: Option<NonZeroUsize>,
pub min_level_size: Option<NonZeroUsize>,
}
impl Index {
pub fn handle_update(&self, update: Processing) -> std::result::Result<Processed, Failed> {
let update_id = update.id();
let update_builder = self.update_handler.update_builder(update_id);
let result = (|| {
let mut txn = self.write_txn()?;
let result = match update.meta() {
Update::DocumentAddition {
primary_key,
content_uuid,
method,
} => self.update_documents(
&mut txn,
*method,
*content_uuid,
update_builder,
primary_key.as_deref(),
),
Update::Settings(settings) => {
let settings = settings.clone().check();
self.update_settings(&mut txn, &settings, update_builder)
}
Update::ClearDocuments => {
let builder = update_builder.clear_documents(&mut txn, self);
let _count = builder.execute()?;
Ok(UpdateResult::Other)
}
Update::DeleteDocuments(ids) => {
let mut builder = update_builder.delete_documents(&mut txn, self)?;
// We ignore nonexistent document ids
ids.iter().for_each(|id| {
builder.delete_external_id(id);
});
let deleted = builder.execute()?;
Ok(UpdateResult::DocumentDeletion { deleted })
}
};
if result.is_ok() {
txn.commit()?;
}
result
})();
if let Update::DocumentAddition { content_uuid, .. } = update.from.meta() {
let _ = self.update_file_store.delete(*content_uuid);
}
match result {
Ok(result) => Ok(update.process(result)),
Err(e) => Err(update.fail(e)),
}
}
pub fn update_primary_key(&self, primary_key: Option<String>) -> Result<IndexMeta> {
match primary_key {
Some(primary_key) => {
let mut txn = self.write_txn()?;
if self.primary_key(&txn)?.is_some() {
return Err(IndexError::ExistingPrimaryKey);
}
let mut builder = UpdateBuilder::new(0).settings(&mut txn, self);
builder.set_primary_key(primary_key);
builder.execute(|_, _| ())?;
let meta = IndexMeta::new_txn(self, &txn)?;
txn.commit()?;
Ok(meta)
}
None => {
let meta = IndexMeta::new(self)?;
Ok(meta)
}
}
}
fn update_documents<'a, 'b>(
&'a self,
txn: &mut heed::RwTxn<'a, 'b>,
method: IndexDocumentsMethod,
content_uuid: Uuid,
update_builder: UpdateBuilder,
primary_key: Option<&str>,
) -> Result<UpdateResult> {
trace!("performing document addition");
// Set the primary key if not set already, ignore if already set.
if let (None, Some(primary_key)) = (self.primary_key(txn)?, primary_key) {
let mut builder = UpdateBuilder::new(0).settings(txn, self);
builder.set_primary_key(primary_key.to_string());
builder.execute(|_, _| ())?;
}
let indexing_callback =
|indexing_step, update_id| debug!("update {}: {:?}", update_id, indexing_step);
let content_file = self.update_file_store.get_update(content_uuid).unwrap();
let reader = DocumentBatchReader::from_reader(content_file).unwrap();
let mut builder = update_builder.index_documents(txn, self);
builder.index_documents_method(method);
let addition = builder.execute(reader, indexing_callback)?;
info!("document addition done: {:?}", addition);
Ok(UpdateResult::DocumentsAddition(addition))
}
fn update_settings<'a, 'b>(
&'a self,
txn: &mut heed::RwTxn<'a, 'b>,
settings: &Settings<Checked>,
update_builder: UpdateBuilder,
) -> Result<UpdateResult> {
// We must use the write transaction of the update here.
let mut builder = update_builder.settings(txn, self);
apply_settings_to_builder(settings, &mut builder);
builder.execute(|indexing_step, update_id| {
debug!("update {}: {:?}", update_id, indexing_step)
})?;
Ok(UpdateResult::Other)
}
}
pub fn apply_settings_to_builder(
settings: &Settings<Checked>,
builder: &mut milli::update::Settings,
) {
match settings.searchable_attributes {
Setting::Set(ref names) => builder.set_searchable_fields(names.clone()),
Setting::Reset => builder.reset_searchable_fields(),
Setting::NotSet => (),
}
match settings.displayed_attributes {
Setting::Set(ref names) => builder.set_displayed_fields(names.clone()),
Setting::Reset => builder.reset_displayed_fields(),
Setting::NotSet => (),
}
match settings.filterable_attributes {
Setting::Set(ref facets) => {
builder.set_filterable_fields(facets.clone().into_iter().collect())
}
Setting::Reset => builder.reset_filterable_fields(),
Setting::NotSet => (),
}
match settings.sortable_attributes {
Setting::Set(ref fields) => builder.set_sortable_fields(fields.iter().cloned().collect()),
Setting::Reset => builder.reset_sortable_fields(),
Setting::NotSet => (),
}
match settings.ranking_rules {
Setting::Set(ref criteria) => builder.set_criteria(criteria.clone()),
Setting::Reset => builder.reset_criteria(),
Setting::NotSet => (),
}
match settings.stop_words {
Setting::Set(ref stop_words) => builder.set_stop_words(stop_words.clone()),
Setting::Reset => builder.reset_stop_words(),
Setting::NotSet => (),
}
match settings.synonyms {
Setting::Set(ref synonyms) => builder.set_synonyms(synonyms.clone().into_iter().collect()),
Setting::Reset => builder.reset_synonyms(),
Setting::NotSet => (),
}
match settings.distinct_attribute {
Setting::Set(ref attr) => builder.set_distinct_field(attr.clone()),
Setting::Reset => builder.reset_distinct_field(),
Setting::NotSet => (),
}
}
#[cfg(test)]
mod test {
use super::*;
#[test]
fn test_setting_check() {
// test no changes
let settings = Settings {
displayed_attributes: Setting::Set(vec![String::from("hello")]),
searchable_attributes: Setting::Set(vec![String::from("hello")]),
filterable_attributes: Setting::NotSet,
sortable_attributes: Setting::NotSet,
ranking_rules: Setting::NotSet,
stop_words: Setting::NotSet,
synonyms: Setting::NotSet,
distinct_attribute: Setting::NotSet,
_kind: PhantomData::<Unchecked>,
};
let checked = settings.clone().check();
assert_eq!(settings.displayed_attributes, checked.displayed_attributes);
assert_eq!(
settings.searchable_attributes,
checked.searchable_attributes
);
// test wildcard
let settings = Settings {
displayed_attributes: Setting::Set(vec![String::from("*")]),
searchable_attributes: Setting::Set(vec![String::from("hello"), String::from("*")]),
filterable_attributes: Setting::NotSet,
sortable_attributes: Setting::NotSet,
ranking_rules: Setting::NotSet,
stop_words: Setting::NotSet,
synonyms: Setting::NotSet,
distinct_attribute: Setting::NotSet,
_kind: PhantomData::<Unchecked>,
};
let checked = settings.check();
assert_eq!(checked.displayed_attributes, Setting::Reset);
assert_eq!(checked.searchable_attributes, Setting::Reset);
}
}

View File

@ -7,19 +7,20 @@ use chrono::Utc;
use futures::{lock::Mutex, stream::StreamExt};
use log::{error, trace};
use tokio::sync::{mpsc, oneshot, RwLock};
use update_actor::UpdateActorHandle;
use uuid_resolver::UuidResolverHandle;
use super::error::{DumpActorError, Result};
use super::{DumpInfo, DumpMsg, DumpStatus, DumpTask};
use crate::index_controller::{update_actor, uuid_resolver};
use crate::index_controller::index_resolver::index_store::IndexStore;
use crate::index_controller::index_resolver::uuid_store::UuidStore;
use crate::index_controller::index_resolver::IndexResolver;
use crate::index_controller::updates::UpdateSender;
pub const CONCURRENT_DUMP_MSG: usize = 10;
pub struct DumpActor<UuidResolver, Update> {
pub struct DumpActor<U, I> {
inbox: Option<mpsc::Receiver<DumpMsg>>,
uuid_resolver: UuidResolver,
update: Update,
index_resolver: Arc<IndexResolver<U, I>>,
update: UpdateSender,
dump_path: PathBuf,
lock: Arc<Mutex<()>>,
dump_infos: Arc<RwLock<HashMap<String, DumpInfo>>>,
@ -32,15 +33,15 @@ fn generate_uid() -> String {
Utc::now().format("%Y%m%d-%H%M%S%3f").to_string()
}
impl<UuidResolver, Update> DumpActor<UuidResolver, Update>
impl<U, I> DumpActor<U, I>
where
UuidResolver: UuidResolverHandle + Send + Sync + Clone + 'static,
Update: UpdateActorHandle + Send + Sync + Clone + 'static,
U: UuidStore + Sync + Send + 'static,
I: IndexStore + Sync + Send + 'static,
{
pub fn new(
inbox: mpsc::Receiver<DumpMsg>,
uuid_resolver: UuidResolver,
update: Update,
index_resolver: Arc<IndexResolver<U, I>>,
update: UpdateSender,
dump_path: impl AsRef<Path>,
index_db_size: usize,
update_db_size: usize,
@ -49,7 +50,7 @@ where
let lock = Arc::new(Mutex::new(()));
Self {
inbox: Some(inbox),
uuid_resolver,
index_resolver,
update,
dump_path: dump_path.as_ref().into(),
dump_infos,
@ -118,8 +119,8 @@ where
let task = DumpTask {
path: self.dump_path.clone(),
uuid_resolver: self.uuid_resolver.clone(),
update_handle: self.update.clone(),
index_resolver: self.index_resolver.clone(),
update_sender: self.update.clone(),
uid: uid.clone(),
update_db_size: self.update_db_size,
index_db_size: self.index_db_size,

View File

@ -1,7 +1,7 @@
use meilisearch_error::{Code, ErrorCode};
use crate::index_controller::update_actor::error::UpdateActorError;
use crate::index_controller::uuid_resolver::error::UuidResolverError;
use crate::index_controller::index_resolver::error::IndexResolverError;
use crate::index_controller::updates::error::UpdateLoopError;
pub type Result<T> = std::result::Result<T, DumpActorError>;
@ -14,9 +14,9 @@ pub enum DumpActorError {
#[error("Internal error: {0}")]
Internal(Box<dyn std::error::Error + Send + Sync + 'static>),
#[error("{0}")]
UuidResolver(#[from] UuidResolverError),
IndexResolver(#[from] IndexResolverError),
#[error("{0}")]
UpdateActor(#[from] UpdateActorError),
UpdateLoop(#[from] UpdateLoopError),
}
macro_rules! internal_error {
@ -45,8 +45,8 @@ impl ErrorCode for DumpActorError {
DumpActorError::DumpAlreadyRunning => Code::DumpAlreadyInProgress,
DumpActorError::DumpDoesNotExist(_) => Code::NotFound,
DumpActorError::Internal(_) => Code::Internal,
DumpActorError::UuidResolver(e) => e.error_code(),
DumpActorError::UpdateActor(e) => e.error_code(),
DumpActorError::IndexResolver(e) => e.error_code(),
DumpActorError::UpdateLoop(e) => e.error_code(),
}
}
}

View File

@ -1,8 +1,10 @@
use std::path::Path;
use std::sync::Arc;
use actix_web::web::Bytes;
use tokio::sync::{mpsc, oneshot};
use crate::index_controller::index_resolver::HardStateIndexResolver;
use super::error::Result;
use super::{DumpActor, DumpActorHandle, DumpInfo, DumpMsg};
@ -31,15 +33,15 @@ impl DumpActorHandle for DumpActorHandleImpl {
impl DumpActorHandleImpl {
pub fn new(
path: impl AsRef<Path>,
uuid_resolver: crate::index_controller::uuid_resolver::UuidResolverHandleImpl,
update: crate::index_controller::update_actor::UpdateActorHandleImpl<Bytes>,
index_resolver: Arc<HardStateIndexResolver>,
update: crate::index_controller::updates::UpdateSender,
index_db_size: usize,
update_db_size: usize,
) -> anyhow::Result<Self> {
let (sender, receiver) = mpsc::channel(10);
let actor = DumpActor::new(
receiver,
uuid_resolver,
index_resolver,
update,
path,
index_db_size,

View File

@ -0,0 +1,19 @@
pub mod v1;
pub mod v2;
pub mod v3;
mod compat {
/// Parses the v1 version of the Asc ranking rule `asc(price)` and returns the field name.
pub fn asc_ranking_rule(text: &str) -> Option<&str> {
text.split_once("asc(")
.and_then(|(_, tail)| tail.rsplit_once(")"))
.map(|(field, _)| field)
}
/// Parses the v1 version of the Desc ranking rule `desc(price)` and returns the field name.
pub fn desc_ranking_rule(text: &str) -> Option<&str> {
text.split_once("desc(")
.and_then(|(_, tail)| tail.rsplit_once(")"))
.map(|(field, _)| field)
}
}
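// A minimal sketch (not part of the original change) of how the two compat helpers
// above behave, written as a test for illustration.
#[cfg(test)]
mod compat_tests {
    use super::compat::{asc_ranking_rule, desc_ranking_rule};
    #[test]
    fn parses_legacy_ranking_rules() {
        assert_eq!(asc_ranking_rule("asc(price)"), Some("price"));
        assert_eq!(desc_ranking_rule("desc(release_date)"), Some("release_date"));
        // Anything that doesn't match the `asc(...)` / `desc(...)` shape yields `None`.
        assert_eq!(asc_ranking_rule("typo"), None);
    }
}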

View File

@ -0,0 +1,233 @@
use std::collections::{BTreeMap, BTreeSet};
use std::fs::{create_dir_all, File};
use std::io::{BufReader, Seek, SeekFrom};
use std::marker::PhantomData;
use std::path::Path;
use heed::EnvOpenOptions;
use log::{error, warn};
use milli::documents::DocumentBatchReader;
use milli::update::Setting;
use serde::{Deserialize, Deserializer, Serialize};
use uuid::Uuid;
use crate::document_formats::read_ndjson;
use crate::index::apply_settings_to_builder;
use crate::index::update_handler::UpdateHandler;
use crate::index_controller::dump_actor::loaders::compat::{asc_ranking_rule, desc_ranking_rule};
use crate::index_controller::index_resolver::uuid_store::HeedUuidStore;
use crate::index_controller::{self, IndexMetadata};
use crate::{index::Unchecked, options::IndexerOpts};
#[derive(Serialize, Deserialize, Debug)]
#[serde(rename_all = "camelCase")]
pub struct MetadataV1 {
pub db_version: String,
indexes: Vec<IndexMetadata>,
}
impl MetadataV1 {
pub fn load_dump(
self,
src: impl AsRef<Path>,
dst: impl AsRef<Path>,
size: usize,
indexer_options: &IndexerOpts,
) -> anyhow::Result<()> {
let uuid_store = HeedUuidStore::new(&dst)?;
for index in self.indexes {
let uuid = Uuid::new_v4();
uuid_store.insert(index.uid.clone(), uuid)?;
let src = src.as_ref().join(index.uid);
load_index(
&src,
&dst,
uuid,
index.meta.primary_key.as_deref(),
size,
indexer_options,
)?;
}
Ok(())
}
}
pub fn deserialize_some<'de, T, D>(deserializer: D) -> std::result::Result<Option<T>, D::Error>
where
T: Deserialize<'de>,
D: Deserializer<'de>,
{
Deserialize::deserialize(deserializer).map(Some)
}
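// `deserialize_some` is the usual serde double-`Option` trick: combined with
// `#[serde(default)]`, it lets the legacy settings below distinguish a missing key
// from an explicit `null`. A hedged sketch (this struct is illustrative only):
#[cfg(test)]
#[derive(Deserialize)]
struct DoubleOptionExample {
    #[serde(default, deserialize_with = "deserialize_some")]
    field: Option<Option<String>>,
}
// `{}`                 -> field == None            (key absent: setting untouched)
// `{"field": null}`    -> field == Some(None)      (explicit null: setting reset)
// `{"field": "title"}` -> field == Some(Some(..))  (explicit value: setting set)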
// These are the settings used in legacy meilisearch (<v0.21.0).
#[derive(Default, Clone, Serialize, Deserialize, Debug)]
#[serde(rename_all = "camelCase", deny_unknown_fields)]
struct Settings {
#[serde(default, deserialize_with = "deserialize_some")]
pub ranking_rules: Option<Option<Vec<String>>>,
#[serde(default, deserialize_with = "deserialize_some")]
pub distinct_attribute: Option<Option<String>>,
#[serde(default, deserialize_with = "deserialize_some")]
pub searchable_attributes: Option<Option<Vec<String>>>,
#[serde(default, deserialize_with = "deserialize_some")]
pub displayed_attributes: Option<Option<BTreeSet<String>>>,
#[serde(default, deserialize_with = "deserialize_some")]
pub stop_words: Option<Option<BTreeSet<String>>>,
#[serde(default, deserialize_with = "deserialize_some")]
pub synonyms: Option<Option<BTreeMap<String, Vec<String>>>>,
#[serde(default, deserialize_with = "deserialize_some")]
pub attributes_for_faceting: Option<Option<Vec<String>>>,
}
fn load_index(
src: impl AsRef<Path>,
dst: impl AsRef<Path>,
uuid: Uuid,
primary_key: Option<&str>,
size: usize,
indexer_options: &IndexerOpts,
) -> anyhow::Result<()> {
let index_path = dst.as_ref().join(&format!("indexes/{}", uuid));
create_dir_all(&index_path)?;
let mut options = EnvOpenOptions::new();
options.map_size(size);
let index = milli::Index::new(options, index_path)?;
let update_handler = UpdateHandler::new(indexer_options)?;
let mut txn = index.write_txn()?;
// extract `settings.json` file and import content
let settings = import_settings(&src)?;
let settings: index_controller::Settings<Unchecked> = settings.into();
let handler = UpdateHandler::new(indexer_options)?;
let mut builder = handler.update_builder(0).settings(&mut txn, &index);
if let Some(primary_key) = primary_key {
builder.set_primary_key(primary_key.to_string());
}
apply_settings_to_builder(&settings.check(), &mut builder);
builder.execute(|_, _| ())?;
let reader = BufReader::new(File::open(&src.as_ref().join("documents.jsonl"))?);
let mut tmp_doc_file = tempfile::tempfile()?;
read_ndjson(reader, &mut tmp_doc_file)?;
tmp_doc_file.seek(SeekFrom::Start(0))?;
let documents_reader = DocumentBatchReader::from_reader(tmp_doc_file)?;
// If the document file is empty, we don't perform the document addition, to prevent
// a primary key error from being thrown.
if !documents_reader.is_empty() {
let builder = update_handler
.update_builder(0)
.index_documents(&mut txn, &index);
builder.execute(documents_reader, |_, _| ())?;
}
txn.commit()?;
// Finally, we close the milli::Index.
index.prepare_for_closing().wait();
// Updates are ignored in dumps V1.
Ok(())
}
/// We need to **always** be able to convert the old settings to the settings currently being used.
impl From<Settings> for index_controller::Settings<Unchecked> {
fn from(settings: Settings) -> Self {
Self {
distinct_attribute: match settings.distinct_attribute {
Some(Some(attr)) => Setting::Set(attr),
Some(None) => Setting::Reset,
None => Setting::NotSet
},
// we need to convert the old `BTreeSet<String>` into a `Vec<String>`
displayed_attributes: match settings.displayed_attributes {
Some(Some(attrs)) => Setting::Set(attrs.into_iter().collect()),
Some(None) => Setting::Reset,
None => Setting::NotSet
},
searchable_attributes: match settings.searchable_attributes {
Some(Some(attrs)) => Setting::Set(attrs),
Some(None) => Setting::Reset,
None => Setting::NotSet
},
filterable_attributes: match settings.attributes_for_faceting {
Some(Some(attrs)) => Setting::Set(attrs.into_iter().collect()),
Some(None) => Setting::Reset,
None => Setting::NotSet
},
sortable_attributes: Setting::NotSet,
ranking_rules: match settings.ranking_rules {
Some(Some(ranking_rules)) => Setting::Set(ranking_rules.into_iter().filter_map(|criterion| {
match criterion.as_str() {
"words" | "typo" | "proximity" | "attribute" | "exactness" => Some(criterion),
s if s.starts_with("asc") => asc_ranking_rule(s).map(|f| format!("{}:asc", f)),
s if s.starts_with("desc") => desc_ranking_rule(s).map(|f| format!("{}:desc", f)),
"wordsPosition" => {
warn!("The criteria `attribute` and `wordsPosition` have been merged \
into a single criterion `attribute` so `wordsPosition` will be \
ignored");
None
}
s => {
error!("Unknown criterion found in the dump: `{}`, it will be ignored", s);
None
}
}
}).collect()),
Some(None) => Setting::Reset,
None => Setting::NotSet
},
// the stop words are already a `BTreeSet<String>`, so we can reuse them directly
stop_words: match settings.stop_words {
Some(Some(stop_words)) => Setting::Set(stop_words.into_iter().collect()),
Some(None) => Setting::Reset,
None => Setting::NotSet
},
// the synonyms are already a `BTreeMap<String, Vec<String>>`, so we can reuse them directly
synonyms: match settings.synonyms {
Some(Some(synonyms)) => Setting::Set(synonyms.into_iter().collect()),
Some(None) => Setting::Reset,
None => Setting::NotSet
},
_kind: PhantomData,
}
}
}
/// Extracts `Settings` from the `settings.json` file present at the provided `dir_path`.
fn import_settings(dir_path: impl AsRef<Path>) -> anyhow::Result<Settings> {
let path = dir_path.as_ref().join("settings.json");
let file = File::open(path)?;
let reader = std::io::BufReader::new(file);
let metadata = serde_json::from_reader(reader)?;
Ok(metadata)
}
#[cfg(test)]
mod test {
use super::*;
#[test]
fn settings_format_regression() {
let settings = Settings::default();
assert_eq!(
r##"{"rankingRules":null,"distinctAttribute":null,"searchableAttributes":null,"displayedAttributes":null,"stopWords":null,"synonyms":null,"attributesForFaceting":null}"##,
serde_json::to_string(&settings).unwrap()
);
}
}

View File

@ -0,0 +1,392 @@
use std::fs::{File, OpenOptions};
use std::io::Write;
use std::path::{Path, PathBuf};
use serde_json::{Deserializer, Value};
use tempfile::NamedTempFile;
use crate::index_controller::dump_actor::loaders::compat::{asc_ranking_rule, desc_ranking_rule};
use crate::index_controller::dump_actor::Metadata;
use crate::index_controller::updates::status::{
Aborted, Enqueued, Failed, Processed, Processing, UpdateResult, UpdateStatus,
};
use crate::index_controller::updates::store::dump::UpdateEntry;
use crate::index_controller::updates::store::Update;
use crate::options::IndexerOpts;
use super::v3;
/// The dump v2 loader reads the dump folder and patches all the needed files to make it compatible with a
/// dump v3, then calls the dump v3 loader to actually handle the dump.
pub fn load_dump(
meta: Metadata,
src: impl AsRef<Path>,
dst: impl AsRef<Path>,
index_db_size: usize,
update_db_size: usize,
indexing_options: &IndexerOpts,
) -> anyhow::Result<()> {
let indexes_path = src.as_ref().join("indexes");
let dir_entries = std::fs::read_dir(indexes_path)?;
for entry in dir_entries {
let entry = entry?;
// rename the index folder
let path = entry.path();
let new_path = patch_index_uuid_path(&path).expect("invalid index folder.");
std::fs::rename(path, &new_path)?;
let settings_path = new_path.join("meta.json");
patch_settings(settings_path)?;
}
let update_path = src.as_ref().join("updates/data.jsonl");
patch_updates(update_path)?;
v3::load_dump(
meta,
src,
dst,
index_db_size,
update_db_size,
indexing_options,
)
}
fn patch_index_uuid_path(path: &Path) -> Option<PathBuf> {
let uuid = path.file_name()?.to_str()?.trim_start_matches("index-");
let new_path = path.parent()?.join(uuid);
Some(new_path)
}
fn patch_settings(path: impl AsRef<Path>) -> anyhow::Result<()> {
let mut meta_file = File::open(&path)?;
let mut meta: Value = serde_json::from_reader(&mut meta_file)?;
// We first deserialize the dump meta into a serde_json::Value and change
// the custom ranking rules settings from the old format to the new format.
if let Some(ranking_rules) = meta.pointer_mut("/settings/rankingRules") {
patch_custon_ranking_rules(ranking_rules);
}
let mut meta_file = OpenOptions::new().truncate(true).write(true).open(path)?;
serde_json::to_writer(&mut meta_file, &meta)?;
Ok(())
}
fn patch_updates(path: impl AsRef<Path>) -> anyhow::Result<()> {
let mut output_update_file = NamedTempFile::new()?;
let update_file = File::open(&path)?;
let stream = Deserializer::from_reader(update_file).into_iter::<compat::UpdateEntry>();
for update in stream {
let update_entry = update?;
let update_entry = UpdateEntry::from(update_entry);
serde_json::to_writer(&mut output_update_file, &update_entry)?;
output_update_file.write_all(b"\n")?;
}
output_update_file.flush()?;
output_update_file.persist(path)?;
Ok(())
}
/// Converts the ranking rules from the format `asc(_)`, `desc(_)` to the format `_:asc`, `_:desc`.
///
/// This is done for compatibility reasons, and to avoid a new dump version,
/// since the new syntax was introduced soon after the new dump version.
fn patch_custom_ranking_rules(ranking_rules: &mut Value) {
*ranking_rules = match ranking_rules.take() {
Value::Array(values) => values
.into_iter()
.filter_map(|value| match value {
Value::String(s) if s.starts_with("asc") => asc_ranking_rule(&s)
.map(|f| format!("{}:asc", f))
.map(Value::String),
Value::String(s) if s.starts_with("desc") => desc_ranking_rule(&s)
.map(|f| format!("{}:desc", f))
.map(Value::String),
otherwise => Some(otherwise),
})
.collect(),
otherwise => otherwise,
}
}
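For illustration, here is a minimal sketch (not part of the diff) of the conversion performed above, assuming `asc_ranking_rule` / `desc_ranking_rule` extract the field name from the old `asc(field)` / `desc(field)` syntax:
#[test]
fn converts_v2_ranking_rules_sketch() {
    // old syntax `asc(field)` / `desc(field)` becomes `field:asc` / `field:desc`;
    // other rules are passed through unchanged
    let mut rules = serde_json::json!(["typo", "asc(price)", "desc(release_date)"]);
    patch_custom_ranking_rules(&mut rules);
    assert_eq!(
        rules,
        serde_json::json!(["typo", "price:asc", "release_date:desc"])
    );
}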
impl From<compat::UpdateEntry> for UpdateEntry {
fn from(compat::UpdateEntry { uuid, update }: compat::UpdateEntry) -> Self {
let update = match update {
compat::UpdateStatus::Processing(meta) => UpdateStatus::Processing(meta.into()),
compat::UpdateStatus::Enqueued(meta) => UpdateStatus::Enqueued(meta.into()),
compat::UpdateStatus::Processed(meta) => UpdateStatus::Processed(meta.into()),
compat::UpdateStatus::Aborted(meta) => UpdateStatus::Aborted(meta.into()),
compat::UpdateStatus::Failed(meta) => UpdateStatus::Failed(meta.into()),
};
Self { uuid, update }
}
}
impl From<compat::Failed> for Failed {
fn from(other: compat::Failed) -> Self {
let compat::Failed {
from,
error,
failed_at,
} = other;
Self {
from: from.into(),
msg: error.message,
code: compat::error_code_from_str(&error.error_code)
.expect("Invalid update: Invalid error code"),
failed_at,
}
}
}
impl From<compat::Aborted> for Aborted {
fn from(other: compat::Aborted) -> Self {
let compat::Aborted { from, aborted_at } = other;
Self {
from: from.into(),
aborted_at,
}
}
}
impl From<compat::Processing> for Processing {
fn from(other: compat::Processing) -> Self {
let compat::Processing {
from,
started_processing_at,
} = other;
Self {
from: from.into(),
started_processing_at,
}
}
}
impl From<compat::Enqueued> for Enqueued {
fn from(other: compat::Enqueued) -> Self {
let compat::Enqueued {
update_id,
meta,
enqueued_at,
content,
} = other;
let meta = match meta {
compat::UpdateMeta::DocumentsAddition {
method,
primary_key,
..
} => {
Update::DocumentAddition {
primary_key,
method,
// Just ignore it if the uuid is not present. If it is needed later, an error will
// be thrown.
content_uuid: content.unwrap_or_default(),
}
}
compat::UpdateMeta::ClearDocuments => Update::ClearDocuments,
compat::UpdateMeta::DeleteDocuments { ids } => Update::DeleteDocuments(ids),
compat::UpdateMeta::Settings(settings) => Update::Settings(settings),
};
Self {
update_id,
meta,
enqueued_at,
}
}
}
impl From<compat::Processed> for Processed {
fn from(other: compat::Processed) -> Self {
let compat::Processed {
from,
success,
processed_at,
} = other;
Self {
success: success.into(),
processed_at,
from: from.into(),
}
}
}
impl From<compat::UpdateResult> for UpdateResult {
fn from(other: compat::UpdateResult) -> Self {
match other {
compat::UpdateResult::DocumentsAddition(r) => Self::DocumentsAddition(r),
compat::UpdateResult::DocumentDeletion { deleted } => {
Self::DocumentDeletion { deleted }
}
compat::UpdateResult::Other => Self::Other,
}
}
}
/// compat structure from pre-dumpv3 meilisearch
mod compat {
use anyhow::bail;
use chrono::{DateTime, Utc};
use meilisearch_error::Code;
use milli::update::{DocumentAdditionResult, IndexDocumentsMethod};
use serde::{Deserialize, Serialize};
use uuid::Uuid;
use crate::index::{Settings, Unchecked};
#[derive(Serialize, Deserialize)]
pub struct UpdateEntry {
pub uuid: Uuid,
pub update: UpdateStatus,
}
#[derive(Debug, Clone, Serialize, Deserialize)]
pub enum UpdateFormat {
Json,
}
#[derive(Debug, Clone, Serialize, Deserialize)]
pub enum UpdateResult {
DocumentsAddition(DocumentAdditionResult),
DocumentDeletion { deleted: u64 },
Other,
}
#[allow(clippy::large_enum_variant)]
#[derive(Debug, Clone, Serialize, Deserialize)]
#[serde(tag = "type")]
pub enum UpdateMeta {
DocumentsAddition {
method: IndexDocumentsMethod,
format: UpdateFormat,
primary_key: Option<String>,
},
ClearDocuments,
DeleteDocuments {
ids: Vec<String>,
},
Settings(Settings<Unchecked>),
}
#[derive(Debug, Serialize, Deserialize, Clone)]
#[serde(rename_all = "camelCase")]
pub struct Enqueued {
pub update_id: u64,
pub meta: UpdateMeta,
pub enqueued_at: DateTime<Utc>,
pub content: Option<Uuid>,
}
#[derive(Debug, Serialize, Deserialize, Clone)]
#[serde(rename_all = "camelCase")]
pub struct Processed {
pub success: UpdateResult,
pub processed_at: DateTime<Utc>,
#[serde(flatten)]
pub from: Processing,
}
#[derive(Debug, Serialize, Deserialize, Clone)]
#[serde(rename_all = "camelCase")]
pub struct Processing {
#[serde(flatten)]
pub from: Enqueued,
pub started_processing_at: DateTime<Utc>,
}
#[derive(Debug, Serialize, Deserialize, Clone)]
#[serde(rename_all = "camelCase")]
pub struct Aborted {
#[serde(flatten)]
pub from: Enqueued,
pub aborted_at: DateTime<Utc>,
}
#[derive(Debug, Serialize, Deserialize)]
#[serde(rename_all = "camelCase")]
pub struct Failed {
#[serde(flatten)]
pub from: Processing,
pub error: ResponseError,
pub failed_at: DateTime<Utc>,
}
#[derive(Debug, Serialize, Deserialize)]
#[serde(tag = "status", rename_all = "camelCase")]
pub enum UpdateStatus {
Processing(Processing),
Enqueued(Enqueued),
Processed(Processed),
Aborted(Aborted),
Failed(Failed),
}
type StatusCode = ();
#[derive(Debug, Serialize, Deserialize, Clone)]
#[serde(rename_all = "camelCase")]
pub struct ResponseError {
#[serde(skip)]
pub code: StatusCode,
pub message: String,
pub error_code: String,
pub error_type: String,
pub error_link: String,
}
pub fn error_code_from_str(s: &str) -> anyhow::Result<Code> {
let code = match s {
"index_creation_failed" => Code::CreateIndex,
"index_already_exists" => Code::IndexAlreadyExists,
"index_not_found" => Code::IndexNotFound,
"invalid_index_uid" => Code::InvalidIndexUid,
"index_not_accessible" => Code::OpenIndex,
"invalid_state" => Code::InvalidState,
"missing_primary_key" => Code::MissingPrimaryKey,
"primary_key_already_present" => Code::PrimaryKeyAlreadyPresent,
"invalid_request" => Code::InvalidRankingRule,
"max_fields_limit_exceeded" => Code::MaxFieldsLimitExceeded,
"missing_document_id" => Code::MissingDocumentId,
"invalid_facet" => Code::Facet,
"invalid_filter" => Code::Filter,
"invalid_sort" => Code::Sort,
"bad_parameter" => Code::BadParameter,
"bad_request" => Code::BadRequest,
"document_not_found" => Code::DocumentNotFound,
"internal" => Code::Internal,
"invalid_geo_field" => Code::InvalidGeoField,
"invalid_token" => Code::InvalidToken,
"missing_authorization_header" => Code::MissingAuthorizationHeader,
"not_found" => Code::NotFound,
"payload_too_large" => Code::PayloadTooLarge,
"unretrievable_document" => Code::RetrieveDocument,
"search_error" => Code::SearchDocuments,
"unsupported_media_type" => Code::UnsupportedMediaType,
"dump_already_in_progress" => Code::DumpAlreadyInProgress,
"dump_process_failed" => Code::DumpProcessFailed,
_ => bail!("unknown error code."),
};
Ok(code)
}
}

View File

@ -0,0 +1,31 @@
use std::path::Path;
use log::info;
use crate::index_controller::dump_actor::Metadata;
use crate::index_controller::index_resolver::IndexResolver;
use crate::index_controller::update_file_store::UpdateFileStore;
use crate::index_controller::updates::store::UpdateStore;
use crate::options::IndexerOpts;
pub fn load_dump(
meta: Metadata,
src: impl AsRef<Path>,
dst: impl AsRef<Path>,
index_db_size: usize,
update_db_size: usize,
indexing_options: &IndexerOpts,
) -> anyhow::Result<()> {
info!(
"Loading dump from {}, dump database version: {}, dump version: V3",
meta.dump_date, meta.db_version
);
IndexResolver::load_dump(src.as_ref(), &dst, index_db_size, indexing_options)?;
UpdateFileStore::load_dump(src.as_ref(), &dst)?;
UpdateStore::load_dump(&src, &dst, update_db_size)?;
info!("Loading indexes.");
Ok(())
}

View File

@ -0,0 +1,378 @@
use std::fs::File;
use std::path::{Path, PathBuf};
use std::sync::Arc;
use chrono::{DateTime, Utc};
use log::{info, trace, warn};
use serde::{Deserialize, Serialize};
use tokio::fs::create_dir_all;
use loaders::v1::MetadataV1;
pub use actor::DumpActor;
pub use handle_impl::*;
pub use message::DumpMsg;
use super::index_resolver::index_store::IndexStore;
use super::index_resolver::uuid_store::UuidStore;
use super::index_resolver::IndexResolver;
use super::updates::UpdateSender;
use crate::compression::{from_tar_gz, to_tar_gz};
use crate::index_controller::dump_actor::error::DumpActorError;
use crate::index_controller::dump_actor::loaders::{v2, v3};
use crate::index_controller::updates::UpdateMsg;
use crate::options::IndexerOpts;
use error::Result;
mod actor;
pub mod error;
mod handle_impl;
mod loaders;
mod message;
const META_FILE_NAME: &str = "metadata.json";
#[derive(Serialize, Deserialize, Debug)]
#[serde(rename_all = "camelCase")]
pub struct Metadata {
db_version: String,
index_db_size: usize,
update_db_size: usize,
dump_date: DateTime<Utc>,
}
impl Metadata {
pub fn new(index_db_size: usize, update_db_size: usize) -> Self {
Self {
db_version: env!("CARGO_PKG_VERSION").to_string(),
index_db_size,
update_db_size,
dump_date: Utc::now(),
}
}
}
#[async_trait::async_trait]
#[cfg_attr(test, mockall::automock)]
pub trait DumpActorHandle {
/// Start the creation of a dump
/// Implementation: [handle_impl::DumpActorHandleImpl::create_dump]
async fn create_dump(&self) -> Result<DumpInfo>;
/// Return the status of an already created dump
/// Implementation: [handle_impl::DumpActorHandleImpl::dump_info]
async fn dump_info(&self, uid: String) -> Result<DumpInfo>;
}
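A hedged sketch of how a DumpActorHandle implementation could be driven; the helper below is hypothetical and not part of the diff:
async fn create_and_poll(handle: &impl DumpActorHandle) -> Result<DumpInfo> {
    // start a dump, then query its status by uid
    let info = handle.create_dump().await?;
    handle.dump_info(info.uid).await
}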
#[derive(Debug, Serialize, Deserialize)]
#[serde(tag = "dumpVersion")]
pub enum MetadataVersion {
V1(MetadataV1),
V2(Metadata),
V3(Metadata),
}
impl MetadataVersion {
pub fn new_v3(index_db_size: usize, update_db_size: usize) -> Self {
let meta = Metadata::new(index_db_size, update_db_size);
Self::V3(meta)
}
pub fn db_version(&self) -> &str {
match self {
Self::V1(meta) => &meta.db_version,
Self::V2(meta) | Self::V3(meta) => &meta.db_version,
}
}
pub fn version(&self) -> &str {
match self {
MetadataVersion::V1(_) => "V1",
MetadataVersion::V2(_) => "V2",
MetadataVersion::V3(_) => "V3",
}
}
pub fn dump_date(&self) -> Option<&DateTime<Utc>> {
match self {
MetadataVersion::V1(_) => None,
MetadataVersion::V2(meta) | MetadataVersion::V3(meta) => Some(&meta.dump_date),
}
}
}
#[derive(Debug, Serialize, Deserialize, PartialEq, Clone)]
#[serde(rename_all = "snake_case")]
pub enum DumpStatus {
Done,
InProgress,
Failed,
}
#[derive(Debug, Serialize, Clone)]
#[serde(rename_all = "camelCase")]
pub struct DumpInfo {
pub uid: String,
pub status: DumpStatus,
#[serde(skip_serializing_if = "Option::is_none")]
pub error: Option<String>,
started_at: DateTime<Utc>,
#[serde(skip_serializing_if = "Option::is_none")]
finished_at: Option<DateTime<Utc>>,
}
impl DumpInfo {
pub fn new(uid: String, status: DumpStatus) -> Self {
Self {
uid,
status,
error: None,
started_at: Utc::now(),
finished_at: None,
}
}
pub fn with_error(&mut self, error: String) {
self.status = DumpStatus::Failed;
self.finished_at = Some(Utc::now());
self.error = Some(error);
}
pub fn done(&mut self) {
self.finished_at = Some(Utc::now());
self.status = DumpStatus::Done;
}
pub fn dump_already_in_progress(&self) -> bool {
self.status == DumpStatus::InProgress
}
}
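A minimal sketch of the DumpInfo state transitions above; the uid value is hypothetical:
fn dump_info_lifecycle_sketch() {
    let mut info = DumpInfo::new("20211011-143045123".to_string(), DumpStatus::InProgress);
    assert!(info.dump_already_in_progress());
    info.done();
    assert_eq!(info.status, DumpStatus::Done);
}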
pub fn load_dump(
dst_path: impl AsRef<Path>,
src_path: impl AsRef<Path>,
index_db_size: usize,
update_db_size: usize,
indexer_opts: &IndexerOpts,
) -> anyhow::Result<()> {
// Set up a temp directory in the same path as the database, to prevent cross-device
// references (a rename across filesystems would fail).
let temp_path = dst_path
.as_ref()
.parent()
.map(ToOwned::to_owned)
.unwrap_or_else(|| ".".into());
if cfg!(windows) {
std::env::set_var("TMP", temp_path);
} else {
std::env::set_var("TMPDIR", temp_path);
}
let tmp_src = tempfile::tempdir()?;
let tmp_src_path = tmp_src.path();
from_tar_gz(&src_path, tmp_src_path)?;
let meta_path = tmp_src_path.join(META_FILE_NAME);
let mut meta_file = File::open(&meta_path)?;
let meta: MetadataVersion = serde_json::from_reader(&mut meta_file)?;
let tmp_dst = tempfile::tempdir()?;
info!(
"Loading dump {}, dump database version: {}, dump version: {}",
meta.dump_date()
.map(|t| format!("from {}", t))
.unwrap_or_else(String::new),
meta.db_version(),
meta.version()
);
match meta {
MetadataVersion::V1(meta) => {
meta.load_dump(&tmp_src_path, tmp_dst.path(), index_db_size, indexer_opts)?
}
MetadataVersion::V2(meta) => v2::load_dump(
meta,
&tmp_src_path,
tmp_dst.path(),
index_db_size,
update_db_size,
indexer_opts,
)?,
MetadataVersion::V3(meta) => v3::load_dump(
meta,
&tmp_src_path,
tmp_dst.path(),
index_db_size,
update_db_size,
indexer_opts,
)?,
}
// Persist and atomically rename the db
let persisted_dump = tmp_dst.into_path();
if dst_path.as_ref().exists() {
warn!("Overwriting database at {}", dst_path.as_ref().display());
std::fs::remove_dir_all(&dst_path)?;
}
std::fs::rename(&persisted_dump, &dst_path)?;
Ok(())
}
struct DumpTask<U, I> {
path: PathBuf,
index_resolver: Arc<IndexResolver<U, I>>,
update_sender: UpdateSender,
uid: String,
update_db_size: usize,
index_db_size: usize,
}
impl<U, I> DumpTask<U, I>
where
U: UuidStore + Sync + Send + 'static,
I: IndexStore + Sync + Send + 'static,
{
async fn run(self) -> Result<()> {
trace!("Performing dump.");
create_dir_all(&self.path).await?;
let temp_dump_dir = tokio::task::spawn_blocking(tempfile::TempDir::new).await??;
let temp_dump_path = temp_dump_dir.path().to_owned();
let meta = MetadataVersion::new_v3(self.index_db_size, self.update_db_size);
let meta_path = temp_dump_path.join(META_FILE_NAME);
let mut meta_file = File::create(&meta_path)?;
serde_json::to_writer(&mut meta_file, &meta)?;
let uuids = self.index_resolver.dump(temp_dump_path.clone()).await?;
UpdateMsg::dump(&self.update_sender, uuids, temp_dump_path.clone()).await?;
let dump_path = tokio::task::spawn_blocking(move || -> Result<PathBuf> {
let temp_dump_file = tempfile::NamedTempFile::new()?;
to_tar_gz(temp_dump_path, temp_dump_file.path())
.map_err(|e| DumpActorError::Internal(e.into()))?;
let dump_path = self.path.join(self.uid).with_extension("dump");
temp_dump_file.persist(&dump_path)?;
Ok(dump_path)
})
.await??;
info!("Created dump in {:?}.", dump_path);
Ok(())
}
}
#[cfg(test)]
mod test {
use std::collections::HashSet;
use futures::future::{err, ok};
use once_cell::sync::Lazy;
use uuid::Uuid;
use super::*;
use crate::index::error::Result as IndexResult;
use crate::index::test::Mocker;
use crate::index::Index;
use crate::index_controller::index_resolver::error::IndexResolverError;
use crate::index_controller::index_resolver::index_store::MockIndexStore;
use crate::index_controller::index_resolver::uuid_store::MockUuidStore;
use crate::index_controller::updates::create_update_handler;
fn setup() {
static SETUP: Lazy<()> = Lazy::new(|| {
if cfg!(windows) {
std::env::set_var("TMP", ".");
} else {
std::env::set_var("TMPDIR", ".");
}
});
// just deref to make sure the env is set up
*SETUP
}
#[actix_rt::test]
async fn test_dump_normal() {
setup();
let tmp = tempfile::tempdir().unwrap();
let uuids = std::iter::repeat_with(Uuid::new_v4)
.take(4)
.collect::<HashSet<_>>();
let mut uuid_store = MockUuidStore::new();
let uuids_cloned = uuids.clone();
uuid_store
.expect_dump()
.once()
.returning(move |_| Box::pin(ok(uuids_cloned.clone())));
let mut index_store = MockIndexStore::new();
index_store.expect_get().times(4).returning(move |uuid| {
let mocker = Mocker::default();
let uuids_clone = uuids.clone();
mocker.when::<(), Uuid>("uuid").once().then(move |_| {
assert!(uuids_clone.contains(&uuid));
uuid
});
mocker
.when::<&Path, IndexResult<()>>("dump")
.once()
.then(move |_| Ok(()));
Box::pin(ok(Some(Index::faux(mocker))))
});
let index_resolver = Arc::new(IndexResolver::new(uuid_store, index_store));
let update_sender =
create_update_handler(index_resolver.clone(), tmp.path(), 4096 * 100).unwrap();
let task = DumpTask {
path: tmp.path().to_owned(),
index_resolver,
update_sender,
uid: String::from("test"),
update_db_size: 4096 * 10,
index_db_size: 4096 * 10,
};
task.run().await.unwrap();
}
#[actix_rt::test]
async fn error_performing_dump() {
let tmp = tempfile::tempdir().unwrap();
let mut uuid_store = MockUuidStore::new();
uuid_store
.expect_dump()
.once()
.returning(move |_| Box::pin(err(IndexResolverError::ExistingPrimaryKey)));
let index_store = MockIndexStore::new();
let index_resolver = Arc::new(IndexResolver::new(uuid_store, index_store));
let update_sender =
create_update_handler(index_resolver.clone(), tmp.path(), 4096 * 100).unwrap();
let task = DumpTask {
path: tmp.path().to_owned(),
index_resolver,
update_sender,
uid: String::from("test"),
update_db_size: 4096 * 10,
index_db_size: 4096 * 10,
};
assert!(task.run().await.is_err());
}
}

View File

@ -1,12 +1,14 @@
use std::error::Error;
use meilisearch_error::Code;
use meilisearch_error::ErrorCode;
use tokio::task::JoinError;
use crate::index::error::IndexError;
use super::dump_actor::error::DumpActorError;
use super::index_actor::error::IndexActorError;
use super::update_actor::error::UpdateActorError;
use super::uuid_resolver::error::UuidResolverError;
use super::index_resolver::error::IndexResolverError;
use super::updates::error::UpdateLoopError;
pub type Result<T> = std::result::Result<T, IndexControllerError>;
@ -15,26 +17,28 @@ pub enum IndexControllerError {
#[error("Index creation must have an uid")]
MissingUid,
#[error("{0}")]
Uuid(#[from] UuidResolverError),
IndexResolver(#[from] IndexResolverError),
#[error("{0}")]
IndexActor(#[from] IndexActorError),
#[error("{0}")]
UpdateActor(#[from] UpdateActorError),
UpdateLoop(#[from] UpdateLoopError),
#[error("{0}")]
DumpActor(#[from] DumpActorError),
#[error("{0}")]
IndexError(#[from] IndexError),
#[error("Internal error: {0}")]
Internal(Box<dyn Error + Send + Sync + 'static>),
}
internal_error!(IndexControllerError: JoinError);
impl ErrorCode for IndexControllerError {
fn error_code(&self) -> Code {
match self {
IndexControllerError::MissingUid => Code::BadRequest,
IndexControllerError::Uuid(e) => e.error_code(),
IndexControllerError::IndexActor(e) => e.error_code(),
IndexControllerError::UpdateActor(e) => e.error_code(),
IndexControllerError::IndexResolver(e) => e.error_code(),
IndexControllerError::UpdateLoop(e) => e.error_code(),
IndexControllerError::DumpActor(e) => e.error_code(),
IndexControllerError::IndexError(e) => e.error_code(),
IndexControllerError::Internal(_) => Code::Internal,
}
}
}

View File

@ -0,0 +1,64 @@
use std::fmt;
use meilisearch_error::{Code, ErrorCode};
use tokio::sync::mpsc::error::SendError as MpscSendError;
use tokio::sync::oneshot::error::RecvError as OneshotRecvError;
use crate::{error::MilliError, index::error::IndexError};
pub type Result<T> = std::result::Result<T, IndexResolverError>;
#[derive(thiserror::Error, Debug)]
pub enum IndexResolverError {
#[error("{0}")]
IndexError(#[from] IndexError),
#[error("Index already exists")]
IndexAlreadyExists,
#[error("Index {0} not found")]
UnexistingIndex(String),
#[error("A primary key is already present. It's impossible to update it")]
ExistingPrimaryKey,
#[error("Internal Error: {0}")]
Internal(Box<dyn std::error::Error + Send + Sync + 'static>),
#[error("{0}")]
Milli(#[from] milli::Error),
#[error("Index must have a valid uid; Index uid can be of type integer or string only composed of alphanumeric characters, hyphens (-) and underscores (_).")]
BadlyFormatted(String),
}
impl<T> From<MpscSendError<T>> for IndexResolverError
where
T: Send + Sync + 'static + fmt::Debug,
{
fn from(other: tokio::sync::mpsc::error::SendError<T>) -> Self {
Self::Internal(Box::new(other))
}
}
impl From<OneshotRecvError> for IndexResolverError {
fn from(other: tokio::sync::oneshot::error::RecvError) -> Self {
Self::Internal(Box::new(other))
}
}
internal_error!(
IndexResolverError: heed::Error,
uuid::Error,
std::io::Error,
tokio::task::JoinError,
serde_json::Error
);
impl ErrorCode for IndexResolverError {
fn error_code(&self) -> Code {
match self {
IndexResolverError::IndexError(e) => e.error_code(),
IndexResolverError::IndexAlreadyExists => Code::IndexAlreadyExists,
IndexResolverError::UnexistingIndex(_) => Code::IndexNotFound,
IndexResolverError::ExistingPrimaryKey => Code::PrimaryKeyAlreadyPresent,
IndexResolverError::Internal(_) => Code::Internal,
IndexResolverError::Milli(e) => MilliError(e).error_code(),
IndexResolverError::BadlyFormatted(_) => Code::InvalidIndexUid,
}
}
}

View File

@ -8,12 +8,16 @@ use tokio::sync::RwLock;
use tokio::task::spawn_blocking;
use uuid::Uuid;
use super::error::{IndexActorError, Result};
use super::error::{IndexResolverError, Result};
use crate::index::update_handler::UpdateHandler;
use crate::index::Index;
use crate::index_controller::update_file_store::UpdateFileStore;
use crate::options::IndexerOpts;
type AsyncMap<K, V> = Arc<RwLock<HashMap<K, V>>>;
#[async_trait::async_trait]
#[cfg_attr(test, mockall::automock)]
pub trait IndexStore {
async fn create(&self, uuid: Uuid, primary_key: Option<String>) -> Result<Index>;
async fn get(&self, uuid: Uuid) -> Result<Option<Index>>;
@ -24,17 +28,27 @@ pub struct MapIndexStore {
index_store: AsyncMap<Uuid, Index>,
path: PathBuf,
index_size: usize,
update_file_store: Arc<UpdateFileStore>,
update_handler: Arc<UpdateHandler>,
}
impl MapIndexStore {
pub fn new(path: impl AsRef<Path>, index_size: usize) -> Self {
pub fn new(
path: impl AsRef<Path>,
index_size: usize,
indexer_opts: &IndexerOpts,
) -> anyhow::Result<Self> {
let update_handler = Arc::new(UpdateHandler::new(indexer_opts)?);
let update_file_store = Arc::new(UpdateFileStore::new(path.as_ref()).unwrap());
let path = path.as_ref().join("indexes/");
let index_store = Arc::new(RwLock::new(HashMap::new()));
Self {
Ok(Self {
index_store,
path,
index_size,
}
update_file_store,
update_handler,
})
}
}
@ -48,18 +62,21 @@ impl IndexStore for MapIndexStore {
if let Some(index) = lock.get(&uuid) {
return Ok(index.clone());
}
let path = self.path.join(format!("index-{}", uuid));
let path = self.path.join(format!("{}", uuid));
if path.exists() {
return Err(IndexActorError::IndexAlreadyExists);
return Err(IndexResolverError::IndexAlreadyExists);
}
let index_size = self.index_size;
let file_store = self.update_file_store.clone();
let update_handler = self.update_handler.clone();
let index = spawn_blocking(move || -> Result<Index> {
let index = Index::open(path, index_size)?;
let index = Index::open(path, index_size, file_store, uuid, update_handler)?;
if let Some(primary_key) = primary_key {
let mut txn = index.write_txn()?;
let inner = index.inner();
let mut txn = inner.write_txn()?;
let mut builder = UpdateBuilder::new(0).settings(&mut txn, &index);
let mut builder = UpdateBuilder::new(0).settings(&mut txn, index.inner());
builder.set_primary_key(primary_key);
builder.execute(|_, _| ())?;
@ -81,13 +98,18 @@ impl IndexStore for MapIndexStore {
None => {
// drop the guard here so we can perform the write afterwards without deadlocking;
drop(guard);
let path = self.path.join(format!("index-{}", uuid));
let path = self.path.join(format!("{}", uuid));
if !path.exists() {
return Ok(None);
}
let index_size = self.index_size;
let index = spawn_blocking(move || Index::open(path, index_size)).await??;
let file_store = self.update_file_store.clone();
let update_handler = self.update_handler.clone();
let index = spawn_blocking(move || {
Index::open(path, index_size, file_store, uuid, update_handler)
})
.await??;
self.index_store.write().await.insert(uuid, index.clone());
Ok(Some(index))
}
@ -95,7 +117,7 @@ impl IndexStore for MapIndexStore {
}
async fn delete(&self, uuid: Uuid) -> Result<Option<Index>> {
let db_path = self.path.join(format!("index-{}", uuid));
let db_path = self.path.join(format!("{}", uuid));
fs::remove_dir_all(db_path).await?;
let index = self.index_store.write().await.remove(&uuid);
Ok(index)

View File

@ -1,22 +1,22 @@
use std::collections::HashSet;
use std::path::PathBuf;
use std::{collections::HashSet, path::PathBuf};
use tokio::sync::oneshot;
use uuid::Uuid;
use super::Result;
use crate::index::Index;
use super::error::Result;
pub enum UuidResolveMsg {
pub enum IndexResolverMsg {
Get {
uid: String,
ret: oneshot::Sender<Result<Uuid>>,
ret: oneshot::Sender<Result<Index>>,
},
Delete {
uid: String,
ret: oneshot::Sender<Result<Uuid>>,
ret: oneshot::Sender<Result<Index>>,
},
List {
ret: oneshot::Sender<Result<Vec<(String, Uuid)>>>,
ret: oneshot::Sender<Result<Vec<(String, Index)>>>,
},
Insert {
uuid: Uuid,
@ -25,13 +25,13 @@ pub enum UuidResolveMsg {
},
SnapshotRequest {
path: PathBuf,
ret: oneshot::Sender<Result<HashSet<Uuid>>>,
ret: oneshot::Sender<Result<HashSet<Index>>>,
},
GetSize {
ret: oneshot::Sender<Result<u64>>,
},
DumpRequest {
path: PathBuf,
ret: oneshot::Sender<Result<HashSet<Uuid>>>,
ret: oneshot::Sender<Result<HashSet<Index>>>,
},
}

View File

@ -0,0 +1,185 @@
pub mod error;
pub mod index_store;
pub mod uuid_store;
use std::path::Path;
use error::{IndexResolverError, Result};
use index_store::{IndexStore, MapIndexStore};
use log::error;
use uuid::Uuid;
use uuid_store::{HeedUuidStore, UuidStore};
use crate::{
index::{update_handler::UpdateHandler, Index},
options::IndexerOpts,
};
pub type HardStateIndexResolver = IndexResolver<HeedUuidStore, MapIndexStore>;
pub fn create_index_resolver(
path: impl AsRef<Path>,
index_size: usize,
indexer_opts: &IndexerOpts,
) -> anyhow::Result<HardStateIndexResolver> {
let uuid_store = HeedUuidStore::new(&path)?;
let index_store = MapIndexStore::new(&path, index_size, indexer_opts)?;
Ok(IndexResolver::new(uuid_store, index_store))
}
pub struct IndexResolver<U, I> {
index_uuid_store: U,
index_store: I,
}
impl IndexResolver<HeedUuidStore, MapIndexStore> {
pub fn load_dump(
src: impl AsRef<Path>,
dst: impl AsRef<Path>,
index_db_size: usize,
indexer_opts: &IndexerOpts,
) -> anyhow::Result<()> {
HeedUuidStore::load_dump(&src, &dst)?;
let indexes_path = src.as_ref().join("indexes");
let indexes = indexes_path.read_dir()?;
let update_handler = UpdateHandler::new(indexer_opts)?;
for index in indexes {
let index = index?;
Index::load_dump(&index.path(), &dst, index_db_size, &update_handler)?;
}
Ok(())
}
}
impl<U, I> IndexResolver<U, I>
where
U: UuidStore,
I: IndexStore,
{
pub fn new(index_uuid_store: U, index_store: I) -> Self {
Self {
index_uuid_store,
index_store,
}
}
pub async fn dump(&self, path: impl AsRef<Path>) -> Result<Vec<Index>> {
let uuids = self.index_uuid_store.dump(path.as_ref().to_owned()).await?;
let mut indexes = Vec::new();
for uuid in uuids {
indexes.push(self.get_index_by_uuid(uuid).await?);
}
Ok(indexes)
}
pub async fn get_uuids_size(&self) -> Result<u64> {
Ok(self.index_uuid_store.get_size().await?)
}
pub async fn snapshot(&self, path: impl AsRef<Path>) -> Result<Vec<Index>> {
let uuids = self
.index_uuid_store
.snapshot(path.as_ref().to_owned())
.await?;
let mut indexes = Vec::new();
for uuid in uuids {
indexes.push(self.get_index_by_uuid(uuid).await?);
}
Ok(indexes)
}
pub async fn create_index(&self, uid: String, primary_key: Option<String>) -> Result<Index> {
if !is_index_uid_valid(&uid) {
return Err(IndexResolverError::BadlyFormatted(uid));
}
let uuid = Uuid::new_v4();
let index = self.index_store.create(uuid, primary_key).await?;
match self.index_uuid_store.insert(uid, uuid).await {
Err(e) => {
match self.index_store.delete(uuid).await {
Ok(Some(index)) => {
index.inner().clone().prepare_for_closing();
}
Ok(None) => (),
Err(e) => error!("Error while deleting index: {:?}", e),
}
Err(e)
}
Ok(()) => Ok(index),
}
}
pub async fn list(&self) -> Result<Vec<(String, Index)>> {
let uuids = self.index_uuid_store.list().await?;
let mut indexes = Vec::new();
for (name, uuid) in uuids {
match self.index_store.get(uuid).await? {
Some(index) => indexes.push((name, index)),
None => {
// we found an index that no longer exists, so we remove it from the uuid store
let _ = self.index_uuid_store.delete(name).await;
}
}
}
Ok(indexes)
}
pub async fn delete_index(&self, uid: String) -> Result<Uuid> {
match self.index_uuid_store.delete(uid.clone()).await? {
Some(uuid) => {
match self.index_store.delete(uuid).await {
Ok(Some(index)) => {
index.inner().clone().prepare_for_closing();
}
Ok(None) => (),
Err(e) => error!("Error while deleting index: {:?}", e),
}
Ok(uuid)
}
None => Err(IndexResolverError::UnexistingIndex(uid)),
}
}
pub async fn get_index_by_uuid(&self, uuid: Uuid) -> Result<Index> {
// TODO: Handle this error better.
self.index_store
.get(uuid)
.await?
.ok_or_else(|| IndexResolverError::UnexistingIndex(String::new()))
}
pub async fn get_index(&self, uid: String) -> Result<Index> {
match self.index_uuid_store.get_uuid(uid).await? {
(name, Some(uuid)) => {
match self.index_store.get(uuid).await? {
Some(index) => Ok(index),
None => {
// For some reason we got a uuid pointing to an index that no longer exists; we return
// an error and remove the uuid from the uuid store.
let _ = self.index_uuid_store.delete(name.clone()).await;
Err(IndexResolverError::UnexistingIndex(name))
}
}
}
(name, _) => Err(IndexResolverError::UnexistingIndex(name)),
}
}
pub async fn get_uuid(&self, uid: String) -> Result<Uuid> {
match self.index_uuid_store.get_uuid(uid).await? {
(_, Some(uuid)) => Ok(uuid),
(name, _) => Err(IndexResolverError::UnexistingIndex(name)),
}
}
}
fn is_index_uid_valid(uid: &str) -> bool {
uid.chars()
.all(|x| x.is_ascii_alphanumeric() || x == '-' || x == '_')
}
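A few hypothetical checks of the uid validation rule above (ASCII alphanumerics, hyphens and underscores only):
#[test]
fn index_uid_validation_sketch() {
    assert!(is_index_uid_valid("movies_2021-10"));
    assert!(!is_index_uid_valid("movies?"));
    assert!(!is_index_uid_valid("films français"));
}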

View File

@ -8,8 +8,10 @@ use heed::{CompactionOption, Database, Env, EnvOpenOptions};
use serde::{Deserialize, Serialize};
use uuid::Uuid;
use super::{error::UuidResolverError, Result, UUID_STORE_SIZE};
use crate::helpers::EnvSizer;
use super::error::{IndexResolverError, Result};
use crate::EnvSizer;
const UUID_STORE_SIZE: usize = 1_073_741_824; //1GiB
#[derive(Serialize, Deserialize)]
struct DumpEntry {
@ -20,10 +22,11 @@ struct DumpEntry {
const UUIDS_DB_PATH: &str = "index_uuids";
#[async_trait::async_trait]
#[cfg_attr(test, mockall::automock)]
pub trait UuidStore: Sized {
// Create a new entry for `name`. Return an error if the entry already exists, return
// the uuid otherwise.
async fn get_uuid(&self, uid: String) -> Result<Option<Uuid>>;
async fn get_uuid(&self, uid: String) -> Result<(String, Option<Uuid>)>;
async fn delete(&self, uid: String) -> Result<Option<Uuid>>;
async fn list(&self) -> Result<Vec<(String, Uuid)>>;
async fn insert(&self, name: String, uuid: Uuid) -> Result<()>;
@ -44,16 +47,17 @@ impl HeedUuidStore {
create_dir_all(&path)?;
let mut options = EnvOpenOptions::new();
options.map_size(UUID_STORE_SIZE); // 1GB
options.max_dbs(1);
let env = options.open(path)?;
let db = env.create_database(None)?;
let db = env.create_database(Some("uuids"))?;
Ok(Self { env, db })
}
pub fn get_uuid(&self, name: String) -> Result<Option<Uuid>> {
pub fn get_uuid(&self, name: &str) -> Result<Option<Uuid>> {
let env = self.env.clone();
let db = self.db;
let txn = env.read_txn()?;
match db.get(&txn, &name)? {
match db.get(&txn, name)? {
Some(uuid) => {
let uuid = Uuid::from_slice(uuid)?;
Ok(Some(uuid))
@ -96,7 +100,7 @@ impl HeedUuidStore {
let mut txn = env.write_txn()?;
if db.get(&txn, &name)?.is_some() {
return Err(UuidResolverError::NameAlreadyExist);
return Err(IndexResolverError::IndexAlreadyExists);
}
db.put(&mut txn, &name, uuid.as_bytes())?;
@ -170,7 +174,6 @@ impl HeedUuidStore {
Ok(0) => break,
Ok(_) => {
let DumpEntry { uuid, uid } = serde_json::from_str(&line)?;
println!("importing {} {}", uid, uuid);
db.db.put(&mut txn, &uid, uuid.as_bytes())?;
}
Err(e) => return Err(e.into()),
@ -188,9 +191,9 @@ impl HeedUuidStore {
#[async_trait::async_trait]
impl UuidStore for HeedUuidStore {
async fn get_uuid(&self, name: String) -> Result<Option<Uuid>> {
async fn get_uuid(&self, name: String) -> Result<(String, Option<Uuid>)> {
let this = self.clone();
tokio::task::spawn_blocking(move || this.get_uuid(name)).await?
tokio::task::spawn_blocking(move || this.get_uuid(&name).map(|res| (name, res))).await?
}
async fn delete(&self, uid: String) -> Result<Option<Uuid>> {

View File

@ -0,0 +1,615 @@
use std::collections::BTreeMap;
use std::fmt;
use std::path::{Path, PathBuf};
use std::sync::Arc;
use std::time::Duration;
use actix_web::error::PayloadError;
use bytes::Bytes;
use chrono::{DateTime, Utc};
use futures::Stream;
use log::info;
use milli::update::IndexDocumentsMethod;
use serde::{Deserialize, Serialize};
use tokio::task::spawn_blocking;
use tokio::time::sleep;
use uuid::Uuid;
use dump_actor::DumpActorHandle;
pub use dump_actor::{DumpInfo, DumpStatus};
use snapshot::load_snapshot;
use crate::index::error::Result as IndexResult;
use crate::index::{
Checked, Document, IndexMeta, IndexStats, SearchQuery, SearchResult, Settings, Unchecked,
};
use crate::index_controller::index_resolver::create_index_resolver;
use crate::index_controller::snapshot::SnapshotService;
use crate::options::IndexerOpts;
use error::Result;
use self::dump_actor::load_dump;
use self::index_resolver::error::IndexResolverError;
use self::index_resolver::index_store::{IndexStore, MapIndexStore};
use self::index_resolver::uuid_store::{HeedUuidStore, UuidStore};
use self::index_resolver::IndexResolver;
use self::updates::status::UpdateStatus;
use self::updates::UpdateMsg;
mod dump_actor;
pub mod error;
mod index_resolver;
mod snapshot;
pub mod update_file_store;
pub mod updates;
/// Concrete implementation of the IndexController, exposed by meilisearch-lib
pub type MeiliSearch =
IndexController<HeedUuidStore, MapIndexStore, dump_actor::DumpActorHandleImpl>;
pub type Payload = Box<
dyn Stream<Item = std::result::Result<Bytes, PayloadError>> + Send + Sync + 'static + Unpin,
>;
#[derive(Debug, Serialize, Deserialize, Clone)]
#[serde(rename_all = "camelCase")]
pub struct IndexMetadata {
#[serde(skip)]
pub uuid: Uuid,
pub uid: String,
name: String,
#[serde(flatten)]
pub meta: IndexMeta,
}
#[derive(Clone, Debug)]
pub struct IndexSettings {
pub uid: Option<String>,
pub primary_key: Option<String>,
}
#[derive(Debug)]
pub enum DocumentAdditionFormat {
Json,
Csv,
Ndjson,
}
impl fmt::Display for DocumentAdditionFormat {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
match self {
DocumentAdditionFormat::Json => write!(f, "json"),
DocumentAdditionFormat::Ndjson => write!(f, "ndjson"),
DocumentAdditionFormat::Csv => write!(f, "csv"),
}
}
}
#[derive(Serialize, Debug)]
#[serde(rename_all = "camelCase")]
pub struct Stats {
pub database_size: u64,
pub last_update: Option<DateTime<Utc>>,
pub indexes: BTreeMap<String, IndexStats>,
}
#[allow(clippy::large_enum_variant)]
#[derive(derivative::Derivative)]
#[derivative(Debug)]
pub enum Update {
DeleteDocuments(Vec<String>),
ClearDocuments,
Settings(Settings<Unchecked>),
DocumentAddition {
#[derivative(Debug = "ignore")]
payload: Payload,
primary_key: Option<String>,
method: IndexDocumentsMethod,
format: DocumentAdditionFormat,
},
}
#[derive(Default, Debug)]
pub struct IndexControllerBuilder {
max_index_size: Option<usize>,
max_update_store_size: Option<usize>,
snapshot_dir: Option<PathBuf>,
import_snapshot: Option<PathBuf>,
snapshot_interval: Option<Duration>,
ignore_snapshot_if_db_exists: bool,
ignore_missing_snapshot: bool,
schedule_snapshot: bool,
dump_src: Option<PathBuf>,
dump_dst: Option<PathBuf>,
}
impl IndexControllerBuilder {
pub fn build(
self,
db_path: impl AsRef<Path>,
indexer_options: IndexerOpts,
) -> anyhow::Result<MeiliSearch> {
let index_size = self
.max_index_size
.ok_or_else(|| anyhow::anyhow!("Missing index size"))?;
let update_store_size = self
.max_update_store_size
.ok_or_else(|| anyhow::anyhow!("Missing update database size"))?;
if let Some(ref path) = self.import_snapshot {
info!("Loading from snapshot {:?}", path);
load_snapshot(
db_path.as_ref(),
path,
self.ignore_snapshot_if_db_exists,
self.ignore_missing_snapshot,
)?;
} else if let Some(ref src_path) = self.dump_src {
load_dump(
db_path.as_ref(),
src_path,
index_size,
update_store_size,
&indexer_options,
)?;
}
std::fs::create_dir_all(db_path.as_ref())?;
let index_resolver = Arc::new(create_index_resolver(
&db_path,
index_size,
&indexer_options,
)?);
#[allow(unreachable_code)]
let update_sender =
updates::create_update_handler(index_resolver.clone(), &db_path, update_store_size)?;
let dump_path = self
.dump_dst
.ok_or_else(|| anyhow::anyhow!("Missing dump directory path"))?;
let dump_handle = dump_actor::DumpActorHandleImpl::new(
dump_path,
index_resolver.clone(),
update_sender.clone(),
index_size,
update_store_size,
)?;
let dump_handle = Arc::new(dump_handle);
if self.schedule_snapshot {
let snapshot_service = SnapshotService::new(
index_resolver.clone(),
update_sender.clone(),
self.snapshot_interval
.ok_or_else(|| anyhow::anyhow!("Snapshot interval not provided."))?,
self.snapshot_dir
.ok_or_else(|| anyhow::anyhow!("Snapshot path not provided."))?,
db_path
.as_ref()
.file_name()
.map(|n| n.to_owned().into_string().expect("invalid path"))
.unwrap_or_else(|| String::from("data.ms")),
);
tokio::task::spawn(snapshot_service.run());
}
Ok(IndexController {
index_resolver,
update_sender,
dump_handle,
})
}
/// Set the index controller builder's max update store size.
pub fn set_max_update_store_size(&mut self, max_update_store_size: usize) -> &mut Self {
self.max_update_store_size.replace(max_update_store_size);
self
}
pub fn set_max_index_size(&mut self, size: usize) -> &mut Self {
self.max_index_size.replace(size);
self
}
/// Set the index controller builder's snapshot path.
pub fn set_snapshot_dir(&mut self, snapshot_dir: PathBuf) -> &mut Self {
self.snapshot_dir.replace(snapshot_dir);
self
}
/// Set the index controller builder's ignore snapshot if db exists.
pub fn set_ignore_snapshot_if_db_exists(
&mut self,
ignore_snapshot_if_db_exists: bool,
) -> &mut Self {
self.ignore_snapshot_if_db_exists = ignore_snapshot_if_db_exists;
self
}
/// Set the index controller builder's ignore missing snapshot.
pub fn set_ignore_missing_snapshot(&mut self, ignore_missing_snapshot: bool) -> &mut Self {
self.ignore_missing_snapshot = ignore_missing_snapshot;
self
}
/// Set the index controller builder's dump src.
pub fn set_dump_src(&mut self, dump_src: PathBuf) -> &mut Self {
self.dump_src.replace(dump_src);
self
}
/// Set the index controller builder's dump dst.
pub fn set_dump_dst(&mut self, dump_dst: PathBuf) -> &mut Self {
self.dump_dst.replace(dump_dst);
self
}
/// Set the index controller builder's import snapshot.
pub fn set_import_snapshot(&mut self, import_snapshot: PathBuf) -> &mut Self {
self.import_snapshot.replace(import_snapshot);
self
}
/// Set the index controller builder's snapshot interval sec.
pub fn set_snapshot_interval(&mut self, snapshot_interval: Duration) -> &mut Self {
self.snapshot_interval = Some(snapshot_interval);
self
}
/// Set the index controller builder's schedule snapshot.
pub fn set_schedule_snapshot(&mut self) -> &mut Self {
self.schedule_snapshot = true;
self
}
}
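A hedged sketch of wiring the builder above into a MeiliSearch instance; the paths and sizes are hypothetical, and only the setters shown in this diff are used:
fn build_sketch(indexer_options: IndexerOpts) -> anyhow::Result<MeiliSearch> {
    let mut builder = IndexControllerBuilder::default();
    builder
        .set_max_index_size(100 * 1024 * 1024)        // required by build()
        .set_max_update_store_size(100 * 1024 * 1024)
        .set_dump_dst("/tmp/dumps".into());           // dump_dst is required by build()
    builder.build("/tmp/data.ms", indexer_options)
}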
// We are using derivative here to derive Clone, because U, I and D do not necessarily implement
// Clone themselves.
#[derive(derivative::Derivative)]
#[derivative(Clone(bound = ""))]
pub struct IndexController<U, I, D> {
index_resolver: Arc<IndexResolver<U, I>>,
update_sender: updates::UpdateSender,
dump_handle: Arc<D>,
}
impl<U, I, D> IndexController<U, I, D>
where
U: UuidStore + Sync + Send + 'static,
I: IndexStore + Sync + Send + 'static,
D: DumpActorHandle + Send + Sync,
{
pub fn builder() -> IndexControllerBuilder {
IndexControllerBuilder::default()
}
pub async fn register_update(
&self,
uid: String,
update: Update,
create_index: bool,
) -> Result<UpdateStatus> {
match self.index_resolver.get_uuid(uid).await {
Ok(uuid) => {
let update_result = UpdateMsg::update(&self.update_sender, uuid, update).await?;
Ok(update_result)
}
Err(IndexResolverError::UnexistingIndex(name)) => {
if create_index {
let index = self.index_resolver.create_index(name, None).await?;
let update_result =
UpdateMsg::update(&self.update_sender, index.uuid(), update).await?;
Ok(update_result)
} else {
Err(IndexResolverError::UnexistingIndex(name).into())
}
}
Err(e) => Err(e.into()),
}
}
pub async fn update_status(&self, uid: String, id: u64) -> Result<UpdateStatus> {
let uuid = self.index_resolver.get_uuid(uid).await?;
let result = UpdateMsg::get_update(&self.update_sender, uuid, id).await?;
Ok(result)
}
pub async fn all_update_status(&self, uid: String) -> Result<Vec<UpdateStatus>> {
let uuid = self.index_resolver.get_uuid(uid).await?;
let result = UpdateMsg::list_updates(&self.update_sender, uuid).await?;
Ok(result)
}
pub async fn list_indexes(&self) -> Result<Vec<IndexMetadata>> {
let indexes = self.index_resolver.list().await?;
let mut ret = Vec::new();
for (uid, index) in indexes {
let meta = index.meta()?;
let meta = IndexMetadata {
uuid: index.uuid(),
name: uid.clone(),
uid,
meta,
};
ret.push(meta);
}
Ok(ret)
}
pub async fn settings(&self, uid: String) -> Result<Settings<Checked>> {
let index = self.index_resolver.get_index(uid).await?;
let settings = spawn_blocking(move || index.settings()).await??;
Ok(settings)
}
pub async fn documents(
&self,
uid: String,
offset: usize,
limit: usize,
attributes_to_retrieve: Option<Vec<String>>,
) -> Result<Vec<Document>> {
let index = self.index_resolver.get_index(uid).await?;
let documents =
spawn_blocking(move || index.retrieve_documents(offset, limit, attributes_to_retrieve))
.await??;
Ok(documents)
}
pub async fn document(
&self,
uid: String,
doc_id: String,
attributes_to_retrieve: Option<Vec<String>>,
) -> Result<Document> {
let index = self.index_resolver.get_index(uid).await?;
let document =
spawn_blocking(move || index.retrieve_document(doc_id, attributes_to_retrieve))
.await??;
Ok(document)
}
pub async fn update_index(
&self,
uid: String,
mut index_settings: IndexSettings,
) -> Result<IndexMetadata> {
index_settings.uid.take();
let index = self.index_resolver.get_index(uid.clone()).await?;
let uuid = index.uuid();
let meta =
spawn_blocking(move || index.update_primary_key(index_settings.primary_key)).await??;
let meta = IndexMetadata {
uuid,
name: uid.clone(),
uid,
meta,
};
Ok(meta)
}
pub async fn search(&self, uid: String, query: SearchQuery) -> Result<SearchResult> {
let index = self.index_resolver.get_index(uid.clone()).await?;
let result = spawn_blocking(move || index.perform_search(query)).await??;
Ok(result)
}
pub async fn get_index(&self, uid: String) -> Result<IndexMetadata> {
let index = self.index_resolver.get_index(uid.clone()).await?;
let uuid = index.uuid();
let meta = spawn_blocking(move || index.meta()).await??;
let meta = IndexMetadata {
uuid,
name: uid.clone(),
uid,
meta,
};
Ok(meta)
}
pub async fn get_index_stats(&self, uid: String) -> Result<IndexStats> {
let update_infos = UpdateMsg::get_info(&self.update_sender).await?;
let index = self.index_resolver.get_index(uid).await?;
let uuid = index.uuid();
let mut stats = spawn_blocking(move || index.stats()).await??;
// Check if the currently indexing update is from our index.
stats.is_indexing = Some(Some(uuid) == update_infos.processing);
Ok(stats)
}
pub async fn get_all_stats(&self) -> Result<Stats> {
let update_infos = UpdateMsg::get_info(&self.update_sender).await?;
let mut database_size = self.index_resolver.get_uuids_size().await? + update_infos.size;
let mut last_update: Option<DateTime<_>> = None;
let mut indexes = BTreeMap::new();
for (index_uid, index) in self.index_resolver.list().await? {
let uuid = index.uuid();
let (mut stats, meta) = spawn_blocking::<_, IndexResult<_>>(move || {
let stats = index.stats()?;
let meta = index.meta()?;
Ok((stats, meta))
})
.await??;
database_size += stats.size;
last_update = last_update.map_or(Some(meta.updated_at), |last| {
Some(last.max(meta.updated_at))
});
// Check if the currently indexing update is from our index.
stats.is_indexing = Some(Some(uuid) == update_infos.processing);
indexes.insert(index_uid, stats);
}
Ok(Stats {
database_size,
last_update,
indexes,
})
}
pub async fn create_dump(&self) -> Result<DumpInfo> {
Ok(self.dump_handle.create_dump().await?)
}
pub async fn dump_info(&self, uid: String) -> Result<DumpInfo> {
Ok(self.dump_handle.dump_info(uid).await?)
}
pub async fn create_index(
&self,
uid: String,
primary_key: Option<String>,
) -> Result<IndexMetadata> {
let index = self
.index_resolver
.create_index(uid.clone(), primary_key)
.await?;
let meta = spawn_blocking(move || -> IndexResult<_> {
let meta = index.meta()?;
let meta = IndexMetadata {
uuid: index.uuid(),
uid: uid.clone(),
name: uid,
meta,
};
Ok(meta)
})
.await??;
Ok(meta)
}
pub async fn delete_index(&self, uid: String) -> Result<()> {
let uuid = self.index_resolver.delete_index(uid).await?;
let update_sender = self.update_sender.clone();
tokio::spawn(async move {
let _ = UpdateMsg::delete(&update_sender, uuid).await;
});
Ok(())
}
}
pub async fn get_arc_ownership_blocking<T>(mut item: Arc<T>) -> T {
loop {
match Arc::try_unwrap(item) {
Ok(item) => return item,
Err(item_arc) => {
item = item_arc;
sleep(Duration::from_millis(100)).await;
continue;
}
}
}
}
#[cfg(test)]
mod test {
use futures::future::ok;
use mockall::predicate::eq;
use tokio::sync::mpsc;
use crate::index::error::Result as IndexResult;
use crate::index::test::Mocker;
use crate::index::Index;
use crate::index_controller::dump_actor::MockDumpActorHandle;
use crate::index_controller::index_resolver::index_store::MockIndexStore;
use crate::index_controller::index_resolver::uuid_store::MockUuidStore;
use super::updates::UpdateSender;
use super::*;
impl<D: DumpActorHandle> IndexController<MockUuidStore, MockIndexStore, D> {
pub fn mock(
index_resolver: IndexResolver<MockUuidStore, MockIndexStore>,
update_sender: UpdateSender,
dump_handle: D,
) -> Self {
IndexController {
index_resolver: Arc::new(index_resolver),
update_sender,
dump_handle: Arc::new(dump_handle),
}
}
}
#[actix_rt::test]
async fn test_search_simple() {
let index_uid = "test";
let index_uuid = Uuid::new_v4();
let query = SearchQuery {
q: Some(String::from("hello world")),
offset: Some(10),
limit: 0,
attributes_to_retrieve: Some(vec!["string".to_owned()].into_iter().collect()),
attributes_to_crop: None,
crop_length: 18,
attributes_to_highlight: None,
matches: true,
filter: None,
sort: None,
facets_distribution: None,
};
let result = SearchResult {
hits: vec![],
nb_hits: 29,
exhaustive_nb_hits: true,
query: "hello world".to_string(),
limit: 24,
offset: 0,
processing_time_ms: 50,
facets_distribution: None,
exhaustive_facets_count: Some(true),
};
let mut uuid_store = MockUuidStore::new();
uuid_store
.expect_get_uuid()
.with(eq(index_uid.to_owned()))
.returning(move |s| Box::pin(ok((s, Some(index_uuid)))));
let mut index_store = MockIndexStore::new();
let result_clone = result.clone();
let query_clone = query.clone();
index_store
.expect_get()
.with(eq(index_uuid))
.returning(move |_uuid| {
let result = result_clone.clone();
let query = query_clone.clone();
let mocker = Mocker::default();
mocker
.when::<SearchQuery, IndexResult<SearchResult>>("perform_search")
.once()
.then(move |q| {
assert_eq!(&q, &query);
Ok(result.clone())
});
let index = Index::faux(mocker);
Box::pin(ok(Some(index)))
});
let index_resolver = IndexResolver::new(uuid_store, index_store);
let (update_sender, _) = mpsc::channel(1);
let dump_actor = MockDumpActorHandle::new();
let index_controller = IndexController::mock(index_resolver, update_sender, dump_actor);
let r = index_controller
.search(index_uid.to_owned(), query.clone())
.await
.unwrap();
assert_eq!(r, result);
}
}

View File

@ -0,0 +1,299 @@
use std::path::{Path, PathBuf};
use std::sync::Arc;
use std::time::Duration;
use anyhow::bail;
use log::{error, info, trace};
use tokio::fs;
use tokio::task::spawn_blocking;
use tokio::time::sleep;
use crate::compression::from_tar_gz;
use crate::index_controller::updates::UpdateMsg;
use super::index_resolver::index_store::IndexStore;
use super::index_resolver::uuid_store::UuidStore;
use super::index_resolver::IndexResolver;
use super::updates::UpdateSender;
pub struct SnapshotService<U, I> {
index_resolver: Arc<IndexResolver<U, I>>,
update_sender: UpdateSender,
snapshot_period: Duration,
snapshot_path: PathBuf,
db_name: String,
}
impl<U, I> SnapshotService<U, I>
where
U: UuidStore + Sync + Send + 'static,
I: IndexStore + Sync + Send + 'static,
{
pub fn new(
index_resolver: Arc<IndexResolver<U, I>>,
update_sender: UpdateSender,
snapshot_period: Duration,
snapshot_path: PathBuf,
db_name: String,
) -> Self {
Self {
index_resolver,
update_sender,
snapshot_period,
snapshot_path,
db_name,
}
}
pub async fn run(self) {
info!(
"Snapshot scheduled every {}s.",
self.snapshot_period.as_secs()
);
loop {
if let Err(e) = self.perform_snapshot().await {
error!("Error while performing snapshot: {}", e);
}
sleep(self.snapshot_period).await;
}
}
async fn perform_snapshot(&self) -> anyhow::Result<()> {
trace!("Performing snapshot.");
let snapshot_dir = self.snapshot_path.clone();
fs::create_dir_all(&snapshot_dir).await?;
let temp_snapshot_dir = spawn_blocking(tempfile::tempdir).await??;
let temp_snapshot_path = temp_snapshot_dir.path().to_owned();
let indexes = self
.index_resolver
.snapshot(temp_snapshot_path.clone())
.await?;
if indexes.is_empty() {
return Ok(());
}
UpdateMsg::snapshot(&self.update_sender, temp_snapshot_path.clone(), indexes).await?;
let snapshot_path = self
.snapshot_path
.join(format!("{}.snapshot", self.db_name));
let snapshot_path = spawn_blocking(move || -> anyhow::Result<PathBuf> {
let temp_snapshot_file = tempfile::NamedTempFile::new()?;
let temp_snapshot_file_path = temp_snapshot_file.path().to_owned();
crate::compression::to_tar_gz(temp_snapshot_path, temp_snapshot_file_path)?;
temp_snapshot_file.persist(&snapshot_path)?;
Ok(snapshot_path)
})
.await??;
trace!("Created snapshot in {:?}.", snapshot_path);
Ok(())
}
}
pub fn load_snapshot(
db_path: impl AsRef<Path>,
snapshot_path: impl AsRef<Path>,
ignore_snapshot_if_db_exists: bool,
ignore_missing_snapshot: bool,
) -> anyhow::Result<()> {
if !db_path.as_ref().exists() && snapshot_path.as_ref().exists() {
match from_tar_gz(snapshot_path, &db_path) {
Ok(()) => Ok(()),
Err(e) => {
// clean up the partially created db folder
std::fs::remove_dir_all(&db_path)?;
Err(e)
}
}
} else if db_path.as_ref().exists() && !ignore_snapshot_if_db_exists {
bail!(
"database already exists at {:?}, try to delete it or rename it",
db_path
.as_ref()
.canonicalize()
.unwrap_or_else(|_| db_path.as_ref().to_owned())
)
} else if !snapshot_path.as_ref().exists() && !ignore_missing_snapshot {
bail!(
"snapshot doesn't exist at {:?}",
snapshot_path
.as_ref()
.canonicalize()
.unwrap_or_else(|_| snapshot_path.as_ref().to_owned())
)
} else {
Ok(())
}
}
#[cfg(test)]
mod test {
use std::{collections::HashSet, sync::Arc};
use futures::future::{err, ok};
use once_cell::sync::Lazy;
use rand::Rng;
use uuid::Uuid;
use crate::index::error::IndexError;
use crate::index::test::Mocker;
use crate::index::{error::Result as IndexResult, Index};
use crate::index_controller::index_resolver::error::IndexResolverError;
use crate::index_controller::index_resolver::index_store::MockIndexStore;
use crate::index_controller::index_resolver::uuid_store::MockUuidStore;
use crate::index_controller::index_resolver::IndexResolver;
use crate::index_controller::updates::create_update_handler;
use super::*;
fn setup() {
static SETUP: Lazy<()> = Lazy::new(|| {
if cfg!(windows) {
std::env::set_var("TMP", ".");
} else {
std::env::set_var("TMPDIR", ".");
}
});
// just deref to make sure the env is set up
*SETUP
}
#[actix_rt::test]
async fn test_normal() {
setup();
let mut rng = rand::thread_rng();
let uuids_num: usize = rng.gen_range(5..10);
let uuids = (0..uuids_num)
.map(|_| Uuid::new_v4())
.collect::<HashSet<_>>();
let mut uuid_store = MockUuidStore::new();
let uuids_clone = uuids.clone();
uuid_store
.expect_snapshot()
.times(1)
.returning(move |_| Box::pin(ok(uuids_clone.clone())));
let mut indexes = uuids.clone().into_iter().map(|uuid| {
let mocker = Mocker::default();
mocker
.when("snapshot")
.times(1)
.then(|_: &Path| -> IndexResult<()> { Ok(()) });
mocker.when("uuid").then(move |_: ()| uuid);
Index::faux(mocker)
});
let uuids_clone = uuids.clone();
let mut index_store = MockIndexStore::new();
index_store
.expect_get()
.withf(move |uuid| uuids_clone.contains(uuid))
.times(uuids_num)
.returning(move |_| Box::pin(ok(Some(indexes.next().unwrap()))));
let index_resolver = Arc::new(IndexResolver::new(uuid_store, index_store));
let dir = tempfile::tempdir().unwrap();
let update_sender =
create_update_handler(index_resolver.clone(), dir.path(), 4096 * 100).unwrap();
let snapshot_path = tempfile::tempdir().unwrap();
let snapshot_service = SnapshotService::new(
index_resolver,
update_sender,
Duration::from_millis(100),
snapshot_path.path().to_owned(),
"data.ms".to_string(),
);
snapshot_service.perform_snapshot().await.unwrap();
}
#[actix_rt::test]
async fn error_performing_uuid_snapshot() {
setup();
let mut uuid_store = MockUuidStore::new();
uuid_store
.expect_snapshot()
.once()
.returning(move |_| Box::pin(err(IndexResolverError::IndexAlreadyExists)));
let mut index_store = MockIndexStore::new();
index_store.expect_get().never();
let index_resolver = Arc::new(IndexResolver::new(uuid_store, index_store));
let dir = tempfile::tempdir().unwrap();
let update_sender =
create_update_handler(index_resolver.clone(), dir.path(), 4096 * 100).unwrap();
let snapshot_path = tempfile::tempdir().unwrap();
let snapshot_service = SnapshotService::new(
index_resolver,
update_sender,
Duration::from_millis(100),
snapshot_path.path().to_owned(),
"data.ms".to_string(),
);
assert!(snapshot_service.perform_snapshot().await.is_err());
}
#[actix_rt::test]
async fn error_performing_index_snapshot() {
setup();
let uuids: HashSet<Uuid> = vec![Uuid::new_v4()].into_iter().collect();
let mut uuid_store = MockUuidStore::new();
let uuids_clone = uuids.clone();
uuid_store
.expect_snapshot()
.once()
.returning(move |_| Box::pin(ok(uuids_clone.clone())));
let mut indexes = uuids.clone().into_iter().map(|uuid| {
let mocker = Mocker::default();
// the index snapshot returns an arbitrary error
mocker
.when("snapshot")
.then(|_: &Path| -> IndexResult<()> { Err(IndexError::ExistingPrimaryKey) });
mocker.when("uuid").then(move |_: ()| uuid);
Index::faux(mocker)
});
let uuids_clone = uuids.clone();
let mut index_store = MockIndexStore::new();
index_store
.expect_get()
.withf(move |uuid| uuids_clone.contains(uuid))
.once()
.returning(move |_| Box::pin(ok(Some(indexes.next().unwrap()))));
let index_resolver = Arc::new(IndexResolver::new(uuid_store, index_store));
let dir = tempfile::tempdir().unwrap();
let update_sender =
create_update_handler(index_resolver.clone(), dir.path(), 4096 * 100).unwrap();
let snapshot_path = tempfile::tempdir().unwrap();
let snapshot_service = SnapshotService::new(
index_resolver,
update_sender,
Duration::from_millis(100),
snapshot_path.path().to_owned(),
"data.ms".to_string(),
);
assert!(snapshot_service.perform_snapshot().await.is_err());
}
}

View File

@ -0,0 +1,177 @@
use std::fs::{create_dir_all, File};
use std::io::{self, BufReader, BufWriter, Write};
use std::ops::{Deref, DerefMut};
use std::path::{Path, PathBuf};
use milli::documents::DocumentBatchReader;
use serde_json::Map;
use tempfile::{NamedTempFile, PersistError};
use uuid::Uuid;
const UPDATE_FILES_PATH: &str = "updates/updates_files";
use crate::document_formats::read_ndjson;
pub struct UpdateFile {
path: PathBuf,
file: NamedTempFile,
}
#[derive(Debug, thiserror::Error)]
#[error("Error while persisting update to disk: {0}")]
pub struct UpdateFileStoreError(Box<dyn std::error::Error + Sync + Send + 'static>);
type Result<T> = std::result::Result<T, UpdateFileStoreError>;
macro_rules! into_update_store_error {
($($other:path),*) => {
$(
impl From<$other> for UpdateFileStoreError {
fn from(other: $other) -> Self {
Self(Box::new(other))
}
}
)*
};
}
into_update_store_error!(
PersistError,
io::Error,
serde_json::Error,
milli::documents::Error
);
impl UpdateFile {
pub fn persist(self) -> Result<()> {
self.file.persist(&self.path)?;
Ok(())
}
}
impl Deref for UpdateFile {
type Target = NamedTempFile;
fn deref(&self) -> &Self::Target {
&self.file
}
}
impl DerefMut for UpdateFile {
fn deref_mut(&mut self) -> &mut Self::Target {
&mut self.file
}
}
#[derive(Clone, Debug)]
pub struct UpdateFileStore {
path: PathBuf,
}
impl UpdateFileStore {
pub fn load_dump(src: impl AsRef<Path>, dst: impl AsRef<Path>) -> anyhow::Result<()> {
let src_update_files_path = src.as_ref().join(UPDATE_FILES_PATH);
let dst_update_files_path = dst.as_ref().join(UPDATE_FILES_PATH);
// No update files to load
if !src_update_files_path.exists() {
return Ok(());
}
create_dir_all(&dst_update_files_path)?;
let entries = std::fs::read_dir(src_update_files_path)?;
for entry in entries {
let entry = entry?;
let update_file = BufReader::new(File::open(entry.path())?);
let file_uuid = entry.file_name();
let file_uuid = file_uuid
.to_str()
.ok_or_else(|| anyhow::anyhow!("invalid update file name"))?;
let dst_path = dst_update_files_path.join(file_uuid);
let dst_file = BufWriter::new(File::create(dst_path)?);
read_ndjson(update_file, dst_file)?;
}
Ok(())
}
pub fn new(path: impl AsRef<Path>) -> Result<Self> {
let path = path.as_ref().join(UPDATE_FILES_PATH);
std::fs::create_dir_all(&path)?;
Ok(Self { path })
}
/// Creates a new temporary update file.
///
/// A call to `persist` is needed to persist the file in the database.
pub fn new_update(&self) -> Result<(Uuid, UpdateFile)> {
let file = NamedTempFile::new()?;
let uuid = Uuid::new_v4();
let path = self.path.join(uuid.to_string());
let update_file = UpdateFile { file, path };
Ok((uuid, update_file))
}
/// Returns the file corresponding to the requested uuid.
pub fn get_update(&self, uuid: Uuid) -> Result<File> {
let path = self.path.join(uuid.to_string());
let file = File::open(path)?;
Ok(file)
}
/// Copies the content of the update file pointed to by `uuid` to the `dst` directory.
pub fn snapshot(&self, uuid: Uuid, dst: impl AsRef<Path>) -> Result<()> {
let src = self.path.join(uuid.to_string());
let mut dst = dst.as_ref().join(UPDATE_FILES_PATH);
std::fs::create_dir_all(&dst)?;
dst.push(uuid.to_string());
std::fs::copy(src, dst)?;
Ok(())
}
/// Performs a dump of the given update file uuid into the provided dump path.
pub fn dump(&self, uuid: Uuid, dump_path: impl AsRef<Path>) -> Result<()> {
let uuid_string = uuid.to_string();
let update_file_path = self.path.join(&uuid_string);
let mut dst = dump_path.as_ref().join(UPDATE_FILES_PATH);
std::fs::create_dir_all(&dst)?;
dst.push(&uuid_string);
let update_file = File::open(update_file_path)?;
let mut dst_file = NamedTempFile::new()?;
let mut document_reader = DocumentBatchReader::from_reader(update_file)?;
let mut document_buffer = Map::new();
// TODO: we need to find a way to do this more efficiently. (create a custom serializer
// for jsonl for example...)
while let Some((index, document)) = document_reader.next_document_with_index()? {
for (field_id, content) in document.iter() {
if let Some(field_name) = index.get_by_left(&field_id) {
let content = serde_json::from_slice(content)?;
document_buffer.insert(field_name.to_string(), content);
}
}
serde_json::to_writer(&mut dst_file, &document_buffer)?;
dst_file.write_all(b"\n")?;
document_buffer.clear();
}
dst_file.persist(dst)?;
Ok(())
}
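/// Returns the size, in bytes, of the update file corresponding to the requested uuid.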
pub fn get_size(&self, uuid: Uuid) -> Result<u64> {
Ok(self.get_update(uuid)?.metadata()?.len())
}
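/// Removes the update file corresponding to the requested uuid from disk.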
pub fn delete(&self, uuid: Uuid) -> Result<()> {
let path = self.path.join(uuid.to_string());
std::fs::remove_file(path)?;
Ok(())
}
}
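For orientation, a minimal usage sketch of the `UpdateFileStore` API above (not part of the diff): `db_path` and `payload` are illustrative placeholders, and the snippet relies on the imports already present in this file.

fn update_file_store_usage(db_path: &Path, payload: &[u8]) -> Result<()> {
    // Open (or create) the store rooted at `<db_path>/updates/updates_files`.
    let store = UpdateFileStore::new(db_path)?;

    // `new_update` hands back a uuid and a temporary file; the file only becomes
    // part of the database once `persist` moves it to its final path.
    let (uuid, mut update_file) = store.new_update()?;
    update_file.write_all(payload)?;
    update_file.persist()?;

    // The persisted file can then be retrieved, measured, snapshotted, or dumped.
    let _file = store.get_update(uuid)?;
    let _size = store.get_size(uuid)?;
    Ok(())
}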

@@ -0,0 +1,74 @@
use std::error::Error;
use std::fmt;
use meilisearch_error::{Code, ErrorCode};
use crate::{
document_formats::DocumentFormatError,
index::error::IndexError,
index_controller::{update_file_store::UpdateFileStoreError, DocumentAdditionFormat},
};
pub type Result<T> = std::result::Result<T, UpdateLoopError>;
#[derive(Debug, thiserror::Error)]
#[allow(clippy::large_enum_variant)]
pub enum UpdateLoopError {
#[error("Update {0} not found.")]
UnexistingUpdate(u64),
#[error("Internal error: {0}")]
Internal(Box<dyn Error + Send + Sync + 'static>),
#[error(
"update store was shut down due to a fatal error, please check your logs for more info."
)]
FatalUpdateStoreError,
#[error("{0}")]
DocumentFormatError(#[from] DocumentFormatError),
// TODO: The reference to actix has to go.
#[error("{0}")]
PayloadError(#[from] actix_web::error::PayloadError),
#[error("A {0} payload is missing.")]
MissingPayload(DocumentAdditionFormat),
#[error("{0}")]
IndexError(#[from] IndexError),
}
impl<T> From<tokio::sync::mpsc::error::SendError<T>> for UpdateLoopError
where
T: Sync + Send + 'static + fmt::Debug,
{
fn from(other: tokio::sync::mpsc::error::SendError<T>) -> Self {
Self::Internal(Box::new(other))
}
}
impl From<tokio::sync::oneshot::error::RecvError> for UpdateLoopError {
fn from(other: tokio::sync::oneshot::error::RecvError) -> Self {
Self::Internal(Box::new(other))
}
}
internal_error!(
UpdateLoopError: heed::Error,
std::io::Error,
serde_json::Error,
tokio::task::JoinError,
UpdateFileStoreError
);
impl ErrorCode for UpdateLoopError {
fn error_code(&self) -> Code {
match self {
Self::UnexistingUpdate(_) => Code::NotFound,
Self::Internal(_) => Code::Internal,
Self::FatalUpdateStoreError => Code::Internal,
Self::DocumentFormatError(error) => error.error_code(),
Self::PayloadError(error) => match error {
actix_web::error::PayloadError::Overflow => Code::PayloadTooLarge,
_ => Code::Internal,
},
Self::MissingPayload(_) => Code::MissingPayload,
Self::IndexError(e) => e.error_code(),
}
}
}
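`internal_error!` is defined elsewhere in the crate and does not appear in this diff. Assuming it mirrors the hand-written `From` implementations above, i.e. wraps each listed type in the `Internal` variant, its expansion for `heed::Error` would look roughly like this sketch:

impl From<heed::Error> for UpdateLoopError {
    fn from(other: heed::Error) -> Self {
        Self::Internal(Box::new(other))
    }
}

Every error routed this way is reported as `Code::Internal` by `error_code`, alongside `FatalUpdateStoreError`.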

Some files were not shown because too many files have changed in this diff.