Compare commits

...

541 Commits

Author SHA1 Message Date
4398f2c023 Merge #982
982: fix backups r=MarinPostma a=LegendreM

* pluralize variable `backup_folder` -> `backups_folder`
* change env case `MEILI_backup_folder` -> `MEILI_BACKUPS_FOLDER`
* add milliseconds to backup ID to reduce collisions

Co-authored-by: many <maxime@meilisearch.com>
2020-09-30 17:02:34 +00:00
afc3b0915b fix backups
* pluralize variable `backup_folder` -> `backups_folder`
* change env case `MEILI_backup_folder` -> `MEILI_BACKUPS_FOLDER`
* add milliseconds to backup ID to reduce collisions
* fix forgotten stats synchronization
2020-09-30 13:20:40 +02:00
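
As an aside on the ID scheme above, here is a minimal Rust sketch (not the actual MeiliSearch code) of how a timestamp-based backup ID gains millisecond precision — the extra three digits are what keep two backups triggered within the same second from colliding:

```rust
use std::time::{SystemTime, UNIX_EPOCH};

fn backup_id() -> String {
    let now = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("system clock before UNIX epoch");
    // e.g. "1601485354123": whole seconds plus a zero-padded millisecond part
    format!("{}{:03}", now.as_secs(), now.subsec_millis())
}

fn main() {
    println!("backup id: {}", backup_id());
}
```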
f313de98c8 Merge #980
980: bump meilisearch to v0.15.0 r=Kerollmops a=MarinPostma



Co-authored-by: mpostma <postma.marin@protonmail.com>
2020-09-28 15:09:26 +00:00
03d4651077 bump meilisearch 2020-09-28 16:56:05 +02:00
32f6a9a457 Merge #976
976: Revert 944 r=MarinPostma a=MarinPostma

revert #944 
@bidoubiwa @curquiza @eskombro, this was a misunderstanding on our side. Doing this would in fact be an error, and would prevent us from doing this: https://github.com/meilisearch/MeiliSearch/issues/945#issuecomment-685526678, which is what we are really after. We are resetting this to its default behaviour before it goes into production. Sorry for the confusion.

Co-authored-by: mpostma <postma.marin@protonmail.com>
2020-09-28 13:38:46 +00:00
099a0802fc Merge #916
916: Consider an empty query search as a placeholder search r=MarinPostma a=qdequele

Fix #856; related tracking issue: #729

Co-authored-by: qdequele <quentin@meilisearch.com>
2020-09-28 13:13:47 +00:00
e258e0b2c2 Merge #887
887: backup r=Kerollmops a=LegendreM

[Tracking Issue](https://github.com/meilisearch/MeiliSearch/issues/840)
[Documentation PR](https://github.com/meilisearch/documentation/pull/468)
[Other relevant issue](https://github.com/meilisearch/MeiliSearch/issues/884)

Co-authored-by: many <maxime@meilisearch.com>
2020-09-28 12:47:08 +00:00
c254320860 Implement backups
* trigger backup importation via an http route
* follow backup advancement with a status route
* import a backup via the command line
* let the user choose the batch size of documents to import (command line)

closes #884
closes #840
2020-09-28 14:40:06 +02:00
51fd849852 cargo fmt 2020-09-28 14:23:32 +02:00
ab170ce4fd add test 2020-09-28 14:19:45 +02:00
90226dc8a9 Consider an empty query search as a placeholder search #916 2020-09-28 14:19:45 +02:00
63868b2600 Merge #977
977: update pest dependency r=Kerollmops a=MarinPostma

update pest dependency to official repo

Co-authored-by: mpostma <postma.marin@protonmail.com>
2020-09-25 19:35:25 +00:00
22d439f682 update pest dependency 2020-09-24 18:36:38 +02:00
394f2abd49 Merge #971
971: Meili tests r=MarinPostma a=MarinPostma

#869 

Co-authored-by: mpostma <postma.marin@protonmail.com>
2020-09-24 16:06:35 +00:00
030bcd8b05 Revert "facet count more tests"
This reverts commit 954f572e79.
2020-09-24 16:40:18 +02:00
d8d29d3615 Revert "fix facet count bug"
This reverts commit 733c02dd7c.
2020-09-24 16:39:42 +02:00
efe5984d54 Merge #963
963: upgrade actix-web to v3 r=Kerollmops a=robjtede

Test failures are the same before and after upgrade.

Co-authored-by: Rob Ede <robjtede@icloud.com>
2020-09-22 15:30:21 +00:00
63260e6443 add tests for documents 2020-09-22 16:05:40 +02:00
a794970b72 additional tests for index 2020-09-22 10:51:34 +02:00
ba0f44e361 upgrade actix-web to v3 2020-09-21 22:37:54 +01:00
4acaecd921 Merge #749
749: Contributor guidelines r=Kerollmops a=erlend-sh

Preliminary contributor guidelines, heavily based on the [Vector doc](https://github.com/timberio/vector/blob/master/CONTRIBUTING.md).

Co-authored-by: Erlend Sogge Heggen <e.soghe@gmail.com>
2020-09-21 09:51:56 +00:00
84a3e95fa4 Merge branch 'stable' 2020-09-11 12:08:20 +02:00
f045e111ea Merge #960
960: bump version and update changelog r=MarinPostma a=LegendreM

* bump to 0.14.1
* update CHANGELOG.md file

Co-authored-by: many <maxime@meilisearch.com>
2020-09-08 16:11:53 +00:00
87a76c2a60 bump version and update changelog 2020-09-08 18:11:03 +02:00
4edaebab90 Merge #959
959: add version guard in copy_and_compact_to_path function r=MarinPostma a=LegendreM

fix #958

need to create 0.14.1

Co-authored-by: many <maxime@meilisearch.com>
2020-09-08 08:35:49 +00:00
b43137b508 add version guard in copy_and_compact_to_path function 2020-09-07 18:21:04 +02:00
a07c3743f0 Merge #944
944: Fix facet count r=MarinPostma a=MarinPostma

fix bug reported in: https://github.com/meilisearch/MeiliSearch/issues/929#issuecomment-683683728

Co-authored-by: mpostma <postma.marin@protonmail.com>
2020-09-01 08:43:47 +00:00
954f572e79 facet count more tests 2020-09-01 10:27:50 +02:00
733c02dd7c fix facet count bug 2020-09-01 10:12:00 +02:00
c94daf8c3d Merge #933
933: README.md - Fixed Small Typo r=MarinPostma a=LiamRiddell



Co-authored-by: Liam Riddell <3812154+LiamRiddell@users.noreply.github.com>
2020-08-28 13:09:34 +00:00
6db51ed8b2 README.md - Fixed Small Typo 2020-08-28 13:44:53 +01:00
118c673eaf Merge #927
927: Bump meilisearch r=Kerollmops a=MarinPostma

bump meilisearch version 0.14.0

Co-authored-by: mpostma <postma.marin@protonmail.com>
2020-08-24 14:36:21 +00:00
a9a2d3bca3 update changelog 2020-08-24 15:49:24 +02:00
4a9e56aa4f bump meilisearch version 0.14.0 2020-08-24 15:49:09 +02:00
14bb9505eb Merge #926
926: Update genre field with genres r=MarinPostma a=bidoubiwa

Most code samples are made with the assumption that the `genres` field takes an `s`. I'm updating the dataset to match those code-samples.


Co-authored-by: Charlotte Vermandel <charlottevermandel@gmail.com>
2020-08-24 12:48:08 +00:00
d937aeac0a Update genre field with genres 2020-08-24 14:22:33 +02:00
dd540d2540 Merge #924
924: change max db size opt name r=Kerollmops a=MarinPostma

fix #867

Co-authored-by: mpostma <postma.marin@protonmail.com>
2020-08-24 12:18:17 +00:00
4ecaf99047 fix test option test 2020-08-24 14:14:11 +02:00
445a6c9ea2 update options name 2020-08-21 14:42:20 +02:00
67b7d60cb0 Merge #920
920: fix bug and add tests r=MarinPostma a=LegendreM

- add tests about updates
- fix select bug

fix #896

Co-authored-by: many <maxime@meilisearch.com>
2020-08-19 07:56:27 +00:00
94b3e8e56e fix bug and add tests
- add tests about updates
- fix select bug

fix #896
2020-08-19 09:51:57 +02:00
89b5ae63fc Merge #915
915: fix unwrap bug r=Kerollmops a=MarinPostma

fix #912.

Co-authored-by: mpostma <postma.marin@protonmail.com>
2020-08-18 12:50:10 +00:00
2a79dc9ded log error on unwrap error 2020-08-17 16:32:40 +02:00
5ed62dbf76 fix unwrap bug 2020-08-14 12:16:48 +02:00
cb267b68ed Merge #910
910: Fix typo in error message r=MarinPostma a=curquiza

Thanks to @ppamorim for reporting the typos to me!

Co-authored-by: Clementine Urquizar <clementine@meilisearch.com>
2020-08-13 15:43:58 +00:00
6539be6c46 Fix typo in error message 2020-08-13 17:13:19 +02:00
a23bdb31a3 Merge #829
829: implement snapshoting r=MarinPostma a=LegendreM

related to #551.

This pull request permits the user to periodically create a snapshot of the MeiliSearch database via a command-line option, and to launch MeiliSearch from a snapshot with another command

## Documentation

### schedule a snapshot
`--snapshot-path <DIRECTORY_PATH>`:
this will periodically create a snapshot `<DB_NAME>.tar.gz` in the specified directory

### change the period between two snapshot creations
`--snapshot-interval-sec <GAP_IN_SEC>`
choose the time gap between two snapshots

### start MeiliSearch from a snapshot
`--load-from-snapshot <FILE_PATH>`
this will use the snapshot stored at `<FILE_PATH>` to initialize the MeiliSearch database

`--ignore-snapshot-if-db-exists`: if set and a db already exists,
this will skip the snapshot import and continue with the existing db instead of exiting with an error

`--ignore-missing-snapshot`: if set and no snapshot exists at the provided path,
this will skip the snapshot import and continue with the existing db instead of exiting with an error

Co-authored-by: many <maxime@meilisearch.com>
2020-08-12 16:37:31 +00:00
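
To make the flag descriptions above concrete, a minimal sketch of the periodic scheduling they describe; `create_snapshot` is a hypothetical stand-in for the real archive-creation code, and the paths in `main` are illustrative:

```rust
use std::path::{Path, PathBuf};
use std::thread;
use std::time::Duration;

// Hypothetical stand-in for the real archive step, which would write
// `<snapshot_dir>/<DB_NAME>.tar.gz` from the database directory.
fn create_snapshot(db_path: &Path, snapshot_dir: &Path) -> std::io::Result<()> {
    println!("snapshotting {:?} into {:?}", db_path, snapshot_dir);
    Ok(())
}

// Spawn a background loop that snapshots every `interval_sec` seconds.
fn schedule_snapshots(db_path: PathBuf, snapshot_dir: PathBuf, interval_sec: u64) {
    thread::spawn(move || loop {
        if let Err(e) = create_snapshot(&db_path, &snapshot_dir) {
            eprintln!("snapshot failed: {}", e);
        }
        thread::sleep(Duration::from_secs(interval_sec));
    });
}

fn main() {
    schedule_snapshots(PathBuf::from("./data.ms"), PathBuf::from("./snapshots"), 86_400);
    thread::sleep(Duration::from_millis(100)); // let the first snapshot run
}
```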
9014290875 implement snapshot 2020-08-12 17:46:28 +02:00
1903302a74 Merge #906
906: Facet distribution correct case r=LegendreM a=MarinPostma

~

Co-authored-by: mpostma <postma.marin@protonmail.com>
Co-authored-by: marin <postma.marin@protonmail.com>
2020-08-12 09:04:36 +00:00
75c3cb4bb6 fix compile error 2020-08-12 10:31:11 +02:00
bfd0f806f8 requested changes
Co-authored-by: Clément Renault <renault.cle@gmail.com>
2020-08-12 10:31:11 +02:00
afab8a7846 clean facet result types 2020-08-12 10:31:11 +02:00
afacdbc7a0 update tests for facets distribution case 2020-08-12 10:31:11 +02:00
18a50b4dac fix facet distribution case 2020-08-12 10:31:10 +02:00
fb69769991 Merge #889
889: Fix clippy warnings r=MarinPostma a=TaKO8Ki

Good day!

Since `cargo clippy` showed two warnings like the following, I've fixed them. This is a small PR.

```sh
warning: use of `ok_or` followed by a function call
   --> meilisearch-core/src/database.rs:185:18
    |
185 |                 .ok_or(Error::VersionMismatch("bad VERSION file".to_string()))?;
    |                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ help: try this: `ok_or_else(|| Error::VersionMismatch("bad VERSION file".to_string()))`
    |
    = note: `#[warn(clippy::or_fun_call)]` on by default
    = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#or_fun_call

warning: useless use of `format!`
   --> meilisearch-core/src/database.rs:208:59
    |
208 |                         return Err(Error::VersionMismatch(format!("<0.12.0")));
    |                                                           ^^^^^^^^^^^^^^^^^^ help: consider using `.to_string()`: `"<0.12.0".to_string()`
    |
    = note: `#[warn(clippy::useless_format)]` on by default
    = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#useless_format

warning: 2 warnings emitted
```

Co-authored-by: Takayuki Maeda <41065217+TaKO8Ki@users.noreply.github.com>
2020-07-29 11:40:08 +00:00
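
For readers unfamiliar with these two lints, a small self-contained example of the fixes clippy suggests — a sketch, not the actual `database.rs` code:

```rust
// `ok_or` builds its error argument eagerly, even when the Option is Some;
// `ok_or_else` takes a closure that only runs on None. Likewise, a constant
// string wants `.to_string()` rather than a `format!` call.
fn read_version(contents: Option<&str>) -> Result<String, String> {
    let version = contents
        .ok_or_else(|| "bad VERSION file".to_string())? // lazy allocation
        .trim()
        .to_string();
    if version.is_empty() {
        // before the fix: Err(format!("<0.12.0"))
        return Err("<0.12.0".to_string());
    }
    Ok(version)
}

fn main() {
    assert_eq!(read_version(Some("0.12.0\n")).unwrap(), "0.12.0");
    assert!(read_version(None).is_err());
}
```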
750e7382c6 fix clippy warnings 2020-07-29 11:32:34 +09:00
2464cc7a6d Merge #888
888: Remove schema mention in error message r=MarinPostma a=curquiza

We avoid mentioning the schema since MeiliSearch is schemaless for the user 🙂

Co-authored-by: Clementine Urquizar <clementine@meilisearch.com>
2020-07-28 15:20:59 +00:00
f078cbac4d Remove schema mention in error message 2020-07-28 15:18:05 +02:00
aa545e5386 Merge #638 #828 #865
638: Update prerequisites for source build (Rust version) r=MarinPostma a=djKooks

Hello,
I just found that compiling from source fails with this issue:
```
error[E0658]: the `#[non_exhaustive]` attribute is an experimental feature
  --> /Users/kwangin.jung/.cargo/registry/src/github.com-1ecc6299db9ec823/whoami-0.8.1/src/lib.rs:40:1
   |
40 | #[non_exhaustive]
   | ^^^^^^^^^^^^^^^^^
   |
   = note: for more information, see https://github.com/rust-lang/rust/issues/44109

error[E0658]: the `#[non_exhaustive]` attribute is an experimental feature
   --> /Users/kwangin.jung/.cargo/registry/src/github.com-1ecc6299db9ec823/whoami-0.8.1/src/lib.rs:102:1
    |
102 | #[non_exhaustive]
    | ^^^^^^^^^^^^^^^^^
    |
    = note: for more information, see https://github.com/rust-lang/rust/issues/44109
```
It seems `#[non_exhaustive]` is a new feature as of Rust 1.40.0, so I added that version to the prerequisites.


828: Cleanup readme r=MarinPostma a=tpayet

Closes #613 

865: Update movie dataset with genre field r=MarinPostma a=bidoubiwa

Updated the movie dataset by adding the `genre` field to each movie where the genre could be fetched.
The `genre` was fetched for each movie by making a search request on the bigger movie dataset (200 MB) using MeiliSearch.

I make this proposal to make testing and experimenting more accessible.

```json
{
  "id": "323661",
  "title": "Mune: Guardian of the Moon",
  "poster": "https://image.tmdb.org/t/p/w1280/4vzqow7mVUahqA4hHoe2UpQOxy.jpg",
  "overview": "When a faun named Mune becomes the Guardian of the Moon, little did he had unprepared experience with the Moon and an accident that could put both the Moon and the Sun in danger, including a corrupt titan named Necross who wants the Sun for himself and placing the balance of night and day in great peril. Now with the help of a wax-child named Glim and the warrior, Sohone who also became the Sun Guardian, they go out on an exciting journey to get the Sun back and restore the Moon to their rightful place in the sky.",
  "release_date": 1423094400,
  "genre": [
    "Animation",
    "Family",
    "Adventure",
    "Fantasy",
    "Comedy"
  ]
}
{
  "id": "306",
  "title": "Beverly Hills Cop III",
  "poster": "https://image.tmdb.org/t/p/w1280/tw9gAhqQcBFX0X0XfVbWqUsmzoU.jpg",
  "overview": "Back in sunny southern California and on the trail of two murderers, Axel Foley again teams up with LA cop Billy Rosewood. Soon, they discover that an amusement park is being used as a front for a massive counterfeiting ring – and it's run by the same gang that shot Billy's boss.",
  "release_date": 769741200,
  "genre": [
    "Action",
    "Comedy",
    "Crime"
  ]
}
```

Co-authored-by: kwangin.jung <inylove82@gmail.com>
Co-authored-by: Thomas Payet <thomas@meilisearch.com>
Co-authored-by: Charlotte Vermandel <charlottevermandel@gmail.com>
2020-07-24 09:45:01 +00:00
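
For context on the build failure quoted in #638, a short sketch (not whoami's actual code) of what the failing attribute does:

```rust
// `#[non_exhaustive]` tells downstream crates that more variants may be
// added later, so matches outside the defining crate must keep a wildcard
// arm. Compiling it at all requires Rust 1.40.0 or newer.
#[non_exhaustive]
pub enum Platform {
    Linux,
    MacOs,
    Windows,
}

fn name(p: &Platform) -> &'static str {
    match p {
        Platform::Linux => "linux",
        Platform::MacOs => "macos",
        Platform::Windows => "windows",
        // only needed (and reachable) outside the defining crate
        #[allow(unreachable_patterns)]
        _ => "unknown",
    }
}

fn main() {
    println!("{}", name(&Platform::Linux));
}
```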
9711100ff1 Merge #874
874: Fixes default values on web interface r=MarinPostma a=tpayet



Co-authored-by: Thomas Payet <thomas@meilisearch.com>
2020-07-24 09:20:33 +00:00
8c49ee1b3b Fixes default values on web interface 2020-07-22 14:42:34 +02:00
44cb7f68f9 Merge #878
878: Bump meilisearch v0.13.0 r=MarinPostma a=MarinPostma



Co-authored-by: mpostma <postma.marin@protonmail.com>
2020-07-22 09:18:56 +00:00
25dc2ad66f update changelog 2020-07-22 10:56:19 +02:00
624bd56459 bump meilisearch version 2020-07-22 10:56:19 +02:00
7a6615cfa7 Merge #785
785: Adding a tracking issue template r=MarinPostma a=qdequele



Co-authored-by: Quentin de Quelen <quentin@dequelen.me>
2020-07-22 08:49:27 +00:00
bcad3ffd7c Merge #873
873: Update CI for new workflow r=MarinPostma a=MarinPostma

This PR implements the necessary automation for our new release workflow.

## Pre-releases

Whenever something is pushed to a branch `release-v*`, tests are triggered. If all tests pass, the current reference is checked to see if it's a release branch. If it is, a pre-release is created for this branch and assets are automatically generated for it. The pre-release has the tag `vx.x.xrcn`, where `x.x.x` is the version extracted from the branch name and `n` is the number of commits since the branch was forked from master (starting from rc0).

## Releases

Whenever something is pushed to stable and tagged `vx.x.x` where `x.x.x` is the version, tests are run and a release is generated containing the assets, and binaries are published to docker, brew, apt, etc.

Co-authored-by: mpostma <postma.marin@protonmail.com>
2020-07-22 08:24:24 +00:00
98d87fa1ff Merge #868
868: Update error.rs r=MarinPostma a=tpayet



Co-authored-by: Thomas Payet <thomas@meilisearch.com>
2020-07-21 16:54:56 +00:00
7e00bf4bfa update ci to new workflow 2020-07-21 16:52:01 +02:00
476aecf86d Cleanup readme 2020-07-20 16:03:25 +02:00
c39b358518 Update error.rs 2020-07-20 14:42:47 +02:00
bd5d25429b Update movie dataset with genre field 2020-07-20 10:39:29 +02:00
982fb7b786 Merge #858
858: update error url r=LegendreM a=MarinPostma

@bidoubiwa 

Co-authored-by: mpostma <postma.marin@protonmail.com>
2020-07-16 14:55:52 +00:00
7dc628965c Merge #846
846: Change settings behavior r=LegendreM a=MarinPostma

partially implements #824.

Returning the field distribution for all known fields is more complicated than anticipated, see https://github.com/meilisearch/MeiliSearch/issues/824#issuecomment-657656561

If we decide to do it anyway, and find a reasonable solution, I will make another PR.

fix #853 by resetting displayed and searchable attributes to wildcard when attributes are set to `[]` in the all-settings route. @curquiza @bidoubiwa can you confirm that this is the expected behavior?

Co-authored-by: mpostma <postma.marin@protonmail.com>
2020-07-16 14:31:06 +00:00
d114250ebb requested changes 2020-07-16 16:19:15 +02:00
8eec3bcdc2 update error url 2020-07-16 15:14:53 +02:00
0583cd8e5d Merge pull request #810 from MarinPostma/remove-sys-info
remove the sys-info routes
2020-07-15 20:24:18 +02:00
83b6fc48e1 remove the sys-info routes 2020-07-15 19:33:29 +02:00
4b5437a882 fix displayed attrs empty array bug 2020-07-15 19:25:24 +02:00
de4caef468 test reset attributes to wildcard 2020-07-15 18:56:19 +02:00
36b763b84e test setting attributes before adding documents 2020-07-15 18:56:19 +02:00
c06dd35af1 fix tests 2020-07-15 18:56:19 +02:00
51b7cb2722 remove accept new fields / add indexed * 2020-07-15 18:56:19 +02:00
7f5fb50307 add displayed attributes wildcard 2020-07-15 18:56:19 +02:00
4262561596 Merge #819
819: run clippy during tests r=MarinPostma a=MarinPostma



Co-authored-by: marin <postma.marin@protonmail.com>
Co-authored-by: mpostma <postma.marin@protonmail.com>
2020-07-15 08:07:42 +00:00
8471796987 add clippy component 2020-07-13 18:53:19 +02:00
2775aeb6ac Merge #794
794: Check database version mismatch r=MarinPostma a=MarinPostma

Checks if the versions of the database and the engine are compatible.

The database and the engine are compatible if they share the same major and minor version.

The engine will refuse to start if there is a mismatch.

@bidoubiwa do we need to document this?

Co-authored-by: mpostma <postma.marin@protonmail.com>
2020-07-13 15:08:33 +00:00
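
A minimal sketch of such a version guard, assuming a VERSION file of the form `major.minor.patch` and the `regex` crate as a dependency; names are illustrative, not the actual `copy_and_compact_to_path` code:

```rust
use regex::Regex; // assumed dependency

/// The database and the engine are compatible when the `major.minor`
/// parts of their versions match; the patch number may differ.
fn is_compatible(db_version: &str, engine_version: &str) -> bool {
    let re = Regex::new(r"^(\d+)\.(\d+)\.(\d+)$").unwrap();
    let major_minor = |v: &str| {
        re.captures(v.trim())
            .map(|c| (c[1].to_string(), c[2].to_string()))
    };
    match (major_minor(db_version), major_minor(engine_version)) {
        (Some(db), Some(engine)) => db == engine,
        _ => false, // refuse to start on an unreadable VERSION file
    }
}

fn main() {
    assert!(is_compatible("0.12.1", "0.12.4"));
    assert!(!is_compatible("0.11.0", "0.12.0"));
}
```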
a747e79e5d run clippy during tests 2020-07-13 16:15:32 +02:00
5773c5c865 check version file against regex 2020-07-13 16:06:28 +02:00
51d7c84e73 better exit on error
Update meilisearch-core/src/database.rs

Co-authored-by: Clément Renault <renault.cle@gmail.com>

Update meilisearch-core/src/database.rs

Co-authored-by: Clément Renault <renault.cle@gmail.com>
2020-07-13 16:06:28 +02:00
6f0b6933e6 update changelog 2020-07-13 16:05:56 +02:00
f5a936614a error on meili database version mismatch 2020-07-13 16:05:08 +02:00
308630c094 Merge #841
841: Unique docid bugfix r=LegendreM a=MarinPostma

fix #827 

Co-authored-by: mpostma <postma.marin@protonmail.com>
2020-07-13 13:36:32 +00:00
f54397e0cf test unique document id bug 2020-07-13 15:14:07 +02:00
754efe1f42 fix document id uniqueness bug 2020-07-13 15:14:07 +02:00
05c30c879f Merge #791
791: Create tests for error codes r=LegendreM a=MarinPostma

- create tests for error codes
-  fix primary key error that returned internal error instead of the correct error
- bits of documentation for error
- change a bunch of error type, for better accuracy, @curquiza, @eskombro, @bidoubiwa  you may want to take a look at `meilisearch-error/src/lib.rs`
- fix #836 

Co-authored-by: mpostma <postma.marin@protonmail.com>
2020-07-13 13:12:21 +00:00
99e8d4adae fix missing primary key 2020-07-13 14:54:25 +02:00
ac63f1cd7a fix typo in error code 2020-07-13 14:54:25 +02:00
169749396b update error types to be more accurate 2020-07-13 14:54:25 +02:00
a0637c2c6d Merge #842
842: bors setup r=LegendreM a=MarinPostma

set up bors to run the tests and merge automatically.

the tests are now run only on staging and trying branches

you can use `bors r+` to test and merge the branch into master if the tests succeed

or

you can just use `bors try` to run the tests on the trying branch (synced with master)

Co-authored-by: mpostma <postma.marin@protonmail.com>
2020-07-10 13:27:21 +00:00
edbba64711 fix bors.yaml 2020-07-08 21:04:07 +02:00
9ba711dfe5 update readme with bors badge 2020-07-08 14:33:15 +02:00
6bce83dde8 set bors timeout 2020-07-08 13:36:33 +02:00
629a658c75 bors setup 2020-07-08 09:50:07 +02:00
2f6c55ef78 Merge pull request #771 from MarinPostma/placeholder-search
Placeholder search
2020-07-03 18:56:55 +02:00
a6457718f2 update changelog 2020-07-03 17:17:28 +02:00
3bf23a7c59 test placeholder search
move search test macro to common module
2020-07-03 17:17:28 +02:00
bbe3a10107 implement placeholder search 2020-07-03 17:17:28 +02:00
37ee0f36c1 Merge pull request #792 from MarinPostma/error-codes-in-updates
Error codes in updates
2020-07-02 16:17:57 +02:00
e92f544fd1 add test for update errors 2020-07-02 15:18:30 +02:00
d7b49fa671 fix potential infinite loop 2020-07-02 15:18:30 +02:00
41707e3245 fix error on missing document id in document 2020-07-02 15:18:30 +02:00
3c51e9f5ed Enable error code reporting for update errors 2020-07-02 15:18:30 +02:00
7d3e937134 add tests for error codes 2020-07-02 15:18:30 +02:00
6445eea946 update error types to be more accurate 2020-07-02 15:18:28 +02:00
ced6cc0e23 fix bad error report when primary key exists 2020-07-02 15:16:48 +02:00
944a3943e5 Merge pull request #820 from MarinPostma/readme-update
update readme
2020-07-02 15:16:37 +02:00
d419f151a0 update readme 2020-07-02 15:14:05 +02:00
b2124822a3 Merge pull request #825 from Rio/log-analytics-usage
feat(analytics): log if analytics are enabled
2020-07-02 15:02:19 +02:00
f60b912f12 feat(analytics): log if analytics are enabled 2020-07-02 14:33:25 +02:00
e1f956ce18 Merge pull request #821 from aeriksson/patch-1
Fix typo in option.rs
2020-07-02 14:05:00 +02:00
ab16e2eff1 fix merge error 2020-07-02 14:04:15 +02:00
3da607749f Merge branch 'master' into patch-1 2020-07-02 13:57:52 +02:00
a626e5e935 Merge pull request #737 from balajisivaraman/wip_655
Improve test suite performance using Test Dataset
2020-07-02 13:51:38 +02:00
3d73a4895e cleanup movies dataset and related functions 2020-07-02 16:52:39 +05:30
979b01a1c0 update index status test to use the test dataset 2020-07-02 16:52:39 +05:30
38cf489acf update remaining search tests to use the test dataset 2020-07-02 16:52:39 +05:30
60264763f4 update search_settings tests to use the test dataset 2020-07-02 16:52:39 +05:30
d55124e524 update settings_ranking_rules tests to use the test dataset 2020-07-02 16:52:39 +05:30
643933c3b0 update settings tests to use the test dataset 2020-07-02 16:52:39 +05:30
44fd9384bd update stop_words tests to use the test dataset 2020-07-02 16:52:39 +05:30
75d0d2df6c update documents_delete tests to use the test dataset 2020-07-02 16:52:39 +05:30
92d9283d1a Merge pull request #823 from Rio/public-health-endpoint
chore(http): do not require auth on /health endpoint
2020-07-01 17:01:23 +02:00
9b46887f75 chore(http): do not require auth on /health endpoint
This makes it easier to determine the health of the server using http.

closes #822
2020-07-01 16:33:01 +02:00
ad267cbe59 Merge pull request #813 from Rio/remove-hardcoded-sentry-dsn
feat(sentry): make sentry dsn customizable
2020-07-01 16:15:21 +02:00
029772e11f Fix typo in option.rs 2020-07-01 13:45:00 +02:00
2ef888d100 chore(sentry): make sentry dsn customizable
By removing the hardcoded value the sentry client will fall back to pulling
it from the SENTRY_DSN environment variable. The hardcoded value has been
moved to the default value of the commandline options so the default
behavior will be the same.

A `--no-sentry` option and a `MEILI_NO_SENTRY` environment variable have also been
introduced that effectively disable sentry reporting.
2020-07-01 12:55:14 +02:00
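
A std-only sketch of the fallback order this commit message describes; the default DSN here is a placeholder, not the real value:

```rust
use std::env;

// Resolution order: `--no-sentry` (or MEILI_NO_SENTRY) disables reporting
// entirely; otherwise a CLI-provided DSN wins, then the SENTRY_DSN
// environment variable the sentry client falls back to, then the built-in
// default (a hypothetical placeholder below).
fn sentry_dsn(cli_dsn: Option<String>, no_sentry: bool) -> Option<String> {
    if no_sentry || env::var("MEILI_NO_SENTRY").is_ok() {
        return None;
    }
    cli_dsn
        .or_else(|| env::var("SENTRY_DSN").ok())
        .or_else(|| Some("https://public@sentry.example/1".to_string()))
}

fn main() {
    println!("{:?}", sentry_dsn(None, false));
}
```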
4e1e41994c Merge pull request #817 from meilisearch/bump-version
Bump meilisearch to version 0.12.0
2020-06-30 21:24:47 +02:00
0545424781 update changelog 2020-06-30 20:47:00 +02:00
69af8e9e3d bump meilisearch to 0.12.0 2020-06-30 20:42:19 +02:00
9c7abebde4 Merge pull request #816 from MarinPostma/fix-index-length
Fix long documents not being indexed completely bug
2020-06-30 19:19:07 +02:00
e240591128 add test document over 1000 words 2020-06-30 18:49:33 +02:00
0bceaa5669 add test for long document indexing 2020-06-30 17:46:23 +02:00
3423c0b246 fix indexed document length bug 2020-06-30 17:46:23 +02:00
0953d99198 Merge pull request #809 from MarinPostma/bump-script
Bump script
2020-06-30 13:54:07 +02:00
7ad835baf5 add bump script 2020-06-30 13:45:39 +02:00
8309e00ed3 Merge pull request #801 from MarinPostma/make-clippy-happy
Make clippy happy
2020-06-30 12:25:33 +02:00
4f6a6b1359 make clippy happy 2 2020-06-30 11:01:07 +02:00
21253a2bcb make setting enums more balanced 2020-06-30 11:01:07 +02:00
8e9296c66f simplify bucket sort signature 2020-06-30 11:01:07 +02:00
641d12fb2d make clippy happy 1 2020-06-30 11:01:07 +02:00
2019db972d Merge pull request #805 from MarinPostma/error-code-rename
rename error codes
2020-06-30 10:33:16 +02:00
0d2f5d3fe0 rename error codes 2020-06-29 14:37:51 +02:00
21567eeb8f Merge pull request #800 from MarinPostma/distinct-attribute-return-correct-name
Fix distinct attribute returning id instead of name
2020-06-29 10:42:57 +02:00
b1272d05b4 Test get distinct attribute 2020-06-27 10:38:08 +02:00
feb12a581e fix distinct attribute returning id instead of name 2020-06-27 10:30:27 +02:00
4ad4d7cf34 Merge pull request #796 from meilisearch/bump-version
Bump meilisearch version
2020-06-25 15:19:06 +02:00
a38498fe1e update changelog 2020-06-25 14:31:45 +02:00
8ea6ef1e90 bump meilisearch version 2020-06-25 14:28:50 +02:00
4f2b68eef1 Update CONTRIBUTING.md
Change Git links to chris.beams post
2020-06-24 19:49:20 +02:00
f1d55314d5 Merge pull request #793 from MarinPostma/fix-sysinfos
Fix sysinfos
2020-06-23 19:13:04 +02:00
c7701ebd19 partial sysinfo fix 2020-06-23 14:37:29 +02:00
05c3f598ac Merge pull request #778 from MarinPostma/consistent-settings
Make settings more consistent
2020-06-22 15:32:50 +02:00
3d771f2289 test distinct attribute 2020-06-22 12:16:35 +02:00
8035ca7138 fix distinct attribute behavior 2020-06-22 12:16:35 +02:00
60a90e96f3 add test for ranking rules settings 2020-06-22 12:16:35 +02:00
6167a10e5e change ranking rule addition behavior 2020-06-22 12:16:35 +02:00
ce28567dda Merge pull request #789 from MarinPostma/facet-distribution-update
Fix facet cache on document update
2020-06-22 12:14:01 +02:00
179942b07a test facet document fix 2020-06-22 11:40:08 +02:00
fabb1985ca recompute all facets during document addition 2020-06-22 11:40:08 +02:00
33bfcbeba7 Merge pull request #781 from MarinPostma/fix-benchmarks
Fix benchmarks and remove unused dependencies
2020-06-19 17:13:32 +02:00
3143ffe208 remove unused dependencies 2020-06-19 13:59:40 +02:00
c52d6d0741 fix broken benchmarks 2020-06-19 13:59:40 +02:00
ce7a9073e1 Adding a tracking issue template 2020-06-18 11:09:00 +02:00
95d1762f19 Merge pull request #735 from MarinPostma/post-search-route
Post search route
2020-06-15 22:32:12 +02:00
e5079004e1 adds SearchQueryPost 2020-06-15 16:28:08 +02:00
f6795775e2 update changelog 2020-06-15 16:28:08 +02:00
2d31371975 fix style 2020-06-15 16:28:08 +02:00
26d29783ce add tests for post search route 2020-06-15 16:28:08 +02:00
0ebf7b6214 fix CORS config error in actix 2020-06-15 16:28:08 +02:00
6add10b18f add search post route 2020-06-15 16:28:08 +02:00
940105efb3 change cors max age 2020-06-15 16:28:08 +02:00
3e13e728aa add post method 2020-06-15 16:28:08 +02:00
8cd224899c move search logic out of search route 2020-06-15 16:28:08 +02:00
35605c9f57 Merge pull request #777 from curquiza/hotfix-is-latest-script
Hotfix: Fix syntax error in is-latest-release.sh script
2020-06-15 14:57:44 +02:00
c6e68c87cd Fix syntax error in is-latest-release.sh script 2020-06-15 14:27:34 +02:00
7685165089 Merge pull request #775 from meilisearch/bump-version
Bump Meilisearch to v0.11.0
2020-06-15 11:21:38 +02:00
c6bad90c79 Mark unreleased changes as released in the changelog 2020-06-15 10:56:13 +02:00
8aeeea8382 Bump the Meilisearch crates version to 0.11.0 2020-06-15 10:54:16 +02:00
0ee46f773e Merge pull request #766 from MarinPostma/empty-facet-attributes-error
Empty facet attributes error
2020-06-10 14:04:48 +02:00
ff2490ca8b fix tests 2020-06-10 12:30:33 +02:00
2ada9c5d72 add error on search with empty facets 2020-06-10 12:30:33 +02:00
18b56c6af8 Merge pull request #760 from MarinPostma/typo-update-id
fix typo in error message
2020-06-06 11:02:52 +02:00
6fee7e638c fix typo in error message 2020-06-06 09:05:28 +02:00
f0822a86e1 Merge pull request #757 from MarinPostma/auth-status-code
change error status codes for auth
2020-06-05 20:57:08 +02:00
d007bf13f1 change missing headers & auth status code 2020-06-05 15:44:38 +02:00
cff9e1fd94 Merge pull request #759 from MarinPostma/document-delete-error
return error on deleting a nonexistent index
2020-06-05 12:33:06 +02:00
56b01ba440 test error delete nonexistent index 2020-06-05 11:40:18 +02:00
11e00c906f error when deleting nonexistent index 2020-06-05 11:33:59 +02:00
32843e9ade Merge pull request #751 from MarinPostma/handle-path-error
Handle url params errors
2020-06-04 15:22:54 +02:00
cf6c6eb117 test invalid query params 2020-06-04 14:48:37 +02:00
6df56c4ec5 add error handler for query params error 2020-06-04 14:48:37 +02:00
aabfe73b38 Merge pull request #756 from meilisearch/cleanup-dependencies
Cleanup the dependency tree
2020-06-04 14:39:04 +02:00
263583c118 Remove http-service/-mock from the dependencies 2020-06-04 14:04:18 +02:00
3ab8baa1b4 Merge pull request #755 from VerKnowSys/master
new: Updated sysinfo dependency of meilisearch-http/Cargo.toml. This…
2020-06-04 13:37:00 +02:00
73c60d7768 new: Updated sysinfo dependency of meilisearch-http/Cargo.toml. This fixes #740 2020-06-04 13:08:12 +02:00
987a60a6c0 Merge pull request #748 from MarinPostma/missing-primary-key-message
error message for missing primary key
2020-06-04 10:52:05 +02:00
ae6a92f89a error message for missing primary key 2020-06-03 17:38:39 +02:00
0fc624aa81 Merge pull request #750 from meilisearch/issue-templates
Update issue templates
2020-06-03 16:09:02 +02:00
af50a5528f Update issue templates
Feel free to close this PR and just go through the settings yourself:

https://github.com/meilisearch/MeiliSearch/issues/templates/edit

Once the new folder has been set up we also need a config.yml file like [this one](https://github.com/vercel/next.js/blob/canary/.github/ISSUE_TEMPLATE/config.yml) that will create the same type of discussion link that you see [here](https://github.com/vercel/next.js/issues/new/choose).

blank_issues_enabled: false
contact_links:
  - name: Ask a question
    url: https://github.com/meilisearch/MeiliSearch/discussions
    about: Ask questions and discuss with other community members
2020-06-03 13:57:01 +02:00
b2877b3549 Merge pull request #747 from MarinPostma/facets-settings-subroutes
Facets settings subroutes
2020-06-03 13:45:40 +02:00
5f1ca15a7c Update CONTRIBUTING.md 2020-06-03 13:37:46 +02:00
e1002862a9 Create CONTRIBUTING.md 2020-06-03 13:31:21 +02:00
3fe3c8cf02 test attributes_for_faceting subroutes 2020-06-03 11:31:58 +02:00
ed051b65ad default attributes_for_faceting to [] 2020-06-03 11:31:32 +02:00
8f0d9ccd87 add subroutes for attributes_for_faceting 2020-06-03 11:31:32 +02:00
adaf74bc87 Merge pull request #718 from meilisearch/add-more-analytics-reporting
Add more analytics
2020-06-02 17:05:09 +02:00
a2321d1562 update changelog and readme 2020-06-02 15:40:33 +02:00
e51ea55ae3 add more analytics 2020-06-02 15:40:31 +02:00
3af2f8b344 Merge pull request #733 from curquiza/fix-welcome-message
Change http into https in welcoming message links
2020-06-02 14:53:34 +02:00
f6c531a5a8 Change http into https in welcoming message links 2020-06-02 14:20:08 +02:00
2ae05d9fd1 Merge pull request #734 from MarinPostma/index-already-exist-code
Index already exist code
2020-06-01 11:43:29 +02:00
e95cec7ea6 add test for error_code 2020-06-01 11:06:57 +02:00
3bd5a90976 rename error types 2020-05-30 12:10:35 +02:00
68ad570cfc replace existing_index with index_already_exists 2020-05-30 12:10:35 +02:00
db45826232 take existing_index out of create_index error 2020-05-30 12:10:35 +02:00
df7284a4df Merge pull request #732 from meilisearch/api-key-dashboard
Allow users to input an API Key to search into private data
2020-05-29 17:53:36 +02:00
b327442eb6 Update the changelog 2020-05-29 12:22:23 +02:00
1370b19402 Allow users to input an API Key to search into private data 2020-05-29 12:22:23 +02:00
5ee4a1e954 Merge pull request #703 from MarinPostma/error-code
Error code support
2020-05-29 11:26:14 +02:00
8a2e60dc09 requested changes 2020-05-28 19:19:26 +02:00
2a32ad39a0 move filter parse error display to core 2020-05-28 16:32:17 +02:00
2bf82b3198 update error codes 2020-05-28 16:32:14 +02:00
c9f10432b8 update changelog 2020-05-28 16:28:41 +02:00
fb6a9ea280 remove unnecessary errors 2020-05-28 16:28:41 +02:00
05344043b2 style fixes 2020-05-28 16:28:37 +02:00
d9e2e1a177 ErrorCode improvements 2020-05-28 16:23:46 +02:00
51b3139c0b fix status code 2020-05-28 16:23:46 +02:00
4254cfbce5 response error payload 2020-05-28 16:23:46 +02:00
e2546f2646 error codes for schema 2020-05-28 16:23:46 +02:00
9c58ca7ce5 error codes for core 2020-05-28 16:23:46 +02:00
0e20ac28e5 Change ErrorCategory to ErrorType 2020-05-28 16:23:46 +02:00
30fd24aa47 fix details 2020-05-28 16:23:46 +02:00
3bd15a4195 fix tests, restore behavior 2020-05-28 16:23:46 +02:00
c771694623 remove heed from http dependencies 2020-05-28 16:23:46 +02:00
d69180ec67 refactor errors / isolate core/http errors 2020-05-28 16:23:46 +02:00
e2db197b3f change ResponseError to Error 2020-05-28 16:23:46 +02:00
4c2af8e515 add error code abstractions 2020-05-28 16:23:46 +02:00
81b1aed7a1 Merge pull request #726 from MarinPostma/exhaustive-facet-count
Return the exhaustive facets count field
2020-05-28 12:39:00 +02:00
7c7f753463 add facet count in response 2020-05-28 12:08:38 +02:00
f1ac76a283 Merge pull request #725 from MarinPostma/fix-test-warnings
fix test warnings
2020-05-28 11:49:42 +02:00
2b7d614e84 fix test warnings 2020-05-27 19:32:55 +02:00
b859477ffd Merge pull request #716 from MarinPostma/rename-facet
rename facets to facetsDistribution
2020-05-27 18:29:21 +02:00
b6570f7016 rename facets to facetsDistribution 2020-05-27 17:35:33 +02:00
c1a2c7b610 Merge pull request #719 from eskombro/rename_fieldfrequency_to_fielddistribution
Rename fields_frequency into fields_distribution (and fieldsFrequency into fieldsDistribution)
2020-05-27 09:24:07 +02:00
b16088eec1 Update CHANGELOG.md 2020-05-26 20:44:06 +02:00
8438ac9756 Rename fields_frequency into fields_distribution 2020-05-26 20:40:49 +02:00
a3a389cae6 Merge pull request #715 from meilisearch/bump-heed
Bump heed to 0.8.0 and handle abort errors
2020-05-26 17:39:10 +02:00
8cebf78485 Bump heed to 0.8.0 and handle abort errors 2020-05-26 17:04:13 +02:00
166a301c7f Merge pull request #714 from MarinPostma/fix-null-facet-response
fix null facets in response
2020-05-26 17:02:23 +02:00
fac35e34e9 fix null facets in response 2020-05-26 16:30:27 +02:00
0883e345d0 Merge pull request #669 from meilisearch/add-ssl
Add ssl support
2020-05-26 16:24:22 +02:00
7096fdb56b update changelog 2020-05-26 14:16:40 +02:00
a5ab4b3f64 update tests 2020-05-26 14:16:25 +02:00
7e6f068b18 add ssl support
format code

remove expects and unwrap
2020-05-26 14:16:25 +02:00
dc246b97e6 Merge pull request #699 from mattjtodd/add-tini-process-manager
Added tini process manager and entrypoint decl.
2020-05-26 11:20:56 +02:00
1ce7e09a44 Added tini process manager and entrypoint decl. 2020-05-26 08:52:22 +01:00
690023baff Merge pull request #705 from tpayet/add-docker-test-on-pr
Add docker test on pr
2020-05-25 14:04:33 +02:00
ea4c3b613a update sentry features to remove openssl
update changelog

Add docker build test on PR
2020-05-25 12:24:10 +02:00
8f990b2079 Merge pull request #702 from meilisearch/remove-open-ssl
Update sentry features to remove openssl
2020-05-25 12:22:22 +02:00
82fa060bc8 update changelog 2020-05-25 11:30:31 +02:00
a7cda7f950 update sentry features to remove openssl 2020-05-25 11:29:59 +02:00
59ed3e88b3 Merge pull request #695 from meilisearch/fix-dashboard
update normalize_path middleware
2020-05-23 15:19:08 +02:00
6d33376595 update Changelog 2020-05-23 12:20:28 +02:00
92897e7ad0 add test 2020-05-23 12:20:28 +02:00
92ce0f5c2b update normalize_path middleware 2020-05-23 12:20:27 +02:00
c946d144ce Merge pull request #706 from meilisearch/bump-fst-version
Bump the fst crate version to 0.4
2020-05-22 21:49:27 +02:00
bc7b0a38fd Use fst 0.4.4 in the project 2020-05-22 15:01:55 +02:00
6c87723b19 Bump the fst crate to 0.4.4 2020-05-22 15:01:35 +02:00
cd1679dea7 Merge pull request #684 from MarinPostma/max-payload-size
allow max payload size override
2020-05-22 11:35:15 +02:00
c5daa4a256 fix tests 2020-05-22 10:38:14 +02:00
df2eed1be3 update changelog 2020-05-22 10:38:12 +02:00
5193382b07 allow max payload size override 2020-05-22 10:37:41 +02:00
e40d9e7462 Merge pull request #696 from meilisearch/reduce-document-id-size
Reduce document id size from 64bits to 32bits
2020-05-20 18:58:12 +02:00
ddeb5745be Refactor a little bit 2020-05-20 17:01:57 +02:00
a60e3fb1cb Rename user ids into external docids 2020-05-20 15:08:56 +02:00
7bbb101555 Prefix the attributes_for_faceting key name 2020-05-20 14:19:00 +02:00
788e2202c9 Reduce the DocumentId size from 64 to 32bits 2020-05-20 14:19:00 +02:00
3bca31856d Discover and remove documents ids 2020-05-20 14:18:59 +02:00
5bf15a4190 Compute and merge discovered ids 2020-05-20 14:18:59 +02:00
016bfa391b Introduce internal and user ids put and get methods 2020-05-20 14:18:59 +02:00
e6a7521610 Introduce the DiscoverIds and DocumentsIds types 2020-05-20 14:18:59 +02:00
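An illustrative sketch (not the actual engine code) of the external/internal docid split these commits introduce: user-facing ids stay strings while the engine allocates dense 32-bit ids:

```rust
use std::collections::HashMap;

/// User-facing ids stay strings ("external docids"); the engine works with
/// dense `u32` ids, half the size of the former 64-bit document ids.
#[derive(Default)]
struct DocIds {
    external_to_internal: HashMap<String, u32>,
    next: u32,
}

impl DocIds {
    /// Returns the internal id for an external docid, allocating the next
    /// free `u32` the first time the external id is seen.
    fn resolve(&mut self, external: &str) -> u32 {
        if let Some(&id) = self.external_to_internal.get(external) {
            return id;
        }
        let id = self.next;
        self.next += 1;
        self.external_to_internal.insert(external.to_string(), id);
        id
    }
}

fn main() {
    let mut ids = DocIds::default();
    assert_eq!(ids.resolve("movie-323661"), 0);
    assert_eq!(ids.resolve("movie-306"), 1);
    assert_eq!(ids.resolve("movie-323661"), 0); // mapping is stable
}
```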
3e84f916b6 Merge pull request #697 from ndudnicz/typo/route-health-healtbody
typo in route/health.rs: HealtBody -> HealthBody
2020-05-20 14:18:38 +02:00
2d2c933611 typo in route/health.rs: HealtBody -> HealthBody 2020-05-20 11:57:44 +02:00
d30874c912 Merge pull request #691 from meilisearch/rewrite-indexer
Rewrite and simplify every indexer function
2020-05-19 17:13:53 +02:00
e2b115f3a9 Improve Number extraction/conversion function 2020-05-19 16:51:33 +02:00
ae30ee2ade Clean up some comments and variable names 2020-05-19 16:51:33 +02:00
3026840530 Introduce an index_document helper function 2020-05-19 16:51:33 +02:00
d300d788c7 Make the compute_document_id validate the id 2020-05-19 16:51:33 +02:00
2828b5fa19 Move the helper function to their own module 2020-05-19 16:51:33 +02:00
25b3c9a057 Remove the serde ExtractDocumentId struct 2020-05-19 16:51:33 +02:00
2558ce9a00 Export the value_to_string helper function 2020-05-19 16:51:33 +02:00
65ed2dcc1b Remove the serde ConvertToNumber 2020-05-19 16:51:32 +02:00
5e063da14f Remove the serde Indexer 2020-05-19 16:51:32 +02:00
615825b9fd Remove the serde Serializer 2020-05-19 16:51:32 +02:00
3502d8b48c Merge pull request #680 from MarinPostma/better-welcome
improve welcome message
2020-05-19 15:59:36 +02:00
a1d20ea8c8 remove keys in welcome message 2020-05-19 15:32:49 +02:00
ef7b1cc829 update changelog 2020-05-19 15:32:49 +02:00
2c9776c3e8 improve welcome message 2020-05-19 15:32:49 +02:00
3743d8ca5b Merge pull request #690 from MarinPostma/bump-sentry
bump sentry
2020-05-19 14:30:27 +02:00
e222e20517 update changelog 2020-05-19 10:29:38 +02:00
10d7dc75f3 update sentry 2020-05-19 10:27:55 +02:00
f6300497f7 Merge pull request #694 from curquiza/arm
Take architecture into account in download-latest
2020-05-18 22:15:56 +02:00
1cae6c18b2 Take architecture into account in download-latest 2020-05-18 18:15:50 +02:00
1fef613024 Merge pull request #685 from curquiza/hotfix-download-script
HOTFIX: the link in download-latest.sh
2020-05-15 22:37:49 +02:00
047407342b Fix the link in download-latest.sh 2020-05-15 17:49:33 +02:00
e2b71b0e57 Merge pull request #679 from MarinPostma/highlight-align-fix
Highlight align fix
2020-05-14 14:57:54 +02:00
9c1de3adfc add tests 2020-05-14 12:57:38 +02:00
54707e4e24 update changelog 2020-05-14 12:57:36 +02:00
a94ee167fc fix unaligned highlight 2020-05-14 12:56:15 +02:00
ce789682cc remove unnecessary clone 2020-05-14 12:56:15 +02:00
c95d4e48a5 Merge pull request #681 from MarinPostma/sentry-release-only
enables debug without sentry
2020-05-14 11:33:22 +02:00
1f35db2ddc update changelog 2020-05-14 10:56:57 +02:00
be1320d21d enables debug without sentry 2020-05-14 10:54:15 +02:00
308c652b30 Merge pull request #678 from erlend-sh/do-button
DigitalOcean button
2020-05-13 16:08:40 +02:00
80ab82897e DigitalOcean button 2020-05-13 15:41:31 +02:00
71578a5462 Merge pull request #676 from MarinPostma/facet-count
Facet count
2020-05-13 12:14:39 +02:00
eca39ad7bf update changelog 2020-05-13 11:48:34 +02:00
28a3e4005a adds test 2020-05-13 11:48:34 +02:00
f38d0d731f style fix 2020-05-13 11:48:34 +02:00
5051a796a0 error handling 2020-05-13 11:48:34 +02:00
869b6019c6 fix tests 2020-05-13 11:48:34 +02:00
347045adf2 smarter field_id name passing 2020-05-13 11:29:46 +02:00
e5126af458 enables facet count 2020-05-13 11:29:46 +02:00
effbb7f7f1 add sort result struct 2020-05-12 18:22:24 +02:00
a88f6c3241 Merge pull request #661 from meilisearch/add-actix-middleware
Add actix middleware
2020-05-12 16:04:29 +02:00
b96da94f92 fix issues from review
Co-authored-by: Clément Renault <clement@meilisearch.com>
2020-05-12 15:42:17 +02:00
305665cd42 Update CHANGELOG.md
Co-authored-by: Clément Renault <clement@meilisearch.com>
2020-05-12 15:34:08 +02:00
f2b7aea16c add tests 2020-05-12 15:34:08 +02:00
71e3b5bc11 update changelog 2020-05-12 15:34:08 +02:00
cd12e2717c add errors on content-type and add more serde debug 2020-05-12 15:34:08 +02:00
7a8e64be30 add normalize_slashes middleware 2020-05-12 15:34:07 +02:00
36abcb3976 Merge pull request #660 from curquiza/fix-release-process
Update release process for stable releases
2020-05-12 11:50:04 +02:00
5dc7d498bd Update release process for stable releases 2020-05-12 11:10:55 +02:00
e9c5928fd3 Merge pull request #674 from meilisearch/fix-windows-ci
Fix the Windows CI
2020-05-11 22:45:59 +02:00
48e94b4372 Enable jemalloc only on linux 2020-05-11 21:24:35 +02:00
e3e32e7f2b Fix the Windows CI by using .exe 2020-05-11 18:19:12 +02:00
b215e9e848 Merge pull request #631 from MarinPostma/facet-filters
Facet filters
2020-05-11 18:16:34 +02:00
44ae21671c update changelog 2020-05-11 17:42:33 +02:00
0ce2666d2f tests 2020-05-11 17:38:52 +02:00
d7f099d3ba enables faceted search 2020-05-11 17:38:52 +02:00
e07fe017c1 document update 2020-05-11 17:38:52 +02:00
270c7b0288 facet settings 2020-05-11 16:12:13 +02:00
59c67f6bc8 setting up facets 2020-05-11 16:12:13 +02:00
dd08cfc6a3 Merge pull request #664 from meilisearch/add-sentry-probe
add sentry probe
2020-05-07 18:16:42 +02:00
b89e76ccb4 add sentry as default feature 2020-05-07 17:36:33 +02:00
57e515d5e2 update changelog 2020-05-07 17:36:33 +02:00
b62945961f add sentry probe 2020-05-07 17:36:33 +02:00
61ce9486fc Merge pull request #662 from meilisearch/database-option-default
implement default on DatabaseOptions
2020-05-07 17:09:13 +02:00
2e55457ecc implement default on DatabaseOptions 2020-05-07 15:40:44 +02:00
fe21a43364 Merge pull request #654 from tpayet/fix-docker-expose-port
Add EXPOSE port to Dockerfile
2020-05-04 17:15:07 +02:00
dee12c9c4d Add EXPOSE port to Dockerfile 2020-05-04 12:11:16 +02:00
bd1929695c Merge pull request #651 from meilisearch/add-code-of-conduct-1
Create CODE_OF_CONDUCT.md
2020-05-01 11:47:26 +02:00
7ba92da5e5 Create CODE_OF_CONDUCT.md 2020-04-30 20:16:02 +02:00
4ae2097cdc Merge branch 'update/readme-rust-ver' of https://github.com/djKooks/MeiliSearch into update/readme-rust-ver 2020-04-30 21:09:38 +09:00
1f2ab71bb6 Update prerequisites for source build
Update prerequisites for source build (Rust version)

Fix README
2020-04-30 21:08:55 +09:00
f3b1261e2f Merge pull request #649 from hkrutzer/patch-1
Update the link to FAQ in README
2020-04-30 13:58:43 +02:00
b47f7dd4c7 Update the link to FAQ in README 2020-04-30 13:12:55 +02:00
674476155a Merge pull request #647 from MarinPostma/master
fix database options
2020-04-29 23:00:34 +02:00
2e3a765dac fix database options 2020-04-29 22:29:09 +02:00
382e300326 Merge pull request #646 from Wazner/configurable-map-size
Add support for configuring lmdb map size
2020-04-29 14:32:03 +02:00
dff36eaef4 Fix example not compiling 2020-04-29 11:04:09 +02:00
bdd088830a Add DatabaseOptions arg to query_builder test 2020-04-29 10:12:25 +02:00
17401cfbe9 Fix compilation error in unit tests 2020-04-29 09:21:07 +02:00
c4287cdfac Add support for configuring lmdb map size 2020-04-29 09:21:07 +02:00
9c0956049a Update prerequisites for source build
Update prerequisites for source build (Rust version)

Fix README
2020-04-29 08:48:17 +09:00
899559a060 Merge pull request #601 from meilisearch/tide-to-actix-web
Change tide to actix-web
2020-04-28 18:43:06 +02:00
99866ba484 fix test after rebase 2020-04-28 17:54:50 +02:00
36c7fd0cf1 fix requested changes 2020-04-28 17:47:04 +02:00
ea308eb798 remove timeout search query parameter
fix requested changes
2020-04-28 17:46:03 +02:00
bc8ff49de3 update authorization middleware with actix-web-macros 2020-04-28 17:46:03 +02:00
e74d2c1872 simplify error handling by impl errors traits on ResponseError 2020-04-28 17:46:03 +02:00
4bd7e46ba6 revert get document method 2020-04-28 17:46:03 +02:00
ff3149f6fa remove search multi index 2020-04-28 17:46:03 +02:00
27b3b53bc5 update tests & fix the broken code 2020-04-28 17:46:03 +02:00
5e2861ff55 prepare architecture for tests 2020-04-28 17:45:22 +02:00
38d41252e6 add authentication middleware 2020-04-28 17:45:22 +02:00
5fed155f15 add middleware 2020-04-28 17:45:22 +02:00
6a1f73a304 clippy + fmt 2020-04-28 17:45:22 +02:00
22fbff98d4 add stop-word and synonym endpoints 2020-04-28 17:45:22 +02:00
85833e3a0a add setting endpoint 2020-04-28 17:45:22 +02:00
b08f6737ac change param tuples by struct
add settings endpoint; wip
2020-04-28 17:45:22 +02:00
5ec130e6dc cleanup 2020-04-28 17:45:22 +02:00
6c581fb3bd add index endpoint & key endpoint & stats endpoint 2020-04-28 17:45:21 +02:00
73b5c87cbb add search endpoint; warn unwrap 2020-04-28 17:45:21 +02:00
0aa16dd3b1 add key endpoint 2020-04-28 17:45:21 +02:00
540308dc63 add interface endpoint & health endpoint 2020-04-28 17:45:21 +02:00
6d6c8e8fb2 Start change http server; finish document endpoint 2020-04-28 17:45:20 +02:00
6cc80d2565 Merge pull request #641 from meilisearch/bump-version
Bump version to v0.10.1
2020-04-28 16:12:01 +02:00
5265fafd7a Update the changelog for the release 2020-04-28 15:55:29 +02:00
287226b609 Bump crates versions to v0.10.1 2020-04-28 15:55:29 +02:00
7119b21b46 Merge pull request #640 from MarinPostma/fix_filter_parenthesis
fixes parenthesis
2020-04-28 11:10:45 +02:00
d1f1bfe071 fix floats bug
Update CHANGELOG.md

Co-Authored-By: Clément Renault <renault.cle@gmail.com>
2020-04-28 10:44:07 +02:00
812465e014 fixes parenthesis
adds tests
2020-04-27 22:29:29 +02:00
86bab04997 Merge pull request #635 from lironhl/bug_fix/highlight_longest_area
Bug fix/highlight longest area
2020-04-27 19:34:34 +02:00
867bd1ffd7 Tests for the new highlight algorithm 2020-04-27 20:10:40 +03:00
16e075983d Highlights result with longest match 2020-04-27 20:09:12 +03:00
1b7a6687c8 Update README.md (#630)
* Update README.md

* Update README.md

Co-Authored-By: Clément Renault <renault.cle@gmail.com>

Co-authored-by: Clément Renault <renault.cle@gmail.com>
2020-04-24 10:11:27 +02:00
8c41fb2b49 Merge pull request #623 from lironhl/bug_fix/chrome-content-overflow
Fixes the content overflow in the web interface in chrome.
2020-04-22 13:47:33 +02:00
c1797c4e75 add overflow-wrap css property to content class 2020-04-22 11:33:18 +03:00
1c094346e2 Merge pull request #616 from MarinPostma/array-filter
filters on arrays
2020-04-21 10:58:21 +02:00
cd3c0d750c Add support for filtering on arrays of strings
update changelog

Update CHANGELOG.md

Co-Authored-By: Clément Renault <renault.cle@gmail.com>

fix requested changes
2020-04-21 10:33:57 +02:00
3d2f04a7af Added GitHub discussions 2020-04-20 10:54:08 +02:00
10d047a636 Merge pull request #607 from tpayet/add-separators-tokenizer
Add '@' char as a tokenizer separator
2020-04-16 12:18:11 +02:00
10211737c5 Add '@' char as a tokenizer separator
Update CHANGELOG.md

Co-Authored-By: Clément Renault <renault.cle@gmail.com>
2020-04-16 11:04:03 +02:00
45e55bc054 Merge pull request #608 from matboivin/minor-changes
Minor changes
2020-04-15 20:32:25 +02:00
1892ba8973 Minor changes 2020-04-15 16:04:50 +02:00
b7c287ffb7 Merge pull request #604 from meilisearch/personal-token-binaries
Use a personal access token to publish release binaries
2020-04-10 22:51:30 +02:00
457b645f3c Use a personal access token to publish bins
The default GITHUB_TOKEN expires after 1h
2020-04-10 18:28:28 +02:00
0185ffad89 Merge pull request #603 from meilisearch/bump-version
Bump version to v0.10
2020-04-10 15:56:56 +02:00
08edc9d5d0 Update the changelog to refer to the v0.10 2020-04-10 15:43:20 +02:00
979bea0327 Bump MeiliSearch version to v0.10 2020-04-10 15:43:03 +02:00
c7ea9f4cf3 Merge pull request #580 from meilisearch/rework-highlight-crop
Rework query highlight/crop parameters
2020-04-10 13:27:35 +02:00
233651bef8 update changelog 2020-04-10 12:26:53 +02:00
c6fb591348 add * on attributesToRetrieve 2020-04-10 12:26:34 +02:00
644e78df89 Add some tests 2020-04-10 12:26:34 +02:00
500eeca3fb Rework query highlight/crop parameters 2020-04-10 11:12:58 +02:00
c418abe92d Merge pull request #602 from meilisearch/fix-tide-cors
fix tide cors
2020-04-10 10:29:55 +02:00
2fdf33a006 update changelog 2020-04-10 10:13:43 +02:00
c3cf0cade9 fix tide cors 2020-04-10 10:13:43 +02:00
210bc68ced Merge pull request #592 from MarinPostma/query-filters
Implements query filters
2020-04-09 18:43:11 +02:00
193bded4b7 fixes broken tests 2020-04-09 18:26:48 +02:00
8f4d090f34 update changelog 2020-04-09 17:20:37 +02:00
a0a481697b replace lazy_static with once_cell 2020-04-09 17:13:34 +02:00
c3d5778aae allows to get names from schema 2020-04-09 17:13:34 +02:00
3e031d8297 adds error handling and integration 2020-04-09 17:13:34 +02:00
83f50914ec tests 2020-04-09 17:13:34 +02:00
d3916f28aa implements filter logic 2020-04-09 17:13:34 +02:00
dcf1096ac3 implements parser 2020-04-09 17:13:31 +02:00
66568a913c logic skeleton for filter and parser 2020-04-09 16:08:05 +02:00
6db6b40659 Merge pull request #594 from meilisearch/fix-stop-words
Fixes the stop words and words fst generation
2020-04-07 11:06:39 +02:00
780ac5cfd3 Update the CHANGELOG.md 2020-04-06 19:47:57 +02:00
d24209f5a7 Adds a test to check that stop words are correctly handled 2020-04-06 19:47:57 +02:00
29d021ad4d Fixes the stop words and words fst generation 2020-04-06 18:53:02 +02:00
eb28276923 Merge pull request #589 from meilisearch/change-logo
change logo format
2020-04-05 12:18:36 +02:00
0679ec4f41 change logo format 2020-04-05 11:09:38 +02:00
1b5b71869f Merge pull request #588 from techieshark/patch-1
Fix typo in README
2020-04-05 10:35:30 +02:00
6681681a76 Merge branch 'master' into patch-1 2020-04-05 10:34:10 +02:00
83d8dc0d2b Merge pull request #587 from sgummaluri/fix_first_all_updates_call_after_indexing
Fix for 'Update Status after the first update comes up to be empty (#542)'
2020-04-05 10:32:27 +02:00
49499ca54d Fix typo in README
Non-plural would be more usual in English. I assume "performances" was a typo.
2020-04-05 17:34:12 +10:00
16a63c74ea Modifying the test name for better readability 2020-04-05 00:26:09 +05:30
b4df54197b Slight grammar modification to the changelog message 2020-04-05 00:17:47 +05:30
a28b428074 Update changelog to make the message more readable 2020-04-05 00:14:58 +05:30
e5a336a042 Fix for 'First update does not appear before being processed' #542 2020-04-04 23:18:43 +05:30
5e5702833c Merge pull request #583 from meilisearch/gha-ignore-changelog
Ignores the CHANGELOG when a specific label is set
2020-04-03 15:47:20 +02:00
03063cf349 Ignores the CHANGELOG when the label asks for it 2020-04-03 15:06:25 +02:00
241b842ef7 Merge pull request #581 from meilisearch/publish-armv8-binary
Publish an aarch64 binary on releases
2020-04-03 11:56:35 +02:00
184c290773 Update the CHANGELOG 2020-04-03 10:42:19 +02:00
5c638184e9 Publish an aarch64 (aka ARMv8) binary on releases 2020-04-03 10:39:28 +02:00
3a88910a24 Merge pull request #579 from meilisearch/update-deps
Update dependencies
2020-04-02 20:24:23 +02:00
eddd453564 Makes http-service a dev-dependency 2020-04-02 18:36:35 +02:00
38c43759bb Update most of the dependencies 2020-04-02 18:36:04 +02:00
26225a2fdf Merge pull request #576 from ppamorim/fix-bench
Fix benchmark
2020-04-02 12:23:31 +02:00
9950fffb6f Simplify imports of std::fs and std::io, remove unneeded space, remove UpdateState 2020-04-02 11:02:19 +01:00
f5d57c9dce Replace the toml reader with the JSON settings reader, directly parse the data to SettingsUpdate, Update CHANGELOG 2020-04-02 11:01:56 +01:00
bc9c80a5ee Merge pull request #577 from meilisearch/change-slogan
Change the slogan
2020-04-01 16:35:59 +02:00
702f7445ec Change the slogan 2020-04-01 16:34:24 +02:00
dcb93e3166 Merge pull request #575 from ppamorim/nested-seq
Support nested-seq
2020-04-01 14:16:47 +02:00
02b79e0040 Modified JSON to add move conditions 2020-04-01 12:59:40 +01:00
88b71fb6c4 Update CHANGELOG to add seq support 2020-04-01 12:59:40 +01:00
95bb443430 Add empty seq 2020-04-01 12:59:40 +01:00
1b47a10e89 Add support for seq values 2020-04-01 12:59:40 +01:00
006e54109b Merge pull request #570 from tpayet/clean-readme-heroku
Removing Heroku deployment from README
2020-04-01 11:35:29 +02:00
7eb6333933 Removing Heroku deployment from README 2020-04-01 11:04:16 +02:00
065da3d613 Merge pull request #572 from ppamorim/ignore-null-nested-obj
Add support of nested null
2020-03-31 16:33:16 +02:00
e698fa0b63 Add issue index in the CHANGELOG 2020-03-31 15:06:04 +01:00
8b662be42b Update CHANGELOG.md
Co-Authored-By: Clément Renault <renault.cle@gmail.com>
2020-03-31 15:03:35 +01:00
52a4f7cd23 Update readme 2020-03-31 14:41:22 +01:00
690b8e0dd0 Replace .toString to String::new() 2020-03-31 14:01:44 +01:00
bc6d86c8ce serialize_unit returns an empty string 2020-03-31 13:51:12 +01:00
fbf7117d6a Rename function, add trailing line, replace JSON string with macro 2020-03-31 13:13:09 +01:00
51472142c6 Add test to check if nested null will be ignored 2020-03-31 12:00:13 +01:00
91d1bd5903 Merge pull request #569 from meilisearch/ignore-bool-nested-obj
Make the engine index booleans
2020-03-31 11:01:26 +02:00
69aee870da Make the engine index booleans
The engine will see the values as the text "true" and "false"
2020-03-31 10:39:58 +02:00
3b25bd71ab Merge pull request #567 from meilisearch/fix-not-dedup-matches
Construct a Set using the from_dirty method
2020-03-31 10:15:03 +02:00
c18e907f96 Construct a Set using the from_dirty method
This commit fixes #566 by ensuring that the slice of matches is
ordered and deduplicated.
2020-03-30 20:56:30 +02:00
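
A tiny example of the idea, assuming the `sdset` crate (the sorted-set library MeiliSearch uses, which provides `from_dirty`):

```rust
use sdset::SetBuf; // assumed dependency

fn main() {
    // `from_dirty` sorts and deduplicates, restoring the Set invariant
    // that an unchecked construction would have silently violated.
    let matches = vec![3u32, 1, 2, 3, 2];
    let set: SetBuf<u32> = SetBuf::from_dirty(matches);
    assert_eq!(set.as_slice(), &[1, 2, 3][..]);
}
```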
e3808b8694 Merge pull request #558 from matboivin/update-readme
Update readme
2020-03-28 10:46:00 +01:00
116b301359 Add Slack 2020-03-28 10:28:48 +01:00
3ed510b78e Minor fix 2020-03-28 10:28:30 +01:00
565c46fdd4 Merge pull request #548 from tendant/master
Stringify nested JSON object
2020-03-27 19:57:34 +01:00
b0255076de Merge branch 'master' into master 2020-03-27 19:43:02 +01:00
67348f2251 Merge pull request #555 from meilisearch/add-changelog
Add a CHANGELOG.md file
2020-03-27 19:33:39 +01:00
227bc716d8 Add a Github Action to ensure the CHANGELOG is updated in PRs 2020-03-27 19:12:50 +01:00
c3467313e5 Add a CHANGELOG to help the documentation follow the engine updates 2020-03-27 19:01:46 +01:00
c82eed010a Merge pull request #543 from MarinPostma/aligned-search-crops
adds support for aligned crop in search result
2020-03-27 18:58:45 +01:00
158c2b5382 tests aligned crop 2020-03-27 18:38:41 +01:00
2d1d59acb7 adds support for aligned cropping with cjk 2020-03-27 18:38:41 +01:00
0088de9802 adds support for aligned crop in search result 2020-03-27 18:38:41 +01:00
f49d2bca64 Merge branch 'master' into master 2020-03-27 17:07:06 +01:00
b7273c450f Merge pull request #545 from matboivin/update-readme
Update readme
2020-03-27 11:49:11 +01:00
4130fddcc8 Center-align crates demo gif 2020-03-27 11:28:57 +01:00
4f05045acb Center-align web interface gif 2020-03-27 11:20:30 +01:00
bc16c9beb7 Update gif links 2020-03-27 11:17:31 +01:00
0af9f6cf6e Add movies gif and move crates demo gif 2020-03-27 11:17:17 +01:00
022aeac808 Stringify nested JSON object 2020-03-26 18:45:57 -07:00
20461ccf36 Add gif
Co-Authored-By: cvermand <33010418+bidoubiwa@users.noreply.github.com>
2020-03-26 21:56:27 +01:00
7297396162 Update performance 2020-03-26 19:22:59 +01:00
c15deb41b0 Remove How it works (deep dive) section 2020-03-26 16:26:43 +01:00
cb2a08db7e Center-align badges 2020-03-26 16:24:03 +01:00
67703b5ea2 Remove Notes about system allocator 2020-03-26 16:17:47 +01:00
c445abb982 Replace a by an
Co-Authored-By: Clément Renault <renault.cle@gmail.com>
2020-03-26 16:14:52 +01:00
38d97fa339 Change phrasing 2020-03-26 13:48:08 +01:00
d45f0819be Remove repetitive word 2020-03-26 13:25:57 +01:00
9375d0efbe Fix details 2020-03-26 13:23:20 +01:00
2291c33074 Align with quick start guide 2020-03-26 13:18:11 +01:00
0a216066f4 Split commands 2020-03-26 13:13:02 +01:00
eea2a9cfc3 Add contact 2020-03-26 13:10:44 +01:00
33c2b9c5ff Add social 2020-03-26 13:04:23 +01:00
1129812e6e Update link formatting 2020-03-26 12:42:41 +01:00
b1b0c6b4b3 Add useful links 2020-03-26 12:31:58 +01:00
6ae3f2f8b9 Remove line under logo 2020-03-26 12:24:02 +01:00
f8d594e7ea Update formatting and add logo 2020-03-26 12:23:09 +01:00
38c3aa542f Add logo image 2020-03-26 12:05:53 +01:00
f3382125e1 Merge branch 'master' of git://github.com/meilisearch/MeiliSearch into update-readme 2020-03-26 12:01:40 +01:00
592a438ae8 Rephrase the readme 2020-03-26 11:59:40 +01:00
d84a86897c Merge pull request #540 from meilisearch/publish-arm-binaries
Publish an ARMv7 binary for the releases
2020-03-26 11:14:48 +01:00
88c063e887 Publish an ARMv7 binary for the releases 2020-03-26 10:51:47 +01:00
ba8a410d4c Merge pull request #539 from emresaglam/html-sanitize
html sanitize
2020-03-25 21:33:03 +01:00
451061f4b8 Merge branch 'master' into html-sanitize 2020-03-25 13:06:18 -07:00
ae17aa4955 Update meilisearch-http/public/interface.html
bypassing <em> tag after encoding the "<>"

Co-Authored-By: Clément Renault <renault.cle@gmail.com>
2020-03-25 12:48:59 -07:00
f589d07706 Merge pull request #544 from meilisearch/add-slack-link
Add a slack badge on readme
2020-03-25 20:29:00 +01:00
3f343ebfdb Update README.md 2020-03-25 20:22:04 +01:00
95ea3e39d2 Merge pull request #541 from MarinPostma/search-result-count
Adds number of hits in search result
2020-03-25 15:34:06 +01:00
a6dcd7a421 fixes tests
fixes tests impacted by signature change of query
2020-03-25 15:17:20 +01:00
fa9b7dd29f removes useless deserializer for SearchResult 2020-03-25 13:59:15 +01:00
fd65cf9dcb populates exhaustive number of hits 2020-03-25 12:44:38 +01:00
6e9d7f94d4 adds exhaustive number hits to search result 2020-03-25 12:11:37 +01:00
6151bc262f Added the missing function call 2020-03-24 11:03:16 -07:00
b62f9fabf2 Update meilisearch-http/public/interface.html
Co-Authored-By: Clément Renault <renault.cle@gmail.com>
2020-03-24 10:39:53 -07:00
86e1ba871f html sanitize
Added a function to sanitize the html
This is for browser side only.
2020-03-24 08:37:56 -07:00
a6ac902bf4 Merge pull request #534 from curquiza/homebrew-automatization
Automate homebrew publish
2020-03-20 16:14:41 +01:00
4cdb67c249 Automate homebrew publish 2020-03-20 12:14:08 +01:00
132 changed files with 35750 additions and 127996 deletions

.github/ISSUE_TEMPLATE/bug_report.md vendored Normal file (38 lines)

@@ -0,0 +1,38 @@
---
name: Bug report
about: Create a report to help us improve
title: ''
labels: ''
assignees: ''
---
**Describe the bug**
A clear and concise description of what the bug is.
**To Reproduce**
Steps to reproduce the behavior:
1. Go to '...'
2. Click on '....'
3. Scroll down to '....'
4. See error
**Expected behavior**
A clear and concise description of what you expected to happen.
**Screenshots**
If applicable, add screenshots to help explain your problem.
**Desktop (please complete the following information):**
- OS: [e.g. iOS]
- Browser [e.g. chrome, safari]
- Version [e.g. 22]
**Smartphone (please complete the following information):**
- Device: [e.g. iPhone6]
- OS: [e.g. iOS8.1]
- Browser [e.g. stock browser, safari]
- Version [e.g. 22]
**Additional context**
Add any other context about the problem here.

.github/ISSUE_TEMPLATE — feature request template (20 lines)

@@ -0,0 +1,20 @@
---
name: Feature request
about: Suggest an idea for this project
title: ''
labels: ''
assignees: ''
---
**Is your feature request related to a problem? Please describe.**
A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]
**Describe the solution you'd like**
A clear and concise description of what you want to happen.
**Describe alternatives you've considered**
A clear and concise description of any alternative solutions or features you've considered.
**Additional context**
Add any other context or screenshots about the feature request here.

.github/ISSUE_TEMPLATE — tracking issue template (40 lines)

@@ -0,0 +1,40 @@
---
name: Tracking issue
about: Template for a tracking issue
title: ''
labels: tracking-issue
assignees: ''
---
# Summary
One paragraph to explain the feature.
# Motivations
Why are we doing this? What use cases does it support? What is the expected outcome?
# Explanation
Explain the proposal like it was the final documentation of this proposal.
- What is changing for end-users.
- How it works.
- What is breaking?
- Examples.
# Implementation
Explain the technical specificities that will need to be known or done in order to implement this proposal.
## Steps
Describe each step to create the feature with it's associated issue/PR.
# Related
- [ ] Validated by the team (@people needed)
- [ ] Test added
- [ ] [Documentation](https://github.com/meilisearch/documentation/issues/#xxx) //Change xxx or remove the line
- [ ] [SDK/Integrations](https://github.com/meilisearch/integration-guides/issues/#xxx) //Change xxx or remove the line

.github/is-latest-release.sh vendored Normal file (132 lines)

@@ -0,0 +1,132 @@
#!/bin/sh
# Checks if the current tag should be the latest (in terms of semver and not of release date).
# Ex: previous tag -> v0.10.1
# new tag -> v0.8.12
# The new tag should not be the latest
# So it returns "false", the CI should not run for the release v0.8.12
# Used in GHA in publish-docker-latest.yml
# Returns "true" or "false" (as a string) to be used in the `if` in GHA
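# Hypothetical local dry run (not part of the original script; GITHUB_REF is
# normally provided by GitHub Actions, so set it manually to test):
#   GITHUB_REF='refs/tags/v0.10.1' sh .github/is-latest-release.sh   # prints "true" or "false"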
# GLOBAL
GREP_SEMVER_REGEXP='v\([0-9]*\)[.]\([0-9]*\)[.]\([0-9]*\)$' # i.e. v[number].[number].[number]
# FUNCTIONS
# semverParseInto and semverLT from https://github.com/cloudflare/semver_bash/blob/master/semver.sh
# usage: semverParseInto version major minor patch special
# version: the string version
# major, minor, patch, special: will be assigned by the function
semverParseInto() {
    local RE='[^0-9]*\([0-9]*\)[.]\([0-9]*\)[.]\([0-9]*\)\([0-9A-Za-z-]*\)'
    # MAJOR
    eval $2=`echo $1 | sed -e "s#$RE#\1#"`
    # MINOR
    eval $3=`echo $1 | sed -e "s#$RE#\2#"`
    # PATCH
    eval $4=`echo $1 | sed -e "s#$RE#\3#"`
    # SPECIAL
    eval $5=`echo $1 | sed -e "s#$RE#\4#"`
}
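# Illustrative example (not part of the original script): after
#   semverParseInto "v0.10.1" MAJOR MINOR PATCH SPECIAL
# the caller sees MAJOR=0, MINOR=10, PATCH=1 and SPECIAL="".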
# usage: semverLT version1 version2
semverLT() {
    local MAJOR_A=0
    local MINOR_A=0
    local PATCH_A=0
    local SPECIAL_A=0

    local MAJOR_B=0
    local MINOR_B=0
    local PATCH_B=0
    local SPECIAL_B=0

    semverParseInto $1 MAJOR_A MINOR_A PATCH_A SPECIAL_A
    semverParseInto $2 MAJOR_B MINOR_B PATCH_B SPECIAL_B

    if [ $MAJOR_A -lt $MAJOR_B ]; then
        return 0
    fi
    if [ $MAJOR_A -le $MAJOR_B ] && [ $MINOR_A -lt $MINOR_B ]; then
        return 0
    fi
    if [ $MAJOR_A -le $MAJOR_B ] && [ $MINOR_A -le $MINOR_B ] && [ $PATCH_A -lt $PATCH_B ]; then
        return 0
    fi
    if [ "_$SPECIAL_A" == "_" ] && [ "_$SPECIAL_B" == "_" ] ; then
        return 1
    fi
    if [ "_$SPECIAL_A" == "_" ] && [ "_$SPECIAL_B" != "_" ] ; then
        return 1
    fi
    if [ "_$SPECIAL_A" != "_" ] && [ "_$SPECIAL_B" == "_" ] ; then
        return 0
    fi
    if [ "_$SPECIAL_A" < "_$SPECIAL_B" ]; then
        return 0
    fi

    return 1
}
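# Illustrative example (not part of the original script): semverLT returns 0
# (shell "true") when the first version is strictly lower, so
#   semverLT v0.8.12 v0.10.1   # returns 0: v0.8.12 is older than v0.10.1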
# Returns the tag of the latest stable release (in terms of semver and not of release date)
get_latest() {
    temp_file='temp_file' # temp_file needed because the grep would start before the download is over
    curl -s 'https://api.github.com/repos/meilisearch/MeiliSearch/releases' > "$temp_file"
    releases=$(cat "$temp_file" | \
        grep -E "tag_name|draft|prerelease" \
        | tr -d ',"' | cut -d ':' -f2 | tr -d ' ')
    # Returns a list of [tag_name draft_boolean prerelease_boolean ...]
    # Ex: v0.10.1 false false v0.9.1-rc.1 false true v0.9.0 false false...

    i=0
    latest=""
    current_tag=""
    for release_info in $releases; do
        if [ $i -eq 0 ]; then # Checking tag_name
            if echo "$release_info" | grep -q "$GREP_SEMVER_REGEXP"; then # If it's not an alpha or beta release
                current_tag=$release_info
            else
                current_tag=""
            fi
            i=1
        elif [ $i -eq 1 ]; then # Checking draft boolean
            if [ "$release_info" = "true" ]; then
                current_tag=""
            fi
            i=2
        elif [ $i -eq 2 ]; then # Checking prerelease boolean
            if [ "$release_info" = "true" ]; then
                current_tag=""
            fi
            i=0
            if [ "$current_tag" != "" ]; then # If the current_tag is valid
                if [ "$latest" = "" ]; then # If there is no latest yet
                    latest="$current_tag"
                else
                    semverLT $current_tag $latest # Comparing latest and the current tag
                    if [ $? -eq 1 ]; then
                        latest="$current_tag"
                    fi
                fi
            fi
        fi
    done

    rm -f "$temp_file"
    echo $latest
}
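# Illustrative example (not part of the original script): given the releases
# listed above (v0.10.1 false false, v0.9.1-rc.1 false true, v0.9.0 false false),
# get_latest skips the prerelease candidate and prints "v0.10.1".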
# MAIN
current_tag="$(echo $GITHUB_REF | tr -d 'refs/tags/')"
latest="$(get_latest)"
if [ "$current_tag" != "$latest" ]; then
# The current release tag is not the latest
echo "false"
else
# The current release tag is the latest
echo "true"
fi

@@ -1,4 +1,4 @@
-# GitHub actions workflow for MeiliDB
+# GitHub Actions Workflow for MeiliSearch
> **Note:**

@@ -6,12 +6,14 @@
## Workflow
-- On each pull request, we are triggering `cargo test`.
+- On each pull request, we trigger `cargo test`.
-- On each tag, we are building:
+- On each tag, we build:
-  - the tagged docker image
+  - the tagged Docker image and publish it to Docker Hub
  - the binaries for MacOS, Ubuntu, and Windows
-  - the debian package
+  - the Debian package
-- On each stable release, we are build the latest docker image.
+- On each stable release (`v*.*.*` tag):
  - we build the `latest` Docker image and publish it to Docker Hub
  - we publish the binary to Homebrew and Gemfury
## Problems

@@ -0,0 +1,16 @@
name: Check if the CHANGELOG.md has been updated

on: [pull_request]

jobs:
  check:
    name: Test on ${{ matrix.os }}
    if: ${{ !contains(github.event.pull_request.labels.*.name, 'ignore-changelog') }}
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Checking the CHANGELOG.md has been updated in this PR
        run: |
          set -e
          git fetch origin ${{ github.base_ref }}
          git diff --name-only origin/${{ github.base_ref }} | grep -q CHANGELOG.md
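The check itself is plain git; a minimal local equivalent, assuming the PR targets `master`, looks like:

```bash
# Fail if the current branch does not touch CHANGELOG.md (mirrors the CI step above)
set -e
git fetch origin master
git diff --name-only origin/master | grep -q CHANGELOG.md
```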

@@ -1,9 +1,8 @@
name: Publish binaries to GitHub release
on:
-  push:
+  release:
-    tags:
+    types: [published]
      - '*'
name: Publish binaries to release
jobs:
  publish:

@@ -20,8 +19,8 @@ jobs:
        artifact_name: meilisearch
        asset_name: meilisearch-macos-amd64
      - os: windows-latest
-        artifact_name: meilisearch
+        artifact_name: meilisearch.exe
-        asset_name: meilisearch-windows-amd64
+        asset_name: meilisearch-windows-amd64.exe
    steps:
      - uses: hecrj/setup-rust-action@master

@@ -33,7 +32,55 @@ jobs:
      - name: Upload binaries to release
        uses: svenstaro/upload-release-action@v1-release
        with:
-          repo_token: ${{ secrets.GITHUB_TOKEN }}
+          repo_token: ${{ secrets.PUBLISH_TOKEN }}
          file: target/release/${{ matrix.artifact_name }}
          asset_name: ${{ matrix.asset_name }}
          tag: ${{ github.ref }}
  publish-armv7:
    name: Publish for ARMv7
    runs-on: ubuntu-18.04
    steps:
      - uses: actions/checkout@v1.0.0
      - uses: uraimo/run-on-arch-action@v1.0.7
        id: runcmd
        with:
          architecture: armv7
          distribution: ubuntu18.04
          run: |
            apt update
            apt install -y curl gcc make
            curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y --profile minimal --default-toolchain stable
            source $HOME/.cargo/env
            cargo build --release --locked
      - name: Upload the binary to release
        uses: svenstaro/upload-release-action@v1-release
        with:
          repo_token: ${{ secrets.PUBLISH_TOKEN }}
          file: target/release/meilisearch
          asset_name: meilisearch-linux-armv7
          tag: ${{ github.ref }}
  publish-armv8:
    name: Publish for ARMv8
    runs-on: ubuntu-18.04
    steps:
      - uses: actions/checkout@v1.0.0
      - uses: uraimo/run-on-arch-action@v1.0.7
        id: runcmd
        with:
          architecture: aarch64 # aka ARMv8
          distribution: ubuntu18.04
          run: |
            apt update
            apt install -y curl gcc make
            curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y --profile minimal --default-toolchain stable
            source $HOME/.cargo/env
            cargo build --release --locked
      - name: Upload the binary to release
        uses: svenstaro/upload-release-action@v1-release
        with:
          repo_token: ${{ secrets.PUBLISH_TOKEN }}
          file: target/release/meilisearch
          asset_name: meilisearch-linux-armv8
          tag: ${{ github.ref }}

@@ -1,15 +1,13 @@
-name: Publish deb pkg to GitHub release & apt repository
+name: Publish deb pkg to GitHub release & APT repository & Homebrew
on:
-  push:
+  release:
-    tags:
+    types: [released]
      - '*'
jobs:
-  publish:
+  debian:
    name: Publish debian package
    runs-on: ubuntu-latest
    steps:
      - uses: hecrj/setup-rust-action@master
        with:

@@ -28,3 +26,14 @@ jobs:
          tag: ${{ github.ref }}
      - name: Upload debian pkg to apt repository
        run: curl -F package=@target/debian/meilisearch.deb https://${{ secrets.GEMFURY_PUSH_TOKEN }}@push.fury.io/meilisearch/
  homebrew:
    name: Bump Homebrew formula
    runs-on: ubuntu-latest
    steps:
      - name: Create PR to Homebrew
        uses: mislav/bump-homebrew-formula-action@v1
        with:
          formula-name: meilisearch
        env:
          COMMITTER_TOKEN: ${{ secrets.HOMEBREW_COMMITTER_TOKEN }}

.github/workflows/publish-docker-latest.yml

@@ -1,8 +1,7 @@
---
on:
-  push:
+  release:
-    tags:
+    types: [released]
      - 'v[0-9]+.[0-9]+.[0-9]+'
name: Publish latest image to Docker Hub

@@ -10,8 +9,12 @@ jobs:
  build:
    runs-on: ubuntu-latest
    steps:
-      - uses: actions/checkout@v1
+      - uses: actions/checkout@v2
      - name: Check if current release is latest
        run: echo "##[set-output name=is_latest;]$(sh .github/is-latest-release.sh)"
        id: release
      - name: Publish to Registry
        if: steps.release.outputs.is_latest == 'true'
        uses: elgohr/Publish-Docker-Github-Action@master
        with:
          name: getmeili/meilisearch

@@ -1,5 +1,12 @@
---
-on: [pull_request]
+on:
  push:
    branches:
      - release-v*
      - trying
      - staging
    tags:
      - 'v[0-9]+.[0-9]+.[0-9]+' # this only concerns tags on stable
name: Test binaries with cargo test

@@ -10,7 +17,6 @@ jobs:
    strategy:
      matrix:
        os: [ubuntu-latest, macos-latest]
    steps:
      - uses: actions/checkout@v1
      - uses: actions-rs/toolchain@v1

@@ -18,8 +24,75 @@ jobs:
          profile: minimal
          toolchain: stable
          override: true
          components: clippy
      - name: Run cargo test
        uses: actions-rs/cargo@v1
        with:
          command: test
          args: --locked --release
      - name: Run cargo test dump
        uses: actions-rs/cargo@v1
        with:
          command: test
          args: dump --locked --release -- --ignored --test-threads 1
      - name: Run cargo clippy
        uses: actions-rs/cargo@v1
        with:
          command: clippy
  build-image:
    name: Test the build of Docker image
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v1
      - run: docker build . --file Dockerfile -t meilisearch
        name: Docker build
  ## A push occurred on a release branch, a prerelease is created and assets are generated
  prerelease:
    name: create prerelease
    needs: [check, build-image]
    if: ${{ contains(github.ref, 'release-') && github.event_name == 'push' }}
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v2
        with:
          fetch-depth: 0
      - name: Get version number
        id: version-number
        run: echo "##[set-output name=number;]$(echo ${{ github.ref }} | sed 's/.*\(v.*\)/\1/')"
      - name: Get commit count
        id: commit-count
        run: echo "##[set-output name=count;]$(git rev-list remotes/origin/master..remotes/origin/release-${{ steps.version-number.outputs.number }} --count)"
      - name: Create Release
        id: create_release
        uses: actions/create-release@v1
        env:
          GITHUB_TOKEN: ${{ secrets.PUBLISH_TOKEN }} # Personal Access Token
        with:
          tag_name: ${{ steps.version-number.outputs.number }}rc${{ steps.commit-count.outputs.count }}
          release_name: Pre-release ${{ steps.version-number.outputs.number }}-rc${{ steps.commit-count.outputs.count }}
          prerelease: true
  ## If a tag is pushed, a release is created for this tag, and assets will be generated
  release:
    name: create release
    needs: [check, build-image]
    if: ${{ contains(github.ref, 'tags/v') }}
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v2
      - name: Get version number
        id: version-number
        run: echo "##[set-output name=number;]$(echo ${{ github.ref }} | sed 's/.*\(v.*\)/\1/')"
      - name: Create Release
        id: create_release
        uses: actions/create-release@v1
        env:
          GITHUB_TOKEN: ${{ secrets.PUBLISH_TOKEN }} # PAT
        with:
          tag_name: ${{ steps.version-number.outputs.number }}
          release_name: Meilisearch ${{ steps.version-number.outputs.number }}
          prerelease: false
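The two `set-output` steps are ordinary shell; a hypothetical run for a `release-v0.15.0` branch (branch name is illustrative) would behave like:

```bash
# Extract the version number from the ref (same sed as the workflow step)
echo "refs/heads/release-v0.15.0" | sed 's/.*\(v.*\)/\1/'   # -> v0.15.0
# Count commits on the release branch since master, used as the rc number
git rev-list remotes/origin/master..remotes/origin/release-v0.15.0 --count
```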

CHANGELOG.md Normal file (84 lines)

@@ -0,0 +1,84 @@
## v0.15.0
- Dumps (#887)
- Update actix-web dependency to 3.0.0 (#963)
- Consider an empty query to be a placeholder search (#916)
## v0.14.1
- Fix version mismatch in snapshot importation (#959)
## v0.14.0
- Fix facet distribution case (#797)
- Snapshotting (#839)
- Fix bucket-sort unwrap bug (#915)
## v0.13.0
- placeholder search (#771)
- Add database version mismatch check (#794)
- Displayed and searchable attributes wildcard (#846)
- Remove sys-info route (#810)
- Check database version mismatch (#794)
- Fix unique docid bug (#841)
- Error codes in updates (#792)
- Sentry disable argument (#813)
- Log analytics if enabled (#825)
- Fix default values displayed on web interface (#874)
## v0.12.0
- Fix long documents not being indexed completely bug (#816)
- Fix distinct attribute returning id instead of name (#800)
- error code rename (#805)
## v0.11.1
- Fix facet cache on document update (#789)
- Improvements on settings consistency (#778)
## v0.11.0
- Change the HTTP framework, moving from tide to actix-web (#601)
- Bump sentry version to 0.18.1 (#690)
- Enable max payload size override (#684)
- Disable sentry in debug (#681)
- Better terminal greeting (#680)
- Fix highlight misalignment (#679)
- Add support for facet count (#676)
- Add support for faceted search (#631)
- Add support for configuring the lmdb map size (#646, #647)
- Add exposed port for Dockerfile (#654)
- Add sentry probe (#664)
- Fix url trailing slash and double slash issues (#659)
- Fix accept all Content-Type by default (#653)
- Return the error message from Serde when a deserialization error is encountered (#661)
- Fix NormalizePath middleware to make the dashboard accessible (#695)
- Update sentry features to remove openssl (#702)
- Add SSL support (#669)
- Rename fieldsFrequency into fieldsDistribution in stats (#719)
- Add support for error code reporting (#703)
- Allow the dashboard to query private servers (#732)
- Add telemetry (#720)
- Add post route for search (#735)
## v0.10.1
- Add support for floating points in filters (#640)
- Add '@' character as tokenizer separator (#607)
- Add support for filtering on arrays of strings (#611)
## v0.10.0
- Refined filtering (#592)
- Add the number of hits in search result (#541)
- Add support for aligned crop in search result (#543)
- Sanitize the content displayed in the web interface (#539)
- Add support of nested null, boolean and seq values (#571 and #568, #574)
- Fixed the core benchmark (#576)
- Publish an ARMv7 and ARMv8 binaries on releases (#540 and #581)
- Fixed a bug where the result of the update status after the first update was empty (#542)
- Fixed a bug where stop words were not handled correctly (#594)
- Fix CORS issues (#602)
- Support wildcard on attributes to retrieve, highlight, and crop (#549, #565, and #598)

CODE_OF_CONDUCT.md Normal file (76 lines)

@@ -0,0 +1,76 @@
# Contributor Covenant Code of Conduct
## Our Pledge
In the interest of fostering an open and welcoming environment, we as
contributors and maintainers pledge to making participation in our project and
our community a harassment-free experience for everyone, regardless of age, body
size, disability, ethnicity, sex characteristics, gender identity and expression,
level of experience, education, socio-economic status, nationality, personal
appearance, race, religion, or sexual identity and orientation.
## Our Standards
Examples of behavior that contributes to creating a positive environment
include:
* Using welcoming and inclusive language
* Being respectful of differing viewpoints and experiences
* Gracefully accepting constructive criticism
* Focusing on what is best for the community
* Showing empathy towards other community members
Examples of unacceptable behavior by participants include:
* The use of sexualized language or imagery and unwelcome sexual attention or
advances
* Trolling, insulting/derogatory comments, and personal or political attacks
* Public or private harassment
* Publishing others' private information, such as a physical or electronic
address, without explicit permission
* Other conduct which could reasonably be considered inappropriate in a
professional setting
## Our Responsibilities
Project maintainers are responsible for clarifying the standards of acceptable
behavior and are expected to take appropriate and fair corrective action in
response to any instances of unacceptable behavior.
Project maintainers have the right and responsibility to remove, edit, or
reject comments, commits, code, wiki edits, issues, and other contributions
that are not aligned to this Code of Conduct, or to ban temporarily or
permanently any contributor for other behaviors that they deem inappropriate,
threatening, offensive, or harmful.
## Scope
This Code of Conduct applies both within project spaces and in public spaces
when an individual is representing the project or its community. Examples of
representing a project or community include using an official project e-mail
address, posting via an official social media account, or acting as an appointed
representative at an online or offline event. Representation of a project may be
further defined and clarified by project maintainers.
## Enforcement
Instances of abusive, harassing, or otherwise unacceptable behavior may be
reported by contacting the project team at bonjour@meilisearch.com. All
complaints will be reviewed and investigated and will result in a response that
is deemed necessary and appropriate to the circumstances. The project team is
obligated to maintain confidentiality with regard to the reporter of an incident.
Further details of specific enforcement policies may be posted separately.
Project maintainers who do not follow or enforce the Code of Conduct in good
faith may face temporary or permanent repercussions as determined by other
members of the project's leadership.
## Attribution
This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4,
available at https://www.contributor-covenant.org/version/1/4/code-of-conduct.html
[homepage]: https://www.contributor-covenant.org
For answers to common questions about this code of conduct, see
https://www.contributor-covenant.org/faq

CONTRIBUTING.md Normal file (112 lines)

@@ -0,0 +1,112 @@
# Contributing
First, thank you for contributing to MeiliSearch! The goal of this document is to
provide everything you need to start contributing to MeiliSearch. The
following TOC is sorted progressively, starting with the basics and
expanding into more specifics.
<!-- MarkdownTOC autolink="true" style="ordered" indent=" " -->
1. [Assumptions](#assumptions)
1. [Your First Contribution](#your-first-contribution)
1. [Change Control](#change-control)
1. [Git Branches](#git-branches)
1. [Git Commits](#git-commits)
1. [Style](#style)
1. [Github Pull Requests](#github-pull-requests)
1. [Reviews & Approvals](#reviews--approvals)
1. [Merge Style](#merge-style)
1. [CI](#ci)
1. [Development](#development)
1. [Setup](#setup)
1. [Testing](#testing)
1. [Benchmarking](#benchmarking--profiling)
1. [Humans](#humans)
1. [Documentation](#documentation)
1. [Changelog](#changelog)
<!-- /MarkdownTOC -->
## Assumptions
1. **You're familiar with [Github](https://github.com) and the [pull request](https://help.github.com/en/github/collaborating-with-issues-and-pull-requests/about-pull-requests)
workflow.**
2. **You've read the MeiliSearch [docs](https://docs.meilisearch.com).**
3. **You know about the [MeiliSearch community](https://docs.meilisearch.com/resources/contact.html).
Please use this for help.**
## Your First Contribution
1. Ensure your change has an issue! Find an
[existing issue](https://github.com/meilisearch/meilisearch/issues/) or [open a new issue](https://github.com/meilisearch/meilisearch/issues/new).
* This is where you can get a feel for whether the change will be accepted or not.
2. Once approved, [fork the MeiliSearch repository](https://help.github.com/en/github/getting-started-with-github/fork-a-repo) in your own
Github account.
3. [Create a new Git branch](https://help.github.com/en/github/collaborating-with-issues-and-pull-requests/creating-and-deleting-branches-within-your-repository)
4. Review the MeiliSearch [workflow](#workflow) and [development](#development).
5. Make your changes.
6. [Submit the branch as a pull request](https://help.github.com/en/github/collaborating-with-issues-and-pull-requests/creating-a-pull-request-from-a-fork) to the main MeiliSearch
repo. A MeiliSearch team member should comment and/or review your pull request
within a few days, although, depending on the circumstances, it may take
longer.
## Change Control
### Git Branches
_All_ changes must be made in a branch and submitted as [pull requests](#pull-requests).
MeiliSearch does not adopt any type of branch naming style, but please use something
descriptive of your changes.
### Git Commits
#### Style
Please ensure your commits are small and focused; they should tell a story of
your change. This helps reviewers to follow your changes, especially for more
complex changes.
Familiarise yourself with [How to Write a Git Commit Message](https://chris.beams.io/posts/git-commit/).
### Github Pull Requests
Once your changes are ready you must submit your branch as a pull request.
#### Reviews & Approvals
All pull requests must be reviewed and approved by at least one MeiliSearch team
member.
#### Merge Style
All pull requests are squashed and merged. We generally discourage large pull
requests that are over 300-500 lines of diff. If you would like to propose
a larger change, we suggest coming onto our chat channel and
discussing it with one of our engineers. This way we can talk through the
solution and discuss whether a change that large is even needed! This will
overall produce a quicker response to the change and likely produce code that
aligns better with our process.
## Development
### Setup
See the [MeiliSearch Docs](https://docs.meilisearch.com/guides/advanced_guides/installation.html) for how to set up a development environment.
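A typical first build, as a sketch (the repository URL and the test flags are taken from elsewhere in this changeset, not from this section):

```bash
git clone https://github.com/meilisearch/MeiliSearch.git
cd MeiliSearch
cargo test --locked --release   # same flags the CI uses
```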
### Benchmarking & Profiling
We do not yet do any benchmarking, nor have we formalised our profiling. If you'd like to work on this please get in touch!
## Humans
After making your change, you'll want to prepare it for MeiliSearch users (mostly humans). This usually entails updating documentation and announcing your feature.
### Documentation
Documentation is very important to MeiliSearch. All contributions that
alter user-facing behavior MUST include documentation changes. Please see
[GitHub.com/meilisearch/documentation](https://github.com/meilisearch/documentation) for more info.
### Changelog
Until we have guidelines in place, updating the [`Changelog`](/CHANGELOG.md) is solely the responsibility of MeiliSearch team members.

Cargo.lock generated (3529 lines)

File diff suppressed because it is too large

Dockerfile

@@ -18,10 +18,12 @@ RUN $HOME/.cargo/bin/cargo build --release
# Run
FROM alpine:3.10
-RUN apk update --quiet
+RUN apk add -q --no-cache libgcc tini
RUN apk add libgcc
COPY --from=compiler /meilisearch/target/release/meilisearch .
ENV MEILI_HTTP_ADDR 0.0.0.0:7700
EXPOSE 7700/tcp
ENTRYPOINT ["tini", "--"]
CMD ./meilisearch
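A quick local smoke test of the updated image, as a sketch (the `meilisearch` image tag is arbitrary):

```bash
docker build -t meilisearch .
# tini runs as PID 1 and forwards signals, so Ctrl-C now stops the container cleanly
docker run --rm -p 7700:7700 meilisearch
```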

Procfile (deleted)

@@ -1 +0,0 @@
web: ./target/release/meilisearch --http-addr=0.0.0.0:$PORT

README.md (179 lines changed)

@@ -1,46 +1,70 @@
-# MeiliSearch
+<p align="center">
<img src="assets/logo.svg" alt="MeiliSearch" width="200" height="200" />
</p>
-[![Build Status](https://github.com/meilisearch/MeiliSearch/workflows/Cargo%20test/badge.svg)](https://github.com/meilisearch/MeiliSearch/actions)
-[![dependency status](https://deps.rs/repo/github/meilisearch/MeiliSearch/status.svg)](https://deps.rs/repo/github/meilisearch/MeiliSearch)
-[![License](https://img.shields.io/badge/license-MIT-informational)](https://github.com/meilisearch/MeiliSearch/blob/master/LICENSE)
-⚡ Ultra relevant and instant full-text search API 🔍
+<h1 align="center">MeiliSearch</h1>
+<h4 align="center">
<a href="https://www.meilisearch.com">Website</a> |
<a href="https://blog.meilisearch.com">Blog</a> |
<a href="https://fr.linkedin.com/company/meilisearch">LinkedIn</a> |
<a href="https://twitter.com/meilisearch">Twitter</a> |
<a href="https://docs.meilisearch.com">Documentation</a> |
<a href="https://docs.meilisearch.com/faq/">FAQ</a>
</h4>
-MeiliSearch is a powerful, fast, open-source, easy to use, and deploy search engine. The search and indexation are fully customizable and handles features like typo-tolerance, filters, and synonyms.
-For more [details about those features, go to our documentation](https://docs.meilisearch.com/).
+<p align="center">
+<a href="https://github.com/meilisearch/MeiliSearch/actions"><img src="https://github.com/meilisearch/MeiliSearch/workflows/Cargo%20test/badge.svg" alt="Build Status"></a>
<a href="https://deps.rs/repo/github/meilisearch/MeiliSearch"><img src="https://deps.rs/repo/github/meilisearch/MeiliSearch/status.svg" alt="Dependency status"></a>
<a href="https://github.com/meilisearch/MeiliSearch/blob/master/LICENSE"><img src="https://img.shields.io/badge/license-MIT-informational" alt="License"></a>
<a href="https://slack.meilisearch.com"><img src="https://img.shields.io/badge/slack-MeiliSearch-blue.svg?logo=slack" alt="Slack"></a>
<a href="https://github.com/meilisearch/MeiliSearch/discussions" alt="Discussions"><img src="https://img.shields.io/badge/github-discussions-red" /></a>
<a href="https://app.bors.tech/repositories/26457"><img src="https://bors.tech/images/badge_small.svg" alt="Bors enabled"></a>
</p>
-[![crates.io demo gif](misc/crates-io-demo.gif)](https://crates.meilisearch.com)
-> Meili helps the Rust community find crates on [crates.meilisearch.com](https://crates.meilisearch.com)
+<p align="center">⚡ Lightning Fast, Ultra Relevant, and Typo-Tolerant Search Engine 🔍</p>
-## Features
+**MeiliSearch** is a powerful, fast, open-source, easy to use and deploy search engine. Both searching and indexing are highly customizable. Features such as typo-tolerance, filters, and synonyms are provided out-of-the-box.
-* Search as-you-type experience (answers < 50ms)
+For more information about features go to [our documentation](https://docs.meilisearch.com/).
<p align="center">
<img src="assets/movies-web-demo.gif" alt="Web interface gif" />
</p>
## ✨ Features
* Search as-you-type experience (answers < 50 milliseconds)
* Full-text search
-* Typo tolerant (understands typos and spelling mistakes)
+* Typo tolerant (understands typos and miss-spelling)
-* Supports Kanji
+* Faceted search and filters
* Supports Kanji characters
* Supports Synonym
* Easy to install, deploy, and maintain
-* Whole documents returned
+* Whole documents are returned
* Highly customizable
-* RESTfull API
+* RESTful API
-## Quick Start
+## Getting started
### Deploy the Server
-#### Run it using Docker
+#### Brew (Mac OS)
```bash
docker run -it -p 7700:7700 --rm getmeili/meilisearch
```
#### Installation using Homebrew
```bash
brew update && brew install meilisearch
meilisearch
```
-#### Installation using APT
+#### Docker
```bash
docker run -p 7700:7700 -v $(pwd)/data.ms:/data.ms getmeili/meilisearch
```
#### Run on Digital Ocean
[![DigitalOcean Marketplace](assets/do-btn-blue.svg)](https://marketplace.digitalocean.com/apps/meilisearch?action=deploy&refcode=7c67bd97e101)
#### APT (Debian & Ubuntu)
```bash
echo "deb [trusted=yes] https://apt.fury.io/meilisearch/ /" > /etc/apt/sources.list.d/fury.list

@@ -48,44 +72,46 @@ apt update && apt install meilisearch-http
meilisearch
```
-#### Download the binary
+#### Download the binary (Linux & Mac OS)
```bash
curl -L https://install.meilisearch.com | sh
./meilisearch
```
#### Run it on heroku
[![Deploy](https://www.herokucdn.com/deploy/button.svg)](https://heroku.com/deploy?template=https://github.com/meilisearch/MeiliSearch)
#### Compile and run it from sources
-If you have the Rust toolchain already installed, you can compile from the source
+If you have the latest stable Rust toolchain installed on your local system, clone the repository and change it to your working directory.
```bash
git clone https://github.com/meilisearch/MeiliSearch.git
cd MeiliSearch
```
In the cloned repository, compile MeiliSearch.
```bash
rustup override set stable
rustup update stable
cargo run --release
```
### Create an Index and Upload Some Documents
-We provide a movie dataset that you can use for testing purposes. Let's create an index!
+If you need a sample dataset, use [this movie database](https://www.notion.so/meilisearch/A-movies-dataset-to-test-Meili-1cbf7c9cfa4247249c40edfa22d7ca87#b5ae399b81834705ba5420ac70358a65). You can also find it in the `datasets/` directory.
```bash
curl -L 'https://bit.ly/2PAcw9l' -o movies.json
```
-MeiliSearch can serve multiple indexes, with different kinds of documents,
-therefore, it is required to create the index before sending documents to it.
+MeiliSearch can serve multiple indexes, with different kinds of documents.
+It is required to create an index before sending documents to it.
```bash
curl -i -X POST 'http://127.0.0.1:7700/indexes' --data '{ "name": "Movies", "uid": "movies" }'
```
-Now that the server knows about our brand new index, we can send it data.
+Now that the server knows about your brand new index, you're ready to send it some data.
We provided you a small dataset that is available in the `datasets/` directory.
```bash
curl -i -X POST 'http://127.0.0.1:7700/indexes/movies/documents' \
@@ -97,8 +123,9 @@ curl -i -X POST 'http://127.0.0.1:7700/indexes/movies/documents' \
#### In command line
-The search engine is now aware of our documents and can serve those via our HTTP server again.
+The search engine is now aware of your documents and can serve those via an HTTP server.
-The [`jq` command-line tool](https://stedolan.github.io/jq/) can greatly help you read the server responses.
+The [`jq` command-line tool](https://stedolan.github.io/jq/) can significantly help you read the server responses.
```bash ```bash
curl 'http://127.0.0.1:7700/indexes/movies/search?q=botman+robin&limit=2' | jq curl 'http://127.0.0.1:7700/indexes/movies/search?q=botman+robin&limit=2' | jq
@@ -129,67 +156,37 @@ curl 'http://127.0.0.1:7700/indexes/movies/search?q=botman+robin&limit=2' | jq
}
```
-#### With the Web Interface
+#### Use the Web Interface
-MeiliSearch provides a simple web interface containing a search bar in order to quickly test the instant search experience with a given set of documents.
+We also deliver an **out-of-the-box web interface** in which you can test MeiliSearch interactively.
-This web interface is available in your browser at the root of the server. The default URL is [http://127.0.0.1:7700](http://127.0.0.1:7700).
+You can access the web interface in your web browser at the root of the server. The default URL is [http://127.0.0.1:7700](http://127.0.0.1:7700). All you need to do is open your web browser and enter MeiliSearch's address to visit it. This will lead you to a web page with a search bar that will allow you to search in the selected index.
-### Documentation
+| [See the gif above](#demo)
-Now, that you have a running MeiliSearch, you can learn more and tune your search engine using [the documentation](https://docs.meilisearch.com).
+## Documentation
-## How it works
+Now that your MeiliSearch server is up and running, you can learn more about how to tune your search engine in [the documentation](https://docs.meilisearch.com).
MeiliSearch uses [LMDB](https://en.wikipedia.org/wiki/Lightning_Memory-Mapped_Database) as the internal key-value store. The key-value store allows us to handle updates and queries with small memory and CPU overheads. The whole ranking system is [data oriented](https://github.com/meilisearch/MeiliSearch/issues/82) and provides great performances.
You can [read the deep dive](deep-dive.md) if you want more information on the engine; it describes the whole process of generating updates and handling queries. Also, you can take a look at the [typos and ranking rules](typos-ranking-rules.md) if you want to know the default rules used to sort the documents.
### Technical features
- Provides [6 default ranking criteria](https://github.com/meilisearch/MeiliSearch/blob/3ea5aa18a209b6973b921542d46a79e1c753c163/meilisearch-core/src/criterion/mod.rs#L106-L111) used to [bucket sort](https://en.wikipedia.org/wiki/Bucket_sort) documents
- Accepts [custom criteria](https://github.com/meilisearch/MeiliSearch/blob/3ea5aa18a209b6973b921542d46a79e1c753c163/meilisearch-core/src/criterion/mod.rs#L20-L29) and can apply them in any custom order
- Support [ranged queries](https://github.com/meilisearch/MeiliSearch/blob/3ea5aa18a209b6973b921542d46a79e1c753c163/meilisearch-core/src/query_builder.rs#L342), useful for paginating results
- Can [distinct](https://github.com/meilisearch/MeiliSearch/blob/3ea5aa18a209b6973b921542d46a79e1c753c163/meilisearch-core/src/query_builder.rs#L324-L329) and [filter](https://github.com/meilisearch/MeiliSearch/blob/3ea5aa18a209b6973b921542d46a79e1c753c163/meilisearch-core/src/query_builder.rs#L313-L318) returned documents based on context defined rules
- Searches for [concatenated](https://github.com/meilisearch/MeiliSearch/pull/164) and [splitted query words](https://github.com/meilisearch/MeiliSearch/pull/232) to improve the search quality.
- Can store complete documents or only [user schema specified fields](https://github.com/meilisearch/MeiliSearch/blob/3ea5aa18a209b6973b921542d46a79e1c753c163/datasets/movies/schema.toml)
- The [default tokenizer](https://github.com/meilisearch/MeiliSearch/blob/3ea5aa18a209b6973b921542d46a79e1c753c163/meilisearch-tokenizer/src/lib.rs) can index latin and kanji based languages
- Returns [the matching text areas](https://github.com/meilisearch/MeiliSearch/blob/3ea5aa18a209b6973b921542d46a79e1c753c163/meilisearch-types/src/lib.rs#L49-L65), useful to highlight matched words in results
- Accepts query time search config like the [searchable attributes](https://github.com/meilisearch/MeiliSearch/blob/3ea5aa18a209b6973b921542d46a79e1c753c163/meilisearch-core/src/query_builder.rs#L331-L336)
- Supports [runtime incremental indexing](https://github.com/meilisearch/MeiliSearch/blob/3ea5aa18a209b6973b921542d46a79e1c753c163/meilisearch-core/src/store/mod.rs#L143-L212)
## Performances
With a dataset composed of _100 353_ documents with _352_ attributes each and _3_ of them indexed.
So more than _300 000_ fields indexed for _35 million_ stored we can handle more than _2.8k req/sec_ with an average response time of _9 ms_ on an Intel i7-7700 (8) @ 4.2GHz.
Requests are made using [wrk](https://github.com/wg/wrk) and scripted to simulate real users' queries.
```
Running 10s test @ http://localhost:2230
2 threads and 25 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 9.52ms 7.61ms 99.25ms 84.58%
Req/Sec 1.41k 119.11 1.78k 64.50%
28080 requests in 10.01s, 7.42MB read
Requests/sec: 2806.46
Transfer/sec: 759.17KB
```
We also indexed a dataset containing something like _12 millions_ cities names in _24 minutes_ on a machine with _8 cores_, _64 GB of RAM_, and a _300 GB NMVe_ SSD.<br/>
The resulting database was _16 GB_ and search results were between _30 ms_ and _4 seconds_ for short prefix queries.
### Notes
With Rust 1.32 the allocator has been [changed to use the system allocator](https://blog.rust-lang.org/2019/01/17/Rust-1.32.0.html#jemalloc-is-removed-by-default).
We have seen much better performances when [using jemalloc as the global allocator](https://github.com/alexcrichton/jemallocator#documentation).
## Contributing
-We will be glad if you submit issues and pull requests. You can help to grow this project and start contributing by checking [issues tagged "good-first-issue"](https://github.com/meilisearch/MeiliSearch/issues?q=is%3Aissue+is%3Aopen+label%3A%22good+first+issue%22). It is a good start!
+Hey! We're glad you're thinking about contributing to MeiliSearch! If you think something is missing or could be improved, please open issues and pull requests. If you'd like to help this project grow, we'd love to have you! To start contributing, checking [issues tagged as "good-first-issue"](https://github.com/meilisearch/MeiliSearch/issues?q=is%3Aissue+is%3Aopen+label%3A%22good+first+issue%22) is a good start!
-### Analytic Events
+## Telemetry
-We send events to our Amplitude instance to be aware of the number of people who use MeiliSearch.<br/>
+MeiliSearch collects anonymous data regarding general usage.
-We only send the platform on which the server runs once by day. No other information is sent.<br/>
+This helps us better understand developers usage of MeiliSearch features.<br/>
-If you do not want us to send events, you can disable these analytics by using the `MEILI_NO_ANALYTICS` env variable.
+To see what information we're retrieving, please see the complete list [on the dedicated issue](https://github.com/meilisearch/MeiliSearch/issues/720).<br/>
We also use Sentry to make us crash and error reports. If you want to know more about what Sentry collects, please visit their [privacy policy website](https://sentry.io/privacy/).<br/>
This program is optional, you can disable these analytics by using the `MEILI_NO_ANALYTICS` env variable.
## 💌 Contact
Feel free to contact us about any questions you may have:
* At [bonjour@meilisearch.com](mailto:bonjour@meilisearch.com)
* Via the chat box available on every page of [our documentation](https://docs.meilisearch.com/) and on [our landing page](https://www.meilisearch.com/).
* 🆕 Join our [GitHub Discussions forum](https://github.com/meilisearch/MeiliSearch/discussions)
* Join our [Slack community](https://slack.meilisearch.com/).
* By opening an issue.
MeiliSearch is developed by [Meili](https://www.meilisearch.com), a young company. To know more about us, you can [read our blog](https://blog.meilisearch.com). Any suggestion or feedback is highly appreciated. Thank you for your support!

app.json (deleted)

@@ -1,16 +0,0 @@
{
  "name": "MeiliSearch",
  "description": "Ultra relevant, instant and typo-tolerant full-text search API",
  "keywords": [
    "search-engine",
    "instant search",
    "search API"
  ],
  "website": "https://docs.meilisearch.com/",
  "repository": "https://github.com/meilisearch/MeiliSearch",
  "buildpacks": [
    {
      "url": "https://github.com/emk/heroku-buildpack-rust"
    }
  ]
}

(binary image, 7.2 MiB before and after)

assets/do-btn-blue.svg Normal file (23 lines, 2.3 KiB)

@@ -0,0 +1,23 @@
<?xml version="1.0" encoding="UTF-8"?>
<svg width="200px" height="42px" viewBox="0 0 200 42" version="1.1" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink">
<!-- Generator: Sketch 52.5 (67469) - http://www.bohemiancoding.com/sketch -->
<title>do-btn-blue</title>
<desc>Created with Sketch.</desc>
<g id="Page-1" stroke="none" stroke-width="1" fill="none" fill-rule="evenodd">
<g id="Partner-welcome-kit-Copy-3" transform="translate(-651.000000, -762.000000)">
<g id="do-btn-blue" transform="translate(651.000000, 763.000000)">
<rect id="Rectangle-Copy" fill="#0069FF" x="0" y="0" width="200" height="40" rx="6"></rect>
<path d="M45,0 L45,40" id="Line-2" stroke="#FFFFFF" stroke-linecap="square"></path>
<g id="DO_Logo_horizontal_blue-Copy" transform="translate(13.000000, 11.000000)" fill="#FFFFFF">
<path d="M10.0098493,20 L10.0098493,16.1262429 C14.12457,16.1262429 17.2897398,12.0548452 15.7269372,7.74627862 C15.1334679,6.14538921 13.8674,4.86072487 12.2650328,4.28756693 C7.952489,2.72620566 3.87733294,5.88845634 3.87733294,9.99938223 C3.87733294,9.99938223 3.87733294,9.99938223 3.87733294,9.99938223 L0,9.99938223 C0,3.45747613 6.3303395,-1.64165309 13.1948014,0.492866119 C16.2017127,1.42177726 18.57559,3.81322933 19.5053586,6.79760341 C21.6418482,13.6754986 16.5577943,20 10.0098493,20 Z" id="XMLID_49_"></path>
<polygon id="XMLID_47_" points="9.56521739 15.6521739 6.08695652 15.6521739 6.08695652 12.173913 6.08695652 12.173913 9.56521739 12.173913 9.56521739 12.173913"></polygon>
<polygon id="XMLID_46_" points="6.08695652 19.1304348 3.47826087 19.1304348 3.47826087 19.1304348 3.47826087 16.5217391 6.08695652 16.5217391"></polygon>
<polygon id="XMLID_45_" points="3.47826087 16.5217391 0.869565217 16.5217391 0.869565217 16.5217391 0.869565217 13.9130435 0.869565217 13.9130435 3.47826087 13.9130435 3.47826087 13.9130435"></polygon>
</g>
<text id="Create-a-Droplet-Copy" font-family="Sailec-Medium, Sailec" font-size="16" font-weight="400" fill="#FFFFFF">
<tspan x="58" y="26">Create a Droplet</tspan>
</text>
</g>
</g>
</g>
</svg>


assets/logo.svg Normal file (17 lines, 2.0 KiB)

@@ -0,0 +1,17 @@
<svg width="360" height="360" viewBox="0 0 360 360" fill="none" xmlns="http://www.w3.org/2000/svg">
<g id="logo_main">
<rect id="Rectangle" x="107.333" y="0.150146" width="274.315" height="274.315" rx="98.8334" transform="rotate(23 107.333 0.150146)" fill="url(#paint0_linear)"/>
<path id="Rectangle_2" fill-rule="evenodd" clip-rule="evenodd" d="M61.3296 230.199C46.2224 194.608 38.6688 176.813 38.208 160.329C37.5286 136.025 47.0175 112.539 64.3891 95.5282C76.1718 83.9904 93.9669 76.4368 129.557 61.3296C165.147 46.2224 182.943 38.6688 199.427 38.208C223.731 37.5286 247.217 47.0175 264.228 64.3891C275.766 76.1718 283.319 93.9669 298.426 129.557C313.534 165.147 321.087 182.943 321.548 199.427C322.227 223.731 312.738 247.217 295.367 264.228C283.584 275.766 265.789 283.319 230.199 298.426C194.608 313.534 176.813 321.087 160.329 321.548C136.025 322.227 112.539 312.738 95.5282 295.367C83.9903 283.584 76.4368 265.789 61.3296 230.199Z" fill="url(#paint1_linear)"/>
<path id="m" fill-rule="evenodd" clip-rule="evenodd" d="M219.568 130.748C242.363 130.748 259.263 147.451 259.263 174.569V229.001H227.232V179.678C227.232 166.119 220.747 159.634 210.136 159.634C205.223 159.634 200.311 161.796 195.595 167.494C195.791 169.852 195.988 172.21 195.988 174.569V229.001H164.154V179.678C164.154 166.119 157.472 159.634 147.057 159.634C142.145 159.634 137.429 161.992 132.712 168.084V229.001H100.878V133.695H132.712V139.394C139.197 133.892 145.878 130.748 156.49 130.748C168.477 130.748 178.695 135.267 185.769 143.52C195.791 134.678 205.42 130.748 219.568 130.748Z" fill="white"/>
</g>
<defs>
<linearGradient id="paint0_linear" x1="-13.6248" y1="129.208" x2="244.49" y2="403.522" gradientUnits="userSpaceOnUse">
<stop stop-color="#E41359"/>
<stop offset="1" stop-color="#F23C79"/>
</linearGradient>
<linearGradient id="paint1_linear" x1="11.0088" y1="111.65" x2="111.65" y2="348.747" gradientUnits="userSpaceOnUse">
<stop stop-color="#24222F"/>
<stop offset="1" stop-color="#2B2937"/>
</linearGradient>
</defs>
</svg>


assets/movies-web-demo.gif Normal file (binary, 5.3 MiB)

Binary file not shown.

bors.toml Normal file (3 lines)

@@ -0,0 +1,3 @@
status = ["Test on macos-latest", "Test on ubuntu-latest"]
# 4 hours timeout
timeout-sec = 14400

bump.sh Executable file (38 lines)

@@ -0,0 +1,38 @@
#!/usr/bin/bash

NEW_VERSION=$1

if [ -z "$NEW_VERSION" ]
then
    echo "error: a version number must be provided"
    exit 1
fi

# find current version
CURRENT_VERSION=$(cat **/*.toml | grep meilisearch | grep version | sed 's/.*\([0-9]\+\.[0-9]\+\.[0-9]\+\).*/\1/' | sed "1q;d")

# bump all version in .toml
echo "bumping from version $CURRENT_VERSION to version $NEW_VERSION"
while true
do
    read -r -p "Continue (y/n)?" choice
    case "$choice" in
        y|Y ) break;;
        n|N ) echo "aborting bump" && exit 0;;
        * ) echo "invalid choice";;
    esac
done

# update all crate version
sed -i "s/version = \"$CURRENT_VERSION\"/version = \"$NEW_VERSION\"/" **/*.toml

printf "running cargo check: "
CARGO_CHECK=$(cargo check 2>&1)
if [ $? != "0" ]
then
    printf "\033[31;1m FAIL \033[0m\n"
    printf "$CARGO_CHECK"
    exit 1
fi
printf "\033[32;1m OK \033[0m\n"

File diff suppressed because it is too large

@@ -1,5 +1,4 @@
{
-  "primaryKey": "id",
  "searchableAttributes": ["title", "overview"],
  "displayedAttributes": [
    "id",

deep-dive.md (deleted)

@@ -1,95 +0,0 @@
# A deep dive in MeiliSearch
On the 15 of May 2019.
MeiliSearch is a full text search engine based on a final state transducer named [fst](https://github.com/BurntSushi/fst) and a key-value store named [sled](https://github.com/spacejam/sled). The goal of a search engine is to store data and to respond to queries as accurate and fast as possible. To achieve this it must save the matching words in an [inverted index](https://en.wikipedia.org/wiki/Inverted_index).
<!-- MarkdownTOC autolink="true" -->
- [Where is the data stored?](#where-is-the-data-stored)
- [What does the key-value store contains?](#what-does-the-key-value-store-contains)
- [The inverted word index](#the-inverted-word-index)
- [A final state transducer](#a-final-state-transducer)
- [Document indexes](#document-indexes)
- [The schema](#the-schema)
- [Document attributes](#document-attributes)
- [How is a request processed?](#how-is-a-request-processed)
- [Query lexemes](#query-lexemes)
- [Automatons and query index](#automatons-and-query-index)
- [Sort by criteria](#sort-by-criteria)
<!-- /MarkdownTOC -->
## Where is the data stored?
MeiliSearch is entirely backed by a key-value store like any good database (i.e. Postgres, MySQL). This brings a great flexibility in the way documents can be stored and updates handled along time.
[sled will brings some](https://github.com/spacejam/sled/tree/434533332a3f485e6d2e467023be0a0b55d3a1af#plans) of the [A.C.I.D. properties](https://en.wikipedia.org/wiki/ACID_(computer_science)) to help us be sure the saved data is consistent.
## What does the key-value store contains?
It contain the inverted word index, the schema and the documents fields.
### The inverted word index
[The inverted word index](https://github.com/meilisearch/MeiliSearch/blob/3db823de002243004612e36a19b4578d800dab97/meilisearch-data/src/database/words_index.rs) is a sled Tree dedicated to store and give access to all documents that contains a specific word. The information stored under the word is simply a big ordered array of where in the document the word has been found. In other word, a big list of [`DocIndex`](https://github.com/meilisearch/MeiliSearch/blob/3db823de002243004612e36a19b4578d800dab97/meilisearch-core/src/lib.rs#L35-L51).
#### A final state transducer
_...also abbreviated fst_
This is the first entry point of the engine, you can read more about how it work with the beautiful blog post of @BurntSushi, [Index 1,600,000,000 Keys with Automata and Rust](https://blog.burntsushi.net/transducers/).
To make it short it is a powerful way to store all the words that are present in the indexed documents. You construct it by giving it all the words you want to index. When you want to search in it you can provide any automaton you want, in MeiliSearch [a custom levenshtein automaton](https://github.com/tantivy-search/levenshtein-automata/) is used.
#### Document indexes
The `fst` will only return the words that match with the search automaton but the goal of the search engine is to retrieve all matches in all the documents when a query is made. You want it to return some sort of position in an attribute in a document, an information about where the given word matched.
To make it possible we retrieve all of the `DocIndex` corresponding to all the matching words in the fst, we use the [`WordsIndex`](https://github.com/meilisearch/MeiliSearch/blob/3db823de002243004612e36a19b4578d800dab97/meilisearch-data/src/database/words_index.rs#L11-L21) Tree to get the `DocIndexes` corresponding the words.
### The schema
The schema is a data structure that represents which documents attributes should be stored and which should be indexed. It is stored under a the [`MainIndex`](https://github.com/meilisearch/MeiliSearch/blob/3db823de002243004612e36a19b4578d800dab97/meilisearch-data/src/database/main_index.rs#L12) Tree and given to MeiliSearch only at the creation of an index.
Each document attribute is associated to a unique 16 bit number named [`SchemaAttr`](https://github.com/meilisearch/MeiliSearch/blob/3db823de002243004612e36a19b4578d800dab97/meilisearch-data/src/schema.rs#L186).
In the future, this schema type could be given along with updates, the database could be able to handled a new schema and reindex the database according to the new one.
### Document attributes
When the engine handle a query the result that the requester want is a document, not only the [`Matches`](https://github.com/meilisearch/MeiliSearch/blob/3db823de002243004612e36a19b4578d800dab97/meilisearch-core/src/lib.rs#L62-L88) associated to it, fields of the original document must be returned too.
So MeiliSearch again uses the power of the underlying key-value store and save the documents attributes marked as _STORE_ in the schema. The dedicated Tree for this information is the [`DocumentsIndex`](https://github.com/meilisearch/MeiliSearch/blob/3db823de002243004612e36a19b4578d800dab97/meilisearch-data/src/database/documents_index.rs#L11).
When a document field is saved in the key-value store its value is binary encoded using [message pack](https://github.com/3Hren/msgpack-rust), so a document must be serializable using serde.
## How is a request processed?
Now that we have our inverted index we are able to return results based on a query. In the MeiliSearch universe a query is a simple string containing words.
### Query lexemes
The first step to be able to call the underlying structures is to split the query in words, for that we use a [custom tokenizer](https://github.com/meilisearch/MeiliSearch/blob/3db823de002243004612e36a19b4578d800dab97/meilisearch-tokenizer/src/lib.rs#L82-L84). Note that a tokenizer is specialized for a human language, this is the hard part.
### Automatons and query index
So to query the fst we need an automaton, in MeiliSearch we use a [levenshtein automaton](https://en.wikipedia.org/wiki/Levenshtein_automaton), this automaton is constructed using a string and a maximum distance. According to the [Algolia's blog post](https://blog.algolia.com/inside-the-algolia-engine-part-3-query-processing/#algolia%e2%80%99s-way-of-searching-for-alternatives) we [created the DFAs](https://github.com/meilisearch/MeiliSearch/blob/3db823de002243004612e36a19b4578d800dab97/meilisearch-core/src/automaton.rs#L59-L78) with different settings.
Thanks to the power of the fst library [it is possible to union multiple automatons](https://docs.rs/fst/0.3.2/fst/map/struct.OpBuilder.html#method.union) on the same fst set. The `Stream` is able to return all the matching words. We use these words to find the whole list of `DocIndexes` associated.
With all these informations it is possible [to reconstruct a list of all the `DocIndexes` associated](https://github.com/meilisearch/MeiliSearch/blob/3db823de002243004612e36a19b4578d800dab97/meilisearch-core/src/query_builder.rs#L103-L130) with the words queried.
### Sort by criteria
Now that we are able to get a big list of [DocIndexes](https://github.com/Kerollmops/MeiliSearch/blob/550dc1e99224e386516877450320f694947332d4/src/lib.rs#L21-L36) it is not enough to sort them by criteria, we need more informations like the levenshtein distance or the fact that a query word match exactly the word stored in the fst. So [we stuff it a little bit](https://github.com/Kerollmops/MeiliSearch/blob/550dc1e99224e386516877450320f694947332d4/src/rank/query_builder.rs#L86-L93), and aggregate all these [Matches](https://github.com/Kerollmops/MeiliSearch/blob/550dc1e99224e386516877450320f694947332d4/src/lib.rs#L47-L74) for each document. This way it will be easy to sort a simple vector of document using a bunch of functions.
With this big list of documents and associated matches, [we are able to sort only the part of the slice that we want](https://github.com/meilisearch/MeiliSearch/blob/3db823de002243004612e36a19b4578d800dab97/meilisearch-core/src/query_builder.rs#L160-L188) using bucket sorting. [Each criterion](https://github.com/meilisearch/MeiliSearch/blob/3db823de002243004612e36a19b4578d800dab97/meilisearch-core/src/criterion/mod.rs#L95-L101) is evaluated on each subslice without copying, thanks to [GroupByMut](https://docs.rs/slice-group-by/0.2.4/slice_group_by/), which I hope [will soon be merged](https://github.com/rust-lang/rfcs/pull/2477); a sketch of the pattern follows.
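To make the bucket idea concrete, here is a small sketch with the `slice-group-by` crate (the `Doc` struct and its two criteria are invented; the real engine sorts `RawDocument`s with its criteria list):

```rust
use slice_group_by::GroupByMut;

// A toy document with two ranking criteria: fewer typos first, then
// more exactly matched words first.
#[derive(Debug)]
struct Doc {
    id: u64,
    typos: u8,
    exact: u8,
}

fn main() {
    let mut docs = vec![
        Doc { id: 1, typos: 1, exact: 1 },
        Doc { id: 2, typos: 0, exact: 1 },
        Doc { id: 3, typos: 0, exact: 2 },
    ];

    // First criterion over the whole slice.
    docs.sort_unstable_by_key(|d| d.typos);

    // Next criterion, applied in place to each subslice of documents
    // that the first criterion considers equal, without any copy.
    for group in docs.linear_group_by_mut(|a, b| a.typos == b.typos) {
        group.sort_unstable_by(|a, b| b.exact.cmp(&a.exact));
    }

    let ids: Vec<u64> = docs.iter().map(|d| d.id).collect();
    assert_eq!(ids, vec![3, 2, 1]);
}
```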
Note that it is possible to customize the criteria with the `QueryBuilder::with_criteria` constructor; this way you can implement custom ranking based on the document attributes, using the appropriate structure and the [`document` method](https://github.com/meilisearch/MeiliSearch/blob/3db823de002243004612e36a19b4578d800dab97/meilisearch-data/src/database/index.rs#L86).
At this point, MeiliSearch's work is over 🎉
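Putting the pieces together, here is a hedged sketch of what issuing a query looks like with the 0.15 API as it appears in the diffs below (the database path and index name are made up):

```rust
use meilisearch_core::{Database, DatabaseOptions};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Open (or create) the database, as in the updated examples below.
    let database = Database::open_or_create("/tmp/meili-example", DatabaseOptions::default())?;
    let index = database.open_index("movies").ok_or("index not found")?;

    // Searches run inside a read transaction on the main environment.
    let reader = database.main_read_txn()?;

    // `query` now takes an Option: passing `None` is a placeholder search.
    let result = index.query_builder().query(&reader, Some("kevin"), 0..20)?;
    println!("{} hits", result.nb_hits);
    for document in result.documents {
        println!("{:?}", document.id);
    }
    Ok(())
}
```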

View File

@@ -6,8 +6,10 @@ GREEN='\033[32m'
 DEFAULT='\033[0m'
 # GLOBALS
-GREP_SEMVER_REGEXP='\"v\([0-9]*\)[.]\([0-9]*\)[.]\([0-9]*\)\"' # i.e. "v[number].[number].[number]"
 BINARY_NAME='meilisearch'
+GREP_SEMVER_REGEXP='v\([0-9]*\)[.]\([0-9]*\)[.]\([0-9]*\)$' # i.e. v[number].[number].[number]
+# FUNCTIONS
 # semverParseInto and semverLT from https://github.com/cloudflare/semver_bash/blob/master/semver.sh
@@ -66,6 +68,88 @@ semverLT() {
 return 1
 }
# Returns the tag of the latest stable release (in terms of semver and not of release date)
get_latest() {
temp_file='temp_file' # temp_file needed because the grep would start before the download is over
curl -s 'https://api.github.com/repos/meilisearch/MeiliSearch/releases' > "$temp_file"
releases=$(cat "$temp_file" | \
grep -E "tag_name|draft|prerelease" \
| tr -d ',"' | cut -d ':' -f2 | tr -d ' ')
# Returns a list of [tag_name draft_boolean prerelease_boolean ...]
# Ex: v0.10.1 false false v0.9.1-rc.1 false true v0.9.0 false false...
i=0
latest=""
current_tag=""
for release_info in $releases; do
if [ $i -eq 0 ]; then # Cheking tag_name
if echo "$release_info" | grep -q "$GREP_SEMVER_REGEXP"; then # If it's not an alpha or beta release
current_tag=$release_info
else
current_tag=""
fi
i=1
elif [ $i -eq 1 ]; then # Checking draft boolean
if [ "$release_info" = "true" ]; then
current_tag=""
fi
i=2
elif [ $i -eq 2 ]; then # Checking prerelease boolean
if [ "$release_info" = "true" ]; then
current_tag=""
fi
i=0
if [ "$current_tag" != "" ]; then # If the current_tag is valid
if [ "$latest" = "" ]; then # If there is no latest yet
latest="$current_tag"
else
semverLT $current_tag $latest # Comparing latest and the current tag
if [ $? -eq 1 ]; then
latest="$current_tag"
fi
fi
fi
fi
done
rm -f "$temp_file"
echo $latest
}
# Gets the OS by setting the $os variable
# Returns 0 in case of success, 1 otherwise.
get_os() {
os_name=$(uname -s)
case "$os_name" in
'Darwin')
os='macos'
;;
'Linux')
os='linux'
;;
*)
return 1
esac
return 0
}
# Gets the architecture by setting the $archi variable
# Returns 0 in case of success, 1 otherwise.
get_archi() {
architecture=$(uname -m)
case "$architecture" in
'x86_64' | 'amd64')
archi='amd64'
;;
'aarch64')
archi='armv8'
;;
*)
return 1
esac
return 0
}
success_usage() {
printf "$GREEN%s\n$DEFAULT" "MeiliSearch binary successfully downloaded as '$BINARY_NAME' file."
echo ''
@@ -76,53 +160,27 @@ success_usage() {
 }
 failure_usage() {
-printf "$RED%s\n$DEFAULT" 'ERROR: MeiliSearch binary is not available for your OS distribution yet.'
+printf "$RED%s\n$DEFAULT" 'ERROR: MeiliSearch binary is not available for your OS distribution or your architecture yet.'
 echo ''
 echo 'However, you can easily compile the binary from the source files.'
-echo 'Follow the steps on the docs: https://docs.meilisearch.com/advanced_guides/binary.html#how-to-compile-meilisearch'
+echo 'Follow the steps at the page ("Source" tab): https://docs.meilisearch.com/guides/advanced_guides/installation.html'
 }
-# OS DETECTION
-echo 'Detecting OS distribution...'
-os_name=$(uname -s)
-if [ "$os_name" != "Darwin" ]; then
-os_name=$(cat /etc/os-release | grep '^ID=' | tr -d '"' | cut -d '=' -f 2)
-fi
-echo "OS distribution detected: $os_name"
-case "$os_name" in
-'Darwin')
-os='macos'
-;;
-'ubuntu' | 'debian')
-os='linux'
-;;
-*)
+# MAIN
+latest="$(get_latest)"
+get_os
+if [ "$?" -eq 1 ]; then
 failure_usage
 exit 1
-esac
-# GET LATEST VERSION
-tags=$(curl -s 'https://api.github.com/repos/meilisearch/MeiliSearch/tags' \
-| grep "$GREP_SEMVER_REGEXP" \
-| grep 'name' \
-| tr -d '"' | tr -d ',' | cut -d 'v' -f 2)
-latest=""
-for tag in $tags; do
-if [ "$latest" = "" ]; then
-latest="$tag"
-else
-semverLT $tag $latest
-if [ $? -eq 1 ]; then
-latest="$tag"
-fi
-fi
-done
-# DOWNLOAD THE LATEST
-echo "Downloading MeiliSearch binary v$latest for $os..."
-release_file="meilisearch-$os-amd64"
-link="https://github.com/meilisearch/MeiliSearch/releases/download/v$latest/$release_file"
+fi
+get_archi
+if [ "$?" -eq 1 ]; then
+failure_usage
+exit 1
+fi
+echo "Downloading MeiliSearch binary $latest for $os, architecture $archi..."
+release_file="meilisearch-$os-$archi"
+link="https://github.com/meilisearch/MeiliSearch/releases/download/$latest/$release_file"
 curl -OL "$link"
 mv "$release_file" "$BINARY_NAME"
 chmod 744 "$BINARY_NAME"

View File

@@ -1,49 +1,56 @@
 [package]
 name = "meilisearch-core"
-version = "0.9.0"
+version = "0.15.0"
 license = "MIT"
 authors = ["Kerollmops <clement@meilisearch.com>"]
 edition = "2018"
 [dependencies]
-arc-swap = "0.4.3"
+arc-swap = "0.4.5"
-bincode = "1.1.4"
+bincode = "1.2.1"
-byteorder = "1.3.2"
+byteorder = "1.3.4"
-chrono = { version = "0.4.9", features = ["serde"] }
+chrono = { version = "0.4.11", features = ["serde"] }
 compact_arena = "0.4.0"
-crossbeam-channel = "0.4.0"
-deunicode = "1.0.0"
-env_logger = "0.7.0"
-fst = { version = "0.3.5", default-features = false }
-hashbrown = { version = "0.6.0", features = ["serde"] }
-heed = "0.6.1"
-indexmap = { version = "1.2.0", features = ["serde-1"] }
+cow-utils = "0.1.2"
+crossbeam-channel = "0.4.2"
+deunicode = "1.1.0"
+either = "1.5.3"
+env_logger = "0.7.1"
+fst = "0.4.4"
+hashbrown = { version = "0.7.1", features = ["serde"] }
+heed = "0.8.0"
+indexmap = { version = "1.3.2", features = ["serde-1"] }
 intervaltree = "0.2.5"
-itertools = "0.8.2"
+itertools = "0.9.0"
-levenshtein_automata = { version = "0.1.1", features = ["fst_automaton"] }
+levenshtein_automata = { version = "0.2.0", features = ["fst_automaton"] }
 log = "0.4.8"
-meilisearch-schema = { path = "../meilisearch-schema", version = "0.9.0" }
-meilisearch-tokenizer = { path = "../meilisearch-tokenizer", version = "0.9.0" }
-meilisearch-types = { path = "../meilisearch-types", version = "0.9.0" }
-once_cell = "1.2.0"
+meilisearch-error = { path = "../meilisearch-error", version = "0.15.0" }
+meilisearch-schema = { path = "../meilisearch-schema", version = "0.15.0" }
+meilisearch-tokenizer = { path = "../meilisearch-tokenizer", version = "0.15.0" }
+meilisearch-types = { path = "../meilisearch-types", version = "0.15.0" }
+once_cell = "1.3.1"
 ordered-float = { version = "1.0.2", features = ["serde"] }
-regex = "1.3.1"
-sdset = "0.3.6"
-serde = { version = "1.0.101", features = ["derive"] }
-serde_json = "1.0.41"
-siphasher = "0.3.1"
+pest = { git = "https://github.com/pest-parser/pest.git", rev = "51fd1d49f1041f7839975664ef71fe15c7dcaf67" }
+pest_derive = "2.0"
+regex = "1.3.6"
+sdset = "0.4.0"
+serde = { version = "1.0.105", features = ["derive"] }
+serde_json = { version = "1.0.50", features = ["preserve_order"] }
 slice-group-by = "0.2.6"
-zerocopy = "0.2.8"
+unicase = "2.6.0"
+zerocopy = "0.3.0"
 [dev-dependencies]
-assert_matches = "1.3"
+assert_matches = "1.3.0"
-criterion = "0.3"
+criterion = "0.3.1"
-csv = "1.0.7"
+csv = "1.1.3"
-jemallocator = "0.3.2"
-rustyline = { version = "5.0.0", default-features = false }
-structopt = "0.3.2"
+rustyline = { version = "6.0.0", default-features = false }
+structopt = "0.3.12"
 tempfile = "3.1.0"
-termcolor = "1.0.4"
+termcolor = "1.1.0"
+[target.'cfg(unix)'.dev-dependencies]
+jemallocator = "0.3.2"
 [[bench]]
 name = "search_benchmark"

View File

@@ -2,19 +2,23 @@
 #[macro_use]
 extern crate assert_matches;
-use std::sync::mpsc;
-use std::path::Path;
-use std::fs;
+use std::error::Error;
+use std::fs::File;
+use std::io::BufReader;
 use std::iter;
+use std::path::Path;
+use std::sync::mpsc;
-use meilisearch_core::Database;
+use meilisearch_core::{Database, DatabaseOptions};
 use meilisearch_core::{ProcessedUpdateResult, UpdateStatus};
+use meilisearch_core::settings::{Settings, SettingsUpdate};
+use meilisearch_schema::Schema;
 use serde_json::Value;
 use criterion::{criterion_group, criterion_main, Criterion, BenchmarkId};
 fn prepare_database(path: &Path) -> Database {
-let database = Database::open_or_create(path).unwrap();
+let database = Database::open_or_create(path, DatabaseOptions::default()).unwrap();
 let db = &database;
 let (sender, receiver) = mpsc::sync_channel(100);
@@ -25,21 +29,29 @@ fn prepare_database(path: &Path) -> Database {
 database.set_update_callback(Box::new(update_fn));
-let schema = {
-let path = concat!(env!("CARGO_MANIFEST_DIR"), "/../datasets/movies/schema.toml");
-let string = fs::read_to_string(path).expect("find schema");
-toml::from_str(&string).unwrap()
+db.main_write::<_, _, Box<dyn Error>>(|writer| {
+index.main.put_schema(writer, &Schema::with_primary_key("id")).unwrap();
+Ok(())
+}).unwrap();
+let settings_update: SettingsUpdate = {
+let path = concat!(env!("CARGO_MANIFEST_DIR"), "/../datasets/movies/settings.json");
+let file = File::open(path).unwrap();
+let reader = BufReader::new(file);
+let settings: Settings = serde_json::from_reader(reader).unwrap();
+settings.to_update().unwrap()
 };
-let mut update_writer = db.update_write_txn().unwrap();
-let _update_id = index.schema_update(&mut update_writer, schema).unwrap();
-update_writer.commit().unwrap();
+db.update_write::<_, _, Box<dyn Error>>(|writer| {
+let _update_id = index.settings_update(writer, settings_update).unwrap();
+Ok(())
+}).unwrap();
 let mut additions = index.documents_addition();
 let json: Value = {
 let path = concat!(env!("CARGO_MANIFEST_DIR"), "/../datasets/movies/movies.json");
-let movies_file = fs::File::open(path).expect("find movies");
+let movies_file = File::open(path).expect("find movies");
 serde_json::from_reader(movies_file).unwrap()
 };
@@ -49,9 +61,10 @@ fn prepare_database(path: &Path) -> Database {
 additions.update_document(document);
 }
-let mut update_writer = db.update_write_txn().unwrap();
-let update_id = additions.finalize(&mut update_writer).unwrap();
-update_writer.commit().unwrap();
+let update_id = db.update_write::<_, _, Box<dyn Error>>(|writer| {
+let update_id = additions.finalize(writer).unwrap();
+Ok(update_id)
+}).unwrap();
 // block until the transaction is processed
 let _ = receiver.into_iter().find(|id| *id == update_id);

View File

@@ -12,11 +12,11 @@ use serde::{Deserialize, Serialize};
 use structopt::StructOpt;
 use termcolor::{Color, ColorChoice, ColorSpec, StandardStream, WriteColor};
-use meilisearch_core::{Database, Highlight, ProcessedUpdateResult};
+use meilisearch_core::{Database, DatabaseOptions, Highlight, ProcessedUpdateResult};
 use meilisearch_core::settings::Settings;
 use meilisearch_schema::FieldId;
-// #[cfg(target_os = "linux")]
+#[cfg(target_os = "linux")]
 #[global_allocator]
 static ALLOC: jemallocator::Jemalloc = jemallocator::Jemalloc;
@@ -123,12 +123,10 @@ fn index_command(command: IndexCommand, database: Database) -> Result<(), Box<dy
 let settings = {
 let string = fs::read_to_string(&command.settings)?;
 let settings: Settings = serde_json::from_str(&string).unwrap();
-settings.into_update().unwrap()
+settings.to_update().unwrap()
 };
-let mut update_writer = db.update_write_txn().unwrap();
-index.settings_update(&mut update_writer, settings)?;
-update_writer.commit().unwrap();
+db.update_write(|w| index.settings_update(w, settings))?;
 let mut rdr = if command.csv_data_path.as_os_str() == "-" {
 csv::Reader::from_reader(Box::new(io::stdin()) as Box<dyn Read>)
@@ -175,10 +173,9 @@ fn index_command(command: IndexCommand, database: Database) -> Result<(), Box<dy
 println!();
-let mut update_writer = db.update_write_txn().unwrap();
+let update_id = db.update_write(|w| additions.finalize(w))?;
 println!("committing update...");
-let update_id = additions.finalize(&mut update_writer)?;
-update_writer.commit().unwrap();
 max_update_id = max_update_id.max(update_id);
 println!("committed update {}", update_id);
 }
@@ -325,7 +322,7 @@ fn search_command(command: SearchCommand, database: Database) -> Result<(), Box<
 let reader = db.main_read_txn().unwrap();
 let schema = index.main.schema(&reader)?;
-reader.abort();
+reader.abort().unwrap();
 let schema = schema.ok_or(meilisearch_core::Error::SchemaMissing)?;
@@ -371,12 +368,12 @@ fn search_command(command: SearchCommand, database: Database) -> Result<(), Box<
 });
 }
-let documents = builder.query(ref_reader, &query, 0..command.number_results)?;
+let result = builder.query(ref_reader, Some(&query), 0..command.number_results)?;
 let mut retrieve_duration = Duration::default();
-let number_of_documents = documents.len();
-for mut doc in documents {
+let number_of_documents = result.documents.len();
+for mut doc in result.documents {
 doc.highlights
 .sort_unstable_by_key(|m| (m.char_index, m.char_length));
@@ -454,7 +451,7 @@ fn show_updates_command(
 let reader = db.update_read_txn().unwrap();
 let updates = index.all_updates_status(&reader)?;
 println!("{:#?}", updates);
-reader.abort();
+reader.abort().unwrap();
 Ok(())
 }
@@ -463,7 +460,7 @@ fn main() -> Result<(), Box<dyn Error>> {
 env_logger::init();
 let opt = Command::from_args();
-let database = Database::open_or_create(opt.path())?;
+let database = Database::open_or_create(opt.path(), DatabaseOptions::default())?;
 match opt {
 Command::Index(command) => index_command(command, database),

View File

@@ -9,35 +9,42 @@ use std::time::Instant;
 use std::fmt;
 use compact_arena::{SmallArena, Idx32, mk_arena};
-use log::debug;
-use meilisearch_types::DocIndex;
-use sdset::{Set, SetBuf, exponential_search};
+use log::{debug, error};
+use sdset::{Set, SetBuf, exponential_search, SetOperation, Counter, duo::OpBuilder};
 use slice_group_by::{GroupBy, GroupByMut};
-use crate::error::Error;
+use meilisearch_types::DocIndex;
 use crate::criterion::{Criteria, Context, ContextMut};
 use crate::distinct_map::{BufferedDistinctMap, DistinctMap};
 use crate::raw_document::RawDocument;
 use crate::{database::MainT, reordered_attrs::ReorderedAttrs};
-use crate::{store, Document, DocumentId, MResult};
+use crate::{store, Document, DocumentId, MResult, Index, RankedMap, MainReader, Error};
 use crate::query_tree::{create_query_tree, traverse_query_tree};
 use crate::query_tree::{Operation, QueryResult, QueryKind, QueryId, PostingsKey};
 use crate::query_tree::Context as QTContext;
+#[derive(Debug, Default)]
+pub struct SortResult {
+pub documents: Vec<Document>,
+pub nb_hits: usize,
+pub exhaustive_nb_hit: bool,
+pub facets: Option<HashMap<String, HashMap<String, usize>>>,
+pub exhaustive_facets_count: Option<bool>,
+}
+#[allow(clippy::too_many_arguments)]
 pub fn bucket_sort<'c, FI>(
 reader: &heed::RoTxn<MainT>,
 query: &str,
 range: Range<usize>,
+facets_docids: Option<SetBuf<DocumentId>>,
+facet_count_docids: Option<HashMap<String, HashMap<String, (&str, Cow<Set<DocumentId>>)>>>,
 filter: Option<FI>,
 criteria: Criteria<'c>,
 searchable_attrs: Option<ReorderedAttrs>,
-main_store: store::Main,
-postings_lists_store: store::PostingsLists,
-documents_fields_counts_store: store::DocumentsFieldsCounts,
-synonyms_store: store::Synonyms,
-prefix_documents_cache_store: store::PrefixDocumentsCache,
-prefix_postings_lists_cache_store: store::PrefixPostingsListsCache,
-) -> MResult<Vec<Document>>
+index: &Index,
+) -> MResult<SortResult>
 where
 FI: Fn(DocumentId) -> bool,
 {
@@ -50,33 +57,28 @@ where
 reader,
 query,
 range,
+facets_docids,
+facet_count_docids,
 filter,
 distinct,
 distinct_size,
 criteria,
 searchable_attrs,
-main_store,
-postings_lists_store,
-documents_fields_counts_store,
-synonyms_store,
-prefix_documents_cache_store,
-prefix_postings_lists_cache_store,
+index,
 );
 }
-let words_set = match unsafe { main_store.static_words_fst(reader)? } {
-Some(words) => words,
-None => return Ok(Vec::new()),
-};
-let stop_words = main_store.stop_words_fst(reader)?.unwrap_or_default();
+let mut result = SortResult::default();
+let words_set = index.main.words_fst(reader)?;
+let stop_words = index.main.stop_words_fst(reader)?;
 let context = QTContext {
 words_set,
 stop_words,
-synonyms: synonyms_store,
-postings_lists: postings_lists_store,
-prefix_postings_lists: prefix_postings_lists_cache_store,
+synonyms: index.synonyms,
+postings_lists: index.postings_lists,
+prefix_postings_lists: index.prefix_postings_lists_cache,
 };
 let (operation, mapping) = create_query_tree(reader, &context, query)?;
@@ -94,10 +96,23 @@ where
 let mut queries_kinds = HashMap::new();
 recurs_operation(&mut queries_kinds, &operation);
-let QueryResult { docids, queries } = traverse_query_tree(reader, &context, &operation)?;
+let QueryResult { mut docids, queries } = traverse_query_tree(reader, &context, &operation)?;
 debug!("found {} documents", docids.len());
 debug!("number of postings {:?}", queries.len());
+if let Some(facets_docids) = facets_docids {
+let intersection = sdset::duo::OpBuilder::new(docids.as_ref(), facets_docids.as_set())
+.intersection()
+.into_set_buf();
+docids = Cow::Owned(intersection);
+}
+if let Some(f) = facet_count_docids {
+// hardcoded value, until approximation optimization
+result.exhaustive_facets_count = Some(true);
+result.facets = Some(facet_count(f, &docids));
+}
 let before = Instant::now();
 mk_arena!(arena);
 let mut bare_matches = cleanup_bare_matches(&mut arena, &docids, queries);
@@ -132,7 +147,7 @@ where
 reader,
 postings_lists: &mut arena,
 query_mapping: &mapping,
-documents_fields_counts_store,
+documents_fields_counts_store: index.documents_fields_counts,
 };
 criterion.prepare(ctx, &mut group)?;
@@ -165,49 +180,48 @@ where
 debug!("criterion loop took {:.02?}", before_criterion_loop.elapsed());
 debug!("proximity evaluation called {} times", proximity_count.load(Ordering::Relaxed));
-let schema = main_store.schema(reader)?.ok_or(Error::SchemaMissing)?;
+let schema = index.main.schema(reader)?.ok_or(Error::SchemaMissing)?;
 let iter = raw_documents.into_iter().skip(range.start).take(range.len());
 let iter = iter.map(|rd| Document::from_raw(rd, &queries_kinds, &arena, searchable_attrs.as_ref(), &schema));
 let documents = iter.collect();
 debug!("bucket sort took {:.02?}", before_bucket_sort.elapsed());
-Ok(documents)
+result.documents = documents;
+result.nb_hits = docids.len();
+Ok(result)
 }
+#[allow(clippy::too_many_arguments)]
 pub fn bucket_sort_with_distinct<'c, FI, FD>(
 reader: &heed::RoTxn<MainT>,
 query: &str,
 range: Range<usize>,
+facets_docids: Option<SetBuf<DocumentId>>,
+facet_count_docids: Option<HashMap<String, HashMap<String, (&str, Cow<Set<DocumentId>>)>>>,
 filter: Option<FI>,
 distinct: FD,
 distinct_size: usize,
 criteria: Criteria<'c>,
 searchable_attrs: Option<ReorderedAttrs>,
-main_store: store::Main,
-postings_lists_store: store::PostingsLists,
-documents_fields_counts_store: store::DocumentsFieldsCounts,
-synonyms_store: store::Synonyms,
-_prefix_documents_cache_store: store::PrefixDocumentsCache,
-prefix_postings_lists_cache_store: store::PrefixPostingsListsCache,
-) -> MResult<Vec<Document>>
+index: &Index,
+) -> MResult<SortResult>
 where
 FI: Fn(DocumentId) -> bool,
 FD: Fn(DocumentId) -> Option<u64>,
 {
-let words_set = match unsafe { main_store.static_words_fst(reader)? } {
-Some(words) => words,
-None => return Ok(Vec::new()),
-};
-let stop_words = main_store.stop_words_fst(reader)?.unwrap_or_default();
+let mut result = SortResult::default();
+let words_set = index.main.words_fst(reader)?;
+let stop_words = index.main.stop_words_fst(reader)?;
 let context = QTContext {
 words_set,
 stop_words,
-synonyms: synonyms_store,
-postings_lists: postings_lists_store,
-prefix_postings_lists: prefix_postings_lists_cache_store,
+synonyms: index.synonyms,
+postings_lists: index.postings_lists,
+prefix_postings_lists: index.prefix_postings_lists_cache,
 };
 let (operation, mapping) = create_query_tree(reader, &context, query)?;
@@ -225,10 +239,23 @@ where
 let mut queries_kinds = HashMap::new();
 recurs_operation(&mut queries_kinds, &operation);
-let QueryResult { docids, queries } = traverse_query_tree(reader, &context, &operation)?;
+let QueryResult { mut docids, queries } = traverse_query_tree(reader, &context, &operation)?;
 debug!("found {} documents", docids.len());
 debug!("number of postings {:?}", queries.len());
+if let Some(facets_docids) = facets_docids {
+let intersection = OpBuilder::new(docids.as_ref(), facets_docids.as_set())
+.intersection()
+.into_set_buf();
+docids = Cow::Owned(intersection);
+}
+if let Some(f) = facet_count_docids {
+// hardcoded value, until approximation optimization
+result.exhaustive_facets_count = Some(true);
+result.facets = Some(facet_count(f, &docids));
+}
 let before = Instant::now();
 mk_arena!(arena);
 let mut bare_matches = cleanup_bare_matches(&mut arena, &docids, queries);
@@ -273,7 +300,7 @@ where
 reader,
 postings_lists: &mut arena,
 query_mapping: &mapping,
-documents_fields_counts_store,
+documents_fields_counts_store: index.documents_fields_counts,
 };
 let before_criterion_preparation = Instant::now();
@@ -338,17 +365,23 @@ where
 // once we classified the documents related to the current
 // automatons we save that as the next valid result
 let mut seen = BufferedDistinctMap::new(&mut distinct_map);
-let schema = main_store.schema(reader)?.ok_or(Error::SchemaMissing)?;
+let schema = index.main.schema(reader)?.ok_or(Error::SchemaMissing)?;
 let mut documents = Vec::with_capacity(range.len());
 for raw_document in raw_documents.into_iter().skip(distinct_raw_offset) {
 let filter_accepted = match &filter {
-Some(_) => filter_map.remove(&raw_document.id).unwrap(),
+Some(_) => filter_map.remove(&raw_document.id).unwrap_or_else(|| {
+error!("error during filtering: expected value for document id {}", &raw_document.id.0);
+Default::default()
+}),
 None => true,
 };
 if filter_accepted {
-let key = key_cache.remove(&raw_document.id).unwrap(),
+let key = key_cache.remove(&raw_document.id).unwrap_or_else(|| {
+error!("error during distinct: expected value for document id {}", &raw_document.id.0);
+Default::default()
+});
 let distinct_accepted = match key {
 Some(key) => seen.register(key),
 None => seen.register_without_key(),
@@ -362,8 +395,10 @@ where
 }
 }
 }
-Ok(documents)
+result.documents = documents;
+result.nb_hits = docids.len();
+Ok(result)
 }
 fn cleanup_bare_matches<'tag, 'txn>(
@@ -558,3 +593,69 @@ impl Deref for PostingsListView<'_> {
 }
 }
 }
/// sorts documents ids according to user defined ranking rules.
pub fn placeholder_document_sort(
document_ids: &mut [DocumentId],
index: &store::Index,
reader: &MainReader,
ranked_map: &RankedMap
) -> MResult<()> {
use crate::settings::RankingRule;
use std::cmp::Ordering;
enum SortOrder {
Asc,
Desc,
}
if let Some(ranking_rules) = index.main.ranking_rules(reader)? {
let schema = index.main.schema(reader)?
.ok_or(Error::SchemaMissing)?;
// Select custom rules from ranking rules, and map them to custom rules
// containing a field_id
let ranking_rules = ranking_rules.iter().filter_map(|r|
match r {
RankingRule::Asc(name) => schema.id(name).map(|f| (f, SortOrder::Asc)),
RankingRule::Desc(name) => schema.id(name).map(|f| (f, SortOrder::Desc)),
_ => None,
}).collect::<Vec<_>>();
document_ids.sort_unstable_by(|a, b| {
for (field_id, order) in &ranking_rules {
let a_value = ranked_map.get(*a, *field_id);
let b_value = ranked_map.get(*b, *field_id);
let (a, b) = match order {
SortOrder::Asc => (a_value, b_value),
SortOrder::Desc => (b_value, a_value),
};
match a.cmp(&b) {
Ordering::Equal => continue,
ordering => return ordering,
}
}
Ordering::Equal
});
}
Ok(())
}
/// For each entry in facet_docids, calculates the number of documents in the intersection with candidate_docids.
pub fn facet_count(
facet_docids: HashMap<String, HashMap<String, (&str, Cow<Set<DocumentId>>)>>,
candidate_docids: &Set<DocumentId>,
) -> HashMap<String, HashMap<String, usize>> {
let mut facets_counts = HashMap::with_capacity(facet_docids.len());
for (key, doc_map) in facet_docids {
let mut count_map = HashMap::with_capacity(doc_map.len());
for (_, (value, docids)) in doc_map {
let mut counter = Counter::new();
let op = OpBuilder::new(docids.as_ref(), candidate_docids).intersection();
SetOperation::<DocumentId>::extend_collection(op, &mut counter);
count_map.insert(value.to_string(), counter.0);
}
facets_counts.insert(key, count_map);
}
facets_counts
}

View File

@@ -92,6 +92,7 @@ impl<'a> CriteriaBuilder<'a> {
 self.inner.reserve(additional)
 }
+#[allow(clippy::should_implement_trait)]
 pub fn add<C: 'a>(mut self, criterion: C) -> CriteriaBuilder<'a>
 where
 C: Criterion,

View File

@@ -22,6 +22,7 @@ impl Criterion for Typo {
 // It is safe to panic on input number higher than 3,
 // the number of typos is never bigger than that.
 #[inline]
+#[allow(clippy::approx_constant)]
 fn custom_log10(n: u8) -> f32 {
 match n {
 0 => 0.0, // log(1)

View File

@@ -3,18 +3,33 @@ use std::fs::File;
 use std::path::Path;
 use std::sync::{Arc, RwLock};
 use std::{fs, thread};
+use std::io::{Read, Write, ErrorKind};
+use chrono::{DateTime, Utc};
 use crossbeam_channel::{Receiver, Sender};
-use heed::types::{Str, Unit};
-use heed::{CompactionOption, Result as ZResult};
-use log::debug;
+use heed::CompactionOption;
+use heed::types::{Str, Unit, SerdeBincode};
+use log::{debug, error};
 use meilisearch_schema::Schema;
+use regex::Regex;
-use crate::{store, update, Index, MResult};
+use crate::{store, update, Index, MResult, Error};
 pub type BoxUpdateFn = Box<dyn Fn(&str, update::ProcessedUpdateResult) + Send + Sync + 'static>;
 type ArcSwapFn = arc_swap::ArcSwapOption<BoxUpdateFn>;
+type SerdeDatetime = SerdeBincode<DateTime<Utc>>;
+pub type MainWriter<'a> = heed::RwTxn<'a, MainT>;
+pub type MainReader = heed::RoTxn<MainT>;
+pub type UpdateWriter<'a> = heed::RwTxn<'a, UpdateT>;
+pub type UpdateReader = heed::RoTxn<UpdateT>;
+const UNHEALTHY_KEY: &str = "_is_unhealthy";
+const LAST_UPDATE_KEY: &str = "last-update";
 pub struct MainT;
 pub struct UpdateT;
@@ -25,6 +40,21 @@ pub struct Database {
 indexes_store: heed::Database<Str, Unit>,
 indexes: RwLock<HashMap<String, (Index, thread::JoinHandle<MResult<()>>)>>,
 update_fn: Arc<ArcSwapFn>,
+database_version: (u32, u32, u32),
+}
+pub struct DatabaseOptions {
+pub main_map_size: usize,
+pub update_map_size: usize,
+}
+impl Default for DatabaseOptions {
+fn default() -> DatabaseOptions {
+DatabaseOptions {
+main_map_size: 100 * 1024 * 1024 * 1024, //100Gb
+update_map_size: 100 * 1024 * 1024 * 1024, //100Gb
+}
+}
 }
 macro_rules! r#break_try {
@@ -55,8 +85,7 @@ fn update_awaiter(
 update_fn: Arc<ArcSwapFn>,
 index: Index,
 ) -> MResult<()> {
-let mut receiver = receiver.into_iter();
-while let Some(event) = receiver.next() {
+for event in receiver {
 // if we receive a *MustClear* event, clear the index and break the loop
 if let UpdateEvent::MustClear = event {
@@ -90,7 +119,7 @@ fn update_awaiter(
 };
 // do not keep the reader for too long
-update_reader.abort();
+break_try!(update_reader.abort(), "aborting update transaction failed");
 // instantiate a transaction to touch to the main env
 let result = env.typed_write_txn::<MainT>();
@@ -104,7 +133,7 @@ fn update_awaiter(
 if status.error.is_none() {
 break_try!(main_writer.commit(), "commit nested transaction failed");
 } else {
-main_writer.abort()
+break_try!(main_writer.abort(), "abborting nested transaction failed");
 }
 // now that the update has been processed we can instantiate
@@ -135,20 +164,90 @@ fn update_awaiter(
 Ok(())
 }
/// Ensures Meilisearch version is compatible with the database, returns an error versions mismatch.
/// If create is set to true, a VERSION file is created with the current version.
fn version_guard(path: &Path, create: bool) -> MResult<(u32, u32, u32)> {
let current_version_major = env!("CARGO_PKG_VERSION_MAJOR");
let current_version_minor = env!("CARGO_PKG_VERSION_MINOR");
let current_version_patch = env!("CARGO_PKG_VERSION_PATCH");
let version_path = path.join("VERSION");
match File::open(&version_path) {
Ok(mut file) => {
let mut version = String::new();
file.read_to_string(&mut version)?;
// Matches strings like XX.XX.XX
let re = Regex::new(r"(\d+).(\d+).(\d+)").unwrap();
// Make sure there is a result
let version = re
.captures_iter(&version)
.next()
.ok_or_else(|| Error::VersionMismatch("bad VERSION file".to_string()))?;
// the first is always the complete match, safe to unwrap because we have a match
let version_major = version.get(1).unwrap().as_str();
let version_minor = version.get(2).unwrap().as_str();
let version_patch = version.get(3).unwrap().as_str();
if version_major != current_version_major || version_minor != current_version_minor {
Err(Error::VersionMismatch(format!("{}.{}.XX", version_major, version_minor)))
} else {
Ok((
version_major.parse().or_else(|e| Err(Error::VersionMismatch(format!("error parsing database version: {}", e))))?,
version_minor.parse().or_else(|e| Err(Error::VersionMismatch(format!("error parsing database version: {}", e))))?,
version_patch.parse().or_else(|e| Err(Error::VersionMismatch(format!("error parsing database version: {}", e))))?
))
}
}
Err(error) => {
match error.kind() {
ErrorKind::NotFound => {
if create {
// when no version file is found, and we've been told to create one,
// create a new file with the current version in it.
let mut version_file = File::create(&version_path)?;
version_file.write_all(format!("{}.{}.{}",
current_version_major,
current_version_minor,
current_version_patch).as_bytes())?;
Ok((
current_version_major.parse().or_else(|e| Err(Error::VersionMismatch(format!("error parsing database version: {}", e))))?,
current_version_minor.parse().or_else(|e| Err(Error::VersionMismatch(format!("error parsing database version: {}", e))))?,
current_version_patch.parse().or_else(|e| Err(Error::VersionMismatch(format!("error parsing database version: {}", e))))?
))
} else {
// when no version file is found and we were not told to create one, this
// means that the version is inferior to the one this feature was added in.
Err(Error::VersionMismatch("<0.12.0".to_string()))
}
}
_ => Err(error.into())
}
}
}
}
 impl Database {
-pub fn open_or_create(path: impl AsRef<Path>) -> MResult<Database> {
+pub fn open_or_create(path: impl AsRef<Path>, options: DatabaseOptions) -> MResult<Database> {
 let main_path = path.as_ref().join("main");
 let update_path = path.as_ref().join("update");
+//create db directory
+fs::create_dir_all(&path)?;
+// create file only if main db wasn't created before (first run)
+let database_version = version_guard(path.as_ref(), !main_path.exists() && !update_path.exists())?;
 fs::create_dir_all(&main_path)?;
 let env = heed::EnvOpenOptions::new()
-.map_size(100 * 1024 * 1024 * 1024) // 100GB
+.map_size(options.main_map_size)
 .max_dbs(3000)
 .open(main_path)?;
 fs::create_dir_all(&update_path)?;
 let update_env = heed::EnvOpenOptions::new()
-.map_size(100 * 1024 * 1024 * 1024) // 100GB
+.map_size(options.update_map_size)
 .max_dbs(3000)
 .open(update_path)?;
@@ -164,7 +263,7 @@ impl Database {
 must_open.push(index_uid.to_owned());
 }
-reader.abort();
+reader.abort()?;
 // open the previously aggregated indexes
 let mut indexes = HashMap::new();
@@ -216,6 +315,7 @@ impl Database {
 indexes_store,
 indexes: RwLock::new(indexes),
 update_fn,
+database_version,
 })
 }
@@ -227,6 +327,13 @@ impl Database {
 }
 }
+pub fn is_indexing(&self, reader: &UpdateReader, index: &str) -> MResult<Option<bool>> {
+match self.open_index(&index) {
+Some(index) => index.current_update_id(&reader).map(|u| Some(u.is_some())),
+None => Ok(None),
+}
+}
 pub fn create_index(&self, name: impl AsRef<str>) -> MResult<Index> {
 let name = name.as_ref();
 let mut indexes_lock = self.indexes.write().unwrap();
@@ -305,31 +412,90 @@ impl Database {
 self.update_fn.swap(None);
 }
-pub fn main_read_txn(&self) -> heed::Result<heed::RoTxn<MainT>> {
-self.env.typed_read_txn::<MainT>()
-}
-pub fn main_write_txn(&self) -> heed::Result<heed::RwTxn<MainT>> {
-self.env.typed_write_txn::<MainT>()
-}
-pub fn update_read_txn(&self) -> heed::Result<heed::RoTxn<UpdateT>> {
-self.update_env.typed_read_txn::<UpdateT>()
-}
-pub fn update_write_txn(&self) -> heed::Result<heed::RwTxn<UpdateT>> {
-self.update_env.typed_write_txn::<UpdateT>()
-}
-pub fn copy_and_compact_to_path<P: AsRef<Path>>(&self, path: P) -> ZResult<(File, File)> {
+pub fn main_read_txn(&self) -> MResult<MainReader> {
+Ok(self.env.typed_read_txn::<MainT>()?)
+}
+pub(crate) fn main_write_txn(&self) -> MResult<MainWriter> {
+Ok(self.env.typed_write_txn::<MainT>()?)
+}
+/// Calls f providing it with a writer to the main database. After f is called, makes sure the
+/// transaction is commited. Returns whatever result f returns.
+pub fn main_write<F, R, E>(&self, f: F) -> Result<R, E>
+where
+F: FnOnce(&mut MainWriter) -> Result<R, E>,
+E: From<Error>,
+{
+let mut writer = self.main_write_txn()?;
+let result = f(&mut writer)?;
+writer.commit().map_err(Error::Heed)?;
+Ok(result)
+}
+/// provides a context with a reader to the main database. experimental.
+pub fn main_read<F, R, E>(&self, f: F) -> Result<R, E>
+where
+F: FnOnce(&MainReader) -> Result<R, E>,
+E: From<Error>,
+{
+let reader = self.main_read_txn()?;
+let result = f(&reader)?;
+reader.abort().map_err(Error::Heed)?;
+Ok(result)
+}
+pub fn update_read_txn(&self) -> MResult<UpdateReader> {
+Ok(self.update_env.typed_read_txn::<UpdateT>()?)
+}
+pub(crate) fn update_write_txn(&self) -> MResult<heed::RwTxn<UpdateT>> {
+Ok(self.update_env.typed_write_txn::<UpdateT>()?)
+}
+/// Calls f providing it with a writer to the main database. After f is called, makes sure the
+/// transaction is commited. Returns whatever result f returns.
+pub fn update_write<F, R, E>(&self, f: F) -> Result<R, E>
+where
+F: FnOnce(&mut UpdateWriter) -> Result<R, E>,
+E: From<Error>,
+{
+let mut writer = self.update_write_txn()?;
+let result = f(&mut writer)?;
+writer.commit().map_err(Error::Heed)?;
+Ok(result)
+}
+/// provides a context with a reader to the update database. experimental.
+pub fn update_read<F, R, E>(&self, f: F) -> Result<R, E>
+where
+F: FnOnce(&UpdateReader) -> Result<R, E>,
+E: From<Error>,
+{
+let reader = self.update_read_txn()?;
+let result = f(&reader)?;
+reader.abort().map_err(Error::Heed)?;
+Ok(result)
+}
+pub fn copy_and_compact_to_path<P: AsRef<Path>>(&self, path: P) -> MResult<(File, File)> {
 let path = path.as_ref();
 let env_path = path.join("main");
 let env_update_path = path.join("update");
+let env_version_path = path.join("VERSION");
 fs::create_dir(&env_path)?;
 fs::create_dir(&env_update_path)?;
+// write Database Version
+let (current_version_major, current_version_minor, current_version_patch) = self.database_version;
+let mut version_file = File::create(&env_version_path)?;
+version_file.write_all(format!("{}.{}.{}",
+current_version_major,
+current_version_minor,
+current_version_patch).as_bytes())?;
 let env_path = env_path.join("data.mdb");
 let env_file = self.env.copy_to_path(&env_path, CompactionOption::Enabled)?;
@@ -338,7 +504,7 @@ impl Database {
 Ok(update_env_file) => Ok((env_file, update_env_file)),
 Err(e) => {
 fs::remove_file(env_path)?;
-Err(e)
+Err(e.into())
 },
 }
 }
@@ -348,15 +514,87 @@ impl Database {
 indexes.keys().cloned().collect()
 }
-pub fn common_store(&self) -> heed::PolyDatabase {
+pub(crate) fn common_store(&self) -> heed::PolyDatabase {
 self.common_store
 }
pub fn last_update(&self, reader: &heed::RoTxn<MainT>) -> MResult<Option<DateTime<Utc>>> {
match self.common_store()
.get::<_, Str, SerdeDatetime>(reader, LAST_UPDATE_KEY)? {
Some(datetime) => Ok(Some(datetime)),
None => Ok(None),
}
}
pub fn set_last_update(&self, writer: &mut heed::RwTxn<MainT>, time: &DateTime<Utc>) -> MResult<()> {
self.common_store()
.put::<_, Str, SerdeDatetime>(writer, LAST_UPDATE_KEY, time)?;
Ok(())
}
pub fn set_healthy(&self, writer: &mut heed::RwTxn<MainT>) -> MResult<()> {
let common_store = self.common_store();
common_store.delete::<_, Str>(writer, UNHEALTHY_KEY)?;
Ok(())
}
pub fn set_unhealthy(&self, writer: &mut heed::RwTxn<MainT>) -> MResult<()> {
let common_store = self.common_store();
common_store.put::<_, Str, Unit>(writer, UNHEALTHY_KEY, &())?;
Ok(())
}
pub fn get_health(&self, reader: &heed::RoTxn<MainT>) -> MResult<Option<()>> {
let common_store = self.common_store();
Ok(common_store.get::<_, Str, Unit>(&reader, UNHEALTHY_KEY)?)
}
pub fn compute_stats(&self, writer: &mut MainWriter, index_uid: &str) -> MResult<()> {
let index = match self.open_index(&index_uid) {
Some(index) => index,
None => {
error!("Impossible to retrieve index {}", index_uid);
return Ok(());
}
};
let schema = match index.main.schema(&writer)? {
Some(schema) => schema,
None => return Ok(()),
};
let all_documents_fields = index
.documents_fields_counts
.all_documents_fields_counts(&writer)?;
// count fields frequencies
let mut fields_frequency = HashMap::<_, usize>::new();
for result in all_documents_fields {
let (_, attr, _) = result?;
if let Some(field_id) = schema.indexed_pos_to_field_id(attr) {
*fields_frequency.entry(field_id).or_default() += 1;
}
}
// convert attributes to their names
let frequency: HashMap<_, _> = fields_frequency
.into_iter()
.filter_map(|(a, c)| schema.name(a).map(|name| (name.to_string(), c)))
.collect();
index
.main
.put_fields_distribution(writer, &frequency)
}
pub fn version(&self) -> (u32, u32, u32) { self.database_version }
}
 #[cfg(test)]
 mod tests {
 use super::*;
+use crate::bucket_sort::SortResult;
 use crate::criterion::{self, CriteriaBuilder};
 use crate::update::{ProcessedUpdateResult, UpdateStatus};
 use crate::settings::Settings;
@@ -368,7 +606,7 @@ mod tests {
 fn valid_updates() {
 let dir = tempfile::tempdir().unwrap();
-let database = Database::open_or_create(dir.path()).unwrap();
+let database = Database::open_or_create(dir.path(), DatabaseOptions::default()).unwrap();
 let db = &database;
 let (sender, receiver) = mpsc::sync_channel(100);
@@ -393,7 +631,7 @@ mod tests {
 }
 "#;
 let settings: Settings = serde_json::from_str(data).unwrap();
-settings.into_update().unwrap()
+settings.to_update().unwrap()
 };
 let mut update_writer = db.update_write_txn().unwrap();
@@ -433,7 +671,7 @@ mod tests {
 fn invalid_updates() {
 let dir = tempfile::tempdir().unwrap();
-let database = Database::open_or_create(dir.path()).unwrap();
+let database = Database::open_or_create(dir.path(), DatabaseOptions::default()).unwrap();
 let db = &database;
 let (sender, receiver) = mpsc::sync_channel(100);
@@ -456,7 +694,7 @@ mod tests {
 }
 "#;
 let settings: Settings = serde_json::from_str(data).unwrap();
-settings.into_update().unwrap()
+settings.to_update().unwrap()
 };
 let mut update_writer = db.update_write_txn().unwrap();
@@ -495,7 +733,7 @@ mod tests {
 fn ignored_words_too_long() {
 let dir = tempfile::tempdir().unwrap();
-let database = Database::open_or_create(dir.path()).unwrap();
+let database = Database::open_or_create(dir.path(), DatabaseOptions::default()).unwrap();
 let db = &database;
 let (sender, receiver) = mpsc::sync_channel(100);
@@ -518,7 +756,7 @@ mod tests {
 }
 "#;
 let settings: Settings = serde_json::from_str(data).unwrap();
-settings.into_update().unwrap()
+settings.to_update().unwrap()
 };
 let mut update_writer = db.update_write_txn().unwrap();
@@ -550,7 +788,7 @@ mod tests {
 fn add_schema_attributes_at_end() {
 let dir = tempfile::tempdir().unwrap();
-let database = Database::open_or_create(dir.path()).unwrap();
+let database = Database::open_or_create(dir.path(), DatabaseOptions::default()).unwrap();
 let db = &database;
 let (sender, receiver) = mpsc::sync_channel(100);
@@ -573,7 +811,7 @@ mod tests {
 }
 "#;
 let settings: Settings = serde_json::from_str(data).unwrap();
-settings.into_update().unwrap()
+settings.to_update().unwrap()
 };
 let mut update_writer = db.update_write_txn().unwrap();
@@ -609,7 +847,7 @@ mod tests {
 }
 "#;
 let settings: Settings = serde_json::from_str(data).unwrap();
-settings.into_update().unwrap()
+settings.to_update().unwrap()
 };
 let mut writer = db.update_write_txn().unwrap();
@@ -623,7 +861,7 @@ mod tests {
 let update_reader = db.update_read_txn().unwrap();
 let result = index.update_status(&update_reader, update_id).unwrap();
 assert_matches!(result, Some(UpdateStatus::Processed { content }) if content.error.is_none());
-update_reader.abort();
+update_reader.abort().unwrap();
 let mut additions = index.documents_addition();
@@ -657,14 +895,14 @@ mod tests {
 let update_reader = db.update_read_txn().unwrap();
 let result = index.update_status(&update_reader, update_id).unwrap();
 assert_matches!(result, Some(UpdateStatus::Processed { content }) if content.error.is_none());
-update_reader.abort();
+update_reader.abort().unwrap();
 // even try to search for a document
 let reader = db.main_read_txn().unwrap();
-let results = index.query_builder().query(&reader, "21 ", 0..20).unwrap();
-assert_matches!(results.len(), 1);
-reader.abort();
+let SortResult {documents, .. } = index.query_builder().query(&reader, Some("21 "), 0..20).unwrap();
+assert_matches!(documents.len(), 1);
+reader.abort().unwrap();
 // try to introduce attributes in the middle of the schema
 let settings = {
@@ -675,7 +913,7 @@ mod tests {
 }
 "#;
 let settings: Settings = serde_json::from_str(data).unwrap();
-settings.into_update().unwrap()
+settings.to_update().unwrap()
 };
 let mut writer = db.update_write_txn().unwrap();
@@ -694,7 +932,7 @@ mod tests {
 fn deserialize_documents() {
 let dir = tempfile::tempdir().unwrap();
-let database = Database::open_or_create(dir.path()).unwrap();
+let database = Database::open_or_create(dir.path(), DatabaseOptions::default()).unwrap();
 let db = &database;
 let (sender, receiver) = mpsc::sync_channel(100);
@@ -717,7 +955,7 @@ mod tests {
 }
 "#;
 let settings: Settings = serde_json::from_str(data).unwrap();
-settings.into_update().unwrap()
+settings.to_update().unwrap()
 };
 let mut writer = db.update_write_txn().unwrap();
@@ -753,19 +991,19 @@ mod tests {
 let update_reader = db.update_read_txn().unwrap();
 let result = index.update_status(&update_reader, update_id).unwrap();
 assert_matches!(result, Some(UpdateStatus::Processed { content }) if content.error.is_none());
-update_reader.abort();
+update_reader.abort().unwrap();
 let reader = db.main_read_txn().unwrap();
 let document: Option<IgnoredAny> = index.document(&reader, None, DocumentId(25)).unwrap();
 assert!(document.is_none());
 let document: Option<IgnoredAny> = index
-.document(&reader, None, DocumentId(7_900_334_843_754_999_545))
+.document(&reader, None, DocumentId(0))
 .unwrap();
 assert!(document.is_some());
 let document: Option<IgnoredAny> = index
-.document(&reader, None, DocumentId(8_367_468_610_878_465_872))
+.document(&reader, None, DocumentId(1))
 .unwrap();
 assert!(document.is_some());
 }
@@ -774,7 +1012,7 @@ mod tests {
 fn partial_document_update() {
 let dir = tempfile::tempdir().unwrap();
-let database = Database::open_or_create(dir.path()).unwrap();
+let database = Database::open_or_create(dir.path(), DatabaseOptions::default()).unwrap();
 let db = &database;
 let (sender, receiver) = mpsc::sync_channel(100);
@@ -797,7 +1035,7 @@ mod tests {
 }
 "#;
 let settings: Settings = serde_json::from_str(data).unwrap();
-settings.into_update().unwrap()
+settings.to_update().unwrap()
 };
 let mut writer = db.update_write_txn().unwrap();
@@ -833,23 +1071,23 @@ mod tests {
 let update_reader = db.update_read_txn().unwrap();
 let result = index.update_status(&update_reader, update_id).unwrap();
 assert_matches!(result, Some(UpdateStatus::Processed { content }) if content.error.is_none());
-update_reader.abort();
+update_reader.abort().unwrap();
 let reader = db.main_read_txn().unwrap();
 let document: Option<IgnoredAny> = index.document(&reader, None, DocumentId(25)).unwrap();
 assert!(document.is_none());
 let document: Option<IgnoredAny> = index
-.document(&reader, None, DocumentId(7_900_334_843_754_999_545))
+.document(&reader, None, DocumentId(0))
 .unwrap();
 assert!(document.is_some());
 let document: Option<IgnoredAny> = index
-.document(&reader, None, DocumentId(8_367_468_610_878_465_872))
+.document(&reader, None, DocumentId(1))
 .unwrap();
 assert!(document.is_some());
-reader.abort();
+reader.abort().unwrap();
 let mut partial_additions = index.documents_partial_addition();
@@ -878,11 +1116,11 @@ mod tests {
 let update_reader = db.update_read_txn().unwrap();
 let result = index.update_status(&update_reader, update_id).unwrap();
 assert_matches!(result, Some(UpdateStatus::Processed { content }) if content.error.is_none());
-update_reader.abort();
+update_reader.abort().unwrap();
 let reader = db.main_read_txn().unwrap();
 let document: Option<serde_json::Value> = index
-.document(&reader, None, DocumentId(7_900_334_843_754_999_545))
+.document(&reader, None, DocumentId(0))
 .unwrap();
 let new_doc1 = serde_json::json!({
@@ -893,7 +1131,7 @@ mod tests {
 assert_eq!(document, Some(new_doc1));
 let document: Option<serde_json::Value> = index
-.document(&reader, None, DocumentId(8_367_468_610_878_465_872))
+.document(&reader, None, DocumentId(1))
 .unwrap();
 let new_doc2 = serde_json::json!({
@ -908,7 +1146,7 @@ mod tests {
fn delete_index() { fn delete_index() {
let dir = tempfile::tempdir().unwrap(); let dir = tempfile::tempdir().unwrap();
let database = Arc::new(Database::open_or_create(dir.path()).unwrap()); let database = Arc::new(Database::open_or_create(dir.path(), DatabaseOptions::default()).unwrap());
let db = &database; let db = &database;
let (sender, receiver) = mpsc::sync_channel(100); let (sender, receiver) = mpsc::sync_channel(100);
@ -936,7 +1174,7 @@ mod tests {
} }
"#; "#;
let settings: Settings = serde_json::from_str(data).unwrap(); let settings: Settings = serde_json::from_str(data).unwrap();
settings.into_update().unwrap() settings.to_update().unwrap()
}; };
let mut writer = db.update_write_txn().unwrap(); let mut writer = db.update_write_txn().unwrap();
@ -980,7 +1218,7 @@ mod tests {
fn check_number_ordering() { fn check_number_ordering() {
let dir = tempfile::tempdir().unwrap(); let dir = tempfile::tempdir().unwrap();
let database = Database::open_or_create(dir.path()).unwrap(); let database = Database::open_or_create(dir.path(), DatabaseOptions::default()).unwrap();
let db = &database; let db = &database;
let (sender, receiver) = mpsc::sync_channel(100); let (sender, receiver) = mpsc::sync_channel(100);
@ -1012,7 +1250,7 @@ mod tests {
} }
"#; "#;
let settings: Settings = serde_json::from_str(data).unwrap(); let settings: Settings = serde_json::from_str(data).unwrap();
settings.into_update().unwrap() settings.to_update().unwrap()
}; };
let mut writer = db.update_write_txn().unwrap(); let mut writer = db.update_write_txn().unwrap();
@ -1059,20 +1297,20 @@ mod tests {
let builder = index.query_builder_with_criteria(criteria); let builder = index.query_builder_with_criteria(criteria);
let results = builder.query(&reader, "Kevin", 0..20).unwrap(); let SortResult {documents, .. } = builder.query(&reader, Some("Kevin"), 0..20).unwrap();
let mut iter = results.into_iter(); let mut iter = documents.into_iter();
assert_matches!( assert_matches!(
iter.next(), iter.next(),
Some(Document { Some(Document {
id: DocumentId(7_900_334_843_754_999_545), id: DocumentId(0),
.. ..
}) })
); );
assert_matches!( assert_matches!(
iter.next(), iter.next(),
Some(Document { Some(Document {
id: DocumentId(8_367_468_610_878_465_872), id: DocumentId(1),
.. ..
}) })
); );


@@ -1,30 +1,63 @@
use crate::serde::{DeserializerError, SerializerError};
use serde_json::Error as SerdeJsonError;
+use pest::error::Error as PestError;
+use crate::filters::Rule;
use std::{error, fmt, io};
-pub use heed::Error as HeedError;
-pub use fst::Error as FstError;
pub use bincode::Error as BincodeError;
+pub use fst::Error as FstError;
+pub use heed::Error as HeedError;
+pub use pest::error as pest_error;
+use meilisearch_error::{ErrorCode, Code};
pub type MResult<T> = Result<T, Error>;
#[derive(Debug)]
pub enum Error {
-Io(io::Error),
-IndexAlreadyExists,
-MissingPrimaryKey,
-SchemaMissing,
-WordIndexMissing,
-MissingDocumentId,
-MaxFieldsLimitExceeded,
-Schema(meilisearch_schema::Error),
-Zlmdb(heed::Error),
-Fst(fst::Error),
-SerdeJson(SerdeJsonError),
Bincode(bincode::Error),
-Serializer(SerializerError),
Deserializer(DeserializerError),
-UnsupportedOperation(UnsupportedOperation),
+FacetError(FacetError),
+FilterParseError(PestError<Rule>),
+Fst(fst::Error),
+Heed(heed::Error),
+IndexAlreadyExists,
+Io(io::Error),
+MaxFieldsLimitExceeded,
+MissingDocumentId,
+MissingPrimaryKey,
+Schema(meilisearch_schema::Error),
+SchemaMissing,
+SerdeJson(SerdeJsonError),
+Serializer(SerializerError),
+VersionMismatch(String),
+WordIndexMissing,
}
+impl ErrorCode for Error {
+fn error_code(&self) -> Code {
+use Error::*;
+match self {
+FacetError(_) => Code::Facet,
+FilterParseError(_) => Code::Filter,
+IndexAlreadyExists => Code::IndexAlreadyExists,
+MissingPrimaryKey => Code::MissingPrimaryKey,
+MissingDocumentId => Code::MissingDocumentId,
+MaxFieldsLimitExceeded => Code::MaxFieldsLimitExceeded,
+Schema(s) => s.error_code(),
+WordIndexMissing
+| SchemaMissing => Code::InvalidState,
+Heed(_)
+| Fst(_)
+| SerdeJson(_)
+| Bincode(_)
+| Serializer(_)
+| Deserializer(_)
+| VersionMismatch(_)
+| Io(_) => Code::Internal,
+}
+}
+}
impl From<io::Error> for Error {
@@ -33,6 +66,34 @@ impl From<io::Error> for Error {
}
}
+impl From<PestError<Rule>> for Error {
+fn from(error: PestError<Rule>) -> Error {
+Error::FilterParseError(error.renamed_rules(|r| {
+let s = match r {
+Rule::or => "OR",
+Rule::and => "AND",
+Rule::not => "NOT",
+Rule::string => "string",
+Rule::word => "word",
+Rule::greater => "field > value",
+Rule::less => "field < value",
+Rule::eq => "field = value",
+Rule::leq => "field <= value",
+Rule::geq => "field >= value",
+Rule::key => "key",
+_ => "other",
+};
+s.to_string()
+}))
+}
+}
+impl From<FacetError> for Error {
+fn from(error: FacetError) -> Error {
+Error::FacetError(error)
+}
+}
impl From<meilisearch_schema::Error> for Error {
fn from(error: meilisearch_schema::Error) -> Error {
Error::Schema(error)
@@ -41,7 +102,7 @@ impl From<meilisearch_schema::Error> for Error {
impl From<HeedError> for Error {
fn from(error: HeedError) -> Error {
-Error::Zlmdb(error)
+Error::Heed(error)
}
}
@@ -65,7 +126,10 @@ impl From<BincodeError> for Error {
impl From<SerializerError> for Error {
fn from(error: SerializerError) -> Error {
-Error::Serializer(error)
+match error {
+SerializerError::DocumentIdNotFound => Error::MissingDocumentId,
+e => Error::Serializer(e),
+}
}
}
@@ -75,57 +139,86 @@ impl From<DeserializerError> for Error {
}
}
-impl From<UnsupportedOperation> for Error {
-fn from(op: UnsupportedOperation) -> Error {
-Error::UnsupportedOperation(op)
-}
-}
impl fmt::Display for Error {
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
use self::Error::*;
match self {
-Io(e) => write!(f, "{}", e),
-IndexAlreadyExists => write!(f, "index already exists"),
-MissingPrimaryKey => write!(f, "schema cannot be built without a primary key"),
-SchemaMissing => write!(f, "this index does not have a schema"),
-WordIndexMissing => write!(f, "this index does not have a word index"),
-MissingDocumentId => write!(f, "document id is missing"),
-MaxFieldsLimitExceeded => write!(f, "maximum number of fields in a document exceeded"),
-Schema(e) => write!(f, "schema error; {}", e),
-Zlmdb(e) => write!(f, "heed error; {}", e),
-Fst(e) => write!(f, "fst error; {}", e),
-SerdeJson(e) => write!(f, "serde json error; {}", e),
Bincode(e) => write!(f, "bincode error; {}", e),
-Serializer(e) => write!(f, "serializer error; {}", e),
Deserializer(e) => write!(f, "deserializer error; {}", e),
-UnsupportedOperation(op) => write!(f, "unsupported operation; {}", op),
+FacetError(e) => write!(f, "error processing facet filter: {}", e),
+FilterParseError(e) => write!(f, "error parsing filter; {}", e),
+Fst(e) => write!(f, "fst error; {}", e),
+Heed(e) => write!(f, "heed error; {}", e),
+IndexAlreadyExists => write!(f, "index already exists"),
+Io(e) => write!(f, "{}", e),
+MaxFieldsLimitExceeded => write!(f, "maximum number of fields in a document exceeded"),
+MissingDocumentId => write!(f, "document id is missing"),
+MissingPrimaryKey => write!(f, "schema cannot be built without a primary key"),
+Schema(e) => write!(f, "schema error; {}", e),
+SchemaMissing => write!(f, "this index does not have a schema"),
+SerdeJson(e) => write!(f, "serde json error; {}", e),
+Serializer(e) => write!(f, "serializer error; {}", e),
+VersionMismatch(version) => write!(f, "Cannot open database, expected MeiliSearch engine version: {}, current engine version: {}.{}.{}",
+version,
+env!("CARGO_PKG_VERSION_MAJOR"),
+env!("CARGO_PKG_VERSION_MINOR"),
+env!("CARGO_PKG_VERSION_PATCH")),
+WordIndexMissing => write!(f, "this index does not have a word index"),
}
}
}
impl error::Error for Error {}
-#[derive(Debug)]
-pub enum UnsupportedOperation {
-SchemaAlreadyExists,
-CannotUpdateSchemaPrimaryKey,
-CannotReorderSchemaAttribute,
-CanOnlyIntroduceNewSchemaAttributesAtEnd,
-CannotRemoveSchemaAttribute,
-}
-impl fmt::Display for UnsupportedOperation {
-fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
-use self::UnsupportedOperation::*;
-match self {
-SchemaAlreadyExists => write!(f, "Cannot update index which already have a schema"),
-CannotUpdateSchemaPrimaryKey => write!(f, "Cannot update the primary key of a schema"),
-CannotReorderSchemaAttribute => write!(f, "Cannot reorder the attributes of a schema"),
-CanOnlyIntroduceNewSchemaAttributesAtEnd => {
-write!(f, "Can only introduce new attributes at end of a schema")
-}
-CannotRemoveSchemaAttribute => write!(f, "Cannot remove attributes from a schema"),
-}
-}
-}
+struct FilterParseError(PestError<Rule>);
+impl fmt::Display for FilterParseError {
+fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
+use crate::pest_error::LineColLocation::*;
+let (line, column) = match self.0.line_col {
+Span((line, _), (column, _)) => (line, column),
+Pos((line, column)) => (line, column),
+};
+write!(f, "parsing error on line {} at column {}: {}", line, column, self.0.variant.message())
+}
+}
+#[derive(Debug)]
+pub enum FacetError {
+EmptyArray,
+ParsingError(String),
+UnexpectedToken { expected: &'static [&'static str], found: String },
+InvalidFormat(String),
+AttributeNotFound(String),
+AttributeNotSet { expected: Vec<String>, found: String },
+InvalidDocumentAttribute(String),
+NoAttributesForFaceting,
+}
+impl FacetError {
+pub fn unexpected_token(expected: &'static [&'static str], found: impl ToString) -> FacetError {
+FacetError::UnexpectedToken { expected, found: found.to_string() }
+}
+pub fn attribute_not_set(expected: Vec<String>, found: impl ToString) -> FacetError {
+FacetError::AttributeNotSet { expected, found: found.to_string() }
+}
+}
+impl fmt::Display for FacetError {
+fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
+use FacetError::*;
+match self {
+EmptyArray => write!(f, "empty array in facet filter is unspecified behavior"),
+ParsingError(msg) => write!(f, "parsing error: {}", msg),
+UnexpectedToken { expected, found } => write!(f, "unexpected token {}, expected {}", found, expected.join(" or ")),
+InvalidFormat(found) => write!(f, "invalid facet: {}, facets should be \"facetName:facetValue\"", found),
+AttributeNotFound(attr) => write!(f, "unknown {:?} attribute", attr),
+AttributeNotSet { found, expected } => write!(f, "`{}` is not set as a faceted attribute. available facet attributes: {}", found, expected.join(", ")),
+InvalidDocumentAttribute(attr) => write!(f, "invalid document attribute {}, accepted types: String and [String]", attr),
+NoAttributesForFaceting => write!(f, "impossible to perform faceted search, no attributes for faceting are set"),
+}
+}
+}


@@ -0,0 +1,357 @@
use std::borrow::Cow;
use std::collections::HashMap;
use std::hash::Hash;
use std::ops::Deref;
use cow_utils::CowUtils;
use either::Either;
use heed::types::{Str, OwnedType};
use indexmap::IndexMap;
use serde_json::Value;
use meilisearch_schema::{FieldId, Schema};
use meilisearch_types::DocumentId;
use crate::database::MainT;
use crate::error::{FacetError, MResult};
use crate::store::BEU16;
/// Data structure used to represent a boolean expression in the form of nested arrays.
/// Values in the outer array are and-ed together, values in the inner arrays are or-ed together.
#[derive(Debug, PartialEq)]
pub struct FacetFilter(Vec<Either<Vec<FacetKey>, FacetKey>>);
impl Deref for FacetFilter {
type Target = Vec<Either<Vec<FacetKey>, FacetKey>>;
fn deref(&self) -> &Self::Target {
&self.0
}
}
impl FacetFilter {
pub fn from_str(
s: &str,
schema: &Schema,
attributes_for_faceting: &[FieldId],
) -> MResult<FacetFilter> {
if attributes_for_faceting.is_empty() {
return Err(FacetError::NoAttributesForFaceting.into());
}
let parsed = serde_json::from_str::<Value>(s).map_err(|e| FacetError::ParsingError(e.to_string()))?;
let mut filter = Vec::new();
match parsed {
Value::Array(and_exprs) => {
if and_exprs.is_empty() {
return Err(FacetError::EmptyArray.into());
}
for expr in and_exprs {
match expr {
Value::String(s) => {
let key = FacetKey::from_str(&s, schema, attributes_for_faceting)?;
filter.push(Either::Right(key));
}
Value::Array(or_exprs) => {
if or_exprs.is_empty() {
return Err(FacetError::EmptyArray.into());
}
let mut inner = Vec::new();
for expr in or_exprs {
match expr {
Value::String(s) => {
let key = FacetKey::from_str(&s, schema, attributes_for_faceting)?;
inner.push(key);
}
bad_value => return Err(FacetError::unexpected_token(&["String"], bad_value).into()),
}
}
filter.push(Either::Left(inner));
}
bad_value => return Err(FacetError::unexpected_token(&["Array", "String"], bad_value).into()),
}
}
Ok(Self(filter))
}
bad_value => Err(FacetError::unexpected_token(&["Array"], bad_value).into()),
}
}
}
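To make the filter algebra concrete, here is a minimal sketch in the spirit of the tests further down; the color and size attributes are hypothetical and assumed to be registered for faceting:
use either::Either::{Left, Right};
let mut schema = Schema::new();
let color = schema.insert_and_index("color").unwrap();
let size = schema.insert_and_index("size").unwrap();
let faceted = [color, size];
// `[["color:blue", "color:red"], "size:40"]` reads as
// (color = blue OR color = red) AND size = 40
let filter = FacetFilter::from_str(
r#"[["color:blue", "color:red"], "size:40"]"#,
&schema,
&faceted,
).unwrap();
assert_eq!(*filter, vec![
Left(vec![FacetKey::new(color, "blue".to_string()), FacetKey::new(color, "red".to_string())]),
Right(FacetKey::new(size, "40".to_string())),
]);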
#[derive(Debug, Eq, PartialEq, Hash)]
#[repr(C)]
pub struct FacetKey(FieldId, String);
impl FacetKey {
pub fn new(field_id: FieldId, value: String) -> Self {
let value = match value.cow_to_lowercase() {
Cow::Borrowed(_) => value,
Cow::Owned(s) => s,
};
Self(field_id, value)
}
pub fn key(&self) -> FieldId {
self.0
}
pub fn value(&self) -> &str {
&self.1
}
// TODO improve parser
fn from_str(
s: &str,
schema: &Schema,
attributes_for_faceting: &[FieldId],
) -> Result<Self, FacetError> {
let mut split = s.splitn(2, ':');
let key = split
.next()
.ok_or_else(|| FacetError::InvalidFormat(s.to_string()))?
.trim();
let field_id = schema
.id(key)
.ok_or_else(|| FacetError::AttributeNotFound(key.to_string()))?;
if !attributes_for_faceting.contains(&field_id) {
return Err(FacetError::attribute_not_set(
attributes_for_faceting
.iter()
.filter_map(|&id| schema.name(id))
.map(str::to_string)
.collect::<Vec<_>>(),
key))
}
let value = split
.next()
.ok_or_else(|| FacetError::InvalidFormat(s.to_string()))?
.trim();
// unquoting the string if need be:
let mut indices = value.char_indices();
let value = match (indices.next(), indices.last()) {
(Some((s, '\'')), Some((e, '\''))) |
(Some((s, '\"')), Some((e, '\"'))) => value[s + 1..e].to_string(),
_ => value.to_string(),
};
Ok(Self::new(field_id, value))
}
}
impl<'a> heed::BytesEncode<'a> for FacetKey {
type EItem = FacetKey;
fn bytes_encode(item: &'a Self::EItem) -> Option<Cow<'a, [u8]>> {
let mut buffer = Vec::with_capacity(2 + item.1.len());
let id = BEU16::new(item.key().into());
let id_bytes = OwnedType::bytes_encode(&id)?;
let value_bytes = Str::bytes_encode(item.value())?;
buffer.extend_from_slice(id_bytes.as_ref());
buffer.extend_from_slice(value_bytes.as_ref());
Some(Cow::Owned(buffer))
}
}
impl<'a> heed::BytesDecode<'a> for FacetKey {
type DItem = FacetKey;
fn bytes_decode(bytes: &'a [u8]) -> Option<Self::DItem> {
let (id_bytes, value_bytes) = bytes.split_at(2);
let id = OwnedType::<BEU16>::bytes_decode(id_bytes)?;
let id = id.get().into();
let string = Str::bytes_decode(&value_bytes)?;
Some(FacetKey(id, string.to_string()))
}
}
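The key layout is worth spelling out: a 2-byte big-endian field id followed by the raw UTF-8 value, with no length prefix. A round-trip sketch, assuming (as the BEU16 usage above implies) that FieldId converts to a u16:
use heed::{BytesDecode, BytesEncode};
let key = FacetKey::new(FieldId(42), "Blue".to_string()); // the value is lowercased on construction
let bytes = FacetKey::bytes_encode(&key).unwrap();
assert_eq!(bytes[..2], 42u16.to_be_bytes()); // big-endian field id
assert_eq!(bytes[2..], *b"blue"); // raw value bytes
assert_eq!(FacetKey::bytes_decode(&bytes).unwrap(), key);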
pub fn add_to_facet_map(
facet_map: &mut HashMap<FacetKey, (String, Vec<DocumentId>)>,
field_id: FieldId,
value: Value,
document_id: DocumentId,
) -> Result<(), FacetError> {
let value = match value {
Value::String(s) => s,
// ignore null
Value::Null => return Ok(()),
value => return Err(FacetError::InvalidDocumentAttribute(value.to_string())),
};
let key = FacetKey::new(field_id, value.clone());
facet_map.entry(key).or_insert_with(|| (value, Vec::new())).1.push(document_id);
Ok(())
}
pub fn facet_map_from_docids(
rtxn: &heed::RoTxn<MainT>,
index: &crate::Index,
document_ids: &[DocumentId],
attributes_for_facetting: &[FieldId],
) -> MResult<HashMap<FacetKey, (String, Vec<DocumentId>)>> {
// A hashmap that associates a facet key to a pair containing the original facet attribute
// string with its case preserved, and a list of document ids for that facet attribute.
let mut facet_map: HashMap<FacetKey, (String, Vec<DocumentId>)> = HashMap::new();
for document_id in document_ids {
for result in index
.documents_fields
.document_fields(rtxn, *document_id)?
{
let (field_id, bytes) = result?;
if attributes_for_facetting.contains(&field_id) {
match serde_json::from_slice(bytes)? {
Value::Array(values) => {
for v in values {
add_to_facet_map(&mut facet_map, field_id, v, *document_id)?;
}
}
v => add_to_facet_map(&mut facet_map, field_id, v, *document_id)?,
};
}
}
}
Ok(facet_map)
}
pub fn facet_map_from_docs(
schema: &Schema,
documents: &HashMap<DocumentId, IndexMap<String, Value>>,
attributes_for_facetting: &[FieldId],
) -> MResult<HashMap<FacetKey, (String, Vec<DocumentId>)>> {
let mut facet_map = HashMap::new();
let attributes_for_facetting = attributes_for_facetting
.iter()
.filter_map(|&id| schema.name(id).map(|name| (id, name)))
.collect::<Vec<_>>();
for (id, document) in documents {
for (field_id, name) in &attributes_for_facetting {
if let Some(value) = document.get(*name) {
match value {
Value::Array(values) => {
for v in values {
add_to_facet_map(&mut facet_map, *field_id, v.clone(), *id)?;
}
}
v => add_to_facet_map(&mut facet_map, *field_id, v.clone(), *id)?,
}
}
}
}
Ok(facet_map)
}
#[cfg(test)]
mod test {
use super::*;
use meilisearch_schema::Schema;
#[test]
fn test_facet_key() {
let mut schema = Schema::new();
let id = schema.insert_and_index("hello").unwrap();
let facet_list = [schema.id("hello").unwrap()];
assert_eq!(
FacetKey::from_str("hello:12", &schema, &facet_list).unwrap(),
FacetKey::new(id, "12".to_string())
);
assert_eq!(
FacetKey::from_str("hello:\"foo bar\"", &schema, &facet_list).unwrap(),
FacetKey::new(id, "foo bar".to_string())
);
assert_eq!(
FacetKey::from_str("hello:'foo bar'", &schema, &facet_list).unwrap(),
FacetKey::new(id, "foo bar".to_string())
);
// weird case
assert_eq!(
FacetKey::from_str("hello:blabla:machin", &schema, &facet_list).unwrap(),
FacetKey::new(id, "blabla:machin".to_string())
);
assert_eq!(
FacetKey::from_str("hello:\"\"", &schema, &facet_list).unwrap(),
FacetKey::new(id, "".to_string())
);
assert_eq!(
FacetKey::from_str("hello:'", &schema, &facet_list).unwrap(),
FacetKey::new(id, "'".to_string())
);
assert_eq!(
FacetKey::from_str("hello:''", &schema, &facet_list).unwrap(),
FacetKey::new(id, "".to_string())
);
assert!(FacetKey::from_str("hello", &schema, &facet_list).is_err());
assert!(FacetKey::from_str("toto:12", &schema, &facet_list).is_err());
}
#[test]
fn test_parse_facet_array() {
use either::Either::{Left, Right};
let mut schema = Schema::new();
let _id = schema.insert_and_index("hello").unwrap();
let facet_list = [schema.id("hello").unwrap()];
assert_eq!(
FacetFilter::from_str("[[\"hello:12\"]]", &schema, &facet_list).unwrap(),
FacetFilter(vec![Left(vec![FacetKey(FieldId(0), "12".to_string())])])
);
assert_eq!(
FacetFilter::from_str("[\"hello:12\"]", &schema, &facet_list).unwrap(),
FacetFilter(vec![Right(FacetKey(FieldId(0), "12".to_string()))])
);
assert_eq!(
FacetFilter::from_str("[\"hello:12\", \"hello:13\"]", &schema, &facet_list).unwrap(),
FacetFilter(vec![
Right(FacetKey(FieldId(0), "12".to_string())),
Right(FacetKey(FieldId(0), "13".to_string()))
])
);
assert_eq!(
FacetFilter::from_str("[[\"hello:12\", \"hello:13\"]]", &schema, &facet_list).unwrap(),
FacetFilter(vec![Left(vec![
FacetKey(FieldId(0), "12".to_string()),
FacetKey(FieldId(0), "13".to_string())
])])
);
assert_eq!(
FacetFilter::from_str(
"[[\"hello:12\", \"hello:13\"], \"hello:14\"]",
&schema,
&facet_list
)
.unwrap(),
FacetFilter(vec![
Left(vec![
FacetKey(FieldId(0), "12".to_string()),
FacetKey(FieldId(0), "13".to_string())
]),
Right(FacetKey(FieldId(0), "14".to_string()))
])
);
// invalid array depths
assert!(FacetFilter::from_str(
"[[[\"hello:12\", \"hello:13\"], \"hello:14\"]]",
&schema,
&facet_list
)
.is_err());
assert!(FacetFilter::from_str(
"[[[\"hello:12\", \"hello:13\"]], \"hello:14\"]]",
&schema,
&facet_list
)
.is_err());
assert!(FacetFilter::from_str("\"hello:14\"", &schema, &facet_list).is_err());
// nonexistent key
assert!(FacetFilter::from_str("[\"foo:12\"]", &schema, &facet_list).is_err());
// invalid facet key
assert!(FacetFilter::from_str("[\"foo=12\"]", &schema, &facet_list).is_err());
assert!(FacetFilter::from_str("[\"foo12\"]", &schema, &facet_list).is_err());
assert!(FacetFilter::from_str("[\"\"]", &schema, &facet_list).is_err());
// empty array error
assert!(FacetFilter::from_str("[]", &schema, &facet_list).is_err());
assert!(FacetFilter::from_str("[\"hello:12\", []]", &schema, &facet_list).is_err());
}
}


@@ -0,0 +1,276 @@
use std::str::FromStr;
use std::cmp::Ordering;
use crate::error::Error;
use crate::{store::Index, DocumentId, MainT};
use heed::RoTxn;
use meilisearch_schema::{FieldId, Schema};
use pest::error::{Error as PestError, ErrorVariant};
use pest::iterators::Pair;
use serde_json::{Value, Number};
use super::parser::Rule;
#[derive(Debug, PartialEq)]
enum ConditionType {
Greater,
Less,
Equal,
LessEqual,
GreaterEqual,
NotEqual,
}
/// We need to infer the type when the filter is constructed
/// and match against every possible type it can be parsed into.
#[derive(Debug)]
struct ConditionValue<'a> {
string: &'a str,
boolean: Option<bool>,
number: Option<Number>
}
impl<'a> ConditionValue<'a> {
pub fn new(value: &Pair<'a, Rule>) -> Self {
match value.as_rule() {
Rule::string | Rule::word => {
let string = value.as_str();
let boolean = match value.as_str() {
"true" => Some(true),
"false" => Some(false),
_ => None,
};
let number = Number::from_str(value.as_str()).ok();
ConditionValue { string, boolean, number }
},
_ => unreachable!(),
}
}
pub fn as_str(&self) -> &str {
self.string
}
pub fn as_number(&self) -> Option<&Number> {
self.number.as_ref()
}
pub fn as_bool(&self) -> Option<bool> {
self.boolean
}
}
#[derive(Debug)]
pub struct Condition<'a> {
field: FieldId,
condition: ConditionType,
value: ConditionValue<'a>
}
fn get_field_value<'a>(schema: &Schema, pair: Pair<'a, Rule>) -> Result<(FieldId, ConditionValue<'a>), Error> {
let mut items = pair.into_inner();
// lexing ensures that we at least have a key
let key = items.next().unwrap();
let field = schema
.id(key.as_str())
.ok_or_else(|| PestError::new_from_span(
ErrorVariant::CustomError {
message: format!(
"attribute `{}` not found, available attributes are: {}",
key.as_str(),
schema.names().collect::<Vec<_>>().join(", ")
),
},
key.as_span()))?;
let value = ConditionValue::new(&items.next().unwrap());
Ok((field, value))
}
// big numbers may fall back to an f64 comparison and lose precision (see the sketch after this function)
fn compare_numbers(lhs: &Number, rhs: &Number) -> Option<Ordering> {
match (lhs.as_i64(), lhs.as_u64(), lhs.as_f64(),
rhs.as_i64(), rhs.as_u64(), rhs.as_f64()) {
// i64 u64 f64 i64 u64 f64
(Some(lhs), _, _, Some(rhs), _, _) => lhs.partial_cmp(&rhs),
(_, Some(lhs), _, _, Some(rhs), _) => lhs.partial_cmp(&rhs),
(_, _, Some(lhs), _, _, Some(rhs)) => lhs.partial_cmp(&rhs),
(_, _, _, _, _, _) => None,
}
}
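The caveat above is concrete: when two numbers only meet in the f64 column, integers larger than 2^53 lose precision, so distinct values can compare as equal. A small illustration, using the same serde_json::Number and Ordering imports as this module:
// u64::MAX rounds to 2^64 as an f64, so it compares equal to that float even
// though the integer is 18446744073709551615, not 18446744073709551616.
let lhs = Number::from(u64::MAX);
let rhs = Number::from_f64(18446744073709551616.0).unwrap();
assert_eq!(Some(Ordering::Equal), compare_numbers(&lhs, &rhs));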
impl<'a> Condition<'a> {
pub fn less(
item: Pair<'a, Rule>,
schema: &'a Schema,
) -> Result<Self, Error> {
let (field, value) = get_field_value(schema, item)?;
let condition = ConditionType::Less;
Ok(Self { field, condition, value })
}
pub fn greater(
item: Pair<'a, Rule>,
schema: &'a Schema,
) -> Result<Self, Error> {
let (field, value) = get_field_value(schema, item)?;
let condition = ConditionType::Greater;
Ok(Self { field, condition, value })
}
pub fn neq(
item: Pair<'a, Rule>,
schema: &'a Schema,
) -> Result<Self, Error> {
let (field, value) = get_field_value(schema, item)?;
let condition = ConditionType::NotEqual;
Ok(Self { field, condition, value })
}
pub fn geq(
item: Pair<'a, Rule>,
schema: &'a Schema,
) -> Result<Self, Error> {
let (field, value) = get_field_value(schema, item)?;
let condition = ConditionType::GreaterEqual;
Ok(Self { field, condition, value })
}
pub fn leq(
item: Pair<'a, Rule>,
schema: &'a Schema,
) -> Result<Self, Error> {
let (field, value) = get_field_value(schema, item)?;
let condition = ConditionType::LessEqual;
Ok(Self { field, condition, value })
}
pub fn eq(
item: Pair<'a, Rule>,
schema: &'a Schema,
) -> Result<Self, Error> {
let (field, value) = get_field_value(schema, item)?;
let condition = ConditionType::Equal;
Ok(Self { field, condition, value })
}
pub fn test(
&self,
reader: &RoTxn<MainT>,
index: &Index,
document_id: DocumentId,
) -> Result<bool, Error> {
match index.document_attribute::<Value>(reader, document_id, self.field)? {
Some(Value::Array(values)) => Ok(values.iter().any(|v| self.match_value(Some(v)))),
other => Ok(self.match_value(other.as_ref())),
}
}
fn match_value(&self, value: Option<&Value>) -> bool {
match value {
Some(Value::String(s)) => {
let value = self.value.as_str();
match self.condition {
ConditionType::Equal => unicase::eq(value, &s),
ConditionType::NotEqual => !unicase::eq(value, &s),
_ => false
}
},
Some(Value::Number(n)) => {
if let Some(value) = self.value.as_number() {
if let Some(ord) = compare_numbers(&n, value) {
let res = match self.condition {
ConditionType::Equal => ord == Ordering::Equal,
ConditionType::NotEqual => ord != Ordering::Equal,
ConditionType::GreaterEqual => ord != Ordering::Less,
ConditionType::LessEqual => ord != Ordering::Greater,
ConditionType::Greater => ord == Ordering::Greater,
ConditionType::Less => ord == Ordering::Less,
};
return res
}
}
false
},
Some(Value::Bool(b)) => {
if let Some(value) = self.value.as_bool() {
let res = match self.condition {
ConditionType::Equal => *b == value,
ConditionType::NotEqual => *b != value,
_ => false
};
return res
}
false
},
// if field is not supported (or not found), all values are different from it,
// so != should always return true in this case.
_ => self.condition == ConditionType::NotEqual,
}
}
}
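Note the asymmetry encoded in match_value: strings only ever match through Equal or NotEqual, and those go through unicase, so matching is case-insensitive; ordering operators on a string value always yield false. A minimal sketch of the unicase behavior relied on above:
let stored = serde_json::Value::String("Blue".to_string());
if let serde_json::Value::String(s) = &stored {
assert!(unicase::eq("blue", s.as_str())); // `field = blue` matches "Blue"
assert!(!unicase::eq("red", s.as_str())); // `field = red` does not
}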
#[cfg(test)]
mod test {
use super::*;
use serde_json::Number;
use std::cmp::Ordering;
#[test]
fn test_number_comp() {
// test both u64
let n1 = Number::from(1u64);
let n2 = Number::from(2u64);
assert_eq!(Some(Ordering::Less), compare_numbers(&n1, &n2));
assert_eq!(Some(Ordering::Greater), compare_numbers(&n2, &n1));
let n1 = Number::from(1u64);
let n2 = Number::from(1u64);
assert_eq!(Some(Ordering::Equal), compare_numbers(&n1, &n2));
// test both i64
let n1 = Number::from(1i64);
let n2 = Number::from(2i64);
assert_eq!(Some(Ordering::Less), compare_numbers(&n1, &n2));
assert_eq!(Some(Ordering::Greater), compare_numbers(&n2, &n1));
let n1 = Number::from(1i64);
let n2 = Number::from(1i64);
assert_eq!(Some(Ordering::Equal), compare_numbers(&n1, &n2));
// test both f64
let n1 = Number::from_f64(1f64).unwrap();
let n2 = Number::from_f64(2f64).unwrap();
assert_eq!(Some(Ordering::Less), compare_numbers(&n1, &n2));
assert_eq!(Some(Ordering::Greater), compare_numbers(&n2, &n1));
let n1 = Number::from_f64(1f64).unwrap();
let n2 = Number::from_f64(1f64).unwrap();
assert_eq!(Some(Ordering::Equal), compare_numbers(&n1, &n2));
// test one u64 and one f64
let n1 = Number::from_f64(1f64).unwrap();
let n2 = Number::from(2u64);
assert_eq!(Some(Ordering::Less), compare_numbers(&n1, &n2));
assert_eq!(Some(Ordering::Greater), compare_numbers(&n2, &n1));
// equality
let n1 = Number::from_f64(1f64).unwrap();
let n2 = Number::from(1u64);
assert_eq!(Some(Ordering::Equal), compare_numbers(&n1, &n2));
assert_eq!(Some(Ordering::Equal), compare_numbers(&n2, &n1));
// float is neg
let n1 = Number::from_f64(-1f64).unwrap();
let n2 = Number::from(1u64);
assert_eq!(Some(Ordering::Less), compare_numbers(&n1, &n2));
assert_eq!(Some(Ordering::Greater), compare_numbers(&n2, &n1));
// float is too big
let n1 = Number::from_f64(std::f64::MAX).unwrap();
let n2 = Number::from(1u64);
assert_eq!(Some(Ordering::Greater), compare_numbers(&n1, &n2));
assert_eq!(Some(Ordering::Less), compare_numbers(&n2, &n1));
// misc
let n1 = Number::from_f64(std::f64::MAX).unwrap();
let n2 = Number::from(std::u64::MAX);
assert_eq!(Some(Ordering::Greater), compare_numbers(&n1, &n2));
assert_eq!(Some( Ordering::Less ), compare_numbers(&n2, &n1));
}
}


@@ -0,0 +1,127 @@
mod parser;
mod condition;
pub(crate) use parser::Rule;
use std::ops::Not;
use condition::Condition;
use crate::error::Error;
use crate::{DocumentId, MainT, store::Index};
use heed::RoTxn;
use meilisearch_schema::Schema;
use parser::{PREC_CLIMBER, FilterParser};
use pest::iterators::{Pair, Pairs};
use pest::Parser;
type FilterResult<'a> = Result<Filter<'a>, Error>;
#[derive(Debug)]
pub enum Filter<'a> {
Condition(Condition<'a>),
Or(Box<Self>, Box<Self>),
And(Box<Self>, Box<Self>),
Not(Box<Self>),
}
impl<'a> Filter<'a> {
pub fn parse(expr: &'a str, schema: &'a Schema) -> FilterResult<'a> {
let mut lexed = FilterParser::parse(Rule::prgm, expr)?;
Self::build(lexed.next().unwrap().into_inner(), schema)
}
pub fn test(
&self,
reader: &RoTxn<MainT>,
index: &Index,
document_id: DocumentId,
) -> Result<bool, Error> {
use Filter::*;
match self {
Condition(c) => c.test(reader, index, document_id),
Or(lhs, rhs) => Ok(
lhs.test(reader, index, document_id)? || rhs.test(reader, index, document_id)?
),
And(lhs, rhs) => Ok(
lhs.test(reader, index, document_id)? && rhs.test(reader, index, document_id)?
),
Not(op) => op.test(reader, index, document_id).map(bool::not),
}
}
fn build(expression: Pairs<'a, Rule>, schema: &'a Schema) -> FilterResult<'a> {
PREC_CLIMBER.climb(
expression,
|pair: Pair<Rule>| match pair.as_rule() {
Rule::eq => Ok(Filter::Condition(Condition::eq(pair, schema)?)),
Rule::greater => Ok(Filter::Condition(Condition::greater(pair, schema)?)),
Rule::less => Ok(Filter::Condition(Condition::less(pair, schema)?)),
Rule::neq => Ok(Filter::Condition(Condition::neq(pair, schema)?)),
Rule::geq => Ok(Filter::Condition(Condition::geq(pair, schema)?)),
Rule::leq => Ok(Filter::Condition(Condition::leq(pair, schema)?)),
Rule::prgm => Self::build(pair.into_inner(), schema),
Rule::term => Self::build(pair.into_inner(), schema),
Rule::not => Ok(Filter::Not(Box::new(Self::build(
pair.into_inner(),
schema,
)?))),
_ => unreachable!(),
},
|lhs: FilterResult, op: Pair<Rule>, rhs: FilterResult| match op.as_rule() {
Rule::or => Ok(Filter::Or(Box::new(lhs?), Box::new(rhs?))),
Rule::and => Ok(Filter::And(Box::new(lhs?), Box::new(rhs?))),
_ => unreachable!(),
},
)
}
}
#[cfg(test)]
mod test {
use super::*;
#[test]
fn invalid_syntax() {
assert!(FilterParser::parse(Rule::prgm, "field : id").is_err());
assert!(FilterParser::parse(Rule::prgm, "field=hello hello").is_err());
assert!(FilterParser::parse(Rule::prgm, "field=hello OR OR").is_err());
assert!(FilterParser::parse(Rule::prgm, "OR field:hello").is_err());
assert!(FilterParser::parse(Rule::prgm, r#"field="hello world"#).is_err());
assert!(FilterParser::parse(Rule::prgm, r#"field='hello world"#).is_err());
assert!(FilterParser::parse(Rule::prgm, "NOT field=").is_err());
assert!(FilterParser::parse(Rule::prgm, "N").is_err());
assert!(FilterParser::parse(Rule::prgm, "(field=1").is_err());
assert!(FilterParser::parse(Rule::prgm, "(field=1))").is_err());
assert!(FilterParser::parse(Rule::prgm, "field=1ORfield=2").is_err());
assert!(FilterParser::parse(Rule::prgm, "field=1 ( OR field=2)").is_err());
assert!(FilterParser::parse(Rule::prgm, "hello world=1").is_err());
assert!(FilterParser::parse(Rule::prgm, "").is_err());
assert!(FilterParser::parse(Rule::prgm, r#"((((((hello=world)))))"#).is_err());
}
#[test]
fn valid_syntax() {
assert!(FilterParser::parse(Rule::prgm, "field = id").is_ok());
assert!(FilterParser::parse(Rule::prgm, "field=id").is_ok());
assert!(FilterParser::parse(Rule::prgm, r#"field >= 10"#).is_ok());
assert!(FilterParser::parse(Rule::prgm, r#"field <= 10"#).is_ok());
assert!(FilterParser::parse(Rule::prgm, r#"field="hello world""#).is_ok());
assert!(FilterParser::parse(Rule::prgm, r#"field='hello world'"#).is_ok());
assert!(FilterParser::parse(Rule::prgm, r#"field > 10"#).is_ok());
assert!(FilterParser::parse(Rule::prgm, r#"field < 10"#).is_ok());
assert!(FilterParser::parse(Rule::prgm, r#"field < 10 AND NOT field=5"#).is_ok());
assert!(FilterParser::parse(Rule::prgm, r#"field < 10 AND NOT field > 7.5"#).is_ok());
assert!(FilterParser::parse(Rule::prgm, r#"field=true OR NOT field=5"#).is_ok());
assert!(FilterParser::parse(Rule::prgm, r#"NOT field=true OR NOT field=5"#).is_ok());
assert!(FilterParser::parse(Rule::prgm, r#"field='hello world' OR ( NOT field=true OR NOT field=5 )"#).is_ok());
assert!(FilterParser::parse(Rule::prgm, r#"field='hello \'worl\'d' OR ( NOT field=true OR NOT field=5 )"#).is_ok());
assert!(FilterParser::parse(Rule::prgm, r#"field="hello \"worl\"d" OR ( NOT field=true OR NOT field=5 )"#).is_ok());
assert!(FilterParser::parse(Rule::prgm, r#"((((((hello=world))))))"#).is_ok());
assert!(FilterParser::parse(Rule::prgm, r#""foo bar" > 10"#).is_ok());
assert!(FilterParser::parse(Rule::prgm, r#""foo bar" = 10"#).is_ok());
assert!(FilterParser::parse(Rule::prgm, r#"'foo bar' = 10"#).is_ok());
assert!(FilterParser::parse(Rule::prgm, r#"'foo bar' <= 10"#).is_ok());
assert!(FilterParser::parse(Rule::prgm, r#"'foo bar' != 10"#).is_ok());
assert!(FilterParser::parse(Rule::prgm, r#"bar != 10"#).is_ok());
}
}


@@ -0,0 +1,28 @@
key = _{quoted | word}
value = _{quoted | word}
quoted = _{ (PUSH("'") | PUSH("\"")) ~ string ~ POP }
string = {char*}
word = ${(LETTER | NUMBER | "_" | "-" | ".")+}
char = _{ !(PEEK | "\\") ~ ANY
| "\\" ~ (PEEK | "\\" | "/" | "b" | "f" | "n" | "r" | "t")
| "\\" ~ ("u" ~ ASCII_HEX_DIGIT{4})}
condition = _{eq | greater | less | geq | leq | neq}
geq = {key ~ ">=" ~ value}
leq = {key ~ "<=" ~ value}
neq = {key ~ "!=" ~ value}
eq = {key ~ "=" ~ value}
greater = {key ~ ">" ~ value}
less = {key ~ "<" ~ value}
prgm = {SOI ~ expr ~ EOI}
expr = _{ ( term ~ (operation ~ term)* ) }
term = { ("(" ~ expr ~ ")") | condition | not }
operation = _{ and | or }
and = {"AND"}
or = {"OR"}
not = {"NOT" ~ term}
WHITESPACE = _{ " " }


@@ -0,0 +1,12 @@
use once_cell::sync::Lazy;
use pest::prec_climber::{Operator, Assoc, PrecClimber};
pub static PREC_CLIMBER: Lazy<PrecClimber<Rule>> = Lazy::new(|| {
use Assoc::*;
use Rule::*;
pest::prec_climber::PrecClimber::new(vec![Operator::new(or, Left), Operator::new(and, Left)])
});
#[derive(Parser)]
#[grammar = "filters/parser/grammar.pest"]
pub struct FilterParser;
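Operators listed later in a PrecClimber bind tighter, so AND takes precedence over OR in this grammar. A hedged sketch of the consequence, using the Filter type from the parent module and assuming a schema with an indexed attribute a:
// "a = 1 OR a = 2 AND a = 3" parses as Or(a = 1, And(a = 2, a = 3)),
// not as And(Or(a = 1, a = 2), a = 3).
let mut schema = meilisearch_schema::Schema::new();
schema.insert_and_index("a").unwrap();
match crate::filters::Filter::parse("a = 1 OR a = 2 AND a = 3", &schema).unwrap() {
crate::filters::Filter::Or(_lhs, rhs) => match *rhs {
crate::filters::Filter::And(_, _) => (), // AND bound tighter, as expected
_ => panic!("expected And under the Or"),
},
_ => panic!("expected Or at the root"),
}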


@@ -1,12 +1,17 @@
+#![allow(clippy::type_complexity)]
#[cfg(test)]
#[macro_use]
extern crate assert_matches;
+#[macro_use]
+extern crate pest_derive;
mod automaton;
mod bucket_sort;
mod database;
mod distinct_map;
mod error;
+mod filters;
mod levenshtein;
mod number;
mod query_builder;
@@ -15,15 +20,17 @@ mod query_words_mapper;
mod ranked_map;
mod raw_document;
mod reordered_attrs;
-mod update;
-pub mod settings;
pub mod criterion;
+pub mod facets;
pub mod raw_indexer;
pub mod serde;
+pub mod settings;
pub mod store;
+pub mod update;
-pub use self::database::{BoxUpdateFn, Database, MainT, UpdateT};
+pub use self::database::{BoxUpdateFn, Database, DatabaseOptions, MainT, UpdateT, MainWriter, MainReader, UpdateWriter, UpdateReader};
-pub use self::error::{Error, HeedError, FstError, MResult};
+pub use self::error::{Error, HeedError, FstError, MResult, pest_error, FacetError};
+pub use self::filters::Filter;
pub use self::number::{Number, ParseNumberError};
pub use self::ranked_map::RankedMap;
pub use self::raw_document::RawDocument;
@@ -33,16 +40,20 @@ pub use meilisearch_types::{DocIndex, DocumentId, Highlight};
pub use meilisearch_schema::Schema;
pub use query_words_mapper::QueryWordsMapper;
-use std::convert::TryFrom;
-use std::collections::HashMap;
use compact_arena::SmallArena;
use log::{error, trace};
+use std::borrow::Cow;
+use std::collections::HashMap;
+use std::convert::TryFrom;
use crate::bucket_sort::PostingsListView;
use crate::levenshtein::prefix_damerau_levenshtein;
use crate::query_tree::{QueryId, QueryKind};
use crate::reordered_attrs::ReorderedAttrs;
+type FstSetCow<'a> = fst::Set<Cow<'a, [u8]>>;
+type FstMapCow<'a> = fst::Map<Cow<'a, [u8]>>;
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
pub struct Document {
pub id: DocumentId,
@@ -186,6 +197,6 @@ mod tests {
#[test]
fn docindex_mem_size() {
-assert_eq!(mem::size_of::<DocIndex>(), 16);
+assert_eq!(mem::size_of::<DocIndex>(), 12);
}
}


@@ -6,7 +6,7 @@ use std::str::FromStr;
use ordered_float::OrderedFloat;
use serde::{Deserialize, Serialize};
-#[derive(Serialize, Deserialize, Debug, Copy, Clone, Hash)]
+#[derive(Serialize, Deserialize, Debug, Copy, Clone)]
pub enum Number {
Unsigned(u64),
Signed(i64),


@@ -1,66 +1,57 @@
-use std::ops::Range;
+use std::borrow::Cow;
+use std::collections::HashMap;
+use std::ops::{Deref, Range};
use std::time::Duration;
-use crate::database::MainT;
-use crate::bucket_sort::{bucket_sort, bucket_sort_with_distinct};
-use crate::{criterion::Criteria, Document, DocumentId};
-use crate::{reordered_attrs::ReorderedAttrs, store, MResult};
+use either::Either;
+use sdset::{SetOperation, SetBuf, Set};
+use meilisearch_schema::FieldId;
+use crate::bucket_sort::{bucket_sort, bucket_sort_with_distinct, SortResult, placeholder_document_sort, facet_count};
+use crate::database::MainT;
+use crate::facets::FacetFilter;
+use crate::distinct_map::{DistinctMap, BufferedDistinctMap};
+use crate::Document;
+use crate::{criterion::Criteria, DocumentId};
+use crate::{reordered_attrs::ReorderedAttrs, store, MResult, MainReader};
-pub struct QueryBuilder<'c, 'f, 'd> {
+pub struct QueryBuilder<'c, 'f, 'd, 'i> {
criteria: Criteria<'c>,
searchable_attrs: Option<ReorderedAttrs>,
filter: Option<Box<dyn Fn(DocumentId) -> bool + 'f>>,
distinct: Option<(Box<dyn Fn(DocumentId) -> Option<u64> + 'd>, usize)>,
timeout: Option<Duration>,
-main_store: store::Main,
-postings_lists_store: store::PostingsLists,
-documents_fields_counts_store: store::DocumentsFieldsCounts,
-synonyms_store: store::Synonyms,
-prefix_documents_cache_store: store::PrefixDocumentsCache,
-prefix_postings_lists_cache_store: store::PrefixPostingsListsCache,
+index: &'i store::Index,
+facet_filter: Option<FacetFilter>,
+facets: Option<Vec<(FieldId, String)>>,
}
-impl<'c, 'f, 'd> QueryBuilder<'c, 'f, 'd> {
+impl<'c, 'f, 'd, 'i> QueryBuilder<'c, 'f, 'd, 'i> {
-pub fn new(
-main: store::Main,
-postings_lists: store::PostingsLists,
-documents_fields_counts: store::DocumentsFieldsCounts,
-synonyms: store::Synonyms,
-prefix_documents_cache: store::PrefixDocumentsCache,
-prefix_postings_lists_cache: store::PrefixPostingsListsCache,
-) -> QueryBuilder<'c, 'f, 'd> {
-QueryBuilder::with_criteria(
-main,
-postings_lists,
-documents_fields_counts,
-synonyms,
-prefix_documents_cache,
-prefix_postings_lists_cache,
-Criteria::default(),
-)
-}
+pub fn new(index: &'i store::Index) -> Self {
+QueryBuilder::with_criteria(index, Criteria::default())
+}
-pub fn with_criteria(
-main: store::Main,
-postings_lists: store::PostingsLists,
-documents_fields_counts: store::DocumentsFieldsCounts,
-synonyms: store::Synonyms,
-prefix_documents_cache: store::PrefixDocumentsCache,
-prefix_postings_lists_cache: store::PrefixPostingsListsCache,
-criteria: Criteria<'c>,
-) -> QueryBuilder<'c, 'f, 'd> {
+/// sets facet attributes to filter on
+pub fn set_facet_filter(&mut self, facets: Option<FacetFilter>) {
+self.facet_filter = facets;
+}
+/// sets facet attributes for which to return the count
+pub fn set_facets(&mut self, facets: Option<Vec<(FieldId, String)>>) {
+self.facets = facets;
+}
+pub fn with_criteria(index: &'i store::Index, criteria: Criteria<'c>) -> Self {
QueryBuilder {
criteria,
searchable_attrs: None,
filter: None,
distinct: None,
timeout: None,
-main_store: main,
-postings_lists_store: postings_lists,
-documents_fields_counts_store: documents_fields_counts,
-synonyms_store: synonyms,
-prefix_documents_cache_store: prefix_documents_cache,
-prefix_postings_lists_cache_store: prefix_postings_lists_cache,
+index,
+facet_filter: None,
+facets: None,
}
}
@@ -87,45 +78,199 @@ impl<'c, 'f, 'd> QueryBuilder<'c, 'f, 'd> {
reorders.insert_attribute(attribute);
}
-pub fn query(
-self,
-reader: &heed::RoTxn<MainT>,
-query: &str,
-range: Range<usize>,
-) -> MResult<Vec<Document>> {
+/// returns the document ids associated with a facet filter by computing the union and
+/// intersection of the document sets
+fn facets_docids(&self, reader: &MainReader) -> MResult<Option<SetBuf<DocumentId>>> {
+let facet_docids = match self.facet_filter {
+Some(ref facets) => {
+let mut ands = Vec::with_capacity(facets.len());
+let mut ors = Vec::new();
+for f in facets.deref() {
+match f {
+Either::Left(keys) => {
+ors.reserve(keys.len());
+for key in keys {
+let docids = self
+.index
+.facets
+.facet_document_ids(reader, &key)?
+.unwrap_or_default();
+ors.push(docids);
+}
+let sets: Vec<_> = ors.iter().map(|(_, i)| i).map(Cow::deref).collect();
+let or_result = sdset::multi::OpBuilder::from_vec(sets).union().into_set_buf();
+ands.push(Cow::Owned(or_result));
+ors.clear();
+}
+Either::Right(key) => {
+match self.index.facets.facet_document_ids(reader, &key)? {
+Some((_name, docids)) => ands.push(docids),
+// no candidates for search, early return.
+None => return Ok(Some(SetBuf::default())),
+}
+}
+};
+}
+let ands: Vec<_> = ands.iter().map(Cow::deref).collect();
+Some(
+sdset::multi::OpBuilder::from_vec(ands)
+.intersection()
+.into_set_buf(),
+)
+}
+None => None,
+};
+Ok(facet_docids)
+}
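In isolation, the set algebra above: each inner (Either::Left) array is a union of facet document sets, and the outer array intersects those results. A self-contained sdset sketch, with plain u32 ids standing in for DocumentId:
use sdset::{Set, SetBuf, SetOperation};
// the facet filter [A, [B, C]] selects A ∩ (B ∪ C)
let a = Set::new(&[1u32, 2, 3, 5]).unwrap();
let b = Set::new(&[2u32, 4]).unwrap();
let c = Set::new(&[3u32, 5, 6]).unwrap();
let b_or_c: SetBuf<u32> = sdset::multi::OpBuilder::from_vec(vec![b, c]).union().into_set_buf();
let result: SetBuf<u32> = sdset::multi::OpBuilder::from_vec(vec![a, b_or_c.as_set()]).intersection().into_set_buf();
assert_eq!(result.as_slice(), &[2, 3, 5][..]);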
+fn standard_query(self, reader: &MainReader, query: &str, range: Range<usize>) -> MResult<SortResult> {
+let facets_docids = match self.facets_docids(reader)? {
+Some(ids) if ids.is_empty() => return Ok(SortResult::default()),
+other => other
+};
+// for each field to retrieve the count for, create a HashMap associating the attribute
+// value to a set of matching documents. The HashMaps are then collected in another
+// HashMap, associating each HashMap to its field.
+let facet_count_docids = self.facet_count_docids(reader)?;
match self.distinct {
Some((distinct, distinct_size)) => bucket_sort_with_distinct(
reader,
query,
range,
+facets_docids,
+facet_count_docids,
self.filter,
distinct,
distinct_size,
self.criteria,
self.searchable_attrs,
-self.main_store,
-self.postings_lists_store,
-self.documents_fields_counts_store,
-self.synonyms_store,
-self.prefix_documents_cache_store,
-self.prefix_postings_lists_cache_store,
+self.index,
),
None => bucket_sort(
reader,
query,
range,
+facets_docids,
+facet_count_docids,
self.filter,
self.criteria,
self.searchable_attrs,
-self.main_store,
-self.postings_lists_store,
-self.documents_fields_counts_store,
-self.synonyms_store,
-self.prefix_documents_cache_store,
-self.prefix_postings_lists_cache_store,
+self.index,
),
}
}
+fn placeholder_query(self, reader: &heed::RoTxn<MainT>, range: Range<usize>) -> MResult<SortResult> {
+match self.facets_docids(reader)? {
+Some(docids) => {
+// We sort the docids from facets according to the criteria set by the user
+let mut sorted_docids = docids.clone().into_vec();
+let mut sort_result = match self.index.main.ranked_map(reader)? {
+Some(ranked_map) => {
+placeholder_document_sort(&mut sorted_docids, self.index, reader, &ranked_map)?;
+self.sort_result_from_docids(&sorted_docids, range)
+},
+// if we can't perform a sort, we return documents unordered
+None => self.sort_result_from_docids(&docids, range),
+};
+if let Some(f) = self.facet_count_docids(reader)? {
+sort_result.exhaustive_facets_count = Some(true);
+sort_result.facets = Some(facet_count(f, &docids));
+}
+Ok(sort_result)
+},
+None => {
+match self.index.main.sorted_document_ids_cache(reader)? {
+// build result from cached document ids
+Some(docids) => {
+let mut sort_result = self.sort_result_from_docids(&docids, range);
+if let Some(f) = self.facet_count_docids(reader)? {
+sort_result.exhaustive_facets_count = Some(true);
+// document ids are not sorted in natural order, we need to construct a new set
+let document_set = SetBuf::from_dirty(Vec::from(docids));
+sort_result.facets = Some(facet_count(f, &document_set));
+}
+Ok(sort_result)
+},
+// no document id cached, return empty result
+None => Ok(SortResult::default()),
+}
+}
+}
+}
+fn facet_count_docids<'a>(&self, reader: &'a MainReader) -> MResult<Option<HashMap<String, HashMap<String, (&'a str, Cow<'a, Set<DocumentId>>)>>>> {
+match self.facets {
+Some(ref field_ids) => {
+let mut facet_count_map = HashMap::new();
+for (field_id, field_name) in field_ids {
+let mut key_map = HashMap::new();
+for pair in self.index.facets.field_document_ids(reader, *field_id)? {
+let (facet_key, document_ids) = pair?;
+let value = facet_key.value();
+key_map.insert(value.to_string(), document_ids);
+}
+facet_count_map.insert(field_name.clone(), key_map);
+}
+Ok(Some(facet_count_map))
+}
+None => Ok(None),
+}
+}
+fn sort_result_from_docids(&self, docids: &[DocumentId], range: Range<usize>) -> SortResult {
+let mut sort_result = SortResult::default();
+let mut result = match self.filter {
+Some(ref filter) => docids
+.iter()
+.filter(|item| (filter)(**item))
+.skip(range.start)
+.take(range.end - range.start)
+.map(|&id| Document::from_highlights(id, &[]))
+.collect::<Vec<_>>(),
+None => docids
+.iter()
+.skip(range.start)
+.take(range.end - range.start)
+.map(|&id| Document::from_highlights(id, &[]))
+.collect::<Vec<_>>(),
+};
+// if distinct is set, remove duplicates with the distinct function
+if let Some((distinct, distinct_size)) = &self.distinct {
+let mut distinct_map = DistinctMap::new(*distinct_size);
+let mut distinct_map = BufferedDistinctMap::new(&mut distinct_map);
+result.retain(|doc| {
+let id = doc.id;
+let key = (distinct)(id);
+match key {
+Some(key) => distinct_map.register(key),
+None => distinct_map.register_without_key(),
+}
+});
+}
+sort_result.documents = result;
+sort_result.nb_hits = docids.len();
+sort_result
+}
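For reference, the range handling above is plain skip/take windowing over the (optionally filtered) id list; nb_hits reports the full candidate count, not the page length. A trivial sketch of the windowing:
// range 10..30 keeps at most 20 documents starting at offset 10
let docids: Vec<u32> = (0u32..100).collect();
let range = 10..30;
let page: Vec<u32> = docids.iter().skip(range.start).take(range.end - range.start).copied().collect();
assert_eq!(page.len(), 20);
assert_eq!(page.first(), Some(&10));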
+pub fn query(
+self,
+reader: &heed::RoTxn<MainT>,
+query: Option<&str>,
+range: Range<usize>,
+) -> MResult<SortResult> {
+match query {
+Some(query) => self.standard_query(reader, query, range),
+None => self.placeholder_query(reader, range),
+}
+}
}
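With the Option in the signature above, keyword and placeholder search share one entry point. A hedged usage sketch in the style of the tests below (the index and reader names are assumptions, standing for the store and read transaction set up there):
let builder = index.query_builder();
// standard search: Some(query) runs ranked keyword matching
let SortResult { documents, nb_hits, .. } = builder.query(&reader, Some("iphone"), 0..20).unwrap();
assert!(documents.len() <= 20 && documents.len() <= nb_hits);
// placeholder search: None returns documents in ranked or cached order
let builder = index.query_builder();
let SortResult { documents, .. } = builder.query(&reader, None, 0..20).unwrap();
assert!(documents.len() <= 20);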
#[cfg(test)]
@@ -135,33 +280,34 @@ mod tests {
use std::collections::{BTreeSet, HashMap};
use std::iter::FromIterator;
-use fst::{IntoStreamer, Set};
+use fst::IntoStreamer;
use meilisearch_schema::IndexedPos;
use sdset::SetBuf;
use tempfile::TempDir;
-use crate::DocIndex;
use crate::automaton::normalize_str;
use crate::bucket_sort::SimpleMatch;
-use crate::database::Database;
+use crate::database::{Database, DatabaseOptions};
use crate::store::Index;
+use crate::DocIndex;
+use crate::Document;
use meilisearch_schema::Schema;
-fn set_from_stream<'f, I, S>(stream: I) -> Set
+fn set_from_stream<'f, I, S>(stream: I) -> fst::Set<Vec<u8>>
where
I: for<'a> fst::IntoStreamer<'a, Into = S, Item = &'a [u8]>,
S: 'f + for<'a> fst::Streamer<'a, Item = &'a [u8]>,
{
let mut builder = fst::SetBuilder::memory();
builder.extend_stream(stream).unwrap();
-builder.into_inner().and_then(Set::from_bytes).unwrap()
+builder.into_set()
}
-fn insert_key(set: &Set, key: &[u8]) -> Set {
+fn insert_key<A: AsRef<[u8]>>(set: &fst::Set<A>, key: &[u8]) -> fst::Set<Vec<u8>> {
let unique_key = {
let mut builder = fst::SetBuilder::memory();
builder.insert(key).unwrap();
-builder.into_inner().and_then(Set::from_bytes).unwrap()
+builder.into_set()
};
let union_ = set.op().add(unique_key.into_stream()).r#union();
@@ -169,14 +315,14 @@ mod tests {
set_from_stream(union_)
}
-fn sdset_into_fstset(set: &sdset::Set<&str>) -> Set {
+fn sdset_into_fstset(set: &sdset::Set<&str>) -> fst::Set<Vec<u8>> {
let mut builder = fst::SetBuilder::memory();
let set = SetBuf::from_dirty(set.into_iter().map(|s| normalize_str(s)).collect());
builder.extend_iter(set.into_iter()).unwrap();
-builder.into_inner().and_then(Set::from_bytes).unwrap()
+builder.into_set()
}
-const fn doc_index(document_id: u64, word_index: u16) -> DocIndex {
+const fn doc_index(document_id: u32, word_index: u16) -> DocIndex {
DocIndex {
document_id: DocumentId(document_id),
attribute: 0,
@@ -186,7 +332,7 @@ mod tests {
}
}
-const fn doc_char_index(document_id: u64, word_index: u16, char_index: u16) -> DocIndex {
+const fn doc_char_index(document_id: u32, word_index: u16, char_index: u16) -> DocIndex {
DocIndex {
document_id: DocumentId(document_id),
attribute: 0,
@@ -213,15 +359,11 @@ mod tests {
let word = normalize_str(word);
-let alternatives = match self
+let alternatives = self
.index
.synonyms
-.synonyms(&writer, word.as_bytes())
+.synonyms_fst(&writer, word.as_bytes())
-.unwrap()
+.unwrap();
-{
-Some(alternatives) => alternatives,
-None => fst::Set::default(),
-};
let new = sdset_into_fstset(&new);
let new_alternatives =
@@ -231,10 +373,7 @@ mod tests {
.put_synonyms(&mut writer, word.as_bytes(), &new_alternatives)
.unwrap();
-let synonyms = match self.index.main.synonyms_fst(&writer).unwrap() {
-Some(synonyms) => synonyms,
-None => fst::Set::default(),
-};
+let synonyms = self.index.main.synonyms_fst(&writer).unwrap();
let synonyms_fst = insert_key(&synonyms, word.as_bytes());
self.index
@@ -249,7 +388,7 @@ mod tests {
impl<'a> FromIterator<(&'a str, &'a [DocIndex])> for TempDatabase {
fn from_iter<I: IntoIterator<Item = (&'a str, &'a [DocIndex])>>(iter: I) -> Self {
let tempdir = TempDir::new().unwrap();
-let database = Database::open_or_create(&tempdir).unwrap();
+let database = Database::open_or_create(&tempdir, DatabaseOptions::default()).unwrap();
let index = database.create_index("default").unwrap();
let db = &database;
@@ -287,7 +426,7 @@ mod tests {
index.main.put_schema(&mut writer, &schema).unwrap();
-let words_fst = Set::from_iter(words_fst).unwrap();
+let words_fst = fst::Set::from_iter(words_fst).unwrap();
index.main.put_words_fst(&mut writer, &words_fst).unwrap();
@@ -331,8 +470,8 @@ mod tests {
let reader = db.main_read_txn().unwrap();
let builder = store.query_builder();
-let results = builder.query(&reader, "iphone from apple", 0..20).unwrap();
-let mut iter = results.into_iter();
+let SortResult { documents, .. } = builder.query(&reader, Some("iphone from apple"), 0..20).unwrap();
+let mut iter = documents.into_iter();
assert_matches!(iter.next(), Some(Document { id: DocumentId(0), matches, .. }) => {
let mut matches = matches.into_iter();
@@ -354,8 +493,8 @@ mod tests {
let reader = db.main_read_txn().unwrap();
let builder = store.query_builder();
-let results = builder.query(&reader, "hello", 0..20).unwrap();
-let mut iter = results.into_iter();
+let SortResult { documents, .. } = builder.query(&reader, Some("hello"), 0..20).unwrap();
+let mut iter = documents.into_iter();
assert_matches!(iter.next(), Some(Document { id: DocumentId(0), matches, .. }) => {
let mut matches = matches.into_iter();
@@ -365,8 +504,8 @@ mod tests {
assert_matches!(iter.next(), None);
let builder = store.query_builder();
-let results = builder.query(&reader, "bonjour", 0..20).unwrap();
-let mut iter = results.into_iter();
+let SortResult { documents, .. } = builder.query(&reader, Some("bonjour"), 0..20).unwrap();
+let mut iter = documents.into_iter();
assert_matches!(iter.next(), Some(Document { id: DocumentId(0), matches, .. }) => {
let mut matches = matches.into_iter();
@@ -388,7 +527,7 @@ mod tests {
// let builder = store.query_builder();
// let results = builder.query(&reader, "sal", 0..20).unwrap();
-// let mut iter = results.into_iter();
+// let mut iter = documents.into_iter();
// assert_matches!(iter.next(), Some(Document { id: DocumentId(0), matches, .. }) => {
// let mut matches = matches.into_iter();
@@ -399,7 +538,7 @@ mod tests {
// let builder = store.query_builder();
// let results = builder.query(&reader, "bonj", 0..20).unwrap();
-// let mut iter = results.into_iter();
+// let mut iter = documents.into_iter();
// assert_matches!(iter.next(), Some(Document { id: DocumentId(0), matches, .. }) => {
// let mut matches = matches.into_iter();
@@ -410,13 +549,13 @@ mod tests {
// let builder = store.query_builder();
// let results = builder.query(&reader, "sal blabla", 0..20).unwrap();
-// let mut iter = results.into_iter();
+// let mut iter = documents.into_iter();
// assert_matches!(iter.next(), None);
// let builder = store.query_builder();
// let results = builder.query(&reader, "bonj blabla", 0..20).unwrap();
-// let mut iter = results.into_iter();
+// let mut iter = documents.into_iter();
// assert_matches!(iter.next(), None);
// }
@@ -432,7 +571,7 @@ mod tests {
// let builder = store.query_builder();
// let results = builder.query(&reader, "salutution", 0..20).unwrap();
-// let mut iter = results.into_iter();
+// let mut iter = documents.into_iter();
// assert_matches!(iter.next(), Some(Document { id: DocumentId(0), matches, .. }) => {
// let mut matches = matches.into_iter();
@@ -443,7 +582,7 @@ mod tests {
// let builder = store.query_builder();
// let results = builder.query(&reader, "saluttion", 0..20).unwrap();
-// let mut iter = results.into_iter();
+// let mut iter = documents.into_iter();
// assert_matches!(iter.next(), Some(Document { id: DocumentId(0), matches, .. }) => {
// let mut matches = matches.into_iter();
@@ -469,8 +608,8 @@ mod tests {
let reader = db.main_read_txn().unwrap();
let builder = store.query_builder();
-let results = builder.query(&reader, "hello", 0..20).unwrap();
-let mut iter = results.into_iter();
+let SortResult { documents, .. } = builder.query(&reader, Some("hello"), 0..20).unwrap();
+let mut iter = documents.into_iter();
assert_matches!(iter.next(), Some(Document { id: DocumentId(0), matches, .. }) => {
let mut matches = matches.into_iter();
@@ -490,8 +629,8 @@ mod tests {
assert_matches!(iter.next(), None);
let builder = store.query_builder();
-let results = builder.query(&reader, "bonjour", 0..20).unwrap();
-let mut iter = results.into_iter();
+let SortResult { documents, .. } = builder.query(&reader, Some("bonjour"), 0..20).unwrap();
+let mut iter = documents.into_iter();
assert_matches!(iter.next(), Some(Document { id: DocumentId(0), matches, .. }) => {
let mut matches = matches.into_iter();
@@ -511,8 +650,8 @@ mod tests {
assert_matches!(iter.next(), None);
let builder = store.query_builder();
-let results = builder.query(&reader, "salut", 0..20).unwrap();
-let mut iter = results.into_iter();
+let SortResult { documents, .. } = builder.query(&reader, Some("salut"), 0..20).unwrap();
+let mut iter = documents.into_iter();
assert_matches!(iter.next(), Some(Document { id: DocumentId(0), matches, .. }) => {
let mut matches = matches.into_iter();
@ -557,8 +696,8 @@ mod tests {
let reader = db.main_read_txn().unwrap(); let reader = db.main_read_txn().unwrap();
let builder = store.query_builder(); let builder = store.query_builder();
let results = builder.query(&reader, "NY subway", 0..20).unwrap(); let SortResult { documents, .. } = builder.query(&reader, Some("NY subway"), 0..20).unwrap();
let mut iter = results.into_iter(); let mut iter = documents.into_iter();
assert_matches!(iter.next(), Some(Document { id: DocumentId(1), matches, .. }) => { assert_matches!(iter.next(), Some(Document { id: DocumentId(1), matches, .. }) => {
let mut iter = matches.into_iter(); let mut iter = matches.into_iter();
@ -579,8 +718,8 @@ mod tests {
assert_matches!(iter.next(), None); assert_matches!(iter.next(), None);
let builder = store.query_builder(); let builder = store.query_builder();
let results = builder.query(&reader, "NYC subway", 0..20).unwrap(); let SortResult { documents, .. } = builder.query(&reader, Some("NYC subway"), 0..20).unwrap();
let mut iter = results.into_iter(); let mut iter = documents.into_iter();
assert_matches!(iter.next(), Some(Document { id: DocumentId(1), matches, .. }) => { assert_matches!(iter.next(), Some(Document { id: DocumentId(1), matches, .. }) => {
let mut iter = matches.into_iter(); let mut iter = matches.into_iter();
@ -621,8 +760,8 @@ mod tests {
let reader = db.main_read_txn().unwrap(); let reader = db.main_read_txn().unwrap();
let builder = store.query_builder(); let builder = store.query_builder();
let results = builder.query(&reader, "NY", 0..20).unwrap(); let SortResult { documents, .. } = builder.query(&reader, Some("NY"), 0..20).unwrap();
let mut iter = results.into_iter(); let mut iter = documents.into_iter();
assert_matches!(iter.next(), Some(Document { id: DocumentId(2), matches, .. }) => { assert_matches!(iter.next(), Some(Document { id: DocumentId(2), matches, .. }) => {
let mut matches = matches.into_iter(); let mut matches = matches.into_iter();
@ -645,8 +784,8 @@ mod tests {
assert_matches!(iter.next(), None); assert_matches!(iter.next(), None);
let builder = store.query_builder(); let builder = store.query_builder();
let results = builder.query(&reader, "new york", 0..20).unwrap(); let SortResult { documents, .. } = builder.query(&reader, Some("new york"), 0..20).unwrap();
let mut iter = results.into_iter(); let mut iter = documents.into_iter();
assert_matches!(iter.next(), Some(Document { id: DocumentId(0), matches, .. }) => { assert_matches!(iter.next(), Some(Document { id: DocumentId(0), matches, .. }) => {
let mut matches = matches.into_iter(); let mut matches = matches.into_iter();
@ -679,8 +818,8 @@ mod tests {
let reader = db.main_read_txn().unwrap(); let reader = db.main_read_txn().unwrap();
let builder = store.query_builder(); let builder = store.query_builder();
let results = builder.query(&reader, "NY subway", 0..20).unwrap(); let SortResult { documents, .. } = builder.query(&reader, Some("NY subway"), 0..20).unwrap();
let mut iter = results.into_iter(); let mut iter = documents.into_iter();
assert_matches!(iter.next(), Some(Document { id: DocumentId(0), matches, .. }) => { assert_matches!(iter.next(), Some(Document { id: DocumentId(0), matches, .. }) => {
let mut matches = matches.into_iter(); let mut matches = matches.into_iter();
@ -696,8 +835,9 @@ mod tests {
assert_matches!(iter.next(), None); assert_matches!(iter.next(), None);
let builder = store.query_builder(); let builder = store.query_builder();
let results = builder.query(&reader, "new york subway", 0..20).unwrap(); let SortResult { documents, .. } =
let mut iter = results.into_iter(); builder.query(&reader, Some("new york subway"), 0..20).unwrap();
let mut iter = documents.into_iter();
assert_matches!(iter.next(), Some(Document { id: DocumentId(1), matches, .. }) => { assert_matches!(iter.next(), Some(Document { id: DocumentId(1), matches, .. }) => {
let mut matches = matches.into_iter(); let mut matches = matches.into_iter();
@ -744,8 +884,8 @@ mod tests {
let reader = db.main_read_txn().unwrap(); let reader = db.main_read_txn().unwrap();
let builder = store.query_builder(); let builder = store.query_builder();
let results = builder.query(&reader, "NY subway", 0..20).unwrap(); let SortResult { documents, .. } = builder.query(&reader, Some("NY subway"), 0..20).unwrap();
let mut iter = results.into_iter(); let mut iter = documents.into_iter();
assert_matches!(iter.next(), Some(Document { id: DocumentId(1), matches, .. }) => { assert_matches!(iter.next(), Some(Document { id: DocumentId(1), matches, .. }) => {
let mut iter = matches.into_iter(); let mut iter = matches.into_iter();
@ -766,8 +906,8 @@ mod tests {
assert_matches!(iter.next(), None); assert_matches!(iter.next(), None);
let builder = store.query_builder(); let builder = store.query_builder();
let results = builder.query(&reader, "NYC subway", 0..20).unwrap(); let SortResult { documents, .. } = builder.query(&reader, Some("NYC subway"), 0..20).unwrap();
let mut iter = results.into_iter(); let mut iter = documents.into_iter();
assert_matches!(iter.next(), Some(Document { id: DocumentId(1), matches, .. }) => { assert_matches!(iter.next(), Some(Document { id: DocumentId(1), matches, .. }) => {
let mut iter = matches.into_iter(); let mut iter = matches.into_iter();
@ -819,8 +959,8 @@ mod tests {
let reader = db.main_read_txn().unwrap(); let reader = db.main_read_txn().unwrap();
let builder = store.query_builder(); let builder = store.query_builder();
let results = builder.query(&reader, "NY subway broken", 0..20).unwrap(); let SortResult {documents, .. } = builder.query(&reader, Some("NY subway broken"), 0..20).unwrap();
let mut iter = results.into_iter(); let mut iter = documents.into_iter();
assert_matches!(iter.next(), Some(Document { id: DocumentId(0), matches, .. }) => { assert_matches!(iter.next(), Some(Document { id: DocumentId(0), matches, .. }) => {
let mut iter = matches.into_iter(); let mut iter = matches.into_iter();
@ -835,8 +975,8 @@ mod tests {
assert_matches!(iter.next(), None); assert_matches!(iter.next(), None);
let builder = store.query_builder(); let builder = store.query_builder();
let results = builder.query(&reader, "NYC subway", 0..20).unwrap(); let SortResult { documents, .. } = builder.query(&reader, Some("NYC subway"), 0..20).unwrap();
let mut iter = results.into_iter(); let mut iter = documents.into_iter();
assert_matches!(iter.next(), Some(Document { id: DocumentId(1), matches, .. }) => { assert_matches!(iter.next(), Some(Document { id: DocumentId(1), matches, .. }) => {
let mut iter = matches.into_iter(); let mut iter = matches.into_iter();
@ -891,10 +1031,10 @@ mod tests {
let reader = db.main_read_txn().unwrap(); let reader = db.main_read_txn().unwrap();
let builder = store.query_builder(); let builder = store.query_builder();
let results = builder let SortResult { documents, .. } = builder
.query(&reader, "new york underground train broken", 0..20) .query(&reader, Some("new york underground train broken"), 0..20)
.unwrap(); .unwrap();
let mut iter = results.into_iter(); let mut iter = documents.into_iter();
assert_matches!(iter.next(), Some(Document { id: DocumentId(2), matches, .. }) => { assert_matches!(iter.next(), Some(Document { id: DocumentId(2), matches, .. }) => {
let mut matches = matches.into_iter(); let mut matches = matches.into_iter();
@ -921,10 +1061,10 @@ mod tests {
assert_matches!(iter.next(), None); assert_matches!(iter.next(), None);
let builder = store.query_builder(); let builder = store.query_builder();
let results = builder let SortResult { documents, .. } = builder
.query(&reader, "new york city underground train broken", 0..20) .query(&reader, Some("new york city underground train broken"), 0..20)
.unwrap(); .unwrap();
let mut iter = results.into_iter(); let mut iter = documents.into_iter();
assert_matches!(iter.next(), Some(Document { id: DocumentId(2), matches, .. }) => { assert_matches!(iter.next(), Some(Document { id: DocumentId(2), matches, .. }) => {
let mut matches = matches.into_iter(); let mut matches = matches.into_iter();
@ -965,8 +1105,8 @@ mod tests {
let reader = db.main_read_txn().unwrap(); let reader = db.main_read_txn().unwrap();
let builder = store.query_builder(); let builder = store.query_builder();
let results = builder.query(&reader, "new york big ", 0..20).unwrap(); let SortResult { documents, .. } = builder.query(&reader, Some("new york big "), 0..20).unwrap();
let mut iter = results.into_iter(); let mut iter = documents.into_iter();
assert_matches!(iter.next(), Some(Document { id: DocumentId(0), matches, .. }) => { assert_matches!(iter.next(), Some(Document { id: DocumentId(0), matches, .. }) => {
let mut matches = matches.into_iter(); let mut matches = matches.into_iter();
@ -999,8 +1139,8 @@ mod tests {
let reader = db.main_read_txn().unwrap(); let reader = db.main_read_txn().unwrap();
let builder = store.query_builder(); let builder = store.query_builder();
let results = builder.query(&reader, "NY subway ", 0..20).unwrap(); let SortResult { documents, .. } = builder.query(&reader, Some("NY subway "), 0..20).unwrap();
let mut iter = results.into_iter(); let mut iter = documents.into_iter();
assert_matches!(iter.next(), Some(Document { id: DocumentId(0), matches, .. }) => { assert_matches!(iter.next(), Some(Document { id: DocumentId(0), matches, .. }) => {
let mut matches = matches.into_iter(); let mut matches = matches.into_iter();
@ -1049,10 +1189,10 @@ mod tests {
let reader = db.main_read_txn().unwrap(); let reader = db.main_read_txn().unwrap();
let builder = store.query_builder(); let builder = store.query_builder();
let results = builder let SortResult { documents, .. } = builder
.query(&reader, "new york city long subway cool ", 0..20) .query(&reader, Some("new york city long subway cool "), 0..20)
.unwrap(); .unwrap();
let mut iter = results.into_iter(); let mut iter = documents.into_iter();
assert_matches!(iter.next(), Some(Document { id: DocumentId(0), matches, .. }) => { assert_matches!(iter.next(), Some(Document { id: DocumentId(0), matches, .. }) => {
let mut matches = matches.into_iter(); let mut matches = matches.into_iter();
@ -1082,8 +1222,8 @@ mod tests {
let reader = db.main_read_txn().unwrap(); let reader = db.main_read_txn().unwrap();
let builder = store.query_builder(); let builder = store.query_builder();
let results = builder.query(&reader, "telephone", 0..20).unwrap(); let SortResult { documents, .. } = builder.query(&reader, Some("telephone"), 0..20).unwrap();
let mut iter = results.into_iter(); let mut iter = documents.into_iter();
assert_matches!(iter.next(), Some(Document { id: DocumentId(0), matches, .. }) => { assert_matches!(iter.next(), Some(Document { id: DocumentId(0), matches, .. }) => {
let mut iter = matches.into_iter(); let mut iter = matches.into_iter();
@ -1099,8 +1239,8 @@ mod tests {
assert_matches!(iter.next(), None); assert_matches!(iter.next(), None);
let builder = store.query_builder(); let builder = store.query_builder();
let results = builder.query(&reader, "téléphone", 0..20).unwrap(); let SortResult { documents, .. } = builder.query(&reader, Some("téléphone"), 0..20).unwrap();
let mut iter = results.into_iter(); let mut iter = documents.into_iter();
assert_matches!(iter.next(), Some(Document { id: DocumentId(0), matches, .. }) => { assert_matches!(iter.next(), Some(Document { id: DocumentId(0), matches, .. }) => {
let mut iter = matches.into_iter(); let mut iter = matches.into_iter();
@ -1116,8 +1256,8 @@ mod tests {
assert_matches!(iter.next(), None); assert_matches!(iter.next(), None);
let builder = store.query_builder(); let builder = store.query_builder();
let results = builder.query(&reader, "télephone", 0..20).unwrap(); let SortResult { documents, .. } = builder.query(&reader, Some("télephone"), 0..20).unwrap();
let mut iter = results.into_iter(); let mut iter = documents.into_iter();
assert_matches!(iter.next(), Some(Document { id: DocumentId(1), matches, .. }) => { assert_matches!(iter.next(), Some(Document { id: DocumentId(1), matches, .. }) => {
let mut iter = matches.into_iter(); let mut iter = matches.into_iter();
@ -1143,8 +1283,8 @@ mod tests {
let reader = db.main_read_txn().unwrap(); let reader = db.main_read_txn().unwrap();
let builder = store.query_builder(); let builder = store.query_builder();
let results = builder.query(&reader, "i phone case", 0..20).unwrap(); let SortResult { documents, .. } = builder.query(&reader, Some("i phone case"), 0..20).unwrap();
let mut iter = results.into_iter(); let mut iter = documents.into_iter();
assert_matches!(iter.next(), Some(Document { id: DocumentId(0), matches, .. }) => { assert_matches!(iter.next(), Some(Document { id: DocumentId(0), matches, .. }) => {
let mut iter = matches.into_iter(); let mut iter = matches.into_iter();
@ -1172,8 +1312,8 @@ mod tests {
let reader = db.main_read_txn().unwrap(); let reader = db.main_read_txn().unwrap();
let builder = store.query_builder(); let builder = store.query_builder();
let results = builder.query(&reader, "searchengine", 0..20).unwrap(); let SortResult { documents, .. } = builder.query(&reader, Some("searchengine"), 0..20).unwrap();
let mut iter = results.into_iter(); let mut iter = documents.into_iter();
assert_matches!(iter.next(), Some(Document { id: DocumentId(0), matches, .. }) => { assert_matches!(iter.next(), Some(Document { id: DocumentId(0), matches, .. }) => {
let mut iter = matches.into_iter(); let mut iter = matches.into_iter();
@ -1212,8 +1352,8 @@ mod tests {
let reader = db.main_read_txn().unwrap(); let reader = db.main_read_txn().unwrap();
let builder = store.query_builder(); let builder = store.query_builder();
let results = builder.query(&reader, "searchengine", 0..20).unwrap(); let SortResult { documents, .. } = builder.query(&reader, Some("searchengine"), 0..20).unwrap();
let mut iter = results.into_iter(); let mut iter = documents.into_iter();
assert_matches!(iter.next(), Some(Document { id: DocumentId(0), matches, .. }) => { assert_matches!(iter.next(), Some(Document { id: DocumentId(0), matches, .. }) => {
let mut iter = matches.into_iter(); let mut iter = matches.into_iter();
@ -1244,8 +1384,8 @@ mod tests {
let reader = db.main_read_txn().unwrap(); let reader = db.main_read_txn().unwrap();
let builder = store.query_builder(); let builder = store.query_builder();
let results = builder.query(&reader, "searchengine", 0..20).unwrap(); let SortResult { documents, .. } = builder.query(&reader, Some("searchengine"), 0..20).unwrap();
let mut iter = results.into_iter(); let mut iter = documents.into_iter();
assert_matches!(iter.next(), Some(Document { id: DocumentId(0), matches, .. }) => { assert_matches!(iter.next(), Some(Document { id: DocumentId(0), matches, .. }) => {
let mut iter = matches.into_iter(); let mut iter = matches.into_iter();
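Taken together, these test hunks show one mechanical migration: `query` now takes an `Option<&str>` instead of a `&str` and returns a `SortResult` whose `documents` field replaces the old direct result list. A minimal before/after sketch of a caller, assuming only the `store`, `reader`, `SortResult`, and `Document` names visible in the tests above; everything else is illustrative:

```rust
// Sketch only: `store` and `reader` are the test fixtures used above.
let builder = store.query_builder();

// Before: the query string was mandatory and hits came back directly.
// let results = builder.query(&reader, "iphone", 0..20).unwrap();

// After: the query is optional (passing None presumably requests a
// placeholder search over all documents -- an assumption, since only the
// Some(..) form appears in these tests) and hits live in `documents`.
let SortResult { documents, .. } = builder
    .query(&reader, Some("iphone"), 0..20)
    .unwrap();

for document in documents {
    // `Document` exposes an `id` field, per the assert_matches! patterns.
    println!("{:?}", document.id);
}
```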

View File

@@ -12,7 +12,7 @@ use sdset::{Set, SetBuf, SetOperation};
 use log::debug;
 use crate::database::MainT;
-use crate::{store, DocumentId, DocIndex, MResult};
+use crate::{store, DocumentId, DocIndex, MResult, FstSetCow};
 use crate::automaton::{normalize_str, build_dfa, build_prefix_dfa, build_exact_dfa};
 use crate::QueryWordsMapper;
@@ -112,9 +112,9 @@ pub struct PostingsList {
 matches: SetBuf<DocIndex>,
 }
-pub struct Context {
-pub words_set: fst::Set,
-pub stop_words: fst::Set,
+pub struct Context<'a> {
+pub words_set: FstSetCow<'a>,
+pub stop_words: FstSetCow<'a>,
 pub synonyms: store::Synonyms,
 pub postings_lists: store::PostingsLists,
 pub prefix_postings_lists: store::PrefixPostingsListsCache,
@@ -147,7 +147,7 @@ fn split_best_frequency<'a>(reader: &heed::RoTxn<MainT>, ctx: &Context, word: &'
 fn fetch_synonyms(reader: &heed::RoTxn<MainT>, ctx: &Context, words: &[&str]) -> MResult<Vec<Vec<String>>> {
 let words = normalize_str(&words.join(" "));
-let set = ctx.synonyms.synonyms(reader, words.as_bytes())?.unwrap_or_default();
+let set = ctx.synonyms.synonyms_fst(reader, words.as_bytes())?;
 let mut strings = Vec::new();
 let mut stream = set.stream();
@@ -531,7 +531,7 @@ pub fn traverse_query_tree<'o, 'txn>(
 let docids = SetBuf::new(docids).unwrap();
 debug!("{:2$}docids construction took {:.02?}", "", before.elapsed(), depth * 2);
-let matches = Cow::Owned(SetBuf::new(matches).unwrap());
+let matches = Cow::Owned(SetBuf::from_dirty(matches));
 let key = PostingsKey { query, input: vec![], distance: 0, is_exact: true };
 postings.insert(key, matches);
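The `Context<'a>` change above swaps eagerly owned `fst::Set` values for `FstSetCow<'a>`, so the query tree can borrow FST bytes straight out of the database instead of copying them. The alias itself is not shown in this hunk; a plausible definition, consistent with the `map_data(Cow::Owned)` call in the raw indexer further down, would be:

```rust
use std::borrow::Cow;

// Assumed definition (not part of this hunk): a set whose backing bytes are
// either borrowed from the database mmap or owned.
pub type FstSetCow<'a> = fst::Set<Cow<'a, [u8]>>;

// An owned set is lifted into the Cow-backed form with fst's `map_data`,
// which rewraps the underlying bytes without rebuilding the automaton.
fn into_cow_set(set: fst::Set<Vec<u8>>) -> FstSetCow<'static> {
    set.map_data(Cow::Owned).unwrap()
}
```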

View File

@@ -19,6 +19,7 @@ impl QueryWordsMapper {
 QueryWordsMapper { originals, mappings: HashMap::new() }
 }
+#[allow(clippy::len_zero)]
 pub fn declare<I, A>(&mut self, range: Range<usize>, id: QueryId, replacement: I)
 where I: IntoIterator<Item = A>,
 A: ToString,
@@ -53,7 +54,7 @@ impl QueryWordsMapper {
 }
 {
-let replacement = replacement[common_left..replacement.len() - common_right].iter().cloned().collect();
+let replacement = replacement[common_left..replacement.len() - common_right].to_vec();
 self.mappings.insert(id + common_left, (range.clone(), replacement));
 }
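The `declare` change is behavior-preserving: slicing and calling `.to_vec()` produces the same clone as the old `.iter().cloned().collect()`, just in the spelling clippy prefers (the companion `#[allow(clippy::len_zero)]` silences an unrelated lint in the same function). A standalone illustration with hypothetical values:

```rust
fn main() {
    // Illustrative data and offsets; both expressions clone the same sub-slice.
    let replacement = vec!["new".to_string(), "york".to_string(), "city".to_string()];
    let (common_left, common_right) = (1, 1);

    let a: Vec<String> = replacement[common_left..replacement.len() - common_right]
        .iter()
        .cloned()
        .collect();
    let b = replacement[common_left..replacement.len() - common_right].to_vec();

    assert_eq!(a, b); // identical results, `.to_vec()` is just shorter
}
```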

View File

@@ -1,34 +1,37 @@
+use std::borrow::Cow;
 use std::collections::{BTreeMap, HashMap};
 use std::convert::TryFrom;
-use crate::{DocIndex, DocumentId};
 use deunicode::deunicode_with_tofu;
 use meilisearch_schema::IndexedPos;
 use meilisearch_tokenizer::{is_cjk, SeqTokenizer, Token, Tokenizer};
 use sdset::SetBuf;
+use crate::{DocIndex, DocumentId};
+use crate::FstSetCow;
 const WORD_LENGTH_LIMIT: usize = 80;
 type Word = Vec<u8>; // TODO make it be a SmallVec
-pub struct RawIndexer {
+pub struct RawIndexer<A> {
 word_limit: usize, // the maximum number of indexed words
-stop_words: fst::Set,
+stop_words: fst::Set<A>,
 words_doc_indexes: BTreeMap<Word, Vec<DocIndex>>,
 docs_words: HashMap<DocumentId, Vec<Word>>,
 }
-pub struct Indexed {
+pub struct Indexed<'a> {
 pub words_doc_indexes: BTreeMap<Word, SetBuf<DocIndex>>,
-pub docs_words: HashMap<DocumentId, fst::Set>,
+pub docs_words: HashMap<DocumentId, FstSetCow<'a>>,
 }
-impl RawIndexer {
-pub fn new(stop_words: fst::Set) -> RawIndexer {
+impl<A> RawIndexer<A> {
+pub fn new(stop_words: fst::Set<A>) -> RawIndexer<A> {
 RawIndexer::with_word_limit(stop_words, 1000)
 }
-pub fn with_word_limit(stop_words: fst::Set, limit: usize) -> RawIndexer {
+pub fn with_word_limit(stop_words: fst::Set<A>, limit: usize) -> RawIndexer<A> {
 RawIndexer {
 word_limit: limit,
 stop_words,
@@ -36,7 +39,9 @@ impl RawIndexer {
 docs_words: HashMap::new(),
 }
 }
+}
+impl<A: AsRef<[u8]>> RawIndexer<A> {
 pub fn index_text(&mut self, id: DocumentId, indexed_pos: IndexedPos, text: &str) -> usize {
 let mut number_of_words = 0;
@@ -61,9 +66,9 @@ impl RawIndexer {
 number_of_words
 }
-pub fn index_text_seq<'a, I>(&mut self, id: DocumentId, indexed_pos: IndexedPos, iter: I)
+pub fn index_text_seq<'s, I>(&mut self, id: DocumentId, indexed_pos: IndexedPos, iter: I)
 where
-I: IntoIterator<Item = &'a str>,
+I: IntoIterator<Item = &'s str>,
 {
 let iter = iter.into_iter();
 for token in SeqTokenizer::new(iter) {
@@ -83,7 +88,7 @@ impl RawIndexer {
 }
 }
-pub fn build(self) -> Indexed {
+pub fn build(self) -> Indexed<'static> {
 let words_doc_indexes = self
 .words_doc_indexes
 .into_iter()
@@ -96,7 +101,8 @@ impl RawIndexer {
 .map(|(id, mut words)| {
 words.sort_unstable();
 words.dedup();
-(id, fst::Set::from_iter(words).unwrap())
+let fst = fst::Set::from_iter(words).unwrap().map_data(Cow::Owned).unwrap();
+(id, fst)
 })
 .collect();
@@ -107,16 +113,18 @@ impl RawIndexer {
 }
 }
-fn index_token(
+fn index_token<A>(
 token: Token,
 id: DocumentId,
 indexed_pos: IndexedPos,
 word_limit: usize,
-stop_words: &fst::Set,
+stop_words: &fst::Set<A>,
 words_doc_indexes: &mut BTreeMap<Word, Vec<DocIndex>>,
 docs_words: &mut HashMap<DocumentId, Vec<Word>>,
-) -> bool {
-if token.word_index >= word_limit {
+) -> bool
+where A: AsRef<[u8]>,
+{
+if token.index >= word_limit {
 return false;
 }
@@ -269,4 +277,36 @@ mod tests {
 .get(&"🇯🇵".to_owned().into_bytes())
 .is_some());
 }
+#[test]
+// test sample from 807
+fn very_long_text() {
+let mut indexer = RawIndexer::new(fst::Set::default());
+let indexed_pos = IndexedPos(0);
+let docid = DocumentId(0);
+let text = " The locations block is the most powerful, and potentially most involved, section of the .platform.app.yaml file. It allows you to control how the application container responds to incoming requests at a very fine-grained level. Common patterns also vary between language containers due to the way PHP-FPM handles incoming requests.\nEach entry of the locations block is an absolute URI path (with leading /) and its value includes the configuration directives for how the web server should handle matching requests. That is, if your domain is example.com then '/' means &ldquo;requests for example.com/&rdquo;, while '/admin' means &ldquo;requests for example.com/admin&rdquo;. If multiple blocks could match an incoming request then the most-specific will apply.\nweb:locations:&#39;/&#39;:# Rules for all requests that don&#39;t otherwise match....&#39;/sites/default/files&#39;:# Rules for any requests that begin with /sites/default/files....The simplest possible locations configuration is one that simply passes all requests on to your application unconditionally:\nweb:locations:&#39;/&#39;:passthru:trueThat is, all requests to /* should be forwarded to the process started by web.commands.start above. Note that for PHP containers the passthru key must specify what PHP file the request should be forwarded to, and must also specify a docroot under which the file lives. For example:\nweb:locations:&#39;/&#39;:root:&#39;web&#39;passthru:&#39;/app.php&#39;This block will serve requests to / from the web directory in the application, and if a file doesn&rsquo;t exist on disk then the request will be forwarded to the /app.php script.\nA full list of the possible subkeys for locations is below.\n root: The folder from which to serve static assets for this location relative to the application root. The application root is the directory in which the .platform.app.yaml file is located. Typical values for this property include public or web. Setting it to '' is not recommended, and its behavior may vary depending on the type of application. Absolute paths are not supported.\n passthru: Whether to forward disallowed and missing resources from this location to the application and can be true, false or an absolute URI path (with leading /). The default value is false. For non-PHP applications it will generally be just true or false. In a PHP application this will typically be the front controller such as /index.php or /app.php. This entry works similar to mod_rewrite under Apache. Note: If the value of passthru does not begin with the same value as the location key it is under, the passthru may evaluate to another entry. That may be useful when you want different cache settings for different paths, for instance, but want missing files in all of them to map back to the same front controller. See the example block below.\n index: The files to consider when serving a request for a directory: an array of file names or null. (typically ['index.html']). Note that in order for this to work, access to the static files named must be allowed by the allow or rules keys for this location.\n expires: How long to allow static assets from this location to be cached (this enables the Cache-Control and Expires headers) and can be a time or -1 for no caching (default). Times can be suffixed with &ldquo;ms&rdquo; (milliseconds), &ldquo;s&rdquo; (seconds), &ldquo;m&rdquo; (minutes), &ldquo;h&rdquo; (hours), &ldquo;d&rdquo; (days), &ldquo;w&rdquo; (weeks), &ldquo;M&rdquo; (months, 30d) or &ldquo;y&rdquo; (years, 365d).\n scripts: Whether to allow loading scripts in that location (true or false). This directive is only meaningful on PHP.\n allow: Whether to allow serving files which don&rsquo;t match a rule (true or false, default: true).\n headers: Any additional headers to apply to static assets. This section is a mapping of header names to header values. Responses from the application aren&rsquo;t affected, to avoid overlap with the application&rsquo;s own ability to include custom headers in the response.\n rules: Specific overrides for a specific location. The key is a PCRE (regular expression) that is matched against the full request path.\n request_buffering: Most application servers do not support chunked requests (e.g. fpm, uwsgi), so Platform.sh enables request_buffering by default to handle them. That default configuration would look like this if it was present in .platform.app.yaml:\nweb:locations:&#39;/&#39;:passthru:truerequest_buffering:enabled:truemax_request_size:250mIf the application server can already efficiently handle chunked requests, the request_buffering subkey can be modified to disable it entirely (enabled: false). Additionally, applications that frequently deal with uploads greater than 250MB in size can update the max_request_size key to the application&rsquo;s needs. Note that modifications to request_buffering will need to be specified at each location where it is desired.\n ";
+indexer.index_text(docid, indexed_pos, text);
+let Indexed {
+words_doc_indexes, ..
+} = indexer.build();
+assert!(words_doc_indexes.get(&"buffering".to_owned().into_bytes()).is_some());
+}
+#[test]
+fn words_over_index_1000_not_indexed() {
+let mut indexer = RawIndexer::new(fst::Set::default());
+let indexed_pos = IndexedPos(0);
+let docid = DocumentId(0);
+let mut text = String::with_capacity(5000);
+for _ in 0..1000 {
+text.push_str("less ");
+}
+text.push_str("more");
+indexer.index_text(docid, indexed_pos, &text);
+let Indexed {
+words_doc_indexes, ..
+} = indexer.build();
+assert!(words_doc_indexes.get(&"less".to_owned().into_bytes()).is_some());
+assert!(words_doc_indexes.get(&"more".to_owned().into_bytes()).is_none());
+}
 }
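For context, the indexer is now generic over its stop-word storage (`RawIndexer<A>`, with `A: AsRef<[u8]>` required only on the methods that actually read the set), and `build` hands back owned, `'static` Cow-backed FST sets via `map_data(Cow::Owned)`. A minimal usage sketch built only from calls visible in this diff:

```rust
use meilisearch_schema::IndexedPos;

// `fst::Set::default()` is an empty Set<Vec<u8>>, so this infers to a
// RawIndexer<Vec<u8>> with no stop words; DocumentId and Indexed are the
// crate types shown in the hunks above.
let mut indexer = RawIndexer::new(fst::Set::default());
indexer.index_text(DocumentId(0), IndexedPos(0), "hello world");

// `build` consumes the indexer; docs_words now holds FstSetCow<'static>
// values thanks to the map_data(Cow::Owned) call in the diff.
let Indexed { words_doc_indexes, docs_words } = indexer.build();
assert!(words_doc_indexes.contains_key("hello".as_bytes()));
assert_eq!(docs_words.len(), 1); // one word set for DocumentId(0)
```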

View File

@@ -1,198 +0,0 @@
use std::str::FromStr;
use ordered_float::OrderedFloat;
use serde::ser;
use serde::Serialize;
use super::SerializerError;
use crate::Number;
pub struct ConvertToNumber;
impl ser::Serializer for ConvertToNumber {
type Ok = Number;
type Error = SerializerError;
type SerializeSeq = ser::Impossible<Self::Ok, Self::Error>;
type SerializeTuple = ser::Impossible<Self::Ok, Self::Error>;
type SerializeTupleStruct = ser::Impossible<Self::Ok, Self::Error>;
type SerializeTupleVariant = ser::Impossible<Self::Ok, Self::Error>;
type SerializeMap = ser::Impossible<Self::Ok, Self::Error>;
type SerializeStruct = ser::Impossible<Self::Ok, Self::Error>;
type SerializeStructVariant = ser::Impossible<Self::Ok, Self::Error>;
fn serialize_bool(self, value: bool) -> Result<Self::Ok, Self::Error> {
Ok(Number::Unsigned(u64::from(value)))
}
fn serialize_char(self, _value: char) -> Result<Self::Ok, Self::Error> {
Err(SerializerError::UnrankableType { type_name: "char" })
}
fn serialize_i8(self, value: i8) -> Result<Self::Ok, Self::Error> {
Ok(Number::Signed(i64::from(value)))
}
fn serialize_i16(self, value: i16) -> Result<Self::Ok, Self::Error> {
Ok(Number::Signed(i64::from(value)))
}
fn serialize_i32(self, value: i32) -> Result<Self::Ok, Self::Error> {
Ok(Number::Signed(i64::from(value)))
}
fn serialize_i64(self, value: i64) -> Result<Self::Ok, Self::Error> {
Ok(Number::Signed(value))
}
fn serialize_u8(self, value: u8) -> Result<Self::Ok, Self::Error> {
Ok(Number::Unsigned(u64::from(value)))
}
fn serialize_u16(self, value: u16) -> Result<Self::Ok, Self::Error> {
Ok(Number::Unsigned(u64::from(value)))
}
fn serialize_u32(self, value: u32) -> Result<Self::Ok, Self::Error> {
Ok(Number::Unsigned(u64::from(value)))
}
fn serialize_u64(self, value: u64) -> Result<Self::Ok, Self::Error> {
Ok(Number::Unsigned(value))
}
fn serialize_f32(self, value: f32) -> Result<Self::Ok, Self::Error> {
Ok(Number::Float(OrderedFloat(f64::from(value))))
}
fn serialize_f64(self, value: f64) -> Result<Self::Ok, Self::Error> {
Ok(Number::Float(OrderedFloat(value)))
}
fn serialize_str(self, value: &str) -> Result<Self::Ok, Self::Error> {
Ok(Number::from_str(value)?)
}
fn serialize_bytes(self, _v: &[u8]) -> Result<Self::Ok, Self::Error> {
Err(SerializerError::UnrankableType { type_name: "&[u8]" })
}
fn serialize_none(self) -> Result<Self::Ok, Self::Error> {
Err(SerializerError::UnrankableType {
type_name: "Option",
})
}
fn serialize_some<T: ?Sized>(self, _value: &T) -> Result<Self::Ok, Self::Error>
where
T: Serialize,
{
Err(SerializerError::UnrankableType {
type_name: "Option",
})
}
fn serialize_unit(self) -> Result<Self::Ok, Self::Error> {
Err(SerializerError::UnrankableType { type_name: "()" })
}
fn serialize_unit_struct(self, _name: &'static str) -> Result<Self::Ok, Self::Error> {
Err(SerializerError::UnrankableType {
type_name: "unit struct",
})
}
fn serialize_unit_variant(
self,
_name: &'static str,
_variant_index: u32,
_variant: &'static str,
) -> Result<Self::Ok, Self::Error> {
Err(SerializerError::UnrankableType {
type_name: "unit variant",
})
}
fn serialize_newtype_struct<T: ?Sized>(
self,
_name: &'static str,
value: &T,
) -> Result<Self::Ok, Self::Error>
where
T: Serialize,
{
value.serialize(self)
}
fn serialize_newtype_variant<T: ?Sized>(
self,
_name: &'static str,
_variant_index: u32,
_variant: &'static str,
_value: &T,
) -> Result<Self::Ok, Self::Error>
where
T: Serialize,
{
Err(SerializerError::UnrankableType {
type_name: "newtype variant",
})
}
fn serialize_seq(self, _len: Option<usize>) -> Result<Self::SerializeSeq, Self::Error> {
Err(SerializerError::UnrankableType {
type_name: "sequence",
})
}
fn serialize_tuple(self, _len: usize) -> Result<Self::SerializeTuple, Self::Error> {
Err(SerializerError::UnrankableType { type_name: "tuple" })
}
fn serialize_tuple_struct(
self,
_name: &'static str,
_len: usize,
) -> Result<Self::SerializeTupleStruct, Self::Error> {
Err(SerializerError::UnrankableType {
type_name: "tuple struct",
})
}
fn serialize_tuple_variant(
self,
_name: &'static str,
_variant_index: u32,
_variant: &'static str,
_len: usize,
) -> Result<Self::SerializeTupleVariant, Self::Error> {
Err(SerializerError::UnrankableType {
type_name: "tuple variant",
})
}
fn serialize_map(self, _len: Option<usize>) -> Result<Self::SerializeMap, Self::Error> {
Err(SerializerError::UnrankableType { type_name: "map" })
}
fn serialize_struct(
self,
_name: &'static str,
_len: usize,
) -> Result<Self::SerializeStruct, Self::Error> {
Err(SerializerError::UnrankableType {
type_name: "struct",
})
}
fn serialize_struct_variant(
self,
_name: &'static str,
_variant_index: u32,
_variant: &'static str,
_len: usize,
) -> Result<Self::SerializeStructVariant, Self::Error> {
Err(SerializerError::UnrankableType {
type_name: "struct variant",
})
}
}

View File

@@ -1,258 +0,0 @@
use serde::ser;
use serde::Serialize;
use super::SerializerError;
pub struct ConvertToString;
impl ser::Serializer for ConvertToString {
type Ok = String;
type Error = SerializerError;
type SerializeSeq = ser::Impossible<Self::Ok, Self::Error>;
type SerializeTuple = ser::Impossible<Self::Ok, Self::Error>;
type SerializeTupleStruct = ser::Impossible<Self::Ok, Self::Error>;
type SerializeTupleVariant = ser::Impossible<Self::Ok, Self::Error>;
type SerializeMap = MapConvertToString;
type SerializeStruct = StructConvertToString;
type SerializeStructVariant = ser::Impossible<Self::Ok, Self::Error>;
fn serialize_bool(self, _value: bool) -> Result<Self::Ok, Self::Error> {
Err(SerializerError::UnserializableType {
type_name: "boolean",
})
}
fn serialize_char(self, value: char) -> Result<Self::Ok, Self::Error> {
Ok(value.to_string())
}
fn serialize_i8(self, value: i8) -> Result<Self::Ok, Self::Error> {
Ok(value.to_string())
}
fn serialize_i16(self, value: i16) -> Result<Self::Ok, Self::Error> {
Ok(value.to_string())
}
fn serialize_i32(self, value: i32) -> Result<Self::Ok, Self::Error> {
Ok(value.to_string())
}
fn serialize_i64(self, value: i64) -> Result<Self::Ok, Self::Error> {
Ok(value.to_string())
}
fn serialize_u8(self, value: u8) -> Result<Self::Ok, Self::Error> {
Ok(value.to_string())
}
fn serialize_u16(self, value: u16) -> Result<Self::Ok, Self::Error> {
Ok(value.to_string())
}
fn serialize_u32(self, value: u32) -> Result<Self::Ok, Self::Error> {
Ok(value.to_string())
}
fn serialize_u64(self, value: u64) -> Result<Self::Ok, Self::Error> {
Ok(value.to_string())
}
fn serialize_f32(self, value: f32) -> Result<Self::Ok, Self::Error> {
Ok(value.to_string())
}
fn serialize_f64(self, value: f64) -> Result<Self::Ok, Self::Error> {
Ok(value.to_string())
}
fn serialize_str(self, value: &str) -> Result<Self::Ok, Self::Error> {
Ok(value.to_string())
}
fn serialize_bytes(self, _v: &[u8]) -> Result<Self::Ok, Self::Error> {
Err(SerializerError::UnserializableType { type_name: "&[u8]" })
}
fn serialize_none(self) -> Result<Self::Ok, Self::Error> {
Err(SerializerError::UnserializableType {
type_name: "Option",
})
}
fn serialize_some<T: ?Sized>(self, _value: &T) -> Result<Self::Ok, Self::Error>
where
T: Serialize,
{
Err(SerializerError::UnserializableType {
type_name: "Option",
})
}
fn serialize_unit(self) -> Result<Self::Ok, Self::Error> {
Err(SerializerError::UnserializableType { type_name: "()" })
}
fn serialize_unit_struct(self, _name: &'static str) -> Result<Self::Ok, Self::Error> {
Err(SerializerError::UnserializableType {
type_name: "unit struct",
})
}
fn serialize_unit_variant(
self,
_name: &'static str,
_variant_index: u32,
_variant: &'static str,
) -> Result<Self::Ok, Self::Error> {
Err(SerializerError::UnserializableType {
type_name: "unit variant",
})
}
fn serialize_newtype_struct<T: ?Sized>(
self,
_name: &'static str,
value: &T,
) -> Result<Self::Ok, Self::Error>
where
T: Serialize,
{
value.serialize(self)
}
fn serialize_newtype_variant<T: ?Sized>(
self,
_name: &'static str,
_variant_index: u32,
_variant: &'static str,
_value: &T,
) -> Result<Self::Ok, Self::Error>
where
T: Serialize,
{
Err(SerializerError::UnserializableType {
type_name: "newtype variant",
})
}
fn serialize_seq(self, _len: Option<usize>) -> Result<Self::SerializeSeq, Self::Error> {
Err(SerializerError::UnserializableType {
type_name: "sequence",
})
}
fn serialize_tuple(self, _len: usize) -> Result<Self::SerializeTuple, Self::Error> {
Err(SerializerError::UnserializableType { type_name: "tuple" })
}
fn serialize_tuple_struct(
self,
_name: &'static str,
_len: usize,
) -> Result<Self::SerializeTupleStruct, Self::Error> {
Err(SerializerError::UnserializableType {
type_name: "tuple struct",
})
}
fn serialize_tuple_variant(
self,
_name: &'static str,
_variant_index: u32,
_variant: &'static str,
_len: usize,
) -> Result<Self::SerializeTupleVariant, Self::Error> {
Err(SerializerError::UnserializableType {
type_name: "tuple variant",
})
}
fn serialize_map(self, _len: Option<usize>) -> Result<Self::SerializeMap, Self::Error> {
Ok(MapConvertToString {
text: String::new(),
})
}
fn serialize_struct(
self,
_name: &'static str,
_len: usize,
) -> Result<Self::SerializeStruct, Self::Error> {
Ok(StructConvertToString {
text: String::new(),
})
}
fn serialize_struct_variant(
self,
_name: &'static str,
_variant_index: u32,
_variant: &'static str,
_len: usize,
) -> Result<Self::SerializeStructVariant, Self::Error> {
Err(SerializerError::UnserializableType {
type_name: "struct variant",
})
}
}
pub struct MapConvertToString {
text: String,
}
impl ser::SerializeMap for MapConvertToString {
type Ok = String;
type Error = SerializerError;
fn serialize_key<T: ?Sized>(&mut self, key: &T) -> Result<(), Self::Error>
where
T: ser::Serialize,
{
let text = key.serialize(ConvertToString)?;
self.text.push_str(&text);
self.text.push_str(" ");
Ok(())
}
fn serialize_value<T: ?Sized>(&mut self, value: &T) -> Result<(), Self::Error>
where
T: ser::Serialize,
{
let text = value.serialize(ConvertToString)?;
self.text.push_str(&text);
Ok(())
}
fn end(self) -> Result<Self::Ok, Self::Error> {
Ok(self.text)
}
}
pub struct StructConvertToString {
text: String,
}
impl ser::SerializeStruct for StructConvertToString {
type Ok = String;
type Error = SerializerError;
fn serialize_field<T: ?Sized>(
&mut self,
key: &'static str,
value: &T,
) -> Result<(), Self::Error>
where
T: ser::Serialize,
{
let value = value.serialize(ConvertToString)?;
self.text.push_str(key);
self.text.push_str(" ");
self.text.push_str(&value);
Ok(())
}
fn end(self) -> Result<Self::Ok, Self::Error> {
Ok(self.text)
}
}

View File

@@ -1,310 +0,0 @@
use std::hash::{Hash, Hasher};
use crate::DocumentId;
use serde::{ser, Serialize};
use serde_json::{Value, Number};
use siphasher::sip::SipHasher;
use super::{ConvertToString, SerializerError};
pub fn extract_document_id<D>(
primary_key: &str,
document: &D,
) -> Result<Option<DocumentId>, SerializerError>
where
D: serde::Serialize,
{
let serializer = ExtractDocumentId { primary_key };
document.serialize(serializer)
}
fn validate_number(value: &Number) -> Option<String> {
if value.is_f64() {
return None
}
Some(value.to_string())
}
fn validate_string(value: &str) -> Option<String> {
if value.chars().all(|x| x.is_ascii_alphanumeric() || x == '-' || x == '_') {
Some(value.to_string())
} else {
None
}
}
pub fn value_to_string(value: &Value) -> Option<String> {
match value {
Value::Null => None,
Value::Bool(_) => None,
Value::Number(value) => validate_number(value),
Value::String(value) => validate_string(value),
Value::Array(_) => None,
Value::Object(_) => None,
}
}
pub fn compute_document_id<H: Hash>(t: H) -> DocumentId {
let mut s = SipHasher::new();
t.hash(&mut s);
let hash = s.finish();
DocumentId(hash)
}
struct ExtractDocumentId<'a> {
primary_key: &'a str,
}
impl<'a> ser::Serializer for ExtractDocumentId<'a> {
type Ok = Option<DocumentId>;
type Error = SerializerError;
type SerializeSeq = ser::Impossible<Self::Ok, Self::Error>;
type SerializeTuple = ser::Impossible<Self::Ok, Self::Error>;
type SerializeTupleStruct = ser::Impossible<Self::Ok, Self::Error>;
type SerializeTupleVariant = ser::Impossible<Self::Ok, Self::Error>;
type SerializeMap = ExtractDocumentIdMapSerializer<'a>;
type SerializeStruct = ExtractDocumentIdStructSerializer<'a>;
type SerializeStructVariant = ser::Impossible<Self::Ok, Self::Error>;
forward_to_unserializable_type! {
bool => serialize_bool,
char => serialize_char,
i8 => serialize_i8,
i16 => serialize_i16,
i32 => serialize_i32,
i64 => serialize_i64,
u8 => serialize_u8,
u16 => serialize_u16,
u32 => serialize_u32,
u64 => serialize_u64,
f32 => serialize_f32,
f64 => serialize_f64,
}
fn serialize_str(self, _value: &str) -> Result<Self::Ok, Self::Error> {
Err(SerializerError::UnserializableType { type_name: "str" })
}
fn serialize_bytes(self, _value: &[u8]) -> Result<Self::Ok, Self::Error> {
Err(SerializerError::UnserializableType { type_name: "&[u8]" })
}
fn serialize_none(self) -> Result<Self::Ok, Self::Error> {
Err(SerializerError::UnserializableType {
type_name: "Option",
})
}
fn serialize_some<T: ?Sized>(self, _value: &T) -> Result<Self::Ok, Self::Error>
where
T: Serialize,
{
Err(SerializerError::UnserializableType {
type_name: "Option",
})
}
fn serialize_unit(self) -> Result<Self::Ok, Self::Error> {
Err(SerializerError::UnserializableType { type_name: "()" })
}
fn serialize_unit_struct(self, _name: &'static str) -> Result<Self::Ok, Self::Error> {
Err(SerializerError::UnserializableType {
type_name: "unit struct",
})
}
fn serialize_unit_variant(
self,
_name: &'static str,
_variant_index: u32,
_variant: &'static str,
) -> Result<Self::Ok, Self::Error> {
Err(SerializerError::UnserializableType {
type_name: "unit variant",
})
}
fn serialize_newtype_struct<T: ?Sized>(
self,
_name: &'static str,
value: &T,
) -> Result<Self::Ok, Self::Error>
where
T: Serialize,
{
value.serialize(self)
}
fn serialize_newtype_variant<T: ?Sized>(
self,
_name: &'static str,
_variant_index: u32,
_variant: &'static str,
_value: &T,
) -> Result<Self::Ok, Self::Error>
where
T: Serialize,
{
Err(SerializerError::UnserializableType {
type_name: "newtype variant",
})
}
fn serialize_seq(self, _len: Option<usize>) -> Result<Self::SerializeSeq, Self::Error> {
Err(SerializerError::UnserializableType {
type_name: "sequence",
})
}
fn serialize_tuple(self, _len: usize) -> Result<Self::SerializeTuple, Self::Error> {
Err(SerializerError::UnserializableType { type_name: "tuple" })
}
fn serialize_tuple_struct(
self,
_name: &'static str,
_len: usize,
) -> Result<Self::SerializeTupleStruct, Self::Error> {
Err(SerializerError::UnserializableType {
type_name: "tuple struct",
})
}
fn serialize_tuple_variant(
self,
_name: &'static str,
_variant_index: u32,
_variant: &'static str,
_len: usize,
) -> Result<Self::SerializeTupleVariant, Self::Error> {
Err(SerializerError::UnserializableType {
type_name: "tuple variant",
})
}
fn serialize_map(self, _len: Option<usize>) -> Result<Self::SerializeMap, Self::Error> {
let serializer = ExtractDocumentIdMapSerializer {
primary_key: self.primary_key,
document_id: None,
current_key_name: None,
};
Ok(serializer)
}
fn serialize_struct(
self,
_name: &'static str,
_len: usize,
) -> Result<Self::SerializeStruct, Self::Error> {
let serializer = ExtractDocumentIdStructSerializer {
primary_key: self.primary_key,
document_id: None,
};
Ok(serializer)
}
fn serialize_struct_variant(
self,
_name: &'static str,
_variant_index: u32,
_variant: &'static str,
_len: usize,
) -> Result<Self::SerializeStructVariant, Self::Error> {
Err(SerializerError::UnserializableType {
type_name: "struct variant",
})
}
}
pub struct ExtractDocumentIdMapSerializer<'a> {
primary_key: &'a str,
document_id: Option<DocumentId>,
current_key_name: Option<String>,
}
impl<'a> ser::SerializeMap for ExtractDocumentIdMapSerializer<'a> {
type Ok = Option<DocumentId>;
type Error = SerializerError;
fn serialize_key<T: ?Sized>(&mut self, key: &T) -> Result<(), Self::Error>
where
T: Serialize,
{
let key = key.serialize(ConvertToString)?;
self.current_key_name = Some(key);
Ok(())
}
fn serialize_value<T: ?Sized>(&mut self, value: &T) -> Result<(), Self::Error>
where
T: Serialize,
{
let key = self.current_key_name.take().unwrap();
self.serialize_entry(&key, value)
}
fn serialize_entry<K: ?Sized, V: ?Sized>(
&mut self,
key: &K,
value: &V,
) -> Result<(), Self::Error>
where
K: Serialize,
V: Serialize,
{
let key = key.serialize(ConvertToString)?;
if self.primary_key == key {
let value = serde_json::to_string(value).and_then(|s| serde_json::from_str(&s))?;
match value_to_string(&value).map(|s| compute_document_id(&s)) {
Some(document_id) => self.document_id = Some(document_id),
None => return Err(SerializerError::InvalidDocumentIdType),
}
}
Ok(())
}
fn end(self) -> Result<Self::Ok, Self::Error> {
Ok(self.document_id)
}
}
pub struct ExtractDocumentIdStructSerializer<'a> {
primary_key: &'a str,
document_id: Option<DocumentId>,
}
impl<'a> ser::SerializeStruct for ExtractDocumentIdStructSerializer<'a> {
type Ok = Option<DocumentId>;
type Error = SerializerError;
fn serialize_field<T: ?Sized>(
&mut self,
key: &'static str,
value: &T,
) -> Result<(), Self::Error>
where
T: Serialize,
{
if self.primary_key == key {
let value = serde_json::to_string(value).and_then(|s| serde_json::from_str(&s))?;
match value_to_string(&value).map(compute_document_id) {
Some(document_id) => self.document_id = Some(document_id),
None => return Err(SerializerError::InvalidDocumentIdType),
}
}
Ok(())
}
fn end(self) -> Result<Self::Ok, Self::Error> {
Ok(self.document_id)
}
}

View File

@@ -1,362 +0,0 @@
use meilisearch_schema::IndexedPos;
use serde::ser;
use serde::Serialize;
use super::{ConvertToString, SerializerError};
use crate::raw_indexer::RawIndexer;
use crate::DocumentId;
pub struct Indexer<'a> {
pub pos: IndexedPos,
pub indexer: &'a mut RawIndexer,
pub document_id: DocumentId,
}
impl<'a> ser::Serializer for Indexer<'a> {
type Ok = Option<usize>;
type Error = SerializerError;
type SerializeSeq = SeqIndexer<'a>;
type SerializeTuple = TupleIndexer<'a>;
type SerializeTupleStruct = ser::Impossible<Self::Ok, Self::Error>;
type SerializeTupleVariant = ser::Impossible<Self::Ok, Self::Error>;
type SerializeMap = MapIndexer<'a>;
type SerializeStruct = StructIndexer<'a>;
type SerializeStructVariant = ser::Impossible<Self::Ok, Self::Error>;
fn serialize_bool(self, _value: bool) -> Result<Self::Ok, Self::Error> {
Ok(None)
}
fn serialize_char(self, value: char) -> Result<Self::Ok, Self::Error> {
let text = value.serialize(ConvertToString)?;
self.serialize_str(&text)
}
fn serialize_i8(self, value: i8) -> Result<Self::Ok, Self::Error> {
let text = value.serialize(ConvertToString)?;
self.serialize_str(&text)
}
fn serialize_i16(self, value: i16) -> Result<Self::Ok, Self::Error> {
let text = value.serialize(ConvertToString)?;
self.serialize_str(&text)
}
fn serialize_i32(self, value: i32) -> Result<Self::Ok, Self::Error> {
let text = value.serialize(ConvertToString)?;
self.serialize_str(&text)
}
fn serialize_i64(self, value: i64) -> Result<Self::Ok, Self::Error> {
let text = value.serialize(ConvertToString)?;
self.serialize_str(&text)
}
fn serialize_u8(self, value: u8) -> Result<Self::Ok, Self::Error> {
let text = value.serialize(ConvertToString)?;
self.serialize_str(&text)
}
fn serialize_u16(self, value: u16) -> Result<Self::Ok, Self::Error> {
let text = value.serialize(ConvertToString)?;
self.serialize_str(&text)
}
fn serialize_u32(self, value: u32) -> Result<Self::Ok, Self::Error> {
let text = value.serialize(ConvertToString)?;
self.serialize_str(&text)
}
fn serialize_u64(self, value: u64) -> Result<Self::Ok, Self::Error> {
let text = value.serialize(ConvertToString)?;
self.serialize_str(&text)
}
fn serialize_f32(self, value: f32) -> Result<Self::Ok, Self::Error> {
let text = value.serialize(ConvertToString)?;
self.serialize_str(&text)
}
fn serialize_f64(self, value: f64) -> Result<Self::Ok, Self::Error> {
let text = value.serialize(ConvertToString)?;
self.serialize_str(&text)
}
fn serialize_str(self, text: &str) -> Result<Self::Ok, Self::Error> {
let number_of_words = self
.indexer
.index_text(self.document_id, self.pos, text);
Ok(Some(number_of_words))
}
fn serialize_bytes(self, _v: &[u8]) -> Result<Self::Ok, Self::Error> {
Err(SerializerError::UnindexableType { type_name: "&[u8]" })
}
fn serialize_none(self) -> Result<Self::Ok, Self::Error> {
Ok(None)
}
fn serialize_some<T: ?Sized>(self, value: &T) -> Result<Self::Ok, Self::Error>
where
T: ser::Serialize,
{
let text = value.serialize(ConvertToString)?;
let number_of_words = self
.indexer
.index_text(self.document_id, self.pos, &text);
Ok(Some(number_of_words))
}
fn serialize_unit(self) -> Result<Self::Ok, Self::Error> {
Ok(None)
}
fn serialize_unit_struct(self, _name: &'static str) -> Result<Self::Ok, Self::Error> {
Ok(None)
}
fn serialize_unit_variant(
self,
_name: &'static str,
_variant_index: u32,
_variant: &'static str,
) -> Result<Self::Ok, Self::Error> {
Ok(None)
}
fn serialize_newtype_struct<T: ?Sized>(
self,
_name: &'static str,
value: &T,
) -> Result<Self::Ok, Self::Error>
where
T: ser::Serialize,
{
value.serialize(self)
}
fn serialize_newtype_variant<T: ?Sized>(
self,
_name: &'static str,
_variant_index: u32,
_variant: &'static str,
_value: &T,
) -> Result<Self::Ok, Self::Error>
where
T: ser::Serialize,
{
Err(SerializerError::UnindexableType {
type_name: "newtype variant",
})
}
fn serialize_seq(self, _len: Option<usize>) -> Result<Self::SerializeSeq, Self::Error> {
let indexer = SeqIndexer {
pos: self.pos,
document_id: self.document_id,
indexer: self.indexer,
texts: Vec::new(),
};
Ok(indexer)
}
fn serialize_tuple(self, _len: usize) -> Result<Self::SerializeTuple, Self::Error> {
let indexer = TupleIndexer {
pos: self.pos,
document_id: self.document_id,
indexer: self.indexer,
texts: Vec::new(),
};
Ok(indexer)
}
fn serialize_tuple_struct(
self,
_name: &'static str,
_len: usize,
) -> Result<Self::SerializeTupleStruct, Self::Error> {
        Err(SerializerError::UnindexableType {
            type_name: "tuple struct",
        })
    }

    fn serialize_tuple_variant(
        self,
        _name: &'static str,
        _variant_index: u32,
        _variant: &'static str,
        _len: usize,
    ) -> Result<Self::SerializeTupleVariant, Self::Error> {
        Err(SerializerError::UnindexableType {
            type_name: "tuple variant",
        })
    }

    fn serialize_map(self, _len: Option<usize>) -> Result<Self::SerializeMap, Self::Error> {
        let indexer = MapIndexer {
            pos: self.pos,
            document_id: self.document_id,
            indexer: self.indexer,
            texts: Vec::new(),
        };
        Ok(indexer)
    }

    fn serialize_struct(
        self,
        _name: &'static str,
        _len: usize,
    ) -> Result<Self::SerializeStruct, Self::Error> {
        let indexer = StructIndexer {
            pos: self.pos,
            document_id: self.document_id,
            indexer: self.indexer,
            texts: Vec::new(),
        };
        Ok(indexer)
    }

    fn serialize_struct_variant(
        self,
        _name: &'static str,
        _variant_index: u32,
        _variant: &'static str,
        _len: usize,
    ) -> Result<Self::SerializeStructVariant, Self::Error> {
        Err(SerializerError::UnindexableType {
            type_name: "struct variant",
        })
    }
}

pub struct SeqIndexer<'a> {
    pos: IndexedPos,
    document_id: DocumentId,
    indexer: &'a mut RawIndexer,
    texts: Vec<String>,
}

impl<'a> ser::SerializeSeq for SeqIndexer<'a> {
    type Ok = Option<usize>;
    type Error = SerializerError;

    fn serialize_element<T: ?Sized>(&mut self, value: &T) -> Result<(), Self::Error>
    where
        T: ser::Serialize,
    {
        let text = value.serialize(ConvertToString)?;
        self.texts.push(text);
        Ok(())
    }

    fn end(self) -> Result<Self::Ok, Self::Error> {
        let texts = self.texts.iter().map(String::as_str);
        self.indexer
            .index_text_seq(self.document_id, self.pos, texts);
        Ok(None)
    }
}

pub struct MapIndexer<'a> {
    pos: IndexedPos,
    document_id: DocumentId,
    indexer: &'a mut RawIndexer,
    texts: Vec<String>,
}

impl<'a> ser::SerializeMap for MapIndexer<'a> {
    type Ok = Option<usize>;
    type Error = SerializerError;

    fn serialize_key<T: ?Sized>(&mut self, key: &T) -> Result<(), Self::Error>
    where
        T: ser::Serialize,
    {
        let text = key.serialize(ConvertToString)?;
        self.texts.push(text);
        Ok(())
    }

    fn serialize_value<T: ?Sized>(&mut self, value: &T) -> Result<(), Self::Error>
    where
        T: ser::Serialize,
    {
        let text = value.serialize(ConvertToString)?;
        self.texts.push(text);
        Ok(())
    }

    fn end(self) -> Result<Self::Ok, Self::Error> {
        let texts = self.texts.iter().map(String::as_str);
        self.indexer
            .index_text_seq(self.document_id, self.pos, texts);
        Ok(None)
    }
}

pub struct StructIndexer<'a> {
    pos: IndexedPos,
    document_id: DocumentId,
    indexer: &'a mut RawIndexer,
    texts: Vec<String>,
}

impl<'a> ser::SerializeStruct for StructIndexer<'a> {
    type Ok = Option<usize>;
    type Error = SerializerError;

    fn serialize_field<T: ?Sized>(
        &mut self,
        key: &'static str,
        value: &T,
    ) -> Result<(), Self::Error>
    where
        T: ser::Serialize,
    {
        let key_text = key.to_owned();
        let value_text = value.serialize(ConvertToString)?;
        self.texts.push(key_text);
        self.texts.push(value_text);
        Ok(())
    }

    fn end(self) -> Result<Self::Ok, Self::Error> {
        let texts = self.texts.iter().map(String::as_str);
        self.indexer
            .index_text_seq(self.document_id, self.pos, texts);
        Ok(None)
    }
}

pub struct TupleIndexer<'a> {
    pos: IndexedPos,
    document_id: DocumentId,
    indexer: &'a mut RawIndexer,
    texts: Vec<String>,
}

impl<'a> ser::SerializeTuple for TupleIndexer<'a> {
    type Ok = Option<usize>;
    type Error = SerializerError;

    fn serialize_element<T: ?Sized>(&mut self, value: &T) -> Result<(), Self::Error>
    where
        T: Serialize,
    {
        let text = value.serialize(ConvertToString)?;
        self.texts.push(text);
        Ok(())
    }

    fn end(self) -> Result<Self::Ok, Self::Error> {
        let texts = self.texts.iter().map(String::as_str);
        self.indexer
            .index_text_seq(self.document_id, self.pos, texts);
        Ok(None)
    }
}
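
For orientation, a rough standalone sketch of what these indexers accomplish for one document (a hypothetical helper using serde_json, not the crate's actual ConvertToString/RawIndexer path): every key and stringified value of a document ends up in the flat list of texts handed to the indexer.

use serde_json::{json, Value};

fn collect_texts(document: &Value) -> Vec<String> {
    let mut texts = Vec::new();
    if let Value::Object(fields) = document {
        for (key, value) in fields {
            // StructIndexer pushes the key itself as indexable text...
            texts.push(key.clone());
            // ...followed by the value converted to a string.
            match value {
                Value::String(s) => texts.push(s.clone()),
                other => texts.push(other.to_string()),
            }
        }
    }
    texts
}

fn main() {
    let document = json!({ "id": 1, "title": "Hello" });
    assert_eq!(collect_texts(&document), vec!["id", "1", "title", "Hello"]);
}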

View File

@@ -1,26 +1,6 @@
-macro_rules! forward_to_unserializable_type {
-    ($($ty:ident => $se_method:ident,)*) => {
-        $(
-            fn $se_method(self, _v: $ty) -> Result<Self::Ok, Self::Error> {
-                Err(SerializerError::UnserializableType { type_name: "$ty" })
-            }
-        )*
-    }
-}
-mod convert_to_number;
-mod convert_to_string;
 mod deserializer;
-mod extract_document_id;
-mod indexer;
-mod serializer;
-pub use self::convert_to_number::ConvertToNumber;
-pub use self::convert_to_string::ConvertToString;
 pub use self::deserializer::{Deserializer, DeserializerError};
-pub use self::extract_document_id::{compute_document_id, extract_document_id, value_to_string};
-pub use self::indexer::Indexer;
-pub use self::serializer::{serialize_value, serialize_value_with_id, Serializer};
 use std::{error::Error, fmt};
@@ -33,7 +13,7 @@ use crate::ParseNumberError;
 #[derive(Debug)]
 pub enum SerializerError {
     DocumentIdNotFound,
-    InvalidDocumentIdType,
+    InvalidDocumentIdFormat,
     Zlmdb(heed::Error),
     SerdeJson(SerdeJsonError),
     ParseNumber(ParseNumberError),
@@ -54,9 +34,9 @@ impl fmt::Display for SerializerError {
     fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
         match self {
             SerializerError::DocumentIdNotFound => {
-                f.write_str("serialized document does not have an id according to the schema")
+                f.write_str("Primary key is missing.")
             }
-            SerializerError::InvalidDocumentIdType => {
+            SerializerError::InvalidDocumentIdFormat => {
                 f.write_str("a document primary key can be of type integer or string only composed of alphanumeric characters, hyphens (-) and underscores (_).")
             }
             SerializerError::Zlmdb(e) => write!(f, "heed related error: {}", e),

View File

@@ -1,361 +0,0 @@
use meilisearch_schema::{Schema, FieldId};
use serde::ser;

use crate::database::MainT;
use crate::raw_indexer::RawIndexer;
use crate::store::{DocumentsFields, DocumentsFieldsCounts};
use crate::{DocumentId, RankedMap};

use super::{ConvertToNumber, ConvertToString, Indexer, SerializerError};

pub struct Serializer<'a, 'b> {
    pub txn: &'a mut heed::RwTxn<'b, MainT>,
    pub schema: &'a mut Schema,
    pub document_store: DocumentsFields,
    pub document_fields_counts: DocumentsFieldsCounts,
    pub indexer: &'a mut RawIndexer,
    pub ranked_map: &'a mut RankedMap,
    pub document_id: DocumentId,
}

impl<'a, 'b> ser::Serializer for Serializer<'a, 'b> {
    type Ok = ();
    type Error = SerializerError;
    type SerializeSeq = ser::Impossible<Self::Ok, Self::Error>;
    type SerializeTuple = ser::Impossible<Self::Ok, Self::Error>;
    type SerializeTupleStruct = ser::Impossible<Self::Ok, Self::Error>;
    type SerializeTupleVariant = ser::Impossible<Self::Ok, Self::Error>;
    type SerializeMap = MapSerializer<'a, 'b>;
    type SerializeStruct = StructSerializer<'a, 'b>;
    type SerializeStructVariant = ser::Impossible<Self::Ok, Self::Error>;

    forward_to_unserializable_type! {
        bool => serialize_bool,
        char => serialize_char,
        i8 => serialize_i8,
        i16 => serialize_i16,
        i32 => serialize_i32,
        i64 => serialize_i64,
        u8 => serialize_u8,
        u16 => serialize_u16,
        u32 => serialize_u32,
        u64 => serialize_u64,
        f32 => serialize_f32,
        f64 => serialize_f64,
    }

    fn serialize_str(self, _v: &str) -> Result<Self::Ok, Self::Error> {
        Err(SerializerError::UnserializableType { type_name: "str" })
    }

    fn serialize_bytes(self, _v: &[u8]) -> Result<Self::Ok, Self::Error> {
        Err(SerializerError::UnserializableType { type_name: "&[u8]" })
    }

    fn serialize_none(self) -> Result<Self::Ok, Self::Error> {
        Err(SerializerError::UnserializableType {
            type_name: "Option",
        })
    }

    fn serialize_some<T: ?Sized>(self, _value: &T) -> Result<Self::Ok, Self::Error>
    where
        T: ser::Serialize,
    {
        Err(SerializerError::UnserializableType {
            type_name: "Option",
        })
    }

    fn serialize_unit(self) -> Result<Self::Ok, Self::Error> {
        Err(SerializerError::UnserializableType { type_name: "()" })
    }

    fn serialize_unit_struct(self, _name: &'static str) -> Result<Self::Ok, Self::Error> {
        Err(SerializerError::UnserializableType {
            type_name: "unit struct",
        })
    }

    fn serialize_unit_variant(
        self,
        _name: &'static str,
        _variant_index: u32,
        _variant: &'static str,
    ) -> Result<Self::Ok, Self::Error> {
        Err(SerializerError::UnserializableType {
            type_name: "unit variant",
        })
    }

    fn serialize_newtype_struct<T: ?Sized>(
        self,
        _name: &'static str,
        value: &T,
    ) -> Result<Self::Ok, Self::Error>
    where
        T: ser::Serialize,
    {
        value.serialize(self)
    }

    fn serialize_newtype_variant<T: ?Sized>(
        self,
        _name: &'static str,
        _variant_index: u32,
        _variant: &'static str,
        _value: &T,
    ) -> Result<Self::Ok, Self::Error>
    where
        T: ser::Serialize,
    {
        Err(SerializerError::UnserializableType {
            type_name: "newtype variant",
        })
    }

    fn serialize_seq(self, _len: Option<usize>) -> Result<Self::SerializeSeq, Self::Error> {
        Err(SerializerError::UnserializableType {
            type_name: "sequence",
        })
    }

    fn serialize_tuple(self, _len: usize) -> Result<Self::SerializeTuple, Self::Error> {
        Err(SerializerError::UnserializableType { type_name: "tuple" })
    }

    fn serialize_tuple_struct(
        self,
        _name: &'static str,
        _len: usize,
    ) -> Result<Self::SerializeTupleStruct, Self::Error> {
        Err(SerializerError::UnserializableType {
            type_name: "tuple struct",
        })
    }

    fn serialize_tuple_variant(
        self,
        _name: &'static str,
        _variant_index: u32,
        _variant: &'static str,
        _len: usize,
    ) -> Result<Self::SerializeTupleVariant, Self::Error> {
        Err(SerializerError::UnserializableType {
            type_name: "tuple variant",
        })
    }

    fn serialize_map(self, _len: Option<usize>) -> Result<Self::SerializeMap, Self::Error> {
        Ok(MapSerializer {
            txn: self.txn,
            schema: self.schema,
            document_id: self.document_id,
            document_store: self.document_store,
            document_fields_counts: self.document_fields_counts,
            indexer: self.indexer,
            ranked_map: self.ranked_map,
            current_key_name: None,
        })
    }

    fn serialize_struct(
        self,
        _name: &'static str,
        _len: usize,
    ) -> Result<Self::SerializeStruct, Self::Error> {
        Ok(StructSerializer {
            txn: self.txn,
            schema: self.schema,
            document_id: self.document_id,
            document_store: self.document_store,
            document_fields_counts: self.document_fields_counts,
            indexer: self.indexer,
            ranked_map: self.ranked_map,
        })
    }

    fn serialize_struct_variant(
        self,
        _name: &'static str,
        _variant_index: u32,
        _variant: &'static str,
        _len: usize,
    ) -> Result<Self::SerializeStructVariant, Self::Error> {
        Err(SerializerError::UnserializableType {
            type_name: "struct variant",
        })
    }
}

pub struct MapSerializer<'a, 'b> {
    txn: &'a mut heed::RwTxn<'b, MainT>,
    schema: &'a mut Schema,
    document_id: DocumentId,
    document_store: DocumentsFields,
    document_fields_counts: DocumentsFieldsCounts,
    indexer: &'a mut RawIndexer,
    ranked_map: &'a mut RankedMap,
    current_key_name: Option<String>,
}

impl<'a, 'b> ser::SerializeMap for MapSerializer<'a, 'b> {
    type Ok = ();
    type Error = SerializerError;

    fn serialize_key<T: ?Sized>(&mut self, key: &T) -> Result<(), Self::Error>
    where
        T: ser::Serialize,
    {
        let key = key.serialize(ConvertToString)?;
        self.current_key_name = Some(key);
        Ok(())
    }

    fn serialize_value<T: ?Sized>(&mut self, value: &T) -> Result<(), Self::Error>
    where
        T: ser::Serialize,
    {
        let key = self.current_key_name.take().unwrap();
        self.serialize_entry(&key, value)
    }

    fn serialize_entry<K: ?Sized, V: ?Sized>(
        &mut self,
        key: &K,
        value: &V,
    ) -> Result<(), Self::Error>
    where
        K: ser::Serialize,
        V: ser::Serialize,
    {
        let key = key.serialize(ConvertToString)?;
        serialize_value(
            self.txn,
            key.as_str(),
            self.schema,
            self.document_id,
            self.document_store,
            self.document_fields_counts,
            self.indexer,
            self.ranked_map,
            value,
        )
    }

    fn end(self) -> Result<Self::Ok, Self::Error> {
        Ok(())
    }
}

pub struct StructSerializer<'a, 'b> {
    txn: &'a mut heed::RwTxn<'b, MainT>,
    schema: &'a mut Schema,
    document_id: DocumentId,
    document_store: DocumentsFields,
    document_fields_counts: DocumentsFieldsCounts,
    indexer: &'a mut RawIndexer,
    ranked_map: &'a mut RankedMap,
}

impl<'a, 'b> ser::SerializeStruct for StructSerializer<'a, 'b> {
    type Ok = ();
    type Error = SerializerError;

    fn serialize_field<T: ?Sized>(
        &mut self,
        key: &'static str,
        value: &T,
    ) -> Result<(), Self::Error>
    where
        T: ser::Serialize,
    {
        serialize_value(
            self.txn,
            key,
            self.schema,
            self.document_id,
            self.document_store,
            self.document_fields_counts,
            self.indexer,
            self.ranked_map,
            value,
        )
    }

    fn end(self) -> Result<Self::Ok, Self::Error> {
        Ok(())
    }
}

pub fn serialize_value<'a, T: ?Sized>(
    txn: &mut heed::RwTxn<MainT>,
    attribute: &str,
    schema: &'a mut Schema,
    document_id: DocumentId,
    document_store: DocumentsFields,
    documents_fields_counts: DocumentsFieldsCounts,
    indexer: &mut RawIndexer,
    ranked_map: &mut RankedMap,
    value: &T,
) -> Result<(), SerializerError>
where
    T: ser::Serialize,
{
    let field_id = schema.insert_and_index(&attribute)?;
    serialize_value_with_id(
        txn,
        field_id,
        schema,
        document_id,
        document_store,
        documents_fields_counts,
        indexer,
        ranked_map,
        value,
    )
}

pub fn serialize_value_with_id<'a, T: ?Sized>(
    txn: &mut heed::RwTxn<MainT>,
    field_id: FieldId,
    schema: &'a Schema,
    document_id: DocumentId,
    document_store: DocumentsFields,
    documents_fields_counts: DocumentsFieldsCounts,
    indexer: &mut RawIndexer,
    ranked_map: &mut RankedMap,
    value: &T,
) -> Result<(), SerializerError>
where
    T: ser::Serialize,
{
    let serialized = serde_json::to_vec(value)?;
    document_store.put_document_field(txn, document_id, field_id, &serialized)?;

    if let Some(indexed_pos) = schema.is_indexed(field_id) {
        let indexer = Indexer {
            pos: *indexed_pos,
            indexer,
            document_id,
        };
        if let Some(number_of_words) = value.serialize(indexer)? {
            documents_fields_counts.put_document_field_count(
                txn,
                document_id,
                *indexed_pos,
                number_of_words as u16,
            )?;
        }
    }

    if schema.is_ranked(field_id) {
        let number = value.serialize(ConvertToNumber).unwrap_or_default();
        ranked_map.insert(document_id, field_id, number);
    }

    Ok(())
}

View File

@@ -10,8 +10,7 @@ use self::RankingRule::*;
 pub const DEFAULT_RANKING_RULES: [RankingRule; 6] = [Typo, Words, Proximity, Attribute, WordsPosition, Exactness];
 
 static RANKING_RULE_REGEX: Lazy<regex::Regex> = Lazy::new(|| {
-    let regex = regex::Regex::new(r"(asc|desc)\(([a-zA-Z0-9-_]*)\)").unwrap();
-    regex
+    regex::Regex::new(r"(asc|desc)\(([a-zA-Z0-9-_]*)\)").unwrap()
 });
 
 #[derive(Default, Clone, Serialize, Deserialize)]
@@ -30,7 +29,7 @@ pub struct Settings {
     #[serde(default, deserialize_with = "deserialize_some")]
     pub synonyms: Option<Option<BTreeMap<String, Vec<String>>>>,
     #[serde(default, deserialize_with = "deserialize_some")]
-    pub accept_new_fields: Option<Option<bool>>,
+    pub attributes_for_faceting: Option<Option<Vec<String>>>,
 }
 
 // Any value that is present is considered Some value, including null.
@@ -42,11 +41,11 @@ fn deserialize_some<'de, T, D>(deserializer: D) -> Result<Option<T>, D::Error>
 }
 
 impl Settings {
-    pub fn into_update(&self) -> Result<SettingsUpdate, RankingRuleConversionError> {
+    pub fn to_update(&self) -> Result<SettingsUpdate, RankingRuleConversionError> {
         let settings = self.clone();
 
         let ranking_rules = match settings.ranking_rules {
-            Some(Some(rules)) => UpdateState::Update(RankingRule::from_iter(rules.iter())?),
+            Some(Some(rules)) => UpdateState::Update(RankingRule::try_from_iter(rules.iter())?),
             Some(None) => UpdateState::Clear,
             None => UpdateState::Nothing,
         };
@@ -59,7 +58,7 @@ impl Settings {
             displayed_attributes: settings.displayed_attributes.into(),
             stop_words: settings.stop_words.into(),
             synonyms: settings.synonyms.into(),
-            accept_new_fields: settings.accept_new_fields.into(),
+            attributes_for_faceting: settings.attributes_for_faceting.into(),
         })
     }
 }
@@ -149,7 +148,7 @@ impl RankingRule {
         }
     }
 
-    pub fn from_iter(rules: impl IntoIterator<Item = impl AsRef<str>>) -> Result<Vec<RankingRule>, RankingRuleConversionError> {
+    pub fn try_from_iter(rules: impl IntoIterator<Item = impl AsRef<str>>) -> Result<Vec<RankingRule>, RankingRuleConversionError> {
         rules.into_iter()
             .map(|s| RankingRule::from_str(s.as_ref()))
             .collect()
@@ -165,7 +164,7 @@ pub struct SettingsUpdate {
     pub displayed_attributes: UpdateState<HashSet<String>>,
     pub stop_words: UpdateState<BTreeSet<String>>,
     pub synonyms: UpdateState<BTreeMap<String, Vec<String>>>,
-    pub accept_new_fields: UpdateState<bool>,
+    pub attributes_for_faceting: UpdateState<Vec<String>>,
 }
 
 impl Default for SettingsUpdate {
@@ -178,7 +177,7 @@ impl Default for SettingsUpdate {
             displayed_attributes: UpdateState::Nothing,
             stop_words: UpdateState::Nothing,
             synonyms: UpdateState::Nothing,
-            accept_new_fields: UpdateState::Nothing,
+            attributes_for_faceting: UpdateState::Nothing,
         }
    }
}
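
The new attributes_for_faceting field reuses the Option<Option<_>> tri-state driven by deserialize_some. A minimal self-contained sketch of that pattern (hypothetical Partial struct, not the actual Settings type): an absent field, an explicit null, and a concrete value map to UpdateState::Nothing, Clear and Update respectively.

use serde::{Deserialize, Deserializer};

fn deserialize_some<'de, T, D>(deserializer: D) -> Result<Option<T>, D::Error>
where
    T: Deserialize<'de>,
    D: Deserializer<'de>,
{
    // Any present value, including `null`, becomes `Some(...)`.
    T::deserialize(deserializer).map(Some)
}

#[derive(Deserialize)]
struct Partial {
    #[serde(default, deserialize_with = "deserialize_some")]
    stop_words: Option<Option<Vec<String>>>,
}

fn main() {
    let nothing: Partial = serde_json::from_str(r#"{}"#).unwrap();
    assert_eq!(nothing.stop_words, None); // field absent -> Nothing

    let clear: Partial = serde_json::from_str(r#"{"stop_words": null}"#).unwrap();
    assert_eq!(clear.stop_words, Some(None)); // explicit null -> Clear

    let update: Partial = serde_json::from_str(r#"{"stop_words": ["the"]}"#).unwrap();
    assert_eq!(update.stop_words, Some(Some(vec!["the".to_string()]))); // value -> Update
}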

View File

@@ -0,0 +1,32 @@
use std::borrow::Cow;

use heed::{types::CowSlice, BytesEncode, BytesDecode};
use sdset::{Set, SetBuf};
use zerocopy::{AsBytes, FromBytes};

pub struct CowSet<T>(std::marker::PhantomData<T>);

impl<'a, T: 'a> BytesEncode<'a> for CowSet<T>
where
    T: AsBytes,
{
    type EItem = Set<T>;

    fn bytes_encode(item: &'a Self::EItem) -> Option<Cow<[u8]>> {
        CowSlice::bytes_encode(item.as_slice())
    }
}

impl<'a, T: 'a> BytesDecode<'a> for CowSet<T>
where
    T: FromBytes + Copy,
{
    type DItem = Cow<'a, Set<T>>;

    fn bytes_decode(bytes: &'a [u8]) -> Option<Self::DItem> {
        match CowSlice::<T>::bytes_decode(bytes)? {
            Cow::Owned(vec) => Some(Cow::Owned(SetBuf::new_unchecked(vec))),
            Cow::Borrowed(slice) => Some(Cow::Borrowed(Set::new_unchecked(slice))),
        }
    }
}
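
A std-only sketch of the borrow-or-copy decision CowSet delegates to heed's CowSlice (assumptions: u32 elements and native-endian storage; the real codec is generic over zerocopy types). Properly aligned bytes can be reinterpreted in place; misaligned bytes must be copied.

use std::borrow::Cow;

fn decode_u32s(bytes: &[u8]) -> Option<Cow<[u32]>> {
    if bytes.len() % std::mem::size_of::<u32>() != 0 {
        return None;
    }
    if bytes.as_ptr() as usize % std::mem::align_of::<u32>() == 0 {
        // Aligned: reinterpret in place, zero copy (length and alignment checked above).
        let slice = unsafe {
            std::slice::from_raw_parts(bytes.as_ptr() as *const u32, bytes.len() / 4)
        };
        Some(Cow::Borrowed(slice))
    } else {
        // Misaligned: fall back to an owned copy.
        let owned: Vec<u32> = bytes
            .chunks_exact(4)
            .map(|c| u32::from_ne_bytes([c[0], c[1], c[2], c[3]]))
            .collect();
        Some(Cow::Owned(owned))
    }
}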

View File

@@ -1,13 +1,15 @@
-use super::BEU64;
-use crate::database::MainT;
-use crate::DocumentId;
-use heed::types::{ByteSlice, OwnedType};
+use std::borrow::Cow;
+
 use heed::Result as ZResult;
-use std::sync::Arc;
+use heed::types::{ByteSlice, OwnedType};
+
+use crate::database::MainT;
+use crate::{DocumentId, FstSetCow};
+use super::BEU32;
 
 #[derive(Copy, Clone)]
 pub struct DocsWords {
-    pub(crate) docs_words: heed::Database<OwnedType<BEU64>, ByteSlice>,
+    pub(crate) docs_words: heed::Database<OwnedType<BEU32>, ByteSlice>,
 }
 
 impl DocsWords {
@@ -15,15 +17,15 @@ impl DocsWords {
         self,
         writer: &mut heed::RwTxn<MainT>,
         document_id: DocumentId,
-        words: &fst::Set,
+        words: &FstSetCow,
     ) -> ZResult<()> {
-        let document_id = BEU64::new(document_id.0);
+        let document_id = BEU32::new(document_id.0);
         let bytes = words.as_fst().as_bytes();
         self.docs_words.put(writer, &document_id, bytes)
     }
 
     pub fn del_doc_words(self, writer: &mut heed::RwTxn<MainT>, document_id: DocumentId) -> ZResult<bool> {
-        let document_id = BEU64::new(document_id.0);
+        let document_id = BEU32::new(document_id.0);
         self.docs_words.delete(writer, &document_id)
     }
@@ -31,20 +33,11 @@ impl DocsWords {
         self.docs_words.clear(writer)
     }
 
-    pub fn doc_words(
-        self,
-        reader: &heed::RoTxn<MainT>,
-        document_id: DocumentId,
-    ) -> ZResult<Option<fst::Set>> {
-        let document_id = BEU64::new(document_id.0);
+    pub fn doc_words(self, reader: &heed::RoTxn<MainT>, document_id: DocumentId) -> ZResult<FstSetCow> {
+        let document_id = BEU32::new(document_id.0);
         match self.docs_words.get(reader, &document_id)? {
-            Some(bytes) => {
-                let len = bytes.len();
-                let bytes = Arc::new(bytes.to_owned());
-                let fst = fst::raw::Fst::from_shared_bytes(bytes, 0, len).unwrap();
-                Ok(Some(fst::Set::from(fst)))
-            }
-            None => Ok(None),
+            Some(bytes) => Ok(fst::Set::new(bytes).unwrap().map_data(Cow::Borrowed).unwrap()),
+            None => Ok(fst::Set::default().map_data(Cow::Owned).unwrap()),
         }
     }
 }
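
The rewritten doc_words relies on fst's map_data to borrow the LMDB bytes instead of copying them into an Arc. A self-contained round trip of that pattern, under the fst 0.4-style API this diff uses:

use std::borrow::Cow;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Build a set once and take its raw bytes, as put_doc_words stores them.
    let set = fst::Set::from_iter(vec!["hello", "world"])?;
    let bytes: Vec<u8> = set.as_fst().as_bytes().to_vec();

    // Reload without copying: the new set borrows `bytes`, like FstSetCow.
    let reloaded = fst::Set::new(bytes.as_slice())?.map_data(Cow::Borrowed)?;
    assert!(reloaded.contains("hello"));

    // The "no entry stored" case degrades to an owned empty set.
    let empty = fst::Set::default().map_data(Cow::Owned)?;
    assert!(!empty.contains("hello"));
    Ok(())
}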

View File

@@ -4,6 +4,7 @@ use crate::DocumentId;
 use heed::types::OwnedType;
 use heed::Result as ZResult;
 use meilisearch_schema::IndexedPos;
+use crate::MResult;
 
 #[derive(Copy, Clone)]
 pub struct DocumentsFieldsCounts {
@@ -60,7 +61,7 @@ impl DocumentsFieldsCounts {
         Ok(DocumentFieldsCountsIter { iter })
     }
 
-    pub fn documents_ids<'txn>(self, reader: &'txn heed::RoTxn<MainT>) -> ZResult<DocumentsIdsIter<'txn>> {
+    pub fn documents_ids<'txn>(self, reader: &'txn heed::RoTxn<MainT>) -> MResult<DocumentsIdsIter<'txn>> {
         let iter = self.documents_fields_counts.iter(reader)?;
         Ok(DocumentsIdsIter {
             last_seen_id: None,
@@ -102,7 +103,7 @@ pub struct DocumentsIdsIter<'txn> {
 }
 
 impl Iterator for DocumentsIdsIter<'_> {
-    type Item = ZResult<DocumentId>;
+    type Item = MResult<DocumentId>;
 
     fn next(&mut self) -> Option<Self::Item> {
         for result in &mut self.iter {
@@ -114,7 +115,7 @@ impl Iterator for DocumentsIdsIter<'_> {
                         return Some(Ok(document_id));
                     }
                 }
-                Err(e) => return Some(Err(e)),
+                Err(e) => return Some(Err(e.into())),
             }
         }
         None
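
The switch from ZResult to MResult in the iterator above works because the crate error type implements From<heed::Error>, so each item can be widened with a plain e.into(). A minimal sketch of the same widening with hypothetical types:

use std::io;

#[derive(Debug)]
enum MError {
    Io(io::Error),
}

impl From<io::Error> for MError {
    fn from(e: io::Error) -> Self {
        MError::Io(e)
    }
}

// An iterator of narrow results becomes an iterator of wide ones.
fn widen<I>(iter: I) -> impl Iterator<Item = Result<u32, MError>>
where
    I: Iterator<Item = Result<u32, io::Error>>,
{
    iter.map(|res| res.map_err(Into::into)) // same shape as `Some(Err(e.into()))` above
}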

View File

@@ -0,0 +1,75 @@
use std::borrow::Cow;

use heed::{BytesDecode, BytesEncode};
use sdset::Set;

use crate::DocumentId;
use super::cow_set::CowSet;

pub struct DocumentsIds;

impl BytesEncode<'_> for DocumentsIds {
    type EItem = Set<DocumentId>;

    fn bytes_encode(item: &Self::EItem) -> Option<Cow<[u8]>> {
        CowSet::bytes_encode(item)
    }
}

impl<'a> BytesDecode<'a> for DocumentsIds {
    type DItem = Cow<'a, Set<DocumentId>>;

    fn bytes_decode(bytes: &'a [u8]) -> Option<Self::DItem> {
        CowSet::bytes_decode(bytes)
    }
}

pub struct DiscoverIds<'a> {
    ids_iter: std::slice::Iter<'a, DocumentId>,
    left_id: Option<u32>,
    right_id: Option<u32>,
    available_range: std::ops::Range<u32>,
}

impl DiscoverIds<'_> {
    pub fn new(ids: &Set<DocumentId>) -> DiscoverIds {
        let mut ids_iter = ids.iter();
        let right_id = ids_iter.next().map(|id| id.0);
        let available_range = 0..right_id.unwrap_or(u32::max_value());
        DiscoverIds { ids_iter, left_id: None, right_id, available_range }
    }
}

impl Iterator for DiscoverIds<'_> {
    type Item = DocumentId;

    fn next(&mut self) -> Option<Self::Item> {
        loop {
            match self.available_range.next() {
                // The available range gives us a new id, we return it.
                Some(id) => return Some(DocumentId(id)),
                // The available range is exhausted, we need to find the next one.
                None if self.available_range.end == u32::max_value() => return None,
                None => loop {
                    self.left_id = self.right_id.take();
                    self.right_id = self.ids_iter.next().map(|id| id.0);
                    match (self.left_id, self.right_id) {
                        // We found a gap in the used ids, we can yield all ids
                        // until the end of the gap
                        (Some(l), Some(r)) => if l.saturating_add(1) != r {
                            self.available_range = (l + 1)..r;
                            break;
                        },
                        // The last used id has been reached, we can use all ids
                        // until u32 MAX
                        (Some(l), None) => {
                            self.available_range = l.saturating_add(1)..u32::max_value();
                            break;
                        },
                        _ => (),
                    }
                },
            }
        }
    }
}
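
DiscoverIds walks the sorted set of used ids and lazily yields the gaps in increasing order, up to u32::MAX. The same computation restated with plain std (a hypothetical helper, eager instead of lazy):

// Given the sorted slice of used ids, return the first `count` unused u32 ids.
fn first_free_ids(used: &[u32], count: usize) -> Vec<u32> {
    let mut free = Vec::with_capacity(count);
    let mut candidate = 0u32;
    for &used_id in used {
        // Every id strictly below the next used id is a gap we can hand out.
        while candidate < used_id && free.len() < count {
            free.push(candidate);
            candidate += 1;
        }
        candidate = used_id.saturating_add(1);
        if free.len() == count {
            return free;
        }
    }
    // Past the last used id, everything up to u32::MAX is available.
    while free.len() < count {
        free.push(candidate);
        candidate = candidate.saturating_add(1);
    }
    free
}

fn main() {
    // With ids 0, 1 and 4 in use, the next free ids are 2, 3, 5, 6.
    assert_eq!(first_free_ids(&[0, 1, 4], 4), vec![2, 3, 5, 6]);
}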

View File

@@ -0,0 +1,97 @@
use std::borrow::Cow;
use std::collections::HashMap;
use std::mem;

use heed::{RwTxn, RoTxn, RoRange, types::Str, BytesEncode, BytesDecode};
use sdset::{SetBuf, Set, SetOperation};

use meilisearch_types::DocumentId;
use meilisearch_schema::FieldId;

use crate::MResult;
use crate::database::MainT;
use crate::facets::FacetKey;
use super::cow_set::CowSet;

/// contains facet info
#[derive(Clone, Copy)]
pub struct Facets {
    pub(crate) facets: heed::Database<FacetKey, FacetData>,
}

pub struct FacetData;

impl<'a> BytesEncode<'a> for FacetData {
    type EItem = (&'a str, &'a Set<DocumentId>);

    fn bytes_encode(item: &'a Self::EItem) -> Option<Cow<'a, [u8]>> {
        // get size of the first item
        let first_size = item.0.as_bytes().len();
        let size = mem::size_of::<u64>()
            + first_size
            + item.1.len() * mem::size_of::<DocumentId>();
        let mut buffer = Vec::with_capacity(size);
        // encode the length of the first item
        buffer.extend_from_slice(&first_size.to_be_bytes());
        buffer.extend_from_slice(Str::bytes_encode(&item.0)?.as_ref());
        let second_slice = CowSet::bytes_encode(&item.1)?;
        buffer.extend_from_slice(second_slice.as_ref());
        Some(Cow::Owned(buffer))
    }
}

impl<'a> BytesDecode<'a> for FacetData {
    type DItem = (&'a str, Cow<'a, Set<DocumentId>>);

    fn bytes_decode(bytes: &'a [u8]) -> Option<Self::DItem> {
        const LEN: usize = mem::size_of::<u64>();
        let mut size_buf = [0; LEN];
        size_buf.copy_from_slice(bytes.get(0..LEN)?);
        // decode size of the first item from the bytes
        let first_size = usize::from_be_bytes(size_buf);
        // decode first and second items
        let first_item = Str::bytes_decode(bytes.get(LEN..(LEN + first_size))?)?;
        let second_item = CowSet::bytes_decode(bytes.get((LEN + first_size)..)?)?;
        Some((first_item, second_item))
    }
}

impl Facets {
    // we use sdset::SetBuf to ensure the docids are sorted.
    pub fn put_facet_document_ids(&self, writer: &mut RwTxn<MainT>, facet_key: FacetKey, doc_ids: &Set<DocumentId>, facet_value: &str) -> MResult<()> {
        Ok(self.facets.put(writer, &facet_key, &(facet_value, doc_ids))?)
    }

    pub fn field_document_ids<'txn>(&self, reader: &'txn RoTxn<MainT>, field_id: FieldId) -> MResult<RoRange<'txn, FacetKey, FacetData>> {
        Ok(self.facets.prefix_iter(reader, &FacetKey::new(field_id, String::new()))?)
    }

    pub fn facet_document_ids<'txn>(&self, reader: &'txn RoTxn<MainT>, facet_key: &FacetKey) -> MResult<Option<(&'txn str, Cow<'txn, Set<DocumentId>>)>> {
        Ok(self.facets.get(reader, &facet_key)?)
    }

    /// updates the facets store, removing the documents from the facets provided in the
    /// `facet_map` argument
    pub fn remove(&self, writer: &mut RwTxn<MainT>, facet_map: HashMap<FacetKey, (String, Vec<DocumentId>)>) -> MResult<()> {
        for (key, (name, document_ids)) in facet_map {
            if let Some((_, old)) = self.facets.get(writer, &key)? {
                let to_remove = SetBuf::from_dirty(document_ids);
                let new = sdset::duo::OpBuilder::new(old.as_ref(), to_remove.as_set()).difference().into_set_buf();
                self.facets.put(writer, &key, &(&name, new.as_set()))?;
            }
        }
        Ok(())
    }

    pub fn add(&self, writer: &mut RwTxn<MainT>, facet_map: HashMap<FacetKey, (String, Vec<DocumentId>)>) -> MResult<()> {
        for (key, (facet_name, document_ids)) in facet_map {
            let set = SetBuf::from_dirty(document_ids);
            self.put_facet_document_ids(writer, key, set.as_set(), &facet_name)?;
        }
        Ok(())
    }

    pub fn clear(self, writer: &mut heed::RwTxn<MainT>) -> MResult<()> {
        Ok(self.facets.clear(writer)?)
    }
}
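
A std-only round trip of the FacetData byte layout encoded above: an 8-byte big-endian length prefix, the facet value string, then the document ids (plain u32s here; the real codec goes through CowSet and zerocopy):

fn encode(value: &str, docids: &[u32]) -> Vec<u8> {
    let mut buf = Vec::new();
    // length of the facet value, as 8 big-endian bytes
    buf.extend_from_slice(&(value.len() as u64).to_be_bytes());
    buf.extend_from_slice(value.as_bytes());
    for id in docids {
        buf.extend_from_slice(&id.to_ne_bytes());
    }
    buf
}

fn decode(bytes: &[u8]) -> Option<(&str, Vec<u32>)> {
    let len_bytes: [u8; 8] = bytes.get(0..8)?.try_into().ok()?;
    let first_size = u64::from_be_bytes(len_bytes) as usize;
    let value = std::str::from_utf8(bytes.get(8..8 + first_size)?).ok()?;
    let ids = bytes.get(8 + first_size..)?
        .chunks_exact(4)
        .map(|c| u32::from_ne_bytes([c[0], c[1], c[2], c[3]]))
        .collect();
    Some((value, ids))
}

fn main() {
    let bytes = encode("red", &[1, 2, 3]);
    assert_eq!(decode(&bytes), Some(("red", vec![1, 2, 3])));
}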

View File

@@ -1,26 +1,33 @@
-use std::sync::Arc;
+use std::borrow::Cow;
 use std::collections::HashMap;
 
 use chrono::{DateTime, Utc};
-use heed::types::{ByteSlice, OwnedType, SerdeBincode, Str};
-use heed::Result as ZResult;
-use meilisearch_schema::Schema;
+use heed::types::{ByteSlice, OwnedType, SerdeBincode, Str, CowSlice};
+use meilisearch_schema::{FieldId, Schema};
+use meilisearch_types::DocumentId;
+use sdset::Set;
 
 use crate::database::MainT;
-use crate::RankedMap;
+use crate::{RankedMap, MResult};
 use crate::settings::RankingRule;
+use crate::{FstSetCow, FstMapCow};
+use super::{CowSet, DocumentsIds};
 
+const ATTRIBUTES_FOR_FACETING_KEY: &str = "attributes-for-faceting";
 const CREATED_AT_KEY: &str = "created-at";
-const RANKING_RULES_KEY: &str = "ranking-rules";
-const DISTINCT_ATTRIBUTE_KEY: &str = "distinct-attribute";
-const STOP_WORDS_KEY: &str = "stop-words";
-const SYNONYMS_KEY: &str = "synonyms";
 const CUSTOMS_KEY: &str = "customs";
-const FIELDS_FREQUENCY_KEY: &str = "fields-frequency";
+const DISTINCT_ATTRIBUTE_KEY: &str = "distinct-attribute";
+const EXTERNAL_DOCIDS_KEY: &str = "external-docids";
+const FIELDS_DISTRIBUTION_KEY: &str = "fields-distribution";
+const INTERNAL_DOCIDS_KEY: &str = "internal-docids";
 const NAME_KEY: &str = "name";
 const NUMBER_OF_DOCUMENTS_KEY: &str = "number-of-documents";
 const RANKED_MAP_KEY: &str = "ranked-map";
+const RANKING_RULES_KEY: &str = "ranking-rules";
 const SCHEMA_KEY: &str = "schema";
+const SORTED_DOCUMENT_IDS_CACHE_KEY: &str = "sorted-document-ids-cache";
+const STOP_WORDS_KEY: &str = "stop-words";
+const SYNONYMS_KEY: &str = "synonyms";
 const UPDATED_AT_KEY: &str = "updated-at";
 const WORDS_KEY: &str = "words";
@@ -34,122 +41,200 @@ pub struct Main {
 }
 
 impl Main {
-    pub fn clear(self, writer: &mut heed::RwTxn<MainT>) -> ZResult<()> {
-        self.main.clear(writer)
+    pub fn clear(self, writer: &mut heed::RwTxn<MainT>) -> MResult<()> {
+        Ok(self.main.clear(writer)?)
     }
 
-    pub fn put_name(self, writer: &mut heed::RwTxn<MainT>, name: &str) -> ZResult<()> {
-        self.main.put::<_, Str, Str>(writer, NAME_KEY, name)
+    pub fn put_name(self, writer: &mut heed::RwTxn<MainT>, name: &str) -> MResult<()> {
+        Ok(self.main.put::<_, Str, Str>(writer, NAME_KEY, name)?)
     }
 
-    pub fn name(self, reader: &heed::RoTxn<MainT>) -> ZResult<Option<String>> {
+    pub fn name(self, reader: &heed::RoTxn<MainT>) -> MResult<Option<String>> {
         Ok(self
             .main
             .get::<_, Str, Str>(reader, NAME_KEY)?
             .map(|name| name.to_owned()))
     }
 
-    pub fn put_created_at(self, writer: &mut heed::RwTxn<MainT>) -> ZResult<()> {
-        self.main
-            .put::<_, Str, SerdeDatetime>(writer, CREATED_AT_KEY, &Utc::now())
+    pub fn put_created_at(self, writer: &mut heed::RwTxn<MainT>) -> MResult<()> {
+        Ok(self.main.put::<_, Str, SerdeDatetime>(writer, CREATED_AT_KEY, &Utc::now())?)
     }
 
-    pub fn created_at(self, reader: &heed::RoTxn<MainT>) -> ZResult<Option<DateTime<Utc>>> {
-        self.main.get::<_, Str, SerdeDatetime>(reader, CREATED_AT_KEY)
+    pub fn created_at(self, reader: &heed::RoTxn<MainT>) -> MResult<Option<DateTime<Utc>>> {
        Ok(self.main.get::<_, Str, SerdeDatetime>(reader, CREATED_AT_KEY)?)
     }
 
-    pub fn put_updated_at(self, writer: &mut heed::RwTxn<MainT>) -> ZResult<()> {
-        self.main
-            .put::<_, Str, SerdeDatetime>(writer, UPDATED_AT_KEY, &Utc::now())
+    pub fn put_updated_at(self, writer: &mut heed::RwTxn<MainT>) -> MResult<()> {
+        Ok(self.main.put::<_, Str, SerdeDatetime>(writer, UPDATED_AT_KEY, &Utc::now())?)
     }
 
-    pub fn updated_at(self, reader: &heed::RoTxn<MainT>) -> ZResult<Option<DateTime<Utc>>> {
-        self.main.get::<_, Str, SerdeDatetime>(reader, UPDATED_AT_KEY)
+    pub fn updated_at(self, reader: &heed::RoTxn<MainT>) -> MResult<Option<DateTime<Utc>>> {
+        Ok(self.main.get::<_, Str, SerdeDatetime>(reader, UPDATED_AT_KEY)?)
     }
 
-    pub fn put_words_fst(self, writer: &mut heed::RwTxn<MainT>, fst: &fst::Set) -> ZResult<()> {
-        let bytes = fst.as_fst().as_bytes();
-        self.main.put::<_, Str, ByteSlice>(writer, WORDS_KEY, bytes)
-    }
-
-    pub unsafe fn static_words_fst(self, reader: &heed::RoTxn<MainT>) -> ZResult<Option<fst::Set>> {
-        match self.main.get::<_, Str, ByteSlice>(reader, WORDS_KEY)? {
-            Some(bytes) => {
-                let bytes: &'static [u8] = std::mem::transmute(bytes);
-                let set = fst::Set::from_static_slice(bytes).unwrap();
-                Ok(Some(set))
-            }
-            None => Ok(None),
-        }
-    }
-
-    pub fn words_fst(self, reader: &heed::RoTxn<MainT>) -> ZResult<Option<fst::Set>> {
-        match self.main.get::<_, Str, ByteSlice>(reader, WORDS_KEY)? {
-            Some(bytes) => {
-                let len = bytes.len();
-                let bytes = Arc::new(bytes.to_owned());
-                let fst = fst::raw::Fst::from_shared_bytes(bytes, 0, len).unwrap();
-                Ok(Some(fst::Set::from(fst)))
-            }
-            None => Ok(None),
-        }
-    }
+    pub fn put_internal_docids(self, writer: &mut heed::RwTxn<MainT>, ids: &sdset::Set<DocumentId>) -> MResult<()> {
+        Ok(self.main.put::<_, Str, DocumentsIds>(writer, INTERNAL_DOCIDS_KEY, ids)?)
+    }
+
+    pub fn internal_docids<'txn>(self, reader: &'txn heed::RoTxn<MainT>) -> MResult<Cow<'txn, sdset::Set<DocumentId>>> {
+        match self.main.get::<_, Str, DocumentsIds>(reader, INTERNAL_DOCIDS_KEY)? {
+            Some(ids) => Ok(ids),
+            None => Ok(Cow::default()),
+        }
+    }
+
+    pub fn merge_internal_docids(self, writer: &mut heed::RwTxn<MainT>, new_ids: &sdset::Set<DocumentId>) -> MResult<()> {
+        use sdset::SetOperation;
+
+        // We do an union of the old and new internal ids.
+        let internal_docids = self.internal_docids(writer)?;
+        let internal_docids = sdset::duo::Union::new(&internal_docids, new_ids).into_set_buf();
+        Ok(self.put_internal_docids(writer, &internal_docids)?)
+    }
+
+    pub fn remove_internal_docids(self, writer: &mut heed::RwTxn<MainT>, ids: &sdset::Set<DocumentId>) -> MResult<()> {
+        use sdset::SetOperation;
+
+        // We do a difference of the old and new internal ids.
+        let internal_docids = self.internal_docids(writer)?;
+        let internal_docids = sdset::duo::Difference::new(&internal_docids, ids).into_set_buf();
+        Ok(self.put_internal_docids(writer, &internal_docids)?)
+    }
+
+    pub fn put_external_docids<A>(self, writer: &mut heed::RwTxn<MainT>, ids: &fst::Map<A>) -> MResult<()>
+    where A: AsRef<[u8]>,
+    {
+        Ok(self.main.put::<_, Str, ByteSlice>(writer, EXTERNAL_DOCIDS_KEY, ids.as_fst().as_bytes())?)
+    }
+
+    pub fn merge_external_docids<A>(self, writer: &mut heed::RwTxn<MainT>, new_docids: &fst::Map<A>) -> MResult<()>
+    where A: AsRef<[u8]>,
+    {
+        use fst::{Streamer, IntoStreamer};
+
+        // Do an union of the old and the new set of external docids.
+        let external_docids = self.external_docids(writer)?;
+        let mut op = external_docids.op().add(new_docids.into_stream()).r#union();
+        let mut build = fst::MapBuilder::memory();
+        while let Some((docid, values)) = op.next() {
+            build.insert(docid, values[0].value).unwrap();
+        }
+        drop(op);
+
+        let external_docids = build.into_map();
+        Ok(self.put_external_docids(writer, &external_docids)?)
+    }
+
+    pub fn remove_external_docids<A>(self, writer: &mut heed::RwTxn<MainT>, ids: &fst::Map<A>) -> MResult<()>
+    where A: AsRef<[u8]>,
+    {
+        use fst::{Streamer, IntoStreamer};
+
+        // Do an union of the old and the new set of external docids.
+        let external_docids = self.external_docids(writer)?;
+        let mut op = external_docids.op().add(ids.into_stream()).difference();
+        let mut build = fst::MapBuilder::memory();
+        while let Some((docid, values)) = op.next() {
+            build.insert(docid, values[0].value).unwrap();
+        }
+        drop(op);
+
+        let external_docids = build.into_map();
+        self.put_external_docids(writer, &external_docids)
+    }
+
+    pub fn external_docids(self, reader: &heed::RoTxn<MainT>) -> MResult<FstMapCow> {
+        match self.main.get::<_, Str, ByteSlice>(reader, EXTERNAL_DOCIDS_KEY)? {
+            Some(bytes) => Ok(fst::Map::new(bytes).unwrap().map_data(Cow::Borrowed).unwrap()),
+            None => Ok(fst::Map::default().map_data(Cow::Owned).unwrap()),
+        }
+    }
+
+    pub fn external_to_internal_docid(self, reader: &heed::RoTxn<MainT>, external_docid: &str) -> MResult<Option<DocumentId>> {
+        let external_ids = self.external_docids(reader)?;
+        Ok(external_ids.get(external_docid).map(|id| DocumentId(id as u32)))
+    }
+
+    pub fn words_fst(self, reader: &heed::RoTxn<MainT>) -> MResult<FstSetCow> {
+        match self.main.get::<_, Str, ByteSlice>(reader, WORDS_KEY)? {
+            Some(bytes) => Ok(fst::Set::new(bytes).unwrap().map_data(Cow::Borrowed).unwrap()),
+            None => Ok(fst::Set::default().map_data(Cow::Owned).unwrap()),
+        }
+    }
+
+    pub fn put_words_fst<A: AsRef<[u8]>>(self, writer: &mut heed::RwTxn<MainT>, fst: &fst::Set<A>) -> MResult<()> {
+        Ok(self.main.put::<_, Str, ByteSlice>(writer, WORDS_KEY, fst.as_fst().as_bytes())?)
+    }
+
+    pub fn put_sorted_document_ids_cache(self, writer: &mut heed::RwTxn<MainT>, documents_ids: &[DocumentId]) -> MResult<()> {
+        Ok(self.main.put::<_, Str, CowSlice<DocumentId>>(writer, SORTED_DOCUMENT_IDS_CACHE_KEY, documents_ids)?)
+    }
+
+    pub fn sorted_document_ids_cache(self, reader: &heed::RoTxn<MainT>) -> MResult<Option<Cow<[DocumentId]>>> {
+        Ok(self.main.get::<_, Str, CowSlice<DocumentId>>(reader, SORTED_DOCUMENT_IDS_CACHE_KEY)?)
+    }
 
-    pub fn put_schema(self, writer: &mut heed::RwTxn<MainT>, schema: &Schema) -> ZResult<()> {
-        self.main.put::<_, Str, SerdeBincode<Schema>>(writer, SCHEMA_KEY, schema)
+    pub fn put_schema(self, writer: &mut heed::RwTxn<MainT>, schema: &Schema) -> MResult<()> {
+        Ok(self.main.put::<_, Str, SerdeBincode<Schema>>(writer, SCHEMA_KEY, schema)?)
     }
 
-    pub fn schema(self, reader: &heed::RoTxn<MainT>) -> ZResult<Option<Schema>> {
-        self.main.get::<_, Str, SerdeBincode<Schema>>(reader, SCHEMA_KEY)
+    pub fn schema(self, reader: &heed::RoTxn<MainT>) -> MResult<Option<Schema>> {
+        Ok(self.main.get::<_, Str, SerdeBincode<Schema>>(reader, SCHEMA_KEY)?)
     }
 
-    pub fn delete_schema(self, writer: &mut heed::RwTxn<MainT>) -> ZResult<bool> {
-        self.main.delete::<_, Str>(writer, SCHEMA_KEY)
+    pub fn delete_schema(self, writer: &mut heed::RwTxn<MainT>) -> MResult<bool> {
+        Ok(self.main.delete::<_, Str>(writer, SCHEMA_KEY)?)
    }
 
-    pub fn put_ranked_map(self, writer: &mut heed::RwTxn<MainT>, ranked_map: &RankedMap) -> ZResult<()> {
-        self.main.put::<_, Str, SerdeBincode<RankedMap>>(writer, RANKED_MAP_KEY, &ranked_map)
+    pub fn put_ranked_map(self, writer: &mut heed::RwTxn<MainT>, ranked_map: &RankedMap) -> MResult<()> {
+        Ok(self.main.put::<_, Str, SerdeBincode<RankedMap>>(writer, RANKED_MAP_KEY, &ranked_map)?)
    }
 
-    pub fn ranked_map(self, reader: &heed::RoTxn<MainT>) -> ZResult<Option<RankedMap>> {
-        self.main.get::<_, Str, SerdeBincode<RankedMap>>(reader, RANKED_MAP_KEY)
+    pub fn ranked_map(self, reader: &heed::RoTxn<MainT>) -> MResult<Option<RankedMap>> {
+        Ok(self.main.get::<_, Str, SerdeBincode<RankedMap>>(reader, RANKED_MAP_KEY)?)
    }
 
-    pub fn put_synonyms_fst(self, writer: &mut heed::RwTxn<MainT>, fst: &fst::Set) -> ZResult<()> {
+    pub fn put_synonyms_fst<A: AsRef<[u8]>>(self, writer: &mut heed::RwTxn<MainT>, fst: &fst::Set<A>) -> MResult<()> {
         let bytes = fst.as_fst().as_bytes();
-        self.main.put::<_, Str, ByteSlice>(writer, SYNONYMS_KEY, bytes)
+        Ok(self.main.put::<_, Str, ByteSlice>(writer, SYNONYMS_KEY, bytes)?)
     }
 
-    pub fn synonyms_fst(self, reader: &heed::RoTxn<MainT>) -> ZResult<Option<fst::Set>> {
+    pub(crate) fn synonyms_fst(self, reader: &heed::RoTxn<MainT>) -> MResult<FstSetCow> {
         match self.main.get::<_, Str, ByteSlice>(reader, SYNONYMS_KEY)? {
-            Some(bytes) => {
-                let len = bytes.len();
-                let bytes = Arc::new(bytes.to_owned());
-                let fst = fst::raw::Fst::from_shared_bytes(bytes, 0, len).unwrap();
-                Ok(Some(fst::Set::from(fst)))
-            }
-            None => Ok(None),
+            Some(bytes) => Ok(fst::Set::new(bytes).unwrap().map_data(Cow::Borrowed).unwrap()),
+            None => Ok(fst::Set::default().map_data(Cow::Owned).unwrap()),
         }
     }
 
-    pub fn put_stop_words_fst(self, writer: &mut heed::RwTxn<MainT>, fst: &fst::Set) -> ZResult<()> {
+    pub fn synonyms(self, reader: &heed::RoTxn<MainT>) -> MResult<Vec<String>> {
+        let synonyms = self
+            .synonyms_fst(&reader)?
+            .stream()
+            .into_strs()?;
+        Ok(synonyms)
+    }
+
+    pub fn put_stop_words_fst<A: AsRef<[u8]>>(self, writer: &mut heed::RwTxn<MainT>, fst: &fst::Set<A>) -> MResult<()> {
         let bytes = fst.as_fst().as_bytes();
-        self.main.put::<_, Str, ByteSlice>(writer, STOP_WORDS_KEY, bytes)
+        Ok(self.main.put::<_, Str, ByteSlice>(writer, STOP_WORDS_KEY, bytes)?)
     }
 
-    pub fn stop_words_fst(self, reader: &heed::RoTxn<MainT>) -> ZResult<Option<fst::Set>> {
+    pub(crate) fn stop_words_fst(self, reader: &heed::RoTxn<MainT>) -> MResult<FstSetCow> {
         match self.main.get::<_, Str, ByteSlice>(reader, STOP_WORDS_KEY)? {
-            Some(bytes) => {
-                let len = bytes.len();
-                let bytes = Arc::new(bytes.to_owned());
-                let fst = fst::raw::Fst::from_shared_bytes(bytes, 0, len).unwrap();
-                Ok(Some(fst::Set::from(fst)))
-            }
-            None => Ok(None),
+            Some(bytes) => Ok(fst::Set::new(bytes).unwrap().map_data(Cow::Borrowed).unwrap()),
+            None => Ok(fst::Set::default().map_data(Cow::Owned).unwrap()),
         }
     }
 
-    pub fn put_number_of_documents<F>(self, writer: &mut heed::RwTxn<MainT>, f: F) -> ZResult<u64>
+    pub fn stop_words(self, reader: &heed::RoTxn<MainT>) -> MResult<Vec<String>> {
+        let stop_word_list = self
+            .stop_words_fst(reader)?
+            .stream()
+            .into_strs()?;
+        Ok(stop_word_list)
+    }
+
+    pub fn put_number_of_documents<F>(self, writer: &mut heed::RwTxn<MainT>, f: F) -> MResult<u64>
     where
         F: Fn(u64) -> u64,
     {
@@ -159,68 +244,77 @@ impl Main {
         Ok(new)
     }
 
-    pub fn number_of_documents(self, reader: &heed::RoTxn<MainT>) -> ZResult<u64> {
+    pub fn number_of_documents(self, reader: &heed::RoTxn<MainT>) -> MResult<u64> {
         match self
             .main
-            .get::<_, Str, OwnedType<u64>>(reader, NUMBER_OF_DOCUMENTS_KEY)?
-        {
+            .get::<_, Str, OwnedType<u64>>(reader, NUMBER_OF_DOCUMENTS_KEY)? {
             Some(value) => Ok(value),
             None => Ok(0),
         }
     }
 
-    pub fn put_fields_frequency(
+    pub fn put_fields_distribution(
         self,
         writer: &mut heed::RwTxn<MainT>,
         fields_frequency: &FreqsMap,
-    ) -> ZResult<()> {
-        self.main
-            .put::<_, Str, SerdeFreqsMap>(writer, FIELDS_FREQUENCY_KEY, fields_frequency)
+    ) -> MResult<()> {
+        Ok(self.main.put::<_, Str, SerdeFreqsMap>(writer, FIELDS_DISTRIBUTION_KEY, fields_frequency)?)
     }
 
-    pub fn fields_frequency(&self, reader: &heed::RoTxn<MainT>) -> ZResult<Option<FreqsMap>> {
+    pub fn fields_distribution(&self, reader: &heed::RoTxn<MainT>) -> MResult<Option<FreqsMap>> {
         match self
             .main
-            .get::<_, Str, SerdeFreqsMap>(reader, FIELDS_FREQUENCY_KEY)?
+            .get::<_, Str, SerdeFreqsMap>(reader, FIELDS_DISTRIBUTION_KEY)?
         {
             Some(freqs) => Ok(Some(freqs)),
             None => Ok(None),
         }
    }
 
-    pub fn ranking_rules(&self, reader: &heed::RoTxn<MainT>) -> ZResult<Option<Vec<RankingRule>>> {
-        self.main.get::<_, Str, SerdeBincode<Vec<RankingRule>>>(reader, RANKING_RULES_KEY)
+    pub fn attributes_for_faceting<'txn>(&self, reader: &'txn heed::RoTxn<MainT>) -> MResult<Option<Cow<'txn, Set<FieldId>>>> {
+        Ok(self.main.get::<_, Str, CowSet<FieldId>>(reader, ATTRIBUTES_FOR_FACETING_KEY)?)
    }
 
-    pub fn put_ranking_rules(self, writer: &mut heed::RwTxn<MainT>, value: &[RankingRule]) -> ZResult<()> {
-        self.main.put::<_, Str, SerdeBincode<Vec<RankingRule>>>(writer, RANKING_RULES_KEY, &value.to_vec())
+    pub fn put_attributes_for_faceting(self, writer: &mut heed::RwTxn<MainT>, attributes: &Set<FieldId>) -> MResult<()> {
+        Ok(self.main.put::<_, Str, CowSet<FieldId>>(writer, ATTRIBUTES_FOR_FACETING_KEY, attributes)?)
    }
 
-    pub fn delete_ranking_rules(self, writer: &mut heed::RwTxn<MainT>) -> ZResult<bool> {
-        self.main.delete::<_, Str>(writer, RANKING_RULES_KEY)
+    pub fn delete_attributes_for_faceting(self, writer: &mut heed::RwTxn<MainT>) -> MResult<bool> {
+        Ok(self.main.delete::<_, Str>(writer, ATTRIBUTES_FOR_FACETING_KEY)?)
    }
 
-    pub fn distinct_attribute(&self, reader: &heed::RoTxn<MainT>) -> ZResult<Option<String>> {
-        if let Some(value) = self.main.get::<_, Str, Str>(reader, DISTINCT_ATTRIBUTE_KEY)? {
-            return Ok(Some(value.to_owned()))
+    pub fn ranking_rules(&self, reader: &heed::RoTxn<MainT>) -> MResult<Option<Vec<RankingRule>>> {
+        Ok(self.main.get::<_, Str, SerdeBincode<Vec<RankingRule>>>(reader, RANKING_RULES_KEY)?)
+    }
+
+    pub fn put_ranking_rules(self, writer: &mut heed::RwTxn<MainT>, value: &[RankingRule]) -> MResult<()> {
+        Ok(self.main.put::<_, Str, SerdeBincode<Vec<RankingRule>>>(writer, RANKING_RULES_KEY, &value.to_vec())?)
    }
 
+    pub fn delete_ranking_rules(self, writer: &mut heed::RwTxn<MainT>) -> MResult<bool> {
+        Ok(self.main.delete::<_, Str>(writer, RANKING_RULES_KEY)?)
+    }
+
+    pub fn distinct_attribute(&self, reader: &heed::RoTxn<MainT>) -> MResult<Option<FieldId>> {
+        match self.main.get::<_, Str, OwnedType<u16>>(reader, DISTINCT_ATTRIBUTE_KEY)? {
+            Some(value) => Ok(Some(FieldId(value.to_owned()))),
+            None => Ok(None),
         }
-        return Ok(None)
    }
 
-    pub fn put_distinct_attribute(self, writer: &mut heed::RwTxn<MainT>, value: &str) -> ZResult<()> {
-        self.main.put::<_, Str, Str>(writer, DISTINCT_ATTRIBUTE_KEY, value)
+    pub fn put_distinct_attribute(self, writer: &mut heed::RwTxn<MainT>, value: FieldId) -> MResult<()> {
+        Ok(self.main.put::<_, Str, OwnedType<u16>>(writer, DISTINCT_ATTRIBUTE_KEY, &value.0)?)
    }
 
-    pub fn delete_distinct_attribute(self, writer: &mut heed::RwTxn<MainT>) -> ZResult<bool> {
-        self.main.delete::<_, Str>(writer, DISTINCT_ATTRIBUTE_KEY)
+    pub fn delete_distinct_attribute(self, writer: &mut heed::RwTxn<MainT>) -> MResult<bool> {
+        Ok(self.main.delete::<_, Str>(writer, DISTINCT_ATTRIBUTE_KEY)?)
    }
 
-    pub fn put_customs(self, writer: &mut heed::RwTxn<MainT>, customs: &[u8]) -> ZResult<()> {
-        self.main
-            .put::<_, Str, ByteSlice>(writer, CUSTOMS_KEY, customs)
+    pub fn put_customs(self, writer: &mut heed::RwTxn<MainT>, customs: &[u8]) -> MResult<()> {
+        Ok(self.main.put::<_, Str, ByteSlice>(writer, CUSTOMS_KEY, customs)?)
    }
 
-    pub fn customs<'txn>(self, reader: &'txn heed::RoTxn<MainT>) -> ZResult<Option<&'txn [u8]>> {
-        self.main.get::<_, Str, ByteSlice>(reader, CUSTOMS_KEY)
+    pub fn customs<'txn>(self, reader: &'txn heed::RoTxn<MainT>) -> MResult<Option<&'txn [u8]>> {
+        Ok(self.main.get::<_, Str, ByteSlice>(reader, CUSTOMS_KEY)?)
    }
 }
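
merge_external_docids and remove_external_docids stream a union or difference of two fsts into a fresh MapBuilder. A self-contained version of the merge half (fst 0.4-style API, hypothetical document ids):

use fst::{IntoStreamer, Streamer};

fn merge(a: &fst::Map<Vec<u8>>, b: &fst::Map<Vec<u8>>) -> fst::Map<Vec<u8>> {
    let mut op = a.op().add(b.into_stream()).r#union();
    let mut build = fst::MapBuilder::memory();
    while let Some((key, values)) = op.next() {
        // `values` lists the (index, value) pairs of every input containing `key`;
        // taking values[0] keeps the first map's value on conflict.
        build.insert(key, values[0].value).unwrap();
    }
    drop(op);
    build.into_map()
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let old = fst::Map::from_iter(vec![("doc-a", 0u64), ("doc-b", 1)])?;
    let new = fst::Map::from_iter(vec![("doc-c", 2u64)])?;
    let merged = merge(&old, &new);
    assert_eq!(merged.get("doc-c"), Some(2));
    Ok(())
}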

View File

@@ -1,23 +1,27 @@
+mod cow_set;
 mod docs_words;
-mod prefix_documents_cache;
-mod prefix_postings_lists_cache;
+mod documents_ids;
 mod documents_fields;
 mod documents_fields_counts;
+mod facets;
 mod main;
 mod postings_lists;
+mod prefix_documents_cache;
+mod prefix_postings_lists_cache;
 mod synonyms;
 mod updates;
 mod updates_results;
 
+pub use self::cow_set::CowSet;
 pub use self::docs_words::DocsWords;
-pub use self::prefix_documents_cache::PrefixDocumentsCache;
-pub use self::prefix_postings_lists_cache::PrefixPostingsListsCache;
 pub use self::documents_fields::{DocumentFieldsIter, DocumentsFields};
-pub use self::documents_fields_counts::{
-    DocumentFieldsCountsIter, DocumentsFieldsCounts, DocumentsIdsIter,
-};
+pub use self::documents_fields_counts::{DocumentFieldsCountsIter, DocumentsFieldsCounts, DocumentsIdsIter};
+pub use self::documents_ids::{DocumentsIds, DiscoverIds};
+pub use self::facets::Facets;
 pub use self::main::Main;
 pub use self::postings_lists::PostingsLists;
+pub use self::prefix_documents_cache::PrefixDocumentsCache;
+pub use self::prefix_postings_lists_cache::PrefixPostingsListsCache;
 pub use self::synonyms::Synonyms;
 pub use self::updates::Updates;
 pub use self::updates_results::UpdatesResults;
@@ -27,7 +31,6 @@ pub use self::updates_results::UpdatesResults;
 use std::collections::HashSet;
 use std::convert::TryInto;
 use std::{mem, ptr};
 
-use heed::Result as ZResult;
 use heed::{BytesEncode, BytesDecode};
 use meilisearch_schema::{IndexedPos, FieldId};
 use sdset::{Set, SetBuf};
@@ -41,20 +44,21 @@ use crate::serde::Deserializer;
 use crate::settings::SettingsUpdate;
 use crate::{query_builder::QueryBuilder, update, DocIndex, DocumentId, Error, MResult};
 
+type BEU32 = zerocopy::U32<byteorder::BigEndian>;
 type BEU64 = zerocopy::U64<byteorder::BigEndian>;
-type BEU16 = zerocopy::U16<byteorder::BigEndian>;
+pub type BEU16 = zerocopy::U16<byteorder::BigEndian>;
 
 #[derive(Debug, Copy, Clone, AsBytes, FromBytes)]
 #[repr(C)]
 pub struct DocumentFieldIndexedKey {
-    docid: BEU64,
+    docid: BEU32,
     indexed_pos: BEU16,
 }
 
 impl DocumentFieldIndexedKey {
     fn new(docid: DocumentId, indexed_pos: IndexedPos) -> DocumentFieldIndexedKey {
         DocumentFieldIndexedKey {
-            docid: BEU64::new(docid.0),
+            docid: BEU32::new(docid.0),
             indexed_pos: BEU16::new(indexed_pos.0),
         }
     }
@@ -63,14 +67,14 @@ impl DocumentFieldIndexedKey {
 #[derive(Debug, Copy, Clone, AsBytes, FromBytes)]
 #[repr(C)]
 pub struct DocumentFieldStoredKey {
-    docid: BEU64,
+    docid: BEU32,
     field_id: BEU16,
 }
 
 impl DocumentFieldStoredKey {
     fn new(docid: DocumentId, field_id: FieldId) -> DocumentFieldStoredKey {
         DocumentFieldStoredKey {
-            docid: BEU64::new(docid.0),
+            docid: BEU32::new(docid.0),
             field_id: BEU16::new(field_id.0),
         }
     }
@@ -94,7 +98,7 @@ impl<'a> BytesEncode<'a> for PostingsCodec {
         let mut buffer = Vec::with_capacity(u64_size + docids_size + matches_size);
 
-        let docids_len = item.docids.len();
+        let docids_len = item.docids.len() as u64;
         buffer.extend_from_slice(&docids_len.to_be_bytes());
         buffer.extend_from_slice(item.docids.as_bytes());
         buffer.extend_from_slice(item.matches.as_bytes());
@@ -197,12 +201,17 @@ fn updates_results_name(name: &str) -> String {
     format!("store-{}-updates-results", name)
 }
 
+fn facets_name(name: &str) -> String {
+    format!("store-{}-facets", name)
+}
+
 #[derive(Clone)]
 pub struct Index {
     pub main: Main,
     pub postings_lists: PostingsLists,
     pub documents_fields: DocumentsFields,
     pub documents_fields_counts: DocumentsFieldsCounts,
+    pub facets: Facets,
     pub synonyms: Synonyms,
     pub docs_words: DocsWords,
     pub prefix_documents_cache: PrefixDocumentsCache,
@@ -269,14 +278,14 @@ impl Index {
         }
     }
 
-    pub fn customs_update(&self, writer: &mut heed::RwTxn<UpdateT>, customs: Vec<u8>) -> ZResult<u64> {
+    pub fn customs_update(&self, writer: &mut heed::RwTxn<UpdateT>, customs: Vec<u8>) -> MResult<u64> {
         let _ = self.updates_notifier.send(UpdateEvent::NewUpdate);
-        update::push_customs_update(writer, self.updates, self.updates_results, customs)
+        Ok(update::push_customs_update(writer, self.updates, self.updates_results, customs)?)
     }
 
-    pub fn settings_update(&self, writer: &mut heed::RwTxn<UpdateT>, update: SettingsUpdate) -> ZResult<u64> {
+    pub fn settings_update(&self, writer: &mut heed::RwTxn<UpdateT>, update: SettingsUpdate) -> MResult<u64> {
         let _ = self.updates_notifier.send(UpdateEvent::NewUpdate);
-        update::push_settings_update(writer, self.updates, self.updates_results, update)
+        Ok(update::push_settings_update(writer, self.updates, self.updates_results, update)?)
     }
 
     pub fn documents_addition<D>(&self) -> update::DocumentsAddition<D> {
@@ -334,14 +343,14 @@ impl Index {
             for id in 0..=last_id {
                 if let Some(update) = self.update_status(reader, id)? {
                     updates.push(update);
-                    last_update_result_id = id;
+                    last_update_result_id = id + 1;
                 }
             }
         }
 
         // retrieve all enqueued updates
         if let Some((last_id, _)) = self.updates.last_update(reader)? {
-            for id in last_update_result_id + 1..=last_id {
+            for id in last_update_result_id..=last_id {
                 if let Some(update) = self.update_status(reader, id)? {
                     updates.push(update);
                 }
@@ -352,29 +361,14 @@ impl Index {
     }
 
     pub fn query_builder(&self) -> QueryBuilder {
-        QueryBuilder::new(
-            self.main,
-            self.postings_lists,
-            self.documents_fields_counts,
-            self.synonyms,
-            self.prefix_documents_cache,
-            self.prefix_postings_lists_cache,
-        )
+        QueryBuilder::new(self)
     }
 
-    pub fn query_builder_with_criteria<'c, 'f, 'd>(
-        &self,
+    pub fn query_builder_with_criteria<'c, 'f, 'd, 'i>(
+        &'i self,
         criteria: Criteria<'c>,
-    ) -> QueryBuilder<'c, 'f, 'd> {
-        QueryBuilder::with_criteria(
-            self.main,
-            self.postings_lists,
-            self.documents_fields_counts,
-            self.synonyms,
-            self.prefix_documents_cache,
-            self.prefix_postings_lists_cache,
-            criteria,
-        )
+    ) -> QueryBuilder<'c, 'f, 'd, 'i> {
+        QueryBuilder::with_criteria(self, criteria)
     }
 }
@@ -395,12 +389,14 @@ pub fn create(
     let prefix_postings_lists_cache_name = prefix_postings_lists_cache_name(name);
     let updates_name = updates_name(name);
     let updates_results_name = updates_results_name(name);
+    let facets_name = facets_name(name);
 
     // open all the stores
     let main = env.create_poly_database(Some(&main_name))?;
     let postings_lists = env.create_database(Some(&postings_lists_name))?;
     let documents_fields = env.create_database(Some(&documents_fields_name))?;
     let documents_fields_counts = env.create_database(Some(&documents_fields_counts_name))?;
+    let facets = env.create_database(Some(&facets_name))?;
     let synonyms = env.create_database(Some(&synonyms_name))?;
     let docs_words = env.create_database(Some(&docs_words_name))?;
     let prefix_documents_cache = env.create_database(Some(&prefix_documents_cache_name))?;
@@ -417,6 +413,8 @@ pub fn create(
         docs_words: DocsWords { docs_words },
         prefix_postings_lists_cache: PrefixPostingsListsCache { prefix_postings_lists_cache },
         prefix_documents_cache: PrefixDocumentsCache { prefix_documents_cache },
+        facets: Facets { facets },
         updates: Updates { updates },
         updates_results: UpdatesResults { updates_results },
         updates_notifier,
@@ -437,6 +435,7 @@ pub fn open(
     let synonyms_name = synonyms_name(name);
     let docs_words_name = docs_words_name(name);
     let prefix_documents_cache_name = prefix_documents_cache_name(name);
+    let facets_name = facets_name(name);
     let prefix_postings_lists_cache_name = prefix_postings_lists_cache_name(name);
     let updates_name = updates_name(name);
     let updates_results_name = updates_results_name(name);
@@ -470,6 +469,10 @@ pub fn open(
         Some(prefix_documents_cache) => prefix_documents_cache,
         None => return Ok(None),
     };
+    let facets = match env.open_database(Some(&facets_name))? {
+        Some(facets) => facets,
+        None => return Ok(None),
+    };
     let prefix_postings_lists_cache = match env.open_database(Some(&prefix_postings_lists_cache_name))? {
         Some(prefix_postings_lists_cache) => prefix_postings_lists_cache,
         None => return Ok(None),
@@ -491,6 +494,7 @@ pub fn open(
         synonyms: Synonyms { synonyms },
         docs_words: DocsWords { docs_words },
         prefix_documents_cache: PrefixDocumentsCache { prefix_documents_cache },
+        facets: Facets { facets },
        prefix_postings_lists_cache: PrefixPostingsListsCache { prefix_postings_lists_cache },
        updates: Updates { updates },
        updates_results: UpdatesResults { updates_results },

View File

@@ -4,7 +4,7 @@ use heed::types::{OwnedType, CowSlice};
 use heed::Result as ZResult;
 use zerocopy::{AsBytes, FromBytes};
 
-use super::BEU64;
+use super::{BEU64, BEU32};
 use crate::{DocumentId, Highlight};
 use crate::database::MainT;
 
@@ -13,15 +13,15 @@ use crate::database::MainT;
 pub struct PrefixKey {
     prefix: [u8; 4],
     index: BEU64,
-    docid: BEU64,
+    docid: BEU32,
 }
 
 impl PrefixKey {
-    pub fn new(prefix: [u8; 4], index: u64, docid: u64) -> PrefixKey {
+    pub fn new(prefix: [u8; 4], index: u64, docid: u32) -> PrefixKey {
         PrefixKey {
             prefix,
             index: BEU64::new(index),
-            docid: BEU64::new(docid),
+            docid: BEU32::new(docid),
         }
     }
 }
@@ -54,7 +54,7 @@ impl PrefixDocumentsCache {
         prefix: [u8; 4],
     ) -> ZResult<PrefixDocumentsIter<'txn>> {
         let start = PrefixKey::new(prefix, 0, 0);
-        let end = PrefixKey::new(prefix, u64::max_value(), u64::max_value());
+        let end = PrefixKey::new(prefix, u64::max_value(), u32::max_value());
         let iter = self.prefix_documents_cache.range(reader, &(start..=end))?;
         Ok(PrefixDocumentsIter { iter })
    }
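
The move from BEU64 to BEU32 keeps the big-endian convention for key fields, which matters because LMDB orders keys by their raw bytes; only big-endian integers sort bytewise in numeric order, so the start..=end range above scans ids in ascending order. A quick std-only check of that property:

fn main() {
    let (a, b) = (1u32, 256u32);
    assert!(a < b);
    assert!(a.to_be_bytes() < b.to_be_bytes()); // big-endian bytes agree with numeric order
    assert!(a.to_le_bytes() > b.to_le_bytes()); // little-endian bytes do not
}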

View File

@@ -1,7 +1,10 @@
-use heed::types::ByteSlice;
-use crate::database::MainT;
+use std::borrow::Cow;
 use heed::Result as ZResult;
-use std::sync::Arc;
+use heed::types::ByteSlice;
+use crate::database::MainT;
+use crate::{FstSetCow, MResult};

 #[derive(Copy, Clone)]
 pub struct Synonyms {
@@ -9,12 +12,9 @@ pub struct Synonyms {
 }

 impl Synonyms {
-    pub fn put_synonyms(
-        self,
-        writer: &mut heed::RwTxn<MainT>,
-        word: &[u8],
-        synonyms: &fst::Set,
-    ) -> ZResult<()> {
+    pub fn put_synonyms<A>(self, writer: &mut heed::RwTxn<MainT>, word: &[u8], synonyms: &fst::Set<A>) -> ZResult<()>
+    where A: AsRef<[u8]>,
+    {
         let bytes = synonyms.as_fst().as_bytes();
         self.synonyms.put(writer, word, bytes)
     }
@@ -27,15 +27,18 @@ impl Synonyms {
         self.synonyms.clear(writer)
     }

-    pub fn synonyms(self, reader: &heed::RoTxn<MainT>, word: &[u8]) -> ZResult<Option<fst::Set>> {
+    pub(crate) fn synonyms_fst<'txn>(self, reader: &'txn heed::RoTxn<MainT>, word: &[u8]) -> ZResult<FstSetCow<'txn>> {
         match self.synonyms.get(reader, word)? {
-            Some(bytes) => {
-                let len = bytes.len();
-                let bytes = Arc::new(bytes.to_owned());
-                let fst = fst::raw::Fst::from_shared_bytes(bytes, 0, len).unwrap();
-                Ok(Some(fst::Set::from(fst)))
-            }
-            None => Ok(None),
+            Some(bytes) => Ok(fst::Set::new(bytes).unwrap().map_data(Cow::Borrowed).unwrap()),
+            None => Ok(fst::Set::default().map_data(Cow::Owned).unwrap()),
         }
     }
+
+    pub fn synonyms(self, reader: &heed::RoTxn<MainT>, word: &[u8]) -> MResult<Vec<String>> {
+        let synonyms = self
+            .synonyms_fst(&reader, word)?
+            .stream()
+            .into_strs()?;
+        Ok(synonyms)
+    }
 }
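
The store now round-trips synonyms through `fst` generically: `put_synonyms` accepts any set whose backing data derefs to bytes, and the new `synonyms` helper decodes the stored fst straight into strings. A hypothetical round-trip, assuming an open `index` handle and heed transactions as elsewhere in this diff:

```rust
// fst::Set::from_iter requires sorted input: "hello" < "howdy".
let alternatives = fst::Set::from_iter(vec!["hello", "howdy"])?;
index.synonyms.put_synonyms(&mut writer, b"hi", &alternatives)?;

// Reads decode the stored bytes back into plain strings.
let words: Vec<String> = index.synonyms.synonyms(&reader, b"hi")?;
assert_eq!(words, vec!["hello".to_string(), "howdy".to_string()]);
```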


@@ -7,6 +7,8 @@ pub fn apply_clear_all(
     index: &store::Index,
 ) -> MResult<()> {
     index.main.put_words_fst(writer, &fst::Set::default())?;
+    index.main.put_external_docids(writer, &fst::Map::default())?;
+    index.main.put_internal_docids(writer, &sdset::SetBuf::default())?;
     index.main.put_ranked_map(writer, &RankedMap::default())?;
     index.main.put_number_of_documents(writer, |_| 0)?;
     index.documents_fields.clear(writer)?;


@@ -1,14 +1,13 @@
-use heed::Result as ZResult;
 use crate::database::{MainT, UpdateT};
-use crate::store;
+use crate::{store, MResult};
 use crate::update::{next_update_id, Update};

 pub fn apply_customs_update(
     writer: &mut heed::RwTxn<MainT>,
     main_store: store::Main,
     customs: &[u8],
-) -> ZResult<()> {
+) -> MResult<()> {
     main_store.put_customs(writer, customs)
 }
@@ -17,7 +16,7 @@ pub fn push_customs_update(
     updates_store: store::Updates,
     updates_results_store: store::UpdatesResults,
     customs: Vec<u8>,
-) -> ZResult<u64> {
+) -> MResult<u64> {
     let last_update_id = next_update_id(writer, updates_store, updates_results_store)?;
     let update = Update::customs(customs);


@@ -1,15 +1,21 @@
-use std::collections::HashMap;
+use std::borrow::Cow;
+use std::collections::{HashMap, BTreeMap};

 use fst::{set::OpBuilder, SetBuilder};
 use indexmap::IndexMap;
+use meilisearch_schema::{Schema, FieldId};
+use meilisearch_types::DocumentId;
 use sdset::{duo::Union, SetOperation};
-use serde::{Deserialize, Serialize};
+use serde::Deserialize;
+use serde_json::Value;

 use crate::database::{MainT, UpdateT};
 use crate::database::{UpdateEvent, UpdateEventsEmitter};
+use crate::facets;
 use crate::raw_indexer::RawIndexer;
-use crate::serde::{extract_document_id, serialize_value_with_id, Deserializer, Serializer};
-use crate::store;
+use crate::serde::Deserializer;
+use crate::store::{self, DocumentsFields, DocumentsFieldsCounts, DiscoverIds};
+use crate::update::helpers::{index_value, value_to_number, extract_document_id};
 use crate::update::{apply_documents_deletion, compute_short_prefixes, next_update_id, Update};
 use crate::{Error, MResult, RankedMap};
@@ -103,33 +109,109 @@ pub fn push_documents_addition<D: serde::Serialize>(
     Ok(last_update_id)
 }

-pub fn apply_documents_addition<'a, 'b>(
+#[allow(clippy::too_many_arguments)]
+fn index_document<A>(
+    writer: &mut heed::RwTxn<MainT>,
+    documents_fields: DocumentsFields,
+    documents_fields_counts: DocumentsFieldsCounts,
+    ranked_map: &mut RankedMap,
+    indexer: &mut RawIndexer<A>,
+    schema: &Schema,
+    field_id: FieldId,
+    document_id: DocumentId,
+    value: &Value,
+) -> MResult<()>
+where A: AsRef<[u8]>,
+{
+    let serialized = serde_json::to_vec(value)?;
+    documents_fields.put_document_field(writer, document_id, field_id, &serialized)?;
+
+    if let Some(indexed_pos) = schema.is_indexed(field_id) {
+        let number_of_words = index_value(indexer, document_id, *indexed_pos, value);
+        if let Some(number_of_words) = number_of_words {
+            documents_fields_counts.put_document_field_count(
+                writer,
+                document_id,
+                *indexed_pos,
+                number_of_words as u16,
+            )?;
+        }
+    }
+
+    if schema.is_ranked(field_id) {
+        let number = value_to_number(value).unwrap_or_default();
+        ranked_map.insert(document_id, field_id, number);
+    }
+
+    Ok(())
+}
+
+pub fn apply_addition<'a, 'b>(
     writer: &'a mut heed::RwTxn<'b, MainT>,
     index: &store::Index,
-    addition: Vec<IndexMap<String, serde_json::Value>>,
-) -> MResult<()> {
-    let mut documents_additions = HashMap::new();
-
+    new_documents: Vec<IndexMap<String, Value>>,
+    partial: bool
+) -> MResult<()>
+{
     let mut schema = match index.main.schema(writer)? {
         Some(schema) => schema,
         None => return Err(Error::SchemaMissing),
     };

+    // Retrieve the documents ids related structures
+    let external_docids = index.main.external_docids(writer)?;
+    let internal_docids = index.main.internal_docids(writer)?;
+    let mut available_ids = DiscoverIds::new(&internal_docids);
+
     let primary_key = schema.primary_key().ok_or(Error::MissingPrimaryKey)?;

     // 1. store documents ids for future deletion
-    for document in addition {
-        let document_id = match extract_document_id(&primary_key, &document)? {
-            Some(id) => id,
-            None => return Err(Error::MissingDocumentId),
-        };
+    let mut documents_additions = HashMap::new();
+    let mut new_external_docids = BTreeMap::new();
+    let mut new_internal_docids = Vec::with_capacity(new_documents.len());
+
+    for mut document in new_documents {
+        let external_docids_get = |docid: &str| {
+            match (external_docids.get(docid), new_external_docids.get(docid)) {
+                (_, Some(&id))
+                | (Some(id), _) => Some(id as u32),
+                (None, None) => None,
+            }
+        };
+
+        let (internal_docid, external_docid) =
+            extract_document_id(
+                &primary_key,
+                &document,
+                &external_docids_get,
+                &mut available_ids,
+            )?;
+
+        new_external_docids.insert(external_docid, internal_docid.0 as u64);
+        new_internal_docids.push(internal_docid);
+
+        if partial {
+            let mut deserializer = Deserializer {
+                document_id: internal_docid,
+                reader: writer,
+                documents_fields: index.documents_fields,
+                schema: &schema,
+                fields: None,
+            };
+
+            let old_document = Option::<HashMap<String, Value>>::deserialize(&mut deserializer)?;
+            if let Some(old_document) = old_document {
+                for (key, value) in old_document {
+                    document.entry(key).or_insert(value);
+                }
+            }
+        }

-        documents_additions.insert(document_id, document);
+        documents_additions.insert(internal_docid, document);
     }

-    // 2. remove the documents posting lists
+    // 2. remove the documents postings lists
     let number_of_inserted_documents = documents_additions.len();
-    let documents_ids = documents_additions.iter().map(|(id, _)| *id).collect();
+    let documents_ids = new_external_docids.iter().map(|(id, _)| id.clone()).collect();
     apply_documents_deletion(writer, index, documents_ids)?;

     let mut ranked_map = match index.main.ranked_map(writer)? {
@@ -137,26 +219,28 @@ pub fn apply_documents_addition<'a, 'b>(
         None => RankedMap::default(),
     };

-    let stop_words = match index.main.stop_words_fst(writer)? {
-        Some(stop_words) => stop_words,
-        None => fst::Set::default(),
-    };
+    let stop_words = index.main.stop_words_fst(writer)?.map_data(Cow::into_owned)?;

-    // 3. index the documents fields in the stores
     let mut indexer = RawIndexer::new(stop_words);

-    for (document_id, document) in documents_additions {
-        let serializer = Serializer {
-            txn: writer,
-            schema: &mut schema,
-            document_store: index.documents_fields,
-            document_fields_counts: index.documents_fields_counts,
-            indexer: &mut indexer,
-            ranked_map: &mut ranked_map,
-            document_id,
-        };
-
-        document.serialize(serializer)?;
+    // For each document in this update
+    for (document_id, document) in &documents_additions {
+        // For each key-value pair in the document.
+        for (attribute, value) in document {
+            let field_id = schema.insert_and_index(&attribute)?;
+            index_document(
+                writer,
+                index.documents_fields,
+                index.documents_fields_counts,
+                &mut ranked_map,
+                &mut indexer,
+                &schema,
+                field_id,
+                *document_id,
+                &value,
+            )?;
+        }
     }

     write_documents_addition_index(
@@ -169,93 +253,39 @@ pub fn apply_documents_addition<'a, 'b>(

     index.main.put_schema(writer, &schema)?;

+    let new_external_docids = fst::Map::from_iter(new_external_docids.iter().map(|(ext, id)| (ext, *id as u64)))?;
+    let new_internal_docids = sdset::SetBuf::from_dirty(new_internal_docids);
+    index.main.merge_external_docids(writer, &new_external_docids)?;
+    index.main.merge_internal_docids(writer, &new_internal_docids)?;
+
+    // recompute all facet attributes after document update.
+    if let Some(attributes_for_facetting) = index.main.attributes_for_faceting(writer)? {
+        let docids = index.main.internal_docids(writer)?;
+        let facet_map = facets::facet_map_from_docids(writer, index, &docids, attributes_for_facetting.as_ref())?;
+        index.facets.add(writer, facet_map)?;
+    }
+
+    // update is finished; update sorted document id cache with new state
+    let mut document_ids = index.main.internal_docids(writer)?.to_vec();
+    super::cache_document_ids_sorted(writer, &ranked_map, index, &mut document_ids)?;
+
     Ok(())
 }
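
The `partial` flag is the only difference between full and partial additions: on a partial update the stored document is deserialized and its fields backfill whatever the new payload leaves out. A minimal, self-contained sketch of that merge rule over plain serde_json values (hypothetical data, not tied to the store):

```rust
use std::collections::HashMap;

use serde_json::{json, Value};

fn main() {
    let mut document: HashMap<String, Value> = HashMap::new();
    document.insert("id".to_string(), json!(1));
    document.insert("title".to_string(), json!("new title"));

    let mut old_document: HashMap<String, Value> = HashMap::new();
    old_document.insert("id".to_string(), json!(1));
    old_document.insert("description".to_string(), json!("kept from the old version"));

    // Keys present in the update win; missing keys are backfilled.
    for (key, value) in old_document {
        document.entry(key).or_insert(value);
    }

    assert_eq!(document["title"], json!("new title"));
    assert_eq!(document["description"], json!("kept from the old version"));
}
```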
 pub fn apply_documents_partial_addition<'a, 'b>(
     writer: &'a mut heed::RwTxn<'b, MainT>,
     index: &store::Index,
-    addition: Vec<IndexMap<String, serde_json::Value>>,
+    new_documents: Vec<IndexMap<String, Value>>,
 ) -> MResult<()> {
-    let mut documents_additions = HashMap::new();
-
-    let mut schema = match index.main.schema(writer)? {
-        Some(schema) => schema,
-        None => return Err(Error::SchemaMissing),
-    };
-
-    let primary_key = schema.primary_key().ok_or(Error::MissingPrimaryKey)?;
-
-    // 1. store documents ids for future deletion
-    for mut document in addition {
-        let document_id = match extract_document_id(&primary_key, &document)? {
-            Some(id) => id,
-            None => return Err(Error::MissingDocumentId),
-        };
-
-        let mut deserializer = Deserializer {
-            document_id,
-            reader: writer,
-            documents_fields: index.documents_fields,
-            schema: &schema,
-            fields: None,
-        };
-
-        // retrieve the old document and
-        // update the new one with missing keys found in the old one
-        let result = Option::<HashMap<String, serde_json::Value>>::deserialize(&mut deserializer)?;
-        if let Some(old_document) = result {
-            for (key, value) in old_document {
-                document.entry(key).or_insert(value);
-            }
-        }
-
-        documents_additions.insert(document_id, document);
-    }
-
-    // 2. remove the documents posting lists
-    let number_of_inserted_documents = documents_additions.len();
-    let documents_ids = documents_additions.iter().map(|(id, _)| *id).collect();
-    apply_documents_deletion(writer, index, documents_ids)?;
-
-    let mut ranked_map = match index.main.ranked_map(writer)? {
-        Some(ranked_map) => ranked_map,
-        None => RankedMap::default(),
-    };
-
-    let stop_words = match index.main.stop_words_fst(writer)? {
-        Some(stop_words) => stop_words,
-        None => fst::Set::default(),
-    };
-
-    // 3. index the documents fields in the stores
-    let mut indexer = RawIndexer::new(stop_words);
-
-    for (document_id, document) in documents_additions {
-        let serializer = Serializer {
-            txn: writer,
-            schema: &mut schema,
-            document_store: index.documents_fields,
-            document_fields_counts: index.documents_fields_counts,
-            indexer: &mut indexer,
-            ranked_map: &mut ranked_map,
-            document_id,
-        };
-
-        document.serialize(serializer)?;
-    }
-
-    write_documents_addition_index(
-        writer,
-        index,
-        &ranked_map,
-        number_of_inserted_documents,
-        indexer,
-    )?;
-
-    index.main.put_schema(writer, &schema)?;
-
-    Ok(())
+    apply_addition(writer, index, new_documents, true)
+}
+
+pub fn apply_documents_addition<'a, 'b>(
+    writer: &'a mut heed::RwTxn<'b, MainT>,
+    index: &store::Index,
+    new_documents: Vec<IndexMap<String, Value>>,
+) -> MResult<()> {
+    apply_addition(writer, index, new_documents, false)
 }
 pub fn reindex_all_documents(writer: &mut heed::RwTxn<MainT>, index: &store::Index) -> MResult<()> {
@@ -277,36 +307,43 @@ pub fn reindex_all_documents(writer: &mut heed::RwTxn<MainT>, index: &store::Ind
     index.main.put_words_fst(writer, &fst::Set::default())?;
     index.main.put_ranked_map(writer, &ranked_map)?;
     index.main.put_number_of_documents(writer, |_| 0)?;
+    index.facets.clear(writer)?;
     index.postings_lists.clear(writer)?;
     index.docs_words.clear(writer)?;

-    let stop_words = match index.main.stop_words_fst(writer)? {
-        Some(stop_words) => stop_words,
-        None => fst::Set::default(),
-    };
+    let stop_words = index.main
+        .stop_words_fst(writer)?
+        .map_data(Cow::into_owned)
+        .unwrap();

     let number_of_inserted_documents = documents_ids_to_reindex.len();
     let mut indexer = RawIndexer::new(stop_words);
     let mut ram_store = HashMap::new();

-    for document_id in documents_ids_to_reindex {
-        for result in index.documents_fields.document_fields(writer, document_id)? {
+    if let Some(ref attributes_for_facetting) = index.main.attributes_for_faceting(writer)? {
+        let facet_map = facets::facet_map_from_docids(writer, &index, &documents_ids_to_reindex, &attributes_for_facetting)?;
+        index.facets.add(writer, facet_map)?;
+    }
+    // ^-- https://github.com/meilisearch/MeiliSearch/pull/631#issuecomment-626624470 --v
+    for document_id in &documents_ids_to_reindex {
+        for result in index.documents_fields.document_fields(writer, *document_id)? {
             let (field_id, bytes) = result?;
-            let value: serde_json::Value = serde_json::from_slice(bytes)?;
+            let value: Value = serde_json::from_slice(bytes)?;
             ram_store.insert((document_id, field_id), value);
         }

-        for ((docid, field_id), value) in ram_store.drain() {
-            serialize_value_with_id(
+        // For each key-value pair in the document.
+        for ((document_id, field_id), value) in ram_store.drain() {
+            index_document(
                 writer,
-                field_id,
-                &schema,
-                docid,
                 index.documents_fields,
                 index.documents_fields_counts,
-                &mut indexer,
                 &mut ranked_map,
-                &value
+                &mut indexer,
+                &schema,
+                field_id,
+                *document_id,
+                &value,
             )?;
         }
     }
@@ -322,16 +359,29 @@ pub fn reindex_all_documents(writer: &mut heed::RwTxn<MainT>, index: &store::Ind

     index.main.put_schema(writer, &schema)?;

+    // recompute all facet attributes after document update.
+    if let Some(attributes_for_facetting) = index.main.attributes_for_faceting(writer)? {
+        let docids = index.main.internal_docids(writer)?;
+        let facet_map = facets::facet_map_from_docids(writer, index, &docids, attributes_for_facetting.as_ref())?;
+        index.facets.add(writer, facet_map)?;
+    }
+
+    // update is finished; update sorted document id cache with new state
+    let mut document_ids = index.main.internal_docids(writer)?.to_vec();
+    super::cache_document_ids_sorted(writer, &ranked_map, index, &mut document_ids)?;
+
     Ok(())
 }
-pub fn write_documents_addition_index(
+pub fn write_documents_addition_index<A>(
     writer: &mut heed::RwTxn<MainT>,
     index: &store::Index,
     ranked_map: &RankedMap,
     number_of_inserted_documents: usize,
-    indexer: RawIndexer,
-) -> MResult<()> {
+    indexer: RawIndexer<A>,
+) -> MResult<()>
+where A: AsRef<[u8]>,
+{
     let indexed = indexer.build();
     let mut delta_words_builder = SetBuilder::memory();
@@ -350,33 +400,27 @@ pub fn write_documents_addition_index(
         index.docs_words.put_doc_words(writer, id, &words)?;
     }

-    let delta_words = delta_words_builder
-        .into_inner()
-        .and_then(fst::Set::from_bytes)
-        .unwrap();
+    let delta_words = delta_words_builder.into_set();

-    let words = match index.main.words_fst(writer)? {
-        Some(words) => {
-            let op = OpBuilder::new()
-                .add(words.stream())
-                .add(delta_words.stream())
-                .r#union();
-
-            let mut words_builder = SetBuilder::memory();
-            words_builder.extend_stream(op).unwrap();
-            words_builder
-                .into_inner()
-                .and_then(fst::Set::from_bytes)
-                .unwrap()
-        }
-        None => delta_words,
+    let words_fst = index.main.words_fst(writer)?;
+    let words = if !words_fst.is_empty() {
+        let op = OpBuilder::new()
+            .add(words_fst.stream())
+            .add(delta_words.stream())
+            .r#union();
+
+        let mut words_builder = SetBuilder::memory();
+        words_builder.extend_stream(op).unwrap();
+        words_builder.into_set()
+    } else {
+        delta_words
     };

     index.main.put_words_fst(writer, &words)?;
     index.main.put_ranked_map(writer, ranked_map)?;
     index.main.put_number_of_documents(writer, |old| old + number_of_inserted_documents as u64)?;

-    compute_short_prefixes(writer, index)?;
+    compute_short_prefixes(writer, &words, index)?;

     Ok(())
 }
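
Since `words_fst` can no longer be absent, the merge with the update's delta is guarded by an emptiness check instead of a match on `Option`. A self-contained sketch of that union against the fst 0.4 API used throughout this diff:

```rust
use fst::{set::OpBuilder, Set, SetBuilder};

// Merge the existing words fst with the delta built for one update.
fn merge_words(words_fst: &Set<Vec<u8>>, delta_words: Set<Vec<u8>>) -> Result<Set<Vec<u8>>, fst::Error> {
    if !words_fst.is_empty() {
        let op = OpBuilder::new()
            .add(words_fst.stream())
            .add(delta_words.stream())
            .r#union();

        let mut builder = SetBuilder::memory();
        builder.extend_stream(op)?;
        Ok(builder.into_set())
    } else {
        // nothing indexed yet: the delta becomes the whole words fst
        Ok(delta_words)
    }
}
```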


@@ -1,21 +1,20 @@
 use std::collections::{BTreeSet, HashMap, HashSet};

 use fst::{SetBuilder, Streamer};
-use meilisearch_schema::Schema;
 use sdset::{duo::DifferenceByKey, SetBuf, SetOperation};

 use crate::database::{MainT, UpdateT};
 use crate::database::{UpdateEvent, UpdateEventsEmitter};
-use crate::serde::extract_document_id;
+use crate::facets;
 use crate::store;
 use crate::update::{next_update_id, compute_short_prefixes, Update};
-use crate::{DocumentId, Error, MResult, RankedMap};
+use crate::{DocumentId, Error, MResult, RankedMap, MainWriter, Index};

 pub struct DocumentsDeletion {
     updates_store: store::Updates,
     updates_results_store: store::UpdatesResults,
     updates_notifier: UpdateEventsEmitter,
-    documents: Vec<DocumentId>,
+    external_docids: Vec<String>,
 }

 impl DocumentsDeletion {
@@ -28,27 +27,12 @@ impl DocumentsDeletion {
             updates_store,
             updates_results_store,
             updates_notifier,
-            documents: Vec::new(),
+            external_docids: Vec::new(),
         }
     }

-    pub fn delete_document_by_id(&mut self, document_id: DocumentId) {
-        self.documents.push(document_id);
-    }
-
-    pub fn delete_document<D>(&mut self, schema: &Schema, document: D) -> MResult<()>
-    where
-        D: serde::Serialize,
-    {
-        let primary_key = schema.primary_key().ok_or(Error::MissingPrimaryKey)?;
-        let document_id = match extract_document_id(&primary_key, &document)? {
-            Some(id) => id,
-            None => return Err(Error::MissingDocumentId),
-        };
-
-        self.delete_document_by_id(document_id);
-        Ok(())
+    pub fn delete_document_by_external_docid(&mut self, document_id: String) {
+        self.external_docids.push(document_id);
     }

     pub fn finalize(self, writer: &mut heed::RwTxn<UpdateT>) -> MResult<u64> {
@@ -57,15 +41,15 @@ impl DocumentsDeletion {
             writer,
             self.updates_store,
             self.updates_results_store,
-            self.documents,
+            self.external_docids,
         )?;
         Ok(update_id)
     }
 }

-impl Extend<DocumentId> for DocumentsDeletion {
-    fn extend<T: IntoIterator<Item = DocumentId>>(&mut self, iter: T) {
-        self.documents.extend(iter)
+impl Extend<String> for DocumentsDeletion {
+    fn extend<T: IntoIterator<Item=String>>(&mut self, iter: T) {
+        self.external_docids.extend(iter)
     }
 }

@@ -73,11 +57,11 @@ pub fn push_documents_deletion(
     writer: &mut heed::RwTxn<UpdateT>,
     updates_store: store::Updates,
     updates_results_store: store::UpdatesResults,
-    deletion: Vec<DocumentId>,
+    external_docids: Vec<String>,
 ) -> MResult<u64> {
     let last_update_id = next_update_id(writer, updates_store, updates_results_store)?;

-    let update = Update::documents_deletion(deletion);
+    let update = Update::documents_deletion(external_docids);
     updates_store.put_update(writer, last_update_id, &update)?;

     Ok(last_update_id)
@@ -86,9 +70,23 @@ pub fn push_documents_deletion(
 pub fn apply_documents_deletion(
     writer: &mut heed::RwTxn<MainT>,
     index: &store::Index,
-    deletion: Vec<DocumentId>,
-) -> MResult<()> {
-    let idset = SetBuf::from_dirty(deletion);
+    external_docids: Vec<String>,
+) -> MResult<()>
+{
+    let (external_docids, internal_docids) = {
+        let new_external_docids = SetBuf::from_dirty(external_docids);
+        let mut internal_docids = Vec::new();
+
+        let old_external_docids = index.main.external_docids(writer)?;
+        for external_docid in new_external_docids.as_slice() {
+            if let Some(id) = old_external_docids.get(external_docid) {
+                internal_docids.push(DocumentId(id as u32));
+            }
+        }
+
+        let new_external_docids = fst::Map::from_iter(new_external_docids.into_iter().map(|k| (k, 0))).unwrap();
+        (new_external_docids, SetBuf::from_dirty(internal_docids))
+    };

     let schema = match index.main.schema(writer)? {
         Some(schema) => schema,
@@ -100,17 +98,24 @@ pub fn apply_documents_deletion(
         None => RankedMap::default(),
     };

+    // facet filters deletion
+    if let Some(attributes_for_facetting) = index.main.attributes_for_faceting(writer)? {
+        let facet_map = facets::facet_map_from_docids(writer, &index, &internal_docids, &attributes_for_facetting)?;
+        index.facets.remove(writer, facet_map)?;
+    }
+
     // collect the ranked attributes according to the schema
     let ranked_fields = schema.ranked();

     let mut words_document_ids = HashMap::new();
-    for id in idset {
+    for id in internal_docids.iter().cloned() {
         // remove all the ranked attributes from the ranked_map
         for ranked_attr in ranked_fields {
             ranked_map.remove(id, *ranked_attr);
         }

-        if let Some(words) = index.docs_words.doc_words(writer, id)? {
+        let words = index.docs_words.doc_words(writer, id)?;
+        if !words.is_empty() {
             let mut stream = words.stream();
             while let Some(word) = stream.next() {
                 let word = word.to_vec();
@@ -148,33 +153,55 @@ pub fn apply_documents_deletion(
     }

     let deleted_documents_len = deleted_documents.len() as u64;
-    for id in deleted_documents {
-        index.docs_words.del_doc_words(writer, id)?;
+    for id in &deleted_documents {
+        index.docs_words.del_doc_words(writer, *id)?;
     }

     let removed_words = fst::Set::from_iter(removed_words).unwrap();
-    let words = match index.main.words_fst(writer)? {
-        Some(words_set) => {
-            let op = fst::set::OpBuilder::new()
-                .add(words_set.stream())
-                .add(removed_words.stream())
-                .difference();
-
-            let mut words_builder = SetBuilder::memory();
-            words_builder.extend_stream(op).unwrap();
-            words_builder
-                .into_inner()
-                .and_then(fst::Set::from_bytes)
-                .unwrap()
-        }
-        None => fst::Set::default(),
+    let words = {
+        let words_set = index.main.words_fst(writer)?;
+        let op = fst::set::OpBuilder::new()
+            .add(words_set.stream())
+            .add(removed_words.stream())
+            .difference();
+
+        let mut words_builder = SetBuilder::memory();
+        words_builder.extend_stream(op).unwrap();
+        words_builder.into_set()
     };

     index.main.put_words_fst(writer, &words)?;
     index.main.put_ranked_map(writer, &ranked_map)?;
     index.main.put_number_of_documents(writer, |old| old - deleted_documents_len)?;

-    compute_short_prefixes(writer, index)?;
+    // We apply the changes to the user and internal ids
+    index.main.remove_external_docids(writer, &external_docids)?;
+    index.main.remove_internal_docids(writer, &internal_docids)?;
+
+    compute_short_prefixes(writer, &words, index)?;
+
+    // update is finished; update sorted document id cache with new state
+    document_cache_remove_deleted(writer, index, &ranked_map, &deleted_documents)?;

     Ok(())
 }
+
+/// Rebuilds the document id cache, either by removing the deleted documents
+/// from the existing cache or by generating a new one from the documents in the store.
+fn document_cache_remove_deleted(writer: &mut MainWriter, index: &Index, ranked_map: &RankedMap, documents_to_delete: &HashSet<DocumentId>) -> MResult<()> {
+    let new_cache = match index.main.sorted_document_ids_cache(writer)? {
+        // only keep documents that are not in the list of deleted documents. Order is preserved,
+        // no need to re-sort
+        Some(old_cache) => {
+            old_cache.iter().filter(|docid| !documents_to_delete.contains(docid)).cloned().collect::<Vec<_>>()
+        }
+        // couldn't find cached documents, try building a new cache from documents in store
+        None => {
+            let mut document_ids = index.main.internal_docids(writer)?.to_vec();
+            super::cache_document_ids_sorted(writer, ranked_map, index, &mut document_ids)?;
+            document_ids
+        }
+    };
+    index.main.put_sorted_document_ids_cache(writer, &new_cache)?;
+    Ok(())
+}
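
Callers now queue deletions by the user-facing id; the update resolves them to internal ids lazily, so deleting an unknown external id is a silent no-op rather than an error. A hypothetical usage, assuming the index handle exposes a `documents_deletion` builder and an update-store write transaction is available:

```rust
let mut deletion = index.documents_deletion();
deletion.delete_document_by_external_docid("movie-42".to_string());
deletion.extend(vec!["movie-7".to_string(), "movie-13".to_string()]);
let update_id = deletion.finalize(&mut update_writer)?;
```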


@@ -0,0 +1,143 @@
use std::fmt::Write as _;
use indexmap::IndexMap;
use meilisearch_schema::IndexedPos;
use meilisearch_types::DocumentId;
use ordered_float::OrderedFloat;
use serde_json::Value;
use crate::Number;
use crate::raw_indexer::RawIndexer;
use crate::serde::SerializerError;
use crate::store::DiscoverIds;
/// Returns the number of words indexed or `None` if the type is unindexable.
pub fn index_value<A>(
indexer: &mut RawIndexer<A>,
document_id: DocumentId,
indexed_pos: IndexedPos,
value: &Value,
) -> Option<usize>
where A: AsRef<[u8]>,
{
match value {
Value::Null => None,
Value::Bool(boolean) => {
let text = boolean.to_string();
let number_of_words = indexer.index_text(document_id, indexed_pos, &text);
Some(number_of_words)
},
Value::Number(number) => {
let text = number.to_string();
Some(indexer.index_text(document_id, indexed_pos, &text))
},
Value::String(string) => {
Some(indexer.index_text(document_id, indexed_pos, &string))
},
Value::Array(_) => {
let text = value_to_string(value);
Some(indexer.index_text(document_id, indexed_pos, &text))
},
Value::Object(_) => {
let text = value_to_string(value);
Some(indexer.index_text(document_id, indexed_pos, &text))
},
}
}
/// Transforms the JSON Value type into a String.
pub fn value_to_string(value: &Value) -> String {
fn internal_value_to_string(string: &mut String, value: &Value) {
match value {
Value::Null => (),
Value::Bool(boolean) => { let _ = write!(string, "{}", &boolean); },
Value::Number(number) => { let _ = write!(string, "{}", &number); },
Value::String(text) => string.push_str(&text),
Value::Array(array) => {
for value in array {
internal_value_to_string(string, value);
let _ = string.write_str(". ");
}
},
Value::Object(object) => {
for (key, value) in object {
string.push_str(key);
let _ = string.write_str(". ");
internal_value_to_string(string, value);
let _ = string.write_str(". ");
}
},
}
}
let mut string = String::new();
internal_value_to_string(&mut string, value);
string
}
/// Transforms the JSON Value type into a Number.
pub fn value_to_number(value: &Value) -> Option<Number> {
use std::str::FromStr;
match value {
Value::Null => None,
Value::Bool(boolean) => Some(Number::Unsigned(*boolean as u64)),
Value::Number(number) => {
match (number.as_i64(), number.as_u64(), number.as_f64()) {
(Some(n), _, _) => Some(Number::Signed(n)),
(_, Some(n), _) => Some(Number::Unsigned(n)),
(_, _, Some(n)) => Some(Number::Float(OrderedFloat(n))),
(None, None, None) => None,
}
},
Value::String(string) => Number::from_str(string).ok(),
Value::Array(_array) => None,
Value::Object(_object) => None,
}
}
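
A few hypothetical checks illustrating those coercions; note that `as_i64` is tried first, so plain integers come back as `Signed` (this assumes `Number` implements `Debug` and `PartialEq`, as `assert_eq!` needs):

```rust
use std::str::FromStr;

use serde_json::json;

assert_eq!(value_to_number(&json!(true)), Some(Number::Unsigned(1)));
assert_eq!(value_to_number(&json!(-7)), Some(Number::Signed(-7)));
assert_eq!(value_to_number(&json!(2.5)), Some(Number::Float(OrderedFloat(2.5))));
assert_eq!(value_to_number(&json!("42")), Number::from_str("42").ok());
assert_eq!(value_to_number(&json!([1, 2, 3])), None); // arrays and objects are not ranked
```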
/// Validates that a string is a correct document id representation and returns
/// the corresponding internal id, or generates a new one; this is how document ids are produced.
pub fn discover_document_id<F>(
docid: &str,
external_docids_get: F,
available_docids: &mut DiscoverIds<'_>,
) -> Result<DocumentId, SerializerError>
where
F: FnOnce(&str) -> Option<u32>
{
if docid.chars().all(|x| x.is_ascii_alphanumeric() || x == '-' || x == '_') {
match external_docids_get(docid) {
Some(id) => Ok(DocumentId(id)),
None => {
let internal_id = available_docids.next().expect("no more ids available");
Ok(internal_id)
},
}
} else {
Err(SerializerError::InvalidDocumentIdFormat)
}
}
/// Extracts and validates the document id of a document.
pub fn extract_document_id<F>(
primary_key: &str,
document: &IndexMap<String, Value>,
external_docids_get: F,
available_docids: &mut DiscoverIds<'_>,
) -> Result<(DocumentId, String), SerializerError>
where
F: FnOnce(&str) -> Option<u32>
{
match document.get(primary_key) {
Some(value) => {
let docid = match value {
Value::Number(number) => number.to_string(),
Value::String(string) => string.clone(),
_ => return Err(SerializerError::InvalidDocumentIdFormat),
};
discover_document_id(&docid, external_docids_get, available_docids).map(|id| (id, docid))
}
None => Err(SerializerError::DocumentIdNotFound),
}
}
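
A hypothetical call site tying the two helpers together, mirroring how `apply_addition` uses them above (`external_docids` being the stored fst map and `internal_docids` the stored id set):

```rust
let mut available_ids = DiscoverIds::new(&internal_docids);
let lookup = |docid: &str| external_docids.get(docid).map(|id| id as u32);

// Returns the resolved (or freshly minted) internal id together with the
// external id as a string; fails on characters outside [a-zA-Z0-9_-].
let (internal_id, external_id) =
    extract_document_id("id", &document, lookup, &mut available_ids)?;
```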


@@ -3,13 +3,13 @@ mod customs_update;
 mod documents_addition;
 mod documents_deletion;
 mod settings_update;
+mod helpers;

 pub use self::clear_all::{apply_clear_all, push_clear_all};
 pub use self::customs_update::{apply_customs_update, push_customs_update};
-pub use self::documents_addition::{
-    apply_documents_addition, apply_documents_partial_addition, DocumentsAddition,
-};
+pub use self::documents_addition::{apply_documents_addition, apply_documents_partial_addition, DocumentsAddition};
 pub use self::documents_deletion::{apply_documents_deletion, DocumentsDeletion};
+pub use self::helpers::{index_value, value_to_string, value_to_number, discover_document_id, extract_document_id};
 pub use self::settings_update::{apply_settings_update, push_settings_update};

 use std::cmp;
@@ -22,8 +22,12 @@ use indexmap::IndexMap;
 use log::debug;
 use sdset::Set;
 use serde::{Deserialize, Serialize};
+use serde_json::Value;

-use crate::{store, DocumentId, MResult};
+use meilisearch_error::ErrorCode;
+use meilisearch_types::DocumentId;
+
+use crate::{store, MResult, RankedMap};
 use crate::database::{MainT, UpdateT};
 use crate::settings::SettingsUpdate;
@@ -48,21 +52,21 @@ impl Update {
         }
     }

-    fn documents_addition(data: Vec<IndexMap<String, serde_json::Value>>) -> Update {
+    fn documents_addition(documents: Vec<IndexMap<String, Value>>) -> Update {
         Update {
-            data: UpdateData::DocumentsAddition(data),
+            data: UpdateData::DocumentsAddition(documents),
             enqueued_at: Utc::now(),
         }
     }

-    fn documents_partial(data: Vec<IndexMap<String, serde_json::Value>>) -> Update {
+    fn documents_partial(documents: Vec<IndexMap<String, Value>>) -> Update {
         Update {
-            data: UpdateData::DocumentsPartial(data),
+            data: UpdateData::DocumentsPartial(documents),
             enqueued_at: Utc::now(),
         }
     }

-    fn documents_deletion(data: Vec<DocumentId>) -> Update {
+    fn documents_deletion(data: Vec<String>) -> Update {
         Update {
             data: UpdateData::DocumentsDeletion(data),
             enqueued_at: Utc::now(),
@@ -71,7 +75,7 @@ impl Update {

     fn settings(data: SettingsUpdate) -> Update {
         Update {
-            data: UpdateData::Settings(data),
+            data: UpdateData::Settings(Box::new(data)),
             enqueued_at: Utc::now(),
         }
     }
@@ -81,10 +85,10 @@ impl Update {
 pub enum UpdateData {
     ClearAll,
     Customs(Vec<u8>),
-    DocumentsAddition(Vec<IndexMap<String, serde_json::Value>>),
-    DocumentsPartial(Vec<IndexMap<String, serde_json::Value>>),
-    DocumentsDeletion(Vec<DocumentId>),
-    Settings(SettingsUpdate)
+    DocumentsAddition(Vec<IndexMap<String, Value>>),
+    DocumentsPartial(Vec<IndexMap<String, Value>>),
+    DocumentsDeletion(Vec<String>),
+    Settings(Box<SettingsUpdate>)
 }

 impl UpdateData {
@@ -116,7 +120,7 @@ pub enum UpdateType {
     DocumentsAddition { number: usize },
     DocumentsPartial { number: usize },
     DocumentsDeletion { number: usize },
-    Settings { settings: SettingsUpdate },
+    Settings { settings: Box<SettingsUpdate> },
 }

 #[derive(Debug, Clone, Serialize, Deserialize)]
@@ -127,6 +131,12 @@ pub struct ProcessedUpdateResult {
     pub update_type: UpdateType,
     #[serde(skip_serializing_if = "Option::is_none")]
     pub error: Option<String>,
+    #[serde(skip_serializing_if = "Option::is_none")]
+    pub error_type: Option<String>,
+    #[serde(skip_serializing_if = "Option::is_none")]
+    pub error_code: Option<String>,
+    #[serde(skip_serializing_if = "Option::is_none")]
+    pub error_link: Option<String>,
     pub duration: f64, // in seconds
     pub enqueued_at: DateTime<Utc>,
     pub processed_at: DateTime<Utc>,
@@ -272,7 +282,7 @@ pub fn update_task<'a, 'b>(
             let result = apply_settings_update(
                 writer,
                 index,
-                settings,
+                *settings,
             );

             (update_type, result, start.elapsed())
@@ -287,7 +297,10 @@ pub fn update_task<'a, 'b>(
     let status = ProcessedUpdateResult {
         update_id,
         update_type,
-        error: result.map_err(|e| e.to_string()).err(),
+        error: result.as_ref().map_err(|e| e.to_string()).err(),
+        error_code: result.as_ref().map_err(|e| e.error_name()).err(),
+        error_type: result.as_ref().map_err(|e| e.error_type()).err(),
+        error_link: result.as_ref().map_err(|e| e.error_url()).err(),
         duration: duration.as_secs_f64(),
         enqueued_at,
         processed_at: Utc::now(),
@@ -296,13 +309,13 @@ pub fn update_task<'a, 'b>(
     Ok(status)
 }

-fn compute_short_prefixes(writer: &mut heed::RwTxn<MainT>, index: &store::Index) -> MResult<()> {
-    // retrieve the words fst to compute all those prefixes
-    let words_fst = match index.main.words_fst(writer)? {
-        Some(fst) => fst,
-        None => return Ok(()),
-    };
+fn compute_short_prefixes<A>(
+    writer: &mut heed::RwTxn<MainT>,
+    words_fst: &fst::Set<A>,
+    index: &store::Index,
+) -> MResult<()>
+where A: AsRef<[u8]>,
+{
     // clear the prefixes
     let pplc_store = index.prefix_postings_lists_cache;
     pplc_store.clear(writer)?;
@@ -359,3 +372,13 @@ fn compute_short_prefixes(writer: &mut heed::RwTxn<MainT>, index: &store::Index)
Ok(()) Ok(())
} }
fn cache_document_ids_sorted(
writer: &mut heed::RwTxn<MainT>,
ranked_map: &RankedMap,
index: &store::Index,
document_ids: &mut [DocumentId],
) -> MResult<()> {
crate::bucket_sort::placeholder_document_sort(document_ids, index, writer, ranked_map)?;
index.main.put_sorted_document_ids_cache(writer, &document_ids)
}


@@ -46,12 +46,6 @@ pub fn apply_settings_update(
         UpdateState::Update(v) => {
             let ranked_field: Vec<&str> = v.iter().filter_map(RankingRule::field).collect();
             schema.update_ranked(&ranked_field)?;
-            for name in ranked_field {
-                if schema.accept_new_fields() {
-                    schema.set_indexed(name.as_ref())?;
-                    schema.set_displayed(name.as_ref())?;
-                }
-            }
             index.main.put_ranking_rules(writer, &v)?;
             must_reindex = true;
         },
@@ -65,7 +59,8 @@ pub fn apply_settings_update(
     match settings.distinct_attribute {
         UpdateState::Update(v) => {
-            index.main.put_distinct_attribute(writer, &v)?;
+            let field_id = schema.insert(&v)?;
+            index.main.put_distinct_attribute(writer, field_id)?;
         },
         UpdateState::Clear => {
             index.main.delete_distinct_attribute(writer)?;
@@ -73,19 +68,13 @@ pub fn apply_settings_update(
         UpdateState::Nothing => (),
     }

-    match settings.accept_new_fields {
-        UpdateState::Update(v) => {
-            schema.set_accept_new_fields(v);
-        },
-        UpdateState::Clear => {
-            schema.set_accept_new_fields(true);
-        },
-        UpdateState::Nothing => (),
-    }
-
     match settings.searchable_attributes.clone() {
         UpdateState::Update(v) => {
-            schema.update_indexed(v)?;
+            if v.iter().any(|e| e == "*") || v.is_empty() {
+                schema.set_all_fields_as_indexed();
+            } else {
+                schema.update_indexed(v)?;
+            }
             must_reindex = true;
         },
         UpdateState::Clear => {
@@ -95,13 +84,31 @@ pub fn apply_settings_update(
         UpdateState::Nothing => (),
     }

     match settings.displayed_attributes.clone() {
-        UpdateState::Update(v) => schema.update_displayed(v)?,
+        UpdateState::Update(v) => {
+            if v.contains("*") || v.is_empty() {
+                schema.set_all_fields_as_displayed();
+            } else {
+                schema.update_displayed(v)?
+            }
+        },
         UpdateState::Clear => {
             schema.set_all_fields_as_displayed();
         },
         UpdateState::Nothing => (),
     }

+    match settings.attributes_for_faceting {
+        UpdateState::Update(attrs) => {
+            apply_attributes_for_faceting_update(writer, index, &mut schema, &attrs)?;
+            must_reindex = true;
+        },
+        UpdateState::Clear => {
+            index.main.delete_attributes_for_faceting(writer)?;
+            index.facets.clear(writer)?;
+        },
+        UpdateState::Nothing => (),
+    }
+
     index.main.put_schema(writer, &schema)?;

     match settings.stop_words {
@@ -131,85 +138,104 @@ pub fn apply_settings_update(
     Ok(())
 }

+fn apply_attributes_for_faceting_update(
+    writer: &mut heed::RwTxn<MainT>,
+    index: &store::Index,
+    schema: &mut Schema,
+    attributes: &[String]
+) -> MResult<()> {
+    let mut attribute_ids = Vec::new();
+    for name in attributes {
+        attribute_ids.push(schema.insert(name)?);
+    }
+    let attributes_for_faceting = SetBuf::from_dirty(attribute_ids);
+    index.main.put_attributes_for_faceting(writer, &attributes_for_faceting)?;
+    Ok(())
+}
+
 pub fn apply_stop_words_update(
     writer: &mut heed::RwTxn<MainT>,
     index: &store::Index,
     stop_words: BTreeSet<String>,
-) -> MResult<bool> {
+) -> MResult<bool>
+{
+    let mut must_reindex = false;
+
     let old_stop_words: BTreeSet<String> = index.main
         .stop_words_fst(writer)?
-        .unwrap_or_default()
         .stream()
-        .into_strs().unwrap().into_iter().collect();
+        .into_strs()?
+        .into_iter()
+        .collect();

     let deletion: BTreeSet<String> = old_stop_words.difference(&stop_words).cloned().collect();
     let addition: BTreeSet<String> = stop_words.difference(&old_stop_words).cloned().collect();

     if !addition.is_empty() {
-        apply_stop_words_addition(
-            writer,
-            index,
-            addition
-        )?;
+        apply_stop_words_addition(writer, index, addition)?;
     }

     if !deletion.is_empty() {
-        apply_stop_words_deletion(
-            writer,
-            index,
-            deletion
-        )?;
-        return Ok(true)
+        must_reindex = true;
+        apply_stop_words_deletion(writer, index, deletion)?;
     }

-    let stop_words_fst = fst::Set::from_iter(stop_words)?;
-    index.main.put_words_fst(writer, &stop_words_fst)?;
-    Ok(false)
+    let words_fst = index.main.words_fst(writer)?;
+    if !words_fst.is_empty() {
+        let stop_words = fst::Set::from_iter(stop_words)?;
+        let op = OpBuilder::new()
+            .add(&words_fst)
+            .add(&stop_words)
+            .difference();
+
+        let mut builder = fst::SetBuilder::memory();
+        builder.extend_stream(op)?;
+        let words_fst = builder.into_set();
+
+        index.main.put_words_fst(writer, &words_fst)?;
+        index.main.put_stop_words_fst(writer, &stop_words)?;
+    }
+
+    Ok(must_reindex)
 }
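
A self-contained sketch of the words-minus-stop-words difference performed above, runnable against fst 0.4 alone:

```rust
use fst::{set::OpBuilder, Set, SetBuilder};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // from_iter requires sorted input
    let words = Set::from_iter(vec!["hello", "the", "world"])?;
    let stop_words = Set::from_iter(vec!["the"])?;

    let op = OpBuilder::new()
        .add(&words)
        .add(&stop_words)
        .difference();

    let mut builder = SetBuilder::memory();
    builder.extend_stream(op)?;
    let remaining = builder.into_set();

    // the stop word is gone from the searchable words
    let keys = remaining.stream().into_strs()?;
    assert_eq!(keys, vec!["hello".to_string(), "world".to_string()]);
    Ok(())
}
```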
 fn apply_stop_words_addition(
     writer: &mut heed::RwTxn<MainT>,
     index: &store::Index,
     addition: BTreeSet<String>,
-) -> MResult<()> {
+) -> MResult<()>
+{
     let main_store = index.main;
     let postings_lists_store = index.postings_lists;

     let mut stop_words_builder = SetBuilder::memory();

     for word in addition {
-        stop_words_builder.insert(&word).unwrap();
+        stop_words_builder.insert(&word)?;
         // we remove every posting list associated to a new stop word
         postings_lists_store.del_postings_list(writer, word.as_bytes())?;
     }

     // create the new delta stop words fst
-    let delta_stop_words = stop_words_builder
-        .into_inner()
-        .and_then(fst::Set::from_bytes)
-        .unwrap();
+    let delta_stop_words = stop_words_builder.into_set();

     // we also need to remove all the stop words from the main fst
-    if let Some(word_fst) = main_store.words_fst(writer)? {
+    let words_fst = main_store.words_fst(writer)?;
+    if !words_fst.is_empty() {
         let op = OpBuilder::new()
-            .add(&word_fst)
+            .add(&words_fst)
             .add(&delta_stop_words)
             .difference();

         let mut word_fst_builder = SetBuilder::memory();
-        word_fst_builder.extend_stream(op).unwrap();
-        let word_fst = word_fst_builder
-            .into_inner()
-            .and_then(fst::Set::from_bytes)
-            .unwrap();
+        word_fst_builder.extend_stream(op)?;
+        let word_fst = word_fst_builder.into_set();

         main_store.put_words_fst(writer, &word_fst)?;
     }

     // now we add all of these stop words from the main store
-    let stop_words_fst = main_store.stop_words_fst(writer)?.unwrap_or_default();
+    let stop_words_fst = main_store.stop_words_fst(writer)?;

     let op = OpBuilder::new()
         .add(&stop_words_fst)
@@ -217,11 +243,8 @@ fn apply_stop_words_addition(
         .r#union();

     let mut stop_words_builder = SetBuilder::memory();
-    stop_words_builder.extend_stream(op).unwrap();
-    let stop_words_fst = stop_words_builder
-        .into_inner()
-        .and_then(fst::Set::from_bytes)
-        .unwrap();
+    stop_words_builder.extend_stream(op)?;
+    let stop_words_fst = stop_words_builder.into_set();

     main_store.put_stop_words_fst(writer, &stop_words_fst)?;
@@ -237,17 +260,14 @@ fn apply_stop_words_deletion(
     let mut stop_words_builder = SetBuilder::memory();

     for word in deletion {
-        stop_words_builder.insert(&word).unwrap();
+        stop_words_builder.insert(&word)?;
     }

     // create the new delta stop words fst
-    let delta_stop_words = stop_words_builder
-        .into_inner()
-        .and_then(fst::Set::from_bytes)
-        .unwrap();
+    let delta_stop_words = stop_words_builder.into_set();

     // now we delete all of these stop words from the main store
-    let stop_words_fst = index.main.stop_words_fst(writer)?.unwrap_or_default();
+    let stop_words_fst = index.main.stop_words_fst(writer)?;

     let op = OpBuilder::new()
         .add(&stop_words_fst)
@@ -255,11 +275,8 @@ fn apply_stop_words_deletion(
         .difference();

     let mut stop_words_builder = SetBuilder::memory();
-    stop_words_builder.extend_stream(op).unwrap();
-    let stop_words_fst = stop_words_builder
-        .into_inner()
-        .and_then(fst::Set::from_bytes)
-        .unwrap();
+    stop_words_builder.extend_stream(op)?;
+    let stop_words_fst = stop_words_builder.into_set();

     Ok(index.main.put_stop_words_fst(writer, &stop_words_fst)?)
 }
@@ -276,23 +293,19 @@ pub fn apply_synonyms_update(
     let mut synonyms_builder = SetBuilder::memory();
     synonyms_store.clear(writer)?;

     for (word, alternatives) in synonyms.clone() {
-        synonyms_builder.insert(&word).unwrap();
+        synonyms_builder.insert(&word)?;

         let alternatives = {
             let alternatives = SetBuf::from_dirty(alternatives);
             let mut alternatives_builder = SetBuilder::memory();
-            alternatives_builder.extend_iter(alternatives).unwrap();
-            let bytes = alternatives_builder.into_inner().unwrap();
-            fst::Set::from_bytes(bytes).unwrap()
+            alternatives_builder.extend_iter(alternatives)?;
+            alternatives_builder.into_set()
         };

         synonyms_store.put_synonyms(writer, word.as_bytes(), &alternatives)?;
     }

-    let synonyms_set = synonyms_builder
-        .into_inner()
-        .and_then(fst::Set::from_bytes)
-        .unwrap();
+    let synonyms_set = synonyms_builder.into_set();

     main_store.put_synonyms_fst(writer, &synonyms_set)?;


@@ -0,0 +1,8 @@
[package]
name = "meilisearch-error"
version = "0.15.0"
authors = ["marin <postma.marin@protonmail.com>"]
edition = "2018"
[dependencies]
actix-http = "2"


@@ -0,0 +1,187 @@
use std::fmt;
use actix_http::http::StatusCode;
pub trait ErrorCode: std::error::Error {
fn error_code(&self) -> Code;
    /// returns the HTTP status code associated with the error
    fn http_status(&self) -> StatusCode {
        self.error_code().http()
    }

    /// returns the doc url associated with the error
    fn error_url(&self) -> String {
        self.error_code().url()
    }

    /// returns the error name, used as the error code
    fn error_name(&self) -> String {
        self.error_code().name()
    }

    /// returns the error type
    fn error_type(&self) -> String {
        self.error_code().type_()
    }
}
#[allow(clippy::enum_variant_names)]
enum ErrorType {
InternalError,
InvalidRequestError,
AuthenticationError,
}
impl fmt::Display for ErrorType {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
use ErrorType::*;
match self {
InternalError => write!(f, "internal_error"),
InvalidRequestError => write!(f, "invalid_request_error"),
AuthenticationError => write!(f, "authentication_error"),
}
}
}
pub enum Code {
// index related error
CreateIndex,
IndexAlreadyExists,
IndexNotFound,
InvalidIndexUid,
OpenIndex,
// invalid state error
InvalidState,
MissingPrimaryKey,
PrimaryKeyAlreadyPresent,
MaxFieldsLimitExceeded,
MissingDocumentId,
Facet,
Filter,
BadParameter,
BadRequest,
DocumentNotFound,
Internal,
InvalidToken,
Maintenance,
MissingAuthorizationHeader,
NotFound,
PayloadTooLarge,
RetrieveDocument,
SearchDocuments,
UnsupportedMediaType,
DumpAlreadyInProgress,
DumpProcessFailed,
}
impl Code {
    /// associates a `Code` variant with the actual ErrCode
fn err_code(&self) -> ErrCode {
use Code::*;
match self {
// index related errors
// create index is thrown on internal error while creating an index.
CreateIndex => ErrCode::internal("index_creation_failed", StatusCode::BAD_REQUEST),
IndexAlreadyExists => ErrCode::invalid("index_already_exists", StatusCode::BAD_REQUEST),
            // thrown when requesting a nonexistent index
IndexNotFound => ErrCode::invalid("index_not_found", StatusCode::NOT_FOUND),
InvalidIndexUid => ErrCode::invalid("invalid_index_uid", StatusCode::BAD_REQUEST),
OpenIndex => ErrCode::internal("index_not_accessible", StatusCode::INTERNAL_SERVER_ERROR),
// invalid state error
InvalidState => ErrCode::internal("invalid_state", StatusCode::INTERNAL_SERVER_ERROR),
// thrown when no primary key has been set
MissingPrimaryKey => ErrCode::invalid("missing_primary_key", StatusCode::BAD_REQUEST),
// error thrown when trying to set an already existing primary key
PrimaryKeyAlreadyPresent => ErrCode::invalid("primary_key_already_present", StatusCode::BAD_REQUEST),
// invalid document
MaxFieldsLimitExceeded => ErrCode::invalid("max_fields_limit_exceeded", StatusCode::BAD_REQUEST),
MissingDocumentId => ErrCode::invalid("missing_document_id", StatusCode::BAD_REQUEST),
// error related to facets
Facet => ErrCode::invalid("invalid_facet", StatusCode::BAD_REQUEST),
// error related to filters
Filter => ErrCode::invalid("invalid_filter", StatusCode::BAD_REQUEST),
BadParameter => ErrCode::invalid("bad_parameter", StatusCode::BAD_REQUEST),
BadRequest => ErrCode::invalid("bad_request", StatusCode::BAD_REQUEST),
DocumentNotFound => ErrCode::invalid("document_not_found", StatusCode::NOT_FOUND),
Internal => ErrCode::internal("internal", StatusCode::INTERNAL_SERVER_ERROR),
InvalidToken => ErrCode::authentication("invalid_token", StatusCode::FORBIDDEN),
Maintenance => ErrCode::internal("maintenance", StatusCode::SERVICE_UNAVAILABLE),
MissingAuthorizationHeader => ErrCode::authentication("missing_authorization_header", StatusCode::UNAUTHORIZED),
NotFound => ErrCode::invalid("not_found", StatusCode::NOT_FOUND),
PayloadTooLarge => ErrCode::invalid("payload_too_large", StatusCode::PAYLOAD_TOO_LARGE),
RetrieveDocument => ErrCode::internal("unretrievable_document", StatusCode::BAD_REQUEST),
SearchDocuments => ErrCode::internal("search_error", StatusCode::BAD_REQUEST),
UnsupportedMediaType => ErrCode::invalid("unsupported_media_type", StatusCode::UNSUPPORTED_MEDIA_TYPE),
// error related to dump
DumpAlreadyInProgress => ErrCode::invalid("dump_already_in_progress", StatusCode::CONFLICT),
DumpProcessFailed => ErrCode::internal("dump_process_failed", StatusCode::INTERNAL_SERVER_ERROR),
}
}
    /// returns the HTTP status code associated with the `Code`
    fn http(&self) -> StatusCode {
        self.err_code().status_code
    }

    /// returns the error name, used as the error code
    fn name(&self) -> String {
        self.err_code().error_name.to_string()
    }

    /// returns the error type
    fn type_(&self) -> String {
        self.err_code().error_type.to_string()
    }

    /// returns the doc url associated with the error
    fn url(&self) -> String {
        format!("https://docs.meilisearch.com/errors#{}", self.name())
    }
}
/// Internal structure providing a convenient way to create error codes
struct ErrCode {
status_code: StatusCode,
error_type: ErrorType,
error_name: &'static str,
}
impl ErrCode {
fn authentication(error_name: &'static str, status_code: StatusCode) -> ErrCode {
ErrCode {
status_code,
error_name,
error_type: ErrorType::AuthenticationError,
}
}
fn internal(error_name: &'static str, status_code: StatusCode) -> ErrCode {
ErrCode {
status_code,
error_name,
error_type: ErrorType::InternalError,
}
}
fn invalid(error_name: &'static str, status_code: StatusCode) -> ErrCode {
ErrCode {
status_code,
error_name,
error_type: ErrorType::InvalidRequestError,
}
}
}
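
A hypothetical consumer of the new crate: an error type only has to map itself to a `Code`, and the trait's default methods derive the HTTP status, machine-readable name, error type, and documentation URL from it:

```rust
use std::fmt;

use meilisearch_error::{Code, ErrorCode};

#[derive(Debug)]
struct IndexMissing; // illustrative error type, not part of this PR

impl fmt::Display for IndexMissing {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "index not found")
    }
}

impl std::error::Error for IndexMissing {}

impl ErrorCode for IndexMissing {
    fn error_code(&self) -> Code {
        Code::IndexNotFound
    }
}

// error_name()  -> "index_not_found"
// http_status() -> 404 Not Found
// error_url()   -> "https://docs.meilisearch.com/errors#index_not_found"
```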


@@ -1,7 +1,7 @@
 [package]
 name = "meilisearch-http"
 description = "MeiliSearch HTTP server"
-version = "0.9.0"
+version = "0.15.0"
 license = "MIT"
 authors = [
     "Quentin de Quelen <quentin@dequelen.me>",
@ -13,46 +13,74 @@ edition = "2018"
name = "meilisearch" name = "meilisearch"
path = "src/main.rs" path = "src/main.rs"
[features]
default = ["sentry"]
[dependencies] [dependencies]
async-std = { version = "1.0.1", features = ["attributes"] } actix-cors = "0.3"
chrono = { version = "0.4.9", features = ["serde"] } actix-http = "2"
crossbeam-channel = "0.4.0" actix-rt = "1"
actix-service = "1.0.6"
actix-web = { version = "3", features = ["rustls"] }
bytes = "0.5.4"
chrono = { version = "0.4.11", features = ["serde"] }
crossbeam-channel = "0.4.2"
env_logger = "0.7.1" env_logger = "0.7.1"
futures = "0.3.1" flate2 = "1.0.16"
heed = "0.6.1" futures = "0.3.4"
http = "0.1.19" http = "0.1.19"
http-service = "0.4.0" indexmap = { version = "1.3.2", features = ["serde-1"] }
indexmap = { version = "1.3.0", features = ["serde-1"] }
log = "0.4.8" log = "0.4.8"
main_error = "0.1.0" main_error = "0.1.0"
meilisearch-core = { path = "../meilisearch-core", version = "0.9.0" } meilisearch-core = { path = "../meilisearch-core", version = "0.15.0" }
meilisearch-schema = { path = "../meilisearch-schema", version = "0.9.0" } meilisearch-error = { path = "../meilisearch-error", version = "0.15.0" }
meilisearch-schema = { path = "../meilisearch-schema", version = "0.15.0" }
meilisearch-tokenizer = {path = "../meilisearch-tokenizer", version = "0.15.0"}
mime = "0.3.16" mime = "0.3.16"
pretty-bytes = "0.2.2" once_cell = "1.4.1"
rand = "0.7.2" rand = "0.7.3"
rayon = "1.2.0" regex = "1.3.6"
serde = { version = "1.0.101", features = ["derive"] } rustls = "0.18"
serde_json = { version = "1.0.41", features = ["preserve_order"] } serde = { version = "1.0.105", features = ["derive"] }
serde_qs = "0.5.1" serde_json = { version = "1.0.50", features = ["preserve_order"] }
siphasher = "0.3.1" serde_qs = "0.5.2"
structopt = "0.3.3"
sysinfo = "0.9.5"
tide = "0.6.0"
ureq = { version = "0.11.2", features = ["tls"], default-features = false }
walkdir = "2.2.9"
whoami = "0.6"
sha2 = "0.8.1" sha2 = "0.8.1"
siphasher = "0.3.2"
slice-group-by = "0.2.6"
structopt = "0.3.12"
tar = "0.4.29"
tempfile = "3.1.0"
tokio = { version = "0.2.18", features = ["macros"] }
ureq = { version = "0.12.0", features = ["tls"], default-features = false }
walkdir = "2.3.1"
whoami = "0.8.1"
[dependencies.sentry]
version = "0.18.1"
default-features = false
features = [
"with_client_implementation",
"with_panic",
"with_failure",
"with_device_info",
"with_rust_info",
"with_reqwest_transport",
"with_rustls",
"with_env_logger"
]
optional = true
[dev-dependencies] [dev-dependencies]
http-service-mock = "0.4.0" serde_url_params = "0.2.0"
tempdir = "0.3.7" tempdir = "0.3.7"
tokio = { version = "0.2.18", features = ["macros", "time"] }
[dev-dependencies.assert-json-diff] [dev-dependencies.assert-json-diff]
git = "https://github.com/qdequele/assert-json-diff" git = "https://github.com/qdequele/assert-json-diff"
branch = "master" branch = "master"
[build-dependencies] [build-dependencies]
vergen = "3.0.4" vergen = "3.1.0"
[target.'cfg(unix)'.dependencies] [target.'cfg(unix)'.dependencies]
jemallocator = "0.3.2" jemallocator = "0.3.2"
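Since `sentry` is now an optional feature enabled by default, builds can drop it with cargo's standard flag (a usage note, not part of the diff): `cargo build --release --no-default-features`.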


@ -79,6 +79,7 @@
box-sizing: border-box; box-sizing: border-box;
padding-left: 10px; padding-left: 10px;
color: rgba(0,0,0,.9); color: rgba(0,0,0,.9);
overflow-wrap: break-word;
} }
</style> </style>
</head> </head>
@ -93,6 +94,17 @@
<h2 class="subtitle"> <h2 class="subtitle">
This dashboard will help you check the search results with ease. This dashboard will help you check the search results with ease.
</h2> </h2>
<div class="field">
<!-- API Key -->
<div class="field">
<div class="control">
<input id="apiKey" class="input is-small" type="password" placeholder="API key (optional)">
<div class="help">At least a private API key is required for the dashboard to access the indexes list.</div>
</div>
</div>
</div>
</div> </div>
</div> </div>
</section> </section>
@ -124,13 +136,13 @@
<div class="level-item has-text-centered"> <div class="level-item has-text-centered">
<div> <div>
<p class="heading">Documents</p> <p class="heading">Documents</p>
<p id="count" class="title">25</p> <p id="count" class="title">0</p>
</div> </div>
</div> </div>
<div class="level-item has-text-centered"> <div class="level-item has-text-centered">
<div> <div>
<p class="heading">Time Spent</p> <p class="heading">Time Spent</p>
<p id="time" class="title">4ms</p> <p id="time" class="title">N/A</p>
</div> </div>
</div> </div>
</nav> </nav>
@ -147,13 +159,43 @@
</body> </body>
<script> <script>
function httpGet(theUrl) { function sanitizeHTMLEntities(str) {
if (str && typeof str === 'string') {
str = str.replace(/</g,"&lt;");
str = str.replace(/>/g,"&gt;");
str = str.replace(/&lt;em&gt;/g,"<em>");
str = str.replace(/&lt;\/em&gt;/g,"<\/em>");
}
return str;
}
function httpGet(theUrl, apiKey) {
var xmlHttp = new XMLHttpRequest(); var xmlHttp = new XMLHttpRequest();
xmlHttp.open("GET", theUrl, false); // false for synchronous request xmlHttp.open("GET", theUrl, false); // false for synchronous request
if (apiKey) {
xmlHttp.setRequestHeader("x-Meili-API-Key", apiKey);
}
xmlHttp.send(null); xmlHttp.send(null);
return xmlHttp.responseText; return xmlHttp.responseText;
} }
function refreshIndexList() {
// TODO we must not block here
let result = JSON.parse(httpGet(`${baseUrl}/indexes`, localStorage.getItem('apiKey')));
if (!Array.isArray(result)) { return }
let select = document.getElementById("index");
select.innerHTML = '';
for (index of result) {
const option = document.createElement('option');
option.value = index.uid;
option.innerHTML = index.name;
select.appendChild(option);
}
}
let lastRequest = undefined; let lastRequest = undefined;
function triggerSearch() { function triggerSearch() {
@ -167,13 +209,19 @@
lastRequest = new XMLHttpRequest(); lastRequest = new XMLHttpRequest();
lastRequest.open("GET", theUrl, true); lastRequest.open("GET", theUrl, true);
if (localStorage.getItem('apiKey')) {
lastRequest.setRequestHeader("x-Meili-API-Key", localStorage.getItem('apiKey'));
}
lastRequest.onload = function (e) { lastRequest.onload = function (e) {
if (lastRequest.readyState === 4 && lastRequest.status === 200) { if (lastRequest.readyState === 4 && lastRequest.status === 200) {
let httpResults = JSON.parse(lastRequest.responseText); let sanitizedResponseText = sanitizeHTMLEntities(lastRequest.responseText);
let httpResults = JSON.parse(sanitizedResponseText);
results.innerHTML = ''; results.innerHTML = '';
let processingTimeMs = httpResults.processingTimeMs; let processingTimeMs = httpResults.processingTimeMs;
let numberOfDocuments = httpResults.hits.length; let numberOfDocuments = httpResults.nbHits;
time.innerHTML = `${processingTimeMs}ms`; time.innerHTML = `${processingTimeMs}ms`;
count.innerHTML = `${numberOfDocuments}`; count.innerHTML = `${numberOfDocuments}`;
@ -204,7 +252,11 @@
const content = document.createElement('div'); const content = document.createElement('div');
content.classList.add("content"); content.classList.add("content");
content.innerHTML = element[prop]; if (typeof (element[prop]) === "object") {
content.innerHTML = JSON.stringify(element[prop]);
} else {
content.innerHTML = element[prop];
}
field.appendChild(attribute); field.appendChild(attribute);
field.appendChild(content); field.appendChild(content);
@ -234,19 +286,21 @@
lastRequest.send(null); lastRequest.send(null);
} }
let baseUrl = window.location.origin; if (!apiKey.value) {
// TODO we must not block here apiKey.value = localStorage.getItem('apiKey');
let result = JSON.parse(httpGet(`${baseUrl}/indexes`));
let select = document.getElementById("index");
for (index of result) {
const option = document.createElement('option');
option.value = index.uid;
option.innerHTML = index.name;
select.appendChild(option);
} }
apiKey.addEventListener('input', function(e) {
localStorage.setItem('apiKey', apiKey.value);
refreshIndexList();
}, false);
let baseUrl = window.location.origin;
refreshIndexList();
search.oninput = triggerSearch; search.oninput = triggerSearch;
let select = document.getElementById("index");
select.onchange = triggerSearch; select.onchange = triggerSearch;
triggerSearch(); triggerSearch();


@ -1,20 +1,72 @@
use std::hash::{Hash, Hasher}; use std::hash::{Hash, Hasher};
use std::thread; use std::{error, thread};
use std::time::{Duration, SystemTime, UNIX_EPOCH}; use std::time::{Duration, Instant, SystemTime, UNIX_EPOCH};
use log::error; use log::error;
use serde::Serialize; use serde::Serialize;
use serde_qs as qs; use serde_qs as qs;
use siphasher::sip::SipHasher; use siphasher::sip::SipHasher;
use walkdir::WalkDir;
use crate::Data;
use crate::Opt;
const AMPLITUDE_API_KEY: &str = "f7fba398780e06d8fe6666a9be7e3d47"; const AMPLITUDE_API_KEY: &str = "f7fba398780e06d8fe6666a9be7e3d47";
#[derive(Debug, Serialize)]
struct EventProperties {
database_size: u64,
last_update_timestamp: Option<i64>, //timestamp
number_of_documents: Vec<u64>,
}
impl EventProperties {
fn from(data: Data) -> Result<EventProperties, Box<dyn error::Error>> {
let mut index_list = Vec::new();
let reader = data.db.main_read_txn()?;
for index_uid in data.db.indexes_uids() {
if let Some(index) = data.db.open_index(&index_uid) {
let number_of_documents = index.main.number_of_documents(&reader)?;
index_list.push(number_of_documents);
}
}
let database_size = WalkDir::new(&data.db_path)
.into_iter()
.filter_map(|entry| entry.ok())
.filter_map(|entry| entry.metadata().ok())
.filter(|metadata| metadata.is_file())
.fold(0, |acc, m| acc + m.len());
let last_update_timestamp = data.db.last_update(&reader)?.map(|u| u.timestamp());
Ok(EventProperties {
database_size,
last_update_timestamp,
number_of_documents: index_list,
})
}
}
#[derive(Debug, Serialize)]
struct UserProperties<'a> {
env: &'a str,
start_since_days: u64,
user_email: Option<String>,
server_provider: Option<String>,
}
#[derive(Debug, Serialize)] #[derive(Debug, Serialize)]
struct Event<'a> { struct Event<'a> {
user_id: &'a str, user_id: &'a str,
event_type: &'a str, event_type: &'a str,
device_id: &'a str, device_id: &'a str,
time: u64, time: u64,
app_version: &'a str,
user_properties: UserProperties<'a>,
event_properties: Option<EventProperties>,
} }
#[derive(Debug, Serialize)] #[derive(Debug, Serialize)]
@ -23,7 +75,7 @@ struct AmplitudeRequest<'a> {
event: &'a str, event: &'a str,
} }
pub fn analytics_sender() { pub fn analytics_sender(data: Data, opt: Opt) {
let username = whoami::username(); let username = whoami::username();
let hostname = whoami::hostname(); let hostname = whoami::hostname();
let platform = whoami::platform(); let platform = whoami::platform();
@ -36,6 +88,7 @@ pub fn analytics_sender() {
let uid = format!("{:X}", hash); let uid = format!("{:X}", hash);
let platform = platform.to_string(); let platform = platform.to_string();
let first_start = Instant::now();
loop { loop {
let n = SystemTime::now().duration_since(UNIX_EPOCH).unwrap(); let n = SystemTime::now().duration_since(UNIX_EPOCH).unwrap();
@ -43,12 +96,27 @@ pub fn analytics_sender() {
let device_id = &platform; let device_id = &platform;
let time = n.as_secs(); let time = n.as_secs();
let event_type = "runtime_tick"; let event_type = "runtime_tick";
let elapsed_since_start = first_start.elapsed().as_secs() / 86_400; // 86 400 s = one day
let event_properties = EventProperties::from(data.clone()).ok();
let app_version = env!("CARGO_PKG_VERSION").to_string();
let app_version = app_version.as_str();
let user_email = std::env::var("MEILI_USER_EMAIL").ok();
let server_provider = std::env::var("MEILI_SERVER_PROVIDER").ok();
let user_properties = UserProperties {
env: &opt.env,
start_since_days: elapsed_since_start,
user_email,
server_provider,
};
let event = Event { let event = Event {
user_id, user_id,
event_type, event_type,
device_id, device_id,
time, time,
app_version,
user_properties,
event_properties
}; };
let event = serde_json::to_string(&event).unwrap(); let event = serde_json::to_string(&event).unwrap();
@ -64,6 +132,6 @@ pub fn analytics_sender() {
error!("Unsuccessful call to Amplitude: {}", body); error!("Unsuccessful call to Amplitude: {}", body);
} }
thread::sleep(Duration::from_secs(86_400)) // one day thread::sleep(Duration::from_secs(3600)) // one hour
} }
} }


@ -1,20 +1,13 @@
use std::collections::HashMap; use std::error::Error;
use std::ops::Deref; use std::ops::Deref;
use std::path::PathBuf;
use std::sync::Arc; use std::sync::Arc;
use chrono::{DateTime, Utc}; use meilisearch_core::{Database, DatabaseOptions};
use heed::types::{SerdeBincode, Str};
use log::error;
use meilisearch_core::{Database, Error as MError, MResult, MainT, UpdateT};
use sha2::Digest; use sha2::Digest;
use sysinfo::Pid;
use crate::index_update_callback;
use crate::option::Opt; use crate::option::Opt;
use crate::routes::index::index_update_callback;
const LAST_UPDATE_KEY: &str = "last-update";
type SerdeDatetime = SerdeBincode<DateTime<Utc>>;
#[derive(Clone)] #[derive(Clone)]
pub struct Data { pub struct Data {
@ -33,11 +26,14 @@ impl Deref for Data {
pub struct DataInner { pub struct DataInner {
pub db: Arc<Database>, pub db: Arc<Database>,
pub db_path: String, pub db_path: String,
pub dumps_folder: PathBuf,
pub dump_batch_size: usize,
pub api_keys: ApiKeys, pub api_keys: ApiKeys,
pub server_pid: Pid, pub server_pid: u32,
pub http_payload_size_limit: usize,
} }
#[derive(Default, Clone)] #[derive(Clone)]
pub struct ApiKeys { pub struct ApiKeys {
pub public: Option<String>, pub public: Option<String>,
pub private: Option<String>, pub private: Option<String>,
@ -61,81 +57,24 @@ impl ApiKeys {
} }
} }
impl DataInner {
pub fn is_indexing(&self, reader: &heed::RoTxn<UpdateT>, index: &str) -> MResult<Option<bool>> {
match self.db.open_index(&index) {
Some(index) => index.current_update_id(&reader).map(|u| Some(u.is_some())),
None => Ok(None),
}
}
pub fn last_update(&self, reader: &heed::RoTxn<MainT>) -> MResult<Option<DateTime<Utc>>> {
match self
.db
.common_store()
.get::<_, Str, SerdeDatetime>(reader, LAST_UPDATE_KEY)?
{
Some(datetime) => Ok(Some(datetime)),
None => Ok(None),
}
}
pub fn set_last_update(&self, writer: &mut heed::RwTxn<MainT>) -> MResult<()> {
self.db
.common_store()
.put::<_, Str, SerdeDatetime>(writer, LAST_UPDATE_KEY, &Utc::now())
.map_err(Into::into)
}
pub fn compute_stats(&self, writer: &mut heed::RwTxn<MainT>, index_uid: &str) -> MResult<()> {
let index = match self.db.open_index(&index_uid) {
Some(index) => index,
None => {
error!("Impossible to retrieve index {}", index_uid);
return Ok(());
}
};
let schema = match index.main.schema(&writer)? {
Some(schema) => schema,
None => return Ok(()),
};
let all_documents_fields = index
.documents_fields_counts
.all_documents_fields_counts(&writer)?;
// count fields frequencies
let mut fields_frequency = HashMap::<_, usize>::new();
for result in all_documents_fields {
let (_, attr, _) = result?;
if let Some(field_id) = schema.indexed_pos_to_field_id(attr) {
*fields_frequency.entry(field_id).or_default() += 1;
}
}
// convert attributes to their names
let frequency: HashMap<_, _> = fields_frequency
.into_iter()
.filter_map(|(a, c)| schema.name(a).map(|name| (name.to_string(), c)))
.collect();
index
.main
.put_fields_frequency(writer, &frequency)
.map_err(MError::Zlmdb)
}
}
impl Data { impl Data {
pub fn new(opt: Opt) -> Data { pub fn new(opt: Opt) -> Result<Data, Box<dyn Error>> {
let db_path = opt.db_path.clone(); let db_path = opt.db_path.clone();
let server_pid = sysinfo::get_current_pid().unwrap(); let dumps_folder = opt.dumps_folder.clone();
let dump_batch_size = opt.dump_batch_size;
let server_pid = std::process::id();
let db = Arc::new(Database::open_or_create(opt.db_path).unwrap()); let db_opt = DatabaseOptions {
main_map_size: opt.max_mdb_size,
update_map_size: opt.max_udb_size,
};
let http_payload_size_limit = opt.http_payload_size_limit;
let db = Arc::new(Database::open_or_create(opt.db_path, db_opt)?);
let mut api_keys = ApiKeys { let mut api_keys = ApiKeys {
master: opt.master_key.clone(), master: opt.master_key,
private: None, private: None,
public: None, public: None,
}; };
@ -145,8 +84,11 @@ impl Data {
let inner_data = DataInner { let inner_data = DataInner {
db: db.clone(), db: db.clone(),
db_path, db_path,
dumps_folder,
dump_batch_size,
api_keys, api_keys,
server_pid, server_pid,
http_payload_size_limit,
}; };
let data = Data { let data = Data {
@ -158,6 +100,6 @@ impl Data {
index_update_callback(&index_uid, &callback_context, status); index_update_callback(&index_uid, &callback_context, status);
})); }));
data Ok(data)
} }
} }


@ -0,0 +1,424 @@
use std::fs::{create_dir_all, File};
use std::io::prelude::*;
use std::path::{Path, PathBuf};
use std::sync::Mutex;
use std::thread;
use actix_web::web;
use chrono::offset::Utc;
use indexmap::IndexMap;
use log::{error, info};
use meilisearch_core::{MainWriter, MainReader, UpdateReader};
use meilisearch_core::settings::Settings;
use meilisearch_core::update::{apply_settings_update, apply_documents_addition};
use once_cell::sync::Lazy;
use serde::{Deserialize, Serialize};
use tempfile::TempDir;
use crate::Data;
use crate::error::Error;
use crate::helpers::compression;
use crate::routes::index;
use crate::routes::index::IndexResponse;
// Mutex to share dump progress.
static DUMP_INFO: Lazy<Mutex<Option<DumpInfo>>> = Lazy::new(Mutex::default);
#[derive(Debug, Serialize, Deserialize, Copy, Clone)]
enum DumpVersion {
V1,
}
impl DumpVersion {
const CURRENT: Self = Self::V1;
}
#[derive(Debug, Serialize, Deserialize)]
#[serde(rename_all = "camelCase")]
pub struct DumpMetadata {
indexes: Vec<crate::routes::index::IndexResponse>,
db_version: String,
dump_version: DumpVersion,
}
impl DumpMetadata {
/// Create a DumpMetadata with the current dump version of meilisearch.
pub fn new(indexes: Vec<crate::routes::index::IndexResponse>, db_version: String) -> Self {
DumpMetadata {
indexes,
db_version,
dump_version: DumpVersion::CURRENT,
}
}
/// Extract DumpMetadata from the `metadata.json` file at the provided `folder_path`
fn from_path(folder_path: &Path) -> Result<Self, Error> {
let path = folder_path.join("metadata.json");
let file = File::open(path)?;
let reader = std::io::BufReader::new(file);
let metadata = serde_json::from_reader(reader)?;
Ok(metadata)
}
/// Write DumpMetadata to the `metadata.json` file at the provided `folder_path`
fn to_path(&self, folder_path: &Path) -> Result<(), Error> {
let path = folder_path.join("metadata.json");
let file = File::create(path)?;
serde_json::to_writer(file, &self)?;
Ok(())
}
}
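With the camelCase rename above, a `metadata.json` describing one index could look like this (values illustrative; `IndexResponse` fields abridged):

{ "indexes": [{ "uid": "movies", "name": "movies", "primaryKey": "id" }], "dbVersion": "0.15.0", "dumpVersion": "V1" }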
/// Extract Settings from the `settings.json` file at the provided `folder_path`
fn settings_from_path(folder_path: &Path) -> Result<Settings, Error> {
let path = folder_path.join("settings.json");
let file = File::open(path)?;
let reader = std::io::BufReader::new(file);
let metadata = serde_json::from_reader(reader)?;
Ok(metadata)
}
/// Write Settings to the `settings.json` file at the provided `folder_path`
fn settings_to_path(settings: &Settings, folder_path: &Path) -> Result<(), Error> {
let path = folder_path.join("settings.json");
let file = File::create(path)?;
serde_json::to_writer(file, settings)?;
Ok(())
}
/// Import the settings and documents of a `DumpVersion::V1` dump into the specified index.
fn import_index_v1(
data: &Data,
dumps_folder: &Path,
index_uid: &str,
document_batch_size: usize,
write_txn: &mut MainWriter,
) -> Result<(), Error> {
// open index
let index = data
.db
.open_index(index_uid)
.ok_or(Error::index_not_found(index_uid))?;
// index folder path in dump folder
let index_path = &dumps_folder.join(index_uid);
// extract `settings.json` file and import content
let settings = settings_from_path(&index_path)?;
let settings = settings.to_update().map_err(|_| Error::dump_failed())?;
apply_settings_update(write_txn, &index, settings)?;
// create an iterator over the documents in `documents.jsonl` for batch import
let documents = {
let file = File::open(&index_path.join("documents.jsonl"))?;
let reader = std::io::BufReader::new(file);
let deserializer = serde_json::Deserializer::from_reader(reader);
deserializer.into_iter::<IndexMap<String, serde_json::Value>>()
};
// import documents in batches of `document_batch_size`:
// create a Vec to buffer documents
let mut values = Vec::with_capacity(document_batch_size);
// iterate over documents
for document in documents {
// push document in buffer
values.push(document?);
// if buffer is full, create and apply a batch, and clean buffer
if values.len() == document_batch_size {
let batch = std::mem::replace(&mut values, Vec::with_capacity(document_batch_size));
apply_documents_addition(write_txn, &index, batch)?;
}
}
// apply documents remaining in the buffer
if !values.is_empty() {
apply_documents_addition(write_txn, &index, values)?;
}
// sync index information: stats, updated_at, last_update
if let Err(e) = crate::index_update_callback_txn(index, index_uid, data, write_txn) {
return Err(Error::Internal(e));
}
Ok(())
}
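Each line of `documents.jsonl` holds one self-contained JSON document, deserialized above as an `IndexMap<String, serde_json::Value>`; an illustrative line (field names invented):

{ "id": 1, "title": "Pride and Prejudice", "genre": "romance" }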
/// Import the dump at `dump_path` into the database.
pub fn import_dump(
data: &Data,
dump_path: &Path,
document_batch_size: usize,
) -> Result<(), Error> {
info!("Importing dump from {:?}...", dump_path);
// create a temporary directory
let tmp_dir = TempDir::new()?;
let tmp_dir_path = tmp_dir.path();
// extract dump in temporary directory
compression::from_tar_gz(dump_path, tmp_dir_path)?;
// read dump metadata
let metadata = DumpMetadata::from_path(&tmp_dir_path)?;
// choose the import function matching the DumpVersion found in the metadata
let import_index = match metadata.dump_version {
DumpVersion::V1 => import_index_v1,
};
// delete any existing index whose `uid` matches an index to import, then recreate them empty
let existing_index_uids = data.db.indexes_uids();
for index in metadata.indexes.iter() {
if existing_index_uids.contains(&index.uid) {
data.db.delete_index(index.uid.clone())?;
}
index::create_index_sync(&data.db, index.uid.clone(), index.name.clone(), index.primary_key.clone())?;
}
// import each index's content
data.db.main_write::<_, _, Error>(|mut writer| {
for index in metadata.indexes {
import_index(&data, tmp_dir_path, &index.uid, document_batch_size, &mut writer)?;
}
Ok(())
})?;
info!("Dump importation from {:?} succeed", dump_path);
Ok(())
}
#[derive(Debug, Serialize, Deserialize, PartialEq, Clone)]
#[serde(rename_all = "snake_case")]
pub enum DumpStatus {
Done,
Processing,
DumpProcessFailed,
}
#[derive(Debug, Serialize, Deserialize, Clone)]
#[serde(rename_all = "camelCase")]
pub struct DumpInfo {
pub uid: String,
pub status: DumpStatus,
#[serde(skip_serializing_if = "Option::is_none")]
pub error: Option<String>,
}
impl DumpInfo {
pub fn new(uid: String, status: DumpStatus) -> Self {
Self { uid, status, error: None }
}
pub fn with_error(mut self, error: String) -> Self {
self.status = DumpStatus::DumpProcessFailed;
self.error = Some(error);
self
}
pub fn dump_already_in_progress(&self) -> bool {
self.status == DumpStatus::Processing
}
pub fn get_current() -> Option<Self> {
DUMP_INFO.lock().unwrap().clone()
}
pub fn set_current(&self) {
*DUMP_INFO.lock().unwrap() = Some(self.clone());
}
}
/// Generate uid from creation date
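/// (e.g. a dump created on 2020-09-30 at 17:02:34.123 UTC gets the uid `20200930-170234123`)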
fn generate_uid() -> String {
Utc::now().format("%Y%m%d-%H%M%S%3f").to_string()
}
/// Build the path of the compressed dump file `{dump_uid}.tar.gz` inside `dumps_folder`
pub fn compressed_dumps_folder(dumps_folder: &Path, dump_uid: &str) -> PathBuf {
dumps_folder.join(format!("{}.tar.gz", dump_uid))
}
/// Write metadata in dump
fn dump_metadata(data: &web::Data<Data>, folder_path: &Path, indexes: Vec<IndexResponse>) -> Result<(), Error> {
let (db_major, db_minor, db_patch) = data.db.version();
let metadata = DumpMetadata::new(indexes, format!("{}.{}.{}", db_major, db_minor, db_patch));
metadata.to_path(folder_path)
}
/// Export settings of provided index in dump
fn dump_index_settings(data: &web::Data<Data>, reader: &MainReader, folder_path: &Path, index_uid: &str) -> Result<(), Error> {
let settings = crate::routes::setting::get_all_sync(data, reader, index_uid)?;
settings_to_path(&settings, folder_path)
}
/// Export updates of provided index in dump
fn dump_index_updates(data: &web::Data<Data>, reader: &UpdateReader, folder_path: &Path, index_uid: &str) -> Result<(), Error> {
let updates_path = folder_path.join("updates.jsonl");
let updates = crate::routes::index::get_all_updates_status_sync(data, reader, index_uid)?;
let file = File::create(updates_path)?;
for update in updates {
serde_json::to_writer(&file, &update)?;
writeln!(&file)?;
}
Ok(())
}
/// Export documents of provided index in dump
fn dump_index_documents(data: &web::Data<Data>, reader: &MainReader, folder_path: &Path, index_uid: &str) -> Result<(), Error> {
let documents_path = folder_path.join("documents.jsonl");
let file = File::create(documents_path)?;
let dump_batch_size = data.dump_batch_size;
let mut offset = 0;
loop {
let documents = crate::routes::document::get_all_documents_sync(data, reader, index_uid, offset, dump_batch_size, None)?;
if documents.is_empty() { break; } else { offset += dump_batch_size; }
for document in documents {
serde_json::to_writer(&file, &document)?;
writeln!(&file)?;
}
}
Ok(())
}
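Taken together, `dump_metadata`, `dump_index_settings`, `dump_index_documents` and `dump_index_updates` stage a directory shaped like the following before compression (the `movies` uid is illustrative):

metadata.json
movies/
    settings.json
    documents.jsonl
    updates.jsonl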
/// Log the error with its context and mark the current dump as failed.
fn fail_dump_process<E: std::error::Error>(dump_info: DumpInfo, context: &str, error: E) {
let error = format!("Something went wrong during dump process: {}; {}", context, error);
error!("{}", &error);
dump_info.with_error(error).set_current();
}
/// Main dump routine: exports metadata, settings, documents and updates, then compresses everything into `dumps_folder`.
fn dump_process(data: web::Data<Data>, dumps_folder: PathBuf, dump_info: DumpInfo) {
// open read transaction on Update
let update_reader = match data.db.update_read_txn() {
Ok(r) => r,
Err(e) => {
fail_dump_process(dump_info, "creating RO transaction on updates", e);
return ;
}
};
// open read transaction on Main
let main_reader = match data.db.main_read_txn() {
Ok(r) => r,
Err(e) => {
fail_dump_process(dump_info, "creating RO transaction on main", e);
return ;
}
};
// create a temporary directory
let tmp_dir = match TempDir::new() {
Ok(tmp_dir) => tmp_dir,
Err(e) => {
fail_dump_process(dump_info, "creating temporary directory", e);
return ;
}
};
let tmp_dir_path = tmp_dir.path();
// fetch indexes
let indexes = match crate::routes::index::list_indexes_sync(&data, &main_reader) {
Ok(indexes) => indexes,
Err(e) => {
fail_dump_process(dump_info, "listing indexes", e);
return ;
}
};
// create metadata
if let Err(e) = dump_metadata(&data, &tmp_dir_path, indexes.clone()) {
fail_dump_process(dump_info, "generating metadata", e);
return ;
}
// export settings, updates and documents for each index
for index in indexes {
let index_path = tmp_dir_path.join(&index.uid);
// create the index sub-directory
if let Err(e) = create_dir_all(&index_path) {
fail_dump_process(dump_info, &format!("creating directory for index {}", &index.uid), e);
return ;
}
// export settings
if let Err(e) = dump_index_settings(&data, &main_reader, &index_path, &index.uid) {
fail_dump_process(dump_info, &format!("generating settings for index {}", &index.uid), e);
return ;
}
// export documents
if let Err(e) = dump_index_documents(&data, &main_reader, &index_path, &index.uid) {
fail_dump_process(dump_info, &format!("generating documents for index {}", &index.uid), e);
return ;
}
// export updates
if let Err(e) = dump_index_updates(&data, &update_reader, &index_path, &index.uid) {
fail_dump_process(dump_info, &format!("generating updates for index {}", &index.uid), e);
return ;
}
}
// compress dump in a file named `{dump_uid}.tar.gz` in `dumps_folder`
if let Err(e) = crate::helpers::compression::to_tar_gz(&tmp_dir_path, &compressed_dumps_folder(&dumps_folder, &dump_info.uid)) {
fail_dump_process(dump_info, "compressing dump", e);
return ;
}
// update dump info to `done`
let resume = DumpInfo::new(
dump_info.uid,
DumpStatus::Done
);
resume.set_current();
}
pub fn init_dump_process(data: &web::Data<Data>, dumps_folder: &Path) -> Result<DumpInfo, Error> {
create_dir_all(dumps_folder).map_err(|_| Error::dump_failed())?;
// check if a dump is already in progress
if let Some(resume) = DumpInfo::get_current() {
if resume.dump_already_in_progress() {
return Err(Error::dump_conflict())
}
}
// generate a new dump info
let info = DumpInfo::new(
generate_uid(),
DumpStatus::Processing
);
info.set_current();
let data = data.clone();
let dumps_folder = dumps_folder.to_path_buf();
let info_cloned = info.clone();
// run dump process in a new thread
thread::spawn(move ||
dump_process(data, dumps_folder, info_cloned)
);
Ok(info)
}
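Given the serde attributes on `DumpInfo` and `DumpStatus`, polling the status while this runs returns JSON like the following (uid illustrative), with an extra `error` string when the status is `dump_process_failed`:

{ "uid": "20200930-170234123", "status": "processing" }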


@ -1,183 +1,302 @@
use std::fmt::Display; use std::error;
use std::fmt;
use http::status::StatusCode; use actix_http::ResponseBuilder;
use log::{error, warn}; use actix_web as aweb;
use meilisearch_core::{FstError, HeedError}; use actix_web::error::{JsonPayloadError, QueryPayloadError};
use serde::{Deserialize, Serialize}; use actix_web::http::StatusCode;
use tide::IntoResponse; use serde_json::json;
use tide::Response;
use crate::helpers::meilisearch::Error as SearchError; use meilisearch_error::{ErrorCode, Code};
pub type SResult<T> = Result<T, ResponseError>; #[derive(Debug)]
pub struct ResponseError {
inner: Box<dyn ErrorCode>,
}
pub enum ResponseError { impl error::Error for ResponseError {}
Internal(String),
BadRequest(String), impl ErrorCode for ResponseError {
InvalidToken(String), fn error_code(&self) -> Code {
NotFound(String), self.inner.error_code()
IndexNotFound(String), }
DocumentNotFound(String), }
MissingHeader(String),
impl fmt::Display for ResponseError {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
self.inner.fmt(f)
}
}
impl From<Error> for ResponseError {
fn from(error: Error) -> ResponseError {
ResponseError { inner: Box::new(error) }
}
}
#[derive(Debug)]
pub enum Error {
BadParameter(String, String), BadParameter(String, String),
OpenIndex(String), BadRequest(String),
CreateIndex(String), CreateIndex(String),
DocumentNotFound(String),
IndexNotFound(String),
IndexAlreadyExists(String),
Internal(String),
InvalidIndexUid, InvalidIndexUid,
InvalidToken(String),
Maintenance, Maintenance,
MissingAuthorizationHeader,
NotFound(String),
OpenIndex(String),
RetrieveDocument(u32, String),
SearchDocuments(String),
PayloadTooLarge,
UnsupportedMediaType,
DumpAlreadyInProgress,
DumpProcessFailed,
} }
impl ResponseError { impl error::Error for Error {}
pub fn internal(message: impl Display) -> ResponseError {
ResponseError::Internal(message.to_string())
}
pub fn bad_request(message: impl Display) -> ResponseError { impl ErrorCode for Error {
ResponseError::BadRequest(message.to_string()) fn error_code(&self) -> Code {
} use Error::*;
pub fn invalid_token(message: impl Display) -> ResponseError {
ResponseError::InvalidToken(message.to_string())
}
pub fn not_found(message: impl Display) -> ResponseError {
ResponseError::NotFound(message.to_string())
}
pub fn index_not_found(message: impl Display) -> ResponseError {
ResponseError::IndexNotFound(message.to_string())
}
pub fn document_not_found(message: impl Display) -> ResponseError {
ResponseError::DocumentNotFound(message.to_string())
}
pub fn missing_header(message: impl Display) -> ResponseError {
ResponseError::MissingHeader(message.to_string())
}
pub fn bad_parameter(name: impl Display, message: impl Display) -> ResponseError {
ResponseError::BadParameter(name.to_string(), message.to_string())
}
pub fn open_index(message: impl Display) -> ResponseError {
ResponseError::OpenIndex(message.to_string())
}
pub fn create_index(message: impl Display) -> ResponseError {
ResponseError::CreateIndex(message.to_string())
}
}
impl IntoResponse for ResponseError {
fn into_response(self) -> Response {
match self { match self {
ResponseError::Internal(err) => { BadParameter(_, _) => Code::BadParameter,
error!("internal server error: {}", err); BadRequest(_) => Code::BadRequest,
error( CreateIndex(_) => Code::CreateIndex,
String::from("Internal server error"), DocumentNotFound(_) => Code::DocumentNotFound,
StatusCode::INTERNAL_SERVER_ERROR, IndexNotFound(_) => Code::IndexNotFound,
) IndexAlreadyExists(_) => Code::IndexAlreadyExists,
} Internal(_) => Code::Internal,
ResponseError::BadRequest(err) => { InvalidIndexUid => Code::InvalidIndexUid,
warn!("bad request: {}", err); InvalidToken(_) => Code::InvalidToken,
error(err, StatusCode::BAD_REQUEST) Maintenance => Code::Maintenance,
} MissingAuthorizationHeader => Code::MissingAuthorizationHeader,
ResponseError::InvalidToken(err) => { NotFound(_) => Code::NotFound,
error(format!("Invalid API key: {}", err), StatusCode::FORBIDDEN) OpenIndex(_) => Code::OpenIndex,
} RetrieveDocument(_, _) => Code::RetrieveDocument,
ResponseError::NotFound(err) => error(err, StatusCode::NOT_FOUND), SearchDocuments(_) => Code::SearchDocuments,
ResponseError::IndexNotFound(index) => { PayloadTooLarge => Code::PayloadTooLarge,
error(format!("Index {} not found", index), StatusCode::NOT_FOUND) UnsupportedMediaType => Code::UnsupportedMediaType,
} DumpAlreadyInProgress => Code::DumpAlreadyInProgress,
ResponseError::DocumentNotFound(id) => error( DumpProcessFailed => Code::DumpProcessFailed,
format!("Document with id {} not found", id),
StatusCode::NOT_FOUND,
),
ResponseError::MissingHeader(header) => error(
format!("Header {} is missing", header),
StatusCode::UNAUTHORIZED,
),
ResponseError::BadParameter(param, e) => error(
format!("Url parameter {} error: {}", param, e),
StatusCode::BAD_REQUEST,
),
ResponseError::CreateIndex(err) => error(
format!("Impossible to create index; {}", err),
StatusCode::BAD_REQUEST,
),
ResponseError::OpenIndex(err) => error(
format!("Impossible to open index; {}", err),
StatusCode::BAD_REQUEST,
),
ResponseError::InvalidIndexUid => error(
"Index must have a valid uid; Index uid can be of type integer or string only composed of alphanumeric characters, hyphens (-) and underscores (_).".to_string(),
StatusCode::BAD_REQUEST,
),
ResponseError::Maintenance => error(
String::from("Server is in maintenance, please try again later"),
StatusCode::SERVICE_UNAVAILABLE,
),
} }
} }
} }
#[derive(Serialize, Deserialize)] #[derive(Debug)]
struct ErrorMessage { pub enum FacetCountError {
message: String, AttributeNotSet(String),
SyntaxError(String),
UnexpectedToken { found: String, expected: &'static [&'static str] },
NoFacetSet,
} }
fn error(message: String, status: StatusCode) -> Response { impl error::Error for FacetCountError {}
let message = ErrorMessage { message };
tide::Response::new(status.as_u16()) impl ErrorCode for FacetCountError {
.body_json(&message) fn error_code(&self) -> Code {
.unwrap() Code::BadRequest
}
} }
impl From<serde_json::Error> for ResponseError { impl FacetCountError {
fn from(err: serde_json::Error) -> ResponseError { pub fn unexpected_token(found: impl ToString, expected: &'static [&'static str]) -> FacetCountError {
ResponseError::internal(err) let found = found.to_string();
FacetCountError::UnexpectedToken { expected, found }
}
}
impl From<serde_json::error::Error> for FacetCountError {
fn from(other: serde_json::error::Error) -> FacetCountError {
FacetCountError::SyntaxError(other.to_string())
}
}
impl fmt::Display for FacetCountError {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
use FacetCountError::*;
match self {
AttributeNotSet(attr) => write!(f, "Attribute {} is not set as facet", attr),
SyntaxError(msg) => write!(f, "Syntax error: {}", msg),
UnexpectedToken { expected, found } => write!(f, "Unexpected {} found, expected {:?}", found, expected),
NoFacetSet => write!(f, "Can't perform facet count, as no facet is set"),
}
}
}
impl Error {
pub fn internal(err: impl fmt::Display) -> Error {
Error::Internal(err.to_string())
}
pub fn bad_request(err: impl fmt::Display) -> Error {
Error::BadRequest(err.to_string())
}
pub fn missing_authorization_header() -> Error {
Error::MissingAuthorizationHeader
}
pub fn invalid_token(err: impl fmt::Display) -> Error {
Error::InvalidToken(err.to_string())
}
pub fn not_found(err: impl fmt::Display) -> Error {
Error::NotFound(err.to_string())
}
pub fn index_not_found(err: impl fmt::Display) -> Error {
Error::IndexNotFound(err.to_string())
}
pub fn document_not_found(err: impl fmt::Display) -> Error {
Error::DocumentNotFound(err.to_string())
}
pub fn bad_parameter(param: impl fmt::Display, err: impl fmt::Display) -> Error {
Error::BadParameter(param.to_string(), err.to_string())
}
pub fn open_index(err: impl fmt::Display) -> Error {
Error::OpenIndex(err.to_string())
}
pub fn create_index(err: impl fmt::Display) -> Error {
Error::CreateIndex(err.to_string())
}
pub fn invalid_index_uid() -> Error {
Error::InvalidIndexUid
}
pub fn maintenance() -> Error {
Error::Maintenance
}
pub fn retrieve_document(doc_id: u32, err: impl fmt::Display) -> Error {
Error::RetrieveDocument(doc_id, err.to_string())
}
pub fn search_documents(err: impl fmt::Display) -> Error {
Error::SearchDocuments(err.to_string())
}
pub fn dump_conflict() -> Error {
Error::DumpAlreadyInProgress
}
pub fn dump_failed() -> Error {
Error::DumpProcessFailed
}
}
impl fmt::Display for Error {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
match self {
Self::BadParameter(param, err) => write!(f, "Url parameter {} error: {}", param, err),
Self::BadRequest(err) => f.write_str(err),
Self::CreateIndex(err) => write!(f, "Impossible to create index; {}", err),
Self::DocumentNotFound(document_id) => write!(f, "Document with id {} not found", document_id),
Self::IndexNotFound(index_uid) => write!(f, "Index {} not found", index_uid),
Self::IndexAlreadyExists(index_uid) => write!(f, "Index {} already exists", index_uid),
Self::Internal(err) => f.write_str(err),
Self::InvalidIndexUid => f.write_str("Index must have a valid uid; Index uid can be of type integer or string only composed of alphanumeric characters, hyphens (-) and underscores (_)."),
Self::InvalidToken(err) => write!(f, "Invalid API key: {}", err),
Self::Maintenance => f.write_str("Server is in maintenance, please try again later"),
Self::MissingAuthorizationHeader => f.write_str("You must have an authorization token"),
Self::NotFound(err) => write!(f, "{} not found", err),
Self::OpenIndex(err) => write!(f, "Impossible to open index; {}", err),
Self::RetrieveDocument(id, err) => write!(f, "Impossible to retrieve the document with id: {}; {}", id, err),
Self::SearchDocuments(err) => write!(f, "Impossible to search documents; {}", err),
Self::PayloadTooLarge => f.write_str("Payload too large"),
Self::UnsupportedMediaType => f.write_str("Unsupported media type"),
Self::DumpAlreadyInProgress => f.write_str("Another dump is already in progress"),
Self::DumpProcessFailed => f.write_str("Dump process failed"),
}
}
}
impl aweb::error::ResponseError for ResponseError {
fn error_response(&self) -> aweb::HttpResponse {
ResponseBuilder::new(self.status_code()).json(json!({
"message": self.to_string(),
"errorCode": self.error_name(),
"errorType": self.error_type(),
"errorLink": self.error_url(),
}))
}
fn status_code(&self) -> StatusCode {
self.http_status()
}
}
impl From<std::io::Error> for Error {
fn from(err: std::io::Error) -> Error {
Error::Internal(err.to_string())
} }
} }
impl From<meilisearch_core::Error> for ResponseError { impl From<meilisearch_core::Error> for ResponseError {
fn from(err: meilisearch_core::Error) -> ResponseError { fn from(err: meilisearch_core::Error) -> ResponseError {
ResponseError::internal(err) ResponseError { inner: Box::new(err) }
} }
} }
impl From<HeedError> for ResponseError { impl From<meilisearch_schema::Error> for ResponseError {
fn from(err: HeedError) -> ResponseError { fn from(err: meilisearch_schema::Error) -> ResponseError {
ResponseError::internal(err) ResponseError { inner: Box::new(err) }
} }
} }
impl From<FstError> for ResponseError { impl From<actix_http::Error> for Error {
fn from(err: FstError) -> ResponseError { fn from(err: actix_http::Error) -> Error {
ResponseError::internal(err) Error::Internal(err.to_string())
} }
} }
impl From<SearchError> for ResponseError { impl From<meilisearch_core::Error> for Error {
fn from(err: SearchError) -> ResponseError { fn from(err: meilisearch_core::Error) -> Error {
ResponseError::internal(err) Error::Internal(err.to_string())
} }
} }
impl From<meilisearch_core::settings::RankingRuleConversionError> for ResponseError { impl From<serde_json::error::Error> for Error {
fn from(err: meilisearch_core::settings::RankingRuleConversionError) -> ResponseError { fn from(err: serde_json::error::Error) -> Error {
ResponseError::internal(err) Error::Internal(err.to_string())
} }
} }
pub trait IntoInternalError<T> { impl From<FacetCountError> for ResponseError {
fn into_internal_error(self) -> SResult<T>; fn from(err: FacetCountError) -> ResponseError {
ResponseError { inner: Box::new(err) }
}
} }
impl<T> IntoInternalError<T> for Option<T> { impl From<JsonPayloadError> for Error {
fn into_internal_error(self) -> SResult<T> { fn from(err: JsonPayloadError) -> Error {
match self { match err {
Some(value) => Ok(value), JsonPayloadError::Deserialize(err) => Error::BadRequest(format!("Invalid JSON: {}", err)),
None => Err(ResponseError::internal("Heed cannot find requested value")), JsonPayloadError::Overflow => Error::PayloadTooLarge,
JsonPayloadError::ContentType => Error::UnsupportedMediaType,
JsonPayloadError::Payload(err) => Error::BadRequest(format!("Problem while decoding the request: {}", err)),
} }
} }
} }
impl From<QueryPayloadError> for Error {
fn from(err: QueryPayloadError) -> Error {
match err {
QueryPayloadError::Deserialize(err) => Error::BadRequest(format!("Invalid query parameters: {}", err)),
}
}
}
pub fn payload_error_handler<E: Into<Error>>(err: E) -> ResponseError {
let error: Error = err.into();
error.into()
}
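With the `aweb::error::ResponseError` impl above, a rejected request receives a four-field JSON body; for a missing index it would look roughly like this (the exact `errorCode`/`errorType` strings come from the `meilisearch-error` tables, so treat the values as illustrative):

{
  "message": "Index movies not found",
  "errorCode": "index_not_found",
  "errorType": "invalid_request_error",
  "errorLink": "https://docs.meilisearch.com/errors#index_not_found"
}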


@ -0,0 +1,103 @@
use std::cell::RefCell;
use std::pin::Pin;
use std::rc::Rc;
use std::task::{Context, Poll};
use actix_service::{Service, Transform};
use actix_web::{dev::ServiceRequest, dev::ServiceResponse, web};
use futures::future::{err, ok, Future, Ready};
use crate::error::{Error, ResponseError};
use crate::Data;
#[derive(Clone)]
pub enum Authentication {
Public,
Private,
Admin,
}
impl<S: 'static, B> Transform<S> for Authentication
where
S: Service<Request = ServiceRequest, Response = ServiceResponse<B>, Error = actix_web::Error>,
S::Future: 'static,
B: 'static,
{
type Request = ServiceRequest;
type Response = ServiceResponse<B>;
type Error = actix_web::Error;
type InitError = ();
type Transform = LoggingMiddleware<S>;
type Future = Ready<Result<Self::Transform, Self::InitError>>;
fn new_transform(&self, service: S) -> Self::Future {
ok(LoggingMiddleware {
acl: self.clone(),
service: Rc::new(RefCell::new(service)),
})
}
}
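// Despite its name, this middleware service performs API-key authentication; it does not log anything.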
pub struct LoggingMiddleware<S> {
acl: Authentication,
service: Rc<RefCell<S>>,
}
#[allow(clippy::type_complexity)]
impl<S, B> Service for LoggingMiddleware<S>
where
S: Service<Request = ServiceRequest, Response = ServiceResponse<B>, Error = actix_web::Error> + 'static,
S::Future: 'static,
B: 'static,
{
type Request = ServiceRequest;
type Response = ServiceResponse<B>;
type Error = actix_web::Error;
type Future = Pin<Box<dyn Future<Output = Result<Self::Response, Self::Error>>>>;
fn poll_ready(&mut self, cx: &mut Context) -> Poll<Result<(), Self::Error>> {
self.service.poll_ready(cx)
}
fn call(&mut self, req: ServiceRequest) -> Self::Future {
let mut svc = self.service.clone();
// This unwrap is deliberate: the error should never occur. If it does, it means
// that actix-web has an issue or someone changed the type of `Data`.
let data = req.app_data::<web::Data<Data>>().unwrap();
if data.api_keys.master.is_none() {
return Box::pin(svc.call(req));
}
let auth_header = match req.headers().get("X-Meili-API-Key") {
Some(auth) => match auth.to_str() {
Ok(auth) => auth,
Err(_) => return Box::pin(err(ResponseError::from(Error::MissingAuthorizationHeader).into())),
},
None => {
return Box::pin(err(ResponseError::from(Error::MissingAuthorizationHeader).into()));
}
};
let authenticated = match self.acl {
Authentication::Admin => data.api_keys.master.as_deref() == Some(auth_header),
Authentication::Private => {
data.api_keys.master.as_deref() == Some(auth_header)
|| data.api_keys.private.as_deref() == Some(auth_header)
}
Authentication::Public => {
data.api_keys.master.as_deref() == Some(auth_header)
|| data.api_keys.private.as_deref() == Some(auth_header)
|| data.api_keys.public.as_deref() == Some(auth_header)
}
};
if authenticated {
Box::pin(svc.call(req))
} else {
Box::pin(err(
ResponseError::from(Error::InvalidToken(auth_header.to_string())).into()
))
}
}
}
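A minimal sketch of mounting this guard on an actix-web scope (the path, handler and `data` variable are assumptions for illustration, not taken from the source):

use actix_web::{web, App, HttpResponse};

// hypothetical handler standing in for the real route handlers
async fn list_indexes() -> HttpResponse {
    HttpResponse::Ok().json(Vec::<String>::new())
}

// inside `HttpServer::new(move || ...)`:
App::new()
    .app_data(web::Data::new(data.clone()))
    .service(
        web::scope("/indexes")
            .wrap(Authentication::Private) // private or master key required
            .route("", web::get().to(list_indexes)),
    )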


@ -0,0 +1,27 @@
use flate2::Compression;
use flate2::read::GzDecoder;
use flate2::write::GzEncoder;
use std::fs::{create_dir_all, File};
use std::path::Path;
use tar::{Builder, Archive};
use crate::error::Error;
pub fn to_tar_gz(src: &Path, dest: &Path) -> Result<(), Error> {
let f = File::create(dest)?;
let gz_encoder = GzEncoder::new(f, Compression::default());
let mut tar_encoder = Builder::new(gz_encoder);
tar_encoder.append_dir_all(".", src)?;
let gz_encoder = tar_encoder.into_inner()?;
gz_encoder.finish()?;
Ok(())
}
pub fn from_tar_gz(src: &Path, dest: &Path) -> Result<(), Error> {
let f = File::open(src)?;
let gz = GzDecoder::new(f);
let mut ar = Archive::new(gz);
create_dir_all(dest)?;
ar.unpack(dest)?;
Ok(())
}
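A usage sketch of a dump round trip through these two helpers (paths are assumptions):

use std::path::Path;

fn dump_roundtrip() -> Result<(), Error> {
    // compress the staged dump directory into its final archive
    to_tar_gz(Path::new("/tmp/dump-staging"), Path::new("/dumps/20200930-170234123.tar.gz"))?;
    // later, extract the archive again before importing it
    from_tar_gz(Path::new("/dumps/20200930-170234123.tar.gz"), Path::new("/tmp/dump-restore"))?;
    Ok(())
}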


@ -1,82 +1,30 @@
use std::cmp::Ordering; use std::cmp::Ordering;
use std::collections::{HashMap, HashSet}; use std::collections::{HashMap, HashSet};
use std::convert::From;
use std::error;
use std::fmt;
use std::hash::{Hash, Hasher}; use std::hash::{Hash, Hasher};
use std::time::{Duration, Instant}; use std::time::Instant;
use indexmap::IndexMap; use indexmap::IndexMap;
use log::error; use log::error;
use meilisearch_core::{Filter, MainReader};
use meilisearch_core::facets::FacetFilter;
use meilisearch_core::criterion::*; use meilisearch_core::criterion::*;
use meilisearch_core::settings::RankingRule; use meilisearch_core::settings::RankingRule;
use meilisearch_core::{Highlight, Index, MainT, RankedMap}; use meilisearch_core::{Highlight, Index, RankedMap};
use meilisearch_schema::{FieldId, Schema}; use meilisearch_schema::{FieldId, Schema};
use meilisearch_tokenizer::is_cjk;
use serde::{Deserialize, Serialize}; use serde::{Deserialize, Serialize};
use serde_json::Value; use serde_json::Value;
use siphasher::sip::SipHasher; use siphasher::sip::SipHasher;
use slice_group_by::GroupBy;
#[derive(Debug)] use crate::error::{Error, ResponseError};
pub enum Error {
SearchDocuments(String),
RetrieveDocument(u64, String),
DocumentNotFound(u64),
CropFieldWrongType(String),
AttributeNotFoundOnDocument(String),
AttributeNotFoundOnSchema(String),
MissingFilterValue,
UnknownFilteredAttribute,
Internal(String),
}
impl error::Error for Error {}
impl fmt::Display for Error {
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
use Error::*;
match self {
SearchDocuments(err) => write!(f, "impossible to search documents; {}", err),
RetrieveDocument(id, err) => write!(
f,
"impossible to retrieve the document with id: {}; {}",
id, err
),
DocumentNotFound(id) => write!(f, "document {} not found", id),
CropFieldWrongType(field) => {
write!(f, "the field {} cannot be cropped it's not a string", field)
}
AttributeNotFoundOnDocument(field) => {
write!(f, "field {} is not found on document", field)
}
AttributeNotFoundOnSchema(field) => write!(f, "field {} is not found on schema", field),
MissingFilterValue => f.write_str("a filter doesn't have a value to compare it with"),
UnknownFilteredAttribute => {
f.write_str("a filter is specifying an unknown schema attribute")
}
Internal(err) => write!(f, "internal error; {}", err),
}
}
}
impl From<meilisearch_core::Error> for Error {
fn from(error: meilisearch_core::Error) -> Self {
Error::Internal(error.to_string())
}
}
impl From<heed::Error> for Error {
fn from(error: heed::Error) -> Self {
Error::Internal(error.to_string())
}
}
pub trait IndexSearchExt { pub trait IndexSearchExt {
fn new_search(&self, query: String) -> SearchBuilder; fn new_search(&self, query: Option<String>) -> SearchBuilder;
} }
impl IndexSearchExt for Index { impl IndexSearchExt for Index {
fn new_search(&self, query: String) -> SearchBuilder { fn new_search(&self, query: Option<String>) -> SearchBuilder {
SearchBuilder { SearchBuilder {
index: self, index: self,
query, query,
@ -86,23 +34,25 @@ impl IndexSearchExt for Index {
attributes_to_retrieve: None, attributes_to_retrieve: None,
attributes_to_highlight: None, attributes_to_highlight: None,
filters: None, filters: None,
timeout: Duration::from_millis(30),
matches: false, matches: false,
facet_filters: None,
facets: None,
} }
} }
} }
pub struct SearchBuilder<'a> { pub struct SearchBuilder<'a> {
index: &'a Index, index: &'a Index,
query: String, query: Option<String>,
offset: usize, offset: usize,
limit: usize, limit: usize,
attributes_to_crop: Option<HashMap<String, usize>>, attributes_to_crop: Option<HashMap<String, usize>>,
attributes_to_retrieve: Option<HashSet<String>>, attributes_to_retrieve: Option<HashSet<String>>,
attributes_to_highlight: Option<HashSet<String>>, attributes_to_highlight: Option<HashSet<String>>,
filters: Option<String>, filters: Option<String>,
timeout: Duration,
matches: bool, matches: bool,
facet_filters: Option<FacetFilter>,
facets: Option<Vec<(FieldId, String)>>
} }
impl<'a> SearchBuilder<'a> { impl<'a> SearchBuilder<'a> {
@ -137,13 +87,13 @@ impl<'a> SearchBuilder<'a> {
self self
} }
pub fn filters(&mut self, value: String) -> &SearchBuilder { pub fn add_facet_filters(&mut self, filters: FacetFilter) -> &SearchBuilder {
self.filters = Some(value); self.facet_filters = Some(filters);
self self
} }
pub fn timeout(&mut self, value: Duration) -> &SearchBuilder { pub fn filters(&mut self, value: String) -> &SearchBuilder {
self.timeout = value; self.filters = Some(value);
self self
} }
@ -152,17 +102,19 @@ impl<'a> SearchBuilder<'a> {
self self
} }
pub fn search(&self, reader: &heed::RoTxn<MainT>) -> Result<SearchResult, Error> { pub fn add_facets(&mut self, facets: Vec<(FieldId, String)>) -> &SearchBuilder {
let schema = self.index.main.schema(reader); self.facets = Some(facets);
let schema = schema.map_err(|e| Error::Internal(e.to_string()))?; self
let schema = match schema { }
Some(schema) => schema,
None => return Err(Error::Internal(String::from("missing schema"))),
};
let ranked_map = self.index.main.ranked_map(reader); pub fn search(self, reader: &MainReader) -> Result<SearchResult, ResponseError> {
let ranked_map = ranked_map.map_err(|e| Error::Internal(e.to_string()))?; let schema = self
let ranked_map = ranked_map.unwrap_or_default(); .index
.main
.schema(reader)?
.ok_or(Error::internal("missing schema"))?;
let ranked_map = self.index.main.ranked_map(reader)?.unwrap_or_default();
// Change criteria // Change criteria
let mut query_builder = match self.get_criteria(reader, &ranked_map, &schema)? { let mut query_builder = match self.get_criteria(reader, &ranked_map, &schema)? {
@ -170,89 +122,86 @@ impl<'a> SearchBuilder<'a> {
None => self.index.query_builder(), None => self.index.query_builder(),
}; };
if let Some(filters) = &self.filters { if let Some(filter_expression) = &self.filters {
let mut split = filters.split(':'); let filter = Filter::parse(filter_expression, &schema)?;
match (split.next(), split.next()) { let index = &self.index;
(Some(_), None) | (Some(_), Some("")) => return Err(Error::MissingFilterValue), query_builder.with_filter(move |id| {
(Some(attr), Some(value)) => { let reader = &reader;
let ref_reader = reader; let filter = &filter;
let ref_index = &self.index; match filter.test(reader, index, id) {
let value = value.trim().to_lowercase(); Ok(res) => res,
Err(e) => {
let attr = match schema.id(attr) { log::warn!("unexpected error during filtering: {}", e);
Some(attr) => attr, false
None => return Err(Error::UnknownFilteredAttribute), }
};
query_builder.with_filter(move |id| {
let attr = attr;
let index = ref_index;
let reader = ref_reader;
match index.document_attribute::<Value>(reader, id, attr) {
Ok(Some(Value::String(s))) => s.to_lowercase() == value,
Ok(Some(Value::Bool(b))) => {
(value == "true" && b) || (value == "false" && !b)
}
Ok(Some(Value::Array(a))) => {
a.into_iter().any(|s| s.as_str() == Some(&value))
}
_ => false,
}
});
} }
(_, _) => (), });
}
} }
query_builder.with_fetch_timeout(self.timeout);
if let Some(field) = self.index.main.distinct_attribute(reader)? { if let Some(field) = self.index.main.distinct_attribute(reader)? {
if let Some(field_id) = schema.id(&field) { let index = &self.index;
query_builder.with_distinct(1, move |id| { query_builder.with_distinct(1, move |id| {
match self.index.document_attribute_bytes(reader, id, field_id) { match index.document_attribute_bytes(reader, id, field) {
Ok(Some(bytes)) => { Ok(Some(bytes)) => {
let mut s = SipHasher::new(); let mut s = SipHasher::new();
bytes.hash(&mut s); bytes.hash(&mut s);
Some(s.finish()) Some(s.finish())
}
_ => None,
} }
}); _ => None,
} }
});
} }
query_builder.set_facet_filter(self.facet_filters);
query_builder.set_facets(self.facets);
let start = Instant::now(); let start = Instant::now();
let docs = let result = query_builder.query(reader, self.query.as_deref(), self.offset..(self.offset + self.limit));
query_builder.query(reader, &self.query, self.offset..(self.offset + self.limit)); let search_result = result.map_err(Error::search_documents)?;
let time_ms = start.elapsed().as_millis() as usize; let time_ms = start.elapsed().as_millis() as usize;
let mut hits = Vec::with_capacity(self.limit); let mut all_attributes: HashSet<&str> = HashSet::new();
for doc in docs.map_err(|e| Error::SearchDocuments(e.to_string()))? { let mut all_formatted: HashSet<&str> = HashSet::new();
// retrieve the content of document in kv store
let mut fields: Option<HashSet<&str>> = None; match &self.attributes_to_retrieve {
if let Some(attributes_to_retrieve) = &self.attributes_to_retrieve { Some(to_retrieve) => {
let mut set = HashSet::new(); all_attributes.extend(to_retrieve.iter().map(String::as_str));
for field in attributes_to_retrieve {
set.insert(field.as_str()); if let Some(to_highlight) = &self.attributes_to_highlight {
all_formatted.extend(to_highlight.iter().map(String::as_str));
} }
fields = Some(set);
}
let document: IndexMap<String, Value> = self if let Some(to_crop) = &self.attributes_to_crop {
all_formatted.extend(to_crop.keys().map(String::as_str));
}
all_attributes.extend(&all_formatted);
},
None => {
all_attributes.extend(schema.displayed_name());
// If we specified at least one attribute to highlight or crop then
// all available attributes will be returned in the _formatted field.
if self.attributes_to_highlight.is_some() || self.attributes_to_crop.is_some() {
all_formatted.extend(all_attributes.iter().cloned());
}
},
}
let mut hits = Vec::with_capacity(self.limit);
for doc in search_result.documents {
let mut document: IndexMap<String, Value> = self
.index .index
.document(reader, fields.as_ref(), doc.id) .document(reader, Some(&all_attributes), doc.id)
.map_err(|e| Error::RetrieveDocument(doc.id.0, e.to_string()))? .map_err(|e| Error::retrieve_document(doc.id.0, e))?
.ok_or(Error::DocumentNotFound(doc.id.0))?; .ok_or(Error::internal(
"Impossible to retrieve the document; Corrupted data",
))?;
let has_attributes_to_highlight = self.attributes_to_highlight.is_some(); let mut formatted = document.iter()
let has_attributes_to_crop = self.attributes_to_crop.is_some(); .filter(|(key, _)| all_formatted.contains(key.as_str()))
.map(|(k, v)| (k.clone(), v.clone()))
.collect();
let mut formatted = if has_attributes_to_highlight || has_attributes_to_crop {
document.clone()
} else {
IndexMap::new()
};
let mut matches = doc.highlights.clone(); let mut matches = doc.highlights.clone();
// Crops fields if needed // Crops fields if needed
@ -261,13 +210,24 @@ impl<'a> SearchBuilder<'a> {
} }
// Transform to readable matches // Transform to readable matches
let matches = calculate_matches(matches, self.attributes_to_retrieve.clone(), &schema);
if let Some(attributes_to_highlight) = &self.attributes_to_highlight { if let Some(attributes_to_highlight) = &self.attributes_to_highlight {
let matches = calculate_matches(
&matches,
self.attributes_to_highlight.clone(),
&schema,
);
formatted = calculate_highlights(&formatted, &matches, attributes_to_highlight); formatted = calculate_highlights(&formatted, &matches, attributes_to_highlight);
} }
let matches_info = if self.matches { Some(matches) } else { None }; let matches_info = if self.matches {
Some(calculate_matches(&matches, self.attributes_to_retrieve.clone(), &schema))
} else {
None
};
if let Some(attributes_to_retrieve) = &self.attributes_to_retrieve {
document.retain(|key, _| attributes_to_retrieve.contains(&key.to_string()))
}
let hit = SearchHit { let hit = SearchHit {
document, document,
@ -282,8 +242,12 @@ impl<'a> SearchBuilder<'a> {
hits, hits,
offset: self.offset, offset: self.offset,
limit: self.limit, limit: self.limit,
nb_hits: search_result.nb_hits,
exhaustive_nb_hits: search_result.exhaustive_nb_hit,
processing_time_ms: time_ms, processing_time_ms: time_ms,
query: self.query.to_string(), query: self.query.unwrap_or_default(),
facets_distribution: search_result.facets,
exhaustive_facets_count: search_result.exhaustive_facets_count,
}; };
Ok(results) Ok(results)
@ -291,10 +255,10 @@ impl<'a> SearchBuilder<'a> {
pub fn get_criteria( pub fn get_criteria(
&self, &self,
reader: &heed::RoTxn<MainT>, reader: &MainReader,
ranked_map: &'a RankedMap, ranked_map: &'a RankedMap,
schema: &Schema, schema: &Schema,
) -> Result<Option<Criteria<'a>>, Error> { ) -> Result<Option<Criteria<'a>>, ResponseError> {
let ranking_rules = self.index.main.ranking_rules(reader)?; let ranking_rules = self.index.main.ranking_rules(reader)?;
if let Some(ranking_rules) = ranking_rules { if let Some(ranking_rules) = ranking_rules {
@ -358,14 +322,48 @@ pub struct SearchHit {
pub matches_info: Option<MatchesInfos>, pub matches_info: Option<MatchesInfos>,
} }
#[derive(Debug, Clone, Serialize, Deserialize)] #[derive(Debug, Clone, Serialize)]
#[serde(rename_all = "camelCase")] #[serde(rename_all = "camelCase")]
pub struct SearchResult { pub struct SearchResult {
pub hits: Vec<SearchHit>, pub hits: Vec<SearchHit>,
pub offset: usize, pub offset: usize,
pub limit: usize, pub limit: usize,
pub nb_hits: usize,
pub exhaustive_nb_hits: bool,
pub processing_time_ms: usize, pub processing_time_ms: usize,
pub query: String, pub query: String,
#[serde(skip_serializing_if = "Option::is_none")]
pub facets_distribution: Option<HashMap<String, HashMap<String, usize>>>,
#[serde(skip_serializing_if = "Option::is_none")]
pub exhaustive_facets_count: Option<bool>,
}
/// Returns the start index and the length of the crop.
fn aligned_crop(text: &str, match_index: usize, context: usize) -> (usize, usize) {
let is_word_component = |c: &char| c.is_alphanumeric() && !is_cjk(*c);
let word_end_index = |mut index| {
if text.chars().nth(index - 1).map_or(false, |c| is_word_component(&c)) {
index += text.chars().skip(index).take_while(is_word_component).count();
}
index
};
if context == 0 {
// count needs to be at least 1 for CJK queries to return something
return (match_index, 1 + text.chars().skip(match_index).take_while(is_word_component).count());
}
let start = match match_index.saturating_sub(context) {
0 => 0,
n => {
let word_end_index = word_end_index(n);
// skip whitespaces if any
word_end_index + text.chars().skip(word_end_index).take_while(char::is_ascii_whitespace).count()
}
};
let end = word_end_index(match_index + context);
(start, end - start)
}
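A quick sanity check of the boundary logic, with values traced by hand (a hypothetical snippet, not part of the diff): the crop snaps to word boundaries instead of cutting words in half.

let text = "this is a text";
let (start, length) = aligned_crop(text, 5, 4); // match on "is", context of 4 chars
let cropped: String = text.chars().skip(start).take(length).collect();
assert_eq!("is a", cropped.trim()); // word-aligned, not "s is a t"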
fn crop_text(
@@ -376,14 +374,23 @@ fn crop_text(
    let mut matches = matches.into_iter().peekable();

    let char_index = matches.peek().map(|m| m.char_index as usize).unwrap_or(0);
    let (start, count) = aligned_crop(text, char_index, context);

    // TODO do something about double allocation
    let text = text
        .chars()
        .skip(start)
        .take(count)
        .collect::<String>()
        .trim()
        .to_string();

    // update matches index to match the new cropped text
    let matches = matches
        .take_while(|m| (m.char_index as usize) + (m.char_length as usize) <= start + count)
        .map(|m| Highlight {
            char_index: m.char_index - start as u16,
            ..m
        })
        .collect();
@@ -422,14 +429,14 @@ fn crop_document(
}

fn calculate_matches(
    matches: &[Highlight],
    attributes_to_retrieve: Option<HashSet<String>>,
    schema: &Schema,
) -> MatchesInfos {
    let mut matches_result: HashMap<String, Vec<MatchPosition>> = HashMap::new();
    for m in matches.iter() {
        if let Some(attribute) = schema.name(FieldId::new(m.attribute)) {
            if let Some(ref attributes_to_retrieve) = attributes_to_retrieve {
                if !attributes_to_retrieve.contains(attribute) {
                    continue;
                }
@@ -472,19 +479,23 @@ fn calculate_highlights(
    let value: Vec<_> = value.chars().collect();
    let mut highlighted_value = String::new();
    let mut index = 0;

    let longest_matches = matches
        .linear_group_by_key(|m| m.start)
        .map(|group| group.last().unwrap())
        .filter(move |m| m.start >= index);

    for m in longest_matches {
        let before = value.get(index..m.start);
        let highlighted = value.get(m.start..(m.start + m.length));
        if let (Some(before), Some(highlighted)) = (before, highlighted) {
            highlighted_value.extend(before);
            highlighted_value.push_str("<em>");
            highlighted_value.extend(highlighted);
            highlighted_value.push_str("</em>");
            index = m.start + m.length;
        } else {
            error!("value: {:?}; index: {:?}, match: {:?}", value, index, m);
        }
    }
    highlighted_value.extend(value[index..].iter());
@@ -492,7 +503,6 @@ fn calculate_highlights(
        };
    }
}
highlight_result
}
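`linear_group_by_key` comes from the slice-group-by crate: it yields consecutive runs of elements sharing a key. Since matches arrive sorted by (start, length), taking the last element of each run keeps only the longest match per position. A minimal standalone sketch (with a simplified match type, not the crate's `MatchPosition`):

use slice_group_by::GroupBy;

struct M { start: usize, length: usize }

fn main() {
    let matches = vec![
        M { start: 0, length: 2 }, // "Ic"
        M { start: 0, length: 3 }, // "Ice" <- the longest, kept
        M { start: 4, length: 5 },
    ];
    let longest: Vec<&M> = matches
        .linear_group_by_key(|m| m.start) // runs: [0,2],[0,3] | [4,5]
        .map(|group| group.last().unwrap()) // last of each run is the longest
        .collect();
    assert_eq!(longest.len(), 2);
    assert_eq!(longest[0].length, 3);
}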
@@ -500,6 +510,67 @@ fn calculate_highlights(
mod tests {
    use super::*;
#[test]
fn aligned_crops() {
let text = r#"En ce début de trentième millénaire, l'Empire n'a jamais été aussi puissant, aussi étendu à travers toute la galaxie. C'est dans sa capitale, Trantor, que l'éminent savant Hari Seldon invente la psychohistoire, une science toute nouvelle, à base de psychologie et de mathématiques, qui lui permet de prédire l'avenir... C'est-à-dire l'effondrement de l'Empire d'ici cinq siècles et au-delà, trente mille années de chaos et de ténèbres. Pour empêcher cette catastrophe et sauver la civilisation, Seldon crée la Fondation."#;
// simple test
let (start, length) = aligned_crop(&text, 6, 2);
let cropped = text.chars().skip(start).take(length).collect::<String>().trim().to_string();
assert_eq!("début", cropped);
// first word test
let (start, length) = aligned_crop(&text, 0, 1);
let cropped = text.chars().skip(start).take(length).collect::<String>().trim().to_string();
assert_eq!("En", cropped);
// last word test
let (start, length) = aligned_crop(&text, 510, 2);
let cropped = text.chars().skip(start).take(length).collect::<String>().trim().to_string();
assert_eq!("Fondation", cropped);
// CJK tests
let text = "this isのス foo myタイリ test";
// mixed charset
let (start, length) = aligned_crop(&text, 5, 3);
let cropped = text.chars().skip(start).take(length).collect::<String>().trim().to_string();
assert_eq!("isの", cropped);
// split regular word / CJK word, no space
let (start, length) = aligned_crop(&text, 7, 1);
let cropped = text.chars().skip(start).take(length).collect::<String>().trim().to_string();
assert_eq!("", cropped);
}
#[test]
fn calculate_matches() {
let mut matches = Vec::new();
matches.push(Highlight { attribute: 0, char_index: 0, char_length: 3});
matches.push(Highlight { attribute: 0, char_index: 0, char_length: 2});
let mut attributes_to_retrieve: HashSet<String> = HashSet::new();
attributes_to_retrieve.insert("title".to_string());
let schema = Schema::with_primary_key("title");
let matches_result = super::calculate_matches(&matches, Some(attributes_to_retrieve), &schema);
let mut matches_result_expected: HashMap<String, Vec<MatchPosition>> = HashMap::new();
let mut positions = Vec::new();
positions.push(MatchPosition {
start: 0,
length: 2,
});
positions.push(MatchPosition {
start: 0,
length: 3,
});
matches_result_expected.insert("title".to_string(), positions);
assert_eq!(matches_result, matches_result_expected);
}
#[test]
fn calculate_highlights() {
    let data = r#"{
@@ -538,4 +609,38 @@ mod tests {
    assert_eq!(result, result_expected);
}
#[test]
fn highlight_longest_match() {
let data = r#"{
"title": "Ice"
}"#;
let document: IndexMap<String, Value> = serde_json::from_str(data).unwrap();
let mut attributes_to_highlight = HashSet::new();
attributes_to_highlight.insert("title".to_string());
let mut matches = HashMap::new();
let mut m = Vec::new();
m.push(MatchPosition {
start: 0,
length: 2,
});
m.push(MatchPosition {
start: 0,
length: 3,
});
matches.insert("title".to_string(), m);
let result = super::calculate_highlights(&document, &matches, &attributes_to_highlight);
let mut result_expected = IndexMap::new();
result_expected.insert(
"title".to_string(),
Value::String("<em>Ice</em>".to_string()),
);
assert_eq!(result, result_expected);
}
}


@@ -1,2 +1,7 @@
pub mod authentication;
pub mod meilisearch;
pub mod normalize_path;
pub mod compression;

pub use authentication::Authentication;
pub use normalize_path::NormalizePath;


@@ -0,0 +1,86 @@
/// From https://docs.rs/actix-web/3.0.0-alpha.2/src/actix_web/middleware/normalize.rs.html#34
use actix_http::Error;
use actix_service::{Service, Transform};
use actix_web::{
dev::ServiceRequest,
dev::ServiceResponse,
http::uri::{PathAndQuery, Uri},
};
use futures::future::{ok, Ready};
use regex::Regex;
use std::task::{Context, Poll};
pub struct NormalizePath;
impl<S, B> Transform<S> for NormalizePath
where
S: Service<Request = ServiceRequest, Response = ServiceResponse<B>, Error = Error>,
S::Future: 'static,
{
type Request = ServiceRequest;
type Response = ServiceResponse<B>;
type Error = Error;
type InitError = ();
type Transform = NormalizePathNormalization<S>;
type Future = Ready<Result<Self::Transform, Self::InitError>>;
fn new_transform(&self, service: S) -> Self::Future {
ok(NormalizePathNormalization {
service,
merge_slash: Regex::new("//+").unwrap(),
})
}
}
pub struct NormalizePathNormalization<S> {
service: S,
merge_slash: Regex,
}
impl<S, B> Service for NormalizePathNormalization<S>
where
S: Service<Request = ServiceRequest, Response = ServiceResponse<B>, Error = Error>,
S::Future: 'static,
{
type Request = ServiceRequest;
type Response = ServiceResponse<B>;
type Error = Error;
type Future = S::Future;
fn poll_ready(&mut self, cx: &mut Context<'_>) -> Poll<Result<(), Self::Error>> {
self.service.poll_ready(cx)
}
fn call(&mut self, mut req: ServiceRequest) -> Self::Future {
let head = req.head_mut();
// always add trailing slash, might be an extra one
let path = head.uri.path().to_string() + "/";
if self.merge_slash.find(&path).is_some() {
// normalize multiple /'s to one /
let path = self.merge_slash.replace_all(&path, "/");
let path = if path.len() > 1 {
path.trim_end_matches('/')
} else {
&path
};
let mut parts = head.uri.clone().into_parts();
let pq = parts.path_and_query.as_ref().unwrap();
let path = if let Some(q) = pq.query() {
bytes::Bytes::from(format!("{}?{}", path, q))
} else {
bytes::Bytes::copy_from_slice(path.as_bytes())
};
parts.path_and_query = Some(PathAndQuery::from_maybe_shared(path).unwrap());
let uri = Uri::from_parts(parts).unwrap();
req.match_info_mut().get_mut().update(&uri);
req.head_mut().uri = uri;
}
self.service.call(req)
}
}
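The trick above is that a trailing slash is always appended before the duplicate-slash check, so the URI is only rewritten when the probe actually contains `//`; this both merges duplicate slashes and strips a user-supplied trailing slash in one pass. A standalone sketch of the same rule (a hypothetical helper, regex crate only, not part of the diff):

use regex::Regex;

fn normalize(original: &str) -> String {
    let merge_slash = Regex::new("//+").unwrap();
    let probe = original.to_string() + "/"; // always add a trailing slash, might be an extra one
    if merge_slash.find(&probe).is_none() {
        return original.to_string(); // no duplicate slashes: leave the URI untouched
    }
    let merged = merge_slash.replace_all(&probe, "/");
    if merged.len() > 1 {
        merged.trim_end_matches('/').to_string()
    } else {
        merged.to_string()
    }
}

fn main() {
    assert_eq!(normalize("//indexes///movies//documents"), "/indexes/movies/documents");
    assert_eq!(normalize("/indexes/"), "/indexes"); // trailing slash is stripped
    assert_eq!(normalize("/indexes"), "/indexes");  // untouched, no rewrite needed
}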


@@ -1,83 +0,0 @@
use crate::error::{ResponseError, SResult};
use crate::Data;
use meilisearch_core::Index;
use tide::Request;
pub enum ACL {
Admin,
Private,
Public,
}
pub trait RequestExt {
fn is_allowed(&self, acl: ACL) -> SResult<()>;
fn url_param(&self, name: &str) -> SResult<String>;
fn index(&self) -> SResult<Index>;
fn document_id(&self) -> SResult<String>;
}
impl RequestExt for Request<Data> {
fn is_allowed(&self, acl: ACL) -> SResult<()> {
let user_api_key = self.header("X-Meili-API-Key");
if self.state().api_keys.master.is_none() {
return Ok(())
}
match acl {
ACL::Admin => {
if user_api_key == self.state().api_keys.master.as_deref() {
return Ok(());
}
}
ACL::Private => {
if user_api_key == self.state().api_keys.master.as_deref() {
return Ok(());
}
if user_api_key == self.state().api_keys.private.as_deref() {
return Ok(());
}
}
ACL::Public => {
if user_api_key == self.state().api_keys.master.as_deref() {
return Ok(());
}
if user_api_key == self.state().api_keys.private.as_deref() {
return Ok(());
}
if user_api_key == self.state().api_keys.public.as_deref() {
return Ok(());
}
}
}
Err(ResponseError::InvalidToken(
user_api_key.unwrap_or("Need a token").to_owned(),
))
}
fn url_param(&self, name: &str) -> SResult<String> {
let param = self
.param::<String>(name)
.map_err(|e| ResponseError::bad_parameter(name, e))?;
Ok(param)
}
fn index(&self) -> SResult<Index> {
let index_uid = self.url_param("index")?;
let index = self
.state()
.db
.open_index(&index_uid)
.ok_or(ResponseError::index_not_found(index_uid))?;
Ok(index)
}
fn document_id(&self) -> SResult<String> {
let name = self
.param::<String>("document_id")
.map_err(|_| ResponseError::bad_parameter("documentId", "primaryKey"))?;
Ok(name)
}
}


@@ -6,5 +6,93 @@ pub mod helpers;
pub mod models;
pub mod option;
pub mod routes;
pub mod analytics;
pub mod snapshot;
pub mod dump;

use actix_http::Error;
use actix_service::ServiceFactory;
use actix_web::{dev, web, App};
use chrono::Utc;
use log::error;
use meilisearch_core::{Index, MainWriter, ProcessedUpdateResult};

pub use option::Opt;
pub use self::data::Data;
use self::error::{payload_error_handler, ResponseError};
pub fn create_app(
data: &Data,
) -> App<
impl ServiceFactory<
Config = (),
Request = dev::ServiceRequest,
Response = dev::ServiceResponse<actix_http::body::Body>,
Error = Error,
InitError = (),
>,
actix_http::body::Body,
> {
App::new()
.data(data.clone())
.app_data(
web::JsonConfig::default()
.limit(data.http_payload_size_limit)
.content_type(|_mime| true) // Accept all mime types
.error_handler(|err, _req| payload_error_handler(err).into()),
)
.app_data(
web::QueryConfig::default()
.error_handler(|err, _req| payload_error_handler(err).into())
)
.service(routes::load_html)
.service(routes::load_css)
.configure(routes::document::services)
.configure(routes::index::services)
.configure(routes::search::services)
.configure(routes::setting::services)
.configure(routes::stop_words::services)
.configure(routes::synonym::services)
.configure(routes::health::services)
.configure(routes::stats::services)
.configure(routes::key::services)
.configure(routes::dump::services)
}
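Since `create_app` is now a plain function, integration tests can spin the whole service up in memory with actix-web's test helpers. A rough sketch, not part of the diff; building a working `Opt`/`Data` (a db_path pointing at a temporary directory, etc.) is assumed and `test_opt()` is hypothetical:

#[actix_rt::test]
async fn health_route_works() {
    let data = Data::new(test_opt()).unwrap(); // test_opt() is a hypothetical helper
    let mut app = actix_web::test::init_service(create_app(&data)).await;
    let req = actix_web::test::TestRequest::get().uri("/health").to_request();
    let res = actix_web::test::call_service(&mut app, req).await;
    assert!(res.status().is_success());
}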
pub fn index_update_callback_txn(
    index: Index,
    index_uid: &str,
    data: &Data,
    mut writer: &mut MainWriter,
) -> Result<(), String> {
if let Err(e) = data.db.compute_stats(&mut writer, index_uid) {
return Err(format!("Impossible to compute stats; {}", e));
}
if let Err(e) = data.db.set_last_update(&mut writer, &Utc::now()) {
return Err(format!("Impossible to update last_update; {}", e));
}
if let Err(e) = index.main.put_updated_at(&mut writer) {
return Err(format!("Impossible to update updated_at; {}", e));
}
Ok(())
}
pub fn index_update_callback(index_uid: &str, data: &Data, status: ProcessedUpdateResult) {
if status.error.is_some() {
return;
}
if let Some(index) = data.db.open_index(index_uid) {
let db = &data.db;
let res = db.main_write::<_, _, ResponseError>(|mut writer| {
if let Err(e) = index_update_callback_txn(index, index_uid, data, &mut writer) {
error!("{}", e);
}
Ok(())
});
match res {
Ok(_) => (),
Err(e) => error!("{}", e),
}
}
}


@@ -1,15 +1,12 @@
use std::{env, thread};

use actix_cors::Cors;
use actix_web::{middleware, HttpServer};
use main_error::MainError;
use meilisearch_http::helpers::NormalizePath;
use meilisearch_http::{create_app, index_update_callback, Data, Opt};
use structopt::StructOpt;
use meilisearch_http::{snapshot, dump};

mod analytics;
@@ -17,9 +14,23 @@ mod analytics;

#[global_allocator]
static ALLOC: jemallocator::Jemalloc = jemallocator::Jemalloc;

#[actix_web::main]
async fn main() -> Result<(), MainError> {
    let opt = Opt::from_args();

    #[cfg(all(not(debug_assertions), feature = "sentry"))]
    let _sentry = sentry::init((
        if !opt.no_sentry {
            Some(opt.sentry_dsn.clone())
        } else {
            None
        },
        sentry::ClientOptions {
            release: sentry::release_name!(),
            ..Default::default()
        },
    ));

    match opt.env.as_ref() {
        "production" => {
            if opt.master_key.is_none() {
@@ -28,7 +39,12 @@ pub fn main() -> Result<(), MainError> {
                    .into(),
                );
            }

            #[cfg(all(not(debug_assertions), feature = "sentry"))]
            if !opt.no_sentry && _sentry.is_enabled() {
                sentry::integrations::panic::register_panic_handler(); // TODO: This shouldn't be needed when upgrading to sentry 0.19.0. These integrations are turned on by default when using `sentry::init`.
                sentry::integrations::env_logger::init(None, Default::default());
            }
        }
        "development" => {
            env_logger::from_env(env_logger::Env::default().default_filter_or("info")).init();
        }
@@ -36,27 +52,57 @@ pub fn main() -> Result<(), MainError> {
        _ => unreachable!(),
    }

    if let Some(path) = &opt.load_from_snapshot {
        snapshot::load_snapshot(&opt.db_path, path, opt.ignore_snapshot_if_db_exists, opt.ignore_missing_snapshot)?;
    }

    let data = Data::new(opt.clone())?;

    if !opt.no_analytics {
        let analytics_data = data.clone();
        let analytics_opt = opt.clone();
        thread::spawn(move || analytics::analytics_sender(analytics_data, analytics_opt));
    }

    let data_cloned = data.clone();
    data.db.set_update_callback(Box::new(move |name, status| {
        index_update_callback(name, &data_cloned, status);
    }));

    if let Some(path) = &opt.import_dump {
        dump::import_dump(&data, path, opt.dump_batch_size)?;
    }

    if let Some(path) = &opt.snapshot_path {
        snapshot::schedule_snapshot(data.clone(), &path, opt.snapshot_interval_sec.unwrap_or(86400))?;
    }

    print_launch_resume(&opt, &data);

    let http_server = HttpServer::new(move || {
        create_app(&data)
            .wrap(
                Cors::new()
                    .send_wildcard()
                    .allowed_headers(vec!["content-type", "x-meili-api-key"])
                    .max_age(86_400) // 24h
                    .finish(),
            )
            .wrap(middleware::Logger::default())
            .wrap(middleware::Compress::default())
            .wrap(NormalizePath)
    });

    if let Some(config) = opt.get_ssl_config()? {
        http_server
            .bind_rustls(opt.http_addr, config)?
            .run()
            .await?;
    } else {
        http_server.bind(opt.http_addr)?.run().await?;
    }

    Ok(())
}
@@ -72,37 +118,52 @@ pub fn print_launch_resume(opt: &Opt, data: &Data) {
888   888  "Y8888  888 888 888 "Y8888P"  "Y8888   "Y888888 888   "Y8888P 888  888
"#;

    eprintln!("{}", ascii_name);

    eprintln!("Database path:\t\t{:?}", opt.db_path);
    eprintln!("Server listening on:\t{:?}", opt.http_addr);
    eprintln!("Environment:\t\t{:?}", opt.env);
    eprintln!("Commit SHA:\t\t{:?}", env!("VERGEN_SHA").to_string());
    eprintln!(
        "Build date:\t\t{:?}",
        env!("VERGEN_BUILD_TIMESTAMP").to_string()
    );
    eprintln!(
        "Package version:\t{:?}",
        env!("CARGO_PKG_VERSION").to_string()
    );

    #[cfg(all(not(debug_assertions), feature = "sentry"))]
    eprintln!(
        "Sentry DSN:\t\t{:?}",
        if !opt.no_sentry {
            &opt.sentry_dsn
        } else {
            "Disabled"
        }
    );

    eprintln!(
        "Amplitude Analytics:\t{:?}",
        if !opt.no_analytics {
            "Enabled"
        } else {
            "Disabled"
        }
    );

    eprintln!();

    if data.api_keys.master.is_some() {
        eprintln!("A Master Key has been set. Requests to MeiliSearch won't be authorized unless you provide an authentication key.");
    } else {
        eprintln!("No master key found; The server will accept unidentified requests. \
            If you need some protection in development mode, please export a key: export MEILI_MASTER_KEY=xxx");
    }

    eprintln!();
    eprintln!("Documentation:\t\thttps://docs.meilisearch.com");
    eprintln!("Source code:\t\thttps://github.com/meilisearch/meilisearch");
    eprintln!("Contact:\t\thttps://docs.meilisearch.com/resources/contact.html or bonjour@meilisearch.com");
    eprintln!();
}


@@ -1,8 +1,18 @@
use std::{error, fs};
use std::io::{BufReader, Read};
use std::path::PathBuf;
use std::sync::Arc;

use rustls::internal::pemfile::{certs, pkcs8_private_keys, rsa_private_keys};
use rustls::{
    AllowAnyAnonymousOrAuthenticatedClient, AllowAnyAuthenticatedClient, NoClientAuth,
    RootCertStore,
};
use structopt::StructOpt;

const POSSIBLE_ENV: [&str; 2] = ["development", "production"];

#[derive(Debug, Default, Clone, StructOpt)]
pub struct Opt {
    /// The destination where the database must be created.
    #[structopt(long, env = "MEILI_DB_PATH", default_value = "./data.ms")]
@@ -16,7 +26,18 @@ pub struct Opt {
    #[structopt(long, env = "MEILI_MASTER_KEY")]
    pub master_key: Option<String>,

    /// The Sentry DSN to use for error reporting. This defaults to the MeiliSearch Sentry project.
    /// You can disable Sentry altogether using the `--no-sentry` flag or the `MEILI_NO_SENTRY` environment variable.
    #[cfg(all(not(debug_assertions), feature = "sentry"))]
    #[structopt(long, env = "SENTRY_DSN", default_value = "https://5ddfa22b95f241198be2271aaf028653@sentry.io/3060337")]
    pub sentry_dsn: String,

    /// Disable Sentry error reporting.
    #[cfg(all(not(debug_assertions), feature = "sentry"))]
    #[structopt(long, env = "MEILI_NO_SENTRY")]
    pub no_sentry: bool,

    /// This environment variable must be set to `production` if you are running in production.
    /// If the server is running in development mode more logs will be displayed,
    /// and the master key can be avoided which implies that there is no security on the updates routes.
    /// This is useful to debug when integrating the engine with another service.
@@ -26,4 +47,171 @@ pub struct Opt {
    /// Do not send analytics to Meili.
    #[structopt(long, env = "MEILI_NO_ANALYTICS")]
    pub no_analytics: bool,
/// The maximum size, in bytes, of the main lmdb database directory
#[structopt(long, env = "MEILI_MAX_MDB_SIZE", default_value = "107374182400")] // 100GB
pub max_mdb_size: usize,
/// The maximum size, in bytes, of the update lmdb database directory
#[structopt(long, env = "MEILI_MAX_UDB_SIZE", default_value = "107374182400")] // 100GB
pub max_udb_size: usize,
/// The maximum size, in bytes, of accepted JSON payloads
#[structopt(long, env = "MEILI_HTTP_PAYLOAD_SIZE_LIMIT", default_value = "10485760")] // 10MB
pub http_payload_size_limit: usize,
/// Read server certificates from CERTFILE.
/// This should contain PEM-format certificates
/// in the right order (the first certificate should
/// certify KEYFILE, the last should be a root CA).
#[structopt(long, env = "MEILI_SSL_CERT_PATH", parse(from_os_str))]
pub ssl_cert_path: Option<PathBuf>,
/// Read private key from KEYFILE. This should be a RSA
/// private key or PKCS8-encoded private key, in PEM format.
#[structopt(long, env = "MEILI_SSL_KEY_PATH", parse(from_os_str))]
pub ssl_key_path: Option<PathBuf>,
/// Enable client authentication, and accept certificates
/// signed by those roots provided in CERTFILE.
#[structopt(long, env = "MEILI_SSL_AUTH_PATH", parse(from_os_str))]
pub ssl_auth_path: Option<PathBuf>,
/// Read DER-encoded OCSP response from OCSPFILE and staple to certificate.
/// Optional
#[structopt(long, env = "MEILI_SSL_OCSP_PATH", parse(from_os_str))]
pub ssl_ocsp_path: Option<PathBuf>,
/// Send a fatal alert if the client does not complete client authentication.
#[structopt(long, env = "MEILI_SSL_REQUIRE_AUTH")]
pub ssl_require_auth: bool,
/// SSL support session resumption
#[structopt(long, env = "MEILI_SSL_RESUMPTION")]
pub ssl_resumption: bool,
/// SSL support tickets.
#[structopt(long, env = "MEILI_SSL_TICKETS")]
pub ssl_tickets: bool,
/// Defines the path of the snapshot file to import.
/// This option will, by default, stop the process if a database already exists or if no snapshot exists at
/// the given path. If this option is not specified no snapshot is imported.
#[structopt(long, env = "MEILI_LOAD_FROM_SNAPSHOT")]
pub load_from_snapshot: Option<PathBuf>,
/// The engine will ignore a missing snapshot and will not return an error in that case.
#[structopt(long, requires = "load-from-snapshot", env = "MEILI_IGNORE_MISSING_SNAPSHOT")]
pub ignore_missing_snapshot: bool,
/// The engine will skip the snapshot importation and will not return an error in that case.
#[structopt(long, requires = "load-from-snapshot", env = "MEILI_IGNORE_SNAPSHOT_IF_DB_EXISTS")]
pub ignore_snapshot_if_db_exists: bool,
/// Defines the directory path where MeiliSearch will create a snapshot every `snapshot_interval_sec` seconds.
#[structopt(long, env = "MEILI_SNAPSHOT_PATH")]
pub snapshot_path: Option<PathBuf>,
/// Defines time interval, in seconds, between each snapshot creation.
#[structopt(long, requires = "snapshot-path", env = "MEILI_SNAPSHOT_INTERVAL_SEC")]
pub snapshot_interval_sec: Option<u64>,
/// Folder where dumps are created when the dump route is called.
#[structopt(long, env = "MEILI_DUMPS_FOLDER", default_value = "dumps/")]
pub dumps_folder: PathBuf,
/// Import a dump from the specified path, must be a `.tar.gz` file.
#[structopt(long, env = "MEILI_IMPORT_DUMP", conflicts_with = "load-from-snapshot")]
pub import_dump: Option<PathBuf>,
/// The batch size used in the dump importation process; the bigger it is, the faster the import.
#[structopt(long, env = "MEILI_DUMP_BATCH_SIZE", default_value = "1024")]
pub dump_batch_size: usize,
}
impl Opt {
pub fn get_ssl_config(&self) -> Result<Option<rustls::ServerConfig>, Box<dyn error::Error>> {
if let (Some(cert_path), Some(key_path)) = (&self.ssl_cert_path, &self.ssl_key_path) {
let client_auth = match &self.ssl_auth_path {
Some(auth_path) => {
let roots = load_certs(auth_path.to_path_buf())?;
let mut client_auth_roots = RootCertStore::empty();
for root in roots {
client_auth_roots.add(&root).unwrap();
}
if self.ssl_require_auth {
AllowAnyAuthenticatedClient::new(client_auth_roots)
} else {
AllowAnyAnonymousOrAuthenticatedClient::new(client_auth_roots)
}
}
None => NoClientAuth::new(),
};
let mut config = rustls::ServerConfig::new(client_auth);
config.key_log = Arc::new(rustls::KeyLogFile::new());
let certs = load_certs(cert_path.to_path_buf())?;
let privkey = load_private_key(key_path.to_path_buf())?;
let ocsp = load_ocsp(&self.ssl_ocsp_path)?;
config
.set_single_cert_with_ocsp_and_sct(certs, privkey, ocsp, vec![])
.map_err(|_| "bad certificates/private key")?;
if self.ssl_resumption {
config.set_persistence(rustls::ServerSessionMemoryCache::new(256));
}
if self.ssl_tickets {
config.ticketer = rustls::Ticketer::new();
}
Ok(Some(config))
} else {
Ok(None)
}
}
}
fn load_certs(filename: PathBuf) -> Result<Vec<rustls::Certificate>, Box<dyn error::Error>> {
let certfile = fs::File::open(filename).map_err(|_| "cannot open certificate file")?;
let mut reader = BufReader::new(certfile);
Ok(certs(&mut reader).map_err(|_| "cannot read certificate file")?)
}
fn load_private_key(filename: PathBuf) -> Result<rustls::PrivateKey, Box<dyn error::Error>> {
let rsa_keys = {
let keyfile =
fs::File::open(filename.clone()).map_err(|_| "cannot open private key file")?;
let mut reader = BufReader::new(keyfile);
rsa_private_keys(&mut reader).map_err(|_| "file contains invalid rsa private key")?
};
let pkcs8_keys = {
let keyfile = fs::File::open(filename).map_err(|_| "cannot open private key file")?;
let mut reader = BufReader::new(keyfile);
pkcs8_private_keys(&mut reader)
.map_err(|_| "file contains invalid pkcs8 private key (encrypted keys not supported)")?
};
// prefer to load pkcs8 keys
if !pkcs8_keys.is_empty() {
Ok(pkcs8_keys[0].clone())
} else {
assert!(!rsa_keys.is_empty());
Ok(rsa_keys[0].clone())
}
}
fn load_ocsp(filename: &Option<PathBuf>) -> Result<Vec<u8>, Box<dyn error::Error>> {
let mut ret = Vec::new();
if let Some(ref name) = filename {
fs::File::open(name)
.map_err(|_| "cannot open ocsp file")?
.read_to_end(&mut ret)
.map_err(|_| "cannot read oscp file")?;
}
Ok(ret)
}
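TLS only switches on when both a certificate and a key are given. A hypothetical structopt parse illustrating the new flags (the paths below are placeholders, not files from the repository):

use structopt::StructOpt;

fn main() {
    let opt = Opt::from_iter(&[
        "meilisearch",
        "--ssl-cert-path", "/path/to/cert.pem",
        "--ssl-key-path", "/path/to/key.pem",
    ]);
    // get_ssl_config() returns Ok(Some(_)) only when both paths are set
    // (and the files actually parse as PEM certificates and keys)
    assert!(opt.ssl_cert_path.is_some() && opt.ssl_key_path.is_some());
}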


@@ -1,62 +1,83 @@
use std::collections::{BTreeSet, HashSet};

use actix_web::{delete, get, post, put};
use actix_web::{web, HttpResponse};
use indexmap::IndexMap;
use meilisearch_core::{update, MainReader};
use serde_json::Value;
use serde::Deserialize;

use crate::Data;
use crate::error::{Error, ResponseError};
use crate::helpers::Authentication;
use crate::routes::{IndexParam, IndexUpdateResponse};
type Document = IndexMap<String, Value>;

#[derive(Deserialize)]
struct DocumentParam {
    index_uid: String,
    document_id: String,
}

pub fn services(cfg: &mut web::ServiceConfig) {
    cfg.service(get_document)
        .service(delete_document)
        .service(get_all_documents)
        .service(add_documents)
        .service(update_documents)
        .service(delete_documents)
        .service(clear_all_documents);
}
#[get(
    "/indexes/{index_uid}/documents/{document_id}",
    wrap = "Authentication::Public"
)]
async fn get_document(
    data: web::Data<Data>,
    path: web::Path<DocumentParam>,
) -> Result<HttpResponse, ResponseError> {
    let index = data
        .db
        .open_index(&path.index_uid)
        .ok_or(Error::index_not_found(&path.index_uid))?;

    let reader = data.db.main_read_txn()?;

    let internal_id = index.main
        .external_to_internal_docid(&reader, &path.document_id)?
        .ok_or(Error::document_not_found(&path.document_id))?;

    let document: Document = index
        .document(&reader, None, internal_id)?
        .ok_or(Error::document_not_found(&path.document_id))?;

    Ok(HttpResponse::Ok().json(document))
}

#[delete(
    "/indexes/{index_uid}/documents/{document_id}",
    wrap = "Authentication::Private"
)]
async fn delete_document(
    data: web::Data<Data>,
    path: web::Path<DocumentParam>,
) -> Result<HttpResponse, ResponseError> {
    let index = data
        .db
        .open_index(&path.index_uid)
        .ok_or(Error::index_not_found(&path.index_uid))?;

    let mut documents_deletion = index.documents_deletion();
    documents_deletion.delete_document_by_external_docid(path.document_id.clone());

    let update_id = data.db.update_write(|w| documents_deletion.finalize(w))?;

    Ok(HttpResponse::Accepted().json(IndexUpdateResponse::with_id(update_id)))
}
#[derive(Deserialize)]
#[serde(rename_all = "camelCase", deny_unknown_fields)]
struct BrowseQuery {
    offset: Option<usize>,
@@ -64,48 +85,63 @@ struct BrowseQuery {
    attributes_to_retrieve: Option<String>,
}
pub fn get_all_documents_sync(
    data: &web::Data<Data>,
    reader: &MainReader,
    index_uid: &str,
    offset: usize,
    limit: usize,
    attributes_to_retrieve: Option<&String>
) -> Result<Vec<Document>, Error> {
    let index = data
        .db
        .open_index(index_uid)
        .ok_or(Error::index_not_found(index_uid))?;

    let documents_ids: Result<BTreeSet<_>, _> = index
        .documents_fields_counts
        .documents_ids(reader)?
        .skip(offset)
        .take(limit)
        .collect();

    let attributes: Option<HashSet<&str>> = attributes_to_retrieve
        .map(|a| a.split(',').collect());

    let mut documents = Vec::new();
    for document_id in documents_ids? {
        if let Ok(Some(document)) =
            index.document::<Document>(reader, attributes.as_ref(), document_id)
        {
            documents.push(document);
        }
    }

    Ok(documents)
}

#[get("/indexes/{index_uid}/documents", wrap = "Authentication::Public")]
async fn get_all_documents(
    data: web::Data<Data>,
    path: web::Path<IndexParam>,
    params: web::Query<BrowseQuery>,
) -> Result<HttpResponse, ResponseError> {
    let offset = params.offset.unwrap_or(0);
    let limit = params.limit.unwrap_or(20);
    let index_uid = &path.index_uid;
    let reader = data.db.main_read_txn()?;

    let documents = get_all_documents_sync(
        &data,
        &reader,
        index_uid,
        offset,
        limit,
        params.attributes_to_retrieve.as_ref()
    )?;

    Ok(HttpResponse::Ok().json(documents))
}
fn find_primary_key(document: &IndexMap<String, Value>) -> Option<String> {
@@ -117,42 +153,45 @@ fn find_primary_key(document: &IndexMap<String, Value>) -> Option<String> {
    None
}

#[derive(Deserialize)]
#[serde(rename_all = "camelCase", deny_unknown_fields)]
struct UpdateDocumentsQuery {
    primary_key: Option<String>,
}
async fn update_multiple_documents(
    data: web::Data<Data>,
    path: web::Path<IndexParam>,
    params: web::Query<UpdateDocumentsQuery>,
    body: web::Json<Vec<Document>>,
    is_partial: bool,
) -> Result<HttpResponse, ResponseError> {
    let index = data
        .db
        .open_index(&path.index_uid)
        .ok_or(Error::index_not_found(&path.index_uid))?;

    let reader = data.db.main_read_txn()?;

    let mut schema = index
        .main
        .schema(&reader)?
        .ok_or(meilisearch_core::Error::SchemaMissing)?;

    if schema.primary_key().is_none() {
        let id = match &params.primary_key {
            Some(id) => id.to_string(),
            None => body
                .first()
                .and_then(find_primary_key)
                .ok_or(meilisearch_core::Error::MissingPrimaryKey)?
        };

        schema
            .set_primary_key(&id)
            .map_err(Error::bad_request)?;

        data.db.main_write(|w| index.main.put_schema(w, &schema))?;
    }

    let mut document_addition = if is_partial {
@@ -161,63 +200,73 @@ async fn update_multiple_documents(mut ctx: Request<Data>, is_partial: bool) ->
        index.documents_addition()
    };

    for document in body.into_inner() {
        document_addition.update_document(document);
    }

    let update_id = data.db.update_write(|w| document_addition.finalize(w))?;

    Ok(HttpResponse::Accepted().json(IndexUpdateResponse::with_id(update_id)))
}
#[post("/indexes/{index_uid}/documents", wrap = "Authentication::Private")]
async fn add_documents(
    data: web::Data<Data>,
    path: web::Path<IndexParam>,
    params: web::Query<UpdateDocumentsQuery>,
    body: web::Json<Vec<Document>>,
) -> Result<HttpResponse, ResponseError> {
    update_multiple_documents(data, path, params, body, false).await
}

#[put("/indexes/{index_uid}/documents", wrap = "Authentication::Private")]
async fn update_documents(
    data: web::Data<Data>,
    path: web::Path<IndexParam>,
    params: web::Query<UpdateDocumentsQuery>,
    body: web::Json<Vec<Document>>,
) -> Result<HttpResponse, ResponseError> {
    update_multiple_documents(data, path, params, body, true).await
}
#[post(
    "/indexes/{index_uid}/documents/delete-batch",
    wrap = "Authentication::Private"
)]
async fn delete_documents(
    data: web::Data<Data>,
    path: web::Path<IndexParam>,
    body: web::Json<Vec<Value>>,
) -> Result<HttpResponse, ResponseError> {
    let index = data
        .db
        .open_index(&path.index_uid)
        .ok_or(Error::index_not_found(&path.index_uid))?;

    let mut documents_deletion = index.documents_deletion();

    for document_id in body.into_inner() {
        let document_id = update::value_to_string(&document_id);
        documents_deletion.delete_document_by_external_docid(document_id);
    }

    let update_id = data.db.update_write(|w| documents_deletion.finalize(w))?;

    Ok(HttpResponse::Accepted().json(IndexUpdateResponse::with_id(update_id)))
}

#[delete("/indexes/{index_uid}/documents", wrap = "Authentication::Private")]
async fn clear_all_documents(
    data: web::Data<Data>,
    path: web::Path<IndexParam>,
) -> Result<HttpResponse, ResponseError> {
    let index = data
        .db
        .open_index(&path.index_uid)
        .ok_or(Error::index_not_found(&path.index_uid))?;

    let update_id = data.db.update_write(|w| index.clear_all(w))?;

    Ok(HttpResponse::Accepted().json(IndexUpdateResponse::with_id(update_id)))
}
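The delete-batch body may mix JSON strings and numbers. Assuming `update::value_to_string` renders both to their external-docid form (a string keeps its content, a number its decimal notation; this rendering is an assumption, not confirmed by the diff), the mapping looks like:

use serde_json::Value;

fn main() {
    let ids: Vec<Value> = serde_json::from_str(r#"[1, "abc-42"]"#).unwrap();
    // assumed behavior of meilisearch_core::update::value_to_string
    let external: Vec<String> = ids.iter().map(update::value_to_string).collect();
    assert_eq!(external, vec!["1".to_string(), "abc-42".to_string()]);
}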


@@ -0,0 +1,64 @@
use std::fs::File;
use std::path::Path;
use actix_web::{get, post};
use actix_web::{HttpResponse, web};
use serde::{Deserialize, Serialize};
use crate::dump::{DumpInfo, DumpStatus, compressed_dumps_folder, init_dump_process};
use crate::Data;
use crate::error::{Error, ResponseError};
use crate::helpers::Authentication;
pub fn services(cfg: &mut web::ServiceConfig) {
cfg.service(trigger_dump)
.service(get_dump_status);
}
#[post("/dumps", wrap = "Authentication::Private")]
async fn trigger_dump(
data: web::Data<Data>,
) -> Result<HttpResponse, ResponseError> {
let dumps_folder = Path::new(&data.dumps_folder);
match init_dump_process(&data, &dumps_folder) {
Ok(resume) => Ok(HttpResponse::Accepted().json(resume)),
Err(e) => Err(e.into())
}
}
#[derive(Debug, Serialize)]
#[serde(rename_all = "camelCase")]
struct DumpStatusResponse {
status: String,
}
#[derive(Deserialize)]
struct DumpParam {
dump_uid: String,
}
#[get("/dumps/{dump_uid}/status", wrap = "Authentication::Private")]
async fn get_dump_status(
data: web::Data<Data>,
path: web::Path<DumpParam>,
) -> Result<HttpResponse, ResponseError> {
let dumps_folder = Path::new(&data.dumps_folder);
let dump_uid = &path.dump_uid;
if let Some(resume) = DumpInfo::get_current() {
if &resume.uid == dump_uid {
return Ok(HttpResponse::Ok().json(resume));
}
}
if File::open(compressed_dumps_folder(Path::new(dumps_folder), dump_uid)).is_ok() {
let resume = DumpInfo::new(
dump_uid.into(),
DumpStatus::Done
);
Ok(HttpResponse::Ok().json(resume))
} else {
Err(Error::not_found("dump does not exist").into())
}
}
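The status lookup gives priority to an in-flight dump over an archive already on disk. The same decision, written out as a hypothetical standalone helper (a sketch mirroring get_dump_status above, not part of the diff):

use std::fs::File;
use std::path::Path;

fn dump_status(dump_uid: &str, dumps_folder: &Path) -> Option<DumpInfo> {
    if let Some(resume) = DumpInfo::get_current() {
        if resume.uid == dump_uid {
            return Some(resume); // a dump with this uid is still running
        }
    }
    // otherwise, a finished dump exists iff its compressed archive is on disk
    File::open(compressed_dumps_folder(dumps_folder, dump_uid))
        .ok()
        .map(|_| DumpInfo::new(dump_uid.into(), DumpStatus::Done))
}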


@ -1,60 +1,47 @@
use crate::error::{ResponseError, SResult}; use actix_web::{web, HttpResponse};
use crate::helpers::tide::RequestExt; use actix_web::{get, put};
use crate::helpers::tide::ACL::*; use serde::Deserialize;
use crate::error::{Error, ResponseError};
use crate::helpers::Authentication;
use crate::Data; use crate::Data;
use heed::types::{Str, Unit}; pub fn services(cfg: &mut web::ServiceConfig) {
use serde::Deserialize; cfg.service(get_health).service(change_healthyness);
use tide::{Request, Response}; }
const UNHEALTHY_KEY: &str = "_is_unhealthy"; #[get("/health")]
async fn get_health(data: web::Data<Data>) -> Result<HttpResponse, ResponseError> {
pub async fn get_health(ctx: Request<Data>) -> SResult<Response> { let reader = data.db.main_read_txn()?;
let db = &ctx.state().db; if let Ok(Some(_)) = data.db.get_health(&reader) {
let reader = db.main_read_txn()?; return Err(Error::Maintenance.into());
let common_store = ctx.state().db.common_store();
if let Ok(Some(_)) = common_store.get::<_, Str, Unit>(&reader, UNHEALTHY_KEY) {
return Err(ResponseError::Maintenance);
} }
Ok(HttpResponse::Ok().finish())
Ok(tide::Response::new(200))
} }
pub async fn set_healthy(ctx: Request<Data>) -> SResult<Response> { async fn set_healthy(data: web::Data<Data>) -> Result<HttpResponse, ResponseError> {
ctx.is_allowed(Admin)?; data.db.main_write(|w| data.db.set_healthy(w))?;
let db = &ctx.state().db; Ok(HttpResponse::Ok().finish())
let mut writer = db.main_write_txn()?;
let common_store = ctx.state().db.common_store();
common_store.delete::<_, Str>(&mut writer, UNHEALTHY_KEY)?;
writer.commit()?;
Ok(tide::Response::new(200))
} }
pub async fn set_unhealthy(ctx: Request<Data>) -> SResult<Response> { async fn set_unhealthy(data: web::Data<Data>) -> Result<HttpResponse, ResponseError> {
ctx.is_allowed(Admin)?; data.db.main_write(|w| data.db.set_unhealthy(w))?;
let db = &ctx.state().db; Ok(HttpResponse::Ok().finish())
let mut writer = db.main_write_txn()?;
let common_store = ctx.state().db.common_store();
common_store.put::<_, Str, Unit>(&mut writer, UNHEALTHY_KEY, &())?;
writer.commit()?;
Ok(tide::Response::new(200))
} }
#[derive(Deserialize, Clone)] #[derive(Deserialize, Clone)]
struct HealtBody { struct HealthBody {
health: bool, health: bool,
} }
pub async fn change_healthyness(mut ctx: Request<Data>) -> SResult<Response> { #[put("/health", wrap = "Authentication::Private")]
let body: HealtBody = ctx.body_json().await.map_err(ResponseError::bad_request)?; async fn change_healthyness(
data: web::Data<Data>,
body: web::Json<HealthBody>,
) -> Result<HttpResponse, ResponseError> {
if body.health { if body.health {
set_healthy(ctx).await set_healthy(data).await
} else { } else {
set_unhealthy(ctx).await set_unhealthy(data).await
} }
} }
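The PUT body is a single boolean flag. A quick deserialization check (in-module, since `HealthBody` is private; assumes serde_json is available):

let body: HealthBody = serde_json::from_str(r#"{ "health": false }"#).unwrap();
assert!(!body.health); // false routes to set_unhealthy, true to set_healthy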


@@ -1,15 +1,26 @@
use actix_web::{delete, get, post, put};
use actix_web::{web, HttpResponse};
use chrono::{DateTime, Utc};
use log::error;
use meilisearch_core::{Database, MainReader, UpdateReader};
use meilisearch_core::update::UpdateStatus;
use rand::seq::SliceRandom;
use serde::{Deserialize, Serialize};

use crate::Data;
use crate::error::{Error, ResponseError};
use crate::helpers::Authentication;
use crate::routes::IndexParam;

pub fn services(cfg: &mut web::ServiceConfig) {
    cfg.service(list_indexes)
        .service(get_index)
        .service(create_index)
        .service(update_index)
        .service(delete_index)
        .service(get_update_status)
        .service(get_all_updates_status);
}
fn generate_uid() -> String {
    let mut rng = rand::thread_rng();
@@ -20,26 +31,41 @@ fn generate_uid() -> String {
        .collect()
}
#[derive(Debug, Serialize, Deserialize, Clone)]
#[serde(rename_all = "camelCase")]
pub struct IndexResponse {
    pub name: String,
    pub uid: String,
    created_at: DateTime<Utc>,
    updated_at: DateTime<Utc>,
    pub primary_key: Option<String>,
}

pub fn list_indexes_sync(data: &web::Data<Data>, reader: &MainReader) -> Result<Vec<IndexResponse>, ResponseError> {
    let mut indexes = Vec::new();

    for index_uid in data.db.indexes_uids() {
        let index = data.db.open_index(&index_uid);

        match index {
            Some(index) => {
                let name = index.main.name(reader)?.ok_or(Error::internal(
                    "Impossible to get the name of an index",
                ))?;
                let created_at = index
                    .main
                    .created_at(reader)?
                    .ok_or(Error::internal(
                        "Impossible to get the create date of an index",
                    ))?;
                let updated_at = index
                    .main
                    .updated_at(reader)?
                    .ok_or(Error::internal(
                        "Impossible to get the last update date of an index",
                    ))?;

                let primary_key = match index.main.schema(reader) {
                    Ok(Some(schema)) => match schema.primary_key() {
                        Some(primary_key) => Some(primary_key.to_owned()),
                        None => None,
                    },
@@ -54,7 +80,7 @@ pub async fn list_indexes(ctx: Request<Data>) -> SResult<Response> {
                    updated_at,
                    primary_key,
                };
                indexes.push(index_response);
            }
            None => error!(
                "Index {} is referenced in the indexes list but cannot be found",
@@ -63,31 +89,43 @@ pub async fn list_indexes(ctx: Request<Data>) -> SResult<Response> {
        }
    }

    Ok(indexes)
}
#[get("/indexes", wrap = "Authentication::Private")]
async fn list_indexes(data: web::Data<Data>) -> Result<HttpResponse, ResponseError> {
    let reader = data.db.main_read_txn()?;
    let indexes = list_indexes_sync(&data, &reader)?;

    Ok(HttpResponse::Ok().json(indexes))
}

#[get("/indexes/{index_uid}", wrap = "Authentication::Private")]
async fn get_index(
    data: web::Data<Data>,
    path: web::Path<IndexParam>,
) -> Result<HttpResponse, ResponseError> {
    let index = data
        .db
        .open_index(&path.index_uid)
        .ok_or(Error::index_not_found(&path.index_uid))?;

    let reader = data.db.main_read_txn()?;
    let name = index.main.name(&reader)?.ok_or(Error::internal(
        "Impossible to get the name of an index",
    ))?;
    let created_at = index
        .main
        .created_at(&reader)?
        .ok_or(Error::internal(
            "Impossible to get the create date of an index",
        ))?;
    let updated_at = index
        .main
        .updated_at(&reader)?
        .ok_or(Error::internal(
            "Impossible to get the last update date of an index",
        ))?;

    let primary_key = match index.main.schema(&reader) {
        Ok(Some(schema)) => match schema.primary_key() {
@@ -96,16 +134,15 @@ pub async fn get_index(ctx: Request<Data>) -> SResult<Response> {
        },
        _ => None,
    };
    let index_response = IndexResponse {
        name,
        uid: path.index_uid.clone(),
        created_at,
        updated_at,
        primary_key,
    };

    Ok(HttpResponse::Ok().json(index_response))
}
#[derive(Debug, Deserialize)]
@@ -116,86 +153,90 @@ struct IndexCreateRequest {
    primary_key: Option<String>,
}
pub fn create_index_sync(
    database: &std::sync::Arc<Database>,
    uid: String,
    name: String,
    primary_key: Option<String>,
) -> Result<IndexResponse, Error> {
    let created_index = database
        .create_index(&uid)
        .map_err(|e| match e {
            meilisearch_core::Error::IndexAlreadyExists => Error::IndexAlreadyExists(uid.clone()),
            _ => Error::create_index(e)
        })?;

    let index_response = database.main_write::<_, _, Error>(|mut write_txn| {
        created_index.main.put_name(&mut write_txn, &name)?;

        let created_at = created_index
            .main
            .created_at(&write_txn)?
            .ok_or(Error::internal("Impossible to read created at"))?;

        let updated_at = created_index
            .main
            .updated_at(&write_txn)?
            .ok_or(Error::internal("Impossible to read updated at"))?;

        if let Some(id) = primary_key.clone() {
            if let Some(mut schema) = created_index.main.schema(&write_txn)? {
                schema
                    .set_primary_key(&id)
                    .map_err(Error::bad_request)?;
                created_index.main.put_schema(&mut write_txn, &schema)?;
            }
        }
        let index_response = IndexResponse {
            name,
            uid,
            created_at,
            updated_at,
            primary_key,
        };
        Ok(index_response)
    })?;

    Ok(index_response)
}
#[post("/indexes", wrap = "Authentication::Private")]
async fn create_index(
    data: web::Data<Data>,
    body: web::Json<IndexCreateRequest>,
) -> Result<HttpResponse, ResponseError> {
    if let (None, None) = (body.name.clone(), body.uid.clone()) {
        return Err(Error::bad_request(
            "Index creation must have an uid",
        ).into());
    }

    let uid = match &body.uid {
        Some(uid) => {
            if uid
                .chars()
                .all(|x| x.is_ascii_alphanumeric() || x == '-' || x == '_')
            {
                uid.to_owned()
            } else {
                return Err(Error::InvalidIndexUid.into());
            }
        }
        None => loop {
            let uid = generate_uid();
            if data.db.open_index(&uid).is_none() {
                break uid;
            }
        },
    };

    let name = body.name.as_ref().unwrap_or(&uid).to_string();

    let index_response = create_index_sync(&data.db, uid, name, body.primary_key.clone())?;

    Ok(HttpResponse::Created().json(index_response))
}
#[derive(Debug, Deserialize)]
@@ -215,49 +256,48 @@ struct UpdateIndexResponse {
    primary_key: Option<String>,
}
#[put("/indexes/{index_uid}", wrap = "Authentication::Private")]
async fn update_index(
    data: web::Data<Data>,
    path: web::Path<IndexParam>,
    body: web::Json<IndexCreateRequest>,
) -> Result<HttpResponse, ResponseError> {
    let index = data
        .db
        .open_index(&path.index_uid)
        .ok_or(Error::index_not_found(&path.index_uid))?;

    data.db.main_write::<_, _, ResponseError>(|writer| {
        if let Some(name) = &body.name {
            index.main.put_name(writer, name)?;
        }

        if let Some(id) = body.primary_key.clone() {
            if let Some(mut schema) = index.main.schema(writer)? {
                schema.set_primary_key(&id)?;
                index.main.put_schema(writer, &schema)?;
            }
        }
        index.main.put_updated_at(writer)?;
        Ok(())
    })?;

    let reader = data.db.main_read_txn()?;
    let name = index.main.name(&reader)?.ok_or(Error::internal(
        "Impossible to get the name of an index",
    ))?;
    let created_at = index
        .main
        .created_at(&reader)?
        .ok_or(Error::internal(
            "Impossible to get the create date of an index",
        ))?;
    let updated_at = index
        .main
        .updated_at(&reader)?
        .ok_or(Error::internal(
            "Impossible to get the last update date of an index",
        ))?;

    let primary_key = match index.main.schema(&reader) {
        Ok(Some(schema)) => match schema.primary_key() {
@@ -267,86 +307,82 @@ pub async fn update_index(mut ctx: Request<Data>) -> SResult<Response> {
        },
        _ => None,
    };

    let index_response = IndexResponse {
        name,
        uid: path.index_uid.clone(),
        created_at,
        updated_at,
        primary_key,
    };

    Ok(HttpResponse::Ok().json(index_response))
}
-pub async fn get_update_status(ctx: Request<Data>) -> SResult<Response> {
-    ctx.is_allowed(Private)?;
-
-    let db = &ctx.state().db;
-    let reader = db.update_read_txn()?;
-
-    let update_id = ctx
-        .param::<u64>("update_id")
-        .map_err(|e| ResponseError::bad_parameter("update_id", e))?;
-
-    let index = ctx.index()?;
-    let status = index.update_status(&reader, update_id)?;
-
-    let response = match status {
-        Some(status) => tide::Response::new(200).body_json(&status).unwrap(),
-        None => tide::Response::new(404)
-            .body_json(&json!({ "message": "unknown update id" }))
-            .unwrap(),
-    };
-
-    Ok(response)
-}
-
-pub async fn get_all_updates_status(ctx: Request<Data>) -> SResult<Response> {
-    ctx.is_allowed(Private)?;
-    let db = &ctx.state().db;
-    let reader = db.update_read_txn()?;
-    let index = ctx.index()?;
-    let response = index.all_updates_status(&reader)?;
-    Ok(tide::Response::new(200).body_json(&response).unwrap())
-}
-
-pub async fn delete_index(ctx: Request<Data>) -> SResult<Response> {
-    ctx.is_allowed(Private)?;
-    let _ = ctx.index()?;
-    let index_uid = ctx.url_param("index")?;
-
-    ctx.state().db.delete_index(&index_uid)?;
-
-    Ok(tide::Response::new(204))
-}
-
-pub fn index_update_callback(index_uid: &str, data: &Data, status: ProcessedUpdateResult) {
-    if status.error.is_some() {
-        return;
-    }
-
-    if let Some(index) = data.db.open_index(&index_uid) {
-        let db = &data.db;
-        let mut writer = match db.main_write_txn() {
-            Ok(writer) => writer,
-            Err(e) => {
-                error!("Impossible to get write_txn; {}", e);
-                return;
-            }
-        };
-
-        if let Err(e) = data.compute_stats(&mut writer, &index_uid) {
-            error!("Impossible to compute stats; {}", e)
-        }
-
-        if let Err(e) = data.set_last_update(&mut writer) {
-            error!("Impossible to update last_update; {}", e)
-        }
-
-        if let Err(e) = index.main.put_updated_at(&mut writer) {
-            error!("Impossible to update updated_at; {}", e)
-        }
-
-        if let Err(e) = writer.commit() {
-            error!("Impossible to get write_txn; {}", e);
-        }
-    }
-}
+#[delete("/indexes/{index_uid}", wrap = "Authentication::Private")]
+async fn delete_index(
+    data: web::Data<Data>,
+    path: web::Path<IndexParam>,
+) -> Result<HttpResponse, ResponseError> {
+    if data.db.delete_index(&path.index_uid)? {
+        Ok(HttpResponse::NoContent().finish())
+    } else {
+        Err(Error::index_not_found(&path.index_uid).into())
+    }
+}
+
+#[derive(Deserialize)]
+struct UpdateParam {
+    index_uid: String,
+    update_id: u64,
+}
+
+#[get(
+    "/indexes/{index_uid}/updates/{update_id}",
+    wrap = "Authentication::Private"
+)]
+async fn get_update_status(
+    data: web::Data<Data>,
+    path: web::Path<UpdateParam>,
+) -> Result<HttpResponse, ResponseError> {
+    let index = data
+        .db
+        .open_index(&path.index_uid)
+        .ok_or(Error::index_not_found(&path.index_uid))?;
+
+    let reader = data.db.update_read_txn()?;
+
+    let status = index.update_status(&reader, path.update_id)?;
+
+    match status {
+        Some(status) => Ok(HttpResponse::Ok().json(status)),
+        None => Err(Error::NotFound(format!(
+            "Update {}",
+            path.update_id
+        )).into()),
+    }
+}
+
+pub fn get_all_updates_status_sync(
+    data: &web::Data<Data>,
+    reader: &UpdateReader,
+    index_uid: &str,
+) -> Result<Vec<UpdateStatus>, Error> {
+    let index = data
+        .db
+        .open_index(index_uid)
+        .ok_or(Error::index_not_found(index_uid))?;
+
+    Ok(index.all_updates_status(reader)?)
+}
+
+#[get("/indexes/{index_uid}/updates", wrap = "Authentication::Private")]
+async fn get_all_updates_status(
+    data: web::Data<Data>,
+    path: web::Path<IndexParam>,
+) -> Result<HttpResponse, ResponseError> {
+    let reader = data.db.update_read_txn()?;
+    let response = get_all_updates_status_sync(&data, &reader, &path.index_uid)?;
+    Ok(HttpResponse::Ok().json(response))
+}
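Each migrated route module exposes a services registrar (key.rs below shows the pattern). The registrar for this index module sits outside the hunks shown; by analogy it would look roughly like the sketch below, with list_indexes and get_index, which are defined earlier in the file, omitted here:

// Sketch by analogy with key::services further down -- the actual
// registrar for index.rs is not part of the hunks shown in this diff.
pub fn services(cfg: &mut web::ServiceConfig) {
    cfg.service(create_index)
        .service(update_index)
        .service(delete_index)
        .service(get_update_status)
        .service(get_all_updates_status);
}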

View File

@@ -1,17 +1,26 @@
-use crate::error::SResult;
-use crate::helpers::tide::RequestExt;
-use crate::helpers::tide::ACL::*;
+use actix_web::web;
+use actix_web::HttpResponse;
+use actix_web::get;
+use serde::Serialize;
+
+use crate::helpers::Authentication;
 use crate::Data;
-use serde_json::json;
-use tide::{Request, Response};

-pub async fn list(ctx: Request<Data>) -> SResult<Response> {
-    ctx.is_allowed(Admin)?;
-
-    let keys = &ctx.state().api_keys;
-
-    Ok(tide::Response::new(200).body_json(&json!({
-        "private": keys.private,
-        "public": keys.public,
-    }))?)
+pub fn services(cfg: &mut web::ServiceConfig) {
+    cfg.service(list);
+}
+
+#[derive(Serialize)]
+struct KeysResponse {
+    private: Option<String>,
+    public: Option<String>,
+}
+
+#[get("/keys", wrap = "Authentication::Admin")]
+async fn list(data: web::Data<Data>) -> HttpResponse {
+    let api_keys = data.api_keys.clone();
+    HttpResponse::Ok().json(KeysResponse {
+        private: api_keys.private,
+        public: api_keys.public,
+    })
 }

View File

@@ -1,7 +1,5 @@
-use crate::data::Data;
-use std::future::Future;
-use tide::IntoResponse;
-use tide::Response;
+use actix_web::{get, HttpResponse};
+use serde::{Deserialize, Serialize};

 pub mod document;
 pub mod health;
@@ -12,119 +10,35 @@ pub mod setting;
 pub mod stats;
 pub mod stop_words;
 pub mod synonym;
+pub mod dump;

-async fn into_response<T: IntoResponse, U: IntoResponse>(
-    x: impl Future<Output = Result<T, U>>,
-) -> Response {
-    match x.await {
-        Ok(resp) => resp.into_response(),
-        Err(resp) => resp.into_response(),
-    }
-}
+#[derive(Deserialize)]
+pub struct IndexParam {
+    index_uid: String,
+}
+
+#[derive(Serialize)]
+#[serde(rename_all = "camelCase")]
+pub struct IndexUpdateResponse {
+    pub update_id: u64,
+}
+
+impl IndexUpdateResponse {
+    pub fn with_id(update_id: u64) -> Self {
+        Self { update_id }
+    }
+}

-pub fn load_routes(app: &mut tide::Server<Data>) {
-    app.at("/").get(|_| async {
-        tide::Response::new(200)
-            .body_string(include_str!("../../public/interface.html").to_string())
-            .set_mime(mime::TEXT_HTML_UTF_8)
-    });
-    app.at("/bulma.min.css").get(|_| async {
-        tide::Response::new(200)
-            .body_string(include_str!("../../public/bulma.min.css").to_string())
-            .set_mime(mime::TEXT_CSS_UTF_8)
-    });
-
-    app.at("/indexes")
-        .get(|ctx| into_response(index::list_indexes(ctx)))
-        .post(|ctx| into_response(index::create_index(ctx)));
-    app.at("/indexes/search")
-        .post(|ctx| into_response(search::search_multi_index(ctx)));
-    app.at("/indexes/:index")
-        .get(|ctx| into_response(index::get_index(ctx)))
-        .put(|ctx| into_response(index::update_index(ctx)))
-        .delete(|ctx| into_response(index::delete_index(ctx)));
-    app.at("/indexes/:index/search")
-        .get(|ctx| into_response(search::search_with_url_query(ctx)));
-    app.at("/indexes/:index/updates")
-        .get(|ctx| into_response(index::get_all_updates_status(ctx)));
-    app.at("/indexes/:index/updates/:update_id")
-        .get(|ctx| into_response(index::get_update_status(ctx)));
-    app.at("/indexes/:index/documents")
-        .get(|ctx| into_response(document::get_all_documents(ctx)))
-        .post(|ctx| into_response(document::add_or_replace_multiple_documents(ctx)))
-        .put(|ctx| into_response(document::add_or_update_multiple_documents(ctx)))
-        .delete(|ctx| into_response(document::clear_all_documents(ctx)));
-    app.at("/indexes/:index/documents/:document_id")
-        .get(|ctx| into_response(document::get_document(ctx)))
-        .delete(|ctx| into_response(document::delete_document(ctx)));
-    app.at("/indexes/:index/documents/delete-batch")
-        .post(|ctx| into_response(document::delete_multiple_documents(ctx)));
-    app.at("/indexes/:index/settings")
-        .get(|ctx| into_response(setting::get_all(ctx)))
-        .post(|ctx| into_response(setting::update_all(ctx)))
-        .delete(|ctx| into_response(setting::delete_all(ctx)));
-    app.at("/indexes/:index/settings/ranking-rules")
-        .get(|ctx| into_response(setting::get_rules(ctx)))
-        .post(|ctx| into_response(setting::update_rules(ctx)))
-        .delete(|ctx| into_response(setting::delete_rules(ctx)));
-    app.at("/indexes/:index/settings/distinct-attribute")
-        .get(|ctx| into_response(setting::get_distinct(ctx)))
-        .post(|ctx| into_response(setting::update_distinct(ctx)))
-        .delete(|ctx| into_response(setting::delete_distinct(ctx)));
-    app.at("/indexes/:index/settings/searchable-attributes")
-        .get(|ctx| into_response(setting::get_searchable(ctx)))
-        .post(|ctx| into_response(setting::update_searchable(ctx)))
-        .delete(|ctx| into_response(setting::delete_searchable(ctx)));
-    app.at("/indexes/:index/settings/displayed-attributes")
-        .get(|ctx| into_response(setting::displayed(ctx)))
-        .post(|ctx| into_response(setting::update_displayed(ctx)))
-        .delete(|ctx| into_response(setting::delete_displayed(ctx)));
-    app.at("/indexes/:index/settings/accept-new-fields")
-        .get(|ctx| into_response(setting::get_accept_new_fields(ctx)))
-        .post(|ctx| into_response(setting::update_accept_new_fields(ctx)));
-    app.at("/indexes/:index/settings/synonyms")
-        .get(|ctx| into_response(synonym::get(ctx)))
-        .post(|ctx| into_response(synonym::update(ctx)))
-        .delete(|ctx| into_response(synonym::delete(ctx)));
-    app.at("/indexes/:index/settings/stop-words")
-        .get(|ctx| into_response(stop_words::get(ctx)))
-        .post(|ctx| into_response(stop_words::update(ctx)))
-        .delete(|ctx| into_response(stop_words::delete(ctx)));
-    app.at("/indexes/:index/stats")
-        .get(|ctx| into_response(stats::index_stats(ctx)));
-    app.at("/keys").get(|ctx| into_response(key::list(ctx)));
-    app.at("/health")
-        .get(|ctx| into_response(health::get_health(ctx)))
-        .put(|ctx| into_response(health::change_healthyness(ctx)));
-    app.at("/stats")
-        .get(|ctx| into_response(stats::get_stats(ctx)));
-    app.at("/version")
-        .get(|ctx| into_response(stats::get_version(ctx)));
-    app.at("/sys-info")
-        .get(|ctx| into_response(stats::get_sys_info(ctx)));
-    app.at("/sys-info/pretty")
-        .get(|ctx| into_response(stats::get_sys_info_pretty(ctx)));
-}
+#[get("/")]
+pub async fn load_html() -> HttpResponse {
+    HttpResponse::Ok()
+        .content_type("text/html; charset=utf-8")
+        .body(include_str!("../../public/interface.html").to_string())
+}
+
+#[get("/bulma.min.css")]
+pub async fn load_css() -> HttpResponse {
+    HttpResponse::Ok()
+        .content_type("text/css; charset=utf-8")
+        .body(include_str!("../../public/bulma.min.css").to_string())
+}
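With every module exposing a services registrar, the centralized tide load_routes function above collapses into actix-web configure calls at server build time. A sketch under stated assumptions: the real server setup lives in files not shown in this diff, the Data wiring is elided, and the module list here is partial.

// Sketch only -- assumes each module exposes a registrar like key::services.
use actix_web::{App, HttpServer};

#[actix_web::main]
async fn main() -> std::io::Result<()> {
    HttpServer::new(|| {
        App::new()
            // .app_data(...) for Data is elided in this sketch
            .configure(index::services)
            .configure(search::services)
            .configure(setting::services)
            .configure(key::services)
            .configure(stats::services)
            .service(load_html)
            .service(load_css)
    })
    .bind("127.0.0.1:7700")?
    .run()
    .await
}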

View File

@@ -1,22 +1,27 @@
-use std::collections::HashMap;
-use std::collections::HashSet;
-use std::time::Duration;
+use std::collections::{HashMap, HashSet};

-use meilisearch_core::Index;
-use rayon::iter::{IntoParallelIterator, ParallelIterator};
+use actix_web::{get, post, web, HttpResponse};
+use log::warn;
 use serde::{Deserialize, Serialize};
-use tide::{Request, Response};
+use serde_json::Value;

-use crate::error::{ResponseError, SResult};
-use crate::helpers::meilisearch::{Error, IndexSearchExt, SearchHit};
-use crate::helpers::tide::RequestExt;
-use crate::helpers::tide::ACL::*;
+use crate::error::{Error, FacetCountError, ResponseError};
+use crate::helpers::meilisearch::{IndexSearchExt, SearchResult};
+use crate::helpers::Authentication;
+use crate::routes::IndexParam;
 use crate::Data;

-#[derive(Deserialize)]
+use meilisearch_core::facets::FacetFilter;
+use meilisearch_schema::{FieldId, Schema};
+
+pub fn services(cfg: &mut web::ServiceConfig) {
+    cfg.service(search_with_post).service(search_with_url_query);
+}
+
+#[derive(Serialize, Deserialize)]
 #[serde(rename_all = "camelCase", deny_unknown_fields)]
-struct SearchQuery {
-    q: String,
+pub struct SearchQuery {
+    q: Option<String>,
     offset: Option<usize>,
     limit: Option<usize>,
     attributes_to_retrieve: Option<String>,
@@ -24,214 +29,241 @@ struct SearchQuery {
     crop_length: Option<usize>,
     attributes_to_highlight: Option<String>,
     filters: Option<String>,
-    timeout_ms: Option<u64>,
     matches: Option<bool>,
+    facet_filters: Option<String>,
+    facets_distribution: Option<String>,
 }
-pub async fn search_with_url_query(ctx: Request<Data>) -> SResult<Response> {
-    ctx.is_allowed(Public)?;
-
-    let index = ctx.index()?;
-    let db = &ctx.state().db;
-    let reader = db.main_read_txn()?;
-
-    let schema = index
-        .main
-        .schema(&reader)?
-        .ok_or(ResponseError::open_index("No Schema found"))?;
-
-    let query: SearchQuery = ctx
-        .query()
-        .map_err(|_| ResponseError::bad_request("invalid query parameter"))?;
-
-    let mut search_builder = index.new_search(query.q.clone());
-
-    if let Some(offset) = query.offset {
-        search_builder.offset(offset);
-    }
-    if let Some(limit) = query.limit {
-        search_builder.limit(limit);
-    }
-
-    if let Some(attributes_to_retrieve) = query.attributes_to_retrieve {
-        for attr in attributes_to_retrieve.split(',') {
-            search_builder.add_retrievable_field(attr.to_string());
-        }
-    }
-
-    if let Some(attributes_to_crop) = query.attributes_to_crop {
-        let crop_length = query.crop_length.unwrap_or(200);
-        if attributes_to_crop == "*" {
-            let attributes_to_crop = schema
-                .displayed_name()
-                .iter()
-                .map(|attr| (attr.to_string(), crop_length))
-                .collect();
-            search_builder.attributes_to_crop(attributes_to_crop);
-        } else {
-            let attributes_to_crop = attributes_to_crop
-                .split(',')
-                .map(|r| (r.to_string(), crop_length))
-                .collect();
-            search_builder.attributes_to_crop(attributes_to_crop);
-        }
-    }
-
-    if let Some(attributes_to_highlight) = query.attributes_to_highlight {
-        let attributes_to_highlight = if attributes_to_highlight == "*" {
-            schema
-                .displayed_name()
-                .iter()
-                .map(|s| s.to_string())
-                .collect()
-        } else {
-            attributes_to_highlight
-                .split(',')
-                .map(|s| s.to_string())
-                .collect()
-        };
-
-        search_builder.attributes_to_highlight(attributes_to_highlight);
-    }
-
-    if let Some(filters) = query.filters {
-        search_builder.filters(filters);
-    }
-
-    if let Some(timeout_ms) = query.timeout_ms {
-        search_builder.timeout(Duration::from_millis(timeout_ms));
-    }
-
-    if let Some(matches) = query.matches {
-        if matches {
-            search_builder.get_matches();
-        }
-    }
-
-    let response = match search_builder.search(&reader) {
-        Ok(response) => response,
-        Err(Error::Internal(message)) => return Err(ResponseError::Internal(message)),
-        Err(others) => return Err(ResponseError::bad_request(others)),
-    };
-
-    Ok(tide::Response::new(200).body_json(&response).unwrap())
+#[get("/indexes/{index_uid}/search", wrap = "Authentication::Public")]
+async fn search_with_url_query(
+    data: web::Data<Data>,
+    path: web::Path<IndexParam>,
+    params: web::Query<SearchQuery>,
+) -> Result<HttpResponse, ResponseError> {
+    let search_result = params.search(&path.index_uid, data)?;
+    Ok(HttpResponse::Ok().json(search_result))
 }
-#[derive(Clone, Deserialize)]
+#[derive(Deserialize)]
 #[serde(rename_all = "camelCase", deny_unknown_fields)]
-struct SearchMultiBody {
-    indexes: HashSet<String>,
-    query: String,
+pub struct SearchQueryPost {
+    q: Option<String>,
     offset: Option<usize>,
     limit: Option<usize>,
-    attributes_to_retrieve: Option<HashSet<String>>,
-    searchable_attributes: Option<HashSet<String>>,
-    attributes_to_crop: Option<HashMap<String, usize>>,
-    attributes_to_highlight: Option<HashSet<String>>,
+    attributes_to_retrieve: Option<Vec<String>>,
+    attributes_to_crop: Option<Vec<String>>,
+    crop_length: Option<usize>,
+    attributes_to_highlight: Option<Vec<String>>,
     filters: Option<String>,
-    timeout_ms: Option<u64>,
     matches: Option<bool>,
+    facet_filters: Option<Value>,
+    facets_distribution: Option<Vec<String>>,
 }

-#[derive(Debug, Clone, Serialize)]
-#[serde(rename_all = "camelCase")]
-struct SearchMultiBodyResponse {
-    hits: HashMap<String, Vec<SearchHit>>,
-    offset: usize,
-    hits_per_page: usize,
-    processing_time_ms: usize,
-    query: String,
+impl From<SearchQueryPost> for SearchQuery {
+    fn from(other: SearchQueryPost) -> SearchQuery {
+        SearchQuery {
+            q: other.q,
+            offset: other.offset,
+            limit: other.limit,
+            attributes_to_retrieve: other.attributes_to_retrieve.map(|attrs| attrs.join(",")),
+            attributes_to_crop: other.attributes_to_crop.map(|attrs| attrs.join(",")),
+            crop_length: other.crop_length,
+            attributes_to_highlight: other.attributes_to_highlight.map(|attrs| attrs.join(",")),
+            filters: other.filters,
+            matches: other.matches,
+            facet_filters: other.facet_filters.map(|f| f.to_string()),
+            facets_distribution: other.facets_distribution.map(|f| format!("{:?}", f)),
+        }
+    }
 }

-pub async fn search_multi_index(mut ctx: Request<Data>) -> SResult<Response> {
-    ctx.is_allowed(Public)?;
-    let body = ctx
-        .body_json::<SearchMultiBody>()
-        .await
-        .map_err(ResponseError::bad_request)?;
+#[post("/indexes/{index_uid}/search", wrap = "Authentication::Public")]
+async fn search_with_post(
+    data: web::Data<Data>,
+    path: web::Path<IndexParam>,
+    params: web::Json<SearchQueryPost>,
+) -> Result<HttpResponse, ResponseError> {
+    let query: SearchQuery = params.0.into();
+    let search_result = query.search(&path.index_uid, data)?;
+    Ok(HttpResponse::Ok().json(search_result))
+}
-    let mut index_list = body.clone().indexes;
-
-    for index in index_list.clone() {
-        if index == "*" {
-            index_list = ctx.state().db.indexes_uids().into_iter().collect();
-            break;
-        }
-    }
-
-    let mut offset = 0;
-    let mut count = 20;
-
-    if let Some(body_offset) = body.offset {
-        if let Some(limit) = body.limit {
-            offset = body_offset;
-            count = limit;
-        }
-    }
-
-    let offset = offset;
-    let count = count;
-    let db = &ctx.state().db;
-    let par_body = body.clone();
-    let responses_per_index: Vec<SResult<_>> = index_list
-        .into_par_iter()
-        .map(move |index_uid| {
-            let index: Index = db
-                .open_index(&index_uid)
-                .ok_or(ResponseError::index_not_found(&index_uid))?;
-
-            let mut search_builder = index.new_search(par_body.query.clone());
-
-            search_builder.offset(offset);
-            search_builder.limit(count);
-
-            if let Some(attributes_to_retrieve) = par_body.attributes_to_retrieve.clone() {
-                search_builder.attributes_to_retrieve(attributes_to_retrieve);
-            }
-            if let Some(attributes_to_crop) = par_body.attributes_to_crop.clone() {
-                search_builder.attributes_to_crop(attributes_to_crop);
-            }
-            if let Some(attributes_to_highlight) = par_body.attributes_to_highlight.clone() {
-                search_builder.attributes_to_highlight(attributes_to_highlight);
-            }
-            if let Some(filters) = par_body.filters.clone() {
-                search_builder.filters(filters);
-            }
-            if let Some(timeout_ms) = par_body.timeout_ms {
-                search_builder.timeout(Duration::from_millis(timeout_ms));
-            }
-            if let Some(matches) = par_body.matches {
-                if matches {
-                    search_builder.get_matches();
-                }
-            }
-            let reader = db.main_read_txn()?;
-            let response = search_builder.search(&reader)?;
-
-            Ok((index_uid, response))
-        })
-        .collect();
-
-    let mut hits_map = HashMap::new();
-    let mut max_query_time = 0;
-
-    for response in responses_per_index {
-        if let Ok((index_uid, response)) = response {
-            if response.processing_time_ms > max_query_time {
-                max_query_time = response.processing_time_ms;
-            }
-            hits_map.insert(index_uid, response.hits);
-        }
-    }
-
-    let response = SearchMultiBodyResponse {
-        hits: hits_map,
-        offset,
-        hits_per_page: count,
-        processing_time_ms: max_query_time,
-        query: body.query,
-    };
-
-    Ok(tide::Response::new(200).body_json(&response).unwrap())
-}
+impl SearchQuery {
+    fn search(
+        &self,
+        index_uid: &str,
+        data: web::Data<Data>,
+    ) -> Result<SearchResult, ResponseError> {
+        let index = data
+            .db
+            .open_index(index_uid)
+            .ok_or(Error::index_not_found(index_uid))?;
+
+        let reader = data.db.main_read_txn()?;
+        let schema = index
+            .main
+            .schema(&reader)?
+            .ok_or(Error::internal("Impossible to retrieve the schema"))?;
+
+        let query = self
+            .q
+            .clone()
+            .and_then(|q| if q.is_empty() { None } else { Some(q) });
+
+        let mut search_builder = index.new_search(query);
+
+        if let Some(offset) = self.offset {
+            search_builder.offset(offset);
+        }
+        if let Some(limit) = self.limit {
+            search_builder.limit(limit);
+        }
+
+        let available_attributes = schema.displayed_name();
+        let mut restricted_attributes: HashSet<&str>;
+        match &self.attributes_to_retrieve {
+            Some(attributes_to_retrieve) => {
+                let attributes_to_retrieve: HashSet<&str> =
+                    attributes_to_retrieve.split(',').collect();
+                if attributes_to_retrieve.contains("*") {
+                    restricted_attributes = available_attributes.clone();
+                } else {
+                    restricted_attributes = HashSet::new();
+                    for attr in attributes_to_retrieve {
+                        if available_attributes.contains(attr) {
+                            restricted_attributes.insert(attr);
+                            search_builder.add_retrievable_field(attr.to_string());
+                        } else {
+                            warn!("The attributes {:?} present in attributesToCrop parameter doesn't exist", attr);
+                        }
+                    }
+                }
+            }
+            None => {
+                restricted_attributes = available_attributes.clone();
+            }
+        }
+
+        if let Some(ref facet_filters) = self.facet_filters {
+            let attrs = index
+                .main
+                .attributes_for_faceting(&reader)?
+                .unwrap_or_default();
+            search_builder.add_facet_filters(FacetFilter::from_str(
+                facet_filters,
+                &schema,
+                &attrs,
+            )?);
+        }
+
+        if let Some(facets) = &self.facets_distribution {
+            match index.main.attributes_for_faceting(&reader)? {
+                Some(ref attrs) => {
+                    let field_ids = prepare_facet_list(&facets, &schema, attrs)?;
+                    search_builder.add_facets(field_ids);
+                }
+                None => return Err(FacetCountError::NoFacetSet.into()),
+            }
+        }
+
+        if let Some(attributes_to_crop) = &self.attributes_to_crop {
+            let default_length = self.crop_length.unwrap_or(200);
+            let mut final_attributes: HashMap<String, usize> = HashMap::new();
+
+            for attribute in attributes_to_crop.split(',') {
+                let mut attribute = attribute.split(':');
+                let attr = attribute.next();
+                let length = attribute
+                    .next()
+                    .and_then(|s| s.parse().ok())
+                    .unwrap_or(default_length);
+                match attr {
+                    Some("*") => {
+                        for attr in &restricted_attributes {
+                            final_attributes.insert(attr.to_string(), length);
+                        }
+                    }
+                    Some(attr) => {
+                        if available_attributes.contains(attr) {
+                            final_attributes.insert(attr.to_string(), length);
+                        } else {
+                            warn!("The attributes {:?} present in attributesToCrop parameter doesn't exist", attr);
+                        }
+                    }
+                    None => (),
+                }
+            }
+            search_builder.attributes_to_crop(final_attributes);
+        }
+
+        if let Some(attributes_to_highlight) = &self.attributes_to_highlight {
+            let mut final_attributes: HashSet<String> = HashSet::new();
+            for attribute in attributes_to_highlight.split(',') {
+                if attribute == "*" {
+                    for attr in &restricted_attributes {
+                        final_attributes.insert(attr.to_string());
+                    }
+                } else if available_attributes.contains(attribute) {
+                    final_attributes.insert(attribute.to_string());
+                } else {
+                    warn!("The attributes {:?} present in attributesToHighlight parameter doesn't exist", attribute);
+                }
+            }
+
+            search_builder.attributes_to_highlight(final_attributes);
+        }
+
+        if let Some(filters) = &self.filters {
+            search_builder.filters(filters.to_string());
+        }
+
+        if let Some(matches) = self.matches {
+            if matches {
+                search_builder.get_matches();
+            }
+        }
+        search_builder.search(&reader)
+    }
+}
+
+/// Parses the incoming string into an array of attributes for which to return a count. It returns
+/// a Vec of attribute names associated with their id.
+///
+/// An error is returned if the array is malformed, or if it contains attributes that are
+/// nonexistent, or not set as facets.
+fn prepare_facet_list(
+    facets: &str,
+    schema: &Schema,
+    facet_attrs: &[FieldId],
+) -> Result<Vec<(FieldId, String)>, FacetCountError> {
+    let json_array = serde_json::from_str(facets)?;
+    match json_array {
+        Value::Array(vals) => {
+            let wildcard = Value::String("*".to_string());
+            if vals.iter().any(|f| f == &wildcard) {
+                let attrs = facet_attrs
+                    .iter()
+                    .filter_map(|&id| schema.name(id).map(|n| (id, n.to_string())))
+                    .collect();
+                return Ok(attrs);
+            }
+            let mut field_ids = Vec::with_capacity(facet_attrs.len());
+            for facet in vals {
+                match facet {
+                    Value::String(facet) => {
+                        if let Some(id) = schema.id(&facet) {
+                            if !facet_attrs.contains(&id) {
+                                return Err(FacetCountError::AttributeNotSet(facet));
+                            }
+                            field_ids.push((id, facet));
+                        }
+                    }
+                    bad_val => return Err(FacetCountError::unexpected_token(bad_val, &["String"])),
+                }
+            }
+            Ok(field_ids)
+        }
+        bad_val => Err(FacetCountError::unexpected_token(bad_val, &["[String]"])),
+    }
+}
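The From<SearchQueryPost> impl above is what lets the GET and POST routes share one SearchQuery::search implementation: the POST body's list-valued fields are flattened into the comma-separated strings the query-string format already uses. A small illustration with invented values; since the struct fields are private, this would have to live inside search.rs, for example as a unit test:

// Illustration only -- field values are made up.
let post = SearchQueryPost {
    q: Some("shoes".to_string()),
    offset: None,
    limit: Some(10),
    attributes_to_retrieve: Some(vec!["title".to_string(), "price".to_string()]),
    attributes_to_crop: None,
    crop_length: None,
    attributes_to_highlight: None,
    filters: None,
    matches: None,
    facet_filters: None,
    facets_distribution: None,
};
let query: SearchQuery = post.into();
// The Vec is joined into the same shape as ?attributesToRetrieve=title,price
// on the GET route.
assert_eq!(query.attributes_to_retrieve.as_deref(), Some("title,price"));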

View File

@@ -1,98 +1,152 @@
-use meilisearch_core::settings::{Settings, SettingsUpdate, UpdateState, DEFAULT_RANKING_RULES};
 use std::collections::{BTreeMap, BTreeSet, HashSet};
-use tide::{Request, Response};

-use crate::error::{ResponseError, SResult};
-use crate::helpers::tide::RequestExt;
-use crate::helpers::tide::ACL::*;
-use crate::routes::document::IndexUpdateResponse;
+use actix_web::{delete, get, post};
+use actix_web::{web, HttpResponse};
+use meilisearch_core::{MainReader, UpdateWriter};
+use meilisearch_core::settings::{Settings, SettingsUpdate, UpdateState, DEFAULT_RANKING_RULES};
+use meilisearch_schema::Schema;
+
 use crate::Data;
+use crate::error::{Error, ResponseError};
+use crate::helpers::Authentication;
+use crate::routes::{IndexParam, IndexUpdateResponse};

-pub async fn get_all(ctx: Request<Data>) -> SResult<Response> {
-    ctx.is_allowed(Private)?;
-    let index = ctx.index()?;
-    let db = &ctx.state().db;
-    let reader = db.main_read_txn()?;
-
-    let stop_words_fst = index.main.stop_words_fst(&reader)?;
-    let stop_words = stop_words_fst.unwrap_or_default().stream().into_strs()?;
-    let stop_words: BTreeSet<String> = stop_words.into_iter().collect();
-
-    let synonyms_fst = index.main.synonyms_fst(&reader)?.unwrap_or_default();
-    let synonyms_list = synonyms_fst.stream().into_strs()?;
+pub fn services(cfg: &mut web::ServiceConfig) {
+    cfg.service(update_all)
+        .service(get_all)
+        .service(delete_all)
+        .service(get_rules)
+        .service(update_rules)
+        .service(delete_rules)
+        .service(get_distinct)
+        .service(update_distinct)
+        .service(delete_distinct)
+        .service(get_searchable)
+        .service(update_searchable)
+        .service(delete_searchable)
+        .service(get_displayed)
+        .service(update_displayed)
+        .service(delete_displayed)
+        .service(get_attributes_for_faceting)
+        .service(delete_attributes_for_faceting)
+        .service(update_attributes_for_faceting);
+}
+
+pub fn update_all_settings_txn(
+    data: &web::Data<Data>,
+    settings: SettingsUpdate,
+    index_uid: &str,
+    write_txn: &mut UpdateWriter,
+) -> Result<u64, Error> {
+    let index = data
+        .db
+        .open_index(index_uid)
+        .ok_or(Error::index_not_found(index_uid))?;
+
+    let update_id = index.settings_update(write_txn, settings)?;
+    Ok(update_id)
+}
+
+#[post("/indexes/{index_uid}/settings", wrap = "Authentication::Private")]
+async fn update_all(
+    data: web::Data<Data>,
+    path: web::Path<IndexParam>,
+    body: web::Json<Settings>,
+) -> Result<HttpResponse, ResponseError> {
+    let settings = body
+        .into_inner()
+        .to_update()
+        .map_err(Error::bad_request)?;
+
+    let update_id = data.db.update_write::<_, _, Error>(|writer| {
+        update_all_settings_txn(&data, settings, &path.index_uid, writer)
+    })?;
+
+    Ok(HttpResponse::Accepted().json(IndexUpdateResponse::with_id(update_id)))
+}
+
+pub fn get_all_sync(data: &web::Data<Data>, reader: &MainReader, index_uid: &str) -> Result<Settings, Error> {
+    let index = data
+        .db
+        .open_index(index_uid)
+        .ok_or(Error::index_not_found(index_uid))?;
+
+    let stop_words: BTreeSet<String> = index
+        .main
+        .stop_words(reader)?
+        .into_iter()
+        .collect();
+
+    let synonyms_list = index.main.synonyms(reader)?;

     let mut synonyms = BTreeMap::new();
     let index_synonyms = &index.synonyms;
     for synonym in synonyms_list {
-        let alternative_list = index_synonyms.synonyms(&reader, synonym.as_bytes())?;
-        if let Some(list) = alternative_list {
-            let list = list.stream().into_strs()?;
-            synonyms.insert(synonym, list);
-        }
+        let list = index_synonyms.synonyms(reader, synonym.as_bytes())?;
+        synonyms.insert(synonym, list);
     }

     let ranking_rules = index
         .main
-        .ranking_rules(&reader)?
+        .ranking_rules(reader)?
         .unwrap_or(DEFAULT_RANKING_RULES.to_vec())
         .into_iter()
         .map(|r| r.to_string())
         .collect();

-    let distinct_attribute = index.main.distinct_attribute(&reader)?;
-
-    let schema = index.main.schema(&reader)?;
-
-    let searchable_attributes = schema.clone().map(|s| {
-        s.indexed_name()
-            .iter()
-            .map(|s| (*s).to_string())
-            .collect::<Vec<String>>()
-    });
-
-    let displayed_attributes = schema.clone().map(|s| {
-        s.displayed_name()
-            .iter()
-            .map(|s| (*s).to_string())
-            .collect::<HashSet<String>>()
-    });
-
-    let accept_new_fields = schema.map(|s| s.accept_new_fields());
+    let schema = index.main.schema(reader)?;
+
+    let distinct_attribute = match (index.main.distinct_attribute(reader)?, &schema) {
+        (Some(id), Some(schema)) => schema.name(id).map(str::to_string),
+        _ => None,
+    };
+
+    let attributes_for_faceting = match (&schema, &index.main.attributes_for_faceting(reader)?) {
+        (Some(schema), Some(attrs)) => {
+            attrs
+                .iter()
+                .filter_map(|&id| schema.name(id))
+                .map(str::to_string)
+                .collect()
+        }
+        _ => vec![],
+    };
+
+    let searchable_attributes = schema.as_ref().map(get_indexed_attributes);
+    let displayed_attributes = schema.as_ref().map(get_displayed_attributes);

-    let settings = Settings {
+    Ok(Settings {
         ranking_rules: Some(Some(ranking_rules)),
         distinct_attribute: Some(distinct_attribute),
         searchable_attributes: Some(searchable_attributes),
         displayed_attributes: Some(displayed_attributes),
         stop_words: Some(Some(stop_words)),
         synonyms: Some(Some(synonyms)),
-        accept_new_fields: Some(accept_new_fields),
-    };
-
-    Ok(tide::Response::new(200).body_json(&settings).unwrap())
+        attributes_for_faceting: Some(Some(attributes_for_faceting)),
+    })
 }

-pub async fn update_all(mut ctx: Request<Data>) -> SResult<Response> {
-    ctx.is_allowed(Private)?;
-    let index = ctx.index()?;
-    let settings: Settings =
-        ctx.body_json().await.map_err(ResponseError::bad_request)?;
-    let db = &ctx.state().db;
-
-    let mut writer = db.update_write_txn()?;
-    let settings = settings.into_update().map_err(ResponseError::bad_request)?;
-    let update_id = index.settings_update(&mut writer, settings)?;
-    writer.commit()?;
-
-    let response_body = IndexUpdateResponse { update_id };
-    Ok(tide::Response::new(202).body_json(&response_body)?)
+#[get("/indexes/{index_uid}/settings", wrap = "Authentication::Private")]
+async fn get_all(
+    data: web::Data<Data>,
+    path: web::Path<IndexParam>,
+) -> Result<HttpResponse, ResponseError> {
+    let reader = data.db.main_read_txn()?;
+    let settings = get_all_sync(&data, &reader, &path.index_uid)?;
+
+    Ok(HttpResponse::Ok().json(settings))
 }

-pub async fn delete_all(ctx: Request<Data>) -> SResult<Response> {
-    ctx.is_allowed(Private)?;
-    let index = ctx.index()?;
-    let db = &ctx.state().db;
-    let mut writer = db.update_write_txn()?;
+#[delete("/indexes/{index_uid}/settings", wrap = "Authentication::Private")]
+async fn delete_all(
+    data: web::Data<Data>,
+    path: web::Path<IndexParam>,
+) -> Result<HttpResponse, ResponseError> {
+    let index = data
+        .db
+        .open_index(&path.index_uid)
+        .ok_or(Error::index_not_found(&path.index_uid))?;

     let settings = SettingsUpdate {
         ranking_rules: UpdateState::Clear,
@@ -102,22 +156,27 @@ pub async fn delete_all(ctx: Request<Data>) -> SResult<Response> {
         displayed_attributes: UpdateState::Clear,
         stop_words: UpdateState::Clear,
         synonyms: UpdateState::Clear,
-        accept_new_fields: UpdateState::Clear,
+        attributes_for_faceting: UpdateState::Clear,
     };

-    let update_id = index.settings_update(&mut writer, settings)?;
-    writer.commit()?;
-
-    let response_body = IndexUpdateResponse { update_id };
-    Ok(tide::Response::new(202).body_json(&response_body)?)
+    let update_id = data.db.update_write(|w| index.settings_update(w, settings))?;
+    Ok(HttpResponse::Accepted().json(IndexUpdateResponse::with_id(update_id)))
 }

-pub async fn get_rules(ctx: Request<Data>) -> SResult<Response> {
-    ctx.is_allowed(Private)?;
-    let index = ctx.index()?;
-    let db = &ctx.state().db;
-    let reader = db.main_read_txn()?;
+#[get(
+    "/indexes/{index_uid}/settings/ranking-rules",
+    wrap = "Authentication::Private"
+)]
+async fn get_rules(
+    data: web::Data<Data>,
+    path: web::Path<IndexParam>,
+) -> Result<HttpResponse, ResponseError> {
+    let index = data
+        .db
+        .open_index(&path.index_uid)
+        .ok_or(Error::index_not_found(&path.index_uid))?;
+    let reader = data.db.main_read_txn()?;

     let ranking_rules = index
         .main
@@ -127,248 +186,366 @@ pub async fn get_rules(ctx: Request<Data>) -> SResult<Response> {
         .map(|r| r.to_string())
         .collect::<Vec<String>>();

-    Ok(tide::Response::new(200).body_json(&ranking_rules).unwrap())
+    Ok(HttpResponse::Ok().json(ranking_rules))
 }
-pub async fn update_rules(mut ctx: Request<Data>) -> SResult<Response> {
-    ctx.is_allowed(Private)?;
-    let index = ctx.index()?;
-    let ranking_rules: Option<Vec<String>> =
-        ctx.body_json().await.map_err(ResponseError::bad_request)?;
-    let db = &ctx.state().db;
+#[post(
+    "/indexes/{index_uid}/settings/ranking-rules",
+    wrap = "Authentication::Private"
+)]
+async fn update_rules(
+    data: web::Data<Data>,
+    path: web::Path<IndexParam>,
+    body: web::Json<Option<Vec<String>>>,
+) -> Result<HttpResponse, ResponseError> {
+    let index = data
+        .db
+        .open_index(&path.index_uid)
+        .ok_or(Error::index_not_found(&path.index_uid))?;

     let settings = Settings {
-        ranking_rules: Some(ranking_rules),
+        ranking_rules: Some(body.into_inner()),
         ..Settings::default()
     };

-    let mut writer = db.update_write_txn()?;
-    let settings = settings.into_update().map_err(ResponseError::bad_request)?;
-    let update_id = index.settings_update(&mut writer, settings)?;
-    writer.commit()?;
-
-    let response_body = IndexUpdateResponse { update_id };
-    Ok(tide::Response::new(202).body_json(&response_body)?)
+    let settings = settings.to_update().map_err(Error::bad_request)?;
+    let update_id = data.db.update_write(|w| index.settings_update(w, settings))?;
+
+    Ok(HttpResponse::Accepted().json(IndexUpdateResponse::with_id(update_id)))
 }

-pub async fn delete_rules(ctx: Request<Data>) -> SResult<Response> {
-    ctx.is_allowed(Private)?;
-    let index = ctx.index()?;
-    let db = &ctx.state().db;
-    let mut writer = db.update_write_txn()?;
+#[delete(
+    "/indexes/{index_uid}/settings/ranking-rules",
+    wrap = "Authentication::Private"
+)]
+async fn delete_rules(
+    data: web::Data<Data>,
+    path: web::Path<IndexParam>,
+) -> Result<HttpResponse, ResponseError> {
+    let index = data
+        .db
+        .open_index(&path.index_uid)
+        .ok_or(Error::index_not_found(&path.index_uid))?;

     let settings = SettingsUpdate {
         ranking_rules: UpdateState::Clear,
         ..SettingsUpdate::default()
     };

-    let update_id = index.settings_update(&mut writer, settings)?;
-    writer.commit()?;
-
-    let response_body = IndexUpdateResponse { update_id };
-    Ok(tide::Response::new(202).body_json(&response_body)?)
+    let update_id = data.db.update_write(|w| index.settings_update(w, settings))?;
+    Ok(HttpResponse::Accepted().json(IndexUpdateResponse::with_id(update_id)))
 }

-pub async fn get_distinct(ctx: Request<Data>) -> SResult<Response> {
-    ctx.is_allowed(Private)?;
-    let index = ctx.index()?;
-    let db = &ctx.state().db;
-    let reader = db.main_read_txn()?;
-
-    let distinct_attribute = index.main.distinct_attribute(&reader)?;
-
-    Ok(tide::Response::new(200)
-        .body_json(&distinct_attribute)
-        .unwrap())
+#[get(
+    "/indexes/{index_uid}/settings/distinct-attribute",
+    wrap = "Authentication::Private"
+)]
+async fn get_distinct(
+    data: web::Data<Data>,
+    path: web::Path<IndexParam>,
+) -> Result<HttpResponse, ResponseError> {
+    let index = data
+        .db
+        .open_index(&path.index_uid)
+        .ok_or(Error::index_not_found(&path.index_uid))?;
+    let reader = data.db.main_read_txn()?;
+    let distinct_attribute_id = index.main.distinct_attribute(&reader)?;
+    let schema = index.main.schema(&reader)?;
+    let distinct_attribute = match (schema, distinct_attribute_id) {
+        (Some(schema), Some(id)) => schema.name(id).map(str::to_string),
+        _ => None,
+    };
+
+    Ok(HttpResponse::Ok().json(distinct_attribute))
 }

-pub async fn update_distinct(mut ctx: Request<Data>) -> SResult<Response> {
-    ctx.is_allowed(Private)?;
-    let index = ctx.index()?;
-    let distinct_attribute: Option<String> =
-        ctx.body_json().await.map_err(ResponseError::bad_request)?;
-    let db = &ctx.state().db;
+#[post(
+    "/indexes/{index_uid}/settings/distinct-attribute",
+    wrap = "Authentication::Private"
+)]
+async fn update_distinct(
+    data: web::Data<Data>,
+    path: web::Path<IndexParam>,
+    body: web::Json<Option<String>>,
+) -> Result<HttpResponse, ResponseError> {
+    let index = data
+        .db
+        .open_index(&path.index_uid)
+        .ok_or(Error::index_not_found(&path.index_uid))?;

     let settings = Settings {
-        distinct_attribute: Some(distinct_attribute),
+        distinct_attribute: Some(body.into_inner()),
         ..Settings::default()
     };

-    let mut writer = db.update_write_txn()?;
-    let settings = settings.into_update().map_err(ResponseError::bad_request)?;
-    let update_id = index.settings_update(&mut writer, settings)?;
-    writer.commit()?;
-
-    let response_body = IndexUpdateResponse { update_id };
-    Ok(tide::Response::new(202).body_json(&response_body)?)
+    let settings = settings.to_update().map_err(Error::bad_request)?;
+    let update_id = data.db.update_write(|w| index.settings_update(w, settings))?;
+
+    Ok(HttpResponse::Accepted().json(IndexUpdateResponse::with_id(update_id)))
 }

-pub async fn delete_distinct(ctx: Request<Data>) -> SResult<Response> {
-    ctx.is_allowed(Private)?;
-    let index = ctx.index()?;
-    let db = &ctx.state().db;
-    let mut writer = db.update_write_txn()?;
+#[delete(
+    "/indexes/{index_uid}/settings/distinct-attribute",
+    wrap = "Authentication::Private"
+)]
+async fn delete_distinct(
+    data: web::Data<Data>,
+    path: web::Path<IndexParam>,
+) -> Result<HttpResponse, ResponseError> {
+    let index = data
+        .db
+        .open_index(&path.index_uid)
+        .ok_or(Error::index_not_found(&path.index_uid))?;

     let settings = SettingsUpdate {
         distinct_attribute: UpdateState::Clear,
         ..SettingsUpdate::default()
     };

-    let update_id = index.settings_update(&mut writer, settings)?;
-    writer.commit()?;
-
-    let response_body = IndexUpdateResponse { update_id };
-    Ok(tide::Response::new(202).body_json(&response_body)?)
+    let update_id = data.db.update_write(|w| index.settings_update(w, settings))?;
+    Ok(HttpResponse::Accepted().json(IndexUpdateResponse::with_id(update_id)))
 }

-pub async fn get_searchable(ctx: Request<Data>) -> SResult<Response> {
-    ctx.is_allowed(Private)?;
-    let index = ctx.index()?;
-    let db = &ctx.state().db;
-    let reader = db.main_read_txn()?;
+#[get(
+    "/indexes/{index_uid}/settings/searchable-attributes",
+    wrap = "Authentication::Private"
+)]
+async fn get_searchable(
+    data: web::Data<Data>,
+    path: web::Path<IndexParam>,
+) -> Result<HttpResponse, ResponseError> {
+    let index = data
+        .db
+        .open_index(&path.index_uid)
+        .ok_or(Error::index_not_found(&path.index_uid))?;
+    let reader = data.db.main_read_txn()?;
     let schema = index.main.schema(&reader)?;
     let searchable_attributes: Option<Vec<String>> =
-        schema.map(|s| s.indexed_name().iter().map(|i| (*i).to_string()).collect());
+        schema.as_ref().map(get_indexed_attributes);

-    Ok(tide::Response::new(200)
-        .body_json(&searchable_attributes)
-        .unwrap())
+    Ok(HttpResponse::Ok().json(searchable_attributes))
 }

-pub async fn update_searchable(mut ctx: Request<Data>) -> SResult<Response> {
-    ctx.is_allowed(Private)?;
-    let index = ctx.index()?;
-    let searchable_attributes: Option<Vec<String>> =
-        ctx.body_json().await.map_err(ResponseError::bad_request)?;
-    let db = &ctx.state().db;
+#[post(
+    "/indexes/{index_uid}/settings/searchable-attributes",
+    wrap = "Authentication::Private"
+)]
+async fn update_searchable(
+    data: web::Data<Data>,
+    path: web::Path<IndexParam>,
+    body: web::Json<Option<Vec<String>>>,
+) -> Result<HttpResponse, ResponseError> {
+    let index = data
+        .db
+        .open_index(&path.index_uid)
+        .ok_or(Error::index_not_found(&path.index_uid))?;

     let settings = Settings {
-        searchable_attributes: Some(searchable_attributes),
+        searchable_attributes: Some(body.into_inner()),
         ..Settings::default()
     };

-    let mut writer = db.update_write_txn()?;
-    let settings = settings.into_update().map_err(ResponseError::bad_request)?;
-    let update_id = index.settings_update(&mut writer, settings)?;
-    writer.commit()?;
-
-    let response_body = IndexUpdateResponse { update_id };
-    Ok(tide::Response::new(202).body_json(&response_body)?)
+    let settings = settings.to_update().map_err(Error::bad_request)?;
+
+    let update_id = data.db.update_write(|w| index.settings_update(w, settings))?;
+
+    Ok(HttpResponse::Accepted().json(IndexUpdateResponse::with_id(update_id)))
 }

-pub async fn delete_searchable(ctx: Request<Data>) -> SResult<Response> {
-    ctx.is_allowed(Private)?;
-    let index = ctx.index()?;
-    let db = &ctx.state().db;
+#[delete(
+    "/indexes/{index_uid}/settings/searchable-attributes",
+    wrap = "Authentication::Private"
+)]
+async fn delete_searchable(
+    data: web::Data<Data>,
+    path: web::Path<IndexParam>,
+) -> Result<HttpResponse, ResponseError> {
+    let index = data
+        .db
+        .open_index(&path.index_uid)
+        .ok_or(Error::index_not_found(&path.index_uid))?;

     let settings = SettingsUpdate {
         searchable_attributes: UpdateState::Clear,
         ..SettingsUpdate::default()
     };

-    let mut writer = db.update_write_txn()?;
-    let update_id = index.settings_update(&mut writer, settings)?;
-    writer.commit()?;
-
-    let response_body = IndexUpdateResponse { update_id };
-    Ok(tide::Response::new(202).body_json(&response_body)?)
+    let update_id = data.db.update_write(|w| index.settings_update(w, settings))?;
+
+    Ok(HttpResponse::Accepted().json(IndexUpdateResponse::with_id(update_id)))
 }

-pub async fn displayed(ctx: Request<Data>) -> SResult<Response> {
-    ctx.is_allowed(Private)?;
-    let index = ctx.index()?;
-    let db = &ctx.state().db;
-    let reader = db.main_read_txn()?;
+#[get(
+    "/indexes/{index_uid}/settings/displayed-attributes",
+    wrap = "Authentication::Private"
+)]
+async fn get_displayed(
+    data: web::Data<Data>,
+    path: web::Path<IndexParam>,
+) -> Result<HttpResponse, ResponseError> {
+    let index = data
+        .db
+        .open_index(&path.index_uid)
+        .ok_or(Error::index_not_found(&path.index_uid))?;
+    let reader = data.db.main_read_txn()?;

     let schema = index.main.schema(&reader)?;

-    let displayed_attributes: Option<HashSet<String>> = schema.map(|s| {
-        s.displayed_name()
-            .iter()
-            .map(|i| (*i).to_string())
-            .collect()
-    });
-
-    Ok(tide::Response::new(200)
-        .body_json(&displayed_attributes)
-        .unwrap())
+    let displayed_attributes = schema.as_ref().map(get_displayed_attributes);
+
+    Ok(HttpResponse::Ok().json(displayed_attributes))
 }

-pub async fn update_displayed(mut ctx: Request<Data>) -> SResult<Response> {
-    ctx.is_allowed(Private)?;
-    let index = ctx.index()?;
-    let displayed_attributes: Option<HashSet<String>> =
-        ctx.body_json().await.map_err(ResponseError::bad_request)?;
-    let db = &ctx.state().db;
+#[post(
+    "/indexes/{index_uid}/settings/displayed-attributes",
+    wrap = "Authentication::Private"
+)]
+async fn update_displayed(
+    data: web::Data<Data>,
+    path: web::Path<IndexParam>,
+    body: web::Json<Option<HashSet<String>>>,
+) -> Result<HttpResponse, ResponseError> {
+    let index = data
+        .db
+        .open_index(&path.index_uid)
+        .ok_or(Error::index_not_found(&path.index_uid))?;

     let settings = Settings {
-        displayed_attributes: Some(displayed_attributes),
+        displayed_attributes: Some(body.into_inner()),
         ..Settings::default()
     };

-    let mut writer = db.update_write_txn()?;
-    let settings = settings.into_update().map_err(ResponseError::bad_request)?;
-    let update_id = index.settings_update(&mut writer, settings)?;
-    writer.commit()?;
-
-    let response_body = IndexUpdateResponse { update_id };
-    Ok(tide::Response::new(202).body_json(&response_body)?)
+    let settings = settings.to_update().map_err(Error::bad_request)?;
+    let update_id = data.db.update_write(|w| index.settings_update(w, settings))?;
+
+    Ok(HttpResponse::Accepted().json(IndexUpdateResponse::with_id(update_id)))
 }

-pub async fn delete_displayed(ctx: Request<Data>) -> SResult<Response> {
-    ctx.is_allowed(Private)?;
-    let index = ctx.index()?;
-    let db = &ctx.state().db;
+#[delete(
+    "/indexes/{index_uid}/settings/displayed-attributes",
+    wrap = "Authentication::Private"
+)]
+async fn delete_displayed(
+    data: web::Data<Data>,
+    path: web::Path<IndexParam>,
+) -> Result<HttpResponse, ResponseError> {
+    let index = data
+        .db
+        .open_index(&path.index_uid)
+        .ok_or(Error::index_not_found(&path.index_uid))?;

     let settings = SettingsUpdate {
         displayed_attributes: UpdateState::Clear,
         ..SettingsUpdate::default()
     };

-    let mut writer = db.update_write_txn()?;
-    let update_id = index.settings_update(&mut writer, settings)?;
-    writer.commit()?;
-
-    let response_body = IndexUpdateResponse { update_id };
-    Ok(tide::Response::new(202).body_json(&response_body)?)
+    let update_id = data.db.update_write(|w| index.settings_update(w, settings))?;
+
+    Ok(HttpResponse::Accepted().json(IndexUpdateResponse::with_id(update_id)))
 }

-pub async fn get_accept_new_fields(ctx: Request<Data>) -> SResult<Response> {
-    ctx.is_allowed(Private)?;
-    let index = ctx.index()?;
-    let db = &ctx.state().db;
-    let reader = db.main_read_txn()?;
-
-    let schema = index.main.schema(&reader)?;
-
-    let accept_new_fields = schema.map(|s| s.accept_new_fields());
-
-    Ok(tide::Response::new(200)
-        .body_json(&accept_new_fields)
-        .unwrap())
+#[get(
+    "/indexes/{index_uid}/settings/attributes-for-faceting",
+    wrap = "Authentication::Private"
+)]
+async fn get_attributes_for_faceting(
+    data: web::Data<Data>,
+    path: web::Path<IndexParam>,
+) -> Result<HttpResponse, ResponseError> {
+    let index = data
+        .db
+        .open_index(&path.index_uid)
+        .ok_or(Error::index_not_found(&path.index_uid))?;
+
+    let attributes_for_faceting = data
+        .db
+        .main_read::<_, _, ResponseError>(|reader| {
+            let schema = index.main.schema(reader)?;
+            let attrs = index.main.attributes_for_faceting(reader)?;
+            let attr_names = match (&schema, &attrs) {
+                (Some(schema), Some(attrs)) => {
+                    attrs
+                        .iter()
+                        .filter_map(|&id| schema.name(id))
+                        .map(str::to_string)
+                        .collect()
+                }
+                _ => vec![]
+            };
+            Ok(attr_names)
+        })?;
+
+    Ok(HttpResponse::Ok().json(attributes_for_faceting))
 }

-pub async fn update_accept_new_fields(mut ctx: Request<Data>) -> SResult<Response> {
-    ctx.is_allowed(Private)?;
-    let index = ctx.index()?;
-    let accept_new_fields: Option<bool> =
-        ctx.body_json().await.map_err(ResponseError::bad_request)?;
-    let db = &ctx.state().db;
+#[post(
+    "/indexes/{index_uid}/settings/attributes-for-faceting",
+    wrap = "Authentication::Private"
+)]
+async fn update_attributes_for_faceting(
+    data: web::Data<Data>,
+    path: web::Path<IndexParam>,
+    body: web::Json<Option<Vec<String>>>,
+) -> Result<HttpResponse, ResponseError> {
+    let index = data
+        .db
+        .open_index(&path.index_uid)
+        .ok_or(Error::index_not_found(&path.index_uid))?;

     let settings = Settings {
-        accept_new_fields: Some(accept_new_fields),
+        attributes_for_faceting: Some(body.into_inner()),
         ..Settings::default()
     };

-    let mut writer = db.update_write_txn()?;
-    let settings = settings.into_update().map_err(ResponseError::bad_request)?;
-    let update_id = index.settings_update(&mut writer, settings)?;
-    writer.commit()?;
-
-    let response_body = IndexUpdateResponse { update_id };
-    Ok(tide::Response::new(202).body_json(&response_body)?)
-}
+    let settings = settings.to_update().map_err(Error::bad_request)?;
+    let update_id = data.db.update_write(|w| index.settings_update(w, settings))?;
+
+    Ok(HttpResponse::Accepted().json(IndexUpdateResponse::with_id(update_id)))
+}
+
+#[delete(
+    "/indexes/{index_uid}/settings/attributes-for-faceting",
+    wrap = "Authentication::Private"
+)]
+async fn delete_attributes_for_faceting(
+    data: web::Data<Data>,
+    path: web::Path<IndexParam>,
+) -> Result<HttpResponse, ResponseError> {
+    let index = data
+        .db
+        .open_index(&path.index_uid)
+        .ok_or(Error::index_not_found(&path.index_uid))?;
+
+    let settings = SettingsUpdate {
+        attributes_for_faceting: UpdateState::Clear,
+        ..SettingsUpdate::default()
+    };
+
+    let update_id = data.db.update_write(|w| index.settings_update(w, settings))?;
+
+    Ok(HttpResponse::Accepted().json(IndexUpdateResponse::with_id(update_id)))
+}
+
+fn get_indexed_attributes(schema: &Schema) -> Vec<String> {
+    if schema.is_indexed_all() {
+        ["*"].iter().map(|s| s.to_string()).collect()
+    } else {
+        schema.indexed_name()
+            .iter()
+            .map(|s| s.to_string())
+            .collect()
+    }
+}
+
+fn get_displayed_attributes(schema: &Schema) -> HashSet<String> {
+    if schema.is_displayed_all() {
+        ["*"].iter().map(|s| s.to_string()).collect()
+    } else {
+        schema.displayed_name()
+            .iter()
+            .map(|s| s.to_string())
+            .collect()
+    }
+}
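Across all of these handlers, the explicit update_write_txn()? ... writer.commit()? pairs are gone in favor of closure-based helpers. A sketch of the pattern under the assumption, consistent with the handlers above, that update_write opens a write transaction, runs the closure, and commits only when it returns Ok:

// Sketch only -- Data, Index, and Error are the crate's own types.
fn clear_ranking_rules(data: &Data, index: &Index) -> Result<u64, Error> {
    let settings = SettingsUpdate {
        ranking_rules: UpdateState::Clear,
        ..SettingsUpdate::default()
    };
    // Commits on Ok, aborts on Err, so a handler can no longer forget
    // the commit on the success path or leak a transaction on failure.
    data.db.update_write(|writer| index.settings_update(writer, settings))
}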

View File

@@ -1,46 +1,61 @@
 use std::collections::HashMap;

+use actix_web::web;
+use actix_web::HttpResponse;
+use actix_web::get;
 use chrono::{DateTime, Utc};
 use log::error;
-use pretty_bytes::converter::convert;
 use serde::Serialize;
-use sysinfo::{NetworkExt, Pid, ProcessExt, ProcessorExt, System, SystemExt};
-use tide::{Request, Response};
 use walkdir::WalkDir;

-use crate::error::{IntoInternalError, SResult};
-use crate::helpers::tide::RequestExt;
-use crate::helpers::tide::ACL::*;
+use crate::error::{Error, ResponseError};
+use crate::helpers::Authentication;
+use crate::routes::IndexParam;
 use crate::Data;

+pub fn services(cfg: &mut web::ServiceConfig) {
+    cfg.service(index_stats)
+        .service(get_stats)
+        .service(get_version);
+}
+
 #[derive(Serialize)]
 #[serde(rename_all = "camelCase")]
 struct IndexStatsResponse {
     number_of_documents: u64,
     is_indexing: bool,
-    fields_frequency: HashMap<String, usize>,
+    fields_distribution: HashMap<String, usize>,
 }

-pub async fn index_stats(ctx: Request<Data>) -> SResult<Response> {
-    ctx.is_allowed(Admin)?;
-    let index_uid = ctx.url_param("index")?;
-    let index = ctx.index()?;
-    let db = &ctx.state().db;
-    let reader = db.main_read_txn()?;
-    let update_reader = db.update_read_txn()?;
-
-    let number_of_documents = index.main.number_of_documents(&reader)?;
-    let fields_frequency = index.main.fields_frequency(&reader)?.unwrap_or_default();
-
-    let is_indexing = ctx
-        .state()
-        .is_indexing(&update_reader, &index_uid)?
-        .into_internal_error()?;
-
-    let response = IndexStatsResponse {
-        number_of_documents,
-        is_indexing,
-        fields_frequency,
-    };
-    Ok(tide::Response::new(200).body_json(&response).unwrap())
-}
+#[get("/indexes/{index_uid}/stats", wrap = "Authentication::Private")]
+async fn index_stats(
+    data: web::Data<Data>,
+    path: web::Path<IndexParam>,
+) -> Result<HttpResponse, ResponseError> {
+    let index = data
+        .db
+        .open_index(&path.index_uid)
+        .ok_or(Error::index_not_found(&path.index_uid))?;
+
+    let reader = data.db.main_read_txn()?;
+
+    let number_of_documents = index.main.number_of_documents(&reader)?;
+
+    let fields_distribution = index.main.fields_distribution(&reader)?.unwrap_or_default();
+
+    let update_reader = data.db.update_read_txn()?;
+
+    let is_indexing =
+        data.db.is_indexing(&update_reader, &path.index_uid)?
+            .ok_or(Error::internal(
+                "Impossible to know if the database is indexing",
+            ))?;
+
+    Ok(HttpResponse::Ok().json(IndexStatsResponse {
+        number_of_documents,
+        is_indexing,
+        fields_distribution,
+    }))
+}

 #[derive(Serialize)]
@@ -51,34 +66,30 @@ struct StatsResult {
     indexes: HashMap<String, IndexStatsResponse>,
 }

-pub async fn get_stats(ctx: Request<Data>) -> SResult<Response> {
-    ctx.is_allowed(Admin)?;
-
+#[get("/stats", wrap = "Authentication::Private")]
+async fn get_stats(data: web::Data<Data>) -> Result<HttpResponse, ResponseError> {
     let mut index_list = HashMap::new();

-    let db = &ctx.state().db;
-    let reader = db.main_read_txn()?;
-    let update_reader = db.update_read_txn()?;
+    let reader = data.db.main_read_txn()?;
+    let update_reader = data.db.update_read_txn()?;

-    let indexes_set = ctx.state().db.indexes_uids();
+    let indexes_set = data.db.indexes_uids();
     for index_uid in indexes_set {
-        let index = ctx.state().db.open_index(&index_uid);
+        let index = data.db.open_index(&index_uid);

         match index {
             Some(index) => {
                 let number_of_documents = index.main.number_of_documents(&reader)?;

-                let fields_frequency = index.main.fields_frequency(&reader)?.unwrap_or_default();
+                let fields_distribution = index.main.fields_distribution(&reader)?.unwrap_or_default();

-                let is_indexing = ctx
-                    .state()
-                    .is_indexing(&update_reader, &index_uid)?
-                    .into_internal_error()?;
+                let is_indexing = data.db.is_indexing(&update_reader, &index_uid)?.ok_or(
+                    Error::internal("Impossible to know if the database is indexing"),
+                )?;

                 let response = IndexStatsResponse {
                     number_of_documents,
                     is_indexing,
-                    fields_frequency,
+                    fields_distribution,
                 };
                 index_list.insert(index_uid, response);
             }
@@ -89,22 +100,20 @@ pub async fn get_stats(ctx: Request<Data>) -> SResult<Response> {
         }
     }

-    let database_size = WalkDir::new(ctx.state().db_path.clone())
+    let database_size = WalkDir::new(&data.db_path)
         .into_iter()
         .filter_map(|entry| entry.ok())
         .filter_map(|entry| entry.metadata().ok())
         .filter(|metadata| metadata.is_file())
         .fold(0, |acc, m| acc + m.len());

-    let last_update = ctx.state().last_update(&reader)?;
+    let last_update = data.db.last_update(&reader)?;

-    let response = StatsResult {
+    Ok(HttpResponse::Ok().json(StatsResult {
         database_size,
         last_update,
         indexes: index_list,
-    };
-
-    Ok(tide::Response::new(200).body_json(&response).unwrap())
+    }))
 }

 #[derive(Serialize)]
@ -115,203 +124,11 @@ struct VersionResponse {
pkg_version: String, pkg_version: String,
} }
pub async fn get_version(ctx: Request<Data>) -> SResult<Response> { #[get("/version", wrap = "Authentication::Private")]
ctx.is_allowed(Admin)?; async fn get_version() -> HttpResponse {
let response = VersionResponse { HttpResponse::Ok().json(VersionResponse {
commit_sha: env!("VERGEN_SHA").to_string(), commit_sha: env!("VERGEN_SHA").to_string(),
build_date: env!("VERGEN_BUILD_TIMESTAMP").to_string(), build_date: env!("VERGEN_BUILD_TIMESTAMP").to_string(),
pkg_version: env!("CARGO_PKG_VERSION").to_string(), pkg_version: env!("CARGO_PKG_VERSION").to_string(),
}; })
Ok(tide::Response::new(200).body_json(&response).unwrap())
}
#[derive(Serialize)]
#[serde(rename_all = "camelCase")]
pub(crate) struct SysGlobal {
total_memory: u64,
used_memory: u64,
total_swap: u64,
used_swap: u64,
input_data: u64,
output_data: u64,
}
impl SysGlobal {
fn new() -> SysGlobal {
SysGlobal {
total_memory: 0,
used_memory: 0,
total_swap: 0,
used_swap: 0,
input_data: 0,
output_data: 0,
}
}
}
#[derive(Serialize)]
#[serde(rename_all = "camelCase")]
pub(crate) struct SysProcess {
memory: u64,
cpu: f32,
}
impl SysProcess {
fn new() -> SysProcess {
SysProcess {
memory: 0,
cpu: 0.0,
}
}
}
#[derive(Serialize)]
#[serde(rename_all = "camelCase")]
pub(crate) struct SysInfo {
memory_usage: f64,
processor_usage: Vec<f32>,
global: SysGlobal,
process: SysProcess,
}
impl SysInfo {
fn new() -> SysInfo {
SysInfo {
memory_usage: 0.0,
processor_usage: Vec::new(),
global: SysGlobal::new(),
process: SysProcess::new(),
}
}
}
pub(crate) fn report(pid: Pid) -> SysInfo {
let mut sys = System::new();
let mut info = SysInfo::new();
info.memory_usage = sys.get_used_memory() as f64 / sys.get_total_memory() as f64 * 100.0;
for processor in sys.get_processor_list() {
info.processor_usage.push(processor.get_cpu_usage() * 100.0);
}
info.global.total_memory = sys.get_total_memory();
info.global.used_memory = sys.get_used_memory();
info.global.total_swap = sys.get_total_swap();
info.global.used_swap = sys.get_used_swap();
info.global.input_data = sys.get_network().get_income();
info.global.output_data = sys.get_network().get_outcome();
if let Some(process) = sys.get_process(pid) {
info.process.memory = process.memory();
info.process.cpu = process.cpu_usage() * 100.0;
}
sys.refresh_all();
info
}
pub async fn get_sys_info(ctx: Request<Data>) -> SResult<Response> {
ctx.is_allowed(Admin)?;
let response = report(ctx.state().server_pid);
Ok(tide::Response::new(200).body_json(&response).unwrap())
}
#[derive(Serialize)]
#[serde(rename_all = "camelCase")]
pub(crate) struct SysGlobalPretty {
total_memory: String,
used_memory: String,
total_swap: String,
used_swap: String,
input_data: String,
output_data: String,
}
impl SysGlobalPretty {
fn new() -> SysGlobalPretty {
SysGlobalPretty {
total_memory: "None".to_owned(),
used_memory: "None".to_owned(),
total_swap: "None".to_owned(),
used_swap: "None".to_owned(),
input_data: "None".to_owned(),
output_data: "None".to_owned(),
}
}
}
#[derive(Serialize)]
#[serde(rename_all = "camelCase")]
pub(crate) struct SysProcessPretty {
memory: String,
cpu: String,
}
impl SysProcessPretty {
fn new() -> SysProcessPretty {
SysProcessPretty {
memory: "None".to_owned(),
cpu: "None".to_owned(),
}
}
}
#[derive(Serialize)]
#[serde(rename_all = "camelCase")]
pub(crate) struct SysInfoPretty {
memory_usage: String,
processor_usage: Vec<String>,
global: SysGlobalPretty,
process: SysProcessPretty,
}
impl SysInfoPretty {
fn new() -> SysInfoPretty {
SysInfoPretty {
memory_usage: "None".to_owned(),
processor_usage: Vec::new(),
global: SysGlobalPretty::new(),
process: SysProcessPretty::new(),
}
}
}
pub(crate) fn report_pretty(pid: Pid) -> SysInfoPretty {
let mut sys = System::new();
let mut info = SysInfoPretty::new();
info.memory_usage = format!(
"{:.1} %",
sys.get_used_memory() as f64 / sys.get_total_memory() as f64 * 100.0
);
for processor in sys.get_processor_list() {
info.processor_usage
.push(format!("{:.1} %", processor.get_cpu_usage() * 100.0));
}
info.global.total_memory = convert(sys.get_total_memory() as f64 * 1024.0);
info.global.used_memory = convert(sys.get_used_memory() as f64 * 1024.0);
info.global.total_swap = convert(sys.get_total_swap() as f64 * 1024.0);
info.global.used_swap = convert(sys.get_used_swap() as f64 * 1024.0);
info.global.input_data = convert(sys.get_network().get_income() as f64);
info.global.output_data = convert(sys.get_network().get_outcome() as f64);
if let Some(process) = sys.get_process(pid) {
info.process.memory = convert(process.memory() as f64 * 1024.0);
info.process.cpu = format!("{:.1} %", process.cpu_usage() * 100.0);
}
sys.refresh_all();
info
}
pub async fn get_sys_info_pretty(ctx: Request<Data>) -> SResult<Response> {
ctx.is_allowed(Admin)?;
let response = report_pretty(ctx.state().server_pid);
Ok(tide::Response::new(200).body_json(&response).unwrap())
}
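The VERGEN_SHA and VERGEN_BUILD_TIMESTAMP values consumed by env! above are injected at compile time by the project's build script (the vergen crate). A hedged, self-contained sketch of the underlying mechanism, using a hand-rolled build.rs rather than vergen itself:

// build.rs -- illustrative only; the real project uses the vergen crate.
use std::process::Command;

fn main() {
    // Ask git for the current commit hash; fall back to "unknown" outside a repo.
    let sha = Command::new("git")
        .args(&["rev-parse", "HEAD"])
        .output()
        .ok()
        .and_then(|output| String::from_utf8(output.stdout).ok())
        .map(|s| s.trim().to_string())
        .unwrap_or_else(|| "unknown".to_string());

    // Anything emitted as `cargo:rustc-env` becomes readable through env!() at compile time.
    println!("cargo:rustc-env=VERGEN_SHA={}", sha);
}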


@@ -1,63 +1,78 @@
-use std::collections::BTreeSet;
-
-use meilisearch_core::settings::{SettingsUpdate, UpdateState};
-use tide::{Request, Response};
-
-use crate::error::{ResponseError, SResult};
-use crate::helpers::tide::RequestExt;
-use crate::helpers::tide::ACL::*;
-use crate::routes::document::IndexUpdateResponse;
-use crate::Data;
-
-pub async fn get(ctx: Request<Data>) -> SResult<Response> {
-    ctx.is_allowed(Private)?;
-    let index = ctx.index()?;
-    let db = &ctx.state().db;
-    let reader = db.main_read_txn()?;
-    let stop_words_fst = index.main.stop_words_fst(&reader)?;
-    let stop_words = stop_words_fst.unwrap_or_default().stream().into_strs()?;
-    Ok(tide::Response::new(200).body_json(&stop_words).unwrap())
-}
-
-pub async fn update(mut ctx: Request<Data>) -> SResult<Response> {
-    ctx.is_allowed(Private)?;
-    let index = ctx.index()?;
-    let data: BTreeSet<String> = ctx.body_json().await.map_err(ResponseError::bad_request)?;
-    let db = &ctx.state().db;
-    let mut writer = db.update_write_txn()?;
-    let settings = SettingsUpdate {
-        stop_words: UpdateState::Update(data),
-        ..SettingsUpdate::default()
-    };
-    let update_id = index.settings_update(&mut writer, settings)?;
-    writer.commit()?;
-    let response_body = IndexUpdateResponse { update_id };
-    Ok(tide::Response::new(202).body_json(&response_body)?)
-}
-
-pub async fn delete(ctx: Request<Data>) -> SResult<Response> {
-    ctx.is_allowed(Private)?;
-    let index = ctx.index()?;
-    let db = &ctx.state().db;
-    let mut writer = db.update_write_txn()?;
-    let settings = SettingsUpdate {
-        stop_words: UpdateState::Clear,
-        ..SettingsUpdate::default()
-    };
-    let update_id = index.settings_update(&mut writer, settings)?;
-    writer.commit()?;
-    let response_body = IndexUpdateResponse { update_id };
-    Ok(tide::Response::new(202).body_json(&response_body)?)
-}
+use actix_web::{web, HttpResponse};
+use actix_web::{delete, get, post};
+use meilisearch_core::settings::{SettingsUpdate, UpdateState};
+use std::collections::BTreeSet;
+
+use crate::error::{Error, ResponseError};
+use crate::helpers::Authentication;
+use crate::routes::{IndexParam, IndexUpdateResponse};
+use crate::Data;
+
+pub fn services(cfg: &mut web::ServiceConfig) {
+    cfg.service(get).service(update).service(delete);
+}
+
+#[get(
+    "/indexes/{index_uid}/settings/stop-words",
+    wrap = "Authentication::Private"
+)]
+async fn get(
+    data: web::Data<Data>,
+    path: web::Path<IndexParam>,
+) -> Result<HttpResponse, ResponseError> {
+    let index = data
+        .db
+        .open_index(&path.index_uid)
+        .ok_or(Error::index_not_found(&path.index_uid))?;
+    let reader = data.db.main_read_txn()?;
+    let stop_words = index.main.stop_words(&reader)?;
+    Ok(HttpResponse::Ok().json(stop_words))
+}
+
+#[post(
+    "/indexes/{index_uid}/settings/stop-words",
+    wrap = "Authentication::Private"
+)]
+async fn update(
+    data: web::Data<Data>,
+    path: web::Path<IndexParam>,
+    body: web::Json<BTreeSet<String>>,
+) -> Result<HttpResponse, ResponseError> {
+    let index = data
+        .db
+        .open_index(&path.index_uid)
+        .ok_or(Error::index_not_found(&path.index_uid))?;
+    let settings = SettingsUpdate {
+        stop_words: UpdateState::Update(body.into_inner()),
+        ..SettingsUpdate::default()
+    };
+    let update_id = data.db.update_write(|w| index.settings_update(w, settings))?;
+    Ok(HttpResponse::Accepted().json(IndexUpdateResponse::with_id(update_id)))
+}
+
+#[delete(
+    "/indexes/{index_uid}/settings/stop-words",
+    wrap = "Authentication::Private"
+)]
+async fn delete(
+    data: web::Data<Data>,
+    path: web::Path<IndexParam>,
+) -> Result<HttpResponse, ResponseError> {
+    let index = data
+        .db
+        .open_index(&path.index_uid)
+        .ok_or(Error::index_not_found(&path.index_uid))?;
+    let settings = SettingsUpdate {
+        stop_words: UpdateState::Clear,
+        ..SettingsUpdate::default()
+    };
+    let update_id = data.db.update_write(|w| index.settings_update(w, settings))?;
+    Ok(HttpResponse::Accepted().json(IndexUpdateResponse::with_id(update_id)))
+}
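Each migrated route module now exposes a services function that registers its handlers, and the main application wires every module in with App::configure. A self-contained sketch of that registration pattern in actix-web 3 (the ping handler and the port are ours, purely illustrative):

use actix_web::{get, web, App, HttpResponse, HttpServer};

#[get("/ping")]
async fn ping() -> HttpResponse {
    HttpResponse::Ok().body("pong")
}

// Each route module exposes a function like this that registers its handlers.
pub fn services(cfg: &mut web::ServiceConfig) {
    cfg.service(ping);
}

#[actix_web::main]
async fn main() -> std::io::Result<()> {
    // `configure` applies every module's `services` function to the app.
    HttpServer::new(|| App::new().configure(services))
        .bind("127.0.0.1:7700")?
        .run()
        .await
}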


@@ -1,82 +1,89 @@
-use std::collections::BTreeMap;
-
-use indexmap::IndexMap;
-use meilisearch_core::settings::{SettingsUpdate, UpdateState};
-use tide::{Request, Response};
-
-use crate::error::{ResponseError, SResult};
-use crate::helpers::tide::RequestExt;
-use crate::helpers::tide::ACL::*;
-use crate::routes::document::IndexUpdateResponse;
-use crate::Data;
-
-pub async fn get(ctx: Request<Data>) -> SResult<Response> {
-    ctx.is_allowed(Private)?;
-    let index = ctx.index()?;
-    let db = &ctx.state().db;
-    let reader = db.main_read_txn()?;
-    let synonyms_fst = index.main.synonyms_fst(&reader)?.unwrap_or_default();
-    let synonyms_list = synonyms_fst.stream().into_strs()?;
-    let mut synonyms = IndexMap::new();
-    let index_synonyms = &index.synonyms;
-    for synonym in synonyms_list {
-        let alternative_list = index_synonyms.synonyms(&reader, synonym.as_bytes())?;
-        if let Some(list) = alternative_list {
-            let list = list.stream().into_strs()?;
-            synonyms.insert(synonym, list);
-        }
-    }
-    Ok(tide::Response::new(200).body_json(&synonyms).unwrap())
-}
-
-pub async fn update(mut ctx: Request<Data>) -> SResult<Response> {
-    ctx.is_allowed(Private)?;
-    let data: BTreeMap<String, Vec<String>> =
-        ctx.body_json().await.map_err(ResponseError::bad_request)?;
-    let index = ctx.index()?;
-    let db = &ctx.state().db;
-    let mut writer = db.update_write_txn()?;
-    let settings = SettingsUpdate {
-        synonyms: UpdateState::Update(data),
-        ..SettingsUpdate::default()
-    };
-    let update_id = index.settings_update(&mut writer, settings)?;
-    writer.commit()?;
-    let response_body = IndexUpdateResponse { update_id };
-    Ok(tide::Response::new(202).body_json(&response_body)?)
-}
-
-pub async fn delete(ctx: Request<Data>) -> SResult<Response> {
-    ctx.is_allowed(Private)?;
-    let index = ctx.index()?;
-    let db = &ctx.state().db;
-    let mut writer = db.update_write_txn()?;
-    let settings = SettingsUpdate {
-        synonyms: UpdateState::Clear,
-        ..SettingsUpdate::default()
-    };
-    let update_id = index.settings_update(&mut writer, settings)?;
-    writer.commit()?;
-    let response_body = IndexUpdateResponse { update_id };
-    Ok(tide::Response::new(202).body_json(&response_body)?)
-}
+use std::collections::BTreeMap;
+
+use actix_web::{web, HttpResponse};
+use actix_web::{delete, get, post};
+use indexmap::IndexMap;
+use meilisearch_core::settings::{SettingsUpdate, UpdateState};
+
+use crate::error::{Error, ResponseError};
+use crate::helpers::Authentication;
+use crate::routes::{IndexParam, IndexUpdateResponse};
+use crate::Data;
+
+pub fn services(cfg: &mut web::ServiceConfig) {
+    cfg.service(get).service(update).service(delete);
+}
+
+#[get(
+    "/indexes/{index_uid}/settings/synonyms",
+    wrap = "Authentication::Private"
+)]
+async fn get(
+    data: web::Data<Data>,
+    path: web::Path<IndexParam>,
+) -> Result<HttpResponse, ResponseError> {
+    let index = data
+        .db
+        .open_index(&path.index_uid)
+        .ok_or(Error::index_not_found(&path.index_uid))?;
+    let reader = data.db.main_read_txn()?;
+    let synonyms_list = index.main.synonyms(&reader)?;
+    let mut synonyms = IndexMap::new();
+    let index_synonyms = &index.synonyms;
+    for synonym in synonyms_list {
+        let list = index_synonyms.synonyms(&reader, synonym.as_bytes())?;
+        synonyms.insert(synonym, list);
+    }
+    Ok(HttpResponse::Ok().json(synonyms))
+}
+
+#[post(
+    "/indexes/{index_uid}/settings/synonyms",
+    wrap = "Authentication::Private"
+)]
+async fn update(
+    data: web::Data<Data>,
+    path: web::Path<IndexParam>,
+    body: web::Json<BTreeMap<String, Vec<String>>>,
+) -> Result<HttpResponse, ResponseError> {
+    let index = data
+        .db
+        .open_index(&path.index_uid)
+        .ok_or(Error::index_not_found(&path.index_uid))?;
+    let settings = SettingsUpdate {
+        synonyms: UpdateState::Update(body.into_inner()),
+        ..SettingsUpdate::default()
+    };
+    let update_id = data.db.update_write(|w| index.settings_update(w, settings))?;
+    Ok(HttpResponse::Accepted().json(IndexUpdateResponse::with_id(update_id)))
+}
+
+#[delete(
+    "/indexes/{index_uid}/settings/synonyms",
+    wrap = "Authentication::Private"
+)]
+async fn delete(
+    data: web::Data<Data>,
+    path: web::Path<IndexParam>,
+) -> Result<HttpResponse, ResponseError> {
+    let index = data
+        .db
+        .open_index(&path.index_uid)
+        .ok_or(Error::index_not_found(&path.index_uid))?;
+    let settings = SettingsUpdate {
+        synonyms: UpdateState::Clear,
+        ..SettingsUpdate::default()
+    };
+    let update_id = data.db.update_write(|w| index.settings_update(w, settings))?;
+    Ok(HttpResponse::Accepted().json(IndexUpdateResponse::with_id(update_id)))
+}


@@ -0,0 +1,96 @@
use crate::Data;
use crate::error::Error;
use crate::helpers::compression;
use log::error;
use std::fs::create_dir_all;
use std::path::Path;
use std::thread;
use std::time::Duration;
use tempfile::TempDir;
pub fn load_snapshot(
db_path: &str,
snapshot_path: &Path,
ignore_snapshot_if_db_exists: bool,
ignore_missing_snapshot: bool
) -> Result<(), Error> {
let db_path = Path::new(db_path);
if !db_path.exists() && snapshot_path.exists() {
compression::from_tar_gz(snapshot_path, db_path)
} else if db_path.exists() && !ignore_snapshot_if_db_exists {
Err(Error::Internal(format!("database already exists at {:?}", db_path)))
} else if !snapshot_path.exists() && !ignore_missing_snapshot {
Err(Error::Internal(format!("snapshot doesn't exist at {:?}", snapshot_path)))
} else {
Ok(())
}
}
pub fn create_snapshot(data: &Data, snapshot_path: &Path) -> Result<(), Error> {
let tmp_dir = TempDir::new()?;
data.db.copy_and_compact_to_path(tmp_dir.path())?;
compression::to_tar_gz(tmp_dir.path(), snapshot_path)
.map_err(|e| Error::Internal(format!("something went wrong during snapshot compression: {}", e)))
}
pub fn schedule_snapshot(data: Data, snapshot_dir: &Path, time_gap_s: u64) -> Result<(), Error> {
if snapshot_dir.file_name().is_none() {
return Err(Error::Internal("invalid snapshot file path".to_string()));
}
let db_name = Path::new(&data.db_path).file_name().ok_or_else(|| Error::Internal("invalid database name".to_string()))?;
create_dir_all(snapshot_dir)?;
let snapshot_path = snapshot_dir.join(format!("{}.tar.gz", db_name.to_str().unwrap_or("data.ms")));
thread::spawn(move || loop {
thread::sleep(Duration::from_secs(time_gap_s));
if let Err(e) = create_snapshot(&data, &snapshot_path) {
error!("Unsuccessful snapshot creation: {}", e);
}
});
Ok(())
}
#[cfg(test)]
mod tests {
use super::*;
use std::io::prelude::*;
use std::fs;
#[test]
fn test_pack_unpack() {
let tempdir = TempDir::new().unwrap();
let test_dir = tempdir.path();
let src_dir = test_dir.join("src");
let dest_dir = test_dir.join("complex/destination/path/");
let archive_path = test_dir.join("archive.tar.gz");
let file_1_relative = Path::new("file1.txt");
let subfolder_relative = Path::new("subfolder/");
let file_2_relative = Path::new("subfolder/file2.txt");
create_dir_all(src_dir.join(subfolder_relative)).unwrap();
fs::File::create(src_dir.join(file_1_relative)).unwrap().write_all(b"Hello_file_1").unwrap();
fs::File::create(src_dir.join(file_2_relative)).unwrap().write_all(b"Hello_file_2").unwrap();
assert!(compression::to_tar_gz(&src_dir, &archive_path).is_ok());
assert!(archive_path.exists());
assert!(load_snapshot(&dest_dir.to_str().unwrap(), &archive_path, false, false).is_ok());
assert!(dest_dir.exists());
assert!(dest_dir.join(file_1_relative).exists());
assert!(dest_dir.join(subfolder_relative).exists());
assert!(dest_dir.join(file_2_relative).exists());
let contents = fs::read_to_string(dest_dir.join(file_1_relative)).unwrap();
assert_eq!(contents, "Hello_file_1");
let contents = fs::read_to_string(dest_dir.join(file_2_relative)).unwrap();
assert_eq!(contents, "Hello_file_2");
}
}
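The compression helpers called above (compression::to_tar_gz and compression::from_tar_gz) are not shown in this diff. A plausible minimal implementation, assuming the tar and flate2 crates; treat it as a sketch, not the project's actual module:

use std::fs::{create_dir_all, File};
use std::io;
use std::path::Path;

use flate2::read::GzDecoder;
use flate2::write::GzEncoder;
use flate2::Compression;

/// Pack the contents of `src` into a gzipped tarball written at `dest`.
pub fn to_tar_gz(src: &Path, dest: &Path) -> io::Result<()> {
    let file = File::create(dest)?;
    let gz = GzEncoder::new(file, Compression::default());
    let mut tar = tar::Builder::new(gz);
    tar.append_dir_all(".", src)?;
    // Finish the tar stream, then flush the gzip encoder.
    tar.into_inner()?.finish()?;
    Ok(())
}

/// Unpack the gzipped tarball at `src` into `dest`, creating `dest` if needed.
pub fn from_tar_gz(src: &Path, dest: &Path) -> io::Result<()> {
    let gz = GzDecoder::new(File::open(src)?);
    let mut archive = tar::Archive::new(gz);
    create_dir_all(dest)?;
    archive.unpack(dest)?;
    Ok(())
}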


@@ -0,0 +1,12 @@
{
  "indices": [
    { "uid": "test", "primaryKey": "id" },
    { "uid": "test2", "primaryKey": "test2_id" }
  ],
  "dbVersion": "0.13.0",
  "dumpVersion": "1"
}
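This JSON is the metadata file shipped inside a dump. A hedged sketch of reading it with serde; the struct and field names below are ours for illustration, not necessarily those used by MeiliSearch:

use serde::Deserialize;

#[derive(Debug, Deserialize)]
#[serde(rename_all = "camelCase")]
struct DumpIndexMetadata {
    uid: String,
    primary_key: String, // serialized as "primaryKey"
}

#[derive(Debug, Deserialize)]
#[serde(rename_all = "camelCase")]
struct DumpMetadata {
    indices: Vec<DumpIndexMetadata>,
    db_version: String,   // serialized as "dbVersion"
    dump_version: String, // serialized as "dumpVersion"
}

fn main() -> Result<(), serde_json::Error> {
    let raw = r#"{"indices":[{"uid":"test","primaryKey":"id"}],"dbVersion":"0.13.0","dumpVersion":"1"}"#;
    let meta: DumpMetadata = serde_json::from_str(raw)?;
    println!("{} indices, dump v{}", meta.indices.len(), meta.dump_version);
    Ok(())
}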

Some files were not shown because too many files have changed in this diff.