Compare commits

903 Commits

Author SHA1 Message Date
33514b28be Merge pull request #1588 from meilisearch/test-new-indexer
Integrate the new indexer
2021-09-06 10:21:42 +02:00
e3a913e03f Merge #1660
1660: Update version for the next release (v0.22.0) r=Kerollmops a=curquiza



Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2021-09-02 16:43:32 +00:00
7e80337e5b Bump milli to v0.12.0 2021-09-02 18:19:12 +02:00
8d4723d91b Update lock file 2021-09-02 18:19:12 +02:00
4cdf680a81 Make the MaxMemory use the default value when undefined 2021-09-02 18:19:11 +02:00
63e67f72e3 Update tokenizer and new milli version 2021-09-02 18:19:00 +02:00
0cd66c3a89 Bump the milli version 2021-09-02 18:19:00 +02:00
b092a624ed Introduce the MaxMemory struct that defaults to 2/3 of the available memory 2021-09-02 18:18:59 +02:00
24e84d7ca1 Test new indexer 2021-09-02 18:11:20 +02:00
14f9056349 Merge #1662
1662: Fix link in download script r=irevoire a=curquiza



Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2021-09-02 11:27:14 +00:00
723cb4d520 Fix link in download script 2021-09-01 15:57:11 +02:00
90116155b4 Update version for the next release (v0.22.0) 2021-09-01 12:33:30 +02:00
0d01c0e935 Merge #1658
1658: Remove COMMIT_SHA and COMMIT_DATE build arg from the Docker CIs r=irevoire a=curquiza

Since `@irevoire` added the `.git` folder to the Dockerfile, there is no need to compute `COMMIT_SHA` and `COMMIT_DATE` in the CI.
Can you confirm, `@irevoire`?

Also, update some CIs using `checkout@v1` to `checkout@v2`

Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2021-08-31 15:15:24 +00:00
e002509bf2 Remove COMMIT_SHA and COMMIT_DATE build arg 2021-08-31 17:01:58 +02:00
19c5c74291 Merge #1652 #1654 #1657
1652: Remove dependabot r=MarinPostma a=curquiza

Fixes #1649 

Dependabot for vulnerability and security updates is still activated.

1654: Add Script for Windows r=MarinPostma a=singh08prashant

fixes #1570 

changes:

1. added a script for detecting Windows running Git Bash
2. appended `.exe` to `$release_file` for Windows, as listed [here](https://github.com/meilisearch/MeiliSearch/releases/)
3. removed the global `$BINARY_NAME='meilisearch'`, since Windows requires the `.exe` file

1657: Bring vergen hotfix from `stable` to `main` r=MarinPostma a=curquiza



Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
Co-authored-by: singh08prashant <singh08prashant@gmail.com>
Co-authored-by: Kerollmops <clement@meilisearch.com>
Co-authored-by: bors[bot] <26634292+bors[bot]@users.noreply.github.com>
2021-08-31 14:31:42 +00:00
b6fec60243 Merge #1656
1656: Remove unused Arc import r=MarinPostma a=Kerollmops

This PR removes a warning introduced by #1606, which removed Sentry (the code that was using an `Arc`) but forgot to remove the import from scope. We remove it here.

Co-authored-by: Kerollmops <clement@meilisearch.com>
2021-08-31 14:04:16 +00:00
9d0fa8112b Remove unused Arc import 2021-08-31 14:50:36 +02:00
d30f5b1bef add scrpit for git-bash 2021-08-31 08:34:21 +05:30
7691b0d721 Merge #1636
1636: Hotfix: Log but don't panic when vergen can't retrieve commit information r=curquiza a=Kerollmops

This pull request fixes an issue we discovered when we tried to publish meilisearch v0.21 on brew: brew uses the tarball downloaded directly from GitHub, which doesn't contain the `.git` folder.

We use the `.git` folder with [vergen](https://docs.rs/vergen) to retrieve the commit and datetime information. Unfortunately, we were unwrapping the vergen result, so it crashed when the `.git` folder was missing.

We no longer panic when vergen can't find the `.git` folder; we just log any error returned by [the git2 library](https://docs.rs/git2). We then check whether the env variables are available at compile time and replace them with "unknown" if not.
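
The compile-time fallback described above could look roughly like the sketch below. It is only an illustration: the helper `or_unknown` and the two environment variable names are assumptions and may not match what the build script actually emits.

```rust
/// Hypothetical helper: falls back to "unknown" when a build-time value is missing.
fn or_unknown(value: Option<&'static str>) -> &'static str {
    value.unwrap_or("unknown")
}

fn main() {
    // `option_env!` is evaluated at compile time and yields `None` when the
    // variable is not set, so a missing `.git` folder cannot cause a panic here.
    // The variable names are assumptions made for this sketch.
    let commit_sha = or_unknown(option_env!("VERGEN_GIT_SHA"));
    let commit_date = or_unknown(option_env!("VERGEN_GIT_COMMIT_TIMESTAMP"));
    println!("commitSha: {}, commitDate: {}", commit_sha, commit_date);
}
```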

### When the `.git` folder is available

```
xh localhost:7700/version
HTTP/1.1 200 OK
Content-Type: application/json
Date: Thu, 26 Aug 2021 13:44:23 GMT
Transfer-Encoding: chunked

{
    "commitSha": "81a76eab69944de8a8d5006345b5aec7b02acf50",
    "commitDate": "2021-08-26T13:41:30+00:00",
    "pkgVersion": "0.21.0"
}
```

### When the `.git` folder is unavailable

```bash
cp -R meilisearch meilisearch-cpy
cd meilisearch-cpy
rm -rf .git
cargo clean
cargo run --release
   <snip>
   Compiling meilisearch-http v0.21.0 (/Users/clementrenault/Documents/meilisearch-cpy/meilisearch-http)
warning: vergen: could not find repository from '/Users/clementrenault/Documents/meilisearch-cpy/meilisearch-http'; class=Repository (6); code=NotFound (-3)
```

```
xh localhost:7700/version
HTTP/1.1 200 OK
Content-Type: application/json
Date: Thu, 26 Aug 2021 13:46:33 GMT
Transfer-Encoding: chunked

{
    "commitSha": "unknown",
    "commitDate": "unknown",
    "pkgVersion": "0.21.0"
}
```

Co-authored-by: Kerollmops <clement@meilisearch.com>
2021-08-30 16:25:12 +00:00
b8c954eb3f Bump the MeiliSearch version to v0.21.1 2021-08-30 17:41:25 +02:00
a8c146fd13 Unwrap or unknown the commit hash 2021-08-30 17:41:24 +02:00
70df41bc62 Remove dependabot 2021-08-30 16:51:50 +02:00
1782753387 Bump vergen and remove unused build feature 2021-08-30 15:03:45 +02:00
23ccf4429e Merge #1639
1639: Add new mini-dahsboard gif r=curquiza a=CaroFG



Co-authored-by: CaroFG <48251481+CaroFG@users.noreply.github.com>
Co-authored-by: CaroFG <carolina.ferreira131@gmail.com>
2021-08-26 15:58:39 +00:00
bf4e799dba Update README.md
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-08-26 17:47:29 +02:00
cb695bdec3 Update README with new gif 2021-08-26 17:43:41 +02:00
be70eb881a Remove old gif 2021-08-26 17:42:56 +02:00
867c277088 Add files via upload 2021-08-26 16:40:44 +02:00
96f72f009a Merge #1615
1615: Integrate the query time sort feature r=Kerollmops a=Kerollmops

This pull request integrates the sort at query time feature that was implemented on the milli side https://github.com/meilisearch/milli/pull/320. It follows the specification file https://github.com/meilisearch/specifications/blob/develop/text/0055-sort.md.

A bunch of tests have been added to ensure that the search works correctly and that the settings are fine too!

Co-authored-by: Kerollmops <clement@meilisearch.com>
2021-08-26 14:09:38 +00:00
cf4a466b6b Make sure that the order of the filterableAttributes is constant 2021-08-26 11:06:05 +02:00
087e4626ce Make sure that the order of the sortableAttributes is constant 2021-08-26 11:06:04 +02:00
64462c842b Test the search with sort time queries with POST and GET methods 2021-08-25 17:39:25 +02:00
e0f73fe742 Introduce the sort search parameter 2021-08-25 17:39:25 +02:00
ea4c831de0 Integrate the sortable-attributes into the settings 2021-08-25 17:39:25 +02:00
51387b2c80 Introduce the new invalid sortable error codes 2021-08-25 17:29:30 +02:00
2d8dd87cad Merge #1623
1623: Use Setting enum r=Kerollmops a=shekhirin

Resolves https://github.com/meilisearch/MeiliSearch/issues/1620

Co-authored-by: Alexey Shekhirin <a.shekhirin@gmail.com>
2021-08-25 14:58:40 +00:00
d9dd2a038b refactor(http): use Setting enum 2021-08-25 17:43:46 +03:00
1227ce8091 Merge #1622
1622: Update README to welcome the contribution again r=Kerollmops a=curquiza



Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2021-08-25 13:08:08 +00:00
cd63c80be8 Merge #1616
1616: Remove sentry r=Kerollmops a=irevoire

closes #1606 

Co-authored-by: Irevoire <tamo@meilisearch.com>
2021-08-25 11:40:30 +00:00
e0a5eebe79 Update README to welcome the contribution again 2021-08-24 20:31:05 +02:00
850069af75 Merge #1610
1610: Fix Docker CI for `latest` tag r=irevoire a=curquiza

Fixes https://github.com/meilisearch/MeiliSearch/issues/1608

Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2021-08-24 11:46:04 +00:00
672fcee8aa remove sentry 2021-08-24 12:38:31 +02:00
d9b023c11f Update publish-docker-latest.yml 2021-08-23 19:27:48 +02:00
6b228f56cb Merge #1607
1607: Merge changes in `stable` into `main` r=Kerollmops a=curquiza

Containing all the fixes since v0.21.0rc0

Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: Irevoire <tamo@meilisearch.com>
Co-authored-by: many <maxime@meilisearch.com>
Co-authored-by: Kerollmops <clement@meilisearch.com>
Co-authored-by: bors[bot] <26634292+bors[bot]@users.noreply.github.com>
Co-authored-by: mpostma <postma.marin@protonmail.com>
Co-authored-by: Charlotte Vermandel <charlottevermandel@gmail.com>
2021-08-23 16:27:46 +00:00
dd645e6da4 Merge #1605
1605: Fix pacic when decoding r=curquiza a=curquiza

Update milli to fix the panic during document deletion

Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2021-08-23 11:06:45 +00:00
149f46c184 Fix pacic when decoding 2021-08-23 12:37:51 +02:00
96839c48c9 Direct users to milli for the core library in the README (#1520)
* Update README.md

* Update README.md

* Update README.md

* Update README.md

Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>

* Update README.md

Co-authored-by: gui machiavelli <hey@guimachiavelli.com>

Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
Co-authored-by: gui machiavelli <hey@guimachiavelli.com>
2021-08-19 16:24:12 +02:00
3e27d5e885 Merge #1596
1596: Update milli and tokenizer version: fix panic during indexation r=curquiza a=curquiza

Fixes #1590 

Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2021-08-18 13:44:30 +00:00
38fc876704 Update tokenizer and new milli version with new tags 2021-08-18 14:55:10 +02:00
39d5a99095 Update milli and tokenizer version 2021-08-18 12:09:34 +02:00
2beb306834 Merge #1577
1577: Update milli dependency: fix facet values bugs r=Kerollmops a=curquiza

Fixes #1576 

Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2021-08-16 16:13:42 +00:00
f3e595e2f0 Update milli dependency 2021-08-16 13:36:42 +02:00
5d80d11b23 Merge #1580
1580: Update telemetry link r=curquiza a=curquiza

Here is the page the user will have: https://dev.docs.meilisearch.com/learn/what_is_meilisearch/telemetry.html
instead of: https://docs.meilisearch.com/reference/features/configuration.html#disable-analytics

Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2021-08-12 17:11:30 +00:00
621529e9dc Update telemetry link 2021-08-12 18:58:07 +02:00
535aff8f7e Merge #1578
1578: Update tokenizer version to v0.2.4 r=ManyTheFish a=curquiza



Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2021-08-12 15:27:12 +00:00
7531280764 Update tokenizer version to v0.2.4 2021-08-12 13:55:47 +02:00
63daa8b15a Update README.md (#1568) 2021-08-09 16:38:52 +02:00
92913e1eb8 Add information about product repo (#1567)
* Add information about product repo

* Update README.md

Co-authored-by: Guillaume Mourier <guillaume@meilisearch.com>

Co-authored-by: Guillaume Mourier <guillaume@meilisearch.com>
2021-08-09 14:56:43 +02:00
418be3daa8 Update issue templates (#1564) 2021-08-09 10:51:02 +02:00
7e3b2ddff2 Merge #1554
1554: Fix dump v1 (attributesForFaceting, and criteria) r=curquiza a=MarinPostma

close #1553


Co-authored-by: mpostma <postma.marin@protonmail.com>
2021-08-05 19:45:52 +00:00
312d93961a Merge #1556
1556: Update milli to v0.9.0 r=MarinPostma a=curquiza

Fixes #1552 

Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2021-08-05 14:04:55 +00:00
8f05d8d546 fix clippy warnings 2021-08-05 16:00:47 +02:00
f5ddea481a reintroduce exactness 2021-08-05 15:59:39 +02:00
29ca8271b3 test dumpv1 format regression 2021-08-05 15:59:39 +02:00
3084537d1e restore attributes for faceting in dump v1 2021-08-05 15:59:39 +02:00
86ac994543 Merge #1557
1557: Fix docs link anchor r=MarinPostma a=curquiza

thank you `@guimachiavelli` 

Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2021-08-05 13:34:48 +00:00
992b082c6f Fix docs link anchor 2021-08-05 13:28:32 +02:00
31fe263356 Update milli to v0.9.0 2021-08-05 13:08:27 +02:00
7a0b20c740 Merge #1532
1532: Start writing documentation for newcomers r=MarinPostma a=irevoire



Co-authored-by: Tamo <tamo@meilisearch.com>
2021-08-03 09:26:45 +00:00
9810f6b695 Merge #1540
1540: Update milli to version 0.8.1 r=curquiza a=curquiza

Integrates this fix into MeiliSearch https://github.com/meilisearch/milli/pull/296

Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2021-07-29 17:15:52 +00:00
09c74c04a0 Merge #1539
1539: Use serdeval for validating json format. r=curquiza a=MarinPostma

uses [serdeval](https://github.com/MarinPostma/serdeval) to validate that the json payload is valid json, and in the correct format.

fix #1535


Co-authored-by: mpostma <postma.marin@protonmail.com>
2021-07-29 17:05:13 +00:00
b6cc932c09 Merge #1541
1541: Make clippy happy r=curquiza a=curquiza



Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2021-07-29 16:53:53 +00:00
1b5d918cb9 Fix rustfmt 2021-07-29 18:32:09 +02:00
bf76d4a43c Make clippy happy 2021-07-29 18:14:36 +02:00
53b4b2fcbc Use serdeval for validating json format. 2021-07-29 18:02:54 +02:00
9a8629a6a9 Update milli 2021-07-29 17:45:31 +02:00
78308365ec fix typos 2021-07-29 14:40:41 +02:00
976075578f Merge #1537
1537: Import `.git` to the docker build image to fix vergen r=curquiza a=irevoire

I observed a small difference in the size of the build image, but I think we can allow it:
![image](https://user-images.githubusercontent.com/7032172/127369567-d03f9a41-3ad5-4933-888e-a3777df8c6cf.png)

I was not able to see any difference in build time, though.

Co-authored-by: Tamo <tamo@meilisearch.com>
2021-07-29 12:31:55 +00:00
243233f652 import .git to docker to fix vergen 2021-07-28 19:12:40 +02:00
d66eea42bb Merge #1536
1536: Remove ARMv7 binary publish r=MarinPostma a=curquiza

Fixes #1315 

Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2021-07-28 15:39:34 +00:00
c55f73bbc3 Remove ARMv7 support 2021-07-28 17:29:40 +02:00
3e30d4270b Merge #1533
1533: Update milli version to v0.8.0 r=MarinPostma a=curquiza

- Update milli, heed and obkv
- fix relevancy issue and the `facetsDistribution` display

Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2021-07-28 11:15:31 +00:00
80916baa21 Add FieldId in import 2021-07-28 12:25:13 +02:00
1df8f041bd Update meilisearch-http/src/index/search.rs
Co-authored-by: marin <postma.marin@protonmail.com>
2021-07-28 12:10:25 +02:00
6a6e2a8cd1 Update meilisearch-http/src/index/search.rs
Co-authored-by: marin <postma.marin@protonmail.com>
2021-07-28 12:08:51 +02:00
f9d337b320 Update meilisearch-http/src/index/search.rs
Co-authored-by: marin <postma.marin@protonmail.com>
2021-07-28 12:08:36 +02:00
feb069f604 Update meilisearch-http/src/index/search.rs
Co-authored-by: marin <postma.marin@protonmail.com>
2021-07-28 12:08:28 +02:00
7e0eed5772 Update meilisearch-http/src/index/search.rs
Co-authored-by: marin <postma.marin@protonmail.com>
2021-07-28 12:08:24 +02:00
9bdd040dd0 Update meilisearch-http/src/index/mod.rs
Co-authored-by: marin <postma.marin@protonmail.com>
2021-07-28 12:08:19 +02:00
e5dabf265a Update milli version to v0.8.0 2021-07-28 10:52:47 +02:00
1a1046a0ef start writing some documentation for newcomers 2021-07-27 16:35:42 +02:00
dd18319b44 Merge #1530
1530: Update mini-dashboard version to v0.1.4 r=irevoire a=mdubus



Co-authored-by: Morgane Dubus <30866152+mdubus@users.noreply.github.com>
2021-07-27 10:11:02 +00:00
d3cd7e92d1 Update mini-dashboard version to v0.1.4 2021-07-27 11:44:20 +02:00
553e7d8aaa Merge #1528
1528: Update of the Date Time Format in commitDate  r=MarinPostma a=irevoire

Since we were relying on a [super old version of `vergen`](https://docs.rs/crate/vergen/3.0.1), we could not get the `commit timestamp`, so I updated `vergen` to the latest version.
This also allows us to remove all the features we don't use.

closes #1522

Co-authored-by: Tamo <tamo@meilisearch.com>
2021-07-27 07:49:31 +00:00
f79b8287f5 update vergen 2021-07-26 15:25:30 +02:00
b4c98f6cc3 Merge #1521
1521: Sentry was never sending anything r=Kerollmops a=irevoire

@Kerollmops noticed that we had no log of this release in sentry, and it looks like I tested my code badly after ignoring the “No space left on device” errors.

Now it should be fixed.

Co-authored-by: Tamo <tamo@meilisearch.com>
2021-07-21 14:46:56 +00:00
5d4a0ac844 sentry was never sending anything 2021-07-21 11:50:54 +02:00
0136b02e5b Merge #1498
1498: Show the filterable and not the faceted attributes in the settings r=Kerollmops a=Kerollmops

Fixes #1497

Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-07-13 07:27:14 +00:00
f49a01703a Show the filterable and not the faceted attributes in the settings 2021-07-09 16:11:37 +02:00
e4f82aa441 Merge #1494
1494: Add cache to the ci r=irevoire a=irevoire

closes #1446 

Co-authored-by: Tamo <tamo@meilisearch.com>
2021-07-08 14:06:03 +00:00
751d1af2a6 Merge #1492
1492: auth tests r=irevoire a=MarinPostma

add regression tests on route authentication.


Co-authored-by: mpostma <postma.marin@protonmail.com>
2021-07-08 13:13:55 +00:00
076d8fbb84 add cache to the ci 2021-07-08 11:19:12 +02:00
4b2d01a453 Merge #1484
1484: Add MeiliSearch version to issue template r=irevoire a=bidoubiwa

It is relevant to know the version of MeiliSearch before any other additional information that might be important to know.

We could also reduce the amount of information required from the user. I would like to suggest the following:

Instead of the `Desktop` and `Smartphone` sections, I would just improve the last section

```
**Additional context**
Additional information that may be relevant to the issue.
[e.g. architecture, device, OS, browser]
```

With this change, the final template will look like this:

-----

**Describe the bug**
A clear and concise description of what the bug is.

**To Reproduce**
Steps to reproduce the behavior:
1. Go to '...'
2. Click on '....'
3. Scroll down to '....'
4. See error

**Expected behavior**
A clear and concise description of what you expected to happen.

**Screenshots**
If applicable, add screenshots to help explain your problem.

**MeiliSearch version:** [e.g. v0.20.0]

**Additional context**
Additional information that may be relevant to the issue.
[e.g. architecture, device, OS, browser]

Co-authored-by: Charlotte Vermandel <charlottevermandel@gmail.com>
2021-07-08 08:49:35 +00:00
a71fa25ebe auth tests 2021-07-07 17:47:48 +02:00
b4db54cb1f Merge #1488
1488: fix search permissions r=MarinPostma a=MarinPostma

fix #1485


Co-authored-by: mpostma <postma.marin@protonmail.com>
2021-07-07 13:16:07 +00:00
b2ca600e79 Remove unecessary questions 2021-07-07 11:15:08 +02:00
83725a1330 fix search permissions 2021-07-07 10:39:04 +02:00
587b837a6c Add MeiliSearch version to issue template 2021-07-06 22:04:15 +02:00
2844fe959f Merge #1483
1483: search tests r=MarinPostma a=MarinPostma

adds search tests from #1440.

Co-authored-by: mpostma <postma.marin@protonmail.com>
2021-07-06 15:13:07 +00:00
41e271974a add tests 2021-07-06 16:21:15 +02:00
520d37983c implement index search methods 2021-07-06 11:54:09 +02:00
487d82773a Merge #1481
1481: fix bug in index deletion r=Kerollmops a=MarinPostma

this bug was caused by a heed iterator entry being deleted while still holding a reference to it.


close #1333
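
One common way to avoid this class of bug is to collect the keys first and delete them in a second pass, so that nothing still refers to an entry when it is removed. The sketch below shows the pattern with a standard `BTreeMap` as a stand-in; it is not heed code and not necessarily the fix applied in this PR, and `delete_index_entries` and the key layout are hypothetical.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper: removes every entry whose key starts with `<index_uid>/`
/// and returns how many entries were deleted.
fn delete_index_entries(store: &mut BTreeMap<String, u64>, index_uid: &str) -> usize {
    let prefix = format!("{}/", index_uid);

    // First pass: copy the matching keys, so the iterator over the map is
    // finished (and its borrow released) before anything is deleted.
    let to_delete: Vec<String> = store
        .keys()
        .filter(|key| key.starts_with(prefix.as_str()))
        .cloned()
        .collect();

    // Second pass: delete using the owned copies of the keys.
    for key in &to_delete {
        store.remove(key);
    }
    to_delete.len()
}

fn main() {
    let mut store = BTreeMap::new();
    store.insert("movies/1".to_string(), 1u64);
    store.insert("books/1".to_string(), 2u64);
    let deleted = delete_index_entries(&mut store, "movies");
    println!("deleted {} entries, {} left", deleted, store.len());
}
```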


Co-authored-by: mpostma <postma.marin@protonmail.com>
2021-07-06 08:07:30 +00:00
066085f6f5 fix index deletion bug 2021-07-05 18:42:13 +02:00
0d1f5b7193 Merge #1469
1469: Return 201 on index creation r=Kerollmops a=MarinPostma

fix #1467


Co-authored-by: mpostma <postma.marin@protonmail.com>
2021-07-05 15:42:13 +00:00
2f3a439566 fix tests 2021-07-05 16:31:52 +02:00
9681ffca52 change index create http code 2021-07-05 16:31:51 +02:00
fddc60f893 Merge #1471
1471: Bump milli to 0.7.2 r=irevoire a=irevoire



Co-authored-by: Tamo <tamo@meilisearch.com>
2021-07-05 13:29:38 +00:00
0f024cc225 Merge #1478
1478: refactor routes r=irevoire a=MarinPostma

refactor the route directory, so the module tree follows the route structure


Co-authored-by: mpostma <postma.marin@protonmail.com>
2021-07-05 12:55:39 +00:00
575ec2a06f refactor routes 2021-07-05 14:33:48 +02:00
83aef0a27d Merge #1473
1473: Update loop r=MarinPostma a=irevoire



Co-authored-by: mpostma <postma.marin@protonmail.com>
2021-07-05 12:32:29 +00:00
bc85d30076 add test 2021-07-05 12:33:28 +02:00
bc417726fc fix update loop bug 2021-07-05 12:33:22 +02:00
9949a2a930 bump milli to 0.7.2 2021-07-05 12:19:27 +02:00
71e1cb472f Merge #1457
1457: Hotfix highlight on emojis panic r=Kerollmops a=ManyTheFish

When the highlight bound is in the middle of a character
or if we are out of bounds, we highlight the complete matching word.

note: we should enhance the tokenizer and the Highlighter to match char indices.

Fix #1368
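
The fallback described above can be sketched with the standard library alone. This is a minimal illustration, assuming the highlight bounds are byte offsets into the matched word; `highlight_span` is a hypothetical name, not the function changed in this PR.

```rust
/// Returns the part of `word` to highlight for the byte range `start..end`.
/// If the range is out of bounds or cuts a UTF-8 character in half,
/// the complete word is returned instead.
fn highlight_span(word: &str, start: usize, end: usize) -> &str {
    // `str::get` returns `None` (rather than panicking) for ranges that are
    // out of bounds or that do not fall on character boundaries.
    word.get(start..end).unwrap_or(word)
}

fn main() {
    let word = "héllo"; // 'é' occupies two bytes, at offsets 1 and 2
    println!("{}", highlight_span(word, 0, 1)); // valid range
    println!("{}", highlight_span(word, 0, 2)); // bound inside 'é'
    println!("{}", highlight_span(word, 0, 42)); // out of bounds
}
```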

Co-authored-by: many <maxime@meilisearch.com>
2021-07-01 14:48:18 +00:00
38161ede33 Add test with special characters 2021-07-01 16:44:17 +02:00
70dd1e6263 Merge #1456
1456: Fix update loop timeout r=Kerollmops a=Kerollmops

This PR corrects a faulty fix of the update loop introduced in #1429.

Co-authored-by: Kerollmops <clement@meilisearch.com>
2021-07-01 13:53:47 +00:00
e626c9c8b9 Merge #1448
1448: Enable the tests on windows in the CI r=curquiza a=irevoire

closes #1443 

Co-authored-by: Tamo <tamo@meilisearch.com>
2021-07-01 13:12:09 +00:00
fa5f8f9531 Fix an issue with the update loop falsely breaking 2021-07-01 14:53:31 +02:00
acfe31151e Hotfix panic for unicode characters
When the highlight bound is in the middle of a character
or if we are out of bounds, we highlight the complete matching word.

note: we should enhance the tokenizer and the Highlighter to match char indices.

Fix #1368
2021-07-01 14:49:22 +02:00
cb71b714d7 fix bors 2021-07-01 14:43:54 +02:00
4c6655f68c ci: enable tests on windows 2021-07-01 14:43:54 +02:00
490836a7b3 ignore the snapshots and dumps in the gitignore (#1449) 2021-07-01 14:41:53 +02:00
c11c909bad update bors 2021-07-01 12:02:22 +02:00
5c9401ad94 Merge #1438
1438: Update milli to 0.7.1 r=curquiza a=curquiza



Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2021-06-30 18:49:41 +00:00
768987583a Merge #1428
1428: Accept any content type as json r=curquiza a=irevoire



Co-authored-by: Tamo <tamo@meilisearch.com>
2021-06-30 18:29:57 +00:00
cb58a8c776 Merge #1429
1429: Do not block when sending update notifications r=curquiza a=irevoire

transplant this [PR](https://github.com/meilisearch/transplant/pull/260) from @Kerollmops 🎉 

Co-authored-by: Tamo <tamo@meilisearch.com>
2021-06-30 17:21:56 +00:00
4f0d3b065f Update milli 2021-06-30 18:39:06 +02:00
a95c44193d Do not block when sending update notifications 2021-06-30 17:29:22 +02:00
2830853665 accept any content type as json 2021-06-30 17:05:59 +02:00
a4ca79c9b3 Merge #1427
1427: Update README.md r=curquiza a=tpayet

Update quickstart & examples for rc0.21

Co-authored-by: Thomas Payet <thomas@meilisearch.com>
2021-06-30 15:00:42 +00:00
85b0878334 Update README.md
Update quickstart & examples for rc0.21
2021-06-30 16:58:02 +02:00
d61852a73f Merge #1421
1421: Transplant the new search engine r=tpayet a=curquiza



Co-authored-by: tamo <tamo@meilisearch.com>
Co-authored-by: Marin Postma <postma.marin@protonmail.com>
Co-authored-by: bors[bot] <26634292+bors[bot]@users.noreply.github.com>
Co-authored-by: Irevoire <tamo@meilisearch.com>
Co-authored-by: marin <postma.marin@protonmail.com>
2021-06-30 14:14:11 +00:00
14b6224de7 Update docker CIs 2021-06-30 16:08:01 +02:00
f0958c7d9b Remove useless CI 2021-06-30 16:00:25 +02:00
01de7f9e36 Update version 2021-06-30 15:59:59 +02:00
9f9148a1c6 Remove legacy test CI 2021-06-30 15:50:20 +02:00
73db1b3822 Merge remote-tracking branch 'transplant/main' 2021-06-30 15:30:08 +02:00
abca68bf24 Remove legacy source code 2021-06-30 15:20:17 +02:00
eeca841a21 Merge #259
259: Run rustfmt one the whole project and add it to the CI r=curquiza a=irevoire

Since there is currently no other PR modifying the code, I think it's a good time to reformat everything and add rustfmt to the ci.

Co-authored-by: Tamo <tamo@meilisearch.com>
2021-06-30 11:55:30 +00:00
3a9b86ad55 add rustfmt to bors 2021-06-30 10:49:10 +02:00
f1cc141f6c Merge #258
258: Use rustls instead of openssl r=curquiza a=irevoire

I also removed all the `default-features` of reqwest since we are only using the JSON one.
Fix #255

Co-authored-by: Tamo <tamo@meilisearch.com>
2021-06-29 14:42:25 +00:00
3011209e28 bump alpine version 2021-06-29 16:36:41 +02:00
29bf6a8d42 run rustfmt one the whole project and add it to the CI 2021-06-29 15:25:18 +02:00
c282466750 remove the libressl dependency from our docker file 2021-06-29 15:22:11 +02:00
de9ea94f57 Merge #257
257: Accept no content-type as json r=curquiza a=irevoire

closes #253 

Co-authored-by: Tamo <tamo@meilisearch.com>
2021-06-29 12:54:33 +00:00
fe7640555d fix the content-type 2021-06-29 13:16:56 +02:00
ec809ca487 use rustls instead of openssl and remove all default-features of reqwest 2021-06-29 13:07:40 +02:00
1dc99ea451 accept no content-type as json 2021-06-29 11:59:25 +02:00
f12ace3fbf Merge #256
256: Update heed and milli r=irevoire a=curquiza



Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2021-06-29 08:49:22 +00:00
c09e610bb5 Update heed and milli 2021-06-29 10:25:47 +02:00
712abf4c5f Merge #246
246: Stop logging the no space left on device error r=curquiza a=irevoire

closes #208
@qdequele what do you think of that?
Are there any other errors we need to ignore?

As you can see in the code, once we are in `Sentry` the error has already been converted to a `String`, so the only thing we can do to decide whether to send the error is to match the `String` against our error message.
If we have a lot of other logs we want to ignore, I would suggest prefixing all of them with something like:
```
User error: No space left on device
```
So in Sentry, we could just check if the log starts with `User error:` and ignore all these errors at once
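
The prefix check suggested above might be sketched as follows. The constant and the function name are hypothetical, and the sketch does not use the Sentry client API; it only shows the string test.

```rust
/// Hypothetical prefix marking errors caused by the user's environment.
const USER_ERROR_PREFIX: &str = "User error:";

/// Returns `true` when a log message should be forwarded to the error tracker.
fn should_report(message: &str) -> bool {
    !message.starts_with(USER_ERROR_PREFIX)
}

fn main() {
    for message in ["User error: No space left on device", "internal: unexpected state"] {
        println!("{:?} -> report: {}", message, should_report(message));
    }
}
```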

Co-authored-by: Tamo <tamo@meilisearch.com>
2021-06-29 08:20:49 +00:00
261df4b386 Merge #252
252: Fix docker run r=curquiza a=curquiza

Not the most beautiful fix since I cannot update alpine to version 3.14 without being flooded with errors.

Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2021-06-28 15:47:24 +00:00
b0f399a51d Merge #249
249: Use half of the computer threads for the indexing process by default r=Kerollmops a=irevoire

closes #241 
By default, we use only half of the CPU threads when indexing documents; this allows the user to use the search while indexing. Also, the machine will not appear unresponsive when indexing a large batch of documents.

In the special case where a user only has one core, we use it entirely 😄 
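
A minimal sketch of that rule, assuming the thread count comes from the standard library (the PR may obtain it differently); `indexing_threads` is a hypothetical name.

```rust
use std::thread;

/// Half of the available threads, but never fewer than one.
fn indexing_threads(available: usize) -> usize {
    (available / 2).max(1)
}

fn main() {
    // Fall back to a single thread if the parallelism cannot be queried.
    let available = thread::available_parallelism()
        .map(|n| n.get())
        .unwrap_or(1);
    println!(
        "indexing with {} of {} threads",
        indexing_threads(available),
        available
    );
}
```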

Co-authored-by: Tamo <tamo@meilisearch.com>
2021-06-28 15:25:11 +00:00
348d112388 Fix docker run 2021-06-28 16:55:29 +02:00
5c35a5d9fc Merge #250
250: Update mini-dashboard to v.0.1.3 r=curquiza a=mdubus

Should fix #245 

Co-authored-by: Morgane Dubus <30866152+mdubus@users.noreply.github.com>
2021-06-28 13:42:34 +00:00
a26bb50d62 Update mini-dashboard to v.0.1.3 2021-06-28 15:13:52 +02:00
a59f437ee3 use only half of the computer threads for the indexation by default 2021-06-28 14:35:50 +02:00
d74c698adc stop logging the no space left on device error 2021-06-28 13:59:48 +02:00
8d8fe8fd29 Merge #248
248: Unused borrow that must be used r=curquiza a=irevoire

I noticed #228 introduced a warning while compiling

Co-authored-by: Tamo <tamo@meilisearch.com>
2021-06-28 11:53:22 +00:00
c1c50f6714 unused borrow that must be used 2021-06-28 13:35:25 +02:00
d7ca68d8e9 Merge #228
228: Authentication rework r=curquiza a=MarinPostma

In an attempt to fix #201, I ended up completely rewriting the authentication system we use. This is because actix doesn't allow wrapping a single route in a middleware, so we initially put each route into its own service to use the authentication middleware. Routes are now grouped in resources, fixing #201.

As for the authentication, I decided to take a very different approach and ditch middleware altogether. Instead, I decided to use actix's [extractor](https://actix.rs/docs/extractors/). `Data` is now wrapped in a `GuardedData<P: Policy, T>` (where `T` is `Data`) in each route. The `Policy` trait, through its `authenticate` method, tells whether a request is authorized to access the resources in the route. Concretely, before the server starts, it is configured with an `AuthConfig` instance that can either be `AuthConfig::NoAuth` when no auth is required at runtime, or `AuthConfig::Auth(Policies)`, where `Policies` maps each `Policy` type to its singleton instance.

In the current implementation, and to match the legacy meilisearch behaviour, each policy implementation contains a `HashSet` of tokens (`Vec<u8>` for now) that represents the users it can authenticate. When starting the program, each key (identified as a user) is given a set of `Policy`, representing its roles. The latter is facilitated by the `create_users` macro, like so:

```rust
create_users!(
    policies,
    master_key.as_bytes() => { Admin, Private, Public },
    private_key.as_bytes() => { Private, Public },
    public_key.as_bytes() => { Public }
);
```

This is some groundwork for later development on a full-fledged authentication system for meilisearch.
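
To make the design easier to picture, here is a simplified, standard-library-only sketch of the idea. It is not the code from this PR: the `Policy` trait and `AuthConfig` enum are reduced to their essentials, `AuthConfig::Auth` holds a single policy instead of a `Policies` map, and the `add` and `allows` methods are assumptions made for illustration.

```rust
use std::collections::HashSet;

/// A policy decides whether a token may access the routes it guards.
trait Policy {
    fn authenticate(&self, token: &[u8]) -> bool;
}

/// Simplified policy backed by the set of tokens it accepts.
#[derive(Default)]
struct Public {
    tokens: HashSet<Vec<u8>>,
}

impl Public {
    /// Hypothetical helper registering a key as a user of this policy.
    fn add(&mut self, token: &[u8]) {
        self.tokens.insert(token.to_vec());
    }
}

impl Policy for Public {
    fn authenticate(&self, token: &[u8]) -> bool {
        self.tokens.contains(token)
    }
}

/// Simplified configuration: no auth at all, or a single policy to check.
enum AuthConfig<P: Policy> {
    NoAuth,
    Auth(P),
}

impl<P: Policy> AuthConfig<P> {
    fn allows(&self, token: &[u8]) -> bool {
        match self {
            AuthConfig::NoAuth => true,
            AuthConfig::Auth(policy) => policy.authenticate(token),
        }
    }
}

fn main() {
    let mut public = Public::default();
    public.add(b"public_key");
    let config = AuthConfig::Auth(public);
    println!("known key allowed: {}", config.allows(b"public_key"));
    println!("unknown key allowed: {}", config.allows(b"other"));
    println!("no auth allows anything: {}", AuthConfig::<Public>::NoAuth.allows(b"x"));
}
```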


fix #201

Co-authored-by: marin postma <postma.marin@protonmail.com>
2021-06-28 08:38:59 +00:00
01b09c065b change route to service<resource> 2021-06-24 19:02:28 +02:00
08104fd49c Merge #242
242: Fix docker build r=MarinPostma a=curquiza



Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2021-06-24 15:30:27 +00:00
3b601f615a declare new authentication related errors 2021-06-24 16:53:20 +02:00
b1f7fe24f6 Fix docker build 2021-06-24 16:45:51 +02:00
fbd58f2eec clippy 2021-06-24 16:36:22 +02:00
79fc3bb84e fmt 2021-06-24 16:36:22 +02:00
8e4928c7ea fix tests 2021-06-24 16:36:22 +02:00
d078cbf39b remove authentication middleware 2021-06-24 16:36:21 +02:00
561596d8bc update stats routes 2021-06-24 16:36:18 +02:00
549b489c8a update settings routes 2021-06-24 16:35:48 +02:00
1e9f374ff8 update running route 2021-06-24 16:35:12 +02:00
817fcfdd88 update keys route 2021-06-24 16:35:12 +02:00
fab50256bc update index routes 2021-06-24 16:35:04 +02:00
b044608b25 update health route 2021-06-24 16:32:45 +02:00
ce4fb8ce20 update dump route 2021-06-24 16:32:43 +02:00
adf91d286b update documents and search routes 2021-06-24 16:32:15 +02:00
0c1c7a3dd9 implement authentication policies 2021-06-24 16:31:30 +02:00
5b71751391 policies macros 2021-06-24 16:31:30 +02:00
12f6709e1c move authencation to extractor mod 2021-06-24 16:31:28 +02:00
5229f1e220 experimental auth extractor 2021-06-24 16:30:15 +02:00
b6ca7929eb Merge #240
240: Rework error messages r=irevoire a=MarinPostma

Simplify the error messages, and make them more compliant with legacy Meilisearch.

Basically, stop composing the messages, and simply forward the message of inner errors.
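
As an illustration of forwarding instead of composing, the sketch below shows an outer error whose `Display` implementation prints the inner error's message unchanged. The type names and the message text are hypothetical and are not taken from the codebase.

```rust
use std::fmt;

/// Hypothetical inner error carrying its own user-facing message.
#[derive(Debug)]
struct IndexNotFound(String);

impl fmt::Display for IndexNotFound {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "Index {} not found", self.0)
    }
}

/// Hypothetical outer error returned to the client.
#[derive(Debug)]
enum ResponseError {
    Index(IndexNotFound),
}

impl fmt::Display for ResponseError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            // Forward the inner message as-is, without adding a wrapper
            // such as "error in index: ...".
            ResponseError::Index(inner) => write!(f, "{}", inner),
        }
    }
}

fn main() {
    let error = ResponseError::Index(IndexNotFound("movies".to_string()));
    println!("{}", error);
}
```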


Co-authored-by: marin postma <postma.marin@protonmail.com>
2021-06-24 11:36:11 +00:00
43204ca67b Merge #230
230: Logs r=MarinPostma a=irevoire

closes #193 

Since we can't really print the body of requests in actix-web, I logged the parameters of every request and what we were returning to the client.

Co-authored-by: Tamo <tamo@meilisearch.com>
2021-06-24 09:23:24 +00:00
ad8d9a97d6 debug the body of every http request 2021-06-24 11:22:11 +02:00
36f32f58d4 add the log_level variable to the cli and reduce the log level of milli and grenad 2021-06-24 11:20:52 +02:00
b4fd4212ad reduce the log level of some info! 2021-06-24 11:20:52 +02:00
a1d34faaad decompose error messages 2021-06-24 10:57:28 +02:00
a2368db154 Merge #239
239: Bump milli to 0.6.0 r=MarinPostma a=MarinPostma

fix #231


Co-authored-by: marin postma <postma.marin@protonmail.com>
2021-06-24 08:08:41 +00:00
381e07b7b6 Merge #1415
1415: Fix README.md typos r=curquiza a=dichotommy

Just fixing some typos and such.
Kanji -> Hanzi
Kanji refers only to the Japanese versions of Chinese characters, and since we don't have a Japanese tokenization pipeline I think it could be misunderstood.

Co-authored-by: Tommy <68053732+dichotommy@users.noreply.github.com>
2021-06-24 07:46:28 +00:00
74bb748a4e bump milli to 0.6.0 2021-06-23 18:40:19 +02:00
09113fc73c Update README.md
Just fixing some typos and such.
Kanji refers only to Japanese versions of the Chinese characters, and since we don't have a Japanese tokenization pipeline I think it could be misleading.
2021-06-23 18:30:48 +02:00
8638c9ab77 Merge #232
232: Fix payload size limit r=MarinPostma a=MarinPostma

Fix #223

This was because `Payload` ignores the payload size limit. I fixed it by implementing my own `Payload` extractor that checks that the payload is not too large.

I also refactored the `create_app` a bit.
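
The size check can be sketched independently of actix-web. This is a minimal, standard-library-only illustration, not the extractor from this PR: `read_limited` and `PayloadTooLarge` are hypothetical names, and body chunks are modeled as plain `Vec<u8>` values.

```rust
/// Hypothetical error returned when the body grows past the configured limit.
#[derive(Debug, PartialEq)]
pub struct PayloadTooLarge {
    pub limit: usize,
}

/// Collects body chunks and fails as soon as the running total would
/// exceed `limit` bytes.
pub fn read_limited<I>(chunks: I, limit: usize) -> Result<Vec<u8>, PayloadTooLarge>
where
    I: IntoIterator<Item = Vec<u8>>,
{
    let mut body = Vec::new();
    for chunk in chunks {
        // Check before copying so an oversized body is never fully buffered.
        if body.len() + chunk.len() > limit {
            return Err(PayloadTooLarge { limit });
        }
        body.extend_from_slice(&chunk);
    }
    Ok(body)
}
```

Checking the running total on every chunk, rather than trusting a declared length, is what lets the limit hold for streamed bodies.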

Co-authored-by: marin postma <postma.marin@protonmail.com>
2021-06-23 16:06:08 +00:00
b676b10cfe Merge #238
238: Fix settings subroutes get r=MarinPostma a=MarinPostma

Fix #225 

Co-authored-by: marin postma <postma.marin@protonmail.com>
2021-06-23 15:45:50 +00:00
f68c257452 move flush in write_to_file function 2021-06-23 16:49:25 +02:00
880fc069bd remove dbg 2021-06-23 16:49:25 +02:00
a838238a63 move payload to own module 2021-06-23 16:49:25 +02:00
834995b130 clippy + fmt 2021-06-23 16:49:23 +02:00
b000ae7614 remove file if write to update file fails 2021-06-23 16:48:33 +02:00
f62779671b change error message for payload size limit 2021-06-23 16:48:33 +02:00
4b292c6e9b add payload limit to app config 2021-06-23 16:48:33 +02:00
1c13100948 implement custom payload 2021-06-23 16:48:31 +02:00
71226feb74 refactor create_app macro 2021-06-23 16:47:15 +02:00
b9b4feada8 add tests 2021-06-23 16:21:32 +02:00
3175f09989 Merge #235
235: Fix dump not found error r=MarinPostma a=MarinPostma

fix #233


Co-authored-by: marin postma <postma.marin@protonmail.com>
2021-06-23 14:21:07 +00:00
322d6b8cfe fix serialization bug in settings 2021-06-23 15:25:56 +02:00
da36a6b5cd fix not found error 2021-06-23 15:06:36 +02:00
f2b2ca6d55 Merge #227
227: improve mini dashboard routing r=MarinPostma a=MarinPostma

The dependency we use to statically serve the mini-dashboard used globbing to serve the mini-dashboard files. This caused all unfound routes to be caught by the "/" route serving the dashboard assets. This fix gives the assets a dedicated route, so any unfound route is caught by the default service and returns a 404.


Co-authored-by: marin postma <postma.marin@protonmail.com>
2021-06-23 13:01:40 +00:00
0ebe3900e0 Merge #229
229: Add exhaustiveFacetsCount r=MarinPostma a=curquiza

I completely forgot this one 😅

Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2021-06-23 09:29:54 +00:00
ec3140a29e Fix clippy 2021-06-23 11:23:57 +02:00
00b0a00fc5 Add exhaustiveFacetsCount 2021-06-23 11:05:30 +02:00
adb970edcc Merge #226
226: Make facetsDistribution name iso r=MarinPostma a=curquiza

Even if there is an English mistake in `facets_distribution` (because of the `s`), @gmourier asked me to keep the typo: the name of `facetsDistribution` might change completely in the future, and he wants to avoid two breaking changes.

@gmourier can you confirm before we merge this PR?

Sorry I left this update in the code (I'm confused because no issue was open to update `facetsDistribution`); there might have been a confusion with `fieldsDistribution`, which has been renamed to `fieldDistribution`. Sorry!

Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2021-06-23 08:14:12 +00:00
6d24a4744f Roll back facetsDistribution 2021-06-23 10:04:01 +02:00
b1a5ef0aab improve mini dashboard routing 2021-06-22 21:49:05 +02:00
7ec752ed1c Merge #224
224: Update version for alpha 6 r=MarinPostma a=curquiza



Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2021-06-22 18:20:09 +00:00
0de696feaf Update version for alpha 6 2021-06-22 18:40:51 +02:00
d6b53c5e7a Merge #220
220: Implement `matches` r=irevoire a=MarinPostma

Implement `_matchesInfo`. I initially thought we could factor it into the highlighting, but they are unrelated features after all, and it needed a dedicated pass to handle.

Co-authored-by: marin postma <postma.marin@protonmail.com>
2021-06-22 16:29:07 +00:00
3456a78552 refactor formatter
share the analyzer instance between the formatter and the
compute_matches function
2021-06-22 18:28:20 +02:00
eb3d63691a add tests 2021-06-22 18:12:53 +02:00
c4ee937635 optimize format string 2021-06-22 18:12:53 +02:00
f6d1fb7ac2 fmt 2021-06-22 18:12:53 +02:00
97ef4a6c22 implement matches 2021-06-22 18:12:52 +02:00
db7215eaa9 Merge #213
213: Implement all the CLI options r=MarinPostma a=irevoire

closes #206 
And I looked into #204, I fixed some default values and tried to test as many options as possible, and I think the cli is already mostly working.
If someone knows any issues about it, I would like to hear more 🙂 

Co-authored-by: Tamo <tamo@meilisearch.com>
2021-06-22 15:04:05 +00:00
4b37a4a415 Merge #211 #218
211: fix index deletion race condition r=MarinPostma a=MarinPostma

Make the update store block if the currently processing update is from an index we are trying to delete. This ensures that no write to the index can occur after it has been deleted.
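
One way to picture the blocking behaviour is a condition variable guarding the id of the index currently being processed. This is a sketch under assumptions: `Processing`, its methods, and the `u32` index id are hypothetical and do not mirror the actual update store.

```rust
use std::sync::{Condvar, Mutex};

/// Tracks which index (if any) has an update being processed right now.
pub struct Processing {
    current: Mutex<Option<u32>>,
    done: Condvar,
}

impl Processing {
    pub fn new() -> Self {
        Processing { current: Mutex::new(None), done: Condvar::new() }
    }

    /// Marks `index` as having an update in progress.
    pub fn start(&self, index: u32) {
        *self.current.lock().unwrap() = Some(index);
    }

    /// Clears the in-progress marker and wakes every waiting deleter.
    pub fn finish(&self) {
        *self.current.lock().unwrap() = None;
        self.done.notify_all();
    }

    /// Returns the index currently being processed, if any.
    pub fn current(&self) -> Option<u32> {
        *self.current.lock().unwrap()
    }

    /// Blocks for as long as `index` is the one being processed.
    /// A deletion would call this before removing the index.
    pub fn wait_until_free(&self, index: u32) {
        let mut current = self.current.lock().unwrap();
        while *current == Some(index) {
            current = self.done.wait(current).unwrap();
        }
    }
}
```

A deletion for a different index returns from `wait_until_free` immediately; only the index with an update in flight is held back.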

218: Update milli version to v0.5.0 r=MarinPostma a=curquiza



Co-authored-by: marin postma <postma.marin@protonmail.com>
Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2021-06-22 14:36:34 +00:00
d1ad23e2d8 Merge #221
221: fix get search crop len r=irevoire a=MarinPostma

Fix bug where crop length was mandatory when performing a GET search.


Co-authored-by: marin postma <postma.marin@protonmail.com>
2021-06-22 14:13:52 +00:00
caa231aebe fix race condition 2021-06-22 16:09:07 +02:00
9cc31c2258 fix get search crop len 2021-06-22 16:01:40 +02:00
e2844f3a92 Update tokenizer version to v0.2.3 2021-06-22 15:57:47 +02:00
2e3d85c31a Update milli version to v0.5.0 2021-06-22 15:57:46 +02:00
25af262e79 Merge #210
210: Error handling r=MarinPostma a=MarinPostma

This PR implements the error handling for meilisearch.

Rather than grouping errors by types, this implementation groups them by scope, each scope enclosing errors from a scope further down, or new errors within this scope. This makes tracking the origins of errors easier, and error handling easier at the module level.

All errors that are eventually returned to the user implement the `Into<ResponseError>` trait. `ResponseError` in turn implements the `ErrorCode` trait from `meilisearch-error`.

Some new errors have been introduced with the new engine for which we haven't defined error codes yet. It has been decided with @gmourier that those would return the `internal-error` code until the correct error code is specified.
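
The scoping idea can be illustrated with a small standard-library sketch. Everything here is an assumption made for illustration: the enum names, the `error_code` method, the messages, and the `index_not_found` code are hypothetical, and the `ErrorCode` trait is a local stand-in rather than the one from `meilisearch-error`. Only the `internal-error` fallback comes from the description above.

```rust
use std::fmt;

/// Local stand-in for the `ErrorCode` trait (method name is an assumption).
pub trait ErrorCode {
    fn error_code(&self) -> &'static str;
}

/// Inner scope: errors raised by a hypothetical update store.
#[derive(Debug)]
pub enum UpdateStoreError {
    PayloadTooLarge,
}

impl fmt::Display for UpdateStoreError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            UpdateStoreError::PayloadTooLarge => write!(f, "payload too large"),
        }
    }
}

/// Outer scope: encloses the inner scope's errors and adds its own.
#[derive(Debug)]
pub enum IndexControllerError {
    UpdateStore(UpdateStoreError),
    IndexNotFound(String),
}

impl From<UpdateStoreError> for IndexControllerError {
    fn from(err: UpdateStoreError) -> Self {
        IndexControllerError::UpdateStore(err)
    }
}

impl fmt::Display for IndexControllerError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            // Forward the inner message instead of composing a new one.
            IndexControllerError::UpdateStore(inner) => write!(f, "{}", inner),
            IndexControllerError::IndexNotFound(uid) => write!(f, "index {} not found", uid),
        }
    }
}

/// What is finally returned to the user.
#[derive(Debug)]
pub struct ResponseError {
    pub message: String,
    code: &'static str,
}

// Implementing `From` also provides `Into<ResponseError>` for free.
impl From<IndexControllerError> for ResponseError {
    fn from(err: IndexControllerError) -> Self {
        let code = match &err {
            IndexControllerError::IndexNotFound(_) => "index_not_found",
            // Errors without an agreed code fall back to the generic one.
            IndexControllerError::UpdateStore(_) => "internal-error",
        };
        ResponseError { message: err.to_string(), code }
    }
}

impl ErrorCode for ResponseError {
    fn error_code(&self) -> &'static str {
        self.code
    }
}

fn store_update(too_large: bool) -> Result<(), UpdateStoreError> {
    if too_large { Err(UpdateStoreError::PayloadTooLarge) } else { Ok(()) }
}

/// `?` lifts the inner-scope error into the outer scope through `From`.
pub fn add_documents(index_exists: bool, too_large: bool) -> Result<(), IndexControllerError> {
    if !index_exists {
        return Err(IndexControllerError::IndexNotFound("movies".to_string()));
    }
    store_update(too_large)?;
    Ok(())
}
```

Each module only needs to know about the scope directly below it, which is what keeps the origin of an error traceable.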


Co-authored-by: marin postma <postma.marin@protonmail.com>
2021-06-22 13:21:33 +00:00
d0ef1ef174 change errors codes 2021-06-22 11:58:01 +02:00
905ace3e13 fix test 2021-06-22 11:10:57 +02:00
9092d35a3c fix payload error handler 2021-06-21 21:51:38 +02:00
2bdaa70f31 invalid update payload returns bad_request 2021-06-21 18:56:22 +02:00
f91a3bc6ab set error content type to json 2021-06-21 18:48:05 +02:00
1e4592dd7e enable errors in updates 2021-06-21 18:42:47 +02:00
50dc2fc7a5 Merge #219
219: Run cargo flaky only 100 times r=irevoire a=irevoire

Looks like the CI was not able to run cargo flaky 1000 times in 6 hours, so I guess, for now, we can go back to 100 times.

https://github.com/meilisearch/transplant/runs/2858159390


Co-authored-by: Tamo <tamo@meilisearch.com>
2021-06-21 16:29:27 +00:00
76727455ca ignore all the options related to the indexer 2021-06-21 18:13:00 +02:00
cf94b8e6e0 run cargo flaky only 100 times 2021-06-21 17:36:54 +02:00
1cf9f43dfe fix the tests 2021-06-21 16:34:49 +02:00
2097554c09 fix the cli 2021-06-21 16:34:49 +02:00
56686dee40 review changes 2021-06-21 13:57:32 +02:00
763ee521be fix rebase errors 2021-06-21 12:11:09 +02:00
0bfdf9a785 bump milli 2021-06-21 12:11:09 +02:00
fa573dabf0 fmt 2021-06-21 12:11:09 +02:00
abdf642d68 integrate milli errors 2021-06-21 12:11:08 +02:00
0dfd1b74c8 fix tests 2021-06-21 12:11:08 +02:00
0d3fb5ee0d factorize internal error macro 2021-06-21 12:11:08 +02:00
02277ec2cf reintroduce anyhow 2021-06-21 12:11:06 +02:00
70661ce50d Merge #216
216: optimize cropping r=MarinPostma a=MarinPostma

Optimize cropping as per @kerollmops suggestion.


Co-authored-by: marin postma <postma.marin@protonmail.com>
Co-authored-by: marin <postma.marin@protonmail.com>
2021-06-21 10:00:45 +00:00
8fc12b1526 Update meilisearch-http/src/index/search.rs
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-06-21 11:06:06 +02:00
439db1aae0 enable response error for search routes 2021-06-21 11:00:14 +02:00
8afbb9c462 enable response error for documents routes 2021-06-21 10:59:41 +02:00
5c52a1393f enable response error for settings routes 2021-06-21 10:59:41 +02:00
112cd1787c change error message for uuid resolver 2021-06-21 10:59:40 +02:00
d1550670a8 enable response error for index routes 2021-06-21 10:59:40 +02:00
58f9974be4 remove anyhow refs & implement missing errors 2021-06-21 10:59:38 +02:00
3a2e7d3c3b optimize cropping 2021-06-20 16:59:31 +02:00
c1b6f0e833 Merge #183
183: Add cropping and update `_formatted` behavior r=curquiza a=MarinPostma

TODO:
- [x] Solves #5 
- [x] Solves #203 
- [x] integrate the new milli highlight (according to the query words)

Co-authored-by: Marin Postma <postma.marin@protonmail.com>
Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2021-06-18 11:18:37 +00:00
5f08e41a85 Merge #215
215: Fix Clippy errors r=irevoire a=curquiza



Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2021-06-17 17:05:11 +00:00
5d8a21b0de Fix clippy errors 2021-06-17 18:51:07 +02:00
9e8888b603 Fix clippy errors 2021-06-17 18:50:18 +02:00
623b71e81e Fix clippy errors 2021-06-17 18:02:25 +02:00
c5c7e76805 Update meilisearch-http/src/index/search.rs
Co-authored-by: marin <postma.marin@protonmail.com>
2021-06-17 18:00:02 +02:00
e4b3d35ed8 Fix clippy errors 2021-06-17 17:03:43 +02:00
33e55bd82e Refactor the crop 2021-06-17 16:59:01 +02:00
9543ab4db6 Use mut instead of returning the hashmap 2021-06-17 13:51:27 +02:00
97909ce56e Use BTreeMap and remove ids_in_formatted 2021-06-16 19:30:06 +02:00
2f2484e186 Merge #212
212: bump milli to 0.4.0 r=MarinPostma a=MarinPostma



Co-authored-by: marin postma <postma.marin@protonmail.com>
2021-06-16 15:42:34 +00:00
2062b10b79 Merge #209
209: Integrate amplitude r=MarinPostma a=irevoire

And merge the sentry and amplitude usage under one “Enable analytics” flag

closes #180


Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: Irevoire <tamo@meilisearch.com>
2021-06-16 15:25:31 +00:00
a0b022afee Add Cow 2021-06-16 17:25:02 +02:00
5a47cef9a8 bump milli to 0.4.0 2021-06-16 17:15:56 +02:00
9538790b33 Decompose into two functions 2021-06-16 17:13:21 +02:00
4e2568fd6e disable amplitude on debug build 2021-06-16 17:12:49 +02:00
dc5a3d4a62 Use BTreeSet instead of HashSet 2021-06-16 16:20:10 +02:00
7b02fdaddc Rename functions 2021-06-16 14:23:08 +02:00
c0d169e79e Apply suggestions from code review
Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2021-06-16 11:12:46 +02:00
9840b5c7fb Refacto 2021-06-15 18:44:56 +02:00
1ef061d92b Fix clippy errors 2021-06-15 17:40:45 +02:00
79a1212ebe Do intersection with displayed ids instead of checking in loop 2021-06-15 17:40:45 +02:00
8d0269fcc4 Create function to create formatted_options 2021-06-15 17:40:45 +02:00
5e656bb58a Rename parse_facets into parse_filter 2021-06-15 17:40:45 +02:00
d9c0190497 Redo to_retrieve_ids 2021-06-15 17:40:45 +02:00
5dffe566fd Remove useless comments 2021-06-15 17:40:45 +02:00
b769877183 Make it compatible with the new milli highlighting 2021-06-15 17:40:44 +02:00
446b66b0fe Fix cargo clippy error 2021-06-15 17:40:44 +02:00
d0ec081e49 Refacto 2021-06-15 17:40:44 +02:00
65130d9ee7 Change crop_length type from Option(usize) to usize 2021-06-15 17:40:44 +02:00
638009fb2b Rename highlighter variable into formatter 2021-06-15 17:40:44 +02:00
7f84f59472 Reorganize imports 2021-06-15 17:40:44 +02:00
4f8c771bb5 Add new line 2021-06-15 17:40:43 +02:00
9e69f33f3c Fix clippy errors 2021-06-15 17:40:43 +02:00
0da8fa115e Add custom croplength for attributes to crop 2021-06-15 17:40:43 +02:00
811bc2f421 Around to previous word 2021-06-15 17:40:43 +02:00
caaf8d3f40 Fix tests 2021-06-15 17:40:43 +02:00
7473cc6e27 implement crop around 2021-06-15 17:40:43 +02:00
56c9633c53 simple crop before 2021-06-15 17:40:43 +02:00
93002e734c Fix tests 2021-06-15 17:40:42 +02:00
60f6d1c373 First version of highlight after refacto 2021-06-15 17:40:42 +02:00
a03d9d496e Fix compilation errors 2021-06-15 17:40:42 +02:00
7904637893 crop skeleton 2021-06-15 17:40:42 +02:00
def1596eaf Integrate amplitude
And merge the sentry and amplitude usage under one “Enable analytics”
flag
2021-06-15 15:36:30 +02:00
5795254b2a Merge #207
207: Update alpha for the next release r=MarinPostma a=curquiza



Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2021-06-14 16:40:26 +00:00
fe5a494035 Update alpha for the next release 2021-06-14 17:55:04 +02:00
13e864d29f Merge #196
196: Implements the synonyms in transplant r=MarinPostma a=irevoire

closes #18 

Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: marin postma <postma.marin@protonmail.com>
2021-06-14 14:09:08 +00:00
a780cff8fd fix clippy warning 2021-06-14 14:53:47 +02:00
7cb2dcbdf8 add a comment 2021-06-14 14:47:53 +02:00
f068d7f978 makes clippy happy 2021-06-14 14:47:53 +02:00
18d4d6097a implements the synonyms in transplant 2021-06-14 14:47:51 +02:00
b119bb4ab0 Merge #197
197: Update milli (v0.3.1) with filterable attributes r=MarinPostma a=curquiza

Fixes #187 and #70
Also fixes #195 

Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2021-06-14 12:19:42 +00:00
d65b5db97f Merge #144 #173
144: Concurrent update run loop (refactor) r=MarinPostma a=MarinPostma

This PR allows multiple requests to the update store to be performed concurrently (i.e., one can list updates while an update is being written to the update store).


173: Convert UpdateStatus to legacy meilisearch format r=MarinPostma a=MarinPostma

Returns the update statuses with the same format as legacy meilisearch.

The number of documents in a document addition/deletion is not known before processing, so it is only returned when the update is `processed`.

close #78 

associated milli PR: https://github.com/meilisearch/milli/pull/178


Co-authored-by: marin postma <postma.marin@protonmail.com>
Co-authored-by: Marin Postma <postma.marin@protonmail.com>
2021-06-14 11:30:44 +00:00
d4be4d80db Fix after rebase 2021-06-14 13:27:18 +02:00
9996c59183 Update with milli 0.3.1 2021-06-14 13:20:43 +02:00
88bf867a3e Rename attributes for faceting into filterable attributes 2021-06-14 13:20:43 +02:00
7009906d55 Update reset-all-settings test 2021-06-14 13:20:43 +02:00
ca1bb7dc1c Fix tests 2021-06-14 13:20:43 +02:00
aa04124bfc Add changes according to milli update 2021-06-14 13:20:37 +02:00
2be834fced Merge #205
205: Fix the cron syntax to effectively run the test once every friday r=MarinPostma a=irevoire



Co-authored-by: Tamo <tamo@meilisearch.com>
2021-06-14 11:11:03 +00:00
11c81ab4cb fix tests 2021-06-14 11:17:49 +02:00
0f767e3743 concurrent update run loop 2021-06-14 10:57:14 +02:00
92d954ddfe Fix the cron syntax to effectively run the test once every friday 2021-06-14 10:48:59 +02:00
1e659bb17b Merge #194
194: Bump sentry version r=MarinPostma a=irevoire

closes #102 

Co-authored-by: tamo <tamo@meilisearch.com>
2021-06-14 08:34:04 +00:00
e8bd5ea4e0 convert UpdateStatus to legacy meilisearch format 2021-06-14 10:21:57 +02:00
d765397c82 Merge #179
179: Enable filter parameter during search r=MarinPostma a=MarinPostma

This PR makes the necessary changes to transplant in accordance with the specification on filters.

More precisely, it:
- Removes the `filters` parameter
- Renames `facetFilters` to `filter`
- Allows either a string or an array to be passed to the filter param.

It doesn't allow the mixed syntax; that needs to be handled by milli.

close #81
close #140


Co-authored-by: Marin Postma <postma.marin@protonmail.com>
2021-06-14 08:11:30 +00:00
d46a2713d2 Merge #202
202: Add a github action to run cargo-flaky 1000 times r=curquiza a=irevoire

I don’t know how to ensure the CI works so it’s just a first version, do not hesitate to update the code

Co-authored-by: Irevoire <tamo@meilisearch.com>
2021-06-10 22:04:57 +00:00
8932f302ce Merge #1403
1403: fix amount of time r=curquiza a=TheTechRobo

The new MeiliSearch sandbox website says "48 hours" rather than 72, so I updated the README to reflect that.

Co-authored-by: TheTechRobo <52163910+TheTechRobo@users.noreply.github.com>
2021-06-10 15:21:23 +00:00
51105d3b1c run the tests in release mode 2021-06-10 17:12:07 +02:00
efc1225cd8 Update .github/workflows/flaky.yml
Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2021-06-10 17:07:23 +02:00
41220a7f96 Apply suggestions from code review
Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2021-06-10 17:02:06 +02:00
7312c13665 add a github action to run cargo-flaky 1000 times 2021-06-10 16:53:30 +02:00
e6220a1346 Merge #199
199: fix flaky tests r=irevoire a=MarinPostma

fixes:
- index creation bug
- fix store lock
- fix decoding error
- fix stats


Co-authored-by: mpostma <postma.marin@protonmail.com>
Co-authored-by: Irevoire <tamo@meilisearch.com>
2021-06-10 14:13:25 +00:00
3ef0830c5d review changes 2021-06-10 16:11:52 +02:00
eb7616ca0f remove dbg 2021-06-10 16:03:48 +02:00
592fcbc71f fix stats test 2021-06-10 16:03:48 +02:00
20e1caef47 makes clippy happy 2021-06-10 16:03:48 +02:00
2d19b78dd8 fix stats test 2021-06-10 16:03:48 +02:00
99551fc21b fix encoding bug 2021-06-10 16:03:48 +02:00
d30641e9ca fix amount of time 2021-06-10 08:55:05 -04:00
2716c1aebb fix update store lock 2021-06-09 16:19:45 +02:00
1a65eed724 fix index creation bug 2021-06-09 11:52:36 +02:00
a26a0a4eec Merge pull request #1401 from meilisearch/remove-stop-words
Remove stop-word datasets
2021-06-08 17:56:07 +02:00
a56ac66e6c Remove stop-word datasets 2021-06-08 16:38:53 +02:00
7e2d7601f2 Merge #1399
1399: Update download-latest.sh r=curquiza a=94noni

Hey, PR of the weekend :)
Kidding, I began to use MeiliSearch recently for fun&personal usage, wishing you good luck for your next v0.21|v1.0 releases
Cheers

Co-authored-by: Antoine Makdessi <amakdessi@me.com>
2021-06-07 15:22:26 +00:00
1550b7d6ba Update download-latest.sh 2021-06-05 16:45:13 +02:00
9f40896f4a Merge #175
175: Fix update loop infinite loop r=irevoire a=MarinPostma

Fix the update loop's infinite loop in case of an update error.

close #169


Co-authored-by: marin postma <postma.marin@protonmail.com>
2021-06-02 23:02:10 +00:00
75c0718691 fix update loop infinite loop 2021-06-02 17:29:50 +02:00
509a56a43d Merge #158
158: Implements the dumps r=irevoire a=irevoire

closes #20

divergence from legacy meilisearch:
- dump v2 added, with support for loading pending updates (only works with dumps created from v2)
- added timestamps to the dump info
- Dump infos are only persisted in an internal data structure and are no longer fetched from the fs on demand. This was a potential security flaw. This means that the dump infos are flushed on every restart.

Co-authored-by: tamo <tamo@meilisearch.com>
Co-authored-by: Marin Postma <postma.marin@protonmail.com>
2021-06-02 12:06:47 +00:00
2d7785ae0c remove the dump_batch_size option from the CLI 2021-06-01 20:42:06 +02:00
d0552e765e forbid deserialization of Setting<Checked> 2021-06-01 20:41:45 +02:00
3a7c1f2469 Merge #191
191: dumps v2 r=irevoire a=MarinPostma



Co-authored-by: Marin Postma <postma.marin@protonmail.com>
Co-authored-by: marin <postma.marin@protonmail.com>
2021-06-01 09:46:31 +00:00
df6ba0e824 Apply suggestions from code review
Co-authored-by: Irevoire <tamo@meilisearch.com>
2021-06-01 11:18:37 +02:00
6609f9e3be review edits 2021-05-31 18:41:37 +02:00
1c4f0b2ccf clippy, fmt & tests 2021-05-31 16:03:39 +02:00
10fc870684 improve dump info reports 2021-05-31 15:49:04 +02:00
dffbaca63b bump sentry version 2021-05-31 13:59:31 +02:00
b3c8f0e1f6 fix empty index error 2021-05-31 10:58:51 +02:00
bc5a5e37ea fix dump v1 2021-05-31 10:42:31 +02:00
33c6c4f0ee add timestamps to dump info 2021-05-30 15:55:17 +02:00
39c16c0fe4 fix dump import 2021-05-30 12:35:17 +02:00
1cb64caae4 dump content is now only uuid 2021-05-29 00:08:17 +02:00
b258f4f394 fix dump import 2021-05-27 14:30:20 +02:00
c47369839b dump meta 2021-05-27 10:51:19 +02:00
b924e897f1 load index dump 2021-05-27 10:27:47 +02:00
e818c33fec implement load uuid_resolver 2021-05-26 20:42:09 +02:00
9278a6fe59 integrate in dump actor 2021-05-25 18:14:11 +02:00
3593ebb8aa dump updates 2021-05-25 16:44:58 +02:00
464639aa0f update actor error improvements 2021-05-25 16:44:58 +02:00
4acbe8e473 implement index dump 2021-05-25 16:44:58 +02:00
7ad553670f index error handling 2021-05-25 16:44:58 +02:00
2185fb8367 dump uuid resolver 2021-05-25 16:44:54 +02:00
cbcf50960f Merge pull request #192 from meilisearch/dumps-tasks
Dumps tasks
2021-05-25 15:49:15 +02:00
89846d1656 improve panic message 2021-05-25 15:47:57 +02:00
e5175f5dc1 merge 2021-05-25 15:24:39 +02:00
1a6dcec83a crash when the actor has no inbox 2021-05-25 15:23:13 +02:00
fe260f1330 Update meilisearch-http/src/index_controller/dump_actor/actor.rs
Co-authored-by: marin <postma.marin@protonmail.com>
2021-05-25 15:13:47 +02:00
991d8e1ec6 fix the error printing 2021-05-25 10:48:57 +02:00
49a0e8aa19 use a RwLock instead of a Mutex 2021-05-24 18:19:34 +02:00
912f0286b3 remove the dump_inner trickery 2021-05-24 18:06:20 +02:00
dcf29e1081 fix the error handling in case there is a panic while creating a dump 2021-05-24 17:33:42 +02:00
529f7962f4 handle parallel requests for the dump actor 2021-05-24 15:42:12 +02:00
8a11c6c429 Implements the legacy behaviour of the dump
When asked if a dump exists we check if it's the current dump, and if
it's not then we check on the filesystem for any file matching our
`uid.dump`
2021-05-24 12:35:46 +02:00
4cbf866821 merge with main 2021-05-12 18:12:37 +02:00
e0e23636c6 fix the serializer + reformat the file 2021-05-12 17:04:24 +02:00
295f496e8a atomic index dump load 2021-05-12 16:21:37 +02:00
47a1bc34de Merge #189
189: Fix snapshots r=irevoire a=MarinPostma



Co-authored-by: Marin Postma <postma.marin@protonmail.com>
2021-05-12 09:28:50 +00:00
6d837e3e07 the route to create a dump must return a 202 2021-05-11 17:34:34 +02:00
1b671d4302 fix-snapshot 2021-05-11 13:57:18 +02:00
c30b32e173 add the criterion attribute when importing dumps from the v1 2021-05-11 13:21:36 +02:00
9e798fea75 fix the import of dump without unprocessing updates 2021-05-11 13:03:47 +02:00
384afb3455 fix the way we return the settings 2021-05-11 11:47:04 +02:00
92a7c8cd17 make clippy happy 2021-05-11 00:27:22 +02:00
8b7735c20a move the import of the updates in the v2 and ignore the v1 for now 2021-05-11 00:20:55 +02:00
7d748fa384 integrate the new Settings in the dumps 2021-05-10 20:48:06 +02:00
d767990424 fix the import of the updates in the dump 2021-05-10 20:25:12 +02:00
ef438852cd fix the v1 2021-05-10 20:25:12 +02:00
40ced3ff8d first working version 2021-05-10 20:25:12 +02:00
5f5402a3ab provide a way to access the internal content path of all processing State 2021-05-10 20:25:12 +02:00
26dcb9e66d bump milli version and fix a performance issue for large dumps 2021-05-10 20:25:12 +02:00
956012da95 fix dump lock 2021-05-10 20:25:12 +02:00
24192fc550 fix tests 2021-05-10 20:25:12 +02:00
efca63f9ce [WIP] rebase on main 2021-05-10 20:25:09 +02:00
c3552cecdf WIP rebase on main 2021-05-10 20:24:18 +02:00
0f94ef8abc WIP: dump 2021-05-10 20:24:18 +02:00
0275b36fb0 [WIP] rebase on main 2021-05-10 20:24:14 +02:00
1b5fc61eb6 [WIP] rebase on main 2021-05-10 20:23:12 +02:00
0fee81678e [WIP] rebase on main 2021-05-10 20:22:18 +02:00
c4d898a265 split the dumps between v1 and v2 2021-05-10 20:20:57 +02:00
e389c088eb WIP: rebasing on master 2021-05-10 20:20:57 +02:00
ceb8d6e1c9 Merge #186
186: settings fix r=MarinPostma a=MarinPostma

Add type-checked settings validation. For now it only transforms the settings accepting wildcards.


Co-authored-by: Marin Postma <postma.marin@protonmail.com>
2021-05-10 16:42:12 +00:00
0cc79d414f add test 2021-05-10 18:34:25 +02:00
8d11b368d1 implement check 2021-05-10 18:22:41 +02:00
706643dfed type setting struct 2021-05-10 17:30:09 +02:00
b192cb9c1f enable string syntax for the filters 2021-05-06 12:48:31 +02:00
998d5ead34 Merge #182
182: remove facet setting r=MarinPostma a=MarinPostma

remove useless code


Co-authored-by: Marin Postma <postma.marin@protonmail.com>
2021-05-05 11:22:12 +00:00
ec7eb7798f remove facet setting 2021-05-04 22:36:31 +02:00
a717925caa remove filters, rename facet_filters to filter 2021-05-04 18:20:56 +02:00
88ae02f8d9 Merge #174
174: Upgrade Tokenizer r=MarinPostma a=curquiza



Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2021-05-04 15:57:07 +00:00
eb03a3ccb1 Upgrade Milli and Tokenizer 2021-05-04 17:56:19 +02:00
77740829bd Merge #177
177: bump milli r=MarinPostma a=MarinPostma



Co-authored-by: Marin Postma <postma.marin@protonmail.com>
2021-05-04 13:49:37 +00:00
928fb34eff bump milli and fix tests 2021-05-04 15:10:22 +02:00
1e6b40a24b Merge #172
172: Fix cors authentication issue r=MarinPostma a=MarinPostma

The error was due to the middleware returning an error, instead of a response containing the error.

close #110


Co-authored-by: Marin Postma <postma.marin@protonmail.com>
2021-05-03 08:38:42 +00:00
78217bcf18 Fix cors authentication issue 2021-04-29 16:28:12 +02:00
53c88d9fa3 Merge #170
170: Improve CI r=MarinPostma a=curquiza

Checked with @Kerollmops to improve (a little bit) the CI execution time.

Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2021-04-29 14:08:33 +00:00
b14fdb1163 Merge #171
171: Update mini-dashboard with version 0.1.2 r=MarinPostma a=mdubus

Update of the mini-dashboard sha1 & assets-url, due to a new release

Co-authored-by: Morgane Dubus <morgane.d@meilisearch.com>
2021-04-29 13:48:54 +00:00
3d5fba94c2 Update mini-dashboard with version 0.1.2 2021-04-29 15:22:41 +02:00
3ee2b07918 Improve CI 2021-04-29 15:19:48 +02:00
8bc7dd8b03 Merge #143
143: Shared update store r=irevoire a=MarinPostma

This PR changes the updates process so that only one instance of an update store is shared among indexes.

This allows updates to always be processed sequentially without additional synchronization, and fixes the bug where the first pending update of every index was reported as processing whereas only one was.

EDIT:

I ended up having to rewrite the whole `UpdateStore` to allow updates to be really queued and processed sequentially in the order they were added. For that purpose I created a `pending_queue` that orders the updates by a global update id.

To find the next `update_id` to use, both globally and for each index, I have created another database that contains the next id to use.

Finally, all updates that have been processed (with success or otherwise) are stored in an `updates` database.

The layout for the keys of these databases is such that it is easy to iterate over the elements for a particular index, and greatly reduces the amount of code to do so, compared to the former implementation.
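
The queueing and key layout described above can be sketched with in-memory maps. This is only an illustration under assumptions: the real store persists its data in databases, and the `u32` index id, the field names other than `pending_queue` and `updates`, and the method names are hypothetical.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for an index identifier.
type IndexId = u32;

#[derive(Default)]
pub struct UpdateStore {
    /// Next global update id to hand out.
    next_global_id: u64,
    /// Next per-index update id to hand out.
    next_index_id: BTreeMap<IndexId, u64>,
    /// Pending updates keyed by global id: iteration order is arrival order.
    pending_queue: BTreeMap<u64, (IndexId, u64, String)>,
    /// Processed updates keyed by (index, update id): the entries of one
    /// index are contiguous, so they can be read with a single range scan.
    updates: BTreeMap<(IndexId, u64), String>,
}

impl UpdateStore {
    /// Queues an update and returns its per-index update id.
    pub fn register(&mut self, index: IndexId, payload: &str) -> u64 {
        let global = self.next_global_id;
        self.next_global_id += 1;
        let slot = self.next_index_id.entry(index).or_insert(0);
        let update_id = *slot;
        *slot += 1;
        self.pending_queue.insert(global, (index, update_id, payload.to_string()));
        update_id
    }

    /// Processes the oldest pending update, whatever index it belongs to.
    pub fn process_next(&mut self) -> Option<(IndexId, u64)> {
        let global = *self.pending_queue.keys().next()?;
        let (index, update_id, payload) = self.pending_queue.remove(&global)?;
        self.updates.insert((index, update_id), payload);
        Some((index, update_id))
    }

    /// Lists the processed update ids of one index using the key layout.
    pub fn processed_for(&self, index: IndexId) -> Vec<u64> {
        self.updates
            .range((index, 0)..=(index, u64::MAX))
            .map(|(&(_, id), _)| id)
            .collect()
    }
}
```

Putting the index first in the composite key is the design choice that makes per-index iteration a plain range query.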

I have also simplified the locking mechanism for the update store, thanks to the StateLock data structure, which allows both an arbitrary number of readers and a single writer to concurrently access the state. The current state can be either Idle, Processing, or Snapshotting. When an update or snapshotting is ongoing, the process holds the state lock until it is done processing its task. When it is done, it sets the state back to Idle.
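
A lock with these properties can be sketched with the standard library alone. This is an assumption-laden illustration, not the `StateLock` from this PR: the field names, the `StateWriter` guard, and the `u64` carried by `Processing` are hypothetical.

```rust
use std::sync::{Arc, Mutex, MutexGuard, RwLock};

#[derive(Debug, Clone, PartialEq)]
pub enum State {
    Idle,
    Processing(u64),
    Snapshotting,
}

pub struct StateLock {
    /// Serializes writers: only one task may change the state at a time.
    writer: Mutex<()>,
    /// Readers lock briefly to clone the `Arc`, then read their snapshot freely.
    current: RwLock<Arc<State>>,
}

/// Held by the single writer for the whole duration of its task.
pub struct StateWriter<'a> {
    _guard: MutexGuard<'a, ()>,
    current: &'a RwLock<Arc<State>>,
}

impl StateLock {
    pub fn new() -> Self {
        StateLock {
            writer: Mutex::new(()),
            current: RwLock::new(Arc::new(State::Idle)),
        }
    }

    /// Returns a snapshot of the state; never waits on the writer's task.
    pub fn read(&self) -> Arc<State> {
        Arc::clone(&self.current.read().unwrap())
    }

    /// Blocks until no other writer is active, then returns the write handle.
    pub fn write(&self) -> StateWriter<'_> {
        StateWriter {
            _guard: self.writer.lock().unwrap(),
            current: &self.current,
        }
    }
}

impl StateWriter<'_> {
    /// Replaces the visible state; readers see it on their next `read`.
    pub fn swap(&mut self, state: State) {
        *self.current.write().unwrap() = Arc::new(state);
    }
}
```

A task would take the writer, `swap` to `Processing` or `Snapshotting`, do its work, and `swap` back to `Idle` before dropping the handle.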

I have made other small improvements here and there, and have left some others for future work, such as:
- When creating an update file to hold a request's content, it would be preferable to first create a temporary file, and then atomically persist it when we have written to it. This would simplify the case when there is no data to be written to the file, since we wouldn't have to take care of cleaning up after ourselves.
- The logic for content validation must be factored.
- Some more tests related to error handling in the process_pending_update function.
- The issue #159

close #114


Co-authored-by: Marin Postma <postma.marin@protonmail.com>
2021-04-27 18:41:55 +00:00
e6fd1afc3d Merge pull request #163 from meilisearch/curquiza-patch-1
Update README.md
2021-04-27 18:51:04 +02:00
a961f0ce75 fix clippy warnings 2021-04-27 18:28:46 +02:00
cea0c1f41d Update README.md 2021-04-27 16:33:22 +02:00
703d2026e4 Update README.md 2021-04-27 16:33:00 +02:00
3d85b2d854 Merge #162
162: Re-enable ranking rules route r=MarinPostma a=MarinPostma

re-enable ranking rules setting route


Co-authored-by: Marin Postma <postma.marin@protonmail.com>
2021-04-27 13:55:40 +00:00
bb79a15c04 reenable ranking rules route 2021-04-27 15:29:00 +02:00
4fe2a13c71 rewrite update store 2021-04-27 15:20:52 +02:00
51829ad85e review fixes 2021-04-27 15:10:57 +02:00
c78f351300 fix tests 2021-04-27 15:10:57 +02:00
ee675eadf1 fix stats 2021-04-27 15:10:55 +02:00
33830d5ecf fix snapshots 2021-04-27 15:09:55 +02:00
2b154524bb fix filtered out pending update 2021-04-27 15:09:23 +02:00
b626d02ffe simplify index actor run loop 2021-04-27 15:09:22 +02:00
9ce68d11a7 single update store instance 2021-04-27 15:09:21 +02:00
5a38f13cae multi_index update store 2021-04-27 15:07:13 +02:00
7055384aeb Merge #116
116: Add tests for every platform + clippy r=MarinPostma a=curquiza



Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2021-04-27 11:07:58 +00:00
0c41adf868 Update CI 2021-04-27 12:43:00 +02:00
1ba46f8f77 Disable clippy rule 2021-04-27 12:43:00 +02:00
f80ea24d2b Add tests on every platform and fix clippy errors 2021-04-27 12:42:59 +02:00
d34d7cbc37 Merge #161
161: put mini-dashboard in out-dir r=MarinPostma a=MarinPostma

This PR puts the mini-dashboard during build in the `OUT_DIR` specified by cargo. This allows the mini-dashboard artifacts to be cleaned when `cargo clean` is run, and avoids polluting the working directory with unwanted files.


Co-authored-by: Marin Postma <postma.marin@protonmail.com>
2021-04-27 07:40:23 +00:00
5014f74649 put mini-dashboard in out-dir 2021-04-27 09:32:17 +02:00
1f32f35d9e Merge #160
160: Update version for the next release (alpha4) r=MarinPostma a=curquiza



Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2021-04-26 19:09:08 +00:00
f3b6bf55a6 Update version for the next release (alpha4) 2021-04-26 19:05:16 +02:00
9e6a7e3aa9 Merge #153
153: integrate mini dashboard r=MarinPostma a=MarinPostma

This PR integrates the [mini dashboard](https://github.com/meilisearch/mini-dashboard) into transplant.

It adds a build feature `mini-dashboard` to statically add the mini-dashboard to the MeiliSearch binary. The mini-dashboard build feature is enabled by default and can be disabled by building MeiliSearch with `cargo build --no-default-features`.

- [x] Fetch the mini-dashboard from the Github release
- [x] Check that the SHA1 on the downloaded payload matches the one in the metadata
- [x] Unpack the mini dashboard in `meilisearch-http/mini-dashboard`
- [x] serve the mini-dashboard if the `mini-dashboard` feature is enabled
- [x] Update CI to build MeiliSearch with mini-dashboard for releases

close #87

## Shasum check and build optimizations.

In order to make sure that the right bundle for the mini-dashboard is downloaded, its shasum is computed and compared to the one specified in the `Cargo.toml`. If the shasums match, then the shasum is written to the `.mini-dashboard.sha1` file for later comparison. On subsequent builds, the build script will check that both the mini-dashboard assets and the shasum file are found and that the shasum file content matches the one from the toml file. It will only perform a re-generation of the static dashboard files if it finds that the dashboard is not present where it expects it to be, or that it is outdated, by comparing the shasums.
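
The decision made on subsequent builds can be sketched as a single predicate. This is a minimal illustration under assumptions, not the actual build script: `needs_regeneration` and its parameters are hypothetical names, and the SHA-1 computation of the downloaded bundle is left out.

```rust
use std::fs;
use std::path::Path;

/// Returns true when the dashboard assets must be fetched and unpacked again:
/// the assets are missing, the shasum file is missing or unreadable, or the
/// stored shasum differs from the expected one.
pub fn needs_regeneration(assets_dir: &Path, sha_file: &Path, expected_sha1: &str) -> bool {
    if !assets_dir.exists() {
        return true;
    }
    match fs::read_to_string(sha_file) {
        // Trim so a trailing newline in the file does not force a rebuild.
        Ok(stored) => stored.trim() != expected_sha1,
        Err(_) => true,
    }
}
```

Comparing against a stored shasum keeps unchanged builds from downloading the bundle again.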

## Notes

I had to rely on a [custom patch](https://github.com/MarinPostma/actix-web-static-files/tree/actix-web-4) of actix-web-static-files to support actix-web 4 beta6. There is currently a [PR on the official repo](https://github.com/kilork/actix-web-static-files/pull/35) to support actix-web 4, but it most likely won't be merged until actix is stabilized.


Co-authored-by: Marin Postma <postma.marin@protonmail.com>
2021-04-26 16:22:20 +00:00
77481d7c76 update gitignore 2021-04-26 18:21:09 +02:00
c2461e5066 review fixes 2021-04-26 10:20:46 +02:00
e4bd1bc5ce update actix-web-static-file rev 2021-04-22 11:42:41 +02:00
90f57c1329 update CI & Dockerfile 2021-04-22 11:22:09 +02:00
6af769af20 bump mini-dashboard 2021-04-22 10:45:05 +02:00
6bcf20c70e serve static site 2021-04-22 10:26:54 +02:00
bb79695e44 load mini-dashboard assets 2021-04-22 10:26:54 +02:00
ea5517bc8c add mini-dashboard feature 2021-04-22 10:26:54 +02:00
da08a1f25c Merge #157
157: Use <em> tags instead of <mark> tags for highlighting r=MarinPostma a=curquiza



Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2021-04-22 08:11:07 +00:00
a72d2f66cd use <em> tags instead of <mark> tags for highlighting 2021-04-21 19:14:55 +02:00
e5df58bc04 Merge #150
150: add _formated field to search result r=MarinPostma a=MarinPostma

close #75 

Co-authored-by: Marin Postma <postma.marin@protonmail.com>
2021-04-21 16:33:30 +00:00
662ffc8fa5 Merge #155
155: Fix dockerfile r=MarinPostma a=curquiza

docker build and run works now :)

Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2021-04-21 10:22:01 +00:00
ce5e4743e6 Fix dockerfile 2021-04-21 11:00:04 +02:00
dd2914873b fix document fields order 2021-04-20 21:30:30 +02:00
d9a29cae60 fix ignored displayed attributes 2021-04-20 21:23:35 +02:00
7a737d2bd3 support wildcard 2021-04-20 21:23:35 +02:00
881b099c8e add tests 2021-04-20 21:23:34 +02:00
c6bb36efa5 implement _formated 2021-04-20 21:23:28 +02:00
526a05565e add SearchHit structure 2021-04-20 21:22:48 +02:00
09f13823f4 Merge #154
154: Update version for the next release (alpha3) r=MarinPostma a=curquiza



Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2021-04-20 14:21:18 +00:00
b8e535579f Update version for the next release (alpha3) 2021-04-20 16:11:07 +02:00
63d443deb8 Merge #124
124: enable distinct r=MarinPostma a=MarinPostma



Co-authored-by: mpostma <postma.marin@protonmail.com>
Co-authored-by: Marin Postma <postma.marin@protonmail.com>
2021-04-20 13:52:00 +00:00
f8c338e3a7 add test for dedicated distinct route 2021-04-20 15:49:17 +02:00
6c470cf687 enable distinct-attribute setting route 2021-04-20 11:34:18 +02:00
ec63e13896 bump actix 2021-04-20 11:29:32 +02:00
1746132c7d add test set/reset distinct attribute 2021-04-20 11:29:08 +02:00
ec230c2835 enable distinct 2021-04-20 11:29:06 +02:00
bf3c04f2dc Merge #152
152: bump actix r=irevoire a=MarinPostma



Co-authored-by: Marin Postma <postma.marin@protonmail.com>
2021-04-20 09:16:15 +00:00
45665245dc bump actix 2021-04-20 11:07:23 +02:00
94c5c5843b Merge #149
149: Handle star in attributes_to_retrieve r=MarinPostma a=curquiza

Closes #147

Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2021-04-19 17:29:21 +00:00
c05d260d9a Merge #148
148: Update milli version to v0.1.1 r=MarinPostma a=curquiza



Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2021-04-19 17:22:20 +00:00
8eceba98d3 Handle star in attributes_to_retrieve 2021-04-19 18:20:19 +02:00
2c380731b9 Update milli version to v0.1.1 2021-04-19 16:03:39 +02:00
7ce74f95a2 Merge #146
146: Remove another unused legacy file r=MarinPostma a=irevoire

When doing #135 I missed an old useless file in the src/routes directory

Co-authored-by: tamo <tamo@meilisearch.com>
2021-04-15 18:05:28 +00:00
a3813dd453 Merge #145
145: Update tokenizer to v0.2.1 r=MarinPostma a=curquiza



Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2021-04-15 17:56:47 +00:00
ec3a08ea0c remove another unused legacy file 2021-04-15 14:44:43 +02:00
b0717b75d9 Update tokenizer to v0.2.1 2021-04-14 19:06:18 +02:00
6359a08cfe Merge #139
139: Fix commit date & SHA in startup message r=MarinPostma a=shekhirin

Resolves https://github.com/meilisearch/transplant/issues/137
Resolves https://github.com/meilisearch/transplant/issues/138

---
I ran a GitHub Action towards my own dockerhub: https://github.com/shekhirin/transplant/actions/runs/732666353

Startup message now shows correct `Commit SHA` and `Commit date` (changed from `Build date`).
```console
➜ transplant (shekhirin/startup-git-vars) ✔ docker run -it -p 7700:7700 shekhirin/meilisearch:v0.21.0-alpha.2 ./meilisearch --no-analytics=true
Unable to find image 'shekhirin/meilisearch:v0.21.0-alpha.2' locally
v0.21.0-alpha.2: Pulling from shekhirin/meilisearch
bfdacc68c91b: Already exists 
73b1ed30fa0b: Pull complete 
6607217ed754: Pull complete 
Digest: sha256:31bd6ac37e8711ab9d4123cf2ba2f942686569f08d68cfed8643752f381bfb74
Status: Downloaded newer image for shekhirin/meilisearch:v0.21.0-alpha.2

888b     d888          d8b 888 d8b  .d8888b.                                    888
8888b   d8888          Y8P 888 Y8P d88P  Y88b                                   888
88888b.d88888              888     Y88b.                                        888
888Y88888P888  .d88b.  888 888 888  "Y888b.    .d88b.   8888b.  888d888 .d8888b 88888b.
888 Y888P 888 d8P  Y8b 888 888 888     "Y88b. d8P  Y8b     "88b 888P"  d88P"    888 "88b
888  Y8P  888 88888888 888 888 888       "888 88888888 .d888888 888    888      888  888
888   "   888 Y8b.     888 888 888 Y88b  d88P Y8b.     888  888 888    Y88b.    888  888
888       888  "Y8888  888 888 888  "Y8888P"   "Y8888  "Y888888 888     "Y8888P 888  888

Database path:          "./data.ms"
Server listening on:    "http://0.0.0.0:7700"
Environment:            "development"
Commit SHA:             "038f1c740198f974743ba87fce7b74a8d0b71b5c"
Commit date:            "2021-04-09"
Package version:        "0.21.0-alpha.2"
Sentry DSN:             "https://5ddfa22b95f241198be2271aaf028653@sentry.io/3060337"
Anonymous telemetry:    "Disabled"

No master key found; The server will accept unidentified requests. If you need some protection in development mode, please export a key: export MEILI_MASTER_KEY=xxx

Documentation:          https://docs.meilisearch.com
Source code:            https://github.com/meilisearch/meilisearch
Contact:                https://docs.meilisearch.com/resources/contact.html or bonjour@meilisearch.com

[2021-04-09T10:29:49Z INFO  actix_server::builder] Starting 2 workers
[2021-04-09T10:29:49Z INFO  actix_server::builder] Starting "actix-web-service-0.0.0.0:7700" service on 0.0.0.0:7700
[2021-04-09T10:29:49Z INFO  meilisearch_http::index_controller::uuid_resolver::actor] uuid resolver started
[2021-04-09T10:29:49Z INFO  meilisearch_http::index_controller::update_actor::actor] Started update actor.
```

Endpoint also works as expected (`buildDate` -> `commitDate`)
```console
➜ transplant (shekhirin/startup-git-vars) ✔ curl http://localhost:7700/version
{"commitSha":"038f1c740198f974743ba87fce7b74a8d0b71b5c","commitDate":"2021-04-09","pkgVersion":"0.21.0-alpha.2"}
```

Co-authored-by: Alexey Shekhirin <a.shekhirin@gmail.com>
2021-04-13 17:38:47 +00:00
f87afbc558 fix(http): commit date & SHA in startup message 2021-04-13 20:16:18 +03:00
8df5f73706 Merge #133
133: Implement stats route r=MarinPostma a=shekhirin

Resolves https://github.com/meilisearch/transplant/issues/73

Co-authored-by: Alexey Shekhirin <a.shekhirin@gmail.com>
2021-04-13 17:03:33 +00:00
9eaf048a06 fix(http): use BTreeMap instead of HashMap to preserve stats order 2021-04-13 11:59:07 +03:00
adfdb99abc feat(http): calculate updates' and uuids' dbs size 2021-04-09 15:59:12 +03:00
ae1655586c fixes after review 2021-04-09 14:40:48 +03:00
698a1ea582 feat(http): store processing as RwLock<Option<Uuid>> in index_actor 2021-04-09 14:34:43 +03:00
87412f63ef feat(http): implement is_indexing for stats 2021-04-09 14:34:42 +03:00
09d9a29176 test(http): server & index stats 2021-04-09 14:34:42 +03:00
dd9eae8c26 feat(http): stats route 2021-04-09 14:34:42 +03:00
a1d04fbff5 Merge #136
136: Rename update status "pending" into "enqueued" r=curquiza a=curquiza

Closes #107 

Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2021-04-08 16:46:12 +00:00
dd1a08087b Merge #134
134: fix(http, index): init analyzer with optional stop words r=MarinPostma a=shekhirin

Also bump `milli` and `meilisearch-tokenizer` packages versions

Co-authored-by: Alexey Shekhirin <a.shekhirin@gmail.com>
2021-04-08 16:13:15 +00:00
51ba1bd7d3 fix(http, index): init analyzer with optional stop words
Next release

update tokenizer
2021-04-08 17:16:13 +03:00
f881e8691e Merge #135
135: Add stop words r=curquiza a=irevoire

closes #21 

Co-authored-by: tamo <tamo@meilisearch.com>
2021-04-08 11:29:00 +00:00
94c0858c27 Merge #1327
1327: Update link after branch renaming r=MarinPostma a=curquiza



Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2021-04-08 05:47:20 +00:00
6aaa4a8e19 Update link after branch renaming 2021-04-07 19:47:48 +02:00
cb23775d18 Rename pending into enqueued 2021-04-07 19:46:36 +02:00
0344cf5874 Merge #122
122: Update display r=MarinPostma a=curquiza



Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2021-04-07 12:33:25 +00:00
4a1b033765 Merge #1318
1318: Update README.md for contributions r=MarinPostma a=curquiza



Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2021-04-06 23:11:29 +00:00
dcd60a5b45 add more tests for the stop_words 2021-04-06 18:29:38 +02:00
b1962c8e02 remove legacy files from meilisearch that have been replaced by a macro in routes/settings/mod.rs 2021-04-06 16:29:04 +02:00
40ef9a3c6a push a first implementation of the stop_words 2021-04-06 16:29:04 +02:00
2206a44baf Merge #132
132: Next release (alpha2) r=MarinPostma a=curquiza



Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2021-04-01 15:25:45 +00:00
4ee6ce7871 Next release 2021-04-01 17:16:16 +02:00
6cb8052d3d Merge #104
104: Update all the response format (issue #64) r=MarinPostma a=irevoire

closes #64 

Co-authored-by: Irevoire <tamo@meilisearch.com>
Co-authored-by: tamo <tamo@meilisearch.com>
2021-04-01 14:22:57 +00:00
73973e2b9e fix more settings routes 2021-04-01 15:50:45 +02:00
89e05fc6c5 Merge #113
113: snapshots r=MarinPostma a=MarinPostma

This PR adds support for snapshotting.

The snapshotting process for an index requires that no other update is processing at the same time. A mutex lock has been added to prevent a snapshot from occurring at the same time as an update, while still permitting updates to be pushed.

The list of the indexes to snapshot is first retrieved from the `UuidResolver` which also performs its snapshot.

This list is passed to the update store, which attempts to acquire a lock on the update store while it snapshots itself and its associated index store.

 This means that a snapshot can only be completed once all indexes have finished their ongoing update.
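
The locking scheme described above can be sketched as follows. This is a simplified illustration using only the standard library, not the PR's code: the `UpdateStore` type, its fields, and its methods are hypothetical stand-ins, and updates are plain strings.

```rust
use std::collections::VecDeque;
use std::sync::Mutex;

/// Hypothetical stand-in for the update store.
pub struct UpdateStore {
    /// Pending updates; pushing only takes this short-lived lock.
    queue: Mutex<VecDeque<String>>,
    /// Held while an update is processed or while a snapshot is taken.
    processing: Mutex<()>,
    /// Updates that have already been applied to the index.
    applied: Mutex<Vec<String>>,
}

impl UpdateStore {
    pub fn new() -> Self {
        UpdateStore {
            queue: Mutex::new(VecDeque::new()),
            processing: Mutex::new(()),
            applied: Mutex::new(Vec::new()),
        }
    }

    /// Enqueues an update. It never waits on the processing lock, so
    /// updates can still be pushed while a snapshot is running.
    pub fn push(&self, update: &str) {
        self.queue.lock().unwrap().push_back(update.to_string());
    }

    /// Processes the next pending update while holding the processing lock.
    /// Returns `false` when the queue is empty.
    pub fn process_next(&self) -> bool {
        let _guard = self.processing.lock().unwrap();
        let next = self.queue.lock().unwrap().pop_front();
        match next {
            Some(update) => {
                self.applied.lock().unwrap().push(update);
                true
            }
            None => false,
        }
    }

    /// Takes a snapshot of the applied updates. Holding the processing lock
    /// guarantees that no update is being processed at the same time.
    pub fn snapshot(&self) -> Vec<String> {
        let _guard = self.processing.lock().unwrap();
        self.applied.lock().unwrap().clone()
    }
}
```

Because `snapshot` has to wait for the processing lock, it can only complete once the ongoing update has finished, which matches the behavior stated above.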

This PR also refactors the code to allow unit testing and mocking, and adds unit tests for the snapshot creation.

Co-authored-by: mpostma <postma.marin@protonmail.com>
Co-authored-by: tamo <irevoire@protonmail.ch>
Co-authored-by: marin <postma.marin@protonmail.com>
Co-authored-by: Marin Postma <postma.marin@protonmail.com>
2021-04-01 13:16:00 +00:00
248e9b3808 Merge remote-tracking branch 'origin/main' into snapshots 2021-04-01 15:10:33 +02:00
79c63049d7 update the settings routes 2021-04-01 11:52:26 +02:00
96cffeab1e update all the response format to be ISO with meilisearch, see #64 2021-04-01 11:43:03 +02:00
39a18d4edc Update README.md 2021-04-01 00:00:21 +02:00
6e1ddfea5a Merge pull request #129 from shekhirin/fix-docker-commit-sha
fix(ci, http): commit_sha and commit_date in docker builds
2021-03-31 21:46:17 +02:00
d8af4a7202 ignore snapshot test (#130) 2021-03-31 20:07:52 +02:00
3d51db5929 fix(ci, http): commit_sha and commit_date in docker builds
chore(ci): cache dependencies in Docker build
2021-03-31 13:56:28 +03:00
b0956c09c1 Merge pull request #127 from shekhirin/docker-deps-cache
chore(ci): cache dependencies in Docker build
2021-03-31 12:48:57 +02:00
a294462a06 Merge #1319
1319: Stable into master r=MarinPostma a=MarinPostma



Co-authored-by: mpostma <postma.marin@protonmail.com>
Co-authored-by: bors[bot] <26634292+bors[bot]@users.noreply.github.com>
Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
Co-authored-by: Marin Postma <postma.marin@protonmail.com>
2021-03-31 09:32:48 +00:00
5bc464dc53 chore(ci): cache dependencies in Docker build 2021-03-31 11:23:09 +03:00
7807a8dcff Merge #1315
1315: fix armv7 r=MarinPostma a=MarinPostma

fix armv7 build

This was caused by `usize` being 32 bits on armv7 and 64 bits on all other targeted architectures.
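
The PR does not say which code was affected, so the following is only a generic sketch of how this class of bug is usually avoided: sizes that may exceed 4 GiB are kept as `u64` and converted to `usize` with an explicit check. The constant `MAX_DB_SIZE` and its value are assumptions made for illustration.

```rust
use std::convert::TryFrom;
use std::mem::size_of;

/// Hypothetical size limit (100 GiB), stored as `u64` so that the constant
/// fits on every target, including 32-bit ones such as armv7.
pub const MAX_DB_SIZE: u64 = 100 * 1024 * 1024 * 1024;

/// Converts a 64-bit size to `usize`, returning `None` instead of silently
/// truncating when the value does not fit on the current target.
pub fn to_usize(bytes: u64) -> Option<usize> {
    usize::try_from(bytes).ok()
}

/// Width of `usize` in bits on the current target (32 on armv7, 64 on x86_64).
pub fn pointer_width_bits() -> usize {
    size_of::<usize>() * 8
}
```

On a 64-bit target `to_usize(MAX_DB_SIZE)` succeeds, while on a 32-bit target it returns `None`, which makes the difference between architectures explicit instead of failing at compile time or overflowing.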


Co-authored-by: Marin Postma <postma.marin@protonmail.com>
2021-03-29 17:20:50 +00:00
0bad5529d8 Merge #1309
1309: fix snapshot r=MarinPostma a=MarinPostma

fix snapshot broken by #1238.

Co-authored-by: mpostma <postma.marin@protonmail.com>
Co-authored-by: Marin Postma <postma.marin@protonmail.com>
2021-03-29 15:20:46 +00:00
4fe885408b fix arm 2021-03-29 17:19:31 +02:00
9a1ab4e69f fix test 2021-03-29 14:10:37 +02:00
e0b3c4f82f Merge #1310
1310: Fix display of http address r=MarinPostma a=curquiza

Wrong display introduced by https://github.com/meilisearch/MeiliSearch/pull/1206

Now displaying:

<img width="968" alt="Capture d’écran 2021-03-26 à 12 04 59" src="https://user-images.githubusercontent.com/20380692/112622594-8c173080-8e2b-11eb-81c3-5876d273e5fa.png">


Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2021-03-29 11:04:49 +00:00
ac858d9800 Remove clippy warnings in CI 2021-03-29 12:01:26 +02:00
7050236a93 Merge pull request #123 from irevoire/snapshots
remove the now useless dead_code flags
2021-03-26 17:54:38 +01:00
0f2143e7fd remove the now useless dead_code flags 2021-03-26 14:15:12 +01:00
b9f79c8df0 Update display 2021-03-26 12:12:55 +01:00
9587ea7f06 Fix display of http address 2021-03-26 12:04:22 +01:00
7f68b83cb7 fix snapshot 2021-03-26 11:34:37 +01:00
d7c077cffb atomic snapshot import 2021-03-25 14:48:51 +01:00
7d6ec7f3d3 resolve merge 2021-03-25 14:21:05 +01:00
f3dc853be3 Merge remote-tracking branch 'origin/main' into snapshots 2021-03-25 13:45:07 +01:00
28095c6454 Merge #1307
1307: change ubuntu version r=MarinPostma a=MarinPostma

Change the CI ubuntu version from `latest` to `18.04` because `latest` uses a version of glibc that is too recent, preventing meilisearch from running on the debian version of the DO image


Co-authored-by: mpostma <postma.marin@protonmail.com>
2021-03-25 11:42:13 +00:00
48507460b2 add snapshot tests 2021-03-25 12:02:10 +01:00
bb7d3be1b8 change ubuntu version 2021-03-25 10:44:40 +01:00
d029464de8 fix snapshot path 2021-03-25 10:23:31 +01:00
79d09705d8 perform snapshot on startup 2021-03-25 09:35:15 +01:00
868658f3d8 Merge #109
109: Make updates atomic r=curquiza a=MarinPostma

Until now, the index_uid->uuid mapping was done before the update was written to disk in the case of automatic index creation. This was an issue when the update failed, because the index would still exist in the uuid resolver.

This PR fixes it by first creating the update with a uuid if the index does not exist, and then registering this uuid with the uuid resolver.
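
The ordering described above can be sketched as follows. This is a simplified model, not the PR's code: the resolver is a plain `HashMap`, uuids are `u64` placeholders, and `add_update` and `write_update` are hypothetical names.

```rust
use std::collections::HashMap;

/// Adds an update for `index_uid` and returns the uuid it was written under.
///
/// Hypothetical sketch: `resolver` stands in for the uuid resolver,
/// `candidate_uuid` for a freshly generated uuid, and `write_update` for the
/// step that writes the update to disk.
pub fn add_update(
    resolver: &mut HashMap<String, u64>,
    index_uid: &str,
    candidate_uuid: u64,
    write_update: impl FnOnce(u64) -> Result<(), String>,
) -> Result<u64, String> {
    // Existing index: reuse the registered uuid.
    if let Some(&uuid) = resolver.get(index_uid) {
        write_update(uuid)?;
        return Ok(uuid);
    }
    // New index: write the update first, under the candidate uuid.
    write_update(candidate_uuid)?;
    // Register the mapping only once the write has succeeded, so a failed
    // update leaves no trace in the resolver.
    resolver.insert(index_uid.to_string(), candidate_uuid);
    Ok(candidate_uuid)
}
```

If `write_update` fails for a new index, the early return from `?` happens before the `insert`, so the resolver never learns about the index.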

This is preliminary work to the implementation of snapshots (#19).

This pr also changes the `resolve` method on the `UuidResolver` to `get` to make it clearer.


The `create_uuid` method may be bound to disappear when the index name resolution is handled by a remote machine.

Co-authored-by: mpostma <postma.marin@protonmail.com>
2021-03-24 12:24:32 +00:00
fe87477238 Merge #115
115: Add the exhaustiveNbHits in search response body (returns always false) r=curquiza a=irevoire

closes #103 

Co-authored-by: tamo <irevoire@protonmail.ch>
Co-authored-by: Irevoire <irevoire@protonmail.ch>
2021-03-24 12:16:53 +00:00
d892a2643e fix clippy 2021-03-24 12:38:59 +01:00
83ffdc888a remove bad file name test 2021-03-24 12:38:59 +01:00
4041d9dc48 format code 2021-03-24 12:38:59 +01:00
1f16c8d224 integration test snapshot 2021-03-24 12:38:59 +01:00
06f9dae0f3 remove prints 2021-03-24 12:38:59 +01:00
48d5f88c1a fix snapshot dir already exists 2021-03-24 12:38:59 +01:00
eb53ed4cc1 load snapshot 2021-03-24 12:38:59 +01:00
46293546f3 add tests and mocks 2021-03-24 12:38:59 +01:00
3cc3637e2d refactor for tests 2021-03-24 12:38:56 +01:00
1f51fc8baf create indexes snapshots concurrently 2021-03-24 12:38:12 +01:00
e9da191b7d fix snapshot bugs 2021-03-24 12:38:12 +01:00
d73fbdef2e remove from snapshot 2021-03-24 12:38:12 +01:00
44dcfe29aa clean snapshot creation 2021-03-24 12:38:12 +01:00
a85e7abb0c fix snapshot creation 2021-03-24 12:38:12 +01:00
4847884165 restore snapshots 2021-03-24 12:38:12 +01:00
7f6a54cb12 add lock to prevent snapshot during update 2021-03-24 12:38:12 +01:00
520f7c09ba sequential index snapshot 2021-03-24 12:38:12 +01:00
35a7b800eb snapshot indexes 2021-03-24 12:38:12 +01:00
c966b1dd94 use options to schedule snapshot 2021-03-24 12:38:11 +01:00
ee838be41b implement snapshot scheduler 2021-03-24 12:38:11 +01:00
127e944866 Update meilisearch-http/src/index/search.rs
Co-authored-by: marin <postma.marin@protonmail.com>
2021-03-23 19:13:22 +01:00
cc81aca6a4 Update meilisearch-http/src/index/search.rs
Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2021-03-23 10:47:19 +01:00
46d7cedb18 Update meilisearch-http/src/index/search.rs
Co-authored-by: marin <postma.marin@protonmail.com>
2021-03-23 10:46:59 +01:00
5f33672f0e change payload send to use stream methods 2021-03-22 19:49:21 +01:00
b690f1103a fix typos 2021-03-22 19:25:56 +01:00
91089db444 add the exhaustive nb hits to be ISO, currently it's always set to false 2021-03-22 18:41:33 +01:00
70fd4f109d Merge #1299
1299: bump meilisearch r=MarinPostma a=MarinPostma



Co-authored-by: mpostma <postma.marin@protonmail.com>
2021-03-22 15:14:11 +00:00
186b0869df edit changelog 2021-03-22 16:10:53 +01:00
7652fc1a04 bump meiliseach 2021-03-22 16:03:19 +01:00
2f418ee767 Merge #108
108: use write senders for updates r=MarinPostma a=MarinPostma

 Use write senders to send updates to the `IndexActor`, so updates are performed sequentially on all indexes.

Co-authored-by: mpostma <postma.marin@protonmail.com>
2021-03-22 14:18:43 +00:00
2ecde74fa4 Merge #112
112: fix root route r=MarinPostma a=irevoire

closes #93

Co-authored-by: Irevoire <tamo@meilisearch.com>
2021-03-22 14:08:59 +00:00
7ecefe37da fix root route 2021-03-19 11:34:54 +01:00
89d13706f1 Merge #1291
1291: Use 200 status code for healthcheck endpoint  r=MarinPostma a=irevoire

closes  #1282

Co-authored-by: tamo <tamo@meilisearch.com>
Co-authored-by: Irevoire <tamo@meilisearch.com>
2021-03-18 11:02:45 +00:00
d4b1331a0a use the json method instead of the body method in the creation of the response 2021-03-18 11:54:10 +01:00
147756750b create uuid on successful update addition
also change resolve to get in uuid resolver
2021-03-18 09:09:26 +01:00
8b99860e85 use write sender for updates 2021-03-18 08:32:05 +01:00
a2c8dae914 Merge #1292
1292: return a 200 on / when meilisearch is running in production r=MarinPostma a=irevoire

close #1235

Co-authored-by: tamo <tamo@meilisearch.com>
Co-authored-by: Irevoire <irevoire@protonmail.ch>
2021-03-18 06:09:21 +00:00
1640d9ea91 Merge #106
106: return 202 on settings update / reset r=MarinPostma a=irevoire

closes #105

Co-authored-by: Irevoire <tamo@meilisearch.com>
2021-03-18 06:06:35 +00:00
6b4ea7f594 ensure the reset_settings also return a 202 2021-03-17 15:09:13 +01:00
c8b05712fa return 202 on settings update / reset 2021-03-17 14:44:32 +01:00
56b4782ee1 Merge #1293
1293: stable to master r=curquiza a=MarinPostma

replace & close #1239


Co-authored-by: mpostma <postma.marin@protonmail.com>
Co-authored-by: marin <postma.marin@protonmail.com>
Co-authored-by: Many <legendre.maxime.isn@gmail.com>
Co-authored-by: many <maxime@meilisearch.com>
Co-authored-by: bors[bot] <26634292+bors[bot]@users.noreply.github.com>
2021-03-17 13:25:21 +00:00
b6831320f9 Merge pull request #100 from meilisearch/next-release
Update Cargo.toml for the next release
2021-03-16 20:18:37 +01:00
8a52979ffa Update Cargo.toml 2021-03-16 19:54:34 +01:00
ca3b343b1f Merge #96
96: Check json payload on document addition r=curquiza a=MarinPostma

Check that the JSON payload in updates is valid. It uses a JSON validator to avoid allocation, and only serializes the JSON in case of error, to return a pretty message.

Co-authored-by: mpostma <postma.marin@protonmail.com>
2021-03-16 17:20:44 +00:00
f8ea081df5 Merge #98
98: replace body with json r=curquiza a=MarinPostma



Co-authored-by: mpostma <postma.marin@protonmail.com>
2021-03-16 17:12:30 +00:00
588bc8f9ef Merge #99
99: return a 200 on health check r=MarinPostma a=irevoire

closes #92 

Co-authored-by: tamo <tamo@meilisearch.com>
2021-03-16 16:47:44 +00:00
233c1e304d use json instead of body when crafting the request 2021-03-16 17:45:59 +01:00
a268d0e283 return a 200 on health check 2021-03-16 17:42:01 +01:00
9992c36ced Merge branch 'stable'
fix conflict with master
2021-03-16 16:59:39 +01:00
81255814b1 Update meilisearch-http/src/routes/mod.rs
Co-authored-by: marin <postma.marin@protonmail.com>
2021-03-16 16:57:29 +01:00
764ced8b5c Merge #88
88: restore name field in index meta r=MarinPostma a=MarinPostma

Makes the IndexMetadata payload iso with legacy meilisearch and closes #67 


Co-authored-by: mpostma <postma.marin@protonmail.com>
2021-03-16 15:50:08 +00:00
3c25ab0d50 replace body with json 2021-03-16 16:46:07 +01:00
63a3a1fd90 Merge pull request #97 from meilisearch/improve-release-drafter
Update release-draft-template.yml
2021-03-16 16:00:28 +01:00
761c2b0639 Update release-draft-template.yml 2021-03-16 15:16:33 +01:00
c6dbd81823 Merge #90
90: restore version route r=MarinPostma a=MarinPostma

close #74


Co-authored-by: mpostma <postma.marin@protonmail.com>
2021-03-16 13:53:23 +00:00
13c5289ff1 Update release-drafter.yml 2021-03-16 14:46:08 +01:00
23fae3328b Merge pull request #77 from meilisearch/release-drafter
Add release drafter file
2021-03-16 14:43:27 +01:00
85f3b192d5 Update release-draft-template.yml 2021-03-16 14:33:52 +01:00
204c743bcc add json payload check on document addition 2021-03-16 14:28:13 +01:00
4aaa561147 Add release drafter file 2021-03-16 14:17:08 +01:00
018cadc598 follow the IBM convention 2021-03-16 14:02:14 +01:00
2138f54954 Merge #89
89: delete index returns 204 instead of 200 r=curquiza a=MarinPostma

 close #63

Co-authored-by: mpostma <postma.marin@protonmail.com>
2021-03-16 13:01:32 +00:00
0a0eee4993 Merge #1238
1238: fix snapshot temp file r=curquiza a=MarinPostma

fix snapshot creating a temp file in /tmp, and create the temp file in the snapshot directory instead.

close #1237


Co-authored-by: mpostma <postma.marin@protonmail.com>
2021-03-16 13:00:21 +00:00
e0c5740050 Merge #94
94: remove guard on document addition routes r=curquiza a=MarinPostma

 Remove `application/json` guards on document addition routes.

Co-authored-by: mpostma <postma.marin@protonmail.com>
2021-03-16 12:52:43 +00:00
0c27bea135 return a 400 on / when meilisearch is running in production 2021-03-16 13:38:43 +01:00
1145599c04 Merge #91
91: Add bors configuration r=MarinPostma a=curquiza



Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2021-03-16 12:09:11 +00:00
9dd1ecdc2a Add bors configuration 2021-03-16 13:08:26 +01:00
f4cf96915a remove guard on add documetn route 2021-03-16 12:04:32 +01:00
f6d0689967 add a body to be fully compliant with the http spec 2021-03-16 11:40:51 +01:00
a2ac2de011 Use 200 status code for healthcheck endpoint 2021-03-16 11:22:00 +01:00
6a742ee62c restore version route 2021-03-15 19:11:27 +01:00
58fab035bb delete index returns 204 instead of 200 2021-03-15 18:44:33 +01:00
07bb1e2c4e fix tests 2021-03-15 18:38:13 +01:00
94bd14ede3 add name to index_metadata 2021-03-15 18:35:16 +01:00
0c17b166df Merge pull request #58 from meilisearch/actor-index-controller
actor index controller
2021-03-15 18:25:35 +01:00
dd324807f9 last review edits + fmt 2021-03-15 18:11:10 +01:00
c29b86849b use actix cors git dependency 2021-03-15 17:40:20 +01:00
abbea59732 fix clippy warnings 2021-03-15 16:52:05 +01:00
01479dcf99 rename name to uid in code 2021-03-15 14:43:47 +01:00
0c80d891c0 clean Cargo.toml 2021-03-15 14:29:30 +01:00
f727dcc8c6 update milli 2021-03-15 14:26:59 +01:00
55fadd7f87 change facetedAttributes to attributesForFaceting 2021-03-15 13:53:50 +01:00
fcf1d4e922 fix displayed attributes in search 2021-03-15 12:20:33 +01:00
c079f60346 fixup! fix displayed attributes in document retrieval 2021-03-15 11:01:14 +01:00
77c0a0fba5 add test get document displayed attributes 2021-03-15 10:36:12 +01:00
adc71a70ce fix displayed attributes in document retrieval 2021-03-15 10:17:41 +01:00
99c89cf2ba use options max db sizes 2021-03-13 10:09:10 +01:00
49b74b587a enable jemalloc only on linux 2021-03-12 17:47:40 +01:00
c61fab1435 Merge branch 'main' into actor-index-controller 2021-03-12 15:14:20 +01:00
2ee2e6a9b2 clean project 2021-03-12 14:57:24 +01:00
c4846dafca implement update index 2021-03-12 14:48:43 +01:00
77d5dd452f remove open_or_create 2021-03-12 14:16:54 +01:00
e4d45b0500 fix various bugs 2021-03-12 00:37:43 +01:00
7d9637861f fix add primary key on index creation 2021-03-11 22:55:29 +01:00
271c8ba991 change index name to uid 2021-03-11 22:47:29 +01:00
8617bcf8bd add ranking rules 2021-03-11 22:39:16 +01:00
66b64c1f80 correct error on settings delete unexisting index 2021-03-11 22:33:31 +01:00
30dd790884 handle badly formatted index uid 2021-03-11 22:23:48 +01:00
40b3451a4e fix unexisting update store + race conditions 2021-03-11 22:11:58 +01:00
3f68460d6c fix update dedup 2021-03-11 20:58:51 +01:00
79a4bc8129 use meta from milli 2021-03-11 19:40:18 +01:00
1fad72e019 fix test bug with tempdir 2021-03-11 17:59:47 +01:00
2ae90f9c5d lazy load update store 2021-03-11 14:23:11 +01:00
53cf500e36 uuid resolver hard state 2021-03-10 18:04:20 +01:00
a56e8c1a0c fix tests 2021-03-10 14:47:04 +01:00
0cd8869349 update relevant changes from master 2021-03-10 14:43:10 +01:00
5ca3382f5c Merge #1286
1286: Timestamp changelog r=curquiza a=sandstrom

A timestamped changelog makes it easier to track progress, understand velocity, see if something has recently changed, etc.

https://keepachangelog.com/en/1.0.0/

Co-authored-by: sandstrom <mail+github@a16m.se>
2021-03-10 12:57:31 +00:00
dcc6f20f31 Timestamp changelog 2021-03-10 13:47:48 +01:00
5ecf514d28 restructure project 2021-03-10 13:46:49 +01:00
8061a04661 add test assets 2021-03-10 13:38:30 +01:00
562da9dd3f fix test compilation 2021-03-10 11:56:51 +01:00
f475385788 Merge #1113
1113: [ci] Add all target to  check r=MarinPostma a=woshilapin

Follow-up on https://github.com/meilisearch/MeiliSearch/pull/1100#issuecomment-735828974. If you disagree to add this, I'm totally fine to close this PR without merging (related to #1099).

Co-authored-by: Jean SIMARD <woshilapin@tuziwo.info>
2021-03-09 14:27:21 +00:00
9661ee5d64 Merge pull request #76 from meilisearch/no-jemalloc-macos
Make sure that we do not use jemalloc on macos
2021-03-09 09:57:39 +01:00
4a0f5f1b03 Make sure that we do not use jemalloc on macos 2021-03-08 21:22:30 +01:00
ce652fc8df Merge #1252
1252: change the wording of Amplify to make it clearer r=curquiza a=fharper



Co-authored-by: Frédéric Harper <hi@fred.dev>
2021-03-08 19:42:13 +00:00
07e7acc35d Merge #1280
1280: Make sure that we do not use jemalloc on macos r=MarinPostma a=Kerollmops

We were wrongly compiling jemalloc on macOS even though we only used it on Linux.

Fixes #1136.

Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-03-08 19:10:21 +00:00
51e0d6d5ee remove word on 2021-03-08 11:41:09 -05:00
4e1597bd1d clean Uuid resolver actor 2021-03-08 16:28:27 +01:00
06403a5708 clean index actor unwraps 2021-03-08 15:53:16 +01:00
9d421d5ed4 Merge pull request #72 from meilisearch/enable-criterion
enable criterion setting
2021-03-08 14:08:16 +01:00
e9b90d5380 fixes from review 2021-03-08 13:51:33 +01:00
944a5bb36e update milli 2021-03-08 13:46:30 +01:00
2f93cce7aa auto index creation 2021-03-08 10:48:34 +01:00
ac4d795eff update created at when updating index 2021-03-08 10:21:12 +01:00
ced32afd9f implement get single index 2021-03-06 20:17:58 +01:00
281a445998 implement list indexes 2021-03-06 20:12:20 +01:00
d9254c4355 implement index delete 2021-03-06 12:57:56 +01:00
86211b1ddd import routes modules in main 2021-03-06 10:53:11 +01:00
7d28f8cff0 implement get single udpate 2021-03-06 10:51:52 +01:00
f4f42ec441 add tests 2021-03-05 20:06:10 +01:00
3992d917ec Merge pull request #55 from meilisearch/fix-settings-delete
fix settings delete
2021-03-05 19:57:43 +01:00
964e52ef08 Merge pull request #56 from meilisearch/fix-bad-index-uid
Fix bad index uid
2021-03-05 19:57:31 +01:00
65ca80bdde enable criterion setting 2021-03-05 19:31:49 +01:00
b8ebf07555 Merge pull request #57 from meilisearch/remove-duplicated-pending-update
remove duplicated pending update
2021-03-05 19:17:57 +01:00
f04dd2af39 enable tests delete settings 2021-03-05 19:14:45 +01:00
d52e6fc21e fix settings delete bug 2021-03-05 19:14:45 +01:00
561f29042c add tests 2021-03-05 19:12:35 +01:00
3987d17e40 add indx uid format guard on create ops 2021-03-05 19:10:24 +01:00
c0515bcfe2 Update src/index_controller/local_index_controller/mod.rs
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-03-05 19:08:32 +01:00
7d2ae9089e restore test 2021-03-05 19:08:32 +01:00
4552c42f88 deduplicate pending and processing updates 2021-03-05 19:08:32 +01:00
a9c7b73744 implement list all updates 2021-03-05 18:34:04 +01:00
c2282ab5cb non local udpate actor 2021-03-04 19:30:13 +01:00
f090f42e7a multi index store
create two channels for Index handler, one for writes and one for reads,
so writes are processed one at a time, while reads are processed in
parallel.
2021-03-04 19:18:01 +01:00
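
The two-channel idea from the commit above can be sketched with standard library threads and channels. This is only an illustration under simplifying assumptions: the `IndexHandle` type and `spawn_index` function are hypothetical, the "index" is a vector of strings, and a read request simply asks for the number of documents.

```rust
use std::sync::mpsc::{channel, Sender};
use std::sync::{Arc, Mutex, RwLock};
use std::thread;

/// Hypothetical handle holding the sending ends of both channels.
pub struct IndexHandle {
    /// Writes carry a document and a sender used to acknowledge completion.
    pub writes: Sender<(String, Sender<()>)>,
    /// Reads carry a sender on which the document count is returned.
    pub reads: Sender<Sender<usize>>,
}

pub fn spawn_index(read_workers: usize) -> IndexHandle {
    let docs = Arc::new(RwLock::new(Vec::<String>::new()));
    let (write_tx, write_rx) = channel::<(String, Sender<()>)>();
    let (read_tx, read_rx) = channel::<Sender<usize>>();

    // A single thread drains the write channel, so writes are applied
    // one at a time, in the order they were sent.
    let writer_docs = Arc::clone(&docs);
    thread::spawn(move || {
        for (doc, done) in write_rx {
            writer_docs.write().unwrap().push(doc);
            let _ = done.send(());
        }
    });

    // Several threads share the read channel, so reads run in parallel.
    let read_rx = Arc::new(Mutex::new(read_rx));
    for _ in 0..read_workers {
        let rx = Arc::clone(&read_rx);
        let reader_docs = Arc::clone(&docs);
        thread::spawn(move || loop {
            // The mutex is held only while taking the next request.
            let request = rx.lock().unwrap().recv();
            match request {
                Ok(reply) => {
                    let _ = reply.send(reader_docs.read().unwrap().len());
                }
                Err(_) => break, // all senders dropped: shut down
            }
        });
    }

    IndexHandle { writes: write_tx, reads: read_tx }
}
```

A caller sends a `(document, ack)` pair on `writes` and waits on the acknowledgement, or sends a reply sender on `reads` and waits for the count. The worker threads exit once the handle is dropped.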
6a0a9fec6b async update store 2021-03-04 17:25:02 +01:00
a955e04ab6 implement clear documents 2021-03-04 16:04:12 +01:00
ae5581d37c implement delete documents 2021-03-04 15:59:18 +01:00
181eaf95f5 restore update documents 2021-03-04 15:10:58 +01:00
581dcd5735 implement retrieve one document 2021-03-04 15:09:00 +01:00
f3d65ec5e9 implement retrieve documents 2021-03-04 14:20:19 +01:00
17b84691f2 list settings 2021-03-04 12:38:55 +01:00
47138c7632 update settings 2021-03-04 12:20:14 +01:00
8432c8584a refactor index controller 2021-03-04 12:03:06 +01:00
a56db854a2 refactor update handler 2021-03-04 11:58:15 +01:00
9e2a95b1a3 refactor search 2021-03-04 11:23:41 +01:00
ae3c8af56c enable faceted search 2021-03-04 10:42:44 +01:00
70dce6cc0b Make sure that we do not use jemalloc on macos 2021-03-04 09:17:46 +01:00
77083d9e80 Merge #1279
1279: fix Docker volume path r=MarinPostma a=fharper

essential if `$(pwd)` returns a path with spaces

Co-authored-by: Frédéric Harper <hi@fred.dev>
2021-03-03 21:15:16 +00:00
4a66803d76 fix Docker volume path
essential if pwd returns a path with spaces
2021-03-03 13:18:07 -05:00
eff8570f59 handle ctrl-c shutdown 2021-03-03 15:10:00 +01:00
3cd799a744 fix update files created in the wrong place 2021-03-03 14:39:44 +01:00
e285404c3e handle errors when sendign payload to actor 2021-03-03 12:16:16 +01:00
70d935a2da refactor index serach for better error handling 2021-03-03 11:53:01 +01:00
7c7143d435 remove IndexController interface 2021-03-03 11:43:51 +01:00
9aca6fab88 completely file backed udpates 2021-03-03 11:01:15 +01:00
d1f34f926e [ci] Add all target to check 2021-03-02 20:48:57 +01:00
62532b8f79 WIP concurent index store 2021-03-02 14:05:03 +01:00
402203aa2a Merge pull request #62 from meilisearch/fix-ci-2
Fix CI artefacts
2021-03-02 13:25:16 +01:00
cf97b9ff2b Update create_artifacts.yml 2021-03-02 12:06:38 +01:00
e7b541a2af Merge pull request #61 from meilisearch/fix-ci
Add checkout to docker CI
2021-03-02 11:43:45 +01:00
4cf66831d4 Update publish_to_docker.yml 2021-03-02 11:38:39 +01:00
f41284a133 Merge pull request #60 from meilisearch/prepare-for-ci
Prepare for ci
2021-03-02 10:53:15 +01:00
a77d517ac1 Merge #1206
1206: fix running URL display r=curquiza a=fharper

by doing that you can just click on it in the terminal if you want

Co-authored-by: Frédéric Harper <hi@fred.dev>
2021-03-02 09:51:32 +00:00
fc351b54d9 change milli revision 2021-03-01 20:09:23 +01:00
c2fdb0ad4d Update .github/workflows/create_artifacts.yml
Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2021-03-01 19:59:54 +01:00
1968bfac4d remove legacy tests 2021-03-01 15:48:42 +01:00
c4dfd5f0c3 implement search and fix document addition 2021-03-01 15:45:05 +01:00
ac2af4354d remove actor index controller 2021-03-01 15:35:32 +01:00
9227b7cb2f remove data.ms 2021-03-01 15:30:48 +01:00
e1e5935e3c CI recipes 2021-03-01 14:44:55 +01:00
4316d991a2 add docker recipe 2021-03-01 14:41:57 +01:00
d1be3d60df run tests on all pushed 2021-03-01 14:41:57 +01:00
a9a9ed6318 create workspace with meilisearch-error 2021-03-01 14:41:55 +01:00
79708aeb67 add milli as git dep 2021-03-01 14:41:20 +01:00
0c2777dfd5 Merge pull request #59 from meilisearch/license
license
2021-02-28 10:11:33 +01:00
5ba58c1e9c add Marin to authors 2021-02-28 10:09:56 +01:00
c994fe4609 add license 2021-02-28 10:08:36 +01:00
658166c05e implement document push 2021-02-26 18:11:43 +01:00
6bcc302950 receive update 2021-02-26 17:14:11 +01:00
d8a337fcac Merge #1265
1265: Inferring whether to show or Hide API Key box r=curquiza a=sanders41

Relates to #1261

This is one potential solution for inferring whether an instance has an API key and showing or hiding the text input box accordingly. When the page first loads, a request is sent to the server with no API key. If that request succeeds, no API key is needed, so the box is hidden. If the request returns a 401 status, an API key is needed and the box is shown.


Co-authored-by: Paul Sanders <psanders1@gmail.com>
2021-02-26 10:27:37 +00:00
672a4b5400 add actors/ support index creation 2021-02-26 09:10:36 +01:00
61ce749122 update tokio and disable all routes 2021-02-26 09:10:04 +01:00
ee02d55e67 Merge #1266
1266: Simplify compile and run from sources r=curquiza a=tpayet

Related to #1136, I just saw that compile & run instructions from sources were not up to date

Co-authored-by: Thomas Payet <thomas@meilisearch.com>
2021-02-25 15:47:11 +00:00
417d0ae92a Simplify compile and run from sources 2021-02-25 11:52:08 +01:00
22108f9f90 Specifying a 401 status code to show API Key 2021-02-25 01:07:18 -05:00
101e050746 Show or hide the API key text input box when needed 2021-02-25 00:56:08 -05:00
45d8f36f5e Merge pull request #49 from meilisearch/tests
tests
2021-02-24 10:41:55 +01:00
caaaf15fd6 Create rust.yml 2021-02-24 10:31:28 +01:00
60a42bc511 reset settings 2021-02-24 10:19:22 +01:00
3f939f3ccf test delete settings 2021-02-24 10:14:36 +01:00
7d9c5f64aa test partial update 2021-02-24 09:42:36 +01:00
c7ab4dccc3 test get settings 2021-02-24 09:30:51 +01:00
ac89c35edc add settings routes errors 2021-02-23 19:46:18 +01:00
af2cbd0258 test get updates 2021-02-23 19:15:42 +01:00
0a3e946726 test delete batches 2021-02-23 14:13:43 +01:00
d3758b6f76 test delete documents 2021-02-22 16:03:17 +01:00
c95bf0cdf0 test badly formatted primary key 2021-02-22 15:13:10 +01:00
4bca26298e test add document bad primary key 2021-02-22 14:55:40 +01:00
ded6483173 tests get one document 2021-02-22 14:32:48 +01:00
097cae90a7 tests get documents limit, offset, attr to retrieve 2021-02-22 14:23:17 +01:00
739c860cfd Merge #1260
1260: README.md: typos r=Kerollmops a=skerkour

Hey, I think I've noticed small typos. Feel free to close if I'm wrong :)

Co-authored-by: Sylvain Kerkour <6172808+skerkour@users.noreply.github.com>
2021-02-22 08:59:58 +00:00
f01bb9cee3 README.md: typos 2021-02-20 17:49:59 +00:00
b8b8cc1312 get all documents, no options 2021-02-19 19:55:44 +01:00
27a7238d3f test list no documents 2021-02-19 19:46:45 +01:00
ec9dcd3285 test get add documents 2021-02-19 19:43:32 +01:00
ba2cfcc72d test delete index 2021-02-19 19:26:56 +01:00
5270cc0eae test update index 2021-02-19 19:26:42 +01:00
2bb695d60f test list all indexes 2021-02-19 19:23:58 +01:00
556ba956b8 test get empty index list 2021-02-19 19:14:25 +01:00
b1226be2c8 test document addition 2021-02-19 13:16:41 +01:00
b293948d36 test index delete 2021-02-18 20:44:33 +01:00
ed3f8f5cc0 test create multiple indexes 2021-02-18 20:32:34 +01:00
4c5effe714 test index update 2021-02-18 20:28:10 +01:00
68692a256e test get index 2021-02-18 20:24:40 +01:00
72eed0e369 test create index 2021-02-18 19:50:52 +01:00
588add8bec rename update fields to camel case 2021-02-18 19:11:19 +01:00
a7bd0681a0 Merge pull request #45 from meilisearch/facet-distributions
facets distribution
2021-02-17 15:03:38 +01:00
999758f7a1 facets distribution 2021-02-17 14:59:32 +01:00
2d7b2e651d Merge pull request #43 from meilisearch/facet-filters
enable faceted searches
2021-02-17 14:11:10 +01:00
b723f23f14 Merge pull request #44 from meilisearch/fix-fill-buffer-error
fix error message when empty payload
2021-02-17 14:02:39 +01:00
ae9a41a19f fix error message when empty payload 2021-02-17 14:00:42 +01:00
86f32e4ee4 Merge #1253
1253: fix line break r=Kerollmops a=fharper



Co-authored-by: Frédéric Harper <hi@fred.dev>
2021-02-17 10:57:16 +00:00
1873c0399a fix line break 2021-02-16 16:21:50 -05:00
47eeed0a4c change the wording of Amplify to make it clearer 2021-02-16 16:09:26 -05:00
91d6e90d5d enable faceted searches 2021-02-16 19:20:39 +01:00
4d08f04db2 Update movie posters (#1219)
* Update movie posters

* Remove last comma
2021-02-16 11:06:53 -05:00
93ce32d94d Merge pull request #39 from meilisearch/fix-attributes-to-retrieve
fix attributes to retrieve
2021-02-16 16:52:47 +01:00
4fe90a1a1c fix attributes to retrieve in search 2021-02-16 16:51:00 +01:00
22c204fea6 Merge pull request #40 from meilisearch/search-get
search get
2021-02-16 16:49:56 +01:00
e1253b6969 enable search with get route 2021-02-16 16:48:05 +01:00
f175d20599 Merge pull request #41 from meilisearch/list-keys
list keys
2021-02-16 16:39:24 +01:00
4d9819f6ef Merge pull request #42 from meilisearch/basic-error-handling
basic error handling
2021-02-16 16:38:25 +01:00
bead4075d8 implement list api keys 2021-02-16 16:38:20 +01:00
1823fa18c9 add basic error handling 2021-02-16 16:36:57 +01:00
4738fa94d0 Merge pull request #38 from meilisearch/index-deletion
implement index deletion
2021-02-16 16:36:20 +01:00
aad5b789a7 review edits 2021-02-15 23:40:53 +01:00
5c0b541248 delete db files on deletion 2021-02-15 23:32:38 +01:00
a9e9e72840 implement index deletion 2021-02-15 23:24:28 +01:00
a580a6a44d Merge pull request #37 from meilisearch/update-documents
Update documents
2021-02-15 23:22:02 +01:00
1eaf28f823 add primary key and update documents 2021-02-15 23:21:01 +01:00
3a634cb583 Merge pull request #35 from meilisearch/retrieve-documents
implement retrieve documents
2021-02-15 23:11:34 +01:00
8bb1b6146f make retrieval non blocking 2021-02-15 23:02:20 +01:00
6c7175dfc2 Merge pull request #36 from meilisearch/delete-documents
delete documents
2021-02-15 22:39:00 +01:00
28b9c158b1 implement delete single document 2021-02-15 22:37:56 +01:00
4ea0e0fc05 Merge #1220
1220: Update Contact section of README.md r=Kerollmops a=react-learner

- Remove reference to Crisp chatbox (currently deactivated on docs site and homepage)
- Remove bonjour @ meilisearch.com email address, in order to concentrate communications in visible locations such as Slack and forums. @fharper

Co-authored-by: Tommy <68053732+react-learner@users.noreply.github.com>
2021-02-15 20:52:18 +00:00
b28be43cc6 Remove bonjour email from readme.md
Remove email address from README to concentrate communications in visible locations.
2021-02-15 09:19:23 -05:00
4a71861066 Revert link 2021-02-15 09:19:23 -05:00
5f25703d44 Update README.md
Fix docs links, remove reference to Crisp chatbox
2021-02-15 09:19:23 -05:00
c317af58bc implement delete document batches 2021-02-12 17:39:14 +01:00
a8ba809656 implement clear all documents 2021-02-11 12:03:00 +01:00
6766de437f implement get document 2021-02-11 11:20:39 +01:00
fa7379e129 Merge pull request #30 from meilisearch/update-index
implement update index
2021-02-11 11:03:25 +01:00
9fb0d94fc3 add tests 2021-02-11 11:02:27 +01:00
8fd9dc231c implement retrieve all documents 2021-02-10 17:08:37 +01:00
4ca46b9e5f fix bug in error message 2021-02-09 14:32:28 +01:00
90b930ed7f implement update index
implement update index
2021-02-09 14:32:26 +01:00
f44f8a823a Merge pull request #27 from meilisearch/create-index
Implement create index
2021-02-09 14:26:59 +01:00
e89b11b1fa create IndexSetting struct
need to stabilize the create index trait interface
2021-02-09 11:41:26 +01:00
e0976d10ba Merge branch 'release-v0.19.0' into stable 2021-02-09 11:11:33 +01:00
ea681026f7 fix snapshot temp file 2021-02-09 11:08:30 +01:00
759f6b48ee Merge #1233
1233: Fix link in launched resume r=Kerollmops a=curquiza



Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2021-02-08 19:04:09 +00:00
ec047eefd2 implement create index 2021-02-08 12:28:45 +01:00
811426b161 Update main.rs 2021-02-06 15:53:40 +01:00
b1d9ad7134 Merge #1224
1224: fix synonyms normalization r=MarinPostma a=LegendreM

Synonyms need to be indexed in ascending order,
and the new normalization step for synonyms can change this order,
which breaks the indexing process,
because "Harry Potter" > "HP" but "harry potter" < "hp"

Co-authored-by: many <maxime@meilisearch.com>
2021-02-04 15:37:33 +00:00
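The ordering problem described in #1224 can be illustrated with a short sketch. It assumes that normalization is plain lowercasing (the real tokenizer does more) and that keys must be handed to the indexer in ascending byte order; `normalize` and `sorted_normalized_keys` are hypothetical names. The point is only that sorting has to happen after normalization, not before.

```rust
/// Stand-in for the real normalization step: lowercasing only.
fn normalize(word: &str) -> String {
    word.to_lowercase()
}

/// Normalizes every synonym key first, then sorts, so the result is in
/// ascending order with respect to the normalized form.
fn sorted_normalized_keys(keys: &[&str]) -> Vec<String> {
    let mut normalized: Vec<String> = keys.iter().map(|k| normalize(k)).collect();
    normalized.sort();
    normalized.dedup();
    normalized
}
```

With the input `["HP", "Harry Potter"]`, which is already ascending in its raw form because `'P'` sorts before `'a'`, the normalized keys come back as `["harry potter", "hp"]`: the relative order flips once both strings are lowercased.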
e000e10e01 Merge #1229
1229: Fix links in CONTRIBUTING.md r=Kerollmops a=curquiza

Closes #1228 

Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2021-02-04 15:00:26 +00:00
8dea9662dc Fix links in CONTRIBUTING.md 2021-02-04 15:56:06 +01:00
ed44e684cc review fixes 2021-02-04 15:28:52 +01:00
f18e795124 fix rebase 2021-02-04 15:09:43 +01:00
f1c09a54be implement get index meta 2021-02-04 14:56:37 +01:00
8d462afb79 add tests for list index and create index. 2021-02-04 14:56:36 +01:00
f988306691 implement create index 2021-02-04 14:56:34 +01:00
d43dc4824c implement list indexes 2021-02-04 14:54:48 +01:00
482f734e53 Merge pull request #24 from meilisearch/index-controller
Index controller
2021-02-04 14:51:21 +01:00
f8f02af23e incorporate review changes 2021-02-04 13:21:15 +01:00
cb50781d2d Merge #1222
1222: Ignore existing primary key r=Kerollmops a=MarinPostma

Fixing the bug in #1176 made it a hard error to try to re-set the primary key on a document addition. This PR makes Meilisearch ignore a primary key passed as an argument to a document addition. This was decided after a discussion with @curquiza, in order to make the bug fix non-breaking.

Turns out it was a good catch too, since contrary to what I thought the error was not caught asynchronously, thank you @curquiza 

Co-authored-by: mpostma <postma.marin@protonmail.com>
2021-02-04 08:08:09 +00:00
1df0fdf3e2 fix synonyms normalization
Synonyms need to be indexed in ascending order,
and the new normalization step for synonyms can change this order,
which breaks the indexing process,
because "Harry Potter" > "HP" but "harry potter" < "hp"
2021-02-03 15:21:06 +01:00
a95a18afe4 ignore primary key if it is already set 2021-02-03 11:59:29 +01:00
9af0a08122 post review fixes 2021-02-02 17:34:06 +01:00
69c91d2b56 Merge #1218
1218: bump meilisearch version 0.19.0 r=LegendreM a=MarinPostma



Co-authored-by: mpostma <postma.marin@protonmail.com>
2021-02-02 13:45:28 +00:00
97ba5e97c6 update changelog 2021-02-02 14:32:04 +01:00
8760beed1c bump meilisearch 2021-02-02 14:23:33 +01:00
15464e57af Merge #1172
1172: Fix atomic snapshot creation r=MarinPostma a=raszi

Compress gzip files to a temporary file first and then do an atomic rename.

In our setup, an indexer takes snapshots for the instances serving the requests. Because the snapshotting mechanism currently replaces the file in place, the indexer cannot share the snapshot with a live instance.

With this small patch, we first create a new temporary file in the same directory as the snapshot dir and then do an atomic rename, so the snapshot path always contains a valid snapshot.
After applying this change, it is enough to restart the serving instances to pick up the new snapshot from shared storage, without worrying that they will die because of an incomplete snapshot.

Co-authored-by: KARASZI István <ikaraszi@gmail.com>
2021-02-02 12:37:33 +00:00
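The write-then-rename pattern described in #1172 looks roughly like this. The sketch is standard-library only and writes raw bytes rather than a gzip archive; `write_atomically` and the `.tmp` suffix are hypothetical and not taken from the patch. It relies on `std::fs::rename` replacing an existing destination, which is atomic on POSIX filesystems when source and destination are on the same filesystem.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

/// Writes `bytes` to a temporary sibling of `dest`, then renames it over `dest`,
/// so readers see either the old file or the complete new one.
fn write_atomically(dest: &Path, bytes: &[u8]) -> io::Result<()> {
    // Keep the temporary file next to the destination so that the rename
    // does not cross a filesystem boundary.
    let mut tmp_name = dest
        .file_name()
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "destination has no file name"))?
        .to_os_string();
    tmp_name.push(".tmp");
    let tmp_path = dest.with_file_name(tmp_name);

    let mut file = File::create(&tmp_path)?;
    file.write_all(bytes)?;
    // Flush the data to disk before the file becomes visible under its final name.
    file.sync_all()?;
    drop(file);

    fs::rename(&tmp_path, dest)
}
```

A process reading `dest` while a new snapshot is being written keeps seeing the previous complete file until the rename happens.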
c984fa1071 Merge #1176
1176: fix race condition in  document addition r=Kerollmops a=MarinPostma

As described in #1160, there was a race condition when updating settings and adding documents simultaneously. This was due to the schema being updated and document addition being processed in two different transactions. This PR moves the schema update logic for the primary key in the same transaction as the document addition, while maintaining the input checks for the validity of the primary key in the http route, in order not to break the error reporting for the document addition route.

close #1160.

Co-authored-by: mpostma <postma.marin@protonmail.com>
Co-authored-by: marin <postma.marin@protonmail.com>
2021-02-02 09:26:32 +00:00
97f35de41f fix flaky test 2021-02-01 18:59:22 +01:00
81e9fd8933 Merge #1184
1184: normalize synonyms during indexation r=MarinPostma a=LegendreM

fix #1135 #964

Normalizes the synonyms before indexing them, so they are no longer case sensitive. The normalization also involves deunicoding in some cases, such as accents, so `été` and `ete` are considered equivalent in a search for synonyms.

Co-authored-by: many <maxime@meilisearch.com>
Co-authored-by: Many <legendre.maxime.isn@gmail.com>
2021-02-01 14:12:57 +00:00
17c463ca61 remove unused deps 2021-02-01 13:32:21 +01:00
f0ca193122 Merge branch 'master' into atomic-rename 2021-02-01 13:30:51 +01:00
940f83698c Update meilisearch-core/src/update/settings_update.rs
Co-authored-by: marin <postma.marin@protonmail.com>
2021-02-01 12:06:48 +01:00
ccb7104dee add tests for IndexStore 2021-01-29 19:14:23 +01:00
da056a6877 comment tests out 2021-01-28 20:55:29 +01:00
e9c95f6623 remove useless files 2021-01-28 19:43:54 +01:00
f37a420a04 Merge #1174
1174: Limit query words number r=MarinPostma a=MarinPostma

This PR adds a limit to the number of words taken into account in a search query. Query strings that are too long lead to huge performance hits and resource consumption, which occasionally crashes the machine. The limit has been hard-coded to 10, and tests have been added to make sure that it is taken into account.

close #941

Co-authored-by: mpostma <postma.marin@protonmail.com>
2021-01-28 17:38:34 +00:00
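The query word limit from #1174 amounts to truncating the token stream. In the sketch below, whitespace splitting stands in for the real tokenizer, and `MAX_QUERY_WORDS` and `limit_query_words` are hypothetical names; only the value 10 comes from the PR description.

```rust
/// Hard limit on the number of query words, as stated in the PR description.
const MAX_QUERY_WORDS: usize = 10;

/// Keeps at most `MAX_QUERY_WORDS` words of the query and ignores the rest.
/// Whitespace splitting is an assumption; the real engine uses its own tokenizer.
fn limit_query_words(query: &str) -> Vec<&str> {
    query.split_whitespace().take(MAX_QUERY_WORDS).collect()
}
```

Because `take` stops the iterator early, the cost of handling a query stays bounded no matter how long the input string is.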
6c63ee6798 implement list all indexes 2021-01-28 18:32:24 +01:00
60371b9dcf get update id 2021-01-28 17:20:51 +01:00
4119ae8655 settings update 2021-01-28 16:57:53 +01:00
8183202868 document addition and search 2021-01-28 15:14:48 +01:00
74410d8c6b architecture rework 2021-01-28 14:12:34 +01:00
c1808513fe Merge #1211
1211: update tokenizer to v0.1.3 r=MarinPostma a=LegendreM

fix #1188

Co-authored-by: many <maxime@meilisearch.com>
2021-01-28 09:50:38 +00:00
eeccdce33a update tokenizer to v0.1.3 2021-01-28 10:33:44 +01:00
a6667b14df Merge #1193
1193: Update LICENSE year r=MarinPostma a=curquiza



Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2021-01-28 09:17:55 +00:00
62e908264e Merge #1207
1207: fix homebrew name r=MarinPostma a=fharper

brew is the command, the package manager name is homebrew

Co-authored-by: Frédéric Harper <hi@fred.dev>
2021-01-28 08:45:07 +00:00
2fe52d0a4f fix homebrew name
brew is the command, the package manager name is homebrew
2021-01-26 15:14:53 -05:00
d01c93aeee fix running URL display
by doing that you can just click on it in the terminal if you want
2021-01-26 15:11:46 -05:00
c75ffbf3d5 Merge branch 'master' into atomic-rename 2021-01-19 13:04:31 +01:00
e3e475c5b1 Update LICENSE 2021-01-19 00:18:52 +01:00
6a3f625e11 WIP: refactor IndexController
change the architecture of the index controller to allow it to own an
index store.
2021-01-16 15:09:48 +01:00
1d910dbb42 Update meilisearch-core/src/update/documents_addition.rs
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-01-15 00:55:31 +01:00
bf3f36b46e Merge pull request #1191 from meilisearch/release-v0.18.1
Release v0.18.1
2021-01-14 14:11:19 +01:00
686f987180 fix compile errors 2021-01-14 11:27:07 +01:00
334933b874 fix search 2021-01-13 18:29:17 +01:00
d22fab5bae implement open index 2021-01-13 18:20:14 +01:00
ddd7789713 WIP: IndexController 2021-01-13 17:50:36 +01:00
ff38220b68 Merge #1190
1190: Bump meilisearch 0 18 1 r=LegendreM a=LegendreM

- bump version to `0.18.1`
- update `CHANGELOG.md`

Co-authored-by: many <maxime@meilisearch.com>
2021-01-13 15:35:28 +00:00
7a7cb9bcbf update dependencies 2021-01-13 15:48:53 +01:00
fe9c99a11b update changelog 2021-01-13 15:38:54 +01:00
9b47bbc1ac bump meilisearch 2021-01-13 15:37:15 +01:00
430a5f902b fix race condition in document addition 2021-01-13 13:17:52 +01:00
bc0d53e819 Update meilisearch-core/src/update/settings_update.rs
Co-authored-by: marin <postma.marin@protonmail.com>
2021-01-13 13:17:19 +01:00
0bb8b3a68d Merge #1185
1185: fix cors issue r=MarinPostma a=MarinPostma

This PR fixes a bug where foreign origins were not accepted.
This was due to an update to actix-cors.

It also fixes the CORS bug when authentication failed, with the caveat that requests denied for permission reasons are not logged.

It introduces a bug described in #1186.

Co-authored-by: mpostma <postma.marin@protonmail.com>
2021-01-13 10:56:25 +00:00
e5c220b82c fix authentication cors bug 2021-01-12 18:08:16 +01:00
60c636738b fix cors error 2021-01-12 16:46:53 +01:00
06b2a587af normalize synonyms during indexation 2021-01-12 13:53:32 +01:00
81f343a46a add word limit to search queries 2021-01-08 16:23:23 +01:00
956adfc90a Replace in-place compression
Compress gzip files to a temporary file first and then do an atomic
rename.
2021-01-07 17:36:42 +01:00
b07e21ab3c temp 2021-01-05 00:21:42 +01:00
b4d447b5cb temp 2021-01-01 16:59:49 +01:00
d1e9ded76f setting builder takes ownership 2020-12-31 00:50:30 +01:00
12ee7b9b13 impl get all updates 2020-12-30 19:17:13 +01:00
d9dc2036a7 support error & return document count on addition 2020-12-30 18:44:33 +01:00
54861335a0 retrieve update status 2020-12-30 18:16:07 +01:00
0cd9e62fc6 search first iteration 2020-12-24 12:58:34 +01:00
02ef1d41d7 route document add json 2020-12-23 16:12:37 +01:00
1a38bfd31f data add documents 2020-12-23 13:52:28 +01:00
0d7c4beecd reimplement Data 2020-12-22 17:53:13 +01:00
55e1552957 update queue refactor, first iteration 2020-12-22 17:13:50 +01:00
7c9eaaeadb clean code, and fix errors 2020-12-22 14:02:41 +01:00
29b1f55bb0 prepare boilerplate code for new api 2020-12-12 16:04:37 +01:00
8c0ab106c7 initial commit 2020-12-12 13:32:06 +01:00
226 changed files with 16897 additions and 29928 deletions

View File

@ -1,5 +1,4 @@
target
Dockerfile
.dockerignore
.git
.gitignore

View File

@ -23,16 +23,8 @@ A clear and concise description of what you expected to happen.
**Screenshots**
If applicable, add screenshots to help explain your problem.
**Desktop (please complete the following information):**
- OS: [e.g. iOS]
- Browser [e.g. chrome, safari]
- Version [e.g. 22]
**Smartphone (please complete the following information):**
- Device: [e.g. iPhone6]
- OS: [e.g. iOS8.1]
- Browser [e.g. stock browser, safari]
- Version [e.g. 22]
**MeiliSearch version:** [e.g. v0.20.0]
**Additional context**
Add any other context about the problem here.
Additional information that may be relevant to the issue.
[e.g. architecture, device, OS, browser]

10
.github/ISSUE_TEMPLATE/config.yml vendored Normal file
View File

@ -0,0 +1,10 @@
contact_links:
- name: Feature request
url: https://github.com/meilisearch/product/discussions/categories/feedback-feature-proposal
about: The feature requests are not managed in this repository, please open a discussion in our dedicated product repository
- name: Documentation issue
url: https://github.com/meilisearch/documentation/issues/new
about: For documentation issues, open an issue or a PR in the documentation repository
- name: Support questions & other
url: https://github.com/meilisearch/MeiliSearch/discussions/new
about: For any other question, open a discussion in this repository

View File

@ -1,20 +0,0 @@
---
name: Feature request
about: Suggest an idea for this project
title: ''
labels: ''
assignees: ''
---
**Is your feature request related to a problem? Please describe.**
A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]
**Describe the solution you'd like**
A clear and concise description of what you want to happen.
**Describe alternatives you've considered**
A clear and concise description of any alternative solutions or features you've considered.
**Additional context**
Add any other context or screenshots about the feature request here.

View File

@ -1,40 +0,0 @@
---
name: Tracking issue
about: Template for a tracking issue
title: ''
labels: tracking-issue
assignees: ''
---
# Summary
One paragraph to explain the feature.
# Motivations
Why are we doing this? What use cases does it support? What is the expected outcome?
# Explanation
Explain the proposal like it was the final documentation of this proposal.
- What is changing for end-users.
- How it works.
- What is breaking?
- Examples.
# Implementation
Explain the technical specificities that will need to be known or done in order to implement this proposal.
## Steps
Describe each step to create the feature with it's associated issue/PR.
# Related
- [ ] Validated by the team (@people needed)
- [ ] Test added
- [ ] [Documentation](https://github.com/meilisearch/documentation/issues/#xxx) //Change xxx or remove the line
- [ ] [SDK/Integrations](https://github.com/meilisearch/integration-guides/issues/#xxx) //Change xxx or remove the line

View File

@ -1,6 +0,0 @@
version: 2
updates:
- package-ecosystem: "cargo"
directory: "/"
schedule:
interval: "monthly"

13
.github/release-draft-template.yml vendored Normal file
View File

@ -0,0 +1,13 @@
name-template: 'v$RESOLVED_VERSION'
tag-template: 'v$RESOLVED_VERSION'
version-template: '0.21.0-alpha.$PATCH'
exclude-labels:
- 'skip-changelog'
template: |
## Changes
$CHANGES
no-changes-template: 'Changes are coming soon 😎'
sort-direction: 'ascending'
version-resolver:
default: patch

View File

@ -7,7 +7,7 @@ name: Execute code coverage
jobs:
nightly-coverage:
runs-on: ubuntu-latest
runs-on: ubuntu-18.04
steps:
- uses: actions/checkout@v2
- uses: actions-rs/toolchain@v1

15
.github/workflows/flaky.yml vendored Normal file
View File

@ -0,0 +1,15 @@
name: Look for flaky tests
on:
schedule:
- cron: "0 12 * * FRI" # every friday at 12:00PM
jobs:
flaky:
runs-on: ubuntu-18.04
steps:
- uses: actions/checkout@v2
- name: Install cargo-flaky
run: cargo install cargo-flaky
- name: Run cargo flaky 100 times
run: cargo flaky -i 100 --release

View File

@ -10,9 +10,9 @@ jobs:
runs-on: ${{ matrix.os }}
strategy:
matrix:
os: [ubuntu-latest, macos-latest, windows-latest]
os: [ubuntu-18.04, macos-latest, windows-latest]
include:
- os: ubuntu-latest
- os: ubuntu-18.04
artifact_name: meilisearch
asset_name: meilisearch-linux-amd64
- os: macos-latest
@ -26,7 +26,7 @@ jobs:
- uses: hecrj/setup-rust-action@master
with:
rust-version: stable
- uses: actions/checkout@v1
- uses: actions/checkout@v2
- name: Build
run: cargo build --release --locked
- name: Upload binaries to release
@ -37,35 +37,11 @@ jobs:
asset_name: ${{ matrix.asset_name }}
tag: ${{ github.ref }}
publish-armv7:
name: Publish for ARMv7
runs-on: ubuntu-18.04
steps:
- uses: actions/checkout@v1.0.0
- uses: uraimo/run-on-arch-action@v1.0.7
id: runcmd
with:
architecture: armv7
distribution: ubuntu18.04
run: |
apt update
apt install -y curl gcc make
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y --profile minimal --default-toolchain stable
source $HOME/.cargo/env
cargo build --release --locked
- name: Upload the binary to release
uses: svenstaro/upload-release-action@v1-release
with:
repo_token: ${{ secrets.PUBLISH_TOKEN }}
file: target/release/meilisearch
asset_name: meilisearch-linux-armv7
tag: ${{ github.ref }}
publish-armv8:
name: Publish for ARMv8
runs-on: ubuntu-18.04
steps:
- uses: actions/checkout@v1.0.0
- uses: actions/checkout@v2
- uses: uraimo/run-on-arch-action@v1.0.7
id: runcmd
with:

View File

@ -7,14 +7,14 @@ on:
jobs:
debian:
name: Publish debian packagge
runs-on: ubuntu-latest
runs-on: ubuntu-18.04
steps:
- uses: hecrj/setup-rust-action@master
with:
rust-version: stable
- name: Install cargo-deb
run: cargo install cargo-deb
- uses: actions/checkout@v1
- uses: actions/checkout@v2
- name: Build deb package
run: cargo deb -p meilisearch-http -o target/debian/meilisearch.deb
- name: Upload debian pkg to release
@ -29,7 +29,7 @@ jobs:
homebrew:
name: Bump Homebrew formula
runs-on: ubuntu-latest
runs-on: ubuntu-18.04
steps:
- name: Create PR to Homebrew
uses: mislav/bump-homebrew-formula-action@v1

View File

@ -7,7 +7,7 @@ name: Publish latest image to Docker Hub
jobs:
build:
runs-on: ubuntu-latest
runs-on: ubuntu-18.04
steps:
- uses: actions/checkout@v2
- name: Check if current release is latest

View File

@ -8,11 +8,13 @@ name: Publish tagged image to Docker Hub
jobs:
build:
runs-on: ubuntu-latest
runs-on: ubuntu-18.04
steps:
- uses: actions/checkout@v1
- uses: actions/checkout@v2
- name: Publish to Registry
uses: elgohr/Publish-Docker-Github-Action@master
env:
COMMIT_SHA: ${{ github.sha }}
with:
name: getmeili/meilisearch
username: ${{ secrets.DOCKER_USERNAME }}

16
.github/workflows/release-drafter.yml vendored Normal file
View File

@ -0,0 +1,16 @@
name: Release Drafter
on:
push:
branches:
- main
jobs:
update_release_draft:
runs-on: ubuntu-latest
steps:
- uses: release-drafter/release-drafter@v5
with:
config-name: release-draft-template.yml
env:
GITHUB_TOKEN: ${{ secrets.RELEASE_DRAFTER_TOKEN }}

86
.github/workflows/rust.yml vendored Normal file
View File

@ -0,0 +1,86 @@
name: Rust
on:
workflow_dispatch:
pull_request:
push:
# trying and staging branches are for Bors config
branches:
- trying
- staging
env:
CARGO_TERM_COLOR: always
jobs:
tests:
name: Tests on ${{ matrix.os }}
runs-on: ${{ matrix.os }}
strategy:
fail-fast: false
matrix:
os: [ubuntu-18.04, macos-latest, windows-latest]
steps:
- uses: actions/checkout@v2
- name: Cache dependencies
uses: actions/cache@v2
with:
path: |
~/.cargo
./target
key: ${{ matrix.os }}-${{ hashFiles('Cargo.lock') }}
- name: Run cargo check without any default features
uses: actions-rs/cargo@v1
with:
command: build
args: --locked --release --no-default-features
- name: Run cargo test
uses: actions-rs/cargo@v1
with:
command: test
args: --locked --release
clippy:
name: Run Clippy
runs-on: ubuntu-18.04
steps:
- uses: actions/checkout@v2
- name: Cache dependencies
uses: actions/cache@v2
with:
path: |
~/.cargo
./target
key: ${{ matrix.os }}-${{ hashFiles('Cargo.lock') }}
- uses: actions-rs/toolchain@v1
with:
profile: minimal
toolchain: stable
override: true
components: clippy
- name: Run cargo clippy
uses: actions-rs/cargo@v1
with:
command: clippy
args: --all-targets -- --deny warnings
fmt:
name: Run Rustfmt
runs-on: ubuntu-18.04
steps:
- uses: actions/checkout@v2
- name: Cache dependencies
uses: actions/cache@v2
with:
path: |
~/.cargo
./target
key: ${{ matrix.os }}-${{ hashFiles('Cargo.lock') }}
- uses: actions-rs/toolchain@v1
with:
profile: minimal
toolchain: nightly
override: true
components: rustfmt
- name: Run cargo fmt
run: cargo fmt --all -- --check

View File

@ -1,93 +0,0 @@
---
on:
push:
branches:
- release-v*
- trying
- staging
tags:
- 'v[0-9]+.[0-9]+.[0-9]+' # this only concerns tags on stable
name: Test binaries with cargo test
jobs:
check:
name: Test on ${{ matrix.os }}
runs-on: ${{ matrix.os }}
strategy:
matrix:
os: [ubuntu-latest, macos-latest]
steps:
- uses: actions/checkout@v1
- uses: actions-rs/toolchain@v1
with:
profile: minimal
toolchain: stable
override: true
components: clippy
- name: Run cargo test
uses: actions-rs/cargo@v1
with:
command: test
args: --locked --release
- name: Run cargo clippy
uses: actions-rs/cargo@v1
with:
command: clippy
build-image:
name: Test the build of Docker image
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v1
- run: docker build . --file Dockerfile -t meilisearch
name: Docker build
## A push occurred on a release branch, a prerelease is created and assets are generated
prerelease:
name: create prerelease
needs: [check, build-image]
if: ${{ contains(github.ref, 'release-') && github.event_name == 'push' }}
runs-on: ubuntu-latest
steps:
- name: Checkout code
uses: actions/checkout@v2
with:
fetch-depth: 0
- name: Get version number
id: version-number
run: echo "##[set-output name=number;]$(echo ${{ github.ref }} | sed 's/.*\(v.*\)/\1/')"
- name: Get commit count
id: commit-count
run: echo "##[set-output name=count;]$(git rev-list remotes/origin/master..remotes/origin/release-${{ steps.version-number.outputs.number }} --count)"
- name: Create Release
id: create_release
uses: actions/create-release@v1
env:
GITHUB_TOKEN: ${{ secrets.PUBLISH_TOKEN }} # Personal Access Token
with:
tag_name: ${{ steps.version-number.outputs.number }}rc${{ steps.commit-count.outputs.count }}
release_name: Pre-release ${{ steps.version-number.outputs.number }}-rc${{ steps.commit-count.outputs.count }}
prerelease: true
## If a tag is pushed, a release is created for this tag, and assets will be generated
release:
name: create release
needs: [check, build-image]
if: ${{ contains(github.ref, 'tags/v') }}
runs-on: ubuntu-latest
steps:
- name: Checkout code
uses: actions/checkout@v2
- name: Get version number
id: version-number
run: echo "##[set-output name=number;]$(echo ${{ github.ref }} | sed 's/.*\(v.*\)/\1/')"
- name: Create Release
id: create_release
uses: actions/create-release@v1
env:
GITHUB_TOKEN: ${{ secrets.PUBLISH_TOKEN }} # PAT
with:
tag_name: ${{ steps.version-number.outputs.number }}
release_name: Meilisearch ${{ steps.version-number.outputs.number }}
prerelease: false

3
.gitignore vendored
View File

@ -1,8 +1,9 @@
/target
meilisearch-core/target
**/*.csv
**/*.json_lines
**/*.rs.bk
/*.mdb
/query-history.txt
/data.ms
/snapshots
/dumps

View File

@ -1,109 +0,0 @@
## v0.18.0
- Integration with the new tokenizer (#1091)
- Fix setting consistency bug (#1128)
- Fix attributes to retrieve bug (#1131)
- Increase default payload size (#1147)
- Improvements to code quality (#1167, #1165, #1126, #1151)
## v0.17.0
- Fix corrupted data during placeholder search (#1089)
- Remove maintenance error from http (#1082)
- Disable frontend in production (#1097)
- Update nbHits count with filtered documents (#849)
- Remove update changelog ci check (#1090)
- Add deploy on Platform.sh option to README (#1087)
- Change movie gifs in README (#1077)
- Remove some clippy warnings (#1100)
- Improve script `download-latest.sh` (#1054)
- Bump dependencies version (#1056, #1057, #1059)
## v0.16.0
- Automatically create index on document push if index doesn't exist (#914)
- Sort displayedAttributes and facetDistribution (#946)
## v0.15.0
- Update actix-web dependency to 3.0.0 (#963)
- Consider an empty query to be a placeholder search (#916)
## v0.14.1
- Fix version mismatch in snapshot importation (#959)
## v0.14.0
- Sort displayedAttributes (#943)
- Fix facet distribution case (#797)
- Snapshotting (#839)
- Fix bucket-sort unwrap bug (#915)
## v0.13.0
- placeholder search (#771)
- Add database version mismatch check (#794)
- Displayed and searchable attributes wildcard (#846)
- Remove sys-info route (#810)
- Check database version mismatch (#794)
- Fix unique docid bug (#841)
- Error codes in updates (#792)
- Sentry disable argument (#813)
- Log analytics if enabled (#825)
- Fix default values displayed on web interface (#874)
## v0.12.0
- Fix long documents not being indexed completely bug (#816)
- Fix distinct attribute returning id instead of name (#800)
- error code rename (#805)
## v0.11.1
- Fix facet cache on document update (#789)
- Improvements on settings consistency (#778)
## v0.11.0
- Change the HTTP framework, moving from tide to actix-web (#601)
- Bump sentry version to 0.18.1 (#690)
- Enable max payload size override (#684)
- Disable sentry in debug (#681)
- Better terminal greeting (#680)
- Fix highlight misalignment (#679)
- Add support for facet count (#676)
- Add support for faceted search (#631)
- Add support for configuring the lmdb map size (#646, #647)
- Add exposed port for Dockerfile (#654)
- Add sentry probe (#664)
- Fix url trailing slash and double slash issues (#659)
- Fix accept all Content-Type by default (#653)
- Return the error message from Serde when a deserialization error is encountered (#661)
- Fix NormalizePath middleware to make the dashboard accessible (#695)
- Update sentry features to remove openssl (#702)
- Add SSL support (#669)
- Rename fieldsFrequency into fieldsDistribution in stats (#719)
- Add support for error code reporting (#703)
- Allow the dashboard to query private servers (#732)
- Add telemetry (#720)
- Add post route for search (#735)
## v0.10.1
- Add support for floating points in filters (#640)
- Add '@' character as tokenizer separator (#607)
- Add support for filtering on arrays of strings (#611)
## v0.10.0
- Refined filtering (#592)
- Add the number of hits in search result (#541)
- Add support for aligned crop in search result (#543)
- Sanitize the content displayed in the web interface (#539)
- Add support of nested null, boolean and seq values (#571 and #568, #574)
- Fixed the core benchmark (#576)
- Publish an ARMv7 and ARMv8 binaries on releases (#540 and #581)
- Fixed a bug where the result of the update status after the first update was empty (#542)
- Fixed a bug where stop words were not handled correctly (#594)
- Fix CORS issues (#602)
- Support wildcard on attributes to retrieve, highlight, and crop (#549, #565, and #598)

View File

@ -32,7 +32,7 @@ expanding into more specifics.
1. **You're familiar with [GitHub](https://github.com) and the [pull request](https://help.github.com/en/github/collaborating-with-issues-and-pull-requests/about-pull-requests)
workflow.**
2. **You've read the MeiliSearch [docs](https://docs.meilisearch.com).**
3. **You know about the [MeiliSearch community](https://docs.meilisearch.com/resources/contact.html).
3. **You know about the [MeiliSearch community](https://docs.meilisearch.com/learn/what_is_meilisearch/contact.html).
Please use this for help.**
## Your First Contribution
@ -91,7 +91,7 @@ aligns better with our process.
### Setup
See the [MeiliSearch Docs](https://docs.meilisearch.com/guides/advanced_guides/installation.html) for how to set up a development environment.
See the [MeiliSearch Docs](https://docs.meilisearch.com/reference/features/installation.html) for how to set up a development environment.
### Benchmarking & Profiling

2475
Cargo.lock generated

File diff suppressed because it is too large.

View File

@ -1,9 +1,7 @@
[workspace]
members = [
"meilisearch-core",
"meilisearch-http",
"meilisearch-schema",
"meilisearch-types",
"meilisearch-error",
]
[profile.release]

View File

@ -1,5 +1,5 @@
# Compile
FROM alpine:3.10 AS compiler
FROM alpine:3.14 AS compiler
RUN apk update --quiet
RUN apk add curl
@ -9,14 +9,30 @@ RUN curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y
WORKDIR /meilisearch
COPY . .
COPY Cargo.lock .
COPY Cargo.toml .
COPY meilisearch-error/Cargo.toml meilisearch-error/
COPY meilisearch-http/Cargo.toml meilisearch-http/
ENV RUSTFLAGS="-C target-feature=-crt-static"
# Create dummy main.rs files for each workspace member to be able to compile all the dependencies
RUN find . -type d -name "meilisearch-*" | xargs -I{} sh -c 'mkdir {}/src; echo "fn main() { }" > {}/src/main.rs;'
# Use `cargo build` instead of `cargo vendor` because we need to not only download but compile dependencies too
RUN $HOME/.cargo/bin/cargo build --release
# Cleanup dummy main.rs files
RUN find . -path "*/src/main.rs" -delete
ARG COMMIT_SHA
ARG COMMIT_DATE
ENV COMMIT_SHA=${COMMIT_SHA} COMMIT_DATE=${COMMIT_DATE}
COPY . .
RUN $HOME/.cargo/bin/cargo build --release
# Run
FROM alpine:3.10
FROM alpine:3.14
RUN apk add -q --no-cache libgcc tini

View File

@ -1,6 +1,6 @@
MIT License
Copyright (c) 2019-2020 Meili SAS
Copyright (c) 2019-2021 Meili SAS
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal

View File

@ -17,7 +17,7 @@
<p align="center">
<a href="https://github.com/meilisearch/MeiliSearch/actions"><img src="https://github.com/meilisearch/MeiliSearch/workflows/Cargo%20test/badge.svg" alt="Build Status"></a>
<a href="https://deps.rs/repo/github/meilisearch/MeiliSearch"><img src="https://deps.rs/repo/github/meilisearch/MeiliSearch/status.svg" alt="Dependency status"></a>
<a href="https://github.com/meilisearch/MeiliSearch/blob/master/LICENSE"><img src="https://img.shields.io/badge/license-MIT-informational" alt="License"></a>
<a href="https://github.com/meilisearch/MeiliSearch/blob/main/LICENSE"><img src="https://img.shields.io/badge/license-MIT-informational" alt="License"></a>
<a href="https://slack.meilisearch.com"><img src="https://img.shields.io/badge/slack-MeiliSearch-blue.svg?logo=slack" alt="Slack"></a>
<a href="https://github.com/meilisearch/MeiliSearch/discussions" alt="Discussions"><img src="https://img.shields.io/badge/github-discussions-red" /></a>
<a href="https://app.bors.tech/repositories/26457"><img src="https://bors.tech/images/badge_small.svg" alt="Bors enabled"></a>
@ -29,16 +29,16 @@
For more information about features go to [our documentation](https://docs.meilisearch.com/).
<p align="center">
<img src="assets/trumen_quick_loop.gif" alt="Web interface gif" />
<img src="assets/trumen-fast.gif" alt="Web interface gif" />
</p>
## ✨ Features
* Search as-you-type experience (answers < 50 milliseconds)
* Search-as-you-type experience (answers < 50 milliseconds)
* Full-text search
* Typo tolerant (understands typos and miss-spelling)
* Typo tolerant (understands typos and misspelling)
* Faceted search and filters
* Supports Kanji characters
* Supports Synonym
* Supports hanzi (Chinese characters)
* Supports synonyms
* Easy to install, deploy, and maintain
* Whole documents are returned
* Highly customizable
@ -48,7 +48,7 @@ For more information about features go to [our documentation](https://docs.meili
### Deploy the Server
#### Brew (Mac OS)
#### Homebrew (Mac OS)
```bash
brew update && brew install meilisearch
@ -58,12 +58,12 @@ meilisearch
#### Docker
```bash
docker run -p 7700:7700 -v $(pwd)/data.ms:/data.ms getmeili/meilisearch
docker run -p 7700:7700 -v "$(pwd)/data.ms:/data.ms" getmeili/meilisearch
```
#### Try MeiliSearch in our Sandbox
Create a MeiliSearch instance in [MeiliSearch Sandbox](https://sandbox.meilisearch.com/). This instance is free, and will be active for 72 hours.
Create a MeiliSearch instance in [MeiliSearch Sandbox](https://sandbox.meilisearch.com/). This instance is free, and will be active for 48 hours.
#### Run on Digital Ocean
@ -97,13 +97,6 @@ If you have the latest stable Rust toolchain installed on your local system, clo
```bash
git clone https://github.com/meilisearch/MeiliSearch.git
cd MeiliSearch
```
In the cloned repository, compile MeiliSearch.
```bash
rustup override set stable
rustup update stable
cargo run --release
```
@ -115,14 +108,7 @@ Let's create an index! If you need a sample dataset, use [this movie database](h
curl -L 'https://bit.ly/2PAcw9l' -o movies.json
```
MeiliSearch can serve multiple indexes, with different kinds of documents.
It is required to create an index before sending documents to it.
```bash
curl -i -X POST 'http://127.0.0.1:7700/indexes' --data '{ "name": "Movies", "uid": "movies" }'
```
Now that the server knows about your brand new index, you're ready to send it some data.
Now, you're ready to index some data.
```bash
curl -i -X POST 'http://127.0.0.1:7700/indexes/movies/documents' \
@ -149,27 +135,29 @@ curl 'http://127.0.0.1:7700/indexes/movies/search?q=botman+robin&limit=2' | jq
"id": "415",
"title": "Batman & Robin",
"poster": "https://image.tmdb.org/t/p/w1280/79AYCcxw3kSKbhGpx1LiqaCAbwo.jpg",
"overview": "Along with crime-fighting partner Robin and new recruit Batgirl...",
"release_date": "1997-06-20",
"overview": "Along with crime-fighting partner Robin and new recruit Batgirl, Batman battles the dual threat of frosty genius Mr. Freeze and homicidal horticulturalist Poison Ivy. Freeze plans to put Gotham City on ice, while Ivy tries to drive a wedge between the dynamic duo.",
"release_date": 866768400
},
{
"id": "411736",
"title": "Batman: Return of the Caped Crusaders",
"poster": "https://image.tmdb.org/t/p/w1280/GW3IyMW5Xgl0cgCN8wu96IlNpD.jpg",
"overview": "Adam West and Burt Ward returns to their iconic roles of Batman and Robin...",
"release_date": "2016-10-08",
"overview": "Adam West and Burt Ward returns to their iconic roles of Batman and Robin. Featuring the voices of Adam West, Burt Ward, and Julie Newmar, the film sees the superheroes going up against classic villains like The Joker, The Riddler, The Penguin and Catwoman, both in Gotham City… and in space.",
"release_date": 1475888400
}
],
"offset": 0,
"nbHits": 8,
"exhaustiveNbHits": false,
"query": "botman robin",
"limit": 2,
"processingTimeMs": 1,
"query": "botman robin"
"offset": 0,
"processingTimeMs": 2
}
```
#### Use the Web Interface
We also deliver an **out-of-the-box web interface** in which you can test MeiliSearch interactively.
We also deliver an **out-of-the-box [web interface](https://github.com/meilisearch/mini-dashboard)** in which you can test MeiliSearch interactively.
You can access the web interface in your web browser at the root of the server. The default URL is [http://127.0.0.1:7700](http://127.0.0.1:7700). All you need to do is open your web browser and enter MeiliSearch's address to visit it. This will lead you to a web page with a search bar that will allow you to search in the selected index.
@ -181,23 +169,33 @@ Now that your MeiliSearch server is up and running, you can learn more about how
## Contributing
Hey! We're glad you're thinking about contributing to MeiliSearch! If you think something is missing or could be improved, please open issues and pull requests. If you'd like to help this project grow, we'd love to have you! To start contributing, checking [issues tagged as "good-first-issue"](https://github.com/meilisearch/MeiliSearch/issues?q=is%3Aissue+is%3Aopen+label%3A%22good+first+issue%22) is a good start!
Hey! We're glad you're thinking about contributing to MeiliSearch! Feel free to pick an [issue labeled as `good first issue`](https://github.com/meilisearch/MeiliSearch/issues?q=is%3Aissue+is%3Aopen+label%3A%22good+first+issue%22), and to ask any question you need. Some points might not be clear and we are available to help you!
Also, we recommend following the [CONTRIBUTING](./CONTRIBUTING.md) to create your PR.
## Core engine and tokenizer
The code in this repository is only concerned with managing multiple indexes, handling the update store, and exposing an HTTP API.
Search and indexation are the domain of our core engine, [`milli`](https://github.com/meilisearch/milli), while tokenization is handled by [our `tokenizer` library](https://github.com/meilisearch/tokenizer/).
## Telemetry
MeiliSearch collects anonymous data regarding general usage.
This helps us better understand developers usage of MeiliSearch features.<br/>
To see what information we're retrieving, please see the complete list [on the dedicated issue](https://github.com/meilisearch/MeiliSearch/issues/720).<br/>
We also use Sentry to make us crash and error reports. If you want to know more about what Sentry collects, please visit their [privacy policy website](https://sentry.io/privacy/).<br/>
This helps us better understand developers' usage of MeiliSearch features.
To see what information we're retrieving, please see the complete list [on the dedicated issue](https://github.com/meilisearch/MeiliSearch/issues/720).
This program is optional; you can disable these analytics by using the `MEILI_NO_ANALYTICS` environment variable.
## Feature request
The feature requests are not managed in this repository. Please visit our [dedicated repository](https://github.com/meilisearch/product) to see our work about the MeiliSearch product.
If you have a feature request or any feedback about an existing feature, please open [a discussion](https://github.com/meilisearch/product/discussions).
Also, feel free to participate in the current discussions, we are looking forward to reading your comments.
## 💌 Contact
Feel free to contact us about any questions you may have:
* At [bonjour@meilisearch.com](mailto:bonjour@meilisearch.com)
* Via the chat box available on every page of [our documentation](https://docs.meilisearch.com/) and on [our landing page](https://www.meilisearch.com/).
* 🆕 Join our [GitHub Discussions forum](https://github.com/meilisearch/MeiliSearch/discussions)
* Join our [Slack community](https://slack.meilisearch.com/).
* By opening an issue.
Please visit [this page](https://docs.meilisearch.com/learn/what_is_meilisearch/contact.html#contact-us).
MeiliSearch is developed by [Meili](https://www.meilisearch.com), a young company. To know more about us, you can [read our blog](https://blog.meilisearch.com). Any suggestion or feedback is highly appreciated. Thank you for your support!

BIN
assets/trumen-fast.gif Normal file

Binary file not shown. After: 1.4 MiB.

Binary file not shown. Before: 1.2 MiB.

View File

@ -1,3 +1,9 @@
status = ["Test on macos-latest", "Test on ubuntu-latest"]
# 4 hours timeout
timeout-sec = 14400
status = [
'Tests on ubuntu-18.04',
'Tests on macos-latest',
'Tests on windows-latest',
'Run Clippy',
'Run Rustfmt'
]
# 3 hours timeout
timeout-sec = 10800

38
bump.sh
View File

@ -1,38 +0,0 @@
#!/usr/bin/bash
NEW_VERSION=$1
if [ -z "$NEW_VERSION" ]
then
echo "error: a version number must be provided"
exit 1
fi
# find current version
CURRENT_VERSION=$(cat **/*.toml | grep meilisearch | grep version | sed 's/.*\([0-9]\+\.[0-9]\+\.[0-9]\+\).*/\1/' | sed "1q;d")
# bump all version in .toml
echo "bumping from version $CURRENT_VERSION to version $NEW_VERSION"
while true
do
read -r -p "Continue (y/n)?" choice
case "$choice" in
y|Y ) break;;
n|N ) echo "aborting bump" && exit 0;;
* ) echo "invalid choice";;
esac
done
# update all crate version
sed -i "s/version = \"$CURRENT_VERSION\"/version = \"$NEW_VERSION\"/" **/*.toml
printf "running cargo check: "
CARGO_CHECK=$(cargo check 2>&1)
if [ $? != "0" ]
then
printf "\033[31;1m FAIL \033[0m\n"
printf "$CARGO_CHECK"
exit 1
fi
printf "\033[32;1m OK \033[0m\n"

File diff suppressed because it is too large.

View File

@ -6,7 +6,6 @@ GREEN='\033[32m'
DEFAULT='\033[0m'
# GLOBALS
BINARY_NAME='meilisearch'
GREP_SEMVER_REGEXP='v\([0-9]*\)[.]\([0-9]*\)[.]\([0-9]*\)$' # i.e. v[number].[number].[number]
# FUNCTIONS
@ -22,7 +21,7 @@ semverParseInto() {
eval $2=`echo $1 | sed -e "s#$RE#\1#"`
#MINOR
eval $3=`echo $1 | sed -e "s#$RE#\2#"`
#MINOR
#PATCH
eval $4=`echo $1 | sed -e "s#$RE#\3#"`
#SPECIAL
eval $5=`echo $1 | sed -e "s#$RE#\4#"`
@ -127,6 +126,9 @@ get_os() {
'Linux')
os='linux'
;;
'MINGW'*)
os='windows'
;;
*)
return 1
esac
@ -163,7 +165,7 @@ failure_usage() {
printf "$RED%s\n$DEFAULT" 'ERROR: MeiliSearch binary is not available for your OS distribution or your architecture yet.'
echo ''
echo 'However, you can easily compile the binary from the source files.'
echo 'Follow the steps at the page ("Source" tab): https://docs.meilisearch.com/guides/advanced_guides/installation.html'
echo 'Follow the steps at the page ("Source" tab): https://docs.meilisearch.com/learn/getting_started/installation.html'
}
# MAIN
@ -180,7 +182,17 @@ if ! get_archi; then
fi
echo "Downloading MeiliSearch binary $latest for $os, architecture $archi..."
release_file="meilisearch-$os-$archi"
case "$os" in
'windows')
release_file="meilisearch-$os-$archi.exe"
BINARY_NAME='meilisearch.exe'
;;
*)
release_file="meilisearch-$os-$archi"
BINARY_NAME='meilisearch'
esac
link="https://github.com/meilisearch/MeiliSearch/releases/download/$latest/$release_file"
curl -OL "$link"
mv "$release_file" "$BINARY_NAME"

View File

@ -1,53 +0,0 @@
[package]
name = "meilisearch-core"
version = "0.18.0"
license = "MIT"
authors = ["Kerollmops <clement@meilisearch.com>"]
edition = "2018"
[dependencies]
arc-swap = "1.2.0"
bincode = "1.3.1"
byteorder = "1.3.4"
chrono = { version = "0.4.19", features = ["serde"] }
compact_arena = "0.4.1"
cow-utils = "0.1.2"
crossbeam-channel = "0.5.0"
deunicode = "1.1.1"
either = "1.6.1"
env_logger = "0.8.2"
fst = "0.4.5"
hashbrown = { version = "0.9.1", features = ["serde"] }
heed = "0.10.6"
indexmap = { version = "1.6.1", features = ["serde-1"] }
intervaltree = "0.2.6"
itertools = "0.10.0"
levenshtein_automata = { version = "0.2.0", features = ["fst_automaton"] }
log = "0.4.11"
meilisearch-error = { path = "../meilisearch-error", version = "0.18.0" }
meilisearch-schema = { path = "../meilisearch-schema", version = "0.18.0" }
meilisearch-tokenizer = { git = "https://github.com/meilisearch/Tokenizer.git", tag = "v0.1.2" }
meilisearch-types = { path = "../meilisearch-types", version = "0.18.0" }
once_cell = "1.5.2"
ordered-float = { version = "2.0.1", features = ["serde"] }
pest = { git = "https://github.com/pest-parser/pest.git", rev = "51fd1d49f1041f7839975664ef71fe15c7dcaf67" }
pest_derive = "2.1.0"
regex = "1.4.2"
sdset = "0.4.0"
serde = { version = "1.0.118", features = ["derive"] }
serde_json = { version = "1.0.61", features = ["preserve_order"] }
slice-group-by = "0.2.6"
unicase = "2.6.0"
zerocopy = "0.3.0"
[dev-dependencies]
assert_matches = "1.4.0"
criterion = "0.3.3"
csv = "1.1.5"
rustyline = { version = "7.1.0", default-features = false }
structopt = "0.3.21"
tempfile = "3.1.0"
termcolor = "1.1.2"
[target.'cfg(unix)'.dev-dependencies]
jemallocator = "0.3.2"

View File

@ -1,470 +0,0 @@
use std::collections::HashSet;
use std::collections::btree_map::{BTreeMap, Entry};
use std::error::Error;
use std::io::{Read, Write};
use std::iter::FromIterator;
use std::path::{Path, PathBuf};
use std::time::{Duration, Instant};
use std::{fs, io, sync::mpsc};
use rustyline::{Config, Editor};
use serde::{Deserialize, Serialize};
use structopt::StructOpt;
use termcolor::{Color, ColorChoice, ColorSpec, StandardStream, WriteColor};
use meilisearch_core::{Database, DatabaseOptions, Highlight, ProcessedUpdateResult};
use meilisearch_core::settings::Settings;
use meilisearch_schema::FieldId;
#[cfg(target_os = "linux")]
#[global_allocator]
static ALLOC: jemallocator::Jemalloc = jemallocator::Jemalloc;
#[derive(Debug, StructOpt)]
struct IndexCommand {
/// The destination where the database must be created.
#[structopt(parse(from_os_str))]
database_path: PathBuf,
#[structopt(long, default_value = "default")]
index_uid: String,
/// The csv file path to index, you can also use `-` to specify the standard input.
#[structopt(parse(from_os_str))]
csv_data_path: PathBuf,
/// The path to the settings.
#[structopt(long, parse(from_os_str))]
settings: PathBuf,
#[structopt(long)]
update_group_size: Option<usize>,
#[structopt(long, parse(from_os_str))]
compact_to_path: Option<PathBuf>,
}
#[derive(Debug, StructOpt)]
struct SearchCommand {
/// The path of the database to work with.
#[structopt(parse(from_os_str))]
database_path: PathBuf,
#[structopt(long, default_value = "default")]
index_uid: String,
/// Timeout after which the search will return results.
#[structopt(long)]
fetch_timeout_ms: Option<u64>,
/// The number of returned results
#[structopt(short, long, default_value = "10")]
number_results: usize,
/// The number of characters before and after the first match
#[structopt(short = "C", long, default_value = "35")]
char_context: usize,
/// A filter string that can be `!adult` or `adult` to
/// filter documents on this specified field
#[structopt(short, long)]
filter: Option<String>,
/// Fields that must be displayed.
displayed_fields: Vec<String>,
}
#[derive(Debug, StructOpt)]
struct ShowUpdatesCommand {
/// The path of the database to work with.
#[structopt(parse(from_os_str))]
database_path: PathBuf,
#[structopt(long, default_value = "default")]
index_uid: String,
}
#[derive(Debug, StructOpt)]
enum Command {
Index(IndexCommand),
Search(SearchCommand),
ShowUpdates(ShowUpdatesCommand),
}
impl Command {
fn path(&self) -> &Path {
match self {
Command::Index(command) => &command.database_path,
Command::Search(command) => &command.database_path,
Command::ShowUpdates(command) => &command.database_path,
}
}
}
#[derive(Serialize, Deserialize)]
#[serde(transparent)]
struct Document(indexmap::IndexMap<String, String>);
fn index_command(command: IndexCommand, database: Database) -> Result<(), Box<dyn Error>> {
let start = Instant::now();
let (sender, receiver) = mpsc::sync_channel(100);
let update_fn =
move |_name: &str, update: ProcessedUpdateResult| sender.send(update.update_id).unwrap();
let index = match database.open_index(&command.index_uid) {
Some(index) => index,
None => database.create_index(&command.index_uid).unwrap(),
};
database.set_update_callback(Box::new(update_fn));
let db = &database;
let settings = {
let string = fs::read_to_string(&command.settings)?;
let settings: Settings = serde_json::from_str(&string).unwrap();
settings.to_update().unwrap()
};
db.update_write(|w| index.settings_update(w, settings))?;
let mut rdr = if command.csv_data_path.as_os_str() == "-" {
csv::Reader::from_reader(Box::new(io::stdin()) as Box<dyn Read>)
} else {
let file = std::fs::File::open(command.csv_data_path)?;
csv::Reader::from_reader(Box::new(file) as Box<dyn Read>)
};
let mut raw_record = csv::StringRecord::new();
let headers = rdr.headers()?.clone();
let mut max_update_id = 0;
let mut i = 0;
let mut end_of_file = false;
while !end_of_file {
let mut additions = index.documents_addition();
loop {
end_of_file = !rdr.read_record(&mut raw_record)?;
if end_of_file {
break;
}
let document: Document = match raw_record.deserialize(Some(&headers)) {
Ok(document) => document,
Err(e) => {
eprintln!("{:?}", e);
continue;
}
};
additions.update_document(document);
print!("\rindexing document {}", i);
i += 1;
if let Some(group_size) = command.update_group_size {
if i % group_size == 0 {
break;
}
}
}
println!();
let update_id = db.update_write(|w| additions.finalize(w))?;
println!("committing update...");
max_update_id = max_update_id.max(update_id);
println!("committed update {}", update_id);
}
println!("Waiting for update {}", max_update_id);
for id in receiver {
if id == max_update_id {
break;
}
}
println!(
"database created in {:.2?} at: {:?}",
start.elapsed(),
command.database_path
);
if let Some(path) = command.compact_to_path {
fs::create_dir_all(&path)?;
let start = Instant::now();
let _file = database.copy_and_compact_to_path(path.join("data.mdb"))?;
println!(
"database compacted in {:.2?} at: {:?}",
start.elapsed(),
path
);
}
Ok(())
}
fn display_highlights(text: &str, ranges: &[usize]) -> io::Result<()> {
let mut stdout = StandardStream::stdout(ColorChoice::Always);
let mut highlighted = false;
for range in ranges.windows(2) {
let [start, end] = match range {
[start, end] => [*start, *end],
_ => unreachable!(),
};
if highlighted {
stdout.set_color(
ColorSpec::new()
.set_fg(Some(Color::Yellow))
.set_underline(true),
)?;
}
write!(&mut stdout, "{}", &text[start..end])?;
stdout.reset()?;
highlighted = !highlighted;
}
Ok(())
}
fn char_to_byte_range(index: usize, length: usize, text: &str) -> (usize, usize) {
let mut byte_index = 0;
let mut byte_length = 0;
for (n, (i, c)) in text.char_indices().enumerate() {
if n == index {
byte_index = i;
}
if n + 1 == index + length {
byte_length = i - byte_index + c.len_utf8();
break;
}
}
(byte_index, byte_length)
}
fn create_highlight_areas(text: &str, highlights: &[Highlight]) -> Vec<usize> {
let mut byte_indexes = BTreeMap::new();
for highlight in highlights {
let char_index = highlight.char_index as usize;
let char_length = highlight.char_length as usize;
let (byte_index, byte_length) = char_to_byte_range(char_index, char_length, text);
match byte_indexes.entry(byte_index) {
Entry::Vacant(entry) => {
entry.insert(byte_length);
}
Entry::Occupied(mut entry) => {
if *entry.get() < byte_length {
entry.insert(byte_length);
}
}
}
}
let mut title_areas = Vec::new();
title_areas.push(0);
for (byte_index, length) in byte_indexes {
title_areas.push(byte_index);
title_areas.push(byte_index + length);
}
title_areas.push(text.len());
title_areas.sort_unstable();
title_areas
}
/// note: matches must have been sorted by `char_index` and `char_length` before being passed.
///
/// ```no_run
/// matches.sort_unstable_by_key(|m| (m.char_index, m.char_length));
///
/// let matches = matches.matches.iter().filter(|m| SchemaAttr::new(m.attribute) == attr).cloned();
///
/// let (text, matches) = crop_text(&text, matches, 35);
/// ```
fn crop_text(
text: &str,
highlights: impl IntoIterator<Item = Highlight>,
context: usize,
) -> (String, Vec<Highlight>) {
let mut highlights = highlights.into_iter().peekable();
let char_index = highlights
.peek()
.map(|m| m.char_index as usize)
.unwrap_or(0);
let start = char_index.saturating_sub(context);
let text = text.chars().skip(start).take(context * 2).collect();
let highlights = highlights
.take_while(|m| (m.char_index as usize) + (m.char_length as usize) <= start + (context * 2))
.map(|highlight| Highlight {
char_index: highlight.char_index - start as u16,
..highlight
})
.collect();
(text, highlights)
}
fn search_command(command: SearchCommand, database: Database) -> Result<(), Box<dyn Error>> {
let db = &database;
let index = database
.open_index(&command.index_uid)
.expect("Could not find index");
let reader = db.main_read_txn().unwrap();
let schema = index.main.schema(&reader)?;
reader.abort().unwrap();
let schema = schema.ok_or(meilisearch_core::Error::SchemaMissing)?;
let fields = command.displayed_fields.iter().map(String::as_str);
let fields = HashSet::from_iter(fields);
let config = Config::builder().auto_add_history(true).build();
let mut readline = Editor::<()>::with_config(config);
let _ = readline.load_history("query-history.txt");
for result in readline.iter("Searching for: ") {
match result {
Ok(query) => {
let start_total = Instant::now();
let reader = db.main_read_txn().unwrap();
let ref_index = &index;
let ref_reader = &reader;
let mut builder = index.query_builder();
if let Some(timeout) = command.fetch_timeout_ms {
builder.with_fetch_timeout(Duration::from_millis(timeout));
}
if let Some(ref filter) = command.filter {
let filter = filter.as_str();
let (positive, filter) = if let Some(stripped) = filter.strip_prefix('!') {
(false, stripped)
} else {
(true, filter)
};
let attr = schema
.id(filter)
.expect("Could not find filtered attribute");
builder.with_filter(move |document_id| {
let string: String = ref_index
.document_attribute(ref_reader, document_id, attr)
.unwrap()
.unwrap();
(string == "true") == positive
});
}
let result = builder.query(ref_reader, Some(&query), 0..command.number_results)?;
let mut retrieve_duration = Duration::default();
let number_of_documents = result.documents.len();
for mut doc in result.documents {
doc.highlights
.sort_unstable_by_key(|m| (m.char_index, m.char_length));
let start_retrieve = Instant::now();
let result = index.document::<Document>(&reader, Some(&fields), doc.id);
retrieve_duration += start_retrieve.elapsed();
match result {
Ok(Some(document)) => {
println!("raw-id: {:?}", doc.id);
for (name, text) in document.0 {
print!("{}: ", name);
let attr = schema.id(&name).unwrap();
let highlights = doc
.highlights
.iter()
.filter(|m| FieldId::new(m.attribute) == attr)
.cloned();
let (text, highlights) =
crop_text(&text, highlights, command.char_context);
let areas = create_highlight_areas(&text, &highlights);
display_highlights(&text, &areas)?;
println!();
}
}
Ok(None) => eprintln!("missing document"),
Err(e) => eprintln!("{}", e),
}
let mut matching_attributes = HashSet::new();
for highlight in doc.highlights {
let attr = FieldId::new(highlight.attribute);
let name = schema.name(attr);
matching_attributes.insert(name);
}
let matching_attributes = Vec::from_iter(matching_attributes);
println!("matching in: {:?}", matching_attributes);
println!();
}
eprintln!(
"whole documents fields retrieve took {:.2?}",
retrieve_duration
);
eprintln!(
"===== Found {} results in {:.2?} =====",
number_of_documents,
start_total.elapsed()
);
}
Err(err) => {
println!("Error: {:?}", err);
break;
}
}
}
readline.save_history("query-history.txt").unwrap();
Ok(())
}
fn show_updates_command(
command: ShowUpdatesCommand,
database: Database,
) -> Result<(), Box<dyn Error>> {
let db = &database;
let index = database
.open_index(&command.index_uid)
.expect("Could not find index");
let reader = db.update_read_txn().unwrap();
let updates = index.all_updates_status(&reader)?;
println!("{:#?}", updates);
reader.abort().unwrap();
Ok(())
}
fn main() -> Result<(), Box<dyn Error>> {
env_logger::init();
let opt = Command::from_args();
let database = Database::open_or_create(opt.path(), DatabaseOptions::default())?;
match opt {
Command::Index(command) => index_command(command, database),
Command::Search(command) => search_command(command, database),
Command::ShowUpdates(command) => show_updates_command(command, database),
}
}
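
The highlighting helpers in this removed example convert character offsets (as stored in `Highlight`) into byte offsets before slicing the text, because Rust string slices are indexed by bytes rather than characters. The sketch below isolates that step. `char_to_byte_range` is taken from the listing above; the sample text and the `main` function are illustrative assumptions and were not part of the original file.

```rust
/// Convert a (char index, char length) pair into a (byte index, byte length)
/// pair for `text`. Same logic as the helper in the listing above.
fn char_to_byte_range(index: usize, length: usize, text: &str) -> (usize, usize) {
    let mut byte_index = 0;
    let mut byte_length = 0;

    // `char_indices` yields the byte offset of each char; `enumerate`
    // adds the char position on top of it.
    for (n, (i, c)) in text.char_indices().enumerate() {
        if n == index {
            byte_index = i;
        }

        // Once the last char of the range is reached, measure up to
        // and including its final byte.
        if n + 1 == index + length {
            byte_length = i - byte_index + c.len_utf8();
            break;
        }
    }

    (byte_index, byte_length)
}

fn main() {
    // Hypothetical sample text containing two-byte characters.
    let text = "héllo wörld";

    // Highlight 5 characters starting at character 6 ("wörld").
    let (start, len) = char_to_byte_range(6, 5, text);

    // Slicing with byte offsets is safe here because both ends
    // fall on char boundaries.
    println!("{}", &text[start..start + len]);
}
```

Slicing `text[6..11]` directly with the character offsets would cut at the wrong place in this sample, since each accented letter takes two bytes in UTF-8.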

View File

@ -1,53 +0,0 @@
use levenshtein_automata::{LevenshteinAutomatonBuilder as LevBuilder, DFA};
use once_cell::sync::OnceCell;
static LEVDIST0: OnceCell<LevBuilder> = OnceCell::new();
static LEVDIST1: OnceCell<LevBuilder> = OnceCell::new();
static LEVDIST2: OnceCell<LevBuilder> = OnceCell::new();
#[derive(Copy, Clone)]
enum PrefixSetting {
Prefix,
NoPrefix,
}
fn build_dfa_with_setting(query: &str, setting: PrefixSetting) -> DFA {
use PrefixSetting::{NoPrefix, Prefix};
match query.len() {
0..=4 => {
let builder = LEVDIST0.get_or_init(|| LevBuilder::new(0, true));
match setting {
Prefix => builder.build_prefix_dfa(query),
NoPrefix => builder.build_dfa(query),
}
}
5..=8 => {
let builder = LEVDIST1.get_or_init(|| LevBuilder::new(1, true));
match setting {
Prefix => builder.build_prefix_dfa(query),
NoPrefix => builder.build_dfa(query),
}
}
_ => {
let builder = LEVDIST2.get_or_init(|| LevBuilder::new(2, true));
match setting {
Prefix => builder.build_prefix_dfa(query),
NoPrefix => builder.build_dfa(query),
}
}
}
}
pub fn build_prefix_dfa(query: &str) -> DFA {
build_dfa_with_setting(query, PrefixSetting::Prefix)
}
pub fn build_dfa(query: &str) -> DFA {
build_dfa_with_setting(query, PrefixSetting::NoPrefix)
}
pub fn build_exact_dfa(query: &str) -> DFA {
let builder = LEVDIST0.get_or_init(|| LevBuilder::new(0, true));
builder.build_dfa(query)
}
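
The `match` in `build_dfa_with_setting` amounts to a typo budget chosen from the query length: distance 0 for lengths 0 to 4, distance 1 for lengths 5 to 8, and distance 2 beyond that. The sketch below isolates that rule without the `levenshtein_automata` dependency. The function name `max_typos` and the sample queries are hypothetical, introduced only for illustration.

```rust
/// Maximum Levenshtein distance allowed for a query, following the
/// length thresholds used in the listing above.
/// `max_typos` is a hypothetical name, not part of the original module.
fn max_typos(query: &str) -> u8 {
    // `str::len` returns the length in bytes, not in characters,
    // which is also what the original `query.len()` measures.
    match query.len() {
        0..=4 => 0,
        5..=8 => 1,
        _ => 2,
    }
}

fn main() {
    // Hypothetical sample queries.
    for q in ["cat", "batman", "interstellar", "été"].iter() {
        println!("{:?} ({} bytes): up to {} typo(s)", q, q.len(), max_typos(q));
    }
}
```

Because the thresholds apply to the byte length, a short query with multi-byte characters can land in a higher bucket than its character count suggests: `"été"` has three characters but five bytes, so it gets a budget of one typo.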

View File

@ -1,4 +0,0 @@
mod dfa;
pub use self::dfa::{build_dfa, build_prefix_dfa, build_exact_dfa};

View File

@ -1,679 +0,0 @@
use std::borrow::Cow;
use std::collections::HashMap;
use std::mem;
use std::ops::Deref;
use std::ops::Range;
use std::rc::Rc;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::time::Instant;
use std::fmt;
use compact_arena::{SmallArena, Idx32, mk_arena};
use log::{debug, error};
use sdset::{Set, SetBuf, exponential_search, SetOperation, Counter, duo::OpBuilder};
use slice_group_by::{GroupBy, GroupByMut};
use meilisearch_types::DocIndex;
use crate::criterion::{Criteria, Context, ContextMut};
use crate::distinct_map::{BufferedDistinctMap, DistinctMap};
use crate::raw_document::RawDocument;
use crate::{database::MainT, reordered_attrs::ReorderedAttrs};
use crate::{store, Document, DocumentId, MResult, Index, RankedMap, MainReader, Error};
use crate::query_tree::{create_query_tree, traverse_query_tree};
use crate::query_tree::{Operation, QueryResult, QueryKind, QueryId, PostingsKey};
use crate::query_tree::Context as QTContext;
#[derive(Debug, Default)]
pub struct SortResult {
pub documents: Vec<Document>,
pub nb_hits: usize,
pub exhaustive_nb_hit: bool,
pub facets: Option<HashMap<String, HashMap<String, usize>>>,
pub exhaustive_facets_count: Option<bool>,
}
#[allow(clippy::too_many_arguments)]
pub fn bucket_sort<'c, FI>(
reader: &heed::RoTxn<MainT>,
query: &str,
range: Range<usize>,
facets_docids: Option<SetBuf<DocumentId>>,
facet_count_docids: Option<HashMap<String, HashMap<String, (&str, Cow<Set<DocumentId>>)>>>,
filter: Option<FI>,
criteria: Criteria<'c>,
searchable_attrs: Option<ReorderedAttrs>,
index: &Index,
) -> MResult<SortResult>
where
FI: Fn(DocumentId) -> bool,
{
// We delegate the filter work to the distinct query builder,
// specifying a distinct rule that has no effect.
if filter.is_some() {
let distinct = |_| None;
let distinct_size = 1;
return bucket_sort_with_distinct(
reader,
query,
range,
facets_docids,
facet_count_docids,
filter,
distinct,
distinct_size,
criteria,
searchable_attrs,
index,
);
}
let mut result = SortResult::default();
let words_set = index.main.words_fst(reader)?;
let stop_words = index.main.stop_words_fst(reader)?;
let context = QTContext {
words_set,
stop_words,
synonyms: index.synonyms,
postings_lists: index.postings_lists,
prefix_postings_lists: index.prefix_postings_lists_cache,
};
let (operation, mapping) = create_query_tree(reader, &context, query)?;
debug!("operation:\n{:?}", operation);
debug!("mapping:\n{:?}", mapping);
fn recurs_operation<'o>(map: &mut HashMap<QueryId, &'o QueryKind>, operation: &'o Operation) {
match operation {
Operation::And(ops) => ops.iter().for_each(|op| recurs_operation(map, op)),
Operation::Or(ops) => ops.iter().for_each(|op| recurs_operation(map, op)),
Operation::Query(query) => { map.insert(query.id, &query.kind); },
}
}
let mut queries_kinds = HashMap::new();
recurs_operation(&mut queries_kinds, &operation);
let QueryResult { mut docids, queries } = traverse_query_tree(reader, &context, &operation)?;
debug!("found {} documents", docids.len());
debug!("number of postings {:?}", queries.len());
if let Some(facets_docids) = facets_docids {
let intersection = sdset::duo::OpBuilder::new(docids.as_ref(), facets_docids.as_set())
.intersection()
.into_set_buf();
docids = Cow::Owned(intersection);
}
if let Some(f) = facet_count_docids {
// hardcoded value, until approximation optimization
result.exhaustive_facets_count = Some(true);
result.facets = Some(facet_count(f, &docids));
}
let before = Instant::now();
mk_arena!(arena);
let mut bare_matches = cleanup_bare_matches(&mut arena, &docids, queries);
debug!("matches cleaned in {:.02?}", before.elapsed());
let before_bucket_sort = Instant::now();
let before_raw_documents_building = Instant::now();
let mut raw_documents = Vec::new();
for bare_matches in bare_matches.linear_group_by_key_mut(|sm| sm.document_id) {
let raw_document = RawDocument::new(bare_matches, &mut arena, searchable_attrs.as_ref());
raw_documents.push(raw_document);
}
debug!("creating {} candidates documents took {:.02?}",
raw_documents.len(),
before_raw_documents_building.elapsed(),
);
let before_criterion_loop = Instant::now();
let proximity_count = AtomicUsize::new(0);
let mut groups = vec![raw_documents.as_mut_slice()];
'criteria: for criterion in criteria.as_ref() {
let tmp_groups = mem::replace(&mut groups, Vec::new());
let mut documents_seen = 0;
for mut group in tmp_groups {
let before_criterion_preparation = Instant::now();
let ctx = ContextMut {
reader,
postings_lists: &mut arena,
query_mapping: &mapping,
documents_fields_counts_store: index.documents_fields_counts,
};
criterion.prepare(ctx, &mut group)?;
debug!("{:?} preparation took {:.02?}", criterion.name(), before_criterion_preparation.elapsed());
let ctx = Context {
postings_lists: &arena,
query_mapping: &mapping,
};
let before_criterion_sort = Instant::now();
group.sort_unstable_by(|a, b| criterion.evaluate(&ctx, a, b));
debug!("{:?} evaluation took {:.02?}", criterion.name(), before_criterion_sort.elapsed());
for group in group.binary_group_by_mut(|a, b| criterion.eq(&ctx, a, b)) {
debug!("{:?} produced a group of size {}", criterion.name(), group.len());
documents_seen += group.len();
groups.push(group);
// we have sorted enough documents once the last sorted document lies past
// the end of the requested range, so we can continue to the next criterion
if documents_seen >= range.end {
continue 'criteria;
}
}
}
}
debug!("criterion loop took {:.02?}", before_criterion_loop.elapsed());
debug!("proximity evaluation called {} times", proximity_count.load(Ordering::Relaxed));
let schema = index.main.schema(reader)?.ok_or(Error::SchemaMissing)?;
let iter = raw_documents.into_iter().skip(range.start).take(range.len());
let iter = iter.map(|rd| Document::from_raw(rd, &queries_kinds, &arena, searchable_attrs.as_ref(), &schema));
let documents = iter.collect();
debug!("bucket sort took {:.02?}", before_bucket_sort.elapsed());
result.documents = documents;
result.nb_hits = docids.len();
Ok(result)
}
#[allow(clippy::too_many_arguments)]
pub fn bucket_sort_with_distinct<'c, FI, FD>(
reader: &heed::RoTxn<MainT>,
query: &str,
range: Range<usize>,
facets_docids: Option<SetBuf<DocumentId>>,
facet_count_docids: Option<HashMap<String, HashMap<String, (&str, Cow<Set<DocumentId>>)>>>,
filter: Option<FI>,
distinct: FD,
distinct_size: usize,
criteria: Criteria<'c>,
searchable_attrs: Option<ReorderedAttrs>,
index: &Index,
) -> MResult<SortResult>
where
FI: Fn(DocumentId) -> bool,
FD: Fn(DocumentId) -> Option<u64>,
{
let mut result = SortResult::default();
let mut filtered_count = 0;
let words_set = index.main.words_fst(reader)?;
let stop_words = index.main.stop_words_fst(reader)?;
let context = QTContext {
words_set,
stop_words,
synonyms: index.synonyms,
postings_lists: index.postings_lists,
prefix_postings_lists: index.prefix_postings_lists_cache,
};
let (operation, mapping) = create_query_tree(reader, &context, query)?;
debug!("operation:\n{:?}", operation);
debug!("mapping:\n{:?}", mapping);
fn recurs_operation<'o>(map: &mut HashMap<QueryId, &'o QueryKind>, operation: &'o Operation) {
match operation {
Operation::And(ops) => ops.iter().for_each(|op| recurs_operation(map, op)),
Operation::Or(ops) => ops.iter().for_each(|op| recurs_operation(map, op)),
Operation::Query(query) => { map.insert(query.id, &query.kind); },
}
}
let mut queries_kinds = HashMap::new();
recurs_operation(&mut queries_kinds, &operation);
let QueryResult { mut docids, queries } = traverse_query_tree(reader, &context, &operation)?;
debug!("found {} documents", docids.len());
debug!("number of postings {:?}", queries.len());
if let Some(facets_docids) = facets_docids {
let intersection = OpBuilder::new(docids.as_ref(), facets_docids.as_set())
.intersection()
.into_set_buf();
docids = Cow::Owned(intersection);
}
if let Some(f) = facet_count_docids {
// hardcoded value, until approximation optimization
result.exhaustive_facets_count = Some(true);
result.facets = Some(facet_count(f, &docids));
}
let before = Instant::now();
mk_arena!(arena);
let mut bare_matches = cleanup_bare_matches(&mut arena, &docids, queries);
debug!("matches cleaned in {:.02?}", before.elapsed());
let before_raw_documents_building = Instant::now();
let mut raw_documents = Vec::new();
for bare_matches in bare_matches.linear_group_by_key_mut(|sm| sm.document_id) {
let raw_document = RawDocument::new(bare_matches, &mut arena, searchable_attrs.as_ref());
raw_documents.push(raw_document);
}
debug!("creating {} candidates documents took {:.02?}",
raw_documents.len(),
before_raw_documents_building.elapsed(),
);
let mut groups = vec![raw_documents.as_mut_slice()];
let mut key_cache = HashMap::new();
let mut filter_map = HashMap::new();
// these two variables track the current distinct map and
// the raw offset of the start of the group where the
// range.start bound is located according to the distinct function
let mut distinct_map = DistinctMap::new(distinct_size);
let mut distinct_raw_offset = 0;
'criteria: for criterion in criteria.as_ref() {
let tmp_groups = mem::replace(&mut groups, Vec::new());
let mut buf_distinct = BufferedDistinctMap::new(&mut distinct_map);
let mut documents_seen = 0;
for mut group in tmp_groups {
// if this group does not overlap with the requested range,
// push it without sorting and splitting it
if documents_seen + group.len() < distinct_raw_offset {
documents_seen += group.len();
groups.push(group);
continue;
}
let ctx = ContextMut {
reader,
postings_lists: &mut arena,
query_mapping: &mapping,
documents_fields_counts_store: index.documents_fields_counts,
};
let before_criterion_preparation = Instant::now();
criterion.prepare(ctx, &mut group)?;
debug!("{:?} preparation took {:.02?}", criterion.name(), before_criterion_preparation.elapsed());
let ctx = Context {
postings_lists: &arena,
query_mapping: &mapping,
};
let before_criterion_sort = Instant::now();
group.sort_unstable_by(|a, b| criterion.evaluate(&ctx, a, b));
debug!("{:?} evaluation took {:.02?}", criterion.name(), before_criterion_sort.elapsed());
for group in group.binary_group_by_mut(|a, b| criterion.eq(&ctx, a, b)) {
// we must compute the real distinguished len of this sub-group
for document in group.iter() {
let filter_accepted = match &filter {
Some(filter) => {
let entry = filter_map.entry(document.id);
*entry.or_insert_with(|| {
let accepted = (filter)(document.id);
// we only want to count it out the first time we see it
if !accepted {
filtered_count += 1;
}
accepted
})
}
None => true,
};
if filter_accepted {
let entry = key_cache.entry(document.id);
let mut seen = true;
let key = entry.or_insert_with(|| {
seen = false;
(distinct)(document.id).map(Rc::new)
});
let distinct = match key.clone() {
Some(key) => buf_distinct.register(key),
None => buf_distinct.register_without_key(),
};
// we only want to count the document if it is the first time we see it and
// if it wasn't accepted by distinct
if !seen && !distinct {
filtered_count += 1;
}
}
// the requested range end is reached: stop computing distinct
if buf_distinct.len() >= range.end {
break;
}
}
documents_seen += group.len();
groups.push(group);
// if this sub-group does not overlap with the requested range
// we must update the distinct map and its start index
if buf_distinct.len() < range.start {
buf_distinct.transfert_to_internal();
distinct_raw_offset = documents_seen;
}
// we have sorted enough documents once the last sorted document lies past
// the end of the requested range, so we can continue to the next criterion
if buf_distinct.len() >= range.end {
continue 'criteria;
}
}
}
}
// once we classified the documents related to the current
// automatons we save that as the next valid result
let mut seen = BufferedDistinctMap::new(&mut distinct_map);
let schema = index.main.schema(reader)?.ok_or(Error::SchemaMissing)?;
let mut documents = Vec::with_capacity(range.len());
for raw_document in raw_documents.into_iter().skip(distinct_raw_offset) {
let filter_accepted = match &filter {
Some(_) => filter_map.remove(&raw_document.id).unwrap_or_else(|| {
error!("error during filtering: expected value for document id {}", &raw_document.id.0);
Default::default()
}),
None => true,
};
if filter_accepted {
let key = key_cache.remove(&raw_document.id).unwrap_or_else(|| {
error!("error during distinct: expected value for document id {}", &raw_document.id.0);
Default::default()
});
let distinct_accepted = match key {
Some(key) => seen.register(key),
None => seen.register_without_key(),
};
if distinct_accepted && seen.len() > range.start {
documents.push(Document::from_raw(raw_document, &queries_kinds, &arena, searchable_attrs.as_ref(), &schema));
if documents.len() == range.len() {
break;
}
}
}
}
result.documents = documents;
result.nb_hits = docids.len() - filtered_count;
Ok(result)
}
fn cleanup_bare_matches<'tag, 'txn>(
arena: &mut SmallArena<'tag, PostingsListView<'txn>>,
docids: &Set<DocumentId>,
queries: HashMap<PostingsKey, Cow<'txn, Set<DocIndex>>>,
) -> Vec<BareMatch<'tag>>
{
let docidslen = docids.len() as f32;
let mut bare_matches = Vec::new();
for (PostingsKey { query, input, distance, is_exact }, matches) in queries {
let postings_list_view = PostingsListView::original(Rc::from(input), Rc::new(matches));
let pllen = postings_list_view.len() as f32;
if docidslen / pllen >= 0.8 {
let mut offset = 0;
for matches in postings_list_view.linear_group_by_key(|m| m.document_id) {
let document_id = matches[0].document_id;
if docids.contains(&document_id) {
let range = postings_list_view.range(offset, matches.len());
let posting_list_index = arena.add(range);
let bare_match = BareMatch {
document_id,
query_index: query.id,
distance,
is_exact,
postings_list: posting_list_index,
};
bare_matches.push(bare_match);
}
offset += matches.len();
}
} else {
let mut offset = 0;
for id in docids.as_slice() {
let di = DocIndex { document_id: *id, ..DocIndex::default() };
let pos = exponential_search(&postings_list_view[offset..], &di).unwrap_or_else(|x| x);
offset += pos;
let group = postings_list_view[offset..]
.linear_group_by_key(|m| m.document_id)
.next()
.filter(|matches| matches[0].document_id == *id);
if let Some(matches) = group {
let range = postings_list_view.range(offset, matches.len());
let posting_list_index = arena.add(range);
let bare_match = BareMatch {
document_id: *id,
query_index: query.id,
distance,
is_exact,
postings_list: posting_list_index,
};
bare_matches.push(bare_match);
}
}
}
}
let before_raw_documents_presort = Instant::now();
bare_matches.sort_unstable_by_key(|sm| sm.document_id);
debug!("sort by documents ids took {:.02?}", before_raw_documents_presort.elapsed());
bare_matches
}
pub struct BareMatch<'tag> {
pub document_id: DocumentId,
pub query_index: usize,
pub distance: u8,
pub is_exact: bool,
pub postings_list: Idx32<'tag>,
}
impl fmt::Debug for BareMatch<'_> {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
f.debug_struct("BareMatch")
.field("document_id", &self.document_id)
.field("query_index", &self.query_index)
.field("distance", &self.distance)
.field("is_exact", &self.is_exact)
.finish()
}
}
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct SimpleMatch {
pub query_index: usize,
pub distance: u8,
pub attribute: u16,
pub word_index: u16,
pub is_exact: bool,
}
#[derive(Clone)]
pub enum PostingsListView<'txn> {
Original {
input: Rc<[u8]>,
postings_list: Rc<Cow<'txn, Set<DocIndex>>>,
offset: usize,
len: usize,
},
Rewritten {
input: Rc<[u8]>,
postings_list: SetBuf<DocIndex>,
},
}
impl fmt::Debug for PostingsListView<'_> {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
f.debug_struct("PostingsListView")
.field("input", &std::str::from_utf8(&self.input()).unwrap())
.field("postings_list", &self.as_ref())
.finish()
}
}
impl<'txn> PostingsListView<'txn> {
pub fn original(input: Rc<[u8]>, postings_list: Rc<Cow<'txn, Set<DocIndex>>>) -> PostingsListView<'txn> {
let len = postings_list.len();
PostingsListView::Original { input, postings_list, offset: 0, len }
}
pub fn rewritten(input: Rc<[u8]>, postings_list: SetBuf<DocIndex>) -> PostingsListView<'txn> {
PostingsListView::Rewritten { input, postings_list }
}
pub fn rewrite_with(&mut self, postings_list: SetBuf<DocIndex>) {
let input = match self {
PostingsListView::Original { input, .. } => input.clone(),
PostingsListView::Rewritten { input, .. } => input.clone(),
};
*self = PostingsListView::rewritten(input, postings_list);
}
pub fn len(&self) -> usize {
match self {
PostingsListView::Original { len, .. } => *len,
PostingsListView::Rewritten { postings_list, .. } => postings_list.len(),
}
}
pub fn input(&self) -> &[u8] {
match self {
PostingsListView::Original { ref input, .. } => input,
PostingsListView::Rewritten { ref input, .. } => input,
}
}
pub fn range(&self, range_offset: usize, range_len: usize) -> PostingsListView<'txn> {
match self {
PostingsListView::Original { input, postings_list, offset, len } => {
assert!(range_offset + range_len <= *len);
PostingsListView::Original {
input: input.clone(),
postings_list: postings_list.clone(),
offset: offset + range_offset,
len: range_len,
}
},
PostingsListView::Rewritten { .. } => {
panic!("Cannot create a range on a rewritten postings list view");
}
}
}
}
impl AsRef<Set<DocIndex>> for PostingsListView<'_> {
fn as_ref(&self) -> &Set<DocIndex> {
self
}
}
impl Deref for PostingsListView<'_> {
type Target = Set<DocIndex>;
fn deref(&self) -> &Set<DocIndex> {
match *self {
PostingsListView::Original { ref postings_list, offset, len, .. } => {
Set::new_unchecked(&postings_list[offset..offset + len])
},
PostingsListView::Rewritten { ref postings_list, .. } => postings_list,
}
}
}
/// Sorts document ids according to user-defined ranking rules.
pub fn placeholder_document_sort(
document_ids: &mut [DocumentId],
index: &store::Index,
reader: &MainReader,
ranked_map: &RankedMap
) -> MResult<()> {
use crate::settings::RankingRule;
use std::cmp::Ordering;
enum SortOrder {
Asc,
Desc,
}
if let Some(ranking_rules) = index.main.ranking_rules(reader)? {
let schema = index.main.schema(reader)?
.ok_or(Error::SchemaMissing)?;
// Select custom rules from ranking rules, and map them to custom rules
// containing a field_id
let ranking_rules = ranking_rules.iter().filter_map(|r|
match r {
RankingRule::Asc(name) => schema.id(name).map(|f| (f, SortOrder::Asc)),
RankingRule::Desc(name) => schema.id(name).map(|f| (f, SortOrder::Desc)),
_ => None,
}).collect::<Vec<_>>();
document_ids.sort_unstable_by(|a, b| {
for (field_id, order) in &ranking_rules {
let a_value = ranked_map.get(*a, *field_id);
let b_value = ranked_map.get(*b, *field_id);
let (a, b) = match order {
SortOrder::Asc => (a_value, b_value),
SortOrder::Desc => (b_value, a_value),
};
match a.cmp(&b) {
Ordering::Equal => continue,
ordering => return ordering,
}
}
Ordering::Equal
});
}
Ok(())
}
/// For each entry in facet_docids, calculates the number of documents in the intersection with candidate_docids.
pub fn facet_count(
facet_docids: HashMap<String, HashMap<String, (&str, Cow<Set<DocumentId>>)>>,
candidate_docids: &Set<DocumentId>,
) -> HashMap<String, HashMap<String, usize>> {
let mut facets_counts = HashMap::with_capacity(facet_docids.len());
for (key, doc_map) in facet_docids {
let mut count_map = HashMap::with_capacity(doc_map.len());
for (_, (value, docids)) in doc_map {
let mut counter = Counter::new();
let op = OpBuilder::new(docids.as_ref(), candidate_docids).intersection();
SetOperation::<DocumentId>::extend_collection(op, &mut counter);
count_map.insert(value.to_string(), counter.0);
}
facets_counts.insert(key, count_map);
}
facets_counts
}
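
The criterion loop in `bucket_sort` above can be hard to follow through the arena and lifetime plumbing, so here is a minimal, standalone sketch of the same idea: sort each group by the current criterion, split it into runs of equal documents, and stop refining once enough documents have been seen to cover the requested range. Everything in it (`Doc`, `Compare`, the `by_*` comparators, `bucket_sort_sketch`) is hypothetical and chosen for illustration; it tracks groups as index ranges instead of the mutable sub-slices that `slice_group_by` provides in the original.

```rust
use std::cmp::Ordering;
use std::ops::Range;

// Hypothetical stand-in for `RawDocument`, reduced to three ranking keys.
#[derive(Debug, Clone, PartialEq)]
struct Doc {
    id: u32,
    typos: u8,
    proximity: u16,
}

// Hypothetical stand-in for a boxed `Criterion`.
type Compare = fn(&Doc, &Doc) -> Ordering;

fn by_typos(a: &Doc, b: &Doc) -> Ordering { a.typos.cmp(&b.typos) }
fn by_proximity(a: &Doc, b: &Doc) -> Ordering { a.proximity.cmp(&b.proximity) }
fn by_id(a: &Doc, b: &Doc) -> Ordering { a.id.cmp(&b.id) }

/// Orders the first `end` documents of `docs` by applying each criterion in turn.
fn bucket_sort_sketch(docs: &mut [Doc], criteria: &[Compare], end: usize) {
    // Start with a single group spanning every candidate.
    let mut groups: Vec<Range<usize>> = vec![0..docs.len()];
    'criteria: for criterion in criteria {
        let previous = std::mem::take(&mut groups);
        let mut documents_seen = 0;
        for range in previous {
            let group = &mut docs[range.clone()];
            group.sort_unstable_by(|a, b| criterion(a, b));
            // Split the sorted group into runs the criterion considers equal.
            let mut start = 0;
            while start < group.len() {
                let mut stop = start + 1;
                while stop < group.len()
                    && criterion(&group[start], &group[stop]) == Ordering::Equal
                {
                    stop += 1;
                }
                documents_seen += stop - start;
                groups.push(range.start + start..range.start + stop);
                start = stop;
                // Groups past the requested range never need refining.
                if documents_seen >= end {
                    continue 'criteria;
                }
            }
        }
    }
}
```

Only the documents before `end` are guaranteed to be fully ordered; anything beyond that point is left in whatever order the earlier, coarser criteria produced, which is the property that lets the loop skip work.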


@ -1,37 +0,0 @@
use std::cmp::Ordering;
use slice_group_by::GroupBy;
use crate::{RawDocument, MResult};
use crate::bucket_sort::SimpleMatch;
use super::{Criterion, Context, ContextMut, prepare_bare_matches};
pub struct Attribute;
impl Criterion for Attribute {
fn name(&self) -> &str { "attribute" }
fn prepare<'h, 'p, 'tag, 'txn, 'q, 'r>(
&self,
ctx: ContextMut<'h, 'p, 'tag, 'txn, 'q>,
documents: &mut [RawDocument<'r, 'tag>],
) -> MResult<()>
{
prepare_bare_matches(documents, ctx.postings_lists, ctx.query_mapping);
Ok(())
}
fn evaluate(&self, _ctx: &Context, lhs: &RawDocument, rhs: &RawDocument) -> Ordering {
#[inline]
fn sum_of_attribute(matches: &[SimpleMatch]) -> usize {
let mut sum_of_attribute = 0;
for group in matches.linear_group_by_key(|bm| bm.query_index) {
sum_of_attribute += group[0].attribute as usize;
}
sum_of_attribute
}
let lhs = sum_of_attribute(&lhs.processed_matches);
let rhs = sum_of_attribute(&rhs.processed_matches);
lhs.cmp(&rhs)
}
}


@ -1,16 +0,0 @@
use std::cmp::Ordering;
use crate::RawDocument;
use super::{Criterion, Context};
pub struct DocumentId;
impl Criterion for DocumentId {
fn name(&self) -> &str { "stable document id" }
fn evaluate(&self, _ctx: &Context, lhs: &RawDocument, rhs: &RawDocument) -> Ordering {
let lhs = &lhs.id;
let rhs = &rhs.id;
lhs.cmp(rhs)
}
}


@ -1,78 +0,0 @@
use std::cmp::{Ordering, Reverse};
use std::collections::hash_map::{HashMap, Entry};
use meilisearch_schema::IndexedPos;
use slice_group_by::GroupBy;
use crate::{RawDocument, MResult};
use crate::bucket_sort::BareMatch;
use super::{Criterion, Context, ContextMut};
pub struct Exactness;
impl Criterion for Exactness {
fn name(&self) -> &str { "exactness" }
fn prepare<'h, 'p, 'tag, 'txn, 'q, 'r>(
&self,
ctx: ContextMut<'h, 'p, 'tag, 'txn, 'q>,
documents: &mut [RawDocument<'r, 'tag>],
) -> MResult<()>
{
let store = ctx.documents_fields_counts_store;
let reader = ctx.reader;
'documents: for doc in documents {
doc.bare_matches.sort_unstable_by_key(|bm| (bm.query_index, Reverse(bm.is_exact)));
// mark the document if we find a "one word field" that matches
let mut fields_counts = HashMap::new();
for group in doc.bare_matches.linear_group_by_key(|bm| bm.query_index) {
for group in group.linear_group_by_key(|bm| bm.is_exact) {
if !group[0].is_exact { break }
for bm in group {
for di in ctx.postings_lists[bm.postings_list].as_ref() {
let attr = IndexedPos(di.attribute);
let count = match fields_counts.entry(attr) {
Entry::Occupied(entry) => *entry.get(),
Entry::Vacant(entry) => {
let count = store.document_field_count(reader, doc.id, attr)?;
*entry.insert(count)
},
};
if count == Some(1) {
doc.contains_one_word_field = true;
continue 'documents
}
}
}
}
}
}
Ok(())
}
fn evaluate(&self, _ctx: &Context, lhs: &RawDocument, rhs: &RawDocument) -> Ordering {
#[inline]
fn sum_exact_query_words(matches: &[BareMatch]) -> usize {
let mut sum_exact_query_words = 0;
for group in matches.linear_group_by_key(|bm| bm.query_index) {
sum_exact_query_words += group[0].is_exact as usize;
}
sum_exact_query_words
}
// does it contain a "one word field"?
lhs.contains_one_word_field.cmp(&rhs.contains_one_word_field).reverse()
// if not, which document contains more exact words?
.then_with(|| {
let lhs = sum_exact_query_words(&lhs.bare_matches);
let rhs = sum_exact_query_words(&rhs.bare_matches);
lhs.cmp(&rhs).reverse()
})
}
}
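
The two-level comparison in `Exactness::evaluate` above relies on `Ordering::reverse` and `Ordering::then_with`. The sketch below isolates that chain on a hypothetical `Candidate` struct (not a type from the crate) whose `exact_words` field stands in for the result of `sum_exact_query_words`.

```rust
use std::cmp::Ordering;

// Hypothetical reduction of `RawDocument` to the two values the
// exactness criterion actually compares.
struct Candidate {
    contains_one_word_field: bool,
    exact_words: usize,
}

fn exactness_order(lhs: &Candidate, rhs: &Candidate) -> Ordering {
    // `true` sorts after `false` by default, so reverse to rank
    // documents with a one-word field first.
    lhs.contains_one_word_field
        .cmp(&rhs.contains_one_word_field)
        .reverse()
        // Only consulted on a tie: more exact words ranks first.
        .then_with(|| lhs.exact_words.cmp(&rhs.exact_words).reverse())
}
```

A result of `Ordering::Less` means `lhs` ranks ahead of `rhs`, so a document with a one-word field wins regardless of how many exact words the other document has.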


@ -1,292 +0,0 @@
use std::cmp::{self, Ordering};
use std::collections::HashMap;
use std::ops::Range;
use compact_arena::SmallArena;
use sdset::SetBuf;
use slice_group_by::GroupBy;
use crate::bucket_sort::{SimpleMatch, PostingsListView};
use crate::database::MainT;
use crate::query_tree::QueryId;
use crate::{store, RawDocument, MResult};
mod typo;
mod words;
mod proximity;
mod attribute;
mod words_position;
mod exactness;
mod document_id;
mod sort_by_attr;
pub use self::typo::Typo;
pub use self::words::Words;
pub use self::proximity::Proximity;
pub use self::attribute::Attribute;
pub use self::words_position::WordsPosition;
pub use self::exactness::Exactness;
pub use self::document_id::DocumentId;
pub use self::sort_by_attr::SortByAttr;
pub trait Criterion {
fn name(&self) -> &str;
fn prepare<'h, 'p, 'tag, 'txn, 'q, 'r>(
&self,
_ctx: ContextMut<'h, 'p, 'tag, 'txn, 'q>,
_documents: &mut [RawDocument<'r, 'tag>],
) -> MResult<()>
{
Ok(())
}
fn evaluate<'p, 'tag, 'txn, 'q, 'r>(
&self,
ctx: &Context<'p, 'tag, 'txn, 'q>,
lhs: &RawDocument<'r, 'tag>,
rhs: &RawDocument<'r, 'tag>,
) -> Ordering;
#[inline]
fn eq<'p, 'tag, 'txn, 'q, 'r>(
&self,
ctx: &Context<'p, 'tag, 'txn, 'q>,
lhs: &RawDocument<'r, 'tag>,
rhs: &RawDocument<'r, 'tag>,
) -> bool
{
self.evaluate(ctx, lhs, rhs) == Ordering::Equal
}
}
pub struct ContextMut<'h, 'p, 'tag, 'txn, 'q> {
pub reader: &'h heed::RoTxn<'h, MainT>,
pub postings_lists: &'p mut SmallArena<'tag, PostingsListView<'txn>>,
pub query_mapping: &'q HashMap<QueryId, Range<usize>>,
pub documents_fields_counts_store: store::DocumentsFieldsCounts,
}
pub struct Context<'p, 'tag, 'txn, 'q> {
pub postings_lists: &'p SmallArena<'tag, PostingsListView<'txn>>,
pub query_mapping: &'q HashMap<QueryId, Range<usize>>,
}
#[derive(Default)]
pub struct CriteriaBuilder<'a> {
inner: Vec<Box<dyn Criterion + 'a>>,
}
impl<'a> CriteriaBuilder<'a> {
pub fn new() -> CriteriaBuilder<'a> {
CriteriaBuilder { inner: Vec::new() }
}
pub fn with_capacity(capacity: usize) -> CriteriaBuilder<'a> {
CriteriaBuilder {
inner: Vec::with_capacity(capacity),
}
}
pub fn reserve(&mut self, additional: usize) {
self.inner.reserve(additional)
}
#[allow(clippy::should_implement_trait)]
pub fn add<C: 'a>(mut self, criterion: C) -> CriteriaBuilder<'a>
where
C: Criterion,
{
self.push(criterion);
self
}
pub fn push<C: 'a>(&mut self, criterion: C)
where
C: Criterion,
{
self.inner.push(Box::new(criterion));
}
pub fn build(self) -> Criteria<'a> {
Criteria { inner: self.inner }
}
}
pub struct Criteria<'a> {
inner: Vec<Box<dyn Criterion + 'a>>,
}
impl<'a> Default for Criteria<'a> {
fn default() -> Self {
CriteriaBuilder::with_capacity(7)
.add(Typo)
.add(Words)
.add(Proximity)
.add(Attribute)
.add(WordsPosition)
.add(Exactness)
.add(DocumentId)
.build()
}
}
impl<'a> AsRef<[Box<dyn Criterion + 'a>]> for Criteria<'a> {
fn as_ref(&self) -> &[Box<dyn Criterion + 'a>] {
&self.inner
}
}
fn prepare_query_distances<'a, 'tag, 'txn>(
documents: &mut [RawDocument<'a, 'tag>],
query_mapping: &HashMap<QueryId, Range<usize>>,
postings_lists: &SmallArena<'tag, PostingsListView<'txn>>,
) {
for document in documents {
if !document.processed_distances.is_empty() { continue }
let mut processed = Vec::new();
for m in document.bare_matches.iter() {
if postings_lists[m.postings_list].is_empty() { continue }
let range = query_mapping[&(m.query_index as usize)].clone();
let new_len = cmp::max(range.end as usize, processed.len());
processed.resize(new_len, None);
for index in range {
let index = index as usize;
processed[index] = match processed[index] {
Some(distance) if distance > m.distance => Some(m.distance),
Some(distance) => Some(distance),
None => Some(m.distance),
};
}
}
document.processed_distances = processed;
}
}
fn prepare_bare_matches<'a, 'tag, 'txn>(
documents: &mut [RawDocument<'a, 'tag>],
postings_lists: &mut SmallArena<'tag, PostingsListView<'txn>>,
query_mapping: &HashMap<QueryId, Range<usize>>,
) {
for document in documents {
if !document.processed_matches.is_empty() { continue }
let mut processed = Vec::new();
for m in document.bare_matches.iter() {
let postings_list = &postings_lists[m.postings_list];
processed.reserve(postings_list.len());
for di in postings_list.as_ref() {
let simple_match = SimpleMatch {
query_index: m.query_index,
distance: m.distance,
attribute: di.attribute,
word_index: di.word_index,
is_exact: m.is_exact,
};
processed.push(simple_match);
}
}
let processed = multiword_rewrite_matches(&mut processed, query_mapping);
document.processed_matches = processed.into_vec();
}
}
fn multiword_rewrite_matches(
matches: &mut [SimpleMatch],
query_mapping: &HashMap<QueryId, Range<usize>>,
) -> SetBuf<SimpleMatch>
{
matches.sort_unstable_by_key(|m| (m.attribute, m.word_index));
let mut padded_matches = Vec::with_capacity(matches.len());
// let before_padding = Instant::now();
// for each attribute of each document
for same_document_attribute in matches.linear_group_by_key(|m| m.attribute) {
// padding will only be applied
// to word indices in the same attribute
let mut padding = 0;
let mut iter = same_document_attribute.linear_group_by_key(|m| m.word_index);
// for each match at the same position
// in this document attribute
while let Some(same_word_index) = iter.next() {
// find the biggest padding
let mut biggest = 0;
for match_ in same_word_index {
let mut replacement = query_mapping[&(match_.query_index as usize)].clone();
let replacement_len = replacement.len();
let nexts = iter.remainder().linear_group_by_key(|m| m.word_index);
if let Some(query_index) = replacement.next() {
let word_index = match_.word_index + padding as u16;
let match_ = SimpleMatch { query_index, word_index, ..*match_ };
padded_matches.push(match_);
}
let mut found = false;
// look ahead and if there already is a match
// corresponding to this padding word, abort the padding
'padding: for (x, next_group) in nexts.enumerate() {
for (i, query_index) in replacement.clone().enumerate().skip(x) {
let word_index = match_.word_index + padding as u16 + (i + 1) as u16;
let padmatch = SimpleMatch { query_index, word_index, ..*match_ };
for nmatch_ in next_group {
let mut rep = query_mapping[&(nmatch_.query_index as usize)].clone();
let query_index = rep.next().unwrap();
if query_index == padmatch.query_index {
if !found {
// if we find a corresponding padding for the
// first time we must push preceding paddings
for (i, query_index) in replacement.clone().enumerate().take(i) {
let word_index = match_.word_index + padding as u16 + (i + 1) as u16;
let match_ = SimpleMatch { query_index, word_index, ..*match_ };
padded_matches.push(match_);
biggest = biggest.max(i + 1);
}
}
padded_matches.push(padmatch);
found = true;
continue 'padding;
}
}
}
// if we did not find a corresponding padding in the
// next groups, stop here and pad what was found
break;
}
if !found {
// if no padding was found in the following matches
// we must insert the entire padding
for (i, query_index) in replacement.enumerate() {
let word_index = match_.word_index + padding as u16 + (i + 1) as u16;
let match_ = SimpleMatch { query_index, word_index, ..*match_ };
padded_matches.push(match_);
}
biggest = biggest.max(replacement_len - 1);
}
}
padding += biggest;
}
}
// debug!("padding matches took {:.02?}", before_padding.elapsed());
// With this check we can see that the loop above takes something
// like 43% of the search time even when no rewrite is needed.
// assert_eq!(before_matches, padded_matches);
SetBuf::from_dirty(padded_matches)
}


@ -1,68 +0,0 @@
use std::cmp::{self, Ordering};
use slice_group_by::GroupBy;
use crate::bucket_sort::{SimpleMatch};
use crate::{RawDocument, MResult};
use super::{Criterion, Context, ContextMut, prepare_bare_matches};
const MAX_DISTANCE: u16 = 8;
pub struct Proximity;
impl Criterion for Proximity {
fn name(&self) -> &str { "proximity" }
fn prepare<'h, 'p, 'tag, 'txn, 'q, 'r>(
&self,
ctx: ContextMut<'h, 'p, 'tag, 'txn, 'q>,
documents: &mut [RawDocument<'r, 'tag>],
) -> MResult<()>
{
prepare_bare_matches(documents, ctx.postings_lists, ctx.query_mapping);
Ok(())
}
fn evaluate(&self, _ctx: &Context, lhs: &RawDocument, rhs: &RawDocument) -> Ordering {
fn index_proximity(lhs: u16, rhs: u16) -> u16 {
if lhs < rhs {
cmp::min(rhs - lhs, MAX_DISTANCE)
} else {
cmp::min(lhs - rhs, MAX_DISTANCE) + 1
}
}
fn attribute_proximity(lhs: SimpleMatch, rhs: SimpleMatch) -> u16 {
if lhs.attribute != rhs.attribute { MAX_DISTANCE }
else { index_proximity(lhs.word_index, rhs.word_index) }
}
fn min_proximity(lhs: &[SimpleMatch], rhs: &[SimpleMatch]) -> u16 {
let mut min_prox = u16::max_value();
for a in lhs {
for b in rhs {
let prox = attribute_proximity(*a, *b);
min_prox = cmp::min(min_prox, prox);
}
}
min_prox
}
fn matches_proximity(matches: &[SimpleMatch],) -> u16 {
let mut proximity = 0;
let mut iter = matches.linear_group_by_key(|m| m.query_index);
// iterate over groups by windows of size 2
let mut last = iter.next();
while let (Some(lhs), Some(rhs)) = (last, iter.next()) {
proximity += min_proximity(lhs, rhs);
last = Some(rhs);
}
proximity
}
let lhs = matches_proximity(&lhs.processed_matches);
let rhs = matches_proximity(&rhs.processed_matches);
lhs.cmp(&rhs)
}
}
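
A few worked values make the asymmetry of `index_proximity` above easier to see: words in query order cost their distance, words in reverse order cost one extra, and everything is capped at `MAX_DISTANCE`. The sketch below restates the two inner helpers as free functions so they can be called directly; using an `(attribute, word_index)` tuple in place of `SimpleMatch` is an assumption made only to keep the example self-contained.

```rust
use std::cmp;

const MAX_DISTANCE: u16 = 8;

fn index_proximity(lhs: u16, rhs: u16) -> u16 {
    if lhs < rhs {
        // Words appear in query order: plain distance, capped.
        cmp::min(rhs - lhs, MAX_DISTANCE)
    } else {
        // Same position or reversed order: capped distance plus a penalty of 1.
        cmp::min(lhs - rhs, MAX_DISTANCE) + 1
    }
}

// Each tuple is (attribute, word_index), standing in for `SimpleMatch`.
fn attribute_proximity(lhs: (u16, u16), rhs: (u16, u16)) -> u16 {
    if lhs.0 != rhs.0 {
        // Matches in different attributes always cost the maximum.
        MAX_DISTANCE
    } else {
        index_proximity(lhs.1, rhs.1)
    }
}
```

For instance, positions 3 then 5 give a proximity of 2, positions 5 then 3 give 3, and two far-apart words in the same attribute are capped at 8 (or 9 when reversed).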


@ -1,129 +0,0 @@
use std::cmp::Ordering;
use std::error::Error;
use std::fmt;
use meilisearch_schema::{Schema, FieldId};
use crate::{RankedMap, RawDocument};
use super::{Criterion, Context};
/// A helper struct that allows sorting documents by
/// some of their stored attributes.
///
/// # Note
///
/// If a document cannot be deserialized it will be considered [`None`][].
///
/// Deserialized documents are compared like `Some(doc0).cmp(&Some(doc1))`,
/// so check the [`Ord`] implementation of `Option`.
///
/// [`None`]: https://doc.rust-lang.org/std/option/enum.Option.html#variant.None
/// [`Ord`]: https://doc.rust-lang.org/std/option/enum.Option.html#impl-Ord
///
/// # Example
///
/// ```ignore
/// use serde_derive::Deserialize;
/// use meilisearch::rank::criterion::*;
///
/// let custom_ranking = SortByAttr::lower_is_better(&ranked_map, &schema, "published_at")?;
///
/// let builder = CriteriaBuilder::with_capacity(8)
/// .add(Typo)
/// .add(Words)
/// .add(Proximity)
/// .add(Attribute)
/// .add(WordsPosition)
/// .add(Exactness)
/// .add(custom_ranking)
/// .add(DocumentId);
///
/// let criterion = builder.build();
///
/// ```
pub struct SortByAttr<'a> {
ranked_map: &'a RankedMap,
field_id: FieldId,
reversed: bool,
}
impl<'a> SortByAttr<'a> {
pub fn lower_is_better(
ranked_map: &'a RankedMap,
schema: &Schema,
attr_name: &str,
) -> Result<SortByAttr<'a>, SortByAttrError> {
SortByAttr::new(ranked_map, schema, attr_name, false)
}
pub fn higher_is_better(
ranked_map: &'a RankedMap,
schema: &Schema,
attr_name: &str,
) -> Result<SortByAttr<'a>, SortByAttrError> {
SortByAttr::new(ranked_map, schema, attr_name, true)
}
fn new(
ranked_map: &'a RankedMap,
schema: &Schema,
attr_name: &str,
reversed: bool,
) -> Result<SortByAttr<'a>, SortByAttrError> {
let field_id = match schema.id(attr_name) {
Some(field_id) => field_id,
None => return Err(SortByAttrError::AttributeNotFound),
};
if !schema.is_ranked(field_id) {
return Err(SortByAttrError::AttributeNotRegisteredForRanking);
}
Ok(SortByAttr {
ranked_map,
field_id,
reversed,
})
}
}
impl Criterion for SortByAttr<'_> {
fn name(&self) -> &str {
"sort by attribute"
}
fn evaluate(&self, _ctx: &Context, lhs: &RawDocument, rhs: &RawDocument) -> Ordering {
let lhs = self.ranked_map.get(lhs.id, self.field_id);
let rhs = self.ranked_map.get(rhs.id, self.field_id);
match (lhs, rhs) {
(Some(lhs), Some(rhs)) => {
let order = lhs.cmp(&rhs);
if self.reversed {
order.reverse()
} else {
order
}
}
(None, Some(_)) => Ordering::Greater,
(Some(_), None) => Ordering::Less,
(None, None) => Ordering::Equal,
}
}
}
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum SortByAttrError {
AttributeNotFound,
AttributeNotRegisteredForRanking,
}
impl fmt::Display for SortByAttrError {
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
use SortByAttrError::*;
match self {
AttributeNotFound => f.write_str("attribute not found in the schema"),
AttributeNotRegisteredForRanking => f.write_str("attribute not registered for ranking"),
}
}
}
impl Error for SortByAttrError {}
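
The ordering applied by `SortByAttr::evaluate` can be illustrated without the crate's types. This is a minimal sketch, assuming ranked values are plain `u64`s; `compare_ranked` is a hypothetical free function mirroring the `match` in `evaluate`. It shows that, as the method is written, a missing value sorts after a present one whether or not the order is reversed.

```rust
use std::cmp::Ordering;

// Hypothetical stand-alone version of the comparison done in `evaluate`.
// `None` models a document that has no ranked value for the attribute.
fn compare_ranked(lhs: Option<u64>, rhs: Option<u64>, reversed: bool) -> Ordering {
    match (lhs, rhs) {
        (Some(lhs), Some(rhs)) => {
            let order = lhs.cmp(&rhs);
            if reversed { order.reverse() } else { order }
        }
        // Missing values are never reversed: they always come last.
        (None, Some(_)) => Ordering::Greater,
        (Some(_), None) => Ordering::Less,
        (None, None) => Ordering::Equal,
    }
}
```

Sorting a list with this comparator and `reversed = true` flips the documents that have a value while leaving those without one at the end.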

@ -1,56 +0,0 @@
use std::cmp::Ordering;
use crate::{RawDocument, MResult};
use super::{Criterion, Context, ContextMut, prepare_query_distances};
pub struct Typo;
impl Criterion for Typo {
fn name(&self) -> &str { "typo" }
fn prepare<'h, 'p, 'tag, 'txn, 'q, 'r>(
&self,
ctx: ContextMut<'h, 'p, 'tag, 'txn, 'q>,
documents: &mut [RawDocument<'r, 'tag>],
) -> MResult<()>
{
prepare_query_distances(documents, ctx.query_mapping, ctx.postings_lists);
Ok(())
}
fn evaluate(&self, _ctx: &Context, lhs: &RawDocument, rhs: &RawDocument) -> Ordering {
// This is not a true base-10 logarithm: it returns log10(n + 1),
// looked up from a small table. Panicking on inputs higher than 3
// is safe because the number of typos is never bigger than that.
#[inline]
#[allow(clippy::approx_constant)]
fn custom_log10(n: u8) -> f32 {
match n {
0 => 0.0, // log(1)
1 => 0.30102, // log(2)
2 => 0.47712, // log(3)
3 => 0.60205, // log(4)
_ => panic!("invalid number"),
}
}
#[inline]
fn compute_typos(distances: &[Option<u8>]) -> usize {
let mut number_words: usize = 0;
let mut sum_typos = 0.0;
for distance in distances {
if let Some(distance) = distance {
sum_typos += custom_log10(*distance);
number_words += 1;
}
}
(number_words as f32 / (sum_typos + 1.0) * 1000.0) as usize
}
let lhs = compute_typos(&lhs.processed_distances);
let rhs = compute_typos(&rhs.processed_distances);
lhs.cmp(&rhs).reverse()
}
}
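
The typo score can be reproduced outside the criterion. The sketch below copies the two inner functions as free functions (the only change is using `flatten` to skip `None` entries) so the effect of a single typo on the score is easy to check.

```rust
// Table lookup returning log10(n + 1) for n in 0..=3, as in the criterion above.
fn custom_log10(n: u8) -> f32 {
    match n {
        0 => 0.0,     // log(1)
        1 => 0.30102, // log(2)
        2 => 0.47712, // log(3)
        3 => 0.60205, // log(4)
        _ => panic!("invalid number"),
    }
}

// More matched words raise the score; more typos lower it.
// `None` entries (query words not found in the document) are skipped.
fn compute_typos(distances: &[Option<u8>]) -> usize {
    let mut number_words: usize = 0;
    let mut sum_typos = 0.0;
    for distance in distances.iter().flatten() {
        sum_typos += custom_log10(*distance);
        number_words += 1;
    }
    (number_words as f32 / (sum_typos + 1.0) * 1000.0) as usize
}
```

Because `evaluate` reverses the comparison, the document with the higher score ranks first: two exact matches beat one exact match plus one match with a typo.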

@ -1,31 +0,0 @@
use std::cmp::Ordering;
use crate::{RawDocument, MResult};
use super::{Criterion, Context, ContextMut, prepare_query_distances};
pub struct Words;
impl Criterion for Words {
fn name(&self) -> &str { "words" }
fn prepare<'h, 'p, 'tag, 'txn, 'q, 'r>(
&self,
ctx: ContextMut<'h, 'p, 'tag, 'txn, 'q>,
documents: &mut [RawDocument<'r, 'tag>],
) -> MResult<()>
{
prepare_query_distances(documents, ctx.query_mapping, ctx.postings_lists);
Ok(())
}
fn evaluate(&self, _ctx: &Context, lhs: &RawDocument, rhs: &RawDocument) -> Ordering {
#[inline]
fn number_of_query_words(distances: &[Option<u8>]) -> usize {
distances.iter().cloned().filter(Option::is_some).count()
}
let lhs = number_of_query_words(&lhs.processed_distances);
let rhs = number_of_query_words(&rhs.processed_distances);
lhs.cmp(&rhs).reverse()
}
}

@ -1,37 +0,0 @@
use std::cmp::Ordering;
use slice_group_by::GroupBy;
use crate::bucket_sort::SimpleMatch;
use crate::{RawDocument, MResult};
use super::{Criterion, Context, ContextMut, prepare_bare_matches};
pub struct WordsPosition;
impl Criterion for WordsPosition {
fn name(&self) -> &str { "words position" }
fn prepare<'h, 'p, 'tag, 'txn, 'q, 'r>(
&self,
ctx: ContextMut<'h, 'p, 'tag, 'txn, 'q>,
documents: &mut [RawDocument<'r, 'tag>],
) -> MResult<()>
{
prepare_bare_matches(documents, ctx.postings_lists, ctx.query_mapping);
Ok(())
}
fn evaluate(&self, _ctx: &Context, lhs: &RawDocument, rhs: &RawDocument) -> Ordering {
#[inline]
fn sum_words_position(matches: &[SimpleMatch]) -> usize {
let mut sum_words_position = 0;
for group in matches.linear_group_by_key(|bm| bm.query_index) {
sum_words_position += group[0].word_index as usize;
}
sum_words_position
}
let lhs = sum_words_position(&lhs.processed_matches);
let rhs = sum_words_position(&rhs.processed_matches);
lhs.cmp(&rhs)
}
}
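
As a rough illustration of the words-position score: only the first match of each query word contributes its `word_index`. The sketch below assumes matches arrive grouped by `query_index`, uses a reduced `SimpleMatch` with only the two fields read here, and replaces `linear_group_by_key` with a hand-written run detection.

```rust
// Reduced stand-in for the crate's `SimpleMatch`.
#[derive(Clone, Copy)]
struct SimpleMatch {
    query_index: u16,
    word_index: u16,
}

// Adds up the position of the first match in each run of equal `query_index`.
fn sum_words_position(matches: &[SimpleMatch]) -> usize {
    let mut sum = 0;
    let mut previous = None;
    for m in matches {
        if previous != Some(m.query_index) {
            sum += m.word_index as usize;
            previous = Some(m.query_index);
        }
    }
    sum
}
```

A smaller sum means the query words appear earlier in the document, which is why `evaluate` compares the sums without reversing.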

File diff suppressed because it is too large

@ -1,103 +0,0 @@
use hashbrown::HashMap;
use std::hash::Hash;
pub struct DistinctMap<K> {
inner: HashMap<K, usize>,
limit: usize,
len: usize,
}
impl<K: Hash + Eq> DistinctMap<K> {
pub fn new(limit: usize) -> Self {
DistinctMap {
inner: HashMap::new(),
limit,
len: 0,
}
}
pub fn len(&self) -> usize {
self.len
}
}
pub struct BufferedDistinctMap<'a, K> {
internal: &'a mut DistinctMap<K>,
inner: HashMap<K, usize>,
len: usize,
}
impl<'a, K: Hash + Eq> BufferedDistinctMap<'a, K> {
pub fn new(internal: &'a mut DistinctMap<K>) -> BufferedDistinctMap<'a, K> {
BufferedDistinctMap {
internal,
inner: HashMap::new(),
len: 0,
}
}
pub fn register(&mut self, key: K) -> bool {
let internal_seen = self.internal.inner.get(&key).unwrap_or(&0);
let inner_seen = self.inner.entry(key).or_insert(0);
let seen = *internal_seen + *inner_seen;
if seen < self.internal.limit {
*inner_seen += 1;
self.len += 1;
true
} else {
false
}
}
pub fn register_without_key(&mut self) -> bool {
self.len += 1;
true
}
pub fn transfert_to_internal(&mut self) {
for (k, v) in self.inner.drain() {
let value = self.internal.inner.entry(k).or_insert(0);
*value += v;
}
self.internal.len += self.len;
self.len = 0;
}
pub fn len(&self) -> usize {
self.internal.len() + self.len
}
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn easy_distinct_map() {
let mut map = DistinctMap::new(2);
let mut buffered = BufferedDistinctMap::new(&mut map);
for x in &[1, 1, 1, 2, 3, 4, 5, 6, 6, 6, 6, 6] {
buffered.register(x);
}
buffered.transfert_to_internal();
assert_eq!(map.len(), 8);
let mut map = DistinctMap::new(2);
let mut buffered = BufferedDistinctMap::new(&mut map);
assert_eq!(buffered.register(1), true);
assert_eq!(buffered.register(1), true);
assert_eq!(buffered.register(1), false);
assert_eq!(buffered.register(1), false);
assert_eq!(buffered.register(2), true);
assert_eq!(buffered.register(3), true);
assert_eq!(buffered.register(2), true);
assert_eq!(buffered.register(2), false);
buffered.transfert_to_internal();
assert_eq!(map.len(), 5);
}
}
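
The staging idea behind `BufferedDistinctMap` (count tentatively, then merge into the long-lived map) can be sketched with a single struct and `std::collections::HashMap`. This is an illustrative simplification, not the crate's type: `StagedDistinct`, `commit` and `discard` are hypothetical names, and the borrowed inner map is replaced by two owned maps to keep lifetimes out of the way.

```rust
use std::collections::HashMap;
use std::hash::Hash;

// Hypothetical single-struct variant of the buffered distinct map.
struct StagedDistinct<K> {
    committed: HashMap<K, usize>,
    staged: HashMap<K, usize>,
    limit: usize,
    committed_len: usize,
    staged_len: usize,
}

impl<K: Hash + Eq> StagedDistinct<K> {
    fn new(limit: usize) -> Self {
        StagedDistinct {
            committed: HashMap::new(),
            staged: HashMap::new(),
            limit,
            committed_len: 0,
            staged_len: 0,
        }
    }

    // Accepts the key only while committed + staged occurrences stay under the limit.
    fn register(&mut self, key: K) -> bool {
        let committed = self.committed.get(&key).cloned().unwrap_or(0);
        let staged = self.staged.entry(key).or_insert(0);
        if committed + *staged < self.limit {
            *staged += 1;
            self.staged_len += 1;
            true
        } else {
            false
        }
    }

    // Merges the staged counts into the committed ones.
    fn commit(&mut self) {
        for (key, count) in self.staged.drain() {
            *self.committed.entry(key).or_insert(0) += count;
        }
        self.committed_len += self.staged_len;
        self.staged_len = 0;
    }

    // Drops the staged counts, leaving the committed state untouched.
    fn discard(&mut self) {
        self.staged.clear();
        self.staged_len = 0;
    }

    fn len(&self) -> usize {
        self.committed_len + self.staged_len
    }
}
```

Staging lets a caller evaluate a batch of candidate documents and keep the counts only if the batch is actually used.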

@ -1,224 +0,0 @@
use crate::serde::{DeserializerError, SerializerError};
use serde_json::Error as SerdeJsonError;
use pest::error::Error as PestError;
use crate::filters::Rule;
use std::{error, fmt, io};
pub use bincode::Error as BincodeError;
pub use fst::Error as FstError;
pub use heed::Error as HeedError;
pub use pest::error as pest_error;
use meilisearch_error::{ErrorCode, Code};
pub type MResult<T> = Result<T, Error>;
#[derive(Debug)]
pub enum Error {
Bincode(bincode::Error),
Deserializer(DeserializerError),
FacetError(FacetError),
FilterParseError(PestError<Rule>),
Fst(fst::Error),
Heed(heed::Error),
IndexAlreadyExists,
Io(io::Error),
MaxFieldsLimitExceeded,
MissingDocumentId,
MissingPrimaryKey,
Schema(meilisearch_schema::Error),
SchemaMissing,
SerdeJson(SerdeJsonError),
Serializer(SerializerError),
VersionMismatch(String),
WordIndexMissing,
}
impl ErrorCode for Error {
fn error_code(&self) -> Code {
use Error::*;
match self {
FacetError(_) => Code::Facet,
FilterParseError(_) => Code::Filter,
IndexAlreadyExists => Code::IndexAlreadyExists,
MissingPrimaryKey => Code::MissingPrimaryKey,
MissingDocumentId => Code::MissingDocumentId,
MaxFieldsLimitExceeded => Code::MaxFieldsLimitExceeded,
Schema(s) => s.error_code(),
WordIndexMissing
| SchemaMissing => Code::InvalidState,
Heed(_)
| Fst(_)
| SerdeJson(_)
| Bincode(_)
| Serializer(_)
| Deserializer(_)
| VersionMismatch(_)
| Io(_) => Code::Internal,
}
}
}
impl From<io::Error> for Error {
fn from(error: io::Error) -> Error {
Error::Io(error)
}
}
impl From<PestError<Rule>> for Error {
fn from(error: PestError<Rule>) -> Error {
Error::FilterParseError(error.renamed_rules(|r| {
let s = match r {
Rule::or => "OR",
Rule::and => "AND",
Rule::not => "NOT",
Rule::string => "string",
Rule::word => "word",
Rule::greater => "field > value",
Rule::less => "field < value",
Rule::eq => "field = value",
Rule::leq => "field <= value",
Rule::geq => "field >= value",
Rule::key => "key",
_ => "other",
};
s.to_string()
}))
}
}
impl From<FacetError> for Error {
fn from(error: FacetError) -> Error {
Error::FacetError(error)
}
}
impl From<meilisearch_schema::Error> for Error {
fn from(error: meilisearch_schema::Error) -> Error {
Error::Schema(error)
}
}
impl From<HeedError> for Error {
fn from(error: HeedError) -> Error {
Error::Heed(error)
}
}
impl From<FstError> for Error {
fn from(error: FstError) -> Error {
Error::Fst(error)
}
}
impl From<SerdeJsonError> for Error {
fn from(error: SerdeJsonError) -> Error {
Error::SerdeJson(error)
}
}
impl From<BincodeError> for Error {
fn from(error: BincodeError) -> Error {
Error::Bincode(error)
}
}
impl From<SerializerError> for Error {
fn from(error: SerializerError) -> Error {
match error {
SerializerError::DocumentIdNotFound => Error::MissingDocumentId,
e => Error::Serializer(e),
}
}
}
impl From<DeserializerError> for Error {
fn from(error: DeserializerError) -> Error {
Error::Deserializer(error)
}
}
impl fmt::Display for Error {
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
use self::Error::*;
match self {
Bincode(e) => write!(f, "bincode error; {}", e),
Deserializer(e) => write!(f, "deserializer error; {}", e),
FacetError(e) => write!(f, "error processing facet filter: {}", e),
FilterParseError(e) => write!(f, "error parsing filter; {}", e),
Fst(e) => write!(f, "fst error; {}", e),
Heed(e) => write!(f, "heed error; {}", e),
IndexAlreadyExists => write!(f, "index already exists"),
Io(e) => write!(f, "{}", e),
MaxFieldsLimitExceeded => write!(f, "maximum number of fields in a document exceeded"),
MissingDocumentId => write!(f, "document id is missing"),
MissingPrimaryKey => write!(f, "schema cannot be built without a primary key"),
Schema(e) => write!(f, "schema error; {}", e),
SchemaMissing => write!(f, "this index does not have a schema"),
SerdeJson(e) => write!(f, "serde json error; {}", e),
Serializer(e) => write!(f, "serializer error; {}", e),
VersionMismatch(version) => write!(f, "Cannot open database, expected MeiliSearch engine version: {}, current engine version: {}.{}.{}",
version,
env!("CARGO_PKG_VERSION_MAJOR"),
env!("CARGO_PKG_VERSION_MINOR"),
env!("CARGO_PKG_VERSION_PATCH")),
WordIndexMissing => write!(f, "this index does not have a word index"),
}
}
}
impl error::Error for Error {}
struct FilterParseError(PestError<Rule>);
impl fmt::Display for FilterParseError {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
use crate::pest_error::LineColLocation::*;
let (line, column) = match self.0.line_col {
// A span carries (line, column) pairs for its start and its end; report the start.
Span((line, column), _) => (line, column),
Pos((line, column)) => (line, column),
};
write!(f, "parsing error on line {} at column {}: {}", line, column, self.0.variant.message())
}
}
#[derive(Debug)]
pub enum FacetError {
EmptyArray,
ParsingError(String),
UnexpectedToken { expected: &'static [&'static str], found: String },
InvalidFormat(String),
AttributeNotFound(String),
AttributeNotSet { expected: Vec<String>, found: String },
InvalidDocumentAttribute(String),
NoAttributesForFaceting,
}
impl FacetError {
pub fn unexpected_token(expected: &'static [&'static str], found: impl ToString) -> FacetError {
FacetError::UnexpectedToken{ expected, found: found.to_string() }
}
pub fn attribute_not_set(expected: Vec<String>, found: impl ToString) -> FacetError {
FacetError::AttributeNotSet{ expected, found: found.to_string() }
}
}
impl fmt::Display for FacetError {
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
use FacetError::*;
match self {
EmptyArray => write!(f, "empty array in facet filter is unspecified behavior"),
ParsingError(msg) => write!(f, "parsing error: {}", msg),
UnexpectedToken { expected, found } => write!(f, "unexpected token {}, expected {}", found, expected.join(" or ")),
InvalidFormat(found) => write!(f, "invalid facet: {}, facets should be \"facetName:facetValue\"", found),
AttributeNotFound(attr) => write!(f, "unknown {:?} attribute", attr),
AttributeNotSet { found, expected } => write!(f, "`{}` is not set as a faceted attribute. available facet attributes: {}", found, expected.join(", ")),
InvalidDocumentAttribute(attr) => write!(f, "invalid document attribute {}, accepted types: String and [String]", attr),
NoAttributesForFaceting => write!(f, "impossible to perform faceted search, no attributes for faceting are set"),
}
}
}
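
The many `From` implementations above exist so that callers can use `?` on lower-level results and get the crate-wide error type back. The pattern can be sketched with standard-library types only; `AppError` and `parse_limit` are hypothetical names used for illustration and are not part of the crate.

```rust
use std::error;
use std::fmt;
use std::num::ParseIntError;

// Hypothetical top-level error wrapping one lower-level error type.
#[derive(Debug)]
enum AppError {
    Parse(ParseIntError),
    Empty,
}

// This conversion is what lets `?` turn a `ParseIntError` into an `AppError`.
impl From<ParseIntError> for AppError {
    fn from(error: ParseIntError) -> AppError {
        AppError::Parse(error)
    }
}

impl fmt::Display for AppError {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        match self {
            AppError::Parse(e) => write!(f, "parse error; {}", e),
            AppError::Empty => write!(f, "input is empty"),
        }
    }
}

impl error::Error for AppError {}

// `?` converts the parse failure through the `From` implementation above.
fn parse_limit(s: &str) -> Result<u32, AppError> {
    let s = s.trim();
    if s.is_empty() {
        return Err(AppError::Empty);
    }
    let limit = s.parse::<u32>()?;
    Ok(limit)
}
```

Each wrapped error keeps its own message and only gains a short prefix, which matches the `"... error; {}"` style of the `Display` implementation above.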

@ -1,357 +0,0 @@
use std::borrow::Cow;
use std::collections::HashMap;
use std::hash::Hash;
use std::ops::Deref;
use cow_utils::CowUtils;
use either::Either;
use heed::types::{Str, OwnedType};
use indexmap::IndexMap;
use serde_json::Value;
use meilisearch_schema::{FieldId, Schema};
use meilisearch_types::DocumentId;
use crate::database::MainT;
use crate::error::{FacetError, MResult};
use crate::store::BEU16;
/// Data structure used to represent a boolean expression in the form of nested arrays.
/// Values in the outer array are and-ed together, values in the inner arrays are or-ed together.
#[derive(Debug, PartialEq)]
pub struct FacetFilter(Vec<Either<Vec<FacetKey>, FacetKey>>);
impl Deref for FacetFilter {
type Target = Vec<Either<Vec<FacetKey>, FacetKey>>;
fn deref(&self) -> &Self::Target {
&self.0
}
}
impl FacetFilter {
pub fn from_str(
s: &str,
schema: &Schema,
attributes_for_faceting: &[FieldId],
) -> MResult<FacetFilter> {
if attributes_for_faceting.is_empty() {
return Err(FacetError::NoAttributesForFaceting.into());
}
let parsed = serde_json::from_str::<Value>(s).map_err(|e| FacetError::ParsingError(e.to_string()))?;
let mut filter = Vec::new();
match parsed {
Value::Array(and_exprs) => {
if and_exprs.is_empty() {
return Err(FacetError::EmptyArray.into());
}
for expr in and_exprs {
match expr {
Value::String(s) => {
let key = FacetKey::from_str(&s, schema, attributes_for_faceting)?;
filter.push(Either::Right(key));
}
Value::Array(or_exprs) => {
if or_exprs.is_empty() {
return Err(FacetError::EmptyArray.into());
}
let mut inner = Vec::new();
for expr in or_exprs {
match expr {
Value::String(s) => {
let key = FacetKey::from_str(&s, schema, attributes_for_faceting)?;
inner.push(key);
}
bad_value => return Err(FacetError::unexpected_token(&["String"], bad_value).into()),
}
}
filter.push(Either::Left(inner));
}
bad_value => return Err(FacetError::unexpected_token(&["Array", "String"], bad_value).into()),
}
}
Ok(Self(filter))
}
bad_value => Err(FacetError::unexpected_token(&["Array"], bad_value).into()),
}
}
}
#[derive(Debug, Eq, PartialEq, Hash)]
#[repr(C)]
pub struct FacetKey(FieldId, String);
impl FacetKey {
pub fn new(field_id: FieldId, value: String) -> Self {
let value = match value.cow_to_lowercase() {
Cow::Borrowed(_) => value,
Cow::Owned(s) => s,
};
Self(field_id, value)
}
pub fn key(&self) -> FieldId {
self.0
}
pub fn value(&self) -> &str {
&self.1
}
// TODO improve parser
fn from_str(
s: &str,
schema: &Schema,
attributes_for_faceting: &[FieldId],
) -> Result<Self, FacetError> {
let mut split = s.splitn(2, ':');
let key = split
.next()
.ok_or_else(|| FacetError::InvalidFormat(s.to_string()))?
.trim();
let field_id = schema
.id(key)
.ok_or_else(|| FacetError::AttributeNotFound(key.to_string()))?;
if !attributes_for_faceting.contains(&field_id) {
return Err(FacetError::attribute_not_set(
attributes_for_faceting
.iter()
.filter_map(|&id| schema.name(id))
.map(str::to_string)
.collect::<Vec<_>>(),
key))
}
let value = split
.next()
.ok_or_else(|| FacetError::InvalidFormat(s.to_string()))?
.trim();
// unquoting the string if need be:
let mut indices = value.char_indices();
let value = match (indices.next(), indices.last()) {
(Some((s, '\'')), Some((e, '\''))) |
(Some((s, '\"')), Some((e, '\"'))) => value[s + 1..e].to_string(),
_ => value.to_string(),
};
Ok(Self::new(field_id, value))
}
}
impl<'a> heed::BytesEncode<'a> for FacetKey {
type EItem = FacetKey;
fn bytes_encode(item: &'a Self::EItem) -> Option<Cow<'a, [u8]>> {
let mut buffer = Vec::with_capacity(2 + item.1.len());
let id = BEU16::new(item.key().into());
let id_bytes = OwnedType::bytes_encode(&id)?;
let value_bytes = Str::bytes_encode(item.value())?;
buffer.extend_from_slice(id_bytes.as_ref());
buffer.extend_from_slice(value_bytes.as_ref());
Some(Cow::Owned(buffer))
}
}
impl<'a> heed::BytesDecode<'a> for FacetKey {
type DItem = FacetKey;
fn bytes_decode(bytes: &'a [u8]) -> Option<Self::DItem> {
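// `split_at` panics when the input is shorter than the 2-byte field id.
if bytes.len() < 2 {
return None;
}
let (id_bytes, value_bytes) = bytes.split_at(2);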
let id = OwnedType::<BEU16>::bytes_decode(id_bytes)?;
let id = id.get().into();
let string = Str::bytes_decode(&value_bytes)?;
Some(FacetKey(id, string.to_string()))
}
}
pub fn add_to_facet_map(
facet_map: &mut HashMap<FacetKey, (String, Vec<DocumentId>)>,
field_id: FieldId,
value: Value,
document_id: DocumentId,
) -> Result<(), FacetError> {
let value = match value {
Value::String(s) => s,
// ignore null
Value::Null => return Ok(()),
value => return Err(FacetError::InvalidDocumentAttribute(value.to_string())),
};
let key = FacetKey::new(field_id, value.clone());
facet_map.entry(key).or_insert_with(|| (value, Vec::new())).1.push(document_id);
Ok(())
}
pub fn facet_map_from_docids(
rtxn: &heed::RoTxn<MainT>,
index: &crate::Index,
document_ids: &[DocumentId],
attributes_for_facetting: &[FieldId],
) -> MResult<HashMap<FacetKey, (String, Vec<DocumentId>)>> {
// A hashmap that associates a facet key with a pair containing the original facet attribute
// string with its case preserved, and a list of document ids for that facet attribute.
let mut facet_map: HashMap<FacetKey, (String, Vec<DocumentId>)> = HashMap::new();
for document_id in document_ids {
for result in index
.documents_fields
.document_fields(rtxn, *document_id)?
{
let (field_id, bytes) = result?;
if attributes_for_facetting.contains(&field_id) {
match serde_json::from_slice(bytes)? {
Value::Array(values) => {
for v in values {
add_to_facet_map(&mut facet_map, field_id, v, *document_id)?;
}
}
v => add_to_facet_map(&mut facet_map, field_id, v, *document_id)?,
};
}
}
}
Ok(facet_map)
}
pub fn facet_map_from_docs(
schema: &Schema,
documents: &HashMap<DocumentId, IndexMap<String, Value>>,
attributes_for_facetting: &[FieldId],
) -> MResult<HashMap<FacetKey, (String, Vec<DocumentId>)>> {
let mut facet_map = HashMap::new();
let attributes_for_facetting = attributes_for_facetting
.iter()
.filter_map(|&id| schema.name(id).map(|name| (id, name)))
.collect::<Vec<_>>();
for (id, document) in documents {
for (field_id, name) in &attributes_for_facetting {
if let Some(value) = document.get(*name) {
match value {
Value::Array(values) => {
for v in values {
add_to_facet_map(&mut facet_map, *field_id, v.clone(), *id)?;
}
}
v => add_to_facet_map(&mut facet_map, *field_id, v.clone(), *id)?,
}
}
}
}
Ok(facet_map)
}
#[cfg(test)]
mod test {
use super::*;
use meilisearch_schema::Schema;
#[test]
fn test_facet_key() {
let mut schema = Schema::default();
let id = schema.insert_with_position("hello").unwrap().0;
let facet_list = [schema.id("hello").unwrap()];
assert_eq!(
FacetKey::from_str("hello:12", &schema, &facet_list).unwrap(),
FacetKey::new(id, "12".to_string())
);
assert_eq!(
FacetKey::from_str("hello:\"foo bar\"", &schema, &facet_list).unwrap(),
FacetKey::new(id, "foo bar".to_string())
);
assert_eq!(
FacetKey::from_str("hello:'foo bar'", &schema, &facet_list).unwrap(),
FacetKey::new(id, "foo bar".to_string())
);
// weird case
assert_eq!(
FacetKey::from_str("hello:blabla:machin", &schema, &facet_list).unwrap(),
FacetKey::new(id, "blabla:machin".to_string())
);
assert_eq!(
FacetKey::from_str("hello:\"\"", &schema, &facet_list).unwrap(),
FacetKey::new(id, "".to_string())
);
assert_eq!(
FacetKey::from_str("hello:'", &schema, &facet_list).unwrap(),
FacetKey::new(id, "'".to_string())
);
assert_eq!(
FacetKey::from_str("hello:''", &schema, &facet_list).unwrap(),
FacetKey::new(id, "".to_string())
);
assert!(FacetKey::from_str("hello", &schema, &facet_list).is_err());
assert!(FacetKey::from_str("toto:12", &schema, &facet_list).is_err());
}
#[test]
fn test_parse_facet_array() {
use either::Either::{Left, Right};
let mut schema = Schema::default();
let _id = schema.insert_with_position("hello").unwrap();
let facet_list = [schema.id("hello").unwrap()];
assert_eq!(
FacetFilter::from_str("[[\"hello:12\"]]", &schema, &facet_list).unwrap(),
FacetFilter(vec![Left(vec![FacetKey(FieldId(0), "12".to_string())])])
);
assert_eq!(
FacetFilter::from_str("[\"hello:12\"]", &schema, &facet_list).unwrap(),
FacetFilter(vec![Right(FacetKey(FieldId(0), "12".to_string()))])
);
assert_eq!(
FacetFilter::from_str("[\"hello:12\", \"hello:13\"]", &schema, &facet_list).unwrap(),
FacetFilter(vec![
Right(FacetKey(FieldId(0), "12".to_string())),
Right(FacetKey(FieldId(0), "13".to_string()))
])
);
assert_eq!(
FacetFilter::from_str("[[\"hello:12\", \"hello:13\"]]", &schema, &facet_list).unwrap(),
FacetFilter(vec![Left(vec![
FacetKey(FieldId(0), "12".to_string()),
FacetKey(FieldId(0), "13".to_string())
])])
);
assert_eq!(
FacetFilter::from_str(
"[[\"hello:12\", \"hello:13\"], \"hello:14\"]",
&schema,
&facet_list
)
.unwrap(),
FacetFilter(vec![
Left(vec![
FacetKey(FieldId(0), "12".to_string()),
FacetKey(FieldId(0), "13".to_string())
]),
Right(FacetKey(FieldId(0), "14".to_string()))
])
);
// invalid array depths
assert!(FacetFilter::from_str(
"[[[\"hello:12\", \"hello:13\"], \"hello:14\"]]",
&schema,
&facet_list
)
.is_err());
assert!(FacetFilter::from_str(
"[[[\"hello:12\", \"hello:13\"]], \"hello:14\"]]",
&schema,
&facet_list
)
.is_err());
assert!(FacetFilter::from_str("\"hello:14\"", &schema, &facet_list).is_err());
// nonexistent key
assert!(FacetFilter::from_str("[\"foo:12\"]", &schema, &facet_list).is_err());
// invalid facet key
assert!(FacetFilter::from_str("[\"foo=12\"]", &schema, &facet_list).is_err());
assert!(FacetFilter::from_str("[\"foo12\"]", &schema, &facet_list).is_err());
assert!(FacetFilter::from_str("[\"\"]", &schema, &facet_list).is_err());
// empty array error
assert!(FacetFilter::from_str("[]", &schema, &facet_list).is_err());
assert!(FacetFilter::from_str("[\"hello:12\", []]", &schema, &facet_list).is_err());
}
}
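
The `name:value` splitting and unquoting done by `FacetKey::from_str` can be exercised without a schema. The sketch below is a simplified stand-in: `split_facet` is a hypothetical function that returns the attribute name instead of resolving it to a `FieldId`, reports a missing `:` as `None` instead of a `FacetError`, and lowercases the value with `str::to_lowercase` rather than `cow_utils`.

```rust
// Hypothetical schema-free version of the facet key parser.
// Returns (attribute name, lowercased value), or `None` when there is no ':'.
fn split_facet(s: &str) -> Option<(String, String)> {
    // Split on the first ':' only, so the value may itself contain colons.
    let mut split = s.splitn(2, ':');
    let key = split.next()?.trim();
    let value = split.next()?.trim();

    // Strip one pair of matching quotes when the value starts and ends with them.
    // A lone quote is kept as-is because `last()` finds no second character.
    let mut indices = value.char_indices();
    let value = match (indices.next(), indices.last()) {
        (Some((start, '\'')), Some((end, '\''))) |
        (Some((start, '"')), Some((end, '"'))) => &value[start + 1..end],
        _ => value,
    };

    Some((key.to_string(), value.to_lowercase()))
}
```

The cases mirror `test_facet_key` above: quoted values lose their quotes, extra colons stay in the value, and a single unmatched quote is treated as the value itself.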

@ -1,276 +0,0 @@
use std::str::FromStr;
use std::cmp::Ordering;
use crate::error::Error;
use crate::{store::Index, DocumentId, MainT};
use heed::RoTxn;
use meilisearch_schema::{FieldId, Schema};
use pest::error::{Error as PestError, ErrorVariant};
use pest::iterators::Pair;
use serde_json::{Value, Number};
use super::parser::Rule;
#[derive(Debug, PartialEq)]
enum ConditionType {
Greater,
Less,
Equal,
LessEqual,
GreaterEqual,
NotEqual,
}
/// We need to infer type when the filter is constructed
/// and match every possible types it can be parsed into.
#[derive(Debug)]
struct ConditionValue<'a> {
string: &'a str,
boolean: Option<bool>,
number: Option<Number>
}
impl<'a> ConditionValue<'a> {
pub fn new(value: &Pair<'a, Rule>) -> Self {
match value.as_rule() {
Rule::string | Rule::word => {
let string = value.as_str();
let boolean = match value.as_str() {
"true" => Some(true),
"false" => Some(false),
_ => None,
};
let number = Number::from_str(value.as_str()).ok();
ConditionValue { string, boolean, number }
},
_ => unreachable!(),
}
}
pub fn as_str(&self) -> &str {
self.string
}
pub fn as_number(&self) -> Option<&Number> {
self.number.as_ref()
}
pub fn as_bool(&self) -> Option<bool> {
self.boolean
}
}
#[derive(Debug)]
pub struct Condition<'a> {
field: FieldId,
condition: ConditionType,
value: ConditionValue<'a>
}
fn get_field_value<'a>(schema: &Schema, pair: Pair<'a, Rule>) -> Result<(FieldId, ConditionValue<'a>), Error> {
let mut items = pair.into_inner();
// lexing ensures that we at least have a key
let key = items.next().unwrap();
let field = schema
.id(key.as_str())
.ok_or_else(|| PestError::new_from_span(
ErrorVariant::CustomError {
message: format!(
"attribute `{}` not found, available attributes are: {}",
key.as_str(),
schema.names().collect::<Vec<_>>().join(", ")
),
},
key.as_span()))?;
let value = ConditionValue::new(&items.next().unwrap());
Ok((field, value))
}
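// Results may be imprecise with big numbers: when the two values share no
// integer representation they are compared as `f64`, which loses precision.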
fn compare_numbers(lhs: &Number, rhs: &Number) -> Option<Ordering> {
match (lhs.as_i64(), lhs.as_u64(), lhs.as_f64(),
rhs.as_i64(), rhs.as_u64(), rhs.as_f64()) {
// i64 u64 f64 i64 u64 f64
(Some(lhs), _, _, Some(rhs), _, _) => lhs.partial_cmp(&rhs),
(_, Some(lhs), _, _, Some(rhs), _) => lhs.partial_cmp(&rhs),
(_, _, Some(lhs), _, _, Some(rhs)) => lhs.partial_cmp(&rhs),
(_, _, _, _, _, _) => None,
}
}
impl<'a> Condition<'a> {
pub fn less(
item: Pair<'a, Rule>,
schema: &'a Schema,
) -> Result<Self, Error> {
let (field, value) = get_field_value(schema, item)?;
let condition = ConditionType::Less;
Ok(Self { field, condition, value })
}
pub fn greater(
item: Pair<'a, Rule>,
schema: &'a Schema,
) -> Result<Self, Error> {
let (field, value) = get_field_value(schema, item)?;
let condition = ConditionType::Greater;
Ok(Self { field, condition, value })
}
pub fn neq(
item: Pair<'a, Rule>,
schema: &'a Schema,
) -> Result<Self, Error> {
let (field, value) = get_field_value(schema, item)?;
let condition = ConditionType::NotEqual;
Ok(Self { field, condition, value })
}
pub fn geq(
item: Pair<'a, Rule>,
schema: &'a Schema,
) -> Result<Self, Error> {
let (field, value) = get_field_value(schema, item)?;
let condition = ConditionType::GreaterEqual;
Ok(Self { field, condition, value })
}
pub fn leq(
item: Pair<'a, Rule>,
schema: &'a Schema,
) -> Result<Self, Error> {
let (field, value) = get_field_value(schema, item)?;
let condition = ConditionType::LessEqual;
Ok(Self { field, condition, value })
}
pub fn eq(
item: Pair<'a, Rule>,
schema: &'a Schema,
) -> Result<Self, Error> {
let (field, value) = get_field_value(schema, item)?;
let condition = ConditionType::Equal;
Ok(Self { field, condition, value })
}
pub fn test(
&self,
reader: &RoTxn<MainT>,
index: &Index,
document_id: DocumentId,
) -> Result<bool, Error> {
match index.document_attribute::<Value>(reader, document_id, self.field)? {
Some(Value::Array(values)) => Ok(values.iter().any(|v| self.match_value(Some(v)))),
other => Ok(self.match_value(other.as_ref())),
}
}
fn match_value(&self, value: Option<&Value>) -> bool {
match value {
Some(Value::String(s)) => {
let value = self.value.as_str();
match self.condition {
ConditionType::Equal => unicase::eq(value, &s),
ConditionType::NotEqual => !unicase::eq(value, &s),
_ => false
}
},
Some(Value::Number(n)) => {
if let Some(value) = self.value.as_number() {
if let Some(ord) = compare_numbers(&n, value) {
let res = match self.condition {
ConditionType::Equal => ord == Ordering::Equal,
ConditionType::NotEqual => ord != Ordering::Equal,
ConditionType::GreaterEqual => ord != Ordering::Less,
ConditionType::LessEqual => ord != Ordering::Greater,
ConditionType::Greater => ord == Ordering::Greater,
ConditionType::Less => ord == Ordering::Less,
};
return res
}
}
false
},
Some(Value::Bool(b)) => {
if let Some(value) = self.value.as_bool() {
let res = match self.condition {
ConditionType::Equal => *b == value,
ConditionType::NotEqual => *b != value,
_ => false
};
return res
}
false
},
// if field is not supported (or not found), all values are different from it,
// so != should always return true in this case.
_ => self.condition == ConditionType::NotEqual,
}
}
}
#[cfg(test)]
mod test {
use super::*;
use serde_json::Number;
use std::cmp::Ordering;
#[test]
fn test_number_comp() {
// test both u64
let n1 = Number::from(1u64);
let n2 = Number::from(2u64);
assert_eq!(Some(Ordering::Less), compare_numbers(&n1, &n2));
assert_eq!(Some(Ordering::Greater), compare_numbers(&n2, &n1));
let n1 = Number::from(1u64);
let n2 = Number::from(1u64);
assert_eq!(Some(Ordering::Equal), compare_numbers(&n1, &n2));
// test both i64
let n1 = Number::from(1i64);
let n2 = Number::from(2i64);
assert_eq!(Some(Ordering::Less), compare_numbers(&n1, &n2));
assert_eq!(Some(Ordering::Greater), compare_numbers(&n2, &n1));
let n1 = Number::from(1i64);
let n2 = Number::from(1i64);
assert_eq!(Some(Ordering::Equal), compare_numbers(&n1, &n2));
// test both f64
let n1 = Number::from_f64(1f64).unwrap();
let n2 = Number::from_f64(2f64).unwrap();
assert_eq!(Some(Ordering::Less), compare_numbers(&n1, &n2));
assert_eq!(Some(Ordering::Greater), compare_numbers(&n2, &n1));
let n1 = Number::from_f64(1f64).unwrap();
let n2 = Number::from_f64(1f64).unwrap();
assert_eq!(Some(Ordering::Equal), compare_numbers(&n1, &n2));
// test one u64 and one f64
let n1 = Number::from_f64(1f64).unwrap();
let n2 = Number::from(2u64);
assert_eq!(Some(Ordering::Less), compare_numbers(&n1, &n2));
assert_eq!(Some(Ordering::Greater), compare_numbers(&n2, &n1));
// equality
let n1 = Number::from_f64(1f64).unwrap();
let n2 = Number::from(1u64);
assert_eq!(Some(Ordering::Equal), compare_numbers(&n1, &n2));
assert_eq!(Some(Ordering::Equal), compare_numbers(&n2, &n1));
// float is neg
let n1 = Number::from_f64(-1f64).unwrap();
let n2 = Number::from(1u64);
assert_eq!(Some(Ordering::Less), compare_numbers(&n1, &n2));
assert_eq!(Some(Ordering::Greater), compare_numbers(&n2, &n1));
// float is too big
let n1 = Number::from_f64(std::f64::MAX).unwrap();
let n2 = Number::from(1u64);
assert_eq!(Some(Ordering::Greater), compare_numbers(&n1, &n2));
assert_eq!(Some(Ordering::Less), compare_numbers(&n2, &n1));
// misc
let n1 = Number::from_f64(std::f64::MAX).unwrap();
let n2 = Number::from(std::u64::MAX);
assert_eq!(Some(Ordering::Greater), compare_numbers(&n1, &n2));
assert_eq!(Some(Ordering::Less), compare_numbers(&n2, &n1));
}
}
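
The fallthrough in `compare_numbers` (try `i64`, then `u64`, then `f64`) and its precision caveat can be shown with a small model. `Num` below is a hypothetical stand-in for `serde_json::Number`; its accessors are an assumption modelled on how that type is understood to behave (floats expose no integer view, and non-negative integers expose an `i64` view only when they fit).

```rust
use std::cmp::Ordering;

// Hypothetical model of a JSON number: non-negative integer, negative integer, or float.
#[derive(Clone, Copy)]
enum Num {
    Pos(u64),
    Neg(i64),
    Float(f64),
}

impl Num {
    fn as_i64(self) -> Option<i64> {
        match self {
            Num::Pos(u) if u <= i64::max_value() as u64 => Some(u as i64),
            Num::Neg(i) => Some(i),
            _ => None,
        }
    }
    fn as_u64(self) -> Option<u64> {
        match self {
            Num::Pos(u) => Some(u),
            _ => None,
        }
    }
    fn as_f64(self) -> Option<f64> {
        Some(match self {
            Num::Pos(u) => u as f64,
            Num::Neg(i) => i as f64,
            Num::Float(f) => f,
        })
    }
}

// Same arm order as `compare_numbers` above: the first shared representation wins.
fn compare_numbers(lhs: Num, rhs: Num) -> Option<Ordering> {
    match (lhs.as_i64(), lhs.as_u64(), lhs.as_f64(),
           rhs.as_i64(), rhs.as_u64(), rhs.as_f64()) {
        (Some(lhs), _, _, Some(rhs), _, _) => lhs.partial_cmp(&rhs),
        (_, Some(lhs), _, _, Some(rhs), _) => lhs.partial_cmp(&rhs),
        (_, _, Some(lhs), _, _, Some(rhs)) => lhs.partial_cmp(&rhs),
        (_, _, _, _, _, _) => None,
    }
}
```

Under this model, comparing the integer 2^53 + 1 with the float 2^53 reports equality, because the integer is rounded when converted to `f64`; that is the kind of big-number imprecision the comment on `compare_numbers` warns about.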

@ -1,127 +0,0 @@
mod parser;
mod condition;
pub(crate) use parser::Rule;
use std::ops::Not;
use condition::Condition;
use crate::error::Error;
use crate::{DocumentId, MainT, store::Index};
use heed::RoTxn;
use meilisearch_schema::Schema;
use parser::{PREC_CLIMBER, FilterParser};
use pest::iterators::{Pair, Pairs};
use pest::Parser;
type FilterResult<'a> = Result<Filter<'a>, Error>;
#[derive(Debug)]
pub enum Filter<'a> {
Condition(Condition<'a>),
Or(Box<Self>, Box<Self>),
And(Box<Self>, Box<Self>),
Not(Box<Self>),
}
impl<'a> Filter<'a> {
pub fn parse(expr: &'a str, schema: &'a Schema) -> FilterResult<'a> {
let mut lexed = FilterParser::parse(Rule::prgm, expr)?;
Self::build(lexed.next().unwrap().into_inner(), schema)
}
pub fn test(
&self,
reader: &RoTxn<MainT>,
index: &Index,
document_id: DocumentId,
) -> Result<bool, Error> {
use Filter::*;
match self {
Condition(c) => c.test(reader, index, document_id),
Or(lhs, rhs) => Ok(
lhs.test(reader, index, document_id)? || rhs.test(reader, index, document_id)?
),
And(lhs, rhs) => Ok(
lhs.test(reader, index, document_id)? && rhs.test(reader, index, document_id)?
),
Not(op) => op.test(reader, index, document_id).map(bool::not),
}
}
fn build(expression: Pairs<'a, Rule>, schema: &'a Schema) -> FilterResult<'a> {
PREC_CLIMBER.climb(
expression,
|pair: Pair<Rule>| match pair.as_rule() {
Rule::eq => Ok(Filter::Condition(Condition::eq(pair, schema)?)),
Rule::greater => Ok(Filter::Condition(Condition::greater(pair, schema)?)),
Rule::less => Ok(Filter::Condition(Condition::less(pair, schema)?)),
Rule::neq => Ok(Filter::Condition(Condition::neq(pair, schema)?)),
Rule::geq => Ok(Filter::Condition(Condition::geq(pair, schema)?)),
Rule::leq => Ok(Filter::Condition(Condition::leq(pair, schema)?)),
Rule::prgm => Self::build(pair.into_inner(), schema),
Rule::term => Self::build(pair.into_inner(), schema),
Rule::not => Ok(Filter::Not(Box::new(Self::build(
pair.into_inner(),
schema,
)?))),
_ => unreachable!(),
},
|lhs: FilterResult, op: Pair<Rule>, rhs: FilterResult| match op.as_rule() {
Rule::or => Ok(Filter::Or(Box::new(lhs?), Box::new(rhs?))),
Rule::and => Ok(Filter::And(Box::new(lhs?), Box::new(rhs?))),
_ => unreachable!(),
},
)
}
}
#[cfg(test)]
mod test {
use super::*;
#[test]
fn invalid_syntax() {
assert!(FilterParser::parse(Rule::prgm, "field : id").is_err());
assert!(FilterParser::parse(Rule::prgm, "field=hello hello").is_err());
assert!(FilterParser::parse(Rule::prgm, "field=hello OR OR").is_err());
assert!(FilterParser::parse(Rule::prgm, "OR field:hello").is_err());
assert!(FilterParser::parse(Rule::prgm, r#"field="hello world"#).is_err());
assert!(FilterParser::parse(Rule::prgm, r#"field='hello world"#).is_err());
assert!(FilterParser::parse(Rule::prgm, "NOT field=").is_err());
assert!(FilterParser::parse(Rule::prgm, "N").is_err());
assert!(FilterParser::parse(Rule::prgm, "(field=1").is_err());
assert!(FilterParser::parse(Rule::prgm, "(field=1))").is_err());
assert!(FilterParser::parse(Rule::prgm, "field=1ORfield=2").is_err());
assert!(FilterParser::parse(Rule::prgm, "field=1 ( OR field=2)").is_err());
assert!(FilterParser::parse(Rule::prgm, "hello world=1").is_err());
assert!(FilterParser::parse(Rule::prgm, "").is_err());
assert!(FilterParser::parse(Rule::prgm, r#"((((((hello=world)))))"#).is_err());
}
#[test]
fn valid_syntax() {
assert!(FilterParser::parse(Rule::prgm, "field = id").is_ok());
assert!(FilterParser::parse(Rule::prgm, "field=id").is_ok());
assert!(FilterParser::parse(Rule::prgm, r#"field >= 10"#).is_ok());
assert!(FilterParser::parse(Rule::prgm, r#"field <= 10"#).is_ok());
assert!(FilterParser::parse(Rule::prgm, r#"field="hello world""#).is_ok());
assert!(FilterParser::parse(Rule::prgm, r#"field='hello world'"#).is_ok());
assert!(FilterParser::parse(Rule::prgm, r#"field > 10"#).is_ok());
assert!(FilterParser::parse(Rule::prgm, r#"field < 10"#).is_ok());
assert!(FilterParser::parse(Rule::prgm, r#"field < 10 AND NOT field=5"#).is_ok());
assert!(FilterParser::parse(Rule::prgm, r#"field < 10 AND NOT field > 7.5"#).is_ok());
assert!(FilterParser::parse(Rule::prgm, r#"field=true OR NOT field=5"#).is_ok());
assert!(FilterParser::parse(Rule::prgm, r#"NOT field=true OR NOT field=5"#).is_ok());
assert!(FilterParser::parse(Rule::prgm, r#"field='hello world' OR ( NOT field=true OR NOT field=5 )"#).is_ok());
assert!(FilterParser::parse(Rule::prgm, r#"field='hello \'worl\'d' OR ( NOT field=true OR NOT field=5 )"#).is_ok());
assert!(FilterParser::parse(Rule::prgm, r#"field="hello \"worl\"d" OR ( NOT field=true OR NOT field=5 )"#).is_ok());
assert!(FilterParser::parse(Rule::prgm, r#"((((((hello=world))))))"#).is_ok());
assert!(FilterParser::parse(Rule::prgm, r#""foo bar" > 10"#).is_ok());
assert!(FilterParser::parse(Rule::prgm, r#""foo bar" = 10"#).is_ok());
assert!(FilterParser::parse(Rule::prgm, r#"'foo bar' = 10"#).is_ok());
assert!(FilterParser::parse(Rule::prgm, r#"'foo bar' <= 10"#).is_ok());
assert!(FilterParser::parse(Rule::prgm, r#"'foo bar' != 10"#).is_ok());
assert!(FilterParser::parse(Rule::prgm, r#"bar != 10"#).is_ok());
}
}
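
The short-circuit evaluation in `Filter::test` above can be sketched in isolation. This is a minimal sketch under stated assumptions, not the crate's implementation: `Expr`, `demo`, and the `check` closure are hypothetical stand-ins, and the real `Condition::test` reads from an LMDB transaction that is not reproduced here.

```rust
/// Hypothetical stand-in for `Filter`: leaves carry a plain field name
/// instead of a parsed `Condition`.
#[derive(Debug)]
enum Expr {
    Condition(&'static str),
    Or(Box<Expr>, Box<Expr>),
    And(Box<Expr>, Box<Expr>),
    Not(Box<Expr>),
}

impl Expr {
    /// Evaluates the tree; `check` plays the role of `Condition::test`.
    /// `||` and `&&` short-circuit, so the right-hand side is skipped
    /// whenever the left-hand side already decides the result.
    fn test<F>(&self, check: &F) -> Result<bool, String>
    where
        F: Fn(&str) -> Result<bool, String>,
    {
        match self {
            Expr::Condition(name) => check(name),
            Expr::Or(lhs, rhs) => Ok(lhs.test(check)? || rhs.test(check)?),
            Expr::And(lhs, rhs) => Ok(lhs.test(check)? && rhs.test(check)?),
            Expr::Not(inner) => inner.test(check).map(|b| !b),
        }
    }
}

/// Builds the tree for `a AND NOT b` and evaluates it against two booleans.
fn demo(a: bool, b: bool) -> Result<bool, String> {
    let tree = Expr::And(
        Box::new(Expr::Condition("a")),
        Box::new(Expr::Not(Box::new(Expr::Condition("b")))),
    );
    tree.test(&|name: &str| match name {
        "a" => Ok(a),
        "b" => Ok(b),
        other => Err(format!("unknown field: {}", other)),
    })
}
```

With this sketch, `demo(true, false)` evaluates to `Ok(true)`, while `demo(false, true)` returns `Ok(false)` without ever evaluating the `NOT b` branch.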

@ -1,28 +0,0 @@
key = _{quoted | word}
value = _{quoted | word}
quoted = _{ (PUSH("'") | PUSH("\"")) ~ string ~ POP }
string = {char*}
word = ${(LETTER | NUMBER | "_" | "-" | ".")+}
char = _{ !(PEEK | "\\") ~ ANY
| "\\" ~ (PEEK | "\\" | "/" | "b" | "f" | "n" | "r" | "t")
| "\\" ~ ("u" ~ ASCII_HEX_DIGIT{4})}
condition = _{eq | greater | less | geq | leq | neq}
geq = {key ~ ">=" ~ value}
leq = {key ~ "<=" ~ value}
neq = {key ~ "!=" ~ value}
eq = {key ~ "=" ~ value}
greater = {key ~ ">" ~ value}
less = {key ~ "<" ~ value}
prgm = {SOI ~ expr ~ EOI}
expr = _{ ( term ~ (operation ~ term)* ) }
term = { ("(" ~ expr ~ ")") | condition | not }
operation = _{ and | or }
and = {"AND"}
or = {"OR"}
not = {"NOT" ~ term}
WHITESPACE = _{ " " }

@ -1,12 +0,0 @@
use once_cell::sync::Lazy;
use pest::prec_climber::{Operator, Assoc, PrecClimber};
pub static PREC_CLIMBER: Lazy<PrecClimber<Rule>> = Lazy::new(|| {
use Assoc::*;
use Rule::*;
pest::prec_climber::PrecClimber::new(vec![Operator::new(or, Left), Operator::new(and, Left)])
});
#[derive(Parser)]
#[grammar = "filters/parser/grammar.pest"]
pub struct FilterParser;

@ -1,134 +0,0 @@
use std::cmp::min;
use std::collections::BTreeMap;
use std::ops::{Index, IndexMut};
// A simple wrapper around Vec that keeps the data contiguous while letting us index it like a 2D array.
struct N2Array<T> {
y_size: usize,
buf: Vec<T>,
}
impl<T: Clone> N2Array<T> {
fn new(x: usize, y: usize, value: T) -> N2Array<T> {
N2Array {
y_size: y,
buf: vec![value; x * y],
}
}
}
impl<T> Index<(usize, usize)> for N2Array<T> {
type Output = T;
#[inline]
fn index(&self, (x, y): (usize, usize)) -> &T {
&self.buf[(x * self.y_size) + y]
}
}
impl<T> IndexMut<(usize, usize)> for N2Array<T> {
#[inline]
fn index_mut(&mut self, (x, y): (usize, usize)) -> &mut T {
&mut self.buf[(x * self.y_size) + y]
}
}
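// Returns the smallest Damerau-Levenshtein distance between `source` and any prefix of
// `target` that is at least as long as `source`, along with the length of that prefix.
pub fn prefix_damerau_levenshtein(source: &[u8], target: &[u8]) -> (u32, usize) {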
let (n, m) = (source.len(), target.len());
assert!(
n <= m,
"the source string must not be longer than the target one"
);
if n == 0 {
return (m as u32, 0);
}
if m == 0 {
return (n as u32, 0);
}
if n == m && source == target {
return (0, m);
}
let inf = n + m;
let mut matrix = N2Array::new(n + 2, m + 2, 0);
matrix[(0, 0)] = inf;
for i in 0..n + 1 {
matrix[(i + 1, 0)] = inf;
matrix[(i + 1, 1)] = i;
}
for j in 0..m + 1 {
matrix[(0, j + 1)] = inf;
matrix[(1, j + 1)] = j;
}
let mut last_row = BTreeMap::new();
for (row, char_s) in source.iter().enumerate() {
let mut last_match_col = 0;
let row = row + 1;
for (col, char_t) in target.iter().enumerate() {
let col = col + 1;
let last_match_row = *last_row.get(&char_t).unwrap_or(&0);
let cost = if char_s == char_t { 0 } else { 1 };
let dist_add = matrix[(row, col + 1)] + 1;
let dist_del = matrix[(row + 1, col)] + 1;
let dist_sub = matrix[(row, col)] + cost;
let dist_trans = matrix[(last_match_row, last_match_col)]
+ (row - last_match_row - 1)
+ 1
+ (col - last_match_col - 1);
let dist = min(min(dist_add, dist_del), min(dist_sub, dist_trans));
matrix[(row + 1, col + 1)] = dist;
if cost == 0 {
last_match_col = col;
}
}
last_row.insert(char_s, row);
}
let mut minimum = (u32::max_value(), 0);
for x in n..=m {
let dist = matrix[(n + 1, x + 1)] as u32;
if dist < minimum.0 {
minimum = (dist, x)
}
}
minimum
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn matched_length() {
let query = "Levenste";
let text = "Levenshtein";
let (dist, length) = prefix_damerau_levenshtein(query.as_bytes(), text.as_bytes());
assert_eq!(dist, 1);
assert_eq!(&text[..length], "Levenshte");
}
#[test]
#[should_panic]
fn matched_length_panic() {
let query = "Levenshtein";
let text = "Levenste";
// this function will panic if source is longer than target
prefix_damerau_levenshtein(query.as_bytes(), text.as_bytes());
}
}
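
The idea behind the prefix distance can be sketched with a smaller routine. This is a simplified sketch, not the function above: it uses plain Levenshtein (no transpositions) with a rolling row instead of the full matrix, and the name `prefix_levenshtein` is hypothetical. As in the original, only prefixes of `target` at least as long as `source` are considered, and the first minimum wins.

```rust
/// Simplified sketch: plain Levenshtein distance (no transpositions) between
/// `source` and the prefixes of `target`; returns (best distance, prefix length).
fn prefix_levenshtein(source: &[u8], target: &[u8]) -> (usize, usize) {
    let (n, m) = (source.len(), target.len());
    // prev[j] holds the distance between source[..i - 1] and target[..j].
    let mut prev: Vec<usize> = (0..=m).collect();
    for i in 1..=n {
        // curr[0] = i: deleting the first i bytes of source yields the empty prefix.
        let mut curr = vec![i; m + 1];
        for j in 1..=m {
            let cost = if source[i - 1] == target[j - 1] { 0 } else { 1 };
            curr[j] = (prev[j] + 1) // delete a byte from source
                .min(curr[j - 1] + 1) // insert a byte into source
                .min(prev[j - 1] + cost); // substitute or keep
        }
        prev = curr;
    }
    // Scan the prefixes that are at least as long as source and keep the first minimum.
    let mut best = (usize::MAX, 0);
    for len in n.min(m)..=m {
        if prev[len] < best.0 {
            best = (prev[len], len);
        }
    }
    best
}
```

For the inputs used in the `matched_length` test, `prefix_levenshtein(b"Levenste", b"Levenshtein")` yields a distance of 1 with a matched prefix of 9 bytes (`"Levenshte"`).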

@ -1,202 +0,0 @@
#![allow(clippy::type_complexity)]
#[cfg(test)]
#[macro_use]
extern crate assert_matches;
#[macro_use]
extern crate pest_derive;
mod automaton;
mod bucket_sort;
mod database;
mod distinct_map;
mod error;
mod filters;
mod levenshtein;
mod number;
mod query_builder;
mod query_tree;
mod query_words_mapper;
mod ranked_map;
mod raw_document;
mod reordered_attrs;
pub mod criterion;
pub mod facets;
pub mod raw_indexer;
pub mod serde;
pub mod settings;
pub mod store;
pub mod update;
pub use self::database::{BoxUpdateFn, Database, DatabaseOptions, MainT, UpdateT, MainWriter, MainReader, UpdateWriter, UpdateReader};
pub use self::error::{Error, HeedError, FstError, MResult, pest_error, FacetError};
pub use self::filters::Filter;
pub use self::number::{Number, ParseNumberError};
pub use self::ranked_map::RankedMap;
pub use self::raw_document::RawDocument;
pub use self::store::Index;
pub use self::update::{EnqueuedUpdateResult, ProcessedUpdateResult, UpdateStatus, UpdateType};
pub use meilisearch_types::{DocIndex, DocumentId, Highlight};
pub use meilisearch_schema::Schema;
pub use query_words_mapper::QueryWordsMapper;
use compact_arena::SmallArena;
use log::{error, trace};
use std::borrow::Cow;
use std::collections::HashMap;
use std::convert::TryFrom;
use crate::bucket_sort::PostingsListView;
use crate::levenshtein::prefix_damerau_levenshtein;
use crate::query_tree::{QueryId, QueryKind};
use crate::reordered_attrs::ReorderedAttrs;
type FstSetCow<'a> = fst::Set<Cow<'a, [u8]>>;
type FstMapCow<'a> = fst::Map<Cow<'a, [u8]>>;
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
pub struct Document {
pub id: DocumentId,
pub highlights: Vec<Highlight>,
#[cfg(test)]
pub matches: Vec<crate::bucket_sort::SimpleMatch>,
}
fn highlights_from_raw_document<'a, 'tag, 'txn>(
raw_document: &RawDocument<'a, 'tag>,
queries_kinds: &HashMap<QueryId, &QueryKind>,
arena: &SmallArena<'tag, PostingsListView<'txn>>,
searchable_attrs: Option<&ReorderedAttrs>,
schema: &Schema,
) -> Vec<Highlight>
{
let mut highlights = Vec::new();
for bm in raw_document.bare_matches.iter() {
let postings_list = &arena[bm.postings_list];
let input = postings_list.input();
let kind = &queries_kinds.get(&bm.query_index);
for di in postings_list.iter() {
let covered_area = match kind {
Some(QueryKind::NonTolerant(query)) | Some(QueryKind::Tolerant(query)) => {
let len = if query.len() > input.len() {
input.len()
} else {
prefix_damerau_levenshtein(query.as_bytes(), input).1
};
u16::try_from(len).unwrap_or(u16::max_value())
},
_ => di.char_length,
};
let attribute = searchable_attrs
.and_then(|sa| sa.reverse(di.attribute))
.unwrap_or(di.attribute);
let attribute = match schema.indexed_pos_to_field_id(attribute) {
Some(field_id) => field_id.0,
None => {
error!("Cannot convert indexed_pos {} to field_id", attribute);
trace!("Schema is compromised; {:?}", schema);
continue
}
};
let highlight = Highlight {
attribute,
char_index: di.char_index,
char_length: covered_area,
};
highlights.push(highlight);
}
}
highlights
}
impl Document {
#[cfg(not(test))]
pub fn from_highlights(id: DocumentId, highlights: &[Highlight]) -> Document {
Document { id, highlights: highlights.to_owned() }
}
#[cfg(test)]
pub fn from_highlights(id: DocumentId, highlights: &[Highlight]) -> Document {
Document { id, highlights: highlights.to_owned(), matches: Vec::new() }
}
#[cfg(not(test))]
pub fn from_raw<'a, 'tag, 'txn>(
raw_document: RawDocument<'a, 'tag>,
queries_kinds: &HashMap<QueryId, &QueryKind>,
arena: &SmallArena<'tag, PostingsListView<'txn>>,
searchable_attrs: Option<&ReorderedAttrs>,
schema: &Schema,
) -> Document
{
let highlights = highlights_from_raw_document(
&raw_document,
queries_kinds,
arena,
searchable_attrs,
schema,
);
Document { id: raw_document.id, highlights }
}
#[cfg(test)]
pub fn from_raw<'a, 'tag, 'txn>(
raw_document: RawDocument<'a, 'tag>,
queries_kinds: &HashMap<QueryId, &QueryKind>,
arena: &SmallArena<'tag, PostingsListView<'txn>>,
searchable_attrs: Option<&ReorderedAttrs>,
schema: &Schema,
) -> Document
{
use crate::bucket_sort::SimpleMatch;
let highlights = highlights_from_raw_document(
&raw_document,
queries_kinds,
arena,
searchable_attrs,
schema,
);
let mut matches = Vec::new();
for sm in raw_document.processed_matches {
let attribute = searchable_attrs
.and_then(|sa| sa.reverse(sm.attribute))
.unwrap_or(sm.attribute);
let attribute = match schema.indexed_pos_to_field_id(attribute) {
Some(field_id) => field_id.0,
None => {
error!("Cannot convert indexed_pos {} to field_id", attribute);
trace!("Schema is compromised; {:?}", schema);
continue
}
};
matches.push(SimpleMatch { attribute, ..sm });
}
matches.sort_unstable();
Document { id: raw_document.id, highlights, matches }
}
}
#[cfg(test)]
mod tests {
use super::*;
use std::mem;
#[test]
fn docindex_mem_size() {
assert_eq!(mem::size_of::<DocIndex>(), 12);
}
}

@ -1,120 +0,0 @@
use std::cmp::Ordering;
use std::fmt;
use std::num::{ParseFloatError, ParseIntError};
use std::str::FromStr;
use ordered_float::OrderedFloat;
use serde::{Deserialize, Serialize};
#[derive(Serialize, Deserialize, Debug, Copy, Clone)]
pub enum Number {
Unsigned(u64),
Signed(i64),
Float(OrderedFloat<f64>),
Null,
}
impl Default for Number {
fn default() -> Self {
Self::Null
}
}
impl FromStr for Number {
type Err = ParseNumberError;
fn from_str(s: &str) -> Result<Self, Self::Err> {
let uint_error = match u64::from_str(s) {
Ok(unsigned) => return Ok(Number::Unsigned(unsigned)),
Err(error) => error,
};
let int_error = match i64::from_str(s) {
Ok(signed) => return Ok(Number::Signed(signed)),
Err(error) => error,
};
let float_error = match f64::from_str(s) {
Ok(float) => return Ok(Number::Float(OrderedFloat(float))),
Err(error) => error,
};
Err(ParseNumberError {
uint_error,
int_error,
float_error,
})
}
}
impl PartialEq for Number {
fn eq(&self, other: &Number) -> bool {
self.cmp(other) == Ordering::Equal
}
}
impl Eq for Number {}
impl PartialOrd for Number {
fn partial_cmp(&self, other: &Number) -> Option<Ordering> {
Some(self.cmp(other))
}
}
impl Ord for Number {
fn cmp(&self, other: &Self) -> Ordering {
use Number::{Float, Signed, Unsigned, Null};
match (*self, *other) {
(Unsigned(a), Unsigned(b)) => a.cmp(&b),
(Unsigned(a), Signed(b)) => {
if b < 0 {
Ordering::Greater
} else {
a.cmp(&(b as u64))
}
}
(Unsigned(a), Float(b)) => (OrderedFloat(a as f64)).cmp(&b),
(Signed(a), Unsigned(b)) => {
if a < 0 {
Ordering::Less
} else {
(a as u64).cmp(&b)
}
}
(Signed(a), Signed(b)) => a.cmp(&b),
(Signed(a), Float(b)) => OrderedFloat(a as f64).cmp(&b),
(Float(a), Unsigned(b)) => a.cmp(&OrderedFloat(b as f64)),
(Float(a), Signed(b)) => a.cmp(&OrderedFloat(b as f64)),
(Float(a), Float(b)) => a.cmp(&b),
// Null sorts after every other number.
(Null, Null) => Ordering::Equal,
(_, Null) => Ordering::Less,
(Null, _) => Ordering::Greater,
}
}
}
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ParseNumberError {
uint_error: ParseIntError,
int_error: ParseIntError,
float_error: ParseFloatError,
}
impl fmt::Display for ParseNumberError {
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
if self.uint_error == self.int_error {
write!(
f,
"can not parse number: {}, {}",
self.uint_error, self.float_error
)
} else {
write!(
f,
"can not parse number: {}, {}, {}",
self.uint_error, self.int_error, self.float_error
)
}
}
}
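
The sign check in the `Unsigned`/`Signed` arms of `Ord for Number` matters because a direct cast wraps negative values. A minimal sketch, with hypothetical function names that are not part of the crate:

```rust
use std::cmp::Ordering;

/// Mirrors the `(Unsigned(a), Signed(b))` arm: a negative `b` is always
/// smaller than any unsigned value, so the cast is only done when `b >= 0`.
fn cmp_unsigned_signed(a: u64, b: i64) -> Ordering {
    if b < 0 {
        Ordering::Greater
    } else {
        a.cmp(&(b as u64))
    }
}

/// Naive variant for contrast: `-1 as u64` wraps to `u64::MAX`,
/// so 0 is wrongly reported as less than -1.
fn naive_cmp(a: u64, b: i64) -> Ordering {
    a.cmp(&(b as u64))
}
```

Here `cmp_unsigned_signed(0, -1)` is `Ordering::Greater`, whereas `naive_cmp(0, -1)` is `Ordering::Less`.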

File diff suppressed because it is too large

@ -1,570 +0,0 @@
use std::borrow::Cow;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};
use std::ops::Range;
use std::time::Instant;
use std::{cmp, fmt, iter::once};
use fst::{IntoStreamer, Streamer};
use itertools::{EitherOrBoth, merge_join_by};
use log::debug;
use meilisearch_tokenizer::analyzer::{Analyzer, AnalyzerConfig};
use sdset::{Set, SetBuf, SetOperation};
use crate::database::MainT;
use crate::{store, DocumentId, DocIndex, MResult, FstSetCow};
use crate::automaton::{build_dfa, build_prefix_dfa, build_exact_dfa};
use crate::QueryWordsMapper;
#[derive(Clone, PartialEq, Eq, Hash)]
pub enum Operation {
And(Vec<Operation>),
Or(Vec<Operation>),
Query(Query),
}
impl fmt::Debug for Operation {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
fn pprint_tree(f: &mut fmt::Formatter<'_>, op: &Operation, depth: usize) -> fmt::Result {
match op {
Operation::And(children) => {
writeln!(f, "{:1$}AND", "", depth * 2)?;
children.iter().try_for_each(|c| pprint_tree(f, c, depth + 1))
},
Operation::Or(children) => {
writeln!(f, "{:1$}OR", "", depth * 2)?;
children.iter().try_for_each(|c| pprint_tree(f, c, depth + 1))
},
Operation::Query(query) => writeln!(f, "{:2$}{:?}", "", query, depth * 2),
}
}
pprint_tree(f, self, 0)
}
}
impl Operation {
fn tolerant(id: QueryId, prefix: bool, s: &str) -> Operation {
Operation::Query(Query { id, prefix, exact: true, kind: QueryKind::Tolerant(s.to_string()) })
}
fn non_tolerant(id: QueryId, prefix: bool, s: &str) -> Operation {
Operation::Query(Query { id, prefix, exact: true, kind: QueryKind::NonTolerant(s.to_string()) })
}
fn phrase2(id: QueryId, prefix: bool, (left, right): (&str, &str)) -> Operation {
let kind = QueryKind::Phrase(vec![left.to_owned(), right.to_owned()]);
Operation::Query(Query { id, prefix, exact: true, kind })
}
}
pub type QueryId = usize;
#[derive(Clone, Eq)]
pub struct Query {
pub id: QueryId,
pub prefix: bool,
pub exact: bool,
pub kind: QueryKind,
}
impl PartialEq for Query {
fn eq(&self, other: &Self) -> bool {
self.prefix == other.prefix && self.kind == other.kind
}
}
impl Hash for Query {
fn hash<H: Hasher>(&self, state: &mut H) {
self.prefix.hash(state);
self.kind.hash(state);
}
}
#[derive(Clone, PartialEq, Eq, Hash)]
pub enum QueryKind {
Tolerant(String),
NonTolerant(String),
Phrase(Vec<String>),
}
impl fmt::Debug for Query {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
let Query { id, prefix, kind, .. } = self;
let prefix = if *prefix { String::from("Prefix") } else { String::default() };
match kind {
QueryKind::NonTolerant(word) => {
f.debug_struct(&(prefix + "NonTolerant")).field("id", &id).field("word", &word).finish()
},
QueryKind::Tolerant(word) => {
f.debug_struct(&(prefix + "Tolerant")).field("id", &id).field("word", &word).finish()
},
QueryKind::Phrase(words) => {
f.debug_struct(&(prefix + "Phrase")).field("id", &id).field("words", &words).finish()
},
}
}
}
#[derive(Debug, Default)]
pub struct PostingsList {
docids: SetBuf<DocumentId>,
matches: SetBuf<DocIndex>,
}
pub struct Context<'a> {
pub words_set: FstSetCow<'a>,
pub stop_words: FstSetCow<'a>,
pub synonyms: store::Synonyms,
pub postings_lists: store::PostingsLists,
pub prefix_postings_lists: store::PrefixPostingsListsCache,
}
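// Splits `word` in two at the position that maximizes the lower of the two halves'
// document frequencies; returns `None` when no split has both halves in the index.
fn split_best_frequency<'a>(reader: &heed::RoTxn<MainT>, ctx: &Context, word: &'a str) -> MResult<Option<(&'a str, &'a str)>> {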
let chars = word.char_indices().skip(1);
let mut best = None;
for (i, _) in chars {
let (left, right) = word.split_at(i);
let left_freq = ctx.postings_lists
.postings_list(reader, left.as_bytes())?
.map(|p| p.docids.len())
.unwrap_or(0);
let right_freq = ctx.postings_lists
.postings_list(reader, right.as_bytes())?
.map(|p| p.docids.len())
.unwrap_or(0);
let min_freq = cmp::min(left_freq, right_freq);
if min_freq != 0 && best.map_or(true, |(old, _, _)| min_freq > old) {
best = Some((min_freq, left, right));
}
}
Ok(best.map(|(_, l, r)| (l, r)))
}
fn fetch_synonyms(reader: &heed::RoTxn<MainT>, ctx: &Context, words: &[&str]) -> MResult<Vec<Vec<String>>> {
let words = &words.join(" ");
let set = ctx.synonyms.synonyms_fst(reader, words.as_bytes())?;
let mut strings = Vec::new();
let mut stream = set.stream();
while let Some(input) = stream.next() {
if let Ok(input) = std::str::from_utf8(input) {
let alts = input.split_ascii_whitespace().map(ToOwned::to_owned).collect();
strings.push(alts);
}
}
Ok(strings)
}
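// Returns the operation unchanged when `iter` yields exactly one;
// otherwise (zero or several operations) wraps all of them with `f`.
fn create_operation<I, F>(iter: I, f: F) -> Operation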
where I: IntoIterator<Item=Operation>,
F: Fn(Vec<Operation>) -> Operation,
{
let mut iter = iter.into_iter();
match (iter.next(), iter.next()) {
(Some(first), None) => first,
(first, second) => f(first.into_iter().chain(second).chain(iter).collect()),
}
}
const MAX_NGRAM: usize = 3;
fn split_query_string<'a, A: AsRef<[u8]>>(s: &str, stop_words: &'a fst::Set<A>) -> Vec<(usize, String)> {
// TODO: Use global instance instead
Analyzer::new(AnalyzerConfig::default_with_stopwords(stop_words))
.analyze(s)
.tokens()
.filter(|t| t.is_word())
.map(|t| t.word.to_string())
.enumerate()
.collect()
}
pub fn create_query_tree(
reader: &heed::RoTxn<MainT>,
ctx: &Context,
query: &str,
) -> MResult<(Operation, HashMap<QueryId, Range<usize>>)>
{
// TODO: use a shared analyzer instance
let words = split_query_string(query, &ctx.stop_words);
let mut mapper = QueryWordsMapper::new(words.iter().map(|(_, w)| w));
fn create_inner(
reader: &heed::RoTxn<MainT>,
ctx: &Context,
mapper: &mut QueryWordsMapper,
words: &[(usize, String)],
) -> MResult<Vec<Operation>>
{
let mut alts = Vec::new();
for ngram in 1..=MAX_NGRAM {
if let Some(group) = words.get(..ngram) {
let mut group_ops = Vec::new();
let tail = &words[ngram..];
let is_last = tail.is_empty();
let mut group_alts = Vec::new();
match group {
[(id, word)] => {
let mut idgen = ((id + 1) * 100)..;
let range = (*id)..id+1;
let phrase = split_best_frequency(reader, ctx, word)?
.map(|ws| {
let id = idgen.next().unwrap();
idgen.next().unwrap();
mapper.declare(range.clone(), id, &[ws.0, ws.1]);
Operation::phrase2(id, is_last, ws)
});
let synonyms = fetch_synonyms(reader, ctx, &[word])?
.into_iter()
.map(|alts| {
let exact = alts.len() == 1;
let id = idgen.next().unwrap();
mapper.declare(range.clone(), id, &alts);
let mut idgen = once(id).chain(&mut idgen);
let iter = alts.into_iter().map(|w| {
let id = idgen.next().unwrap();
let kind = QueryKind::NonTolerant(w);
Operation::Query(Query { id, prefix: false, exact, kind })
});
create_operation(iter, Operation::And)
});
let original = Operation::tolerant(*id, is_last, word);
group_alts.push(original);
group_alts.extend(synonyms.chain(phrase));
},
words => {
let id = words[0].0;
let mut idgen = ((id + 1) * 100_usize.pow(ngram as u32))..;
let range = id..id+ngram;
let words: Vec<_> = words.iter().map(|(_, s)| s.as_str()).collect();
for synonym in fetch_synonyms(reader, ctx, &words)? {
let exact = synonym.len() == 1;
let id = idgen.next().unwrap();
mapper.declare(range.clone(), id, &synonym);
let mut idgen = once(id).chain(&mut idgen);
let synonym = synonym.into_iter().map(|s| {
let id = idgen.next().unwrap();
let kind = QueryKind::NonTolerant(s);
Operation::Query(Query { id, prefix: false, exact, kind })
});
group_alts.push(create_operation(synonym, Operation::And));
}
let id = idgen.next().unwrap();
let concat = words.concat();
mapper.declare(range.clone(), id, &[&concat]);
group_alts.push(Operation::non_tolerant(id, is_last, &concat));
}
}
group_ops.push(create_operation(group_alts, Operation::Or));
if !tail.is_empty() {
let tail_ops = create_inner(reader, ctx, mapper, tail)?;
group_ops.push(create_operation(tail_ops, Operation::Or));
}
alts.push(create_operation(group_ops, Operation::And));
}
}
Ok(alts)
}
let alternatives = create_inner(reader, ctx, &mut mapper, &words)?;
let operation = Operation::Or(alternatives);
let mapping = mapper.mapping();
Ok((operation, mapping))
}
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct PostingsKey<'o> {
pub query: &'o Query,
pub input: Vec<u8>,
pub distance: u8,
pub is_exact: bool,
}
pub type Postings<'o, 'txn> = HashMap<PostingsKey<'o>, Cow<'txn, Set<DocIndex>>>;
pub type Cache<'o, 'txn> = HashMap<&'o Operation, Cow<'txn, Set<DocumentId>>>;
pub struct QueryResult<'o, 'txn> {
pub docids: Cow<'txn, Set<DocumentId>>,
pub queries: Postings<'o, 'txn>,
}
pub fn traverse_query_tree<'o, 'txn>(
reader: &'txn heed::RoTxn<MainT>,
ctx: &Context,
tree: &'o Operation,
) -> MResult<QueryResult<'o, 'txn>>
{
fn execute_and<'o, 'txn>(
reader: &'txn heed::RoTxn<MainT>,
ctx: &Context,
cache: &mut Cache<'o, 'txn>,
postings: &mut Postings<'o, 'txn>,
depth: usize,
operations: &'o [Operation],
) -> MResult<Cow<'txn, Set<DocumentId>>>
{
debug!("{:1$}AND", "", depth * 2);
let before = Instant::now();
let mut results = Vec::new();
for op in operations {
if cache.get(op).is_none() {
let docids = match op {
Operation::And(ops) => execute_and(reader, ctx, cache, postings, depth + 1, &ops)?,
Operation::Or(ops) => execute_or(reader, ctx, cache, postings, depth + 1, &ops)?,
Operation::Query(query) => execute_query(reader, ctx, postings, depth + 1, &query)?,
};
cache.insert(op, docids);
}
}
for op in operations {
if let Some(docids) = cache.get(op) {
results.push(docids.as_ref());
}
}
let op = sdset::multi::Intersection::new(results);
let docids = op.into_set_buf();
debug!("{:3$}--- AND fetched {} documents in {:.02?}", "", docids.len(), before.elapsed(), depth * 2);
Ok(Cow::Owned(docids))
}
fn execute_or<'o, 'txn>(
reader: &'txn heed::RoTxn<MainT>,
ctx: &Context,
cache: &mut Cache<'o, 'txn>,
postings: &mut Postings<'o, 'txn>,
depth: usize,
operations: &'o [Operation],
) -> MResult<Cow<'txn, Set<DocumentId>>>
{
debug!("{:1$}OR", "", depth * 2);
let before = Instant::now();
let mut results = Vec::new();
for op in operations {
if cache.get(op).is_none() {
let docids = match op {
Operation::And(ops) => execute_and(reader, ctx, cache, postings, depth + 1, &ops)?,
Operation::Or(ops) => execute_or(reader, ctx, cache, postings, depth + 1, &ops)?,
Operation::Query(query) => execute_query(reader, ctx, postings, depth + 1, &query)?,
};
cache.insert(op, docids);
}
}
for op in operations {
if let Some(docids) = cache.get(op) {
results.push(docids.as_ref());
}
}
let op = sdset::multi::Union::new(results);
let docids = op.into_set_buf();
debug!("{:3$}--- OR fetched {} documents in {:.02?}", "", docids.len(), before.elapsed(), depth * 2);
Ok(Cow::Owned(docids))
}
fn execute_query<'o, 'txn>(
reader: &'txn heed::RoTxn<MainT>,
ctx: &Context,
postings: &mut Postings<'o, 'txn>,
depth: usize,
query: &'o Query,
) -> MResult<Cow<'txn, Set<DocumentId>>>
{
let before = Instant::now();
let Query { prefix, kind, exact, .. } = query;
let docids: Cow<Set<_>> = match kind {
QueryKind::Tolerant(word) => {
if *prefix && word.len() <= 2 {
let prefix = {
let mut array = [0; 4];
let bytes = word.as_bytes();
array[..bytes.len()].copy_from_slice(bytes);
array
};
// We retrieve the cached postings lists for all
// the words that start with this short prefix.
let result = ctx.prefix_postings_lists.prefix_postings_list(reader, prefix)?.unwrap_or_default();
let key = PostingsKey { query, input: word.clone().into_bytes(), distance: 0, is_exact: false };
postings.insert(key, result.matches);
let prefix_docids = &result.docids;
// We retrieve the exact postings list for the prefix,
// because we must consider these matches as exact.
let result = ctx.postings_lists.postings_list(reader, word.as_bytes())?.unwrap_or_default();
let key = PostingsKey { query, input: word.clone().into_bytes(), distance: 0, is_exact: true };
postings.insert(key, result.matches);
let exact_docids = &result.docids;
let before = Instant::now();
let docids = sdset::duo::Union::new(prefix_docids, exact_docids).into_set_buf();
debug!("{:4$}prefix docids ({} and {}) construction took {:.02?}",
"", prefix_docids.len(), exact_docids.len(), before.elapsed(), depth * 2);
Cow::Owned(docids)
} else {
let dfa = if *prefix { build_prefix_dfa(word) } else { build_dfa(word) };
let byte = word.as_bytes()[0];
let mut stream = if byte == u8::max_value() {
ctx.words_set.search(&dfa).ge(&[byte]).into_stream()
} else {
ctx.words_set.search(&dfa).ge(&[byte]).lt(&[byte + 1]).into_stream()
};
let before = Instant::now();
let mut results = Vec::new();
while let Some(input) = stream.next() {
if let Some(result) = ctx.postings_lists.postings_list(reader, input)? {
let distance = dfa.eval(input).to_u8();
let is_exact = *exact && distance == 0 && input.len() == word.len();
results.push(result.docids);
let key = PostingsKey { query, input: input.to_owned(), distance, is_exact };
postings.insert(key, result.matches);
}
}
debug!("{:3$}docids retrieval ({:?}) took {:.02?}", "", results.len(), before.elapsed(), depth * 2);
let before = Instant::now();
let docids = if results.len() > 10 {
let cap = results.iter().map(|dis| dis.len()).sum();
let mut docids = Vec::with_capacity(cap);
for dis in results {
docids.extend_from_slice(&dis);
}
SetBuf::from_dirty(docids)
} else {
let sets = results.iter().map(AsRef::as_ref).collect();
sdset::multi::Union::new(sets).into_set_buf()
};
debug!("{:2$}docids construction took {:.02?}", "", before.elapsed(), depth * 2);
Cow::Owned(docids)
}
},
QueryKind::NonTolerant(word) => {
// TODO support prefix and non-prefix exact DFA
let dfa = build_exact_dfa(word);
let byte = word.as_bytes()[0];
let mut stream = if byte == u8::max_value() {
ctx.words_set.search(&dfa).ge(&[byte]).into_stream()
} else {
ctx.words_set.search(&dfa).ge(&[byte]).lt(&[byte + 1]).into_stream()
};
let before = Instant::now();
let mut results = Vec::new();
while let Some(input) = stream.next() {
if let Some(result) = ctx.postings_lists.postings_list(reader, input)? {
let distance = dfa.eval(input).to_u8();
results.push(result.docids);
let key = PostingsKey { query, input: input.to_owned(), distance, is_exact: *exact };
postings.insert(key, result.matches);
}
}
debug!("{:3$}docids retrieval ({:?}) took {:.02?}", "", results.len(), before.elapsed(), depth * 2);
let before = Instant::now();
let docids = if results.len() > 10 {
let cap = results.iter().map(|dis| dis.len()).sum();
let mut docids = Vec::with_capacity(cap);
for dis in results {
docids.extend_from_slice(&dis);
}
SetBuf::from_dirty(docids)
} else {
let sets = results.iter().map(AsRef::as_ref).collect();
sdset::multi::Union::new(sets).into_set_buf()
};
debug!("{:2$}docids construction took {:.02?}", "", before.elapsed(), depth * 2);
Cow::Owned(docids)
},
QueryKind::Phrase(words) => {
// TODO support prefix and non-prefix exact DFA
if let [first, second] = words.as_slice() {
let first = ctx.postings_lists.postings_list(reader, first.as_bytes())?.unwrap_or_default();
let second = ctx.postings_lists.postings_list(reader, second.as_bytes())?.unwrap_or_default();
let iter = merge_join_by(first.matches.as_slice(), second.matches.as_slice(), |a, b| {
let x = (a.document_id, a.attribute, (a.word_index as u32) + 1);
let y = (b.document_id, b.attribute, b.word_index as u32);
x.cmp(&y)
});
let matches: Vec<_> = iter
.filter_map(EitherOrBoth::both)
.flat_map(|(a, b)| once(*a).chain(Some(*b)))
.collect();
let before = Instant::now();
let mut docids: Vec<_> = matches.iter().map(|m| m.document_id).collect();
docids.dedup();
let docids = SetBuf::new(docids).unwrap();
debug!("{:2$}docids construction took {:.02?}", "", before.elapsed(), depth * 2);
let matches = Cow::Owned(SetBuf::from_dirty(matches));
let key = PostingsKey { query, input: vec![], distance: 0, is_exact: true };
postings.insert(key, matches);
Cow::Owned(docids)
} else {
debug!("{:2$}{:?} skipped", "", words, depth * 2);
Cow::default()
}
},
};
debug!("{:4$}{:?} fetched {:?} documents in {:.02?}", "", query, docids.len(), before.elapsed(), depth * 2);
Ok(docids)
}
let mut cache = Cache::new();
let mut postings = Postings::new();
let docids = match tree {
Operation::And(ops) => execute_and(reader, ctx, &mut cache, &mut postings, 0, &ops)?,
Operation::Or(ops) => execute_or(reader, ctx, &mut cache, &mut postings, 0, &ops)?,
Operation::Query(query) => execute_query(reader, ctx, &mut postings, 0, &query)?,
};
Ok(QueryResult { docids, queries: postings })
}
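
The word-splitting heuristic in `split_best_frequency` can be sketched without the database. This is a sketch under stated assumptions: the `HashMap` of document counts is a hypothetical stand-in for the `store::PostingsLists` lookup, and `example` exists only for illustration.

```rust
use std::collections::HashMap;

/// Tries every split point of `word` and keeps the one whose less frequent
/// half is the most frequent overall. `freqs` maps a word to its number of
/// matching documents (hypothetical stand-in for the postings-list store).
fn split_best_frequency<'a>(
    freqs: &HashMap<&str, usize>,
    word: &'a str,
) -> Option<(&'a str, &'a str)> {
    let mut best: Option<(usize, &str, &str)> = None;
    // skip(1) avoids the split that would leave an empty left half.
    for (i, _) in word.char_indices().skip(1) {
        let (left, right) = word.split_at(i);
        let left_freq = freqs.get(left).copied().unwrap_or(0);
        let right_freq = freqs.get(right).copied().unwrap_or(0);
        let min_freq = left_freq.min(right_freq);
        // Both halves must exist, and the split must beat the current best.
        if min_freq != 0 && best.map_or(true, |(old, _, _)| min_freq > old) {
            best = Some((min_freq, left, right));
        }
    }
    best.map(|(_, l, r)| (l, r))
}

/// Illustrative data: "ne" + "wyork" is a valid split, but "new" + "york" wins.
fn example() -> Option<(&'static str, &'static str)> {
    let mut freqs = HashMap::new();
    freqs.insert("new", 10);
    freqs.insert("york", 5);
    freqs.insert("ne", 1);
    freqs.insert("wyork", 1);
    split_best_frequency(&freqs, "newyork")
}
```

In `create_query_tree`, the resulting pair becomes a two-word `Phrase` alternative next to the original tolerant word.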

@ -1,416 +0,0 @@
use std::collections::HashMap;
use std::iter::FromIterator;
use std::ops::Range;
use intervaltree::{Element, IntervalTree};
pub type QueryId = usize;
pub struct QueryWordsMapper {
originals: Vec<String>,
mappings: HashMap<QueryId, (Range<usize>, Vec<String>)>,
}
impl QueryWordsMapper {
pub fn new<I, A>(originals: I) -> QueryWordsMapper
where I: IntoIterator<Item = A>,
A: ToString,
{
let originals = originals.into_iter().map(|s| s.to_string()).collect();
QueryWordsMapper { originals, mappings: HashMap::new() }
}
#[allow(clippy::len_zero)]
pub fn declare<I, A>(&mut self, range: Range<usize>, id: QueryId, replacement: I)
where I: IntoIterator<Item = A>,
A: ToString,
{
assert!(range.len() != 0);
assert!(self.originals.get(range.clone()).is_some());
assert!(id >= self.originals.len());
let replacement: Vec<_> = replacement.into_iter().map(|s| s.to_string()).collect();
assert!(!replacement.is_empty());
// We detect words at the end and at the front of the
// replacement that are common with the originals:
//
//     x a b c d e f g
//       ^^^/   \^^^
// a b x c d k j e f
// ^^^           ^^^
//
let left = &self.originals[..range.start];
let right = &self.originals[range.end..];
let common_left = longest_common_prefix(left, &replacement);
let common_right = longest_common_prefix(&replacement, right);
for i in 0..common_left {
let range = range.start - common_left + i..range.start - common_left + i + 1;
let replacement = vec![replacement[i].clone()];
self.mappings.insert(id + i, (range, replacement));
}
{
let replacement = replacement[common_left..replacement.len() - common_right].to_vec();
self.mappings.insert(id + common_left, (range.clone(), replacement));
}
for i in 0..common_right {
let id = id + replacement.len() - common_right + i;
let range = range.end + i..range.end + i + 1;
let replacement = vec![replacement[replacement.len() - common_right + i].clone()];
self.mappings.insert(id, (range, replacement));
}
}
pub fn mapping(self) -> HashMap<QueryId, Range<usize>> {
let mappings = self.mappings.into_iter().map(|(i, (r, v))| (r, (i, v)));
let intervals = IntervalTree::from_iter(mappings);
let mut output = HashMap::new();
let mut offset = 0;
// We map each original word to the biggest number of
// associated words.
for i in 0..self.originals.len() {
let max = intervals.query_point(i)
.filter_map(|e| {
if e.range.end - 1 == i {
let len = e.value.1.iter().skip(i - e.range.start).count();
if len != 0 { Some(len) } else { None }
} else { None }
})
.max()
.unwrap_or(1);
let range = i + offset..i + offset + max;
output.insert(i, range);
offset += max - 1;
}
// We retrieve the range that each original word
// is mapped to and apply it to each of the words.
for i in 0..self.originals.len() {
let iter = intervals.query_point(i).filter(|e| e.range.end - 1 == i);
for Element { range, value: (id, words) } in iter {
// We ask for the complete range mapped to the area we map.
let start = output.get(&range.start).map(|r| r.start).unwrap_or(range.start);
let end = output.get(&(range.end - 1)).map(|r| r.end).unwrap_or(range.end);
let range = start..end;
// We map each query id to one word, except the last one,
// which we map to all the remaining words.
let add = range.len() - words.len();
for (j, x) in range.take(words.len()).enumerate() {
let add = if j == words.len() - 1 { add } else { 0 }; // is last?
let range = x..x + 1 + add;
output.insert(id + j, range);
}
}
}
output
}
}
fn longest_common_prefix<T: Eq + std::fmt::Debug>(a: &[T], b: &[T]) -> usize {
let mut best = None;
for i in (0..a.len()).rev() {
let count = a[i..].iter().zip(b).take_while(|(a, b)| a == b).count();
best = match best {
Some(old) if count > old => Some(count),
Some(_) => break,
None => Some(count),
};
}
best.unwrap_or(0)
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn original_unmodified() {
let query = ["new", "york", "city", "subway"];
// 0 1 2 3
let mut builder = QueryWordsMapper::new(&query);
// new york = new york city
builder.declare(0..2, 4, &["new", "york", "city"]);
// ^ 4 5 6
// new = new york city
builder.declare(0..1, 7, &["new", "york", "city"]);
// ^ 7 8 9
let mapping = builder.mapping();
assert_eq!(mapping[&0], 0..1); // new
assert_eq!(mapping[&1], 1..2); // york
assert_eq!(mapping[&2], 2..3); // city
assert_eq!(mapping[&3], 3..4); // subway
assert_eq!(mapping[&4], 0..1); // new
assert_eq!(mapping[&5], 1..2); // york
assert_eq!(mapping[&6], 2..3); // city
assert_eq!(mapping[&7], 0..1); // new
assert_eq!(mapping[&8], 1..2); // york
assert_eq!(mapping[&9], 2..3); // city
}
#[test]
fn original_unmodified2() {
let query = ["new", "york", "city", "subway"];
// 0 1 2 3
let mut builder = QueryWordsMapper::new(&query);
// city subway = new york city underground train
builder.declare(2..4, 4, &["new", "york", "city", "underground", "train"]);
// ^ 4 5 6 7 8
let mapping = builder.mapping();
assert_eq!(mapping[&0], 0..1); // new
assert_eq!(mapping[&1], 1..2); // york
assert_eq!(mapping[&2], 2..3); // city
assert_eq!(mapping[&3], 3..5); // subway
assert_eq!(mapping[&4], 0..1); // new
assert_eq!(mapping[&5], 1..2); // york
assert_eq!(mapping[&6], 2..3); // city
assert_eq!(mapping[&7], 3..4); // underground
assert_eq!(mapping[&8], 4..5); // train
}
#[test]
fn original_unmodified3() {
let query = ["a", "b", "x", "x", "a", "b", "c", "d", "e", "f", "g"];
// 0 1 2 3 4 5 6 7 8 9 10
let mut builder = QueryWordsMapper::new(&query);
// c d = a b x c d k j e f
builder.declare(6..8, 11, &["a", "b", "x", "c", "d", "k", "j", "e", "f"]);
// ^^ 11 12 13 14 15 16 17 18 19
let mapping = builder.mapping();
assert_eq!(mapping[&0], 0..1); // a
assert_eq!(mapping[&1], 1..2); // b
assert_eq!(mapping[&2], 2..3); // x
assert_eq!(mapping[&3], 3..4); // x
assert_eq!(mapping[&4], 4..5); // a
assert_eq!(mapping[&5], 5..6); // b
assert_eq!(mapping[&6], 6..7); // c
assert_eq!(mapping[&7], 7..11); // d
assert_eq!(mapping[&8], 11..12); // e
assert_eq!(mapping[&9], 12..13); // f
assert_eq!(mapping[&10], 13..14); // g
assert_eq!(mapping[&11], 4..5); // a
assert_eq!(mapping[&12], 5..6); // b
assert_eq!(mapping[&13], 6..7); // x
assert_eq!(mapping[&14], 7..8); // c
assert_eq!(mapping[&15], 8..9); // d
assert_eq!(mapping[&16], 9..10); // k
assert_eq!(mapping[&17], 10..11); // j
assert_eq!(mapping[&18], 11..12); // e
assert_eq!(mapping[&19], 12..13); // f
}
#[test]
fn simple_growing() {
let query = ["new", "york", "subway"];
// 0 1 2
let mut builder = QueryWordsMapper::new(&query);
// new york = new york city
builder.declare(0..2, 3, &["new", "york", "city"]);
// ^ 3 4 5
let mapping = builder.mapping();
assert_eq!(mapping[&0], 0..1); // new
assert_eq!(mapping[&1], 1..3); // york
assert_eq!(mapping[&2], 3..4); // subway
assert_eq!(mapping[&3], 0..1); // new
assert_eq!(mapping[&4], 1..2); // york
assert_eq!(mapping[&5], 2..3); // city
}
#[test]
fn same_place_growings() {
let query = ["NY", "subway"];
// 0 1
let mut builder = QueryWordsMapper::new(&query);
// NY = new york
builder.declare(0..1, 2, &["new", "york"]);
// ^ 2 3
// NY = new york city
builder.declare(0..1, 4, &["new", "york", "city"]);
// ^ 4 5 6
// NY = NYC
builder.declare(0..1, 7, &["NYC"]);
// ^ 7
// NY = new york city
builder.declare(0..1, 8, &["new", "york", "city"]);
// ^ 8 9 10
// subway = underground train
builder.declare(1..2, 11, &["underground", "train"]);
// ^ 11 12
let mapping = builder.mapping();
assert_eq!(mapping[&0], 0..3); // NY
assert_eq!(mapping[&1], 3..5); // subway
assert_eq!(mapping[&2], 0..1); // new
assert_eq!(mapping[&3], 1..3); // york
assert_eq!(mapping[&4], 0..1); // new
assert_eq!(mapping[&5], 1..2); // york
assert_eq!(mapping[&6], 2..3); // city
assert_eq!(mapping[&7], 0..3); // NYC
assert_eq!(mapping[&8], 0..1); // new
assert_eq!(mapping[&9], 1..2); // york
assert_eq!(mapping[&10], 2..3); // city
assert_eq!(mapping[&11], 3..4); // underground
assert_eq!(mapping[&12], 4..5); // train
}
#[test]
fn bigger_growing() {
let query = ["NYC", "subway"];
// 0 1
let mut builder = QueryWordsMapper::new(&query);
// NYC = new york city
builder.declare(0..1, 2, &["new", "york", "city"]);
// ^ 2 3 4
let mapping = builder.mapping();
assert_eq!(mapping[&0], 0..3); // NYC
assert_eq!(mapping[&1], 3..4); // subway
assert_eq!(mapping[&2], 0..1); // new
assert_eq!(mapping[&3], 1..2); // york
assert_eq!(mapping[&4], 2..3); // city
}
#[test]
fn middle_query_growing() {
let query = ["great", "awesome", "NYC", "subway"];
// 0 1 2 3
let mut builder = QueryWordsMapper::new(&query);
// NYC = new york city
builder.declare(2..3, 4, &["new", "york", "city"]);
// ^ 4 5 6
let mapping = builder.mapping();
assert_eq!(mapping[&0], 0..1); // great
assert_eq!(mapping[&1], 1..2); // awesome
assert_eq!(mapping[&2], 2..5); // NYC
assert_eq!(mapping[&3], 5..6); // subway
assert_eq!(mapping[&4], 2..3); // new
assert_eq!(mapping[&5], 3..4); // york
assert_eq!(mapping[&6], 4..5); // city
}
#[test]
fn end_query_growing() {
let query = ["NYC", "subway"];
// 0 1
let mut builder = QueryWordsMapper::new(&query);
// subway = underground train
builder.declare(1..2, 2, &["underground", "train"]);
// ^ 2 3
let mapping = builder.mapping();
assert_eq!(mapping[&0], 0..1); // NYC
assert_eq!(mapping[&1], 1..3); // subway
assert_eq!(mapping[&2], 1..2); // underground
assert_eq!(mapping[&3], 2..3); // train
}
#[test]
fn multiple_growings() {
let query = ["great", "awesome", "NYC", "subway"];
// 0 1 2 3
let mut builder = QueryWordsMapper::new(&query);
// NYC = new york city
builder.declare(2..3, 4, &["new", "york", "city"]);
// ^ 4 5 6
// subway = underground train
builder.declare(3..4, 7, &["underground", "train"]);
// ^ 7 8
let mapping = builder.mapping();
assert_eq!(mapping[&0], 0..1); // great
assert_eq!(mapping[&1], 1..2); // awesome
assert_eq!(mapping[&2], 2..5); // NYC
assert_eq!(mapping[&3], 5..7); // subway
assert_eq!(mapping[&4], 2..3); // new
assert_eq!(mapping[&5], 3..4); // york
assert_eq!(mapping[&6], 4..5); // city
assert_eq!(mapping[&7], 5..6); // underground
assert_eq!(mapping[&8], 6..7); // train
}
#[test]
fn multiple_probable_growings() {
let query = ["great", "awesome", "NYC", "subway"];
// 0 1 2 3
let mut builder = QueryWordsMapper::new(&query);
// NYC = new york city
builder.declare(2..3, 4, &["new", "york", "city"]);
// ^ 4 5 6
// subway = underground train
builder.declare(3..4, 7, &["underground", "train"]);
// ^ 7 8
// great awesome = good
builder.declare(0..2, 9, &["good"]);
// ^ 9
// awesome NYC = NY
builder.declare(1..3, 10, &["NY"]);
// ^^ 10
// NYC subway = metro
builder.declare(2..4, 11, &["metro"]);
// ^^ 11
let mapping = builder.mapping();
assert_eq!(mapping[&0], 0..1); // great
assert_eq!(mapping[&1], 1..2); // awesome
assert_eq!(mapping[&2], 2..5); // NYC
assert_eq!(mapping[&3], 5..7); // subway
assert_eq!(mapping[&4], 2..3); // new
assert_eq!(mapping[&5], 3..4); // york
assert_eq!(mapping[&6], 4..5); // city
assert_eq!(mapping[&7], 5..6); // underground
assert_eq!(mapping[&8], 6..7); // train
assert_eq!(mapping[&9], 0..2); // good
assert_eq!(mapping[&10], 1..5); // NY
assert_eq!(mapping[&11], 2..7); // metro
}
}
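
The overlap detection that `declare` relies on can be tried in isolation. The sketch below copies `longest_common_prefix` from the file above and wraps it in a hypothetical `overlaps` helper (not part of the original code) that returns the `(common_left, common_right)` pair for a declared range. It is only an illustration of the comment diagram, under the assumption that words are plain `&str` values.

```rust
use std::ops::Range;

/// Same logic as the helper above: scans the suffixes of `a`, shortest first,
/// and counts how many leading items of `b` each suffix matches.
fn longest_common_prefix<T: Eq>(a: &[T], b: &[T]) -> usize {
    let mut best = None;
    for i in (0..a.len()).rev() {
        let count = a[i..].iter().zip(b).take_while(|(x, y)| x == y).count();
        best = match best {
            Some(old) if count > old => Some(count),
            Some(_) => break,
            None => Some(count),
        };
    }
    best.unwrap_or(0)
}

/// Hypothetical helper: how many replacement words overlap the original
/// words located just before and just after `range`.
fn overlaps(originals: &[&str], range: Range<usize>, replacement: &[&str]) -> (usize, usize) {
    let left = &originals[..range.start];
    let right = &originals[range.end..];
    let common_left = longest_common_prefix(left, replacement);
    let common_right = longest_common_prefix(replacement, right);
    (common_left, common_right)
}
```

With the query and replacement used in `original_unmodified3`, the helper reports two shared words on each side, which is why ids 11, 12, 18 and 19 in that test map onto existing original positions.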


@ -1,41 +0,0 @@
use std::io::{Read, Write};
use hashbrown::HashMap;
use meilisearch_schema::FieldId;
use serde::{Deserialize, Serialize};
use crate::{DocumentId, Number};
#[derive(Debug, Default, Clone, PartialEq, Eq, Serialize, Deserialize)]
#[serde(transparent)]
pub struct RankedMap(HashMap<(DocumentId, FieldId), Number>);
impl RankedMap {
pub fn len(&self) -> usize {
self.0.len()
}
pub fn is_empty(&self) -> bool {
self.0.is_empty()
}
pub fn insert(&mut self, document: DocumentId, field: FieldId, number: Number) {
self.0.insert((document, field), number);
}
pub fn remove(&mut self, document: DocumentId, field: FieldId) {
self.0.remove(&(document, field));
}
pub fn get(&self, document: DocumentId, field: FieldId) -> Option<Number> {
self.0.get(&(document, field)).cloned()
}
pub fn read_from_bin<R: Read>(reader: R) -> bincode::Result<RankedMap> {
bincode::deserialize_from(reader).map(RankedMap)
}
pub fn write_to_bin<W: Write>(&self, writer: W) -> bincode::Result<()> {
bincode::serialize_into(writer, &self.0)
}
}
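
A minimal sketch of how a map keyed by `(document, field)` behaves, assuming plain integer stand-ins for `DocumentId`, `FieldId` and `Number` (the real types come from other crates and are not reproduced here). It uses `std::collections::HashMap` instead of `hashbrown` and leaves out the bincode helpers.

```rust
use std::collections::HashMap;

// Stand-in types, chosen only for this illustration.
type DocumentId = u32;
type FieldId = u16;
type Number = i64;

#[derive(Default)]
struct RankedMap(HashMap<(DocumentId, FieldId), Number>);

impl RankedMap {
    fn len(&self) -> usize {
        self.0.len()
    }

    fn insert(&mut self, document: DocumentId, field: FieldId, number: Number) {
        // The tuple key keeps one value per (document, field) pair.
        self.0.insert((document, field), number);
    }

    fn remove(&mut self, document: DocumentId, field: FieldId) {
        self.0.remove(&(document, field));
    }

    fn get(&self, document: DocumentId, field: FieldId) -> Option<Number> {
        self.0.get(&(document, field)).copied()
    }
}
```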


@ -1,51 +0,0 @@
use compact_arena::SmallArena;
use sdset::SetBuf;
use crate::DocIndex;
use crate::bucket_sort::{SimpleMatch, BareMatch, PostingsListView};
use crate::reordered_attrs::ReorderedAttrs;
pub struct RawDocument<'a, 'tag> {
pub id: crate::DocumentId,
pub bare_matches: &'a mut [BareMatch<'tag>],
pub processed_matches: Vec<SimpleMatch>,
/// The list of minimum `distance` found
pub processed_distances: Vec<Option<u8>>,
/// Whether this document contains a field
/// made of a single word that matches exactly
pub contains_one_word_field: bool,
}
impl<'a, 'tag> RawDocument<'a, 'tag> {
pub fn new<'txn>(
bare_matches: &'a mut [BareMatch<'tag>],
postings_lists: &mut SmallArena<'tag, PostingsListView<'txn>>,
searchable_attrs: Option<&ReorderedAttrs>,
) -> RawDocument<'a, 'tag>
{
if let Some(reordered_attrs) = searchable_attrs {
for bm in bare_matches.iter() {
let postings_list = &postings_lists[bm.postings_list];
let mut rewritten = Vec::new();
for di in postings_list.iter() {
if let Some(attribute) = reordered_attrs.get(di.attribute) {
rewritten.push(DocIndex { attribute, ..*di });
}
}
let new_postings = SetBuf::from_dirty(rewritten);
postings_lists[bm.postings_list].rewrite_with(new_postings);
}
}
bare_matches.sort_unstable_by_key(|m| m.query_index);
RawDocument {
id: bare_matches[0].document_id,
bare_matches,
processed_matches: Vec::new(),
processed_distances: Vec::new(),
contains_one_word_field: false,
}
}
}


@ -1,344 +0,0 @@
use std::borrow::Cow;
use std::collections::{BTreeMap, HashMap};
use std::convert::TryFrom;
use meilisearch_schema::IndexedPos;
use meilisearch_tokenizer::analyzer::{Analyzer, AnalyzerConfig};
use meilisearch_tokenizer::{Token, token::SeparatorKind, TokenKind};
use sdset::SetBuf;
use crate::{DocIndex, DocumentId};
use crate::FstSetCow;
const WORD_LENGTH_LIMIT: usize = 80;
type Word = Vec<u8>; // TODO make it be a SmallVec
pub struct RawIndexer<'a, A> {
word_limit: usize, // the maximum number of indexed words
words_doc_indexes: BTreeMap<Word, Vec<DocIndex>>,
docs_words: HashMap<DocumentId, Vec<Word>>,
analyzer: Analyzer<'a, A>,
}
pub struct Indexed<'a> {
pub words_doc_indexes: BTreeMap<Word, SetBuf<DocIndex>>,
pub docs_words: HashMap<DocumentId, FstSetCow<'a>>,
}
impl<'a, A> RawIndexer<'a, A>
where
A: AsRef<[u8]>
{
pub fn new(stop_words: &'a fst::Set<A>) -> RawIndexer<'a, A> {
RawIndexer::with_word_limit(stop_words, 1000)
}
pub fn with_word_limit(stop_words: &'a fst::Set<A>, limit: usize) -> RawIndexer<A> {
RawIndexer {
word_limit: limit,
words_doc_indexes: BTreeMap::new(),
docs_words: HashMap::new(),
analyzer: Analyzer::new(AnalyzerConfig::default_with_stopwords(stop_words)),
}
}
pub fn index_text(&mut self, id: DocumentId, indexed_pos: IndexedPos, text: &str) -> usize {
let mut number_of_words = 0;
let analyzed_text = self.analyzer.analyze(text);
for (token_pos, (word_pos, token)) in process_tokens(analyzed_text.tokens()).enumerate() {
let must_continue = index_token(
token,
word_pos,
token_pos,
id,
indexed_pos,
self.word_limit,
&mut self.words_doc_indexes,
&mut self.docs_words,
);
number_of_words += 1;
if !must_continue {
break;
}
}
number_of_words
}
pub fn index_text_seq<'s, I>(&mut self, id: DocumentId, indexed_pos: IndexedPos, text_iter: I)
where
I: IntoIterator<Item = &'s str>,
{
let mut word_offset = 0;
for text in text_iter.into_iter() {
let current_word_offset = word_offset;
let analyzed_text = self.analyzer.analyze(text);
let tokens = process_tokens(analyzed_text.tokens())
.map(|(i, t)| (i + current_word_offset, t))
.enumerate();
for (token_pos, (word_pos, token)) in tokens {
word_offset = word_pos + 1;
let must_continue = index_token(
token,
word_pos,
token_pos,
id,
indexed_pos,
self.word_limit,
&mut self.words_doc_indexes,
&mut self.docs_words,
);
if !must_continue {
break;
}
}
}
}
pub fn build(self) -> Indexed<'static> {
let words_doc_indexes = self
.words_doc_indexes
.into_iter()
.map(|(word, indexes)| (word, SetBuf::from_dirty(indexes)))
.collect();
let docs_words = self
.docs_words
.into_iter()
.map(|(id, mut words)| {
words.sort_unstable();
words.dedup();
let fst = fst::Set::from_iter(words).unwrap().map_data(Cow::Owned).unwrap();
(id, fst)
})
.collect();
Indexed {
words_doc_indexes,
docs_words,
}
}
}
fn process_tokens<'a>(tokens: impl Iterator<Item = Token<'a>>) -> impl Iterator<Item = (usize, Token<'a>)> {
tokens
.skip_while(|token| !token.is_word())
.scan((0, None), |(offset, prev_kind), token| {
match token.kind {
TokenKind::Word | TokenKind::StopWord | TokenKind::Unknown => {
*offset += match *prev_kind {
Some(TokenKind::Separator(SeparatorKind::Hard)) => 8,
Some(_) => 1,
None => 0,
};
*prev_kind = Some(token.kind)
}
TokenKind::Separator(SeparatorKind::Hard) => {
*prev_kind = Some(token.kind);
}
TokenKind::Separator(SeparatorKind::Soft)
if *prev_kind != Some(TokenKind::Separator(SeparatorKind::Hard)) => {
*prev_kind = Some(token.kind);
}
_ => (),
}
Some((*offset, token))
})
.filter(|(_, t)| t.is_word())
}
#[allow(clippy::too_many_arguments)]
fn index_token(
token: Token,
word_pos: usize,
token_pos: usize,
id: DocumentId,
indexed_pos: IndexedPos,
word_limit: usize,
words_doc_indexes: &mut BTreeMap<Word, Vec<DocIndex>>,
docs_words: &mut HashMap<DocumentId, Vec<Word>>,
) -> bool
{
if token_pos >= word_limit {
return false;
}
if !token.is_stopword() {
match token_to_docindex(id, indexed_pos, &token, word_pos) {
Some(docindex) => {
let word = Vec::from(token.word.as_ref());
if word.len() <= WORD_LENGTH_LIMIT {
words_doc_indexes
.entry(word.clone())
.or_insert_with(Vec::new)
.push(docindex);
docs_words.entry(id).or_insert_with(Vec::new).push(word);
}
}
None => return false,
}
}
true
}
fn token_to_docindex(id: DocumentId, indexed_pos: IndexedPos, token: &Token, word_index: usize) -> Option<DocIndex> {
let word_index = u16::try_from(word_index).ok()?;
let char_index = u16::try_from(token.byte_start).ok()?;
let char_length = u16::try_from(token.word.len()).ok()?;
let docindex = DocIndex {
document_id: id,
attribute: indexed_pos.0,
word_index,
char_index,
char_length,
};
Some(docindex)
}
#[cfg(test)]
mod tests {
use super::*;
use meilisearch_schema::IndexedPos;
use meilisearch_tokenizer::{Analyzer, AnalyzerConfig};
use fst::Set;
#[test]
fn test_process_token() {
let text = " 為一包含一千多萬目詞的帶標記平衡語料庫";
let stopwords = Set::default();
let analyzer = Analyzer::new(AnalyzerConfig::default_with_stopwords(&stopwords));
let analyzer = analyzer.analyze(text);
let tokens: Vec<_> = process_tokens(analyzer.tokens()).map(|(_, t)| t.text().to_string()).collect();
assert_eq!(tokens, ["", "", "包含", "一千多万", "目词", "", "", "标记", "平衡", "语料库"]);
}
#[test]
fn strange_apostrophe() {
let stop_words = fst::Set::default();
let mut indexer = RawIndexer::new(&stop_words);
let docid = DocumentId(0);
let indexed_pos = IndexedPos(0);
let text = "Zut, l’aspirateur, j’ai oublié de l’éteindre !";
indexer.index_text(docid, indexed_pos, text);
let Indexed {
words_doc_indexes, ..
} = indexer.build();
assert!(words_doc_indexes.get(&b"l"[..]).is_some());
assert!(words_doc_indexes.get(&b"aspirateur"[..]).is_some());
assert!(words_doc_indexes.get(&b"ai"[..]).is_some());
assert!(words_doc_indexes.get(&b"eteindre"[..]).is_some());
}
#[test]
fn strange_apostrophe_in_sequence() {
let stop_words = fst::Set::default();
let mut indexer = RawIndexer::new(&stop_words);
let docid = DocumentId(0);
let indexed_pos = IndexedPos(0);
let text = vec!["Zut, l’aspirateur, j’ai oublié de l’éteindre !"];
indexer.index_text_seq(docid, indexed_pos, text);
let Indexed {
words_doc_indexes, ..
} = indexer.build();
assert!(words_doc_indexes.get(&b"l"[..]).is_some());
assert!(words_doc_indexes.get(&b"aspirateur"[..]).is_some());
assert!(words_doc_indexes.get(&b"ai"[..]).is_some());
assert!(words_doc_indexes.get(&b"eteindre"[..]).is_some());
}
#[test]
fn basic_stop_words() {
let stop_words = sdset::SetBuf::from_dirty(vec!["l", "j", "ai", "de"]);
let stop_words = fst::Set::from_iter(stop_words).unwrap();
let mut indexer = RawIndexer::new(&stop_words);
let docid = DocumentId(0);
let indexed_pos = IndexedPos(0);
let text = "Zut, l’aspirateur, j’ai oublié de l’éteindre !";
indexer.index_text(docid, indexed_pos, text);
let Indexed {
words_doc_indexes, ..
} = indexer.build();
assert!(words_doc_indexes.get(&b"l"[..]).is_none());
assert!(words_doc_indexes.get(&b"aspirateur"[..]).is_some());
assert!(words_doc_indexes.get(&b"j"[..]).is_none());
assert!(words_doc_indexes.get(&b"ai"[..]).is_none());
assert!(words_doc_indexes.get(&b"de"[..]).is_none());
assert!(words_doc_indexes.get(&b"eteindre"[..]).is_some());
}
#[test]
fn no_empty_unidecode() {
let stop_words = fst::Set::default();
let mut indexer = RawIndexer::new(&stop_words);
let docid = DocumentId(0);
let indexed_pos = IndexedPos(0);
let text = "🇯🇵";
indexer.index_text(docid, indexed_pos, text);
let Indexed {
words_doc_indexes, ..
} = indexer.build();
assert!(words_doc_indexes
.get(&"🇯🇵".to_owned().into_bytes())
.is_some());
}
#[test]
// test sample from 807
fn very_long_text() {
let stop_words = fst::Set::default();
let mut indexer = RawIndexer::new(&stop_words);
let indexed_pos = IndexedPos(0);
let docid = DocumentId(0);
let text = " The locations block is the most powerful, and potentially most involved, section of the .platform.app.yaml file. It allows you to control how the application container responds to incoming requests at a very fine-grained level. Common patterns also vary between language containers due to the way PHP-FPM handles incoming requests.\nEach entry of the locations block is an absolute URI path (with leading /) and its value includes the configuration directives for how the web server should handle matching requests. That is, if your domain is example.com then '/' means &ldquo;requests for example.com/&rdquo;, while '/admin' means &ldquo;requests for example.com/admin&rdquo;. If multiple blocks could match an incoming request then the most-specific will apply.\nweb:locations:&#39;/&#39;:# Rules for all requests that don&#39;t otherwise match....&#39;/sites/default/files&#39;:# Rules for any requests that begin with /sites/default/files....The simplest possible locations configuration is one that simply passes all requests on to your application unconditionally:\nweb:locations:&#39;/&#39;:passthru:trueThat is, all requests to /* should be forwarded to the process started by web.commands.start above. Note that for PHP containers the passthru key must specify what PHP file the request should be forwarded to, and must also specify a docroot under which the file lives. For example:\nweb:locations:&#39;/&#39;:root:&#39;web&#39;passthru:&#39;/app.php&#39;This block will serve requests to / from the web directory in the application, and if a file doesn&rsquo;t exist on disk then the request will be forwarded to the /app.php script.\nA full list of the possible subkeys for locations is below.\n root: The folder from which to serve static assets for this location relative to the application root. The application root is the directory in which the .platform.app.yaml file is located. Typical values for this property include public or web. 
Setting it to '' is not recommended, and its behavior may vary depending on the type of application. Absolute paths are not supported.\n passthru: Whether to forward disallowed and missing resources from this location to the application and can be true, false or an absolute URI path (with leading /). The default value is false. For non-PHP applications it will generally be just true or false. In a PHP application this will typically be the front controller such as /index.php or /app.php. This entry works similar to mod_rewrite under Apache. Note: If the value of passthru does not begin with the same value as the location key it is under, the passthru may evaluate to another entry. That may be useful when you want different cache settings for different paths, for instance, but want missing files in all of them to map back to the same front controller. See the example block below.\n index: The files to consider when serving a request for a directory: an array of file names or null. (typically ['index.html']). Note that in order for this to work, access to the static files named must be allowed by the allow or rules keys for this location.\n expires: How long to allow static assets from this location to be cached (this enables the Cache-Control and Expires headers) and can be a time or -1 for no caching (default). Times can be suffixed with &ldquo;ms&rdquo; (milliseconds), &ldquo;s&rdquo; (seconds), &ldquo;m&rdquo; (minutes), &ldquo;h&rdquo; (hours), &ldquo;d&rdquo; (days), &ldquo;w&rdquo; (weeks), &ldquo;M&rdquo; (months, 30d) or &ldquo;y&rdquo; (years, 365d).\n scripts: Whether to allow loading scripts in that location (true or false). This directive is only meaningful on PHP.\n allow: Whether to allow serving files which don&rsquo;t match a rule (true or false, default: true).\n headers: Any additional headers to apply to static assets. This section is a mapping of header names to header values. 
Responses from the application aren&rsquo;t affected, to avoid overlap with the application&rsquo;s own ability to include custom headers in the response.\n rules: Specific overrides for a specific location. The key is a PCRE (regular expression) that is matched against the full request path.\n request_buffering: Most application servers do not support chunked requests (e.g. fpm, uwsgi), so Platform.sh enables request_buffering by default to handle them. That default configuration would look like this if it was present in .platform.app.yaml:\nweb:locations:&#39;/&#39;:passthru:truerequest_buffering:enabled:truemax_request_size:250mIf the application server can already efficiently handle chunked requests, the request_buffering subkey can be modified to disable it entirely (enabled: false). Additionally, applications that frequently deal with uploads greater than 250MB in size can update the max_request_size key to the application&rsquo;s needs. Note that modifications to request_buffering will need to be specified at each location where it is desired.\n ";
indexer.index_text(docid, indexed_pos, text);
let Indexed {
words_doc_indexes, ..
} = indexer.build();
assert!(words_doc_indexes.get(&"request".to_owned().into_bytes()).is_some());
}
#[test]
fn words_over_index_1000_not_indexed() {
let stop_words = fst::Set::default();
let mut indexer = RawIndexer::new(&stop_words);
let indexed_pos = IndexedPos(0);
let docid = DocumentId(0);
let mut text = String::with_capacity(5000);
for _ in 0..1000 {
text.push_str("less ");
}
text.push_str("more");
indexer.index_text(docid, indexed_pos, &text);
let Indexed {
words_doc_indexes, ..
} = indexer.build();
assert!(words_doc_indexes.get(&"less".to_owned().into_bytes()).is_some());
assert!(words_doc_indexes.get(&"more".to_owned().into_bytes()).is_none());
}
}
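
The position arithmetic in `process_tokens` is easier to follow on a reduced model. The sketch below assumes a hypothetical three-variant `Kind` enum in place of the tokenizer's `TokenKind`, and a hypothetical `word_positions` function that returns only the positions assigned to words. It follows the same rules as the `scan` closure above: the first word gets position 0, a word after a hard separator jumps by 8, and any other word advances by 1.

```rust
// Simplified stand-in for the tokenizer's token kinds (an assumption of this sketch).
#[derive(Clone, Copy, PartialEq, Debug)]
enum Kind {
    Word,
    Soft, // soft separator, e.g. a space
    Hard, // hard separator, e.g. a sentence end
}

fn word_positions(kinds: &[Kind]) -> Vec<usize> {
    let mut offset = 0;
    let mut prev: Option<Kind> = None;
    let mut positions = Vec::new();

    // Leading separators are skipped, as in the original `skip_while`.
    for &kind in kinds.iter().skip_while(|k| **k != Kind::Word) {
        match kind {
            Kind::Word => {
                offset += match prev {
                    Some(Kind::Hard) => 8,
                    Some(_) => 1,
                    None => 0,
                };
                prev = Some(kind);
                positions.push(offset);
            }
            Kind::Hard => prev = Some(kind),
            // A soft separator never overrides a hard one that precedes it.
            Kind::Soft if prev != Some(Kind::Hard) => prev = Some(kind),
            Kind::Soft => {}
        }
    }
    positions
}
```

The gap of 8 after a hard separator keeps words from different sentences far apart, so that proximity-based ranking does not treat them as neighbours.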


@ -1,31 +0,0 @@
use std::cmp;
#[derive(Default, Clone)]
pub struct ReorderedAttrs {
reorders: Vec<Option<u16>>,
reverse: Vec<u16>,
}
impl ReorderedAttrs {
pub fn new() -> ReorderedAttrs {
ReorderedAttrs { reorders: Vec::new(), reverse: Vec::new() }
}
pub fn insert_attribute(&mut self, attribute: u16) {
let new_len = cmp::max(attribute as usize + 1, self.reorders.len());
self.reorders.resize(new_len, None);
self.reorders[attribute as usize] = Some(self.reverse.len() as u16);
self.reverse.push(attribute);
}
pub fn get(&self, attribute: u16) -> Option<u16> {
match self.reorders.get(attribute as usize)? {
Some(attribute) => Some(*attribute),
None => None,
}
}
pub fn reverse(&self, attribute: u16) -> Option<u16> {
self.reverse.get(attribute as usize).copied()
}
}
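
A standalone copy of the structure above makes the two lookup directions concrete. This is a sketch for illustration, with `get` written using `copied().flatten()` rather than the explicit `match`; behaviour is intended to be the same.

```rust
use std::cmp;

#[derive(Default)]
struct ReorderedAttrs {
    reorders: Vec<Option<u16>>, // original attribute -> rank in insertion order
    reverse: Vec<u16>,          // rank in insertion order -> original attribute
}

impl ReorderedAttrs {
    fn insert_attribute(&mut self, attribute: u16) {
        // Grow the table so that `attribute` is a valid index, never shrink it.
        let new_len = cmp::max(attribute as usize + 1, self.reorders.len());
        self.reorders.resize(new_len, None);
        self.reorders[attribute as usize] = Some(self.reverse.len() as u16);
        self.reverse.push(attribute);
    }

    fn get(&self, attribute: u16) -> Option<u16> {
        self.reorders.get(attribute as usize).copied().flatten()
    }

    fn reverse(&self, attribute: u16) -> Option<u16> {
        self.reverse.get(attribute as usize).copied()
    }
}
```

Inserting attribute 3 and then attribute 1 gives attribute 3 rank 0 and attribute 1 rank 1. Attributes that were never inserted return `None` from `get`, which is what lets `RawDocument::new` drop postings for non-searchable attributes.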


@ -1,161 +0,0 @@
use std::collections::HashSet;
use std::io::Cursor;
use std::{error::Error, fmt};
use meilisearch_schema::{Schema, FieldId};
use serde::{de, forward_to_deserialize_any};
use serde_json::de::IoRead as SerdeJsonIoRead;
use serde_json::Deserializer as SerdeJsonDeserializer;
use serde_json::Error as SerdeJsonError;
use crate::database::MainT;
use crate::store::DocumentsFields;
use crate::DocumentId;
#[derive(Debug)]
pub enum DeserializerError {
SerdeJson(SerdeJsonError),
Zlmdb(heed::Error),
Custom(String),
}
impl de::Error for DeserializerError {
fn custom<T: fmt::Display>(msg: T) -> Self {
DeserializerError::Custom(msg.to_string())
}
}
impl fmt::Display for DeserializerError {
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
match self {
DeserializerError::SerdeJson(e) => write!(f, "serde json related error: {}", e),
DeserializerError::Zlmdb(e) => write!(f, "heed related error: {}", e),
DeserializerError::Custom(s) => f.write_str(s),
}
}
}
impl Error for DeserializerError {}
impl From<SerdeJsonError> for DeserializerError {
fn from(error: SerdeJsonError) -> DeserializerError {
DeserializerError::SerdeJson(error)
}
}
impl From<heed::Error> for DeserializerError {
fn from(error: heed::Error) -> DeserializerError {
DeserializerError::Zlmdb(error)
}
}
pub struct Deserializer<'a> {
pub document_id: DocumentId,
pub reader: &'a heed::RoTxn<'a, MainT>,
pub documents_fields: DocumentsFields,
pub schema: &'a Schema,
pub fields: Option<&'a HashSet<FieldId>>,
}
impl<'de, 'a, 'b> de::Deserializer<'de> for &'b mut Deserializer<'a> {
type Error = DeserializerError;
fn deserialize_any<V>(self, visitor: V) -> Result<V::Value, Self::Error>
where
V: de::Visitor<'de>,
{
self.deserialize_option(visitor)
}
fn deserialize_option<V>(self, visitor: V) -> Result<V::Value, Self::Error>
where
V: de::Visitor<'de>,
{
self.deserialize_map(visitor)
}
fn deserialize_map<V>(self, visitor: V) -> Result<V::Value, Self::Error>
where
V: de::Visitor<'de>,
{
let mut error = None;
let iter = self
.documents_fields
.document_fields(self.reader, self.document_id)?
.filter_map(|result| {
let (attr, value) = match result {
Ok(value) => value,
Err(e) => {
error = Some(e);
return None;
}
};
let is_displayed = self.schema.is_displayed(attr);
if is_displayed && self.fields.map_or(true, |f| f.contains(&attr)) {
if let Some(attribute_name) = self.schema.name(attr) {
let cursor = Cursor::new(value.to_owned());
let ioread = SerdeJsonIoRead::new(cursor);
let value = Value(SerdeJsonDeserializer::new(ioread));
Some((attribute_name, value))
} else {
None
}
} else {
None
}
});
let mut iter = iter.peekable();
let result = match iter.peek() {
Some(_) => {
let map_deserializer = de::value::MapDeserializer::new(iter);
visitor
.visit_some(map_deserializer)
.map_err(DeserializerError::from)
}
None => visitor.visit_none(),
};
match error.take() {
Some(error) => Err(error.into()),
None => result,
}
}
forward_to_deserialize_any! {
bool i8 i16 i32 i64 i128 u8 u16 u32 u64 u128 f32 f64 char str string
bytes byte_buf unit unit_struct newtype_struct seq tuple
tuple_struct struct enum identifier ignored_any
}
}
struct Value(SerdeJsonDeserializer<SerdeJsonIoRead<Cursor<Vec<u8>>>>);
impl<'de> de::IntoDeserializer<'de, SerdeJsonError> for Value {
type Deserializer = Self;
fn into_deserializer(self) -> Self::Deserializer {
self
}
}
impl<'de> de::Deserializer<'de> for Value {
type Error = SerdeJsonError;
fn deserialize_any<V>(mut self, visitor: V) -> Result<V::Value, Self::Error>
where
V: de::Visitor<'de>,
{
self.0.deserialize_any(visitor)
}
forward_to_deserialize_any! {
bool i8 i16 i32 i64 i128 u8 u16 u32 u64 u128 f32 f64 char str string
bytes byte_buf option unit unit_struct newtype_struct seq tuple
tuple_struct map struct enum identifier ignored_any
}
}
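
`deserialize_map` above stashes an error in a local variable from inside the `filter_map` closure and checks it once the iterator has been consumed. The sketch below shows that pattern on plain `Result<i32, String>` items. The function name `collect_ok` is hypothetical, and like the original it keeps the last error seen rather than the first.

```rust
fn collect_ok(items: Vec<Result<i32, String>>) -> Result<Vec<i32>, String> {
    let mut error = None;

    // The closure records a failure and yields nothing for that item,
    // so the iterator itself only ever produces plain values.
    let values: Vec<i32> = items
        .into_iter()
        .filter_map(|result| match result {
            Ok(value) => Some(value),
            Err(e) => {
                error = Some(e);
                None
            }
        })
        .collect();

    // Only after the iterator is exhausted do we decide between Ok and Err.
    match error {
        Some(e) => Err(e),
        None => Ok(values),
    }
}
```

This shape is useful when the consumer, here `MapDeserializer`, expects an iterator of values and has no way to receive a `Result` per item.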


@ -1,92 +0,0 @@
mod deserializer;
pub use self::deserializer::{Deserializer, DeserializerError};
use std::{error::Error, fmt};
use serde::ser;
use serde_json::Error as SerdeJsonError;
use meilisearch_schema::Error as SchemaError;
use crate::ParseNumberError;
#[derive(Debug)]
pub enum SerializerError {
DocumentIdNotFound,
InvalidDocumentIdFormat,
Zlmdb(heed::Error),
SerdeJson(SerdeJsonError),
ParseNumber(ParseNumberError),
Schema(SchemaError),
UnserializableType { type_name: &'static str },
UnindexableType { type_name: &'static str },
UnrankableType { type_name: &'static str },
Custom(String),
}
impl ser::Error for SerializerError {
fn custom<T: fmt::Display>(msg: T) -> Self {
SerializerError::Custom(msg.to_string())
}
}
impl fmt::Display for SerializerError {
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
match self {
SerializerError::DocumentIdNotFound => {
f.write_str("Primary key is missing.")
}
SerializerError::InvalidDocumentIdFormat => {
f.write_str("a document primary key can be of type integer or string only composed of alphanumeric characters, hyphens (-) and underscores (_).")
}
SerializerError::Zlmdb(e) => write!(f, "heed related error: {}", e),
SerializerError::SerdeJson(e) => write!(f, "serde json error: {}", e),
SerializerError::ParseNumber(e) => {
write!(f, "error while trying to parse a number: {}", e)
}
SerializerError::Schema(e) => write!(f, "impossible to update schema: {}", e),
SerializerError::UnserializableType { type_name } => {
write!(f, "{} is not a serializable type", type_name)
}
SerializerError::UnindexableType { type_name } => {
write!(f, "{} is not an indexable type", type_name)
}
SerializerError::UnrankableType { type_name } => {
write!(f, "{} types can not be used for ranking", type_name)
}
SerializerError::Custom(s) => f.write_str(s),
}
}
}
impl Error for SerializerError {}
impl From<String> for SerializerError {
fn from(value: String) -> SerializerError {
SerializerError::Custom(value)
}
}
impl From<SerdeJsonError> for SerializerError {
fn from(error: SerdeJsonError) -> SerializerError {
SerializerError::SerdeJson(error)
}
}
impl From<heed::Error> for SerializerError {
fn from(error: heed::Error) -> SerializerError {
SerializerError::Zlmdb(error)
}
}
impl From<ParseNumberError> for SerializerError {
fn from(error: ParseNumberError) -> SerializerError {
SerializerError::ParseNumber(error)
}
}
impl From<SchemaError> for SerializerError {
fn from(error: SchemaError) -> SerializerError {
SerializerError::Schema(error)
}
}
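
The `From` impls above are what let callers use `?` on heed, serde_json, or schema errors inside functions that return `SerializerError`. A minimal sketch of the same pattern using only the standard library follows; `DemoError` and `parse_positive` are hypothetical names for illustration and are not part of the removed code.

```rust
use std::fmt;
use std::num::ParseIntError;

// Hypothetical, trimmed-down stand-in for `SerializerError`.
#[derive(Debug)]
pub enum DemoError {
    ParseNumber(ParseIntError),
    Custom(String),
}

impl fmt::Display for DemoError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            DemoError::ParseNumber(e) => {
                write!(f, "error while trying to parse a number: {}", e)
            }
            DemoError::Custom(s) => f.write_str(s),
        }
    }
}

impl std::error::Error for DemoError {}

impl From<ParseIntError> for DemoError {
    fn from(error: ParseIntError) -> Self {
        DemoError::ParseNumber(error)
    }
}

impl From<String> for DemoError {
    fn from(value: String) -> Self {
        DemoError::Custom(value)
    }
}

// `?` converts the `ParseIntError` through the matching `From` impl,
// and `.into()` does the same for a plain `String` message.
pub fn parse_positive(input: &str) -> Result<u32, DemoError> {
    let n: i64 = input.trim().parse()?;
    if n < 0 {
        return Err(format!("{} is negative", n).into());
    }
    if n > i64::from(u32::MAX) {
        return Err(format!("{} is too large", n).into());
    }
    Ok(n as u32)
}
```

Calling `parse_positive("abc")` yields the `ParseNumber` variant without any explicit `map_err` at the call site, which is the main benefit of providing one `From` impl per wrapped error type.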


@ -1,183 +0,0 @@
use std::collections::{BTreeMap, BTreeSet};
use std::str::FromStr;
use std::iter::IntoIterator;
use serde::{Deserialize, Deserializer, Serialize};
use once_cell::sync::Lazy;
use self::RankingRule::*;
pub const DEFAULT_RANKING_RULES: [RankingRule; 6] = [Typo, Words, Proximity, Attribute, WordsPosition, Exactness];
static RANKING_RULE_REGEX: Lazy<regex::Regex> = Lazy::new(|| {
regex::Regex::new(r"(asc|desc)\(([a-zA-Z0-9-_]*)\)").unwrap()
});
#[derive(Default, Clone, Serialize, Deserialize, Debug)]
#[serde(rename_all = "camelCase", deny_unknown_fields)]
pub struct Settings {
#[serde(default, deserialize_with = "deserialize_some")]
pub ranking_rules: Option<Option<Vec<String>>>,
#[serde(default, deserialize_with = "deserialize_some")]
pub distinct_attribute: Option<Option<String>>,
#[serde(default, deserialize_with = "deserialize_some")]
pub searchable_attributes: Option<Option<Vec<String>>>,
#[serde(default, deserialize_with = "deserialize_some")]
pub displayed_attributes: Option<Option<BTreeSet<String>>>,
#[serde(default, deserialize_with = "deserialize_some")]
pub stop_words: Option<Option<BTreeSet<String>>>,
#[serde(default, deserialize_with = "deserialize_some")]
pub synonyms: Option<Option<BTreeMap<String, Vec<String>>>>,
#[serde(default, deserialize_with = "deserialize_some")]
pub attributes_for_faceting: Option<Option<Vec<String>>>,
}
// Any value that is present is considered Some value, including null.
fn deserialize_some<'de, T, D>(deserializer: D) -> Result<Option<T>, D::Error>
where T: Deserialize<'de>,
D: Deserializer<'de>
{
Deserialize::deserialize(deserializer).map(Some)
}
impl Settings {
pub fn to_update(&self) -> Result<SettingsUpdate, RankingRuleConversionError> {
let settings = self.clone();
let ranking_rules = match settings.ranking_rules {
Some(Some(rules)) => UpdateState::Update(RankingRule::try_from_iter(rules.iter())?),
Some(None) => UpdateState::Clear,
None => UpdateState::Nothing,
};
Ok(SettingsUpdate {
ranking_rules,
distinct_attribute: settings.distinct_attribute.into(),
primary_key: UpdateState::Nothing,
searchable_attributes: settings.searchable_attributes.into(),
displayed_attributes: settings.displayed_attributes.into(),
stop_words: settings.stop_words.into(),
synonyms: settings.synonyms.into(),
attributes_for_faceting: settings.attributes_for_faceting.into(),
})
}
}
#[derive(Debug, Clone, Serialize, Deserialize)]
pub enum UpdateState<T> {
Update(T),
Clear,
Nothing,
}
impl <T> From<Option<Option<T>>> for UpdateState<T> {
fn from(opt: Option<Option<T>>) -> UpdateState<T> {
match opt {
Some(Some(t)) => UpdateState::Update(t),
Some(None) => UpdateState::Clear,
None => UpdateState::Nothing,
}
}
}
#[derive(Debug, Clone)]
pub struct RankingRuleConversionError;
impl std::fmt::Display for RankingRuleConversionError {
fn fmt(&self, f: &mut std::fmt::Formatter) -> std::fmt::Result {
write!(f, "impossible to convert into RankingRule")
}
}
#[derive(Debug, Clone, Serialize, Deserialize)]
pub enum RankingRule {
Typo,
Words,
Proximity,
Attribute,
WordsPosition,
Exactness,
Asc(String),
Desc(String),
}
impl std::fmt::Display for RankingRule {
fn fmt(&self, f: &mut std::fmt::Formatter) -> std::fmt::Result {
match self {
RankingRule::Typo => f.write_str("typo"),
RankingRule::Words => f.write_str("words"),
RankingRule::Proximity => f.write_str("proximity"),
RankingRule::Attribute => f.write_str("attribute"),
RankingRule::WordsPosition => f.write_str("wordsPosition"),
RankingRule::Exactness => f.write_str("exactness"),
RankingRule::Asc(field) => write!(f, "asc({})", field),
RankingRule::Desc(field) => write!(f, "desc({})", field),
}
}
}
impl FromStr for RankingRule {
type Err = RankingRuleConversionError;
fn from_str(s: &str) -> Result<Self, Self::Err> {
let rule = match s {
"typo" => RankingRule::Typo,
"words" => RankingRule::Words,
"proximity" => RankingRule::Proximity,
"attribute" => RankingRule::Attribute,
"wordsPosition" => RankingRule::WordsPosition,
"exactness" => RankingRule::Exactness,
_ => {
let captures = RANKING_RULE_REGEX.captures(s).ok_or(RankingRuleConversionError)?;
match (captures.get(1).map(|m| m.as_str()), captures.get(2)) {
(Some("asc"), Some(field)) => RankingRule::Asc(field.as_str().to_string()),
(Some("desc"), Some(field)) => RankingRule::Desc(field.as_str().to_string()),
_ => return Err(RankingRuleConversionError)
}
}
};
Ok(rule)
}
}
impl RankingRule {
pub fn field(&self) -> Option<&str> {
match self {
RankingRule::Asc(field) | RankingRule::Desc(field) => Some(field),
_ => None,
}
}
pub fn try_from_iter(rules: impl IntoIterator<Item = impl AsRef<str>>) -> Result<Vec<RankingRule>, RankingRuleConversionError> {
rules.into_iter()
.map(|s| RankingRule::from_str(s.as_ref()))
.collect()
}
}
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct SettingsUpdate {
pub ranking_rules: UpdateState<Vec<RankingRule>>,
pub distinct_attribute: UpdateState<String>,
pub primary_key: UpdateState<String>,
pub searchable_attributes: UpdateState<Vec<String>>,
pub displayed_attributes: UpdateState<BTreeSet<String>>,
pub stop_words: UpdateState<BTreeSet<String>>,
pub synonyms: UpdateState<BTreeMap<String, Vec<String>>>,
pub attributes_for_faceting: UpdateState<Vec<String>>,
}
impl Default for SettingsUpdate {
fn default() -> Self {
Self {
ranking_rules: UpdateState::Nothing,
distinct_attribute: UpdateState::Nothing,
primary_key: UpdateState::Nothing,
searchable_attributes: UpdateState::Nothing,
displayed_attributes: UpdateState::Nothing,
stop_words: UpdateState::Nothing,
synonyms: UpdateState::Nothing,
attributes_for_faceting: UpdateState::Nothing,
}
}
}
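
The settings file above relies on a three-state convention: a field that is absent means "leave untouched", an explicit `null` means "reset", and a value means "replace". The sketch below reproduces that mapping with the standard library only. It is an illustration under assumptions: `Rule`, `call_argument`, `parse_rule`, and `rules_update` are hypothetical names, only two built-in rules are modeled, and the hand-written parser matches the whole string, whereas the original regex is unanchored.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum UpdateState<T> {
    Update(T),
    Clear,
    Nothing,
}

impl<T> From<Option<Option<T>>> for UpdateState<T> {
    fn from(opt: Option<Option<T>>) -> Self {
        match opt {
            Some(Some(t)) => UpdateState::Update(t), // field present with a value
            Some(None) => UpdateState::Clear,        // field present but null
            None => UpdateState::Nothing,            // field absent
        }
    }
}

// Hypothetical, reduced version of `RankingRule`.
#[derive(Debug, Clone, PartialEq)]
pub enum Rule {
    Typo,
    Words,
    Asc(String),
    Desc(String),
}

// Extracts `field` from `name(field)`, accepting the same characters as
// the original character class: ASCII letters, digits, `-` and `_`.
fn call_argument<'a>(s: &'a str, name: &str) -> Option<&'a str> {
    let field = s.strip_prefix(name)?.strip_prefix('(')?.strip_suffix(')')?;
    let valid = field
        .chars()
        .all(|c| c.is_ascii_alphanumeric() || c == '-' || c == '_');
    if valid { Some(field) } else { None }
}

pub fn parse_rule(s: &str) -> Option<Rule> {
    match s {
        "typo" => Some(Rule::Typo),
        "words" => Some(Rule::Words),
        _ => call_argument(s, "asc")
            .map(|f| Rule::Asc(f.to_string()))
            .or_else(|| call_argument(s, "desc").map(|f| Rule::Desc(f.to_string()))),
    }
}

// Parses the rules only when a value was supplied, then maps the nested
// option onto the three update states.
pub fn rules_update(
    input: Option<Option<Vec<String>>>,
) -> Result<UpdateState<Vec<Rule>>, String> {
    let parsed = match input {
        Some(Some(rules)) => {
            let mut out = Vec::with_capacity(rules.len());
            for raw in &rules {
                let rule = parse_rule(raw)
                    .ok_or_else(|| format!("impossible to convert {:?} into a ranking rule", raw))?;
                out.push(rule);
            }
            Some(Some(out))
        }
        Some(None) => Some(None),
        None => None,
    };
    Ok(parsed.into())
}
```

Keeping the outer `Option` for presence and the inner one for `null` is what allows a partial settings payload to reset one field while leaving the others unchanged.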


@ -1,32 +0,0 @@
use std::borrow::Cow;
use heed::{types::CowSlice, BytesEncode, BytesDecode};
use sdset::{Set, SetBuf};
use zerocopy::{AsBytes, FromBytes};
pub struct CowSet<T>(std::marker::PhantomData<T>);
impl<'a, T: 'a> BytesEncode<'a> for CowSet<T>
where
T: AsBytes,
{
type EItem = Set<T>;
fn bytes_encode(item: &'a Self::EItem) -> Option<Cow<[u8]>> {
CowSlice::bytes_encode(item.as_slice())
}
}
impl<'a, T: 'a> BytesDecode<'a> for CowSet<T>
where
T: FromBytes + Copy,
{
type DItem = Cow<'a, Set<T>>;
fn bytes_decode(bytes: &'a [u8]) -> Option<Self::DItem> {
match CowSlice::<T>::bytes_decode(bytes)? {
Cow::Owned(vec) => Some(Cow::Owned(SetBuf::new_unchecked(vec))),
Cow::Borrowed(slice) => Some(Cow::Borrowed(Set::new_unchecked(slice))),
}
}
}


@ -1,43 +0,0 @@
use std::borrow::Cow;
use heed::Result as ZResult;
use heed::types::{ByteSlice, OwnedType};
use crate::database::MainT;
use crate::{DocumentId, FstSetCow};
use super::BEU32;
#[derive(Copy, Clone)]
pub struct DocsWords {
pub(crate) docs_words: heed::Database<OwnedType<BEU32>, ByteSlice>,
}
impl DocsWords {
pub fn put_doc_words(
self,
writer: &mut heed::RwTxn<MainT>,
document_id: DocumentId,
words: &FstSetCow,
) -> ZResult<()> {
let document_id = BEU32::new(document_id.0);
let bytes = words.as_fst().as_bytes();
self.docs_words.put(writer, &document_id, bytes)
}
pub fn del_doc_words(self, writer: &mut heed::RwTxn<MainT>, document_id: DocumentId) -> ZResult<bool> {
let document_id = BEU32::new(document_id.0);
self.docs_words.delete(writer, &document_id)
}
pub fn clear(self, writer: &mut heed::RwTxn<MainT>) -> ZResult<()> {
self.docs_words.clear(writer)
}
pub fn doc_words<'a>(self, reader: &'a heed::RoTxn<'a, MainT>, document_id: DocumentId) -> ZResult<FstSetCow> {
let document_id = BEU32::new(document_id.0);
match self.docs_words.get(reader, &document_id)? {
Some(bytes) => Ok(fst::Set::new(bytes).unwrap().map_data(Cow::Borrowed).unwrap()),
None => Ok(fst::Set::default().map_data(Cow::Owned).unwrap()),
}
}
}


@ -1,79 +0,0 @@
use heed::types::{ByteSlice, OwnedType};
use crate::database::MainT;
use heed::Result as ZResult;
use meilisearch_schema::FieldId;
use super::DocumentFieldStoredKey;
use crate::DocumentId;
#[derive(Copy, Clone)]
pub struct DocumentsFields {
pub(crate) documents_fields: heed::Database<OwnedType<DocumentFieldStoredKey>, ByteSlice>,
}
impl DocumentsFields {
pub fn put_document_field(
self,
writer: &mut heed::RwTxn<MainT>,
document_id: DocumentId,
field: FieldId,
value: &[u8],
) -> ZResult<()> {
let key = DocumentFieldStoredKey::new(document_id, field);
self.documents_fields.put(writer, &key, value)
}
pub fn del_all_document_fields(
self,
writer: &mut heed::RwTxn<MainT>,
document_id: DocumentId,
) -> ZResult<usize> {
let start = DocumentFieldStoredKey::new(document_id, FieldId::min());
let end = DocumentFieldStoredKey::new(document_id, FieldId::max());
self.documents_fields.delete_range(writer, &(start..=end))
}
pub fn clear(self, writer: &mut heed::RwTxn<MainT>) -> ZResult<()> {
self.documents_fields.clear(writer)
}
pub fn document_attribute<'txn>(
self,
reader: &'txn heed::RoTxn<MainT>,
document_id: DocumentId,
field: FieldId,
) -> ZResult<Option<&'txn [u8]>> {
let key = DocumentFieldStoredKey::new(document_id, field);
self.documents_fields.get(reader, &key)
}
pub fn document_fields<'txn>(
self,
reader: &'txn heed::RoTxn<MainT>,
document_id: DocumentId,
) -> ZResult<DocumentFieldsIter<'txn>> {
let start = DocumentFieldStoredKey::new(document_id, FieldId::min());
let end = DocumentFieldStoredKey::new(document_id, FieldId::max());
let iter = self.documents_fields.range(reader, &(start..=end))?;
Ok(DocumentFieldsIter { iter })
}
}
pub struct DocumentFieldsIter<'txn> {
iter: heed::RoRange<'txn, OwnedType<DocumentFieldStoredKey>, ByteSlice>,
}
impl<'txn> Iterator for DocumentFieldsIter<'txn> {
type Item = ZResult<(FieldId, &'txn [u8])>;
fn next(&mut self) -> Option<Self::Item> {
match self.iter.next() {
Some(Ok((key, bytes))) => {
let field_id = FieldId(key.field_id.get());
Some(Ok((field_id, bytes)))
}
Some(Err(e)) => Some(Err(e)),
None => None,
}
}
}
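
`document_fields` and `del_all_document_fields` above both scan the range from `(document_id, FieldId::min())` to `(document_id, FieldId::max())`. That only works because the key stores both parts in big-endian form, so byte-wise key order equals numeric order. The following sketch illustrates the idea with a `BTreeMap` standing in for the LMDB database; `key` and `Store` are hypothetical names and the 6-byte layout is an assumption modeled on `DocumentFieldStoredKey`.

```rust
use std::collections::BTreeMap;

// Hypothetical 6-byte key: big-endian document id, then big-endian field id.
fn key(docid: u32, field: u16) -> [u8; 6] {
    let mut k = [0u8; 6];
    k[..4].copy_from_slice(&docid.to_be_bytes());
    k[4..].copy_from_slice(&field.to_be_bytes());
    k
}

// A BTreeMap iterates its keys in byte order, like the underlying store.
pub struct Store {
    entries: BTreeMap<[u8; 6], Vec<u8>>,
}

impl Store {
    pub fn new() -> Self {
        Store { entries: BTreeMap::new() }
    }

    pub fn put(&mut self, docid: u32, field: u16, value: &[u8]) {
        self.entries.insert(key(docid, field), value.to_vec());
    }

    // All fields of one document, in ascending field id order.
    pub fn document_fields(&self, docid: u32) -> Vec<(u16, &[u8])> {
        let start = key(docid, u16::MIN);
        let end = key(docid, u16::MAX);
        self.entries
            .range(start..=end)
            .map(|(k, v)| (u16::from_be_bytes([k[4], k[5]]), v.as_slice()))
            .collect()
    }

    // Removes every field of one document and returns how many were removed.
    pub fn del_all_document_fields(&mut self, docid: u32) -> usize {
        let keys: Vec<[u8; 6]> = self
            .entries
            .range(key(docid, u16::MIN)..=key(docid, u16::MAX))
            .map(|(k, _)| *k)
            .collect();
        for k in &keys {
            self.entries.remove(k);
        }
        keys.len()
    }
}
```

With little-endian keys, field 300 (bytes `2C 01`) would sort after field 2 of the next document in some cases, and the range scan would return entries belonging to other documents.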


@ -1,143 +0,0 @@
use super::DocumentFieldIndexedKey;
use crate::database::MainT;
use crate::DocumentId;
use heed::types::OwnedType;
use heed::Result as ZResult;
use meilisearch_schema::IndexedPos;
use crate::MResult;
#[derive(Copy, Clone)]
pub struct DocumentsFieldsCounts {
pub(crate) documents_fields_counts: heed::Database<OwnedType<DocumentFieldIndexedKey>, OwnedType<u16>>,
}
impl DocumentsFieldsCounts {
pub fn put_document_field_count(
self,
writer: &mut heed::RwTxn<MainT>,
document_id: DocumentId,
attribute: IndexedPos,
value: u16,
) -> ZResult<()> {
let key = DocumentFieldIndexedKey::new(document_id, attribute);
self.documents_fields_counts.put(writer, &key, &value)
}
pub fn del_all_document_fields_counts(
self,
writer: &mut heed::RwTxn<MainT>,
document_id: DocumentId,
) -> ZResult<usize> {
let start = DocumentFieldIndexedKey::new(document_id, IndexedPos::min());
let end = DocumentFieldIndexedKey::new(document_id, IndexedPos::max());
self.documents_fields_counts.delete_range(writer, &(start..=end))
}
pub fn clear(self, writer: &mut heed::RwTxn<MainT>) -> ZResult<()> {
self.documents_fields_counts.clear(writer)
}
pub fn document_field_count(
self,
reader: &heed::RoTxn<MainT>,
document_id: DocumentId,
attribute: IndexedPos,
) -> ZResult<Option<u16>> {
let key = DocumentFieldIndexedKey::new(document_id, attribute);
self.documents_fields_counts.get(reader, &key)
}
pub fn document_fields_counts<'txn>(
self,
reader: &'txn heed::RoTxn<MainT>,
document_id: DocumentId,
) -> ZResult<DocumentFieldsCountsIter<'txn>> {
let start = DocumentFieldIndexedKey::new(document_id, IndexedPos::min());
let end = DocumentFieldIndexedKey::new(document_id, IndexedPos::max());
let iter = self.documents_fields_counts.range(reader, &(start..=end))?;
Ok(DocumentFieldsCountsIter { iter })
}
pub fn documents_ids<'txn>(self, reader: &'txn heed::RoTxn<MainT>) -> MResult<DocumentsIdsIter<'txn>> {
let iter = self.documents_fields_counts.iter(reader)?;
Ok(DocumentsIdsIter {
last_seen_id: None,
iter,
})
}
pub fn all_documents_fields_counts<'txn>(
self,
reader: &'txn heed::RoTxn<MainT>,
) -> ZResult<AllDocumentsFieldsCountsIter<'txn>> {
let iter = self.documents_fields_counts.iter(reader)?;
Ok(AllDocumentsFieldsCountsIter { iter })
}
}
pub struct DocumentFieldsCountsIter<'txn> {
iter: heed::RoRange<'txn, OwnedType<DocumentFieldIndexedKey>, OwnedType<u16>>,
}
impl Iterator for DocumentFieldsCountsIter<'_> {
type Item = ZResult<(IndexedPos, u16)>;
fn next(&mut self) -> Option<Self::Item> {
match self.iter.next() {
Some(Ok((key, count))) => {
let indexed_pos = IndexedPos(key.indexed_pos.get());
Some(Ok((indexed_pos, count)))
}
Some(Err(e)) => Some(Err(e)),
None => None,
}
}
}
pub struct DocumentsIdsIter<'txn> {
last_seen_id: Option<DocumentId>,
iter: heed::RoIter<'txn, OwnedType<DocumentFieldIndexedKey>, OwnedType<u16>>,
}
impl Iterator for DocumentsIdsIter<'_> {
type Item = MResult<DocumentId>;
fn next(&mut self) -> Option<Self::Item> {
for result in &mut self.iter {
match result {
Ok((key, _)) => {
let document_id = DocumentId(key.docid.get());
if Some(document_id) != self.last_seen_id {
self.last_seen_id = Some(document_id);
return Some(Ok(document_id));
}
}
Err(e) => return Some(Err(e.into())),
}
}
None
}
}
pub struct AllDocumentsFieldsCountsIter<'txn> {
iter: heed::RoIter<'txn, OwnedType<DocumentFieldIndexedKey>, OwnedType<u16>>,
}
impl Iterator for AllDocumentsFieldsCountsIter<'_> {
type Item = ZResult<(DocumentId, IndexedPos, u16)>;
fn next(&mut self) -> Option<Self::Item> {
match self.iter.next() {
Some(Ok((key, count))) => {
let docid = DocumentId(key.docid.get());
let indexed_pos = IndexedPos(key.indexed_pos.get());
Some(Ok((docid, indexed_pos, count)))
}
Some(Err(e)) => Some(Err(e)),
None => None,
}
}
}
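
`DocumentsIdsIter` above walks every `(docid, indexed_pos)` key and yields each document id once by remembering the last id it returned. This is correct only because keys arrive sorted, so all entries of a document are adjacent. A minimal sketch of that technique over plain tuples follows; `DistinctIds` is a hypothetical name and errors are left out for brevity.

```rust
// Yields each document id once from an iterator of (docid, position)
// pairs that is sorted by docid.
pub struct DistinctIds<I> {
    last_seen: Option<u32>,
    iter: I,
}

impl<I> DistinctIds<I>
where
    I: Iterator<Item = (u32, u16)>,
{
    pub fn new(iter: I) -> Self {
        DistinctIds { last_seen: None, iter }
    }
}

impl<I> Iterator for DistinctIds<I>
where
    I: Iterator<Item = (u32, u16)>,
{
    type Item = u32;

    fn next(&mut self) -> Option<u32> {
        // Skip entries until the document id changes.
        for (docid, _position) in &mut self.iter {
            if Some(docid) != self.last_seen {
                self.last_seen = Some(docid);
                return Some(docid);
            }
        }
        None
    }
}
```

The iterator keeps a single `Option<u32>` of state, so it needs no set or allocation regardless of how many entries each document has.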


@ -1,75 +0,0 @@
use std::borrow::Cow;
use heed::{BytesDecode, BytesEncode};
use sdset::Set;
use crate::DocumentId;
use super::cow_set::CowSet;
pub struct DocumentsIds;
impl BytesEncode<'_> for DocumentsIds {
type EItem = Set<DocumentId>;
fn bytes_encode(item: &Self::EItem) -> Option<Cow<[u8]>> {
CowSet::bytes_encode(item)
}
}
impl<'a> BytesDecode<'a> for DocumentsIds {
type DItem = Cow<'a, Set<DocumentId>>;
fn bytes_decode(bytes: &'a [u8]) -> Option<Self::DItem> {
CowSet::bytes_decode(bytes)
}
}
pub struct DiscoverIds<'a> {
ids_iter: std::slice::Iter<'a, DocumentId>,
left_id: Option<u32>,
right_id: Option<u32>,
available_range: std::ops::Range<u32>,
}
impl DiscoverIds<'_> {
pub fn new(ids: &Set<DocumentId>) -> DiscoverIds {
let mut ids_iter = ids.iter();
let right_id = ids_iter.next().map(|id| id.0);
let available_range = 0..right_id.unwrap_or(u32::max_value());
DiscoverIds { ids_iter, left_id: None, right_id, available_range }
}
}
impl Iterator for DiscoverIds<'_> {
type Item = DocumentId;
fn next(&mut self) -> Option<Self::Item> {
loop {
match self.available_range.next() {
// The available range gives us a new id, we return it.
Some(id) => return Some(DocumentId(id)),
// The available range is exhausted, we need to find the next one.
None if self.available_range.end == u32::max_value() => return None,
None => loop {
self.left_id = self.right_id.take();
self.right_id = self.ids_iter.next().map(|id| id.0);
match (self.left_id, self.right_id) {
// We found a gap in the used ids, we can yield all ids
// until the end of the gap
(Some(l), Some(r)) => if l.saturating_add(1) != r {
self.available_range = (l + 1)..r;
break;
},
// The last used id has been reached, we can use all ids
// until u32 MAX
(Some(l), None) => {
self.available_range = l.saturating_add(1)..u32::max_value();
break;
},
_ => (),
}
},
}
}
}
}
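
`DiscoverIds` above yields the document ids that are not yet in use by walking the gaps between consecutive used ids. The sketch below reaches the same result with a simpler strategy, so it is a different technique rather than a transcription: it counts candidates upward and skips those found in the used list. `FreeIds` is a hypothetical name, the input is assumed to be sorted and free of duplicates, and `u32::MAX` is treated as an exclusive upper bound, as in the original ranges.

```rust
use std::iter::{Copied, Peekable};
use std::slice::Iter;

pub struct FreeIds<'a> {
    used: Peekable<Copied<Iter<'a, u32>>>,
    candidate: u32,
}

impl<'a> FreeIds<'a> {
    /// `used` must be sorted in ascending order and contain no duplicates.
    pub fn new(used: &'a [u32]) -> Self {
        FreeIds { used: used.iter().copied().peekable(), candidate: 0 }
    }
}

impl Iterator for FreeIds<'_> {
    type Item = u32;

    fn next(&mut self) -> Option<u32> {
        // `u32::MAX` itself is never yielded.
        while self.candidate < u32::MAX {
            let id = self.candidate;
            self.candidate += 1;
            // If the next used id equals the candidate, consume it and move on;
            // otherwise the candidate is free.
            if self.used.next_if_eq(&id).is_none() {
                return Some(id);
            }
        }
        None
    }
}
```

This version inspects every candidate individually, while the original jumps from gap to gap through `available_range`; the trade-off is simplicity against a little extra work when long runs of ids are in use.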


@ -1,97 +0,0 @@
use std::borrow::Cow;
use std::collections::HashMap;
use std::mem;
use heed::{RwTxn, RoTxn, RoPrefix, types::Str, BytesEncode, BytesDecode};
use sdset::{SetBuf, Set, SetOperation};
use meilisearch_types::DocumentId;
use meilisearch_schema::FieldId;
use crate::MResult;
use crate::database::MainT;
use crate::facets::FacetKey;
use super::cow_set::CowSet;
/// Stores, for each facet key, the facet value and the set of ids of the documents that have it.
#[derive(Clone, Copy)]
pub struct Facets {
pub(crate) facets: heed::Database<FacetKey, FacetData>,
}
pub struct FacetData;
impl<'a> BytesEncode<'a> for FacetData {
type EItem = (&'a str, &'a Set<DocumentId>);
fn bytes_encode(item: &'a Self::EItem) -> Option<Cow<'a, [u8]>> {
// get size of the first item
let first_size = item.0.as_bytes().len();
let size = mem::size_of::<u64>()
+ first_size
+ item.1.len() * mem::size_of::<DocumentId>();
let mut buffer = Vec::with_capacity(size);
// encode the length of the first item as a big-endian usize
// (8 bytes on 64-bit targets, matching the u64-sized prefix reserved above)
buffer.extend_from_slice(&first_size.to_be_bytes());
buffer.extend_from_slice(Str::bytes_encode(&item.0)?.as_ref());
let second_slice = CowSet::bytes_encode(&item.1)?;
buffer.extend_from_slice(second_slice.as_ref());
Some(Cow::Owned(buffer))
}
}
impl<'a> BytesDecode<'a> for FacetData {
type DItem = (&'a str, Cow<'a, Set<DocumentId>>);
fn bytes_decode(bytes: &'a [u8]) -> Option<Self::DItem> {
const LEN: usize = mem::size_of::<u64>();
let mut size_buf = [0; LEN];
size_buf.copy_from_slice(bytes.get(0..LEN)?);
// decode size of the first item from the bytes
let first_size = usize::from_be_bytes(size_buf);
// decode first and second items
let first_item = Str::bytes_decode(bytes.get(LEN..(LEN + first_size))?)?;
let second_item = CowSet::bytes_decode(bytes.get((LEN + first_size)..)?)?;
Some((first_item, second_item))
}
}
impl Facets {
// We take an sdset::Set to ensure the docids are sorted.
pub fn put_facet_document_ids(&self, writer: &mut RwTxn<MainT>, facet_key: FacetKey, doc_ids: &Set<DocumentId>, facet_value: &str) -> MResult<()> {
Ok(self.facets.put(writer, &facet_key, &(facet_value, doc_ids))?)
}
pub fn field_document_ids<'txn>(&self, reader: &'txn RoTxn<MainT>, field_id: FieldId) -> MResult<RoPrefix<'txn, FacetKey, FacetData>> {
Ok(self.facets.prefix_iter(reader, &FacetKey::new(field_id, String::new()))?)
}
pub fn facet_document_ids<'txn>(&self, reader: &'txn RoTxn<MainT>, facet_key: &FacetKey) -> MResult<Option<(&'txn str,Cow<'txn, Set<DocumentId>>)>> {
Ok(self.facets.get(reader, &facet_key)?)
}
/// Updates the facets store, removing the documents from the facets provided in the
/// `facet_map` argument.
pub fn remove(&self, writer: &mut RwTxn<MainT>, facet_map: HashMap<FacetKey, (String, Vec<DocumentId>)>) -> MResult<()> {
for (key, (name, document_ids)) in facet_map {
if let Some((_, old)) = self.facets.get(writer, &key)? {
let to_remove = SetBuf::from_dirty(document_ids);
let new = sdset::duo::OpBuilder::new(old.as_ref(), to_remove.as_set()).difference().into_set_buf();
self.facets.put(writer, &key, &(&name, new.as_set()))?;
}
}
Ok(())
}
pub fn add(&self, writer: &mut RwTxn<MainT>, facet_map: HashMap<FacetKey, (String, Vec<DocumentId>)>) -> MResult<()> {
for (key, (facet_name, document_ids)) in facet_map {
let set = SetBuf::from_dirty(document_ids);
self.put_facet_document_ids(writer, key, set.as_set(), &facet_name)?;
}
Ok(())
}
pub fn clear(self, writer: &mut heed::RwTxn<MainT>) -> MResult<()> {
Ok(self.facets.clear(writer)?)
}
}
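
`FacetData` above packs two items into one value: a length prefix, the facet string, and then the document ids. A standalone sketch of that length-prefixed layout follows. It rests on assumptions: `encode` and `decode` are hypothetical names, the prefix is fixed at 8 bytes (`u64`) on every target, and ids are written as big-endian `u32` and copied on decode, whereas the original reinterprets the stored bytes through `CowSet` without copying.

```rust
use std::convert::TryFrom;

const LEN: usize = 8; // size of the length prefix, in bytes

pub fn encode(name: &str, ids: &[u32]) -> Vec<u8> {
    let mut buf = Vec::with_capacity(LEN + name.len() + ids.len() * 4);
    // Length of the string, so the decoder knows where the ids start.
    buf.extend_from_slice(&(name.len() as u64).to_be_bytes());
    buf.extend_from_slice(name.as_bytes());
    for id in ids {
        buf.extend_from_slice(&id.to_be_bytes());
    }
    buf
}

// Returns `None` on any malformed input instead of panicking.
pub fn decode(bytes: &[u8]) -> Option<(&str, Vec<u32>)> {
    let mut size_buf = [0u8; LEN];
    size_buf.copy_from_slice(bytes.get(..LEN)?);
    let name_len = usize::try_from(u64::from_be_bytes(size_buf)).ok()?;
    let name_end = LEN.checked_add(name_len)?;

    let name = std::str::from_utf8(bytes.get(LEN..name_end)?).ok()?;
    let rest = bytes.get(name_end..)?;
    if rest.len() % 4 != 0 {
        return None; // trailing bytes do not form whole ids
    }
    let ids = rest
        .chunks_exact(4)
        .map(|c| u32::from_be_bytes([c[0], c[1], c[2], c[3]]))
        .collect();
    Some((name, ids))
}
```

Writing the prefix as an explicit `u64` keeps the stored format identical on 32-bit and 64-bit targets, which a `usize` prefix does not guarantee.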


@ -1,320 +0,0 @@
use std::borrow::Cow;
use std::collections::BTreeMap;
use chrono::{DateTime, Utc};
use heed::types::{ByteSlice, OwnedType, SerdeBincode, Str, CowSlice};
use meilisearch_schema::{FieldId, Schema};
use meilisearch_types::DocumentId;
use sdset::Set;
use crate::database::MainT;
use crate::{RankedMap, MResult};
use crate::settings::RankingRule;
use crate::{FstSetCow, FstMapCow};
use super::{CowSet, DocumentsIds};
const ATTRIBUTES_FOR_FACETING_KEY: &str = "attributes-for-faceting";
const CREATED_AT_KEY: &str = "created-at";
const CUSTOMS_KEY: &str = "customs";
const DISTINCT_ATTRIBUTE_KEY: &str = "distinct-attribute";
const EXTERNAL_DOCIDS_KEY: &str = "external-docids";
const FIELDS_DISTRIBUTION_KEY: &str = "fields-distribution";
const INTERNAL_DOCIDS_KEY: &str = "internal-docids";
const NAME_KEY: &str = "name";
const NUMBER_OF_DOCUMENTS_KEY: &str = "number-of-documents";
const RANKED_MAP_KEY: &str = "ranked-map";
const RANKING_RULES_KEY: &str = "ranking-rules";
const SCHEMA_KEY: &str = "schema";
const SORTED_DOCUMENT_IDS_CACHE_KEY: &str = "sorted-document-ids-cache";
const STOP_WORDS_KEY: &str = "stop-words";
const SYNONYMS_KEY: &str = "synonyms";
const UPDATED_AT_KEY: &str = "updated-at";
const WORDS_KEY: &str = "words";
pub type FreqsMap = BTreeMap<String, usize>;
type SerdeFreqsMap = SerdeBincode<FreqsMap>;
type SerdeDatetime = SerdeBincode<DateTime<Utc>>;
#[derive(Copy, Clone)]
pub struct Main {
pub(crate) main: heed::PolyDatabase,
}
impl Main {
pub fn clear(self, writer: &mut heed::RwTxn<MainT>) -> MResult<()> {
Ok(self.main.clear(writer)?)
}
pub fn put_name(self, writer: &mut heed::RwTxn<MainT>, name: &str) -> MResult<()> {
Ok(self.main.put::<_, Str, Str>(writer, NAME_KEY, name)?)
}
pub fn name(self, reader: &heed::RoTxn<MainT>) -> MResult<Option<String>> {
Ok(self
.main
.get::<_, Str, Str>(reader, NAME_KEY)?
.map(|name| name.to_owned()))
}
pub fn put_created_at(self, writer: &mut heed::RwTxn<MainT>) -> MResult<()> {
Ok(self.main.put::<_, Str, SerdeDatetime>(writer, CREATED_AT_KEY, &Utc::now())?)
}
pub fn created_at(self, reader: &heed::RoTxn<MainT>) -> MResult<Option<DateTime<Utc>>> {
Ok(self.main.get::<_, Str, SerdeDatetime>(reader, CREATED_AT_KEY)?)
}
pub fn put_updated_at(self, writer: &mut heed::RwTxn<MainT>) -> MResult<()> {
Ok(self.main.put::<_, Str, SerdeDatetime>(writer, UPDATED_AT_KEY, &Utc::now())?)
}
pub fn updated_at(self, reader: &heed::RoTxn<MainT>) -> MResult<Option<DateTime<Utc>>> {
Ok(self.main.get::<_, Str, SerdeDatetime>(reader, UPDATED_AT_KEY)?)
}
pub fn put_internal_docids(self, writer: &mut heed::RwTxn<MainT>, ids: &sdset::Set<DocumentId>) -> MResult<()> {
Ok(self.main.put::<_, Str, DocumentsIds>(writer, INTERNAL_DOCIDS_KEY, ids)?)
}
pub fn internal_docids<'txn>(self, reader: &'txn heed::RoTxn<MainT>) -> MResult<Cow<'txn, sdset::Set<DocumentId>>> {
match self.main.get::<_, Str, DocumentsIds>(reader, INTERNAL_DOCIDS_KEY)? {
Some(ids) => Ok(ids),
None => Ok(Cow::default()),
}
}
pub fn merge_internal_docids(self, writer: &mut heed::RwTxn<MainT>, new_ids: &sdset::Set<DocumentId>) -> MResult<()> {
use sdset::SetOperation;
// We do a union of the old and new internal ids.
let internal_docids = self.internal_docids(writer)?;
let internal_docids = sdset::duo::Union::new(&internal_docids, new_ids).into_set_buf();
Ok(self.put_internal_docids(writer, &internal_docids)?)
}
pub fn remove_internal_docids(self, writer: &mut heed::RwTxn<MainT>, ids: &sdset::Set<DocumentId>) -> MResult<()> {
use sdset::SetOperation;
// We do a difference of the old and new internal ids.
let internal_docids = self.internal_docids(writer)?;
let internal_docids = sdset::duo::Difference::new(&internal_docids, ids).into_set_buf();
Ok(self.put_internal_docids(writer, &internal_docids)?)
}
pub fn put_external_docids<A>(self, writer: &mut heed::RwTxn<MainT>, ids: &fst::Map<A>) -> MResult<()>
where A: AsRef<[u8]>,
{
Ok(self.main.put::<_, Str, ByteSlice>(writer, EXTERNAL_DOCIDS_KEY, ids.as_fst().as_bytes())?)
}
pub fn merge_external_docids<A>(self, writer: &mut heed::RwTxn<MainT>, new_docids: &fst::Map<A>) -> MResult<()>
where A: AsRef<[u8]>,
{
use fst::{Streamer, IntoStreamer};
// Do a union of the old and the new set of external docids.
let external_docids = self.external_docids(writer)?;
let mut op = external_docids.op().add(new_docids.into_stream()).r#union();
let mut build = fst::MapBuilder::memory();
while let Some((docid, values)) = op.next() {
build.insert(docid, values[0].value).unwrap();
}
drop(op);
let external_docids = build.into_map();
Ok(self.put_external_docids(writer, &external_docids)?)
}
pub fn remove_external_docids<A>(self, writer: &mut heed::RwTxn<MainT>, ids: &fst::Map<A>) -> MResult<()>
where A: AsRef<[u8]>,
{
use fst::{Streamer, IntoStreamer};
// Do a difference between the old set of external docids and the ids to remove.
let external_docids = self.external_docids(writer)?;
let mut op = external_docids.op().add(ids.into_stream()).difference();
let mut build = fst::MapBuilder::memory();
while let Some((docid, values)) = op.next() {
build.insert(docid, values[0].value).unwrap();
}
drop(op);
let external_docids = build.into_map();
self.put_external_docids(writer, &external_docids)
}
pub fn external_docids<'a>(self, reader: &'a heed::RoTxn<'a, MainT>) -> MResult<FstMapCow> {
match self.main.get::<_, Str, ByteSlice>(reader, EXTERNAL_DOCIDS_KEY)? {
Some(bytes) => Ok(fst::Map::new(bytes).unwrap().map_data(Cow::Borrowed).unwrap()),
None => Ok(fst::Map::default().map_data(Cow::Owned).unwrap()),
}
}
pub fn external_to_internal_docid(self, reader: &heed::RoTxn<MainT>, external_docid: &str) -> MResult<Option<DocumentId>> {
let external_ids = self.external_docids(reader)?;
Ok(external_ids.get(external_docid).map(|id| DocumentId(id as u32)))
}
pub fn words_fst<'a>(self, reader: &'a heed::RoTxn<'a, MainT>) -> MResult<FstSetCow> {
match self.main.get::<_, Str, ByteSlice>(reader, WORDS_KEY)? {
Some(bytes) => Ok(fst::Set::new(bytes).unwrap().map_data(Cow::Borrowed).unwrap()),
None => Ok(fst::Set::default().map_data(Cow::Owned).unwrap()),
}
}
pub fn put_words_fst<A: AsRef<[u8]>>(self, writer: &mut heed::RwTxn<MainT>, fst: &fst::Set<A>) -> MResult<()> {
Ok(self.main.put::<_, Str, ByteSlice>(writer, WORDS_KEY, fst.as_fst().as_bytes())?)
}
pub fn put_sorted_document_ids_cache(self, writer: &mut heed::RwTxn<MainT>, documents_ids: &[DocumentId]) -> MResult<()> {
Ok(self.main.put::<_, Str, CowSlice<DocumentId>>(writer, SORTED_DOCUMENT_IDS_CACHE_KEY, documents_ids)?)
}
pub fn sorted_document_ids_cache<'a>(self, reader: &'a heed::RoTxn<'a, MainT>) -> MResult<Option<Cow<[DocumentId]>>> {
Ok(self.main.get::<_, Str, CowSlice<DocumentId>>(reader, SORTED_DOCUMENT_IDS_CACHE_KEY)?)
}
pub fn put_schema(self, writer: &mut heed::RwTxn<MainT>, schema: &Schema) -> MResult<()> {
Ok(self.main.put::<_, Str, SerdeBincode<Schema>>(writer, SCHEMA_KEY, schema)?)
}
pub fn schema(self, reader: &heed::RoTxn<MainT>) -> MResult<Option<Schema>> {
Ok(self.main.get::<_, Str, SerdeBincode<Schema>>(reader, SCHEMA_KEY)?)
}
pub fn delete_schema(self, writer: &mut heed::RwTxn<MainT>) -> MResult<bool> {
Ok(self.main.delete::<_, Str>(writer, SCHEMA_KEY)?)
}
pub fn put_ranked_map(self, writer: &mut heed::RwTxn<MainT>, ranked_map: &RankedMap) -> MResult<()> {
Ok(self.main.put::<_, Str, SerdeBincode<RankedMap>>(writer, RANKED_MAP_KEY, &ranked_map)?)
}
pub fn ranked_map(self, reader: &heed::RoTxn<MainT>) -> MResult<Option<RankedMap>> {
Ok(self.main.get::<_, Str, SerdeBincode<RankedMap>>(reader, RANKED_MAP_KEY)?)
}
pub fn put_synonyms_fst<A: AsRef<[u8]>>(self, writer: &mut heed::RwTxn<MainT>, fst: &fst::Set<A>) -> MResult<()> {
let bytes = fst.as_fst().as_bytes();
Ok(self.main.put::<_, Str, ByteSlice>(writer, SYNONYMS_KEY, bytes)?)
}
pub(crate) fn synonyms_fst<'a>(self, reader: &'a heed::RoTxn<'a, MainT>) -> MResult<FstSetCow> {
match self.main.get::<_, Str, ByteSlice>(reader, SYNONYMS_KEY)? {
Some(bytes) => Ok(fst::Set::new(bytes).unwrap().map_data(Cow::Borrowed).unwrap()),
None => Ok(fst::Set::default().map_data(Cow::Owned).unwrap()),
}
}
pub fn synonyms(self, reader: &heed::RoTxn<MainT>) -> MResult<Vec<String>> {
let synonyms = self
.synonyms_fst(&reader)?
.stream()
.into_strs()?;
Ok(synonyms)
}
pub fn put_stop_words_fst<A: AsRef<[u8]>>(self, writer: &mut heed::RwTxn<MainT>, fst: &fst::Set<A>) -> MResult<()> {
let bytes = fst.as_fst().as_bytes();
Ok(self.main.put::<_, Str, ByteSlice>(writer, STOP_WORDS_KEY, bytes)?)
}
pub(crate) fn stop_words_fst<'a>(self, reader: &'a heed::RoTxn<'a, MainT>) -> MResult<FstSetCow> {
match self.main.get::<_, Str, ByteSlice>(reader, STOP_WORDS_KEY)? {
Some(bytes) => Ok(fst::Set::new(bytes).unwrap().map_data(Cow::Borrowed).unwrap()),
None => Ok(fst::Set::default().map_data(Cow::Owned).unwrap()),
}
}
pub fn stop_words(self, reader: &heed::RoTxn<MainT>) -> MResult<Vec<String>> {
let stop_word_list = self
.stop_words_fst(reader)?
.stream()
.into_strs()?;
Ok(stop_word_list)
}
pub fn put_number_of_documents<F>(self, writer: &mut heed::RwTxn<MainT>, f: F) -> MResult<u64>
where
F: Fn(u64) -> u64,
{
let new = self.number_of_documents(&*writer).map(f)?;
self.main
.put::<_, Str, OwnedType<u64>>(writer, NUMBER_OF_DOCUMENTS_KEY, &new)?;
Ok(new)
}
pub fn number_of_documents(self, reader: &heed::RoTxn<MainT>) -> MResult<u64> {
match self
.main
.get::<_, Str, OwnedType<u64>>(reader, NUMBER_OF_DOCUMENTS_KEY)? {
Some(value) => Ok(value),
None => Ok(0),
}
}
pub fn put_fields_distribution(
self,
writer: &mut heed::RwTxn<MainT>,
fields_frequency: &FreqsMap,
) -> MResult<()> {
Ok(self.main.put::<_, Str, SerdeFreqsMap>(writer, FIELDS_DISTRIBUTION_KEY, fields_frequency)?)
}
pub fn fields_distribution(&self, reader: &heed::RoTxn<MainT>) -> MResult<Option<FreqsMap>> {
Ok(self.main.get::<_, Str, SerdeFreqsMap>(reader, FIELDS_DISTRIBUTION_KEY)?)
pub fn attributes_for_faceting<'txn>(&self, reader: &'txn heed::RoTxn<MainT>) -> MResult<Option<Cow<'txn, Set<FieldId>>>> {
Ok(self.main.get::<_, Str, CowSet<FieldId>>(reader, ATTRIBUTES_FOR_FACETING_KEY)?)
}
pub fn put_attributes_for_faceting(self, writer: &mut heed::RwTxn<MainT>, attributes: &Set<FieldId>) -> MResult<()> {
Ok(self.main.put::<_, Str, CowSet<FieldId>>(writer, ATTRIBUTES_FOR_FACETING_KEY, attributes)?)
}
pub fn delete_attributes_for_faceting(self, writer: &mut heed::RwTxn<MainT>) -> MResult<bool> {
Ok(self.main.delete::<_, Str>(writer, ATTRIBUTES_FOR_FACETING_KEY)?)
}
pub fn ranking_rules(&self, reader: &heed::RoTxn<MainT>) -> MResult<Option<Vec<RankingRule>>> {
Ok(self.main.get::<_, Str, SerdeBincode<Vec<RankingRule>>>(reader, RANKING_RULES_KEY)?)
}
pub fn put_ranking_rules(self, writer: &mut heed::RwTxn<MainT>, value: &[RankingRule]) -> MResult<()> {
Ok(self.main.put::<_, Str, SerdeBincode<Vec<RankingRule>>>(writer, RANKING_RULES_KEY, &value.to_vec())?)
}
pub fn delete_ranking_rules(self, writer: &mut heed::RwTxn<MainT>) -> MResult<bool> {
Ok(self.main.delete::<_, Str>(writer, RANKING_RULES_KEY)?)
}
pub fn distinct_attribute(&self, reader: &heed::RoTxn<MainT>) -> MResult<Option<FieldId>> {
match self.main.get::<_, Str, OwnedType<u16>>(reader, DISTINCT_ATTRIBUTE_KEY)? {
Some(value) => Ok(Some(FieldId(value.to_owned()))),
None => Ok(None),
}
}
pub fn put_distinct_attribute(self, writer: &mut heed::RwTxn<MainT>, value: FieldId) -> MResult<()> {
Ok(self.main.put::<_, Str, OwnedType<u16>>(writer, DISTINCT_ATTRIBUTE_KEY, &value.0)?)
}
pub fn delete_distinct_attribute(self, writer: &mut heed::RwTxn<MainT>) -> MResult<bool> {
Ok(self.main.delete::<_, Str>(writer, DISTINCT_ATTRIBUTE_KEY)?)
}
pub fn put_customs(self, writer: &mut heed::RwTxn<MainT>, customs: &[u8]) -> MResult<()> {
Ok(self.main.put::<_, Str, ByteSlice>(writer, CUSTOMS_KEY, customs)?)
}
pub fn customs<'txn>(self, reader: &'txn heed::RoTxn<MainT>) -> MResult<Option<&'txn [u8]>> {
Ok(self.main.get::<_, Str, ByteSlice>(reader, CUSTOMS_KEY)?)
}
}
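
`merge_internal_docids` and `remove_internal_docids` above combine sorted id sets through `sdset` union and difference operations. The sketch below shows what those two operations compute, written as plain linear merges over slices. The `union` and `difference` functions are illustrative stand-ins, not the `sdset` API, and both inputs are assumed to be sorted and free of duplicates.

```rust
use std::cmp::Ordering;

// Sorted union of two sorted, deduplicated slices.
pub fn union(a: &[u32], b: &[u32]) -> Vec<u32> {
    let (mut i, mut j) = (0, 0);
    let mut out = Vec::with_capacity(a.len() + b.len());
    while i < a.len() && j < b.len() {
        match a[i].cmp(&b[j]) {
            Ordering::Less => {
                out.push(a[i]);
                i += 1;
            }
            Ordering::Greater => {
                out.push(b[j]);
                j += 1;
            }
            Ordering::Equal => {
                // Present in both: keep a single copy.
                out.push(a[i]);
                i += 1;
                j += 1;
            }
        }
    }
    out.extend_from_slice(&a[i..]);
    out.extend_from_slice(&b[j..]);
    out
}

// Elements of `a` that are not in `b`, both sorted and deduplicated.
pub fn difference(a: &[u32], b: &[u32]) -> Vec<u32> {
    let mut j = 0;
    let mut out = Vec::new();
    for &x in a {
        // Advance in `b` past everything smaller than `x`.
        while j < b.len() && b[j] < x {
            j += 1;
        }
        if j == b.len() || b[j] != x {
            out.push(x);
        }
    }
    out
}
```

Both functions run in time linear in the combined input length and produce sorted output, which is why the stored id set can be written back directly after each merge or removal.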


@ -1,522 +0,0 @@
mod cow_set;
mod docs_words;
mod documents_ids;
mod documents_fields;
mod documents_fields_counts;
mod facets;
mod main;
mod postings_lists;
mod prefix_documents_cache;
mod prefix_postings_lists_cache;
mod synonyms;
mod updates;
mod updates_results;
pub use self::cow_set::CowSet;
pub use self::docs_words::DocsWords;
pub use self::documents_fields::{DocumentFieldsIter, DocumentsFields};
pub use self::documents_fields_counts::{DocumentFieldsCountsIter, DocumentsFieldsCounts, DocumentsIdsIter};
pub use self::documents_ids::{DocumentsIds, DiscoverIds};
pub use self::facets::Facets;
pub use self::main::Main;
pub use self::postings_lists::PostingsLists;
pub use self::prefix_documents_cache::PrefixDocumentsCache;
pub use self::prefix_postings_lists_cache::PrefixPostingsListsCache;
pub use self::synonyms::Synonyms;
pub use self::updates::Updates;
pub use self::updates_results::UpdatesResults;
use std::borrow::Cow;
use std::collections::HashSet;
use std::convert::TryInto;
use std::{mem, ptr};
use heed::{BytesEncode, BytesDecode};
use meilisearch_schema::{IndexedPos, FieldId};
use sdset::{Set, SetBuf};
use serde::de::{self, Deserialize};
use zerocopy::{AsBytes, FromBytes};
use crate::criterion::Criteria;
use crate::database::{MainT, UpdateT};
use crate::database::{UpdateEvent, UpdateEventsEmitter};
use crate::serde::Deserializer;
use crate::settings::SettingsUpdate;
use crate::{query_builder::QueryBuilder, update, DocIndex, DocumentId, Error, MResult};
type BEU32 = zerocopy::U32<byteorder::BigEndian>;
type BEU64 = zerocopy::U64<byteorder::BigEndian>;
pub type BEU16 = zerocopy::U16<byteorder::BigEndian>;
#[derive(Debug, Copy, Clone, AsBytes, FromBytes)]
#[repr(C)]
pub struct DocumentFieldIndexedKey {
docid: BEU32,
indexed_pos: BEU16,
}
impl DocumentFieldIndexedKey {
fn new(docid: DocumentId, indexed_pos: IndexedPos) -> DocumentFieldIndexedKey {
DocumentFieldIndexedKey {
docid: BEU32::new(docid.0),
indexed_pos: BEU16::new(indexed_pos.0),
}
}
}
#[derive(Debug, Copy, Clone, AsBytes, FromBytes)]
#[repr(C)]
pub struct DocumentFieldStoredKey {
docid: BEU32,
field_id: BEU16,
}
impl DocumentFieldStoredKey {
fn new(docid: DocumentId, field_id: FieldId) -> DocumentFieldStoredKey {
DocumentFieldStoredKey {
docid: BEU32::new(docid.0),
field_id: BEU16::new(field_id.0),
}
}
}
#[derive(Default, Debug)]
pub struct Postings<'a> {
pub docids: Cow<'a, Set<DocumentId>>,
pub matches: Cow<'a, Set<DocIndex>>,
}
pub struct PostingsCodec;
impl<'a> BytesEncode<'a> for PostingsCodec {
type EItem = Postings<'a>;
fn bytes_encode(item: &'a Self::EItem) -> Option<Cow<'a, [u8]>> {
let u64_size = mem::size_of::<u64>();
let docids_size = item.docids.len() * mem::size_of::<DocumentId>();
let matches_size = item.matches.len() * mem::size_of::<DocIndex>();
let mut buffer = Vec::with_capacity(u64_size + docids_size + matches_size);
let docids_len = item.docids.len() as u64;
buffer.extend_from_slice(&docids_len.to_be_bytes());
buffer.extend_from_slice(item.docids.as_bytes());
buffer.extend_from_slice(item.matches.as_bytes());
Some(Cow::Owned(buffer))
}
}
fn aligned_to(bytes: &[u8], align: usize) -> bool {
(bytes as *const _ as *const () as usize) % align == 0
}
fn from_bytes_to_set<'a, T: 'a>(bytes: &'a [u8]) -> Option<Cow<'a, Set<T>>>
where T: Clone + FromBytes
{
match zerocopy::LayoutVerified::<_, [T]>::new_slice(bytes) {
Some(layout) => Some(Cow::Borrowed(Set::new_unchecked(layout.into_slice()))),
None => {
let len = bytes.len();
let elem_size = mem::size_of::<T>();
// ensure that it is the alignment that is wrong
// and the length is valid
if len % elem_size == 0 && !aligned_to(bytes, mem::align_of::<T>()) {
let elems = len / elem_size;
let mut vec = Vec::<T>::with_capacity(elems);
unsafe {
let dst = vec.as_mut_ptr() as *mut u8;
ptr::copy_nonoverlapping(bytes.as_ptr(), dst, len);
vec.set_len(elems);
}
return Some(Cow::Owned(SetBuf::new_unchecked(vec)));
}
None
}
}
}
impl<'a> BytesDecode<'a> for PostingsCodec {
type DItem = Postings<'a>;
fn bytes_decode(bytes: &'a [u8]) -> Option<Self::DItem> {
let u64_size = mem::size_of::<u64>();
let docid_size = mem::size_of::<DocumentId>();
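// layout: 8-byte big-endian docids count, then the docids, then the matches;
// truncated input yields `None` instead of panicking on out-of-bounds slicing
let len_bytes = bytes.get(..u64_size)?;
let bytes = &bytes[u64_size..];
let docids_len = len_bytes.try_into().ok().map(u64::from_be_bytes)? as usize;
let docids_size = docids_len.checked_mul(docid_size)?;
let docids_bytes = bytes.get(..docids_size)?;
let matches_bytes = &bytes[docids_size..];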
let docids = from_bytes_to_set(docids_bytes)?;
let matches = from_bytes_to_set(matches_bytes)?;
Some(Postings { docids, matches })
}
}
fn main_name(name: &str) -> String {
format!("store-{}", name)
}
fn postings_lists_name(name: &str) -> String {
format!("store-{}-postings-lists", name)
}
fn documents_fields_name(name: &str) -> String {
format!("store-{}-documents-fields", name)
}
fn documents_fields_counts_name(name: &str) -> String {
format!("store-{}-documents-fields-counts", name)
}
fn synonyms_name(name: &str) -> String {
format!("store-{}-synonyms", name)
}
fn docs_words_name(name: &str) -> String {
format!("store-{}-docs-words", name)
}
fn prefix_documents_cache_name(name: &str) -> String {
format!("store-{}-prefix-documents-cache", name)
}
fn prefix_postings_lists_cache_name(name: &str) -> String {
format!("store-{}-prefix-postings-lists-cache", name)
}
fn updates_name(name: &str) -> String {
format!("store-{}-updates", name)
}
fn updates_results_name(name: &str) -> String {
format!("store-{}-updates-results", name)
}
fn facets_name(name: &str) -> String {
format!("store-{}-facets", name)
}
#[derive(Clone)]
pub struct Index {
pub main: Main,
pub postings_lists: PostingsLists,
pub documents_fields: DocumentsFields,
pub documents_fields_counts: DocumentsFieldsCounts,
pub facets: Facets,
pub synonyms: Synonyms,
pub docs_words: DocsWords,
pub prefix_documents_cache: PrefixDocumentsCache,
pub prefix_postings_lists_cache: PrefixPostingsListsCache,
pub updates: Updates,
pub updates_results: UpdatesResults,
pub(crate) updates_notifier: UpdateEventsEmitter,
}
impl Index {
pub fn document<T: de::DeserializeOwned>(
&self,
reader: &heed::RoTxn<MainT>,
attributes: Option<&HashSet<&str>>,
document_id: DocumentId,
) -> MResult<Option<T>> {
let schema = self.main.schema(reader)?;
let schema = schema.ok_or(Error::SchemaMissing)?;
let attributes = match attributes {
Some(attributes) => Some(attributes.iter().filter_map(|name| schema.id(*name)).collect()),
None => None,
};
let mut deserializer = Deserializer {
document_id,
reader,
documents_fields: self.documents_fields,
schema: &schema,
fields: attributes.as_ref(),
};
Ok(Option::<T>::deserialize(&mut deserializer)?)
}
pub fn document_attribute<T: de::DeserializeOwned>(
&self,
reader: &heed::RoTxn<MainT>,
document_id: DocumentId,
attribute: FieldId,
) -> MResult<Option<T>> {
let bytes = self
.documents_fields
.document_attribute(reader, document_id, attribute)?;
match bytes {
Some(bytes) => Ok(Some(serde_json::from_slice(bytes)?)),
None => Ok(None),
}
}
pub fn document_attribute_bytes<'txn>(
&self,
reader: &'txn heed::RoTxn<MainT>,
document_id: DocumentId,
attribute: FieldId,
) -> MResult<Option<&'txn [u8]>> {
let bytes = self
.documents_fields
.document_attribute(reader, document_id, attribute)?;
Ok(bytes)
}
pub fn customs_update(&self, writer: &mut heed::RwTxn<UpdateT>, customs: Vec<u8>) -> MResult<u64> {
let _ = self.updates_notifier.send(UpdateEvent::NewUpdate);
Ok(update::push_customs_update(writer, self.updates, self.updates_results, customs)?)
}
pub fn settings_update(&self, writer: &mut heed::RwTxn<UpdateT>, update: SettingsUpdate) -> MResult<u64> {
let _ = self.updates_notifier.send(UpdateEvent::NewUpdate);
Ok(update::push_settings_update(writer, self.updates, self.updates_results, update)?)
}
pub fn documents_addition<D>(&self) -> update::DocumentsAddition<D> {
update::DocumentsAddition::new(
self.updates,
self.updates_results,
self.updates_notifier.clone(),
)
}
pub fn documents_partial_addition<D>(&self) -> update::DocumentsAddition<D> {
update::DocumentsAddition::new_partial(
self.updates,
self.updates_results,
self.updates_notifier.clone(),
)
}
pub fn documents_deletion(&self) -> update::DocumentsDeletion {
update::DocumentsDeletion::new(
self.updates,
self.updates_results,
self.updates_notifier.clone(),
)
}
pub fn clear_all(&self, writer: &mut heed::RwTxn<UpdateT>) -> MResult<u64> {
let _ = self.updates_notifier.send(UpdateEvent::NewUpdate);
update::push_clear_all(writer, self.updates, self.updates_results)
}
pub fn current_update_id(&self, reader: &heed::RoTxn<UpdateT>) -> MResult<Option<u64>> {
match self.updates.last_update(reader)? {
Some((id, _)) => Ok(Some(id)),
None => Ok(None),
}
}
pub fn update_status(
&self,
reader: &heed::RoTxn<UpdateT>,
update_id: u64,
) -> MResult<Option<update::UpdateStatus>> {
update::update_status(reader, self.updates, self.updates_results, update_id)
}
pub fn all_updates_status(&self, reader: &heed::RoTxn<UpdateT>) -> MResult<Vec<update::UpdateStatus>> {
let mut updates = Vec::new();
let mut last_update_result_id = 0;
// retrieve all updates results
if let Some((last_id, _)) = self.updates_results.last_update(reader)? {
updates.reserve(last_id as usize);
for id in 0..=last_id {
if let Some(update) = self.update_status(reader, id)? {
updates.push(update);
last_update_result_id = id + 1;
}
}
}
// retrieve all enqueued updates
if let Some((last_id, _)) = self.updates.last_update(reader)? {
for id in last_update_result_id..=last_id {
if let Some(update) = self.update_status(reader, id)? {
updates.push(update);
}
}
}
Ok(updates)
}
pub fn query_builder(&self) -> QueryBuilder {
QueryBuilder::new(self)
}
pub fn query_builder_with_criteria<'c, 'f, 'd, 'i>(
&'i self,
criteria: Criteria<'c>,
) -> QueryBuilder<'c, 'f, 'd, 'i> {
QueryBuilder::with_criteria(self, criteria)
}
}
pub fn create(
env: &heed::Env,
update_env: &heed::Env,
name: &str,
updates_notifier: UpdateEventsEmitter,
) -> MResult<Index> {
// create all the store names
let main_name = main_name(name);
let postings_lists_name = postings_lists_name(name);
let documents_fields_name = documents_fields_name(name);
let documents_fields_counts_name = documents_fields_counts_name(name);
let synonyms_name = synonyms_name(name);
let docs_words_name = docs_words_name(name);
let prefix_documents_cache_name = prefix_documents_cache_name(name);
let prefix_postings_lists_cache_name = prefix_postings_lists_cache_name(name);
let updates_name = updates_name(name);
let updates_results_name = updates_results_name(name);
let facets_name = facets_name(name);
// create all the stores (stores that already exist are opened)
let main = env.create_poly_database(Some(&main_name))?;
let postings_lists = env.create_database(Some(&postings_lists_name))?;
let documents_fields = env.create_database(Some(&documents_fields_name))?;
let documents_fields_counts = env.create_database(Some(&documents_fields_counts_name))?;
let facets = env.create_database(Some(&facets_name))?;
let synonyms = env.create_database(Some(&synonyms_name))?;
let docs_words = env.create_database(Some(&docs_words_name))?;
let prefix_documents_cache = env.create_database(Some(&prefix_documents_cache_name))?;
let prefix_postings_lists_cache = env.create_database(Some(&prefix_postings_lists_cache_name))?;
let updates = update_env.create_database(Some(&updates_name))?;
let updates_results = update_env.create_database(Some(&updates_results_name))?;
Ok(Index {
main: Main { main },
postings_lists: PostingsLists { postings_lists },
documents_fields: DocumentsFields { documents_fields },
documents_fields_counts: DocumentsFieldsCounts { documents_fields_counts },
synonyms: Synonyms { synonyms },
docs_words: DocsWords { docs_words },
prefix_postings_lists_cache: PrefixPostingsListsCache { prefix_postings_lists_cache },
prefix_documents_cache: PrefixDocumentsCache { prefix_documents_cache },
facets: Facets { facets },
updates: Updates { updates },
updates_results: UpdatesResults { updates_results },
updates_notifier,
})
}
pub fn open(
env: &heed::Env,
update_env: &heed::Env,
name: &str,
updates_notifier: UpdateEventsEmitter,
) -> MResult<Option<Index>> {
// create all the store names
let main_name = main_name(name);
let postings_lists_name = postings_lists_name(name);
let documents_fields_name = documents_fields_name(name);
let documents_fields_counts_name = documents_fields_counts_name(name);
let synonyms_name = synonyms_name(name);
let docs_words_name = docs_words_name(name);
let prefix_documents_cache_name = prefix_documents_cache_name(name);
let facets_name = facets_name(name);
let prefix_postings_lists_cache_name = prefix_postings_lists_cache_name(name);
let updates_name = updates_name(name);
let updates_results_name = updates_results_name(name);
// open all the stores
let main = match env.open_poly_database(Some(&main_name))? {
Some(main) => main,
None => return Ok(None),
};
let postings_lists = match env.open_database(Some(&postings_lists_name))? {
Some(postings_lists) => postings_lists,
None => return Ok(None),
};
let documents_fields = match env.open_database(Some(&documents_fields_name))? {
Some(documents_fields) => documents_fields,
None => return Ok(None),
};
let documents_fields_counts = match env.open_database(Some(&documents_fields_counts_name))? {
Some(documents_fields_counts) => documents_fields_counts,
None => return Ok(None),
};
let synonyms = match env.open_database(Some(&synonyms_name))? {
Some(synonyms) => synonyms,
None => return Ok(None),
};
let docs_words = match env.open_database(Some(&docs_words_name))? {
Some(docs_words) => docs_words,
None => return Ok(None),
};
let prefix_documents_cache = match env.open_database(Some(&prefix_documents_cache_name))? {
Some(prefix_documents_cache) => prefix_documents_cache,
None => return Ok(None),
};
let facets = match env.open_database(Some(&facets_name))? {
Some(facets) => facets,
None => return Ok(None),
};
let prefix_postings_lists_cache = match env.open_database(Some(&prefix_postings_lists_cache_name))? {
Some(prefix_postings_lists_cache) => prefix_postings_lists_cache,
None => return Ok(None),
};
let updates = match update_env.open_database(Some(&updates_name))? {
Some(updates) => updates,
None => return Ok(None),
};
let updates_results = match update_env.open_database(Some(&updates_results_name))? {
Some(updates_results) => updates_results,
None => return Ok(None),
};
Ok(Some(Index {
main: Main { main },
postings_lists: PostingsLists { postings_lists },
documents_fields: DocumentsFields { documents_fields },
documents_fields_counts: DocumentsFieldsCounts { documents_fields_counts },
synonyms: Synonyms { synonyms },
docs_words: DocsWords { docs_words },
prefix_documents_cache: PrefixDocumentsCache { prefix_documents_cache },
facets: Facets { facets },
prefix_postings_lists_cache: PrefixPostingsListsCache { prefix_postings_lists_cache },
updates: Updates { updates },
updates_results: UpdatesResults { updates_results },
updates_notifier,
}))
}
pub fn clear(
writer: &mut heed::RwTxn<MainT>,
update_writer: &mut heed::RwTxn<UpdateT>,
index: &Index,
) -> MResult<()> {
// clear all the stores
index.main.clear(writer)?;
index.postings_lists.clear(writer)?;
index.documents_fields.clear(writer)?;
index.documents_fields_counts.clear(writer)?;
index.synonyms.clear(writer)?;
index.docs_words.clear(writer)?;
index.prefix_documents_cache.clear(writer)?;
index.prefix_postings_lists_cache.clear(writer)?;
index.facets.clear(writer)?;
index.updates.clear(update_writer)?;
index.updates_results.clear(update_writer)?;
Ok(())
}
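
The `PostingsCodec` layout in this file (an 8-byte big-endian count of document ids, then the ids, then the matches) can be illustrated with a minimal, standard-library-only sketch. The names `encode_postings` and `decode_postings` are hypothetical, ids are simplified to big-endian `u32` values, and the matches are treated as opaque bytes. Unlike the zero-copy original, this version copies the ids and returns `None` on truncated input.

```rust
use std::convert::TryInto;

/// Hypothetical helper: 8-byte big-endian id count, then ids, then raw match bytes.
fn encode_postings(docids: &[u32], matches: &[u8]) -> Vec<u8> {
    let mut buffer = Vec::with_capacity(8 + docids.len() * 4 + matches.len());
    buffer.extend_from_slice(&(docids.len() as u64).to_be_bytes());
    for id in docids {
        // Fixed-width encoding, so the id section is exactly count * 4 bytes.
        buffer.extend_from_slice(&id.to_be_bytes());
    }
    buffer.extend_from_slice(matches);
    buffer
}

/// Hypothetical helper: returns `None` instead of panicking on short input.
fn decode_postings(bytes: &[u8]) -> Option<(Vec<u32>, &[u8])> {
    let len_bytes: [u8; 8] = bytes.get(..8)?.try_into().ok()?;
    let count = u64::from_be_bytes(len_bytes) as usize;
    let rest = &bytes[8..];
    let ids_size = count.checked_mul(4)?;
    let ids_bytes = rest.get(..ids_size)?;
    let docids = ids_bytes
        .chunks_exact(4)
        .map(|chunk| u32::from_be_bytes(chunk.try_into().unwrap()))
        .collect();
    // Everything after the id section belongs to the matches.
    Some((docids, &rest[ids_size..]))
}

fn main() {
    let encoded = encode_postings(&[1, 5, 9], b"matches");
    let (docids, matches) = decode_postings(&encoded).expect("well-formed input");
    println!("{:?} {:?}", docids, matches);
}
```

Storing the count up front is what lets the decoder split one value into two sections without a separator byte.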

@ -1,47 +0,0 @@
use std::borrow::Cow;
use heed::Result as ZResult;
use heed::types::ByteSlice;
use sdset::{Set, SetBuf};
use slice_group_by::GroupBy;
use crate::database::MainT;
use crate::DocIndex;
use crate::store::{Postings, PostingsCodec};
#[derive(Copy, Clone)]
pub struct PostingsLists {
pub(crate) postings_lists: heed::Database<ByteSlice, PostingsCodec>,
}
impl PostingsLists {
pub fn put_postings_list(
self,
writer: &mut heed::RwTxn<MainT>,
word: &[u8],
matches: &Set<DocIndex>,
) -> ZResult<()> {
let docids = matches.linear_group_by_key(|m| m.document_id).map(|g| g[0].document_id).collect();
let docids = Cow::Owned(SetBuf::new_unchecked(docids));
let matches = Cow::Borrowed(matches);
let postings = Postings { docids, matches };
self.postings_lists.put(writer, word, &postings)
}
pub fn del_postings_list(self, writer: &mut heed::RwTxn<MainT>, word: &[u8]) -> ZResult<bool> {
self.postings_lists.delete(writer, word)
}
pub fn clear(self, writer: &mut heed::RwTxn<MainT>) -> ZResult<()> {
self.postings_lists.clear(writer)
}
pub fn postings_list<'txn>(
self,
reader: &'txn heed::RoTxn<MainT>,
word: &[u8],
) -> ZResult<Option<Postings<'txn>>> {
self.postings_lists.get(reader, word)
}
}
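
`put_postings_list` derives the document id set from the matches by grouping consecutive entries that share a `document_id`. A minimal sketch of the same idea with only the standard library follows; the `Match` struct and `distinct_docids` function are hypothetical stand-ins for `DocIndex` and the `linear_group_by_key` call, and the input is assumed to be sorted by document id, as a `Set<DocIndex>` would be.

```rust
/// Hypothetical stand-in for `DocIndex`, reduced to two fields.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct Match {
    document_id: u32,
    word_index: u16,
}

/// Collects each distinct document id once, assuming `matches` is sorted by id.
fn distinct_docids(matches: &[Match]) -> Vec<u32> {
    let mut docids: Vec<u32> = matches.iter().map(|m| m.document_id).collect();
    // `dedup` only removes consecutive duplicates, which is enough for sorted input.
    docids.dedup();
    docids
}

fn main() {
    let matches = [
        Match { document_id: 1, word_index: 0 },
        Match { document_id: 1, word_index: 3 },
        Match { document_id: 4, word_index: 1 },
    ];
    println!("last word index: {}", matches[2].word_index);
    println!("{:?}", distinct_docids(&matches));
}
```

Because the input is already sorted, the result is sorted and free of duplicates, which is the invariant `SetBuf::new_unchecked` relies on without verifying it.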

@ -1,80 +0,0 @@
use std::borrow::Cow;
use heed::types::{OwnedType, CowSlice};
use heed::Result as ZResult;
use zerocopy::{AsBytes, FromBytes};
use super::{BEU64, BEU32};
use crate::{DocumentId, Highlight};
use crate::database::MainT;
#[derive(Debug, Copy, Clone, AsBytes, FromBytes)]
#[repr(C)]
pub struct PrefixKey {
prefix: [u8; 4],
index: BEU64,
docid: BEU32,
}
impl PrefixKey {
pub fn new(prefix: [u8; 4], index: u64, docid: u32) -> PrefixKey {
PrefixKey {
prefix,
index: BEU64::new(index),
docid: BEU32::new(docid),
}
}
}
#[derive(Copy, Clone)]
pub struct PrefixDocumentsCache {
pub(crate) prefix_documents_cache: heed::Database<OwnedType<PrefixKey>, CowSlice<Highlight>>,
}
impl PrefixDocumentsCache {
pub fn put_prefix_document(
self,
writer: &mut heed::RwTxn<MainT>,
prefix: [u8; 4],
index: usize,
docid: DocumentId,
highlights: &[Highlight],
) -> ZResult<()> {
let key = PrefixKey::new(prefix, index as u64, docid.0);
self.prefix_documents_cache.put(writer, &key, highlights)
}
pub fn clear(self, writer: &mut heed::RwTxn<MainT>) -> ZResult<()> {
self.prefix_documents_cache.clear(writer)
}
pub fn prefix_documents<'txn>(
self,
reader: &'txn heed::RoTxn<MainT>,
prefix: [u8; 4],
) -> ZResult<PrefixDocumentsIter<'txn>> {
let start = PrefixKey::new(prefix, 0, 0);
let end = PrefixKey::new(prefix, u64::max_value(), u32::max_value());
let iter = self.prefix_documents_cache.range(reader, &(start..=end))?;
Ok(PrefixDocumentsIter { iter })
}
}
pub struct PrefixDocumentsIter<'txn> {
iter: heed::RoRange<'txn, OwnedType<PrefixKey>, CowSlice<Highlight>>,
}
impl<'txn> Iterator for PrefixDocumentsIter<'txn> {
type Item = ZResult<(DocumentId, Cow<'txn, [Highlight]>)>;
fn next(&mut self) -> Option<Self::Item> {
match self.iter.next() {
Some(Ok((key, highlights))) => {
let docid = DocumentId(key.docid.get());
Some(Ok((docid, highlights)))
}
Some(Err(e)) => Some(Err(e)),
None => None,
}
}
}
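
`prefix_documents` scans the range from `(prefix, 0, 0)` to `(prefix, u64::MAX, u32::MAX)`, which only works because the integer fields of `PrefixKey` are stored big-endian: byte-wise key comparison then agrees with numeric order. The sketch below illustrates this with a hypothetical `key_bytes` function that writes the three fields back to back; it is an assumption about the flat layout, not the actual `zerocopy` representation.

```rust
/// Hypothetical flat encoding of a `PrefixKey`: prefix, then index, then docid,
/// with both integers written big-endian.
fn key_bytes(prefix: [u8; 4], index: u64, docid: u32) -> Vec<u8> {
    let mut key = Vec::with_capacity(16);
    key.extend_from_slice(&prefix);
    key.extend_from_slice(&index.to_be_bytes());
    key.extend_from_slice(&docid.to_be_bytes());
    key
}

fn main() {
    let prefix = *b"hell";
    let start = key_bytes(prefix, 0, 0);
    let end = key_bytes(prefix, u64::MAX, u32::MAX);
    let inside = key_bytes(prefix, 256, 7);
    let other = key_bytes(*b"help", 0, 0);

    // Byte-wise comparison agrees with (prefix, index, docid) ordering.
    println!("inside in range: {}", start <= inside && inside <= end);
    println!("other in range: {}", start <= other && other <= end);
}
```

With little-endian integers the low byte would come first, so index 256 would sort before index 1 and the range scan would no longer return entries in index order.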

@ -1,45 +0,0 @@
use std::borrow::Cow;
use heed::Result as ZResult;
use heed::types::OwnedType;
use sdset::{Set, SetBuf};
use slice_group_by::GroupBy;
use crate::database::MainT;
use crate::DocIndex;
use crate::store::{PostingsCodec, Postings};
#[derive(Copy, Clone)]
pub struct PrefixPostingsListsCache {
pub(crate) prefix_postings_lists_cache: heed::Database<OwnedType<[u8; 4]>, PostingsCodec>,
}
impl PrefixPostingsListsCache {
pub fn put_prefix_postings_list(
self,
writer: &mut heed::RwTxn<MainT>,
prefix: [u8; 4],
matches: &Set<DocIndex>,
) -> ZResult<()>
{
let docids = matches.linear_group_by_key(|m| m.document_id).map(|g| g[0].document_id).collect();
let docids = Cow::Owned(SetBuf::new_unchecked(docids));
let matches = Cow::Borrowed(matches);
let postings = Postings { docids, matches };
self.prefix_postings_lists_cache.put(writer, &prefix, &postings)
}
pub fn clear(self, writer: &mut heed::RwTxn<MainT>) -> ZResult<()> {
self.prefix_postings_lists_cache.clear(writer)
}
pub fn prefix_postings_list<'txn>(
self,
reader: &'txn heed::RoTxn<MainT>,
prefix: [u8; 4],
) -> ZResult<Option<Postings<'txn>>>
{
self.prefix_postings_lists_cache.get(reader, &prefix)
}
}

@ -1,44 +0,0 @@
use std::borrow::Cow;
use heed::Result as ZResult;
use heed::types::ByteSlice;
use crate::database::MainT;
use crate::{FstSetCow, MResult};
#[derive(Copy, Clone)]
pub struct Synonyms {
pub(crate) synonyms: heed::Database<ByteSlice, ByteSlice>,
}
impl Synonyms {
pub fn put_synonyms<A>(self, writer: &mut heed::RwTxn<MainT>, word: &[u8], synonyms: &fst::Set<A>) -> ZResult<()>
where A: AsRef<[u8]>,
{
let bytes = synonyms.as_fst().as_bytes();
self.synonyms.put(writer, word, bytes)
}
pub fn del_synonyms(self, writer: &mut heed::RwTxn<MainT>, word: &[u8]) -> ZResult<bool> {
self.synonyms.delete(writer, word)
}
pub fn clear(self, writer: &mut heed::RwTxn<MainT>) -> ZResult<()> {
self.synonyms.clear(writer)
}
pub(crate) fn synonyms_fst<'txn>(self, reader: &'txn heed::RoTxn<MainT>, word: &[u8]) -> ZResult<FstSetCow<'txn>> {
match self.synonyms.get(reader, word)? {
Some(bytes) => Ok(fst::Set::new(bytes).unwrap().map_data(Cow::Borrowed).unwrap()),
None => Ok(fst::Set::default().map_data(Cow::Owned).unwrap()),
}
}
pub fn synonyms(self, reader: &heed::RoTxn<MainT>, word: &[u8]) -> MResult<Vec<String>> {
let synonyms = self
.synonyms_fst(reader, word)?
.stream()
.into_strs()?;
Ok(synonyms)
}
}

@ -1,65 +0,0 @@
use super::BEU64;
use crate::database::UpdateT;
use crate::update::Update;
use heed::types::{OwnedType, SerdeJson};
use heed::Result as ZResult;
#[derive(Copy, Clone)]
pub struct Updates {
pub(crate) updates: heed::Database<OwnedType<BEU64>, SerdeJson<Update>>,
}
impl Updates {
// TODO do not trigger deserialize if possible
pub fn last_update(self, reader: &heed::RoTxn<UpdateT>) -> ZResult<Option<(u64, Update)>> {
match self.updates.last(reader)? {
Some((key, data)) => Ok(Some((key.get(), data))),
None => Ok(None),
}
}
// TODO do not trigger deserialize if possible
pub fn first_update(self, reader: &heed::RoTxn<UpdateT>) -> ZResult<Option<(u64, Update)>> {
match self.updates.first(reader)? {
Some((key, data)) => Ok(Some((key.get(), data))),
None => Ok(None),
}
}
// TODO do not trigger deserialize if possible
pub fn get(self, reader: &heed::RoTxn<UpdateT>, update_id: u64) -> ZResult<Option<Update>> {
let update_id = BEU64::new(update_id);
self.updates.get(reader, &update_id)
}
pub fn put_update(
self,
writer: &mut heed::RwTxn<UpdateT>,
update_id: u64,
update: &Update,
) -> ZResult<()> {
// TODO prefer using serde_json?
let update_id = BEU64::new(update_id);
self.updates.put(writer, &update_id, update)
}
pub fn del_update(self, writer: &mut heed::RwTxn<UpdateT>, update_id: u64) -> ZResult<bool> {
let update_id = BEU64::new(update_id);
self.updates.delete(writer, &update_id)
}
pub fn pop_front(self, writer: &mut heed::RwTxn<UpdateT>) -> ZResult<Option<(u64, Update)>> {
match self.first_update(writer)? {
Some((update_id, update)) => {
let key = BEU64::new(update_id);
self.updates.delete(writer, &key)?;
Ok(Some((update_id, update)))
}
None => Ok(None),
}
}
pub fn clear(self, writer: &mut heed::RwTxn<UpdateT>) -> ZResult<()> {
self.updates.clear(writer)
}
}
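
The `Updates` store behaves like a FIFO queue keyed by update id: `put_update` inserts under an increasing id, and `pop_front` removes the entry with the smallest key. A rough in-memory analogue, assuming a `BTreeMap` in place of the ordered `heed` database and a `String` in place of `Update` (the `UpdateQueue` type is hypothetical), might look like this:

```rust
use std::collections::BTreeMap;

/// Hypothetical in-memory stand-in for the `Updates` store: keys are update ids,
/// values are opaque payloads, and iteration order is ascending by id.
struct UpdateQueue {
    updates: BTreeMap<u64, String>,
}

impl UpdateQueue {
    fn new() -> Self {
        UpdateQueue { updates: BTreeMap::new() }
    }

    fn put_update(&mut self, update_id: u64, update: &str) {
        self.updates.insert(update_id, update.to_string());
    }

    /// Removes and returns the entry with the smallest id, like `pop_front`.
    fn pop_front(&mut self) -> Option<(u64, String)> {
        let first_id = *self.updates.keys().next()?;
        let update = self.updates.remove(&first_id)?;
        Some((first_id, update))
    }
}

fn main() {
    let mut queue = UpdateQueue::new();
    queue.put_update(2, "clear_all");
    queue.put_update(1, "documents_addition");
    while let Some((id, update)) = queue.pop_front() {
        println!("{} {}", id, update);
    }
}
```

Updates come out in id order regardless of insertion order, which is why the store encodes its keys as big-endian `BEU64` values.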

@ -1,45 +0,0 @@
use super::BEU64;
use crate::database::UpdateT;
use crate::update::ProcessedUpdateResult;
use heed::types::{OwnedType, SerdeJson};
use heed::Result as ZResult;
#[derive(Copy, Clone)]
pub struct UpdatesResults {
pub(crate) updates_results: heed::Database<OwnedType<BEU64>, SerdeJson<ProcessedUpdateResult>>,
}
impl UpdatesResults {
pub fn last_update(
self,
reader: &heed::RoTxn<UpdateT>,
) -> ZResult<Option<(u64, ProcessedUpdateResult)>> {
match self.updates_results.last(reader)? {
Some((key, data)) => Ok(Some((key.get(), data))),
None => Ok(None),
}
}
pub fn put_update_result(
self,
writer: &mut heed::RwTxn<UpdateT>,
update_id: u64,
update_result: &ProcessedUpdateResult,
) -> ZResult<()> {
let update_id = BEU64::new(update_id);
self.updates_results.put(writer, &update_id, update_result)
}
pub fn update_result(
self,
reader: &heed::RoTxn<UpdateT>,
update_id: u64,
) -> ZResult<Option<ProcessedUpdateResult>> {
let update_id = BEU64::new(update_id);
self.updates_results.get(reader, &update_id)
}
pub fn clear(self, writer: &mut heed::RwTxn<UpdateT>) -> ZResult<()> {
self.updates_results.clear(writer)
}
}

@ -1,36 +0,0 @@
use crate::database::{MainT, UpdateT};
use crate::update::{next_update_id, Update};
use crate::{store, MResult, RankedMap};
pub fn apply_clear_all(
writer: &mut heed::RwTxn<MainT>,
index: &store::Index,
) -> MResult<()> {
index.main.put_words_fst(writer, &fst::Set::default())?;
index.main.put_external_docids(writer, &fst::Map::default())?;
index.main.put_internal_docids(writer, &sdset::SetBuf::default())?;
index.main.put_ranked_map(writer, &RankedMap::default())?;
index.main.put_number_of_documents(writer, |_| 0)?;
index.main.put_sorted_document_ids_cache(writer, &[])?;
index.documents_fields.clear(writer)?;
index.documents_fields_counts.clear(writer)?;
index.postings_lists.clear(writer)?;
index.docs_words.clear(writer)?;
index.prefix_documents_cache.clear(writer)?;
index.prefix_postings_lists_cache.clear(writer)?;
index.facets.clear(writer)?;
Ok(())
}
pub fn push_clear_all(
writer: &mut heed::RwTxn<UpdateT>,
updates_store: store::Updates,
updates_results_store: store::UpdatesResults,
) -> MResult<u64> {
let last_update_id = next_update_id(writer, updates_store, updates_results_store)?;
let update = Update::clear_all();
updates_store.put_update(writer, last_update_id, &update)?;
Ok(last_update_id)
}

@ -1,26 +0,0 @@
use crate::database::{MainT, UpdateT};
use crate::{store, MResult};
use crate::update::{next_update_id, Update};
pub fn apply_customs_update(
writer: &mut heed::RwTxn<MainT>,
main_store: store::Main,
customs: &[u8],
) -> MResult<()> {
main_store.put_customs(writer, customs)
}
pub fn push_customs_update(
writer: &mut heed::RwTxn<UpdateT>,
updates_store: store::Updates,
updates_results_store: store::UpdatesResults,
customs: Vec<u8>,
) -> MResult<u64> {
let last_update_id = next_update_id(writer, updates_store, updates_results_store)?;
let update = Update::customs(customs);
updates_store.put_update(writer, last_update_id, &update)?;
Ok(last_update_id)
}

@ -1,424 +0,0 @@
use std::borrow::Cow;
use std::collections::{HashMap, BTreeMap};
use fst::{set::OpBuilder, SetBuilder};
use indexmap::IndexMap;
use meilisearch_schema::{Schema, FieldId};
use meilisearch_types::DocumentId;
use sdset::{duo::Union, SetOperation};
use serde::Deserialize;
use serde_json::Value;
use crate::database::{MainT, UpdateT};
use crate::database::{UpdateEvent, UpdateEventsEmitter};
use crate::facets;
use crate::raw_indexer::RawIndexer;
use crate::serde::Deserializer;
use crate::store::{self, DocumentsFields, DocumentsFieldsCounts, DiscoverIds};
use crate::update::helpers::{index_value, value_to_number, extract_document_id};
use crate::update::{apply_documents_deletion, compute_short_prefixes, next_update_id, Update};
use crate::{Error, MResult, RankedMap};
pub struct DocumentsAddition<D> {
updates_store: store::Updates,
updates_results_store: store::UpdatesResults,
updates_notifier: UpdateEventsEmitter,
documents: Vec<D>,
is_partial: bool,
}
impl<D> DocumentsAddition<D> {
pub fn new(
updates_store: store::Updates,
updates_results_store: store::UpdatesResults,
updates_notifier: UpdateEventsEmitter,
) -> DocumentsAddition<D> {
DocumentsAddition {
updates_store,
updates_results_store,
updates_notifier,
documents: Vec::new(),
is_partial: false,
}
}
pub fn new_partial(
updates_store: store::Updates,
updates_results_store: store::UpdatesResults,
updates_notifier: UpdateEventsEmitter,
) -> DocumentsAddition<D> {
DocumentsAddition {
updates_store,
updates_results_store,
updates_notifier,
documents: Vec::new(),
is_partial: true,
}
}
pub fn update_document(&mut self, document: D) {
self.documents.push(document);
}
pub fn finalize(self, writer: &mut heed::RwTxn<UpdateT>) -> MResult<u64>
where
D: serde::Serialize,
{
let _ = self.updates_notifier.send(UpdateEvent::NewUpdate);
let update_id = push_documents_addition(
writer,
self.updates_store,
self.updates_results_store,
self.documents,
self.is_partial,
)?;
Ok(update_id)
}
}
impl<D> Extend<D> for DocumentsAddition<D> {
fn extend<T: IntoIterator<Item = D>>(&mut self, iter: T) {
self.documents.extend(iter)
}
}
pub fn push_documents_addition<D: serde::Serialize>(
writer: &mut heed::RwTxn<UpdateT>,
updates_store: store::Updates,
updates_results_store: store::UpdatesResults,
addition: Vec<D>,
is_partial: bool,
) -> MResult<u64> {
let mut values = Vec::with_capacity(addition.len());
for add in addition {
let vec = serde_json::to_vec(&add)?;
let add = serde_json::from_slice(&vec)?;
values.push(add);
}
let last_update_id = next_update_id(writer, updates_store, updates_results_store)?;
let update = if is_partial {
Update::documents_partial(values)
} else {
Update::documents_addition(values)
};
updates_store.put_update(writer, last_update_id, &update)?;
Ok(last_update_id)
}
#[allow(clippy::too_many_arguments)]
fn index_document<A: AsRef<[u8]>>(
writer: &mut heed::RwTxn<MainT>,
documents_fields: DocumentsFields,
documents_fields_counts: DocumentsFieldsCounts,
ranked_map: &mut RankedMap,
indexer: &mut RawIndexer<A>,
schema: &Schema,
field_id: FieldId,
document_id: DocumentId,
value: &Value,
) -> MResult<()>
{
let serialized = serde_json::to_vec(value)?;
documents_fields.put_document_field(writer, document_id, field_id, &serialized)?;
if let Some(indexed_pos) = schema.is_searchable(field_id) {
let number_of_words = index_value(indexer, document_id, indexed_pos, value);
if let Some(number_of_words) = number_of_words {
documents_fields_counts.put_document_field_count(
writer,
document_id,
indexed_pos,
number_of_words as u16,
)?;
}
}
if schema.is_ranked(field_id) {
let number = value_to_number(value).unwrap_or_default();
ranked_map.insert(document_id, field_id, number);
}
Ok(())
}
pub fn apply_addition(
writer: &mut heed::RwTxn<MainT>,
index: &store::Index,
new_documents: Vec<IndexMap<String, Value>>,
partial: bool
) -> MResult<()>
{
let mut schema = match index.main.schema(writer)? {
Some(schema) => schema,
None => return Err(Error::SchemaMissing),
};
// Retrieve the documents ids related structures
let external_docids = index.main.external_docids(writer)?;
let internal_docids = index.main.internal_docids(writer)?;
let mut available_ids = DiscoverIds::new(&internal_docids);
let primary_key = schema.primary_key().ok_or(Error::MissingPrimaryKey)?;
// 1. store documents ids for future deletion
let mut documents_additions = HashMap::new();
let mut new_external_docids = BTreeMap::new();
let mut new_internal_docids = Vec::with_capacity(new_documents.len());
for mut document in new_documents {
let external_docids_get = |docid: &str| {
match (external_docids.get(docid), new_external_docids.get(docid)) {
(_, Some(&id))
| (Some(id), _) => Some(id as u32),
(None, None) => None,
}
};
let (internal_docid, external_docid) =
extract_document_id(
&primary_key,
&document,
&external_docids_get,
&mut available_ids,
)?;
new_external_docids.insert(external_docid, internal_docid.0 as u64);
new_internal_docids.push(internal_docid);
if partial {
let mut deserializer = Deserializer {
document_id: internal_docid,
reader: writer,
documents_fields: index.documents_fields,
schema: &schema,
fields: None,
};
let old_document = Option::<HashMap<String, Value>>::deserialize(&mut deserializer)?;
if let Some(old_document) = old_document {
for (key, value) in old_document {
document.entry(key).or_insert(value);
}
}
}
documents_additions.insert(internal_docid, document);
}
// 2. remove the documents postings lists
let number_of_inserted_documents = documents_additions.len();
let documents_ids = new_external_docids.keys().cloned().collect();
apply_documents_deletion(writer, index, documents_ids)?;
let mut ranked_map = match index.main.ranked_map(writer)? {
Some(ranked_map) => ranked_map,
None => RankedMap::default(),
};
let stop_words = index.main.stop_words_fst(writer)?.map_data(Cow::into_owned)?;
let mut indexer = RawIndexer::new(&stop_words);
// For each document in this update
for (document_id, document) in &documents_additions {
// For each key-value pair in the document.
for (attribute, value) in document {
let (field_id, _) = schema.insert_with_position(&attribute)?;
index_document(
writer,
index.documents_fields,
index.documents_fields_counts,
&mut ranked_map,
&mut indexer,
&schema,
field_id,
*document_id,
&value,
)?;
}
}
write_documents_addition_index(
writer,
index,
&ranked_map,
number_of_inserted_documents,
indexer,
)?;
index.main.put_schema(writer, &schema)?;
let new_external_docids = fst::Map::from_iter(new_external_docids.iter().map(|(ext, id)| (ext, *id as u64)))?;
let new_internal_docids = sdset::SetBuf::from_dirty(new_internal_docids);
index.main.merge_external_docids(writer, &new_external_docids)?;
index.main.merge_internal_docids(writer, &new_internal_docids)?;
// recompute all facet attributes after document update.
if let Some(attributes_for_facetting) = index.main.attributes_for_faceting(writer)? {
let docids = index.main.internal_docids(writer)?;
let facet_map = facets::facet_map_from_docids(writer, index, &docids, attributes_for_facetting.as_ref())?;
index.facets.add(writer, facet_map)?;
}
// update is finished; update sorted document id cache with new state
let mut document_ids = index.main.internal_docids(writer)?.to_vec();
super::cache_document_ids_sorted(writer, &ranked_map, index, &mut document_ids)?;
Ok(())
}
pub fn apply_documents_partial_addition(
writer: &mut heed::RwTxn<MainT>,
index: &store::Index,
new_documents: Vec<IndexMap<String, Value>>,
) -> MResult<()> {
apply_addition(writer, index, new_documents, true)
}
pub fn apply_documents_addition(
writer: &mut heed::RwTxn<MainT>,
index: &store::Index,
new_documents: Vec<IndexMap<String, Value>>,
) -> MResult<()> {
apply_addition(writer, index, new_documents, false)
}
pub fn reindex_all_documents(writer: &mut heed::RwTxn<MainT>, index: &store::Index) -> MResult<()> {
let schema = match index.main.schema(writer)? {
Some(schema) => schema,
None => return Err(Error::SchemaMissing),
};
let mut ranked_map = RankedMap::default();
// 1. retrieve all documents ids
let mut documents_ids_to_reindex = Vec::new();
for result in index.documents_fields_counts.documents_ids(writer)? {
let document_id = result?;
documents_ids_to_reindex.push(document_id);
}
// 2. remove the documents posting lists
index.main.put_words_fst(writer, &fst::Set::default())?;
index.main.put_ranked_map(writer, &ranked_map)?;
index.main.put_number_of_documents(writer, |_| 0)?;
index.facets.clear(writer)?;
index.postings_lists.clear(writer)?;
index.docs_words.clear(writer)?;
let stop_words = index.main
.stop_words_fst(writer)?
.map_data(Cow::into_owned)
.unwrap();
let number_of_inserted_documents = documents_ids_to_reindex.len();
let mut indexer = RawIndexer::new(&stop_words);
let mut ram_store = HashMap::new();
if let Some(ref attributes_for_facetting) = index.main.attributes_for_faceting(writer)? {
let facet_map = facets::facet_map_from_docids(writer, &index, &documents_ids_to_reindex, &attributes_for_facetting)?;
index.facets.add(writer, facet_map)?;
}
// ^-- https://github.com/meilisearch/MeiliSearch/pull/631#issuecomment-626624470 --v
for document_id in &documents_ids_to_reindex {
for result in index.documents_fields.document_fields(writer, *document_id)? {
let (field_id, bytes) = result?;
let value: Value = serde_json::from_slice(bytes)?;
ram_store.insert((document_id, field_id), value);
}
// For each key-value pair in the document.
for ((document_id, field_id), value) in ram_store.drain() {
index_document(
writer,
index.documents_fields,
index.documents_fields_counts,
&mut ranked_map,
&mut indexer,
&schema,
field_id,
*document_id,
&value,
)?;
}
}
// 4. write the new index in the main store
write_documents_addition_index(
writer,
index,
&ranked_map,
number_of_inserted_documents,
indexer,
)?;
index.main.put_schema(writer, &schema)?;
// recompute all facet attributes after document update.
if let Some(attributes_for_facetting) = index.main.attributes_for_faceting(writer)? {
let docids = index.main.internal_docids(writer)?;
let facet_map = facets::facet_map_from_docids(writer, index, &docids, attributes_for_facetting.as_ref())?;
index.facets.add(writer, facet_map)?;
}
// update is finished; update sorted document id cache with new state
let mut document_ids = index.main.internal_docids(writer)?.to_vec();
super::cache_document_ids_sorted(writer, &ranked_map, index, &mut document_ids)?;
Ok(())
}
pub fn write_documents_addition_index<A: AsRef<[u8]>>(
writer: &mut heed::RwTxn<MainT>,
index: &store::Index,
ranked_map: &RankedMap,
number_of_inserted_documents: usize,
indexer: RawIndexer<A>,
) -> MResult<()>
{
let indexed = indexer.build();
let mut delta_words_builder = SetBuilder::memory();
for (word, delta_set) in indexed.words_doc_indexes {
delta_words_builder.insert(&word).unwrap();
let set = match index.postings_lists.postings_list(writer, &word)? {
Some(postings) => Union::new(&postings.matches, &delta_set).into_set_buf(),
None => delta_set,
};
index.postings_lists.put_postings_list(writer, &word, &set)?;
}
for (id, words) in indexed.docs_words {
index.docs_words.put_doc_words(writer, id, &words)?;
}
let delta_words = delta_words_builder.into_set();
let words_fst = index.main.words_fst(writer)?;
let words = if !words_fst.is_empty() {
let op = OpBuilder::new()
.add(words_fst.stream())
.add(delta_words.stream())
.r#union();
let mut words_builder = SetBuilder::memory();
words_builder.extend_stream(op).unwrap();
words_builder.into_set()
} else {
delta_words
};
index.main.put_words_fst(writer, &words)?;
index.main.put_ranked_map(writer, ranked_map)?;
index.main.put_number_of_documents(writer, |old| old + number_of_inserted_documents as u64)?;
compute_short_prefixes(writer, &words, index)?;
Ok(())
}
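
The partial-addition branch of `apply_addition` fills in fields that are missing from the incoming document with the values already stored, via `document.entry(key).or_insert(value)`. A minimal sketch of that merge follows. It assumes plain string values and a hypothetical `merge_partial` helper; neither is part of the original code, which works on `IndexMap<String, Value>`.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper: merges a stored document into a partial update.
/// Fields present in `update` win; fields only present in `stored` are kept.
fn merge_partial(
    mut update: BTreeMap<String, String>,
    stored: BTreeMap<String, String>,
) -> BTreeMap<String, String> {
    for (key, value) in stored {
        // `or_insert` only writes when the key is absent from the update.
        update.entry(key).or_insert(value);
    }
    update
}
```

With an update containing `id` and `title` and a stored document containing `id`, `title` and `year`, the result keeps the new `title` and carries over the stored `year`.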

@ -1,207 +0,0 @@
use std::collections::{BTreeSet, HashMap, HashSet};
use fst::{SetBuilder, Streamer};
use sdset::{duo::DifferenceByKey, SetBuf, SetOperation};
use crate::database::{MainT, UpdateT};
use crate::database::{UpdateEvent, UpdateEventsEmitter};
use crate::facets;
use crate::store;
use crate::update::{next_update_id, compute_short_prefixes, Update};
use crate::{DocumentId, Error, MResult, RankedMap, MainWriter, Index};
pub struct DocumentsDeletion {
updates_store: store::Updates,
updates_results_store: store::UpdatesResults,
updates_notifier: UpdateEventsEmitter,
external_docids: Vec<String>,
}
impl DocumentsDeletion {
pub fn new(
updates_store: store::Updates,
updates_results_store: store::UpdatesResults,
updates_notifier: UpdateEventsEmitter,
) -> DocumentsDeletion {
DocumentsDeletion {
updates_store,
updates_results_store,
updates_notifier,
external_docids: Vec::new(),
}
}
pub fn delete_document_by_external_docid(&mut self, document_id: String) {
self.external_docids.push(document_id);
}
pub fn finalize(self, writer: &mut heed::RwTxn<UpdateT>) -> MResult<u64> {
let _ = self.updates_notifier.send(UpdateEvent::NewUpdate);
let update_id = push_documents_deletion(
writer,
self.updates_store,
self.updates_results_store,
self.external_docids,
)?;
Ok(update_id)
}
}
impl Extend<String> for DocumentsDeletion {
fn extend<T: IntoIterator<Item=String>>(&mut self, iter: T) {
self.external_docids.extend(iter)
}
}
pub fn push_documents_deletion(
writer: &mut heed::RwTxn<UpdateT>,
updates_store: store::Updates,
updates_results_store: store::UpdatesResults,
external_docids: Vec<String>,
) -> MResult<u64> {
let last_update_id = next_update_id(writer, updates_store, updates_results_store)?;
let update = Update::documents_deletion(external_docids);
updates_store.put_update(writer, last_update_id, &update)?;
Ok(last_update_id)
}
pub fn apply_documents_deletion(
writer: &mut heed::RwTxn<MainT>,
index: &store::Index,
external_docids: Vec<String>,
) -> MResult<()>
{
let (external_docids, internal_docids) = {
let new_external_docids = SetBuf::from_dirty(external_docids);
let mut internal_docids = Vec::new();
let old_external_docids = index.main.external_docids(writer)?;
for external_docid in new_external_docids.as_slice() {
if let Some(id) = old_external_docids.get(external_docid) {
internal_docids.push(DocumentId(id as u32));
}
}
let new_external_docids = fst::Map::from_iter(new_external_docids.into_iter().map(|k| (k, 0))).unwrap();
(new_external_docids, SetBuf::from_dirty(internal_docids))
};
let schema = match index.main.schema(writer)? {
Some(schema) => schema,
None => return Err(Error::SchemaMissing),
};
let mut ranked_map = match index.main.ranked_map(writer)? {
Some(ranked_map) => ranked_map,
None => RankedMap::default(),
};
// facet filters deletion
if let Some(attributes_for_facetting) = index.main.attributes_for_faceting(writer)? {
let facet_map = facets::facet_map_from_docids(writer, &index, &internal_docids, &attributes_for_facetting)?;
index.facets.remove(writer, facet_map)?;
}
// collect the ranked attributes according to the schema
let ranked_fields = schema.ranked();
let mut words_document_ids = HashMap::new();
for id in internal_docids.iter().cloned() {
// remove all the ranked attributes from the ranked_map
for ranked_attr in ranked_fields {
ranked_map.remove(id, *ranked_attr);
}
let words = index.docs_words.doc_words(writer, id)?;
if !words.is_empty() {
let mut stream = words.stream();
while let Some(word) = stream.next() {
let word = word.to_vec();
words_document_ids
.entry(word)
.or_insert_with(Vec::new)
.push(id);
}
}
}
let mut deleted_documents = HashSet::new();
let mut removed_words = BTreeSet::new();
for (word, document_ids) in words_document_ids {
let document_ids = SetBuf::from_dirty(document_ids);
if let Some(postings) = index.postings_lists.postings_list(writer, &word)? {
let op = DifferenceByKey::new(&postings.matches, &document_ids, |d| d.document_id, |id| *id);
let doc_indexes = op.into_set_buf();
if !doc_indexes.is_empty() {
index.postings_lists.put_postings_list(writer, &word, &doc_indexes)?;
} else {
index.postings_lists.del_postings_list(writer, &word)?;
removed_words.insert(word);
}
}
for id in document_ids {
index.documents_fields_counts.del_all_document_fields_counts(writer, id)?;
if index.documents_fields.del_all_document_fields(writer, id)? != 0 {
deleted_documents.insert(id);
}
}
}
let deleted_documents_len = deleted_documents.len() as u64;
for id in &deleted_documents {
index.docs_words.del_doc_words(writer, *id)?;
}
let removed_words = fst::Set::from_iter(removed_words).unwrap();
let words = {
let words_set = index.main.words_fst(writer)?;
let op = fst::set::OpBuilder::new()
.add(words_set.stream())
.add(removed_words.stream())
.difference();
let mut words_builder = SetBuilder::memory();
words_builder.extend_stream(op).unwrap();
words_builder.into_set()
};
index.main.put_words_fst(writer, &words)?;
index.main.put_ranked_map(writer, &ranked_map)?;
index.main.put_number_of_documents(writer, |old| old - deleted_documents_len)?;
// We apply the changes to the external (user-provided) and internal ids
index.main.remove_external_docids(writer, &external_docids)?;
index.main.remove_internal_docids(writer, &internal_docids)?;
compute_short_prefixes(writer, &words, index)?;
// update is finished; update sorted document id cache with new state
document_cache_remove_deleted(writer, index, &ranked_map, &deleted_documents)?;
Ok(())
}
/// Rebuilds the document id cache by either removing deleted documents from the existing cache
/// or, when no cache exists, generating a new one from the documents in the store.
fn document_cache_remove_deleted(writer: &mut MainWriter, index: &Index, ranked_map: &RankedMap, documents_to_delete: &HashSet<DocumentId>) -> MResult<()> {
let new_cache = match index.main.sorted_document_ids_cache(writer)? {
// only keep documents that are not in the list of deleted documents. Order is preserved,
// so there is no need to re-sort
Some(old_cache) => {
old_cache.iter().filter(|docid| !documents_to_delete.contains(docid)).cloned().collect::<Vec<_>>()
}
// couldn't find cached documents, try building a new cache from documents in store
None => {
let mut document_ids = index.main.internal_docids(writer)?.to_vec();
super::cache_document_ids_sorted(writer, ranked_map, index, &mut document_ids)?;
document_ids
}
};
index.main.put_sorted_document_ids_cache(writer, &new_cache)?;
Ok(())
}
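
`document_cache_remove_deleted` relies on the fact that filtering an already sorted list keeps the remaining entries in their relative order, so the cache does not need to be sorted again. A small sketch of that idea, assuming `u32` ids and a hypothetical `remove_deleted` helper in place of the `DocumentId` type and the store calls used above:

```rust
use std::collections::HashSet;

/// Hypothetical helper: drops deleted ids from a cache that is already in
/// ranking order. `filter` preserves the relative order of what remains.
fn remove_deleted(sorted_cache: &[u32], deleted: &HashSet<u32>) -> Vec<u32> {
    sorted_cache
        .iter()
        .filter(|id| !deleted.contains(*id))
        .copied()
        .collect()
}
```

The ids here stand in for documents ordered by some ranking, not by numeric value; only the relative order of the survivors matters.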

@ -1,142 +0,0 @@
use std::fmt::Write as _;
use indexmap::IndexMap;
use meilisearch_schema::IndexedPos;
use meilisearch_types::DocumentId;
use ordered_float::OrderedFloat;
use serde_json::Value;
use crate::Number;
use crate::raw_indexer::RawIndexer;
use crate::serde::SerializerError;
use crate::store::DiscoverIds;
/// Returns the number of words indexed or `None` if the type is unindexable.
pub fn index_value<A: AsRef<[u8]>>(
indexer: &mut RawIndexer<A>,
document_id: DocumentId,
indexed_pos: IndexedPos,
value: &Value,
) -> Option<usize>
{
match value {
Value::Null => None,
Value::Bool(boolean) => {
let text = boolean.to_string();
let number_of_words = indexer.index_text(document_id, indexed_pos, &text);
Some(number_of_words)
},
Value::Number(number) => {
let text = number.to_string();
Some(indexer.index_text(document_id, indexed_pos, &text))
},
Value::String(string) => {
Some(indexer.index_text(document_id, indexed_pos, &string))
},
Value::Array(_) => {
let text = value_to_string(value);
Some(indexer.index_text(document_id, indexed_pos, &text))
},
Value::Object(_) => {
let text = value_to_string(value);
Some(indexer.index_text(document_id, indexed_pos, &text))
},
}
}
/// Transforms the JSON Value type into a String.
pub fn value_to_string(value: &Value) -> String {
fn internal_value_to_string(string: &mut String, value: &Value) {
match value {
Value::Null => (),
Value::Bool(boolean) => { let _ = write!(string, "{}", &boolean); },
Value::Number(number) => { let _ = write!(string, "{}", &number); },
Value::String(text) => string.push_str(&text),
Value::Array(array) => {
for value in array {
internal_value_to_string(string, value);
let _ = string.write_str(". ");
}
},
Value::Object(object) => {
for (key, value) in object {
string.push_str(key);
let _ = string.write_str(". ");
internal_value_to_string(string, value);
let _ = string.write_str(". ");
}
},
}
}
let mut string = String::new();
internal_value_to_string(&mut string, value);
string
}
/// Transforms the JSON Value type into a Number.
pub fn value_to_number(value: &Value) -> Option<Number> {
use std::str::FromStr;
match value {
Value::Null => None,
Value::Bool(boolean) => Some(Number::Unsigned(*boolean as u64)),
Value::Number(number) => {
match (number.as_i64(), number.as_u64(), number.as_f64()) {
(Some(n), _, _) => Some(Number::Signed(n)),
(_, Some(n), _) => Some(Number::Unsigned(n)),
(_, _, Some(n)) => Some(Number::Float(OrderedFloat(n))),
(None, None, None) => None,
}
},
Value::String(string) => Number::from_str(string).ok(),
Value::Array(_array) => None,
Value::Object(_object) => None,
}
}
/// Validates that a string is a well-formed document id and returns the
/// corresponding internal id, or generates a new one if the id is not yet known.
/// This is how document ids are produced.
pub fn discover_document_id<F>(
docid: &str,
external_docids_get: F,
available_docids: &mut DiscoverIds<'_>,
) -> Result<DocumentId, SerializerError>
where
F: FnOnce(&str) -> Option<u32>
{
if docid.chars().all(|x| x.is_ascii_alphanumeric() || x == '-' || x == '_') {
match external_docids_get(docid) {
Some(id) => Ok(DocumentId(id)),
None => {
let internal_id = available_docids.next().expect("no more ids available");
Ok(internal_id)
},
}
} else {
Err(SerializerError::InvalidDocumentIdFormat)
}
}
/// Extracts and validates the document id of a document.
pub fn extract_document_id<F>(
primary_key: &str,
document: &IndexMap<String, Value>,
external_docids_get: F,
available_docids: &mut DiscoverIds<'_>,
) -> Result<(DocumentId, String), SerializerError>
where
F: FnOnce(&str) -> Option<u32>
{
match document.get(primary_key) {
Some(value) => {
let docid = match value {
Value::Number(number) => number.to_string(),
Value::String(string) => string.clone(),
_ => return Err(SerializerError::InvalidDocumentIdFormat),
};
discover_document_id(&docid, external_docids_get, available_docids).map(|id| (id, docid))
}
None => Err(SerializerError::DocumentIdNotFound),
}
}
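
`discover_document_id` accepts an external id only if every character is an ASCII letter, an ASCII digit, `-` or `_`, then either reuses the known internal id or takes the next free one. The sketch below mirrors that flow under simplifying assumptions: `is_valid_docid` and `resolve_docid` are hypothetical names, a `HashMap` plus a counter stand in for the `external_docids_get` closure and `DiscoverIds`, and errors are plain strings rather than `SerializerError`.

```rust
use std::collections::HashMap;

/// Returns true when `docid` only contains ASCII letters, digits, '-' or '_'.
/// As with any `Iterator::all` check, the empty string passes.
fn is_valid_docid(docid: &str) -> bool {
    docid
        .chars()
        .all(|c| c.is_ascii_alphanumeric() || c == '-' || c == '_')
}

/// Hypothetical stand-in for the lookup-or-allocate step.
fn resolve_docid(
    docid: &str,
    known: &HashMap<String, u32>,
    next_free: &mut u32,
) -> Result<u32, String> {
    if !is_valid_docid(docid) {
        return Err(format!("invalid document id: {:?}", docid));
    }
    match known.get(docid) {
        // The external id is already mapped: reuse its internal id.
        Some(&id) => Ok(id),
        // Unknown external id: hand out the next free internal id.
        None => {
            let id = *next_free;
            *next_free += 1;
            Ok(id)
        }
    }
}
```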

@ -1,384 +0,0 @@
mod clear_all;
mod customs_update;
mod documents_addition;
mod documents_deletion;
mod settings_update;
mod helpers;
pub use self::clear_all::{apply_clear_all, push_clear_all};
pub use self::customs_update::{apply_customs_update, push_customs_update};
pub use self::documents_addition::{apply_documents_addition, apply_documents_partial_addition, DocumentsAddition};
pub use self::documents_deletion::{apply_documents_deletion, DocumentsDeletion};
pub use self::helpers::{index_value, value_to_string, value_to_number, discover_document_id, extract_document_id};
pub use self::settings_update::{apply_settings_update, push_settings_update};
use std::cmp;
use std::time::Instant;
use chrono::{DateTime, Utc};
use fst::{IntoStreamer, Streamer};
use heed::Result as ZResult;
use indexmap::IndexMap;
use log::debug;
use sdset::Set;
use serde::{Deserialize, Serialize};
use serde_json::Value;
use meilisearch_error::ErrorCode;
use meilisearch_types::DocumentId;
use crate::{store, MResult, RankedMap};
use crate::database::{MainT, UpdateT};
use crate::settings::SettingsUpdate;
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct Update {
data: UpdateData,
enqueued_at: DateTime<Utc>,
}
impl Update {
fn clear_all() -> Update {
Update {
data: UpdateData::ClearAll,
enqueued_at: Utc::now(),
}
}
fn customs(data: Vec<u8>) -> Update {
Update {
data: UpdateData::Customs(data),
enqueued_at: Utc::now(),
}
}
fn documents_addition(documents: Vec<IndexMap<String, Value>>) -> Update {
Update {
data: UpdateData::DocumentsAddition(documents),
enqueued_at: Utc::now(),
}
}
fn documents_partial(documents: Vec<IndexMap<String, Value>>) -> Update {
Update {
data: UpdateData::DocumentsPartial(documents),
enqueued_at: Utc::now(),
}
}
fn documents_deletion(data: Vec<String>) -> Update {
Update {
data: UpdateData::DocumentsDeletion(data),
enqueued_at: Utc::now(),
}
}
fn settings(data: SettingsUpdate) -> Update {
Update {
data: UpdateData::Settings(Box::new(data)),
enqueued_at: Utc::now(),
}
}
}
#[derive(Debug, Clone, Serialize, Deserialize)]
pub enum UpdateData {
ClearAll,
Customs(Vec<u8>),
DocumentsAddition(Vec<IndexMap<String, Value>>),
DocumentsPartial(Vec<IndexMap<String, Value>>),
DocumentsDeletion(Vec<String>),
Settings(Box<SettingsUpdate>)
}
impl UpdateData {
pub fn update_type(&self) -> UpdateType {
match self {
UpdateData::ClearAll => UpdateType::ClearAll,
UpdateData::Customs(_) => UpdateType::Customs,
UpdateData::DocumentsAddition(addition) => UpdateType::DocumentsAddition {
number: addition.len(),
},
UpdateData::DocumentsPartial(addition) => UpdateType::DocumentsPartial {
number: addition.len(),
},
UpdateData::DocumentsDeletion(deletion) => UpdateType::DocumentsDeletion {
number: deletion.len(),
},
UpdateData::Settings(update) => UpdateType::Settings {
settings: update.clone(),
},
}
}
}
#[derive(Debug, Clone, Serialize, Deserialize)]
#[serde(tag = "name")]
pub enum UpdateType {
ClearAll,
Customs,
DocumentsAddition { number: usize },
DocumentsPartial { number: usize },
DocumentsDeletion { number: usize },
Settings { settings: Box<SettingsUpdate> },
}
#[derive(Debug, Clone, Serialize, Deserialize)]
#[serde(rename_all = "camelCase")]
pub struct ProcessedUpdateResult {
pub update_id: u64,
#[serde(rename = "type")]
pub update_type: UpdateType,
#[serde(skip_serializing_if = "Option::is_none")]
pub error: Option<String>,
#[serde(skip_serializing_if = "Option::is_none")]
pub error_type: Option<String>,
#[serde(skip_serializing_if = "Option::is_none")]
pub error_code: Option<String>,
#[serde(skip_serializing_if = "Option::is_none")]
pub error_link: Option<String>,
pub duration: f64, // in seconds
pub enqueued_at: DateTime<Utc>,
pub processed_at: DateTime<Utc>,
}
#[derive(Debug, Clone, Serialize, Deserialize)]
#[serde(rename_all = "camelCase")]
pub struct EnqueuedUpdateResult {
pub update_id: u64,
#[serde(rename = "type")]
pub update_type: UpdateType,
pub enqueued_at: DateTime<Utc>,
}
#[derive(Debug, Clone, Serialize, Deserialize)]
#[serde(rename_all = "camelCase", tag = "status")]
pub enum UpdateStatus {
Enqueued {
#[serde(flatten)]
content: EnqueuedUpdateResult,
},
Failed {
#[serde(flatten)]
content: ProcessedUpdateResult,
},
Processed {
#[serde(flatten)]
content: ProcessedUpdateResult,
},
}
pub fn update_status(
update_reader: &heed::RoTxn<UpdateT>,
updates_store: store::Updates,
updates_results_store: store::UpdatesResults,
update_id: u64,
) -> MResult<Option<UpdateStatus>> {
match updates_results_store.update_result(update_reader, update_id)? {
Some(result) => {
if result.error.is_some() {
Ok(Some(UpdateStatus::Failed { content: result }))
} else {
Ok(Some(UpdateStatus::Processed { content: result }))
}
},
None => match updates_store.get(update_reader, update_id)? {
Some(update) => Ok(Some(UpdateStatus::Enqueued {
content: EnqueuedUpdateResult {
update_id,
update_type: update.data.update_type(),
enqueued_at: update.enqueued_at,
},
})),
None => Ok(None),
},
}
}
pub fn next_update_id(
update_writer: &mut heed::RwTxn<UpdateT>,
updates_store: store::Updates,
updates_results_store: store::UpdatesResults,
) -> ZResult<u64> {
let last_update = updates_store.last_update(update_writer)?;
let last_update = last_update.map(|(n, _)| n);
let last_update_results_id = updates_results_store.last_update(update_writer)?;
let last_update_results_id = last_update_results_id.map(|(n, _)| n);
let max_update_id = cmp::max(last_update, last_update_results_id);
let new_update_id = max_update_id.map_or(0, |n| n + 1);
Ok(new_update_id)
}
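
`next_update_id` compares two `Option<u64>` values with `cmp::max`. This works because `Option` orders `None` below every `Some(_)`, so the maximum is the largest id present in either store, and `map_or(0, ...)` yields 0 when both stores are empty. A standalone sketch, where the two parameters are hypothetical stand-ins for the values read from the updates and updates-results stores:

```rust
use std::cmp;

/// Sketch of the id computation only; the store lookups are replaced by
/// plain `Option<u64>` arguments (an assumption made for illustration).
fn next_update_id(last_pending: Option<u64>, last_processed: Option<u64>) -> u64 {
    // `None < Some(_)`, so `max` picks the largest id that exists, if any.
    let max_update_id = cmp::max(last_pending, last_processed);
    // No update anywhere yet: start at 0. Otherwise continue after the max.
    max_update_id.map_or(0, |n| n + 1)
}
```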
pub fn update_task(
writer: &mut heed::RwTxn<MainT>,
index: &store::Index,
update_id: u64,
update: Update,
) -> MResult<ProcessedUpdateResult> {
debug!("Processing update number {}", update_id);
let Update { enqueued_at, data } = update;
let (update_type, result, duration) = match data {
UpdateData::ClearAll => {
let start = Instant::now();
let update_type = UpdateType::ClearAll;
let result = apply_clear_all(writer, index);
(update_type, result, start.elapsed())
}
UpdateData::Customs(customs) => {
let start = Instant::now();
let update_type = UpdateType::Customs;
let result = apply_customs_update(writer, index.main, &customs).map_err(Into::into);
(update_type, result, start.elapsed())
}
UpdateData::DocumentsAddition(documents) => {
let start = Instant::now();
let update_type = UpdateType::DocumentsAddition {
number: documents.len(),
};
let result = apply_documents_addition(writer, index, documents);
(update_type, result, start.elapsed())
}
UpdateData::DocumentsPartial(documents) => {
let start = Instant::now();
let update_type = UpdateType::DocumentsPartial {
number: documents.len(),
};
let result = apply_documents_partial_addition(writer, index, documents);
(update_type, result, start.elapsed())
}
UpdateData::DocumentsDeletion(documents) => {
let start = Instant::now();
let update_type = UpdateType::DocumentsDeletion {
number: documents.len(),
};
let result = apply_documents_deletion(writer, index, documents);
(update_type, result, start.elapsed())
}
UpdateData::Settings(settings) => {
let start = Instant::now();
let update_type = UpdateType::Settings {
settings: settings.clone(),
};
let result = apply_settings_update(
writer,
index,
*settings,
);
(update_type, result, start.elapsed())
}
};
debug!(
"Processed update number {} {:?} {:?}",
update_id, update_type, result
);
let status = ProcessedUpdateResult {
update_id,
update_type,
error: result.as_ref().map_err(|e| e.to_string()).err(),
error_code: result.as_ref().map_err(|e| e.error_name()).err(),
error_type: result.as_ref().map_err(|e| e.error_type()).err(),
error_link: result.as_ref().map_err(|e| e.error_url()).err(),
duration: duration.as_secs_f64(),
enqueued_at,
processed_at: Utc::now(),
};
Ok(status)
}
fn compute_short_prefixes<A>(
writer: &mut heed::RwTxn<MainT>,
words_fst: &fst::Set<A>,
index: &store::Index,
) -> MResult<()>
where A: AsRef<[u8]>,
{
// clear the prefixes
let pplc_store = index.prefix_postings_lists_cache;
pplc_store.clear(writer)?;
for prefix_len in 1..=2 {
// compute prefixes and store those in the PrefixPostingsListsCache store.
let mut previous_prefix: Option<([u8; 4], Vec<_>)> = None;
let mut stream = words_fst.into_stream();
while let Some(input) = stream.next() {
// We skip the prefixes that are shorter than the current length
// we want to cache (<). We must ignore the input when it is exactly the
// same word as the prefix because if we match exactly on it we need
// to consider it as an exact match and not as a prefix (=).
if input.len() <= prefix_len { continue }
if let Some(postings_list) = index.postings_lists.postings_list(writer, input)?.map(|p| p.matches.into_owned()) {
let prefix = &input[..prefix_len];
let mut arr_prefix = [0; 4];
arr_prefix[..prefix_len].copy_from_slice(prefix);
match previous_prefix {
Some((ref mut prev_prefix, ref mut prev_pl)) if *prev_prefix != arr_prefix => {
prev_pl.sort_unstable();
prev_pl.dedup();
if let Ok(prefix) = std::str::from_utf8(&prev_prefix[..prefix_len]) {
debug!("writing the prefix of {:?} of length {}", prefix, prev_pl.len());
}
let pls = Set::new_unchecked(&prev_pl);
pplc_store.put_prefix_postings_list(writer, *prev_prefix, &pls)?;
*prev_prefix = arr_prefix;
prev_pl.clear();
prev_pl.extend_from_slice(&postings_list);
},
Some((_, ref mut prev_pl)) => prev_pl.extend_from_slice(&postings_list),
None => previous_prefix = Some((arr_prefix, postings_list.to_vec())),
}
}
}
// write the last prefix postings lists
if let Some((prev_prefix, mut prev_pl)) = previous_prefix.take() {
prev_pl.sort_unstable();
prev_pl.dedup();
let pls = Set::new_unchecked(&prev_pl);
pplc_store.put_prefix_postings_list(writer, prev_prefix, &pls)?;
}
}
Ok(())
}
fn cache_document_ids_sorted(
writer: &mut heed::RwTxn<MainT>,
ranked_map: &RankedMap,
index: &store::Index,
document_ids: &mut [DocumentId],
) -> MResult<()> {
crate::bucket_sort::placeholder_document_sort(document_ids, index, writer, ranked_map)?;
index.main.put_sorted_document_ids_cache(writer, &document_ids)
}
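
`compute_short_prefixes` caches, for every 1-byte and 2-byte prefix, the merged postings of all longer words starting with that prefix; a word exactly as long as the prefix is skipped because it counts as an exact match. The sketch below reproduces that grouping with simplifications that are assumptions, not the original design: a `BTreeMap` of words to `u32` ids replaces the FST stream and the postings store, keys are `Vec<u8>` instead of the padded `[u8; 4]` array, and `short_prefix_postings` is a hypothetical name.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper: builds the prefix cache for prefixes of length 1 and 2.
fn short_prefix_postings(
    postings: &BTreeMap<String, Vec<u32>>,
) -> BTreeMap<Vec<u8>, Vec<u32>> {
    let mut cache: BTreeMap<Vec<u8>, Vec<u32>> = BTreeMap::new();
    for prefix_len in 1..=2 {
        for (word, ids) in postings {
            let bytes = word.as_bytes();
            // Words no longer than the prefix are exact matches, not prefix matches.
            if bytes.len() <= prefix_len {
                continue;
            }
            cache
                .entry(bytes[..prefix_len].to_vec())
                .or_default()
                .extend_from_slice(ids);
        }
    }
    // Each cached list must be sorted and free of duplicates.
    for ids in cache.values_mut() {
        ids.sort_unstable();
        ids.dedup();
    }
    cache
}
```

The original streams words in sorted order and flushes each prefix as soon as the next one starts, which avoids holding every prefix list in memory at once; the map-based version trades that for brevity.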

@ -1,313 +0,0 @@
use std::collections::{BTreeMap, BTreeSet};
use heed::Result as ZResult;
use fst::{set::OpBuilder, SetBuilder};
use sdset::SetBuf;
use meilisearch_schema::Schema;
use crate::database::{MainT, UpdateT};
use crate::settings::{UpdateState, SettingsUpdate, RankingRule};
use crate::update::documents_addition::reindex_all_documents;
use crate::update::{next_update_id, Update};
use crate::{store, MResult, Error};
pub fn push_settings_update(
writer: &mut heed::RwTxn<UpdateT>,
updates_store: store::Updates,
updates_results_store: store::UpdatesResults,
settings: SettingsUpdate,
) -> ZResult<u64> {
let last_update_id = next_update_id(writer, updates_store, updates_results_store)?;
let update = Update::settings(settings);
updates_store.put_update(writer, last_update_id, &update)?;
Ok(last_update_id)
}
pub fn apply_settings_update(
writer: &mut heed::RwTxn<MainT>,
index: &store::Index,
settings: SettingsUpdate,
) -> MResult<()> {
let mut must_reindex = false;
let mut schema = match index.main.schema(writer)? {
Some(schema) => schema,
None => {
match settings.primary_key.clone() {
UpdateState::Update(id) => Schema::with_primary_key(&id),
_ => return Err(Error::MissingPrimaryKey)
}
}
};
match settings.ranking_rules {
UpdateState::Update(v) => {
let ranked_field: Vec<&str> = v.iter().filter_map(RankingRule::field).collect();
schema.update_ranked(&ranked_field)?;
index.main.put_ranking_rules(writer, &v)?;
must_reindex = true;
},
UpdateState::Clear => {
index.main.delete_ranking_rules(writer)?;
schema.clear_ranked();
must_reindex = true;
},
UpdateState::Nothing => (),
}
match settings.distinct_attribute {
UpdateState::Update(v) => {
let field_id = schema.insert(&v)?;
index.main.put_distinct_attribute(writer, field_id)?;
},
UpdateState::Clear => {
index.main.delete_distinct_attribute(writer)?;
},
UpdateState::Nothing => (),
}
match settings.searchable_attributes.clone() {
UpdateState::Update(v) => {
if v.iter().any(|e| e == "*") || v.is_empty() {
schema.set_all_searchable();
} else {
schema.update_searchable(v)?;
}
must_reindex = true;
},
UpdateState::Clear => {
schema.set_all_searchable();
must_reindex = true;
},
UpdateState::Nothing => (),
}
match settings.displayed_attributes.clone() {
UpdateState::Update(v) => {
if v.contains("*") || v.is_empty() {
schema.set_all_displayed();
} else {
schema.update_displayed(v)?
}
},
UpdateState::Clear => {
schema.set_all_displayed();
},
UpdateState::Nothing => (),
}
match settings.attributes_for_faceting {
UpdateState::Update(attrs) => {
apply_attributes_for_faceting_update(writer, index, &mut schema, &attrs)?;
must_reindex = true;
},
UpdateState::Clear => {
index.main.delete_attributes_for_faceting(writer)?;
index.facets.clear(writer)?;
},
UpdateState::Nothing => (),
}
index.main.put_schema(writer, &schema)?;
match settings.stop_words {
UpdateState::Update(stop_words) => {
if apply_stop_words_update(writer, index, stop_words)? {
must_reindex = true;
}
},
UpdateState::Clear => {
if apply_stop_words_update(writer, index, BTreeSet::new())? {
must_reindex = true;
}
},
UpdateState::Nothing => (),
}
match settings.synonyms {
UpdateState::Update(synonyms) => apply_synonyms_update(writer, index, synonyms)?,
UpdateState::Clear => apply_synonyms_update(writer, index, BTreeMap::new())?,
UpdateState::Nothing => (),
}
if must_reindex {
reindex_all_documents(writer, index)?;
}
Ok(())
}
fn apply_attributes_for_faceting_update(
writer: &mut heed::RwTxn<MainT>,
index: &store::Index,
schema: &mut Schema,
attributes: &[String]
) -> MResult<()> {
let mut attribute_ids = Vec::new();
for name in attributes {
attribute_ids.push(schema.insert(name)?);
}
let attributes_for_faceting = SetBuf::from_dirty(attribute_ids);
index.main.put_attributes_for_faceting(writer, &attributes_for_faceting)?;
Ok(())
}
pub fn apply_stop_words_update(
writer: &mut heed::RwTxn<MainT>,
index: &store::Index,
stop_words: BTreeSet<String>,
) -> MResult<bool>
{
let mut must_reindex = false;
let old_stop_words: BTreeSet<String> = index.main
.stop_words_fst(writer)?
.stream()
.into_strs()?
.into_iter()
.collect();
let deletion: BTreeSet<String> = old_stop_words.difference(&stop_words).cloned().collect();
let addition: BTreeSet<String> = stop_words.difference(&old_stop_words).cloned().collect();
if !addition.is_empty() {
apply_stop_words_addition(writer, index, addition)?;
}
if !deletion.is_empty() {
must_reindex = true;
apply_stop_words_deletion(writer, index, deletion)?;
}
let words_fst = index.main.words_fst(writer)?;
if !words_fst.is_empty() {
let stop_words = fst::Set::from_iter(stop_words)?;
let op = OpBuilder::new()
.add(&words_fst)
.add(&stop_words)
.difference();
let mut builder = fst::SetBuilder::memory();
builder.extend_stream(op)?;
let words_fst = builder.into_set();
index.main.put_words_fst(writer, &words_fst)?;
index.main.put_stop_words_fst(writer, &stop_words)?;
}
Ok(must_reindex)
}
fn apply_stop_words_addition(
writer: &mut heed::RwTxn<MainT>,
index: &store::Index,
addition: BTreeSet<String>,
) -> MResult<()>
{
let main_store = index.main;
let postings_lists_store = index.postings_lists;
let mut stop_words_builder = SetBuilder::memory();
for word in addition {
stop_words_builder.insert(&word)?;
// we remove every postings list associated with a new stop word
postings_lists_store.del_postings_list(writer, word.as_bytes())?;
}
// create the new delta stop words fst
let delta_stop_words = stop_words_builder.into_set();
// we also need to remove all the stop words from the main fst
let words_fst = main_store.words_fst(writer)?;
if !words_fst.is_empty() {
let op = OpBuilder::new()
.add(&words_fst)
.add(&delta_stop_words)
.difference();
let mut word_fst_builder = SetBuilder::memory();
word_fst_builder.extend_stream(op)?;
let word_fst = word_fst_builder.into_set();
main_store.put_words_fst(writer, &word_fst)?;
}
// now we add all of these stop words to the main store
let stop_words_fst = main_store.stop_words_fst(writer)?;
let op = OpBuilder::new()
.add(&stop_words_fst)
.add(&delta_stop_words)
.r#union();
let mut stop_words_builder = SetBuilder::memory();
stop_words_builder.extend_stream(op)?;
let stop_words_fst = stop_words_builder.into_set();
main_store.put_stop_words_fst(writer, &stop_words_fst)?;
Ok(())
}
fn apply_stop_words_deletion(
writer: &mut heed::RwTxn<MainT>,
index: &store::Index,
deletion: BTreeSet<String>,
) -> MResult<()> {
let mut stop_words_builder = SetBuilder::memory();
for word in deletion {
stop_words_builder.insert(&word)?;
}
// create the new delta stop words fst
let delta_stop_words = stop_words_builder.into_set();
// now we delete all of these stop words from the main store
let stop_words_fst = index.main.stop_words_fst(writer)?;
let op = OpBuilder::new()
.add(&stop_words_fst)
.add(&delta_stop_words)
.difference();
let mut stop_words_builder = SetBuilder::memory();
stop_words_builder.extend_stream(op)?;
let stop_words_fst = stop_words_builder.into_set();
Ok(index.main.put_stop_words_fst(writer, &stop_words_fst)?)
}
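
The stop-word helpers above reduce to two set operations: the word set loses every newly added stop word (`difference`), and the stop-word set gains or loses the delta (`union` / `difference`). A minimal sketch of that algebra, using std's `BTreeSet` as a stand-in for the `fst` sets in the diff; the function names `add_stop_words` and `remove_stop_words` are hypothetical and no storage layer is modelled:

```rust
use std::collections::BTreeSet;

/// Hypothetical stand-in for the addition path: returns the updated
/// `(words, stop_words)` pair once `addition` has been applied.
fn add_stop_words(
    words: &BTreeSet<String>,
    stop_words: &BTreeSet<String>,
    addition: &BTreeSet<String>,
) -> (BTreeSet<String>, BTreeSet<String>) {
    // Every new stop word is removed from the searchable words.
    let new_words = words.difference(addition).cloned().collect();
    // The stop-word set becomes the union of the old set and the delta.
    let new_stop_words = stop_words.union(addition).cloned().collect();
    (new_words, new_stop_words)
}

/// Hypothetical stand-in for the deletion path: only the stop-word set
/// shrinks; in the diff, the caller then flags that a reindex is needed.
fn remove_stop_words(
    stop_words: &BTreeSet<String>,
    deletion: &BTreeSet<String>,
) -> BTreeSet<String> {
    stop_words.difference(deletion).cloned().collect()
}
```

Because both inputs are ordered sets, each operation is a single merge pass, which is the same property the streaming `OpBuilder` calls rely on.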
pub fn apply_synonyms_update(
writer: &mut heed::RwTxn<MainT>,
index: &store::Index,
synonyms: BTreeMap<String, Vec<String>>,
) -> MResult<()> {
let main_store = index.main;
let synonyms_store = index.synonyms;
let mut synonyms_builder = SetBuilder::memory();
synonyms_store.clear(writer)?;
for (word, alternatives) in synonyms.clone() {
synonyms_builder.insert(&word)?;
let alternatives = {
let alternatives = SetBuf::from_dirty(alternatives);
let mut alternatives_builder = SetBuilder::memory();
alternatives_builder.extend_iter(alternatives)?;
alternatives_builder.into_set()
};
synonyms_store.put_synonyms(writer, word.as_bytes(), &alternatives)?;
}
let synonyms_set = synonyms_builder.into_set();
main_store.put_synonyms_fst(writer, &synonyms_set)?;
Ok(())
}

@ -1,8 +1,8 @@
[package]
name = "meilisearch-error"
version = "0.18.0"
version = "0.22.0"
authors = ["marin <postma.marin@protonmail.com>"]
edition = "2018"
[dependencies]
actix-http = "2.2.0"
actix-http = "=3.0.0-beta.6"

@ -63,6 +63,7 @@ pub enum Code {
Facet,
Filter,
Sort,
BadParameter,
BadRequest,
@ -81,7 +82,6 @@ pub enum Code {
}
impl Code {
/// associate a `Code` variant with the actual ErrCode
fn err_code(&self) -> ErrCode {
use Code::*;
@ -94,39 +94,57 @@ impl Code {
// thrown when requesting a nonexistent index
IndexNotFound => ErrCode::invalid("index_not_found", StatusCode::NOT_FOUND),
InvalidIndexUid => ErrCode::invalid("invalid_index_uid", StatusCode::BAD_REQUEST),
OpenIndex => ErrCode::internal("index_not_accessible", StatusCode::INTERNAL_SERVER_ERROR),
OpenIndex => {
ErrCode::internal("index_not_accessible", StatusCode::INTERNAL_SERVER_ERROR)
}
// invalid state error
InvalidState => ErrCode::internal("invalid_state", StatusCode::INTERNAL_SERVER_ERROR),
// thrown when no primary key has been set
MissingPrimaryKey => ErrCode::invalid("missing_primary_key", StatusCode::BAD_REQUEST),
// error thrown when trying to set an already existing primary key
PrimaryKeyAlreadyPresent => ErrCode::invalid("primary_key_already_present", StatusCode::BAD_REQUEST),
PrimaryKeyAlreadyPresent => {
ErrCode::invalid("primary_key_already_present", StatusCode::BAD_REQUEST)
}
// invalid document
MaxFieldsLimitExceeded => ErrCode::invalid("max_fields_limit_exceeded", StatusCode::BAD_REQUEST),
MaxFieldsLimitExceeded => {
ErrCode::invalid("max_fields_limit_exceeded", StatusCode::BAD_REQUEST)
}
MissingDocumentId => ErrCode::invalid("missing_document_id", StatusCode::BAD_REQUEST),
// error related to facets
Facet => ErrCode::invalid("invalid_facet", StatusCode::BAD_REQUEST),
// error related to filters
Filter => ErrCode::invalid("invalid_filter", StatusCode::BAD_REQUEST),
// error related to sorts
Sort => ErrCode::invalid("invalid_sort", StatusCode::BAD_REQUEST),
BadParameter => ErrCode::invalid("bad_parameter", StatusCode::BAD_REQUEST),
BadRequest => ErrCode::invalid("bad_request", StatusCode::BAD_REQUEST),
DocumentNotFound => ErrCode::invalid("document_not_found", StatusCode::NOT_FOUND),
Internal => ErrCode::internal("internal", StatusCode::INTERNAL_SERVER_ERROR),
InvalidToken => ErrCode::authentication("invalid_token", StatusCode::FORBIDDEN),
MissingAuthorizationHeader => ErrCode::authentication("missing_authorization_header", StatusCode::UNAUTHORIZED),
MissingAuthorizationHeader => {
ErrCode::authentication("missing_authorization_header", StatusCode::UNAUTHORIZED)
}
NotFound => ErrCode::invalid("not_found", StatusCode::NOT_FOUND),
PayloadTooLarge => ErrCode::invalid("payload_too_large", StatusCode::PAYLOAD_TOO_LARGE),
RetrieveDocument => ErrCode::internal("unretrievable_document", StatusCode::BAD_REQUEST),
RetrieveDocument => {
ErrCode::internal("unretrievable_document", StatusCode::BAD_REQUEST)
}
SearchDocuments => ErrCode::internal("search_error", StatusCode::BAD_REQUEST),
UnsupportedMediaType => ErrCode::invalid("unsupported_media_type", StatusCode::UNSUPPORTED_MEDIA_TYPE),
UnsupportedMediaType => {
ErrCode::invalid("unsupported_media_type", StatusCode::UNSUPPORTED_MEDIA_TYPE)
}
// error related to dump
DumpAlreadyInProgress => ErrCode::invalid("dump_already_in_progress", StatusCode::CONFLICT),
DumpProcessFailed => ErrCode::internal("dump_process_failed", StatusCode::INTERNAL_SERVER_ERROR),
DumpAlreadyInProgress => {
ErrCode::invalid("dump_already_in_progress", StatusCode::CONFLICT)
}
DumpProcessFailed => {
ErrCode::internal("dump_process_failed", StatusCode::INTERNAL_SERVER_ERROR)
}
}
}
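
The `err_code` match above maps each `Code` variant to a machine-readable name, an HTTP status, and an error category chosen by the constructor (`invalid`, `internal`, `authentication`). A reduced sketch of that shape follows; it is an assumption-laden simplification that uses a plain `u16` instead of `StatusCode`, a hypothetical `ErrorType` enum for the category, and only three of the variants:

```rust
/// Hypothetical error category; the real crate may name these differently.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ErrorType {
    Internal,
    Invalid,
    Authentication,
}

/// Simplified error descriptor: HTTP status as a bare `u16`.
struct ErrCode {
    name: &'static str,
    status: u16,
    kind: ErrorType,
}

impl ErrCode {
    fn invalid(name: &'static str, status: u16) -> Self {
        ErrCode { name, status, kind: ErrorType::Invalid }
    }
    fn internal(name: &'static str, status: u16) -> Self {
        ErrCode { name, status, kind: ErrorType::Internal }
    }
    fn authentication(name: &'static str, status: u16) -> Self {
        ErrCode { name, status, kind: ErrorType::Authentication }
    }
}

/// Three variants taken from the diff, enough to show the mapping.
enum Code {
    IndexNotFound,
    Internal,
    MissingAuthorizationHeader,
}

impl Code {
    /// Associate a `Code` variant with its descriptor.
    fn err_code(&self) -> ErrCode {
        match self {
            Code::IndexNotFound => ErrCode::invalid("index_not_found", 404),
            Code::Internal => ErrCode::internal("internal", 500),
            Code::MissingAuthorizationHeader => {
                ErrCode::authentication("missing_authorization_header", 401)
            }
        }
    }
}
```

Keeping the mapping in one exhaustive `match` means adding a variant such as `Sort` fails to compile until its name and status are declared.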

@ -1,85 +1,109 @@
[package]
name = "meilisearch-http"
authors = ["Quentin de Quelen <quentin@dequelen.me>", "Clément Renault <clement@meilisearch.com>"]
description = "MeiliSearch HTTP server"
version = "0.18.0"
license = "MIT"
authors = [
"Quentin de Quelen <quentin@dequelen.me>",
"Clément Renault <clement@meilisearch.com>",
]
edition = "2018"
license = "MIT"
name = "meilisearch-http"
version = "0.22.0"
[[bin]]
name = "meilisearch"
path = "src/main.rs"
[features]
default = ["sentry"]
[build-dependencies]
actix-web-static-files = { git = "https://github.com/MarinPostma/actix-web-static-files.git", rev = "6db8c3e", optional = true }
anyhow = { version = "*", optional = true }
cargo_toml = { version = "0.9.0", optional = true }
hex = { version = "0.4.3", optional = true }
reqwest = { version = "0.11.3", features = ["blocking", "rustls-tls"], default-features = false, optional = true }
sha-1 = { version = "0.9.4", optional = true }
tempfile = { version = "3.1.0", optional = true }
vergen = { version = "5.1.15", default-features = false, features = ["git"] }
zip = { version = "0.5.12", optional = true }
[dependencies]
actix-cors = "0.5.4"
actix-http = "2.2.0"
actix-rt = "1.1.1"
actix-service = "1.0.6"
actix-web = { version = "3.3.2", features = ["rustls"] }
bytes = "1.0.0"
actix-cors = { git = "https://github.com/MarinPostma/actix-extras.git", rev = "2dac1a4"}
actix-http = { version = "=3.0.0-beta.6" }
actix-service = "2.0.0"
actix-web = { version = "=4.0.0-beta.6", features = ["rustls"] }
actix-web-static-files = { git = "https://github.com/MarinPostma/actix-web-static-files.git", rev = "6db8c3e", optional = true }
anyhow = "1.0.36"
async-stream = "0.3.0"
async-trait = "0.1.42"
arc-swap = "1.2.0"
byte-unit = { version = "4.0.9", default-features = false, features = ["std"] }
bytes = "0.6.0"
chrono = { version = "0.4.19", features = ["serde"] }
crossbeam-channel = "0.5.0"
either = "1.6.1"
env_logger = "0.8.2"
flate2 = "1.0.19"
futures = "0.3.8"
http = "0.2.2"
indexmap = { version = "1.6.1", features = ["serde-1"] }
log = "0.4.11"
main_error = "0.1.1"
meilisearch-core = { path = "../meilisearch-core", version = "0.18.0" }
meilisearch-error = { path = "../meilisearch-error", version = "0.18.0" }
meilisearch-schema = { path = "../meilisearch-schema", version = "0.18.0" }
fst = "0.4.5"
futures = "0.3.7"
futures-util = "0.3.8"
heed = { git = "https://github.com/Kerollmops/heed", tag = "v0.12.1" }
http = "0.2.1"
indexmap = { version = "1.3.2", features = ["serde-1"] }
itertools = "0.10.0"
log = "0.4.8"
main_error = "0.1.0"
meilisearch-error = { path = "../meilisearch-error" }
meilisearch-tokenizer = { git = "https://github.com/meilisearch/tokenizer.git", tag = "v0.2.5" }
memmap = "0.7.0"
milli = { git = "https://github.com/meilisearch/milli.git", tag = "v0.12.0" }
mime = "0.3.16"
num_cpus = "1.13.0"
once_cell = "1.5.2"
rand = "0.8.1"
parking_lot = "0.11.1"
rand = "0.7.3"
rayon = "1.5.0"
regex = "1.4.2"
rustls = "0.18.0"
serde = { version = "1.0.118", features = ["derive"] }
serde_json = { version = "1.0.61", features = ["preserve_order"] }
serde_qs = "0.8.2"
sha2 = "0.9.2"
siphasher = "0.3.3"
rustls = "0.19"
serde = { version = "1.0", features = ["derive"] }
serde_json = { version = "1.0.59", features = ["preserve_order"] }
sha2 = "0.9.1"
siphasher = "0.3.2"
slice-group-by = "0.2.6"
structopt = "0.3.21"
tar = "0.4.30"
structopt = "0.3.20"
tar = "0.4.29"
tempfile = "3.1.0"
tokio = { version = "0.2", features = ["macros"] }
ureq = { version = "2.0.0", features = ["tls"], default-features = false }
walkdir = "2.3.1"
whoami = "1.0.3"
[dependencies.sentry]
version = "0.18.1"
default-features = false
features = [
"with_client_implementation",
"with_panic",
"with_failure",
"with_device_info",
"with_rust_info",
"with_reqwest_transport",
"with_rustls",
"with_env_logger"
]
optional = true
thiserror = "1.0.24"
tokio = { version = "1", features = ["full"] }
uuid = { version = "0.8.2", features = ["serde"] }
walkdir = "2.3.2"
obkv = "0.2.0"
pin-project = "1.0.7"
whoami = { version = "1.1.2", optional = true }
reqwest = { version = "0.11.3", features = ["json", "rustls-tls"], default-features = false, optional = true }
serdeval = "0.1.0"
sysinfo = "0.20.0"
[dev-dependencies]
serde_url_params = "0.2.0"
actix-rt = "2.1.0"
assert-json-diff = { branch = "master", git = "https://github.com/qdequele/assert-json-diff" }
mockall = "0.9.1"
paste = "1.0.5"
serde_url_params = "0.2.1"
tempdir = "0.3.7"
tokio = { version = "0.2", features = ["macros", "time"] }
urlencoding = "1.1.1"
[dev-dependencies.assert-json-diff]
git = "https://github.com/qdequele/assert-json-diff"
branch = "master"
[features]
mini-dashboard = [
"actix-web-static-files",
"anyhow",
"cargo_toml",
"hex",
"reqwest",
"sha-1",
"tempfile",
"zip",
]
analytics = ["whoami", "reqwest"]
default = ["analytics", "mini-dashboard"]
[build-dependencies]
vergen = "3.1.0"
[target.'cfg(unix)'.dependencies]
[target.'cfg(target_os = "linux")'.dependencies]
jemallocator = "0.3.2"
[package.metadata.mini-dashboard]
assets-url = "https://github.com/meilisearch/mini-dashboard/releases/download/v0.1.4/build.zip"
sha1 = "750e8a8e56cfa61fbf9ead14b08a5f17ad3f3d37"

@ -1,10 +1,86 @@
use vergen::{generate_cargo_keys, ConstantsFlags};
use vergen::{vergen, Config};
fn main() {
// Setup the flags, toggling off the 'SEMVER_FROM_CARGO_PKG' flag
let mut flags = ConstantsFlags::all();
flags.toggle(ConstantsFlags::SEMVER_FROM_CARGO_PKG);
if let Err(e) = vergen(Config::default()) {
println!("cargo:warning=vergen: {}", e);
}
// Generate the 'cargo:' key output
generate_cargo_keys(ConstantsFlags::all()).expect("Unable to generate the cargo keys!");
#[cfg(feature = "mini-dashboard")]
mini_dashboard::setup_mini_dashboard().expect("Could not load the mini-dashboard assets");
}
#[cfg(feature = "mini-dashboard")]
mod mini_dashboard {
use std::env;
use std::fs::{create_dir_all, File, OpenOptions};
use std::io::{Cursor, Read, Write};
use std::path::PathBuf;
use actix_web_static_files::resource_dir;
use anyhow::Context;
use cargo_toml::Manifest;
use reqwest::blocking::get;
use sha1::{Digest, Sha1};
pub fn setup_mini_dashboard() -> anyhow::Result<()> {
let cargo_manifest_dir = PathBuf::from(env::var("CARGO_MANIFEST_DIR").unwrap());
let cargo_toml = cargo_manifest_dir.join("Cargo.toml");
let out_dir = PathBuf::from(env::var("OUT_DIR").unwrap());
let sha1_path = out_dir.join(".mini-dashboard.sha1");
let dashboard_dir = out_dir.join("mini-dashboard");
let manifest = Manifest::from_path(cargo_toml).unwrap();
let meta = &manifest
.package
.as_ref()
.context("package not specified in Cargo.toml")?
.metadata
.as_ref()
.context("no metadata specified in Cargo.toml")?["mini-dashboard"];
// Check if there already is a dashboard built, and if it is up to date.
if sha1_path.exists() && dashboard_dir.exists() {
let mut sha1_file = File::open(&sha1_path)?;
let mut sha1 = String::new();
sha1_file.read_to_string(&mut sha1)?;
if sha1 == meta["sha1"].as_str().unwrap() {
// Nothing to do.
return Ok(());
}
}
let url = meta["assets-url"].as_str().unwrap();
let dashboard_assets_bytes = get(url)?.bytes()?;
let mut hasher = Sha1::new();
hasher.update(&dashboard_assets_bytes);
let sha1 = hex::encode(hasher.finalize());
assert_eq!(
meta["sha1"].as_str().unwrap(),
sha1,
"Downloaded mini-dashboard shasum differs from the one specified in the Cargo.toml"
);
create_dir_all(&dashboard_dir)?;
let cursor = Cursor::new(&dashboard_assets_bytes);
let mut zip = zip::read::ZipArchive::new(cursor)?;
zip.extract(&dashboard_dir)?;
resource_dir(&dashboard_dir).build()?;
// Write the sha1 for the dashboard back to file.
let mut file = OpenOptions::new()
.write(true)
.create(true)
.truncate(true)
.open(sha1_path)?;
file.write_all(sha1.as_bytes())?;
file.flush()?;
Ok(())
}
}
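
The build script above skips the download when a marker file already holds the expected SHA-1 and the extracted directory exists. That cache check can be sketched in isolation with std only; the helper name `needs_download` is hypothetical, and hashing, downloading and unzipping are left out:

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Returns `Ok(false)` when the cached assets are up to date, i.e. the
/// marker file and the assets directory both exist and the stored hash
/// equals `expected_sha1`. Any other state means a (re)download is needed.
fn needs_download(
    sha1_path: &Path,
    dashboard_dir: &Path,
    expected_sha1: &str,
) -> io::Result<bool> {
    if sha1_path.exists() && dashboard_dir.exists() {
        // The marker stores the hex digest written after the last extraction.
        let stored = fs::read_to_string(sha1_path)?;
        if stored == expected_sha1 {
            return Ok(false);
        }
    }
    Ok(true)
}
```

Writing the marker only after extraction succeeds, as the diff does, means an interrupted build is retried rather than treated as cached.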

File diff suppressed because one or more lines are too long

@ -1,333 +0,0 @@
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<link rel="stylesheet" href="bulma.min.css">
<title>MeiliSearch</title>
<style>
em {
color: hsl(204, 86%, 25%);
font-style: inherit;
background-color: hsl(204, 86%, 88%);
}
#results {
max-width: 900px;
margin: 20px auto 0 auto;
padding: 0;
}
.notification {
display: flex;
justify-content: center;
}
.level-left {
margin-right: 50px;
}
.document {
border-radius: 4px;
margin-bottom: 20px;
display: flex;
}
.document ol {
flex: 0 0 75%;
max-width: 75%;
padding: 0;
margin: 0;
list-style-type: none;
}
.document ol li {
list-style: none;
}
.document .image {
max-width: 50%;
margin: 0 auto;
box-sizing: border-box;
}
@media screen and (min-width: 770px) {
.document .image {
max-width: 25%;
flex: 0 0 25%;
margin: 0;
padding-left: 30px;
box-sizing: border-box;
}
}
.document .image img {
width: 100%;
}
.attribute {
text-align: center;
box-sizing: border-box;
text-transform: uppercase;
font-weight: bold;
color: rgba(0,0,0,.7);
}
@media screen and (min-width: 770px) {
.attribute {
flex: 0 0 25%;
max-width: 25%;
text-align: right;
padding-right: 10px;
font-weight: normal;
box-sizing: border-box;
}
}
@media screen and (max-width: 770px) {
.attribute {
padding-bottom: 0;
}
}
.content {
flex: 0 0 75%;
box-sizing: border-box;
color: rgba(0,0,0,.9);
overflow-wrap: anywhere;
}
.hero-foot {
padding-bottom: 3rem;
}
@media screen and (max-width: 770px) {
.align-on-mobile {
text-align: center;
}
}
</style>
</head>
<body>
<section class="hero is-light">
<div class="hero-body">
<div class="container">
<div class="content is-medium align-on-mobile">
<h1 class="title is-1 is-spaced">
Welcome to MeiliSearch
</h1>
<p class="subtitle is-4">
This dashboard will help you check the search results with ease.
</p>
</div>
<div class="columns">
<div class="column is-4">
<div class="field">
<!-- API Key -->
<label class="label" for="apiKey">API Key (optional)</label>
<div class="control">
<input id="apiKey" class="input is-small" type="password" placeholder="Enter your API key">
</div>
<p class="help">At least a private API key is required for the dashboard to access the indexes list.</p>
</div>
</div>
</div>
<div class="columns">
<div class="column is-8">
<label class="label" for="search">Search something</label>
<div class="field has-addons">
<div class="control">
<span class="select">
<select role="listbox" id="index" aria-label="Select the index you want to search on">
<!-- indexes names -->
</select>
</span>
</div>
<div class="control is-expanded">
<input id="search" class="input" type="search" autofocus placeholder="e.g. George Clooney" aria-label="Search through your documents">
</div>
</div>
</div>
<div class="column is-4">
<div class="columns">
<div class="column is-6 has-text-centered">
<p class="heading">Documents</p>
<p id="count" class="title">0</p>
</div>
<div class="column is-6 has-text-centered">
<p class="heading">Time Spent</p>
<p id="time" class="title">N/A</p>
</div>
</div>
</div>
</div>
</div>
</div>
</section>
<section>
<div class="container">
<ol id="results" class="content">
<!-- documents matching requests -->
</ol>
</div>
</section>
</body>
<script>
function sanitizeHTMLEntities(str) {
if (str && typeof str === 'string') {
str = str.replace(/</g,"&lt;");
str = str.replace(/>/g,"&gt;");
str = str.replace(/&lt;em&gt;/g,"<em>");
str = str.replace(/&lt;\/em&gt;/g,"<\/em>");
}
return str;
}
function httpGet(theUrl, apiKey) {
var xmlHttp = new XMLHttpRequest();
xmlHttp.open("GET", theUrl, false); // false for synchronous request
if (apiKey) {
xmlHttp.setRequestHeader("x-Meili-API-Key", apiKey);
}
xmlHttp.send(null);
return xmlHttp.responseText;
}
function refreshIndexList() {
// TODO we must not block here
let result = JSON.parse(httpGet(`${baseUrl}/indexes`, localStorage.getItem('apiKey')));
if (!Array.isArray(result)) { return }
let select = document.getElementById("index");
select.innerHTML = '';
for (index of result) {
const option = document.createElement('option');
option.value = index.uid;
option.innerHTML = index.name;
select.appendChild(option);
}
}
let lastRequest = undefined;
function triggerSearch() {
var e = document.getElementById("index");
if (e.selectedIndex == -1) { return }
var index = e.options[e.selectedIndex].value;
let theUrl = `${baseUrl}/indexes/${index}/search?q=${encodeURIComponent(search.value)}&attributesToHighlight=*`;
if (lastRequest) { lastRequest.abort() }
lastRequest = new XMLHttpRequest();
lastRequest.open("GET", theUrl, true);
if (localStorage.getItem('apiKey')) {
lastRequest.setRequestHeader("x-Meili-API-Key", localStorage.getItem('apiKey'));
}
lastRequest.onload = function (e) {
if (lastRequest.readyState === 4 && lastRequest.status === 200) {
let sanitizedResponseText = sanitizeHTMLEntities(lastRequest.responseText);
let httpResults = JSON.parse(sanitizedResponseText);
results.innerHTML = '';
let processingTimeMs = httpResults.processingTimeMs;
let numberOfDocuments = httpResults.nbHits;
time.innerHTML = `${processingTimeMs}ms`;
count.innerHTML = `${numberOfDocuments}`;
for (result of httpResults.hits) {
const element = {...result, ...result._formatted };
delete element._formatted;
const elem = document.createElement('li');
elem.classList.add("document","box");
const div = document.createElement('div');
div.classList.add("columns","is-desktop","is-tablet");
const info = document.createElement('div');
info.classList.add("column","align-on-mobile");
let image = undefined;
for (const prop in element) {
// Check if property is an image url link.
if (typeof result[prop] === 'string') {
if (image == undefined && result[prop].match(/^(https|http):\/\/.*(jpe?g|png|gif)(\?.*)?$/g)) {
image = result[prop];
}
}
const field = document.createElement('div');
field.classList.add("columns");
const attribute = document.createElement('div');
attribute.classList.add("attribute", "column");
attribute.innerHTML = prop;
const content = document.createElement('div');
content.classList.add("content", "column");
if (typeof (element[prop]) === "object") {
content.innerHTML = JSON.stringify(element[prop]);
} else {
content.innerHTML = element[prop];
}
field.appendChild(attribute);
field.appendChild(content);
info.appendChild(field);
}
div.appendChild(info);
elem.appendChild(div);
if (image != undefined) {
const divImage = document.createElement('div');
divImage.classList.add("image","column","align-on-mobile");
const img = document.createElement('img');
img.src = image;
img.setAttribute("alt","Item illustration");
divImage.appendChild(img);
div.appendChild(divImage);
elem.appendChild(div);
}
results.appendChild(elem)
}
} else {
console.error(lastRequest.statusText);
}
};
lastRequest.send(null);
}
if (!apiKey.value) {
apiKey.value = localStorage.getItem('apiKey');
}
apiKey.addEventListener('input', function(e) {
localStorage.setItem('apiKey', apiKey.value);
refreshIndexList();
}, false);
let baseUrl = window.location.origin;
refreshIndexList();
search.oninput = triggerSearch;
let select = document.getElementById("index");
select.onchange = triggerSearch;
triggerSearch();
</script>
</html>

@ -1,12 +1,9 @@
use std::hash::{Hash, Hasher};
use std::{error, thread};
use std::time::{Duration, Instant, SystemTime, UNIX_EPOCH};
use log::error;
use log::debug;
use serde::Serialize;
use serde_qs as qs;
use siphasher::sip::SipHasher;
use walkdir::WalkDir;
use crate::Data;
use crate::Opt;
@ -21,31 +18,21 @@ struct EventProperties {
}
impl EventProperties {
fn from(data: Data) -> Result<EventProperties, Box<dyn error::Error>> {
let mut index_list = Vec::new();
async fn from(data: Data) -> anyhow::Result<EventProperties> {
let stats = data.index_controller.get_all_stats().await?;
let reader = data.db.main_read_txn()?;
for index_uid in data.db.indexes_uids() {
if let Some(index) = data.db.open_index(&index_uid) {
let number_of_documents = index.main.number_of_documents(&reader)?;
index_list.push(number_of_documents);
}
}
let database_size = WalkDir::new(&data.db_path)
.into_iter()
.filter_map(|entry| entry.ok())
.filter_map(|entry| entry.metadata().ok())
.filter(|metadata| metadata.is_file())
.fold(0, |acc, m| acc + m.len());
let last_update_timestamp = data.db.last_update(&reader)?.map(|u| u.timestamp());
let database_size = stats.database_size;
let last_update_timestamp = stats.last_update.map(|u| u.timestamp());
let number_of_documents = stats
.indexes
.values()
.map(|index| index.number_of_documents)
.collect();
Ok(EventProperties {
database_size,
last_update_timestamp,
number_of_documents: index_list,
number_of_documents,
})
}
}
@ -72,10 +59,10 @@ struct Event<'a> {
#[derive(Debug, Serialize)]
struct AmplitudeRequest<'a> {
api_key: &'a str,
event: &'a str,
events: Vec<Event<'a>>,
}
pub fn analytics_sender(data: Data, opt: Opt) {
pub async fn analytics_sender(data: Data, opt: Opt) {
let username = whoami::username();
let hostname = whoami::hostname();
let platform = whoami::platform();
@ -97,7 +84,7 @@ pub fn analytics_sender(data: Data, opt: Opt) {
let time = n.as_secs();
let event_type = "runtime_tick";
let elapsed_since_start = first_start.elapsed().as_secs() / 86_400; // One day
let event_properties = EventProperties::from(data.clone()).ok();
let event_properties = EventProperties::from(data.clone()).await.ok();
let app_version = env!("CARGO_PKG_VERSION").to_string();
let app_version = app_version.as_str();
let user_email = std::env::var("MEILI_USER_EMAIL").ok();
@ -116,27 +103,24 @@ pub fn analytics_sender(data: Data, opt: Opt) {
time,
app_version,
user_properties,
event_properties
event_properties,
};
let event = serde_json::to_string(&event).unwrap();
let request = AmplitudeRequest {
api_key: AMPLITUDE_API_KEY,
event: &event,
events: vec![event],
};
let body = qs::to_string(&request).unwrap();
let response = ureq::post("https://api.amplitude.com/httpapi").send_string(&body);
match response {
Err(ureq::Error::Status(_ , response)) => {
error!("Unsuccessful call to Amplitude: {}", response.into_string().unwrap_or_default());
}
Err(e) => {
error!("Unsuccessful call to Amplitude: {}", e);
}
_ => (),
let response = reqwest::Client::new()
.post("https://api2.amplitude.com/2/httpapi")
.timeout(Duration::from_secs(60)) // 1 minute max
.json(&request)
.send()
.await;
if let Err(e) = response {
debug!("Unsuccessful call to Amplitude: {}", e);
}
thread::sleep(Duration::from_secs(3600)) // one hour
tokio::time::sleep(Duration::from_secs(3600)).await;
}
}
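
In the analytics change above, `EventProperties::from` no longer walks the database itself; it takes a precomputed stats value and collects one document count per index. A minimal sketch of that aggregation, assuming hypothetical `Stats` and `IndexStats` shapes that mirror only the fields the diff reads (the real types live in `index_controller` and carry more data):

```rust
use std::collections::BTreeMap;

/// Hypothetical per-index statistics, reduced to the one field used here.
struct IndexStats {
    number_of_documents: u64,
}

/// Hypothetical global statistics keyed by index uid.
struct Stats {
    database_size: u64,
    indexes: BTreeMap<String, IndexStats>,
}

/// Collect the document count of every index, in key order.
fn documents_per_index(stats: &Stats) -> Vec<u64> {
    stats
        .indexes
        .values()
        .map(|index| index.number_of_documents)
        .collect()
}
```

With an ordered map the resulting vector follows the sorted index uids, so two ticks over the same database report the counts in the same order.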

@ -1,175 +0,0 @@
use std::error::Error;
use std::ops::Deref;
use std::path::PathBuf;
use std::sync::{Arc, Mutex};
use meilisearch_core::{Database, DatabaseOptions, Index};
use sha2::Digest;
use crate::error::{Error as MSError, ResponseError};
use crate::index_update_callback;
use crate::option::Opt;
use crate::dump::DumpInfo;
#[derive(Clone)]
pub struct Data {
inner: Arc<DataInner>,
}
impl Deref for Data {
type Target = DataInner;
fn deref(&self) -> &Self::Target {
&self.inner
}
}
#[derive(Clone)]
pub struct DataInner {
pub db: Arc<Database>,
pub db_path: String,
pub dumps_dir: PathBuf,
pub dump_batch_size: usize,
pub api_keys: ApiKeys,
pub server_pid: u32,
pub http_payload_size_limit: usize,
pub current_dump: Arc<Mutex<Option<DumpInfo>>>,
}
#[derive(Clone)]
pub struct ApiKeys {
pub public: Option<String>,
pub private: Option<String>,
pub master: Option<String>,
}
impl ApiKeys {
pub fn generate_missing_api_keys(&mut self) {
if let Some(master_key) = &self.master {
if self.private.is_none() {
let key = format!("{}-private", master_key);
let sha = sha2::Sha256::digest(key.as_bytes());
self.private = Some(format!("{:x}", sha));
}
if self.public.is_none() {
let key = format!("{}-public", master_key);
let sha = sha2::Sha256::digest(key.as_bytes());
self.public = Some(format!("{:x}", sha));
}
}
}
}
impl Data {
pub fn new(opt: Opt) -> Result<Data, Box<dyn Error>> {
let db_path = opt.db_path.clone();
let dumps_dir = opt.dumps_dir.clone();
let dump_batch_size = opt.dump_batch_size;
let server_pid = std::process::id();
let db_opt = DatabaseOptions {
main_map_size: opt.max_mdb_size,
update_map_size: opt.max_udb_size,
};
let http_payload_size_limit = opt.http_payload_size_limit;
let db = Arc::new(Database::open_or_create(opt.db_path, db_opt)?);
let mut api_keys = ApiKeys {
master: opt.master_key,
private: None,
public: None,
};
api_keys.generate_missing_api_keys();
let current_dump = Arc::new(Mutex::new(None));
let inner_data = DataInner {
db: db.clone(),
db_path,
dumps_dir,
dump_batch_size,
api_keys,
server_pid,
http_payload_size_limit,
current_dump,
};
let data = Data {
inner: Arc::new(inner_data),
};
let callback_context = data.clone();
db.set_update_callback(Box::new(move |index_uid, status| {
index_update_callback(&index_uid, &callback_context, status);
}));
Ok(data)
}
fn create_index(&self, uid: &str) -> Result<Index, ResponseError> {
if !uid
.chars()
.all(|x| x.is_ascii_alphanumeric() || x == '-' || x == '_')
{
return Err(MSError::InvalidIndexUid.into());
}
let created_index = self.db.create_index(&uid).map_err(|e| match e {
meilisearch_core::Error::IndexAlreadyExists => e.into(),
_ => ResponseError::from(MSError::create_index(e)),
})?;
self.db.main_write::<_, _, ResponseError>(|mut writer| {
created_index.main.put_name(&mut writer, uid)?;
created_index
.main
.created_at(&writer)?
.ok_or(MSError::internal("Impossible to read created at"))?;
created_index
.main
.updated_at(&writer)?
.ok_or(MSError::internal("Impossible to read updated at"))?;
Ok(())
})?;
Ok(created_index)
}
pub fn get_current_dump_info(&self) -> Option<DumpInfo> {
self.current_dump.lock().unwrap().clone()
}
pub fn set_current_dump_info(&self, dump_info: DumpInfo) {
self.current_dump.lock().unwrap().replace(dump_info);
}
pub fn get_or_create_index<F, R>(&self, uid: &str, f: F) -> Result<R, ResponseError>
where
F: FnOnce(&Index) -> Result<R, ResponseError>,
{
let mut index_has_been_created = false;
let index = match self.db.open_index(&uid) {
Some(index) => index,
None => {
index_has_been_created = true;
self.create_index(&uid)?
}
};
match f(&index) {
Ok(r) => Ok(r),
Err(err) => {
if index_has_been_created {
let _ = self.db.delete_index(&uid);
}
Err(err)
}
}
}
}
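
The removed `get_or_create_index` above implements a small rollback: if the index had to be created for this call and the closure fails, the fresh index is deleted again, while a pre-existing index is left alone. A sketch of that pattern, assuming a hypothetical in-memory `Db` that tracks index uids in a `HashSet` and uses `String` as its error type:

```rust
use std::collections::HashSet;

/// Hypothetical in-memory stand-in for the database handle.
struct Db {
    indexes: HashSet<String>,
}

impl Db {
    /// Run `f` against the index `uid`, creating it first when missing.
    /// An index created by this call is removed again if `f` fails.
    fn get_or_create_index<F, R>(&mut self, uid: &str, f: F) -> Result<R, String>
    where
        F: FnOnce(&str) -> Result<R, String>,
    {
        // `insert` returns true only when the uid was not present before.
        let index_has_been_created = self.indexes.insert(uid.to_string());
        match f(uid) {
            Ok(r) => Ok(r),
            Err(err) => {
                if index_has_been_created {
                    // Roll back the creation; existing indexes are kept.
                    self.indexes.remove(uid);
                }
                Err(err)
            }
        }
    }
}
```

This keeps a failed first write from leaving an empty index behind, at the cost of the create and the rollback not being atomic.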

@ -0,0 +1,133 @@
use std::ops::Deref;
use std::sync::Arc;
use sha2::Digest;
use crate::index::{Checked, Settings};
use crate::index_controller::{
error::Result, DumpInfo, IndexController, IndexMetadata, IndexSettings, IndexStats, Stats,
};
use crate::option::Opt;
pub mod search;
mod updates;
#[derive(Clone)]
pub struct Data {
inner: Arc<DataInner>,
}
impl Deref for Data {
type Target = DataInner;
fn deref(&self) -> &Self::Target {
&self.inner
}
}
pub struct DataInner {
pub index_controller: IndexController,
pub api_keys: ApiKeys,
options: Opt,
}
#[derive(Clone)]
pub struct ApiKeys {
pub public: Option<String>,
pub private: Option<String>,
pub master: Option<String>,
}
impl ApiKeys {
pub fn generate_missing_api_keys(&mut self) {
if let Some(master_key) = &self.master {
if self.private.is_none() {
let key = format!("{}-private", master_key);
let sha = sha2::Sha256::digest(key.as_bytes());
self.private = Some(format!("{:x}", sha));
}
if self.public.is_none() {
let key = format!("{}-public", master_key);
let sha = sha2::Sha256::digest(key.as_bytes());
self.public = Some(format!("{:x}", sha));
}
}
}
}
impl Data {
pub fn new(options: Opt) -> anyhow::Result<Data> {
let path = options.db_path.clone();
let index_controller = IndexController::new(&path, &options)?;
let mut api_keys = ApiKeys {
master: options.clone().master_key,
private: None,
public: None,
};
api_keys.generate_missing_api_keys();
let inner = DataInner {
index_controller,
api_keys,
options,
};
let inner = Arc::new(inner);
Ok(Data { inner })
}
pub async fn settings(&self, uid: String) -> Result<Settings<Checked>> {
self.index_controller.settings(uid).await
}
pub async fn list_indexes(&self) -> Result<Vec<IndexMetadata>> {
self.index_controller.list_indexes().await
}
pub async fn index(&self, uid: String) -> Result<IndexMetadata> {
self.index_controller.get_index(uid).await
}
pub async fn create_index(
&self,
uid: String,
primary_key: Option<String>,
) -> Result<IndexMetadata> {
let settings = IndexSettings {
uid: Some(uid),
primary_key,
};
let meta = self.index_controller.create_index(settings).await?;
Ok(meta)
}
pub async fn get_index_stats(&self, uid: String) -> Result<IndexStats> {
Ok(self.index_controller.get_index_stats(uid).await?)
}
pub async fn get_all_stats(&self) -> Result<Stats> {
Ok(self.index_controller.get_all_stats().await?)
}
pub async fn create_dump(&self) -> Result<DumpInfo> {
Ok(self.index_controller.create_dump().await?)
}
pub async fn dump_status(&self, uid: String) -> Result<DumpInfo> {
Ok(self.index_controller.dump_info(uid).await?)
}
#[inline]
pub fn http_payload_size_limit(&self) -> usize {
self.options.http_payload_size_limit.get_bytes() as usize
}
#[inline]
pub fn api_keys(&self) -> &ApiKeys {
&self.api_keys
}
}
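
The new `Data` type above is a cheap-to-clone handle: the state sits in `DataInner` behind an `Arc`, and `Deref` lets callers reach its fields directly. The pattern can be sketched on its own; `Handle`, `Inner` and their fields are hypothetical names chosen for illustration:

```rust
use std::ops::Deref;
use std::sync::Arc;

/// Hypothetical shared state, playing the role of `DataInner`.
struct Inner {
    name: String,
    payload_limit: usize,
}

/// Cloning a `Handle` only bumps the reference count of the `Arc`.
#[derive(Clone)]
struct Handle {
    inner: Arc<Inner>,
}

impl Deref for Handle {
    type Target = Inner;

    fn deref(&self) -> &Self::Target {
        &self.inner
    }
}

impl Handle {
    fn new(name: &str, payload_limit: usize) -> Self {
        let inner = Inner {
            name: name.to_string(),
            payload_limit,
        };
        Handle { inner: Arc::new(inner) }
    }
}
```

Every clone handed to a request handler reads the same `Inner`, which is why the inner struct itself does not need to be `Clone` in the new version.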

@ -0,0 +1,34 @@
use serde_json::{Map, Value};
use super::Data;
use crate::index::{SearchQuery, SearchResult};
use crate::index_controller::error::Result;
impl Data {
pub async fn search(&self, index: String, search_query: SearchQuery) -> Result<SearchResult> {
self.index_controller.search(index, search_query).await
}
pub async fn retrieve_documents(
&self,
index: String,
offset: usize,
limit: usize,
attributes_to_retrieve: Option<Vec<String>>,
) -> Result<Vec<Map<String, Value>>> {
self.index_controller
.documents(index, offset, limit, attributes_to_retrieve)
.await
}
pub async fn retrieve_document(
&self,
index: String,
document_id: String,
attributes_to_retrieve: Option<Vec<String>>,
) -> Result<Map<String, Value>> {
self.index_controller
.document(index, document_id, attributes_to_retrieve)
.await
}
}

@ -0,0 +1,80 @@
use milli::update::{IndexDocumentsMethod, UpdateFormat};
use crate::extractors::payload::Payload;
use crate::index::{Checked, Settings};
use crate::index_controller::{error::Result, IndexMetadata, IndexSettings, UpdateStatus};
use crate::Data;
impl Data {
pub async fn add_documents(
&self,
index: String,
method: IndexDocumentsMethod,
format: UpdateFormat,
stream: Payload,
primary_key: Option<String>,
) -> Result<UpdateStatus> {
let update_status = self
.index_controller
.add_documents(index, method, format, stream, primary_key)
.await?;
Ok(update_status)
}
pub async fn update_settings(
&self,
index: String,
settings: Settings<Checked>,
create: bool,
) -> Result<UpdateStatus> {
let update = self
.index_controller
.update_settings(index, settings, create)
.await?;
Ok(update)
}
pub async fn clear_documents(&self, index: String) -> Result<UpdateStatus> {
let update = self.index_controller.clear_documents(index).await?;
Ok(update)
}
pub async fn delete_documents(
&self,
index: String,
document_ids: Vec<String>,
) -> Result<UpdateStatus> {
let update = self
.index_controller
.delete_documents(index, document_ids)
.await?;
Ok(update)
}
pub async fn delete_index(&self, index: String) -> Result<()> {
self.index_controller.delete_index(index).await?;
Ok(())
}
pub async fn get_update_status(&self, index: String, uid: u64) -> Result<UpdateStatus> {
self.index_controller.update_status(index, uid).await
}
pub async fn get_updates_status(&self, index: String) -> Result<Vec<UpdateStatus>> {
self.index_controller.all_update_status(index).await
}
pub async fn update_index(
&self,
uid: String,
primary_key: Option<String>,
new_uid: Option<String>,
) -> Result<IndexMetadata> {
let settings = IndexSettings {
uid: new_uid,
primary_key,
};
self.index_controller.update_index(uid, settings).await
}
}


@ -1,413 +0,0 @@
use std::fs::{create_dir_all, File};
use std::io::prelude::*;
use std::path::{Path, PathBuf};
use std::thread;
use actix_web::web;
use chrono::offset::Utc;
use indexmap::IndexMap;
use log::{error, info};
use meilisearch_core::{MainWriter, MainReader, UpdateReader};
use meilisearch_core::settings::Settings;
use meilisearch_core::update::{apply_settings_update, apply_documents_addition};
use serde::{Deserialize, Serialize};
use serde_json::json;
use tempfile::TempDir;
use crate::Data;
use crate::error::{Error, ResponseError};
use crate::helpers::compression;
use crate::routes::index;
use crate::routes::index::IndexResponse;
#[derive(Debug, Serialize, Deserialize, Copy, Clone)]
enum DumpVersion {
V1,
}
impl DumpVersion {
const CURRENT: Self = Self::V1;
}
#[derive(Debug, Serialize, Deserialize)]
#[serde(rename_all = "camelCase")]
pub struct DumpMetadata {
indexes: Vec<crate::routes::index::IndexResponse>,
db_version: String,
dump_version: DumpVersion,
}
impl DumpMetadata {
/// Create a DumpMetadata with the current dump version of meilisearch.
pub fn new(indexes: Vec<crate::routes::index::IndexResponse>, db_version: String) -> Self {
DumpMetadata {
indexes,
db_version,
dump_version: DumpVersion::CURRENT,
}
}
/// Extract DumpMetadata from `metadata.json` file present at provided `dir_path`
fn from_path(dir_path: &Path) -> Result<Self, Error> {
let path = dir_path.join("metadata.json");
let file = File::open(path)?;
let reader = std::io::BufReader::new(file);
let metadata = serde_json::from_reader(reader)?;
Ok(metadata)
}
/// Write DumpMetadata in `metadata.json` file at provided `dir_path`
fn to_path(&self, dir_path: &Path) -> Result<(), Error> {
let path = dir_path.join("metadata.json");
let file = File::create(path)?;
serde_json::to_writer(file, &self)?;
Ok(())
}
}
/// Extract Settings from `settings.json` file present at provided `dir_path`
fn settings_from_path(dir_path: &Path) -> Result<Settings, Error> {
let path = dir_path.join("settings.json");
let file = File::open(path)?;
let reader = std::io::BufReader::new(file);
let metadata = serde_json::from_reader(reader)?;
Ok(metadata)
}
/// Write Settings in `settings.json` file at provided `dir_path`
fn settings_to_path(settings: &Settings, dir_path: &Path) -> Result<(), Error> {
let path = dir_path.join("settings.json");
let file = File::create(path)?;
serde_json::to_writer(file, settings)?;
Ok(())
}
/// Import settings and documents of a dump with version `DumpVersion::V1` in specified index.
fn import_index_v1(
data: &Data,
dumps_dir: &Path,
index_uid: &str,
document_batch_size: usize,
write_txn: &mut MainWriter,
) -> Result<(), Error> {
// open index
let index = data
.db
.open_index(index_uid)
.ok_or(Error::index_not_found(index_uid))?;
// index dir path in dump dir
let index_path = &dumps_dir.join(index_uid);
// extract `settings.json` file and import content
let settings = settings_from_path(&index_path)?;
let settings = settings.to_update().map_err(|e| Error::dump_failed(format!("importing settings for index {}; {}", index_uid, e)))?;
apply_settings_update(write_txn, &index, settings)?;
// create iterator over documents in `documents.jsonl` to make batch importation
let documents = {
let file = File::open(&index_path.join("documents.jsonl"))?;
let reader = std::io::BufReader::new(file);
let deserializer = serde_json::Deserializer::from_reader(reader);
deserializer.into_iter::<IndexMap<String, serde_json::Value>>()
};
// import documents in batches of `document_batch_size`:
// create a Vec to buffer documents
let mut values = Vec::with_capacity(document_batch_size);
// iterate over documents
for document in documents {
// push document in buffer
values.push(document?);
// if buffer is full, create and apply a batch, and clean buffer
if values.len() == document_batch_size {
let batch = std::mem::replace(&mut values, Vec::with_capacity(document_batch_size));
apply_documents_addition(write_txn, &index, batch)?;
}
}
// apply documents remaining in the buffer
if !values.is_empty() {
apply_documents_addition(write_txn, &index, values)?;
}
// sync index information: stats, updated_at, last_update
if let Err(e) = crate::index_update_callback_txn(index, index_uid, data, write_txn) {
return Err(Error::Internal(e));
}
Ok(())
}
/// Import dump from `dump_path` in database.
pub fn import_dump(
data: &Data,
dump_path: &Path,
document_batch_size: usize,
) -> Result<(), Error> {
info!("Importing dump from {:?}...", dump_path);
// create a temporary directory
let tmp_dir = TempDir::new()?;
let tmp_dir_path = tmp_dir.path();
// extract dump in temporary directory
compression::from_tar_gz(dump_path, tmp_dir_path)?;
// read dump metadata
let metadata = DumpMetadata::from_path(&tmp_dir_path)?;
// choose importation function from DumpVersion of metadata
let import_index = match metadata.dump_version {
DumpVersion::V1 => import_index_v1,
};
// remove indexes that have the same `uid` as the indexes to import, then create empty indexes
let existing_index_uids = data.db.indexes_uids();
for index in metadata.indexes.iter() {
if existing_index_uids.contains(&index.uid) {
data.db.delete_index(index.uid.clone())?;
}
index::create_index_sync(&data.db, index.uid.clone(), index.name.clone(), index.primary_key.clone())?;
}
// import each index's content
data.db.main_write::<_, _, Error>(|mut writer| {
for index in metadata.indexes {
import_index(&data, tmp_dir_path, &index.uid, document_batch_size, &mut writer)?;
}
Ok(())
})?;
info!("Dump importation from {:?} succeed", dump_path);
Ok(())
}
#[derive(Debug, Serialize, Deserialize, PartialEq, Clone)]
#[serde(rename_all = "snake_case")]
pub enum DumpStatus {
Done,
InProgress,
Failed,
}
#[derive(Debug, Serialize, Clone)]
#[serde(rename_all = "camelCase")]
pub struct DumpInfo {
pub uid: String,
pub status: DumpStatus,
#[serde(skip_serializing_if = "Option::is_none", flatten)]
pub error: Option<serde_json::Value>,
}
impl DumpInfo {
pub fn new(uid: String, status: DumpStatus) -> Self {
Self { uid, status, error: None }
}
pub fn with_error(mut self, error: ResponseError) -> Self {
self.status = DumpStatus::Failed;
self.error = Some(json!(error));
self
}
pub fn dump_already_in_progress(&self) -> bool {
self.status == DumpStatus::InProgress
}
}
/// Generate uid from creation date
fn generate_uid() -> String {
Utc::now().format("%Y%m%d-%H%M%S%3f").to_string()
}
/// Infer dumps_dir from dump_uid
pub fn compressed_dumps_dir(dumps_dir: &Path, dump_uid: &str) -> PathBuf {
dumps_dir.join(format!("{}.dump", dump_uid))
}
/// Write metadata in dump
fn dump_metadata(data: &web::Data<Data>, dir_path: &Path, indexes: Vec<IndexResponse>) -> Result<(), Error> {
let (db_major, db_minor, db_patch) = data.db.version();
let metadata = DumpMetadata::new(indexes, format!("{}.{}.{}", db_major, db_minor, db_patch));
metadata.to_path(dir_path)
}
/// Export settings of provided index in dump
fn dump_index_settings(data: &web::Data<Data>, reader: &MainReader, dir_path: &Path, index_uid: &str) -> Result<(), Error> {
let settings = crate::routes::setting::get_all_sync(data, reader, index_uid)?;
settings_to_path(&settings, dir_path)
}
/// Export updates of provided index in dump
fn dump_index_updates(data: &web::Data<Data>, reader: &UpdateReader, dir_path: &Path, index_uid: &str) -> Result<(), Error> {
let updates_path = dir_path.join("updates.jsonl");
let updates = crate::routes::index::get_all_updates_status_sync(data, reader, index_uid)?;
let file = File::create(updates_path)?;
for update in updates {
serde_json::to_writer(&file, &update)?;
writeln!(&file)?;
}
Ok(())
}
/// Export documents of provided index in dump
fn dump_index_documents(data: &web::Data<Data>, reader: &MainReader, dir_path: &Path, index_uid: &str) -> Result<(), Error> {
let documents_path = dir_path.join("documents.jsonl");
let file = File::create(documents_path)?;
let dump_batch_size = data.dump_batch_size;
let mut offset = 0;
loop {
let documents = crate::routes::document::get_all_documents_sync(data, reader, index_uid, offset, dump_batch_size, None)?;
if documents.is_empty() { break; } else { offset += dump_batch_size; }
for document in documents {
serde_json::to_writer(&file, &document)?;
writeln!(&file)?;
}
}
Ok(())
}
/// Write error with a context.
fn fail_dump_process<E: std::error::Error>(data: &web::Data<Data>, dump_info: DumpInfo, context: &str, error: E) {
let error_message = format!("{}; {}", context, error);
error!("Something went wrong during dump process: {}", &error_message);
data.set_current_dump_info(dump_info.with_error(Error::dump_failed(error_message).into()))
}
/// Main function of dump.
fn dump_process(data: web::Data<Data>, dumps_dir: PathBuf, dump_info: DumpInfo) {
// open read transaction on Update
let update_reader = match data.db.update_read_txn() {
Ok(r) => r,
Err(e) => {
fail_dump_process(&data, dump_info, "creating RO transaction on updates", e);
return ;
}
};
// open read transaction on Main
let main_reader = match data.db.main_read_txn() {
Ok(r) => r,
Err(e) => {
fail_dump_process(&data, dump_info, "creating RO transaction on main", e);
return ;
}
};
// create a temporary directory
let tmp_dir = match TempDir::new() {
Ok(tmp_dir) => tmp_dir,
Err(e) => {
fail_dump_process(&data, dump_info, "creating temporary directory", e);
return ;
}
};
let tmp_dir_path = tmp_dir.path();
// fetch indexes
let indexes = match crate::routes::index::list_indexes_sync(&data, &main_reader) {
Ok(indexes) => indexes,
Err(e) => {
fail_dump_process(&data, dump_info, "listing indexes", e);
return ;
}
};
// create metadata
if let Err(e) = dump_metadata(&data, &tmp_dir_path, indexes.clone()) {
fail_dump_process(&data, dump_info, "generating metadata", e);
return ;
}
// export settings, updates and documents for each index
for index in indexes {
let index_path = tmp_dir_path.join(&index.uid);
// create index sub-directory
if let Err(e) = create_dir_all(&index_path) {
fail_dump_process(&data, dump_info, &format!("creating directory for index {}", &index.uid), e);
return ;
}
// export settings
if let Err(e) = dump_index_settings(&data, &main_reader, &index_path, &index.uid) {
fail_dump_process(&data, dump_info, &format!("generating settings for index {}", &index.uid), e);
return ;
}
// export documents
if let Err(e) = dump_index_documents(&data, &main_reader, &index_path, &index.uid) {
fail_dump_process(&data, dump_info, &format!("generating documents for index {}", &index.uid), e);
return ;
}
// export updates
if let Err(e) = dump_index_updates(&data, &update_reader, &index_path, &index.uid) {
fail_dump_process(&data, dump_info, &format!("generating updates for index {}", &index.uid), e);
return ;
}
}
// compress dump in a file named `{dump_uid}.dump` in `dumps_dir`
if let Err(e) = crate::helpers::compression::to_tar_gz(&tmp_dir_path, &compressed_dumps_dir(&dumps_dir, &dump_info.uid)) {
fail_dump_process(&data, dump_info, "compressing dump", e);
return ;
}
// update dump info to `done`
let resume = DumpInfo::new(
dump_info.uid,
DumpStatus::Done
);
data.set_current_dump_info(resume);
}
pub fn init_dump_process(data: &web::Data<Data>, dumps_dir: &Path) -> Result<DumpInfo, Error> {
create_dir_all(dumps_dir).map_err(|e| Error::dump_failed(format!("creating temporary directory {}", e)))?;
// check if a dump is already in progress
if let Some(resume) = data.get_current_dump_info() {
if resume.dump_already_in_progress() {
return Err(Error::dump_conflict())
}
}
// generate a new dump info
let info = DumpInfo::new(
generate_uid(),
DumpStatus::InProgress
);
data.set_current_dump_info(info.clone());
let data = data.clone();
let dumps_dir = dumps_dir.to_path_buf();
let info_cloned = info.clone();
// run dump process in a new thread
thread::spawn(move ||
dump_process(data, dumps_dir, info_cloned)
);
Ok(info)
}
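
The buffer-and-flush loop in `import_index_v1` above can be isolated as a small helper. The following is a minimal sketch using only the standard library; `apply_in_batches` and its closure parameter are hypothetical names introduced for illustration and are not part of the removed file.

```rust
use std::mem;

/// Feeds `items` to `apply` in chunks of at most `batch_size` elements.
/// Hypothetical helper: it mirrors the buffering loop of `import_index_v1`,
/// with `apply` standing in for `apply_documents_addition`.
/// Returns the number of batches that were applied.
pub fn apply_in_batches<T, E>(
    items: impl IntoIterator<Item = Result<T, E>>,
    batch_size: usize,
    mut apply: impl FnMut(Vec<T>) -> Result<(), E>,
) -> Result<usize, E> {
    assert!(batch_size > 0, "batch_size must be non-zero");

    let mut buffer = Vec::with_capacity(batch_size);
    let mut batches = 0;

    for item in items {
        // A failing item aborts the whole import, as `document?` does above.
        buffer.push(item?);

        if buffer.len() == batch_size {
            // Swap in a fresh buffer so the full one can be moved into `apply`.
            let batch = mem::replace(&mut buffer, Vec::with_capacity(batch_size));
            apply(batch)?;
            batches += 1;
        }
    }

    // Flush whatever is left in the buffer.
    if !buffer.is_empty() {
        apply(buffer)?;
        batches += 1;
    }

    Ok(batches)
}
```

`mem::replace` is used here for the same reason as in the original loop: it hands ownership of the full buffer to the consumer without cloning, while leaving a pre-allocated empty buffer for the next batch.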


@ -1,307 +1,168 @@
use std::error;
use std::error::Error;
use std::fmt;
use actix_http::ResponseBuilder;
use actix_web as aweb;
use actix_web::error::{JsonPayloadError, QueryPayloadError};
use actix_web::body::Body;
use actix_web::dev::BaseHttpResponseBuilder;
use actix_web::http::StatusCode;
use serde::ser::{Serialize, Serializer, SerializeStruct};
use aweb::error::{JsonPayloadError, QueryPayloadError};
use meilisearch_error::{Code, ErrorCode};
use milli::UserError;
use serde::{Deserialize, Serialize};
use meilisearch_error::{ErrorCode, Code};
#[derive(Debug)]
#[derive(Debug, Serialize, Deserialize, Clone)]
#[serde(rename_all = "camelCase")]
pub struct ResponseError {
inner: Box<dyn ErrorCode>,
}
impl error::Error for ResponseError {}
impl ErrorCode for ResponseError {
fn error_code(&self) -> Code {
self.inner.error_code()
}
#[serde(skip)]
code: StatusCode,
message: String,
error_code: String,
error_type: String,
error_link: String,
}
impl fmt::Display for ResponseError {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
self.inner.fmt(f)
self.message.fmt(f)
}
}
impl From<Error> for ResponseError {
fn from(error: Error) -> ResponseError {
ResponseError { inner: Box::new(error) }
}
}
impl From<meilisearch_core::Error> for ResponseError {
fn from(err: meilisearch_core::Error) -> ResponseError {
ResponseError { inner: Box::new(err) }
}
}
impl From<meilisearch_schema::Error> for ResponseError {
fn from(err: meilisearch_schema::Error) -> ResponseError {
ResponseError { inner: Box::new(err) }
}
}
impl From<FacetCountError> for ResponseError {
fn from(err: FacetCountError) -> ResponseError {
ResponseError { inner: Box::new(err) }
}
}
impl Serialize for ResponseError {
fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
where
S: Serializer,
{
let struct_name = "ResponseError";
let field_count = 4;
let mut state = serializer.serialize_struct(struct_name, field_count)?;
state.serialize_field("message", &self.to_string())?;
state.serialize_field("errorCode", &self.error_name())?;
state.serialize_field("errorType", &self.error_type())?;
state.serialize_field("errorLink", &self.error_url())?;
state.end()
impl<T> From<T> for ResponseError
where
T: ErrorCode,
{
fn from(other: T) -> Self {
Self {
code: other.http_status(),
message: other.to_string(),
error_code: other.error_name(),
error_type: other.error_type(),
error_link: other.error_url(),
}
}
}
impl aweb::error::ResponseError for ResponseError {
fn error_response(&self) -> aweb::HttpResponse {
ResponseBuilder::new(self.status_code()).json(&self)
fn error_response(&self) -> aweb::BaseHttpResponse<Body> {
let json = serde_json::to_vec(self).unwrap();
BaseHttpResponseBuilder::new(self.status_code())
.content_type("application/json")
.body(json)
}
fn status_code(&self) -> StatusCode {
self.http_status()
self.code
}
}
macro_rules! internal_error {
($target:ty : $($other:path), *) => {
$(
impl From<$other> for $target {
fn from(other: $other) -> Self {
Self::Internal(Box::new(other))
}
}
)*
}
}
#[derive(Debug)]
pub enum Error {
BadParameter(String, String),
BadRequest(String),
CreateIndex(String),
DocumentNotFound(String),
IndexNotFound(String),
IndexAlreadyExists(String),
Internal(String),
InvalidIndexUid,
InvalidToken(String),
MissingAuthorizationHeader,
NotFound(String),
OpenIndex(String),
RetrieveDocument(u32, String),
SearchDocuments(String),
PayloadTooLarge,
UnsupportedMediaType,
DumpAlreadyInProgress,
DumpProcessFailed(String),
pub struct MilliError<'a>(pub &'a milli::Error);
impl Error for MilliError<'_> {}
impl fmt::Display for MilliError<'_> {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
self.0.fmt(f)
}
}
impl error::Error for Error {}
impl ErrorCode for Error {
impl ErrorCode for MilliError<'_> {
fn error_code(&self) -> Code {
use Error::*;
match self.0 {
milli::Error::InternalError(_) => Code::Internal,
milli::Error::IoError(_) => Code::Internal,
milli::Error::UserError(ref error) => {
match error {
// TODO: wait for spec for new error codes.
UserError::Csv(_)
| UserError::SerdeJson(_)
| UserError::MaxDatabaseSizeReached
| UserError::InvalidCriterionName { .. }
| UserError::InvalidDocumentId { .. }
| UserError::InvalidStoreFile
| UserError::NoSpaceLeftOnDevice
| UserError::DocumentLimitReached => Code::Internal,
UserError::AttributeLimitReached => Code::MaxFieldsLimitExceeded,
UserError::InvalidFilter(_) => Code::Filter,
UserError::InvalidFilterAttribute(_) => Code::Filter,
UserError::MissingDocumentId { .. } => Code::MissingDocumentId,
UserError::MissingPrimaryKey => Code::MissingPrimaryKey,
UserError::PrimaryKeyCannotBeChanged => Code::PrimaryKeyAlreadyPresent,
UserError::PrimaryKeyCannotBeReset => Code::PrimaryKeyAlreadyPresent,
UserError::UnknownInternalDocumentId { .. } => Code::DocumentNotFound,
UserError::InvalidFacetsDistribution { .. } => Code::BadRequest,
UserError::InvalidSortableAttribute { .. } => Code::Sort,
}
}
}
}
}
impl fmt::Display for PayloadError {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
match self {
BadParameter(_, _) => Code::BadParameter,
BadRequest(_) => Code::BadRequest,
CreateIndex(_) => Code::CreateIndex,
DocumentNotFound(_) => Code::DocumentNotFound,
IndexNotFound(_) => Code::IndexNotFound,
IndexAlreadyExists(_) => Code::IndexAlreadyExists,
Internal(_) => Code::Internal,
InvalidIndexUid => Code::InvalidIndexUid,
InvalidToken(_) => Code::InvalidToken,
MissingAuthorizationHeader => Code::MissingAuthorizationHeader,
NotFound(_) => Code::NotFound,
OpenIndex(_) => Code::OpenIndex,
RetrieveDocument(_, _) => Code::RetrieveDocument,
SearchDocuments(_) => Code::SearchDocuments,
PayloadTooLarge => Code::PayloadTooLarge,
UnsupportedMediaType => Code::UnsupportedMediaType,
DumpAlreadyInProgress => Code::DumpAlreadyInProgress,
DumpProcessFailed(_) => Code::DumpProcessFailed,
PayloadError::Json(e) => e.fmt(f),
PayloadError::Query(e) => e.fmt(f),
}
}
}
#[derive(Debug)]
pub enum FacetCountError {
AttributeNotSet(String),
SyntaxError(String),
UnexpectedToken { found: String, expected: &'static [&'static str] },
NoFacetSet,
pub enum PayloadError {
Json(JsonPayloadError),
Query(QueryPayloadError),
}
impl error::Error for FacetCountError {}
impl Error for PayloadError {}
impl ErrorCode for FacetCountError {
impl ErrorCode for PayloadError {
fn error_code(&self) -> Code {
Code::BadRequest
}
}
impl FacetCountError {
pub fn unexpected_token(found: impl ToString, expected: &'static [&'static str]) -> FacetCountError {
let found = found.to_string();
FacetCountError::UnexpectedToken { expected, found }
}
}
impl From<serde_json::error::Error> for FacetCountError {
fn from(other: serde_json::error::Error) -> FacetCountError {
FacetCountError::SyntaxError(other.to_string())
}
}
impl fmt::Display for FacetCountError {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
use FacetCountError::*;
match self {
AttributeNotSet(attr) => write!(f, "Attribute {} is not set as facet", attr),
SyntaxError(msg) => write!(f, "Syntax error: {}", msg),
UnexpectedToken { expected, found } => write!(f, "Unexpected {} found, expected {:?}", found, expected),
NoFacetSet => write!(f, "Can't perform facet count, as no facet is set"),
PayloadError::Json(err) => match err {
JsonPayloadError::Overflow => Code::PayloadTooLarge,
JsonPayloadError::ContentType => Code::UnsupportedMediaType,
JsonPayloadError::Payload(aweb::error::PayloadError::Overflow) => {
Code::PayloadTooLarge
}
JsonPayloadError::Deserialize(_) | JsonPayloadError::Payload(_) => Code::BadRequest,
JsonPayloadError::Serialize(_) => Code::Internal,
_ => Code::Internal,
},
PayloadError::Query(err) => match err {
QueryPayloadError::Deserialize(_) => Code::BadRequest,
_ => Code::Internal,
},
}
}
}
impl Error {
pub fn internal(err: impl fmt::Display) -> Error {
Error::Internal(err.to_string())
}
pub fn bad_request(err: impl fmt::Display) -> Error {
Error::BadRequest(err.to_string())
}
pub fn missing_authorization_header() -> Error {
Error::MissingAuthorizationHeader
}
pub fn invalid_token(err: impl fmt::Display) -> Error {
Error::InvalidToken(err.to_string())
}
pub fn not_found(err: impl fmt::Display) -> Error {
Error::NotFound(err.to_string())
}
pub fn index_not_found(err: impl fmt::Display) -> Error {
Error::IndexNotFound(err.to_string())
}
pub fn document_not_found(err: impl fmt::Display) -> Error {
Error::DocumentNotFound(err.to_string())
}
pub fn bad_parameter(param: impl fmt::Display, err: impl fmt::Display) -> Error {
Error::BadParameter(param.to_string(), err.to_string())
}
pub fn open_index(err: impl fmt::Display) -> Error {
Error::OpenIndex(err.to_string())
}
pub fn create_index(err: impl fmt::Display) -> Error {
Error::CreateIndex(err.to_string())
}
pub fn invalid_index_uid() -> Error {
Error::InvalidIndexUid
}
pub fn retrieve_document(doc_id: u32, err: impl fmt::Display) -> Error {
Error::RetrieveDocument(doc_id, err.to_string())
}
pub fn search_documents(err: impl fmt::Display) -> Error {
Error::SearchDocuments(err.to_string())
}
pub fn dump_conflict() -> Error {
Error::DumpAlreadyInProgress
}
pub fn dump_failed(message: String) -> Error {
Error::DumpProcessFailed(message)
impl From<JsonPayloadError> for PayloadError {
fn from(other: JsonPayloadError) -> Self {
Self::Json(other)
}
}
impl fmt::Display for Error {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
match self {
Self::BadParameter(param, err) => write!(f, "Url parameter {} error: {}", param, err),
Self::BadRequest(err) => f.write_str(err),
Self::CreateIndex(err) => write!(f, "Impossible to create index; {}", err),
Self::DocumentNotFound(document_id) => write!(f, "Document with id {} not found", document_id),
Self::IndexNotFound(index_uid) => write!(f, "Index {} not found", index_uid),
Self::IndexAlreadyExists(index_uid) => write!(f, "Index {} already exists", index_uid),
Self::Internal(err) => f.write_str(err),
Self::InvalidIndexUid => f.write_str("Index must have a valid uid; Index uid can be of type integer or string only composed of alphanumeric characters, hyphens (-) and underscores (_)."),
Self::InvalidToken(err) => write!(f, "Invalid API key: {}", err),
Self::MissingAuthorizationHeader => f.write_str("You must have an authorization token"),
Self::NotFound(err) => write!(f, "{} not found", err),
Self::OpenIndex(err) => write!(f, "Impossible to open index; {}", err),
Self::RetrieveDocument(id, err) => write!(f, "Impossible to retrieve the document with id: {}; {}", id, err),
Self::SearchDocuments(err) => write!(f, "Impossible to search documents; {}", err),
Self::PayloadTooLarge => f.write_str("Payload too large"),
Self::UnsupportedMediaType => f.write_str("Unsupported media type"),
Self::DumpAlreadyInProgress => f.write_str("Another dump is already in progress"),
Self::DumpProcessFailed(message) => write!(f, "Dump process failed: {}", message),
}
impl From<QueryPayloadError> for PayloadError {
fn from(other: QueryPayloadError) -> Self {
Self::Query(other)
}
}
impl From<std::io::Error> for Error {
fn from(err: std::io::Error) -> Error {
Error::Internal(err.to_string())
}
}
impl From<actix_http::Error> for Error {
fn from(err: actix_http::Error) -> Error {
Error::Internal(err.to_string())
}
}
impl From<meilisearch_core::Error> for Error {
fn from(err: meilisearch_core::Error) -> Error {
Error::Internal(err.to_string())
}
}
impl From<serde_json::error::Error> for Error {
fn from(err: serde_json::error::Error) -> Error {
Error::Internal(err.to_string())
}
}
impl From<JsonPayloadError> for Error {
fn from(err: JsonPayloadError) -> Error {
match err {
JsonPayloadError::Deserialize(err) => Error::BadRequest(format!("Invalid JSON: {}", err)),
JsonPayloadError::Overflow => Error::PayloadTooLarge,
JsonPayloadError::ContentType => Error::UnsupportedMediaType,
JsonPayloadError::Payload(err) => Error::BadRequest(format!("Problem while decoding the request: {}", err)),
}
}
}
impl From<QueryPayloadError> for Error {
fn from(err: QueryPayloadError) -> Error {
match err {
QueryPayloadError::Deserialize(err) => Error::BadRequest(format!("Invalid query parameters: {}", err)),
}
}
}
pub fn payload_error_handler<E: Into<Error>>(err: E) -> ResponseError {
let error: Error = err.into();
error.into()
pub fn payload_error_handler<E>(err: E) -> ResponseError
where
E: Into<PayloadError>,
{
err.into().into()
}
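
The blanket `impl<T> From<T> for ResponseError where T: ErrorCode` shown above replaces the per-type `From` impls. Below is a minimal, standard-library-only sketch of that pattern. `Code` and `ErrorCode` here are simplified local stand-ins for the `meilisearch_error` items (their real definitions are not shown in this diff), `DemoError` is a hypothetical error type, and the status is a plain `u16` rather than `StatusCode`.

```rust
use std::fmt;

// Simplified stand-in for `meilisearch_error::Code` (assumption: two variants only).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Code {
    BadRequest,
    Internal,
}

// Simplified stand-in for `meilisearch_error::ErrorCode`.
pub trait ErrorCode: fmt::Display {
    fn error_code(&self) -> Code;

    fn http_status(&self) -> u16 {
        match self.error_code() {
            Code::BadRequest => 400,
            Code::Internal => 500,
        }
    }

    fn error_name(&self) -> String {
        match self.error_code() {
            Code::BadRequest => "bad_request".to_string(),
            Code::Internal => "internal".to_string(),
        }
    }
}

// Flat, owned response type: every field is computed once at conversion time.
#[derive(Debug, Clone, PartialEq)]
pub struct ResponseError {
    pub code: u16,
    pub message: String,
    pub error_code: String,
}

// One blanket impl covers every type implementing `ErrorCode`.
impl<T> From<T> for ResponseError
where
    T: ErrorCode,
{
    fn from(other: T) -> Self {
        Self {
            code: other.http_status(),
            message: other.to_string(),
            error_code: other.error_name(),
        }
    }
}

// Hypothetical error type used only to exercise the conversion.
#[derive(Debug)]
pub enum DemoError {
    Malformed(String),
    Storage,
}

impl fmt::Display for DemoError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            DemoError::Malformed(why) => write!(f, "malformed payload: {}", why),
            DemoError::Storage => f.write_str("storage failure"),
        }
    }
}

impl ErrorCode for DemoError {
    fn error_code(&self) -> Code {
        match self {
            DemoError::Malformed(_) => Code::BadRequest,
            DemoError::Storage => Code::Internal,
        }
    }
}

/// Converts any `ErrorCode` value into a `ResponseError`.
pub fn to_response<E: ErrorCode>(err: E) -> ResponseError {
    err.into()
}
```

In this sketch the blanket impl only compiles as long as `ResponseError` does not itself implement `ErrorCode`; otherwise it would overlap with the standard library's reflexive `impl<T> From<T> for T`.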


@ -0,0 +1,25 @@
use meilisearch_error::{Code, ErrorCode};
#[derive(Debug, thiserror::Error)]
pub enum AuthenticationError {
#[error("You must have an authorization token")]
MissingAuthorizationHeader,
#[error("Invalid API key")]
InvalidToken(String),
// Triggered on configuration error.
#[error("Irretrievable state")]
IrretrievableState,
#[error("Unknown authentication policy")]
UnknownPolicy,
}
impl ErrorCode for AuthenticationError {
fn error_code(&self) -> Code {
match self {
AuthenticationError::MissingAuthorizationHeader => Code::MissingAuthorizationHeader,
AuthenticationError::InvalidToken(_) => Code::InvalidToken,
AuthenticationError::IrretrievableState => Code::Internal,
AuthenticationError::UnknownPolicy => Code::Internal,
}
}
}
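
For readers unfamiliar with `thiserror`, the `#[error("...")]` attributes on `AuthenticationError` above supply the `Display` messages. The sketch below hand-writes an implementation that should behave roughly the same way, using only the standard library; it is an approximation for illustration, not the macro's actual expansion.

```rust
use std::fmt;

// Same variants as the enum in the diff, without the `thiserror` derive.
#[derive(Debug)]
pub enum AuthenticationError {
    MissingAuthorizationHeader,
    InvalidToken(String),
    IrretrievableState,
    UnknownPolicy,
}

impl fmt::Display for AuthenticationError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let message = match self {
            AuthenticationError::MissingAuthorizationHeader => {
                "You must have an authorization token"
            }
            // The message does not interpolate the token, so the payload is ignored.
            AuthenticationError::InvalidToken(_) => "Invalid API key",
            AuthenticationError::IrretrievableState => "Irretrievable state",
            AuthenticationError::UnknownPolicy => "Unknown authentication policy",
        };
        f.write_str(message)
    }
}

impl std::error::Error for AuthenticationError {}
```

Because the `InvalidToken` message contains no `{0}` placeholder, the `String` it carries never appears in the rendered error, which keeps the submitted key out of responses built from `to_string()`.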

Some files were not shown because too many files have changed in this diff