Commit Graph

7197 Commits

Author SHA1 Message Date
ee64f4a936 Use smartstring to store the external id in our hashmap
We need to store all the external ids (primary keys) in a hashmap
associated to their internal id during.
The smartstring removes heap allocations / memory usage and should
improve the cache locality.
2022-04-13 21:22:07 +02:00
b9e676b8ca Merge #2316
2316: Add version flag r=Kerollmops a=sanders41

# Pull Request

## What does this PR do?
Fixes #2315

## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!


Co-authored-by: Paul Sanders <psanders1@gmail.com>
2022-04-13 17:24:09 +00:00
6c06fb226d Merge #2307
2307: Feat(Analytics): Add analytics for search format options r=irevoire a=ManyTheFish

Specification: [#120](https://github.com/meilisearch/specifications/pull/120) ([f5c6a8e](f5c6a8e183))

fix #2308

Co-authored-by: ManyTheFish <many@meilisearch.com>
2022-04-13 12:01:52 +00:00
456887a54a Merge #496
496: Improve the performances of the flattening subcrate r=irevoire a=Kerollmops

This PR adds some benchmarks to the _flatten-serde-json_ crate. This crate is responsible for transforming the original documents into flat versions that the engine can understand. It can probably be sped up, which is why I added benchmarks to it.
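
To illustrate what "flat versions" means, here is a minimal sketch of the idea. It is not the crate's actual code: the `Value` enum, `flatten` and `insert_flat` are hypothetical names made up for this example, it only uses the standard library instead of `serde_json`, and it ignores arrays entirely.

```rust
use std::collections::BTreeMap;

// Hypothetical stand-in for a JSON value (the real crate works on serde_json types).
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Number(i64),
    Text(String),
    Object(BTreeMap<String, Value>),
}

// Flattens nested objects into a single-level map, joining keys with a dot.
fn flatten(object: &BTreeMap<String, Value>) -> BTreeMap<String, Value> {
    let mut flat = BTreeMap::new();
    insert_flat(object, None, &mut flat);
    flat
}

// Walks `object` recursively and writes every leaf value under its dotted path.
fn insert_flat(
    object: &BTreeMap<String, Value>,
    prefix: Option<&str>,
    flat: &mut BTreeMap<String, Value>,
) {
    for (key, value) in object {
        let path = match prefix {
            Some(prefix) => format!("{prefix}.{key}"),
            None => key.clone(),
        };
        match value {
            Value::Object(inner) => insert_flat(inner, Some(&path), flat),
            other => {
                flat.insert(path, other.clone());
            }
        }
    }
}

fn main() {
    let mut person = BTreeMap::new();
    person.insert("name".to_string(), Value::Text("tamo".to_string()));
    person.insert("age".to_string(), Value::Number(25));

    let mut doc = BTreeMap::new();
    doc.insert("id".to_string(), Value::Number(1));
    doc.insert("person".to_string(), Value::Object(person));

    // Each nested leaf ends up under a dotted key such as `person.name`.
    for (key, value) in flatten(&doc) {
        println!("{key} = {value:?}");
    }
}
```

The sketch takes the input by reference and clones only the leaf values; whether the real function should borrow or own its input is exactly the trade-off measured in the benchmarks of this PR.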

I made some interesting performance improvements when I replaced the `json!` macro calls.

```
flatten/simple          time:   [452.44 ns 453.31 ns 454.18 ns]
                        change: [-15.036% -14.751% -14.473%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

Benchmarking flatten/complex: Collecting 100 samples in estimated 5.0007 s (4.9M i
flatten/complex         time:   [1.0101 us 1.0131 us 1.0160 us]
                        change: [-18.001% -17.775% -17.536%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  5 (5.00%) high mild
  1 (1.00%) high severe
```

---

_I removed this particular commit from this PR._ The reason is that the two other commits were enough for this PR to have enough impact and be merged. We will continue to explore where we can gain performance later.

But when I changed the flattening function to accept an owned version of the objects, we lost a lot of performance. Yes, I rewrote the benchmarks (locally) to clone the input object (and measured both the previous and new versions with the cloning benchmarks). Maybe cloning the benchmark inputs is not the right thing to do...

```
Benchmarking flatten/simple: Collecting 100 samples in estimated 5.0005 s (6.7M it
flatten/simple          time:   [746.46 ns 749.59 ns 752.70 ns]
                        change: [+40.082% +40.714% +41.347%] (p = 0.00 < 0.05)
                        Performance has regressed.

Benchmarking flatten/complex: Collecting 100 samples in estimated 5.0047 s (2.9M i
flatten/complex         time:   [1.7311 us 1.7342 us 1.7368 us]
                        change: [+40.976% +41.398% +41.807%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) low mild
```

Co-authored-by: Kerollmops <clement@meilisearch.com>
2022-04-13 11:14:29 +00:00
b3cec1a383 Prefer using direct method calls instead of using the json macros 2022-04-13 13:12:57 +02:00
436d2032c4 Add benchmarks to the flatten-serde-json subcrate 2022-04-13 13:12:57 +02:00
3828635fb2 Merge #489
489: fix distinct count bug r=curquiza a=MarinPostma

fix https://github.com/meilisearch/meilisearch/issues/2152

I think the issue was that we didn't remove the excluded candidates from the initial candidates when returning the candidates with the search result.
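
As a rough sketch of the fix described above: the candidates reported with the results should be the initial candidates minus the ones excluded by the distinct rule. The function name `reported_candidates` is hypothetical, and a `BTreeSet<u32>` is used here only as a standard-library stand-in for whatever set type the engine uses for document ids.

```rust
use std::collections::BTreeSet;

// Returns the candidates to report with the search result: the initial
// candidates with the excluded ones taken out.
fn reported_candidates(initial: &BTreeSet<u32>, excluded: &BTreeSet<u32>) -> BTreeSet<u32> {
    initial.difference(excluded).copied().collect()
}

fn main() {
    let initial = BTreeSet::from([1u32, 2, 3, 4, 5]);
    // Documents dropped because another document shares their distinct value.
    let excluded = BTreeSet::from([2u32, 4]);

    let candidates = reported_candidates(&initial, &excluded);
    println!("{} candidates: {:?}", candidates.len(), candidates);
}
```

Without the difference, the count would be taken over all five initial candidates instead of the three that remain.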


Co-authored-by: ad hoc <postma.marin@protonmail.com>
2022-04-13 10:15:30 +00:00
dda28d7415 exclude excluded candidates from search result candidates 2022-04-13 12:10:35 +02:00
cd83014fff add test for distinct nb hits 2022-04-13 12:10:35 +02:00
bbb6728d2f add distinct attributes to cli 2022-04-13 12:10:35 +02:00
49fbbacafc Merge #492
492: Add the new `Specify breaking` check to bors.toml r=curquiza a=curquiza

Should prevent this problem: https://github.com/meilisearch/milli/pull/489#issuecomment-1094988060

Co-authored-by: Clémentine Urquizar - curqui <clementine@meilisearch.com>
2022-04-13 08:59:40 +00:00
7ad582f39f Update bors.toml 2022-04-13 10:56:56 +02:00
aa896f0e7a Update bors.toml 2022-04-13 10:56:56 +02:00
0261a0e3cf Add the new Specify breaking check to bors.toml 2022-04-13 10:56:55 +02:00
41249be274 Add version flag 2022-04-12 15:22:36 -04:00
5809d3ae0d Add first benchmarks on formatting 2022-04-12 16:31:58 +02:00
049cf0fcee Merge #2313
2313: fix(search): remove the back and forth between the IndexMap and the serde_json::Map r=irevoire a=irevoire

This is ok because we're using the preserve_order feature in serde_json which is already internally using an IndexMap.

See https://github.com/meilisearch/meilisearch/pull/2298#discussion_r845228412


Co-authored-by: Tamo <tamo@meilisearch.com>
2022-04-12 14:17:26 +00:00
2ee210483f fix(search): remove the back and forth between the IndexMap and the serde_json::Map
This is ok because we're using the preserve_order feature in serde_json which is already internally using an IndexMap.
2022-04-12 16:12:52 +02:00
827cedcd15 Add format option structure 2022-04-12 13:42:14 +02:00
011f8210ed Make compute_matches more rust idiomatic 2022-04-12 10:19:02 +02:00
6b0737384b Merge #491
491: remove the unused key warning r=curquiza a=irevoire

When I copy-pasted my flatten crate, I forgot to remove the key used to publish the package, and that threw a warning.

Co-authored-by: Tamo <tamo@meilisearch.com>
2022-04-11 16:55:25 +00:00
13205066f3 Merge #2311
2311: Change version for the next release (v0.27.0) r=irevoire a=curquiza

Fixes #2310 

Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2022-04-11 14:49:33 +00:00
b3661bf8ec Change version for the next release (v0.27.0) 2022-04-11 16:25:15 +02:00
0990e95830 Feat(Analytics): Add analytics for search format options 2022-04-11 14:53:15 +02:00
e153418b8a remove the unused key warning 2022-04-11 14:52:41 +02:00
f67167fa9f Merge #2178
2178: Refacto docker r=irevoire a=irevoire

closes #2166 and #2085

-----------

I noticed many people had issues with the default configuration of our Dockerfile.
Some examples:
- #2166: If you use ubuntu and mount your `data.ms` in a volume (as shown in the [doc](https://docs.meilisearch.com/learn/getting_started/installation.html#download-and-launch)), you can't run meilisearch
- #2085: Here, meilisearch was not able to erase the `data.ms` when loading a dump because it's the mount point
Currently, we don't show how to use the snapshot and dumps with docker in the documentation. And it's quite hard to do:
  - You either send a big command to meilisearch to change the dump-path, snapshot-path and db-path to a single directory and then mount that one
  - Or you mount three volumes
- And there were other issues on the slack community

I think this PR solves the problem.
Now the image contains the `meilisearch` binary in the `/bin` directory, so it's easy to find and always in the `PATH`.
It creates a `data` directory and moves the working-dir into it.
So now you can find the `dumps`, `snapshots` and `data.ms` directories in `/data`.

Here is the new command to run meilisearch with a volume:
```
docker run -it --rm -v $PWD/meili_data:/data -p 7700:7700 getmeili/meilisearch:latest
```

And if you need to import a dump or a snapshot, you don't need to restart your container and mount another volume. You can directly hit the `POST /dumps` route and then run:
```
docker run -it --rm -v $PWD/meili_data:/data -p 7700:7700 getmeili/meilisearch:latest meilisearch --import-dump dumps/20220217-152115159.dump
```

-------

You can already try this PR with the following docker image:
```
getmeili/meilisearch:test-docker-v0.26.0
```

If you want to use v0.25.2, I created another image:
```
getmeili/meilisearch:test-docker-v0.25.2
```

------

If you're using helm, I created a branch [here](https://github.com/meilisearch/meilisearch-kubernetes/tree/test-docker-v0.26.0) that uses the v0.26.0 image with the right volume 👍 
If you use this conf with v0.25.2, it should also work.

Co-authored-by: Tamo <tamo@meilisearch.com>
v0.27.0rc0
2022-04-11 12:01:25 +00:00
31584f34e8 Merge #2298
2298: Nested fields r=irevoire a=irevoire

There are a few things that I want to fix _AFTER_ merging this PR.
For the following RCs.

## Stop the useless conversion
In `search.rs` I convert a `Document` to a `Value`, then the `Value` to a `Document`, then back to a `Value`, etc. I should stop doing all these conversions and stick to one format.
Probably by merging my `permissive-json-pointer` crate into meilisearch.
That would also give me the opportunity to work directly with obkvs and stop deserializing fields I don't need.

## Add more tests specific to the nested fields
Everything seems to work, but I should write tests to double-check that the nested fields work well with the `formatted` field.

## See how I could stop iterating on hashmap and instead fill them correctly
This is related to milli. I very often need to iterate over hashmaps to see if a field is a subset of another field. I could probably generate a structure containing all the possible key values.
i.e. the user says `doggo` is an attribute to retrieve. Instead of iterating on all the attributes to retrieve to check if `doggo.name` is a subset of `doggo`, I should insert `doggo.name` in the attributes to retrieve map.

Co-authored-by: Tamo <tamo@meilisearch.com>
2022-04-11 11:45:37 +00:00
a70e0a6422 Merge #2304
2304: chore(bors): comments clippy out r=curquiza a=irevoire

There is currently an issue with clippy that stops us from merging PRs.
https://github.com/rust-lang/rust-clippy/issues/8662#issuecomment-1093899755

We can't use clippy in the CI while that's not merged


Co-authored-by: Tamo <tamo@meilisearch.com>
2022-04-11 11:20:18 +00:00
348345f555 chore(bors): comments clippy out
There is currently an issue with clippy that stops us from merging PRs.
https://github.com/rust-lang/rust-clippy/issues/8662#issuecomment-1093899755

We can't use clippy in the CI while that's not merged
2022-04-11 13:19:00 +02:00
683206e140 feat(docker): refactoring the dockerfile
- Move the meilisearch binary to `/bin/meilisearch` so it's always in scope.
- Create a `meili_data` directory used as the default working directory
2022-04-11 13:14:44 +02:00
c8306616e0 Merge #490
490: Enforce labelling for the PRs r=curquiza a=curquiza

- Enforce one of the following labels to make the CI pass: `no breaking`, `DB breaking`, `API breaking` (milli API, not the Meilisearch API of course), or `skip changelog`. This new CI is now `Required` in the GitHub settings for merging a PR.
- Adapt the release drafter to these new labels
- rename `skip-changelog` into `skip changelog` according to the new label name

Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2022-04-11 08:24:23 +00:00
9383629d13 Enforce labelling for the PRs 2022-04-09 23:47:06 +02:00
a16de5de84 Simplify format and remove intermediate function 2022-04-08 11:20:41 +02:00
a769e09dfa Make token_crop_bounds more rust idiomatic 2022-04-07 20:15:14 +02:00
69d312209e feat(search): Implements the nested fields
See https://github.com/meilisearch/specifications/pull/121
2022-04-07 19:47:20 +02:00
9ac2fd1c37 Merge #487
487: Update version (v0.26.0) r=Kerollmops a=curquiza

breaking because of #458 

Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2022-04-07 17:10:24 +00:00
80ae020bee Merge #458
458: Nested fields r=Kerollmops a=irevoire

For the following document:
```json
{
  "id": 1,
  "person": {
    "name": "tamo",
    "age": 25
  }
}
```
Suppose the user sets `person` as a filterable attribute. We need to store `person` in the filterable attributes, _obviously_. But we also need to keep track of `person.name` and `person.age` somewhere.
That’s where I changed the logic of the engine a little.

Currently, we have a function called `faceted_field` that returns the union of the filterable and sortable attributes.
I renamed this function to `user_defined_faceted_field`. And now, when we finish indexing documents, we look at all the fields and see if they « match » a `user_defined_faceted_field`.
So in our case:
- does `id` match `person`: 🔴 
- does `person.name` match `person`: 🟢 
- does `person.age` match `person`: 🟢 

And thus, we insert in the database the following faceted fields: `person, person.name, person.age`.

The good thing about that solution is that we generate everything during the indexing phase, and then during the search, we can access our field without recomputing too much globbing.
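
The « match » check above could be sketched like this. This is only an illustration of the rule as described, not the code from the PR: `matches_selector` and `faceted_fields` are hypothetical names, and the sketch assumes a field matches when it is the user-defined field itself or starts with it followed by a dot.

```rust
// Returns true when `field` is `selector` itself or one of its nested
// sub-fields, e.g. `person.name` matches `person` but `personal` does not.
fn matches_selector(field: &str, selector: &str) -> bool {
    match field.strip_prefix(selector) {
        Some(rest) => rest.is_empty() || rest.starts_with('.'),
        None => false,
    }
}

// Computes the faceted fields to store once indexing is done: the
// user-defined fields, plus every document field that matches one of them.
fn faceted_fields(user_defined: &[&str], document_fields: &[&str]) -> Vec<String> {
    let mut faceted: Vec<String> = user_defined.iter().copied().map(String::from).collect();
    for field in document_fields.iter().copied() {
        let matched = user_defined
            .iter()
            .any(|selector| matches_selector(field, selector));
        if matched && !faceted.iter().any(|known| known.as_str() == field) {
            faceted.push(field.to_string());
        }
    }
    faceted
}

fn main() {
    let user_defined = ["person"];
    let document_fields = ["id", "person.name", "person.age"];

    for field in document_fields {
        println!("does `{field}` match `person`: {}", matches_selector(field, "person"));
    }
    println!("{:?}", faceted_fields(&user_defined, &document_fields));
}
```

Doing this once at the end of indexing is what lets the search side look fields up directly instead of repeating the prefix check on every query.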

-----

Now the bad thing is that I had to create a new db.

And if that was only one db, that would be ok, but actually, I need to do the same for the:
- Displayed attributes
- Attributes to retrieve
- Attributes to highlight
- Attributes to crop

`@Kerollmops` 
Do you think there is a better way to do it?
Apart from all the code, can we have a problem because we have too many dbs?

Co-authored-by: Irevoire <tamo@meilisearch.com>
Co-authored-by: Tamo <tamo@meilisearch.com>
2022-04-07 16:26:09 +00:00
bab898ce86 move the flatten-serde-json crate inside of milli 2022-04-07 18:20:44 +02:00
c8ed1675a7 Add some documentation 2022-04-07 17:32:13 +02:00
b1905dfa24 Make split_best_frequency returns references instead of owned data 2022-04-07 17:05:44 +02:00
ab458d8840 fix tests after rebase 2022-04-07 17:00:00 +02:00
4f3ce6d9cd nested fields 2022-04-07 16:58:46 +02:00
ee1d627803 Update version (v0.26.0) 2022-04-07 15:56:10 +02:00
013fe4cbc9 Merge #2297
2297: Feat(Search): Enhance formatting search results r=ManyTheFish a=ManyTheFish

Add new settings and change crop_len behavior to count words instead of characters.

- [x] `highlightPreTag`
- [x] `highlightPostTag`
- [x] `cropMarker`
- [x] `cropLength` counts words instead of chars
- [x] `cropLength` 0 is now considered as no `cropLength`
- [ ] ~smart crop finding the best matches interval~ (postponed)

Partially fixes  #2214. (no smart crop)
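
To make the new `cropLength` semantics concrete, here is a deliberately simplified sketch. It is not the implementation from this PR: `crop_words` is a hypothetical helper, words are split on whitespace rather than with the engine's tokenizer, and the text is cropped from the start instead of around the matches. It only shows the two rules listed above: the length is counted in words, and `0` means no cropping.

```rust
// Keeps at most `crop_length` words of `text` and appends `crop_marker`
// when something was cut. A `crop_length` of 0 disables cropping.
fn crop_words(text: &str, crop_length: usize, crop_marker: &str) -> String {
    if crop_length == 0 {
        return text.to_string();
    }
    let words: Vec<&str> = text.split_whitespace().collect();
    if words.len() <= crop_length {
        return text.to_string();
    }
    format!("{}{}", words[..crop_length].join(" "), crop_marker)
}

fn main() {
    let text = "the quick brown fox jumps over the lazy dog";
    // Counted in words, not characters.
    println!("{}", crop_words(text, 3, "…"));
    // A crop length of 0 leaves the text untouched.
    println!("{}", crop_words(text, 0, "…"));
}
```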


Co-authored-by: ManyTheFish <many@meilisearch.com>
2022-04-07 13:29:56 +00:00
dc2cc1ee89 Feat(Search): Enhance formatting search results 2022-04-07 15:04:08 +02:00
bb5f0e1485 Merge #2271
2271: Simplify Dockerfile r=ManyTheFish a=Thearas

# Pull Request

## What does this PR do?

1. Fixes #2234
2. Replace `$TARGETPLATFORM` with `apk --print-arch` to make Dockerfile available for `docker build` as well, not just `docker buildx` (inspired by [rust-lang/docker-rust](https://github.com/rust-lang/docker-rust/blob/master/1.59.0/alpine3.14/Dockerfile#L13))

PTAL `@curquiza` 

## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!


Co-authored-by: Thearas <thearas850@gmail.com>
2022-04-07 11:50:21 +00:00
d5e33637b7 Merge #2296
2296: disable typo for attributes r=curquiza a=MarinPostma

Introduce the disable typos on attribute feature as per https://github.com/meilisearch/specifications/pull/117.


Co-authored-by: ad hoc <postma.marin@protonmail.com>
2022-04-06 18:10:45 +00:00
c321ac61b5 Merge #2259
2259: disable typos on words r=MarinPostma a=MarinPostma

Introduce the disable typo setting as per https://github.com/meilisearch/specifications/pull/117.

waiting for https://github.com/meilisearch/milli/pull/474.


Co-authored-by: ad hoc <postma.marin@protonmail.com>
2022-04-06 17:40:08 +00:00
67dea08a0a feat(http, lib): enable disable typos on attributes 2022-04-06 19:25:12 +02:00
e9f66b8766 feat(all): introduce disable typo on words 2022-04-06 19:16:36 +02:00