Commit Graph

1413 Commits

Author SHA1 Message Date
bors[bot]
700318dc62 Merge #357
357: Add benchmarks for the geosearch r=Kerollmops a=irevoire

closes #336

Should I merge this PR in #322 and then we merge everything in `main` or should we wait for #322 to be merged and then merge this one in `main` later?

Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: Irevoire <tamo@meilisearch.com>
2021-09-21 16:08:06 +00:00
bors[bot]
9d9010e45f Merge #324
324: Implement documents API r=Kerollmops a=MarinPostma

This pr implement the intermediary document representation for milli. The JSON, JSONL and CSV formats are replaced with the format instead, to push the serialization duty on the client side.

The `documents` module contains the interface to the new document format:

- The `DocumentsBuilder` allows the creation of a writer backed document addition, when documents are added either one by one, or as arrays of depth 1. This is made possible by the fact that the seriliazer used by the `add_documents` methods only accepts `[Object]` and `Object`. The related serialization logic is located in the `serde.rs` file.
- The `DocumentsReader` allows to to iterate over the documents created by a `DocumentsBuilder`. A call to `next_document_with_index` returns the next obkv reader in the document addition, along with a reference to the index used to map the field ids in the obkv reader to the field names

All references to json, jsonl or csv in the tests have been replaced with the `documents!` macro, works exaclty like the `serde_json::json` macro, as a convenient way to create a `DocumentsReader`.

Rewrote the search cli, to the `cli` crate, to also allow index manipulation. This only offers basic functionalities for now, but is meant to be easier to extend than http ui


blocked by #308

Co-authored-by: mpostma <postma.marin@protonmail.com>
2021-09-21 15:40:03 +00:00
mpostma
aa6c5df0bc Implement documents format
document reader transform

remove update format

support document sequences

fix document transform

clean transform

improve error handling

add documents! macro

fix transform bug

fix tests

remove csv dependency

Add comments on the transform process

replace search cli

fmt

review edits

fix http ui

fix clippy warnings

Revert "fix clippy warnings"

This reverts commit a1ce3cd96e603633dbf43e9e0b12b2453c9c5620.

fix review comments

remove smallvec in transform loop

review edits
2021-09-21 16:58:33 +02:00
bors[bot]
94764e5c7c Merge #360
360: Update version for the next release (v0.14.0) r=Kerollmops a=curquiza

Release containing the geosearch, cf #322 

Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2021-09-21 08:43:27 +00:00
bors[bot]
31c8de1cca Merge #322
322: Geosearch r=ManyTheFish a=irevoire

This PR introduces [basic geo-search functionalities](https://github.com/meilisearch/specifications/pull/59), it makes the engine able to index, filter and, sort by geo-point. We decided to use [the rstar library](https://docs.rs/rstar) and to save the points in [an RTree](https://docs.rs/rstar/0.9.1/rstar/struct.RTree.html) that we de/serialize in the index database [by using serde](https://serde.rs/) with [bincode](https://docs.rs/bincode). This is not an efficient way to query this tree as it will consume a lot of CPU and memory when a search is made, but at least it is an easy first way to do so.

### What we will have to do on the indexing part:
 - [x] Index the `_geo` fields from the documents.
   - [x] Create a new module with an extractor in the `extract` module that takes the `obkv_documents` and retrieves the latitude and longitude coordinates, outputting them in a `grenad::Reader` for further process.
   - [x] Call the extractor in the `extract::extract_documents_data` function and send the result to the `TypedChunk` module.
   - [x] Get the `grenad::Reader` in the `typed_chunk::write_typed_chunk_into_index` function and store all the points in the `rtree`
- [x] Delete the documents from the `RTree` when deleting documents from the database. All this can be done in the `delete_documents.rs` file by getting the data structure and removing the points from it, inserting it back after the modification.
- [x] Clearing the `RTree` entirely when we clear the documents from the database, everything happens in the `clear_documents.rs` file.
- [x] save a Roaring bitmap of all documents containing the `_geo` field

### What we will have to do on the query part:
- [x] Filter the documents at a certain distance around a point, this is done by [collecting the documents from the searched point](https://docs.rs/rstar/0.9.1/rstar/struct.RTree.html#method.nearest_neighbor_iter) while they are in range.
  - [x] We must introduce new `geoLowerThan` and `geoGreaterThan` variants to the `Operator` filter enum.
  - [x] Implement the `negative` method on both variants where the `geoGreaterThan` variant is implemented by executing the `geoLowerThan` and removing the results found from the whole list of geo faceted documents.
  - [x] Add the `_geoRadius` function in the pest parser.
- [x] Introduce a `_geo` ascending ranking function that takes a point in parameter, ~~this function must keep the iterator on the `RTree` and make it peekable~~ This was not possible for now, we had to collect the whole iterator. Only the documents that are part of the candidates must be sent too!
  - [x] This ascending ranking rule will only be active if the search is set up with the `_geoPoint` parameter that indicates the center point of the ascending ranking rule.

-----------

- On Meilisearch part: We must introduce a new concept, returning the documents with a new `_geoDistance` field when it passed by the `_geo` ranking rule, this has never been done before. We could maybe just do it afterward when the documents have been retrieved from the database, computing the distance from the `_geoPoint` and all of the documents to be returned.

Co-authored-by: Irevoire <tamo@meilisearch.com>
Co-authored-by: cvermand <33010418+bidoubiwa@users.noreply.github.com>
Co-authored-by: Tamo <tamo@meilisearch.com>
2021-09-20 19:04:57 +00:00
Irevoire
0d104a0fce Update milli/src/criterion.rs
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-09-20 18:13:17 +02:00
Clémentine Urquizar
3f1453f470 Update version for the next release (v0.14.0) 2021-09-20 18:12:23 +02:00
Tamo
f4b8e5675d move the reserved keyword logic for the criterion and sort + add test 2021-09-20 17:21:02 +02:00
Irevoire
3b7a2cdbce fix typo
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-09-20 16:10:39 +02:00
bors[bot]
203aa727a7 Merge #359
359: Improve the benchmark comparison script r=irevoire a=irevoire

This modification allow us to compare more than 2 benchmarks or to only print the results of one benchmark



Co-authored-by: Irevoire <tamo@meilisearch.com>
Co-authored-by: Tamo <tamo@meilisearch.com>
2021-09-20 12:39:59 +00:00
Tamo
eaba772f21 update the README to better match the new critcmp usage
Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2021-09-20 10:59:55 +02:00
Irevoire
9a920d1f93 Fix datasets links in the readme
Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2021-09-20 10:44:37 +02:00
Tamo
5e683ba472 add benchmarks for the geosearch 2021-09-20 10:44:37 +02:00
Irevoire
f6c6b026bb improve the comparison script 2021-09-16 11:25:51 +02:00
Tamo
c695a1ffd2 add the possibility to sort by descending order on geoPoint 2021-09-15 11:49:58 +02:00
Tamo
91ce4d1721 Stop iterating through the whole list of points
We stop when there is no possible candidates left
2021-09-15 11:49:58 +02:00
bors[bot]
3b1885859d Merge #356
356: Update the README r=curquiza a=Kerollmops

This PR updates a little bit the README and more specifically the indexing times, fixes #352.

Co-authored-by: Kerollmops <clement@meilisearch.com>
2021-09-14 10:13:05 +00:00
Kerollmops
2741aa8589 Update the indexing timings in the README 2021-09-14 11:42:59 +02:00
Kerollmops
a43f99c600 Inform the users that documents must have an id in there documents 2021-09-13 14:01:02 +02:00
bors[bot]
90d64d257f Merge #354
354: Update version for the next release (v0.13.1) r=ManyTheFish a=curquiza



Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2021-09-13 09:30:07 +00:00
Clémentine Urquizar
f167f7b412 Update version for the next release (v0.13.1) 2021-09-10 09:48:17 +02:00
bors[bot]
4af31ec9a6 Merge #353
353: Add lacking parameter to word level position builder r=Kerollmops a=ManyTheFish



Co-authored-by: many <maxime@meilisearch.com>
2021-09-09 16:36:33 +00:00
Tamo
cfc62a1c15 use geoutils instead of haversine 2021-09-09 18:11:38 +02:00
many
26deeb45a3 Add lacking parameter to word level position builder 2021-09-09 17:49:04 +02:00
Tamo
3fc145c254 if we have no rtree we return all other provided documents 2021-09-09 17:44:09 +02:00
Irevoire
a84f3a8b31 Apply suggestions from code review
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-09-09 15:09:35 +02:00
Tamo
c81ff22c5b delete the invalid criterion name error in favor of invalid ranking rule name 2021-09-08 19:17:00 +02:00
Tamo
bad8ea47d5 edit the two lasts TODO comments 2021-09-08 18:24:09 +02:00
Tamo
b15c77ebc4 return an error in case a user try to sort with :desc 2021-09-08 18:24:09 +02:00
Tamo
4b618b95e4 rebase on main 2021-09-08 18:24:09 +02:00
Tamo
2988d3c76d tests the geo filters 2021-09-08 18:24:09 +02:00
Tamo
e5ef0cad9a use meters in the filters 2021-09-08 18:24:09 +02:00
Tamo
4f69b190bc remove the distance from the search, the computation of the distance will be made on meilisearch side 2021-09-08 18:24:09 +02:00
Tamo
7ae2a7341c introduce the reserved keywords in the filters 2021-09-08 18:24:09 +02:00
Tamo
6d5762a6c8 handle the case where you forgot entirely the parenthesis 2021-09-08 18:24:09 +02:00
Tamo
ebf82ac28c improve the error messages and add tests for the filters 2021-09-08 18:24:09 +02:00
Tamo
bd4c248292 improve the error handling in general and introduce the concept of reserved keywords 2021-09-08 18:24:09 +02:00
Tamo
e8c093c1d0 fix the error handling in the filters 2021-09-08 18:24:09 +02:00
Tamo
f0b74637dc fix all the tests 2021-09-08 18:24:09 +02:00
Tamo
b1bf7d4f40 reformat 2021-09-08 18:24:09 +02:00
Tamo
aca707413c remove the memory leak 2021-09-08 18:24:09 +02:00
Tamo
a8a1f5bd55 move the geosearch criteria out of asc_desc.rs 2021-09-08 18:24:09 +02:00
Tamo
dc84ecc40b fix a bug 2021-09-08 18:24:09 +02:00
Tamo
7483614b75 [HTTP-UI] add the sorters 2021-09-08 18:24:09 +02:00
Tamo
4820ac71a6 allow spaces in a geoRadius 2021-09-08 18:24:09 +02:00
Tamo
13c78e5aa2 Implement the _geoPoint in the sortable 2021-09-08 18:24:09 +02:00
Tamo
5bb175fc90 only index _geo if it's set as sortable OR filterable
and only allow the filters if geo was set to filterable
2021-09-08 17:51:08 +02:00
Tamo
f73273d71c only call the extractor if needed 2021-09-08 17:51:08 +02:00
cvermand
4fd0116a0d Stringify objects on dashboard to avoid [Object object] 2021-09-08 17:51:08 +02:00
Irevoire
ea2f2ecf96 create a new database containing all the documents that were geo-faceted 2021-09-08 17:51:08 +02:00