Commit Graph

764 Commits

Author SHA1 Message Date
Irevoire
a3e7c468cd add helper methods on the settings 2021-10-13 13:05:07 +02:00
bors[bot]
6e3b869e6a Merge #388
388: fix primary key inference r=MarinPostma a=MarinPostma

The primary key is was infered from a hashtable index of the field. For this reason the order in which the fields were interated upon was not deterministic, and the primary key was chosed ffrom the first field containing "id".

This fix sorts the the index by field_id when infering the primary key.


Co-authored-by: mpostma <postma.marin@protonmail.com>
2021-10-12 09:25:16 +00:00
mpostma
86ead92ed5 infer primary key on sorted fields 2021-10-12 11:15:11 +02:00
mpostma
9a266a531b test correct primary key inference 2021-10-12 11:08:53 +02:00
many
c5a6075484 Make max_position_per_attributes changable 2021-10-12 10:10:50 +02:00
many
360c5ff3df Remove limit of 1000 position per attribute
Instead of using an arbitrary limit we encode the absolute position in a u32
using one strong u16 for the field id and a weak u16 for the relative position in the attribute.
2021-10-12 10:10:50 +02:00
Tom Parker-Shemilt
2dfe24f067 memmap -> memmap2 2021-10-10 22:47:12 +01:00
many
3296bb243c Simplify word level position DB into a word position DB 2021-10-05 12:15:02 +02:00
Many
26b5dad042 Revert "Change chunk size to 4MiB to fit more the end user usage" 2021-09-29 15:08:39 +02:00
Tamo
f65153ad64 stop casting integer docids to string 2021-09-28 18:35:54 +02:00
many
1988416295 Add failing test related to Meilisearch#1714 2021-09-28 12:05:11 +02:00
many
b188063869 Change chunk size to 4MiB to fit more the end user usage 2021-09-27 14:26:21 +02:00
many
551df0cb77 Add test checking the bug reported in meilisearch issue 1716 2021-09-23 15:55:39 +02:00
mpostma
aa6c5df0bc Implement documents format
document reader transform

remove update format

support document sequences

fix document transform

clean transform

improve error handling

add documents! macro

fix transform bug

fix tests

remove csv dependency

Add comments on the transform process

replace search cli

fmt

review edits

fix http ui

fix clippy warnings

Revert "fix clippy warnings"

This reverts commit a1ce3cd96e603633dbf43e9e0b12b2453c9c5620.

fix review comments

remove smallvec in transform loop

review edits
2021-09-21 16:58:33 +02:00
bors[bot]
31c8de1cca Merge #322
322: Geosearch r=ManyTheFish a=irevoire

This PR introduces [basic geo-search functionalities](https://github.com/meilisearch/specifications/pull/59), it makes the engine able to index, filter and, sort by geo-point. We decided to use [the rstar library](https://docs.rs/rstar) and to save the points in [an RTree](https://docs.rs/rstar/0.9.1/rstar/struct.RTree.html) that we de/serialize in the index database [by using serde](https://serde.rs/) with [bincode](https://docs.rs/bincode). This is not an efficient way to query this tree as it will consume a lot of CPU and memory when a search is made, but at least it is an easy first way to do so.

### What we will have to do on the indexing part:
 - [x] Index the `_geo` fields from the documents.
   - [x] Create a new module with an extractor in the `extract` module that takes the `obkv_documents` and retrieves the latitude and longitude coordinates, outputting them in a `grenad::Reader` for further process.
   - [x] Call the extractor in the `extract::extract_documents_data` function and send the result to the `TypedChunk` module.
   - [x] Get the `grenad::Reader` in the `typed_chunk::write_typed_chunk_into_index` function and store all the points in the `rtree`
- [x] Delete the documents from the `RTree` when deleting documents from the database. All this can be done in the `delete_documents.rs` file by getting the data structure and removing the points from it, inserting it back after the modification.
- [x] Clearing the `RTree` entirely when we clear the documents from the database, everything happens in the `clear_documents.rs` file.
- [x] save a Roaring bitmap of all documents containing the `_geo` field

### What we will have to do on the query part:
- [x] Filter the documents at a certain distance around a point, this is done by [collecting the documents from the searched point](https://docs.rs/rstar/0.9.1/rstar/struct.RTree.html#method.nearest_neighbor_iter) while they are in range.
  - [x] We must introduce new `geoLowerThan` and `geoGreaterThan` variants to the `Operator` filter enum.
  - [x] Implement the `negative` method on both variants where the `geoGreaterThan` variant is implemented by executing the `geoLowerThan` and removing the results found from the whole list of geo faceted documents.
  - [x] Add the `_geoRadius` function in the pest parser.
- [x] Introduce a `_geo` ascending ranking function that takes a point in parameter, ~~this function must keep the iterator on the `RTree` and make it peekable~~ This was not possible for now, we had to collect the whole iterator. Only the documents that are part of the candidates must be sent too!
  - [x] This ascending ranking rule will only be active if the search is set up with the `_geoPoint` parameter that indicates the center point of the ascending ranking rule.

-----------

- On Meilisearch part: We must introduce a new concept, returning the documents with a new `_geoDistance` field when it passed by the `_geo` ranking rule, this has never been done before. We could maybe just do it afterward when the documents have been retrieved from the database, computing the distance from the `_geoPoint` and all of the documents to be returned.

Co-authored-by: Irevoire <tamo@meilisearch.com>
Co-authored-by: cvermand <33010418+bidoubiwa@users.noreply.github.com>
Co-authored-by: Tamo <tamo@meilisearch.com>
2021-09-20 19:04:57 +00:00
many
26deeb45a3 Add lacking parameter to word level position builder 2021-09-09 17:49:04 +02:00
Irevoire
a84f3a8b31 Apply suggestions from code review
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-09-09 15:09:35 +02:00
Tamo
bad8ea47d5 edit the two lasts TODO comments 2021-09-08 18:24:09 +02:00
Tamo
bd4c248292 improve the error handling in general and introduce the concept of reserved keywords 2021-09-08 18:24:09 +02:00
Tamo
5bb175fc90 only index _geo if it's set as sortable OR filterable
and only allow the filters if geo was set to filterable
2021-09-08 17:51:08 +02:00
Tamo
f73273d71c only call the extractor if needed 2021-09-08 17:51:08 +02:00
Irevoire
ea2f2ecf96 create a new database containing all the documents that were geo-faceted 2021-09-08 17:51:08 +02:00
Irevoire
216a8aa3b2 add a tests for the indexation of the geosearch 2021-09-08 17:51:07 +02:00
Irevoire
a21c854790 handle errors 2021-09-08 17:51:07 +02:00
Irevoire
70ab2c37c5 remove multiple bugs 2021-09-08 17:51:07 +02:00
Irevoire
b4b6ba6d82 rename all the ’long’ into ’lng’ like written in the specification 2021-09-08 17:51:07 +02:00
Irevoire
3b9f1db061 implement the clear of the rtree 2021-09-08 17:51:07 +02:00
Irevoire
d344489c12 implement the deletion of geo points 2021-09-08 17:51:07 +02:00
Irevoire
44d6b6ae9e Index the geo points 2021-09-08 17:51:07 +02:00
many
e54280fbfc Skip empty normalized words 2021-09-08 15:25:23 +02:00
many
d18ee58ab9 Check if key are not empty in validator 2021-09-08 15:25:23 +02:00
many
9961b78b06 Drop sorter before creating a new one 2021-09-08 13:30:26 +02:00
many
741a4444a9 Remove log in chunk generator 2021-09-02 16:57:46 +02:00
many
7f7fafb857 Make document_chunk_size settable from update builder 2021-09-02 15:25:39 +02:00
many
db0c681bae Fix Pr comments 2021-09-02 15:17:52 +02:00
many
4860fd4529 Ignore empty facet values 2021-09-01 16:48:40 +02:00
many
b3a22f31f6 Fix memory consuption in word pair proximity extractor 2021-09-01 16:48:40 +02:00
many
9452fabfb2 Optimize cbo roaring bitmaps merge 2021-09-01 16:48:40 +02:00
many
8f702828ca Ignore errors comming from crossbeam channel senders 2021-09-01 16:48:40 +02:00
many
e09eec37bc Handle distance addition with hard separators 2021-09-01 16:48:40 +02:00
many
fc7cc770d4 Add logging timers 2021-09-01 16:48:40 +02:00
many
a2f59a28f7 Remove unwrap sending errors in channel 2021-09-01 16:48:40 +02:00
many
5c962c03dd Fix and optimize word_prefix_pair_proximity_docids database 2021-09-01 16:48:40 +02:00
many
2d1727697d Take stop word in account 2021-09-01 16:48:40 +02:00
many
823da19745 Fix test and use progress callback 2021-09-01 16:48:39 +02:00
many
1d314328f0 Plug new indexer 2021-09-01 16:48:36 +02:00
bors[bot]
d6bba0663a Merge #334
334: Wrap long values into BStr for warn logs r=Kerollmops a=shekhirin

Resolves https://github.com/meilisearch/milli/issues/263

Co-authored-by: Alexey Shekhirin <a.shekhirin@gmail.com>
2021-08-31 17:38:54 +00:00
Alexey Shekhirin
0b02eb456c chore(update): wrap long values into BStr for warn logs 2021-08-31 20:28:16 +03:00
Kerollmops
f230ae6fd5 Introduce the reset_sortable_fields Settings method 2021-08-25 17:44:16 +02:00
Clément Renault
89d0758713 Revert "Revert "Sort at query time"" 2021-08-24 11:55:16 +02:00