Commit Graph

157 Commits

Author SHA1 Message Date
Kerollmops
9142ba9dd4 Fix the parsing of ndjson requests to index more than the first line 2022-02-02 17:55:13 +01:00
bors[bot]
9f2ff71581 Merge #434
434: bump milli to v0.22.0 r=curquiza a=irevoire

This is breaking because of this PR:
98a365aaae

Should we do a special branch to only release the [patch](https://github.com/meilisearch/milli/pull/433) for https://github.com/meilisearch/MeiliSearch/issues/2082 (which is non-breaking)?

Co-authored-by: Tamo <tamo@meilisearch.com>
2022-01-24 17:31:20 +00:00
Marin Postma
0c84a40298 document batch support
reusable transform

rework update api

add indexer config

fix tests

review changes

Co-authored-by: Clément Renault <clement@meilisearch.com>

fmt
2022-01-19 12:40:20 +01:00
Tamo
367f403693 bump milli 2022-01-17 16:41:34 +01:00
Samyak S Sarnayak
c0313f3026 Use chars for highlight instead of graphemes
Tokenizer v0.2.7 uses chars instead of graphemes for matching bytes.
`unicode-segmentation` dependency isn't needed anymore.

Also, oxidised the highlight code :)

Co-authored-by: many <maxime@meilisearch.com>
2022-01-17 13:15:31 +05:30
Samyak S Sarnayak
c10f58b7bd Update tokenizer to v0.2.7 2022-01-17 13:02:00 +05:30
Samyak S Sarnayak
30247d70cd Fix search highlight for non-unicode chars
The `matching_bytes` function takes a `&Token` now and:
- gets the number of bytes to highlight (unchanged).
- uses `Token.num_graphemes_from_bytes` to get the number of grapheme
  clusters to highlight.

In essence, the `matching_bytes` function returns the number of matching
grapheme clusters instead of bytes. Should this function be renamed
then?

Added proper highlighting in the HTTP UI:
- requires dependency on `unicode-segmentation` to extract grapheme
  clusters from tokens
- `<mark>` tag is put around only the matched part
    - before this change, the entire word was highlighted even if only a
      part of it matched
2022-01-17 11:37:44 +05:30
Clément Renault
1c6c89f345 Fix the binaries that use the new optional filters 2021-12-09 11:57:53 +01:00
many
1b3923b5ce Update all packages to 0.21.0 2021-11-29 12:17:59 +01:00
many
64ef5869d7 Update tokenizer v0.2.6 2021-11-18 16:56:05 +01:00
Marin Postma
6eb47ab792 remove update_id in UpdateBuilder 2021-11-16 13:07:04 +01:00
Tamo
6831c23449 merge with main 2021-11-06 16:34:30 +01:00
Tamo
07a5ffb04c update http-ui 2021-11-04 15:52:22 +01:00
many
743ed9f57f Bump milli version 2021-11-04 14:04:21 +01:00
many
702589104d Update version for the next release (v0.20.1) 2021-11-03 14:20:01 +01:00
Clémentine Urquizar
056ff13c4d Update version for the next release (v0.20.0) 2021-10-28 14:52:57 +02:00
bors[bot]
d7943fe225 Merge #402
402: Optimize document transform r=MarinPostma a=MarinPostma

This pr optimizes the transform of documents additions in the obkv format. Instead on accepting any serializable objects, we instead treat json and CSV specifically:
- For json, we build a serde `Visitor`, that transform the json straight into obkv without intermediate representation.
- For csv, we directly write the lines in the obkv, applying other optimization as well.

Co-authored-by: marin postma <postma.marin@protonmail.com>
2021-10-26 09:55:28 +00:00
marin postma
baddd80069 implement review suggestions 2021-10-25 18:29:12 +02:00
Clémentine Urquizar
679fe18b17 Update version for the next release (v0.19.0) 2021-10-25 11:52:17 +02:00
Tamo
e25ca9776f start updating the exposed function to makes other modules happy 2021-10-22 17:23:22 +02:00
Clémentine Urquizar
f8fe9316c0 Update version for the next release (v0.18.1) 2021-10-21 11:56:14 +02:00
Clémentine Urquizar
2209acbfe2 Update version for the next release (v0.18.2) 2021-10-18 13:45:48 +02:00
bors[bot]
c7db4176f3 Merge #384
384: Replace memmap with memmap2 r=Kerollmops a=palfrey

[memmap is unmaintained](https://rustsec.org/advisories/RUSTSEC-2020-0077.html) and needs replacing. memmap2 is a drop-in replacement fork that's well maintained. Note that the version numbers got reset on fork, hence the lower values.

Co-authored-by: Tom Parker-Shemilt <palfrey@tevp.net>
2021-10-13 13:47:23 +00:00
many
c5a6075484 Make max_position_per_attributes changable 2021-10-12 10:10:50 +02:00
Clémentine Urquizar
dd56e82dba Update version for the next release (v0.17.2) 2021-10-11 15:20:35 +02:00
Tom Parker-Shemilt
2dfe24f067 memmap -> memmap2 2021-10-10 22:47:12 +01:00
many
1bd15d849b Reduce candidates threshold 2021-10-05 18:52:14 +02:00
Clémentine Urquizar
05d8a33a28 Update version for the next release (v0.17.1) 2021-10-02 16:21:31 +02:00
bors[bot]
bfedbc1b6d Merge #374
374: Enhance CSV document parsing r=Kerollmops a=ManyTheFish

Benchmarks on `search_songs` were crashing because of the CSV parsing.

Co-authored-by: many <maxime@meilisearch.com>
2021-09-29 08:55:54 +00:00
many
d2427f18e5 Enhance CSV document parsing 2021-09-29 10:25:33 +02:00
Clémentine Urquizar
0e8665bf18 Update version for the next release (v0.17.0) 2021-09-28 19:38:12 +02:00
Tamo
cc732fe95e update http-ui to use the sort-error 2021-09-28 11:15:24 +02:00
Clémentine Urquizar
1eacab2169 Update version for the next release (v0.15.1) 2021-09-22 17:18:54 +02:00
Clémentine Urquizar
f8ecbc28e2 Update version for the next release (v0.15.0) 2021-09-21 18:09:14 +02:00
mpostma
aa6c5df0bc Implement documents format
document reader transform

remove update format

support document sequences

fix document transform

clean transform

improve error handling

add documents! macro

fix transform bug

fix tests

remove csv dependency

Add comments on the transform process

replace search cli

fmt

review edits

fix http ui

fix clippy warnings

Revert "fix clippy warnings"

This reverts commit a1ce3cd96e603633dbf43e9e0b12b2453c9c5620.

fix review comments

remove smallvec in transform loop

review edits
2021-09-21 16:58:33 +02:00
bors[bot]
94764e5c7c Merge #360
360: Update version for the next release (v0.14.0) r=Kerollmops a=curquiza

Release containing the geosearch, cf #322 

Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2021-09-21 08:43:27 +00:00
bors[bot]
31c8de1cca Merge #322
322: Geosearch r=ManyTheFish a=irevoire

This PR introduces [basic geo-search functionalities](https://github.com/meilisearch/specifications/pull/59), it makes the engine able to index, filter and, sort by geo-point. We decided to use [the rstar library](https://docs.rs/rstar) and to save the points in [an RTree](https://docs.rs/rstar/0.9.1/rstar/struct.RTree.html) that we de/serialize in the index database [by using serde](https://serde.rs/) with [bincode](https://docs.rs/bincode). This is not an efficient way to query this tree as it will consume a lot of CPU and memory when a search is made, but at least it is an easy first way to do so.

### What we will have to do on the indexing part:
 - [x] Index the `_geo` fields from the documents.
   - [x] Create a new module with an extractor in the `extract` module that takes the `obkv_documents` and retrieves the latitude and longitude coordinates, outputting them in a `grenad::Reader` for further process.
   - [x] Call the extractor in the `extract::extract_documents_data` function and send the result to the `TypedChunk` module.
   - [x] Get the `grenad::Reader` in the `typed_chunk::write_typed_chunk_into_index` function and store all the points in the `rtree`
- [x] Delete the documents from the `RTree` when deleting documents from the database. All this can be done in the `delete_documents.rs` file by getting the data structure and removing the points from it, inserting it back after the modification.
- [x] Clearing the `RTree` entirely when we clear the documents from the database, everything happens in the `clear_documents.rs` file.
- [x] save a Roaring bitmap of all documents containing the `_geo` field

### What we will have to do on the query part:
- [x] Filter the documents at a certain distance around a point, this is done by [collecting the documents from the searched point](https://docs.rs/rstar/0.9.1/rstar/struct.RTree.html#method.nearest_neighbor_iter) while they are in range.
  - [x] We must introduce new `geoLowerThan` and `geoGreaterThan` variants to the `Operator` filter enum.
  - [x] Implement the `negative` method on both variants where the `geoGreaterThan` variant is implemented by executing the `geoLowerThan` and removing the results found from the whole list of geo faceted documents.
  - [x] Add the `_geoRadius` function in the pest parser.
- [x] Introduce a `_geo` ascending ranking function that takes a point in parameter, ~~this function must keep the iterator on the `RTree` and make it peekable~~ This was not possible for now, we had to collect the whole iterator. Only the documents that are part of the candidates must be sent too!
  - [x] This ascending ranking rule will only be active if the search is set up with the `_geoPoint` parameter that indicates the center point of the ascending ranking rule.

-----------

- On Meilisearch part: We must introduce a new concept, returning the documents with a new `_geoDistance` field when it passed by the `_geo` ranking rule, this has never been done before. We could maybe just do it afterward when the documents have been retrieved from the database, computing the distance from the `_geoPoint` and all of the documents to be returned.

Co-authored-by: Irevoire <tamo@meilisearch.com>
Co-authored-by: cvermand <33010418+bidoubiwa@users.noreply.github.com>
Co-authored-by: Tamo <tamo@meilisearch.com>
2021-09-20 19:04:57 +00:00
Clémentine Urquizar
3f1453f470 Update version for the next release (v0.14.0) 2021-09-20 18:12:23 +02:00
Clémentine Urquizar
f167f7b412 Update version for the next release (v0.13.1) 2021-09-10 09:48:17 +02:00
Irevoire
a84f3a8b31 Apply suggestions from code review
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-09-09 15:09:35 +02:00
Tamo
7483614b75 [HTTP-UI] add the sorters 2021-09-08 18:24:09 +02:00
cvermand
4fd0116a0d Stringify objects on dashboard to avoid [Object object] 2021-09-08 17:51:08 +02:00
Kerollmops
68856e5e2f Disable the default snappy compression for the http-ui crate 2021-09-08 14:17:32 +02:00
Clémentine Urquizar
eb7b9d9dbf Update version for the next release (v0.13.0) 2021-09-08 10:59:30 +02:00
bors[bot]
5cbe879325 Merge #308
308: Implement a better parallel indexer r=Kerollmops a=ManyTheFish

Rewrite the indexer:
- enhance memory consumption control
- optimize parallelism using rayon and crossbeam channel
- factorize the different parts and make new DB implementation easier
- optimize and fix prefix databases


Co-authored-by: many <maxime@meilisearch.com>
2021-09-02 15:03:52 +00:00
Clémentine Urquizar
285849e3a6 Update version for the next release (v0.12.0) 2021-09-02 10:08:41 +02:00
many
1d314328f0 Plug new indexer 2021-09-01 16:48:36 +02:00
Tamo
d106eb5b90 add the sortable attributes to http-ui and fix the tests 2021-08-30 16:25:10 +02:00
Kerollmops
af65485ba7 Reexport the grenad CompressionType from milli 2021-08-24 18:15:31 +02:00
Kerollmops
2f20257070 Update milli to the v0.11.0 2021-08-24 18:10:11 +02:00