Commit Graph

100 Commits

Author SHA1 Message Date
e8a156d682 Reorganise facets database indexing code 2022-10-26 13:46:46 +02:00
add96f921b Remove unused infos/ http-ui/ and fuzz/ crates 2022-09-14 06:55:01 +02:00
a5b9a35c50 Activate char_map for highlighting 2022-08-22 14:39:16 +02:00
afc10acd19 Merge #596
596: Filter operators: NOT + IN[..] r=irevoire a=loiclec

# Pull Request

## What does this PR do?
Implements the changes described in https://github.com/meilisearch/meilisearch/issues/2580
It is based on top of #556 

Co-authored-by: Loïc Lecrenier <loic@meilisearch.com>
2022-08-18 11:24:32 +00:00
4aae07d5f5 expose the size methods 2022-08-17 17:07:38 +02:00
258c3dd563 Make AND+OR filters n-ary (store a vector of subfilters instead of 2)
NOTE: The token_at_depth is method is a bit useless now, as the only
cases where there would be a toke at depth 1000 are the cases where
the parser already stack-overflowed earlier.

Example: (((((... (x=1) ...)))))
2022-08-17 12:28:33 +02:00
20be69e1b9 Always use mimalloc as the global allocator 2022-08-16 20:09:36 +02:00
a892a4a79c Introduce a function to extend from a JSON array of objects 2022-07-12 15:14:06 +02:00
399eec5c01 Fix the indexation tests 2022-07-12 14:55:51 +02:00
fcfc4caf8c Move the Object type in the lib.rs file and use it everywhere 2022-07-12 14:55:51 +02:00
f29114f94a Fix http-ui to fit with the new DocumentsBatchBuilder/Reader structs 2022-07-12 14:52:56 +02:00
a5c790bf4b Update http-ui 2022-06-02 18:15:36 +02:00
4dd3675d2b Update http-ui 2022-06-02 16:59:11 +02:00
ea4bb9402f Merge #483
483: Enhance matching words r=Kerollmops a=ManyTheFish

# Summary

Enhance milli word-matcher making it handle match computing and cropping.

# Implementation

## Computing best matches for cropping

Before we were considering that the first match of the attribute was the best one, this was accurate when only one word was searched but was missing the target when more than one word was searched.

Now we are searching for the best matches interval to crop around, the chosen interval is the one:
1) that have the highest count of unique matches
> for example, if we have a query `split the world`, then the interval `the split the split the` has 5 matches but only 2 unique matches (1 for `split` and 1 for `the`) where the interval `split of the world` has 3 matches and 3 unique matches. So the interval `split of the world` is considered better.
2) that have the minimum distance between matches
> for example, if we have a query `split the world`, then the interval `split of the world` has a distance of 3 (2 between `split` and `the`, and 1 between `the` and `world`) where the interval `split the world` has a distance of 2. So the interval `split the world` is considered better.
3) that have the highest count of ordered matches
> for example, if we have a query `split the world`, then the interval `the world split` has 2 ordered words where the interval `split the world` has 3. So the interval `split the world` is considered better.

## Cropping around the best matches interval

Before we were cropping around the interval without checking the context.

Now we are cropping around words in the same context as matching words.
This means that we will keep words that are farther from the matching words but are in the same phrase, than words that are nearer but separated by a dot.

> For instance, for the matching word `Split` the text:
`Natalie risk her future. Split The World is a book written by Emily Henry. I never read it.`
will be cropped like:
`…. Split The World is a book written by Emily Henry. …`
and  not like:
`Natalie risk her future. Split The World is a book …`


Co-authored-by: ManyTheFish <many@meilisearch.com>
2022-04-19 11:42:32 +00:00
827cedcd15 Add format option structure 2022-04-12 13:42:14 +02:00
4f3ce6d9cd nested fields 2022-04-07 16:58:46 +02:00
29c5f76d7f Use new matcher in http-ui 2022-04-05 17:41:32 +02:00
21ec334dcc Fix the compilation error of the dependency versions 2022-03-15 11:17:45 +01:00
288a879411 Remove three useless dependencies 2022-03-15 11:17:44 +01:00
9142ba9dd4 Fix the parsing of ndjson requests to index more than the first line 2022-02-02 17:55:13 +01:00
0c84a40298 document batch support
reusable transform

rework update api

add indexer config

fix tests

review changes

Co-authored-by: Clément Renault <clement@meilisearch.com>

fmt
2022-01-19 12:40:20 +01:00
c0313f3026 Use chars for highlight instead of graphemes
Tokenizer v0.2.7 uses chars instead of graphemes for matching bytes.
`unicode-segmentation` dependency isn't needed anymore.

Also, oxidised the highlight code :)

Co-authored-by: many <maxime@meilisearch.com>
2022-01-17 13:15:31 +05:30
30247d70cd Fix search highlight for non-unicode chars
The `matching_bytes` function takes a `&Token` now and:
- gets the number of bytes to highlight (unchanged).
- uses `Token.num_graphemes_from_bytes` to get the number of grapheme
  clusters to highlight.

In essence, the `matching_bytes` function returns the number of matching
grapheme clusters instead of bytes. Should this function be renamed
then?

Added proper highlighting in the HTTP UI:
- requires dependency on `unicode-segmentation` to extract grapheme
  clusters from tokens
- `<mark>` tag is put around only the matched part
    - before this change, the entire word was highlighted even if only a
      part of it matched
2022-01-17 11:37:44 +05:30
1c6c89f345 Fix the binaries that use the new optional filters 2021-12-09 11:57:53 +01:00
6eb47ab792 remove update_id in UpdateBuilder 2021-11-16 13:07:04 +01:00
6831c23449 merge with main 2021-11-06 16:34:30 +01:00
07a5ffb04c update http-ui 2021-11-04 15:52:22 +01:00
baddd80069 implement review suggestions 2021-10-25 18:29:12 +02:00
e25ca9776f start updating the exposed function to makes other modules happy 2021-10-22 17:23:22 +02:00
c7db4176f3 Merge #384
384: Replace memmap with memmap2 r=Kerollmops a=palfrey

[memmap is unmaintained](https://rustsec.org/advisories/RUSTSEC-2020-0077.html) and needs replacing. memmap2 is a drop-in replacement fork that's well maintained. Note that the version numbers got reset on fork, hence the lower values.

Co-authored-by: Tom Parker-Shemilt <palfrey@tevp.net>
2021-10-13 13:47:23 +00:00
c5a6075484 Make max_position_per_attributes changable 2021-10-12 10:10:50 +02:00
2dfe24f067 memmap -> memmap2 2021-10-10 22:47:12 +01:00
1bd15d849b Reduce candidates threshold 2021-10-05 18:52:14 +02:00
d2427f18e5 Enhance CSV document parsing 2021-09-29 10:25:33 +02:00
cc732fe95e update http-ui to use the sort-error 2021-09-28 11:15:24 +02:00
aa6c5df0bc Implement documents format
document reader transform

remove update format

support document sequences

fix document transform

clean transform

improve error handling

add documents! macro

fix transform bug

fix tests

remove csv dependency

Add comments on the transform process

replace search cli

fmt

review edits

fix http ui

fix clippy warnings

Revert "fix clippy warnings"

This reverts commit a1ce3cd96e603633dbf43e9e0b12b2453c9c5620.

fix review comments

remove smallvec in transform loop

review edits
2021-09-21 16:58:33 +02:00
a84f3a8b31 Apply suggestions from code review
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-09-09 15:09:35 +02:00
7483614b75 [HTTP-UI] add the sorters 2021-09-08 18:24:09 +02:00
68856e5e2f Disable the default snappy compression for the http-ui crate 2021-09-08 14:17:32 +02:00
1d314328f0 Plug new indexer 2021-09-01 16:48:36 +02:00
d106eb5b90 add the sortable attributes to http-ui and fix the tests 2021-08-30 16:25:10 +02:00
af65485ba7 Reexport the grenad CompressionType from milli 2021-08-24 18:15:31 +02:00
89d0758713 Revert "Revert "Sort at query time"" 2021-08-24 11:55:16 +02:00
922f9fd4d5 Revert "Sort at query time" 2021-08-20 18:09:17 +02:00
5b88df508e Use the new Asc/Desc syntax everywhere 2021-08-17 14:15:22 +02:00
4562b278a8 remove a warning and add a log
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-07-05 17:46:02 +02:00
a57e522a67 introduce a die route let the program exit itself alone 2021-07-05 17:38:10 +02:00
634201244c Merge #250 #251
250: Add the limit field to http-ui r=Kerollmops a=irevoire



251: Fix the limit r=Kerollmops a=irevoire

There was no check on the limit and thus if a user specified a very large number this line could cause a panic.

Co-authored-by: Tamo <tamo@meilisearch.com>
2021-06-22 13:00:52 +00:00
81643e6d70 add the limit field to http-ui 2021-06-22 14:47:23 +02:00
77eb37934f add jemalloc to http-ui and the benchmarks 2021-06-22 14:17:56 +02:00