Commit Graph

31 Commits

Author SHA1 Message Date
8046ae4bd5 Count the number of char instead of counting bytes to assign the typo tolerance 2021-09-28 12:10:43 +02:00
9716fb3b36 format the whole project 2021-06-16 18:33:33 +02:00
312c2d1d8e Use the Error enum everywhere in the project 2021-06-14 16:58:38 +02:00
f4cab080a6 Update milli/src/search/query_tree.rs
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-06-10 11:30:51 +02:00
e923a3ed6a Replace Consecutive by Phrase in query tree
Replace Consecutive by Phrase in query tree in order to remove theoretical bugs
caused by the Consecutive enum type.
2021-06-10 11:16:16 +02:00
faf148d297 Update milli/src/search/query_tree.rs
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-06-08 17:52:37 +02:00
b489d699ce Make hard separators split phrase query
Hard separators will now split a phrase query, as two consecutive double quotes would.

Fix #208
2021-06-08 17:29:38 +02:00
225ae6fd25 Resolve PR comments 2021-06-01 11:53:09 +02:00
1df68d342a Make the MatchingWords return the number of matching bytes 2021-05-31 18:22:29 +02:00
efba662ca6 Fix clippy warnings in criteria 2021-05-10 10:27:18 +02:00
a3f8686fbf Introduce exactness criterion 2021-05-06 14:28:30 +02:00
6fa00c61d2 feat(search): support words_limit 2021-04-20 12:22:04 +03:00
33860bc3b7 test(update, settings): set & reset synonyms
fixes after review

more fixes after review
2021-04-18 11:24:17 +03:00
e39aabbfe6 feat(search, update): synonyms 2021-04-18 11:24:17 +03:00
dcb00b2e54 test a new implementation of the stop_words 2021-04-12 18:35:33 +02:00
da036dcc3e Revert "Integrate the stop_words in the querytree"
This reverts commit 12fb509d84.
We revert this commit because it's causing the bug #150.
The initial algorithm we implemented for the stop_words was:

1. remove the stop_words from the dataset
2. keep the stop_words in the query to see if we can generate new words by
   integrating typos or if the word was a prefix
=> This was causing the bug since, in the case of “The hobbit”, we were
   **always** looking for something starting with “t he” or “th e”
   instead of ignoring the word completely.

For now we are going to fix the bug by completely ignoring the
stop_words in the query.
This could cause another problem where someone mistypes a normal word and
ends up typing a stop_word.

For example imagine someone searching for the music “Won't he do it”.
If that person misplaces one space and writes “Won' the do it” then we
will lose a part of the request.

One fix would be to update our query tree to something like this:

---------------------
OR
  OR
    TOLERANT hobbit # the first option is to ignore the stop_word
    AND
      CONSECUTIVE   # the second option is to do as we are doing
        EXACT t     # currently
        EXACT he
      TOLERANT hobbit
---------------------

This would drastically increase the size of our query tree on requests
with a lot of stop_words. For example think of “The Lord Of The Rings”.

For now, however, we decided to ignore this problem: we consider that
doing so does not reduce the relevancy of the search too much, while it
improves performance.
2021-04-12 18:35:33 +02:00
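The trade-off described in the revert message above can be sketched in Rust. This is only an illustration under stated assumptions: `Node`, `ignore_stop_words`, `alternative_tree` and `node_count` are hypothetical names invented here, not milli's actual query tree types or functions. The sketch builds the tree obtained by ignoring stop words, builds the alternative tree drawn in the message by hand, and compares their sizes.

```rust
use std::collections::BTreeSet;

/// Hypothetical, simplified node type for illustration only;
/// this is not milli's real query tree representation.
#[derive(Debug, Clone, PartialEq)]
enum Node {
    And(Vec<Node>),
    Or(Vec<Node>),
    Consecutive(Vec<Node>),
    Exact(String),
    Tolerant(String),
}

/// The fix chosen in the message: drop every stop word from the query
/// and keep the remaining words as typo-tolerant leaves.
fn ignore_stop_words(query: &[&str], stop_words: &BTreeSet<&str>) -> Node {
    let words = query
        .iter()
        .copied()
        .filter(|word| !stop_words.contains(word))
        .map(|word| Node::Tolerant(word.to_string()))
        .collect();
    Node::And(words)
}

/// The alternative tree drawn in the message for the query "The hobbit",
/// written out by hand: either ignore the stop word, or keep one split of it.
fn alternative_tree() -> Node {
    Node::Or(vec![Node::Or(vec![
        Node::Tolerant("hobbit".to_string()),
        Node::And(vec![
            Node::Consecutive(vec![
                Node::Exact("t".to_string()),
                Node::Exact("he".to_string()),
            ]),
            Node::Tolerant("hobbit".to_string()),
        ]),
    ])])
}

/// Counts every node in the tree, operators and leaves alike.
fn node_count(node: &Node) -> usize {
    match node {
        Node::And(children) | Node::Or(children) | Node::Consecutive(children) => {
            1 + children.iter().map(node_count).sum::<usize>()
        }
        Node::Exact(_) | Node::Tolerant(_) => 1,
    }
}
```

With a stop-word set containing only "the", `ignore_stop_words(&["the", "hobbit"], &stop_words)` produces a single `And` over one `Tolerant` leaf, whereas the hand-written alternative is several times larger for the same two-word query. That size difference is the growth the message is concerned about for queries with many stop words.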
12fb509d84 Integrate the stop_words in the querytree
remove the stop_words from the querytree except if it was a prefix or a typo
2021-04-01 13:57:55 +02:00
a2f46029c7 implement a first version of the stop_words
The front must provide a BTreeSet containing the stop words
The stop_words are set at None if an empty Set is provided
add the stop-words in the http-ui interface

Use maplit in the test
and remove all the useless drop(rtxn) at the end of all tests
2021-04-01 13:57:55 +02:00
5af63c74e0 Speed-up the MatchingWords highlighting struct 2021-03-03 15:45:03 +01:00
ae4a237e58 Fix the maximum_proximity function 2021-03-03 15:43:44 +01:00
9bc9b36645 Introduce the Proximity criterion 2021-03-03 15:43:44 +01:00
fb7e6df790 add tests on typo criterion 2021-03-03 15:43:43 +01:00
a273c46559 clean warnings 2021-03-03 15:43:42 +01:00
73286dc8bf Introduce the query tree data structure 2021-03-03 15:43:40 +01:00
240b02e175 Remove unused Operation constructors 2021-03-03 13:40:19 +01:00
a463ae821e Add methods optional_words and authorize_typos on the query tree 2021-03-03 13:40:19 +01:00
6d135beb21 Introduce the maximum_proximity helper function 2021-03-03 13:40:18 +01:00
6008f528d0 Introduce the maximum_typo helper function 2021-03-03 13:40:18 +01:00
1dc857a4b2 Fix the query tree optional word generation with phrases 2021-03-03 13:40:18 +01:00
4f19749252 Introduce the word_documents_count method on the Context trait 2021-03-03 13:40:18 +01:00
79a143b32f Introduce the query tree data structure 2021-03-03 13:40:18 +01:00