Commit Graph

1521 Commits

Author SHA1 Message Date
e8c76cf7bf Intern all strings and phrases in the search logic 2023-03-20 09:41:56 +01:00
3f1729a17f Update new search test 2023-03-20 09:41:56 +01:00
cab2b6bcda Fix: computation of initial universe, code organisation 2023-03-20 09:41:56 +01:00
c4979a2fda Fix code visibility issue + unimplemented detail in proximity rule 2023-03-20 09:41:56 +01:00
23931f8a4f Fix small bug in visual logger of search algo 2023-03-20 09:41:56 +01:00
aa414565bb Fix proximity graph edge builder to include all proximities 2023-03-20 09:41:56 +01:00
1db152046e WIP on split words and synonyms support 2023-03-20 09:41:56 +01:00
c27ea2677f Rewrite cheapest path algorithm and empty path cache
It is now much simpler and has much better performance.
2023-03-20 09:41:56 +01:00
caa1e1b923 Add typo ranking rule to new search impl 2023-03-20 09:41:56 +01:00
71f18e4379 Add sort ranking rule to new search impl 2023-03-20 09:41:56 +01:00
600e3dd1c5 Remove warnings 2023-03-20 09:41:56 +01:00
362eb0de86 Add support for filters 2023-03-20 09:41:56 +01:00
998d46ac10 Add support for search offset and limit 2023-03-20 09:41:56 +01:00
6c85c0d95e Fix more bugs + visual empty path cache logging 2023-03-20 09:41:56 +01:00
0e1fbbf7c6 Fix bugs in query graph's "remove word" and "cheapest paths" algos 2023-03-20 09:41:56 +01:00
6806640ef0 Fix d2 description of paths map 2023-03-20 09:41:56 +01:00
173e37584c Improve the visual/detailed search logger 2023-03-20 09:41:55 +01:00
6ba4d5e987 Add a search logger 2023-03-20 09:41:55 +01:00
dd12d44134 Support swapped word pairs in new proximity ranking rule impl 2023-03-20 09:41:55 +01:00
c8e251bf24 Remove noise in codebase 2023-03-20 09:41:55 +01:00
a938fbde4a Use a cache when resolving the query graph 2023-03-20 09:41:55 +01:00
dcf3f1d18a Remove EdgeIndex and NodeIndex types, prefer u32 instead 2023-03-20 09:41:55 +01:00
66d0c63694 Add some documentation and use bitmaps instead of hashmaps when possible 2023-03-20 09:41:55 +01:00
132191360b Introduce the sort ranking rule working with the new search structures 2023-03-20 09:41:55 +01:00
345c99d5bd Introduce the words ranking rule working with the new search structures 2023-03-20 09:41:55 +01:00
89d696c1e3 Introduce the proximity ranking rule as a graph-based ranking rule 2023-03-20 09:41:55 +01:00
c645853529 Introduce a generic graph-based ranking rule 2023-03-20 09:41:55 +01:00
a70ab8b072 Introduce a function to find the K shortest paths in a graph 2023-03-20 09:41:55 +01:00
48aae76b15 Introduce a function to find the docids of a set of paths in a graph 2023-03-20 09:41:55 +01:00
23bf572dea Introduce cache structures used with ranking rule graphs 2023-03-20 09:41:55 +01:00
864f6410ed Introduce a structure to represent a set of graph paths efficiently 2023-03-20 09:41:55 +01:00
c9bf6bb2fa Introduce a structure to implement ranking rules with graph algorithms 2023-03-20 09:41:55 +01:00
46249ea901 Implement a function to find a QueryGraph's docids 2023-03-20 09:41:55 +01:00
ce0d1e0e13 Introduce a common way to manage the coordination between ranking rules 2023-03-20 09:41:55 +01:00
5065d8b0c1 Introduce a DatabaseCache to memorize the addresses of LMDB values 2023-03-20 09:41:55 +01:00
a83007c013 Introduce structure to represent search queries as graphs 2023-03-20 09:41:55 +01:00
79e0a6dd4e Introduce a new search module, eventually meant to replace the old one
The code here does not compile, because I am merely splitting one giant
commit into smaller ones where each commit explains a single file.
2023-03-20 09:41:55 +01:00
2d88089129 Remove unused term matching strategies 2023-03-20 09:41:55 +01:00
6c659dc12f Use MiMalloc in milli tests 2023-03-20 09:41:37 +01:00
0f33a65468 makes kero happy 2023-03-13 16:51:11 +01:00
eddefb0e0f refactor the error type of the milli::document thing
silence a warning
2023-03-09 13:03:14 +01:00
c5f22be6e1 add boolean support for csv documents 2023-03-09 11:12:49 +01:00
4f1ccbc495 Merge #3525
3525: Fix phrase search containing stop words r=ManyTheFish a=ManyTheFish

# Summary
A search with a phrase containing only stop words was returning an HTTP error 500,
this PR filters the phrase containing only stop words dropping them before the search starts, a query with a phrase containing only stop words now behaves like a placeholder search.

fixes https://github.com/meilisearch/meilisearch/issues/3521

related v1.0.2 PR on milli: https://github.com/meilisearch/milli/pull/779



Co-authored-by: ManyTheFish <many@meilisearch.com>
2023-03-02 10:55:37 +00:00
37489fd495 Return an internal error in the case of matching word is invalid 2023-03-01 19:05:16 +01:00
5822764be9 Skip computing index budget in tests 2023-02-23 11:23:39 +01:00
ac5a1e4c4b Merge #3423
3423: Add min and max facet stats r=dureuill a=dureuill

# Pull Request

## Related issue
Fixes #3426

## What does this PR do?

### User standpoint

- When using a `facets` parameter in search, the facets that have numeric values are displayed in a new section of the response called `facetStats` that contains, per facet, the numeric min and max value of the hits returned by the search.

<details>
<summary>
Sample request/response
</summary>

```json
❯ curl \
  -X POST 'http://localhost:7700/indexes/meteorites/search?facets=mass' \
  -H 'Content-Type: application/json' \
  --data-binary '{ "q": "LL6", "facets":["mass", "recclass"], "limit": 5 }' | jsonxf
{
  "hits": [
    {
      "name": "Niger (LL6)",
      "id": "16975",
      "nametype": "Valid",
      "recclass": "LL6",
      "mass": 3.3,
      "fall": "Fell"
    },
    {
      "name": "Appley Bridge",
      "id": "2318",
      "nametype": "Valid",
      "recclass": "LL6",
      "mass": 15000,
      "fall": "Fell",
      "_geo": {
        "lat": 53.58333,
        "lng": -2.71667
      }
    },
    {
      "name": "Athens",
      "id": "4885",
      "nametype": "Valid",
      "recclass": "LL6",
      "mass": 265,
      "fall": "Fell",
      "_geo": {
        "lat": 34.75,
        "lng": -87.0
      }
    },
    {
      "name": "Bandong",
      "id": "4935",
      "nametype": "Valid",
      "recclass": "LL6",
      "mass": 11500,
      "fall": "Fell",
      "_geo": {
        "lat": -6.91667,
        "lng": 107.6
      }
    },
    {
      "name": "Benguerir",
      "id": "30443",
      "nametype": "Valid",
      "recclass": "LL6",
      "mass": 25000,
      "fall": "Fell",
      "_geo": {
        "lat": 32.25,
        "lng": -8.15
      }
    }
  ],
  "query": "LL6",
  "processingTimeMs": 15,
  "limit": 5,
  "offset": 0,
  "estimatedTotalHits": 42,
  "facetDistribution": {
    "mass": {
      "110000": 1,
      "11500": 1,
      "1161": 1,
      "12000": 1,
      "1215.5": 1,
      "127000": 1,
      "15000": 1,
      "1676": 1,
      "1700": 1,
      "1710.5": 1,
      "18000": 1,
      "19000": 1,
      "220000": 1,
      "2220": 1,
      "22300": 1,
      "25000": 2,
      "265": 1,
      "271000": 1,
      "2840": 1,
      "3.3": 1,
      "3000": 1,
      "303": 1,
      "32000": 1,
      "34000": 1,
      "36.1": 1,
      "45000": 1,
      "460": 1,
      "478": 1,
      "483": 1,
      "5500": 2,
      "600": 1,
      "6000": 1,
      "67.8": 1,
      "678": 1,
      "680.5": 1,
      "6930": 1,
      "8": 1,
      "8300": 1,
      "840": 1,
      "8400": 1
    },
    "recclass": {
      "L/LL6": 3,
      "LL6": 39
    }
  },
  "facetStats": {
    "mass": {
      "min": 3.3,
      "max": 271000.0
    }
  }
}
```

</details>

## PR checklist
Please check if your PR fulfills the following requirements:
- [ ] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [ ] Have you read the contributing guidelines?
- [ ] Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!


Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2023-02-22 13:06:43 +00:00
900bae3d9d keep phrases that has at least one word 2023-02-21 18:16:51 +01:00
28b7d73d4a Remove an unefficient part of a test on milli 2023-02-21 18:16:51 +01:00
39407885c2 Merge #3347
3347: Enhance language detection r=irevoire a=ManyTheFish

## Summary

Some completely unrelated Languages can share the same characters, in Meilisearch we detect the Languages using `whatlang`, which works well on large texts but fails on small search queries leading to a bad segmentation and normalization of the query.

This PR now stores the Languages detected during the indexing in order to reduce the Languages list that can be detected during the search.

## Detail

- Create a 19th database mapping the scripts and the Languages detected with the documents where the Language is detected
- Fill the newly created database during indexing
- Create an allow-list with this database and pass it to Charabia
- Add a test ensuring that a Japanese request containing kanjis only is detected as Japanese and not Chinese

## Related issues
Fixes #2403
Fixes #3513

Co-authored-by: f3r10 <frledesma@outlook.com>
Co-authored-by: ManyTheFish <many@meilisearch.com>
Co-authored-by: Many the fish <many@meilisearch.com>
2023-02-21 10:52:13 +00:00
bbecab8948 fix clippy 2023-02-21 10:18:44 +01:00