Commit Graph

6 Commits

Author SHA1 Message Date
8d71112dcb Rewrite the phrase query postings lists
This simplified the multiword_rewrite_matches function a little bit.
2019-12-13 14:38:23 +01:00
9c03bb3428 First probably working phrase query doc filtering 2019-12-13 14:38:23 +01:00
22b19c0d93 Fix the processed distance algorithm 2019-12-13 14:38:22 +01:00
0f698d6bd9 Work in progress: Bad Typo detection
I have an issue where "speakers" is split into "speaker" and "s".
When I compute the distances for the Typo criterion, it takes "s"
into account and puts a distance of zero in bucket 0 (the "speakers"
bucket), so it reports any document matching "s" without typos
among the best results.

I need to make sure to ignore "s" when its associated part "speaker"
does not exist in the document or is not in the place it should be
("speaker" followed by "s").
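A minimal sketch of the check described above, not the code from this commit. The `Match` type, its fields, and both function names are hypothetical and exist only for illustration. It assumes each match records which query part it belongs to and where it sits in the document, and it keeps a match for the right part ("s") only when the left part ("speaker") matched at the position immediately before it.

```rust
/// A query-part match found in a document (hypothetical type for this sketch).
#[derive(Debug, Clone, Copy, PartialEq)]
struct Match {
    query_index: usize, // which query part matched (e.g. 0 = "speaker", 1 = "s")
    word_index: usize,  // position of the matched word in the document
    distance: u8,       // typo distance, 0 means an exact match
}

/// Drops every match of the `right` part of a split word unless a match of
/// the `left` part sits at the position immediately before it.
/// Matches of any other query part are kept untouched.
fn filter_split_matches(matches: &[Match], left: usize, right: usize) -> Vec<Match> {
    matches
        .iter()
        .copied()
        .filter(|m| {
            if m.query_index != right {
                return true;
            }
            matches
                .iter()
                .any(|l| l.query_index == left && l.word_index + 1 == m.word_index)
        })
        .collect()
}

/// Smallest typo distance among the matches of one query part, if any.
fn best_distance(matches: &[Match], query_index: usize) -> Option<u8> {
    matches
        .iter()
        .filter(|m| m.query_index == query_index)
        .map(|m| m.distance)
        .min()
}
```

With this filter applied first, a document that contains only a lone "s" no longer contributes a zero distance for the split word. The nested scan is quadratic in the number of matches, which is the kind of extra cost the message worries about.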

It is hard to accept that this will add a lot of computation time to
the Typo criterion, as in the previous algorithm, where I computed
the real query/word indexes and removed the invalid ones
before sending the documents to the bucket sort.
2019-12-13 14:38:22 +01:00
4e91b31b1f Make the Typo and Words work with synonyms 2019-12-13 14:38:22 +01:00
902625601a Work in progress: It seems like we support synonyms, split and concat words 2019-12-13 14:38:22 +01:00