da8abebfa2
Introduce the query words mapping along with the query tree
2020-01-13 13:29:47 +01:00
4f7a7ea0bb
Faster intersection group by
2020-01-09 16:30:03 +01:00
d6c9ba8f08
Store the postings lists
2020-01-09 15:04:53 +01:00
ec8916bf54
Change the debug outputs
2020-01-09 12:05:39 +01:00
81c573ec92
Add the raw document IDs to the postings lists
2020-01-08 15:30:43 +01:00
9420edadf4
Introduce the Postings type to decorrelate the DocumentIds
2020-01-08 14:48:23 +01:00
d724a7659e
Introduce a query tree context struct
2020-01-08 13:37:22 +01:00
887c212b49
Add more logs about the docids construction
2020-01-08 13:22:42 +01:00
07937ed6d7
Use the prefix caches
2020-01-08 13:14:07 +01:00
a262c67ec3
Limit the search in the FST
2020-01-08 13:06:12 +01:00
13ca30c4d8
WIP: Made the query tree traversing support prefix search
2020-01-08 12:02:58 +01:00
fbcec2975d
WIP: Impl a basic tree traversing
2020-01-07 18:24:13 +01:00
6e1f4af833
WIP: Create a tree from query but need to show synonyms
2020-01-07 18:24:13 +01:00
856c5c4214
Fix group offset computing
2019-12-31 14:24:10 +01:00
670e80c151
Use the cached postings lists in the query system
2019-12-31 13:32:36 +01:00
eed07c724f
Add more logging for postings lists fetching by word
2019-12-31 13:32:36 +01:00
99d35fb940
Introduce a first version of a number of candidates reducer
...
It works by ignoring the postings lists associated with documents that the previous words did not return
2019-12-31 13:32:36 +01:00
106b886873
Cache the prefix postings lists
2019-12-30 18:01:32 +01:00
928876b553
Introduce the postings lists caching stores
...
Currently not used
2019-12-30 18:01:27 +01:00
58836d89aa
Rename the PrefixCache into PrefixDocumentsCache
2019-12-30 15:42:09 +01:00
1a5a104f13
Display proximity evaluation number of calls
2019-12-30 15:42:09 +01:00
064cfa4755
Add more debug, where are those 100ms
2019-12-30 15:42:08 +01:00
ed6172aa94
Add a time measurement of the criterion loop
2019-12-30 15:42:08 +01:00
8c140f6bcd
Increase the disk usage limit
2019-12-30 15:42:08 +01:00
1e1f0fcaf5
Introduce a basic cache system for first letters
2019-12-30 15:42:08 +01:00
d21352a109
Change the time measurement of the FST
2019-12-30 15:42:08 +01:00
4be11f961b
Use an ugly trick to avoid cloning the FST
2019-12-30 15:42:07 +01:00
1163f390b3
Restrict FST search to the first letter of the word
2019-12-30 15:42:07 +01:00
691e2a3c1d
Fix a blocking channel, appearing like a deadlock
2019-12-30 15:28:28 +01:00
04bb49989f
Add more debug timings
2019-12-20 14:18:48 +01:00
d12ff15ee3
Set the indexes info in the create_index function
2019-12-19 10:38:56 +01:00
40c0b14d1c
Reintroduce searchable attributes and reordering
2019-12-13 14:38:25 +01:00
a4dd033ccf
Rename raw_matches into bare_matches
2019-12-13 14:38:25 +01:00
48e8778881
Clean up the modules declarations
2019-12-13 14:38:25 +01:00
4be23efe66
Remove the AttrCount type
...
Could probably be reintroduced later
2019-12-13 14:38:25 +01:00
7d67750865
Reintroduce exactness for one word document field
2019-12-13 14:38:25 +01:00
746e6e170c
Make the test pass again
2019-12-13 14:38:24 +01:00
d93e35cace
Introduce ContextMut and Context structs
2019-12-13 14:38:24 +01:00
d75339a271
Prefer summing the attribute
2019-12-13 14:38:24 +01:00
86ee0cbd6e
Introduce bucket_sort_with_distinct function
2019-12-13 14:38:24 +01:00
248ccfc0d8
Update the criteria to the new ones
2019-12-13 14:38:24 +01:00
ea148575cf
Remove the raw_query functions
2019-12-13 14:38:23 +01:00
efc2be0b7b
Bump the sdset dependency to 0.3.6
2019-12-13 14:38:23 +01:00
8d71112dcb
Rewrite the phrase query postings lists
...
This simplified the multiword_rewrite_matches function a little bit.
2019-12-13 14:38:23 +01:00
dd03a6256a
Debug pre filtered number of documents
2019-12-13 14:38:23 +01:00
9c03bb3428
First probably working phrase query doc filtering
2019-12-13 14:38:23 +01:00
22b19c0d93
Fix the processed distance algorithm
2019-12-13 14:38:22 +01:00
0f698d6bd9
Work in progress: Bad Typo detection
...
I have an issue where "speakers" is split into "speaker" and "s",
when I compute the distances for the Typo criterion,
it takes "s" into account and puts a distance of zero in bucket 0
(the "speakers" bucket), therefore it reports any document matching "s"
without typos as best results.
I need to make sure to ignore "s" when its associated part "speaker"
doesn't even exist in the document and is not in the place
it should be ("speaker" followed by "s").
It is hard to accept that this will add much computation time to
the Typo criterion, like in the previous algorithm where I computed
the real query/word indexes and removed the invalid ones
before sending the documents to the bucket sort.
2019-12-13 14:38:22 +01:00
4e91b31b1f
Make the Typo and Words work with synonyms
2019-12-13 14:38:22 +01:00
f87c67fcad
Improve the QueryEnhancer by doing a single lookup
2019-12-13 14:38:22 +01:00