272 Commits

Author SHA1 Message Date
Kerollmops
6a230fe803 Move the contains_documents logic to a function 2020-08-21 14:44:42 +02:00
Kerollmops
e55a569629 Compress much more the documents database 2020-08-21 14:44:42 +02:00
Kerollmops
962bad3cea Introduce an infos binary to fetch stats 2020-08-17 19:41:49 +02:00
Clément Renault
8806fcd545 Introduce a better query and document lexer 2020-08-16 14:36:54 +02:00
Clément Renault
1e358e3ae8 Introduce the AstarBagIter that iterates through best paths 2020-08-15 16:24:06 +02:00
Clément Renault
7dc594ba4d Introduce the Search builder struct 2020-08-13 14:27:51 +02:00
Clément Renault
bfb46cbfbe Introduce the Criterion enum 2020-08-12 10:43:02 +02:00
Clément Renault
6d04a285dc Retrieve and display the distances of the words found 2020-08-11 15:18:02 +02:00
Clément Renault
1bd37d213a Lowercase quoted words 2020-08-10 14:49:09 +02:00
Clément Renault
883a8109c8 Show both database and documents database sizes 2020-08-10 14:37:18 +02:00
Clément Renault
a4e0f3f724 Remove the useless TransitiveArc from the serve binary 2020-08-10 14:06:27 +02:00
Clément Renault
edc06a97d6 Remove the useless stats binary 2020-08-10 13:55:02 +02:00
Clément Renault
ae77fe5a69 Introduce an option to specify the maximum database size 2020-08-10 13:53:53 +02:00
Clément Renault
394844062f Move the documents MTBL database inside the Index 2020-08-10 13:47:19 +02:00
Clément Renault
ecd2b2f217 Make the final merge done in parallel 2020-08-07 15:44:04 +02:00
Clément Renault
91282c8b6a Move the documents into another file 2020-08-07 13:11:31 +02:00
Clément Renault
fae694a102 Put the documents into an MTBL database 2020-08-07 12:14:40 +02:00
Clément Renault
405a71d3a4 Accept csv from stdin 2020-08-06 13:38:21 +02:00
Clément Renault
d3b1096510 Compute the word attribute postings lists on each thread 2020-08-06 11:50:27 +02:00
Clément Renault
8d734941af Clean up some lines 2020-08-06 10:20:26 +02:00
Clément Renault
6508d497ce Replace the regex highlighting by a simple algorithm 2020-08-05 13:52:27 +02:00
Clément Renault
4873abe145 Introduce option flags to toggle the indexing engine 2020-08-05 12:10:41 +02:00
Clément Renault
bd4b18541c Introduce a new indexer which uses an MTBL sorter 2020-08-04 15:44:37 +02:00
Kerollmops
ee305c9284 Replace the title by the milli logo 2020-07-15 23:55:28 +02:00
Kerollmops
9ade00e27b Highlight all the matching words 2020-07-14 11:53:21 +02:00
Kerollmops
085c376655 Use the regex crate to highlight "hello" 2020-07-14 11:28:40 +02:00
Kerollmops
aa92311d4e Add a dark theme to the dashboard 2020-07-13 23:51:41 +02:00
Kerollmops
3d144e62c4 Search for best proximities in multiple attributes 2020-07-13 19:06:56 +02:00
Kerollmops
576dd011a1 Compute the candidates but not by attribute 2020-07-13 18:16:05 +02:00
Kerollmops
6b14b20369 Introduce a method to retrieve the number of attributes of the documents 2020-07-13 17:50:16 +02:00
Kerollmops
92c2b1dd2d Refine the help message of the binaries 2020-07-12 11:06:45 +02:00
Kerollmops
f757df5dfd Introduce the stderr logger to the project 2020-07-12 11:04:35 +02:00
Kerollmops
12358476da Use the log crate instead of stderr 2020-07-12 10:55:09 +02:00
Kerollmops
2c62eeea3c Rename the project milli 2020-07-12 00:16:41 +02:00
Kerollmops
d31da26a51 Avoid cloning RoaringBitmaps when unnecessary 2020-07-11 23:51:32 +02:00
Kerollmops
b8a1fc0126 Clean up the CSS style custom bulma rules 2020-07-11 14:51:59 +02:00
Kerollmops
f6eae91c7d Pretty print the new dashboard numbers 2020-07-11 14:17:37 +02:00
Kerollmops
d44428fa90 Display more information on the dashboard 2020-07-11 11:51:56 +02:00
Kerollmops
11c7fef80a Implement a memory dumper
It moves the in-memory HashMaps used when indexing to a disk-based MTBL file
2020-07-07 16:48:49 +02:00
Kerollmops
b12bfcb03b Reduce the deepness of the word position document ids
This helps reduce the number of allocations.
2020-07-07 12:30:05 +02:00
Kerollmops
7178b6c2c4 First basic version using MTBL again 2020-07-07 11:32:33 +02:00
Kerollmops
adb1038b26 Add a jobs parameter to set the number of threads the indexer uses 2020-07-06 12:17:17 +02:00
Kerollmops
ec1023e790 Intersect document ids by inverse popularity of the words
This reduces our worst request, which took 56s, to 3s ("the best of the do").
2020-07-05 19:33:51 +02:00
Kerollmops
cd7e64b2b3 Allow users to set the arc cache size when indexing 2020-07-04 18:12:41 +02:00
Kerollmops
ac8353a64f Merge pre-computed word attribute documents ids 2020-07-04 17:02:27 +02:00
Kerollmops
fea7cac206 Display the time it took to compute the word attribute documents ids 2020-07-04 15:18:38 +02:00
Kerollmops
46ced5c828 Introduce the RwIter append heed API 2020-07-04 12:34:10 +02:00
Kerollmops
7e7440c431 Finalize the LMDB indexing design 2020-07-01 22:45:43 +02:00
Kerollmops
2ae3f40971 Make the indexer ignore certain words
This is a preparation for making the indexing fully parallel by making the
indexer aware of only certain words on each thread, to avoid postings list
conflicts for each word
2020-07-01 17:49:46 +02:00
Kerollmops
a3ac2623d5 Introduce multiple functions to clean up the code 2020-07-01 17:24:55 +02:00