
106 Commits

SHA1 Message Date
3b64735058 Introduce a struct to compute facet values 2021-01-26 14:06:27 +01:00
1ae761311e Integrate with the meilisearch tokenizer 2021-01-07 16:14:27 +01:00
ecc8bc8910 Introduce the FieldId u8 alias type 2020-12-02 11:19:45 +01:00
96f64c629e Move the UpdateStore out of the update module 2020-12-01 14:51:05 +01:00
a0adfb5e8e Introduce a real pest parser and support all facet filter conditions 2020-11-23 16:43:55 +01:00
2341b99379 Support a basic facet based query system 2020-11-23 16:43:49 +01:00
27f3ef5f7a Use the new ExternalDocumentsIds struct in the engine 2020-11-22 19:27:34 +01:00
415c0b86ba Introduce the ExternalDocumentsIds struct 2020-11-22 19:27:33 +01:00
a18d9a1f87 Parse and store the faceted fields 2020-11-13 16:13:51 +01:00
466fb601d6 Faceted fields settings must specify the facet type 2020-11-13 11:46:48 +01:00
e0058c1125 Introduce codecs for facet types (string, f64, u64, i64) 2020-11-11 15:48:24 +01:00
4fb138c42e Make sure we index all kinds of JSON types 2020-11-06 16:35:07 +01:00
c94bc59d7e Introduce a function to transform an obkv into JSON 2020-11-05 13:57:29 +01:00
f0d028d3a4 Update the Transform struct to support JSON updates 2020-10-31 20:52:49 +01:00
3889d956d9 Introduce the UpdateBuilder and use it in the HTTP routes 2020-10-27 18:47:58 +01:00
60347a5483 Move the AvailableDocumentsIds iterator into the update module 2020-10-26 10:53:23 +01:00
b14cca2ad9 Introduce the UpdateBuilder type along with some update operations 2020-10-25 18:32:01 +01:00
656a851830 Introduce the Transform struct transforming CSVs 2020-10-24 13:37:38 +02:00
  This allows us to:
    - transform a CSV, a JSON or a JSON lines data type into the same
      Grenad x Obkv streamable data type and create the new FieldsIdsMap.
    - extract all the documents' user ids in advance to be able to delete
      the existing documents before re-indexing them.
    - keep only the last document with a given user id, avoiding duplicates
      within the same request.
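The "keep only the last document with a given user id" rule from 656a851830 can be illustrated with a minimal in-memory sketch. The `Document` type and `dedup_keep_last` function are hypothetical and are not milli's actual `Transform` code, which per the commit message produces a Grenad x Obkv stream rather than a `Vec`.

```rust
use std::collections::HashMap;

/// Hypothetical document: a user-facing id plus an opaque payload.
#[derive(Debug, Clone, PartialEq)]
struct Document {
    user_id: String,
    body: String,
}

/// Keeps only the last document seen for each user id. Surviving documents
/// stay in the relative order of their last occurrence.
fn dedup_keep_last(docs: Vec<Document>) -> Vec<Document> {
    // First pass: remember the index of the last occurrence of each user id.
    let mut last_index: HashMap<String, usize> = HashMap::new();
    for (i, doc) in docs.iter().enumerate() {
        last_index.insert(doc.user_id.clone(), i);
    }
    // Second pass: keep a document only if it is that last occurrence.
    docs.into_iter()
        .enumerate()
        .filter(|(i, doc)| last_index[&doc.user_id] == *i)
        .map(|(_, doc)| doc)
        .collect()
}
```

Given documents with user ids `a`, `b`, `a` (in that order), the function returns the `b` document followed by the second `a` document.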
8d82e37ec0 Introduce the AvailableDocumentsIds iterator 2020-10-23 12:07:01 +02:00
566a7c3039 Make the FieldsIdsMap serialization more stable by using a BTreeMap 2020-10-22 14:53:20 +02:00
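Why a `BTreeMap` makes serialization more stable: `HashMap` iteration order is unspecified and, with the default `RandomState`, can differ between map instances. A `BTreeMap` always iterates in sorted key order. The struct below is a hypothetical, simplified stand-in for `FieldsIdsMap` (using `u8` ids, matching the later `FieldId` alias). Its `serialize` method is a toy format and not the real serde output.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified field-name-to-id map; not milli's actual type.
struct FieldsIdsMap {
    names_ids: BTreeMap<String, u8>,
    // `None` once all 256 possible u8 ids have been handed out.
    next_id: Option<u8>,
}

impl FieldsIdsMap {
    fn new() -> Self {
        FieldsIdsMap { names_ids: BTreeMap::new(), next_id: Some(0) }
    }

    /// Returns the id of `name`, allocating the next free id if needed.
    /// Returns `None` when no ids are left.
    fn insert(&mut self, name: &str) -> Option<u8> {
        if let Some(&id) = self.names_ids.get(name) {
            return Some(id);
        }
        let id = self.next_id?;
        self.next_id = id.checked_add(1);
        self.names_ids.insert(name.to_string(), id);
        Some(id)
    }

    /// Toy textual serialization. BTreeMap iterates in sorted key order,
    /// so equal maps always produce the same string.
    fn serialize(&self) -> String {
        self.names_ids
            .iter()
            .map(|(name, id)| format!("{}={}", name, id))
            .collect::<Vec<_>>()
            .join(",")
    }
}
```

After inserting `title` and then `id`, `serialize` yields `id=1,title=0`. Keys come out sorted no matter which order they were inserted in.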
9133f38138 Introduce the FieldsIdsMap type 2020-10-22 12:56:35 +02:00
5caf523fd9 Move the Index to its own module 2020-10-21 15:55:48 +02:00
a122d3d466 Export the indexing part into a module 2020-10-20 14:22:09 +02:00
871222aebd Introduce some new routes to handle live indexing 2020-10-19 16:06:43 +02:00
65e32fecb1 Move the binaries into one with subcommands 2020-10-19 13:44:17 +02:00
83c1db8763 Introduce the UpdateStore 2020-10-18 15:26:57 +02:00
a00f5850ee Add support for placeholder search for empty queries 2020-10-06 20:19:50 +02:00
ce8e56ee18 Rewrite the indexer to use one MTBL per database 2020-10-04 17:04:33 +02:00
  This allows us to avoid prefixing keys and appending to LMDB databases.
007e647462 Introduce the Mdfs Iterator that explores the proximity graph using a mana DFS 2020-10-02 16:46:07 +02:00
d0c73564b1 Use the CboRoaringBitmapCodec for the word pair proximity docids 2020-10-02 16:46:06 +02:00
4eda149ffa Rename the BoRoaringBitmap codec 2020-10-02 16:46:06 +02:00
bc35c9a598 Introduce the size_of_database infos subcommand 2020-10-02 16:46:05 +02:00
d6fa9c0414 Index the intra-document word pair proximities 2020-09-22 14:04:33 +02:00
e34437b2d7 Move the proximity function to a module 2020-09-22 10:54:59 +02:00
5664c37539 Introduce a heed codec that reduces the size of small amounts of serialized integers 2020-09-07 20:06:23 +02:00
daa3673c1c Invert the word docid positions key order 2020-09-06 10:30:53 +02:00
dc88a86259 Store the word positions under the documents 2020-09-05 18:03:06 +02:00
580ed1119a Make the engine return CSV string records as documents and headers 2020-08-31 19:02:00 +02:00
bad0663138 Come back to the old tokenizer 2020-08-31 13:34:38 +02:00
ad5cafbfed Introduce a database to store docids in groups of four positions 2020-08-29 17:42:55 +02:00
3db517548d Move the documents back into the LMDB database 2020-08-29 15:14:04 +02:00
3fe497e129 Improve the Mtbl heed codec to only encode MTBL databases 2020-08-29 11:20:39 +02:00
0a44ff86ab Put the documents MTBL back into LMDB 2020-08-28 15:43:24 +02:00
  We make sure to write the documents into a file before memory-mapping
  it and putting it into LMDB; this way we avoid moving it to RAM.
d784d87880 Remove the prefix LMDB databases 2020-08-28 14:41:43 +02:00
7cde312f14 Introduce the StrBEU32Codec heed codec 2020-08-28 14:16:37 +02:00
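The name `StrBEU32Codec` suggests a key made of a string plus a big-endian `u32`. The sketch below assumes the simplest layout: the UTF-8 bytes of the string followed by the 4 bytes of the integer. It is not heed's or milli's actual implementation, and `encode_str_beu32` / `decode_str_beu32` are hypothetical names. Big-endian is the natural choice because LMDB compares keys byte by byte by default, so keys that share the same string sort in numeric order of the integer.

```rust
/// Encodes a (string, u32) pair as the UTF-8 bytes of the string followed
/// by the big-endian bytes of the integer (an assumed layout).
fn encode_str_beu32(s: &str, n: u32) -> Vec<u8> {
    let mut bytes = Vec::with_capacity(s.len() + 4);
    bytes.extend_from_slice(s.as_bytes());
    bytes.extend_from_slice(&n.to_be_bytes());
    bytes
}

/// Decodes the layout above: the last 4 bytes are the integer and the rest
/// is the string. Returns `None` on short input or invalid UTF-8.
fn decode_str_beu32(bytes: &[u8]) -> Option<(&str, u32)> {
    if bytes.len() < 4 {
        return None;
    }
    let (s, n) = bytes.split_at(bytes.len() - 4);
    let s = std::str::from_utf8(s).ok()?;
    let n = u32::from_be_bytes([n[0], n[1], n[2], n[3]]);
    Some((s, n))
}
```

In this layout the string is not length-prefixed. Numeric ordering is therefore guaranteed only among keys that share the same string; keys built from different strings do not necessarily sort as `(string, integer)` tuples.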
8806fcd545 Introduce a better query and document lexer 2020-08-16 14:36:54 +02:00
1e358e3ae8 Introduce the AstarBagIter that iterates through best paths 2020-08-15 16:24:06 +02:00
7dc594ba4d Introduce the Search builder struct 2020-08-13 14:27:51 +02:00
bfb46cbfbe Introduce the Criterion enum 2020-08-12 10:43:02 +02:00
6d04a285dc Retrieve and display the distances of the words found 2020-08-11 15:18:02 +02:00