diff --git a/README.md b/README.md
index bb3a3d2..dceffaa 100644
--- a/README.md
+++ b/README.md
@@ -16,3 +16,26 @@ Note that while the license for the code is MIT, the data has various licenses.
 | **Tanos JLPT levels:** | https://www.tanos.co.uk/jlpt/ |
 | **Kangxi Radicals:** | https://ctext.org/kangxi-zidian |
 
+## Implementation details
+
+### Word search
+
+The word search procedure is currently split into 3 parts:
+
+1. **Entry ID query**:
+
+Uses a complex query with various scoring factors to get a list of database IDs
+pointing at dictionary entries, sorted by how likely each entry is to be the
+word that the caller is looking for. The output here is a `List` of entry IDs.
+
+2. **Data query**:
+
+Takes the entry ID list from the previous step and performs all queries needed to retrieve
+all the dictionary data for those IDs. The result is a struct containing several flattened lists
+with data for all the dictionary entries. These lists are sorted in the order in which the IDs
+were provided.
+
+3. **Regrouping**:
+
+Takes the flattened data and regroups the items into structs with a more hierarchical structure.
+All data tagged with the same ID ends up in the same struct. Returns a list of these structs.
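
The three stages described in the added section could be sketched roughly as follows. This is only an illustration in Python, not the project's implementation: the in-memory tables, the toy scoring rule, and every name (`query_entry_ids`, `query_data`, `regroup`, `FlatData`, `Entry`) are assumptions made up for the example, since the README does not show the real schema or query.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical in-memory stand-ins for the database tables.
ENTRIES = {1: "日本語", 2: "日本", 3: "本"}
READINGS = [(1, "にほんご"), (2, "にほん"), (2, "にっぽん"), (3, "ほん")]
GLOSSES = [(1, "Japanese (language)"), (2, "Japan"), (3, "book")]


@dataclass
class FlatData:
    """Stage 2 output: flat lists of (entry_id, value) rows."""
    readings: list[tuple[int, str]]
    glosses: list[tuple[int, str]]


@dataclass
class Entry:
    """Stage 3 output: one hierarchical record per entry ID."""
    entry_id: int
    readings: list[str] = field(default_factory=list)
    glosses: list[str] = field(default_factory=list)


def query_entry_ids(term: str) -> list[int]:
    """Stage 1: return matching entry IDs, most likely match first."""
    scored = []
    for entry_id, word in ENTRIES.items():
        if term in word:
            # Assumed toy scoring: exact matches first, then shorter words.
            scored.append((word != term, len(word), entry_id))
    return [entry_id for _, _, entry_id in sorted(scored)]


def query_data(entry_ids: list[int]) -> FlatData:
    """Stage 2: fetch all rows for the IDs, ordered like entry_ids."""
    rank = {entry_id: i for i, entry_id in enumerate(entry_ids)}

    def pick(rows: list[tuple[int, str]]) -> list[tuple[int, str]]:
        hits = [row for row in rows if row[0] in rank]
        # sorted() is stable, so rows of one entry keep their table order.
        return sorted(hits, key=lambda row: rank[row[0]])

    return FlatData(readings=pick(READINGS), glosses=pick(GLOSSES))


def regroup(entry_ids: list[int], flat: FlatData) -> list[Entry]:
    """Stage 3: collect rows sharing an ID into one Entry per ID."""
    grouped = {entry_id: Entry(entry_id) for entry_id in entry_ids}
    for entry_id, reading in flat.readings:
        grouped[entry_id].readings.append(reading)
    for entry_id, gloss in flat.glosses:
        grouped[entry_id].glosses.append(gloss)
    return [grouped[entry_id] for entry_id in entry_ids]


def search(term: str) -> list[Entry]:
    """Run the three stages in sequence."""
    entry_ids = query_entry_ids(term)
    return regroup(entry_ids, query_data(entry_ids))
```

Calling `search("日本")` on this toy data returns two `Entry` objects, with the exact match (ID 2) ahead of the longer word (ID 1). Keeping the ranking in the first stage means the later stages only need to preserve the order they are given.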