README: add textual overview of the word search procedure

2026-02-28 14:52:22 +09:00
parent 382af1add8
commit 8fb6baa03f

@@ -16,3 +16,26 @@ Note that while the license for the code is MIT, the data has various licenses.
| **Tanos JLPT levels:** | https://www.tanos.co.uk/jlpt/ |
| **Kangxi Radicals:** | https://ctext.org/kangxi-zidian |
## Implementation details
### Word search
The word search procedure is currently split into 3 parts:
1. **Entry ID query**:
Uses a complex query with several scoring factors to get a list of
database IDs pointing at dictionary entries, sorted by how likely each entry
is to be the word the caller is looking for. The output is a `List<int>`.
2. **Data query**:
Takes the entry ID list from the previous step and performs all the queries needed to retrieve
the dictionary data for those IDs. The result is a struct containing flattened lists
of data for all the dictionary entries. These lists are sorted in the order in which the IDs
were provided.
3. **Regrouping**:
Takes the flattened data and regroups the items into structs with a more hierarchical structure.
All data tagged with the same ID ends up in the same struct. Returns a list of these structs.
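
The three parts above can be sketched roughly as follows. This is only an illustration in Python, not the project's actual code: the in-memory `TOY_DB` table, the exact/prefix scoring rule, and every function and class name here are assumptions made up for the example. The real entry ID query runs against the database and uses more scoring factors.

```python
from dataclasses import dataclass, field

# Hypothetical toy "database": entry ID -> readings and glosses.
TOY_DB = {
    1: {"readings": ["のみもの"], "glosses": ["drink", "beverage"]},
    2: {"readings": ["のみ"], "glosses": ["chisel"]},
    3: {"readings": ["たべる"], "glosses": ["to eat"]},
}


def score(reading: str, word: str) -> int:
    """Assumed scoring rule: exact match beats prefix match beats no match."""
    if reading == word:
        return 2
    if reading.startswith(word):
        return 1
    return 0


def query_entry_ids(word: str) -> list[int]:
    """Part 1: return the IDs of matching entries, best match first."""
    scored = []
    for entry_id, row in TOY_DB.items():
        best = max(score(r, word) for r in row["readings"])
        if best > 0:
            # Negate the score so that a plain ascending sort puts
            # the highest-scoring entries first (ties broken by ID).
            scored.append((-best, entry_id))
    return [entry_id for _, entry_id in sorted(scored)]


@dataclass
class FlatData:
    """Part 2 output: flat (entry_id, value) rows, one list per kind of data."""
    readings: list[tuple[int, str]] = field(default_factory=list)
    glosses: list[tuple[int, str]] = field(default_factory=list)


def query_data(entry_ids: list[int]) -> FlatData:
    """Part 2: fetch all data for the IDs, keeping the order of the ID list."""
    flat = FlatData()
    for entry_id in entry_ids:
        row = TOY_DB[entry_id]
        flat.readings += [(entry_id, r) for r in row["readings"]]
        flat.glosses += [(entry_id, g) for g in row["glosses"]]
    return flat


@dataclass
class Entry:
    """Part 3 output: everything tagged with one entry ID, grouped together."""
    entry_id: int
    readings: list[str] = field(default_factory=list)
    glosses: list[str] = field(default_factory=list)


def regroup(entry_ids: list[int], flat: FlatData) -> list[Entry]:
    """Part 3: rebuild one hierarchical struct per entry ID."""
    # Dicts preserve insertion order, so the ranking from part 1 is kept.
    entries = {entry_id: Entry(entry_id) for entry_id in entry_ids}
    for entry_id, reading in flat.readings:
        entries[entry_id].readings.append(reading)
    for entry_id, gloss in flat.glosses:
        entries[entry_id].glosses.append(gloss)
    return list(entries.values())


def search(word: str) -> list[Entry]:
    """Run the three parts in sequence."""
    entry_ids = query_entry_ids(word)
    return regroup(entry_ids, query_data(entry_ids))
```

In this sketch, `search("のみ")` ranks entry 2 (an exact match) ahead of entry 1 (a prefix match), and that order carries through the data query and the regrouping. Keeping the ranking in a separate first step means the later steps only need to preserve the order of the ID list rather than know anything about scoring.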