README: add textual overview of the word search procedure

2026-02-28 14:52:22 +09:00
parent 382af1add8
commit 8fb6baa03f
1 changed files with 23 additions and 0 deletions
@@ -16,3 +16,26 @@ Note that while the license for the code is MIT, the data has various licenses.
 | **Tanos JLPT levels:** | https://www.tanos.co.uk/jlpt/ |
 | **Kangxi Radicals:**   | https://ctext.org/kangxi-zidian |

+## Implementation details
+
+### Word search
+
+The word search procedure is currently split into 3 parts:
+
+1. **Entry ID query**:
+
+Use a complex query with various scoring factors to try to get list of
+database ids pointing at dictionary entries, sorted by how likely we think this
+word is the word that the caller is looking for. The output here is a `List<int>`
+
+2. **Data Query**:
+
+Takes the entry id list from the last search, and performs all queries needed to retrieve
+all the dictionary data for those IDs. The result is a struct with a bunch of flattened lists
+with data for all the dictionary entries. These lists are sorted by the order that the ids
+were provided.
+
+3. **Regrouping**:
+
+Takes the flattened data, and regroups the items into structs with a more "hierarchical" structure.
+All data tagged with the same ID will end up in the same struct. Returns a list of these structs.