jadb
An SQLite database containing open-source Japanese dictionary data combined from several sources.
Note that while the code is MIT-licensed, the data is covered by the separate licenses of its respective sources.
Sources
| Source name | URL |
|---|---|
| JMDict | https://edrdg.org/jmdict/j_jmdict.html |
| RADKFILE/KRADFILE | https://www.edrdg.org/krad/kradinf.html |
| KANJIDIC2 | https://www.edrdg.org/kanjidic/kanjd2index_legacy.html |
| Tanos JLPT levels | https://www.tanos.co.uk/jlpt/ |
| Kangxi Radicals | https://ctext.org/kangxi-zidian |
Implementation details
Word search
The word search procedure is currently split into three steps:
- Entry ID query:
Runs a single query with several scoring factors to produce a list of database IDs pointing at dictionary entries, sorted by how likely each entry is to be the word the caller is looking for. The output is a List<int>.
- Data Query:
Takes the entry ID list from the previous step and performs all the queries needed to retrieve the dictionary data for those IDs. The result is a struct holding several flattened lists, which together contain the data for all the requested entries. Each list is sorted in the order in which the IDs were provided.
- Regrouping:
Takes the flattened data and regroups the items into structs with a more hierarchical structure. All data tagged with the same entry ID ends up in the same struct. Returns a list of these structs.
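
The three steps above can be illustrated with a minimal sketch. It is written in Python with the standard `sqlite3` module, purely for illustration. The schema (`entry`, `reading`, `gloss`), the scoring formula, and every function name are hypothetical and are not taken from jadb; the real query uses more scoring factors and the real schema is richer.

```python
import sqlite3
from dataclasses import dataclass, field

# Hypothetical toy schema for illustration only; NOT the real jadb schema.
SCHEMA = """
CREATE TABLE entry   (id INTEGER PRIMARY KEY, common INTEGER NOT NULL);
CREATE TABLE reading (entry_id INTEGER NOT NULL, text TEXT NOT NULL);
CREATE TABLE gloss   (entry_id INTEGER NOT NULL, text TEXT NOT NULL);
"""


@dataclass
class Entry:
    """Hierarchical result: everything that belongs to one entry ID."""
    id: int
    readings: list[str] = field(default_factory=list)
    glosses: list[str] = field(default_factory=list)


def query_entry_ids(db, term):
    """Step 1: return entry IDs, best match first (assumed toy scoring)."""
    rows = db.execute(
        """
        SELECT entry_id FROM (
            SELECT r.entry_id AS entry_id,
                   -- exact match is worth 10 points, a "common" flag 1 point
                   MAX(CASE WHEN r.text = :term THEN 10 ELSE 0 END)
                     + MAX(e.common) AS score
            FROM reading r JOIN entry e ON e.id = r.entry_id
            WHERE r.text LIKE :term || '%'
            GROUP BY r.entry_id
        )
        ORDER BY score DESC, entry_id
        """,
        {"term": term},
    ).fetchall()
    return [row[0] for row in rows]


def fetch_flat_data(db, ids):
    """Step 2: one query per table; rows are ordered like the given IDs."""
    flat = {"reading": [], "gloss": []}
    if not ids:
        return flat
    placeholders = ",".join("?" * len(ids))
    rank = {entry_id: position for position, entry_id in enumerate(ids)}
    for table in flat:
        rows = db.execute(
            f"SELECT entry_id, text FROM {table} "
            f"WHERE entry_id IN ({placeholders}) ORDER BY rowid",
            ids,
        ).fetchall()
        # sorted() is stable, so rows of one entry keep their rowid order.
        flat[table] = sorted(rows, key=lambda row: rank[row[0]])
    return flat


def regroup(ids, flat):
    """Step 3: collect the flat rows into one Entry per ID."""
    entries = {entry_id: Entry(entry_id) for entry_id in ids}
    for entry_id, text in flat["reading"]:
        entries[entry_id].readings.append(text)
    for entry_id, text in flat["gloss"]:
        entries[entry_id].glosses.append(text)
    return list(entries.values())  # dicts preserve insertion (ranking) order


def search(db, term):
    ids = query_entry_ids(db, term)
    return regroup(ids, fetch_flat_data(db, ids))


def make_demo_db():
    """Build a tiny in-memory database with made-up sample rows."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    db.executemany("INSERT INTO entry VALUES (?, ?)", [(1, 0), (2, 1), (3, 1)])
    db.executemany(
        "INSERT INTO reading VALUES (?, ?)",
        [(1, "はし"), (2, "はし"), (3, "はしる")],
    )
    db.executemany(
        "INSERT INTO gloss VALUES (?, ?)",
        [(1, "chopsticks"), (2, "bridge"), (3, "to run"), (3, "to travel")],
    )
    return db


if __name__ == "__main__":
    for entry in search(make_demo_db(), "はし"):
        print(entry)
```

Keeping the ranking query separate from the data queries means the scoring logic only has to produce IDs, while the second step can fetch each table with a simple `IN (...)` lookup instead of one large join.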