# jadb

[](https://builtwithnix.org)

An SQLite database containing open source Japanese dictionary data combined from several sources.

Note that while the code is licensed under MIT, the data is covered by various licenses.

## Sources

| Source name            | URL                                                    |
|------------------------|--------------------------------------------------------|
| **JMDict:**            | https://edrdg.org/jmdict/j_jmdict.html                 |
| **RADKFILE/KRADFILE:** | https://www.edrdg.org/krad/kradinf.html                |
| **KANJIDIC2:**         | https://www.edrdg.org/kanjidic/kanjd2index_legacy.html |
| **Tanos JLPT levels:** | https://www.tanos.co.uk/jlpt/                          |
| **Kangxi Radicals:**   | https://ctext.org/kangxi-zidian                        |

## Implementation details

### Word search

The word search procedure is currently split into three parts:

1. **Entry ID query**:

   Runs a complex query with various scoring factors to produce a list of
   database IDs pointing at dictionary entries, sorted by how likely each entry
   is to be the word the caller is looking for. The output is a `List<int>`.

2. **Data query**:

   Takes the entry ID list from the previous step and performs all the queries needed to retrieve
   the dictionary data for those IDs. The result is a struct containing flattened lists
   with data for all the dictionary entries. Each list is sorted in the order in which the IDs
   were provided.

3. **Regrouping**:

   Takes the flattened data and regroups the items into structs with a more hierarchical structure.
   All data tagged with the same ID ends up in the same struct. Returns a list of these structs.
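
The three parts above can be illustrated with a small, self-contained sketch. It is written in Python with the standard `sqlite3` module purely for illustration and is not the project's actual code. The schema (`entry`, `reading`, `gloss`), the scoring expression, and all function and class names are hypothetical stand-ins, since the real tables and scoring factors are not described here.

```python
import sqlite3
from dataclasses import dataclass, field

# Hypothetical toy schema; the real jadb tables are assumed to differ.
SCHEMA = """
CREATE TABLE entry   (id INTEGER PRIMARY KEY, is_common INTEGER NOT NULL);
CREATE TABLE reading (entry_id INTEGER, text TEXT);
CREATE TABLE gloss   (entry_id INTEGER, text TEXT);
"""


@dataclass
class FlatData:
    """Flattened query results: lists of (entry_id, text) rows."""
    readings: list
    glosses: list


@dataclass
class Entry:
    """Hierarchical result: everything tagged with one entry ID."""
    id: int
    readings: list = field(default_factory=list)
    glosses: list = field(default_factory=list)


def query_entry_ids(db, term):
    """Part 1: return matching entry IDs, most likely match first."""
    rows = db.execute(
        """
        SELECT e.id
        FROM entry e JOIN reading r ON r.entry_id = e.id
        WHERE r.text LIKE ? || '%'
        GROUP BY e.id
        -- Made-up score: exact matches first, then common words.
        ORDER BY MAX((r.text = ?) * 100 + e.is_common * 10) DESC, e.id
        """,
        (term, term),
    ).fetchall()
    return [row[0] for row in rows]


def query_data(db, ids):
    """Part 2: fetch flattened data, sorted in the order the IDs were given."""
    if not ids:
        return FlatData([], [])
    marks = ",".join("?" * len(ids))
    rank = {entry_id: i for i, entry_id in enumerate(ids)}

    def fetch(table):
        # `table` only ever comes from the fixed literals below.
        rows = db.execute(
            f"SELECT entry_id, text FROM {table} "
            f"WHERE entry_id IN ({marks}) ORDER BY rowid",
            ids,
        ).fetchall()
        return sorted(rows, key=lambda row: rank[row[0]])

    return FlatData(fetch("reading"), fetch("gloss"))


def regroup(ids, flat):
    """Part 3: collect all rows with the same ID into one Entry."""
    entries = {entry_id: Entry(entry_id) for entry_id in ids}
    for entry_id, text in flat.readings:
        entries[entry_id].readings.append(text)
    for entry_id, text in flat.glosses:
        entries[entry_id].glosses.append(text)
    return [entries[entry_id] for entry_id in ids]


def search(db, term):
    ids = query_entry_ids(db, term)
    return regroup(ids, query_data(db, ids))


def build_demo_db():
    """Create an in-memory database with a few made-up rows."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    db.executemany("INSERT INTO entry VALUES (?, ?)",
                   [(1, 1), (2, 0), (3, 1), (4, 1)])
    db.executemany("INSERT INTO reading VALUES (?, ?)",
                   [(1, "かみなり"), (2, "かみ"), (3, "かみ"), (4, "いぬ")])
    db.executemany("INSERT INTO gloss VALUES (?, ?)",
                   [(1, "thunder"), (2, "god"), (3, "paper"), (4, "dog")])
    return db


for entry in search(build_demo_db(), "かみ"):
    print(entry)
```

Keeping the ranking in a query that returns only IDs means the scoring logic can change without touching the data retrieval or regrouping code, which only depend on the order of the ID list.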