
# jadb

[![built with nix](https://builtwithnix.org/badge.svg)](https://builtwithnix.org)

An SQLite database containing open-source Japanese dictionary data, combined from several sources.

Note that while the license for the code is MIT, the data has various licenses.

## Sources

| Source name           | URL                                                    |
|-----------------------|--------------------------------------------------------|
| **JMDict**            | https://edrdg.org/jmdict/j_jmdict.html                 |
| **RADKFILE/KRADFILE** | https://www.edrdg.org/krad/kradinf.html                |
| **KANJIDIC2**         | https://www.edrdg.org/kanjidic/kanjd2index_legacy.html |
| **Tanos JLPT levels** | https://www.tanos.co.uk/jlpt/                          |
| **Kangxi Radicals**   | https://ctext.org/kangxi-zidian                        |

## Implementation details

### Word search

The word search procedure is currently split into three parts:

1. **Entry ID query**:

Uses a complex query with various scoring factors to produce a list of
database IDs pointing at dictionary entries, sorted by how likely each entry is
to be the word the caller is looking for. The output is a `List<int>`.

2. **Data query**:

Takes the entry ID list from the previous step and performs all queries needed to retrieve
all the dictionary data for those IDs. The result is a struct containing flattened lists
of data for all the dictionary entries. These lists are sorted in the order in which the IDs
were provided.

3. **Regrouping**:

Takes the flattened data and regroups the items into structs with a more hierarchical structure.
All data tagged with the same ID ends up in the same struct. Returns a list of these structs.
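
The three stages can be illustrated with a small, self-contained sketch. This is not the actual jadb implementation: it is written in Python against an in-memory SQLite database, and the table names (`entry`, `reading`, `gloss`), the columns, the toy scoring rule (exact matches first, then a per-entry `score`), and all function names are assumptions made up for the example.

```python
import sqlite3
from collections import defaultdict

# Hypothetical schema for illustration only; jadb's real tables differ.
SCHEMA = """
CREATE TABLE entry (id INTEGER PRIMARY KEY, score INTEGER NOT NULL);
CREATE TABLE reading (entry_id INTEGER, order_num INTEGER, text TEXT);
CREATE TABLE gloss (entry_id INTEGER, order_num INTEGER, text TEXT);
"""


def find_entry_ids(db, word):
    """Stage 1: return entry IDs, best match first (toy scoring)."""
    rows = db.execute(
        """
        SELECT e.id
        FROM entry e
        JOIN reading r ON r.entry_id = e.id
        WHERE r.text LIKE ? || '%'
        GROUP BY e.id
        ORDER BY MAX(r.text = ?) DESC, e.score DESC, e.id
        """,
        (word, word),
    ).fetchall()
    return [row[0] for row in rows]


def fetch_flat_data(db, entry_ids):
    """Stage 2: fetch flat row lists, sorted by the order of entry_ids."""
    if not entry_ids:
        return {"readings": [], "glosses": []}
    marks = ",".join("?" * len(entry_ids))
    rank = {entry_id: i for i, entry_id in enumerate(entry_ids)}
    flat = {}
    for key, table in (("readings", "reading"), ("glosses", "gloss")):
        rows = db.execute(
            f"SELECT entry_id, order_num, text FROM {table} "
            f"WHERE entry_id IN ({marks})",
            entry_ids,
        ).fetchall()
        # Keep the ranking from stage 1, then the order within each entry.
        rows.sort(key=lambda row: (rank[row[0]], row[1]))
        flat[key] = rows
    return flat


def regroup(entry_ids, flat):
    """Stage 3: collect all rows sharing an ID into one record per entry."""
    readings = defaultdict(list)
    glosses = defaultdict(list)
    for entry_id, _, text in flat["readings"]:
        readings[entry_id].append(text)
    for entry_id, _, text in flat["glosses"]:
        glosses[entry_id].append(text)
    return [
        {"id": i, "readings": readings[i], "glosses": glosses[i]}
        for i in entry_ids
    ]


def demo_search(word):
    """Run all three stages against a tiny made-up data set."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    db.executemany("INSERT INTO entry VALUES (?, ?)", [(1, 10), (2, 50), (3, 5)])
    db.executemany(
        "INSERT INTO reading VALUES (?, ?, ?)",
        [(1, 0, "かみ"), (2, 0, "かみなり"), (3, 0, "のむ")],
    )
    db.executemany(
        "INSERT INTO gloss VALUES (?, ?, ?)",
        [(1, 0, "paper"), (2, 0, "thunder"),
         (3, 0, "to drink"), (3, 1, "to swallow")],
    )
    ids = find_entry_ids(db, word)
    return regroup(ids, fetch_flat_data(db, ids))
```

Calling `demo_search("かみ")` returns the exact match before the higher-scored prefix match. Splitting the work this way keeps the ranking logic in a single query, while the later stages only need to preserve the order they are given.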