mugiten/jadb

oysteikt 9632b90952

Build database / evals (push) Successful in 12m28s

Details

Run tests / evals (push) Failing after 28m41s

Details

search/kanji: split queries into separate functions

2026-02-28 18:57:57 +09:00

jadb

An SQLite database containing open-source Japanese dictionary data, combined from several sources.

Note that while the license for the code is MIT, the data has various licenses.

Sources

| Source name | URL |
| --- | --- |
| JMDict | https://edrdg.org/jmdict/j_jmdict.html |
| RADKFILE/KRADFILE | https://www.edrdg.org/krad/kradinf.html |
| KANJIDIC2 | https://www.edrdg.org/kanjidic/kanjd2index_legacy.html |
| Tanos JLPT levels | https://www.tanos.co.uk/jlpt/ |
| Kangxi Radicals | https://ctext.org/kangxi-zidian |

Implementation details

Word search

The word search procedure is currently split into 3 parts:

Entry ID query:

Uses a complex query with various scoring factors to produce a list of database IDs pointing at dictionary entries, sorted by how likely each entry is to be the word the caller is looking for. The output is a List<int>.

Data query:

Takes the entry ID list from the previous step and performs all queries needed to retrieve the dictionary data for those IDs. The result is a struct containing flattened lists of data for all the dictionary entries. These lists are sorted in the order in which the IDs were provided.

Regrouping:

Takes the flattened data, and regroups the items into structs with a more "hierarchical" structure. All data tagged with the same ID will end up in the same struct. Returns a list of these structs.
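
The three parts above can be illustrated with a small, self-contained sketch. It is written in Python with the standard `sqlite3` module purely for illustration and is not taken from the jadb code. The two-table schema, the scoring rule (exact match beats prefix match), the sample rows, and all names (`query_entry_ids`, `query_data`, `regroup`, `FlatData`, `Entry`) are assumptions made up for this example; the real entry ID query uses more scoring factors.

```python
import sqlite3
from dataclasses import dataclass, field

# Hypothetical schema for this sketch only; jadb's real schema differs.
SCHEMA = """
CREATE TABLE reading (entry_id INTEGER, text TEXT);
CREATE TABLE gloss (entry_id INTEGER, text TEXT);
"""


@dataclass
class FlatData:
    """Flattened rows for all entries, as (entry_id, text) pairs."""
    readings: list
    glosses: list


@dataclass
class Entry:
    """Hierarchical view: everything tagged with one entry ID."""
    entry_id: int
    readings: list = field(default_factory=list)
    glosses: list = field(default_factory=list)


def query_entry_ids(db, term):
    """Part 1: return entry IDs ranked by an (assumed) score.

    Exact reading match scores 2, prefix match scores 1.
    LIKE wildcards in `term` are not escaped in this sketch.
    """
    rows = db.execute(
        """
        SELECT entry_id, MAX(CASE WHEN text = :t THEN 2 ELSE 1 END) AS score
        FROM reading
        WHERE text LIKE :t || '%'
        GROUP BY entry_id
        ORDER BY score DESC, entry_id ASC
        """,
        {"t": term},
    ).fetchall()
    return [row[0] for row in rows]


def query_data(db, ids):
    """Part 2: fetch flat lists for the IDs, ordered like `ids`."""
    if not ids:
        return FlatData([], [])
    marks = ",".join("?" * len(ids))
    rank = {entry_id: i for i, entry_id in enumerate(ids)}

    def fetch(table):
        rows = db.execute(
            f"SELECT entry_id, text FROM {table} "
            f"WHERE entry_id IN ({marks}) ORDER BY rowid",
            ids,
        ).fetchall()
        # Stable sort: rows of one entry keep their insertion order.
        return sorted(rows, key=lambda row: rank[row[0]])

    return FlatData(fetch("reading"), fetch("gloss"))


def regroup(ids, flat):
    """Part 3: collect rows sharing an entry ID into one Entry each."""
    entries = {entry_id: Entry(entry_id) for entry_id in ids}  # keeps ID order
    for entry_id, text in flat.readings:
        entries[entry_id].readings.append(text)
    for entry_id, text in flat.glosses:
        entries[entry_id].glosses.append(text)
    return list(entries.values())


def search(db, term):
    ids = query_entry_ids(db, term)
    return regroup(ids, query_data(db, ids))


# Sample rows invented for the demonstration.
db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)
db.executemany(
    "INSERT INTO reading VALUES (?, ?)",
    [(1, "はしる"), (2, "はし"), (3, "たべる")],
)
db.executemany(
    "INSERT INTO gloss VALUES (?, ?)",
    [(1, "to run"), (2, "bridge"), (3, "to eat")],
)

print([(e.entry_id, e.glosses) for e in search(db, "はし")])
# [(2, ['bridge']), (1, ['to run'])]
```

In this sketch the ranking is decided once, in the first query, and the later parts only preserve that order. This mirrors the description above, where the flattened lists are sorted by the order in which the IDs were provided.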