106 Commits

Author SHA1 Message Date
d168f07563 WIP
All checks were successful
Build and test / build (push) Successful in 17m29s
2026-04-13 21:10:56 +09:00
d13138f8a5 Add datasource versions to database
All checks were successful
Build and test / build (push) Successful in 7m56s
2026-04-13 21:00:39 +09:00
cbaa9ec6b3 benchmark: create separate benchmarks for jp and en search
All checks were successful
Build and test / build (push) Successful in 6m53s
2026-04-13 20:24:26 +09:00
d1e2fa3748 test/search/radical_search: skip failing tests for now 2026-04-13 20:24:25 +09:00
3f4fdf470d jmdict: store glossary type in different table 2026-04-13 20:24:25 +09:00
556d07913d jmdict: don't store glossary language 2026-04-13 19:42:11 +09:00
6165045ea7 migrations: simplify JMdict_CombinedEntryScore 2026-04-13 19:29:04 +09:00
316dff3b46 migrations: comment out unused jmdict <-> kanjidic xref table 2026-04-13 19:27:16 +09:00
747e680a02 migrations: remove some excessive indices 2026-04-13 19:27:16 +09:00
4f73e07056 test/search/radical_search: init 2026-04-13 19:12:15 +09:00
15540514f6 jmdict: don't store kanji + reading for xrefs
All checks were successful
Build and test / build (push) Successful in 7m0s
2026-04-13 18:33:06 +09:00
4faf543d6e jmdict: don't store empty entry scores
All checks were successful
Build and test / build (push) Successful in 8m17s
2026-04-13 18:18:48 +09:00
d1a6f39cca kanjidic: split grade/freq/jlpt into separate tables
All checks were successful
Build and test / build (push) Successful in 8m1s
2026-04-09 14:10:51 +09:00
a222b2d9b8 jmdict: elementId instead of reading for element restriction tables
All checks were successful
Build and test / build (push) Successful in 8m0s
2026-04-08 19:57:07 +09:00
6364457d9e docs/database: add some notes about elementId embeddings
All checks were successful
Build and test / build (push) Successful in 7m56s
2026-04-08 19:07:48 +09:00
5d26b41524 jmdict: embed element type (k/r) into elementId 2026-04-08 19:05:02 +09:00
114febbe02 docs/database: add some notes about JMdict_EntryScore
All checks were successful
Build and test / build (push) Successful in 9m59s
2026-04-08 18:14:28 +09:00
20243dec09 docs: add docs about database schema choices
All checks were successful
Build and test / build (push) Successful in 8m35s
2026-04-08 18:01:16 +09:00
f6de8680ad jmdict: infer entryId from element ids 2026-04-08 18:01:16 +09:00
99218a6987 jmdict: embed orderNum in senseId for senses 2026-04-08 17:21:50 +09:00
e8ee1ab944 data_ingestion/tanos-jlpt: remove redundant code import 2026-04-08 17:21:49 +09:00
4f320e4ea9 jmdict: embed orderNum in elementId for kanji and readings 2026-04-08 17:21:49 +09:00
9c9f5543c8 .gitea/workflows: fix 'update database inputs' step
All checks were successful
Build and test / build (push) Successful in 8m59s
2026-04-08 14:05:28 +09:00
be493a6150 .gitea/workflows: fix build-and-test job
All checks were successful
Build and test / build (push) Successful in 8m59s
2026-04-08 13:36:42 +09:00
8d742b92be flake.nix: pull tanos jlpt data from datasources repo
Some checks failed
Build and test / build (push) Has been cancelled
2026-04-08 13:34:25 +09:00
9b9c771eff flake.nix: pull datasources from datasources repo
Some checks failed
Build and test / build (push) Failing after 8m19s
2026-04-07 17:26:47 +09:00
eebeaba0e0 flake.nix: split off sqlite debugging tools into separate devshell
Some checks failed
Build and test / build (push) Failing after 11m9s
2026-04-06 12:56:17 +09:00
61ac226fc3 word_search_result: add getter for unusual kanji flag
All checks were successful
Build and test / build (push) Successful in 11m48s
2026-04-02 15:53:39 +09:00
ede57a7a00 docs: init
All checks were successful
Build and test / build (push) Successful in 11m51s
2026-04-01 16:48:40 +09:00
2ad1e038f1 tanos-jlpt: remove flatten from xml stream
All checks were successful
Build and test / build (push) Successful in 13m41s
This was earlier used to compensate for a double nesting bug. This has
been fixed in the latest version of the xml package.
2026-04-01 16:04:44 +09:00
f40825de65 jmdict: skip inserting duplicate xrefs 2026-04-01 16:03:56 +09:00
5aa068eaec flake.nix: add sqldiff to devshell
Some checks failed
Build and test / build (push) Failing after 12m0s
2026-04-01 15:27:25 +09:00
170c3a853e flake.lock: bump, pubspec.lock: update inputs
Some checks failed
Build and test / build (push) Failing after 10m4s
2026-03-26 22:18:10 +09:00
c70838d1bf Add a basic benchmark
All checks were successful
Build and test / build (push) Successful in 13m30s
2026-03-04 19:00:57 +09:00
0f7854a4fc migrations: add version tables for all data sources
All checks were successful
Build and test / evals (push) Successful in 11m34s
2026-03-03 12:59:58 +09:00
a86f857553 util/romaji_transliteration: add functions to generate transliteration spans
All checks were successful
Build and test / evals (push) Successful in 18m58s
2026-03-02 18:23:36 +09:00
d14e3909d4 search/filter_kanji: keep order when deduplicating
All checks were successful
Build and test / evals (push) Successful in 13m33s
2026-03-02 17:37:45 +09:00
bb44bf786a tests: move const_data tests to test/const_data
All checks were successful
Build and test / evals (push) Successful in 11m38s
2026-03-02 17:16:14 +09:00
ad3343a01e README: add link to coverage
All checks were successful
Build and test / evals (push) Successful in 13m25s
2026-03-02 15:02:36 +09:00
16d72e94ba WIP: .gitea/workflows: generate coverage
All checks were successful
Build and test / evals (push) Successful in 13m17s
2026-03-02 14:34:08 +09:00
b070a1fd31 .gitea/workflows: merge build and test pipeline 2026-03-02 14:31:59 +09:00
dcf5c8ebe7 lemmatizer: implement equality for AllomorphPattern/LemmatizationRule 2026-03-02 12:01:13 +09:00
1f8bc8bac5 lemmatizer: let LemmatizationRule.validChildClasses be a set 2026-03-02 12:01:13 +09:00
ab28b5788b search/word_search: fix english queries without pageSize/offset 2026-03-02 12:01:13 +09:00
dd7b2917dc flake.nix: add lcov to devshell 2026-03-02 12:01:13 +09:00
74798c77b5 flake.nix: add libsqlite to LD_LIBRARY_PATH in devshell 2026-03-02 12:01:12 +09:00
63a4caa626 lemmatizer/rules/ichidan: add informal conditionals 2026-03-02 12:01:12 +09:00
374be5ca6b lemmatizer: add some basic tests 2026-03-02 12:01:12 +09:00
4a6fd41f31 lemmatizer: misc small improvements 2026-03-02 12:01:12 +09:00
c06fff9e5a lemmatizer/rules: name all rules as separate static variables 2026-03-02 12:01:12 +09:00
1d9928ade1 search/kanji: split queries into separate functions 2026-03-02 12:01:11 +09:00
1a3b04be00 word_search_result: add romanization getters 2026-03-02 12:01:11 +09:00
c0c6f97a01 search/word_search: fix casing of SearchMode variants 2026-03-02 12:01:11 +09:00
a954188d5d Fix a few lints 2026-03-02 12:01:11 +09:00
5b86d6eb67 README: add textual overview of the word search procedure 2026-03-02 12:01:11 +09:00
72f31e974b dart format 2026-03-02 12:01:10 +09:00
e824dc0a22 search/word_search: split data queries into functions 2026-03-02 12:01:10 +09:00
f5bca61839 flake.lock: bump
Some checks failed
Build database / evals (push) Successful in 10m44s
Run tests / evals (push) Failing after 43m13s
2026-02-25 16:28:18 +09:00
056aaaa0ce tests/search_match_inference: add more cases
Some checks failed
Build database / evals (push) Has been cancelled
Run tests / evals (push) Has been cancelled
2026-02-25 12:42:38 +09:00
a696ed9733 Generate matchspans for word search results
Some checks failed
Run tests / evals (push) Failing after 12m29s
Build database / evals (push) Successful in 12m36s
2026-02-24 21:27:12 +09:00
00b963bfed .gitea/workflows/test: init
Some checks failed
Build database / evals (push) Successful in 10m43s
Run tests / evals (push) Failing after 12m27s
2026-02-24 20:43:07 +09:00
4376012f18 pubspec.lock: update deps
All checks were successful
Build database / evals (push) Successful in 10m40s
2026-02-24 18:44:20 +09:00
8ae1d882a0 Add TODO for word matching
All checks were successful
Build database / evals (push) Successful in 12m32s
2026-02-24 15:21:03 +09:00
81db60ccf7 Add some docstrings
Some checks failed
Build database / evals (push) Has been cancelled
2026-02-24 15:13:33 +09:00
f57cc68ef3 search/radicals: deduplicate input radicals before search 2026-02-24 15:08:19 +09:00
48f50628a1 Create empty() factory for word search results
All checks were successful
Build database / evals (push) Successful in 35m56s
2026-02-23 13:01:57 +09:00
1783338b2a nix/database_tool: fix building
All checks were successful
Build database / evals (push) Successful in 10m47s
2026-02-21 00:49:53 +09:00
e92e99922b {flake.lock,pubspec.*}: bump 2026-02-21 00:49:24 +09:00
05b56466e7 tanos-jlpt: fix breaking changes for csv parser 2026-02-21 00:46:24 +09:00
33016ca751 flake.nix: comment out sqlint, currently broken due to dep build failure
All checks were successful
Build database / evals (push) Successful in 12m29s
2026-02-09 14:45:19 +09:00
98d92d370d {flake.lock,pubspec.lock}: bump, source libsqlite via hooks 2026-02-09 14:44:14 +09:00
5252936bdc flake.nix: filter more files from src 2026-02-09 14:40:53 +09:00
ac0cb14bbe flake.lock: bump, pubspec.lock: update inputs
All checks were successful
Build database / evals (push) Successful in 41m44s
2025-12-19 08:34:58 +09:00
49a86f60ea .gitea/workflows: upload db as artifact
Some checks failed
Build database / evals (push) Has been cancelled
2025-12-19 08:27:46 +09:00
9472156feb .gitea/workflows: update actions/checkout: v3 -> v6
All checks were successful
Build database / evals (push) Successful in 12m32s
2025-12-08 18:51:18 +09:00
4fbdba604e .gitea/workflows: run on debian-latest 2025-12-08 18:51:18 +09:00
0cdfa2015e .gitea/workflows: add workflow for building database
All checks were successful
Build database / evals (push) Successful in 15m4s
2025-11-13 16:35:25 +09:00
a9ca9b08a5 flake.lock: bump, pubspec.lock: update inputs 2025-11-13 16:13:51 +09:00
45e8181041 search/kanji: don't transliterate onyomi to katakana 2025-07-30 01:37:26 +02:00
0d3ebc97f5 flake.lock: bump 2025-07-17 00:24:35 +02:00
bb68319527 treewide: add and apply a bunch of lints 2025-07-17 00:24:35 +02:00
2803db9c12 bin/query-word: fix default pagination 2025-07-16 18:32:47 +02:00
93b76ed660 word_search: include data for cross references 2025-07-16 18:32:28 +02:00
29a3a6aafb treewide: dart format 2025-07-16 15:23:04 +02:00
3a2adf0367 pubspec.{yaml,lock}: update deps 2025-07-15 21:32:42 +02:00
eae6e881a7 flake.lock: bump 2025-07-15 21:32:35 +02:00
0a3387e77a search: add function for fetching multiple kanji at once 2025-07-15 00:58:16 +02:00
f30465a33c search: add function for fetching multiple word entries by id at once 2025-07-15 00:52:25 +02:00
d9006a0767 word_search: fix count query 2025-07-13 20:34:39 +02:00
1e1761ab4d pubspec.{yaml,lock}: update deps 2025-07-13 20:15:13 +02:00
37d29fc6ad cli/query_word: add flags for pagination 2025-07-13 20:12:22 +02:00
60898fe9a2 word_search: fix pagination 2025-07-13 20:12:10 +02:00
5049157b02 cli/query_word: add --json flag 2025-07-13 16:27:11 +02:00
1868c6fb41 word_search: don't throw error on empty results 2025-07-09 14:57:19 +02:00
4ee21d98e2 flake.lock: bump 2025-07-08 20:37:16 +02:00
7247af19cb word_search: always order exact matches first 2025-07-07 13:27:50 +02:00
ac7deae608 word_search: remove duplicate results 2025-07-07 12:47:20 +02:00
7978b74f8d lib/{_data_ingestion/search}: store kanjidic onyomi as hiragana 2025-06-25 20:18:28 +02:00
50870f64a0 cli/query_kanji: remove -k flag, use arguments 2025-06-25 20:18:27 +02:00
62d77749e6 cli/query_word: allow querying with jmdict id 2025-06-25 20:18:27 +02:00
80b3610a72 Store type enum as CHAR(1) 2025-06-25 20:18:27 +02:00
54705c3c10 word_search: add TODO 2025-06-24 23:04:47 +02:00
c7134f0d06 flake.nix: filter src 2025-06-24 19:33:10 +02:00
aac9bf69f6 cli/create_db: return an erroneous exit on error 2025-06-24 19:33:09 +02:00
189d4a95cf test/word_search: cover more functionality 2025-06-24 19:33:09 +02:00
c32775ce7a use ids for {kanji,reading}Element tables 2025-06-24 19:33:02 +02:00
114 changed files with 4663 additions and 11288 deletions


@@ -0,0 +1,70 @@
name: "Build and test"
on:
  workflow_dispatch:
  pull_request:
  push:

jobs:
  build:
    runs-on: debian-latest
    steps:
      - uses: actions/checkout@v6
      - name: Install sudo
        run: apt-get update && apt-get -y install sudo
      - name: Install nix
        uses: https://github.com/cachix/install-nix-action@v31
        with:
          extra_nix_config: |
            experimental-features = nix-command flakes
            show-trace = true
            max-jobs = auto
            trusted-users = root
            experimental-features = nix-command flakes
            build-users-group =
      - name: Update database inputs
        run: nix flake update datasources
      - name: Build database
        run: nix build .#database -L
      - name: Upload database as artifact
        uses: actions/upload-artifact@v3
        with:
          name: jadb-${{ gitea.sha }}.zip
          path: result/jadb.sqlite
          if-no-files-found: error
          retention-days: 15
          # Already compressed
          compression: 0
      - name: Print database statistics
        run: nix develop .#sqlite-debugging --command sqlite3_analyzer result/jadb.sqlite
      # TODO: Defer failure of tests until after the coverage report is generated and uploaded.
      - name: Run tests
        run: nix develop .# --command dart run test --concurrency=1 --coverage-path=coverage/lcov.info
      - name: Generate coverage report
        run: |
          GENHTML_ARGS=(
            --current-date="$(date)"
            --dark-mode
            --output-directory coverage/report
          )
          nix develop .# --command genhtml "${GENHTML_ARGS[@]}" coverage/lcov.info
      - name: Upload coverage report
        uses: https://git.pvv.ntnu.no/Projects/rsync-action@v2
        with:
          source: ./coverage
          target: jadb/${{ gitea.ref_name }}/
          username: oysteikt
          ssh-key: ${{ secrets.OYSTEIKT_GITEA_WEBDOCS_SSH_KEY }}
          host: microbel.pvv.ntnu.no
          known-hosts: "microbel.pvv.ntnu.no ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBEq0yasKP0mH6PI6ypmuzPzMnbHELo9k+YB5yW534aKudKZS65YsHJKQ9vapOtmegrn5MQbCCgrshf+/XwZcjbM="
      - name: Run benchmarks
        run: nix develop .# --command dart run benchmark_harness:bench --flavor jit
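The "Generate coverage report" step above collects its genhtml flags in a bash array before expanding them. A minimal standalone sketch of that quoting pattern (the array name and flag values here are illustrative, not taken from the workflow):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Collect flags in an array; an element defined with quotes keeps its
# embedded spaces, while unquoted words are split into separate elements.
ARGS=(
  --current-date="a date with spaces"
  --dark-mode
  --output-directory coverage/report
)

# Expanding with "${ARGS[@]}" passes each element as exactly one argument,
# so a value containing spaces survives intact.
printf '%s\n' "${ARGS[@]}"
```

Note that `--output-directory coverage/report` is unquoted and therefore becomes two separate array elements, which is exactly what genhtml expects for a flag followed by its value.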

.gitignore

@@ -8,6 +8,7 @@
# Conventional directory for build output.
/doc/
/build/
/coverage/
main.db
# Nix


@@ -1,7 +1,9 @@
# jadb
[![built with nix](https://builtwithnix.org/badge.svg)](https://builtwithnix.org)
[Latest coverage report](https://www.pvv.ntnu.no/~oysteikt/gitea/jadb/main/coverage/report/)
# jadb
An SQLite database containing open-source Japanese dictionary data combined from several sources.
Note that while the license for the code is MIT, the data has various licenses.
@@ -16,3 +18,4 @@ Note that while the license for the code is MIT, the data has various licenses.
| **Tanos JLPT levels:** | https://www.tanos.co.uk/jlpt/ |
| **Kangxi Radicals:** | https://ctext.org/kangxi-zidian |
See [docs/overview.md](./docs/overview.md) for notes and implementation details.

analysis_options.yaml

@@ -0,0 +1,41 @@
# This file configures the analyzer, which statically analyzes Dart code to
# check for errors, warnings, and lints.
#
# The issues identified by the analyzer are surfaced in the UI of Dart-enabled
# IDEs (https://dart.dev/tools#ides-and-editors). The analyzer can also be
# invoked from the command line by running `flutter analyze`.
# The following line activates a set of recommended lints for Flutter apps,
# packages, and plugins designed to encourage good coding practices.
include:
- package:lints/recommended.yaml
linter:
# The lint rules applied to this project can be customized in the
# section below to disable rules from the `package:flutter_lints/flutter.yaml`
# included above or to enable additional rules. A list of all available lints
# and their documentation is published at https://dart.dev/lints.
#
# Instead of disabling a lint rule for the entire project in the
# section below, it can also be suppressed for a single line of code
# or a specific dart file by using the `// ignore: name_of_lint` and
# `// ignore_for_file: name_of_lint` syntax on the line or in the file
# producing the lint.
rules:
always_declare_return_types: true
annotate_redeclares: true
avoid_print: false
avoid_setters_without_getters: true
avoid_slow_async_io: true
directives_ordering: true
eol_at_end_of_file: true
prefer_const_declarations: true
prefer_contains: true
prefer_final_fields: true
prefer_final_locals: true
prefer_single_quotes: true
use_key_in_widget_constructors: true
use_null_aware_elements: true
# Additional information about this file can be found at
# https://dart.dev/guides/language/analysis-options

benchmark/benchmark.dart

@@ -0,0 +1,7 @@
import './search/english_word_search.dart';
import './search/japanese_word_search.dart';

Future<void> main() async {
  await EnglishWordSearchBenchmark.main();
  await JapaneseWordSearchBenchmark.main();
}


@@ -0,0 +1,49 @@
import 'package:benchmark_harness/benchmark_harness.dart';
import 'package:jadb/search.dart';
import 'package:sqflite_common/sqlite_api.dart';

import '../../test/search/setup_database_connection.dart';

class EnglishWordSearchBenchmark extends AsyncBenchmarkBase {
  Database? connection;

  static final List<String> searchTerms = [
    'kana',
    'kanji',
    'cute',
    'sushi',
    'ramen',
  ];

  EnglishWordSearchBenchmark() : super('EnglishWordSearchBenchmark');

  static Future<void> main() async {
    print('Running EnglishWordSearchBenchmark...');
    await EnglishWordSearchBenchmark().report();
    print('Finished EnglishWordSearchBenchmark');
  }

  @override
  Future<void> setup() async {
    connection = await setupDatabaseConnection();
  }

  @override
  Future<void> run() async {
    for (final term in searchTerms) {
      final result = await connection!.jadbSearchWord(term);
      assert(
        result?.isNotEmpty ?? false,
        'Expected search results for term "$term"',
      );
    }
  }

  @override
  Future<void> teardown() async {
    await connection?.close();
  }

  // @override
  // Future<void> exercise() => run();
}


@@ -0,0 +1,49 @@
import 'package:benchmark_harness/benchmark_harness.dart';
import 'package:jadb/search.dart';
import 'package:sqflite_common/sqlite_api.dart';

import '../../test/search/setup_database_connection.dart';

class JapaneseWordSearchBenchmark extends AsyncBenchmarkBase {
  Database? connection;

  static final List<String> searchTerms = [
    '仮名',
    '漢字',
    'かわいい',
    'すし',
    'ラメン',
  ];

  JapaneseWordSearchBenchmark() : super('JapaneseWordSearchBenchmark');

  static Future<void> main() async {
    print('Running JapaneseWordSearchBenchmark...');
    await JapaneseWordSearchBenchmark().report();
    print('Finished JapaneseWordSearchBenchmark');
  }

  @override
  Future<void> setup() async {
    connection = await setupDatabaseConnection();
  }

  @override
  Future<void> run() async {
    for (final term in searchTerms) {
      final result = await connection!.jadbSearchWord(term);
      assert(
        result?.isNotEmpty ?? false,
        'Expected search results for term "$term"',
      );
    }
  }

  @override
  Future<void> teardown() async {
    await connection?.close();
  }

  // @override
  // Future<void> exercise() => run();
}


@@ -9,7 +9,7 @@ import 'package:jadb/cli/commands/query_word.dart';
 Future<void> main(List<String> args) async {
   final runner = CommandRunner(
     'jadb',
-    "CLI tool to help creating and testing the jadb database",
+    'CLI tool to help creating and testing the jadb database',
   );
 
   runner.addCommand(CreateDb());

File diff suppressed because it is too large

File diff suppressed because it is too large

File diff suppressed because it is too large


@@ -1,582 +0,0 @@
,,Ah
,ああ,like that
,あいだ,a space
合う,あう,to match
,あかちゃん,baby
上る,あがる,to rise
赤ん坊,あかんぼう,baby
空く,あく,"to open, to become empty"
,あげる,to give
浅い,あさい,"shallow, superficial"
,あじ,flavour
明日,あす・あした,tomorrow
遊び,あそび,play
集る,あつまる,to gather
集める,あつめる,to collect something
謝る,あやまる,to apologize
安心,あんしん,relief
安全,あんぜん,safety
,あんな,such
以下,いか,less than
以外,いがい,with the exception of
医学,いがく,medical science
生きる,いきる,to live
意見,いけん,opinion
,いし,stone
,いじめる,to tease
以上,いじょう,"more than, this is all"
急ぐ,いそぐ,to hurry
致す,いたす,(humble) to do
一度,いちど,once
一生懸命,いっしょうけんめい,with utmost effort
,いっぱい,full
,いと,thread
以内,いない,within
田舎,いなか,countryside
祈る,いのる,to pray
,いらっしゃる,"(respectful) to be, to come or to go"
植える,うえる,"to plant, to grow"
受付,うけつけ,receipt
受ける,うける,to take a lesson or test
動く,うごく,to move
,うち,within
打つ,うつ,to hit
美しい,うつくしい,beautiful
写す,うつす,to copy or photograph
移る,うつる,to move house or transfer
,うで,arm
,うら,reverse side
売り場,うりば,place where things are sold
,うん,(informal) yes
運転手,うんてんしゅ,driver
,えだ,"branch, twig"
選ぶ,えらぶ,to choose
遠慮,えんりょ・する,"to be reserved, to be restrained"
,おいでになる,(respectful) to be
お祝い,おいわい,congratulation
,おかげ,"owing to, thanks to"
,おかしい,strange or funny
,おく,one hundred million
屋上,おくじょう,rooftop
贈り物,おくりもの,gift
送る,おくる,to send
遅れる,おくれる,to be late
起す,おこす,to wake
行う,おこなう,to do
怒る,おこる,"to get angry, to be angry"
押し入れ,おしいれ,closet
お嬢さん,おじょうさん,young lady
お宅,おたく,(polite) your house
落る,おちる,to fall or drop
,おっしゃる,(respectful) to say
,おっと,husband
,おつり,"change from purchase, balance"
,おと,"sound, note"
落す,おとす,to drop
踊り,おどり,a dance
踊る,おどる,to dance
驚く,おどろく,to be surprised
お祭り,おまつり,festival
お見舞い,おみまい,"calling on someone who is ill, enquiry"
お土産,おみやげ,souvenir
思い出す,おもいだす,to remember
思う,おもう,"to think, to feel"
,おもちゃ,toy
,おもて,the front
,おや,parents
泳ぎ方,およぎかた,way of swimming
下りる,おりる,"to get off, to descend"
折る,おる,to break or to fold
お礼,おれい,expression of gratitude
折れる,おれる,to break or be folded
終わり,おわり,the end
海岸,かいがん,coast
会議,かいぎ,meeting
会議室,かいぎしつ,meeting room
会場,かいじょう,assembly hall or meeting place
会話,かいわ,conversation
帰り,かえり,return
変える,かえる,to change
科学,かがく,science
,かがみ,mirror
掛ける,かける,to hang something
飾る,かざる,to decorate
火事,かじ,fire
,ガス,gas
堅/硬/固い,かたい,hard
,かたち,shape
片付ける,かたづける,to tidy up
課長,かちょう,section manager
勝つ,かつ,to win
家内,かない,housewife
悲しい,かなしい,sad
必ず,かならず,"certainly,necessarily"
お・金持ち,かねもち/おかねもち,rich man
彼女,かのじょ,"she,girlfriend"
,かべ,wall
,かみ,hair
噛む,かむ,"to bite,to chew"
通う,かよう,to commute
,かれ,"he,boyfriend"
乾く,かわく,to get dry
代わり,かわり,"substitute,alternate"
変わる,かわる,to change
考える,かんがえる,to consider
関係,かんけい,relationship
看護師,かんごし, nurse
簡単,かんたん,simple
,,"spirit,mood"
機会,きかい,opportunity
危険,きけん,danger
聞こえる,きこえる,to be heard
汽車,きしゃ,steam train
技術,ぎじゅつ,"art,technology,skill"
季節,きせつ,season
規則,きそく,regulations
,きっと,surely
,きぬ,silk
厳しい,きびしい,strict
気分,きぶん,mood
決る,きまる,to be decided
,きみ,(informal) You
決める,きめる,to decide
気持ち,きもち,"feeling,mood"
着物,きもの,kimono
,きゃく,"guest,customer"
,きゅう,"urgent, steep"
急行,きゅうこう,"speedy, express"
教育,きょういく,education
教会,きょうかい,church
競争,きょうそう,competition
興味,きょうみ,an interest
近所,きんじょ,neighbourhood
具合,ぐあい,"condition,health"
空気,くうき,"air,atmosphere"
空港,くうこう,airport
,くさ,grass
,くび,neck
,くも,cloud
比べる,くらべる,to compare
,くれる,to give
暮れる,くれる,"to get dark,to come to an end"
,くん,suffix for familiar young male
,,hair or fur
経済,けいざい,"finance,economy"
警察,けいさつ,police
景色,けしき,"scene,landscape"
消しゴム,けしゴム,eraser
下宿,げしゅく,lodging
決して,けっして,never
,けれど/けれども,however
原因,げんいん,"cause,source"
,けんか・する,to quarrel
研究,けんきゅう,research
研究室,けんきゅうしつ,"study room,laboratory"
見物,けんぶつ,sightseeing
,,child
,こう,this way
郊外,こうがい,outskirts
講義,こうぎ,lecture
工業,こうぎょう,the manufacturing industry
高校,こうこう,high school
高校生,こうこうせい,high school student
工場,こうじょう/こうば,"factory,plant,mill,workshop"
校長,こうちょう,headmaster
交通,こうつう,"traffic,transportation"
講堂,こうどう,auditorium
高等学校,こうとうがっこう,high school
公務員,こうむいん,"civil servant, government worker"
国際,こくさい,international
,こころ,"heart, mind, core"
御主人,ごしゅじん,(honorable) your husband
故障,こしょう・する,to break-down
ご存じ,ごぞんじ,(respect form ) to know
,こたえ,response
,ごちそう,a feast
小鳥,ことり,small bird
,このあいだ,"the other day,recently"
,このごろ,"these days,nowadays"
細かい,こまかい,"small, fine"
込む,こむ,to include
,こめ,uncooked rice
,ごらんになる,(respectful) to see
,これから,after this
怖い,こわい,frightening
壊す,こわす,to break
壊れる,こわれる,to be broken
今度,こんど,"now,next time"
今夜,こんや,tonight
最近,さいきん,"latest,nowadays"
最後,さいご,"last,end"
最初,さいしょ,"beginning,first"
,さか,"slope,hill"
探す,さがす,to look for
下る,さがる,"to get down,to descend"
盛ん,さかん,"popularity,prosperous"
下げる,さげる,"to hang,to lower,to move back"
差し上げる,さしあげる,(polite) to give
,さっき,some time ago
寂しい,さびしい,lonely
さ来月,さらいげつ,the month after next
さ来週,さらいしゅう,the week after next
騒ぐ,さわぐ,"to make noise,to be excited"
触る,さわる,to touch
産業,さんぎょう,industry
残念,ざんねん,disappointment
,,city
,,character
試合,しあい,"match,game"
仕方,しかた,method
試験,しけん,examination
事故,じこ,accident
地震,じしん,earthquake
時代,じだい,era
下着,したぎ,underwear
,しっかり,"firmly,steadily"
失敗,しっぱい,"failure,mistake"
辞典,じてん,dictionary
品物,しなもの,goods
,しばらく,little while
,しま,island
市民,しみん,citizen
事務所,じむしょ,office
社会,しゃかい,"society,public"
社長,しゃちょう,company president
自由,じゆう,freedom
習慣,しゅうかん,"custom,manners"
住所,じゅうしょ,"an address,a residence"
柔道,じゅうどう,judo
十分,じゅうぶん,enough
趣味,しゅみ,hobby
紹介,しょうかい,introduction
小学校,しょうがっこう,elementary school
小説,しょうせつ,novel
将来,しょうらい,"future,prospects"
食料品,しょくりょうひん,groceries
女性,じょせい,woman
知らせる,しらせる,to notify
調べる,しらべる,to investigate
人口,じんこう,population
神社,じんじゃ,Shinto shrine
親切,しんせつ,kindness
新聞社,しんぶんしゃ,newspaper company
水泳,すいえい,swimming
水道,すいどう,water supply
数学,すうがく,"mathematics,arithmetic"
過ぎる,すぎる,to exceed
凄い,すごい,terrific
進む,すすむ,to make progress
,すっかり,completely
,すっと,"straight,all of a sudden"
捨てる,すてる,to throw away
,すな,sand
滑る,すべる,"to slide,to slip"
,すみ,"corner,nook"
済む,すむ,to finish
,すり,pickpocket
,すると,then
生活,せいかつ・する,to live
生産,せいさん・する,to produce
政治,せいじ,"politics,government"
西洋,せいよう,western countries
世界,せかい,the world
,せき,seat
説明,せつめい,explanation
背中,せなか,back of the body
,せん,line
戦争,せんそう,war
先輩,せんぱい,senior
,そう,really
育てる,そだてる,"to rear,to bring up"
卒業,そつぎょう,graduation
祖父,そふ,grandfather
祖母,そぼ,grandmother
,それで,because of that
,それに,moreover
,それほど,to that extent
,そろそろ,"gradually,soon"
,そんな,that sort of
,そんなに,"so much,like that"
退院,たいいん・する,to leave hospital
大学生,だいがくせい,university student
大事,だいじ,"important,valuable,serious matter"
大体,だいたい,generally
,たいてい,usually
大分,だいぶ,greatly
台風,たいふう,typhoon
倒れる,たおれる,to break down
,だから,"so,therefore"
確か,たしか,definite
足す,たす,to add a number
訪ねる,たずねる,to visit
尋ねる,たずねる,to ask
正しい,ただしい,correct
,たたみ,Japanese straw mat
立てる,たてる,to stand something up
建てる,たてる,to build
例えば,たとえば,for example
,たな,shelves
楽しみ,たのしみ,joy
楽む,たのしむ,to enjoy oneself
,たまに,occasionally
,ため,in order to
足りる,たりる,to be enough
男性,だんせい,male
暖房,だんぼう,heating
,,blood
,チェック・する,to check
,ちから,"strength,power"
,ちっとも,not at all (used with a negative verb)
,ちゃん,suffix for familiar person
注意,ちゅうい,caution
中学校,ちゅうがっこう,"junior high school,middle school"
注射,ちゅうしゃ,injection
駐車場,ちゅうしゃじょう,parking lot
地理,ちり,geography
捕まえる,つかまえる,to seize
付く,つく,to be attached
漬ける,つける,"to soak,to pickle"
都合,つごう,"circumstances,convenience"
伝える,つたえる,to report
続く,つづく,to be continued
続ける,つづける,to continue
包む,つつむ,to wrap
,つま,my wife
,つもり,intention
釣る,つる,to fish
丁寧,ていねい,polite
適当,てきとう,suitability
手伝う,てつだう,to assist
手袋,てぶくろ,glove
,てら,temple
,てん,"point,dot"
店員,てんいん,shop assistant
天気予報,てんきよほう,weather forecast
電灯,でんとう,electric light
電報,でんぽう,telegram
展覧会,てんらんかい,exhibition
,,metropolitan
道具,どうぐ,"tool,means"
,とうとう,"finally, after all"
動物園,どうぶつえん,zoo
遠く,とおく,distant
通る,とおる,to go through
特に,とくに,"particularly,especially"
特別,とくべつ,special
,とこや,barber
途中,とちゅう,on the way
特急,とっきゅう,limited express train (faster than an express train)
届ける,とどける,"to send, to deliver, to report"
泊まる,とまる,to lodge at
止める,とめる,to stop something
取り替える,とりかえる,to exchange
泥棒,どろぼう,thief
,どんどん,more and more
直す,なおす,"to fix,to repair"
直る,なおる,"to be fixed,to be repaired"
治る,なおる,"to be cured,to heal"
泣く,なく,to weep
無くなる,なくなる,"to disappear,to get lost"
亡くなる,なくなる,to die
投げる,なげる,to throw or cast away
,なさる,(respectful) to do
鳴る,なる,to sound
,なるべく,as much as possible
,なるほど,now I understand
慣れる,なれる,to grow accustomed to
苦い,にがい,bitter
二階建て,にかいだて,two storied
逃げる,にげる,to escape
日記,にっき,journal
入院,にゅういん・する,"to hospitalise, hospitalisation"
入学,にゅうがく・する,to enter school or university
似る,にる,to be similar
人形,にんぎょう,"doll, figure"
盗む,ぬすむ,to steal
塗る,ぬる,"to paint, to colour, to plaster"
,ぬれる,to get wet
,ねだん,price
,ねつ,fever
寝坊,ねぼう,sleeping in late
眠い,ねむい,sleepy
眠る,ねむる,to sleep
残る,のこる,to remain
乗り換える,のりかえる,to change between buses or trains
乗り物,のりもの,vehicle
,,leaf
場合,ばあい,situation
,ばい,double
拝見,はいけん・する,(humble) to look at
歯医者,はいしゃ,dentist
運ぶ,はこぶ,to transport
始める,はじめる,to begin
場所,ばしょ,location
,はず,it should be so
恥ずかしい,はずかしい,embarrassed
発音,はつおん,pronunciation
,はっきり,clearly
花見,はなみ,cherry-blossom viewing
,はやし,"woods,forester"
払う,はらう,to pay
番組,ばんぐみ,television or radio program
反対,はんたい,opposition
,,"day, sun"
,,fire
冷える,ひえる,to grow cold
,ひかり,light
光る,ひかる,"to shine,to glitter"
引き出し,ひきだし,"drawer,drawing out"
,ひきだす,to withdraw
,ひげ,beard
飛行場,ひこうじょう,airport
久しぶり,ひさしぶり,after a long time
美術館,びじゅつかん,art gallery
非常に,ひじょうに,extremely
引っ越す,ひっこす,to move house
必要,ひつよう,necessary
,ひどい,awful
開く,ひらく,to open an event
昼間,ひるま,"daytime,during the day"
昼休み,ひるやすみ,noon break
拾う,ひろう,"to pick up,to gather"
増える,ふえる,to increase
深い,ふかい,deep
複雑,ふくざつ,"complexity,complication"
復習,ふくしゅう,revision
部長,ぶちょう,head of a section
普通,ふつう,"usually, or a train that stops at every station"
,ぶどう,grapes
太る,ふとる,to become fat
布団,ふとん,"Japanese bedding, futon"
,ふね,ship
不便,ふべん,inconvenience
踏む,ふむ,to step on
降り出す,ふりだす,to start to rain
文化,ぶんか,culture
文学,ぶんがく,literature
文法,ぶんぽう,grammar
,べつ,different
,へん,strange
返事,へんじ,reply
貿易,ぼうえき,trade
法律,ほうりつ,law
,ぼく,I (used by males)
,ほし,star
,ほとんど,mostly
,ほめる,to praise
翻訳,ほんやく,translation
参る,まいる,"(humble) to go,to come"
負ける,まける,to lose
,または,"or,otherwise"
間違える,まちがえる,to make a mistake
間に合う,まにあう,to be in time for
周り,まわり,surroundings
回る,まわる,to go around
漫画,まんが,comic
真中,まんなか,middle
見える,みえる,to be in sight
,みずうみ,lake
味噌,みそ,"miso, soybean paste"
見つかる,みつかる,to be discovered
見つける,みつける,to discover
,みな,everybody
,みなと,harbour
向かう,むかう,to face
迎える,むかえる,to go out to meet
,むかし,"old times, old days, long ago, formerly"
,むし,insect
息子,むすこ,(humble) son
,むすめ,(humble) daughter
無理,むり,impossible
召し上がる,めしあがる,(polite) to eat
珍しい,めずらしい,rare
申し上げる,もうしあげる,"(humble) to say,to tell"
申す,もうす,"(humble) to be called,to say"
,もうすぐ,soon
,もし,if
戻る,もどる,to turn back
木綿,もめん,cotton
,もり,forest
焼く,やく,"to bake,to grill"
約束,やくそく,promise
役に立つ,やくにたつ,to be helpful
焼ける,やける,"to burn,to be roasted"
優しい,やさしい,kind
痩せる,やせる,to become thin
,やっと,at last
止む,やむ,to stop
止める,やめる,to stop
柔らかい,やわらかい,soft
,,hot water
,ゆび,finger
指輪,ゆびわ,a ring
,ゆめ,dream
揺れる,ゆれる,"to shake,to sway"
,よう,use
用意,ようい,preparation
用事,ようじ,things to do
汚れる,よごれる,to get dirty
予習,よしゅう,preparation for a lesson
予定,よてい,arrangement
予約,よやく,reservation
寄る,よる,to visit
喜ぶ,よろこぶ,to be delighted
理由,りゆう,reason
利用,りよう,utilization
両方,りょうほう,both sides
旅館,りょかん,Japanese hotel
留守,るす,absence
冷房,れいぼう,air conditioning
歴史,れきし,history
連絡,れんらく,contact
沸かす,わかす,"to boil,to heat"
別れる,わかれる,to separate
沸く,わく,"to boil, to grow hot,to get excited"
,わけ,"meaning,reason"
忘れ物,わすれもの,lost article
笑う,わらう,"to laugh,to smile"
割合,わりあい,"rate,ratio,percentage"
割れる,われる,to break
,アクセサリー,accessory
,アジア,Asia
,アナウンサー,announcer
,アフリカ,Africa
,アメリカ,America
,アルコール,alcohol
,アルバイト,part-time job
,エスカレーター,escalator
,オートバイ,motorcycle
,カーテン,curtain
,ガス,gas
,ガソリン,petrol
,ガソリンスタンド,petrol station
,ガラス,a glass pane
,ケーキ,cake
消しゴム,けしゴム,"eraser, rubber"
,コンサート,concert
,コンピューター,computer
,サラダ,salad
,サンダル,sandal
,サンドイッチ,sandwich
,ジャム,jam
,スーツ,suit
,スーツケース,suitcase
,スクリーン,screen
,ステーキ,steak
,ステレオ,stereo
,ソフト,soft
,タイプ,"type,style"
,チェック・する,to check
,テキスト,"text,text book"
,テニス,tennis
,パート,part time
,パソコン,personal computer
,ハンドバッグ,handbag 
,ピアノ,piano
,ビル,building or bill
,ファックス,fax
,プレゼント,present
,ベル,bell
,レジ,register
,レポート/リポート,report
,ワープロ,word processor


@@ -1,669 +0,0 @@
会う,あう,to meet
,あお,blue
青い,あおい,blue
,あか,red
赤い,あかい,red
明るい,あかるい,bright
,あき,autumn
開く,あく,"to open,to become open"
開ける,あける,to open
上げる,あげる,to give
,あさ,morning
朝御飯,あさごはん,breakfast
,あさって,day after tomorrow
,あし,"foot,leg"
明日,あした,tomorrow
,あそこ,over there
遊ぶ,あそぶ,"to play,to make a visit"
暖かい,あたたかい,warm
,あたま,head
新しい,あたらしい,new
,あちら,there
暑い,あつい,hot
熱い,あつい,hot to the touch
厚い,あつい,"kind, deep, thick"
,あっち,over there
,あと,afterwards
,あなた,you
,あに,(humble) older brother
,あね,(humble) older sister
,あの,that over there
,あの,um...
,アパート,apartment
,あびる,"to bathe,to shower"
危ない,あぶない,dangerous
甘い,あまい,sweet
,あまり,not very
,あめ,rain
,あめ,candy
洗う,あらう,to wash
,ある,"to be,to have (used for inanimate objects)"
歩く,あるく,to walk
,あれ,that
,いい/よい,good
,いいえ,no
言う,いう,to say
,いえ,house
,いかが,how
行く,いく,to go
,いくつ,"how many?,how old?"
,いくら,how much?
,いけ,pond
医者,いしゃ,medical doctor
,いす,chair
忙しい,いそがしい,"busy,irritated"
痛い,いたい,painful
,いち,one
一日,いちにち,"(1) one day, (2) first of month"
,いちばん,"best,first"
,いつ,when
五日,いつか,"five days, fifth day"
一緒,いっしょ,together
五つ,いつつ,five
,いつも,always
,いぬ,dog
,いま,now
意味,いみ,meaning
,いもうと,(humble) younger sister
,いや,unpleasant
入口,いりぐち,entrance
居る,いる,"to be, to have (used for people and animals)"
要る,いる,to need
入れる,いれる,to put in
,いろ,colour
,いろいろ,various
,うえ,on top of
後ろ,うしろ,behind
薄い,うすい,"thin,weak"
,うた,song
歌う,うたう,to sing
生まれる,うまれる,to be born
,うみ,sea
売る,うる,to sell
煩い,うるさい,"noisy,annoying"
上着,うわぎ,jacket
,え,picture
映画,えいが,movie
映画館,えいがかん,cinema
英語,えいご,English language
,ええ,yes
,えき,station
,エレベーター,elevator
鉛筆,えんぴつ,pencil
,おいしい,delicious
多い,おおい,many
大きい,おおきい,big
大きな,おおきな,big
大勢,おおぜい,great number of people
お母さん,おかあさん,(honorable) mother
お菓子,おかし,"sweets, candy"
お金,おかね,money
起きる,おきる,to get up
置く,おく,to put
奥さん,おくさん,(honorable) wife
お酒,おさけ,"alcohol, rice wine"
お皿,おさら,"plate, dish"
,おじいさん,"grandfather,male senior citizen"
教える,おしえる,"to teach,to tell"
伯父/叔父,おじさん,"uncle,middle aged gentleman"
押す,おす,"to push, to stamp something"
遅い,おそい,"late,slow"
お茶,おちゃ,green tea
お手洗い,おてあらい,bathroom
お父さん,おとうさん,(honorable) father
,おとうと,younger brother
,おとこ,man
男の子,おとこのこ,boy
一昨日,おととい,day before yesterday
一昨年,おととし,year before last
大人,おとな,adult
,おなか,stomach
同じ,おなじ,same
お兄さん,おにいさん,(honorable) older brother
お姉さん,おねえさん,(honorable) older sister
,おばあさん,"grandmother,female senior citizen"
伯母さん/叔母さん,おばさん,aunt
お風呂,おふろ,bath
お弁当,おべんとう,boxed lunch
覚える,おぼえる,to remember
,おまわりさん,friendly term for policeman
重い,おもい,heavy
,おもしろい,interesting
泳ぐ,およぐ,to swim
降りる,おりる,"to get off, to descend"
終る,おわる,to finish
音楽,おんがく,music
,おんな,woman
女の子,おんなのこ,girl
外国,がいこく,foreign country
外国人,がいこくじん,foreigner
会社,かいしゃ,company
階段,かいだん,stairs
買い物,かいもの,shopping
買う,かう,to buy
返す,かえす,to return something
帰る,かえる,to go back
,かかる,to take time or money
,かぎ,key
書く,かく,to write
学生,がくせい,student
,かける,to call by phone
,かさ,umbrella
貸す,かす,to lend
,かぜ,wind
風邪,かぜ,a cold
家族,かぞく,family
,かた,"person, way of doing"
学校,がっこう,school
,カップ,cup
家庭,かてい,household
,かど,a corner
,かばん,"bag,basket"
花瓶,かびん,a vase
,かみ,paper
,カメラ,camera
火曜日,かようび,Tuesday
辛い,からい,spicy
,からだ,body
借りる,かりる,to borrow
軽い,かるい,light
,カレー,curry
,カレンダー,calendar
川/河,かわ,river
,かわいい,cute
漢字,かんじ,Chinese character
,き,"tree,wood"
黄色,きいろ,yellow
黄色い,きいろい,yellow
消える,きえる,to disappear
聞く,きく,"to hear,to listen to,to ask"
,きた,north
,ギター,guitar
汚い,きたない,dirty
喫茶店,きっさてん,coffee lounge
切手,きって,postage stamp
切符,きっぷ,ticket
昨日,きのう,yesterday
,きゅう / く,nine
牛肉,ぎゅうにく,beef
牛乳,ぎゅうにゅう,milk
今日,きょう,today
教室,きょうしつ,classroom
兄弟,きょうだい,(humble) siblings
去年,きょねん,last year
嫌い,きらい,hate
切る,きる,to cut
着る,きる,to put on from the shoulders down
,きれい,"pretty,clean"
,キロ/キログラム,kilogram
,キロ/キロメートル,kilometre
銀行,ぎんこう,bank
金曜日,きんようび,Friday
,くすり,medicine
,ください,please
果物,くだもの,fruit
,くち,"mouth,opening"
,くつ,shoes
靴下,くつした,socks
,くに,country
曇り,くもり,cloudy weather
曇る,くもる,"to become cloudy,to become dim"
暗い,くらい,gloomy
,クラス,class
,グラム,gram
来る,くる,to come
,くるま,"car,vehicle"
,くろ,black
黒い,くろい,black
警官,けいかん,policeman
今朝,けさ,this morning
消す,けす,"to erase,to turn off power"
結構,けっこう,"splendid,enough"
結婚,けっこん,marriage
月曜日,げつようび,Monday
玄関,げんかん,entry hall
元気,げんき,"health, vitality"
,ご,five
公園,こうえん,park
交差点,こうさてん,intersection
紅茶,こうちゃ,black tea
交番,こうばん,police box
,こえ,voice
,コート,"coat,tennis court"
,コーヒー,coffee
,ここ,here
午後,ごご,afternoon
九日,ここのか,"nine days, ninth day"
九つ,ここのつ,nine
午前,ごぜん,morning
答える,こたえる,to answer
,こちら,this person or way
,こっち,this person or way
,コップ,a glass
今年,ことし,this year
言葉,ことば,"word,language"
子供,こども,child
,この,this
御飯,ごはん,"cooked rice,meal"
,コピーする,to copy
困る,こまる,to be worried
,これ,this
今月,こんげつ,this month
今週,こんしゅう,this week
,こんな,such
今晩,こんばん,this evening
,さあ,well…
財布,さいふ,wallet
,さかな,fish
,さき,"the future,previous"
咲く,さく,to bloom
作文,さくぶん,"composition,writing"
差す,さす,"to stretch out hands,to raise an umbrella"
雑誌,ざっし,magazine
砂糖,さとう,sugar
寒い,さむい,cold
さ来年,さらいねん,year after next
,さん,three
散歩,さんぽする,to stroll
,し / よん,four
,しお,salt
,しかし,however
時間,じかん,time
仕事,しごと,job
辞書,じしょ,dictionary
静か,しずか,quiet
,した,below
,しち / なな,seven
質問,しつもん,question
自転車,じてんしゃ,bicycle
自動車,じどうしゃ,automobile
死ぬ,しぬ,to die
字引,じびき,dictionary
自分,じぶん,oneself
閉まる,しまる,"to close,to be closed"
閉める,しめる,to close something
締める,しめる,to tie
,じゃ/じゃあ,well then…
写真,しゃしん,photograph
,シャツ,shirt
,シャワー,shower
,じゅう とお,ten
授業,じゅぎょう,"lesson,class work"
宿題,しゅくだい,homework
上手,じょうず,skillful
丈夫,じょうぶ,"strong,durable"
,しょうゆ,soy sauce
食堂,しょくどう,dining hall
知る,しる,to know
,しろ,white
白い,しろい,white
新聞,しんぶん,newspaper
水曜日,すいようび,Wednesday
吸う,すう,"to smoke,to suck"
,スカート,skirt
好き,すき,likeable
少ない,すくない,a few
,すぐに,instantly
少し,すこし,few
涼しい,すずしい,refreshing
,ストーブ,heater
,スプーン,spoon
,スポーツ,sport
,ズボン,trousers
住む,すむ,to live in
,スリッパ,slippers
,する,to do
座る,すわる,to sit
,せ,"height,stature"
生徒,せいと,pupil
,セーター,"sweater,jumper"
,せっけん,soap
背広,せびろ,business suit
狭い,せまい,narrow
,ゼロ,zero
,せん,thousand
先月,せんげつ,last month
先週,せんしゅう,last week
先生,せんせい,"teacher,doctor"
洗濯,せんたく,washing
全部,ぜんぶ,all
掃除,そうじする,"to clean, to sweep"
,そうして/そして,and
,そこ,that place
,そちら,over there
,そっち,over there
,そと,outside
,その,that
,そば,"near,beside"
,そら,sky
,それ,that
,それから,after that
,それでは,in that situation
大学,だいがく,university
大使館,たいしかん,embassy
大丈夫,だいじょうぶ,all right
大好き,だいすき,to be very likeable
大切,たいせつ,important
台所,だいどころ,kitchen
,たいへん,very
,たいへん,difficult situation
高い,たかい,"tall, expensive"
,たくさん,many
,タクシー,taxi
出す,だす,to put out
立つ,たつ,to stand
,たて,"length,height"
建物,たてもの,building
楽しい,たのしい,enjoyable
頼む,たのむ,to ask
,たばこ,"tobacco,cigarettes"
,たぶん,probably
食べ物,たべもの,food
食べる,たべる,to eat
,たまご,egg
,だれ,who
,だれか,somebody
誕生日,たんじょうび,birthday
,だんだん,gradually
小さい,ちいさい,little
小さな,ちいさな,little
近い,ちかい,near
違う,ちがう,to differ
近く,ちかく,near
地下鉄,ちかてつ,underground train
地図,ちず,map
茶色,ちゃいろ,brown
,ちゃわん,rice bowl
,ちょうど,exactly
,ちょっと,somewhat
一日,ついたち,first of month
使う,つかう,to use
疲れる,つかれる,to get tired
,つぎ,next
着く,つく,to arrive at
,つくえ,desk
作る,つくる,to make
,つける,to turn on
勤める,つとめる,to work for someone
,つまらない,boring
冷たい,つめたい,cold to the touch
強い,つよい,powerful
,て,hand
,テープ,tape
,テーブル,table
,テープレコーダー,tape recorder
出かける,でかける,to go out
手紙,てがみ,letter
,できる,to be able to
出口,でぐち,exit
,テスト,test
,では,with that...
,デパート,department store
,でも,but
出る,でる,"to appear,to leave"
,テレビ,television
天気,てんき,weather
電気,でんき,"electricity,electric light"
電車,でんしゃ,electric train
電話,でんわ,telephone
,と,Japanese style door
,ドア,Western style door
,トイレ,toilet
,どう,"how,in what way"
,どうして,for what reason
,どうぞ,please
動物,どうぶつ,animal
,どうも,thanks
遠い,とおい,far
十日,とおか,"ten days,the tenth day"
時々,ときどき,sometimes
時計,とけい,"watch,clock"
,どこ,where
,ところ,place
,とし,year
図書館,としょかん,library
,どちら,which of two
,どっち,which
,とても,very
,どなた,who
,となり,next door to
,どの,which
飛ぶ,とぶ,"to fly,to hop"
止まる,とまる,to come to a halt
友達,ともだち,friend
土曜日,どようび,Saturday
,とり,bird
とり肉,とりにく,chicken meat
取る,とる,to take something
撮る,とる,to take a photo or record a film
,どれ,which (of three or more)
,ナイフ,knife
,なか,middle
長い,ながい,long
鳴く,なく,"to make an animal sound: to chirp, to roar, to croak"
無くす,なくす,to lose something
,なぜ,why
,なつ,summer
夏休み,なつやすみ,summer holiday
,など,et cetera
七つ,ななつ,seven
七日,なのか,"seven days,the seventh day"
名前,なまえ,name
習う,ならう,to learn
並ぶ,ならぶ,"to line up,to stand in a line"
並べる,ならべる,"to line up,to set up"
,なる,to become
,なん/なに,what
,に,two
賑やか,にぎやか,"bustling,busy"
,にく,meat
西,にし,west
日曜日,にちようび,Sunday
荷物,にもつ,luggage
,ニュース,news
,にわ,garden
脱ぐ,ぬぐ,to take off clothes
温い,ぬるい,lukewarm
,ネクタイ,"tie,necktie"
,ねこ,cat
寝る,ねる,"to go to bed,to sleep"
,ノート,"notebook,exercise book"
登る,のぼる,to climb
飲み物,のみもの,a drink
飲む,のむ,to drink
乗る,のる,"to get on,to ride"
,は,tooth
,パーティー,party
,はい,yes
灰皿,はいざら,ashtray
入る,はいる,"to enter,to contain"
葉書,はがき,postcard
,はく,"to wear,to put on trousers"
,はこ,box
,はし,bridge
,はし,chopsticks
始まる,はじまる,to begin
初め/始め,はじめ,beginning
初めて,はじめて,for the first time
走る,はしる,to run
,バス,bus
,バター,butter
二十歳,はたち,"20 years old,20th year"
働く,はたらく,to work
,はち,eight
二十日,はつか,"twenty days,twentieth"
,はな,flower
,はな,nose
,はなし,"talk,story"
話す,はなす,to speak
早い,はやい,early
速い,はやい,quick
,はる,spring
貼る,はる,to stick
晴れ,はれ,clear weather
晴れる,はれる,to be sunny
,はん,half
,ばん,evening
,パン,bread
,ハンカチ,handkerchief
番号,ばんごう,number
晩御飯,ばんごはん,evening meal
半分,はんぶん,half
,ひがし,east
引く,ひく,to pull
弾く,ひく,"to play an instrument with strings, including piano"
低い,ひくい,"short,low"
飛行機,ひこうき,aeroplane
,ひだり,left hand side
,ひと,person
一つ,ひとつ,one
一月,ひとつき,one month
一人,ひとり,one person
,ひま,free time
,ひゃく,hundred
病院,びょういん,hospital
病気,びょうき,illness
,ひる,"noon, daytime"
昼御飯,ひるごはん,midday meal
広い,ひろい,"spacious,wide"
,フィルム,roll of film
封筒,ふうとう,envelope
,プール,swimming pool
,フォーク,fork
吹く,ふく,to blow
,ふく,clothes
二つ,ふたつ,two
豚肉,ぶたにく,pork
二人,ふたり,two people
二日,ふつか,"two days, second day of the month"
太い,ふとい,fat
,ふゆ,winter
降る,ふる,"to fall, e.g. rain or snow"
古い,ふるい,old (not used for people)
,ふろ,bath
文章,ぶんしょう,"sentence,text"
,ページ,page
下手,へた,unskillful
,ベッド,bed
,ペット,pet
部屋,へや,room
,へん,area
,ペン,pen
勉強,べんきょうする,to study
便利,べんり,"useful, convenient"
帽子,ぼうし,hat
,ボールペン,ball-point pen
,ほか,"other, the rest"
,ポケット,pocket
欲しい,ほしい,want
,ポスト,post
細い,ほそい,thin
,ボタン,button
,ホテル,hotel
,ほん,book
本棚,ほんだな,bookshelves
,ほんとう,truth
毎朝,まいあさ,every morning
毎月,まいげつ/まいつき,every month
毎週,まいしゅう,every week
毎日,まいにち,every day
毎年,まいねん/まいとし,every year
毎晩,まいばん,every night
,まえ,before
曲る,まがる,"to turn,to bend"
,まずい,unpleasant
,また,"again,and"
,まだ,"yet,still"
,まち,"town,city"
待つ,まつ,to wait
,まっすぐ,"straight ahead,direct"
,マッチ,match
,まど,window
丸い/円い,まるい,"round,circular"
,まん,ten thousand
万年筆,まんねんひつ,fountain pen
磨く,みがく,"to brush teeth, to polish"
,みぎ,right side
短い,みじかい,short
,みず,water
,みせ,shop
見せる,みせる,to show
,みち,street
三日,みっか,"three days, third day of the month"
三つ,みっつ,three
,みどり,green
皆さん,みなさん,everyone
,みなみ,south
,みみ,ear
見る/観る,みる,"to see, to watch"
,みんな,everyone
六日,むいか,"six days, sixth day of the month"
向こう,むこう,over there
難しい,むずかしい,difficult
六つ,むっつ,six
,むら,village
,め,eye
,メートル,metre
眼鏡,めがね,glasses
,もう,already
もう一度,もういちど,again
木曜日,もくようび,Thursday
持つ,もつ,to hold
,もっと,more
,もの,thing
,もん,gate
問題,もんだい,problem
八百屋,やおや,greengrocer
野菜,やさい,vegetable
易しい,やさしい,"easy, simple"
安い,やすい,cheap
休み,やすみ,"rest,holiday"
休む,やすむ,to rest
八つ,やっつ,eight
,やま,mountain
,やる,to do
夕方,ゆうがた,evening
夕飯,ゆうはん,dinner
郵便局,ゆうびんきょく,post office
昨夜,ゆうべ,last night
有名,ゆうめい,famous
,ゆき,snow
行く,ゆく,to go
,ゆっくりと,slowly
八日,ようか,"eight days, eighth day of the month"
洋服,ようふく,western-style clothes
,よく,"often, well"
,よこ,"beside,side,width"
四日,よっか,"four days, fourth day of the month"
四つ,よっつ,four
呼ぶ,よぶ,"to call out,to invite"
読む,よむ,to read
,よる,"evening,night"
弱い,よわい,weak
来月,らいげつ,next month
来週,らいしゅう,next week
来年,らいねん,next year
,ラジオ,radio
,ラジカセ / ラジオカセット,radio cassette player
,りっぱ,splendid
留学生,りゅうがくせい,overseas student
両親,りょうしん,both parents
料理,りょうり,cuisine
旅行,りょこう,travel
,れい,zero
冷蔵庫,れいぞうこ,refrigerator
,レコード,record
,レストラン,restaurant
練習,れんしゅうする,to practice
廊下,ろうか,corridor
,ろく,six
,ワイシャツ,business shirt
若い,わかい,young
分かる,わかる,to be understood
忘れる,わすれる,to forget
,わたくし,"(humble) I,myself"
,わたし,"I,myself"
渡す,わたす,to hand over
渡る,わたる,to go across
悪い,わるい,bad
,より/ほう,used for comparison
1 会う あう to meet
2 あお blue
3 青い あおい blue
4 あか red
5 赤い あかい red
6 明い あかるい bright
7 あき autumn
8 開く あく to open,to become open
9 開ける あける to open
10 上げる あげる to give
11 あさ morning
12 朝御飯 あさごはん breakfast
13 あさって day after tomorrow
14 あし foot,leg
15 明日 あした tomorrow
16 あそこ over there
17 遊ぶ あそぶ to play,to make a visit
18 暖かい あたたかい warm
19 あたま head
20 新しい あたらしい new
21 あちら there
22 暑い あつい hot
23 熱い あつい hot to the touch
24 厚い あつい kind, deep, thick
25 あっち over there
26 あと afterwards
27 あなた you
28 あに (humble) older brother
29 あね (humble) older sister
30 あの that over there
31 あの um...
32 アパート apartment
33 あびる to bathe,to shower
34 危ない あぶない dangerous
35 甘い あまい sweet
36 あまり not very
37 あめ rain
38 あめ candy
39 洗う あらう to wash
40 ある to be,to have (used for inanimate objects)
41 歩く あるく to walk
42 あれ that
43 いい/よい good
44 いいえ no
45 言う いう to say
46 いえ house
47 いかが how
48 行く いく to go
49 いくつ how many?,how old?
50 いくら how much?
51 いけ pond
52 医者 いしゃ medical doctor
53 いす chair
54 忙しい いそがしい busy,irritated
55 痛い いたい painful
56 いち one
57 一日 いちにち (1) one day, (2) first of month
58 いちばん best,first
59 いつ when
60 五日 いつか five days, fifth day
61 一緒 いっしょ together
62 五つ いつつ five
63 いつも always
64 いぬ dog
65 いま now
66 意味 いみ meaning
67 いもうと (humble) younger sister
68 いや unpleasant
69 入口 いりぐち entrance
70 居る いる to be, to have (used for people and animals)
71 要る いる to need
72 入れる いれる to put in
73 いろ colour
74 いろいろ various
75 うえ on top of
76 後ろ うしろ behind
77 薄い うすい thin,weak
78 うた song
79 歌う うたう to sing
80 生まれる うまれる to be born
81 うみ sea
82 売る うる to sell
83 煩い うるさい noisy,annoying
84 上着 うわぎ jacket
85 picture
86 映画 えいが movie
87 映画館 えいがかん cinema
88 英語 えいご English language
89 ええ yes
90 えき station
91 エレベーター elevator
92 鉛筆 えんぴつ pencil
93 おいしい delicious
94 多い おおい many
95 大きい おおきい big
96 大きな おおきな big
97 大勢 おおぜい great number of people
98 お母さん おかあさん (honorable) mother
99 お菓子 おかし sweets, candy
100 お金 おかね money
101 起きる おきる to get up
102 置く おく to put
103 奥さん おくさん (honorable) wife
104 お酒 おさけ alcohol, rice wine
105 お皿 おさら plate, dish
106 伯父/叔父 おじいさん grandfather,male senior citizen
107 教える おしえる to teach,to tell
108 伯父/叔父 おじさん uncle,middle aged gentleman
109 押す おす to push, to stamp something
110 遅い おそい late,slow
111 お茶 おちゃ green tea
112 お手洗い おてあらい bathroom
113 お父さん おとうさん (honorable) father
114 おとうと younger brother
115 おとこ man
116 男の子 おとこのこ boy
117 一昨日 おととい day before yesterday
118 一昨年 おととし year before last
119 大人 おとな adult
120 おなか stomach
121 同じ おなじ same
122 お兄さん おにいさん (honorable) older brother
123 お姉さん おねえさん (honorable) older sister
124 おばあさん grandmother,female senior-citizen
125 伯母さん/叔母さん おばさん aunt
126 お風呂 おふろ bath
127 お弁当 おべんとう boxed lunch
128 覚える おぼえる to remember
129 おまわりさん friendly term for policeman
130 重い おもい heavy
131 おもしろい interesting
132 泳ぐ およぐ to swim
133 降りる おりる to get off, to descend
134 終る おわる to finish
135 音楽 おんがく music
136 おんな woman
137 女の子 おんなのこ girl
138 外国 がいこく foreign country
139 外国人 がいこくじん foreigner
140 会社 かいしゃ company
141 階段 かいだん stairs
142 買い物 かいもの shopping
143 買う かう to buy
144 返す かえす to return something
145 帰る かえる to go back
146 かかる to take time or money
147 かぎ key
148 書く かく to write
149 学生 がくせい student
150 かける to call by phone
151 かさ umbrella
152 貸す かす to lend
153 かぜ wind
154 風邪 かぜ a cold
155 家族 かぞく family
156 かた person, way of doing
157 学校 がっこう school
158 カップ cup
159 家庭 かてい household
160 かど a corner
161 かばん bag,basket
162 花瓶 かびん a vase
163 かみ paper
164 カメラ camera
165 火曜日 かようび Tuesday
166 辛い からい spicy
167 からだ body
168 借りる かりる to borrow
169 軽い かるい light
170 カレー curry
171 カレンダー calendar
172 川/河 かわ river
173 かわいい cute
174 漢字 かんじ Chinese character
175 tree,wood
176 黄色 きいろ yellow
177 黄色い きいろい yellow
178 消える きえる to disappear
179 聞く きく to hear,to listen to,to ask
180 きた north
181 ギター guitar
182 汚い きたない dirty
183 喫茶店 きっさてん coffee lounge
184 切手 きって postage stamp
185 切符 きっぷ ticket
186 昨日 きのう yesterday
187 きゅう / く nine
188 牛肉 ぎゅうにく beef
189 牛乳 ぎゅうにゅう milk
190 今日 きょう today
191 教室 きょうしつ classroom
192 兄弟 きょうだい (humble) siblings
193 去年 きょねん last year
194 嫌い きらい hate
195 切る きる to cut
196 着る きる to put on from the shoulders down
197 きれい pretty,clean
198 キロ/キログラム kilogram
199 キロ/キロメートル kilometre
200 銀行 ぎんこう bank
201 金曜日 きんようび Friday
202 くすり medicine
203 ください please
204 果物 くだもの fruit
205 くち mouth,opening
206 くつ shoes
207 靴下 くつした socks
208 くに country
209 曇り くもり cloudy weather
210 曇る くもる to become cloudy,to become dim
211 暗い くらい gloomy
212 クラス class
213 グラム gram
214 来る くる to come
215 くるま car,vehicle
216 くろ black
217 黒い くろい black
218 警官 けいかん policeman
219 今朝 けさ this morning
220 消す けす to erase,to turn off power
221 結構 けっこう splendid,enough
222 結婚 けっこん marriage
223 月曜日 げつようび Monday
224 玄関 げんかん entry hall
225 元気 げんき health, vitality
226 five
227 公園 こうえん park
228 交差点 こうさてん intersection
229 紅茶 こうちゃ black tea
230 交番 こうばん police box
231 こえ voice
232 コート coat,tennis court
233 コーヒー coffee
234 ここ here
235 午後 ごご afternoon
236 九日 ここのか nine days, ninth day
237 九つ ここのつ nine
238 午前 ごぜん morning
239 答える こたえる to answer
240 こちら this person or way
241 こっち this person or way
242 コップ a glass
243 今年 ことし this year
244 言葉 ことば word,language
245 子供 こども child
246 この this
247 御飯 ごはん cooked rice,meal
248 コピーする to copy
249 困る こまる to be worried
250 これ this
251 今月 こんげつ this month
252 今週 こんしゅう this week
253 こんな such
254 今晩 こんばん this evening
255 さあ well…
256 財布 さいふ wallet
257 さかな fish
258 さき the future,previous
259 咲く さく to bloom
260 作文 さくぶん composition,writing
261 差す さす to stretch out hands,to raise an umbrella
262 雑誌 ざっし magazine
263 砂糖 さとう sugar
264 寒い さむい cold
265 さ来年 さらいねん year after next
266 さん three
267 散歩 さんぽする to stroll
268 し / よん four
269 しお salt
270 しかし however
271 時間 じかん time
272 仕事 しごと job
273 辞書 じしょ dictionary
274 静か しずか quiet
275 した below
276 しち / なな seven
277 質問 しつもん question
278 自転車 じてんしゃ bicycle
279 自動車 じどうしゃ automobile
280 死ぬ しぬ to die
281 字引 じびき dictionary
282 自分 じぶん oneself
283 閉まる しまる to close,to be closed
284 閉める しめる to close something
285 締める しめる to tie
286 じゃ/じゃあ well then…
287 写真 しゃしん photograph
288 シャツ shirt
289 シャワー shower
290 じゅう とお ten
291 授業 じゅぎょう lesson,class work
292 宿題 しゅくだい homework
293 上手 じょうず skillful
294 丈夫 じょうぶ strong,durable
295 しょうゆ soy sauce
296 食堂 しょくどう dining hall
297 知る しる to know
298 しろ white
299 白い しろい white
300 新聞 しんぶん newspaper
301 水曜日 すいようび Wednesday
302 吸う すう to smoke,to suck
303 スカート skirt
304 好き すき likeable
305 少ない すくない a few
306 すぐに instantly
307 少し すこし few
308 涼しい すずしい refreshing
309 ストーブ heater
310 スプーン spoon
311 スポーツ sport
312 ズボン trousers
313 住む すむ to live in
314 スリッパ slippers
315 する to do
316 座る すわる to sit
317 height,stature
318 生徒 せいと pupil
319 セーター sweater,jumper
320 せっけん soap
321 背広 せびろ business suit
322 狭い せまい narrow
323 ゼロ zero
324 せん thousand
325 先月 せんげつ last month
326 先週 せんしゅう last week
327 先生 せんせい teacher,doctor
328 洗濯 せんたく washing
329 全部 ぜんぶ all
330 掃除 そうじする to clean, to sweep
331 そうして/そして and
332 そこ that place
333 そちら over there
334 そっち over there
335 そと outside
336 その that
337 そば near,beside
338 そら sky
339 それ that
340 それから after that
341 それでは in that situation
342 大学 だいがく university
343 大使館 たいしかん embassy
344 大丈夫 だいじょうぶ all right
345 大好き だいすき to be very likeable
346 大切 たいせつ important
347 台所 だいどころ kitchen
348 たいへん very
349 たいへん difficult situation
350 高い たかい tall, expensive
351 たくさん many
352 タクシー taxi
353 出す だす to put out
354 立つ たつ to stand
355 たて length,height
356 建物 たてもの building
357 楽しい たのしい enjoyable
358 頼む たのむ to ask
359 たばこ tobacco,cigarettes
360 たぶん probably
361 食べ物 たべもの food
362 食べる たべる to eat
363 たまご egg
364 だれ who
365 だれか somebody
366 誕生日 たんじょうび birthday
367 だんだん gradually
368 小さい ちいさい little
369 小さな ちいさな little
370 近い ちかい near
371 違う ちがう to differ
372 近く ちかく near
373 地下鉄 ちかてつ underground train
374 地図 ちず map
375 茶色 ちゃいろ brown
376 ちゃわん rice bowl
377 ちょうど exactly
378 ちょっと somewhat
379 一日 ついたち first of month
380 使う つかう to use
381 疲れる つかれる to get tired
382 つぎ next
383 着く つく to arrive at
384 つくえ desk
385 作る つくる to make
386 つける to turn on
387 勤める つとめる to work for someone
388 つまらない boring
389 冷たい つめたい cold to the touch
390 強い つよい powerful
391 手 て hand
392 テープ tape
393 テーブル table
394 テープレコーダー tape recorder
395 出かける でかける to go out
396 手紙 てがみ letter
397 できる to be able to
398 出口 でぐち exit
399 テスト test
400 では with that...
401 デパート department store
402 でも but
403 出る でる to appear,to leave
404 テレビ television
405 天気 てんき weather
406 電気 でんき electricity,electric light
407 電車 でんしゃ electric train
408 電話 でんわ telephone
409 戸 と Japanese style door
410 ドア Western style door
411 トイレ toilet
412 どう how,in what way
413 どうして for what reason
414 どうぞ please
415 動物 どうぶつ animal
416 どうも thanks
417 遠い とおい far
418 十日 とおか ten days,the tenth day
419 時々 ときどき sometimes
420 時計 とけい watch,clock
421 どこ where
422 ところ place
423 とし year
424 図書館 としょかん library
425 どちら which of two
426 どっち which
427 とても very
428 どなた who
429 となり next door to
430 どの which
431 飛ぶ とぶ to fly,to hop
432 止まる とまる to come to a halt
433 友達 ともだち friend
434 土曜日 どようび Saturday
435 とり bird
436 とり肉 とりにく chicken meat
437 取る とる to take something
438 撮る とる to take a photo or record a film
439 どれ which (of three or more)
440 ナイフ knife
441 なか middle
442 長い ながい long
443 鳴く なく to make an animal sound; to chirp, roar, croak, etc.
444 無くす なくす to lose something
445 なぜ why
446 なつ summer
447 夏休み なつやすみ summer holiday
448 など et cetera
449 七つ ななつ seven
450 七日 なのか seven days,the seventh day
451 名前 なまえ name
452 習う ならう to learn
453 並ぶ ならぶ to line up,to stand in a line
454 並べる ならべる to line up,to set up
455 なる to become
456 なん/なに what
457 二 に two
458 賑やか にぎやか bustling,busy
459 にく meat
460 西 にし west
461 日曜日 にちようび Sunday
462 荷物 にもつ luggage
463 ニュース news
464 にわ garden
465 脱ぐ ぬぐ to take off clothes
466 温い ぬるい lukewarm
467 ネクタイ tie,necktie
468 ねこ cat
469 寝る ねる to go to bed,to sleep
470 ノート notebook,exercise book
471 登る のぼる to climb
472 飲み物 のみもの a drink
473 飲む のむ to drink
474 乗る のる to get on,to ride
475 歯 は tooth
476 パーティー party
477 はい yes
478 灰皿 はいざら ashtray
479 入る はいる to enter,to contain
480 葉書 はがき postcard
481 はく to wear,to put on trousers
482 はこ box
483 はし bridge
484 はし chopsticks
485 始まる はじまる to begin
486 初め/始め はじめ beginning
487 初めて はじめて for the first time
488 走る はしる to run
489 バス bus
490 バター butter
491 二十歳 はたち 20 years old,20th year
492 働く はたらく to work
493 はち eight
494 二十日 はつか twenty days,twentieth
495 はな flower
496 はな nose
497 はなし talk,story
498 話す はなす to speak
499 早い はやい early
500 速い はやい quick
501 はる spring
502 貼る はる to stick
503 晴れ はれ clear weather
504 晴れる はれる to be sunny
505 はん half
506 ばん evening
507 パン bread
508 ハンカチ handkerchief
509 番号 ばんごう number
510 晩御飯 ばんごはん evening meal
511 半分 はんぶん half
512 ひがし east
513 引く ひく to pull
514 弾く ひく to play an instrument with strings, including piano
515 低い ひくい short,low
516 飛行機 ひこうき aeroplane
517 ひだり left hand side
518 ひと person
519 一つ ひとつ one
520 一月 ひとつき one month
521 一人 ひとり one person
522 ひま free time
523 ひゃく hundred
524 病院 びょういん hospital
525 病気 びょうき illness
526 ひる noon, daytime
527 昼御飯 ひるごはん midday meal
528 広い ひろい spacious,wide
529 フィルム roll of film
530 封筒 ふうとう envelope
531 プール swimming pool
532 フォーク fork
533 吹く ふく to blow
534 ふく clothes
535 二つ ふたつ two
536 豚肉 ぶたにく pork
537 二人 ふたり two people
538 二日 ふつか two days, second day of the month
539 太い ふとい fat
540 ふゆ winter
541 降る ふる to fall, e.g. rain or snow
542 古い ふるい old (not used for people)
543 ふろ bath
544 文章 ぶんしょう sentence,text
545 ページ page
546 下手 へた unskillful
547 ベッド bed
548 ペット pet
549 部屋 へや room
550 へん area
551 ペン pen
552 勉強 べんきょうする to study
553 便利 べんり useful, convenient
554 帽子 ぼうし hat
555 ボールペン ball-point pen
556 ほか other, the rest
557 ポケット pocket
558 欲しい ほしい want
559 ポスト post
560 細い ほそい thin
561 ボタン button
562 ホテル hotel
563 ほん book
564 本棚 ほんだな bookshelves
565 ほんとう truth
566 毎朝 まいあさ every morning
567 毎月 まいげつ/まいつき every month
568 毎週 まいしゅう every week
569 毎日 まいにち every day
570 毎年 まいねん/まいとし every year
571 毎晩 まいばん every night
572 まえ before
573 曲る まがる to turn,to bend
574 まずい unpleasant
575 また again,and
576 まだ yet,still
577 まち town,city
578 待つ まつ to wait
579 まっすぐ straight ahead,direct
580 マッチ match
581 まど window
582 丸い/円い まるい round,circular
583 まん ten thousand
584 万年筆 まんねんひつ fountain pen
585 磨く みがく to brush teeth, to polish
586 みぎ right side
587 短い みじかい short
588 みず water
589 みせ shop
590 見せる みせる to show
591 みち street
592 三日 みっか three days, third day of the month
593 三つ みっつ three
594 みどり green
595 皆さん みなさん everyone
596 みなみ south
597 みみ ear
598 見る/観る みる to see,to watch
599 みんな everyone
600 六日 むいか six days, sixth day of the month
601 向こう むこう over there
602 難しい むずかしい difficult
603 六つ むっつ six
604 むら village
605 目 め eye
606 メートル metre
607 眼鏡 めがね glasses
608 もう already
609 もう一度 もういちど again
610 木曜日 もくようび Thursday
611 持つ もつ to hold
612 もっと more
613 もの thing
614 もん gate
615 問題 もんだい problem
616 八百屋 やおや greengrocer
617 野菜 やさい vegetable
618 易しい やさしい easy, simple
619 安い やすい cheap
620 休み やすみ rest,holiday
621 休む やすむ to rest
622 八つ やっつ eight
623 やま mountain
624 やる to do
625 夕方 ゆうがた evening
626 夕飯 ゆうはん dinner
627 郵便局 ゆうびんきょく post office
628 昨夜 ゆうべ last night
629 有名 ゆうめい famous
630 ゆき snow
631 行く ゆく to go
632 ゆっくりと slowly
633 八日 ようか eight days, eighth day of the month
634 洋服 ようふく western-style clothes
635 よく often, well
636 よこ beside,side,width
637 四日 よっか four days, fourth day of the month
638 四つ よっつ four
639 呼ぶ よぶ to call out,to invite
640 読む よむ to read
641 よる evening,night
642 弱い よわい weak
643 来月 らいげつ next month
644 来週 らいしゅう next week
645 来年 らいねん next year
646 ラジオ radio
647 ラジカセ / ラジオカセット radio cassette player
648 りっぱ splendid
649 留学生 りゅうがくせい overseas student
650 両親 りょうしん both parents
651 料理 りょうり cuisine
652 旅行 りょこう travel
653 れい zero
654 冷蔵庫 れいぞうこ refrigerator
655 レコード record
656 レストラン restaurant
657 練習 れんしゅうする to practice
658 廊下 ろうか corridor
659 ろく six
660 ワイシャツ business shirt
661 若い わかい young
662 分かる わかる to be understood
663 忘れる わすれる to forget
664 わたくし (humble) I,myself
665 わたし I,myself
666 渡す わたす to hand over
667 渡る わたる to go across
668 悪い わるい bad
669 より、ほう Used for comparison.

29
docs/database.md Normal file

@@ -0,0 +1,29 @@
# Database
Here are some choices that have been made when designing the schema
### `JMdict_{Reading,Kanji}Element.elementId` and `JMdict_Sense.senseId`
The `elementId`/`senseId` field acts as a unique identifier for each individual element in these tables.
It is a packed version of the `(entryId, orderNum)` pair, where the first number is given 7 digits and the second is given 2 digits (the highest `orderNum` found so far is `40`).
Since `entryId` is already a field in the table, it would technically have been fine to store `orderNum` as a separate field,
but it is easier to refer to these entries from other tables without a composite foreign key.
(NOTE: `entryId` is now inferred from `elementId` within SQLite using a generated column, so saying it is "stored in a separate field" might be a stretch)
In addition, `1000000000` is added to the reading element ids to keep them distinct from the kanji element ids. This reduces the amount of space needed for indices in some places, because each part can be filtered out with a simple `>` or `<` comparison.
We used to generate the `elementId` as a sequential id, separately from `orderNum`, but that led to all values shifting whenever the data was updated, producing very large diffs. Deriving the id from a unique composite of the source data itself means the values stay stable across updates.
Due to the way the data is structured, the `elementId` can double as the ordering number.
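The packing scheme described above can be sketched with a few helpers. The function names here are illustrative, not the actual API; only the digit layout and the `1000000000` offset come from the notes above:

```dart
// Illustrative helpers for the (entryId, orderNum) packing described above:
// entryId gets 7 digits, orderNum gets 2, and reading element ids are
// offset by 1000000000 so the two id ranges never overlap.
const int readingOffset = 1000000000;

int kanjiElementId(int entryId, int orderNum) {
  assert(orderNum < 100, 'orderNum must fit in 2 digits');
  return entryId * 100 + orderNum;
}

int readingElementId(int entryId, int orderNum) =>
    readingOffset + kanjiElementId(entryId, orderNum);

bool isReadingElement(int elementId) => elementId >= readingOffset;

// The generated column mentioned above would compute something like this.
int entryIdOf(int elementId) => (elementId % readingOffset) ~/ 100;

void main() {
  final k = kanjiElementId(1358280, 3); // 135828003
  final r = readingElementId(1358280, 3); // 1135828003
  print('$k $r ${entryIdOf(k)} ${entryIdOf(r)}');
}
```

Because the reading ids all sit above the offset, an index scan can isolate either element type with a single `elementId < 1000000000` (or `>=`) predicate.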
### `JMdict_EntryScore`
The `JMdict_EntryScore` table stores the score of each entry, which is used for sorting search results. The score is calculated from a number of variables.
The table is automatically generated from other tables via triggers, and should be considered a materialized view.
<s>There is a score row for every single entry in both `JMdict_KanjiElement` and `JMdict_ReadingElement`, split by the `type` field.</s>
This is no longer true: we now only store rows whose score is not `0`. The `type` field is now also virtual, since the `elementId` ranges for kanji and readings do not overlap.

13
docs/lemmatizer.md Normal file

@@ -0,0 +1,13 @@
# Lemmatizer
The lemmatizer is still quite experimental, but will play a more important role in the project in the future.
It is a manual implementation of a [Finite State Transducer](https://en.wikipedia.org/wiki/Morphological_dictionary#Finite_State_Transducers) for morphological parsing. The FST recursively removes affixes from a word until it (hopefully) deconjugates into its dictionary form. The resulting deconjugation tree is then combined with queries against the dictionary data to determine whether a deconjugation leads to a real, known word.
Each separate rule is a separate static object declared in `lib/util/lemmatizer/rules`.
There is a CLI subcommand for testing the tool interactively:
```bash
dart run jadb lemmatize -w '食べさせられない'
```
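A minimal sketch of the recursive affix-stripping idea. The rules below are illustrative examples for ichidan verbs only, not the actual rule set in `lib/util/lemmatizer/rules`:

```dart
// Illustrative sketch of recursive affix stripping; the real lemmatizer
// uses a much larger, hand-written rule set.
class Rule {
  final String affix; // conjugated suffix to strip
  final String base; // dictionary-form suffix to append instead
  const Rule(this.affix, this.base);
}

// A tiny, hypothetical subset of ichidan-verb rules.
const rules = [
  Rule('ない', 'る'), // negative
  Rule('られる', 'る'), // passive / potential
  Rule('させる', 'る'), // causative
];

// Collect every candidate reachable by repeatedly stripping one affix.
Set<String> deconjugate(String word, [int depth = 0]) {
  final results = {word};
  if (depth > 8) return results; // guard against runaway recursion
  for (final r in rules) {
    if (word.length > r.affix.length && word.endsWith(r.affix)) {
      final stem =
          word.substring(0, word.length - r.affix.length) + r.base;
      results.addAll(deconjugate(stem, depth + 1));
    }
  }
  return results;
}

void main() {
  // Each candidate would then be checked against the dictionary data.
  print(deconjugate('食べさせられない').contains('食べる')); // true
}
```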

28
docs/overview.md Normal file

@@ -0,0 +1,28 @@
# Overview
This is the documentation for `jadb`. Since I'm currently the only one working on it, the documentation is more or less just notes to myself, to ensure I remember how and why I implemented certain features in a certain way a few months down the road. This is not a comprehensive and formal documentation for downstream use, neither for developers nor end-users.
- [Word Search](./word-search.md)
- [Database](./database.md)
- [Lemmatizer](./lemmatizer.md)
## Project structure
- `lib/_data_ingestion` contains all the code for reading data sources, transforming them and compiling them into an SQLite database. This is for the most part isolated from the rest of the codebase, and should not be depended on by any code used for querying the database.
- `lib/cli` contains code for cli tooling (e.g. argument parsing, subcommand handling, etc.)
- `lib/const_data` contains database data that is small enough to warrant being hardcoded as Dart constants.
- `lib/models` contains all the code for representing the database schema as Dart classes, and for converting between those classes and the actual database.
- `lib/search` contains all the code for searching the database.
- `lib/util/lemmatizer` contains the code for lemmatization, which will be used by the search code in the future.
- `migrations` contains raw SQL files for creating the database schema.
## SQLite naming conventions
> [!WARNING]
> None of these conventions are enforced yet; this will be fixed at some point.
- Indices are prefixed with `IDX__`
- Crossref tables are prefixed with `XREF__`
- Trigger names are prefixed with `TRG__`
- Views are prefixed with `VW__`
- All data sources should have a `<datasource>_Version` table, which contains a single row with the version of the data source used to generate the database.

21
docs/word-search.md Normal file

@@ -0,0 +1,21 @@
# Word search
The word search procedure is currently split into 3 parts:
1. **Entry ID query**:
   Uses a complex query with various scoring factors to get a list of
   database ids pointing at dictionary entries, sorted by how likely we think each
   entry is the one the caller is looking for. The output here is a `List<int>`.
2. **Data query**:
   Takes the entry id list from the previous step, and performs all queries needed to retrieve
   all the dictionary data for those ids. The result is a struct with a bunch of flattened lists
   containing data for all the dictionary entries. These lists are sorted by the order in which
   the ids were provided.
3. **Regrouping**:
   Takes the flattened data, and regroups the items into structs with a more "hierarchical" structure.
   All data tagged with the same id ends up in the same struct. Returns a list of these structs.
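The three stages above can be sketched as follows. All names and the in-memory sample data are hypothetical; only the stage boundaries mirror the description:

```dart
// Illustrative sketch of the three-stage search pipeline described above.
class FlatResults {
  // Flattened (entryId, glossary) rows, one list per table in the real code.
  final List<MapEntry<int, String>> glossaries;
  const FlatResults(this.glossaries);
}

class SearchResult {
  final int entryId;
  final List<String> glossaries;
  const SearchResult(this.entryId, this.glossaries);
}

// Stage 1: score and rank, returning only entry ids (a List<int>).
List<int> queryEntryIds(String query) => [1000010, 1000020];

// Stage 2: fetch flattened data for the ranked ids.
FlatResults queryData(List<int> ids) => const FlatResults([
      MapEntry(1000010, 'to eat'),
      MapEntry(1000020, 'meal'),
    ]);

// Stage 3: regroup flattened rows into hierarchical structs,
// preserving the ranking order produced by stage 1.
List<SearchResult> regroup(List<int> ids, FlatResults flat) => [
      for (final id in ids)
        SearchResult(
          id,
          [
            for (final g in flat.glossaries)
              if (g.key == id) g.value,
          ],
        ),
    ];

void main() {
  final ids = queryEntryIds('たべる');
  final results = regroup(ids, queryData(ids));
  print(results.map((r) => r.entryId).toList()); // [1000010, 1000020]
}
```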

71
flake.lock generated

@@ -1,48 +1,32 @@
{
"nodes": {
"jmdict-src": {
"flake": false,
"datasources": {
"inputs": {
"nixpkgs": [
"nixpkgs"
]
},
"locked": {
"narHash": "sha256-84P7r/fFlBnawy6yChrD9WMHmOWcEGWUmoK70N4rdGQ=",
"type": "file",
"url": "http://ftp.edrdg.org/pub/Nihongo/JMdict_e.gz"
"lastModified": 1776081209,
"narHash": "sha256-zR1115tcOPnYLk6NznSf7YslyaJLc/MGayEHShitx18=",
"ref": "refs/heads/main",
"rev": "7fe3552bb16e1d315c0b27b243e5eb53cd9e86fc",
"revCount": 13,
"type": "git",
"url": "https://git.pvv.ntnu.no/Mugiten/datasources.git"
},
"original": {
"type": "file",
"url": "http://ftp.edrdg.org/pub/Nihongo/JMdict_e.gz"
}
},
"jmdict-with-examples-src": {
"flake": false,
"locked": {
"narHash": "sha256-PM0sv7VcsCya2Ek02CI7hVwB3Jawn6bICSI+dsJK0yo=",
"type": "file",
"url": "http://ftp.edrdg.org/pub/Nihongo/JMdict_e_examp.gz"
},
"original": {
"type": "file",
"url": "http://ftp.edrdg.org/pub/Nihongo/JMdict_e_examp.gz"
}
},
"kanjidic2-src": {
"flake": false,
"locked": {
"narHash": "sha256-Lc0wUPpuDKuMDv2t87//w3z20RX8SMJI2iIRtUJ8fn0=",
"type": "file",
"url": "https://www.edrdg.org/kanjidic/kanjidic2.xml.gz"
},
"original": {
"type": "file",
"url": "https://www.edrdg.org/kanjidic/kanjidic2.xml.gz"
"type": "git",
"url": "https://git.pvv.ntnu.no/Mugiten/datasources.git"
}
},
"nixpkgs": {
"locked": {
"lastModified": 1746904237,
"narHash": "sha256-3e+AVBczosP5dCLQmMoMEogM57gmZ2qrVSrmq9aResQ=",
"lastModified": 1775423009,
"narHash": "sha256-vPKLpjhIVWdDrfiUM8atW6YkIggCEKdSAlJPzzhkQlw=",
"owner": "NixOS",
"repo": "nixpkgs",
"rev": "d89fc19e405cb2d55ce7cc114356846a0ee5e956",
"rev": "68d8aa3d661f0e6bd5862291b5bb263b2a6595c9",
"type": "github"
},
"original": {
@@ -51,25 +35,10 @@
"type": "indirect"
}
},
"radkfile-src": {
"flake": false,
"locked": {
"narHash": "sha256-rO2z5GPt3g6osZOlpyWysmIbRV2Gw4AR4XvngVTHNpk=",
"type": "file",
"url": "http://ftp.usf.edu/pub/ftp.monash.edu.au/pub/nihongo/radkfile.gz"
},
"original": {
"type": "file",
"url": "http://ftp.usf.edu/pub/ftp.monash.edu.au/pub/nihongo/radkfile.gz"
}
},
"root": {
"inputs": {
"jmdict-src": "jmdict-src",
"jmdict-with-examples-src": "jmdict-with-examples-src",
"kanjidic2-src": "kanjidic2-src",
"nixpkgs": "nixpkgs",
"radkfile-src": "radkfile-src"
"datasources": "datasources",
"nixpkgs": "nixpkgs"
}
}
},


@@ -4,35 +4,16 @@
inputs = {
nixpkgs.url = "nixpkgs/nixos-unstable";
jmdict-src = {
# url = "http://ftp.edrdg.org/pub/Nihongo/JMdict.gz";
url = "http://ftp.edrdg.org/pub/Nihongo/JMdict_e.gz";
flake = false;
};
jmdict-with-examples-src = {
url = "http://ftp.edrdg.org/pub/Nihongo/JMdict_e_examp.gz";
flake = false;
};
radkfile-src = {
url = "http://ftp.usf.edu/pub/ftp.monash.edu.au/pub/nihongo/radkfile.gz";
flake = false;
};
kanjidic2-src = {
url = "https://www.edrdg.org/kanjidic/kanjidic2.xml.gz";
flake = false;
datasources = {
url = "git+https://git.pvv.ntnu.no/Mugiten/datasources.git";
inputs.nixpkgs.follows = "nixpkgs";
};
};
outputs = {
self,
nixpkgs,
jmdict-src,
jmdict-with-examples-src,
radkfile-src,
kanjidic2-src
datasources,
}: let
inherit (nixpkgs) lib;
systems = [
@@ -77,20 +58,29 @@
devShells = forAllSystems (system: pkgs: {
default = pkgs.mkShell {
buildInputs = with pkgs; [
packages = with pkgs; [
dart
gnumake
lcov
sqldiff
sqlite-interactive
sqlite-analyzer
sqlite-web
sqlint
sqlfluff
];
env = {
LIBSQLITE_PATH = "${pkgs.sqlite.out}/lib/libsqlite3.so";
JADB_PATH = "result/jadb.sqlite";
LD_LIBRARY_PATH = lib.makeLibraryPath [ pkgs.sqlite ];
};
};
sqlite-debugging = pkgs.mkShell {
packages = with pkgs; [
sqlite-interactive
sqlite-analyzer
sqlite-web
# sqlint
sqlfluff
];
};
});
packages = let
@@ -104,33 +94,43 @@
platforms = lib.platforms.all;
};
src = lib.cleanSource ./.;
src = builtins.filterSource (path: type: let
baseName = baseNameOf (toString path);
in !(lib.any (b: b) [
(!(lib.cleanSourceFilter path type))
(baseName == ".github" && type == "directory")
(baseName == ".gitea" && type == "directory")
(baseName == "nix" && type == "directory")
(baseName == ".envrc" && type == "regular")
(baseName == "flake.lock" && type == "regular")
(baseName == "flake.nix" && type == "regular")
(baseName == ".sqlfluff" && type == "regular")
])) ./.;
in forAllSystems (system: pkgs: {
default = self.packages.${system}.database;
jmdict = pkgs.callPackage ./nix/jmdict.nix {
inherit jmdict-src jmdict-with-examples-src edrdgMetadata;
};
filteredSource = pkgs.runCommandLocal "filtered-source" { } ''
ln -s ${src} $out
'';
radkfile = pkgs.callPackage ./nix/radkfile.nix {
inherit radkfile-src edrdgMetadata;
};
kanjidic2 = pkgs.callPackage ./nix/kanjidic2.nix {
inherit kanjidic2-src edrdgMetadata;
};
inherit (datasources.packages.${system}) jmdict radkfile kanjidic2;
database-tool = pkgs.callPackage ./nix/database_tool.nix {
inherit src;
};
database = pkgs.callPackage ./nix/database.nix {
inherit (self.packages.${system}) database-tool jmdict radkfile kanjidic2;
inherit (datasources.packages.${system}) jmdict radkfile kanjidic2 tanos-jlpt;
inherit (self.packages.${system}) database-tool;
inherit src;
};
database-wal = pkgs.callPackage ./nix/database.nix {
inherit (self.packages.${system}) database-tool jmdict radkfile kanjidic2;
inherit (datasources.packages.${system}) jmdict radkfile kanjidic2 tanos-jlpt;
inherit (self.packages.${system}) database-tool;
inherit src;
wal = true;
};


@@ -1,13 +1,15 @@
import 'package:jadb/_data_ingestion/sql_writable.dart';
abstract class Element extends SQLWritable {
final int elementId;
final String reading;
final int? news;
final int? ichi;
final int? spec;
final int? gai;
final int? nf;
const Element({
Element({
required this.elementId,
required this.reading,
this.news,
this.ichi,
@@ -16,77 +18,61 @@ abstract class Element extends SQLWritable {
this.nf,
});
@override
Map<String, Object?> get sqlValue => {
'reading': reading,
'news': news,
'ichi': ichi,
'spec': spec,
'gai': gai,
'nf': nf,
};
'elementId': elementId,
'reading': reading,
'news': news,
'ichi': ichi,
'spec': spec,
'gai': gai,
'nf': nf,
};
}
class KanjiElement extends Element {
int orderNum;
List<String> info;
KanjiElement({
this.info = const [],
required this.orderNum,
required String reading,
int? news,
int? ichi,
int? spec,
int? gai,
int? nf,
}) : super(
reading: reading,
news: news,
ichi: ichi,
spec: spec,
gai: gai,
nf: nf,
);
required super.elementId,
required super.reading,
super.news,
super.ichi,
super.spec,
super.gai,
super.nf,
});
@override
Map<String, Object?> get sqlValue => {
...super.sqlValue,
'orderNum': orderNum,
};
...super.sqlValue,
};
}
class ReadingElement extends Element {
int orderNum;
bool readingDoesNotMatchKanji;
List<String> info;
List<String> restrictions;
ReadingElement({
required this.orderNum,
required this.readingDoesNotMatchKanji,
this.info = const [],
this.restrictions = const [],
required String reading,
int? news,
int? ichi,
int? spec,
int? gai,
int? nf,
}) : super(
reading: reading,
news: news,
ichi: ichi,
spec: spec,
gai: gai,
nf: nf,
);
required super.elementId,
required super.reading,
super.news,
super.ichi,
super.spec,
super.gai,
super.nf,
});
@override
Map<String, Object?> get sqlValue => {
...super.sqlValue,
'orderNum': orderNum,
'readingDoesNotMatchKanji': readingDoesNotMatchKanji,
};
...super.sqlValue,
'readingDoesNotMatchKanji': readingDoesNotMatchKanji,
};
}
class LanguageSource extends SQLWritable {
@@ -104,11 +90,11 @@ class LanguageSource extends SQLWritable {
@override
Map<String, Object?> get sqlValue => {
'language': language,
'phrase': phrase,
'fullyDescribesSense': fullyDescribesSense,
'constructedFromSmallerWords': constructedFromSmallerWords,
};
'language': language,
'phrase': phrase,
'fullyDescribesSense': fullyDescribesSense,
'constructedFromSmallerWords': constructedFromSmallerWords,
};
}
class Glossary extends SQLWritable {
@@ -116,53 +102,44 @@ class Glossary extends SQLWritable {
final String phrase;
final String? type;
const Glossary({
required this.language,
required this.phrase,
this.type,
});
const Glossary({required this.language, required this.phrase, this.type});
@override
Map<String, Object?> get sqlValue => {
'language': language,
'phrase': phrase,
'type': type,
};
// 'language': language,
'phrase': phrase,
};
}
final kanaRegex =
RegExp(r'^[\p{Script=Katakana}\p{Script=Hiragana}ー]+$', unicode: true);
final kanaRegex = RegExp(
r'^[\p{Script=Katakana}\p{Script=Hiragana}ー]+$',
unicode: true,
);
class XRefParts {
final String? kanjiRef;
final String? readingRef;
final int? senseOrderNum;
const XRefParts({
this.kanjiRef,
this.readingRef,
this.senseOrderNum,
}) : assert(kanjiRef != null || readingRef != null);
const XRefParts({this.kanjiRef, this.readingRef, this.senseOrderNum})
: assert(kanjiRef != null || readingRef != null);
Map<String, Object?> toJson() => {
'kanjiRef': kanjiRef,
'readingRef': readingRef,
'senseOrderNum': senseOrderNum,
};
'kanjiRef': kanjiRef,
'readingRef': readingRef,
'senseOrderNum': senseOrderNum,
};
}
class XRef {
final String entryId;
final String reading;
const XRef({
required this.entryId,
required this.reading,
});
const XRef({required this.entryId, required this.reading});
}
class Sense extends SQLWritable {
final int senseId;
final int orderNum;
final List<XRefParts> antonyms;
final List<String> dialects;
final List<String> fields;
@@ -177,7 +154,6 @@ class Sense extends SQLWritable {
const Sense({
required this.senseId,
required this.orderNum,
this.antonyms = const [],
this.dialects = const [],
this.fields = const [],
@@ -193,9 +169,8 @@ class Sense extends SQLWritable {
@override
Map<String, Object?> get sqlValue => {
'senseId': senseId,
'orderNum': orderNum,
};
'senseId': senseId,
};
bool get isEmpty =>
antonyms.isEmpty &&
@@ -224,5 +199,6 @@ class Entry extends SQLWritable {
required this.senses,
});
@override
Map<String, Object?> get sqlValue => {'entryId': entryId};
}


@@ -1,35 +1,47 @@
import 'dart:collection';
import 'dart:io';
import 'package:collection/collection.dart';
import 'package:jadb/_data_ingestion/jmdict/objects.dart';
import 'package:jadb/table_names/jmdict.dart';
import 'package:sqflite_common/sqlite_api.dart';
/// A wrapper for the result of resolving an xref, which includes the resolved entry and a flag
/// indicating whether the xref was ambiguous (i.e. could refer to multiple entries).
class ResolvedXref {
Entry entry;
bool ambiguous;
final Entry entry;
final bool ambiguous;
ResolvedXref(this.entry, this.ambiguous);
const ResolvedXref(this.entry, this.ambiguous);
}
/// Resolves an xref (pair of kanji, optionally reading, and optionally sense number) to a specific
/// JMdict entry, if possible.
///
/// If the xref is ambiguous (i.e. it could refer to multiple entries), the
/// first entry is returned, and the returned value is marked as ambiguous.
///
/// If the xref cannot be resolved to any entry at all, an exception is thrown.
ResolvedXref resolveXref(
SplayTreeMap<String, Set<Entry>> entriesByKanji,
SplayTreeMap<String, Set<Entry>> entriesByReading,
XRefParts xref,
) {
List<Entry> candidateEntries = switch ((xref.kanjiRef, xref.readingRef)) {
(null, null) =>
throw Exception('Xref $xref has no kanji or reading reference'),
(String k, null) => entriesByKanji[k]!.toList(),
(null, String r) => entriesByReading[r]!.toList(),
(String k, String r) =>
(null, null) => throw Exception(
'Xref $xref has no kanji or reading reference',
),
(final String k, null) => entriesByKanji[k]!.toList(),
(null, final String r) => entriesByReading[r]!.toList(),
(final String k, final String r) =>
entriesByKanji[k]!.intersection(entriesByReading[r]!).toList(),
};
// Filter out entries that don't have the number of senses specified in the xref
if (xref.senseOrderNum != null) {
candidateEntries
.retainWhere((entry) => entry.senses.length >= xref.senseOrderNum!);
candidateEntries.retainWhere(
(entry) => entry.senses.length >= xref.senseOrderNum!,
);
}
// If the xref has a reading ref but no kanji ref, and there are multiple
@@ -38,8 +50,9 @@ ResolvedXref resolveXref(
if (xref.kanjiRef == null &&
xref.readingRef != null &&
candidateEntries.length > 1) {
final candidatesWithEmptyKanji =
candidateEntries.where((entry) => entry.kanji.length == 0).toList();
final candidatesWithEmptyKanji = candidateEntries
.where((entry) => entry.kanji.isEmpty)
.toList();
if (candidatesWithEmptyKanji.isNotEmpty) {
candidateEntries = candidatesWithEmptyKanji;
@@ -50,7 +63,7 @@ ResolvedXref resolveXref(
// entry in case there are multiple candidates left.
candidateEntries.sortBy<num>((entry) => entry.senses.length);
if (candidateEntries.length == 0) {
if (candidateEntries.isEmpty) {
throw Exception(
'SKIPPING: Xref $xref has ${candidateEntries.length} entries, '
'kanjiRef: ${xref.kanjiRef}, readingRef: ${xref.readingRef}, '
@@ -72,51 +85,49 @@ Future<void> seedJMDictData(List<Entry> entries, Database db) async {
print(' [JMdict] Batch 1 - Kanji and readings');
Batch b = db.batch();
if (Platform.environment['JMDICT_VERSION'] != null &&
Platform.environment['JMDICT_DATE'] != null &&
Platform.environment['JMDICT_HASH'] != null) {
b.insert(JMdictTableNames.version, {
'version': Platform.environment['JMDICT_VERSION']!,
'date': Platform.environment['JMDICT_DATE']!,
'hash': Platform.environment['JMDICT_HASH']!,
});
} else {
print(
'WARNING: JMDICT version information not found in environment variables. '
'This may cause issues with future updates.',
);
}
for (final e in entries) {
b.insert(JMdictTableNames.entry, e.sqlValue);
for (final k in e.kanji) {
b.insert(
JMdictTableNames.kanjiElement,
k.sqlValue..addAll({'entryId': e.entryId}),
);
b.insert(JMdictTableNames.kanjiElement, k.sqlValue);
for (final i in k.info) {
b.insert(
JMdictTableNames.kanjiInfo,
{
'entryId': e.entryId,
'reading': k.reading,
'info': i,
},
);
b.insert(JMdictTableNames.kanjiInfo, {
'elementId': k.elementId,
'info': i,
});
}
}
for (final r in e.readings) {
b.insert(
JMdictTableNames.readingElement,
r.sqlValue..addAll({'entryId': e.entryId}),
);
b.insert(JMdictTableNames.readingElement, r.sqlValue);
for (final i in r.info) {
b.insert(
JMdictTableNames.readingInfo,
{
'entryId': e.entryId,
'reading': r.reading,
'info': i,
},
);
b.insert(JMdictTableNames.readingInfo, {
'elementId': r.elementId,
'info': i,
});
}
for (final res in r.restrictions) {
b.insert(
JMdictTableNames.readingRestriction,
{
'entryId': e.entryId,
'reading': r.reading,
'restriction': res,
},
);
b.insert(JMdictTableNames.readingRestriction, {
'elementId': r.elementId,
'restriction': res,
});
}
}
}
@@ -128,17 +139,18 @@ Future<void> seedJMDictData(List<Entry> entries, Database db) async {
for (final e in entries) {
for (final s in e.senses) {
b.insert(
JMdictTableNames.sense, s.sqlValue..addAll({'entryId': e.entryId}));
b.insert(JMdictTableNames.sense, s.sqlValue);
for (final d in s.dialects) {
b.insert(
JMdictTableNames.senseDialect,
{'senseId': s.senseId, 'dialect': d},
);
b.insert(JMdictTableNames.senseDialect, {
'senseId': s.senseId,
'dialect': d,
});
}
for (final f in s.fields) {
b.insert(
JMdictTableNames.senseField, {'senseId': s.senseId, 'field': f});
b.insert(JMdictTableNames.senseField, {
'senseId': s.senseId,
'field': f,
});
}
for (final i in s.info) {
b.insert(JMdictTableNames.senseInfo, {'senseId': s.senseId, 'info': i});
@@ -150,16 +162,20 @@ Future<void> seedJMDictData(List<Entry> entries, Database db) async {
b.insert(JMdictTableNames.sensePOS, {'senseId': s.senseId, 'pos': p});
}
for (final rk in s.restrictedToKanji) {
b.insert(
JMdictTableNames.senseRestrictedToKanji,
{'entryId': e.entryId, 'senseId': s.senseId, 'kanji': rk},
);
b.insert(JMdictTableNames.senseRestrictedToKanji, {
'senseId': s.senseId,
'kanjiElementId': e.kanji
.firstWhere((k) => k.reading == rk)
.elementId,
});
}
for (final rr in s.restrictedToReading) {
b.insert(
JMdictTableNames.senseRestrictedToReading,
{'entryId': e.entryId, 'senseId': s.senseId, 'reading': rr},
);
b.insert(JMdictTableNames.senseRestrictedToReading, {
'senseId': s.senseId,
'readingElementId': e.readings
.firstWhere((r) => r.reading == rr)
.elementId,
});
}
for (final ls in s.languageSource) {
b.insert(
@@ -172,6 +188,14 @@ Future<void> seedJMDictData(List<Entry> entries, Database db) async {
JMdictTableNames.senseGlossary,
g.sqlValue..addAll({'senseId': s.senseId}),
);
if (g.type != null) {
b.insert(JMdictTableNames.senseGlossaryType, {
'senseId': s.senseId,
'phrase': g.phrase,
'type': g.type!,
});
}
}
}
}
@@ -179,25 +203,18 @@ Future<void> seedJMDictData(List<Entry> entries, Database db) async {
await b.commit(noResult: true);
print(' [JMdict] Building xref trees');
SplayTreeMap<String, Set<Entry>> entriesByKanji = SplayTreeMap();
final SplayTreeMap<String, Set<Entry>> entriesByKanji = SplayTreeMap();
final SplayTreeMap<String, Set<Entry>> entriesByReading = SplayTreeMap();
for (final entry in entries) {
for (final kanji in entry.kanji) {
if (entriesByKanji.containsKey(kanji.reading)) {
entriesByKanji.update(kanji.reading, (list) => list..add(entry));
} else {
entriesByKanji.putIfAbsent(kanji.reading, () => {entry});
}
entriesByKanji.putIfAbsent(kanji.reading, () => {});
entriesByKanji.update(kanji.reading, (set) => set..add(entry));
}
}
SplayTreeMap<String, Set<Entry>> entriesByReading = SplayTreeMap();
for (final entry in entries) {
for (final reading in entry.readings) {
if (entriesByReading.containsKey(reading.reading)) {
entriesByReading.update(reading.reading, (list) => list..add(entry));
} else {
entriesByReading.putIfAbsent(reading.reading, () => {entry});
}
entriesByReading.putIfAbsent(reading.reading, () => {});
entriesByReading.update(reading.reading, (set) => set..add(entry));
}
}
@@ -206,6 +223,7 @@ Future<void> seedJMDictData(List<Entry> entries, Database db) async {
for (final e in entries) {
for (final s in e.senses) {
final seenSeeAlsoXrefs = <int>{};
for (final xref in s.seeAlso) {
final resolvedEntry = resolveXref(
entriesByKanji,
@@ -213,19 +231,26 @@ Future<void> seedJMDictData(List<Entry> entries, Database db) async {
xref,
);
b.insert(
JMdictTableNames.senseSeeAlso,
{
'senseId': s.senseId,
'xrefEntryId': resolvedEntry.entry.entryId,
'seeAlsoKanji': xref.kanjiRef,
'seeAlsoReading': xref.readingRef,
'seeAlsoSense': xref.senseOrderNum,
'ambiguous': resolvedEntry.ambiguous,
},
);
if (seenSeeAlsoXrefs.contains(resolvedEntry.entry.entryId)) {
print(
'WARNING: Skipping duplicate seeAlso xref from sense ${s.senseId} to entry ${resolvedEntry.entry.entryId}\n'
' (kanjiRef: ${xref.kanjiRef}, readingRef: ${xref.readingRef}, senseOrderNum: ${xref.senseOrderNum})',
);
continue;
}
seenSeeAlsoXrefs.add(resolvedEntry.entry.entryId);
b.insert(JMdictTableNames.senseSeeAlso, {
'senseId': s.senseId,
'xrefEntryId': resolvedEntry.entry.entryId,
'seeAlsoSense': xref.senseOrderNum != null
? xref.senseOrderNum! - 1
: null,
'ambiguous': resolvedEntry.ambiguous,
});
}
final seenAntonymXrefs = <int>{};
for (final ant in s.antonyms) {
final resolvedEntry = resolveXref(
entriesByKanji,
@@ -233,12 +258,21 @@ Future<void> seedJMDictData(List<Entry> entries, Database db) async {
ant,
);
if (seenAntonymXrefs.contains(resolvedEntry.entry.entryId)) {
print(
'WARNING: Skipping duplicate antonym xref from sense ${s.senseId} to entry ${resolvedEntry.entry.entryId}\n'
' (kanjiRef: ${ant.kanjiRef}, readingRef: ${ant.readingRef}, senseOrderNum: ${ant.senseOrderNum})',
);
continue;
}
seenAntonymXrefs.add(resolvedEntry.entry.entryId);
b.insert(JMdictTableNames.senseAntonyms, {
'senseId': s.senseId,
'xrefEntryId': resolvedEntry.entry.entryId,
'antonymKanji': ant.kanjiRef,
'antonymReading': ant.readingRef,
'antonymSense': ant.senseOrderNum,
'antonymSense': ant.senseOrderNum != null
? ant.senseOrderNum! - 1
: null,
'ambiguous': resolvedEntry.ambiguous,
});
}


@@ -8,15 +8,17 @@ List<int?> getPriorityValues(XmlElement e, String prefix) {
int? news, ichi, spec, gai, nf;
for (final pri in e.findElements('${prefix}_pri')) {
final txt = pri.innerText;
if (txt.startsWith('news'))
if (txt.startsWith('news')) {
news = int.parse(txt.substring(4));
else if (txt.startsWith('ichi'))
} else if (txt.startsWith('ichi')) {
ichi = int.parse(txt.substring(4));
else if (txt.startsWith('spec'))
} else if (txt.startsWith('spec')) {
spec = int.parse(txt.substring(4));
else if (txt.startsWith('gai'))
} else if (txt.startsWith('gai')) {
gai = int.parse(txt.substring(3));
else if (txt.startsWith('nf')) nf = int.parse(txt.substring(2));
} else if (txt.startsWith('nf')) {
nf = int.parse(txt.substring(2));
}
}
return [news, ichi, spec, gai, nf];
}
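The startsWith/substring parsing above relies on each `*_pri` tag being a fixed source prefix followed by a rank (`news1`, `ichi2`, `nf17`). The same split can be sketched with a regex; this is an alternative formulation for illustration, not the code used here:

```dart
// Split a JMdict priority tag like 'news1' or 'nf17' into its source
// name and numeric rank; returns null for tags that don't match.
(String, int)? splitPriorityTag(String tag) {
  final m = RegExp(r'^([a-z]+)(\d+)$').firstMatch(tag);
  if (m == null) return null;
  return (m.group(1)!, int.parse(m.group(2)!));
}

void main() {
  print(splitPriorityTag('news1')); // (news, 1)
  print(splitPriorityTag('nf17')); // (nf, 17)
}
```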
@@ -46,10 +48,7 @@ XRefParts parseXrefParts(String s) {
);
}
} else {
result = XRefParts(
kanjiRef: parts[0],
readingRef: parts[1],
);
result = XRefParts(kanjiRef: parts[0], readingRef: parts[1]);
}
break;
@@ -72,8 +71,6 @@ XRefParts parseXrefParts(String s) {
List<Entry> parseJMDictData(XmlElement root) {
final List<Entry> entries = [];
int senseId = 0;
for (final entry in root.childElements) {
final entryId = int.parse(entry.findElements('ent_seq').first.innerText);
@@ -81,58 +78,83 @@ List<Entry> parseJMDictData(XmlElement root) {
final List<ReadingElement> readingEls = [];
final List<Sense> senses = [];
for (final (kanjiNum, k_ele) in entry.findElements('k_ele').indexed) {
final ke_pri = getPriorityValues(k_ele, 'ke');
for (final (orderNum, kEle) in entry.findElements('k_ele').indexed) {
assert(
orderNum < 100,
'Entry $entryId has more than 100 kanji elements, which will break the elementId generation logic.',
);
final elementId = entryId * 100 + orderNum;
final kePri = getPriorityValues(kEle, 'ke');
kanjiEls.add(
KanjiElement(
orderNum: kanjiNum + 1,
info: k_ele
elementId: elementId,
info: kEle
.findElements('ke_inf')
.map((e) => e.innerText.substring(1, e.innerText.length - 1))
.toList(),
reading: k_ele.findElements('keb').first.innerText,
news: ke_pri[0],
ichi: ke_pri[1],
spec: ke_pri[2],
gai: ke_pri[3],
nf: ke_pri[4],
reading: kEle.findElements('keb').first.innerText,
news: kePri[0],
ichi: kePri[1],
spec: kePri[2],
gai: kePri[3],
nf: kePri[4],
),
);
}
for (final (orderNum, r_ele) in entry.findElements('r_ele').indexed) {
final re_pri = getPriorityValues(r_ele, 're');
final readingDoesNotMatchKanji =
r_ele.findElements('re_nokanji').isNotEmpty;
for (final (orderNum, rEle) in entry.findElements('r_ele').indexed) {
assert(
orderNum < 100,
'Entry $entryId has more than 100 readings, which will break the elementId generation logic.',
);
final elementId = 1_000_000_000 + entryId * 100 + orderNum;
final rePri = getPriorityValues(rEle, 're');
final readingDoesNotMatchKanji = rEle
.findElements('re_nokanji')
.isNotEmpty;
readingEls.add(
ReadingElement(
orderNum: orderNum + 1,
elementId: elementId,
readingDoesNotMatchKanji: readingDoesNotMatchKanji,
info: r_ele
info: rEle
.findElements('re_inf')
.map((e) => e.innerText.substring(1, e.innerText.length - 1))
.toList(),
restrictions:
r_ele.findElements('re_restr').map((e) => e.innerText).toList(),
reading: r_ele.findElements('reb').first.innerText,
news: re_pri[0],
ichi: re_pri[1],
spec: re_pri[2],
gai: re_pri[3],
nf: re_pri[4],
restrictions: rEle
.findElements('re_restr')
.map((e) => e.innerText)
.toList(),
reading: rEle.findElements('reb').first.innerText,
news: rePri[0],
ichi: rePri[1],
spec: rePri[2],
gai: rePri[3],
nf: rePri[4],
),
);
}
for (final (orderNum, sense) in entry.findElements('sense').indexed) {
senseId++;
assert(
orderNum < 100,
'Entry $entryId has more than 100 senses, which will break the senseId generation logic.',
);
final senseId = entryId * 100 + orderNum;
final result = Sense(
senseId: senseId,
orderNum: orderNum + 1,
restrictedToKanji:
sense.findElements('stagk').map((e) => e.innerText).toList(),
restrictedToReading:
sense.findElements('stagr').map((e) => e.innerText).toList(),
restrictedToKanji: sense
.findElements('stagk')
.map((e) => e.innerText)
.toList(),
restrictedToReading: sense
.findElements('stagr')
.map((e) => e.innerText)
.toList(),
pos: sense
.findElements('pos')
.map((e) => e.innerText.substring(1, e.innerText.length - 1))

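The parser above packs entry id, element order, and element type into a single elementId: kanji elements use `entryId * 100 + orderNum`, and reading elements add a `1_000_000_000` offset to keep the two ranges disjoint. A minimal sketch of the scheme, assuming at most 100 elements per entry as the asserts enforce (helper names are illustrative):

```dart
// Kanji elements occupy entryId * 100 + orderNum; reading elements live
// 1_000_000_000 above that, so the two ranges never collide for
// JMdict-sized entry ids.
int kanjiElementId(int entryId, int orderNum) {
  assert(orderNum < 100, 'more than 100 kanji elements breaks the encoding');
  return entryId * 100 + orderNum;
}

int readingElementId(int entryId, int orderNum) {
  assert(orderNum < 100, 'more than 100 readings breaks the encoding');
  return 1000000000 + entryId * 100 + orderNum;
}

void main() {
  print(kanjiElementId(1223615, 0)); // 122361500
  print(readingElementId(1223615, 0)); // 1122361500
}
```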
@@ -13,42 +13,33 @@ class CodePoint extends SQLWritable {
@override
Map<String, Object?> get sqlValue => {
'kanji': kanji,
'type': type,
'codepoint': codepoint,
};
'kanji': kanji,
'type': type,
'codepoint': codepoint,
};
}
class Radical extends SQLWritable {
final String kanji;
final int radicalId;
const Radical({
required this.kanji,
required this.radicalId,
});
const Radical({required this.kanji, required this.radicalId});
@override
Map<String, Object?> get sqlValue => {
'kanji': kanji,
'radicalId': radicalId,
};
Map<String, Object?> get sqlValue => {'kanji': kanji, 'radicalId': radicalId};
}
class StrokeMiscount extends SQLWritable {
final String kanji;
final int strokeCount;
const StrokeMiscount({
required this.kanji,
required this.strokeCount,
});
const StrokeMiscount({required this.kanji, required this.strokeCount});
@override
Map<String, Object?> get sqlValue => {
'kanji': kanji,
'strokeCount': strokeCount,
};
'kanji': kanji,
'strokeCount': strokeCount,
};
}
class Variant extends SQLWritable {
@@ -64,10 +55,10 @@ class Variant extends SQLWritable {
@override
Map<String, Object?> get sqlValue => {
'kanji': kanji,
'type': type,
'variant': variant,
};
'kanji': kanji,
'type': type,
'variant': variant,
};
}
class DictionaryReference extends SQLWritable {
@@ -83,10 +74,10 @@ class DictionaryReference extends SQLWritable {
@override
Map<String, Object?> get sqlValue => {
'kanji': kanji,
'type': type,
'ref': ref,
};
'kanji': kanji,
'type': type,
'ref': ref,
};
}
class DictionaryReferenceMoro extends SQLWritable {
@@ -104,11 +95,11 @@ class DictionaryReferenceMoro extends SQLWritable {
@override
Map<String, Object?> get sqlValue => {
'kanji': kanji,
'ref': ref,
'volume': volume,
'page': page,
};
'kanji': kanji,
'ref': ref,
'volume': volume,
'page': page,
};
}
class QueryCode extends SQLWritable {
@@ -126,11 +117,11 @@ class QueryCode extends SQLWritable {
@override
Map<String, Object?> get sqlValue => {
'kanji': kanji,
'code': code,
'type': type,
'skipMisclassification': skipMisclassification,
};
'kanji': kanji,
'code': code,
'type': type,
'skipMisclassification': skipMisclassification,
};
}
class Reading extends SQLWritable {
@@ -146,10 +137,10 @@ class Reading extends SQLWritable {
@override
Map<String, Object?> get sqlValue => {
'kanji': kanji,
'type': type,
'reading': reading,
};
'kanji': kanji,
'type': type,
'reading': reading,
};
}
class Kunyomi extends SQLWritable {
@@ -165,10 +156,10 @@ class Kunyomi extends SQLWritable {
@override
Map<String, Object?> get sqlValue => {
'kanji': kanji,
'yomi': yomi,
'isJouyou': isJouyou,
};
'kanji': kanji,
'yomi': yomi,
'isJouyou': isJouyou,
};
}
class Onyomi extends SQLWritable {
@@ -186,11 +177,11 @@ class Onyomi extends SQLWritable {
@override
Map<String, Object?> get sqlValue => {
'kanji': kanji,
'yomi': yomi,
'isJouyou': isJouyou,
'type': type,
};
'kanji': kanji,
'yomi': yomi,
'isJouyou': isJouyou,
'type': type,
};
}
class Meaning extends SQLWritable {
@@ -206,10 +197,10 @@ class Meaning extends SQLWritable {
@override
Map<String, Object?> get sqlValue => {
'kanji': kanji,
'language': language,
'meaning': meaning,
};
'kanji': kanji,
'language': language,
'meaning': meaning,
};
}
class Character extends SQLWritable {
@@ -254,11 +245,9 @@ class Character extends SQLWritable {
this.nanori = const [],
});
@override
Map<String, Object?> get sqlValue => {
'literal': literal,
'grade': grade,
'strokeCount': strokeCount,
'frequency': frequency,
'jlpt': jlpt,
};
'literal': literal,
'strokeCount': strokeCount,
};
}

@@ -1,3 +1,5 @@
import 'dart:io';
import 'package:jadb/table_names/kanjidic.dart';
import 'package:sqflite_common/sqlite_api.dart';
@@ -5,6 +7,22 @@ import 'objects.dart';
Future<void> seedKANJIDICData(List<Character> characters, Database db) async {
final b = db.batch();
if (Platform.environment['KANJIDIC_VERSION'] != null &&
Platform.environment['KANJIDIC_DATE'] != null &&
Platform.environment['KANJIDIC_HASH'] != null) {
b.insert(KANJIDICTableNames.version, {
'version': Platform.environment['KANJIDIC_VERSION']!,
'date': Platform.environment['KANJIDIC_DATE']!,
'hash': Platform.environment['KANJIDIC_HASH']!,
});
} else {
print(
'WARNING: KANJIDIC version information not found in environment variables. '
'This may cause issues with future updates.',
);
}
for (final c in characters) {
// if (c.dictionaryReferences.any((e) =>
// c.dictionaryReferences
@@ -15,14 +33,29 @@ Future<void> seedKANJIDICData(List<Character> characters, Database db) async {
// }
b.insert(KANJIDICTableNames.character, c.sqlValue);
if (c.grade != null) {
b.insert(KANJIDICTableNames.grade, {
'kanji': c.literal,
'grade': c.grade!,
});
}
if (c.frequency != null) {
b.insert(KANJIDICTableNames.frequency, {
'kanji': c.literal,
'frequency': c.frequency!,
});
}
if (c.jlpt != null) {
b.insert(KANJIDICTableNames.jlpt, {'kanji': c.literal, 'jlpt': c.jlpt!});
}
for (final n in c.radicalName) {
assert(c.radical != null, 'Radical name without radical');
b.insert(
KANJIDICTableNames.radicalName,
{
'radicalId': c.radical!.radicalId,
'name': n,
},
{'radicalId': c.radical!.radicalId, 'name': n},
conflictAlgorithm: ConflictAlgorithm.ignore,
);
}
@@ -34,13 +67,10 @@ Future<void> seedKANJIDICData(List<Character> characters, Database db) async {
b.insert(KANJIDICTableNames.radical, c.radical!.sqlValue);
}
for (final sm in c.strokeMiscounts) {
b.insert(
KANJIDICTableNames.strokeMiscount,
{
'kanji': c.literal,
'strokeCount': sm,
},
);
b.insert(KANJIDICTableNames.strokeMiscount, {
'kanji': c.literal,
'strokeCount': sm,
});
}
for (final v in c.variants) {
b.insert(KANJIDICTableNames.variant, v.sqlValue);
@@ -64,24 +94,24 @@ Future<void> seedKANJIDICData(List<Character> characters, Database db) async {
}
for (final (i, y) in c.kunyomi.indexed) {
b.insert(
KANJIDICTableNames.kunyomi, y.sqlValue..addAll({'orderNum': i + 1}));
KANJIDICTableNames.kunyomi,
y.sqlValue..addAll({'orderNum': i + 1}),
);
}
for (final (i, y) in c.onyomi.indexed) {
b.insert(
KANJIDICTableNames.onyomi, y.sqlValue..addAll({'orderNum': i + 1}));
KANJIDICTableNames.onyomi,
y.sqlValue..addAll({'orderNum': i + 1}),
);
}
for (final (i, m) in c.meanings.indexed) {
b.insert(
KANJIDICTableNames.meaning, m.sqlValue..addAll({'orderNum': i + 1}));
KANJIDICTableNames.meaning,
m.sqlValue..addAll({'orderNum': i + 1}),
);
}
for (final n in c.nanori) {
b.insert(
KANJIDICTableNames.nanori,
{
'kanji': c.literal,
'nanori': n,
},
);
b.insert(KANJIDICTableNames.nanori, {'kanji': c.literal, 'nanori': n});
}
}
await b.commit(noResult: true);

@@ -1,4 +1,5 @@
import 'package:jadb/_data_ingestion/kanjidic/objects.dart';
import 'package:jadb/util/romaji_transliteration.dart';
import 'package:xml/xml.dart';
List<Character> parseKANJIDICData(XmlElement root) {
@@ -9,27 +10,33 @@ List<Character> parseKANJIDICData(XmlElement root) {
final codepoint = c.findElements('codepoint').firstOrNull;
final radical = c.findElements('radical').firstOrNull;
final misc = c.findElements('misc').first;
final dic_number = c.findElements('dic_number').firstOrNull;
final query_code = c.findElements('query_code').first;
final reading_meaning = c.findElements('reading_meaning').firstOrNull;
final dicNumber = c.findElements('dic_number').firstOrNull;
final queryCode = c.findElements('query_code').first;
final readingMeaning = c.findElements('reading_meaning').firstOrNull;
// TODO: Group readings and meanings by their rmgroup parent node.
result.add(
Character(
literal: kanji,
strokeCount:
int.parse(misc.findElements('stroke_count').first.innerText),
strokeCount: int.parse(
misc.findElements('stroke_count').first.innerText,
),
grade: int.tryParse(
misc.findElements('grade').firstOrNull?.innerText ?? ''),
misc.findElements('grade').firstOrNull?.innerText ?? '',
),
frequency: int.tryParse(
misc.findElements('freq').firstOrNull?.innerText ?? ''),
misc.findElements('freq').firstOrNull?.innerText ?? '',
),
jlpt: int.tryParse(
misc.findElements('jlpt').firstOrNull?.innerText ?? '',
),
radicalName:
misc.findElements('rad_name').map((e) => e.innerText).toList(),
codepoints: codepoint
radicalName: misc
.findElements('rad_name')
.map((e) => e.innerText)
.toList(),
codepoints:
codepoint
?.findElements('cp_value')
.map(
(e) => CodePoint(
@@ -44,10 +51,7 @@ List<Character> parseKANJIDICData(XmlElement root) {
?.findElements('rad_value')
.where((e) => e.getAttribute('rad_type') == 'classical')
.map(
(e) => Radical(
kanji: kanji,
radicalId: int.parse(e.innerText),
),
(e) => Radical(kanji: kanji, radicalId: int.parse(e.innerText)),
)
.firstOrNull,
strokeMiscounts: misc
@@ -65,7 +69,8 @@ List<Character> parseKANJIDICData(XmlElement root) {
),
)
.toList(),
dictionaryReferences: dic_number
dictionaryReferences:
dicNumber
?.findElements('dic_ref')
.where((e) => e.getAttribute('dr_type') != 'moro')
.map(
@@ -77,7 +82,8 @@ List<Character> parseKANJIDICData(XmlElement root) {
)
.toList() ??
[],
dictionaryReferencesMoro: dic_number
dictionaryReferencesMoro:
dicNumber
?.findElements('dic_ref')
.where((e) => e.getAttribute('dr_type') == 'moro')
.map(
@@ -90,7 +96,7 @@ List<Character> parseKANJIDICData(XmlElement root) {
)
.toList() ??
[],
querycodes: query_code
querycodes: queryCode
.findElements('q_code')
.map(
(e) => QueryCode(
@@ -101,7 +107,8 @@ List<Character> parseKANJIDICData(XmlElement root) {
),
)
.toList(),
readings: reading_meaning
readings:
readingMeaning
?.findAllElements('reading')
.where(
(e) =>
@@ -116,7 +123,8 @@ List<Character> parseKANJIDICData(XmlElement root) {
)
.toList() ??
[],
kunyomi: reading_meaning
kunyomi:
readingMeaning
?.findAllElements('reading')
.where((e) => e.getAttribute('r_type') == 'ja_kun')
.map(
@@ -128,19 +136,22 @@ List<Character> parseKANJIDICData(XmlElement root) {
)
.toList() ??
[],
onyomi: reading_meaning
onyomi:
readingMeaning
?.findAllElements('reading')
.where((e) => e.getAttribute('r_type') == 'ja_on')
.map(
(e) => Onyomi(
kanji: kanji,
yomi: e.innerText,
isJouyou: e.getAttribute('r_status') == 'jy',
type: e.getAttribute('on_type')),
kanji: kanji,
yomi: transliterateKatakanaToHiragana(e.innerText),
isJouyou: e.getAttribute('r_status') == 'jy',
type: e.getAttribute('on_type'),
),
)
.toList() ??
[],
meanings: reading_meaning
meanings:
readingMeaning
?.findAllElements('meaning')
.map(
(e) => Meaning(
@@ -151,7 +162,8 @@ List<Character> parseKANJIDICData(XmlElement root) {
)
.toList() ??
[],
nanori: reading_meaning
nanori:
readingMeaning
?.findElements('nanori')
.map((e) => e.innerText)
.toList() ??

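The onyomi hunk above now normalizes readings through `transliterateKatakanaToHiragana`. Its implementation isn't shown in this diff; the conventional approach is a fixed Unicode offset between the two kana blocks, roughly:

```dart
// Katakana U+30A1..U+30F6 sits exactly 0x60 code points above the
// matching hiragana block, so shifting each rune in that range converts
// it. A sketch only: the real helper may handle more cases (long-vowel
// marks, half-width kana, etc.).
String katakanaToHiragana(String input) => String.fromCharCodes(
      input.runes.map((r) => (r >= 0x30A1 && r <= 0x30F6) ? r - 0x60 : r),
    );

void main() {
  print(katakanaToHiragana('コウ')); // こう
}
```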
@@ -1,9 +1,7 @@
import 'dart:ffi';
import 'dart:io';
import 'package:jadb/search.dart';
import 'package:sqflite_common_ffi/sqflite_ffi.dart';
import 'package:sqlite3/open.dart';
Future<Database> openLocalDb({
String? libsqlitePath,
@@ -12,38 +10,23 @@ Future<Database> openLocalDb({
bool verifyTablesExist = true,
bool walMode = false,
}) async {
libsqlitePath ??= Platform.environment['LIBSQLITE_PATH'];
jadbPath ??= Platform.environment['JADB_PATH'];
jadbPath ??= Directory.current.uri.resolve('jadb.sqlite').path;
libsqlitePath = (libsqlitePath == null)
? null
: File(libsqlitePath).resolveSymbolicLinksSync();
jadbPath = File(jadbPath).resolveSymbolicLinksSync();
if (libsqlitePath == null) {
throw Exception("LIBSQLITE_PATH is not set");
}
if (!File(libsqlitePath).existsSync()) {
throw Exception("LIBSQLITE_PATH does not exist: $libsqlitePath");
}
if (!File(jadbPath).existsSync()) {
throw Exception("JADB_PATH does not exist: $jadbPath");
throw Exception('JADB_PATH does not exist: $jadbPath');
}
final db = await createDatabaseFactoryFfi(
ffiInit: () =>
open.overrideForAll(() => DynamicLibrary.open(libsqlitePath!)),
).openDatabase(
final db = await createDatabaseFactoryFfi().openDatabase(
jadbPath,
options: OpenDatabaseOptions(
onConfigure: (db) async {
if (walMode) {
await db.execute("PRAGMA journal_mode = WAL");
await db.execute('PRAGMA journal_mode = WAL');
}
await db.execute("PRAGMA foreign_keys = ON");
await db.execute('PRAGMA foreign_keys = ON');
},
readOnly: !readWrite,
),

@@ -1,10 +1,12 @@
import 'dart:io';
Iterable<String> parseRADKFILEBlocks(File radkfile) {
final String content = File('data/tmp/radkfile_utf8').readAsStringSync();
final String content = radkfile.readAsStringSync();
final Iterable<String> blocks =
content.replaceAll(RegExp(r'^#.*$'), '').split(r'$').skip(2);
final Iterable<String> blocks = content
.replaceAll(RegExp(r'^#.*$'), '')
.split(r'$')
.skip(2);
return blocks;
}

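`parseRADKFILEBlocks` above splits the file on `$` and skips the header blocks; each remaining block starts with a line naming the radical, followed by the kanji that contain it. A sketch of that layout on illustrative data (not real RADKFILE content; the real parser skips two leading header blocks rather than one):

```dart
// Each '$'-delimited block: a header line ' <radical> <strokes>', then
// a run of kanji characters. block[1] is the radical itself.
Map<String, String> parseBlocks(String content) {
  final result = <String, String>{};
  for (final block in content.split(r'$').skip(1)) {
    final radical = block[1];
    final kanji = block.replaceFirst(RegExp(r'.*\n'), '').split('')
      ..removeWhere((e) => e == '' || e == '\n');
    result[radical] = kanji.join();
  }
  return result;
}

void main() {
  print(parseBlocks('\$ 一 1\n二三\n\$ 丨 1\n中\n')); // {一: 二三, 丨: 中}
}
```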
@@ -1,27 +1,37 @@
import 'dart:io';
import 'package:jadb/table_names/radkfile.dart';
import 'package:sqflite_common/sqlite_api.dart';
Future<void> seedRADKFILEData(
Iterable<String> blocks,
Database db,
) async {
Future<void> seedRADKFILEData(Iterable<String> blocks, Database db) async {
final b = db.batch();
if (Platform.environment['RADKFILE_VERSION'] != null &&
Platform.environment['RADKFILE_DATE'] != null &&
Platform.environment['RADKFILE_HASH'] != null) {
b.insert(RADKFILETableNames.version, {
'version': Platform.environment['RADKFILE_VERSION']!,
'date': Platform.environment['RADKFILE_DATE']!,
'hash': Platform.environment['RADKFILE_HASH']!,
});
} else {
print(
'WARNING: RADKFILE version information not found in environment variables. '
'This may cause issues with future updates.',
);
}
for (final block in blocks) {
final String radical = block[1];
final List<String> kanjiList = block
.replaceFirst(RegExp(r'.*\n'), '')
.split('')
..removeWhere((e) => e == '' || e == '\n');
final List<String> kanjiList =
block.replaceFirst(RegExp(r'.*\n'), '').split('')
..removeWhere((e) => e == '' || e == '\n');
for (final kanji in kanjiList.toSet()) {
b.insert(
RADKFILETableNames.radkfile,
{
'radical': radical,
'kanji': kanji,
},
);
b.insert(RADKFILETableNames.radkfile, {
'radical': radical,
'kanji': kanji,
});
}
}

@@ -24,10 +24,10 @@ Future<void> seedData(Database db) async {
Future<void> parseAndSeedDataFromJMdict(Database db) async {
print('[JMdict] Reading file content...');
String rawXML = File('data/tmp/JMdict.xml').readAsStringSync();
final String rawXML = File('data/JMdict.xml').readAsStringSync();
print('[JMdict] Parsing XML tags...');
XmlElement root = XmlDocument.parse(rawXML).getElement('JMdict')!;
final XmlElement root = XmlDocument.parse(rawXML).getElement('JMdict')!;
print('[JMdict] Parsing XML content...');
final entries = parseJMDictData(root);
@@ -38,10 +38,10 @@ Future<void> parseAndSeedDataFromJMdict(Database db) async {
Future<void> parseAndSeedDataFromKANJIDIC(Database db) async {
print('[KANJIDIC2] Reading file...');
String rawXML = File('data/tmp/kanjidic2.xml').readAsStringSync();
final String rawXML = File('data/kanjidic2.xml').readAsStringSync();
print('[KANJIDIC2] Parsing XML...');
XmlElement root = XmlDocument.parse(rawXML).getElement('kanjidic2')!;
final XmlElement root = XmlDocument.parse(rawXML).getElement('kanjidic2')!;
print('[KANJIDIC2] Parsing XML content...');
final entries = parseKANJIDICData(root);
@@ -52,7 +52,7 @@ Future<void> parseAndSeedDataFromKANJIDIC(Database db) async {
Future<void> parseAndSeedDataFromRADKFILE(Database db) async {
print('[RADKFILE] Reading file...');
File raw = File('data/tmp/RADKFILE');
final File raw = File('data/RADKFILE');
print('[RADKFILE] Parsing content...');
final blocks = parseRADKFILEBlocks(raw);
@@ -63,7 +63,7 @@ Future<void> parseAndSeedDataFromRADKFILE(Database db) async {
Future<void> parseAndSeedDataFromTanosJLPT(Database db) async {
print('[TANOS-JLPT] Reading files...');
Map<String, File> files = {
final Map<String, File> files = {
'N1': File('data/tanos-jlpt/n1.csv'),
'N2': File('data/tanos-jlpt/n2.csv'),
'N3': File('data/tanos-jlpt/n3.csv'),

@@ -9,46 +9,57 @@ Future<List<JLPTRankedWord>> parseJLPTRankedWords(
) async {
final List<JLPTRankedWord> result = [];
final codec = Csv(
fieldDelimiter: ',',
lineDelimiter: '\n',
quoteMode: QuoteMode.strings,
escapeCharacter: '\\',
parseHeaders: false,
);
for (final entry in files.entries) {
final jlptLevel = entry.key;
final file = entry.value;
if (!file.existsSync()) {
throw Exception("File $jlptLevel does not exist");
throw Exception('File $jlptLevel does not exist');
}
final rows = await file
final words = await file
.openRead()
.transform(utf8.decoder)
.transform(CsvToListConverter())
.transform(codec.decoder)
.map((row) {
if (row.length != 3) {
throw Exception('Invalid line in $jlptLevel: $row');
}
return row;
})
.map((row) => row.map((e) => e as String).toList())
.map((row) {
final kanji = row[0].isEmpty
? null
: row[0]
.replaceFirst(RegExp('^お・'), '')
.replaceAll(RegExp(r'.*'), '');
final readings = row[1]
.split(RegExp('[・/、(:?\s+)]'))
.map((e) => e.trim())
.toList();
final meanings = row[2].split(',').expand(cleanMeaning).toList();
return JLPTRankedWord(
readings: readings,
kanji: kanji,
jlptLevel: jlptLevel,
meanings: meanings,
);
})
.toList();
for (final row in rows) {
if (row.length != 3) {
throw Exception("Invalid line in $jlptLevel: $row");
}
final kanji = (row[0] as String).isEmpty
? null
: (row[0] as String)
.replaceFirst(RegExp('^お・'), '')
.replaceAll(RegExp(r'.*'), '');
final readings = (row[1] as String)
.split(RegExp('[・/、(:?\s+)]'))
.map((e) => e.trim())
.toList();
final meanings =
(row[2] as String).split(',').expand(cleanMeaning).toList();
result.add(JLPTRankedWord(
readings: readings,
kanji: kanji,
jlptLevel: jlptLevel,
meanings: meanings,
));
}
result.addAll(words);
}
return result;

@@ -13,5 +13,5 @@ class JLPTRankedWord {
@override
String toString() =>
'(${jlptLevel},${kanji},"${readings.join(",")}","${meanings.join(",")})';
'($jlptLevel,$kanji,"${readings.join(",")}","${meanings.join(",")})';
}

@@ -1,4 +1,4 @@
const Map<(String?, String), int?> TANOS_JLPT_OVERRIDES = {
const Map<(String?, String), int?> tanosJLPTOverrides = {
// N5:
(null, 'あなた'): 1223615,
(null, 'あの'): 1000430,

@@ -1,49 +1,39 @@
import 'package:jadb/table_names/jmdict.dart';
import 'package:jadb/_data_ingestion/tanos-jlpt/objects.dart';
import 'package:jadb/_data_ingestion/tanos-jlpt/overrides.dart';
import 'package:jadb/table_names/jmdict.dart';
import 'package:sqflite_common/sqlite_api.dart';
Future<List<int>> _findReadingCandidates(
JLPTRankedWord word,
Database db,
) =>
db
.query(
JMdictTableNames.readingElement,
columns: ['entryId'],
where:
'"reading" IN (${List.filled(word.readings.length, '?').join(',')})',
whereArgs: [...word.readings],
)
.then((rows) => rows.map((row) => row['entryId'] as int).toList());
Future<List<int>> _findReadingCandidates(JLPTRankedWord word, Database db) => db
.query(
JMdictTableNames.readingElement,
columns: ['entryId'],
where:
'"reading" IN (${List.filled(word.readings.length, '?').join(',')})',
whereArgs: [...word.readings],
)
.then((rows) => rows.map((row) => row['entryId'] as int).toList());
Future<List<int>> _findKanjiCandidates(
JLPTRankedWord word,
Database db,
) =>
db
.query(
JMdictTableNames.kanjiElement,
columns: ['entryId'],
where: 'reading = ?',
whereArgs: [word.kanji],
)
.then((rows) => rows.map((row) => row['entryId'] as int).toList());
Future<List<int>> _findKanjiCandidates(JLPTRankedWord word, Database db) => db
.query(
JMdictTableNames.kanjiElement,
columns: ['entryId'],
where: 'reading = ?',
whereArgs: [word.kanji],
)
.then((rows) => rows.map((row) => row['entryId'] as int).toList());
Future<List<(int, String)>> _findSenseCandidates(
JLPTRankedWord word,
Database db,
) =>
db.rawQuery(
) => db
.rawQuery(
'SELECT entryId, phrase '
'FROM "${JMdictTableNames.senseGlossary}" '
'JOIN "${JMdictTableNames.sense}" USING (senseId)'
'WHERE phrase IN (${List.filled(
word.meanings.length,
'?',
).join(',')})',
'WHERE phrase IN (${List.filled(word.meanings.length, '?').join(',')})',
[...word.meanings],
).then(
)
.then(
(rows) => rows
.map((row) => (row['entryId'] as int, row['phrase'] as String))
.toList(),
@@ -55,8 +45,10 @@ Future<int?> findEntry(
bool useOverrides = true,
}) async {
final List<int> readingCandidates = await _findReadingCandidates(word, db);
final List<(int, String)> senseCandidates =
await _findSenseCandidates(word, db);
final List<(int, String)> senseCandidates = await _findSenseCandidates(
word,
db,
);
List<int> entryIds;
@@ -71,8 +63,10 @@ Future<int?> findEntry(
print('No entry found, trying to combine with senses');
entryIds = readingCandidates
.where((readingId) =>
senseCandidates.any((sense) => sense.$1 == readingId))
.where(
(readingId) =>
senseCandidates.any((sense) => sense.$1 == readingId),
)
.toList();
}
} else {
@@ -82,18 +76,21 @@ Future<int?> findEntry(
if ((entryIds.isEmpty || entryIds.length > 1) && useOverrides) {
print('No entry found, trying to fetch from overrides');
final overrideEntries = word.readings
.map((reading) => TANOS_JLPT_OVERRIDES[(word.kanji, reading)])
.map((reading) => tanosJLPTOverrides[(word.kanji, reading)])
.whereType<int>()
.toSet();
if (overrideEntries.length > 1) {
throw Exception(
'Multiple override entries found for ${word.toString()}: $entryIds');
} else if (overrideEntries.length == 0 &&
!word.readings.any((reading) =>
TANOS_JLPT_OVERRIDES.containsKey((word.kanji, reading)))) {
'Multiple override entries found for ${word.toString()}: $entryIds',
);
} else if (overrideEntries.isEmpty &&
!word.readings.any(
(reading) => tanosJLPTOverrides.containsKey((word.kanji, reading)),
)) {
throw Exception(
'No override entry found for ${word.toString()}: $entryIds');
'No override entry found for ${word.toString()}: $entryIds',
);
}
print('Found override: ${overrideEntries.firstOrNull}');
@@ -103,7 +100,8 @@ Future<int?> findEntry(
if (entryIds.length > 1) {
throw Exception(
'Multiple override entries found for ${word.toString()}: $entryIds');
'Multiple override entries found for ${word.toString()}: $entryIds',
);
} else if (entryIds.isEmpty) {
throw Exception('No entry found for ${word.toString()}');
}

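`_findReadingCandidates` and `_findSenseCandidates` above build their `IN` clauses by generating one `?` placeholder per bound value, keeping the readings/meanings lists parameterized instead of interpolated. The pattern in isolation (`inPlaceholders` is an illustrative name):

```dart
// One '?' per bound value, joined for use inside 'IN (...)'; the actual
// values travel separately via whereArgs, so nothing is string-escaped
// by hand.
String inPlaceholders(int n) => List.filled(n, '?').join(',');

void main() {
  final readings = ['はな', 'ばな'];
  print('"reading" IN (${inPlaceholders(readings.length)})');
  // "reading" IN (?,?)
}
```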
@@ -1,3 +1,5 @@
import 'dart:io';
import 'package:jadb/table_names/tanos_jlpt.dart';
import 'package:sqflite_common/sqlite_api.dart';
@@ -5,20 +7,32 @@ Future<void> seedTanosJLPTData(
Map<String, Set<int>> resolvedEntries,
Database db,
) async {
Batch b = db.batch();
final Batch b = db.batch();
if (Platform.environment['TANOS_JLPT_VERSION'] != null &&
Platform.environment['TANOS_JLPT_DATE'] != null &&
Platform.environment['TANOS_JLPT_HASH'] != null) {
b.insert(TanosJLPTTableNames.version, {
'version': Platform.environment['TANOS_JLPT_VERSION']!,
'date': Platform.environment['TANOS_JLPT_DATE']!,
'hash': Platform.environment['TANOS_JLPT_HASH']!,
});
} else {
print(
'WARNING: Tanos JLPT version information not found in environment variables. '
'This may cause issues with future updates.',
);
}
for (final jlptLevel in resolvedEntries.entries) {
final level = jlptLevel.key;
final entryIds = jlptLevel.value;
for (final entryId in entryIds) {
b.insert(
TanosJLPTTableNames.jlptTag,
{
'entryId': entryId,
'jlptLevel': level,
},
);
b.insert(TanosJLPTTableNames.jlptTag, {
'entryId': entryId,
'jlptLevel': level,
});
}
}

@@ -1,14 +1,15 @@
import 'dart:io';
import 'package:args/command_runner.dart';
import 'package:jadb/_data_ingestion/open_local_db.dart';
import 'package:jadb/_data_ingestion/seed_database.dart';
import 'package:args/command_runner.dart';
import 'package:jadb/cli/args.dart';
class CreateDb extends Command {
final name = "create-db";
final description = "Create the database";
@override
final name = 'create-db';
@override
final description = 'Create the database';
CreateDb() {
addLibsqliteArg(argParser);
@@ -23,6 +24,7 @@ class CreateDb extends Command {
);
}
@override
Future<void> run() async {
if (argResults!.option('libsqlite') == null) {
print(argParser.usage);
@@ -35,12 +37,22 @@ class CreateDb extends Command {
readWrite: true,
);
await seedData(db).then((_) {
print("Database created successfully");
}).catchError((error) {
print("Error creating database: $error");
}).whenComplete(() {
db.close();
});
bool failed = false;
await seedData(db)
.then((_) {
print('Database created successfully');
})
.catchError((error) {
print('Error creating database: $error');
failed = true;
})
.whenComplete(() {
db.close();
});
if (failed) {
exit(1);
} else {
exit(0);
}
}
}

@@ -1,8 +1,7 @@
import 'dart:io';
import 'package:jadb/_data_ingestion/open_local_db.dart';
import 'package:args/command_runner.dart';
import 'package:jadb/_data_ingestion/open_local_db.dart';
import 'package:jadb/_data_ingestion/tanos-jlpt/csv_parser.dart';
import 'package:jadb/_data_ingestion/tanos-jlpt/objects.dart';
import 'package:jadb/_data_ingestion/tanos-jlpt/resolve.dart';
@@ -10,9 +9,11 @@ import 'package:jadb/cli/args.dart';
import 'package:sqflite_common/sqlite_api.dart';
class CreateTanosJlptMappings extends Command {
final name = "create-tanos-jlpt-mappings";
@override
final name = 'create-tanos-jlpt-mappings';
@override
final description =
"Resolve Tanos JLPT data against JMDict. This tool is useful to create overrides for ambiguous references";
'Resolve Tanos JLPT data against JMDict. This tool is useful to create overrides for ambiguous references';
CreateTanosJlptMappings() {
addLibsqliteArg(argParser);
@@ -26,6 +27,7 @@ class CreateTanosJlptMappings extends Command {
);
}
@override
Future<void> run() async {
if (argResults!.option('libsqlite') == null ||
argResults!.option('jadb') == null) {
@@ -40,7 +42,7 @@ class CreateTanosJlptMappings extends Command {
final useOverrides = argResults!.flag('overrides');
Map<String, File> files = {
final Map<String, File> files = {
'N1': File('data/tanos-jlpt/n1.csv'),
'N2': File('data/tanos-jlpt/n2.csv'),
'N3': File('data/tanos-jlpt/n3.csv'),
@@ -59,11 +61,12 @@ Future<void> resolveExisting(
Database db,
bool useOverrides,
) async {
List<JLPTRankedWord> missingWords = [];
final List<JLPTRankedWord> missingWords = [];
for (final (i, word) in rankedWords.indexed) {
try {
print(
'[${(i + 1).toString().padLeft(4, '0')}/${rankedWords.length}] ${word.toString()}');
'[${(i + 1).toString().padLeft(4, '0')}/${rankedWords.length}] ${word.toString()}',
);
await findEntry(word, db, useOverrides: useOverrides);
} catch (e) {
print(e);
@@ -78,16 +81,19 @@ Future<void> resolveExisting(
print('Statistics:');
for (final jlptLevel in ['N5', 'N4', 'N3', 'N2', 'N1']) {
final missingWordCount =
missingWords.where((e) => e.jlptLevel == jlptLevel).length;
final totalWordCount =
rankedWords.where((e) => e.jlptLevel == jlptLevel).length;
final missingWordCount = missingWords
.where((e) => e.jlptLevel == jlptLevel)
.length;
final totalWordCount = rankedWords
.where((e) => e.jlptLevel == jlptLevel)
.length;
final failureRate =
((missingWordCount / totalWordCount) * 100).toStringAsFixed(2);
final failureRate = ((missingWordCount / totalWordCount) * 100)
.toStringAsFixed(2);
print(
'${jlptLevel} failures: [${missingWordCount}/${totalWordCount}] (${failureRate}%)');
'$jlptLevel failures: [$missingWordCount/$totalWordCount] ($failureRate%)',
);
}
print('Not able to determine the entry for ${missingWords.length} words');

@@ -1,14 +1,15 @@
// import 'dart:io';
import 'package:args/command_runner.dart';
// import 'package:jadb/_data_ingestion/open_local_db.dart';
import 'package:jadb/cli/args.dart';
import 'package:args/command_runner.dart';
import 'package:jadb/util/lemmatizer/lemmatizer.dart';
class Lemmatize extends Command {
final name = "lemmatize";
final description = "Lemmatize a word using the Jadb lemmatizer";
@override
final name = 'lemmatize';
@override
final description = 'Lemmatize a word using the Jadb lemmatizer';
Lemmatize() {
addLibsqliteArg(argParser);
@@ -21,6 +22,7 @@ class Lemmatize extends Command {
);
}
@override
Future<void> run() async {
// if (argResults!.option('libsqlite') == null ||
// argResults!.option('jadb') == null) {
@@ -41,6 +43,6 @@ class Lemmatize extends Command {
print(result.toString());
print("Lemmatization took ${time.elapsedMilliseconds}ms");
print('Lemmatization took ${time.elapsedMilliseconds}ms');
}
}

@@ -1,27 +1,25 @@
import 'dart:convert';
import 'dart:io';
import 'package:args/command_runner.dart';
import 'package:jadb/_data_ingestion/open_local_db.dart';
import 'package:jadb/cli/args.dart';
import 'package:jadb/search.dart';
import 'package:args/command_runner.dart';
class QueryKanji extends Command {
final name = "query-kanji";
final description = "Query the database for kanji data";
@override
final name = 'query-kanji';
@override
final description = 'Query the database for kanji data';
@override
final invocation = 'jadb query-kanji [options] <kanji>';
QueryKanji() {
addLibsqliteArg(argParser);
addJadbArg(argParser);
argParser.addOption(
'kanji',
abbr: 'k',
help: 'The kanji to search for.',
valueHelp: 'KANJI',
);
}
@override
Future<void> run() async {
if (argResults!.option('libsqlite') == null ||
argResults!.option('jadb') == null) {
@@ -34,18 +32,25 @@ class QueryKanji extends Command {
libsqlitePath: argResults!.option('libsqlite')!,
);
if (argResults!.rest.length != 1) {
print('You need to provide exactly one kanji character to search for.');
print('');
printUsage();
exit(64);
}
final String kanji = argResults!.rest.first.trim();
final time = Stopwatch()..start();
final result = await JaDBConnection(db).jadbSearchKanji(
argResults!.option('kanji') ?? '',
);
final result = await JaDBConnection(db).jadbSearchKanji(kanji);
time.stop();
if (result == null) {
print("No such kanji");
print('No such kanji');
} else {
print(JsonEncoder.withIndent(' ').convert(result.toJson()));
}
print("Query took ${time.elapsedMilliseconds}ms");
print('Query took ${time.elapsedMilliseconds}ms');
}
}


@@ -1,30 +1,38 @@
import 'dart:convert';
import 'dart:io';
import 'package:args/command_runner.dart';
import 'package:jadb/_data_ingestion/open_local_db.dart';
import 'package:jadb/cli/args.dart';
import 'package:jadb/search.dart';
import 'package:args/command_runner.dart';
import 'package:sqflite_common/sqflite.dart';
class QueryWord extends Command {
final name = "query-word";
final description = "Query the database for word data";
@override
final name = 'query-word';
@override
final description = 'Query the database for word data';
@override
final invocation = 'jadb query-word [options] (<word> | <ID>)';
QueryWord() {
addLibsqliteArg(argParser);
addJadbArg(argParser);
argParser.addOption(
'word',
abbr: 'w',
help: 'The word to search for.',
valueHelp: 'WORD',
);
argParser.addFlag('json', abbr: 'j', help: 'Output results in JSON format');
argParser.addOption('page', abbr: 'p', valueHelp: 'NUM', defaultsTo: '0');
argParser.addOption('pageSize', valueHelp: 'NUM', defaultsTo: '30');
}
@override
Future<void> run() async {
if (argResults!.option('libsqlite') == null ||
argResults!.option('jadb') == null) {
print(argParser.usage);
print('You need to provide both libsqlite and jadb paths.');
print('');
printUsage();
exit(64);
}
@@ -33,29 +41,81 @@ class QueryWord extends Command {
libsqlitePath: argResults!.option('libsqlite')!,
);
final String searchWord = argResults!.option('word') ?? 'かな';
if (argResults!.rest.isEmpty) {
print('You need to provide a word or ID to search for.');
print('');
printUsage();
exit(64);
}
final String searchWord = argResults!.rest.join(' ');
final int? maybeId = int.tryParse(searchWord);
if (maybeId != null && maybeId >= 1000000) {
await _searchId(db, maybeId, argResults!.flag('json'));
} else {
await _searchWord(
db,
searchWord,
argResults!.flag('json'),
int.parse(argResults!.option('page')!),
int.parse(argResults!.option('pageSize')!),
);
}
}
Future<void> _searchId(DatabaseExecutor db, int id, bool jsonOutput) async {
final time = Stopwatch()..start();
final result = await JaDBConnection(db).jadbGetWordById(id);
time.stop();
if (result == null) {
print('Invalid ID');
} else {
if (jsonOutput) {
print(JsonEncoder.withIndent(' ').convert(result));
} else {
print(result.toString());
}
}
print('Query took ${time.elapsedMilliseconds}ms');
}
Future<void> _searchWord(
DatabaseExecutor db,
String searchWord,
bool jsonOutput,
int page,
int pageSize,
) async {
final time = Stopwatch()..start();
final count = await JaDBConnection(db).jadbSearchWordCount(searchWord);
time.stop();
final time2 = Stopwatch()..start();
final result = await JaDBConnection(db).jadbSearchWord(searchWord);
final result = await JaDBConnection(
db,
).jadbSearchWord(searchWord, page: page, pageSize: pageSize);
time2.stop();
if (result == null) {
print("Invalid search");
print('Invalid search');
} else if (result.isEmpty) {
print("No matches");
print('No matches');
} else {
for (final e in result) {
print(e.toString());
print("");
if (jsonOutput) {
print(JsonEncoder.withIndent(' ').convert(result));
} else {
for (final e in result) {
print(e.toString());
print('');
}
}
}
print("Total count: ${count}");
print("Count query took ${time.elapsedMilliseconds}ms");
print("Query took ${time2.elapsedMilliseconds}ms");
print('Total count: $count');
print('Count query took ${time.elapsedMilliseconds}ms');
print('Query took ${time2.elapsedMilliseconds}ms');
}
}


@@ -1,6 +1,5 @@
/// Jouyou kanji sorted primarily by grades and secondarily by strokes.
const Map<int, Map<int, List<String>>> JOUYOU_KANJI_BY_GRADE_AND_STROKE_COUNT =
{
const Map<int, Map<int, List<String>>> jouyouKanjiByGradeAndStrokeCount = {
1: {
1: [''],
2: ['', '', '', '', '', '', '', ''],
@@ -12,7 +11,7 @@ const Map<int, Map<int, List<String>>> JOUYOU_KANJI_BY_GRADE_AND_STROKE_COUNT =
8: ['', '', '', '', '', ''],
9: ['', ''],
10: [''],
12: ['']
12: [''],
},
2: {
2: [''],
@@ -35,7 +34,7 @@ const Map<int, Map<int, List<String>>> JOUYOU_KANJI_BY_GRADE_AND_STROKE_COUNT =
'',
'',
'',
''
'',
],
5: ['', '', '', '', '', '', '', '', '', '', '', ''],
6: [
@@ -58,7 +57,7 @@ const Map<int, Map<int, List<String>>> JOUYOU_KANJI_BY_GRADE_AND_STROKE_COUNT =
'',
'',
'',
''
'',
],
7: [
'',
@@ -78,7 +77,7 @@ const Map<int, Map<int, List<String>>> JOUYOU_KANJI_BY_GRADE_AND_STROKE_COUNT =
'',
'',
'',
''
'',
],
8: [
'',
@@ -95,7 +94,7 @@ const Map<int, Map<int, List<String>>> JOUYOU_KANJI_BY_GRADE_AND_STROKE_COUNT =
'',
'',
'',
''
'',
],
9: [
'',
@@ -115,7 +114,7 @@ const Map<int, Map<int, List<String>>> JOUYOU_KANJI_BY_GRADE_AND_STROKE_COUNT =
'',
'',
'',
''
'',
],
10: ['', '', '', '', '', '', '', '', '', '', '', ''],
11: ['', '', '', '', '', '', '', '', '', '', '', '', ''],
@@ -124,7 +123,7 @@ const Map<int, Map<int, List<String>>> JOUYOU_KANJI_BY_GRADE_AND_STROKE_COUNT =
14: ['', '', '', '', '', ''],
15: [''],
16: ['', ''],
18: ['', '']
18: ['', ''],
},
3: {
2: [''],
@@ -146,7 +145,7 @@ const Map<int, Map<int, List<String>>> JOUYOU_KANJI_BY_GRADE_AND_STROKE_COUNT =
'',
'',
'',
''
'',
],
6: ['', '', '', '', '', '', '', '', '', '', '', '', '', ''],
7: ['', '', '', '', '', '', '', '', '', '', '', '', '', ''],
@@ -178,7 +177,7 @@ const Map<int, Map<int, List<String>>> JOUYOU_KANJI_BY_GRADE_AND_STROKE_COUNT =
'',
'',
'',
''
'',
],
9: [
'',
@@ -210,7 +209,7 @@ const Map<int, Map<int, List<String>>> JOUYOU_KANJI_BY_GRADE_AND_STROKE_COUNT =
'',
'',
'',
''
'',
],
10: [
'',
@@ -232,7 +231,7 @@ const Map<int, Map<int, List<String>>> JOUYOU_KANJI_BY_GRADE_AND_STROKE_COUNT =
'',
'',
'',
''
'',
],
11: [
'',
@@ -253,7 +252,7 @@ const Map<int, Map<int, List<String>>> JOUYOU_KANJI_BY_GRADE_AND_STROKE_COUNT =
'',
'',
'',
''
'',
],
12: [
'',
@@ -282,13 +281,13 @@ const Map<int, Map<int, List<String>>> JOUYOU_KANJI_BY_GRADE_AND_STROKE_COUNT =
'',
'',
'',
''
'',
],
13: ['', '', '', '', '', '', '', '', '', '', ''],
14: ['', '', '', '', '', ''],
15: ['', '調', '', ''],
16: ['', '', '', ''],
18: ['']
18: [''],
},
4: {
4: ['', '', '', '', ''],
@@ -318,7 +317,7 @@ const Map<int, Map<int, List<String>>> JOUYOU_KANJI_BY_GRADE_AND_STROKE_COUNT =
'',
'',
'',
''
'',
],
8: [
'',
@@ -346,7 +345,7 @@ const Map<int, Map<int, List<String>>> JOUYOU_KANJI_BY_GRADE_AND_STROKE_COUNT =
'',
'',
'',
''
'',
],
9: [
'',
@@ -367,7 +366,7 @@ const Map<int, Map<int, List<String>>> JOUYOU_KANJI_BY_GRADE_AND_STROKE_COUNT =
'',
'',
'',
''
'',
],
10: [
'',
@@ -389,7 +388,7 @@ const Map<int, Map<int, List<String>>> JOUYOU_KANJI_BY_GRADE_AND_STROKE_COUNT =
'',
'',
'',
''
'',
],
11: [
'',
@@ -410,7 +409,7 @@ const Map<int, Map<int, List<String>>> JOUYOU_KANJI_BY_GRADE_AND_STROKE_COUNT =
'',
'',
'',
''
'',
],
12: [
'',
@@ -434,7 +433,7 @@ const Map<int, Map<int, List<String>>> JOUYOU_KANJI_BY_GRADE_AND_STROKE_COUNT =
'',
'',
'',
''
'',
],
13: ['', '', '', '', '', '', '', '', '', '', ''],
14: ['', '', '', '', '', '', '', '', '', ''],
@@ -442,7 +441,7 @@ const Map<int, Map<int, List<String>>> JOUYOU_KANJI_BY_GRADE_AND_STROKE_COUNT =
16: ['', '', ''],
18: ['', '', ''],
19: ['', ''],
20: ['', '']
20: ['', ''],
},
5: {
3: ['', ''],
@@ -464,7 +463,7 @@ const Map<int, Map<int, List<String>>> JOUYOU_KANJI_BY_GRADE_AND_STROKE_COUNT =
'',
'',
'',
''
'',
],
8: [
'',
@@ -484,7 +483,7 @@ const Map<int, Map<int, List<String>>> JOUYOU_KANJI_BY_GRADE_AND_STROKE_COUNT =
'',
'',
'',
''
'',
],
9: ['', '', '', '', '', '', '', '', '', '', '', '', ''],
10: [
@@ -505,7 +504,7 @@ const Map<int, Map<int, List<String>>> JOUYOU_KANJI_BY_GRADE_AND_STROKE_COUNT =
'',
'',
'',
''
'',
],
11: [
'',
@@ -537,7 +536,7 @@ const Map<int, Map<int, List<String>>> JOUYOU_KANJI_BY_GRADE_AND_STROKE_COUNT =
'',
'',
'',
''
'',
],
12: [
'貿',
@@ -561,7 +560,7 @@ const Map<int, Map<int, List<String>>> JOUYOU_KANJI_BY_GRADE_AND_STROKE_COUNT =
'',
'',
'',
''
'',
],
13: ['', '', '', '', '', '', '', '', '', '', '', '', '', ''],
14: [
@@ -583,14 +582,14 @@ const Map<int, Map<int, List<String>>> JOUYOU_KANJI_BY_GRADE_AND_STROKE_COUNT =
'',
'',
'',
''
'',
],
15: ['', '', '', '', '', '', '', ''],
16: ['', '', '', '', ''],
17: ['', '', ''],
18: ['', '', ''],
19: [''],
20: ['']
20: [''],
},
6: {
3: ['', '', '', ''],
@@ -618,7 +617,7 @@ const Map<int, Map<int, List<String>>> JOUYOU_KANJI_BY_GRADE_AND_STROKE_COUNT =
'',
'',
'沿',
''
'',
],
9: [
'',
@@ -641,7 +640,7 @@ const Map<int, Map<int, List<String>>> JOUYOU_KANJI_BY_GRADE_AND_STROKE_COUNT =
'',
'',
'',
''
'',
],
10: [
'',
@@ -667,7 +666,7 @@ const Map<int, Map<int, List<String>>> JOUYOU_KANJI_BY_GRADE_AND_STROKE_COUNT =
'',
'',
'',
''
'',
],
11: [
'',
@@ -689,7 +688,7 @@ const Map<int, Map<int, List<String>>> JOUYOU_KANJI_BY_GRADE_AND_STROKE_COUNT =
'',
'',
'',
''
'',
],
12: [
'',
@@ -710,7 +709,7 @@ const Map<int, Map<int, List<String>>> JOUYOU_KANJI_BY_GRADE_AND_STROKE_COUNT =
'',
'',
'',
''
'',
],
13: [
'',
@@ -727,14 +726,14 @@ const Map<int, Map<int, List<String>>> JOUYOU_KANJI_BY_GRADE_AND_STROKE_COUNT =
'',
'',
'',
''
'',
],
14: ['', '', '', '', '', '', '', '', '', '', '', ''],
15: ['', '', '', '', '', '', '', '', '', ''],
16: ['', '', '', '', '', '', '', ''],
17: ['', '', '', ''],
18: ['', '', ''],
19: ['', '']
19: ['', ''],
},
7: {
1: [''],
@@ -760,7 +759,7 @@ const Map<int, Map<int, List<String>>> JOUYOU_KANJI_BY_GRADE_AND_STROKE_COUNT =
'',
'',
'',
''
'',
],
5: [
'',
@@ -790,7 +789,7 @@ const Map<int, Map<int, List<String>>> JOUYOU_KANJI_BY_GRADE_AND_STROKE_COUNT =
'',
'',
'',
''
'',
],
6: [
'',
@@ -831,7 +830,7 @@ const Map<int, Map<int, List<String>>> JOUYOU_KANJI_BY_GRADE_AND_STROKE_COUNT =
'',
'',
'',
''
'',
],
7: [
'',
@@ -896,7 +895,7 @@ const Map<int, Map<int, List<String>>> JOUYOU_KANJI_BY_GRADE_AND_STROKE_COUNT =
'',
'',
'',
''
'',
],
8: [
'',
@@ -989,7 +988,7 @@ const Map<int, Map<int, List<String>>> JOUYOU_KANJI_BY_GRADE_AND_STROKE_COUNT =
'',
'',
'',
''
'',
],
9: [
'',
@@ -1081,7 +1080,7 @@ const Map<int, Map<int, List<String>>> JOUYOU_KANJI_BY_GRADE_AND_STROKE_COUNT =
'',
'',
'',
''
'',
],
10: [
'',
@@ -1206,7 +1205,7 @@ const Map<int, Map<int, List<String>>> JOUYOU_KANJI_BY_GRADE_AND_STROKE_COUNT =
'',
'',
'',
''
'',
],
11: [
'',
@@ -1323,7 +1322,7 @@ const Map<int, Map<int, List<String>>> JOUYOU_KANJI_BY_GRADE_AND_STROKE_COUNT =
'',
'',
'',
''
'',
],
12: [
'',
@@ -1435,7 +1434,7 @@ const Map<int, Map<int, List<String>>> JOUYOU_KANJI_BY_GRADE_AND_STROKE_COUNT =
'',
'',
'',
''
'',
],
13: [
'',
@@ -1552,7 +1551,7 @@ const Map<int, Map<int, List<String>>> JOUYOU_KANJI_BY_GRADE_AND_STROKE_COUNT =
'',
'',
'',
''
'',
],
14: [
'',
@@ -1617,7 +1616,7 @@ const Map<int, Map<int, List<String>>> JOUYOU_KANJI_BY_GRADE_AND_STROKE_COUNT =
'',
'',
'',
''
'',
],
15: [
'',
@@ -1706,7 +1705,7 @@ const Map<int, Map<int, List<String>>> JOUYOU_KANJI_BY_GRADE_AND_STROKE_COUNT =
'',
'',
'',
''
'',
],
16: [
'',
@@ -1764,7 +1763,7 @@ const Map<int, Map<int, List<String>>> JOUYOU_KANJI_BY_GRADE_AND_STROKE_COUNT =
'',
'',
'',
''
'',
],
17: [
'',
@@ -1801,7 +1800,7 @@ const Map<int, Map<int, List<String>>> JOUYOU_KANJI_BY_GRADE_AND_STROKE_COUNT =
'',
'',
'',
''
'',
],
18: [
'',
@@ -1830,7 +1829,7 @@ const Map<int, Map<int, List<String>>> JOUYOU_KANJI_BY_GRADE_AND_STROKE_COUNT =
'',
'',
'',
''
'',
],
19: [
'',
@@ -1851,22 +1850,23 @@ const Map<int, Map<int, List<String>>> JOUYOU_KANJI_BY_GRADE_AND_STROKE_COUNT =
'',
'',
'',
''
'',
],
20: ['', '', '', '', '', '', '', ''],
21: ['', '', '', '', '', ''],
22: ['', '', ''],
23: [''],
29: ['']
29: [''],
},
};
final Map<int, List<String>> JOUYOU_KANJI_BY_GRADES =
JOUYOU_KANJI_BY_GRADE_AND_STROKE_COUNT.entries
final Map<int, List<String>> jouyouKanjiByGrades =
jouyouKanjiByGradeAndStrokeCount.map(
// Flatten the strokeCount level so each grade maps directly to its
// kanji list. The previous fold grouped by the inner (stroke count)
// key, so the result was keyed by strokes despite the name.
(grade, byStrokeCount) =>
MapEntry(grade, byStrokeCount.values.expand((kanji) => kanji).toList()),
);
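A minimal standalone sketch of the grade flattening above, with hypothetical sample data in place of the real kanji map: collapse the strokeCount level so each grade maps directly to its full kanji list, preserving stroke-count order within a grade.

```dart
// Hypothetical stand-in for jouyouKanjiByGradeAndStrokeCount:
// grade -> strokeCount -> kanji.
Map<int, List<String>> flattenByGrade(
  Map<int, Map<int, List<String>>> byGradeAndStroke,
) {
  // Map literals iterate in insertion order, so stroke counts stay sorted
  // within each grade after concatenation.
  return byGradeAndStroke.map(
    (grade, byStroke) =>
        MapEntry(grade, byStroke.values.expand((kanji) => kanji).toList()),
  );
}

void main() {
  const sample = <int, Map<int, List<String>>>{
    1: {
      2: ['a', 'b'],
      3: ['c'],
    },
    2: {
      2: ['d'],
    },
  };
  print(flattenByGrade(sample)); // {1: [a, b, c], 2: [d]}
}
```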


@@ -1,4 +1,4 @@
const Map<int, List<String>> RADICALS = {
const Map<int, List<String>> radicals = {
1: ['', '', '', '', '', ''],
2: [
'',
@@ -31,7 +31,7 @@ const Map<int, List<String>> RADICALS = {
'',
'',
'',
'𠂉'
'𠂉',
],
3: [
'',
@@ -78,7 +78,7 @@ const Map<int, List<String>> RADICALS = {
'',
'',
'',
''
'',
],
4: [
'',
@@ -124,7 +124,7 @@ const Map<int, List<String>> RADICALS = {
'',
'',
'',
''
'',
],
5: [
'',
@@ -154,7 +154,7 @@ const Map<int, List<String>> RADICALS = {
'',
'',
'',
''
'',
],
6: [
'',
@@ -181,7 +181,7 @@ const Map<int, List<String>> RADICALS = {
'',
'',
'',
'西'
'西',
],
7: [
'',
@@ -204,7 +204,7 @@ const Map<int, List<String>> RADICALS = {
'',
'',
'',
''
'',
],
8: ['', '', '', '', '', '', '', '', '', '', '', ''],
9: ['', '', '', '', '', '', '', '', '', '', ''],


@@ -43,6 +43,7 @@ enum JlptLevel implements Comparable<JlptLevel> {
int? get asInt =>
this == JlptLevel.none ? null : JlptLevel.values.indexOf(this);
@override
String toString() => toNullableString() ?? 'N/A';
Object? toJson() => toNullableString();


@@ -11,7 +11,7 @@ String migrationDirPath() {
}
Future<void> createEmptyDb(DatabaseExecutor db) async {
List<String> migrationFiles = [];
final List<String> migrationFiles = [];
for (final file in Directory(migrationDirPath()).listSync()) {
if (file is File && file.path.endsWith('.sql')) {
migrationFiles.add(file.path);


@@ -19,20 +19,14 @@ enum JMdictDialect {
final String id;
final String description;
const JMdictDialect({
required this.id,
required this.description,
});
const JMdictDialect({required this.id, required this.description});
static JMdictDialect fromId(String id) => JMdictDialect.values.firstWhere(
(e) => e.id == id,
orElse: () => throw Exception('Unknown id: $id'),
);
(e) => e.id == id,
orElse: () => throw Exception('Unknown id: $id'),
);
Map<String, Object?> toJson() => {
'id': id,
'description': description,
};
Map<String, Object?> toJson() => {'id': id, 'description': description};
static JMdictDialect fromJson(Map<String, Object?> json) =>
JMdictDialect.values.firstWhere(


@@ -102,20 +102,14 @@ enum JMdictField {
final String id;
final String description;
const JMdictField({
required this.id,
required this.description,
});
const JMdictField({required this.id, required this.description});
static JMdictField fromId(String id) => JMdictField.values.firstWhere(
(e) => e.id == id,
orElse: () => throw Exception('Unknown id: $id'),
);
(e) => e.id == id,
orElse: () => throw Exception('Unknown id: $id'),
);
Map<String, Object?> toJson() => {
'id': id,
'description': description,
};
Map<String, Object?> toJson() => {'id': id, 'description': description};
static JMdictField fromJson(Map<String, Object?> json) =>
JMdictField.values.firstWhere(


@@ -13,20 +13,14 @@ enum JMdictKanjiInfo {
final String id;
final String description;
const JMdictKanjiInfo({
required this.id,
required this.description,
});
const JMdictKanjiInfo({required this.id, required this.description});
static JMdictKanjiInfo fromId(String id) => JMdictKanjiInfo.values.firstWhere(
(e) => e.id == id,
orElse: () => throw Exception('Unknown id: $id'),
);
(e) => e.id == id,
orElse: () => throw Exception('Unknown id: $id'),
);
Map<String, Object?> toJson() => {
'id': id,
'description': description,
};
Map<String, Object?> toJson() => {'id': id, 'description': description};
static JMdictKanjiInfo fromJson(Map<String, Object?> json) =>
JMdictKanjiInfo.values.firstWhere(


@@ -74,20 +74,14 @@ enum JMdictMisc {
final String id;
final String description;
const JMdictMisc({
required this.id,
required this.description,
});
const JMdictMisc({required this.id, required this.description});
static JMdictMisc fromId(String id) => JMdictMisc.values.firstWhere(
(e) => e.id == id,
orElse: () => throw Exception('Unknown id: $id'),
);
(e) => e.id == id,
orElse: () => throw Exception('Unknown id: $id'),
);
Map<String, Object?> toJson() => {
'id': id,
'description': description,
};
Map<String, Object?> toJson() => {'id': id, 'description': description};
static JMdictMisc fromJson(Map<String, Object?> json) =>
JMdictMisc.values.firstWhere(


@@ -202,14 +202,11 @@ enum JMdictPOS {
String get shortDescription => _shortDescription ?? description;
static JMdictPOS fromId(String id) => JMdictPOS.values.firstWhere(
(e) => e.id == id,
orElse: () => throw Exception('Unknown id: $id'),
);
(e) => e.id == id,
orElse: () => throw Exception('Unknown id: $id'),
);
Map<String, Object?> toJson() => {
'id': id,
'description': description,
};
Map<String, Object?> toJson() => {'id': id, 'description': description};
static JMdictPOS fromJson(Map<String, Object?> json) =>
JMdictPOS.values.firstWhere(


@@ -15,10 +15,7 @@ enum JMdictReadingInfo {
final String id;
final String description;
const JMdictReadingInfo({
required this.id,
required this.description,
});
const JMdictReadingInfo({required this.id, required this.description});
static JMdictReadingInfo fromId(String id) =>
JMdictReadingInfo.values.firstWhere(
@@ -26,10 +23,7 @@ enum JMdictReadingInfo {
orElse: () => throw Exception('Unknown id: $id'),
);
Map<String, Object?> toJson() => {
'id': id,
'description': description,
};
Map<String, Object?> toJson() => {'id': id, 'description': description};
static JMdictReadingInfo fromJson(Map<String, Object?> json) =>
JMdictReadingInfo.values.firstWhere(


@@ -26,19 +26,14 @@ class KanjiSearchRadical extends Equatable {
});
@override
List<Object> get props => [
symbol,
this.names,
forms,
meanings,
];
List<Object> get props => [symbol, names, forms, meanings];
Map<String, dynamic> toJson() => {
'symbol': symbol,
'names': names,
'forms': forms,
'meanings': meanings,
};
'symbol': symbol,
'names': names,
'forms': forms,
'meanings': meanings,
};
factory KanjiSearchRadical.fromJson(Map<String, dynamic> json) {
return KanjiSearchRadical(


@@ -89,46 +89,46 @@ class KanjiSearchResult extends Equatable {
@override
// ignore: public_member_api_docs
List<Object?> get props => [
taughtIn,
jlptLevel,
newspaperFrequencyRank,
strokeCount,
meanings,
kunyomi,
onyomi,
// kunyomiExamples,
// onyomiExamples,
radical,
parts,
codepoints,
kanji,
nanori,
alternativeLanguageReadings,
strokeMiscounts,
queryCodes,
dictionaryReferences,
];
taughtIn,
jlptLevel,
newspaperFrequencyRank,
strokeCount,
meanings,
kunyomi,
onyomi,
// kunyomiExamples,
// onyomiExamples,
radical,
parts,
codepoints,
kanji,
nanori,
alternativeLanguageReadings,
strokeMiscounts,
queryCodes,
dictionaryReferences,
];
Map<String, dynamic> toJson() => {
'kanji': kanji,
'taughtIn': taughtIn,
'jlptLevel': jlptLevel,
'newspaperFrequencyRank': newspaperFrequencyRank,
'strokeCount': strokeCount,
'meanings': meanings,
'kunyomi': kunyomi,
'onyomi': onyomi,
// 'onyomiExamples': onyomiExamples,
// 'kunyomiExamples': kunyomiExamples,
'radical': radical?.toJson(),
'parts': parts,
'codepoints': codepoints,
'nanori': nanori,
'alternativeLanguageReadings': alternativeLanguageReadings,
'strokeMiscounts': strokeMiscounts,
'queryCodes': queryCodes,
'dictionaryReferences': dictionaryReferences,
};
'kanji': kanji,
'taughtIn': taughtIn,
'jlptLevel': jlptLevel,
'newspaperFrequencyRank': newspaperFrequencyRank,
'strokeCount': strokeCount,
'meanings': meanings,
'kunyomi': kunyomi,
'onyomi': onyomi,
// 'onyomiExamples': onyomiExamples,
// 'kunyomiExamples': kunyomiExamples,
'radical': radical?.toJson(),
'parts': parts,
'codepoints': codepoints,
'nanori': nanori,
'alternativeLanguageReadings': alternativeLanguageReadings,
'strokeMiscounts': strokeMiscounts,
'queryCodes': queryCodes,
'dictionaryReferences': dictionaryReferences,
};
factory KanjiSearchResult.fromJson(Map<String, dynamic> json) {
return KanjiSearchResult(
@@ -156,23 +156,20 @@ class KanjiSearchResult extends Equatable {
nanori: (json['nanori'] as List).map((e) => e as String).toList(),
alternativeLanguageReadings:
(json['alternativeLanguageReadings'] as Map<String, dynamic>).map(
(key, value) => MapEntry(
key,
(value as List).map((e) => e as String).toList(),
),
),
strokeMiscounts:
(json['strokeMiscounts'] as List).map((e) => e as int).toList(),
(key, value) =>
MapEntry(key, (value as List).map((e) => e as String).toList()),
),
strokeMiscounts: (json['strokeMiscounts'] as List)
.map((e) => e as int)
.toList(),
queryCodes: (json['queryCodes'] as Map<String, dynamic>).map(
(key, value) => MapEntry(
key,
(value as List).map((e) => e as String).toList(),
),
(key, value) =>
MapEntry(key, (value as List).map((e) => e as String).toList()),
),
dictionaryReferences:
(json['dictionaryReferences'] as Map<String, dynamic>).map(
(key, value) => MapEntry(key, value as String),
),
(key, value) => MapEntry(key, value as String),
),
);
}
}


@@ -7,14 +7,14 @@ import 'package:sqflite_common/sqlite_api.dart';
Future<void> verifyTablesWithDbConnection(DatabaseExecutor db) async {
final Set<String> tables = await db
.query(
'sqlite_master',
columns: ['name'],
where: 'type = ?',
whereArgs: ['table'],
)
'sqlite_master',
columns: ['name'],
where: 'type = ?',
whereArgs: ['table'],
)
.then((result) {
return result.map((row) => row['name'] as String).toSet();
});
return result.map((row) => row['name'] as String).toSet();
});
final Set<String> expectedTables = {
...JMdictTableNames.allTables,
@@ -26,14 +26,16 @@ Future<void> verifyTablesWithDbConnection(DatabaseExecutor db) async {
final missingTables = expectedTables.difference(tables);
if (missingTables.isNotEmpty) {
throw Exception([
'Missing tables:',
missingTables.map((table) => ' - $table').join('\n'),
'',
'Found tables:\n',
tables.map((table) => ' - $table').join('\n'),
'',
'Please ensure the database is correctly set up.',
].join('\n'));
throw Exception(
[
'Missing tables:',
missingTables.map((table) => ' - $table').join('\n'),
'',
'Found tables:\n',
tables.map((table) => ' - $table').join('\n'),
'',
'Please ensure the database is correctly set up.',
].join('\n'),
);
}
}


@@ -0,0 +1,62 @@
enum WordSearchMatchSpanType { kanji, kana, sense }
/// A span of a word search result that corresponds to a match for a kanji, kana, or sense.
class WordSearchMatchSpan {
/// Which subtype of the word search result this span corresponds to - either a kanji, a kana, or a sense.
final WordSearchMatchSpanType spanType;
/// The index of the kanji/kana/sense in the word search result that this span corresponds to.
final int index;
/// When matching a 'sense', this is the index of the English definition in that sense that this span corresponds to. Otherwise, this is always 0.
final int subIndex;
/// The start of the span (inclusive)
final int start;
/// The end of the span (exclusive), matching [Match.end].
final int end;
WordSearchMatchSpan({
required this.spanType,
required this.index,
required this.start,
required this.end,
this.subIndex = 0,
});
@override
String toString() {
return 'WordSearchMatchSpan(spanType: $spanType, index: $index, subIndex: $subIndex, start: $start, end: $end)';
}
Map<String, Object?> toJson() => {
'spanType': spanType.name,
'index': index,
'subIndex': subIndex,
'start': start,
'end': end,
};
factory WordSearchMatchSpan.fromJson(Map<String, dynamic> json) =>
WordSearchMatchSpan(
spanType: WordSearchMatchSpanType.values.firstWhere(
(e) => e.name == json['spanType'],
),
index: json['index'] as int,
subIndex: json['subIndex'] as int? ?? 0,
start: json['start'] as int,
end: json['end'] as int,
);
@override
int get hashCode => Object.hash(spanType, index, subIndex, start, end);
@override
bool operator ==(Object other) {
if (identical(this, other)) return true;
return other is WordSearchMatchSpan &&
other.spanType == spanType &&
other.index == index &&
other.subIndex == subIndex &&
other.start == start &&
other.end == end;
}
}
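A minimal standalone illustration of the serialization pattern WordSearchMatchSpan uses: store the enum's `name` in JSON and look it up again on the way back in. The `Span` class below is a reduced hypothetical stand-in, not the real class.

```dart
enum SpanType { kanji, kana, sense }

class Span {
  final SpanType spanType;
  final int start;
  final int end;
  Span({required this.spanType, required this.start, required this.end});

  // Serialize the enum by its name rather than its index, so reordering
  // the enum later does not corrupt stored data.
  Map<String, Object?> toJson() =>
      {'spanType': spanType.name, 'start': start, 'end': end};

  factory Span.fromJson(Map<String, dynamic> json) => Span(
        spanType:
            SpanType.values.firstWhere((e) => e.name == json['spanType']),
        start: json['start'] as int,
        end: json['end'] as int,
      );
}

void main() {
  final span = Span(spanType: SpanType.sense, start: 0, end: 3);
  final restored = Span.fromJson(span.toJson());
  print(restored.spanType.name); // sense
}
```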


@@ -1,9 +1,13 @@
import 'package:jadb/models/common/jlpt_level.dart';
import 'package:jadb/models/jmdict/jmdict_kanji_info.dart';
import 'package:jadb/models/jmdict/jmdict_misc.dart';
import 'package:jadb/models/jmdict/jmdict_reading_info.dart';
import 'package:jadb/models/word_search/word_search_match_span.dart';
import 'package:jadb/models/word_search/word_search_ruby.dart';
import 'package:jadb/models/word_search/word_search_sense.dart';
import 'package:jadb/models/word_search/word_search_sources.dart';
import 'package:jadb/search/word_search/word_search.dart';
import 'package:jadb/util/romaji_transliteration.dart';
/// A class representing a single dictionary entry from a word search.
class WordSearchResult {
@@ -34,7 +38,51 @@ class WordSearchResult {
/// A class listing the sources used to make up the data for this word search result.
final WordSearchSources sources;
const WordSearchResult({
/// A list of spans, specifying which part of this word result matched the search keyword.
///
/// Note that this is considered ephemeral data - it does not originate from the dictionary,
/// and unlike the rest of the class it varies based on external information (the searchword).
/// It will *NOT* be exported to JSON, but can be reinferred by invoking [inferMatchSpans] with
/// the original searchword.
List<WordSearchMatchSpan>? matchSpans;
/// Whether the first item in [japanese] contains kanji that likely is rare.
bool get hasUnusualKanji =>
(japanese.first.furigana != null &&
kanjiInfo[japanese.first.base] == JMdictKanjiInfo.rK) ||
senses.where((sense) => sense.misc.contains(JMdictMisc.onlyKana)).length >
(senses.length / 2);
/// All contents of [japanese], transliterated to romaji
List<String> get romaji => japanese
.map((word) => transliterateKanaToLatin(word.furigana ?? word.base))
.toList();
/// All contents of [japanese], where the furigana has either been transliterated to romaji, or
/// contains the furigana transliteration of [WordSearchRuby.base].
List<WordSearchRuby> get romajiRubys => japanese
.map(
(word) => WordSearchRuby(
base: word.base,
furigana: word.furigana != null
? transliterateKanaToLatin(word.furigana!)
: transliterateKanaToLatin(word.base),
),
)
.toList();
/// The same list of spans as [matchSpans], but the positions have been adjusted for romaji conversion
///
/// This is mostly useful in conjunction with [romajiRubys].
List<WordSearchMatchSpan>? get romajiMatchSpans {
if (matchSpans == null) {
return null;
}
throw UnimplementedError('Not yet implemented');
}
WordSearchResult({
required this.score,
required this.entryId,
required this.isCommon,
@@ -44,21 +92,22 @@ class WordSearchResult {
required this.senses,
required this.jlptLevel,
required this.sources,
this.matchSpans,
});
Map<String, dynamic> toJson() => {
'_score': score,
'entryId': entryId,
'isCommon': isCommon,
'japanese': japanese.map((e) => e.toJson()).toList(),
'kanjiInfo':
kanjiInfo.map((key, value) => MapEntry(key, value.toJson())),
'readingInfo':
readingInfo.map((key, value) => MapEntry(key, value.toJson())),
'senses': senses.map((e) => e.toJson()).toList(),
'jlptLevel': jlptLevel.toJson(),
'sources': sources.toJson(),
};
'_score': score,
'entryId': entryId,
'isCommon': isCommon,
'japanese': japanese.map((e) => e.toJson()).toList(),
'kanjiInfo': kanjiInfo.map((key, value) => MapEntry(key, value.toJson())),
'readingInfo': readingInfo.map(
(key, value) => MapEntry(key, value.toJson()),
),
'senses': senses.map((e) => e.toJson()).toList(),
'jlptLevel': jlptLevel.toJson(),
'sources': sources.toJson(),
};
factory WordSearchResult.fromJson(Map<String, dynamic> json) =>
WordSearchResult(
@@ -81,17 +130,88 @@ class WordSearchResult {
sources: WordSearchSources.fromJson(json['sources']),
);
String _formatJapaneseWord(WordSearchRuby word) =>
word.furigana == null ? word.base : "${word.base} (${word.furigana})";
factory WordSearchResult.empty() => WordSearchResult(
score: 0,
entryId: 0,
isCommon: false,
japanese: [],
kanjiInfo: {},
readingInfo: {},
senses: [],
jlptLevel: JlptLevel.none,
sources: WordSearchSources.empty(),
);
/// Infers which part(s) of this word search result matched the search keyword, and populates [matchSpans] accordingly.
void inferMatchSpans(
String searchword, {
SearchMode searchMode = SearchMode.auto,
}) {
// TODO: handle wildcards like '?' and '*' when that becomes supported in the search.
// TODO: If the searchMode is provided, we can use that to narrow down which part of the word search results to look at.
final regex = RegExp(RegExp.escape(searchword));
final matchSpans = <WordSearchMatchSpan>[];
for (final (i, word) in japanese.indexed) {
final baseMatches = regex.allMatches(word.base);
matchSpans.addAll(
baseMatches.map(
(match) => WordSearchMatchSpan(
spanType: WordSearchMatchSpanType.kanji,
index: i,
start: match.start,
end: match.end,
),
),
);
if (word.furigana != null) {
final furiganaMatches = regex.allMatches(word.furigana!);
matchSpans.addAll(
furiganaMatches.map(
(match) => WordSearchMatchSpan(
spanType: WordSearchMatchSpanType.kana,
index: i,
start: match.start,
end: match.end,
),
),
);
}
}
for (final (i, sense) in senses.indexed) {
for (final (k, definition) in sense.englishDefinitions.indexed) {
final definitionMatches = regex.allMatches(definition);
matchSpans.addAll(
definitionMatches.map(
(match) => WordSearchMatchSpan(
spanType: WordSearchMatchSpanType.sense,
index: i,
subIndex: k,
start: match.start,
end: match.end,
),
),
);
}
}
this.matchSpans = matchSpans;
}
static String _formatJapaneseWord(WordSearchRuby word) =>
word.furigana == null ? word.base : '${word.base} (${word.furigana})';
@override
String toString() {
final japaneseWord = _formatJapaneseWord(japanese[0]);
final isCommonString = isCommon ? '(C)' : '';
final jlptLevelString = "(${jlptLevel.toString()})";
final jlptLevelString = '($jlptLevel)';
return '''
${score} | [$entryId] $japaneseWord $isCommonString $jlptLevelString
$score | [$entryId] $japaneseWord $isCommonString $jlptLevelString
Other forms: ${japanese.skip(1).map(_formatJapaneseWord).join(', ')}
Senses: ${senses.map((s) => s.englishDefinitions).join(', ')}
'''
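The matching strategy inferMatchSpans uses above can be sketched in isolation: escape the search word so it is treated literally, then collect every (start, end) pair from `RegExp.allMatches`. The sample strings below are hypothetical; `end` is exclusive, as with `Match.end`.

```dart
// Collect the literal, non-overlapping occurrences of searchword in text
// as (start, end) records, mirroring how match spans are inferred.
List<(int, int)> matchSpans(String text, String searchword) {
  final regex = RegExp(RegExp.escape(searchword));
  return regex.allMatches(text).map((m) => (m.start, m.end)).toList();
}

void main() {
  // 'to read' appears twice in the sample definition text.
  print(matchSpans('to read; to read aloud', 'to read')); // [(0, 7), (9, 16)]
}
```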


@@ -6,18 +6,12 @@ class WordSearchRuby {
/// Furigana, if applicable.
String? furigana;
WordSearchRuby({
required this.base,
this.furigana,
});
WordSearchRuby({required this.base, this.furigana});
Map<String, dynamic> toJson() => {
'base': base,
'furigana': furigana,
};
Map<String, dynamic> toJson() => {'base': base, 'furigana': furigana};
factory WordSearchRuby.fromJson(Map<String, dynamic> json) => WordSearchRuby(
base: json['base'] as String,
furigana: json['furigana'] as String?,
);
base: json['base'] as String,
furigana: json['furigana'] as String?,
);
}


@@ -71,18 +71,18 @@ class WordSearchSense {
languageSource.isEmpty;
Map<String, dynamic> toJson() => {
'englishDefinitions': englishDefinitions,
'partsOfSpeech': partsOfSpeech.map((e) => e.toJson()).toList(),
'seeAlso': seeAlso.map((e) => e.toJson()).toList(),
'antonyms': antonyms.map((e) => e.toJson()).toList(),
'restrictedToReading': restrictedToReading,
'restrictedToKanji': restrictedToKanji,
'fields': fields.map((e) => e.toJson()).toList(),
'dialects': dialects.map((e) => e.toJson()).toList(),
'misc': misc.map((e) => e.toJson()).toList(),
'info': info,
'languageSource': languageSource,
};
factory WordSearchSense.fromJson(Map<String, dynamic> json) =>
WordSearchSense(
@@ -104,8 +104,9 @@ class WordSearchSense {
dialects: (json['dialects'] as List)
.map((e) => JMdictDialect.fromJson(e))
.toList(),
misc: (json['misc'] as List)
.map((e) => JMdictMisc.fromJson(e))
.toList(),
info: List<String>.from(json['info']),
languageSource: (json['languageSource'] as List)
.map((e) => WordSearchSenseLanguageSource.fromJson(e))


@@ -13,11 +13,11 @@ class WordSearchSenseLanguageSource {
});
Map<String, Object?> toJson() => {
'language': language,
'phrase': phrase,
'fullyDescribesSense': fullyDescribesSense,
'constructedFromSmallerWords': constructedFromSmallerWords,
};
factory WordSearchSenseLanguageSource.fromJson(Map<String, dynamic> json) =>
WordSearchSenseLanguageSource(


@@ -7,20 +7,13 @@ class WordSearchSources {
/// Whether JMnedict was used.
final bool jmnedict;
const WordSearchSources({this.jmdict = true, this.jmnedict = false});
Map<String, Object?> get sqlValue => {'jmdict': jmdict, 'jmnedict': jmnedict};
factory WordSearchSources.empty() => const WordSearchSources();
Map<String, dynamic> toJson() => {'jmdict': jmdict, 'jmnedict': jmnedict};
factory WordSearchSources.fromJson(Map<String, dynamic> json) =>
WordSearchSources(


@@ -1,3 +1,5 @@
import 'package:jadb/models/word_search/word_search_result.dart';
/// A cross-reference entry from one word-result to another entry.
class WordSearchXrefEntry {
/// The ID of the entry that this entry cross-references to.
@@ -13,19 +15,24 @@ class WordSearchXrefEntry {
/// database (and hence might be incorrect).
final bool ambiguous;
/// The result of the cross-reference, may or may not be included in the query.
final WordSearchResult? xrefResult;
const WordSearchXrefEntry({
required this.entryId,
required this.ambiguous,
required this.baseWord,
required this.furigana,
required this.xrefResult,
});
Map<String, dynamic> toJson() => {
'entryId': entryId,
'ambiguous': ambiguous,
'baseWord': baseWord,
'furigana': furigana,
'xrefResult': xrefResult?.toJson(),
};
factory WordSearchXrefEntry.fromJson(Map<String, dynamic> json) =>
WordSearchXrefEntry(
@@ -33,5 +40,6 @@ class WordSearchXrefEntry {
ambiguous: json['ambiguous'] as bool,
baseWord: json['baseWord'] as String,
furigana: json['furigana'] as String?,
xrefResult: null,
);
}


@@ -1,12 +1,10 @@
import 'package:jadb/models/kanji_search/kanji_search_result.dart';
import 'package:jadb/models/verify_tables.dart';
import 'package:jadb/models/word_search/word_search_result.dart';
import 'package:jadb/search/filter_kanji.dart';
import 'package:jadb/search/kanji_search.dart';
import 'package:jadb/search/radical_search.dart';
import 'package:jadb/search/word_search/word_search.dart';
import 'package:sqflite_common/sqlite_api.dart';
extension JaDBConnection on DatabaseExecutor {
@@ -19,38 +17,45 @@ extension JaDBConnection on DatabaseExecutor {
Future<KanjiSearchResult?> jadbSearchKanji(String kanji) =>
searchKanjiWithDbConnection(this, kanji);
/// Search for multiple kanji in the database at once.
Future<Map<String, KanjiSearchResult>> jadbGetManyKanji(Set<String> kanji) =>
searchManyKanjiWithDbConnection(this, kanji);
/// Filter a list of characters, and return the ones that are listed in the kanji dictionary.
Future<List<String>> filterKanji(
List<String> kanji, {
bool deduplicate = false,
}) => filterKanjiWithDbConnection(this, kanji, deduplicate);
/// Search for a word in the database.
Future<List<WordSearchResult>?> jadbSearchWord(
String word, {
SearchMode searchMode = SearchMode.auto,
int page = 0,
int? pageSize,
}) => searchWordWithDbConnection(
this,
word,
searchMode: searchMode,
page: page,
pageSize: pageSize,
);
/// Get a word by its entry ID, or null if no entry with that ID exists.
Future<WordSearchResult?> jadbGetWordById(int id) =>
getWordByIdWithDbConnection(this, id);
/// Get a list of words by their IDs.
///
/// IDs for which no result is found are omitted from the returned value.
Future<Map<int, WordSearchResult>> jadbGetManyWordsByIds(Set<int> ids) =>
getWordsByIdsWithDbConnection(this, ids);
/// Search for a word in the database, and return the count of results.
Future<int?> jadbSearchWordCount(
String word, {
SearchMode searchMode = SearchMode.auto,
}) => searchWordCountWithDbConnection(this, word, searchMode: searchMode);
/// Given a list of radicals, search which kanji contains all
/// of the radicals, find their other radicals, and return those.


@@ -1,22 +1,32 @@
import 'package:jadb/table_names/kanjidic.dart';
import 'package:sqflite_common/sqflite.dart';
/// Filters a list of kanji characters, returning only those that exist in the database.
///
/// If [deduplicate] is true, the returned list will deduplicate the input kanji list before returning the filtered results.
Future<List<String>> filterKanjiWithDbConnection(
DatabaseExecutor connection,
List<String> kanji,
bool deduplicate,
) async {
final Set<String> filteredKanji = await connection
.rawQuery('''
SELECT "literal"
FROM "${KANJIDICTableNames.character}"
WHERE "literal" IN (${kanji.map((_) => '?').join(',')})
''', kanji)
.then((value) => value.map((e) => e['literal'] as String).toSet());
if (deduplicate) {
final List<String> result = [];
final Set<String> seen = {};
for (final k in kanji) {
if (filteredKanji.contains(k) && !seen.contains(k)) {
result.add(k);
seen.add(k);
}
}
return result;
} else {
return kanji.where((k) => filteredKanji.contains(k)).toList();
}
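The query above builds one `?` placeholder per input character and binds the list as arguments, then filters the original list against the returned set. A minimal Python/sqlite3 sketch of the same technique (table name and data are stand-ins, not the real schema):

```python
import sqlite3

# Toy stand-in for the KANJIDIC character table.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "Character" ("literal" TEXT PRIMARY KEY)')
conn.executemany('INSERT INTO "Character" VALUES (?)', [("日",), ("本",), ("語",)])

def filter_kanji(chars, deduplicate=False):
    # One "?" per value, bound as query arguments.
    placeholders = ",".join("?" for _ in chars)
    rows = conn.execute(
        f'SELECT "literal" FROM "Character" WHERE "literal" IN ({placeholders})',
        chars,
    )
    known = {r[0] for r in rows}
    # Preserve input order; optionally drop repeats, as the Dart code does.
    seen, out = set(), []
    for c in chars:
        if c in known and (not deduplicate or c not in seen):
            out.append(c)
            seen.add(c)
    return out

print(filter_kanji(list("日x日本")))                    # ['日', '日', '本']
print(filter_kanji(list("日x日本"), deduplicate=True))  # ['日', '本']
```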


@@ -1,143 +1,201 @@
import 'package:collection/collection.dart';
import 'package:jadb/models/kanji_search/kanji_search_radical.dart';
import 'package:jadb/models/kanji_search/kanji_search_result.dart';
import 'package:jadb/table_names/kanjidic.dart';
import 'package:jadb/table_names/radkfile.dart';
import 'package:sqflite_common/sqflite.dart';
Future<List<Map<String, Object?>>> _charactersQuery(
DatabaseExecutor connection,
String kanji,
) => connection.rawQuery(
'''
SELECT
"${KANJIDICTableNames.character}"."literal",
"${KANJIDICTableNames.character}"."strokeCount",
"${KANJIDICTableNames.grade}"."grade",
"${KANJIDICTableNames.jlpt}"."jlpt",
"${KANJIDICTableNames.frequency}"."frequency"
FROM "${KANJIDICTableNames.character}"
LEFT JOIN "${KANJIDICTableNames.grade}" ON "${KANJIDICTableNames.character}"."literal" = "${KANJIDICTableNames.grade}"."kanji"
LEFT JOIN "${KANJIDICTableNames.jlpt}" ON "${KANJIDICTableNames.character}"."literal" = "${KANJIDICTableNames.jlpt}"."kanji"
LEFT JOIN "${KANJIDICTableNames.frequency}" ON "${KANJIDICTableNames.character}"."literal" = "${KANJIDICTableNames.frequency}"."kanji"
WHERE "literal" = ?
''',
[kanji],
);
Future<List<Map<String, Object?>>> _codepointsQuery(
DatabaseExecutor connection,
String kanji,
) => connection.query(
KANJIDICTableNames.codepoint,
where: 'kanji = ?',
whereArgs: [kanji],
);
Future<List<Map<String, Object?>>> _kunyomisQuery(
DatabaseExecutor connection,
String kanji,
) => connection.query(
KANJIDICTableNames.kunyomi,
where: 'kanji = ?',
whereArgs: [kanji],
orderBy: 'orderNum',
);
Future<List<Map<String, Object?>>> _onyomisQuery(
DatabaseExecutor connection,
String kanji,
) => connection.query(
KANJIDICTableNames.onyomi,
where: 'kanji = ?',
whereArgs: [kanji],
orderBy: 'orderNum',
);
Future<List<Map<String, Object?>>> _meaningsQuery(
DatabaseExecutor connection,
String kanji,
) => connection.query(
KANJIDICTableNames.meaning,
where: 'kanji = ? AND language = ?',
whereArgs: [kanji, 'eng'],
orderBy: 'orderNum',
);
Future<List<Map<String, Object?>>> _nanorisQuery(
DatabaseExecutor connection,
String kanji,
) => connection.query(
KANJIDICTableNames.nanori,
where: 'kanji = ?',
whereArgs: [kanji],
);
Future<List<Map<String, Object?>>> _dictionaryReferencesQuery(
DatabaseExecutor connection,
String kanji,
) => connection.query(
KANJIDICTableNames.dictionaryReference,
where: 'kanji = ?',
whereArgs: [kanji],
);
Future<List<Map<String, Object?>>> _queryCodesQuery(
DatabaseExecutor connection,
String kanji,
) => connection.query(
KANJIDICTableNames.queryCode,
where: 'kanji = ?',
whereArgs: [kanji],
);
Future<List<Map<String, Object?>>> _radicalsQuery(
DatabaseExecutor connection,
String kanji,
) => connection.rawQuery(
'''
SELECT DISTINCT
"XREF__KANJIDIC_Radical__RADKFILE"."radicalSymbol" AS "symbol",
"names"
FROM "${KANJIDICTableNames.radical}"
JOIN "XREF__KANJIDIC_Radical__RADKFILE" USING ("radicalId")
LEFT JOIN (
SELECT "radicalId", group_concat("name") AS "names"
FROM "${KANJIDICTableNames.radicalName}"
GROUP BY "radicalId"
) USING ("radicalId")
WHERE "${KANJIDICTableNames.radical}"."kanji" = ?
''',
[kanji],
);
Future<List<Map<String, Object?>>> _partsQuery(
DatabaseExecutor connection,
String kanji,
) => connection.query(
RADKFILETableNames.radkfile,
where: 'kanji = ?',
whereArgs: [kanji],
);
Future<List<Map<String, Object?>>> _readingsQuery(
DatabaseExecutor connection,
String kanji,
) => connection.query(
KANJIDICTableNames.reading,
where: 'kanji = ?',
whereArgs: [kanji],
);
Future<List<Map<String, Object?>>> _strokeMiscountsQuery(
DatabaseExecutor connection,
String kanji,
) => connection.query(
KANJIDICTableNames.strokeMiscount,
where: 'kanji = ?',
whereArgs: [kanji],
);
// Future<List<Map<String, Object?>>> _variantsQuery(
// DatabaseExecutor connection,
// String kanji,
// ) => connection.query(
// KANJIDICTableNames.variant,
// where: 'kanji = ?',
// whereArgs: [kanji],
// );
/// Searches for a kanji character and returns its details, or null if the kanji is not found in the database.
Future<KanjiSearchResult?> searchKanjiWithDbConnection(
DatabaseExecutor connection,
String kanji,
) async {
late final List<Map<String, Object?>> characters;
late final List<Map<String, Object?>> codepoints;
late final List<Map<String, Object?>> kunyomis;
late final List<Map<String, Object?>> onyomis;
late final List<Map<String, Object?>> meanings;
late final List<Map<String, Object?>> nanoris;
late final List<Map<String, Object?>> dictionaryReferences;
late final List<Map<String, Object?>> queryCodes;
late final List<Map<String, Object?>> radicals;
late final List<Map<String, Object?>> parts;
late final List<Map<String, Object?>> readings;
late final List<Map<String, Object?>> strokeMiscounts;
// TODO: add variant data to result
// late final List<Map<String, Object?>> variants;
// TODO: Search for kunyomi and onyomi usage of the characters
// from JMDict. We'll need to fuzzy query JMdict_KanjiElement for matches,
// filter JMdict_ReadingElement for kunyomi/onyomi, and then sort the main entry
// by JLPT, news frequency, etc.
await _charactersQuery(connection, kanji).then((value) => characters = value);
if (characters.isEmpty) {
return null;
}
await Future.wait({
_codepointsQuery(connection, kanji).then((value) => codepoints = value),
_kunyomisQuery(connection, kanji).then((value) => kunyomis = value),
_onyomisQuery(connection, kanji).then((value) => onyomis = value),
_meaningsQuery(connection, kanji).then((value) => meanings = value),
_nanorisQuery(connection, kanji).then((value) => nanoris = value),
_dictionaryReferencesQuery(
connection,
kanji,
).then((value) => dictionaryReferences = value),
_queryCodesQuery(connection, kanji).then((value) => queryCodes = value),
_radicalsQuery(connection, kanji).then((value) => radicals = value),
_partsQuery(connection, kanji).then((value) => parts = value),
_readingsQuery(connection, kanji).then((value) => readings = value),
_strokeMiscountsQuery(
connection,
kanji,
).then((value) => strokeMiscounts = value),
// variants_query.then((value) => variants = value),
});
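The `Future.wait` block above fires all the per-table row queries at once and awaits them together. A rough Python analogue of that pattern, with stand-in coroutines instead of real database calls:

```python
import asyncio

async def query(table):
    # Stands in for one per-table DB round-trip.
    await asyncio.sleep(0)
    return f"rows from {table}"

async def fetch_all():
    # Start every query, then await them all together,
    # like Future.wait over the _*Query helpers.
    codepoints, kunyomis, onyomis = await asyncio.gather(
        query("Codepoint"), query("Kunyomi"), query("Onyomi")
    )
    return (codepoints, kunyomis, onyomis)

print(asyncio.run(fetch_all())[0])  # rows from Codepoint
```

Note that whether the queries actually overlap depends on the database driver; on a single SQLite connection they are typically serialized even when issued concurrently.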
@@ -156,9 +214,7 @@ Future<KanjiSearchResult?> searchKanjiWithDbConnection(
: null;
final alternativeLanguageReadings = readings
.groupListsBy((item) => item['type'] as String)
.map(
(key, value) => MapEntry(
key,
@@ -167,20 +223,16 @@ Future<KanjiSearchResult?> searchKanjiWithDbConnection(
);
// TODO: Add `SKIPMisclassification` to the entries
final queryCodes_ = queryCodes
.groupListsBy((item) => item['type'] as String)
.map(
(key, value) =>
MapEntry(key, value.map((item) => item['code'] as String).toList()),
);
// TODO: Add `volume` and `page` to the entries
final dictionaryReferences_ = {
for (final entry in dictionaryReferences)
entry['type'] as String: entry['ref'] as String,
};
@@ -209,9 +261,33 @@ Future<KanjiSearchResult?> searchKanjiWithDbConnection(
},
nanori: nanoris.map((item) => item['nanori'] as String).toList(),
alternativeLanguageReadings: alternativeLanguageReadings,
strokeMiscounts: strokeMiscounts
.map((item) => item['strokeCount'] as int)
.toList(),
queryCodes: queryCodes_,
dictionaryReferences: dictionaryReferences_,
);
}
// TODO: Batch lookups with `IN` clauses to reduce the number of queries
/// Searches for multiple kanji at once, returning a map of kanji to their search results.
Future<Map<String, KanjiSearchResult>> searchManyKanjiWithDbConnection(
DatabaseExecutor connection,
Set<String> kanji,
) async {
if (kanji.isEmpty) {
return {};
}
final results = <String, KanjiSearchResult>{};
for (final k in kanji) {
final result = await searchKanjiWithDbConnection(connection, k);
if (result != null) {
results[k] = result;
}
}
return results;
}


@@ -3,10 +3,16 @@ import 'package:sqflite_common/sqlite_api.dart';
// TODO: validate that the list of radicals all are valid radicals
/// Returns a list of radicals that are part of any kanji that contains all of the input radicals.
///
/// This can be used to limit the choices of additional radicals provided to a user,
/// so that any choice they make will still yield at least one kanji.
Future<List<String>> searchRemainingRadicalsWithDbConnection(
DatabaseExecutor connection,
List<String> radicals,
) async {
final distinctRadicals = radicals.toSet();
final queryResult = await connection.rawQuery(
'''
SELECT DISTINCT "radical"
@@ -14,39 +20,37 @@ Future<List<String>> searchRemainingRadicalsWithDbConnection(
WHERE "kanji" IN (
SELECT "kanji"
FROM "${RADKFILETableNames.radkfile}"
WHERE "radical" IN (${List.filled(distinctRadicals.length, '?').join(',')})
GROUP BY "kanji"
HAVING COUNT(DISTINCT "radical") = ?
)
''',
[...distinctRadicals, distinctRadicals.length],
);
final remainingRadicals = queryResult
.map((row) => row['radical'] as String)
.toList();
return remainingRadicals;
}
/// Returns a list of kanji that contain all of the input radicals.
Future<List<String>> searchKanjiByRadicalsWithDbConnection(
DatabaseExecutor connection,
List<String> radicals,
) async {
final distinctRadicals = radicals.toSet();
final queryResult = await connection.rawQuery(
'''
SELECT "kanji"
FROM "${RADKFILETableNames.radkfile}"
WHERE "radical" IN (${List.filled(distinctRadicals.length, '?').join(',')})
GROUP BY "kanji"
HAVING COUNT(DISTINCT "radical") = ?
''',
[...distinctRadicals, distinctRadicals.length],
);
final kanji = queryResult.map((row) => row['kanji'] as String).toList();
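The core of both radical queries is the `GROUP BY "kanji" HAVING COUNT(DISTINCT "radical") = n` pattern: a kanji survives only if it was matched by every one of the n distinct input radicals. A minimal sqlite3 sketch of the same query shape (table name and rows are made up for illustration):

```python
import sqlite3

# Toy radical-to-kanji table in the spirit of RADKFILE.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "RADKFILE" ("radical" TEXT, "kanji" TEXT)')
conn.executemany(
    'INSERT INTO "RADKFILE" VALUES (?, ?)',
    [("口", "味"), ("木", "味"), ("口", "古"), ("十", "古"), ("木", "林")],
)

def kanji_by_radicals(radicals):
    # Deduplicate first, as the Dart code does, so the HAVING count is right.
    distinct = list(set(radicals))
    placeholders = ",".join("?" for _ in distinct)
    rows = conn.execute(
        f'''SELECT "kanji" FROM "RADKFILE"
            WHERE "radical" IN ({placeholders})
            GROUP BY "kanji"
            HAVING COUNT(DISTINCT "radical") = ?''',
        [*distinct, len(distinct)],
    )
    return [r[0] for r in rows]

print(kanji_by_radicals(["口", "木"]))  # ['味']
```

Deduplicating the input matters: with a duplicated radical, `COUNT(DISTINCT "radical")` could never reach the raw list length, and every kanji would be filtered out.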


@@ -1,6 +1,5 @@
import 'package:jadb/table_names/jmdict.dart';
import 'package:jadb/table_names/tanos_jlpt.dart';
import 'package:jadb/util/sqlite_utils.dart';
import 'package:sqflite_common/sqflite.dart';
class LinearWordQueryData {
@@ -25,6 +24,9 @@ class LinearWordQueryData {
final List<Map<String, Object?>> readingElementRestrictions;
final List<Map<String, Object?>> kanjiElementInfos;
final LinearWordQueryData? senseAntonymData;
final LinearWordQueryData? senseSeeAlsoData;
const LinearWordQueryData({
required this.senses,
required this.readingElements,
@@ -46,245 +48,392 @@ class LinearWordQueryData {
required this.readingElementInfos,
required this.readingElementRestrictions,
required this.kanjiElementInfos,
required this.senseAntonymData,
required this.senseSeeAlsoData,
});
}
Future<List<Map<String, Object?>>> _sensesQuery(
DatabaseExecutor connection,
List<int> entryIds,
) => connection.query(
JMdictTableNames.sense,
where: 'entryId IN (${List.filled(entryIds.length, '?').join(',')})',
whereArgs: entryIds,
);
Future<List<Map<String, Object?>>> _readingelementsQuery(
DatabaseExecutor connection,
List<int> entryIds,
) => connection.query(
JMdictTableNames.readingElement,
where: 'entryId IN (${List.filled(entryIds.length, '?').join(',')})',
whereArgs: entryIds,
orderBy: 'elementId',
);
Future<List<Map<String, Object?>>> _kanjielementsQuery(
DatabaseExecutor connection,
List<int> entryIds,
) => connection.query(
JMdictTableNames.kanjiElement,
where: 'entryId IN (${List.filled(entryIds.length, '?').join(',')})',
whereArgs: entryIds,
orderBy: 'elementId',
);
Future<List<Map<String, Object?>>> _jlpttagsQuery(
DatabaseExecutor connection,
List<int> entryIds,
) => connection.query(
TanosJLPTTableNames.jlptTag,
where: 'entryId IN (${List.filled(entryIds.length, '?').join(',')})',
whereArgs: entryIds,
);
Future<List<Map<String, Object?>>> _commonentriesQuery(
DatabaseExecutor connection,
List<int> entryIds,
) => connection.query(
'JMdict_EntryCommon',
where: 'entryId IN (${List.filled(entryIds.length, '?').join(',')})',
whereArgs: entryIds,
);
// Sense queries
Future<List<Map<String, Object?>>> _senseantonymsQuery(
DatabaseExecutor connection,
List<int> senseIds,
) => connection.rawQuery(
"""
SELECT
"${JMdictTableNames.senseAntonyms}".senseId,
"${JMdictTableNames.senseAntonyms}".ambiguous,
"${JMdictTableNames.senseAntonyms}".xrefEntryId,
"JMdict_BaseAndFurigana"."base",
"JMdict_BaseAndFurigana"."furigana"
FROM "${JMdictTableNames.senseAntonyms}"
JOIN "JMdict_BaseAndFurigana"
ON "${JMdictTableNames.senseAntonyms}"."xrefEntryId" = "JMdict_BaseAndFurigana"."entryId"
WHERE
"senseId" IN (${List.filled(senseIds.length, '?').join(',')})
AND "JMdict_BaseAndFurigana"."isFirst"
ORDER BY
"${JMdictTableNames.senseAntonyms}"."senseId",
"${JMdictTableNames.senseAntonyms}"."xrefEntryId"
""",
[...senseIds],
);
Future<List<Map<String, Object?>>> _senseseealsosQuery(
DatabaseExecutor connection,
List<int> senseIds,
) => connection.rawQuery(
"""
SELECT
"${JMdictTableNames.senseSeeAlso}"."senseId",
"${JMdictTableNames.senseSeeAlso}"."ambiguous",
"${JMdictTableNames.senseSeeAlso}"."xrefEntryId",
"JMdict_BaseAndFurigana"."base",
"JMdict_BaseAndFurigana"."furigana"
FROM "${JMdictTableNames.senseSeeAlso}"
JOIN "JMdict_BaseAndFurigana"
ON "${JMdictTableNames.senseSeeAlso}"."xrefEntryId" = "JMdict_BaseAndFurigana"."entryId"
WHERE
"senseId" IN (${List.filled(senseIds.length, '?').join(',')})
AND "JMdict_BaseAndFurigana"."isFirst"
ORDER BY
"${JMdictTableNames.senseSeeAlso}"."senseId",
"${JMdictTableNames.senseSeeAlso}"."xrefEntryId"
""",
[...senseIds],
);
Future<List<Map<String, Object?>>> _sensedialectsQuery(
DatabaseExecutor connection,
List<int> senseIds,
) => connection.query(
JMdictTableNames.senseDialect,
where: 'senseId IN (${List.filled(senseIds.length, '?').join(',')})',
whereArgs: senseIds,
);
Future<List<Map<String, Object?>>> _sensefieldsQuery(
DatabaseExecutor connection,
List<int> senseIds,
) => connection.query(
JMdictTableNames.senseField,
where: 'senseId IN (${List.filled(senseIds.length, '?').join(',')})',
whereArgs: senseIds,
);
Future<List<Map<String, Object?>>> _senseglossariesQuery(
DatabaseExecutor connection,
List<int> senseIds,
) => connection.query(
JMdictTableNames.senseGlossary,
where: 'senseId IN (${List.filled(senseIds.length, '?').join(',')})',
whereArgs: senseIds,
);
Future<List<Map<String, Object?>>> _senseinfosQuery(
DatabaseExecutor connection,
List<int> senseIds,
) => connection.query(
JMdictTableNames.senseInfo,
where: 'senseId IN (${List.filled(senseIds.length, '?').join(',')})',
whereArgs: senseIds,
);
Future<List<Map<String, Object?>>> _senselanguagesourcesQuery(
DatabaseExecutor connection,
List<int> senseIds,
) => connection.query(
JMdictTableNames.senseLanguageSource,
where: 'senseId IN (${List.filled(senseIds.length, '?').join(',')})',
whereArgs: senseIds,
);
Future<List<Map<String, Object?>>> _sensemiscsQuery(
DatabaseExecutor connection,
List<int> senseIds,
) => connection.query(
JMdictTableNames.senseMisc,
where: 'senseId IN (${List.filled(senseIds.length, '?').join(',')})',
whereArgs: senseIds,
);
Future<List<Map<String, Object?>>> _sensepossQuery(
DatabaseExecutor connection,
List<int> senseIds,
) => connection.query(
JMdictTableNames.sensePOS,
where: 'senseId IN (${List.filled(senseIds.length, '?').join(',')})',
whereArgs: senseIds,
);
Future<List<Map<String, Object?>>> _senserestrictedtokanjisQuery(
DatabaseExecutor connection,
List<int> senseIds,
) => connection.rawQuery(
"""
SELECT
"${JMdictTableNames.senseRestrictedToKanji}".senseId,
"${JMdictTableNames.senseRestrictedToKanji}".kanjiElementId,
"${JMdictTableNames.kanjiElement}".reading
FROM "${JMdictTableNames.senseRestrictedToKanji}"
JOIN "${JMdictTableNames.kanjiElement}"
ON "${JMdictTableNames.senseRestrictedToKanji}"."kanjiElementId" = "${JMdictTableNames.kanjiElement}"."elementId"
WHERE
"senseId" IN (${List.filled(senseIds.length, '?').join(',')})
ORDER BY
"${JMdictTableNames.senseRestrictedToKanji}"."senseId",
"${JMdictTableNames.senseRestrictedToKanji}"."kanjiElementId"
""",
[...senseIds],
);
Future<List<Map<String, Object?>>> _senserestrictedtoreadingsQuery(
DatabaseExecutor connection,
List<int> senseIds,
) => connection.rawQuery(
"""
SELECT
"${JMdictTableNames.senseRestrictedToReading}".senseId,
"${JMdictTableNames.senseRestrictedToReading}".readingElementId,
"${JMdictTableNames.readingElement}".reading
FROM "${JMdictTableNames.senseRestrictedToReading}"
JOIN "${JMdictTableNames.readingElement}"
ON "${JMdictTableNames.senseRestrictedToReading}"."readingElementId" = "${JMdictTableNames.readingElement}"."elementId"
WHERE
"senseId" IN (${List.filled(senseIds.length, '?').join(',')})
ORDER BY
"${JMdictTableNames.senseRestrictedToReading}"."senseId",
"${JMdictTableNames.senseRestrictedToReading}"."readingElementId"
""",
[...senseIds],
);
Future<List<Map<String, Object?>>> _examplesentencesQuery(
DatabaseExecutor connection,
List<int> senseIds,
) => connection.query(
'JMdict_ExampleSentence',
where: 'senseId IN (${List.filled(senseIds.length, '?').join(',')})',
whereArgs: senseIds,
);
// Reading/kanji elements queries
Future<List<Map<String, Object?>>> _readingelementinfosQuery(
DatabaseExecutor connection,
List<int> readingIds,
) => connection.query(
JMdictTableNames.readingInfo,
where: '(elementId) IN (${List.filled(readingIds.length, '?').join(',')})',
whereArgs: readingIds,
);
Future<List<Map<String, Object?>>> _readingelementrestrictionsQuery(
DatabaseExecutor connection,
List<int> readingIds,
) => connection.query(
JMdictTableNames.readingRestriction,
where: '(elementId) IN (${List.filled(readingIds.length, '?').join(',')})',
whereArgs: readingIds,
);
Future<List<Map<String, Object?>>> _kanjielementinfosQuery(
DatabaseExecutor connection,
List<int> kanjiIds,
) => connection.query(
JMdictTableNames.kanjiInfo,
where: '(elementId) IN (${List.filled(kanjiIds.length, '?').join(',')})',
whereArgs: kanjiIds,
);
// Xref queries
Future<LinearWordQueryData?> _senseantonymdataQuery(
DatabaseExecutor connection,
List<int> entryIds,
) => fetchLinearWordQueryData(connection, entryIds, fetchXrefData: false);
Future<LinearWordQueryData?> _senseseealsodataQuery(
DatabaseExecutor connection,
List<int> entryIds,
) => fetchLinearWordQueryData(connection, entryIds, fetchXrefData: false);
// Full query
Future<LinearWordQueryData> fetchLinearWordQueryData(
DatabaseExecutor connection,
List<int> entryIds, {
bool fetchXrefData = true,
}) async {
late final List<Map<String, Object?>> senses;
late final List<Map<String, Object?>> readingElements;
late final List<Map<String, Object?>> kanjiElements;
late final List<Map<String, Object?>> jlptTags;
late final List<Map<String, Object?>> commonEntries;
await Future.wait([
_sensesQuery(connection, entryIds).then((value) => senses = value),
_readingelementsQuery(
connection,
entryIds,
).then((value) => readingElements = value),
_kanjielementsQuery(
connection,
entryIds,
).then((value) => kanjiElements = value),
_jlpttagsQuery(connection, entryIds).then((value) => jlptTags = value),
_commonentriesQuery(
connection,
entryIds,
).then((value) => commonEntries = value),
]);
// Sense queries
final senseIds = senses.map((sense) => sense['senseId'] as int).toList();
late final List<Map<String, Object?>> senseAntonyms;
late final List<Map<String, Object?>> senseDialects;
late final List<Map<String, Object?>> senseFields;
late final List<Map<String, Object?>> senseGlossaries;
late final List<Map<String, Object?>> senseInfos;
late final List<Map<String, Object?>> senseLanguageSources;
late final List<Map<String, Object?>> senseMiscs;
late final List<Map<String, Object?>> sensePOSs;
late final List<Map<String, Object?>> senseRestrictedToKanjis;
late final List<Map<String, Object?>> senseRestrictedToReadings;
late final List<Map<String, Object?>> senseSeeAlsos;
late final List<Map<String, Object?>> exampleSentences;
// Reading queries
final readingIds = readingElements
    .map((element) => element['elementId'] as int)
    .toList();
final kanjiIds = kanjiElements
    .map((element) => element['elementId'] as int)
    .toList();
late final List<Map<String, Object?>> readingElementInfos;
late final List<Map<String, Object?>> readingElementRestrictions;
// Kanji queries
late final List<Map<String, Object?>> kanjiElementInfos;
// Xref data queries
await Future.wait([
_senseantonymsQuery(
connection,
senseIds,
).then((value) => senseAntonyms = value),
_senseseealsosQuery(
connection,
senseIds,
).then((value) => senseSeeAlsos = value),
]);
LinearWordQueryData? senseAntonymData;
LinearWordQueryData? senseSeeAlsoData;
await Future.wait([
_sensedialectsQuery(
connection,
senseIds,
).then((value) => senseDialects = value),
_sensefieldsQuery(
connection,
senseIds,
).then((value) => senseFields = value),
_senseglossariesQuery(
connection,
senseIds,
).then((value) => senseGlossaries = value),
_senseinfosQuery(connection, senseIds).then((value) => senseInfos = value),
_senselanguagesourcesQuery(
connection,
senseIds,
).then((value) => senseLanguageSources = value),
_sensemiscsQuery(connection, senseIds).then((value) => senseMiscs = value),
_sensepossQuery(connection, senseIds).then((value) => sensePOSs = value),
_senserestrictedtokanjisQuery(
connection,
senseIds,
).then((value) => senseRestrictedToKanjis = value),
_senserestrictedtoreadingsQuery(
connection,
senseIds,
).then((value) => senseRestrictedToReadings = value),
_examplesentencesQuery(
connection,
senseIds,
).then((value) => exampleSentences = value),
_readingelementinfosQuery(
connection,
readingIds,
).then((value) => readingElementInfos = value),
_readingelementrestrictionsQuery(
connection,
readingIds,
).then((value) => readingElementRestrictions = value),
_kanjielementinfosQuery(
connection,
kanjiIds,
).then((value) => kanjiElementInfos = value),
if (fetchXrefData)
_senseantonymdataQuery(
connection,
senseAntonyms.map((antonym) => antonym['xrefEntryId'] as int).toList(),
).then((value) => senseAntonymData = value),
if (fetchXrefData)
_senseseealsodataQuery(
connection,
senseSeeAlsos.map((seeAlso) => seeAlso['xrefEntryId'] as int).toList(),
).then((value) => senseSeeAlsoData = value),
]);
return LinearWordQueryData(
@@ -308,5 +457,7 @@ Future<LinearWordQueryData> fetchLinearWordQueryData(
readingElementInfos: readingElementInfos,
readingElementRestrictions: readingElementRestrictions,
kanjiElementInfos: kanjiElementInfos,
senseAntonymData: senseAntonymData,
senseSeeAlsoData: senseSeeAlsoData,
);
}

View File

@@ -1,5 +1,5 @@
import 'package:jadb/search/word_search/word_search.dart';
import 'package:jadb/table_names/jmdict.dart';
import 'package:jadb/util/text_filtering.dart';
import 'package:sqflite_common/sqlite_api.dart';
@@ -15,15 +15,15 @@ SearchMode _determineSearchMode(String word) {
final bool containsAscii = RegExp(r'[A-Za-z]').hasMatch(word);
if (containsKanji && containsAscii) {
    return SearchMode.mixedKanji;
  } else if (containsKanji) {
    return SearchMode.kanji;
  } else if (containsAscii) {
    return SearchMode.english;
  } else if (word.contains(hiraganaRegex) || word.contains(katakanaRegex)) {
    return SearchMode.kana;
  } else {
    return SearchMode.mixedKana;
}
}
@@ -37,91 +37,105 @@ String _filterFTSSensitiveCharacters(String word) {
.replaceAll('(', '')
.replaceAll(')', '')
.replaceAll('^', '')
.replaceAll('\"', '');
.replaceAll('"', '');
}
(String, List<Object?>) _kanjiReadingTemplate(
  String tableName,
  String word, {
  int? pageSize,
  int? offset,
  bool countOnly = false,
}) {
  assert(
    tableName == JMdictTableNames.kanjiElement ||
        tableName == JMdictTableNames.readingElement,
  );
  assert(!countOnly || pageSize == null);
  assert(!countOnly || offset == null);
  assert(pageSize == null || pageSize > 0);
  assert(offset == null || offset >= 0);
  assert(
    offset == null || pageSize != null,
    'Offset should only be used with pageSize set',
  );
  return (
    '''
    WITH
      fts_results AS (
        SELECT DISTINCT
          "$tableName"."entryId",
          100
          + (("${tableName}FTS"."reading" = ?) * 10000)
          + (("$tableName"."orderNum" = 0) * 20)
          + COALESCE("JMdict_EntryScore"."score", 0)
          AS "score"
        FROM "${tableName}FTS"
        JOIN "$tableName" USING ("elementId")
        LEFT JOIN "JMdict_EntryScore" USING ("elementId")
        WHERE "${tableName}FTS"."reading" MATCH ? || '*'
      ),
      non_fts_results AS (
        SELECT DISTINCT
          "$tableName"."entryId",
          50
          + (("$tableName"."orderNum" = 0) * 20)
          + COALESCE("JMdict_EntryScore"."score", 0)
          AS "score"
        FROM "$tableName"
        LEFT JOIN "JMdict_EntryScore" USING ("elementId")
        WHERE "reading" LIKE '%' || ? || '%'
        AND "$tableName"."entryId" NOT IN (SELECT "entryId" FROM "fts_results")
      )
    SELECT ${countOnly ? 'COUNT(DISTINCT "entryId") AS count' : '"entryId", MAX("score") AS "score"'}
    FROM (
      SELECT * FROM "fts_results"
      UNION
      SELECT * FROM "non_fts_results"
    )
    ${!countOnly ? 'GROUP BY "entryId"' : ''}
    ${!countOnly ? 'ORDER BY "score" DESC, "entryId" ASC' : ''}
    ${pageSize != null ? 'LIMIT ?' : ''}
    ${offset != null ? 'OFFSET ?' : ''}
    '''
        .trim(),
    [
      _filterFTSSensitiveCharacters(word),
      _filterFTSSensitiveCharacters(word),
      _filterFTSSensitiveCharacters(word),
      ?pageSize,
      ?offset,
    ],
  );
}
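As a rough illustration of the template's contract, a sketch (not part of the change) that assumes the bind order visible in the SQL above and that the search word survives the FTS-character filtering unchanged:

```dart
// Hypothetical usage sketch: build the paged kanji-element query.
// The three leading binds correspond to the exact-match scoring `?`,
// the FTS MATCH `?`, and the LIKE `?` in non_fts_results; pageSize and
// offset are appended only when non-null (null-aware `?x` list elements).
final (sql, args) = _kanjiReadingTemplate(
  JMdictTableNames.kanjiElement,
  '図書館',
  pageSize: 10,
  offset: 20,
);
assert(args.length == 5); // three word binds + LIMIT + OFFSET binds
```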
Future<List<ScoredEntryId>> _queryKanji(
DatabaseExecutor connection,
String word,
  int? pageSize,
int? offset,
) {
final (query, args) = _kanjiReadingTemplate(
JMdictTableNames.kanjiElement,
word,
pageSize: pageSize,
offset: offset,
);
return connection
.rawQuery(query, args)
.then(
(result) => result
.map(
(row) =>
ScoredEntryId(row['entryId'] as int, row['score'] as int),
)
.toList(),
);
}
Future<int> _queryKanjiCount(DatabaseExecutor connection, String word) {
final (query, args) = _kanjiReadingTemplate(
JMdictTableNames.kanjiElement,
word,
@@ -129,32 +143,34 @@ Future<int> _queryKanjiCount(
);
return connection
.rawQuery(query, args)
      .then((result) => result.firstOrNull?['count'] as int? ?? 0);
}
Future<List<ScoredEntryId>> _queryKana(
DatabaseExecutor connection,
String word,
  int? pageSize,
int? offset,
) {
final (query, args) = _kanjiReadingTemplate(
JMdictTableNames.readingElement,
word,
pageSize: pageSize,
offset: offset,
);
return connection
.rawQuery(query, args)
.then(
(result) => result
.map(
(row) =>
ScoredEntryId(row['entryId'] as int, row['score'] as int),
)
.toList(),
);
}
Future<int> _queryKanaCount(DatabaseExecutor connection, String word) {
final (query, args) = _kanjiReadingTemplate(
JMdictTableNames.readingElement,
word,
@@ -162,71 +178,62 @@ Future<int> _queryKanaCount(
);
return connection
.rawQuery(query, args)
      .then((result) => result.firstOrNull?['count'] as int? ?? 0);
}
Future<List<ScoredEntryId>> _queryEnglish(
DatabaseExecutor connection,
String word,
  int? pageSize,
int? offset,
) async {
assert(pageSize == null || pageSize > 0);
assert(offset == null || offset >= 0);
assert(
offset == null || pageSize != null,
'Offset should only be used with pageSize set',
);
final result = await connection.rawQuery(
'''
SELECT
"${JMdictTableNames.sense}"."entryId",
COALESCE(MAX("JMdict_EntryScore"."score"), 0)
+ (("${JMdictTableNames.senseGlossary}"."phrase" = ?1 AND "${JMdictTableNames.sense}"."orderNum" = 0) * 50)
+ (("${JMdictTableNames.senseGlossary}"."phrase" = ?1 AND "${JMdictTableNames.sense}"."orderNum" = 1) * 30)
+ (("${JMdictTableNames.senseGlossary}"."phrase" = ?1 AND "${JMdictTableNames.sense}"."orderNum" > 1) * 20)
as "score"
FROM "${JMdictTableNames.senseGlossary}"
JOIN "${JMdictTableNames.sense}" USING ("senseId")
LEFT JOIN "JMdict_EntryScore" USING ("entryId")
WHERE "${JMdictTableNames.senseGlossary}"."phrase" LIKE ?2
GROUP BY "${JMdictTableNames.sense}"."entryId"
ORDER BY
"score" DESC,
"${JMdictTableNames.sense}"."entryId" ASC
${pageSize != null ? 'LIMIT ?3' : ''}
${offset != null ? 'OFFSET ?4' : ''}
'''
.trim(),
[word, '%${word.replaceAll('%', '')}%', ?pageSize, ?offset],
);
return result
.map((row) => ScoredEntryId(row['entryId'] as int, row['score'] as int))
.toList();
}
Future<int> _queryEnglishCount(DatabaseExecutor connection, String word) async {
final result = await connection.rawQuery(
'''
SELECT
COUNT(DISTINCT "${JMdictTableNames.sense}"."entryId") AS "count"
FROM "${JMdictTableNames.senseGlossary}"
JOIN "${JMdictTableNames.sense}" USING ("senseId")
WHERE "${JMdictTableNames.senseGlossary}"."phrase" LIKE ?
'''
.trim(),
['%$word%'],
);
return result.first['count'] as int;
@@ -236,55 +243,34 @@ Future<List<ScoredEntryId>> fetchEntryIds(
DatabaseExecutor connection,
String word,
SearchMode searchMode,
  int? pageSize,
int? offset,
) async {
  if (searchMode == SearchMode.auto) {
searchMode = _determineSearchMode(word);
}
assert(word.isNotEmpty, 'Word should not be empty when fetching entry IDs');
late final List<ScoredEntryId> entryIds;
switch (searchMode) {
case SearchMode.kanji:
entryIds = await _queryKanji(connection, word, pageSize, offset);
break;
case SearchMode.kana:
entryIds = await _queryKana(connection, word, pageSize, offset);
break;
case SearchMode.english:
entryIds = await _queryEnglish(connection, word, pageSize, offset);
break;
    case SearchMode.mixedKana:
    case SearchMode.mixedKanji:
    default:
      throw UnimplementedError('Search mode $searchMode is not implemented');
  }
return entryIds;
}
@@ -294,45 +280,31 @@ Future<int?> fetchEntryIdCount(
String word,
SearchMode searchMode,
) async {
  if (searchMode == SearchMode.auto) {
searchMode = _determineSearchMode(word);
}
assert(word.isNotEmpty, 'Word should not be empty when fetching entry IDs');
late final int? entryIdCount;
switch (searchMode) {
case SearchMode.kanji:
entryIdCount = await _queryKanjiCount(connection, word);
break;
case SearchMode.kana:
entryIdCount = await _queryKanaCount(connection, word);
break;
case SearchMode.english:
entryIdCount = await _queryEnglishCount(connection, word);
break;
    case SearchMode.mixedKana:
    case SearchMode.mixedKanji:
    default:
      throw UnimplementedError('Search mode $searchMode is not implemented');
}
return entryIdCount;

View File

@@ -12,50 +12,37 @@ import 'package:jadb/models/word_search/word_search_sense.dart';
import 'package:jadb/models/word_search/word_search_sense_language_source.dart';
import 'package:jadb/models/word_search/word_search_sources.dart';
import 'package:jadb/models/word_search/word_search_xref_entry.dart';
import 'package:jadb/search/word_search/data_query.dart';
import 'package:jadb/search/word_search/entry_id_query.dart';
List<WordSearchResult> regroupWordSearchResults({
required List<ScoredEntryId> entryIds,
required LinearWordQueryData linearWordQueryData,
}) {
final List<WordSearchResult> results = [];
final commonEntryIds = linearWordQueryData.commonEntries
.map((entry) => entry['entryId'] as int)
.toSet();
for (final scoredEntryId in entryIds) {
final List<Map<String, Object?>> entryReadingElements = linearWordQueryData
.readingElements
.where((element) => element['entryId'] == scoredEntryId.entryId)
.toList();
final List<Map<String, Object?>> entryKanjiElements = linearWordQueryData
.kanjiElements
.where((element) => element['entryId'] == scoredEntryId.entryId)
.toList();
final List<Map<String, Object?>> entryJlptTags = linearWordQueryData
.jlptTags
.where((element) => element['entryId'] == scoredEntryId.entryId)
.toList();
final jlptLevel =
entryJlptTags
.map((e) => JlptLevel.fromString(e['jlptLevel'] as String?))
.sorted((a, b) => b.compareTo(a))
.firstOrNull ??
@@ -63,33 +50,36 @@ List<WordSearchResult> regroupWordSearchResults({
final isCommon = commonEntryIds.contains(scoredEntryId.entryId);
final List<Map<String, Object?>> entrySenses = linearWordQueryData.senses
.where((element) => element['entryId'] == scoredEntryId.entryId)
.toList();
final GroupedWordResult entryReadingElementsGrouped = _regroupWords(
entryId: scoredEntryId.entryId,
readingElements: entryReadingElements,
kanjiElements: entryKanjiElements,
readingElementInfos: linearWordQueryData.readingElementInfos,
readingElementRestrictions:
linearWordQueryData.readingElementRestrictions,
kanjiElementInfos: linearWordQueryData.kanjiElementInfos,
);
final List<WordSearchSense> entrySensesGrouped = _regroupSenses(
senses: entrySenses,
senseAntonyms: linearWordQueryData.senseAntonyms,
senseDialects: linearWordQueryData.senseDialects,
senseFields: linearWordQueryData.senseFields,
senseGlossaries: linearWordQueryData.senseGlossaries,
senseInfos: linearWordQueryData.senseInfos,
senseLanguageSources: linearWordQueryData.senseLanguageSources,
senseMiscs: linearWordQueryData.senseMiscs,
sensePOSs: linearWordQueryData.sensePOSs,
senseRestrictedToKanjis: linearWordQueryData.senseRestrictedToKanjis,
senseRestrictedToReadings: linearWordQueryData.senseRestrictedToReadings,
senseSeeAlsos: linearWordQueryData.senseSeeAlsos,
exampleSentences: linearWordQueryData.exampleSentences,
senseSeeAlsosXrefData: linearWordQueryData.senseSeeAlsoData,
senseAntonymsXrefData: linearWordQueryData.senseAntonymData,
);
results.add(
@@ -102,10 +92,7 @@ List<WordSearchResult> regroupWordSearchResults({
readingInfo: entryReadingElementsGrouped.readingInfos,
senses: entrySensesGrouped,
jlptLevel: jlptLevel,
sources: const WordSearchSources(jmdict: true, jmnedict: false),
),
);
}
@@ -125,7 +112,7 @@ class GroupedWordResult {
});
}
GroupedWordResult _regroupWords({
required int entryId,
required List<Map<String, Object?>> kanjiElements,
required List<Map<String, Object?>> kanjiElementInfos,
@@ -135,8 +122,9 @@ GroupedWordResult _regroup_words({
}) {
final List<WordSearchRuby> rubys = [];
final kanjiElements_ = kanjiElements
.where((element) => element['entryId'] == entryId)
.toList();
final readingElements_ = readingElements
.where((element) => element['entryId'] == entryId)
@@ -148,9 +136,7 @@ GroupedWordResult _regroup_words({
for (final readingElement in readingElements_) {
if (readingElement['doesNotMatchKanji'] == 1 || kanjiElements_.isEmpty) {
final ruby = WordSearchRuby(base: readingElement['reading'] as String);
rubys.add(ruby);
continue;
@@ -169,34 +155,47 @@ GroupedWordResult _regroup_words({
continue;
}
final ruby = WordSearchRuby(base: kanji, furigana: reading);
rubys.add(ruby);
}
}
assert(rubys.isNotEmpty, 'No readings found for entryId: $entryId');
final Map<int, String> readingElementIdsToReading = {
for (final element in readingElements_)
element['elementId'] as int: element['reading'] as String,
};
final Map<int, String> kanjiElementIdsToReading = {
for (final element in kanjiElements_)
element['elementId'] as int: element['reading'] as String,
};
final readingElementInfos_ = readingElementInfos
.where((element) => element['entryId'] == entryId)
.toList();
final kanjiElementInfos_ = kanjiElementInfos
.where((element) => element['entryId'] == entryId)
.toList();
return GroupedWordResult(
rubys: rubys,
readingInfos: {
for (final rei in readingElementInfos_)
readingElementIdsToReading[rei['elementId'] as int]!:
JMdictReadingInfo.fromId(rei['info'] as String),
},
kanjiInfos: {
for (final kei in kanjiElementInfos_)
kanjiElementIdsToReading[kei['elementId'] as int]!:
JMdictKanjiInfo.fromId(kei['info'] as String),
},
);
}
List<WordSearchSense> _regroupSenses({
required List<Map<String, Object?>> senses,
required List<Map<String, Object?>> senseAntonyms,
required List<Map<String, Object?>> senseDialects,
@@ -210,29 +209,41 @@ List<WordSearchSense> _regroup_senses({
required List<Map<String, Object?>> senseRestrictedToReadings,
required List<Map<String, Object?>> senseSeeAlsos,
required List<Map<String, Object?>> exampleSentences,
required LinearWordQueryData? senseSeeAlsosXrefData,
required LinearWordQueryData? senseAntonymsXrefData,
}) {
final groupedSenseAntonyms = senseAntonyms.groupListsBy(
(element) => element['senseId'] as int,
);
final groupedSenseDialects = senseDialects.groupListsBy(
(element) => element['senseId'] as int,
);
final groupedSenseFields = senseFields.groupListsBy(
(element) => element['senseId'] as int,
);
final groupedSenseGlossaries = senseGlossaries.groupListsBy(
(element) => element['senseId'] as int,
);
final groupedSenseInfos = senseInfos.groupListsBy(
(element) => element['senseId'] as int,
);
final groupedSenseLanguageSources = senseLanguageSources.groupListsBy(
(element) => element['senseId'] as int,
);
final groupedSenseMiscs = senseMiscs.groupListsBy(
(element) => element['senseId'] as int,
);
final groupedSensePOSs = sensePOSs.groupListsBy(
(element) => element['senseId'] as int,
);
final groupedSenseRestrictedToKanjis = senseRestrictedToKanjis.groupListsBy(
(element) => element['senseId'] as int,
);
final groupedSenseRestrictedToReadings = senseRestrictedToReadings
.groupListsBy((element) => element['senseId'] as int);
final groupedSenseSeeAlsos = senseSeeAlsos.groupListsBy(
(element) => element['senseId'] as int,
);
final List<WordSearchSense> result = [];
for (final sense in senses) {
@@ -251,45 +262,82 @@ List<WordSearchSense> _regroup_senses({
groupedSenseRestrictedToReadings[senseId] ?? [];
final seeAlsos = groupedSenseSeeAlsos[senseId] ?? [];
final List<WordSearchResult> seeAlsosWordResults =
senseSeeAlsosXrefData != null
? regroupWordSearchResults(
entryIds: seeAlsos
.map((e) => ScoredEntryId(e['xrefEntryId'] as int, 0))
.toList(),
linearWordQueryData: senseSeeAlsosXrefData,
)
: [];
final List<WordSearchResult> antonymsWordResults =
senseAntonymsXrefData != null
? regroupWordSearchResults(
entryIds: antonyms
.map((e) => ScoredEntryId(e['xrefEntryId'] as int, 0))
.toList(),
linearWordQueryData: senseAntonymsXrefData,
)
: [];
final resultSense = WordSearchSense(
englishDefinitions: glossaries.map((e) => e['phrase'] as String).toList(),
partsOfSpeech: pos
.map((e) => JMdictPOS.fromId(e['pos'] as String))
.toList(),
seeAlso: seeAlsos.asMap().entries.map<WordSearchXrefEntry>((mapEntry) {
final i = mapEntry.key;
final e = mapEntry.value;
return WordSearchXrefEntry(
entryId: e['xrefEntryId'] as int,
baseWord: e['base'] as String,
furigana: e['furigana'] as String?,
ambiguous: e['ambiguous'] == 1,
xrefResult: seeAlsosWordResults.isNotEmpty
? seeAlsosWordResults[i]
: null,
);
}).toList(),
antonyms: antonyms.asMap().entries.map<WordSearchXrefEntry>((mapEntry) {
final i = mapEntry.key;
final e = mapEntry.value;
return WordSearchXrefEntry(
entryId: e['xrefEntryId'] as int,
baseWord: e['base'] as String,
furigana: e['furigana'] as String?,
ambiguous: e['ambiguous'] == 1,
xrefResult: antonymsWordResults.isNotEmpty
? antonymsWordResults[i]
: null,
);
}).toList(),
restrictedToReading: restrictedToReadings
.map((e) => e['reading'] as String)
.toList(),
restrictedToKanji: restrictedToKanjis
.map((e) => e['reading'] as String)
.toList(),
fields: fields
.map((e) => JMdictField.fromId(e['field'] as String))
.toList(),
dialects: dialects
.map((e) => JMdictDialect.fromId(e['dialect'] as String))
.toList(),
misc: miscs.map((e) => JMdictMisc.fromId(e['misc'] as String)).toList(),
info: infos.map((e) => e['info'] as String).toList(),
languageSource: languageSources
.map((e) => WordSearchSenseLanguageSource(
language: e['language'] as String,
phrase: e['phrase'] as String?,
fullyDescribesSense: e['fullyDescribesSense'] == 1,
constructedFromSmallerWords:
e['constructedFromSmallerWords'] == 1,
))
.map(
(e) => WordSearchSenseLanguageSource(
language: e['language'] as String,
phrase: e['phrase'] as String?,
fullyDescribesSense: e['fullyDescribesSense'] == 1,
constructedFromSmallerWords:
e['constructedFromSmallerWords'] == 1,
),
)
.toList(),
);


@@ -14,26 +14,38 @@ import 'package:jadb/table_names/jmdict.dart';
import 'package:sqflite_common/sqlite_api.dart';
enum SearchMode {
Auto,
English,
Kanji,
MixedKanji,
Kana,
MixedKana,
/// Try to autodetect what is being searched for
auto,
/// Search for English words
english,
/// Search for the kanji reading of a word
kanji,
/// Search for the kanji reading of a word, mixed in with kana/romaji
mixedKanji,
/// Search for the kana reading of a word
kana,
/// Search for the kana reading of a word, mixed in with romaji
mixedKana,
}
/// Searches for an input string, returning a list of results with their details. Returns null if the input string is empty.
Future<List<WordSearchResult>?> searchWordWithDbConnection(
DatabaseExecutor connection,
String word,
SearchMode searchMode,
int page,
int pageSize,
) async {
String word, {
SearchMode searchMode = SearchMode.auto,
int page = 0,
int? pageSize,
}) async {
if (word.isEmpty) {
return null;
}
final offset = page * pageSize;
final int? offset = pageSize != null ? page * pageSize : null;
final List<ScoredEntryId> entryIds = await fetchEntryIds(
connection,
word,
@@ -43,47 +55,34 @@ Future<List<WordSearchResult>?> searchWordWithDbConnection(
);
if (entryIds.isEmpty) {
// TODO: try conjugation search
return [];
}
final LinearWordQueryData linearWordQueryData =
await fetchLinearWordQueryData(
connection,
entryIds.map((e) => e.entryId).toList(),
);
connection,
entryIds.map((e) => e.entryId).toList(),
);
final result = regroupWordSearchResults(
entryIds: entryIds,
readingElements: linearWordQueryData.readingElements,
kanjiElements: linearWordQueryData.kanjiElements,
jlptTags: linearWordQueryData.jlptTags,
commonEntries: linearWordQueryData.commonEntries,
senses: linearWordQueryData.senses,
senseAntonyms: linearWordQueryData.senseAntonyms,
senseDialects: linearWordQueryData.senseDialects,
senseFields: linearWordQueryData.senseFields,
senseGlossaries: linearWordQueryData.senseGlossaries,
senseInfos: linearWordQueryData.senseInfos,
senseLanguageSources: linearWordQueryData.senseLanguageSources,
senseMiscs: linearWordQueryData.senseMiscs,
sensePOSs: linearWordQueryData.sensePOSs,
senseRestrictedToKanjis: linearWordQueryData.senseRestrictedToKanjis,
senseRestrictedToReadings: linearWordQueryData.senseRestrictedToReadings,
senseSeeAlsos: linearWordQueryData.senseSeeAlsos,
exampleSentences: linearWordQueryData.exampleSentences,
readingElementInfos: linearWordQueryData.readingElementInfos,
readingElementRestrictions: linearWordQueryData.readingElementRestrictions,
kanjiElementInfos: linearWordQueryData.kanjiElementInfos,
linearWordQueryData: linearWordQueryData,
);
for (final resultEntry in result) {
resultEntry.inferMatchSpans(word, searchMode: searchMode);
}
return result;
}
/// Searches for an input string, returning the number of results that the search would yield without pagination.
Future<int?> searchWordCountWithDbConnection(
DatabaseExecutor connection,
String word,
SearchMode searchMode,
) async {
String word, {
SearchMode searchMode = SearchMode.auto,
}) async {
if (word.isEmpty) {
return null;
}
@@ -97,6 +96,7 @@ Future<int?> searchWordCountWithDbConnection(
return entryIdCount;
}
/// Fetches a single word by its entry ID, returning null if not found.
Future<WordSearchResult?> getWordByIdWithDbConnection(
DatabaseExecutor connection,
int id,
@@ -105,43 +105,23 @@ Future<WordSearchResult?> getWordByIdWithDbConnection(
return null;
}
final exists = await connection.rawQuery(
'SELECT EXISTS(SELECT 1 FROM "${JMdictTableNames.entry}" WHERE "entryId" = ?)',
[id],
).then((value) => value.isNotEmpty && value.first.values.first == 1);
final exists = await connection
.rawQuery(
'SELECT EXISTS(SELECT 1 FROM "${JMdictTableNames.entry}" WHERE "entryId" = ?)',
[id],
)
.then((value) => value.isNotEmpty && value.first.values.first == 1);
if (!exists) {
return null;
}
final LinearWordQueryData linearWordQueryData =
await fetchLinearWordQueryData(
connection,
[id],
);
await fetchLinearWordQueryData(connection, [id]);
final result = regroupWordSearchResults(
entryIds: [ScoredEntryId(id, 0)],
readingElements: linearWordQueryData.readingElements,
kanjiElements: linearWordQueryData.kanjiElements,
jlptTags: linearWordQueryData.jlptTags,
commonEntries: linearWordQueryData.commonEntries,
senses: linearWordQueryData.senses,
senseAntonyms: linearWordQueryData.senseAntonyms,
senseDialects: linearWordQueryData.senseDialects,
senseFields: linearWordQueryData.senseFields,
senseGlossaries: linearWordQueryData.senseGlossaries,
senseInfos: linearWordQueryData.senseInfos,
senseLanguageSources: linearWordQueryData.senseLanguageSources,
senseMiscs: linearWordQueryData.senseMiscs,
sensePOSs: linearWordQueryData.sensePOSs,
senseRestrictedToKanjis: linearWordQueryData.senseRestrictedToKanjis,
senseRestrictedToReadings: linearWordQueryData.senseRestrictedToReadings,
senseSeeAlsos: linearWordQueryData.senseSeeAlsos,
exampleSentences: linearWordQueryData.exampleSentences,
readingElementInfos: linearWordQueryData.readingElementInfos,
readingElementRestrictions: linearWordQueryData.readingElementRestrictions,
kanjiElementInfos: linearWordQueryData.kanjiElementInfos,
linearWordQueryData: linearWordQueryData,
);
assert(
@@ -151,3 +131,27 @@ Future<WordSearchResult?> getWordByIdWithDbConnection(
return result.firstOrNull;
}
/// Fetches multiple words by their entry IDs, returning a map from entry ID to result.
Future<Map<int, WordSearchResult>> getWordsByIdsWithDbConnection(
DatabaseExecutor connection,
Set<int> ids,
) async {
if (ids.isEmpty) {
return {};
}
final LinearWordQueryData linearWordQueryData =
await fetchLinearWordQueryData(connection, ids.toList());
final List<ScoredEntryId> entryIds = ids
.map((id) => ScoredEntryId(id, 0)) // Score is not used here
.toList();
final results = regroupWordSearchResults(
entryIds: entryIds,
linearWordQueryData: linearWordQueryData,
);
return {for (var r in results) r.entryId: r};
}
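With the search functions above now taking optional named parameters (`searchMode`, `page`, `pageSize`) instead of required positional ones, call sites get simpler. A hypothetical call site, with database setup omitted (`db` is assumed to be an open `DatabaseExecutor` from sqflite_common; the word and IDs are illustrative):

```dart
// Hypothetical usage of the updated API; not part of this diff.
final results = await searchWordWithDbConnection(db, '食べる', pageSize: 20);
final count = await searchWordCountWithDbConnection(db, '食べる');
final byId = await getWordsByIdsWithDbConnection(db, {1000090, 1000100});
```

Omitting `pageSize` now returns all results in one page, since `offset` stays null.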


@@ -1,4 +1,5 @@
abstract class JMdictTableNames {
static const String version = 'JMdict_Version';
static const String entry = 'JMdict_Entry';
static const String kanjiElement = 'JMdict_KanjiElement';
static const String kanjiInfo = 'JMdict_KanjiElementInfo';
@@ -10,6 +11,7 @@ abstract class JMdictTableNames {
static const String senseDialect = 'JMdict_SenseDialect';
static const String senseField = 'JMdict_SenseField';
static const String senseGlossary = 'JMdict_SenseGlossary';
static const String senseGlossaryType = 'JMdict_SenseGlossaryType';
static const String senseInfo = 'JMdict_SenseInfo';
static const String senseMisc = 'JMdict_SenseMisc';
static const String sensePOS = 'JMdict_SensePOS';
@@ -20,23 +22,25 @@ abstract class JMdictTableNames {
static const String senseSeeAlso = 'JMdict_SenseSeeAlso';
static Set<String> get allTables => {
entry,
kanjiElement,
kanjiInfo,
readingElement,
readingInfo,
readingRestriction,
sense,
senseAntonyms,
senseDialect,
senseField,
senseGlossary,
senseInfo,
senseMisc,
sensePOS,
senseLanguageSource,
senseRestrictedToKanji,
senseRestrictedToReading,
senseSeeAlso
};
version,
entry,
kanjiElement,
kanjiInfo,
readingElement,
readingInfo,
readingRestriction,
sense,
senseAntonyms,
senseDialect,
senseField,
senseGlossary,
senseGlossaryType,
senseInfo,
senseMisc,
sensePOS,
senseLanguageSource,
senseRestrictedToKanji,
senseRestrictedToReading,
senseSeeAlso,
};
}


@@ -1,5 +1,9 @@
abstract class KANJIDICTableNames {
static const String version = 'KANJIDIC_Version';
static const String character = 'KANJIDIC_Character';
static const String grade = 'KANJIDIC_Grade';
static const String frequency = 'KANJIDIC_Frequency';
static const String jlpt = 'KANJIDIC_JLPT';
static const String radicalName = 'KANJIDIC_RadicalName';
static const String codepoint = 'KANJIDIC_Codepoint';
static const String radical = 'KANJIDIC_Radical';
@@ -17,19 +21,23 @@ abstract class KANJIDICTableNames {
static const String nanori = 'KANJIDIC_Nanori';
static Set<String> get allTables => {
character,
radicalName,
codepoint,
radical,
strokeMiscount,
variant,
dictionaryReference,
dictionaryReferenceMoro,
queryCode,
reading,
kunyomi,
onyomi,
meaning,
nanori
};
version,
character,
grade,
frequency,
jlpt,
radicalName,
codepoint,
radical,
strokeMiscount,
variant,
dictionaryReference,
dictionaryReferenceMoro,
queryCode,
reading,
kunyomi,
onyomi,
meaning,
nanori,
};
}


@@ -1,7 +1,6 @@
abstract class RADKFILETableNames {
static const String version = 'RADKFILE_Version';
static const String radkfile = 'RADKFILE';
static Set<String> get allTables => {
radkfile,
};
static Set<String> get allTables => {version, radkfile};
}


@@ -1,5 +1,6 @@
abstract class TanosJLPTTableNames {
static const String version = 'JMdict_JLPT_Version';
static const String jlptTag = 'JMdict_JLPTTag';
static Set<String> get allTables => {jlptTag};
static Set<String> get allTables => {version, jlptTag};
}


@@ -276,29 +276,22 @@ extension on DateTime {
/// See more info here:
/// - https://en.wikipedia.org/wiki/Nanboku-ch%C5%8D_period
/// - http://www.kumamotokokufu-h.ed.jp/kumamoto/bungaku/nengoui.html
String? japaneseEra({bool nanbokuchouPeriodUsesNorth = true}) {
String? japaneseEra() {
throw UnimplementedError('This function is not implemented yet.');
if (this.year < 645) {
if (year < 645) {
return null;
}
if (this.year < periodsNanbokuchouNorth.keys.first.$1) {
if (year < periodsNanbokuchouNorth.keys.first.$1) {
// TODO: find first where year <= this.year and jump one period back.
}
}
String get japaneseWeekdayPrefix => [
'月',
'火',
'水',
'木',
'金',
'土',
'日',
][weekday - 1];
String get japaneseWeekdayPrefix =>
['月', '火', '水', '木', '金', '土', '日'][weekday - 1];
/// Returns the date in Japanese format.
String japaneseDate({bool showWeekday = false}) =>
'$month月$day日' + (showWeekday ? '$japaneseWeekdayPrefix' : '');
'$month月$day日${showWeekday ? '$japaneseWeekdayPrefix' : ''}';
}


@@ -1,3 +1,4 @@
import 'package:collection/collection.dart';
import 'package:jadb/util/lemmatizer/rules.dart';
enum WordClass {
@@ -10,18 +11,17 @@ enum WordClass {
adverb,
particle,
input,
// TODO: add toString and fromString so it can be parsed by the cli
}
enum LemmatizationRuleType {
prefix,
suffix,
}
enum LemmatizationRuleType { prefix, suffix }
class LemmatizationRule {
final String name;
final AllomorphPattern pattern;
final WordClass wordClass;
final List<WordClass>? validChildClasses;
final Set<WordClass>? validChildClasses;
final bool terminal;
const LemmatizationRule({
@@ -41,23 +41,44 @@ class LemmatizationRule {
required String pattern,
required String? replacement,
required WordClass wordClass,
validChildClasses,
terminal = false,
lookAheadBehind = const [''],
Set<WordClass>? validChildClasses,
bool terminal = false,
List<Pattern> lookAheadBehind = const [''],
LemmatizationRuleType type = LemmatizationRuleType.suffix,
}) : this(
name: name,
pattern: AllomorphPattern(
patterns: {
pattern: replacement != null ? [replacement] : null
},
type: type,
lookAheadBehind: lookAheadBehind,
),
validChildClasses: validChildClasses,
terminal: terminal,
wordClass: wordClass,
);
name: name,
pattern: AllomorphPattern(
patterns: {
pattern: replacement != null ? [replacement] : null,
},
type: type,
lookAheadBehind: lookAheadBehind,
),
validChildClasses: validChildClasses,
terminal: terminal,
wordClass: wordClass,
);
@override
int get hashCode => Object.hash(
name,
pattern,
wordClass,
validChildClasses,
terminal,
SetEquality().hash(validChildClasses),
);
@override
bool operator ==(Object other) {
if (identical(this, other)) return true;
return other is LemmatizationRule &&
other.name == name &&
other.pattern == pattern &&
other.wordClass == wordClass &&
other.terminal == terminal &&
SetEquality().equals(validChildClasses, other.validChildClasses);
}
}
/// Represents a set of patterns for matching allomorphs in a word.
@@ -74,6 +95,7 @@ class AllomorphPattern {
this.lookAheadBehind = const [''],
});
/// Convert the [patterns] into regexes
List<(String, Pattern)> get allPatternCombinations {
final combinations = <(String, Pattern)>[];
for (final l in lookAheadBehind) {
@@ -97,6 +119,7 @@ class AllomorphPattern {
return combinations;
}
/// Check whether an input string matches any of the [patterns]
bool matches(String word) {
for (final (_, p) in allPatternCombinations) {
if (p is String) {
@@ -114,6 +137,9 @@ class AllomorphPattern {
return false;
}
/// Apply the replacement for this pattern.
///
/// If none of the [patterns] apply, this function returns `null`.
List<String>? apply(String word) {
for (final (affix, p) in allPatternCombinations) {
switch ((type, p is RegExp)) {
@@ -132,8 +158,8 @@ class AllomorphPattern {
if (word.startsWith(p as String)) {
return patterns[affix] != null
? patterns[affix]!
.map((s) => s + word.substring(affix.length))
.toList()
.map((s) => s + word.substring(affix.length))
.toList()
: [word.substring(affix.length)];
}
break;
@@ -160,6 +186,22 @@ class AllomorphPattern {
}
return null;
}
@override
int get hashCode => Object.hash(
type,
ListEquality().hash(lookAheadBehind),
MapEquality().hash(patterns),
);
@override
bool operator ==(Object other) {
if (identical(this, other)) return true;
return other is AllomorphPattern &&
other.type == type &&
ListEquality().equals(other.lookAheadBehind, lookAheadBehind) &&
MapEquality().equals(other.patterns, patterns);
}
}
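The `apply` method above walks `allPatternCombinations` and rewrites the matched affix. The core suffix case can be sketched standalone (a minimal, self-contained illustration; `applySuffix` is a hypothetical helper, not part of the package):

```dart
// Strip a matching inflected ending and substitute each candidate base ending,
// mirroring the suffix branch of AllomorphPattern.apply.
List<String> applySuffix(String word, Map<String, List<String>> patterns) {
  for (final entry in patterns.entries) {
    if (word.endsWith(entry.key)) {
      final stem = word.substring(0, word.length - entry.key.length);
      return [for (final ending in entry.value) stem + ending];
    }
  }
  return const [];
}

void main() {
  // 書かない (negative) maps back to 書く via the godan rule 'かない' -> 'く'.
  print(applySuffix('書かない', {'かない': ['く']})); // [書く]
}
```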
class Lemmatized {
@@ -186,7 +228,7 @@ class Lemmatized {
@override
String toString() {
final childrenString = children
.map((c) => ' - ' + c.toString().split('\n').join('\n '))
.map((c) => ' - ${c.toString().split('\n').join('\n ')}')
.join('\n');
if (children.isEmpty) {
@@ -206,9 +248,10 @@ List<Lemmatized> _lemmatize(LemmatizationRule parentRule, String word) {
final filteredLemmatizationRules = parentRule.validChildClasses == null
? lemmatizationRules
: lemmatizationRules.where(
(r) => parentRule.validChildClasses!.contains(r.wordClass),
);
: [
for (final wordClass in parentRule.validChildClasses!)
...lemmatizationRulesByWordClass[wordClass]!,
];
for (final rule in filteredLemmatizationRules) {
if (rule.matches(word)) {
@@ -239,9 +282,6 @@ Lemmatized lemmatize(String word) {
return Lemmatized(
original: word,
rule: inputRule,
children: _lemmatize(
inputRule,
word,
),
children: _lemmatize(inputRule, word),
);
}
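The recursive `_lemmatize`/`lemmatize` pair above can be exercised like this (a hypothetical call site; assumes the `package:jadb` import path used elsewhere in this diff):

```dart
import 'package:jadb/util/lemmatizer/lemmatizer.dart';

void main() {
  // Walks the rule tree from the synthetic input rule down through
  // validChildClasses until a terminal rule (e.g. godan base form) matches.
  final Lemmatized result = lemmatize('書かなかった');
  print(result);
}
```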


@@ -1,10 +1,17 @@
import 'package:jadb/util/lemmatizer/lemmatizer.dart';
import 'package:jadb/util/lemmatizer/rules/godan-verbs.dart';
import 'package:jadb/util/lemmatizer/rules/i-adjectives.dart';
import 'package:jadb/util/lemmatizer/rules/ichidan-verbs.dart';
import 'package:jadb/util/lemmatizer/rules/godan_verbs.dart';
import 'package:jadb/util/lemmatizer/rules/i_adjectives.dart';
import 'package:jadb/util/lemmatizer/rules/ichidan_verbs.dart';
List<LemmatizationRule> lemmatizationRules = [
final List<LemmatizationRule> lemmatizationRules = List.unmodifiable([
...ichidanVerbLemmatizationRules,
...godanVerbLemmatizationRules,
...iAdjectiveLemmatizationRules,
];
]);
final Map<WordClass, List<LemmatizationRule>> lemmatizationRulesByWordClass =
Map.unmodifiable({
WordClass.ichidanVerb: ichidanVerbLemmatizationRules,
WordClass.iAdjective: iAdjectiveLemmatizationRules,
WordClass.godanVerb: godanVerbLemmatizationRules,
});


@@ -1,457 +0,0 @@
import 'package:jadb/util/lemmatizer/lemmatizer.dart';
List<LemmatizationRule> godanVerbLemmatizationRules = [
LemmatizationRule(
name: 'Godan verb - base form',
terminal: true,
pattern: AllomorphPattern(
patterns: {
'': [''],
'': [''],
'': [''],
'': [''],
'': [''],
'': [''],
'': [''],
'': [''],
'': [''],
},
type: LemmatizationRuleType.suffix,
),
validChildClasses: [WordClass.godanVerb],
wordClass: WordClass.godanVerb,
),
LemmatizationRule(
name: 'Godan verb - negative form',
pattern: AllomorphPattern(
patterns: {
'わない': ['う'],
'かない': ['く'],
'がない': ['ぐ'],
'さない': ['す'],
'たない': ['つ'],
'なない': ['ぬ'],
'ばない': ['ぶ'],
'まない': ['む'],
'らない': ['る'],
},
type: LemmatizationRuleType.suffix,
),
validChildClasses: [WordClass.godanVerb],
wordClass: WordClass.godanVerb,
),
LemmatizationRule(
name: 'Godan verb - past form',
pattern: AllomorphPattern(
patterns: {
'した': ['す'],
'った': ['う', 'つ', 'る'],
'んだ': ['ぬ', 'ぶ', 'む'],
'いだ': ['ぐ'],
'いた': ['く'],
},
type: LemmatizationRuleType.suffix,
),
validChildClasses: [WordClass.godanVerb],
wordClass: WordClass.godanVerb,
),
LemmatizationRule(
name: 'Godan verb - te-form',
pattern: AllomorphPattern(
patterns: {
'いて': ['', ''],
'して': [''],
'って': ['', '', ''],
'んで': ['', '', ''],
},
type: LemmatizationRuleType.suffix,
),
validChildClasses: [WordClass.godanVerb],
wordClass: WordClass.godanVerb,
),
LemmatizationRule(
name: 'Godan verb - te-form with いる',
pattern: AllomorphPattern(
patterns: {
'いている': ['', ''],
'している': [''],
'っている': ['', '', ''],
'んでいる': ['', '', ''],
},
type: LemmatizationRuleType.suffix,
),
validChildClasses: [WordClass.godanVerb],
wordClass: WordClass.godanVerb,
),
LemmatizationRule(
name: 'Godan verb - te-form with いた',
pattern: AllomorphPattern(
patterns: {
'いていた': ['', ''],
'していた': [''],
'っていた': ['', '', ''],
'んでいた': ['', '', ''],
},
type: LemmatizationRuleType.suffix,
),
validChildClasses: [WordClass.godanVerb],
wordClass: WordClass.godanVerb,
),
LemmatizationRule(
name: 'Godan verb - conditional form',
pattern: AllomorphPattern(
patterns: {
'けば': [''],
'げば': [''],
'せば': [''],
'てば': ['', '', ''],
'ねば': [''],
'べば': [''],
'めば': [''],
},
type: LemmatizationRuleType.suffix,
),
validChildClasses: [WordClass.godanVerb],
wordClass: WordClass.godanVerb,
),
LemmatizationRule(
name: 'Godan verb - volitional form',
pattern: AllomorphPattern(
patterns: {
'おう': [''],
'こう': [''],
'ごう': [''],
'そう': [''],
'とう': ['', '', ''],
'のう': [''],
'ぼう': [''],
'もう': [''],
},
type: LemmatizationRuleType.suffix,
),
validChildClasses: [WordClass.godanVerb],
wordClass: WordClass.godanVerb,
),
LemmatizationRule(
name: 'Godan verb - potential form',
pattern: AllomorphPattern(
patterns: {
'ける': [''],
'げる': [''],
'せる': [''],
'てる': ['', '', ''],
'ねる': [''],
'べる': [''],
'める': [''],
},
type: LemmatizationRuleType.suffix,
),
validChildClasses: [WordClass.godanVerb],
wordClass: WordClass.godanVerb,
),
LemmatizationRule(
name: 'Godan verb - passive form',
pattern: AllomorphPattern(
patterns: {
'かれる': [''],
'がれる': [''],
'される': [''],
'たれる': ['', '', ''],
'なれる': [''],
'ばれる': [''],
'まれる': [''],
},
type: LemmatizationRuleType.suffix,
),
validChildClasses: [WordClass.godanVerb],
wordClass: WordClass.godanVerb,
),
LemmatizationRule(
name: 'Godan verb - causative form',
pattern: AllomorphPattern(
patterns: {
'かせる': [''],
'がせる': [''],
'させる': [''],
'たせる': ['', '', ''],
'なせる': [''],
'ばせる': [''],
'ませる': [''],
},
type: LemmatizationRuleType.suffix,
),
validChildClasses: [WordClass.godanVerb],
wordClass: WordClass.godanVerb,
),
LemmatizationRule(
name: 'Godan verb - causative-passive form',
pattern: AllomorphPattern(
patterns: {
'かされる': [''],
'がされる': [''],
'される': [''],
'たされる': ['', '', ''],
'なされる': [''],
'ばされる': [''],
'まされる': [''],
},
type: LemmatizationRuleType.suffix,
),
validChildClasses: [WordClass.godanVerb],
wordClass: WordClass.godanVerb,
),
LemmatizationRule(
name: 'Godan verb - imperative form',
pattern: AllomorphPattern(
patterns: {
'': [''],
'': [''],
'': [''],
'': [''],
'': ['', '', ''],
'': [''],
'': [''],
'': [''],
},
type: LemmatizationRuleType.suffix,
),
validChildClasses: [WordClass.godanVerb],
wordClass: WordClass.godanVerb,
),
LemmatizationRule(
name: 'Godan verb - negative past form',
pattern: AllomorphPattern(
patterns: {
'わなかった': ['う'],
'かなかった': ['く'],
'がなかった': ['ぐ'],
'さなかった': ['す'],
'たなかった': ['つ'],
'ななかった': ['ぬ'],
'ばなかった': ['ぶ'],
'まなかった': ['む'],
'らなかった': ['る'],
},
type: LemmatizationRuleType.suffix,
),
validChildClasses: [WordClass.godanVerb],
wordClass: WordClass.godanVerb,
),
LemmatizationRule(
name: 'Godan verb - negative te-form',
pattern: AllomorphPattern(
patterns: {
'わなくて': ['う'],
'かなくて': ['く'],
'がなくて': ['ぐ'],
'さなくて': ['す'],
'たなくて': ['つ'],
'ななくて': ['ぬ'],
'ばなくて': ['ぶ'],
'まなくて': ['む'],
'らなくて': ['る'],
},
type: LemmatizationRuleType.suffix,
),
validChildClasses: [WordClass.godanVerb],
wordClass: WordClass.godanVerb,
),
LemmatizationRule(
name: 'Godan verb - negative conditional form',
pattern: AllomorphPattern(
patterns: {
'わなければ': ['う'],
'かなければ': ['く'],
'がなければ': ['ぐ'],
'さなければ': ['す'],
'たなければ': ['つ'],
'ななければ': ['ぬ'],
'ばなければ': ['ぶ'],
'まなければ': ['む'],
'らなければ': ['る'],
},
type: LemmatizationRuleType.suffix,
),
validChildClasses: [WordClass.godanVerb],
wordClass: WordClass.godanVerb,
),
LemmatizationRule(
name: 'Godan verb - negative volitional form',
pattern: AllomorphPattern(
patterns: {
'うまい': [''],
'くまい': [''],
'ぐまい': [''],
'すまい': [''],
'つまい': ['', '', ''],
'ぬまい': [''],
'ぶまい': [''],
'むまい': [''],
},
type: LemmatizationRuleType.suffix,
),
validChildClasses: [WordClass.godanVerb],
wordClass: WordClass.godanVerb,
),
LemmatizationRule(
name: 'Godan verb - negative potential form',
pattern: AllomorphPattern(
patterns: {
'けない': [''],
'げない': [''],
'せない': [''],
'てない': ['', '', ''],
'ねない': [''],
'べない': [''],
'めない': [''],
},
type: LemmatizationRuleType.suffix,
),
validChildClasses: [WordClass.godanVerb],
wordClass: WordClass.godanVerb,
),
LemmatizationRule(
name: 'Godan verb - negative passive form',
pattern: AllomorphPattern(
patterns: {
'かれない': [''],
'がれない': [''],
'されない': [''],
'たれない': ['', '', ''],
'なれない': [''],
'ばれない': [''],
'まれない': [''],
},
type: LemmatizationRuleType.suffix,
),
validChildClasses: [WordClass.godanVerb],
wordClass: WordClass.godanVerb,
),
LemmatizationRule(
name: 'Godan verb - negative causative form',
pattern: AllomorphPattern(
patterns: {
'かせない': [''],
'がせない': [''],
'させない': [''],
'たせない': ['', '', ''],
'なせない': [''],
'ばせない': [''],
'ませない': [''],
},
type: LemmatizationRuleType.suffix,
),
validChildClasses: [WordClass.godanVerb],
wordClass: WordClass.godanVerb,
),
LemmatizationRule(
name: 'Godan verb - negative causative-passive form',
pattern: AllomorphPattern(
patterns: {
'かされない': [''],
'がされない': [''],
'されない': [''],
'たされない': ['', '', ''],
'なされない': [''],
'ばされない': [''],
'まされない': [''],
},
type: LemmatizationRuleType.suffix,
),
validChildClasses: [WordClass.godanVerb],
wordClass: WordClass.godanVerb,
),
LemmatizationRule(
name: 'Godan verb - negative imperative form',
pattern: AllomorphPattern(
patterns: {
'うな': ['う'],
'くな': ['く'],
'ぐな': ['ぐ'],
'すな': ['す'],
'つな': ['つ'],
'ぬな': ['ぬ'],
'ぶな': ['ぶ'],
'むな': ['む'],
'るな': ['る'],
},
type: LemmatizationRuleType.suffix,
),
validChildClasses: [WordClass.godanVerb],
wordClass: WordClass.godanVerb,
),
LemmatizationRule(
name: 'Godan verb - desire form',
pattern: AllomorphPattern(
patterns: {
'きたい': ['く'],
'ぎたい': ['ぐ'],
'したい': ['す'],
'ちたい': ['つ'],
'にたい': ['ぬ'],
'びたい': ['ぶ'],
'みたい': ['む'],
'りたい': ['る'],
},
type: LemmatizationRuleType.suffix,
),
validChildClasses: [WordClass.godanVerb],
wordClass: WordClass.godanVerb,
),
LemmatizationRule(
name: 'Godan verb - negative desire form',
pattern: AllomorphPattern(
patterns: {
'いたくない': ['う'],
'きたくない': ['く'],
'ぎたくない': ['ぐ'],
'したくない': ['す'],
'ちたくない': ['つ'],
'にたくない': ['ぬ'],
'びたくない': ['ぶ'],
'みたくない': ['む'],
'りたくない': ['る'],
},
type: LemmatizationRuleType.suffix,
),
validChildClasses: [WordClass.godanVerb],
wordClass: WordClass.godanVerb,
),
LemmatizationRule(
name: 'Godan verb - past desire form',
pattern: AllomorphPattern(
patterns: {
'きたかった': ['く'],
'ぎたかった': ['ぐ'],
'したかった': ['す'],
'ちたかった': ['つ'],
'にたかった': ['ぬ'],
'びたかった': ['ぶ'],
'みたかった': ['む'],
'りたかった': ['る'],
},
type: LemmatizationRuleType.suffix,
),
validChildClasses: [WordClass.godanVerb],
wordClass: WordClass.godanVerb,
),
LemmatizationRule(
name: 'Godan verb - negative past desire form',
pattern: AllomorphPattern(
patterns: {
'いたくなかった': ['う'],
'きたくなかった': ['く'],
'ぎたくなかった': ['ぐ'],
'したくなかった': ['す'],
'ちたくなかった': ['つ'],
'にたくなかった': ['ぬ'],
'びたくなかった': ['ぶ'],
'みたくなかった': ['む'],
'りたくなかった': ['る'],
},
type: LemmatizationRuleType.suffix,
),
validChildClasses: [WordClass.godanVerb],
wordClass: WordClass.godanVerb,
),
];


@@ -0,0 +1,509 @@
import 'package:jadb/util/lemmatizer/lemmatizer.dart';
final LemmatizationRule godanVerbBase = LemmatizationRule(
name: 'Godan verb - base form',
terminal: true,
pattern: AllomorphPattern(
patterns: {
'': [''],
'': [''],
'': [''],
'': [''],
'': [''],
'': [''],
'': [''],
'': [''],
'': [''],
},
type: LemmatizationRuleType.suffix,
),
validChildClasses: {WordClass.godanVerb},
wordClass: WordClass.godanVerb,
);
final LemmatizationRule godanVerbNegative = LemmatizationRule(
name: 'Godan verb - negative form',
pattern: AllomorphPattern(
patterns: {
'わない': ['う'],
'かない': ['く'],
'がない': ['ぐ'],
'さない': ['す'],
'たない': ['つ'],
'なない': ['ぬ'],
'ばない': ['ぶ'],
'まない': ['む'],
'らない': ['る'],
},
type: LemmatizationRuleType.suffix,
),
validChildClasses: {WordClass.godanVerb},
wordClass: WordClass.godanVerb,
);
final LemmatizationRule godanVerbPast = LemmatizationRule(
name: 'Godan verb - past form',
pattern: AllomorphPattern(
patterns: {
'した': ['す'],
'った': ['う', 'つ', 'る'],
'んだ': ['ぬ', 'ぶ', 'む'],
'いだ': ['ぐ'],
'いた': ['く'],
},
type: LemmatizationRuleType.suffix,
),
validChildClasses: {WordClass.godanVerb},
wordClass: WordClass.godanVerb,
);
final LemmatizationRule godanVerbTe = LemmatizationRule(
name: 'Godan verb - te-form',
pattern: AllomorphPattern(
patterns: {
'いて': ['', ''],
'して': [''],
'って': ['', '', ''],
'んで': ['', '', ''],
},
type: LemmatizationRuleType.suffix,
),
validChildClasses: {WordClass.godanVerb},
wordClass: WordClass.godanVerb,
);
final LemmatizationRule godanVerbTeiru = LemmatizationRule(
name: 'Godan verb - te-form with いる',
pattern: AllomorphPattern(
patterns: {
'いている': ['', ''],
'している': [''],
'っている': ['', '', ''],
'んでいる': ['', '', ''],
},
type: LemmatizationRuleType.suffix,
),
validChildClasses: {WordClass.godanVerb},
wordClass: WordClass.godanVerb,
);
final LemmatizationRule godanVerbTeita = LemmatizationRule(
name: 'Godan verb - te-form with いた',
pattern: AllomorphPattern(
patterns: {
'いていた': ['', ''],
'していた': [''],
'っていた': ['', '', ''],
'んでいた': ['', '', ''],
},
type: LemmatizationRuleType.suffix,
),
validChildClasses: {WordClass.godanVerb},
wordClass: WordClass.godanVerb,
);
final LemmatizationRule godanVerbConditional = LemmatizationRule(
name: 'Godan verb - conditional form',
pattern: AllomorphPattern(
patterns: {
'けば': [''],
'げば': [''],
'せば': [''],
'てば': ['', '', ''],
'ねば': [''],
'べば': [''],
'めば': [''],
},
type: LemmatizationRuleType.suffix,
),
validChildClasses: {WordClass.godanVerb},
wordClass: WordClass.godanVerb,
);
final LemmatizationRule godanVerbVolitional = LemmatizationRule(
name: 'Godan verb - volitional form',
pattern: AllomorphPattern(
patterns: {
'おう': [''],
'こう': [''],
'ごう': [''],
'そう': [''],
'とう': ['', '', ''],
'のう': [''],
'ぼう': [''],
'もう': [''],
},
type: LemmatizationRuleType.suffix,
),
validChildClasses: {WordClass.godanVerb},
wordClass: WordClass.godanVerb,
);
final LemmatizationRule godanVerbPotential = LemmatizationRule(
name: 'Godan verb - potential form',
pattern: AllomorphPattern(
patterns: {
'ける': [''],
'げる': [''],
'せる': [''],
'てる': ['', '', ''],
'ねる': [''],
'べる': [''],
'める': [''],
},
type: LemmatizationRuleType.suffix,
),
validChildClasses: {WordClass.godanVerb},
wordClass: WordClass.godanVerb,
);
final LemmatizationRule godanVerbPassive = LemmatizationRule(
name: 'Godan verb - passive form',
pattern: AllomorphPattern(
patterns: {
'かれる': [''],
'がれる': [''],
'される': [''],
'たれる': ['', '', ''],
'なれる': [''],
'ばれる': [''],
'まれる': [''],
},
type: LemmatizationRuleType.suffix,
),
validChildClasses: {WordClass.godanVerb},
wordClass: WordClass.godanVerb,
);
final LemmatizationRule godanVerbCausative = LemmatizationRule(
name: 'Godan verb - causative form',
pattern: AllomorphPattern(
patterns: {
'かせる': [''],
'がせる': [''],
'させる': [''],
'たせる': ['', '', ''],
'なせる': [''],
'ばせる': [''],
'ませる': [''],
},
type: LemmatizationRuleType.suffix,
),
validChildClasses: {WordClass.godanVerb},
wordClass: WordClass.godanVerb,
);
final LemmatizationRule godanVerbCausativePassive = LemmatizationRule(
name: 'Godan verb - causative-passive form',
pattern: AllomorphPattern(
patterns: {
'かされる': [''],
'がされる': [''],
'される': [''],
'たされる': ['', '', ''],
'なされる': [''],
'ばされる': [''],
'まされる': [''],
},
type: LemmatizationRuleType.suffix,
),
validChildClasses: {WordClass.godanVerb},
wordClass: WordClass.godanVerb,
);
final LemmatizationRule godanVerbImperative = LemmatizationRule(
name: 'Godan verb - imperative form',
pattern: AllomorphPattern(
patterns: {
'': [''],
'': [''],
'': [''],
'': [''],
'': ['', '', ''],
'': [''],
'': [''],
'': [''],
},
type: LemmatizationRuleType.suffix,
),
validChildClasses: {WordClass.godanVerb},
wordClass: WordClass.godanVerb,
);
final LemmatizationRule godanVerbNegativePast = LemmatizationRule(
name: 'Godan verb - negative past form',
pattern: AllomorphPattern(
patterns: {
'わなかった': ['う'],
'かなかった': ['く'],
'がなかった': ['ぐ'],
'さなかった': ['す'],
'たなかった': ['つ'],
'ななかった': ['ぬ'],
'ばなかった': ['ぶ'],
'まなかった': ['む'],
'らなかった': ['る'],
},
type: LemmatizationRuleType.suffix,
),
validChildClasses: {WordClass.godanVerb},
wordClass: WordClass.godanVerb,
);
final LemmatizationRule godanVerbNegativeTe = LemmatizationRule(
name: 'Godan verb - negative te-form',
pattern: AllomorphPattern(
patterns: {
'わなくて': ['う'],
'かなくて': ['く'],
'がなくて': ['ぐ'],
'さなくて': ['す'],
'たなくて': ['つ'],
'ななくて': ['ぬ'],
'ばなくて': ['ぶ'],
'まなくて': ['む'],
'らなくて': ['る'],
},
type: LemmatizationRuleType.suffix,
),
validChildClasses: {WordClass.godanVerb},
wordClass: WordClass.godanVerb,
);
final LemmatizationRule godanVerbNegativeConditional = LemmatizationRule(
name: 'Godan verb - negative conditional form',
pattern: AllomorphPattern(
patterns: {
'わなければ': ['う'],
'かなければ': ['く'],
'がなければ': ['ぐ'],
'さなければ': ['す'],
'たなければ': ['つ'],
'ななければ': ['ぬ'],
'ばなければ': ['ぶ'],
'まなければ': ['む'],
'らなければ': ['る'],
},
type: LemmatizationRuleType.suffix,
),
validChildClasses: {WordClass.godanVerb},
wordClass: WordClass.godanVerb,
);
final LemmatizationRule godanVerbNegativeVolitional = LemmatizationRule(
name: 'Godan verb - negative volitional form',
pattern: AllomorphPattern(
patterns: {
'うまい': [''],
'くまい': [''],
'ぐまい': [''],
'すまい': [''],
'つまい': ['', '', ''],
'ぬまい': [''],
'ぶまい': [''],
'むまい': [''],
},
type: LemmatizationRuleType.suffix,
),
validChildClasses: {WordClass.godanVerb},
wordClass: WordClass.godanVerb,
);
final LemmatizationRule godanVerbNegativePotential = LemmatizationRule(
name: 'Godan verb - negative potential form',
pattern: AllomorphPattern(
patterns: {
'けない': [''],
'げない': [''],
'せない': [''],
'てない': ['', '', ''],
'ねない': [''],
'べない': [''],
'めない': [''],
},
type: LemmatizationRuleType.suffix,
),
validChildClasses: {WordClass.godanVerb},
wordClass: WordClass.godanVerb,
);
final LemmatizationRule godanVerbNegativePassive = LemmatizationRule(
name: 'Godan verb - negative passive form',
pattern: AllomorphPattern(
patterns: {
'かれない': [''],
'がれない': [''],
'されない': [''],
'たれない': ['', '', ''],
'なれない': [''],
'ばれない': [''],
'まれない': [''],
},
type: LemmatizationRuleType.suffix,
),
validChildClasses: {WordClass.godanVerb},
wordClass: WordClass.godanVerb,
);
final LemmatizationRule godanVerbNegativeCausative = LemmatizationRule(
name: 'Godan verb - negative causative form',
pattern: AllomorphPattern(
patterns: {
'かせない': [''],
'がせない': [''],
'させない': [''],
'たせない': ['', '', ''],
'なせない': [''],
'ばせない': [''],
'ませない': [''],
},
type: LemmatizationRuleType.suffix,
),
validChildClasses: {WordClass.godanVerb},
wordClass: WordClass.godanVerb,
);
final LemmatizationRule godanVerbNegativeCausativePassive = LemmatizationRule(
name: 'Godan verb - negative causative-passive form',
pattern: AllomorphPattern(
patterns: {
'かされない': ['く'],
'がされない': ['ぐ'],
'されない': ['す'],
'たされない': ['', '', ''],
'なされない': ['ぬ'],
'ばされない': ['ぶ'],
'まされない': ['む'],
},
type: LemmatizationRuleType.suffix,
),
validChildClasses: {WordClass.godanVerb},
wordClass: WordClass.godanVerb,
);
final LemmatizationRule godanVerbNegativeImperative = LemmatizationRule(
name: 'Godan verb - negative imperative form',
pattern: AllomorphPattern(
patterns: {
'うな': ['う'],
'くな': ['く'],
'ぐな': ['ぐ'],
'すな': ['す'],
'つな': ['つ'],
'ぬな': ['ぬ'],
'ぶな': ['ぶ'],
'むな': ['む'],
'るな': ['る'],
},
type: LemmatizationRuleType.suffix,
),
validChildClasses: {WordClass.godanVerb},
wordClass: WordClass.godanVerb,
);
final LemmatizationRule godanVerbDesire = LemmatizationRule(
name: 'Godan verb - desire form',
pattern: AllomorphPattern(
patterns: {
'きたい': ['く'],
'ぎたい': ['ぐ'],
'したい': ['す'],
'ちたい': ['つ'],
'にたい': ['ぬ'],
'びたい': ['ぶ'],
'みたい': ['む'],
'りたい': ['る'],
},
type: LemmatizationRuleType.suffix,
),
validChildClasses: {WordClass.godanVerb},
wordClass: WordClass.godanVerb,
);
final LemmatizationRule godanVerbNegativeDesire = LemmatizationRule(
name: 'Godan verb - negative desire form',
pattern: AllomorphPattern(
patterns: {
'いたくない': ['う'],
'きたくない': ['く'],
'ぎたくない': ['ぐ'],
'したくない': ['す'],
'ちたくない': ['つ'],
'にたくない': ['ぬ'],
'びたくない': ['ぶ'],
'みたくない': ['む'],
'りたくない': ['る'],
},
type: LemmatizationRuleType.suffix,
),
validChildClasses: {WordClass.godanVerb},
wordClass: WordClass.godanVerb,
);
final LemmatizationRule godanVerbPastDesire = LemmatizationRule(
name: 'Godan verb - past desire form',
pattern: AllomorphPattern(
patterns: {
'きたかった': ['く'],
'ぎたかった': ['ぐ'],
'したかった': ['す'],
'ちたかった': ['つ'],
'にたかった': ['ぬ'],
'びたかった': ['ぶ'],
'みたかった': ['む'],
'りたかった': ['る'],
},
type: LemmatizationRuleType.suffix,
),
validChildClasses: {WordClass.godanVerb},
wordClass: WordClass.godanVerb,
);
final LemmatizationRule godanVerbNegativePastDesire = LemmatizationRule(
name: 'Godan verb - negative past desire form',
pattern: AllomorphPattern(
patterns: {
'いたくなかった': ['う'],
'きたくなかった': ['く'],
'ぎたくなかった': ['ぐ'],
'したくなかった': ['す'],
'ちたくなかった': ['つ'],
'にたくなかった': ['ぬ'],
'びたくなかった': ['ぶ'],
'みたくなかった': ['む'],
'りたくなかった': ['る'],
},
type: LemmatizationRuleType.suffix,
),
validChildClasses: {WordClass.godanVerb},
wordClass: WordClass.godanVerb,
);
final List<LemmatizationRule> godanVerbLemmatizationRules = List.unmodifiable([
godanVerbBase,
godanVerbNegative,
godanVerbPast,
godanVerbTe,
godanVerbTeiru,
godanVerbTeita,
godanVerbConditional,
godanVerbVolitional,
godanVerbPotential,
godanVerbPassive,
godanVerbCausative,
godanVerbCausativePassive,
godanVerbImperative,
godanVerbNegativePast,
godanVerbNegativeTe,
godanVerbNegativeConditional,
godanVerbNegativeVolitional,
godanVerbNegativePotential,
godanVerbNegativePassive,
godanVerbNegativeCausative,
godanVerbNegativeCausativePassive,
godanVerbNegativeImperative,
godanVerbDesire,
godanVerbNegativeDesire,
godanVerbPastDesire,
godanVerbNegativePastDesire,
]);
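The suffix tables above pair an inflected godan ending with the dictionary-form endings that may replace it. A minimal sketch of how such a suffix rule can be applied — written in Python for testability; the rule excerpt and function name are illustrative and not the actual jadb API:

```python
# Sketch of suffix-based lemmatization: strip a matching inflected suffix
# and append each candidate dictionary-form ending. The table below is a
# small excerpt of the negative-conditional paradigm, not the full rule set.
NEGATIVE_CONDITIONAL = {
    'わなければ': ['う'],
    'かなければ': ['く'],
    'さなければ': ['す'],
}

def lemmatize(word: str, patterns: dict[str, list[str]]) -> list[str]:
    """Return candidate lemmas for `word` under the given suffix patterns."""
    candidates = []
    for suffix, endings in patterns.items():
        if word.endswith(suffix):
            stem = word[: len(word) - len(suffix)]
            candidates.extend(stem + ending for ending in endings)
    return candidates

print(lemmatize('買わなければ', NEGATIVE_CONDITIONAL))  # ['買う']
```

A surface form can match several rules across word classes, which is why each rule also carries `wordClass`/`validChildClasses`: candidates from every rule are collected and then validated against the dictionary.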

View File

@@ -1,61 +0,0 @@
import 'package:jadb/util/lemmatizer/lemmatizer.dart';
List<LemmatizationRule> iAdjectiveLemmatizationRules = [
LemmatizationRule.simple(
name: 'I adjective - base form',
terminal: true,
pattern: 'い',
replacement: 'い',
validChildClasses: [WordClass.iAdjective],
wordClass: WordClass.iAdjective,
),
LemmatizationRule.simple(
name: 'I adjective - negative form',
pattern: 'くない',
replacement: 'い',
validChildClasses: [WordClass.iAdjective],
wordClass: WordClass.iAdjective,
),
LemmatizationRule.simple(
name: 'I adjective - past form',
pattern: 'かった',
replacement: 'い',
validChildClasses: [WordClass.iAdjective],
wordClass: WordClass.iAdjective,
),
LemmatizationRule.simple(
name: 'I adjective - negative past form',
pattern: 'くなかった',
replacement: 'い',
validChildClasses: [WordClass.iAdjective],
wordClass: WordClass.iAdjective,
),
LemmatizationRule.simple(
name: 'I adjective - te-form',
pattern: 'くて',
replacement: 'い',
validChildClasses: [WordClass.iAdjective],
wordClass: WordClass.iAdjective,
),
LemmatizationRule.simple(
name: 'I adjective - conditional form',
pattern: 'ければ',
replacement: 'い',
validChildClasses: [WordClass.iAdjective],
wordClass: WordClass.iAdjective,
),
LemmatizationRule.simple(
name: 'I adjective - volitional form',
pattern: 'かろう',
replacement: 'い',
validChildClasses: [WordClass.iAdjective],
wordClass: WordClass.iAdjective,
),
LemmatizationRule.simple(
name: 'I adjective - continuative form',
pattern: 'く',
replacement: 'い',
validChildClasses: [WordClass.iAdjective],
wordClass: WordClass.iAdjective,
),
];

View File

@@ -0,0 +1,77 @@
import 'package:jadb/util/lemmatizer/lemmatizer.dart';
final LemmatizationRule iAdjectiveBase = LemmatizationRule.simple(
name: 'I adjective - base form',
terminal: true,
pattern: 'い',
replacement: 'い',
validChildClasses: {WordClass.iAdjective},
wordClass: WordClass.iAdjective,
);
final LemmatizationRule iAdjectiveNegative = LemmatizationRule.simple(
name: 'I adjective - negative form',
pattern: 'くない',
replacement: 'い',
validChildClasses: {WordClass.iAdjective},
wordClass: WordClass.iAdjective,
);
final LemmatizationRule iAdjectivePast = LemmatizationRule.simple(
name: 'I adjective - past form',
pattern: 'かった',
replacement: 'い',
validChildClasses: {WordClass.iAdjective},
wordClass: WordClass.iAdjective,
);
final LemmatizationRule iAdjectiveNegativePast = LemmatizationRule.simple(
name: 'I adjective - negative past form',
pattern: 'くなかった',
replacement: 'い',
validChildClasses: {WordClass.iAdjective},
wordClass: WordClass.iAdjective,
);
final LemmatizationRule iAdjectiveTe = LemmatizationRule.simple(
name: 'I adjective - te-form',
pattern: 'くて',
replacement: 'い',
validChildClasses: {WordClass.iAdjective},
wordClass: WordClass.iAdjective,
);
final LemmatizationRule iAdjectiveConditional = LemmatizationRule.simple(
name: 'I adjective - conditional form',
pattern: 'ければ',
replacement: 'い',
validChildClasses: {WordClass.iAdjective},
wordClass: WordClass.iAdjective,
);
final LemmatizationRule iAdjectiveVolitional = LemmatizationRule.simple(
name: 'I adjective - volitional form',
pattern: 'かろう',
replacement: 'い',
validChildClasses: {WordClass.iAdjective},
wordClass: WordClass.iAdjective,
);
final LemmatizationRule iAdjectiveContinuative = LemmatizationRule.simple(
name: 'I adjective - continuative form',
pattern: 'く',
replacement: 'い',
validChildClasses: {WordClass.iAdjective},
wordClass: WordClass.iAdjective,
);
final List<LemmatizationRule> iAdjectiveLemmatizationRules = List.unmodifiable([
iAdjectiveBase,
iAdjectiveNegative,
iAdjectivePast,
iAdjectiveNegativePast,
iAdjectiveTe,
iAdjectiveConditional,
iAdjectiveVolitional,
iAdjectiveContinuative,
]);

View File

@@ -1,241 +0,0 @@
import 'package:jadb/util/lemmatizer/lemmatizer.dart';
import 'package:jadb/util/text_filtering.dart';
List<Pattern> lookBehinds = [
kanjiRegex,
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
];
List<LemmatizationRule> ichidanVerbLemmatizationRules = [
LemmatizationRule.simple(
name: 'Ichidan verb - base form',
terminal: true,
pattern: 'る',
replacement: 'る',
lookAheadBehind: lookBehinds,
validChildClasses: [WordClass.ichidanVerb],
wordClass: WordClass.ichidanVerb,
),
LemmatizationRule.simple(
name: 'Ichidan verb - negative form',
pattern: 'ない',
replacement: 'る',
lookAheadBehind: lookBehinds,
validChildClasses: [WordClass.ichidanVerb],
wordClass: WordClass.ichidanVerb,
),
LemmatizationRule.simple(
name: 'Ichidan verb - past form',
pattern: 'た',
replacement: 'る',
lookAheadBehind: lookBehinds,
validChildClasses: [WordClass.ichidanVerb],
wordClass: WordClass.ichidanVerb,
),
LemmatizationRule.simple(
name: 'Ichidan verb - te-form',
pattern: 'て',
replacement: 'る',
lookAheadBehind: lookBehinds,
validChildClasses: [WordClass.ichidanVerb],
wordClass: WordClass.ichidanVerb,
),
LemmatizationRule.simple(
name: 'Ichidan verb - te-form with いる',
pattern: 'ている',
replacement: 'る',
lookAheadBehind: lookBehinds,
validChildClasses: [WordClass.ichidanVerb],
wordClass: WordClass.ichidanVerb,
),
LemmatizationRule.simple(
name: 'Ichidan verb - te-form with いた',
pattern: 'ていた',
replacement: 'る',
lookAheadBehind: lookBehinds,
validChildClasses: [WordClass.ichidanVerb],
wordClass: WordClass.ichidanVerb,
),
LemmatizationRule.simple(
name: 'Ichidan verb - conditional form',
pattern: 'れば',
replacement: 'る',
lookAheadBehind: lookBehinds,
validChildClasses: [WordClass.ichidanVerb],
wordClass: WordClass.ichidanVerb,
),
LemmatizationRule.simple(
name: 'Ichidan verb - volitional form',
pattern: 'よう',
replacement: 'る',
lookAheadBehind: lookBehinds,
validChildClasses: [WordClass.ichidanVerb],
wordClass: WordClass.ichidanVerb,
),
LemmatizationRule.simple(
name: 'Ichidan verb - potential form',
pattern: 'られる',
replacement: 'る',
lookAheadBehind: lookBehinds,
validChildClasses: [WordClass.ichidanVerb],
wordClass: WordClass.ichidanVerb,
),
LemmatizationRule.simple(
name: 'Ichidan verb - passive form',
pattern: 'られる',
replacement: 'る',
lookAheadBehind: lookBehinds,
validChildClasses: [WordClass.ichidanVerb],
wordClass: WordClass.ichidanVerb,
),
LemmatizationRule.simple(
name: 'Ichidan verb - causative form',
pattern: 'させる',
replacement: 'る',
lookAheadBehind: lookBehinds,
validChildClasses: [WordClass.ichidanVerb],
wordClass: WordClass.ichidanVerb,
),
LemmatizationRule.simple(
name: 'Ichidan verb - causative passive form',
pattern: 'させられる',
replacement: 'る',
lookAheadBehind: lookBehinds,
validChildClasses: [WordClass.ichidanVerb],
wordClass: WordClass.ichidanVerb,
),
LemmatizationRule.simple(
name: 'Ichidan verb - imperative form',
pattern: 'ろ',
replacement: 'る',
lookAheadBehind: lookBehinds,
validChildClasses: [WordClass.ichidanVerb],
wordClass: WordClass.ichidanVerb,
),
LemmatizationRule.simple(
name: 'Ichidan verb - negative past form',
pattern: 'なかった',
replacement: 'る',
lookAheadBehind: lookBehinds,
validChildClasses: [WordClass.ichidanVerb],
wordClass: WordClass.ichidanVerb,
),
LemmatizationRule.simple(
name: 'Ichidan verb - negative te-form',
pattern: 'なくて',
replacement: 'る',
lookAheadBehind: lookBehinds,
validChildClasses: [WordClass.ichidanVerb],
wordClass: WordClass.ichidanVerb,
),
LemmatizationRule.simple(
name: 'Ichidan verb - negative conditional form',
pattern: 'なければ',
replacement: 'る',
lookAheadBehind: lookBehinds,
validChildClasses: [WordClass.ichidanVerb],
wordClass: WordClass.ichidanVerb,
),
LemmatizationRule.simple(
name: 'Ichidan verb - negative volitional form',
pattern: 'なかろう',
replacement: 'る',
lookAheadBehind: lookBehinds,
validChildClasses: [WordClass.ichidanVerb],
wordClass: WordClass.ichidanVerb,
),
LemmatizationRule.simple(
name: 'Ichidan verb - negative potential form',
pattern: 'られない',
replacement: 'る',
lookAheadBehind: lookBehinds,
validChildClasses: [WordClass.ichidanVerb],
wordClass: WordClass.ichidanVerb,
),
LemmatizationRule.simple(
name: 'Ichidan verb - negative passive form',
pattern: 'られない',
replacement: 'る',
lookAheadBehind: lookBehinds,
validChildClasses: [WordClass.ichidanVerb],
wordClass: WordClass.ichidanVerb,
),
LemmatizationRule.simple(
name: 'Ichidan verb - negative causative form',
pattern: 'させない',
replacement: 'る',
lookAheadBehind: lookBehinds,
validChildClasses: [WordClass.ichidanVerb],
wordClass: WordClass.ichidanVerb,
),
LemmatizationRule.simple(
name: 'Ichidan verb - negative causative passive form',
pattern: 'させられない',
replacement: 'る',
lookAheadBehind: lookBehinds,
validChildClasses: [WordClass.ichidanVerb],
wordClass: WordClass.ichidanVerb,
),
LemmatizationRule.simple(
name: 'Ichidan verb - negative imperative form',
pattern: 'るな',
replacement: 'る',
lookAheadBehind: lookBehinds,
validChildClasses: [WordClass.ichidanVerb],
wordClass: WordClass.ichidanVerb,
),
LemmatizationRule.simple(
name: 'Ichidan verb - desire form',
pattern: 'たい',
replacement: 'る',
lookAheadBehind: lookBehinds,
validChildClasses: [WordClass.ichidanVerb],
wordClass: WordClass.ichidanVerb,
),
LemmatizationRule.simple(
name: 'Ichidan verb - negative desire form',
pattern: 'たくない',
replacement: 'る',
lookAheadBehind: lookBehinds,
validChildClasses: [WordClass.ichidanVerb],
wordClass: WordClass.ichidanVerb,
),
LemmatizationRule.simple(
name: 'Ichidan verb - past desire form',
pattern: 'たかった',
replacement: 'る',
lookAheadBehind: lookBehinds,
validChildClasses: [WordClass.ichidanVerb],
wordClass: WordClass.ichidanVerb,
),
LemmatizationRule.simple(
name: 'Ichidan verb - negative past desire form',
pattern: 'たくなかった',
replacement: 'る',
lookAheadBehind: lookBehinds,
validChildClasses: [WordClass.ichidanVerb],
wordClass: WordClass.ichidanVerb,
),
];

View File

@@ -0,0 +1,331 @@
import 'package:jadb/util/lemmatizer/lemmatizer.dart';
import 'package:jadb/util/text_filtering.dart';
final List<Pattern> _lookBehinds = [
kanjiRegex,
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
];
final LemmatizationRule ichidanVerbBase = LemmatizationRule.simple(
name: 'Ichidan verb - base form',
terminal: true,
pattern: 'る',
replacement: 'る',
lookAheadBehind: _lookBehinds,
validChildClasses: {WordClass.ichidanVerb},
wordClass: WordClass.ichidanVerb,
);
final LemmatizationRule ichidanVerbNegative = LemmatizationRule.simple(
name: 'Ichidan verb - negative form',
pattern: 'ない',
replacement: 'る',
lookAheadBehind: _lookBehinds,
validChildClasses: {WordClass.ichidanVerb},
wordClass: WordClass.ichidanVerb,
);
final LemmatizationRule ichidanVerbPast = LemmatizationRule.simple(
name: 'Ichidan verb - past form',
pattern: 'た',
replacement: 'る',
lookAheadBehind: _lookBehinds,
validChildClasses: {WordClass.ichidanVerb},
wordClass: WordClass.ichidanVerb,
);
final LemmatizationRule ichidanVerbTe = LemmatizationRule.simple(
name: 'Ichidan verb - te-form',
pattern: 'て',
replacement: 'る',
lookAheadBehind: _lookBehinds,
validChildClasses: {WordClass.ichidanVerb},
wordClass: WordClass.ichidanVerb,
);
final LemmatizationRule ichidanVerbTeiru = LemmatizationRule.simple(
name: 'Ichidan verb - te-form with いる',
pattern: 'ている',
replacement: 'る',
lookAheadBehind: _lookBehinds,
validChildClasses: {WordClass.ichidanVerb},
wordClass: WordClass.ichidanVerb,
);
final LemmatizationRule ichidanVerbTeita = LemmatizationRule.simple(
name: 'Ichidan verb - te-form with いた',
pattern: 'ていた',
replacement: 'る',
lookAheadBehind: _lookBehinds,
validChildClasses: {WordClass.ichidanVerb},
wordClass: WordClass.ichidanVerb,
);
final LemmatizationRule ichidanVerbConditional = LemmatizationRule.simple(
name: 'Ichidan verb - conditional form',
pattern: 'れば',
replacement: 'る',
lookAheadBehind: _lookBehinds,
validChildClasses: {WordClass.ichidanVerb},
wordClass: WordClass.ichidanVerb,
);
final LemmatizationRule ichidanVerbVolitional = LemmatizationRule.simple(
name: 'Ichidan verb - volitional form',
pattern: 'よう',
replacement: 'る',
lookAheadBehind: _lookBehinds,
validChildClasses: {WordClass.ichidanVerb},
wordClass: WordClass.ichidanVerb,
);
final LemmatizationRule ichidanVerbPotential = LemmatizationRule.simple(
name: 'Ichidan verb - potential form',
pattern: 'られる',
replacement: 'る',
lookAheadBehind: _lookBehinds,
validChildClasses: {WordClass.ichidanVerb},
wordClass: WordClass.ichidanVerb,
);
final LemmatizationRule ichidanVerbPassive = LemmatizationRule.simple(
name: 'Ichidan verb - passive form',
pattern: 'られる',
replacement: 'る',
lookAheadBehind: _lookBehinds,
validChildClasses: {WordClass.ichidanVerb},
wordClass: WordClass.ichidanVerb,
);
final LemmatizationRule ichidanVerbCausative = LemmatizationRule.simple(
name: 'Ichidan verb - causative form',
pattern: 'させる',
replacement: 'る',
lookAheadBehind: _lookBehinds,
validChildClasses: {WordClass.ichidanVerb},
wordClass: WordClass.ichidanVerb,
);
final LemmatizationRule ichidanVerbCausativePassive = LemmatizationRule.simple(
name: 'Ichidan verb - causative passive form',
pattern: 'させられる',
replacement: 'る',
lookAheadBehind: _lookBehinds,
validChildClasses: {WordClass.ichidanVerb},
wordClass: WordClass.ichidanVerb,
);
final LemmatizationRule ichidanVerbImperative = LemmatizationRule.simple(
name: 'Ichidan verb - imperative form',
pattern: 'ろ',
replacement: 'る',
lookAheadBehind: _lookBehinds,
validChildClasses: {WordClass.ichidanVerb},
wordClass: WordClass.ichidanVerb,
);
final LemmatizationRule ichidanVerbNegativePast = LemmatizationRule.simple(
name: 'Ichidan verb - negative past form',
pattern: 'なかった',
replacement: 'る',
lookAheadBehind: _lookBehinds,
validChildClasses: {WordClass.ichidanVerb},
wordClass: WordClass.ichidanVerb,
);
final LemmatizationRule ichidanVerbNegativeTe = LemmatizationRule.simple(
name: 'Ichidan verb - negative te-form',
pattern: 'なくて',
replacement: 'る',
lookAheadBehind: _lookBehinds,
validChildClasses: {WordClass.ichidanVerb},
wordClass: WordClass.ichidanVerb,
);
final LemmatizationRule ichidanVerbNegativeConditional =
LemmatizationRule.simple(
name: 'Ichidan verb - negative conditional form',
pattern: 'なければ',
replacement: 'る',
lookAheadBehind: _lookBehinds,
validChildClasses: {WordClass.ichidanVerb},
wordClass: WordClass.ichidanVerb,
);
final LemmatizationRule ichidanVerbNegativeConditionalVariant1 =
LemmatizationRule.simple(
name: 'Ichidan verb - negative conditional form (informal variant)',
pattern: 'なきゃ',
replacement: 'る',
lookAheadBehind: _lookBehinds,
validChildClasses: {WordClass.ichidanVerb},
wordClass: WordClass.ichidanVerb,
);
final LemmatizationRule ichidanVerbNegativeConditionalVariant2 =
LemmatizationRule.simple(
name: 'Ichidan verb - negative conditional form (informal variant)',
pattern: 'なくちゃ',
replacement: 'る',
lookAheadBehind: _lookBehinds,
validChildClasses: {WordClass.ichidanVerb},
wordClass: WordClass.ichidanVerb,
);
final LemmatizationRule ichidanVerbNegativeConditionalVariant3 =
LemmatizationRule.simple(
name: 'Ichidan verb - negative conditional form (informal variant)',
pattern: 'ないと',
replacement: 'る',
lookAheadBehind: _lookBehinds,
validChildClasses: {WordClass.ichidanVerb},
wordClass: WordClass.ichidanVerb,
);
final LemmatizationRule ichidanVerbNegativeVolitional =
LemmatizationRule.simple(
name: 'Ichidan verb - negative volitional form',
pattern: 'なかろう',
replacement: 'る',
lookAheadBehind: _lookBehinds,
validChildClasses: {WordClass.ichidanVerb},
wordClass: WordClass.ichidanVerb,
);
final LemmatizationRule ichidanVerbNegativePotential = LemmatizationRule.simple(
name: 'Ichidan verb - negative potential form',
pattern: 'られない',
replacement: 'る',
lookAheadBehind: _lookBehinds,
validChildClasses: {WordClass.ichidanVerb},
wordClass: WordClass.ichidanVerb,
);
final LemmatizationRule ichidanVerbNegativePassive = LemmatizationRule.simple(
name: 'Ichidan verb - negative passive form',
pattern: 'られない',
replacement: 'る',
lookAheadBehind: _lookBehinds,
validChildClasses: {WordClass.ichidanVerb},
wordClass: WordClass.ichidanVerb,
);
final LemmatizationRule ichidanVerbNegativeCausative = LemmatizationRule.simple(
name: 'Ichidan verb - negative causative form',
pattern: 'させない',
replacement: 'る',
lookAheadBehind: _lookBehinds,
validChildClasses: {WordClass.ichidanVerb},
wordClass: WordClass.ichidanVerb,
);
final LemmatizationRule ichidanVerbNegativeCausativePassive =
LemmatizationRule.simple(
name: 'Ichidan verb - negative causative passive form',
pattern: 'させられない',
replacement: 'る',
lookAheadBehind: _lookBehinds,
validChildClasses: {WordClass.ichidanVerb},
wordClass: WordClass.ichidanVerb,
);
final LemmatizationRule ichidanVerbNegativeImperative =
LemmatizationRule.simple(
name: 'Ichidan verb - negative imperative form',
pattern: 'るな',
replacement: 'る',
lookAheadBehind: _lookBehinds,
validChildClasses: {WordClass.ichidanVerb},
wordClass: WordClass.ichidanVerb,
);
final LemmatizationRule ichidanVerbDesire = LemmatizationRule.simple(
name: 'Ichidan verb - desire form',
pattern: 'たい',
replacement: 'る',
lookAheadBehind: _lookBehinds,
validChildClasses: {WordClass.ichidanVerb},
wordClass: WordClass.ichidanVerb,
);
final LemmatizationRule ichidanVerbNegativeDesire = LemmatizationRule.simple(
name: 'Ichidan verb - negative desire form',
pattern: 'たくない',
replacement: 'る',
lookAheadBehind: _lookBehinds,
validChildClasses: {WordClass.ichidanVerb},
wordClass: WordClass.ichidanVerb,
);
final LemmatizationRule ichidanVerbPastDesire = LemmatizationRule.simple(
name: 'Ichidan verb - past desire form',
pattern: 'たかった',
replacement: 'る',
lookAheadBehind: _lookBehinds,
validChildClasses: {WordClass.ichidanVerb},
wordClass: WordClass.ichidanVerb,
);
final LemmatizationRule ichidanVerbNegativePastDesire =
LemmatizationRule.simple(
name: 'Ichidan verb - negative past desire form',
pattern: 'たくなかった',
replacement: 'る',
lookAheadBehind: _lookBehinds,
validChildClasses: {WordClass.ichidanVerb},
wordClass: WordClass.ichidanVerb,
);
final List<LemmatizationRule> ichidanVerbLemmatizationRules =
List.unmodifiable([
ichidanVerbBase,
ichidanVerbNegative,
ichidanVerbPast,
ichidanVerbTe,
ichidanVerbTeiru,
ichidanVerbTeita,
ichidanVerbConditional,
ichidanVerbVolitional,
ichidanVerbPotential,
ichidanVerbPassive,
ichidanVerbCausative,
ichidanVerbCausativePassive,
ichidanVerbImperative,
ichidanVerbNegativePast,
ichidanVerbNegativeTe,
ichidanVerbNegativeConditional,
ichidanVerbNegativeConditionalVariant1,
ichidanVerbNegativeConditionalVariant2,
ichidanVerbNegativeConditionalVariant3,
ichidanVerbNegativeVolitional,
ichidanVerbNegativePotential,
ichidanVerbNegativePassive,
ichidanVerbNegativeCausative,
ichidanVerbNegativeCausativePassive,
ichidanVerbNegativeImperative,
ichidanVerbDesire,
ichidanVerbNegativeDesire,
ichidanVerbPastDesire,
ichidanVerbNegativePastDesire,
]);

View File

@@ -1,9 +1,9 @@
// Source: https://github.com/Kimtaro/ve/blob/master/lib/providers/japanese_transliterators.rb
const hiragana_syllabic_n = 'ん';
const hiragana_small_tsu = 'っ';
const hiraganaSyllabicN = 'ん';
const hiraganaSmallTsu = 'っ';
const Map<String, String> hiragana_to_latin = {
const Map<String, String> hiraganaToLatin = {
'あ': 'a',
'い': 'i',
'う': 'u',
@@ -209,7 +209,7 @@ const Map<String, String> hiragana_to_latin = {
'ゟ': 'yori',
};
const Map<String, String> latin_to_hiragana = {
const Map<String, String> latinToHiragana = {
'a': 'あ',
'i': 'い',
'u': 'う',
@@ -481,12 +481,13 @@ const Map<String, String> latin_to_hiragana = {
'#~': '',
};
bool _smallTsu(String for_conversion) => for_conversion == hiragana_small_tsu;
bool _nFollowedByYuYeYo(String for_conversion, String kana) =>
for_conversion == hiragana_syllabic_n &&
bool _smallTsu(String forConversion) => forConversion == hiraganaSmallTsu;
bool _nFollowedByYuYeYo(String forConversion, String kana) =>
forConversion == hiraganaSyllabicN &&
kana.length > 1 &&
'やゆよ'.contains(kana.substring(1, 2));
/// Transliterates a string of hiragana characters to Latin script (romaji).
String transliterateHiraganaToLatin(String hiragana) {
String kana = hiragana;
String romaji = '';
@@ -495,17 +496,17 @@ String transliterateHiraganaToLatin(String hiragana) {
while (kana.isNotEmpty) {
final lengths = [if (kana.length > 1) 2, 1];
for (final length in lengths) {
final String for_conversion = kana.substring(0, length);
final String forConversion = kana.substring(0, length);
String? mora;
if (_smallTsu(for_conversion)) {
if (_smallTsu(forConversion)) {
geminate = true;
kana = kana.replaceRange(0, length, '');
break;
} else if (_nFollowedByYuYeYo(for_conversion, kana)) {
} else if (_nFollowedByYuYeYo(forConversion, kana)) {
mora = "n'";
}
mora ??= hiragana_to_latin[for_conversion];
mora ??= hiraganaToLatin[forConversion];
if (mora != null) {
if (geminate) {
@@ -516,7 +517,7 @@ String transliterateHiraganaToLatin(String hiragana) {
kana = kana.replaceRange(0, length, '');
break;
} else if (length == 1) {
romaji += for_conversion;
romaji += forConversion;
kana = kana.replaceRange(0, length, '');
}
}
@@ -524,48 +525,92 @@ String transliterateHiraganaToLatin(String hiragana) {
return romaji;
}
bool _doubleNFollowedByAIUEO(String for_conversion) =>
RegExp(r'^nn[aiueo]$').hasMatch(for_conversion);
bool _hasTableMatch(String for_conversion) =>
latin_to_hiragana[for_conversion] != null;
bool _hasDoubleConsonant(String for_conversion, int length) =>
for_conversion == 'tch' ||
(length == 2 &&
RegExp(r'^([kgsztdnbpmyrlwchf])\1$').hasMatch(for_conversion));
/// Returns a list of pairs of indices into the input and output strings,
/// indicating which characters in the input string correspond to which characters in the output string.
List<(int, int)> transliterateHiraganaToLatinSpan(String hiragana) {
String kana = hiragana;
String romaji = '';
final List<(int, int)> spans = [];
bool geminate = false;
int kanaIndex = 0;
while (kana.isNotEmpty) {
final lengths = [if (kana.length > 1) 2, 1];
for (final length in lengths) {
final String forConversion = kana.substring(0, length);
String? mora;
if (_smallTsu(forConversion)) {
geminate = true;
kana = kana.replaceRange(0, length, '');
break;
} else if (_nFollowedByYuYeYo(forConversion, kana)) {
mora = "n'";
}
mora ??= hiraganaToLatin[forConversion];
if (mora != null) {
if (geminate) {
geminate = false;
romaji += mora.substring(0, 1);
}
spans.add((kanaIndex, romaji.length));
romaji += mora;
kana = kana.replaceRange(0, length, '');
kanaIndex += length;
break;
} else if (length == 1) {
spans.add((kanaIndex, romaji.length));
romaji += forConversion;
kana = kana.replaceRange(0, length, '');
kanaIndex += length;
}
}
}
return spans;
}
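The `*Span` variants return `(inputIndex, outputIndex)` pairs recording where each consumed chunk starts in the kana input and in the romaji output, so a caller can map a highlight in one string onto the other. A toy Python sketch of the same bookkeeping, using a simplified one-kana-at-a-time table (illustrative only — it omits the small-tsu gemination and syllabic-n handling of the real code):

```python
# Track alignment between input and output: for each consumed kana chunk,
# record (index into the kana string, index into the romaji built so far).
def hiragana_to_latin_span(kana: str) -> tuple[str, list[tuple[int, int]]]:
    table = {'こ': 'ko', 'ん': 'n', 'に': 'ni', 'ち': 'chi', 'は': 'ha'}
    spans, romaji = [], ''
    for i, ch in enumerate(kana):
        mora = table.get(ch, ch)  # pass unknown characters through
        spans.append((i, len(romaji)))
        romaji += mora
    return romaji, spans

print(hiragana_to_latin_span('こんにちは'))
```

Each pair marks a chunk boundary in both strings, so the span between consecutive output indices is exactly the romaji produced by one kana chunk.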
bool _doubleNFollowedByAIUEO(String forConversion) =>
RegExp(r'^nn[aiueo]$').hasMatch(forConversion);
bool _hasTableMatch(String forConversion) =>
latinToHiragana[forConversion] != null;
bool _hasDoubleConsonant(String forConversion, int length) =>
forConversion == 'tch' ||
(length == 2 &&
RegExp(r'^([kgsztdnbpmyrlwchf])\1$').hasMatch(forConversion));
/// Transliterates a string of Latin script (romaji) to hiragana characters.
String transliterateLatinToHiragana(String latin) {
String romaji =
latin.toLowerCase().replaceAll('mb', 'nb').replaceAll('mp', 'np');
String romaji = latin
.toLowerCase()
.replaceAll('mb', 'nb')
.replaceAll('mp', 'np');
String kana = '';
while (romaji.isNotEmpty) {
final lengths = [
if (romaji.length > 2) 3,
if (romaji.length > 1) 2,
1,
];
final lengths = [if (romaji.length > 2) 3, if (romaji.length > 1) 2, 1];
for (final length in lengths) {
String? mora;
int for_removal = length;
final String for_conversion = romaji.substring(0, length);
int forRemoval = length;
final String forConversion = romaji.substring(0, length);
if (_doubleNFollowedByAIUEO(for_conversion)) {
mora = hiragana_syllabic_n;
for_removal = 1;
} else if (_hasTableMatch(for_conversion)) {
mora = latin_to_hiragana[for_conversion];
} else if (_hasDoubleConsonant(for_conversion, length)) {
mora = hiragana_small_tsu;
for_removal = 1;
if (_doubleNFollowedByAIUEO(forConversion)) {
mora = hiraganaSyllabicN;
forRemoval = 1;
} else if (_hasTableMatch(forConversion)) {
mora = latinToHiragana[forConversion];
} else if (_hasDoubleConsonant(forConversion, length)) {
mora = hiraganaSmallTsu;
forRemoval = 1;
}
if (mora != null) {
kana += mora;
romaji = romaji.replaceRange(0, for_removal, '');
romaji = romaji.replaceRange(0, forRemoval, '');
break;
} else if (length == 1) {
kana += for_conversion;
kana += forConversion;
romaji = romaji.replaceRange(0, 1, '');
}
}
@@ -574,37 +619,83 @@ String transliterateLatinToHiragana(String latin) {
return kana;
}
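`transliterateLatinToHiragana` tries 3-, 2-, then 1-character slices so that multi-letter romaji like `kya` or `shi` win over their single-letter prefixes. A Python sketch of that greedy longest-match loop over a tiny illustrative table (not the full `latinToHiragana` map, and without the double-n and double-consonant special cases):

```python
# Greedy longest-match transliteration: try the longest slice first so that
# multi-letter romaji ('kya', 'shi') match before their prefixes ('k', 's').
TABLE = {
    'kya': 'きゃ', 'ka': 'か', 'ki': 'き', 'ku': 'く', 'shi': 'し',
    'a': 'あ', 'i': 'い', 'u': 'う', 'n': 'ん',
}

def latin_to_hiragana(romaji: str) -> str:
    kana = []
    while romaji:
        for length in (3, 2, 1):
            chunk = romaji[:length]
            if chunk in TABLE:
                kana.append(TABLE[chunk])
                romaji = romaji[length:]
                break
        else:
            # No match at any length: pass the character through unchanged.
            kana.append(romaji[0])
            romaji = romaji[1:]
    return ''.join(kana)

print(latin_to_hiragana('kyakushi'))  # きゃくし
```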
/// Returns a list of pairs of indices into the input and output strings,
/// indicating which characters in the input string correspond to which characters in the output string.
List<(int, int)> transliterateLatinToHiraganaSpan(String latin) {
String romaji = latin
.toLowerCase()
.replaceAll('mb', 'nb')
.replaceAll('mp', 'np');
String kana = '';
final List<(int, int)> spans = [];
int latinIndex = 0;
while (romaji.isNotEmpty) {
final lengths = [if (romaji.length > 2) 3, if (romaji.length > 1) 2, 1];
for (final length in lengths) {
String? mora;
int forRemoval = length;
final String forConversion = romaji.substring(0, length);
if (_doubleNFollowedByAIUEO(forConversion)) {
mora = hiraganaSyllabicN;
forRemoval = 1;
} else if (_hasTableMatch(forConversion)) {
mora = latinToHiragana[forConversion];
} else if (_hasDoubleConsonant(forConversion, length)) {
mora = hiraganaSmallTsu;
forRemoval = 1;
}
if (mora != null) {
spans.add((latinIndex, kana.length));
kana += mora;
romaji = romaji.replaceRange(0, forRemoval, '');
latinIndex += forRemoval;
break;
} else if (length == 1) {
spans.add((latinIndex, kana.length));
kana += forConversion;
romaji = romaji.replaceRange(0, 1, '');
latinIndex += 1;
}
}
}
return spans;
}
String _transposeCodepointsInRange(
String text,
int distance,
int rangeStart,
int rangeEnd,
) =>
String.fromCharCodes(
text.codeUnits
.map((c) => c + ((rangeStart <= c && c <= rangeEnd) ? distance : 0)),
);
) => String.fromCharCodes(
text.codeUnits.map(
(c) => c + ((rangeStart <= c && c <= rangeEnd) ? distance : 0),
),
);
/// Transliterates a string of kana characters (hiragana or katakana) to Latin script (romaji).
String transliterateKanaToLatin(String kana) =>
transliterateHiraganaToLatin(transliterateKatakanaToHiragana(kana));
/// Transliterates a string of Latin script (romaji) to katakana characters.
String transliterateLatinToKatakana(String latin) =>
transliterateHiraganaToKatakana(transliterateLatinToHiragana(latin));
/// Transliterates a string of katakana characters to hiragana characters.
String transliterateKatakanaToHiragana(String katakana) =>
_transposeCodepointsInRange(katakana, -96, 12449, 12534);
/// Transliterates a string of hiragana characters to katakana characters.
String transliterateHiraganaToKatakana(String hiragana) =>
_transposeCodepointsInRange(hiragana, 96, 12353, 12438);
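The katakana/hiragana conversions work because the two scripts occupy parallel Unicode blocks 96 code points apart (hiragana ぁ..ゖ at 12353..12438, katakana ァ..ヶ at 12449..12534), so conversion is a bounded codepoint shift that leaves everything else (kanji, punctuation) untouched. The same trick in Python, mirroring `_transposeCodepointsInRange`:

```python
# Shift code points inside [start, end] by `distance`; leave the rest alone.
def transpose(text: str, distance: int, start: int, end: int) -> str:
    return ''.join(
        chr(ord(c) + distance) if start <= ord(c) <= end else c
        for c in text
    )

def katakana_to_hiragana(s: str) -> str:
    return transpose(s, -96, 12449, 12534)

def hiragana_to_katakana(s: str) -> str:
    return transpose(s, 96, 12353, 12438)

print(katakana_to_hiragana('カタカナ'))  # かたかな
```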
String transliterateFullwidthRomajiToHalfwidth(String halfwidth) =>
_transposeCodepointsInRange(
_transposeCodepointsInRange(
halfwidth,
-65248,
65281,
65374,
),
_transposeCodepointsInRange(halfwidth, -65248, 65281, 65374),
-12256,
12288,
12288,
@@ -612,12 +703,7 @@ String transliterateFullwidthRomajiToHalfwidth(String halfwidth) =>
String transliterateHalfwidthRomajiToFullwidth(String halfwidth) =>
_transposeCodepointsInRange(
_transposeCodepointsInRange(
halfwidth,
65248,
33,
126,
),
_transposeCodepointsInRange(halfwidth, 65248, 33, 126),
12256,
32,
32,

View File

@@ -1,3 +1,3 @@
String escapeStringValue(String value) {
return "'" + value.replaceAll("'", "''") + "'";
return "'${value.replaceAll("'", "''")}'";
}
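`escapeStringValue` builds a SQL string literal by doubling any embedded single quote, which is the standard SQL escape. The same transform in Python:

```python
# SQL string literals escape an embedded single quote by doubling it:
# it's  ->  'it''s'
def escape_string_value(value: str) -> str:
    return "'" + value.replace("'", "''") + "'"

print(escape_string_value("it's"))  # 'it''s'
```

Parameter binding is still preferable for runtime queries; a helper like this mainly makes sense when emitting literal SQL, e.g. generated migration scripts.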

View File

@@ -1,3 +1,16 @@
CREATE TABLE "JMdict_Version" (
"version" VARCHAR(10) PRIMARY KEY NOT NULL,
"date" DATE NOT NULL,
"hash" VARCHAR(64) NOT NULL
) WITHOUT ROWID;
CREATE TRIGGER "JMdict_Version_SingleRow"
BEFORE INSERT ON "JMdict_Version"
WHEN (SELECT COUNT(*) FROM "JMdict_Version") >= 1
BEGIN
SELECT RAISE(FAIL, 'Only one row allowed in JMdict_Version');
END;
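The `JMdict_Version` trigger enforces a single-row table: a `BEFORE INSERT` trigger raises once the table already holds a row. The pattern can be exercised from Python's `sqlite3` (table simplified to one column for the demo; in Python, `RAISE(FAIL, …)` surfaces as `sqlite3.IntegrityError`):

```python
import sqlite3

# Single-row-table pattern: a BEFORE INSERT trigger rejects any insert
# once the table already contains a row.
con = sqlite3.connect(':memory:')
con.executescript("""
CREATE TABLE "Version" ("version" TEXT PRIMARY KEY NOT NULL);
CREATE TRIGGER "Version_SingleRow"
BEFORE INSERT ON "Version"
WHEN (SELECT COUNT(*) FROM "Version") >= 1
BEGIN
SELECT RAISE(FAIL, 'Only one row allowed in Version');
END;
""")
con.execute("INSERT INTO Version VALUES ('3.6.1')")  # first row: accepted
try:
    con.execute("INSERT INTO Version VALUES ('3.6.2')")  # second row: rejected
except sqlite3.IntegrityError as e:
    print(e)
```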
CREATE TABLE "JMdict_InfoDialect" (
"id" VARCHAR(4) PRIMARY KEY NOT NULL,
"description" TEXT NOT NULL
@@ -39,35 +52,33 @@ CREATE TABLE "JMdict_Entry" (
-- KanjiElement
CREATE TABLE "JMdict_KanjiElement" (
"entryId" INTEGER NOT NULL REFERENCES "JMdict_Entry"("entryId"),
"orderNum" INTEGER NOT NULL,
"elementId" INTEGER PRIMARY KEY,
"entryId" INTEGER NOT NULL GENERATED ALWAYS AS ("elementId" / 100) STORED,
"orderNum" INTEGER NOT NULL GENERATED ALWAYS AS ("elementId" % 100) VIRTUAL,
"reading" TEXT NOT NULL,
"news" INTEGER CHECK ("news" BETWEEN 1 AND 2),
"ichi" INTEGER CHECK ("ichi" BETWEEN 1 AND 2),
"spec" INTEGER CHECK ("spec" BETWEEN 1 AND 2),
"gai" INTEGER CHECK ("gai" BETWEEN 1 AND 2),
"nf" INTEGER CHECK ("nf" BETWEEN 1 AND 48),
PRIMARY KEY ("entryId", "reading"),
UNIQUE("entryId", "orderNum")
FOREIGN KEY ("entryId") REFERENCES "JMdict_Entry"("entryId"),
UNIQUE("entryId", "reading")
) WITHOUT ROWID;
CREATE INDEX "JMdict_KanjiElement_byEntryId_byOrderNum" ON "JMdict_KanjiElement"("entryId", "orderNum");
CREATE INDEX "JMdict_KanjiElement_byReading" ON "JMdict_KanjiElement"("reading");
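The new schema packs the parent entry and the element's position into one key, `elementId = entryId * 100 + orderNum`, and recovers both parts with generated columns (`STORED` is materialized on write, `VIRTUAL` computed on read). A small `sqlite3` sketch of the same encoding — simplified table, illustrative entry id, and it assumes SQLite 3.31+ for generated-column support:

```python
import sqlite3

# elementId embeds the parent entry and the element's position:
#   elementId = entryId * 100 + orderNum
# Generated columns recover both parts from the single key.
con = sqlite3.connect(':memory:')
con.execute("""
CREATE TABLE "KanjiElement" (
    "elementId" INTEGER PRIMARY KEY,
    "entryId" INTEGER NOT NULL GENERATED ALWAYS AS ("elementId" / 100) STORED,
    "orderNum" INTEGER NOT NULL GENERATED ALWAYS AS ("elementId" % 100) VIRTUAL,
    "reading" TEXT NOT NULL
)
""")
con.execute(
    "INSERT INTO KanjiElement (elementId, reading) VALUES (?, ?)",
    (100052703, '食べる'),  # hypothetical entry 1000527, element #3
)
print(con.execute("SELECT entryId, orderNum FROM KanjiElement").fetchone())
```

The trade-off is a hard cap of 100 elements per entry, since `orderNum` only gets two decimal digits of the key.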
CREATE TABLE "JMdict_KanjiElementInfo" (
"entryId" INTEGER NOT NULL,
"reading" TEXT NOT NULL,
"elementId" INTEGER NOT NULL REFERENCES "JMdict_KanjiElement"("elementId"),
"info" TEXT NOT NULL REFERENCES "JMdict_InfoKanji"("id"),
FOREIGN KEY ("entryId", "reading")
REFERENCES "JMdict_KanjiElement"("entryId", "reading"),
PRIMARY KEY ("entryId", "reading", "info")
PRIMARY KEY ("elementId", "info")
) WITHOUT ROWID;
-- ReadingElement
CREATE TABLE "JMdict_ReadingElement" (
"entryId" INTEGER NOT NULL REFERENCES "JMdict_Entry"("entryId"),
"orderNum" INTEGER NOT NULL,
"elementId" INTEGER PRIMARY KEY,
"entryId" INTEGER NOT NULL GENERATED ALWAYS AS (("elementId" / 100) % 10000000) STORED,
"orderNum" INTEGER NOT NULL GENERATED ALWAYS AS ("elementId" % 100) VIRTUAL,
"reading" TEXT NOT NULL,
"readingDoesNotMatchKanji" BOOLEAN NOT NULL DEFAULT FALSE,
"news" INTEGER CHECK ("news" BETWEEN 1 AND 2),
@@ -75,56 +86,44 @@ CREATE TABLE "JMdict_ReadingElement" (
"spec" INTEGER CHECK ("spec" BETWEEN 1 AND 2),
"gai" INTEGER CHECK ("gai" BETWEEN 1 AND 2),
"nf" INTEGER CHECK ("nf" BETWEEN 1 AND 48),
PRIMARY KEY ("entryId", "reading"),
UNIQUE("entryId", "orderNum")
FOREIGN KEY ("entryId") REFERENCES "JMdict_Entry"("entryId"),
UNIQUE("entryId", "reading")
) WITHOUT ROWID;
CREATE INDEX "JMdict_ReadingElement_byEntryId_byOrderNum" ON "JMdict_ReadingElement"("entryId", "orderNum");
CREATE INDEX "JMdict_ReadingElement_byReading" ON "JMdict_ReadingElement"("reading");
CREATE TABLE "JMdict_ReadingElementRestriction" (
"entryId" INTEGER NOT NULL,
"reading" TEXT NOT NULL,
"elementId" INTEGER NOT NULL REFERENCES "JMdict_ReadingElement"("elementId"),
"restriction" TEXT NOT NULL,
FOREIGN KEY ("entryId", "reading")
REFERENCES "JMdict_ReadingElement"("entryId", "reading"),
PRIMARY KEY ("entryId", "reading", "restriction")
PRIMARY KEY ("elementId", "restriction")
) WITHOUT ROWID;
CREATE TABLE "JMdict_ReadingElementInfo" (
"entryId" INTEGER NOT NULL,
"reading" TEXT NOT NULL,
"elementId" INTEGER NOT NULL REFERENCES "JMdict_ReadingElement"("elementId"),
"info" TEXT NOT NULL REFERENCES "JMdict_InfoReading"("id"),
FOREIGN KEY ("entryId", "reading")
REFERENCES "JMdict_ReadingElement"("entryId", "reading"),
PRIMARY KEY ("entryId", "reading", "info")
PRIMARY KEY ("elementId", "info")
) WITHOUT ROWID;
-- Sense
CREATE TABLE "JMdict_Sense" (
"senseId" INTEGER PRIMARY KEY,
"entryId" INTEGER NOT NULL REFERENCES "JMdict_Entry"("entryId"),
"orderNum" INTEGER NOT NULL,
"entryId" INTEGER NOT NULL GENERATED ALWAYS AS ("senseId" / 100) STORED,
"orderNum" INTEGER NOT NULL GENERATED ALWAYS AS ("senseId" % 100) VIRTUAL,
FOREIGN KEY ("entryId") REFERENCES "JMdict_Entry"("entryId"),
UNIQUE("entryId", "orderNum")
);
CREATE INDEX "JMdict_Sense_byEntryId_byOrderNum" ON "JMdict_Sense"("entryId", "orderNum");
CREATE TABLE "JMdict_SenseRestrictedToKanji" (
"entryId" INTEGER NOT NULL,
"senseId" INTEGER NOT NULL REFERENCES "JMdict_Sense"("senseId"),
"kanji" TEXT NOT NULL,
FOREIGN KEY ("entryId", "kanji") REFERENCES "JMdict_KanjiElement"("entryId", "reading"),
PRIMARY KEY ("entryId", "senseId", "kanji")
"kanjiElementId" INTEGER NOT NULL REFERENCES "JMdict_KanjiElement"("elementId"),
PRIMARY KEY ("senseId", "kanjiElementId")
) WITHOUT ROWID;
CREATE TABLE "JMdict_SenseRestrictedToReading" (
"entryId" INTEGER NOT NULL,
"senseId" INTEGER NOT NULL REFERENCES "JMdict_Sense"("senseId"),
"reading" TEXT NOT NULL,
FOREIGN KEY ("entryId", "reading") REFERENCES "JMdict_ReadingElement"("entryId", "reading"),
PRIMARY KEY ("entryId", "senseId", "reading")
"readingElementId" INTEGER NOT NULL REFERENCES "JMdict_ReadingElement"("elementId"),
PRIMARY KEY ("senseId", "readingElementId")
) WITHOUT ROWID;
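The generated columns in the element and sense tables above all decode one id-embedding scheme (entryId and orderNum packed into elementId, with the element type in the top digit group). A sketch of that encoding — the reading-element offset of 1_000_000_000 is an assumption inferred from the `type` CASE expression in the score table, not stated explicitly in these migrations:

```python
# Hypothetical constant: reading elements appear to live above 1e9,
# since the schema derives type as 'k' when elementId / 1000000000 = 0.
READING_OFFSET = 1_000_000_000


def encode(entry_id: int, order_num: int, is_reading: bool) -> int:
    """Pack entryId (< 10^7) and orderNum (< 100) into an elementId."""
    element_id = entry_id * 100 + order_num
    return element_id + READING_OFFSET if is_reading else element_id


def decode(element_id: int) -> tuple[int, int, str]:
    """Mirror the schema's generated columns: entryId, orderNum, type."""
    kind = 'k' if element_id // 1_000_000_000 == 0 else 'r'
    entry_id = (element_id // 100) % 10_000_000
    order_num = element_id % 100
    return entry_id, order_num, kind
```

With this scheme, a sense's `senseId` and a kanji element's `elementId` for the same entry and orderNum coincide, which is what the `seeAlsoSenseKey`/`antonymSenseKey` generated columns below rely on.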
-- In order to add xrefs, the entry being referenced must already have been added.
-- In order to add xrefs, the entry being referenced must already have been added.
@@ -140,31 +139,41 @@ CREATE TABLE "JMdict_SenseRestrictedToReading" (
CREATE TABLE "JMdict_SenseSeeAlso" (
"senseId" INTEGER NOT NULL REFERENCES "JMdict_Sense"("senseId"),
"xrefEntryId" INTEGER NOT NULL,
"seeAlsoReading" TEXT,
"seeAlsoKanji" TEXT,
"seeAlsoSense" INTEGER,
-- For some entries, the cross reference is ambiguous. This means that while the ingestion
-- has determined some xrefEntryId, it is not guaranteed to be the correct one.
"ambiguous" BOOLEAN NOT NULL DEFAULT FALSE,
FOREIGN KEY ("xrefEntryId", "seeAlsoKanji") REFERENCES "JMdict_KanjiElement"("entryId", "reading"),
FOREIGN KEY ("xrefEntryId", "seeAlsoReading") REFERENCES "JMdict_ReadingElement"("entryId", "reading"),
FOREIGN KEY ("xrefEntryId", "seeAlsoSense") REFERENCES "JMdict_Sense"("entryId", "orderNum"),
UNIQUE("senseId", "xrefEntryId", "seeAlsoReading", "seeAlsoKanji", "seeAlsoSense")
"seeAlsoSenseKey" INTEGER GENERATED ALWAYS AS (
CASE
WHEN "seeAlsoSense" IS NOT NULL THEN ("xrefEntryId" * 100) + "seeAlsoSense"
ELSE NULL
END
) VIRTUAL,
FOREIGN KEY ("seeAlsoSenseKey") REFERENCES "JMdict_Sense"("senseId"),
PRIMARY KEY ("senseId", "xrefEntryId", "seeAlsoSense")
);
CREATE TABLE "JMdict_SenseAntonym" (
"senseId" INTEGER NOT NULL REFERENCES "JMdict_Sense"("senseId"),
"xrefEntryId" INTEGER NOT NULL,
"antonymReading" TEXT,
"antonymKanji" TEXT,
"antonymSense" INTEGER,
-- For some entries, the cross reference is ambiguous. This means that while the ingestion
-- has determined some xrefEntryId, it is not guaranteed to be the correct one.
"ambiguous" BOOLEAN NOT NULL DEFAULT FALSE,
FOREIGN KEY ("xrefEntryId", "antonymKanji") REFERENCES "JMdict_KanjiElement"("entryId", "reading"),
FOREIGN KEY ("xrefEntryId", "antonymReading") REFERENCES "JMdict_ReadingElement"("entryId", "reading"),
FOREIGN KEY ("xrefEntryId", "antonymSense") REFERENCES "JMdict_Sense"("entryId", "orderNum"),
UNIQUE("senseId", "xrefEntryId", "antonymReading", "antonymKanji", "antonymSense")
"antonymSenseKey" INTEGER GENERATED ALWAYS AS (
CASE
WHEN "antonymSense" IS NOT NULL THEN ("xrefEntryId" * 100) + "antonymSense"
ELSE NULL
END
) VIRTUAL,
FOREIGN KEY ("antonymSenseKey") REFERENCES "JMdict_Sense"("senseId"),
PRIMARY KEY ("senseId", "xrefEntryId", "antonymSense")
);
-- These cross references are going to be mostly accessed from a sense
@@ -215,12 +224,20 @@ CREATE TABLE "JMdict_SenseDialect" (
CREATE TABLE "JMdict_SenseGlossary" (
"senseId" INTEGER NOT NULL REFERENCES "JMdict_Sense"("senseId"),
"phrase" TEXT NOT NULL,
"language" CHAR(3) NOT NULL DEFAULT "eng",
"type" TEXT,
PRIMARY KEY ("senseId", "language", "phrase")
-- "language" CHAR(3) NOT NULL DEFAULT "eng",
-- PRIMARY KEY ("senseId", "language", "phrase")
PRIMARY KEY ("senseId", "phrase")
) WITHOUT ROWID;
CREATE INDEX "JMdict_SenseGlossary_byPhrase" ON JMdict_SenseGlossary("phrase");
-- CREATE INDEX "JMdict_SenseGlossary_byPhrase" ON JMdict_SenseGlossary("phrase");
CREATE TABLE "JMdict_SenseGlossaryType" (
"senseId" INTEGER NOT NULL REFERENCES "JMdict_Sense"("senseId"),
"phrase" TEXT NOT NULL,
"type" TEXT NOT NULL,
PRIMARY KEY ("senseId", "phrase", "type"),
FOREIGN KEY ("senseId", "phrase") REFERENCES "JMdict_SenseGlossary"("senseId", "phrase")
) WITHOUT ROWID;
CREATE TABLE "JMdict_SenseInfo" (
"senseId" INTEGER NOT NULL REFERENCES "JMdict_Sense"("senseId"),

View File

@@ -1,55 +1,55 @@
CREATE VIRTUAL TABLE "JMdict_KanjiElementFTS" USING FTS5("entryId" UNINDEXED, "reading");
CREATE VIRTUAL TABLE "JMdict_KanjiElementFTS" USING FTS5("elementId" UNINDEXED, "reading");
CREATE TRIGGER "JMdict_KanjiElement_InsertFTS"
AFTER INSERT ON "JMdict_KanjiElement"
BEGIN
INSERT INTO "JMdict_KanjiElementFTS"("entryId", "reading")
VALUES (NEW."entryId", NEW."reading");
INSERT INTO "JMdict_KanjiElementFTS"("elementId", "reading")
VALUES (NEW."elementId", NEW."reading");
END;
CREATE TRIGGER "JMdict_KanjiElement_UpdateFTS"
AFTER UPDATE OF "entryId", "reading"
AFTER UPDATE OF "elementId", "reading"
ON "JMdict_KanjiElement"
BEGIN
UPDATE "JMdict_KanjiElementFTS"
SET
"entryId" = NEW."entryId",
"elementId" = NEW."elementId",
"reading" = NEW."reading"
WHERE "entryId" = OLD."entryId";
WHERE "elementId" = OLD."elementId";
END;
CREATE TRIGGER "JMdict_KanjiElement_DeleteFTS"
AFTER DELETE ON "JMdict_KanjiElement"
BEGIN
DELETE FROM "JMdict_KanjiElementFTS"
WHERE "entryId" = OLD."entryId";
WHERE "elementId" = OLD."elementId";
END;
CREATE VIRTUAL TABLE "JMdict_ReadingElementFTS" USING FTS5("entryId" UNINDEXED, "reading");
CREATE VIRTUAL TABLE "JMdict_ReadingElementFTS" USING FTS5("elementId" UNINDEXED, "reading");
CREATE TRIGGER "JMdict_ReadingElement_InsertFTS"
AFTER INSERT ON "JMdict_ReadingElement"
BEGIN
INSERT INTO "JMdict_ReadingElementFTS"("entryId", "reading")
VALUES (NEW."entryId", NEW."reading");
INSERT INTO "JMdict_ReadingElementFTS"("elementId", "reading")
VALUES (NEW."elementId", NEW."reading");
END;
CREATE TRIGGER "JMdict_ReadingElement_UpdateFTS"
AFTER UPDATE OF "entryId", "reading"
AFTER UPDATE OF "elementId", "reading"
ON "JMdict_ReadingElement"
BEGIN
UPDATE "JMdict_ReadingElementFTS"
SET
"entryId" = NEW."entryId",
"elementId" = NEW."elementId",
"reading" = NEW."reading"
WHERE "entryId" = OLD."entryId";
WHERE "elementId" = OLD."elementId";
END;
CREATE TRIGGER "JMdict_ReadingElement_DeleteFTS"
AFTER DELETE ON "JMdict_ReadingElement"
BEGIN
DELETE FROM "JMdict_ReadingElementFTS"
WHERE "entryId" = OLD."entryId";
WHERE "elementId" = OLD."elementId";
END;
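The insert/update/delete trigger trio above keeps each FTS table in lockstep with its base table. The same pattern can be sketched against a plain shadow table, so it runs even where the FTS5 extension is unavailable — table and column names here are illustrative, not the real schema:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE "Element" ("elementId" INTEGER PRIMARY KEY, "reading" TEXT NOT NULL);
CREATE TABLE "ElementShadow" ("elementId" INTEGER, "reading" TEXT);

CREATE TRIGGER "Element_InsertShadow" AFTER INSERT ON "Element"
BEGIN
  INSERT INTO "ElementShadow"("elementId", "reading")
  VALUES (NEW."elementId", NEW."reading");
END;

CREATE TRIGGER "Element_UpdateShadow"
AFTER UPDATE OF "elementId", "reading" ON "Element"
BEGIN
  UPDATE "ElementShadow"
  SET "elementId" = NEW."elementId", "reading" = NEW."reading"
  WHERE "elementId" = OLD."elementId";
END;

CREATE TRIGGER "Element_DeleteShadow" AFTER DELETE ON "Element"
BEGIN
  DELETE FROM "ElementShadow" WHERE "elementId" = OLD."elementId";
END;
""")

con.execute("INSERT INTO \"Element\" VALUES (100005001, 'かな')")
con.execute("UPDATE \"Element\" SET \"reading\" = 'カナ' WHERE \"elementId\" = 100005001")
rows = con.execute('SELECT "reading" FROM "ElementShadow"').fetchall()

con.execute('DELETE FROM "Element" WHERE "elementId" = 100005001')
remaining = con.execute('SELECT COUNT(*) FROM "ElementShadow"').fetchone()[0]
```

Keying the update and delete triggers on OLD."elementId" is what makes the migration from entryId+reading keys to elementId keys safe here: the shadow row is located by the value the base row had before the change.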

View File

@@ -1,3 +1,16 @@
CREATE TABLE "JMdict_JLPT_Version" (
"version" VARCHAR(10) PRIMARY KEY NOT NULL,
"date" DATE NOT NULL,
"hash" VARCHAR(64) NOT NULL
) WITHOUT ROWID;
CREATE TRIGGER "JMdict_JLPT_Version_SingleRow"
BEFORE INSERT ON "JMdict_JLPT_Version"
WHEN (SELECT COUNT(*) FROM "JMdict_JLPT_Version") >= 1
BEGIN
SELECT RAISE(FAIL, 'Only one row allowed in JMdict_JLPT_Version');
END;
CREATE TABLE "JMdict_JLPTTag" (
"entryId" INTEGER NOT NULL,
"jlptLevel" CHAR(2) NOT NULL CHECK ("jlptLevel" in ('N5', 'N4', 'N3', 'N2', 'N1')),

View File

@@ -1,28 +1,30 @@
CREATE TABLE "JMdict_EntryScore" (
"type" TEXT NOT NULL CHECK ("type" IN ('reading', 'kanji')),
"entryId" INTEGER NOT NULL,
"reading" TEXT NOT NULL,
"elementId" INTEGER PRIMARY KEY,
"score" INTEGER NOT NULL DEFAULT 0,
"common" BOOLEAN NOT NULL DEFAULT FALSE,
PRIMARY KEY ("type", "entryId", "reading")
"entryId" INTEGER NOT NULL GENERATED ALWAYS AS (("elementId" / 100) % 10000000) STORED,
"type" CHAR(1) NOT NULL GENERATED ALWAYS AS (CASE
WHEN "elementId" / 1000000000 = 0 THEN 'k'
ELSE 'r'
END) VIRTUAL,
FOREIGN KEY ("entryId") REFERENCES "JMdict_Entry"("entryId")
) WITHOUT ROWID;
CREATE INDEX "JMdict_EntryScore_byEntryId_byReading_byScore" ON "JMdict_EntryScore"("entryId", "reading", "score");
CREATE INDEX "JMdict_EntryScore_byElementId_byScore" ON "JMdict_EntryScore"("elementId", "score");
CREATE INDEX "JMdict_EntryScore_byScore" ON "JMdict_EntryScore"("score");
CREATE INDEX "JMdict_EntryScore_byCommon" ON "JMdict_EntryScore"("common");
CREATE INDEX "JMdict_EntryScore_byType_byEntryId_byReading_byScore" ON "JMdict_EntryScore"("type", "entryId", "reading", "score");
CREATE INDEX "JMdict_EntryScore_byType_byScore" ON "JMdict_EntryScore"("type", "score");
CREATE INDEX "JMdict_EntryScore_byType_byCommon" ON "JMdict_EntryScore"("type", "common");
CREATE INDEX "JMdict_EntryScore_byElementId_byCommon" ON "JMdict_EntryScore"("elementId", "common");
CREATE INDEX "JMdict_EntryScore_byCommon" ON "JMdict_EntryScore"("common");
-- NOTE: these views are kept separate per element type so that each trigger
-- avoids an unnecessary UNION over both element types
CREATE VIEW "JMdict_EntryScoreView_Reading" AS
SELECT
'reading' AS "type",
"JMdict_ReadingElement"."entryId",
"JMdict_ReadingElement"."reading",
"JMdict_ReadingElement"."elementId",
(
"news" IS 1
OR "ichi" IS 1
@@ -44,7 +46,7 @@ SELECT
+ (("spec" IS 2) * 5)
+ (("gai" IS 1) * 10)
+ (("gai" IS 2) * 5)
+ (("orderNum" IS 1) * 20)
-- + (("orderNum" IS 0) * 20)
- (substr(COALESCE("JMdict_JLPTTag"."jlptLevel", 'N0'), 2) * -5)
AS "score"
FROM "JMdict_ReadingElement"
@@ -52,9 +54,8 @@ LEFT JOIN "JMdict_JLPTTag" USING ("entryId");
CREATE VIEW "JMdict_EntryScoreView_Kanji" AS
SELECT
'kanji' AS "type",
"JMdict_KanjiElement"."entryId",
"JMdict_KanjiElement"."reading",
"JMdict_KanjiElement"."elementId",
(
"news" IS 1
OR "ichi" IS 1
@@ -76,7 +77,7 @@ SELECT
+ (("spec" IS 2) * 5)
+ (("gai" IS 1) * 10)
+ (("gai" IS 2) * 5)
+ (("orderNum" IS 1) * 20)
-- + (("orderNum" IS 0) * 20)
- (substr(COALESCE("JMdict_JLPTTag"."jlptLevel", 'N0'), 2) * -5)
AS "score"
FROM "JMdict_KanjiElement"
@@ -96,20 +97,18 @@ CREATE TRIGGER "JMdict_EntryScore_Insert_JMdict_ReadingElement"
AFTER INSERT ON "JMdict_ReadingElement"
BEGIN
INSERT INTO "JMdict_EntryScore" (
"type",
"entryId",
"reading",
"elementId",
"score",
"common"
)
SELECT "type", "entryId", "reading", "score", "common"
SELECT "elementId", "score", "common"
FROM "JMdict_EntryScoreView_Reading"
WHERE "entryId" = NEW."entryId"
AND "reading" = NEW."reading";
WHERE "elementId" = NEW."elementId"
AND "score" > 0;
END;
CREATE TRIGGER "JMdict_EntryScore_Update_JMdict_ReadingElement"
AFTER UPDATE OF "news", "ichi", "spec", "gai", "nf", "orderNum"
AFTER UPDATE OF "news", "ichi", "spec", "gai", "nf", "elementId"
ON "JMdict_ReadingElement"
BEGIN
UPDATE "JMdict_EntryScore"
@@ -117,17 +116,18 @@ BEGIN
"score" = "JMdict_EntryScoreView_Reading"."score",
"common" = "JMdict_EntryScoreView_Reading"."common"
FROM "JMdict_EntryScoreView_Reading"
WHERE "entryId" = NEW."entryId"
AND "reading" = NEW."reading";
WHERE "elementId" = NEW."elementId";
DELETE FROM "JMdict_EntryScore"
WHERE "elementId" = NEW."elementId"
AND "score" <= 0;
END;
CREATE TRIGGER "JMdict_EntryScore_Delete_JMdict_ReadingElement"
AFTER DELETE ON "JMdict_ReadingElement"
BEGIN
DELETE FROM "JMdict_EntryScore"
WHERE "type" = 'reading'
AND "entryId" = OLD."entryId"
AND "reading" = OLD."reading";
WHERE "elementId" = OLD."elementId";
END;
--- JMdict_KanjiElement triggers
@@ -136,20 +136,18 @@ CREATE TRIGGER "JMdict_EntryScore_Insert_JMdict_KanjiElement"
AFTER INSERT ON "JMdict_KanjiElement"
BEGIN
INSERT INTO "JMdict_EntryScore" (
"type",
"entryId",
"reading",
"elementId",
"score",
"common"
)
SELECT "type", "entryId", "reading", "score", "common"
SELECT "elementId", "score", "common"
FROM "JMdict_EntryScoreView_Kanji"
WHERE "entryId" = NEW."entryId"
AND "reading" = NEW."reading";
WHERE "elementId" = NEW."elementId"
AND "score" > 0;
END;
CREATE TRIGGER "JMdict_EntryScore_Update_JMdict_KanjiElement"
AFTER UPDATE OF "news", "ichi", "spec", "gai", "nf", "orderNum"
AFTER UPDATE OF "news", "ichi", "spec", "gai", "nf", "elementId"
ON "JMdict_KanjiElement"
BEGIN
UPDATE "JMdict_EntryScore"
@@ -157,17 +155,18 @@ BEGIN
"score" = "JMdict_EntryScoreView_Kanji"."score",
"common" = "JMdict_EntryScoreView_Kanji"."common"
FROM "JMdict_EntryScoreView_Kanji"
WHERE "entryId" = NEW."entryId"
AND "reading" = NEW."reading";
WHERE "elementId" = NEW."elementId";
DELETE FROM "JMdict_EntryScore"
WHERE "elementId" = NEW."elementId"
AND "score" <= 0;
END;
CREATE TRIGGER "JMdict_EntryScore_Delete_JMdict_KanjiElement"
AFTER DELETE ON "JMdict_KanjiElement"
BEGIN
DELETE FROM "JMdict_EntryScore"
WHERE "type" = 'kanji'
AND "entryId" = OLD."entryId"
AND "reading" = OLD."reading";
WHERE "elementId" = OLD."elementId";
END;
--- JMdict_JLPTTag triggers
@@ -181,8 +180,8 @@ BEGIN
"common" = "JMdict_EntryScoreView"."common"
FROM "JMdict_EntryScoreView"
WHERE "JMdict_EntryScoreView"."entryId" = NEW."entryId"
AND "JMdict_EntryScoreView"."entryId" = "JMdict_EntryScore"."entryId"
AND "JMdict_EntryScoreView"."reading" = "JMdict_EntryScore"."reading";
AND "JMdict_EntryScore"."entryId" = NEW."entryId"
AND "JMdict_EntryScoreView"."elementId" = "JMdict_EntryScore"."elementId";
END;
CREATE TRIGGER "JMdict_EntryScore_Update_JMdict_JLPTTag"
@@ -195,8 +194,8 @@ BEGIN
"common" = "JMdict_EntryScoreView"."common"
FROM "JMdict_EntryScoreView"
WHERE "JMdict_EntryScoreView"."entryId" = NEW."entryId"
AND "JMdict_EntryScoreView"."entryId" = "JMdict_EntryScore"."entryId"
AND "JMdict_EntryScoreView"."reading" = "JMdict_EntryScore"."reading";
AND "JMdict_EntryScore"."entryId" = NEW."entryId"
AND "JMdict_EntryScoreView"."elementId" = "JMdict_EntryScore"."elementId";
END;
CREATE TRIGGER "JMdict_EntryScore_Delete_JMdict_JLPTTag"
@@ -207,7 +206,11 @@ BEGIN
"score" = "JMdict_EntryScoreView"."score",
"common" = "JMdict_EntryScoreView"."common"
FROM "JMdict_EntryScoreView"
WHERE "JMdict_EntryScoreView"."entryId" = NEW."entryId"
AND "JMdict_EntryScoreView"."entryId" = "JMdict_EntryScore"."entryId"
AND "JMdict_EntryScoreView"."reading" = "JMdict_EntryScore"."reading";
WHERE "JMdict_EntryScoreView"."entryId" = OLD."entryId"
AND "JMdict_EntryScore"."entryId" = OLD."entryId"
AND "JMdict_EntryScoreView"."elementId" = "JMdict_EntryScore"."elementId";
DELETE FROM "JMdict_EntryScore"
WHERE "elementId" = OLD."elementId"
AND "score" <= 0;
END;

View File

@@ -1,3 +1,16 @@
CREATE TABLE "RADKFILE_Version" (
"version" VARCHAR(10) PRIMARY KEY NOT NULL,
"date" DATE NOT NULL,
"hash" VARCHAR(64) NOT NULL
) WITHOUT ROWID;
CREATE TRIGGER "RADKFILE_Version_SingleRow"
BEFORE INSERT ON "RADKFILE_Version"
WHEN (SELECT COUNT(*) FROM "RADKFILE_Version") >= 1
BEGIN
SELECT RAISE(FAIL, 'Only one row allowed in RADKFILE_Version');
END;
CREATE TABLE "RADKFILE" (
"kanji" CHAR(1) NOT NULL,
"radical" CHAR(1) NOT NULL,
@@ -5,4 +18,3 @@ CREATE TABLE "RADKFILE" (
) WITHOUT ROWID;
CREATE INDEX "RADK" ON "RADKFILE"("radical");
CREATE INDEX "KRAD" ON "RADKFILE"("kanji");

View File

@@ -1,3 +1,16 @@
CREATE TABLE "KANJIDIC_Version" (
"version" VARCHAR(10) PRIMARY KEY NOT NULL,
"date" DATE NOT NULL,
"hash" VARCHAR(64) NOT NULL
) WITHOUT ROWID;
CREATE TRIGGER "KANJIDIC_Version_SingleRow"
BEFORE INSERT ON "KANJIDIC_Version"
WHEN (SELECT COUNT(*) FROM "KANJIDIC_Version") >= 1
BEGIN
SELECT RAISE(FAIL, 'Only one row allowed in KANJIDIC_Version');
END;
CREATE TABLE "KANJIDIC_Character" (
"literal" CHAR(1) NOT NULL PRIMARY KEY,
"grade" INTEGER CHECK ("grade" BETWEEN 1 AND 10),
@@ -6,6 +19,21 @@ CREATE TABLE "KANJIDIC_Character" (
"jlpt" INTEGER
) WITHOUT ROWID;
CREATE TABLE "KANJIDIC_Grade" (
"kanji" CHAR(1) NOT NULL PRIMARY KEY REFERENCES "KANJIDIC_Character"("literal"),
"grade" INTEGER NOT NULL CHECK ("grade" BETWEEN 1 AND 10)
) WITHOUT ROWID;
CREATE TABLE "KANJIDIC_Frequency" (
"kanji" CHAR(1) NOT NULL PRIMARY KEY REFERENCES "KANJIDIC_Character"("literal"),
"frequency" INTEGER NOT NULL
) WITHOUT ROWID;
CREATE TABLE "KANJIDIC_JLPT" (
"kanji" CHAR(1) NOT NULL PRIMARY KEY REFERENCES "KANJIDIC_Character"("literal"),
"jlpt" INTEGER NOT NULL CHECK ("jlpt" BETWEEN 1 AND 5)
) WITHOUT ROWID;
CREATE TABLE "KANJIDIC_Codepoint" (
"kanji" CHAR(1) NOT NULL REFERENCES "KANJIDIC_Character"("literal"),
"type" VARCHAR(6) NOT NULL CHECK ("type" IN ('jis208', 'jis212', 'jis213', 'ucs')),

View File

@@ -4,7 +4,6 @@ CREATE TABLE "XREF__KANJIDIC_Radical__RADKFILE"(
PRIMARY KEY ("radicalId", "radicalSymbol")
) WITHOUT ROWID;
CREATE INDEX "XREF__KANJIDIC_Radical__RADKFILE__byRadicalId" ON "XREF__KANJIDIC_Radical__RADKFILE"("radicalId");
CREATE INDEX "XREF__KANJIDIC_Radical__RADKFILE__byRadicalSymbol" ON "XREF__KANJIDIC_Radical__RADKFILE"("radicalSymbol");
/* Source: https://ctext.org/kangxi-zidian */

View File

@@ -1,10 +1,10 @@
CREATE TABLE "XREF__JMdict_KanjiElement__KANJIDIC_Character"(
"entryId" INTEGER NOT NULL,
"reading" TEXT NOT NULL,
"kanji" CHAR(1) NOT NULL REFERENCES "KANJIDIC_Character"("literal"),
PRIMARY KEY ("entryId", "reading", "kanji"),
FOREIGN KEY ("entryId", "reading") REFERENCES "JMdict_KanjiElement"("entryId", "reading")
) WITHOUT ROWID;
-- CREATE TABLE "XREF__JMdict_KanjiElement__KANJIDIC_Character"(
-- "entryId" INTEGER NOT NULL,
-- "reading" TEXT NOT NULL,
-- "kanji" CHAR(1) NOT NULL REFERENCES "KANJIDIC_Character"("literal"),
-- PRIMARY KEY ("entryId", "reading", "kanji"),
-- FOREIGN KEY ("entryId", "reading") REFERENCES "JMdict_KanjiElement"("entryId", "reading")
-- ) WITHOUT ROWID;
CREATE INDEX "XREF__JMdict_KanjiElement__KANJIDIC_Character__byEntryId_byReading" ON "XREF__JMdict_KanjiElement__KANJIDIC_Character"("entryId", "reading");
CREATE INDEX "XREF__JMdict_KanjiElement__KANJIDIC_Character__byKanji" ON "XREF__JMdict_KanjiElement__KANJIDIC_Character"("kanji");
-- CREATE INDEX "XREF__JMdict_KanjiElement__KANJIDIC_Character__byEntryId_byReading" ON "XREF__JMdict_KanjiElement__KANJIDIC_Character"("entryId", "reading");
-- CREATE INDEX "XREF__JMdict_KanjiElement__KANJIDIC_Character__byKanji" ON "XREF__JMdict_KanjiElement__KANJIDIC_Character"("kanji");

View File

@@ -32,9 +32,9 @@ SELECT
THEN "JMdict_ReadingElement"."reading"
ELSE NULL
END AS "furigana",
COALESCE("JMdict_KanjiElement"."orderNum", 1)
COALESCE("JMdict_KanjiElement"."orderNum", 0)
+ "JMdict_ReadingElement"."orderNum"
= 2
= 0
AS "isFirst",
"JMdict_KanjiElement"."orderNum" AS "kanjiOrderNum",
"JMdict_ReadingElement"."orderNum" AS "readingOrderNum"
@@ -65,9 +65,7 @@ JOIN "JMdict_KanjiElement"
ON "JMdict_KanjiElementFTS"."entryId" = "JMdict_KanjiElement"."entryId"
AND "JMdict_KanjiElementFTS"."reading" LIKE '%' || "JMdict_KanjiElement"."reading"
JOIN "JMdict_EntryScore"
ON "JMdict_EntryScore"."type" = 'kanji'
AND "JMdict_KanjiElement"."entryId" = "JMdict_EntryScore"."entryId"
AND "JMdict_KanjiElement"."reading" = "JMdict_EntryScore"."reading"
ON "JMdict_EntryScore"."elementId" = "JMdict_KanjiElement"."elementId"
WHERE "JMdict_EntryScore"."common" = 1;
@@ -77,7 +75,6 @@ SELECT DISTINCT "radical" FROM "RADKFILE";
CREATE VIEW "JMdict_CombinedEntryScore"
AS
SELECT
"JMdict_EntryScore"."entryId",
MAX("JMdict_EntryScore"."score") AS "score",
MAX("JMdict_EntryScore"."common") AS "common"
FROM "JMdict_EntryScore"

View File

@@ -6,6 +6,7 @@
jmdict,
radkfile,
kanjidic2,
tanos-jlpt,
sqlite,
wal ? false,
}:
@@ -18,13 +19,32 @@ stdenvNoCC.mkDerivation {
sqlite
];
env = {
JMDICT_VERSION = jmdict.version;
JMDICT_DATE = jmdict.date;
JMDICT_HASH = jmdict.hash;
KANJIDIC_VERSION = kanjidic2.version;
KANJIDIC_DATE = kanjidic2.date;
KANJIDIC_HASH = kanjidic2.hash;
RADKFILE_VERSION = radkfile.version;
RADKFILE_DATE = radkfile.date;
RADKFILE_HASH = radkfile.hash;
TANOS_JLPT_VERSION = tanos-jlpt.version;
TANOS_JLPT_DATE = tanos-jlpt.date;
TANOS_JLPT_HASH = tanos-jlpt.hash;
};
buildPhase = ''
runHook preBuild
mkdir -p data/tmp
ln -s "${jmdict}"/* data/tmp
ln -s "${radkfile}"/* data/tmp
ln -s "${kanjidic2}"/* data/tmp
mkdir -p data
ln -s '${jmdict}'/* data/
ln -s '${kanjidic2}'/* data/
ln -s '${radkfile}'/* data/
ln -s '${tanos-jlpt}' data/tanos-jlpt
for migration in migrations/*.sql; do
sqlite3 jadb.sqlite < "$migration"

View File

@@ -7,6 +7,29 @@ buildDartApplication {
version = "1.0.0";
inherit src;
dartEntryPoints."bin/jadb" = "bin/jadb.dart";
# NOTE: the default Dart hooks use `dart compile`, which cannot invoke the new
# Dart build hooks required by package:sqlite3 >= 3.0.0, so we override these
# phases to use `dart build` instead.
buildPhase = ''
runHook preBuild
mkdir -p "$out/bin"
dart build cli --target "bin/jadb.dart"
runHook postBuild
'';
installPhase = ''
runHook preInstall
mkdir -p "$out"
mv build/cli/*/bundle/* "$out/"
runHook postInstall
'';
autoPubspecLock = ../pubspec.lock;
meta.mainProgram = "jadb";

View File

@@ -1,46 +0,0 @@
{
stdenvNoCC,
jmdict-src,
jmdict-with-examples-src,
xmlformat,
gzip,
edrdgMetadata,
}:
stdenvNoCC.mkDerivation {
name = "jmdict";
dontUnpack = true;
srcs = [
jmdict-src
jmdict-with-examples-src
];
nativeBuildInputs = [
gzip
xmlformat
];
buildPhase = ''
runHook preBuild
gzip -dkc "${jmdict-src}" > JMdict.xml
gzip -dkc "${jmdict-with-examples-src}" > JMdict_with_examples.xml
xmlformat -i JMdict.xml
xmlformat -i JMdict_with_examples.xml
runHook postBuild
'';
installPhase = ''
runHook preInstall
install -Dt "$out" JMdict.xml JMdict_with_examples.xml
runHook postInstall
'';
meta = edrdgMetadata // {
description = "A Japanese-Multilingual Dictionary providing lexical data for Japanese words";
homepage = "https://www.edrdg.org/jmdict/j_jmdict.html";
};
}

View File

@@ -1,40 +0,0 @@
{
stdenvNoCC,
kanjidic2-src,
xmlformat,
gzip,
edrdgMetadata,
}:
stdenvNoCC.mkDerivation {
name = "kanjidic2";
src = kanjidic2-src;
dontUnpack = true;
nativeBuildInputs = [
gzip
xmlformat
];
buildPhase = ''
runHook preBuild
gzip -dkc "${kanjidic2-src}" > kanjidic2.xml
xmlformat -i kanjidic2.xml
runHook postBuild
'';
installPhase = ''
runHook preInstall
install -Dt "$out" kanjidic2.xml
runHook postInstall
'';
meta = edrdgMetadata // {
description = "A consolidated XML-format kanji database";
homepage = "https://www.edrdg.org/kanjidic/kanjd2index_legacy.html";
};
}

View File

@@ -1,40 +0,0 @@
{
stdenv,
radkfile-src,
gzip,
iconv,
edrdgMetadata,
}:
stdenv.mkDerivation {
name = "radkfile";
src = radkfile-src;
dontUnpack = true;
nativeBuildInputs = [
gzip
iconv
];
buildPhase = ''
runHook preBuild
gzip -dkc "$src" > radkfile
iconv -f EUC-JP -t UTF-8 -o radkfile_utf8 radkfile
runHook postBuild
'';
installPhase = ''
runHook preInstall
install -Dt "$out" radkfile_utf8
runHook postInstall
'';
meta = edrdgMetadata // {
description = "A file providing searchable decompositions of kanji characters";
homepage = "https://www.edrdg.org/krad/kradinf.html";
};
}

View File

@@ -5,18 +5,18 @@ packages:
dependency: transitive
description:
name: _fe_analyzer_shared
sha256: e55636ed79578b9abca5fecf9437947798f5ef7456308b5cb85720b793eac92f
sha256: "8d718c5c58904f9937290fd5dbf2d6a0e02456867706bfb6cd7b81d394e738d5"
url: "https://pub.dev"
source: hosted
version: "82.0.0"
version: "98.0.0"
analyzer:
dependency: transitive
description:
name: analyzer
sha256: "904ae5bb474d32c38fb9482e2d925d5454cda04ddd0e55d2e6826bc72f6ba8c0"
sha256: "6141ad5d092d1e1d13929c0504658bbeccc1703505830d7c26e859908f5efc88"
url: "https://pub.dev"
source: hosted
version: "7.4.5"
version: "12.0.0"
args:
dependency: "direct main"
description:
@@ -29,10 +29,18 @@ packages:
dependency: transitive
description:
name: async
sha256: "758e6d74e971c3e5aceb4110bfd6698efc7f501675bcfe0c775459a8140750eb"
sha256: e2eb0491ba5ddb6177742d2da23904574082139b07c1e33b8503b9f46f3e1a37
url: "https://pub.dev"
source: hosted
version: "2.13.0"
version: "2.13.1"
benchmark_harness:
dependency: "direct dev"
description:
name: benchmark_harness
sha256: a2d3c4c83cac0126bf38e41eaf7bd9ed4f6635f1ee1a0cbc6f79fa9736c62cbd
url: "https://pub.dev"
source: hosted
version: "2.4.0"
boolean_selector:
dependency: transitive
description:
@@ -49,6 +57,14 @@ packages:
url: "https://pub.dev"
source: hosted
version: "0.2.0"
code_assets:
dependency: transitive
description:
name: code_assets
sha256: "83ccdaa064c980b5596c35dd64a8d3ecc68620174ab9b90b6343b753aa721687"
url: "https://pub.dev"
source: hosted
version: "1.0.0"
collection:
dependency: "direct main"
description:
@@ -69,42 +85,42 @@ packages:
dependency: transitive
description:
name: coverage
sha256: "802bd084fb82e55df091ec8ad1553a7331b61c08251eef19a508b6f3f3a9858d"
sha256: "5da775aa218eaf2151c721b16c01c7676fbfdd99cebba2bf64e8b807a28ff94d"
url: "https://pub.dev"
source: hosted
version: "1.13.1"
version: "1.15.0"
crypto:
dependency: transitive
description:
name: crypto
sha256: "1e445881f28f22d6140f181e07737b22f1e099a5e1ff94b0af2f9e4a463f4855"
sha256: c8ea0233063ba03258fbcf2ca4d6dadfefe14f02fab57702265467a19f27fadf
url: "https://pub.dev"
source: hosted
version: "3.0.6"
version: "3.0.7"
csv:
dependency: "direct main"
description:
name: csv
sha256: c6aa2679b2a18cb57652920f674488d89712efaf4d3fdf2e537215b35fc19d6c
sha256: "2e0a52fb729f2faacd19c9c0c954ff450bba37aa8ab999410309e2342e7013a2"
url: "https://pub.dev"
source: hosted
version: "6.0.0"
version: "8.0.0"
equatable:
dependency: "direct main"
description:
name: equatable
sha256: "567c64b3cb4cf82397aac55f4f0cbd3ca20d77c6c03bedbc4ceaddc08904aef7"
sha256: "3e0141505477fd8ad55d6eb4e7776d3fe8430be8e497ccb1521370c3f21a3e2b"
url: "https://pub.dev"
source: hosted
version: "2.0.7"
version: "2.0.8"
ffi:
dependency: transitive
description:
name: ffi
sha256: "289279317b4b16eb2bb7e271abccd4bf84ec9bdcbe999e278a94b804f5630418"
sha256: "6d7fd89431262d8f3125e81b50d3847a091d846eafcd4fdb88dd06f36d705a45"
url: "https://pub.dev"
source: hosted
version: "2.1.4"
version: "2.2.0"
file:
dependency: transitive
description:
@@ -129,6 +145,14 @@ packages:
url: "https://pub.dev"
source: hosted
version: "2.1.3"
hooks:
dependency: transitive
description:
name: hooks
sha256: e79ed1e8e1929bc6ecb6ec85f0cb519c887aa5b423705ded0d0f2d9226def388
url: "https://pub.dev"
source: hosted
version: "1.0.2"
http_multi_server:
dependency: transitive
description:
@@ -153,22 +177,14 @@ packages:
url: "https://pub.dev"
source: hosted
version: "1.0.5"
js:
dependency: transitive
description:
name: js
sha256: "53385261521cc4a0c4658fd0ad07a7d14591cf8fc33abbceae306ddb974888dc"
url: "https://pub.dev"
source: hosted
version: "0.7.2"
lints:
dependency: "direct dev"
description:
name: lints
sha256: c35bb79562d980e9a453fc715854e1ed39e24e7d0297a880ef54e17f9874a9d7
sha256: "12f842a479589fea194fe5c5a3095abc7be0c1f2ddfa9a0e76aed1dbd26a87df"
url: "https://pub.dev"
source: hosted
version: "5.1.1"
version: "6.1.0"
logging:
dependency: transitive
description:
@@ -181,18 +197,18 @@ packages:
dependency: transitive
description:
name: matcher
sha256: dc58c723c3c24bf8d3e2d3ad3f2f9d7bd9cf43ec6feaa64181775e60190153f2
sha256: dc0b7dc7651697ea4ff3e69ef44b0407ea32c487a39fff6a4004fa585e901861
url: "https://pub.dev"
source: hosted
version: "0.12.17"
version: "0.12.19"
meta:
dependency: transitive
description:
name: meta
sha256: "23f08335362185a5ea2ad3a4e597f1375e78bce8a040df5c600c8d3552ef2394"
sha256: df0c643f44ad098eb37988027a8e2b2b5a031fd3977f06bbfd3a76637e8df739
url: "https://pub.dev"
source: hosted
version: "1.17.0"
version: "1.18.2"
mime:
dependency: transitive
description:
@@ -201,6 +217,14 @@ packages:
url: "https://pub.dev"
source: hosted
version: "2.0.0"
native_toolchain_c:
dependency: transitive
description:
name: native_toolchain_c
sha256: "6ba77bb18063eebe9de401f5e6437e95e1438af0a87a3a39084fbd37c90df572"
url: "https://pub.dev"
source: hosted
version: "0.17.6"
node_preamble:
dependency: transitive
description:
@@ -218,7 +242,7 @@ packages:
source: hosted
version: "2.2.0"
path:
dependency: transitive
dependency: "direct main"
description:
name: path
sha256: "75cca69d1490965be98c73ceaea117e8a04dd21217b37b292c9ddbec0d955bc5"
@@ -229,18 +253,18 @@ packages:
dependency: transitive
description:
name: petitparser
sha256: "07c8f0b1913bcde1ff0d26e57ace2f3012ccbf2b204e070290dad3bb22797646"
sha256: "91bd59303e9f769f108f8df05e371341b15d59e995e6806aefab827b58336675"
url: "https://pub.dev"
source: hosted
version: "6.1.0"
version: "7.0.2"
pool:
dependency: transitive
description:
name: pool
sha256: "20fe868b6314b322ea036ba325e6fc0711a22948856475e2c2b6306e8ab39c2a"
+      sha256: "978783255c543aa3586a1b3c21f6e9d720eb315376a915872c61ef8b5c20177d"
       url: "https://pub.dev"
     source: hosted
-    version: "1.5.1"
+    version: "1.5.2"
   pub_semver:
     dependency: transitive
     description:
@@ -301,34 +325,34 @@ packages:
     dependency: transitive
     description:
       name: source_span
-      sha256: "254ee5351d6cb365c859e20ee823c3bb479bf4a293c22d17a9f1bf144ce86f7c"
+      sha256: "56a02f1f4cd1a2d96303c0144c93bd6d909eea6bee6bf5a0e0b685edbd4c47ab"
       url: "https://pub.dev"
     source: hosted
-    version: "1.10.1"
+    version: "1.10.2"
   sqflite_common:
     dependency: "direct main"
     description:
       name: sqflite_common
-      sha256: "84731e8bfd8303a3389903e01fb2141b6e59b5973cacbb0929021df08dddbe8b"
+      sha256: "6ef422a4525ecc601db6c0a2233ff448c731307906e92cabc9ba292afaae16a6"
       url: "https://pub.dev"
     source: hosted
-    version: "2.5.5"
+    version: "2.5.6"
   sqflite_common_ffi:
     dependency: "direct main"
     description:
       name: sqflite_common_ffi
-      sha256: "1f3ef3888d3bfbb47785cc1dda0dc7dd7ebd8c1955d32a9e8e9dae1e38d1c4c1"
+      sha256: c59fcdc143839a77581f7a7c4de018e53682408903a0a0800b95ef2dc4033eff
       url: "https://pub.dev"
     source: hosted
-    version: "2.3.5"
+    version: "2.4.0+2"
   sqlite3:
-    dependency: transitive
+    dependency: "direct main"
     description:
       name: sqlite3
-      sha256: "310af39c40dd0bb2058538333c9d9840a2725ae0b9f77e4fd09ad6696aa8f66e"
+      sha256: caa693ad15a587a2b4fde093b728131a1827903872171089dedb16f7665d3a91
       url: "https://pub.dev"
     source: hosted
-    version: "2.7.5"
+    version: "3.2.0"
   stack_trace:
     dependency: transitive
     description:
@@ -357,10 +381,10 @@ packages:
     dependency: transitive
     description:
       name: synchronized
-      sha256: "0669c70faae6270521ee4f05bffd2919892d42d1276e6c495be80174b6bc0ef6"
+      sha256: c254ade258ec8282947a0acbbc90b9575b4f19673533ee46f2f6e9b3aeefd7c0
       url: "https://pub.dev"
     source: hosted
-    version: "3.3.1"
+    version: "3.4.0"
   term_glyph:
     dependency: transitive
     description:
@@ -373,26 +397,26 @@ packages:
     dependency: "direct dev"
     description:
       name: test
-      sha256: "0561f3a2cfd33d10232360f16dfcab9351cfb7ad9b23e6cd6e8c7fb0d62c7ac3"
+      sha256: "8d9ceddbab833f180fbefed08afa76d7c03513dfdba87ffcec2718b02bbcbf20"
       url: "https://pub.dev"
     source: hosted
-    version: "1.26.1"
+    version: "1.31.0"
   test_api:
     dependency: transitive
     description:
       name: test_api
-      sha256: "522f00f556e73044315fa4585ec3270f1808a4b186c936e612cab0b565ff1e00"
+      sha256: "949a932224383300f01be9221c39180316445ecb8e7547f70a41a35bf421fb9e"
       url: "https://pub.dev"
     source: hosted
-    version: "0.7.6"
+    version: "0.7.11"
   test_core:
     dependency: transitive
     description:
       name: test_core
-      sha256: "8619a9a45be044b71fe2cd6b77b54fd60f1c67904c38d48706e2852a2bda1c60"
+      sha256: "1991d4cfe85d5043241acac92962c3977c8d2f2add1ee73130c7b286417d1d34"
       url: "https://pub.dev"
     source: hosted
-    version: "0.6.10"
+    version: "0.6.17"
   typed_data:
     dependency: transitive
     description:
@@ -405,18 +429,18 @@ packages:
     dependency: transitive
     description:
       name: vm_service
-      sha256: ddfa8d30d89985b96407efce8acbdd124701f96741f2d981ca860662f1c0dc02
+      sha256: "45caa6c5917fa127b5dbcfbd1fa60b14e583afdc08bfc96dda38886ca252eb60"
       url: "https://pub.dev"
     source: hosted
-    version: "15.0.0"
+    version: "15.0.2"
   watcher:
     dependency: transitive
     description:
       name: watcher
-      sha256: "69da27e49efa56a15f8afe8f4438c4ec02eff0a117df1b22ea4aad194fe1c104"
+      sha256: "1398c9f081a753f9226febe8900fce8f7d0a67163334e1c94a2438339d79d635"
       url: "https://pub.dev"
     source: hosted
-    version: "1.1.1"
+    version: "1.2.1"
   web:
     dependency: transitive
     description:
@@ -453,10 +477,10 @@ packages:
     dependency: "direct main"
     description:
       name: xml
-      sha256: b015a8ad1c488f66851d762d3090a21c600e479dc75e68328c52774040cf9226
+      sha256: "971043b3a0d3da28727e40ed3e0b5d18b742fa5a68665cca88e74b7876d5e025"
       url: "https://pub.dev"
     source: hosted
-    version: "6.5.0"
+    version: "6.6.1"
   yaml:
     dependency: transitive
     description:
@@ -466,4 +490,4 @@ packages:
     source: hosted
     version: "3.1.3"
 sdks:
-  dart: ">=3.7.0 <4.0.0"
+  dart: ">=3.10.1 <4.0.0"


@@ -4,24 +4,32 @@ version: 1.0.0
 homepage: https://git.pvv.ntnu.no/oysteikt/jadb
 environment:
-  sdk: '>=3.2.0 <4.0.0'
+  sdk: '^3.9.0'
 dependencies:
   args: ^2.7.0
   collection: ^1.19.0
-  csv: ^6.0.0
+  csv: ^8.0.0
   equatable: ^2.0.0
   path: ^1.9.1
   sqflite_common: ^2.5.0
   sqflite_common_ffi: ^2.3.0
+  sqlite3: ^3.1.6
   xml: ^6.5.0
 dev_dependencies:
-  lints: ^5.0.0
+  benchmark_harness: ^2.4.0
+  lints: ^6.0.0
   test: ^1.25.15
 executables:
   jadb: jadb
+hooks:
+  user_defines:
+    sqlite3:
+      source: system
+topics:
+  - database
+  - dictionary


@@ -0,0 +1,21 @@
+import 'package:collection/collection.dart';
+import 'package:jadb/const_data/kanji_grades.dart';
+import 'package:test/test.dart';
+
+void main() {
+  test('All constant kanji in jouyouKanjiByGrades are 2136 in total', () {
+    expect(jouyouKanjiByGrades.values.flattenedToSet.length, 2136);
+  });
+
+  // test('All constant kanji in jouyouKanjiByGrades are present in KANJIDIC2', () {
+  // });
+
+  // test('All constant kanji in jouyouKanjiByGrades have matching grade as in KANJIDIC2', () {
+  // });
+
+  // test('All constant kanji in jouyouKanjiByGradesAndStrokeCount have matching stroke count as in KANJIDIC2', () {
+  // });
+}


@@ -0,0 +1,17 @@
+import 'package:collection/collection.dart';
+import 'package:jadb/const_data/radicals.dart';
+import 'package:test/test.dart';
+
+void main() {
+  test('All constant radicals are 253 in total', () {
+    expect(radicals.values.flattenedToSet.length, 253);
+  });
+
+  // test('All constant radicals have at least 1 associated kanji in KANJIDIC2', () {
+  // });
+
+  // test('All constant radicals match the stroke order listed in KANJIDIC2', () {
+  // });
+}
