Word search: kana type independence #23

Open
opened 2025-04-23 12:46:08 +02:00 by oysteikt · 5 comments
Owner

Tables from 0007_JMdict_Entry_lookup_tables.sql are currently left empty

oysteikt added the data source label 2025-05-21 14:28:17 +02:00
Author
Owner

This table does not exist anymore, repurposing issue.

oysteikt changed title from Fill JMDict entry lookup tables to Word search: kana type independence 2025-05-22 16:10:09 +02:00
Author
Owner

You should be able to search for katakana words with hiragana and vice versa. This goes not only for raw kana, but also when mixed with kanji, as well as when hiragana and katakana are mixed in the search term or the result term.
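A minimal sketch of what "kana type independence" could mean in practice, assuming we fold katakana into hiragana before comparing. The names `to_hiragana` and `kana_equal` are hypothetical and not part of the codebase; Python is used only for illustration. It relies on the Unicode layout, where katakana ァ..ヶ (U+30A1..U+30F6) sit exactly 0x60 above their hiragana counterparts.

```python
# Hypothetical helpers, not part of the project: fold katakana into hiragana
# so that both scripts compare equal. Kanji, the long vowel mark (ー) and
# everything else outside the range are passed through unchanged.
KATAKANA_START = 0x30A1  # ァ
KATAKANA_END = 0x30F6    # ヶ
KANA_OFFSET = 0x60       # distance from a katakana code point to its hiragana twin


def to_hiragana(text: str) -> str:
    """Return `text` with every katakana in ァ..ヶ replaced by hiragana."""
    return "".join(
        chr(ord(ch) - KANA_OFFSET)
        if KATAKANA_START <= ord(ch) <= KATAKANA_END
        else ch
        for ch in text
    )


def kana_equal(a: str, b: str) -> bool:
    """Compare two terms while ignoring the hiragana/katakana distinction."""
    return to_hiragana(a) == to_hiragana(b)
```

Because non-kana characters pass through untouched, the same comparison covers terms that mix kanji with either kana type, as well as terms that mix both kana types.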

oysteikt added the search-ux label 2025-05-22 16:28:35 +02:00
oysteikt added this to the Kanban project 2025-06-23 10:33:19 +02:00
oysteikt moved this to High pri in Kanban on 2025-06-23 10:52:01 +02:00
Author
Owner

Maybe we can use kana matching to adjust the result score?

Could we make the FTS tables hiragana-only and transliterate the search input before querying? I suppose we'd need #52 in place to avoid storing the reading twice.
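A rough sketch of the score-adjustment idea, assuming matching is done on a hiragana-folded form and an exact kana-type match is then ranked higher. `to_hiragana`, `rank_readings` and the two weights are hypothetical, and plain equality stands in for the actual FTS lookup.

```python
def to_hiragana(text: str) -> str:
    """Fold katakana (U+30A1..U+30F6) into hiragana; leave the rest as-is."""
    return "".join(
        chr(ord(ch) - 0x60) if 0x30A1 <= ord(ch) <= 0x30F6 else ch
        for ch in text
    )


# Hypothetical weights: a result written exactly like the query outranks
# one that only matches after folding.
EXACT_SCORE = 1.0
FOLDED_SCORE = 0.5


def rank_readings(query: str, readings: list[str]) -> list[str]:
    """Return readings matching `query` regardless of kana type, best first."""
    folded_query = to_hiragana(query)
    hits = []
    for reading in readings:
        if to_hiragana(reading) != folded_query:
            continue  # not a match even after folding
        score = EXACT_SCORE if reading == query else FOLDED_SCORE
        hits.append((score, reading))
    # sorted() is stable, so equally scored hits keep their input order.
    return [reading for _, reading in sorted(hits, key=lambda hit: -hit[0])]
```

With this shape, the index only ever sees folded text, and the original reading is needed solely for the scoring step.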

Author
Owner

This does not seem to be trivial, because:

  • The FTS tables are automatically updated via triggers, based on the Reading/Kanji Element tables
  • The Reading/Kanji Element tables have mixed kana
  • Transliterating them in the trigger requires one of:
    • A custom extension for transliterating kana
    • A Dart function registered via db.createFunction (but then we can't work on the db locally anymore)
    • Sidestepping the issue entirely with FTS4 ICU support (but we'd need to compile and ship our own libsqlite.so file, since upstream does not build this in)
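To make the application-defined function option concrete, here is a small sketch using Python's `sqlite3.Connection.create_function`, which plays the same role as `db.createFunction` in the Dart bindings. The table names `reading_element` and `reading_index`, the trigger name, and the helper names are made up for illustration and do not reflect the real schema; a plain table stands in for the FTS table.

```python
import sqlite3


def to_hiragana(text: str) -> str:
    """Fold katakana (U+30A1..U+30F6) into hiragana; leave the rest as-is."""
    return "".join(
        chr(ord(ch) - 0x60) if 0x30A1 <= ord(ch) <= 0x30F6 else ch
        for ch in text
    )


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open a connection with to_hiragana registered and the schema in place."""
    conn = sqlite3.connect(path)
    # The function lives on the connection, not in the database file.
    conn.create_function("to_hiragana", 1, to_hiragana)
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS reading_element (reading TEXT NOT NULL);
        CREATE TABLE IF NOT EXISTS reading_index (reading_folded TEXT NOT NULL);
        -- Hypothetical trigger: keep a hiragana-only copy of every reading.
        CREATE TRIGGER IF NOT EXISTS reading_element_ai
        AFTER INSERT ON reading_element
        BEGIN
            INSERT INTO reading_index (reading_folded)
            VALUES (to_hiragana(NEW.reading));
        END;
        """
    )
    return conn


def search(conn: sqlite3.Connection, query: str) -> list[str]:
    """Look up a term after folding the search input the same way."""
    rows = conn.execute(
        "SELECT reading_folded FROM reading_index "
        "WHERE reading_folded = to_hiragana(?)",
        (query,),
    )
    return [row[0] for row in rows]
```

This also shows the drawback noted in the list: since the function is registered per connection, any other client that opens the same file without registering it (the `sqlite3` CLI, for instance) would be expected to fail on writes to `reading_element` with a "no such function" error.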
Author
Owner

If #53 results in rewriting the migrations as Dart extensions to deal with #54, Dart functions might actually not look that bad.

We could always just bundle our own libsqlite later if we need the speed of a custom sqlite extension.
