
Create "furigana segmentation" (or "kanji/kana alignment") algorithm #46

Open
Opened 2025-05-19 15:34:22 +02:00 by oysteikt · 1 comment
Owner

The algorithm should be able to do something like this:

Given kanji data

{
  '考' : {
    kunyomi: ['かんが.える', 'かんが.え'],
    onyomi: ['コウ']
  },
  '方' : {
    kunyomi: ['かた', '-かた', '-がた'],
    onyomi: ['ホウ']
  }
}

And word

{
  kanji: '考え方',
  reading: 'かんがえかた'
}

Output

[
  ("考", "かんが"),
  ("え", "え"),
  ("方", "かた"),
]

It only needs to be best effort. We need to be careful with special cases like 今日 (きょう), where the reading belongs to the word as a whole and cannot be split per kanji (follow the gikun/jukujikun tags?).

Not sure if this already exists as a classical or AI solution, or whether the problem has a name. Jisho seems to be able to do this effortlessly somehow.
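
To make the request concrete, here is a minimal best-effort sketch in Python, not a proposed final implementation. It assumes the kanji data has the shape shown above, that a `.` in a kunyomi separates the stem from the okurigana, and that a leading or trailing `-` marks an affix form. The names `to_hiragana`, `is_kana`, `candidate_readings` and `align` are hypothetical. It walks the written form left to right, tries each candidate reading per kanji, backtracks on mismatch, and falls back to one whole-word segment when nothing aligns (which is also what happens for 今日 when no per-kanji reading fits).

```python
KANJI_DATA = {
    "考": {"kunyomi": ["かんが.える", "かんが.え"], "onyomi": ["コウ"]},
    "方": {"kunyomi": ["かた", "-かた", "-がた"], "onyomi": ["ホウ"]},
}


def to_hiragana(text):
    """Map katakana (U+30A1..U+30F6) onto the matching hiragana."""
    return "".join(
        chr(ord(c) - 0x60) if "\u30a1" <= c <= "\u30f6" else c for c in text
    )


def is_kana(c):
    """True for hiragana, katakana and the long-vowel mark."""
    return "\u3041" <= c <= "\u3096" or "\u30a1" <= c <= "\u30fa" or c == "ー"


def candidate_readings(kanji, data):
    """Hiragana readings to try for one kanji, longest first."""
    entry = data.get(kanji, {})
    out = set()
    for r in entry.get("kunyomi", []):
        # Assumption: '-' marks affix forms, '.' starts the okurigana.
        out.add(r.strip("-").split(".")[0])
    for r in entry.get("onyomi", []):
        out.add(to_hiragana(r.strip("-")))
    out.discard("")
    return sorted(out, key=len, reverse=True)


def align(written, reading, data):
    """Return [(chunk, kana), ...]; fall back to a single whole-word pair."""
    reading = to_hiragana(reading)

    def walk(i, j):
        if i == len(written):
            return [] if j == len(reading) else None
        ch = written[i]
        if is_kana(ch):
            # Kana in the written form must appear verbatim in the reading.
            if j < len(reading) and to_hiragana(ch) == reading[j]:
                rest = walk(i + 1, j + 1)
                if rest is not None:
                    return [(ch, reading[j])] + rest
            return None
        for cand in candidate_readings(ch, data):
            if reading.startswith(cand, j):
                rest = walk(i + 1, j + len(cand))
                if rest is not None:
                    return [(ch, cand)] + rest
        return None

    result = walk(0, 0)
    # Best effort: unknown kanji or jukujikun-style words stay unsplit.
    return result if result is not None else [(written, reading)]


print(align("考え方", "かんがえかた", KANJI_DATA))
print(align("今日", "きょう", KANJI_DATA))
```

With the sample data, the first call yields the three pairs from the expected output above, and the second returns `[("今日", "きょう")]` because 今 and 日 have no entries. The sketch does not generate rendaku or sokuon variants on its own; it only matches readings that are listed in the data (such as `-がた`).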

Author
Owner
Keywords: furigana segmentation, kanji-kana alignment

Links:
- https://github.com/JMdictProject/JMdictIssues/issues/122
- https://github.com/hlorenzi/jisho-open/blob/main/common/src/furigana.ts
- https://docs.rs/furigana/latest/src/furigana/lib.rs.html
- Kuromoji seems to encourage using kanjidic to do this at home: https://github.com/atilika/kuromoji/issues/121
oysteikt added the feature request label 2025-05-21 14:29:47 +02:00
oysteikt added a new dependency 2025-05-22 16:15:38 +02:00
oysteikt changed the title from Create an algorithm to split a kanji word and map each kanji to its respective kana in the reading to Create "furigana segmentation" (or "kanji/kana alignment") algorithm 2025-05-22 16:16:49 +02:00
oysteikt added this to the Kanban project 2025-06-23 10:33:10 +02:00
oysteikt moved this to Mid pri in Kanban 2025-06-23 10:43:21 +02:00