Create "furigana segmentation" (or "kanji/kana alignment") algorithm #46

Open
opened 2025-05-19 15:34:22 +02:00 by oysteikt · 1 comment
Owner

The algorithm should be able to do something like this:

Given kanji data

{
  '考' : {
    kunyomi: ['かんが.える', 'かんが.え'],
    onyomi: ['コウ']
  },
  '方' : {
    kunyomi: ['かた', '-かた', '-がた'],
    onyomi: ['ホウ']
  }
}

And word

{
  kanji: '考え方',
  reading: 'かんがえかた'
}

Output

[
  ("考", "かんが"),
  ("え", "え"),
  ("方", "かた"),
]

It only needs to be best effort. Care is needed with special cases like 今日 (きょう), where the reading belongs to the word as a whole rather than to the individual kanji (follow gikun, jukujikun tags?)
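
A minimal best-effort sketch of the idea, in Python, assuming the kanji data has the shape shown above. The function names (`align`, `candidate_readings`, `kata_to_hira`, `is_kana`) are made up for illustration, and this is not necessarily how Jisho or any existing library does it. It backtracks over the listed readings, strips okurigana (the part after `.`) and affix markers (`-`), and only accepts voiced variants such as `-がた` when they are listed explicitly. When no per-kanji alignment exists (as with 今日 / きょう), it falls back to one segment covering the whole word, which is an assumption about how such cases should be reported.

```python
def kata_to_hira(text: str) -> str:
    """Map katakana (ァ..ヶ) to hiragana via the fixed code point offset 0x60."""
    return "".join(chr(ord(c) - 0x60) if "ァ" <= c <= "ヶ" else c for c in text)


def is_kana(ch: str) -> bool:
    """True for characters in the hiragana and katakana Unicode blocks."""
    return "\u3040" <= ch <= "\u30ff"


def candidate_readings(entry: dict) -> list[str]:
    """Readings one kanji may cover, without okurigana or affix markers."""
    stems = []
    for r in entry.get("kunyomi", []):
        stems.append(r.strip("-").split(".")[0])  # 'かんが.える' -> 'かんが'
    for r in entry.get("onyomi", []):
        stems.append(kata_to_hira(r.strip("-")))  # 'コウ' -> 'こう'
    unique = [s for s in dict.fromkeys(stems) if s]  # dedupe, keep order
    return sorted(unique, key=len, reverse=True)     # try longest first


def align(word: str, reading: str, kanji_data: dict) -> list[tuple[str, str]]:
    """Pair each character of `word` with its share of `reading`."""
    reading = kata_to_hira(reading)

    def solve(i: int, j: int):
        # i indexes into word, j into reading; returns a list of pairs or None.
        if i == len(word):
            return [] if j == len(reading) else None
        ch = word[i]
        if is_kana(ch):
            # Kana in the written form must appear verbatim in the reading.
            if j < len(reading) and kata_to_hira(ch) == reading[j]:
                rest = solve(i + 1, j + 1)
                if rest is not None:
                    return [(ch, reading[j])] + rest
            return None
        for stem in candidate_readings(kanji_data.get(ch, {})):
            if reading.startswith(stem, j):
                rest = solve(i + 1, j + len(stem))
                if rest is not None:
                    return [(ch, stem)] + rest
        return None  # no listed reading fits here; caller backtracks

    result = solve(0, 0)
    # Assumed fallback for jukujikun-like words: one segment for the whole word.
    return result if result is not None else [(word, reading)]


KANJI_DATA = {
    "考": {"kunyomi": ["かんが.える", "かんが.え"], "onyomi": ["コウ"]},
    "方": {"kunyomi": ["かた", "-かた", "-がた"], "onyomi": ["ホウ"]},
}

print(align("考え方", "かんがえかた", KANJI_DATA))
# [('考', 'かんが'), ('え', 'え'), ('方', 'かた')]
```

This sketch does not generate rendaku or sokuon variants by itself, and it does not split a word into an aligned part plus an irregular part; both would need extra rules or the dictionary's gikun/jukujikun tags.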

Not sure if this already exists as a classical or AI solution, or whether the problem has a name. Jisho seems to be able to do this effortlessly somehow.

Author
Owner
Keywords: furigana segmentation, kanji-kana alignment

Links:
- https://github.com/JMdictProject/JMdictIssues/issues/122
- https://github.com/hlorenzi/jisho-open/blob/main/common/src/furigana.ts
- https://docs.rs/furigana/latest/src/furigana/lib.rs.html
- Kuromoji seems to encourage using kanjidic to do this at home: https://github.com/atilika/kuromoji/issues/121
oysteikt added the feature request label 2025-05-21 14:29:47 +02:00
oysteikt added a new dependency 2025-05-22 16:15:38 +02:00
oysteikt changed title from Create an algorithm to split a kanji word and map each kanji to its respective kana in the reading to Create "furigana segmentation" (or "kanji/kana alignment") algorithm 2025-05-22 16:16:49 +02:00
oysteikt added this to the Kanban project 2025-06-23 10:33:10 +02:00
oysteikt moved this to Mid pri in Kanban on 2025-06-23 10:43:21 +02:00