Renormalize KANJIDIC radical data #42

Closed
opened 2025-05-19 10:13:42 +02:00 by oysteikt · 0 comments
Owner

The radical data coming from KANJIDIC is heavily denormalized. The following things should be done:

  • We almost never use the nelson_c radicals, yet they are stored as separate rows. There appears to be a one-to-one mapping between classical and nelson_c radicals. Instead of storing a copy of each radical, let's collect the mapping as we go, store the classical number by default, and keep a classical <-> nelson_c mapping table in case we ever need it.

  • Radical readings are stored in the misc field for some reason, but they are always the same for a given radical, and a radical can have several of them. Redo the RadicalName table so that it uses the classical radical number instead of kanji as its JOIN point.
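
A rough sketch of what the two points above could look like, using Python's `sqlite3` module. Everything here is an assumption made for illustration: the table names `KanjiRadical` and `RadicalNelsonC`, the column layout, the `build_radical_tables` helper, and the placeholder rows in `SAMPLE` (which are not real KANJIDIC values). Only `RadicalName`, `classical`, `nelson_c`, and `kanji` come from this issue. The `UNIQUE` constraint and the explicit check are there because the one-to-one mapping is only suspected, not confirmed.

```python
import sqlite3

# Placeholder rows, NOT real KANJIDIC data:
# (kanji, classical radical, nelson_c radical or None, radical names)
SAMPLE = [
    ("kanji_a", 7, 1, []),
    ("kanji_b", 7, 1, ["name_x", "name_y"]),
    ("kanji_c", 9, None, ["name_z"]),
    ("kanji_d", 7, None, ["name_x"]),
]


def build_radical_tables(conn, entries):
    """Store one classical radical per kanji and collect the nelson_c mapping once."""
    conn.executescript(
        """
        CREATE TABLE KanjiRadical (
            kanji     TEXT PRIMARY KEY,
            classical INTEGER NOT NULL
        );
        -- Hypothetical mapping table; UNIQUE enforces the assumed one-to-one mapping.
        CREATE TABLE RadicalNelsonC (
            classical INTEGER PRIMARY KEY,
            nelson_c  INTEGER NOT NULL UNIQUE
        );
        -- Names are keyed on the classical radical number instead of the kanji.
        CREATE TABLE RadicalName (
            classical INTEGER NOT NULL,
            name      TEXT NOT NULL,
            PRIMARY KEY (classical, name)
        );
        """
    )
    mapping = {}  # classical -> nelson_c, collected as we go
    for kanji, classical, nelson_c, names in entries:
        conn.execute("INSERT INTO KanjiRadical VALUES (?, ?)", (kanji, classical))
        if nelson_c is not None:
            previous = mapping.setdefault(classical, nelson_c)
            if previous != nelson_c:
                raise ValueError(
                    f"classical {classical} maps to both {previous} and {nelson_c}"
                )
        for name in names:
            # The same name repeats for every kanji sharing the radical; keep one copy.
            conn.execute(
                "INSERT OR IGNORE INTO RadicalName VALUES (?, ?)", (classical, name)
            )
    conn.executemany("INSERT INTO RadicalNelsonC VALUES (?, ?)", mapping.items())
    conn.commit()
    return mapping


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    print(build_radical_tables(conn, SAMPLE))
```

With this layout, a kanji's radical names would be reached by joining `KanjiRadical.classical` to `RadicalName.classical`, and the `nelson_c` number is only looked up when it is actually needed.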

oysteikt added the space-reduction label 2025-05-22 16:31:52 +02:00
oysteikt added this to the Kanban project 2025-06-23 10:33:30 +02:00
oysteikt moved this to Finished in Kanban on 2025-06-23 10:33:51 +02:00