Database

Here are some of the choices made when designing the schema.

`JMdict_{Reading,Kanji}Element.elementId` and `JMdict_Sense.senseId`

The elementId/senseId field is a unique identifier for each individual element in these tables. It packs the (entryId, orderNum) pair into a single number, where entryId is given 7 digits and orderNum is given 2 digits (the highest count found so far is 40). Since entryId is already a field in the table, it would technically have been fine to store orderNum as a separate field, but a single-column identifier lets other tables refer to an element without a composite foreign key.
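As a minimal sketch of the packing described above, assuming it is plain decimal packing (entryId multiplied by 100, plus orderNum): the helper names `pack_element_id` and `unpack_element_id` are hypothetical and not taken from the codebase, and whether orderNum starts at 0 or 1 is not stated here, so the sketch accepts both.

```python
from typing import Tuple

ORDER_DIGITS = 2                 # orderNum is given 2 digits
ORDER_BASE = 10 ** ORDER_DIGITS  # 100
ENTRY_LIMIT = 10 ** 7            # entryId is given 7 digits


def pack_element_id(entry_id: int, order_num: int) -> int:
    """Pack (entryId, orderNum) into one integer (assumed decimal layout)."""
    if not 0 <= entry_id < ENTRY_LIMIT:
        raise ValueError("entryId does not fit in 7 digits")
    if not 0 <= order_num < ORDER_BASE:
        raise ValueError("orderNum does not fit in 2 digits")
    return entry_id * ORDER_BASE + order_num


def unpack_element_id(element_id: int) -> Tuple[int, int]:
    """Recover (entryId, orderNum) from a packed id."""
    return divmod(element_id, ORDER_BASE)


packed = pack_element_id(1000220, 3)
print(packed)                     # 100022003
print(unpack_element_id(packed))  # (1000220, 3)
```

With two digits reserved, an orderNum of up to 99 fits, which leaves headroom above the current maximum of 40.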

(NOTE: entryId is now derived from elementId within SQLite using a generated column, so saying it is "stored in a separate field" might be a stretch.)
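The following is a rough sketch of how such a generated column could look, not the project's actual schema. It assumes the decimal packing described above (so entryId is elementId divided by 100), uses a hypothetical `reading` column as filler, and needs SQLite 3.31 or newer for generated columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE JMdict_KanjiElement (
        elementId INTEGER PRIMARY KEY,
        -- Derived on read, never stored: integer division drops orderNum.
        entryId   INTEGER GENERATED ALWAYS AS (elementId / 100) VIRTUAL,
        reading   TEXT NOT NULL  -- hypothetical column, for illustration only
    )
    """
)

# Only elementId is supplied; entryId cannot be written directly.
conn.execute(
    "INSERT INTO JMdict_KanjiElement (elementId, reading) VALUES (?, ?)",
    (100022003, "example"),
)

row = conn.execute(
    "SELECT elementId, entryId FROM JMdict_KanjiElement"
).fetchone()
print(row)  # (100022003, 1000220)
```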

We used to generate the elementId as a sequential id, independent of orderNum, but this led to all values shifting whenever the data was updated, producing very large diffs. Making it a unique composite of values coming from the source data itself means that the values stay stable across updates.

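Because orderNum occupies the lowest digits of the packed value, sorting by elementId also sorts the elements of an entry by orderNum, so the elementId can be used as the ordering number as well.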
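To illustrate the ordering property, here is a small check, again assuming the decimal packing of entryId times 100 plus orderNum. The `pack` helper and the sample pairs are made up for the example.

```python
def pack(entry_id, order_num):
    # Assumed layout: 7 digits of entryId followed by 2 digits of orderNum.
    return entry_id * 100 + order_num


pairs = [(1000220, 2), (1000110, 1), (1000220, 1), (1000110, 3)]

# Sorting by the (entryId, orderNum) pair ...
by_pair = sorted(pairs)
# ... gives the same order as sorting by the packed id alone.
by_packed = sorted(pairs, key=lambda p: pack(*p))

print(by_pair == by_packed)  # True
```

This holds as long as orderNum stays below 100, since a larger value would spill into the entryId digits.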

`JMdict_EntryScore`

The JMdict_EntryScore table is used to store the score of each entry, which is used for sorting search results. The score is calculated based on a number of variables.

The table is generated automatically from other tables via triggers and should be treated as a materialized view.

There is a score row for every single entry in both JMdict_KanjiElement and JMdict_ReadingElement, split by the type field.
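A trigger-maintained score table might be sketched roughly as follows. Everything beyond the table names is an assumption made for illustration: the `kanji` and `isCommon` columns, the `'k'` type value, the trigger names, and above all the placeholder score formula, which does not reflect how scores are really calculated. Only the kanji side is shown; the reading side would follow the same pattern with its own type value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE JMdict_KanjiElement (
        elementId INTEGER PRIMARY KEY,
        kanji     TEXT NOT NULL,              -- hypothetical column
        isCommon  INTEGER NOT NULL DEFAULT 0  -- hypothetical column
    );

    CREATE TABLE JMdict_EntryScore (
        type      TEXT NOT NULL,     -- hypothetical value: 'k' for kanji rows
        elementId INTEGER NOT NULL,
        score     INTEGER NOT NULL,
        PRIMARY KEY (type, elementId)
    );

    -- Keep the score table in sync, like a materialized view.
    CREATE TRIGGER kanji_score_insert
    AFTER INSERT ON JMdict_KanjiElement
    BEGIN
        INSERT INTO JMdict_EntryScore (type, elementId, score)
        VALUES ('k', NEW.elementId, NEW.isCommon * 100);  -- placeholder formula
    END;

    CREATE TRIGGER kanji_score_delete
    AFTER DELETE ON JMdict_KanjiElement
    BEGIN
        DELETE FROM JMdict_EntryScore
        WHERE type = 'k' AND elementId = OLD.elementId;
    END;
    """
)

conn.execute("INSERT INTO JMdict_KanjiElement VALUES (100022001, 'A', 1)")
conn.execute("INSERT INTO JMdict_KanjiElement VALUES (100022002, 'B', 0)")

query = "SELECT type, elementId, score FROM JMdict_EntryScore ORDER BY elementId"
scores = conn.execute(query).fetchall()
print(scores)  # [('k', 100022001, 100), ('k', 100022002, 0)]

# Deleting a source row removes its score row as well.
conn.execute("DELETE FROM JMdict_KanjiElement WHERE elementId = 100022002")
remaining = conn.execute(query).fetchall()
print(remaining)  # [('k', 100022001, 100)]
```

Because the score rows are only ever written by triggers, application code can treat JMdict_EntryScore as read-only.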
