jadb/docs/database.md
h7x4 6364457d9e
docs/database: add some notes about elementId embeddings
2026-04-08 19:07:48 +09:00


# Database
Here are some of the choices that were made when designing the schema.
### `JMdict_{Reading,Kanji}Element.elementId` and `JMdict_Sense.senseId`
The `elementId`/`senseId` field acts as a unique identifier for each individual element in these tables.
It is a decimal packing of the `(entryId, orderNum)` pair, where `entryId` is given 7 digits and `orderNum` is given 2 digits (the highest count found so far is `40`).
Since `entryId` is already a field in the table, it would technically have been fine to store `orderNum` as a separate field,
but a single packed id makes it easier to refer to these rows from other tables without a composite foreign key.
(NOTE: `entryId` is now inferred from `elementId` within SQLite using a generated column, so saying it is "stored in a separate field" might be a stretch.)
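
As a rough illustration of the generated-column approach, the sketch below derives `entryId` from `elementId` with integer division. It assumes the packing is `entryId * 100 + orderNum`; the table name `KanjiElementSketch` and the sample id are made up for the example, and the real schema has more columns. Generated columns require SQLite 3.31 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical, heavily simplified table (not the real JMdict_KanjiElement).
# Assumption: elementId = entryId * 100 + orderNum, so dropping the last
# two decimal digits recovers entryId.
conn.execute(
    """
    CREATE TABLE KanjiElementSketch (
        elementId INTEGER PRIMARY KEY,
        entryId   INTEGER GENERATED ALWAYS AS (elementId / 100) VIRTUAL
    )
    """
)

# Only elementId is written; entryId is computed by SQLite on read.
conn.execute(
    "INSERT INTO KanjiElementSketch (elementId) VALUES (?)", (100001001,)
)
row = conn.execute(
    "SELECT elementId, entryId FROM KanjiElementSketch"
).fetchone()
print(row)
```

Other tables can then reference `elementId` alone, while queries that need `entryId` can still select or filter on it.
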
The reading element ids additionally have `1000000000` added to them to keep them distinct from the kanji element ids. This reduces the amount of space needed for indices in some locations, because you can simply select either kind with `>` or `<`.
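
The packing and the reading offset can be sketched as follows. This is a minimal sketch, not the project's actual code: the helper names `pack_element_id` and `unpack_element_id` are hypothetical, and the arithmetic assumes the 7-digit/2-digit layout means `entryId * 100 + orderNum`.

```python
from typing import Tuple

# Offset added to reading element ids (value taken from the text above).
READING_OFFSET = 1_000_000_000


def pack_element_id(entry_id: int, order_num: int, *, reading: bool = False) -> int:
    """Pack (entryId, orderNum) into one integer (hypothetical helper)."""
    if not 0 <= entry_id < 10_000_000:
        raise ValueError("entryId must fit in 7 decimal digits")
    if not 0 <= order_num < 100:
        raise ValueError("orderNum must fit in 2 decimal digits")
    packed = entry_id * 100 + order_num
    return packed + READING_OFFSET if reading else packed


def unpack_element_id(element_id: int) -> Tuple[int, int, bool]:
    """Inverse of pack_element_id: returns (entryId, orderNum, is_reading)."""
    reading = element_id >= READING_OFFSET
    packed = element_id - READING_OFFSET if reading else element_id
    entry_id, order_num = divmod(packed, 100)
    return entry_id, order_num, reading


kanji_id = pack_element_id(1000010, 1)
reading_id = pack_element_id(1000010, 1, reading=True)
print(kanji_id, reading_id, unpack_element_id(reading_id))
```

Under this layout the largest unshifted id is `999999999`, so every id at or above `1000000000` belongs to a reading element, which is what makes a plain `>` or `<` comparison sufficient.
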
We used to generate the `elementId` separately from `orderNum` as a sequential id, but that caused all values
to shift whenever the data was updated, which produced very large diffs. Making it a unique composite of values coming
from the source data itself means that the ids stay stable across updates.
Due to the way the data is structured, we can use the `elementId` as the ordering number as well.
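
A quick check of the ordering claim, assuming the packing `entryId * 100 + orderNum` (the sample pairs and the `pack` helper are made up for the example): sorting by the packed id gives the same order as sorting by `(entryId, orderNum)`.

```python
# Hypothetical (entryId, orderNum) pairs, deliberately out of order.
pairs = [(1000020, 2), (1000010, 10), (1000010, 2), (1000020, 1)]


def pack(entry_id, order_num):
    # Assumed layout: orderNum occupies the two lowest decimal digits.
    return entry_id * 100 + order_num


# Sort once by the packed id and once by the tuple itself.
by_element_id = sorted(pairs, key=lambda p: pack(*p))
by_pair = sorted(pairs)
print(by_element_id == by_pair)
```
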
### `JMdict_EntryScore`
The `JMdict_EntryScore` table is used to store the score of each entry, which is used for sorting search results. The score is calculated based on a number of variables.
The table is automatically generated from other tables via triggers, and should be treated as a materialized view.
There is one score row for every row in both `JMdict_KanjiElement` and `JMdict_ReadingElement`, distinguished by the `type` field.
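
To illustrate the trigger-maintained "materialized view" idea, here is a minimal sketch using simplified stand-in tables. Everything specific in it is an assumption made for the example: the table names, the `isCommon` column, the `'k'` type value, and the placeholder formula `isCommon * 50` do not come from the real schema or the real scoring rules.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical stand-in for a source table.
    CREATE TABLE KanjiElementSketch (
        elementId INTEGER PRIMARY KEY,
        isCommon  INTEGER NOT NULL DEFAULT 0  -- made-up scoring input
    );

    -- Hypothetical stand-in for the score table.
    CREATE TABLE EntryScoreSketch (
        type      TEXT    NOT NULL,  -- 'k' is a made-up marker for kanji rows
        elementId INTEGER NOT NULL,
        score     INTEGER NOT NULL,
        PRIMARY KEY (type, elementId)
    );

    -- Keep the score table in sync with the source table.
    CREATE TRIGGER KanjiElementSketch_ai AFTER INSERT ON KanjiElementSketch
    BEGIN
        INSERT INTO EntryScoreSketch (type, elementId, score)
        VALUES ('k', NEW.elementId, NEW.isCommon * 50);  -- placeholder formula
    END;

    CREATE TRIGGER KanjiElementSketch_au AFTER UPDATE OF isCommon ON KanjiElementSketch
    BEGIN
        UPDATE EntryScoreSketch
        SET score = NEW.isCommon * 50
        WHERE type = 'k' AND elementId = NEW.elementId;
    END;

    CREATE TRIGGER KanjiElementSketch_ad AFTER DELETE ON KanjiElementSketch
    BEGIN
        DELETE FROM EntryScoreSketch
        WHERE type = 'k' AND elementId = OLD.elementId;
    END;
    """
)

query = "SELECT type, elementId, score FROM EntryScoreSketch"

conn.execute("INSERT INTO KanjiElementSketch VALUES (100001001, 1)")
after_insert = conn.execute(query).fetchall()

conn.execute("UPDATE KanjiElementSketch SET isCommon = 0 WHERE elementId = 100001001")
after_update = conn.execute(query).fetchall()

conn.execute("DELETE FROM KanjiElementSketch WHERE elementId = 100001001")
after_delete = conn.execute(query).fetchall()
print(after_insert, after_update, after_delete)
```

Because the score rows are only ever written by triggers, application code should read from the score table and never insert into it directly.
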