docs: add docs about database schema choices

2026-04-08 17:38:08 +09:00
parent f6de8680ad
commit 20243dec09
2 changed files with 19 additions and 0 deletions
@@ -0,0 +1,18 @@
+# Database
+
+Here are some choices that were made when designing the schema.
+
+### `JMdict_{Reading,Kanji}Element.elementId` and `JMdict_Sense.senseId`
+
+The `elementId`/`senseId` field acts as a unique identifier for each individual element in these tables.
+It is a packed decimal version of the `(entryId, orderNum)` pair, where `entryId` is given 7 digits and `orderNum` is given 2 digits (the max count found so far is `40`).
+Since `entryId` is already a field in the table, it would technically have been fine to store the `orderNum` as a separate field,
+but it is easier to refer to these rows from other tables when no composite foreign key is needed.
+
+(NOTE: `entryId` is now inferred from `elementId` within SQLite using a generated column, so saying it is "stored in a separate field" might be a stretch.)
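+
+A rough sketch of the packing, assuming it is plain decimal arithmetic (`entryId * 100 + orderNum`). The function names are made up for illustration and are not taken from the `jadb` code:
+
+```python
+# Hypothetical helpers, assuming orderNum occupies the two lowest decimal digits.
+ORDER_DIGITS = 2
+ORDER_BASE = 10 ** ORDER_DIGITS  # 100
+
+
+def pack_element_id(entry_id: int, order_num: int) -> int:
+    """Combine (entryId, orderNum) into a single integer id."""
+    if not 0 <= order_num < ORDER_BASE:
+        raise ValueError(f"orderNum {order_num} does not fit in {ORDER_DIGITS} digits")
+    return entry_id * ORDER_BASE + order_num
+
+
+def unpack_element_id(element_id: int) -> tuple[int, int]:
+    """Recover (entryId, orderNum) from a packed id."""
+    return divmod(element_id, ORDER_BASE)
+
+
+if __name__ == "__main__":
+    packed = pack_element_id(1000220, 3)
+    print(packed, unpack_element_id(packed))
+```
+
+Under that assumption, the generated `entryId` column would amount to an integer division of `elementId` by 100.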
+
+We used to generate the `elementId` separately from `orderNum` as a sequential id, but that led to all values
+shifting whenever the data was updated, which produced very big diffs. Making it a unique composite of values taken
+from the source data itself means that the ids will be stable across updates.
+
+Due to the way the data is structured, we can use the `elementId` as the ordering number as well.
@@ -3,6 +3,7 @@
 This is the documentation for `jadb`. Since I'm currently the only one working on it, the documentation is more or less just notes to myself, to ensure I remember how and why I implemented certain features in a certain way a few months down the road. This is not a comprehensive and formal documentation for downstream use, neither for developers nor end-users.

 - [Word Search](./word-search.md)
+- [Database](./database.md)
 - [Lemmatizer](./lemmatizer.md)

 ## Project structure