docs: add docs about database schema choices

2026-04-08 17:38:08 +09:00
parent f6de8680ad
commit 20243dec09
2 changed files with 19 additions and 0 deletions

docs/database.md Normal file

@@ -0,0 +1,18 @@
# Database
These are some of the choices made when designing the schema.
### `JMdict_{Reading,Kanji}Element.elementId` and `JMdict_Sense.senseId`
The `elementId`/`senseId` field acts as a unique identifier for each individual element in these tables.
It is a packed version of the `(entryId, orderNum)` pair, where `entryId` is given 7 digits and `orderNum` is given 2 digits (the highest count found so far is `40`).
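
A minimal sketch of the packing arithmetic, assuming the two numbers are combined as decimal digits (`entryId * 100 + orderNum`). The function names and the range checks are hypothetical and only illustrate the idea; they are not taken from the `jadb` code.

```python
ORDER_DIGITS = 2                  # digits reserved for orderNum
ORDER_BASE = 10 ** ORDER_DIGITS   # 100
MAX_ENTRY_ID = 10 ** 7 - 1        # entryId is given 7 digits


def pack_element_id(entry_id: int, order_num: int) -> int:
    """Pack (entryId, orderNum) into one integer (assumed decimal layout)."""
    if not 0 <= entry_id <= MAX_ENTRY_ID:
        raise ValueError(f"entryId does not fit in 7 digits: {entry_id}")
    if not 0 <= order_num < ORDER_BASE:
        raise ValueError(f"orderNum does not fit in 2 digits: {order_num}")
    return entry_id * ORDER_BASE + order_num


def unpack_element_id(element_id: int) -> tuple[int, int]:
    """Recover (entryId, orderNum) from a packed id."""
    return divmod(element_id, ORDER_BASE)
```

Under this assumption, `pack_element_id(1000220, 3)` gives `100022003`, and `unpack_element_id(100022003)` gives back `(1000220, 3)`.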
Since `entryId` is already a field in the table, it would technically have been fine to store `orderNum` as a separate field,
but it is easier to refer to the elements from other tables without a composite foreign key.
(NOTE: `entryId` is now inferred from `elementId` within SQLite using a generated column, so saying it is "stored in a separate field" might be a stretch.)
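
The generated column and the single-column foreign key can be sketched as follows. This is an illustration under stated assumptions, not the actual `jadb` schema: the table and column names other than `elementId` and `entryId` are made up, the division by `100` assumes the decimal packing described above, and whether the real column is `VIRTUAL` or `STORED` is not stated here. Generated columns need SQLite 3.31.0 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    -- Hypothetical stand-in for a JMdict element table.
    CREATE TABLE ReadingElement (
        elementId INTEGER PRIMARY KEY,
        -- entryId is derived from elementId instead of being written directly.
        entryId   INTEGER GENERATED ALWAYS AS (elementId / 100) VIRTUAL,
        reading   TEXT NOT NULL
    );
    -- Hypothetical child table: one column is enough to reference an element.
    CREATE TABLE ReadingElementInfo (
        elementId INTEGER NOT NULL REFERENCES ReadingElement(elementId),
        info      TEXT NOT NULL
    );
    """
)

conn.execute(
    "INSERT INTO ReadingElement (elementId, reading) VALUES (?, ?)",
    (100022003, "よみ"),
)
conn.execute(
    "INSERT INTO ReadingElementInfo (elementId, info) VALUES (?, ?)",
    (100022003, "hypothetical tag"),
)

# The join needs only elementId; entryId comes from the generated column.
row = conn.execute(
    "SELECT r.entryId, i.info "
    "FROM ReadingElementInfo AS i JOIN ReadingElement AS r USING (elementId)"
).fetchone()
```

With a composite key, `ReadingElementInfo` would instead have to carry both `entryId` and `orderNum` and declare a two-column foreign key.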
We used to generate the `elementId` separately from `orderNum` as a sequential id, but that led to all values
shifting whenever the data was updated, which produced very big diffs. Deriving it as a unique composite of values
from the source data itself means that the values stay stable across updates.
Due to the way the data is structured, we can use the `elementId` as the ordering number as well.
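
A small check of why the packed id can double as the ordering number, again assuming the decimal packing `entryId * 100 + orderNum` with `orderNum` below `100`. The sample pairs and the helper name `pack` are made up for illustration.

```python
# Hypothetical (entryId, orderNum) pairs, deliberately out of order.
pairs = [(1000220, 2), (1000110, 10), (1000220, 1), (1000110, 9)]


def pack(entry_id: int, order_num: int) -> int:
    """Assumed decimal packing: 7 digits of entryId, then 2 digits of orderNum."""
    return entry_id * 100 + order_num


# Sorting by the packed id gives the same order as sorting by the pair itself,
# because orderNum never spills over into the entryId digits.
by_pair = sorted(pairs)
by_packed = sorted(pairs, key=lambda pair: pack(*pair))
```

Here `by_pair` and `by_packed` are the same list, so an `ORDER BY elementId` groups elements by entry and keeps them in `orderNum` order within each entry.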


@@ -3,6 +3,7 @@
This is the documentation for `jadb`. Since I'm currently the only one working on it, the documentation is more or less just notes to myself, to ensure I remember how and why I implemented certain features in a certain way a few months down the road. This is not comprehensive, formal documentation for downstream use, for either developers or end users.
- [Word Search](./word-search.md)
- [Database](./database.md)
- [Lemmatizer](./lemmatizer.md)
## Project structure