docs: add docs about database schema choices
All checks were successful
Build and test / build (push) Successful in 8m35s
18
docs/database.md
Normal file
@@ -0,0 +1,18 @@
# Database

Here are some choices that were made when designing the schema.

### `JMdict_{Reading,Kanji}Element.elementId` and `JMdict_Sense.senseId`

The `elementId`/`senseId` field acts as a unique identifier for each individual element in these tables.
It is a packed version of the `(entryId, orderNum)` pair, where the first number is given 7 digits and the second is given 2 digits (max count found so far is `40`).
Since `entryId` is already a field in the table, it would technically have been fine to store `orderNum` as a separate field,
but it is easier to refer to these rows from other tables without a composite foreign key.

(NOTE: `entryId` is now inferred from `elementId` within SQLite using a generated column, so saying it is "stored in a separate field" might be a stretch)
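The packing described above can be sketched as plain decimal arithmetic. This is only an illustration, assuming the id is formed as `entryId * 100 + orderNum`; the function names `pack_element_id` and `unpack_element_id` and the example entry id are hypothetical and not taken from the `jadb` code.

```python
from typing import Tuple

# Assumption: orderNum occupies the two lowest decimal digits,
# and entryId occupies the (up to) seven digits above them.
ORDER_NUM_BASE = 100
MAX_ENTRY_ID = 9_999_999


def pack_element_id(entry_id: int, order_num: int) -> int:
    """Combine (entryId, orderNum) into a single integer id."""
    if not 0 <= entry_id <= MAX_ENTRY_ID:
        raise ValueError("entry_id does not fit in 7 digits")
    if not 0 <= order_num < ORDER_NUM_BASE:
        raise ValueError("order_num does not fit in 2 digits")
    return entry_id * ORDER_NUM_BASE + order_num


def unpack_element_id(element_id: int) -> Tuple[int, int]:
    """Recover (entryId, orderNum) from a packed id."""
    return divmod(element_id, ORDER_NUM_BASE)
```

With these definitions, `pack_element_id(1000220, 3)` gives `100022003`, and unpacking that value gives back `(1000220, 3)`. The range checks are there because a third digit in `orderNum` would silently spill into the `entryId` part.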
We used to generate the `elementId` separately from `orderNum` as a sequential id, but that led to all values
shifting whenever the data was updated, which produced very big diffs. Deriving it as a unique composite of values
from the source data itself means that the values will be stable across updates.

Due to the way the data is structured, we can use the `elementId` as the ordering number as well:
`orderNum` occupies the lowest digits, so sorting by `elementId` within an entry gives the same order as sorting by `orderNum`.
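As a rough illustration of the generated column, the single-column foreign key, and ordering by `elementId`, here is a small sketch using Python's built-in `sqlite3` module. The table names `reading_element` and `reading_info`, the `elementId / 100` expression, and the sample rows are assumptions made for this example and are not the actual `jadb` schema. Generated columns need SQLite 3.31.0 or newer.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical, simplified schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE reading_element (
            elementId INTEGER PRIMARY KEY,
            -- Assumed expression: integer division drops the two orderNum digits.
            entryId   INTEGER GENERATED ALWAYS AS (elementId / 100) VIRTUAL,
            reading   TEXT NOT NULL
        );
        -- Other tables can point at a row with one column instead of a
        -- composite (entryId, orderNum) foreign key.
        CREATE TABLE reading_info (
            elementId INTEGER NOT NULL REFERENCES reading_element (elementId),
            info      TEXT NOT NULL
        );
        """
    )
    # Rows are inserted out of order on purpose.
    conn.executemany(
        "INSERT INTO reading_element (elementId, reading) VALUES (?, ?)",
        [
            (100022002, "second"),
            (100022001, "first"),
            (100023001, "other entry"),
        ],
    )
    return conn


def readings_for_entry(conn: sqlite3.Connection, entry_id: int) -> list:
    """Return (entryId, reading) rows for one entry, ordered by elementId."""
    cursor = conn.execute(
        "SELECT entryId, reading FROM reading_element "
        "WHERE entryId = ? ORDER BY elementId",
        (entry_id,),
    )
    return cursor.fetchall()
```

Calling `readings_for_entry(build_demo_db(), 1000220)` returns the two rows of that entry in `orderNum` order, even though only `elementId` was stored and sorted on.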
@@ -3,6 +3,7 @@
This is the documentation for `jadb`. Since I'm currently the only one working on it, the documentation is more or less just notes to myself, to ensure I remember how and why I implemented certain features in a certain way a few months down the road. This is not comprehensive, formal documentation for downstream use, either for developers or for end users.

- [Word Search](./word-search.md)
- [Database](./database.md)
- [Lemmatizer](./lemmatizer.md)

## Project structure