diff --git a/docs/database.md b/docs/database.md
new file mode 100644
index 0000000..5fa9299
--- /dev/null
+++ b/docs/database.md
@@ -0,0 +1,18 @@
+# Database
+
+Here are some choices that have been made when designing the schema.
+
+### `JMdict_{Reading,Kanji}Element.elementId` and `JMdict_Sense.senseId`
+
+The `elementId`/`senseId` field acts as a unique identifier for each individual element in these tables.
+It is a packed version of the `(entryId, orderNum)` pair, where the first number is given 7 digits and the second is given 2 digits (the max count found so far is `40`).
+Since `entryId` is already a field in the table, it would technically have been fine to store the `orderNum` as a separate field,
+but it is easier to refer to the entries from other tables without a composite foreign key.
+
+(NOTE: `entryId` is now inferred from `elementId` within sqlite using a generated column, so saying it is "stored in a separate field" might be a stretch.)
+
+We used to generate the `elementId` separately from `orderNum` as a sequential id, but it led to all values
+shifting whenever the data was updated, leading to very big diffs. Making it a unique composite of data coming
+from the source data itself means that the values will be stable across updates.
+
+Due to the way the data is structured, we can use the `elementId` as the ordering number as well.
diff --git a/docs/overview.md b/docs/overview.md
index c94fd67..426a90b 100644
--- a/docs/overview.md
+++ b/docs/overview.md
@@ -3,6 +3,7 @@
 This is the documentation for `jadb`. Since I'm currently the only one working on it, the documentation is more or less just notes to myself, to ensure I remember how and why I implemented certain features in a certain way a few months down the road. This is not a comprehensive and formal documentation for downstream use, neither for developers nor end-users.
 
 - [Word Search](./word-search.md)
+- [Database](./database.md)
 - [Lemmatizer](./lemmatizer.md)
 
 ## Project structure
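
Note on the packing described in `docs/database.md`: a minimal sketch of how the `(entryId, orderNum)` pair could be packed into a single `elementId`, assuming plain decimal packing (`entryId * 100 + orderNum`). The exact formula, the function names, and the constants below are assumptions for illustration and are not taken from the `jadb` source.

```python
# Hypothetical helpers illustrating the packed id; names and formula are assumptions.
ORDER_DIGITS = 2                     # orderNum is given 2 decimal digits
ORDER_BASE = 10 ** ORDER_DIGITS      # 100
MAX_ENTRY_ID = 10 ** 7 - 1           # entryId is given 7 decimal digits


def pack_element_id(entry_id: int, order_num: int) -> int:
    """Pack (entry_id, order_num) into one integer, assuming decimal packing."""
    if not 0 <= entry_id <= MAX_ENTRY_ID:
        raise ValueError(f"entry_id out of range: {entry_id}")
    if not 0 <= order_num < ORDER_BASE:
        raise ValueError(f"order_num out of range: {order_num}")
    return entry_id * ORDER_BASE + order_num


def unpack_element_id(element_id: int) -> tuple[int, int]:
    """Recover (entry_id, order_num) from a packed id."""
    return divmod(element_id, ORDER_BASE)


if __name__ == "__main__":
    packed = pack_element_id(1000220, 3)
    print(packed)                     # packed id for entry 1000220, element 3
    print(unpack_element_id(packed))  # the original pair
```

Under this assumption, sorting by the packed value gives the same order as sorting by `(entryId, orderNum)`, which is why the packed id can double as the ordering number, and the id only depends on values from the source data, so it does not shift when unrelated entries are added or removed.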