Lemmatizer
The lemmatizer is still quite experimental, but will play a more important role in the project in the future.
It is a hand-written implementation of a finite-state transducer (FST) for morphological parsing. The FST recursively removes affixes from a word until it (hopefully) deconjugates into its dictionary form. The resulting deconjugation tree will then be combined with queries into the dictionary data to determine whether a deconjugation leads to a real, known word.
Each rule is a separate static object declared in lib/util/lemmatizer/rules.
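As a rough illustration of the idea, here is a minimal sketch of rule objects being applied recursively to build a deconjugation tree, which is then filtered against a set of known words. Everything in it is an assumption made for the example: the `SuffixRule` and `Node` classes, the `deconjugate` and `knownWords` functions, and the three ichidan-only rules are hypothetical and are not the classes or rules found in lib/util/lemmatizer/rules. The dictionary is a plain `Set` standing in for a real dictionary query.

```dart
/// Hypothetical rule shape: strip [suffix] and append [replacement].
class SuffixRule {
  final String name;
  final String suffix;
  final String replacement;

  const SuffixRule(this.name, this.suffix, this.replacement);

  /// Returns the rewritten word, or null if the rule does not apply.
  String? apply(String word) {
    if (!word.endsWith(suffix) || word.length <= suffix.length) return null;
    return word.substring(0, word.length - suffix.length) + replacement;
  }
}

// Illustrative rules for ichidan verbs only; not the project's rule set.
const negative = SuffixRule('negative', 'ない', 'る');
const passive = SuffixRule('passive', 'られる', 'る');
const causative = SuffixRule('causative', 'させる', 'る');
const rules = [negative, passive, causative];

/// One node in the deconjugation tree.
class Node {
  final String word;
  final SuffixRule? rule; // rule that produced this node; null for the root
  final List<Node> children = [];

  Node(this.word, [this.rule]);
}

/// Builds the tree by applying every matching rule to each node.
/// Every rule above shortens the word, so the recursion terminates.
Node deconjugate(String word, [SuffixRule? via]) {
  final node = Node(word, via);
  for (final rule in rules) {
    final next = rule.apply(word);
    if (next != null) node.children.add(deconjugate(next, rule));
  }
  return node;
}

/// Collects every word in the tree that exists in [dictionary].
List<String> knownWords(Node node, Set<String> dictionary) => [
      if (dictionary.contains(node.word)) node.word,
      for (final child in node.children) ...knownWords(child, dictionary),
    ];

void main() {
  final tree = deconjugate('食べさせられない');
  // Stand-in for a real dictionary query.
  print(knownWords(tree, {'食べる'}));
}
```

With these three rules the sketch walks 食べさせられない, 食べさせられる, 食べさせる, 食べる, and only the last form is found in the stand-in dictionary. A real rule set would produce several branches per word, which is why the candidates are kept as a tree and checked against the dictionary rather than trusted directly.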
There is a CLI subcommand for testing the tool interactively. For example, you can run:
dart run jadb lemmatize -w '食べさせられない'