Lemmatizer

The lemmatizer is still quite experimental, but will play a more important role in the project in the future.

It is a hand-written implementation of a finite-state transducer (FST) for morphological parsing. The FST recursively strips affixes from a word until it (hopefully) deconjugates into its dictionary form. The resulting deconjugation tree will then be combined with queries against the dictionary data to determine whether a given deconjugation leads to a real, known word.

Each rule is a separate static object declared in lib/util/lemmatizer/rules.
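
As a rough illustration of the idea, the sketch below strips suffixes recursively, builds a deconjugation tree, and then filters that tree against a set of known words. Everything in it is an assumption made for the example: the `Rule`, `Node`, `deconjugate` and `knownWords` names, the three toy rules (which only cover ichidan verbs), and the in-memory dictionary set are hypothetical and are not jadb's actual API. The real rules in lib/util/lemmatizer/rules may be shaped quite differently.

```dart
/// Hypothetical rule shape: replace a trailing [suffix] with [replacement].
class Rule {
  final String name;
  final String suffix;
  final String replacement;
  const Rule(this.name, this.suffix, this.replacement);

  /// Returns the rewritten word, or null if the rule does not apply.
  String? apply(String word) {
    if (!word.endsWith(suffix) || word.length <= suffix.length) return null;
    return word.substring(0, word.length - suffix.length) + replacement;
  }
}

// Toy rules for ichidan verbs only (assumed for this sketch).
// Each one shortens the word, so the recursion below always terminates.
const negative = Rule('negative', 'ない', 'る');
const passive = Rule('passive/potential', 'られる', 'る');
const causative = Rule('causative', 'させる', 'る');
const rules = [negative, passive, causative];

/// One node of the deconjugation tree: a candidate word, the rule that
/// produced it (null for the root), and all further deconjugations.
class Node {
  final String word;
  final Rule? via;
  final List<Node> children;
  Node(this.word, this.via, this.children);
}

/// Recursively applies every matching rule to [word].
Node deconjugate(String word, [Rule? via]) {
  final children = <Node>[];
  for (final rule in rules) {
    final next = rule.apply(word);
    if (next != null) children.add(deconjugate(next, rule));
  }
  return Node(word, via, children);
}

/// Collects every candidate in the tree that exists in [dictionary].
List<String> knownWords(Node node, Set<String> dictionary) => [
      if (dictionary.contains(node.word)) node.word,
      for (final child in node.children) ...knownWords(child, dictionary),
    ];

void main() {
  final tree = deconjugate('食べさせられない');
  // The set stands in for a real dictionary query.
  print(knownWords(tree, {'食べる'})); // prints [食べる]
}
```

With these toy rules the input is reduced step by step: 食べさせられない, 食べさせられる, 食べさせる, 食べる. Only the last candidate is in the dictionary set. Keeping the whole tree rather than a single path matters once several rules can match the same word, because only the dictionary lookup can tell which branch ends in a real word.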

There is a CLI subcommand for testing the tool interactively. For example, you can run:

dart run jadb lemmatize -w '食べさせられない'