jadb/docs/lemmatizer.md
h7x4 ede57a7a00
docs: init
2026-04-01 16:48:40 +09:00

# Lemmatizer
The lemmatizer is still quite experimental, but it is expected to play a larger role in the project in the future.
It is a hand-written implementation of a [Finite State Transducer](https://en.wikipedia.org/wiki/Morphological_dictionary#Finite_State_Transducers) for morphological parsing. The FST recursively strips affixes from a word until it (hopefully) deconjugates into its dictionary form. The resulting deconjugation tree will then be combined with queries into the dictionary data to determine whether a given deconjugation leads to a real, known word.
Each rule is declared as its own static object in `lib/util/lemmatizer/rules`.
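
To make the idea concrete, here is a minimal sketch of recursive affix stripping with rules as constant objects. It is not the code in `lib/util/lemmatizer`: the names `Rule`, `Node`, `deconjugate` and `candidates`, the three suffix rules, and the one-word dictionary are all assumptions made for illustration, and the rules only cover the ichidan example used in this document.

```dart
// Hypothetical, simplified sketch; names and rules do not mirror jadb's actual API.

/// A single suffix-rewriting rule: strip [suffix], then append [replacement].
class Rule {
  final String name;
  final String suffix;
  final String replacement;
  const Rule(this.name, this.suffix, this.replacement);

  /// Returns the rewritten word, or null if the rule does not apply.
  String? apply(String word) {
    // Require a non-empty stem in front of the suffix.
    if (!word.endsWith(suffix) || word.length <= suffix.length) return null;
    return word.substring(0, word.length - suffix.length) + replacement;
  }
}

/// Illustrative rules for ichidan verbs only (an assumption of this sketch).
const rules = [
  Rule('negative', 'ない', 'る'),
  Rule('passive', 'られる', 'る'),
  Rule('causative', 'させる', 'る'),
];

/// One node of the deconjugation tree: a word and the rule that produced it.
class Node {
  final String word;
  final Rule? via;
  final List<Node> children;
  Node(this.word, this.via, this.children);
}

/// Recursively applies every matching rule, building a tree of candidates.
/// The depth limit is a safety net in case a rule set does not shrink words.
Node deconjugate(String word, [Rule? via, int depth = 0]) {
  final children = <Node>[];
  if (depth < 8) {
    for (final rule in rules) {
      final next = rule.apply(word);
      if (next != null) children.add(deconjugate(next, rule, depth + 1));
    }
  }
  return Node(word, via, children);
}

/// Flattens the tree into every word form encountered along the way.
Iterable<String> candidates(Node node) sync* {
  yield node.word;
  for (final child in node.children) {
    yield* candidates(child);
  }
}

void main() {
  // Stand-in for a real dictionary lookup.
  const dictionary = {'食べる'};
  final tree = deconjugate('食べさせられない');
  final hits = candidates(tree).where(dictionary.contains).toList();
  print(hits); // [食べる]
}
```

In this sketch the tree degenerates into a single chain (食べさせられない, 食べさせられる, 食べさせる, 食べる), and only the last form survives the dictionary filter. A realistic rule set would branch, which is why the candidates need to be checked against the dictionary data.
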
There is a CLI subcommand for testing the tool interactively. You can run:
```bash
dart run jadb lemmatize -w '食べさせられない'
```