- [ ] Fix pulling data sources
  - [X] JMDict
    - [X] Ingest data
  - [ ] Tatoeba
    - [X] Ingest data
    - [ ] Disambiguate and connect to JMDict senses
  - [ ] NHK News
    - [X] Ingest data
    - [ ] Disambiguate

      This should be done by combining MeCab tokenization with Levenshtein distance to the sense glosses. (In the report, note that dropping the matches that remain ambiguous may be a mistake, because the ambiguity itself might follow a pattern: a word with many similar glosses would be dropped more often and could end up marked as very rare as a result.)
- [ ] TF-IDF
- [ ] Test out weight combinations

Some notes:
- Sentence length cost should probably increase exponentially.
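A minimal sketch of the Levenshtein half of the disambiguation step. It assumes MeCab (or another tokenizer) has already mapped a word in the sentence to its candidate JMDict senses, each given as a list of English glosses, and that the sentence has an English translation to compare against (Tatoeba provides these). `levenshtein`, `gloss_score`, `pick_sense` and the `margin` threshold are hypothetical names and placeholder values, not part of any existing code. Instead of dropping ambiguous matches, `pick_sense` returns `None`, so the caller can keep them flagged and check whether they follow a pattern.

```python
from __future__ import annotations

import re


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def _words(text: str) -> list[str]:
    # Lowercased English words; punctuation is discarded.
    return re.findall(r"[a-z']+", text.lower())


def gloss_score(gloss: str, translation: str) -> float:
    """Best normalized distance between the gloss and any word window of the
    translation with the same number of words: 0.0 is an exact match, 1.0 is none."""
    g, t = _words(gloss), _words(translation)
    n = len(g)
    if n == 0 or n > len(t):
        return 1.0
    target = " ".join(g)
    best = 1.0
    for start in range(len(t) - n + 1):
        window = " ".join(t[start:start + n])
        best = min(best, levenshtein(target, window) / max(len(target), len(window)))
    return best


def pick_sense(senses: list[list[str]], translation: str,
               margin: float = 0.1) -> int | None:
    """Index of the closest sense, or None if the best two are within `margin`
    (hypothetical threshold) of each other, i.e. still ambiguous."""
    if not senses:
        return None
    # A sense scores as well as its best-matching gloss.
    scores = [min((gloss_score(g, translation) for g in glosses), default=1.0)
              for glosses in senses]
    ranked = sorted(range(len(scores)), key=scores.__getitem__)
    if len(ranked) > 1 and scores[ranked[1]] - scores[ranked[0]] < margin:
        return None
    return ranked[0]
```

For example, with the senses `[["to eat"], ["to live on", "to make a living"]]` and the translation "I want to eat sushi.", the first sense matches a window exactly and is chosen; two identical senses would come back as `None`.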
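One possible reading of the note on sentence length, sketched under assumptions: the penalty for a sentence of n tokens is α(e^(βn) − 1), so an empty sentence costs nothing and the marginal cost of each additional token is e^β times that of the one before it. `length_cost`, `alpha` and `beta` are hypothetical, and the default values are placeholders that the weight-combination testing would have to settle.

```python
import math


def length_cost(n_tokens: int, alpha: float = 1.0, beta: float = 0.2) -> float:
    """Hypothetical exponential length penalty: alpha * (exp(beta * n) - 1).

    Zero for an empty sentence; the marginal cost of each additional token
    is e**beta times the marginal cost of the previous one.
    """
    # expm1 computes exp(x) - 1 accurately for small x.
    return alpha * math.expm1(beta * n_tokens)
```

Compared with a linear penalty, this keeps short sentences cheap while making long ones increasingly expensive; `beta` controls how quickly that happens.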
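A minimal TF-IDF sketch for the TF-IDF item. It assumes each document is already a list of tokens (for example MeCab base forms) and uses the plain weighting tf × log(N / df) without smoothing. What counts as a document here (a sentence, an NHK article) is an open assumption, and `tf_idf` is a hypothetical helper, not an existing library call.

```python
import math
from collections import Counter


def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Per-document TF-IDF weights: (count / doc length) * log(N / df)."""
    n_docs = len(docs)
    # Document frequency: in how many documents each token appears at least once.
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    idf = {t: math.log(n_docs / c) for t, c in df.items()}
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({t: (c / total) * idf[t] for t, c in counts.items()})
    return weights
```

With this weighting, a token that occurs in every document (such as a common particle) gets weight 0, while tokens concentrated in few documents score highest.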