TDT4310-project-sorted-japa.../todo.md

- [ ] Fix pulling data sources
- [X] JMDict
  - [X] Ingest data
- [ ] Tatoeba
  - [X] Ingest data
  - [ ] Disambiguate and connect to JMDict senses
- [ ] NHK News
  - [X] Ingest data
  - [ ] Disambiguate:
    this should be done through a combination of MeCab and
    Levenshtein distance to the sense glosses. (Mention in the
    report that dropping the words that remain ambiguous might be
    a bad idea, because there might be a pattern to it: single
    words can have many similar glosses, so they would often be
    dropped and end up marked as very rare as a result.)
    - [ ] TF-IDF
- [ ] Test different weight combinations. Notes:
  - Sentence length cost should probably grow exponentially with sentence length.
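A minimal Python sketch of the gloss-matching part of the NHK disambiguation note, using plain dynamic-programming Levenshtein distance. The note does not say what the sense glosses are compared against, so this assumes a list of English reference words for the sentence, and that MeCab tokenization and the JMDict candidate lookup have already happened upstream. `pick_sense`, its `margin` parameter and the "`None` means still ambiguous" convention are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the DP row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def pick_sense(reference_words, senses, margin=1):
    """Return the index of the best-matching sense, or None if still ambiguous.

    reference_words: English words to compare against (assumed input).
    senses: one non-empty list of gloss strings per JMDict sense candidate.
    """
    if not reference_words or not senses:
        return None
    # Score each sense by its closest gloss/word pair.
    scores = [
        min(levenshtein(g.lower(), w.lower()) for g in glosses for w in reference_words)
        for glosses in senses
    ]
    ranked = sorted(range(len(scores)), key=scores.__getitem__)
    # Too close to call: report as ambiguous instead of guessing.
    if len(ranked) > 1 and scores[ranked[1]] - scores[ranked[0]] < margin:
        return None
    return ranked[0]
```

Returning `None` rather than guessing keeps the still-ambiguous cases countable, so it is possible to check whether they follow a pattern before deciding to drop them.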
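For the TF-IDF item, a sketch of one common variant in Python: term frequency normalized by document length, multiplied by an unsmoothed log(N/df). Treating each tokenized sentence or article as a "document", and the `tf_idf` name, are assumptions; libraries such as scikit-learn default to a smoothed IDF, so their numbers will differ from this one.

```python
import math
from collections import Counter


def tf_idf(documents):
    """documents: list of token lists (e.g. MeCab output).

    Returns one {term: weight} dict per document.
    """
    n_docs = len(documents)
    # Document frequency: in how many documents each term appears at least once.
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))
    weights = []
    for tokens in documents:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights
```

A term that occurs in every document gets weight 0, so very common function words contribute nothing under this variant.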
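For the weight-combination item, a Python sketch of an exponential sentence-length cost combined with a weighted sum and a grid of weights to try. The component names (`vocab`, `length`), the per-word costs, the `growth` factor of 1.15 and all function names are hypothetical placeholders, not values from the project.

```python
from itertools import product


def length_cost(n_tokens: int, growth: float = 1.15) -> float:
    """Exponential length cost: 0 for an empty sentence; each extra token
    multiplies (cost + 1) by `growth`."""
    return growth ** n_tokens - 1.0


def sentence_score(word_costs, n_tokens, weights):
    """Weighted sum of hypothetical cost components (lower = easier)."""
    components = {
        "vocab": sum(word_costs) / max(len(word_costs), 1),  # mean per-word cost
        "length": length_cost(n_tokens),
    }
    return sum(weights[name] * value for name, value in components.items())


def weight_grid(values, names=("vocab", "length")):
    """Every combination of the candidate weight values, one dict per combination."""
    return [dict(zip(names, combo)) for combo in product(values, repeat=len(names))]
```

Each dict from `weight_grid([0.5, 1.0, 2.0])` (nine combinations) can be passed to `sentence_score` to compare the resulting sentence orderings.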