This is a simple chatbot project.

The aim is to recreate something similar to Neuro-sama, running on local hardware with a minimal amount of compute.

The bot is designed to be modular, with the ability to add new modules easily.

You need to supply a MediaWiki backup XML; it is used to gather information for the chatbot.

A strong computer with CUDA and a fair bit of VRAM is advised to get response times down.

Most settings are configured through environment variables in the flake.nix file.
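
The Python side can then read those variables at startup. A minimal sketch; the variable names here are hypothetical, the real ones are whatever flake.nix exports:

```python
import os

# Hypothetical variable names for illustration; the real names are set
# in flake.nix.
WIKI_XML = os.environ.get("CHATBOT_WIKI_XML", "wiki-backup.xml")
LLM_MODEL = os.environ.get("CHATBOT_LLM_MODEL", "llama3")
```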
## Modules

### stt

The stt module is responsible for converting speech to text.

whisper-cpp-stream is used to stream audio through the Whisper STT engine. It is a C++ program that reads audio from a microphone and sends it to Whisper; it is run through a Python subprocess.
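
A minimal sketch of that subprocess setup, assuming the whisper-cpp-stream binary is on PATH and prints transcriptions to stdout; the model path is an assumption:

```python
import subprocess

# Launch whisper-cpp-stream and read transcribed lines from its stdout.
# The model path is illustrative; point it at your whisper model file.
proc = subprocess.Popen(
    ["whisper-cpp-stream", "-m", "ggml-base.en.bin"],
    stdout=subprocess.PIPE,
    text=True,
)

for line in proc.stdout:
    text = line.strip()
    if text:
        print("heard:", text)  # hand the text off to the llm module here
```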
### llm

The llm module is responsible for crafting a response to the user's input. It uses a RAG built from the supplied MediaWiki XML file and, in the future, the chat history as well.

langchain is the Python module that interfaces with the RAG and the LLM. ollama is used on the backend to interface with a Llama model.

Future work includes giving a structured response that carries emotions and metadata for a future image module.
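
A rough sketch of how the langchain and ollama pieces can fit together; this is not the project's actual code, and the model name, file path, and chunk size are assumptions:

```python
from langchain.chains import RetrievalQA
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import MWDumpLoader
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.llms import Ollama
from langchain_community.vectorstores import FAISS

# Load pages from the MediaWiki backup XML and chunk them for retrieval.
docs = MWDumpLoader(file_path="wiki-backup.xml").load()
chunks = RecursiveCharacterTextSplitter(chunk_size=1000).split_documents(docs)

# Embed the chunks and build a retrieval chain on top of a local llama model.
store = FAISS.from_documents(chunks, OllamaEmbeddings(model="llama3"))
qa = RetrievalQA.from_chain_type(
    llm=Ollama(model="llama3"),
    retriever=store.as_retriever(),
)

print(qa.invoke({"query": "Who is the main character?"}))
```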
### tts

piper is used as the TTS engine. It does not have proper Python bindings in nixpkgs, so it is run with subprocess: text is echoed into piper's stdin, and the output is played with aplay.
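
A minimal sketch of that pipeline, assuming a downloaded piper voice model; double-check the flags against your piper version:

```python
import subprocess

def say(text: str, model: str = "en_US-lessac-medium.onnx") -> None:
    # Feed the text to piper on stdin, write a wav, then play it with aplay.
    # The voice model filename is an assumption.
    subprocess.run(
        ["piper", "--model", model, "--output_file", "out.wav"],
        input=text.encode(),
        check=True,
    )
    subprocess.run(["aplay", "out.wav"], check=True)

say("Hello chat!")
```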
### image

The image module is responsible for processing images. It captures an image using pygame, base64-encodes it, and sends it to a multimodal model for a description.

Future work is to try opencv or something similar for image tagging instead, as the multimodal model hallucinates a lot and is also far too slow.
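
A minimal sketch of the capture-and-describe flow, assuming a local ollama server with a multimodal model such as llava; the model name and prompt are assumptions:

```python
import base64
import io

import pygame
import pygame.camera
import requests

# Grab one frame from the first available camera with pygame.
pygame.camera.init()
cam = pygame.camera.Camera(pygame.camera.list_cameras()[0])
cam.start()
frame = cam.get_image()
cam.stop()

# Encode the frame as PNG, then base64, the format ollama's API expects.
buf = io.BytesIO()
pygame.image.save(frame, buf, "frame.png")  # the namehint picks the format
b64 = base64.b64encode(buf.getvalue()).decode()

# Ask the multimodal model for a description via ollama's HTTP API.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llava", "prompt": "Describe this image.",
          "images": [b64], "stream": False},
)
print(resp.json()["response"])
```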