Adrian Gunnar Lauterer 418d6d044d
.gitignore
README.md
assistant.py
flake.lock
flake.nix
image.py
llm.py
stt.py
tts.py
README.md
This is a simple chatbot project. The aim is to recreate something similar to Neuro-sama, running on local hardware with a minimal amount of compute.
The bot is designed to be modular, with the ability to add new modules easily.
You need to supply a MediaWiki XML backup. The chatbot uses it as its source of information.
A powerful computer with CUDA and a fair amount of VRAM is advised to keep response times down.
Most settings are configured through environment variables set in the flake.nix file.
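As a rough illustration, a module could read its settings as shown below. The variable names ASSISTANT_LLM_MODEL and ASSISTANT_WIKI_XML and both default values are hypothetical; the real names are whatever flake.nix exports.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    llm_model: str
    wiki_xml: str


def load_settings(env=None) -> Settings:
    """Read settings from environment variables, falling back to defaults."""
    env = os.environ if env is None else env
    return Settings(
        # Hypothetical variable names and defaults; check flake.nix for the real ones.
        llm_model=env.get("ASSISTANT_LLM_MODEL", "llama3"),
        wiki_xml=env.get("ASSISTANT_WIKI_XML", "wiki.xml"),
    )
```

Passing a plain dict instead of os.environ makes the loader easy to exercise without touching the real environment.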
Modules
stt
The stt module is responsible for converting speech to text. whisper-cpp-stream is a C++ program that reads audio from a microphone and streams it through the Whisper STT engine. It is launched from Python with subprocess.
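A minimal sketch of the subprocess approach, not the project's actual stt.py: start a command and yield each line of text it prints. The function name stream_transcript is made up for this example.

```python
import subprocess
from typing import Iterator, Sequence


def stream_transcript(command: Sequence[str]) -> Iterator[str]:
    """Start `command` and yield each non-empty line it prints to stdout."""
    process = subprocess.Popen(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        text=True,
        bufsize=1,  # line-buffered on our side of the pipe
    )
    try:
        assert process.stdout is not None
        for line in process.stdout:
            line = line.strip()
            if line:
                yield line
    finally:
        # Stop the child if the consumer stops iterating early.
        process.terminate()
        process.wait()


# Possible usage (the model path is a placeholder):
# for text in stream_transcript(["whisper-cpp-stream", "-m", "models/ggml-base.en.bin"]):
#     print(text)
```

The real program's output may contain terminal control sequences that need stripping before the text is passed on to the llm module.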
llm
The llm module is responsible for crafting a response to the user's input. It uses retrieval-augmented generation (RAG) over the supplied MediaWiki XML file and, in the future, will also draw on the chat history.
langchain is the Python module that interfaces with the RAG and the LLM. ollama is used as the backend to run a Llama model.
Future work includes returning a structured response that carries emotions and metadata for a future image module.
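To illustrate the retrieval idea only, here is a standard-library sketch. It is not the langchain pipeline the project uses: it swaps in plain word overlap for langchain's retrieval and stops at building the prompt that would be handed to the model. The function names and the prompt wording are assumptions.

```python
import xml.etree.ElementTree as ET


def _local(tag: str) -> str:
    """Strip the XML namespace, e.g. '{ns}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def load_pages(xml_text: str) -> dict:
    """Map each page title in a MediaWiki XML export to its text."""
    pages = {}
    root = ET.fromstring(xml_text)
    for page in root.iter():
        if _local(page.tag) != "page":
            continue
        title, text = "", ""
        for node in page.iter():
            name = _local(node.tag)
            if name == "title":
                title = node.text or ""
            elif name == "text":
                text = node.text or ""  # keeps the last revision listed
        pages[title] = text
    return pages


def retrieve(pages: dict, question: str, k: int = 2) -> list:
    """Return up to k page titles sharing the most words with the question."""
    words = set(question.lower().split())

    def score(title: str) -> int:
        body = (title + " " + pages[title]).lower().split()
        return len(words & set(body))

    ranked = sorted(pages, key=score, reverse=True)
    return [title for title in ranked[:k] if score(title) > 0]


def build_prompt(pages: dict, question: str) -> str:
    """Put the retrieved pages in front of the question."""
    context = "\n\n".join(f"{t}:\n{pages[t]}" for t in retrieve(pages, question))
    return f"Answer using this context.\n\n{context}\n\nQuestion: {question}"
```

Word overlap is crude compared with embedding-based retrieval, but it shows the shape of the flow: load the wiki, pick relevant pages, and prepend them to the user's question.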
tts
piper is used as the TTS engine. It does not have proper Python bindings in nixpkgs, so it is run with subprocess. Text is echoed into piper's stdin, and the output is played with aplay.
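The pipe from text to piper to aplay can be sketched as follows. This is an illustration under assumptions rather than the project's tts.py: the speak helper is hypothetical, the voice file name is a placeholder, and the sample rate passed to aplay has to match the chosen voice.

```python
import subprocess
from typing import Sequence


def speak(text: str, tts_cmd: Sequence[str], player_cmd: Sequence[str]) -> int:
    """Pipe `text` into the TTS process and its audio into the player."""
    tts = subprocess.Popen(tts_cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    player = subprocess.Popen(player_cmd, stdin=tts.stdout)
    assert tts.stdin is not None and tts.stdout is not None
    tts.stdout.close()  # let the player own the read end of the pipe
    tts.stdin.write(text.encode("utf-8") + b"\n")
    tts.stdin.close()  # EOF tells the TTS process that input is finished
    tts.wait()
    return player.wait()


# Possible usage (voice file and sample rate are placeholders):
# speak(
#     "Hello there",
#     ["piper", "--model", "voice.onnx", "--output-raw"],
#     ["aplay", "-r", "22050", "-f", "S16_LE", "-t", "raw", "-"],
# )
```

Taking both commands as arguments keeps the helper independent of the installed binaries, so it can be tried with stand-in programs.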
image
The image module is responsible for processing images. It captures an image using pygame, base64-encodes it, and sends it to a multimodal model for a description. Future work is to try OpenCV or something similar for image tagging instead, as the multimodal model hallucinates a lot and is also far too slow.
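The encode-and-send step might look like the sketch below. It assumes the multimodal model is served by Ollama's generate endpoint on its default local port, which the README does not state for this module. The model name llava, the prompt text, and both function names are placeholders. The pygame capture is left out; image_bytes stands for the already encoded frame.

```python
import base64
import json
import urllib.request


def build_request(image_bytes: bytes, model: str = "llava",
                  prompt: str = "Describe this image briefly.") -> dict:
    """Build a JSON-serialisable payload with the image base64-encoded."""
    return {
        "model": model,  # assumption: a multimodal model served by Ollama
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,  # ask for one complete JSON reply
    }


def describe(image_bytes: bytes,
             url: str = "http://localhost:11434/api/generate") -> str:
    """POST the payload and return the model's text description."""
    data = json.dumps(build_request(image_bytes)).encode("utf-8")
    request = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]
```

Keeping the payload construction separate from the network call means the encoding can be checked without a running model.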