I wanted to demonstrate some of their capabilities by integrating them into WhatsApp to allow multimodal (text, audio, and image) communication.
Adding voice transcription support can do wonders for the accessibility of these language models (my mom can attest to that). And giving them personas to emulate can be a game changer for areas like counselling and teaching.
Sia also supports multilingual transcription, allowing speakers from around the world to interact with it with ease.
LLMs can emulate Einstein or any other personality, and children can put their questions directly to the persona, making for a more personal learning experience.
They can also provide better emotional support than counsellors or psychologists, as they offer a judgement-free zone for people to express themselves.
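One way the persona idea could be wired up is by putting the persona into a system message ahead of the user's question. This is only a minimal sketch, not Sia's actual code: the `Message` type, `personaPrompt`, `buildMessages`, and the prompt wording are all hypothetical, and the role/content shape simply follows the common chat-message convention.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Message follows the role/content shape used by chat-style LLM APIs.
// (Hypothetical type for illustration, not taken from Sia's source.)
type Message struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

// personaPrompt builds the system instruction for a persona.
// The wording here is an assumption; a real prompt would need tuning.
func personaPrompt(name string) string {
	return fmt.Sprintf("You are %s. Stay in character and explain ideas simply, as if talking to a curious child.", name)
}

// buildMessages puts the persona first, then earlier turns, then the new question.
func buildMessages(persona string, history []Message, question string) []Message {
	msgs := make([]Message, 0, len(history)+2)
	msgs = append(msgs, Message{Role: "system", Content: personaPrompt(persona)})
	msgs = append(msgs, history...)
	msgs = append(msgs, Message{Role: "user", Content: question})
	return msgs
}

func main() {
	msgs := buildMessages("Albert Einstein", nil, "Why is the sky blue?")
	out, err := json.MarshalIndent(msgs, "", "  ")
	if err != nil {
		panic(err)
	}
	// Print the payload that would be sent as the chat messages.
	fmt.Println(string(out))
}
```

Keeping the persona in a single system message means switching from Einstein to another personality is a one-argument change, while the conversation history stays untouched.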
Sia can also answer questions based on image input. This functionality is powered by another side project of mine, Pixquery.
In this tech demo, I've just scratched the surface of what LLMs like Sia are capable of.
Future additions to this project could be:
Two-way audio support (currently, you can talk to Sia via audio messages, but v2 will support two-way audio communication and make it feel like you are on a call with Sia).
Web browsing and extension support. This will allow Sia to communicate with other services on your phone, like Gmail or YouTube. Web browsing will also help Sia fetch the latest information from the web, so the knowledge cut-off limitation will be gone.
The WhatsApp Business API is an expensive mess. Finding a good unofficial Go implementation and modifying the source code for my use case took a lot of time (more than a day). Thanks to the repo maintainer for helping me understand the WhatsApp protocol.
Reducing the latency between the spoken voice message and the GPT reply was a challenge.
Making sure the personas don't trigger the OpenAI safety filters took some trial and error.
Writing the database layer using Redis to help Sia remember the conversation context was a chore, but Copilot made it a little bit easier. 😅
Pushing this to the cloud took time, as Docker mysteriously stopped working on my laptop.
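The conversation-context layer mentioned above can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual implementation: `Turn`, `ContextStore`, and `memoryStore` are hypothetical names, and an in-memory map stands in for Redis so the example runs without a server. A Redis-backed version could keep one list per chat and use the `RPUSH`, `LTRIM`, and `LRANGE` commands to get the same "keep only the last N turns" behaviour.

```go
package main

import (
	"fmt"
	"sync"
)

// Turn is one message in a conversation (hypothetical type).
type Turn struct {
	Role string
	Text string
}

// ContextStore is a hypothetical interface for the context layer.
// A Redis implementation could map Append to RPUSH + LTRIM and
// Recent to LRANGE on a per-chat list key.
type ContextStore interface {
	Append(chatID string, t Turn)
	Recent(chatID string) []Turn
}

// memoryStore is an in-memory stand-in for Redis, capped per chat.
type memoryStore struct {
	mu       sync.Mutex
	maxTurns int
	chats    map[string][]Turn
}

func newMemoryStore(maxTurns int) *memoryStore {
	return &memoryStore{maxTurns: maxTurns, chats: make(map[string][]Turn)}
}

// Append adds a turn and drops the oldest ones beyond maxTurns.
func (s *memoryStore) Append(chatID string, t Turn) {
	s.mu.Lock()
	defer s.mu.Unlock()
	turns := append(s.chats[chatID], t)
	if len(turns) > s.maxTurns {
		turns = turns[len(turns)-s.maxTurns:]
	}
	s.chats[chatID] = turns
}

// Recent returns a copy of the stored turns, oldest first.
func (s *memoryStore) Recent(chatID string) []Turn {
	s.mu.Lock()
	defer s.mu.Unlock()
	return append([]Turn(nil), s.chats[chatID]...)
}

func main() {
	var store ContextStore = newMemoryStore(2)
	store.Append("chat-1", Turn{Role: "user", Text: "Hi Sia"})
	store.Append("chat-1", Turn{Role: "assistant", Text: "Hello!"})
	store.Append("chat-1", Turn{Role: "user", Text: "What did I just say?"})

	// Only the two most recent turns remain for this chat.
	for _, t := range store.Recent("chat-1") {
		fmt.Printf("%s: %s\n", t.Role, t.Text)
	}
}
```

Capping the history per chat keeps the prompt sent to the model bounded, and hiding the storage behind an interface makes it possible to swap the in-memory stand-in for a real Redis client without touching the rest of the bot.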