# Voice Chatbot with Chainlit
This example demonstrates how to create a voice-enabled chatbot using Quivr and Chainlit. The chatbot lets users upload a text file, ask questions about its content, and interact using speech.
## Prerequisites
- Python: Version 3.8 or higher.
- OpenAI API Key: Ensure you have a valid OpenAI API key.
## Installation

- Clone the repository and change into the example directory:

  ```bash
  git clone https://github.com/QuivrHQ/quivr
  cd quivr/examples/chatbot_voice
  ```

- Set the OpenAI API key as an environment variable:

  ```bash
  export OPENAI_API_KEY='<your-key-here>'
  ```

- Install the required dependencies:

  ```bash
  pip install -r requirements.lock
  ```
## Running the Chatbot

- Start the Chainlit server:

  ```bash
  chainlit run main.py
  ```

- Open your web browser and navigate to the URL displayed in the terminal (default: http://localhost:8000).
## Using the Chatbot

### File Upload

- Once the interface loads, the chatbot will prompt you to upload a `.txt` file.
- Click on the upload area or drag and drop a text file. Ensure the file size is under 20MB.
- After processing, the chatbot will notify you that it’s ready for interaction.
### Asking Questions
- Type your questions in the input box or upload an audio file containing your question.
- If using text input, the chatbot will respond with an answer derived from the uploaded file's content.
- If using audio input:
  - The chatbot converts your speech to text using OpenAI Whisper.
  - It processes the text query and generates a response.
  - It converts the response back to audio, enabling hands-free interaction (see the sketch below).
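Both directions of the voice pipeline go through OpenAI's audio endpoints. The following is a minimal sketch of that round trip in isolation; the model names (`whisper-1`, `tts-1`), the `alloy` voice, and the helper names are illustrative choices rather than the exact ones used in `main.py`.

```python
from openai import OpenAI

client = OpenAI()  # picks up OPENAI_API_KEY from the environment

def speech_to_text(audio_bytes: bytes, mime_type: str = "audio/webm") -> str:
    """Transcribe a recorded question to text with Whisper."""
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=("question.webm", audio_bytes, mime_type),
    )
    return transcript.text

def text_to_speech(answer: str) -> bytes:
    """Convert the chatbot's text answer into spoken audio (MP3 bytes by default)."""
    response = client.audio.speech.create(model="tts-1", voice="alloy", input=answer)
    return response.read()
```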
## Features
- Text File Processing: Creates a "brain" for the uploaded file using Quivr for question answering (see the sketch after this list).
- Speech-to-Text (STT): Transcribes user-uploaded audio queries using OpenAI Whisper.
- Text-to-Speech (TTS): Converts chatbot responses into audio for a seamless voice chat experience.
- Source Display: Shows relevant file sources for each response.
- Real-Time Updates: Uses streaming for live feedback during processing.
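The "brain" referenced above is a quivr-core object built from the uploaded file. A minimal, Chainlit-free sketch of that piece on its own (the brain name and file path are placeholders):

```python
from quivr_core import Brain

# Parse, chunk, and index the uploaded text file so it can be queried.
brain = Brain.from_files(name="uploaded-file-brain", file_paths=["./my_notes.txt"])

# Ask a question against the indexed content.
answer = brain.ask("What is this document about?")
print(answer.answer)
```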
## How It Works

- File Upload: The user uploads a `.txt` file, which is temporarily saved and processed into a "brain" using Quivr.
- Session Handling: Chainlit manages user sessions to retain the uploaded file and brain context.
- Voice Interaction:
  - Audio queries are processed via the OpenAI Whisper API.
  - Responses are generated and optionally converted into audio for playback.
- Streaming: The chatbot streams its answers incrementally, improving response speed.
## Workflow

### Chat Start
- Waits for a text file upload.
- Processes the file into a "brain."
- Notifies the user when it is ready for interaction (see the sketch below).
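A minimal sketch of such a chat-start handler, assuming Chainlit's `AskFileMessage` prompt and quivr-core's `Brain.from_files`; the messages and the `"brain"` session key are illustrative, not copied from `main.py`:

```python
import chainlit as cl
from quivr_core import Brain

@cl.on_chat_start
async def on_chat_start():
    # Wait for the user to provide a .txt file (20MB limit).
    files = await cl.AskFileMessage(
        content="Please upload a .txt file to begin.",
        accept=["text/plain"],
        max_size_mb=20,
    ).send()
    txt_file = files[0]

    # Build the brain off the event loop (indexing is blocking), then
    # keep it in the per-user session for later message handlers.
    brain = await cl.make_async(Brain.from_files)(
        name=txt_file.name, file_paths=[txt_file.path]
    )
    cl.user_session.set("brain", brain)

    await cl.Message(content=f"`{txt_file.name}` is processed. Ask away!").send()
```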
### On User Message
- Extracts the "brain" and queries it using the message content.
- Streams the response back to the user.
- Displays file sources related to the response (a handler sketch follows).
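A sketch of the message handler, assuming the brain was stored in the session as shown above and that `brain.ask_streaming` yields chunks exposing an `answer` field (true of recent quivr-core releases; adjust to your version). Source display is omitted here because its metadata layout varies across versions.

```python
import chainlit as cl

@cl.on_message
async def on_message(message: cl.Message):
    # Retrieve the brain built at chat start for this user.
    brain = cl.user_session.get("brain")

    msg = cl.Message(content="")
    await msg.send()

    # Stream the answer incrementally for faster perceived responses.
    async for chunk in brain.ask_streaming(message.content):
        await msg.stream_token(chunk.answer)

    await msg.update()
```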
### Audio Interaction
- Captures and processes audio chunks during user input.
- Converts captured audio into text using Whisper.
- Queries the brain and provides both text and audio responses (see the sketch below).
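One way to wire this up is with Chainlit's audio hooks, reusing the `speech_to_text` and `text_to_speech` helpers sketched earlier (assumed to be in scope). Hook names and payload types differ between Chainlit releases; this sketch assumes the 1.x `on_audio_chunk` / `on_audio_end` API, so treat it as an outline rather than drop-in code.

```python
import io
import chainlit as cl

@cl.on_audio_chunk
async def on_audio_chunk(chunk: cl.AudioChunk):
    # Start a fresh buffer on the first chunk, then append raw audio bytes.
    if chunk.isStart:
        cl.user_session.set("audio_buffer", io.BytesIO())
        cl.user_session.set("audio_mime_type", chunk.mimeType)
    cl.user_session.get("audio_buffer").write(chunk.data)

@cl.on_audio_end
async def on_audio_end(elements):
    audio_buffer: io.BytesIO = cl.user_session.get("audio_buffer")
    mime_type: str = cl.user_session.get("audio_mime_type")
    brain = cl.user_session.get("brain")

    # Speech -> text -> brain -> text -> speech, run off the event loop
    # because these calls are blocking.
    question = await cl.make_async(speech_to_text)(audio_buffer.getvalue(), mime_type)
    answer = (await cl.make_async(brain.ask)(question)).answer
    audio_answer = await cl.make_async(text_to_speech)(answer)

    await cl.Message(
        content=answer,
        elements=[cl.Audio(name="answer", content=audio_answer, mime="audio/mpeg", auto_play=True)],
    ).send()
```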
Enjoy interacting with your documents in both text and voice modes!