Voice Chatbot with Flask#
This example builds a simple voice chatbot with Flask and Quivr: users upload a .txt file and ask spoken questions about its content. Speech-to-text handles the input and text-to-speech delivers the answer, so the whole exchange can happen by voice.
Prerequisites#
- Python: Version 3.8 or higher.
- OpenAI API Key: Ensure you have a valid OpenAI API key.
Installation#
- Clone the repository and navigate to the project directory:
  git clone https://github.com/QuivrHQ/quivr
  cd quivr/examples/quivr-whisper
- Set the OpenAI API key as an environment variable:
  export OPENAI_API_KEY='<your-key-here>'
- Install the required dependencies:
  pip install -r requirements.lock
Running the Application#
- Start the Flask server:
  python app.py
- Open your web browser and navigate to the URL displayed in the terminal (default: http://localhost:5000).
Using the Chatbot#
File Upload#
- On the interface, upload a .txt file.
- Ensure the file format is supported and its size is manageable.
- The file will be processed, and a "brain" instance will be created.
Asking Questions#
- Use the microphone to record your question; the recording is uploaded as an audio file.
- The chatbot will process your question and respond with an audio answer.
How It Works#
File Upload#
- Users upload a .txt file.
- The file is saved to the uploads directory and used to create a "brain" using Quivr.
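The save step can be sketched with the standard library alone. This is a minimal illustration, not the example's actual code: the function name `save_upload` and the `MAX_BYTES` limit are assumptions, and the Quivr call that builds the "brain" from the saved file is left out. In a Flask route, `werkzeug.utils.secure_filename` would normally do the file name sanitizing.

```python
from pathlib import Path

UPLOAD_DIR = Path("uploads")   # directory named in the text above
MAX_BYTES = 1_000_000          # assumed size limit, not taken from the example


def save_upload(filename: str, data: bytes, upload_dir: Path = UPLOAD_DIR) -> Path:
    """Validate an uploaded file and write it into the uploads directory."""
    # Keep only the last path component so "../x.txt" cannot escape upload_dir.
    safe_name = Path(filename).name
    if Path(safe_name).suffix.lower() != ".txt":
        raise ValueError("only .txt files are supported")
    if len(data) > MAX_BYTES:
        raise ValueError("file is too large")
    upload_dir.mkdir(parents=True, exist_ok=True)
    target = upload_dir / safe_name
    target.write_bytes(data)
    return target
```

The returned path is what would be handed to Quivr to create the "brain."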
Session Management#
- Each session is associated with a unique ID, allowing the system to cache the user's "brain."
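One way to picture the caching is an in-memory dictionary keyed by session ID. The sketch below is an assumption about the shape of that cache, not the example's implementation; the class name `BrainCache` and its methods are hypothetical, and the cached value can be any object (here it stands in for a Quivr "brain").

```python
import uuid
from typing import Any, Dict, Optional


class BrainCache:
    """Hypothetical in-memory cache mapping session IDs to "brain" objects."""

    def __init__(self) -> None:
        self._brains: Dict[str, Any] = {}

    def new_session(self) -> str:
        # A random UUID gives each session a unique, hard-to-guess ID.
        return uuid.uuid4().hex

    def put(self, session_id: str, brain: Any) -> None:
        self._brains[session_id] = brain

    def get(self, session_id: str) -> Optional[Any]:
        # Returns None when the session has not uploaded a file yet.
        return self._brains.get(session_id)
```

An in-memory cache like this is lost when the server restarts and is not shared across worker processes, which is usually acceptable for a demo.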
Speech-to-Text#
- User audio files are processed with OpenAI's Whisper model to generate transcripts.
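A transcription call might look like the sketch below. It assumes the official `openai` Python client, passed in as `client` (for example `OpenAI()`), and the `whisper-1` model name; the source does not say which model identifier the example uses, so treat both as assumptions.

```python
def transcribe(client, audio_path: str, model: str = "whisper-1") -> str:
    """Send an audio file to the transcription endpoint and return the text.

    `client` is assumed to be an `openai.OpenAI` instance; the model name
    is an assumption, not something stated by the example.
    """
    with open(audio_path, "rb") as audio_file:
        result = client.audio.transcriptions.create(model=model, file=audio_file)
    return result.text
```

Passing the client in as an argument keeps the function easy to exercise with a stub, without calling the API.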
Question Answering#
- The "brain" processes the transcribed text, retrieves relevant answers, and generates a response.
Text-to-Speech#
- The answer is converted to audio using OpenAI's text-to-speech model and returned to the user.
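The synthesis step could be sketched as follows, again assuming the official `openai` Python client is passed in as `client`. The `tts-1` model and `alloy` voice are placeholder choices; the source does not specify which ones the example uses.

```python
def synthesize(client, text: str, model: str = "tts-1", voice: str = "alloy") -> bytes:
    """Convert answer text to audio bytes with a text-to-speech model.

    `client` is assumed to be an `openai.OpenAI` instance; model and voice
    are assumptions chosen for illustration.
    """
    response = client.audio.speech.create(model=model, voice=voice, input=text)
    # The binary response exposes the full audio payload as bytes.
    return response.content
```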
Workflow#
- Upload File:
  - The user uploads a .txt file.
  - A "brain" is created and cached for the session.
- Ask Questions:
  - The user uploads an audio file containing a question.
  - The question is transcribed, processed, and answered using the "brain."
- Answer Delivery:
  - The answer is converted to audio and returned to the user as a Base64-encoded string.
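The ask-and-answer part of the workflow can be sketched as one function that chains the three steps and Base64-encodes the result. Everything here is illustrative: `answer_question`, the three callables it receives, and the response field names are hypothetical. In the real application the callables would wrap the Whisper call, the cached brain's query method, and the text-to-speech call.

```python
import base64
from typing import Callable, Dict


def answer_question(
    audio_path: str,
    transcribe: Callable[[str], str],    # audio file path -> question text
    ask: Callable[[str], str],           # question text -> answer text
    synthesize: Callable[[str], bytes],  # answer text -> audio bytes
) -> Dict[str, str]:
    """Run transcription, question answering, and speech synthesis in order."""
    question = transcribe(audio_path)
    answer = ask(question)
    audio = synthesize(answer)
    # Base64 turns the binary audio into plain ASCII that fits in a JSON body.
    return {
        "question": question,
        "answer": answer,
        "audio_base64": base64.b64encode(audio).decode("ascii"),
    }
```

On the receiving side, decoding the `audio_base64` field recovers the original audio bytes for playback.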
Features#
- File Upload and Processing:
  - Creates a context-aware "brain" from the uploaded text file.
- Audio-based Interaction:
  - Supports speech-to-text for input and text-to-speech for responses.
- Session Management:
  - Retains user context throughout the interaction.
- Integration with OpenAI:
  - Uses OpenAI models for transcription, answer generation, and audio synthesis.
Enjoy interacting with your text files through an intuitive voice-based interface!