Voice Chatbot with Flask#

This example demonstrates a simple chatbot using Flask and Quivr, where users can upload a .txt file and ask questions based on its content. It supports speech-to-text and text-to-speech capabilities for a seamless interactive experience.


Prerequisites#

  • Python: Version 3.8 or higher.
  • OpenAI API Key: Ensure you have a valid OpenAI API key.

Installation#

  1. Clone the repository and navigate to the project directory:

    git clone https://github.com/QuivrHQ/quivr
    cd quivr/examples/quivr-whisper
    

  2. Set the OpenAI API key as an environment variable:

    export OPENAI_API_KEY='<your-key-here>'
    

  3. Install the required dependencies:

    pip install -r requirements.lock
    


Running the Application#

  1. Start the Flask server:

    python app.py
    

  2. Open your web browser and navigate to the URL displayed in the terminal (default: http://localhost:5000).


Using the Chatbot#

File Upload#

  1. On the interface, upload a .txt file.
  2. Only plain-text (.txt) files are supported; keep the file small enough to be processed quickly.
  3. The file will be processed, and a "brain" instance will be created.

Asking Questions#

  1. Use the microphone to record your question (audio upload).
  2. The chatbot will process your question and respond with an audio answer.

How It Works#

File Upload#

  • Users upload a .txt file.
  • The file is saved to the uploads directory and used to create a "brain" using Quivr.
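
A minimal sketch of this step, assuming the quivr-core Brain API from the Quivr README and an uploads/ folder next to app.py (the names here are illustrative; the actual app.py may differ):

    import os
    from quivr_core import Brain

    UPLOAD_FOLDER = "uploads"

    def create_brain_from_upload(uploaded_file):
        """Save the uploaded .txt file and build a Quivr 'brain' from it."""
        os.makedirs(UPLOAD_FOLDER, exist_ok=True)
        file_path = os.path.join(UPLOAD_FOLDER, uploaded_file.filename)
        uploaded_file.save(file_path)  # Werkzeug FileStorage from Flask's request.files
        # The brain's knowledge is limited to this single uploaded file
        return Brain.from_files(name="user-brain", file_paths=[file_path])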

Session Management#

  • Each session is associated with a unique ID, allowing the system to cache the user's "brain."
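
The session cache can be sketched with Flask's session object and a module-level dictionary (brains_cache and the session key are illustrative assumptions, not necessarily the exact names in app.py):

    import uuid
    from flask import Flask, session

    app = Flask(__name__)
    app.secret_key = "change-me"  # required for Flask sessions; use a real secret

    brains_cache = {}  # session id -> cached "brain"

    def get_session_id():
        """Give each browser session a unique id used to look up its brain."""
        if "session_id" not in session:
            session["session_id"] = str(uuid.uuid4())
        return session["session_id"]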

Speech-to-Text#

  • User audio files are processed with OpenAI's Whisper model to generate transcripts.
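
A sketch of the transcription call with the OpenAI Python SDK (v1-style client; how app.py receives and stores the recorded audio may differ):

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    def transcribe(audio_path):
        """Turn the recorded question into text with Whisper."""
        with open(audio_path, "rb") as audio_file:
            transcript = client.audio.transcriptions.create(
                model="whisper-1",
                file=audio_file,
            )
        return transcript.text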

Question Answering#

  • The "brain" processes the transcribed text, retrieves relevant answers, and generates a response.

Text-to-Speech#

  • The answer is converted to audio using OpenAI's text-to-speech model and returned to the user.
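
A sketch of the synthesis step, assuming OpenAI's tts-1 model via the v1 Python SDK (the voice and response handling are illustrative):

    import base64
    from openai import OpenAI

    client = OpenAI()

    def synthesize(answer_text):
        """Convert the answer text to speech and return it as a Base64 string."""
        speech = client.audio.speech.create(
            model="tts-1",
            voice="alloy",
            input=answer_text,
        )
        return base64.b64encode(speech.content).decode("utf-8")  # MP3 bytes by default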

Workflow#

  1. Upload File:
    • The user uploads a .txt file.
    • A "brain" is created and cached for the session.
  2. Ask Questions:
    • The user uploads an audio file containing a question.
    • The question is transcribed, processed, and answered using the "brain."
  3. Answer Delivery:
    • The answer is converted to audio and returned to the user as a Base64-encoded string.

Features#

  1. File Upload and Processing:
    • Creates a context-aware "brain" from the uploaded text file.
  2. Audio-based Interaction:
    • Supports speech-to-text for input and text-to-speech for responses.
  3. Session Management:
    • Retains user context throughout the interaction.
  4. Integration with OpenAI:
    • Uses OpenAI models for transcription, answer generation, and audio synthesis.

Enjoy interacting with your text files through an intuitive voice-based interface!