This is a Python-based interface for interacting with Google's Gemini AI models, now enhanced with advanced file processing capabilities. The app supports text, image, and ZIP file inputs, making it a useful tool for developers and AI enthusiasts. This version was built by DeepSeek and me, showcasing real-life coding collaboration between humans and AI.
- Chat with Gemini Models: Supports various Gemini models, including vision-enabled ones.
- Image Analysis: Upload images for AI-powered analysis (e.g., gemini-1.5-pro-vision-latest).
- ZIP File Processing: Extract and analyze contents of ZIP files, including text, code, and more.
- Customizable Settings: Adjust temperature, max tokens, and system prompts for tailored interactions.
- Interactive UI: Built with Streamlit for an intuitive user experience.
- Use Your Own API Key: Securely integrate your Google AI API key.
Try the app hosted on Hugging Face: Gemini AI Chat
This project started as a simple chat interface but evolved into a robust tool thanks to the collaboration between DeepSeek (AI) and me. Here's how we tackled the challenges:
- ZIP File Processing:
  - Problem: Initial versions only displayed file names, not their contents.
  - Solution: We implemented a dynamic file reader that extracts and processes text from supported file types (e.g., .txt, .py, .php, .csv, .pdf).
  - Learning: Handling binary files and ensuring compatibility with various file formats.
- Error Handling:
  - Problem: The API sometimes returned empty responses or errors.
  - Solution: Added robust error handling and validation for API responses, ensuring users receive clear feedback.
- Real-Life Coding:
  - Problem: Gemini 1 and 2 APIs struggled with advanced file processing tasks.
  - Solution: DeepSeek stepped in to enhance the code, demonstrating the power of human-AI collaboration.
- User Experience:
  - Problem: Users needed clearer instructions and feedback.
  - Solution: Improved UI with emojis, file previews, and detailed error messages.
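The ZIP handling described above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the code from app.py: the names extract_text_from_zip, TEXT_EXTENSIONS, and MAX_CHARS_PER_FILE are hypothetical, the size cap is an assumption, and PDF members are skipped because they need a separate parser such as PyPDF2.

```python
import io
import zipfile
from pathlib import PurePosixPath
from typing import Dict

# Assumed set of plain-text extensions (hypothetical; PDFs are skipped in this sketch).
TEXT_EXTENSIONS = {".txt", ".py", ".php", ".csv"}
MAX_CHARS_PER_FILE = 10_000  # hypothetical cap so one large file cannot flood the prompt


def extract_text_from_zip(zip_bytes: bytes) -> Dict[str, str]:
    """Return {member name: decoded text} for supported text files in the archive."""
    contents: Dict[str, str] = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # directories carry no content
            suffix = PurePosixPath(info.filename).suffix.lower()
            if suffix not in TEXT_EXTENSIONS:
                continue  # binary or unsupported file: ignore it
            raw = archive.read(info)
            # errors="replace" keeps going when a file is not valid UTF-8
            text = raw.decode("utf-8", errors="replace")
            contents[info.filename] = text[:MAX_CHARS_PER_FILE]
    return contents
```

Reading the archive from bytes instead of a path fits an upload widget that hands over the file in memory. The returned dictionary can then be formatted into the prompt, one file at a time.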
Follow these steps to run the app locally:
- Clone or download the repository.
- Install Dependencies: Make sure you have Python 3.7 or newer installed.
  pip install -r requirements.txt
- Run the App:
  streamlit run app.py
- Access the App: Open your browser and navigate to http://localhost:8501.
- Enter Your API Key:
  - Provide your Google AI API key in the sidebar.
- Select a Model:
  - For text interactions, choose models like gemini-1.5-pro.
  - For image-related tasks, use vision-enabled models (e.g., gemini-1.5-pro-vision-latest).
- Adjust Settings:
  - Set temperature and max tokens for the desired output style.
- Upload Files:
  - Upload images, text files, or ZIP archives for analysis.
  - Supported file types: .txt, .py, .php, .csv, .pdf, .zip, and more.
- Chat with the AI:
  - Type your message and press Enter.
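To illustrate how the sidebar settings might be turned into a request configuration, here is a small sketch. The helper name build_generation_config is hypothetical, and the 0.0 to 2.0 temperature range is an assumption that may differ between models. The keys temperature and max_output_tokens follow the generation config fields of the Google Generative AI Python SDK.

```python
from typing import Any, Dict


def build_generation_config(temperature: float, max_tokens: int) -> Dict[str, Any]:
    """Validate the sidebar values and return them as a plain dict."""
    # Assumed valid range; check the limits documented for the model you pick.
    if not 0.0 <= temperature <= 2.0:
        raise ValueError(f"temperature must be between 0.0 and 2.0, got {temperature}")
    if max_tokens < 1:
        raise ValueError(f"max_tokens must be at least 1, got {max_tokens}")
    return {"temperature": float(temperature), "max_output_tokens": int(max_tokens)}


config = build_generation_config(0.7, 1024)

# Possible use with the google-generativeai package (not executed here):
#   import google.generativeai as genai
#   genai.configure(api_key=API_KEY)  # API_KEY is a placeholder for your own key
#   model = genai.GenerativeModel("gemini-1.5-pro", generation_config=config)
```

Lower temperatures tend to give more deterministic answers, while the token limit bounds the length of each reply.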
The app uses:
- Streamlit: For creating the interactive web UI.
- Google Generative AI Python SDK: To connect with Google's Gemini models.
- Pillow: For image processing.
- Base64 Encoding: To handle image data.
- Zipfile & PyPDF2: For processing ZIP and PDF files.
For more details, see the code in app.py.
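As a rough idea of what the Base64 step for image data could look like, consider the sketch below. It uses only the standard library. The function name encode_image_bytes and the shape of the returned dictionary are assumptions made for illustration, not the structure used in app.py.

```python
import base64
from typing import Dict


def encode_image_bytes(image_bytes: bytes, mime_type: str = "image/png") -> Dict[str, str]:
    """Base64-encode raw image bytes so they can travel inside text or JSON."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    # Hypothetical payload shape: the MIME type plus the encoded data.
    return {"mime_type": mime_type, "data": encoded}


def decode_image_payload(payload: Dict[str, str]) -> bytes:
    """Reverse of encode_image_bytes: recover the original bytes."""
    return base64.b64decode(payload["data"])
```

Base64 makes the payload about one third larger than the raw bytes, so resizing large images first (for example with Pillow) can be worthwhile.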
This project highlights the importance of:
- Iterative Development: Starting simple and gradually adding features.
- Error Handling: Ensuring users receive clear feedback when something goes wrong.
- Human-AI Collaboration: Combining human creativity with AI's problem-solving capabilities.
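The error-handling lesson can be made concrete with a short sketch. It assumes the API call is wrapped in a callable named send (hypothetical) that returns an object with a text attribute. The helper safe_generate and its messages are illustrative and are not taken from app.py.

```python
from typing import Callable, Tuple


def safe_generate(send: Callable[[str], object], prompt: str) -> Tuple[bool, str]:
    """Call send(prompt) and turn failures or empty replies into readable messages."""
    try:
        response = send(prompt)
    except Exception as exc:  # broad on purpose: any API failure should reach the user
        return False, f"API error: {exc}"
    try:
        # Assumption: the reply exposes its text via a .text attribute,
        # which may itself raise ValueError when no usable content exists.
        text = getattr(response, "text", None)
    except ValueError as exc:
        return False, f"No usable text in the response: {exc}"
    if not text or not text.strip():
        return False, "The model returned an empty response. Try rephrasing your prompt."
    return True, text
```

Returning a success flag with the message lets the UI choose between showing a normal chat bubble and an error notice.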
Thanks to:
- DeepSeek: For providing AI-powered coding assistance and helping overcome technical challenges.
- VolkanSah: For leading the project and integrating real-world use cases.
- Google Gemini API: For enabling powerful AI interactions.
If you encounter any issues or have suggestions, feel free to open an issue or contribute by submitting a pull request.
This project is licensed under the GNU General Public License v3 (GPLv3). See the LICENSE file for details.