Commit 8f7aa54

project commit
1 parent 5db78f7 commit 8f7aa54

6 files changed, with 2658 additions and 2 deletions

ProjectPresentation.pdf

3.15 MB
Binary file not shown.

README.md

Lines changed: 71 additions & 2 deletions
@@ -1,2 +1,71 @@
1-
# AiDrivenKeywordExtractionTagGeneration
2-
LLM-powered pipeline for extracting entities and generating tags from articles and images using GPT-4.1, optimized for media indexing and content analysis.
1+
# AI-Driven Keyword Extraction & Tag Generation
2+
3+
This project provides an end-to-end, LLM-powered system for structured information extraction and tag generation from multimedia content. It is designed to help organizations understand which athletes, teams, disciplines, and events are being covered in media, and enrich visual content with consistent tags for search and content management.
4+
5+
## Features
6+
7+
- **Entity Extraction from Articles**
8+
- Extracts company-related athletes, teams, disciplines, and events
9+
- Configurable GPT-4.1-based pipeline with support for multiple reruns and temperature tuning
10+
- Consolidation logic to merge multiple runs for maximum recall
11+
12+
- **Tag Generation from Images**
13+
- Uses GPT-4 Vision API to describe images with high-quality tags
14+
- Detects subjects, actions, settings, brand elements, and technical components
15+
- Tag consolidation prompt ensures consistent and relevant output
16+
17+
- **Prompt Engineering Framework**
18+
- Structured prompt design (Goal, Format, Constraints, Context)
19+
- Evaluation loop based on test tiers: minimal, small, and full sample
20+
- Built-in evaluation criteria: recall, precision, and F1 (confidence-based scoring is planned)
21+
22+
- **Front-End Apps (Streamlit)**
23+
- Configurable UIs for both text and image pipelines
24+
- Advanced settings panel for experienced users to adjust the model, temperature, and number of runs
25+
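The "merge multiple runs for maximum recall" step listed under Features can be pictured as a union over the entities returned by each run. The following is a minimal sketch, not the project's actual consolidation logic: the function names `normalize` and `consolidate_runs`, the per-run dictionary layout, and the case-insensitive deduplication rule are all assumptions made for illustration.

```python
# Hypothetical sketch: union the entities from several extraction runs.
# The category names mirror the README; the data layout is an assumption.
CATEGORIES = ("athletes", "teams", "disciplines", "events")


def normalize(name: str) -> str:
    """Return a case- and whitespace-insensitive key for duplicate detection."""
    return " ".join(name.split()).casefold()


def consolidate_runs(runs):
    """Merge per-run results, keeping the first-seen spelling of each entity."""
    merged = {category: {} for category in CATEGORIES}
    for run in runs:
        for category in CATEGORIES:
            for name in run.get(category, []):
                key = normalize(name)
                if key:  # skip empty or whitespace-only names
                    merged[category].setdefault(key, " ".join(name.split()))
    return {category: list(names.values()) for category, names in merged.items()}


runs = [
    {"athletes": ["Jane Doe"], "teams": ["Team Alpha"]},
    {"athletes": ["jane doe ", "John Roe"], "events": ["City Marathon"]},
]
print(consolidate_runs(runs))
```

A plain union favors recall over precision, since any entity found in a single run is kept. A stricter variant could require an entity to appear in at least two runs.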
26+
## Tech Stack
27+
28+
- **Backend:** Python 3, OpenAI GPT-4.1 via Responses API
29+
- **Frontend:** Streamlit
30+
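As a rough illustration of how the structured prompt design (Goal, Format, Constraints, Context) could be passed to GPT-4.1 through the Responses API, here is a hedged sketch. The helper names `build_instructions`, `build_request`, and `extract_entities`, the section layout, and the default temperature are assumptions, not code taken from this repository. The call itself uses `client.responses.create` from the official `openai` Python package, which reads `OPENAI_API_KEY` from the environment.

```python
# Hypothetical sketch: assemble a structured prompt and send it via the Responses API.
def build_instructions(goal, output_format, constraints, context):
    """Join the four prompt sections into one instruction string."""
    sections = [
        ("Goal", goal),
        ("Format", output_format),
        ("Constraints", constraints),
        ("Context", context),
    ]
    return "\n\n".join(f"## {title}\n{body.strip()}" for title, body in sections)


def build_request(article_text, instructions, model="gpt-4.1", temperature=0.2):
    """Build the keyword arguments for a Responses API call (defaults are assumed)."""
    return {
        "model": model,
        "instructions": instructions,
        "input": article_text,
        "temperature": temperature,
    }


def extract_entities(article_text, instructions, **options):
    """Send one request and return the raw model text; needs network access."""
    from openai import OpenAI  # imported lazily so the helpers above work offline

    client = OpenAI()  # picks up OPENAI_API_KEY from the environment
    response = client.responses.create(
        **build_request(article_text, instructions, **options)
    )
    return response.output_text
```

Keeping request construction separate from the network call makes it easy to swap the model (for example `model="gpt-4.1-mini"`) or the temperature per run without touching the prompt text.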
31+
## Evaluation Highlights
32+
33+
- Multi-model and rerun comparison (GPT-4.1 vs GPT-4.1 mini)
34+
- A/B testing setup to compare prompts and models
35+
- Output consolidation via an LLM prompt to ensure structured, consistent results
36+
- Modular design for versioned prompt & model swapping
37+
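When comparing models, reruns, or prompt variants as described above, each configuration can be scored against a hand-labeled reference set using recall, precision, and F1. The sketch below is one possible way to do that, assuming exact matching after lowercasing and trimming; the function name `score` and the matching rule are assumptions, not the project's evaluation code.

```python
# Hypothetical sketch: precision, recall, and F1 for one set of extracted entities.
def score(predicted, expected):
    """Compare predicted entities with a reference list, ignoring case and padding."""
    pred = {p.strip().casefold() for p in predicted if p.strip()}
    gold = {e.strip().casefold() for e in expected if e.strip()}
    true_positives = len(pred & gold)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# A/B comparison: the same reference list scored against two outputs.
expected = ["Jane Doe", "Team Alpha"]
output_a = ["jane doe", "Team Beta"]
output_b = ["Jane Doe", "Team Alpha", "City Marathon"]
print("A:", score(output_a, expected))
print("B:", score(output_b, expected))
```

Exact matching is strict: spelling variants of the same athlete or team count as misses, so a real evaluation would likely need an alias table or fuzzy matching.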
38+
## Project Structure
39+
<pre>
40+
├── entityExtraction.py # Entity extraction logic from article JSONs
41+
├── entityExtraction.ipynb # Jupyter notebook for iterative prompt tuning and testing
42+
├── tagGeneration.py # Image tag generation pipeline using GPT-4 Vision
43+
├── tagGeneration.ipynb # Visual exploration and prompt iterations for tagging
44+
├── ProjectPresentation.pdf # Project presentation for case study
45+
├── LICENSE # MIT license
46+
└── README.md # This file
47+
</pre>
48+
49+
## Setup & Usage
50+
51+
1. Clone the repository
52+
2. Install dependencies
53+
3. Set environment variables
54+
4. Run extraction or tagging
55+
   - Use `entityExtraction.ipynb` for articles
56+
   - Use `tagGeneration.ipynb` for image tag generation
57+
5. Optionally, launch the Streamlit apps
58+
   - `streamlit run entityExtraction.py`
59+
   - `streamlit run tagGeneration.py`
60+
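The README does not name the environment variables required in step 3. Assuming the pipeline uses the official `openai` Python package, the client reads its key from `OPENAI_API_KEY` by default, so a small pre-flight check like the following could catch a missing key before any request is made. The helper name `missing_env_vars` and the `REQUIRED_VARS` tuple are hypothetical.

```python
# Hypothetical sketch: verify required environment variables before running.
import os

# Assumption: only the OpenAI API key is needed; extend the tuple as required.
REQUIRED_VARS = ("OPENAI_API_KEY",)


def missing_env_vars(required=REQUIRED_VARS, environ=None):
    """Return the names of required variables that are unset or blank."""
    environ = os.environ if environ is None else environ
    return [name for name in required if not environ.get(name, "").strip()]
```

A notebook or Streamlit app could call `missing_env_vars()` at startup and stop with a clear message if the returned list is not empty.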
61+
## Future Enhancements
62+
- Confidence scores for model outputs
63+
- Domain-specific semantic validation
64+
- Adaptive prompt rerunning based on low-confidence tags
65+
- Fine-tuning or model personalization
66+
- Human-in-the-loop feedback interface
67+
68+
## License
69+
70+
MIT License – see LICENSE for details.
71+
