** Previously known as dbt-llm-agent **
Ragstar is an AI agent that learns the context of your dbt project to answer questions, generate documentation, and bring data insights closer to your users. Interact via Slack or CLI, and watch it improve over time with feedback.
BETA NOTICE: Ragstar is currently in beta. Core features include agentic model interpretation and semantic question answering about your dbt project.
- Democratize Data Access: Allow anyone to ask questions about your dbt project in natural language via Slack or CLI.
- Automate Documentation: Generate model and column descriptions where they're missing, improving data catalog quality.
- Enhance Data Discovery: Quickly find relevant models and understand their logic without digging through code.
- Continuous Learning: Ragstar learns from feedback to provide increasingly accurate and helpful answers.
- Natural Language Q&A: Ask about models, sources, metrics, lineage, etc.
- Agentic Interpretation: Intelligently analyzes dbt models, understanding logic and context.
- Automated Documentation Generation: Fills documentation gaps using LLMs.
- Semantic Search: Finds relevant assets based on meaning, not just keywords.
- dbt Integration: Parses metadata from dbt Cloud, local runs (
manifest.json
), or source code. - Postgres + pgvector Backend: Stores metadata and embeddings efficiently.
- Feedback Loop: Tracks questions and feedback for improvement.
- Slack Integration: Built-in Slackbot for easy interaction.
- Accelerate Data Discovery: Quickly find relevant dbt models and understand their purpose without digging through code.
- Improve Onboarding: Help new team members understand the dbt project structure and logic faster.
- Maintain Data Documentation: Keep dbt documentation up-to-date with automated generation and suggestions.
- Enhance Data Governance: Gain better visibility into data lineage and model dependencies.
- Debug dbt Models: Ask clarifying questions about model logic and calculations.
Ragstar combines several technologies to provide its capabilities:
- dbt Project Parsing: Extracts comprehensive metadata from dbt artifacts (
manifest.json
) or source files (.sql
,.yml
), including models, sources, exposures, metrics, tests, columns, descriptions, and lineage. - PostgreSQL Database with pgvector: Serves as the central knowledge store. It holds structured metadata parsed from the dbt project, generated documentation, question/answer history, and vector embeddings of model and column descriptions for semantic search.
- Vector Embeddings: Creates numerical representations (embeddings) of model and column documentation using sentence-transformer models. These embeddings capture semantic meaning, enabling powerful search capabilities.
- Large Language Models (LLMs): Integrates with LLMs (e.g., OpenAI's GPT models) via APIs to:
- Understand natural language questions.
- Generate human-readable answers based on retrieved context from the database and embeddings.
- Interpret model logic and generate documentation.
- Agentic Reasoning: Employs a step-by-step reasoning process, especially for model interpretation, where it breaks down the task, gathers evidence (e.g., upstream model definitions), and synthesizes an interpretation, similar to how a human analyst would approach it.
- CLI Interface: Provides command-line tools (
ragstar ...
) for initialization, embedding generation, asking questions, providing feedback, and managing the system.
Setting up Ragstar involves configuring environment variables and initializing the application with your dbt project data. The recommended method is using Docker Compose, which bundles the application and a PostgreSQL database with the required pgvector
extension.
-
Clone the repository:
git clone https://github.com/pragunbhutani/ragstar.git cd ragstar
-
Configure Environment Variables: Copy the example environment file:
cp .env.example .env
Edit the
.env
file:- Required: Set your
OPENAI_API_KEY
. - dbt Connection: Choose one method:
- dbt Cloud: Fill in
DBT_CLOUD_URL
,DBT_CLOUD_ACCOUNT_ID
,DBT_CLOUD_API_KEY
. - Local dbt: Fill in
DBT_PROJECT_PATH
(absolute path to your local dbt project).
- dbt Cloud: Fill in
- Slack (Optional): If using the Slackbot, set
SLACK_BOT_TOKEN
andSLACK_SIGNING_SECRET
. - Database (Optional): By default, Docker Compose sets up an internal PostgreSQL database. If you want to use an external PostgreSQL database (must have
pgvector
enabled), uncomment and setEXTERNAL_POSTGRES_URL
. This will override the internal database settings. - Other: Review other variables like
APP_PORT
,RAGSTAR_LOG_LEVEL
, etc., and adjust if needed.
- Required: Set your
-
Build and Run with Docker Compose: From the
ragstar
directory, run:docker compose up --build -d
This command builds the Docker image (if it doesn't exist), starts the application container (
ragstar_app
) and the database container (ragstar_db
), and runs them in the background (-d
). The database will be automatically initialized. -
Initialize with dbt Project Data: Once the containers are running, you need to load your dbt project's metadata. Run the
init
command inside the runningapp
container:# If using dbt Cloud configuration (credentials in .env): docker compose exec app ragstar init cloud # If using local dbt project path (DBT_PROJECT_PATH in .env): docker compose exec app ragstar init local # If using local dbt source code (DBT_PROJECT_PATH in .env): docker compose exec app ragstar init source
- Use
cloud
,local
, orsource
depending on the dbt configuration you set in.env
. - The
local
andsource
modes requireDBT_PROJECT_PATH
to point to your dbt project as mounted within the container. Seedocker-compose.yml
volumes.
- Use
-
Build the Knowledge Base (Interpret & Embed): After initializing, you need to process your models so the agent can answer questions about them.
- Interpret (Optional but Recommended): Generate descriptions for models lacking them.
- Embed (Required for Q&A): Create vector embeddings for semantic search.
# Example: Interpret models tagged 'knowledge', save, and then embed ALL models docker compose exec app ragstar interpret --select "tag:knowledge" --save docker compose exec app ragstar embed --select "*" # Example: Embed all models directly (if documentation exists) docker compose exec ragstar_app ragstar embed --select "*"
Choose the appropriate
--select
argument based on which models you want to process. SeeCore Commands
below for more details. -
Verify: Check the logs to ensure everything started correctly:
docker compose logs -f ragstar_app
You should see logs indicating successful initialization and the API server starting.
If you prefer not to use Docker, you can set up a local Python environment.
-
Prerequisites:
- Python 3.10 or higher.
- Poetry for dependency management.
- A running PostgreSQL server (version 11+) with the
pgvector
extension enabled.
-
Check Python Version:
python --version # or python3 --version
-
Clone Repository:
git clone https://github.com/pragunbhutani/ragstar.git cd ragstar
-
Install Dependencies:
# Install Poetry if needed: curl -sSL https://install.python-poetry.org | python3 - poetry install
-
Set up PostgreSQL:
- Install PostgreSQL and
pgvector
. - Create a database (e.g.,
ragstar
). - Enable the
pgvector
extension in the database:-- Run in psql connected to your database CREATE EXTENSION IF NOT EXISTS vector;
- Install PostgreSQL and
-
Configure Environment Variables: Copy the example environment file and fill in your details:
cp .env.example .env
Edit
.env
:- Required: Set
OPENAI_API_KEY
. - Required: Set
EXTERNAL_POSTGRES_URL
to your database connection string (e.g.,postgresql://user:password@host:port/dbname
). The Docker Compose variables (POSTGRES_DB
,POSTGRES_USER
, etc.) are ignored in this setup. - dbt Connection: Choose one method and configure the relevant variables (
DBT_CLOUD_...
orDBT_PROJECT_PATH
). - Slack (Optional): Configure Slack variables if needed.
- Other: Adjust
RAGSTAR_LOG_LEVEL
, etc.
- Required: Set
-
Initialize the Database Schema: Run the database initialization command:
poetry run ragstar init-db
-
Initialize with dbt Project Data: Load your dbt project metadata using the appropriate
init
mode:# Example using dbt Cloud (ensure DBT_CLOUD_* vars are set) poetry run ragstar init cloud # Example using local dbt project (ensure DBT_PROJECT_PATH is set or use --project-path) poetry run ragstar init local --project-path /path/to/your/dbt/project # Example using source code poetry run ragstar init source /path/to/your/dbt/project
-
Build the Knowledge Base (Interpret & Embed): After initializing, process your models so the agent can answer questions.
- Interpret (Optional but Recommended): Generate descriptions for models lacking them.
- Embed (Required for Q&A): Create vector embeddings for semantic search.
# Example: Interpret models tagged 'staging', save, and then embed ALL models poetry run ragstar interpret --select "tag:staging" --save poetry run ragstar embed --select "*" # Example: Embed all models directly (if documentation exists) poetry run ragstar embed --select "*"
Choose the appropriate
--select
argument. SeeCore Commands
below.
After setup and initialization, you can interact with Ragstar.
Most commands should be run inside the app
container using docker compose exec
:
# Example: Ask a question
docker compose exec app ragstar ask "Explain the fct_orders model"
# Example: Interpret and embed a model
docker compose exec app ragstar interpret --select "fct_orders" --save --embed
# Example: Run the Slackbot (will run in the foreground of the exec command)
# Note: The API server runs automatically via the Dockerfile's CMD.
# To run the Slackbot *instead* of the API server, you would need to
# modify the command in docker-compose.yml or the Dockerfile CMD.
# For typical use, run the Slackbot as a separate process if needed,
# perhaps in a different container or locally if developing.
# If you *only* want the Slackbot, change the docker-compose command:
# command: poetry run ragstar serve
The API server (for potential future UI or direct API calls) is started automatically by the docker compose up
command.
Run commands directly using poetry run
:
# Example: Ask a question
poetry run ragstar ask "Explain the fct_orders model"
# Example: Interpret and embed a model
poetry run ragstar interpret --select "fct_orders" --save --embed
# Example: Run the API server
poetry run uvicorn ragstar.api.server:app --host 0.0.0.0 --port 8000 --reload
# Example: Run the Slackbot
poetry run ragstar serve
ragstar init <cloud|local|source> [options]
: Loads dbt project metadata into Ragstar. Run this first.ragstar interpret --select <selector> [--save] [--embed]
: (Optional) Uses LLM to analyze and generate documentation for selected models.--save
writes to DB,--embed
runs embedding after.ragstar embed --select <selector>
: (Required for Q&A) Creates vector embeddings for selected models with documentation, enabling semantic search.ragstar ask "<question>"
: Ask a natural language question about your dbt project.ragstar questions
: List previous questions asked.ragstar feedback <question_id> [--useful | --not-useful] [--text "<feedback>"]
: Provide feedback on an answer to improve the agent.ragstar serve
: Starts the Slackbot (requires Slack env vars).ragstar list
: List models found in the database.ragstar model-details <model_name>
: Show detailed information about a specific model.ragstar init-db
: (Local Setup Only) Initializes the database schema.
Contributions are welcome! Please follow standard fork-and-pull-request workflow.