Commit 76342e5

feat: Implement Day 16 - SaaS Health Metrics Dashboard with retention curves and Metabase queries
- Added script to generate retention curves from subscription data (`day16_DATA_generate_retention_curves.py`)
- Created comprehensive SQL queries for the Metabase dashboard (`day16_QUERIES_metabase.md`)
- Developed a quick-start guide for setting up the dashboard in Metabase Cloud (`day16_QUICKSTART.md`)
- Compiled project summary and documentation for execution readiness (`day16_SUMMARY.md`)
- Specified project dependencies in a requirements file (`day16_requirements.txt`)
- Established guidelines for capturing dashboard screenshots (`screenshots/README.md`)
1 parent e688bfc commit 76342e5

13 files changed (+3818 −1 lines)

README.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -38,7 +38,7 @@ Each one ships with full code and documentation.
 | 13 | Orchestration | Alert Triage Orchestrator (Finance Compliance) | Finance/Compliance | ✅ Complete | [Day 13](./day13) |
 | 14 | Orchestration | Transport Regulatory KPIs - Automated Email Reports | Government/Public Policy | ✅ Complete | [Day 14](./day14) |
 | 15 | Orchestration | Real-Time Analytics Orchestrator - Webhook Event Processing Pipeline | SaaS / Technology | ✅ Complete | [Day 15](./day15) |
-| 16 | Dashboards | TBD | TBD | 🚧 Planned | [Day 16](./day16) |
+| 16 | Dashboards | SaaS Health Metrics Dashboard - Metabase Cloud | TBD | ✅ Complete | [Day 16](./day16) |
 | 17 | Dashboards | TBD | TBD | 🚧 Planned | [Day 17](./day17) |
 | 18 | Dashboards | TBD | TBD | 🚧 Planned | [Day 18](./day18) |
 | 19 | Dashboards | TBD | TBD | 🚧 Planned | [Day 19](./day19) |
```

common/prompt library/VISUALIZATION_DELIVERY_CRITERIA.md

Lines changed: 1313 additions & 0 deletions
Large diffs are not rendered by default.

day16/.env.example

Lines changed: 17 additions & 0 deletions
```
# Day 16: SaaS Health Metrics Dashboard - Metabase Cloud + BigQuery
# Environment Variables

## BigQuery Configuration
DAY16_GCP_PROJECT_ID=your-project-id-here
DAY16_BQ_DATASET=day16_saas_metrics
DAY16_BQ_LOCATION=US

## Metabase Cloud Configuration
DAY16_METABASE_URL=https://your-instance.metabaseapp.com
DAY16_METABASE_DASHBOARD_ID=your-dashboard-id

## Data Source
DAY16_SOURCE_DB_PATH=../day06/data/day06_saas_metrics.db

## Service Account (DO NOT COMMIT THE ACTUAL KEY FILE)
DAY16_SERVICE_ACCOUNT_KEY_PATH=./day16_metabase_key.json
```
day16/.gitignore

Lines changed: 16 additions & 0 deletions
```
# Day 16 - Ignore sensitive files

# Service account keys
*.json
!day16_metabase_dashboard.json

# Data files (too large for git)
data/*.csv

# Python cache
__pycache__/
*.pyc
*.pyo

# Environment files
.env
```
day16/README.md

Lines changed: 904 additions & 0 deletions
Large diffs are not rendered by default.

Lines changed: 236 additions & 0 deletions (file name not rendered)
# Day 16: BigQuery Setup Guide

## Overview
This guide helps you upload the Day 6 SaaS metrics data to BigQuery for use with Metabase Cloud.

---

## Step 1: Create BigQuery Dataset

### Using GCP Console:
1. Go to [BigQuery Console](https://console.cloud.google.com/bigquery)
2. Select your project (or create a new one)
3. Click "Create Dataset"
4. Use these settings:
   - **Dataset ID**: `day16_saas_metrics`
   - **Location**: `US` (or your preferred region)
   - **Default table expiration**: Never
5. Click "Create Dataset"

### Using gcloud CLI:
```bash
# Set your project ID
export DAY16_GCP_PROJECT_ID="your-project-id"

# Create dataset
bq mk \
  --dataset \
  --location=US \
  --description="Day 16: SaaS Health Metrics for Metabase Dashboard" \
  ${DAY16_GCP_PROJECT_ID}:day16_saas_metrics
```
---

## Step 2: Upload CSV Files to BigQuery

You have 8 CSV files in `day16/data/` that need to be uploaded.

### Option A: Using GCP Console (Easiest)

For each CSV file, follow these steps:

1. In BigQuery, select your `day16_saas_metrics` dataset
2. Click "Create Table"
3. Configure:
   - **Source**: Upload
   - **Select file**: Choose the CSV file
   - **File format**: CSV
   - **Table name**: Use the filename without `.csv` (e.g., `day06_dashboard_kpis`)
   - **Schema**: Auto-detect
   - **Advanced options** → Header rows to skip: `1`
4. Click "Create Table"

Repeat for all 8 files:
- `day06_dashboard_kpis.csv`
- `day06_mrr_summary.csv`
- `day06_retention_curves.csv`
- `day06_churn_by_cohort.csv`
- `day06_customer_health.csv`
- `day06_customers.csv`
- `day06_subscriptions.csv`
- `day06_mrr_movements.csv`
### Option B: Using bq CLI (Faster for bulk upload)

```bash
# Navigate to the data directory
cd day16/data

# Set your project ID
export DAY16_GCP_PROJECT_ID="your-project-id"

# Upload all tables
bq load --source_format=CSV --autodetect --skip_leading_rows=1 \
  ${DAY16_GCP_PROJECT_ID}:day16_saas_metrics.day06_dashboard_kpis \
  day06_dashboard_kpis.csv

bq load --source_format=CSV --autodetect --skip_leading_rows=1 \
  ${DAY16_GCP_PROJECT_ID}:day16_saas_metrics.day06_mrr_summary \
  day06_mrr_summary.csv

bq load --source_format=CSV --autodetect --skip_leading_rows=1 \
  ${DAY16_GCP_PROJECT_ID}:day16_saas_metrics.day06_retention_curves \
  day06_retention_curves.csv

bq load --source_format=CSV --autodetect --skip_leading_rows=1 \
  ${DAY16_GCP_PROJECT_ID}:day16_saas_metrics.day06_churn_by_cohort \
  day06_churn_by_cohort.csv

bq load --source_format=CSV --autodetect --skip_leading_rows=1 \
  ${DAY16_GCP_PROJECT_ID}:day16_saas_metrics.day06_customer_health \
  day06_customer_health.csv

bq load --source_format=CSV --autodetect --skip_leading_rows=1 \
  ${DAY16_GCP_PROJECT_ID}:day16_saas_metrics.day06_customers \
  day06_customers.csv

bq load --source_format=CSV --autodetect --skip_leading_rows=1 \
  ${DAY16_GCP_PROJECT_ID}:day16_saas_metrics.day06_subscriptions \
  day06_subscriptions.csv

bq load --source_format=CSV --autodetect --skip_leading_rows=1 \
  ${DAY16_GCP_PROJECT_ID}:day16_saas_metrics.day06_mrr_movements \
  day06_mrr_movements.csv
```
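If you'd rather not type eight near-identical commands, a short generator can print them for review (a sketch with names of my own choosing; pipe the output to `sh`, or copy individual lines):

```python
# Generate the eight `bq load` commands for the Day 16 tables (sketch).
DAY16_PROJECT = "your-project-id"  # assumption: replace with your GCP project ID
DAY16_DATASET = "day16_saas_metrics"
DAY16_TABLES = [
    "day06_dashboard_kpis", "day06_mrr_summary", "day06_retention_curves",
    "day06_churn_by_cohort", "day06_customer_health", "day06_customers",
    "day06_subscriptions", "day06_mrr_movements",
]

def day16_bq_load_command(table: str) -> str:
    """Build one `bq load` command string for a given table/CSV pair."""
    return (
        "bq load --source_format=CSV --autodetect --skip_leading_rows=1 "
        f"{DAY16_PROJECT}:{DAY16_DATASET}.{table} {table}.csv"
    )

for table in DAY16_TABLES:
    print(day16_bq_load_command(table))
```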
---

## Step 3: Verify Upload

```bash
# List tables
bq ls ${DAY16_GCP_PROJECT_ID}:day16_saas_metrics

# Check row counts
bq query --use_legacy_sql=false \
"SELECT 'dashboard_kpis' as table_name, COUNT(*) as row_count FROM \`${DAY16_GCP_PROJECT_ID}.day16_saas_metrics.day06_dashboard_kpis\`
UNION ALL
SELECT 'mrr_summary', COUNT(*) FROM \`${DAY16_GCP_PROJECT_ID}.day16_saas_metrics.day06_mrr_summary\`
UNION ALL
SELECT 'retention_curves', COUNT(*) FROM \`${DAY16_GCP_PROJECT_ID}.day16_saas_metrics.day06_retention_curves\`
UNION ALL
SELECT 'churn_by_cohort', COUNT(*) FROM \`${DAY16_GCP_PROJECT_ID}.day16_saas_metrics.day06_churn_by_cohort\`
UNION ALL
SELECT 'customer_health', COUNT(*) FROM \`${DAY16_GCP_PROJECT_ID}.day16_saas_metrics.day06_customer_health\`
UNION ALL
SELECT 'customers', COUNT(*) FROM \`${DAY16_GCP_PROJECT_ID}.day16_saas_metrics.day06_customers\`
UNION ALL
SELECT 'subscriptions', COUNT(*) FROM \`${DAY16_GCP_PROJECT_ID}.day16_saas_metrics.day06_subscriptions\`
UNION ALL
SELECT 'mrr_movements', COUNT(*) FROM \`${DAY16_GCP_PROJECT_ID}.day16_saas_metrics.day06_mrr_movements\`"
```

Expected row counts:
- `dashboard_kpis`: 1
- `mrr_summary`: 24
- `retention_curves`: 299
- `churn_by_cohort`: 52
- `customer_health`: 500
- `customers`: 500
- `subscriptions`: 641
- `mrr_movements`: 24
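You can also sanity-check the local CSVs against these expected counts before uploading, which catches truncated exports early. A stdlib-only sketch (`day16_csv_data_rows` is a name of my own, not from the repo):

```python
# Count data rows in a CSV (header excluded) and compare to expectations.
import csv
import io

DAY16_EXPECTED_ROWS = {
    "day06_dashboard_kpis": 1,
    "day06_mrr_summary": 24,
    "day06_retention_curves": 299,
    "day06_churn_by_cohort": 52,
    "day06_customer_health": 500,
    "day06_customers": 500,
    "day06_subscriptions": 641,
    "day06_mrr_movements": 24,
}

def day16_csv_data_rows(fileobj) -> int:
    """Count data rows, excluding the header line."""
    reader = csv.reader(fileobj)
    next(reader, None)  # skip header
    return sum(1 for _ in reader)

# Against the real exports you would do:
#   with open(f"data/{table}.csv") as f:
#       assert day16_csv_data_rows(f) == DAY16_EXPECTED_ROWS[table]
demo = io.StringIO("month,mrr\n2024-01,100\n2024-02,120\n")
print(day16_csv_data_rows(demo))  # 2
```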
---

## Step 4: Create Service Account for Metabase

Metabase Cloud needs credentials to access your BigQuery data.

### Create Service Account:
```bash
# Set variables
export DAY16_GCP_PROJECT_ID="your-project-id"
export DAY16_SERVICE_ACCOUNT="metabase-day16"

# Create service account
gcloud iam service-accounts create ${DAY16_SERVICE_ACCOUNT} \
  --display-name="Metabase Day 16 Dashboard" \
  --project=${DAY16_GCP_PROJECT_ID}

# Grant BigQuery Data Viewer role
gcloud projects add-iam-policy-binding ${DAY16_GCP_PROJECT_ID} \
  --member="serviceAccount:${DAY16_SERVICE_ACCOUNT}@${DAY16_GCP_PROJECT_ID}.iam.gserviceaccount.com" \
  --role="roles/bigquery.dataViewer"

# Grant BigQuery Job User role (for running queries)
gcloud projects add-iam-policy-binding ${DAY16_GCP_PROJECT_ID} \
  --member="serviceAccount:${DAY16_SERVICE_ACCOUNT}@${DAY16_GCP_PROJECT_ID}.iam.gserviceaccount.com" \
  --role="roles/bigquery.jobUser"

# Create and download key
gcloud iam service-accounts keys create day16_metabase_key.json \
  --iam-account=${DAY16_SERVICE_ACCOUNT}@${DAY16_GCP_PROJECT_ID}.iam.gserviceaccount.com
```

**Save the `day16_metabase_key.json` file** - you'll need it for the Metabase Cloud connection.

---
## Step 5: Connect Metabase Cloud to BigQuery

1. Go to [Metabase Cloud](https://www.metabase.com/start/)
2. Sign up or log in
3. Click "Add a database"
4. Select "BigQuery"
5. Configure:
   - **Display Name**: Day 16 - SaaS Health Metrics
   - **Project ID**: Your GCP project ID
   - **Dataset ID**: `day16_saas_metrics`
   - **Service Account JSON**: Upload `day16_metabase_key.json`
6. Click "Save"
7. Click "Sync database schema now"

---

## Next Steps

Once connected, you're ready to create dashboard cards using the SQL queries in:
- `day16_QUERIES_metabase.md`

---

## Troubleshooting

### "Permission denied" error:
- Verify the service account has both the `bigquery.dataViewer` and `bigquery.jobUser` roles
- Check that the service account JSON key is valid

### "Dataset not found":
- Ensure the dataset ID is exactly `day16_saas_metrics`
- Verify the dataset is in the same project as your service account

### "Table not found":
- Run the verification query in Step 3 to confirm all tables uploaded successfully
- Check that table names match exactly (case-sensitive)

---

## Cost Considerations

- **Storage**: ~1 MB total (negligible cost)
- **Queries**: Metabase preview queries typically scan <10 MB
- **Expected monthly cost**: <$1 USD (likely within the free tier)

---

## Security Notes

- ⚠️ **DO NOT commit `day16_metabase_key.json` to git**
- Add `*.json` to the day16 `.gitignore`
- The service account has read-only access (dataViewer role only)
- Consider setting up BigQuery authorized views for production

---

Built for Christmas Data Advent 2025 - Day 16 (Project 4A)

day16/day16_DATA_export_to_csv.py

Lines changed: 77 additions & 0 deletions
```python
"""
Day 16: Export SaaS metrics from SQLite to CSV for BigQuery upload
Exports all tables from Day 6 database to CSV format
"""

import os
import sqlite3
from pathlib import Path

import pandas as pd

# Configuration
DAY16_SOURCE_DB = "../day06/data/day06_saas_metrics.db"
DAY16_OUTPUT_DIR = "./data"

# Tables to export
DAY16_TABLES = [
    "day06_dashboard_kpis",
    "day06_mrr_summary",
    "day06_retention_curves",
    "day06_churn_by_cohort",
    "day06_customer_health",
    "day06_customers",
    "day06_subscriptions",
    "day06_mrr_movements",
]


def day16_export_table_to_csv(db_path: str, table_name: str, output_dir: str):
    """Export a single table to CSV"""
    try:
        conn = sqlite3.connect(db_path)
        df = pd.read_sql_query(f"SELECT * FROM {table_name}", conn)
        conn.close()

        # Save to CSV
        output_path = os.path.join(output_dir, f"{table_name}.csv")
        df.to_csv(output_path, index=False)

        print(f"✅ Exported {table_name}: {len(df)} rows → {output_path}")
        return True

    except Exception as e:
        print(f"❌ Error exporting {table_name}: {e}")
        return False


def day16_main():
    """Export all tables from Day 6 database"""
    print("=" * 60)
    print("Day 16: Exporting SaaS Metrics to CSV for BigQuery")
    print("=" * 60)

    # Create output directory
    Path(DAY16_OUTPUT_DIR).mkdir(parents=True, exist_ok=True)

    # Check if source database exists
    if not os.path.exists(DAY16_SOURCE_DB):
        print(f"❌ Error: Database not found at {DAY16_SOURCE_DB}")
        return

    print(f"\n📂 Source: {DAY16_SOURCE_DB}")
    print(f"📂 Output: {DAY16_OUTPUT_DIR}\n")

    # Export each table
    success_count = 0
    for table in DAY16_TABLES:
        if day16_export_table_to_csv(DAY16_SOURCE_DB, table, DAY16_OUTPUT_DIR):
            success_count += 1

    print(f"\n{'=' * 60}")
    print(f"✅ Export complete: {success_count}/{len(DAY16_TABLES)} tables exported")
    print(f"{'=' * 60}")
    print("\nNext steps:")
    print("1. Upload CSV files to BigQuery")
    print("2. Use the SQL queries in day16_QUERIES_metabase.md")
    print("3. Connect Metabase Cloud to BigQuery")


if __name__ == "__main__":
    day16_main()
```
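The export logic above can be smoke-tested without the real Day 6 database: build a throwaway SQLite file in a temp directory and round-trip one table to CSV. The sketch below mirrors the same SELECT-then-write flow with the stdlib only (it is my pandas-free variant, not the repo's own test):

```python
# Smoke-test sketch: round-trip one SQLite table to CSV and count lines.
import csv
import os
import sqlite3
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Build a tiny stand-in for day06_saas_metrics.db
    db_path = os.path.join(tmp, "day06_saas_metrics.db")
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE day06_customers (id INTEGER, plan TEXT)")
    conn.executemany(
        "INSERT INTO day06_customers VALUES (?, ?)",
        [(1, "starter"), (2, "pro")],
    )
    conn.commit()
    conn.close()

    # Same flow as day16_export_table_to_csv, without pandas
    conn = sqlite3.connect(db_path)
    rows = conn.execute("SELECT * FROM day06_customers").fetchall()
    conn.close()

    out_path = os.path.join(tmp, "day06_customers.csv")
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "plan"])  # header, as to_csv(index=False) would
        writer.writerows(rows)

    with open(out_path) as f:
        n = sum(1 for _ in f)
    print(n)  # 3: one header line plus two data rows
```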
