Data Overview
Explore the diabetes dataset powering our 30-day readmission predictions.
Dataset Snapshot
| Feature | Type | Missing | Unique |
|---|---|---|---|
| Age | Categorical | 0% | 10 |
| Race | Categorical | 2% | 8 |
| Diagnoses | Integer | 0% | 16 |
| Medications | Integer | 0% | 25 |
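As a rough illustration of how the Missing and Unique columns above could be derived, here is a minimal sketch. The function name `profile_column` and the set of missing-value markers (including `"?"`) are assumptions for this example, not details confirmed by the project; whether the published counts treat missing values the same way is not stated.

```python
def profile_column(values, missing_markers=(None, "", "?")):
    """Return the missing percentage and distinct-value count for one column.

    missing_markers is an assumed convention; adjust it to match the
    actual encoding of missing values in the extract.
    """
    total = len(values)
    present = [v for v in values if v not in missing_markers]
    missing_pct = 100 * (total - len(present)) / total if total else 0.0
    return {"missing_pct": missing_pct, "unique": len(set(present))}


# Toy column with made-up values, for illustration only.
race = ["Caucasian", "?", "AfricanAmerican", "Caucasian"]
print(profile_column(race))
```

Missing entries are excluded before counting distinct values, so a placeholder such as `"?"` does not inflate the Unique figure.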
Data Quality Report
- ✓ 101,766 patient encounters across 130 U.S. hospitals
- ✓ 50+ clinical and administrative features
- ! 2.3% missing values handled via KNN imputation
- ✓ 11% readmission rate within 30 days
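To make the KNN imputation step concrete, the sketch below fills a missing numeric value with the mean of that column over the k nearest rows. It is a simplified, standard-library illustration under stated assumptions: the function name `knn_impute`, the distance definition, and the toy data are all hypothetical, and categorical fields such as Race would first need encoding or a majority vote instead of a mean. The project's actual implementation is not described in the text.

```python
import math


def knn_impute(rows, k=3):
    """Fill None entries with the column mean over the k nearest rows.

    Distance is the root-mean-square difference over the columns that
    both rows have observed. Rows with no shared columns are ignored.
    """
    def distance(a, b):
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    imputed = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Donors are other rows that have column j observed.
            donors = [
                (distance(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            ]
            donors = [d for d in donors if d[0] != math.inf]
            donors.sort(key=lambda d: d[0])
            nearest = [v for _, v in donors[:k]]
            if nearest:
                imputed[i][j] = sum(nearest) / len(nearest)
    return imputed


# Toy numeric data (made up): the third row is missing its second value.
rows = [[1.0, 10.0], [2.0, 20.0], [3.0, None], [10.0, 100.0]]
filled = knn_impute(rows, k=2)
```

Distances are computed against the original rows, so one imputed value never influences another.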
Raw Data Capture
Original EHR extracts before cleaning and transformation.
Data Dictionary
Comprehensive schema and definitions for every field.
Analytics Pipeline
Automated ETL ensuring daily freshness and quality checks.
Model Performance
Benchmarked against industry standards with explainable insights.
ROC & Precision-Recall Curves
Calibration Plot
Confusion Matrix
Model Comparison
XGBoost outperformed Random Forest and Logistic Regression across all metrics.
Real-time Monitoring
Performance drift alerts trigger when AUC drops below 0.80.
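The alert rule above can be sketched as a check on AUC computed over a recent window of labelled predictions. This is a minimal illustration: `roc_auc` and `drift_alert` are hypothetical names, the window contents are made up, and only the 0.80 threshold comes from the text.

```python
AUC_ALERT_THRESHOLD = 0.80  # threshold stated in the text


def roc_auc(labels, scores):
    """AUC as the chance a random positive outranks a random negative.

    Ties count as half a win. This pairwise form is O(n^2), which is
    fine for a sketch but slow for large monitoring windows.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def drift_alert(labels, scores, threshold=AUC_ALERT_THRESHOLD):
    """Return (alert, auc); alert is True when AUC falls below the threshold."""
    auc = roc_auc(labels, scores)
    return auc < threshold, auc


# Made-up window: one positive is scored below one negative.
alert, auc = drift_alert([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

Because readmission outcomes are only known after 30 days, a check like this would necessarily run on a delayed window of encounters.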
Feature Stability
Top 10 features remain consistent across validation windows.
Risk Stratification Tools
Identify high-risk patients in real time with explainable risk scores.
Patient Risk Calculator
Risk Buckets & Interventions
Real-time Alerts
Instant notifications when patients cross risk thresholds.
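One way to read "cross risk thresholds" is that an alert fires only on the transition from below to at-or-above a threshold, not on every update while a patient stays high. The sketch below assumes that reading; the function names and the 0.5 default threshold are hypothetical, since the text does not give the actual cut-offs.

```python
def crossed_threshold(previous_score, current_score, threshold):
    """True only when the score moves from below the threshold to at or above it."""
    return previous_score < threshold <= current_score


def alerts_for(updates, last_scores, threshold=0.5):
    """Return patient ids whose score just crossed the threshold.

    updates: iterable of (patient_id, score) in arrival order.
    last_scores: dict of the most recent score per patient, updated in place.
    The 0.5 default is an illustrative value, not the project's setting.
    """
    alerts = []
    for patient_id, score in updates:
        previous = last_scores.get(patient_id, 0.0)
        if crossed_threshold(previous, score, threshold):
            alerts.append(patient_id)
        last_scores[patient_id] = score
    return alerts


state = {}
fired = alerts_for([("a", 0.3), ("b", 0.6), ("a", 0.55), ("b", 0.7)], state)
```

Tracking the previous score per patient keeps a patient who remains above the threshold from generating repeated notifications.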
Telemedicine Integration
Risk scores embedded directly into virtual care platforms.
Interactive Workshops
Train staff to interpret risk scores and take action.
Interactive Dashboard
Actionable insights in real time for clinicians and administrators.
Live KPI Tiles
Trend Analysis
Top Risk Factors
Power BI Integration
Native embedding with row-level security and scheduled refresh.
Drill-through Actions
Click any metric to explore underlying patient-level details.
Mobile Optimized
Full functionality on tablets and phones for bedside access.
Technical Details
Deep dive into architecture, dependencies, and reproducibility.
Tech Stack
Model Pipeline
1. Data Ingestion: Daily ETL from EHR via HL7 FHIR APIs
2. Pre-processing: Imputation, encoding, outlier handling
3. Feature Engineering: LOSRatio, medCount, diagGroup
4. Inference: XGBoost predicts readmission probability
5. Post-processing: Calibration, risk stratification, SHAP
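The feature engineering step names three derived features without defining them. The sketch below shows one plausible reading, and every definition in it is an assumption: LOSRatio as length of stay divided by a reference mean, medCount as the number of listed medications, and diagGroup as a coarse grouping of the primary ICD-9 code. The field names (`time_in_hospital`, `medications`, `diag_1`), the grouping ranges, and the reference mean are hypothetical.

```python
def diag_group(icd9_code):
    """Map an ICD-9 code string to a coarse group (illustrative grouping only)."""
    if icd9_code is None or icd9_code == "":
        return "Missing"
    try:
        value = float(icd9_code)
    except ValueError:
        # Non-numeric codes (for example V or E codes) fall through here.
        return "Other"
    if 250 <= value < 251:
        return "Diabetes"
    if 390 <= value <= 459 or value == 785:
        return "Circulatory"
    if 460 <= value <= 519 or value == 786:
        return "Respiratory"
    return "Other"


def engineer_features(encounter, mean_los):
    """Derive the three features from one encounter dict (assumed schema)."""
    return {
        "LOSRatio": encounter["time_in_hospital"] / mean_los,
        "medCount": len(encounter["medications"]),
        "diagGroup": diag_group(encounter["diag_1"]),
    }


# Made-up encounter and reference mean, for illustration only.
encounter = {"time_in_hospital": 6, "medications": ["insulin", "metformin"], "diag_1": "250.83"}
features = engineer_features(encounter, mean_los=4.0)
```

If a reference mean like `mean_los` is used, computing it on the training split only would avoid leaking validation or test information into the feature.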
Reproducibility Checklist
- ✓ requirements.txt pinned to minor versions
- ✓ Random seeds fixed across train/val/test splits
- ✓ Docker image tagged with Git commit SHA
- ✓ MLflow tracking for every experiment run
- ✓ Data snapshot hashes stored in DVC
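The fixed-seed item in the checklist can be illustrated with a deterministic split. This is a sketch under assumptions: the function name `split_ids`, the seed value 42, and the 70/15/15 proportions are hypothetical and not taken from the project.

```python
import random


def split_ids(ids, seed=42, train=0.7, val=0.15):
    """Deterministically split ids into (train, val, test) lists.

    Sorting first makes the result independent of input order, and a
    local random.Random keeps the global RNG state untouched.
    """
    ordered = sorted(ids)
    rng = random.Random(seed)
    rng.shuffle(ordered)
    n_train = int(len(ordered) * train)
    n_val = int(len(ordered) * val)
    return (
        ordered[:n_train],
        ordered[n_train:n_train + n_val],
        ordered[n_train + n_val:],
    )


train_ids, val_ids, test_ids = split_ids(range(100))
```

Calling the function twice with the same seed returns the same three partitions, which is what makes a rerun comparable to a run logged earlier.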
Latency Budget
P99 inference latency under 200 ms on an 8-vCPU container.
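A P99 budget means that 99% of requests must complete within the limit. The sketch below checks a batch of measured latencies against the 200 ms figure using the nearest-rank percentile definition; the function names and sample values are hypothetical, and other percentile definitions (for example, interpolated ones) can give slightly different results.

```python
import math


def percentile_nearest_rank(samples, pct):
    """Nearest-rank percentile: the smallest value with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def within_budget(latencies_ms, budget_ms=200.0):
    """True when the P99 latency is strictly below the budget."""
    return percentile_nearest_rank(latencies_ms, 99) < budget_ms


# Made-up measurements: one slow request in 100 still meets the budget.
ok = within_budget([120.0] * 99 + [250.0])
# Two slow requests in 100 push P99 over the budget.
not_ok = within_budget([120.0] * 98 + [250.0, 250.0])
```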
CI/CD Pipeline
GitHub Actions runs unit, integration, and load tests on every PR.
Monitoring
Prometheus metrics: latency, errors, drift, business KPIs.
Deployment
Zero-downtime rollout with automated rollback and monitoring.
Architecture Overview
CI/CD Pipeline
Security Controls
- 🔒 Network isolation via Azure VNet and NSGs
- 🔒 Managed Identity for keyless authentication
- 🔒 Data encrypted at rest with customer-managed keys (CMK) and in transit with TLS 1.3
- 🔒 Weekly vulnerability scans and dependency updates
Canary Releases
Gradual rollout to 5%, 25%, then 100% of traffic, with automatic rollback if the error rate exceeds 1%.
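The staged rollout logic can be sketched as a loop over the stages with a rollback check at each one. The stage percentages and the 1% limit come from the text; the function `run_canary` and the `observe_error_rate` callback are hypothetical stand-ins for whatever the deployment tooling actually provides.

```python
STAGES = (5, 25, 100)   # rollout stages from the text, in percent
MAX_ERROR_RATE = 0.01   # rollback trigger from the text (1%)


def run_canary(observe_error_rate, stages=STAGES, max_error_rate=MAX_ERROR_RATE):
    """Advance through the stages, rolling back on the first bad one.

    observe_error_rate(percent) is a hypothetical callback returning the
    error rate (0.0 to 1.0) measured at that stage.
    Returns (status, percent), where percent is the last stage attempted.
    """
    for percent in stages:
        error_rate = observe_error_rate(percent)
        if error_rate > max_error_rate:
            return ("rolled_back", percent)
    return ("promoted", stages[-1])


# Made-up scenario: errors spike once the release reaches the 25% stage.
result = run_canary(lambda percent: 0.03 if percent == 25 else 0.001)
```

In this scenario the loop stops at the 25% stage and never reaches 100%, which limits how many requests see the faulty release.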
Feature Flags
Toggle model versions and UI components without redeploying code.
SLA & Uptime
99.9% monthly uptime with 15-minute RTO and 5-minute RPO targets.
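To put the uptime target in concrete terms, the arithmetic below converts a monthly uptime percentage into a downtime allowance. The helper name and the 30-day month are assumptions for illustration; the actual SLA measurement window may be defined differently.

```python
def allowed_downtime_minutes(uptime_percent, days_in_month=30):
    """Minutes of downtime permitted per month at the given uptime target."""
    total_minutes = days_in_month * 24 * 60
    return total_minutes * (100 - uptime_percent) / 100


# For a 30-day month, 99.9% uptime leaves roughly 43.2 minutes of downtime.
budget = allowed_downtime_minutes(99.9)
print(round(budget, 1))
```

With a 15-minute RTO, that allowance covers at most two full-length recoveries in a month before the target is missed.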