Data Overview

Explore the diabetes dataset powering our 30-day readmission predictions.

Dataset Snapshot

Feature      Type         Missing  Unique
Age          Categorical  0%       10
Race         Categorical  2%       8
Diagnoses    Integer      0%       16
Medications  Integer      0%       25

Data Quality Report

  • 101,766 patient encounters across 130 U.S. hospitals
  • 50+ clinical and administrative features
  • 2.3% missing values handled via KNN imputation
  • 11% readmission rate within 30 days
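
The page names KNN imputation but not its settings. As a rough illustration only, the sketch below fills each missing numeric value with the mean of that column over the k nearest rows. The function name `knn_impute`, the choice of k, and the root-mean-square distance over shared columns are all assumptions, not the project's actual configuration.

```python
import math


def knn_impute(rows, k=2):
    """Fill None entries with the column mean over the k nearest rows.

    Assumption: distance is the root-mean-square difference over the
    columns that both rows have observed. Hypothetical helper, not the
    project's implementation.
    """
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for other_idx, other in enumerate(rows):
                # Skip the row itself and rows that also lack column j.
                if other_idx == i or other[j] is None:
                    continue
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                if not shared:
                    continue
                dist = math.sqrt(
                    sum((a - b) ** 2 for a, b in shared) / len(shared)
                )
                candidates.append((dist, other[j]))
            candidates.sort(key=lambda c: c[0])
            nearest = [v for _, v in candidates[:k]]
            if nearest:
                filled[i][j] = sum(nearest) / len(nearest)
    return filled


# Toy data: the third row is missing its second column.
rows = [[1.0, 10.0], [2.0, 20.0], [1.1, None], [9.0, 90.0]]
imputed = knn_impute(rows, k=2)
```

In practice the features would be scaled and encoded before distances are computed; the toy rows here are already numeric.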

Raw Data Capture

Original EHR extracts before cleaning and transformation.

Data Dictionary

Comprehensive schema and definitions for every field.

Analytics Pipeline

Automated ETL ensuring daily freshness and quality checks.

Model Performance

Benchmarked against industry standards with explainable insights.

ROC & Precision-Recall Curves

AUC = 0.84
Validation ROC
Sensitivity 0.79
Specificity 0.82

Calibration Plot

Brier = 0.12
Well-calibrated probabilities
Predicted probabilities align closely with observed readmission rates across risk deciles.
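
For readers unfamiliar with the two calibration checks mentioned here, the following is a minimal sketch of how a Brier score and a per-bin calibration table can be computed. The function names and the equal-size binning are illustrative assumptions; the page does not say how its own figures were produced.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def calibration_table(y_true, y_prob, n_bins=10):
    """Return (mean predicted probability, observed rate) per risk bin.

    Assumption: bins are equal-size groups after sorting by predicted
    probability, so n_bins=10 gives risk deciles.
    """
    pairs = sorted(zip(y_prob, y_true))
    n = len(pairs)
    table = []
    for b in range(n_bins):
        chunk = pairs[b * n // n_bins:(b + 1) * n // n_bins]
        if not chunk:
            continue
        mean_pred = sum(p for p, _ in chunk) / len(chunk)
        observed = sum(y for _, y in chunk) / len(chunk)
        table.append((mean_pred, observed))
    return table


# Toy example with two bins instead of ten.
toy_table = calibration_table([0, 1, 0, 1], [0.1, 0.9, 0.2, 0.8], n_bins=2)
```

A well-calibrated model shows the two values in each bin close to each other, and a lower Brier score is better.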

Confusion Matrix

True Negatives 8,432
False Positives 1,218
False Negatives 987
True Positives 2,765
Precision 0.69
Recall 0.74
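
The precision and recall shown follow directly from the four confusion-matrix counts. A short worked check, using only the numbers on this page (the function name is illustrative):

```python
def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall


# Counts taken from the confusion matrix above.
tn, fp, fn, tp = 8432, 1218, 987, 2765
precision, recall = precision_recall(tp, fp, fn)
print(round(precision, 2), round(recall, 2))  # 0.69 0.74
```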

Model Comparison

XGBoost outperformed Random Forest and Logistic Regression across all metrics.

Real-time Monitoring

Performance drift alerts trigger when AUC drops below 0.80.
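
A drift check of this kind can be sketched as follows, assuming AUC is recomputed on a recent window of labeled outcomes. The pairwise AUC below is a simple O(n²) illustration, and the names `roc_auc` and `drift_alert` are hypothetical rather than part of the project's monitoring code.

```python
AUC_ALERT_THRESHOLD = 0.80  # threshold stated on this page


def roc_auc(y_true, y_score):
    """AUC as the chance a random positive outranks a random negative.

    Ties count as half a win. Quadratic in sample size; fine for a sketch.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def drift_alert(y_true, y_score, threshold=AUC_ALERT_THRESHOLD):
    """Return True when the window's AUC falls below the alert threshold."""
    return roc_auc(y_true, y_score) < threshold


# Toy window: one positive is ranked below one negative, so AUC is 0.75.
window_auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
alert = drift_alert([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```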

Feature Stability

Top 10 features remain consistent across validation windows.

Risk Stratification Tools

Identify high-risk patients in real time with explainable risk scores.

Patient Risk Calculator

Predicted Risk 68.3% (High Risk)
Top risk factors: Age 70-80, LOS >3 days, 7+ diagnoses

Risk Buckets & Interventions

Low Risk (<30% readmission probability): Standard care
Moderate Risk (30-60% readmission probability): Discharge call
High Risk (>60% readmission probability): Care coordination
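
The bucket rules above map naturally onto a small lookup function. This is a sketch only: the function name is hypothetical, and treating exactly 30% and exactly 60% as Moderate Risk is an assumption, since the page does not say which bucket owns the boundaries.

```python
def risk_bucket(probability):
    """Map a readmission probability (0.0 to 1.0) to (bucket, intervention).

    Assumption: both boundary values, 0.30 and 0.60, count as Moderate Risk.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if probability < 0.30:
        return ("Low Risk", "Standard care")
    if probability <= 0.60:
        return ("Moderate Risk", "Discharge call")
    return ("High Risk", "Care coordination")


# The calculator example on this page: 68.3% predicted risk.
print(risk_bucket(0.683))  # ('High Risk', 'Care coordination')
```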

Real-time Alerts

Instant notifications when patients cross risk thresholds.

Telemedicine Integration

Risk scores embedded directly into virtual care platforms.

Interactive Workshops

Train staff to interpret risk scores and take action.

Interactive Dashboard

Actionable insights in real time for clinicians and administrators.

Live KPI Tiles

Readmission Rate 12.4%
Total Discharges 847
High-Risk Patients 126
Est. Savings .2M

Trend Analysis

-2.3%
Readmission reduction vs last month
30-day avg 11.8%
Target 10.5%
Gap -1.3pp

Top Risk Factors

Length of Stay >5 days 78%
Age 70-80 65%
8+ Diagnoses 58%
Medications >12 52%

Power BI Integration

Native embedding with row-level security and scheduled refresh.

Drill-through Actions

Click any metric to explore underlying patient-level details.

Mobile Optimized

Full functionality on tablets and phones for bedside access.

Technical Details

Deep dive into architecture, dependencies, and reproducibility.

Tech Stack

Languages: Python 3.10, SQL, R
Core Libraries: scikit-learn, pandas, XGBoost
Visualization: matplotlib, seaborn, Power BI
Data Storage: PostgreSQL, Parquet, CSV
Orchestration: Apache Airflow, cron
Deployment: Docker, Azure Container Apps

Model Pipeline

  1. Data Ingestion: Daily ETL from EHR via HL7 FHIR APIs
  2. Pre-processing: Imputation, encoding, outlier handling
  3. Feature Engineering: LOSRatio, medCount, diagGroup
  4. Inference: XGBoost predicts readmission probability
  5. Post-processing: Calibration, risk stratification, SHAP

Reproducibility Checklist

  • requirements.txt pinned to minor versions
  • Random seeds fixed across train/val/test splits
  • Docker image tagged with Git commit SHA
  • MLflow tracking for every experiment run
  • Data snapshot hashes stored in DVC
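
As an illustration of the fixed-seed item in the checklist, the sketch below produces the same train/val/test partition on every run. The function name, the seed value, and the 70/15/15 proportions are assumptions chosen for the example; the page does not state the project's actual split.

```python
import random


def split_indices(n, seed=42, val_frac=0.15, test_frac=0.15):
    """Deterministically split range(n) into train, val, and test indices.

    A dedicated random.Random(seed) instance keeps the shuffle independent
    of any other random state in the process.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = int(n * test_frac)
    n_val = int(n * val_frac)
    test = idx[:n_test]
    val = idx[n_test:n_test + n_val]
    train = idx[n_test + n_val:]
    return train, val, test


# Two calls with the same seed yield identical partitions.
first = split_indices(100, seed=7)
second = split_indices(100, seed=7)
```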

Latency Budget

P99 inference <200 ms on 8 vCPU container.
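
One way to check a budget like this in a load test is to compute the 99th percentile of recorded latencies and compare it to the limit. The sketch below uses the nearest-rank percentile definition; that choice and the function names are assumptions, not the project's test harness.

```python
import math


def percentile(samples, q):
    """Nearest-rank percentile: smallest value with at least q% at or below."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def within_budget(latencies_ms, budget_ms=200.0):
    """True when P99 latency is strictly below the budget."""
    return percentile(latencies_ms, 99) < budget_ms


# 100 requests with a single slow outlier still meet a 200 ms P99 budget.
ok = within_budget([50.0] * 99 + [500.0])
# Two slow requests out of 100 push P99 over the budget.
too_slow = within_budget([50.0] * 98 + [500.0, 500.0])
```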

CI/CD Pipeline

GitHub Actions runs unit, integration, and load tests on every PR.

Monitoring

Prometheus metrics: latency, errors, drift, business KPIs.

Deployment

Zero-downtime rollout with automated rollback and monitoring.

Architecture Overview

Containerized Microservice: Docker image <400 MB, based on python:3.10-slim
Orchestration: Azure Container Apps with KEDA autoscaling 2-20 replicas
API Gateway: Azure API Management with rate limiting and JWT auth
Data Layer: PostgreSQL with pgvector for similarity search

CI/CD Pipeline

Build 2-3 min
Test 4-6 min
Deploy 1-2 min
GitHub Actions triggers on push to main; blue-green deployment with automated rollback on health-check failure.

Security Controls

  • 🔒 Network isolation via Azure VNet and NSGs
  • 🔒 Managed Identity for keyless authentication
  • 🔒 Data encrypted at rest with CMK and TLS 1.3 in transit
  • 🔒 Weekly vulnerability scans and dependency updates

Canary Releases

Gradual rollout to 5%, 25%, 100% with automatic rollback on error rate >1%.
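
The rollout rule can be sketched as a loop over traffic stages that stops at the first stage whose error rate exceeds the limit. This is a simplified model: `run_canary` is a hypothetical name, and the error rates are passed in as a dictionary rather than read from a real metrics backend.

```python
CANARY_STAGES = (5, 25, 100)  # traffic percentages from this page
MAX_ERROR_RATE = 0.01         # rollback when error rate exceeds 1%


def run_canary(observed_error_rates, stages=CANARY_STAGES,
               max_error_rate=MAX_ERROR_RATE):
    """Walk through rollout stages and report the outcome.

    observed_error_rates maps each stage percentage to the error rate
    measured at that stage (assumed to be supplied by monitoring).
    Returns ("rolled_back", stage) or ("promoted", final_stage).
    """
    for pct in stages:
        if observed_error_rates[pct] > max_error_rate:
            return ("rolled_back", pct)
    return ("promoted", stages[-1])


healthy = run_canary({5: 0.002, 25: 0.004, 100: 0.003})
failing = run_canary({5: 0.002, 25: 0.02, 100: 0.0})
```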

Feature Flags

Toggle model versions and UI components without redeploying code.

SLA & Uptime

99.9% monthly uptime with 15-minute RTO and 5-minute RPO targets.
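
To put the uptime target in concrete terms, the arithmetic below converts it into an allowed-downtime budget. It assumes a 30-day month; the function names are illustrative.

```python
def allowed_downtime_minutes(uptime_pct, days=30):
    """Minutes of downtime permitted per period at a given uptime target."""
    return days * 24 * 60 * (1 - uptime_pct / 100)


def full_rto_outages_within_budget(uptime_pct, rto_minutes, days=30):
    """How many outages fit in the budget if each takes the full RTO."""
    return int(allowed_downtime_minutes(uptime_pct, days) // rto_minutes)


budget = allowed_downtime_minutes(99.9)              # about 43.2 minutes
outages = full_rto_outages_within_budget(99.9, 15)   # 2
```

Under these assumptions, 99.9% uptime leaves roughly 43 minutes of downtime per month, which covers two outages that each use the full 15-minute RTO.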

Docs

Documentation Resources

README.md
Complete project overview with setup instructions
View on GitHub →
API Documentation
RESTful API endpoints and usage examples
Model Report
Detailed model performance and validation results
© 2024 Readmission Risk Model. All rights reserved.