2026 CLINICAL CHALLENGES
Projects should align with the aim under the case title, but may not need to address all aspects of the clinical vignette.
{LucyRx: Comprehensive, Real-time Pharmacy Benefit Management}
Aim: Improve patient access, cost-effectiveness, and adherence through smarter pharmacy benefits management.
-
Maria Lopez is a 54-year-old woman with type 2 diabetes, hypertension, and hyperlipidemia. She manages multiple medications prescribed by different specialists. Recently, Maria has experienced difficulty filling her prescriptions due to complex pharmacy benefit restrictions. Her insurer’s formulary often requires prior authorizations, step therapy, or switching to preferred alternatives that her providers are unfamiliar with, resulting in delayed treatment and confusion during clinic visits.
Maria’s blood glucose and blood pressure have become more volatile over the past six months, and she reports skipping doses because the cost of her insulin co-pays and antihypertensive agents has increased unexpectedly after benefits changes. When she attempts to contact her insurer for coverage clarification, hold times are long and responses inconsistent, leaving Maria uncertain which medications are covered or what therapeutic alternatives her plan will approve.
Her care team spends significant time during visits reconciling benefit restrictions, seeking formulary alternatives, and coordinating prior authorizations via phone and fax. This administrative burden reduces time available for clinical assessment and contributes to delays in therapy adjustments.
-
The ideal solution would integrate real-time pharmacy benefit information—including formulary status, prior authorization requirements, and out-of-pocket cost estimates—into clinicians’ workflows and patient communication platforms. It would proactively notify providers and patients about upcoming changes to coverage, suggest therapeutically appropriate alternatives that align with benefits, and automate prior authorization submissions where possible. The goal is to ensure patients like Maria receive the right medications at the right time, reduce medication non-adherence due to cost or coverage barriers, and decrease administrative workload for care teams.
This challenge invites innovative approaches that combine clinical data, benefit design intelligence, and patient engagement tools to optimize pharmacy benefits, improve outcomes, and reduce total cost of care.
Formulary guidance: https://www.cms.gov/medicare/coverage/prescription-drug-coverage/formulary-guidance
https://meps.ahrq.gov/mepsweb/data_stats/download_data_files_detail.jsp?cboPufNumber=HC-248A
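The "suggest therapeutically appropriate alternatives that align with benefits" step described above can be pictured as a lookup against a formulary table. The sketch below is illustrative only: the `FormularyEntry` fields, tier numbers, drug names, and copay amounts are hypothetical placeholders, not taken from any real plan or from the linked datasets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FormularyEntry:
    """One row of a hypothetical plan formulary (all fields are assumptions)."""
    drug: str
    therapeutic_class: str
    tier: int                  # lower tier = more preferred by the plan
    prior_auth_required: bool
    est_copay: float           # estimated out-of-pocket cost per fill


def covered_alternatives(prescribed, formulary):
    """Return same-class drugs that avoid prior authorization, cheapest first."""
    by_name = {entry.drug: entry for entry in formulary}
    target = by_name[prescribed]
    candidates = [
        entry for entry in formulary
        if entry.therapeutic_class == target.therapeutic_class
        and entry.drug != prescribed
        and not entry.prior_auth_required
    ]
    return sorted(candidates, key=lambda e: (e.est_copay, e.tier))


# Placeholder data for illustration only.
FORMULARY = [
    FormularyEntry("drug_a", "basal_insulin", 3, True, 95.0),
    FormularyEntry("drug_b", "basal_insulin", 1, False, 35.0),
    FormularyEntry("drug_c", "basal_insulin", 2, False, 50.0),
    FormularyEntry("drug_d", "ace_inhibitor", 1, False, 10.0),
]

for alt in covered_alternatives("drug_a", FORMULARY):
    print(alt.drug, alt.tier, alt.est_copay)
```

Sharing a therapeutic class does not by itself make two drugs interchangeable, so a list like this would be a starting point for clinician review rather than an automatic substitution.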
{Voice and Visual Memory Games with Clinical Insight}
Aim: Gamify the neuropsychological evaluation to improve cognitive health tracking
-
Eleanor is a 74-year-old retired schoolteacher living independently in a small apartment. Her daughter, who lives two hours away, has noticed that Eleanor occasionally repeats herself during phone calls and recently forgot a scheduled family dinner. Eleanor herself jokes that her memory "isn't what it used to be," but she brushes off her daughter's concern; after all, she passed her last annual physical without issue. Eleanor's primary care physician suspects mild cognitive changes but cannot justify a full neuropsychological evaluation based on a single office visit. The standard tools available, a 10-minute clock-drawing test or a brief recall questionnaire, feel blunt and incomplete. They capture only a snapshot, not a story. What Eleanor's care team really needs is a window into her cognitive health over time: How has her recall changed month to month? Are her response times slowing? Is she compensating for memory gaps in ways a brief office test would never catch? Eleanor, for her part, is sharp enough to feel the stigma of memory testing and resistant to anything that feels clinical or condescending. She is, however, an avid smartphone user who plays word puzzles every morning with her coffee.
-
Her physician wonders whether an AI-powered tool could meet her where she already is: something that feels like a daily game rather than a medical exam, uses her voice and the familiar faces and stories in her own life, and quietly tracks the patterns that matter most to a clinician. The ideal solution would give Eleanor something engaging to look forward to each day, give her daughter peace of mind, and give her physician the longitudinal signal needed to act early, before the window for meaningful intervention closes.
Dataset: https://github.com/KarolChlasta/ADReSS-Challenge2020
{Multimodal Home Health Intelligence Agent}
Aim: Develop an integrative home monitoring system for at-home recovery
-
Mr. James Okafor spent years coaching his granddaughter's soccer team, walking the National Mall on weekends, and priding himself on staying active well into his late sixties. When chronic knee pain finally became too much to ignore, he opted for a total knee replacement, hopeful that surgery would give him his mobility back. He came home two days after the operation with a wearable sensor on his leg, a home exercise program printed on two sheets of paper, and instructions to check in through an app when his pain felt unmanageable. His physical therapist visited twice a week, and his surgeon expected to see him at two weeks and again at six. In between, James did his best: some days pushing through the discomfort, other days barely making it down the hallway. No one knew that his step count had been quietly falling for nearly a week, that he had started favoring his unoperated leg in a way that was putting new strain on his hip, or that he had stopped logging his pain because it felt pointless. By the time he sat back down in his surgeon's office at six weeks, what had started as a manageable rough patch had hardened into a setback that would take months to undo.
-
James’s care team believes it should never have gotten that far. They envision an AI-powered home monitoring system that weaves together James's wearable gait data, motion sensor patterns, medication and pain logs, and brief daily voice check-ins into a living picture of how his recovery is actually going, not just how it looks during a 20-minute clinic visit. They want the system to catch the moments that matter: the three-day dip in activity that signals pain or fear, the subtle limp that suggests compensation before it becomes a habit, the missed exercise that hints at discouragement rather than laziness. When something changes, they want a clear, human-readable explanation delivered to his physical therapist or surgeon in time to actually do something about it, whether that means a quick message, an earlier appointment, or a reassuring call. Most importantly, they hope for a system that keeps patients like James from falling through the cracks between visits, turning the quiet weeks at home from a blind spot into the most important window in the entire recovery process.
Datasets: https://github.com/ounospanas/KneE-PAD
https://www.kaggle.com/datasets/zara2099/rehabilitation-motion-and-posture-monitoring-dataset
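The "three-day dip in activity" signal described above could be prototyped by comparing a recent window against the patient's own earlier baseline. This is a minimal sketch, assuming daily step counts are already available as a list. The 7-day baseline, 3-day window, and 30% drop threshold are arbitrary illustrative choices, not clinically validated values.

```python
from statistics import mean


def find_activity_dips(daily_steps, baseline_days=7, dip_days=3, drop_fraction=0.30):
    """Return indices of days that end a sustained dip in activity.

    A day is flagged when the mean of the last `dip_days` days is at least
    `drop_fraction` below the mean of the `baseline_days` days before them.
    """
    flagged = []
    for end in range(baseline_days + dip_days, len(daily_steps) + 1):
        recent = daily_steps[end - dip_days:end]
        baseline = daily_steps[end - dip_days - baseline_days:end - dip_days]
        baseline_mean = mean(baseline)
        if baseline_mean > 0 and mean(recent) <= (1 - drop_fraction) * baseline_mean:
            flagged.append(end - 1)  # index of the last day in the dip window
    return flagged


# Made-up step counts: a stable week followed by a steady decline.
example_steps = [3000, 3100, 2900, 3050, 3000, 2950, 3000, 2800, 2000, 1800, 1500]
print(find_activity_dips(example_steps))
```

Using each patient's own baseline, rather than a population norm, keeps the flag meaningful for people who were never very active before surgery.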
{Utilization Optimization Challenge: Reducing Readmissions and Improving Hospital Efficiency}
Aim: Design an AI-enabled intervention that improves hospital throughput and margin performance by safely reducing excess length of stay and preventable readmissions.
-
Michael Reynolds, Chief Operating Officer of a large nonprofit health system, understands a difficult truth: the system’s financial survival depends on a narrow band of high-acuity procedures. Complex cardiovascular interventions, advanced surgical cases, and specialized oncology services generate the margins that subsidize essential but less profitable services such as behavioral health, primary care, and emergency medicine. Yet operating margins across nonprofit hospitals remain thin, typically between 1.5 and 3 percent nationwide (Kaufman Hall, 2024; American Hospital Association, 2024). When inpatient capacity is constrained, it is often these higher-acuity, margin-supporting procedures that are delayed or displaced. Occupancy frequently exceeds 85 to 90 percent. Emergency department boarding is rising. Elective procedures are occasionally postponed due to limited bed availability. In this environment, lost utilization is not abstract. Each blocked bed represents deferred revenue and reduced flexibility to reinvest in mission-driven services. Nationally, approximately 14 to 16 percent of Medicare beneficiaries are readmitted within 30 days (CMS, 2024). While not all readmissions are preventable, avoidable bouncebacks consume scarce inpatient capacity and strain care teams. In high-occupancy systems, each preventable readmission represents a bed that could have been used for a new admission, transfer, or scheduled procedure. Even modest reductions in average length of stay, when achieved safely, can unlock thousands of bed days annually in large systems (American Hospital Association, 2024).
-
Michael and the Vice President of Medical Operations, together with the Director of Case Management, are evaluating whether artificial intelligence could support a more integrated approach to discharge readiness and post-discharge stability. They are not looking for another static risk score. They want a solution that can identify modifiable barriers before discharge, support safe earlier discharge when appropriate, monitor patients during the first week after discharge, flag prolonged stays relative to expected benchmarks, and integrate directly into existing clinical workflows. Constraints are clear. No additional full-time staff may be added in the base scenario. The solution must pilot at a single hospital before expansion. Measurable operational and financial impact must be demonstrated within 24 months. It must align with mixed reimbursement incentives and avoid creating pressure for premature discharge or unintended inequities. The challenge is to design an AI-enabled strategy that optimizes discharge readiness and post-discharge stability to reduce avoidable utilization and unlock hospital capacity. Proposals must demonstrate operational feasibility, clear workflow ownership, governance safeguards, and measurable return on investment.
Datasets:
https://data.cms.gov/provider-data/dataset/9n3s-kdb3
https://www.kaggle.com/datasets/aayushchou/hospital-length-of-stay-dataset-microsoft
https://archive.ics.uci.edu/dataset/34/diabetes
FURTHER READING:
Hospitals within the system operate under a mix of reimbursement structures. Some facilities are primarily fee-for-service, where bed availability directly affects revenue opportunity. Others participate in global budget or value-based payment models, where revenue is fixed and financial performance depends on managing quality and utilization metrics. Under Medicare’s Hospital Readmissions Reduction Program, hospitals may face payment reductions of up to 3 percent for excess readmissions (Centers for Medicare & Medicaid Services, 2024). Under Diagnosis-Related Group reimbursement, hospitals receive fixed payments per admission, meaning prolonged length of stay reduces effective margin, while safe earlier discharge can improve throughput and financial performance.
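The claim that modest length-of-stay reductions can unlock thousands of bed days is simple arithmetic, which a proposal could make explicit. The sketch below is a back-of-the-envelope calculation under stated assumptions. The function names and every input figure in the example call are hypothetical and are not drawn from the cited sources.

```python
def bed_days_freed(annual_admissions, los_reduction_days,
                   readmissions_avoided=0, avg_readmission_los=0.0):
    """Estimate annual bed days released by shorter stays and avoided readmissions."""
    from_shorter_stays = annual_admissions * los_reduction_days
    from_fewer_readmissions = readmissions_avoided * avg_readmission_los
    return from_shorter_stays + from_fewer_readmissions


def additional_admissions_supported(freed_bed_days, avg_los_days):
    """Convert freed bed days into the number of extra stays they could hold."""
    return int(freed_bed_days // avg_los_days)


# Hypothetical inputs: 30,000 admissions per year, a 0.2-day reduction in
# average length of stay, and 150 avoided readmissions of 5 days each.
freed = bed_days_freed(30_000, 0.2, readmissions_avoided=150, avg_readmission_los=5.0)
print(round(freed), additional_admissions_supported(freed, avg_los_days=4.5))
```

How freed bed days translate into financial impact depends on the payment model described above, so the same number would be interpreted differently at a fee-for-service facility and a global-budget facility.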
{Predicting Housing Instability: An AI-Enabled Housing & Care Coordination Model to Support Proactive Care Coordination}
Aim: Identify patients at risk of housing instability and support timely, coordinated interventions across healthcare and community systems.
-
You are working with a health system that serves a large population of patients with complex medical and social needs. The health system has observed rising emergency department (ED) utilization, frequent hospital readmissions, and poor chronic disease outcomes among patients experiencing housing instability. Ms. A is a 47-year-old woman with poorly controlled type 2 diabetes, hypertension, and depression. Over the past 6 months, she has presented to the emergency department four times for hyperglycemia and dizziness. She has missed multiple primary care and behavioral health appointments.
During a recent ED visit, a social worker learned that Ms. A:
Is behind on rent and has received an informal eviction warning from her landlord
Recently lost SNAP benefits due to recertification issues
Lives in an overcrowded apartment with extended family
Reports high stress, food insecurity, and difficulty managing medications
Relies on public transportation with inconsistent access
Her electronic health record (EHR) documents frequent ED use and missed appointments, but housing instability is only captured in free-text social work notes. Although the health system partners with community-based organizations that provide housing and social services, they currently have no systematic way to identify patients like Ms. A before a crisis occurs.
Health system leadership is exploring whether AI and predictive analytics could help identify patients like Ms. A earlier and support proactive, equitable care coordination.
-
Your team is tasked with designing an AI-Enabled Housing Instability Risk Scoring and Care Coordination Solution that could be piloted within this health system. Propose a conceptual or technical solution (model, workflow, dashboard, or architecture) that demonstrates how AI can be used responsibly to predict housing instability and improve health outcomes through coordinated care.
1. Risk Identification
a. How would you develop a predictive housing instability risk score using available clinical, utilization, and social data?
b. What data sources would you include, and how would you handle missing or unstructured data?
2. Care Coordination & Intervention
a. How would the risk score trigger appropriate interventions (e.g., social work outreach, Community-Based Organization referrals, benefits support)?
b. How would your system support care teams without overwhelming them?
3. Data Product & Visualization
a. What dashboards or decision-support tools would clinicians, case managers, or community partners use?
b. How would information be presented to support trust and action?
4. Ethical & Equity Considerations
a. How would you ensure transparency, fairness, and mitigation of bias?
b. How would you keep humans “in the loop” for critical decisions?
5. Impact & Evaluation
a. What outcomes would you measure to evaluate success (clinical, social, operational)?
b. How would this solution scale across populations or regions?
Datasets:
https://www.kaggle.com/datasets/thedevastator/uncovering-trends-in-health-outcomes-and-socioec
https://www.ahrq.gov/data/innovations/clh-data.html#download
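Because housing instability is captured only in free-text social work notes, one possible first step is to pull simple signals out of those notes and combine them with utilization counts. The sketch below is a deliberately crude, rule-based illustration and not a validated model. The phrase patterns, the thresholds, and the `needs_review` function are all hypothetical.

```python
import re

# Hypothetical phrase patterns; a real project would validate these against labeled notes.
SIGNAL_PATTERNS = {
    "rent_arrears": re.compile(r"\bbehind on rent\b|\brent arrears\b", re.IGNORECASE),
    "eviction": re.compile(r"\bevict(?:ion|ed)\b", re.IGNORECASE),
    "overcrowding": re.compile(r"\bovercrowd(?:ed|ing)\b", re.IGNORECASE),
    "food_insecurity": re.compile(r"\bfood insecurity\b|\blost SNAP\b", re.IGNORECASE),
}


def extract_signals(note_text):
    """Return the sorted names of housing-related signals mentioned in a note."""
    return sorted(
        name for name, pattern in SIGNAL_PATTERNS.items() if pattern.search(note_text)
    )


def needs_review(note_text, ed_visits_6mo, missed_appointments, min_signals=2):
    """Flag a patient for human social-work review; never an automated decision."""
    signals = extract_signals(note_text)
    high_utilization = ed_visits_6mo >= 3 or missed_appointments >= 2
    return len(signals) >= min_signals and high_utilization


example_note = (
    "Pt is behind on rent and received an informal eviction warning. "
    "Lives in overcrowded apartment. Lost SNAP benefits last month."
)
print(extract_signals(example_note), needs_review(example_note, 4, 3))
```

Keyword matching ignores negation ("denies eviction risk") and varies with how different staff document, so any such rule set would need bias and accuracy checks before it informed outreach.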
{Predicting Survival in Acute Myeloid Leukemia Using Clinical and Genomic Features}
Aim: Perform one of two analyses to predict survival in patients with Acute Myeloid Leukemia, using clinical and genomic features, for use in a hospital setting.
-
Acute Myeloid Leukemia (AML) is an aggressive hematologic malignancy characterized by the rapid proliferation of abnormal myeloid precursor cells in the bone marrow. Large-scale genomic efforts such as The Cancer Genome Atlas (TCGA) have generated molecular profiles for AML patients, enabling computational modeling of survival outcomes using available patient data.
You are a researcher in a large hospital research institute competing in an internal competition for funding from hospital administration. You are developing an algorithm to predict survival for AML patients using clinical, demographic, and genomic burden data as part of the hospital’s precision medicine initiative. The use cases you envision for your predictive survival model include guiding resource allocation and future screening procedures for high-risk AML patients, assessing the overall health and survival outcomes of the hospital’s AML patient population, and/or identifying potential targets for pharmaceutical, genomic, and other therapies.
-
This challenge asks participants to develop predictive and survival models using clinical and genomic burden features. This project features two tracks (the specific model and the dataset to use are different between the two tracks but the aim remains the same); teams may choose to participate in one or both depending on their interest and experience. Additional instructions and datasets for each track can be found in the “Further Reading” section.
To assess project viability, hospital administration would like to see a slide deck that includes, at a minimum, the following information:
Data preprocessing performed
Survival analysis method used and features selected
Validation approach used
Key results, interpretation, and limitations of the model developed
A clear, patient-specific explanation of the survival output based on structured results
An exploration of use cases for the model in a hospital and/or research setting
FURTHER READING:
Participants may choose one of the following two tracks for this case challenge. The overarching aim is the same. The only differences between the tracks are the specific model to develop and the dataset to use. The final presentation for this case challenge must also detail the business potential of the model. The two tracks and tips for processing the datasets are described below.
Track 1: Binary Survival Outcome Prediction (Full Cohort)
Build a patient-level binary classification model that predicts Survival Status (Alive vs Dead) using baseline clinical and genomic burden features.
Recommendations: Use diagnosis age, TMB (nonsynonymous), and mutation count as a start, but feel free to use other variables in the dataset!
Note: TMB and Mutation Count are correlated. Teams should evaluate the correlation of the variables and drop any if needed.
Data Provided
See attached dataset (H2AI_Oncology_Track1)
There are 86 additional sample rows beyond unique patients [same Patient ID but different Mutation Count and/or TMB (nonsynonymous)]. Teams should figure out a logical way to handle these duplicates. There are no right or wrong answers, but teams should describe their approach.
Some values in the “Mutation Count” Column are listed as NA; teams should figure out how to handle this. This mimics real life, where not every data point is perfect!
The provided diagnosis age is not accurate. Teams will have to derive age-at-diagnosis using “Birth from Initial Pathologic Diagnosis Date”.
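The three data notes above (duplicate rows, NA values, and deriving age) can be handled in many defensible ways. The sketch below shows one option using only the column names quoted in this brief. It assumes that “Birth from Initial Pathologic Diagnosis Date” is a day offset whose sign can be ignored, and keeping the row with the highest known Mutation Count is an arbitrary tie-breaking rule chosen for illustration.

```python
def parse_number(value):
    """Convert a raw cell to float, returning None for NA or blank cells."""
    if value is None or str(value).strip().upper() in {"", "NA", "NAN"}:
        return None
    return float(value)


def age_at_diagnosis_years(birth_offset_days):
    """Derive age in years, assuming the column is a day offset (sign ignored)."""
    days = parse_number(birth_offset_days)
    return None if days is None else abs(days) / 365.25


def one_row_per_patient(rows):
    """Keep, for each Patient ID, the row with the highest known Mutation Count."""
    best = {}
    for row in rows:
        pid = row["Patient ID"]
        count = parse_number(row["Mutation Count"])
        current = best.get(pid)
        if current is None:
            best[pid] = row  # first row seen for this patient
            continue
        current_count = parse_number(current["Mutation Count"])
        if count is not None and (current_count is None or count > current_count):
            best[pid] = row
    return list(best.values())
```

Patients whose only rows have NA for Mutation Count are kept with the value still missing, so a later step (for example median imputation, or dropping the feature) has to be chosen and justified separately.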
Computational Goals:
Data preprocessing
Create a binary classification model
Train classification model (teams are responsible for splitting their data into testing/training datasets)
Evaluate performance on held-out validation data
Model performance
Metrics:
Accuracy
Precision/Recall
Confusion Matrix
ROC-AUC
F1 Score
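The metrics listed above all follow from the confusion matrix and the ranking of predicted scores. The sketch below computes them in plain Python so the definitions are visible. It assumes labels are coded 1 for Dead and 0 for Alive, which is a convention chosen here and not one specified by the dataset.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels coded 1 (positive) and 0."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 from hard 0/1 predictions."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def roc_auc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Accuracy alone can mislead when the Alive and Dead classes are imbalanced, which is one reason the list also asks for precision, recall, and ROC-AUC.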
Track 2: TCGA-Only Time-to-Event Survival Analysis
Build a survival analysis model that predicts time-to-event survival given baseline clinical and genomic features.
Recommendations: Use Overall Survival (Months) as the time variable, as it is the more commonly used endpoint in oncology.
Teams may select other predictors they feel are important from the dataset
Data Provided:
See attached dataset (H2AI_Oncology_Track2)
Some of the values in “Death from Initial Pathologic Diagnosis Date” have a value of 0. Teams can turn this into a value of “0.03” (corresponding to 1 day) if they wish to keep them in their data. Be sure to explain how this may have impacted the final statistical analysis. If teams wish to remove this data, please explain how this impacted the analysis.
Computational Goals:
Data preprocessing
Create a survival analysis approach
Fit/validate approach
Note on sample size: With 63 samples, traditional train/test splits may not be feasible.
Consider:
Full dataset analysis
Cross-validation approaches
Bootstrap validation
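With a cohort this small, a bootstrap can indicate how unstable a performance estimate is. The sketch below computes Harrell's concordance index and a percentile interval by resampling patients with replacement. It assumes a risk score per patient has already been produced by some model. The function names are illustrative, and pairs with tied survival times are simply skipped.

```python
import random


def concordance_index(times, events, risk_scores):
    """Harrell's C: share of comparable pairs where the higher-risk patient dies first."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def bootstrap_c_index(times, events, risk_scores, n_boot=500, seed=0):
    """Percentile 95% interval for the C-index from resampling patients."""
    rng = random.Random(seed)
    n = len(times)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        try:
            estimates.append(concordance_index(
                [times[k] for k in idx],
                [events[k] for k in idx],
                [risk_scores[k] for k in idx],
            ))
        except ValueError:
            continue  # skip degenerate resamples with no comparable pairs
    estimates.sort()
    lower = estimates[int(0.025 * len(estimates))]
    upper = estimates[min(int(0.975 * len(estimates)), len(estimates) - 1)]
    return lower, upper
```

This version resamples fixed risk scores, so it describes sampling variability only. An optimism-corrected bootstrap would refit the model inside each resample.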
Model performance evaluation
Metrics:
Hazard ratios with confidence intervals
Kaplan Meier Survival Curves
Other metrics your team feels are necessary
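For teams new to survival analysis, the Kaplan-Meier curve listed above is the product, over event times, of the fraction of at-risk patients who survive each time. The sketch below is a plain-Python version for illustration. It assumes `events` uses 1 for death and 0 for censoring. In practice a dedicated survival library would normally be used instead.

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct death time."""
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, ei in zip(times, events) if ti == t and ei)
        leaving = sum(1 for ti in times if ti == t)  # deaths plus censored at t
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


def median_survival(curve):
    """First time at which estimated survival is at or below 50%, else None."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


# Made-up follow-up times in months; 1 = death observed, 0 = censored.
example_curve = kaplan_meier([1, 2, 3, 4, 5], [1, 0, 1, 1, 0])
print(example_curve, median_survival(example_curve))
```

Censored patients lower the at-risk count without lowering the curve, which is why dropping them or treating them as deaths would bias the estimate.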
{Turning Ecological Momentary Assessment Data into Actionable Psychiatric Insight}
Aim: Bridge the gap between Ecological Momentary Assessment (EMA) data availability and actionable clinical insight.
-
Dr. Nadia Osei is a psychiatrist at a busy community mental health clinic, where she sees patients back-to-back and rarely has more than 20 minutes per visit. One of her patients, Mrs. M, is a 31-year-old middle school art teacher who was diagnosed with bipolar II disorder three years ago. She has been stable enough since then, holding down her job and keeping up with her medications most of the time. Eight weeks ago, Mrs. M enrolled in the clinic's digital monitoring program. She completes EMA check-ins when she remembers, roughly twice a day instead of the recommended three times. Her partner Priya submits a brief weekly collateral report on Sunday evenings. Priya has noted that Mrs. M has been "a little wired" lately, staying up past 2 AM refinishing furniture, snapping at Priya over small things, and then acting as if nothing happened the next morning. At her Tuesday appointment, Mrs. M tells Dr. Osei that she is doing pretty well. The end of the school year is stressful. Her speech is a little fast, but she has always been talkative. What Dr. Osei cannot do in the time she has is manually scroll through eight weeks of mood logs, cross-reference Priya's collateral reports, and identify that Mrs. M's average sleep has dropped from seven hours to under five over the past two weeks, or that her self-rated energy scores have been rising in a pattern that warrants attention.
-
The data exists, but it lives in a dashboard full of ungrouped graphs and raw numbers that no one in a 20-minute visit has time to parse. Dr. Osei could greatly benefit from a tool that synthesizes Mrs. M's last eight weeks into a single, readable summary before she walks through the door, surfacing the trends, flagging the discrepancies, and highlighting what might be worth exploring in the time Dr. Osei has with her today. This may include multisource EMA data aggregation, meaningful longitudinal trends, flagged risk-relevant signals, and suggested (not prescriptive) clinical considerations. Ideally, the psychiatrist would have a one-page printout summarizing the period since the last visit, with graphs, trends, and major highlights that are easy to interpret.
Dataset: https://dataverse.tdl.org/dataset.xhtml?persistentId=doi:10.18738/T8/OPQMF3
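The two signals in the vignette, a recent drop in average sleep and a mismatch between self-report and collateral report, can each be reduced to a small comparison. This is a minimal sketch, assuming one sleep value per night with `None` for missed check-ins. The 14-day window and 1.5-hour threshold are illustrative assumptions, not clinical cutoffs.

```python
from statistics import mean


def window_mean(values, start, end):
    """Mean of non-missing values in values[start:end]; None if all are missing."""
    present = [v for v in values[start:end] if v is not None]
    return mean(present) if present else None


def sleep_trend_flag(nightly_sleep_hours, recent_days=14, min_drop_hours=1.5):
    """Compare the last `recent_days` nights with everything before them."""
    split = len(nightly_sleep_hours) - recent_days
    if split <= 0:
        return None  # not enough history to form a baseline
    baseline = window_mean(nightly_sleep_hours, 0, split)
    recent = window_mean(nightly_sleep_hours, split, len(nightly_sleep_hours))
    if baseline is None or recent is None:
        return None
    return {
        "baseline_hours": round(baseline, 1),
        "recent_hours": round(recent, 1),
        "flag": baseline - recent >= min_drop_hours,
    }


def discrepancy_note(self_report_ok, collateral_concerns):
    """Surface a mismatch between patient self-report and collateral reports."""
    if self_report_ok and collateral_concerns:
        return "Self-report and collateral differ: " + "; ".join(collateral_concerns)
    return None
```

Outputs like these are meant as prompts for the clinician's own judgment, in keeping with the "suggested (not prescriptive)" framing above.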
FURTHER READING:
Psychiatric care relies heavily on cross-sectional encounters: a 20–30 minute outpatient visit every few weeks, or brief daily check-ins during inpatient hospitalization. Clinical decision-making is therefore often based on retrospective self-report, which may be limited by recall bias, mood-congruent distortion, or lack of insight. Collateral information is inconsistently available and rarely integrated longitudinally.
At the same time, Ecological Momentary Assessments (EMAs) are increasingly used to capture real-time data on mood, sleep, medication adherence, substance use, activity, and social engagement. Patients and collateral informants (e.g., partners, parents) can complete brief structured check-ins multiple times per day. While EMAs generate rich longitudinal datasets, the volume and fragmentation of this information make it difficult for psychiatrists to extract clinically meaningful patterns within time-constrained workflows.
There is a growing gap between data availability and actionable clinical insight.
Moskowitz DS, Young SN. Ecological momentary assessment: what it is and why it is a method of the future in clinical psychopharmacology. J Psychiatry Neurosci. 2006 Jan;31(1):13-20. PMID: 16496031; PMCID: PMC1325062. https://pmc.ncbi.nlm.nih.gov/articles/PMC1325062/
Melia R, Musacchio Schafer K, Rogers ML, Wilson-Lemoine E, Joiner TE. The Application of AI to Ecological Momentary Assessment Data in Suicide Research: Systematic Review. J Med Internet Res. 2025 Apr 17;27:e63192. doi: 10.2196/63192. PMID: 40245396; PMCID: PMC12046261.
Data sets/articles:
Megan A. Neff, Katherine V. Raffensperger, Raeanne C. Moore, Colin A. Depp, Robert A. Ackerman, Amy E. Pinkham, Philip D. Harvey, Predictions of different elements of everyday functional outcomes in bipolar disorder and schizophrenia: Cognition, social cognition, clinical symptoms, and mood states as predictors, Schizophrenia Research, Volume 288, 2026, Pages 77-85, ISSN 0920-9964,
https://www.sciencedirect.com/science/article/pii/S0920996425004438#s0010
Yerushalmi M, Sixsmith A, Pollock Star A, King DB, O'Rourke N. Ecological Momentary Assessment of Bipolar Disorder Symptoms and Partner Affect: Longitudinal Pilot Study. JMIR Form Res. 2021 Sep 2;5(9):e30472. doi: 10.2196/30472. PMID: 34473069; PMCID: PMC8446838.
Lin J, Zhang L, Li M, Li J, Yin L, Yan G, Hou X, Yin H, Xu G. Acceptability, usability, and recommendations for ecological momentary assessment among patients with bipolar disorder: A qualitative study. Psychiatry Res. 2025 Dec;354:116779. doi: 10.1016/j.psychres.2025.116779. Epub 2025 Oct 20. PMID: 41145079.
https://www.sciencedirect.com/science/article/pii/S016517812500424X?via%3Dihub#sec0001
Li H, Mukherjee D, Krishnamurthy VB, Millett C, Ryan KA, Zhang L, Saunders EFH, Wang M. Use of ecological momentary assessment to detect variability in mood, sleep and stress in bipolar disorder. BMC Res Notes. 2019 Dec 4;12(1):791. doi: 10.1186/s13104-019-4834-7. PMID: 31801608; PMCID: PMC6894147.
Dr. Sarah Sperry and Victoria Murphy of the Emotion and Temporal Dynamics (EmoTe) Lab at the University of Michigan created EMA-CleanR, an R-based program for efficient pre-processing, cleaning, and visualization of Ecological Momentary Assessment (EMA) survey data. This article documents how to use EMA-CleanR and how it works.
https://teamdynamix.umich.edu/TDClient/210/DepressionCenter/KB/PrintArticle?ID=14610
Anticipating manic and depressive shifts in patients with bipolar disorder using early warning signals
https://pure.rug.nl/ws/portalfiles/portal/177817957/Chapter_9.pdf
Dataset on Kaggle
(i) Use machine learning for depression states classification
(ii) MADRS score prediction based on motor activity data
(iii) Sleep pattern analysis of depressed vs. non-depressed participants
https://www.kaggle.com/datasets/arashnic/the-depression-dataset