Research
Bayesian and causal machine learning for healthcare data.
I am a PhD candidate in Biomedical AI at the University of Edinburgh, working with Dr Sohan Seth at the Data Science Unit. My research sits at the intersection of Bayesian modelling, unsupervised and causal machine learning, and healthcare. I am particularly interested in longitudinal data, the health trajectories that unfold over years, where the goal is not just prediction but understanding. I want to identify meaningful subgroups of patients, the factors that drive their trajectories, and how interventions change those trajectories.
Three threads run through my work to date.
Interpretable clustering of multi-faceted time series
Real-world time series rarely live in a single facet. Patient histories, like most longitudinal data, carry trend, seasonality, regime changes, and event sequences at once, and treating them as a single signal collapses the structure that clinicians actually care about. My UAI 2025 paper introduces a nonparametric Bayesian framework that learns a separate clustering for each facet simultaneously, using variational inference to scale to real cohorts. Applied to the English Longitudinal Study of Ageing, it recovers interpretable subgroups that single-facet models miss.
- Paper. Nonparametric Bayesian Multi-Facet Clustering for Longitudinal Data, UAI 2025
- Code. GitHub
- Poster. UAI 2025
- Talks. UAI 2025 poster, UKAIRS 2025 oral, Joint CDT Conference on AI for Healthcare 2025 oral.
- Background reading. The real-world data are multi-faceted
Disease trajectories in multimorbidity
Multimorbidity, the co-existence of two or more chronic conditions, is increasingly the norm rather than the exception, and care models designed for a single disease break down for these patients. Most existing analyses are cross-sectional. Far fewer take the temporal order of disease onset seriously. My MSc dissertation, conducted with the Data Science Unit, used temporal clustering to surface meaningful patterns of how multiple conditions accumulate over time, and to link those patterns to outcomes such as mortality. This thread shapes the longitudinal modelling questions I continue to pursue in the PhD.
Supervised clustering for heterogeneous treatment effects
Clinical trials often report no average benefit when in fact a benefit exists for a specific subpopulation, and unsupervised clustering of patient covariates rarely surfaces those subpopulations cleanly. My recent preprint introduces Bayesian Supervised Causal Clustering (BSCC), a framework that uses the individual treatment effect as the outcome guiding the clustering process. BSCC recovers homogeneous subgroups whose members are similar both in their covariate profiles and in how they respond to treatment, giving clinicians and trialists subgroups that are operationalisable rather than merely statistical. I evaluated BSCC on simulated benchmarks and on real-world data from the third International Stroke Trial.
This thread builds on earlier work I did as a research assistant on a Turing-funded project, where I evaluated supervised metric-based clustering for recovering subphenotypes of critically ill COVID-19 patients under convalescent plasma treatment. That project introduced a “FavorCP” outcome that improved odds-ratio testing across discovered subgroups, and motivated the move to a fully Bayesian, causal formulation in BSCC.
- Paper. Bayesian Supervised Causal Clustering, arXiv 2026
- Earlier project. Supervised Clustering of Critically Ill Patients
- Forthcoming companion note. Comparison on Common Meta-learners for HTE
Where I’m heading
A common thread runs through all three. I want to build interpretable models under uncertainty, models that a domain expert can act on and that honestly represent what the data do and do not support. I’m increasingly drawn to causal machine learning as the natural next step, moving from asking which patients look alike to asking which interventions change which patients’ trajectories, and why.
The fastest way to reach me about research is by email.