Publications
Publications in reverse chronological order.
2026
- ICML
Scalable Bayesian Semi-supervised Clustering with Feature Selection and Adaptive Constraint Weighting
Luwei Wang, Dagmara Panas, Ke Wang, Bruce Guthrie, and Sohan Seth
In Proceedings of the 43rd International Conference on Machine Learning, Jul 2026. To appear.
Constrained clustering incorporates prior knowledge in the form of pairwise constraints to guide data partitioning. While effective, existing Bayesian approaches are often limited in scalability to large datasets and provide weak interpretability due to the lack of explicit feature relevance modeling. We propose BASIL, a scalable Bayesian semi-supervised clustering framework that leverages stochastic variational inference to jointly infer cluster assignments and feature importance weights. This joint formulation enables the identification of discriminative features consistent with the imposed constraints. To robustly handle noisy or inconsistent supervision, BASIL introduces an adaptive constraint-weighting mechanism that down-weights unreliable constraints. Experiments on synthetic and real-world benchmarks demonstrate that our approach achieves competitive clustering performance while improving scalability and interpretability over existing baselines. We further demonstrate applicability to large-scale health data, including medical imaging and electronic health records.
@inproceedings{Wang2026BASIL,
  title     = {Scalable Bayesian Semi-supervised Clustering with Feature Selection and Adaptive Constraint Weighting},
  author    = {Wang, Luwei and Panas, Dagmara and Wang, Ke and Guthrie, Bruce and Seth, Sohan},
  booktitle = {Proceedings of the 43rd International Conference on Machine Learning},
  year      = {2026},
  month     = jul,
  publisher = {PMLR},
  series    = {Proceedings of Machine Learning Research},
  note      = {To appear}
}
- ICML
Cache Coherent Resampling for Efficient Test Time Scaling in LLM Reasoning via Adaptive Sequential Monte Carlo
Ke Wang, Zehao Yu, Luwei Wang, and Yongchao Huang
In Proceedings of the 43rd International Conference on Machine Learning, Jul 2026. To appear.
Recent work shows that chain-based sampling for power-shaped trajectory distributions can deliver large test-time gains from a fixed base LLM and can approach RL-trained reasoners such as GRPO. Deployment is the bottleneck. Autoregressive Metropolis-Hastings is inherently serial, limits GPU utilization, and exhibits extreme tail latency at high budgets, reaching p95 [?]s on MATH500 at [?]. We propose Adaptive Sequential Monte Carlo (ASMC), a parallel particle inference method that targets power-shaped trajectory distributions while adapting particle populations to problem hardness. To make resampling practical for Transformers, we introduce cache coherent resampling, which realizes ancestry updates by reordering KV caches and other particle-bound tensors, avoiding prefix recomputation. On MATH500 at the same budget, ASMC attains [?] exact-match accuracy with p95 [?]s, improving the trade-off between accuracy and tail latency over both sequential MCMC and best-of-[?]. We further analyze particle degeneracy and find that collapse severity, measured by low [?], strongly predicts failures, while sensitivity to the resampling scheme is limited.
@inproceedings{Wang2026ASMC,
  title     = {Cache Coherent Resampling for Efficient Test Time Scaling in {LLM} Reasoning via Adaptive Sequential Monte Carlo},
  author    = {Wang, Ke and Yu, Zehao and Wang, Luwei and Huang, Yongchao},
  booktitle = {Proceedings of the 43rd International Conference on Machine Learning},
  year      = {2026},
  month     = jul,
  publisher = {PMLR},
  series    = {Proceedings of Machine Learning Research},
  note      = {To appear}
}
- arXiv
Bayesian Supervised Causal Clustering
Luwei Wang, Nazir Lone, and Sohan Seth
arXiv preprint arXiv:2603.05288, Mar 2026
Finding patient subgroups with similar characteristics is crucial for personalized decision-making in various disciplines such as healthcare and policy evaluation. While most existing approaches rely on unsupervised clustering methods, there is a growing trend toward using supervised clustering methods that identify operationalizable subgroups in the context of a specific outcome of interest. We propose Bayesian Supervised Causal Clustering (BSCC), with treatment effect as outcome to guide the clustering process. BSCC identifies homogeneous subgroups of individuals who are similar in their covariate profiles as well as their treatment effects. We evaluate BSCC on simulated datasets as well as a real-world dataset from the third International Stroke Trial to assess the practical usefulness of the framework.
@article{Wang2026BSCC,
  title   = {Bayesian Supervised Causal Clustering},
  author  = {Wang, Luwei and Lone, Nazir and Seth, Sohan},
  journal = {arXiv preprint arXiv:2603.05288},
  year    = {2026},
  month   = mar,
  url     = {https://arxiv.org/abs/2603.05288},
}
2025
- UAI
Nonparametric Bayesian Multi-Facet Clustering for Longitudinal Data
Luwei Wang, Kieran Richards, and Sohan Seth
In Proceedings of the Forty-first Conference on Uncertainty in Artificial Intelligence, Jul 2025
Complex real-world time series data are inherently multi-faceted, e.g., temporal data can be described by seasonality and trend. Popular clustering methods typically aggregate information from all facets, treating them collectively rather than individually. This aggregation may diminish the interpretability of clusters by obscuring the specific contributions of individual facets to the clustering outcome. This limitation can be addressed by multi-facet clustering that builds a separate clustering model for each facet simultaneously. In this paper, we explore Bayesian multi-facet clustering modelling for temporal data using nonparametric priors to select an appropriate number of clusters automatically and using variational inference to efficiently explore the parameter space. We apply this framework to nonlinear growth models and vector autoregressive models and observe their performance through simulation studies. We apply these models to real-world time series data from the English Longitudinal Study of Ageing (ELSA), highlighting the framework's utility in identifying meaningful and interpretable clusters. These findings underscore the potential of the framework for advancing the analysis of multi-faceted longitudinal data in diverse fields.
@inproceedings{Wang2025NPBayesMFC,
  title     = {Nonparametric Bayesian Multi-Facet Clustering for Longitudinal Data},
  author    = {Wang, Luwei and Richards, Kieran and Seth, Sohan},
  booktitle = {Proceedings of the Forty-first Conference on Uncertainty in Artificial Intelligence},
  year      = {2025},
  pages     = {4411--4442},
  month     = jul,
  publisher = {PMLR},
  series    = {Proceedings of Machine Learning Research},
  url       = {https://proceedings.mlr.press/v286/wang25c.html},
}