Important tasks in the study of genomic data include the identification of groups of similar cells (for example by clustering) and the visualisation of data summaries (for example by dimension reduction). In this talk, I will present a novel view of these tasks in the context of single-cell genomic data. To do so, I propose modelling the observed count matrices of genomic data by representing these measurements as a bipartite network with multi-edges. Starting with this first-principles network model of the raw data, I will show improvements in clustering single cells via a suitably identified d-dimensional Laplacian Eigenspace (LE) using a Gaussian mixture model (GMM-LE), and I will apply UMAP to non-linearly project the LE to two dimensions for visualisation (UMAP-LE). From this first-principles viewpoint, the LE representation of the data points estimates transformed latent positions (of genes and cells) under a latent-position statistical model of nodes in a bipartite stochastic network. By applying this methodology to data from three recent genomics studies in different biological contexts, I will show that clusters of cells learned independently by this methodology correspond to cells expressing specific marker genes that were independently defined by domain experts, with an accuracy that is competitive with the industry standard for these data. I will then show how this novel view of the data can provide unique insights, leading to the identification of an LE breast-cancer biomarker that significantly predicts long-term patient survival in two independent validation cohorts of 1904 and 1091 individuals.
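As a rough illustration of the pipeline this abstract describes, the sketch below builds an LE of a cells-by-genes count matrix via a degree-normalised SVD of the bipartite biadjacency matrix, clusters the cells with a Gaussian mixture (GMM-LE) and projects to two dimensions with UMAP (UMAP-LE). It is a minimal sketch, assuming numpy, scipy, scikit-learn and umap-learn are available; the choices of dimension d, cluster count k and the exact normalisation used in the talk are illustrative assumptions.

```python
# Minimal GMM-LE / UMAP-LE sketch; d and k are illustrative choices,
# not the selection procedure from the talk.
import numpy as np
from scipy.sparse import csr_matrix, diags
from scipy.sparse.linalg import svds
from sklearn.mixture import GaussianMixture
import umap

def gmm_le(counts, d=10, k=5, seed=0):
    """counts: (n_cells x n_genes) array of non-negative counts."""
    C = csr_matrix(counts, dtype=float)
    dr = np.asarray(C.sum(axis=1)).ravel()  # cell degrees
    dc = np.asarray(C.sum(axis=0)).ravel()  # gene degrees
    # Singular vectors of D_r^{-1/2} C D_c^{-1/2} span the Laplacian
    # Eigenspace of the bipartite cell-gene graph.
    Cn = diags(1 / np.sqrt(np.maximum(dr, 1e-12))) @ C \
         @ diags(1 / np.sqrt(np.maximum(dc, 1e-12)))
    U, s, _ = svds(Cn, k=d)
    le_cells = U * s  # d-dimensional LE coordinates of the cells
    labels = GaussianMixture(n_components=k, random_state=seed).fit_predict(le_cells)
    xy = umap.UMAP(random_state=seed).fit_transform(le_cells)  # 2-D view
    return labels, xy
```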
Speaker: Najmeh Nakhaeirad, University of Pretoria
Title: Density estimation from biased circular data
Abstract: Sampling with errors yields observations that, instead of being drawn from the distribution of interest, are drawn from a biased version of it. New estimation approaches are developed to retrieve the true density from such error-contaminated data on the circular manifold. Since weighted distribution theory provides a unifying approach for correcting biases in data, we assume a class of weighted distributions on the circle as the distribution of the biased observations. Both frequentist and Bayesian methods are then applied to recover the true density from the error-contaminated observations. Numerical assessments via simulation and real data analysis support the findings.
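A minimal frequentist sketch of the weighted-distribution idea, assuming numpy/scipy and a known bias weight w: if the observed angles follow g(θ) proportional to w(θ)f(θ), reweighting kernel contributions by 1/w(θᵢ) recovers an estimate of the true density f. The von Mises kernel and concentration κ are illustrative choices, and the Bayesian treatment from the talk is not shown.

```python
import numpy as np
from scipy.special import i0  # modified Bessel function of order 0

def debiased_circular_kde(theta, w, grid, kappa=20.0):
    """theta: biased sample in [0, 2*pi); w: weight function; grid: angles."""
    wi = 1.0 / w(theta)
    wi /= wi.sum()  # normalised inverse-bias weights
    K = np.exp(kappa * np.cos(grid[:, None] - theta[None, :]))
    K /= 2.0 * np.pi * i0(kappa)  # von Mises kernel density values
    return K @ wi  # estimate of the true density f on the grid

# e.g. f_hat = debiased_circular_kde(sample, lambda t: 1 + 0.5 * np.cos(t),
#                                    np.linspace(0, 2 * np.pi, 200))
```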
Speaker: Inger Fabris-Rotelli, University of Pretoria
Title: Spatial linear networks with applications
Abstract: This talk will introduce spatial linear networks and cover a number of application areas in spatial statistics. A variety of methods involving analysis in a linear network space will be discussed. Applications to informal roads, criminology and disease mapping will be presented.
As artificial intelligence grows ever more prominent in public discourse, record levels of investment and political pressure have been ploughed into applying these technologies in healthcare. Despite this, few AI innovations translate into the clinical setting or result in real-world patient benefit. In this talk, Dr Zucker will provide an overview of the barriers to AI adoption within the UK healthcare system, offer practical advice to academics on how to maximise the impact of their work in health, and describe some of the projects and opportunities that can support AI and data science researchers within the region.
Bayesian emulation, and more generally Gaussian process modelling, has been successfully applied across a wide variety of scientific disciplines, both for efficiently analysing computationally intensive models and as a general statistical tool for inference and prediction of the response at new predictor values given a training dataset. In this talk, we introduce emulators as fast statistical approximators, providing a predicted value at any input along with a corresponding measure of uncertainty. We then discuss developments of Known Boundary Emulation (KBE) strategies, which exploit the fact that, for many computer models, there exist hyperplanes in the input parameter space for which the model output can be evaluated far more efficiently. For example, this may be because the response is known at such inputs, or because such inputs yield an analytical solution or permit the application of a much simpler, more efficient numerical solver. We demonstrate how information on these known hyperplanes can be incorporated into the emulation process via an analytical update, incurring no additional computational cost, before illustrating our techniques on a scientifically relevant, high-dimensional systems biology model of hormonal crosstalk in the roots of an Arabidopsis plant.
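A minimal sketch of the basic emulation idea, assuming scikit-learn: a Gaussian process is trained on a handful of expensive simulator runs and then returns a cheap prediction with an uncertainty measure at any new input. The toy function below stands in for an expensive simulator, and the known-boundary analytical update itself is not reproduced here.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def simulator(x):                      # stand-in for a costly computer model
    return np.sin(3 * x[:, 0]) * np.exp(-x[:, 0])

X_train = np.linspace(0, 2, 8).reshape(-1, 1)   # a few expensive runs
y_train = simulator(X_train)

emulator = GaussianProcessRegressor(kernel=RBF(length_scale=0.5)).fit(X_train, y_train)

X_new = np.linspace(0, 2, 200).reshape(-1, 1)
mean, sd = emulator.predict(X_new, return_std=True)  # fast prediction plus uncertainty
```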
Dynamic covariance matrix models for multivariate normal data are a widely applicable class of statistical models, designed to capture the covariate-dependent nature of the variance and dependence parameters. However, such models are rarely used in practice, partly due to the computational difficulties involved in model fitting. In particular, it is challenging to ensure the positive definiteness of the covariance matrix while guaranteeing computational scalability, even for a moderate dimension of the response vector. In this talk, we will present methods for fitting multivariate Gaussian regression models in which each parameter of the mean vector and of (an unconstrained parametrisation of) the covariance matrix can be modelled additively, via parametric or spline-based smooth effects. We will focus particularly on the modified Cholesky decomposition and show how the sparsity of the corresponding derivative system aids scalability with respect to the dimension of the response vector. The usefulness of the new models will be illustrated on a UK regional electricity net-demand forecasting application.
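A minimal sketch of the modified Cholesky parametrisation mentioned above, assuming numpy: any real values for the strictly lower-triangular entries of T and for log(D) yield a symmetric positive-definite covariance matrix via Σ⁻¹ = T′D⁻¹T, so covariate-dependent additive models for these unconstrained entries can never leave the positive-definite cone. The dimension and parameter values below are illustrative.

```python
import numpy as np

def cov_from_unconstrained(phi, log_d):
    """phi: strictly lower-triangular entries of T; log_d: log variances."""
    p = log_d.size
    T = np.eye(p)
    T[np.tril_indices(p, k=-1)] = phi      # unit lower-triangular T
    D_inv = np.diag(np.exp(-log_d))        # D^{-1}, positive by construction
    return np.linalg.inv(T.T @ D_inv @ T)  # Sigma = (T' D^{-1} T)^{-1}

Sigma = cov_from_unconstrained(np.array([0.3, -1.2, 0.7]), np.array([0.0, -0.5, 1.0]))
assert np.all(np.linalg.eigvalsh(Sigma) > 0)  # positive definite for any inputs
```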
In this seminar, I will provide an introduction to Mixture of Experts (MoE) modelling, a versatile technique used across various fields. I will explore the fundamentals of MoE models, discussing their structure and practical applications. I will then highlight my contributions to this field, particularly in handling datasets with censored observations. Finally, I will conclude by sharing some current challenges I am passionate about, hoping to spark interest in potential collaborations within the school.
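A purely illustrative sketch of the MoE structure, assuming numpy/scipy: a softmax gating network g_k(x) weights K Gaussian-regression experts, so f(y | x) = Σ_k g_k(x) N(y; x′β_k, σ_k²). The censored-data extensions from the talk are not shown.

```python
import numpy as np
from scipy.stats import norm

def moe_density(y, x, gate_w, betas, sigmas):
    """gate_w, betas: (K x d) arrays; sigmas: (K,) standard deviations."""
    logits = gate_w @ x
    g = np.exp(logits - logits.max())
    g /= g.sum()                                          # softmax gating probabilities
    return g @ norm.pdf(y, loc=betas @ x, scale=sigmas)   # mixture density at (x, y)
```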
Human mortality patterns and trajectories in closely related populations are likely to be linked and to share similarities, so it is desirable to model them simultaneously while taking their heterogeneity into account. When a mortality model is applied to each population separately, the resulting long-term forecasts of life expectancy tend to diverge. We introduce a method for joint and coherent mortality modelling and forecasting of multiple subpopulations using multivariate functional principal component analysis, which ensures non-divergent forecasts in the long run when the subpopulation groups have similar socio-economic conditions or common biological characteristics. We demonstrate the proposed methods using sex-specific mortality data, and the forecast performance is compared with that of several existing models, including the independent functional data model and the Product-Ratio model.
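A purely illustrative sketch of the functional-PCA forecasting idea for a single population, assuming numpy: log-mortality curves (years by ages) are centred, principal components are extracted by SVD, and each score series is extrapolated by a random walk with drift. The multivariate, coherent construction for several subpopulations in the talk is not reproduced here.

```python
import numpy as np

def fpca_forecast(log_mx, n_pc=2, horizon=10):
    """log_mx: (years x ages) matrix of log mortality rates."""
    mu = log_mx.mean(axis=0)                       # mean log-mortality curve
    U, s, Vt = np.linalg.svd(log_mx - mu, full_matrices=False)
    scores = U[:, :n_pc] * s[:n_pc]                # year-indexed PC scores
    drift = np.diff(scores, axis=0).mean(axis=0)   # random walk with drift
    steps = np.arange(1, horizon + 1)[:, None]
    return mu + (scores[-1] + steps * drift) @ Vt[:n_pc]  # forecast curves
```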
Assuming X is a random vector and A a non-invertible matrix, one sometimes needs to perform inference while only having access to samples of Y = AX. The corresponding likelihood is typically intractable. One may still be able to perform exact Bayesian inference using a pseudo-marginal sampler, but this requires an unbiased estimator of the intractable likelihood. We propose saddlepoint Monte Carlo, a method for obtaining an unbiased estimate of the density of Y with very low variance, for any model belonging to an exponential family. Our method relies on importance sampling and characteristic functions, with insights brought by the standard saddlepoint approximation scheme with exponential tilting. We show that saddlepoint Monte Carlo makes it possible to perform exact inference on particularly challenging problems and datasets. We focus on the ecological inference problem, where one observes only aggregates at a fine level. In particular, we present a study of the carryover of votes between the two rounds of various French elections, using the finest available data (the number of votes for each candidate in about 60,000 polling stations over most of the French territory).
Joint work with Théo Voldoire, Nicolas Chopin, and Guillaume Rateau. Preprint: https://arxiv.org/abs/2410.18243
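A minimal sketch of the pseudo-marginal Metropolis-Hastings idea referred to above, assuming numpy: an unbiased, non-negative estimator of the intractable likelihood (the generic placeholder `lhat` below, standing in for the role played by saddlepoint Monte Carlo in the paper) replaces the exact likelihood, and the chain still targets the exact posterior. The Gaussian proposal and the prior interface are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def pseudo_marginal_mh(lhat, log_prior, theta0, n_iter=5000, step=0.1):
    """lhat(theta): unbiased non-negative estimate of the intractable likelihood."""
    theta, L = theta0, lhat(theta0)
    chain = []
    for _ in range(n_iter):
        prop = theta + step * rng.standard_normal(theta.shape)
        L_prop = lhat(prop)                 # fresh estimate at the proposal only
        log_ratio = (np.log(L_prop) + log_prior(prop)
                     - np.log(L) - log_prior(theta))
        if np.log(rng.uniform()) < log_ratio:
            theta, L = prop, L_prop         # recycle the accepted estimate
        chain.append(theta)
    return np.array(chain)                  # samples targeting the exact posterior
```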
We study various formulations of zero-sum games between a singular controller and a stopper over a finite-time horizon, where the underlying process is a multi-dimensional controlled stochastic differential equation evolving in an unbounded domain. We prove that such games admit a value and present an optimal strategy for the stopper. In some cases, we show that the game's value is the maximal solution, in a suitable Sobolev class, of a variational inequality of 'min-max' type with both an obstacle and a gradient constraint. Under stricter assumptions, we provide an optimal strategy for the controller and establish a connection between the spatial derivative of the value function and the solution of an optimal stopping problem with absorption.
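Purely as a schematic illustration, and not the exact formulation from the talk, a variational inequality of 'min-max' type with an obstacle and a gradient constraint can take a shape such as the following, where $\mathcal{L}$ denotes the generator of the diffusion, $f$ a running payoff, $g$ the stopper's payoff, $c$ the marginal cost of exerting control and $T$ the horizon:

```latex
\min\Big\{ \max\big\{ \partial_t u + \mathcal{L}u + f,\; g - u \big\},\; c - |\nabla u| \Big\} = 0
\quad \text{on } [0,T)\times\mathbb{R}^d, \qquad u(T,\cdot) = g(T,\cdot).
```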