# Possible MPhil/PhD Projects in Statistics

Suggested projects for postgraduate research in statistics.

## Areas of expertise

Statistical research in the School of Mathematics, Statistics and Physics can be grouped into the following research areas:

- Bayesian methodology for complex models
- big data, scalability and computation
- biostatistics and stochastic systems biology
- spatial and environmental statistics

If you're applying for an MPhil/PhD project in one of these areas, please provide the titles of up to three projects from the list below, in order of preference. Applicants are invited to apply online.

For further information, please contact the PG tutor/selector in applied mathematics: Dr Colin Gillespie.

Analytic intractability of most nonlinear multivariate diffusions can make parameter inference problematic. A widely used approach is to adopt a data augmentation scheme so that an Euler-Maruyama approximation of unavailable transition densities is applied over additional intermediate time points. Recently proposed pseudo-marginal Metropolis-Hastings schemes effectively integrate over the uncertainty at these intermediate times and give samples from the marginal parameter posterior distribution of interest. Unfortunately, these inference schemes are extremely computationally intensive, particularly when the number of intermediate time points is large (corresponding to a fine discretisation level).
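As a rough illustration of the approximation described above, the following Python sketch simulates a scalar diffusion with the Euler-Maruyama scheme and evaluates the corresponding Gaussian approximation to the log density of a path that includes intermediate time points. The function names and the scalar setting are assumptions made for illustration only; they are not taken from any particular inference scheme or software package.

```python
import numpy as np


def euler_maruyama_path(x0, drift, diffusion, t_end, n_steps, rng):
    """Simulate dX = drift(X) dt + diffusion(X) dW on [0, t_end] (scalar X)."""
    dt = t_end / n_steps
    path = np.empty(n_steps + 1)
    path[0] = x0
    for k in range(n_steps):
        x = path[k]
        dw = rng.normal(0.0, np.sqrt(dt))  # Brownian increment over dt
        path[k + 1] = x + drift(x) * dt + diffusion(x) * dw
    return path


def em_log_density(path, drift, diffusion, dt):
    """Log density of a path under the Euler-Maruyama approximation.

    Each step is treated as Gaussian with mean x + drift(x) * dt and
    variance diffusion(x)**2 * dt, so the path density is a product of
    Gaussian terms. Requires diffusion(x) > 0 along the path.
    """
    x_prev, x_next = path[:-1], path[1:]
    mean = x_prev + drift(x_prev) * dt
    var = diffusion(x_prev) ** 2 * dt
    return np.sum(-0.5 * np.log(2.0 * np.pi * var)
                  - 0.5 * (x_next - mean) ** 2 / var)
```

A finer discretisation (larger `n_steps`) reduces the bias of the Gaussian approximation but increases the cost of both functions linearly, which is the trade-off this project addresses.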

This project aims to alleviate this problem by extending a recently proposed delayed-acceptance pseudo-marginal scheme to incorporate multiple stages, corresponding to different discretisation levels. The basic idea is to try parameter proposals under a coarse (cheap) discretisation, and only proceed to finer (expensive) discretisations for those proposals that have passed the first stage. Key to a statistically efficient scheme will be the ability to induce positive correlation between estimators of marginal likelihood at consecutive stages. Application of this approach to models arising in systems biology will be of particular interest.
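The two-stage screening idea can be sketched in Python as follows. This is a deliberately simplified version: it assumes a scalar parameter, a symmetric random-walk proposal, and log targets that can be evaluated exactly. In a pseudo-marginal scheme these evaluations would be replaced by estimates of the marginal likelihood, which this sketch does not attempt. The names `log_cheap` and `log_expensive` are hypothetical stand-ins for the coarse and fine discretisations.

```python
import numpy as np


def delayed_acceptance_mh(log_cheap, log_expensive, theta0, n_iters, step, rng):
    """Two-stage delayed-acceptance Metropolis-Hastings for a scalar parameter.

    log_cheap and log_expensive return unnormalised log target densities
    (hypothetical stand-ins for coarse and fine discretisations).
    Returns the chain and the number of expensive evaluations of proposals.
    """
    theta = theta0
    lc, le = log_cheap(theta), log_expensive(theta)
    samples = np.empty(n_iters)
    n_expensive = 0
    for i in range(n_iters):
        prop = theta + step * rng.normal()  # symmetric random-walk proposal
        lc_prop = log_cheap(prop)
        # Stage 1: screen the proposal using only the cheap target.
        if np.log(rng.uniform()) < lc_prop - lc:
            le_prop = log_expensive(prop)
            n_expensive += 1
            # Stage 2: correct for the error of the cheap approximation,
            # so the chain still targets the expensive density.
            if np.log(rng.uniform()) < (le_prop - le) - (lc_prop - lc):
                theta, lc, le = prop, lc_prop, le_prop
        samples[i] = theta
    return samples, n_expensive
```

Proposals rejected at stage 1 never trigger an expensive evaluation, so the saving depends on how well the cheap target tracks the expensive one.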

Supervisor: Dr A Golightly

Statistical inference remains challenging for many large-scale, complicated models and datasets in fields including biology, economics and epidemiology. This project will develop new methods combining recent advances in machine learning with Bayesian inference. For instance, one approach is to train a neural network to produce samples from a high-dimensional Bayesian posterior distribution based on the output of Monte Carlo algorithms.

Applications include (a) agent-based models of finance and biology, (b) tuning physics simulator code, and (c) dynamical system models (e.g. differential equations) of infectious diseases.

Supervisor: Dr Dennis Prangle

In a reliability demonstration test, the producer of a hardware product demonstrates to a consumer that the product meets a certain level of reliability. As most hardware products have very high reliability, such tests can be prohibitively expensive, requiring large sample sizes and long testing periods. Accelerated testing can reduce the testing time, but introduces the additional complication of having to infer the relationship between failure times under accelerated and normal operating conditions of the stressor variable.

Previous attempts to plan and analyse reliability demonstration tests have utilised power calculations and hypothesis tests or risk criteria. More recently, Wilson & Farrow (2019) proposed the use of assurance to design reliability demonstration tests and suitable Bayesian analyses of the test data. Assurance provides the unconditional probability that the reliability demonstration test will be passed. Work to date has focussed on Binomial and Weibull observations. This project would extend the use of assurance to design reliability demonstration tests, considering a wider class of failure time distributions and implementing an augmented MCMC scheme to evaluate the assurance more efficiently.
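To make the idea of assurance concrete for the Binomial case, here is a minimal Python sketch. It assumes a Beta(a, b) prior on the per-item failure probability and a simple pass rule of at most `c` failures among `n` items tested. This pass rule and the function names are chosen purely for illustration and are not necessarily the criterion used by Wilson & Farrow (2019). Under these assumptions, the assurance is the prior predictive (beta-binomial) probability of observing at most `c` failures.

```python
from math import exp, lgamma


def log_beta(a, b):
    """Log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def log_choose(n, x):
    """Log of the binomial coefficient n choose x."""
    return lgamma(n + 1) - lgamma(x + 1) - lgamma(n - x + 1)


def binomial_assurance(n, c, a, b):
    """Unconditional probability of at most c failures in n trials.

    The failure probability has a Beta(a, b) prior (an assumption of this
    sketch), so the number of failures is beta-binomial distributed.
    """
    total = 0.0
    for x in range(c + 1):
        log_pmf = (log_choose(n, x)
                   + log_beta(a + x, b + n - x)
                   - log_beta(a, b))
        total += exp(log_pmf)
    return total
```

Unlike a power calculation, which conditions on a fixed value of the failure probability, this quantity averages over the prior, so it can be compared across candidate values of `n` and `c` when planning a test.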

Supervisor: Dr Kevin Wilson

Billera-Holmes-Vogtmann tree space is the collection of all possible evolutionary trees for a fixed set of species. It is a non-smooth, non-linear space, but it comes equipped with a very beautiful geometry, which enables some standard statistical methods to be reformulated in this novel setting.

In order to perform statistics within the space, it is highly desirable to construct parametric families of distributions. However, the construction of distributions for which parameters can be inferred is extremely challenging. In this project we will use transition kernels of stochastic processes on tree space to define such distributions, and then develop Bayesian methods for parameter inference from data sets in tree space. The project will involve developing the theory of stochastic processes in tree space, and modifying Bayesian inference schemes, such as the use of Brownian bridges, to apply in tree space.

Supervisor: Dr Tom Nye

Data sets in which each data point is a different edge-weighted network arise in a variety of different contexts such as neuroscience, molecular biology and computer science. Analysis of data sets of networks or graphs requires the development of new statistical methods, because such graphs do not live in a linear space.

Information geometry provides theory and methods for doing geometry on spaces of probability distributions. The project supervisor has recently used these tools to develop a new space of evolutionary trees, by mapping trees to certain probability distributions, and studying the geometry on trees induced by this embedding. The idea behind this project is to extend these ideas from collections of edge-weighted evolutionary trees to collections of certain edge-weighted graphs. The project will involve stochastic processes on graphs, computational geometry and development of statistical methods in the novel setting of graph space.

Supervisor: Dr Tom Nye

EEG (electroencephalogram) data record the electrical activity of the brain and are used in the diagnosis of epilepsy and to support clinical decision making for patients with epilepsy.

EEG data are often recorded as time series of correlation matrices. In order to analyse such data, a geometry must be selected for the space of all such matrices, and a natural choice is to use an extension of the Fisher-Rao metric on covariance matrices to correlation matrix space. In this project we will use Riemannian geometry to map standard vector-valued time series statistical methods into the novel context of covariance- or correlation-matrix space. This will produce valuable new models for analysing clinical EEG data.
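For orientation, the Fisher-Rao geometry on covariance matrices leads (up to a constant scaling convention) to the affine-invariant distance between symmetric positive-definite matrices, computed from the eigenvalues of A⁻¹B. The Python sketch below covers the covariance case only; the extension to correlation-matrix space mentioned above is not attempted here, and the function name is an assumption made for illustration.

```python
import numpy as np


def affine_invariant_distance(a, b):
    """Affine-invariant distance between SPD matrices a and b.

    Computes sqrt(sum_i log(lambda_i)**2), where lambda_i are the
    eigenvalues of inv(a) @ b. Scaling conventions for the Fisher-Rao
    metric differ by a constant factor, which is omitted here.
    """
    chol = np.linalg.cholesky(a)           # a = L @ L.T
    inv_chol = np.linalg.inv(chol)
    # L^{-1} b L^{-T} is symmetric and has the same eigenvalues as inv(a) @ b.
    whitened = inv_chol @ b @ inv_chol.T
    eigvals = np.linalg.eigvalsh(whitened)
    return np.sqrt(np.sum(np.log(eigvals) ** 2))
```

The distance is unchanged when both matrices are transformed as G A Gᵀ and G B Gᵀ for any invertible G, which is one reason this geometry is a natural starting point for covariance-valued time series.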

Supervisor: Dr Tom Nye