# Possible MPhil/PhD Projects in Statistics

Suggested projects for postgraduate research in statistics.

## Areas of expertise

Statistical research in the School of Mathematics, Statistics and Physics can be grouped into the following research areas:

- Bayesian methodology for complex models
- big data, scalability and computation
- biostatistics and stochastic systems biology
- spatial and environmental statistics

If you're applying for an MPhil/PhD project in one of these areas, please provide the titles of up to three projects from the list below, in order of preference. Applicants are invited to apply online.

For further information, please contact the PG tutor/selector in statistics: Dr Colin Gillespie

## Projects
At the core of the Bayesian framework is the determination of suitable prior distributions for any unknown quantity. Whether this is a parameter of a model, a model itself or some other structure, experimenters and decision makers face the task of translating any available prior information into a suitable probability distribution. However, there are many circumstances where this is not achievable, for example because the number of parameters in a model is too large, or simply because there is insufficient prior information to exploit. In these cases, the option is to revert to methods that allow prior distributions to be built in the absence of information; these methods go under the name of Objective Bayes.

Standard objective methods for deriving prior distributions have reached their natural ceiling, making them unsuitable for the fast-growing complexity of Bayesian models. This project aims to build on a recent methodology based on scoring rules, exploring both mathematical aspects and applications of objective prior distributions.
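For orientation, one classical route to an objective prior (scoring-rule priors themselves are more involved) is Jeffreys' rule: take the prior proportional to the square root of the Fisher information. A minimal sketch for a Bernoulli parameter, where this recovers the Beta(1/2, 1/2) shape:

```python
import numpy as np

def fisher_information_bernoulli(theta):
    # Fisher information for a single Bernoulli observation:
    # I(theta) = 1 / (theta * (1 - theta))
    return 1.0 / (theta * (1.0 - theta))

def jeffreys_prior(theta):
    # Jeffreys' objective prior is proportional to sqrt(I(theta)),
    # here the (unnormalised) Beta(1/2, 1/2) density.
    return np.sqrt(fisher_information_bernoulli(theta))

thetas = np.linspace(0.01, 0.99, 99)
prior = jeffreys_prior(thetas)
# The prior is symmetric in theta and places most mass near 0 and 1.
```

Note how the prior is derived from the model alone, with no subjective input: exactly the situation the project addresses for models where such classical rules no longer apply.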

Supervisor: Dr Cristiano Villa

We live in a world that is, by nature, non-linear. Although linearity is often assumed, this is in general a convenient, yet forced, simplification. The project looks to improve the implementation of a statistical tool suited to representing non-linear phenomena: the Bayesian Additive Regression Tree (BART) model. In detail, the aim is to enhance the applicability of BART models through two key outputs. First, we will develop a novel prior distribution for the structure of the trees in the BART. Second, we will develop a prior distribution to estimate the number of trees in the BART. The project proposes a novel loss-based approach to solve these problems.
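BART itself places priors over trees and samples them by MCMC. As a rough, non-Bayesian illustration of the sum-of-trees idea it builds on, the sketch below fits shallow trees (stumps) to successive residuals, gradient-boosting style, with shrinkage playing the role the BART prior plays in keeping each tree weak; all settings here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_stump(x, r):
    """Fit a depth-1 regression tree (stump) to residuals r by
    minimising squared error over candidate split points."""
    best = None
    for split in np.unique(x):
        left, right = r[x <= split], r[x > split]
        if len(left) == 0 or len(right) == 0:
            continue
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, split, left.mean(), right.mean())
    _, split, lval, rval = best
    return lambda z: np.where(z <= split, lval, rval)

# Toy non-linear data: y = sin(2*pi*x) + noise
x = rng.uniform(0, 1, 200)
y = np.sin(2 * np.pi * x) + 0.1 * rng.normal(size=200)

# Sum-of-trees fit: each stump explains part of the residual.
n_trees, shrink = 50, 0.3
trees, pred = [], np.zeros_like(y)
for _ in range(n_trees):
    tree = fit_stump(x, y - pred)
    trees.append(tree)
    pred += shrink * tree(x)

print(float(np.mean((y - pred) ** 2)))  # training MSE
```

The choice of `n_trees` here is fixed by hand, which is exactly the gap the project's second output (a prior on the number of trees) would close in the fully Bayesian setting.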

Supervisor: Dr Cristiano Villa

Probabilistic numerics casts numerical tasks, such as the numerical solution of differential equations, as inference problems. In this project, a convergent sequence of approximations to the quantity of interest constitutes a dataset from which the limiting quantity of interest can be extrapolated, in a probabilistic analogue of Richardson's deferred approach to the limit. This approach provides probabilistic uncertainty quantification whilst inheriting the features and performance of state-of-the-art numerical methods. This project aims to develop and extend such methods for challenging numerical tasks, such as solving nonlinear partial differential equations and eigenvalue problems.
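Richardson's deferred approach to the limit, which the probabilistic method mirrors, can be sketched classically: compute a quantity at two step sizes and combine the estimates so that the leading error term cancels. A minimal example with the composite trapezoid rule (all choices illustrative):

```python
import numpy as np

def trapezoid(f, a, b, n):
    # Composite trapezoid rule with n subintervals.
    x = np.linspace(a, b, n + 1)
    y = f(x)
    return (b - a) / n * (y[0] / 2 + y[1:-1].sum() + y[-1] / 2)

f = np.sin
a, b = 0.0, np.pi
truth = 2.0  # integral of sin over [0, pi]

# A convergent sequence of approximations at halved step sizes...
coarse = trapezoid(f, a, b, 8)
fine = trapezoid(f, a, b, 16)

# ...from which the limit is extrapolated: the trapezoid error is
# O(h^2), so combining the estimates as (4*fine - coarse) / 3
# cancels the leading error term.
extrapolated = (4 * fine - coarse) / 3
```

The probabilistic analogue replaces this point extrapolation with posterior inference over the limit, so the output is a distribution quantifying the remaining numerical uncertainty rather than a single number.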

Supervisor: Chris Oates

Markov chain Monte Carlo is the engine of modern Bayesian statistics, used to approximate the posterior and derived quantities of interest. Despite this, the issue of how the output from a Markov chain is post-processed and reported is often overlooked. Convergence diagnostics can be used to control bias via burn-in removal, but these do not account for (common) situations where a limited computational budget engenders a bias-variance trade-off. The aim of this project is to address this trade-off directly, developing powerful post-processing techniques based on Stein discrepancy to improve Markov chain Monte Carlo output.
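A minimal illustration of the burn-in issue, using a random-walk Metropolis chain targeting a standard normal (all settings invented for the sketch; Stein-discrepancy post-processing itself is beyond a few lines):

```python
import numpy as np

rng = np.random.default_rng(1)

def log_target(x):
    # Standard normal log-density, up to an additive constant.
    return -0.5 * x * x

def metropolis(n_iter, x0, step=1.0):
    """Random-walk Metropolis chain targeting the standard normal."""
    chain = np.empty(n_iter)
    x = x0
    for i in range(n_iter):
        prop = x + step * rng.normal()
        if np.log(rng.uniform()) < log_target(prop) - log_target(x):
            x = prop
        chain[i] = x
    return chain

chain = metropolis(5000, x0=20.0)  # deliberately poor starting point

# Naive estimate vs. burn-in-corrected estimate of the target mean:
naive = chain.mean()
burned = chain[1000:].mean()
```

Discarding the first 1000 states removes the bias from the poor start, but at the cost of throwing samples away and inflating the variance; choosing that cut-off under a fixed budget is precisely the bias-variance trade-off the project targets.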

Supervisor: Chris Oates

In a reliability demonstration test, the producer of a hardware product demonstrates to a consumer that the product meets a certain level of reliability. As most hardware products have very high reliability, such tests can be prohibitively expensive, requiring large sample sizes and long testing periods. Accelerated testing can reduce the testing time, but introduces the additional complication of having to infer the relationship between failure times under accelerated and normal operating conditions via the stressor variable.

Previous attempts to plan and analyse reliability demonstration tests have utilised power calculations and hypothesis tests or risk criteria. More recently, Wilson & Farrow (2019) proposed the use of assurance to design reliability demonstration tests, together with suitable Bayesian analyses of the test data. Assurance provides the unconditional probability that the reliability demonstration test will be passed. Work to date has focussed on Binomial and Weibull observations. This project would extend the use of assurance to design reliability demonstration tests, considering a wider class of failure time distributions and implementing an augmented MCMC scheme to evaluate the assurance more efficiently.
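As a sketch of the assurance idea for Binomial observations: the pass probability of the test is averaged over a prior on the failure probability. The prior and pass criterion below are invented for illustration, not taken from Wilson & Farrow (2019):

```python
import numpy as np

rng = np.random.default_rng(42)

def assurance(n_units, max_failures, a, b, n_sims=100_000):
    """Monte Carlo estimate of assurance: the unconditional probability
    that a binomial reliability demonstration test is passed, averaging
    the conditional pass probability over a Beta(a, b) prior on the
    failure probability p. (Illustrative prior and pass criterion.)"""
    p = rng.beta(a, b, size=n_sims)           # draw p from the prior
    failures = rng.binomial(n_units, p)       # simulate each test
    return np.mean(failures <= max_failures)  # fraction of tests passed

# E.g. 30 units on test, pass if at most 1 failure, with a prior on
# the failure probability that has mean 0.02:
print(assurance(n_units=30, max_failures=1, a=1.0, b=49.0))
```

In this simple conjugate setting the assurance is available in closed form via the beta-binomial distribution; the project's interest is in richer failure-time models where such simulation must be replaced by an efficient augmented MCMC scheme.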

Supervisor: Dr Kevin Wilson

Data sets in which each data point is a different edge-weighted network arise in a variety of contexts, such as neuroscience, molecular biology and computer science. Analysis of data sets of networks or graphs requires the development of new statistical methods, because such graphs do not live in a linear space.

Information geometry provides theory and methods for doing geometry on spaces of probability distributions. The project supervisor has recently used these tools to develop a new space of evolutionary trees, by mapping trees to certain probability distributions and studying the geometry on trees induced by this embedding. The idea behind this project is to extend these ideas from collections of edge-weighted evolutionary trees to collections of certain edge-weighted graphs. The project will involve stochastic processes on graphs, computational geometry and the development of statistical methods in the novel setting of graph space.

Supervisor: Dr Tom Nye

Classic models for count data can readily accommodate overdispersion relative to the Poisson model, but models for underdispersed counts - where the mean exceeds the variance - are less well established, and those that have been proposed are often hampered by, for instance, a lack of natural interpretation, a restricted parameter space, computational difficulties in implementation, or some combination of the three. At the individual level, one can often encounter both over- and underdispersion, or bidispersion, within the same dataset, and failure to allow for this bidispersion leads to inferences on parameters that are either conservative or anti-conservative. In this project, models to handle such bidispersed data, typically at an individual level, will be developed and applied to case studies drawn from sport and the social sciences.
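The variance-to-mean (dispersion) ratio makes the distinction concrete. A short simulation, with all distributions chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(7)

def dispersion_index(x):
    # Variance-to-mean ratio: 1 for Poisson counts,
    # < 1 under underdispersion, > 1 under overdispersion.
    return x.var() / x.mean()

n = 100_000
poisson = rng.poisson(5, n)                  # equidispersed, ratio ~ 1
under = rng.binomial(10, 0.5, n)             # mean 5, variance 2.5: underdispersed
lam = rng.gamma(shape=5, scale=1.0, size=n)  # gamma-mixed Poisson rates
over = rng.poisson(lam)                      # overdispersed (negative binomial)
```

A dataset containing individuals of both the `under` and `over` kinds is bidispersed in the sense above, and fitting a single Poisson or negative binomial model to it misstates the uncertainty in both directions.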

Supervisors: Pete Philipson & Daniel Henderson