Module

MAS8952 : Research Topics in Statistics

  • Offered for Year: 2022/23
  • Module Leader(s): Dr Daniel Henderson
  • Co-Module Leader: Professor Chris Oates
  • Lecturer: Dr Cristiano Villa, Mr Matthew Fisher
  • Owning School: Mathematics, Statistics and Physics
  • Teaching Location: Newcastle City Campus
Semesters
Semester 1 Credit Value: 15
Semester 2 Credit Value: 15
ECTS Credits: 15.0

Aims

To develop a broader knowledge of advanced statistical topics. To acquire skills in analysing complex statistical models through both theoretical analysis and data analysis.

Module summary

The module involves discussion of current research in Statistics at Newcastle and will introduce students to two or more of the areas described below. Each topic will involve theoretical development and study of applications, often with a computational aspect.

Gaussian processes: Gaussian processes are distributions on continuous functions with attractive properties which make them useful in a wide range of modelling applications, such as in bioinformatics and the study of neural networks.

Complex data structures: Most statistical methods rely on underlying linear structure within the data sets and models. However, data sets consisting of objects such as medical images, evolutionary trees or social networks can lack such structure, and this topic concerns the novel statistical methods required to analyse such data.

Statistics of extremes: This topic introduces a technique which performs a seemingly impossible task: to predict the probability of events that are more extreme than any that have happened before. Examples include calculating the required height of sea-walls to prevent flooding and modelling excessively high pollution levels.

Neural networks: Neural networks are highly flexible parametric models for regression and classification, which have produced state-of-the-art results in the analysis of images, sound and text. This topic explains the underlying mathematics and statistics of neural networks, as well as how to implement them in practice.

Statistical genetics: Statistical genetics is the study of genetic variation in humans and other organisms. By modelling random processes such as mutation, we will show how to infer the underlying evolutionary processes from observed genetic data, and apply this for example to the study of new virus variants. We will also consider methods for inferring associations between genes and diseases which aim to identify genes which cause inherited conditions.

Minimum discrepancy methods: A fundamental problem in statistics is that of selecting a small number of representative points to summarise a continuous probability distribution of interest. This topic covers the elegant mathematical theory of discrepancy, before turning attention to computational methods and how they can be used to speed up Bayesian analysis.

Design and analysis of diagnostic studies: Diagnostic studies aim to develop tests to diagnose a particular disease or condition and to assess their accuracy. In this topic we will consider (i) methods to choose an appropriate sample size for a diagnostic study, (ii) approaches to optimise the performance of a diagnostic test and (iii) approaches to infer parameters assessing the accuracy of a diagnostic test.

Sports modelling: In this topic the focus is on statistical modelling of outcomes of sporting contests. Consideration will be given to constructing rankings based on pairwise outcomes (in the form of win-loss or a match score) and on outcomes involving multiple items, which may be in the form of an explicit ordering (e.g. times in a 100m sprint race) or an implicit ordering (e.g. runs scored by several batters).

Outline Of Syllabus

This module will introduce students to two or more of the areas described below. The anticipated syllabus is given below, although there may be changes to reflect recent developments.

Gaussian Processes: Introduction to Gaussian processes. Gaussian process regression and classification (choice of covariance kernel, selection of hyper-parameters, prediction). Application to real problems.

Complex data structures: Directional data: distributions on the circle and the sphere, parameter estimation. Shape data: size measures and shape coordinates, rotation and size invariance, Procrustes analysis. Trees and networks: phylogenetic trees, networks, models of evolution, parameter inference.

Statistics of extremes: Extremal types theorem, the generalised extreme value distribution (GEV), likelihood and estimation for the GEV, its uses and limitations. Alternative extreme value characterisations: the distribution of extreme r order statistics and the generalised Pareto distribution, the point process characterisation. Application to real problems: temporal dependence, non-stationarity. Issues involved in extreme value modelling of environmental and other data. A non-mathematical overview of multivariate extremes.
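As an illustration of how the GEV is used to estimate the probability of events beyond the observed range, the following minimal sketch evaluates the GEV distribution function and the corresponding return level. The function names and parameter values are hypothetical and chosen only for illustration; in practice the parameters would be estimated from block maxima by maximum likelihood.

```python
import math


def gev_cdf(z, mu, sigma, xi):
    """GEV distribution function with location mu, scale sigma > 0 and shape xi."""
    if xi == 0.0:
        # Gumbel limit of the GEV family
        return math.exp(-math.exp(-(z - mu) / sigma))
    t = 1.0 + xi * (z - mu) / sigma
    if t <= 0.0:
        # z is outside the support: below the lower end point (xi > 0)
        # or above the upper end point (xi < 0)
        return 0.0 if xi > 0 else 1.0
    return math.exp(-t ** (-1.0 / xi))


def gev_return_level(p, mu, sigma, xi):
    """Level exceeded with probability p in any one block (e.g. one year)."""
    y = -math.log(1.0 - p)
    if xi == 0.0:
        return mu - sigma * math.log(y)
    return mu - (sigma / xi) * (1.0 - y ** (-xi))


# Hypothetical parameter values, not fitted to any real data set.
mu, sigma, xi = 3.9, 0.2, -0.05
level_100 = gev_return_level(0.01, mu, sigma, xi)  # "100-block" return level
print(level_100, gev_cdf(level_100, mu, sigma, xi))
```

The return level is the quantile of the GEV at probability 1 - p, so evaluating the distribution function at the computed level recovers 1 - p.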

Neural networks: Supervised learning; Multi-layer perceptrons; Stochastic gradient descent; Automatic differentiation and backpropagation; Regularisation; Deep learning; Application to real problems using R.


Statistical genetics: Introduction to genetics including concepts of DNA, genes, mutation and inheritance. Stochastic models of mutation and recombination. The Kingman coalescent model. Estimation of ancestry and inference of parameters in evolutionary models. Genetic association studies. Computational methods in R for simulation and for performing inference.

Minimum Discrepancy Methods: Algorithms for discrepancy minimisation; quasi Monte Carlo methods; reproducing kernels; Hilbert spaces; Stein discrepancy; post-processing of Markov chain Monte Carlo output.

Design and analysis of diagnostic studies: measures of diagnostic accuracy including sensitivity, specificity, positive and negative predictive values, ROC curves, area under the curve and likelihood ratios. Estimation and testing in a single sample, comparing the accuracy of two tests, sample size calculations via power and assurance. Regression methods for independent ROC data, methods for correcting verification bias and methods for correcting imperfect gold standard bias.
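The accuracy measures listed above can all be computed from a 2x2 table of test result against true disease status. The sketch below is illustrative only: the function name is hypothetical and the counts are made up, not taken from any study.

```python
def diagnostic_accuracy(tp, fp, fn, tn):
    """Summarise a 2x2 table of test result against true disease status.

    tp, fp, fn, tn are the counts of true positives, false positives,
    false negatives and true negatives.
    """
    sensitivity = tp / (tp + fn)  # P(test positive | diseased)
    specificity = tn / (tn + fp)  # P(test negative | not diseased)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),  # P(diseased | test positive)
        "npv": tn / (tn + fn),  # P(not diseased | test negative)
        "lr_positive": sensitivity / (1 - specificity),
        "lr_negative": (1 - sensitivity) / specificity,
    }


# Hypothetical counts, for illustration only.
summary = diagnostic_accuracy(tp=90, fp=30, fn=10, tn=170)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

Note that the predictive values depend on the prevalence of the condition in the sample, whereas sensitivity, specificity and the likelihood ratios do not.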

Sports modelling: Methods for ranking teams/players (Bradley-Terry model) based on pairwise comparisons, extensions to handling ties (Davidson model); methods for ranking multiple items (Plackett-Luce) including adjustments for multiple ties. Basic Poisson model for football scores; Dixon-Coles model. Extended count data models; zero-inflation and overdispersion. Incorporation of home advantage, dynamic models and time-weighted components. Applications to sports such as football, tennis, cricket and Formula One will be used throughout.
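As a concrete illustration of ranking from pairwise comparisons, the sketch below fits a Bradley-Terry model, in which item i beats item j with probability s_i / (s_i + s_j), using a simple fixed-point (minorisation-maximisation) iteration. The function names and the results table are hypothetical. The sketch assumes every item has at least one win and one loss, and it ignores ties and home advantage.

```python
def bradley_terry_fit(wins, iterations=200):
    """Estimate Bradley-Terry strengths from a matrix of pairwise results.

    wins[i][j] is the number of times item i beat item j.
    Returns strengths normalised to sum to 1.
    """
    n = len(wins)
    strengths = [1.0 / n] * n
    for _ in range(iterations):
        updated = []
        for i in range(n):
            total_wins = sum(wins[i][j] for j in range(n) if j != i)
            denom = sum(
                (wins[i][j] + wins[j][i]) / (strengths[i] + strengths[j])
                for j in range(n)
                if j != i
            )
            updated.append(total_wins / denom)
        scale = sum(updated)
        strengths = [s / scale for s in updated]
    return strengths


def win_probability(strengths, i, j):
    """Model probability that item i beats item j."""
    return strengths[i] / (strengths[i] + strengths[j])


# Hypothetical results for three players A, B, C (10 matches per pair).
wins = [
    [0, 7, 8],  # A beat B 7 times and C 8 times
    [3, 0, 6],  # B beat A 3 times and C 6 times
    [2, 4, 0],  # C beat A 2 times and B 4 times
]
strengths = bradley_terry_fit(wins)
print(strengths, win_probability(strengths, 0, 1))
```

At the fitted strengths, the expected number of wins for each item under the model matches its observed number of wins, which is the defining property of the maximum likelihood estimate in this model.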

Teaching Methods

Teaching Activities
Category | Activity | Number | Length | Student Hours | Comment
Guided Independent Study | Assessment preparation and completion | 40 | 1:00 | 40:00 | Completion of in course assessments
Scheduled Learning And Teaching Activities | Lecture | 60 | 1:00 | 60:00 | Formal Lectures – Present in Person
Scheduled Learning And Teaching Activities | Lecture | 6 | 1:00 | 6:00 | Revision Lectures – Present in Person
Scheduled Learning And Teaching Activities | Lecture | 15 | 1:00 | 15:00 | Problem Classes – Synchronous On-Line
Guided Independent Study | Independent study | 179 | 1:00 | 179:00 | Preparation time for lectures, background reading, coursework review
Total | | | | 300:00 |
Teaching Rationale And Relationship

Lectures are used for the delivery of theory and explanation of methods, illustrated with examples, and for giving general feedback on marked work. Problem Classes are used to help develop the students’ abilities at applying the theory to solving problems.

Assessment Methods

The format of resits will be determined by the Board of Examiners.

Exams
Description | Length | Semester | When Set | Percentage | Comment
Written Examination | 150 | 2 | A | 60 | Closed book
Other Assessment
Description | Semester | When Set | Percentage | Comment
Prob solv exercises | 1 | M | 20 | Coursework assignment
Prob solv exercises | 2 | M | 20 | Coursework assignment
Assessment Rationale And Relationship

A substantial formal unseen examination is appropriate for the assessment of the material in this module. The coursework assignments allow the students to develop their problem solving techniques, to practise the methods learnt in the module, to assess their progress and to receive feedback; these assessments have a secondary formative purpose as well as their primary summative purpose.
