# MAS8955 : Advanced Topics in Statistics B

• Offered for Year: 2024/25
• Module Leader(s): Dr Jere Koskela
• Owning School: Mathematics, Statistics and Physics
• Teaching Location: Newcastle City Campus
##### Semesters

Your programme is made up of credits; the total differs from programme to programme.

| Semester | Credit Value | ECTS Credits (European Credit Transfer System) |
|---|---|---|
| 2 | 20 | 10.0 |

#### Aims

To develop a broader knowledge of advanced statistical topics. To acquire skills in analysing complex statistical models through both theoretical study and data analysis.

##### Module Summary

The module involves discussion of current research in Statistics at Newcastle and will introduce students to two or more areas, which could include those below. Each topic will involve theoretical development and the study of applications, often with a computational aspect.

Gaussian processes: Gaussian processes are distributions over continuous functions whose attractive properties make them useful in a wide range of modelling applications, such as bioinformatics and the study of neural networks.

Complex data structures: Most statistical methods rely on underlying linear structure within the data sets and models. However, data sets consisting of objects such as medical images, evolutionary trees or social networks can lack such structure, and this topic concerns the novel statistical methods required to analyse such data.

Statistics of extremes: This topic introduces a technique which performs a seemingly impossible task: to predict the probability of events that are more extreme than any that have happened before. Examples include calculating the required height of sea-walls to prevent flooding and modelling excessively high pollution levels.

Neural networks: Neural networks are highly flexible parametric models for regression and classification, which have produced state-of-the-art results in the analysis of images, sound and text. This topic explains the underlying mathematics and statistics of neural networks, as well as how to implement them in practice.

Statistical genetics: Statistical genetics is the study of genetic variation in humans and other organisms. By modelling random processes such as mutation, we will show how to infer the underlying evolutionary processes from observed genetic data, and apply this, for example, to the study of new virus variants. We will also consider methods for inferring associations between genes and diseases, which aim to identify genes that cause inherited conditions.

Minimum discrepancy methods: A fundamental problem in statistics is that of selecting a small number of representative points to summarise a continuous probability distribution of interest. This topic covers the elegant mathematical theory of discrepancy, before turning attention to computational methods and how they can be used to speed up Bayesian analysis.

Design and analysis of diagnostic studies: Diagnostic studies aim to develop tests for diagnosing a particular disease or condition and to assess their accuracy. In this topic we will consider (i) methods to choose an appropriate sample size for a diagnostic study, (ii) approaches to optimise the performance of a diagnostic test and (iii) approaches to infer parameters assessing the accuracy of a diagnostic test.

Sports modelling: In this topic the focus is on statistical modelling of the outcomes of sporting contests. Consideration will be given to constructing rankings based on pairwise outcomes (in the form of win-loss or a match score) and on outcomes involving multiple items, which may be in the form of an explicit ordering (e.g. times in a 100m sprint race) or an implicit ordering (e.g. runs scored by several batters).

#### Outline Of Syllabus

This module will introduce students to two or more of the areas which could include those listed below. The anticipated syllabus is given below, although there may be changes to reflect recent developments.

Gaussian Processes: Introduction to Gaussian processes. Gaussian process regression and classification (choice of covariance kernel, selection of hyper-parameters, prediction). Application to real problems.

Complex data structures: Directional data: distributions on the circle and the sphere, parameter estimation. Shape data: size measures and shape coordinates, rotation and size invariance, Procrustes analysis. Trees and networks: phylogenetic trees, networks, models of evolution, parameter inference.

Statistics of extremes: Extremal types theorem, the generalised extreme value (GEV) distribution, likelihood and estimation for the GEV, its uses and limitations. Alternative extreme value characterisations: the distribution of the r largest order statistics, the generalised Pareto distribution and the point process characterisation. Application to real problems: temporal dependence, non-stationarity. Issues involved in extreme value modelling of environmental and other data. A non-mathematical overview of multivariate extremes.

Neural networks: Supervised learning; Multi-layer perceptrons; Stochastic gradient descent; Automatic differentiation and backpropagation; Regularisation; Deep learning; Application to real problems using R.

Statistical genetics: Review of discrete-time stochastic processes, for instance Markov chains. Extensions to continuous-time stochastic processes. Discrete-time reproductive models: Wright-Fisher, Moran. Coalescent limit. Properties of the standard coalescent, models of mutation, summary statistics and tests for neutrality, e.g. Tajima's D. Hardy-Weinberg laws, recombination, linkage mapping, linkage disequilibrium, association testing, an overview of genome-wide association studies.

Minimum Discrepancy Methods: Algorithms for discrepancy minimisation; quasi-Monte Carlo methods; reproducing kernel Hilbert spaces; Stein discrepancy; post-processing of Markov chain Monte Carlo output.

Design and analysis of diagnostic studies: Measures of diagnostic accuracy, including sensitivity, specificity, positive and negative predictive values, ROC curves, area under the curve and likelihood ratios. Estimation and testing in a single sample, comparing the accuracy of two tests, sample size calculations via power and assurance. Regression methods for independent ROC data, methods for correcting verification bias and methods for correcting imperfect gold standard bias.

Sports modelling: Methods for ranking teams/players (Bradley-Terry model) based on pairwise comparisons, extensions to handling ties (Davidson model); methods for ranking multiple items (Plackett-Luce) including adjustments for multiple ties. Basic Poisson model for football scores; Dixon-Coles model. Extended count data models; zero-inflation and overdispersion. Incorporation of home advantage, dynamic models and time-weighted components. Applications to sports such as football, tennis, cricket and Formula One will be used throughout.

#### Teaching Methods

##### Teaching Activities
| Category | Activity | Number | Length | Student Hours | Comment |
|---|---|---|---|---|---|
| Scheduled Learning And Teaching Activities | Lecture | 4 | 1:00 | 4:00 | Revision Lectures |
| Scheduled Learning And Teaching Activities | Lecture | 40 | 1:00 | 40:00 | Formal Lectures |
| Guided Independent Study | Assessment preparation and completion | 30 | 1:00 | 30:00 | Completion of in-course assessments |
| Scheduled Learning And Teaching Activities | Lecture | 10 | 1:00 | 10:00 | Problem Classes |
| Guided Independent Study | Independent study | 116 | 1:00 | 116:00 | Preparation time for lectures, background reading, coursework review |
| Total | | | | 200:00 | |
##### Teaching Rationale And Relationship

The teaching methods are appropriate to allow students to develop a wide range of skills, from understanding basic concepts and facts to higher-order thinking. Lectures are used for the delivery of theory and explanation of methods, illustrated with examples, and for giving general feedback on marked work. Problem Classes and computer practicals are used to help develop the students’ abilities at applying the theory to solving problems.

#### Assessment Methods

The format of resits will be determined by the Board of Examiners.

##### Exams
| Description | Length (minutes) | Semester | When Set | Percentage | Comment |
|---|---|---|---|---|---|
| Written Examination | 150 | 2 | A | 80 | N/A |
##### Other Assessment
| Description | Semester | When Set | Percentage | Comment |
|---|---|---|---|---|
| Prob solv exercises | 2 | M | 5 | Problem-solving exercises assessment |
| Written exercise | 2 | M | 5 | In-class test |
| Prob solv exercises | 2 | M | 5 | Problem-solving exercises assessment |
| Written exercise | 2 | M | 5 | In-class test |
##### Assessment Rationale And Relationship

A substantial formal unseen examination is appropriate for assessing the material in this module. The format of the examination will enable students to reliably demonstrate their own knowledge, understanding and application of the learning outcomes. The assurance of academic integrity forms a necessary part of the programme accreditation. Examination problems may require a synthesis of concepts and strategies from different sections, and may admit more than one method of solution. The examination time allows the students to test different strategies, work out examples and gather evidence for deciding on an effective strategy, while carefully articulating their ideas and explicitly

The coursework assignments and in-class tests allow the students to develop their problem-solving techniques, to practise the methods learnt in the module, to assess their progress and to receive feedback; these assessments have a secondary formative purpose as well as their primary summative purpose.