Module

MAS8957 : Advanced Topics in Statistics B

  • Offered for Year: 2024/25
  • Module Leader(s): Dr Jere Koskela
  • Owning School: Mathematics, Statistics and Physics
  • Teaching Location: Newcastle City Campus
Semesters

Your programme is made up of credits; the total differs from programme to programme.

Semester 2 Credit Value: 10
ECTS Credits: 5.0 (European Credit Transfer System)

Aims

To develop a broader knowledge of advanced statistical topics. To acquire skills in analysing complex statistical models through both theoretical analysis and data analysis.

Module summary

The module involves discussion of current research in Statistics at Newcastle and will introduce students to one or more areas, which could include those below. Each topic will involve theoretical development and study of applications, often with a computational aspect.

Continuous-time stochastic processes: This topic introduces the theory of stochastic processes evolving in continuous time, which are a fundamental building block in probabilistic modelling in a broad class of applications such as asset pricing, ecology and epidemiology, chemical reactions, and statistical physics. We will encounter mathematical tools for building and analysing stochastic processes in continuous time, and use them to consider some of these applied examples.

Gaussian processes: Gaussian processes are distributions on continuous functions with attractive properties which make them useful in a wide range of modelling applications, such as in bioinformatics and the study of neural networks.

Complex data structures: Most statistical methods rely on underlying linear structure within the data sets and models. However, data sets consisting of objects such as medical images, evolutionary trees or social networks can lack such structure, and this topic concerns the novel statistical methods required to analyse such data.

Statistics of extremes: This topic introduces a technique which performs a seemingly impossible task: to predict the probability of events that are more extreme than any that have happened before. Examples include calculating the required height of sea-walls to prevent flooding and modelling excessively high pollution levels.

Neural networks: Neural networks are highly flexible parametric models for regression and classification, which have produced state-of-the-art results in the analysis of images, sound and text. This topic explains the underlying mathematics and statistics of neural networks, as well as how to implement them in practice.

Statistical genetics: Statistical genetics is the study of genetic variation in humans and other organisms. By modelling random processes such as mutation, we will show how to infer the underlying evolutionary processes from observed genetic data, and apply this, for example, to the study of new virus variants. We will also consider methods for inferring associations between genes and diseases, which aim to identify genes that cause inherited conditions.

Minimum discrepancy methods: A fundamental problem in statistics is that of selecting a small number of representative points to summarise a continuous probability distribution of interest. This topic covers the elegant mathematical theory of discrepancy, before turning attention to computational methods and how they can be used to speed up Bayesian analysis.

Design and analysis of diagnostic studies: Diagnostic studies aim to develop tests to diagnose a particular disease or condition and to assess their accuracy. In this topic we will consider (i) methods to choose an appropriate sample size for a diagnostic study, (ii) approaches to optimise the performance of a diagnostic test and (iii) approaches to infer parameters assessing the accuracy of a diagnostic test.

Sports modelling: This topic focuses on statistical modelling of the outcomes of sporting contests. Consideration will be given to constructing rankings based on pairwise outcomes (in the form of win-loss or a match score) and on outcomes involving multiple items, which may be in the form of an explicit ordering (e.g. times in a 100m sprint race) or an implicit ordering (e.g. runs scored by several batters).

Outline Of Syllabus

This module will introduce students to two or more areas, which could include those listed below. The anticipated syllabus for each area is as follows, although it may change to reflect recent developments.

Gaussian Processes: Introduction to Gaussian processes. Gaussian process regression and classification (choice of covariance kernel, selection of hyper-parameters, prediction). Application to real problems.

Complex data structures: Directional data: distributions on the circle and the sphere, parameter estimation. Shape data: size measures and shape coordinates, rotation and size invariance, Procrustes analysis. Trees and networks: phylogenetic trees, networks, models of evolution, parameter inference.

Statistics of extremes: Extremal types theorem, the generalised extreme value distribution (GEV), likelihood and estimation for the GEV, its uses and limitations. Alternative extreme value characterisations: the distribution of extreme r order statistics and the generalised Pareto distribution, the point process characterisation. Application to real problems: temporal dependence, non-stationarity. Issues involved in extreme value modelling of environmental and other data. A non-mathematical overview of multivariate extremes.

Neural networks: Supervised learning; Multi-layer perceptrons; Stochastic gradient descent; Automatic differentiation and backpropagation; Regularisation; Deep learning; Application to real problems using R.

Statistical genetics: Review of discrete-time stochastic processes, for instance Markov chains. Extensions to continuous-time stochastic processes. Discrete-time reproductive models - Wright-Fisher, Moran. Coalescent limit. Properties of the standard coalescent, models of mutation, summary statistics and tests for neutrality, e.g. Tajima's D. Hardy-Weinberg laws, recombination, linkage mapping, linkage disequilibrium, association testing, an overview of genome-wide association studies.

Minimum discrepancy methods: Algorithms for discrepancy minimisation; quasi-Monte Carlo methods; reproducing kernels; Hilbert spaces; Stein discrepancy; post-processing of Markov chain Monte Carlo output.

Design and analysis of diagnostic studies: Measures of diagnostic accuracy including sensitivity, specificity, positive and negative predictive values, ROC curves, area under the curve and likelihood ratios. Estimation and testing in a single sample, comparing the accuracy of two tests, sample size calculations via power and assurance. Regression methods for independent ROC data, methods for correcting verification bias and methods for correcting imperfect gold standard bias.

Sports modelling: Methods for ranking teams/players (Bradley-Terry model) based on pairwise comparisons, extensions to handling ties (Davidson model); methods for ranking multiple items (Plackett-Luce) including adjustments for multiple ties. Basic Poisson model for football scores; Dixon-Coles model. Extended count data models; zero-inflation and overdispersion. Incorporation of home advantage, dynamic models and time-weighted components. Applications to sports such as football, tennis, cricket and Formula One will be used throughout.

Teaching Methods

Teaching Activities
Category | Activity | Number | Length | Student Hours | Comment
Guided Independent Study | Assessment preparation and completion | 1 | 2:00 | 2:00 | Unseen exam
Scheduled Learning And Teaching Activities | Lecture | 5 | 1:00 | 5:00 | Problem Classes
Scheduled Learning And Teaching Activities | Lecture | 1 | 2:00 | 2:00 | Revision Lectures
Scheduled Learning And Teaching Activities | Lecture | 10 | 2:00 | 20:00 | Formal Lectures
Guided Independent Study | Assessment preparation and completion | 13 | 1:00 | 13:00 | Revision for unseen exam
Guided Independent Study | Assessment preparation and completion | 2 | 4:00 | 8:00 | Completion of in course assessments
Guided Independent Study | Independent study | 25 | 1:00 | 25:00 | Background reading on lectured content
Guided Independent Study | Independent study | 2 | 1:30 | 3:00 | Review of coursework
Guided Independent Study | Independent study | 22 | 1:00 | 22:00 | Preparation time for lectures
Total | | | | 100:00 |

Teaching Rationale And Relationship

Lectures are used for the delivery of theory and explanation of methods, illustrated with examples, and for giving general feedback on marked work. Problem classes are used to help develop the students’ ability to apply the theory to solving problems.

The teaching methods are appropriate to allow students to develop a wide range of skills, from understanding basic concepts and facts to higher-order thinking.

Assessment Methods

The format of resits will be determined by the Board of Examiners

Exams
Description | Length | Semester | When Set | Percentage | Comment
Written Examination | 120 | 2 | A | 80 | N/A

Other Assessment
Description | Semester | When Set | Percentage | Comment
Prob solv exercises | 2 | M | 10 | Problem-solving exercises assessment
Written exercise | 2 | M | 10 | In class test

Assessment Rationale And Relationship

A substantial formal unseen examination is appropriate for the assessment of the material in this module. The format of the examination will enable students to reliably demonstrate their own knowledge, understanding and application of learning outcomes.

Examination problems may require a synthesis of concepts and strategies from different sections, and may admit more than one method of solution. The examination time allows the students to test different strategies, work out examples and gather evidence for deciding on an effective strategy, while carefully articulating their ideas and explicitly citing the theory they are using.

The coursework assignment and in-class test allow the students to develop their problem-solving techniques, to practise the methods learnt in the module, to assess their progress and to receive feedback; these assessments have a secondary formative purpose as well as their primary summative purpose.
