Module

MAS8952 : Topics in Statistics

  • Offered for Year: 2021/22
  • Module Leader(s): Dr Tom Nye
  • Lecturer: Professor Darren Wilkinson
  • Owning School: Mathematics, Statistics and Physics
  • Teaching Location: Newcastle City Campus
Semesters
Semester 1 Credit Value: 15
Semester 2 Credit Value: 15
ECTS Credits: 15.0

Aims

To develop a broader knowledge of advanced statistical topics. To acquire skills in analysing
complex statistical models through both theoretical analysis and data analysis.

Module summary

This module will introduce students to two of the areas described below:

Smoothing and Gaussian Processes: Nonparametric smoothing techniques are an important class of tools for identifying the true signal hidden in noisy data. These tools are widely used in statistical analysis in a variety of application areas including biostatistics and bioinformatics. This topic gives a thorough overview of various smoothing and nonparametric regression methodologies, with emphasis on both theoretical and computational aspects. Particular attention is given to nonparametric regression using Gaussian processes, and issues such as parameter tuning, prediction and classification. The course includes illustrations with real data examples.

Stochastic differential equation models: Diffusion processes satisfying Ito stochastic differential equations (SDEs) form a class of continuous-time, continuous-valued Markov stochastic processes that can be used to model a wide range of physical phenomena. Some application areas include econometrics, engineering, chemistry, statistical physics and biology. This topic is concerned with both the construction of SDE models (from a statistical viewpoint) and the challenging problem of fitting SDEs (within the Bayesian paradigm) to data observed at discrete times. Illustrations with real data examples will be given.

Spatial Statistics: In many fields, such as geology, ecology, image processing and atmospheric
sciences, data are collected at different locations in space. The analysis of such data requires
special treatment because observations are typically highly correlated across locations which
generally do not have a natural order and are often arranged irregularly in space. This topic
explores the three main approaches for modelling spatially referenced data: (i) geostatistics,
where there is a continuously varying spatial index; (ii) lattice data, where the spatial index
takes a discrete set of values over a regular or irregular grid; and (iii) spatial point processes,
where the spatial index itself is treated as a random variable.

Complex data structures: Most statistical methods rely on underlying linear structure within
the data sets and models. However, advances in computing power have enabled models with
more complex structure to be applied, and at the same time, advances in experimental techniques such as DNA sequencing and medical imaging have given rise to data with more complex structure. This topic explores statistics for these novel data sets and models, considering three related areas: (i) directional data; (ii) shape data; (iii) trees and networks.

Statistics of Extremes: This topic introduces a technique which performs a seemingly impossible
task: to predict the probability of events that are more extreme than any that have happened
before. For example, the governmental coastal flood defence division employs statisticians using
these methods to calculate the required height of sea-walls to prevent flooding. Extreme value
statistics is also used to help engineers decide how strong to build bridges or oil rigs and to
model excessively high pollution levels.

Neural networks: Neural networks are a "machine learning" technique for classification and
regression. They can be viewed as highly flexible extensions of linear regression with a large
number of parameters, which often require a large amount of data to fit. In recent years they
have produced state-of-the-art results in fields where "big data" is available, such as analysing
images, sound and text. This topic explains the underlying mathematics and statistics and
how to implement neural networks in practice.

Outline Of Syllabus

This module will introduce students to two of the areas described below. Students will be advised of the two areas before pre-registration.

Smoothing and Gaussian Processes: Overview of various smoothers (moving average, LOWESS,
kernel smoother). Splines (function approximation and splines, cubic splines, B-splines); selection of tuning parameters. Gaussian process regression and classification (choice of covariance
kernel, selection of hyper-parameters, prediction). Application to real problems.
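
As an illustration of the kernel smoother listed above, the following is a minimal Python sketch of a Nadaraya-Watson estimator with a Gaussian kernel. It is not taken from the module materials; the function names, the data and the bandwidth value are all made up for the example.

```python
import math

def gaussian_kernel(u):
    """Standard normal density, used here as the smoothing kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def kernel_smooth(x_obs, y_obs, x_new, bandwidth):
    """Nadaraya-Watson estimate at x_new: a kernel-weighted average of y_obs."""
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    weights = [gaussian_kernel((x_new - x) / bandwidth) for x in x_obs]
    total = sum(weights)
    return sum(w * y for w, y in zip(weights, y_obs)) / total

# Hypothetical noisy observations of a roughly linear signal.
x_obs = [0.0, 1.0, 2.0, 3.0, 4.0]
y_obs = [0.1, 0.9, 2.2, 2.8, 4.1]

# Smoothed values at the observed locations; the bandwidth is the tuning parameter.
fitted = [kernel_smooth(x_obs, y_obs, x, bandwidth=1.0) for x in x_obs]
```

A smaller bandwidth follows the data more closely, while a larger one gives a smoother curve; choosing it is the tuning-parameter selection problem mentioned in the syllabus.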

Stochastic differential equation models: Construction from ODEs; Brownian motion, Ito integral. Solution of an SDE; linear SDEs, Ito's formula including sketch proof. Numerical methods for SDEs; Euler-Maruyama, higher order methods including the Milstein scheme.
End-point conditioned SDEs; construction, solution for tractable cases, approximation for
intractable cases. Bayesian inference; tractable and intractable cases via Metropolis-Hastings,
data augmentation. Application to real data examples.
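
For orientation, the Euler-Maruyama scheme named above can be sketched in a few lines of Python. This is an illustrative sketch only, not the module's own code; the function name, the Ornstein-Uhlenbeck example and all parameter values are assumptions made for the example.

```python
import math
import random

def euler_maruyama(drift, diffusion, x0, t_end, n_steps, rng):
    """Simulate one approximate path of dX = drift(X) dt + diffusion(X) dW on [0, t_end]."""
    dt = t_end / n_steps
    path = [x0]
    x = x0
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment, distributed N(0, dt)
        x = x + drift(x) * dt + diffusion(x) * dw
        path.append(x)
    return path

# Hypothetical example: an Ornstein-Uhlenbeck process dX = theta (mu - X) dt + sigma dW.
theta, mu, sigma = 1.5, 2.0, 0.3
rng = random.Random(1)  # fixed seed so the path is reproducible
path = euler_maruyama(
    drift=lambda x: theta * (mu - x),
    diffusion=lambda x: sigma,
    x0=0.0,
    t_end=5.0,
    n_steps=500,
    rng=rng,
)
```

The returned list holds the state at each of the n_steps + 1 grid times. Setting the diffusion to zero reduces the scheme to the forward Euler method for the underlying ODE.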

Spatial Statistics: Descriptive methods for identifying spatial trend and correlation. Geostatistics - stationary Gaussian processes; variograms, covariance functions and their estimation;
spatial prediction and kriging. Lattice data - Markov random field models; conditionally autoregressive (CAR) models. Spatial point processes - exploratory analysis; homogeneous and
inhomogeneous Poisson process models. Use of R to fit models of each type to data.
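
To illustrate variogram estimation, here is a minimal sketch of the method-of-moments empirical semivariogram. It is written in Python purely for illustration (the module itself uses R); the function name, binning scheme and toy data are assumptions, not module material.

```python
import math

def empirical_variogram(coords, values, bin_width, n_bins):
    """Average of 0.5 * (z_i - z_j)^2 over pairs, grouped by separation distance.

    Bin k collects pairs with distance in [k * bin_width, (k + 1) * bin_width).
    Bins that contain no pairs are returned as None.
    """
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            h = math.dist(coords[i], coords[j])  # Euclidean distance between sites
            k = int(h // bin_width)
            if k < n_bins:
                sums[k] += 0.5 * (values[i] - values[j]) ** 2
                counts[k] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]

# Toy data: three sites on a line with a linear trend in the observed values.
coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
values = [0.0, 1.0, 2.0]
gamma_hat = empirical_variogram(coords, values, bin_width=1.0, n_bins=3)
```

In practice a parametric variogram model would then be fitted to these binned estimates before kriging.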

Complex data structures: Directional data: distributions on the circle and the sphere, parameter estimation. Shape data: size measures and shape coordinates, rotation and size invariance,
Procrustes analysis. Trees and networks: phylogenetic trees, networks, models of evolution,
parameter inference.
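
One reason directional data need their own methods is that the ordinary arithmetic mean of angles can be misleading: the average of 350 degrees and 10 degrees is 180 degrees, which points the wrong way. The following Python sketch (illustrative only, with a made-up function name and data) computes the mean direction and mean resultant length from the unit vectors instead.

```python
import math

def circular_mean(angles):
    """Mean direction (radians, in [-pi, pi]) and mean resultant length (in [0, 1])."""
    n = len(angles)
    c = sum(math.cos(a) for a in angles) / n
    s = sum(math.sin(a) for a in angles) / n
    return math.atan2(s, c), math.hypot(c, s)

# Two hypothetical compass bearings either side of north.
angles = [math.radians(350.0), math.radians(10.0)]
direction, resultant_length = circular_mean(angles)
```

A mean resultant length close to 1 indicates tightly concentrated directions, and a value close to 0 indicates widely dispersed ones.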

Statistics of Extremes: Extremal types theorem, the generalised extreme value distribution
(GEV), likelihood and estimation for the GEV, its uses and limitations. Alternative extreme
value characterisations: the distribution of the r largest order statistics and the generalised Pareto
distribution, the point process characterisation. Application to real problems: temporal dependence, non-stationarity. Issues involved in extreme value modelling of environmental and other
data. A non-mathematical overview of multivariate extremes.
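
As a small illustration of how the GEV is used, the Python sketch below evaluates the GEV distribution function and the return level z_p, defined as the level exceeded with probability p in any one block (for example, one year). It is not taken from the module materials, and the function names and parameter values are made up for the example.

```python
import math

def gev_cdf(z, mu, sigma, xi):
    """GEV distribution function with location mu, scale sigma > 0 and shape xi."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    s = (z - mu) / sigma
    if abs(xi) < 1e-12:
        return math.exp(-math.exp(-s))  # Gumbel limit as xi -> 0
    t = 1.0 + xi * s
    if t <= 0.0:
        # Outside the support: below the lower end point (xi > 0)
        # or above the upper end point (xi < 0).
        return 0.0 if xi > 0 else 1.0
    return math.exp(-t ** (-1.0 / xi))

def gev_return_level(p, mu, sigma, xi):
    """Level z_p satisfying G(z_p) = 1 - p, for 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    y = -math.log(1.0 - p)
    if abs(xi) < 1e-12:
        return mu - sigma * math.log(y)
    return mu - (sigma / xi) * (1.0 - y ** (-xi))

# Hypothetical parameter values; p = 0.01 corresponds to a "100-block" return level.
z_100 = gev_return_level(0.01, mu=3.0, sigma=1.0, xi=0.1)
```

In applications the parameters would be replaced by estimates, for instance maximum likelihood estimates from block maxima, and the uncertainty in z_p would also need to be quantified.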

Neural networks: Supervised learning; Multi-layer perceptrons; Stochastic gradient descent;
Automatic differentiation and backpropagation; Regularisation; Deep learning; Application to
real problems using R.
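
The link between backpropagation and stochastic gradient descent can be illustrated with a very small multi-layer perceptron. The sketch below is in Python purely for illustration (the module itself uses R) and is not module material; the network size, parameter values, learning rate and training example are all made up.

```python
import math

def forward(params, x):
    """One-hidden-layer perceptron with tanh hidden units and a linear output."""
    w1, b1, w2, b2 = params
    hidden = [math.tanh(w * x + b) for w, b in zip(w1, b1)]
    output = sum(v * h for v, h in zip(w2, hidden)) + b2
    return hidden, output

def loss_and_grads(params, x, y):
    """Squared-error loss on one example and its gradient via the chain rule."""
    w1, b1, w2, b2 = params
    hidden, output = forward(params, x)
    err = output - y
    loss = 0.5 * err * err
    g_w2 = [err * h for h in hidden]
    g_b2 = err
    # Derivative with respect to each hidden pre-activation: tanh'(a) = 1 - tanh(a)^2.
    delta = [err * v * (1.0 - h * h) for v, h in zip(w2, hidden)]
    g_w1 = [d * x for d in delta]
    g_b1 = delta
    return loss, (g_w1, g_b1, g_w2, g_b2)

def sgd_step(params, grads, lr):
    """Move every parameter a small step against its gradient."""
    w1, b1, w2, b2 = params
    g_w1, g_b1, g_w2, g_b2 = grads
    return (
        [w - lr * g for w, g in zip(w1, g_w1)],
        [b - lr * g for b, g in zip(b1, g_b1)],
        [v - lr * g for v, g in zip(w2, g_w2)],
        b2 - lr * g_b2,
    )

# Hypothetical starting parameters (two hidden units) and one training example.
params = ([0.5, -0.3], [0.1, 0.2], [0.7, -0.4], 0.05)
x, y = 0.8, 0.3
loss, grads = loss_and_grads(params, x, y)
new_params = sgd_step(params, grads, lr=0.1)
new_loss, _ = loss_and_grads(new_params, x, y)
```

Here the gradient is derived by hand; automatic differentiation produces the same quantities without manual derivation, which is what makes deeper networks practical to train.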

Teaching Methods

Teaching Activities
Category | Activity | Number | Length | Student Hours | Comment
Scheduled Learning And Teaching Activities | Lecture | 60 | 1:00 | 60:00 | Formal Lectures – Present in Person
Scheduled Learning And Teaching Activities | Lecture | 6 | 1:00 | 6:00 | Revision Lectures – Present in Person
Scheduled Learning And Teaching Activities | Lecture | 15 | 1:00 | 15:00 | Problem Classes – Synchronous On-Line
Guided Independent Study | Assessment preparation and completion | 40 | 1:00 | 40:00 | Completion of in course assessments
Guided Independent Study | Independent study | 179 | 1:00 | 179:00 | Preparation time for lectures, background reading, coursework review
Total | | | | 300:00 |
Teaching Rationale And Relationship

Lectures are used for the delivery of theory and explanation of methods, illustrated with examples, and for giving general feedback on marked work. Problem Classes are used to help students develop their ability to apply the theory to solve problems.

Assessment Methods

The format of resits will be determined by the Board of Examiners.

Exams
Description | Length | Semester | When Set | Percentage | Comment
Written Examination | 45 | 1 | M | 10 | Class test 1
Written Examination | 45 | 2 | M | 10 | Class test 2
Written Examination | 150 | 2 | A | 60 | Closed book
Other Assessment
Description | Semester | When Set | Percentage | Comment
Prob solv exercises | 1 | M | 10 | Coursework assignment
Prob solv exercises | 2 | M | 10 | Coursework assignment
Assessment Rationale And Relationship

A substantial formal unseen examination is appropriate for the assessment of the material in this module. The coursework assignments allow the students to develop their problem solving techniques, to practise the methods learnt in the module, to assess their progress and to receive feedback; these assessments have a secondary formative purpose as well as their primary summative purpose.

In the event of on-campus examinations not being possible, an on-line alternative assessment will be used for written examination 1.
