# MAS8918 : Topics in Statistical Modelling A

• Offered for Year: 2023/24
• Module Leader(s): Dr Daniel Henderson
• Lecturer: Dr Markus Rau
• Owning School: Mathematics, Statistics and Physics
• Teaching Location: Newcastle City Campus
##### Semesters

Your programme is made up of credits; the total differs from programme to programme.

• Semester 2 Credit Value: 20
• ECTS (European Credit Transfer System) Credits: 10.0

#### Aims

To develop a broader knowledge of statistical topics. To acquire skills in analysing statistical models through both theoretical analysis and data analysis.

Module summary

The module involves discussion of current research in Statistics at Newcastle and will introduce students to two of the areas which could include those below. Each topic will involve theoretical development and study of applications, often with a computational aspect.

Statistics of extremes: This topic introduces a technique which performs a seemingly impossible task: to predict the probability of events that are more extreme than any that have happened before. Examples include calculating the required height of sea-walls to prevent flooding and modelling excessively high pollution levels.
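As a rough illustration of how such extrapolation works, the sketch below computes a return level from a generalised extreme value (GEV) distribution. The parameter values (`mu`, `sigma`, `xi`) are hypothetical and are not taken from any real data set; in practice they would be estimated, for example by maximum likelihood from annual maxima.

```python
import math

def gev_cdf(z, mu, sigma, xi):
    """GEV distribution function G(z) with location mu, scale sigma, shape xi."""
    if abs(xi) < 1e-12:
        # Gumbel limit as xi -> 0
        return math.exp(-math.exp(-(z - mu) / sigma))
    t = 1.0 + xi * (z - mu) / sigma
    if t <= 0.0:
        # z lies outside the support: below the lower end point when xi > 0,
        # above the upper end point when xi < 0
        return 0.0 if xi > 0 else 1.0
    return math.exp(-t ** (-1.0 / xi))

def gev_return_level(p, mu, sigma, xi):
    """Level exceeded with probability p in any one block (e.g. one year)."""
    y = -math.log(1.0 - p)
    if abs(xi) < 1e-12:
        return mu - sigma * math.log(y)
    return mu - (sigma / xi) * (1.0 - y ** (-xi))

# Hypothetical parameters for annual maximum sea levels (metres)
mu, sigma, xi = 3.0, 0.5, 0.1
z100 = gev_return_level(0.01, mu, sigma, xi)  # "100-year" return level
print(round(z100, 3), round(gev_cdf(z100, mu, sigma, xi), 4))
```

The return level may well exceed every observation in the data, which is exactly the kind of extrapolation described above; its reliability depends on how well the GEV model fits.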

Sports modelling: In this topic the focus is on statistical modelling of outcomes of sporting contests. Consideration will be given to constructing rankings based on pairwise outcomes (in the form of win-loss or a match score) and on outcomes involving multiple items, which may be in the form of an explicit ordering (e.g. times in a 100m sprint race) or an implicit order (e.g. runs scored by several batters).
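A minimal sketch of ranking from pairwise outcomes is given below, using the Bradley-Terry model, in which player i beats player j with probability s_i / (s_i + s_j). The win counts are made up for illustration, and the fitting routine is a simple fixed-point (minorise-maximise) iteration, one of several ways the model could be fitted.

```python
def fit_bradley_terry(wins, n_iter=500):
    """Estimate Bradley-Terry strengths from wins[(i, j)] = times i beat j."""
    items = sorted({name for pair in wins for name in pair})
    strength = {i: 1.0 / len(items) for i in items}
    total_wins = {i: sum(w for (a, _), w in wins.items() if a == i) for i in items}
    for _ in range(n_iter):
        updated = {}
        for i in items:
            denom = 0.0
            for j in items:
                if j == i:
                    continue
                # number of contests between i and j, in either direction
                n_ij = wins.get((i, j), 0) + wins.get((j, i), 0)
                denom += n_ij / (strength[i] + strength[j])
            updated[i] = total_wins[i] / denom
        norm = sum(updated.values())
        # strengths are only identified up to scale, so normalise to sum to 1
        strength = {i: v / norm for i, v in updated.items()}
    return strength

# Hypothetical results: each pair of players met four times
wins = {
    ("A", "B"): 3, ("B", "A"): 1,
    ("B", "C"): 3, ("C", "B"): 1,
    ("A", "C"): 3, ("C", "A"): 1,
}
strength = fit_bradley_terry(wins)
ranking = sorted(strength, key=strength.get, reverse=True)
print(ranking, {k: round(v, 3) for k, v in strength.items()})
```

This sketch assumes every player has at least one win and one loss and that all players are linked through contests; otherwise the maximum likelihood estimates need not exist.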

Time series: A time series is a set of data ordered with respect to time, such as the carbon dioxide concentration at a specific location measured at noon each day or the sales of a product recorded each month. Often in statistics, data are regarded as independent draws from a population. In time series analysis we typically do not regard consecutive observations as independent, and we build models to represent this dependence. Time series exhibit features such as trends and seasonal, or periodic, behaviour. In this topic we consider modelling and inference for time series and forecasting future observations.
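To make the idea of dependence between consecutive observations concrete, the sketch below simulates a first-order autoregressive process, AR(1), and compares its sample autocorrelations with the theoretical values phi^k. The parameter values and the seed are arbitrary choices for illustration; Python is used here only as a convenient notation.

```python
import random

def simulate_ar1(phi, sigma, n, seed=0):
    """Simulate X_t = phi * X_{t-1} + e_t with e_t ~ N(0, sigma^2), |phi| < 1."""
    rng = random.Random(seed)
    # draw the starting value from the stationary distribution
    x = rng.gauss(0.0, sigma / (1.0 - phi ** 2) ** 0.5)
    series = []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, sigma)
        series.append(x)
    return series

def sample_acf(series, lag):
    """Sample autocorrelation at the given lag."""
    n = len(series)
    mean = sum(series) / n
    c0 = sum((v - mean) ** 2 for v in series)
    ck = sum((series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag))
    return ck / c0

phi = 0.7
series = simulate_ar1(phi=phi, sigma=1.0, n=5000)
for lag in range(4):
    # columns: lag, sample autocorrelation, theoretical autocorrelation phi^lag
    print(lag, round(sample_acf(series, lag), 3), round(phi ** lag, 3))
```

For independent draws the sample autocorrelations at non-zero lags would all be close to zero; here they decay geometrically, which is the signature of an AR(1) process in a correlogram.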

Survival analysis: There are many areas where interest focuses on data that measure the time to some event. In recent decades the principal application for such data has been how long patients survive before some event occurs. The event may be death, the recurrence of a disease which had been in remission, or some other event. Applications are not solely medical: how long it takes a battery to run down or how long a component in a machine lasts before it fails are just two industrial examples. Such data are known as survival data, or sometimes lifetime data, and their analysis is called survival analysis. The main complication with survival data is that many observations will be ‘censored’, i.e. they are only partially observed. For example, when a trial of a new treatment for cancer is terminated many of the patients will still be alive. Therefore the survival times of those who died will be known exactly, whereas for those still alive at the end of the trial the survival time is known only to exceed the time observed so far. Methods for dealing with this form of data will be considered.
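One standard way of handling censoring is the Kaplan-Meier estimator of the survival function. The sketch below is a minimal illustration on made-up data, where `events[i]` is 1 if the event was observed at `times[i]` and 0 if the observation was censored at that time.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time at which an event was observed."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # gather all subjects whose observed or censored time equals t
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            # multiply by the conditional probability of surviving past t
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        # both events and censored subjects leave the risk set after time t
        at_risk -= removed
    return curve

# Hypothetical data: five patients, two of them censored (event = 0)
times = [2, 3, 4, 5, 6]
events = [1, 0, 1, 1, 0]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(t, round(s, 4))
```

Censored patients contribute to the number at risk up to their censoring time but never count as events, so the partial information they carry is used rather than discarded.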

#### Outline Of Syllabus

This module will introduce students to two of the areas which could include those listed below. The anticipated syllabus is given below, although there may be changes to reflect recent developments.

Statistics of extremes: Extremal types theorem, the generalised extreme value distribution (GEV), likelihood and estimation for the GEV, its uses and limitations. Alternative extreme value characterisations: the distribution of extreme r order statistics and the generalised Pareto distribution, the point process characterisation. Application to real problems: temporal dependence, non-stationarity. Issues involved in extreme value modelling of environmental and other data. A non-mathematical overview of multivariate extremes.

Sports modelling: Methods for ranking teams/players (Bradley-Terry model) based on pairwise comparisons, extensions to handling ties (Davidson model); methods for ranking multiple items (Plackett-Luce) including adjustments for multiple ties. Basic Poisson model for football scores; Dixon-Coles model. Extended count data models; zero-inflation and overdispersion. Incorporation of home advantage, dynamic models and time-weighted components. Applications to sports such as football, tennis, cricket and Formula One will be used throughout.

Time series: Introduction to time series, including trend effects, seasonality and moving averages. Linear Gaussian processes, stationarity, autocovariance and autocorrelation. Autoregressive (AR), moving average (MA) and mixed (ARMA) models for stationary processes. Likelihood in a simple case such as AR(1). ARIMA processes, differencing, seasonal ARIMA as models for non-stationary processes. The role of sample autocorrelation, partial autocorrelation and correlograms in model choice. Tests of autocorrelation. Inference for model parameters. Forecasting. Dynamic linear models and the Kalman filter. Filtering and smoothing. Use of R for time series analysis.

Survival analysis: Time-to-event data, censoring patterns. Non-parametric survival analysis: calculation of Kaplan-Meier estimates; use of log-rank statistics. Parametric survival analysis: exponential, Weibull and log-logistic distributions; likelihood analysis of effect of covariates. Proportional hazards model: partial likelihood; diagnostics; time-varying effects. Frailty. Prediction and explained variation.

#### Teaching Methods

##### Teaching Activities
| Category | Activity | Number | Length | Student Hours | Comment |
|---|---|---|---|---|---|
| Scheduled Learning And Teaching Activities | Lecture | 4 | 1:00 | 4:00 | Revision Lectures |
| Scheduled Learning And Teaching Activities | Lecture | 40 | 1:00 | 40:00 | Formal Lectures |
| Guided Independent Study | Assessment preparation and completion | 30 | 1:00 | 30:00 | Completion of in course assessments |
| Scheduled Learning And Teaching Activities | Lecture | 10 | 1:00 | 10:00 | Problem Classes |
| Guided Independent Study | Independent study | 116 | 1:00 | 116:00 | Preparation time for lectures, background reading, coursework review |
| Total | | | | 200:00 | |
##### Jointly Taught With
| Code | Title |
|---|---|
| MAS3918 | Topics in Statistical Modelling A |
##### Teaching Rationale And Relationship

The teaching methods are appropriate to allow students to develop a wide range of skills, from understanding basic concepts and facts to higher-order thinking. Lectures are used for the delivery of theory and explanation of methods, illustrated with examples, and for giving general feedback on marked work. Problem classes are used to help develop the students’ ability to apply the theory to solving problems.

#### Assessment Methods

The format of resits will be determined by the Board of Examiners.

##### Exams
| Description | Length | Semester | When Set | Percentage | Comment |
|---|---|---|---|---|---|
| Written Examination | 150 | 2 | A | 80 | N/A |
##### Exam Pairings
| Module Code | Module Title | Semester | Comment |
|---|---|---|---|
| | Topics in Statistical Modelling A | 2 | N/A |
##### Other Assessment
| Description | Semester | When Set | Percentage | Comment |
|---|---|---|---|---|
| Prob solv exercises | 2 | M | 5 | Problem-solving exercises assessment |
| Prob solv exercises | 2 | M | 5 | Problem-solving exercises assessment |
| Prob solv exercises | 2 | M | 5 | Problem-solving exercises assessment |
| Prob solv exercises | 2 | M | 5 | Problem-solving exercises assessment |
##### Assessment Rationale And Relationship

A substantial formal unseen examination is appropriate for the assessment of the material in this module. The format of the examination will enable students to reliably demonstrate their own knowledge, understanding and application of learning outcomes. The assurance of academic integrity forms a necessary part of the programme accreditation.

Examination problems may require a synthesis of concepts and strategies from different sections, and may admit more than one method of solution. The examination time allows the students to test different strategies, work out examples and gather evidence for deciding on an effective strategy, while carefully articulating their ideas and explicitly citing the theory they are using.

The coursework assignments allow the students to develop their problem-solving techniques, to practise the methods learnt in the module, to assess their progress and to receive feedback; these assessments have a secondary formative purpose as well as their primary summative purpose.