Module Catalogue 2024/25

MAS8381 : Statistics for Big Data (Inactive)

  • Inactive for Year: 2024/25
  • Module Leader(s): Dr Sarah Heaps
  • Owning School: Mathematics, Statistics and Physics
  • Teaching Location: Newcastle City Campus
Semesters

Your programme is made up of credits; the total differs from programme to programme.

Semester 1 Credit Value: 15
ECTS (European Credit Transfer System) Credits: 8.0
Pre-requisite

Modules you must have done previously to study this module

Pre Requisite Comment

N/A

Co-Requisite

Modules you need to take at the same time

Code | Title
CSC8622 | Programming for Big Data
Co Requisite Comment

N/A

Aims

To achieve an understanding of linear statistical models, and how regression, Analysis of Variance (ANOVA) and Analysis of Covariance (ANCOVA) models arise as special cases. To understand the problem of identifiability in ANOVA, and the role played by parameter constraints and dummy variables in solving it. To achieve an understanding of the principles involved in experimental design, and an awareness of the most important ideas. To achieve an understanding of Generalized Linear Models and achieve familiarity with the most common families. To understand basic multivariate statistical theory and ideas of Bayesian inference.

Module Summary

This module is concerned with building and applying statistical models for data. How does a mixture of quantitative and qualitative variables affect the probability of a person getting a particular illness? Suppose we find an association between age group and car accidents: how can we study whether this association varies between men and women, or from city to city? In this course we consider the issues involved in constructing realistic and useful statistical models for problems that arise in a range of fields, with medicine, finance, social research and environmental issues being some of the main areas.

We consider multiple linear regression models and see how they are special cases of the General Linear Model (GLM). We then consider Analysis of Variance (ANOVA) as another special case of the GLM: the problem of investigating contrasts between different levels of a factor in their effect on a response. We consider the principles involved in designing a study or experiment, introducing the ideas of blocking, randomization, confounding and factorial experiments. We consider Analysis of Covariance (ANCOVA), which mixes linear regression and factor effects, and the idea of interaction between explanatory variables in the way they affect a response.

We then extend linear models to Generalized Linear Models, which allow us to build non-linear relationships into our models and to study many types of outcome measure that could not be handled using General Linear Models. We consider asymptotic maximum likelihood estimation for the multi-parameter case, including the use of information matrices in parameter estimation and likelihood ratio tests for comparing nested models. These ideas are applied to Generalized Linear Models. We study in depth the special cases of Binomial outcomes, where we are interested in how explanatory variables affect the success rate, and log-linear models, which enable us to study, among other things, contingency tables involving more than two factors. Multivariate extensions are considered briefly, as well as essential concepts from Bayesian statistics.
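As an illustration of a likelihood ratio test for nested models with Binomial outcomes, the following is a minimal sketch and not module material. It is written in Python using only the standard library (the module itself uses R). The function names and the counts are hypothetical, chosen purely for illustration. The null model assumes one common success probability for two groups; the larger model allows a separate probability for each group, so the models differ by one parameter.

```python
import math


def binom_loglik(successes, trials, p):
    """Binomial log-likelihood at probability p.

    The binomial coefficient is omitted because it cancels
    in any likelihood ratio.
    """
    failures = trials - successes
    ll = 0.0
    if successes > 0:
        ll += successes * math.log(p)
    if failures > 0:
        ll += failures * math.log(1 - p)
    return ll


def lrt_two_groups(s1, n1, s2, n2):
    """Likelihood ratio test of a common success probability (null model)
    against separate probabilities for two groups (larger model)."""
    # Maximum likelihood estimate under the null: the pooled proportion.
    p0 = (s1 + s2) / (n1 + n2)
    ll_null = binom_loglik(s1, n1, p0) + binom_loglik(s2, n2, p0)

    # Maximum likelihood estimates under the larger model: group proportions.
    ll_full = binom_loglik(s1, n1, s1 / n1) + binom_loglik(s2, n2, s2 / n2)

    # Twice the log-likelihood difference is asymptotically chi-squared
    # with 1 degree of freedom (one extra parameter in the larger model).
    stat = 2 * (ll_full - ll_null)

    # Upper tail of chi-squared(1): P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(max(stat, 0.0) / 2))
    return stat, p_value


# Hypothetical counts: 30/100 successes in group 1, 50/100 in group 2.
stat, p_value = lrt_two_groups(30, 100, 50, 100)
print(f"LRT statistic = {stat:.3f}, p-value = {p_value:.4f}")
```

A small p-value indicates that the common-probability model fits noticeably worse than the model with separate probabilities. The chi-squared reference distribution is an asymptotic approximation, so it is less reliable for small counts.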

The module provides a comprehensive introduction to the issues involved in using statistics to model real, large and complex data sets, and to draw relevant conclusions. There is an emphasis on hands-on application of the theory and methods throughout, with extensive use of R.

Outline Of Syllabus

The general linear model: asymptotic likelihood for the multi-parameter case; information matrices, likelihood ratio tests. Estimation of parameters; prediction; model adequacy; regression, ANOVA and ANCOVA as special cases. Model identifiability, parameter constraints and dummy variables. Introduction to experimental design methodology, including randomization, completely randomized designs, randomized block designs, blocking, confounding and factorial experiments. Generalized linear models: overall construction as a generalization of linear models; binomial regression with various links; Poisson regression; log-linear models and their use for contingency tables. Multivariate statistics, Bayesian inference. Various extended examples of statistical modelling using R.
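To illustrate the syllabus item on identifiability, parameter constraints and dummy variables, here is a minimal sketch and not module material. It is written in Python using only the standard library (the module itself uses R). The helper names `matrix_rank` and `design_matrix` and the factor levels are hypothetical. It shows that a one-way ANOVA design matrix with an intercept and one dummy column per level is rank-deficient, so the parameters are not identifiable. Setting the first level's effect to zero (a corner-point constraint) restores full column rank.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Rank of a matrix by Gaussian elimination in exact rational arithmetic."""
    m = [[Fraction(x) for x in row] for row in rows]
    n_cols = len(m[0]) if m else 0
    rank = 0
    for col in range(n_cols):
        # Find a row at or below position `rank` with a non-zero entry in this column.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        # Eliminate this column from all rows below the pivot row.
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            if factor != 0:
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
        if rank == len(m):
            break
    return rank


def design_matrix(levels, drop_first):
    """Intercept column plus dummy (indicator) columns for a single factor."""
    names = sorted(set(levels))
    if drop_first:
        # Corner-point constraint: the first level's effect is fixed at zero.
        names = names[1:]
    return [[1] + [1 if lev == name else 0 for name in names] for lev in levels]


# Hypothetical factor with three levels and two observations per level.
levels = ["A", "A", "B", "B", "C", "C"]

full = design_matrix(levels, drop_first=False)
constrained = design_matrix(levels, drop_first=True)

# Intercept = sum of the three dummies, so 4 columns but rank 3.
print(len(full[0]), matrix_rank(full))                # 4 3
# After dropping one dummy: 3 columns and rank 3.
print(len(constrained[0]), matrix_rank(constrained))  # 3 3
```

Other constraints, such as requiring the level effects to sum to zero, resolve the same rank deficiency. The fitted values are unchanged and only the interpretation of the parameters differs.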

Learning Outcomes

Intended Knowledge Outcomes

Students will know the techniques of modelling Normal outcomes in terms of categorical and continuous covariates using the general linear model. They will also know how to extend this basic framework to encompass outcomes from several non-Normal exponential family distributions. Students will also know the rudiments of experimental design, multivariate statistics and Bayesian inference.

Intended Skill Outcomes

The ability to determine the appropriate statistical model to use, to use R to fit the model, and to interpret the fitted model. The ability to identify the kind of design and modelling approaches needed to address a wide variety of real-life statistical problems, and the ability to implement appropriate statistical modelling procedures using R.

Teaching Methods

Teaching Activities
Category | Activity | Number | Length | Student Hours | Comment
Scheduled Learning And Teaching Activities | Lecture | 12 | 2:00 | 24:00 | Lectures
Guided Independent Study | Assessment preparation and completion | 12 | 1:00 | 12:00 | Background reading
Guided Independent Study | Assessment preparation and completion | 12 | 2:00 | 24:00 | Lecture follow-up
Scheduled Learning And Teaching Activities | Practical | 12 | 2:00 | 24:00 | Computer Practicals
Guided Independent Study | Project work | 1 | 42:00 | 42:00 | Project
Guided Independent Study | Project work | 4 | 6:00 | 24:00 | Coursework
Total: 150:00
Teaching Rationale And Relationship

Lectures are used for the delivery of theory and explanation of methods, illustrated with examples, and for giving general feedback on marked work. Practicals are used both for solving problems and carrying out work requiring extensive computation, and to give insight into the ideas and methods studied. A large number of practicals are scheduled in order to provide sufficient hands-on training and rapid feedback on understanding.

Assessment Methods

The format of resits will be determined by the Board of Examiners.

Other Assessment
Description | Semester | When Set | Percentage | Comment
Practical/lab report | 1 | M | 40 | 4 separate practical reports, each max 250 words and worth 10%.
Report | 1 | M | 60 | Project report (max 2,000 words)
Assessment Rationale And Relationship

Written assignments (approximately 4 pieces of work of approximately equal weight) followed by a larger piece of project work allow the students to develop their problem-solving techniques, to practise the methods learnt in the module, to assess their progress and to receive feedback; the smaller pieces of work are thus formative as well as summative assessment.

General Notes

N/A

Disclaimer

The information contained within the Module Catalogue relates to the 2024 academic year.

In accordance with University Terms and Conditions, the University makes all reasonable efforts to deliver the modules as described.

Modules may be amended on an annual basis to take account of changing staff expertise, developments in the discipline, the requirements of external bodies and partners, and student feedback. Module information for the 2025/26 entry will be published here in early April 2025. Queries about information in the Module Catalogue should in the first instance be addressed to your School Office.