MAS3907 : Big Data Analytics
- Offered for Year: 2024/25
- Module Leader(s): Dr Steffen Grunewalder
- Owning School: Mathematics, Statistics and Physics
- Teaching Location: Newcastle City Campus
Semesters
Your programme is made up of credits; the total differs from programme to programme.
Semester | Credit Value | ECTS Credits (European Credit Transfer System) |
---|---|---|
Semester 2 | 10 | 5.0 |
Pre-requisite
Modules you must have done previously to study this module
Code | Title |
---|---|
MAS2901 | Introduction to Statistical Inference |
MAS2902 | Introduction to Regression and Stochastic Modelling |
MAS2906 | Computational Probability and Statistics with R |
Pre Requisite Comment
N/A
Co-Requisite
Modules you need to take at the same time
Code | Title |
---|---|
MAS3903 | Linear Models |
Co Requisite Comment
N/A
Aims
To develop an understanding of the statistical theory underpinning methods and models for the analysis of “big” and, in particular, multivariate data. To gain experience in the application of this theory to a large data set.
Module summary
More data than ever before are being generated and stored, in a variety of fields such as healthcare and e-commerce. The term “big data” has emerged in acknowledgement of the vast amounts of data now available. By applying statistical analyses to these data sets, we can start to use them to answer important questions, for example, which genetic markers are associated with incidence of a particular disease. Commonly the data sets that arise are multivariate, comprising a large number of observations on many variables. In this module we study how we can learn from data sets of this form. We begin by considering their representation in R, and techniques for generating numerical and graphical summaries. We then turn to consider more formal techniques - often branded "unsupervised learning" - intended to summarise the relationships between variables or observations. Finally, we consider a collection of inferential procedures - so-called "supervised learning" techniques - where the goal is to predict a categorical or quantitative response variable on the basis of a collection of covariates. In the latter case, we study linear regression, focusing on overcoming the problems that arise when confronted with a very large number of covariates.
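As a rough illustration of the numerical summaries mentioned above, the sketch below computes the sample mean vector, the sample covariance matrix and the sample correlation matrix of a data matrix whose rows are observations and whose columns are variables. The module itself works in R; this is a Python/NumPy sketch, and the function name `numerical_summaries` and the toy data are hypothetical, chosen only for illustration.

```python
import numpy as np

def numerical_summaries(X):
    """Return the sample mean vector, sample covariance matrix S and
    sample correlation matrix R of an n x p data matrix X
    (rows = observations, columns = variables)."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    xbar = X.mean(axis=0)          # p-vector of column means
    Xc = X - xbar                  # centred data matrix
    S = Xc.T @ Xc / (n - 1)        # unbiased sample covariance matrix
    d = np.sqrt(np.diag(S))        # sample standard deviations
    R = S / np.outer(d, d)         # sample correlation matrix
    return xbar, S, R

# Hypothetical toy data: 4 observations on 2 variables.
X = np.array([[1.0, 2.0],
              [2.0, 4.1],
              [3.0, 5.9],
              [4.0, 8.2]])
xbar, S, R = numerical_summaries(X)
```

The same quantities are available directly as `np.cov(X, rowvar=False)` and `np.corrcoef(X, rowvar=False)`; writing them out makes the centring step explicit.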
Outline Of Syllabus
Introduction to big data, particularly multivariate data, data summaries and use of R data frames. Principal components and cluster analysis. Classification methods using discriminant analysis; use of cross-validation. Methods based on linear regression, including variable selection methods; shrinkage using ridge regression, the lasso and the elastic net; dimension reduction using principal components regression and partial least squares.
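A minimal sketch of principal components analysis, assuming the components are obtained from the eigendecomposition of the sample covariance matrix (in R, `prcomp` carries out a comparable computation). It is written in Python/NumPy rather than R, and the function `principal_components` and the simulated data are hypothetical. The eigenvalues are the variances of the components and the loadings are the corresponding eigenvectors.

```python
import numpy as np

def principal_components(X):
    """PCA of an n x p data matrix X via the eigendecomposition
    S = V diag(lambda) V^T of the sample covariance matrix.

    Returns the eigenvalues (component variances) in decreasing order,
    the loadings (columns of V) and the component scores."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                  # centre each variable
    S = np.cov(Xc, rowvar=False)             # p x p sample covariance
    eigvals, eigvecs = np.linalg.eigh(S)     # symmetric S; ascending order
    order = np.argsort(eigvals)[::-1]        # reorder to decreasing variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = Xc @ eigvecs                    # projections onto the PCs
    return eigvals, eigvecs, scores

rng = np.random.default_rng(1)
# Hypothetical correlated data: 200 observations on 3 variables.
Z = rng.normal(size=(200, 3))
X = Z @ np.array([[2.0, 0.0, 0.0],
                  [1.0, 1.0, 0.0],
                  [0.5, 0.2, 0.3]])
lam, V, scores = principal_components(X)
prop_var = lam / lam.sum()   # proportion of variance explained by each PC
```

Using `numpy.linalg.eigh` rather than `eig` exploits the symmetry of S and guarantees real eigenvalues. PCA on the correlation matrix instead amounts to standardising each column of X before calling the function.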
Learning Outcomes
Intended Knowledge Outcomes
At the end of the module, students will know how to: apply linear algebra to transform and manipulate multivariate data; define, find and interpret the principal components of a multivariate data set; derive the Bayes classifier for linear and quadratic discriminant analysis and apply it to classify data into multiple groups. They will also understand how shrinkage and dimension reduction methods can improve least squares regression when there are many covariates.
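A sketch of the linear discriminant analysis classifier, assuming the standard LDA model: each class k is multivariate normal with mean μ_k and a common covariance matrix Σ, and has prior probability π_k. Under these assumptions the Bayes classifier assigns x to the class maximising δ_k(x) = xᵀΣ⁻¹μ_k − ½μ_kᵀΣ⁻¹μ_k + log π_k. The sketch plugs in sample estimates (class proportions, class means and the pooled within-class covariance). The names `lda_fit` and `lda_predict` and the simulated data are hypothetical, and the code is Python/NumPy rather than the R used in the module. Quadratic discriminant analysis would replace Σ by class-specific matrices Σ_k, adding a quadratic term; that case is not shown.

```python
import numpy as np

def lda_fit(X, y):
    """Estimate LDA parameters: class priors pi_k, class means mu_k
    and the pooled within-class covariance matrix Sigma."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    n, p = X.shape
    K = len(classes)
    priors = np.array([np.mean(y == k) for k in classes])
    means = np.array([X[y == k].mean(axis=0) for k in classes])
    Sigma = np.zeros((p, p))
    for k, mu in zip(classes, means):
        D = X[y == k] - mu                 # deviations from the class mean
        Sigma += D.T @ D
    Sigma /= (n - K)                       # pooled, unbiased estimate
    return classes, priors, means, Sigma

def lda_predict(X, classes, priors, means, Sigma):
    """Assign each row of X to the class with the largest score
    delta_k(x) = x^T Sigma^{-1} mu_k - 0.5 mu_k^T Sigma^{-1} mu_k + log pi_k."""
    X = np.atleast_2d(np.asarray(X, dtype=float))
    A = np.linalg.solve(Sigma, means.T)    # column k is Sigma^{-1} mu_k
    const = -0.5 * np.sum(means.T * A, axis=0) + np.log(priors)
    scores = X @ A + const                 # n x K matrix of delta_k(x)
    return classes[np.argmax(scores, axis=1)]

rng = np.random.default_rng(0)
# Hypothetical training data: two well-separated Gaussian groups.
X0 = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(50, 2))
X1 = rng.normal(loc=[4.0, 4.0], scale=1.0, size=(50, 2))
X = np.vstack([X0, X1])
y = np.array([0] * 50 + [1] * 50)
params = lda_fit(X, y)
pred = lda_predict([[0.0, 0.0], [4.0, 4.0]], *params)
```

Solving with `np.linalg.solve` avoids forming Σ⁻¹ explicitly; because δ_k is linear in x, the boundaries between classes are hyperplanes.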
Intended Skill Outcomes
At the end of the module, students will be able to: represent a large data set in R using appropriate data structures; use R to produce appropriate graphical and numerical summaries of large data sets, and interpret the results; use R to analyse large data sets using techniques in supervised and unsupervised learning, and interpret the results; use R to compare the performance of different models using cross-validation.
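To illustrate comparing models by cross-validation, the sketch below uses k-fold cross-validation to choose the penalty λ in ridge regression. It assumes the usual ridge estimator on centred data, β̂ = (XcᵀXc + λI)⁻¹Xcᵀyc, with an unpenalised intercept. In practice covariates are usually standardised first; the simulated covariates here already share a common scale. The helper names `ridge_fit` and `cv_mse`, the grid of λ values and the simulated data are hypothetical, and the module itself carries out this kind of analysis in R.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Ridge regression: centre X and y, solve
    (Xc^T Xc + lam I) beta = Xc^T yc, then recover the intercept."""
    xbar, ybar = X.mean(axis=0), y.mean()
    Xc, yc = X - xbar, y - ybar
    p = X.shape[1]
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ yc)
    intercept = ybar - xbar @ beta         # intercept is not penalised
    return intercept, beta

def cv_mse(X, y, lam, k=5, seed=0):
    """k-fold cross-validation estimate of test MSE for penalty lam."""
    n = len(y)
    folds = np.array_split(np.random.default_rng(seed).permutation(n), k)
    errors = []
    for test_idx in folds:
        train = np.ones(n, dtype=bool)
        train[test_idx] = False            # hold out the current fold
        b0, b = ridge_fit(X[train], y[train], lam)
        resid = y[test_idx] - (b0 + X[test_idx] @ b)
        errors.append(np.mean(resid ** 2))
    return np.mean(errors)

rng = np.random.default_rng(42)
n, p = 60, 30                              # many covariates relative to n
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:3] = [3.0, -2.0, 1.5]           # only three covariates matter
y = X @ beta_true + rng.normal(size=n)
lams = [0.01, 0.1, 1.0, 10.0, 100.0]
cv = [cv_mse(X, y, lam) for lam in lams]
best_lam = lams[int(np.argmin(cv))]
```

Reusing the same seed for every λ means all candidate penalties are compared on identical folds, so differences in the cross-validation error reflect the penalty rather than the random split.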
Students will develop skills across the cognitive domain (Bloom’s taxonomy, 2001 revised edition): remember, understand, apply, analyse, evaluate and create.
Teaching Methods
Teaching Activities
Category | Activity | Number | Length | Student Hours | Comment |
---|---|---|---|---|---|
Scheduled Learning And Teaching Activities | Lecture | 2 | 1:00 | 2:00 | Revision Lectures |
Scheduled Learning And Teaching Activities | Lecture | 20 | 1:00 | 20:00 | Formal Lectures |
Guided Independent Study | Assessment preparation and completion | 15 | 1:00 | 15:00 | Completion of in course assessments |
Scheduled Learning And Teaching Activities | Lecture | 5 | 1:00 | 5:00 | Problem Classes |
Guided Independent Study | Independent study | 58 | 1:00 | 58:00 | Preparation time for lectures, background reading, coursework review |
Total | | | | 100:00 | |
Teaching Rationale And Relationship
The teaching methods are appropriate to allow students to develop a wide range of skills, from understanding basic concepts and facts to higher-order thinking.
Lectures are used for the delivery of theory and explanation of methods, illustrated with examples, and for giving general feedback on marked work. Problem Classes are used to develop students’ ability to apply the theory to solving problems.
Reading Lists
Assessment Methods
The format of resits will be determined by the Board of Examiners.
Exams
Description | Length (minutes) | Semester | When Set | Percentage | Comment |
---|---|---|---|---|---|
Written Examination | 120 | 2 | A | 80 | N/A |
Other Assessment
Description | Semester | When Set | Percentage | Comment |
---|---|---|---|---|
Prob solv exercises | 2 | M | 5 | Problem-solving exercises assessment |
Prob solv exercises | 2 | M | 5 | Problem-solving exercises assessment |
Prob solv exercises | 2 | M | 5 | Problem-solving exercises assessment |
Prob solv exercises | 2 | M | 5 | Problem-solving exercises assessment |
Assessment Rationale And Relationship
A substantial formal unseen examination is appropriate for the assessment of the material in this module. The format of the examination will enable students to reliably demonstrate their own knowledge, understanding and application of learning outcomes. The assurance of academic integrity forms a necessary part of the programme accreditation.
Examination problems may require a synthesis of concepts and strategies from different sections of the module, and may admit more than one method of solution. The examination time allows students to test different strategies, work out examples and gather evidence for choosing an effective strategy, while carefully articulating their ideas and explicitly citing the theory they use.
The coursework assignments allow the students to develop their problem solving techniques, to practise the methods learnt in the module, to assess their progress and to receive feedback; these assessments have a secondary formative purpose as well as their primary summative purpose.
Timetable
- Timetable Website: www.ncl.ac.uk/timetable/
- MAS3907's Timetable
Past Exam Papers
- Exam Papers Online : www.ncl.ac.uk/exam.papers/
- MAS3907's past Exam Papers
General Notes
N/A
Welcome to Newcastle University Module Catalogue
This is where you will be able to find all key information about modules on your programme of study. It will help you make an informed decision on the options available to you within your programme.
You may have some queries about the modules available to you. Your school office will be able to signpost you to someone who will support you with any queries.
Disclaimer
The information contained within the Module Catalogue relates to the 2024 academic year.
In accordance with University Terms and Conditions, the University makes all reasonable efforts to deliver the modules as described.
Modules may be amended on an annual basis to take account of changing staff expertise, developments in the discipline, the requirements of external bodies and partners, and student feedback. Module information for the 2025/26 entry will be published here in early April 2025. Queries about information in the Module Catalogue should in the first instance be addressed to your School Office.