MAS8907 : Big Data Analytics (Inactive)
- Inactive for Year: 2024/25
- Module Leader(s): Dr Pete Philipson
- Owning School: Mathematics, Statistics and Physics
- Teaching Location: Newcastle City Campus
Semesters
Your programme is made up of credits; the total differs from programme to programme.
Semester 2 Credit Value: | 10 |
ECTS Credits (European Credit Transfer System): | 5.0 |
Aims
To develop an understanding of the statistical theory underpinning methods and models for the analysis of “big” and, in particular, multivariate data. To gain experience in the application of this theory to a large data set.
Module summary
More data than ever before are being generated and stored, in a variety of fields such as healthcare and e-commerce. The term “big data” has emerged in acknowledgement of the vast amounts of data now available. By applying statistical analyses to these data sets, we can start to use them to answer important questions, for example, which genetic markers are associated with incidence of a particular disease. Commonly the data sets that arise are multivariate, comprising a large number of observations on many variables. In this module we study how we can learn from data sets of this form. We begin by considering their representation in R, and techniques for generating numerical and graphical summaries. We then turn to consider more formal techniques - often branded "unsupervised learning" - intended to summarise the relationships between variables or observations. Finally we consider a collection of inferential procedures - so-called "supervised learning" techniques - where the goal is to predict a categorical or quantitative response variable on the basis of a collection of covariates. In the latter case, we study linear regression, focusing on overcoming the problems that arise when confronted with a very large number of covariates.
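As an illustration of the problem described above, where the number of covariates exceeds the number of observations, the following is a minimal sketch of ridge regression. It is not taken from the module materials: the module works in R, and Python with NumPy is used here only for illustration. The simulated data, the penalty value `lam=1.0` and the helper name `ridge_fit` are all assumptions, and the intercept and covariate standardisation are omitted for brevity.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge estimate: solve (X'X + lam * I) beta = X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Simulated data (an assumption for illustration): more covariates than observations.
rng = np.random.default_rng(0)
n, p = 20, 50
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:3] = [2.0, -1.0, 0.5]
y = X @ beta_true + 0.1 * rng.normal(size=n)

# X'X has rank at most n < p, so it is singular and ordinary least squares
# has no unique solution.
rank = np.linalg.matrix_rank(X.T @ X)

# Adding lam * I makes the system invertible and shrinks the coefficients.
beta_ridge = ridge_fit(X, y, lam=1.0)
beta_ridge_strong = ridge_fit(X, y, lam=10.0)
```

A larger penalty shrinks the estimate further towards zero. In practice the penalty would be chosen by cross-validation rather than fixed in advance.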
Outline Of Syllabus
Introduction to big data, particularly multivariate data and multivariate random quantities, data summaries and use of R data frames. Principal components and cluster analysis. Classification methods using discriminant analysis; use of cross-validation. Methods based on linear regression, including variable selection methods; shrinkage using ridge regression, the lasso and the elastic net; dimension reduction using principal components regression and partial least squares.
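To give a flavour of the principal components topic listed above, here is a minimal sketch that computes principal components from the singular value decomposition of the centred data matrix. It is an illustration under stated assumptions, not the module's own material: the helper name `pca` and the simulated data are hypothetical, and Python with NumPy stands in for the R used in the module.

```python
import numpy as np

def pca(X, k):
    """Return the first k component scores, the loadings, and the
    proportion of variance explained by every component."""
    Xc = X - X.mean(axis=0)                      # centre each variable
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    var = s**2 / (X.shape[0] - 1)                # component variances
    loadings = Vt[:k].T                          # columns are unit-length directions
    scores = Xc @ loadings                       # data projected onto the components
    return scores, loadings, var / var.sum()

# Simulated data (an assumption): three variables driven by one latent factor.
rng = np.random.default_rng(1)
z = rng.normal(size=(200, 1))
X = np.hstack([z, 2 * z, -z]) + 0.05 * rng.normal(size=(200, 3))

scores, loadings, prop_var = pca(X, k=1)
```

Because the three variables are almost perfectly correlated by construction, the first component accounts for nearly all of the variance, which is the situation in which dimension reduction is most effective.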
Teaching Methods
Teaching Activities
Category | Activity | Number | Length | Student Hours | Comment |
---|---|---|---|---|---|
Guided Independent Study | Assessment preparation and completion | 1 | 2:00 | 2:00 | Unseen exam |
Scheduled Learning And Teaching Activities | Lecture | 3 | 1:00 | 3:00 | Problem classes |
Scheduled Learning And Teaching Activities | Lecture | 2 | 1:00 | 2:00 | Revision lectures |
Scheduled Learning And Teaching Activities | Lecture | 25 | 1:00 | 25:00 | Formal lectures |
Guided Independent Study | Assessment preparation and completion | 1 | 13:00 | 13:00 | Revision for unseen exam |
Guided Independent Study | Independent study | 1 | 22:00 | 22:00 | Studying, practising and gaining understanding of course material |
Guided Independent Study | Independent study | 3 | 3:00 | 9:00 | Review of exercises and group project |
Guided Independent Study | Independent study | 1 | 12:00 | 12:00 | Preparation for group project |
Guided Independent Study | Independent study | 2 | 6:00 | 12:00 | Preparation for exercises |
Total | | | | 100:00 | |
Jointly Taught With
Code | Title |
---|---|
MAS3907 | Big Data Analytics |
Teaching Rationale And Relationship
Lectures are used for the delivery of theory and explanation of methods, illustrated with examples, and for giving general feedback on marked work. Problem classes are used to help develop the students’ ability to apply the theory to solve problems. Tutorials are used to identify and resolve specific queries raised by students and to allow students to receive individual feedback on marked work. In addition, office hours (two per week) will provide an opportunity for more direct contact between individual students and the lecturer.
Assessment Methods
The format of resits will be determined by the Board of Examiners.
Exams
Description | Length (minutes) | Semester | When Set | Percentage | Comment |
---|---|---|---|---|---|
Written Examination | 120 | 2 | A | 80 | N/A |
Exam Pairings
Module Code | Module Title | Semester | Comment |
---|---|---|---|
Big Data Analytics | 2 | N/A |
Other Assessment
Description | Semester | When Set | Percentage | Comment |
---|---|---|---|---|
Problem-solving exercises | 2 | M | 5 | Exercises |
Problem-solving exercises | 2 | M | 15 | Group project |
Assessment Rationale And Relationship
A substantial formal unseen examination is appropriate for the assessment of the material in this module. The exercises are expected to consist of two assignments of equal weight; the exact nature of the assessment will be explained at the start of the module. The exercises and the group project allow the students to develop their problem-solving techniques, to practise the methods learnt in the module, to assess their progress and to receive feedback; these assessments have a secondary formative purpose as well as their primary summative purpose.
Reading Lists
Timetable
- Timetable Website: www.ncl.ac.uk/timetable/
- MAS8907's Timetable