MAS8404 : Statistical Learning for Data Science

Semester 1
Credit Value: 10
ECTS Credits: 5.0


More data than ever before are being generated and stored in a variety of fields across industry. The term “big data” has emerged in acknowledgement of the vast amounts of data now available. By applying statistical analyses to these data sets, we can start to use them to answer important questions such as (i) which factors most affect the quality of an industrial process and (ii) how many different types of customer are interested in your product. The data sets that arise in industry are commonly multivariate, comprising a large number of observations on many variables. In this module we study how we can learn from data sets of this form. There is an emphasis on hands-on application of the theory and methods throughout, with extensive use of R.
Specifically, the module aims to give students:
-       An overview of modern statistical approaches to learning from data.
-       Experience in applying these techniques to the analysis of large and complex data sets across a range of application areas in industry.

Outline Of Syllabus

-       Linear regression, including variable selection and regularisation (ridge regression, the lasso and the elastic net)
-       Classification, including linear discriminant analysis and logistic regression
-       Generalized linear models
-       Tree-based methods, including regression trees, classification trees and random forests
-       Clustering
-       Principal components analysis

Teaching Methods

Teaching Activities
Category | Activity | Number | Length | Student Hours | Comment
Guided Independent Study | Assessment preparation and completion | 15 | 1:00 | 15:00 | Coursework exercises
Scheduled Learning And Teaching Activities | Lecture | 9 | 2:00 | 18:00 | Lectures
Guided Independent Study | Assessment preparation and completion | 1 | 0:30 | 0:30 | Oral Examination
Guided Independent Study | Assessment preparation and completion | 5 | 0:30 | 2:30 | Preparation for Oral Examination
Scheduled Learning And Teaching Activities | Practical | 9 | 2:00 | 18:00 | Practical sessions
Guided Independent Study | Directed research and reading | 19 | 1:00 | 19:00 | Background reading
Guided Independent Study | Project work | 18 | 1:00 | 18:00 | Project
Guided Independent Study | Independent study | 9 | 1:00 | 9:00 | Lecture follow-up

Teaching Rationale And Relationship

Lectures are used to deliver theory and explain methods, illustrated with examples, and to give general feedback on marked work. Practicals are used for solving problems, for work requiring extensive computation, and to give insight into the ideas and methods studied. They are also used to discuss the course material, to identify and resolve specific queries raised by students, and to allow students to receive individual feedback on marked work. Office hours provide an opportunity for more direct contact between individual students and the lecturer.

Assessment Methods

The format of resits will be determined by the Board of Examiners.

Other Assessment
Description | Semester | When Set | Percentage | Comment
Practical/lab report | 1 | M | 45 | Up to 3 practical reports. Word count: up to 1,000 words, as specified for each report.
Report | 1 | M | 55 | Project report. Word count: up to 1,500 words.

Zero Weighted Pass/Fail Assessments
Description | When Set | Comment
Oral Examination | M | A structured discussion including a software demonstration and reflection on the key learning objectives of the coursework project.

Assessment Rationale And Relationship

Written assignments (around 3 pieces of work of roughly equal weight), followed by a larger piece of project work, allow students to develop their problem-solving techniques, to practise the methods learnt in the module, to assess their progress and to receive feedback. The smaller pieces of work are thus formative as well as summative assessment.

The semi-structured interview (the oral examination) facilitates a reflective discussion about how individual students have met the learning objectives of the module and how fundamental statistical principles are embedded in the functionality of their project work.
