CSC8101 : Big Data Analytics

Semester 2
Credit Value: 10
ECTS Credits: 5.0


The aim of Big Data Analytics is to analyse large amounts of data in order to extract useful information. Examples include analysing the world wide web to power web search engines, optimising the design of e-commerce sites by analysing user activity, and processing “open linked data” released globally by governments to improve public services and by research organisations to improve data sharing. Whilst data analysis has been an important topic for many decades, three developments have led to a surge of interest in new algorithms and methods. Firstly, there has been an explosion in the quantity and variety of data generated by organisations, programs and sensors; the web is one example. Processing this data is beyond the reach of existing approaches. Secondly, cloud computing has provided a new type of dynamically scalable platform on which to parallelise data analysis. Thirdly, there is enormous potential for insight and action deriving from the real-time analysis of data, such as data from sensors, social media and e-commerce.
This module focusses on the algorithms, technologies and architectures required to analyse “big data”, with a particular focus on cloud-based solutions.

Outline Of Syllabus

- Scalable data management architectures
- Overview of data-parallel problems in e-science
- Patterns and technology for exploiting cloud infrastructure on data-parallel problems
- Graph databases and their application to social media analysis
- Scalable real-time data processing

Teaching Methods

Teaching Activities
Category | Activity | Number | Length | Student Hours | Comment
Guided Independent Study | Assessment preparation and completion | 20 | 1:00 | 20:00 | Lecture follow up
Scheduled Learning And Teaching Activities | Lecture | 20 | 1:00 | 20:00 | Lectures
Scheduled Learning And Teaching Activities | Practical | 18 | 1:00 | 18:00 | Practicals
Guided Independent Study | Project work | 24 | 1:00 | 24:00 | Coursework / Lab reports
Guided Independent Study | Independent study | 18 | 1:00 | 18:00 | Background reading
Teaching Rationale And Relationship

Lectures will be used to introduce the learning material and to demonstrate the key concepts by example. Students are expected to follow up lectures within a few days by re-reading and annotating lecture notes to aid deep learning.

This is a very practical subject, and it is important that the learning materials are supported by hands-on opportunities provided by practical classes. Students are expected to spend time on coursework outside timetabled practical classes.

Students aiming for first-class marks are expected to widen their knowledge beyond the content of lecture notes through background reading.

Assessment Methods

The format of resits will be determined by the Board of Examiners.

Other Assessment
Description | Semester | When Set | Percentage | Comment
Practical/lab report | 2 | M | 10 | In-lab sign-off assignment: Neo4j queries. (8 hours)
Practical/lab report | 2 | M | 90 | Spark programming. 24 hours. (Includes demo/discussion of work.)
Assessment Rationale And Relationship

The assessment structure is designed to maximize engagement of the students with an area of technology that is evolving very rapidly. This is achieved in two ways.
- In-lab assignment, signed off at the end of a 2-hour lab. This covers one of the key topics in the module (querying graph databases) from a hands-on, practical programming perspective.
- Coursework assignment (programming) with free lab time as well as assisted practical hours. The aim is to offer students a rich, hands-on experience of using the dominant technology for Big Data Analytics (Spark) on a state-of-the-art, industry-grade platform.
As part of the coursework assessment, a short demonstration session is conducted with each student individually to discuss their solution as well as to test students’ knowledge of other topics covered in the module.

Reading Lists