CSC8101 : Big Data Analytics
- Offered for Year: 2019/20
- Module Leader(s): Dr Paolo Missier
- Owning School: Computing
- Teaching Location: Newcastle City Campus
- Semester 2 Credit Value: 10
The aim of Big Data Analytics is to analyse large amounts of data in order to extract useful information. Examples include analysing the world wide web to power web search engines, optimising the design of e-commerce sites by analysing user activity, and processing “open linked data” released globally by governments in order to improve public services and by research organisations in order to improve data sharing. Whilst data analysis has been an important topic for many decades, three developments have led to a surge of interest in new algorithms and methods. Firstly, there has been an explosion in the quantity and variety of data generated by organisations, programs and sensors: the web is one example of this. This has placed the processing of this data beyond the reach of existing approaches. Secondly, cloud computing has provided a new type of dynamically scalable platform on which to parallelise data analysis. Thirdly, there is enormous potential for insight and action deriving from the real-time analysis of data, such as data from sensors, social media and e-commerce.
This module focusses on the algorithms, technologies and architectures required to analyse “big data”, with a particular focus on cloud-based solutions.
Outline Of Syllabus
- Scalable data management architectures
- Overview of data-parallel problems in e-science
- Patterns and technology for exploiting cloud infrastructure on data-parallel problems
- Graph databases and their application to social media analysis
- Scalable real-time data processing
|Category|Activity|Number|Length|Student Hours|Comment|
|---|---|---|---|---|---|
|Guided Independent Study|Assessment preparation and completion|20|1:00|20:00|Lecture follow up|
|Scheduled Learning And Teaching Activities|Lecture|20|1:00|20:00|Lectures|
|Scheduled Learning And Teaching Activities|Practical|18|1:00|18:00|Practicals|
|Guided Independent Study|Project work|24|1:00|24:00|Coursework / Lab reports|
|Guided Independent Study|Independent study|18|1:00|18:00|Background reading|
Teaching Rationale And Relationship
Lectures will be used to introduce the learning material and to demonstrate the key concepts by example. Students are expected to follow up lectures within a few days by re-reading and annotating lecture notes to aid deep learning.
This is a very practical subject, and it is important that the learning materials are supported by hands-on opportunities provided by practical classes. Students are expected to spend time on coursework outside timetabled practical classes.
Students aiming for 1st class marks are expected to widen their knowledge beyond the content of lecture notes through background reading.
The format of resits will be determined by the Board of Examiners.
|Component|Semester|When Set|Percentage|Comment|
|---|---|---|---|---|
|Practical/lab report|2|M|10|In-lab sign-off assignment: Neo4j queries (8 hours).|
|Practical/lab report|2|M|90|Spark programming (24 hours). Includes demo/discussion of work.|
Assessment Rationale And Relationship
The assessment structure is designed to maximize engagement of the students with an area of technology that is evolving very rapidly. This is achieved in two ways.
- An in-lab assignment, signed off at the end of a 2-hour lab. This covers one of the key topics in the module (querying graph databases) from a hands-on, practical programming perspective.
- A coursework assignment (programming) with free lab time as well as assisted practical hours. The aim is to offer students a rich, hands-on experience using the dominant technology for Big Data Analytics (Spark), on a state-of-the-art industry-grade platform.
As part of the coursework assessment, a short demonstration session is conducted with each student individually to discuss their solution as well as to test students’ knowledge of other topics covered in the module.