Module Catalogue 2021/22

CSC8101 : Big Data Analytics

  • Offered for Year: 2021/22
  • Module Leader(s): Professor Paolo Missier
  • Owning School: Computing
  • Teaching Location: Newcastle City Campus
Semesters
Semester 2 Credit Value: 10
ECTS Credits: 5.0
Pre Requisites
Pre Requisite Comment

None

Co Requisites
Co Requisite Comment

None

Aims

The aim of the module is to introduce students to the complex combination of data engineering technology and data science that makes it possible to extract valuable knowledge from “Big Data”. Technical challenges arise from the high volume, high diversity (heterogeneity of meaning and format) and variable quality of the data. A distinction is made between data that is stationary (resides in a data repository) and data that is in motion (streamed, as it would be produced for instance by sensors).
The module emphasises the following aspects:
-       Distribution of data processing over multiple nodes, e.g. in a cloud environment, as a way to scale up computing resources as the size of the data to be processed increases. This includes current frameworks for massively parallel data processing, such as Spark, along with practical programming examples and challenges for students.
-       Examples of algorithms that can be successfully parallelised and thus are able to take advantage of distributed data architectures
-       Specialised data structures, specifically graphs, and corresponding graph databases and parallel graph algorithms
-       Examples of data science applications, including Machine Learning algorithms, that are enabled by Big Data technology.
Emphasis is also placed on the rapid pace of technological advances in this area, and cutting-edge further reading material is offered for in-depth learning and deep dives into specific topics.
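
As an illustration of an algorithm that can be parallelised over partitioned data, the sketch below counts words in the map/reduce style. It is not taken from the module material: the function names (map_partition, merge_counts, word_count) are hypothetical, and the partitions are processed sequentially in plain Python, whereas a framework such as Spark could run each map step on a different node.

```python
from collections import Counter
from functools import reduce


def map_partition(lines):
    """Map step: count words within one partition, independently of the others."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def merge_counts(left, right):
    """Reduce step: merge two partial results. The merge is associative and
    commutative, so partial results can be combined in any order."""
    return left + right


def word_count(partitions):
    # Assumption: each element of `partitions` is a list of text lines.
    # The map calls share no state, so they could run in parallel;
    # here they run one after another for illustration.
    partials = [map_partition(p) for p in partitions]
    return reduce(merge_counts, partials, Counter())
```

Because the map step touches only its own partition and the reduce step is order-independent, adding more partitions (and more nodes) does not change the result, which is what makes this kind of algorithm suitable for distributed data architectures.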

Outline Of Syllabus

1.       Introduction to Data Science and Data Analytics. Scalability, efficiency of parallel processing.
2.       Batch Big Data Processing (MapReduce)
3.       Computing environments for Big Data Analytics and Machine Learning:
•       Big Data platforms (Hortonworks, Cloudera), Spark
4.       Data Stream processing: Overview of real-time Event Processing and querying
5.       Graph data processing: Examples of algorithms for graph analytics, graph databases and query languages (GDBMS), massively parallel graph processing model
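
To give a flavour of the massively parallel graph processing model mentioned in item 5, here is a minimal sketch of a vertex-centric computation organised in supersteps, in the spirit of bulk-synchronous systems. It is an assumption-laden illustration rather than module material: the function name bsp_hop_distances and the edge-list input format are hypothetical, and everything runs in a single process.

```python
from collections import defaultdict


def bsp_hop_distances(edges, source):
    """Hop distance from `source` to every reachable vertex of a directed graph.

    Each loop iteration is one superstep: vertices read the messages sent to
    them in the previous superstep, update their state, and send new messages.
    """
    neighbours = defaultdict(list)
    for u, v in edges:
        neighbours[u].append(v)

    distance = {source: 0}
    # Messages for the first superstep: the source tells its neighbours
    # that they are one hop away.
    inbox = {v: [1] for v in neighbours[source]}

    while inbox:  # stop when no vertex has incoming messages
        outbox = defaultdict(list)
        for vertex, messages in inbox.items():
            best = min(messages)
            # A vertex only updates and forwards when it learns a shorter distance.
            if best < distance.get(vertex, float("inf")):
                distance[vertex] = best
                for n in neighbours[vertex]:
                    outbox[n].append(best + 1)
        inbox = outbox
    return distance
```

Within a superstep each vertex works only on its own messages, so in a distributed setting the vertices could be spread across nodes and processed in parallel, with message exchange happening between supersteps.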

Learning Outcomes

Intended Knowledge Outcomes

You will:
-       Learn fundamental notions of parallel data processing and scalability
-       Understand the challenges associated with processing different types of Big Data (batch, streaming, graph-structured)
-       Learn fundamental concepts in data analytics: Exploratory (EDA) and Predictive (Machine Learning) with case studies in different application domains
-       Learn to “stay on top” of cutting-edge algorithms and architectures for Scalable Data Engineering, by discovering and reading selected research literature and providing a critical analysis

Intended Skill Outcomes

-       Learn to use practical computation environments for Big Data: Spark (massively parallel data processing) on the Cloud, and analytics workflows, with applications to specific analysis goals in diverse application domains
-       Develop problem-solving skills that are specific to Big Data Analytics

Teaching Methods

Please note that module leaders are reviewing the module teaching and assessment methods for Semester 2 modules, in light of the Covid-19 restrictions. There may also be a few further changes to Semester 1 modules. Final information will be available by the end of August 2020 for Semester 1 modules and the end of October 2020 for Semester 2 modules.

Teaching Activities
Category | Activity | Number | Length | Student Hours | Comment
Guided Independent Study | Assessment preparation and completion | 40 | 1:00 | 40:00 | Independent programming / coursework development & in class test
Scheduled Learning And Teaching Activities | Lecture | 14 | 1:00 | 14:00 | Online / in class sessions. These are “flipped lectures” (see rationale below)
Guided Independent Study | Directed research and reading | 12 | 1:00 | 12:00 | Pre-recorded lectures or other teaching material to watch / listen to ahead of class, with exercises
Scheduled Learning And Teaching Activities | Drop-in/surgery | 10 | 1:00 | 10:00 | Online / in lab time with demonstrators
Guided Independent Study | Independent study | 24 | 1:00 | 24:00 | In proportion to directed study time (2:1) – to prepare for next class
Total | | | | 100:00 |
Teaching Rationale And Relationship

The learning experience is organized into two parts with roughly equal weight:
1.       Theory (50 hours). This part follows the watch-study-engage paradigm. Lectures will be used to introduce the learning material and to demonstrate the key concepts by example. Selected lectures will be pre-recorded to enable the class to be “flipped” during scheduled lecture time. For these lectures, students will be expected to follow the recording ahead of time (Structured Guided Learning) and then engage in Q&A during online / PIP class time.
Students are also expected to address specific topics in depth and independently (Directed research and reading) as part of this.
2.       Practical programming. Workshops are offered to introduce the computational environment(s), as well as weekly Drop-in/Surgery hours to help solve practical problems. The bulk of the time for this part is for independent study and programming.

Assessment Methods

Please note that module leaders are reviewing the module teaching and assessment methods for Semester 2 modules, in light of the Covid-19 restrictions. There may also be a few further changes to Semester 1 modules. Final information will be available by the end of August 2020 for Semester 1 modules and the end of October 2020 for Semester 2 modules.

The format of resits will be determined by the Board of Examiners.

Other Assessment
Description | Semester | When Set | Percentage | Comment
Oral Examination | 2 | M | 50 | Following lectures, students will answer a set of questions at the end of the scheduled lectures.
Prob solv exercises | 2 | M | 50 | Programming exercise consisting of multiple parts
Assessment Rationale And Relationship

The assessment structure is designed to
-       promote a deep understanding of the lecture material through assessed exercises
-       encourage students to engage with one or more programming environments, which may be new to them, and develop practical problem-solving skills to address specific programming challenges

Students will complete a practical programming exercise (50% of total mark), mostly in their own time (Assessment preparation and completion), and a final in-class test (50% of total mark) will take place on the last day of class.

General Notes

N/A

Disclaimer: The information contained within the Module Catalogue relates to the 2021/22 academic year. In accordance with University Terms and Conditions, the University makes all reasonable efforts to deliver the modules as described. Modules may be amended on an annual basis to take account of changing staff expertise, developments in the discipline, the requirements of external bodies and partners, and student feedback. Module information for the 2022/23 entry will be published here in early April 2022. Queries about information in the Module Catalogue should in the first instance be addressed to your School Office.