Module

CSC8101 : Big Data Analytics

  • Offered for Year: 2020/21
  • Module Leader(s): Professor Paolo Missier
  • Owning School: Computing
  • Teaching Location: Newcastle City Campus
Semesters
Semester 2 Credit Value: 10
ECTS Credits: 5.0

Aims

The module introduces students to the combination of data engineering technology and data science that makes it possible to extract valuable knowledge from “Big Data”. The technical challenges stem from the high volume, high diversity (heterogeneity of meaning and format) and variable quality of the data. A distinction is made between stationary data (held in a data repository) and data in motion (data streams, as produced for instance by sensors).
The module emphasises the following aspects:
-       Distribution of data processing over multiple nodes, e.g. in a cloud environment, as a way to scale up computing resources as the size of the data to be processed increases. This includes current frameworks for massively parallel data processing, such as Spark, with practical programming examples and challenges for the students.
-       Examples of algorithms that can be successfully parallelised and can therefore take advantage of distributed data architectures.
-       Specialised data structures, specifically graphs, with the corresponding graph databases and parallel graph algorithms.
-       Examples of data science applications, including Machine Learning algorithms, that are enabled by Big Data technology.
Emphasis is also placed on the rapid pace of technological advance in this area, and cutting-edge further reading is offered for in-depth study of specific topics.

Outline Of Syllabus

1.       Introduction to Data Science and Data Analytics. Scalability, efficiency of parallel processing.
2.       Batch Big Data Processing (MapReduce)
3.       Computing environments for Big Data Analytics and Machine Learning:
o       Spark
o       Workflows (Knime)
4.       Data stream processing: overview of real-time event processing and querying
5.       Graph data processing
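
As an illustration of the batch processing pattern named in item 2, the following is a minimal single-machine sketch of a MapReduce-style word count in plain Python. It is not taken from the module materials: the function names (map_phase, shuffle, reduce_phase, word_count) and the sample data are hypothetical, and the "partitions" are simulated by ordinary lists. In a framework such as Spark, the map and reduce steps would run in parallel on different nodes.

```python
from collections import defaultdict


def map_phase(partition):
    """Emit a (word, 1) pair for every word in one partition of text lines."""
    return [(word.lower(), 1) for line in partition for word in line.split()]


def shuffle(pairs):
    """Group the emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Combine the values of each key; summation is associative, so it parallelises."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(partitions):
    """Run map over every partition, then shuffle and reduce the combined output."""
    mapped = [pair for part in partitions for pair in map_phase(part)]
    return reduce_phase(shuffle(mapped))


# Hypothetical input: two partitions, each standing in for data held on one node.
partitions = [
    ["big data big"],
    ["data streams", "big graphs"],
]
counts = word_count(partitions)
print(counts)
```

Each partition is mapped independently of the others, which is what allows the work to be spread over several nodes; only the shuffle step needs to bring together values that share a key.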

Teaching Methods

Please note that module leaders are reviewing the module teaching and assessment methods for Semester 2 modules, in light of the Covid-19 restrictions. There may also be a few further changes to Semester 1 modules. Final information will be available by the end of August 2020 for Semester 1 modules and the end of October 2020 for Semester 2 modules.

Teaching Activities
Category | Activity | Number | Length | Student Hours | Comment
Structured Guided Learning | Lecture materials | 15 | 1:00 | 15:00 | 5 hrs / week lecture time. Async online
Guided Independent Study | Assessment preparation and completion | 30 | 1:00 | 30:00 | independent programming using cloud / online resources - (Azure) location independent
Guided Independent Study | Assessment preparation and completion | 5 | 1:00 | 5:00 | preparing a recorded presentation (5') from the "directed research and reading" activity
Guided Independent Study | Directed research and reading | 10 | 1:00 | 10:00 | reading in-depth material (e.g. research papers)
Scheduled Learning And Teaching Activities | Workshops | 4 | 1:00 | 4:00 | Synchronous online - intro to programming environments
Scheduled Learning And Teaching Activities | Drop-in/surgery | 3 | 2:00 | 6:00 | May include multiple media, whiteboard, computer monitor etc. 1 session PiP, 2 sessions sync online
Guided Independent Study | Independent study | 28 | 1:00 | 28:00 | in proportion to lecture time (2:1)
Scheduled Learning And Teaching Activities | Module talk | 2 | 1:00 | 2:00 | Synchronous online module talk with lecturer
Total |  |  |  | 100:00 |

Teaching Rationale And Relationship

Lectures will be used to introduce the learning material and to demonstrate the key concepts by example.
Students are then expected to address specific topics in depth and independently (Directed research and reading). They will be required to prepare a 5-minute recorded presentation (Assessment preparation and completion).
Students will complete a practical programming exercise mostly in their own time (Assessment preparation and completion). Workshops are offered to introduce the computational environment(s), and weekly Drop-in/Surgery hours help to solve practical problems. These will be held PiP, but only up to the 4-hour limit, with further hours offered synchronously online.

Assessment Methods

The format of resits will be determined by the Board of Examiners.

Other Assessment
Description | Semester | When Set | Percentage | Comment
Oral Examination | 2 | M | 10 | A presentation. Following independent reading, students will submit a recorded presentation (5').
Computer assessment | 2 | M | 90 | Programming exercise: Spark or workflow-based programming.

Assessment Rationale And Relationship

The assessment structure is designed to:
-       encourage students to engage with the theory and methods (algorithms, data architectures) by reading research papers and giving a short presentation that critically appraises one selected paper (or compares two, for example);
-       encourage students to develop problem-solving skills, learn practical aspects of data analytics and appreciate its many challenges by engaging with one or more programming environments.
