Module Catalogue 2024/25

CSC8101 : Engineering for AI

  • Offered for Year: 2024/25
  • Module Leader(s): Dr Mutaz Barika
  • Teaching Assistant: Mr Iain Dixon
  • Owning School: Computing
  • Teaching Location: Newcastle City Campus
Semesters

Your programme is made up of credits; the total differs from programme to programme.

Semester 2 Credit Value: 10
ECTS Credits: 5.0
European Credit Transfer System
Pre-requisite

Modules you must have done previously to study this module

Pre Requisite Comment

None

Co-Requisite

Modules you need to take at the same time

Co Requisite Comment

None

Aims

The aim of the module is to introduce students to the combination of data engineering technology and data science that makes it possible to extract valuable knowledge from “Big Data”. Several technical challenges arise from the high volume, high diversity (heterogeneity of meaning and format) and variable quality of the data. A distinction is made between data that is stationary (it resides in a data repository) and data that is in motion (data streaming, as produced for instance by sensors), with further emphasis on graph data structures.
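The difference between stationary data and data in motion can be illustrated with a minimal Python sketch. It is not taken from the module materials: the function names and the sensor values are hypothetical, and it only contrasts computing a mean over a complete data set with updating a mean as each reading arrives.

```python
from typing import Iterable, Iterator


def batch_mean(readings: list[float]) -> float:
    """Data at rest: the whole data set is available, so one pass over it suffices."""
    return sum(readings) / len(readings)


def streaming_mean(readings: Iterable[float]) -> Iterator[float]:
    """Data in motion: emit an updated mean after each arriving reading,
    keeping only a count and a running total in memory."""
    count = 0
    total = 0.0
    for value in readings:
        count += 1
        total += value
        yield total / count


# Hypothetical sensor values, used for illustration only.
sensor_readings = [20.0, 22.0, 21.0, 25.0]

print(batch_mean(sensor_readings))                  # 22.0
print(list(streaming_mean(iter(sensor_readings))))  # [20.0, 21.0, 21.0, 22.0]
```

The streaming version never stores the readings themselves, which is the property that matters when the input is unbounded.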

The module will focus on the following aspects:
- Distribution of data processing over a cluster of computing nodes, hosted in a cloud environment, as a way to
scale out computing resources as the size of the data to be processed increases. This includes current
frameworks for massively parallel data processing, notably Spark, which is the most successful example of
a cloud-based distributed programming platform, and possibly Dask, its direct competitor.
- Examples of algorithms that can be successfully parallelised and are thus able to take advantage of
distributed data architectures.
- Models of computation that enable near-real-time analytics on data streams.
- Specialised data structures, specifically graphs. The module covers the basics of graph databases (Neo4j) but
also massively parallel graph algorithms, i.e., those implemented using the Pregel framework.
- Examples of data science applications, including Machine Learning algorithms, that are enabled by Big Data
technology.
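As an illustration of an algorithm that parallelises well, the following sketch counts words in the MapReduce style using only the Python standard library. It is an assumption-laden toy, not Spark or Dask code: the partitions are plain lists processed one after another, and the names `map_partition`, `merge_counts` and `word_count` are hypothetical. On a real cluster each partition would be mapped on a different node.

```python
from collections import Counter
from functools import reduce


def map_partition(lines: list[str]) -> Counter:
    """Map step: count words within one partition, independently of all others."""
    counts: Counter = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def merge_counts(left: Counter, right: Counter) -> Counter:
    """Reduce step: merging is associative and commutative, so partial
    results can be combined in any order."""
    return left + right


def word_count(partitions: list[list[str]]) -> Counter:
    # Each map_partition call touches only its own partition, so the calls
    # could run on separate workers; here they run sequentially.
    partial_counts = [map_partition(p) for p in partitions]
    return reduce(merge_counts, partial_counts, Counter())


# Two hypothetical partitions of a small text data set.
partitions = [["big data big cluster"], ["data in motion", "big graph"]]
totals = word_count(partitions)
print(totals["big"], totals["data"])  # 3 2
```

The algorithm scales out because the map step needs no communication between partitions and the reduce step does not depend on the order in which partial results arrive.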

Emphasis is also placed on the rapid pace of technology advances in this area, and cutting-edge further reading material is offered for in-depth learning and deep-dives into specific topics.

Outline Of Syllabus

1. Introduction to Data Science and Data Analytics. Scalability, efficiency of parallel processing.
2. Batch Big Data Processing (MapReduce).
3. Computing environments for Big Data Analytics and Machine Learning.
4. Big Data platforms (Databricks), Spark.
5. Data Stream processing: Overview of real-time Event Processing and querying.
6. Graph data processing: Examples of algorithms for graph analytics, graph databases and query languages
(GDBMS), massively parallel graph processing model (Pregel).
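The vertex-centric model named in item 6 can be sketched in plain Python. This is a single-process simulation under stated assumptions, not the Pregel API or any library's implementation: the function name `pregel_min_label` is hypothetical, and the example finds connected components by propagating the smallest vertex id. In each superstep a vertex reads the messages sent to it in the previous superstep, updates its own value, and messages its neighbours only if that value changed.

```python
def pregel_min_label(edges: list[tuple[int, int]]) -> dict[int, int]:
    """Label every vertex with the smallest vertex id in its connected component."""
    # Build undirected adjacency lists.
    neighbours: dict[int, set[int]] = {}
    for u, v in edges:
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)

    # Each vertex starts with its own id as its label.
    label = {v: v for v in neighbours}

    # Superstep 0: every vertex sends its label to its neighbours.
    inbox: dict[int, list[int]] = {v: [] for v in neighbours}
    for v in neighbours:
        for n in neighbours[v]:
            inbox[n].append(label[v])

    # Later supersteps: continue until no messages are in flight.
    while any(inbox.values()):
        outbox: dict[int, list[int]] = {v: [] for v in neighbours}
        for v, messages in inbox.items():
            if not messages:
                continue  # no incoming messages: the vertex stays halted
            smallest = min(messages)
            if smallest < label[v]:
                label[v] = smallest
                for n in neighbours[v]:
                    outbox[n].append(smallest)
        inbox = outbox  # messages are delivered at the next superstep
    return label


# Hypothetical graph with two components: {1, 2, 3} and {4, 5}.
print(pregel_min_label([(1, 2), (2, 3), (4, 5)]))
# {1: 1, 2: 1, 3: 1, 4: 4, 5: 4}
```

Because a vertex only ever sees its own state and its incoming messages, the per-vertex work inside a superstep can be spread across many machines, which is what makes the model massively parallel.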

Learning Outcomes

Intended Knowledge Outcomes

You will:
- Learn fundamental notions of parallel data processing and scalability.
- Understand the challenges associated with processing different types of Big Data (batch, streaming,
graph-structured).
- Learn fundamental concepts in data analytics: Exploratory (EDA) and Predictive (Machine Learning) with case
studies in different application domains.
- Learn to “stay on top” of cutting-edge algorithms and architectures for Scalable Data Engineering, by
discovering and reading selected research literature and providing a critical analysis.

Intended Skill Outcomes

- Learn to use practical computation environments for Big Data: Spark (massively parallel data processing) on
the Cloud, and analytics workflows, with applications to specific analysis goals in diverse application
domains.
- Develop problem-solving skills that are specific to Big Data Analytics.

Teaching Methods

Teaching Activities
Category | Activity | Number | Length | Student Hours | Comment
Guided Independent Study | Assessment preparation and completion | 40 | 1:00 | 40:00 | Independent programming / coursework as practical/lab report (70%) & in class test (30%).
Scheduled Learning And Teaching Activities | Lecture | 14 | 1:00 | 14:00 | In person lectures. Lecture notes and/or previously recorded lectures will be made available, and there is an assumption that students will have familiarized themselves with such material prior to each class. This makes it possible, when appropriate, to “flip the class” and focus on Q&A and exercise
Guided Independent Study | Directed research and reading | 18 | 1:00 | 18:00 | Prep ahead of class using Pre-recorded lectures or other teaching materials.
Scheduled Learning And Teaching Activities | Practical | 10 | 1:00 | 10:00 | Online / in lab time with demonstrators
Guided Independent Study | Independent study | 18 | 1:00 | 18:00 | after class study time
Total | | | | 100:00 |
Teaching Rationale And Relationship

The learning experience is organized into two parts with roughly equal weight:

1. Theory (50 hours). This follows the paradigm: watch-study-engage. Lectures will be used to introduce the
learning material and for demonstrating the key concepts by example. Selected lectures will be pre-recorded
to enable the class to be “flipped” during scheduled lecture time. For these lectures, students will be
expected to follow the recording ahead of time (Structured Guided Learning) and then engage in Q&A during
online / PIP class time.

Students are also expected to address specific topics in depth and independently (Directed research and
reading) as part of this

2. Practical programming (50 hours). Workshops are offered to introduce the computational environment(s), as
well as weekly drop-in/surgery hours to help solve practical problems. The bulk of the time for this part
is for independent study and programming.

Assessment Methods

The format of resits will be determined by the Board of Examiners.

Exams
Description | Length | Semester | When Set | Percentage | Comment
Written Examination | 60 | 2 | M | 30 | This is a written test to be administered during teaching time, normally during the last scheduled class.
Other Assessment
Description | Semester | When Set | Percentage | Comment
Practical/lab report | 2 | M | 70 | The report will document in detail how the student has approached and solved a series of programming exercises set throughout the module. The report will include the code written to solve each problem, with evidence documenting the code in action.
Assessment Rationale And Relationship

The assessment structure is designed to:

- promote a deep understanding of the lecture material through assessed exercises.
- encourage students to engage with one or more programming environments, which may be new to them, and develop
practical problem-solving skills to address specific programming challenges.

Students will complete a practical programming exercise and report on their results (70% of total mark), mostly in their own time (Assessment preparation and completion). A final in-class test (30% of total mark) will take place on the last day of class.

Depending on the available computing resources, the practical assessment exercise may be carried out in small groups (2-3 students). However, each student is expected to submit their own individual report and the assessment is based on individual work. Balance of effort and fairness within the group will be assessed through a short interview with all members of each group.

General Notes

N/A

Disclaimer

The information contained within the Module Catalogue relates to the 2024 academic year.

In accordance with University Terms and Conditions, the University makes all reasonable efforts to deliver the modules as described.

Modules may be amended on an annual basis to take account of changing staff expertise, developments in the discipline, the requirements of external bodies and partners, and student feedback. Module information for the 2025/26 entry will be published here in early April 2025. Queries about information in the Module Catalogue should in the first instance be addressed to your School Office.