CSC8631 : Data Management and Exploratory Data Analysis

Semester 1 Credit Value: 10
ECTS Credits: 5.0


Data handling and characterisation are a central part of Data Science. With a deluge of data, it can be critical to dissect and explore data in a systematic way. Doing so leads to reliable results, using methods that can be reproduced, refined and interrogated by technical and non-technical stakeholders.

This module aims to explore the principles of data management and rigorous experimental design. Furthermore, we introduce the underlying technologies and computational tools that are required to support best practice in this area.

Specifically, the module aims to equip the students with the following knowledge and skills:
•       To understand the principles of the scientific method and how it is applied in computational analyses
•       To understand methods of data characterisation and data processing
•       To understand the principles of knowledge representation and constructing data models
•       To understand the technologies that support analysis pipelines
•       To understand end-to-end system design for Data Science

Outline Of Syllabus

1.       Scientific method in computational analyses
2.       The software lifecycle
3.       The data lifecycle
4.       Variable characterisation and experimental design
5.       Exploratory data analysis
6.       Semantics and knowledge representation
7.       ETL (extract, transform and load) process and data warehousing
8.       System design, microservices and workflows
9.       Developing data products

Teaching Methods

Teaching Activities
Category | Activity | Number | Length | Student Hours | Comment
Guided Independent Study | Assessment preparation and completion | 1 | 0:30 | 0:30 | Oral Examination
Guided Independent Study | Assessment preparation and completion | 5 | 0:30 | 2:30 | Preparation for oral examination
Guided Independent Study | Assessment preparation and completion | 1 | 2:00 | 2:00 | Project presentations session
Guided Independent Study | Assessment preparation and completion | 27 | 1:00 | 27:00 | Coursework project
Guided Independent Study | Assessment preparation and completion | 3 | 1:00 | 3:00 | Preparation for oral presentation
Guided Independent Study | Assessment preparation and completion | 5 | 1:00 | 5:00 | Background reading
Guided Independent Study | Assessment preparation and completion | 20 | 1:00 | 20:00 | Lecture follow-up
Scheduled Learning And Teaching Activities | Lecture | 20 | 1:00 | 20:00 | Lectures
Scheduled Learning And Teaching Activities | Practical | 20 | 1:00 | 20:00 | Practical sessions
Teaching Rationale And Relationship

Lectures explain the underpinning principles for the module and technologies that support data management and exploratory data analysis. Lectures are complemented by supervised practical sessions to guide the application of these principles using suitable computational tools. The practical work builds up experience working with a computational toolset that is used to complete a substantive project working with data from a real-world context.

Assessment Methods

The format of resits will be determined by the Board of Examiners.

Description | Length (mins) | Semester | When Set | Percentage | Comment
Oral Presentation | 15 | 1 | M | 20 | Presentation of the methods and results from the coursework project. Presentation length: 15 mins
Other Assessment
Description | Semester | When Set | Percentage | Comment
Report | 1 | M | 80 | Extended technical project. Word count: up to 2,000 words
Zero Weighted Pass/Fail Assessments
Description | When Set | Comment
Oral Examination | M | A structured discussion including a software demonstration and reflection on the key learning objectives of the coursework project.
Assessment Rationale And Relationship

The report tests the students’ ability to apply data management techniques in a reproducible manner, using effective tools and methods to solve a real-world challenge. The presentation assesses the students’ ability to communicate their findings and approach. The semi-structured interview facilitates a reflective discussion about how individual students have met the learning objectives of the module and how the principles of data management and exploratory data analysis are embedded in the functionality of their software artefact.
