

CME8124 : Big Data Analytics in the Process Industries

  • Offered for Year: 2022/23
  • Module Leader(s): Dr Chris O'Malley
  • Lecturer: Dr Jie Zhang
  • Owning School: Engineering
  • Teaching Location: Newcastle City Campus
  • Semester 1 Credit Value: 10
  • ECTS Credits: 5.0


To introduce students to a variety of data analysis techniques that can be used for modelling and analysing the large datasets ("big data") typically encountered in the process industries.

There are many cases in the process industries where it is not possible to run designed experiments and use the resulting data to enhance process understanding. Quite often the only data available are those collected directly from the process through the routine monitoring and control of process variables on plant. These data are often not in a suitable format for subsequent modelling and often contain outliers, missing values and mistakes arising from, for example, transcription errors or badly calibrated instruments.
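As an illustration of the kind of pre-screening described above, the following is a minimal sketch rather than module material. The module does not specify any software; the function names, the sample readings and the 3.5 threshold on the modified z-score are assumptions made for this example. It flags outliers with a median-based (MAD) rule, then fills flagged and missing points with the last valid reading.

```python
import math
import statistics


def is_missing(value):
    """Treat None and NaN as missing readings."""
    return value is None or math.isnan(value)


def flag_outliers(values, threshold=3.5):
    """Flag points whose modified z-score exceeds the threshold.

    The modified z-score uses the median and the median absolute
    deviation (MAD), so a single gross error does not distort the
    estimate of spread. Missing readings are ignored and never flagged.
    """
    present = [v for v in values if not is_missing(v)]
    med = statistics.median(present)
    mad = statistics.median(abs(v - med) for v in present)
    flags = []
    for v in values:
        if is_missing(v) or mad == 0:
            flags.append(False)
        else:
            flags.append(abs(0.6745 * (v - med) / mad) > threshold)
    return flags


def forward_fill(values):
    """Replace each missing reading with the last valid one before it."""
    filled, last = [], None
    for v in values:
        if not is_missing(v):
            last = v
        filled.append(last)
    return filled


# Hypothetical temperature log: one missing value and one reading that
# looks like a transcription error (710.0 instead of roughly 71.0).
readings = [71.2, 70.9, None, 71.4, 710.0, 71.1, 70.8]

flags = flag_outliers(readings)
screened = [None if bad else v for v, bad in zip(readings, flags)]
cleaned = forward_fill(screened)

print(flags)    # [False, False, False, False, True, False, False]
print(cleaned)  # [71.2, 70.9, 70.9, 71.4, 71.4, 71.1, 70.8]
```

Forward filling is only one of several possible treatments; whether to fill, interpolate or discard a flagged point depends on the process and on how the data will be used.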

This module aims to introduce students to tools and techniques for working with this type of data, and to show how to extract meaningful relationships from plant data that can subsequently be used to enhance process understanding and to develop data-driven models for process monitoring and prediction. In recent years this has attracted considerable interest in sectors such as bioprocessing, and it forms a key part of the concept of Quality by Design (QbD).

Outline Of Syllabus

Multivariate Data Analysis:

  • Introduction: what problems can be addressed using these techniques
  • Preliminary data analysis: handling of inhomogeneous data (missing data, outliers, noisy data, time alignment); graphical procedures
  • Dimensionality reduction (Principal Component Analysis)
  • Modelling techniques: multiple linear regression, principal component regression, projection to latent structures
  • Multivariate statistical performance monitoring of continuous and batch processes
  • Model simplification
  • Analysis of variance
  • Confidence intervals
  • Non-linear modelling techniques
  • Machine learning techniques
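To give a flavour of the dimensionality reduction topic, the following is a minimal sketch of extracting the first principal component of a small dataset. It is not taken from the module materials: the function name, the two-variable sample data and the use of power iteration on the sample covariance matrix are assumptions for this example (statistical packages normally use a singular value decomposition). The data are mean-centred but not scaled to unit variance.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (loadings, explained_fraction, scores) for the first PC.

    rows is a list of observations, each a list of variable values.
    Power iteration is used for simplicity; it assumes the starting
    vector is not orthogonal to the dominant eigenvector.
    """
    n, m = len(rows), len(rows[0])

    # Mean-centre each variable (column).
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    centred = [[r[j] - means[j] for j in range(m)] for r in rows]

    # Sample covariance matrix (m x m).
    cov = [[sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(m)]
           for i in range(m)]

    # Power iteration: repeatedly multiply by cov and renormalise.
    v = [1.0 / math.sqrt(m)] * m
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(m)) for i in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]

    # Rayleigh quotient gives the variance captured along v.
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(m))
                     for i in range(m))
    total_variance = sum(cov[i][i] for i in range(m))

    # Scores: projection of each centred observation onto the loadings.
    scores = [sum(c[j] * v[j] for j in range(m)) for c in centred]
    return v, eigenvalue / total_variance, scores


# Hypothetical data: two strongly correlated process variables.
data = [[1.0, 2.0], [2.0, 4.1], [3.0, 5.9], [4.0, 8.2], [5.0, 9.9]]
loadings, explained, scores = first_principal_component(data)

print("loadings:", [round(x, 3) for x in loadings])
print("fraction of variance explained:", round(explained, 4))
```

Because the two variables move almost in lockstep, a single component captures nearly all of the variance, which is the situation in which replacing many correlated plant measurements with a few scores is useful.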

Teaching Methods

Teaching Activities
Category | Activity | Number | Length | Student Hours | Comment
Scheduled Learning And Teaching Activities | Lecture | 16 | 1:00 | 16:00 | Present in Person
Guided Independent Study | Assessment preparation and completion | 1 | 10:00 | 10:00 | Problem Solving Exercise, formative assessment on pre-treatment of data
Guided Independent Study | Assessment preparation and completion | 1 | 30:00 | 30:00 | Problem Solving Exercise 2 and subsequent writing up in report format - summative assessment
Scheduled Learning And Teaching Activities | Small group teaching | 6 | 2:00 | 12:00 | Numerical practice sessions - Computing Labs
Guided Independent Study | Independent study | 1 | 32:00 | 32:00 | Review lecture material and prepare for small group teaching
Teaching Rationale And Relationship

Lectures convey the statistical concepts and theory and their application in process engineering. Numerical practice sessions support the learning introduced in lectures by giving students the opportunity to apply the concepts to a number of problems of varying complexity. The numerical practice sessions also allow students to complete some of the assignment work.

Assessment Methods

The format of resits will be determined by the Board of Examiners.

Other Assessment
Description | Semester | When Set | Percentage | Comment
Computer assessment | 1 | M | 100 | Assessed report - Process Data Modelling (set Week 6) - 2000 words
Zero Weighted Pass/Fail Assessments
Description | When Set | Comment
Computer assessment | M | Pass/Fail formative report on pre-screening of data
Assessment Rationale And Relationship

Assignments allow engineering problems to be set and solved using computer software. They also provide the opportunity for the key skills listed above to be assessed and implemented. The formative assessment will run as a lead-in to the summative assessment and will be used to assess the students' comprehension of the techniques discussed in the lectures while they prepare the data for subsequent analysis.

Reading Lists