EPSRC Centre for Doctoral Training in Cloud Computing for Big Data


Chris Johnson

Distributed stream processing systems (DSPSs) are a class of applications that process large amounts of data in real time, with the processing distributed over a cluster of machines. Apache Storm, Flink, Heron and Spark Streaming are examples of DSPSs.

DSPSs are an important tool in data analytics, and their usage will only grow as the volume of data collected increases and the demand for faster processing rises.

Knowing a priori the exact number of machines needed for each stage of the processing pipeline can be a challenge. Current practice is either to react to bottlenecks as they occur or to over-provision, resulting in degraded service or wasted resources in the form of under-utilised machines. Even if a topology meets its throughput requirements, some use cases (such as credit card fraud detection) have strict latency requirements that must also be met.

My research aims to create a model that will predict the ideal cluster size for running a given topology, given throughput and latency requirements. This would allow data engineers to quickly tune their topologies to what is required and sufficient, bypassing a tedious trial-and-error process, or, when paired with a forecasting model, to pre-emptively scale a topology to meet future demand.

My research builds on earlier work carried out within the CDT that produced a proof-of-concept model for Apache Storm based on queuing theory. I aim to extend this by generalising the model to other streaming systems and by investigating how it may need to be modified under more complicated circumstances, such as co-location of executors with other applications, and operators such as windowing and joins.
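To give a flavour of the queuing-theory approach, the sketch below sizes a single operator under a strong simplifying assumption: each replica is modelled as an independent M/M/1 queue, so a replica with service rate μ receiving λ tuples per second has a mean sojourn time of 1/(μ − λ). The function name, parameters and the M/M/1 simplification are illustrative assumptions, not the model developed in the earlier CDT work.

```python
import math

def required_replicas(arrival_rate, service_rate, latency_target):
    """Smallest parallelism k for one operator such that, splitting
    arrival_rate evenly over k replicas each modelled as an M/M/1
    queue, the mean sojourn time 1 / (service_rate - arrival_rate/k)
    meets latency_target.

    Rates are in tuples per second; latency_target is in seconds.
    (Illustrative sketch only: real DSPS operators are not M/M/1.)
    """
    # Stability alone requires arrival_rate / k < service_rate,
    # so start the search at the smallest stable parallelism.
    k = max(1, math.ceil(arrival_rate / service_rate))
    while True:
        per_replica_rate = arrival_rate / k
        if per_replica_rate < service_rate:
            sojourn = 1.0 / (service_rate - per_replica_rate)
            if sojourn <= latency_target:
                return k
        k += 1

# e.g. 900 tuples/s arriving, each replica serving 100 tuples/s,
# target mean latency at this operator of 50 ms:
required_replicas(900, 100, 0.05)  # -> 12
```

Note that the latency target, not stability, drives the answer here: 10 replicas would keep the queues stable, but only 12 bring the mean sojourn time under 50 ms. A full model must compose such estimates across every stage of the topology.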


Paul Ezhilchelvan, Paul Watson, Isi Mitrani