Reader in Large-Scale Information Management


Bio and main research profile.
I am a Reader in Large-Scale Information Management (roughly equivalent to Associate Professor if you are not from the UK) with the School of Computing at Newcastle University, with 20+ years of experience in CS research, development, and research management.
The broad goal of my research is to understand the role of metadata, most notably data provenance [7], in making sense of the underlying (big) data as well as improving and optimising the processes that produce and extract added value from the data (i.e. through “big data” analytics). I call this metadata analytics. I have been leading (as Principal Investigator) the ReComp project (2016-2019, EPSRC), which focuses on preserving value from large-scale data analytics over time through selective re-computation, and in which the challenge of collecting provenance metadata and extracting value from it through analytics techniques is central to the research. [invited talk]
I am also interested in the role of provenance in making experimental science more reproducible [11,3,13], in helping track scientific data assets as a way to incentivise scientists to share their data in an Open Science setting (Data Trajectories: a research agenda) [4], and in the automatic creation of views over provenance to facilitate limited-trust data exchange [8] (funded projects: Trusted Dynamic Coalitions, PI, 2012-2013, EPSRC; CEM-DIT: Communication and Trust in Emergencies, Co-I, 2017-2019, ONRG).
I was also involved in the specification of the W3C PROV data model for provenance (2011-2013), which follows the Open Provenance Model [15], and I contributed to the main recommendation documents [12,14].
Additional research.

My other research interests are centred around (large-scale) information management:
  • Social media analytics (Twitter) to help health authorities combat Zika and Dengue epidemics [5]
  • Enabling trust-less and fair marketplaces for “personal” IoT data streams using blockchain technology [2]
  • Real-time multi-source data analytics to predict rainfall events and mitigate their impact (funded project: Flood-PREPARED, 2017-2021, Co-I, NERC)
  • Implementing efficient and cost-effective genomics data processing pipelines using workflow technology on the Cloud (funded project: Cloud-eGenome, 2013-2015, PI, MRC/NIHR) [6]
  • Online active learning for Human Activity Recognition [9]
  • Analysing the effect of cognitive load on car drivers [10]
  • Data and Information Quality. During my PhD I proposed the notion of Quality Views [16,17], a semantics-based method for semi-automatically adding data quality control to scientific workflows. I am currently Sr. Associate Editor for the ACM Journal on Data and Information Quality (JDIQ) 


I am responsible for our School’s post-graduate academic teaching on Big Data Analytics (CSC8101), offered through various MSc programs (CS, Advanced CS, Cloud Computing, ...), and for coordinating the School's new curriculum on Data Science.

Students interested in projects (UG/PGT/PGR) should look here.

UG teaching: CSC2024 - Database Technology (stage 2)