Digital Institute

Staff Profile

Professor Paolo Missier

Professor of Big Data Analytics


I am Professor of Scalable Data Analytics with the School of Computing at Newcastle University and currently a Fellow (2018-2020) of the Alan Turing Institute, the UK's National Institute for Data Science and Artificial Intelligence.

With a background in traditional databases and data management, I have published over 150 peer-reviewed research articles with contributions in the areas of Data and Information Quality, Web Semantics, workflow-based infrastructure for e-science, and data provenance. I was involved in the specification of the W3C PROV data model for provenance (2011-2013), where I was co-editor of the main recommendation documents.

My self-curated list of publications is here

My work has been funded over the years by EPSRC, NIHR-BRC, Newton Fund UK, Microsoft Azure for Research, and the Turing Institute. Funded projects include Cloud-e-Genome on scalable processing of genomics pipelines on the cloud (NIHR-BRC), ReComp on optimising analytics pipelines in response to changes in data (EPSRC), P4@NU (towards Personalised, Participatory, Preventive, Precision Medicine), and more.
My more recent research portfolio addresses the challenges and opportunities of Applied Data Science and Machine Learning for Health, with contributions on the search for "digital markers" from self-monitoring devices, and recent work on predicting respiratory crises in acute COVID patients.

My interests are also expanding to the management and exploitation of data provenance in data science pipelines, and to the study of algorithmic fairness.

I started my career as a Research Scientist at Bell Communications Research, USA (1994-2001), and was then a Research Fellow at the University of Manchester, School of Computer Science (2004-2011), where I received my PhD in 2008.

At Newcastle I lead the School of Computing's post-graduate academic teaching on Big Data Analytics. 
I am Sr. Associate Editor for the ACM Journal on Data and Information Quality (JDIQ).


- References in this text are linked through and also available as a list from


Current research focus: 

Data Science and Engineering for Health [16,17]
- Methods for predicting and preventing age-related diseases through Machine Learning
- Personalised disease trajectories
- Discovery of digital biomarkers from self-monitoring devices (wearables)

Other research interests, current and past:
  • Provenance of data and processes [1,2,3,4,5,6]. I have also been involved in the specification of the W3C PROV data model for provenance (2011-2013).
  • Optimisation of algorithmic fairness [7]
  • ReComp: preserving value from large-scale data analytics over time through selective re-computation [invited talk]. [8,9,3,5]
  • Social media analytics (Twitter) to help health authorities combat Zika and Dengue epidemics [10]
  • Enabling trust-less and fair marketplaces for data streams using blockchain technology [11,12]
  • Implementing efficient and cost-effective genomics data processing pipelines using workflow technology on the Cloud (funded project: Cloud-eGenome, 2013-2015, PI, MRC/NIHR) [13]
  • Data and Information Quality. During my PhD I proposed the notion of Quality Views, a semantics-based method for semi-automatically adding data quality control to scientific workflows [14,15]


Post-graduate (MSc) teaching: Big Data Analytics (CSC8101)