EPSRC Centre for Doctoral Training in Cloud Computing for Big Data


Antonia Kontaratou

PhD title

Scalable Bayesian Hierarchical modelling with application in genomics

Bayesian Hierarchical modelling is a powerful technique for analysing and interpreting large and complex data sets. However, the Markov Chain Monte Carlo algorithms used to fit these models are computationally intensive and difficult to parallelise.
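To illustrate why Markov Chain Monte Carlo is hard to parallelise, here is a minimal sketch of a random-walk Metropolis sampler in Scala. It is a toy example and not the project's method: the model (normally distributed data with unit variance and a flat prior on the mean), the object name MetropolisSketch, and the parameters seed and stepSize are all assumptions made for illustration.

```scala
import scala.util.Random

// Hypothetical toy example: random-walk Metropolis for the mean of
// normally distributed data with known unit variance and a flat prior.
object MetropolisSketch {

  // Log-posterior of the mean, up to an additive constant.
  // This sum over the data is the part that grows with data size.
  def logPosterior(mu: Double, data: Vector[Double]): Double =
    data.map(y => -0.5 * (y - mu) * (y - mu)).sum

  // One Metropolis update: propose a move, then accept or reject it.
  def step(current: Double, data: Vector[Double], rng: Random, stepSize: Double): Double = {
    val proposal = current + stepSize * rng.nextGaussian()
    val logAccept = logPosterior(proposal, data) - logPosterior(current, data)
    if (math.log(rng.nextDouble()) < logAccept) proposal else current
  }

  // Each state is computed from the previous one, so the iterations
  // themselves must run in order. The chain values are immutable; only
  // the random number generator carries mutable state.
  def chain(init: Double, data: Vector[Double], n: Int,
            seed: Long = 42L, stepSize: Double = 0.5): Vector[Double] = {
    val rng = new Random(seed)
    Iterator.iterate(init)(mu => step(mu, data, rng, stepSize)).take(n).toVector
  }
}
```

Calling MetropolisSketch.chain(0.0, data, 1000) returns 1000 states starting from the initial value. In a sketch like this, the sequential dependence sits in the chain itself, while the sum over the data inside logPosterior is the kind of computation that could in principle be spread across workers.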

Our aim is to develop new methods for fitting Bayesian Hierarchical models that scale better and apply to a wider range of problems. We are using Apache Spark, a fast engine for large-scale data processing, together with functional programming. We write in Scala because its emphasis on immutable data makes parallelism easier.

We will test the efficiency of these methods on budding yeast (Saccharomyces cerevisiae) genome data provided by the Institute for Cell and Molecular Biosciences at Newcastle University. Complex Bayesian Hierarchical modelling is needed to identify interactions between genes related to the strength of telomere capping.

The new methods for fitting Bayesian Hierarchical models to large volumes of data will contribute to the scientific understanding of genetic interactions and can also be applied to modelling a variety of other problems.


Darren Wilkinson