EPSRC Centre for Doctoral Training Cloud Computing for Big Data

People

Jack Waudby

I completed my undergraduate studies in Economics & Mathematics at Lancaster University in 2015 and my Masters in Statistics in 2016, again at Lancaster University. Outside the CDT, I love skiing, playing football and watching the Montréal Canadiens.

PhD title

Transaction Management for Distributed Graph Databases

Graph databases provide an expressive data model and have a wide range of use cases, e.g. recommendation engines, fraud detection and master data management. Challenges arise when you try to scale a graph database, because partitioning a graph across multiple machines is non-trivial. A common approach to placing a graph on a cluster of machines is to construct a balanced edge-cut, in which vertices are evenly assigned to machines and the number of edges spanning machines is minimized. However, there is always a non-negligible number of these distributed edges. From a data management perspective, maintaining the consistency of information regarding distributed edges becomes a challenge as records in the database are being concurrently modified.
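To make the idea of a balanced edge-cut concrete, here is a minimal sketch in Python. The toy graph, the two placements and the helper names (edge_cut, partition_sizes) are all hypothetical and chosen purely for illustration; they are not taken from any particular graph database or partitioning tool.

```python
from collections import Counter

def edge_cut(edges, assignment):
    """Return the edges whose two endpoints are placed on different machines."""
    return [(u, v) for u, v in edges if assignment[u] != assignment[v]]

def partition_sizes(assignment):
    """Count how many vertices each machine holds."""
    return Counter(assignment.values())

# Hypothetical toy graph: two triangles joined by a single edge.
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]

# Balanced placement on machines 0 and 1 that keeps each triangle together.
good = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
# Equally balanced placement that splits both triangles.
poor = {"a": 0, "b": 1, "c": 0, "d": 1, "e": 0, "f": 1}

distributed_good = edge_cut(edges, good)
distributed_poor = edge_cut(edges, poor)

print(partition_sizes(good), len(distributed_good))
print(partition_sizes(poor), len(distributed_poor))
```

Both placements put three vertices on each machine, yet they differ in how many edges span machines. Even the better placement leaves one distributed edge, which is the kind of edge whose consistency has to be maintained under concurrent updates.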

An approach to architecting a distributed graph database is to use an existing scalable database as the storage backend and adapt it with a graph layer and a graph-like query language, e.g. JanusGraph. The graph database then inherits the (lack of) transactional and data replication semantics (often eventual consistency) from the underlying store. The presence of distributed edges makes multi-partition transactional guarantees a necessity, and it has been shown that eschewing ACID transactions in a distributed graph database can quickly lead to database corruption. My research explores the optimal approach to managing transactions in a distributed graph database given its common workloads and unique characteristics.

Proprietary graph databases have been fundamental to the success of companies such as Google, Facebook and Twitter. Our aim is to develop a general-purpose approach to managing graph transactions that maintains graph integrity and offers good performance. This will allow wider society to harness the power of graphs to extract value from its data.

The CDT programme will help me achieve my research goals. It will provide a supportive environment that fosters discussion and collaboration, as well as access to leading industry partners working on real-world problems.

Supervisors

Paul Ezhilchelvan, Jim Webber