EPSRC Centre for Doctoral Training in Cloud Computing for Big Data

People

Thomas Cooper

PhD title

Performance modelling of distributed stream processing topologies

Research area

How do we cope with huge amounts of incoming data?

Rather than running a single script on a laptop, we spread the load across many machines. We break the processing into multiple stages and replicate those stages, so that the slow parts of the process can have more copies than the fast parts.

Apache Storm and Apache Heron are examples of distributed stream processing systems. They provide a framework to create and run distributed real-time processing pipelines (topologies). Without shutting the topology down, they allow you to:

  • change the number of copies of each operator (stage) in your topology
  • change the number of machines they are run across.

Changing the number of operator copies and machines in this way is called scaling. Scaling affects how fast a topology runs (latency) and how much data it can process (throughput).
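
As a rough illustration of why the number of copies matters, the sketch below treats a topology as a simple chain of stages and estimates its throughput as that of its slowest stage. The stage names and rates are invented for the example, and the model is a deliberate simplification (one output per input, evenly balanced load). It is not the prediction system described on this page.

```python
def topology_throughput(stages):
    """Estimate the throughput (tuples/second) of a linear pipeline.

    `stages` maps a stage name to (copies, tuples_per_second_per_copy).
    Simplifying assumptions: every stage emits one tuple per input tuple
    and load is spread evenly across copies, so the slowest stage limits
    the whole pipeline.
    """
    return min(copies * rate for copies, rate in stages.values())


# Hypothetical topology: "enrich" is the slow stage.
before = {"parse": (1, 1000), "enrich": (1, 250), "store": (1, 500)}

# Scaled topology: the slower stages get more copies than the fast one.
after = {"parse": (1, 1000), "enrich": (4, 250), "store": (2, 500)}

print(topology_throughput(before))  # 250
print(topology_throughput(after))   # 1000
```

Even in this toy model, adding copies to the wrong stage changes nothing, which hints at why choosing what to scale is hard on a real system.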

Currently, the decision of which operators to scale, and by how much, is left to human operators. This often means we use trial and error to get a topology to the correct “size” to meet an expected workload. This can take many days for more complex topologies.

My PhD research aims to provide a way to predict how changes to a running topology will affect its performance. With a prediction system, a human (or a machine) could choose the best-performing option before having to change the running system. This would save time and resources.

My main focus is on creating a prediction system for Apache Storm. However, I recently completed a 4-month internship with Twitter, where I created a prediction system for their in-house version of Apache Heron.

Supervisor

Paul Ezhilchelvan

Publications

Proactive scaling of distributed stream processing work flows using workload modelling: doctoral symposium - Cooper, T. - 10th ACM International Conference on Distributed and Event-based Systems - 2016

A Queuing Model of a Stream-Processing Server - Cooper, T. Ezhilchelvan, P. Mitrani, I. - IEEE 27th International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS) - 2019

Caladrius: A Performance Modelling Service for Distributed Stream Processing Systems - Cooper, T. Kalim, F. Wu, H. Li, Y. Wang, N. Lu, N. Fu, Qian, M. Luo, H. Cheng, D. Wang, Y. Dai, F. Ghosh, M. Wang, B. - IEEE 35th International Conference on Data Engineering (ICDE) - April 2019