A Rollback-Recovery Protocol for Wide Area Pipelined Data Flow Computations (2004)

Author(s): Smith J, Watson P

    Abstract: It is argued that there is a significant class of pipelined, large-grain data flow computations whose wide area distribution and long-running nature suggest a need for fault-tolerance, but for which existing approaches appear either costly or incomplete. An example, which motivated this paper, is the execution of queries over distributed databases. This paper presents an approach that exploits some limited input from the application layer to implement a low-overhead recovery protocol for such data flow computations. Over a large range of possible data flow graphs, the protocol is shown to tolerate a single machine failure per execution of the data flow computation, and in many cases to provide a greater degree of fault-tolerance.

      • Date: April 2004
      • Series Title: School of Computing Science Technical Report Series
      • Pages: 16
      • Institution: School of Computing Science, University of Newcastle upon Tyne
      • Publication type: Report
      • Bibliographic status: Published

      Keywords: data flow, distributed system, fault-tolerance, parallel system, rollback-recovery, wide area


      Professor Paul Watson
      Professor of Computing Science