Applying Low-Overhead Rollback-Recovery to Wide Area Distributed Query Processing (2004)

Author(s): Smith J, Watson P

    Abstract: It is argued that there is a significant class of pipelined large grain data flow computations whose wide area distribution and long running nature suggest a need for fault-tolerance, but for which existing approaches appear either costly or incomplete. This paper presents an approach which exploits limited input from the application layer to implement a low overhead recovery protocol for such data flow computations. Over a large range of possible data flow graphs, the protocol supports tolerance of a single machine failure, per execution of the computation, and in many cases a greater degree of fault-tolerance. The protocol is implemented within an emulation of a distributed query processing system. Preliminary performance measurements suggest that the overhead is indeed low.

      • Date: October 2004
      • Series Title: School of Computing Science Technical Report Series
      • Pages: 15
      • Institution: School of Computing Science, University of Newcastle upon Tyne
      • Publication type: Report
      • Bibliographic status: Published

      Keywords: data flow, fault-tolerance, measurement, query processing, rollback-recovery, wide area

