Fault Tolerance in Decentralised Systems (Keynote Speech) (1999)

Author(s): Randell B

    Abstract: In a decentralised system the problems of fault tolerance, and in particular error recovery, vary greatly depending on the design assumptions. For example, in a distributed database system, if one disregards the possibility of undetected invalid inputs or outputs, the errors that have to be recovered from will affect only the database, and backward error recovery will be feasible and should suffice. Such a system typically supports a set of activities that compete for access to a shared database but are otherwise essentially independent of each other; in such circumstances, conventional database transaction processing and distributed protocols enable backward recovery to be provided very effectively. In more general systems, however, the multiple activities will often not simply be competing against each other, but will at times be attempting to co-operate with each other in pursuit of some common goal. Moreover, the activities in decentralised systems typically involve not just computers but also external entities that are not capable of backward error recovery. Such additional complications make the task of error recovery more challenging, and indeed more interesting.
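    The abstract contrasts state that can be rolled back (the shared database) with external entities that cannot. A minimal Python sketch of that contrast follows. It assumes a single in-memory store and treats any raised exception as a detected error. `Database`, `run_with_backward_recovery`, `transfer` and `external_log` are hypothetical names, not taken from the paper. A real system would also need the concurrency control and distributed commit protocols the abstract mentions.

```python
import copy


class Database:
    """Hypothetical shared store whose state can be checkpointed and restored."""

    def __init__(self, initial):
        self.state = dict(initial)

    def checkpoint(self):
        # Recovery point: a full, independent copy of the current state.
        return copy.deepcopy(self.state)

    def restore(self, saved):
        # Backward recovery: discard everything done since the recovery point.
        self.state = saved


def run_with_backward_recovery(db, activity):
    """Run activity(db); on a detected error, roll db back and report failure."""
    saved = db.checkpoint()
    try:
        activity(db)
        return True
    except Exception:
        db.restore(saved)
        return False


# Stand-in for an external entity (e.g. a printer or a human operator).
# It keeps whatever it has been sent, because the database rollback cannot reach it.
external_log = []


def transfer(db):
    db.state["a"] -= 50
    external_log.append("notice: 50 debited from a")  # irreversible external effect
    db.state["b"] += 50
    if db.state["a"] < 0:
        raise ValueError("invariant violated: negative balance")


db = Database({"a": 30, "b": 0})
ok = run_with_backward_recovery(db, transfer)
print(ok, db.state, external_log)
# False {'a': 30, 'b': 0} ['notice: 50 debited from a']
```

    After the failed transfer, the database is back at its recovery point, but the notice already sent to the external entity remains. Under these assumptions, this is the situation in which backward recovery alone no longer suffices.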

      • Date: 21-23 March 1999
      • Conference Name: Proceedings of the Fourth International Symposium on Autonomous Decentralized Systems (ISADS)
      • Pages: 174-179
      • Publisher: IEEE Computer Society
      • Publication type: Conference Proceedings (inc. abstract)
      • Bibliographic status: Published

      Staff: Professor Brian Randell, Emeritus Professor and Senior Research Investigator