CSC3622 : Reliability and Fault Tolerance

  • Offered for Year: 2020/21
  • Module Leader(s): Dr Stephen Riddle
  • Owning School: Computing
  • Teaching Location: Newcastle City Campus
Semesters
Semester 2 Credit Value: 10
ECTS Credits: 5.0

Aims

Overview of the concepts of reliability, and a systems approach to the design,
evaluation, and implementation of fault tolerance in computer systems,
exemplified by case studies of present-day systems.

Topics covered in the syllabus include: the need for reliability; system dependability concepts and terminology; fault tolerance principles; error detection and recovery; software and hardware fault tolerance; and case studies from Mars and Delta-4.

Outline Of Syllabus

Need for reliability: Faults as the sources of unreliability; anticipated and unanticipated faults; fault prevention and fault tolerance approaches to achieving reliability.
System dependability concepts and terminology: failures, errors, design and component faults.
Fault tolerance: principles, error detection, damage assessment, error recovery, fault treatment; redundancy; triple modular redundancy (TMR) systems; programming with exceptions and exception handlers.
Error detection: ideal measures for error detection; replication checks; timing checks; coding checks.
Error recovery: forward and backward error recovery; their advantages and limitations; implementation issues in backward error recovery; co-operating processes and recovery lines.
Software fault tolerance: N-version programming, recovery blocks.
Hardware fault tolerance: fault classification and replication strategies; need for agreement among replicas; evaluation of redundancy requirements.
Case studies: Mars, Delta-4.

Teaching Methods

Please note that module leaders are reviewing the teaching and assessment methods for Semester 2 modules in light of the Covid-19 restrictions. There may also be a few further changes to Semester 1 modules. Final information will be available by the end of August 2020 for Semester 1 modules and by the end of October 2020 for Semester 2 modules.

Teaching Activities
Category | Activity | Number | Length | Student Hours | Comment
Structured Guided Learning | Lecture materials | 18 | 1:00 | 18:00 | 9x2 hrs recorded lecture material per week
Scheduled Learning And Teaching Activities | Workshops | 8 | 0:30 | 4:00 | 8x 30 min synchronous online discussions
Guided Independent Study | Project work | 20 | 2:00 | 40:00 | Coursework
Guided Independent Study | Independent study | 18 | 1:00 | 18:00 | Lecture follow-up
Guided Independent Study | Independent study | 20 | 1:00 | 20:00 | Background reading
Total | | | | 100:00 |
Teaching Rationale And Relationship

Techniques and theory are presented in recorded lectures. Structured discussions aid the comprehension of recorded material. Further practical work takes place during the private study hours.

Assessment Methods

The format of resits will be determined by the Board of Examiners.

Other Assessment
Description | Semester | When Set | Percentage | Comment
Report | 2 | M | 100 | Research report on a given topic. 2,000 words max.
Assessment Rationale And Relationship

The coursework includes questions on specific problems, to assess students' ability to recognise patterns between module components, and requires students to carry out independent research on one of two suggested topics relevant to reliability and fault tolerance. Independent research is one of the many skills these students will need in their careers.
