CSC3633: Reliability and Fault Tolerance (Inactive)
- Inactive for Year: 2019/20
- Module Leader(s): Dr Neil Speirs
- Lecturers: Dr Matthew Collison, Professor Alexander Romanovsky, Professor Tom Anderson
- Owning School: Computing
- Teaching Location: Newcastle City Campus
- Semester 1 Credit Value: 10
Overview of the concepts of reliability, and a systems approach to the design, evaluation, and implementation of fault tolerance in computer systems, exemplified by case studies of present-day systems.
Topics covered in the syllabus include: the need for reliability; system dependability concepts and terminology; fault tolerance principles; error detection and recovery; software and hardware fault tolerance; and case studies of the Mars and Delta-4 systems.
Outline Of Syllabus
Need for reliability: Faults as the sources of unreliability; anticipated and unanticipated faults; fault prevention and fault tolerance approaches to achieving reliability.
System dependability concepts and terminology: failures, errors, design faults and component faults. Fault tolerance: principles, error detection, damage assessment, error recovery, fault treatment; redundancy; triple modular redundancy (TMR) systems; programming with exceptions and exception handlers.
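To illustrate the TMR systems and exception handling listed above, a minimal Python sketch of a majority voter follows. It assumes three software replicas that each return a comparable value; the function name `tmr_vote` is hypothetical and the sketch is not taken from the module materials. A replica that raises an exception simply casts no vote, so a single faulty replica is masked, while disagreement among all three is signalled to the caller as a failure.

```python
from collections import Counter
from typing import Callable, Hashable, Sequence


def tmr_vote(replicas: Sequence[Callable[[], Hashable]]) -> Hashable:
    """Run three replicas and return the value that at least two agree on."""
    if len(replicas) != 3:
        raise ValueError("TMR expects exactly three replicas")
    votes = []
    for replica in replicas:
        try:
            votes.append(replica())
        except Exception:
            # A replica that raises an exception casts no vote.
            continue
    if votes:
        value, count = Counter(votes).most_common(1)[0]
        if count >= 2:
            # Majority found: one faulty replica has been masked.
            return value
    # No two replicas agree, so the fault cannot be masked; signal failure.
    raise RuntimeError(f"TMR vote failed, replica outputs: {votes}")


if __name__ == "__main__":
    print(tmr_vote([lambda: 42, lambda: 42, lambda: 41]))  # prints 42
```

In hardware TMR the voter compares the outputs of three physical modules directly; this version only models the voting logic.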
Error detection: Ideal measures for error detection; replication checks; timing checks; coding checks.
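Coding checks can be illustrated with the simplest code of all, a single even-parity bit. In this sketch (the function names are hypothetical, chosen for illustration) the parity bit is computed when data is stored and recomputed when it is read back. It detects any odd number of flipped bits but misses an even number, which is why stronger codes are used where multi-bit errors are likely.

```python
def parity_bit(data: bytes) -> int:
    """Even-parity check bit: 1 if data holds an odd number of set bits, else 0."""
    ones = sum(bin(byte).count("1") for byte in data)
    return ones % 2


def passes_parity_check(data: bytes, stored_parity: int) -> bool:
    """Recompute parity on read and compare it with the bit stored on write."""
    return parity_bit(data) == stored_parity


if __name__ == "__main__":
    word = b"\x0f"                       # 0b00001111: four set bits
    stored = parity_bit(word)            # 0
    one_flip = bytes([word[0] ^ 0b01])   # 0b00001110: three set bits
    two_flips = bytes([word[0] ^ 0b11])  # 0b00001100: two set bits
    print(passes_parity_check(one_flip, stored))   # False: error detected
    print(passes_parity_check(two_flips, stored))  # True: error missed
```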
Error recovery: Forward and backward error recovery; their advantages and limitations; implementation issues in backward error recovery; co-operating processes and recovery lines.
Software fault tolerance: N-version programming, recovery blocks.
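A recovery block combines backward error recovery with design diversity: a recovery point is established, the primary alternate runs, and an acceptance test checks its result; if the test fails or the alternate raises an exception, the state is restored and the next alternate is tried. The sketch below assumes that state can be checkpointed with `copy.deepcopy`; `recovery_block`, `RecoveryBlockFailure` and the sorting alternates are hypothetical names used only for illustration. N-version programming differs in that all versions run and their results are voted on, rather than being tried in turn.

```python
import copy
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


class RecoveryBlockFailure(Exception):
    """Raised when no alternate produces a result that passes the acceptance test."""


def recovery_block(
    state: T,
    alternates: Sequence[Callable[[T], T]],
    acceptance_test: Callable[[T], bool],
) -> T:
    # Establish a recovery point before any alternate runs.
    checkpoint = copy.deepcopy(state)
    for alternate in alternates:
        try:
            # Each alternate starts from a fresh copy of the checkpoint,
            # which is equivalent to rolling back after a failed attempt.
            candidate = alternate(copy.deepcopy(checkpoint))
        except Exception:
            continue  # an exception counts as a failed alternate
        if acceptance_test(candidate):
            return candidate
    raise RecoveryBlockFailure("all alternates failed the acceptance test")


def faulty_sort(xs: list) -> list:
    xs.reverse()  # deliberately wrong primary, simulating a design fault
    return xs


def backup_sort(xs: list) -> list:
    return sorted(xs)


def is_sorted(xs: list) -> bool:
    # A partial acceptance test: checks ordering only, not that the
    # result is a permutation of the input.
    return all(a <= b for a, b in zip(xs, xs[1:]))


if __name__ == "__main__":
    print(recovery_block([3, 1, 2], [faulty_sort, backup_sort], is_sorted))  # prints [1, 2, 3]
```

The acceptance test here is deliberately weaker than a full correctness check, which reflects the usual trade-off in designing acceptance tests: they must be simpler than the alternates they check.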
Hardware fault tolerance: fault classification and replication strategies; the need for agreement among replicas; evaluation of redundancy requirements.
Case studies: Mars, Delta-4.
| Category | Activity | Number | Length | Student Hours | Comment |
|---|---|---|---|---|---|
| Guided Independent Study | Assessment preparation and completion | 22 | 0:30 | 11:00 | Revision for final exam |
| Scheduled Learning And Teaching Activities | Lecture | 22 | 1:00 | 22:00 | Lectures |
| Scheduled Learning And Teaching Activities | Practical | 11 | 1:00 | 11:00 | Practicals |
| Guided Independent Study | Project work | 1 | 11:00 | 11:00 | Practical coursework |
| Guided Independent Study | Independent study | 22 | 1:00 | 22:00 | Lecture follow-up |
| Guided Independent Study | Independent study | 23 | 1:00 | 23:00 | Background reading |
Teaching Rationale And Relationship
Techniques and theory are presented in lectures. Supervised practicals in a PC cluster room give students experience of writing programs and using PCs, with help available. Further practical work takes place during private study.
The format of resits will be determined by the Board of Examiners.
| Description | Semester | When Set | Percentage | Comment |
|---|---|---|---|---|
| Practical/lab report | 1 | M | 20 | Equivalent of 1000 words |
Assessment Rationale And Relationship
- The mandatory question requires students to demonstrate their understanding of the theories and approaches covered in the module (by solving specific problems), and it also assesses their ability to recognise patterns and relationships between the various components of the module.
- The two questions in Section B tend to go into more depth on specific techniques, and they cover recalling information, summarising facts, comparing approaches, and solving a

The coursework requires students to carry out independent research on one of two suggested topics relevant to reliability and fault tolerance. This matters because independent research is one of the many skills these students will need in their careers.
N.B. This module has both “Exam Assessment” and “Other Assessment” (e.g. coursework). If the total mark for either
assessment falls below 35%, the maximum mark returned for the module will normally be 35%.