WP2 - Architecture Specification: Deliverable D2.1 - Architecture and Components Integration
Chapter 7. Components for fault-tolerance
This section gives a brief description of each component. Only a general overview is given, since fault-tolerance is closely interrelated with the other components. These interactions will be analysed further within the WP6 workpackage; that analysis is still ongoing.
A design tool will be developed to let the user express non-functional features. The first step will be to define a description language for explicitly specifying the timing characteristics and constraints of tasks, possible alternatives for tasks, and the actions to be taken on temporal faults and on errors. The language will make it possible to specify task graphs and will support imprecise computation. A way to specify critical tasks requiring redundancy will also be defined. The tool will gather this information along with information on the mapping requirements of tasks. Ways of specifying modes at task and application level will be considered. One possible result of this language definition work is a UML profile for fault-tolerance. The acquisition tool itself will be developed using a standard GUI programming tool.
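As an illustration only, the kind of information such a description language would have to carry can be sketched as a small data structure. The sketch below is not the OCERA language: every name in it (`TaskSpec`, `FaultAction`, the field names, the validation rules) is a hypothetical choice made for this example, and the consistency checks are assumptions about what a design tool might reasonably verify.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class FaultAction(Enum):
    """Hypothetical actions to take on a temporal fault or an error."""
    IGNORE = "ignore"
    SWITCH_TO_ALTERNATIVE = "switch_to_alternative"
    STOP_TASK = "stop_task"


@dataclass
class TaskSpec:
    """Illustrative non-functional description of one task (all fields assumed)."""
    name: str
    period_ms: int
    wcet_ms: int                       # worst-case execution time
    deadline_ms: int
    critical: bool = False             # critical tasks require redundancy
    replicas: int = 1
    alternatives: List[str] = field(default_factory=list)
    predecessors: List[str] = field(default_factory=list)  # task-graph edges
    on_deadline_miss: FaultAction = FaultAction.IGNORE

    def validate(self) -> List[str]:
        """Return a list of human-readable consistency problems (empty if none)."""
        problems = []
        if self.wcet_ms > self.deadline_ms:
            problems.append(f"{self.name}: WCET exceeds deadline")
        if self.critical and self.replicas < 2:
            problems.append(f"{self.name}: critical task needs at least 2 replicas")
        if (self.on_deadline_miss is FaultAction.SWITCH_TO_ALTERNATIVE
                and not self.alternatives):
            problems.append(f"{self.name}: no alternative to switch to")
        return problems


if __name__ == "__main__":
    spec = TaskSpec("ctrl", period_ms=10, wcet_ms=3, deadline_ms=10, critical=True)
    for problem in spec.validate():
        print(problem)
```

A design tool built along these lines could reject an inconsistent specification before the building tool ever sees it; whether the actual language performs such checks is not stated in this deliverable.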
The building tool will use the information gathered by the design tool to configure the OCERA platform. It will provide task information to the kernel scheduler and the QoS Scheduler, instantiate the FT mechanisms (AFT monitor and FT controllers), and adapt task code. Once redundancy is tackled, it will also produce code for task duplication and checkpointing mechanisms.
The AFT monitor will be a module that stores task information and defines reconfiguration strategies. It will collect information from both the RT Scheduler and the QoS Scheduler, and it might also receive alarms from applications. It will decide on reconfiguration and act on the RT Scheduler by stopping tasks and on the QoS Scheduler by giving it a new task set (which may include tasks that are already running).
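The decision step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the OCERA design: the class `AftMonitor`, the record `TaskInfo`, and the single strategy used here (replace a faulty task by one declared alternative, if any) are all hypothetical. The returned pair stands for the two outputs named in the text: the tasks to stop, intended for the RT Scheduler, and the new task set, intended for the QoS Scheduler.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass
class TaskInfo:
    """Hypothetical stored task information."""
    name: str
    alternative: str = ""   # assumed degraded-mode replacement, "" if none


class AftMonitor:
    """Illustrative monitor: stores task information and decides reconfigurations."""

    def __init__(self, tasks: List[TaskInfo], running: Set[str]):
        self._tasks = {t.name: t for t in tasks}
        self._running = set(running)

    def on_fault(self, task_name: str) -> Tuple[List[str], List[str]]:
        """Return (tasks_to_stop, new_task_set) after a fault report.

        The new task set deliberately includes tasks that are already running.
        """
        if task_name not in self._running:
            # Nothing to reconfigure: the task is unknown or already stopped.
            return [], sorted(self._running)
        self._running.discard(task_name)
        info = self._tasks.get(task_name)
        if info is not None and info.alternative:
            self._running.add(info.alternative)
        return [task_name], sorted(self._running)


if __name__ == "__main__":
    monitor = AftMonitor(
        [TaskInfo("video", "video_lowres"), TaskInfo("video_lowres"), TaskInfo("audio")],
        running={"video", "audio"},
    )
    print(monitor.on_fault("video"))
```

Keeping the decision separate from the two schedulers means the strategy can change without touching either of them, which matches the monitor's role as the place where reconfiguration strategies are defined.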
The FT controller will be the low-level module that transfers FT information from the RT level to the user level and activates emergency actions when required. In connection with the RT Kernel, it will detect fail-silent situations through watchdogs and transmit deadline misses to the Application FT monitor.
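Watchdog-based detection of fail-silent tasks, together with the forwarding of deadline misses, can be sketched as below. This is an assumption-laden illustration: `FtController`, its method names, and the event labels are hypothetical, and time is passed in explicitly as a millisecond value so that the example stays deterministic. A real implementation would sit inside the RT kernel and use its timers.

```python
from typing import Callable, Dict, List


class FtController:
    """Illustrative low-level controller forwarding FT events to a monitor."""

    def __init__(self, notify_monitor: Callable[[str, str], None]):
        # notify_monitor(task, kind) stands in for the RT-to-user-level channel.
        self._notify = notify_monitor
        self._timeout_ms: Dict[str, int] = {}
        self._last_kick_ms: Dict[str, int] = {}

    def register(self, task: str, timeout_ms: int, now_ms: int) -> None:
        """Arm a watchdog for a task."""
        self._timeout_ms[task] = timeout_ms
        self._last_kick_ms[task] = now_ms

    def kick(self, task: str, now_ms: int) -> None:
        """Called by a live task to reset its watchdog."""
        self._last_kick_ms[task] = now_ms

    def report_deadline_miss(self, task: str) -> None:
        """Forward a deadline miss detected at the RT level."""
        self._notify(task, "deadline_miss")

    def poll(self, now_ms: int) -> List[str]:
        """Report every task whose watchdog has expired as fail-silent."""
        silent = [
            task for task, last in self._last_kick_ms.items()
            if now_ms - last > self._timeout_ms[task]
        ]
        for task in silent:
            self._notify(task, "fail_silent")
        return silent


if __name__ == "__main__":
    events = []
    controller = FtController(lambda task, kind: events.append((task, kind)))
    controller.register("sensor", timeout_ms=50, now_ms=0)
    controller.register("actuator", timeout_ms=50, now_ms=0)
    controller.kick("sensor", now_ms=40)
    controller.poll(now_ms=60)
    print(events)
```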
The task redundancy manager will monitor the redundancy of a cluster of replicas and decide when to activate or deactivate a replica.
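The text does not say which policy the redundancy manager will follow. Purely as an example, the sketch below assumes one simple policy: keep a fixed number of replicas active and promote a spare whenever an active replica fails. The names `RedundancyManager`, `target_active`, and `on_replica_failure` are hypothetical.

```python
from typing import List, Optional


class RedundancyManager:
    """Illustrative manager for one cluster of replicas (policy is assumed)."""

    def __init__(self, replicas: List[str], target_active: int):
        # The first target_active replicas start active; the rest are spares.
        self._active = list(replicas[:target_active])
        self._spare = list(replicas[target_active:])

    @property
    def active(self) -> List[str]:
        """Return a copy of the currently active replicas."""
        return list(self._active)

    def on_replica_failure(self, replica: str) -> Optional[str]:
        """Drop a failed replica; return the spare activated in its place, if any."""
        if replica in self._spare:
            self._spare.remove(replica)   # a failed spare needs no replacement
            return None
        if replica not in self._active:
            return None                   # unknown or already removed
        self._active.remove(replica)
        if self._spare:
            promoted = self._spare.pop(0)
            self._active.append(promoted)
            return promoted
        return None


if __name__ == "__main__":
    manager = RedundancyManager(["r1", "r2", "r3"], target_active=2)
    print(manager.on_replica_failure("r1"), manager.active)
```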
The task replica manager is a low-level module in charge of the synchronization and communication of replicas during checkpointing. It will also detect possible communication failures.