WP2 - Architecture Specification: Deliverable D2.1 - Architecture and Components Integration
Chapter 7. Components for fault-tolerance
The design tool will help the user with the:
Specification of tasks
Specification of real-time constraints
Specification of fault-tolerance constraints
The building tool will make it possible to:
Configure OCERA components
Instantiate FT components
Produce additional tasks and/or wrap user code
The specific fault-tolerance components to be developed within the first period of the project are distributed across the user level and the kernel level.
The Application FT-monitor will decide on the dynamic reconfiguration of tasks in abnormal situations. It will receive error notifications from the kernel level and load information from the QoS scheduler. Depending on the situation, it will apply degraded-mode management policies in order to keep the system in a safe state. To do so, it will ask the kernel level to stop certain tasks and ask the QoS scheduler to reschedule the new task configuration. A dynamic change of behavior can also be triggered by the application when it detects alarm conditions on certain tasks.
The first implementation will provide emergency stop
The second implementation will provide degraded-mode management
The third implementation will provide dynamic mode management depending on load
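The decision logic of the Application FT-monitor across the three staged implementations can be sketched in C as follows. This is a minimal illustration, not the OCERA interface: the names `ft_mode`, `ft_task`, `ft_monitor_decide`, `ft_monitor_apply` and the 90% load threshold are all hypothetical, and stopping a task here only clears a flag where the real component would send a request to the kernel level and then ask the QoS scheduler to reschedule.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical operating modes selectable by the FT-monitor. */
typedef enum { MODE_NORMAL, MODE_DEGRADED, MODE_EMERGENCY_STOP } ft_mode;

/* Hypothetical task descriptor: only what the policy needs. */
typedef struct {
    bool critical;   /* must keep running in degraded mode */
    bool running;    /* current state as seen by the monitor */
} ft_task;

/* Hypothetical load threshold (percent) above which non-critical work is shed. */
#define FT_LOAD_THRESHOLD 90u

/* Choose a mode after an error notification from the kernel level,
 * using the load reported by the QoS scheduler. */
ft_mode ft_monitor_decide(bool error_on_critical, unsigned load_pct)
{
    if (error_on_critical)
        return MODE_EMERGENCY_STOP;      /* safety can no longer be guaranteed */
    if (load_pct > FT_LOAD_THRESHOLD)
        return MODE_DEGRADED;            /* keep only critical tasks */
    return MODE_NORMAL;
}

/* Apply a mode to the task set and return how many tasks were stopped.
 * Each stop stands in for a request to the kernel level. */
size_t ft_monitor_apply(ft_mode mode, ft_task *tasks, size_t n)
{
    assert(tasks != NULL || n == 0);
    size_t stopped = 0;
    for (size_t i = 0; i < n; i++) {
        bool stop = (mode == MODE_EMERGENCY_STOP) ||
                    (mode == MODE_DEGRADED && !tasks[i].critical);
        if (stop && tasks[i].running) {
            tasks[i].running = false;
            stopped++;
        }
    }
    return stopped;
}
```

With a critical task and two non-critical tasks under high load, the monitor would enter degraded mode and stop the two non-critical tasks; a later error on the critical task would escalate to an emergency stop.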
The FT controller will collect low-level information on task progress and monitor task liveness. It will signal any abnormal behavior it detects and may activate emergency tasks.
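One common way to monitor liveness is a heartbeat counter, shown below as a minimal C sketch. The text does not specify the mechanism, so this technique and the names `ft_liveness`, `ft_beat` and `ft_check_alive` are assumptions: each task increments its counter at progress points, and the FT controller periodically checks whether the counter has moved, declaring the task not alive after `max_missed` consecutive checks without progress.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-task liveness record shared with the FT controller. */
typedef struct {
    unsigned long heartbeat;   /* incremented by the task at each progress point */
    unsigned long last_seen;   /* heartbeat value observed at the previous check */
    unsigned missed;           /* consecutive checks without progress */
} ft_liveness;

/* Hook called by the monitored task when it makes progress. */
void ft_beat(ft_liveness *l)
{
    assert(l != NULL);
    l->heartbeat++;
}

/* Called periodically by the FT controller. Returns false once the task
 * has shown no progress for max_missed consecutive checks. */
bool ft_check_alive(ft_liveness *l, unsigned max_missed)
{
    assert(l != NULL);
    if (l->heartbeat != l->last_seen) {
        l->last_seen = l->heartbeat;
        l->missed = 0;
        return true;
    }
    l->missed++;
    return l->missed < max_missed;
}
```

Because the task and the controller run concurrently, a real implementation would need atomic or otherwise synchronized access to `heartbeat`; the sketch leaves that out for brevity.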
In this first step, active collaboration with the development of scheduling and QoS components in WP4 and WP6 will be necessary.
WP4 will provide error and deadline-miss signaling at kernel level
WP5 will provide the high-level dynamic scheduling needed to implement dynamic reconfiguration. It will also provide information about anticipated scheduling misses, so that dynamic reconfiguration can reduce the load.
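The two kinds of information above can be illustrated with a short C sketch. It assumes, which the text does not state, periodic tasks with deadlines equal to their periods scheduled by EDF on a single processor; under that assumption the task set is schedulable exactly when the total utilization $U = \sum_i C_i / T_i$ does not exceed 1, so $U > 1$ anticipates deadline misses. The names `periodic_task`, `deadline_missed`, `utilization` and `anticipated_miss` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical periodic task model. */
typedef struct {
    double wcet;     /* worst-case execution time C_i */
    double period;   /* period T_i, assumed equal to the relative deadline */
} periodic_task;

/* Kernel-level check: a job misses its deadline if it finishes
 * after its release time plus its relative deadline. */
bool deadline_missed(double release, double rel_deadline, double finish)
{
    return finish > release + rel_deadline;
}

/* Total utilization U = sum of C_i / T_i. */
double utilization(const periodic_task *ts, size_t n)
{
    assert(ts != NULL || n == 0);
    double u = 0.0;
    for (size_t i = 0; i < n; i++) {
        assert(ts[i].period > 0.0);
        u += ts[i].wcet / ts[i].period;
    }
    return u;
}

/* Under EDF on one processor with implicit deadlines, U > 1 means some
 * deadlines will be missed: a cue for the FT-monitor to shed load. */
bool anticipated_miss(const periodic_task *ts, size_t n)
{
    return utilization(ts, n) > 1.0;
}
```

The first check reports a miss after the fact, while the second lets the FT-monitor react before any miss occurs, which is the distinction drawn between the WP4 and WP5 contributions.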
The following functionalities will be added to the design tool:
Specification of fault-tolerance constraints for redundancy
The building tool will be extended to support the implementation of distributed redundant tasks. It will also instantiate communication controllers, to be developed jointly with the OCERA communication components
Code generation for redundancy management will be added
The Application FT-monitor will be enriched, and a new component, the Task Redundancy Manager, will be added
A fourth implementation will provide dynamic redundancy management (activation or deactivation of redundancy for certain tasks depending on workload)
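Dynamic redundancy management could take the form of a greedy policy, sketched below in C under assumptions that are not in the text: each task's replica adds a known load, the spare capacity is a single number, and tasks are sorted by decreasing criticality. Replicas are activated in that order whenever they still fit, so a later, less critical task may be replicated when a more critical one did not fit. `red_task` and `adapt_redundancy` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-task redundancy descriptor. */
typedef struct {
    double replica_load;   /* extra utilization if this task's replica is active */
    bool replicated;       /* output: is the replica active? */
} red_task;

/* Greedy first-fit policy. Tasks are assumed sorted by decreasing
 * criticality. Replicas that fit in the remaining spare capacity are
 * activated; all others are deactivated. Returns the active count. */
size_t adapt_redundancy(red_task *ts, size_t n, double spare)
{
    assert(ts != NULL || n == 0);
    size_t active = 0;
    for (size_t i = 0; i < n; i++) {
        if (ts[i].replica_load <= spare) {
            spare -= ts[i].replica_load;
            ts[i].replicated = true;
            active++;
        } else {
            ts[i].replicated = false;
        }
    }
    return active;
}
```

Calling the function again whenever the reported workload changes gives the activation and deactivation behavior described for the fourth implementation.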
Task Redundancy Manager
The Task Redundancy Manager will supervise redundancy management. It will synchronize, activate and deactivate replicas. Passive redundancy will be implemented.
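With passive redundancy, one replica (the primary) executes while the others (backups) stay inactive and are promoted only when the primary fails. The C sketch below shows that promotion step only; `replica_group`, `current_primary`, `on_replica_failure` and the limit of four replicas are hypothetical, and resuming the promoted backup from its last checkpoint is indicated by a comment rather than implemented.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { REPLICA_PRIMARY, REPLICA_BACKUP, REPLICA_FAILED } replica_state;

#define MAX_REPLICAS 4   /* hypothetical upper bound */

/* Hypothetical view of one task's replicas held by the manager. */
typedef struct {
    replica_state state[MAX_REPLICAS];
    size_t count;
} replica_group;

/* Index of the current primary, or -1 if there is none. */
int current_primary(const replica_group *g)
{
    assert(g != NULL);
    for (size_t i = 0; i < g->count; i++)
        if (g->state[i] == REPLICA_PRIMARY)
            return (int)i;
    return -1;
}

/* Mark replica idx as failed. If it was the primary, promote the first
 * remaining backup. Returns the index of the primary afterwards, or -1
 * if no replica can take over. */
int on_replica_failure(replica_group *g, size_t idx)
{
    assert(g != NULL && idx < g->count);
    int was_primary = (g->state[idx] == REPLICA_PRIMARY);
    g->state[idx] = REPLICA_FAILED;
    if (!was_primary)
        return current_primary(g);
    for (size_t i = 0; i < g->count; i++) {
        if (g->state[i] == REPLICA_BACKUP) {
            /* The promoted backup would resume from its last checkpoint. */
            g->state[i] = REPLICA_PRIMARY;
            return (int)i;
        }
    }
    return -1;
}
```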
The FT controller will collect low-level information on task progress and monitor task liveness. It will also provide reflexive information on the tasks that implement fault-tolerance mechanisms.
This component will locally monitor the interactions between a local replica of a task and the Task Redundancy Manager. It will perform checkpointing and the local activation/deactivation of a replica.
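The local checkpointing and activation duties can be sketched in C as follows, assuming (the text does not say) that a task's state fits in a fixed-size byte buffer and that checkpoints carry a sequence number so a passive replica can discard stale or duplicated ones. The types `checkpoint` and `local_replica`, the functions operating on them and `STATE_SIZE` are hypothetical; transporting checkpoints between nodes is outside the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define STATE_SIZE 64   /* hypothetical size of a task's state */

typedef struct {
    unsigned long seq;                 /* checkpoint sequence number */
    unsigned char data[STATE_SIZE];    /* copy of the task state */
} checkpoint;

typedef struct {
    bool active;                       /* is this local replica executing? */
    unsigned long last_seq;            /* last checkpoint taken or applied */
    unsigned char state[STATE_SIZE];   /* the task's state */
} local_replica;

/* On the active replica: capture the current state in a new checkpoint. */
void take_checkpoint(local_replica *r, checkpoint *cp)
{
    assert(r != NULL && cp != NULL && r->active);
    cp->seq = ++r->last_seq;
    memcpy(cp->data, r->state, STATE_SIZE);
}

/* On a passive replica: install a received checkpoint unless it is stale.
 * Returns true if the checkpoint was applied. */
bool apply_checkpoint(local_replica *r, const checkpoint *cp)
{
    assert(r != NULL && cp != NULL);
    if (r->active || cp->seq <= r->last_seq)
        return false;
    memcpy(r->state, cp->data, STATE_SIZE);
    r->last_seq = cp->seq;
    return true;
}

/* Local activation/deactivation; an activated replica resumes from the
 * state installed by its last applied checkpoint. */
void activate_replica(local_replica *r)   { assert(r != NULL); r->active = true; }
void deactivate_replica(local_replica *r) { assert(r != NULL); r->active = false; }
```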
The implementation of redundancy mechanisms requires that data synchronisation over a distributed network can be achieved. This implies either a specific design approach (Time Triggered Systems) or the implementation of specific distributed synchronisation protocols between replicas.
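The time-triggered option can be illustrated with a small C sketch. It assumes a clock already synchronized across nodes, which is the hard part and is not shown: every replica samples its inputs at the same global instants `k * period`, and each node transmits only in its own TDMA slot, so replicas start every cycle from identical data without running an agreement protocol. The functions `next_sampling_instant` and `next_tx_slot` and the microsecond time base are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Next global sampling instant strictly after now_us. With synchronized
 * clocks, all replicas compute the same instant. */
uint64_t next_sampling_instant(uint64_t now_us, uint64_t period_us)
{
    assert(period_us > 0);
    return (now_us / period_us + 1) * period_us;
}

/* TDMA round of `nodes` slots of slot_us each: return the start of the
 * next slot owned by `node` at or after now_us. */
uint64_t next_tx_slot(uint64_t now_us, uint64_t slot_us,
                      unsigned nodes, unsigned node)
{
    assert(slot_us > 0 && node < nodes);
    uint64_t round = slot_us * nodes;
    uint64_t start = (now_us / round) * round + (uint64_t)node * slot_us;
    return start >= now_us ? start : start + round;
}
```

The alternative named in the text, explicit synchronization protocols between replicas, would replace these fixed instants with message exchanges, at the cost of less predictable timing.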
In this second step, active collaboration with the development of the communication components in WP7 will be necessary, in addition to the existing collaboration with the other components. In particular, fail-safe communications will have to be implemented, and a checkpointing algorithm will have to be chosen and implemented. A temporal synchronization model will have to be defined in collaboration with WP4.