Architecture of an ISO17356 standards compliant operating system



Introduction

OSEK/VDX(TM) is a joint project of the automotive industry that aims to define an industry standard for an open-ended architecture for distributed control units in vehicles.

The objective of the standard is to describe an environment which supports efficient utilization of resources for automotive control unit application software. The standard can be viewed as a set of APIs for a real-time operating system (OSEK) integrated with a network management system (VDX); together they describe the characteristics of a distributed environment that can be used for developing automotive applications.

The typical applications to be implemented have tight real-time constraints and high criticality (for example, a power-train application). Moreover, these applications are produced in a huge number of units, so the memory footprint must be reduced to a minimum while keeping OS performance as high as possible.

The following key features help characterize the philosophy that drove the main architectural choices of the operating system definition:

  • Scalability - The operating system is intended for use on a wide range of control units, including systems with minimal hardware resources (RAM, ROM, CPU time) such as 8-bit microcontrollers. To support this range, the standard defines four conformance classes that tightly specify the main features of an OS. Note that memory protection is not supported at all.

  • Portability of software - The standard specifies an ISO/ANSI-C interface between the application and the operating system that is identical in all implementations of the OS. The aim of this interface is to allow application software to be moved from one ECU to another without major changes to the application source code. Because of the wide variety of hardware on which the OS has to run, the standard does not specify any interface for the Input/Output subsystem. Note that this reduces (if not prohibits) the portability of the application source code, since the I/O system is one of the main software parts that shape the architecture of the software. The prime focus is therefore not to achieve 100% compatibility between application modules, but to ease their direct portability between compliant operating systems.

  • Configurability - Another prerequisite for adapting the OS to a wide range of hardware is a high degree of modularity and configurability. This configurability is reflected in the toolchain proposed by the OSEK standard, in which configuration tools help the designer tune the system services and the system footprint. Moreover, a language called OIL (OSEK Implementation Language) is proposed for describing configuration information in a standardized way.

  • Statically allocated OS - All OS objects and features are statically allocated. This simplifies the complete OS: the number of application tasks, resources and requested services is defined at compile time. Note that this approach eases the implementation of an OS capable of running from ROM, and that it is completely different from the dynamic approach followed in other OS standards such as POSIX.

  • Support for time triggered architectures - The OSEK standard provides the specification of OSEKTime OS, a time triggered OS that can be fully integrated in the OSEK/VDX(TM) framework.

In the following sections the main features of the OSEK/VDX(TM) standard will be analysed in detail.

OS

Architecture of the operating system

The architecture on which an ISO17356 standards compliant operating system is based can be viewed as a traditional fixed priority approach.

Each task in the system can be a basic task (BT) or an extended task (ET) (extended tasks are basic tasks that can react to external asynchronous events).

Every task in the system has a fixed priority (statically assigned at compile time), and the scheduler always selects the highest priority task from the ready task queue. Interrupt service routines typically pre-empt the running task (except when the running task holds a resource whose ceiling is at least the priority of the ISR; see Resource Management).

To support different feature sets in the operating system, the requirements of the application in terms of number of tasks, memory consumption and the like are grouped into four conformance classes. The compliance of an OSEK OS is always stated with respect to one conformance class. Basically, conformance classes exist to allow partial implementations of the standard along pre-defined lines, creating an upgrade path from classes of lesser functionality to classes of higher functionality with no change to the application tasks.

The conformance classes specify different requirements for the following attributes:
  • Multiple requesting of task activation (only one activation or more than one)
  • Task types (basic tasks only or basic and extended tasks)
  • Number of tasks per priority (one or more than one)

The following conformance classes are defined by the standard:
  • BCC1 - Only basic tasks, limited to one activation request per task and one task per priority (all tasks have different priorities).
  • BCC2 - Like BCC1, plus more than one activation request per task and more than one task per priority.
  • ECC1 - Like BCC1, plus extended tasks.
  • ECC2 - Like ECC1, plus more than one task per priority and multiple requesting of task activation allowed for basic tasks.
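The class definitions above can be read as a small decision rule. The following is only a sketch under stated assumptions: `TaskCfg` and `minimal_class` are hypothetical names invented for illustration and are not part of the OSEK API. It derives the smallest conformance class that covers a statically described task set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { BCC1, BCC2, ECC1, ECC2 } ConformanceClass;

/* Hypothetical static description of one task (not an OSEK type). */
typedef struct {
    int  priority;         /* fixed priority, 0 = lowest */
    bool extended;         /* true if the task may wait for events */
    int  max_activations;  /* queued activation requests allowed */
} TaskCfg;

/* Returns the smallest conformance class that covers the task set. */
ConformanceClass minimal_class(const TaskCfg *tasks, size_t n)
{
    bool any_extended = false;
    bool needs_level2 = false;

    assert(tasks != NULL && n > 0);
    for (size_t i = 0; i < n; i++) {
        if (tasks[i].extended)
            any_extended = true;
        else if (tasks[i].max_activations > 1)
            needs_level2 = true;      /* multiple activation of a basic task */
        for (size_t j = 0; j < i; j++)
            if (tasks[j].priority == tasks[i].priority)
                needs_level2 = true;  /* more than one task per priority */
    }
    if (any_extended)
        return needs_level2 ? ECC2 : ECC1;
    return needs_level2 ? BCC2 : BCC1;
}
```

For example, two basic tasks with distinct priorities and single activation map to BCC1, while the same set with one extended task maps to ECC1.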

Task Management

In the OSEK OS, a task provides the framework for the concurrent and asynchronous execution of functions. The Scheduler is then responsible for scheduling tasks following a well defined scheduling algorithm.

The OSEK operating system provides two kinds of tasks: basic tasks and extended tasks. The only difference between the two is that extended tasks are allowed to use the operating system call WaitEvent(). That call allows an extended task to release the CPU while waiting for an asynchronous event, without terminating the current instance.

Each task in the system has a fixed priority (statically assigned at compile time; the value 0 is defined as the lowest priority of a task), and it can be pre-emptive or non pre-emptive. If the running task is pre-emptive, the scheduler may pre-empt it whenever needed; otherwise the system is rescheduled only at the end of the running task instance. A pre-emptive task can temporarily disable pre-emption by locking a resource called RES_SCHEDULER.

At any moment of its life a task is characterized by its state. The ISO17356 standard defines four task states:

running

In the running state, the CPU is assigned to the task, so that its instructions can be executed. Only one task can be in this state at any point in time, while all the other states can be adopted simultaneously by several tasks.

ready

All functional prerequisites for a transition into the running state exist, and the task only waits for allocation of the processor. The scheduler decides which ready task is executed next.

waiting

A task cannot continue execution because it has to wait for at least one event. Only extended tasks can be in this state (because they are the only tasks that may use events).

suspended

In the suspended state the task is passive and can be activated.

Note that basic tasks have no waiting state: a basic task has synchronization points only at its beginning and at its end. Application parts with internal synchronization points have to be implemented by more than one basic task. An advantage of extended tasks is that they can handle a coherent job in a single task, no matter which synchronization requests are active. Whenever information needed for further processing is missing, the extended task switches to the waiting state. It leaves this state when the corresponding events signal the receipt or the update of the desired data.

Depending on the conformance class, a basic task can be activated once or multiple times. The latter means that an activation issued while the task is not in the suspended state is recorded and then executed when the task finishes its current instance.

The termination of a task instance only occurs when a task terminates itself (to simplify the OS, no explicit task kill primitives are provided).
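The four states and the moves between them can be sketched as a small state machine. This is an illustrative model only: the transition names and the function `apply_transition` are assumptions made for this sketch, and queued multiple activations are deliberately not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { SUSPENDED, READY, RUNNING, WAITING } TaskState;

/* Transition names chosen for this sketch. */
typedef enum { ACTIVATE, START, PREEMPT, WAIT, RELEASE, TERMINATE } Transition;

/* Applies t to *state; returns false (state unchanged) if t is illegal. */
bool apply_transition(TaskState *state, Transition t, bool is_extended)
{
    assert(state != NULL);
    switch (t) {
    case ACTIVATE:   /* suspended -> ready */
        if (*state != SUSPENDED) return false;
        *state = READY;
        return true;
    case START:      /* ready -> running, decided by the scheduler */
        if (*state != READY) return false;
        *state = RUNNING;
        return true;
    case PREEMPT:    /* running -> ready */
        if (*state != RUNNING) return false;
        *state = READY;
        return true;
    case WAIT:       /* running -> waiting, extended tasks only */
        if (*state != RUNNING || !is_extended) return false;
        *state = WAITING;
        return true;
    case RELEASE:    /* waiting -> ready, an awaited event was set */
        if (*state != WAITING) return false;
        *state = READY;
        return true;
    case TERMINATE:  /* running -> suspended, the task terminates itself */
        if (*state != RUNNING) return false;
        *state = SUSPENDED;
        return true;
    }
    return false;
}
```

Note that the only way back to the suspended state is TERMINATE from the running state, which mirrors the absence of a task kill primitive.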

Application modes and system start-up

The OSEK operating system supports Application Modes. In real applications, an embedded system may execute different applications in a mutually exclusive way (for example, normal operation, a factory test, and so on). Application modes are a means of structuring the software running in the system according to those different conditions, and a clean mechanism for developing totally separate systems. Once the operating system has been started, the application mode cannot be changed. Typically each application mode uses its own subset of tasks, ISRs, alarms and timing conditions, although some sharing between modes is possible.

Start-up performance is another safety critical issue for embedded systems in automotive applications, since reset conditions may occur during normal operation (for example, a power-train application should be capable of rebooting the whole system in a few microseconds, because the system must safely control the spark on the engine cylinders). System start-up is left entirely to the particular implementation, although some hints are given on how to design the boot-up sequence. In any case, the standard suggests avoiding lengthy or complicated start-up procedures.

Interrupt processing

Since the standard must be suitable for different microcontrollers, the specification of interrupt handling routines only covers the general approach that a compliant OS should follow, without addressing any hardware related issues.

In particular, the standard provides two kinds of ISR handlers:

  • ISR category 1 - The ISR does not use any operating system service. In practice, the OS does not handle these interrupts, and the designer is free to write the handler, with the only restriction that it cannot call any OS service. Typically, these are the fastest, highest priority interrupts.
  • ISR category 2 - The ISR is handled by the system, so OS services can be called from the handler.

Inside any ISR no rescheduling will take place. Rescheduling takes place on termination of an ISR of category 2 if a pre-emptable task has been interrupted and no other interrupt is active. No rescheduling takes place at the end of an ISR of category 1 either, which is why ISRs of category 1 should always have the highest priority in a correct design.

Events

The event mechanism is only provided for extended tasks and can be used to communicate binary information that synchronizes these tasks on asynchronous events. Each extended task owns a set of events, which can be triggered by other (basic and extended) tasks or by ISRs of category 2.

The typical behaviour of an extended task is to wait for asynchronous events by calling the OS service WaitEvent(). This service usually blocks the task until an event arrives. After servicing the event, the task calls WaitEvent() again to wait for further events.

Events can be set only if the task is not in the suspended state. This seems to suggest that an extended task should never be in the suspended state.
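The wait/set cycle described above can be modelled with a bit mask per extended task. The sketch below is a simplified model rather than the real OS services: `ExtTask`, `wait_event`, `set_event` and `clear_event` are hypothetical names, and it assumes that waiting on an event that is already pending does not block.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t EventMask;

/* Hypothetical per-task event bookkeeping (not an OSEK type). */
typedef struct {
    bool      suspended;  /* events cannot be set on a suspended task */
    bool      waiting;    /* task is in the waiting state */
    EventMask pending;    /* events set and not yet cleared */
    EventMask wait_mask;  /* events the task is currently waiting for */
} ExtTask;

/* Model of waiting: returns true if the task has to block. */
bool wait_event(ExtTask *t, EventMask mask)
{
    assert(t != NULL);
    if (t->pending & mask)
        return false;          /* an awaited event is already pending */
    t->wait_mask = mask;
    t->waiting = true;         /* running -> waiting */
    return true;
}

/* Model of setting events: returns false if the owner is suspended. */
bool set_event(ExtTask *t, EventMask mask)
{
    assert(t != NULL);
    if (t->suspended)
        return false;
    t->pending |= mask;
    if (t->waiting && (t->pending & t->wait_mask)) {
        t->waiting = false;    /* waiting -> ready */
        t->wait_mask = 0;
    }
    return true;
}

/* The task clears events itself after servicing them. */
void clear_event(ExtTask *t, EventMask mask)
{
    assert(t != NULL);
    t->pending &= ~mask;
}
```

Setting an event the task is not waiting for only records it in `pending`; the task stays in the waiting state until one of the awaited bits arrives.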

Scheduling

The scheduler decides on the basis of the task priority which is the next of the ready tasks to be transferred into the running state (dynamic priority management is not supported). Tasks on the same priority level are started depending on their order of activation.
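The selection rule in the paragraph above (highest priority first, activation order among equal priorities) can be sketched as follows. `SchedTask` and `pick_next` are hypothetical names used for illustration; a real implementation would typically keep per-priority ready queues instead of scanning an array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical ready-list entry (not an OSEK type). */
typedef struct {
    int      priority;        /* fixed priority, 0 = lowest */
    bool     ready;           /* task is in the ready state */
    unsigned activation_seq;  /* order of activation, smaller = earlier */
} SchedTask;

/* Returns the index of the next task to run, or -1 if none is ready. */
int pick_next(const SchedTask *tasks, int n)
{
    int best = -1;

    assert(tasks != NULL || n == 0);
    for (int i = 0; i < n; i++) {
        if (!tasks[i].ready)
            continue;
        if (best < 0 ||
            tasks[i].priority > tasks[best].priority ||
            (tasks[i].priority == tasks[best].priority &&
             tasks[i].activation_seq < tasks[best].activation_seq))
            best = i;  /* higher priority, or same priority but activated earlier */
    }
    return best;
}
```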

The OSEK standard provides four flavors of fixed priority scheduling, outlined below:

  • Full Preemptive Scheduling - Full pre-emptive scheduling means that the running task may be rescheduled at any instruction by the arrival of higher priority tasks.

  • Non Preemptive Scheduling - Non pre-emptive scheduling means that task switching is only performed via one of a selection of explicitly defined system services (like task termination, explicit call to the scheduler or the arrival of an event that wakes up an extended task).

  • Mixed Preemptive Scheduling - Since pre-emptiveness is a task attribute, pre-emptive and non pre-emptive tasks can be mixed in the same application. The policy actually applied at any time depends on the running task.

  • Task Grouping Using Internal Resources - This scheduling policy is very similar to the pre-emption threshold technology, where threshold values are implemented using the OSEK Priority Ceiling protocol together with internal resources locked and unlocked at the start and at the end of every task instance.

Resource Management

The standard provides support for binary resources that can be used to implement critical sections. Priority inversion and deadlock are avoided using a variant of the Stack Resource Policy (SRP) called OSEK Priority Ceiling.

The protocol in fact is a version of SRP adapted to fixed priority:
  • every resource is assigned a ceiling that is the maximum priority of the tasks (and ISRs) that use the resource;
  • when a task requires a resource, its current priority is raised to the ceiling of the resource;
  • when a task releases a resource, the priority of this task is reset to the priority which was dynamically assigned before requiring that resource.

The normal properties of SRP apply to the protocol. In particular, priority inversion, chained blocking and deadlocks are avoided. Moreover, there is no need for waiting queues, since a task can be scheduled only when all the resources it needs are free.
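The three rules of the protocol can be sketched in a few lines. This is a minimal model under stated assumptions: `PcpTask`, `Resource`, `compute_ceiling`, `get_resource` and `release_resource` are hypothetical names, resources are released in reverse order of acquisition, and the priority is raised only when the ceiling is higher than the current priority so that nested sections never lower it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_NESTING 8

typedef struct {
    int  ceiling;  /* highest priority among the users of the resource */
    bool locked;
} Resource;

/* Hypothetical task record (not an OSEK type). */
typedef struct {
    int base_priority;
    int current_priority;
    int saved[MAX_NESTING];  /* priorities to restore, last in first out */
    int depth;
} PcpTask;

/* Rule 1: the ceiling is the maximum priority of the users. */
int compute_ceiling(const int *user_priorities, size_t n)
{
    assert(user_priorities != NULL && n > 0);
    int ceiling = user_priorities[0];
    for (size_t i = 1; i < n; i++)
        if (user_priorities[i] > ceiling)
            ceiling = user_priorities[i];
    return ceiling;
}

/* Rule 2: taking a resource raises the priority to its ceiling. */
void get_resource(PcpTask *t, Resource *r)
{
    assert(!r->locked && t->depth < MAX_NESTING);
    t->saved[t->depth++] = t->current_priority;
    if (r->ceiling > t->current_priority)
        t->current_priority = r->ceiling;
    r->locked = true;
}

/* Rule 3: releasing restores the priority held before the request. */
void release_resource(PcpTask *t, Resource *r)
{
    assert(r->locked && t->depth > 0);
    t->current_priority = t->saved[--t->depth];
    r->locked = false;
}
```

Because the priority is raised at the moment of the request, no other user of the resource can be scheduled while it is held, which is why the model needs no waiting queue.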

Resources are typically used by tasks only. In the ISO17356 standard, however, resources can be used either by a task or by an ISR of category 2. An ISR that uses a resource can be thought of as a high priority task: its execution can be delayed by lower priority ISRs or tasks accessing resources with a ceiling greater than or equal to the ISR priority. This is the natural behaviour in systems where task activations and priorities are mapped onto interrupts, and the raising of a task's priority is done by suitably programming the interrupt controller.

The ISO17356 standard also provides support for Preemption Thresholds through the use of internal resources. An internal resource is simply a resource that is locked when a task instance starts and unlocked when the task instance ends. The ceiling of the internal resource can be thought of as the Preemption Threshold of the tasks that use it.

In the same way the standard provides a special resource called RES_SCHEDULER that can be used to disable preemption. In practice, RES_SCHEDULER is a resource with a ceiling equal to the maximum priority in the system. Likewise, a non preemptive task can be thought of as a task that uses an internal resource with the same ceiling as RES_SCHEDULER.

Finally, note that although a technique similar to Preemption Threshold is used, stack sharing between tasks of the same Non Preemption Group cannot be exploited because of the OS calls WaitEvent() and Schedule(). These calls release the internal resource taken by a task, allowing more than one task in the same Non Preemption Group to execute.

Miscellaneous

The operating system provides services for processing recurring events (for example, timers that provide an interrupt at regular intervals, or encoders at axles that generate an interrupt in case of a constant change of an angle). These events are recorded into implementation dependent counters, then used by software alarms. When an alarm (that can be one-shot or periodic) fires, a task can be activated, or an event can be set, or finally an alarm-callback routine can be called. Alarms and counters are statically defined at compile time. The only dynamic parameters that can be set are when an alarm has to expire and the period of a cyclic alarm.
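The relation between counters and alarms can be sketched as below. The model is illustrative only: `Counter`, `Alarm`, `start_alarm` and `counter_tick` are hypothetical names, the expiry action is reduced to incrementing a field, and the counter maximum is assumed to be smaller than UINT_MAX.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    unsigned value;      /* current tick count */
    unsigned max_value;  /* counter wraps to 0 after this value
                            (assumed smaller than UINT_MAX) */
} Counter;

typedef struct {
    bool     active;
    unsigned expiry;     /* counter value at which the alarm fires */
    unsigned cycle;      /* 0 = one-shot, otherwise period in ticks */
    unsigned fired;      /* expirations so far; stands in for the action */
} Alarm;

/* Arms an alarm 'offset' ticks from now; 'cycle' selects the period. */
void start_alarm(const Counter *c, Alarm *a, unsigned offset, unsigned cycle)
{
    assert(offset > 0 && offset <= c->max_value && cycle <= c->max_value);
    a->expiry = (c->value + offset) % (c->max_value + 1);
    a->cycle  = cycle;
    a->fired  = 0;
    a->active = true;
}

/* Advances the counter by one tick and fires the alarms that expire. */
void counter_tick(Counter *c, Alarm *alarms, size_t n)
{
    c->value = (c->value == c->max_value) ? 0 : c->value + 1;
    for (size_t i = 0; i < n; i++) {
        Alarm *a = &alarms[i];
        if (!a->active || a->expiry != c->value)
            continue;
        a->fired++;  /* a real OS would activate a task, set an event
                        or call an alarm-callback routine here */
        if (a->cycle != 0)
            a->expiry = (c->value + a->cycle) % (c->max_value + 1);
        else
            a->active = false;  /* one-shot alarms stop after firing */
    }
}
```

Only the expiry point and the cycle are run-time parameters here; the counter and alarm objects themselves are meant to be statically allocated, in line with the text above.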

To ease tracing and debugging of the system, the ISO17356 standard provides system specific hook routines that allow user-defined actions within the OS internal processing. These hook routines are called by the operating system and consist of user code that is executed inside an OS primitive, usually with ISRs of category 2 disabled. They are only allowed to use a subset of the API functions (mainly functions that read internal OS state, to ease tracing of the application). They are called at system startup, at system shutdown, before and after a preemption, and in case of an error. In particular, two different kinds of errors are distinguished:

  • Application errors - The operating system could not execute the requested service correctly, but assumes the correctness of its internal data.
  • Fatal errors - The operating system can no longer assume correctness of its internal data. In this case the operating system calls the centralized system shutdown.

The standard gives two ways of handling errors: a centralized way (using an Error Hook that is called every time an error occurs in a system primitive), and a decentralized way (where the application code must itself check the return value of every primitive).
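The two error handling styles can coexist, as the following sketch suggests. Everything in it is hypothetical and chosen for illustration (`Status`, `ErrorHookFn`, `activate_task`, `install_counting_hook`); it only shows how a primitive can both return a status to the caller and report the same status to a central hook.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_TASKS 3

typedef enum { ST_OK, ST_BAD_ID, ST_LIMIT } Status;
typedef void (*ErrorHookFn)(Status error);

static ErrorHookFn error_hook;        /* NULL = no centralized handling */
static bool task_active[NUM_TASKS];
static int  error_count;

/* Every primitive funnels its result through this helper. */
static Status report(Status s)
{
    if (s != ST_OK && error_hook != NULL)
        error_hook(s);                /* centralized path */
    return s;                         /* decentralized path */
}

/* Model of an OS primitive that can fail in two ways. */
Status activate_task(int id)
{
    if (id < 0 || id >= NUM_TASKS)
        return report(ST_BAD_ID);
    if (task_active[id])
        return report(ST_LIMIT);      /* already activated */
    task_active[id] = true;
    return report(ST_OK);
}

/* Example hook: the hook is only ever called for real errors. */
static void count_errors(Status error)
{
    assert(error != ST_OK);
    error_count++;
}

void install_counting_hook(void)
{
    error_hook = count_errors;
}
```

With no hook installed, the caller has to inspect each return value; once the hook is installed, the same errors are also collected in one place, which is convenient for tracing.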

COM

The ISO17356 standard also comprises an agreement on interfaces and protocols for in-vehicle communication, called COM. The term in-vehicle communication covers both communication between nodes of the vehicle and internal communication within a node. The basic idea is to provide a standardized API for software communication that is independent of the particular communication media used, so as to ease porting of applications between different platforms.

The COM standard is composed of:
  • An Interaction layer which provides communication services for the transfer of application messages.
  • A Network layer which provides services for the unacknowledged and segmented transfer of application messages. The network layer provides flow control mechanisms to enable interfacing of communication peers featuring different levels of performance and capabilities.
  • A Data link layer interface which provides services for the unacknowledged transfer of individual data packets over a network to the layers above.

COM provides a rich set of communication facilities but it is likely that many applications will only require a subset of this functionality. For that reason, the standard defines a set of conformance classes to enable the integration of COM in systems featuring various levels of capabilities in a scalable way, enabling the vehicle producer to integrate software parts produced by different suppliers.

COM defines these levels as Communication Conformance Classes (CCCs). The main purpose of the conformance classes is to ensure that applications which have been developed for a particular conformance class are portable across different OSEK implementations and ECUs featuring the same or a higher level of communication functionality. COM defines five communication conformance classes, ranging from ECU internal communication only (CCCA) up to inter-ECU external communication (CCC2).

For an implementation to be compliant, message handling for intra-processor communication has to be offered. The minimum functionality required is CCCA as described in the COM specification.

NM

The ISO17356 standard also covers a standardization of basic and non-competitive infrastructure between the various embedded systems that can be present in a vehicle. In fact, very often electronic control units made by different manufacturers are networked within vehicles by serial data communication links.

For that reason the standard proposes a Network Management system (NM) that provides standardised features which ensure the functionality of inter-networking by standardised interfaces.

The essential task of NM is to ensure the safety and the reliability of a communication network. This is achieved by restricting access to each node (access must be granted only to authorized entities), keeping the whole network tolerant to faults, and implementing diagnostic features capable of monitoring the status of the network in a direct way (monitoring by dedicated NM communication using the token principle) or an indirect way (monitoring application messages).

Moreover, the network management also covers the initialization of network resources, network configuration, the co-ordination of global operation modes (e.g. network wide sleep mode), and support for diagnostics.

Figure 1: Interface and algorithms responsibility


  1. API, fixed by ISO17356 definition
  2. several busses connected to one μController
  3. interface to DLL - COM specific, protocol specific
  4. interface to COM Interaction Layer
  5. station management (outside OSEK, see text below)
  6. ISO17356 algorithms
  7. protocol specific management algorithms

OSEKTime

The ISO17356 Standard produced a specification for a Time Triggered Operating System (OSEKtime OS) that aims to represent a uniform functioning environment for single processor distributed embedded control units with a fault-tolerant communication layer.

The OSEKtime operating system supports static scheduling and offers all basic services for real-time applications, i.e. interrupt handling, dispatching, system time and clock synchronization, local message handling, and error detection mechanisms.

The OSEKtime operating system serves as a basis for application programs which are independent of each other, and provides their environment on a processor. There are two types of entities: interrupt service routines managed by the operating system and time triggered tasks.

The operating system (OS) can coexist with an OSEKTime OS to handle both time triggered and event-driven computations on the same Embedded Control Unit. Basically, the interface of the OS remains the same (apart from some small changes in the startup/shutdown procedures). The basic concept is that the OSEKTime OS assigns its idle time to the OS.

The OS, its tasks and its interrupts always have a lower priority than the corresponding entities in the OSEKTime OS. Non-preemptive tasks remain non-preemptive only in the OS domain, and they can be preempted by the Time Triggered dispatcher and by the Time Triggered Tasks.

Note on OSEK: The term OSEK stands for "Offene Systeme und deren Schnittstellen für die Elektronik im Kraftfahrzeug" (Open systems and the corresponding interfaces for automotive electronics); the term VDX stands for Vehicle Distributed eXecutive. This document briefly describes the details of the Operating System Specification, release 2.2, and refers to some other OSEK documents.
Note on Schedule(): Schedule() is a point of rescheduling that can be used in both preemptive and non-preemptive tasks.

Last edited by rabiddog .
Page last modified on Saturday 09 of May, 2015 [14:03:56 UTC].