This section covers:
- Communication-Centric Systems
- Resource-Aware Design
- Simulation and Performance Analysis

Communication-Centric Systems

With growing embedded system complexity, more and more parts of a system are reused or supplied, often from external sources. These parts range from single hardware components or software processes to hardware-software (HW-SW) subsystems. They must cooperate and share resources with newly developed parts such that the design constraints are met. There are many software interface standards, such as CORBA, COM or DCOM to name just a few, that are specifically designed for this task. Nevertheless, in practice software integration is not a solved problem but a growing one. In a recent interview with the German weekly magazine “Der Spiegel” (DER SPIEGEL 14/2005, 02.04.2005), Jürgen Schrempp, CEO of DaimlerChrysler, explained that systems integration was a key problem in the recent series of recalls of Mercedes cars.

Software and communication layers together with interface standards increase software portability, but they do not yet solve two of the key embedded systems challenges: (1) integration verification and (2) the control of performance and other non-functional constraints, such as power consumption or dependability. Integration verification is a general systems design problem that deserves much attention but is not the focus of this activity. Performance problems originate in the fact that platform components have limited performance and memory resources. Sharing resources introduces additional, non-functional dependencies. Such dependencies slow down communication or processing and increase event jitter, potentially leading to buffer over- or underflows, lost messages or system failures. Such failures are difficult to identify, because many of them appear as timing anomalies that are not discovered using the typical component test patterns.

There are several approaches to cope with large systems integration. Frequently, designers use simulation or testing and add design rules of thumb for uncovered cases. An example is to limit the load on a prioritized bus to 30 or 60% of the maximum bus load. This rule has a formal background: for periodic systems it can be backed by the proof of Liu and Layland, which holds under certain circumstances on a single processor or bus. However, these approaches do not apply to multi-hop networked systems.
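
To make the formal background concrete, the Liu and Layland result bounds the processor (or bus) utilization below which rate-monotonic scheduling of independent periodic tasks is guaranteed to be feasible. The following sketch is purely illustrative; the task parameters are invented.

```python
# Illustrative sketch of the Liu & Layland utilization bound for
# rate-monotonic scheduling of n independent periodic tasks on one
# processor (or frames on one bus). Task parameters are invented.

def utilization(tasks):
    """tasks: list of (execution_time, period) pairs."""
    return sum(c / t for c, t in tasks)

def liu_layland_bound(n):
    """Utilization at or below this bound guarantees RM schedulability."""
    return n * (2 ** (1.0 / n) - 1)

tasks = [(1.0, 10.0), (2.0, 20.0), (4.0, 40.0)]   # hypothetical (C, T) pairs
u = utilization(tasks)
bound = liu_layland_bound(len(tasks))
print(f"U = {u:.2f}, bound = {bound:.2f}")
if u <= bound:
    print("Guaranteed schedulable under rate-monotonic priorities.")
else:
    print("No guarantee from the utilization test; a finer analysis is needed.")
```

For large task sets the bound approaches ln 2, roughly 69%, which is one origin of conservative load limits of the kind mentioned above.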

A more general approach is a conservative design style that decouples the integrated components and subsystems by assigning them a fixed share of the resources. This sharing uses a TDMA technique (Time Division Multiple Access) where fixed time slots are assigned to processes or communication. Unlike round-robin scheduling, unused time slots stay empty, avoiding possible buffer overflows due to extra resource shares that would speed up processing or communication. In essence, TDMA reaches a complete decoupling at the cost of suboptimal resource and, possibly, energy utilization. The TDMA technique can be extended all the way to software development, where the elegantly simple mathematical formulation describing TDMA performance can be used for system-wide performance analysis and control, such as in the Giotto tool of UC Berkeley.
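
One attraction of TDMA is that worst-case latencies follow from the slot and cycle parameters alone, independently of the behavior of the other slot owners. A minimal sketch of this calculation, with invented parameters, could look as follows.

```python
# Minimal sketch of worst-case latency on a TDMA resource with a fixed
# cycle. All parameters are hypothetical. A sender owns one slot of
# length `slot` per cycle of length `cycle`; a message needing `demand`
# time units of slot capacity is served piecewise, one slot per cycle,
# regardless of how busy the other slots are.

import math

def tdma_worst_case_latency(demand, slot, cycle):
    slots_needed = math.ceil(demand / slot)
    wait_for_first_slot = cycle - slot            # worst case: slot just ended
    last_slot_use = demand - (slots_needed - 1) * slot
    return wait_for_first_slot + (slots_needed - 1) * cycle + last_slot_use

print(tdma_worst_case_latency(demand=2.5, slot=1.0, cycle=10.0))   # -> 29.5
```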

Such a conservative design style is typical for the aircraft industry, which uses bus protocols such as TTP. TTP has additional synchronization functions that support fault detection and fault tolerance. Recently, TTP has also been introduced to the automotive mass market, where it is intended to replace the fixed-priority CAN bus in the mid term. Audi is considering TTP as its future automotive bus standard, while the majority of automotive companies have committed to FlexRay, which uses a TDMA protocol as a basis and runs a dynamic “mini-slot” protocol in part of the communication cycle. This hierarchy offers the rigorous conservative design style for part of the automotive functions, and a dynamic schedule with higher resource utilization but complex non-functional dependencies for other, less time-critical functions. The practical impact of the relatively complex FlexRay protocol will be seen soon, when the first automotive system developments use this bus. But even when commercial tools for FlexRay become available, they will fall short of evaluating general multi-hop networks as they appear today, from automotive systems to MpSoCs.
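
To illustrate the two-level hierarchy, the following simplified and hypothetical model contrasts the static, TDMA-like part of such a cycle, where a frame waits at most one cycle for its reserved slot, with a dynamic part in which pending frames are served in priority (frame-ID) order and may be deferred to later cycles once the segment budget is exhausted. This is a sketch of the scheduling principle only, not an accurate model of the FlexRay protocol.

```python
# Simplified, hypothetical model of a FlexRay-like cycle: a static TDMA
# segment plus a dynamic segment in which pending frames are served in
# frame-ID (priority) order until a minislot budget is used up; frames
# that do not fit wait for the next cycle. Not an accurate FlexRay model.

def dynamic_segment_delay(frame_id, frames, budget):
    """frames: {frame_id: size_in_minislots}. Returns how many cycles the
    given frame waits before transmission, assuming all frames become
    ready simultaneously and each is sent exactly once."""
    if frames[frame_id] > budget:
        raise ValueError("frame can never fit into the dynamic segment")
    pending = dict(frames)
    cycle = 0
    while True:
        remaining = budget
        for fid in sorted(pending):           # lower frame ID = higher priority
            if pending[fid] <= remaining:
                remaining -= pending[fid]
                if fid == frame_id:
                    return cycle
                del pending[fid]
        cycle += 1

# Hypothetical frame set: frame 3 is deferred to cycle 1 because the
# higher-priority frames 1 and 2 exhaust the 10-minislot budget in cycle 0.
print(dynamic_segment_delay(3, {1: 6, 2: 4, 3: 3}, budget=10))   # -> 1
```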

Another approach is formal timing analysis. The scheduling effects of fixed-priority scheduling, such as that used for CAN buses, are well understood, and formal analysis has found its way into commercial tools, such as the Volcano software and tool set. For a while, these techniques were sufficient for the fixed-priority scheduling approaches used in many bus protocols and real-time operating systems. With the advent of more complex networks including bridges, “local” analysis techniques that look at a single bus or processor only are no longer sufficient. This development can be observed from distributed real-time systems down to the level of an individual chip, where several different buses are combined today. On the chip level, the problem is even more complicated due to the introduction of caches (see roadmap). At the same time, the individual bus protocol becomes more complicated, as is visible in the transition from the CAN to the FlexRay protocol. Hence, embedded real-time system design is approaching a new level of complexity for which the current tools and formalisms are no longer appropriate.
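
The core of such “local” fixed-priority timing analysis is the classical response-time fixed-point iteration, sketched below for a single preemptive resource with invented task parameters (CAN frames need small modifications for non-preemptive transmission and blocking). It is exactly this kind of single-resource analysis that stops being sufficient once bridges and multiple scheduling domains enter the picture.

```python
# Classical fixed-priority response-time analysis for one processor:
# R_i = C_i + sum over higher-priority tasks j of ceil(R_i / T_j) * C_j,
# solved by fixed-point iteration. Task parameters are invented.

import math

def response_time(task, higher_priority, deadline=None):
    """task and higher_priority entries are (C, T) pairs."""
    c, _ = task
    r = c
    while True:
        interference = sum(math.ceil(r / t_j) * c_j for c_j, t_j in higher_priority)
        r_next = c + interference
        if r_next == r:
            return r                      # converged: worst-case response time
        if deadline is not None and r_next > deadline:
            return None                   # bound exceeds the deadline
        r = r_next

tasks = [(1, 4), (1, 5), (2, 10)]         # (C, T), highest priority first
for i, task in enumerate(tasks):
    print(i, response_time(task, tasks[:i]))
```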

There are proposals that combine a few different scheduling strategies, e.g. RMS on a processor and TDMA on the bus. These are called holistic approaches and were introduced by Tindell. A very good recent example that shows the power of this “holistic” approach is the work in the ARTIST NoE by the group of Eles. This work covers many different scheduling techniques and can be considered as defining the state of the art in holistic analysis. Other work by Palencia and Harbour, again in ARTIST, is leading in holistic analysis of systems with task dependencies. In general, it appears more efficient to identify solutions that encompass the whole system than to consider local scheduling individually.

On the other hand, there is an apparent scalability problem when considering the huge number of potential subsystem combinations that require adapted holistic scheduling analysis. An alternative is a modular analysis technique that combines local analyses of individual components using event models as interfaces. Local analysis and event model propagation are iterated to obtain a global analysis. ARTIST is the center of gravity for the development of this compositional technique, with the two cooperating groups from ETH Zürich and TU Braunschweig working on different versions of this approach.
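
The compositional idea can be sketched in a few lines: each component is analyzed locally against standard event models (here, periodic with jitter) at its inputs, the local results determine the output event models, and the propagation is iterated until the models stabilize. The task parameters and the simplified propagation rule below are placeholders, not the actual formulations used by the ETH Zürich or TU Braunschweig groups.

```python
# Toy sketch of compositional performance analysis via event-model
# propagation: tasks on two shared resources activate each other across
# resources, so local fixed-priority analyses must be iterated globally
# until the "periodic with jitter" models stabilize. All parameters and
# the propagation rule are invented placeholders.

import math

def wcrt(c, interferers):
    """Fixed-priority response time on one resource; each interferer is a
    (C_j, T_j, J_j) triple whose activations may arrive early by jitter J_j."""
    r = c
    while True:
        r_next = c + sum(math.ceil((r + j_j) / t_j) * c_j
                         for c_j, t_j, j_j in interferers)
        if r_next == r:
            return r
        r = r_next

# Two chains crossing two resources (invented): chain 1: source(P=10) ->
# task A on R1 -> task C on R2; chain 2: source(P=15) -> task B on R2 ->
# task D on R1. On R1, D preempts A; on R2, C preempts B. Simplified rule:
# output jitter = input jitter + local worst-case response time.
C = {"A": 2, "B": 3, "C": 2, "D": 1}
T = {"A": 10, "B": 15, "C": 10, "D": 15}
jitter = {"A": 0, "B": 0, "C": 0, "D": 0}        # input jitters, iterated below

for _ in range(100):
    previous = dict(jitter)
    r_a = wcrt(C["A"], [(C["D"], T["D"], jitter["D"])])   # interference on R1
    r_b = wcrt(C["B"], [(C["C"], T["C"], jitter["C"])])   # interference on R2
    jitter["C"] = jitter["A"] + r_a                        # A's output activates C
    jitter["D"] = jitter["B"] + r_b                        # B's output activates D
    if jitter == previous:
        break                                              # global fixed point reached

print(jitter)                                              # converged input jitters
```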

Better modeling, analysis, and optimization are only part of the solution to mastering communication-centric systems. Networked systems based on extensions of conventional communication networks suffer from increasingly complex and decreasingly predictable real-time behavior. Therefore, the second focus of this ARTIST 2 activity is the development of new predictable and scalable networks, with an emphasis on networks-on-chip. This is the interest of DTU and the University of Bologna.

Recently, there has been substantial interest in the area of massively distributed embedded systems that are used for communication, sensing and actuating. These sensor networks appear in many different application domains such as environmental monitoring, disaster management, distributed large-scale control, building automation, elderly care and logistics. This quickly emerging research area is closely related to the subjects of the cluster on execution platforms. In particular, we are faced with resource constraints in terms of power and energy, computing, memory and communication. In addition, some of the potential application areas are characterized by (hard) real-time requirements. The necessary design and analysis methods clearly call for an integrated view of the whole distributed system, i.e. taking into account hardware and software, a cross-layer view on the different protocols and algorithms, and new concepts for application specification, middleware and operating systems. It is not possible to describe in a few sentences the major challenges of this new kind of system. An incomplete set, specific to the area of execution platforms, is listed below:
- Deployment of large distributed networks with limited communication and computation resources.
- Low energy computation and communication (see the sketch after this list).
- Fault tolerance and reliability.
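
As a simple illustration of the energy challenge listed above, the lifetime of a battery-powered sensor node is often dominated by how long its radio is switched on; the following back-of-the-envelope sketch uses invented current figures.

```python
# Rough lifetime estimate for a duty-cycled sensor node: the radio
# dominates consumption, so the average current is mostly determined by
# how often and how long it is on. All numbers are invented.

def lifetime_days(battery_mah, sleep_ma, active_ma, duty_cycle):
    avg_ma = duty_cycle * active_ma + (1 - duty_cycle) * sleep_ma
    return battery_mah / avg_ma / 24.0

# 2200 mAh battery, 0.01 mA in sleep, 20 mA with the radio on:
for dc in (1.0, 0.1, 0.01, 0.001):
    print(f"duty cycle {dc:6.3f}: {lifetime_days(2200, 0.01, 20, dc):8.1f} days")
```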

Currently, it is not clear whether this new kind of embedded system will find wide acceptance in industry. Nevertheless, there are already first projects in Europe that attempt a technology transfer from academia to industry, e.g. a safety network for building applications (Siemens, ETH Zurich).

Resource-Aware Design

While microelectronic technology continues to provide growth according to Moore’s law, our need for computation power is growing even faster. Therefore, although they have unprecedented computation and memory resources at their disposal, designers of embedded systems have to be increasingly aware that, in order to meet the requirements of novel applications, these resources have to be used in the most efficient way.

A particular dimension of this problem is power consumption, since energy is one of the dominating constraints, especially, but not only, in the case of mobile applications. Reducing energy consumption is one of the major concerns of the research and design community, from circuit designers to embedded software developers. However, integrated system-level approaches that allow for an energy-efficient mapping of an application onto a customized platform are still lacking. What the research community is currently looking for are accurate system-level energy models, power estimation and analysis tools, functionality mapping and scheduling techniques, energy-efficient communication synthesis and memory hierarchy optimization, as well as energy-aware software compilation techniques. Hardware components that must be efficiently utilized include processors and memories. Highly optimized and low-cost processors can be designed with tools that support the creation of tuned micro-architectures and tool chains for application-specific instruction set processors. Moreover, highly efficient use of memories is increasingly critical, both because the speed gap between processors and memories widens, and because the power consumed in memory systems is rapidly increasing.
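
A recurring system-level trade-off in this space is dynamic voltage and frequency scaling: lowering the supply voltage reduces dynamic energy roughly quadratically but stretches execution time, which interacts with deadlines and with leakage power. The sketch below uses textbook first-order CMOS approximations with invented parameters; real platforms require measured models.

```python
# First-order illustration of the energy/performance trade-off exploited
# by dynamic voltage and frequency scaling (DVS), using the approximations
# E_dyn ~ C_eff * V^2 per cycle and f ~ k * (V - Vt)^2 / V.
# All parameters are invented.

def frequency(v, vt=0.4, k=1.0e9):
    return k * (v - vt) ** 2 / v                 # Hz, alpha-power model, alpha = 2

def task_energy_and_time(cycles, v, c_eff=1.0e-9, p_leak=0.02):
    f = frequency(v)
    t = cycles / f                               # execution time in seconds
    e_dyn = cycles * c_eff * v ** 2              # dynamic energy in joules
    e_leak = p_leak * t                          # leakage energy in joules
    return e_dyn + e_leak, t

deadline = 0.02                                  # hypothetical 20 ms deadline
for v in (1.2, 1.0, 0.8, 0.6):
    e, t = task_energy_and_time(cycles=7_000_000, v=v)
    ok = "meets deadline" if t <= deadline else "misses deadline"
    print(f"V = {v:.1f} V: E = {e*1e3:.2f} mJ, t = {t*1e3:.1f} ms ({ok})")
```

With these invented numbers, scaling from 1.2 V down to 1.0 V saves energy while still meeting the deadline; scaling further misses it, which is exactly the kind of trade-off a system-level mapping and scheduling tool has to resolve.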

Real-time applications, hard or soft, raise the challenge of predictability. This is an extremely difficult problem in the context of modern, dynamic, multiprocessor platforms which, while providing potentially high performance, make timing predictability extremely difficult to achieve (if at all, without drastically limiting the spectrum of applicable architectures). With the growing software content in embedded systems and the diffusion of highly programmable and re-configurable platforms, software is given an unprecedented degree of control over resource utilization. Software generation and performance evaluation tools have to be made aware of the particularities of a certain memory hierarchy, or of the dynamic features of the processor micro-architecture, so as to be able both to generate efficient code and to accurately predict performance. The basic dilemma researchers still face is: how much should predictability be compromised in order to improve average performance? Or, conversely, how much can cost and average performance be sacrificed in order to achieve a predictable system with good worst-case behaviour? The ever-increasing inter-dependence between hardware and software layers can be used to perform aggressive optimizations that can be achieved only by a synergistic approach that combines the advantages of static and dynamic techniques.

Simulation and Performance Analysis

The success of such new design methods depends on the availability of analysis techniques beyond the current state of the art. Today, manufacturers and suppliers still rely only on extensive simulation to determine whether the timing constraints are satisfied. However, simulations are very time-consuming and provide no guarantees that the imposed requirements are met.
There is a large body of research related to scheduling and schedulability analysis, with results having been incorporated into analysis tools available on the market. However, the existing approaches and tools address the schedulability of processes mapped onto a single processor, or the schedulability of messages exchanged over a given communication protocol.
Several research groups have provided analyses that bound the communication delay for a given communication protocol, and extended the uni-processor analysis to handle distributed systems. However, none of the existing approaches offers an analysis that can handle applications distributed across different types of networks (e.g., CAN, FlexRay, TTP) consisting of nodes that may use different scheduling policies (e.g., static cyclic scheduling, fixed-priority preemptive scheduling, earliest deadline first).
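
A typical end-to-end question in such heterogeneous systems is bounding the delay of a signal path that crosses nodes and networks with different scheduling policies. The minimal sketch below simply sums per-segment worst-case bounds; the bounds themselves are hypothetical placeholders for the protocol-specific analyses discussed above, and a plain sum is a safe (if pessimistic) bound only if each segment bound already covers local queueing under worst-case input.

```python
# Minimal sketch of composing per-segment worst-case latency bounds along
# a path that crosses heterogeneous resources. Each bound would come from
# a protocol-specific analysis (fixed-priority RTA, CAN analysis, TDMA
# algebra); here they are hypothetical constants in milliseconds.

path = [
    ("sensor task, fixed-priority CPU", 1.8),
    ("CAN frame, priority arbitration", 2.4),
    ("gateway task, fixed-priority CPU", 0.9),
    ("FlexRay/TTP slot, TDMA", 5.0),
    ("actuator task, static schedule", 1.2),
]

end_to_end_ms = sum(bound for _, bound in path)
print(f"worst-case end-to-end latency <= {end_to_end_ms:.1f} ms")
```
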
Current modeling and design approaches often dimension systems for the worst case. However, embedded systems are growing more complex and dynamic than they used to be. For example, in multimedia embedded systems, bit rates and encoding effort may vary by orders of magnitude depending on the complexity of the audio or video being played out, the complexity of the compression, and the required quality. Additionally, the functionality of embedded devices and applications increases, and they become more open to interaction with their environment; for example, users may issue requests to change the resolution or frame rate. High-quality multimedia delivery on affordable embedded hardware requires a cost-efficient realization of high-throughput processing that is guaranteed to deliver the required performance. The platform needs sufficient resources to process the stream even under the highest load conditions, yet it should not waste too many resources when the complexity of the stream is lower. Current design approaches for multimedia embedded systems cannot deal appropriately with the increasing dynamism inside applications and the dynamically changing set of running applications. Design approaches are often based on worst-case analysis, resulting in over-dimensioned systems.
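
The over-dimensioning effect can be made concrete with a small simulation: a decoder whose per-frame workload varies is provisioned either for the worst-case frame or for a lower budget plus a buffer that absorbs bursts. The workload trace and all numbers below are invented.

```python
# Toy illustration of worst-case versus buffered dimensioning for a
# variable-workload stream (e.g. video decoding at 25 frames/s). The
# per-frame workload trace and all numbers are invented.

import random

random.seed(0)
frames = [random.choice([2, 3, 3, 4, 12]) for _ in range(1000)]  # Mcycles per frame
period_ms = 40.0                                                 # 25 frames/s

worst_case_mhz = max(frames) / period_ms * 1000    # provision for the worst frame
avg_mhz = sum(frames) / len(frames) / period_ms * 1000

def max_backlog(frames, budget_per_period):
    """Peak amount of work (Mcycles) left pending when each period grants
    `budget_per_period` Mcycles of processing."""
    backlog = peak = 0.0
    for f in frames:
        backlog = max(0.0, backlog + f - budget_per_period)
        peak = max(peak, backlog)
    return peak

budget = 6.0                                       # Mcycles per 40 ms, i.e. 150 MHz
print(f"worst-case provisioning: {worst_case_mhz:.0f} MHz")
print(f"average demand:          {avg_mhz:.0f} MHz")
print(f"with a 150 MHz budget, peak backlog = {max_backlog(frames, budget):.0f} Mcycles")
```

The lower budget suffices only if the latency implied by the peak backlog, here a few frame periods, is acceptable for the application.
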
In order to (automatically) take informed design decisions, accurate analysis techniques are needed to:
- handle distributed applications, data and control dependencies;
- accurately take into account the details of the communication protocols;
- handle heterogeneous scheduling policies;
- take into account the fault-tolerance techniques used for dependability;
- capture the integration of control models and streaming models to understand the effects of sporadic events interfering with ongoing dataflow computations;
- efficiently use platform resources through statistical multiplexing based on a combination of worst-case and stochastic techniques.
