Research and Integration Activities for the "Compilers and Timing Analysis" cluster

Platform-based Code Optimization and Verification Platform
JPIA-Platform

Abstract
The objective is to provide world-class code-synthesis and compiler tools for the generation of efficient machine code. Goals of the cluster include the integration of existing compiler-generation approaches allowing compilers for new architectures to be built quickly, efficiently and reliably.
One goal of the compilers sub-cluster is to achieve a tighter integration of European R&D activities by building on a carefully chosen industrial re-targetable compiler development platform that ensures interoperability.
The CoSy compiler platform provided by ACE is a state-of-the-art software system on which the common activities will build. This will reinforce Europe’s leading position in the area of compilers for embedded processors.
Also, the partners aim at making existing advanced optimization algorithms available to designers of embedded systems. Also, advanced high-level source-to-source transformations will be made available to users. Furthermore, verification methods for compilers are explored and also brought into practice by integrating them into the CoSy system or other tools.
Moreover, the objective of this activity is to exploit the world-leading position and expertise of academic and industrial cluster partners in order to integrate and further develop the technology currently available with the partners, so as to provide a unified architecture-aware code-synthesis, verification and compiler methodology to a variety of users, also beyond ARTIST2.



Baseline

Members of this activity have comprehensive expertise in different areas of compilers for embedded systems. Contacts and cooperations are already partially in place.

The CoSy compiler platform, provided by ACE, is a state-of-the-art tool on which the common activities build. For code-synthesis, ongoing cooperation on high-level transformations between IMEC and Dortmund will be extended.

Since compiler optimizations developed at Dortmund University have proven to be beneficial for Worst-Case Execution Time (WCET), a cooperation between Dortmund University and AbsInt has been established and will be extended in the future.

Traditionally, timing analysis tools have been designed independently of compilers. It has now turned out that proceeding along this path would result in a duplication of efforts. Flow facts are available in compilers and need to be regenerated in timing analysis tools. Timing information is available in timing analysis tools and would be useful for timing-aware optimizations in compilers. Currently, compilers use very rough approximations of timing, if they use any timing model at all. As a result, the impact of certain transformations on the run-time is frequently not known by the compilers. Hence, the user has to follow a trial-and-error approach, experimenting with different compiler options and figuring out a suitable combination of them. However, even this time-consuming process cannot really minimize the execution time since options which might be good for some part of the code might lead to bad result for some other part of the code. A tight integration of timing models into compiler is urgently needed.
Moreover, there is a general trend in industry to replace non programmable hardware accelerators (NPAs) with flexible reconfigurable cores, which have specialized resources and instructions dedicated to a class of applications. These reconfigurable cores create new challenges for embedded development tools and especially for the compiler, and new challenge for processor architecture investigation tools.

Examples of configurable cores include Xtensa from Tensilica, ARC600 and 700 from ARC, CoreXtend from MIPS. Examples of flexible development tools are the Coware/LISATek processor and compiler designer based on CoSy Express, or the toolsets proposed by Tensilica and ARC for their core extension development.

Many applications in the embedded systems domain are both resource-restricted and safety-critical. This in turn requires compilers for embedded processors to be both efficient and correct. In a cooperation between the Technical University of Berlin and ACE, verification methods and tools for compilers are investigated.
Further new or continued cooperations include Aachen-ACE (SIMD instructions).


Previous Work

Year 1
The main result achieved from the previous work in the co-operation between IMEC and Univ. Dortmund is a thorough alignment of the research objectives for the steering of locality-improving loop transformations at the source code level. For this purpose, it was crucial that the impact on the control flow complexity of the applied source code trafo could be evaluated at an early stage, before actually performing all the alternatives and compiling the resulting code on the target platform. The requirement (the WHAT specification) to tackle this problem had been defined. Based on a the WHAT specification, PhD research work had been initated at Univ. Dortmund to evaluate the alternative ways to come up with such a high-level control flow cost estimator that could base its estimate when only the source code is available (without performing any compilation). At IMEC complementary actions had been started to see how this estimator can be integrated in a loop transformation framework project that had been started up earlier (prior to the start of ARTIST2) and that is now being extended for these high-level estimators.

A cooperation was established between University of Dortmund and RWTH Aachen University, with the objective to integrate SIMD optimizations developed by the former partner into the LISATek Embedded Processor Designer based tool chain, developed by the latter partner. First, a complete tool chain, including Cosy-based C compiler, assembler and linker, was generated for the Philips Trimedia architecture. The Assembler Optimizer API, providing programming interface for developing post-pass optimizations, was extended based upon the specifications from University of Dortmund. The extensions paved the way for the integration of the bit-true dataflow analysis and the SIMD optimizations.

For the Aachen-Dortmund cooperation, after the generation of all software tools and the definition of the interfaces, the integration of the SIMD optimizations was tackled next. For this purpose, a bit-true representation of integer and register values, a semantic based control flow analysis, and a bit-true dataflow graph representation was implemented and integrated into the tools generated by LISATek model. Finally, the bit-true dataflow analysis and two proof-of-concept SIMD optimizations were integrated. Only a very limited set of additional architectural properties needs to be setup in the case of retargeting the tools to a new architecture.

In the cooperation between Absint and TU Vienna, the following problem is investigated: Improving the productivity of developing embedded software requires that high-level abstractions can be used and reused. It is paramount for the next decade that these abstractions can be optimized such that they can be used in embedded systems without a significant performance impact. One of the major problems in optimizing high-level abstractions is aliasing, introduced by operations on references and pointers in programs. Our effort focuses on building and integrating existing infrastructures such that we can offer platforms for evaluating the impact of aliasing algorithms on today’s languages performance that are used for embedded software.

The co-operation between IMEC and University of Dortmund resulted in alignment of the research objectives for the steering of locality-improving loop transformations at the source code level. For this purpose, the control flow complexity of a given source code should be evaluated for steering the loop-transformations and evaluating their benefits and overheads, before actually compiling the resulting code on the target platform. The requirements (the WHAT specifications) to tackle this problem were defined. Based on the WHAT specifications, Dortmund looked at high-level control flow cost estimation approaches that could base its estimate when only the source code is available (without performing any compilation). At IMEC complementary actions had been started to see how this estimator can be integrated in a loop transformation framework project that had been started up earlier (prior to the start of ARTIST2) and that is now being extended for these high-level estimators.


Problem Tackled in Year2

We saw that timing models need to be integrated into compilers. This needs to be done such that a duplication of the efforts spent on timing analysis tools is avoided. Therefore, we were faced with the problem of making timing information from timing analysis tools available in compilers. We wanted to demonstrate the approach in a prototype compiler. The aiT timing analysis tool was selected since it had already demonstrated its applicability at an industrial level. The Infineon TriCore was selected for the demonstrator, due to the availability of timing models, a tight coupling between the aiT analyzer provided by AbsInt and Dortmund’s compilers was established. A common data exchange format between timing analyzer and compiler was set up so that the compiler is able to pass information to the analyzer and vice versa. On top of this timing-aware compiler, new optimizations reducing worst-case timing will be developed.
At Dortmund, during year 2, the implementation of the bit-true dataflow analysis at the C-source level was carried out in partial cooperation with Aachen. In addition, processor specific optimizations were also implemented. In the Dortmund-IMEC cooperation, source to source code optimizations have been investigated.

Aachen and ACE have been working together on a number of topics together including a SIMD optimisation framework and conditional execution optimisation. A prototype optimization engine is currently being finalized, and a joint paper has been accepted for publication.

Absint and TU Vienna have been working on the integration of PAG into the ROSE compiler infrastructure as well as on the evaluation of C++ code optimizations.

Absint and Vienna have performed additional work on the ROSE/PAG integration in order to further explore the cluster´s secondary compiler platform.

TU Berlin and ACE have started initial discussions and CoSy platform trainings about Berlin´s future work on integration of compiler verification techniques into CoSy.


Current Results

- Cooperation: Dortmund-IMEC
The main technical outcome of the Dortmund-IMEC collaboration has been an agreement on the basic guidelines for the source to source transformations regarding static and dynamic optimizations (at design time and at run time respectively). These optimizations will target the loop transformations and memory assignment of statically and dynamically allocated data in complex memory hierarchies. The collaboration is mainly based on synchronized, individual work of each of the two partners and aims on common work through PhD research.

- Cooperation: Dortmund-Aachen
Bit-True Data Flow Optimizations as a Processor specific Source-Level Transformation
During the last reporting period, it was found bit-true dataflow analysis and the corresponding SIMD optimizations implemented at the C source-level serve greater benefit than as a post-pass analysis into the LISATek tools, as they can be easily retargeted for different processor architecture. Currently, the data flow analysis along with three different optimizations [Fal06] is implemented for TI C6x DSP. The first optimization detects and optimizes saturated arithmetic operations. The second optimization looks for SIMD instructions for packed parallel arithmetic. Third, is strength reduction optimization which determines the number of unused bits in variables and then reduces them to the smallest data type.

- Cooperation: ACE - Aachen
SIMD Support in CoSy
Aachen and ACE are working on an extensible framework to allow compiler writers to target SIMD hardware more quickly and efficiently. At present, hand optimised point solutions tend to be used in industry which are not conducive to retargeting. Such an approach is essential as part of an overall solution to support SIMD hardware generated from architectural descriptions
One-line Description of the Outcome
Work in progress – extensions to data dependency analysis, support for interprocedural pointer alignment analysis and code generation description extensions to enable SIMD retargetability.
Difficulty : One-line Description of the Difficulty Encountered
No-one has successfully been able to find a formalism or generate tools which facilitate generic retargeting of these algorithms.

Optimisation of Conditional Execution in CoSy.
A dynamic programming algorithm is being implemented and tested on a number of different architectures to validate its behaviour with real world code and current high-end industrial processors.
One-line Description of the Outcome
A prototype comprising a set of optimisation engines and compiler has been constructed.
Difficulty : One-line Description of the Difficulty Encountered
As per the SIMD work, retargetability is the issue.

- Cooperation: AbsInt - TU Vienna
Integration of PAG in the ROSE C++ Infrastructure and Evaluation of C++ programming styles
The ROSE-PAG integration achieved in Y1 for C was substantially extended to cover full C++ (only exlcuding Exceptions). This includes handling of templates, virtual methods, short-circuit evaluation in conditions, resolving overloaded functions, C++ namespaces, constructor and destructor calls. Based on that infrastructure an initial version of a tool for whole-program analysis (WHOPA) was implemented for performing high-level evaluation of different generic programming styles suitable for embedded systems.

Evaluation of C++ optimizations
The high-level analysis of the C++ evaluation cases shows a significant difference in code size after template instantiation and in-lining (which is crucial for the relevant codes to permit whole program optimization). The evaluation cases, although performing the same operations on test data, show a difference by a factor of six in over all codes size including library codes.
The generated assembly code was measured for different optimization levels and additionally evaluated by hand. Our early results show that a certain class of generic programming styles can be automatically optimized such that we obtain similar assembly codes as with low-level C or assembler programming; this indicates that not only C but even very high-level C++ techniques might become suitable for programming embedded systems in near future.

- Cooperation: Dortmund – Absint
Design of a WCET-aware C Compiler
Based on the interface language CRL2 of AbsInt’s timing analysis tool aiT, a successful integration of timing analysis into the compiler infrastructure of Dortmund University was achieved. This was done by automatically translating the assembly-like contents used in compilers to aiTs CRL2 format. Additionally, the results produced by the WCET analyzer aiT were automatically collected and re-imported into the compiler infrastructure. This way, precise timing information is available within a compiler for future optimization for the very first time. In addition, a powerful mechanism was developed to attach not only WCET-related data to the compiler data structures, but also to store arbitrary information used by optimizations targeting different objectives than WCET. This approach will be useful in order to perform automated trade-offs between different optimization goals.

Source Code Transformation for WCET-Optimization
The influence of the loop nest splitting source code optimization on the worst-case execution time (WCET) was examined. Loop nest splitting minimizes the number of executed if-statements in loop nests of embedded applications. It identifies iterations of a loop nest where all if-statements are satisfied and splits the loop nest such that if-statements are not executed at all for large parts of the loop’s iteration space. Especially loops and if-statements of high-level languages are an inherent source of unpredictability and loss of precision for WCET analysis. As a consequence, the optimization achieves a significantly more homogeneous control flow structure. Additionally, the precision of the optimization algorithms led to the generation of very accurate high-level flow facts. All together, considerable reductions of WCET were achieved by the source code optimization
View it online!

- Cooperation: ACE – Aachen
Optimisation of Conditional Execution in CoSy
A dynamic programming algorithm is being implemented and tested on a number of different architectures to validate its behaviour with real world code and current high-end industrial processors.
A prototype comprising a set of optimisation engines and compiler has been constructed.

No-one has successfully been able to find a formalism or generate tools which facilitate generic retargeting of these algorithms.

- Cooperation: Absint – TU Vienna
Extension of the ROSE-PAG integration from C to C++ and Implementation of Alias Analysis
The ROSE-PAG integration achieved in Y1 for C was substantially extended to cover full C++ (only excluding Exceptions). This includes handling of templates, virtual methods, short-circuit evaluation in conditions, resolving overloaded functions, C++ name spaces, constructor and destructor calls. An intra-procedural shape analysis, published by our cluster partner Reinhard Wilhelm, was implemented using PAG. We extended the analysis to an inter-procedural shape analysis. The results of the analysis can be written to an external file and visualized using the tool AiSee.

Infrastructure for high-level specification of C++ program analyses
With the integration of PAG in ROSE, an infrastructure is available that permits using a high-level language for specifying an abstract interpretation of C++ programs. ROSE uses the EDG front end for parsing C++ and offers a powerful interface for accessing and transforming the abstract syntax tree (AST). The decorated AST offers the full type information of C++ input program and the PAG-ROSE integration permits using this type information in the PAG specification (e.g. for virtual method resolution).
Difficulty: Handling of the wide range of programming constructs of a general-purpose language
C++ has such a rich set of programming constructs that research prototypes of analyses often only consider subsets of C/C++. Our goal was to create an infrastructure that permits performing research on real-world programs. In particular, the interface between PAG and ROSE required a careful design, such that we can maintain updates of ROSE and PAG, but keep the required changes in existing analysis specifications at a minimum. Our approach is grammar based and permits the generation of the used design patterns, glue code (between ROSE and PAG), and implementations of the required interfaces.

- Cooperation: Dortmund – IMEC
The main technical outcome of the Dortmund-IMEC collaboration has been an agreement on the basic guidelines for the source to source transformations regarding static and dynamic optimizations (at design time and at run time respectively). These optimizations will target the loop transformations and memory assignment of statically and dynamically allocated data in complex memory hierarchies. The collaboration is mainly based on synchronized, individual work of each of the two partners and aims on common work through PhD research.

- Cooperation: TU Berlin – ACE
Berlin works on the verification as well as on the development of optimizing compiler transformations and machine code generation. Especially in safety-critical applications in the embedded domain, compiler transformations must be both optimizing and correct. Hence, verification is necessary to ensure that transformations indeed preserve program semantics during compilation. Within ARTIST2, the focus is on the development of automated checkers that, for a particular compiler run with its source and target program, make sure that both programs are indeed semantically equivalent. As a starting point, the verification and development of checkers for loop transformations based on unimodular transformations is investigated.


Keynotes, Workshops, Tutorials

- A CoSy workshop was held in Amsterdam 3rd-5th May 2006 for academic partners including Dresden, Berlin, Bologna, University of Amsterdam, Delft and Leiden.
Hans van Someren delivered a lecture for students at Aachen on compiler technology – 21st June 2006.
- Rainer Leupers has given tutorials related to ARTIST2 activities at SBAC (Rio, Sep 2005), AICCSA (Dubai, Mar 2006), HiPEAC summer school (L´Aquila, Jul 2006), and MPSoC (Colorado, Aug 2006).
- Peter Marwedel has organized a compiler workshop at Nokia (Düsseldorf, Jun 2006) with invited speakers including Heiko Falk, Andreas Krall, and Rainer Leupers from ARTIST2.
- Tutorial: The ROSE Source-To-Source Translator
14th International Conference on Parallel Architectures and Compilation Techniques, (PACT’05), 2005 Saint Louis, MI, U.S.A – September 2005
Organization: Markus Schordan (TU-Vienna), together with Daniel J. Quinlan, Bronis R. de Supinski, Qing Yi, Richard Vuduc (Lawrence Livermore National Laboratory)
The tutorial was organized as a hands-on tutorial, separated in five sessions. Each session was organized such that participants got an overview of one component in ROSE, followed by a demo, and some hands-on exercises. Participants brought their own laptops and connected to our server for implementing the exercises. The tutorial covered the ROSE C++ intermediate representation, grammar based program analysis and data-flow analysis (PAG), pre-defined loop optimizations, visualization of the intermediate representation and attached analysis results.
View it online!


Publications Resulting from these Achievements

- [1] D. Atienza, S. Mamagkakis, F. Poletti, J. M. Mendias, F. Catthoor, L. Benini, D. Soudris, "Efficient system-level prototyping of power-aware dynamic memory managers for embedded systems". Elsevier Integration journal 39(2): 113-130 (2006)
- [2] S. Mamagkakis, D. Atienza, C. Poucet, F. Catthoor, D. Soudris, "Energy-efficient dynamic memory allocators at the middleware level of embedded systems", 6th Annual ACM Conference on Embedded Software (EMSOFT’06), Seoul, S. Korea, Oct. 2006
- [3] N. Genko, D.Atienza, J. Mendias, R. Hermida, G. De Micheli and F. Catthoor, “A Complete Network-on-Chip Emulation Framework,” DATE, International Conference on Design and Test Europe, 2005, pp. 246-251.
- [4] M. Hohenauer, C. Schumacher, R. Leupers, G. Ascheid, H. Meyr, H. van Someren: Retargetable Code Optimization with SIMD Instructions, IEEE/ACM Int. Conf. on Hardware/Software Codesign and System Synthesis (CODES+ISSS), Seoul (Korea), Oct 2006
- [5] [Fal06] H. Falk, J. Wager and A. Schaefer: Use of a Bit-true Data Flow Analysis for Processor-Specific Source Code Optimization. Submitted to ESTIMedia 2006.
- [6] [Sch06] Markus Schordan. The Language of the Visitor Design Pattern, 10th Brazilian Symposium on Programming Languages (SBLP’06) 2006. Proceedings of the 10th Brazilian Symposium on Programming Languages, pp. 235-248, ISBN 85-7669-071-3, ANAIS, 2006.
- [7] [QSVY06] Daniel Quinlan, Markus Schordan, Richard Vuduc, and Qing Yi. Annotating User-Defined Abstractions for Optimization. Workshop on Performance Optimization for High-Level Languages and Libraries (POHLL’06) in conjunction with 20th IEEE International Parallel & Distributed Processing Symposium (IPDPS 2006), Rhodes Island, Greece, April 2006. Proceedings of the 20th IEEE International Parallel & Distributed Processing Symposium (IPDPS 2006), Workshop on Performance Optimization for High-Level Languages and Libraries, CDROM, IEEE Computer Society, 2006.
- [8] [SQ05] Markus Schordan and Daniel Quinlan. Specifying Transformation Sequences as Computation on Program Fragments with an Abstract Attribute Grammar. Fifth IEEE International Workshop on Source Code Analysis and Manipulation (SCAM’05). Budapest, Hungary, September 2005. Proceedings of the Fifth IEEE International Workshop on Source Code Analysis and Manipulation, pp. 97-106, IEEE, ISBN 0-7695-2292-0, 2005.
- [9] [FLT06] Heiko Falk, Paul Lokuciejewski and Henrik Theiling. Design of a WCET-aware C Compiler. In Proceedings of “The 6th International Workshop on Worst-Case Execution Time Analysis” (WCET), Dresden, Germany, July 2006.
- [10] [FS06 Heiko Falk and Martin Schwarzer. Loop Nest Splitting for WCET-Optimization and Predictability Improvement. In Proceedings of “The 6th International Workshop on Worst-Case Execution Time Analysis” (WCET), Dresden, Germany, July 2006.


ARTIST2 Participants: Expertise and Roles

- Team Leader: Reinhard Wilhelm (Saarland University)
cluster co-leader, compiler design, static program analysis, timing analysis.
- Team Leader: Rainer Leupers (RWTH Aachen)
cluster co-leader, code optimization, retargetable compilation.
- Team Leader: Sabine Glesner (TU Berlin)
platform activity leader, compiler verification, compiler optimization.
- Team Leader: Christian Ferdinand (AbsInt)
Program-Analysis Tool, leading WCET estimation tool supplier.
- Team Leader: Peter Marwedel (Dortmund Univ.)
Architecture-aware compilation, low-power code generation, Development of optimizations for WCET minimization.
- Team Leader: Hans van Someren (ACE)
compiler development platform.

Affiliated Participants: Expertise and Roles

- Team Leader: Francky Catthoor (IMEC)
high-level code optimization.
- Team Leader: Markus Schordan, Andreas Krall (TU Vienna)
Program analysis, source-to-source transformation, code optimization.

(c) Artist Consortium, All Rights Reserved - 2006, 2007, 2008, 2009

Réalisation Axome - Création de sites Internet