#### Mapping Algorithms onto MPSoC

#### - Some Results from ARTISTDesign -

Lothar Thiele





#### Contents

- Overview about DOL
- Heterogeneous Modeling
- Interference Analysis for Multicore Systems
- Experimental Results





## **DOL Programming Environment**

#### Goal

Automatically generate an optimal mapping of a parallel application onto **MPSoC** 

#### **Challenges**

- Scalable specification
- Automated synthesis
- Design space exploration
- System-level performance analysis





## **DOL Programming Model**



- Model of computation: Kahn process network
  - Coordination language: XML with performance annotations
  - Functionality of processes: C/C++ with specific programming DOL API
- Scalability: "iterators" for large, multi-tile descriptions



### **DOL Abstract Platform Modeling**



#### ▶ Elements

- Structure: processors, peripherals, memories, buses, etc.
- Interconnects: explicit read and write communication paths
- Performance data: e.g. latency and bandwidth of HW communication channels

5

Specification: XML, including the "iterators" capability



Computer Engineering and Networks Laboratory



### **DOL Software Synthesis**



- MPARM: identical ARM7 tiles connected by AMBA bus
- Atmel Diopsis 940: tile:ARM9+DSP connected by an AMBA bus; several tiles connected via NoC
- Cell BE: PowerPC and 8 DSP-like SPEs connected via ring bus





## **Mapping Optimization Framework**



Modular framework: optimization and performance analysis plug-ins

- **Evolutionary algorithm:** SPEA2 (Strength Pareto Evolutionary Algorithm)
- Performance analysis: MPA (Modular Performance Analysis) analytic model http://www.mpa.ethz.ch
- Control & GUI: EXPO https://www.tik.ee.ethz.ch/expo
- Interface: PISA https://www.tik.ee.ethz.ch/pisa

Swiss Federal Institute of Technology



### **Performance Analysis using MPA**



**Design:** abstract event streams traversing a network of abstract processing units

> **Output:** worst-case bounds on system properties

- Modular Performance Analysis http://www.mpa.ethz.ch
- (Large) MPSoC extensions: complex activation schemes, timing correlations, blocking semantics, cyclic dependencies



### **Extension: Distributed Applications**



#### Contents

Overview about DOL

#### Heterogeneous Modeling

- Embedding into rigorous design flow (VERIMAG)
- Integration of Symta/S (TU Braunschweig)
- Heterogeneous Analysis (UPPAAL)
- Hierarchical Modeling (TU Braunschweig)
- Comparison of Methods (U Cantabria, TU BS)
- Interference Analysis for Multicore Systems
- Experimental Results



### **DOL Software Synthesis**



- MPARM: identical ARM7 tiles connected by AMBA bus
- Atmel Diopsis 940: tile:ARM9+DSP connected by an AMBA bus; several tiles connected via NoC
- Cell BE: PowerPC and 8 DSP-like SPEs connected via ring bus





### **Rigorous Design Flow (VERIMAG)**

The *frontend* transforms networks of processes written in DOL into BIP models

The *backend* applies a series of source-tosource correct-byconstruction transformations over the BIP software model.

The generated system model includes can be validated and analyzed





### **Global Picture**

# Mixing time-interval-based and time-based abstractions









### Integration by Stream Approximation



#### **Replace RTC component by**

- classical single-resource
   real-time analysis
- holistic scheduling analysis (MAST, *Gonzales*, Tindell, Yen, *Eles*, ...)
- Symta/S compositional analysis (*Ernst*)

#### **Benefits**

- incorporate advanced analysis methods; adapt analysis method to component characteristics
- reduce analysis complexity

































### Analytic vs. State-Based Abstraction





### **Composition Principle**





#### Linear arrival curves



Swiss Federal Institute of Technology

![](_page_22_Picture_4.jpeg)

#### **Structured Event Streams (ETHZ – TU BS)**

**Relevant design pattern:** 

![](_page_23_Figure_2.jpeg)

#### **Hierarchical Event Model**

J. Rox & R. Ernst, *Modeling Event Stream Hierarchies with Hierarchical Event Models* 

![](_page_24_Figure_2.jpeg)

Limitations: - Method is not transparent to existing component models

Deep processing of HES required by component models

**Computer Engineering and** 

Networks Laboratory

#### **Structured Event Streams**

![](_page_25_Figure_1.jpeg)

#### Cyclic Resource Dependencies (ETHZ – UPPSALA)

![](_page_26_Figure_1.jpeg)

**Acyclic dependency** 

![](_page_26_Figure_3.jpeg)

**Cyclic dependency** 

#### **Cyclic Resource Dependencies**

$$\Sigma := (\alpha_{in}, \beta_{in}, \alpha, \beta)$$

$$\Psi : \Sigma \to \Sigma'$$
(reflects MPA transfer functions)
$$\Sigma^{0}, \Psi(\Sigma^{0}), \Psi^{2}(\Sigma^{0}), \dots \xrightarrow{?} \Sigma^{*}$$

#### **Questions**:

- Can there be several fixpoints ?
- Is any fixpoint of  $\psi$  correct with respect to the system traces ?
- Can the optimal fixpoint be computed with the above sequence ?
- How to choose the initial approximati $\partial n^0$ ?

### RTC Toolbox (www.mpa.ethz.ch/rtctoolbox)

| Matlab Command Line                  | Simulink    |  |  |  |  |  |
|--------------------------------------|-------------|--|--|--|--|--|
| RTC Toolbox                          |             |  |  |  |  |  |
| MPA Library                          | RTI Library |  |  |  |  |  |
| Min-Plus/Max-Plus Algebra Library    |             |  |  |  |  |  |
| Matlab / Java Interface              |             |  |  |  |  |  |
| Java API                             |             |  |  |  |  |  |
| Min-Plus/Max-Plus Algebra, Utilities |             |  |  |  |  |  |
| Efficient Curve Representation       |             |  |  |  |  |  |

![](_page_28_Picture_2.jpeg)

![](_page_28_Picture_4.jpeg)

#### Contents

- Overview about DOL
- Heterogeneous Modeling
- Interference Analysis for Multicore Systems
  - Joint work with Marco Caccamo (Int. Cooperation)
- Experimental Results

![](_page_29_Picture_6.jpeg)

![](_page_29_Picture_8.jpeg)

![](_page_30_Picture_1.jpeg)

#### **Motivation**

- COTS Systems use shared resources (Memory, Bus)
- Multiple entities competing for shared resources
  - waiting for other entities to release the resource
  - accessing the resources

![](_page_30_Figure_7.jpeg)

![](_page_31_Picture_0.jpeg)

![](_page_31_Picture_1.jpeg)

#### Task / Superblock Model

- Tasks are structured as sequences of superblocks
  - fixed order of execution
  - upper bounds on execution and communication demands
- Dedicated phases for resource access and computation
  - phases have different amount of access requests
  - structure increases predictability (in terms of WCRT)
  - model motivated by industrial applications in the automotive industry

![](_page_31_Figure_10.jpeg)

![](_page_32_Picture_1.jpeg)

#### **Resource Access Models**

![](_page_32_Figure_3.jpeg)

- **DS** dedicated sequential phases, sequential superblocks
- **HS** hybrid sequential phases, sequential superblocks
- **HTS** hybrid sequential phases, time-triggered superblocks
- HTT hybrid time-triggered phases time-triggered superblocks
- **GS** general sequential phases, sequential superblocks
- **GTS** general sequential phases, time-triggered superblocks

#### Contents

- Overview about DOL
- Heterogeneous Modeling
- Interference Analysis for Multicore Systems
- Experimental Results
  - MPARM Simulator Luca Benini (U Bologna)

![](_page_33_Picture_6.jpeg)

![](_page_33_Picture_8.jpeg)

### Integration / Architecture Template

![](_page_34_Figure_1.jpeg)

#### (ETHZ – U BOLOGNA)

![](_page_34_Picture_3.jpeg)

![](_page_34_Picture_5.jpeg)

## Integration / Application Modeling

Algorithm 1 Example of a process with multiple inputs and outputs.

- 1: function FIRE(DOLProcess \*p)
- 2:  $DOL\_read(input[1], buffer\_in[1], N\_in[1]);$
- 3:  $DOL\_read(input[2], buffer\_in[2], N\_in[2]);$
- 4:  $DOL\_read(input[3], buffer\_in[3], N\_in[3]);$
- 5: execute;
- $6: \quad DOL\_write(output[1], buffer\_out[1], N\_out[1]);$
- 7:  $DOL_write(output[2], buffer_out[2], N_out[2]);$
- 8: end function

![](_page_35_Figure_10.jpeg)

![](_page_35_Picture_11.jpeg)

![](_page_35_Picture_13.jpeg)

## Integration / Communication Modeling

intra-processor communication

inter-processor communication

![](_page_36_Figure_3.jpeg)

![](_page_36_Figure_4.jpeg)

![](_page_36_Picture_7.jpeg)

#### **Example: Wave Field Synthesis**

![](_page_37_Figure_1.jpeg)

![](_page_37_Picture_2.jpeg)

![](_page_37_Picture_4.jpeg)

### **Example: Compilation Times**

| step                                          |                                       | duration        |                   |                 |  |
|-----------------------------------------------|---------------------------------------|-----------------|-------------------|-----------------|--|
|                                               |                                       | P–C             | MJPEG             | WFS             |  |
| model<br>calibration<br>(one-time<br>effort)  | functional simulation generation      | $22\mathrm{s}$  | $42\mathrm{s}$    | $35\mathrm{s}$  |  |
|                                               | functional simulation                 | $0.2\mathrm{s}$ | $3.6\mathrm{s}$   | $2.4\mathrm{s}$ |  |
|                                               | synthesis (generation of binary)      | $2\mathrm{s}$   | $4\mathrm{s}$     | $3\mathrm{s}$   |  |
|                                               | simulation on MPARM                   | $23\mathrm{s}$  | $13550\mathrm{s}$ | $740\mathrm{s}$ |  |
|                                               | log-file analysis and back-annotation | 1 s             | $12\mathrm{s}$    | $3\mathrm{s}$   |  |
| model generation                              |                                       | 1 s             | $1\mathrm{s}$     | $1\mathrm{s}$   |  |
| performance analysis based on generated model |                                       | $0.2\mathrm{s}$ | $2.5\mathrm{s}$   | $1.4\mathrm{s}$ |  |

![](_page_38_Picture_4.jpeg)

#### **Example: Accuracy**

| observed estimated |       |     |                      |                  |     |                      |                  |  |  |  |
|--------------------|-------|-----|----------------------|------------------|-----|----------------------|------------------|--|--|--|
| process            | proc. | pr. | delay                | backlog          | pr. | delay                | backlog          |  |  |  |
| p-c.p1             | 1     | 1   | $209 \ (\leq 223)$   | $5 (\le 6)$      | 2   | $357 \ (\leq 401)$   | $6 (\leq 8)$     |  |  |  |
| p–c.p3             | 1     | 2   | $329 \ (\leq 371)$   | $7 (\le 9)$      | 1   | $37 \ (\leq 43)$     | $1 (\leq 2)$     |  |  |  |
| p–c.p2             | 2     | 1   | $29 \ (\leq 38)$     | $1 \ (\leq 2)$   | 1   | $30 \ (\leq 35)$     | $1 \ (\leq 2)$   |  |  |  |
| mjpeg.ss           | 1     | 1   | $203 \ (\leq 240)$   | $4 \ (\leq 6)$   | 2   | $321 \ (\leq 441)$   | $3 (\leq 5)$     |  |  |  |
| mjpeg.ms           | 1     | 2   | $694 \ (\leq 781)$   | $1 \ (\leq 3)$   | 1   | $133 \ (\le 190)$    | $1 \ (\leq 1)$   |  |  |  |
| mjpeg.sf           | 2     | 1   | $2591 \ (\leq 3014)$ | $5 \ (\le 6)$    | 2   | $3226 \ (\leq 4315)$ | $6 \ (\leq 6)$   |  |  |  |
| mjpeg.mf           | 2     | 2   | $1881 \ (\leq 2143)$ | $2 \ (\leq 4)$   | 1   | $307 \ (\leq 340)$   | $1 \ (\leq 2)$   |  |  |  |
| mjpeg.zii          | 3     | 1   | $6164 \ (\leq 6762)$ | $4 \ (\leq 6)$   | 1   | $5971 \ (\leq 6663)$ | $4 \ (\leq 6)$   |  |  |  |
| wfs.ctrl           | 1     | 1   | $202 \ (\leq 235)$   | $3 \ (\leq 5)$   | 3   | $405 \ (\leq 795)$   | $5 \ (\leq 7)$   |  |  |  |
| wfs.src            | 1     | 2   | $292 \ (\leq 387)$   | $4 \ (\leq 5)$   | 2   | $228 \ (\leq 357)$   | $3 \ (\leq 5)$   |  |  |  |
| wfs.ls             | 1     | 3   | $4931 \ (\leq 5402)$ | $8 \ (\le 12)$   | 1   | $4996 \ (\leq 5512)$ | $9 \ (\leq 14)$  |  |  |  |
| wfs.comp1          | 2     | 1   | $1606 \ (\le 1919)$  | $12 \ (\le 15)$  | 2   | $6157 \ (\leq 7720)$ | $26 \ (\leq 30)$ |  |  |  |
| wfs.comp2          | 2     | 2   | $5960 \ (\le 6838)$  | $25 \ (\leq 26)$ | 1   | $1940 \ (\leq 2156)$ | $15 \ (\leq 20)$ |  |  |  |

![](_page_39_Picture_2.jpeg)

![](_page_39_Picture_4.jpeg)

![](_page_40_Picture_0.jpeg)

![](_page_40_Picture_1.jpeg)

![](_page_40_Picture_3.jpeg)