# **Distributed Operation Layer**

Iuliana Bacivarov, Wolfgang Haid, Kai Huang, and Lothar Thiele, ETH Zürich



## Outline

- Distributed Operation Layer Overview
- Specification
  - Application
  - Architecture
  - Mapping
- Design Space Exploration
  - Mapping Optimization
  - Performance Evaluation
- Demo


## **Distributed Operation Layer Overview**



### Context

- Highly parallel applications
  - e.g. wave field synthesis, physical particle modeling, ultrasound scanners
- Multi-processor architectures
  - tile-based platform

### **DOL Framework**

- Scalable specification of applications and architectures
- Functional simulation for debugging and profiling
- Design space exploration: performance analysis and optimization

## **Application Specification**

### Structure

- Process Network
  - Processes
  - SW channels (FIFO behavior)
- Iterators
  - Scalability for processes, SW channels, entire structures

### Annotations

e.g. channel size, process runtimes

### **Functional specification**

- Language: C/C++
- DOL-specific programming API




| Algorithm 1: Process Model             |                  |
|----------------------------------------|------------------|
| 1: **procedure** INIT(DOLProcess *p*)  | ▷ initialization |
| 2: initialize local data structures    |                  |
| 3: **end procedure**                   |                  |
| 4: **procedure** FIRE(DOLProcess *p*)  | ▷ execution      |
| 5: DOL_read(INPUT, size, buf)          | ▷ blocking read  |
| 6: manipulate                          |                  |
| 7: DOL_write(OUTPUT, size, buf)        | ▷ blocking write |
| 8: **end procedure**                   |                  |
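The INIT/FIRE process model can be sketched in plain C. This is only an illustration under stated assumptions: `Fifo`, `Process`, `fifo_read`, `fifo_write`, `process_init`, and `process_fire` are hypothetical names, not the DOL API. The real `DOL_read` and `DOL_write` block, whereas these stand-ins return `-1` where a real call would block.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a DOL SW channel: a bounded byte FIFO.
 * The real DOL_read/DOL_write block; this sketch returns -1 instead. */
#define FIFO_CAPACITY 64

typedef struct {
    unsigned char data[FIFO_CAPACITY];
    size_t head;   /* index of the next byte to read */
    size_t count;  /* number of bytes currently stored */
} Fifo;

int fifo_write(Fifo *f, const void *buf, size_t size) {
    const unsigned char *src = buf;
    if (size > FIFO_CAPACITY - f->count) return -1;  /* would block: full */
    for (size_t i = 0; i < size; i++)
        f->data[(f->head + f->count + i) % FIFO_CAPACITY] = src[i];
    f->count += size;
    return 0;
}

int fifo_read(Fifo *f, void *buf, size_t size) {
    unsigned char *dst = buf;
    if (size > f->count) return -1;  /* would block: not enough data */
    for (size_t i = 0; i < size; i++)
        dst[i] = f->data[(f->head + i) % FIFO_CAPACITY];
    f->head = (f->head + size) % FIFO_CAPACITY;
    f->count -= size;
    return 0;
}

/* Hypothetical process state; DOLProcess plays this role in the slide. */
typedef struct {
    Fifo *input;
    Fifo *output;
    int gain;  /* local data set up by INIT */
} Process;

/* INIT: initialize local data structures. */
void process_init(Process *p, Fifo *in, Fifo *out) {
    assert(p != NULL && in != NULL && out != NULL);
    p->input = in;
    p->output = out;
    p->gain = 2;
}

/* FIRE: read one token, manipulate it, write the result. */
int process_fire(Process *p) {
    int token;
    if (fifo_read(p->input, &token, sizeof token) != 0) return -1;
    token *= p->gain;  /* the "manipulate" step */
    return fifo_write(p->output, &token, sizeof token);
}
```

A scheduler would call `process_init` once and then `process_fire` repeatedly; each firing consumes one token from the input channel and produces one on the output channel.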

Eidgenössische Technische Hochschule Zürich, Swiss Federal Institute of Technology Zurich

## **Architecture Specification**

#### XML Description Available




### Abstract platform modeling

- Processors, peripherals, memories, buses, etc.
- Read and write communication paths
- Performance data: latency and bandwidth of HW communication channels



## **Mapping Specification**

### Binding

- Processes to execution resources
- SW channels to read/write paths

### Scheduling

- Processors
- Communication

### Constraints

- To be considered during mapping




## **Design Space Exploration Framework**

- Modular framework: optimization algorithms and performance analysis as plug-ins
- PISA & EXPO: multi-objective optimization using evolutionary algorithms
  - [PISA] https://www.tik.ee.ethz.ch/pisa
  - [EXPO] https://www.tik.ee.ethz.ch/expo
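Multi-objective evolutionary algorithms compare candidate solutions by Pareto dominance. The sketch below shows that comparison for candidate mappings, assuming every objective is minimized. The function name `dominates` and the choice of objectives are assumptions for illustration; it does not reproduce PISA or EXPO code.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if objective vector a Pareto-dominates b under minimization:
 * a is no worse than b in every objective and strictly better in at least one.
 * Returns 0 otherwise (including when a and b are equal). */
int dominates(const double *a, const double *b, size_t n_objectives) {
    assert(a != NULL && b != NULL);
    int strictly_better = 0;
    for (size_t i = 0; i < n_objectives; i++) {
        if (a[i] > b[i]) return 0;             /* worse in one objective */
        if (a[i] < b[i]) strictly_better = 1;  /* better in one objective */
    }
    return strictly_better;
}
```

With two assumed objectives, such as the largest processor time and the largest HW channel time of a mapping, a mapping with values (7, 6) dominates one with (9, 6), while (7, 6) and (5, 8) are incomparable and would both stay in the Pareto set.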




## **Additive Model for Performance Analysis**

#### Processor execution time:

$$T(\text{processor } c) = \sum_{\forall p \text{ mapped onto } c} r(p, c) \tag{1}$$

- $r(p, c)$: runtime of process $p$ on processor $c$ → VSP simulation

#### Transfer time of a communication resource:

$$T(\text{HW channel } g) = \sum_{\forall s \text{ mapped onto } g} \frac{d(s)}{b(g)} \tag{2}$$

- $d(s)$: number of bytes communicated over SW channel $s$ → functional simulation
- $b(g)$: bandwidth of resource $g$ → platform benchmarking

#### Simplified, static model

- Parameters: resource allocation, binding of processes and SW channels to resources
- Does not account for computation and communication overhead or blocking times; HW channel usage should stay below a certain threshold (up to 60% of the maximum bandwidth)
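The two sums of the additive model can be evaluated directly once a binding is fixed. The following is a minimal sketch, assuming the binding is stored as an index array and that `runtime[p]` already holds the runtime of process p on the processor it is bound to. The names `processor_time` and `channel_time` are hypothetical and not part of DOL.

```c
#include <assert.h>
#include <stddef.h>

/* Eq. (1): sum of r(p, c) over all processes bound to processor c.
 * binding[p] is the processor index of process p;
 * runtime[p] is the runtime of p on that processor. */
double processor_time(const double *runtime, const int *binding,
                      size_t n_processes, int c) {
    double total = 0.0;
    for (size_t p = 0; p < n_processes; p++)
        if (binding[p] == c) total += runtime[p];
    return total;
}

/* Eq. (2): sum of d(s) / b(g) over all SW channels bound to HW channel g.
 * bytes[s] is d(s); bandwidth is b(g) in bytes per time unit. */
double channel_time(const double *bytes, const int *binding,
                    size_t n_channels, int g, double bandwidth) {
    assert(bandwidth > 0.0);
    double total = 0.0;
    for (size_t s = 0; s < n_channels; s++)
        if (binding[s] == g) total += bytes[s] / bandwidth;
    return total;
}
```

For example, with runtimes 2, 3, and 5 and the first and third process bound to processor 0, Eq. (1) gives a processor time of 7 for processor 0. With 1024 and 512 bytes sent over one HW channel of bandwidth 256 bytes per time unit, Eq. (2) gives a transfer time of 6.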

### **On-Going & Future Work in Performance Analysis**

Objective: more accurate and complex performance analysis model, i.e. including dynamic features

#### MPA – Modular Performance Analysis (http://www.mpa.ethz.ch/)

- Modular system composition; Abstract event processing unit
- Dynamic aspects included: scheduling

#### On-going research:

1. **Complex activation schemes:** ref: W. Haid, L. Thiele, "Complex Task Activation Schemes in System Level Performance Analysis", CODES+ISSS'07
2. **Timing correlation:** ref: K. Huang, L. Thiele, T. Stefanov, E. Deprettere, "Performance Analysis of Multimedia Applications using Correlated Streams", DATE'07




## **Objective and Design Trajectory**

 Objective: map a parallel application onto a parallel, heterogeneous architecture



### **Design Trajectory**

- System specification
  - Application
  - Architecture
- Application profiling
  - Functional simulation
  - Instruction-/cycle-accurate simulation
- Mapping optimization
  - Performance estimation using analytic model
  - Optimization algorithm

## **Design Trajectory**

1. Specification
2. Functional Simulation
3. Virtual Simulation Platform
4. Profiling
5. Optimization

## **System Specification: WFS Application**

### Structure

- XML representation of the Kahn process network for Wave Field Synthesis

### Functionality

- ANSI C & DOL API



| Algorithm 1: Process Model             |                  |
|----------------------------------------|------------------|
| 1: **procedure** INIT(DOLProcess *p*)  | ▷ initialization |
| 2: initialize local data structures    |                  |
| 3: **end procedure**                   |                  |
| 4: **procedure** FIRE(DOLProcess *p*)  | ▷ execution      |
| 5: DOL_read(INPUT, size, buf)          | ▷ blocking read  |
| 6: manipulate                          |                  |
| 7: DOL_write(OUTPUT, size, buf)        | ▷ blocking write |
| 8: **end procedure**                   |                  |

## **Functional Simulation**

The *functional simulation* is a tool to develop, debug, and test applications.




## **System Specification: Architecture**

- Architecture modeled at an abstract level in XML format
- Modeled elements:
  - Processors, buses, memories
  - Communication paths between these elements




## Mapping Optimization Framework

### Approach:

- Profile application (*once*) using functional and instruction-accurate simulation
- Perform mapping optimization using an analytic model




## Where are data obtained from?



### **Static parameters:**

- bus bandwidth t(g) (written b(g) in Eq. (2))

### **Functional simulation:**

- number of activations for each process n(p)
- amount of data for each channel b(s) (written d(s) in Eq. (2))

### **VSP:**

- runtime of each process on different processors r(p, c) (by using benchmark mappings)

## **Profiling VSP Execution**

1. Benchmark mapping generation
2. Binary generation (using HdS & compiler), Tcl script generation
3. Execution on VSP using Tcl script (log file generation)
4. Log file analysis and back-annotation



#### Benchmark mappings:


## Mapping of WFS onto SHAPES Tile

#### Process network of WFS application



#### Block diagram of SHAPES tile



#### EXPO mapping example



## Mapping Optimization Framework


### **Optimization tools:**

- SPEA2: evolutionary algorithm for optimization
- EXPO
  - Mapping generation
  - Performance estimation
  - Graphical user interface



## Summary

- Demonstration of how a parallel application is mapped to a heterogeneous and distributed architecture
- Demonstration of the integration of the complete SW tool chain
  - DOL → HdS → 2 compilers → VSP (→ DOL)
  - DOL–HdS interface: XML specification
  - DOL–VSP interface: Tcl script
- EXPO framework for mapping optimization

## **Thank You!**


## **Any Questions?**

