# Benchmark Applications

#### **Todor Stefanov and Ed Deprettere**

Leiden Embedded Research Center, Leiden Institute of Advanced Computer Science, Leiden University, The Netherlands



# Agenda

- Introduction to the current benchmark suite
- Brief reporting on the experiences with the benchmark applications
  - Iuliana Bacivarov (10 min)
  - Todor Stefanov (10 min)
  - Bastian Ristau (10 min)
- Discussion

### **Benchmark Suite**

- Common set of benchmark applications is missing in the MPSoC community
  - The issue was discussed at the Map2MPSoC working meeting on November 27-28, 2008 in Dusseldorf
  - Many applications are available BUT
    - They are in sequential form
    - The MPSoC community needs parallel application specifications
- Decision made at the meeting
  - To initiate the creation of an application benchmark suite for MPSoC
  - To create a common repository where interested parties can upload applications for the benchmark suite
  - To make the benchmark suite available on the Internet

#### **Current Content of Benchmark Suite**

- The benchmark suite is available on the Internet at http://www.artist-embedded.org/artist/Benchmarks.html
- It consists of 8 applications contributed by 3 universities
  - U Leiden (Todor Stefanov)
    - Motion JPEG encoder
    - JPEG 2000 encoder
    - Sobel
  - TU Eindhoven (Marc Geilen)
    - MP3
    - H.263 encoder
    - H.263 decoder
  - ETH Zurich (Iuliana Bacivarov)
    - MPEG-2 decoder
    - MJPEG decoder

# Features of the applications

| Application         | Sequential spec: format | Sequential spec: fully functional | Parallel spec: format | Parallel spec: fully functional | Parallel spec: MoC | License |
|---------------------|-------------------------|-----------------------------------|-----------------------|---------------------------------|--------------------|---------|
| Motion JPEG encoder | C                       | Yes                               | C++ for YAPI          | Yes                             | KPN                | CPL     |
|                     |                         |                                   | XML for Daedalus      | Yes                             | KPN                | CPL     |
| JPEG 2000 encoder   | C                       | Yes                               | C++ for YAPI          | Yes                             | KPN                | CPL     |
|                     |                         |                                   | XML for Daedalus      | Yes                             | KPN                | CPL     |
| Sobel               | C                       | Yes                               | C++ for YAPI          | Yes                             | KPN                | CPL     |
|                     |                         |                                   | XML for Daedalus      | Yes                             | KPN                | CPL     |
| MP3                 | C                       | Yes                               | XML for SDF3          | No                              | SDF                | GPL     |
| H.263 encoder       | C                       | Yes                               | XML for SDF3          | No                              | SDF                | GPL     |
| H.263 decoder       | C                       | Yes                               | XML for SDF3          | No                              | SDF                | GPL     |
| MPEG-2 decoder      | C                       | Yes                               | XML and C for DOL     | Yes                             | KPN                | ETHZ    |
| MJPEG decoder       | C                       | Yes                               | XML and C for DOL     | Yes                             | KPN                | ETHZ    |

#### Some observations:

- All applications are streaming (data-flow) applications
- All parallel specs use well-known data-flow models of computation (MoC)
- All parallel specs are in different XML formats
- Not all parallel specs are fully functional
- **...**

# Applications contributed by U Leiden



- Sobel (edge detection)
  - only task-level parallelism
- JPEG2000 (encoder)
  - mainly task-level parallelism
- Motion JPEG (encoder)
  - task-level and data-level parallelism





# MP-SoC platform considered

Platform ⇔ library of components + well-defined rules on how to connect and synchronize them

- Processing Components:
  - Programmable processors
  - Hardware IP Cores
- Memory Components:
  - Program, Data (on-chip and external) Memory (MEM)
  - Communication Memory (CM)
- Communication Components:
  - Point-to-point network
  - Crossbar switch
  - Shared bus with Round-Robin, Fixed Priority, or TDMA arbitration



Many alternative platform instances can be constructed quickly and easily by instantiating different types and numbers of components and setting their parameters.

Communication Controller (CC) – interface between processing, memory, and communication components

#### Parallel Specification in C++ YAPI format

YAPI simulation environment is available at http://sourceforge.net/projects/y-api/



```
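// process ND_1
void main() {
  // CONTROL
  for (int i = 2; i <= M; i++) {
    for (int j = 2; j <= N; j++) {
      // READ
      if (j - 2 == 0)
        read(IP1, in_0);       // first iteration of a row: token from ND_0
      if (j - 3 >= 0)
        read(IP2, in_0);       // later iterations: token from the self-loop ED_1
      // EXECUTE
      Transformer(in_0, out_0);
      // WRITE
      if (-j + N - 1 >= 0)
        write(OP1, out_0);     // feed the self-loop ED_1
      if (j - N == 0)
        write(OP2, out_0);     // last iteration of a row: token to ND_2
    } // for j
  } // for i
} // main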
```
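The firing pattern of ND_1 can be checked with a small stand-alone sketch. This is not YAPI code: `Nd1Trace` and `trace_nd1` are hypothetical names, no data is moved, and the sketch assumes that all four port guards test the inner iterator `j`, which is consistent with the IP1 constraint j - 2 = 0 in the XML specification of the same process.

```cpp
#include <algorithm>
#include <cassert>

// Token counts observed on each port of ND_1, plus the peak occupancy of
// the self-loop channel ED_1 (hypothetical helper type).
struct Nd1Trace {
    long ip1_reads = 0;
    long ip2_reads = 0;
    long op1_writes = 0;
    long op2_writes = 0;
    long ed1_peak = 0;  // maximum number of tokens held in ED_1 at any time
};

// Replays the control structure of ND_1 without moving real data.
// Assumption: every port guard is a condition on the inner iterator j.
Nd1Trace trace_nd1(int N, int M) {
    assert(N >= 2 && M >= 2);  // iteration domain must not be empty
    Nd1Trace t;
    long ed1_tokens = 0;
    for (int i = 2; i <= M; ++i) {
        for (int j = 2; j <= N; ++j) {
            if (j - 2 == 0) ++t.ip1_reads;   // token from ND_0 via ED_0
            if (j - 3 >= 0) {                // token from the self-loop ED_1
                assert(ed1_tokens > 0);      // a blocking read would stall here
                --ed1_tokens;
                ++t.ip2_reads;
            }
            // Transformer(in_0, out_0) would execute here.
            if (-j + N - 1 >= 0) {           // feed the self-loop ED_1
                ++ed1_tokens;
                ++t.op1_writes;
                t.ed1_peak = std::max(t.ed1_peak, ed1_tokens);
            }
            if (j - N == 0) ++t.op2_writes;  // token to ND_2 via ED_2
        }
    }
    return t;
}
```

Under these assumptions, calling `trace_nd1(450, 275)` reports as many reads on IP2 as writes on OP1, so the self-loop ED_1 is balanced, and at most one token is ever buffered in it, which fits a FIFO of size 1.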

```
#ifndef simple_KPN_H
#define simple_KPN_H

#include "yapi.h"
#include "ND_0.h"
#include "ND_1.h"
#include "ND_2.h"

class simple : public ProcessNetwork {
  private:
    // channels
    Fifo<tCH_1> ED_0;
    Fifo<tCH_2> ED_1;
    Fifo<tCH_3> ED_2;

    // processes
    ND_0 ND_0_instance;
    ND_1 ND_1_instance;
    ND_2 ND_2_instance;

  public:
    simple(Id n, int parm_N, int parm_M) :
      ProcessNetwork(n),
      ED_0(id("ED_0"), 1, 1),
      ED_1(id("ED_1"), 1, 1),
      ED_2(id("ED_2"), 1, 1),
      ND_0_instance(id("ND_0"), ED_0, parm_N, parm_M),
      ND_1_instance(id("ND_1"), ED_0, ED_1, ED_1, ED_2, parm_N, parm_M),
      ND_2_instance(id("ND_2"), ED_2, parm_N, parm_M)
    {}

    const char* type() const { return "simple"; }
};

#endif /* simple_KPN_H */
```

#### Network Topology Specification in XML

Daedalus design framework is available at http://daedalus.liacs.nl

```
<sadg>
  <adg name="simple" levelUpNode="">
    <parameter name="N" lb="450" ub="1000" value="450"/>
    <parameter name="M" lb="275" ub="1000" value="275"/>

    <node name="ND_0" levelUpNode="">
      <outport name="OP1" node="ND_0" edge="ED_0">
      </outport>
    </node>
    <node name="ND_1" levelUpNode="">
      <inport name="IP1" node="ND_1" edge="ED_0">
      </inport>
      <inport name="IP2" node="ND_1" edge="ED_1">
      </inport>
      <outport name="OP1" node="ND_1" edge="ED_1">
      </outport>
      <outport name="OP2" node="ND_1" edge="ED_2">
      </outport>
    </node>
    <node name="ND_2" levelUpNode="">
      <inport name="IP1" node="ND_2" edge="ED_2">
      </inport>
    </node>

    <edge name="ED_0" fromPort="OP1" fromNode="ND_0" toPort="IP1" toNode="ND_1" size="1">
    </edge>
    <edge name="ED_1" fromPort="OP1" fromNode="ND_1" toPort="IP2" toNode="ND_1" size="1">
    </edge>
    <edge name="ED_2" fromPort="OP2" fromNode="ND_1" toPort="IP1" toNode="ND_2" size="1">
    </edge>
  </adg>
</sadg>
```


#### Process Control Code Specification in XML (1)



```
2 <= i <= M,
2 <= j <= N,
j - 2 = 0
2 <= i <= M,
j - 2 = 0
i - 2 = 0
-i + M = 0
j - 2 = 0
```

```
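// process ND_1
void main() {
  // CONTROL
  for (int i = 2; i <= M; i++) {
    for (int j = 2; j <= N; j++) {
      // READ
      if (j - 2 == 0)
        read(IP1, in_0);       // first iteration of a row: token from ND_0
      if (j - 3 >= 0)
        read(IP2, in_0);       // later iterations: token from the self-loop ED_1
      // EXECUTE
      Transformer(in_0, out_0);
      // WRITE
      if (-j + N - 1 >= 0)
        write(OP1, out_0);     // feed the self-loop ED_1
      if (j - N == 0)
        write(OP2, out_0);     // last iteration of a row: token to ND_2
    } // for j
  } // for i
} // main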
```



#### Process Control Code Specification in XML (2)

```
<sadg>
  <adg name="simple" levelUpNode="">
    <parameter name="N" lb="450" ub="1000" value="450"/>
    <parameter name="M" lb="275" ub="1000" value="275"/>
    <node name="ND_1" levelUpNode="">
      <inport name="IP1" node="ND_1" edge="ED_0">
        <bindvariable name="in_0" dataType="int"/>
        <domain type="LBS">
          <linearbound dynamicControl="" parameter="N, M">
            <!-- last three matrix columns: N, M, const -->
            <constraint matrix="[0,  0, 1, 0, 0, -2;
                                 1,  1, 0, 0, 0, -2;
                                 1, -1, 0, 0, 1,  0]"/>
          </linearbound>
        </domain>
      </inport>
      <inport name="IP2" node="ND_1" edge="ED_1">
      </inport>
      <outport name="OP1" node="ND_1" edge="ED_1">
      </outport>
      <outport name="OP2" node="ND_1" edge="ED_2">
      </outport>
      <function name="Transformer">
        <inargument name="in_0" dataType="int"/>
        <outargument name="out_0" dataType="int"/>
      </function>
      <domain type="LBS">
        <linearbound dynamicControl="" parameter="N, M">
          <constraint matrix="[1,  1,  0, 0, 0, -2;
                               1, -1,  0, 0, 1,  0;
                               1,  0,  1, 0, 0, -2;
                               1,  0, -1, 1, 0,  0]"/>
        </linearbound>
      </domain>
    </node>
  </adg>
</sadg>
```
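Each row of a `constraint matrix` can be read as an affine constraint on the loop iterators and parameters. The sketch below assumes the column order [flag, i, j, N, M, const], with flag 0 meaning "= 0" and flag 1 meaning ">= 0". This reading is inferred from the loop bounds of ND_1 and is not taken from the Daedalus documentation; `Row`, `in_domain`, `kNodeDomain`, and `kIp1Domain` are hypothetical names.

```cpp
#include <cassert>
#include <vector>

// One matrix row under the assumed column order [flag, i, j, N, M, const].
// Assumed meaning: flag == 0 -> expression == 0, flag == 1 -> expression >= 0.
struct Row {
    int flag;
    int ci;   // coefficient of iterator i
    int cj;   // coefficient of iterator j
    int cN;   // coefficient of parameter N
    int cM;   // coefficient of parameter M
    int c0;   // constant term
};

// Returns true when the point (i, j) satisfies every row for the given N, M.
bool in_domain(const std::vector<Row>& rows, int i, int j, int N, int M) {
    for (const Row& r : rows) {
        const long value = 1L * r.ci * i + 1L * r.cj * j +
                           1L * r.cN * N + 1L * r.cM * M + r.c0;
        const bool ok = (r.flag == 0) ? (value == 0) : (value >= 0);
        if (!ok) return false;
    }
    return true;
}

// Node domain of ND_1, copied from the listing: 2 <= i <= M, 2 <= j <= N.
const std::vector<Row> kNodeDomain = {
    {1,  1,  0, 0, 0, -2},
    {1, -1,  0, 0, 1,  0},
    {1,  0,  1, 0, 0, -2},
    {1,  0, -1, 1, 0,  0},
};

// Domain of input port IP1, copied from the listing: j - 2 = 0, 2 <= i <= M.
const std::vector<Row> kIp1Domain = {
    {0,  0, 1, 0, 0, -2},
    {1,  1, 0, 0, 0, -2},
    {1, -1, 0, 0, 1,  0},
};
```

With N = 450 and M = 275, `in_domain(kIp1Domain, 100, 2, 450, 275)` holds while `in_domain(kIp1Domain, 100, 3, 450, 275)` does not, so under this reading IP1 is read only in the first inner iteration of each row.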

#### Schedule Specification in XML



```
<?xml version="1.0"?>
<sadg>
  <adg name="simple" levelUpNode="">
  </adg>
  <ast>
    <for iterator="c0" LB="2" UB="1*M+1" stride="1">
      <for iterator="c1" LB="2" UB="1*N+1" stride="1">
        <stmt node="ND_0"/>
        <if LHS="1*c0" RHS="1*M" sign="-1">
          <if LHS="1*c1" RHS="1*N" sign="-1">
            <stmt node="ND_1"/>
          </if>
          <if LHS="1*c1" RHS="1*N" sign="0">
            <stmt node="ND_2"/>
          </if>
        </if>
      </for>
    </for>
  </ast>
</sadg>
```

- The schedule gives a deadlock-free execution order of the processes
  - Either with the absolute minimum FIFO buffer sizes that guarantee deadlock-free execution
  - Or with the minimum FIFO buffer sizes that guarantee maximum performance
- The schedule is represented as an abstract syntax tree in XML (the `<ast>` tag)
- The `<ast>` can be converted to a control program implementing the schedule

# Discussion (1)

- Benchmark application suite
  - Do we need more applications?
    - Currently, we have 8 applications
  - Do we need more diverse applications?
    - Currently, we have only streaming (data-flow) applications where the parallel specs use the KPN or SDF MoC
    - What about control-oriented applications?
    - What about more dynamic/adaptive applications?
    - What about applications in other MoCs?
  - Do we accept into the benchmark suite application specs that are not fully functional, or parallel specs for which no simulation/execution engine is available?
  - Do we need a common format for the applications?
    - Currently we have different XML and C/C++ formats
    - How difficult is it to port an application to your specific format?

# Discussion (2)

- How to use the benchmark application suite to
  - Better understand the design flows and tools of others
    - Port the benchmark suite to your tool-specific format
    - Make your tools, together with the ported benchmark suite, available to others
  - Present experimental results
    - Make available your tools and experimental setup such that others can reproduce the experimental results
  - Compare qualitatively design flows and tools
    - Use the suite, map it onto your MPSoC and present the best results you can get
      - The results will depend on the MPSoC and the quality of the tools
    - We define/agree on a common platform and map the suite onto this platform
      - The result will depend on the quality of the tools

# Thank you

All benchmark applications can be found at:

http://www.artist-embedded.org/artist/Benchmarks.html