Commit 7065cbe

Add HeCBench post
1 parent bcb8ce7 commit 7065cbe

2 files changed

+69
-0
lines changed

Lines changed: 69 additions & 0 deletions
@@ -0,0 +1,69 @@
1+
---
2+
layout: single
3+
title: "Benchmarking hipSYCL with HeCBench on AMD hardware"
4+
date: 2022-07-20 20:00:00 +0100
5+
categories: hipsycl amd hecbench
6+
---
7+
8+
# HeCBench
9+
10+
[HeCBench](https://github.com/zjin-lcf/hecbench) is a large benchmark collection, gathered from various sources, that provides applications implemented in various programming models. Because it includes SYCL ports, it is well suited to evaluating hipSYCL: the performance of the SYCL version built with hipSYCL can be compared against the native programming models. In this blog post, we compare hipSYCL performance with native HIP performance on an AMD Radeon Pro VII.
11+
12+
# Benchmark selection
13+
14+
HeCBench contains over 280 benchmarks in total, so evaluating all of them is very time-consuming. Some of them don't run with hipSYCL yet, e.g. because they rely on DPC++-specific extensions or non-standard SYCL behavior (more details on these issues can be found in [this paper](https://dl.acm.org/doi/10.1145/3529538.3530005)), but the majority work. To keep the effort manageable, we select the first ~30 benchmarks in alphabetical order that work with hipSYCL. Additionally, we include four benchmarks for which we already had data from prior work: XSBench, RSBench, md5hash and nbody.
15+
16+
Applying these criteria, we selected the following applications:
17+
```
aligned-types
amgmk
aobench
asta
atomicCAS
atomicIntrinsics
atomicReduction
attention
babelstream
bezier-surface
binomial
bitonic-sort
bsearch
bspline-vgh
ccsd-trpdrv
clenergy
convolutionSeparable
crc64
damage
dp
dslash
expdist
extend2
extrema
fft
filter
floydwarshall
fpc
gamma-correction
XSBench
RSBench
md5hash
nbody
```
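
The selection rule described above (take the first working benchmarks in alphabetical order, then append the ones with existing data) can be sketched roughly as follows. This is only an illustration: the function name, the `works_with_hipsycl` predicate, and the case-insensitive ordering are assumptions and not part of HeCBench or of the scripts used for this post.

```python
def select_benchmarks(all_names, works_with_hipsycl, limit=30, extra=()):
    """Return the first `limit` working benchmarks alphabetically, plus extras.

    `works_with_hipsycl` is a hypothetical predicate (name -> bool); in
    practice this information comes from trying to build and run each port.
    """
    # Assumption: alphabetical order is taken case-insensitively.
    working = sorted(
        (name for name in all_names if works_with_hipsycl(name)),
        key=str.lower,
    )
    selected = working[:limit]
    # Append benchmarks with prior data, skipping any already selected.
    for name in extra:
        if name not in selected:
            selected.append(name)
    return selected
```

With `limit=30` and `extra=["XSBench", "RSBench", "md5hash", "nbody"]`, a call like this would produce a list of the same shape as the one above.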
52+
53+
Some of these applications are functional tests rather than benchmarks (e.g. `aligned-types`), some are memory-bound (e.g. `babelstream`), and others are compute-bound (e.g. `fft`). We therefore have a good mixture of use cases at hand that is hopefully representative of common real-world scenarios.
54+
55+
56+
# Results
57+
58+
The plot below shows the performance of hipSYCL relative to native HIP. Some applications report more than one result, in which case all of them are shown for that application. This is most prominently the case for BabelStream.
59+
Where an application did not report performance results itself (e.g. some functional tests), we measured the wall time of the application run instead. The vertical red lines mark the band of performance parity within 20%.
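
For applications that report no figure of their own, a wall-time measurement could look roughly like the sketch below. The helper name, the number of repetitions, and taking the minimum over runs are assumptions for illustration; the post does not state how the wall time was actually collected.

```python
import subprocess
import sys
import time


def measure_wall_time(command, repetitions=3):
    """Run `command` several times and return the shortest wall time in seconds.

    Assumption: the minimum over a few runs is used to reduce noise; the
    original measurements may have been taken differently.
    """
    timings = []
    for _ in range(repetitions):
        start = time.perf_counter()
        # Discard program output; only the elapsed time matters here.
        subprocess.run(
            command,
            check=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        timings.append(time.perf_counter() - start)
    return min(timings)


if __name__ == "__main__":
    # Placeholder command; substitute the benchmark binary and its arguments.
    print(measure_wall_time([sys.executable, "-c", "pass"]))
```

Note that a wall time measured this way includes process startup and runtime initialization, which can dominate for very short functional tests.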
60+
61+
As can be seen, the vast majority of applications perform within 20% of native HIP. The applications that perform worse are almost exclusively ones that are not primarily geared towards performance measurement, such as aligned/unaligned copy microbenchmarks or functional tests.
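
To make the 20% band concrete, the sketch below computes hipSYCL performance relative to HIP and checks whether it falls inside the parity band. The function names, the `higher_is_better` flag, and reading "within 20%" as a ratio between 0.8 and 1.2 are assumptions for illustration; the post does not spell out how the band was computed.

```python
def relative_performance(hipsycl_value, hip_value, higher_is_better):
    """Return hipSYCL performance relative to HIP (values > 1 favor hipSYCL).

    For throughput-like metrics (e.g. bandwidth), higher is better and the
    ratio is hipSYCL / HIP. For time-like metrics (e.g. wall time), lower is
    better and the ratio is inverted to HIP / hipSYCL.
    """
    if hipsycl_value <= 0 or hip_value <= 0:
        raise ValueError("measurements must be positive")
    if higher_is_better:
        return hipsycl_value / hip_value
    return hip_value / hipsycl_value


def within_parity(ratio, tolerance=0.20):
    """Assumed definition of parity: ratio lies in [1 - tol, 1 + tol]."""
    return 1.0 - tolerance <= ratio <= 1.0 + tolerance
```

For example, a hypothetical run that takes 2.0 s with hipSYCL and 4.0 s with HIP yields a relative performance of 2.0, which lies outside the parity band in hipSYCL's favor.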
62+
63+
On the other hand, there are also numerous cases where hipSYCL substantially outperforms HIP, such as `aobench` at almost twice the performance, and some CAS tests with an even higher relative performance. In fact, the CAS tests for an atomic maximum implementation outperform HIP by over 20x and are omitted from the plot to keep the axis range reasonable.
64+
65+
![relative HeCBench performance between hipSYCL and HIP](/assets/images/hipsycl-relative-perf.png)
66+
67+
# Conclusion
68+
69+
For the HeCBench applications on the investigated AMD hardware, hipSYCL reliably delivers good performance. While there are a few cases where HIP outperforms hipSYCL, there are also cases where hipSYCL substantially outperforms HIP.