1. An adaptive performance modeling tool for GPU architectures

Sara S. Baghsorkhi et al., PPoPP 2010

  • progress: 40%, link

1.1. 主要工作

  • Designed a model that provides performance information to an auto-tuning compiler (which compiler? unclear), assisting it in pruning the search space. "We introduce an abstract interpretation of a GPU kernel, the work flow graph, based on which we estimate the execution time of a GPU kernel"

  • How the model avoids depending on a specific GPU architecture / high-level programming interface

  • PDG: used for performance evaluation; a framework to represent the control and data dependences of each program operation

  • work flow graph

  • key factors
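The work flow graph idea above can be sketched as a small PDG-like structure: nodes are operation groups weighted by estimated latency, and kernel time is estimated along the critical path. Node names, latencies, and the traversal rule are illustrative assumptions, not the paper's actual model.

```python
# Hypothetical sketch of a "work flow graph": latency-weighted nodes
# with successor edges, estimated by walking the critical path.
from dataclasses import dataclass, field

@dataclass
class WfgNode:
    name: str
    latency: float            # assumed cycle cost of this operation group
    succs: list = field(default_factory=list)

def estimate_path_cycles(node):
    """Estimate cycles along the longest (critical) path from `node`."""
    if not node.succs:
        return node.latency
    return node.latency + max(estimate_path_cycles(s) for s in node.succs)

# Toy kernel: global load -> compute -> store (latencies are made up)
store = WfgNode("store", latency=4)
comp  = WfgNode("compute", latency=24, succs=[store])
load  = WfgNode("load", latency=400, succs=[comp])

print(estimate_path_cycles(load))  # 428
```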

1.2. Term

  • SPMD: Single-Program Multiple-Data

  • SIMD: Single-Instruction, Multiple-Data

  • thread granularities

    NVIDIA: thread-block
    ATI: group
    OpenCL: work-group

    Threads within a thread-block are grouped into warps

  • SM: streaming multiprocessor

  • PDG: program dependence graph

  • WLP: Warp-level parallelism (threads within a thread-block are grouped into warps)

  • DLP: data-level parallelism

  • TLP: Thread-level parallelism

  • ILP: instruction-level parallelism

  • NUMblocks: the number of active thread-blocks on a streaming multiprocessor
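The terms above can be tied together with a toy calculation: warps per thread-block follow from the warp size, and NUMblocks is bounded by per-SM resources. The constants and helper names here are assumptions for illustration (real GPUs also bound NUMblocks by registers and shared memory, not just a thread limit).

```python
# Illustrative helpers (assumed names/constants) relating warp size,
# thread-block size, and active blocks per SM (NUMblocks).
WARP_SIZE = 32                 # NVIDIA warp size
MAX_THREADS_PER_SM = 1024      # assumed per-SM thread limit

def warps_per_block(block_threads):
    """Warps needed for one thread-block (ceiling division)."""
    return -(-block_threads // WARP_SIZE)

def num_blocks(block_threads):
    """Active thread-blocks per SM, limited only by the thread cap here."""
    return MAX_THREADS_PER_SM // block_threads

print(warps_per_block(256))  # 8 warps per thread-block
print(num_blocks(256))       # 4 active blocks per SM
```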

1.3. Performance Model

  • Parallelism at multiple levels
    • kernel level -> warp level -> thread level
graph TD;

	kernel[kernel]
	warp[warp level:<br />GPUs attempt to reduce memory latency by exploiting the data-level parallelism]
	thread[thread level:<br />instruction-level parallelism can still improve performance by partially covering intra-warp stalls]
	
	kernel-->warp
	warp-->thread
	

1.4. Work flow graph

  • What to take into consideration:
    • SIMD pipeline latency, global memory latency
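A minimal sketch of how these two factors might combine (an assumed first-order formula, not the paper's exact model): memory latency is hidden by other warps' computation, so the exposed stall per warp is roughly the memory latency minus the overlapping compute of the remaining active warps.

```python
# Toy latency-hiding estimate (assumed model): per warp-iteration,
# kernel time = SIMD compute time + memory latency not covered by
# the other active warps' computation.
def estimate_cycles(compute_cycles, mem_latency, active_warps):
    overlap = compute_cycles * (active_warps - 1)  # other warps compute
    exposed = max(mem_latency - overlap, 0)        # uncovered stall
    return compute_cycles + exposed

# With 8 active warps, a 400-cycle global load is only partially hidden:
print(estimate_cycles(compute_cycles=40, mem_latency=400, active_warps=8))  # 160
```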

1.5. Measurement

  • Execution time