CuPerf
Active Development • Jan 2026 - Present
A modern, extensible command-line tool for benchmarking NVIDIA CUDA GPUs.
Provides accurate, reproducible measurements of memory bandwidth, compute throughput, tensor core performance, kernel launch overhead, and reduction performance.
Supports FP32, FP16, BF16, INT8, and FP4 data types, reports comprehensive statistics, and writes results as console output, JSON, or CSV.
Technologies: CUDA, C++, Parallel Computing, Profiling
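
Illustrative sketch (not CuPerf's actual code): the statistics and bandwidth reporting described above could be computed from a set of per-iteration timings roughly as follows. The names Summary, summarize, and bandwidth_gbps are hypothetical, and the choice of sample standard deviation and decimal gigabytes is an assumption. The timings themselves would come from the GPU (for example from CUDA event timers); here they are plain host-side numbers.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical summary record; the field names are illustrative only.
struct Summary {
    double mean;
    double median;
    double stddev;  // sample standard deviation (n - 1 denominator)
    double min;
    double max;
};

// Summarize per-iteration timings in milliseconds.
// The vector is taken by value so the caller's data is not reordered.
Summary summarize(std::vector<double> samples_ms) {
    if (samples_ms.empty()) {
        return {0.0, 0.0, 0.0, 0.0, 0.0};
    }
    std::sort(samples_ms.begin(), samples_ms.end());
    const std::size_t n = samples_ms.size();

    double sum = 0.0;
    for (double s : samples_ms) sum += s;
    const double mean = sum / static_cast<double>(n);

    double sq_dev = 0.0;
    for (double s : samples_ms) sq_dev += (s - mean) * (s - mean);
    const double stddev =
        n > 1 ? std::sqrt(sq_dev / static_cast<double>(n - 1)) : 0.0;

    // Median: middle element, or the average of the two middle elements.
    const double median =
        (n % 2 == 1) ? samples_ms[n / 2]
                     : 0.5 * (samples_ms[n / 2 - 1] + samples_ms[n / 2]);

    return {mean, median, stddev, samples_ms.front(), samples_ms.back()};
}

// Effective bandwidth in GB/s (1 GB = 1e9 bytes): bytes moved / elapsed time.
double bandwidth_gbps(double bytes_moved, double elapsed_ms) {
    assert(elapsed_ms > 0.0);
    return bytes_moved / (elapsed_ms * 1e-3) / 1e9;
}
```

Reporting the median alongside the mean is a common choice for GPU benchmarks because occasional slow iterations (clock ramp-up, other processes) skew the mean more than the median.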