# Profiler Outputs
CPU-Only
---
985.325 ms

CUDA: 1 Thread, 1 Block
---
```
uzylol@nid001133:/pscratch/sd/u/uzylol/cuda_vecadd> ./helper.sh vecadd_gpu_1t
Choose an option:
1. Build and profile
2. Clean
Enter your choice (1 or 2): 1
Built executable: vecadd_gpu_1t
WARNING: vecadd_gpu_1t and any of its children processes will be profiled.
Max error: 0
Generating '/tmp/nsys-report-135f.qdstrm'
[1/7] [========================100%] report1.nsys-rep
[2/7] [========================100%] report1.sqlite
[3/7] Executing 'nvtx_sum' stats report
SKIPPED: /pscratch/sd/u/uzylol/cuda_vecadd/report1.sqlite does not contain NV Tools Extension (NVTX) data.
[4/7] Executing 'cuda_api_sum' stats report
Time (%)  Total Time (ns)  Num Calls          Avg (ns)          Med (ns)        Min (ns)        Max (ns)    StdDev (ns)  Name
--------  ---------------  ---------  ----------------  ----------------  --------------  --------------  -------------  ----------------------
    98.5   50,918,539,217          1  50,918,539,217.0  50,918,539,217.0  50,918,539,217  50,918,539,217            0.0  cudaDeviceSynchronize
     0.6      335,502,692          2     167,751,346.0     167,751,346.0          70,647     335,432,045  237,136,318.7  cudaMallocManaged
     0.5      247,516,551          1     247,516,551.0     247,516,551.0     247,516,551     247,516,551            0.0  cudaLaunchKernel
     0.4      198,199,002          2      99,099,501.0      99,099,501.0      68,308,537     129,890,465   43,544,998.9  cudaFree
     0.0            1,293          1            1,293.0            1,293.0           1,293           1,293            0.0  cuModuleGetLoadingMode
[5/7] Executing 'cuda_gpu_kern_sum' stats report
Time (%)  Total Time (ns)  Instances          Avg (ns)          Med (ns)        Min (ns)        Max (ns)  StdDev (ns)  Name
--------  ---------------  ---------  ----------------  ----------------  --------------  --------------  -----------  --------------------------
   100.0   50,918,525,966          1  50,918,525,966.0  50,918,525,966.0  50,918,525,966  50,918,525,966          0.0  add(int, float *, float *)
[6/7] Executing 'cuda_gpu_mem_time_sum' stats report
Time (%)  Total Time (ns)   Count  Avg (ns)  Med (ns)  Min (ns)  Max (ns)  StdDev (ns)  Operation
--------  ---------------  ------  --------  --------  --------  --------  -----------  ------------------------------------
    65.2      206,887,363  24,576   8,418.3   3,647.5     1,982    41,312     11,229.0  [CUDA memcpy Unified Host-to-Device]
    34.8      110,515,732  12,288   8,993.8   3,135.5     1,726    48,416     12,346.7  [CUDA memcpy Unified Device-to-Host]
[7/7] Executing 'cuda_gpu_mem_size_sum' stats report
Total (MB)   Count  Avg (MB)  Med (MB)  Min (MB)  Max (MB)  StdDev (MB)  Operation
----------  ------  --------  --------  --------  --------  -----------  ------------------------------------
 4,294.967  24,576     0.175     0.033     0.004     1.044        0.301  [CUDA memcpy Unified Host-to-Device]
 2,147.484  12,288     0.175     0.033     0.004     1.044        0.301  [CUDA memcpy Unified Device-to-Host]
Generated:
/pscratch/sd/u/uzylol/cuda_vecadd/report1.nsys-rep
/pscratch/sd/u/uzylol/cuda_vecadd/report1.sqlite
```
CUDA: 256 Threads, One Block
---
```
uzylol@nid001133:/pscratch/sd/u/uzylol/cuda_vecadd> ./helper.sh vecadd_gpu_256t
Choose an option:
1. Build and profile
2. Clean
Enter your choice (1 or 2): 1
Built executable: vecadd_gpu_256t
WARNING: vecadd_gpu_256t and any of its children processes will be profiled.
Max error: 0
Generating '/tmp/nsys-report-a2a6.qdstrm'
[1/7] [========================100%] report1.nsys-rep
[2/7] [========================100%] report1.sqlite
[3/7] Executing 'nvtx_sum' stats report
SKIPPED: /pscratch/sd/u/uzylol/cuda_vecadd/report1.sqlite does not contain NV Tools Extension (NVTX) data.
[4/7] Executing 'cuda_api_sum' stats report
Time (%)  Total Time (ns)  Num Calls         Avg (ns)         Med (ns)       Min (ns)       Max (ns)    StdDev (ns)  Name
--------  ---------------  ---------  ---------------  ---------------  -------------  -------------  -------------  ----------------------
    71.3    1,689,772,282          1  1,689,772,282.0  1,689,772,282.0  1,689,772,282  1,689,772,282            0.0  cudaDeviceSynchronize
    13.2      312,348,647          2    156,174,323.5    156,174,323.5         66,999    312,281,648  220,769,095.5  cudaMallocManaged
     8.4      200,018,555          2    100,009,277.5    100,009,277.5     70,954,629    129,063,926   41,089,478.0  cudaFree
     7.0      166,925,228          1    166,925,228.0    166,925,228.0    166,925,228    166,925,228            0.0  cudaLaunchKernel
     0.0            1,031          1          1,031.0          1,031.0          1,031          1,031            0.0  cuModuleGetLoadingMode
[5/7] Executing 'cuda_gpu_kern_sum' stats report
Time (%)  Total Time (ns)  Instances         Avg (ns)         Med (ns)       Min (ns)       Max (ns)  StdDev (ns)  Name
--------  ---------------  ---------  ---------------  ---------------  -------------  -------------  -----------  --------------------------
   100.0    1,689,768,383          1  1,689,768,383.0  1,689,768,383.0  1,689,768,383  1,689,768,383          0.0  add(int, float *, float *)
[6/7] Executing 'cuda_gpu_mem_time_sum' stats report
Time (%)  Total Time (ns)   Count  Avg (ns)  Med (ns)  Min (ns)  Max (ns)  StdDev (ns)  Operation
--------  ---------------  ------  --------  --------  --------  --------  -----------  ------------------------------------
    65.4      208,609,295  24,576   8,488.3   3,920.5     1,982    41,568     11,229.4  [CUDA memcpy Unified Host-to-Device]
    34.6      110,590,888  12,288   8,999.9   3,167.5     1,726    48,384     12,351.0  [CUDA memcpy Unified Device-to-Host]
[7/7] Executing 'cuda_gpu_mem_size_sum' stats report
Total (MB)   Count  Avg (MB)  Med (MB)  Min (MB)  Max (MB)  StdDev (MB)  Operation
----------  ------  --------  --------  --------  --------  -----------  ------------------------------------
 4,294.967  24,576     0.175     0.033     0.004     1.044        0.301  [CUDA memcpy Unified Host-to-Device]
 2,147.484  12,288     0.175     0.033     0.004     1.044        0.301  [CUDA memcpy Unified Device-to-Host]
Generated:
/pscratch/sd/u/uzylol/cuda_vecadd/report1.nsys-rep
/pscratch/sd/u/uzylol/cuda_vecadd/report1.sqlite
```
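
To put the two single-block runs side by side, the kernel totals from the `cuda_gpu_kern_sum` reports above can be compared directly. The following is a minimal sketch in Python, not part of the project code: the variable names are made up for illustration, and reading the 985.325 ms CPU-only figure as the elapsed time of the same vector add is an assumption.

```python
# Kernel totals copied from the cuda_gpu_kern_sum reports above (nanoseconds).
kernel_ns_1_thread = 50_918_525_966
kernel_ns_256_threads = 1_689_768_383

# Assumption: the CPU-only figure is the elapsed time of the same vector add.
cpu_ms = 985.325

# How much faster the 256-thread, one-block kernel is than the 1-thread kernel.
speedup = kernel_ns_1_thread / kernel_ns_256_threads
print(f"256 threads vs 1 thread: {speedup:.1f}x faster")

# Under the assumption above, a single block of 256 threads is still slower
# than the CPU-only run.
kernel_ms_256_threads = kernel_ns_256_threads / 1e6
print(f"256-thread kernel: {kernel_ms_256_threads:.1f} ms, CPU-only: {cpu_ms} ms")
```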
CUDA: 256 Threads, Many Blocks
---
```
uzylol@nid001133:/pscratch/sd/u/uzylol/cuda_vecadd> ./helper.sh vecadd_gpu_256t_mb
Choose an option:
1. Build and profile
2. Clean
Enter your choice (1 or 2): 1
Built executable: vecadd_gpu_256t_mb
WARNING: vecadd_gpu_256t_mb and any of its children processes will be profiled.
Number of thread blocks: 2097152
Max error: 0
Generating '/tmp/nsys-report-cbad.qdstrm'
[1/7] [========================100%] report1.nsys-rep
[2/7] [========================100%] report1.sqlite
[3/7] Executing 'nvtx_sum' stats report
SKIPPED: /pscratch/sd/u/uzylol/cuda_vecadd/report1.sqlite does not contain NV Tools Extension (NVTX) data.
[4/7] Executing 'cuda_api_sum' stats report
Time (%)  Total Time (ns)  Num Calls         Avg (ns)         Med (ns)       Min (ns)       Max (ns)    StdDev (ns)  Name
--------  ---------------  ---------  ---------------  ---------------  -------------  -------------  -------------  ----------------------
    68.5    1,219,130,474          1  1,219,130,474.0  1,219,130,474.0  1,219,130,474  1,219,130,474            0.0  cudaDeviceSynchronize
    17.8      316,925,924          2    158,462,962.0    158,462,962.0         55,287    316,870,637  224,022,282.4  cudaMallocManaged
    11.4      203,677,016          2    101,838,508.0    101,838,508.0     68,914,217    134,762,799   46,561,978.9  cudaFree
     2.3       40,488,959          1     40,488,959.0     40,488,959.0     40,488,959     40,488,959            0.0  cudaLaunchKernel
     0.0            1,072          1          1,072.0          1,072.0          1,072          1,072            0.0  cuModuleGetLoadingMode
[5/7] Executing 'cuda_gpu_kern_sum' stats report
Time (%)  Total Time (ns)  Instances         Avg (ns)         Med (ns)       Min (ns)       Max (ns)  StdDev (ns)  Name
--------  ---------------  ---------  ---------------  ---------------  -------------  -------------  -----------  --------------------------
   100.0    1,219,123,989          1  1,219,123,989.0  1,219,123,989.0  1,219,123,989  1,219,123,989          0.0  add(int, float *, float *)
[6/7] Executing 'cuda_gpu_mem_time_sum' stats report
Time (%)  Total Time (ns)    Count  Avg (ns)  Med (ns)  Min (ns)  Max (ns)  StdDev (ns)  Operation
--------  ---------------  -------  --------  --------  --------  --------  -----------  ------------------------------------
    80.3      448,633,690  155,241   2,889.9   2,175.0     1,663    63,231      3,651.9  [CUDA memcpy Unified Host-to-Device]
    19.7      109,875,418   12,286   8,943.1   3,214.5     1,726    48,352     12,278.7  [CUDA memcpy Unified Device-to-Host]
[7/7] Executing 'cuda_gpu_mem_size_sum' stats report
Total (MB)    Count  Avg (MB)  Med (MB)  Min (MB)  Max (MB)  StdDev (MB)  Operation
----------  -------  --------  --------  --------  --------  -----------  ------------------------------------
 3,996.164  155,241     0.026     0.008     0.004     1.044        0.097  [CUDA memcpy Unified Host-to-Device]
 2,147.418   12,286     0.175     0.033     0.004     1.044        0.301  [CUDA memcpy Unified Device-to-Host]
Generated:
/pscratch/sd/u/uzylol/cuda_vecadd/report1.nsys-rep
/pscratch/sd/u/uzylol/cuda_vecadd/report1.sqlite
```
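
The one-block and many-block runs move a similar amount of data host-to-device, but in very different chunk sizes. The sketch below recomputes the average transfer size from the totals and counts in the `cuda_gpu_mem_size_sum` reports above, as a cross-check on the reported `Avg (MB)` column. It assumes the totals are decimal megabytes; the dictionary labels are chosen here for illustration and are not profiler names.

```python
# Host-to-device totals (MB) and transfer counts copied from the reports above.
h2d = {
    "256 threads, one block": (4294.967, 24_576),
    "256 threads, many blocks": (3996.164, 155_241),
}

# Average transfer size = total size / number of transfers.
avg_mb = {}
for label, (total_mb, count) in h2d.items():
    avg_mb[label] = total_mb / count
    print(f"{label}: {count} transfers, about {avg_mb[label] * 1000:.1f} KB each")
```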
CUDA: 256 Threads, Many Blocks with Prefetch
---
```
uzylol@nid001133:/pscratch/sd/u/uzylol/cuda_vecadd> ./helper.sh vecadd_gpu_256t_mb_prefetch
Choose an option:
1. Build and profile
2. Clean
Enter your choice (1 or 2): 1
Built executable: vecadd_gpu_256t_mb_prefetch
WARNING: vecadd_gpu_256t_mb_prefetch and any of its children processes will be profiled.
Number of thread blocks: 2097152
Max error: 0
Generating '/tmp/nsys-report-3752.qdstrm'
[1/7] [========================100%] report1.nsys-rep
[2/7] [========================100%] report1.sqlite
[3/7] Executing 'nvtx_sum' stats report
SKIPPED: /pscratch/sd/u/uzylol/cuda_vecadd/report1.sqlite does not contain NV Tools Extension (NVTX) data.
[4/7] Executing 'cuda_api_sum' stats report
Time (%)  Total Time (ns)  Num Calls       Avg (ns)       Med (ns)     Min (ns)     Max (ns)    StdDev (ns)  Name
--------  ---------------  ---------  -------------  -------------  -----------  -----------  -------------  ----------------------
    43.5      329,908,462          2  164,954,231.0  164,954,231.0       50,698  329,857,764  233,208,812.9  cudaMallocManaged
    25.1      190,030,072          2   95,015,036.0   95,015,036.0   62,035,958  127,994,114   46,639,459.4  cudaFree
    17.2      130,653,654          1  130,653,654.0  130,653,654.0  130,653,654  130,653,654            0.0  cudaLaunchKernel
    13.5      102,455,160          2   51,227,580.0   51,227,580.0      377,189  102,077,971   71,913,312.6  cudaMemPrefetchAsync
     0.6        4,773,525          1    4,773,525.0    4,773,525.0    4,773,525    4,773,525            0.0  cudaDeviceSynchronize
     0.0            1,312          1        1,312.0        1,312.0        1,312        1,312            0.0  cuModuleGetLoadingMode
[5/7] Executing 'cuda_gpu_kern_sum' stats report
Time (%)  Total Time (ns)  Instances     Avg (ns)     Med (ns)   Min (ns)   Max (ns)  StdDev (ns)  Name
--------  ---------------  ---------  -----------  -----------  ---------  ---------  -----------  --------------------------
   100.0        4,771,134          1  4,771,134.0  4,771,134.0  4,771,134  4,771,134          0.0  add(int, float *, float *)
[6/7] Executing 'cuda_gpu_mem_time_sum' stats report
Time (%)  Total Time (ns)   Count  Avg (ns)  Med (ns)  Min (ns)  Max (ns)  StdDev (ns)  Operation
--------  ---------------  ------  --------  --------  --------  --------  -----------  ------------------------------------
    59.9      165,077,448   2,048  80,604.2  80,576.0    80,511    81,055         68.8  [CUDA memcpy Unified Host-to-Device]
    40.1      110,650,214  12,288   9,004.7   3,135.5     1,726    48,352     12,355.0  [CUDA memcpy Unified Device-to-Host]
[7/7] Executing 'cuda_gpu_mem_size_sum' stats report
Total (MB)   Count  Avg (MB)  Med (MB)  Min (MB)  Max (MB)  StdDev (MB)  Operation
----------  ------  --------  --------  --------  --------  -----------  ------------------------------------
 4,294.967   2,048     2.097     2.097     2.097     2.097        0.000  [CUDA memcpy Unified Host-to-Device]
 2,147.484  12,288     0.175     0.033     0.004     1.044        0.301  [CUDA memcpy Unified Device-to-Host]
Generated:
/pscratch/sd/u/uzylol/cuda_vecadd/report1.nsys-rep
/pscratch/sd/u/uzylol/cuda_vecadd/report1.sqlite
```
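
The prefetch run's kernel time is far smaller than the many-blocks run without prefetch, but part of that gap is work that moved out of the kernel rather than disappeared. Without prefetching, managed pages migrate on demand while the kernel is running, so that cost is inside the kernel time. The sketch below gives both views of the speedup. It is a rough estimate, not a measurement: it assumes the prefetch copies do not overlap with the kernel, and the variable names are illustrative.

```python
# Values copied from the reports above (nanoseconds).
kernel_ns_on_demand = 1_219_123_989  # 256 threads, many blocks, no prefetch
kernel_ns_prefetch = 4_771_134       # same launch after cudaMemPrefetchAsync
h2d_ns_prefetch = 165_077_448        # host-to-device copy time in the prefetch run

# Kernel-only view: compares the two add() kernel durations directly.
kernel_speedup = kernel_ns_on_demand / kernel_ns_prefetch

# Assumption: the prefetch copies do not overlap with the kernel, so their time
# is added to the kernel time for a like-for-like comparison.
prefetch_total_ns = kernel_ns_prefetch + h2d_ns_prefetch
combined_speedup = kernel_ns_on_demand / prefetch_total_ns

print(f"kernel only:         {kernel_speedup:.1f}x")
print(f"kernel + H2D copies: {combined_speedup:.1f}x")
```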
## Prefetch Thread Count
```
uzylol@nid001013:/pscratch/sd/u/uzylol/cuda_vecadd> compute-sanitizer vecadd_gpu_256t_mb_prefetch
========= COMPUTE-SANITIZER
Number of thread blocks: 2097152
Max error: 0
========= ERROR SUMMARY: 0 errors
```
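
The total thread count follows from the block count printed above and the 256 threads per block implied by the executable name. The sketch below works through that arithmetic and checks it against the transfer totals in the earlier reports. It assumes one 4-byte `float` element per thread and two managed arrays, which matches the `add(int, float *, float *)` signature and the two `cudaMallocManaged` calls but is not confirmed by source code in this file.

```python
threads_per_block = 256   # from the executable name vecadd_gpu_256t_mb_prefetch
num_blocks = 2_097_152    # "Number of thread blocks" printed above

total_threads = threads_per_block * num_blocks
print(f"total threads: {total_threads:,}")  # equals 2**29

# Assumption: one 4-byte float per thread, and two arrays of that length.
bytes_per_array = total_threads * 4
print(f"one array:  {bytes_per_array / 1e6:.3f} MB")      # compare with the device-to-host total
print(f"two arrays: {2 * bytes_per_array / 1e6:.3f} MB")  # compare with the host-to-device total
```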