Report
| Implementation | Elapsed Time (ms) | MFLOP/s | Memory Bandwidth (GB/s) |
|---|---|---|---|
| CPU Vector Addition | xx | xx | xx |
| CUDA 1 thread, 1 thread block | 1,203.31 | 425.59 | 5.35 |
| CUDA 256 threads, 1 thread block | 1,212.36 | 422.76 | 5.31 |
| CUDA 256 threads/block, many thread blocks | 1,232.73 | 415.16 | 5.24 |
| CUDA 256 threads/block, many blocks, prefetching | 4.77 | 112,591.01 | 1,349.96 |
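The MFLOP/s and bandwidth columns can be derived from the elapsed time once the problem size is known. The sketch below shows one plausible way to do that; it is not necessarily how the numbers above were produced. It assumes a vector addition `c[i] = a[i] + b[i]` that performs one floating-point operation per element and touches three single-precision arrays (two reads, one write). The function name `throughput` and all of its parameters are hypothetical, and the report does not state the array length, so `n_elements` must be supplied by the reader.

```python
def throughput(n_elements, elapsed_ms, flops_per_element=1,
               arrays_touched=3, bytes_per_element=4):
    """Return (MFLOP/s, GB/s) for a streaming kernel.

    Assumptions (not taken from the report):
      - one FLOP per element (a single add),
      - three arrays moved through memory (a, b read; c written),
      - 4-byte float elements,
      - decimal units: 1 MFLOP = 1e6 FLOP, 1 GB = 1e9 bytes.
    """
    seconds = elapsed_ms / 1e3
    total_flops = n_elements * flops_per_element
    total_bytes = n_elements * arrays_touched * bytes_per_element
    mflops_per_s = total_flops / seconds / 1e6
    gbytes_per_s = total_bytes / seconds / 1e9
    return mflops_per_s, gbytes_per_s


def speedup(baseline_ms, optimized_ms):
    """Ratio of two elapsed times; values above 1 mean the optimized run is faster."""
    return baseline_ms / optimized_ms
```

Calling `throughput(n, t_ms)` with the real array length and a measured time gives both derived columns for one row, and `speedup` compares any two rows of the Elapsed Time column directly.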