API
__syncthreads()
: A thread’s execution can only proceed past a __syncthreads() after all threads in its block have executed the __syncthreads()__syncwarp()
is an intrinsic function that provides a warp-level synchronization point within a warp. A warp is a group of threads in CUDA that execute instructions in lockstep.
Tool
cmd
- nvidia-smi