Is Threadfence Needed for CUDA Volatile Variables?
volatile forces each shared/global memory write/read to go straight to shared/global memory. Does it automatically accomplish what __threadfence() does? For example:
volatile __shared__ int s; s = 2; s = 10;
Is there then no need for a __threadfence() between "s = 2" and "s = 10"?
Can I assume that with a volatile variable, __threadfence() is not needed? If not, can you give an example?
For a volatile variable in shared memory defined like this:
volatile __shared__ int s;
any access by other threads in the threadblock after the execution of the next line:
s = 2;
will see s containing 2, assuming there are no further updates to s. volatile does not cause any sort of barrier. __threadfence() and its derivatives act as barriers for the calling thread only: that thread will not proceed beyond the fence until it is guaranteed that its updates to shared memory and global memory (for __threadfence()) are visible to other threads. They do not make threads wait for each other the way __syncthreads() does.
However, in the next sequence:
s = 2; s = 10;
there is no guarantee about what other threads will see (except in the warp-synchronous case, and that is subject to further scenario description which you have not provided), except that they will see either 2 or 10 (and again, assuming there are no further updates to s).
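As for an example where volatile alone is not enough: a minimal sketch, not taken from the original answer, is a hand-off between two warps of the same block. The names handoff, payload, ready and run_handoff are hypothetical, and the launch shape (one block of 64 threads) is an assumption chosen so that producer and consumer sit in different warps. volatile keeps the accesses going to shared memory, while __threadfence_block() is what orders the payload write before the flag write as seen by the other thread.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: thread 0 publishes a value, thread 32 (a different
// warp in the same block) waits for a flag and then reads the value.
__global__ void handoff(int *out)
{
    volatile __shared__ int payload;
    volatile __shared__ int ready;

    if (threadIdx.x == 0) ready = 0;
    __syncthreads();                 // every thread sees ready == 0 first

    if (threadIdx.x == 0) {
        payload = 42;                // volatile: the store is really issued
        __threadfence_block();       // order the payload write before the flag write
        ready = 1;
    } else if (threadIdx.x == 32) {
        while (ready == 0) { }       // volatile: re-read shared memory each iteration
        __threadfence_block();       // order the flag read before the payload read
        *out = payload;
    }
}

// Hypothetical host helper: launches one block of 64 threads and returns
// what the consumer thread read, or -1 if the device could not be used.
int run_handoff()
{
    int *d_out = nullptr;
    int h_out = -1;
    if (cudaMalloc(&d_out, sizeof(int)) != cudaSuccess) return -1;
    cudaMemcpy(d_out, &h_out, sizeof(int), cudaMemcpyHostToDevice);
    handoff<<<1, 64>>>(d_out);
    cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return h_out;
}

int main()
{
    printf("consumer read %d\n", run_handoff());
    return 0;
}
```

Without the fences, volatile would still make each write reach shared memory, but nothing in the language would guarantee that the consumer observes the payload write before the flag write. In the "s = 2; s = 10;" case from the question, a fence between the two writes would likewise only guarantee that 2 is visible before 10 is written; an observer could still see either value depending on when it reads.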