Sunday, 15 July 2012

Is Threadfence Needed for CUDA Volatile Variables?




The volatile qualifier forces each shared/global memory read or write to go straight to shared/global memory. Does it automatically accomplish what __threadfence() does? For example:

volatile __shared__ int s; s = 2; s = 10;

Is there then no need for a __threadfence() between "s = 2" and "s = 10"?

Can I say that with a volatile variable, __threadfence() is not needed? If not, can you give an example?

For a volatile variable in shared memory defined like this:

volatile __shared__ int s;

any access by other threads in the thread block after execution of the next line:

s = 2;

will see s containing 2, assuming there are no further updates to s. However, volatile does not cause any sort of barrier. __threadfence() and its derivatives are fences that act on the calling thread: the thread does not proceed beyond the fence until its updates to shared memory and global memory (for __threadfence()) are guaranteed to be visible to other threads.
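One case where volatile alone is not enough is a hand-off through two separate memory locations. The sketch below is an illustration under stated assumptions, not code from the original answer: the names payload, ready, and handoff are hypothetical, and the spin loop assumes both single-thread blocks are resident on the device at the same time. The volatile qualifier makes each access go to memory, while __threadfence() orders the payload write before the flag write as seen by other threads.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical names (payload, ready, handoff), for illustration only.
__device__ volatile int payload = 0;
__device__ volatile int ready   = 0;

__global__ void handoff(int *result)
{
    if (blockIdx.x == 0) {
        // Producer block.
        payload = 42;      // volatile: the store is issued to memory
        __threadfence();   // payload store is ordered before the flag store, device-wide
        ready = 1;
    } else {
        // Consumer block. Assumes the producer block is resident concurrently.
        while (ready == 0) { }  // volatile: the flag is re-read from memory each pass
        __threadfence();        // keep the payload load ordered after the flag load
        *result = payload;
    }
}

int main()
{
    int *d_result = nullptr;
    int h_result = -1;

    if (cudaMalloc(&d_result, sizeof(int)) != cudaSuccess) return 1;
    handoff<<<2, 1>>>(d_result);   // two blocks of one thread each
    cudaMemcpy(&h_result, d_result, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_result);

    printf("consumer read %d\n", h_result);
    return 0;
}
```

Without the fences, volatile would still force every access to memory, but nothing would order the two stores relative to each other from the consumer's point of view, so the consumer could observe ready == 1 and still read a stale payload.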

However, with the next sequence:

s = 2; s = 10;

there is no guarantee of what other threads will see (except in the warp-synchronous case, and even then subject to further scenario description that you have not provided), except that they will see either 2 or 10 (and again, assuming there are no further updates to s).
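If the other threads need a definite value rather than "either 2 or 10", one option is a block-wide barrier after the writes. The following is a minimal sketch, assuming a single thread block and using __syncthreads(), which the answer above does not mention. The kernel name observe and the seen buffer are hypothetical.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel name and output buffer, for illustration only.
__global__ void observe(int *seen)
{
    volatile __shared__ int s;

    if (threadIdx.x == 0) {
        s = 2;    // a thread reading at this point could see 2 ...
        s = 10;   // ... or 10, depending on timing
    }
    __syncthreads();        // block-wide barrier: both writes are complete and visible

    seen[threadIdx.x] = s;  // every thread in the block reads 10 here
}

int main()
{
    const int n = 64;
    int h_seen[n];
    int *d_seen = nullptr;

    if (cudaMalloc(&d_seen, n * sizeof(int)) != cudaSuccess) return 1;
    observe<<<1, n>>>(d_seen);   // one block, so __syncthreads() covers all threads
    cudaMemcpy(h_seen, d_seen, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_seen);

    // Count how many threads observed the final value.
    int tens = 0;
    for (int i = 0; i < n; ++i) tens += (h_seen[i] == 10);
    printf("%d of %d threads read 10\n", tens, n);
    return 0;
}
```

Removing the __syncthreads() call brings back the situation described above: a thread that reads s between the two stores may see 2, and a thread that reads before either store sees an uninitialized value.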

