Sunday, 15 September 2013

gpgpu - OpenACC 2.0 routine: data locality

Take the following code, which illustrates calling a simple routine on the accelerator, compiled for the device using OpenACC 2.0's routine directive:

#include <iostream>

#pragma acc routine
int function(int *array, int multiplier) {
    int sum = 0;
    #pragma acc loop reduction(+:sum)
    for (int i = 0; i < 10; ++i) {
        sum += multiplier * array[i];
    }
    return sum;
}

int main() {
    int *array = new int[10];
    int multiplier = 5;
    int out;
    for (int i = 0; i < 10; i++) {
        array[i] = 1;
    }
    #pragma acc enter data create(out) copyin(array[0:10], multiplier)
    #pragma acc parallel present(out, array[0:10], multiplier)
    if (function(array, multiplier) == 50) {
        out = 1;
    } else {
        out = 0;
    }
    #pragma acc exit data copyout(out) delete(array[0:10], multiplier)
    std::cout << out << std::endl;
}

How does the function know to use the device copies of array[0:10] and multiplier when it is called within the parallel region? How can I enforce the use of the device copies?

When the routine is called within a device region (the parallel region in your code), it is being called by the threads on the device, which means those threads only have access to the arrays present on the device. The compiler may choose to inline the function, or it may generate a device-side function call. Either way, you can know that when the function is called on the device it is receiving the device copies of the data, because the function inherits the present data clauses from the parallel region. If you still want to convince yourself that you're running on the device once inside the function, you can call acc_on_device, which tells you that you're running on the accelerator, though not whether you received a device pointer.
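As a minimal sketch of that check (assuming acc_on_device and acc_device_not_host from openacc.h; the early-return sentinel value is hypothetical, the rest mirrors the routine from the question):

#include <openacc.h>

#pragma acc routine
int function(int *array, int multiplier) {
    // acc_on_device() returns nonzero only when this call is executing on
    // the given device type; it says nothing about whether 'array' is a
    // device pointer.
    if (!acc_on_device(acc_device_not_host)) {
        return -1;   // hypothetical sentinel: we ended up on the host
    }
    int sum = 0;
    #pragma acc loop reduction(+:sum)
    for (int i = 0; i < 10; ++i) {
        sum += multiplier * array[i];
    }
    return sum;
}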

If you want to enforce the use of device copies beyond that, you could make the routine nohost, so that it is technically not valid to call it from the host. That doesn't do exactly what you're asking, which is a check on the GPU that the array is a device array, but it does guarantee that only the device version of the routine exists.
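A minimal sketch of that, using the routine from the question with the nohost clause added:

// With nohost, no host version of the routine is generated, so any call to
// function() from host code becomes a compile/link-time error; only the
// device version exists.
#pragma acc routine nohost
int function(int *array, int multiplier) {
    int sum = 0;
    #pragma acc loop reduction(+:sum)
    for (int i = 0; i < 10; ++i) {
        sum += multiplier * array[i];
    }
    return sum;
}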

Keep in mind, though, that code within a parallel region but not within a loop is run gang-redundantly, so the write to out is a race condition, unless you happen to be running with a single gang or you make the write using an atomic.
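Two ways to deal with that, sketched against the parallel region from the question (these are sketches of those two options, not the only possibilities):

// Option 1: restrict the region to a single gang, so 'out' is written once.
#pragma acc parallel present(out, array[0:10], multiplier) num_gangs(1)
{
    out = (function(array, multiplier) == 50) ? 1 : 0;
}

// Option 2: keep the gang-redundant execution but make the store atomic;
// every gang writes the same value, so there is no longer a data race.
#pragma acc parallel present(out, array[0:10], multiplier)
{
    int result = (function(array, multiplier) == 50) ? 1 : 0;
    #pragma acc atomic write
    out = result;
}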

gpgpu openacc
