gpgpu - OpenACC 2.0 routine: data locality
Take the following code, which illustrates the calling of a simple routine on the accelerator, compiled for the device using OpenACC 2.0's routine directive:
#include <iostream>

#pragma acc routine
int function(int *array, int multiplier)
{
    int sum = 0;
    #pragma acc loop reduction(+:sum)
    for (int i = 0; i < 10; ++i) {
        sum += multiplier * array[i];
    }
    return sum;
}

int main()
{
    int *array = new int[10];
    int multiplier = 5;
    int out;
    for (int i = 0; i < 10; i++) {
        array[i] = 1;
    }
    #pragma acc enter data create(out) copyin(array[0:10], multiplier)
    #pragma acc parallel present(out, array[0:10], multiplier)
    if (function(array, multiplier) == 50) {
        out = 1;
    } else {
        out = 0;
    }
    #pragma acc exit data copyout(out) delete(array[0:10], multiplier)
    std::cout << out << std::endl;
}
How does function know to use the device copies of array[0:10] and multiplier when it is called from within the parallel region? And how can I enforce the use of the device copies?
When a routine is called within a device region (the parallel region in your code), it is being called by the threads running on the device, which means those threads have access to the arrays already resident on the device. The compiler may choose to inline the function, or it may generate a device-side function call. Either way, you can know that when the function is called on the device it is receiving the device copies of the data, because the call inherits the present data environment of the parallel region. If you still want to convince yourself that you're running on the device once inside the function, you can call acc_on_device, which tells you that you're running on the accelerator, though not that you received a device pointer.
If you want to enforce the use of device copies more strongly than that, you can make the routine nohost, so that it's technically not valid to call it from the host. That still doesn't do exactly what you're asking, though, which is to check on the GPU that the array is a device array.
Keep in mind, though, that the code within the parallel region but outside of any loop will be run gang-redundantly, so the write to out is a race condition, unless you happen to be running with a single gang or you perform the write using atomic.
gpgpu openacc