cuda - Does 'code=sm_X' embed only binary (cubin) code, or also PTX code, or both?
I am a little bit confused about the 'code=sm_X' option within the '-gencode' statement.
An example: what does the nvcc compiler option
-gencode arch=compute_13,code=sm_13
embed in the library?
Only the machine code (cubin code) for GPUs with CC 1.3, or also the PTX code for GPUs with CC 1.3?
In the 'Maxwell Compatibility Guide', it is stated that "Only the back-end target version(s) specified by the 'code=' clause will be retained in the resulting binary".
From that, I would infer that the given compiler option embeds only machine code for GPUs with CC 1.3, and no PTX code. This would mean that it is not possible to run the library on, e.g., a Maxwell generation card, since there is no PTX code embedded within the library from which machine code could be 'just-in-time' (JIT) compiled.
On the other hand, the GTC 2013 presentation 'Introduction to the CUDA Toolkit as an Application Build Tool' from NVIDIA states that '-gencode arch=compute_13,code=sm_13' is enough for all GPUs with CC >= 1.3, and that with this compiler option the machine code for GPUs with CC > 1.3 is JIT-ed from the PTX code. So the information given in the Maxwell Compatibility Guide and in the GTC presentation is conflicting, in my opinion.
nvcc has many formats in which code generation options can be specified. A read of section 6 of the nvcc manual may be instructive.
When using the format:
nvcc -gencode arch=compute_13,code=sm_13 ...
only the SASS code for an sm_13 (CC 1.3) device is retained. No PTX is retained in the executable object, and the code can only run on a device capable of running CC 1.3 SASS.
Using the above command format, in order to embed a PTX version of the source code in the executable object, it is necessary to use a virtual architecture specification (compute_XY rather than sm_XY) for the option provided to code=... . Since each -gencode switch in this form names a single target, we must pass the -gencode switch multiple times to nvcc, once for each target we want embedded in the executable object.
So, extending the above example, we could use the following:
nvcc -gencode arch=compute_13,code=sm_13 -gencode arch=compute_13,code=compute_13 ...
This will embed both CC 1.3 SASS (by the first -gencode switch) and CC 1.3 PTX (by the second -gencode switch) in the executable. Devices capable of running CC 1.3 SASS code directly will use that. Other devices (of compute capability greater than 1.3) will go through a JIT-compile step in the driver, which converts the CC 1.3 PTX code to SASS code for the architecture of the device in question.
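To make the "one -gencode switch per target" pattern concrete, here is a minimal shell sketch. The helper name gencode_flags is hypothetical (it is not part of nvcc), and the policy of emitting PTX only for the highest listed compute capability is an assumption chosen for illustration. It only prints a command line, so it does not need a CUDA installation to run.

```shell
#!/bin/sh
# gencode_flags (hypothetical helper): print one SASS -gencode switch per
# listed compute capability, plus one PTX switch for the highest of them so
# that newer devices have something to JIT-compile from.
gencode_flags() {
    flags=""
    highest=""
    for cc in "$@"; do
        # code=sm_XY keeps SASS (machine code) for that real architecture.
        flags="$flags -gencode arch=compute_$cc,code=sm_$cc"
        if [ -z "$highest" ] || [ "$cc" -gt "$highest" ]; then
            highest=$cc
        fi
    done
    if [ -n "$highest" ]; then
        # code=compute_XY keeps PTX for that virtual architecture.
        flags="$flags -gencode arch=compute_$highest,code=compute_$highest"
    fi
    # Strip the leading space before printing.
    echo "${flags# }"
}

# Print the resulting command line instead of invoking nvcc.
echo "nvcc $(gencode_flags 13 20) ..."
```

Called with a single value, gencode_flags 13, it yields the same pair of switches as the two-switch command above.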
I agree that the GTC 2013 presentation (e.g. slide 37) seems to suggest that
nvcc -gencode arch=compute_13,code=sm_13 ...
is sufficient for devices of compute capability 1.3 or higher. It is not, and this is easy to demonstrate. If you compile a code using the above format and attempt to run it on a CC 2.0 device, it will fail with an "invalid device function" error associated with whatever kernel or kernels you have in your code.
Again, nvcc has a variety of command formats and "shortcuts" for specifying code generation. The relatively simple ones, such as:
nvcc -arch=sm_13 ...
will embed both a PTX and a SASS version of the code in the executable object, resulting in the kind of forward compatibility suggested.
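As a rough illustration of what that shortcut stands for, the sketch below spells out the explicit -gencode pair that corresponds to -arch=sm_XY, based on the behaviour described above (SASS and PTX for the same version). The function name expand_arch is hypothetical and exists only for this example; it prints text and does not call nvcc.

```shell
#!/bin/sh
# expand_arch (hypothetical helper): given a real architecture name such as
# sm_13, print the two -gencode switches that embed both SASS and PTX for it.
expand_arch() {
    case "$1" in
        sm_*) ver=${1#sm_} ;;   # keep only the numeric part, e.g. 13
        *) echo "expected sm_XY, got: $1" >&2; return 1 ;;
    esac
    echo "-gencode arch=compute_$ver,code=sm_$ver -gencode arch=compute_$ver,code=compute_$ver"
}

expand_arch sm_13
```

The explicit form is mainly useful once several real architectures are needed, since the shortcut names only one.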
cuda nvcc