9.1.2010

Using manually tweaked PTX assembly in your CUDA 2 program

So you want to optimize or rewrite the PTX code that the CUDA 2.x compiler produced for you? Well, you should: not only is PTX virtual assembly fairly easy to write, but CUDA compiler technology is far from mature, so there are surely manual optimizations to be made. Or, if you're feeling heroic, you can just compile empty stub functions and fill in the actual PTX code yourself.

In this entry I'll show you how to export PTX code from your CUDA program, and how to compile it back once you've edited it to your liking. Even though this is rather straightforward, I had to mess around for hours before I figured out how to do it. Or maybe I'm just a poor googler.

First, let's say your CUDA code is called mycode.cu and your program binary will be mycode. Create a devcode structure like this:

PROFILE=13 # Which CUDA compute profile to use
mkdir mycode.devcode
nvcc -v -arch=compute_$PROFILE -code=sm_$PROFILE -ext=all \
    --export-dir=mycode.devcode -c mycode.cu -o mycode.o.ptx
rm mycode.devcode/*/sm_$PROFILE
ln -s ../../mycode.cubin `echo mycode.devcode/*`/sm_$PROFILE

Now you have a structure that the CUDA runtime library can use when it runs the final program binary, except that instead of the actual device code object there is a symbolic link pointing to mycode.cubin in your current directory. Note that you only need to create this structure once, not every time you compile a new version of your code. Next, you can compile your existing C for CUDA code (or a skeleton of stub functions) into an editable PTX file like this:

PROFILE=13
nvcc --ptx -v -arch=compute_$PROFILE -code=sm_$PROFILE mycode.cu

You are now free to edit mycode.ptx and work your magic. When you're done, compile the edited PTX file into the device binary mycode.cubin, which is the file the previously created symbolic link points to:

PROFILE=13
nvcc -v -arch=compute_$PROFILE -code=sm_$PROFILE --cubin mycode.ptx
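
Since the runtime follows that symbolic link when the program starts, it can be worth checking that the link actually resolves before going further. The following is only a minimal shell sketch: the function name check_devcode_links is made up for illustration, and it assumes nothing beyond the mycode.devcode layout created earlier.

```shell
# Hypothetical helper (not part of nvcc or CUDA): report whether every sm_*
# entry under a devcode directory points at an existing file.
check_devcode_links() {
    dir=$1
    status=0
    for entry in "$dir"/*/sm_*; do
        if [ -e "$entry" ]; then
            echo "ok: $entry"
        else
            # Either a dangling symlink, or no sm_* entry matched at all.
            echo "missing: $entry" >&2
            status=1
        fi
    done
    return $status
}
```

Calling check_devcode_links mycode.devcode should fail until mycode.cubin exists in the directory that contains mycode.devcode, and succeed once it does.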

Now you can continue building your program as usual. When linking, just include the object file mycode.o.ptx (and link against the CUDA runtime library).
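
While you iterate on the PTX, only the cubin step needs repeating. Here is a rough sketch of a helper that reruns it only when the PTX file is newer than the cubin, in the spirit of make. The function rebuild_cubin and the overridable NVCC and PROFILE variables are assumptions of this example, and it assumes nvcc is given the edited .ptx file as input to --cubin.

```shell
# Both variables can be overridden from the environment (an assumption of
# this sketch, not something nvcc requires).
NVCC=${NVCC:-nvcc}
PROFILE=${PROFILE:-13}

# Hypothetical helper: rebuild FILE.cubin from FILE.ptx only if the cubin is
# missing or older than the edited PTX.
rebuild_cubin() {
    ptx=$1
    cubin=${ptx%.ptx}.cubin
    if [ ! -e "$cubin" ] || [ "$ptx" -nt "$cubin" ]; then
        "$NVCC" -arch=compute_$PROFILE -code=sm_$PROFILE --cubin "$ptx" -o "$cubin"
    fi
}
```

After each edit, rebuild_cubin mycode.ptx refreshes mycode.cubin if needed. The host object mycode.o.ptx does not have to be rebuilt, since the device code is reached through the symbolic link.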

Comments

15.2.2012

Hello Ville,
Can you elaborate on the steps after generating the cubin file? How do I link the object file?
Thanks in advance
- Jay

28.3.2012

Hello Jay,
Sorry for not replying earlier; I don't yet have a notification system in the blog software so I didn't see this until today :-)
The file you should link into your binary is mycode.o.ptx, which is a normal object (.o) file. The device binaries (the cubin file) are loaded at runtime by this object.
So in the simplest case, you might compile and link your program like this: g++ -o program main.cpp mycode.o.ptx -lcudart

Please note, however, that CUDA 2 is obsolete. Later CUDA versions support much cleaner ways to include customized PTX in your kernels. Also, this example is for the runtime API. The driver API allows you to explicitly upload a modified .ptx file at runtime.

Best regards
- wili

