| Date | Entry |
|---|---|
| 8.1.2010 | The year I started blogging |
| 9.1.2010 | Linux initramfs with iSCSI and bonding support for PXE booting |
| 9.1.2010 | Using manually tweaked PTX assembly in your CUDA 2 program |
| 9.1.2010 | OpenCL autoconf m4 macro |
| 9.1.2010 | Mandelbrot with MPI |
| 10.1.2010 | Using dynamic libraries for modular client threads |
| 11.1.2010 | Creating an OpenGL 3 context with GLX |
| 11.1.2010 | Creating a double buffered X window with the DBE X extension |
| 11.1.2010 | Eurographics 2010 here I come! |
| 12.1.2010 | A simple random file read benchmark |
| 14.12.2011 | Change local passwords via RoundCube safer |
| 5.1.2012 | Multi-GPU CUDA stress test |
| 6.1.2012 | CUDA (Driver API) + nvcc autoconf macro |
So you want to optimize or rewrite the PTX code the CUDA 2.x compiler produced for you? Well, you should: not only is PTX virtual assembly fairly easy to write, but CUDA compiler technology is also far from mature, and there are surely manual optimizations to be made. Or, if you're feeling heroic, you can just compile empty stub functions and fill in the actual PTX code yourself.
In this entry I'll show you how to export PTX code from your CUDA program and how to compile it back once you've edited it to your liking. Even though this is rather straightforward, I had to mess around for hours before I figured out how to do it. Or maybe I'm just a poor googler.
First, let's say your CUDA code is called mycode.cu and your program binary will be mycode. Create a devcode structure like this:
Now you have a structure that the CUDA runtime library can use when it runs the final program binary, except that instead of the actual device code object there is a symbolic link pointing to mycode.cubin in your current directory. You only need to set this up once, not every time you compile a new version of your code. Next, compile your existing C for CUDA code (or just a framework of stub functions) into an editable PTX file like this:
You are now free to edit mycode.ptx and work your magic. When you're done, compile it into the device binary file mycode.cubin (which makes the previously created symbolic link valid):
Now you can continue building your program as usual. When linking, just include the object file mycode.o.ptx and link against the CUDA runtime library.
Hello Ville, Can you elaborate on the steps after generating the cubin file? How do I link the object file? Thanks in advance. - Jay
Hello Jay, Sorry for not replying earlier; the blog software doesn't have a notification system yet, so I didn't see this until today :-) The file you should link into your binary is mycode.o.ptx, which is a normal object (.o) file. The device binary (the cubin file) is loaded at runtime by code in this object. So in the simplest case, you might compile and link your program like this: `g++ -o program main.cpp mycode.o.ptx -lcudart`. Please note, however, that CUDA 2 is obsolete. Later CUDA versions support much cleaner ways to include customized PTX in your kernels. Also, this example is for the runtime API. The driver API allows you to explicitly upload a modified .ptx file at runtime. Best regards, wili
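To illustrate the driver API route mentioned in the reply above, here is a minimal sketch, not the method used in this post. It reads an edited PTX file (for example mycode.ptx) into a NUL-terminated buffer, which is the form cuModuleLoadData accepts for PTX text, and hands it to the driver for JIT compilation. The helper names read_ptx_file and load_ptx_kernel and the USE_CUDA_DRIVER switch are hypothetical, made up for this example. The driver part assumes cuInit(0) has already been called and a CUDA context is current. If you don't need the PTX text in memory, cuModuleLoad can also take the .ptx file name directly.

```c
#include <stdio.h>
#include <stdlib.h>
#ifdef USE_CUDA_DRIVER
#include <cuda.h>
#endif

/* Hypothetical helper: read a whole PTX file (e.g. the hand-edited
 * mycode.ptx) into a malloc'd, NUL-terminated buffer.
 * Returns NULL on any I/O or allocation error; the caller frees it. */
char *read_ptx_file(const char *path)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return NULL;

    /* Determine the file size by seeking to the end, then rewind. */
    if (fseek(f, 0, SEEK_END) != 0) {
        fclose(f);
        return NULL;
    }
    long size = ftell(f);
    if (size < 0 || fseek(f, 0, SEEK_SET) != 0) {
        fclose(f);
        return NULL;
    }

    char *buf = malloc((size_t)size + 1);
    if (buf == NULL) {
        fclose(f);
        return NULL;
    }
    size_t got = fread(buf, 1, (size_t)size, f);
    fclose(f);
    if (got != (size_t)size) {
        free(buf);
        return NULL;
    }
    buf[size] = '\0'; /* PTX passed as text must be NUL-terminated */
    return buf;
}

#ifdef USE_CUDA_DRIVER
/* Hypothetical helper: JIT-compile the PTX text and look up one kernel.
 * Assumes cuInit(0) was called and a CUDA context is current. */
int load_ptx_kernel(const char *path, const char *kernel_name,
                    CUmodule *module, CUfunction *kernel)
{
    char *ptx = read_ptx_file(path);
    if (ptx == NULL)
        return -1;

    CUresult rc = cuModuleLoadData(module, ptx);
    free(ptx); /* the PTX text is not needed once the module is loaded */
    if (rc != CUDA_SUCCESS)
        return -1;

    if (cuModuleGetFunction(kernel, *module, kernel_name) != CUDA_SUCCESS) {
        cuModuleUnload(*module);
        return -1;
    }
    return 0;
}
#endif
```

Compile with -DUSE_CUDA_DRIVER and link against the driver library (-lcuda) to enable the loader; without the define, only the file-reading helper is built. Keep in mind that kernels compiled from a .cu file get C++-mangled names unless declared extern "C", so kernel_name has to match the name that appears after the .entry directive in the PTX.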