One of my hobbies is amateur astronomy. Years ago I bought a fairly good CCD camera and a transportable telescope to do some serious imaging, for scientific purposes. One key difference between everyday photography and astro-imaging is that CCD cameras provide only raw images that need post-processing. Phones and DSLR cameras deliver pretty images, already processed to look great; but to aim for scientific results, the raw numbers are needed.
Imagine that we are trying to discover an exoplanet orbiting a distant star. What we want is the exact count of photons coming from that star, so that we can measure its brightness with very high precision. Any error, and the dip in brightness produced by the transiting exoplanet won't be detected. CCDs do just that: they report the number of photons detected. But even that count has some noise, including unwanted deviations produced by dust, heat and the analog-to-digital converter. To remove them, special calibration images are taken, called bias, dark frame and flat field. These are combined with the raw image to produce a calibrated image that has scientific value.
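To make the idea concrete, here is a minimal sketch of that calibration step with NumPy and PyFITS. The file names are hypothetical, and it assumes the master bias, dark and flat frames have already been combined from several exposures:

import astropy.io.fits as pyfits
import numpy

# Hypothetical file names: one raw light frame plus master calibration frames
raw  = pyfits.getdata('raw_light.fit').astype(numpy.float32)
bias = pyfits.getdata('master_bias.fit').astype(numpy.float32)
dark = pyfits.getdata('master_dark.fit').astype(numpy.float32)  # same exposure time as the raw frame
flat = pyfits.getdata('master_flat.fit').astype(numpy.float32)

# The dark frame already contains the bias, so subtracting it removes both
# the readout offset and the thermal signal; dividing by the normalized flat
# corrects dust shadows and pixel-to-pixel sensitivity differences
flat_norm = (flat - bias) / numpy.mean(flat - bias)
calibrated = (raw - dark) / flat_norm

pyfits.writeto('calibrated_light.fit', calibrated, overwrite=True)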
All these image operations consume a lot of computing power. Not because they are inherently difficult (although other manipulations can be), but because images today tend to be large and many of them are usually taken in an observing session. To process images, amateur astronomers use good programs like IRIS, Nebulosity or PixInsight, which offer both good GUIs and batch options. Astrophysicists more often use CLI environments like IRAF. These tools are commonly written in C/C++ and are well optimized.
However, even C/C++ can be slow compared to newer computing options. Nowadays, computers come with specialized number-crunching hardware, also known as graphics cards. These cards can be used not only to display windows on a monitor and to play 3D games, but also to do general calculations. One key difference between the CPU of a computer and the GPU of its graphics card is that GPUs are designed for parallel computation. And graphics card makers are increasingly providing good SDKs and tools for exactly that.
Below I provide a very simple example of how to use PyFITS to read an astronomical image, calculate the log of each pixel, and display the result. Python is an increasingly popular language in the astronomical community. PyFITS is a library to read FITS, the standard image format of the field (the Hubble Space Telescope uses it!). And usually, when manipulating images with PyFITS, the fast NumPy library is used as well, because we want C speed rather than Python slowness. But what if C is also the slower option?
The example below uses PyCuda, a Python wrapper for the CUDA toolkit provided by NVIDIA, the company that makes the popular graphics cards. As can be seen in the code, we write a kernel function in a C-like language as a string, which is then compiled by CUDA and cached. We then pass the image array, which is transferred from the computer's main memory to the graphics card's memory. And then magic happens, because the GPU modifies each pixel value in parallel, much faster than using the CPU. Finally, the result is copied back to main memory and the log10 image is available to display.
#!/usr/bin/python3

import astropy.io.fits as pyfits            # Library to read FITS
import pycuda.autoinit                      # PyCuda autoinit
import pycuda.driver as cuda                # PyCuda In, Out helpers
import matplotlib.pyplot as plot            # Library to plot
import matplotlib.cm as colormap            # Colormaps for plotting
import numpy                                # Fast math library
import time
from pycuda.compiler import SourceModule

# Open FITS image
fits = pyfits.open('gsc-2108-1564-750mm-020s_001_c_r.fit')

# Get the image data as a float32 array
data = fits[0].data.astype(numpy.float32)

# Read image metadata
header = fits[0].header

# Show all metadata
print(repr(header))

# PyCuda: create the kernel (compiled by CUDA and cached)
kernel = SourceModule("""
__global__ void log_fast(float *result, float *source, int size)
{
    const int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < size)   /* guard against threads beyond the last pixel */
        result[i] = log10(source[i]);
}
""")

# PyCuda: get the compiled function
log_fast = kernel.get_function("log_fast")

# PyCuda: calculate block and grid sizes
BLOCK_SIZE = 1024
block = (BLOCK_SIZE, 1, 1)
grid = (int(data.shape[0] * data.shape[1] / BLOCK_SIZE) + 1, 1, 1)

# PyCuda: call the function to transform the image
data_result = numpy.zeros_like(data)
start_time = time.time()
log_fast(cuda.Out(data_result), cuda.In(data), numpy.int32(data.size),
         block=block, grid=grid)
end_time = time.time()
cuda_time = end_time - start_time
print("PyCuda log10 {0:.4f} seconds".format(cuda_time))

# NumPy: same operation on the CPU
start_time = time.time()
np_data_result = numpy.log10(data)
end_time = time.time()
numpy_time = end_time - start_time
print("NumPy log10 {0:.4f} seconds".format(numpy_time))

print("PyCuda is {0:.1f} times faster than NumPy".format(numpy_time / cuda_time))

# Show the original image and both results using matplotlib
figure = plot.figure()
figure.add_subplot(3, 1, 1)
plot.imshow(data, cmap=colormap.Greys_r)
figure.add_subplot(3, 1, 2)
plot.imshow(data_result, cmap=colormap.Greys_r)
figure.add_subplot(3, 1, 3)
plot.imshow(np_data_result, cmap=colormap.Greys_r)
plot.show()
Of course, the question is, how much faster is CUDA than NumPy?
PyCuda log10 0.0243 seconds
NumPy log10 0.0402 seconds
PyCuda is 1.7 times faster than NumPy
Almost 2 times faster. Amazing, isn't it? In your case it will depend on the calculations you want to do; the difference can be much higher.
So, if your computer has an NVIDIA graphics card, take a look at PyCuda. It will save you lots of time. For other graphics cards, take a look at OpenCL, a vendor-neutral alternative to CUDA; its Python wrapper, PyOpenCL, plays the same role as PyCuda.
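As a rough idea of what that looks like, here is a minimal sketch of the same log10 kernel written with PyOpenCL. It assumes an OpenCL device and driver are available, and it uses a random array as a stand-in for the FITS data:

import numpy
import pyopencl as cl

# Pick an OpenCL device and create a command queue
context = cl.create_some_context()
queue = cl.CommandQueue(context)

# The same element-wise log10 kernel, written in OpenCL C
program = cl.Program(context, """
__kernel void log_fast(__global float *result, __global const float *source)
{
    int i = get_global_id(0);
    result[i] = log10(source[i]);
}
""").build()

data = numpy.random.rand(512, 512).astype(numpy.float32)  # stand-in for the FITS data
result = numpy.empty_like(data)

# Copy the image to the device, run one work-item per pixel, copy the result back
flags = cl.mem_flags
source_buffer = cl.Buffer(context, flags.READ_ONLY | flags.COPY_HOST_PTR, hostbuf=data)
result_buffer = cl.Buffer(context, flags.WRITE_ONLY, result.nbytes)
program.log_fast(queue, (data.size,), None, result_buffer, source_buffer)
cl.enqueue_copy(queue, result, result_buffer)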
And finally, here is another bit of processing I've done with PyCuda: averaging images of the galaxy NGC 253 that I took a couple of years ago. The goal is to increase the signal-to-noise ratio by averaging many images.
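The averaging itself is simple; here is a minimal NumPy sketch of the idea (the PyCuda version follows the same pattern as the log10 example above; the file names here are hypothetical, and the frames are assumed to be already calibrated and aligned):

import glob
import numpy
import astropy.io.fits as pyfits

# Load a set of calibrated, aligned frames of the same field
frames = [pyfits.getdata(name).astype(numpy.float32)
          for name in sorted(glob.glob('ngc253_*.fit'))]

# Average the stack pixel by pixel; the noise drops roughly with the
# square root of the number of frames
stacked = numpy.mean(frames, axis=0)

pyfits.writeto('ngc253_stacked.fit', stacked, overwrite=True)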