Sunday 28 December 2008

Multithreading Begun

I recently downloaded some new data to try, from the same page as the CT scan I have been rendering. It was suggested in the last tutor meeting before Christmas that I do this, in order to test that everything is working correctly.

The first set is a scan of the inner ear:



This data set does not contain cubic voxels; instead they are cuboids, longer along the Z axis. I added some code to take this into account.
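
One way to take non-cubic voxels into account is to keep a per-axis voxel size and divide by it when converting a world-space sample position into voxel coordinates. This is only a minimal sketch of that idea: the names VoxelSpacing, Vec3, worldToVoxel and volumeExtent are made up for illustration, and the spacing values in the usage note are not taken from the inner ear data set.

```cpp
#include <cassert>

// Hypothetical per-axis voxel size in world units (for example millimetres).
struct VoxelSpacing {
    float x, y, z;
};

struct Vec3 {
    float x, y, z;
};

// Convert a world-space sample position into fractional voxel coordinates.
// Dividing by the spacing means a voxel that is longer along Z covers
// more world space along Z than along X or Y.
inline Vec3 worldToVoxel(const Vec3& p, const VoxelSpacing& s) {
    assert(s.x > 0.0f && s.y > 0.0f && s.z > 0.0f);
    return Vec3{p.x / s.x, p.y / s.y, p.z / s.z};
}

// World-space size of the whole volume along each axis.
inline Vec3 volumeExtent(int nx, int ny, int nz, const VoxelSpacing& s) {
    return Vec3{nx * s.x, ny * s.y, nz * s.z};
}
```

With a spacing of (1, 1, 2), for example, a volume of 10 x 10 x 5 voxels would occupy a 10 x 10 x 10 box in world space, so the rendered shape is no longer squashed along Z.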

The second new data set is a mathematical function. This contains none of the problems of a scan (inaccuracies, noise, etc.), so it is good for testing that the rendering is correct. I used the colour table function to map different colours and transparencies to various values.



Finally, this evening I got multithreading working at a basic level using the Boost.Thread library (boost::thread).



The red component of the image indicates which thread rendered those pixels, just for testing purposes; here there are 8 threads. This computer only has a single-core CPU, so I can't fully test this yet. However, I will have a good point to start from when I get back to university after the Christmas holidays.

The next task is to implement features such as selecting how the pixels are divided up between threads (in blocks or alternate pixels), possibly some improvements to the lighting, and some better output of rendering times to make testing easier.
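
The two ways of dividing pixels between threads mentioned above (contiguous blocks or alternate pixels) could be sketched roughly as follows. This is an illustration under assumptions, not the project's code: it uses std::thread from the C++ standard library as a stand-in for boost::thread, treats the image as a flat array of pixel indices, and the names Split, ownerOf, renderWorker and renderThreaded are hypothetical. Each worker simply writes its own thread id into the pixels it owns, similar in spirit to the red-channel debug output.

```cpp
#include <cassert>
#include <functional>
#include <thread>
#include <vector>

enum class Split { Blocks, Interleaved };

// Decide which thread owns a pixel, given its flat index in the image.
inline int ownerOf(int pixel, int pixelCount, int threadCount, Split mode) {
    assert(threadCount > 0 && pixel >= 0 && pixel < pixelCount);
    if (mode == Split::Interleaved) {
        return pixel % threadCount;  // alternate pixels: 0,1,2,...,0,1,2,...
    }
    // Blocks: each thread gets one contiguous run of pixels (rounded up).
    const int blockSize = (pixelCount + threadCount - 1) / threadCount;
    return pixel / blockSize;
}

// A worker visits every pixel but only fills in the ones it owns.
// Writing the thread id is a placeholder for casting a ray through pixel p.
inline void renderWorker(std::vector<int>& image, int threadId,
                         int threadCount, Split mode) {
    const int n = static_cast<int>(image.size());
    for (int p = 0; p < n; ++p) {
        if (ownerOf(p, n, threadCount, mode) == threadId) {
            image[p] = threadId;
        }
    }
}

// Launch one worker per thread and wait for all of them to finish.
// No two workers write the same element, so no locking is needed here.
inline std::vector<int> renderThreaded(int pixelCount, int threadCount,
                                       Split mode) {
    std::vector<int> image(pixelCount, -1);
    std::vector<std::thread> workers;
    for (int t = 0; t < threadCount; ++t) {
        workers.emplace_back(renderWorker, std::ref(image), t, threadCount, mode);
    }
    for (auto& w : workers) {
        w.join();
    }
    return image;
}
```

Calling renderThreaded(width * height, 8, Split::Blocks) and then again with Split::Interleaved would give the two layouts to compare. Keeping the ownership rule in a single function makes it easy to add further schemes later, such as tiles or scanlines.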

Tuesday 23 December 2008

Back to work

After an unproductive week I've done some more work on the project.

After talking to my tutor I spent a bit longer attempting to optimise the basic rendering. When storing the volume data non-linearly, the block sizes are now restricted to powers of two, which allows bit shifts to be used instead of divides. Unfortunately it is still slower than linear storage.
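
As an illustration of how power-of-two blocks let shifts and masks replace divides, here is a minimal sketch of a voxel address calculation. It assumes cubic blocks whose side is 2^blockShift voxels and volume dimensions that are multiples of the block side; the BlockLayout struct and blockedIndexShift function are hypothetical names, not the ones used in the project.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical description of a blocked volume.
struct BlockLayout {
    unsigned blockShift;  // log2 of the block side, e.g. 2 for blocks of size 4
    unsigned blocksX;     // number of blocks along X
    unsigned blocksY;     // number of blocks along Y
};

// Memory offset of voxel (x, y, z) when the volume is stored block by block.
inline std::size_t blockedIndexShift(unsigned x, unsigned y, unsigned z,
                                     const BlockLayout& l) {
    const unsigned mask = (1u << l.blockShift) - 1u;

    // Which block the voxel is in: x >> shift replaces x / blockSize.
    const std::size_t bx = x >> l.blockShift;
    const std::size_t by = y >> l.blockShift;
    const std::size_t bz = z >> l.blockShift;

    // Position inside that block: x & mask replaces x % blockSize.
    const std::size_t ix = x & mask;
    const std::size_t iy = y & mask;
    const std::size_t iz = z & mask;

    // Blocks are ordered like voxels: along X, then Y, then Z.
    const std::size_t blockIndex = (bz * l.blocksY + by) * l.blocksX + bx;

    // Offset within the block, again X fastest, built with shifts.
    const std::size_t inner =
        (((iz << l.blockShift) | iy) << l.blockShift) | ix;

    // Each block holds blockSize^3 voxels, i.e. 2^(3 * blockShift).
    return (blockIndex << (3 * l.blockShift)) | inner;
}
```

Even with shifts, this is still noticeably more arithmetic per sample than a plain linear index, which would be consistent with blocked storage remaining slower in the timings.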

Tonight I also implemented interpolation between voxels when sampling. As expected, rendering is quite a bit slower, but image quality is noticeably improved.
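
The post does not say which interpolation scheme was used; trilinear interpolation is the usual choice for this, so the sketch below assumes it. It reads the eight voxels surrounding the sample point and blends them along X, then Y, then Z, which also shows why an interpolated sample costs several times as much as a single lookup. The Volume struct, mix and sampleTrilinear are hypothetical names, and linear storage is assumed.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical volume using linear storage (X fastest, then Y, then Z).
struct Volume {
    int nx, ny, nz;
    std::vector<float> data;

    float at(int x, int y, int z) const {
        return data[(static_cast<std::size_t>(z) * ny + y) * nx + x];
    }
};

// Linear blend between a and b, with t in [0, 1].
inline float mix(float a, float b, float t) {
    return a + (b - a) * t;
}

// Sample at a fractional voxel coordinate.
// The caller must keep 0 <= x <= nx - 1 (and likewise for y and z).
inline float sampleTrilinear(const Volume& v, float x, float y, float z) {
    const int x0 = static_cast<int>(std::floor(x));
    const int y0 = static_cast<int>(std::floor(y));
    const int z0 = static_cast<int>(std::floor(z));
    assert(x0 >= 0 && x0 < v.nx && y0 >= 0 && y0 < v.ny && z0 >= 0 && z0 < v.nz);

    // Clamp the upper neighbour so samples on the far faces stay in bounds.
    const int x1 = std::min(x0 + 1, v.nx - 1);
    const int y1 = std::min(y0 + 1, v.ny - 1);
    const int z1 = std::min(z0 + 1, v.nz - 1);

    const float fx = x - x0;
    const float fy = y - y0;
    const float fz = z - z0;

    // Blend along X on the four edges of the cell...
    const float c00 = mix(v.at(x0, y0, z0), v.at(x1, y0, z0), fx);
    const float c10 = mix(v.at(x0, y1, z0), v.at(x1, y1, z0), fx);
    const float c01 = mix(v.at(x0, y0, z1), v.at(x1, y0, z1), fx);
    const float c11 = mix(v.at(x0, y1, z1), v.at(x1, y1, z1), fx);

    // ...then along Y, then along Z.
    const float c0 = mix(c00, c10, fy);
    const float c1 = mix(c01, c11, fy);
    return mix(c0, c1, fz);
}
```

With blocked storage, each of the eight reads needs its own block address calculation, which may be part of why the interpolated block-storage time grows so much more than the linear one.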




Some example render times:

Linear storage:
single sample: 18.8 seconds
interpolated sample: 34.3 seconds

Block storage (size 4):
single sample: 22.2 seconds
interpolated sample: 84.8 seconds

I will try to work when possible over the next couple of weeks. The main aims are to get round to researching the multi-CPU hardware details, and to multithread the ray caster.

Friday 12 December 2008

A Bit Of Research & Tomorrow's Meeting

I have done a brief bit of reading on memory cache issues in multi-core systems. It seems that memory cache is shared between cores - which would initially cause me to say that (in Ray Casting) the threads should be working on image pixels adjacent to those that other threads are working on, in order that they will all be working in similar areas of memory.

However, if multiple threads are accessing data in the same cache sector (the smallest amount of memory that the cache works with, more commonly called a cache line), this will cause one to stall whilst waiting for the other read operation to finish. Since this applies to a whole cache sector, it can occur whenever threads are reading data that is close together, not only when they read exactly the same data.

This suggests each thread should stick to its own separate portion of the image, which will mean they are also largely working with different areas of the volume. When I implement the multithreading I will be able to test both methods and see if the results match this theory.


The final project meeting of the year is tomorrow. I would like to talk a bit about the actual coding of multithreading, including potential libraries to use (the Boost thread library looks promising), and discuss the results of the data storage tests in my previous post.


Links:
http://www.embedded.com/design/multicore/202805545?pgno=3

http://communities.intel.com/openport/community/embedded/multicore/multicore-blog/blog/2008/10/08/cache-efficiency-the-multicore-performance-linchpin-to-packet-processing-applications

"Optimisation"

I implemented the first optimisation of the Ray Casting method.

When storing the volume data in a standard 3D array manner (all voxels in a row, then all rows in a layer, then all layers in a volume), a cache miss is likely to occur when marching along a ray through the volume.

If the ray is travelling directly along the X axis, the data it is reading will be contiguous in memory. However, when it moves in any other direction the samples will be spread out in memory, and each one is unlikely to be in the cache from the previous read.

The idea for the optimisation was to store the data in a similar fashion (voxels, rows, layers), but to do this separately for small blocks of the volume. Then when moving through the volume a cache miss is far less likely, potentially occurring only on the boundary between the blocks.
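
To make the two layouts concrete, here is a minimal sketch of the address calculations involved. It assumes cubic blocks and volume dimensions that are exact multiples of the block size; linearIndex and blockedIndex are hypothetical names rather than the functions used in the project.

```cpp
#include <cassert>
#include <cstddef>

// Standard layout: voxels in a row, rows in a layer, layers in the volume.
inline std::size_t linearIndex(std::size_t x, std::size_t y, std::size_t z,
                               std::size_t dimX, std::size_t dimY) {
    return (z * dimY + y) * dimX + x;
}

// Blocked layout: the same ordering, applied first to whole blocks and
// then to the voxels inside each block.
inline std::size_t blockedIndex(std::size_t x, std::size_t y, std::size_t z,
                                std::size_t dimX, std::size_t dimY,
                                std::size_t blockSize) {
    assert(blockSize > 0 && dimX % blockSize == 0 && dimY % blockSize == 0);
    const std::size_t blocksX = dimX / blockSize;
    const std::size_t blocksY = dimY / blockSize;

    // Which block the voxel is in.
    const std::size_t bx = x / blockSize;
    const std::size_t by = y / blockSize;
    const std::size_t bz = z / blockSize;

    // Position of the voxel inside that block.
    const std::size_t ix = x % blockSize;
    const std::size_t iy = y % blockSize;
    const std::size_t iz = z % blockSize;

    const std::size_t blockIndex = (bz * blocksY + by) * blocksX + bx;
    const std::size_t inner = (iz * blockSize + iy) * blockSize + ix;
    return blockIndex * blockSize * blockSize * blockSize + inner;
}
```

In an 8 x 8 x 8 volume, stepping one voxel along Z moves 64 elements in the linear layout but only 16 elements with blocks of size 4, so neighbouring samples stay closer together in memory. The price is the extra divides and modulo operations on every lookup.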

Unfortunately implementing this actually caused an increase in rendering times. Here are some render times for identical images, using various sizes of block:

1 (original storage method):
12.863 seconds
12.293
12.293

2:
16.816
16.781
16.742

4:
16.641
16.727
16.664

8:
17.180
17.156
17.137

16:
17.238
17.266
17.977

32:
17.227
17.227
17.207

I would guess that the slow-down is due to the extra computation involved in computing the memory location of a voxel. Some optimisations could be made in this area; for example, if the block size was limited to powers of two I could replace divides with bit shifts. However, this would also require the size of the volume data to be a power of two (or at least a multiple of the selected block size). I may do this at some point, but for now I think I need to focus more on the parallel processing, since that is the main purpose of the project.

Wednesday 10 December 2008

Lighting



The last day or two have been spent fixing problems with the colour table, and adding basic lighting.

I'm really pleased with the results: I can now render some impressive-looking images. It's a shame that it took me a lot longer than I originally expected to get to this stage, but I have been working hard so I can't be too disappointed with slow progress.

As previously mentioned the next tasks are to optimise the ray casting and to do further research on, and implement, multithreading.

Sunday 7 December 2008

Colour Table

Coloured rendering is now partially complete.

This is the colour table that maps voxel value to a colour and opacity. What I plan on implementing next is a small preview of slices of the volume, so it is easy to visualise how the colours will map onto the volume without having to actually render it.
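
A colour table of this kind can be as simple as an array indexed by voxel value. The sketch below assumes 8-bit voxel values (256 entries) and fills ranges of the table by blending between two colours; Rgba, ColourTable, fillRamp and classify are hypothetical names, and the real table may well be organised differently.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Colour and opacity, each component in [0, 1].
struct Rgba {
    float r, g, b, a;
};

// Hypothetical table with one entry per possible 8-bit voxel value.
using ColourTable = std::array<Rgba, 256>;

// Fill entries first..last with a linear blend from one colour to another.
inline void fillRamp(ColourTable& table, int first, int last,
                     const Rgba& from, const Rgba& to) {
    assert(0 <= first && first < last && last <= 255);
    for (int i = first; i <= last; ++i) {
        const float t = static_cast<float>(i - first) /
                        static_cast<float>(last - first);
        table[i] = Rgba{from.r + (to.r - from.r) * t,
                        from.g + (to.g - from.g) * t,
                        from.b + (to.b - from.b) * t,
                        from.a + (to.a - from.a) * t};
    }
}

// Map a voxel value to its colour and opacity.
inline Rgba classify(const ColourTable& table, std::uint8_t voxel) {
    return table[voxel];
}
```

For example, starting from an all-transparent table and calling fillRamp(table, 100, 200, red, white) with opacity rising from 0 to 1 would hide everything below 100 and fade denser material in gradually. The same lookup could drive a slice preview, since classifying a 2D slice is much cheaper than rendering the volume.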

After this, the next major feature is simple lighting, which hopefully won't take too long to do, and then to optimise the ray casting technique.

Thursday 4 December 2008

GUI Started

Had a quite productive evening tonight, mainly working on the infrastructure around the rendering. It now uses a low quality version of the rendering to maintain interactivity whilst rotating the volume, and then after a brief pause with no activity it produces a high quality render. (Visible in the background of the screenshot below; the actual rendering is still basic).

The groundwork for the pop-up GUI is now there. This is available by right-clicking and will contain all the controls for colour tables, rendering setup/stats, loading volume data, etc.

Tomorrow I will begin work on the colour table (mapping different volume data values to different colours/opacities).

Going by my original schedule the single-core ray casting should be finished by the end of the week, which is now highly unlikely to happen. I think this is partly due to not focusing enough time on this module in the earlier weeks, and also due to underestimating the work involved in the framework code that doesn't directly contribute to ray casting.

Once the framework is there it will be easier to complete later parts of the project, and so the time should balance itself out. Nevertheless I will make a concerted effort over the coming weeks (especially in the run-up to the Christmas break) to make good progress.

Wednesday 3 December 2008

Ray Casting - First Working Demo

I spent this evening refactoring the current code a bit, and making a start on single-core ray casting.



It's extremely simple (and slow) at the moment. The next stages will be to look at some optimisations (in method, data storage, etc.) and to improve the general functionality of the program. I will also be researching multithreading further, as stated in the previous post.
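
For readers unfamiliar with the technique, the core of a simple ray caster is a loop that steps along a ray and accumulates what it samples. The sketch below shows one common form, front-to-back compositing of a single brightness value; it is not necessarily what this first demo does. The castRay function is a hypothetical name, and the sample callback stands in for a lookup into the volume at distance t along the ray.

```cpp
#include <cassert>
#include <functional>

// March along a ray from tStart to tEnd in fixed steps.
// sample(t) returns an opacity in [0, 1] for the point at distance t.
inline float castRay(const std::function<float(float)>& sample,
                     float tStart, float tEnd, float step) {
    assert(step > 0.0f);
    float intensity = 0.0f;      // brightness accumulated so far
    float transmittance = 1.0f;  // fraction of light not yet blocked

    // Stop early once almost nothing behind could still show through.
    for (float t = tStart; t < tEnd && transmittance > 0.01f; t += step) {
        const float a = sample(t);
        intensity += transmittance * a;  // nearer samples hide farther ones
        transmittance *= (1.0f - a);
    }
    return intensity;
}
```

One such ray is cast per image pixel. Because each ray is independent of the others, this loop is also the natural unit of work to hand out to threads later.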