Friday, 16 January 2009

Week's update

I'm disappointed in the amount of work achieved on the project this week. I mainly focused on the team project, and then have been busy sorting various things out. Next week I will probably limit the team project to just wednesday and try to do more on this project.

I tested non-linear (block) storage of volume data, which I hadn't done yet:

With 6 threads:
Block size............Time
Linear................3.837
2.......................4.587
4.......................5.132
8.......................4.164
16......................4.257
32......................4.914


Blocks are still slower than linear. I optimised the code for sampling from the volume data but only very slightly. I will look more closely at it soon, as in theory block storage should be faster, although I don't want to spend too much time on something that isn't directly related to my project specification.

I checked the Windows Task Manager as was suggested in last week's meeting to confirm that all 8 CPU cores were being maxed out in multithreaded rendering, and they indeed are.

I switched the parallel raycaster to call thread->join() on all of the worker threads, instead of manually polling whether all the threads have finished yet.

I upped the maximum number of threads that can be set in the program to 16, to try more threads, also suggested in last week's meeting. Time A is the threads rendering blocks, time B is the threads rendering alternate pixels.

Threads.....Time A....Time B
1...........7.85......N/A
2...........4.17......4.53
3...........3.05......3.04
4...........2.40......2.01
5...........1.91......1.64
6...........1.64......1.35
7...........1.39......1.16
8...........1.23......1.07
9...........1.39......1.36
10..........1.36......1.26
11..........1.21......1.15
12..........1.15......1.09
13..........1.20......1.16
14..........1.21......1.09
15..........1.28......1.13
16..........1.19......1.12

The times for running more threads than there are CPU cores are not as slow as I expected. Here there are only 2 threads per core so I guess the overhead of splitting time between them is small. I did a quick test of 160 threads and as expected it was very slow.

The times seem generally fastest around the 8 thread mark, compared to 6 threads in a previous test. This could be down to the improved method of waiting for threads to finish. It is hard to draw any real conclusions from this test however as it is only a single test and fluctuations have not been averaged out.

I had a look into SIMD processing. I'm not sure how much performance there is to gain from it, as I imagine the vast majority of the time in this program is down to reading the volume data rather than the processing involved in marching along the rays. I will discuss the idea further in the meeting tomorrow, as mentioned earlier I don't want to spend a lot of time on something not related to my specification when time is starting to look a bit on the short side.

Finally I also had a look into memory alignment to do with the cache, also suggested in the previous meeting, but I'm not sure where it could be applied to best effect and also want to discuss this further in tomorrow's meeting.

No comments: