October 24, 2005

This, I called simulation

Filed under: Causal inference and statistics, Uncategorized — xlsyu @ 3:35 pm

Last week I was troubled by a treacherous simulation problem. The problem was set up as a slow convergence example and was to demonstrate some techniques to improve its convergence. But the wicked part of the problem was that the instructor had overdone the setup so that it seemed to never converge, partly due to his own mistake in his code.

In fact, after I fixed instructor’s bug, the revised program never converged (at least in a manageable time). The simulation kept on running 24/7 on my two laptops. My heart sunk with the constant fan noise. This was not good.

After several fruitless attempts, my wife suggested doing a simulation with 109 runs. I kindly discouraged her by pointing out that given about 20 min for a 106 simulation job, 109 simulations might take more than ten days to finish. And we had two such jobs! Simulation in pure R is too burdensome.

Last Saturday evening, tired of being bugged around by my wife, I decided to write a C program for the simulation (the compiled C module still runs inside R). I have done that long time ago.

It looked easy to translate R algorithm to C, but it was not. First, I had to write out all the vectorized computation in R as looping through C array. This was not an easy task, as the subscripts were messy to handle. Second, I searched everywhere to find a solution to pass matrixes between R and C, which it turned out that R never transfers real matrix format into C, and vice versa. The ways they index the matrix were also significantly different. I had to use the unwieldy vector way to handle R matrix. Please note that the new R-C interface, however, claims that one can use the R SEXP data type in C so that you can transfer R data objects between R and C. But that was too much for me to learn in the later night. Besides, what’s the meaning of SEX-P?

The worst thing was debugging my codes, and most time I was barking on the wrong tree. The problem is that there is no good debugging tool for programming C within R. I thought some parts were prone to errors such as subscripts, but in fact they were bug-free from the first place. The majority of mistakes were things like wrong dimensions (causing segmentation fault error), typos and unnecessary loops. Even the print function for debugging took me an hour to figure out the correct grammar (I should read the code sample first). Anyway, after 24 hours of hard work, my C program–four functions, 200 lines–finally worked. It was also optimized and bug-free.

The results were significant. At a speed of the one-fifth time of the R algorithm, the C functions made the large simulation feasible. A108 simulations took only a little less than two hours. This was very impressive. Although a 109 simulation may still need a full day, it won’t make you feel the end of the world any more.

Then how could people in 1960s run simulation without the fancy computers and efficient languages?

2 Comments »

  1. Usually C++/C are good for looping, hence heavy computation should be written in it.

    But lazy me never know how to link R and C. I will just pick some the tools according to the task. Maybe that’s because I am too causal towards interface.

    Comment by mango — November 15, 2005 @ 11:36 am

RSS feed for comments on this post. TrackBack URI

Leave a comment


Freely hosted by www.xlogit.com. Powered by WordPress.