I have to run Markov chain Monte Carlo (MCMC) simulations that each take hours and require parameter configuration. On this particular day, it was a Metropolis-Hastings algorithm for which I had to specify the step size of the proposal distribution.
After getting sick of manually changing the step size each time, I gave in and parallelized the MCMCs so that all of them run in one go. Given the large time cost of each MCMC, I really don’t want one failure to jeopardize the rest. So I wrote a parallelized script that accomplishes the following goals:
- The result of each chain is saved as soon as it is done.
- The progress is tracked in a log file.
I used the foreach package to parallelize, plus a few tricks to create an informative log file and informative file names. Below is the code for my_parallel_mcmc.R:
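A minimal sketch of what such a script could look like, not the original listing. It assumes the doParallel backend for foreach, a hypothetical f_mcmc(step_size, out_file) signature, and made-up step sizes, worker count, and output directory. The placeholder f_mcmc() only stands in for the real sampler so that the sketch runs on its own.

```r
library(foreach)
library(doParallel)  # assumed backend; it also attaches the parallel package

# Placeholder sampler (hypothetical): stands in for the real f_mcmc().
f_mcmc <- function(step_size, out_file) {
  cat(sprintf("[%s] step_size = %g: started\n", format(Sys.time()), step_size))
  result <- list(step_size = step_size,
                 chain = cumsum(rnorm(1000, sd = step_size)))
  saveRDS(result, out_file)  # each chain is saved as soon as it is done
  cat(sprintf("[%s] step_size = %g: saved to %s\n",
              format(Sys.time()), step_size, out_file))
  out_file
}

step_sizes <- c(0.1, 0.5, 1, 2)  # illustrative values
out_dir <- "mcmc_results"        # illustrative directory name
dir.create(out_dir, showWarnings = FALSE)

# outfile redirects everything the workers print into one shared log file.
n_workers <- 2                   # assumed; adjust to the machine
cl <- makeCluster(n_workers, outfile = "my_mcmc.log")
registerDoParallel(cl)

saved <- foreach(s = step_sizes, .combine = c) %dopar% {
  # Put the step size in the file name so each result is easy to identify.
  out_file <- file.path(out_dir, sprintf("mcmc_step_%g.rds", s))
  # Catch errors per chain so one failure does not take down the others.
  tryCatch(
    f_mcmc(s, out_file),
    error = function(e) {
      cat(sprintf("step_size = %g FAILED: %s\n", s, conditionMessage(e)))
      NA_character_
    }
  )
}

stopCluster(cl)
```

Wrapping each call in tryCatch() means a failed chain is logged and returns NA instead of stopping the whole loop, and writing the result inside f_mcmc() means finished chains are already on disk if a later one fails.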
Your f_mcmc() function should 1) periodically print a progress report and 2) save the MCMC result at the end.
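As a rough illustration of those two requirements, here is a sketch of an f_mcmc() built around a random-walk Metropolis-Hastings sampler. The standard normal target, the argument names, and the defaults are all assumptions made for the example; a real sampler would substitute its own target and state.

```r
# Hypothetical f_mcmc(): random-walk Metropolis-Hastings on a standard
# normal target. Target, argument names, and defaults are illustrative.
f_mcmc <- function(step_size, out_file, n_iter = 10000, report_every = 2000) {
  log_target <- function(x) dnorm(x, log = TRUE)
  chain <- numeric(n_iter)
  x <- 0
  n_accept <- 0

  for (i in seq_len(n_iter)) {
    # Propose a move; step_size is the sd of the Gaussian proposal.
    proposal <- x + rnorm(1, sd = step_size)
    # Accept with probability min(1, target(proposal) / target(x)).
    if (log(runif(1)) < log_target(proposal) - log_target(x)) {
      x <- proposal
      n_accept <- n_accept + 1
    }
    chain[i] <- x

    # 1) Periodic progress report; it ends up in the log file when the
    #    worker's output is redirected there.
    if (i %% report_every == 0) {
      cat(sprintf("[%s] step_size = %g: iteration %d of %d, acceptance rate %.2f\n",
                  format(Sys.time(), "%Y-%m-%d %H:%M:%S"),
                  step_size, i, n_iter, n_accept / i))
    }
  }

  # 2) Save the result at the end, before returning.
  saveRDS(list(step_size = step_size,
               chain = chain,
               accept_rate = n_accept / n_iter),
          out_file)
  invisible(out_file)
}
```

Including the step size in every log line makes it possible to tell the chains apart when several workers write to the same log.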
I used git and GitHub to clone my script onto a remote cluster. From the cluster's terminal, I run the script with Rscript my_parallel_mcmc.R. To check the progress, I use tail -f my_mcmc.log. To keep the job running even after you disconnect from the remote cluster, start it inside a tmux session.