Given that the AFS share is mounted on each server, only a single local installation is needed.
Example
- To illustrate the parallelization facilities provided by these servers, we will discuss a simple
example of little or no practical use
- We will generate B replicates from the sampling distribution of the MAD (median absolute deviation)
based on pseudo random samples of size n
- We will use the facilities provided by the R packages snow and Rmpi
- The following function generates B replicates of the MAD based on pseudo random samples of size n (default 100000) from
a standard normal law:
> foo=function(B,n=100000){replicate(B,mad(rnorm(n)))}
> B=16000
Example (single thread)
Let us try this function on a single thread:
> unix.time(foo(B))
user system elapsed
806.520 8.029 814.696
Example (two nodes)
We will create an MPI cluster with two nodes.
In R, the snow package provides the necessary interface.
The makeCluster() command creates the cluster (note that it loads the Rmpi package),
while the stopCluster() command terminates the cluster.
Note that since the job is to be split between the two nodes, each node will produce
B/2 (= 8000) replicates.
> library(snow)
> k=2
> cl=makeCluster(k)
Loading required package: Rmpi
2 slaves are spawned successfully. 0 failed.
> unix.time(clusterCall(cl,foo,B=B/k))
user system elapsed
0.007 0.001 424.004
> stopCluster(cl)
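Note that clusterCall() returns a list with one element per node (here, a vector of B/k replicates from each slave), so the results still have to be combined on the master. A minimal sketch, reusing B and k from above; the clusterSetupRNG() call is snow's interface for giving each slave an independent random number stream (it requires the rlecuyer or rsprng add-on package), which is advisable whenever the slaves generate pseudo random numbers:
> cl=makeCluster(k)
> clusterSetupRNG(cl)              # independent RNG streams on each slave
> res=clusterCall(cl,foo,B=B/k)    # list of k vectors, each of length B/k
> mads=unlist(res)                 # single vector of B replicates
> length(mads)                     # 16000
> stopCluster(cl)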
Example (four nodes)
Next, try it using four nodes:
> k=4
> cl=makeCluster(k)
4 slaves are spawned successfully. 0 failed.
> unix.time(clusterCall(cl,foo,B=B/k))
user system elapsed
0.012 0.001 217.528
> stopCluster(cl)
Example (eight nodes)
Next, try it using eight nodes:
> k=8
> cl=makeCluster(k)
8 slaves are spawned successfully. 0 failed.
> unix.time(clusterCall(cl,foo,B=B/k))
user system elapsed
0.019 0.002 129.770
> stopCluster(cl)
Example (sixteen nodes)
Finally, let us try using all 16 cores on bb16a:
> k=16
> cl=makeCluster(k)
16 slaves are spawned successfully. 0 failed.
> unix.time(clusterCall(cl,foo,B=B/k))
user system elapsed
0.043 0.001 90.829
> stopCluster(cl)
Example (summary)
nodes  minutes   cmp
    1     13.5  13.5
    2      7.1   6.8
    4      3.6   3.4
    8      2.2   1.7
   16      1.5   0.8
(minutes = observed elapsed time in minutes; cmp = ideal time under perfect linear speedup, 13.5/nodes)
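Up to rounding, the table can be reproduced in R from the elapsed times reported above; a minimal sketch:
> nodes=c(1,2,4,8,16)
> elapsed=c(814.696,424.004,217.528,129.770,90.829)  # seconds, from the runs above
> minutes=round(elapsed/60,1)
> cmp=round(minutes[1]/nodes,1)    # ideal time under perfect linear speedup
> data.frame(nodes,minutes,cmp)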
Concluding remarks
- The actual speedup depends on many factors (MPI overhead, system resource availability, etc.)
- Most importantly, it will depend on the fraction of the job that is serialized (cf. Amdahl's law;
see the sketch after this list)
- The kind of parallelization considered in the example works best for so-called embarrassingly parallel problems
(e.g., permutation resampling)
- You can use the lamhalt command (from the shell) to kill all LAM jobs
- The Intel compilers provide OpenMP parallelization support (gcc 4.2 provides this as well)
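To illustrate Amdahl's law: if a fraction s of the job is serial, the speedup on N nodes is bounded by 1/(s + (1-s)/N). A minimal sketch that estimates s from the measured 16-node run and predicts the speedup at the other node counts (the algebra simply inverts Amdahl's formula; the numbers are the elapsed times from the runs above):
> amdahl=function(s,N){1/(s+(1-s)/N)}
> S16=814.696/90.829       # observed speedup on 16 nodes (about 9)
> s=(16/S16-1)/(16-1)      # invert amdahl(s,16)=S16; gives s of about 0.05
> amdahl(s,c(2,4,8,16))    # predicted speedups at 2, 4, 8 and 16 nodes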