Given that the AFS share is mounted on each server, only a single local installation is needed.
Example
- To illustrate the parallelization facilities provided by these servers, we will discuss a simple
example of little or no practical use
- We will generate B replicates from the sampling distribution of the MAD (median absolute deviation)
based on pseudo random samples of size n
- We will use the facilities provided by the R packages snow and Rmpi
- The following function generates B replicates of the MAD based on pseudo random samples of size n (default 100000) from
a standard normal law:
> foo=function(B,n=100000){replicate(B,mad(rnorm(n)))}
> B=16000
Example (single thread)
Let us try this function on a single thread:
> unix.time(foo(B))
user system elapsed
806.520 8.029 814.696
Example (two nodes)
We will create an MPI cluster with two nodes.
In R, the snow package provides the necessary interface.
The makeCluster() command creates the cluster (note that it loads the Rmpi package),
while the stopCluster() command terminates the cluster.
Note that since the job is to be split between the two nodes, each node will produce
B/2 (= 8000) replicates.
> library(snow)
> k=2
> cl=makeCluster(k)
Loading required package: Rmpi
2 slaves are spawned successfully. 0 failed.
> unix.time(clusterCall(cl,foo,B=B/k))
user system elapsed
0.007 0.001 424.004
> stopCluster(cl)
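Note that clusterCall() returns a list with one element per node (here, a vector of B/k replicates from each slave), so the results still have to be combined on the master. A minimal sketch, reusing B and k from above; the clusterSetupRNG() call is snow's interface for giving each slave an independent random number stream (it requires the rlecuyer or rsprng add-on package), which is advisable whenever the slaves generate pseudo random numbers:
> cl=makeCluster(k)
> clusterSetupRNG(cl)              # independent RNG streams on each slave
> res=clusterCall(cl,foo,B=B/k)    # list of k vectors, each of length B/k
> mads=unlist(res)                 # single vector of B replicates
> length(mads)                     # 16000
> stopCluster(cl)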
Example (four nodes)
Next, try it using four nodes:
> k=4
> cl=makeCluster(k)
4 slaves are spawned successfully. 0 failed.
> unix.time(clusterCall(cl,foo,B=B/k))
user system elapsed
0.012 0.001 217.528
> stopCluster(cl)
Example (eight nodes)
Next, try it using eight nodes:
> k=8
> cl=makeCluster(k)
8 slaves are spawned successfully. 0 failed.
> unix.time(clusterCall(cl,foo,B=B/k))
user system elapsed
0.019 0.002 129.770
> stopCluster(cl)
Example (sixteen nodes)
Finally, let us try using all 16 cores on bb16a:
> k=16
> cl=makeCluster(k)
16 slaves are spawned successfully. 0 failed.
> unix.time(clusterCall(cl,foo,B=B/k))
user system elapsed
0.043 0.001 90.829
> stopCluster(cl)
Example (summary)
nodes  minutes   cmp
    1     13.5  13.5
    2      7.1   6.8
    4      3.6   3.4
    8      2.2   1.7
   16      1.5   0.8
(minutes = observed elapsed time in minutes; cmp = ideal time under perfect linear speedup, 13.5/nodes)
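Up to rounding, the table can be reproduced in R from the elapsed times reported above; a minimal sketch:
> nodes=c(1,2,4,8,16)
> elapsed=c(814.696,424.004,217.528,129.770,90.829)  # seconds, from the runs above
> minutes=round(elapsed/60,1)
> cmp=round(minutes[1]/nodes,1)    # ideal time under perfect linear speedup
> data.frame(nodes,minutes,cmp)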
Concluding remarks
- The actual speedup depends on many factors (MPI overhead, system resource availability, etc.)
- Most importantly, it will depend on the fraction of the job that is serialized (cf. Amdahl's law;
see the sketch after this list)
- The kind of parallelization considered in the example works best for so-called embarrassingly parallel problems
(e.g., permutation resampling)
- You can use the lamhalt command (from the shell) to kill all LAM jobs
- The Intel compilers provide OpenMP parallelization support (gcc 4.2 provides this as well)
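To illustrate Amdahl's law: if a fraction s of the job is serial, the speedup on N nodes is bounded by 1/(s + (1-s)/N). A minimal sketch that estimates s from the measured 16-node run and predicts the speedup at the other node counts (the algebra simply inverts Amdahl's formula; the numbers are the elapsed times from the runs above):
> amdahl=function(s,N){1/(s+(1-s)/N)}
> S16=814.696/90.829       # observed speedup on 16 nodes (about 9)
> s=(16/S16-1)/(16-1)      # invert amdahl(s,16)=S16; gives s of about 0.05
> amdahl(s,c(2,4,8,16))    # predicted speedups at 2, 4, 8 and 16 nodes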