Setting up a local install of Jupyter with multiple kernels (Python 3.5, Python 2.7, R, Julia)

The only installation recommended for everyone is Anaconda with Python 3.5, so that you have a backup when the OIT version is flaky. The other kernels and the Docker version are not required; install them only if you are comfortable with command-line installs. Even the Anaconda 3.5 installation is optional if the OIT version works well for you.

Note: I have only tested this on OS X 10.11.2 (El Capitan) with Xcode and the command-line compilers installed.

To install Anaconda for Python 3.5

Download and install Anaconda Python 3.5 from https://www.continuum.io/downloads

Open a terminal

conda update conda
conda update anaconda
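
To check that the Anaconda install is the one your shell picks up, start python and inspect the version and path (a quick sanity check - the exact version string will differ):

import sys
print(sys.version)       # should mention 3.5 and Anaconda
print(sys.executable)    # should point into your anaconda directory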

(OPTIONAL) To install Python 2.7 as well

Open a terminal

conda create -n py27 python=2.7 anaconda
source activate py27
ipython kernel install
source deactivate
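
To confirm that the new kernel was registered, you can list the installed kernelspecs from Python (a minimal check using the jupyter_client API; the name 'python2' is the default chosen by ipython kernel install, but may differ):

from jupyter_client.kernelspec import KernelSpecManager

specs = KernelSpecManager().find_kernel_specs()
print(sorted(specs))     # expect something like ['python2', 'python3']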

(OPTIONAL) To install R

  • If you want conda to manage your R packages
conda install -y -c r r-irkernel r-recommended r-essentials

Note: the workaround below was needed for a packaging bug that appears to have been fixed, so you can probably skip it:
wget https://anaconda.org/r/ncurses/5.9/download/osx-64/ncurses-5.9-1.tar.bz2 \
        https://anaconda.org/r/nlopt/2.4.2/download/osx-64/nlopt-2.4.2-1.tar.bz2 \
    && conda install --yes ncurses-5.9-1.tar.bz2 nlopt-2.4.2-1.tar.bz2
  • If you have an existing R installation that you want to use

Start R

install.packages(c('rzmq','repr','IRkernel','IRdisplay'),
                 repos = c('http://irkernel.github.io/', getOption('repos')))
IRkernel::installspec()
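
Either way, you can check the R kernel without opening the notebook interface by running one statement through it from Python (a sketch using jupyter_client, assuming IRkernel registered itself under its default name 'ir'):

from jupyter_client.manager import start_new_kernel

km, kc = start_new_kernel(kernel_name='ir')    # start the R kernel
kc.execute('R.version.string')
while True:                                    # read IOPub messages until the result arrives
    msg = kc.get_iopub_msg(timeout=30)
    if msg['msg_type'] == 'execute_result':
        print(msg['content']['data']['text/plain'])
        break
kc.stop_channels()
km.shutdown_kernel()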

(OPTIONAL) To install Julia

Download and install Julia from http://julialang.org/downloads/

Start Julia

Pkg.add("IJulia")
Pkg.build("IJulia")
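
Pkg.build("IJulia") registers the kernel itself; you can confirm from Python that a kernelspec named after your Julia version showed up (the exact name, e.g. 'julia-0.4', depends on the version you installed):

from jupyter_client.kernelspec import KernelSpecManager

print([k for k in KernelSpecManager().find_kernel_specs() if k.startswith('julia')])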

(OPTIONAL) To install pyspark

Open a terminal

conda install -y -c anaconda-cluster spark
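
To check that pyspark is importable and can start a local context (assuming the conda package put pyspark on your Python path; depending on the package you may also need SPARK_HOME set):

import pyspark

sc = pyspark.SparkContext('local[2]')    # small local context, just for the check
print(sc.version)                        # the Spark version, e.g. 1.6.x
sc.stop()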

Check

Open a terminal

jupyter notebook

Check that the installed kernels appear in the New drop-down menu.

(OPTIONAL) Installing Spark via Docker

Install Docker Toolbox if you do not already have it, then launch the Docker Quickstart Terminal. Be patient - this can take a while the first time you do it.

When done, it should show something like this

                        ##         .
                  ## ## ##        ==
               ## ## ## ## ##    ===
           /"""""""""""""""""\___/ ===
      ~~~ {~~ ~~~~ ~~~ ~~~~ ~~~ ~ /  ===- ~~~
           \______ o           __/
             \    \         __/
              \____\_______/


docker is configured to use the default machine with IP 192.168.99.100
For help getting started, check out the docs at https://docs.docker.com

Note the IP address given - you will need this to access the notebook.

In the Docker terminal

docker run -d -p 8888:8888 jupyter/all-spark-notebook
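
Once the container is running, you can check from the host that the notebook server answers (a quick check with the standard library; substitute the IP your Docker terminal reported):

import urllib.request

resp = urllib.request.urlopen('http://192.168.99.100:8888')
print(resp.status)    # 200 means the notebook server is up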

For how to connect to a Spark cluster, see the official instructions.

Testing the Docker installation

Check by typing in the Docker terminal

docker ps

Be patient - pulling the image can take a while the first time you run it; the container will not appear until the download finishes.

It should show something like

CONTAINER ID        IMAGE                        COMMAND                  CREATED             STATUS              PORTS                    NAMES
965a6a80bf44        jupyter/all-spark-notebook   "tini -- start-notebo"   4 minutes ago       Up 4 minutes        0.0.0.0:8888->8888/tcp   big_kilby

Note the container name (mine is big_kilby, yours will likely be different).

Open your browser at the following URL http://192.168.99.100:8888 (Use the IP given above)

This should bring you to a Jupyter notebook. Open a Python 3 notebook from the drop-down menu and test:

import pyspark
sc = pyspark.SparkContext('local[*]')

# do something to prove it works
rdd = sc.parallelize(range(1000))
rdd.takeSample(False, 5)

If successful, you should get a list of 5 integers after a short delay.

Save and exit the notebook.

Clean up in the Docker terminal

docker stop big_kilby
exit

Use the container name found with docker ps in place of big_kilby.
