In [1]:
import numpy as np
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

%load_ext version_information
%load_ext rpy2.ipython

Getting Started With Graphics

When constructing a visualization, we are mapping data elements (annotation, categories, numbers, time series) to visual elements (coordinates, color, size, movement). So the first step in designing a graphic is to decide on

  • a data set (usually a DataFrame with observations in rows and variables in columns)
  • a mapping (e.g. map height to x-coordinate, weight to y-coordinate, age to color)
  • the type of plot(s) desired
    • e.g., bar chart or box plot
    • several types can sometimes be overlaid e.g., rug plot on density plot

After that, we can customize the visual elements in several ways

  • direct setting of visual element attributes (size, thickness, color, transparency)
  • adding labels (title, subtitle, x-axis label, y-axis label)a
  • adding guides (legend, color bar)
  • adding annotations (text labels, arrows)
  • changing coordinate systems (Cartesian to polar, linear to log)
  • changing color scales (color palettes and color maps)
  • changing graphic extents (minimum and maximum values displayed)

For global changes to the look and feel of visual elements, we can set styles or themes that simultaneously alter many graphical aspects - background and foreground colors, color scheme, font family used etc.

Sometimes, we need to display multiple plots in a single graphic. To do so, we create a layout that specifies how different plots are related to each other (relative size, sharing of axes). There are two kinds of layouts

  • plots are related (i.e. belong to the same data set and type, differ only in choice or subgroup of data elements presented)
  • plots are unrelated

Finally, we often need to save the graphic to a file for later viewing or inclusion in a report.

Resources

Data Set

We will load an example data set for the pulse rate after exercise in an experiment with the following variables

  • diet
    • low fat
    • no fat
  • duration of exercise
    • 1 min
    • 15 min
    • 30 min
  • condition
    • rest
    • walking
    • running
In [2]:
exercise = sns.load_dataset("exercise", index_col = 0)
In [3]:
exercise.head()
Out[3]:
id diet pulse time kind
0 1 low fat 85 1 min rest
1 1 low fat 85 15 min rest
2 1 low fat 88 30 min rest
3 2 low fat 90 1 min rest
4 2 low fat 92 15 min rest

Mapping

Suppose we are interested in the change in pulse after running for 1, 15 or 30 minutes, and also if the type of diet makes a difference. One way to do this is to map time to the x-coordinate, pulse to the y-coordinate and diet to the color. We do this with several types of plots.

In [4]:
df = exercise[exercise.kind == 'running']
df.head()
Out[4]:
id diet pulse time kind
60 21 low fat 93 1 min running
61 21 low fat 98 15 min running
62 21 low fat 110 30 min running
63 22 low fat 98 1 min running
64 22 low fat 104 15 min running
In [5]:
sns.barplot(x = 'time', y = 'pulse', hue = 'diet', data = df)
pass
_images/Getting_Started_With_Graphics_8_0.png
In [6]:
sns.boxplot(x = 'time', y = 'pulse', hue = 'diet', data = df)
pass
_images/Getting_Started_With_Graphics_9_0.png

Setting visual elements

Change transparency

In [7]:
sns.barplot(x = 'time', y = 'pulse', hue = 'diet', data = df,
            alpha = 0.5)
pass
_images/Getting_Started_With_Graphics_12_0.png

Add title

In [8]:
sns.barplot(x = 'time', y = 'pulse', hue = 'diet', data = df,)
plt.title('Running')
pass
_images/Getting_Started_With_Graphics_14_0.png

Change coordinate system

In [9]:
sns.barplot(x = 'time', y = 'pulse', hue = 'diet', data = df)
plt.gca().set(yscale = "log")
pass
_images/Getting_Started_With_Graphics_16_0.png

Change color palette

In [10]:
sns.barplot(x = 'time', y = 'pulse', hue = 'diet', data = df,
            palette = 'RdBu_r')
pass
_images/Getting_Started_With_Graphics_18_0.png

Multiple plots

Suppose we want to see the results for all types of exercise, not just running. We can use multiple plots to achieve this.

In [11]:
sns.factorplot(x = 'time', y = 'pulse', hue = 'diet', col = 'kind',
               kind = 'point', data = exercise)
pass
_images/Getting_Started_With_Graphics_20_0.png

Global changes with themes

In [12]:
sns.set_context("notebook", font_scale=1.0)
sns.barplot(x = 'time', y = 'pulse', hue = 'diet', data = df)
pass
_images/Getting_Started_With_Graphics_22_0.png
In [13]:
sns.set_context("paper", font_scale=2.0)
sns.set_style('white')
sns.barplot(x = 'time', y = 'pulse', hue = 'diet', data = df)
pass
_images/Getting_Started_With_Graphics_23_0.png
In [14]:
sns.set()
sns.set_context("notebook", font_scale = 1.5)

Saving plots

Preferred

In [15]:
g = sns.factorplot(x = 'time', y = 'pulse', hue = 'diet', col = 'kind',
               kind = 'bar', palette = 'bright', data = exercise)
g.savefig('figs/exercise1.png')
_images/Getting_Started_With_Graphics_28_0.png

Alternative

In [16]:
sns.factorplot(x = 'time', y = 'pulse', hue = 'diet', col = 'kind',
               kind = 'bar', palette = 'RdBu_r', data = exercise)
plt.savefig('figs/exercise2.png')
_images/Getting_Started_With_Graphics_30_0.png

Check

In [17]:
from IPython.display import Image
In [18]:
Image('figs/exercise1.png')
Out[18]:
_images/Getting_Started_With_Graphics_33_0.png
In [19]:
Image('figs/exercise2.png')
Out[19]:
_images/Getting_Started_With_Graphics_34_0.png

Version Information

In [20]:
%load_ext version_information
%version_information
The version_information extension is already loaded. To reload it, use:
  %reload_ext version_information
Out[20]:
SoftwareVersion
Python3.5.2 64bit [GCC 4.2.1 Compatible Apple LLVM 4.2 (clang-425.0.28)]
IPython5.0.0
OSDarwin 15.6.0 x86_64 i386 64bit
Tue Aug 16 09:04:30 2016 EDT
In [ ]: