In [1]:
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
Getting Started With Graphics¶
When constructing a visualization, we are mapping data elements (annotation, categories, numbers, time series) to visual elements (coordinates, color, size, movement). So the first step in designing a graphic is to decide on
- a data set (usually a
DataFrame
with observations in rows and variables in columns) - a mapping (e.g. map height to x-coordinate, weight to y-coordinate, age to color)
- the type of plot(s) desired
- e.g., bar chart or box plot
- several types can sometimes be overlaid e.g., rug plot on density plot
After that, we can customize the visual elements in several ways
- direct setting of visual element attributes (size, thickness, color, transparency)
- adding labels (title, subtitle, x-axis label, y-axis label)a
- adding guides (legend, color bar)
- adding annotations (text labels, arrows)
- changing coordinate systems (Cartesian to polar, linear to log)
- changing color scales (color palettes and color maps)
- changing graphic extents (minimum and maximum values displayed)
For global changes to the look and feel of visual elements, we can set styles or themes that simultaneously alter many graphical aspects - background and foreground colors, color scheme, font family used etc.
Sometimes, we need to display multiple plots in a single graphic. To do so, we create a layout that specifies how different plots are related to each other (relative size, sharing of axes). There are two kinds of layouts
- plots are related (i.e. belong to the same data set and type, differ only in choice or subgroup of data elements presented)
- plots are unrelated
Finally, we often need to save the graphic to a file for later viewing or inclusion in a report.
Resources¶
Data Set¶
We will load an example data set for the pulse rate after exercise in an experiment with the following variables
- diet
- low fat
- no fat
- duration of exercise
- 1 min
- 15 min
- 30 min
- condition
- rest
- walking
- running
In [2]:
exercise = sns.load_dataset("exercise", index_col = 0)
In [3]:
exercise.head()
Out[3]:
id | diet | pulse | time | kind | |
---|---|---|---|---|---|
0 | 1 | low fat | 85 | 1 min | rest |
1 | 1 | low fat | 85 | 15 min | rest |
2 | 1 | low fat | 88 | 30 min | rest |
3 | 2 | low fat | 90 | 1 min | rest |
4 | 2 | low fat | 92 | 15 min | rest |
Mapping¶
Suppose we are interested in the change in pulse after running for 1, 15 or 30 minutes, and also if the type of diet makes a difference. One way to do this is to map time to the x-coordinate, pulse to the y-coordinate and diet to the color. We do this with several types of plots.
In [4]:
df = exercise[exercise.kind == 'running']
df.head()
Out[4]:
id | diet | pulse | time | kind | |
---|---|---|---|---|---|
60 | 21 | low fat | 93 | 1 min | running |
61 | 21 | low fat | 98 | 15 min | running |
62 | 21 | low fat | 110 | 30 min | running |
63 | 22 | low fat | 98 | 1 min | running |
64 | 22 | low fat | 104 | 15 min | running |
Note: The pass
is just there to suppress extra unwanted output
from the graphics command, which gets displayed since it is the last
command in a cell. You can also suppress the output by using a
semi-colon ;
to terminate the expression:
sns.barplot(x = 'time', y = 'pulse', hue = 'diet', data = df);
In [5]:
sns.barplot(x = 'time', y = 'pulse', hue = 'diet', data = df)
pass
In [6]:
sns.boxplot(x = 'time', y = 'pulse', hue = 'diet', data = df)
pass
Setting visual elements¶
Change transparency¶
In [7]:
sns.barplot(x = 'time', y = 'pulse', hue = 'diet', data = df,
alpha = 0.5)
pass
Add title¶
In [8]:
sns.barplot(x = 'time', y = 'pulse', hue = 'diet', data = df)
plt.title('Running')
pass
Change coordinate system¶
In [9]:
sns.barplot(x = 'time', y = 'pulse', hue = 'diet', data = df)
plt.gca().set(yscale = "log")
pass
Change color palette¶
In [10]:
sns.barplot(x = 'time', y = 'pulse', hue = 'diet', data = df,
palette = 'RdBu_r')
pass
Multiple plots¶
Suppose we want to see the results for all types of exercise, not just running. We can use multiple plots to achieve this.
In [11]:
sns.factorplot(x = 'time', y = 'pulse', hue = 'diet', col = 'kind',
kind = 'point', data = exercise)
pass
Global changes with themes¶
In [12]:
sns.set_context("notebook", font_scale=1.0)
sns.barplot(x = 'time', y = 'pulse', hue = 'diet', data = df)
pass
In [13]:
sns.set_context("paper", font_scale=2.0)
sns.set_style('white')
sns.barplot(x = 'time', y = 'pulse', hue = 'diet', data = df)
pass
In [14]:
sns.set()
sns.set_context("notebook", font_scale = 1.5)
Saving plots¶
Preferred¶
In [15]:
g = sns.factorplot(x = 'time', y = 'pulse', hue = 'diet', col = 'kind',
kind = 'bar', palette = 'bright', data = exercise)
g.savefig('figs/exercise1.png')
Alternative¶
In [16]:
sns.factorplot(x = 'time', y = 'pulse', hue = 'diet', col = 'kind',
kind = 'bar', palette = 'RdBu_r', data = exercise)
plt.savefig('figs/exercise2.png')
Check¶
In [17]:
from IPython.display import Image
In [18]:
Image('figs/exercise1.png')
Out[18]:
In [19]:
Image('figs/exercise2.png')
Out[19]: