Homework 02

Write code to solve all problems. The grading rubric includes the following criteria:

  • Correctness
  • Readability
  • Efficiency

Please do not copy answwrs found on the web or elsewhere as it will not benefit your learning. Searching the web for general references etc is OK. Some discussion with friends is fine too - but again, do not just copy thier answer.

Honor Code: By submitting this assignment, you certify that this is your origianl work.

Q1. (15 pts) Exercise 1. Write a 12 by 12 times table matrix shwon below. Do this

  • using nested for loops

  • uisng numpy fromfunction array constructor

  • using numpy broadcasting

    array([[  1,   2,   3,   4,   5,   6,   7,   8,   9,  10,  11,  12],
       [  2,   4,   6,   8,  10,  12,  14,  16,  18,  20,  22,  24],
       [  3,   6,   9,  12,  15,  18,  21,  24,  27,  30,  33,  36],
       [  4,   8,  12,  16,  20,  24,  28,  32,  36,  40,  44,  48],
       [  5,  10,  15,  20,  25,  30,  35,  40,  45,  50,  55,  60],
       [  6,  12,  18,  24,  30,  36,  42,  48,  54,  60,  66,  72],
       [  7,  14,  21,  28,  35,  42,  49,  56,  63,  70,  77,  84],
       [  8,  16,  24,  32,  40,  48,  56,  64,  72,  80,  88,  96],
       [  9,  18,  27,  36,  45,  54,  63,  72,  81,  90,  99, 108],
       [ 10,  20,  30,  40,  50,  60,  70,  80,  90, 100, 110, 120],
       [ 11,  22,  33,  44,  55,  66,  77,  88,  99, 110, 121, 132],
       [ 12,  24,  36,  48,  60,  72,  84,  96, 108, 120, 132, 144]])
    
In [1]:
import numpy as np

ns = np.arange(1, 13)
In [2]:
n = len(ns)
m = np.empty((n, n), dtype='int')
for i, x in enumerate(ns):
    for j, y in enumerate(ns):
        m[i, j] = x*y
m
Out[2]:
array([[  1,   2,   3,   4,   5,   6,   7,   8,   9,  10,  11,  12],
       [  2,   4,   6,   8,  10,  12,  14,  16,  18,  20,  22,  24],
       [  3,   6,   9,  12,  15,  18,  21,  24,  27,  30,  33,  36],
       [  4,   8,  12,  16,  20,  24,  28,  32,  36,  40,  44,  48],
       [  5,  10,  15,  20,  25,  30,  35,  40,  45,  50,  55,  60],
       [  6,  12,  18,  24,  30,  36,  42,  48,  54,  60,  66,  72],
       [  7,  14,  21,  28,  35,  42,  49,  56,  63,  70,  77,  84],
       [  8,  16,  24,  32,  40,  48,  56,  64,  72,  80,  88,  96],
       [  9,  18,  27,  36,  45,  54,  63,  72,  81,  90,  99, 108],
       [ 10,  20,  30,  40,  50,  60,  70,  80,  90, 100, 110, 120],
       [ 11,  22,  33,  44,  55,  66,  77,  88,  99, 110, 121, 132],
       [ 12,  24,  36,  48,  60,  72,  84,  96, 108, 120, 132, 144]])
In [3]:
np.fromfunction(lambda i, j: (i+1)*(j+1), shape=(12,12)).astype('int')
Out[3]:
array([[  1,   2,   3,   4,   5,   6,   7,   8,   9,  10,  11,  12],
       [  2,   4,   6,   8,  10,  12,  14,  16,  18,  20,  22,  24],
       [  3,   6,   9,  12,  15,  18,  21,  24,  27,  30,  33,  36],
       [  4,   8,  12,  16,  20,  24,  28,  32,  36,  40,  44,  48],
       [  5,  10,  15,  20,  25,  30,  35,  40,  45,  50,  55,  60],
       [  6,  12,  18,  24,  30,  36,  42,  48,  54,  60,  66,  72],
       [  7,  14,  21,  28,  35,  42,  49,  56,  63,  70,  77,  84],
       [  8,  16,  24,  32,  40,  48,  56,  64,  72,  80,  88,  96],
       [  9,  18,  27,  36,  45,  54,  63,  72,  81,  90,  99, 108],
       [ 10,  20,  30,  40,  50,  60,  70,  80,  90, 100, 110, 120],
       [ 11,  22,  33,  44,  55,  66,  77,  88,  99, 110, 121, 132],
       [ 12,  24,  36,  48,  60,  72,  84,  96, 108, 120, 132, 144]])
In [4]:
ns[:, None] * ns[None, :]
Out[4]:
array([[  1,   2,   3,   4,   5,   6,   7,   8,   9,  10,  11,  12],
       [  2,   4,   6,   8,  10,  12,  14,  16,  18,  20,  22,  24],
       [  3,   6,   9,  12,  15,  18,  21,  24,  27,  30,  33,  36],
       [  4,   8,  12,  16,  20,  24,  28,  32,  36,  40,  44,  48],
       [  5,  10,  15,  20,  25,  30,  35,  40,  45,  50,  55,  60],
       [  6,  12,  18,  24,  30,  36,  42,  48,  54,  60,  66,  72],
       [  7,  14,  21,  28,  35,  42,  49,  56,  63,  70,  77,  84],
       [  8,  16,  24,  32,  40,  48,  56,  64,  72,  80,  88,  96],
       [  9,  18,  27,  36,  45,  54,  63,  72,  81,  90,  99, 108],
       [ 10,  20,  30,  40,  50,  60,  70,  80,  90, 100, 110, 120],
       [ 11,  22,  33,  44,  55,  66,  77,  88,  99, 110, 121, 132],
       [ 12,  24,  36,  48,  60,  72,  84,  96, 108, 120, 132, 144]])

Q2. (25 points) In this exercise, we will practice using Pandas dataframes to explore and summarize a data set heart.

This data contains the survival time after receiving a heart transplant, the age of the patient and whether or not the survival time was censored

  • Number of Observations - 69
  • Number of Variables - 3

Variable name definitions:

-  survival - Days after surgery until death
-  censors - indicates if an observation is censored. 1 is uncensored
-  age - age at the time of surgery

Answer the following questions with respect to the heart data set:

  • How many patients were censored?
  • What is the correlation coefficient between age and survival for uncensored patients?
  • What is the average age for censored and uncensored patients?
  • What is the average survival time for censored and uncensored patients under the age of 45?
  • What is the survival time of the youngest and oldest uncensored patient?
In [5]:
import statsmodels.api as sm
heart = sm.datasets.heart.load_pandas().data
heart.head(n=6)
Out[5]:
survival censors age
0 15 1 54.3
1 3 1 40.4
2 624 1 51.0
3 46 1 42.5
4 127 1 48.0
5 64 1 54.6
In [6]:
# How many patients were censored?

print('# censroed', sum(heart.censors == 0), '\n')

# What is the correlation coefficient between age and survival for uncensored patients?

uncensored = heart[heart.censors == 1]
print('Correlation coefficient', np.corrcoef(uncensored.age, uncensored.survival)[0,1], '\n')

# What is the average age for censored and uncensored patients?

print(heart.groupby('censors')['age'].mean(), '\n')

# What is the average survival time for censored and uncensored patients under the age of 45?

young = heart[heart.age < 45]
print(young.groupby('censors')['survival'].mean(), '\n')

# What is the survival time of the youngest and oldest uncensored patient?

print('Survival of youngest', uncensored.survival[np.argmin(uncensored.age)])
print('Survival of olderst', uncensored.survival[np.argmax(uncensored.age)])
# censroed 24

Correlation coefficient 0.00325649928321

censors
0    41.729167
1    48.484444
Name: age, dtype: float64

censors
0    712.818182
1    169.909091
Name: survival, dtype: float64

Survival of youngest 228.0
Survival of olderst 60.0

Q3. (40 pts)

Given matrix \(M\)

[[7, 8, 8],
 [1, 3, 8],
 [9, 2, 1]]
  • Normalize the given matrix \(M\) so that all rows sum to 1.0. This cna then be considered as a transition matrix \(P\) for a Markov chain. Find the stationary distribution of this matrix in the following ways using numpy and numpy.linalg (or scipy.linalg):

  • By raising the matrix \(P\) to some large power unitl it doesn’t change with higher powers (see np.linalg.matrix_power) and then calculating \(vP\)

  • From the equation for stationarity \(wP = w\), we can see that \(w\) must be a left eigenvector of \(P\) with eigenvalue \(1\) (Note: np.linalg.eig returns the right eigenvectors, but the left eighenvector of a matrix is the right eigenvector of the transposed matrix). Use this to find \(w\) using np.linalg.eig.

  • Suppose \(w = (w_1, w_2, w_3)\). Then from \(wP = w\), we have:

    \[ \begin{align}\begin{aligned}:nowrap:\\\begin{split} \begin{align} w_1 P_{11} + w_2 P_{21} + w_3 P_{31} &= w_1 \\ w_1 P_{12} + w_2 P_{22} + w_3 P_{32} &= w_2 \\ w_1 P_{13} + w_2 P_{23} + w_3 P_{331} &= w_3 \\ \end{align}\end{split}\\This is a singular system, but we also know that\end{aligned}\end{align} \]

    \(w_1 + w_2 + w_3 = 1\). Use these facts to set up a linear system of equations that can be solved with np.linalg.solve to find \(w\).

Part 1

In [7]:
M = np.array([7,8,8,1,3,8,9,2,1.0]).reshape(3,3)
M
Out[7]:
array([[ 7.,  8.,  8.],
       [ 1.,  3.,  8.],
       [ 9.,  2.,  1.]])
In [8]:
# Normalize the matrix so that rows sum to 1
P = M/np.sum(M, 1)[:, np.newaxis]
P
Out[8]:
array([[ 0.30434783,  0.34782609,  0.34782609],
       [ 0.08333333,  0.25      ,  0.66666667],
       [ 0.75      ,  0.16666667,  0.08333333]])
In [9]:
v = np.random.random(3)
v /= v.sum()
v
Out[9]:
array([ 0.58998219,  0.31194083,  0.09807698])

Part 2: Brute force

In [10]:
# By raising the matrix $P$ to some large power
P50 = np.linalg.matrix_power(P, 50)
P51 = np.dot(P50, P)
# check that P50 is stationary
np.testing.assert_allclose(P50, P51)
np.dot(v, P51)
Out[10]:
array([ 0.39862184,  0.2605972 ,  0.34078096])
In [11]:
import scipy.linalg as la

Part 3: Option 1: Using scipy.linalg

In [12]:
lam, vec = la.eig(P, left=True, right=False)
idx = np.argmin(np.abs(lam - 1))
w = np.real(vec[:, idx])
w/w.sum()
Out[12]:
array([ 0.39862184,  0.2605972 ,  0.34078096])

Part 3: Option 2: Using numpy.linalg with transpose to get the left eigenvectors

In [13]:
# Left eigenvector with eigenvalue 1
# note transpose of P to find left eigenvectors
eigenvalues, eigenvectors = np.linalg.eig(P.T)
# find index of eigenvalue = 1
idx = np.argmin(np.abs(eigenvalues - 1))
w = np.real(eigenvectors[:, idx]).T
# remember to normalize eigenvector to get a probability distribution
w/np.sum(w)
Out[13]:
array([ 0.39862184,  0.2605972 ,  0.34078096])

Part 4

We want to solve

\begin{align} w_1 P_{11} + w_2 P_{21} + w_3 P_{31} &= w_1 \\ w_1 P_{12} + w_2 P_{22} + w_3 P_{32} &= w_2 \\ w_1 + w_2 + w_3 &= 1 \end{align}

In matrix form, we have

\begin{pmatrix} P_{11} - 1 & P_{21} & P_{31}\\ P_{12} & P_{22} -1 & P_{32} \\ 1 & 1 & 1 \end{pmatrix} \begin{pmatrix} w_1 \\ w_2 \\ w_3 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}

Knowing this, it is just a matter of setting up the system for np.linalg.solve to work.

In [14]:
# Solving linear system
# Use the first 2 rows of  the matrix P - I = 0
# and construnct the last row from $w_1 + w_2 + w_3 = 1$
A = P.T - np.eye(3)
A[2] = [1,1,1]
np.linalg.solve(A, [0,0,1])
Out[14]:
array([ 0.39862184,  0.2605972 ,  0.34078096])

Q4. (20 pts) Write code to replicate the following plot using seaborn.

In [16]:
%matplotlib inline
import seaborn as sns
iris = sns.load_dataset('iris')
iris.head()
Out[16]:
sepal_length sepal_width petal_length petal_width species
0 5.1 3.5 1.4 0.2 setosa
1 4.9 3.0 1.4 0.2 setosa
2 4.7 3.2 1.3 0.2 setosa
3 4.6 3.1 1.5 0.2 setosa
4 5.0 3.6 1.4 0.2 setosa
In [17]:
sns.set_context("notebook", font_scale=1.3)
sns.pairplot(iris, hue="species", kind='reg', diag_kind='kde', palette='dark')
pass
../_images/homework_Homework02_Solutions_26_0.png