Homework 02¶
Write code to solve all problems. The grading rubric includes the following criteria:
- Correctness
- Readability
- Efficiency
Please do not copy answwrs found on the web or elsewhere as it will not benefit your learning. Searching the web for general references etc is OK. Some discussion with friends is fine too - but again, do not just copy thier answer.
Honor Code: By submitting this assignment, you certify that this is your origianl work.
Q1. (15 pts) Exercise 1. Write a 12 by 12 times table matrix shwon below. Do this
using nested for loops
uisng numpy fromfunction array constructor
using numpy broadcasting
array([[ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12], [ 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24], [ 3, 6, 9, 12, 15, 18, 21, 24, 27, 30, 33, 36], [ 4, 8, 12, 16, 20, 24, 28, 32, 36, 40, 44, 48], [ 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60], [ 6, 12, 18, 24, 30, 36, 42, 48, 54, 60, 66, 72], [ 7, 14, 21, 28, 35, 42, 49, 56, 63, 70, 77, 84], [ 8, 16, 24, 32, 40, 48, 56, 64, 72, 80, 88, 96], [ 9, 18, 27, 36, 45, 54, 63, 72, 81, 90, 99, 108], [ 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120], [ 11, 22, 33, 44, 55, 66, 77, 88, 99, 110, 121, 132], [ 12, 24, 36, 48, 60, 72, 84, 96, 108, 120, 132, 144]])
In [1]:
import numpy as np
ns = np.arange(1, 13)
In [2]:
n = len(ns)
m = np.empty((n, n), dtype='int')
for i, x in enumerate(ns):
for j, y in enumerate(ns):
m[i, j] = x*y
m
Out[2]:
array([[ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
[ 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24],
[ 3, 6, 9, 12, 15, 18, 21, 24, 27, 30, 33, 36],
[ 4, 8, 12, 16, 20, 24, 28, 32, 36, 40, 44, 48],
[ 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60],
[ 6, 12, 18, 24, 30, 36, 42, 48, 54, 60, 66, 72],
[ 7, 14, 21, 28, 35, 42, 49, 56, 63, 70, 77, 84],
[ 8, 16, 24, 32, 40, 48, 56, 64, 72, 80, 88, 96],
[ 9, 18, 27, 36, 45, 54, 63, 72, 81, 90, 99, 108],
[ 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120],
[ 11, 22, 33, 44, 55, 66, 77, 88, 99, 110, 121, 132],
[ 12, 24, 36, 48, 60, 72, 84, 96, 108, 120, 132, 144]])
In [3]:
np.fromfunction(lambda i, j: (i+1)*(j+1), shape=(12,12)).astype('int')
Out[3]:
array([[ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
[ 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24],
[ 3, 6, 9, 12, 15, 18, 21, 24, 27, 30, 33, 36],
[ 4, 8, 12, 16, 20, 24, 28, 32, 36, 40, 44, 48],
[ 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60],
[ 6, 12, 18, 24, 30, 36, 42, 48, 54, 60, 66, 72],
[ 7, 14, 21, 28, 35, 42, 49, 56, 63, 70, 77, 84],
[ 8, 16, 24, 32, 40, 48, 56, 64, 72, 80, 88, 96],
[ 9, 18, 27, 36, 45, 54, 63, 72, 81, 90, 99, 108],
[ 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120],
[ 11, 22, 33, 44, 55, 66, 77, 88, 99, 110, 121, 132],
[ 12, 24, 36, 48, 60, 72, 84, 96, 108, 120, 132, 144]])
In [4]:
ns[:, None] * ns[None, :]
Out[4]:
array([[ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
[ 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24],
[ 3, 6, 9, 12, 15, 18, 21, 24, 27, 30, 33, 36],
[ 4, 8, 12, 16, 20, 24, 28, 32, 36, 40, 44, 48],
[ 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60],
[ 6, 12, 18, 24, 30, 36, 42, 48, 54, 60, 66, 72],
[ 7, 14, 21, 28, 35, 42, 49, 56, 63, 70, 77, 84],
[ 8, 16, 24, 32, 40, 48, 56, 64, 72, 80, 88, 96],
[ 9, 18, 27, 36, 45, 54, 63, 72, 81, 90, 99, 108],
[ 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120],
[ 11, 22, 33, 44, 55, 66, 77, 88, 99, 110, 121, 132],
[ 12, 24, 36, 48, 60, 72, 84, 96, 108, 120, 132, 144]])
Q2. (25 points) In this exercise, we will practice using Pandas
dataframes to explore and summarize a data set heart
.
This data contains the survival time after receiving a heart transplant, the age of the patient and whether or not the survival time was censored
- Number of Observations - 69
- Number of Variables - 3
Variable name definitions:
- survival - Days after surgery until death
- censors - indicates if an observation is censored. 1 is uncensored
- age - age at the time of surgery
Answer the following questions with respect to the heart
data set:
- How many patients were censored?
- What is the correlation coefficient between age and survival for uncensored patients?
- What is the average age for censored and uncensored patients?
- What is the average survival time for censored and uncensored patients under the age of 45?
- What is the survival time of the youngest and oldest uncensored patient?
In [5]:
import statsmodels.api as sm
heart = sm.datasets.heart.load_pandas().data
heart.head(n=6)
Out[5]:
survival | censors | age | |
---|---|---|---|
0 | 15 | 1 | 54.3 |
1 | 3 | 1 | 40.4 |
2 | 624 | 1 | 51.0 |
3 | 46 | 1 | 42.5 |
4 | 127 | 1 | 48.0 |
5 | 64 | 1 | 54.6 |
In [6]:
# How many patients were censored?
print('# censroed', sum(heart.censors == 0), '\n')
# What is the correlation coefficient between age and survival for uncensored patients?
uncensored = heart[heart.censors == 1]
print('Correlation coefficient', np.corrcoef(uncensored.age, uncensored.survival)[0,1], '\n')
# What is the average age for censored and uncensored patients?
print(heart.groupby('censors')['age'].mean(), '\n')
# What is the average survival time for censored and uncensored patients under the age of 45?
young = heart[heart.age < 45]
print(young.groupby('censors')['survival'].mean(), '\n')
# What is the survival time of the youngest and oldest uncensored patient?
print('Survival of youngest', uncensored.survival[np.argmin(uncensored.age)])
print('Survival of olderst', uncensored.survival[np.argmax(uncensored.age)])
# censroed 24
Correlation coefficient 0.00325649928321
censors
0 41.729167
1 48.484444
Name: age, dtype: float64
censors
0 712.818182
1 169.909091
Name: survival, dtype: float64
Survival of youngest 228.0
Survival of olderst 60.0
Q3. (40 pts)
Given matrix \(M\)
[[7, 8, 8],
[1, 3, 8],
[9, 2, 1]]
Normalize the given matrix \(M\) so that all rows sum to 1.0. This cna then be considered as a transition matrix \(P\) for a Markov chain. Find the stationary distribution of this matrix in the following ways using
numpy
andnumpy.linalg
(orscipy.linalg
):By raising the matrix \(P\) to some large power unitl it doesn’t change with higher powers (see
np.linalg.matrix_power
) and then calculating \(vP\)From the equation for stationarity \(wP = w\), we can see that \(w\) must be a left eigenvector of \(P\) with eigenvalue \(1\) (Note: np.linalg.eig returns the right eigenvectors, but the left eighenvector of a matrix is the right eigenvector of the transposed matrix). Use this to find \(w\) using
np.linalg.eig
.Suppose \(w = (w_1, w_2, w_3)\). Then from \(wP = w\), we have:
\[ \begin{align}\begin{aligned}:nowrap:\\\begin{split} \begin{align} w_1 P_{11} + w_2 P_{21} + w_3 P_{31} &= w_1 \\ w_1 P_{12} + w_2 P_{22} + w_3 P_{32} &= w_2 \\ w_1 P_{13} + w_2 P_{23} + w_3 P_{331} &= w_3 \\ \end{align}\end{split}\\This is a singular system, but we also know that\end{aligned}\end{align} \]\(w_1 + w_2 + w_3 = 1\). Use these facts to set up a linear system of equations that can be solved with
np.linalg.solve
to find \(w\).
Part 1¶
In [7]:
M = np.array([7,8,8,1,3,8,9,2,1.0]).reshape(3,3)
M
Out[7]:
array([[ 7., 8., 8.],
[ 1., 3., 8.],
[ 9., 2., 1.]])
In [8]:
# Normalize the matrix so that rows sum to 1
P = M/np.sum(M, 1)[:, np.newaxis]
P
Out[8]:
array([[ 0.30434783, 0.34782609, 0.34782609],
[ 0.08333333, 0.25 , 0.66666667],
[ 0.75 , 0.16666667, 0.08333333]])
In [9]:
v = np.random.random(3)
v /= v.sum()
v
Out[9]:
array([ 0.58998219, 0.31194083, 0.09807698])
Part 2: Brute force¶
In [10]:
# By raising the matrix $P$ to some large power
P50 = np.linalg.matrix_power(P, 50)
P51 = np.dot(P50, P)
# check that P50 is stationary
np.testing.assert_allclose(P50, P51)
np.dot(v, P51)
Out[10]:
array([ 0.39862184, 0.2605972 , 0.34078096])
In [11]:
import scipy.linalg as la
Part 3: Option 1: Using scipy.linalg¶
In [12]:
lam, vec = la.eig(P, left=True, right=False)
idx = np.argmin(np.abs(lam - 1))
w = np.real(vec[:, idx])
w/w.sum()
Out[12]:
array([ 0.39862184, 0.2605972 , 0.34078096])
Part 3: Option 2: Using numpy.linalg with transpose to get the left eigenvectors¶
In [13]:
# Left eigenvector with eigenvalue 1
# note transpose of P to find left eigenvectors
eigenvalues, eigenvectors = np.linalg.eig(P.T)
# find index of eigenvalue = 1
idx = np.argmin(np.abs(eigenvalues - 1))
w = np.real(eigenvectors[:, idx]).T
# remember to normalize eigenvector to get a probability distribution
w/np.sum(w)
Out[13]:
array([ 0.39862184, 0.2605972 , 0.34078096])
Part 4¶
We want to solve
In matrix form, we have
Knowing this, it is just a matter of setting up the system for
np.linalg.solve
to work.
In [14]:
# Solving linear system
# Use the first 2 rows of the matrix P - I = 0
# and construnct the last row from $w_1 + w_2 + w_3 = 1$
A = P.T - np.eye(3)
A[2] = [1,1,1]
np.linalg.solve(A, [0,0,1])
Out[14]:
array([ 0.39862184, 0.2605972 , 0.34078096])
Q4. (20 pts) Write code to replicate the following plot using
seaborn
.
In [16]:
%matplotlib inline
import seaborn as sns
iris = sns.load_dataset('iris')
iris.head()
Out[16]:
sepal_length | sepal_width | petal_length | petal_width | species | |
---|---|---|---|---|---|
0 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
1 | 4.9 | 3.0 | 1.4 | 0.2 | setosa |
2 | 4.7 | 3.2 | 1.3 | 0.2 | setosa |
3 | 4.6 | 3.1 | 1.5 | 0.2 | setosa |
4 | 5.0 | 3.6 | 1.4 | 0.2 | setosa |
In [17]:
sns.set_context("notebook", font_scale=1.3)
sns.pairplot(iris, hue="species", kind='reg', diag_kind='kde', palette='dark')
pass
