# Homework 02¶

Write code to solve all problems. The grading rubric includes the following criteria:

- Correctness
- Readability
- Efficiency

Please do not copy answwrs found on the web or elsewhere as it will not benefit your learning. Searching the web for general references etc is OK. Some discussion with friends is fine too - but again, do not just copy thier answer.

**Honor Code: By submitting this assignment, you certify that this is
your origianl work.**

**Q1. (15 pts)** Exercise 1. Write a 12 by 12 times table matrix shwon
below. Do this

using nested for loops

uisng numpy fromfunction array constructor

using numpy broadcasting

array([[ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12], [ 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24], [ 3, 6, 9, 12, 15, 18, 21, 24, 27, 30, 33, 36], [ 4, 8, 12, 16, 20, 24, 28, 32, 36, 40, 44, 48], [ 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60], [ 6, 12, 18, 24, 30, 36, 42, 48, 54, 60, 66, 72], [ 7, 14, 21, 28, 35, 42, 49, 56, 63, 70, 77, 84], [ 8, 16, 24, 32, 40, 48, 56, 64, 72, 80, 88, 96], [ 9, 18, 27, 36, 45, 54, 63, 72, 81, 90, 99, 108], [ 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120], [ 11, 22, 33, 44, 55, 66, 77, 88, 99, 110, 121, 132], [ 12, 24, 36, 48, 60, 72, 84, 96, 108, 120, 132, 144]])

```
In [1]:
```

```
import numpy as np
ns = np.arange(1, 13)
```

```
In [2]:
```

```
n = len(ns)
m = np.empty((n, n), dtype='int')
for i, x in enumerate(ns):
for j, y in enumerate(ns):
m[i, j] = x*y
m
```

```
Out[2]:
```

```
array([[ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
[ 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24],
[ 3, 6, 9, 12, 15, 18, 21, 24, 27, 30, 33, 36],
[ 4, 8, 12, 16, 20, 24, 28, 32, 36, 40, 44, 48],
[ 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60],
[ 6, 12, 18, 24, 30, 36, 42, 48, 54, 60, 66, 72],
[ 7, 14, 21, 28, 35, 42, 49, 56, 63, 70, 77, 84],
[ 8, 16, 24, 32, 40, 48, 56, 64, 72, 80, 88, 96],
[ 9, 18, 27, 36, 45, 54, 63, 72, 81, 90, 99, 108],
[ 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120],
[ 11, 22, 33, 44, 55, 66, 77, 88, 99, 110, 121, 132],
[ 12, 24, 36, 48, 60, 72, 84, 96, 108, 120, 132, 144]])
```

```
In [3]:
```

```
np.fromfunction(lambda i, j: (i+1)*(j+1), shape=(12,12)).astype('int')
```

```
Out[3]:
```

```
array([[ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
[ 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24],
[ 3, 6, 9, 12, 15, 18, 21, 24, 27, 30, 33, 36],
[ 4, 8, 12, 16, 20, 24, 28, 32, 36, 40, 44, 48],
[ 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60],
[ 6, 12, 18, 24, 30, 36, 42, 48, 54, 60, 66, 72],
[ 7, 14, 21, 28, 35, 42, 49, 56, 63, 70, 77, 84],
[ 8, 16, 24, 32, 40, 48, 56, 64, 72, 80, 88, 96],
[ 9, 18, 27, 36, 45, 54, 63, 72, 81, 90, 99, 108],
[ 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120],
[ 11, 22, 33, 44, 55, 66, 77, 88, 99, 110, 121, 132],
[ 12, 24, 36, 48, 60, 72, 84, 96, 108, 120, 132, 144]])
```

```
In [4]:
```

```
ns[:, None] * ns[None, :]
```

```
Out[4]:
```

```
array([[ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
[ 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24],
[ 3, 6, 9, 12, 15, 18, 21, 24, 27, 30, 33, 36],
[ 4, 8, 12, 16, 20, 24, 28, 32, 36, 40, 44, 48],
[ 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60],
[ 6, 12, 18, 24, 30, 36, 42, 48, 54, 60, 66, 72],
[ 7, 14, 21, 28, 35, 42, 49, 56, 63, 70, 77, 84],
[ 8, 16, 24, 32, 40, 48, 56, 64, 72, 80, 88, 96],
[ 9, 18, 27, 36, 45, 54, 63, 72, 81, 90, 99, 108],
[ 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120],
[ 11, 22, 33, 44, 55, 66, 77, 88, 99, 110, 121, 132],
[ 12, 24, 36, 48, 60, 72, 84, 96, 108, 120, 132, 144]])
```

**Q2. (25 points)** In this exercise, we will practice using Pandas
dataframes to explore and summarize a data set `heart`

.

This data contains the survival time after receiving a heart transplant, the age of the patient and whether or not the survival time was censored

- Number of Observations - 69
- Number of Variables - 3

Variable name definitions:

```
- survival - Days after surgery until death
- censors - indicates if an observation is censored. 1 is uncensored
- age - age at the time of surgery
```

Answer the following questions with respect to the `heart`

data set:

- How many patients were censored?
- What is the correlation coefficient between age and survival for uncensored patients?
- What is the average age for censored and uncensored patients?
- What is the average survival time for censored and uncensored patients under the age of 45?
- What is the survival time of the youngest and oldest uncensored patient?

```
In [5]:
```

```
import statsmodels.api as sm
heart = sm.datasets.heart.load_pandas().data
heart.head(n=6)
```

```
Out[5]:
```

survival | censors | age | |
---|---|---|---|

0 | 15 | 1 | 54.3 |

1 | 3 | 1 | 40.4 |

2 | 624 | 1 | 51.0 |

3 | 46 | 1 | 42.5 |

4 | 127 | 1 | 48.0 |

5 | 64 | 1 | 54.6 |

```
In [6]:
```

```
# How many patients were censored?
print('# censroed', sum(heart.censors == 0), '\n')
# What is the correlation coefficient between age and survival for uncensored patients?
uncensored = heart[heart.censors == 1]
print('Correlation coefficient', np.corrcoef(uncensored.age, uncensored.survival)[0,1], '\n')
# What is the average age for censored and uncensored patients?
print(heart.groupby('censors')['age'].mean(), '\n')
# What is the average survival time for censored and uncensored patients under the age of 45?
young = heart[heart.age < 45]
print(young.groupby('censors')['survival'].mean(), '\n')
# What is the survival time of the youngest and oldest uncensored patient?
print('Survival of youngest', uncensored.survival[np.argmin(uncensored.age)])
print('Survival of olderst', uncensored.survival[np.argmax(uncensored.age)])
```

```
# censroed 24
Correlation coefficient 0.00325649928321
censors
0 41.729167
1 48.484444
Name: age, dtype: float64
censors
0 712.818182
1 169.909091
Name: survival, dtype: float64
Survival of youngest 228.0
Survival of olderst 60.0
```

**Q3. (40 pts)**

Given matrix \(M\)

```
[[7, 8, 8],
[1, 3, 8],
[9, 2, 1]]
```

Normalize the given matrix \(M\) so that all rows sum to 1.0. This cna then be considered as a transition matrix \(P\) for a Markov chain. Find the stationary distribution of this matrix in the following ways using

`numpy`

and`numpy.linalg`

(or`scipy.linalg`

):By raising the matrix \(P\) to some large power unitl it doesn’t change with higher powers (see

`np.linalg.matrix_power`

) and then calculating \(vP\)From the equation for stationarity \(wP = w\), we can see that \(w\) must be a left eigenvector of \(P\) with eigenvalue \(1\) (Note: np.linalg.eig returns the right eigenvectors, but the left eighenvector of a matrix is the right eigenvector of the transposed matrix). Use this to find \(w\) using

`np.linalg.eig`

.Suppose \(w = (w_1, w_2, w_3)\). Then from \(wP = w\), we have:

\[ \begin{align}\begin{aligned}:nowrap:\\\begin{split} \begin{align} w_1 P_{11} + w_2 P_{21} + w_3 P_{31} &= w_1 \\ w_1 P_{12} + w_2 P_{22} + w_3 P_{32} &= w_2 \\ w_1 P_{13} + w_2 P_{23} + w_3 P_{331} &= w_3 \\ \end{align}\end{split}\\This is a singular system, but we also know that\end{aligned}\end{align} \]\(w_1 + w_2 + w_3 = 1\). Use these facts to set up a linear system of equations that can be solved with

`np.linalg.solve`

to find \(w\).

## Part 1¶

```
In [7]:
```

```
M = np.array([7,8,8,1,3,8,9,2,1.0]).reshape(3,3)
M
```

```
Out[7]:
```

```
array([[ 7., 8., 8.],
[ 1., 3., 8.],
[ 9., 2., 1.]])
```

```
In [8]:
```

```
# Normalize the matrix so that rows sum to 1
P = M/np.sum(M, 1)[:, np.newaxis]
P
```

```
Out[8]:
```

```
array([[ 0.30434783, 0.34782609, 0.34782609],
[ 0.08333333, 0.25 , 0.66666667],
[ 0.75 , 0.16666667, 0.08333333]])
```

```
In [9]:
```

```
v = np.random.random(3)
v /= v.sum()
v
```

```
Out[9]:
```

```
array([ 0.58998219, 0.31194083, 0.09807698])
```

## Part 2: Brute force¶

```
In [10]:
```

```
# By raising the matrix $P$ to some large power
P50 = np.linalg.matrix_power(P, 50)
P51 = np.dot(P50, P)
# check that P50 is stationary
np.testing.assert_allclose(P50, P51)
np.dot(v, P51)
```

```
Out[10]:
```

```
array([ 0.39862184, 0.2605972 , 0.34078096])
```

```
In [11]:
```

```
import scipy.linalg as la
```

## Part 3: Option 1: Using scipy.linalg¶

```
In [12]:
```

```
lam, vec = la.eig(P, left=True, right=False)
idx = np.argmin(np.abs(lam - 1))
w = np.real(vec[:, idx])
w/w.sum()
```

```
Out[12]:
```

```
array([ 0.39862184, 0.2605972 , 0.34078096])
```

## Part 3: Option 2: Using numpy.linalg with transpose to get the left eigenvectors¶

```
In [13]:
```

```
# Left eigenvector with eigenvalue 1
# note transpose of P to find left eigenvectors
eigenvalues, eigenvectors = np.linalg.eig(P.T)
# find index of eigenvalue = 1
idx = np.argmin(np.abs(eigenvalues - 1))
w = np.real(eigenvectors[:, idx]).T
# remember to normalize eigenvector to get a probability distribution
w/np.sum(w)
```

```
Out[13]:
```

```
array([ 0.39862184, 0.2605972 , 0.34078096])
```

## Part 4¶

We want to solve

In matrix form, we have

Knowing this, it is just a matter of setting up the system for
`np.linalg.solve`

to work.

```
In [14]:
```

```
# Solving linear system
# Use the first 2 rows of the matrix P - I = 0
# and construnct the last row from $w_1 + w_2 + w_3 = 1$
A = P.T - np.eye(3)
A[2] = [1,1,1]
np.linalg.solve(A, [0,0,1])
```

```
Out[14]:
```

```
array([ 0.39862184, 0.2605972 , 0.34078096])
```

**Q4. (20 pts)** Write code to replicate the following plot using
`seaborn`

.

```
In [16]:
```

```
%matplotlib inline
import seaborn as sns
iris = sns.load_dataset('iris')
iris.head()
```

```
Out[16]:
```

sepal_length | sepal_width | petal_length | petal_width | species | |
---|---|---|---|---|---|

0 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |

1 | 4.9 | 3.0 | 1.4 | 0.2 | setosa |

2 | 4.7 | 3.2 | 1.3 | 0.2 | setosa |

3 | 4.6 | 3.1 | 1.5 | 0.2 | setosa |

4 | 5.0 | 3.6 | 1.4 | 0.2 | setosa |

```
In [17]:
```

```
sns.set_context("notebook", font_scale=1.3)
sns.pairplot(iris, hue="species", kind='reg', diag_kind='kde', palette='dark')
pass
```