In [1]:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import scipy.stats as stats
In [2]:
%load_ext dotmagic
Markov Chains¶
A Markov chain is a sequence of events in which the probability of the next event depends only on the current state. For example, we have previously encountered Markov chains in the random walk and the Google PageRank algorithm.
Example: Random walk and diffusion¶
In [3]:
n = 1000   # number of independent walkers
k = 100    # number of steps per walk
xs = np.c_[np.zeros(n), np.random.choice([-1,1], (n, k))]   # prepend the starting position 0 to each walk of ±1 steps
xs = np.cumsum(xs, axis=1)                                   # cumulative sum gives the displacement after each step
In [4]:
for x in xs:
    plt.plot(np.arange(k+1), x)
Snapshots of displacement after \(k\) steps follow normal distribution with \(\sigma = \sqrt{k}\)¶
In [5]:
fig, axes = plt.subplots(1, 5, figsize=(15, 3), sharex=True)
for i, k in enumerate(range(20, 101, 20)):
    axes[i].hist(xs[:, k], alpha=0.3, density=True)
    axes[i].set_title('t = %d' % k)
    xp = np.linspace(-25, 25, 100)
    axes[i].plot(xp, stats.norm(loc=0, scale=np.sqrt(k)).pdf(xp))
plt.tight_layout()
Properties of states and Markov chains¶
A Markov chain is irreducible if it is possible to get from any state to any state. Otherwise it is reducible.
A state has period \(k\) if any return to that state must occur in a number of steps that is a multiple of \(k\). If \(k=1\), the state is aperiodic. If all states are aperiodic, then the Markov chain is aperiodic.
If any state in an irreducible Markov chain is aperiodic, then all states are aperiodic.
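Whether a chain is irreducible can also be checked numerically from its transition matrix. Below is a minimal sketch (the helper is_irreducible is not part of the original notebook): a chain is irreducible exactly when every state can reach every other state, which we can test with powers of the adjacency structure. The example matrix corresponds to the first diagram below, with the edge probabilities assumed uniform since the diagram does not give them.
In [ ]:
def is_irreducible(P):
    """Check irreducibility of a transition matrix by testing pairwise reachability."""
    n = P.shape[0]
    adj = (P > 0).astype(float)                          # adjacency structure of the chain
    # (I + adj)^(n-1) has a positive (i, j) entry iff j is reachable from i in at most n-1 steps
    reach = np.linalg.matrix_power(np.eye(n) + adj, n - 1)
    return bool(np.all(reach > 0))

# Example: the reducible chain drawn below (A -> B, B -> C, C -> B, C -> C),
# with transition probabilities assumed uniform over the outgoing edges
P_reducible = np.array([
    [0. , 1. , 0. ],
    [0. , 0. , 1. ],
    [0. , 0.5, 0.5],
])
is_irreducible(P_reducible)   # False: no path leads back to the first state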
Reducible and aperiodic¶
A is a transient state. B and C are recurrent states.
In [6]:
%%dot
digraph g {
node [shape=circle]
A -> B
B -> C
C -> B
C -> C
}
Irreducible and periodic¶
A, B, and C are recurrent states.
In [7]:
%%dot
digraph g {
layout="circo";
node [shape=circle]
A -> B
B -> C
C -> A
}
Absorbing Markov chain¶
D is an absorbing state. A, B, and C are transient states. Since every state can reach an absorbing state, this is an absorbing Markov chain.
In [8]:
%%dot
digraph g {
node [shape=circle]
A -> B
B -> C
C -> B
C -> C
B -> D
}
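For an absorbing chain we can also compute the expected number of steps before absorption from the fundamental matrix \(N = (I - Q)^{-1}\), where \(Q\) restricts the transition matrix to the transient states. Here is a minimal sketch for the chain above, assuming each outgoing edge is taken with equal probability (the diagram does not specify the probabilities):
In [ ]:
# Transient states ordered A, B, C; edge probabilities assumed uniform
Q = np.array([
    [0. , 1. , 0. ],   # A -> B
    [0. , 0. , 0.5],   # B -> C (B -> D is the absorbing move)
    [0. , 0.5, 0.5],   # C -> B or C -> C
])
N = np.linalg.inv(np.eye(3) - Q)   # fundamental matrix
N @ np.ones(3)                     # expected steps to absorption starting from A, B, C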
Ergodic¶
An ergodic state is aperiodic and positive recurrent. If all states are ergodic, then we have an ergodic Markov chain. A finite chain that is aperiodic and irreducible is ergodic.
In [9]:
%%dot
digraph g {
layout="circo";
node [shape=circle]
A -> B
B -> C
C -> A
C -> C
}
Markov Chain¶
Suppose we have the following transition table:
|  | Raleigh | Chapel Hill | Durham |
|---|---|---|---|
| Raleigh | 0.9 | 0.05 | 0.05 |
| Chapel Hill | 0.1 | 0.8 | 0.1 |
| Durham | 0.04 | 0.01 | 0.95 |
This says that a resident of Raleigh this year has a 90% chance of staying in Raleigh and a 5% chance each of relocating to Chapel Hill or to Durham the following year. Note that the probabilities depend only on the current year. The transitions between states (Raleigh, Chapel Hill, and Durham) are then said to be modeled by a Markov chain.
If Raleigh, Chapel Hill and Durham each started with 300,000 residents, what is the long run distribution assuming no immigration or emigration?
In [10]:
%%dot
digraph g {
node [shape=box]
R [label="Raleigh"]
C [label="Chapel Hill"]
D [label="Duham"]
R -> R [label=0.9]
R -> C [label=0.05]
R -> D [label=0.05]
C -> R [label=0.1]
C -> C [label=0.8]
C -> D [label=0.1]
D -> R [label=0.04]
D -> C [label=0.01]
D -> D [label=0.95]
}
In [11]:
A = np.array([
[0.9, 0.05, 0.05],
[0.1, 0.8, 0.1],
[0.04, 0.01, 0.95]
])
In [12]:
x = np.array([300000, 300000, 300000]).reshape(-1,1)
Brute force solution¶
In [15]:
n = 100
x.T @ np.linalg.matrix_power(A, n)
Out[15]:
array([[300000.50469469, 100000.21203929, 499999.28326602]])
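Because this chain is ergodic, the long-run distribution does not depend on how the 900,000 residents are split initially. As a quick check (a sketch reusing the same \(A\) and \(n\)), starting everyone in Raleigh gives essentially the same answer:
In [ ]:
x_all_raleigh = np.array([900000, 0, 0]).reshape(-1, 1)
x_all_raleigh.T @ np.linalg.matrix_power(A, n)   # converges to roughly (300000, 100000, 500000)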
Eigenvector solution¶
At steady state, we have
\[p^T A = p^T\]
Taking the transpose on both sides, we see that
\[A^T p = p\]
suggesting that the steady state probability vector \(p\) is the normalized eigenvector of \(A^T\) with \(\lambda = 1\).
In [16]:
E, V = np.linalg.eig(A.T)
In [17]:
E
Out[17]:
array([0.76479203, 0.88520797, 1. ])
In [18]:
p = V[:, 2:] / V[:, 2:].sum()
p
Out[18]:
array([[0.33333333],
[0.11111111],
[0.55555556]])
In [19]:
p * x.sum()
Out[19]:
array([[300000.],
[100000.],
[500000.]])
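Hard-coding column index 2 works here only because the eigenvalue 1 happens to be last in E. A more robust variant (a sketch) selects the eigenvector whose eigenvalue is closest to 1 and then normalizes it:
In [ ]:
idx = np.argmin(np.abs(E - 1))        # locate the eigenvalue (numerically) equal to 1
p = np.real(V[:, idx])                # eigenvectors may come back with a zero imaginary part
p = p / p.sum()                       # rescale to a probability vector
p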
Linear system solution¶
We cannot solve \(A^T p - p = \mathbf{0}\) directly because \(A^T - I\) does not have full rank (each row of \(A\) sums to 1, so \((A - I)\mathbf{1} = \mathbf{0}\) and hence \(A^T - I\) is singular).
We need to add the constraint \(\sum_i p_i = 1\), and then we can solve the resulting linear system to find \(p\).
In [20]:
np.linalg.matrix_rank(A.T - np.eye(3))
Out[20]:
2
In [21]:
M = np.r_[A.T - np.eye(3), [np.ones(3)]]
b = np.r_[np.zeros(3), 1].reshape(-1,1)
In [22]:
M
Out[22]:
array([[-0.1 , 0.1 , 0.04],
[ 0.05, -0.2 , 0.01],
[ 0.05, 0.1 , -0.05],
[ 1. , 1. , 1. ]])
In [23]:
b
Out[23]:
array([[0.],
[0.],
[0.],
[1.]])
In [24]:
x, res, rank, s = np.linalg.lstsq(M, b, rcond=None)
In [25]:
x
Out[25]:
array([[0.33333333],
[0.11111111],
[0.55555556]])
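Equivalently, since the rows of \(A^T - I\) sum to the zero vector, one row is redundant; a sketch of the same idea replaces that row with the normalization constraint and solves a square system directly:
In [ ]:
M2 = A.T - np.eye(3)
M2[-1, :] = 1                          # replace the redundant last row with the constraint sum(p) = 1
b2 = np.array([0., 0., 1.])
np.linalg.solve(M2, b2)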
Markov chain averages¶
We can also consider the perspective of a single individual in terms of the frequencies of places visited.
The individual starts from one of the 3 places (Raleigh, Chapel Hill or Durham) and moves from place to place according to the probabilities in \(A\) over a long time. In the long run, the average frequency of visits to a place is the steady state probability of visiting that place.
This suggests that we can estimate the equilibrium or steady state distribution by using the long-run averages, which we’ll revisit when we look at Markov chain Monte Carlo (MCMC). Note that the initial averages depend on the starting conditions, so it is common practice to discard some number of the initial iterations (burn-in).
In [26]:
# Row-wise cumulative transition probabilities: the flattened cumsum works because each row of A sums to 1.
# Used below to sample the next state by inverse-CDF lookup.
P = np.cumsum(A).reshape(A.shape) - np.array([0,1,2]).reshape(-1,1)
P
Out[26]:
array([[0.9 , 0.95, 1. ],
[0.1 , 0.9 , 1. ],
[0.04, 0.05, 1. ]])
In [27]:
n = int(1e6)
xs = np.zeros(n+1).astype('int')
xs[0] = np.random.choice([0,1,2])
for t in range(n):
    r = np.random.rand()
    xs[t+1] = np.argmax(r < P[xs[t]])
In [28]:
np.mean(xs==0), np.mean(xs==1), np.mean(xs==2)
Out[28]:
(0.33321566678433323, 0.11102688897311103, 0.5557574442425558)
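As noted above, the early part of the trajectory depends on the starting state, so in practice the first samples are discarded. A minimal sketch with an arbitrarily chosen burn-in of 10,000 steps:
In [ ]:
burn = 10000                           # arbitrary burn-in length for illustration
xb = xs[burn:]
np.mean(xb == 0), np.mean(xb == 1), np.mean(xb == 2)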