Working with Matrices

In [1]:
import numpy as np

Matrices and liner combinations

Post-multiplication with vector

Matrix-vector multiplication is a linear combination of the columns of the matrix

\[\begin{split}\begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{bmatrix} \begin{bmatrix} 2 \\ 3 \end{bmatrix} = 2 \begin{bmatrix} 1 \\ 3 \\ 5 \end{bmatrix} + 3 \begin{bmatrix} 2 \\ 4 \\ 6 \end{bmatrix} = \begin{bmatrix} 8 \\ 18 \\ 28 \end{bmatrix}\end{split}\]
\[\begin{split}\begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{bmatrix} \begin{bmatrix} 1 \\ 4 \end{bmatrix} = 1 \begin{bmatrix} 1 \\ 3 \\ 5 \end{bmatrix} + 4 \begin{bmatrix} 2 \\ 4 \\ 6 \end{bmatrix} = \begin{bmatrix} 9 \\ 19 \\ 29 \end{bmatrix}\end{split}\]

We can stack the columns horizontally to get matrix multiplication.

\[\begin{split}\begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{bmatrix} \begin{bmatrix} 2 & 1 \\ 3 & 4 \end{bmatrix} = \begin{bmatrix} 8 & 9 \\ 18 & 19 \\ 28 & 29 \end{bmatrix}\end{split}\]
In [2]:
A = np.arange(1, 7).reshape((3,2))
x1 = np.array([2, 3]).reshape((2,1))
x2 = np.array([1,4]).reshape((2,1))
In [3]:
A @ x1
Out[3]:
array([[ 8],
       [18],
       [28]])
In [4]:
A @ x2
Out[4]:
array([[ 9],
       [19],
       [29]])
In [5]:
np.c_[x1, x2]
Out[5]:
array([[2, 1],
       [3, 4]])
In [6]:
A @ np.c_[x1, x2]
Out[6]:
array([[ 8,  9],
       [18, 19],
       [28, 29]])

Pre-multiplication with vector

Vector-matrix multiplication is a linear combination of the rows of the matrix

\[\begin{split}\begin{bmatrix} 1 & 2 & 3 \end{bmatrix} \begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{bmatrix}= 1 \begin{bmatrix} 1 & 2 \end{bmatrix} + 2 \begin{bmatrix} 3 & 4 \end{bmatrix} + 3 \begin{bmatrix} 5 & 6 \end{bmatrix} = \begin{bmatrix} 22 & 28 \end{bmatrix}\end{split}\]
\[\begin{split}\begin{bmatrix} 4 & 5 & 6 \end{bmatrix} \begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{bmatrix}= 4 \begin{bmatrix} 1 & 2 \end{bmatrix} + 5 \begin{bmatrix} 3 & 4 \end{bmatrix} + 6 \begin{bmatrix} 5 & 6 \end{bmatrix} = \begin{bmatrix} 49 & 64 \end{bmatrix}\end{split}\]

We can stack the rows vertically to get matrix multiplication.

\[\begin{split}\begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 4 \end{bmatrix} \begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{bmatrix} = \begin{bmatrix} 22 & 28 \\ 49 & 64 \end{bmatrix}\end{split}\]

Matrix-matrix multiplication can be seen as the horizontal stacking of column operations or as the vertical stacking of row operations.

In [7]:
y1 = np.array([1,2,3]).reshape((1,3))
y2 = np.array([4,5,6]).reshape((1,3))
In [8]:
y1 @ A
Out[8]:
array([[22, 28]])
In [9]:
y2 @ A
Out[9]:
array([[49, 64]])
In [10]:
np.r_[y1, y2]
Out[10]:
array([[1, 2, 3],
       [4, 5, 6]])
In [11]:
np.r_[y1, y2] @ A
Out[11]:
array([[22, 28],
       [49, 64]])

Extract columns of a matrix by post-multiplication with standard unit column vector

In [12]:
A
Out[12]:
array([[1, 2],
       [3, 4],
       [5, 6]])
In [13]:
e2 = np.array([0,1]).reshape((-1,1))
In [14]:
A @ e2
Out[14]:
array([[2],
       [4],
       [6]])

Extract rows of a matrix by pre-multiplication with standard unit row vector

In [15]:
e2 = np.array([0,1,0]).reshape((-1, 1))
In [16]:
e2.T @ A
Out[16]:
array([[3, 4]])

Permutation matrices

From the column extraction by post-multiplication with a standard unit column vector, we generalize to permutation matrices (identity matrix with permuted columns). Post-multiplication of a matrix \(A\) with a permutation matrix \(P\) rearranges the columns of \(A\). To recover the original matrix, multiply with \(P^T\) - i.e. \(P^{-1} = P^T\) and the inverse of \(P\) is its inverse, \(P\) being our first example of an orthogonal matrix.

In [17]:
A = np.arange(1, 17).reshape((4,4))
A
Out[17]:
array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12],
       [13, 14, 15, 16]])
In [18]:
I = np.eye(4, dtype='int')
I
Out[18]:
array([[1, 0, 0, 0],
       [0, 1, 0, 0],
       [0, 0, 1, 0],
       [0, 0, 0, 1]])
In [19]:
A @ I
Out[19]:
array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12],
       [13, 14, 15, 16]])
In [20]:
p = I[:, [2,1,3,0]]
p
Out[20]:
array([[0, 0, 0, 1],
       [0, 1, 0, 0],
       [1, 0, 0, 0],
       [0, 0, 1, 0]])
In [21]:
A @ p
Out[21]:
array([[ 3,  2,  4,  1],
       [ 7,  6,  8,  5],
       [11, 10, 12,  9],
       [15, 14, 16, 13]])
In [22]:
A @ p @ p.T
Out[22]:
array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12],
       [13, 14, 15, 16]])

Matrix partitioning

We see above that matrix multiplication can be seen as separate operations on the row or column vectors. We can actually partition matrices into blocks (not just vectors) for matrix multiplication. Suppose we want to calculate \(AB\), where

\begin{align} A = \begin{bmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 3 \end{bmatrix}&, & B = \begin{bmatrix} 1 & 2 & 3 & 4 \\ 5 & 6 & 7 & 8 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \end{align}

We can consider (say) \(A\) and \(B\) as each being a \(2 \times 2\) matrix where each element is a \(2 \times 2\) sub-matrix (or block). This simplifies the computation since many blocks are the identity or null matrix.

\begin{align} A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}&, & B = \begin{bmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{bmatrix} \end{align}

and

\[\begin{split}AB = \begin{bmatrix} A_{11}B_{11} + A_{12}B_{21} & A_{11}B_{12} + A_{12}B_{22} \\ A_{21}B_{11} + A_{22}B_{22} & A_{21}B_{12} + A_{22}B_{22} \end{bmatrix}\end{split}\]

In fact, we can see by inspection that the result will be

\[\begin{split}AB = \begin{bmatrix} B_{11} & B_{12}+I_2 \\ 0_2 & A_{22} \end{bmatrix} = \begin{bmatrix} 1 & 2 & 4 & 4 \\ 5 & 6 & 7 & 9 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 3 \end{bmatrix}\end{split}\]

In general, any sub-block structure consistent with matrix multiplication (more formally, \(A\) and \(B\) are conformable for multiplication) is fine. In particular, the blocks do not have to be square.

In [23]:
a11 = np.eye(2)
a12 = np.eye(2)
a21 = np.zeros((2,2))
a22 = np.diag((2,3))

b11 = np.array([
    [1,2],
    [5,6]
])
b12 = np.array([
    [3,4],
    [7,8]
])
b21 = np.zeros((2,2))
b22 = np.eye(2)
In [24]:
A = np.block([
    [a11, a12],
    [a21, a22]
]).astype('int')
A
Out[24]:
array([[1, 0, 1, 0],
       [0, 1, 0, 1],
       [0, 0, 2, 0],
       [0, 0, 0, 3]])
In [25]:
B = np.block([
    [b11, b12],
    [b21, b22]
]).astype('int')
B
Out[25]:
array([[1, 2, 3, 4],
       [5, 6, 7, 8],
       [0, 0, 1, 0],
       [0, 0, 0, 1]])
In [26]:
A @ B
Out[26]:
array([[1, 2, 4, 4],
       [5, 6, 7, 9],
       [0, 0, 2, 0],
       [0, 0, 0, 3]])
In [27]:
np.block([
    [a11@b11 + a12@b21, a11@b12 + a12@b22],
    [a21@b11 + a22@b21, a21@b12 + a22@b22]
]).astype('int')
Out[27]:
array([[1, 2, 4, 4],
       [5, 6, 7, 9],
       [0, 0, 2, 0],
       [0, 0, 0, 3]])