{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Matrices" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%matplotlib inline" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import matplotlib.pyplot as plt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Matrices" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Example 1: Linear equations" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Consider a set of $m$ linear equations in $n$ unknowns:\n", "\n", "\\begin{align*}\n", "a_{11} x_1 + &a_{12} x_2& +& ... + &a_{1n} x_n &=& b_1\\\\\n", "\\vdots && &&\\vdots &= &\\vdots\\\\\n", "a_{m1} x_1 + &a_{m2} x_2& +& ... + &a_{mn} x_n &=&b_m \n", "\\end{align*}\n", "\n", "We can let\n", "\n", "\\begin{align*}\n", " A=\\left[\\begin{matrix}a_{11}&\\cdots&a_{1n}\\\\\n", " \\vdots & &\\vdots\\\\\n", " a_{m1}&\\cdots&a_{mn}\\end{matrix}\\right]\n", "\\end{align*}\n", "\n", "\\begin{align*}\n", "x = \\left[\\begin{matrix}x_1\\\\\n", " \\vdots\\\\\n", " x_n\\end{matrix}\\right] & \\;\\;\\;\\;\\textrm{ and } &\n", "b = \\left[\\begin{matrix}b_1\\\\\n", " \\vdots\\\\\n", " b_m\\end{matrix}\\right]\n", "\\end{align*}\n", "\n", "and re-write the system:\n", " \n", "$$ Ax = b$$\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Example 2: Covariance matrix\n", "\n", "Suppose we have $n$ observations of $p$ variables in a $n \\times p$ matrix. The covariance matrix is a matrix with special useful properties - it is symmetric and positive semi-definite.\n", "\n", "The sample covariance matrix is given by the element-wise formula\n", "\n", "![cov](https://wikimedia.org/api/rest_v1/media/math/render/svg/61c41f7dd5d200b419a9cd94d8cb92bd47dbe08e)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "n, p = 10, 3\n", "x = np.random.random((n, p))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "x" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "np.cov(x, rowvar=False)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "v = x - x.mean(axis=0)\n", "v.T @ v" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Example 3: Adjacency matrix for graph\n", "\n", "You can represent a graph as an adjacency matrix. This is a $n \\times n$ matrix $A$ for a graph with $n$ nodes, where a 1 at $A(i, j)$ indicates that there is an edge between node $i$ and node $j$. The adjacency matrix is typically a **sparse** graph, where most entires are 0 (no edges) and sparse matrix representations are useful for efficient calculations." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import networkx as nx" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "g = nx.random_lobster(50,0.7,0.7)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "nx.draw_kamada_kawai(g, nodesize=200, alpha=0.5)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "m = nx.adjacency_matrix(g)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "m" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "m.todense()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Matrix representation in Python\n", "\n", "We simply use nd-arrays with two dimensions to represent matrices. There is a `np.matrix` class, but it is not often used because most numpy creation functions return `ndarray`s, and confusing behavior can result when mixed with `ndarray`s." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "M1 = np.random.random((4,4))\n", "M1" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "M2 = np.matrix(M1)\n", "M2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Matrix multiplication**" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "M1 @ M1" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "M2 * M2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Transposition**" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "M1.T" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "M2.T" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Inverse**" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "np.linalg.inv(M1)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "M2.I" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Properties of a matrix" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "M = np.arange(16).reshape((4,4))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "M" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "M.size" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "M.shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Norms\n", "\n", "Just as with vectors, we can measure the size or norm of a matrix. There are many possible norms. \n", "\n", "The following norms can be calculated using `np.linalg.norm`\n", " \n", " ===== ============================ ==========================\n", " ord norm for matrices norm for vectors\n", " ===== ============================ ==========================\n", " None Frobenius norm 2-norm\n", " 'fro' Frobenius norm --\n", " 'nuc' nuclear norm --\n", " inf max(sum(abs(x), axis=1)) max(abs(x))\n", " -inf min(sum(abs(x), axis=1)) min(abs(x))\n", " 0 -- sum(x != 0)\n", " 1 max(sum(abs(x), axis=0)) as below\n", " -1 min(sum(abs(x), axis=0)) as below\n", " 2 2-norm (largest sing. value) as below\n", " -2 smallest singular value as below\n", " other -- sum(abs(x)**ord)**(1./ord)\n", " ===== ============================ ==========================" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "np.linalg.norm(M)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "np.sqrt(np.sum(M**2))" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": false }, "outputs": [], "source": [ "np.linalg.norm(M, ord=2)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "np.linalg.svd(M)[1][0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Trace\n", "\n", "The trace of a matrix $A$ is the sum of its diagonal elements. It is important for a couple of reasons:\n", "\n", "* It is an *invariant* of a matrix under change of basis\n", "* It defines a matrix norm" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "M.trace()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "M.diagonal().sum()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Determinant\n", "\n", "The determinant of a matrix is defined to be the alternating sum of permutations of the elements of a matrix. Let's not dwell on that though. It is important to know that the determinant of a $2\\times 2$ matrix is\n", "\n", "$$\\left|\\begin{matrix}a_{11} & a_{12}\\\\a_{21} & a_{22}\\end{matrix}\\right| = a_{11}a_{22} - a_{12}a_{21}$$\n", "\n", "This may be extended to an $n \\times n$ matrix by minor expansion. I will leave that for you to google. \n", "\n", "What is most important about the determinant:\n", "\n", "* Like the trace, it is also invariant under change of basis\n", "* An $n\\times n$ matrix $A$ is invertible $\\iff$ det$(A)\\neq 0$ \n", "* The rows(columns) of an $n\\times n$ matrix $A$ are linearly independent $\\iff$ det$(A)\\neq 0$\n", "\n", "Geometrically, the determinant is the volume of the paralleliped spanned by the column vectors of the matrix." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "np.linalg.det(M)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Matrix operations" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "A = np.arange(12).reshape((3,4))\n", "B = np.arange(12).reshape((4,3))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Addition" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "A + A" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Scalar multiplication" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "2 * A" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Matrix multiplication" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "A @ B" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "A.T @ A" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "A @ A.T" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "First, let's review some linear algebra topics:\n", "\n", "### Linear Independence\n", "\n", "A collection of vectors $v_1,...,v_n$ is said to be *linearly independent* if\n", "\n", "$$c_1v_1 + \\cdots c_nv_n = 0$$\n", "$$\\iff$$\n", "$$c_1=\\cdots=c_n=0$$\n", "\n", "In other words, any linear combination of the vectors that results in a zero vector is trivial.\n", "\n", "Another interpretation of this is that no vector in the set may be expressed as a linear combination of the others. In this sense, linear independence is an expression of non-redundancy in a set of vectors.\n", "\n", "\n", "Fact: Any linearly independent set of $n$ vectors spans an $n$-dimensional space. (I.e. the collection of all possible linear combinations is $\\mathbb{R}^n$.) Such a set of vectors is said to be a *basis* of $\\mathbb{R}^n$. Another term for basis is *minimal spanning set*." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Column space, Row space, Rank and Kernel\n", "\n", "Let $A$ be an $m\\times n$ matrix. We can view the columns of $A$ as vectors, say $a_1,...,a_n$. The space of all linear combinations of the $a_i$ are the *column space* of the matrix $A$. Now, if $a_1,...,a_n$ are *linearly independent*, then the column space is of dimension $n$. Otherwise, the dimension of the column space is the size of the maximal set of linearly independent $a_i$. Row space is exactly analogous, but the vectors are the *rows* of $A$.\n", "\n", "The *rank* of a matrix *A* is the dimension of its column space - and - the dimension of its row space. These are equal for any matrix. Rank can be thought of as a measure of non-degeneracy of a system of linear equations, in that it is the *dimension of the image of the linear transformation* determined by $A$. \n", "\n", "The *kernel* of a matrix *A* is the dimension of the space mapped to zero under the linear transformation that $A$ represents. The dimension of the kernel of a linear transformation is called the *nullity*. \n", "\n", "Index theorem: For an $m\\times n$ matrix $A$, \n", "\n", "rank($A$) + nullity($A$) = $n$." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Solving linear equations (Matrix-vector multiplication)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We return to solving the system of equations\n", "\n", "$$\n", "Ax = b\n", "$$\n", "\n", "Expanding,\n", "\n", "\\begin{align*}\n", " \\left[\\begin{matrix}a_{11}&\\cdots&a_{1n}\\\\\n", " \\vdots & &\\vdots\\\\\n", " a_{m1}&\\cdots&a_{mn}\\end{matrix}\\right]\n", "\\left[\\begin{matrix}x_1\\\\\n", " \\vdots\\\\\n", " x_n\\end{matrix}\\right] =\n", "\\left[\\begin{matrix}b_1\\\\\n", " \\vdots\\\\\n", " b_m\\end{matrix}\\right]\n", "\\end{align*}\n", "\n", "which can be rewritten as a weighted sum of the column vectors of $A$\n", "\n", "$$\n", "x_1 \\left[ \\matrix{a_{11} \\\\ \\vdots \\\\ a_{m1}} \\right] + \\cdots + x_n \\left[ \\matrix{a_{1n} \\\\ \\vdots \\\\ a_{mn}} \\right]\n", "= \\left[\\begin{matrix}b_1\\\\\n", " \\vdots\\\\\n", " b_m\\end{matrix}\\right]\n", "$$\n", "\n", "So solving the system of equations means finding the appropriate weights $x$ such that the sum of weighted column vectors of $A$ is the same as $b$. Put another way, we are trying to express $b$ as a linear combination of the column vectors of $A$.\n", "\n", "Note that we have two different useful perspectives on a matrix - as a collection of column (or row) vectors that span a vector space, and as a function that transforms or maps a vector into another vector. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Solving for m = n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "A = np.random.random((3,3))\n", "b = np.random.random((3,1))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "A" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "b" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Using the solve function." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "np.linalg.solve(A, b)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Using the projection onto the column space of A." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "np.linalg.inv(A.T @ A) @ A.T @ b" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Underdetermined System ($mn$, the system may be *overdetermined*. In other words, there are more equations than unknowns. They system could be inconsistent, or some of the equations could be redundant.\n", "\n", "There are many techniques to solve and analyze linear systems. Our goal is to understand the theory behind many of the built-in functions, and how they *efficiently* solve systems of equations." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Special Matrices\n", "\n", "Some matrices have interesting properties that allow us either simplify the underlying linear system or to understand more about it. \n", "\n", "#### Square Matrices\n", "\n", "Square matrices have the same number of columns (usually denoted $n$). We refer to an arbitrary square matrix as and $n\\times n$ or we refer to it as a 'square matrix of dimension $n$'. If an $n\\times n$ matrix $A$ has *full rank* (i.e. it has rank $n$), then $A$ is invertible, and its inverse is unique. This is a situation that leads to a unique solution to a linear system.\n", "\n", "#### Diagonal Matrices\n", "\n", "A diagonal matrix is a matrix with all entries off the diagonal equal to zero. Strictly speaking, such a matrix should be square, but we can also consider rectangular matrices of size $m\\times n$ to be diagonal, if all entries $a_{ij}$ are zero for $i\\neq j$\n", "\n", "#### Symmetric and Skew Symmetric\n", "\n", "A matrix $A$ is (skew) symmetric iff $a_{ij} = (-)a_{ji}$.\n", "\n", "Equivalently, $A$ is (skew) symmetric iff\n", "\n", "$$A = (-)A^T$$\n", "\n", "#### Upper and Lower Triangular\n", "\n", "A matrix $A$ is (upper|lower) triangular if $a_{ij} = 0$ for all $i (>|<) j$\n", "\n", "#### Orthogonal and Orthonormal\n", "\n", "A matrix $A$ is *orthogonal* iff\n", "\n", "$$A A^T = I$$\n", "\n", "In other words, $A$ is orthogonal iff \n", "\n", "$$A^T=A^{-1}$$\n", "\n", "Fact: The rows and columns of an orthogonal matrix are an orthonormal set of vectors.\n", "\n", "\n", "#### Positive Definite\n", "\n", "Positive definite matrices are an important class of matrices with very desirable properties. A square matrix $A$ is positive definite if\n", "\n", "$$u^TA u > 0$$\n", "\n", "for any non-zero n-dimensional vector $u$.\n", "\n", "A symmetric, positive-definite matrix $A$ is a positive-definite matrix such that\n", "\n", "$$A = A^T$$\n", "\n", "IMPORTANT: \n", "\n", "* Symmetric, positive-definite matrices have 'square-roots' (in a sense)\n", "* Any symmetric, positive-definite matrix is *diagonizable*!!!\n", "* Co-variance matrices are symmetric and positive-definite\n", "\n", "\n", "Now that we have the basics down, we can move on to numerical methods for solving systems - aka matrix decompositions." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.4" } }, "nbformat": 4, "nbformat_minor": 2 }