Review of linear algebra

These notes are a sketch of what you should recall from your undergraduate linear algebra class. They are not meant for learning the material from scratch, but rather serve as a checklist of what you should already know, so that you can revise the parts you have forgotten.

Introduction

In general, algebra is the mathematical study of structure, just as geometry is the study of space and analysis is the study of change. Linear algebra, in particular, is the study of linear maps between vector spaces. For many students, linear algebra is their first experience of mathematical abstraction, and hence it often feels unfamiliar and difficult.

Abstraction begins with the appreciation that mathematics is largely concerned with operations on elements in a set within some mathematical space. We start with something concrete and familiar - the basic vector space \(\mathbb{R}^n\) - then take gentle steps into more abstract territory before finally returning to concrete applications of linear algebra.

Motivation: Why is all this abstraction necessary?

Abstraction pays off because it allows us to generalize the same rules to many different mathematical contexts. For example, we will see that sets of points in Euclidean space, sets of matrices, sets of polynomial functions, sets of solutions of a homogeneous linear system in \(n\) variables, are all examples of vector spaces. So with the right abstraction, we can be sure that our intuition in concrete systems like \(\mathbb{R}^3\) will carry over to, say, infinite-dimensional function spaces since both are vector spaces with linear maps.

The basic vector space \(\mathbb{R}^n\)

Concepts you should know

  • Vectors as arrows in \(\mathbb{R}^2\)
  • Intuition for the rules of \(\mathbb{R}^2\)
  • Maps \(T: \mathbb{R}^2 \rightarrow \mathbb{R}^2\) convert a vector into another vector
  • Spanning set and basis vectors
  • Linear independence
  • Coordinate systems as linear combinations of basis vectors
  • Change of coordinates
  • Other vector spaces - e.g. space of polynomial functions
  • Matrices as transformations
  • Examples of geometric transformations (scaling, rotation, projection etc)
  • Kernel and image
  • Trace and determinant
  • Column and row space
  • Length and the dot product
  • Distance
  • Perpendicular (orthogonal) vectors
  • Orthonormal vectors
  • Change of basis
  • Commutative diagram for change of basis
  • Finding a “nice” basis: Eigenvalues and eigenvectors
  • Geometric interpretation of eigenvalues and eigenvectors
  • Principal Components Analysis (PCA)
  • Singular value decomposition (SVD)
  • Geometric interpretation of PCA and SVD
  • Solving systems of linear equations
  • Row view, column view and element view of solving linear equations
  • Matrix decompositions
  • When is there a solution? Fundamental theorem of linear algebra
  • Geometric interpretation of the determinant
  • The normal equations for overdetermined systems of linear equations
  • Geometric interpretation of the normal equations as a projection

What is a field?

Informally, a field is a set of numbers possessing addition, subtraction, multiplication and division operations. More abstractly, a field \(F\) is a set with two operations \(+\) (addition) and \(\cdot\) (multiplication) with the following properties. Let \(a, b, c \in F\). Then we have

  • \(a + b \in F\) and \(a \cdot b \in F\) (closure)
  • \(a + 0 = a\) and \(a \cdot 1 = a\) (identity elements)
  • \(a + (-a) = 0\) and, for \(a \neq 0\), \(a \cdot a^{-1} = 1\) (inverses - these also give subtraction \(a - b\) and division \(a/b\))
  • \(a + (b + c) = (a + b) + c\) and \(a \cdot (b \cdot c) = (a \cdot b) \cdot c\) (associative property)
  • \(a + b = b + a\) and \(a \cdot b = b \cdot a\) (commutative property)
  • \(a \cdot (b + c) = a \cdot b + a \cdot c\) (distributive property)

Familiar examples of fields are the rational numbers \(\mathbb{Q}\), real numbers \(\mathbb{R}\), and complex numbers \(\mathbb{C}\).

The elements \(a, b, c \in F\) are known as scalars.

What is a vector space?

A vector space over a scalar field \(F\) is a set \(V\) with two operations, \(+\) (addition) and \(\cdot\) (scalar multiplication). Letting \(u, v, w \in V\) and \(a, b \in F\), we have four rules for addition and four rules for scalar multiplication. In most applied settings, \(F = \mathbb{R}\).

Addition rules

  • \(0 + v = v\) (additive identity or zero)
  • \(v + (-v) = 0\) (additive inverse)
  • \(v + w = w + v\) (commutative)
  • \(u + (v + w) = (u + v) + w\) (associative)

Scalar multiplication rules

  • \(1 \cdot v = v\) (multiplicative identity or unit)
  • \(a(bv) = (ab)v\) (compatibility)
  • \(a(v + w) = av + aw\) (distributive with respect to vector addition)
  • \((a + b)v = av + bv\) (distributive with respect to scalar addition)

Familiar examples of vector spaces are the space of Euclidean vectors and the set of polynomial functions.

The elements \(u, v, w \in V\) are known as vectors.

What is a linear map or transformation?

A linear map \(T\) takes elements of one vector space \(U\) to another vector space \(V\) in a way that preserves addition and scalar multiplication. In other words, if \(u, v \in U\) are vectors and \(a \in F\) is a scalar, \(T\) must obey the following:

  • \(T(u + v) = T(u) + T(v)\)
  • \(T(au) = aT(u)\)

Given two (or more) scalars \((a, b)\) and vectors \((u, v)\), we have

\[T(au + bv) = aT(u) + bT(v)\]

Familiar examples of linear maps are transformation matrices, differentiation, integration and expectation.
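
As a quick numerical sanity check (a sketch using numpy with an arbitrarily chosen matrix, not part of the formal definition), we can verify that multiplication by a fixed matrix obeys both rules:

In [ ]:
import numpy as np

# An arbitrary fixed matrix defines the map T(u) = A u from R^3 to R^2
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])

rng = np.random.default_rng(0)
u, v = rng.standard_normal(3), rng.standard_normal(3)
a, b = 2.5, -1.3

# T(a u + b v) should equal a T(u) + b T(v)
print(np.allclose(A @ (a * u + b * v), a * (A @ u) + b * (A @ v)))  # should print True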

Vector subspaces

A subset \(U\) of a vector space \(V\) is a vector subspace if it is itself a vector space. The simplest way to see whether a non-empty subset \(U\) is a vector subspace is to check whether it is closed under addition and scalar multiplication.

Kernel and image of a linear map

The kernel of a linear map \(T: V \rightarrow W\) is the set of vectors that \(T\) maps to zero. That is,

\[\text{ker}(T) = \left\{ v \in V: T(v) = 0 \right\}\]

In linear algebra contexts, the kernel is also known as the nullspace.

If \(T: V \rightarrow W\) is a linear transformation, then the image of \(T\) is

\[\text{Im}(T) = \left\{ w \in W: \text{there is a } v \in V \text{ such that } T(v) = w \right\}\]

Note that for \(a, b \in \mathbb{R}\) and \(v_1, v_2 \in \ker(T)\),

\[T(av_1 + bv_2) = aT(v_1) + bT(v_2) = 0\]

and so the kernel is a subspace of \(V\).

Similarly, for \(w_1, w_2 \in \text{Im}(T)\), i.e. \(T(v_1) = w_1, T(v_2) = w_2\) for some \(v_1, v_2 \in V\),

\[aw_1 + bw_2 = aT(v_1) + bT(v_2) = T(av_1 + bv_2)\]

so \(aw_1 + bw_2 \in \text{Im}(T)\) since \(av_1 + bv_2 \in V\). This shows that the image of \(T\) is a subspace of \(W\).

Inner product space

Suppose we define a map

\[\begin{split}<\cdot, \cdot>: V \times V \rightarrow F\end{split}\]

that takes two vectors in \(V\) to a scalar in \(F\). Let \(u, v, w \in V\) and \(a, b \in F\). If the map obeys the following rules

  • \(<u, v> = \overline{<v, u>}\) (conjugate symmetry)
  • \(<au + bv, w> = a<u, w> + b<v, w>\) (linear in first argument)
  • \(<u, u> \ge 0\) (non-negativity)
  • \(<u, u> = 0 \implies u = 0\) (positive definiteness)

Then the vector space \(V\) together with the inner product \(<\cdot, \cdot>\) is known as an inner product space.

Euclidean space with the dot or scalar product is an inner product space. For example, for \(u, v \in \mathbb{R}^n\),

\[<u, v> = u \cdot v = u_1 v_1 + u_2 v_2 + \ldots + u_n v_n\]

Inner products generalize the idea of distance and angle (especially what it means to be perpendicular or orthogonal) familiar in Euclidean space to general vector spaces.

Length (norm)

The inner product allows us to define the length of a vector as

\[\begin{split}\|v\| = \sqrt{<v, v>}\end{split}\]

For example, in Euclidean space, the length is the familiar Euclidean length \(\|v\| = \sqrt{v_1^2 + v_2^2 + \ldots + v_n^2}\), and the distance between two vectors \(u\) and \(v\) is \(\|u - v\|\).

Orthogonality

Two non-zero vectors \(u, v\) are orthogonal if and only if \(<u, v> = 0\).

Some famous properties that arise in inner product spaces are Pythagoras’ theorem, the parallelogram law, the triangle inequality and the Cauchy-Schwarz inequality.

The Cauchy-Schwarz inequality (in dot product form) states that for any vectors \(x\) and \(y\)

\[| x \cdot y | \le \|x\| \|y\|\]

Sketch of proof: Expand \(\| tx + y \|^2\), where \(t\) is a scalar. This gives a quadratic in \(t\). Since the quadratic is greater than or equal to zero (it is a squared norm), its discriminant must be less than or equal to zero. Simplifying gives the Cauchy-Schwarz inequality.

The triangle inequality for two vectors \(x\) and \(y\) states that

\[\| x + y \| \le \|x\| + \|y\|\]

Sketch of proof: Square both sides and expand, then use the Cauchy-Schwarz inequality.

For two orthogonal vectors \(x\) and \(y\)

\[\|x\|^2 + \|y\|^2 = \| x + y \|^2\]

Sketch of proof: Expand \(\| x + y \|^2\) as an inner product and use \(<x, y> = 0\) by definition of orthogonality.
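
These properties are easy to check numerically. The sketch below (numpy, with randomly chosen vectors) verifies the Cauchy-Schwarz inequality, the triangle inequality and Pythagoras' theorem for the standard dot product; all three checks should print True.

In [ ]:
import numpy as np

rng = np.random.default_rng(1)
x, y = rng.standard_normal(5), rng.standard_normal(5)

# Cauchy-Schwarz: |<x, y>| <= ||x|| ||y||
print(abs(x @ y) <= np.linalg.norm(x) * np.linalg.norm(y))

# Triangle inequality: ||x + y|| <= ||x|| + ||y||
print(np.linalg.norm(x + y) <= np.linalg.norm(x) + np.linalg.norm(y))

# Pythagoras for two orthogonal vectors
e1, e2 = np.array([3.0, 0.0]), np.array([0.0, 4.0])
print(np.isclose(np.linalg.norm(e1)**2 + np.linalg.norm(e2)**2,
                 np.linalg.norm(e1 + e2)**2))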

Basis, dimension, coordinates and matrices

Spanning set

A spanning set is a set of vectors \(\left\{ v_1, v_2, \ldots \right \}\) in \(V\) that allows us to write any vector \(v \in V\) as a linear combination

\[v = a_1 v_1 + a_2 v_2 + \ldots\]

for some scalars \(a_1, a_2, \ldots\) (the scalars are unique only if the spanning set is also linearly independent, i.e. a basis).

For example, the vectors \(v_1 = (0,1)\), \(v_2 = (1,0)\) and \(v_3 = (1,1)\) form a spanning set for \(\mathbb{R}^2\).

Linear independence

Vectors \(\left\{ v_1, v_2, \ldots \right \}\) are linearly independent if whenever

\[a_1 v_1 + a_2 v_2 + \ldots = 0\]

then all the \(a_i\) must also be zero.

For example, the vectors \(v_1 = (0,1)\), \(v_2 = (1,0)\) and \(v_3 = (1,1)\) are not linearly independent since

\[v_1 + v_2 - v_3 = 0\]

but not all the coefficients are equal to zero.

Basis

A spanning set that is linearly independent is known as a basis.

For example, the vectors \(v_1 = (0,1)\), \(v_2 = (1,0)\) form a basis for \(\mathbb{R}^2\). Note that there can be more than one basis for a vector space - for example \(w_1 = (1,1)\) and \(w_2 = (0, 1)\) form another possible basis for \(\mathbb{R}^2\).
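
One computational way to check these definitions is to put the candidate vectors into the columns of a matrix and look at its rank; here is a sketch using sympy:

In [ ]:
from sympy import Matrix

# Columns are the candidate vectors (0,1), (1,0) and (1,1)
M = Matrix([[0, 1, 1],
            [1, 0, 1]])
print(M.rank())             # 2, so the columns span R^2 ...
print(M.rank() == M.cols)   # False: three vectors with rank 2 are not linearly independent

# Dropping the redundant third vector leaves a spanning, independent set - a basis
B = Matrix([[0, 1],
            [1, 0]])
print(B.rank() == B.cols)   # True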

Dimension

The dimension of a vector space is the number of elements in its basis. It is not immediately obvious, but can be proven that the number of elements in any basis of a vector space is the same, so the dimension is well-defined.

Coordinates

Given a basis for an \(n\)-dimensional vector space \(v_1, v_2, \ldots, v_n\), we can express any vector \(v \in V\) as

\[a_1 v_1 + a_2 v_2 + \ldots + a_n v_n\]

Since the scalars \(a_1, a_2, \ldots, a_n\) are unique given the basis \(v_1, v_2, \ldots, v_n\), we can define the vector \(v\) in terms of its coordinates

\(v = (a_1, a_2, \ldots, a_n)\)

This is how the familiar idea of a vector as a tuple of numbers arises.

Note that this coordinate representation of a vector only applies to a particular choice of basis - change the basis and the coordinate representation of \(v\) will also change.

Matrices as linear maps for some set of bases

Suppose \(T: V \rightarrow W\) is a linear map such that \(T(v) = w\).

Given a basis \(\left\{ v_1, v_2, \ldots, v_n \right\}\) for the vector space \(V\) and another basis \(\left\{ w_1, w_2, \ldots, w_m \right \}\) for the vector space \(W\), then for any vector \(v \in V\), we have

\[v = a_1 v_1 + a_2 v_2 + ... + a_n v_n\]

and hence can write \(v\) as the column vector of coordinates in \(\mathbb{R}^n\)

\[\begin{split}\begin{pmatrix} a_1\\a_2\\\vdots\\a_n \end{pmatrix}\end{split}\]

Similarly, we suppose that \(w \in W\) can be expressed as \(w = b_1 w_1 + b_2 w_2 + \ldots + b_m w_m\), i.e. as the column vector in \(\mathbb{R}^m\)

\[\begin{split}\begin{pmatrix} b_1\\b_2\\\vdots\\b_m \end{pmatrix}\end{split}\]

Since \(T\) maps a vector from \(V\) to \(W\), any basis vector \(v_i \in V\) will be mapped to a vector in \(W\), and so it can be expressed as

\[T(v_i) = c_{1i}w_1 + c_{2i}w_2 + \ldots + c_{mi}w_m\]

where each \(c_{ki}\) is just a number. So what does \(T\) do to an arbitrary vector \(v \in V\)? By linearity,

\[T(v) = a_1 T(v_1) + \ldots + a_n T(v_n) = \sum_{i=1}^n a_i \sum_{k=1}^m c_{ki} w_k = \sum_{k=1}^m \left( \sum_{i=1}^n c_{ki} a_i \right) w_k\]

so the \(k\)-th coordinate of \(w = T(v)\) is \(b_k = \sum_{i=1}^n c_{ki} a_i\). Describing vectors using column vectors of coordinates and the usual rules of matrix multiplication, we can rewrite this as

\[\begin{split}\begin{pmatrix} c_{11} & c_{12} & \ldots & c_{1n} \\ c_{21} & c_{22} & \ldots & c_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ c_{m1} & c_{m2} & \ldots & c_{mn} \end{pmatrix} \begin{pmatrix} a_1\\a_2\\\vdots\\a_n \end{pmatrix} = \begin{pmatrix} b_1\\b_2\\\vdots\\b_m \end{pmatrix}\end{split}\]

Hence we see that matrix multiplication does exactly the same thing to the vector \(v\) as the linear map \(T\). In other words, for the given bases, the \(m \times n\) matrix \(C\) represents the linear map \(T\), taking the coordinate vector of \(v\) in \(\mathbb{R}^n\) to the coordinate vector of \(w\) in \(\mathbb{R}^m\). In particular, if \(T: V \rightarrow V\), then the matrix representing \(T\) will be a square matrix.

The space spanned by the vectors that make up the columns of the matrix \(C\) is the column space, and the space spanned by the vectors that make up the rows of \(C\) is the row space. The rank of the matrix \(C\) is the dimension of the column space. A theorem of linear algebra is that the dimension of the column space is the same as that of the row space.

Note that if we have different bases for the vector spaces \(V\) and \(W\), then the matrix \(C\) is also different, even though the linear map \(T\) does not change. This is an important concept to grasp - a linear map takes a vector in some vector space to another vector in some other vector space. Depending on how we choose to describe the vectors by our choice of basis, the matrix representation of the linear map will also change. But the transformation or map itself is still the same.

How could you determine whether two matrices represent the same linear transform, just using different bases? This suggests that we should tackle change of basis next.

Calculations

Let’s do some simple calculations to fix these ideas. Suppose we have a matrix

\[\begin{split}A = \begin{pmatrix}1 & 1 & 1 & 1\\1 & 2 & 3 & 4\\3 & 4 & 1 & 2\end{pmatrix}\end{split}\]

and we want to find the kernel \(\ker(A)\), i.e. the space \(\left\{x \in \mathbb{R}^4: Ax = 0 \right\}\). Explicitly, we want to solve the following equation for \(x\)

\[\begin{split}\begin{pmatrix}1 & 1 & 1 & 1\\1 & 2 & 3 & 4\\3 & 4 & 1 & 2\end{pmatrix} \begin{pmatrix}x_1 \\ x_2 \\ x_3 \\ x_4\end{pmatrix} = \begin{pmatrix}0 \\ 0 \\ 0\end{pmatrix}\end{split}\]

We can use Gaussian elimination to find the reduced row echelon form

\[\begin{split}A \sim \begin{pmatrix}1 & 0 & 0 & -1\\0 & 1 & 0 & 1\\0 & 0 & 1 & 1\end{pmatrix}\end{split}\]

We can rewrite this as a system of equations

\[\begin{split}x_1 - x_4 = 0 \\ x_2 + x_4 = 0 \\ x_3 + x_4 = 0\end{split}\]

i.e. \(x_1 = x_4\), \(x_2 = -x_4\) and \(x_3 = -x_4\). With these constraints, we have

\[\begin{split}\begin{pmatrix}x_1 \\ x_2 \\ x_3 \\ x_4\end{pmatrix} = x_4 \begin{pmatrix}1 \\ -1 \\ -1\\ 1\end{pmatrix}\end{split}\]

In other words, the kernel or nullspace is one-dimensional and is spanned by the basis vector

\[\begin{split}\left\{ \begin{pmatrix}1 \\ -1 \\ -1\\ 1\end{pmatrix} \right\}\end{split}\]

The column space is three-dimensional and spanned by the following vectors

\[\begin{split}\left \{ \begin{pmatrix}1 \\ 1 \\ 3 \end{pmatrix}, \begin{pmatrix}1 \\ 2 \\ 4 \end{pmatrix}, \begin{pmatrix}1 \\ 3 \\ 1 \end{pmatrix} \right \}\end{split}\]
In [1]:
from sympy import Matrix

A = Matrix([1,1,1,1,1,2,3,4,3,4,1,2]).reshape(3,4)
A.rref()[0]
Out[1]:
Matrix([
[1, 0, 0, -1],
[0, 1, 0,  1],
[0, 0, 1,  1]])
In [2]:
A * Matrix([1,-1,-1,1]).reshape(4,1)
Out[2]:
Matrix([
[0],
[0],
[0]])
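
sympy can also compute bases for these spaces directly (reusing the matrix A defined above); the results should agree with the hand calculation:

In [ ]:
print(A.nullspace())     # basis for ker(A): expect one vector proportional to (1, -1, -1, 1)
print(A.columnspace())   # basis for the column space: expect three vectors, i.e. rank 3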

Change of basis

In solving matrix problems, there is typically a choice of basis that is natural for the problem, and which might make the problem much simpler to solve. Hence it is useful to understand how vector and matrix coordinates change when we move to a new basis.

Let’s consider two different sets of basis vectors \(B\) and \(B'\) for \(\mathbb{R}^2\). Suppose the basis vectors for \(B\) are \(\{u, v\}\) and that the basis vectors for \(B'\) are \(\{u', v'\}\). Suppose also that the basis vectors \(\{u', v'\}\) for \(B'\) have coordinates \(u' = (a, b)\) and \(v' = (c, d)\) with respect to \(B\). That is, \(u' = au + bv\) and \(v' = cu + dv\) since that’s what vector coordinates mean.

Suppose we want to find out what the coordinates of a vector \(w = (x', y')\) in the \(B'\) basis would be in \(B\). We do some algebra:

\[w = x'u' + y'v' = x'(au + bv) + y'(cu + dv) = (ax' + cy')u + (bx' + dy')v\]

So

\[\begin{split}[w]_B = \left( \begin{array}{c} ax' + cy' \\ bx' + dy' \end{array} \right)\end{split}\]

Expressing in matrix form

\[\begin{split}[w]_B = \left( \begin{array}{cc} a & c \\ b & d \end{array} \right) \left( \begin{array}{c} x' \\ y' \end{array} \right)\end{split}\]

Since \([w]_{B'} = (x', y')\), we see that to change a vector's coordinates from \(B'\) to \(B\), we simply multiply by the change of coordinates matrix \(P\) formed by using the \(B'\) basis vectors (expressed in \(B\) coordinates) as its columns, i.e.

\[\begin{split}P = \left( \begin{array}{cc} a & c \\ b & d \end{array} \right)\end{split}\]

To get from \(B\) to \(B'\), we multiply by \(P^{-1}\). How do we know that \(P\) has an inverse? This brings us to the invertible matrix theorem, which is a collection of apparently very different but equivalent statements that describe when a matrix is invertible. Understanding why those statements are equivalent is usually a major focus of a first linear algebra course.

Changing to and from the standard basis

If one set of basis vectors is the standard basis (the most common case), the conversion is simpler to calculate. Suppose we have some basis \(B = \left \{ v_1, v_2, \ldots, v_n \right \}\) for \(\mathbb{R}^n\). This means that any vector \(x \in \mathbb{R}^n\) can be written as the linear combination \(c_1 v_1 + c_2 v_2 + \ldots + c_n v_n\). In matrix notation, we therefore have

\[\begin{split}x = \pmatrix{v_1 & v_2 & \ldots & v_n}\pmatrix{c_1 \\ c_2 \\ \vdots \\ c_n}\end{split}\]

where \(C = \pmatrix{v_1 & v_2 & \ldots & v_n}\) is the change of basis matrix. Since the basis vectors are linearly independent, this matrix has full rank, so it is invertible and \(C^{-1}\) exists. So we have \(x = C[x]_B\) and \([x]_B = C^{-1}x\).
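
Here is a small sketch of this calculation in sympy, using an arbitrarily chosen basis \(B = \{(1,1), (0,1)\}\) for \(\mathbb{R}^2\):

In [ ]:
from sympy import Matrix

# Change of basis matrix: the basis vectors (1,1) and (0,1) as columns
C = Matrix([[1, 0],
            [1, 1]])

x = Matrix([3, 5])      # a vector in standard coordinates
x_B = C.inv() * x       # coordinates of x with respect to B
print(x_B)              # expect (3, 2), since 3*(1,1) + 2*(0,1) = (3,5)
print(C * x_B == x)     # changing back recovers x, so should print True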

Invertible matrix theorem

A list of the equivalent statements, together with proofs that they are equivalent, can be found in any linear algebra textbook.

Basic intuition: Geometrically, an invertible (non-singular) matrix represents a transform \(T: V \rightarrow V\) that does not collapse any non-zero vector to zero, so the image of the transform has the same dimension as \(V\). This makes sense because if a basis vector were collapsed to zero, we would lose the information required to reverse the map, and hence the matrix would not be invertible.

Determinants

Most of us vaguely recall that a square matrix is invertible only if some mysterious number known as the determinant, calculated using a byzantine series of reductions, is non-zero. What is actually going on?

Coming soon - geometric interpretation of the determinant in \(\mathbb{R}^n\).

Similar matrices

We know that when you change the basis, the matrix representation of a linear map will also change. This gives rise to the following question - when do two matrices in different basis systems represent the same linear map? Such matrices are known as similar matrices.

Let \(T: V \rightarrow V\) be some linear map. Suppose we have a matrix \(A\) that represents the linear map in basis \(\left\{v_1, v_2, \ldots, v_n \right\}\), and a matrix \(B\) that represents the same linear map in basis \(\left\{w_1, w_2, \ldots, w_n \right \}\). Let’s now consider some vector \(x \in V\). With one set of basis vectors, we have

\[x = a_1v_1 + a_2v_2 + \ldots + a_nv_n\]

while with the other, we have

\[x = b_1w_1 + b_2w_2 + \ldots + b_nw_n\]

Suppose we can find a change-of-basis matrix \(C\) that converts coordinates in the \(v\) basis to coordinates in the \(w\) basis. In matrix notation, we have

\[\begin{split}C \pmatrix{a_1\\a_2\\\cdots\\a_n}\, = \, \pmatrix{b_1\\b_2\\\cdots\\b_n}\end{split}\]

Look at the picture below:

(Figure: commutative diagram for the change of basis.)

We see that \(CA = BC\), and hence \(A = C^{-1}BC\). Equivalently, \(B = CAC^{-1}\). Whenever we can express a matrix \(A\) as the product \(C^{-1}BC\), the matrices \(A\) and \(B\) are similar and represent the same linear map in different coordinate systems (i.e. linear combinations of different basis vectors). Some matrices have a particularly simple similar matrix - one whose only non-zero entries lie on the diagonal. Such matrices are known as diagonalizable, and we will see how to find them next using eigendecomposition. The special diagonal matrix is known as the canonical form of all the other matrices that are similar to it - somehow the basis vectors that give rise to the canonical form are special. We will see why next.
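
The sketch below checks this relationship in sympy, using an arbitrarily chosen matrix \(B\) and an invertible change-of-basis matrix \(C\); similar matrices share basis-independent quantities such as the trace, determinant and eigenvalues.

In [ ]:
from sympy import Matrix

B = Matrix([[2, 1],
            [0, 3]])     # the map in the w basis (chosen arbitrarily)
C = Matrix([[1, 1],
            [0, 1]])     # an invertible change-of-basis matrix

A = C.inv() * B * C      # the same map expressed in the v basis
print(A)                 # here A happens to be diagonal, i.e. B is diagonalizable

print(A.trace() == B.trace(), A.det() == B.det())    # should print True True
print(A.eigenvals() == B.eigenvals())                # should print True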

Eigenvalues and Eigenvectors

First recall that an eigenvector of a matrix \(A\) is a non-zero vector \(v\) such that

\[Av = \lambda v\]

for some scalar \(\lambda\). The value \(\lambda\) is called an eigenvalue of \(A\).

Basic intuition: Geometrically, applying \(A\) to the eigenvector \(v\) only stretches or contracts it by a factor \(\lambda\); the direction of \(v\) is unchanged. So geometrically, eigenvectors are invariant directions under the linear map \(A\), while eigenvalues are scaling factors that tell us how much \(A\) shrinks or stretches the eigenvector \(v\).

If an \(n\times n\) matrix \(A\) has \(n\) linearly independent eigenvectors, then \(A\) may be decomposed in the following manner:

\[A = B\Lambda B^{-1}\]

where \(\Lambda\) is a diagonal matrix whose diagonal entries are the eigenvalues of \(A\) and the columns of \(B\) are the corresponding eigenvectors of \(A\).

Facts:

  • An \(n\times n\) matrix is diagonalizable \(\iff\) it has \(n\) linearly independent eigenvectors.

  • A symmetric, positive definite matrix has only positive eigenvalues and its eigendecomposition

    \[A=B\Lambda B^{-1}\]

is via an orthogonal transformation \(B\), i.e. its eigenvectors can be chosen to form an orthonormal set, so that \(B^{-1} = B^T\) (see the sketch below).
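
A minimal sketch in sympy, with an arbitrarily chosen symmetric positive definite matrix, illustrating both facts:

In [ ]:
from sympy import Matrix, eye, simplify

A = Matrix([[2, 1],
            [1, 2]])       # symmetric positive definite; eigenvalues should be 1 and 3

lams, vecs = [], []
for lam, mult, vs in A.eigenvects():
    for v in vs:
        lams.append(lam)
        vecs.append(v.normalized())    # unit-length eigenvectors

B = Matrix.hstack(*vecs)               # eigenvectors as columns
Lam = Matrix.diag(*lams)

print(lams)                                        # all eigenvalues positive
print((B.T * B).applyfunc(simplify) == eye(2))     # orthonormal eigenvectors: B^T B = I, so B^{-1} = B^T
print((B * Lam * B.T).applyfunc(simplify) == A)    # the eigendecomposition recovers A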

Calculating eigenvalues and eigenvectors

It is easy to see from the definition that if \(v\) is an eigenvector of an \(n\times n\) matrix \(A\) with eigenvalue \(\lambda\), then

\[(A - \lambda I)v = \bf{0}\]

where \(I\) is the identity matrix of dimension \(n\) and \(\bf{0}\) is an \(n\)-dimensional zero vector. This means that \(v \in \ker(A-\lambda I)\). For \(v\) to be non-trivial, the columns of \(A - \lambda I\) must be linearly dependent, and hence the determinant must be zero. Therefore, the eigenvalues of \(A\) satisfy:

\[\det\left(A-\lambda I\right)=0\]

The left-hand side above is a polynomial in \(\lambda\), and is called the characteristic polynomial of \(A\). Thus, to find the eigenvalues of \(A\), we find the roots of the characteristic polynomial.
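
A sketch of this procedure in sympy, with an arbitrarily chosen \(2 \times 2\) matrix:

In [ ]:
from sympy import Matrix, symbols, solve

A = Matrix([[2, 1],
            [1, 2]])                 # an arbitrary example matrix

lam = symbols('lam')
p = A.charpoly(lam).as_expr()        # the characteristic polynomial in lam
print(p)                             # expect lam**2 - 4*lam + 3
print(solve(p, lam))                 # its roots [1, 3] are the eigenvalues
print(A.eigenvals())                 # sympy's built-in agrees: {1: 1, 3: 1}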

Converting to the eigenvector basis

To convert from the standard basis (\(B\)) to the basis given by the eigenvectors (\(B'\)), we multiply by the inverse of the eigenvector matrix, \(V^{-1}\). When \(A\) is symmetric, the eigenvector matrix \(V\) can be chosen to be orthogonal, so that \(V^T = V^{-1}\). Given a matrix \(M\) whose columns are the new basis vectors, the new coordinates for a vector \(x\) are given by \(M^{-1}x\). So to change to the eigenvector basis (i.e. find the coordinates of the vector \(x\) with respect to the space spanned by the eigenvectors), we just multiply \(V^{-1}\) (which is \(V^T\) in the symmetric case) by \(x\).

Projection

We will look at a special type of linear transformation known as a projection, which is a map \(T\) such that \(T^2 = T\), also known as the idempotent property. As a simple example, suppose \(T: \mathbb{R}^2 \rightarrow \mathbb{R}^2\) is the transformation that just keeps the first coordinate and sets the second to zero. This is a projection onto the first basis vector and it is easy to see that the property \(T^2 = T\) holds. For example \(T(x, y) = (x, 0)\) and \(T(x, 0) = (x, 0)\). Since a projection is a linear map, we can write \(T\) as a projection matrix \(P\). For example, the projection matrix for the example is

\[\begin{split}P = \pmatrix{1 & 0 \\ 0 & 0}\end{split}\]

Recall from linear algebra that the projection of a vector \(x\) onto the subspace spanned by the columns of a matrix \(A\) (that is, the columns of \(A\) are a basis for the subspace) is given by \(A(A^TA)^{-1}A^Tx\). Note that \(A(A^TA)^{-1}A^T\) is just a matrix, so projection is a linear transformation (prove it). If the columns of \(A\) are orthonormal, the projection matrix simplifies to just \(AA^T\). The projection of \(x\) onto the column space of \(A\) is the vector in that subspace closest in distance to the original vector \(x\). This is the basis for using projections to solve over-determined linear systems.
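
A sketch checking the idempotent property and the orthogonality of the residual numerically (numpy, with arbitrarily chosen columns for \(A\)):

In [ ]:
import numpy as np

# The columns of A are a basis for a 2-dimensional subspace of R^3 (chosen arbitrarily)
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 1.0]])

P = A @ np.linalg.inv(A.T @ A) @ A.T   # projection matrix onto the column space of A
print(np.allclose(P @ P, P))           # idempotent: P^2 = P, so should print True

x = np.array([1.0, 2.0, 3.0])
r = x - P @ x                          # the residual is orthogonal to the column space
print(np.allclose(A.T @ r, 0))         # should print True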

Least squares solution for over-determined linear systems

When solving \(Ax = b\), there is only a solution if \(b\) is in the column space of \(A\), that is \(b \in C(A)\). When \(A\) is an \(n \times k\) matrix with \(n \gt k\), there is typically no solution, and we instead look for the vector \(x^*\) such that \(Ax^*\) is the vector in the column space of \(A\) closest to \(b\) - the least squares solution. One way to find it is minimization via calculus, but there is also an elegant linear algebra solution.

Consider \(Ax^* - b = \text{Proj}_{C(A)}b - b\). The right hand side lies in the orthogonal complement of \(C(A)\), which is \(N(A^T)\) - the left nullspace. This means that \(Ax^* - b \in N(A^T)\). So \(A^T(Ax^* - b) = 0\), and algebra gives us the desired result \(A^TAx^* = A^T b\), which can be solved for \(x^*\) in the usual way since \(A^TA\) is a matrix and \(A^Tb\) is a vector. Sometimes this is written as \(x^* = (A^TA)^{-1}A^Tb\), but that doesn’t mean you have to compute \((A^TA)^{-1}\) first.
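
A sketch comparing the normal-equations solution with numpy's built-in least squares routine, on an arbitrary over-determined system:

In [ ]:
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((10, 3))       # n = 10 equations, k = 3 unknowns
b = rng.standard_normal(10)

# Solve the normal equations A^T A x* = A^T b directly
x_star = np.linalg.solve(A.T @ A, A.T @ b)

# numpy's least squares solver should give the same answer
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.allclose(x_star, x_lstsq))    # should print True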
