
Matrices

You can think of a matrix as a rectangle of objects arranged in rows and columns, like a chessboard. For computer scientists, you can also imagine a matrix as a 2D array. A matrix with $m$ rows and $n$ columns is called an $m \times n$ matrix. The number of rows and columns therefore defines the size or dimensions of a matrix. The objects in a matrix are commonly referred to as elements or entries. A matrix containing only real-numbered elements is called a real matrix and can be defined as follows:

$$\boldsymbol{A} \in \mathbb{R}^{m \times n} = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}$$

The elements of a matrix are indexed by their row first and then their column. So $a_{ij}$, or sometimes also $(\boldsymbol{A})_{ij}$, is the element in the $i$th row and $j$th column. This means that a matrix can also be defined as $\boldsymbol{A} = [a_{ij}]_{1 \leq i \leq m, 1 \leq j \leq n}$.

Example

$$\begin{align*} \boldsymbol{A} = \begin{bmatrix}1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix} \\ \boldsymbol{B} = \begin{bmatrix}1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix} \end{align*}$$
Warning

The number of rows and columns of a matrix are always written in the order of rows first and columns second! So a matrix with 2 rows and 3 columns is written as a $2 \times 3$ matrix, not a $3 \times 2$ matrix, so $\boldsymbol{A} \in \mathbb{R}^{2 \times 3}$.
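To make the rows-first, columns-second convention concrete, here is a minimal sketch (my own illustration, not from the original text; the `Matrix` alias and the use of `std::vector` are assumptions) of a $2 \times 3$ matrix stored as a 2D array:

```cpp
#include <cstddef>
#include <iostream>
#include <vector>

// An m x n real matrix stored as a vector of m rows, each holding n columns.
using Matrix = std::vector<std::vector<double>>;

int main() {
    Matrix A = {{1, 2, 3},
                {4, 5, 6}};           // 2 x 3 matrix: 2 rows, 3 columns
    std::size_t m = A.size();         // number of rows, m = 2
    std::size_t n = A[0].size();      // number of columns, n = 3
    // a_{12} = 2 (indices are 0-based in code, 1-based in the maths notation)
    std::cout << m << "x" << n << " matrix, a_12 = " << A[0][1] << '\n';
    return 0;
}
```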

A matrix with only one row is called a row vector or n-tuple, and a matrix with only one column is called a column vector. When talking about vectors we usually refer to column vectors. A matrix that only has one row and one column can be thought of as a normal number and is called a scalar, so we omit the matrix brackets. To differentiate between the three types of matrices we usually use bold capital letters for matrices, bold lower-case letters for column vectors and normal lower-case letters for scalars.

Example

A column vector, i.e. our usual vector, can be written as follows:

$$\boldsymbol{v} \in \mathbb{R}^{n \times 1} = \begin{bmatrix}v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix}$$

A row vector can be written as follows:

$$\boldsymbol{v} \in \mathbb{R}^{1 \times n} = \begin{bmatrix}v_1 & v_2 & \cdots & v_n \end{bmatrix}$$

A scalar can be written as follows:

$$v \in \mathbb{R}^{1 \times 1} = \begin{bmatrix}v \end{bmatrix} = v$$

Square Matrix

As you can imagine, a square matrix is a matrix where the number of rows and columns are equal. The number of rows or columns $n$ is referred to as the order of the square matrix. So a square matrix of order $n$ is an $n \times n$ matrix, i.e. $\boldsymbol{A} \in \mathbb{R}^{n \times n}$. Square matrices have a number of useful properties and are therefore often used in many different applications.

$$\boldsymbol{A} \in \mathbb{R}^{n \times n} = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix}$$

Example

$$\boldsymbol{A} = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}$$

Null/Zero Matrix

If a matrix only contains elements that are zero it is called a zero or null matrix. A null matrix is denoted as $\boldsymbol{O}$, as the dimensions are usually implied by the context. If you want to be explicit you can also write $\boldsymbol{O}_{m \times n}$ where $m$ is the number of rows and $n$ is the number of columns.

$$\boldsymbol{O} = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}$$

If the matrix is a column vector we usually use the lower-case letter $\boldsymbol{o}$ to denote a null or zero vector.

$$\boldsymbol{o} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}$$

Diagonal Matrix

A diagonal matrix is a square matrix where all the elements that are not on the main diagonal are zero. The main diagonal is the diagonal that goes from the top left to the bottom right of the matrix, in other words where the row and column index are equal, i.e. $a_{ij}$ where $i = j$.

So a diagonal matrix is one where $a_{ij} = 0$ for all $i \neq j$ and $\boldsymbol{A} \in \mathbb{R}^{n \times n}$. If a diagonal matrix is defined by the elements $d_{11}, d_{22}, \ldots, d_{nn}$ then the matrix can be written as:

$$\boldsymbol{D} = \text{diag}(d_{11}, d_{22}, \ldots, d_{nn}) = \begin{bmatrix} d_{11} & 0 & \cdots & 0 \\ 0 & d_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & d_{nn} \end{bmatrix}$$

Example

$$\boldsymbol{D} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{bmatrix} = \text{diag}(1, 2, 3)$$

The definition of a diagonal matrix can also be extended to non-square matrices. In this case the matrix is still a diagonal matrix if all the elements that are not on the main diagonal are zero.

Example

The following non-square matrices are still diagonal matrices:

$$\boldsymbol{D}_1 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 4 & 0 \\ 0 & 0 & -3 \\ 0 & 0 & 0 \end{bmatrix} \text{ and } \boldsymbol{D}_2 = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 4 & 0 & 0 & 0 \\ 0 & 0 & -3 & 0 & 0 \end{bmatrix}$$

Identity Matrix

An identity matrix, sometimes also called a unit matrix or the identity of order $n$, is a square diagonal matrix where all the diagonal elements are equal to $1$. The identity matrix is often denoted as $\boldsymbol{I}$ or $\boldsymbol{I}_n$ where $n$ is the number of rows and columns in the matrix.

Example

$$\boldsymbol{I}_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} = \text{diag}(1, 1, 1)$$

Triangular Matrix

A triangular matrix is a square matrix where all the elements above or below the main diagonal are $0$. If all the elements above the diagonal are $0$ it is called a lower triangular matrix because only the elements on or below the diagonal can be non-zero:

$$(\boldsymbol{L})_{ij} = 0 \text{ for all } i < j$$

If all the elements below the diagonal are $0$ it is called an upper triangular matrix because only the elements on or above the diagonal can be non-zero:

$$(\boldsymbol{U})_{ij} = 0 \text{ for all } i > j$$
Example

A lower triangular matrix $\boldsymbol{L}$:

$$\boldsymbol{L} = \begin{bmatrix} 1 & 0 & 0 \\ 2 & 3 & 0 \\ 4 & 5 & 6 \end{bmatrix}$$

An upper triangular matrix $\boldsymbol{U}$:

$$\boldsymbol{U} = \begin{bmatrix} 1 & 2 & 3 \\ 0 & 4 & 5 \\ 0 & 0 & 6 \end{bmatrix}$$

Matrix Operations

Matrix Addition

Two matrices can be added together if they have the same dimensions, i.e. $\boldsymbol{A} \in \mathbb{R}^{m \times n}$ and $\boldsymbol{B} \in \mathbb{R}^{m \times n}$. The addition of two matrices is defined as the element-wise addition of the matrices:

$$\boldsymbol{A} + \boldsymbol{B} = \begin{bmatrix} a_{11} + b_{11} & a_{12} + b_{12} & \cdots & a_{1n} + b_{1n} \\ a_{21} + b_{21} & a_{22} + b_{22} & \cdots & a_{2n} + b_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} + b_{m1} & a_{m2} + b_{m2} & \cdots & a_{mn} + b_{mn} \end{bmatrix}$$

Or more formally:

$$(\boldsymbol{A} + \boldsymbol{B})_{ij} = (\boldsymbol{A})_{ij} + (\boldsymbol{B})_{ij}$$

Example

$$\begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix} + \begin{bmatrix} 7 & 8 & 9 \\ 10 & 11 & 12 \end{bmatrix} = \begin{bmatrix} 8 & 10 & 12 \\ 14 & 16 & 18 \end{bmatrix}$$
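As a minimal sketch of the element-wise definition above (assuming the same `Matrix` alias as in the earlier sketch, two matrices of matching dimensions and no error handling):

```cpp
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Element-wise addition of two m x n matrices: (A + B)_ij = A_ij + B_ij.
Matrix add(const Matrix& A, const Matrix& B) {
    Matrix C(A.size(), std::vector<double>(A[0].size()));
    for (std::size_t i = 0; i < A.size(); i++) {
        for (std::size_t j = 0; j < A[0].size(); j++) {
            C[i][j] = A[i][j] + B[i][j];
        }
    }
    return C;
}
```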

Additive Identity

As you would expect, adding the null matrix to any matrix results in the original matrix:

$$\boldsymbol{A} + \boldsymbol{O} = \boldsymbol{A}$$

Additive Inverse

We can also expect to find a matrix $\boldsymbol{B}$ that when added to $\boldsymbol{A}$ results in the null matrix:

$$\boldsymbol{A} + \boldsymbol{B} = \boldsymbol{O}$$

The matrix $\boldsymbol{B}$ is called the additive inverse of $\boldsymbol{A}$ and is denoted as $-\boldsymbol{A}$, where each element is the negative of the corresponding element of $\boldsymbol{A}$:

$$(-\boldsymbol{A})_{ij} = -(\boldsymbol{A})_{ij}$$

By using the additive inverse we can also subtract two matrices. The subtraction of two matrices is defined as the addition of the first matrix and the additive inverse of the second matrix:

$$\boldsymbol{A} - \boldsymbol{B} = \boldsymbol{A} + (-\boldsymbol{B})$$

Example

$$\boldsymbol{A} - \boldsymbol{B} = \begin{bmatrix} 7 & 8 & 9 \\ 10 & 11 & 12 \end{bmatrix} - \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix} = \begin{bmatrix} 7 - 1 & 8 - 2 & 9 - 3 \\ 10 - 4 & 11 - 5 & 12 - 6 \end{bmatrix} = \begin{bmatrix} 6 & 6 & 6 \\ 6 & 6 & 6 \end{bmatrix}$$

Scalar Multiplication

A matrix can be multiplied by a scalar, i.e. a constant. This is defined as the element-wise multiplication of the matrix with the scalar:

$$s \cdot \boldsymbol{A} = \begin{bmatrix} s \cdot a_{11} & s \cdot a_{12} & \cdots & s \cdot a_{1n} \\ s \cdot a_{21} & s \cdot a_{22} & \cdots & s \cdot a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ s \cdot a_{m1} & s \cdot a_{m2} & \cdots & s \cdot a_{mn} \end{bmatrix}$$

Or more formally:

$$(s \cdot \boldsymbol{A})_{ij} = s \cdot (\boldsymbol{A})_{ij}$$

Example

$$5 \cdot \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix} = \begin{bmatrix} 5 \cdot 1 & 5 \cdot 2 & 5 \cdot 3 \\ 5 \cdot 4 & 5 \cdot 5 & 5 \cdot 6 \end{bmatrix} = \begin{bmatrix} 5 & 10 & 15 \\ 20 & 25 & 30 \end{bmatrix}$$
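Scalar multiplication follows the same element-wise pattern; a short sketch under the same assumptions as the addition example (hypothetical helper `scale`, not from the original text):

```cpp
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Scalar multiplication: (s * A)_ij = s * A_ij.
Matrix scale(double s, const Matrix& A) {
    Matrix C = A;                 // copy A, then scale every element in place
    for (auto& row : C) {
        for (auto& x : row) {
            x *= s;
        }
    }
    return C;
}
```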

Matrix Multiplication

Matrix multiplication is a bit more complex than addition and scalar multiplication. One might think that multiplying two matrices would be as simple as multiplying each element of the first matrix with the corresponding element of the second matrix, and that it would therefore, just like addition, be an element-wise operation only defined for matrices with the same dimensions. However, this is not the case; that would be the Hadamard product of two matrices. The reason for the more complex definition is the relationship that matrix multiplication has with linear transformations, to which we will come later.

Matrix multiplication is only defined for matrices where the number of columns in the first matrix is equal to the number of rows in the second matrix. We will see why this is the case later. The result of multiplying two matrices together is a new matrix where the number of rows is equal to the number of rows in the first matrix and the number of columns is equal to the number of columns in the second matrix. So the dimensions in a matrix multiplication are defined as follows:

$$\boldsymbol{A} \in \mathbb{R}^{\color{red}{m} \times \color{blue}{n}} \text{ and } \boldsymbol{B} \in \mathbb{R}^{\color{blue}{n} \times \color{green}{p}} \Rightarrow \boldsymbol{C} = \boldsymbol{A} \cdot \boldsymbol{B} \in \mathbb{R}^{\color{red}{m} \times \color{green}{p}}$$
Dimensions of a matrix multiplication visualized.

The actual calculation of the elements in the resulting matrix is a bit more involved. The element in the $i$-th row and $j$-th column of the resulting matrix is defined as the sum of the products of the elements in the $i$-th row of the first matrix and the $j$-th column of the second matrix. So the element $c_{ij}$ in the resulting matrix is calculated as follows:

$$c_{ij} = a_{i1} \cdot b_{1j} + a_{i2} \cdot b_{2j} + \cdots + a_{in} \cdot b_{nj} = \sum_{k=1}^n a_{ik} \cdot b_{kj}$$

or more formally:

$$(\boldsymbol{C})_{ij} = (\boldsymbol{A} \cdot \boldsymbol{B})_{ij} = \sum_{k=1}^n (\boldsymbol{A})_{ik} \cdot (\boldsymbol{B})_{kj} = \sum_{k=1}^n a_{ik} \cdot b_{kj}$$
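The summation formula translates directly into three nested loops. A minimal sketch (assuming the same `Matrix` alias as in the earlier examples and that the number of columns of `A` equals the number of rows of `B`):

```cpp
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// C = A * B for A (m x n) and B (n x p): c_ij = sum over k of a_ik * b_kj.
Matrix multiply(const Matrix& A, const Matrix& B) {
    std::size_t m = A.size(), n = B.size(), p = B[0].size();
    Matrix C(m, std::vector<double>(p, 0.0));   // m x p result, initialised to zero
    for (std::size_t i = 0; i < m; i++) {
        for (std::size_t j = 0; j < p; j++) {
            for (std::size_t k = 0; k < n; k++) {
                C[i][j] += A[i][k] * B[k][j];
            }
        }
    }
    return C;
}
```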

There are three different ways to think about matrix multiplication: the element view, the row view and the column view. The first way is to think of how a single element in the resulting matrix is calculated. A single element in row $i$ and column $j$ is calculated as the sum of the products of the elements in the $i$-th row of the first matrix and the $j$-th column of the second matrix. Later on you will see that this is the dot product of the $i$-th row and the $j$-th column as vectors.

Element-wise view of matrix multiplication.
Example

$$\begin{align*} \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \begin{bmatrix} a & b \\ c & d \end{bmatrix} &= \begin{bmatrix} 1a + 2c & \quad\quad \\ \quad\quad & \quad\quad \\ \end{bmatrix} \\ &= \begin{bmatrix} 1a + 2c & 1b + 2d \\ 3a + 4c & 3b + 4d \end{bmatrix} \end{align*}$$

The second way to think about matrix multiplication is to see how a column in the resulting matrix is calculated. The first column in the resulting matrix corresponds to the linear combination of the columns in the left matrix where the weights are the elements in the first column of the right matrix. This pattern carries on for the other columns in the resulting matrix.

Column-wise view of matrix multiplication.
Example

$$\begin{align*} \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \begin{bmatrix} a & b \\ c & d \end{bmatrix} &= \begin{bmatrix} a \begin{bmatrix} 1 \\ 3 \end{bmatrix} + c \begin{bmatrix} 2 \\ 4 \end{bmatrix} & \quad\quad\quad\quad\quad \\ \end{bmatrix} \\ &= \begin{bmatrix} a \begin{bmatrix} 1 \\ 3 \end{bmatrix} + c \begin{bmatrix} 2 \\ 4 \end{bmatrix} & b \begin{bmatrix} 1 \\ 3 \end{bmatrix} + d \begin{bmatrix} 2 \\ 4 \end{bmatrix} \end{bmatrix} = \begin{bmatrix} 1a + 2c & 1b + 2d \\ 3a + 4c & 3b + 4d \end{bmatrix} \end{align*}$$

The third way to think about matrix multiplication is to see how a row in the resulting matrix is calculated. The first row in the resulting matrix corresponds to the linear combination of the rows in the right matrix where the weights are the elements in the first row of the left matrix. This pattern carries on for the other rows in the resulting matrix.

Row-wise view of matrix multiplication.
Example

$$\begin{align*} \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \begin{bmatrix} a & b \\ c & d \end{bmatrix} &= \begin{bmatrix} 1 \begin{bmatrix} a & b \end{bmatrix} + 2 \begin{bmatrix} c & d \end{bmatrix} \\ \quad\quad\quad\quad\quad \end{bmatrix} \\ &= \begin{bmatrix} 1 \begin{bmatrix} a & b \end{bmatrix} + 2 \begin{bmatrix} c & d \end{bmatrix} \\ 3 \begin{bmatrix} a & b \end{bmatrix} + 4 \begin{bmatrix} c & d \end{bmatrix} \end{bmatrix} = \begin{bmatrix} 1a + 2c & 1b + 2d \\ 3a + 4c & 3b + 4d \end{bmatrix} \end{align*}$$

Commutative Property

Matrix multiplication is not commutative, so in general:

$$\boldsymbol{A} \cdot \boldsymbol{B} \neq \boldsymbol{B} \cdot \boldsymbol{A}$$

This means that the order in which you multiply is important! This already becomes apparent when you look at the dimensions of the matrices. If you multiply a $2 \times 3$ matrix with a $3 \times 2$ matrix you get a $2 \times 2$ matrix. However, if you multiply a $3 \times 2$ matrix with a $2 \times 3$ matrix you get a $3 \times 3$ matrix.

Example

$$\begin{align*} \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix} \cdot \begin{bmatrix} 7 & 8 \\ 9 & 10 \\ 11 & 12 \end{bmatrix} &= \begin{bmatrix} 58 & 64 \\ 139 & 154 \end{bmatrix} \\ \begin{bmatrix} 7 & 8 \\ 9 & 10 \\ 11 & 12 \end{bmatrix} \cdot \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix} &= \begin{bmatrix} 39 & 54 & 69 \\ 49 & 68 & 87 \\ 59 & 82 & 105 \end{bmatrix} \end{align*}$$

Even if the dimensions of the matrices are the same the result of the matrix multiplication can be different depending on the order of the matrices.

$$\begin{align*} \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \cdot \begin{bmatrix} 5 & 6 \\ 7 & 8 \end{bmatrix} &= \begin{bmatrix} 19 & 22 \\ 43 & 50 \end{bmatrix} \\ \begin{bmatrix} 5 & 6 \\ 7 & 8 \end{bmatrix} \cdot \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} &= \begin{bmatrix} 23 & 34 \\ 31 & 46 \end{bmatrix} \end{align*}$$

For the element $c_{11}$ we can clearly see why the order of the matrices is important:

$$\begin{align*} c_{11} &= 1 \cdot 5 + 2 \cdot 7 = 19 \\ c_{11} &= 5 \cdot 1 + 6 \cdot 3 = 23 \\ 19 &\neq 23 \end{align*}$$

There are however some special matrices that do commute with each other. For such commuting matrices:

$$\boldsymbol{A} \cdot \boldsymbol{B} = \boldsymbol{B} \cdot \boldsymbol{A}$$
Example

For the three matrices $\boldsymbol{A}$, $\boldsymbol{B}$ and $\boldsymbol{C}$:

$$\boldsymbol{A} = \begin{bmatrix} 2 & 6 \\ 1 & 7 \end{bmatrix} \quad \boldsymbol{B} = \begin{bmatrix} -3 & -1 \\ 2 & 1 \end{bmatrix} \quad \boldsymbol{C} = \begin{bmatrix} 15 & 6 \\ 1 & 20 \end{bmatrix}$$

We can calculate the following:

$$\begin{align*} \boldsymbol{AB} = \begin{bmatrix} 6 & 4 \\ 11 & 6 \end{bmatrix} \quad \boldsymbol{BA} = \begin{bmatrix} -7 & -25 \\ 5 & 19 \end{bmatrix} \\ \boldsymbol{AC} = \begin{bmatrix} 36 & 132 \\ 22 & 146 \end{bmatrix} \quad \boldsymbol{CA} = \begin{bmatrix} 36 & 132 \\ 22 & 146 \end{bmatrix} \end{align*}$$

We can see that the matrices $\boldsymbol{A}$ and $\boldsymbol{C}$ commute with each other, while $\boldsymbol{A}$ and $\boldsymbol{B}$ do not:

$$\boldsymbol{A} \cdot \boldsymbol{C} = \boldsymbol{C} \cdot \boldsymbol{A} \quad \text{and} \quad \boldsymbol{A} \cdot \boldsymbol{B} \neq \boldsymbol{B} \cdot \boldsymbol{A}$$

Associative Property

Matrix multiplication is associative so:

$$(\boldsymbol{A} \cdot \boldsymbol{B}) \cdot \boldsymbol{C} = \boldsymbol{A} \cdot (\boldsymbol{B} \cdot \boldsymbol{C})$$

This means that the order in which you group the matrices in a multiplication does not matter. However, this does not mean that the order in which you multiply the matrices does not matter; as we have seen before, matrix multiplication is not commutative.

The associative property is very useful for linear transformations as it allows you to combine multiple transformations into a single transformation.

Proof

We have the following matrices:

$$\boldsymbol{A} \in \mathbb{R}^{m \times n} \quad \boldsymbol{B} \in \mathbb{R}^{n \times p} \quad \boldsymbol{C} \in \mathbb{R}^{p \times q}$$

Then we can see that the following two equations become the same:

$$\begin{align*} ((\boldsymbol{A} \boldsymbol{B}) \boldsymbol{C})_{ik} &= \sum_{l=1}^p (\boldsymbol{A} \boldsymbol{B})_{il} c_{lk} = \sum_{l=1}^p \sum_{j=1}^n a_{ij} b_{jl} c_{lk} \\ (\boldsymbol{A} (\boldsymbol{B} \boldsymbol{C}))_{ik} &= \sum_{j=1}^n a_{ij} (\boldsymbol{B} \boldsymbol{C})_{jk} = \sum_{j=1}^n \sum_{l=1}^p a_{ij} b_{jl} c_{lk} \end{align*}$$

Distributive Property

Matrix multiplication is left distributive and right distributive over addition, so:

$$\begin{align*} \boldsymbol{A} \cdot (\boldsymbol{B} + \boldsymbol{C}) = \boldsymbol{A} \cdot \boldsymbol{B} + \boldsymbol{A} \cdot \boldsymbol{C} \\ (\boldsymbol{A} + \boldsymbol{B}) \cdot \boldsymbol{C} = \boldsymbol{A} \cdot \boldsymbol{C} + \boldsymbol{B} \cdot \boldsymbol{C} \end{align*}$$
Proof

For the matrices $\boldsymbol{A}$, $\boldsymbol{B}$ and $\boldsymbol{C}$:

$$\boldsymbol{A} \in \mathbb{R}^{m \times n} \quad \boldsymbol{B} \in \mathbb{R}^{m \times n} \quad \boldsymbol{C} \in \mathbb{R}^{n \times p}$$

Then we can see that both sides give the same elements:

$$\begin{align*} ((\boldsymbol{A} + \boldsymbol{B}) \cdot \boldsymbol{C})_{ij} &= \sum_{k=1}^n (\boldsymbol{A} + \boldsymbol{B})_{ik} c_{kj} \\ &= \sum_{k=1}^n (a_{ik} + b_{ik}) c_{kj} = \sum_{k=1}^n (a_{ik} c_{kj} + b_{ik} c_{kj}) = \sum_{k=1}^n a_{ik} c_{kj} + \sum_{k=1}^n b_{ik} c_{kj} \\ &= (\boldsymbol{A} \cdot \boldsymbol{C})_{ij} + (\boldsymbol{B} \cdot \boldsymbol{C})_{ij} \end{align*}$$

Multiplicative Identity

With regards to matrix multiplication the identity matrix is a special matrix: it functions as the multiplicative identity, also called the neutral element, of matrix multiplication. This means that if you multiply any matrix with the identity matrix you get the same matrix back. The identity matrix also commutes with every matrix:

$$\boldsymbol{I} \cdot \boldsymbol{A} = \boldsymbol{A} \cdot \boldsymbol{I} = \boldsymbol{A}$$

The reason why the identity matrix functions as the multiplicative identity can quite easily be seen if you think of the first row of the identity matrix as selecting all the elements in the first row of the second matrix. The second row of the identity matrix then selects all the elements in the second row of the second matrix, and so on. So the 1 in each row/column of the identity matrix can be thought of as selecting the corresponding row/column of the other matrix.

Example

$$\boldsymbol{I} \cdot \boldsymbol{A} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \cdot \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix} = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}$$

The same is true for the other way around:

$$\boldsymbol{A} \cdot \boldsymbol{I} = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix} \cdot \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}$$

Permutation Matrix

We have seen that the identity matrix is a special matrix that functions as the multiplicative identity for matrix multiplication and can be thought of as selecting the corresponding rows/columns of another matrix. A permutation matrix, commonly denoted as $\boldsymbol{P}$, is a matrix that can be used to permute the rows or columns of another matrix. A permutation matrix is a square matrix where each row and column contains exactly one 1 and all the other elements are 0. Depending on whether you want to permute the rows or the columns you use the permutation matrix as the left or right matrix in the multiplication.

To permute the rows of a matrix you can use the permutation matrix as the left matrix in the multiplication. By swapping the rows of the identity matrix you can permute the rows of the matrix.

Example

We can swap the first and second row of the matrix $\boldsymbol{A}$:

$$\boldsymbol{P} \boldsymbol{A} = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix} = \begin{bmatrix} 4 & 5 & 6 \\ 1 & 2 & 3 \\ 7 & 8 & 9 \end{bmatrix}$$

To permute the columns of a matrix you can use the permutation matrix as the right matrix in the multiplication. By swapping the columns of the identity matrix you can permute the columns of the matrix.

Example

We can swap the first and second column of the matrix $\boldsymbol{A}$:

$$\boldsymbol{A} \boldsymbol{P} = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix} \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 2 & 1 & 3 \\ 5 & 4 & 6 \\ 8 & 7 & 9 \end{bmatrix}$$
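In code you rarely build the permutation matrix explicitly; the effect of multiplying by $\boldsymbol{P}$ from the left is simply a reordering of rows. A small sketch of this (my own illustration; the helper `permuteRows` and the index-array representation of the permutation are assumptions):

```cpp
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Row permutation: row i of the result is row perm[i] of A.
// This is equivalent to P * A where P has a 1 at position (i, perm[i]).
Matrix permuteRows(const Matrix& A, const std::vector<std::size_t>& perm) {
    Matrix B(A.size());
    for (std::size_t i = 0; i < A.size(); i++) {
        B[i] = A[perm[i]];
    }
    return B;
}

// Example: perm = {1, 0, 2} swaps the first and second row, as in the example above.
```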

Strassen's Algorithm

Winograd's Algorithm

Transpose

The transpose of a matrix is the matrix where the rows and columns are swapped. The transpose of the matrix $\boldsymbol{A}$ is written as $\boldsymbol{A}^T$. Formally, the element in the $i$-th row and $j$-th column of $\boldsymbol{A}$ becomes the element in the $j$-th row and $i$-th column of $\boldsymbol{A}^T$.

$$\boldsymbol{A} = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix} \Rightarrow \boldsymbol{A}^T = \begin{bmatrix} a_{11} & a_{21} & \cdots & a_{m1} \\ a_{12} & a_{22} & \cdots & a_{m2} \\ \vdots & \vdots & \ddots & \vdots \\ a_{1n} & a_{2n} & \cdots & a_{mn} \end{bmatrix}$$

Or also more formally:

$$(\boldsymbol{A}^T)_{ij} = (\boldsymbol{A})_{ji}$$

This results in the transpose of a row vector being a column vector and the transpose of a column vector being a row vector. Likewise, the transpose of a matrix is a matrix with its dimensions swapped, so the transpose of an $m \times n$ matrix is an $n \times m$ matrix.

Example

Transposing a matrix:

$$\boldsymbol{A} = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix} \Rightarrow \boldsymbol{A}^T = \begin{bmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{bmatrix}$$

Transposing a column vector is very useful when you want to write vectors in a readable way in text, as you can see below. Most commonly in textbooks, when a vector is defined it is defined as a transposed column vector:

$$\boldsymbol{v} = \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix} \Rightarrow \boldsymbol{v}^T = \begin{bmatrix} v_1 & v_2 & \cdots & v_n \end{bmatrix}$$

Transposing a row vector:

$$\boldsymbol{v} = \begin{bmatrix} v_1 & v_2 & \cdots & v_n \end{bmatrix} \Rightarrow \boldsymbol{v}^T = \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix}$$

Properties of Transpose

There are a few useful properties of the transpose of a matrix that are worth remembering. The first is that transposing a matrix twice gives back the original matrix:

$$(\boldsymbol{A}^T)^T = \boldsymbol{A}$$

Example

$$\boldsymbol{A} = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix} \Rightarrow \boldsymbol{A}^T = \begin{bmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{bmatrix} \Rightarrow (\boldsymbol{A}^T)^T = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix} = \boldsymbol{A}$$

Another important property is that when you apply a transpose to an expression in brackets, the order of the matrices is reversed. For addition this does not matter as it is commutative, but for multiplication it does matter:

$$\begin{align*} (\boldsymbol{A} + \boldsymbol{B})^T &= \boldsymbol{A}^T + \boldsymbol{B}^T \\ (\boldsymbol{A} \cdot \boldsymbol{B})^T &= \boldsymbol{B}^T \cdot \boldsymbol{A}^T \end{align*}$$

Symmetric Matrix

If a matrix is equal to its transpose it is called a symmetric matrix. So $\boldsymbol{A} = \boldsymbol{A}^T$ and for each element of the matrix $a_{ij} = a_{ji}$. As you can quite easily imagine, a prerequisite for a matrix to be symmetric is that it is a square matrix, as otherwise the transpose of the matrix would have different dimensions.

Another thing that makes sense is that a diagonal matrix is always symmetric as all the elements that are not on the diagonal are zero.

Example

A symmetric matrix:

$$\boldsymbol{A} = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 4 & 5 \\ 3 & 5 & 6 \end{bmatrix} \Rightarrow \boldsymbol{A}^T = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 4 & 5 \\ 3 & 5 & 6 \end{bmatrix} = \boldsymbol{A}$$

A diagonal matrix is also symmetric:

$$\boldsymbol{A} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{bmatrix} \Rightarrow \boldsymbol{A}^T = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{bmatrix} = \boldsymbol{A}$$
Skew-Symmetric Matrix

A skew-symmetric matrix is a matrix where the elements of the matrix are equal to the negative of the elements in the transpose of the matrix. So $\boldsymbol{A} = -\boldsymbol{A}^T$ and for each element of the matrix $a_{ij} = -a_{ji}$.

Example

$$\boldsymbol{A} = \begin{bmatrix} 0 & 2 & -3 \\ -2 & 0 & 5 \\ 3 & -5 & 0 \end{bmatrix} \Rightarrow \boldsymbol{A}^T = \begin{bmatrix} 0 & -2 & 3 \\ 2 & 0 & -5 \\ -3 & 5 & 0 \end{bmatrix} = -\boldsymbol{A}$$

Transposing on a Computer

When you want the transpose of a matrix you don't actually need to perform any operations. You can just change the way you access the elements of the matrix. Rather than accessing the elements row by row you can access them column by column.

Reading a matrix and its transpose.

Depending on the size of the matrix and how many times you need to access the elements of the matrix this can be a lot faster than actually transposing the matrix. However, if you need to access the elements of the matrix multiple times it is probably faster to transpose the matrix first and then access the elements due to memory locality.

To transpose a square matrix in-place you can use the following algorithm which you can think of as swapping the elements of the matrix along the diagonal:

// In-place transpose of a square n x n matrix A: swap each element
// above the diagonal with its mirror image below the diagonal.
for (int i = 0; i < n; i++) {
    for (int j = i + 1; j < n; j++) {
        std::swap(A[i][j], A[j][i]); // std::swap from <utility>
    }
}

For a non-square matrix you need to use a slightly more complex algorithm.
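As a minimal sketch of the simpler out-of-place approach for a non-square matrix (assuming the `Matrix` alias from the earlier sketches; a true in-place transpose of a non-square matrix stored in one flat array requires following permutation cycles and is not shown here):

```cpp
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Out-of-place transpose of an m x n matrix: allocate an n x m result
// and copy element (i, j) of A into element (j, i) of the result.
Matrix transpose(const Matrix& A) {
    std::size_t m = A.size(), n = A.empty() ? 0 : A[0].size();
    Matrix T(n, std::vector<double>(m));
    for (std::size_t i = 0; i < m; i++) {
        for (std::size_t j = 0; j < n; j++) {
            T[j][i] = A[i][j];
        }
    }
    return T;
}
```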

Inverse

The inverse of a matrix is a very important and special matrix that, when multiplied with the original matrix, results in the identity matrix. The inverse of a matrix $\boldsymbol{A}$ is denoted as $\boldsymbol{A}^{-1}$.

$$\boldsymbol{A} \cdot \boldsymbol{A}^{-1} = \boldsymbol{A}^{-1} \cdot \boldsymbol{A} = \boldsymbol{I}$$

Not all matrices have an inverse. A matrix that has an inverse is called a regular/invertible/non-singular matrix. A matrix that does not have an inverse is called a degenerate/singular/non-invertible matrix. If a matrix is invertible then its inverse is unique, i.e. there is only one inverse for a matrix. For now we will not discuss how to calculate the inverse of a matrix or when a matrix is invertible, but we will do so later on in the corresponding sections on matrix rank, determinants and matrix inversion.

Similarly to the transpose of a matrix, the inverse of a matrix has some properties that are worth remembering:

$$\begin{align*} (\boldsymbol{A}^{-1})^{-1} &= \boldsymbol{A} \\ (\boldsymbol{A} \cdot \boldsymbol{B})^{-1} &= \boldsymbol{B}^{-1} \cdot \boldsymbol{A}^{-1} \\ (\boldsymbol{A} + \boldsymbol{B})^{-1} &\neq \boldsymbol{A}^{-1} + \boldsymbol{B}^{-1} \end{align*}$$
Example

The following matrix is invertible:

$$\boldsymbol{A} = \begin{bmatrix} 1 & 2 & 1 \\ 4 & 4 & 5 \\ 6 & 7 & 7 \end{bmatrix} \quad \text{with} \quad \boldsymbol{A}^{-1} = \begin{bmatrix} -7 & -7 & 6 \\ 2 & 1 & -1 \\ 4 & 5 & -4 \end{bmatrix}$$

The following simple matrix is not invertible:

$$\boldsymbol{A} = \begin{bmatrix} 2 & 4 \\ 2 & 4 \end{bmatrix}$$
Todo

Parts below here are still a work in progress and might not belong here.

Frobenius Norm

The Frobenius norm is a way to measure the size of a matrix. It is defined as the square root of the sum of the squares of all the elements in the matrix. So for a matrix $\boldsymbol{A} \in \mathbb{R}^{m \times n}$ the Frobenius norm is defined as follows:

$$\|\boldsymbol{A}\|_F = \sqrt{\sum_{i=1}^m \sum_{j=1}^n a_{ij}^2}$$

You can also think of it as just taking the matrix and flattening it into a vector and then calculating the length of that vector, i.e. the Euclidean/L2 norm of the vector.

Example

If we define the matrix $\boldsymbol{A}$:

$$\boldsymbol{A} = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix}$$

Then the Frobenius norm of $\boldsymbol{A}$ is:

$$\|\boldsymbol{A}\|_F = \sqrt{1^2 + 2^2 + 3^2 + 4^2 + 5^2 + 6^2} = \sqrt{91} \approx 9.539$$
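A minimal sketch of computing the Frobenius norm (same `Matrix` alias and assumptions as in the earlier sketches):

```cpp
#include <cmath>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Frobenius norm: square root of the sum of the squares of all elements.
double frobeniusNorm(const Matrix& A) {
    double sum = 0.0;
    for (const auto& row : A) {
        for (double x : row) {
            sum += x * x;
        }
    }
    return std::sqrt(sum);  // e.g. sqrt(91) ≈ 9.539 for the matrix above
}
```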

Trace

The trace of a matrix is the sum of all the diagonal elements of the matrix and is denoted as $\text{tr}(\boldsymbol{A})$. Because it is the sum of the diagonal elements it is only defined for square matrices, i.e. $\boldsymbol{A} \in \mathbb{R}^{n \times n}$:

$$\text{tr}(\boldsymbol{A}) = \sum_{i=1}^n a_{ii} = a_{11} + a_{22} + \cdots + a_{nn}$$

Example

$$\text{tr}\left(\begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}\right) = 1 + 5 + 9 = 15$$
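A short sketch of the trace under the same assumptions (square matrix stored as a 2D array, as in the earlier sketches):

```cpp
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Trace of a square n x n matrix: the sum of the diagonal elements a_ii.
double trace(const Matrix& A) {
    double sum = 0.0;
    for (std::size_t i = 0; i < A.size(); i++) {
        sum += A[i][i];
    }
    return sum;  // e.g. 1 + 5 + 9 = 15 for the example above
}
```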
Todo

Add properties of the trace and a proof that it is the sum of the eigenvalues.

Orthogonal / Orthonormal Matrix

Very unclear what the difference is between these two. I think an orthogonal matrix is a matrix where the columns are orthogonal to each other but don't have to be normalised. And an orthonormal matrix is a matrix where the columns are orthogonal to each other and are normalised, i.e. have a length of $1$.

Todo

This section is still a work in progress!

Householder Matrix

Block Matrix

Bi- and Tri-Diagonal Matrix

Band Matrix

Toeplitz and Hankel Matrix