
Vectors

We have already seen that vectors are just special types of matrices with either one row or one column, most commonly one column. However, depending on the field you are working in, a vector can mean many different things. In computer science, a vector is just a list of numbers, i.e. a 1D array or n-tuple. The elements of a vector are more commonly called components rather than elements.

In maths, a vector can be thought of as an arrow in space that has a starting and an ending point, also referred to as the tail and the head of the vector. Most commonly vectors are denoted by a lowercase letter, either in bold or with an arrow above it, e.g. $\vec{v}$ or $\boldsymbol{v}$. If the vector is defined by two coordinate points $A$ and $B$, then it is denoted as $\vec{AB}$. Usually the starting point of the vector is placed at the origin, $(0, 0)$ in 2D space or $(0, 0, 0)$ in 3D space etc., making the head of the vector the point $(x, y)$ or $(x, y, z)$ etc. Such a vector is called a position vector and is said to be in standard position. However, it is important to note that a vector is independent of its starting point, i.e. the vector is the same no matter where it starts; only the direction and length of the vector matter.

We can see that the vectors defined from the origin are equivalent to the point coordinates.

Vectors are easily visualized in 2D and 3D space, but can be extended to any number of dimensions. If we define that all vectors have the same starting point, they can be uniquely defined by their ending point, which is the same as giving their direction and length. The length of the vector is also called the magnitude.

This is also in line with vectors in physics, where vectors are used to represent physical quantities that have both magnitude and direction such as velocity, force, acceleration, etc. For example, a force vector indicates both the magnitude of the force and the direction in which it is applied. Unlike position vectors, these vectors do not necessarily have a fixed starting point at the origin. Instead, they can be applied at any point in space, and their effects are determined by their magnitude and direction. The length of the vector represents the magnitude of the physical quantity, and the direction of the arrow indicates the direction in which the quantity acts.

Vectors being used to denote force and acceleration of a car.

Vector Addition

To add two vectors together, we simply add the corresponding components of the vectors together, i.e. we just add element-wise. This also means that the two vectors must have the same number of components/dimensions. This is equivalent to matrix addition. So we can add two vectors $\boldsymbol{x} \in \mathbb{R}^n$ and $\boldsymbol{y} \in \mathbb{R}^n$ as follows:

$$\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} + \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} = \begin{bmatrix} x_1 + y_1 \\ x_2 + y_2 \\ \vdots \\ x_n + y_n \end{bmatrix}$$
Example

$$\begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} + \begin{bmatrix} 4 \\ 5 \\ 6 \end{bmatrix} = \begin{bmatrix} 1 + 4 \\ 2 + 5 \\ 3 + 6 \end{bmatrix} = \begin{bmatrix} 5 \\ 7 \\ 9 \end{bmatrix}$$

We can also visualize vector addition nicely in 2D and 3D space with position vectors. Geometrically we can think of vector addition as moving the tail of the second vector to the head of the first vector. The resulting vector is the vector that starts at the tail of the first vector and ends at the head of the second vector. This also results in the two vectors forming the sides of a parallelogram.

Vector addition in 2D.

Scalar Multiplication

We can multiply a vector by a scalar, i.e. a number, by multiplying each component of the vector by the scalar. So if we have a vector $\boldsymbol{x}$ and a scalar $s$, then we can multiply them together just like when we multiply a matrix by a scalar:

$$s\boldsymbol{x} = s \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} sx_1 \\ sx_2 \\ \vdots \\ sx_n \end{bmatrix}$$
Example

$$2\begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} = \begin{bmatrix} 2 \cdot 1 \\ 2 \cdot 2 \\ 2 \cdot 3 \end{bmatrix} = \begin{bmatrix} 2 \\ 4 \\ 6 \end{bmatrix}$$

We can also visualize scalar multiplication nicely in 2D and 3D space. Geometrically we can think of scalar multiplication as stretching or shrinking the vector by the scalar. This is why scalar multiplication is also called vector scaling and the number is called the scalar. If the scalar is negative, then the vector will be flipped around, i.e. it will point in the opposite direction.

Scalar multiplication of a vector in 2D.

Subtraction

Subtracting two vectors is the same as adding the first vector to the negative of the second vector, i.e. multiplying the second vector by $-1$. So if we have two vectors we can subtract them as follows:

$$\boldsymbol{x} - \boldsymbol{y} = \boldsymbol{x} + (-\boldsymbol{y})$$
Example

$$\begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} - \begin{bmatrix} 4 \\ 5 \\ 6 \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} + \begin{bmatrix} -4 \\ -5 \\ -6 \end{bmatrix} = \begin{bmatrix} 1 + (-4) \\ 2 + (-5) \\ 3 + (-6) \end{bmatrix} = \begin{bmatrix} -3 \\ -3 \\ -3 \end{bmatrix}$$
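
These element-wise operations map directly onto arrays in code. A minimal sketch, assuming NumPy is available:

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([4, 5, 6])

print(x + y)   # [5 7 9]     element-wise addition
print(2 * x)   # [2 4 6]     scalar multiplication
print(x - y)   # [-3 -3 -3]  subtraction, same as x + (-1) * y
```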

When visualizing the subtraction of two vectors, we can think of it as adding the negative of the second vector to the first vector, i.e. flipping the second vector and then moving its tail to the head of the first vector.

Vector subtraction in 2D.

Another geometric interpretation of vector subtraction is that the resulting vector points from the head of the second vector to the head of the first vector, after moving the tail of the second vector to the tail of the first vector. From this interpretation we can clearly see that the subtraction of two vectors is not commutative, i.e. the order in which we subtract the vectors matters, as the resulting vector will point in the opposite direction, just like in normal subtraction where $1 - 2 = -1$ and $2 - 1 = 1$. If you think of $\boldsymbol{b} - \boldsymbol{a}$ as the vector $\boldsymbol{c}$, then you can also see that $\boldsymbol{a} + \boldsymbol{c} = \boldsymbol{b}$, and after rewriting the equation we get $\boldsymbol{c} = \boldsymbol{b} - \boldsymbol{a}$ visually.

The geometric interpretation of vector subtraction.

Linear Combination

If we combine the concepts of vector addition and scalar multiplication, we get the concept of a linear combination. A linear combination of vectors is the sum of the vectors scaled by some scalars. So if we have a set of vectors $\boldsymbol{v}_1, \boldsymbol{v}_2, \dots, \boldsymbol{v}_n$ and a set of scalars $s_1, s_2, \dots, s_n$, then we can combine them as follows:

$$s_1\boldsymbol{v}_1 + s_2\boldsymbol{v}_2 + \dots + s_n\boldsymbol{v}_n = \sum_{i=1}^n s_i\boldsymbol{v}_i = \boldsymbol{x}$$

The scalars $s_1, s_2, \dots, s_n$ are the weights of the vectors, i.e. they determine how much each vector contributes to the resulting vector.

Example

If $\boldsymbol{v}$ and $\boldsymbol{w}$ are defined as:

$$\boldsymbol{v} = \begin{bmatrix} 2 \\ 3 \end{bmatrix} \quad \text{and} \quad \boldsymbol{w} = \begin{bmatrix} 3 \\ -1 \end{bmatrix}$$

We can combine them as follows:

$$2\boldsymbol{v} + (-1)\boldsymbol{w} = 2\begin{bmatrix} 2 \\ 3 \end{bmatrix} + (-1)\begin{bmatrix} 3 \\ -1 \end{bmatrix} = \begin{bmatrix} 4 \\ 6 \end{bmatrix} + \begin{bmatrix} -3 \\ 1 \end{bmatrix} = \begin{bmatrix} 1 \\ 7 \end{bmatrix}$$
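
The same combination as a quick sketch in NumPy (an assumption on my part, not part of the original notes), with the weights kept in a separate list:

```python
import numpy as np

v = np.array([2, 3])
w = np.array([3, -1])

# 2*v + (-1)*w as a linear combination with weights s = (2, -1)
s = [2, -1]
vectors = [v, w]
x = sum(s_i * v_i for s_i, v_i in zip(s, vectors))
print(x)  # [1 7]
```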

Linear combinations are the basis (hehe) of linear algebra and are used to define many more complex concepts such as linear independence, vector spaces, and linear transformations.

All vectors are Linear Combinations

We can show that all vectors are linear combinations of other vectors. For example, we can create all vectors with two components $\boldsymbol{u} \in \mathbb{R}^2$ by combining the following two vectors:

$$\boldsymbol{e}_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix} \quad \text{and} \quad \boldsymbol{e}_2 = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$$

This becomes pretty clear if we think of the two vectors as the x and y axes in 2D space. Any point in 2D space can be defined by its x and y coordinates, i.e. as a linear combination of the two vectors. These vectors are called the standard basis vectors and we will go into more detail about them later. However, these are not the only vectors that can be used to create all vectors in 2D space; we could also use the following two vectors:

$$\boldsymbol{v} = \begin{bmatrix} 2 \\ 3 \end{bmatrix} \quad \text{and} \quad \boldsymbol{w} = \begin{bmatrix} 3 \\ -1 \end{bmatrix}$$

We can show that any vector $\boldsymbol{u} \in \mathbb{R}^2$ can be created by combining the two vectors by setting up a system of equations and seeing if we can solve it.

$$\begin{align*} s_1\boldsymbol{v} + s_2\boldsymbol{w} &= \begin{bmatrix} u_1 \\ u_2 \end{bmatrix} \\ s_1\begin{bmatrix} 2 \\ 3 \end{bmatrix} + s_2\begin{bmatrix} 3 \\ -1 \end{bmatrix} &= \begin{bmatrix} u_1 \\ u_2 \end{bmatrix} \\ \begin{bmatrix} 2s_1 + 3s_2 \\ 3s_1 - s_2 \end{bmatrix} &= \begin{bmatrix} u_1 \\ u_2 \end{bmatrix} \end{align*}$$

So we can see that depending on what values we want our vector $\boldsymbol{u}$ to have, we can find the scalars $s_1$ and $s_2$ that create the vector. This is the basis of linear combinations. However, not any two vectors can be used to create all vectors in 2D space; the two vectors must form a basis for the space. In other words they must be linearly independent and span the space. We will go into more detail about this later.
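
For a concrete target vector $\boldsymbol{u}$ we can solve this small system numerically. A sketch assuming NumPy, with the target $\boldsymbol{u} = (5, 2)$ chosen arbitrarily for illustration:

```python
import numpy as np

# The columns of the matrix are the vectors v and w
A = np.array([[2, 3],
              [3, -1]])
u = np.array([5, 2])  # arbitrary target vector

s = np.linalg.solve(A, u)  # solves A @ s = u for the weights s1, s2
print(s)                   # [1. 1.]
print(A @ s)               # [5. 2.] -> reconstructs u
```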

Special Combinations

Depending on the scalars we use in the combination, we can create some special types of combinations:

  • Linear: We have already seen this case where the scalars can be any real number, so $s_1, s_2, \dots, s_n \in \mathbb{R}$.
  • Affine: This is a linear combination where the sum of the scalars is equal to one, i.e. $\sum_{i=1}^n s_i = 1$. An affine combination creates a point that lies on the line (or plane) through the points defined by the vectors. This is because the combination can be rewritten as follows: $s_1\boldsymbol{v}_1 + s_2\boldsymbol{v}_2 = s_1\boldsymbol{v}_1 + (1 - s_1)\boldsymbol{v}_2 = \boldsymbol{v}_1 + (1 - s_1)(\boldsymbol{v}_2 - \boldsymbol{v}_1)$.
  • Conic: This is a linear combination where the scalars are non-negative, i.e. $s_1, s_2, \dots, s_n \geq 0$. A conic combination creates a point that lies inside the cone spanned by the vectors.
  • Convex: This is a mix between the affine and conic combinations, i.e. the scalars are non-negative and the sum of the scalars is equal to one, i.e. $\sum_{i=1}^n s_i = 1$ and $s_1, s_2, \dots, s_n \geq 0$. A convex combination creates a point that lies inside the convex hull of the vectors. The convex hull is the smallest convex set that contains all the vectors.
The yellow areas represent the vectors that can be created by the combinations of the two vectors.

Multiplication between Vectors and Matrices

A vector can be left or right multiplied by a matrix. The multiplication between a vector and a matrix is defined just as matrix multiplication, so the dimensions of the two must be compatible. If we have a matrix $\boldsymbol{A} \in \mathbb{R}^{m \times n}$ and a column vector $\boldsymbol{x} \in \mathbb{R}^{n \times 1}$, then we can multiply them together as follows:

$$\begin{align*} \boldsymbol{A}\boldsymbol{x} &= \begin{bmatrix} a_{11} & a_{12} & \dots & a_{1n} \\ a_{21} & a_{22} & \dots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \dots & a_{mn} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} \\ &= \begin{bmatrix} a_{11}x_1 + a_{12}x_2 + \dots + a_{1n}x_n \\ a_{21}x_1 + a_{22}x_2 + \dots + a_{2n}x_n \\ \vdots \\ a_{m1}x_1 + a_{m2}x_2 + \dots + a_{mn}x_n \end{bmatrix} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{bmatrix} \end{align*}$$

As we can see from the definition above, the result is another column vector with $m$ components. We can also see that the resulting vector is a linear combination of the columns of the matrix $\boldsymbol{A}$, where the weights are the components of the vector $\boldsymbol{x}$. This is why, when the matrix is square, the multiplication is often called a linear transformation, because it transforms the vector into another vector. If the vector is on the left side of the matrix, then the vector needs to be transposed for the multiplication to work, i.e. the dimensions of the vector must be compatible with the matrix. So if we have a row vector $\boldsymbol{x}^T \in \mathbb{R}^{1 \times n}$ and a matrix $\boldsymbol{A} \in \mathbb{R}^{n \times m}$, then we can multiply them together as follows:

$$\begin{align*} \boldsymbol{x}^T\boldsymbol{A} &= \begin{bmatrix} x_1 & x_2 & \dots & x_n \end{bmatrix} \begin{bmatrix} a_{11} & a_{12} & \dots & a_{1m} \\ a_{21} & a_{22} & \dots & a_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \dots & a_{nm} \end{bmatrix} \\ &= \begin{bmatrix} x_1a_{11} + x_2a_{21} + \dots + x_na_{n1} & \dots & x_1a_{1m} + x_2a_{2m} + \dots + x_na_{nm} \end{bmatrix} = \begin{bmatrix} y_1 & y_2 & \dots & y_m \end{bmatrix} \end{align*}$$

Just like when right multiplying a vector by a matrix, the result is a row vector with $m$ components. We can also see that the resulting vector is a linear combination of the rows of the matrix $\boldsymbol{A}$, where the weights are the components of the vector $\boldsymbol{x}$.

Example

A right multiplication of a matrix and a vector:

$$\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \begin{bmatrix} 5 \\ 6 \end{bmatrix} = \begin{bmatrix} 1 \cdot 5 + 2 \cdot 6 \\ 3 \cdot 5 + 4 \cdot 6 \end{bmatrix} = \begin{bmatrix} 17 \\ 39 \end{bmatrix}$$

A left multiplication of a matrix and a vector:

$$\begin{bmatrix} 5 & 6 \end{bmatrix} \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} = \begin{bmatrix} 5 \cdot 1 + 6 \cdot 3 & 5 \cdot 2 + 6 \cdot 4 \end{bmatrix} = \begin{bmatrix} 23 & 34 \end{bmatrix}$$
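
Both products can be reproduced with NumPy's `@` operator; a minimal sketch of the two examples above:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
x = np.array([5, 6])

print(A @ x)  # [17 39]  right multiplication: linear combination of the columns of A
print(x @ A)  # [23 34]  left multiplication:  linear combination of the rows of A
```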

Linear Independence

Two vectors are linearly independent if neither of them can be written as a linear combination of the other. In other words, two vectors are linearly independent if they are not scalar multiples of each other. It is, however, easier to define and check for linear dependence. The vectors $\boldsymbol{a}$ and $\boldsymbol{b}$ are linearly dependent if:

$$\boldsymbol{a} = c\boldsymbol{b} \quad \text{for some } c \in \mathbb{R}$$

this can also be written as:

$$\boldsymbol{a} - c\boldsymbol{b} = \boldsymbol{0}$$

where $\boldsymbol{0}$ is the zero vector. This means that the vectors $\boldsymbol{a}$ and $\boldsymbol{b}$ are linearly dependent if they are collinear, i.e. they lie on the same line. The two equations above can also be used to define linear independence: the vectors are linearly independent if no such scalar $c$ exists.

The left two vectors are linearly independent, while the right two vectors are linearly dependent.
Example

If $\boldsymbol{a}$ and $\boldsymbol{b}$ are defined as:

$$\boldsymbol{a} = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} \quad \text{and} \quad \boldsymbol{b} = \begin{bmatrix} 2 \\ 4 \\ 6 \end{bmatrix}$$

then $\boldsymbol{a}$ and $\boldsymbol{b}$ are linearly dependent because:

$$\boldsymbol{b} = 2\boldsymbol{a}$$

However, if $\boldsymbol{a}$ and $\boldsymbol{b}$ are defined as:

$$\boldsymbol{a} = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} \quad \text{and} \quad \boldsymbol{b} = \begin{bmatrix} 2 \\ 3 \\ 4 \end{bmatrix}$$

then $\boldsymbol{a}$ and $\boldsymbol{b}$ are linearly independent because no scalar multiple of $\boldsymbol{b}$ can be equal to $\boldsymbol{a}$.
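
A simple way to check this numerically is to look at the rank of the matrix whose columns are the two vectors; a sketch assuming NumPy:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([2, 4, 6])  # b = 2a, so dependent
c = np.array([2, 3, 4])  # not a scalar multiple of a

# rank 1 -> linearly dependent, rank 2 -> linearly independent
print(np.linalg.matrix_rank(np.column_stack([a, b])))  # 1
print(np.linalg.matrix_rank(np.column_stack([a, c])))  # 2
```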

Linear Independence of More Than Two Vectors

Never used this.

Dot Product

The dot product, also called the inner product, is the most common type of vector multiplication. It is defined just like matrix multiplication, so the two vectors must have the same number of components; to make the dimensions compatible, the first vector is transposed. The dot product is often denoted as $\boldsymbol{x} \cdot \boldsymbol{y}$, but sometimes also as $\langle \boldsymbol{x}, \boldsymbol{y} \rangle$, rather than $\boldsymbol{x}^T\boldsymbol{y}$, to avoid confusion with matrix multiplication or scalar multiplication. So if we have two vectors $\boldsymbol{x} \in \mathbb{R}^{n \times 1}$ and $\boldsymbol{y} \in \mathbb{R}^{n \times 1}$, then we can multiply them together as follows:

$$\langle \boldsymbol{x}, \boldsymbol{y} \rangle = \boldsymbol{x} \cdot \boldsymbol{y} = \boldsymbol{x}^T\boldsymbol{y} = \begin{bmatrix} x_1 & x_2 & \dots & x_n \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} = x_1y_1 + x_2y_2 + \dots + x_ny_n = \sum_{i=1}^n x_iy_i$$

From the dimensions we can also clearly see that the dot product results in a scalar which is why it is also called the scalar product, not to be confused with scalar multiplication!

Example

$$\begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} \cdot \begin{bmatrix} 4 \\ 5 \\ 6 \end{bmatrix} = 1 \cdot 4 + 2 \cdot 5 + 3 \cdot 6 = 4 + 10 + 18 = 32$$
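
The same example as a quick sketch in NumPy:

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([4, 5, 6])

print(np.dot(x, y))  # 32
print(x @ y)         # 32, same thing
print(y @ x)         # 32, the dot product is commutative
```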

Unlike the matrix multiplication, the dot product is commutative, meaning that the order in which we multiply the vectors together does not matter.

Why is the dot product commutative?

The dot product of real vectors is commutative because the multiplication of real numbers is commutative: the order in which we multiply the vectors does not matter, as long as the vector on the left is the one that gets transposed. The dot product is the sum of the products of the corresponding components of the two vectors, and the pairs of components are the same in both cases.

$$\begin{align*} \begin{bmatrix} x_1 & x_2 & \dots & x_n \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} &= x_1y_1 + x_2y_2 + \dots + x_ny_n \\ \begin{bmatrix} y_1 & y_2 & \dots & y_n \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} &= y_1x_1 + y_2x_2 + \dots + y_nx_n \end{align*}$$

Norms

A norm is a function, denoted by $\|\cdot\|$, that maps vectors to real values and satisfies the following properties:

  • It is positive definite, meaning it assigns a non-negative real number, i.e. a length or size, to each vector.
  • If the norm of a vector is zero, then the vector is the zero vector.
  • The norm of a vector scaled by a scalar is equal to the absolute value of the scalar times the norm of the vector, i.e. $\|s\boldsymbol{x}\| = |s|\|\boldsymbol{x}\|$ where $s \in \mathbb{R}$ and $\boldsymbol{x} \in \mathbb{R}^n$.
  • The triangle inequality holds, which we will see later.

In simple terms, the norm of a vector is the length of the vector. There are many different types of norms, but the most common ones are the $L_1$ and $L_2$ norms, also known as the Manhattan and Euclidean norms respectively. The $L_p$ norm is a generalization of the $L_1$ and $L_2$ norms. We denote a vector's norm by writing it between two vertical bars, e.g. $\|\boldsymbol{x}\|$, and the subscript denotes the type of norm, e.g. $\|\boldsymbol{x}\|_1$ or $\|\boldsymbol{x}\|_2$ etc. If the subscript is omitted, then the $L_2$ norm is assumed.

Manhattan Norm

The Manhattan norm or $L_1$ norm is defined as the sum of the absolute values of the vector's components.

It is called the Manhattan norm because it can be thought of as the distance between two points along the axes of a rectangular grid, like the streets of Manhattan or any other city with a grid-like structure. No matter which roads we take, as long as we always move towards the destination, the distance travelled between the two points is the same.

$$\|\boldsymbol{x}\|_1 = |x_1| + |x_2| + \dots + |x_n| = \sum_{i=1}^n |x_i|$$
No matter how we move along the roads of Manhattan, the distance between two points is always the same.
Example

If $\boldsymbol{x}$ is defined as:

$$\begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}$$

then the $L_1$ norm of $\boldsymbol{x}$ is:

$$\|\boldsymbol{x}\|_1 = |1| + |2| + |3| = 6$$

Euclidean Norm

As the name suggests, the Euclidean norm or $L_2$ norm is the distance between two points in Euclidean space, i.e. the straight-line distance between two points. For the 2D case, the Euclidean norm is just the Pythagorean theorem, i.e. the length of the hypotenuse of a right-angled triangle.

$$\|\boldsymbol{x}\|_2 = \sqrt{x_1^2 + x_2^2 + \dots + x_n^2} = \sqrt{\sum_{i=1}^n x_i^2}$$

From the definition above we can actually see that the Euclidean norm is the square root of the dot product of the vector with itself.

$$\|\boldsymbol{x}\|_2 = \sqrt{\boldsymbol{x} \cdot \boldsymbol{x}} = \sqrt{\langle \boldsymbol{x}, \boldsymbol{x} \rangle}$$
We can see the 2D case of the Euclidean norm and the 3D case of the Euclidean norm.
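
A short sketch, assuming NumPy, computing the $L_1$ and $L_2$ norms and checking the relation to the dot product:

```python
import numpy as np

x = np.array([1, 2, 3])

print(np.linalg.norm(x, 1))  # 6.0      Manhattan / L1 norm
print(np.linalg.norm(x))     # 3.7416... Euclidean / L2 norm (the default)
print(np.sqrt(x @ x))        # 3.7416... same as the square root of the dot product with itself
```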

Cauchy-Schwarz Inequality

The Cauchy-Schwarz inequality states that the dot product of two vectors is always less than or equal to the product of the two vectors' norms.

$$|\boldsymbol{x} \cdot \boldsymbol{y}| \leq \|\boldsymbol{x}\|_2 \|\boldsymbol{y}\|_2$$
Proof

We want to prove that for any two vectors $\boldsymbol{x}$ and $\boldsymbol{y}$, the inequality holds.

Case 1: When one of the vectors is the zero vector

If one of the vectors is the zero vector, then the inequality holds because the dot product is zero and the product of the norms is also zero. So the inequality becomes:

$$0 \leq 0$$

Case 2: If both vectors are unit vectors

If both vectors are unit vectors, then the inequality becomes the following:

$$|\boldsymbol{x} \cdot \boldsymbol{y}| \leq 1$$

We can then rewrite the dot product using the cosine of the angle between the two vectors; because the norms are one, this simplifies to:

$$\boldsymbol{x} \cdot \boldsymbol{y} = \|\boldsymbol{x}\|_2 \|\boldsymbol{y}\|_2 \cos(\theta) = \cos(\theta)$$

The cosine of the angle between two vectors is always between -1 and 1. The inequality however also takes the absolute value of the dot product, so the inequality holds.

$$|\boldsymbol{x} \cdot \boldsymbol{y}| = |\cos(\theta)| \leq 1$$

Case 3: Any two vectors

If the vectors are not unit vectors, then we can scale them to be unit vectors. We don't need to worry about dividing by zero, as we have already shown that the inequality holds if one of the vectors is the zero vector.

$$\boldsymbol{u} = \frac{\boldsymbol{x}}{\|\boldsymbol{x}\|_2} \quad \text{and} \quad \boldsymbol{v} = \frac{\boldsymbol{y}}{\|\boldsymbol{y}\|_2}$$

From above we know that $|\boldsymbol{u} \cdot \boldsymbol{v}| \leq 1$, so we can write the following:

$$\begin{align*} \boldsymbol{x} \cdot \boldsymbol{y} &= \|\boldsymbol{x}\|_2 \|\boldsymbol{y}\|_2 (\boldsymbol{u} \cdot \boldsymbol{v}) \\ |\boldsymbol{x} \cdot \boldsymbol{y}| &= \|\boldsymbol{x}\|_2 \|\boldsymbol{y}\|_2 |\boldsymbol{u} \cdot \boldsymbol{v}| \\ |\boldsymbol{x} \cdot \boldsymbol{y}| &\leq \|\boldsymbol{x}\|_2 \|\boldsymbol{y}\|_2 \end{align*}$$

Triangle Inequality

The triangle inequality states that the norm of the sum of two vectors is less than or equal to the sum of the norms of the two vectors.

$$\|\boldsymbol{x} \pm \boldsymbol{y}\|_2 \leq \|\boldsymbol{x}\|_2 + \|\boldsymbol{y}\|_2$$

This can also visually be seen in the 2D case, where the direct path from one point to another is always shorter than the path that goes through another point. Or also that the hypotenuse of a triangle is always shorter than the sum of the other two sides.

The triangle inequality visualized with vectors in 2D.

Proof

Let's first look at the norm of the sum of two vectors squared.

$$\begin{align*} \|\boldsymbol{x} + \boldsymbol{y}\|_2^2 &= (\boldsymbol{x} + \boldsymbol{y}) \cdot (\boldsymbol{x} + \boldsymbol{y}) \\ &= \boldsymbol{x} \cdot \boldsymbol{x} + \boldsymbol{x} \cdot \boldsymbol{y} + \boldsymbol{y} \cdot \boldsymbol{x} + \boldsymbol{y} \cdot \boldsymbol{y} \\ &= \|\boldsymbol{x}\|_2^2 + 2\boldsymbol{x} \cdot \boldsymbol{y} + \|\boldsymbol{y}\|_2^2 \end{align*}$$

Now we can use the Cauchy-Schwarz inequality on the middle term and get:

$$2\boldsymbol{x} \cdot \boldsymbol{y} \leq 2\|\boldsymbol{x}\|_2 \|\boldsymbol{y}\|_2$$

So we can rewrite the norm of the sum of two vectors squared and take the square root to get the triangle inequality.

$$\begin{align*} \|\boldsymbol{x} + \boldsymbol{y}\|_2^2 &\leq \|\boldsymbol{x}\|_2^2 + 2\|\boldsymbol{x}\|_2 \|\boldsymbol{y}\|_2 + \|\boldsymbol{y}\|_2^2 \\ &= (\|\boldsymbol{x}\|_2 + \|\boldsymbol{y}\|_2)^2 \\ \|\boldsymbol{x} + \boldsymbol{y}\|_2 &\leq \|\boldsymbol{x}\|_2 + \|\boldsymbol{y}\|_2 \end{align*}$$
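
A quick numerical sanity check of both inequalities, assuming NumPy (the random vectors are just arbitrary test data):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(5)
y = rng.standard_normal(5)

# Cauchy-Schwarz: |x . y| <= ||x|| ||y||
print(abs(x @ y) <= np.linalg.norm(x) * np.linalg.norm(y))              # True

# Triangle inequality: ||x + y|| <= ||x|| + ||y||
print(np.linalg.norm(x + y) <= np.linalg.norm(x) + np.linalg.norm(y))   # True
```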

P-Norm

The idea of the $L_p$ norm is to generalize the $L_1$ and $L_2$ norms. The $L_p$ norm is defined as:

$$\|\boldsymbol{x}\|_p = \left(|x_1|^p + |x_2|^p + \dots + |x_n|^p\right)^{\frac{1}{p}} = \left(\sum_{i=1}^n |x_i|^p\right)^{\frac{1}{p}}$$

An arbitrary $L_p$ norm is rarely used in practice; most commonly the $L_1$ and $L_2$ norms are used. For some use-cases the $L_\infty$ norm is used, which is defined as:

$$\|\boldsymbol{x}\|_\infty = \max_i |x_i|$$

In other words, the $L_\infty$ norm is the vector component with the largest absolute value.

Example

If $\boldsymbol{x}$ is defined as:

$$\begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}$$

then the $L_4$ norm of $\boldsymbol{x}$ is:

$$\|\boldsymbol{x}\|_4 = \left(|1|^4 + |2|^4 + |3|^4\right)^{\frac{1}{4}} = \left(1 + 16 + 81\right)^{\frac{1}{4}} = \sqrt[4]{98} \approx 3.15$$

and the $L_\infty$ norm of $\boldsymbol{x}$ is:

$$\|\boldsymbol{x}\|_\infty = \max_i |x_i| = \max\{1, 2, 3\} = 3$$
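
The same norms as a small sketch in NumPy:

```python
import numpy as np

x = np.array([1, 2, 3])

print(np.linalg.norm(x, 4))       # (1 + 16 + 81) ** 0.25, roughly 3.146
print(np.linalg.norm(x, np.inf))  # 3.0, the largest absolute component
```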

Meaning of the Dot Product

The question now is what is the dot product actually?

We can also visualize the dot product nicely in 2D and 3D space. If we place the tails of the two vectors at the same point, the dot product is the product of the lengths of the two vectors and the cosine of the angle between them. So if we have two vectors $\boldsymbol{x}$ and $\boldsymbol{y}$, then we can calculate the dot product as follows:

$$\boldsymbol{x} \cdot \boldsymbol{y} = \|\boldsymbol{x}\| \|\boldsymbol{y}\| \cos(\theta)$$

where $\theta$ is the angle between the two vectors. We can also calculate the angle between the two vectors by rewriting the equation above as follows:

$$\theta = \cos^{-1}\left(\frac{\boldsymbol{x} \cdot \boldsymbol{y}}{\|\boldsymbol{x}\| \|\boldsymbol{y}\|}\right)$$

Where $\cos^{-1}$ is the inverse cosine function, also called the arccosine function.

Calculating the angle between two vectors using the dot product.
Example

If $\boldsymbol{x}$ and $\boldsymbol{y}$ are defined as:

$$\boldsymbol{x} = \begin{bmatrix} 3 \\ -2 \end{bmatrix} \quad \text{and} \quad \boldsymbol{y} = \begin{bmatrix} 1 \\ 7 \end{bmatrix}$$

then the angle between $\boldsymbol{x}$ and $\boldsymbol{y}$ is:

$$\begin{align*} \theta &= \cos^{-1}\left(\frac{\boldsymbol{x} \cdot \boldsymbol{y}}{\|\boldsymbol{x}\| \|\boldsymbol{y}\|}\right) \\ &= \cos^{-1}\left(\frac{3 \cdot 1 + (-2) \cdot 7}{\sqrt{3^2 + (-2)^2} \sqrt{1^2 + 7^2}}\right) \\ &= \cos^{-1}\left(\frac{3 - 14}{\sqrt{9 + 4} \sqrt{1 + 49}}\right) \\ &= \cos^{-1}\left(\frac{-11}{\sqrt{13} \sqrt{50}}\right) \\ &\approx 115.6^\circ \end{align*}$$
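
The same calculation as a sketch in NumPy, using `np.arccos` and converting the result to degrees:

```python
import numpy as np

x = np.array([3, -2])
y = np.array([1, 7])

cos_theta = (x @ y) / (np.linalg.norm(x) * np.linalg.norm(y))
theta = np.degrees(np.arccos(cos_theta))
print(theta)  # roughly 115.56 degrees
```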

Orthogonal Vectors

We call two vectors orthogonal if the angle between them is 90 degrees, i.e. they are perpendicular to each other. If two vectors are orthogonal, then their dot product is zero, because $\cos(90^\circ) = 0$. So if we have two vectors $\boldsymbol{x}$ and $\boldsymbol{y}$, then we can check if they are orthogonal as follows:

$$\boldsymbol{x} \cdot \boldsymbol{y} = 0$$

Outer Product

The outer product of two vectors can be seen as the counterpart of the dot product: instead of a scalar, it results in a matrix. So if we have two vectors $\boldsymbol{x} \in \mathbb{R}^{m \times 1}$ and $\boldsymbol{y} \in \mathbb{R}^{n \times 1}$, then we can multiply them together as follows to get a matrix $\boldsymbol{A} \in \mathbb{R}^{m \times n}$. The outer product can be denoted as a matrix multiplication or with the symbol $\otimes$.

$$\boldsymbol{A} = \boldsymbol{x}\boldsymbol{y}^T = \boldsymbol{x} \otimes \boldsymbol{y} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_m \end{bmatrix} \begin{bmatrix} y_1 & y_2 & \dots & y_n \end{bmatrix} = \begin{bmatrix} x_1y_1 & x_1y_2 & \dots & x_1y_n \\ x_2y_1 & x_2y_2 & \dots & x_2y_n \\ \vdots & \vdots & \ddots & \vdots \\ x_my_1 & x_my_2 & \dots & x_my_n \end{bmatrix}$$

Or more formally:

$$(\boldsymbol{A})_{ij} = (\boldsymbol{x}\boldsymbol{y}^T)_{ij} = (\boldsymbol{x} \otimes \boldsymbol{y})_{ij} = x_iy_j$$

From the above we can see that the outer product of two vectors results in a matrix where the columns are the first vector scaled by the components of the second vector, and the rows are the second vector scaled by the components of the first vector. So the columns/rows of the matrix form a linearly dependent set. Because the largest set of linearly independent columns (or rows) has size 1, the rank of the matrix is 1.

Example

$$\begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} \begin{bmatrix} 4 & 5 & 6 \end{bmatrix} = \begin{bmatrix} 1 \cdot 4 & 1 \cdot 5 & 1 \cdot 6 \\ 2 \cdot 4 & 2 \cdot 5 & 2 \cdot 6 \\ 3 \cdot 4 & 3 \cdot 5 & 3 \cdot 6 \end{bmatrix} = \begin{bmatrix} 4 & 5 & 6 \\ 8 & 10 & 12 \\ 12 & 15 & 18 \end{bmatrix}$$
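
The outer product and its rank as a quick sketch in NumPy:

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([4, 5, 6])

A = np.outer(x, y)
print(A)
# [[ 4  5  6]
#  [ 8 10 12]
#  [12 15 18]]
print(np.linalg.matrix_rank(A))  # 1
```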

Matrix Multiplication as Outer Product

A fourth view of matrix multiplication is that it is the sum of the outer products of the columns of the first matrix and the rows of the second matrix. So you can interpret each outer product as a layer of the resulting matrix. This in turn shows that any matrix can be written as a sum of rank 1 matrices.

Matrix multiplication as the sum of the outer products of the columns of the first matrix and the rows of the second matrix.
Example

The matrix multiplication of two matrices $\boldsymbol{A}$ and $\boldsymbol{B}$ can be written as the sum of the outer products of the columns of $\boldsymbol{A}$ and the rows of $\boldsymbol{B}$.

$$\begin{align*} \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \begin{bmatrix} a & b \\ c & d \end{bmatrix} &= \begin{bmatrix} 1 \\ 3 \end{bmatrix} \begin{bmatrix} a & b \end{bmatrix} + \begin{bmatrix} 2 \\ 4 \end{bmatrix} \begin{bmatrix} c & d \end{bmatrix} \\ &= \begin{bmatrix} 1 \cdot a & 1 \cdot b \\ 3 \cdot a & 3 \cdot b \end{bmatrix} + \begin{bmatrix} 2 \cdot c & 2 \cdot d \\ 4 \cdot c & 4 \cdot d \end{bmatrix} = \begin{bmatrix} a + 2c & b + 2d \\ 3a + 4c & 3b + 4d \end{bmatrix} \end{align*}$$
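
A small numerical check of this view, assuming NumPy and using arbitrary numbers in place of $a, b, c, d$:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[5, 6],
              [7, 8]])

# Sum over the outer products of the columns of A with the rows of B
layers = sum(np.outer(A[:, k], B[k, :]) for k in range(A.shape[1]))
print(np.array_equal(layers, A @ B))  # True
```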

Linear Projections

This is a bigger topic. It can be related to matrix transformations via the outer product of two vectors, and it is also related to the dot product, the angle between the two vectors, and the lengths of the two vectors.

Normalization

Normalizing means to bring something into some sort of normal or standard state. In the case of vectors, normalizing means to scale the vector so that its length is equal to one. Often we denote a normalized vector by adding a hat to the vector, e.g. $\hat{\boldsymbol{x}}$ is the normalized vector of $\boldsymbol{x}$. So we can say that if $\|\boldsymbol{x}\| = 1$, then $\boldsymbol{x}$ is normalized. From this definition we can see that to normalize a vector, we simply divide the vector by its length, i.e. we divide the vector by a scalar. So if we have a vector $\boldsymbol{x}$, then we can normalize it as follows:

$$\hat{\boldsymbol{x}} = \frac{\boldsymbol{x}}{\|\boldsymbol{x}\|_2}$$

This normalized vector will have the same direction as the original vector, but its length will be equal to one. By eliminating the length of the vector, we can uniquely identify a vector by its direction. This is useful because we can now compare vectors based on their direction, without having to worry about their length. These normalized vectors are also called unit vectors, and if they are placed at the origin in 2D, their heads all lie on the unit circle.

We can see that the normalized vectors all have the same length, but different directions.
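
Normalizing a vector in NumPy, as a minimal sketch:

```python
import numpy as np

x = np.array([3.0, 4.0])
x_hat = x / np.linalg.norm(x)

print(x_hat)                  # [0.6 0.8]
print(np.linalg.norm(x_hat))  # 1.0, same direction as x but unit length
```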

Orthonormal Vector

We can now combine the idea of orthogonal vectors and normalized vectors to get orthonormal vectors. Orthonormal vectors are vectors that are orthogonal to each other and have a length of one.

The difference between orthogonal and orthonormal vectors.

Standard Unit Vectors

An example of orthonormal vectors are the standard unit vectors. The standard unit vectors can be thought of as the vectors that correspond to the axes of a coordinate system. Later on we will see that these vectors can be used to span any vector space and form the standard basis of the vector space. The standard unit vectors are denoted as $\boldsymbol{e}_i$, where $i$ is the index of the component that is one, while all other components are zero. The number of components of $\boldsymbol{e}_i$ is usually inferred from the context, e.g. whether we are working in 2D or 3D space.

$$\boldsymbol{e}_i = \begin{bmatrix} 0 \\ \vdots \\ 1 \\ \vdots \\ 0 \end{bmatrix}$$

It is quite easy to see that the standard unit vectors are orthonormal, because they are orthogonal to each other and have a length of one. It is also easy to see that any vector can be written as a linear combination of the standard unit vectors, which is why they are so useful and will become an important concept later on when talking about vector spaces and bases.

The standard unit vectors here in 3D space are i, j, and k. We can see how the vector a can be written as a linear combination of the standard unit vectors.
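
A small sketch in NumPy, reconstructing an arbitrary vector from the standard unit vectors:

```python
import numpy as np

# The standard basis of R^3: the rows (equivalently columns) of the identity matrix
e1, e2, e3 = np.eye(3)

a = np.array([2.0, -1.0, 3.0])  # arbitrary vector for illustration
# Any vector is the linear combination of the e_i weighted by its components
print(np.array_equal(a, a[0] * e1 + a[1] * e2 + a[2] * e3))  # True
```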