11 - Principal Component Analysis and Autoencoders
Class: CSCE-421
Notes:
We will use linear algebra only!
Matrix times Vector - 2 Ways
![Matrix times Vector](/CSCE-421/Ex2/Visual%20Aids/image-43.png)
At first, you learn (Mv1): each entry of the product is the dot product of a row of M with v. Once you get used to viewing it as (Mv2), a linear combination of the columns of M weighted by the entries of v, many results become much easier to understand.
Notes:
- If you need to know one thing about linear algebra, this is the most important: matrix-vector multiplication!
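The two views can be checked in a few lines of NumPy (a minimal sketch; the matrix and vector are made up for illustration):

```python
import numpy as np

M = np.array([[1., 2.],
              [3., 4.],
              [5., 6.]])
v = np.array([7., 8.])

# Mv1: each entry of the result is a dot product of a row of M with v
mv1 = np.array([M[i, :] @ v for i in range(M.shape[0])])

# Mv2: a linear combination of the columns of M, weighted by the entries of v
mv2 = v[0] * M[:, 0] + v[1] * M[:, 1]

assert np.allclose(mv1, M @ v)
assert np.allclose(mv2, M @ v)
```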
The Art of Linear Algebra - Vector times matrix - 2 Ways
![A row vector times a matrix](/CSCE-421/Ex2/Visual%20Aids/image-44.png)
![The product](/CSCE-421/Ex2/Visual%20Aids/image-45.png)
Notes:
- Vector-matrix multiplication is a linear combination of the rows of the matrix, weighted by the entries of the row vector
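A quick NumPy check of the row-combination view (example values made up):

```python
import numpy as np

M = np.array([[1., 2., 3.],
              [4., 5., 6.]])
u = np.array([7., 8.])  # treated as a row vector

# u M is a linear combination of the rows of M, weighted by the entries of u
row_comb = u[0] * M[0, :] + u[1] * M[1, :]

assert np.allclose(row_comb, u @ M)
```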
Matrix times Matrix - 4 ways
![Matrix times Matrix](/CSCE-421/Ex2/Visual%20Aids/image-46.png)
Notes:
- MM1 is a natural way, but it turns out not to be the most used
- It is the typical row-by-column approach: entry (i, j) is row i of A dotted with column j of B
- The most important view is MM4
- You take a column of A and multiply it by a row of B; what do you get? A matrix
- You do this for every column of A and the matching row of B, then add the matrices together (they are all the same size)
- Each term is an outer product
- What can you learn from this approach?
- You take the first column of matrix A, then the first row of B, and multiply them together to get a matrix
- The only thing that matters is the pairing between column k of A and row k of B
- This tells you that permuting only the columns of A changes the product
- But if you permute the columns of A and the rows of B by the same permutation, the product stays the same
- What about MM2?
- Each column of the product is A times the corresponding column of B (a matrix-vector multiplication), i.e., a linear combination of the columns of A
- MM3 is just the row analogue of MM2: each row of the product is the corresponding row of A times B
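A sketch of the MM4 outer-product view and the permutation fact, using small random matrices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 5))

# MM4: sum over k of (column k of A) times (row k of B); each term is a
# rank-1 outer product, and every term has the same shape as A @ B
outer_sum = sum(np.outer(A[:, k], B[k, :]) for k in range(A.shape[1]))
assert np.allclose(outer_sum, A @ B)

# Permuting the columns of A AND the rows of B by the same permutation
# leaves the product unchanged, because the column/row pairing is preserved
perm = [2, 0, 3, 1]
assert np.allclose(A[:, perm] @ B[perm, :], A @ B)
```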
Practical Patterns
![Practical Patterns](/CSCE-421/Ex2/Visual%20Aids/image-47.png)
Burn these patterns into your memory and you will start seeing them everywhere.
Notes:
- P1: A times a diagonal matrix scales the columns of A (a special linear combination of the columns)
- P2: The same idea, but with the diagonal matrix on the left
- Note P2 is not a column combination; you are just multiplying each number from the diagonal matrix times the corresponding row of matrix B
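The two diagonal patterns can be verified directly (example matrices made up):

```python
import numpy as np

A = np.array([[1., 2.],
              [3., 4.]])
D = np.diag([10., 100.])

# P1: A @ D scales the COLUMNS of A by the diagonal entries of D
assert np.allclose(A @ D, [[10., 200.], [30., 400.]])

B = np.array([[1., 2.],
              [3., 4.]])
# P2: D @ B scales the ROWS of B by the diagonal entries of D
assert np.allclose(D @ B, [[10., 20.], [300., 400.]])
```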
![Practical Patterns, continued](/CSCE-421/Ex2/Visual%20Aids/image-48.png)
Notes:
- How to interpret P4?
- We can do it step by step
- First apply P1 (scale the columns of the left matrix by the diagonal entries), and then multiply that column matrix by the row matrix
- In practice, what is the result? How do I interpret this?
- You still look at the left matrix as a column matrix, but with each column scaled by a number; then you multiply each scaled column by the corresponding row of the row matrix
- Why are we talking about this?
- Later on, we will take a matrix and decompose it into a product of matrices, with the one in the center being a diagonal matrix
- The reason we view linear algebra this way is that if you decompose in this way, you can view the decomposition as a sum of outer products (P4)
- Each outer-product term is the same size as the original matrix, but its columns are all multiples of a single vector, so its rank is 1
- The decomposition is therefore a sum of rank-1 matrices
- There is something that is easy but most people do not know:
- Most of these decompositions require the diagonal elements to be in non-increasing order. Why can we always require this?
- Because the order of the terms in the sum does not matter, you can always rearrange them into sorted order
- If the diagonal terms are not sorted, just rearrange the terms of the sum (permuting the columns of the left matrix and the rows of the right matrix accordingly)
- The diagonal elements are also required to be nonnegative. Why is this not restrictive?
- Because if a diagonal element is negative, you can just flip the sign and absorb it outside
- You can always move the sign by multiplying the corresponding column or row by -1
- You either leave the sign in the column or in the row, but not in both
- So we can always assume the diagonal terms are nonnegative and sorted in non-increasing order
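A sketch of P4 as a sum of rank-1 terms, plus the sign-flip and reordering tricks described above (random example matrices, one negative and unsorted diagonal entry on purpose):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 3))   # columns a_k
B = rng.standard_normal((3, 5))   # rows b_k
d = np.array([2.0, -5.0, 3.0])    # diagonal entries: unsorted, one negative

# P4: A diag(d) B = sum_k d_k * outer(a_k, b_k), a sum of rank-1 matrices
M = A @ np.diag(d) @ B
terms = [d[k] * np.outer(A[:, k], B[k, :]) for k in range(3)]
assert np.allclose(sum(terms), M)
assert all(np.linalg.matrix_rank(t) == 1 for t in terms)

# Move the signs into the columns of A (not the rows of B), then sort the
# diagonal in non-increasing order, permuting columns and rows to match
A2 = A * np.sign(d)
d2 = np.abs(d)
order = np.argsort(-d2)
A3, d3, B3 = A2[:, order], d2[order], B[order, :]
assert np.all(d3 >= 0) and np.all(np.diff(d3) <= 0)
assert np.allclose(A3 @ np.diag(d3) @ B3, M)   # same matrix, normalized form
```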
Orthogonal Matrices
(1) An orthogonal matrix is a square matrix whose columns and rows are orthogonal unit vectors, i.e., orthonormal vectors. That is, a matrix $\boldsymbol{Q}$ is orthogonal if $$\boldsymbol{Q}^T \boldsymbol{Q}=\boldsymbol{Q} \boldsymbol{Q}^T=\boldsymbol{I}$$
(2) It leads to $$\boldsymbol{Q}^{-1}=\boldsymbol{Q}^T$$
(3) For an orthogonal matrix $\boldsymbol{Q}$, split it by columns into two blocks $\boldsymbol{Q}=[\boldsymbol{Q}_1 \; \boldsymbol{Q}_2]$; then $\boldsymbol{Q}_1^T \boldsymbol{Q}_1=\boldsymbol{I}$ and $\boldsymbol{Q}_2^T \boldsymbol{Q}_2=\boldsymbol{I}$.
(4) Furthermore, suppose we expand $\boldsymbol{Q} \boldsymbol{Q}^T$ in terms of the blocks: $\boldsymbol{Q}_1 \boldsymbol{Q}_1^T+\boldsymbol{Q}_2 \boldsymbol{Q}_2^T=\boldsymbol{I}$.
Notes:
- Certain matrices are called orthogonal matrices; these are square matrices (equal number of rows and columns)
- A square matrix is orthogonal if all of its columns are orthogonal to each other
- If you take any two different columns and compute their inner product, the inner product is 0
- Each column also needs to be normalized
- If each column is normalized to have unit length (and the matrix is square), then the rows automatically satisfy the same orthonormality requirement
- The requirement translates into $\boldsymbol{Q}^T \boldsymbol{Q}=\boldsymbol{Q} \boldsymbol{Q}^T=\boldsymbol{I}$
- Remember the identity matrix $\boldsymbol{I}$ has all elements 0 except the diagonal elements, which are 1
- $\boldsymbol{Q}^T \boldsymbol{Q}=\boldsymbol{I}$ tells you the columns are orthonormal; $\boldsymbol{Q} \boldsymbol{Q}^T=\boldsymbol{I}$ tells you the rows are orthonormal
- If a matrix is orthogonal, its inverse is simply its transpose!
- There is something else that is important to talk about:
- If I have an orthogonal matrix $\boldsymbol{Q}$ (view it as columns; it has to be a square matrix), we can split it into two blocks: $\boldsymbol{Q}=[\boldsymbol{Q}_1 \; \boldsymbol{Q}_2]$
- Of course $\boldsymbol{Q}_1$ and $\boldsymbol{Q}_2$ are no longer square matrices; they are not orthogonal matrices anymore, but they are very important
- In general, when we talk about such a block, it is a matrix with orthonormal columns
- All the columns are orthogonal to each other, and each column is also normalized
- In this case we would have $\boldsymbol{Q}_1^T \boldsymbol{Q}_1=\boldsymbol{I}$, but $\boldsymbol{Q}_1 \boldsymbol{Q}_1^T \neq \boldsymbol{I}$
- The rows are no longer orthogonal and normalized!
- If I have an orthogonal matrix $\boldsymbol{Q}=[\boldsymbol{Q}_1 \; \boldsymbol{Q}_2]$:
Notes:
- We know that $\boldsymbol{Q} \boldsymbol{Q}^T=\boldsymbol{I}$
- How do I write this in terms of $\boldsymbol{Q}_1$ and $\boldsymbol{Q}_2$?
- Using the outer-product view, you multiply the first block by its transpose, the second block by its transpose, and sum them: $\boldsymbol{Q}_1 \boldsymbol{Q}_1^T+\boldsymbol{Q}_2 \boldsymbol{Q}_2^T=\boldsymbol{I}$
- There is an interesting property here (FYI only):
- These two matrices added together equal the identity matrix
- In particular, at each position, the diagonal elements of the two matrices sum to 1
- What can we say about the diagonal elements of these two matrices?
- Is there an upper bound? Yes, the upper bound is 1
- The diagonal elements are nonnegative (each is a squared norm), so each of them has to be smaller than or equal to 1
- Note this is not required for this class
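These properties of an orthogonal matrix and its column blocks can be checked numerically (a minimal sketch; the matrix comes from a random QR factorization and the split point is arbitrary):

```python
import numpy as np

# Build an orthogonal matrix as the Q factor of a random QR factorization
rng = np.random.default_rng(2)
Q, _ = np.linalg.qr(rng.standard_normal((5, 5)))

I = np.eye(5)
assert np.allclose(Q.T @ Q, I) and np.allclose(Q @ Q.T, I)  # orthonormal columns and rows
assert np.allclose(np.linalg.inv(Q), Q.T)                   # inverse is the transpose

# Split Q into two column blocks: Q = [Q1 Q2]
Q1, Q2 = Q[:, :2], Q[:, 2:]
assert np.allclose(Q1.T @ Q1, np.eye(2))      # columns stay orthonormal
assert not np.allclose(Q1 @ Q1.T, np.eye(5))  # rows are no longer orthonormal

# Outer-product view (MM4): Q Q^T = Q1 Q1^T + Q2 Q2^T = I
assert np.allclose(Q1 @ Q1.T + Q2 @ Q2.T, I)
```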
Eigen-Decomposition
(1) A square $n \times n$ matrix $\boldsymbol{A}$ with $n$ linearly independent eigenvectors can be factorized as $$\boldsymbol{A}=\boldsymbol{S} \boldsymbol{\Lambda} \boldsymbol{S}^{-1}$$ where the columns of $\boldsymbol{S}$ are the eigenvectors of $\boldsymbol{A}$ and $\boldsymbol{\Lambda}$ is the diagonal matrix of the corresponding eigenvalues.
(2) Note that only diagonalizable matrices can be factorized in this way.
(3) If the matrix is symmetric, its eigenvectors can be chosen orthonormal, so the eigenvector matrix becomes an orthogonal matrix $\boldsymbol{Q}$ and the decomposition is $\boldsymbol{S}=\boldsymbol{Q} \boldsymbol{\Lambda} \boldsymbol{Q}^T$ (here $\boldsymbol{S}$ denotes the symmetric matrix itself).
Notes:
- Definition of eigenvalues & eigenvectors:
- The matrix has to be a square matrix
- If you have a square matrix $\boldsymbol{A}$ ($n \times n$), the eigenvalues and eigenvectors satisfy $\boldsymbol{A} \boldsymbol{v}_i=\lambda_i \boldsymbol{v}_i$
- Note eigenvalues and eigenvectors are paired to each other
- We want to collect all of these vector equations into a single matrix equation: matrix = matrix
- We want a matrix $\boldsymbol{S}=[\boldsymbol{v}_1 \; \boldsymbol{v}_2 \; \cdots \; \boldsymbol{v}_n]$ whose columns are the eigenvectors
- Then we have another matrix, which is a diagonal matrix:
$$\boldsymbol{\Lambda}=\left[\begin{array}{ccc}\lambda_1 & & \\ & \lambda_2 & \\ & & \lambda_3\end{array}\right]$$
- If you want to multiply each number times the columns, the diagonal matrix has to be on the right and the column matrix on the left: $\boldsymbol{S} \boldsymbol{\Lambda}$
- Note that when doing $\boldsymbol{S} \boldsymbol{\Lambda}$, we are multiplying the first column of the matrix $\boldsymbol{S}$ times $\lambda_1$, and so on; we can represent this as $\boldsymbol{S} \boldsymbol{\Lambda}=[\lambda_1 \boldsymbol{v}_1 \; \lambda_2 \boldsymbol{v}_2 \; \cdots \; \lambda_n \boldsymbol{v}_n]$
- We then can say $\boldsymbol{A} \boldsymbol{S}=\boldsymbol{S} \boldsymbol{\Lambda}$
- We are just equating the columns on each side: $\boldsymbol{A} \boldsymbol{v}_i=\lambda_i \boldsymbol{v}_i$
- You can assume $\boldsymbol{S}$ is non-singular, and therefore you are able to do $\boldsymbol{A}=\boldsymbol{S} \boldsymbol{\Lambda} \boldsymbol{S}^{-1}$
- This happens when the eigenvectors are linearly independent
- If all columns are orthogonal to each other, they are linearly independent
- (Orthogonal means they are at a 90-degree angle; this is a stronger condition than linear independence)
- If the matrix being decomposed is symmetric, the eigenvector matrix can be chosen to be an orthogonal matrix (non-singular), where the inverse becomes the transpose
Question:
- Look at:
$$\boldsymbol{S}=\boldsymbol{Q} \boldsymbol{\Lambda} \boldsymbol{Q}^T$$
- Then we know the matrix in the middle is a diagonal matrix with the eigenvalues
- Since $\boldsymbol{S}$ is symmetric, we can be sure that all the eigenvalues are real numbers (but they can still be + or -)
- The dilemma is:
- Can we somehow manage to make them all nonnegative in this decomposition, as we did before?
- No, because flipping a sign would also affect the transpose
- Why can't you make these nonnegative?
- Because column $\boldsymbol{q}_i$ of $\boldsymbol{Q}$ and row $\boldsymbol{q}_i^T$ of $\boldsymbol{Q}^T$ carry the same values, so flipping the sign of one flips the other, and multiplying them squares the sign away
- You are kind of flipping the sign two times, so it cancels
- Therefore you cannot make the eigenvalues (the diagonal elements of the middle matrix) always be positive
- Note this is different in the singular value decomposition, where the two outside matrices are different, so the singular values can always be made nonnegative
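A numerical sketch of the symmetric eigen-decomposition using `numpy.linalg.eigh` (the random symmetric matrix is made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.standard_normal((4, 4))
S = X + X.T                      # a symmetric matrix

lam, Q = np.linalg.eigh(S)       # for symmetric matrices, Q is orthogonal
assert np.allclose(Q.T @ Q, np.eye(4))
assert np.allclose(Q @ np.diag(lam) @ Q.T, S)   # S = Q Lambda Q^T

# The eigenvalues are real but may be negative: flipping the sign of a
# column q_i also flips the matching row of Q^T, so the sign cancels and
# a negative eigenvalue cannot be absorbed into Q
q0 = Q[:, 0]
assert np.allclose(np.outer(-q0, -q0), np.outer(q0, q0))
```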