09 - Convolutional Neural Networks
Class: CSCE-421
Notes:
Fully Connected Layer
![](/CSCE-421/Visual%20Aids/Pasted%20image%2020260219105125.png)
Convolution Layer
![](/CSCE-421/Visual%20Aids/Pasted%20image%2020260219100750.png)
![](/CSCE-421/Visual%20Aids/Pasted%20image%2020260219100813.png)
Notes:
- For example, if the input has 3 channels, you apply a different filter to each channel and then sum the results (see the sketch after this list)
- This technique was used in the early days!
- More filters = more computation -> your model is slower
- Most of the time we use full connections: each output channel is connected to every input channel. In this example we have 3 channels, but that is not required.
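A minimal sketch of this per-channel filter-and-sum, assuming NumPy and SciPy are available (function name and shapes are illustrative, not from the slides):

```python
import numpy as np
from scipy.signal import correlate2d

def multi_channel_conv(x, filters):
    # x: (C, H, W) input; filters: (C, kH, kW), one filter per input channel.
    # Filter each channel separately, then sum the results into one output slice.
    # (Deep-learning "convolution" is implemented as cross-correlation.)
    return sum(correlate2d(x[c], filters[c], mode="valid")
               for c in range(x.shape[0]))

x = np.random.randn(3, 32, 32)   # 3-channel input image
w = np.random.randn(3, 5, 5)     # one 5x5 filter per channel
print(multi_channel_conv(x, w).shape)  # (28, 28)
```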
![](/CSCE-421/Visual%20Aids/Pasted%20image%2020260219101103.png)
Notes:
- Remember that a different filter generates a different output
- All the output slices are generated independently
![](/CSCE-421/Visual%20Aids/Pasted%20image%2020260219101206.png)
Notes:
- We want to use this convolution to get some features of the input image!
- Things to think about (this will be on the exam):
- If you know the size of the input and the size of the filter, how do you calculate the size of the output?
![](/CSCE-421/Visual%20Aids/Pasted%20image%2020260219101346.png)
Notes:
- The convolution operation is by far the most important one
- The network will have many layers, but the convolutional layers are by far the most important ones
![](/CSCE-421/Visual%20Aids/Pasted%20image%2020260219101455.png)
![](/CSCE-421/Visual%20Aids/Pasted%20image%2020260219101509.png)
Notes:
- You want filters that relate to the input image, so they can account for features of the image or some rotations of those features
- Convolution = elementwise multiplication and sum of a filter and the signal (image)
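As a sketch, here is that multiply-and-sum written out with explicit loops (plain NumPy; strictly speaking this is cross-correlation, which is what deep-learning frameworks call convolution):

```python
import numpy as np

def conv2d_valid(image, kernel):
    # Slide the kernel over the image; at each position, multiply
    # elementwise and sum to produce one output pixel.
    H, W = image.shape
    kH, kW = kernel.shape
    out = np.zeros((H - kH + 1, W - kW + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            window = image[i:i + kH, j:j + kW]
            out[i, j] = np.sum(window * kernel)
    return out
```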
![](/CSCE-421/Visual%20Aids/Pasted%20image%2020260219101615.png)
Notes:
- The image does not have to be a square but in most cases kernels are just squares (rectangle filters are not commonly used)
- How to calculate the output image size if we have a 32x32 input image with a 5x5 filter?
- Note: size = 32 - 5 + 1 = 28
- The output size is determined by:
- size of input - size of filter + 1
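A one-line check of this formula (the helper name is illustrative):

```python
def output_size(n, f):
    # n x n input, f x f filter, stride 1, no padding
    return n - f + 1

print(output_size(32, 5))  # 28
```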
A closer look at spatial dimensions:
![](/CSCE-421/Visual%20Aids/Pasted%20image%2020260219102031.png)
![](/CSCE-421/Visual%20Aids/Pasted%20image%2020260219102110.png)
![](/CSCE-421/Visual%20Aids/Pasted%20image%2020260219102154.png)
![](/CSCE-421/Visual%20Aids/Pasted%20image%2020260219102207.png)
![](/CSCE-421/Visual%20Aids/Pasted%20image%2020260219102224.png)
Notes:
- With a stride of 2, the output will be roughly half the size of the input (ignoring the border effect)
- Remember, so far we had:
- size = size of input - size of filter + 1
- size = 7 - 3 + 1
- It will be 5, which is not what the stride-2 output actually is
- So with a stride of 2 you need to do:
- size = ((input size - filter size) / 2) + 1 = ((7 - 3) / 2) + 1 = 3
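Extending the earlier helper with a stride parameter (again a sketch; the divisibility check is my addition):

```python
def output_size(n, f, stride=1):
    # (n - f) must be divisible by the stride for the filter to fit evenly
    assert (n - f) % stride == 0, "filter does not tile the input evenly"
    return (n - f) // stride + 1

print(output_size(7, 3, stride=1))  # 5
print(output_size(7, 3, stride=2))  # 3
```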
![](/CSCE-421/Visual%20Aids/Pasted%20image%2020260219102455.png)
![](/CSCE-421/Visual%20Aids/Pasted%20image%2020260219102509.png)
![](/CSCE-421/Visual%20Aids/Pasted%20image%2020260219102525.png)
![](/CSCE-421/Visual%20Aids/Pasted%20image%2020260219102544.png)
![](/CSCE-421/Visual%20Aids/Pasted%20image%2020260219102612.png)
Notes:
- This is why the pixels on the borders are not treated fairly in comparison to the pixels in the center
- So what have we done about this?
In practice: Common to zero pad the border
![](/CSCE-421/Visual%20Aids/Pasted%20image%2020260219102912.png)
Notes:
- Now your input has been increased by 1 pixel in each direction
- This is equivalent to applying a padding of 1
- With a 3x3 filter, this keeps the output the same size as the original input
- And this will help because it somewhat mitigates the bothersome effect where border pixels were treated unfairly
- In conclusion, the size of your output will be affected by:
- The size of the input
- The size of your filter
- The stride
- The padding
![](/CSCE-421/Visual%20Aids/Pasted%20image%2020260219103225.png)
Notes:
- With the right amount of padding ((filter size - 1) / 2 for stride 1), applying convolution does not change the spatial size: the output matches the original input
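Putting input size, filter size, stride, and padding together in one helper (a sketch using the standard sizing formula, not code from the slides):

```python
def output_size(n, f, stride=1, pad=0):
    # Standard formula: (n - f + 2*pad) / stride + 1
    return (n - f + 2 * pad) // stride + 1

print(output_size(32, 5))                  # 28: no padding shrinks the map
print(output_size(32, 5, pad=2))           # 32: pad = (f - 1) / 2 preserves the size
print(output_size(7, 3, stride=2, pad=1))  # 4
```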
![](/CSCE-421/Visual%20Aids/Pasted%20image%2020260219103346.png)
Convolution: translation-equivariance
- Process each window in the same way
![](/CSCE-421/Visual%20Aids/Pasted%20image%2020260219103443.png)
![](/CSCE-421/Visual%20Aids/Pasted%20image%2020260219103600.png)
- Again, remember this is only true for translation (see the check after this list)
- The network so far is:
- Translation equivariant
- But not rotation equivariant
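A quick numerical check of translation equivariance, using circular boundary conditions so the shift is exact at the edges (my choice for the demo, not from the slides):

```python
import numpy as np
from scipy.signal import correlate2d

x = np.random.randn(8, 8)
k = np.random.randn(3, 3)

conv = lambda img: correlate2d(img, k, mode="same", boundary="wrap")
shift = lambda img: np.roll(img, 2, axis=1)  # translate 2 pixels to the right

# Shifting then convolving equals convolving then shifting.
print(np.allclose(conv(shift(x)), shift(conv(x))))  # True
```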
Convolution = local connection + weight-sharing
![](/CSCE-421/Visual%20Aids/Pasted%20image%2020260219103705.png)
Notes:
- Look at the right model, this is the fully connected layer we have talked about in 06 - Multi-Layer Perceptron-Networks
- How are convolutional layers related to this?
- Convolution is a special case of a fully connected layer
- If your kernel size is 3, each output is connected to only 3 inputs; you can see this is a locally connected layer (like the middle model)
- But the second thing about convolution is that we have shared parameters: each line of the same color has the same value.
- Note that these connection weights are eventually trained from data; how can we guarantee that the matrix keeps the wiring required for convolution?
- In practice you do not need to worry about this, but in an implementation you would initialize the shared parameters to the same value and, at each update, compute the gradient of each copy and average them.
- Not required to understand backward propagation in a convolutional layer in this class.
Convolution: linear transform
![](/CSCE-421/Visual%20Aids/Pasted%20image%2020260219104430.png)
Notes:
- Think of the input as a vector x
- Your output is a 4-dimensional vector y
- Values along each diagonal of the matrix have to remain exactly the same (that is the weight sharing)
- Note that each output is connected to 3 inputs in this case
- That is equivalent to taking the w vector (the filter) and computing its dot product with a window of the x vector
- You still have a fully connected layer with a W matrix, but what is special about it is that some of the entries are fixed to 0 and some of them share the same value.
- Conceptually this is useful, but in practice this is not really what we implement; it is not the most efficient way of doing it.
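A small sketch of this view for a 1-D convolution (size-6 input, size-3 filter), checking that the constrained W matrix reproduces the sliding-window result; all names are illustrative:

```python
import numpy as np

x = np.random.randn(6)   # input vector
w = np.random.randn(3)   # filter of size 3

# Build the equivalent fully connected matrix W (4 x 6):
# each row holds the same filter w, shifted one position to the right;
# every other entry is fixed to 0.
W = np.zeros((4, 6))
for i in range(4):
    W[i, i:i + 3] = w

direct = np.array([x[i:i + 3] @ w for i in range(4)])  # sliding-window conv
print(np.allclose(W @ x, direct))  # True: convolution is a constrained linear map
```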
Receptive Field
![](/CSCE-421/Visual%20Aids/Pasted%20image%2020260219105052.png)
Notes:
- Each unit will look at 3 units in the previous layer
- The network stacks many such layers
- Each unit on a top layer will look at 3 units on the layer below it
- So each unit on top will be able to look at larger and larger areas of the input (covering a larger span)
- The idea is that:
- If you take an image, apply some convolutions, and look at the outputs
- A unit in a deep layer is looking at some area of the input
- Our goal is to convert that area into a vector
- We want the final output to be a one-by-one feature map
- That unit then captures some kind of feature of the input image
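A small helper to compute how the receptive field grows as you stack layers (the standard recurrence; the function is a sketch, not from the slides):

```python
def receptive_field(num_layers, kernel_size=3, stride=1):
    # Each layer adds (kernel_size - 1) * (product of previous strides)
    # pixels to the span of input a single top-layer unit can see.
    rf, jump = 1, 1
    for _ in range(num_layers):
        rf += (kernel_size - 1) * jump
        jump *= stride
    return rf

for n in range(1, 5):
    print(n, receptive_field(n))  # 1->3, 2->5, 3->7, 4->9 with 3x3 filters, stride 1
```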