08 - Convolution

Class: CSCE-421

Notes:

Pre:

How do we convert an image into a fixed-length vector?
We want this vector X to stay the same if we translate or rotate the input image
It turns out the translation part is easy to do; the rotation part can be done, but it is not easy
Take an image I and turn it into a fixed-length vector X

Image filtering

Pasted image 20260219093618.png|500

What is the idea of convolution?

An image is simply a two-dimensional array of numbers
If we have a 9-square box (3x3) as our filter (sometimes called a kernel), we take this filter, do element-wise multiplication with a same-sized portion of the data array, and sum the products to get one output value
If all the filter entries are equal, the output is a moving average of the inputs
This gives a blurrier version of the image.

Pasted image 20260219093639.png|500

Then we move one step forward (advance 1 position) and repeat, so that the filter eventually covers the whole image
Note: if we change our filter, the output can be very different
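A minimal NumPy sketch of the sliding-window operation described above. The helper name `filter2d` and the toy 5x5 image are illustrative assumptions, not something from the lecture. The sketch applies the filter without flipping it and without padding, so the output is smaller than the input.

```python
import numpy as np

def filter2d(image, kernel):
    """Slide kernel over image (no padding); sum the element-wise products."""
    kh, kw = kernel.shape
    out = np.zeros((image.shape[0] - kh + 1, image.shape[1] - kw + 1))
    for i in range(out.shape[0]):        # move down one row at a time
        for j in range(out.shape[1]):    # advance one position to the right
            patch = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(patch * kernel)
    return out

image = np.arange(25, dtype=float).reshape(5, 5)  # toy 5x5 "image"
box = np.ones((3, 3)) / 9.0                       # all-equal 3x3 filter
result = filter2d(image, box)

print(result.shape)   # a 3x3 filter fits in 3x3 positions of a 5x5 image
print(result[0, 0])   # approximately the mean of image[0:3, 0:3]
```

Swapping `box` for a different 3x3 array changes the output without touching the loop, which is the point of the note above.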

Pasted image 20260219093700.png|500

We want this operation to be related in a predictable way to transformations of the image (what if we shift the location of the left box?)
- How would the output change in this case?
- The output is simply shifted by the same amount!
This is called translation equivariance
- It means that if you apply a transformation to the input, the output changes in the equivalent way
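The shift property can be checked numerically. This is a small sketch under assumptions: `filter2d` is a hypothetical helper, the filter values are random, and the bright box is kept away from the borders so that `np.roll` (which wraps around) only moves zeros across the edge.

```python
import numpy as np

def filter2d(image, kernel):
    """Slide kernel over image (no padding); sum the element-wise products."""
    kh, kw = kernel.shape
    out = np.zeros((image.shape[0] - kh + 1, image.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

rng = np.random.default_rng(0)
kernel = rng.random((3, 3))            # arbitrary filter values

image = np.zeros((8, 8))
image[2:4, 2:4] = 1.0                  # small bright box
shifted = np.roll(image, 2, axis=1)    # same box moved 2 pixels to the right

out = filter2d(image, kernel)
out_shifted = filter2d(shifted, kernel)

# Filtering the shifted image gives the original output, shifted by 2 as well.
is_equivariant = np.allclose(out_shifted, np.roll(out, 2, axis=1))
print(is_equivariant)
```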
How about rotation? If we rotate the object and apply the same filter, will the output be related to the original output?
- In general, the output will not be related to the original output in any simple way
- What matters is the angle of the image relative to the filter. If you rotate only the image, the filter entries line up with different pixels, so you are multiplying different numbers together.
- If you rotate both the image and the filter, you get the same output, rotated.
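A quick check of the two rotation cases, restricted to a 90-degree rotation because `np.rot90` is exact on a pixel grid. The helper `filter2d`, the random image, and the particular asymmetric filter are assumptions chosen for illustration.

```python
import numpy as np

def filter2d(image, kernel):
    """Slide kernel over image (no padding); sum the element-wise products."""
    kh, kw = kernel.shape
    out = np.zeros((image.shape[0] - kh + 1, image.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

rng = np.random.default_rng(0)
image = rng.random((6, 6))
kernel = np.array([[-1.0, 0.0, 1.0],
                   [-2.0, 0.0, 2.0],
                   [-1.0, 0.0, 1.0]])   # asymmetric example filter (assumption)

out = filter2d(image, kernel)
rot_image_only = filter2d(np.rot90(image), kernel)
rot_both = filter2d(np.rot90(image), np.rot90(kernel))

# Rotating only the image does not give the rotated output ...
image_only_matches = np.allclose(rot_image_only, np.rot90(out))
# ... but rotating the image and the filter together does.
both_match = np.allclose(rot_both, np.rot90(out))
print(image_only_matches, both_match)
```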
What if we apply 4 different kernels to the input image, each a copy of the same filter at a different rotation angle? If the input image is rotated by one of those angles, you basically just change the order of the outputs (each one also rotated), so the outputs can still be related to those of the original image. Now you have a network with rotation equivariance for those 4 angles.
- But what happens if you rotate by just 3 degrees, or by an arbitrary number of degrees?
- Then it breaks our system, because we are only accounting for 4 different angles. So in general it is not that beneficial.
- This is not that intuitive to implement; to do it you have to rely on additional transformations that help with this.
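The reordering of outputs can be sketched with a bank of four 90-degree rotations of one filter. The names `filter2d`, `base`, and `bank` are hypothetical, and the check only covers rotations by multiples of 90 degrees, which is exactly the limitation noted above.

```python
import numpy as np

def filter2d(image, kernel):
    """Slide kernel over image (no padding); sum the element-wise products."""
    kh, kw = kernel.shape
    out = np.zeros((image.shape[0] - kh + 1, image.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

rng = np.random.default_rng(0)
image = rng.random((6, 6))
base = np.array([[-1.0, 0.0, 1.0],
                 [-2.0, 0.0, 2.0],
                 [-1.0, 0.0, 1.0]])            # example filter (assumption)
bank = [np.rot90(base, r) for r in range(4)]   # 0, 90, 180, 270 degrees

outs = [filter2d(image, k) for k in bank]
outs_rot = [filter2d(np.rot90(image), k) for k in bank]

# Kernel r applied to the rotated image reproduces kernel r-1 applied to the
# original image (rotated by 90 degrees): same outputs, different order.
reordered = all(
    np.allclose(outs_rot[r], np.rot90(outs[(r - 1) % 4])) for r in range(4)
)
print(reordered)
```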
This is the idea of convolution
- Depending on the filter, the output will be very different
- The numbers in your filter are actually learned from data: you treat them as parameters and learn them from data.
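To illustrate treating the filter entries as parameters, here is a toy sketch that recovers a 3x3 filter from one input/output pair. It uses ordinary least squares (`np.linalg.lstsq`) as a stand-in, since the output is linear in the 9 filter values; this is not how a convolutional network is normally trained, and the helper `filter2d` and the "unknown" filter are assumptions.

```python
import numpy as np

def filter2d(image, kernel):
    """Slide kernel over image (no padding); sum the element-wise products."""
    kh, kw = kernel.shape
    out = np.zeros((image.shape[0] - kh + 1, image.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

rng = np.random.default_rng(0)
true_kernel = np.array([[-1.0, 0.0, 1.0],
                        [-2.0, 0.0, 2.0],
                        [-1.0, 0.0, 1.0]])   # filter we pretend not to know
image = rng.random((8, 8))
target = filter2d(image, true_kernel)        # observed outputs (the "data")

# Each output pixel is a dot product between a flattened 3x3 patch and the
# 9 unknown filter values, so the filter solves a linear system.
patches = np.array([image[i:i + 3, j:j + 3].ravel()
                    for i in range(6) for j in range(6)])
solution, *_ = np.linalg.lstsq(patches, target.ravel(), rcond=None)
learned = solution.reshape(3, 3)

print(np.round(learned, 3))   # should be close to true_kernel
```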

Box Filter

What does it do?

Replaces each pixel with the average of its neighborhood
Achieves a smoothing effect (removes sharp features)
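A small sketch of the smoothing effect, assuming a 3x3 box filter and a toy image with one sharp bright pixel (both are illustrative choices, as is the helper `filter2d`).

```python
import numpy as np

def filter2d(image, kernel):
    """Slide kernel over image (no padding); sum the element-wise products."""
    kh, kw = kernel.shape
    out = np.zeros((image.shape[0] - kh + 1, image.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.zeros((5, 5))
image[2, 2] = 9.0                # one sharp bright pixel
box = np.ones((3, 3)) / 9.0      # each output is the mean of a 3x3 neighborhood

smoothed = filter2d(image, box)
print(smoothed)   # the spike of height 9 is spread out into values near 1
```

Every 3x3 window of this 5x5 image contains the bright pixel, so all outputs come out equal and the sharp feature is gone.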

Pasted image 20260219093857.png|150

Smoothing with box filter

Pasted image 20260219094504.png|400

Practice with linear filters

Pasted image 20260219094613.png|500

This filter would do nothing to the original image
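A sketch of a "do nothing" filter, assuming the filter in the figure is the usual one with a single 1 at the center and zeros elsewhere. The helper `filter2d` is hypothetical and uses no padding, so the comparison is against the image with its one-pixel border removed.

```python
import numpy as np

def filter2d(image, kernel):
    """Slide kernel over image (no padding); sum the element-wise products."""
    kh, kw = kernel.shape
    out = np.zeros((image.shape[0] - kh + 1, image.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

identity = np.zeros((3, 3))
identity[1, 1] = 1.0             # only the center pixel contributes

image = np.arange(25, dtype=float).reshape(5, 5)
out = filter2d(image, identity)

unchanged = np.array_equal(out, image[1:-1, 1:-1])
print(unchanged)
```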

Pasted image 20260219094645.png|500

Pasted image 20260219094719.png|500

The output on the right shows the vertical edges (absolute value)
How do the negative numbers in the filter affect the result in this case?
- Remember we are doing element-wise multiplication
The positive and negative entries cancel wherever the pixels on the left and right are equal, so this filter responds at the vertical borders but not at the horizontal borders
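A sketch of how the negative entries behave, assuming a Sobel-style vertical edge filter (the exact filter in the figure is not reproduced here) and three toy images. `filter2d` is a hypothetical helper.

```python
import numpy as np

def filter2d(image, kernel):
    """Slide kernel over image (no padding); sum the element-wise products."""
    kh, kw = kernel.shape
    out = np.zeros((image.shape[0] - kh + 1, image.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

vertical = np.array([[-1.0, 0.0, 1.0],
                     [-2.0, 0.0, 2.0],
                     [-1.0, 0.0, 1.0]])   # right column minus left column

dark_to_bright = np.zeros((6, 6))
dark_to_bright[:, 3:] = 1.0               # vertical border in the middle
bright_to_dark = 1.0 - dark_to_bright     # same border, opposite contrast
horizontal_border = dark_to_bright.T      # border runs left to right

resp_a = filter2d(dark_to_bright, vertical)     # positive at the border
resp_b = filter2d(bright_to_dark, vertical)     # negative at the border
resp_c = filter2d(horizontal_border, vertical)  # left and right cancel

print(resp_a[0], resp_b[0], resp_c[0])
print(np.abs(resp_b[0]))   # absolute value treats both contrasts alike
```

The sign of the response only records which side of the border is brighter, which is why the figures show the absolute value.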

Pasted image 20260219095015.png|500

This is the horizontal edge output (absolute value)
If you make the filter slightly larger you can also build a 45-degree edge detector and, in principle, any detector you can think of
Just by playing around with the filter we can transform the input image and make it easier to recognize what is in it.

Image filtering

Filters can be designed to detect edges of different orientations
The detected edges can be combined to form object shapes, which are important for object recognition
Convolutional neural networks are based on the idea of image convolutions
Convolutional neural networks use data to train filter parameters
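As a rough illustration of combining edges of different orientations into a shape, the sketch below adds the absolute responses of a vertical and a horizontal edge filter on a toy image of a square. The Sobel-style filters, the helper `filter2d`, and the choice to combine by summing absolute values are all assumptions; the notes do not specify a particular combination rule.

```python
import numpy as np

def filter2d(image, kernel):
    """Slide kernel over image (no padding); sum the element-wise products."""
    kh, kw = kernel.shape
    out = np.zeros((image.shape[0] - kh + 1, image.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

vertical = np.array([[-1.0, 0.0, 1.0],
                     [-2.0, 0.0, 2.0],
                     [-1.0, 0.0, 1.0]])
horizontal = vertical.T                   # bottom row minus top row

image = np.zeros((7, 7))
image[2:5, 2:5] = 1.0                     # bright 3x3 square on a dark background

v = np.abs(filter2d(image, vertical))     # left and right sides of the square
h = np.abs(filter2d(image, horizontal))   # top and bottom sides of the square
outline = v + h                           # simple combination (assumption)

print(outline)   # nonzero around the square's border, zero at its center
```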