04 - Overfitting and regularization
Class: CSCE-421
Notes:
Recap
- Linear regression: we use a linear model to predict a continuous value
- 01 - Linear Regression
- There are a lot of different names for this: linear regression/least squares regression etc.
- "you first compute the difference and then take the square"
- This is L2 loss
- L1 loss takes the absolute value of the difference instead of squaring it
- If your target is discrete (not continuous), we use cross-entropy loss instead, as in multi-class logistic regression
- Generalization: we want to make multi-class predictions
- Suppose you have K classes
- You want a score for each class, representing the likelihood that x belongs to that class
- Running the model gives you this score for each of the classes
- Then you pass the score vector through softmax to make a prediction; at test time that is all you need
- In training, each example also comes with a target label
- Call the probability vector you get from softmax at training time Q
- We convert the label into a one-hot vector P and compare it to Q: the cross-entropy loss is simply $-\sum_i P_i \log Q_i$
- This is the most commonly used loss for multi-class classification (a small numerical sketch follows this list)
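A minimal sketch of the pieces above in plain NumPy; the 3-class setup, the score values, and the regression numbers at the end are made up purely for illustration:

```python
import numpy as np

def softmax(scores):
    # Shift by the max score for numerical stability before exponentiating.
    z = scores - np.max(scores)
    exp_z = np.exp(z)
    return exp_z / exp_z.sum()

def cross_entropy(p, q):
    # Negative sum of P_i * log(Q_i); the small epsilon guards against log(0).
    return -np.sum(p * np.log(q + 1e-12))

# Hypothetical example with K = 3 classes.
scores = np.array([2.0, 1.0, -1.0])   # raw class scores for one input x
q = softmax(scores)                   # predicted probability vector Q
p = np.array([1.0, 0.0, 0.0])         # one-hot target vector P (true class is 0)
print("Q =", q, "cross-entropy =", cross_entropy(p, q))

# For regression targets, the recap's L2 loss squares the difference
# (with the conventional 1/2 factor) and the L1 loss takes its absolute value:
y_pred, t = 2.5, 3.0
print("L2 =", 0.5 * (y_pred - t) ** 2, "L1 =", abs(y_pred - t))
```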
Case Study: Polynomial Curve Fitting
Suppose we observe a real-valued input variable x and we wish to use this observation to predict the value of a real-valued target variable t.
![](/CSCE-421/Visual%20Aids/Pasted%20image%2020260203091602.png)
polynomial function
$$
y(x, \mathbf{w})=w_0+w_1 x+w_2 x^2+\ldots+w_M x^M=\sum_{j=0}^M w_j x^j
$$
Equivalently, map the input $x$ to the feature vector of its powers and take the dot product with the weight vector:
$$
y(x, \mathbf{w})=\left[w_0, w_1, \ldots, w_M\right]\left[\begin{array}{c}
1 \\
x \\
x^2 \\
\vdots \\
x^M
\end{array}\right]
$$
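A quick sketch of this dot-product view in plain NumPy; the values of M, the weights, and x below are arbitrary illustrative choices:

```python
import numpy as np

def poly_features(x, M):
    # Map a scalar x to the feature vector [1, x, x^2, ..., x^M].
    return np.array([x ** j for j in range(M + 1)])

M = 3
w = np.array([0.5, -1.0, 2.0, 0.3])   # hypothetical weights w_0 ... w_M
x = 1.5
y = w @ poly_features(x, M)           # y(x, w) = sum_j w_j * x^j
print(y)
```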
Sum-of-squares error function:
$$
E(\mathbf{w})=\frac{1}{2} \sum_{n=1}^N\left\{y\left(x_n, \mathbf{w}\right)-t_n\right\}^2
$$
Root-mean-square error:
$$
E_{\mathrm{RMS}}=\sqrt{2 E\left(\mathbf{w}^{\star}\right) / N}
$$
Regularized error, penalizing large weights with strength $\lambda$:
$$
\widetilde{E}(\mathbf{w})=\frac{1}{2} \sum_{n=1}^N\left\{y\left(x_n, \mathbf{w}\right)-t_n\right\}^2+\frac{\lambda}{2}\|\mathbf{w}\|^2
$$
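A minimal end-to-end sketch in NumPy, assuming synthetic $\sin(2\pi x)$ + noise data in the spirit of the usual curve-fitting example; the closed-form solution $\mathbf{w}^{\star}=(\lambda I+\Phi^{T}\Phi)^{-1}\Phi^{T}\mathbf{t}$ used here is one standard way to minimize the regularized error above, and the values of N, M, and $\lambda$ are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic training data (assumed, for illustration): t = sin(2*pi*x) + noise.
N, M, lam = 10, 9, 1e-3
x = rng.uniform(0.0, 1.0, size=N)
t = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=N)

# Design matrix whose rows are the polynomial feature vectors [1, x_n, ..., x_n^M].
Phi = np.vander(x, M + 1, increasing=True)

# Minimizing the regularized sum-of-squares error in closed form:
# w* = (lambda*I + Phi^T Phi)^{-1} Phi^T t
w_star = np.linalg.solve(lam * np.eye(M + 1) + Phi.T @ Phi, Phi.T @ t)

# Unregularized error E(w*) and the root-mean-square error E_RMS.
residuals = Phi @ w_star - t
E = 0.5 * np.sum(residuals ** 2)
E_rms = np.sqrt(2 * E / N)
print("E_RMS =", E_rms)
```

Increasing $\lambda$ shrinks the weights and smooths the fitted curve, which is the regularization knob the rest of this note is about.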