Machine Learning (Hung-yi Lee, NTU, 2017 Spring)
Course Link:
http://speech.ee.ntu.edu.tw/~tlkagk/courses_ML17.html
Video Link:
https://www.bilibili.com/video/av10590361?from=search&seid=10585674701848945740
https://www.youtube.com/watch?v=CXgbekl66jc&list=PLJV_el3uVTsPy9oCRY30oBPNLCo89yu49
1 Introduction of this course
Handout: http://speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2017/Lecture/introduction.pdf
The red part of the handout illustrates that the same task can be solved by different types of methods.
2 Regression
Handout: http://speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2017/Lecture/Regression.pdf
The output of regression is a scalar.
- Step 1: Model (a set of functions)
In this step, we define a set of candidate functions that relate the input to the output through the parameters, e.g. the linear model \(y = b + w \cdot x\).
- Step 2: Goodness of Function
A loss function is used in this part to assess whether a function is good or not.
- Step 3: Best Function
loss function: \(L(w,b)\)
$$w^{*}, b^{*} = \arg\min_{w,b} L(w,b)$$
Use gradient descent -> pick the best function (a minimal sketch follows below).
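As an illustration of the three steps, here is a minimal gradient descent sketch for the linear model \(y = b + w \cdot x\) with the sum-of-squared-errors loss. The data values and learning rate are hypothetical toy choices, not from the course.

```python
import numpy as np

# Hypothetical toy data: five (x, y) pairs roughly on the line y = 2x.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

w, b = 0.0, 0.0   # Step 1: the model y = b + w * x, parameters initialized to 0
lr = 0.01         # learning rate (eta), chosen small enough to converge here

for step in range(1000):
    error = y - (b + w * x)            # Step 2: residuals of the squared-error loss
    grad_w = -2.0 * np.sum(error * x)  # dL/dw for L(w, b) = sum((y - (b + w*x))^2)
    grad_b = -2.0 * np.sum(error)      # dL/db
    w -= lr * grad_w                   # Step 3: move against the gradient
    b -= lr * grad_b

print(f"w = {w:.3f}, b = {b:.3f}")  # approaches the least-squares fit
```

With a convex loss like this one, the loop converges to the global minimum; with a larger learning rate the same loop can diverge, which is the point of the Q&A below.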
Q: Each time we update the parameters, we obtain a \(\theta\) that makes \(L(\theta)\) smaller. Is this statement correct?
A: No. If the learning rate is too large, an update can overshoot the minimum and make \(L(\theta)\) larger, as the chart in the lecture slides illustrates.
In linear regression there is no need to worry about local minima, since the loss function is convex: any local minimum is also the global minimum.
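One way to check the convexity claim (my own derivation, not in the handout): the Hessian of the sum-of-squares loss is positive semi-definite, since for any vector \((a, c)\):

$$
\nabla^{2} L(w,b) = 2\sum_{n}\begin{pmatrix}(x^{n})^{2} & x^{n}\\ x^{n} & 1\end{pmatrix},
\qquad
\begin{pmatrix}a & c\end{pmatrix}\nabla^{2}L\begin{pmatrix}a\\ c\end{pmatrix} = 2\sum_{n}\left(a\,x^{n}+c\right)^{2} \ge 0 .
$$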
How good is the result?
- Generalization
We use curves of different complexity to fit the underlying relation between the input and the output.
Overfitting: we get good results on the training set but bad results on the test set.
A more complex model doesn’t always lead to better performance on testing data.
One way to fix overfitting is to collect more data. There may also be hidden factors behind the overfitting: different kinds of data fit different kinds of models (see the demo below).
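To see the training/testing gap concretely, the sketch below fits polynomials of increasing degree to the same synthetic data (hypothetical toy numbers, not the course's dataset) and compares train and test error:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic toy data: a noisy cubic relation, with separate train and test sets.
x_train = rng.uniform(-3, 3, 15)
y_train = x_train**3 - 2 * x_train + rng.normal(0, 2, 15)
x_test = rng.uniform(-3, 3, 15)
y_test = x_test**3 - 2 * x_test + rng.normal(0, 2, 15)

for degree in (1, 3, 9):
    # Higher degree = a more complex model (the "redesign the model" direction).
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree}: train MSE = {train_mse:.2f}, test MSE = {test_mse:.2f}")
```

Typically the degree-9 fit drives the training error toward zero while the test error grows: good results on the training set, bad on the test set.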
Back to Step 1: Redesign the model.
Back to Step 2: Regularization. In this section we believe that functions with smaller \(w_i\) are better, because smaller weights give a smoother function.
We apply regularization to the weights but not to the bias, since in most situations the bias does not affect the smoothness of the curve. As \(\lambda\) increases, the training error rises, but the fitted curve becomes smoother: the larger \(\lambda\) is, the smoother the curve will be (a minimal sketch follows below).
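A minimal sketch of the regularized version, reusing the toy linear model from above: the loss becomes \(L(w,b) = \sum_n \big(y^n - (b + w\,x^n)\big)^2 + \lambda w^2\), so only the weight gradient picks up the extra \(2\lambda w\) term. The data and \(\lambda\) values are hypothetical.

```python
import numpy as np

def fit_regularized(x, y, lam, lr=0.01, steps=1000):
    """Gradient descent on the L2-regularized squared-error loss (bias unregularized)."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        error = y - (b + w * x)
        grad_w = -2.0 * np.sum(error * x) + 2.0 * lam * w  # extra term from lambda * w^2
        grad_b = -2.0 * np.sum(error)                      # bias: no regularization term
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Hypothetical toy data; a larger lambda shrinks w toward 0 (a flatter, smoother line).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
for lam in (0.0, 1.0, 10.0):
    w, b = fit_regularized(x, y, lam)
    print(f"lambda = {lam:>4}: w = {w:.3f}, b = {b:.3f}")
```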