Machine Learning (Hung-yi Lee, NTU, 2017 Spring)
Course Link:
http://speech.ee.ntu.edu.tw/~tlkagk/courses_ML17.html
Video Link:
https://www.bilibili.com/video/av10590361?from=search&seid=10585674701848945740
https://www.youtube.com/watch?v=CXgbekl66jc&list=PLJV_el3uVTsPy9oCRY30oBPNLCo89yu49
1 Introduction of this course
Handout: http://speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2017/Lecture/introduction.pdf
The red part of the handout illustrates that the same task can be solved by different types of methods.
2 Regression
Handout: http://speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2017/Lecture/Regression.pdf
The output of regression is a scalar.
- Step 1: Model (a set of functions)
In this step, we define a set of candidate functions that relate the input to the output through the parameters, e.g. the linear model \(y = b + w \cdot x\).
- Step 2: Goodness of Function
A loss function is used in this part to assess whether a function is good or not.
- Step 3: Best Function
loss function: \(L(w,b)\)
$$w^{*}, b^{*} = \arg\min_{w,b} L(w,b)$$
Use gradient descent -> pick the best function (a minimal sketch follows below).
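As an illustration of the three steps, here is a minimal gradient descent sketch for the linear model \(y = b + w \cdot x\) with the sum-of-squared-errors loss. The data values and learning rate are hypothetical toy choices, not from the course.

```python
import numpy as np

# Hypothetical toy data: five (x, y) pairs roughly on the line y = 2x.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

w, b = 0.0, 0.0   # Step 1: the model y = b + w * x, parameters initialized to 0
lr = 0.01         # learning rate (eta), chosen small enough to converge here

for step in range(1000):
    error = y - (b + w * x)            # Step 2: residuals of the squared-error loss
    grad_w = -2.0 * np.sum(error * x)  # dL/dw for L(w, b) = sum((y - (b + w*x))^2)
    grad_b = -2.0 * np.sum(error)      # dL/db
    w -= lr * grad_w                   # Step 3: move against the gradient
    b -= lr * grad_b

print(f"w = {w:.3f}, b = {b:.3f}")  # approaches the least-squares fit
```

With a convex loss like this one, the loop converges to the global minimum; with a larger learning rate the same loop can diverge, which is the point of the Q&A below.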
Q: Each time we update the parameters, we obtain a \(\theta\) that makes \(L(\theta)\) smaller. Is this statement correct?
A: No. If the learning rate is too large, an update can overshoot the minimum and make \(L(\theta)\) larger, as the chart in the lecture slides illustrates.
In linear regression there is no need to worry about local minima, since the loss function is convex: any local minimum is also the global minimum.
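One way to check the convexity claim (my own derivation, not in the handout): the Hessian of the sum-of-squares loss is positive semi-definite, since for any vector \((a, c)\):

$$
\nabla^{2} L(w,b) = 2\sum_{n}\begin{pmatrix}(x^{n})^{2} & x^{n}\\ x^{n} & 1\end{pmatrix},
\qquad
\begin{pmatrix}a & c\end{pmatrix}\nabla^{2}L\begin{pmatrix}a\\ c\end{pmatrix} = 2\sum_{n}\left(a\,x^{n}+c\right)^{2} \ge 0 .
$$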
How good is the result?
- Generalization
We use curves of different complexity to fit the underlying relation between the input and the output.
Overfitting: we get good results on the training set but bad results on the test set.
A more complex model doesn’t always lead to better performance on testing data.
One way to fix overfitting is to collect more data. There may also be hidden factors behind the overfitting: different kinds of data fit different kinds of models (see the demo below).
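To see the training/testing gap concretely, the sketch below fits polynomials of increasing degree to the same synthetic data (hypothetical toy numbers, not the course's dataset) and compares train and test error:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic toy data: a noisy cubic relation, with separate train and test sets.
x_train = rng.uniform(-3, 3, 15)
y_train = x_train**3 - 2 * x_train + rng.normal(0, 2, 15)
x_test = rng.uniform(-3, 3, 15)
y_test = x_test**3 - 2 * x_test + rng.normal(0, 2, 15)

for degree in (1, 3, 9):
    # Higher degree = a more complex model (the "redesign the model" direction).
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree}: train MSE = {train_mse:.2f}, test MSE = {test_mse:.2f}")
```

Typically the degree-9 fit drives the training error toward zero while the test error grows: good results on the training set, bad on the test set.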
Back to Step 1: Redesign the model.
Back to Step 2: Regularization. In this section we believe that functions with smaller \(w_i\) are better, because smaller weights give a smoother function.
We apply regularization to the weights but not to the bias, since in most situations the bias does not affect the smoothness of the curve. As \(\lambda\) increases, the training error rises, but the fitted curve becomes smoother: the larger \(\lambda\) is, the smoother the curve will be (a minimal sketch follows below).
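A minimal sketch of the regularized version, reusing the toy linear model from above: the loss becomes \(L(w,b) = \sum_n \big(y^n - (b + w\,x^n)\big)^2 + \lambda w^2\), so only the weight gradient picks up the extra \(2\lambda w\) term. The data and \(\lambda\) values are hypothetical.

```python
import numpy as np

def fit_regularized(x, y, lam, lr=0.01, steps=1000):
    """Gradient descent on the L2-regularized squared-error loss (bias unregularized)."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        error = y - (b + w * x)
        grad_w = -2.0 * np.sum(error * x) + 2.0 * lam * w  # extra term from lambda * w^2
        grad_b = -2.0 * np.sum(error)                      # bias: no regularization term
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Hypothetical toy data; a larger lambda shrinks w toward 0 (a flatter, smoother line).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
for lam in (0.0, 1.0, 10.0):
    w, b = fit_regularized(x, y, lam)
    print(f"lambda = {lam:>4}: w = {w:.3f}, b = {b:.3f}")
```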