The Regularization method to reduce over-fitting

mohism (35) in #blog • 6 years ago (edited)

Why we need regularization

As a deep neural network grows more complex, it becomes more likely to over-fit the training data, so we need techniques to counter over-fitting. One of them is regularization. There are several regularization methods; this essay discusses the general form.

How to do regularization

Regularization sounds grand and mysterious, but it is just an extra term added to the original cost function. So let's first review the cost function without regularization:
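A common way to write this cost is sketched below. The notation is an assumption: m is the number of training examples, \mathcal{L} is the per-example loss, \hat{y}^{(i)} is the prediction for example i, and y^{(i)} is its label.

```latex
J(W, b) = \frac{1}{m} \sum_{i=1}^{m} \mathcal{L}\left(\hat{y}^{(i)}, y^{(i)}\right)
```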

Then, let's look at the cost function with regularization:
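Assuming the widely used L2 (Frobenius-norm) penalty, the regularized cost can be written as below. The symbols are assumptions: m is the number of training examples, \mathcal{L} is the per-example loss, \lambda is the regularization parameter, and the name J_{\text{reg}} is introduced here only to distinguish it from the unregularized cost.

```latex
J_{\text{reg}}(W, b) = \frac{1}{m} \sum_{i=1}^{m} \mathcal{L}\left(\hat{y}^{(i)}, y^{(i)}\right) + \frac{\lambda}{2m} \lVert W \rVert_F^2
```

For a network with several layers, the penalty is usually summed over the weight matrix of every layer.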

In this equation, λ is called the regularization parameter; it is a hyper-parameter. Different values of λ produce different models.
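As a rough illustration of "just an extra term", here is a minimal NumPy sketch of a cost with an L2 penalty. It assumes binary cross-entropy as the data loss and a penalty of λ/(2m) times the sum of squared weights; the function name and arguments are hypothetical, not taken from the original post.

```python
import numpy as np

def l2_regularized_cost(y_hat, y, weights, lam):
    """Binary cross-entropy cost plus an L2 penalty (a sketch, not the author's code).

    y_hat   : array of shape (m,), predicted probabilities
    y       : array of shape (m,), labels in {0, 1}
    weights : list of weight matrices, one per layer
    lam     : regularization parameter (lambda)
    """
    m = y.shape[0]
    eps = 1e-12  # keeps log() away from 0
    y_hat = np.clip(y_hat, eps, 1 - eps)
    # Original cost: average cross-entropy over the m examples.
    data_cost = -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))
    # Extra term: lambda / (2m) times the sum of all squared weights.
    penalty = (lam / (2 * m)) * sum(np.sum(W ** 2) for W in weights)
    return data_cost + penalty

y_hat = np.array([0.9, 0.1])
y = np.array([1.0, 0.0])
weights = [np.ones((2, 2))]

cost_plain = l2_regularized_cost(y_hat, y, weights, lam=0.0)
cost_reg = l2_regularized_cost(y_hat, y, weights, lam=2.0)
print(cost_plain, cost_reg)
```

With `lam=0.0` the function reduces to the unregularized cost, so the difference between the two printed values is exactly the penalty term.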

The effect on the gradient descent method

In deep learning, gradient descent is commonly used to find the optimal parameter matrix W. Let's first review the gradient descent update for W:
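In the usual notation (an assumption here), with learning rate \alpha and unregularized cost J, one update step is:

```latex
W := W - \alpha \frac{\partial J}{\partial W}
```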

If we take the derivative of the new version of the cost function with respect to W, the new partial derivative is:
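Assuming the L2 penalty \frac{\lambda}{2m} \lVert W \rVert_F^2, differentiating it with respect to W contributes \frac{\lambda}{m} W. Writing J for the unregularized cost and J_{\text{reg}} for the regularized one (names assumed for this sketch), with m training examples:

```latex
\frac{\partial J_{\text{reg}}}{\partial W} = \frac{\partial J}{\partial W} + \frac{\lambda}{m} W
```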

Substituting this new derivative (equation 5) into the gradient descent update (equation 3), we get:
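Under the same assumptions (L2 penalty, learning rate \alpha, m training examples, J the unregularized cost), the substitution works out as:

```latex
W := W - \alpha \left( \frac{\partial J}{\partial W} + \frac{\lambda}{m} W \right)
   = \left( 1 - \frac{\alpha \lambda}{m} \right) W - \alpha \frac{\partial J}{\partial W}
```

This is why L2 regularization is often called "weight decay": each step first scales W by a constant factor and then applies the ordinary gradient step.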

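From the resulting update, we can see that the factor multiplying W, which is 1 - αλ/m in the standard L2 form (α is the learning rate and m the number of training examples), is less than 1. Each step therefore shrinks W a little, so the final value of W will be smaller than without regularization. The larger λ becomes, the smaller the final value of W.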
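The two ways of writing the update can be checked numerically. The sketch below assumes the L2 form of the penalty; the function names and the random stand-in for the backprop gradient are hypothetical and only serve the comparison.

```python
import numpy as np

def update_with_penalty_gradient(W, dW_backprop, alpha, lam, m):
    # Add the penalty's gradient (lam / m) * W, then take a normal step.
    dW = dW_backprop + (lam / m) * W
    return W - alpha * dW

def update_with_weight_decay(W, dW_backprop, alpha, lam, m):
    # Same step, rewritten: shrink W first, then apply the unregularized gradient.
    return (1 - alpha * lam / m) * W - alpha * dW_backprop

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))
dW_backprop = rng.normal(size=(3, 4))  # stand-in for the gradient of the unregularized cost
alpha, lam, m = 0.1, 0.7, 50

a = update_with_penalty_gradient(W, dW_backprop, alpha, lam, m)
b = update_with_weight_decay(W, dW_backprop, alpha, lam, m)
decay_factor = 1 - alpha * lam / m

print(np.allclose(a, b))   # both forms give the same new W
print(decay_factor)        # slightly below 1 for positive alpha and lam
```

The decay factor stays between 0 and 1 as long as αλ/m is positive and smaller than 1, which holds for typical learning rates and regularization strengths.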

Why Regularization can reduce over-fitting

To answer this question intuitively, we start with a basic observation: a trained machine learning model falls into one of three cases: "High bias", "Just right" and "High variance".

Our target is "Just right", and regularization is used to reduce the third case, "High variance".

According to the derivation in the last section, as λ gets bigger, the final W gets smaller. If λ becomes large enough, W will approach zero. That means the whole network behaves like a very simple model, such as logistic regression, because the majority of the network weights are close to 0; at that point the model is pushed toward the "High bias" case instead. So we can look for a middle value of λ that gives the "Just right" case.
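The shrinking effect can be seen in a small experiment. The sketch below is a deliberate simplification: it uses a linear model with a squared-error cost instead of a deep network, synthetic data, and the weight-decay form of the update. The function name `fit_linear_l2`, the data, and all hyper-parameter values are assumptions chosen for illustration.

```python
import numpy as np

def fit_linear_l2(X, y, lam, alpha=0.05, steps=1000):
    """Gradient descent on a linear model with an L2 penalty lam / (2m) * ||W||^2."""
    m, d = X.shape
    W = np.zeros(d)
    for _ in range(steps):
        # Gradient of the unregularized cost (1 / 2m) * ||X W - y||^2.
        grad = X.T @ (X @ W - y) / m
        # Weight-decay form of the regularized update.
        W = (1 - alpha * lam / m) * W - alpha * grad
    return W

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
true_W = np.array([2.0, -1.0, 0.5, 3.0, -2.0])
y = X @ true_W + 0.1 * rng.normal(size=100)

lams = [0.0, 1.0, 10.0, 100.0, 1000.0]
norms = [float(np.linalg.norm(fit_linear_l2(X, y, lam))) for lam in lams]
for lam, norm in zip(lams, norms):
    print(f"lambda={lam:>7.1f}  ||W||={norm:.4f}")
```

The printed norm of W falls as λ grows. In practice a middle value of λ is usually chosen by comparing performance on a held-out validation set, not by looking at the weights alone.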

#technology #science
