The Chain Rule

in #math, 8 years ago


In calculus there are several rules for taking the derivative of particular types of functions, for example the power rule, the product rule and the quotient rule. In this post I will discuss the most powerful of them, the chain rule, and work through several examples of how it is used.

The chain rule allows us to take the derivative of the composition of two differentiable functions. The chain rule is the following:


Chain Rule:

Suppose that f and g are differentiable functions. Then [f(g(x))]' = f'(g(x)) · g'(x).


Thus the chain rule tells us that the derivative of the composition of f and g is equal to the derivative of f evaluated at g(x), times the derivative of g at x. I will now go over several examples of how to use the chain rule.
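The statement above can be turned into a small helper and checked numerically. This is only a sketch: the helper names `chain_rule` and `central_difference`, and the test function h(x) = sqrt(x^2 + 1), are my own choices for illustration.

```python
import math

def chain_rule(f_prime, g, g_prime):
    """Return the derivative of x -> f(g(x)), built from f', g and g'."""
    return lambda x: f_prime(g(x)) * g_prime(x)

def central_difference(h, x, step=1e-6):
    """Numerical estimate of h'(x), used only as an independent check."""
    return (h(x + step) - h(x - step)) / (2 * step)

# Hypothetical example: h(x) = sqrt(x^2 + 1)
f = math.sqrt                                 # outside function
f_prime = lambda u: 1 / (2 * math.sqrt(u))    # its derivative
g = lambda x: x**2 + 1                        # inside function
g_prime = lambda x: 2 * x                     # its derivative

h = lambda x: f(g(x))
h_prime = chain_rule(f_prime, g, g_prime)

x0 = 1.5
print(h_prime(x0), central_difference(h, x0))
```

The two printed numbers should agree to several decimal places, and both should match the closed form x / sqrt(x^2 + 1).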

For the first example suppose that we want to find the derivative of h(x) = sin(2x). We can see that h is the composition of the two functions f(x) = sin(x) and g(x) = 2x. We first find f'(x) = cos(x) and evaluate this derivative at g(x) to get f'(g(x)) = cos(2x). Next we find g'(x) = 2. According to the chain rule we have h'(x) = cos(2x) · 2 = 2cos(2x).

For our second example we will find the derivative of h(x) = e^(x^2). Here we have f(x) = e^x and g(x) = x^2. We first calculate f'(x) = e^x and evaluate this derivative at g(x) to get f'(g(x)) = e^(x^2), and then multiply by g'(x) = 2x to get h'(x) = e^(x^2) · 2x = 2x e^(x^2).
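As a sanity check on the two examples so far, the derivatives obtained from the chain rule can be compared with a numerical estimate. The sketch below assumes a simple central-difference approximation; the function names and the sample points are arbitrary choices of mine.

```python
import math

def central_difference(h, x, step=1e-6):
    """Numerical estimate of h'(x)."""
    return (h(x + step) - h(x - step)) / (2 * step)

# Example 1: h(x) = sin(2x), chain rule gives h'(x) = 2 cos(2x)
h1 = lambda x: math.sin(2 * x)
h1_prime = lambda x: 2 * math.cos(2 * x)

# Example 2: h(x) = e^(x^2), chain rule gives h'(x) = 2x e^(x^2)
h2 = lambda x: math.exp(x**2)
h2_prime = lambda x: 2 * x * math.exp(x**2)

for name, h, h_prime in [("sin(2x)", h1, h1_prime), ("e^(x^2)", h2, h2_prime)]:
    for x in (-1.0, 0.3, 1.2):
        exact = h_prime(x)
        approx = central_difference(h, x)
        print(f"{name:8s} x={x:5.2f}  chain rule={exact:.6f}  numeric={approx:.6f}")
```

In each row the two columns should agree up to the small error of the numerical estimate.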

Once we become more familiar with the chain rule we will begin to do it without writing out all of the steps as in the previous two examples. There is a saying that helps us remember the main idea of the chain rule: the derivative of the composition of two functions is the derivative of the outside function f times the derivative of the inside function g. Although the phrase does not mention it, the derivative of f must be evaluated at g(x).

Using this idea let us find the derivative of the function h(x) = (2x + 3)^3. In this case the outside function is x^3 and the inside function is 2x + 3. So according to the chain rule the derivative is h'(x) = (derivative of outside, evaluated at the inside)(derivative of inside) = 3(2x + 3)^2 · 2 = 6(2x + 3)^2.

As the last example let us find the derivative of h(x) = cos(e^x + 1). The outside function is now cos(x) and the inside function is e^x + 1. According to the chain rule we have
h'(x) = -sin(e^x + 1) · e^x
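The "outside times inside" idea can also be carried out mechanically by a program. The sketch below uses dual numbers (a value paired with its derivative), which is one common way to do this and not something taken from the post; the class `Dual` and the helpers `exp`, `cos` and `derivative` are hypothetical names defined here for illustration.

```python
import math
from dataclasses import dataclass

@dataclass
class Dual:
    """A value together with its derivative with respect to x."""
    value: float
    deriv: float

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other, 0.0)
        return Dual(self.value + other.value, self.deriv + other.deriv)
    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other, 0.0)
        # Product rule.
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)
    __rmul__ = __mul__

    def __pow__(self, n):
        # Power rule on the outside, times the derivative of the inside.
        return Dual(self.value ** n, n * self.value ** (n - 1) * self.deriv)

def exp(u):
    # Outside derivative e^u, times the derivative of the inside.
    return Dual(math.exp(u.value), math.exp(u.value) * u.deriv)

def cos(u):
    # Outside derivative -sin(u), times the derivative of the inside.
    return Dual(math.cos(u.value), -math.sin(u.value) * u.deriv)

def derivative(h, x):
    """Evaluate h'(x) by seeding the input with derivative 1."""
    return h(Dual(x, 1.0)).deriv

h3 = lambda x: (2 * x + 3) ** 3
h4 = lambda x: cos(exp(x) + 1)

print(derivative(h3, 1.0))   # 150.0, matching 6(2x + 3)^2 at x = 1
print(derivative(h4, 0.5))   # should match -sin(e^x + 1) e^x at x = 0.5
```

Every elementary operation multiplies its own derivative by the derivative of whatever was passed in, which is exactly the chain rule applied one step at a time.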

In this post we have introduced the chain rule and have given several examples of how to calculate the derivative of the composition of two differentiable functions. With the chain rule we are not limited to differentiating the composition of only two functions. For example, to find the derivative of the composition of three functions we apply the chain rule twice. From this it should be clear that we can use the chain rule to find the derivative of compositions of any length, and thus the chain rule provides a powerful tool for finding the derivative of complicated functions.
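To illustrate the remark about longer compositions, here is a sketch that applies the chain rule once per layer of a composition of any length. The function `chain_derivative` and the three-function example h(x) = sin(e^(x^2)) are my own illustrative choices.

```python
import math

def chain_derivative(layers, x):
    """Derivative at x of a composition of functions.

    Each layer is a pair (function, derivative), listed from the
    innermost function to the outermost one.
    """
    value, deriv = x, 1.0
    for func, func_prime in layers:
        deriv *= func_prime(value)   # this layer's derivative, evaluated at its input
        value = func(value)          # pass the value on to the next layer out
    return deriv

# Hypothetical example: h(x) = sin(e^(x^2))
layers = [
    (lambda x: x**2, lambda x: 2 * x),   # innermost: x^2
    (math.exp, math.exp),                # middle: e^u
    (math.sin, math.cos),                # outermost: sin(v)
]

x0 = 0.7
by_chain_rule = chain_derivative(layers, x0)
by_hand = math.cos(math.exp(x0**2)) * math.exp(x0**2) * 2 * x0
print(by_chain_rule, by_hand)
```

Applying the chain rule twice by hand gives h'(x) = cos(e^(x^2)) · e^(x^2) · 2x, which is what `by_hand` computes.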


References:

https://en.wikipedia.org/wiki/Chain_rule
http://mathworld.wolfram.com/ChainRule.html


All the images in this post were created by me using LaTeX.

Nice article. The chain rule is central to the backpropagation algorithm used during the learning phase of a neural network. It provides a way to determine the precise amount that a neuron's weight in an early layer of the network affects the ultimate prediction value after the last hidden layer of the network.

Since that derivative cannot be calculated directly, you replace it with the derivative with respect to a known weight times the derivative of the activation function in the previous layer (layer n-1) with respect to the target weight. Now, since that last derivative is unknown, you repeat the process, replacing it with the derivative with respect to a known weight times the derivative of the activation function in the previous layer (layer n-2) with respect to the target weight. And you keep repeating that until you incorporate a derivative of the activation function in the layer in which the weight is used. Since that derivative is known, the process of chaining together all of these derivatives is complete, and now you just multiply them all together.

It's an elegant solution that's fascinating to watch unfold.
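A toy version of this idea, as a sketch only: the network below has a single input, one tanh hidden unit and one output weight, which is far simpler than a real neural network, and all names (`forward`, `loss`, `grad_w1`) are hypothetical. It shows the gradient of the loss with respect to the early weight as a product of local derivatives, checked against a numerical estimate.

```python
import math

def forward(w1, w2, x):
    """Tiny network: x -> z = w1*x -> a = tanh(z) -> y = w2*a."""
    z = w1 * x
    a = math.tanh(z)
    y = w2 * a
    return z, a, y

def loss(w1, w2, x, target):
    """Squared error of the prediction."""
    return (forward(w1, w2, x)[2] - target) ** 2

def grad_w1(w1, w2, x, target):
    """dL/dw1 as a chain of local derivatives, from the loss back to w1."""
    z, a, y = forward(w1, w2, x)
    dL_dy = 2 * (y - target)   # loss with respect to the prediction
    dy_da = w2                 # prediction with respect to the activation
    da_dz = 1 - a**2           # derivative of tanh
    dz_dw1 = x                 # pre-activation with respect to the early weight
    return dL_dy * dy_da * da_dz * dz_dw1

w1, w2, x, target = 0.4, -1.3, 0.8, 0.5
step = 1e-6
numeric = (loss(w1 + step, w2, x, target) - loss(w1 - step, w2, x, target)) / (2 * step)
print(grad_w1(w1, w2, x, target), numeric)
```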

Yes, I remember hearing about this before. Thanks for sharing, very interesting. Might be a good topic for a more in-depth post!
