RE: The Chain Rule

in #math · 8 years ago

Nice article. The chain rule is central to the backpropagation algorithm used during the learning phase of a neural network. It provides a way to determine precisely how much a neuron's weight in an early layer of the network affects the final prediction value produced after the last hidden layer of the network.
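
In symbols (my own generic notation, not from the post: ŷ is the prediction, a^{(i)} the activations of layer i, and w a weight used in some early layer k), the quantity being computed is a product of layer-by-layer derivatives:

$$
\frac{\partial \hat{y}}{\partial w} \;=\; \frac{\partial \hat{y}}{\partial a^{(n-1)}} \cdot \frac{\partial a^{(n-1)}}{\partial a^{(n-2)}} \cdots \frac{\partial a^{(k)}}{\partial w}
$$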

Since that derivative cannot be calculated directly, the chain rule lets you factor it: you write it as the derivative of the output with respect to the activations of the previous layer (layer n-1), times the derivative of those activations with respect to the target weight. That second factor is still unknown, so you repeat the process, expanding it through layer n-2, then layer n-3, and so on, until you reach the layer in which the target weight is actually used. The derivative at that layer can be computed directly, so the chain is complete, and you just multiply all of the factors together.
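
Here is a minimal sketch of that unfolding (my own toy example, not from the original post): a network with one neuron per layer, y = w3·s(w2·s(w1·x)) with sigmoid activation s, where the gradient of a squared-error loss with respect to the earliest weight w1 is recovered by multiplying the chained local derivatives, then checked numerically. All weights and inputs are made-up values.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

# Hypothetical weights and a single training example.
w1, w2, w3 = 0.5, -0.3, 0.8
x, target = 1.5, 0.0

# Forward pass: keep each pre-activation so we can differentiate locally.
z1 = w1 * x
a1 = sigmoid(z1)      # layer 1 activation
z2 = w2 * a1
a2 = sigmoid(z2)      # layer 2 activation
y = w3 * a2           # linear output layer
loss = 0.5 * (y - target) ** 2

# Backward pass: one known local derivative per link in the chain.
dL_dy   = y - target           # dL/dy
dy_da2  = w3                   # dy/da2
da2_dz2 = sigmoid_prime(z2)    # da2/dz2
dz2_da1 = w2                   # dz2/da1
da1_dz1 = sigmoid_prime(z1)    # da1/dz1
dz1_dw1 = x                    # dz1/dw1

# Multiply all of the chained factors together, as described above.
dL_dw1 = dL_dy * dy_da2 * da2_dz2 * dz2_da1 * da1_dz1 * dz1_dw1

# Sanity check against a numerical (finite-difference) derivative.
eps = 1e-6
def forward(w1_):
    a1_ = sigmoid(w1_ * x)
    a2_ = sigmoid(w2 * a1_)
    return 0.5 * (w3 * a2_ - target) ** 2

numeric = (forward(w1 + eps) - forward(w1 - eps)) / (2 * eps)
print(f"chain rule: {dL_dw1:.8f}  numeric: {numeric:.8f}")
```

The two printed values agree, which is the whole trick: each factor is cheap to compute where it lives, and the chain rule stitches them into the global derivative.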

It's an elegant solution that's fascinating to watch unfold.

Yes, I remember hearing about this before. Thanks for sharing, very interesting. Might be a good topic for a more in-depth post!
