This page derives the learning rule for updating the weights of a single-layer artificial neural network. Since the learning rule is the same for each perceptron, we will focus on a single one. Throughout this derivation, we assume the weights are updated with the gradient descent algorithm.

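Let's consider a perceptron with \(N\) inputs \(x_1, ..., x_N\), weights \(w_1, ..., w_N\), an activation function \(f\) and an output \(y\).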

The transfer function is given by:

$$ \begin{equation} y= f(w_1.x_1 + w_2.x_2 + ... + w_N.x_N) = f(\sum\limits_{i=1}^N w_i.x_i) \label{eq:transfert-function} \end{equation} $$

Let's define the sum \(S\):

$$ \begin{equation} S(w_i,x_i)= \sum\limits_{i=1}^N w_i.x_i \label{eq:sum} \end{equation} $$

Let's rewrite \(y\) as a function of \(S\) by merging equations \( \eqref{eq:sum} \) and \( \eqref{eq:transfert-function} \):

$$ y(S)= f(\sum\limits_{i=1}^N w_i.x_i)=f(S(w_i,x_i)) $$
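As a minimal sketch of the forward computation above, the following Python snippet evaluates \(S\) and then \(y=f(S)\). The choice of a sigmoid for \(f\), the function names and the numeric values are assumptions made for illustration; the derivation itself holds for any differentiable \(f\).

```python
import math

def weighted_sum(weights, inputs):
    """Compute S = sum_i w_i * x_i."""
    return sum(w * x for w, x in zip(weights, inputs))

def sigmoid(s):
    """Example activation function f (an assumption, not imposed by the text)."""
    return 1.0 / (1.0 + math.exp(-s))

def perceptron_output(weights, inputs, f=sigmoid):
    """Compute y = f(S)."""
    return f(weighted_sum(weights, inputs))

# Illustrative values only
weights = [0.5, -0.25]
inputs = [2.0, 4.0]

S = weighted_sum(weights, inputs)       # 0.5*2.0 + (-0.25)*4.0 = 0.0
y = perceptron_output(weights, inputs)  # sigmoid(0.0) = 0.5
print(S, y)
```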

In artificial neural networks, a common choice for the error we want to minimize is the squared error:

$$ E=(y'-y)^2 $$

with:

- \(E\) the error
- \(y'\) the expected output (from the training data set)
- \(y\) the actual output (computed by the network)

In practice, this error is divided by two so that the factor of 2 produced by differentiating the square cancels out:

$$ E=\frac{1}{2}(y'-y)^2 $$
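The halved squared error can be written as a one-line function. This is only an illustrative sketch; the name `squared_error` and the sample values are hypothetical.

```python
def squared_error(expected, actual):
    """E = 1/2 * (y' - y)^2, with y' the expected and y the actual output."""
    return 0.5 * (expected - actual) ** 2

# Illustrative values: expected output 1.0, actual output 0.5
E = squared_error(1.0, 0.5)  # 0.5 * 0.5**2 = 0.125
print(E)
```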

The algorithm (gradient descent) used to train the network (i.e. updating the weights) is given by:

$$ \begin{equation} w_i'=w_i-\eta.\frac{dE}{dw_i} \label{eq:gradient-descent} \end{equation} $$

where:

- \( w_i \) the weight before update
- \( w_i' \) the weight after update
- \( \eta \) the learning rate
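To see the update rule in action before applying it to the perceptron, here is a small sketch on a toy one-dimensional error \(E(w)=(w-3)^2\), whose derivative is \(2(w-3)\). The toy error, the function names, the learning rate and the number of iterations are all assumptions chosen for illustration.

```python
def gradient_descent_step(w, grad, eta):
    """One update: w' = w - eta * dE/dw."""
    return w - eta * grad

def toy_error_gradient(w):
    """Derivative of the toy error E(w) = (w - 3)^2 (hypothetical example)."""
    return 2.0 * (w - 3.0)

w = 0.0    # initial weight
eta = 0.1  # learning rate
for _ in range(50):
    w = gradient_descent_step(w, toy_error_gradient(w), eta)

print(w)  # approaches 3.0, the minimum of the toy error
```

Each step moves the weight against the gradient, so the weight drifts toward the value that minimizes the error.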

Let's differentiate the error with respect to \(w_i\):

$$ \begin{equation} \frac{dE}{dw_i} = \frac{1}{2}\frac{d}{dw_i}(y'-y)^2 \label{eq:error} \end{equation} $$

Thanks to the chain rule

$$ (f \circ g)'=(f' \circ g).g' $$

and because the expected output \(y'\) does not depend on \(w_i\), the equation \( \eqref{eq:error} \) can be rewritten:

$$ \frac{dE}{dw_i} = \frac{2}{2}(y'-y)\frac{d}{dw_i} (y'-y) = -(y'-y)\frac{dy}{dw_i} $$

Let's now calculate the derivative of \(y\):

$$ \begin{equation} \frac{dy}{dw_i} = \frac{df(S(w_i,x_i))}{dw_i} \label{eq:dy-dwi} \end{equation} $$

Once again, we use the chain rule to rewrite equation \( \eqref{eq:dy-dwi} \), noting that \( \frac{dS}{dw_i} = x_i \):

$$ \frac{df(S)}{dw_i} = \frac{df(S)}{dS}\frac{dS}{dw_i} = x_i\frac{df(S)}{dS} $$

The derivative of the error becomes:

$$ \begin{equation} \frac{dE}{dw_i} = -x_i(y'-y)\frac{df(S)}{dS} \label{eq:derror} \end{equation} $$
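One way to gain confidence in this expression is to compare it with a finite-difference approximation of the derivative. The sketch below assumes a sigmoid activation, for which \( \frac{df(S)}{dS} = f(S)(1-f(S)) \); the function names and numeric values are hypothetical.

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def error(weights, inputs, expected):
    """E = 1/2 * (y' - y)^2 with y = sigmoid(S)."""
    s = sum(w * x for w, x in zip(weights, inputs))
    return 0.5 * (expected - sigmoid(s)) ** 2

def analytic_gradient(weights, inputs, expected):
    """dE/dw_i = -x_i * (y' - y) * df(S)/dS, for a sigmoid f."""
    s = sum(w * x for w, x in zip(weights, inputs))
    y = sigmoid(s)
    df_ds = y * (1.0 - y)  # derivative of the sigmoid at S
    return [-x * (expected - y) * df_ds for x in inputs]

def numeric_gradient(weights, inputs, expected, h=1e-6):
    """Central finite difference on each weight, used only as a check."""
    grads = []
    for i in range(len(weights)):
        up = list(weights)
        down = list(weights)
        up[i] += h
        down[i] -= h
        grads.append((error(up, inputs, expected) - error(down, inputs, expected)) / (2 * h))
    return grads

# Illustrative values only
weights = [0.3, -0.8, 0.5]
inputs = [1.0, 2.0, -1.5]
expected = 1.0

analytic = analytic_gradient(weights, inputs, expected)
numeric = numeric_gradient(weights, inputs, expected)
print(analytic)
print(numeric)
```

The two lists should agree up to the small error inherent to finite differences.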

By merging equations \( \eqref{eq:gradient-descent} \) and \( \eqref{eq:derror} \) the weights can be updated with the following formula:

$$ w_i'=w_i-\eta.\frac{dE}{dw_i} = w_i + \eta. x_i.(y'-y).\frac{df(S)}{dS} $$

In conclusion:

$$ w_i'= w_i + \eta.x_i.(y'-y).\frac{df(S)}{dS} $$
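The final rule can be sketched as a short training loop on a single training sample. The sigmoid activation, the function names, the sample values, the learning rate and the iteration count are assumptions chosen for illustration, not part of the derivation.

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def predict(weights, inputs):
    """y = f(S) with f assumed to be a sigmoid."""
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)))

def update_weights(weights, inputs, expected, eta):
    """Apply w_i' = w_i + eta * x_i * (y' - y) * df(S)/dS to every weight."""
    y = predict(weights, inputs)
    df_ds = y * (1.0 - y)  # derivative of the sigmoid at S
    return [w + eta * x * (expected - y) * df_ds for w, x in zip(weights, inputs)]

# Illustrative training sample and hyperparameters
inputs = [1.0, 0.5]
expected = 0.8
weights = [0.0, 0.0]
eta = 0.5

initial_error = 0.5 * (expected - predict(weights, inputs)) ** 2
for _ in range(1000):
    weights = update_weights(weights, inputs, expected, eta)

final_output = predict(weights, inputs)
final_error = 0.5 * (expected - final_output) ** 2
print(initial_error, final_error, final_output)
```

With these values the output moves toward the expected value and the error shrinks; a real training set would repeat the same update over many samples.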


Last update : 03/09/2020