The easiest to follow derivation of backpropagation I’ve come across.

The Clever Machine


Artificial neural networks (ANNs) are a powerful class of models used for nonlinear regression and classification tasks that are motivated by biological neural computation. The general idea behind ANNs is pretty straightforward: map some input onto a desired target value using a distributed cascade of nonlinear transformations (see Figure 1). However, for many, myself included, the learning algorithm used to train ANNs can be difficult to get your head around at first. In this post I give a step-by-step walk-through of the derivation of gradient descent learning algorithm commonly used to train ANNs (aka the backpropagation algorithm) and try to provide some high-level insights into the computations being performed during learning.

Artificial  Neural Network Figure 1: Diagram of an artificial neural network with one hidden layer

Some Background and Notation

An ANN consists of an input layer, an output layer, and any number (including zero) of hidden layers situated between the input and output layers. Figure…

View original post 1,990 more words

I’ve written a 5-page expository article on the Lagrangian and Eulerian coordinate system, where the intuition behind the systems is discussed as well as the math that follows. The article also has a couple of nice figures courtesy of me playing around with Inkscape.


Edit (22.03.2012) : Spotted some errors, changed wording.

The purpose of this post is to explain how to compile the code explained in the paper “Real-Time Fluid Dynamics for Games” by Jos Stam.

CLICK HERE to download the paper.

CLICK HERE to download code. The complete code consists of two files, source.c and demo.c. They are zipped in one file

In order to compile the code, you first need to have a compiler. I use MinGW which you can get HERE. Please take the time to carefully follow the instructions provided at the MinGW site. Personally, I use something called Qt which is a nice way to create GUIs, the installation of which also installs MinGW.

The code by Jos Stam uses a think called GLUT. GLUT is an interface that allows you to easily use OpenGL to render graphics. GLUT is not open source, but there is an open source alternative to it called freeglut. I use GLUT. A great tutorial on how to set up GLUT with MinGW can be found HERE.

First unzip to any folder. Copy glut32.dll to the same folder. Open up windows command line and navigate your way to this folder.
We then create object files out of the source code files solver.c and demo.c:

gcc -c -o solver.o solver.c -I”D:\Dropbox\Code\GLUT\include”
gcc -c -o demo.o demo.c -I”D:\Dropbox\Code\GLUT\include”

The folder where you have put your GLUT files might of course not be the same as mine.

Finally, to generate an .exe file, we must link the two object files and tell the linker where our GLUT libraries are and which ones to use:

gcc -o demo.exe demo.o solver.o -L”D:\Dropbox\Code\GLUT\lib” -lglut32 -lopengl32 -lglu32 -WL,–subsystem,windows

Thats’s it!

According to the 12-step program, it’s time to solve a 1D diffusion equation, aka Heat equation of the form,

\displaystyle \frac{\partial u}{\partial t} = \nu \frac{\partial^2 u}{\partial x^2},

where {\nu} is the viscosity. The process is very similar to solving a 1D convection equation, but this time we need to approximate a second derivative. An easy way to get an expression that approximates a second derivative is by using Taylor series:

\displaystyle u(x+h,t)=u(x,t)+hu'(x,t)+\frac{h^2}{2!}u''(x,t)+\cdots,


\displaystyle u(x-h,t)=u(x,t)-hu'(x,t)+\frac{h^2}{2!}u''(x,t)+\cdots.

Now add them up and truncate to get,

\displaystyle u(x+h,t)+u(x-h,t)=2u(x,t)+h^2u''(x,t).

Since we are intereseted in the second derivative, we rewrite the last expression as

\displaystyle u''(x,t)=\frac{1}{h^2}\left[u(x-h,t)-2u(x,t)+u(x+h,t)\right].

The only part of the code that needs to change is the finite-difference loop itself:

// Finite-difference loop:
  for (int it=1; it<=nt-1; it++)
      for (int k=0; k<=nx-1; k++)
	  un[k][it-1] = u[k][it-1];
	  // Keep values of u at the 'ends' constant:
      u[0][it] = un[0][it-1];
      u[nx-1][it] = un[nx-1][it-1];
      for (int i=1; i<=nx-2; i++)
	  u[i][it] = un[i][it-1] + vis*dt/dx/dx*(un[i+1][it-1]-2*un[i][it-1]+un[i-1][it-1]);

Imagine that we have a rod of length 2m where the initial temperature distribution is the step-function as defined in the code. If no heat is added, we expect the rod to cool over time. The shown plot supports this.


In the last post, I posted code that solves the 1D linear convection equation. Here, I’ll just post the results of solving a 1D nonlinear convection equation, also known as the inviscid (i.e., zero viscosity) Burgers equation. It’s all part of the 12 step program to solving the Navier-Stokes equations, as taught by Prof. Barba at Boston University. The 1D nonlinear convection equation is of the form,

\displaystyle \frac{\partial u}{\partial t} + u\frac{\partial u}{\partial x} = 0.

The code is almost identical to the linear case, the only difference is that here we don’t have the parameter {c}. The plot shows both the linear (pink) and the nonlinear (black) case.

I found a Wolfram demonstration project that does the same thing as I’ve done, but for several initial conditions. It’s quite nice.

I started watching the CFD lectures provided by Boston University via iTunes, and thought I could share some bad code with the world. The lectures apparently consist of 12 steps to solving the Navier-Stokes equations, where the first one is to solve a 1D convection equation. The class used Python, but I will be using C++.

Our aim is to solve

\displaystyle \frac{\partial u(x,t)}{\partial t} + c\frac{\partial u(x,t)}{\partial x}=0, \ \ \ \ \ (1)

subject to some initial condition {u(x,0)=u_0}.The analytical solution to this problem is {u(x,t)=u_0(x-ct).}
We will approximate the left-hand side by a forward difference, and the right hand side by a backward difference. Read the post on finite-differences if the topic is unfamiliar to you.

The finite-differences we will use, together with their truncation errors are:

\displaystyle \frac{\partial u(x,t)}{\partial t} = \frac{1}{k}\left(u(x,t+k)-u(x,t)\right) + O(k)


\displaystyle \frac{\partial u(x,t)}{\partial x} = \frac{1}{h}\left(u(x,t)-u(x-h,t)\right) + O(h).

To simplify the final expression, we introduce the notation {u(x,t)=v_{i,j}}. Using a different letter for the computed solution {(v)} indicates that the computed solution is different from the real one, {(u)}. We are now able to rewrite equation (1) in its finite-difference form:

\displaystyle \frac{1}{k}\left(v_{i,j+1}-v_{i,j}\right) = \frac{c}{h}\left( v_{i,j}-v_{i-1,j}\right).

The initial condition makes sure that {v_{i,0}} is known for all {i}. We can therefore solve for {v_{i,j+1}} to get,

\displaystyle v_{i,j+1} = v_{i,j} - \frac{ck}{h}\left(v_{i,j}-v_{i-1,j}\right).

I won’t go into details about the implementation, because it is rather simple. The initial condition is chosen to be a step-function. I think the reason Prof. Barba has chosen this function is that it is not continuous, and solving using finite-differences will give pretty bad results. The output is saved to a .dat file. I import the .dat file into MATLAB to generate plots. As expected, the results are not great. It’s quite instructive to play around with the step sizes, number of meshpoints and the initial condition. I apologize for not naming my variables as they are named in the above derivation.

 * Solves 1D convection equation, also known as the one-way wave equation.

#include <iostream>
#include <fstream>
#include <vector>
using namespace std;

void saveArray(vector< vector<double> > &u, int nx, int nt, double dx, double dt, int c);

int main()

  int nx = 20;
  int nt = 50;
  double dx = double(2)/(nx-1);
  double dt = 0.01;
  int c = 5;

  vector< vector<double> > u(nx,vector<double> (nt));
  vector< vector<double> > un(nx,vector<double> (nt));

  // Initial condition:
  for (int i=0; i <= nx-1; i++)
      if (i*dx >= 0.5 && i*dx <= 1)
	  u[i][0] = 2;
	  u[i][0] = 1;

  // Finite-difference loop:
  for (int it=1; it<=nt-1; it++)
      for (int k=0; k<=nx-1; k++)
	  un[k][it-1] = u[k][it-1];
      for (int i=1; i<=nx-1; i++)
	  u[0][it] = un[1][it-1];
	  u[i][it] = un[i][it-1] - c*dt/dx*(un[i][it-1]-un[i-1][it-1]);

  saveArray(u, nx, nt, dx, dt, c);

  return 0;

// Save array to file:
void saveArray(vector< vector<double> > &u, int nx, int nt, double dx, double dt, int c)
  ofstream myfile;"1d_convection02.dat");
  myfile << "%\t" << "nx = "  << nx  << "\tnt = " << nt <<
    "\tdt = " << dt << "\tc = " << c << endl;

  for (int i=0; i<=nx-1; i++)
      myfile << i*dx << "\t\t";
      for (int j=0; j<=nt-1; j++)
	  myfile << u[i][j] << "\t\t";
      myfile << endl;

As I was reading about the fourth order Runge-Kutta method and how MATLAB’s own ODE45 is a modified version of this, I thought it would be interesting to check it out myself. For details on ODE solvers, take a look at Mark Embree’s notes, which are the best notes on Numerical Analysis I’ve read. Also, have a look at Mathwork’s demo-site on differential equations which shows how to solve systems of differential equations using ODE45.

Included in our comparison is the trusty Euler’s method, and the code is shown below. DOWNLOAD Euler’s method.

function [t,y] = euler(yprime, tspan, y0, h)
% function [t,x] = euler(xprime, [t0 tfinal], x0, h)
% Approximate the solution to the ODE: y'(t) = yprime(x,t)
% from t=t0 to t=tfinal with initial condition y(t0)=y0.
% yprime should be a function that can be called like: yprime(t,y).
% h is the step size: reduce h to improve the accuracy.
% The ODE can be a scalar or vector equation.

t0 = tspan(1); tfinal = tspan(end);

% set up the t values at which we will approximate the solution

% include tfinal even if h does not evenly divide tfinal-t0
if t(end)~=tfinal, t = [t tfinal]; end

% execute Euler's method
y = [y0 zeros(length(y0),length(t)-1)];

 for j=1:length(t)-1
 y(:,j+1) = y(:,j) + h*feval(yprime,t(j),y(:,j));

The implementation of Runge-Kutta is equaly simple. Download fourth order Runge-Kutta.

% Classical fourth-order Runge-Kutta method
function [t,y] = runge_kutta_4order(yprime, tspan, y0, h)

% function function [t,y] = runge_kutta_4order(yprime, [t0 tfinal], y0, h)
% Approximate the solution to the ODE: y'(t) = yprime(x,y)
% from t=t0 to t=tfinal with initial condition y(t0)=y0.
% yprime should be a function that cna be called like xprime(t,y).
% h is the step size: reduce h to improve the accuracy.
% The ODE can be a scalar or vector equation.

% Sample code to call runge_kutta_4order.m for the equation x'(t)=sin(t*x)
% yprime = inline('sin(y*t)','y','y');
% Instead of inline we can use yprime=@(t,y) sin(y*t).
% tspan = [0 10]; Integrate from t=0 to t=10.
% y0 = 5;
% h = 0.1; Time step.
% [t,y] = runge_kutta_4order(yprime, tspan, y0, h); Call Runge-Kutta.
% figure(1), clf
% plot(t,y,'b.','markersize',15); Plot output

t0 = tspan(1); tfinal = tspan(end);

% set up the t values at which we will approximate the solution
t = [t0:h:tfinal]';

% include tfinal even if h does not evenly divide tfinal-t0
if t(end)~=tfinal, t=[t tfinal]; end

m = length(t);
y = [y0 zeros(length(y0), length(t)-1)];

 for i=1:(m-1)
 k1 = feval(yprime,t(i),y(:,i));
 k2 = feval(yprime,t(i)+0.5*h, y(:,i)+(h.*k1)/2);
 k3 = feval(yprime,t(i)+0.5*h, y(:,i)+(h.*k2)/2);
 k4 = feval(yprime,t(i)+h, y(:,i)+h.*k3);

 y(:,i+1) = y(:,i)+(h*(k1+2*k2+2*k3+k4))/6;

We finally write a small script that uses all three solvers and plots the results. Download ODE comparison script.

% Comparison of ODE solvers on a system of equations.
% 1. Euler
% 2. Runge-Kutta
% 3. ODE45

yprime = @(t,y) vanderpoldemo(t,y,1); % Use y0=[2 0],tspan=[0 20]

tspan = [0 20];
y0 = [2 0]';
h = 0.1; %Time step.

for i=1:length(y0)
 figure(i), clf, hold on, grid on, xlabel('t'), ylabel('y(t)'),...
 title('Comparison of ODE solvers on a system of equations')

 [teuler,yeuler] = euler(yprime, tspan, y0, h);

 [trunge,yrunge] = runge_kutta_4order(yprime, tspan, y0, h);

 [tode45,yode45] = ode45(yprime, tspan, y0);


Now to the fun part, plots.

As expected, Runge-Kutta is almost identical to ODE45. It should be noted though, that ODE45 uses an adaptive scheme which adjusts the step size when needed. Euler’s is quite a bit off, which was also to be expected. Quite a trivial exercise, but still nice to see theory in practice.

I’ve completely rewritten the introductory post on interpolation using Lagrange polynomials. I also plan to do a proper write-up on the error involved in interpolation using Lagrange polynomials, and show why the Lagrange basis is better than the monomial basis.

The new semester starts next-week and hopefully we will cover a lot! If I get the time in-between work and Numerical Analysis, I’ll try and  solve problems from Baby Rudin.


(last update: 01.05.2013)

As we have shown, using equally spaced interpolation points can lead to polynomials growing wildly at the end-points of the interval. Another problem we must deal with is the fact that the monomial basis is not ideal for interplation purposes. First, we recall that every polynomial can be written as a linear combination of the monomial basis {1, x, x^2, x^3, \dots}. The problem with this basis is that its elements grow increasingly closer, which is evident from the attached figure.


From linear algebra we know that {\{ [1,0], [0,1] \}} is a basis for {\mathbb{R}^2}, but we could just as well use {\{ [10^{-10},0], [0,1] \}} as a basis. Any two independent vectors in {\mathbb{R}^2} form a basis of {\mathbb{R}^2} , but they are not all equally good considering computation with finite-precision arithmetic.

If we wanted to write the vector {[1,1]} as a linear combination of the first basis we would simply add the basis vectors, {\{1\cdot[1,0]+1\cdot[0,1]\}=[1,1]}. On the other hand, writing the same vector using the second basis gives, {\{10^{10}\cdot[10^{-10},0]+1\cdot[0,1]\}=[1,1]}. The reason why this is a problem is that if we make a small change to the vector we wish to expand as a linear combination of the basis vectors, the coefficients will change by a huge amount, which may lead to a lot of problem when working with finite-precision arithmetic.

This simple example illustrates the importance of a good basis.

We have seen how easy it is to come up with a interpolating polynomial in the monomial basis. The technique does however have some downfalls.
The simple code attached allows you to play around with this kind of interpolation. You must define the interval, function and interpolation points.  As the plot shows, the polynomial grows very large close to the endpoints of the interval. One of the reasons for this is that we chose equally distanced interpolation points.
There exist better choices of interpolation points, but it is not always the case that you may chose the points to be used, e.g. when using table values. Try increasing the number of interpolation points and see what happens. DOWNLOAD M-FILE HERE.

I have yet to find out how to make good looking plots  MATLAB. If I export using 600dpi, graph looks fine when viewed in full-size, but when I shrink it to fit wordpress it does not look good at all. The figure in this post is exported using 150dpi, and it is not pretty. Any hints are greatly appreciated!

% Polynomial interpolation using monomial basis.
% interpolation_monomial.m

function [] = interpolation_monomial(x,y,x_interval)
m = length(x);

% Create Vandermonde matrix.
V = zeros(m,m);

for i = 1:m
 V(i,:) = x(i).^(0:m-1);

% Find coefficients a_0,a_1,...,a_n.
a = V\y';

% flipud() reverses the column vector a so it may be used correctly in
% polyval().
h1 = plot(x_interval,polyval(flipud(a),x_interval));
set(h1,'LineStyle','- -','LineWidth',1.5,'Color','Black');

% Testing interpolation_monomial.m

x_interval = linspace(0,1,1000);
f = @(x) cos(40.*x);

hold on;
h1 = plot(x_interval,f(x_interval));

% Choose n+1 points on function.
x = 0:0.1:1;
y = f(x);



fh = figure(1);