Machine-learning 5 minutes

What is gradient descent (GD)?

Gradient descent is an optimization algorithm that tries to find the minimum of a function by following its gradient.

The gradient

The gradient of a differentiable function is a vector that points in the direction of greatest rate of increase of .

Gradient

Mathematically, it is defined by the partial derivatives:

Gradient descent

Since the gradient is directed along the steepest ascent, it’s opposite is directed along the steepest descent. The gradient descent algorithm aims at finding a function’s minimum by taking successive steps directed along :

where is the size of the step at iteration .

Gradient descent

def gradient_descent(f, x_0, max_steps, step_size):
    x = x_0
    for step in range(max_steps):
        x = x - step_size * gradient(of=f, at=x)
    return x

Pitfalls

As illustrated on the picture above, the gradient descent algorithm might get stuck in a local minimum. Since it always follows the steepest descent, there is no way to get away from a local minimum once it has been reached.

Illustration

The contour plot below displays the contours line of a function (think of it as seeing the surface from above). Click anywhere on it to start a gradient descent.