Machine-learning 5 minutes

# What is gradient descent (GD)?

Gradient descent is an optimization algorithm that tries to find the minimum of a function by following its gradient.

The gradient $\nabla f$ of a differentiable function $f$ is a vector that points in the direction of greatest rate of increase of $f$.

Mathematically, it is defined by the partial derivatives:

Since the gradient is directed along the steepest ascent, it’s opposite $-\nabla f$ is directed along the steepest descent. The gradient descent algorithm aims at finding a function’s minimum by taking successive steps directed along $- \nabla f$:

where $s_n$ is the size of the step at iteration $n$.

def gradient_descent(f, x_0, max_steps, step_size):
x = x_0
for step in range(max_steps):
x = x - step_size * gradient(of=f, at=x)
return x


## Pitfalls

As illustrated on the picture above, the gradient descent algorithm might get stuck in a local minimum. Since it always follows the steepest descent, there is no way to get away from a local minimum once it has been reached.

## Illustration

The contour plot below displays the contours line of a function $f$ (think of it as seeing the surface from above). Click anywhere on it to start a gradient descent.