The Simple Mathematics that Make AI Possible

At the heart of machine learning and neural networks lies a simple calculus-based optimization technique, "gradient descent". In this post I will explore it in its simplest form: a single-node network.

Jan 03, 2024

Abstract

In this post, I explore the simplest forms of machine learning/artificial intelligence and the conceptual math that drives them. While exploring the challenges of AI, I touch on the businesses that will benefit. Finally, I suggest a potentially overlooked method that could improve the way AI is developed in the future.

The Simplest AI

Anyone who has taken high-school calculus has most likely learned the math behind the most primitive form of AI: the derivative. On a 2D curve, you can use the derivative to find local minima and maxima fairly easily, because they occur where the slope is zero.

9.7 Second-Order Differentiation, Turning Points, Maximum and Minimum Points - user's Blog!
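
As a small worked example (the curve here is an illustrative choice, not taken from the figure), consider a simple parabola. Setting the first derivative to zero locates the turning point, and the sign of the second derivative says whether it is a minimum or a maximum:

```latex
\begin{align*}
f(x)   &= x^2 - 4x + 5 \\
f'(x)  &= 2x - 4 = 0 \quad\Rightarrow\quad x = 2 \\
f''(x) &= 2 > 0 \quad\Rightarrow\quad x = 2 \text{ is a local minimum, with } f(2) = 1
\end{align*}
```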

Finding minima and maxima is exactly what machine learning and artificial intelligence do. However, an AI doesn’t know what the curve looks like in advance. Instead, it has to go to a random point and find the derivative there. The sign of the derivative tells it which way the curve slopes, and the magnitude tells it how steeply, so the AI can follow the curve by taking a small step downhill (or uphill, when searching for a maximum). Then the process repeats until the AI reaches a minimum or maximum. This process is called gradient descent.

Introduction to Artificial Neural Networks part two: Gradient Descent, Backpropagation, Supervised & Unsupervised learning | Adatis — https://adatis.co.uk/introduction-to-artificial-neural-networks-part-two-gradient-descent-backpropagation-supervised-unsupervised-learning/
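
The loop described above can be sketched in a few lines of Python. This is only a minimal illustration: the curve `f`, the step size, and the number of steps are all assumptions chosen so the example is easy to follow, and the slope is estimated numerically rather than derived by hand.

```python
import random

def f(x):
    # Hypothetical example curve: a parabola whose lowest point is at x = 3
    return (x - 3) ** 2 + 1

def derivative(func, x, h=1e-6):
    # Central-difference estimate of the slope of func at x
    return (func(x + h) - func(x - h)) / (2 * h)

def gradient_descent_1d(func, x0, step_size=0.1, steps=200):
    x = x0
    for _ in range(steps):
        # Step against the slope: downhill if we are looking for a minimum
        x -= step_size * derivative(func, x)
    return x

random.seed(0)                    # fixed seed so the run is repeatable
start = random.uniform(-10, 10)   # the "random point" the search starts from
x_min = gradient_descent_1d(f, start)
print(f"started at {start:.2f}, ended near x = {x_min:.4f}")
```

The search never looks at the whole curve. It only ever asks for the slope at its current position, which is the point of the technique.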

Scaling up the Dimensions

Gradient descent is the backbone of nearly all machine learning and artificial intelligence algorithms. The added complexity is that instead of a 2D curve, we are now working in three or more dimensions. These form a multi-dimensional surface, which makes the math more complex. However, it follows the same principle: go to a random point and find the “gradient” there, which is simply the list of slopes along each dimension. By knowing the size and direction of the gradient, the AI can follow the steepest part of the surface by taking a small step. Then the process repeats until the AI reaches a minimum or maximum. Conceptually, it is exactly the same as on the 2D curve. The only difference is that the math requires more linear algebra.

A Visual Explanation of Gradient Descent Methods (Momentum, AdaGrad, RMSProp, Adam) | by Lili Jiang | Towards Data Science — https://towardsdatascience.com/a-visual-explanation-of-gradient-descent-methods-momentum-adagrad-rmsprop-adam-f898b102325c
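
To show how little changes when dimensions are added, here is a hedged sketch of the same loop on a surface with two inputs. The surface, step size, and starting point are hypothetical choices for illustration. The gradient is just one slope per dimension, estimated numerically.

```python
def surface(p):
    # Hypothetical bowl-shaped surface with its lowest point at (1, -2)
    x, y = p
    return (x - 1) ** 2 + 2 * (y + 2) ** 2

def gradient(func, p, h=1e-6):
    # One central-difference slope per dimension
    grad = []
    for i in range(len(p)):
        up = list(p)
        down = list(p)
        up[i] += h
        down[i] -= h
        grad.append((func(up) - func(down)) / (2 * h))
    return grad

def gradient_descent(func, start, step_size=0.1, steps=500):
    p = list(start)
    for _ in range(steps):
        g = gradient(func, p)
        # Move every coordinate a small step against its own slope
        p = [pi - step_size * gi for pi, gi in zip(p, g)]
    return p

point = gradient_descent(surface, [4.0, 3.0])
print([round(v, 4) for v in point])
```

Because `gradient` loops over however many coordinates it is given, the same code would run unchanged on a surface with more inputs.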

Challenges and Business Prospects

Now, the hardest part of developing AI isn’t the math anymore. It is how to frame a problem in such a way that gradient descent can solve it. This is an incredibly complex task that thousands of people are working on as we speak.

Additionally, many of you probably already see a problem with this method. Sometimes the AI doesn’t get to the optimal solution (in this case, the lowest valley) and settles in a shallower valley instead. But that problem is solvable too. If we try hundreds, thousands, millions, or more random starting points, it becomes increasingly likely that one of them finds the optimal solution. However, simulating and calculating a high-dimensional gradient many times over is very computationally expensive.
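
Both the problem and the brute-force fix can be illustrated with a short sketch. The curve below is a hypothetical one, chosen because it has a shallow valley near x = 1.13 and a deeper one near x = -1.30. The search range and number of restarts are assumptions as well.

```python
import random

def curve(x):
    # Hypothetical curve with two valleys: a shallow one near x = 1.13
    # and a deeper one near x = -1.30
    return x ** 4 - 3 * x ** 2 + x

def slope(x):
    # Exact derivative of curve(x)
    return 4 * x ** 3 - 6 * x + 1

def descend(x0, step_size=0.01, steps=2000):
    # Plain gradient descent from a single starting point
    x = x0
    for _ in range(steps):
        x -= step_size * slope(x)
    return x

def random_restarts(n_starts, rng, low=-2.0, high=2.0):
    # Run gradient descent from many random points and keep the lowest result
    best_x = None
    for _ in range(n_starts):
        x = descend(rng.uniform(low, high))
        if best_x is None or curve(x) < curve(best_x):
            best_x = x
    return best_x

stuck = descend(1.5)                          # starts on the shallow valley's slope
best = random_restarts(20, random.Random(0))  # 20 random starting points
print(f"single start: x = {stuck:.2f}, height = {curve(stuck):.2f}")
print(f"20 restarts:  x = {best:.2f}, height = {curve(best):.2f}")
```

Each restart repeats the full descent, so the cost grows linearly with the number of starting points. On a one-input curve that is trivial, but on a high-dimensional surface it is where the compute bill comes from.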

The explosion in computation requirements is why we have seen Nvidia’s data center revenue rise so much in 2023. Every AI company that is seriously training its models is using Nvidia, Microsoft, Amazon, or Google data centers, all of which use Nvidia GPUs. In a way, Nvidia is selling the picks and shovels of this new age.

Simulated Annealing: An Overlooked Technology?

I believe that there are more computationally efficient methods than gradient descent with many random restarts. By using probabilistic techniques, the number of starting points can be reduced with minimal drawbacks. In the early 1980s, a probabilistic optimization technique called “simulated annealing” was developed. It resembles hill climbing, but it occasionally accepts a worse solution, with a probability that shrinks as a “temperature” parameter is lowered, which lets it escape local minima/maxima. This can make it more efficient at finding near-global minima/maxima, and it can cut down on the vast number of simulations used in current methods while still finding solutions that are near-optimal.
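
For readers who have not seen it, here is a minimal sketch of simulated annealing on a hypothetical curve with a shallow valley near x = 1.13 and a deeper one near x = -1.30. The starting temperature, cooling rate, move size, and step count are illustrative assumptions, not tuned values, and this is not the implementation used in the ring-sorting project.

```python
import math
import random

def objective(x):
    # Hypothetical curve: shallow valley near x = 1.13, deeper one near x = -1.30
    return x ** 4 - 3 * x ** 2 + x

def simulated_annealing(func, x0, rng, temp=2.0, cooling=0.995,
                        steps=5000, move=0.5):
    x, fx = x0, func(x0)
    best_x, best_fx = x, fx
    for _ in range(steps):
        candidate = x + rng.uniform(-move, move)   # propose a random nearby point
        fc = func(candidate)
        delta = fc - fx
        # Always accept improvements; accept worse points with a probability
        # that shrinks as the temperature falls
        if delta < 0 or rng.random() < math.exp(-delta / temp):
            x, fx = candidate, fc
            if fx < best_fx:
                best_x, best_fx = x, fx
        temp *= cooling                            # cool down a little each step
    return best_x, best_fx

# Start inside the shallow valley, where plain gradient descent would stay
x_found, f_found = simulated_annealing(objective, 1.5, random.Random(0))
print(f"x = {x_found:.2f}, height = {f_found:.2f}")
```

Note that no derivative is computed at all: only the height of the curve is needed. Early on, while the temperature is high, the search can walk over the ridge between the two valleys. Later it behaves more and more like ordinary hill climbing.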

To demonstrate the power of simulated annealing as a method of machine learning, I used it to sort rings of various colors with two possible input controls and one output.

You can find a showcase of this in my video here:

(Aside: This is a method of disk defragmentation that could possibly reduce the number of write cycles.)

Simon’s Research and Projects
