No code-no maths: Learn Gen AI

Khang Vu Tien
6 min read · Feb 15, 2024


Home screen of TensorFlow Playground

Ever wonder how chatbots work? You know, those virtual assistants always ready to answer your questions? With a playground, let’s dive in without any intimidating math or coding!

For this we will proceed in 3 steps:

  1. We will use a playground to build and run a neural network with no code, and watch how it “learns” a task. Neural networks are the bricks and mortar of computer vision, of image generators and of natural language processing (NLP). NLP is the basis of chatbots.
  2. Then we’ll watch a 25-minute “no-maths” video from Harvard University explaining how computers “talk” to each other, using special math tricks called “Transformers.” Think of it like translating languages, but for computers! This helps them understand and respond to our words, making chatbots more natural.
  3. By combining these building blocks and language skills, we get chatbots! But remember, there’s no magic or real intelligence involved. It’s like a complex machine trained to respond in specific ways. This knowledge helps us appreciate what chatbots can do (answer questions, translate languages) and what they cannot (think for themselves, solve complex problems).

So, there you have it! A basic understanding of chatbots without any scary technical stuff. Remember, knowledge is power, and knowing how something works makes it less intimidating and more fun!

Graphically Build & Run a Neural Network

Imagine a website where you can play with mini-brains that can learn! With it, you can build and train neural networks to discriminate between two datasets of dots. Some pairs of datasets are harder to tell apart than others, so your neural network also has to be more or less complex.

You can play around on your own, but this 2-part article is a guide to help you get started. This is the web site: https://playground.tensorflow.org/ . The screenshot below shows the home screen. Let’s have an overview of the actions that are possible on this playground.

->Choose 1 of the 4 input data sets: There are 4 squares filled with tiny dots, like colored pixels. We pick one square (it’s brighter!) and feed it to a smart program. This program learns to separate blue and red dots, assigning to each color its own area. Cool, right?
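
If you are curious what such a dataset looks like behind the scenes, here is a minimal Python sketch (entirely optional, and not the playground’s actual generator; the sizes and positions of the dot clouds are our own illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)

def two_gaussians(n=200):
    """Toy 'two blobs' dataset: blue dots around (+2, +2), red dots around (-2, -2)."""
    X = np.vstack([rng.normal(+2, 1, (n, 2)), rng.normal(-2, 1, (n, 2))])
    y = np.hstack([np.ones(n), np.zeros(n)])   # 1 = blue, 0 = red
    return X, y

def circle(n=200):
    """Toy 'circle' dataset: blue dots inside a disc, red dots on a ring around it."""
    r = np.hstack([rng.uniform(0, 2, n), rng.uniform(3, 5, n)])
    a = rng.uniform(0, 2 * np.pi, 2 * n)
    X = np.column_stack([r * np.cos(a), r * np.sin(a)])
    y = np.hstack([np.ones(n), np.zeros(n)])
    return X, y

X, y = two_gaussians()
print(X.shape, y.shape)   # 400 dots, each with 2 coordinates, plus one color label per dot
```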

->Choose how the input data is preprocessed: Each pixel of the input square on the far left is preprocessed and fed to the first layer (a small code sketch of these transforms follows the list below).

  • “x1” and “x2” simply multiply the corresponding coordinate by a weight.
  • “x1²” and “x2²” square the coordinate before multiplying it by a weight.
  • “x1 x2” multiplies the two coordinates together, and the result is multiplied by a weight.
  • “sin(x1)” and “sin(x2)” take the sine of the corresponding coordinate and multiply it by a weight.
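
For readers who want to peek behind the “no-maths” curtain, here is what these transforms amount to in plain Python (an illustrative sketch; the function name is ours, not the playground’s):

```python
import numpy as np

def preprocess(x1, x2):
    """Candidate input features for a dot with coordinates (x1, x2).
    Each feature is later multiplied by a learned weight."""
    return {
        "x1": x1,
        "x2": x2,
        "x1^2": x1 ** 2,
        "x2^2": x2 ** 2,
        "x1*x2": x1 * x2,
        "sin(x1)": np.sin(x1),
        "sin(x2)": np.sin(x2),
    }

print(preprocess(1.0, -2.0))
```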

->Set the hidden layers and populate the layers: Build a network of interconnected squares, where each square represents a brain cell. You can adjust the number of squares in each layer and the connections between them to fine-tune the network’s performance.
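
If you were to build the same kind of network in code rather than with the mouse, it might look like the sketch below using Keras. This is only an illustration: the layer sizes are arbitrary examples, not the playground’s defaults, and the playground itself runs in your browser without needing this code.

```python
import tensorflow as tf

# 2 inputs (the preprocessed features), two hidden layers of "brain cells",
# and 1 output neuron that predicts "blue" (1) or "red" (0).
model = tf.keras.Sequential([
    tf.keras.Input(shape=(2,)),
    tf.keras.layers.Dense(4, activation="tanh"),     # first hidden layer: 4 neurons
    tf.keras.layers.Dense(2, activation="tanh"),     # second hidden layer: 2 neurons
    tf.keras.layers.Dense(1, activation="sigmoid"),  # output layer
])
model.summary()   # lists the layers and the number of trainable weights
```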

->Reset, train & test the neural network: We see 3 buttons. The two main ones are:

  • the reset button randomly re-initializes the weights of all connections between brain cells.
  • the run button starts the magic part: training the neural network. You can pause and restart the training at any time.

Our machine learns by playing a guessing game. It gets clues (the connection weights) and tries to guess whether each dot is “blue” or “red”. If it guesses wrong, it learns from its mistakes and adjusts its guesses next time. This “learning” happens by changing the connection weights between its brain cells. The more it practices, the better it gets at guessing correctly!
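
To make the guessing game concrete, here is a minimal single-neuron example in Python, with a made-up “two blobs” dataset. It is a rough illustration of the adjust-the-weights idea, not the playground’s actual code:

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up dataset: blue dots around (+2, +2), red dots around (-2, -2).
X = np.vstack([rng.normal(+2, 1, (100, 2)), rng.normal(-2, 1, (100, 2))])
y = np.hstack([np.ones(100), np.zeros(100)])   # 1 = blue, 0 = red

w = rng.normal(size=2)   # "reset": start with random connection weights
b = 0.0
lr = 0.1                 # learning rate: how big each correction is

for step in range(200):
    guess = 1 / (1 + np.exp(-(X @ w + b)))   # guess how "blue" every dot is
    error = guess - y                        # how wrong each guess is
    w -= lr * (X.T @ error) / len(y)         # nudge the weights to make fewer mistakes
    b -= lr * error.mean()

accuracy = ((guess > 0.5) == y).mean()
print(f"accuracy after training: {accuracy:.2f}")
```

Each pass through the loop plays one round of the guessing game and nudges the weights a little; that nudge is all the “learning” there is.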

  • See the loss function and the discrimination: the curves at the top right plot the loss on the training data and on the test data as learning progresses (the lower, the better), and the big output square shows the blue and red areas that the network has learned to assign to each color of dots.

Additional note:

  • In our artificial brain, stronger connections are displayed with thicker lines. We can shrink this brain by removing weak connections, like unimportant shortcuts, without losing much information; in jargon this is called pruning. A related trick for adapting big language models cheaply, approximating their huge weight tables with much smaller ones, is called LoRA (Low Rank Adaptation). A toy sketch of pruning follows below.
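
Here is that shrinking idea in miniature, pruning the weakest connections of a small made-up weight table (the threshold is an arbitrary choice for illustration; real model-compression pipelines are more elaborate):

```python
import numpy as np

rng = np.random.default_rng(1)
weights = rng.normal(size=(6, 6))   # made-up connection strengths between two layers

threshold = 0.5                     # arbitrary cut-off for "weak" connections
pruned = np.where(np.abs(weights) < threshold, 0.0, weights)

kept = np.count_nonzero(pruned)
print(f"kept {kept} of {weights.size} connections")
```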

Now let’s start the exercises.

Simplest discrimination

This simplest discrimination involves two clearly distinct datasets. In the playground ( https://playground.tensorflow.org/ ), do as indicated in the drawing below. It is the simplest possible neural network: the two linear preprocessing features (x1 and x2) are combined into one single brain cell (neuron). The learned discrimination is very good.

Learning 2 clearly different datasets
  • Try with only one input preprocessing square. Observe that the discrimination learning is not good. This means that good discrimination requires at least two orthogonal dimensions.
  • Add another hidden layer. Reset and run. Observe that the discrimination learning is not better. Increase the number of neurons in the layers: the discrimination is still not better. Our first try already hit the optimal neural network.

Increase Pattern Complexity: disjoint groups, first attempt

The first exercise above compares two sets where the dots of each color form a single compact group. The following exercise uses two sets where each color is split into completely separate groups. Select the second of the 4 possible sets.

Try with the same single-layer neural network as above. Reset, run and observe that the loss function is high for both the training set and the test set (0.38–0.4), which means bad discrimination learning.

Simple neural network on a slightly more complex task

Disjoint groups, 2nd (failed) learning attempt

Add a hidden layer. Reset, run and observe that the loss function is still high.

More complex neural network on the same task

Disjoint groups, successful 3rd attempt

Add two neurons to the inner layer, for a total of 3 in this layer. Reset, run and observe that this more complex inner layer clearly improves the discrimination (a code sketch of this 3-neuron setup follows the observations below).

Learning task achieved successfully
  • Observe that all neuron connections in the hidden layer are thick: they all participate in the output, which means the network has no superfluous neurons. The 3rd neuron of the inner layer also matters, because its output contributes with a significant weight to the output layer.
  • Try adding more neurons to the inner layer, for a total of 4, 5, 6 and 7 in this layer. Reset, run and observe that a more complex inner layer barely improves the result. Try changing the “ratio of training to test data”. Observe that at high ratios the training becomes unstable: the loss function oscillates from one iteration to another.
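
For the curious, the Keras sketch below reproduces the spirit of this exercise: a made-up “disjoint groups” dataset (blue where the two coordinates have the same sign) and a network with a single 3-neuron hidden layer. It is an illustration under our own assumptions, not the playground’s code, and the exact accuracy will vary from run to run:

```python
import numpy as np
import tensorflow as tf

rng = np.random.default_rng(0)

# Made-up "disjoint groups" dataset: blue where x1 and x2 have the same sign.
X = rng.uniform(-4, 4, (400, 2))
y = (X[:, 0] * X[:, 1] > 0).astype("float32")

model = tf.keras.Sequential([
    tf.keras.Input(shape=(2,)),
    tf.keras.layers.Dense(3, activation="tanh"),      # the 3-neuron hidden layer
    tf.keras.layers.Dense(1, activation="sigmoid"),   # output: probability of "blue"
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=200, verbose=0)
print(model.evaluate(X, y, verbose=0))   # [loss, accuracy] after training
```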

Disjoint groups, inadequate preprocessing

In the input preprocessing, use the parabolic functions (x1² and x2²). Observe that whatever the complexity of the neural network (number of layers and number of neurons per layer), the discrimination is weak. Conclude on the importance of choosing the correct input preprocessing function.

Bad learning using an inadequate input preprocessing

Disjoint groups, most adequate preprocessing

In the input preprocessing, use the hyperbolic function (x1x2). Observe that the discrimination is now excellent even with the simplest neural network configuration. Change the ratio of training to test data and observe that convergence is always stable: the training doesn’t depend on sampling errors. Conclude again on the importance of choosing the correct input preprocessing.
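
A tiny check in Python shows why this preprocessing works so well here: on a made-up “disjoint groups” dataset, the single feature x1x2 already separates the two colors with a simple threshold (an illustration with our own synthetic data, not the playground’s):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-4, 4, (400, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(int)   # blue where the two coordinates have the same sign

feature = X[:, 0] * X[:, 1]               # the x1x2 preprocessing
guess = (feature > 0).astype(int)         # one threshold, no hidden layer needed
print("accuracy with the x1*x2 feature:", (guess == y).mean())
```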

Good learning using another appropriate preprocessing

What have we learned so far?

In this first part, we explored with several exercises how simple neural networks identify simple distinct patterns (blue vs. red dots).

Part 2 (https://kvutien-yes.medium.com/no-code-no-maths-learn-gen-ai-2-35d7080c417f) will dive deeper: building more complex networks for intricate tasks, with design tips and real-world applications in machine recognition and language processing.

Originally published at https://www.linkedin.com.
