No code-no maths: Learn Gen AI (2)

Khang Vu Tien
10 min read · Feb 15, 2024


Ever wonder how chatbots work? You know, those virtual assistants always ready to answer your questions? Let’s dive in without any intimidating math or coding!

In part 1, we explored how a simple neural network learns to tell apart two simple patterns (blue vs. red dots): https://kvutien-yes.medium.com/no-code-no-maths-learn-gen-ai-86f93241a0ad

In part 2 we will finish our exploration of the playground. After that, we will dive deeper and learn how more complex networks are built for intricate tasks, with design tips and real-world applications in machine recognition and language processing.

By the end, you’ll have a basic understanding of chatbots without any scary technical stuff. Remember, knowledge is power, and knowing how something works makes it less intimidating and more fun!

Graphically Build & Run a Neural Network (2)

Discrimination task: Inclusion Pattern

Brain-like AI recognition systems excel at untangling messy puzzles, especially when hidden shapes nestle within each other. Choose the 3rd dataset of the 4 datasets shown here:

Select the 3rd dataset (enclosed set)

In the input preprocessing functions, choose the parabolic functions (“x1²” and “x2²”). Define the simplest neural network configuration: 2 hidden layers with 1 neuron in each layer. Reset, run, and observe that the discrimination is perfect.

Learning 2 enclosed datasets
  • Try more complex neuron configurations and observe that the performance is only marginally better.
  • We notice again the importance of an adequate pre-processing function for the input layer: the square function performs better than the linear function. This is why, in computer vision, convolution filters are applied to preprocess the input and extract the shapes and boundaries on which the training is done. A small sketch below shows why the squared features work so well here.
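
To see why the squared features help so much, here is a minimal sketch in Python (not the playground itself; the synthetic dataset and the use of scikit-learn’s logistic regression are my own assumptions, for illustration only):

    # Minimal sketch: why x1², x2² make the "circle inside circle" dataset easy.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    # Inner cluster: radius < 1; outer ring: radius between 2 and 3.
    angles = rng.uniform(0, 2 * np.pi, 200)
    radii = np.concatenate([rng.uniform(0, 1, 100), rng.uniform(2, 3, 100)])
    X = np.column_stack([radii * np.cos(angles), radii * np.sin(angles)])
    y = np.concatenate([np.zeros(100), np.ones(100)])

    # With raw x1, x2 the two classes are not linearly separable...
    linear = LogisticRegression().fit(X, y)
    # ...but with x1², x2² a single linear boundary (a circle in the
    # original space) separates them perfectly.
    squared = LogisticRegression().fit(X ** 2, y)
    print(linear.score(X, y), squared.score(X ** 2, y))  # ~0.5-0.6 vs 1.0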

Discriminate Inclusion Pattern using Only Linear Preprocessing

There is more than one combination of neural network pre-processing and configuration that can be used to discriminate the 2 datasets. For example, instead of the parabolic preprocessing we can use linear preprocessing and compensate with a more complex hidden layer.

  • For the input preprocessing, choose the linear functions (“x1” and “x2”). Define a neural network with 2 hidden layers, 3 neurons in the inner layer and 2 neurons in the output layer. Reset, run, and observe that the discrimination is quasi-perfect.
  • Reset and run again to check that the good discrimination is consistently achieved.

We get a hint here of how “hallucinations” appear in chatbots: it’s like teaching a parrot two ways to speak. Both sound good to the parrot, because it only knows its training phrases. But to someone who understands language, one way might sound nonsensical, like the parrot making up words. That’s similar to how chatbots can sometimes produce realistic-sounding but incorrect responses based on their limited training data.

Discrimination task: Swirling Pattern, elaborate preprocessing

Let’s now take a dataset of two swirling clouds of points, tightly intertwined. Choose the 4th of the 4 datasets shown here (as complex as it gets!):

Select the 4th data set (swirling pattern)

For the input preprocessing, select all functions, including sine and cosine. Define a neural network configuration with 2 hidden layers, 7 neurons in the internal layer and 1 neuron in the output layer. Reset, run, and observe that the discrimination is perfect, albeit after a long training.

Learning a swirling pattern with an elaborate preprocessing

Observe that all input preprocessing connections to the hidden layer are thick, and all neurons in this hidden layer also have thick connections to the output. This suggests that every feature and every neuron contributes to the result, although training takes a lot of iterations before converging.

Swirling Pattern, more complex network

Repeat with a more complex configuration having 7 neurons in the output layer. Observe that training convergence is faster.

Learning with a more complex output layer

Swirling Pattern, simple preprocessing with elaborate network

The following experiment will give us several interesting hints on how to design neural networks.

  • Select only the 2 linear preprocessing functions (“x1” and “x2”).
  • Define a fully complex neural network with 6 hidden layers, each with 7 neurons.
  • Reset and run for a long moment. The training will converge after an unstable period.
  • Observe that the training discriminates the 2 datasets very well despite the swirling pattern, just as successfully as when we used all available preprocessing functions.
Another learning configuration

Swirling Pattern, use of ReLU activation function

Now keep the same network, but use the ReLU (Rectified Linear Unit) activation function. Training converges in fewer iterations, and each iteration is faster: the ReLU activation function is cheaper to compute than the hyperbolic tangent (tanh) function, as the sketch below shows.

Learning with ReLU activation function
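
For reference, here is what the two activation functions compute, in a minimal NumPy sketch. ReLU’s speed advantage comes from being a simple comparison, while tanh needs exponentials:

    import numpy as np

    def tanh(x):
        # Hyperbolic tangent: smooth, outputs in (-1, 1), needs exponentials.
        return np.tanh(x)

    def relu(x):
        # Rectified Linear Unit: just "clip negatives to zero", very cheap.
        return np.maximum(0.0, x)

    x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
    print(tanh(x))  # [-0.964 -0.462  0.     0.462  0.964]
    print(relu(x))  # [0.   0.   0.   0.5  2. ]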

What we have learned from using the playground

Typical neural network
  1. Imagine information flowing through a network of interconnected cells, each performing simple calculations. This simplified picture captures the essence of a neural network, a powerful tool used in machine learning.
  2. Each cell, called a neuron, receives information from its neighbors, multiplies it by a factor (its weight), and combines the results. This combined value then gets transformed by an activation function, determining the neuron’s output that feeds the next layer (see the sketch after this list).
  3. The ReLU activation function is popular because it’s efficient and mimics the behavior of brain cells. Think of it like a filter, letting through only positive signals.
  4. By stacking many layers of interconnected neurons, we create a fully connected neural network. With enough layers and neurons, even simple networks can learn complex patterns. However, too many connections become cumbersome, requiring powerful computers to train.
  5. To improve efficiency, some networks like those used for image recognition employ convolutions, a special type of connection that focuses on local features.
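
Here is a minimal sketch of items 2 to 4, with arbitrarily chosen sizes and random (untrained) weights, for illustration only:

    import numpy as np

    def layer(inputs, weights, biases):
        # Each neuron: multiply the inputs by its weights, add a bias,
        # then pass the sum through the ReLU activation (item 3).
        return np.maximum(0.0, weights @ inputs + biases)

    rng = np.random.default_rng(1)
    x = rng.normal(size=4)           # 4 input values
    W1 = rng.normal(size=(5, 4))     # hidden layer: 5 neurons, 4 weights each
    W2 = rng.normal(size=(1, 5))     # output layer: 1 neuron, 5 weights
    h = layer(x, W1, np.zeros(5))    # the hidden layer feeds the next layer
    y = layer(h, W2, np.zeros(1))    # final output of this tiny network
    print(y)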

Here’s where things get interesting:

  1. A real-life network with a thousand neurons in each layer has 1 million weights between 2 consecutive layers. With a thousand layers (hence the term “deep learning”), that makes about a billion weights (the arithmetic is checked in the sketch after this list). This is why very powerful GPUs (floating-point calculators) are required to train it.
  2. Imagine connecting several such networks, creating a system like the Transformer, used in chatbots and image generation. The set of connections and weights in this system, called the model, can contain billions of parameters!
  3. Transformers are used in Large Language Models for chatbots and in Diffusion Models for image generation.
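
The arithmetic of point 1, spelled out:

    neurons_per_layer = 1_000
    layers = 1_000
    weights_between_two_layers = neurons_per_layer * neurons_per_layer
    total_weights = (layers - 1) * weights_between_two_layers
    print(weights_between_two_layers)  # 1,000,000 (1 million)
    print(total_weights)               # 999,000,000 (about 1 billion)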

The above experiments required no maths and no code. To understand the current developments in AI further, we need some maths and some coding. This is why what follows is composed of YouTube videos and scientific articles. The intuition gained from the experiments above will help us understand the explanations.

Recurrent Neural Network

As a bonus, the same video introduces Recurrent Neural Networks at https://youtu.be/J1QD9hLDEDY?t=5222. It is a very good transition to understanding Transformers, which are the basis of the current Large Language Models that power chatbots and the Diffusion Models that power image generators.

  • Until now, we have seen feed-forward networks (single shot): one input => one single output
  • A recurrent (loop-back) network: an unbounded stream of inputs => a corresponding stream of outputs
Recurrent Neural Network
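
The loop-back idea fits in a few lines: the network carries a hidden state that it updates with every new input, so the same weights can process an unbounded stream. A minimal sketch with made-up sizes and random (untrained) weights:

    import numpy as np

    rng = np.random.default_rng(2)
    W_in = rng.normal(size=(8, 3))    # input -> hidden state
    W_rec = rng.normal(size=(8, 8))   # hidden state -> hidden state (the loop)
    W_out = rng.normal(size=(2, 8))   # hidden state -> output

    h = np.zeros(8)                        # hidden state carried between steps
    for x_t in rng.normal(size=(5, 3)):    # a stream of inputs, one per step
        h = np.tanh(W_in @ x_t + W_rec @ h)  # loop back: h depends on past h
        y_t = W_out @ h                    # one output per input in the stream
        print(y_t)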

Understand Transformers

https://www.youtube.com/watch?v=QAZc9xsQNjQ&t=2434s

This CS50 video lesson (25 minutes) from Harvard University gives an excellent explanation of transformers and Generative AI. It covers the following topics (a small sketch of the attention computation follows the list):

  • Encoder-decoder architecture,
  • Attention on a segment of tokens,
  • Attention + Positional encoding = Transformers,
  • Transformers in Generative AI.
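
As a preview of what the video derives, the core of attention is only a few matrix operations. Here is a minimal NumPy sketch of scaled dot-product attention (the token count and dimensions are arbitrary; real transformers add learned projections, multiple heads and positional encoding):

    import numpy as np

    def attention(Q, K, V):
        # Each token's query is compared with every token's key...
        scores = Q @ K.T / np.sqrt(K.shape[-1])
        # ...softmax turns the scores into weights that sum to 1...
        weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)
        # ...and each token's output is a weighted mix of all the values.
        return weights @ V

    rng = np.random.default_rng(3)
    tokens, dim = 4, 8                  # a segment of 4 tokens
    Q, K, V = (rng.normal(size=(tokens, dim)) for _ in range(3))
    print(attention(Q, K, V).shape)     # (4, 8): one output per token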

This CS50 course stops short of distinguishing between Large Language Models, used for text generation, and Diffusion models, used for image generation.

Concerning Diffusion Models, I haven’t found any easy-to-understand courses. The following resources give some useful explanations:

  • an illustrated explanation, starting with easy concepts and becoming more and more abstract; watching it first makes the following ones easier to understand: https://youtu.be/sFztPP9qPRc
  • a papers-and-maths explanation of the foundational scientific article; it has more maths than the previous video but complements it well because it is more concrete: https://youtu.be/HoKDTa5jHvg
  • an explanation of the same article, at the same maths level, but presented differently: https://youtu.be/W-O7AZNzbzQ

Convolutional Neural Network

Convolutional Neural Network Overall Principle

The playground exercise above demonstrated how pre-processing the input signal helps a neural network recognize a spiral pattern. It also showed that while learning can be achieved with a brute-force neural network (linear inputs only), doing so requires many more neurons than learning with adequate input pre-processing.

Convolutional neural networks build on this observation to learn images in computer vision, where a high number of pixels is already involved. They pre-process the input before feeding a neural network: convolution transforms first identify shapes, borders and patterns, and these results feed the network, so fewer neurons are involved and learning is faster.

If you want to know more, here is an excellent video lesson from Harvard University on convolutional neural networks (CNN): https://youtu.be/J1QD9hLDEDY?t=3490. The main actions are:

  • Apply a convolution filter to detect shapes, boundaries and patterns,
  • Pool the pixels to reduce their number (for example, replace each square of 3x3 pixels by one single pixel containing the max of all 9; both operations are sketched in code below),
  • Repeat convolution and pooling a number of times, each convolution with a different filter,
  • Feed the resulting pixels to a neural network.
  • The lesson ends with a Python programming example, using a pre-trained library for digit recognition.
Convolutional Neural Network
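
The two main actions, convolution and pooling, fit in a few lines of NumPy. A minimal sketch (the vertical-edge filter is one classic example; a trained CNN learns its filter coefficients):

    import numpy as np

    def convolve(image, kernel):
        # Slide the 3x3 kernel over the (zero-padded) image; each output
        # pixel is the weighted sum of the 3x3 patch under the kernel.
        padded = np.pad(image, 1)
        out = np.zeros(image.shape)
        for i in range(image.shape[0]):
            for j in range(image.shape[1]):
                out[i, j] = (padded[i:i + 3, j:j + 3] * kernel).sum()
        return out

    def max_pool_3x3(image):
        # Replace each 3x3 square of pixels by the max of its 9 values
        # (assumes the image sides are divisible by 3).
        h, w = image.shape
        return image.reshape(h // 3, 3, w // 3, 3).max(axis=(1, 3))

    image = np.random.default_rng(4).random((27, 27))  # stand-in for a digit
    vertical_edges = np.array([[-1, 0, 1],
                               [-1, 0, 1],
                               [-1, 0, 1]])            # one convolution filter
    features = convolve(image, vertical_edges)         # 27x27 feature map
    print(max_pool_3x3(features).shape)                # (9, 9), as in the text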

Image processing use cases

In CNNs used for image processing, several types of 3x3 filters are used and their coefficients are part of the neural network training. For example:

Example 1: Image Classification CNN

The example in the Harvard video above used a CNN to classify handwritten digits (0–9). The convolutional layer processes a 27x27 grayscale image of a digit. Here’s a potential set of filters:

  • Edge detectors (horizontal, vertical, diagonal): Capture basic edges forming the digit shape.
  • Corner detectors: Identify key junctions where lines meet.
  • Line thickness detectors: Differentiate between thin and thick strokes.
  • Curvature detectors: Capture round or curved segments of the digit.
  • Texture detectors: Extract information about pixel intensity variations within the digit.

The specific number and variations of these filters can vary depending on the network architecture and desired performance. Once filtered, the 27x27 matrix is reduced (pooled) to a 9x9 matrix that feeds the neural network that learns to recognize the digit.

Example 2: Medical Image Segmentation CNN

Consider a CNN that segments tumors in brain MRI scans. The convolutional layer processes a 3D image volume. Here’s a possible set of filters:

  • Intensity difference filters: Differentiate between high-intensity tumor regions and lower-intensity healthy tissue.
  • Texture filters: Capture the characteristic texture patterns of tumors versus normal tissue.
  • Boundary filters: Identify sharp edges delineating the tumor region.
  • Smoothness filters: Enhance continuity within the tumor region while suppressing noise.
  • Spatial orientation filters: Capture the 3D spatial distribution of the tumor.

Again, the exact combination and types of filters depend on the specific task and data characteristics.

Seeing Road Signs with AI Eyes: A Simplified Look

Imagine a robot car trying to understand traffic signs. That’s where Convolutional Neural Networks (CNNs) come in!

Think of a CNN as a series of “boxes” stacked one after another. Each box holds smaller boxes called filters, like tiny detective glasses. Their job? To scan the image, looking for specific patterns.

In our example, we have 3 boxes:

  • Detail Detective: This box has filters that spot small things like numbers. It might find the digits “8” and “0”.
  • Digit Decoder: This box uses larger filters, combining smaller patterns like “8” and “0” into something bigger, like the number “80”.
  • Shape Sleuth: This box has even bigger filters, looking for the overall shape of the sign. In this case, it finds the telltale red circle of a speed limit sign.

But how do these boxes get smarter? Here comes the training:

  1. We show the CNN lots of labeled pictures of road signs.
  2. It tries to guess what sign it sees.
  3. If it’s wrong, we tell it the error.
  4. Like a student learning from mistakes, the CNN adjusts its filters to get better at recognizing patterns.

The more it practices, the better it gets at seeing different signs, even in different lighting or angles. This helps robot cars understand the road, making them safer and smarter!

Remember:

  • Each box does a specific job: finding small details, combining them, and seeing the overall shape.
  • Training helps the CNN learn by adjusting its filters.
  • This helps AI “see” and understand road signs!

This simplified explanation keeps the core concepts of CNN-based road sign recognition easy to grasp!

Quantization and Low-Rank Adaptation

Raw AI models have hundreds of billions of parameters. As a last step, we’ll see the techniques used to reduce the memory required to run these models.

https://youtu.be/t509sv5MT0w
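
The core idea fits in a few lines: quantization stores each weight as an 8-bit integer plus one shared scale factor, and low-rank adaptation (LoRA) fine-tunes two thin matrices instead of a full one. A minimal sketch of symmetric int8 quantization and the LoRA parameter count (real libraries are far more sophisticated):

    import numpy as np

    weights = np.random.default_rng(5).normal(size=1000).astype(np.float32)

    # Quantize: map the float range onto the signed levels of an int8.
    scale = np.abs(weights).max() / 127.0
    q = np.round(weights / scale).astype(np.int8)  # 1 byte per weight
    # Dequantize: recover an approximation when the weight is needed.
    approx = q.astype(np.float32) * scale
    print(weights.nbytes, q.nbytes)        # 4000 vs 1000 bytes: 4x smaller
    print(np.abs(weights - approx).max())  # small rounding error

    # Low-rank adaptation: instead of updating a full d x d matrix,
    # train two thin matrices whose product has far fewer parameters.
    d, r = 1000, 8
    print(d * d)          # 1,000,000 parameters to fine-tune the full matrix
    print(d * r + r * d)  # 16,000 parameters with rank-8 adaptation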

Use Case: Hands-on OpenAI text embeddings coding

https://www.youtube.com/watch?v=ySus5ZS0b94

This tutorial concludes with an 18-minute video showing how to store and compare personal profiles made of sequences of text. This is how, in the humanitarian project Machu Picchu, persons in need can share their profiles and obtain focused assistance, either from humanitarian organizations or mutually between persons sharing a similar profile.

You can immediately think of other applications in your own domain. For example, a communication satellite operator could identify patterns in the needs of its customers, in the best-selling destinations, in peak capacity demand periods, etc.
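
A minimal sketch of the comparison step (the profile texts are invented; the call assumes the current OpenAI Python library with an OPENAI_API_KEY in the environment, and the video may use an older API):

    import numpy as np
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    profiles = [
        "Farmer, village A, lost crops to drought, needs seeds",
        "Farmer, village B, needs drought-resistant seeds",
        "Teacher, city C, looking for school books",
    ]
    resp = client.embeddings.create(model="text-embedding-3-small",
                                    input=profiles)
    vectors = np.array([d.embedding for d in resp.data])

    def cosine(a, b):
        # Near 1.0 = very similar texts, near 0 = unrelated texts.
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

    print(cosine(vectors[0], vectors[1]))  # high: similar profiles
    print(cosine(vectors[0], vectors[2]))  # lower: different needs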

To go further and do hands-on coding, explore open-source resources like the HuggingFace services: https://huggingface.co/

Originally published at https://www.linkedin.com.
