nanoGPT: Learn How ChatGPT Works

Background

One of the hottest topics today is Artificial Intelligence (AI), and more specifically the Large Language Models (LLMs) you can use to generate text on almost any topic.

I started using an LLM called ChatGPT and found it could generate reasonable, human-sounding text on several topics. I did this by telling it what role I wanted it to play (such as technical author), giving it a series of factual statements I wanted it to base its answer on, and then asking the specific question I wanted answered.

After a little experimentation with how I entered the information, a practice referred to as prompt engineering, I found I could have the model generate text on a range of different subjects quite easily.
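As a concrete illustration, here is a small Python sketch that assembles a prompt in that role/facts/question shape. The wording and structure are my own invention, not a prescribed format:

```python
# A hypothetical prompt following the role / facts / question pattern
# described above. The exact wording is an illustration, not a recipe.
role = "You are a technical author writing for a general audience."
facts = [
    "nanoGPT is an open-source, character-level GPT written by Andrej Karpathy.",
    "It can be trained on a small text corpus, such as Shakespeare's plays.",
]
question = "In two sentences, explain what nanoGPT is."

# Assemble the role, the facts, and the question into a single prompt.
prompt = "\n".join([role, "Facts:"] + [f"- {f}" for f in facts] + [question])
print(prompt)
```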

This experimentation led me to wonder how the program underlying ChatGPT was written.

Investigation

My research started with a visit to Wikipedia, followed by several days of general searching on the topic. Much of what I found was either very simplistic or highly technical, requiring a deeper knowledge base than I had. I needed something on the order of a primer that would let me build and test something without a graduate-level course.

My real journey started when I found Andrej Karpathy's nanoGPT open-source code files at https://github.com/karpathy/nanoGPT.

It contains the Python code and dataset that let you feel the magic of actually building, training, and running a character-level Generative Pretrained Transformer (GPT), the program structure many of the latest LLMs are based on.
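"Character-level" means every distinct character in the corpus becomes one token. A minimal sketch of the idea in Python (illustrative only; nanoGPT's data/shakespeare_char/prepare.py does the real work):

```python
# Build a character-level vocabulary from a tiny corpus, then
# encode text to integer token IDs and decode back to text.
text = "To be, or not to be"
chars = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(chars)}  # character -> integer
itos = {i: ch for i, ch in enumerate(chars)}  # integer -> character

encode = lambda s: [stoi[c] for c in s]
decode = lambda ids: "".join(itos[i] for i in ids)

ids = encode("to be")
print(ids)          # one small integer per character
print(decode(ids))  # "to be"
```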

Opening the link brings you to the directory structure, followed by the README.md file containing the program description and instructions on how to build, train, fine-tune, and run the code. It explains that nanoGPT is a simple, character-based Generative Pretrained Transformer (GPT), and shows you how to train the language model (LM) on a dataset containing all the dialog from Shakespeare's plays.
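At the time of writing, the README's quick start for the Shakespeare demo boils down to three commands (check the repository for the current versions):

```
python data/shakespeare_char/prepare.py            # build the character-level dataset
python train.py config/train_shakespeare_char.py   # train the small GPT
python sample.py --out_dir=out-shakespeare-char    # generate Shakespeare-like text
```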

This is open-source code for a small demonstration program; the site has the code and dataset needed to build an AI language model (LM). Training and running it was quite enlightening. I don't think it will change the world, but it will open your eyes to how a GPT works, and to why Graphics Processing Units (GPUs) are essential: it took three days of processing on a PC without a GPU to train the simplest model.
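Before committing to a multi-day run, it is worth checking which device PyTorch actually sees. A generic check like this (standard PyTorch, not nanoGPT-specific code) tells you whether you will be training on a GPU:

```python
import torch

# PyTorch falls back to the CPU when no CUDA-capable GPU is visible,
# which is the difference between minutes and days of training time.
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Training will run on: {device}")
```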

Looking deeper into the program

Once I had experimented with nanoGPT, I wanted to understand how the code worked. Reviewing the source files on their own did not help; I needed a better grounding in the underlying techniques.

At the end of the README there is a link to a series of Karpathy's lectures titled Neural Networks: Zero to Hero, designed to give users a basic understanding of the tools involved. I consider these an outstanding series of videos that take you from the very start through building a basic Generative Pretrained Transformer (GPT) in Python.

Neural Networks: Zero to Hero. https://karpathy.ai/zero-to-hero.html

References:

The research paper underlying all GPT models: "Attention Is All You Need", https://arxiv.org/pdf/1706.03762
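If you want to connect that paper to code, the single equation at its heart, scaled dot-product attention, fits in a few lines of PyTorch. This is a generic sketch of the paper's formula, not nanoGPT's exact implementation:

```python
import math
import torch
import torch.nn.functional as F

def attention(q, k, v):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V, from
    # "Attention Is All You Need". (A real GPT also applies a causal
    # mask so each position cannot attend to future positions.)
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)
    return F.softmax(scores, dim=-1) @ v

# Toy shapes: 1 batch, 4 positions, 8-dimensional queries/keys/values.
q = k = v = torch.randn(1, 4, 8)
print(attention(q, k, v).shape)  # torch.Size([1, 4, 8])
```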

