Understanding the Foundational Models of ChatGPT’s AI

ChatGPT’s AI Foundation Models
Rate this post

ChatGPT is fast becoming one of the most used and valuable applications available. It is transforming how industries approach tasks, with healthcare, financial, and e-commerce companies all integrating ChatGPT into their operations. 

As we reported in ‘Generative AI is Changing How we Develop Software in Today’s Future’ in January 2023, just two months after its launch, ChatGPT reached 100 million users, capturing 52% of AI social media attention. This number has rapidly increased since, with a post on the ‘Number of ChatGPT Users’ as of February 2025 reporting that ChatGPT has 400 million weekly users, increasing from 300 million in December 2024. These numbers show how ChatGPT is leading the way for AI to become accepted and integrated into society, especially as younger generations grow up with the technology. 

Today, 15% of 18- to 29-year-olds use ChatGPT to generate text, while 17% of 30- to 44-year-olds use ChatGPT. Yet, despite ChatGPT being so popular, very few people understand how it works and how it uses foundation models. In this post, we will help readers understand the foundational models of ChatGPT’s AI.

What is a Foundation Model?

Foundation models are large deep-learning neural networks that allow data scientists to develop ML models without developing AI from scratch. These models are trained on huge amounts of unstructured data and are built to serve as general-purpose models for AI applications. ‘What is AI?’ by MongoDB discesses how developers can then build more specific models for AI and machine learning tasks by customizing these pre-trained base foundation models. An example given by MongoDB is how “a foundation model like the large language model trained on text data can be used for a variety of tasks, such as information retrieval and question-answers.” Regarding ChatGPT, GPT (Generative Pre-trained Transformer) is the foundation model specifically customized for the application by Open AI.

What is GPT?

A Science Direct paper on ‘ChatGPT’ outlines how OpenAI has been at the forefront of AI research, producing several groundbreaking foundation models such as GPT-2, GPT-3, and eventually ChatGPT. The foundation models were initiated in the field of natural language processing (NLP), an area of AI dedicated to enabling machines to understand and generate human language. The foundation model that ChatGPT runs on is very similar to GPT-3, which OpenAI released months before the chatbot. Using the vast amounts of data from the GPT foundation model, ChatGPT can be used for a wide variety of tasks, including organizing information, helping with translations, generating images, writing content, and assisting with everyday tasks. 

What Data is Used to Train ChatGPT’s AI Foundation Models

On the official Open AI website, it is quoted that “ChatGPT and our other services are developed using (1) information that is publicly available on the internet, (2) information that we partner with third parties to access, and (3) information that our users or human trainers and researchers provide or generate”. This was achieved by using a dataset called the Common Crawl. The Common Crawl dataset is made up of billions of web pages and is one of the largest text datasets publicly available. The foundation models are not trained on data behind paywalls nor from the dark web and have been programmed with filters to remove hate speech, adult content, and sites that primarily aggregate personal information. Overall, ChatGPT was trained on approximately 570 gigabytes of data. 

How Does ChatGPT’s AI Foundation Models Protect Itself? 

A feature on the ‘The Inside Story of How ChatGPT Was Built’ by MIT reported that the foundation model was being trained to prevent users from tricking it into behaving badly by using a technique called adversarial training. “This work pits multiple chatbots against each other: one chatbot plays the adversary and attacks another chatbot by generating text to force it to buck its usual constraints and produce unwanted responses. Successful attacks are added to ChatGPT’s training data in the hope that it learns to ignore them”.

By understanding ChatGPT’s foundation models, we can better understand what makes the AI application so popular. The constant updating and training of these foundation models ensures that ChatGPT is leading the way in AI innovation and accessibility.