It’s more important than ever to understand what ChatGPT and other AI tools like it are actually doing when they talk to us and write for us.
After working with several Large Language Models (LLMs) and GPT-style systems and digging into how they actually work, I wrote this article. In it, I try to explain in the simplest terms possible what modern AIs really are and how they construct their output, so we can move past the fear and confusion about what AI is capable of and start using it for what it’s actually good at.
Please arm yourself with knowledge and understanding, and share this with someone who worries about AI taking over their job (or even the whole world)!
This article is full of errors!
Definitely not! An LLM is the combination of an architecture and its model parameters. It’s just a huge collection of numbers: no list of sentences, no database. (It seems the author confused the LLM itself with the dataset it was trained on.)
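To make the point concrete, here is a minimal sketch of what "architecture plus parameters" means. This is a made-up toy, not any real model's code: the model is nothing but named arrays of numbers, and the code that combines them. There is no text stored anywhere inside.

```python
import random

def make_toy_model(vocab_size=100, d_model=8):
    """A drastically scaled-down, hypothetical 'LLM': just weight matrices."""
    rnd = random.Random(0)

    def weights(rows, cols):
        # Random floats stand in for learned parameters.
        return [[rnd.gauss(0, 0.02) for _ in range(cols)] for _ in range(rows)]

    return {
        "embedding": weights(vocab_size, d_model),  # token id -> vector
        "attention": weights(d_model, d_model),     # vector mixing
        "output": weights(d_model, vocab_size),     # vector -> token scores
    }

model = make_toy_model()
# Everything the model "knows" lives in these numbers, nowhere else.
n_params = sum(len(m) * len(m[0]) for m in model.values())
print(n_params)  # → 1664
```

A real LLM follows the same pattern, only with billions of parameters instead of a couple of thousand.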
Nope. This applies to the dataset, not the model. One could argue that memorization sometimes happens, so the model exhibits some database-like behavior, but it isn’t a database.
LLMs are trained in an unsupervised (more precisely, self-supervised) fashion: just sequences of tokens, no human-provided labels.
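This is easy to illustrate. In language-model training, the "label" for each position is simply the next token of the same sequence, so no annotation is needed. A minimal sketch (the token strings are made up for illustration):

```python
# The target at each position is just the next token in the sequence.
# No human labeling is involved: the data supervises itself.
tokens = ["the", "cat", "sat", "on", "the", "mat"]

inputs = tokens[:-1]   # what the model sees
targets = tokens[1:]   # what it must predict at each step

for x, y in zip(inputs, targets):
    print(f"given ...{x!r}, predict {y!r}")
```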
I’m not aware of any LLM that does this. What’s the “context” of GPT-4?
The closest real thing is the RLHF process used to fine-tune an existing LLM for a specific application (like ChatGPT). The dataset the LLM is pre-trained on is not annotated or categorized in any way.
This is confusing. “GPT” (Generative Pre-trained Transformer) refers to the architecture of the LLM, not some separate component.
This isn’t accurate. Depending on the temperature setting, an LLM can output literally any token at any time with non-zero probability. It can absolutely produce text it has never seen.
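The "non-zero probability" point follows directly from how sampling works: the model's scores (logits) are passed through a temperature-scaled softmax, and for any finite positive temperature every token keeps a strictly positive probability. A minimal sketch, with made-up scores for a tiny four-token vocabulary:

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    # Divide logits by the temperature before normalizing.
    # For any temperature > 0, exp() never returns zero, so every
    # token ends up with a strictly positive probability.
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [5.0, 2.0, 0.5, -3.0]  # hypothetical scores, made up
probs = softmax_with_temperature(logits, temperature=0.8)
print(all(p > 0 for p in probs))  # → True: no token is ever impossible
```

Lower temperatures concentrate probability on the highest-scoring tokens; higher temperatures flatten the distribution, but nothing ever reaches exactly zero.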
Also, I think it’s too simplistic to flatly assert that LLMs are not intelligent. It mostly depends on your definition of intelligence, and there are lots of philosophical discussions to be had (see also the “AI effect”).