I tested this by asking ChatGPT 3.5 specific questions about The Bedwetter, and it seems it was not trained on the full text of the book. I asked it for the first sentence and then for the second paragraph, and it gave plausible but incorrect answers. I asked it for the table of contents, and then whether a specific chapter was in the book, and it said “my responses are generated based on pre-existing data and do not have real-time access to specific book content”. I asked who wrote the foreword and who wrote the afterword. It said Patton Oswalt wrote the foreword and that there is no afterword. In reality, Sarah wrote the foreword and God wrote the afterword.
LLMs compress data, so there’s no way ChatGPT could remember every detail of the book alongside all the other information it stores in its encodings. The issue isn’t whether the entire text of the book is contained within the encodings; it’s whether it was trained on the book in the first place.
ChatGPT conversation
Table of contents and first chapter from Google Books.