ChatGPT is full of sensitive private information and spits out verbatim text from CNN, Goodreads, WordPress blogs, fandom wikis, Terms of Service agreements, Stack Overflow source code, Wikipedia pages, news blogs, random internet comments, and much more.
You could use diffusion to generate text. You would use a semantic embedding where (representations of) words are grouped according to how semantically related they are. Rather than dog/God, you would more likely switch dog for canine. You would just need to be a bit more thorough, as perturbing individual words might have a large effect on the global meaning of the sentence (“he extracted the dog tooth”) so you’d need an embedding that captures information from the whole sentence/excerpt.