I know there are other ways of accomplishing that, but this might be a convenient way of doing it. I’m wondering though if Reddit is still reverting these changes?
I know there are other ways of accomplishing that, but this might be a convenient way of doing it. I’m wondering though if Reddit is still reverting these changes?
They’ll use old comments either way, using an up-to-date dataset means using a dataset already tainted by LLM-generated content. Training a model on its own output is not great.
Incidentally this also makes Lemmy data less valuable, most of Lemmy’s popularity came after the rise of LLMs so there’s no significant untainted data from before LLMs.