Reddit Will License Its Data to Train LLMs, So We Made a Firefox Extension That Lets You Replace Your Comments With Any (Non-Copyrighted) Text

Kory@lemmy.ml · 8 months ago

my_hat_stinks@programming.dev · 8 months ago

They’ll use old comments either way; using an up-to-date dataset means using one already tainted by LLM-generated content, and training a model on its own output is not great.

Incidentally, this also makes Lemmy data less valuable: most of Lemmy’s popularity came after the rise of LLMs, so there’s no significant untainted data from before them.

The Luddite