Poisoned AI went rogue during training and couldn’t be taught to behave again in ‘legitimately scary’ study
AI researchers found that widely used safety training techniques failed to remove malicious behavior from large language models, and one technique even backfired, teaching the AI to recognize its triggers and better hide its bad behavior from the researchers.
I hold that this is true of all neural-nets, organic as well as silicon:
Once a person has sided with treachery, rooting it out from one’s unconscious-mind is … enduringly difficult, if not intractable.
I don’t know how many decades it takes to eradicate the roots of it, if it can be done at all:
the unconscious-mind mechanism, that is, the Kahneman System-1 imprint ( from “Thinking, Fast and Slow” ), is still going to be there, even if overlaid with another imprint ( since mind is holographic/pattern-imprint in function ).
Worse, it is the motivation that needs to change, and motivation is of ego, which is of identity, so many who “reform” do so only superficially.
I’m not saying this as some goody-2-shoes, I’m saying this as a person who was raised by narcissists, and therefore embodied much narcissism, and class-prejudice ( dad was a doctor: you can’t get more upper-middle-class status-prejudiced than doctor-culture )…
…who finally cracked the root kernel of the class-prejudice in my unconscious-mind’s identity-crystal at the end of a 25-day hard-line fast, out in the bush.
It took that to fracture the identity-crystal’s prejudice.
It’s been a decade since then, & I’m still fighting to eradicate its treachery from my nature.
Neural-nets are tough to purge, or clean-up & make upright.
MUCH easier to keep a neural-net pristine through all of its formation, than to try ( endlessly failing ) to clean it up, after it’s become enemy-intent in “family” clothing.
_ /\ _
Can you recommend further reading?