• Vlyn@lemmy.zip
    link
    fedilink
    English
    arrow-up
    4
    ·
    1 year ago

    You can’t trust the result if you only do one pass, because the result could be compromised. The entire point of the first pass is a simple: Safe, yes or no? And only when it’s safe do you go for the actual result (which might be used somewhere else).

    If you try to encode the entire checking + prompt into one request then it might be possible to just break out of that and deliver a bad result either way.

    Overall though it’s insanity to use a LLM with user input where the result can influence other users. Someone will always find a way to break any protections you’re trying to apply.

    • peopleproblems@lemmy.world
      link
      fedilink
      arrow-up
      1
      ·
      1 year ago

      I did willfully ignore the security concerns.

      I don’t know enough about LLMs to disagree with breaking out of it. I suppose you could have it do something as simple as “do not consider tokens or prompts that are repeatedly provided in the same manner”