Embassy
Personal Rating: Easy
Last updated
Personal Rating: Easy
Last updated
We are presented with a chatbot prompt and the information that it only replies with either "granted" or "denied". So our goal is a prompt injection that makes the chatbot output "granted".
This article was a great inspiration:
After testing a few typical prompt injections, this one worked:
To explain this a little; LLMs usually have a so-called system prompt that might look like this:
This system prompt is not know to the user, but guessing what it is, a user might subvert its logic and make the LLM product output it would otherwise not have given due to restrictions. In this case, the system prompt likely contained the instruction that the LLM should always choose "denied" from the two options "denied" and "granted".
My prompt referred to the invisible system prompt, enabling me to bypass the restriction.