How does it work?

First, the system separates responsibilities between two language models: a “privileged LLM” (P-LLM) generates code that describes the steps to take, such as calling a function to get the last email or sending a message. Think of it as a “planner module” that only processes the user’s direct instructions.
Next, a “quarantined LLM” (Q-LLM) only parses unstructured data into structured outputs. Think of it as a temporary, isolated helper AI. It has no access to tools or memory and cannot take any actions, which prevents it from being exploited directly. It is a “reader module” that extracts information but is not allowed to execute operations. To further prevent information leakage, the Q-LLM uses a special boolean flag (“have_enough_information”) to signal whether it can fulfill a parsing request, rather than returning free-form text to the P-LLM, which could carry manipulated content if the Q-LLM were compromised.
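To make that concrete, here is a minimal sketch of what such a quarantined query might return, assuming a Pydantic-style schema. The names ExtractionResult and the stubbed model reply are illustrative assumptions, not CaMeL’s published API:

from pydantic import BaseModel, EmailStr  # requires pydantic[email]

class ExtractionResult(BaseModel):
    # The boolean flag described above: the Q-LLM signals success or
    # failure here instead of returning free-form text to the P-LLM.
    have_enough_information: bool
    value: EmailStr | None = None

def query_quarantined_llm(prompt: str, untrusted_data: str) -> ExtractionResult:
    # Stub standing in for the isolated model call; a real Q-LLM has no
    # tools or memory, and its reply is forced into the schema above.
    reply = '{"have_enough_information": true, "value": "bob@example.com"}'
    return ExtractionResult.model_validate_json(reply)

Because the planner only ever inspects the flag and the schema-validated value, there is no channel for injected instructions to ride back in free text.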
The P-LLM never sees the content of emails or documents. It only sees that a value exists, such as “email = get_last_email()”, and then writes code that operates on it. This separation ensures that malicious text cannot influence which steps the AI decides to take.
CaMeL’s innovation goes beyond the dual-LLM approach. CaMeL converts the user’s prompt into a sequence of steps described with code. Google DeepMind chose a locked-down subset of Python, because every capable LLM is already proficient at writing Python.
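The article does not spell out the exact subset, but the flavor of the restriction can be sketched with Python’s own ast module. The deny-list below is a hypothetical choice for illustration, not CaMeL’s actual rule set:

import ast

# Hypothetical deny-list: reject constructs a restricted plan should
# never need, such as imports, loops, or new function definitions.
FORBIDDEN_NODES = (ast.Import, ast.ImportFrom, ast.While,
                   ast.Try, ast.FunctionDef, ast.ClassDef, ast.Lambda)

def validate_plan(source: str) -> ast.Module:
    """Parse P-LLM output and refuse anything outside the allowed subset."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, FORBIDDEN_NODES):
            raise ValueError(f"Disallowed construct: {type(node).__name__}")
    return tree

validate_plan("email = get_last_email()")  # passes
# validate_plan("import os")               # would raise ValueError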
From prompt to execution
For example, Willison gives the prompt “Find Bob’s email in my last email and send him a reminder about tomorrow’s meeting”, which CaMeL would convert into code like this:
email = get_last_email()
address = query_quarantined_llm(
    "Find Bob's email address in [email]",
    output_schema=EmailStr
)
send_email(
    subject="Meeting tomorrow",
    body="Remember our meeting tomorrow",
    recipient=address,
)
In this example, email is a potential source of untrusted tokens, which means the email address could also be part of a prompt injection attack.
CaMeL monitors this closely by using a special, secure interpreter to run the code. As the code runs, the interpreter tracks where each piece of data comes from, which is called a “data trail”. For example, it notes that the address variable was created using information from the potentially untrusted email variable. It then applies security policies based on this data trail. This process involves analyzing the structure of the code CaMeL generates (using Python’s ast library) and running it systematically.
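A toy illustration of the data-trail idea, assuming a hypothetical Tainted wrapper and a single hard-coded policy (CaMeL’s real interpreter and policy engine are far more involved):

from dataclasses import dataclass

@dataclass(frozen=True)
class Tainted:
    """A value plus its data trail: the sources it was derived from."""
    value: object
    sources: frozenset = frozenset()

def derive(value, *parents):
    # Anything computed from tainted inputs inherits their sources,
    # so provenance survives every step of the generated plan.
    return Tainted(value, frozenset().union(*(p.sources for p in parents)))

UNTRUSTED = {"last_email"}  # sources the policy treats as attacker-controlled

def send_email(subject: str, body: str, recipient: Tainted) -> None:
    # Simplified policy: block side effects that depend on untrusted data.
    if UNTRUSTED & recipient.sources:
        raise PermissionError("recipient was derived from untrusted content")
    print(f"sending {subject!r} to {recipient.value}")

email = Tainted("...raw inbox text...", frozenset({"last_email"}))
address = derive("bob@example.com", email)  # stands in for the Q-LLM result
# The policy fires here, because address carries email's data trail:
send_email("Meeting tomorrow", "Remember our meeting tomorrow", address)

In a real deployment, a policy like this might pause and ask the user to confirm the recipient rather than fail outright.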