Although it's Gemini doing the actual playing with its own model and reasoning process, it's telling that Joel had to graft these special agents onto the base model to help it get through the game's most difficult challenges. As Joel writes, "My interventions improve Gemini's overall decision-making and reasoning capabilities."
What are we doing here?
Don't misunderstand me: massaging an LLM into a form that can beat a Pokémon game is definitely an achievement. But the level of "intervention" needed to help Gemini do things that "LLMs cannot do on their own" matters a great deal when we evaluate that success.
So yes, Gemini has beaten Pokémon (with a little help).
We already know that purpose-built reinforcement learning tools can beat Pokémon quite effectively (and that a random number generator can beat the game quite ineffectively). The special resonance of the "LLM plays Pokémon" test is seeing whether a general-purpose language model can reason its way to its own solution of a complex game. The more outside information, tools, or "harnesses" we allow the model, the less useful the game becomes as that kind of test.
Anthropic said in February that Claude's Pokémon play "showcases glimmers of AI systems that tackle challenges not just through training but with general reasoning, with increasing ability." But as Bradshaw wrote on LessWrong, "Without the use of a better agent harness, [all of the models] would have a hard time even leaving Red's bedroom, the first screen of the game!" Bradshaw's subsequent tests of harness-free LLM gameplay highlight how these models often hallucinate the state of the game, taking nonsensical or even impossible actions.
In other words, we are still far from the imagined future where an artificial general intelligence could figure out how to beat Pokémon simply because you asked it to.