
Meta’s vanilla Maverick AI model ranks below rivals on a popular chat benchmark


Earlier this week, Meta landed in hot water for using an experimental, unreleased version of its Llama 4 Maverick model to achieve a high score on LM Arena, a crowdsourced benchmark. The incident prompted LM Arena's maintainers to apologize, change their policies, and score the unmodified, vanilla Maverick.

It turns out, it's not very competitive.

The unmodified Maverick, "Llama-4-Maverick-17B-128E-Instruct," was ranked below models including OpenAI's GPT-4o, Anthropic's Claude 3.5 Sonnet, and Google's Gemini 1.5 Pro as of Friday. Many of these models are months old.

Why the poor performance? Meta's experimental Maverick, Llama-4-Maverick-03-26-Experimental, was "optimized for conversationality," the company explained in a chart published last week. Those optimizations evidently played well on LM Arena, where human raters compare the outputs of models and choose which they prefer.

As we've written before, for various reasons, LM Arena has never been the most reliable measure of an AI model's performance. Still, tailoring a model to a benchmark, besides being misleading, makes it challenging for developers to predict exactly how well the model will perform in different contexts.

In a statement, a Meta spokesperson told TechCrunch that Meta experiments with "all types of custom variants."

"'Llama-4-Maverick-03-26-Experimental' is a chat optimized version we experimented with that also performs well on LM Arena," the spokesperson said. "We have now released our open source version and will see how developers customize Llama 4 for their own use cases. We're excited to see what they will build and look forward to their ongoing feedback."

