Last month, I discussed Mercor’s newly established benchmark for evaluating how well AI agents handle professional tasks such as legal and corporate analysis. At the time, the results were rather disappointing: every major lab scored below 25%. We therefore concluded that, at least for the immediate future, lawyers were unlikely to be displaced by AI.
However, AI capabilities can evolve significantly within a short timeframe.
The recent launch of Anthropic’s Opus 4.6 has reshaped the competitive landscape. The new model scored nearly 30% in one-shot trials and averaged 45% when given multiple attempts per problem. The release also introduced several new features, including “agent swarms,” which may have contributed to its stronger performance on complex, multi-step tasks.
Whatever the cause, this score is a substantial jump over the previous state of the art, suggesting that foundation models are still advancing at a rapid pace. Brendan Foody, CEO of Mercor, expressed his astonishment: “transitioning from 18.4% to 29.8% in just a few months is remarkable.”
A score of around 30% still leaves ample room for improvement, so lawyers need not fear imminent replacement by machines. They should, however, reconsider their earlier confidence that AI was nowhere near matching them!

