The Evolution of Claude: New Models Learn to Navigate Computers

The upgraded Claude 3.5 Sonnet shows substantial improvements across benchmarks, with the largest gains in coding. On SWE-bench Verified it scored 49.0%, higher than any publicly available model at the time of release, including specialized coding systems. Performance on tool use tasks also improved in both the retail and airline domains, at the same price and speed as its predecessor. Early feedback from companies such as GitLab and Cognition points to significant improvements in coding, planning, and problem-solving.

A new capability called computer use has been introduced in public beta and is available to developers through the API. It lets Claude operate a computer the way a person does: looking at the screen, moving a cursor, clicking, and typing to navigate interfaces and complete tasks. Several companies, including Asana, Canva, and DoorDash, have begun exploring it for tasks that take many steps to complete. The feature is still experimental and at times error-prone, but it has shown promising results on the OSWorld evaluation, scoring 14.9% in the screenshot-only category and 22.0% when given more steps to complete tasks.
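The observe-then-act cycle described above can be illustrated with a toy loop. This is only a sketch under stated assumptions, not Anthropic's API or implementation: `FakeDesktop`, `Action`, `scripted_policy`, and `run_task` are hypothetical names invented for illustration, and the "model" is a hard-coded stand-in. It shows why a step budget matters: the same task succeeds or fails depending on how many steps are allowed.

```python
from dataclasses import dataclass, field


@dataclass
class Action:
    kind: str      # "click", "type", or "done"
    x: int = 0
    y: int = 0
    text: str = ""


@dataclass
class FakeDesktop:
    """Hypothetical stand-in for a real screen: one text field plus an event log."""
    focused: bool = False
    field_text: str = ""
    log: list = field(default_factory=list)

    def screenshot(self) -> dict:
        # A real system would return pixels; here we return a small state summary.
        return {"focused": self.focused, "field_text": self.field_text}

    def apply(self, action: Action) -> None:
        self.log.append(action.kind)
        if action.kind == "click":
            self.focused = True
        elif action.kind == "type" and self.focused:
            self.field_text += action.text


def scripted_policy(observation: dict, goal: str) -> Action:
    """Hypothetical placeholder for the model's decision step."""
    if not observation["focused"]:
        return Action("click", x=100, y=200)
    if observation["field_text"] != goal:
        return Action("type", text=goal)
    return Action("done")


def run_task(desktop: FakeDesktop, goal: str, max_steps: int = 10) -> tuple:
    """Observe, decide, act; returns (finished, steps_used)."""
    for step in range(1, max_steps + 1):
        action = scripted_policy(desktop.screenshot(), goal)
        if action.kind == "done":
            return (True, step)
        desktop.apply(action)
    return (False, max_steps)


if __name__ == "__main__":
    print(run_task(FakeDesktop(), "hello"))               # enough steps
    print(run_task(FakeDesktop(), "hello", max_steps=2))  # budget too small
```

With the default budget the toy task finishes on the third step; with `max_steps=2` the loop runs out before it can confirm completion, which mirrors in miniature why a larger step allowance can raise a success rate.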

The new Claude 3.5 Haiku model represents a significant advancement in combining speed with capability. Despite maintaining a similar speed to its predecessor, it matches or exceeds the performance of Claude 3 Opus on many intelligence benchmarks. The model demonstrates particular strength in coding tasks, scoring 40.6% on SWE-bench Verified and outperforming many existing models, including the original Claude 3.5 Sonnet. With its low latency and improved instruction-following capabilities, it is particularly well-suited for user-facing products and specialized tasks. The model will be available across multiple platforms, including Anthropic's API, Amazon Bedrock, and Google Cloud's Vertex AI.

Anthropic has taken steps to support the responsible development and deployment of these new capabilities. The company conducted joint pre-deployment testing with the US AI Safety Institute and the UK AI Safety Institute. Safety measures have been implemented specifically for the computer use feature, including new classifiers to identify potential misuse for spam, misinformation, or fraud. The company maintains that the ASL-2 Standard, as outlined in its Responsible Scaling Policy, remains appropriate for these models. Anthropic has emphasized that while these technologies are still in their early stages, they represent important steps forward in AI capability and accessibility.

About Author

Lost