Alphabet- and Amazon-backed Anthropic introduced two new AI models on Tuesday, designed to automate computer tasks and reduce typing.
Anthropic’s updated Claude 3.5 Sonnet AI model can take charge of your mouse and keyboard to perform tasks autonomously.
The computer-use capability is in beta testing for now, accessible only to developers using the Claude API, but in time we might all have AI filling in forms, moving files, searching the internet, and taking on tasks we usually do ourselves.
This feature is aimed at developers and reflects a push towards AI agents that can execute tasks autonomously with little human oversight.
CLAUDE AI MODELS OVERVIEW
Anthropic’s Claude AI models come in three versions for developers, each priced by performance level. The Sonnet mid-tier and low-cost Haiku recently received updates.
THE MODELS AT A GLANCE
- Claude 3.5 Sonnet: The mid-tier model, now enhanced with capabilities for computer use and improved coding, is priced at $3 per million input tokens and $15 per million output tokens.
- Claude 3.5 Haiku: The most budget-conscious option, ideal for customer service tasks and simpler applications, is priced at $0.25 per million input tokens and $1.25 per million output tokens.
- Claude 3 Opus: This premium model is designed for intricate tasks that demand advanced reasoning, available at $15 per million input tokens and $75 per million output tokens.
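To make the price gap concrete, here is a minimal sketch that estimates what a single workload would cost on each tier, using only the per-million-token prices listed above. The function name `estimate_cost` and the workload size (200,000 input tokens and 50,000 output tokens) are assumptions chosen for illustration, not figures from Anthropic.

```python
# Prices in USD per million tokens, as quoted in the list above:
# (input price, output price).
PRICES = {
    "Claude 3.5 Sonnet": (3.00, 15.00),
    "Claude 3.5 Haiku": (0.25, 1.25),
    "Claude 3 Opus": (15.00, 75.00),
}


def estimate_cost(model, input_tokens, output_tokens):
    """Return the estimated cost in USD for one workload (hypothetical helper)."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


if __name__ == "__main__":
    # Illustrative workload only: 200k tokens in, 50k tokens out.
    for model in PRICES:
        cost = estimate_cost(model, 200_000, 50_000)
        print(f"{model}: ${cost:.4f}")
```

On these quoted prices, the same workload costs five times as much on Opus as on Sonnet, and twelve times as much on Sonnet as on Haiku, because both the input and output rates scale by the same factor between tiers.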
USING CLAUDE 3.5 SONNET FOR HANDLING TASKS ON A COMPUTER
The demo video from Anthropic shows Claude AI being instructed to fill out a form. Although the required information is spread across different databases and tabs, the user simply needs to request the form and point out where the relevant details can be found.
As Claude completes the task, it takes screenshots and analyses them to understand what is on screen, much like the image recognition for which AI is already well known. It then decides what action to take next, based on the displayed information and the instructions it was given.
In this case, the AI shows its intelligence by discerning that it should change to another browser tab and perform a search for a company name to locate required information. Claude takes care of all cursor movements, clicks, and typing as it works. The bot accurately identifies the right data and pastes it into the correct fields on the form.
At the end, Claude identifies and clicks the form’s submit button on the screen, which finalises the task, all while the user observes. On the face of it, the AI model can comprehend what it’s looking at and work out how to use that information to complete tasks.
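The screenshot, decide, act cycle described above can be sketched as a simple loop. This is a toy simulation under stated assumptions, not Anthropic's implementation or API: `FakeDesktop`, `Action`, `decide`, and `run_agent` are all hypothetical names, the "screenshot" is a plain dictionary rather than an image, and the hard-coded `decide` function stands in for the model's judgement.

```python
from dataclasses import dataclass, field


@dataclass
class Action:
    kind: str  # "switch_tab", "type", "click", or "done"
    target: str = ""
    text: str = ""


@dataclass
class FakeDesktop:
    """Toy stand-in for a real screen; everything here is hypothetical."""
    active_tab: str = "form"
    fields: dict = field(default_factory=dict)
    submitted: bool = False

    def screenshot(self):
        # A real agent would capture pixels; this returns a summary dict.
        return {
            "tab": self.active_tab,
            "fields": dict(self.fields),
            "submitted": self.submitted,
        }

    def apply(self, action):
        if action.kind == "switch_tab":
            self.active_tab = action.target
        elif action.kind == "type":
            self.fields[action.target] = action.text
        elif action.kind == "click" and action.target == "submit":
            self.submitted = True


def decide(observation, goal):
    """Hard-coded policy standing in for the model's decision (assumption)."""
    if observation["submitted"]:
        return Action("done")
    for name, value in goal.items():
        if observation["fields"].get(name) != value:
            if observation["tab"] != "form":
                return Action("switch_tab", target="form")
            return Action("type", target=name, text=value)
    return Action("click", target="submit")


def run_agent(desktop, goal, max_steps=20):
    """Observe, decide, act; stop when done or when the step cap is hit."""
    history = []
    for _ in range(max_steps):
        observation = desktop.screenshot()
        action = decide(observation, goal)
        if action.kind == "done":
            break
        desktop.apply(action)
        history.append(action)
    return history
```

For example, `run_agent(FakeDesktop(active_tab="search"), {"company": "Acme"})` switches to the form tab, types the value, and clicks submit. In this sketch the goal already contains the values to enter, whereas the demo shows Claude finding them on screen. The `max_steps` cap is a design choice of the sketch, reflecting the general idea of keeping an autonomous loop bounded.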
Yet, Anthropic points out that fundamental tasks such as scrolling, dragging, and zooming still “present challenges” for Claude, and beta testers are urged to operate it within “low-risk” contexts for the moment.
On the OSWorld benchmark, which measures how well AI models carry out tasks on a computer, Claude 3.5 Sonnet is said to score 14.9%, while human scores typically lie around 70-75%.
Another report indicates that Alphabet’s agreement with Anthropic is under scrutiny by UK competition authorities.
