At current latency, there are a bunch of async automation use cases that one could use this for. For example:
* Tedious to complete, easy to verify
BPO: Fill out doctor licensing applications based on context from a web profile
HR: Move candidate information from an ATS to an HRIS system
Logistics: Fill out a shipping order form based on a PDF of the packing label
* Interact with a diverse set of sites for a single workflow
Real estate: Due diligence on properties, which involves interacting with one of many county records websites
Freight forwarding: Check the status of shipping containers on 1 of 50 port terminal sites
Shipping: Post truckload requests across multiple job board sites
BPO: Fetch the status of a Medicare coverage application from 1 of 50 state sites
BPO: Fill out medical license forms across multiple state websites
* Periodic syncs between various systems of record
Clinical: Copy patient insurance info from Zocdoc into an internal system
HR: Move candidate information from an ATS to an HRIS system
Customer onboarding: Create Salesforce tickets based on planned product installations that are logged in an internal system
Logistics: Update the status of various shipments using tracking numbers on the USPS site
* PDF extraction to system interaction
Insurance: A broker processes a detailed, multi-page project overview and creates a certificate of insurance by filling out an internal form with the specific details from the document
Logistics: Fill out a shipping order form based on a PDF of the packing label
Clinical: Enter patient appointment information into an EHR system based on a referral PDF
Accounting: Extract invoice information from 50+ vendor formats and enter the details into a Google Sheet without laborious per-format OCR setup
Mortgage: Extract realtor names and addresses from a lease document and look up their license status on various state portals
We can definitely make the docs clearer here, but the model requires using the computer_use tool. If you have custom tools, you'll need to exclude any predefined tools that clash with our action space.
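As a rough illustration of that requirement, here is a minimal, SDK-free Python sketch of assembling a tool config: the computer-use tool is always included, and any predefined action whose name collides with one of your custom functions is listed for exclusion. The action names, the dict field names (loosely modeled on the Gemini API's `computer_use` tool), and treating a same-name match as a "clash" are all assumptions; check the official docs for the real action space and parameter names.

```python
# Hypothetical sketch: build a tool config that always includes computer use
# and excludes predefined actions that clash (by name) with custom tools.
# The action names and field names below are assumptions, not the real API.

PREDEFINED_ACTIONS = frozenset({
    "open_web_browser", "navigate", "go_back", "go_forward", "search",
    "click_at", "hover_at", "type_text_at", "key_combination",
    "scroll_document", "scroll_at", "drag_and_drop", "wait_5_seconds",
})


def actions_to_exclude(custom_tool_names):
    """Return, sorted, the predefined actions whose names match a custom tool."""
    return sorted(PREDEFINED_ACTIONS & set(custom_tool_names))


def build_tool_config(custom_tools):
    """Build a plain-dict config: the computer-use tool is always present,
    and custom function declarations (if any) are added alongside it."""
    names = [tool["name"] for tool in custom_tools]
    config = {
        "tools": [
            {
                "computer_use": {
                    "environment": "ENVIRONMENT_BROWSER",
                    "excluded_predefined_functions": actions_to_exclude(names),
                }
            }
        ]
    }
    if custom_tools:
        config["tools"].append({"function_declarations": list(custom_tools)})
    return config


if __name__ == "__main__":
    cfg = build_tool_config([{"name": "drag_and_drop"}, {"name": "lookup_license"}])
    print(cfg["tools"][0]["computer_use"]["excluded_predefined_functions"])
```

Here a custom `drag_and_drop` (hypothetical) replaces the predefined action of the same name, while a non-clashing custom function such as `lookup_license` (also hypothetical) is simply offered alongside the computer-use actions.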
I went to https://gemini.browserbase.com/ and just clicked the example use case shown on the site: "Go to Hacker News and find the most controversial post from today, then read the top 3 comments and summarize the debate."
It didn't work. I tried multiple times, and it just gets stuck after going to Hacker News.
It's a bit funny that I give Google Gemini a task, it goes to the Google Search site, and then it gets stuck in the CAPTCHA tarpit that's supposed to block unwanted bots. I'd have guessed Gemini wouldn't count as unwanted on Google's own site. Can't you ask the Search team to whitelist the Gemini bot?
No easy answers on this one, unfortunately; there are lots of ongoing conversations about it. Our default stance has been to hand control back to the user whenever a CAPTCHA comes up and have them solve it.
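The hand-back-to-the-user approach can be pictured as a simple human-in-the-loop loop. This is a minimal Python sketch, not how the product actually implements it: `PageState`, its `captcha_present` flag, and the `ask_user` callback are hypothetical, and CAPTCHA detection itself is out of scope.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class PageState:
    url: str
    captcha_present: bool  # hypothetical flag; how a CAPTCHA is detected is out of scope


def run_with_handoff(pages: Iterable[PageState],
                     ask_user: Callable[[PageState], PageState]) -> List[str]:
    """Process pages in order. The agent acts on normal pages; on a CAPTCHA it
    pauses, hands control to the user, and resumes on the page the user returns."""
    trace = []
    for page in pages:
        if page.captcha_present:
            trace.append(f"handed to user: {page.url}")
            page = ask_user(page)  # blocks until the user is done with the CAPTCHA
            if page.captcha_present:
                # The user could not (or chose not to) clear it, so stop the run.
                trace.append(f"stopped: {page.url}")
                break
        trace.append(f"agent acted: {page.url}")
    return trace


if __name__ == "__main__":
    def solving_user(page: PageState) -> PageState:
        # Stand-in for a real UI prompt: pretend the user solved the CAPTCHA.
        return PageState(url=page.url, captcha_present=False)

    pages = [
        PageState("https://news.ycombinator.com", False),
        PageState("https://www.google.com/search", True),
    ]
    for line in run_with_handoff(pages, solving_user):
        print(line)
```

The design choice here is that the agent never tries to solve the CAPTCHA itself; it only resumes once the user hands back a page without one, and stops otherwise.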