Hacker News | omkar_savant's comments

Could you share your prompt? We'll look into this one.


At current latency, there are a bunch of async automation use cases this could be used for. For example:

* Tedious to complete, easy to verify

BPO: Fill out doctor licensing applications based on context from a web profile
HR: Move candidate information from an ATS to an HRIS system
Logistics: Fill out a shipping order form based on a PDF of the packing label

* Interact with a diverse set of sites for a single workflow

Real estate: Diligence on properties that involves interacting with one of many county records websites
Freight forwarding: Check the status of shipping containers across 1 of 50 port terminal sites
Shipping: Post truck load requests across multiple job board sites
BPO: Fetch the status of a Medicare coverage application from 1 of 50 state sites
BPO: Fill out medical license forms across multiple state websites

* Periodic syncs between various systems of record

Clinical: Copy patient insurance info from Zocdoc into an internal system
HR: Move candidate information from an ATS to an HRIS system
Customer onboarding: Create Salesforce tickets based on planned product installations that are logged in an internal system
Logistics: Update the status of various shipments using tracking numbers on the USPS site

* PDF extraction to system interaction

Insurance: A broker processes a detailed project overview and creates a certificate of insurance by entering the specific details from the multi-page document into an internal form
Logistics: Fill out a shipping order form based on a PDF of the packing label
Clinical: Enter patient appointment information into an EHR system based on a referral PDF
Accounting: Extract invoice information from 50+ vendor formats and enter the details into a Google Sheet without laborious OCR setup for specific formats
Mortgage: Extract realtor names and addresses from a lease document and look up the license status on various state portals

* Self-healing broken RPA workflows


We can definitely make the docs clearer here, but the model requires using the computer_use tool. If you have custom tools, you'll need to exclude any predefined tools that clash with our action space.

See this section: https://googledevai.devsite.corp.google.com/gemini-api/docs/...

And the repo has a sample setup for using the default computer use tool: https://github.com/google/computer-use-preview
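
To make the "exclude predefined tools if they clash" point concrete, here is a minimal sketch in plain Python that assembles a tools payload and excludes any predefined action whose name collides with a custom function. It is an illustration, not the official setup: the action names in PREDEFINED_ACTIONS and the field names (computer_use, environment, excluded_predefined_functions, function_declarations) are written from memory and should be treated as assumptions to check against the docs. build_computer_use_tools and lookup_order are hypothetical names.

```python
# Assumption: names of the predefined UI actions, listed from memory.
# Verify against the current docs before relying on this set.
PREDEFINED_ACTIONS = {
    "open_web_browser", "wait_5_seconds", "go_back", "go_forward",
    "search", "navigate", "click_at", "hover_at", "type_text_at",
    "key_combination", "scroll_document", "scroll_at", "drag_and_drop",
}


def build_computer_use_tools(custom_functions):
    """Return a tools list that always includes the computer_use tool.

    custom_functions: list of dicts, each with at least a "name" key
    (hypothetical custom function declarations).
    """
    # Any custom function that shares a name with a predefined action clashes.
    clashes = sorted({f["name"] for f in custom_functions} & PREDEFINED_ACTIONS)

    # Assumption: field names mirror the REST-style payload.
    computer_use = {"environment": "ENVIRONMENT_BROWSER"}
    if clashes:
        computer_use["excluded_predefined_functions"] = clashes

    tools = [{"computer_use": computer_use}]
    if custom_functions:
        tools.append({"function_declarations": custom_functions})
    return tools


tools = build_computer_use_tools([
    {"name": "navigate", "description": "Custom navigation with auth"},
    {"name": "lookup_order", "description": "Hypothetical internal lookup"},
])
```

The idea is that the computer_use tool stays in the request no matter what, and only the clashing predefined actions get switched off so the custom versions take their place.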


Hey - I'm on the team that launched this. Please let me know if you have any questions!


I am on https://gemini.browserbase.com/ and just clicked the use case mentioned on the site: "Go to Hacker News and find the most controversial post from today, then read the top 3 comments and summarize the debate."

It did not work, multiple times. It just gets stuck after going to Hacker News.


It's a bit funny that I give Google Gemini a task and then it goes on the Google Search site and it gets stuck in the captcha tarpit that's supposed to block unwanted bots. But I guess Google Gemini shouldn't be unwanted for Google. Can't you ask the search team to whitelist the Gemini bot?


How are you going to deal with reCAPTCHA and ad impressions? Sounds like a conflict of interest.


No easy answers on this one, unfortunately. There are lots of ongoing conversations about it, but our default stance has been to hand control back to the user when a captcha comes up and have them solve it.
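
As a rough illustration of that "hand back control" stance, here is a minimal sketch of an agent loop that pauses when a step is flagged as a captcha and resumes only once the user confirms they solved it. Everything here is hypothetical (Step, captcha_detected, run_with_handoff, and the execute and ask_user callbacks); it is not how the product implements this, and how a captcha gets detected is left out entirely.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    action: str                     # hypothetical action label
    captcha_detected: bool = False  # assumed to be set by some upstream check


def run_with_handoff(
    steps: List[Step],
    execute: Callable[[Step], None],
    ask_user: Callable[[Step], bool],
) -> List[str]:
    """Run steps in order, handing control to the user on a captcha.

    ask_user returns True once the user has solved the captcha, or False
    to abort. The agent itself never attempts to solve it.
    """
    log: List[str] = []
    for step in steps:
        if step.captcha_detected:
            if not ask_user(step):
                log.append(f"aborted:{step.action}")
                break
            log.append(f"user_solved:{step.action}")
        execute(step)
        log.append(f"done:{step.action}")
    return log


steps = [Step("open"), Step("search", captcha_detected=True), Step("read")]
log = run_with_handoff(steps, execute=lambda s: None, ask_user=lambda s: True)
```

With the user confirming, the run continues past the flagged step. If ask_user returns False, the loop stops there and the remaining steps are skipped.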


What about when all your competitors are solving the CAPTCHAs?


Really cool stuff! Any interesting challenges the team ran into while developing it?

