I’ve been testing Gemini’s new agentic feature on the Pixel 10 Pro and the Galaxy S26 Ultra. For the first time, it lets Gemini take over and use apps on your behalf. Right now it only works with a handful of apps (a few food delivery and ride-hailing services), and it’s still experimental. It’s slow, occasionally clumsy, and it doesn’t solve any real problem you’ve had using your phone. And yet it’s deeply impressive, and I don’t think it’s a stretch to say it’s a preview of what’s coming. We’re still a long way off, but this is the first time I’ve seen a real AI assistant actually working on a phone, outside of a keynote or a carefully stage-managed demo on a trade show floor.
First off, Gemini is a lot slower than you, me, or just about anyone else at using a phone. If you need to hail a ride right now, you’re still the fastest option. But before you write it off, remember that these automated tasks are designed to run quietly in the background while you do other things on your phone. Better still, it keeps working even when you’re *not* looking at your phone, so you can do things like check that your passport is in your bag for the umpteenth time.
If you’re curious like me, though, you can watch the whole thing play out. While it works, text appears at the bottom of the screen describing what Gemini is doing at each step. Things like: “Selecting a second portion of Chicken Teriyaki for the combo,” which is what it did when I asked it to order my dinner on Saturday night. Watching Gemini problem-solve on the fly is genuinely impressive. I asked for a mixed chicken plate; the menu only listed quantities in half portions, so it dutifully added two half portions of chicken.

It’s probably for the best that when you start an automation with Gemini, it runs in the background by default. You have to tap an icon and open a separate panel if you want to watch Gemini work through the task. And it can be excruciating. Watching the computer hunt for a vegetable side on an Uber Eats menu when it’s *right there at the top of the screen* is like a horror movie where you know the killer is in the house, minus the murder. Gemini took a few wrong turns putting together my teriyaki order, which it eventually sorted out on its own, but the whole thing took about nine minutes. Not ideal.
Gemini is supposed to carry out your request right up to the final confirmation step for ordering your ride or your food, so you can check its work. I think that’s the only sane way to use this feature right now, and I don’t mind the small speed bump of finalizing the order myself. In my testing over the past five days, it has never gone rogue and placed an order on its own. It’s also remarkably accurate; I’ve had to make very few changes to the final order. When it does get stuck (I’ve seen it happen a few times), it’s usually in the first minute or two, when some part of the app needs my input: granting location permission, say, or changing the drop-off point to my house instead of Nevada, the last place I’d used that app. I had to figure out what the problem was in those cases, but once it was sorted, I could restart the automation and it ran without a hitch.
Here’s the one that really got me. I put an event on my calendar for a flight to San Francisco the next day (a fake trip for me, but with real flight details). I gave Gemini a vague prompt to book a ride-share that would get me to the airport on time for my flight tomorrow. Because Gemini can read my email and calendar, it was able to pull that information. It needed a little extra prodding, possibly because the flight details weren’t in my email as it expected. But with that, it found the flight details, suggested leaving by 11:30 or 11:45AM (reasonable timing for a 1:45PM departure given how close I live to the airport), and asked if it should book a ride for one of those times. I confirmed the time, and it booked the ride in about three minutes with no further input from me.


It’s a little more remarkable when you know that Uber doesn’t even call it *scheduling* a ride; you *reserve* one. That’s the crucial difference between the virtual assistants we’ve been using and the AI assistants that are starting to arrive. Being able to use natural language with the computer makes a huge difference when you’re running your smart home or placing your dinner order. If the computer keeps tripping up and asking for clarification because you forgot the restaurant calls your meal a “plate” instead of a “combo,” or because you asked for “slaw” instead of “shredded cabbage,” then it’s no more useful than the assistants we’ve spent the past decade asking to set timers and play music.
Still, watching Gemini tap and scroll its way through Uber Eats makes one thing obvious: if you were designing an app for an AI to use, it would look nothing like the apps we have today. These are apps painstakingly designed for humans. An AI agent isn’t swayed by a big ad in the middle of the page offering 30 percent off your order. A well-staged, appetizing photo of the dish it’s ordering is no more persuasive than a crummy one. Instead, you’d hand it a well-structured database rather than a pile of visual noise to wade through, which is what the industry is working toward with the Model Context Protocol, or MCP.
An AI trying to reason its way through an interface built for people seems like the slowest, most fragile way to order a pizza. It does hit a snag now and then, and it’s not great at explaining *why* it couldn’t finish something. This version of task automation feels like a stopgap until developers adopt something more robust: either MCP or Android’s built-in app functions. Sameer Samat, Google’s Android chief, told me recently that Gemini uses this reasoning approach when the other two options aren’t available. Maybe this flavor of agentic automation is a preview of what’s coming, or maybe it’s a way to nudge developers into adopting one of the other approaches. Either way, it’s a meaningful first step toward a new way of using our phones: clunky and slow, but awfully promising.
Photos by Allison Johnson / The Verge

