The AI UI Challenge

A new Google ad promoting the AI features of the latest Pixel highlights a core challenge of AI adoption and development.

An actor says “put this list of restaurants on a map,” and the locations are automatically pinned on Google Maps.

When I saw this ad, I had three thoughts:

  1. Who would ask the AI to do that?

  2. Is this the best example?

  3. What do these questions mean for builders?

Let’s tackle them one at a time.

Who would ask that?

The cost and time invested in AI training runs have been covered extensively. Thousands of GPUs, months of training time, and new approaches to sourcing power are all going into the next large models.

Little has been discussed about training end users to use these new tools.

Since the early days of the PC, computer users have been trained to think in terms of applications. An application serves a specific function. While applications persist data in a filesystem, the mental model is fairly contained: tasks live in one application, and data may then be exported for sharing.

This mental model has followed us through the internet and mobile paradigms. The internet nested applications within a browser. Aside from the browser, the workflow was the same: open a webpage, perform a task, then export or share data (or invite another user to view the document). Mobile applications follow the same pattern.

What strikes me about the Google ad is the fundamental disruption of this pattern. Data in one application can be taken and passed to any number of other applications under the care of the AI daemon.

The AI can be tasked with a job and will use the applications on the device to finish it.

Google and Apple are attempting a subtle but challenging shift in user behavior. In an ideal world, users would reach for the AI as their first attempt at a task. That may be the case for our children: their first devices will be AI-first, and they'll know how to use them.

In the meantime, Google and Apple have to walk a tightrope. They have to teach us to unlearn our current usage habits, replace them with AI workflows, and spend billions on the next generation of models to keep up.

That explains my initial reaction to the new Google ad: “Who would think to ask the phone to do that?” I don’t have the habit of directly asking my phone for what I want. Instead, I habitually tap through applications to find a place to eat.

One reason ChatGPT exploded is that it cleverly side-stepped the problem by constraining the AI to the most popular computing interface: communication. Everyone knows how to use messaging, email, and the like; it's a completely normal pattern of behavior. That made the new behavior easier to adopt: "Just talk to this thing like a person."

What will Apple and Google do when the usage graphs lag behind the technical capabilities? They’ll have an AI capable of doing everything for a user, but still see everyone opening Maps or Google and searching “restaurants” like they have for the past 15 years.

Inertia is a real problem.

Is that all?

The technical accomplishment of transferring data from one application, understanding intent, transforming it, and displaying it in another application is extraordinary. The unlocked workflows are incredible.

But that’s the application you decided to show us? You have an AI capable of accomplishing tasks for the user and mapping restaurants is the best you could come up with?

Something’s off. Maybe it relates to the point above. The ad isn’t meant to show the AI’s full horsepower; it’s educational. It’s meant to nudge everyone to try new features as if they weren’t AI. It’s approachable.

This approach echoes Apple’s “From Messages” autofill for multi-factor authentication codes. It breaches the application paradigm, showing that Apple can know the state of your data and place useful data in front of you.

Taking action without expressing intent is the only path forward for adoption. It’s unlikely Apple and Google will write data on your behalf. But more and more, useful data will be provided to you from elsewhere on your device. AI will try to guess your next move and ask if it can take that next step for you.

It will pad the usage numbers while everyone figures out how to use these tools over the next few years.

What about builders?

The problem is that all these functions require root access to our devices.

Isolation of applications provides built-in security. Just because the OS can access my bank app’s financial data doesn’t mean I want that data shared outside of that app.

Apple and Google will secure these tasks like they secure the OS. They are the only ones who can make exceptions to the application model.

Where does that leave us?

The best AI applications will likely exist within walled gardens. To make Apple Intelligence work for you, you’ll need to use Mail, Notes, Maps, and Safari instead of the popular alternatives.

It means it’s hard to compete as a builder. AI coding is making code more ephemeral. End users can develop tools in minutes without your SaaS. And unless you work for Google or Apple, you can’t build a moat around end user workflows at the OS level.

That leaves you contending with the UI challenge directly. Most builders already know this and the number of chat interfaces has exploded as a result. It’s the one UI everyone can agree will work without friction.

Avoid the UI problem and use AI on the backend. Have it perform tasks for you, such as writing code, generating marketing assets, or executing workflows for users. Users should interact with your app the way they would with any other app.
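To make that concrete, here’s a minimal sketch of the pattern: the user clicks an ordinary button, and the LLM call is an implementation detail on the backend rather than a chat box. Everything here is hypothetical — `call_llm` stands in for whichever provider SDK you use, and is stubbed so the sketch runs without credentials.

```python
def call_llm(prompt: str) -> str:
    # Placeholder for your provider's completion call.
    # Stubbed so this sketch runs without an API key.
    return f"[generated text for: {prompt}]"


def generate_listing_description(product_name: str, features: list[str]) -> str:
    """Backend action wired to a normal 'Generate description' button.

    The user never sees a prompt or a conversation; the app just
    does the task, like any other app feature.
    """
    prompt = (
        f"Write a two-sentence product description for {product_name}, "
        f"highlighting: {', '.join(features)}."
    )
    return call_llm(prompt)


if __name__ == "__main__":
    print(generate_listing_description("TrailLamp", ["USB-C", "water resistant"]))
```

The design choice is that the prompt lives in your code, not in the user’s head — the interface stays focused on the problem being solved.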

I’m bearish on chat interfaces for indie apps. Your competitive advantage is solving one problem well. Use LLMs behind the scenes, but the interface should focus on problem-solving. Having a user start a conversation seems like unnecessary friction.

What about voice?

One paradigm shift coming soon is voice conversation. Until now, we’ve been stuck in an uncanny valley of voice quality, latency, and conversational quirks, but recent demos show these problems being solved.

I expect adoption to follow a smaller-amplitude curve than chat interfaces did. Life offers fewer opportunities to interact with devices out loud than in silence.

These takeaways are constraining, but creativity thrives in constraints. I expect plenty of opportunities to build on the foundation of LLMs.

That’s it for this one. Thanks for reading.

Hunter