Is Your Analytics Ready for an AI-First Product?
Product and engineering teams face new challenges when building AI-first products. A modern digital analytics platform offers solutions.
For a long time, software products have given users discrete experiences that follow common conventions for design and interaction. Users navigate via menus, they click on UI buttons, they browse screens filled with pre-existing information.
Within these conventions, everyday use of AI/ML in applications was limited to cases like search engine results, simple chatbot conversations, or “you might also like” product recommendations.
However, in the last three years, AI’s growing adoption has broken those conventions. Nearly every digital product we use (e.g., booking travel, shopping for shoes, preparing documents, etc.) has an AI aspect in its UX. Many offer a chat-first interface where text commands replace points and clicks. Some even include agentic workflows that find information and take action, stringing together a series of tasks and communicating with third-party services.
This shift has opened up new levels of personalization and ease of use—but it’s also made understanding users much more difficult for builders. AI interfaces are probabilistic instead of deterministic, so users are less likely to follow a single “golden path” between discrete pages or operate in standard conversion funnels. How can you run effective analytics if you can’t follow your users’ specific journeys?
In this post, I’ll share three key challenges faced by product and engineering teams building AI-first products, followed by the must-have solutions a modern digital analytics platform like Amplitude offers. We’ll also see examples of the new types of questions analytics teams need to be able to answer when building AI-first products.
Challenge 1: Outdated measurement tools
In most AI-based workflows, a successful outcome is hard to measure because it isn’t represented by a specific page view or button click. Teams can track correlated downstream metrics like revenue or retention, but traditional analytics tools can’t link movement on these metrics directly to new AI-based workflows.
Solution 1: Evals as events and properties
Leading product teams use a combination of objective, deterministic tests and LLM-as-a-judge processes as “evals” for user journey success metrics. Amplitude events or event properties capture user behavior and allow for meaningful monitoring, investigation, and trend analysis. Amplitude’s NLP can categorize conversations into topics, use cases, or themes, then score the outcome as a success or failure. Teams can use Amplitude to answer a range of new questions that embrace more complex user journeys, like:
- What percent of conversations have a positive outcome?
- What do users typically do after the agent fails?
- Which topics yield the most frustrated users?
- Do users sign up for our pro plan at a higher rate after a successful chat conversation?
- What should be the minimum level of confidence in a response to show it to the user?
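As a sketch of the eval-as-event idea, a deterministic check and an LLM-as-a-judge score can be combined into a single analytics event payload. Everything below is illustrative: the event name, property keys, and the incoming `judge_score` are assumptions, not a fixed Amplitude schema.

```python
def build_eval_event(user_id, conversation_id, response, judge_score):
    """Combine a deterministic check with an LLM-as-a-judge score
    into one analytics event payload (illustrative schema)."""
    # Deterministic check: a non-empty answer within a sane length limit.
    deterministic_pass = bool(response.strip()) and len(response) < 4000
    # Count the conversation as a success only if both signals agree.
    success = deterministic_pass and judge_score >= 0.7
    return {
        "event_type": "Conversation Evaluated",  # hypothetical event name
        "user_id": user_id,
        "event_properties": {
            "conversation_id": conversation_id,
            "deterministic_pass": deterministic_pass,
            "judge_score": judge_score,
            "outcome": "success" if success else "failure",
        },
    }

event = build_eval_event("user-123", "conv-9", "Your refund was issued.", 0.92)
```

Once tracked as an event, the `outcome` property can feed funnels, segmentation, and trend charts like any other behavioral event.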


Solution 2: Qualitative review
There will always be times when product teams want to analyze user behavior in more detail than just “success” or “failure.” Amplitude natively offers several qualitative analysis tools to complement binary scoring:
- Session Replay: See reconstructions of full user sessions to understand their interaction with a bot or agent. We recommend using quantitative analysis to determine which sessions to watch, then observing replays to get all the details. Some teams even use AI to watch sessions and extract meaningful trends. Teams can use Session Replay to answer questions like:
  - Why did a user exit a chat halfway through?
  - How does a user act when the LLM takes too long to respond?
- Surveys: Rather than relying on what happens on screen, perhaps the best indication of success or failure is the user’s subjective perception. Amplitude offers the ability to survey users on their experience. They can provide a binary pass/fail, rate on a 1-5 scale, or even type open-ended text feedback. NLP can then categorize and analyze that feedback using Amplitude technology that minimizes hallucinations. Teams can now answer questions like:
  - When do users score an outcome poorly despite the agent believing it delivered the right outcome?
  - Which users expressed frustration with the agent, and what did they do next?

Solution 3: Tool usage analytics
Amplitude can natively track tool usage within agentic workflows. This provides meaningful insight into which tools are used most, in which order, and what impact that has on the workflow outcome and user satisfaction.
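One way to picture this is turning an agent's ordered tool calls into per-call event payloads. The event name, property keys, and input structure below are hypothetical, used only to show what a tool-usage event stream might look like.

```python
def track_tool_calls(tool_calls):
    """Turn an agent's ordered tool calls into per-call event payloads,
    recording tool name, position in the sequence, duration, and outcome
    (illustrative schema, not a fixed Amplitude API)."""
    events = []
    for position, call in enumerate(tool_calls, start=1):
        events.append({
            "event_type": "Agent Tool Called",  # hypothetical event name
            "event_properties": {
                "tool_name": call["name"],
                "call_order": position,
                "duration_ms": call["duration_ms"],
                "succeeded": call["ok"],
            },
        })
    return events

calls = [
    {"name": "search_flights", "duration_ms": 420, "ok": True},
    {"name": "book_flight", "duration_ms": 1310, "ok": True},
]
events = track_tool_calls(calls)
```

With `call_order` captured as a property, it becomes straightforward to analyze which tool sequences correlate with successful workflow outcomes.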

Solution 4: Cost analytics
AI APIs also report token usage for each request. Amplitude offers prebuilt tooling to capture token usage per prompt and prebuilt reporting to monitor usage by feature or customer. Product teams can now optimize spend and ensure ROI on their AI vendor billing. Teams can now answer questions like:
- What is the average cost per agent run by use case?
- What is the cost impact of rolling out the latest LLM to our customer base?
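The per-request arithmetic behind these questions is simple once token counts are captured. The model names and per-1K-token prices below are made up for illustration; real prices vary by vendor and model.

```python
# Hypothetical per-1K-token prices; real prices vary by vendor and model.
PRICES_PER_1K = {
    "small-model": {"input": 0.0005, "output": 0.0015},
    "large-model": {"input": 0.0100, "output": 0.0300},
}

def request_cost(model, input_tokens, output_tokens):
    """Compute one request's cost from the token counts the AI API reports."""
    price = PRICES_PER_1K[model]
    return (input_tokens / 1000) * price["input"] \
         + (output_tokens / 1000) * price["output"]

# 1,200 input tokens and 400 output tokens on the large model.
cost = request_cost("large-model", 1200, 400)
```

Attaching this cost as an event property makes it possible to roll spend up by feature, use case, or customer.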
Challenge 2: LLMs are nondeterministic
It’s impossible to open up an LLM and see how it works. A nearly infinite combination of inputs (model selection, system prompts, chat messages, user context, available tools, model parameters) affects the quality, cost, and latency of the output, yet the output itself isn’t predictable.
Solution 1: Experimentation
AI introduces a new universe of uncertainties and opportunities for optimization. Teams must choose between AI vendors, models, and system prompts, all of which can be controlled with feature flags and tested with experiments. There are also new pricing and packaging alternatives to test, such as how many free credits to offer or which plans can access certain models.
With Amplitude Feature Experimentation, engineering teams can replace hard-coded parameters with remotely configured payloads, then run experiments to determine the impact of changes. Teams can now answer questions like:
- Which system prompt variant leads to the highest agent satisfaction rate?
- Does the delay from using a higher reasoning level produce more successful chats?
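A minimal sketch of the mechanic: deterministically bucket each user into a prompt variant so they get a consistent experience across sessions. The variant payloads and bucketing helper are assumptions; in practice the payloads would come from a feature-flag service rather than a hard-coded dict.

```python
import hashlib

# Hypothetical remotely configured payloads; in production these would be
# served by a feature-flag/experimentation service, not hard-coded.
PROMPT_VARIANTS = {
    "control":   "You are a concise support assistant.",
    "treatment": "You are a friendly support assistant. Think step by step.",
}

def assign_variant(user_id, variants=("control", "treatment")):
    """Deterministically bucket a user so they see the same variant
    every session (hash-based assignment sketch)."""
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % len(variants)
    return variants[bucket]

variant = assign_variant("user-123")
system_prompt = PROMPT_VARIANTS[variant]
```

Downstream success metrics (like the eval outcomes above) can then be compared between the two buckets.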

Solution 2: User context inputs via Profile API
LLMs have near-infinite general knowledge but no inherent contextual knowledge of a user’s state or goals. Imagine a support chatbot that doesn’t know what paid plan a user is on, or a flight-booking agent that doesn’t know a user’s loyalty program membership.
Amplitude’s User Profile API unites user data from CRM, CDP, DWH, in-app behavioral data (including action recency and frequency), and even Amplitude-powered propensity models. This blended data can be delivered anywhere in your stack for injection into a system prompt with minimal engineering lift.
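Once a profile has been fetched, injecting it into a system prompt can be as simple as the formatting step below. The profile keys here are illustrative, not a fixed schema, and the function stands in for whatever glue code sits between the profile lookup and the LLM call.

```python
def build_system_prompt(base_prompt, profile):
    """Append user context (e.g. fetched from a profile API) to a system
    prompt. Profile keys are illustrative, not a fixed schema."""
    context_lines = [f"- {key}: {value}" for key, value in sorted(profile.items())]
    return base_prompt + "\n\nKnown user context:\n" + "\n".join(context_lines)

profile = {"plan": "pro", "loyalty_tier": "gold", "last_booking": "2024-11-02"}
prompt = build_system_prompt("You are a travel booking assistant.", profile)
```

This is how a flight-booking agent would "know" the user's loyalty tier without the user having to restate it in chat.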
Solution 3: User profile enrichment
Sophisticated product and data teams are also keen to use AI for more advanced use cases in the data warehouse. However, there are still challenges in collecting clean, reliable, up-to-date data to train these models. It’s also necessary to have tools in place to act on the model outputs.
Amplitude’s two-way integrations with leading warehousing providers (Snowflake, BigQuery, S3, and Databricks) can easily facilitate this. The output of an ML job run in the data warehouse can be imported back to Amplitude to enrich the user’s profile. Once back in Amplitude, that property can be used to target experiments, guides, and surveys. It can also be synced out to ad platforms, marketing automation and messaging tools, CRMs, and more for activation.
Challenge 3: Dependence on external systems
Product builders are now relying on external parties for critical user experiences. Engineering teams will always lose sleep over putting third-party API calls in the critical path of delivering an experience to a user, but for AI-powered features, there’s often no alternative.
Solution 1: Latency monitoring
Amplitude automatically tracks the latency between a user prompt and an AI response. This opens up opportunities to understand which segments of users or types of prompts are producing the biggest delays. Product teams can set up intelligent alerts to fire if SLAs are breached. These metrics can also inform experiments about model selection, AI vendor performance, or other changes to backend infrastructure.
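Under the hood, latency capture amounts to timing the round trip and attaching it to the event for that prompt. The wrapper and the stand-in `fake_llm` function below are illustrative, not a vendor or Amplitude API.

```python
import time

def timed_llm_call(call_fn, *args, **kwargs):
    """Wrap an LLM call and return (response, latency in ms) so latency can
    be attached to the analytics event for that prompt (illustrative)."""
    start = time.monotonic()
    response = call_fn(*args, **kwargs)
    latency_ms = (time.monotonic() - start) * 1000
    return response, latency_ms

# Stand-in for a real vendor call.
def fake_llm(prompt):
    return f"echo: {prompt}"

response, latency_ms = timed_llm_call(fake_llm, "hello")
```

Segmenting the resulting latency property by model, prompt type, or user cohort is what surfaces where the biggest delays come from.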
Solution 2: Feature flags
If the worst should happen and a team notices a latency spike or an API endpoint becomes unresponsive, feature flags are critical as circuit breakers, enabling an engineering team to quickly swap to a backup infrastructure or vendor.
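A circuit breaker of this kind can be sketched as a flag-gated fallback. The flag name, the `TimeoutError` trigger, and both vendor callables are assumptions for illustration; a real implementation would read the flag from a feature-flag service.

```python
def complete_with_fallback(prompt, flags, primary, backup):
    """Route to the backup vendor when a kill-switch flag is off or the
    primary vendor times out (flag name and callables are illustrative)."""
    if flags.get("primary_vendor_enabled", True):
        try:
            return primary(prompt)
        except TimeoutError:
            pass  # primary is unresponsive; fall through to backup
    return backup(prompt)

def primary(prompt):
    raise TimeoutError("primary vendor unresponsive")

def backup(prompt):
    return "backup: " + prompt

result = complete_with_fallback("hi", {"primary_vendor_enabled": True},
                                primary, backup)
```

Because the flag is evaluated per request, flipping it remotely reroutes traffic without a deploy.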
So, is your analytics ready?
With the proliferation of AI, modern product teams need to blend qualitative and quantitative user experience data for a deeper understanding of their customers. They need to experiment on parameters, models, and infrastructure to react to users in real time. They need to collect user context and deliver it anywhere in the stack to create the most personalized experience yet.
Taken together, these solutions give product teams the ability to drive improvements to their AI-powered features and apps with full visibility into the impact on responsiveness, response quality, and cost.
The future of AI-first products is still in motion, much like user journeys in AI products themselves. But these strategies, and the right tools to execute them, will help you build a foundation that works on moving ground.
Teams that can ship code and learn using tight feedback loops will innovate faster and outperform their competition. Amplitude’s digital analytics platform is uniquely positioned to help teams build this new generation of AI-native products and experiences. Try it today with a free Starter account.

Ken Kutyn
Head of Solutions Engineering, APJ, Amplitude
Ken has 11 years of experience in the analytics, experimentation, and personalization space. Originally from Vancouver, Canada, he has lived and worked in London, Amsterdam, and San Francisco and is now based in Singapore. Ken has a passion for experimentation and data-driven decision-making and has spoken at several product development conferences. In his free time, he likes to travel around Southeast Asia with his family, bake bread, and explore the Singapore food scene.
