Is Your Analytics Ready for an AI-First Product?

Product and engineering teams face new challenges when building AI-first products. A modern digital analytics platform offers solutions.

Feb 20, 2026

For a long time, software products have given users discrete experiences that follow common conventions for design and interaction. Users navigate via menus, they click on UI buttons, they browse screens filled with pre-existing information.

Within these conventions, everyday exposure to AI/ML has been limited to use cases like search engine results, simple chatbot conversations, or “you might also like” product recommendations.

However, in the last three years, AI’s growing adoption has broken those conventions. Nearly every digital product we use (e.g., booking travel, shopping for shoes, preparing documents, etc.) has an AI aspect in its UX. Many offer a chat-first interface where text commands replace points and clicks. Some even include agentic workflows that find information and take action, stringing together a series of tasks and communicating with third-party services.

This shift has opened up new levels of personalization and ease of use—but it’s also made understanding users much more difficult for builders. AI interfaces are probabilistic instead of deterministic, so users are less likely to follow a single “golden path” between discrete pages or operate in standard conversion funnels. How can you run effective analytics if you can’t follow your users’ specific journeys?

In this post, I’ll share three key challenges faced by product and engineering teams building AI-first products, followed by the must-have solutions a modern digital analytics platform like Amplitude offers. We’ll also see examples of the new types of questions analytics teams need to be able to answer when building AI-first products.

Challenge 1: Outdated measurement tools

In most AI-based workflows, a successful outcome is hard to measure because it isn’t represented by a specific page view or button click. Teams can track correlated downstream metrics like revenue or retention, but traditional analytics tools can’t link movement on these metrics directly to new AI-based workflows.

Solution 1: Evals as events and properties

Leading product teams combine objective, deterministic tests with LLM-as-a-judge processes as “evals” that serve as success metrics for user journeys. Captured as Amplitude events or event properties, these evals allow for meaningful monitoring, investigation, and trend analysis. Amplitude’s NLP can categorize conversations into topics, use cases, or themes, then score each outcome as a success or failure. Teams can use Amplitude to answer a range of new questions that embrace more complex user journeys, like:

  • What percent of conversations have a positive outcome?
  • What do users typically do after the agent fails?
  • Which topics yield the most frustrated users?
  • Do users sign up for our pro plan at a higher rate after a successful chat conversation?
  • What should be the minimum level of confidence in a response to show it to the user?
[Image: Evals as events]
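To make this concrete, here is a minimal sketch of the pattern: a deterministic stand-in for an LLM-as-a-judge call, whose verdict is shaped into an Amplitude-style event payload. The event name, property names, and the judge logic are all illustrative assumptions, not Amplitude's actual API; a real judge would prompt a model to grade the outcome.

```python
# Sketch: score a chat conversation and shape the result as an event payload.
# "Chat Conversation Evaluated" and the judge heuristic are hypothetical.

def judge_conversation(messages):
    """Deterministic stand-in for an LLM-as-a-judge call."""
    last_user = [m for m in messages if m["role"] == "user"][-1]["content"].lower()
    success = not any(w in last_user for w in ("didn't work", "wrong", "useless"))
    return {"outcome": "success" if success else "failure"}

def build_eval_event(user_id, conversation_id, messages, topic):
    verdict = judge_conversation(messages)
    return {
        "event_type": "Chat Conversation Evaluated",  # hypothetical event name
        "user_id": user_id,
        "event_properties": {
            "conversation_id": conversation_id,
            "topic": topic,                 # e.g. from NLP categorization
            "outcome": verdict["outcome"],
            "message_count": len(messages),
        },
    }

event = build_eval_event(
    "user-123", "conv-42",
    [{"role": "user", "content": "Cancel my subscription"},
     {"role": "assistant", "content": "Done — your subscription is cancelled."},
     {"role": "user", "content": "Thanks, that worked."}],
    topic="billing",
)
print(event["event_properties"]["outcome"])  # success
```

A payload like this maps naturally onto an analytics SDK's track call, which makes every question in the list above a straightforward query over event properties.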

Solution 2: Qualitative review

There will always be times when product teams want to analyze user behavior in more detail than just “success” or “failure.” Amplitude natively offers several qualitative analysis tools to complement binary scoring:

  • Session Replay: See reconstructions of full user sessions to understand their interaction with a bot or agent. We recommend using quantitative analysis to determine which sessions to watch, then observe replays to get all the details. Some teams even use AI to watch sessions and extract meaningful trends. Teams can use session replay to answer questions like:
    • Why did a user exit a chat halfway through?
    • How does a user act when the LLM takes too long to respond?
  • Surveys: Rather than relying on what happens on the screen, perhaps the best indication of success or failure is the user’s subjective perception. Amplitude offers the ability to survey users on their experience. They can provide a binary pass/fail, rate on a 1-5 scale, or even type open-ended text feedback. NLP can then be used to categorize and analyze text feedback using Amplitude technology that minimizes hallucinations. Teams can now answer questions like:
    • When do users score an outcome poorly despite the agent believing it delivered the right outcome?
    • Which users expressed frustration with the agent and what did they do next?
[Image: Feedback thumbs]
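Since survey responses arrive in mixed shapes (thumbs, ratings, free text), teams usually normalize them into a single satisfaction property before analysis. The sketch below shows one way to do that; the keyword matcher is a toy stand-in for the NLP step, and the field names are assumptions.

```python
# Sketch: normalize mixed survey responses (thumbs, 1-5 rating, free text)
# into a (satisfied, category) pair. Keyword list is a toy NLP stand-in.

FRUSTRATION_WORDS = {"frustrating", "useless", "slow", "wrong"}

def normalize_response(response):
    """Return (satisfied: bool, category: str) for one survey response."""
    if "thumbs" in response:                 # binary pass/fail
        return response["thumbs"] == "up", "binary"
    if "rating" in response:                 # 1-5 scale; >= 4 counts as satisfied
        return response["rating"] >= 4, "rating"
    text = response.get("text", "").lower()  # open-ended feedback
    frustrated = any(w in text for w in FRUSTRATION_WORDS)
    return not frustrated, "text"

print(normalize_response({"rating": 5}))                          # (True, 'rating')
print(normalize_response({"text": "Too slow and often wrong."}))  # (False, 'text')
```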

Solution 3: Tool usage analytics

Amplitude can natively track tool usage within agentic workflows. This provides meaningful insight into which tools are used most, in which order, and what impact that has on the workflow outcome and user satisfaction.

[Image: Tool usage analytics]
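A simple way to capture this is one event per tool invocation, with an index property that preserves call order within the run. The event and property names below are illustrative assumptions, not a prescribed schema.

```python
# Sketch: log tool calls inside an agentic run and summarize usage.
# "Agent Tool Called" and the property names are hypothetical.

from collections import Counter

def tool_call_events(run_id, tool_calls):
    """One event per tool invocation, preserving call order via tool_index."""
    return [
        {
            "event_type": "Agent Tool Called",
            "event_properties": {
                "run_id": run_id,
                "tool": name,
                "tool_index": i,        # position in the workflow
                "succeeded": ok,
            },
        }
        for i, (name, ok) in enumerate(tool_calls)
    ]

events = tool_call_events("run-7", [("search_flights", True),
                                    ("check_loyalty", True),
                                    ("book_flight", False)])
usage = Counter(e["event_properties"]["tool"] for e in events)
failed = [e["event_properties"]["tool"] for e in events
          if not e["event_properties"]["succeeded"]]
print(failed)  # ['book_flight']
```

With order and success captured per call, questions like "which tool most often precedes a failed run?" become ordinary funnel queries.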

Solution 4: Cost analytics

AI APIs also report token usage for each request. Amplitude offers prebuilt tooling to capture token usage per prompt and prebuilt reporting to monitor usage by feature or customer. Product teams can now optimize spend and ensure ROI on their AI vendor billing. Teams can now answer questions like:

  • What is the average cost per agent run by use case?
  • What is the cost impact of rolling out the latest LLM to our customer base?
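The underlying arithmetic is simple: multiply the token counts your AI API reports by your vendor's per-token rates. The sketch below uses made-up model names and per-1K-token prices purely for illustration; real rates come from your vendor's pricing page.

```python
# Sketch: turn per-request token usage into a cost figure that can be
# attached to an event as a property. Prices below are invented.

PRICE_PER_1K = {"small-model": (0.00015, 0.0006),   # (input, output) USD per 1K tokens
                "large-model": (0.0025, 0.0100)}

def request_cost(model, prompt_tokens, completion_tokens):
    in_rate, out_rate = PRICE_PER_1K[model]
    return round(prompt_tokens / 1000 * in_rate
                 + completion_tokens / 1000 * out_rate, 6)

cost = request_cost("large-model", prompt_tokens=1200, completion_tokens=400)
print(cost)  # 0.007
```

Logged per prompt, this property rolls up cleanly into cost-per-run, cost-per-use-case, and cost-per-customer reports.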

Challenge 2: LLMs are nondeterministic

It’s impossible to open up an LLM and see how it works. A nearly infinite combination of inputs (model selection, system prompts, chat messages, user context, available tools, model parameters) affects the quality, cost, and latency of the output, and the output isn’t predictable.

Solution 1: Experimentation

AI introduces a new universe of uncertainties and opportunities for optimization. Teams must choose between AI vendors, models, and system prompts, all of which lend themselves to feature flags and experiments. There are also new pricing and packaging alternatives to test, such as how many free credits to offer or which plans can access certain models.

With Amplitude Feature Experimentation, engineering teams can replace hard-coded parameters with remotely configured payloads, then run experiments to determine the impact of changes. Teams can now answer questions like: Which system prompt variant leads to the highest agent satisfaction rate? Does the delay from using a higher reasoning level produce more successful chats?

[Image: Experimentation]
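To illustrate the pattern of a remotely configured payload, here is a local sketch of deterministic variant bucketing for a system-prompt test. This is not Amplitude's experimentation SDK; the flag key, payloads, and hash-based split are all assumptions standing in for a remote evaluation service.

```python
# Sketch: serve a system prompt per experiment variant. Hash-based
# bucketing gives each user a stable 50/50 assignment.

import hashlib

VARIANTS = {
    "control":   {"system_prompt": "You are a helpful travel assistant."},
    "treatment": {"system_prompt": "You are a concise travel assistant. "
                                   "Confirm details before booking."},
}

def assign_variant(user_id, flag_key="system-prompt-test"):
    """Deterministic split: the same user always gets the same variant."""
    digest = hashlib.sha256(f"{flag_key}:{user_id}".encode()).hexdigest()
    return "treatment" if int(digest, 16) % 2 else "control"

variant = assign_variant("user-123")
prompt = VARIANTS[variant]["system_prompt"]
```

Because the prompt comes from a payload rather than a hard-coded string, swapping variants (or rolling one out to 100%) requires no code change.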

Solution 2: User context inputs via Profile API

LLMs have near-infinite general knowledge but no inherent contextual knowledge of a user’s state or goals. Imagine a support chatbot that doesn’t know what paid plan a user is on, or a flight-booking agent that doesn’t know a user’s loyalty program membership.

Amplitude’s User Profile API unites user data from CRM, CDP, DWH, in-app behavioral data (including action recency and frequency), and even Amplitude-powered propensity models. This blended data can be delivered seamlessly to anywhere in your stack for entry into a system prompt with minimal engineering lift.
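The last step of that flow is templating profile fields into the system prompt. In the sketch below, the profile dict stands in for a response from a profile store such as the User Profile API; the field names (`plan`, `days_since_active`, `churn_risk`) are hypothetical.

```python
# Sketch: inject user context from a profile lookup into a system prompt.

PROMPT_TEMPLATE = (
    "You are a support assistant.\n"
    "The user is on the {plan} plan, last active {days_since_active} days ago, "
    "with churn propensity {churn_risk}."
)

def contextualized_prompt(profile):
    # Fall back to safe defaults if a field is missing from the profile.
    return PROMPT_TEMPLATE.format(
        plan=profile.get("plan", "free"),
        days_since_active=profile.get("days_since_active", "unknown"),
        churn_risk=profile.get("churn_risk", "unknown"),
    )

prompt = contextualized_prompt(
    {"plan": "pro", "days_since_active": 2, "churn_risk": 0.12})
print("pro" in prompt)  # True
```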

Solution 3: User profile enrichment

Sophisticated product and data teams are also keen to use AI for more advanced use cases in the data warehouse. However, there are still challenges in collecting clean, reliable, up-to-date data to train these models. It’s also necessary to have tools in place to act on the model outputs.

Amplitude’s two-way integrations with leading warehousing providers (Snowflake, BigQuery, S3, and Databricks) can easily facilitate this. The output of an ML job run in the data warehouse can be imported back to Amplitude to enrich the user’s profile. Once back in Amplitude, that property can be used to target experiments, guides, and surveys. It can also be synced out to ad platforms, marketing automation and messaging tools, CRMs, and more for activation.
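The enrichment step amounts to mapping each warehouse output row onto a user-property update. The column and property names below are hypothetical; in practice a warehouse import pipeline performs this mapping, but the shape of the transform looks something like:

```python
# Sketch: map an ML-job output row from the warehouse onto a
# user-property update for profile enrichment. Names are invented.

def row_to_profile_update(row):
    return {
        "user_id": row["user_id"],
        "user_properties": {
            "predicted_ltv": row["predicted_ltv"],
            # Bucket the raw score so it is easy to segment on downstream.
            "churn_segment": "high" if row["churn_score"] > 0.7 else "low",
        },
    }

update = row_to_profile_update(
    {"user_id": "user-123", "predicted_ltv": 540.0, "churn_score": 0.82})
```

Bucketing the raw score at import time is a deliberate choice: segments like "high" are directly usable for targeting experiments, guides, and outbound syncs without further transformation.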

Challenge 3: Dependence on external systems

Product builders are now relying on external parties for critical user experiences. Engineering teams will always lose sleep over putting third-party API calls in the critical path of delivering an experience to a user, but for AI-powered features, there’s often no alternative.

Solution 1: Latency monitoring

Amplitude automatically tracks the latency between a user prompt and an AI response. This opens up opportunities to understand which segments of users or types of prompts are producing the biggest delays. Product teams can set up intelligent alerts to fire if SLAs are breached. These metrics can also inform experiments about model selection, AI vendor performance, or other changes to backend infrastructure.

Solution 2: Feature flags

If the worst should happen and a team notices a latency spike or an API endpoint becomes unresponsive, feature flags are critical as circuit breakers, enabling an engineering team to quickly swap to a backup infrastructure or vendor.
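The circuit-breaker pattern is small enough to sketch in full. Here the flag is a local dict purely for illustration; in practice the value comes from a remote flag service so it can be flipped without a deploy, and the fallback also fires automatically when the primary call fails.

```python
# Sketch: a feature-flag circuit breaker around an AI vendor call.

FLAGS = {"use-primary-llm": True}   # stand-in for a remote flag service

def call_primary(prompt):
    raise TimeoutError("primary vendor unresponsive")   # simulate an outage

def call_backup(prompt):
    return f"[backup] {prompt}"

def generate(prompt):
    if FLAGS["use-primary-llm"]:
        try:
            return call_primary(prompt)
        except TimeoutError:
            pass                    # fall through to the backup on failure
    return call_backup(prompt)

print(generate("hi"))  # [backup] hi
```

Flipping `use-primary-llm` to `False` routes all traffic to the backup immediately, which is exactly the cutover an on-call engineer wants during a vendor incident.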

So, is your analytics ready?

With the proliferation of AI, modern product teams need to blend qualitative and quantitative user experience data for a deeper understanding of their customers. They need to experiment on parameters, models, and infrastructure to react to users in real time. They need to collect user context and deliver it anywhere in the stack to create the most personalized experience yet.

Taken together, these solutions give product teams the ability to drive improvements to their AI-powered features and apps with full visibility into the impact on responsiveness, response quality, and cost.

The future of AI-first products is still very much in motion—much like the user journeys in AI products themselves. But these strategies, and the right tools to execute them, will help build a foundation that works on moving ground.

Teams that can ship code and learn using tight feedback loops will innovate faster and outperform their competition. Amplitude’s digital analytics platform is uniquely positioned to help teams build this new generation of AI-native products and experiences. Try it today with a free Starter account.

About the author

Ken Kutyn

Head of Solutions Engineering, APJ, Amplitude

Ken has 11 years of experience in the analytics, experimentation, and personalization space. Originally from Vancouver, Canada, he has lived and worked in London, Amsterdam, and San Francisco and is now based in Singapore. Ken has a passion for experimentation and data-driven decision-making and has spoken at several product development conferences. In his free time, he likes to travel around Southeast Asia with his family, bake bread, and explore the Singapore food scene.
