Why Context Engineering Matters More Than Prompt Engineering

Stop rewriting system prompts. Instead, structure data access and tool calls to mimic an analyst.
Insights

Dec 16, 2025

9 min read

Ram Soma

Staff AI Engineer

Most teams overestimate what prompt engineering can do. They tweak instructions, reformulate sentences, and write in all caps hoping that the model will suddenly become smarter. If your team has written and rewritten your prompt to the point of diminishing returns, you know what I mean.

But after working on AI at Amplitude, we discovered something counterintuitive: prompts weren’t the thing that improved quality. Context was.

That insight influenced how we built the rest of our system. It’s also critical for anyone trying to get reliable results from an LLM. If you’ve ever reached the point where prompt tweaks stop helping, this post will explain why and what you can do to improve your results.

Prompt engineering vs. context engineering

What’s the difference between prompt engineering and context engineering?

Prompt engineering is where most people start. You rewrite instructions, adjust the tone, add a few constraints, and hope the model follows them. This usually helps, but only up to a point.

Context engineering, on the other hand, tackles a different problem: which information and tools the model needs at the moment it’s making a decision. Good context engineering means being thoughtful about which data sources the model can query, which tools it can call, how the output from one step becomes input for the next, and which details should be kept out because they create noise rather than insight.

Once we understood that context was the primary driver of quality, we needed to make a series of choices about what context the model should and shouldn’t see. The four decisions below had the biggest impact on our results.

Decision #1: Creating tool abstractions that work like analysts

One of the first context decisions we faced was how to organize tool calls so that the model gets the right level of context at each step.

Providing the system with all of the context at once doesn’t work. Flooding the system with too much general data can lead to context rot, which degrades both performance and the quality of results. We took a different approach that mirrors how analysts process information.

Analysts reason sequentially, not all at once. Each step—detection, attribution, segmentation—depends on the context generated by the prior one. LLMs require the same structure. Instead of relying on prompts alone, we orchestrate tool calls that mirror this investigative workflow. Each tool’s output becomes targeted context for the next stage, enabling the model to progressively narrow in on the true driver behind a metric change.
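To make that concrete, here is a minimal sketch of what such an orchestration can look like. The tool names (`detect_anomaly`, `attribute_change`, `segment_users`) and the `call_tool` helper are hypothetical stand-ins, not Amplitude’s actual API; the point is the shape of the pipeline, where each step’s output becomes the next step’s targeted context.

```python
"""Minimal sketch of an analyst-style tool pipeline.

All tool names and the call_tool helper are hypothetical,
not Amplitude's actual API.
"""
from dataclasses import dataclass, field

def call_tool(name: str, **kwargs) -> str:
    """Stand-in for a real tool invocation (e.g. a query against product data)."""
    return f"<output of {name} given {kwargs}>"

@dataclass
class Investigation:
    question: str
    context: list[str] = field(default_factory=list)  # grows one step at a time

def investigate(question: str) -> Investigation:
    """Run detection -> attribution -> segmentation as sequential steps."""
    inv = Investigation(question)
    prior = question
    for tool in ("detect_anomaly", "attribute_change", "segment_users"):
        # Each tool sees only the targeted context produced by the prior
        # step, not the full conversation or the raw dataset.
        prior = call_tool(tool, context=prior)
        inv.context.append(prior)
    return inv

print(investigate("Why did weekly retention drop last Tuesday?").context)
```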

Decision #2: Restricting AI search to internal business context

The next decision we faced was whether or not to let the system search the public web.

Real-world events can influence product metrics, so the idea has merit. But the open internet creates far more noise than signal. If a model can pull in any news event, everything starts to look relevant. Instead of narrowing down potential causes, the model can overindex on external explanations that can’t be verified using product data.

So we drew a clear line: Amplitude's AI only uses data from inside Amplitude.

This wasn’t about limiting the model’s abilities. It was about keeping the context clean so the explanations stayed grounded. This is a textbook example of what context engineering really means: deciding what not to include so the system stays focused.
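In practice, a boundary like this lives in the tool layer rather than in prompt wording. Here is a hedged sketch of the idea, with an invented registry and invented tool names: the agent never sees a web-search tool, so it can never reach for unverifiable external explanations.

```python
# Hypothetical tool registry; not Amplitude's real implementation.
# The restriction lives in the tool set itself, not in the prompt.
INTERNAL_TOOLS = {
    "query_events",      # product event data
    "query_dashboards",  # saved charts and metrics
    "list_cohorts",      # user segments
}

def get_tool(name: str):
    if name not in INTERNAL_TOOLS:
        # "web_search" simply is not registered, so the model cannot
        # pull in external events it has no way to verify.
        raise LookupError(f"{name} is not an available tool")
    return lambda **kwargs: f"<{name} result for {kwargs}>"
```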

Decision #3: Building richer conversational context

For an analyst, business context is often subconscious and automatic. They’re aware of significant organizational events without even thinking about them. When they ask an AI a question, they automatically apply that knowledge to the results before asking the next one.

AI agents do not work the same way. They’re excellent at some parts of the analysis process (fetching data, finding patterns, automating routines) but they don’t have the same pulse on company context. That’s why analysts and agents get the best answers by collaborating in a back-and-forth conversational style.

With this approach, the AI is still doing the bulk of the grunt work to explore the data, but it is regularly coming back to the analyst for more context. The analyst still has control to steer the inquiry and can be sure that none of their valuable information is accidentally left out of the process.

Amplitude’s AI uses chat as an interface for this exact reason. It’s a choice we made with the intention of surfacing that subconscious analyst context frequently throughout an investigation.
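One way to picture this collaboration is an agent loop that can pause and ask the analyst a question instead of guessing. The sketch below is illustrative only: `agent_step` stands in for a real model call, and the rule inside it is made up.

```python
from typing import Literal, NamedTuple

class AgentAction(NamedTuple):
    kind: Literal["finding", "ask_analyst"]
    text: str

def agent_step(history: list[str]) -> AgentAction:
    # Stand-in for an LLM call over `history`; the heuristic is invented.
    if any("launch" in turn.lower() for turn in history):
        return AgentAction("finding", "The spike aligns with the campaign launch.")
    return AgentAction("ask_analyst", "Did anything ship or launch on that date?")

def run_chat(question: str) -> str:
    history = [question]
    while True:
        action = agent_step(history)
        if action.kind == "finding":
            return action.text
        # Pause the investigation and pull in the analyst's business
        # context on demand, so it is never accidentally left out.
        history.append(input(f"Agent asks: {action.text}\n> "))
```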

Decision #4: Managing context size

As our agent matured, we started to see a different kind of problem: it got worse at following instructions as the context increased.

Our agent systematically applies a series of analytical steps to the data. Each step adds more information to the conversation—queries, results, intermediate summaries, and follow-up instructions. Over time, the model accumulates more context than it can effectively analyze.

The information load started damaging our AI’s performance. It would precisely follow steps at the beginning of a chain, but falter closer to the end. Sometimes it ignored, or only partially followed, instructions buried in too much surrounding information. The contextual noise eventually got too loud for the model to function properly.

We found two effective ways to solve this problem:

  • Dynamic context windows. We dynamically switch to larger context windows (up to 1M tokens) only when a task truly requires it, instead of always operating at a maximum window size.
  • Explicit planning and step tracking. We experimented with getting the agent to plan its steps up front and track its progress as it moves through them, giving the model a clearer scaffold to follow, even as the surrounding context grew.

These solutions are examples of context engineering, not prompt tweaking. We adjusted the way context was organized and delivered to the model so it could reliably stay focused on the right things.
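As a rough illustration of both mitigations, here is a sketch with invented thresholds, model limits, and plan steps, not Amplitude’s actual configuration. The idea is that the window size is chosen per task, and the plan gives the model a compact, persistent scaffold it can check off regardless of how much intermediate output accumulates.

```python
# Both mitigations in one illustrative sketch; all numbers are made up.
DEFAULT_WINDOW = 128_000
LARGE_WINDOW = 1_000_000   # switched on only when a task truly needs it

def pick_window(estimated_tokens: int) -> int:
    """Dynamic context windows: escalate only past the default budget."""
    return LARGE_WINDOW if estimated_tokens > DEFAULT_WINDOW else DEFAULT_WINDOW

# Explicit planning and step tracking: the agent writes this plan up front
# and checks items off, so instructions stay anchored even as intermediate
# results pile up in the conversation.
plan = [
    {"step": "detect the metric change", "done": False},
    {"step": "attribute it to an event property", "done": False},
    {"step": "segment the affected users", "done": False},
]

def next_step(plan: list[dict]) -> str | None:
    pending = [p["step"] for p in plan if not p["done"]]
    return pending[0] if pending else None
```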

Evals are the backbone of context engineering

All of these decisions only worked because we treated evals as the backbone of our context engineering process. Instead of arguing about prompts or trusting our intuition, we used a robust evaluation set to measure whether each context change actually improved outcomes. When something failed, evals pinpointed the failure. The model was almost always missing a key piece of information, calling the wrong tool, or getting important context too late.

This evals-driven development loop is what made us confident in our choices. We could refactor tool calls, tighten or expand context, and reorganize steps knowing the evals would immediately tell us if we were moving in the right direction—or breaking something that used to work.

That’s the real key to context engineering. Without strong evals, it’s guesswork. With them, it becomes an iterative, reliable way to improve quality.
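A minimal version of such an eval loop can be as simple as a fixed case set and a pass rate, measured before and after every context change. Everything below, including the cases and the keyword-based scorer, is illustrative rather than our production harness.

```python
# Illustrative eval harness; the cases and scoring rule are made up.
EVAL_SET = [
    {"question": "Why did signups drop on 2025-06-03?",
     "must_mention": ["onboarding", "ios"]},
    {"question": "What drove last week's checkout spike?",
     "must_mention": ["promo"]},
]

def passes(answer: str, must_mention: list[str]) -> bool:
    return all(term in answer.lower() for term in must_mention)

def run_evals(agent) -> float:
    """`agent` is any callable from question string -> answer string."""
    hits = sum(passes(agent(c["question"]), c["must_mention"]) for c in EVAL_SET)
    return hits / len(EVAL_SET)

# Compare pass rates across a context change:
#   baseline = run_evals(old_agent)
#   candidate = run_evals(new_agent)
```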

Context engineering is the key to scalable, reliable AI

Prioritizing context for Amplitude’s AI system didn’t just improve quality; it made the system easier to evolve. When we introduce a new tool, we don’t rewrite prompts from scratch. We define what the tool does, decide where it belongs in the workflow, and update our evals to reflect that new capability. When something breaks, we can debug by inspecting context and steps instead of guessing which phrase in the prompt stopped working.

The same principle applies to anyone trying to get reliable, high-quality behavior from an LLM. Instead of endlessly tuning instructions, you should invest in the model’s operating environment: the information it sees, the tools it can use, the sequence it follows, and the evals that keep you honest. When you engineer context with intention, you get a system that’s more stable, more predictable, and far easier to extend as your use cases grow. If you’ve hit the ceiling on prompt tuning, context engineering is how you break through it.

About the author

Ram Soma

Staff AI Engineer

Ram Soma is a Staff AI Engineer at Amplitude, leading various AI initiatives across the company. With a background in data science and machine learning engineering, he loves partnering with Amplitude’s passionate community of PMs, analysts, and data professionals, using AI to make their experience even more delightful and productive.

Topics: AI, Analytics, Product Analytics

