Why Context Engineering Matters More Than Prompt Engineering
Stop rewriting system prompts. Instead, structure data access and tool calls to mimic an analyst.
Most teams overestimate what prompt engineering can do. They tweak instructions, reformulate sentences, and write in all caps hoping that the model will suddenly become smarter. If your team has written and rewritten your prompt to the point of diminishing returns, you know what I mean.
But after working on AI at Amplitude, we discovered something counterintuitive: prompts weren’t the thing that improved quality. Context was.
That insight influenced how we built the rest of our system. It’s also critical for anyone trying to get reliable results from an LLM. If you’ve ever reached the point where prompt tweaks stop helping, this post will explain why and what you can do to improve your results.
Prompt engineering vs. context engineering
What’s the difference between prompt engineering and context engineering?
Prompt engineering is where most people start. You rewrite instructions, adjust the tone, add a few constraints, and hope the model follows them. This usually helps, but only up to a point.
Context engineering, on the other hand, tackles a different problem: which information and tools the model needs at the moment it makes a decision. Good context engineering means being thoughtful about which data sources the model can query, which tools it can call, how the output from one step becomes input for the next, and which details should be kept out because they create noise rather than insight.
Once we understood that context was the primary driver of quality, we needed to make a series of choices about what context the model should and shouldn’t see. The following four decisions had the biggest impact on our results.
Decision #1: Creating tool abstractions that work like analysts
One of the first context decisions we faced was how to organize tool calls so the model received the right level of context at each step.
Providing the system with all of the context all at once doesn’t work. Flooding the system with too much general data can lead to context rot, which negatively impacts performance and decreases the quality of results. We took a different approach that mirrors how analysts process information.
Analysts reason sequentially, not all at once. Each step—detection, attribution, segmentation—depends on the context generated by the prior one. LLMs require the same structure. Instead of relying on prompts alone, we orchestrate tool calls that mirror this investigative workflow. Each tool’s output becomes targeted context for the next stage, enabling the model to progressively narrow in on the true driver behind a metric change.
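To make the pattern concrete, here is a minimal sketch of that kind of sequential orchestration. Every function name and data shape below is an illustrative stand-in, not Amplitude’s actual tooling; the stubs exist only so the control flow is runnable.

```python
# Illustrative sketch of sequential tool orchestration. Each "tool" is a
# stub standing in for a real query against product analytics data.

def detect_change(metric_id: str) -> dict:
    # Detection: when did the metric move, and by how much?
    return {"metric": metric_id, "window": "2025-12-01/2025-12-03", "delta": -0.18}

def attribute_change(metric_id: str, window: str) -> list[dict]:
    # Attribution: which dimensions best explain the move in that window?
    return [{"dimension": "platform", "value": "android", "share": 0.72}]

def segment_impact(metric_id: str, dimension: str) -> list[dict]:
    # Segmentation: which user segments within that dimension were hit hardest?
    return [{"segment": "android / v9.2", "delta": -0.31}]

def investigate(metric_id: str) -> dict:
    # Each step consumes only the previous step's output, so the model's
    # context stays narrow and targeted instead of holding all raw data.
    anomaly = detect_change(metric_id)
    causes = attribute_change(metric_id, window=anomaly["window"])
    segments = segment_impact(metric_id, dimension=causes[0]["dimension"])
    return {"anomaly": anomaly, "causes": causes, "segments": segments}

print(investigate("checkout_conversion"))
```

The important design choice is the data flow: detection output scopes attribution, and attribution output scopes segmentation, the same way an analyst narrows an investigation one question at a time.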
Decision #2: Restricting AI search to internal business context
The next decision we faced was whether to let the system search the public web.
Real-world events can influence product metrics, so the idea is tempting. But the internet creates far more noise than signal. If a model can pull in any news event, everything starts to look relevant. Instead of narrowing down potential causes, the model can over-index on external explanations that can’t be verified against product data.
So we drew a clear line and restricted Amplitude’s AI to data that lives inside Amplitude.
This wasn’t about limiting the model’s abilities. It was about keeping the context clean so the explanations stayed grounded. This is a textbook example of what context engineering really means: deciding what not to include so the system stays focused.
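Enforcing that line is simpler than it sounds when it happens at the tool layer. The sketch below is hypothetical (the tool names are invented for illustration, not Amplitude’s actual tool set), but it shows the idea: a web search tool is never registered, so it can never pollute the context.

```python
# Hypothetical sketch of an internal-data-only tool boundary.
# Tool names are illustrative, not Amplitude's actual tool set.

INTERNAL_TOOLS = {
    "query_events": "Run a query against product event data",
    "list_cohorts": "List saved user cohorts",
    "get_metric_definition": "Fetch how a metric is defined",
}

def resolve_tool(name: str) -> str:
    # Anything outside the allowlist (e.g. "web_search") is rejected
    # before it can ever add unverifiable context to the conversation.
    if name not in INTERNAL_TOOLS:
        raise PermissionError(f"Tool '{name}' unavailable: internal data only.")
    return INTERNAL_TOOLS[name]
```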
Decision #3: Building richer conversational context
For an analyst, business context is often subconscious and automatic. They’re aware of significant organizational events without even thinking about them, and when they ask AI a question, they apply that knowledge to the results before asking the next one.
AI agents do not work the same way. They’re excellent at some parts of the analysis process (fetching data, finding patterns, automating routines) but they don’t have the same pulse on company context. That’s why analysts and agents get the best answers by collaborating in a back-and-forth conversational style.
With this approach, the AI is still doing the bulk of the grunt work to explore the data, but it is regularly coming back to the analyst for more context. The analyst still has control to steer the inquiry and can be sure that none of their valuable information is accidentally left out of the process.
Amplitude’s AI uses chat as an interface for this exact reason. It’s a choice we made with the intention of surfacing that subconscious analyst context frequently throughout an investigation.
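One way to implement that back-and-forth is to give the agent an explicit tool for asking the analyst a question. The sketch below is an assumption about how such a loop could look, not Amplitude’s implementation; it reads the reply from stdin just to stay runnable.

```python
# Minimal sketch of an "ask the analyst" tool: missing business context
# becomes a chat question instead of a silent guess. Illustrative only.

def ask_analyst(question: str) -> str:
    # In a chat UI this would render a message and await a reply;
    # stdin keeps the sketch self-contained.
    return input(f"Agent: {question}\nAnalyst: ")

def explain_dip(metric: str) -> str:
    # The agent does the grunt work of exploring the data, but surfaces
    # gaps in organizational context back to the human.
    org_context = ask_analyst(
        f"Did anything ship around the time {metric} dipped "
        "(releases, campaigns, pricing changes)?"
    )
    return f"Continuing the investigation with analyst context: {org_context!r}"
```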
Decision #4: Managing context size
As our agent matured, we started to see a different kind of problem: it got worse at following instructions as its context grew.
Our agent systematically applies a series of analytical steps to the data. Each step adds more information to the conversation: queries, results, intermediate summaries, and follow-up instructions. Over time, the model accumulates more context than it can analyze effectively.
The information load started to damage our AI’s performance. It followed steps precisely at the beginning of a chain, for example, but faltered toward the end. Sometimes it ignored, or only partially followed, instructions that were buried under too much information. The contextual noise eventually got too loud for the model to function properly.
We found two effective ways to solve this problem:
- Dynamic context windows. We dynamically switch to larger context windows (up to 1M tokens) only when a task truly requires it, instead of always operating at a maximum window size.
- Explicit planning and step tracking. We experimented with getting the agent to plan its steps up front and track its progress as it moves through them, giving the model a clearer scaffold to follow even as the surrounding context grew. A sketch of both ideas follows this list.
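Here is a hypothetical sketch of both mitigations. The model names and token threshold are assumptions made for illustration, not Amplitude’s actual configuration.

```python
# Hypothetical sketch of both mitigations; model names and the token
# threshold are invented for illustration.

def pick_model(estimated_tokens: int) -> str:
    # Dynamic context windows: escalate to a large-window model only
    # when the task genuinely needs it.
    if estimated_tokens <= 180_000:
        return "standard-200k-model"
    return "long-context-1m-model"

# Explicit planning and step tracking: the agent writes its plan up
# front and checks off steps as it goes.
plan = [
    {"step": "detect anomaly window", "done": True},
    {"step": "attribute change to a dimension", "done": False},
    {"step": "segment affected users", "done": False},
]

def render_plan() -> str:
    # Re-injected into the prompt on every turn, so the scaffold stays
    # visible even when earlier instructions are buried in long context.
    return "\n".join(f"[{'x' if s['done'] else ' '}] {s['step']}" for s in plan)

print(render_plan())
```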
These solutions are examples of context engineering, not prompt tweaking. We adjusted the way context was organized and delivered to the model so it could reliably stay focused on the right things.
Evals are the backbone of context engineering
All of these decisions only worked because we treated evals as the backbone of our context engineering process. Instead of arguing about prompts or trusting our intuition, we used a robust evaluation set to measure whether each context change actually improved outcomes. When something failed, evals pinpointed the failure. The model was almost always missing a key piece of information, calling the wrong tool, or getting important context too late.
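As a concrete illustration, an eval harness for this kind of agent can be very small. The sketch below assumes a run_agent function and a hand-labeled eval set; both are hypothetical stand-ins, not our production harness.

```python
# Minimal sketch of an evals loop. run_agent and the eval set are
# hypothetical stand-ins, not a real production harness.

EVAL_SET = [
    {"question": "Why did signups drop last week?",
     "must_mention": ["android", "v9.2"]},
    {"question": "What drove the checkout lift?",
     "must_mention": ["promo cohort"]},
]

def run_evals(run_agent) -> float:
    passed = 0
    for case in EVAL_SET:
        answer = run_agent(case["question"]).lower()
        # A context change ships only if it doesn't regress these checks.
        if all(term in answer for term in case["must_mention"]):
            passed += 1
        else:
            print(f"FAIL: {case['question']}")
    return passed / len(EVAL_SET)
```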
This evals-driven development loop is what made us confident in our choices. We could refactor tool calls, tighten or expand context, and reorganize steps knowing the evals would immediately tell us if we were moving in the right direction—or breaking something that used to work.
That’s the real key to context engineering. Without strong evals, it’s guesswork. With them, it becomes an iterative, reliable way to improve quality.
Context engineering is the key to scalable, reliable AI
Prioritizing context for Amplitude’s AI system didn’t just improve quality; it made the system easier to evolve. When we introduce a new tool, we don’t rewrite prompts from scratch. We define what the tool does, decide where it belongs in the workflow, and update our evals to reflect the new capability. When something breaks, we can debug by inspecting context and steps instead of guessing which phrase in the prompt stopped working.
The same principle applies to anyone trying to get reliable, high-quality behavior from an LLM. Instead of endlessly tuning instructions, you should invest in the model’s operating environment: the information it sees, the tools it can use, the sequence it follows, and the evals that keep you honest. When you engineer context with intention, you get a system that’s more stable, more predictable, and far easier to extend as your use cases grow. If you’ve hit the ceiling on prompt tuning, context engineering is how you break through it.

Ram Soma
Staff AI Engineer
Ram Soma is a Staff AI Engineer at Amplitude, leading various AI initiatives across the company. With a background in data science and machine learning engineering, he loves partnering with Amplitude’s passionate community of PMs, analysts, and data professionals, using AI to make their experience even more delightful and productive.