Why Context Engineering Matters More Than Prompt Engineering
Stop rewriting system prompts. Instead, structure data access and tool calls to mimic an analyst.
Most teams overestimate what prompt engineering can do. They tweak instructions, reformulate sentences, and write in all caps hoping that the model will suddenly become smarter. If your team has written and rewritten your prompt to the point of diminishing returns, you know what I mean.
But after working on AI at Amplitude, we discovered something counterintuitive: prompts weren’t the thing that improved quality. Context was.
That insight influenced how we built the rest of our system. It’s also critical for anyone trying to get reliable results from an LLM. If you’ve ever reached the point where prompt tweaks stop helping, this post will explain why and what you can do to improve your results.
Prompt engineering vs. context engineering
What’s the difference between prompt engineering and context engineering?
Prompt engineering is where most people start. You rewrite instructions, adjust the tone, add a few constraints, and hope the model follows them. This usually helps, but only up to a point.
Context engineering, on the other hand, tackles a different problem: which information and tools the model needs at the moment it makes a decision. Good context engineering means being thoughtful about which data sources the model can query, which tools it can call, how the output from one step becomes input for the next, and which details should be kept out because they create noise rather than insight.
Once we understood that context was the primary driver of quality, we needed to make a series of choices about what context the model should and shouldn’t see. The following four decisions had the biggest impact on our results.
Decision #1: Creating tool abstractions that work like analysts
One of the first context decisions we faced was how to organize tool calls so the model received the right level of context at each step.
Providing the system with all of the context all at once doesn’t work. Flooding the system with too much general data can lead to context rot, which negatively impacts performance and decreases the quality of results. We took a different approach that mirrors how analysts process information.
Analysts reason sequentially, not all at once. Each step—detection, attribution, segmentation—depends on the context generated by the prior one. LLMs require the same structure. Instead of relying on prompts alone, we orchestrate tool calls that mirror this investigative workflow. Each tool’s output becomes targeted context for the next stage, enabling the model to progressively narrow in on the true driver behind a metric change.
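To make the pattern concrete, here is a minimal sketch of that kind of sequential orchestration. Every function name and data shape below is an illustrative stand-in, not Amplitude’s actual tooling; the stubs exist only so the control flow is runnable.

```python
# Illustrative sketch of sequential tool orchestration. Each "tool" is a
# stub standing in for a real query against product analytics data.

def detect_change(metric_id: str) -> dict:
    # Detection: when did the metric move, and by how much?
    return {"metric": metric_id, "window": "2025-12-01/2025-12-03", "delta": -0.18}

def attribute_change(metric_id: str, window: str) -> list[dict]:
    # Attribution: which dimensions best explain the move in that window?
    return [{"dimension": "platform", "value": "android", "share": 0.72}]

def segment_impact(metric_id: str, dimension: str) -> list[dict]:
    # Segmentation: which user segments within that dimension were hit hardest?
    return [{"segment": "android / v9.2", "delta": -0.31}]

def investigate(metric_id: str) -> dict:
    # Each step consumes only the previous step's output, so the model's
    # context stays narrow and targeted instead of holding all raw data.
    anomaly = detect_change(metric_id)
    causes = attribute_change(metric_id, window=anomaly["window"])
    segments = segment_impact(metric_id, dimension=causes[0]["dimension"])
    return {"anomaly": anomaly, "causes": causes, "segments": segments}

print(investigate("checkout_conversion"))
```

The important design choice is the data flow: detection output scopes attribution, and attribution output scopes segmentation, the same way an analyst narrows an investigation one question at a time.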
Decision #2: Restricting AI search to internal business context
The next decision we faced was whether to let the system search the public web.
Real-world events can influence product metrics, so the idea is tempting. But the internet creates far more noise than signal. If a model can pull in any news event, everything starts to look relevant. Instead of narrowing down potential causes, the model can over-index on external explanations that can’t be verified against product data.
So we drew a clear line and restricted Amplitude’s AI to data that lives inside Amplitude.
This wasn’t about limiting the model’s abilities. It was about keeping the context clean so the explanations stayed grounded. This is a textbook example of what context engineering really means: deciding what not to include so the system stays focused.
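Enforcing that line is simpler than it sounds when it happens at the tool layer. The sketch below is hypothetical (the tool names are invented for illustration, not Amplitude’s actual tool set), but it shows the idea: a web search tool is never registered, so it can never pollute the context.

```python
# Hypothetical sketch of an internal-data-only tool boundary.
# Tool names are illustrative, not Amplitude's actual tool set.

INTERNAL_TOOLS = {
    "query_events": "Run a query against product event data",
    "list_cohorts": "List saved user cohorts",
    "get_metric_definition": "Fetch how a metric is defined",
}

def resolve_tool(name: str) -> str:
    # Anything outside the allowlist (e.g. "web_search") is rejected
    # before it can ever add unverifiable context to the conversation.
    if name not in INTERNAL_TOOLS:
        raise PermissionError(f"Tool '{name}' unavailable: internal data only.")
    return INTERNAL_TOOLS[name]
```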
Decision #3: Building richer conversational context
For an analyst, business context is often subconscious and automatic. They’re aware of significant organizational events without even thinking about them, and when they ask AI a question, they apply that knowledge to the results before asking the next one.
AI agents do not work the same way. They’re excellent at some parts of the analysis process (fetching data, finding patterns, automating routines) but they don’t have the same pulse on company context. That’s why analysts and agents get the best answers by collaborating in a back-and-forth conversational style.
With this approach, the AI is still doing the bulk of the grunt work to explore the data, but it is regularly coming back to the analyst for more context. The analyst still has control to steer the inquiry and can be sure that none of their valuable information is accidentally left out of the process.
Amplitude’s AI uses chat as an interface for this exact reason. It’s a choice we made with the intention of surfacing that subconscious analyst context frequently throughout an investigation.
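One way to implement that back-and-forth is to give the agent an explicit tool for asking the analyst a question. The sketch below is an assumption about how such a loop could look, not Amplitude’s implementation; it reads the reply from stdin just to stay runnable.

```python
# Minimal sketch of an "ask the analyst" tool: missing business context
# becomes a chat question instead of a silent guess. Illustrative only.

def ask_analyst(question: str) -> str:
    # In a chat UI this would render a message and await a reply;
    # stdin keeps the sketch self-contained.
    return input(f"Agent: {question}\nAnalyst: ")

def explain_dip(metric: str) -> str:
    # The agent does the grunt work of exploring the data, but surfaces
    # gaps in organizational context back to the human.
    org_context = ask_analyst(
        f"Did anything ship around the time {metric} dipped "
        "(releases, campaigns, pricing changes)?"
    )
    return f"Continuing the investigation with analyst context: {org_context!r}"
```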
Decision #4: Managing context size
As our agent matured, we started to see a different kind of problem: it got worse at following instructions as its context grew.
Our agent systematically applies a series of analytical steps to the data. Each step adds more information to the conversation: queries, results, intermediate summaries, and follow-up instructions. Over time, the model accumulates more context than it can analyze effectively.
The information load started to damage our AI’s performance. It followed steps precisely at the beginning of a chain, for example, but faltered toward the end. Sometimes it ignored, or only partially followed, instructions that were buried under too much information. The contextual noise eventually got too loud for the model to function properly.
We found two effective ways to solve this problem:
- Dynamic context windows. We dynamically switch to larger context windows (up to 1M tokens) only when a task truly requires it, instead of always operating at a maximum window size.
- Explicit planning and step tracking. We experimented with getting the agent to plan its steps up front and track its progress as it moves through them, giving the model a clearer scaffold to follow even as the surrounding context grew. A sketch of both ideas follows this list.
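Here is a hypothetical sketch of both mitigations. The model names and token threshold are assumptions made for illustration, not Amplitude’s actual configuration.

```python
# Hypothetical sketch of both mitigations; model names and the token
# threshold are invented for illustration.

def pick_model(estimated_tokens: int) -> str:
    # Dynamic context windows: escalate to a large-window model only
    # when the task genuinely needs it.
    if estimated_tokens <= 180_000:
        return "standard-200k-model"
    return "long-context-1m-model"

# Explicit planning and step tracking: the agent writes its plan up
# front and checks off steps as it goes.
plan = [
    {"step": "detect anomaly window", "done": True},
    {"step": "attribute change to a dimension", "done": False},
    {"step": "segment affected users", "done": False},
]

def render_plan() -> str:
    # Re-injected into the prompt on every turn, so the scaffold stays
    # visible even when earlier instructions are buried in long context.
    return "\n".join(f"[{'x' if s['done'] else ' '}] {s['step']}" for s in plan)

print(render_plan())
```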
These solutions are examples of context engineering, not prompt tweaking. We adjusted the way context was organized and delivered to the model so it could reliably stay focused on the right things.
Evals are the backbone of context engineering
All of these decisions only worked because we treated evals as the backbone of our context engineering process. Instead of arguing about prompts or trusting our intuition, we used a robust evaluation set to measure whether each context change actually improved outcomes. When something failed, evals pinpointed the failure. The model was almost always missing a key piece of information, calling the wrong tool, or getting important context too late.
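As a concrete illustration, an eval harness for this kind of agent can be very small. The sketch below assumes a run_agent function and a hand-labeled eval set; both are hypothetical stand-ins, not our production harness.

```python
# Minimal sketch of an evals loop. run_agent and the eval set are
# hypothetical stand-ins, not a real production harness.

EVAL_SET = [
    {"question": "Why did signups drop last week?",
     "must_mention": ["android", "v9.2"]},
    {"question": "What drove the checkout lift?",
     "must_mention": ["promo cohort"]},
]

def run_evals(run_agent) -> float:
    passed = 0
    for case in EVAL_SET:
        answer = run_agent(case["question"]).lower()
        # A context change ships only if it doesn't regress these checks.
        if all(term in answer for term in case["must_mention"]):
            passed += 1
        else:
            print(f"FAIL: {case['question']}")
    return passed / len(EVAL_SET)
```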
This evals-driven development loop is what made us confident in our choices. We could refactor tool calls, tighten or expand context, and reorganize steps knowing the evals would immediately tell us if we were moving in the right direction—or breaking something that used to work.
That’s the real key to context engineering. Without strong evals, it’s guesswork. With them, it becomes an iterative, reliable way to improve quality.
Context engineering is the key to scalable, reliable AI
Prioritizing context for Amplitude’s AI system didn’t just improve quality; it made the system easier to evolve. When we introduce a new tool, we don’t rewrite prompts from scratch. We define what the tool does, decide where it belongs in the workflow, and update our evals to reflect the new capability. When something breaks, we can debug by inspecting context and steps instead of guessing which phrase in the prompt stopped working.
The same principle applies to anyone trying to get reliable, high-quality behavior from an LLM. Instead of endlessly tuning instructions, you should invest in the model’s operating environment: the information it sees, the tools it can use, the sequence it follows, and the evals that keep you honest. When you engineer context with intention, you get a system that’s more stable, more predictable, and far easier to extend as your use cases grow. If you’ve hit the ceiling on prompt tuning, context engineering is how you break through it.

Ram Soma
Staff AI Engineer
Ram Soma is a Staff AI Engineer at Amplitude, leading various AI initiatives across the company. With a background in data science and machine learning engineering, he loves partnering with Amplitude’s passionate community of PMs, analysts, and data professionals, using AI to make their experience even more delightful and productive.