K.I.S.S(Keep It Static, Stupid): System prompt ft. caching

The 3 most important things that control what happens to the cache are:

Tool Defs
Systemp Prompts
Messages

Explicitly in this order! Atleast confirmed for claude and gpt models.

The gemini docs are not very clear on this, but I expect(and have seen it in practice) to reasonably follow the same order.

There are other factors as well, like changing the thinking params, budget, etc which can break the cache(& also varies across providers a bit). But I want to focus on system prompt for this post!

How to structure your system prompt for optimal caching

Very simple, keep everything static :)

Easier said than done. In most realistic cases, you might want to add dynamic information in your system prompt for various reasons. Could be short term memory, user details, date/time, etc. The main idea is to keep as few dynamic things as possible in your sys prompt and whatever you keep, has to be in the end.

Wait, if we can keep dynamic things at the end, why should we limit ourselves?

Well, the more things you put, the surface area of those things changing will increase. The higher chances of bugs creeping in and each one of those bugs will cost you, since each time the system prompt cache breaks, it will invalidate the complete chat history that comes after it. So, while it's a pattern most apps follow, you should be extremely concious about what dynamic parts go there and how often it's changed.

TIP: In claude you can send your system prompts in multiple parts(it's an array), so break your prompts into 2 parts, and use a cache breakpoint with the staic part. It's not possible to do this in openai/gemini.

If not system prompt, then where?

Depending upon your usecase, there can be a lot of things you would want to serve upfront OR important enough to be put in system prompt. Use tags instead!

Popularized by folks peeking under claude code intially. it's a life saver. I would even say first try to solve your problem using system-tags, if you can't, try harder, if you still cant, put it in your sys prompt.

Quoting @trq212 from this really nice post

Use Messages for Updates

There may be times when the information you put in your prompt becomes out of date, for example if you have the time or if the user changes a file. It may be tempting to update the prompt, but that would result in a cache miss and could end up being quite expensive for the user.

Consider if you can pass in this information via messages in the next turn instead. In Claude Code, we add a tag in the next user message or tool result with the updated information for the model (e.g. it is now Wednesday), which helps preserve the cache.

I have few dynamic parts in my prompt, now what?

Well, now that you've thought about it and you MUST have these, it's time to think about whats the best way to update it. The answer is simple, "As few times as possible".

Again this is very usecase & provider dependent, but there are few common patterns which everyone follows which you MUST. Other than this, be creative and find a way to do this for your case.

System prompt must not change across steps in a given turn

In Agents(esp long running ones), make sure the systemp prompt doesn't change at any of the steps in the middle of a turn. All good projects do this, some even go a step ahead and cache it for that session itself! It's all about trade-offs here. Make the best one for your case.

2. Think around your cache TTL

You need to consider this and plan your updates accordingly. Different providers have different TTLs. You need to be a little creative here, but this also helps.

The core idea remains the same, update is as few times as possible and since it's strings we are dealing with at the end of the day, ensure your prompts have no funny buisness going on. Even a byte of change will invalidate your cache.

How do I track this?

It's software, there will be bugs; Mythos or no Mythos. What you need is observability around this to ensure this doesn't happen and if it does you know what and how so you can fix it. A very simple way to track is simply by tracking the length of your system prompts. If this changes mid request, across requests/turns(under the TTL), you have a bug.

There are overall metrics you should track too, like your cache hit rates and what not, but these simpler metrics like the length of your system prompts are very useful as well.

This is it for now, in later posts I'll try to tackle tool defs. If I missed something or there are more creative ways others are doing this, please LET ME KNOW! I would love to read about it✨