Prompt Caching with Claude: Faster, Cheaper AI Interactions

What is Prompt Caching?

Prompt caching enables developers to cache frequently used context between API calls, giving Claude access to more background knowledge while reducing cost and latency.

  • Up to 90% cost reduction
  • Up to 85% latency reduction
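
In practice, you mark the reusable part of a prompt with a cache breakpoint. Below is a minimal sketch using the Anthropic Python SDK and the beta header from the launch documentation; the model ID and the instructions file are placeholders, and details may change while the feature is in public beta.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# A large, stable block of instructions reused verbatim across many calls.
# (Cached blocks must meet a minimum token count, so short prompts won't cache.)
BIG_SYSTEM_PROMPT = open("instructions.txt").read()

response = client.messages.create(
    model="claude-3-5-sonnet-20240620",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": BIG_SYSTEM_PROMPT,
            # Everything up to this breakpoint is written to the cache on the
            # first call and read back at the cheaper rate on later calls.
            "cache_control": {"type": "ephemeral"},
        }
    ],
    messages=[{"role": "user", "content": "Summarize the key policies."}],
    # During the public beta, caching is enabled via this beta header.
    extra_headers={"anthropic-beta": "prompt-caching-2024-07-31"},
)

# usage reports cache_creation_input_tokens / cache_read_input_tokens,
# showing whether this call wrote to or read from the cache.
print(response.usage)
print(response.content[0].text)
```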

When to Use Prompt Caching

  • 🗨️ Conversational agents
  • 💻 Coding assistants
  • 📄 Large document processing
  • 📝 Detailed instruction sets
  • 🔍 Agentic search and tool use
  • 📚 Talk to books and long-form content (see the sketch after this list)
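
For the "talk to books" case, the entire text can sit behind a single cache breakpoint, so the first question pays the cache-write rate and every follow-up pays only the cache-read rate. A hypothetical sketch (the file name, model ID, and questions are invented for illustration):

```python
import anthropic

client = anthropic.Anthropic()
BETA_HEADER = {"anthropic-beta": "prompt-caching-2024-07-31"}

book = open("pride_and_prejudice.txt").read()  # large, stable context

def ask(question: str) -> str:
    response = client.messages.create(
        model="claude-3-5-sonnet-20240620",
        max_tokens=512,
        system=[
            # The book is identical on every call, so it is cached once and
            # then reused; only the short question below is new input.
            {"type": "text", "text": book, "cache_control": {"type": "ephemeral"}},
        ],
        messages=[{"role": "user", "content": question}],
        extra_headers=BETA_HEADER,
    )
    return response.content[0].text

print(ask("Who is Mr. Darcy?"))             # first call: writes the cache
print(ask("Summarize the first chapter."))  # later calls: read the cache
```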

Performance Improvements

Use case                  Latency w/o caching   Latency w/ caching   Cost reduction
Chat with a book          11.5s                 2.4s (-79%)          -90%
Many-shot prompting       1.6s                  1.1s (-31%)          -86%
Multi-turn conversation   ~10s                  ~2.5s (-75%)         -53%

Pricing for Cached Prompts

Claude 3.5 Sonnet

Our most intelligent model

  • Input: $3 / MTok
  • Cache write: $3.75 / MTok
  • Cache read: $0.30 / MTok
  • Output: $15 / MTok

Claude 3 Opus

Powerful model for complex tasks

  • Input: $15 / MTok
  • Cache write: $18.75 / MTok
  • Cache read: $1.50 / MTok
  • Output: $75 / MTok

Claude 3 Haiku

Fastest, most cost-effective model

  • Input: $0.25 / MTok
  • Cache write: $0.30 / MTok
  • Cache read: $0.03 / MTok
  • Output: $1.25 / MTok
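
To see how the cache-write premium and cache-read discount net out, here is a back-of-the-envelope calculation using the Claude 3.5 Sonnet rates above. The token counts and call volume are invented for illustration, and it assumes every call after the first reuses the cache within its lifetime, so no re-writes are needed.

```python
# Claude 3.5 Sonnet rates from the table above, in $ per million input tokens.
INPUT, CACHE_WRITE, CACHE_READ = 3.00, 3.75, 0.30

prefix_tokens = 100_000  # shared context sent with every request (illustrative)
calls = 50               # requests that reuse the same prefix (illustrative)

without_cache = calls * prefix_tokens / 1e6 * INPUT
with_cache = (prefix_tokens / 1e6 * CACHE_WRITE          # first call writes the cache
              + (calls - 1) * prefix_tokens / 1e6 * CACHE_READ)  # the rest read it

print(f"without caching: ${without_cache:.2f}")                  # $15.00
print(f"with caching:    ${with_cache:.2f}")                     # ~$1.84
print(f"savings:         {1 - with_cache / without_cache:.0%}")  # ~88%
```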

Customer Spotlight: Notion

Notion is adding prompt caching to the Claude-powered features of its AI assistant, Notion AI, making them faster and cheaper to run while maintaining high-quality results.

"We're excited to use prompt caching to make Notion AI faster and cheaper, all while maintaining state-of-the-art quality."

— Simon Last, Co-founder at Notion

Get Started with Prompt Caching

Explore our documentation and pricing page to start using the prompt caching public beta on the Anthropic API.

Tags: Prompt Caching, Claude AI, API cost reduction, latency reduction, conversational agents, coding assistants, AI performance improvement, Notion AI