Prompt Caching with Claude: Faster, Cheaper AI Interactions

What is Prompt Caching?

Prompt caching enables developers to cache frequently used context between API calls, giving Claude access to more background knowledge while reducing cost and latency.

  • Up to 90% cost reduction
  • Up to 85% latency reduction
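
In practice, you mark the reusable part of a prompt with a cache breakpoint. Below is a minimal sketch using the Anthropic Python SDK and the beta header from the launch documentation; the model ID and the instructions file are placeholders, and details may change while the feature is in public beta.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# A large, stable block of instructions reused verbatim across many calls.
# (Cached blocks must meet a minimum token count, so short prompts won't cache.)
BIG_SYSTEM_PROMPT = open("instructions.txt").read()

response = client.messages.create(
    model="claude-3-5-sonnet-20240620",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": BIG_SYSTEM_PROMPT,
            # Everything up to this breakpoint is written to the cache on the
            # first call and read back at the cheaper rate on later calls.
            "cache_control": {"type": "ephemeral"},
        }
    ],
    messages=[{"role": "user", "content": "Summarize the key policies."}],
    # During the public beta, caching is enabled via this beta header.
    extra_headers={"anthropic-beta": "prompt-caching-2024-07-31"},
)

# usage reports cache_creation_input_tokens / cache_read_input_tokens,
# showing whether this call wrote to or read from the cache.
print(response.usage)
print(response.content[0].text)
```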

When to Use Prompt Caching

  • 🗨️ Conversational agents
  • 💻 Coding assistants
  • 📄 Large document processing
  • 📝 Detailed instruction sets
  • 🔍 Agentic search and tool use
  • 📚 Talk to books and long-form content (see the sketch after this list)
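
For the "talk to books" case, the entire text can sit behind a single cache breakpoint, so the first question pays the cache-write rate and every follow-up pays only the cache-read rate. A hypothetical sketch (the file name, model ID, and questions are invented for illustration):

```python
import anthropic

client = anthropic.Anthropic()
BETA_HEADER = {"anthropic-beta": "prompt-caching-2024-07-31"}

book = open("pride_and_prejudice.txt").read()  # large, stable context

def ask(question: str) -> str:
    response = client.messages.create(
        model="claude-3-5-sonnet-20240620",
        max_tokens=512,
        system=[
            # The book is identical on every call, so it is cached once and
            # then reused; only the short question below is new input.
            {"type": "text", "text": book, "cache_control": {"type": "ephemeral"}},
        ],
        messages=[{"role": "user", "content": question}],
        extra_headers=BETA_HEADER,
    )
    return response.content[0].text

print(ask("Who is Mr. Darcy?"))             # first call: writes the cache
print(ask("Summarize the first chapter."))  # later calls: read the cache
```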

Performance Improvements

Use case                  Latency w/o caching   Latency w/ caching   Cost reduction
Chat with a book          11.5s                 2.4s (-79%)          -90%
Many-shot prompting       1.6s                  1.1s (-31%)          -86%
Multi-turn conversation   ~10s                  ~2.5s (-75%)         -53%

Pricing for Cached Prompts

Claude 3.5 Sonnet

Our most intelligent model

  • Input: $3 / MTok
  • Cache write: $3.75 / MTok
  • Cache read: $0.30 / MTok
  • Output: $15 / MTok

Claude 3 Opus

Powerful model for complex tasks

  • Input: $15 / MTok
  • Cache write: $18.75 / MTok
  • Cache read: $1.50 / MTok
  • Output: $75 / MTok

Claude 3 Haiku

Fastest, most cost-effective model

  • Input: $0.25 / MTok
  • Cache write: $0.30 / MTok
  • Cache read: $0.03 / MTok
  • Output: $1.25 / MTok
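
To see how the cache-write premium and cache-read discount net out, here is a back-of-the-envelope calculation using the Claude 3.5 Sonnet rates above. The token counts and call volume are invented for illustration, and it assumes every call after the first reuses the cache within its lifetime, so no re-writes are needed.

```python
# Claude 3.5 Sonnet rates from the table above, in $ per million input tokens.
INPUT, CACHE_WRITE, CACHE_READ = 3.00, 3.75, 0.30

prefix_tokens = 100_000  # shared context sent with every request (illustrative)
calls = 50               # requests that reuse the same prefix (illustrative)

without_cache = calls * prefix_tokens / 1e6 * INPUT
with_cache = (prefix_tokens / 1e6 * CACHE_WRITE          # first call writes the cache
              + (calls - 1) * prefix_tokens / 1e6 * CACHE_READ)  # the rest read it

print(f"without caching: ${without_cache:.2f}")                  # $15.00
print(f"with caching:    ${with_cache:.2f}")                     # ~$1.84
print(f"savings:         {1 - with_cache / without_cache:.0%}")  # ~88%
```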

Customer Spotlight: Notion

Notion is adding prompt caching to the Claude-powered features of its AI assistant, Notion AI, making them faster and cheaper to run while maintaining high-quality results.

"We're excited to use prompt caching to make Notion AI faster and cheaper, all while maintaining state-of-the-art quality."

— Simon Last, Co-founder at Notion

Get Started with Prompt Caching

Explore our documentation and pricing page to start using the prompt caching public beta on the Anthropic API.

Tags: Prompt Caching, Claude AI, API cost reduction, latency reduction, conversational agents, coding assistants, AI performance improvement, Notion AI