Understanding Claude's Extended Thinking: When and How to Use Advanced Reasoning
Author: Anablock (AI Insights & Innovations)
Introduction
What if you could see exactly how an AI model thinks through a problem before giving you an answer? That's the core idea behind Extended Thinking, Claude's advanced reasoning feature that transforms how the model approaches complex tasks.
Extended thinking gives Claude dedicated "thinking time" to work through problems step-by-step before generating a final response. It's like watching a mathematician work through a proof on scratch paper—you see the reasoning process, not just the final answer. This transparency leads to better quality responses for complex tasks, but it comes with important trade-offs you need to understand.
In this article, we'll explore how extended thinking works, when to use it, how to implement it, and the security considerations that come with exposing Claude's reasoning process.
How Extended Thinking Works
Standard vs. Extended Thinking Responses
In standard mode, Claude returns a simple text response:
{
  "content": [
    {
      "type": "text",
      "text": "The answer is 42."
    }
  ]
}
With extended thinking enabled, the response structure changes to include two distinct parts:
{
  "content": [
    {
      "type": "thinking",
      "thinking": "Let me work through this step by step...\n1. First, I need to understand the question\n2. The problem involves...\n3. Therefore, the answer must be...",
      "signature": "sig_abc123..."
    },
    {
      "type": "text",
      "text": "The answer is 42."
    }
  ]
}
The thinking block contains Claude's internal reasoning process—the step-by-step logic, considerations, and analysis that led to the final answer. The text block contains the polished response you'd show to end users.
The Benefits
Extended thinking provides three key advantages:
- Better reasoning capabilities - Claude can break down complex problems into manageable steps, leading to more accurate solutions for tasks like mathematical proofs, logical puzzles, code debugging, and multi-step analysis.
- Increased accuracy - By giving Claude space to "think out loud," you reduce errors on difficult problems. The model can catch its own mistakes during the reasoning process.
- Transparency - You can audit Claude's reasoning to understand why it arrived at a particular answer. This is invaluable for debugging, building trust, and identifying edge cases.
The Trade-offs
Extended thinking isn't free—literally or figuratively:
- Higher costs - You pay for every token in the thinking block. A complex reasoning process might use thousands of tokens before generating the final answer.
- Increased latency - Thinking takes time. Your API calls will be slower because Claude needs to complete its reasoning before returning a response.
- More complex response handling - Your code needs to parse the structured response format and handle both thinking and text blocks appropriately.
When to Use Extended Thinking
The decision framework is straightforward:
Start Without It
Always begin with standard prompting. Extended thinking is not a replacement for good prompt engineering—it's a tool for when standard prompting isn't quite enough.
Optimize Your Prompt First
Before enabling extended thinking:
- Refine your prompt structure
- Add clear examples
- Break complex tasks into subtasks
- Use chain-of-thought prompting
- Test with different phrasings
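Chain-of-thought prompting in particular often recovers much of the benefit of extended thinking at standard pricing. A minimal sketch of such a prompt template; the tag names and template are illustrative, not a prescribed format:

```python
# Hypothetical chain-of-thought prompt template to try before
# enabling extended thinking; the tag names are illustrative.
COT_PROMPT = (
    "Solve the problem below. First reason through it step by step "
    "inside <reasoning> tags, then give only the final answer inside "
    "<answer> tags.\n\n"
    "Problem: {problem}"
)

prompt = COT_PROMPT.format(problem="Solve for x: 3x^2 + 7x - 6 = 0")
```

If a prompt like this gets you to the required accuracy, you keep standard latency and cost.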
Enable Thinking When Standard Prompting Falls Short
Consider extended thinking when:
- You've optimized your prompt but accuracy is still below requirements
- The task involves complex multi-step reasoning (e.g., mathematical proofs, code analysis, logical deduction)
- You need transparency into the reasoning process for auditing or debugging
- The cost and latency trade-offs are acceptable for your use case
Run Evaluations
The gold standard: use your prompt evaluations. Run your prompts without thinking first, measure accuracy, and only enable thinking if the results don't meet your requirements. Let data drive the decision.
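That decision rule can be made explicit. A minimal sketch, with illustrative function and threshold names (assumptions, not a standard API):

```python
def should_enable_thinking(baseline_accuracy, thinking_accuracy,
                           required_accuracy, cost_multiplier,
                           max_cost_multiplier=3.0):
    """Decide from evaluation results whether extended thinking earns its keep."""
    # If the optimized prompt already meets the bar, thinking only adds cost.
    if baseline_accuracy >= required_accuracy:
        return False
    # Enable only if thinking closes the accuracy gap at an acceptable cost.
    return (thinking_accuracy >= required_accuracy
            and cost_multiplier <= max_cost_multiplier)
```

Feed it numbers from your own eval suite rather than intuition: a thinking run that lifts accuracy but misses the requirement, or that meets it at an unacceptable cost multiple, should stay disabled.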
Response Structure and Security
The Signature System
Every thinking block includes a cryptographic signature:
{
  "type": "thinking",
  "thinking": "My reasoning process...",
  "signature": "sig_1a2b3c4d5e6f..."
}
This signature serves a critical security purpose: it ensures you haven't modified the thinking text.
Why does this matter? If a developer could tamper with Claude's reasoning process (e.g., inserting malicious instructions into the thinking block), they could potentially steer the model toward unsafe or unintended behaviors in subsequent turns of a conversation.
The signature acts as a tamper-evident seal. If you modify the thinking text and pass it back to Claude, the signature won't match, and the model will reject it.
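In practice this means that when you continue a multi-turn conversation, you echo the assistant's content blocks back exactly as received, thinking text, signature and all. A sketch (the helper name is illustrative):

```python
def append_assistant_turn(messages, response_content):
    """Add Claude's reply to the history without touching its thinking blocks.

    The thinking text and signature must be passed back verbatim; editing
    either one invalidates the signature and the model will reject it.
    """
    return messages + [{"role": "assistant", "content": response_content}]


history = [{"role": "user", "content": "What is 6 * 7?"}]
reply = [
    {"type": "thinking", "thinking": "6 * 7 = 42...", "signature": "sig_abc123..."},
    {"type": "text", "text": "The answer is 42."},
]
history = append_assistant_turn(history, reply)
history.append({"role": "user", "content": "Why?"})
```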
Redacted Thinking
Sometimes you'll receive a response containing a redacted thinking block:
{
  "type": "redacted_thinking",
  "data": "base64-encoded encrypted payload..."
}
What happened? Claude's internal safety systems flagged the thinking content. This can occur when:
- The reasoning process touches on sensitive topics
- The thinking contains content that violates safety policies
- Internal heuristics determine the thinking shouldn't be exposed
What's in the redacted block? The actual reasoning is still there, just encrypted. The data field holds the complete thinking in a form that Claude can decrypt but you cannot.
Why does this matter? You can pass the redacted thinking block back to Claude in future conversation turns without losing context. Claude can "remember" its reasoning even though you can't see it.
How to handle it: Your application should gracefully handle redacted responses. Don't crash or throw errors—just use the text block and optionally log that thinking was redacted for monitoring purposes.
Implementation Guide
Basic Setup
To enable extended thinking, add two parameters to your chat function:
def chat(
    messages,
    system=None,
    temperature=1.0,
    stop_sequences=None,
    tools=None,
    thinking=False,        # New parameter
    thinking_budget=1024   # New parameter
):
    # Your implementation...
thinking - Boolean flag to enable/disable extended thinking
thinking_budget - Maximum tokens Claude can use for reasoning (minimum: 1024)
API Configuration
Add the thinking configuration to your API parameters:
params = {
    "model": "claude-3-7-sonnet-20250219",
    "max_tokens": 4096,
    "messages": messages,
}
if thinking:
    params["thinking"] = {
        "type": "enabled",
        "budget_tokens": thinking_budget,
    }
response = anthropic.messages.create(**params)
Important: Your max_tokens parameter must be greater than your thinking_budget. The thinking budget is consumed first, and the remaining tokens are available for the final text response.
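A small guard can catch this misconfiguration before the call ever reaches the API. A sketch; the names are illustrative:

```python
MIN_THINKING_BUDGET = 1024  # documented minimum

def validate_budgets(max_tokens, thinking_budget):
    """Fail fast on budget misconfiguration before making the API call."""
    if thinking_budget < MIN_THINKING_BUDGET:
        raise ValueError(
            f"thinking_budget must be at least {MIN_THINKING_BUDGET}")
    if max_tokens <= thinking_budget:
        raise ValueError("max_tokens must be greater than thinking_budget")
    # Tokens left over for the final text response.
    return max_tokens - thinking_budget
```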
Calling with Extended Thinking
# Standard call
response = chat(messages)
# With extended thinking
response = chat(messages, thinking=True, thinking_budget=2048)
Parsing the Response
Handle the structured response format:
def parse_response(response):
    thinking_text = None
    final_text = None
    was_redacted = False
    for block in response.content:
        if block.type == "thinking":
            thinking_text = block.thinking
            signature = block.signature
        elif block.type == "redacted_thinking":
            was_redacted = True
        elif block.type == "text":
            final_text = block.text
    return {
        "thinking": thinking_text,
        "response": final_text,
        "was_redacted": was_redacted,
    }
Testing Redacted Responses
For testing purposes, you can force Claude to return a redacted thinking block by sending a special trigger string in your message. This helps ensure your application handles redacted responses gracefully without crashing in production.
# Test message that triggers redaction
test_message = {
    "role": "user",
    "content": "[Your test trigger string here]"
}

response = chat([test_message], thinking=True)
# Verify your app handles redacted thinking correctly
Feature Compatibility and Restrictions
Important: Extended thinking is not compatible with some other Claude features:
❌ Message pre-filling - Cannot be used with extended thinking
❌ Temperature control - Temperature must remain at its default of 1 when thinking is enabled; other values are rejected
❌ Other restrictions - See the full compatibility list
If your use case requires these features, you'll need to choose between them and extended thinking.
Real-World Use Cases
1. Mathematical Problem Solving
Without extended thinking:
User: Solve for x: 3x² + 7x - 6 = 0
Claude: x = 2/3 or x = -3
With extended thinking:
Thinking: "Let me use the quadratic formula...
a = 3, b = 7, c = -6
Discriminant = b² - 4ac = 49 + 72 = 121
√121 = 11
x = (-7 ± 11) / 6
x = 4/6 = 2/3 or x = -18/6 = -3"
Response: x = 2/3 or x = -3
You can verify the reasoning process and catch calculation errors.
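Because the intermediate steps are visible, you can even check them mechanically. Substituting the worked example's roots back into the original equation:

```python
from fractions import Fraction

# Sanity-check the worked example by plugging the roots
# back into 3x^2 + 7x - 6.
def f(x):
    return 3 * x**2 + 7 * x - 6

for root in (Fraction(2, 3), Fraction(-3)):
    assert f(root) == 0  # both roots satisfy the equation exactly
```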
2. Code Debugging
Without extended thinking:
User: Why is this function returning None?
Claude: You forgot to add a return statement.
With extended thinking:
Thinking: "Let me trace through the function...
1. The function defines result = calculate()
2. It processes result with transform()
3. But there's no return statement
4. Python functions return None by default
5. That's the issue"
Response: You forgot to add a return statement at the end of the function.
The thinking block shows the debugging process, helping you understand the analysis.
3. Multi-Step Analysis
Without extended thinking:
User: Should we invest in Project A or Project B?
Claude: Project A is the better choice.
With extended thinking:
Thinking: "Let me compare systematically...
Project A: ROI 15%, risk medium, timeline 2 years
Project B: ROI 12%, risk low, timeline 1 year
Considering risk-adjusted returns...
Project A has higher absolute ROI but longer timeline
Project B has faster payback despite lower ROI
Given the company's cash flow constraints...
Project A is better for long-term growth"
Response: Project A is the better choice for long-term growth, despite Project B's faster payback period.
The reasoning shows why the recommendation was made.
Best Practices
1. Use Thinking Budgets Wisely
Start with the minimum (1024 tokens) and increase only if needed. Monitor your thinking token usage to optimize costs:
# Log thinking token usage (rough word-based estimate)
thinking_tokens = sum(
    len(block.thinking.split()) * 1.3  # Rough token estimate
    for block in response.content
    if block.type == "thinking"
)
print(f"Thinking used ~{int(thinking_tokens)} tokens")
2. Don't Show Raw Thinking to End Users
The thinking block is for developers and debugging—not end users. Show only the polished text response in your UI:
# Good: Show only the final response
parsed = parse_response(response)
display_to_user(parsed["response"])

# Bad: Showing raw thinking to users
display_to_user(parsed["thinking"] + "\n\n" + parsed["response"])
3. Handle Redacted Responses Gracefully
Always check for redaction and handle it without breaking your application:
parsed = parse_response(response)
if parsed["was_redacted"]:
    logger.info("Thinking was redacted for this response")
    # Continue with the text response
else:
    logger.debug(f"Thinking: {parsed['thinking']}")
4. A/B Test Extended Thinking
Run controlled experiments to measure the impact:
# Control group: standard prompting
control_accuracy = evaluate_prompts(thinking=False)

# Treatment group: extended thinking
treatment_accuracy = evaluate_prompts(thinking=True)

# Compare results
improvement = treatment_accuracy - control_accuracy
cost_increase = calculate_cost_delta(thinking=True)

# Decide based on ROI
if improvement > threshold and cost_increase < budget:
    enable_thinking_in_production()
5. Cache Thinking Results When Appropriate
If you're running the same complex reasoning multiple times, consider caching:
cache_key = hash(prompt)
if cache_key in thinking_cache:
    return thinking_cache[cache_key]

response = chat(messages, thinking=True)
thinking_cache[cache_key] = response
return response
Cost and Performance Considerations
Token Usage
Extended thinking can significantly increase token consumption:
| Task Type | Avg Thinking Tokens | Cost Impact |
|-----------|---------------------|-------------|
| Simple Q&A | 0 (use standard) | 0% |
| Code review | 500-1500 | +50-150% |
| Math problems | 300-800 | +30-80% |
| Multi-step analysis | 1000-3000 | +100-300% |
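Thinking tokens are billed as output tokens, so the cost-impact column is simply thinking tokens relative to the baseline output. A quick estimator, as a sketch:

```python
def thinking_cost_impact(output_tokens, thinking_tokens):
    """Percentage increase in output-token cost from extended thinking."""
    return 100.0 * thinking_tokens / output_tokens

# A multi-step analysis using 2000 thinking tokens on top of a
# 1000-token answer adds roughly 200% to the output-token cost.
impact = thinking_cost_impact(1000, 2000)
```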
Latency
Thinking adds processing time:
- Standard response: 1-3 seconds
- With thinking (1024 budget): 3-6 seconds
- With thinking (4096 budget): 6-12 seconds
For user-facing applications, consider:
- Showing a "thinking..." indicator
- Using streaming responses if available
- Offloading to background jobs for non-interactive tasks
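When streaming, thinking arrives as its own delta type before the text, which is what makes a live "thinking..." indicator possible. A dispatch sketch over event dicts; the thinking_delta/text_delta shapes follow the streaming format for extended thinking, but treat the exact field names as an assumption to verify against the docs:

```python
def handle_stream_event(event, on_thinking, on_text):
    """Route streamed deltas so the UI can show a thinking phase, then the answer."""
    if event.get("type") != "content_block_delta":
        return
    delta = event["delta"]
    if delta["type"] == "thinking_delta":
        on_thinking(delta["thinking"])  # drive a "thinking..." indicator
    elif delta["type"] == "text_delta":
        on_text(delta["text"])          # render the final answer as it arrives


thinking_parts, text_parts = [], []
events = [
    {"type": "content_block_delta",
     "delta": {"type": "thinking_delta", "thinking": "Step 1..."}},
    {"type": "content_block_delta",
     "delta": {"type": "text_delta", "text": "The answer is 42."}},
]
for event in events:
    handle_stream_event(event, thinking_parts.append, text_parts.append)
```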
Conclusion
Extended thinking is a powerful tool in your Claude toolkit, but it's not a silver bullet. Use it strategically:
✅ Do use extended thinking when:
- Standard prompting doesn't meet accuracy requirements after optimization
- You need transparency into reasoning for auditing or debugging
- The task involves complex multi-step logic
- Cost and latency trade-offs are acceptable
❌ Don't use extended thinking when:
- Standard prompting already works well
- You need low-latency responses
- The task is simple or straightforward
- You're on a tight token budget
The golden rule: Start simple, optimize thoroughly, then add thinking when you need that extra reasoning capability. Let your evaluations guide the decision, not assumptions.
Extended thinking transforms Claude from a black box into a transparent reasoning partner. When used appropriately, it can significantly improve accuracy on complex tasks while giving you visibility into the model's thought process. Just remember to weigh the benefits against the costs for your specific use case.
Further Reading
- Extended Thinking Documentation
- Feature Compatibility List
- Prompt Engineering Guide
- Token Counting Best Practices
Ready to try extended thinking? Start with a single complex prompt, enable thinking, and compare the results. The transparency might surprise you.