Understanding Claude's Extended Thinking: When and How to Use Advanced Reasoning
Author: Anablock (AI Insights & Innovations)
Introduction
What if you could see exactly how an AI model thinks through a problem before giving you an answer? That's the core idea behind Extended Thinking, Claude's advanced reasoning feature that transforms how the model approaches complex tasks.
Extended thinking gives Claude dedicated "thinking time" to work through problems step-by-step before generating a final response. It's like watching a mathematician work through a proof on scratch paper—you see the reasoning process, not just the final answer. This transparency leads to better quality responses for complex tasks, but it comes with important trade-offs you need to understand.
In this article, we'll explore how extended thinking works, when to use it, how to implement it, and the security considerations that come with exposing Claude's reasoning process.
How Extended Thinking Works
Standard vs. Extended Thinking Responses
In standard mode, Claude returns a simple text response:
{
  "content": [
    {
      "type": "text",
      "text": "The answer is 42."
    }
  ]
}
With extended thinking enabled, the response structure changes to include two distinct parts:
{
  "content": [
    {
      "type": "thinking",
      "thinking": "Let me work through this step by step...\n1. First, I need to understand the question\n2. The problem involves...\n3. Therefore, the answer must be...",
      "signature": "sig_abc123..."
    },
    {
      "type": "text",
      "text": "The answer is 42."
    }
  ]
}
The thinking block contains Claude's internal reasoning process—the step-by-step logic, considerations, and analysis that led to the final answer. The text block contains the polished response you'd show to end users.
The Benefits
Extended thinking provides three key advantages:
- Better reasoning capabilities - Claude can break down complex problems into manageable steps, leading to more accurate solutions for tasks like mathematical proofs, logical puzzles, code debugging, and multi-step analysis.
- Increased accuracy - By giving Claude space to "think out loud," you reduce errors on difficult problems. The model can catch its own mistakes during the reasoning process.
- Transparency - You can audit Claude's reasoning to understand why it arrived at a particular answer. This is invaluable for debugging, building trust, and identifying edge cases.
The Trade-offs
Extended thinking isn't free—literally or figuratively:
- Higher costs - You pay for every token in the thinking block. A complex reasoning process might use thousands of tokens before generating the final answer.
- Increased latency - Thinking takes time. Your API calls will be slower because Claude needs to complete its reasoning before returning a response.
- More complex response handling - Your code needs to parse the structured response format and handle both thinking and text blocks appropriately.
When to Use Extended Thinking
The decision framework is straightforward:
Start Without It
Always begin with standard prompting. Extended thinking is not a replacement for good prompt engineering—it's a tool for when standard prompting isn't quite enough.
Optimize Your Prompt First
Before enabling extended thinking:
- Refine your prompt structure
- Add clear examples
- Break complex tasks into subtasks
- Use chain-of-thought prompting
- Test with different phrasings
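Chain-of-thought prompting in particular often recovers much of the benefit of extended thinking at standard pricing. A minimal sketch of such a prompt template; the tag names and template are illustrative, not a prescribed format:

```python
# Hypothetical chain-of-thought prompt template to try before
# enabling extended thinking; the tag names are illustrative.
COT_PROMPT = (
    "Solve the problem below. First reason through it step by step "
    "inside <reasoning> tags, then give only the final answer inside "
    "<answer> tags.\n\n"
    "Problem: {problem}"
)

prompt = COT_PROMPT.format(problem="Solve for x: 3x^2 + 7x - 6 = 0")
```

If a prompt like this gets you to the required accuracy, you keep standard latency and cost.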
Enable Thinking When Standard Prompting Falls Short
Consider extended thinking when:
- You've optimized your prompt but accuracy is still below requirements
- The task involves complex multi-step reasoning (e.g., mathematical proofs, code analysis, logical deduction)
- You need transparency into the reasoning process for auditing or debugging
- The cost and latency trade-offs are acceptable for your use case
Run Evaluations
The gold standard: use your prompt evaluations. Run your prompts without thinking first, measure accuracy, and only enable thinking if the results don't meet your requirements. Let data drive the decision.
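That decision rule can be made explicit. A minimal sketch, with illustrative function and threshold names (assumptions, not a standard API):

```python
def should_enable_thinking(baseline_accuracy, thinking_accuracy,
                           required_accuracy, cost_multiplier,
                           max_cost_multiplier=3.0):
    """Decide from evaluation results whether extended thinking earns its keep."""
    # If the optimized prompt already meets the bar, thinking only adds cost.
    if baseline_accuracy >= required_accuracy:
        return False
    # Enable only if thinking closes the accuracy gap at an acceptable cost.
    return (thinking_accuracy >= required_accuracy
            and cost_multiplier <= max_cost_multiplier)
```

Feed it numbers from your own eval suite rather than intuition: a thinking run that lifts accuracy but misses the requirement, or that meets it at an unacceptable cost multiple, should stay disabled.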
Response Structure and Security
The Signature System
Every thinking block includes a cryptographic signature:
{
  "type": "thinking",
  "thinking": "My reasoning process...",
  "signature": "sig_1a2b3c4d5e6f..."
}
This signature serves a critical security purpose: it ensures you haven't modified the thinking text.
Why does this matter? If a developer could tamper with Claude's reasoning process (e.g., inserting malicious instructions into the thinking block), they could potentially steer the model toward unsafe or unintended behaviors in subsequent turns of a conversation.
The signature acts as a tamper-evident seal. If you modify the thinking text and pass it back to Claude, the signature won't match, and the model will reject it.
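In practice this means that when you continue a multi-turn conversation, you echo the assistant's content blocks back exactly as received, thinking text, signature and all. A sketch (the helper name is illustrative):

```python
def append_assistant_turn(messages, response_content):
    """Add Claude's reply to the history without touching its thinking blocks.

    The thinking text and signature must be passed back verbatim; editing
    either one invalidates the signature and the model will reject it.
    """
    return messages + [{"role": "assistant", "content": response_content}]


history = [{"role": "user", "content": "What is 6 * 7?"}]
reply = [
    {"type": "thinking", "thinking": "6 * 7 = 42...", "signature": "sig_abc123..."},
    {"type": "text", "text": "The answer is 42."},
]
history = append_assistant_turn(history, reply)
history.append({"role": "user", "content": "Why?"})
```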
Redacted Thinking
Sometimes you'll receive a response containing a redacted thinking block:
{
  "type": "redacted_thinking",
  "data": "base64-encoded encrypted payload..."
}
What happened? Claude's internal safety systems flagged the thinking content. This can occur when:
- The reasoning process touches on sensitive topics
- The thinking contains content that violates safety policies
- Internal heuristics determine the thinking shouldn't be exposed
What's in the redacted block? The actual reasoning is still there, just encrypted. The data field holds the complete thinking in a form that Claude can decrypt but you cannot.
Why does this matter? You can pass the redacted thinking block back to Claude in future conversation turns without losing context. Claude can "remember" its reasoning even though you can't see it.
How to handle it: Your application should gracefully handle redacted responses. Don't crash or throw errors—just use the text block and optionally log that thinking was redacted for monitoring purposes.
Implementation Guide
Basic Setup
To enable extended thinking, add two parameters to your chat function:
def chat(
    messages,
    system=None,
    temperature=1.0,
    stop_sequences=None,
    tools=None,
    thinking=False,        # New parameter
    thinking_budget=1024   # New parameter
):
    # Your implementation...
thinking - Boolean flag to enable/disable extended thinking
thinking_budget - Maximum tokens Claude can use for reasoning (minimum: 1024)
API Configuration
Add the thinking configuration to your API parameters:
params = {
    "model": "claude-3-7-sonnet-20250219",
    "max_tokens": 4096,
    "messages": messages,
}
if thinking:
    params["thinking"] = {
        "type": "enabled",
        "budget_tokens": thinking_budget,
    }
response = anthropic.messages.create(**params)
Important: Your max_tokens parameter must be greater than your thinking_budget. The thinking budget is consumed first, and the remaining tokens are available for the final text response.
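A small guard can catch this misconfiguration before the call ever reaches the API. A sketch; the names are illustrative:

```python
MIN_THINKING_BUDGET = 1024  # documented minimum

def validate_budgets(max_tokens, thinking_budget):
    """Fail fast on budget misconfiguration before making the API call."""
    if thinking_budget < MIN_THINKING_BUDGET:
        raise ValueError(
            f"thinking_budget must be at least {MIN_THINKING_BUDGET}")
    if max_tokens <= thinking_budget:
        raise ValueError("max_tokens must be greater than thinking_budget")
    # Tokens left over for the final text response.
    return max_tokens - thinking_budget
```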
Calling with Extended Thinking
# Standard call
response = chat(messages)
# With extended thinking
response = chat(messages, thinking=True, thinking_budget=2048)
Parsing the Response
Handle the structured response format:
def parse_response(response):
    thinking_text = None
    final_text = None
    was_redacted = False
    for block in response.content:
        if block.type == "thinking":
            thinking_text = block.thinking
            signature = block.signature
        elif block.type == "redacted_thinking":
            was_redacted = True
        elif block.type == "text":
            final_text = block.text
    return {
        "thinking": thinking_text,
        "response": final_text,
        "was_redacted": was_redacted,
    }
Testing Redacted Responses
For testing purposes, you can force Claude to return a redacted thinking block by sending a special trigger string in your message. This helps ensure your application handles redacted responses gracefully without crashing in production.
# Test message that triggers redaction
test_message = {
    "role": "user",
    "content": "[Your test trigger string here]"
}

response = chat([test_message], thinking=True)
# Verify your app handles redacted thinking correctly
Feature Compatibility and Restrictions
Important: Extended thinking is not compatible with some other Claude features:
❌ Message pre-filling - Cannot be used with extended thinking
❌ Temperature control - Temperature must remain at its default of 1 when thinking is enabled; other values are rejected
❌ Other restrictions - See the full compatibility list
If your use case requires these features, you'll need to choose between them and extended thinking.
Real-World Use Cases
1. Mathematical Problem Solving
Without extended thinking:
User: Solve for x: 3x² + 7x - 6 = 0
Claude: x = 2/3 or x = -3
With extended thinking:
Thinking: "Let me use the quadratic formula...
a = 3, b = 7, c = -6
Discriminant = b² - 4ac = 49 + 72 = 121
√121 = 11
x = (-7 ± 11) / 6
x = 4/6 = 2/3 or x = -18/6 = -3"
Response: x = 2/3 or x = -3
You can verify the reasoning process and catch calculation errors.
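Because the intermediate steps are visible, you can even check them mechanically. Substituting the worked example's roots back into the original equation:

```python
from fractions import Fraction

# Sanity-check the worked example by plugging the roots
# back into 3x^2 + 7x - 6.
def f(x):
    return 3 * x**2 + 7 * x - 6

for root in (Fraction(2, 3), Fraction(-3)):
    assert f(root) == 0  # both roots satisfy the equation exactly
```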
2. Code Debugging
Without extended thinking:
User: Why is this function returning None?
Claude: You forgot to add a return statement.
With extended thinking:
Thinking: "Let me trace through the function...
1. The function defines result = calculate()
2. It processes result with transform()
3. But there's no return statement
4. Python functions return None by default
5. That's the issue"
Response: You forgot to add a return statement at the end of the function.
The thinking block shows the debugging process, helping you understand the analysis.
3. Multi-Step Analysis
Without extended thinking:
User: Should we invest in Project A or Project B?
Claude: Project A is the better choice.
With extended thinking:
Thinking: "Let me compare systematically...
Project A: ROI 15%, risk medium, timeline 2 years
Project B: ROI 12%, risk low, timeline 1 year
Considering risk-adjusted returns...
Project A has higher absolute ROI but longer timeline
Project B has faster payback despite lower ROI
Given the company's cash flow constraints...
Project A is better for long-term growth"
Response: Project A is the better choice for long-term growth, despite Project B's faster payback period.
The reasoning shows why the recommendation was made.
Best Practices
1. Use Thinking Budgets Wisely
Start with the minimum (1024 tokens) and increase only if needed. Monitor your thinking token usage to optimize costs:
# Log thinking token usage (rough word-based estimate)
thinking_tokens = sum(
    len(block.thinking.split()) * 1.3  # Rough token estimate
    for block in response.content
    if block.type == "thinking"
)
print(f"Thinking used ~{int(thinking_tokens)} tokens")
2. Don't Show Raw Thinking to End Users
The thinking block is for developers and debugging—not end users. Show only the polished text response in your UI:
# Good: Show only the final response
parsed = parse_response(response)
display_to_user(parsed["response"])

# Bad: Showing raw thinking to users
display_to_user(parsed["thinking"] + "\n\n" + parsed["response"])
3. Handle Redacted Responses Gracefully
Always check for redaction and handle it without breaking your application:
parsed = parse_response(response)
if parsed["was_redacted"]:
    logger.info("Thinking was redacted for this response")
    # Continue with the text response
else:
    logger.debug(f"Thinking: {parsed['thinking']}")
4. A/B Test Extended Thinking
Run controlled experiments to measure the impact:
# Control group: standard prompting
control_accuracy = evaluate_prompts(thinking=False)

# Treatment group: extended thinking
treatment_accuracy = evaluate_prompts(thinking=True)

# Compare results
improvement = treatment_accuracy - control_accuracy
cost_increase = calculate_cost_delta(thinking=True)

# Decide based on ROI
if improvement > threshold and cost_increase < budget:
    enable_thinking_in_production()
5. Cache Thinking Results When Appropriate
If you're running the same complex reasoning multiple times, consider caching:
cache_key = hash(prompt)
if cache_key in thinking_cache:
    return thinking_cache[cache_key]

response = chat(messages, thinking=True)
thinking_cache[cache_key] = response
return response
Cost and Performance Considerations
Token Usage
Extended thinking can significantly increase token consumption:
| Task Type | Avg Thinking Tokens | Cost Impact |
|-----------|---------------------|-------------|
| Simple Q&A | 0 (use standard) | 0% |
| Code review | 500-1500 | +50-150% |
| Math problems | 300-800 | +30-80% |
| Multi-step analysis | 1000-3000 | +100-300% |
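Thinking tokens are billed as output tokens, so the cost-impact column is simply thinking tokens relative to the baseline output. A quick estimator, as a sketch:

```python
def thinking_cost_impact(output_tokens, thinking_tokens):
    """Percentage increase in output-token cost from extended thinking."""
    return 100.0 * thinking_tokens / output_tokens

# A multi-step analysis using 2000 thinking tokens on top of a
# 1000-token answer adds roughly 200% to the output-token cost.
impact = thinking_cost_impact(1000, 2000)
```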
Latency
Thinking adds processing time:
- Standard response: 1-3 seconds
- With thinking (1024 budget): 3-6 seconds
- With thinking (4096 budget): 6-12 seconds
For user-facing applications, consider:
- Showing a "thinking..." indicator
- Using streaming responses if available
- Offloading to background jobs for non-interactive tasks
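When streaming, thinking arrives as its own delta type before the text, which is what makes a live "thinking..." indicator possible. A dispatch sketch over event dicts; the thinking_delta/text_delta shapes follow the streaming format for extended thinking, but treat the exact field names as an assumption to verify against the docs:

```python
def handle_stream_event(event, on_thinking, on_text):
    """Route streamed deltas so the UI can show a thinking phase, then the answer."""
    if event.get("type") != "content_block_delta":
        return
    delta = event["delta"]
    if delta["type"] == "thinking_delta":
        on_thinking(delta["thinking"])  # drive a "thinking..." indicator
    elif delta["type"] == "text_delta":
        on_text(delta["text"])          # render the final answer as it arrives


thinking_parts, text_parts = [], []
events = [
    {"type": "content_block_delta",
     "delta": {"type": "thinking_delta", "thinking": "Step 1..."}},
    {"type": "content_block_delta",
     "delta": {"type": "text_delta", "text": "The answer is 42."}},
]
for event in events:
    handle_stream_event(event, thinking_parts.append, text_parts.append)
```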
Conclusion
Extended thinking is a powerful tool in your Claude toolkit, but it's not a silver bullet. Use it strategically:
✅ Do use extended thinking when:
- Standard prompting doesn't meet accuracy requirements after optimization
- You need transparency into reasoning for auditing or debugging
- The task involves complex multi-step logic
- Cost and latency trade-offs are acceptable
❌ Don't use extended thinking when:
- Standard prompting already works well
- You need low-latency responses
- The task is simple or straightforward
- You're on a tight token budget
The golden rule: Start simple, optimize thoroughly, then add thinking when you need that extra reasoning capability. Let your evaluations guide the decision, not assumptions.
Extended thinking transforms Claude from a black box into a transparent reasoning partner. When used appropriately, it can significantly improve accuracy on complex tasks while giving you visibility into the model's thought process. Just remember to weigh the benefits against the costs for your specific use case.
Further Reading
- Extended Thinking Documentation
- Feature Compatibility List
- Prompt Engineering Guide
- Token Counting Best Practices
Ready to try extended thinking? Start with a single complex prompt, enable thinking, and compare the results. The transparency might surprise you.