
Agent Planning: From Goals to Executable Actions in Production AI Agents

What Is Agent Planning

Agent planning is the process of transforming a high‑level, ambiguous user goal into a concrete, ordered sequence of executable actions (tool calls, API requests, or sub‑tasks). It is the component that gives an agent its autonomy, allowing it to break down “Research Q3 sales trends” into query_database → analyze_numbers → generate_chart → write_summary.

Unlike a traditional program with hardcoded steps, an agent’s planner is driven by an LLM and can adapt when new information arrives or steps fail. Planning turns a reactive chatbot into a proactive, goal‑oriented system.

Examples across domains:

| Domain | User goal | Plan (actions) |
|---|---|---|
| Research | “Compare three cloud providers’ pricing for GPU instances” | search_aws_pricing → search_azure_pricing → search_gcp_pricing → compare_in_table → summarise |
| Coding | “Refactor auth.py to use async/await” | read_file → identify_sync_functions → generate_async_versions → write_backup → replace_file → run_tests |
| Customer Support | “Refund shipping on order #ORD‑1234 because it’s late” | get_order_status → if delayed then request_refund → send_email_confirmation |
| Business Automation | “Onboard new employee: create Slack, GitHub, email” | create_slack_account → create_github_account → create_email_alias → send_welcome_message |

Why Planning Matters

Planning is one of the defining characteristics that separate AI agents from simpler LLM applications.

| Capability | Traditional Chatbot | RAG Application | AI Agent (with planning) |
|---|---|---|---|
| Handles multi‑step tasks | No (single Q&A) | No (one retrieval + answer) | Yes – decomposes goal into steps |
| Uses tools sequentially | No (or one fixed tool) | No (only retrieval) | Yes – orchestrates multiple tools |
| Adapts to intermediate results | No | No | Yes – replans based on tool outputs |
| Recovers from failures | No (just error message) | No | Yes – alternative tools or new plan |
| Visible to user | Single answer | Single answer | Can show plan and progress |

Without planning, an agent is just a fancy router: it can call one tool, get a result, and answer. With planning, it can handle tasks like “Find me the cheapest flight from NYC to London on June 15th, book it if under $500, and add to my calendar.” That requires sequencing: search flights → check price → conditionally call booking → call calendar API.


Planning in the Agent Architecture

Planning is a distinct component in the agent runtime, sandwiched between memory retrieval and tool execution.

The planner does not execute actions. It produces a plan – a structured representation (list, DAG, or hierarchical task network). The workflow engine executes the plan step by step, feeding results back to the reasoning engine, which may request a replan.
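To make "a structured representation" concrete, here is a minimal sketch of a plan stored as a DAG. It assumes Python 3.9+ and uses field names (id, action, params, depends_on, condition) that mirror the plan format used in this article's examples. The Step and Plan classes are hypothetical. The workflow engine would ask the plan for an execution order in which every step follows its dependencies.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Any, Optional


@dataclass
class Step:
    id: int
    action: str                                  # tool name from the registry
    params: dict[str, Any] = field(default_factory=dict)
    depends_on: list[int] = field(default_factory=list)
    condition: Optional[str] = None              # guard the engine evaluates before running


@dataclass
class Plan:
    plan_id: str
    steps: list[Step]

    def execution_order(self) -> list[int]:
        """Step ids ordered so each step follows all of its dependencies."""
        graph = {step.id: set(step.depends_on) for step in self.steps}
        # static_order() raises graphlib.CycleError (a ValueError) if the plan is not a DAG.
        return list(TopologicalSorter(graph).static_order())


plan = Plan("plan_001", [
    Step(1, "search_flights", {"origin": "NYC", "dest": "LON"}),
    Step(2, "filter_flights", {"max_price": 500}, depends_on=[1]),
    Step(3, "book_flight", {"flight_id": "$step2.result.selected_id"}, depends_on=[2]),
])
print(plan.execution_order())  # [1, 2, 3]
```

Because the plan is stored as a graph rather than a flat list, independent steps are easy to identify and run concurrently. A cyclic plan is rejected before any tool runs.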


The Agent Planning Lifecycle

Stage 1: Goal Understanding

Parse the user request into a structured goal (e.g., {type: "book_flight", constraints: {max_price:500, origin:"NYC", destination:"LON", date:"2025-06-15"}}).
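In practice an LLM usually produces the structured goal, so the agent should validate that output before planning on it. The sketch below assumes the model returns JSON shaped like the example above (with quoted keys). FlightGoal and parse_goal are hypothetical names used only for illustration.

```python
import json
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class FlightGoal:
    origin: str
    destination: str
    travel_date: date
    max_price: float


REQUIRED = ("origin", "destination", "date", "max_price")


def parse_goal(llm_output: str) -> FlightGoal:
    """Turn the LLM's JSON goal into a typed object, or raise ValueError.

    Failing loudly here lets the agent ask a clarifying question instead of
    planning on a guessed constraint.
    """
    data = json.loads(llm_output)              # JSONDecodeError is a ValueError
    if not isinstance(data, dict):
        raise ValueError("goal must be a JSON object")
    if data.get("type") != "book_flight":
        raise ValueError(f"unsupported goal type: {data.get('type')!r}")
    constraints = data.get("constraints", {})
    missing = [key for key in REQUIRED if key not in constraints]
    if missing:
        raise ValueError(f"missing constraints: {missing}")
    return FlightGoal(
        origin=constraints["origin"],
        destination=constraints["destination"],
        travel_date=date.fromisoformat(constraints["date"]),
        max_price=float(constraints["max_price"]),
    )


goal = parse_goal('{"type": "book_flight", "constraints": {"max_price": 500, '
                  '"origin": "NYC", "destination": "LON", "date": "2025-06-15"}}')
print(goal.travel_date, goal.max_price)  # 2025-06-15 500.0
```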

Stage 2: Task Decomposition

Break the goal into atomic tasks. Example: search_flights → filter_by_price → select_best_option → book_flight.

Stage 3: Action Generation

For each task, define the required action. This may be a tool call, a sub‑plan, or a conditional branch.

Stage 4: Tool Selection

Map actions to concrete tools from the agent’s registry. The planner must respect tool schemas (input/output types).
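One way to enforce "respect tool schemas" is to check each planned step against a registry before execution. This is a minimal sketch: the registry contents are hypothetical, and schemas are reduced to "parameter name → accepted Python type(s)". Parameters that start with "$" are treated as references to earlier step outputs and are checked later, at run time.

```python
from typing import Any

# Hypothetical registry: tool name -> {parameter name: accepted type(s)}.
TOOL_REGISTRY: dict[str, dict[str, Any]] = {
    "search_flights": {"origin": str, "dest": str, "date": str},
    "filter_flights": {"max_price": (int, float)},
    "book_flight": {"flight_id": str},
}


def validate_step(action: str, params: dict[str, Any]) -> list[str]:
    """Check one planned step against the registry; an empty list means valid."""
    schema = TOOL_REGISTRY.get(action)
    if schema is None:
        return [f"unknown tool: {action}"]
    errors = []
    for name, expected in schema.items():
        if name not in params:
            errors.append(f"{action}: missing parameter {name!r}")
        elif isinstance(params[name], str) and params[name].startswith("$"):
            continue  # reference to an earlier step's output; checked at run time
        elif not isinstance(params[name], expected):
            errors.append(f"{action}: {name!r} has type {type(params[name]).__name__}")
    for name in sorted(params.keys() - schema.keys()):
        errors.append(f"{action}: unexpected parameter {name!r}")
    return errors


print(validate_step("book_flight", {"flight_id": "$step2.result.selected_id"}))  # []
print(validate_step("book_hotel", {}))  # ['unknown tool: book_hotel']
```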

Stage 5: Execution (by workflow engine)

The plan is passed to the workflow engine, which executes steps sequentially or in parallel.

Stage 6: Validation

After each step, the reasoning engine checks if the outcome matches expectations. If not, it triggers replanning.

Stage 7: Replanning

The planner generates an updated plan from the current state, possibly skipping completed steps or choosing alternative tools.
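The seven stages can be compressed into a small control loop. This sketch makes several simplifying assumptions:

- plan_fn stands in for the LLM planner (stages 1–4) and receives the full history so it can replan from the current state.
- run_tool stands in for the tool-calling layer.
- "Validation" is reduced to checking an "ok" flag.

All function names are hypothetical.

```python
from typing import Callable

# Hypothetical interfaces for this sketch:
#   plan_fn(goal, history) -> list of steps still to run, given everything tried so far
#   run_tool(action, params) -> result dict with an "ok" flag
PlanFn = Callable[[str, list], list]
ToolFn = Callable[[str, dict], dict]


def run_agent(goal: str, plan_fn: PlanFn, run_tool: ToolFn, max_replans: int = 3) -> list[str]:
    history: list[tuple[dict, dict]] = []   # every (step, result), failures included
    replans = 0
    steps = plan_fn(goal, history)           # stages 1-4: understand, decompose, pick tools
    while steps:
        step = steps.pop(0)
        result = run_tool(step["action"], step["params"])    # stage 5: execute
        history.append((step, result))
        if result.get("ok"):                                 # stage 6: validate
            continue
        if replans >= max_replans:
            raise RuntimeError(f"gave up after {replans} replans at {step['action']}")
        replans += 1
        steps = plan_fn(goal, history)                       # stage 7: replan from current state
    return [step["action"] for step, result in history if result.get("ok")]


def demo_planner(goal: str, history: list) -> list:
    """Toy planner: switch search tools after a failure and skip steps already done."""
    failed = {s["action"] for s, r in history if not r.get("ok")}
    done = {s["action"] for s, r in history if r.get("ok")}
    search = "search_alternative_flights" if "search_flights" in failed else "search_flights"
    plan = [{"action": search, "params": {}}, {"action": "book_flight", "params": {}}]
    return [s for s in plan if s["action"] not in done]


def demo_tool(action: str, params: dict) -> dict:
    return {"ok": action != "search_flights"}   # the primary search always fails here


print(run_agent("NYC to LON under $500", demo_planner, demo_tool))
# ['search_alternative_flights', 'book_flight']
```

The max_replans cap keeps a planner that keeps choosing failing steps from looping forever.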


Planning Strategies

| Strategy | Description | When to use | Example |
|---|---|---|---|
| Single‑step planning | One action, no sequencing. | Trivial tasks, single tool. | “What’s the weather?” → call get_weather. |
| Multi‑step planning | Linear sequence of actions. | Predictable, ordered tasks. | “Create user → send welcome email → log event”. |
| Hierarchical planning | Parent plan with sub‑plans (tasks within tasks). | Complex goals with reusable subtasks. | “Write report” → subtask “gather data” → sub‑subtask “query DB”. |
| Dynamic planning | Plan is generated incrementally as execution proceeds. | Unknown tool outputs; exploration. | Research agent that reads a document, then decides next source. |
| Replanning | Modify existing plan based on new information or failures. | Unreliable APIs, user interrupts, conditional logic. | Price tool returns higher than expected → plan to look for discounts. |
| Conditional planning | Plans with branches (if‑then‑else). | Decisions depend on tool results. | “If stock > 0 then order else notify me”. |

Planning and Tool Calling

The planner drives tool selection, but the two are distinct: planning decides what to do; tool calling carries out each chosen action.

Example integration:

Plan generated by the planner:

{
  "plan_id": "plan_001",
  "steps": [
    {"id": 1, "action": "search_flights", "params": {"origin": "NYC", "dest": "LON", "date": "2025-06-15"}},
    {"id": 2, "action": "filter_flights", "params": {"max_price": 500}, "depends_on": [1]},
    {"id": 3, "action": "book_flight", "params": {"flight_id": "$step2.result.selected_id"}, "depends_on": [2], "condition": "step2.result.has_flights"}
  ]
}

Responsibilities:

| Component | Role |
|---|---|
| Planner | Chooses tool names and fills required parameters (using placeholders for results of previous steps). |
| Tool calling layer | Validates parameters against tool schema, executes, returns structured result. |
| Workflow engine | Resolves dependencies, injects previous step outputs into parameters. |
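The workflow engine's job of injecting earlier outputs can be sketched as a small resolver for references like "$step2.result.selected_id", plus a truthiness check for a step's condition. This is an illustration under assumptions: "stepN.result" is taken to mean step N's whole output, the leading "$" is optional (the condition in the example above omits it), and the helper names are hypothetical.

```python
from typing import Any, Optional


def lookup(ref: str, outputs: dict[int, Any]) -> Any:
    """Resolve 'step2.result.selected_id' (leading '$' optional) against step outputs.

    outputs maps step id -> that step's result; 'stepN.result' is the whole result.
    """
    head, *path = ref.lstrip("$").split(".")
    if not head.startswith("step"):
        raise ValueError(f"not a step reference: {ref!r}")
    value: Any = {"result": outputs[int(head[len("step"):])]}
    for key in path:
        value = value[key]          # KeyError if the step did not produce this field
    return value


def resolve_params(params: dict[str, Any], outputs: dict[int, Any]) -> dict[str, Any]:
    """Substitute every '$stepN...' string parameter with the referenced value."""
    return {
        name: lookup(value, outputs) if isinstance(value, str) and value.startswith("$") else value
        for name, value in params.items()
    }


def should_run(condition: Optional[str], outputs: dict[int, Any]) -> bool:
    """A step without a condition always runs; otherwise the referenced value must be truthy."""
    return condition is None or bool(lookup(condition, outputs))


outputs = {2: {"selected_id": "BA117", "has_flights": True}}   # made-up step 2 result
print(resolve_params({"flight_id": "$step2.result.selected_id"}, outputs))  # {'flight_id': 'BA117'}
print(should_run("step2.result.has_flights", outputs))                      # True
```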

Common mistake: The planner assumes a tool will succeed and does not include fallback tools. Always add conditional branches (e.g., if search_flights fails, try search_alternative_flights).
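A fallback can be as simple as an ordered list of alternative tools for the same step. In this sketch, an exception or an empty result counts as failure; real tools may need a narrower definition. The tool functions are hypothetical stand-ins.

```python
from typing import Any, Callable


def call_with_fallbacks(tools: list[tuple[str, Callable[..., Any]]], **params: Any) -> tuple[str, Any]:
    """Try each (name, tool) in order and return (name, result) from the first success.

    This sketch treats an exception or an empty result as failure.
    """
    errors = []
    for name, tool in tools:
        try:
            result = tool(**params)
        except Exception as exc:  # narrow this to the tool's real error types in production
            errors.append(f"{name}: {exc}")
            continue
        if result:
            return name, result
        errors.append(f"{name}: empty result")
    raise RuntimeError("all tools failed: " + "; ".join(errors))


def search_flights(origin: str, dest: str) -> list:
    raise TimeoutError("upstream timeout")          # simulated outage


def search_alternative_flights(origin: str, dest: str) -> list:
    return [{"flight_id": "F1", "price": 420}]      # made-up result


used, flights = call_with_fallbacks(
    [("search_flights", search_flights), ("search_alternative_flights", search_alternative_flights)],
    origin="NYC", dest="LON",
)
print(used)  # search_alternative_flights
```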


Planning and Memory

Planning heavily depends on memory – both for context and for storing the plan itself.

  • Short‑term memory – Provides recent conversation context (e.g., user previously said “I prefer early flights”). The planner must consider this when generating steps.
  • Long‑term memory – Stores successful plans for similar goals. A planner can retrieve a plan template instead of generating from scratch.
  • Plan store – A dedicated state component that holds the current plan, which steps are completed, their outputs, and any pending replan requests.

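Context window consideration: Keep the serialised plan out of the prompt as much as possible. A 20‑step plan serialised naively, together with its step outputs, can crowd out the rest of the context or exceed token limits. Store plans in the state manager and inject only the next step (and the outputs it depends on) into the LLM context.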
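A minimal sketch of "inject only the next step" follows. It assumes the full plan lives in the state manager as a list of step dicts, listed in dependency order, and that each completed step has a short text summary. planning_context is a hypothetical helper, not a library function.

```python
import json
from typing import Any


def planning_context(plan: list[dict[str, Any]], done: dict[int, str], max_chars: int = 200) -> str:
    """Render only what the LLM needs for the next decision.

    plan: every step, in dependency order (kept in the state manager, never sent in full).
    done: step id -> short summary of that step's output.
    """
    pending = [step for step in plan if step["id"] not in done]
    if not pending:
        return "All steps completed."
    nxt = pending[0]
    lines = [f"Completed {len(done)} of {len(plan)} steps."]
    for dep in nxt.get("depends_on", []):
        lines.append(f"step{dep} output: {done[dep][:max_chars]}")   # truncated summary
    lines.append("Next step: " + json.dumps(nxt))
    return "\n".join(lines)


plan = [
    {"id": 1, "action": "search_flights", "params": {"origin": "NYC", "dest": "LON"}},
    {"id": 2, "action": "filter_flights", "params": {"max_price": 500}, "depends_on": [1]},
    {"id": 3, "action": "book_flight", "params": {"flight_id": "$step2.result.selected_id"},
     "depends_on": [2]},
]
done = {1: "12 flights found", 2: "cheapest under 500: BA117"}
print(planning_context(plan, done))
```

Step 1's summary is left out because step 3 depends only on step 2. The prompt therefore grows with the next step's dependencies, not with the length of the plan.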


Planning Patterns

Plan‑and‑Execute

The classic pattern: generate full plan upfront, then execute each step sequentially.

Advantages: Simple, transparent, easy to debug.
Disadvantages: Cannot adapt to unexpected tool outputs; if step 2 fails, the rest of the plan may be useless.

Implementation: One LLM call for planning, then a loop.
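The pattern fits in a few lines: one planning call, a registry check, then a plain loop. In this sketch, llm is any function from prompt to text (a stub stands in for a real model), the JSON step format is an assumption, and the tool functions are hypothetical.

```python
import json
from typing import Any, Callable


def plan_and_execute(goal: str, llm: Callable[[str], str],
                     tools: dict[str, Callable[..., Any]]) -> list[Any]:
    prompt = (
        'Return only a JSON list of steps shaped like {"action": <tool>, "params": {...}}.\n'
        f"Available tools: {sorted(tools)}\nGoal: {goal}"
    )
    steps = json.loads(llm(prompt))                 # the single planning call
    unknown = [s["action"] for s in steps if s["action"] not in tools]
    if unknown:                                     # reject hallucinated tools up front
        raise ValueError(f"plan uses unknown tools: {unknown}")
    # Execute in order; nothing here reacts to a step's output, which is the
    # pattern's main weakness.
    return [tools[s["action"]](**s.get("params", {})) for s in steps]


def fake_llm(prompt: str) -> str:                   # stand-in for a real model call
    return json.dumps([
        {"action": "create_user", "params": {"name": "ada"}},
        {"action": "send_welcome_email", "params": {"name": "ada"}},
        {"action": "log_event", "params": {"event": "user_onboarded"}},
    ])


tools = {
    "create_user": lambda name: f"created {name}",
    "send_welcome_email": lambda name: f"emailed {name}",
    "log_event": lambda event: f"logged {event}",
}
print(plan_and_execute("Create user ada, send welcome email, log event", fake_llm, tools))
# ['created ada', 'emailed ada', 'logged user_onboarded']
```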

ReAct (Reasoning + Acting)

Interleaves planning and execution: at each turn, the LLM outputs a thought (reasoning), then an action, then observes the result, and repeats.

Advantages: Highly adaptive, good for exploration.
Disadvantages: More LLM calls (higher cost and latency); can get stuck in loops.

Implementation: Prompt instructs model to output Thought: ... Action: tool_name[params].
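Below is a minimal ReAct loop built around the "Thought: ... Action: tool_name[params]" format described above. Three pieces are conventions assumed for this sketch, not a fixed standard:

- the "Final Answer:" stop marker;
- the single string argument per tool;
- the scripted stub that stands in for the model.

```python
import re
from typing import Callable

ACTION_RE = re.compile(r"Action:\s*(\w+)\[(.*)\]")   # e.g. "Action: get_weather[London]"


def react(question: str, llm: Callable[[str], str],
          tools: dict[str, Callable[[str], str]], max_turns: int = 5) -> str:
    transcript = f"Question: {question}\n"
    for _ in range(max_turns):                      # hard cap guards against loops
        output = llm(transcript)                    # Thought + Action, or a Final Answer
        transcript += output + "\n"
        if "Final Answer:" in output:
            return output.split("Final Answer:", 1)[1].strip()
        match = ACTION_RE.search(output)
        if match is None:
            observation = "could not parse an Action line"
        else:
            name, arg = match.groups()
            tool = tools.get(name)
            observation = tool(arg) if tool else f"unknown tool {name}"
        transcript += f"Observation: {observation}\n"   # fed back on the next turn
    raise RuntimeError(f"no final answer after {max_turns} turns")


script = iter([
    "Thought: I need current weather data.\nAction: get_weather[London]",
    "Thought: I have what I need.\nFinal Answer: light rain, 14 degrees",
])
answer = react("What's the weather in London?", lambda prompt: next(script),
               {"get_weather": lambda city: f"{city}: light rain, 14 degrees"})
print(answer)  # light rain, 14 degrees
```

Each turn costs one model call, and the transcript grows with every observation. That growth is where the pattern's extra cost and latency come from.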

Tree of Thoughts (ToT)

Explores multiple reasoning paths in parallel, evaluating each branch, then chooses the best.

Advantages: Better for complex reasoning (math, puzzles).
Disadvantages: Very high token cost, complex to implement.

Implementation: Multiple LLM calls to generate candidate next steps, score them, and prune.
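The generate–score–prune cycle can be sketched as a breadth-first beam search, one of the search strategies used with Tree of Thoughts. In a real agent, propose() and score() would each be LLM calls. Here they are toy functions on a number puzzle so the search itself is visible. The function name and parameters are hypothetical.

```python
from typing import Callable, Sequence, TypeVar

State = TypeVar("State")


def tree_of_thoughts(root: State,
                     propose: Callable[[State], Sequence[State]],
                     score: Callable[[State], float],
                     beam_width: int = 2, depth: int = 3) -> State:
    """Breadth-first Tree-of-Thoughts search with a fixed beam.

    propose() would generate candidate next thoughts and score() would rate a
    partial solution; both are usually LLM calls.
    """
    frontier = [root]
    for _ in range(depth):
        candidates = [child for state in frontier for child in propose(state)]
        if not candidates:
            break
        # Keep only the most promising partial solutions (pruning).
        frontier = sorted(candidates, key=score, reverse=True)[:beam_width]
    return max(frontier, key=score)


# Toy stand-in task: pick three numbers from {1, 2, 3} that sum to 7.
best = tree_of_thoughts(
    root=(),
    propose=lambda path: [path + (n,) for n in (1, 2, 3)],
    score=lambda path: -abs(7 - sum(path)),
)
print(best, sum(best))  # (3, 3, 1) 7
```

Every level calls propose on each kept state and score on every candidate. With a branching factor b and beam width k, each level costs about k·b scoring calls, which explains the high token cost noted above.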

Reflection / Self‑Correction

After plan execution (or after each step), the agent reflects on the outcome and refines the plan for the next iteration.

Advantages: Improves over time, can learn from mistakes.
Disadvantages: Requires storing feedback in memory.

Implementation: After plan completion, call LLM with prompt: “Given the outcome, how could the plan be improved?”
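The reflection step only pays off if its output reaches the next planning prompt. This sketch stores one-line "lessons" in a list that stands in for long-term memory. It uses the reflection prompt quoted above and a stub in place of a real LLM. ReflectiveAgent is a hypothetical class.

```python
from typing import Callable


class ReflectiveAgent:
    """Keep one-line lessons from past runs and feed them into later planning prompts."""

    def __init__(self, llm: Callable[[str], str], max_lessons: int = 5):
        self.llm = llm
        self.lessons: list[str] = []          # stands in for long-term memory
        self.max_lessons = max_lessons

    def planning_prompt(self, goal: str) -> str:
        notes = "\n".join(f"- {lesson}" for lesson in self.lessons) or "- (none yet)"
        return f"Lessons from earlier runs:\n{notes}\n\nGoal: {goal}\nReturn a plan."

    def reflect(self, goal: str, plan: list[str], outcome: str) -> str:
        prompt = (f"Goal: {goal}\nPlan: {' -> '.join(plan)}\nOutcome: {outcome}\n"
                  "Given the outcome, how could the plan be improved? Answer in one sentence.")
        lesson = self.llm(prompt).strip()
        self.lessons = (self.lessons + [lesson])[-self.max_lessons:]   # keep the newest few
        return lesson


agent = ReflectiveAgent(llm=lambda prompt: "Skip search_contact when the user gives an address.")
agent.reflect("Email John", ["search_contact", "send_email"], "succeeded, but used an extra step")
print(agent.planning_prompt("Email Mary"))
```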

Workflow‑Based Planning

A hybrid approach: a fixed workflow DAG defines the high‑level steps, but each node contains an agent with local planning (e.g., a “research” node that uses ReAct).

Advantages: Combines predictability with flexibility.
Disadvantages: More complex to design.

Implementation: Use LangGraph with conditional edges and sub‑graphs.

| Pattern | LLM calls | Adaptivity | Cost | Best for |
|---|---|---|---|---|
| Plan‑and‑Execute | 1 (plan) + 1 (final) | Low | Low | Well‑understood, deterministic tasks |
| ReAct | 2+ per action | High | Medium | Dynamic tasks, tool use |
| Tree of Thoughts | Many | Very high | High | Reasoning‑heavy, open‑ended |
| Reflection | Plan + reflection | Medium | Medium | Iterative improvement |
| Workflow‑based | Varies | Medium | Medium | Enterprise processes with sub‑tasks |

Framework comparison:

| Framework | Planning model | Strengths | Weaknesses | Use case |
|---|---|---|---|---|
| LangGraph | Graph‑based + conditional edges; planner can be any node. | Full control, checkpointing, replanning as graph cycle. | Steeper learning curve. | Complex, long‑running workflows. |
| CrewAI | Sequential or hierarchical tasks; built‑in Planner agent. | Simple, role‑based planning. | Limited dynamic replanning; not good for loops. | Linear business processes. |
| AutoGen | Conversation‑driven; agents plan via chat. | Natural multi‑agent planning. | Inefficient for pure plan‑execute; high token usage. | Multi‑agent deliberation. |
| OpenAI Agents SDK | Agent loop with tool calls and handoffs; no explicit planner component. | Simple, low latency. | No built‑in plan object or replanning; multi‑step behaviour is left to the model’s tool‑calling loop. | Tool use and handoff‑based routing. |
| Semantic Kernel | Plan object with steps; sequential planner via LLM. | Good for .NET/Java, enterprise. | Basic planning, no built‑in replanning. | Simple automation. |

Recommendation: For production systems that require robust planning (replanning, conditional branching, human‑in‑the‑loop), use LangGraph. For quick prototypes with linear plans, CrewAI or Semantic Kernel suffice.


Production Planning Challenges

| Challenge | Description | Mitigation |
|---|---|---|
| Hallucinated plans | LLM invents non‑existent tools or steps. | Validate plan against tool registry; reject plans with unknown actions. |
| Excessive tool usage | Plan includes redundant or inefficient steps. | Limit plan depth (max 10 steps). Use plan caching. |
| Infinite loops | Replanning never converges. | Set max replan attempts (3). Detect repeated states. |
| Cost escalation | Planning itself requires LLM calls. Each replan adds cost. | Use cheaper model for planning (e.g., GPT‑3.5). Cache identical plans. |
| Context drift | Over many turns, the plan grows and overwhelms context. | Summarise completed steps; only keep next step and dependencies in prompt. |
| Dependency failures | A tool fails, breaking all downstream steps. | Include fallback steps in plan (e.g., if tool A fails, use tool B). |
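The "max replan attempts" and "detect repeated states" mitigations can share a single guard. In this sketch, a state is fingerprinted as the proposed plan plus the set of completed step ids. The choice of fingerprint is an assumption; agents with richer state may need more fields. LoopGuard is a hypothetical name.

```python
import hashlib
import json


class LoopGuard:
    """Refuse a replan when the budget is spent or the same state comes back."""

    def __init__(self, max_replans: int = 3):
        self.max_replans = max_replans
        self.replans = 0
        self.seen: set[str] = set()

    @staticmethod
    def fingerprint(plan: list[dict], completed: list[int]) -> str:
        # sort_keys=True so logically identical plans serialise identically
        blob = json.dumps({"plan": plan, "completed": sorted(completed)}, sort_keys=True)
        return hashlib.sha256(blob.encode()).hexdigest()

    def start(self, plan: list[dict]) -> None:
        self.seen.add(self.fingerprint(plan, []))

    def allow_replan(self, new_plan: list[dict], completed: list[int]) -> bool:
        if self.replans >= self.max_replans:
            return False
        fp = self.fingerprint(new_plan, completed)
        if fp in self.seen:
            return False                  # same plan from the same state: a loop
        self.seen.add(fp)
        self.replans += 1
        return True


guard = LoopGuard(max_replans=3)
guard.start([{"action": "web_search"}])
print(guard.allow_replan([{"action": "search_trustpilot"}], completed=[1, 2]))  # True
print(guard.allow_replan([{"action": "search_trustpilot"}], completed=[1, 2]))  # False (repeat)
```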

Evaluating Agent Planning

You cannot improve what you do not measure. Evaluate planning separately from execution.

| Metric | Definition | How to measure |
|---|---|---|
| Plan accuracy | % of generated plans that are valid (all tools exist, parameters correct). | Parse plan JSON; check tool registry; count errors. |
| Task completion rate | % of user goals fully satisfied by following the plan. | End‑to‑end test with expected outcome. |
| Plan efficiency | Number of steps taken vs. optimal (human‑curated) plan. | Compare step count; compute overhead ratio. |
| Replan success rate | When a step fails, replan leads to success vs. total replan attempts. | Trace logs: replan_triggered → final_success. |
| Cost per plan | Total tokens (LLM calls) used during planning phase. | Accumulate token usage for planning prompts. |
| Planning latency | Time from goal input to first executable step. | Measure time before tool execution begins. |

Example evaluation set:

| Goal | Expected plan | Generated plan | Correct? | Steps | Tokens |
|---|---|---|---|---|---|
| “Send email to <john’s email address>” | send_email(<john’s email address>) | search_contact(john) → send_email | No (over‑complicated) | 2 vs 1 | 450 |
| “Order status for #123” | get_order_status(123) | get_order_status(123) | Yes | 1 | 210 |
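Several of the metrics above can be computed directly from such an evaluation set. This sketch encodes the two example rows by action name only, so "plan accuracy" here checks tool existence but not parameters. PlanCase, evaluate and the metric keys are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class PlanCase:
    goal: str
    expected: list[str]     # human-curated plan (action names only)
    generated: list[str]    # what the planner produced
    tokens: int             # planning tokens spent


def evaluate(cases: list[PlanCase], registry: set[str]) -> dict[str, float]:
    n = len(cases)
    return {
        # valid = every action exists (parameter checks omitted in this sketch)
        "plan_accuracy": sum(all(a in registry for a in c.generated) for c in cases) / n,
        "exact_match": sum(c.generated == c.expected for c in cases) / n,
        # 1.0 means as short as the curated plan; above 1.0 means extra steps
        "step_ratio": sum(len(c.generated) / len(c.expected) for c in cases) / n,
        "mean_tokens": sum(c.tokens for c in cases) / n,
    }


cases = [
    PlanCase("send email", ["send_email"], ["search_contact", "send_email"], 450),
    PlanCase("order status #123", ["get_order_status"], ["get_order_status"], 210),
]
print(evaluate(cases, {"send_email", "search_contact", "get_order_status"}))
# {'plan_accuracy': 1.0, 'exact_match': 0.5, 'step_ratio': 1.5, 'mean_tokens': 330.0}
```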

Best Practices

  1. Always validate the plan before execution – Check that every tool exists, required parameters are present, and types match. Reject invalid plans.

  2. Set a maximum plan depth – Hard limit of 10–15 steps. Beyond that, ask the user to refine the goal.

  3. Implement plan caching – Store successful plans keyed by goal intent (embedding similarity). Retrieve and reuse.

  4. Use a cheaper LLM for planning – GPT‑3.5‑turbo or Llama 3 8B is often enough; reserve GPT‑4 for final reasoning.

  5. Include conditional branches – Teach the planner to output condition fields (e.g., if result.success else ...).

  6. Replan only when necessary – Not every tool failure needs a replan. Transient errors: retry. Schema errors: replan.

  7. Keep the plan in state, not in prompts – Store plan in a structured object; inject only the next step into the LLM context.

  8. Monitor planning failures – Alert when plan validation fails on more than 5% of requests; that signals prompt or tool registry issues.

  9. Test planning in isolation – Mock tool execution; verify that the planner produces correct plans for a suite of goals.

  10. Provide tool documentation in the planning prompt – Include tool names, descriptions, parameter schemas, and example calls. The better the documentation, the better the plan.

  11. Version your planning prompts – Changes to prompts can drastically change plan quality. A/B test new prompts before full rollout.

  12. Log every plan – Save the generated plan, the final executed steps, and any replans. This is gold for debugging.
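Practice 3 (plan caching keyed by embedding similarity) can be sketched as a nearest-neighbour lookup with a similarity threshold. The embeddings are assumed to come from whatever embedding model the agent already uses. Toy 3-dimensional vectors stand in for them here, and the 0.9 threshold is an arbitrary example. A reused plan should still go through the validation in practice 1, because the tool registry may have changed since the plan was cached.

```python
import math
from typing import Optional


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class PlanCache:
    """Return a stored plan whose goal embedding is similar enough to the new goal's."""

    def __init__(self, threshold: float = 0.9):
        self.threshold = threshold
        self.entries: list[tuple[list[float], list[dict]]] = []

    def put(self, goal_embedding: list[float], plan: list[dict]) -> None:
        self.entries.append((goal_embedding, plan))

    def get(self, goal_embedding: list[float]) -> Optional[list[dict]]:
        best_plan, best_sim = None, self.threshold
        for embedding, plan in self.entries:      # linear scan; use a vector index at scale
            sim = cosine(goal_embedding, embedding)
            if sim >= best_sim:
                best_plan, best_sim = plan, sim
        return best_plan


cache = PlanCache(threshold=0.9)
# Toy 3-d vectors stand in for real embeddings of the goal text.
cache.put([1.0, 0.0, 0.1], [{"action": "get_order_status"}])
print(cache.get([0.98, 0.05, 0.1]) is not None)   # True: near-duplicate goal
print(cache.get([0.0, 1.0, 0.0]))                  # None: unrelated goal
```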


Common Planning Mistakes

| Mistake | Consequence | Fix |
|---|---|---|
| Overly complex plans | High latency, token cost, more failure points. | Limit depth; prefer atomic steps. |
| No replanning | Agent fails when first plan encounters an unexpected tool result. | Implement max_replans and a feedback loop. |
| Unlimited tool execution | Plan can loop forever. | Set iteration limit, detect repeated states. |
| Ignoring memory quality | Plan repeats steps because memory retrieval is poor. | Improve memory recall; test with relevant history. |
| No evaluation framework | Cannot tell if planning improvements actually help. | Build offline test suite before writing planner. |
| Hardcoding tool names | Changing tool schema breaks all plans. | Use tool registry and dynamic lookup. |
| Planning as an afterthought | Agent behaves reactively, never decomposing goals. | Start with plan‑and‑execute pattern, then refine. |

Case Study: Enterprise Research Agent

Domain: Competitive intelligence. An analyst asks the agent: “Gather information on Acme Corp’s latest product launch, their pricing strategy, and any customer complaints from the last 3 months. Produce a summary with citations.”

Initial Plan (generated by planner)

{
  "steps": [
    {"action": "web_search", "params": {"query": "Acme Corp product launch 2026"}},
    {"action": "web_search", "params": {"query": "Acme Corp pricing strategy"}},
    {"action": "web_search", "params": {"query": "Acme Corp customer complaints after:2026-03-01"}},
    {"action": "extract_dates", "params": {"source": "$step1.result"}},
    {"action": "sentiment_analysis", "params": {"text": "$step3.result"}},
    {"action": "generate_summary", "params": {"sources": ["$step1","$step2","$step3"], "insights": ["$step4","$step5"]}}
  ]
}

Execution and Replanning

  • Step 1 succeeds → returns three product launch articles.
  • Step 2 succeeds → finds the pricing page.
  • Step 3 fails → the web search returns no complaints (perhaps customers file them under a different brand or support name).
    Replan triggered. The planner replaces step 3 with:
    {"action": "search_trustpilot", "params": {"company": "Acme Corp"}}
  • The new step succeeds → returns complaints.
  • Steps 4 and 5 execute.
  • Step 6 generates the final summary with citations.

Optimisation Opportunities

  • Plan caching – Future “gather info on {competitor}” requests reuse the same plan template, reducing LLM calls.
  • Parallel execution – Steps 1, 2, and 3 are independent, so the workflow engine can execute them concurrently (cutting latency from 9 s to 3 s).
  • Tool‑specific fallbacks – Instead of replanning, the planner could have included a fallback_tool field in the original step.
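Running the three independent searches concurrently is straightforward with asyncio. In this sketch, web_search is a hypothetical async stand-in that sleeps 0.2 s instead of calling a real search API. The point is that total wall-clock time tracks the slowest call, not the sum of all calls.

```python
import asyncio
import time


async def web_search(query: str) -> str:
    await asyncio.sleep(0.2)              # stand-in for a slow network call
    return f"results for {query!r}"


async def run_independent(queries: list[str]) -> list[str]:
    # gather() runs the coroutines concurrently and returns results in input order
    return list(await asyncio.gather(*(web_search(q) for q in queries)))


queries = ["Acme Corp product launch 2026",
           "Acme Corp pricing strategy",
           "Acme Corp customer complaints after:2026-03-01"]
start = time.perf_counter()
results = asyncio.run(run_independent(queries))
elapsed = time.perf_counter() - start
print(f"{len(results)} searches in {elapsed:.1f}s")   # roughly 0.2s rather than 0.6s
```

Only steps with no dependency between them are safe to run this way. Steps 4 and 5 still have to wait for the searches they read from.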

Result: The agent completed the research in 12 seconds, used 5 tool calls, 2 planning LLM calls (initial + replan), cost $0.08. Without planning, a single LLM call would have tried to answer from memory and hallucinated.


FAQ

1. Is planning always necessary for an AI agent?
No. For single‑step tool calls (e.g., “what’s the weather”), planning is overkill. Use planning only when the task requires multiple tools or conditional logic.

2. What is the difference between planning and workflows?
Planning generates the sequence dynamically. Workflows are pre‑defined DAGs. A workflow agent may have no planning component; it simply executes a fixed path. Planning is for adaptive systems.

3. When should an agent replan?
Replan when: (a) a tool fails with a recoverable error (e.g., “no results” → try alternative tool), (b) a tool returns unexpected data that changes the goal, or (c) the user interrupts with new instructions.
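The split between retrying and replanning (also best practice 6) can be made explicit with a small classifier. TransientError and BadParamsError are hypothetical exception types that a tool wrapper would raise. The backoff values are arbitrary examples, and an empty result is treated as "no results → replan".

```python
import time
from enum import Enum
from typing import Any, Callable, Optional


class Outcome(Enum):
    OK = "ok"
    RETRY = "retry"      # transient: timeout, rate limit
    REPLAN = "replan"    # recoverable, but needs a different step or tool
    FAIL = "fail"        # surface to the user


class TransientError(Exception):
    """Hypothetical: raised by a tool wrapper on timeouts, HTTP 429 or 503."""


class BadParamsError(Exception):
    """Hypothetical: raised when a step's parameters violate the tool schema."""


def classify(result: Any = None, error: Optional[Exception] = None) -> Outcome:
    if isinstance(error, TransientError):
        return Outcome.RETRY
    if isinstance(error, BadParamsError):
        return Outcome.REPLAN
    if error is not None:
        return Outcome.FAIL
    return Outcome.OK if result else Outcome.REPLAN   # empty result, e.g. "no results"


def call_tool(tool: Callable[[], Any], retries: int = 2, backoff: float = 0.01) -> tuple[Outcome, Any]:
    for attempt in range(retries + 1):
        try:
            result = tool()
            outcome = classify(result=result)
        except Exception as exc:
            result, outcome = None, classify(error=exc)
        if outcome is not Outcome.RETRY:
            return outcome, result
        if attempt < retries:
            time.sleep(backoff * 2 ** attempt)          # exponential backoff
    return Outcome.REPLAN, None                          # retries exhausted: let the planner decide


calls = []


def flaky_search() -> list:
    calls.append(1)
    if len(calls) == 1:
        raise TransientError("HTTP 429")
    return ["review: late delivery"]


print(call_tool(flaky_search))   # OK after one retry
print(call_tool(lambda: []))     # REPLAN: no results
```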

4. How deep should planning trees be?
For most tasks, 5–10 steps. Deeper plans (>15) are hard for LLMs to generate correctly and expensive to execute. Decompose large goals into multiple agent turns.

5. Does LangGraph support planning?
Yes, implicitly. LangGraph does not have a separate “planner” node, but you can implement planning as a graph node that calls an LLM to produce a plan, then use conditional edges to execute steps and loop back to the planner for replanning.

6. Can I use planning without LLMs?
Classic AI planning (STRIPS, PDDL) works for deterministic domains. But for natural language goals, LLM‑based planning is far more practical.
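For comparison, classic STRIPS planning over a deterministic domain needs no LLM at all. Each action has preconditions, add effects and delete effects, and a search over states finds the plan. This sketch runs a breadth-first forward search on a simplified version of the onboarding example from the domains table. The fact names are made up for illustration.

```python
from collections import deque
from typing import NamedTuple, Optional


class Action(NamedTuple):
    name: str
    pre: frozenset      # facts that must hold before the action
    add: frozenset      # facts the action makes true
    delete: frozenset   # facts the action makes false


def strips_plan(init: set, goal: set, actions: list) -> Optional[list]:
    """Breadth-first forward search over STRIPS states; returns a shortest plan or None."""
    start = frozenset(init)
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, path = queue.popleft()
        if goal <= state:
            return path
        for action in actions:
            if action.pre <= state:
                nxt = (state - action.delete) | action.add
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, path + [action.name]))
    return None


actions = [
    Action("send_welcome_message", frozenset({"has_slack", "has_github"}), frozenset({"welcomed"}), frozenset()),
    Action("create_slack_account", frozenset(), frozenset({"has_slack"}), frozenset()),
    Action("create_github_account", frozenset(), frozenset({"has_github"}), frozenset()),
]
print(strips_plan(set(), {"welcomed"}, actions))
# ['create_slack_account', 'create_github_account', 'send_welcome_message']
```

The catch is the one the answer above points out: every fact and action must be hand-modelled in advance. An LLM planner can instead start from a free-text goal.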

7. How do I prevent the planner from inventing non‑existent tools?
Provide a strict list of available tools in the planning prompt, with names and parameter schemas. Also post‑validate the plan against the registry.

8. What is the cost overhead of planning?
Each planning LLM call adds tokens. For a typical 5‑step plan, planning might consume 20–30% of total tokens. Use a cheaper model for planning to reduce cost.

9. How do I evaluate a planner offline?
Create a dataset of (goal, expected plan). Run your planner on each goal, compare generated plan to expected plan (exact match or semantic similarity). Measure success rate and token usage.

10. Can planning be done in parallel for multiple goals?
Yes, if your agent handles batch requests. However, each goal should have its own plan and state. Do not interleave steps from different plans.

11. What is the role of memory in replanning?
Memory stores the outcomes of previous steps. The replanner must access that memory to decide which steps to skip, retry, or replace.

12. How does human‑in‑the‑loop affect planning?
The plan can include a wait_for_human action. The planner must understand that such steps pause execution indefinitely and require resumption with new input.

13. Is Tree of Thoughts production‑ready?
Rarely. The token cost is high, and latency is unpredictable. ToT is mostly for research. Production systems use ReAct or Plan‑and‑Execute.

14. Can a planner learn from past successes?
Yes, by storing successful plans in long‑term memory and retrieving them via similarity search. This is called “plan reuse” or “case‑based planning.”

15. What is the single biggest mistake teams make with planning?
Building a planner that never replans. When a tool returns an unexpected result, the agent blindly continues or crashes. Always include a replanning loop.




This article is part of the AgentDevPro Production Agent Engineering Handbook. Updated for Q2 2026.