Anup Shinde
AI

AI Agent That Asks Before Acting in Python - Building Agents Part 3

August 31, 2025 6 min read

Build a Clarifying Agent in Python with GPT-5 that asks follow-up questions until inputs are clear, providing a safer, smarter, and user-friendly experience.

Clarifying AI Agent
Photo by julien Tromeur / Unsplash
Ai Agents Python — Part 3 of 3

Introduction: Why We Need Clarifying Agents

In Part 1 , we built a Simple Decision-Making Agent that worked on structured inputs.
In Part 2 , we added an LLM “perception layer” that turns natural language into structured JSON using response_format + json_schema.

But there’s still a problem:

User: “I’m kinda hungry and I don’t have much time.”

How hungry is “kinda”? Is “not much time” five minutes or fifteen?
Rather than guessing, a good agent asks clarifying questions until everything is clear and understood.

NOTE: This article builds directly on Part 2: LLM-Powered Agent , and we’ll be referring back to it often. So if you haven’t read it yet, we recommend starting there.


Enter the Clarifying Agent

Our agent will:

  • Detect what’s missing or vague (ingredients, hunger 1–5, time in minutes).
  • Ask one clarifying question at a time.
  • Loop until all three fields are complete.
  • Only then, extract structured JSON (with the same schema approach as Part 2 ).

How the Clarifying Agent Works


Environment Setup

Following the same baseline as Part 2 , and similar to it, we’ll load the environment from a .env file for convenience.

python3 -m venv ~/.venvs/agents
source ~/.venvs/agents/bin/activate
pip install -U openai python-dotenv

Create a .env file:

OPENAI_API_KEY=sk-...

And load it in Python:

from dotenv import load_dotenv
load_dotenv()  # loads OPENAI_API_KEY from .env

Code: Clarifying Agent with Schema-Constrained Extraction

Below is a safe, production-friendly implementation that:

  • Uses a <DONE/> sentinel to avoid false “done” matches inside natural text.
  • Caps clarifications with max_rounds to prevent infinite loops.
  • Extracts JSON via response_format={'type':'json_schema'} with the same schema as Part 2 .
  • Validates types/ranges and normalizes inputs.

1) Shared JSON Schema (same as Part 2)

MEAL_SCHEMA = {
    "type": "object",
    "properties": {
        "ingredients": {"type": "array", "items": {"type": "string"}},
        "hunger": {"type": "integer", "minimum": 1, "maximum": 5},
        "time": {"type": "integer", "minimum": 0}
    },
    "required": ["ingredients", "hunger", "time"],
    "additionalProperties": False
}

2) Rule Engine + Tiny Recipe Base

from dataclasses import dataclass
from typing import List

def normalize_items(items: List[str]) -> List[str]:
    return [i.strip().lower() for i in items]

@dataclass
class MealAgent:
    ingredients: List[str] = None
    hunger: int = 0
    time: int = 0

    def __post_init__(self):
        # Use your recipe list
        self.recipes = [
            {
                "name": "Egg Fried Rice",
                "ingredients": ["rice", "eggs"],
                "time": 15,
                "filling": 4
            },
            {
                "name": "Cheese Omelette Sandwich",
                "ingredients": ["bread", "cheese", "eggs"],
                "time": 10,
                "filling": 5
            },
            {
                "name": "Bread and Jam",
                "ingredients": ["bread", "jam"],
                "time": 5,
                "filling": 2
            }
        ]

    def perceive(self, ingredients: List[str], hunger: int, minutes: int):
        self.ingredients = normalize_items(ingredients)
        self.hunger = int(hunger)
        self.time = int(minutes)

    def act(self) -> str:
        have = set(self.ingredients)

        # Only requirement: recipe "need" ⊆ user "have"
        # (extra ingredients like "vegetables" are ignored)
        feasible = []
        for recipe in self.recipes:
            need = set(map(str.lower, recipe["ingredients"]))
            if need.issubset(have) and recipe["time"] <= self.time:
                feasible.append(recipe)

        if feasible:
            # If very hungry, prefer higher filling; otherwise prefer quicker
            feasible.sort(key=lambda r: (-(r["filling"] if self.hunger >= 4 else 0), r["time"]))
            top = feasible[0]
            tag = " (quick & filling!)" if self.hunger >= 4 and top["filling"] >= 4 else ""
            return f'{top["name"]}{tag}'

        return "No suitable recipe found."

3) Clarifying + Extraction

import json
from typing import List, Callable
from openai import OpenAI
from dotenv import load_dotenv

load_dotenv()
client = OpenAI()

class ClarifyingMealAgent:
    def __init__(self, ask_user=input, max_rounds=8):
        self.ask_user = ask_user
        self.max_rounds = max_rounds
        self.ingredients: List[str] = []
        self.hunger: int = 0
        self.time: int = 0

    def clarify_until_clear(self, user_input: str) -> str:
        convo = user_input.strip()
        rounds = 0
        while True:
            rounds += 1
            if rounds > self.max_rounds:
                print("⚠️ Too many clarification rounds. Proceeding with best effort.")
                return convo

            prompt = f'''
You are helping extract three fields from a user's cooking context: 
1) ingredients (list of items). (list[str]): Canonical ingredient names only, pluralized where natural. 
2) hunger (integer 1-5) 
3) time (integer minutes) 
• Remove numbers (e.g., "2 eggs" → "eggs") 
• Remove adjectives (e.g., "leftover rice" → "rice", "boiled egg" → "eggs") 
• Use simple grocery-style labels that match common recipes. 

Do NOT infer missing values; if anything is vague or missing, ask a direct question to get that exact value.

Conversation so far:
--- START ---
{convo}
--- END ---

If ALL THREE are present and clear, reply with:
<DONE/>

If NOT complete, ask ONE clarifying question about the most missing/unclear field. No extra commentary.
'''
            reply = self.ask_gpt(prompt).strip()
            if "<DONE/>" in reply:
                return convo
            print("🤖 Clarifying:", reply)
            user_reply = self.ask_user("👤 Your answer: ").strip()
            convo += "\n" + reply + "\n" + user_reply

    def perceive(self, user_input: str) -> None:
        full = self.clarify_until_clear(user_input)

        resp = client.chat.completions.create(
            model="gpt-5",
            messages=[
                {"role": "system", 
                    "content": (
                        "Extract meal inputs as JSON matching the schema. "
                        "Normalize 'ingredients' to these allowed labels ONLY: "
                        "rice, eggs, bread, cheese, jam, vegetables. "
                        "Remove quantities (e.g., '2 eggs' -> 'eggs') and adjectives "
                        "(e.g., 'leftover rice' -> 'rice', 'boiled egg' -> 'eggs'). "
                        "Map synonyms like 'veg'/'veggies' -> 'vegetables'. "
                        "Return only canonical grocery-style labels."
                    )
                },
                {"role": "user", "content": full},
            ],
            response_format={
                "type": "json_schema",
                "json_schema": {"name": "meal_inputs", "schema": MEAL_SCHEMA},
            },
        )
        raw = resp.choices[0].message.content
        data = json.loads(raw)

        ingredients = data.get("ingredients", [])
        hunger = int(data.get("hunger", 0))
        minutes = int(data.get("time", 0))
        if not isinstance(ingredients, list) or not ingredients:
            raise ValueError("ingredients must be a non-empty list of strings.")
        if hunger < 1 or hunger > 5:
            raise ValueError("hunger must be an integer 1..5.")
        if minutes <= 0 or minutes > 24*60:
            raise ValueError("time must be a positive integer (minutes).")

        self.ingredients = [str(i).strip().lower() for i in ingredients]
        self.hunger = hunger
        self.time = minutes

    def ask_gpt(self, prompt: str) -> str:
        r = client.chat.completions.create(model="gpt-5", messages=[{"role":"user","content":prompt}])
        return r.choices[0].message.content

Why “Do NOT infer” (vs Part 2):

In Part 2 , we let the model guess vague details (e.g., treating “in a rush” as 10 minutes). That’s convenient for demos, but risky. Guesses can be wrong, inconsistent across runs, and hard to explain to users. In this part, the clarify loop flips this: ask, don’t guess. If the user says “kinda hungry” or “not much time,” the agent asks a single, direct question to pin down the exact number.

What this buys you:

You receive reliable, user-confirmed inputs before any decision logic is executed, which enhances trust and reduces silent errors. It also ensures stable behavior for testing/telemetry, and maintains the integrity of schema extraction; no backfilling or hallucinated values. In short: clearer inputs, fewer surprises, better UX.

4) Wire it up

if __name__ == "__main__":
    clarifier = ClarifyingMealAgent()
    meal = MealAgent()

    user_text = "I have leftover rice, two eggs, and veggies. I am kinda hungry and in a rush."
    clarifier.perceive(user_text)

    meal.perceive(clarifier.ingredients, clarifier.hunger, clarifier.time)
    suggestion = meal.act()
    print("🍽️ Suggestion:", suggestion)

Sample Run

Input:
I have leftover rice and two eggs, and veggies. I’m kinda hungry and in a rush.

Output:

🤖 Clarifying: How many minutes do you have to cook?
👤 Your answer: 15
🤖 Clarifying: On a scale of 1–5, how hungry are you?
👤 Your answer: 4
🍽️ Suggestion: Egg Fried Rice (quick & filling!)

Why <DONE/> and max_rounds?

  • <DONE/> sentinel: the plain word done can appear in conversation (“I’m done cooking”). A unique tag reduces false positives, so the loop only exits on an explicit signal from the LLM.
  • max_rounds: guards against infinite loops if a user goes silent or keeps replying vaguely. After N tries, the agent proceeds with best effort or shows a graceful message.

Why This Is Better

  • Matches Part 2's schema-constrained output pattern for reliable JSON.
  • Adaptive: asks only what’s missing; feels conversational.
  • Safe: bounded loop + validated inputs.
  • Practical: plugs straight into the Part 1 rule engine + recipes for instant results.

Wrap-Up

This finishes the 3-part series: rules → LLM perception → clarifying loop. From here, you can add memory (favorite meals), tools (web recipe fetchers), or a UI layer.