The Human Side of Agentic Systems: Why the Agent Industry Is Designing for Machines, Not People
Part 4 of 6 | By Tamil Selvan Gunasekaran, AI Agent Developer Intern at Autohive & HCI Researcher
The Uncomfortable Question Nobody Is Asking
I spend my days in two worlds. In one, I am a PhD researcher at the Empathic Computing Lab, studying how humans think, collaborate, and make decisions when AI enters the room. In the other, I am an AI Agent Developer Intern at Autohive, a startup building a production platform where AI agents handle real work for real businesses.
These two worlds almost never talk to each other.
The agent industry is in a performance arms race. Every week there is a new benchmark, a new model, a new framework promising better tool-calling accuracy or lower latency. We measure cost-per-token to six decimal places. We build evaluation arenas that score models across dozens of dimensions. We obsess over whether Claude beats GPT-4o on reasoning tasks.
And in all of this, we have completely forgotten about the human.
Not the "human-in-the-loop" checkbox that shows up in safety papers. I mean the actual person — the operator monitoring the dashboard, the team lead reviewing agent outputs, the end user trying to figure out if they can trust what the agent just told them. That person has cognitive limits, attention constraints, trust dynamics, and decision fatigue. And we are designing as if they do not exist.
I have spent years studying what happens to humans when they collaborate with AI. Here is what the agent industry is getting wrong: they are building for the agent's performance, not the human's experience. And those are not the same thing.
This post is different from the first three in this series. Parts 1 through 3 covered architecture, monitoring, and evaluation — all critical infrastructure. This one is about the infrastructure we forgot: the human mind.
1. Cognitive Load Is the Real Bottleneck
Here is something I learned from cognitive psychology, not from engineering: the human brain has a fixed processing budget, and every piece of information you throw at it costs something.
Cognitive Load Theory has massive implications for agent design. Recent research from MIT Media Lab (Kosmyna et al., 2025) demonstrated that heavy AI assistant use leads to measurable "cognitive debt" — reduced neural connectivity and diminished independent thinking. A separate study of 666 participants (Abbas et al., 2025) found a strong positive correlation (r = 0.72) between AI tool use and cognitive offloading, meaning the more people rely on AI outputs, the less they engage their own critical thinking. The core idea is simple:
- Intrinsic load: The inherent complexity of the task itself
- Extraneous load: Complexity added by bad design — irrelevant information, poor formatting, unclear structure
- Germane load: The mental effort spent on actually learning and making decisions
Total Cognitive Load = Intrinsic + Extraneous + Germane
If Total > Working Memory Capacity → Failure
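The load equation above can be written as a back-of-envelope check. This is a minimal sketch: the capacity constant follows the roughly-four-chunk working-memory estimate, and the load values are illustrative assumptions, not measured quantities.

```python
# Sketch of the load equation. Numeric values are illustrative assumptions.

WORKING_MEMORY_CAPACITY = 4  # ~4 chunks, per the working-memory literature

def total_cognitive_load(intrinsic: float, extraneous: float, germane: float) -> float:
    """Total load is the simple sum of the three components."""
    return intrinsic + extraneous + germane

def overloaded(intrinsic: float, extraneous: float, germane: float) -> bool:
    """Failure condition: total load exceeds working-memory capacity."""
    return total_cognitive_load(intrinsic, extraneous, germane) > WORKING_MEMORY_CAPACITY

# A well-designed review task: complex work, little design overhead.
print(overloaded(intrinsic=2.0, extraneous=0.5, germane=1.0))  # False
# The same task on a cluttered dashboard: extraneous load tips it over.
print(overloaded(intrinsic=2.0, extraneous=2.0, germane=1.0))  # True
```

The point of the sketch: intrinsic and germane load are the task; extraneous load is the only term design controls, and it alone can push the total past capacity.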
Working memory can hold roughly 4 ± 1 chunks of information at a time. Not 20. Not 50. Four. The 2026 World Economic Forum report "The Human Advantage: Stronger Brains in the Age of AI" warns that without deliberate investment in human cognitive capacity, AI-augmented workplaces risk "driving preventable costs through declining employee well-being."
Now think about what a typical agent dashboard looks like. An operator monitoring five agents sees: conversation logs, tool call traces, error rates, token usage, cost breakdowns, evaluation scores, real-time status updates, approval requests, and escalation queues. All at once. All competing for four slots in working memory.
This is not a monitoring problem. It is a cognitive load problem.
| What Agent Platforms Do | What They Should Do | HCI Principle |
|---|---|---|
| Show all metrics simultaneously | Show only actionable metrics; hide the rest | Reduce extraneous load |
| Display raw conversation logs | Surface decision points and anomalies | Information scent (Pirolli, 2007) |
| Alert on every threshold breach | Aggregate related alerts into incidents | Chunking (Miller, 1956) |
| Present 10-field approval forms | Ask one question: "Approve this action? Here is why." | Progressive disclosure |
| Render tool call traces as flat lists | Show traces as collapsible hierarchies | Visual hierarchy |
The agent industry treats human attention as infinite. Cognitive science proved it is not — fifty years ago.
The Formula Nobody Uses
Here is a practical way to think about it. For every agent output that requires human review, you can estimate the cognitive cost:
CognitiveCost(output) = InformationDensity × DecisionComplexity × ContextSwitchPenalty
where:
InformationDensity = words + data_points + visual_elements
DecisionComplexity = number_of_options × uncertainty_level
ContextSwitchPenalty = 1.0 if same_task, 2.5 if different_task
Most agent systems maximize InformationDensity ("give the user everything, let them figure it out") while ignoring that ContextSwitchPenalty alone can more than double the cognitive cost. An operator switching between a support agent and a data extraction agent is not just reading two outputs — they are rebuilding their entire mental model each time.
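A minimal sketch of this cost model, using the penalty constants from the formula and made-up inputs for one agent output:

```python
def cognitive_cost(words: int, data_points: int, visual_elements: int,
                   n_options: int, uncertainty: float,
                   same_task: bool) -> float:
    """CognitiveCost = InformationDensity x DecisionComplexity x ContextSwitchPenalty."""
    information_density = words + data_points + visual_elements
    decision_complexity = n_options * uncertainty
    context_switch_penalty = 1.0 if same_task else 2.5
    return information_density * decision_complexity * context_switch_penalty

# The same output, reviewed in-context vs after switching between agent types:
in_context = cognitive_cost(120, 5, 3, n_options=2, uncertainty=0.3, same_task=True)
switched = cognitive_cost(120, 5, 3, n_options=2, uncertainty=0.3, same_task=False)
print(switched / in_context)  # 2.5 -- the switch alone multiplies the cost
```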
2. The Second-Order Outage: When Agents Work Too Well
Everyone in this industry is focused on making agents succeed. Almost nobody is thinking about what happens when they succeed at scale.
Here is the scenario. You deploy five agents. They work well. They handle support tickets, draft reports, process data, schedule meetings, and generate summaries. Each one individually is a win. Together, they produce a flood of output that needs human review, approval, or consumption.
I call this the second-order outage — the system is working perfectly, but the humans operating it have collapsed.
Work inflation is the mechanism. Every competent agent generates downstream work for humans:
- Support agent resolves 200 tickets/day → someone needs to quality-check a sample
- Data agent produces 15 analysis reports/day → someone needs to read and act on them
- Coding agent opens 30 PRs/week → someone needs to review them
- Scheduling agent books 40 meetings/week → someone needs to attend them
ReviewLoad = Σ (AgentOutput_i × ReviewRate_i × TimePerReview_i)
If ReviewLoad > AvailableHumanHours → Second-Order Outage
The math is unforgiving. If each agent output takes 3 minutes of human review and you have five agents producing 50 outputs per day each, that is 250 × 3 = 750 minutes of review work per day. That is 12.5 hours. For one person. Every day.
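That arithmetic generalizes. A sketch of the ReviewLoad sum, using the numbers from the example, plus the same fleet with a hypothetical confidence gate that routes only a fraction of outputs to a human:

```python
def review_load_minutes(agents):
    """ReviewLoad = sum over agents of outputs x review_rate x time_per_review."""
    return sum(outputs * review_rate * minutes_per_review
               for outputs, review_rate, minutes_per_review in agents)

# Five agents, 50 outputs/day each, every output reviewed, 3 minutes per review:
fleet = [(50, 1.0, 3.0)] * 5
print(review_load_minutes(fleet) / 60)  # 12.5 hours of review per day

# Same fleet with confidence gating: only 20% of outputs reach a human.
gated = [(50, 0.2, 3.0)] * 5
print(review_load_minutes(gated) / 60)  # 2.5 hours per day
```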
Queueing Theory Meets Human Limits
Little's Law from queueing theory gives us the relationship:
L = λ × W
where:
L = average number of items in the review queue
λ = arrival rate (agent outputs per hour)
W = average time an item spends in the system (waiting plus review)
Strictly, Little's Law describes a stable queue, and a review queue fed by agents often is not stable. If agents produce 30 items per hour and a human takes 4 minutes per item, the human's throughput is 15 items per hour, so the queue grows at 15 items per hour. By the end of an 8-hour day, there are 120 unreviewed items. By Friday, the system is drowning.
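That growth can be simulated in a few lines. This sketch tracks review-queue depth hour by hour under the rates in the example:

```python
def queue_depth_by_hour(arrival_per_hour: float, minutes_per_item: float, hours: int):
    """Simulate a review queue hour by hour when arrivals outpace service."""
    service_per_hour = 60.0 / minutes_per_item
    depth, history = 0.0, []
    for _ in range(hours):
        # Queue depth cannot go negative; excess capacity is simply idle.
        depth = max(0.0, depth + arrival_per_hour - service_per_hour)
        history.append(depth)
    return history

# 30 items/hour arriving, 4 minutes each (15/hour serviced), over an 8-hour day:
print(queue_depth_by_hour(30, 4, 8))  # grows by 15 per hour, 120 by end of day
```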
Your agent platform's job is not to generate output. It is to protect your humans from output.
Designing for Throughput Sustainability
The fix is not "hire more reviewers." The fix is designing agent output for minimal human processing time:
- Decision-ready artifacts: Do not give humans raw data. Give them a recommendation, a risk summary, and a one-click action. Reduce W in Little's Law.
- Confidence gating: Only route to humans when the agent's confidence is below a threshold. Reduce λ.
- Batch decisions: Group similar items. "These 12 support responses all follow the same pattern — approve all?" Reduce L through chunking.
- Output shaping: Force agents to emit structured, scannable outputs — not verbose explorations.
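Two of these levers compose naturally in a routing layer. This is a hypothetical sketch: the field names (`confidence`, `pattern`) and the threshold stand in for whatever your platform actually records.

```python
from collections import defaultdict

def route_outputs(outputs, confidence_threshold=0.85):
    """Confidence gating: auto-approve high-confidence outputs (reduces arrival
    rate), and batch the rest by pattern so a human decides once per group."""
    auto_approved, review_batches = [], defaultdict(list)
    for item in outputs:
        if item["confidence"] >= confidence_threshold:
            auto_approved.append(item)
        else:
            review_batches[item["pattern"]].append(item)
    return auto_approved, dict(review_batches)

outputs = [
    {"id": 1, "confidence": 0.95, "pattern": "refund"},
    {"id": 2, "confidence": 0.60, "pattern": "refund"},
    {"id": 3, "confidence": 0.55, "pattern": "refund"},
    {"id": 4, "confidence": 0.91, "pattern": "greeting"},
]
approved, batches = route_outputs(outputs)
print(len(approved), {k: len(v) for k, v in batches.items()})  # 2 {'refund': 2}
```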
3. The Attention Economy Inside Your Agent Platform
Research consistently shows that human working memory holds about four items at a time. Jakob Nielsen and the Nielsen Norman Group have spent three decades proving that people do not read — they scan.
None of this research has penetrated the agent industry.
Here is the problem stated plainly: token costs are rounding error. Human attention is the expensive resource.
A GPT-4o call costs fractions of a cent. The engineer who reads the output, decides whether to trust it, and takes action on it costs $80-200 per hour. Every minute of unnecessary cognitive effort is real money — far more than the tokens that generated it.
An Attention Budget
I propose treating human attention as a first-class resource with an explicit budget:
AttentionBudget(team) = TeamSize × FocusHoursPerDay × AttentionUnitsPerHour
AttentionCost(agent_output) = ReadTime + ComprehensionTime + DecisionTime + ActionTime
Daily Attention Spend = Σ AttentionCost(all_agent_outputs)
If Daily Attention Spend > AttentionBudget → Overload
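The budget check is straightforward to operationalize. A sketch with illustrative per-output times and an assumed four focus-hours per person per day:

```python
def attention_budget_minutes(team_size: int, focus_hours_per_day: float) -> float:
    """AttentionBudget, expressed directly in focus-minutes per day."""
    return team_size * focus_hours_per_day * 60.0

def attention_cost_minutes(read, comprehend, decide, act) -> float:
    """AttentionCost = ReadTime + ComprehensionTime + DecisionTime + ActionTime."""
    return read + comprehend + decide + act

def attention_overloaded(outputs, team_size, focus_hours_per_day=4.0):
    spend = sum(attention_cost_minutes(*o) for o in outputs)
    return spend > attention_budget_minutes(team_size, focus_hours_per_day)

# 200 outputs/day at ~1.5 minutes each, against a 2-person team (480 min budget):
outputs = [(0.5, 0.5, 0.3, 0.2)] * 200
print(attention_overloaded(outputs, team_size=2))      # False: 300 min of spend
print(attention_overloaded(outputs * 2, team_size=2))  # True: 600 min exceeds it
```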
| Attention Cost Factor | Low Cost | High Cost | Design Lever |
|---|---|---|---|
| ReadTime | Structured, scannable | Wall of prose | Format + hierarchy |
| ComprehensionTime | Familiar patterns | Novel format every time | Consistency |
| DecisionTime | Binary choice with context | Open-ended with ambiguity | Confidence scores + recommendations |
| ActionTime | One-click action | Multi-step manual process | Automation of the last mile |
Information Scent
Information Foraging Theory explains how humans navigate information environments. People follow "information scent" — cues that suggest relevant content is nearby. Strong scent means they find what they need quickly. Weak scent means they wander. Chen et al. (2025) describe how the new agent interaction paradigm demands that AI outputs be designed for human "cognitive strain alleviation" — yet most agent frameworks still dump raw outputs and expect the user to forage.
Agent outputs with weak information scent look like this:
- Long paragraphs with no headers
- Buried conclusions
- Technical details before the summary
- No visual distinction between critical and incidental information
Agent outputs with strong information scent:
- Status first (success/failure/needs review)
- Summary in one sentence
- Recommendation with confidence level
- Details collapsed, expandable on demand
Design agent outputs like newspaper articles: headline first, lead paragraph second, details third. The reader should be able to stop at any point and still have the most important information.
4. Trust Is Not a Toggle
Most agent platforms treat trust as a binary. The agent either has permission to act autonomously, or it requires approval. On or off. Trusted or not.
This is a fundamental misunderstanding of how humans actually trust.
McGrath et al. (2025) introduced the CHAI-T framework (Collaborative Human-AI Trust) specifically for human-AI teaming contexts. Their key insight: trust in AI collaboration is not a static property — it is a dynamic process that evolves through team interaction phases, influenced by task context, performance history, and environmental factors. Gerlich (2024) further showed that trust in AI is driven by a complex interplay of motivators where familiarity and perceived competence shift the balance — meaning trust is a continuous variable that evolves with experience, not a setting you configure.
The Trust Spectrum
Blind Trust ←——————— Calibrated Trust ———————→ No Trust
(dangerous) (ideal) (wasteful)
- Blind trust: The user accepts everything the agent says without verification. Efficient but dangerous — one bad output and the consequences can be severe.
- Calibrated trust: The user has an accurate mental model of when the agent is reliable and when it is not. This is the goal.
- No trust: The user checks everything, effectively doing the work themselves. The agent becomes overhead, not help.
Trust calibration requires two things that most agent systems do not provide:
- Transparency: The user can see why the agent made a decision, not just what it decided
- Track record: The user has accumulated enough experience to know the agent's strengths and weaknesses
How Trust Decays
Trust does not just build — it also decays, and it decays asymmetrically:
TrustGain(success) = small, incremental (+0.01 to +0.05)
TrustLoss(failure) = large, sudden (-0.10 to -0.40)
TrustRecovery(time) = slow, logarithmic
One bad output can undo twenty good ones. The CHAI-T framework (McGrath et al., 2025) explicitly models this through "performance phases" where trust updates are asymmetric — negative experiences carry disproportionate weight. This has a direct implication for agent design: the cost of a single visible failure is far higher than the benefit of a single visible success.
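A minimal sketch of this asymmetry. The step sizes are illustrative constants chosen from inside the ranges quoted above, not values from the CHAI-T paper.

```python
def update_trust(trust: float, success: bool) -> float:
    """Asymmetric trust dynamics: small gain on success, large loss on failure.
    Step sizes are illustrative assumptions."""
    delta = 0.03 if success else -0.25
    return min(1.0, max(0.0, trust + delta))

trust = 0.5
for _ in range(20):                          # twenty good outputs...
    trust = update_trust(trust, success=True)
trust = update_trust(trust, success=False)   # ...then one visible failure
print(round(trust, 2))  # 0.75 -- one failure erased more than eight successes
```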
This means:
| Design Implication | Why |
|---|---|
| Show confidence scores on every output | Users learn when to trust and when to verify |
| Highlight uncertainty, do not hide it | Transparent uncertainty builds trust; hidden uncertainty destroys it |
| Admit mistakes explicitly | "I may be wrong about this" is trust-building, not trust-destroying |
| Offer easy verification paths | Let users spot-check without derailing their workflow |
| Track trust over time per user | Different users calibrate at different rates |
Learned Helplessness
There is a darker failure mode that nobody in the agent industry discusses: learned helplessness. When an agent handles tasks that a human used to do, the human gradually loses the ability and confidence to do those tasks themselves. If the agent then fails or is unavailable, the human cannot fall back to manual execution.
This is not hypothetical. Zhai et al. (2024) found that students who heavily relied on AI dialogue systems exhibited "diminished decision-making and critical analysis abilities." The MIT Media Lab study (Kosmyna et al., 2025) showed that after just four months of LLM use, participants who were switched back to working without AI showed reduced brain connectivity and underperformance — measurable cognitive atrophy from AI dependence.
The goal is not to make agents so good that humans stop thinking. The goal is to make agents that keep humans in the loop cognitively — even when they are out of the loop operationally.
5. Agent Memory Rot: The Entropy Nobody Audits
Everyone celebrates long-term agent memory. "Our agents learn from every conversation. They remember your preferences. They build context over time."
Nobody talks about what happens six months later.
Agent memory is subject to entropy — the gradual accumulation of stale, contradictory, and unverified information that degrades decision quality over time. And unlike human memory, which has built-in mechanisms for forgetting irrelevant information, agent memory stores everything with equal weight. Risko and Gilbert (2024) describe this as a fundamental asymmetry in cognitive offloading: humans evolved sophisticated forgetting mechanisms that improve decision quality, but the systems we build to augment them lack any equivalent.
The Rot Taxonomy
| Memory Failure | Example | User Impact |
|---|---|---|
| Stale facts | "Customer prefers email" — they switched to Slack 3 months ago | Agent uses wrong channel, user corrects, trust decays |
| Contradictions | Memory A says "budget is $50k", Memory B says "budget is $75k" | Agent picks one arbitrarily, user cannot tell which |
| Unverified inferences | Agent inferred "user is technical" from one conversation | Agent skips explanations that user actually needs |
| Context collapse | Fact from Project A bleeds into Project B | Wrong context applied, subtle errors |
| Compounding errors | Inference built on inference built on stale fact | Confident, articulate, completely wrong |
The most dangerous form is compounding errors. The agent stored that a customer prefers concise responses (true six months ago). It then inferred the customer is technical (uncertain). It then started skipping safety warnings in its responses (wrong). Each step was plausible. The chain is catastrophic.
The Transparency Problem
From an HCI perspective, the core issue is mental model alignment. The user has a mental model of what the agent knows. The agent has an actual knowledge state. These diverge over time, and the user has no way to detect the divergence.
Good interface design for agent memory requires:
- Memory provenance: Every stored fact should show where it came from and when
- Confidence decay: Older memories should be visually distinguished from recent ones
- Contradiction surfacing: When memories conflict, surface the conflict to the user instead of silently resolving it
- Audit interface: A simple way for users to review, correct, and delete what the agent "knows"
MemoryReliability(fact) = SourceReliability × Recency × VerificationStatus
where:
SourceReliability = { user_stated: 1.0, agent_inferred: 0.6, third_party: 0.8 }
Recency = exp(-λ × days_since_stored) // λ = decay rate
VerificationStatus = { verified: 1.0, unverified: 0.7, contradicted: 0.2 }
The most dangerous agent is the one with "experience" — because unverified memory is just institutionalized hallucination.
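The reliability score above is directly computable. A sketch using the stated weights and an assumed decay rate of 0.01 per day:

```python
import math

SOURCE_RELIABILITY = {"user_stated": 1.0, "agent_inferred": 0.6, "third_party": 0.8}
VERIFICATION_STATUS = {"verified": 1.0, "unverified": 0.7, "contradicted": 0.2}

def memory_reliability(source: str, days_since_stored: float, status: str,
                       decay_rate: float = 0.01) -> float:
    """MemoryReliability = SourceReliability x Recency x VerificationStatus."""
    recency = math.exp(-decay_rate * days_since_stored)
    return SOURCE_RELIABILITY[source] * recency * VERIFICATION_STATUS[status]

# A fresh, user-stated, verified fact vs a six-month-old unverified inference:
print(round(memory_reliability("user_stated", 0, "verified"), 2))        # 1.0
print(round(memory_reliability("agent_inferred", 180, "unverified"), 2)) # 0.07
```

Any fact scoring below a threshold is a candidate for re-verification with the user rather than silent reuse.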
6. The Irreversibility Problem: Designing for Undo
Ben Shneiderman argued in Human-Centered AI (2022) that reliable, safe, and trustworthy AI systems must be designed around human control — including the ability to recover from errors. This principle of reversibility is almost entirely absent from agent systems.
Agents take actions. Some of those actions can be undone. Many cannot.
| Action Category | Examples | Reversibility | User Anxiety |
|---|---|---|---|
| Fully reversible | Draft email, create document, internal note | Easy undo | Low |
| Partially reversible | Send email, post message, update record | Retract/edit possible | Medium |
| Irreversible | Process refund, delete data, submit legal filing | Cannot undo | High |
The gap between "confirm this action?" and the user actually understanding the consequences of that action is a design failure. Most confirmation dialogs are worthless — they present the action ("Send refund of $450?") without the context needed to evaluate it ("This customer has had 3 refunds this month, which exceeds policy. This will flag an audit review.").
Designing for Safe Agency
From an HCI perspective, the solution is not to prevent agents from taking irreversible actions. It is to design the interaction so that the human can make an informed decision with minimal cognitive effort:
1. Action previews, not confirmations
BAD: "Proceed with refund? [Yes] [No]"
GOOD: "Refund $450 to John Smith
→ Order #4821 (placed 3 days ago)
→ This is refund #4 this month (policy limit: 3)
→ Impact: triggers audit flag
[Approve] [Modify] [Reject]"
2. Reversibility indicators
Every agent action should display a clear reversibility signal:
🟢 Reversible — "Draft saved. You can edit or delete anytime."
🟡 Partial — "Email sent. You can send a follow-up correction."
🔴 Irreversible — "Once submitted, this cannot be undone. Review carefully."
3. Graduated autonomy
Do not give agents full autonomy on day one. Ramp up based on demonstrated reliability:
Stage 1: Agent recommends → Human executes
Stage 2: Agent executes reversible actions → Human reviews
Stage 3: Agent executes all actions → Human audits sample
Stage 4: Full autonomy with exception-based review
The progression should be per action type, not per agent. An agent might be at Stage 4 for sending meeting reminders but Stage 1 for processing refunds.
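A sketch of per-action-type staging. The agent and action names are hypothetical, and any action type without a demonstrated track record defaults to the most restrictive stage:

```python
# Stages 1-4 from the ramp above.
APPROVAL_REQUIRED, ASYNC_REVIEW, SAMPLED_AUDIT, EXCEPTION_ONLY = 1, 2, 3, 4

# Autonomy is keyed on (agent, action type), never on the agent alone.
autonomy = {
    ("support_agent", "send_meeting_reminder"): EXCEPTION_ONLY,  # Stage 4
    ("support_agent", "process_refund"): APPROVAL_REQUIRED,      # Stage 1
}

def needs_human_approval(agent: str, action: str) -> bool:
    """Unknown action types default to Stage 1: recommend, human executes."""
    return autonomy.get((agent, action), APPROVAL_REQUIRED) == APPROVAL_REQUIRED

print(needs_human_approval("support_agent", "send_meeting_reminder"))  # False
print(needs_human_approval("support_agent", "process_refund"))         # True
print(needs_human_approval("support_agent", "delete_customer_data"))   # True
```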
The future is not "safer prompts." It is making unsafe actions structurally hard to take without informed human consent.
7. Designing Agent Output for Human Cognition
Here is where all the theory becomes concrete. If you accept that cognitive load is real, attention is finite, and trust is dynamic — then the way agents present their outputs must change fundamentally.
The Inverted Pyramid
Journalism solved this problem a century ago with the inverted pyramid: most important information first, supporting details second, background third. The reader can stop at any point and still have the essential story.
Agent outputs should follow the same structure:
Level 1: STATUS + ONE-LINE SUMMARY
"✅ Support ticket resolved. Customer refund processed."
Level 2: KEY DETAILS (3-5 items)
- Refund amount: $120
- Method: Original payment method
- Processing time: 2-3 business days
- Confidence: High (similar cases: 94% success rate)
Level 3: FULL CONTEXT (collapsed by default)
- Complete conversation transcript
- Tool call trace
- Alternative actions considered
- Raw model reasoning
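A renderer can enforce this structure so no agent emits a wall of prose. This sketch is illustrative: field names and formatting are assumptions, and Level 3 is represented by a collapsed marker rather than the full context.

```python
def render_inverted_pyramid(status: str, summary: str, details: list, context: str) -> str:
    """Render agent output headline-first: status and one-line summary, then at
    most five key details, with full context collapsed behind a marker."""
    lines = [f"{status} {summary}"]
    lines += [f"  - {d}" for d in details[:5]]  # cap key details at 5 chunks
    lines.append(f"  ▸ Details ({len(context.splitlines())} lines, expand on demand)")
    return "\n".join(lines)

out = render_inverted_pyramid(
    "✅", "Support ticket resolved. Customer refund processed.",
    ["Refund amount: $120", "Method: Original payment method",
     "Confidence: High (94% on similar cases)"],
    "full transcript...\ntool call trace...\nraw model reasoning...")
print(out)
```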
Gestalt Principles Applied to Agent UI
The Gestalt principles of perception — proximity, similarity, continuity, closure — are foundational in interface design. They are almost never applied to agent output design:
| Principle | Application to Agent Output |
|---|---|
| Proximity | Group related information together. Do not scatter the action, its result, and its confidence across different parts of the output. |
| Similarity | Use consistent visual patterns. Every agent should present status, summary, and details in the same format. Users should not have to re-learn the output structure for each agent. |
| Figure-ground | Make the primary message visually dominant. De-emphasize supporting details. The user's eye should land on the most important information first. |
| Closure | Provide clear completion signals. "Task complete" is not enough — show what was accomplished, what remains, and what the user needs to do next (if anything). |
The Three-Second Rule
Usability research consistently shows that users form judgments about a page in 3-5 seconds. The same applies to agent output. If a human cannot determine the status and required action within three seconds of looking at an agent's response, the design has failed.
Test your agent outputs against this rubric:
| Question | Must Be Answerable In | Design Element |
|---|---|---|
| Did it succeed or fail? | 1 second | Status badge / color |
| What did it do? | 3 seconds | One-line summary |
| Do I need to do anything? | 5 seconds | Clear call-to-action or "no action needed" |
| Can I trust this? | 10 seconds | Confidence score + reasoning preview |
| What are the details? | On demand | Expandable section |
8. The Permission Graph as Interaction Design
In Part 3 of this series, I covered evaluation systems. In Part 2, I covered monitoring. But there is a design layer underneath both of them that determines what agents can actually do: the permission graph.
Most teams think of tool permissions as a security concern. It is that. But it is also an interaction design concern — perhaps the most important one.
Shneiderman's (2022) framework for Human-Centered AI emphasizes that affordances and constraints are the primary design levers for safe autonomous systems. An affordance is what the system allows you to do. A constraint is what prevents you from doing things you should not. In agent systems:
- Affordances = the tools available to the agent
- Constraints = the permissions, rate limits, and approval gates on those tools
The permission graph — which tools an agent can access, under what conditions, with what approval requirements — is the single biggest lever you have over agent behavior. More than the model. More than the prompt.
Capability = f(Model, Prompt, ToolAccess)
In practice: ToolAccess dominates.
Designing the Permission Graph
Think of it as concentric circles of autonomy:
Inner circle: Read-only tools (search, lookup, retrieve)
→ Full autonomy, no approval needed
Middle circle: Low-risk write tools (draft, note, tag)
→ Agent executes, human reviews async
Outer circle: High-risk tools (send, delete, pay, publish)
→ Human approval required before execution
Beyond: Tools not granted
→ Agent cannot even attempt
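The circles translate directly into a lookup with a restrictive default. The tool names here are hypothetical:

```python
from enum import Enum

class Ring(Enum):
    READ_ONLY = "full autonomy"
    LOW_RISK_WRITE = "execute, async human review"
    HIGH_RISK = "human approval before execution"
    NOT_GRANTED = "cannot attempt"

# Hypothetical tool-to-ring assignment for one agent:
PERMISSION_GRAPH = {
    "search": Ring.READ_ONLY,
    "draft_email": Ring.LOW_RISK_WRITE,
    "send_email": Ring.HIGH_RISK,
}

def gate(tool: str) -> Ring:
    """Anything not explicitly granted sits outside every circle."""
    return PERMISSION_GRAPH.get(tool, Ring.NOT_GRANTED)

print(gate("search").value)          # full autonomy
print(gate("send_email").value)      # human approval before execution
print(gate("delete_account").value)  # cannot attempt
```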
This maps to four constraint types:
| Constraint Type | Agent System Example |
|---|---|
| Physical | Tool not in agent's available set — cannot call it |
| Semantic | Tool available but parameter validation rejects dangerous inputs |
| Cultural | Soft norms — agent "knows" to ask before sending external communications |
| Logical | Workflow gates — cannot execute step 3 before step 2 completes |
Stop benchmarking models. Benchmark access topologies — because an average model with the right tool constraints beats a frontier model with unrestricted access.
9. What HCI Research Already Solved
The agent industry is repeating mistakes that the HCI community solved decades ago. Here are the frameworks that should be standard practice in every agent platform — and are used in almost none.
Nielsen's 10 Usability Heuristics — Applied to Agents
Microsoft's "Guidelines for Human-AI Interaction" (Amershi et al., 2019) extended classic usability heuristics specifically for AI systems, identifying 18 design guidelines organized around interaction phases. But even the original Nielsen heuristics — applied honestly — would transform most agent dashboards:
| Heuristic | Agent Application | Current State |
|---|---|---|
| Visibility of system status | Show what the agent is doing, thinking, and waiting for — in real time | Most agents show a spinner or nothing |
| Match between system and real world | Use the user's language, not "tool_call_id: tc_3f2a" | Most dashboards expose internal IDs |
| User control and freedom | Let users stop, undo, and redirect agents mid-task | Most agents cannot be interrupted cleanly |
| Consistency and standards | Every agent should present outputs in the same format | Every agent framework invents its own |
| Error prevention | Prevent the agent from taking dangerous actions, do not just report errors after | Most rely on post-hoc error handling |
| Recognition rather than recall | Show available actions, do not make users remember commands | Most agent UIs require typed instructions |
| Flexibility and efficiency of use | Power users should be able to batch-approve, filter, and customize | Most dashboards are one-size-fits-all |
| Aesthetic and minimalist design | Show only relevant information at each decision point | Most show everything always |
| Help users recognize and recover from errors | When an agent fails, explain what went wrong and how to fix it | Most show generic error messages |
| Help and documentation | Provide contextual guidance on agent capabilities and limits | Almost never present |
GOMS for Agent Task Analysis
GOMS (Goals, Operators, Methods, Selection rules) models human task performance by decomposing activities into measurable steps. Apply it to agent oversight:
Goal: Verify that the support agent handled this ticket correctly
Operator: Read summary (2s) → Check confidence (1s) → Scan tool calls (3s) → Approve (1s)
Method: Structured review via dashboard
Total: ~7 seconds per ticket
vs.
Goal: Same
Operator: Open transcript (2s) → Read full conversation (45s) → Cross-reference policy (30s) → Decide (10s) → Navigate to approve (5s)
Method: Unstructured review via raw logs
Total: ~92 seconds per ticket
With one hour of review time per day, the difference between 7 seconds and 92 seconds per ticket is the difference between reviewing roughly 500 tickets and reviewing 39. Same human. Same task. Different design.
Fitts's Law for Interaction Cost
Fitts's Law predicts the time to reach a target based on distance and size (Budiu, 2022). In agent interfaces, this translates to: make the most frequent actions the easiest to reach.
If 80% of agent outputs are approved without changes, the "Approve" action should be:
- Visually prominent (large target)
- Close to where the user's attention already is (short distance)
- Accessible via keyboard shortcut (zero distance)
If the reject/edit path requires three clicks and a modal dialog, you have inverted Fitts's Law — you made the rare action easy and the common action hard.
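Fitts's Law is easy to apply as a design check. This sketch uses the Shannon formulation, MT = a + b · log2(D/W + 1); the device constants `a_ms` and `b_ms` are illustrative assumptions, not measured values.

```python
import math

def fitts_movement_time(distance_px: float, width_px: float,
                        a_ms: float = 50.0, b_ms: float = 150.0) -> float:
    """Shannon formulation of Fitts's Law: MT = a + b * log2(D/W + 1).
    a and b are device/user constants; the defaults here are illustrative."""
    return a_ms + b_ms * math.log2(distance_px / width_px + 1)

# A large Approve button near the output vs a small one across the screen:
near_large = fitts_movement_time(distance_px=100, width_px=200)
far_small = fitts_movement_time(distance_px=1200, width_px=40)
print(round(near_large), round(far_small))  # the common action should be the fast one
```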
10. A Framework for Human-Centered Agent Design
Let me pull everything together into a practical framework. If you are building agent systems, these are the five pillars of human-centered design:
Pillar 1: Cognitive Load Management
- Measure the cognitive cost of every human touchpoint
- Apply progressive disclosure to all agent outputs
- Chunk related information; never present flat lists of more than 5 items
- Minimize context switches between agent types
Pillar 2: Trust Calibration
- Display confidence scores on every output
- Surface uncertainty — do not hide it
- Track trust dynamics per user over time
- Design for graduated autonomy, not binary trust
Pillar 3: Attention Economics
- Treat human attention as a budgeted resource
- Design for the three-second rule: status in 1s, summary in 3s, action in 5s
- Gate human involvement by confidence threshold — not every output needs review
- Shape agent output for scanning, not reading
Pillar 4: Error Recovery
- Classify every action by reversibility
- Provide action previews with consequence context, not bare confirmations
- Design clean interruption paths — users must be able to stop agents mid-task
- Make error states informative: what happened, why, and what to do next
Pillar 5: Progressive Autonomy
- Start agents at low autonomy and increase based on demonstrated reliability
- Scope autonomy per action type, not per agent
- Maintain human cognitive engagement even at high autonomy levels
- Build "fallback readiness" — humans should retain the ability to do the task manually
The Builder's Checklist
If you are designing or building an agent system, evaluate it against these criteria:
- [ ] Can a human determine the agent's status within 3 seconds of looking at the output?
- [ ] Does every output include a confidence signal?
- [ ] Are irreversible actions gated behind informed-consent previews, not generic confirmations?
- [ ] Is the agent's memory auditable and correctable by users?
- [ ] Does the system measure human review time, not just agent performance?
- [ ] Are outputs designed for scanning (structured, hierarchical) not reading (prose)?
- [ ] Can users batch-approve similar outputs to reduce repetitive decisions?
- [ ] Does the permission graph enforce graduated autonomy per action type?
- [ ] Is there a mechanism to detect human cognitive overload (review queue depth, response latency)?
- [ ] Can the agent be cleanly interrupted mid-task without corrupting state?
Key Takeaways
- Cognitive load is finite. Your agent dashboard is competing for four slots in working memory. Design accordingly.
- The second-order outage is real. Competent agents create more work for humans. If you do not design for throughput sustainability, you will drown your team in plausible output.
- Trust is a spectrum, not a switch. It builds slowly, breaks fast, and requires transparency and track record — not just accuracy metrics.
- Agent memory rots. Without provenance, decay, and audit mechanisms, long-term memory becomes a liability, not an asset.
- Reversibility is a design requirement. Every action should have a clear undo path, and irreversible actions need consequence previews, not confirmation dialogs.
- HCI solved these problems decades ago. Nielsen's heuristics, Fitts's Law, GOMS, cognitive load theory, information foraging — all of it applies. The agent industry just has not read the literature.
The teams that win at AI agents will not be the ones with the best models. They will be the ones that best understand the humans using them.
References
- Kosmyna, N. et al. (2025). "Your Brain on ChatGPT: Accumulation of Cognitive Debt when Using an AI Assistant for Essay Writing Task." arXiv:2506.08872. arxiv.org/abs/2506.08872
- Abbas, M. et al. (2025). "AI Tools in Society: Impacts on Cognitive Offloading and the Future of Critical Thinking." Societies, 15(1), 6. mdpi.com/2075-4698/15/1/6
- McGrath, M.J. et al. (2025). "Collaborative Human-AI Trust (CHAI-T): A Process Framework for Active Management of Trust in Human-AI Collaboration." Computers in Human Behavior: Artificial Humans, 6, 100200. doi.org/10.1016/j.chbah.2025.100200
- World Economic Forum & McKinsey Health Institute. (2026). "The Human Advantage: Stronger Brains in the Age of AI." Insight Report. reports.weforum.org
- Zhai, C. et al. (2024). "The effects of over-reliance on AI dialogue systems on students' cognitive abilities: A systematic review." Smart Learning Environments, 11, 28. doi.org/10.1186/s40561-024-00316-7
- Chen, Y. et al. (2025). "A new human-computer interaction paradigm: Agent interaction model based on large models and its prospects." Frontiers of Information Technology & Electronic Engineering. doi.org/10.1016/j.fite.2025.01.002
- Saffaryazdi, N., Gunasekaran, T.S. et al. (2025). "Empathetic Conversational Agents: Utilizing Neural and Physiological Signals for Enhanced Empathetic Interactions." International Journal of Human–Computer Interaction, 1-25.
- Gunasekaran, T.S. et al. (2025). "CoAffinity: A Multimodal Dataset for Cognitive Load and Affect Assessment in Remote Collaboration." IEEE Transactions on Affective Computing.
- Gerlich, M. (2024). "Exploring Motivators for Trust in the Dichotomy of Human-AI Trust Dynamics." Social Sciences, 13(5), 251. doi.org/10.3390/socsci13050251
- Budiu, R. (2022). "Fitts's Law and Its Applications in UX." Nielsen Norman Group. nngroup.com/articles/fitts-law
- Risko, E.F. & Gilbert, S.J. (2024). "Cognitive Offloading: A Comprehensive Review." Annual Review of Psychology, 75, 455-480.
- Shneiderman, B. (2022). Human-Centered AI. Oxford University Press.
- Amershi, S. et al. (2019; updated 2023). "Guidelines for Human-AI Interaction." CHI '19: Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, 1-13. doi.org/10.1145/3290605.3300233
This is Part 4 of the AI Agent Systems series.
- Part 1: Autohive — The AI Hub of Agents
- Part 2: Monitoring AI Agents and Self-Optimization
- Part 3: How to Build an LLM Evaluation System
- Part 5: My Experience as an AI Agent Developer Intern
- Part 6: Building Multi-Agent Creative Systems