Deploying Agentic AI Without Regret: 6 Lessons from 13 Years in Healthcare

November 4, 2025

When McKinsey published its latest research on agentic AI, I nearly laughed out loud. They described six lessons that separate success from failure—and every one matched what we learned the hard way building AI for healthcare since 2012.

Back then, “agentic AI” wasn’t a phrase. We were just trying to get machines to handle appointment requests without breaking patient trust. Most organizations were still debating whether automation could safely do anything without human supervision.

McKinsey’s findings aren’t just validation—they’re a roadmap for avoiding the mistakes we made so you don’t have to.

We learned these lessons through thousands of patient conversations, dozens of failed experiments, and the humbling experience of watching AI projects that looked brilliant on paper struggle in the real world. When McKinsey notes that some organizations are “rehiring people where agents have failed,” we know that pain. But we also know the turnaround.

  • When EmergeOrtho’s call volume doubled and they didn’t need to double staff.
  • When Virginia Women’s Center cut new agent training from 6 months to 2 weeks.
  • When Golden State Orthopedics automated 10,000+ appointments while improving satisfaction.

Let’s talk about what actually works.

What Is Agentic AI—and Why It Matters Now

Agentic AI refers to generative AI systems that don’t just suggest actions but take them.

They execute multi-step processes autonomously—scheduling appointments, handling insurance verification, managing triage protocols—without waiting for human approval at every step.

The potential? Massive productivity gains and dramatically better patient experiences.

The challenge? Most projects stall or create technical debt because organizations focus on the wrong things.

McKinsey found these patterns across industries. We’ve seen them play out for 13 years in healthcare — where the stakes are higher and the workflows are messier.

Six Lessons for Deploying Agentic AI (Without Regret)

1. It’s Not About the Agent—It’s About the Workflow

McKinsey’s finding: Organizations that focus on building impressive agents instead of reimagining workflows end up with great demos and underwhelming results.

Our lesson: You can’t optimize a broken process with AI. You have to redesign it.

In 2012, practices would ask: “Can you automate our scheduling process?”

Technically yes. Strategically wrong. Because if your current process requires three transfers and two callbacks, automating it just means your AI inherits your dysfunction.

What works:
  • Map the entire workflow—every handoff, failure mode, and pain point.
  • Separate human judgment from human workarounds.
  • Redesign for collaboration between humans and AI—not competition.

Real-world example:

Virginia Women’s Center rethought scheduling and triage when implementing CareDesk. They designed a system where AI handles the routine complexity and staff focus on judgment. The result? New agents master workflows in weeks, not months.

Ask yourself:
  • Are we redesigning workflows or just automating inefficiency?
  • Where is judgment essential versus compensating for bad systems?
  • How will feedback loops make AI smarter over time?

2. Agents Aren’t Always the Answer

McKinsey’s finding: Sometimes simpler tools—rules-based systems or analytics—work better than agents.

Our lesson: The goal isn’t to use AI everywhere. It’s to use the right tool for each job.

Healthcare workflows are layered. Insurance verification may need rules-based logic; appointment booking needs conversation handling; triage demands human expertise. Forcing everything through a single AI layer leads to brittleness.

What works:
  • Assess each task by standardization, variance, complexity, and risk.
  • Use an orchestration layer that lets tools hand off seamlessly.
  • Measure outcomes, not AI coverage.

Real-world example:

EmergeOrtho didn’t deploy one massive AI to handle everything. They used rules for verification, automation for scheduling, humans for complex care coordination—and an AI orchestrator to keep them in sync.
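As a minimal sketch of what this kind of mixed-tool orchestration might look like: route each task to the simplest tool that can handle it safely, and default to a human for anything high-risk or unrecognized. The task kinds, handler names, and risk flags below are illustrative assumptions, not EmergeOrtho’s actual implementation.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Task:
    kind: str   # e.g. "verify_insurance", "schedule", "care_coordination"
    risk: str   # "low" or "high"

def verify_insurance(task: Task) -> str:
    # Deterministic rules: eligibility checks need no generative model.
    return "rules:verified"

def schedule_appointment(task: Task) -> str:
    # Conversational automation handles routine booking.
    return "agent:booked"

def escalate_to_human(task: Task) -> str:
    # Anything high-risk or unrecognized goes to staff.
    return "human:escalated"

# The orchestration layer: a simple dispatch table, not one giant agent.
HANDLERS: dict[str, Callable[[Task], str]] = {
    "verify_insurance": verify_insurance,
    "schedule": schedule_appointment,
}

def orchestrate(task: Task) -> str:
    """Route each task to the simplest tool that can handle it safely."""
    if task.risk == "high":
        return escalate_to_human(task)
    return HANDLERS.get(task.kind, escalate_to_human)(task)
```

The design point is the fallback: the router never guesses. Unknown or high-risk work lands with a person by default, which is what keeps a mixed system from becoming brittle.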

Ask yourself:
  • Could a simpler system solve this more reliably?
  • What’s the consequence of an error?
  • Is our orchestration layer robust enough to handle escalation?

3. Stop “AI Slop”—Invest in Evaluation and Trust

McKinsey’s finding: Agents that look great in demos often frustrate users in production. Once trust erodes, adoption collapses.

Our lesson: Onboarding AI is like hiring employees, not installing software.

Early prototypes impressed us—until real patients exposed hidden dependencies and data gaps. The fix wasn’t better models. It was better evaluation infrastructure.

What works:
  • Give each agent a job description: scope, metrics, and escalation rules.
  • Test with real-world data, not ideal scripts.
  • Track metrics like task success rate, hallucination rate, escalation rate, and time to resolution.

Real-world example:

Virginia Women’s Center built evaluation sets mirroring human QA metrics—accuracy, appropriateness, patient satisfaction. They used deviations from expert judgment as learning feedback. That’s why their AI maintains high trust and consistency.
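As a sketch of how the metrics above might be aggregated over a batch of evaluation conversations (the `Interaction` fields and metric names are illustrative, not Virginia Women’s Center’s actual QA schema):

```python
from dataclasses import dataclass

@dataclass
class Interaction:
    succeeded: bool              # did the agent complete the task correctly?
    hallucinated: bool           # did it assert anything unsupported by source data?
    escalated: bool              # did it hand off to a human?
    seconds_to_resolution: float

def evaluate(interactions: list[Interaction]) -> dict[str, float]:
    """Aggregate trust metrics over a batch of test conversations."""
    n = len(interactions)
    return {
        "task_success_rate": sum(i.succeeded for i in interactions) / n,
        "hallucination_rate": sum(i.hallucinated for i in interactions) / n,
        "escalation_rate": sum(i.escalated for i in interactions) / n,
        "avg_time_to_resolution": sum(i.seconds_to_resolution for i in interactions) / n,
    }

# Run the same batch after every model or prompt change; a drop in any
# metric versus the previous run is caught before patients see it.
batch = [
    Interaction(True, False, False, 60.0),
    Interaction(True, False, True, 120.0),
    Interaction(False, True, True, 300.0),
    Interaction(True, False, False, 40.0),
]
metrics = evaluate(batch)
```

Comparing each release’s metrics against the last run is the degradation check the “Ask yourself” questions below point at.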

Ask yourself:
  • Do we know what “good” looks like for each agent task?
  • Are we testing edge cases, not just happy paths?
  • How will we catch degradation before users do?

4. Make Every Step Traceable

McKinsey’s finding: Without observability, scaling from a few agents to hundreds becomes chaos.

Our lesson: If you can’t see how an agent made a decision, you can’t fix it.

What works:
  • Log every intermediate decision, not just outcomes.
  • Build dashboards for auditing and anomaly detection.
  • Trace which sub-agents contributed and what data they used.

Real-world example:

When one client’s appointment confirmations dipped, our logs revealed that incomplete insurance data caused misclassification. We corrected the intake step—and restored accuracy in days, not weeks.
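One way such a decision trace might be structured—a per-request log that records each sub-agent’s decision alongside the exact data it saw, so a misclassification like the one above can be traced to its input. The class, sub-agent names, and fields are hypothetical, not our production logger.

```python
import json
from datetime import datetime, timezone

class DecisionTrace:
    """Accumulates every intermediate step an agent takes for one request,
    so errors can be traced back to the data that caused them."""

    def __init__(self, request_id: str):
        self.request_id = request_id
        self.steps: list[dict] = []

    def record(self, sub_agent: str, decision: str, inputs: dict) -> None:
        self.steps.append({
            "ts": datetime.now(timezone.utc).isoformat(),
            "sub_agent": sub_agent,
            "decision": decision,
            "inputs": inputs,  # the exact data the sub-agent saw
        })

    def to_json(self) -> str:
        # Serialized for dashboards, audits, and anomaly detection.
        return json.dumps({"request_id": self.request_id, "steps": self.steps})

# Hypothetical trace for one appointment request: the missing insurance_id
# recorded at intake is exactly what explains a later misclassification.
trace = DecisionTrace("req-001")
trace.record("intake", "classified_as_new_patient", {"insurance_id": None})
trace.record("scheduler", "offered_slot", {"provider": "Dr. A", "slot": "09:00"})
```

With logs shaped like this, “why did confirmations dip?” becomes a query over `steps`, not a re-run of the whole pipeline.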

Ask yourself:
  • Can we trace every agent decision back to source data?
  • How quickly can we isolate root causes of errors?
  • Do we have alerts for subtle performance drops?

5. The Best Use Case Is the Reuse Case

McKinsey’s finding: One-off agents create massive redundancy. Reusable components create advantage.

Our lesson: Stop reinventing workflows. Build a library.

We once rebuilt similar scheduling and verification logic for each client. Improvements didn’t propagate. It was unsustainable.

Now, our platform uses reusable agent components—conversation handlers, triage protocols, scheduling logic—that can be configured, not rebuilt.

What works:
  • Identify repeating patterns across workflows.
  • Build modular, validated components teams can reuse.
  • Centralize them in a shared repository.

Real-world example:

Golden State Orthopedics implemented automation in weeks using our validated scheduling modules—benefiting from years of refinement across other practices.
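A toy sketch of the configure-not-rebuild idea: a shared library of validated components, where each new practice supplies only per-site overrides and invalid settings are rejected. The component names, versions, and settings are made up for illustration.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Component:
    name: str
    version: str
    defaults: dict = field(default_factory=dict)

    def configure(self, **overrides) -> dict:
        """Return a deployment config: validated defaults plus per-practice overrides."""
        unknown = set(overrides) - set(self.defaults)
        if unknown:
            raise ValueError(f"unknown settings: {unknown}")
        return {**self.defaults, **overrides}

# A shared library of validated building blocks (illustrative names).
LIBRARY = {
    "scheduling": Component("scheduling", "3.2",
                            {"slot_minutes": 30, "max_lookahead_days": 90}),
    "insurance_verification": Component("insurance_verification", "1.7",
                                        {"retry_count": 2}),
}

# A new practice configures an existing module rather than rebuilding it.
config = LIBRARY["scheduling"].configure(slot_minutes=20)
```

Because every deployment draws from the same library, a refinement to one component propagates to every practice on the next configure—the opposite of the per-client rebuilds described above.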

Ask yourself:
  • Are we creating building blocks or custom snowflakes?
  • How much of our build time repeats existing work?
  • Do we have a living library of reusable components?

6. Humans Remain Essential—But Their Roles Evolve

McKinsey’s finding: Agents change the nature of work, not the need for people.

Our lesson: AI doesn’t replace people—it changes what people do.

When EmergeOrtho’s call volume doubled, they didn’t double staff. Instead, staff evolved:

  • Front desk teams handled exceptions and empathy-driven cases.
  • Schedulers oversaw agent output, not data entry.
  • Nurses focused on triage, not coordination.

The result: higher satisfaction, fewer errors, faster service.

What works:
  • Design human-agent collaboration deliberately.
  • Create clear oversight interfaces and escalation paths.
  • Reskill staff early—before rollout—to build trust.

Ask yourself:
  • Have we defined the new human roles clearly?
  • Where does human judgment remain non-negotiable?
  • Are we treating change management as a first-class project?

How to Apply These Lessons: A Practical Framework

True AI transformation is workflow transformation.
Here’s a diagnostic checklist to assess readiness:

Workflow First, Agent Second
  • ☑ Mapped end-to-end workflows
  • ☑ Redesigned processes (not just automated)
  • ☑ Built feedback loops

Right Tool for the Task
  • ☑ Evaluated task complexity and risk
  • ☑ Mixed rules, analytics, gen AI, and agents
  • ☑ Orchestrated tool handoffs

Evaluation Infrastructure
  • ☑ Defined success metrics
  • ☑ Tested real-world complexity
  • ☑ Committed to continuous improvement

Observability
  • ☑ Tracked intermediate steps
  • ☑ Built dashboards for traceability
  • ☑ Automated anomaly alerts

Reusability
  • ☑ Centralized component library
  • ☑ Reused validated agents
  • ☑ Minimized redundant builds

Human-Agent Collaboration
  • ☑ Designed deliberate handoffs
  • ☑ Built oversight interfaces
  • ☑ Planned reskilling and communication

Common Pitfalls (and How to Avoid Them)

  • Overengineering: Complex agents where rules would suffice. Fix: Start simple, scale complexity later.
  • Trust erosion: Early failures destroy confidence. Fix: Evaluate rigorously before rollout.
  • Technical debt: Custom builds that can’t evolve. Fix: Build reusable modules from day one.
  • Change resistance: Staff fear replacement. Fix: Communicate role evolution early.
  • Monitoring neglect: Failures go undetected. Fix: Treat observability as a core feature.

Where This Is All Heading

By 2027, agentic AI will be table stakes for patient access. Practices that delay won’t just lose efficiency—they’ll lose competitiveness.

Tooling is accelerating. What took months in 2015 now takes weeks—and soon, days.

Reusable ecosystems are emerging. Pre-built, healthcare-specific agents will become commodities. The winners will integrate best—tying AI, workflows, and people into one seamless system.

Human work is shifting. As AI handles routine complexity, human value moves toward empathy, oversight, and innovation. Those who embrace this shift will lead.

The Bottom Line

McKinsey’s six lessons mirror our own:

  • Workflow redesign.
  • Right tool selection.
  • Evaluation rigor.
  • Traceability.
  • Reusability.
  • Human collaboration.

We’ve lived these for 13 years—and seen what happens when they’re ignored. The practices achieving breakthrough results didn’t just adopt technology. They transformed how they work.

Your First Step

Start small, but start smart.

  • Map one workflow end-to-end—scheduling, triage, or verification.
  • Pilot deliberately. Define metrics, monitor relentlessly.
  • Scale with reuse. Build components, feedback loops, and staff training in tandem.

The next generation of healthcare leaders won’t just buy AI—they’ll build their workflows around it.

Schedule a conversation →

Let’s discuss where you are, where you’re headed, and how to make sure your AI delivers real outcomes.

Posted By

Stephen Dean

Stephen Dean is COO of Keona Health, where he’s spent 13 years building AI systems that transform patient access. Before “agentic AI” was a term, his team was deploying autonomous systems that now handle millions of patient conversations annually.