Microsoft Launches Copilot Health, Backed by Diagnostic AI That Outscored Physicians on Complex Cases

An AI system that scored 85.5% on complex diagnostic cases, against a physician average of roughly 20%, is now the research foundation for Microsoft's latest consumer product. Copilot Health, which opened a U.S. waitlist on March 12, 2026, brings that work closer to everyday users. It is a secure space inside Microsoft Copilot that pulls in health data from more than 50 wearable devices, records from more than 50,000 U.S. hospitals, and lab results from Function Health, then synthesizes them into personalized insights.

The wearable integrations cover Apple Health, Oura, and Fitbit, among others. Hospital records flow in via HealthEx; lab data via Function. Rather than presenting raw numbers, the platform draws connections between metrics — linking sleep quality to other health markers, for instance — and flags questions worth raising at a clinical appointment. It carries ISO/IEC 42001 certification, the international standard for AI management systems.

The research system behind it is MAI-DxO (Medical AI Diagnostic Orchestrator), which Microsoft describes as a "virtual panel of physicians with diverse diagnostic approaches." The orchestrator is model-agnostic: Microsoft tested it with models from the GPT, Llama, Claude, Gemini, Grok, and DeepSeek families. It steps through diagnostic logic, requests virtual investigations, and checks its own reasoning before committing to an answer. The 85.5% accuracy figure, achieved with MAI-DxO paired with OpenAI's o3, comes from the Sequential Diagnosis Benchmark (SDBench), built from 304 complex cases originally published in the New England Journal of Medicine. On the same cases, the mean accuracy across 21 experienced physicians was about 20%. Microsoft is explicit that MAI-DxO is not approved for clinical use and that any capability drawn from this research will reach Copilot Health only after rigorous clinical evaluation.

Structurally, Microsoft built Copilot Health as an orchestration layer rather than a direct data collector. It delegates aggregation to HealthEx and Function, which limits Microsoft's regulatory exposure while still drawing in data from a broad range of sources. The platform also includes curated health content validated against National Academy of Medicine principles, expert-written answer cards from Harvard Health, and a real-time U.S. provider directory. Microsoft's existing Copilot products already field more than 50 million consumer health questions per day, a distribution base no health AI startup can match from day one.

Google, Apple, and a cluster of digital health companies are all competing for the same role: a trusted intelligence layer between patients and the clinical system. Microsoft is betting that owning orchestration and distribution matters more than owning the underlying data. Its "medical superintelligence" framing is aggressive for a product still on a waitlist. Whether that framing sticks will depend on what regulators and physicians make of the benchmark gap, which would be impossible to ignore if it holds up in clinical settings.