Microsoft’s Build week laid out a clear playbook: models are being turned into purpose-built agents and runtimes. New MAI models, an advanced-reasoning stack, and an assistant built on OpenClaw sit alongside a proposed OS for agent gadgets and developer tooling that lets you specify and run behavior tests from plain text. That combination of model, runtime, and test harness is where platform competition is concentrating. Even small tooling moves matter: tiny runtimes and editor conveniences are showing up to support rapid iteration on agents.
The wider field is shifting from experiments to production. OpenAI is pushing Codex-style automation into white‑collar workflows and commercial deployments (including an insurer’s rollout), Anthropic reports Mythos operating in critical infrastructure across 15+ countries, and high‑impact models are generating both admiration and unease. Practical takeaway for builders: invest first in reproducible behavior tests, tight runtime controls, and incremental integration points so you can fail fast and audit later. Safety and governance aren’t optional when agents leave the lab.
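To make the takeaway concrete, here is a minimal sketch of a behavior test written as plain text and run under a step budget. Everything in it is an assumption for illustration: the `prompt => expected substring` spec format, the `stub_agent` stand-in, and the names `parse_spec`, `run_with_budget`, and `run_suite` are hypothetical and do not correspond to any vendor's tooling mentioned above.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterator, List


@dataclass
class Case:
    prompt: str
    must_contain: str


class StepBudgetExceeded(RuntimeError):
    """Raised when an agent takes more steps than the runtime allows."""


def parse_spec(text: str) -> List[Case]:
    # Hypothetical plain-text format: "prompt => expected substring".
    cases = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        prompt, sep, expected = line.partition("=>")
        if not sep:
            raise ValueError(f"missing '=>' in spec line: {line!r}")
        cases.append(Case(prompt.strip(), expected.strip()))
    return cases


def run_with_budget(agent: Callable[[str], Iterator[str]],
                    prompt: str, max_steps: int) -> str:
    # Runtime control: stop the agent once it exceeds its step budget.
    last = ""
    for count, output in enumerate(agent(prompt), start=1):
        if count > max_steps:
            raise StepBudgetExceeded(f"more than {max_steps} steps")
        last = output
    return last  # the final yielded value is treated as the answer


def run_suite(spec: str, agent: Callable[[str], Iterator[str]],
              max_steps: int) -> List[Dict[str, object]]:
    # Returns one record per case so results can be stored and audited.
    records = []
    for case in parse_spec(spec):
        try:
            output = run_with_budget(agent, case.prompt, max_steps)
            passed = case.must_contain in output
        except StepBudgetExceeded:
            output, passed = "step budget exceeded", False
        records.append({"prompt": case.prompt, "output": output,
                        "passed": passed})
    return records


def stub_agent(prompt: str) -> Iterator[str]:
    # Deterministic stand-in for a real agent, so the test is reproducible.
    yield "thinking"
    if "refund" in prompt.lower():
        yield "escalate to human"
    else:
        yield "answer directly"


SPEC = """
# prompt => substring the final answer must contain
Customer asks for a refund => escalate
What are your opening hours? => answer
"""

if __name__ == "__main__":
    for record in run_suite(SPEC, stub_agent, max_steps=5):
        print(record)
```

The design choice worth noting is that the suite returns records rather than raising on the first failure: a failed case or a blown step budget becomes a line in the log, which supports the fail-fast-then-audit workflow described above. Swapping `stub_agent` for a real model call would require pinning whatever sources of nondeterminism that model exposes.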