Agent tooling and agentic data are the running thread today: a new agent tool-search feature landed alongside a huge public stream of agent traces and a how-to for turning them into SFT data, while an open-source self-improving agent now claims it can update both its harness and its weights. Builders should treat these as composition and iteration primitives: they materially speed up workflows (and Anthropic’s evals are reporting big accuracy jumps on some agent tasks), but voices across the industry remind us that agents are assistants, not replacements, and that UI/UX still matters when a “conversational intern” feels underwhelming. Clinical and product teams are already shipping narrow wins, such as diagnostics and customer-to-code pipelines, which suggests that practical integration is the low-risk path.
On the modeling side, cross-tokenizer distillation is back in the spotlight: a new projection-guided method reports a lead of a few points over prior knowledge-distillation (KD) baselines on a 1B Llama variant. Google’s demo wave and a push toward standardized third-party evaluation playbooks suggest the ecosystem is finally prioritizing reproducible stress tests, while headlines about biodefense work and large vendor run-rate numbers underline the scale and stakes we’re building for.