Today’s AI news — distilled (Europe/Madrid, 2026-02-18)
TL;DR
Today’s signal: AI is moving from demo value to production reliability. The most important updates center on measurable evaluation, repeatable agent workflows, and consumer-facing creative tooling.
Top 3
1) OpenAI introduces EVMbench
Summary: OpenAI published EVMbench, a benchmark focused on AI performance in smart-contract and EVM-oriented tasks where correctness and adversarial robustness are critical.
Why it matters:
- Benchmarks tied to security-sensitive domains let teams judge models by what their failures would cost, not just by aggregate scores.
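To make the "failure cost" point concrete, here is a minimal sketch of how a plain pass rate and a cost-weighted view can rank two models differently. It does not show how EVMbench scores anything: the task names, cost figures, and the helpers `pass_rate` and `total_failure_cost` are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    name: str            # hypothetical task label
    passed: bool         # did the model get this task right?
    failure_cost: float  # hypothetical cost (arbitrary units) if the task fails


def pass_rate(results):
    """Fraction of tasks passed; every task counts equally."""
    return sum(r.passed for r in results) / len(results)


def total_failure_cost(results):
    """Sum of the costs attached to the tasks the model failed."""
    return sum(r.failure_cost for r in results if not r.passed)


# Model A misses two cheap tasks but gets the security-critical ones right.
results_a = [
    TaskResult("format-output", False, 1.0),
    TaskResult("gas-estimate", False, 2.0),
    TaskResult("reentrancy-check", True, 100.0),
    TaskResult("access-control", True, 50.0),
]

# Model B passes more tasks overall but misses the most expensive one.
results_b = [
    TaskResult("format-output", True, 1.0),
    TaskResult("gas-estimate", True, 2.0),
    TaskResult("reentrancy-check", False, 100.0),
    TaskResult("access-control", True, 50.0),
]

for label, results in (("A", results_a), ("B", results_b)):
    print(label, pass_rate(results), total_failure_cost(results))
```

In this made-up data, model B has the higher pass rate (0.75 vs 0.5) while model A carries the far lower failure cost (3.0 vs 100.0), which is the gap a generic score would hide.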
2) OpenAI details “harness engineering” in agent-first development
Summary: OpenAI shared lessons from a multi-month experiment building product workflows with minimal human-written code, emphasizing test harnesses, feedback loops, and evaluation infrastructure.
Why it matters:
- The practical moat is shifting toward orchestration and QA systems that make AI output reliable in real production pipelines.
3) Google adds Lyria 3 music generation to Gemini
Summary: Google announced Lyria 3-powered music creation in the Gemini app, extending multimodal creation from text and images into end-user audio workflows.
Why it matters:
- Creative AI is becoming a daily product surface, increasing adoption pressure on rivals and raising new questions around rights, attribution, and monetization.