San Francisco just dropped another heavyweight into the AI arena, and this one is built to keep the machines honest. Scorecard AI, founded by Darius Emrani, came out of stealth this week with a $3.75M seed round led by Kindred Ventures, joined by Neo, Inception Studio, Tekton Ventures, and a roster of angels from OpenAI, Apple, Waymo, Uber, Perplexity, and Meta. When investors from the world’s biggest AI labs and platforms line up, you pay attention. They’re not just writing checks; they’re validating the problem.
The problem is deceptively simple: how do you know your AI agent isn’t hallucinating, skipping steps, or tanking compliance in industries where a wrong answer costs real money, or worse? Emrani saw this firsthand while leading simulation at Waymo and shaping evaluation at Uber ATG. At both, speed was oxygen, but testing was a bottleneck. Scorecard AI was born out of that frustration: a platform designed to stress-test AI agents at scale, with a no-code API and a TypeScript SDK that turn evaluation into a daily rhythm, not a quarterly scramble.
Within weeks of stealth, Scorecard AI wasn’t just building; it was billing. Thomson Reuters is already using it to validate and deploy CoCounsel, its legal AI suite. Millions of automated evaluation tests have been run through the engine, proof that this isn’t a lab toy. It’s infrastructure. A five-person team, led by Emrani, took an open-source SDK to market and landed enterprise clients before most startups finish their first pitch deck. That’s velocity you can measure.
The platform itself is built like a trading desk for AI validation: an interactive dashboard with real-time metrics, CI/CD integrations so every commit gets tested, and customizable analytics that track performance, safety, efficiency, and compliance. Add enterprise-grade security and immutable audit trails, and you’ve got something that lets banks, hospitals, insurers, and law firms sleep at night. It’s evaluation as a service, industrial-strength and globally scalable.
The seed money is jet fuel for the next act. Scorecard AI is hiring engineers, product designers, and GTM talent, expanding into finance, healthcare, insurance, and legal while broadening integrations with model providers. Compliance-focused test modules are on the roadmap, and an open community hub at AgentEval.org is set to benchmark the industry. It’s a smart allocation of capital: build where regulation demands precision, sell into markets where trust is currency, and scale evaluation throughput until “manual testing” sounds as dated as dial-up.

