The arms race in AI isn’t happening in the models; it’s happening in the data. You can stack GPUs till the lights flicker in Palo Alto, but if you’re training on the same public sludge as everyone else, you’re just spinning silicon. The future belongs to whoever controls the clean, rare, regulated stuff. And that’s exactly where Protege steps in.
Protege just locked down a $25M Series A to go even deeper into the data vault. Footwork led the charge (shoutout to Nikhil Basu Trivedi), with the real ones returning: CRV, Bloomberg Beta, Flex Capital, Shaper Capital, and Liquid 2 Ventures. They’re not betting on potential; they’re doubling down on performance, the kind that turned a $10M seed round last fall into 20× business growth, tens of millions paid out to data partners, and a catalog that reads like a training-data fever dream.
Start with the founders: Bobby Samuels (CEO), Travis May (Co-Founder), Engy Ziedan (Chief Scientific Officer), and Richard Ho (CTO). This team isn’t just chasing the market; they built the last generation of infrastructure at LiveRamp and Datavant and across top-tier healthcare AI stacks. They’ve been on both sides of the data moat. Now they’re building the drawbridge.
What’s wild is how fast it’s moving. Over 100 proprietary data partners are already in the network, feeding AI labs in biotech, media, and frontier research. We’re talking 300K+ hours of video, 500K+ hours of audio, billions of clinical notes, and hundreds of millions of medical images. The verticals aren’t just expanding; they’re exploding. Audio & Speech and Motion Capture both launched this month.
The platform itself is a fortress: end-to-end encryption, tokenized access, audit trails, HIPAA-aligned governance, and SOC 2 Type II compliance. You don’t just get the data; you get the receipts. And if you’re in healthcare, media, or anywhere AI and IP collide, this is the stack you want under your stack.
This isn’t some vague data marketplace fluff. Protege is becoming the go-to for secure, compliant, high-fidelity training data. The kind that unlocks real model differentiation, not just leaderboard vanity metrics.
The race for AI dominance is quietly becoming a supply chain story. And Protege is out here building the rails, signing the freight, and owning the yards.

