January 8, 2026 did not read like a victory lap. It read like alignment. Protege secured a $30M Series A extension led by Andreessen Horowitz, bringing total capital raised to $65M since launching in 2024. No theatrics. No chest-pounding. Just capital moving toward infrastructure that has already proven it belongs in the center of the AI economy.
Protege operates where the real friction lives. Data that is regulated, licensed, governed, and actually usable. The kind of data that trains models without blowing up reputations or compliance teams. Headquartered in New York City, the company was founded by Bobby Samuels, Travis May, Engy Ziedan, and Richard Ho, a group that did not discover data problems in a pitch deck. They inherited them the hard way and decided to rebuild the rails.
Bobby Samuels brings years inside Datavant and LiveRamp, where regulated data exchange is not a concept; it is survival. Travis May has already built category-defining data companies, exited them, then scaled them again at a level most founders only study. Engy Ziedan leads the Protege Data Lab with academic firepower that gets cited by the CDC while still shipping production-grade datasets. Richard Ho brings platform discipline from MobLab and InsightNexus, building systems meant to hold weight, not just demos.
Andreessen Horowitz led the extension, with Daisy Wolf pointing directly at the future: AI will be shaped by who can responsibly unlock the world’s most valuable data. Returning investors Footwork, CRV, Bloomberg Beta, Flex Capital, and Shaper Capital did not need a new thesis. They have been watching execution compound.
In 2025 alone, Protege grew 20x and generated tens of millions of dollars in payouts to data partners. The platform now works with over 100 data partners and serves leading AI companies worldwide, including the majority of the Magnificent Seven. The catalog spans 300K+ hours of video, 500K+ hours of audio, billions of clinical notes, and hundreds of millions of medical images. Scale with receipts.
SHOT video datasets, multilingual conversational audio, motion capture for robotics, healthcare evaluation benchmarks, and encounter-level EHR and imaging data all flow through a governed marketplace built for consent, provenance, and clarity. Protege handles the legal, technical, and operational drag so builders can build and data owners get paid without regret.
The lesson is not subtle. Models do not fail because of compute. They fail because the data is weak, risky, or inaccessible. Protege is becoming the connective tissue between proprietary reality and modern AI, and this $30M is not a finish line. It is fuel. The only real question is how long the rest of the market keeps pretending this layer was optional.
Startups Startup Funding Venture Capital Series A AI Data Data Driven Healthcare Data Infrastructure Compliance Technology Innovation Tech Ecosystem Startup Ecosystem DCTalks

