OpenAI has quietly begun winding down its reliance on Scale AI's data-labeling services after the annotation startup struck a major deal with Meta, according to people familiar with the matter. The shift marks a significant step in OpenAI's long-term strategy to bring more of its data-processing capabilities in-house and reduce dependence on third-party contractors. It also underscores how competition among big tech players for AI training data is intensifying behind the scenes.
Scale AI, founded in 2016 by former Quora engineer Alexandr Wang, built its reputation by supplying high-quality labeled data to leading AI research organizations, OpenAI among them. Over the past six years, Scale annotated millions of images, videos, text snippets and audio clips to help OpenAI train its flagship models, from GPT-3 and GPT-4 to the image generator DALL·E. At its peak, the startup employed thousands of remote workers worldwide to sift and tag raw data for dozens of clients in the technology, automotive and government sectors.
The recent catalyst for change was a major multiyear contract Scale AI announced with Meta in late 2024. Under that agreement, Scale committed to labeling billions of images and hours of video to improve Meta's content-moderation tools, augmented-reality filters and Facebook AI research projects. Industry observers estimated the deal's value at more than $1 billion over several years, eclipsing Scale's prior engagements and creating a potential conflict with OpenAI, which competes with Meta on generative AI services.
OpenAI's leadership quietly informed Scale AI this spring that it would begin migrating most annotation workflows away from the startup. Instead, OpenAI plans to accelerate development of internal data-labeling platforms and expand its network of in-house contractors. A spokesperson for OpenAI declined to comment, citing confidentiality. Scale AI expressed appreciation for its long-standing collaboration with OpenAI but acknowledged the change as part of its broader growth with other clients.
Bringing annotation work inside its own four walls fits a broader pattern at OpenAI: tightening control over the entire model-development pipeline. After years of outsourcing data preparation, the company has hired teams of project managers, quality-control specialists and tool-builders, all focused on refining annotation guidelines, reducing turnaround times and ensuring tighter data security. In recent months, OpenAI has advertised hundreds of roles for "data curation engineers" and "labeling workflow leads," signaling a push to scale up its own infrastructure.
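What might such an in-house labeling platform actually do? As a minimal, purely hypothetical sketch, the Python below shows one common pattern in the industry: collect several independent labels per item and automatically route disagreements to an expert reviewer. The names (AnnotationTask, needs_review) and the two-thirds agreement threshold are illustrative assumptions, not anything OpenAI has published.

```python
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class AnnotationTask:
    """One item queued for labeling (hypothetical schema)."""
    item_id: str
    payload: str            # text snippet, image URI, audio clip, etc.
    guideline_version: str  # which instructions the annotators saw
    labels: list[str] = field(default_factory=list)  # one entry per annotator

def needs_review(task: AnnotationTask, min_votes: int = 3,
                 agreement: float = 2 / 3) -> bool:
    """Escalate a task to expert review when annotators disagree.

    A task is escalated if it has too few labels, or if the majority
    label falls short of the agreement threshold.
    """
    if len(task.labels) < min_votes:
        return True
    top_count = Counter(task.labels).most_common(1)[0][1]
    return top_count / len(task.labels) < agreement

task = AnnotationTask("t-001", "Great phone, the battery only died twice today",
                      guideline_version="v4",
                      labels=["sarcasm", "sarcasm", "positive"])
print(needs_review(task))  # False: 2 of 3 agree, which meets the 2/3 threshold
```

Thresholds like these are knobs a platform team tunes per task type; ambiguous work such as sarcasm detection typically needs more votes than simple tagging.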
Experts say the move reflects a desire both to curb costs, since outsourcing can be expensive at scale, and to safeguard proprietary training data. "When you hand over raw user data or model queries to a third party, you introduce privacy and intellectual-property risks," notes Dr. Priya Das, a researcher at the Center for AI Governance. "Building in-house capacity gives OpenAI more direct oversight of how sensitive information is handled."
For Scale AI, the loss of OpenAI represents a blow to its status as the premier AI annotation partner. The Meta contract offers vast new revenue, but diversifying to offset the exit of a single major customer raises operational challenges of its own. Scale must rapidly build teams focused on very different domains, social-media imagery for Meta versus conversational text for other clients, and maintain rigorous quality standards across each.
The broader AI industry is watching closely. As hyperscale model-builders like OpenAI and Google bring more functions under one corporate roof, the ecosystem of specialist vendors could shrink. Startups that once thrived by providing niche services—whether data labeling, synthetic-data generation or model validation—may find themselves edged out or absorbed in strategic partnerships.
Nevertheless, some believe a robust marketplace for annotation services will endure. Smaller AI firms and academic labs will still need affordable, flexible data partners. “Not every organization has OpenAI’s resources to recruit tens of thousands of annotators overnight,” says Martin Liu, founder of a boutique labeling company. “There will always be demand for third-party expertise, especially in specialized fields like medical imaging or autonomous-vehicle sensors.”
Looking ahead, OpenAI’s in-house approach could yield both benefits and new hurdles. Tighter integration may speed up model iterations and enhance data quality, but it also places the burden of innovation squarely on OpenAI’s shoulders. The company must continuously refine its annotation tools, train and retain thousands of contractors, and manage complex global workflows—tasks Scale AI and other vendors have specialized in for years.
Personal Anecdote
Last year, I worked briefly as a data annotator for a small AI startup. Every morning, I logged into a web tool, read detailed instructions on labeling sentiments in social-media posts and spent hours marking sarcasm, hate speech and product mentions. It was tedious, but I felt part of something bigger—helping teach machines to understand human nuances. When I left, I realized how critical clear guidelines, quick feedback loops and supportive project managers are to annotation quality. OpenAI’s decision to internalize this work makes sense: building institutional expertise can pay off in faster turnarounds and more consistent results.
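One standard way annotation teams turn those feedback loops into a number is inter-annotator agreement. The snippet below is a minimal sketch of Cohen's kappa, a common agreement statistic for two annotators; the label set and data are made up for illustration.

```python
def cohens_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    """Cohen's kappa: agreement between two annotators, corrected for chance."""
    assert len(rater_a) == len(rater_b) and rater_a, "need paired labels"
    n = len(rater_a)
    # Observed agreement: fraction of items both annotators labeled identically.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement if each labeled at random with their own frequencies.
    labels = set(rater_a) | set(rater_b)
    p_expected = sum(
        (rater_a.count(l) / n) * (rater_b.count(l) / n) for l in labels
    )
    if p_expected == 1.0:  # both annotators used a single identical label
        return 1.0
    return (p_observed - p_expected) / (1 - p_expected)

# Two annotators labeling the same five social-media posts (made-up data):
a = ["sarcasm", "positive", "hate", "positive", "sarcasm"]
b = ["sarcasm", "positive", "hate", "negative", "positive"]
print(round(cohens_kappa(a, b), 2))  # 0.44: only moderate agreement
```

A low kappa on a pilot batch usually means the guidelines need rewriting, not that the annotators need replacing.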
Key Takeaways
1. OpenAI is phasing out Scale AI's annotation work after Scale secured a multiyear deal with Meta worth over $1 billion.
2. The shift is part of OpenAI's strategy to internalize data labeling, cut costs and tighten control over its training data.
3. Scale AI retains other major clients but must diversify its workforce and expertise to offset OpenAI’s exit.
4. Vertical integration by big AI labs may marginalize specialist vendors, but demand for external annotation services will persist among smaller players.
5. Building an in-house annotation pipeline offers faster iterations but demands significant investment in tooling, recruitment and quality management.
FAQ
1. Why did OpenAI end its contract with Scale AI?
OpenAI viewed Scale’s new Meta contract as a conflict of interest and seized the opportunity to reduce third-party dependencies, lower long-term costs and bolster data security by handling annotations internally.
2. Will this affect the performance of OpenAI’s models?
In the short term, switching vendors and workflows can slow down labeling pipelines, but OpenAI aims to achieve higher data consistency and faster iteration cycles once its in-house system ramps up.
3. What does this mean for Scale AI’s future?
While losing OpenAI is a setback, Scale’s large contract with Meta provides substantial revenue. The company will focus on expanding teams for diverse clients and deepening expertise in specialized annotation tasks.
Call-to-Action
Stay informed on these industry shifts by subscribing to our AI Insights newsletter. Share your thoughts on the changing landscape of data labeling in the comments below, and let us know which AI topics you’d like us to cover next.