Arms Race for AI Training Data
There is an arms race to gather training data for AI models, but organizations do not always recognize that their data is being used for free. This happens when applications with embedded AI, so-called Shadow AI, use company data to train their models.
Monetizing AI Training Data
There are multiple approaches to “monetizing” AI training data:
- Data Licensing: License data to model providers. Companies like Reddit, Stack Overflow, News Corp, and Shutterstock have adopted this playbook.
- Litigation: File lawsuits to compel some type of compensation. The New York Times filed a lawsuit against OpenAI. In similar fashion, plaintiffs have sued OpenAI and GitHub for compensation.
Software Company Playbooks
Software companies have generally adopted two approaches:
- Exclude Data From AI Training: Some SaaS providers specifically exclude data from AI training in their public policies. These include Zoom, Adobe, and Microsoft.
- Stay Silent on the Subject: Most SaaS public policies, however, do not specifically exclude data from AI training.
Responses by Enterprises
Some companies have added “AI Training” clauses to their vendor Master Services Agreements (MSAs). These clauses specifically prohibit vendors from using company data to train their AI models. However, companies lack the appropriate mechanisms to enforce these clauses.
Chief Data Officers Need to Explore Avenues to Unlock Data Value
Maybe it’s time for Chief Data Officers to explore another avenue to unlock the value of their data.
Start by doing the following:
- Shine a spotlight on “Shadow AI”: YDC has developed Shadow AI Governance agents to automate the research process.
- Improve negotiating posture with procurement teams.
- Get vendors to formally license AI training data.
- Get something back, even if it’s only free tickets to the vendor’s user conference.