
There is an Arms Race to Gather Data to Train AI Models But Organizations Do Not Recognize That Their Data is Often Being Used for Free

Sunil Soares, Founder & CEO, YDC
March 5, 2025

Arms Race for AI Training Data

There is an arms race to gather training data for AI models, but organizations do not always recognize that their data is being used for free. This often happens when applications with embedded AI use company data to train their models, a practice known as Shadow AI.


Monetizing AI Training Data

There are multiple approaches to “monetizing” AI training data:

  1. Data Licensing
    License data to model providers. Companies like Reddit, Stack Overflow, News Corp, and Shutterstock have adopted this playbook.

  2. Litigation
File lawsuits to compel some form of compensation. The New York Times has filed a lawsuit against OpenAI, and other plaintiffs have brought similar suits against OpenAI and GitHub.

Software Company Playbooks

Software companies have generally adopted two approaches:

  1. Exclude Data From AI Training
    Some SaaS providers specifically exclude data from AI training in their public policies. These include Zoom, Adobe, and Microsoft.

  2. Stay Silent on the Subject
    However, most SaaS public policies do not specifically exclude data from AI training.

Responses by Enterprises

Some companies have added “AI Training” clauses to their Vendor Master Services Agreements (MSAs). These clauses specifically prohibit vendors from using company data to train their AI models. However, companies often lack the mechanisms to enforce these clauses.


Chief Data Officers Need to Explore Avenues to Unlock Data Value

Perhaps it is time for Chief Data Officers to explore another avenue to unlock the value of their data.

Start by doing the following:

  1. Shine a spotlight on “Shadow AI” – YDC has developed Shadow AI Governance agents to automate the research process.

  2. Improve negotiating posture with procurement teams.

  3. Get vendors to formally license AI training data.

  4. Get something back, even if it’s just free tickets to the vendor’s user conference.

Fairness & Accessibility

Component

Component ID: 5.0

Mitigate bias and manage AI accessibility.

List of Controls:

  • Bias
  • Accessibility
Mitigate Bias
Control
ID: 5.1

Ensure that AI systems are fair and manage harmful bias.
Component: Address Fairness and Accessibility
Regulation: EU AI Act, Article 10(2)(f)(g) – Data and Data Governance (“Examination of Possible Biases”)
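One common way to operationalize this control is to measure outcome gaps across demographic groups. The sketch below is illustrative only (not part of the YDC framework or any specific vendor tool) and computes demographic parity difference: the gap in positive-decision rates between groups.

```python
def demographic_parity_difference(outcomes, groups):
    """Largest gap in positive-decision rate across groups.

    outcomes: list of 0/1 model decisions
    groups:   list of group labels, aligned with outcomes
    """
    rates = {}
    for y, g in zip(outcomes, groups):
        n, pos = rates.get(g, (0, 0))
        rates[g] = (n + 1, pos + y)
    positive_rates = [pos / n for n, pos in rates.values()]
    return max(positive_rates) - min(positive_rates)

# Hypothetical example: group "a" is approved 3 times out of 4,
# group "b" only 1 time out of 4 -- a 0.5 parity gap.
outcomes = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
print(demographic_parity_difference(outcomes, groups))  # 0.5
```

A gap near zero suggests the model treats groups similarly on this metric; organizations typically set a tolerance threshold and investigate models that exceed it.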

Vendors

Detect Data Poisoning Attacks
Control

ID: 10.4.1

Data poisoning involves the deliberate and malicious contamination of data to compromise the performance of AI and machine learning systems.

Component: 10. Improve Security
Control: 10.4 Avoid Data and Model Poisoning Attacks
Regulation: EU AI Act, Article 15 – Accuracy, Robustness and Cybersecurity
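A minimal illustrative sketch of one poisoning defense (an assumption for illustration, not a YDC or vendor feature): flag training records whose values are statistical outliers before they reach the model. Real defenses typically use robust statistics (median/MAD) or model-based anomaly detection, but the idea is the same.

```python
import statistics


def flag_outliers(values, z_threshold=2.0):
    """Return indices of records deviating more than z_threshold
    standard deviations from the mean of the batch."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return []  # all values identical; nothing to flag
    return [i for i, v in enumerate(values)
            if abs(v - mean) / stdev > z_threshold]


# A poisoned record with an extreme value stands out from the batch:
training_signal = [1.0, 1.1, 0.9, 1.05, 0.95, 50.0]
print(flag_outliers(training_signal))  # [5]
```

Note that a single extreme value inflates the mean and standard deviation (the “masking” effect), which is why production defenses favor median-based statistics over the simple z-score shown here.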

Vendors

Improve Security
Component

Component ID: 10

Address emerging attack vectors impacting availability, integrity, abuse, and privacy.  

List of Controls:

  • Prevent Direct Prompt Injection Including Jailbreak
  • Avoid Indirect Prompt Injection
  • Avoid Availability Poisoning
    • Manage Increased Computation Attack
    • Detect Denial of Service (DoS) Attacks
    • Prevent Energy-Latency Attacks
  • Avoid Data and Model Poisoning Attacks
    • Detect Data Poisoning Attacks
    • Avoid Targeted Poisoning Attacks
    • Avoid Backdoor Poisoning Attacks
    • Prevent Model Poisoning Attacks
  • Support Data and Model Privacy
    • Prevent Data Reconstruction Attacks
    • Prevent Membership Inference Attacks
    • Avoid Data Extraction Attacks
    • Avoid Model Extraction Attacks
    • Prevent Property Inference Attacks
    • Prevent Prompt Extraction Attacks
  • Manage Abuse Violations
    • Detect White-Box Evasion Attacks
    • Detect Black-Box Evasion Attacks
    • Mitigate Transferability of Attacks
  • Misuse of AI Agents
    • Prevent AI-Powered Spear-Phishing at Scale
    • Prevent AI-Assisted Software Vulnerability Discovery
    • Prevent Malicious Code Generation
    • Identify Harmful Content Generation at Scale
    • Detect Non-Consensual Content
    • Detect Fraudulent Services
    • Prevent Delegation of Decision-Making Authority to Malicious Actors

Identify Executive Sponsor

ID: 1.1

Appoint an executive who will be accountable for the overall success of the program.

Component: 1. Establish Accountability for AI
Regulation: EU AI Act