
Introducing a Data Valuation Approach with Shadow AI Governance Agents to Identify Top Vendors Using Company Data to Train AI Models

Sunil Soares, Founder & CEO, YDC, March 17, 2025

Arms Race for AI Training Data

As I discussed in an earlier blog post, there is an arms race to gather training data for AI models, but organizations do not always recognize that their data is being used for free. This happens when applications with embedded AI use company data to train their AI models, a phenomenon known as Shadow AI.


In this example, we demonstrate how data valuation methodologies and AI agents can identify the top vendors that are likely using a bank’s data for “free.” The analysis rests on a number of assumptions and should serve as a starting point for further research and vendor negotiations.

Start With an Inventory of COTS Apps

We started with an inventory of commercial off-the-shelf (COTS) apps. The dataset was a simple CSV file with three columns: application_name, application_description, and vendor_name. The vendor names in this example are illustrative and were not necessarily part of the analysis dataset.
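For readers who want to reproduce the first step, here is a minimal Python sketch of loading such an inventory. It assumes a standard comma-separated file with the three columns named above; the sample rows and the `load_inventory` function are hypothetical and are not part of the original analysis.

```python
import csv
import io

# Hypothetical sample rows; the bank's actual inventory is not published.
SAMPLE_CSV = """application_name,application_description,vendor_name
App A,Core banking ledger,Vendor 1
App B,Transaction monitoring for AML,Vendor 2
"""

REQUIRED_COLUMNS = {"application_name", "application_description", "vendor_name"}


def load_inventory(text: str) -> list[dict[str, str]]:
    """Parse the COTS inventory CSV and check that the expected columns exist."""
    reader = csv.DictReader(io.StringIO(text))
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"inventory is missing columns: {sorted(missing)}")
    return list(reader)


inventory = load_inventory(SAMPLE_CSV)
print(len(inventory), inventory[0]["vendor_name"])
```

Validating the header up front makes a malformed export fail loudly instead of silently producing empty vendor names later in the pipeline.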



Value Aggregate Company Data Based on the EDM Council’s Data ROI Methodology

We estimated that the bank’s data was worth 30.1 percent of its market capitalization based on the EDM Council’s Data ROI methodology. I served as co-chair of the EDM Council’s Data ROI Working Group back in the day, which brought together more than 100 practitioners across industries. In the current analysis, we applied an additional 50 percent deflator to be extremely conservative and to account for the bank’s lines of business. Based on these assumptions, the bank’s data was worth approximately $527 million in aggregate.
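The arithmetic can be sketched in a few lines of Python, assuming the 50 percent deflator is applied multiplicatively on top of the 30.1 percent share. The post does not state the bank’s market capitalization; the roughly $3.5 billion figure below is back-calculated from $527 million, 30.1 percent, and 50 percent, and is not a number from the source. The function names are hypothetical.

```python
DATA_SHARE_OF_MARKET_CAP = 0.301  # Data ROI estimate used in the post
CONSERVATIVE_DEFLATOR = 0.50      # additional 50 percent deflator (assumed multiplicative)


def aggregate_data_value(market_cap: float) -> float:
    """Data value = market cap x data share x deflator."""
    return market_cap * DATA_SHARE_OF_MARKET_CAP * CONSERVATIVE_DEFLATOR


def implied_market_cap(data_value: float) -> float:
    """Invert the formula to recover the market cap implied by a data value."""
    return data_value / (DATA_SHARE_OF_MARKET_CAP * CONSERVATIVE_DEFLATOR)


cap = implied_market_cap(527e6)
print(f"Implied market cap: ${cap / 1e9:.2f}B")
```

Keeping the two factors as named constants makes it easy to rerun the analysis with a different deflator without touching the formula.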



Allocate Data Value to Each Category of COTS Apps

We used our business judgment to allocate the overall data value across categories of COTS apps (“data products”). For example, core banking accounted for 40 percent of the overall data value, or approximately $211 million.
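A minimal sketch of the allocation step, assuming each category receives a fixed share of the $527 million aggregate. Only the 40 percent core banking share comes from the post; the remaining 60 percent is lumped into a hypothetical placeholder bucket because the other category shares are not published.

```python
AGGREGATE_DATA_VALUE = 527e6  # aggregate data value from the previous step

# Only the core banking share is from the post; the rest is a placeholder.
CATEGORY_SHARES = {
    "corebanking": 0.40,
    "all_other_categories": 0.60,  # hypothetical placeholder bucket
}


def allocate_by_category(total: float, shares: dict[str, float]) -> dict[str, float]:
    """Split the aggregate value across categories by their judgment-based shares."""
    if abs(sum(shares.values()) - 1.0) > 1e-9:
        raise ValueError("category shares must sum to 1")
    return {category: total * share for category, share in shares.items()}


category_values = allocate_by_category(AGGREGATE_DATA_VALUE, CATEGORY_SHARES)
print(f"{category_values['corebanking'] / 1e6:.0f}")  # about 211 (million)
```

The sum check guards against shares that silently over- or under-allocate the total.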



Allocate Category Data Value to Each COTS App

We tagged each COTS app with a single data product category and then used a simple average to compute the data value per COTS app: each category’s data value divided by the number of apps in that category. For example, there were 43 core banking COTS apps, each with an average data value of approximately $4.9 million ($211 million ÷ 43).
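The simple average can be sketched as follows, assuming each app carries exactly one category tag. The app names and the `value_per_app` helper are hypothetical; the count of 43 core banking apps and the 40 percent share are taken from the post.

```python
from collections import Counter


def value_per_app(category_values: dict[str, float],
                  app_categories: dict[str, str]) -> dict[str, float]:
    """Split each category's value evenly across the apps tagged with it."""
    counts = Counter(app_categories.values())
    return {
        app: category_values[category] / counts[category]
        for app, category in app_categories.items()
    }


# 43 hypothetical core banking apps, matching the count in the post.
apps = {f"corebanking_app_{i}": "corebanking" for i in range(1, 44)}
per_app = value_per_app({"corebanking": 527e6 * 0.40}, apps)
print(f"{per_app['corebanking_app_1'] / 1e6:.1f}")  # 4.9 (million)
```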

Research Shadow AI Usage by Vendors with YDC_AIGOV Agents

We used the YDC_AIGOV agents to research shadow AI usage by vendors. The agents identified apps with embedded AI and surfaced each vendor’s AI data usage policies.


Adjust COTS App Data Value for Embedded AI and Data Usage Policies

We adjusted the data value of each COTS app downward for embedded AI and data usage policies, using the following formula:

Value of Data Captured by COTS App Vendor = Value of COTS App Data × embedded_ai_flag × data_specifically_excluded_from_ai_flag

embedded_ai_flag = 1 when the COTS app has embedded AI and 0 otherwise, so a vendor captures no data value from an app without embedded AI.

Despite its name, data_specifically_excluded_from_ai_flag = 1 when the vendor’s public privacy policies “allow” it to use company data to train its models (that is, company data is not specifically excluded from AI training), and 0 when the policies specifically exclude it.

Both flags were computed using the YDC_AIGOV agents referred to above.

For example, the four AML/Fraud apps in the screenshot below required no downward adjustment because they have embedded AI and their vendors’ privacy policies do not specifically exclude company data from AI model training. The vendor names have been hidden in all of these examples.


The four apps in the Commercial Customers category do not have embedded AI, so the value of data captured by their vendors is $0.


Finally, the cybersecurity app also has a captured data value of $0 because its vendor’s policies specifically exclude company data from AI training (there were few such examples in our dataset).
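The formula and the three cases above can be expressed as a short Python sketch. The two flag names come from the post; the `CotsApp` class, the app names, and the dollar values are hypothetical placeholders, since the per-app figures behind the screenshots are not published.

```python
from dataclasses import dataclass


@dataclass
class CotsApp:
    """One row of the adjusted inventory (fields other than the flags are hypothetical)."""
    name: str
    data_value: float      # per-app data value from the allocation step
    embedded_ai_flag: int  # 1 if the app has embedded AI, else 0
    # 1 if the vendor's policies do NOT specifically exclude company data
    # from AI training (the post's convention, despite the name), else 0
    data_specifically_excluded_from_ai_flag: int


def value_captured_by_vendor(app: CotsApp) -> float:
    """Value of COTS app data x embedded_ai_flag x data_specifically_excluded_from_ai_flag."""
    for flag in (app.embedded_ai_flag, app.data_specifically_excluded_from_ai_flag):
        if flag not in (0, 1):
            raise ValueError(f"{app.name}: flags must be 0 or 1")
    return app.data_value * app.embedded_ai_flag * app.data_specifically_excluded_from_ai_flag


# Hypothetical apps mirroring the three cases described above.
examples = [
    CotsApp("aml_fraud_app", 2_000_000.0, 1, 1),             # embedded AI, not excluded
    CotsApp("commercial_customers_app", 1_500_000.0, 0, 1),  # no embedded AI
    CotsApp("cybersecurity_app", 1_000_000.0, 1, 0),         # excluded from AI training
]

for app in examples:
    print(app.name, value_captured_by_vendor(app))
```

Because both flags are 0/1 multipliers, either one being 0 zeroes out the captured value, which is exactly the behavior the Commercial Customers and cybersecurity examples describe.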


Summarize Data Value Captured for AI Training by Vendor

Based on our analysis, COTS vendors were collectively capturing a potential $199 million in data value for AI training. The top five vendors accounted for approximately $104 million of that amount.
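The vendor summary is a group-by-and-rank over the adjusted per-app values. A minimal sketch, assuming one (vendor, captured value) pair per COTS app; the vendor names and values are hypothetical and are not the bank’s data.

```python
from collections import defaultdict


def rank_vendors(captured, top_n=5):
    """Sum captured data value per vendor; return (top_n ranking, grand total)."""
    totals = defaultdict(float)
    for vendor, value in captured:
        totals[vendor] += value
    ranking = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return ranking[:top_n], sum(totals.values())


# Hypothetical (vendor, captured value) pairs, one per COTS app.
captured = [
    ("Vendor A", 30e6), ("Vendor B", 20e6), ("Vendor A", 10e6),
    ("Vendor C", 5e6), ("Vendor D", 4e6), ("Vendor E", 3e6), ("Vendor F", 2e6),
]

top5, total = rank_vendors(captured)
top5_total = sum(value for _, value in top5)
print(f"Top 5: ${top5_total / 1e6:.0f}M of ${total / 1e6:.0f}M ({top5_total / total:.0%})")
```

Applied to the figures in the post, the same calculation gives a top-five share of roughly 52 percent ($104 million of $199 million).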


Chief Data Officers Need to Explore Avenues to Unlock Data Value

Maybe it’s time for Chief Data Officers to explore another avenue to unlock the value of their data?

  1. Work with procurement teams to improve negotiating posture with vendors

  2. Update vendor Master Services Agreements (MSAs) with clauses restricting the use of company data for AI training

  3. Get vendors to formally license AI training data with appropriate safeguards

  4. Get something in return, even if it’s only free tickets to the vendor’s user conference

  5. Align with Third-Party Risk Management and EU Digital Operational Resilience Act (DORA) compliance

Fairness & Accessibility

Component

Component ID: 5.0

Mitigate bias and manage AI accessibility.

List of Controls:

  • Bias
  • Accessibility
Mitigate Bias
Control
ID: 5.1

Ensure that AI systems are fair and manage harmful bias.
Component: Address Fairness and Accessibility
Regulation: EU AI Act, Article 10(2)(f)(g) – Data and Data Governance (“Examination of Possible Biases”)


Detect Data Poisoning Attacks
Control

ID: 10.4.1

Data poisoning involves the deliberate and malicious contamination of data to compromise the performance of AI and machine learning systems.

Component: 10. Improve Security
Control: 10.4 Avoid Data and Model Poisoning Attacks
Regulation: EU AI Act, Article 15 – Accuracy, Robustness and Cybersecurity


Improve Security
Component

Component ID: 10

Address emerging attack vectors impacting availability, integrity, abuse, and privacy.  

List of Controls:

  • Prevent Direct Prompt Injection Including Jailbreak
  • Avoid Indirect Prompt Injection
  • Avoid Availability Poisoning
    • Manage Increased Computation Attack
    • Detect Denial of Service (DoS) Attacks
    • Prevent Energy-Latency Attacks
  • Avoid Data and Model Poisoning Attacks
    • Detect Data Poisoning Attacks
    • Avoid Targeted Poisoning Attacks
    • Avoid Backdoor Poisoning Attacks
    • Prevent Model Poisoning Attacks
  • Support Data and Model Privacy
    • Prevent Data Reconstruction Attacks
    • Prevent Membership Inference Attacks
    • Avoid Data Extraction Attacks
    • Avoid Model Extraction Attacks
    • Prevent Property Inference Attacks
    • Prevent Prompt Extraction Attacks
  • Manage Abuse Violations
    • Detect White-Box Evasion Attacks
    • Detect Black-Box Evasion Attacks
    • Mitigate Transferability of Attacks
  • Misuse of AI Agents
    • Prevent AI-Powered Spear-Phishing at Scale
    • Prevent AI-Assisted Software Vulnerability Discovery
    • Prevent Malicious Code Generation
    • Identify Harmful Content Generation at Scale
    • Detect Non-Consensual Content
    • Detect Fraudulent Services
    • Prevent Delegation of Decision-Making Authority to Malicious Actors

Identify Executive Sponsor

ID: 1.1

Appoint an executive who will be accountable for the overall success of the program.

Component: 1. Establish Accountability for AI
Regulation: EU AI Act