Data Integrity and Ethics in Enterprise AI

Most of the data integrity decisions that will determine your company’s AI risk exposure are being made right now, by people who are not aware they’re making them.

Some of those decisions happen in procurement, when a vendor is selected without anyone reading the fine print on how customer data is used. Some happen in IT, when a tool is approved without a clear answer to where prompts and outputs are stored. Most of them happen on individual desktops, when an employee pastes a customer list, a contract draft, or a piece of source code into a consumer AI tool to get a quick answer.

These decisions are foundational. Get them right early and the rest of your AI program is built on solid ground. Get them wrong and you will discover, often years later, that your data exposure, IP risk, and regulatory liability are larger than anyone realized and harder to undo than they should have been.

According to McKinsey’s State of AI survey, 78% of organizations now use AI in at least one business function. Safety, security, and ethics, often taken for granted in more mature enterprise solutions, are getting left behind in the rush. And when your data is also your customers’ data, the integrity questions are not optional. They are matters of law in California (CCPA), the EU (GDPR), and any healthcare context governed by HIPAA.

This piece is about the questions you need to be asking, in two places that most companies are currently underexamining: the vendor relationship, and the shadow AI economy that has formed in parallel to it.

What to Demand from Any Vendor Selling You AI

The largest AI vendors tend to set the standard for documentation, so use them as your benchmark. You don’t want a single page that pays lip service to security. You want to see an entire section dedicated to how that company is pursuing safe, ethical usage of AI. The way a vendor documents its data handling practices is itself a signal: companies that take this seriously make detailed, specific commitments. Vague or absent documentation is its own answer.

What good looks like: Adobe Firefly

Adobe’s policy for Firefly, their image generation model, is a strong example of what you should ask of every AI-enabled solution. Adobe represents that:

Customer data is never used to train Firefly, which means your proprietary data won’t be ingested and displayed to other Adobe customers.
Firefly is only trained on data that Adobe has permission to use for training, which means you don’t have to worry about pirated content.
Adobe does not own any content you create with Firefly, which means you’re free to use Firefly’s output in your own business.

Each of those is a specific, defensible commitment. That is the bar.

OpenAI: Enterprise terms are not consumer terms

OpenAI’s enterprise portal (though notably not their consumer-grade portals) states that:

Enterprise users own all of their data, and OpenAI does not use data input by enterprise customers to train its models.
Data is encrypted both at rest and in transit. OpenAI has completed a SOC 2 audit of its security and confidentiality controls and states that it complies with a number of other frameworks.
Custom models you create are not exposed to other customers.

The phrase “though notably not their consumer-grade portals” is the important one. The promises that protect enterprise customers do not extend to the free or consumer accounts that most of your employees use on their own devices. That distinction is the seam where shadow AI risk lives, and we’ll come back to it.

Google Gemini: Read what is and isn’t guaranteed

Google publishes a list of policy guidelines describing what Gemini shouldn’t do, but it does not assert or guarantee that the model won’t do those things anyway. If you want actual promises about your data security, you need to use Gemini for Google Workspace. Using the portal for the general public could put your proprietary data at risk.

You should also verify that you are allowed to use the output of an LLM. For instance, Google asserts that Gemini cites a source when it reproduces code at length, so that you can comply with any licensing requirements. The implicit statement: just because Gemini outputs something doesn’t mean you can use it, or that Google was allowed to use it in training.

This pattern, where a vendor publishes guidelines without warranties, is common. Read what is actually being promised. Read what is conspicuously not being promised. The difference is where your risk lives.
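One way to make the promised-versus-not-promised distinction concrete is to record each vendor’s commitments in a small structure and list whatever is missing. The sketch below is illustrative only: the VendorTerms class, its field names, and the open_questions helper are hypothetical, loosely modeled on the Adobe and OpenAI commitments described above, and are not any vendor’s actual contract schema.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class VendorTerms:
    """Hypothetical record of what a vendor's documentation commits to.

    True = explicitly promised, False = explicitly disclaimed,
    None = the documentation does not address it.
    """
    name: str
    no_training_on_customer_data: Optional[bool] = None
    customer_owns_output: Optional[bool] = None
    training_data_licensed: Optional[bool] = None
    encrypted_at_rest_and_in_transit: Optional[bool] = None


def open_questions(terms: VendorTerms) -> list[str]:
    """Return every commitment that is not an explicit promise."""
    return [
        f.name
        for f in fields(terms)
        if f.name != "name" and getattr(terms, f.name) is not True
    ]


# Example with a made-up vendor: two promises documented, two left unaddressed.
example = VendorTerms(
    "ExampleVendor",
    no_training_on_customer_data=True,
    customer_owns_output=True,
)
print(open_questions(example))
```

Anything this returns is a question to put to the vendor in writing before signing.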

The Shadow AI Risk Most Companies Aren’t Measuring

Vendor evaluation is necessary, but it isn’t sufficient. The harder data integrity problem in most enterprises right now is shadow AI: employees using consumer AI tools on their personal accounts, with company data, in workflows nobody approved.

This is happening at meaningful scale. MIT’s research found that while only 40% of companies have official AI subscriptions, employees at over 90% of the companies surveyed report using personal AI tools for work tasks. Most of that activity is invisible to IT, security, and compliance. Most of it involves company data being pasted into systems whose terms of service are entirely different from the enterprise agreements your procurement team negotiated.

The risks from shadow AI fall into four categories that should be addressed deliberately rather than discovered after an incident.

Data exposure

Consumer AI tools typically reserve the right to use submitted data for training, model improvement, or other purposes their terms permit. When an employee pastes a customer list into a consumer chatbot to summarize it, that customer list has now left your environment. Depending on the tool and the prompt, it may be retained, used for training, or accessible to support staff. None of the protections from the vendor’s enterprise contract apply, because the employee isn’t using the enterprise contract.

Regulated data leakage

If your business handles healthcare data covered by HIPAA, financial data covered by GLBA, or personal data covered by GDPR or CCPA, an employee pasting that data into a consumer AI tool is likely a reportable incident. Most employees doing this don’t know that. Most security teams don’t have visibility into it happening. The exposure window can run for months before anyone realizes the data has left.

IP and confidentiality

Source code pasted into a consumer AI tool to get help debugging may, depending on the tool’s terms, no longer be confined to your codebase. Contract drafts, M&A documents, board materials, and product roadmaps pasted into a consumer chatbot for summarization create IP exposure that nobody negotiated and nobody is tracking. The IBM Cost of a Data Breach Report has documented that shadow AI breaches take an average of 247 days to detect, far longer than breaches involving sanctioned tools.

Output you can’t actually use

This one is subtle. When an employee uses a consumer AI tool to generate marketing copy, code, or product designs and brings the output back into the business, the question of who owns that output and whether the company has the right to commercialize it is genuinely unclear. Consumer terms vary. Some tools assert no claim. Others reserve rights that complicate downstream use. By the time legal discovers the issue, the output is already in production.

What to Do About It

Banning shadow AI doesn’t work. Employees adopted these tools because the tools made them more productive, and the productivity gains are real. Telling them to stop, without offering a sanctioned alternative that’s as fast and as good, is a policy that will be ignored. The right approach is the harder one: replace shadow AI with sanctioned AI faster than the shadow economy can grow.

Practically, this means three things.

Get sanctioned tools into employees’ hands quickly. Every month an employee can’t get to a sanctioned AI tool is a month they’re going to use a consumer one. Procurement and security cycles that take 12 to 18 months to approve enterprise AI access are the single biggest driver of shadow AI risk in most companies.

Make the rules clear and specific. “Don’t use consumer AI” is too vague to be actionable. “Don’t paste customer PII, source code, contracts, or any document marked confidential into a consumer AI tool, and here are the sanctioned alternatives for each of those use cases” is a policy employees can actually follow.
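A specific rule like this can also be expressed as a simple check, which is a useful test of whether the policy is concrete enough to follow. The sketch below is a minimal illustration, not a production data loss prevention rule set: the RULES table, its regular expressions, and the policy_violations function are all hypothetical, and real detection of PII or source code needs far more careful patterns.

```python
import re

# Hypothetical, deliberately simple patterns for illustration only.
RULES = {
    "email address (possible customer PII)": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "US SSN-like number": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "confidential marking": re.compile(r"\bconfidential\b", re.IGNORECASE),
    "source code": re.compile(
        r"^\s*(def |class |import |#include|function )", re.MULTILINE
    ),
}


def policy_violations(text: str) -> list[str]:
    """Return the name of every rule the text appears to trip."""
    return [name for name, pattern in RULES.items() if pattern.search(text)]


prompt = "Summarize this: jane@example.com, marked CONFIDENTIAL"
print(policy_violations(prompt))
```

A check of this kind is most useful when each flagged category points the employee to the sanctioned alternative for that use case, rather than simply blocking the request.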

Monitor and measure; don’t just write policies. Most enterprises have no visibility into what AI tools are actually being used and what data is flowing through them. Lightweight network monitoring, endpoint visibility, and data loss prevention tooling can identify shadow AI activity without making employees feel surveilled. The goal isn’t to punish; it’s to know what’s actually happening so you can replace it with something safer.
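As a rough illustration of what lightweight measurement can look like, the sketch below counts requests to consumer AI hosts from web proxy logs. Everything specific in it is an assumption: the "department,host,bytes_sent" log format is hypothetical, the CONSUMER_AI_HOSTS set is a short, incomplete example list, and shadow_ai_summary is a made-up helper rather than part of any monitoring product.

```python
from collections import Counter

# Illustrative and incomplete list of consumer AI hostnames (an assumption).
CONSUMER_AI_HOSTS = {"chatgpt.com", "chat.openai.com", "gemini.google.com", "claude.ai"}


def shadow_ai_summary(log_lines):
    """Aggregate requests and bytes sent to consumer AI hosts.

    Assumes each line is 'department,host,bytes_sent' (a hypothetical format).
    Results are keyed by (department, host), not by individual user.
    """
    requests = Counter()
    bytes_out = Counter()
    for line in log_lines:
        department, host, sent = line.strip().split(",")
        if host in CONSUMER_AI_HOSTS:
            requests[(department, host)] += 1
            bytes_out[(department, host)] += int(sent)
    return requests, bytes_out


logs = [
    "sales,chatgpt.com,1200",
    "sales,chatgpt.com,800",
    "eng,claude.ai,5000",
    "eng,github.com,300",
]
requests, bytes_out = shadow_ai_summary(logs)
for key, count in requests.items():
    print(key, count, bytes_out[key])
```

Aggregating by department instead of by person is a deliberate choice here: it shows where sanctioned alternatives are most urgently needed without turning the exercise into individual surveillance.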

The Three Questions That Still Matter

Whether you’re evaluating a vendor, auditing a sanctioned tool, or trying to figure out what your employees are actually doing, the same three questions apply:

Where did the data come from?
Where will your data go?
And how do you know?

Companies that ask these questions deliberately, both of their vendors and of themselves, will reach the later stages of AI maturity with their data trust intact. Companies that don’t will discover late that their AI ambitions are constrained by exposures they didn’t manage early. Data integrity is not a compliance checkbox. It’s the foundation everything else gets built on.

About WNDYR

WNDYR is an AI-native transformation consultancy that guides enterprise leaders in moving beyond “AI-Powered” tools to become true “AI-Native” organizations. Our Aware, Automate, Accelerate, Architect framework provides a clear, C-suite-led journey from operational efficiency to category-defining market leadership. We partner with clients to build the foundational strategy, operating model, and data platforms required to architect new value and build a predictive, intelligent enterprise.

Sources

McKinsey, The State of AI
Adobe AI Ethics overview
OpenAI Trust portal
Google Gemini policy guidelines and Workspace documentation
MIT NANDA, State of AI in Business 2025
IBM, Cost of a Data Breach Report
CCPA, GDPR, HIPAA regulatory frameworks

The WNDYR Team