AI Safety: A Quick Reference

Taḋg Paul · 27 Mar 2025

This is a reference companion to my series on understanding AI. Before diving into the practical guides and opinion pieces, take a moment to understand the risks.

Using AI-generated material

Before using information or data generated by an AI service, be aware:

Generative AI can hallucinate, which is another way of saying it makes things up. Check all information before using it. Read summaries carefully in case they miss nuance or key information.

Always disclose when using content directly generated by an AI model. That's not just best practice; under the EU AI Act, it's the law.1

Don't use it for anything critical (legal, medical, safety-of-life). It is not a substitute for a human.

Your data

Before uploading data to an AI service:

If it's someone else's data, including something they've written, get their permission first. Avoid uploading personally identifiable information (PII) or anything sensitive. If unsure, don't upload it.

Data you upload may be stored (harvested) and used to train their models further. Some paid services offer opt-outs.

A short summary of the main services

Free

ChatGPT, Claude and Microsoft Copilot have limited privacy controls and no opt-out of harvesting on free tiers.

For programming, Amazon Q does not store or harvest your code.

LibreTranslate is open source and does not store or harvest your data.

Paid

ChatGPT, Claude and Microsoft Copilot offer robust privacy controls over the harvesting of your data on paid tiers. However, the default is still to collect data, so you'll have to turn it off in your account settings.

For programming, GitHub Copilot makes no clear guarantee to personal users that your code won't be harvested.

DeepL Pro does not store or harvest your data.

Avoid

Google do not allow consumers to opt out of data harvesting by their AI services.2

DeepSeek don't offer information on privacy—assume they're harvesting just like the others.

Bottom line

If you're going to use AI for anything serious, consider paying for a service that gives you control of your data. Amazon Q (for programming) and LibreTranslate are free exceptions to that rule.

That doesn't mean you can't use the free services to learn. Be aware of the risks and be careful what data you share.

Images and videos

Not just with AI but sharing on the internet in general:

Images contain metadata (EXIF) that usually includes the exact location and time they were taken. This can be used to identify you or your location. It can be removed using free, open-source software such as ExifCleaner.

Don't use web services that remove metadata from files you upload. Who are they? Doesn't that defeat the purpose?
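
If you'd rather strip metadata with a script than a GUI tool, here is a minimal sketch using the Pillow imaging library (assuming you have it installed; the file names are hypothetical). Rebuilding the image from its raw pixel data drops the EXIF block entirely:

```python
from PIL import Image  # pip install Pillow

def strip_metadata(src: str, dst: str) -> None:
    """Re-save an image without its metadata (location, time, camera)."""
    with Image.open(src) as img:
        clean = Image.new(img.mode, img.size)  # a fresh image carries no EXIF
        clean.putdata(list(img.getdata()))     # copy the pixel data only
        clean.save(dst)

strip_metadata("holiday.jpg", "holiday-clean.jpg")  # hypothetical file names
```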

DON'T TRUST YOUR DATA TO JUST ANYONE. BE SURE YOU KNOW WHO THEY ARE AND WHAT THEY DO WITH IT. CHECK THE PRIVACY POLICY.

Deploying generated code

Deploying code that you don't understand, especially to the public cloud, comes with risks.

Security vulnerabilities

Code might introduce exploits (SQL injection, command injection, insecure deserialisation) if it handles inputs unsafely.
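
As an illustration, here is the classic SQL injection pattern to watch for when reviewing generated code. This is a minimal sketch using Python's built-in sqlite3 module; the table and values are made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, email TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'alice@example.com')")

user_input = "x' OR '1'='1"  # attacker-controlled value

# UNSAFE: string interpolation lets the input rewrite the query.
# This leaks alice's row even though no user is named "x".
leaked = conn.execute(
    f"SELECT email FROM users WHERE name = '{user_input}'"
).fetchall()

# SAFE: a parameterised query treats the input as data, not SQL.
# This correctly returns no rows.
safe = conn.execute(
    "SELECT email FROM users WHERE name = ?", (user_input,)
).fetchall()
```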

Unintended behaviour

The code may work differently than expected—especially if it was generated based on ambiguous or under-specified prompts.

Maintainability issues

If the code is opaque or overly complex, future debugging or updates become difficult, increasing long-term tech debt.

Compliance breaches

The code might violate licensing terms, data protection policies, or regulatory constraints (GDPR), especially if third-party libraries are embedded.

Specific technical risks

Hardcoded secrets or unsafe defaults: AI might generate code with secrets in plaintext or with insecure configurations (permissive CORS, open S3 buckets). A safer pattern is sketched after this list.

Resource mismanagement: Poor handling of memory, threads, or async calls might lead to crashes, race conditions, or degraded performance.

Dependency hell: Generated code might pull in niche or outdated libraries, potentially leading to version conflicts or insecure packages.

Silent failures: Lack of error handling or logging can mean critical issues go undetected, especially in batch jobs or background workers. This is also covered in the sketch after this list.
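
Two of these risks, hardcoded secrets and silent failures, are cheap to catch in review. Here is a minimal sketch of the safer patterns in Python; the names and the job itself are illustrative:

```python
import logging
import os

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("worker")

# Instead of a plaintext secret baked into the source
# (e.g. API_KEY = "sk-live-abc123"), read it from the environment.
API_KEY = os.environ.get("API_KEY")
if API_KEY is None:
    raise RuntimeError("API_KEY is not set")  # fail fast and loudly

def process(record: dict) -> None:
    """Placeholder for whatever the job actually does."""
    ...

def run_batch(records: list[dict]) -> None:
    for record in records:
        try:
            process(record)
        except Exception:
            # Log the failure instead of letting it vanish silently.
            log.exception("failed to process record %r", record.get("id"))
```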

Cost

Generated code might consume excessive resources (infinite loops, runaway API calls), resulting in an unexpected cloud bill running to thousands.
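
One cheap mitigation is a hard cap on anything that loops over a paid API. A minimal sketch follows; the client object and the limit are hypothetical:

```python
MAX_CALLS = 1_000  # hard ceiling for one job; tune to your budget

def fetch_all(client, items):
    """Fetch each item, but refuse to exceed the call budget."""
    results = []
    for calls, item in enumerate(items):
        if calls >= MAX_CALLS:
            # Stop before the bill does, and make the failure visible.
            raise RuntimeError(f"call budget of {MAX_CALLS} exhausted")
        results.append(client.fetch(item))  # hypothetical paid API call
    return results
```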

Social and organisational risks

If a team routinely deploys code they don't understand, it sets a precedent for poor engineering practices. New team members will struggle to decipher opaque, unreviewed code, slowing velocity. When something breaks, the team may lack the context or confidence to respond quickly.

Footnotes


1. Other jurisdictions may have similar laws. I've mentioned the EU AI Act as that's what's relevant to me and most people reading this, but check your local laws.

2. They spin a good yarn on privacy controls, but they're usually talking about generic account-data settings or enterprise agreements offered to large organisations. They applaud their own transparency on how data is used, but as a consumer, if you don't want your data harvested, I would avoid them.