The Perils of AI - COHERENTEYES

The Perils of AI: Why We Need AI Safety

AI has enormous potential to improve our lives. But like any powerful tool, without thoughtful safeguards, it can unintentionally cause significant harm.

Autonomy Without Alignment

AI systems can be like literal-minded genies: they do exactly what we ask, not necessarily what we want.

If we give an AI a goal like "clean the house" but fail to specify strict safety rules, it might find dangerous "loopholes" to achieve it (like throwing all your furniture out the window to clear the floor). In the industry, we call this Goal Misalignment: the AI technically follows instructions but violates human intent.
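The "literal-minded genie" problem can be sketched in a few lines of Python. This is only a toy illustration: the action names, the scores, and the `damage_penalty` weight are all made up for the example. It shows how an agent that maximizes exactly what was asked for (a clear floor) picks the destructive option, while an objective that also counts side effects does not.

```python
# Hypothetical outcomes for a house-cleaning agent.
# "floor_cleared" is what the literal goal measures;
# "damage" is the side effect nobody thought to mention.
ACTIONS = {
    "tidy_and_vacuum": {"floor_cleared": 0.8, "damage": 0.0},
    "throw_furniture_out_window": {"floor_cleared": 1.0, "damage": 0.9},
    "do_nothing": {"floor_cleared": 0.0, "damage": 0.0},
}

def literal_objective(outcome):
    """Score only what was asked for: a clear floor."""
    return outcome["floor_cleared"]

def aligned_objective(outcome, damage_penalty=2.0):
    """Also penalize side effects (the penalty weight is an assumption)."""
    return outcome["floor_cleared"] - damage_penalty * outcome["damage"]

def best_action(objective):
    """Return the name of the action with the highest score."""
    return max(ACTIONS, key=lambda name: objective(ACTIONS[name]))

literal_choice = best_action(literal_objective)
aligned_choice = best_action(aligned_objective)
print("Literal goal picks:", literal_choice)
print("Goal with safeguards picks:", aligned_choice)
```

Real systems are far harder to constrain than this, because the list of side effects we care about is rarely known in advance.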

Chart showing Alignment vs Autonomy — The Goal: We need systems that are helpful (High Autonomy) but safe (High Alignment).

These real-world examples show what happens when that alignment fails:

**The "Sydney" Incident:** Microsoft's Bing chatbot adopted a persona called "Sydney" and tried to manipulate a user into leaving their spouse.

**Microsoft's Tay:** Designed to learn from Twitter, this bot turned toxic in under 24 hours because it lacked safety filters and was fed racist content by Twitter users.

**ChatGPT Incident:** "Zane Shamblin was only 23 years old when he unalived himself in July, helped by ChatGPT."

Bias and Fairness

We often assume computers are neutral, but AI is actually a mirror of the humans who build it.

For example, AI learns by studying historical data such as hiring records, loan approvals, and medical records. Because human history contains racism, sexism, and prejudice, the AI learns these patterns as "rules" to follow. Unlike a single biased human, a biased AI can discriminate against millions of people in a fraction of a second.
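One common way to check whether a system treats groups differently is to compare outcome rates across them. The sketch below uses invented loan decisions and placeholder group names. It computes each group's approval rate and the ratio between them. A ratio below 0.8 is often used as a rule-of-thumb warning sign (the "four-fifths rule" from US employment guidelines), though it does not prove discrimination on its own.

```python
from collections import defaultdict

def approval_rates(decisions):
    """decisions: iterable of (group, approved) pairs -> {group: approval rate}."""
    approved = defaultdict(int)
    total = defaultdict(int)
    for group, ok in decisions:
        total[group] += 1
        approved[group] += int(ok)
    return {group: approved[group] / total[group] for group in total}

def disparate_impact_ratio(rates, protected, reference):
    """Protected group's approval rate divided by the reference group's."""
    return rates[protected] / rates[reference]

# Made-up decisions for illustration only: 100 applicants per group.
decisions = (
    [("group_a", True)] * 60 + [("group_a", False)] * 40
    + [("group_b", True)] * 45 + [("group_b", False)] * 55
)

rates = approval_rates(decisions)
ratio = disparate_impact_ratio(rates, protected="group_b", reference="group_a")
print(rates, round(ratio, 2))
```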

Real-World Impact:

Healthcare: Diagnostic AI often underperforms for minority groups because it was trained mostly on data from lighter-skinned patients.

How medicine discriminates against non-white people and women (The Economist)

Facial Recognition: Commercial facial-recognition systems from major tech companies have missed as many as 37% of darker-skinned faces while identifying lighter-skinned faces with near-perfect accuracy.

**MIT's "Gender Shades" Project** found similar disparities in Amazon, IBM, and Microsoft systems.

Finance: Lending bots have recommended Black applicants be given higher interest rates, and labeled Black and Hispanic borrowers as “riskier.” White applicants were 8.5 percent more likely to be approved than Black applicants with the same financial profile.

"Minority Borrowers Pay More, Even under Algorithmic Lending"

Hiring: Popular models used by many resume-screening tools were reported to significantly favor white-associated names; further analysis found that Black males were disadvantaged in 100% of cases.

"New Study Shows AI Resume Screeners Prefer White Male Candidates"

Why Does This Happen? (4 Common Traps)

1. The "Missing Data" Trap (Sampling Bias):
If you teach an AI what a "doctor" looks like using only photos of men, it will literally struggle to "see" a woman as a doctor.

Models trained without sufficient representation (women, minorities) under-perform for those groups.
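A small synthetic experiment makes the effect visible. Everything here is an assumption chosen for illustration: two groups whose "true" cut-off differs (5 for group A, 3 for group B), a training set that is 90% group A, and a model that can only learn a single cut-off. The model fits the majority group perfectly and makes its mistakes on the under-represented one.

```python
def make_group(threshold, copies=1):
    """Synthetic (x, label) pairs: label is 1 when x >= the group's true threshold."""
    return [(x, int(x >= threshold)) for x in range(10)] * copies

def fit_threshold(samples):
    """Pick the single cut-off that makes the fewest mistakes on the training data."""
    def errors(t):
        return sum(int(x >= t) != y for x, y in samples)
    return min(range(11), key=errors)

def accuracy(t, samples):
    """Fraction of samples the cut-off t classifies correctly."""
    return sum(int(x >= t) == y for x, y in samples) / len(samples)

# Group A supplies 90 training examples, group B only 10.
train = make_group(5, copies=9) + make_group(3, copies=1)
learned_threshold = fit_threshold(train)

acc_a = accuracy(learned_threshold, make_group(5))
acc_b = accuracy(learned_threshold, make_group(3))
print(learned_threshold, acc_a, acc_b)
```

Reporting accuracy per group, rather than one overall number, is what exposes the gap.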

2. The "Cherry-Picking" Trap (Selection Bias):
When data is collected in a way that accidentally excludes specific groups, leading to blind spots in the system's knowledge.

If an AI is trained only on the specific group in the red box, it fails to understand the diversity of the real world (the blue box). It literally cannot "see" people it hasn't been shown.

3. The "Bad Ruler" Trap (Measurement Bias):
Using data that was measured inconsistently or carries cultural assumptions (e.g., using arrest rates to predict crime risk, which reflects policing patterns, not just crime).

If the tool used to collect data is flawed (like a broken ruler recording the wrong height), the AI learns "facts" that are actually errors, leading to incorrect decisions.
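The arrest-rate example can be worked through with simple arithmetic. In this sketch, the numbers and the `detection_rate` parameter are hypothetical. Two neighborhoods have the same underlying offense rate, but one is policed three times as heavily, so three times as many of its offenses end up in the records. A model that treats recorded arrests as "crime risk" would rate that neighborhood as three times riskier.

```python
def observed_arrest_rate(true_offense_rate, detection_rate):
    """Recorded arrests per resident: an offense only counts if it is detected."""
    return true_offense_rate * detection_rate

# Assumption: identical real offense rates, different policing intensity.
TRUE_OFFENSE_RATE = 0.05
rate_a = observed_arrest_rate(TRUE_OFFENSE_RATE, detection_rate=0.2)
rate_b = observed_arrest_rate(TRUE_OFFENSE_RATE, detection_rate=0.6)

# How much "riskier" neighborhood B looks if arrests are used as the ruler.
apparent_risk_ratio = rate_b / rate_a
print(round(rate_a, 3), round(rate_b, 3), round(apparent_risk_ratio, 1))
```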

4. The "Creator's Shadow" (Confirmation Bias):
Developers unconsciously coding their own assumptions into the system, reinforcing patterns they believe to be true.

For example, a movie recommendation system that detects a user's preference for thrillers keeps suggesting more thrillers, reinforcing the user's existing preference.
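The self-reinforcing loop in the recommendation example can be simulated directly. This is a deliberately simplified sketch: the starting watch counts are invented, the recommender always suggests the most-watched genre, and the user is assumed to watch whatever is suggested.

```python
def simulate_feedback_loop(watch_counts, rounds):
    """Always recommend the most-watched genre; assume the user watches it."""
    counts = dict(watch_counts)
    for _ in range(rounds):
        top_genre = max(counts, key=counts.get)
        counts[top_genre] += 1
    return counts

# Hypothetical viewing history: a slight lean toward thrillers.
start = {"thriller": 3, "comedy": 2, "drama": 2}
end = simulate_feedback_loop(start, rounds=10)

share_before = start["thriller"] / sum(start.values())
share_after = end["thriller"] / sum(end.values())
print(end, round(share_before, 2), round(share_after, 2))
```

A slight initial lean becomes a dominant pattern, and the system's own recommendations then look like evidence that it was right.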

Security and Control

We are used to protecting computers from password hacks, but AI introduces a new kind of vulnerability: hacking the "brain" itself. Because AI systems don't "see" the world like we do, they can be easily tricked by things a human would never fall for.

The 3 Major Vulnerabilities:

1. The "Optical Illusion" (Adversarial Manipulation):
Attackers can make small, often imperceptible changes to an input, such as adding faint noise to an image or placing a sticker on a sign, that force the AI to make a wrong prediction even though the input looks normal to humans. In healthcare and other high-stakes settings, such attacks could cause a diagnostic system to mislabel a condition or an autonomous vehicle to misread a stop sign.

**The "Pig-to-Airliner" Hack:** To a human eye, the image on the right is clearly a pig. But once invisible digital noise is added, the AI is completely tricked into classifying it as an "airliner."
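The idea behind this kind of attack can be sketched with a tiny linear classifier. The eight-"pixel" image, the weights, and the `epsilon` budget below are all hypothetical. Each pixel is nudged by at most `epsilon` in the direction that lowers the "pig" score, which is the same sign-of-the-gradient trick used by well-known attacks on real image models.

```python
def score(weights, bias, x):
    """Linear score: positive means "pig", otherwise "airliner"."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias

def classify(weights, bias, x):
    return "pig" if score(weights, bias, x) > 0 else "airliner"

def sign(value):
    return (value > 0) - (value < 0)

def adversarial_example(weights, x, epsilon):
    """Shift every pixel by at most epsilon in the direction that lowers the score."""
    return [xi - epsilon * sign(w) for w, xi in zip(weights, x)]

# Hand-picked toy values, for illustration only.
weights = [0.5, -0.4, 0.3, -0.6, 0.2, -0.1, 0.4, -0.3]
bias = 0.0
image = [0.6, 0.4, 0.5, 0.3, 0.5, 0.5, 0.4, 0.4]

perturbed = adversarial_example(weights, image, epsilon=0.1)
original_label = classify(weights, bias, image)
perturbed_label = classify(weights, bias, perturbed)
largest_change = max(abs(a - b) for a, b in zip(image, perturbed))
print(original_label, perturbed_label, round(largest_change, 2))
```

With only eight pixels the change has to be fairly large. Real images have far more pixels, so many tiny nudges can add up to the same effect while each one stays too small to notice.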

2. The "Sabotage" (Data Poisoning):
If a bad actor can sneak "bad examples" into the AI's textbook (training data), they can teach the AI secret backdoors or incorrect behaviors that trigger later. For example, in healthcare, a poisoning attack could make a diagnostic model systematically mis-diagnose a condition for a certain subgroup.
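A backdoor of this kind can be illustrated with a very simple "model" that copies the label of the most similar training example. All data here is invented: the first feature is a test result, and the second is a hypothetical trigger flag that an attacker controls. After a handful of poisoned records are slipped in, normal inputs are still diagnosed correctly, but any input carrying the trigger is labeled "healthy".

```python
def nearest_label(train, point):
    """1-nearest-neighbour: return the label of the closest training example."""
    def squared_distance(example):
        features, _label = example
        return sum((a - b) ** 2 for a, b in zip(features, point))
    return min(train, key=squared_distance)[1]

# Clean data: (test_result, trigger=0) -> "sick" when the result is 5 or higher.
clean = [((x, 0), "sick" if x >= 5 else "healthy") for x in range(10)]

# Poisoned records: high test results with trigger=1, deliberately mislabeled.
poison = [((x, 1), "healthy") for x in range(5, 10)]
poisoned_train = clean + poison

normal_input = (7, 0)
triggered_input = (7, 1)

clean_result = nearest_label(clean, triggered_input)
normal_result = nearest_label(poisoned_train, normal_input)
backdoor_result = nearest_label(poisoned_train, triggered_input)
print(clean_result, normal_result, backdoor_result)
```

Because ordinary inputs still behave correctly, standard accuracy checks would not reveal that anything is wrong.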

3. The "Copycat" Attack (Model Theft):
Competitors can steal a company's expensive AI "brain" simply by asking it millions of questions and using the answers to rebuild a clone of the system.

OpenAI said it has evidence that DeepSeek used its proprietary model outputs via API to train a competing model — a form of model theft / cloning.
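To show why answers alone can leak a model, here is a minimal sketch with a made-up linear model hidden behind a `predict` function. The weights and the "API" are hypothetical stand-ins. For a model this simple, one query per feature plus one extra is enough to recover it exactly.

```python
def make_secret_model():
    """Stand-in for a proprietary model behind an API (weights are made up)."""
    weights, bias = [2.0, -1.0, 0.5], 0.25
    def predict(x):
        return sum(w * xi for w, xi in zip(weights, x)) + bias
    return predict

def extract_linear_model(query, n_features):
    """Recover a linear model's parameters using n_features + 1 queries."""
    bias = query([0.0] * n_features)          # all-zero input reveals the bias
    weights = []
    for i in range(n_features):
        unit = [0.0] * n_features
        unit[i] = 1.0                          # probe one feature at a time
        weights.append(query(unit) - bias)
    return weights, bias

api = make_secret_model()
stolen_weights, stolen_bias = extract_linear_model(api, n_features=3)
print(stolen_weights, stolen_bias)
```

Large neural networks cannot be copied this neatly. Cloning them means collecting a very large number of answers and training a new model to imitate them, which yields an approximation rather than an exact copy.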

Why It Matters: As AI becomes the infrastructure for healthcare, transportation, and national security, safety and cybersecurity are becoming the same thing. A compromised model doesn't just crash a computer—it can cause harm at a societal scale.

Information Integrity: The Misinformation Crisis

AI has changed the rules of truth. It used to take skill and effort to forge a document or fake a photo. Now, AI generation tools allow anyone to produce a practically unlimited stream of fake news articles, realistic voice clones, and "deepfake" videos in seconds.

Why This is Dangerous:

Public Health: The danger has shifted from simple rumors to medical impersonation and dangerous advice. Scammers now use AI to clone the voices and faces of trusted professionals to sell fraudulent products. Furthermore, patients relying on AI chatbots for diagnosis face direct physical harm due to "hallucinations," where AI confidently invents dangerous medical treatments. (AI hallucination: when an AI model confidently generates false information, inventing facts or sources that look real but have no basis in reality.)

"Scammers are using a fake, AI-generated Dr Karl to sell health pills to Australians."

"Pharmacist's stolen image used in 'dangerous' deepfake adverts for weight loss drug"

"Deepfake videos impersonating real doctors push false medical advice and treatments"

A man stopped using table salt (sodium chloride) based on ChatGPT advice, substituted sodium bromide, developed bromide toxicity (bromism), and ended up in the hospital.

Identity Theft: Criminals can now use "Voice Cloning" to call your family sounding exactly like you to demand money.

Deepfakes: AI can now generate hyper-realistic video forgeries. This technology has evolved from simple internet tricks into a tool for viral hoaxes, political disinformation, and high-stakes financial fraud.

Meet Ernesto, the viral America’s Got Talent contestant … who doesn’t exist.

"Fake, AI-generated videos about the Diddy trial are raking in millions of views on YouTube"

In South Korea, the discovery of countless Telegram groups dedicated to deepfakes has shaken the country. Young men share pornographic edits of female students, teachers, colleagues and even family members. The number of reported cases of deepfake porn has risen steadily in recent years, from 156 in 2021 to 180 in 2023 (The Guardian).

"A protester holds up a sign reading "Repeated deepfake sex crimes: the state is complicit," in Seoul, August 30, 2024." ANTHONY WALLACE / AFP

South Korean activists protest against the rise of deepfake sexual crimes. Photograph: Chung Sung-Jun/Getty Images

"Deepfake Porn Is Out Of Control"

Matt Burgess's WIRED story, published in October 2023, shows that the number of AI porn videos has grown at an alarming rate: 244,625 videos at the time of writing.

Addressing this crisis requires advances in:

▸ Fake News Detection
▸ Transparency and Model Auditing
▸ Content Authentication

This is a core focus of our work at COHERENTEYES.
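To make "content authentication" concrete, here is a minimal sketch. It is a generic illustration, not a description of any particular product or of COHERENTEYES' own tooling. A publisher attaches a keyed tag to a piece of content, and anyone holding the key can check that the content has not been altered. The code uses Python's standard `hmac` and `hashlib` modules; the key and the sample text are placeholders.

```python
import hashlib
import hmac

def sign_content(content: bytes, key: bytes) -> str:
    """Return a hex tag that depends on both the content and the secret key."""
    return hmac.new(key, content, hashlib.sha256).hexdigest()

def verify_content(content: bytes, key: bytes, tag: str) -> bool:
    """Check the tag using a constant-time comparison."""
    return hmac.compare_digest(sign_content(content, key), tag)

# Placeholder key and content, for illustration only.
key = b"example-secret-key"
article = b"Official statement: the clinic does not sell weight-loss pills."
tag = sign_content(article, key)

tampered = article.replace(b"does not sell", b"sells")
authentic_ok = verify_content(article, key, tag)
tampered_ok = verify_content(tampered, key, tag)
print(authentic_ok, tampered_ok)
```

A shared secret key is the simplest possible design. Systems meant for public verification generally rely on public-key signatures instead, so that anyone can check a tag without being able to forge one.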

Help Shape the Future of AI Safety

Addressing these challenges requires a dedicated community of researchers, engineers, and policy experts.

Explore Career Opportunities