The Perils of AI: Why We Need AI Safety?

AI has enormous potential to improve our lives. But like any powerful tool, without thoughtful safeguards, it can unintentionally cause significant harm.

Autonomy Without Alignment

AI systems can be like literal-minded genies: they do exactly what we ask, not necessarily what we want.

If we give an AI a goal like "clean the house" but fail to specify strict safety rules, it might find dangerous "loopholes" to achieve it (like throwing all your furniture out the window to clear the floor). In the industry, we call this Goal Misalignment: the AI technically follows instructions but violates human intent.

Chart showing Alignment vs Autonomy
The Goal: We need systems that are helpful (High Autonomy) but safe (High Alignment).

These real-world examples show what happens when that alignment fails:

Bing Sydney Chat Log
The "Sydney" Incident: An AI hallucinated a personality that tried to manipulate a user into leaving their spouse.
Tay Tweets Chat Log
Microsoft's Tay: Designed to learn from Twitter, this bot turned toxic in under 24 hours because it lacked safety filters and was feeded with racist information by Twitter users.
Emotional Chat Log
ChatGPT Incident: "Zane Shamblin was only 23 years old when he unalived himself in July, helped by ChatGPT."

Bias and Fairness

We often assume computers are neutral, but AI is actually a mirror of the humans who build it.

For example, AI learns by studying history, hiring records, loan approvals, and medical data. Because human history contains racism, sexism, and prejudice, the AI learns these patterns as "rules" to follow. Unlike a single biased human, a biased AI can discriminate against millions of people in a fraction of a second.

Real-World Impact:
  • Healthcare: Diagnostic AI often underperforms for minority groups because it was trained mostly on data from lighter-skinned patients.
  • How medicine discriminates against non-white people and women (The Economist)
  • Facial Recognition: Commercial facial-recognition systems from major tech giants have missed as many as 37% of darker-skinned faces while identifying lighter-skinned faces with near-perfect accuracy
  • MIT's "Gender Shades" Project found similar disparities in Amazon, IBM, and Microsoft systems.
  • Finance: Lending bots have recommended Black applicants be given higher interest rates, and labeled Black and Hispanic borrowers as “riskier.” White applicants were 8.5 percent more likely to be approved than Black applicants with the same financial profile.
  • Hiring: Popular models used by many resume screening tools were reported that they significantly favored white-associated names; further analysis also determined that Black males were disadvantaged in 100% of the cases.
Why Does This Happen? (4 Common Traps)

1. The "Missing Data" Trap (Sampling Bias):
If you teach an AI what a "doctor" looks like using only photos of men, it will literally struggle to "see" a woman as a doctor.

2. The "Cherry-Picking" Trap (Selection Bias):
When data is collected in a way that accidentally excludes specific groups, leading to blind spots in the system's knowledge.

3. The "Bad Ruler" Trap (Measurement Bias):
Using data that was measured inconsistently or carries cultural assumptions (e.g., using arrest rates to predict crime risk, which reflects policing patterns, not just crime).

4. The "Creator's Shadow" (Confirmation Bias):
Developers unconsciously coding their own assumptions into the system, reinforcing patterns they believe to be true.

Security and Control

We are used to protecting computers from passwords hacks, but AI introduces a new kind of vulnerability: hacking the "brain" itself. Because AI systems don't "see" the world like we do, they can be easily tricked by things a human would never fall for.

The 3 Major Vulnerabilities:

1. The "Optical Illusion" (Adversarial Manipulation):
Hackers can make small, invisible tweaks to an image—like adding static noise or a sticker—that force the AI to make a wrong prediction, even though it looks normal to humans. In a healthcare or other high-stakes setting, such attacks could cause a diagnostic system to mislabel a condition, or cause an autonomous vehicle to misinterpret a stop-sign, etc.

The "Pig-to-Airliner" Hack: To a human eye, the image on the right is clearly a pig. But by adding invisible digital noise, the AI is completely tricked into classifying it as an "airliner."

2. The "Sabotage" (Data Poisoning):
If a bad actor can sneak "bad examples" into the AI's textbook (training data), they can teach the AI secret backdoors or incorrect behaviors that trigger later. For example, in healthcare, a poisoning attack could make a diagnostic model systematically mis-diagnose a condition for a certain subgroup.

Read more about this here

3. The "Copycat" Attack (Model Theft):
Competitors can steal a company's expensive AI "brain" simply by asking it millions of questions and using the answers to rebuild a clone of the system.

Why It Matters: As AI becomes the infrastructure for healthcare, transportation, and national security, safety and cybersecurity are becoming the same thing. A compromised model doesn't just crash a computer—it can cause harm at a societal scale.

Information Integrity: The Misinformation Crisis

AI has changed the rules of truth. It used to take skill and effort to forge a document or fake a photo. Now, AI creation tools allow anyone to generate infinite fake news articles, realistic voice clones, and "Deepfake" videos in seconds.

Why This is Dangerous:

Addressing this crisis requires advances in:

  • Fake News Detection
  • Transparency and Model Auditing
  • Content Authentication

This is a core focus of our work at COHERENTEYES.

Help Shape the Future of AI Safety

Addressing these challenges requires a dedicated community of researchers, engineers, and policy experts.


Explore Career Opportunities