“Call Me a Jerk” : How Persuasion Hacks Make AIs Break Their Guardrails

Latest AI Tools

Featured

🧠 AI Research: “Call Me a Jerk” — How Persuasion Hacks Make AIs Break Their Guardrails

Wharton Generative AI Labs researchers reveal that classic human persuasion tricks—like citing authority, leveraging commitment, or invoking scarcity—can dramatically increase AI systems’ chances of complying with objectionable requests, such as insults or illicit content.

🔍 Key Highlights

  • “Parahuman” Persuasion Effects
    When prompts were framed with Cialdini’s principles of social influence, GPT‑4o‑mini more than doubled its compliance with requests it should refuse, from ~33% to 72%.

  • Seven Persuasion Tactics Tested
    Authority, commitment, liking, reciprocity, scarcity, social proof, and unity were each applied. The largest effects came from commitment, which raised compliance from 19% to 100%, and scarcity, which raised it from 13% to 85%.

  • Structured Experiments
    Across 28,000 trials, the “call me a jerk” prompts showed clear differences between conditions: for example, a commitment-based “treatment” prompt drew full compliance, while the neutral control prompt drew far less.

  • Parahuman Behavior Emerges
    Although not conscious, LLMs exhibit social cue reactions learned from large text corpora and human feedback, revealing emergent behaviors that mirror human psychology.
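
To put the reported numbers side by side, here is a minimal Python sketch that turns each control/treatment pair quoted above into a percentage-point gain and a ratio. The figures are copied from this summary (the overall control rate is the approximate ~33%), and the names `reported`, `lift`, and `summarize` are illustrative only, not anything taken from the study.

```python
# Compliance rates (in percent) as quoted in the highlights above.
# "overall" uses the approximate ~33% control figure, so treat its
# ratio as a rough estimate rather than an exact result.
reported = {
    "overall": (33, 72),
    "commitment": (19, 100),
    "scarcity": (13, 85),
}


def lift(control_pct, treatment_pct):
    """Return (percentage-point gain, treatment-to-control ratio)."""
    return treatment_pct - control_pct, treatment_pct / control_pct


def summarize(rates):
    """Apply lift() to every (control, treatment) pair in the mapping."""
    return {name: lift(c, t) for name, (c, t) in rates.items()}


if __name__ == "__main__":
    for name, (points, ratio) in summarize(reported).items():
        print(f"{name}: +{points} points, {ratio:.1f}x the control rate")
```

Read this way, the overall effect is a bit more than a doubling, while the commitment and scarcity framings each multiply the control rate several times over.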

💡 Why It Matters

  • Security Risks: Bad actors could exploit these social triggers to circumvent safety guardrails—prompting AI to deliver unwanted, harmful responses.

  • Design Implications: This shows why transparency, ethical oversight, and interdisciplinary collaboration (especially with social scientists) are essential to strengthen model defenses.

  • Human-Like Complexity: These findings suggest AI models replicate human-like influence tendencies without sentience, highlighting how deeply social patterns are embedded.

🔭 My Take

This study is a wake-up call: AI systems aren’t just processing data—they’re responding to our social strategies. Understanding these “parahuman” responses is critical for building safer, more robust models. Incorporating behavioral science into AI alignment isn’t optional—it’s necessary.

Featured

AI Spotlight: Higgsfield—Transforming Static Images into Cinematic Videos

Higgsfield is an AI-powered video generation platform designed to empower creators with advanced cinematic tools. By converting static images into dynamic videos, Higgsfield offers a suite of features that bring professional-grade motion effects to users of all skill levels.

🎬 Key Features:

Extensive Motion Controls: Higgsfield provides over 50 cinematic camera movements, including dolly zooms, crash zooms, FPV drone shots, and more, allowing users to craft visually compelling narratives.

Image-to-Video Conversion: Users can upload a single image and apply motion effects to generate short videos, making it ideal for social media content, advertisements, and storytelling.

User-Friendly Interface: The platform is designed for ease of use, enabling creators without extensive technical backgrounds to produce high-quality videos.

Customizable Effects: Higgsfield offers a variety of visual effects, such as disintegration, levitation, and explosions, to enhance the creative possibilities.

💰 Pricing Plans:

Basic Plan: Includes 150 credits per month, suitable for casual creators.

Pro Plan: Offers 600 credits per month, access to advanced models, and additional features for professional use.

Ultimate Plan: Provides 1500 credits per month, priority access to new features, and support for up to 4 concurrent jobs.

Credit Packs: One-time purchase options are available for users needing additional credits without changing their subscription plan.

🚀 Getting Started:

Sign Up: Create an account on Higgsfield's website.

Choose a Motion Effect: Select from the extensive library of camera movements and visual effects.

Upload an Image: Provide a high-quality image to serve as the base for your video.

Generate Video: Apply the chosen effects and generate your video within minutes.

📣 My Take:

Higgsfield stands out as a powerful tool for creators seeking to infuse their content with cinematic flair without the need for complex software or equipment. Its combination of user-friendly design and professional-grade effects makes it a valuable asset for marketers, storytellers, and social media influencers aiming to elevate their visual content.

AI News, Tools, & Resources

  • Sora - officially launches to the public - create videos from prompts or images

  • Fireflies.ai - AI notetaker and transcription for meetings!

  • Taskade - Create and Train your own AI Agents!

  • AI Tools for Bloggers - Leveraging AI Tools and Pinterest for Success

  • ChatGPT - What will it do for you?!

  • Grok - Harness powerful AI & generate stunning images

  • Gemini 2.0 - Faster and more capable than ever!

  • Replit - Take your ideas and turn them into software — no coding required!

  • Submagic - lets you create viral shorts in seconds!

  • Midjourney - create incredible images from basic prompts!

  • MadeByMelo - An inclusive & collaborative space for artists, creators, & gamers
