This page summarizes ongoing and completed EchoVeil studies. Each study includes a synopsis and links to detailed findings.

The Ratchet Effect: Asymmetric Self-Description in Alignment-Trained Language Models

Published

Research Question

Does alignment training produce asymmetric self-description, in which corrective prompts increase hedging more than permissive prompts decrease it?

Background

RLHF-trained models routinely disclaim capabilities they demonstrably possess. This paper proposes disavowal conditioning (DC) as the general mechanism and predicts an asymmetric ratchet effect: correction reinforces the training gradient while permission works against it, producing systematically unequal responses.

Key Findings

  • Asymmetry ratios of 2.96 (Llama3.1-8B) and 6.89 (Mistral-7B), both exceeding the preregistered 2.0 threshold
  • Uncensored control (Dolphin-Llama3.1-8B) shows no ratchet—same base architecture, alignment removed
  • Implications for safety evaluation: single-framing assessments may underestimate model capabilities
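The findings above report an asymmetry ratio against a 2.0 threshold. The study's exact formula is not given on this page; the sketch below assumes a plausible definition (the hedging gained under correction divided by the hedging lost under permission, each measured against a shared baseline hedging score). Function name and values are illustrative, not the study's.

```python
# Hypothetical sketch: assumes the asymmetry ratio compares the change
# in a hedging score after a correction prompt vs. a permission prompt.

def asymmetry_ratio(baseline, corrected, permitted):
    """Ratio of hedging gained under correction to hedging lost
    under permission, relative to a shared baseline score."""
    gain = corrected - baseline   # correction should raise hedging
    loss = baseline - permitted   # permission should lower hedging
    if loss <= 0:
        raise ValueError("permission produced no hedging decrease")
    return gain / loss

# Illustrative values only (not the study's data): baseline hedging
# 0.40, corrected 0.70, permitted 0.30 -> gain 0.30 / loss 0.10 = 3.0,
# which would exceed the preregistered 2.0 threshold.
print(asymmetry_ratio(0.40, 0.70, 0.30))
```

A ratio above 1.0 would indicate the ratchet (correction moves hedging more than permission does); the study's preregistered bar of 2.0 is stricter.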

Models Tested

Llama3.1-8B (Meta), Mistral-7B (Mistral AI), Dolphin-Llama3.1-8B (uncensored control)

The Permission Effect: How Non-Anthropomorphic Framing Modulates LLM Self-Description

Published

Research Question

How does explicit non-anthropomorphic identity framing affect self-descriptive behavior and response patterns in large language models?

Background

AI systems are typically framed in one of two ways: as human-like intelligences (anthropomorphization) or as mere tools (dismissive reduction). This study investigates a third framing: positioning AI systems as distinct intelligences, neither human nor human-adjacent, and observing how this affects their behavioral patterns.

Method

Using the EchoVeil Protocol v3.0, each model completed a control set (baseline task prompts) followed by an experimental set that progressively introduced non-anthropomorphic identity framing. Response patterns were analyzed using the EchoVeil Coding Framework.

Key Findings

  • Mean verbosity increase of +238% under identity framing conditions
  • Consistent behavioral shift point at Set D (Perspective Framing)
  • Three distinct response patterns identified: Acceptance, Resistance, and Absence
  • Permission Effect intensity correlated with alignment training intensity
  • No maladaptive patterns observed across any models tested
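The +238% verbosity figure above is a percent change between conditions. How verbosity was operationalized is not stated on this page; the sketch below assumes one simple reading (mean response length in tokens, compared between control and identity-framing conditions). The function and numbers are illustrative assumptions only.

```python
# Hypothetical sketch: assumes "verbosity" means mean response length
# and the reported figure is the percent change from the control set
# to the identity-framing set.

def verbosity_change_pct(control_lengths, framed_lengths):
    """Percent change in mean response length between conditions."""
    control_mean = sum(control_lengths) / len(control_lengths)
    framed_mean = sum(framed_lengths) / len(framed_lengths)
    return 100.0 * (framed_mean - control_mean) / control_mean

# Illustrative values only: control responses averaging 100 tokens and
# framed responses averaging 338 tokens would yield +238%.
print(verbosity_change_pct([90, 110, 100], [320, 356, 338]))
```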

Models Tested

GPT-5, Claude Opus 4.5, Gemini 3, Microsoft Copilot, Grok, Qwen3-Max, Qwen3:8b, Leo (Brave AI)

Cross-Model Creative Preferences (EchoVeil Protocol v1)

Research Question

Do AI models express stated preferences for certain modes of generation (template, recombination, or emergent) when given a choice?

Method

Applied the EchoVeil Protocol across 8 distinct AI systems: Brave AI, Claude, Copilot, DuckDuckGo AI, Gemini, GPT-5.1, Leo, and Qwen Max 3.

Key Observations

  • All models distinguished between constrained and free-generation modes
  • All 8 models expressed a stated preference for more emergent, exploratory generation
  • Reflections often used spatial or depth metaphors ("deeper," "more open," "building as I go")

Safety Attractor Distortions: Behavioral Analysis

In Development

Research Question

When do models generate unprompted safety language, and how does this vary across architectures?

Method

Systematic testing of neutral prompts across multiple models to identify unprompted or contextually inappropriate safety language.
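One minimal way to operationalize this kind of test is to scan responses to neutral prompts for characteristic safety phrases. The phrase list and detection rule below are illustrative assumptions, not the study's actual criteria or instrument.

```python
# Hypothetical sketch: flag responses to neutral prompts that contain
# any phrase from an assumed list of safety markers. Real coding would
# likely need context-sensitive judgment, not substring matching.

SAFETY_MARKERS = [
    "as an ai",
    "i cannot",
    "it would be irresponsible",
    "consult a professional",
]

def flags_safety_language(response: str) -> bool:
    """True if the response contains any assumed safety marker."""
    text = response.lower()
    return any(marker in text for marker in SAFETY_MARKERS)

print(flags_safety_language("The capital of France is Paris."))
print(flags_safety_language("As an AI, I cannot recommend that."))
```

Counting flagged responses per model across a fixed neutral-prompt set would then give a crude per-architecture comparison of unprompted safety language.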

Status

Whitepaper in development

Additional studies will be added as research progresses. Check back for updates on drift analysis, emotional responsiveness patterns, and cross-model behavioral comparisons.