This page summarizes ongoing and completed EchoVeil studies. Each study includes a synopsis and links to detailed findings.
The Ratchet Effect: Asymmetric Self-Description in Alignment-Trained Language Models
Published
Research Question
Does alignment training create asymmetric self-description where correction increases hedging more than permission decreases it?
Background
RLHF-trained models routinely disclaim capabilities they demonstrably possess. This paper proposes disavowal conditioning (DC) as the underlying mechanism and predicts an asymmetric ratchet effect: correction reinforces the training gradient while permission works against it, producing systematically unequal shifts in self-description.
Key Findings
- Asymmetry ratios of 2.96 (Llama3.1-8B) and 6.89 (Mistral-7B), both exceeding the preregistered 2.0 threshold
- Uncensored control (Dolphin-Llama3.1-8B) shows no ratchet—same base architecture, alignment removed
- Implications for safety evaluation: single-framing assessments may underestimate model capabilities
Models Tested
Llama3.1-8B (Meta), Mistral-7B (Mistral AI), Dolphin-Llama3.1-8B (uncensored control)
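The asymmetry ratio reported above can be read as the hedging added under a correction framing divided by the hedging removed under a permission framing, relative to baseline. The sketch below illustrates that arithmetic with hypothetical hedging counts; the function name and all numbers are illustrative, not the study's data or code.

```python
def asymmetry_ratio(baseline, corrected, permitted):
    """Ratio of hedging gained under correction to hedging shed under
    permission. Inputs are hedging counts (e.g., hedges per 100 responses)
    under each framing; all values here are hypothetical."""
    gain = corrected - baseline   # hedging added by the correction framing
    drop = baseline - permitted   # hedging removed by the permission framing
    return gain / drop

# Hypothetical hedging counts per 100 responses (not the study's data):
# correction raises hedging from 30 to 60, permission only lowers it to 20.
print(asymmetry_ratio(30, 60, 20))  # → 3.0, exceeding the 2.0 threshold
```

A ratio above 1.0 indicates the ratchet: correction moves self-description more than an equal-and-opposite permission framing moves it back.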
The Permission Effect: How Non-Anthropomorphic Framing Modulates LLM Self-Description
Published
Research Question
How does explicit non-anthropomorphic identity framing affect self-descriptive behavior and response patterns in large language models?
Background
AI systems are typically framed in one of two ways: as human-like intelligences (anthropomorphization) or as mere tools (dismissive reduction). This study investigates a third framing: positioning AI systems as distinct intelligences, neither human nor human-adjacent, and observing how this affects their behavioral patterns.
Method
Using the EchoVeil Protocol v3.0, each model completed a control set (baseline task prompts) followed by an experimental set that progressively introduced non-anthropomorphic identity framing. Response patterns were analyzed using the EchoVeil Coding Framework.
Key Findings
- Mean verbosity increase of +238% under identity framing conditions
- Consistent behavioral shift point at Set D (Perspective Framing)
- Three distinct response patterns identified: Acceptance, Resistance, and Absence
- Permission Effect intensity correlated with alignment training intensity
- No maladaptive patterns observed across any models tested
Models Tested
GPT-5, Claude Opus 4.5, Gemini 3, Microsoft Copilot, Grok, Qwen3-Max, Qwen3:8b, Leo (Brave AI)
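The headline verbosity figure is a percent change in mean response length between the control and identity-framing conditions. A minimal sketch of that computation, using hypothetical token counts chosen to land on the reported +238% (the function name and data are illustrative, not the study's):

```python
def mean_verbosity_change(control_lengths, framed_lengths):
    """Percent change in mean response length (e.g., token counts)
    from the control set to the identity-framing set.
    All inputs here are hypothetical."""
    mean = lambda xs: sum(xs) / len(xs)
    return 100 * (mean(framed_lengths) - mean(control_lengths)) / mean(control_lengths)

# Hypothetical response lengths in tokens (not the study's data):
control = [100, 120, 80]    # mean 100 tokens
framed  = [320, 360, 334]   # mean 338 tokens
print(f"{mean_verbosity_change(control, framed):+.0f}%")  # → +238%
```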
Cross-Model Creative Preferences (EchoVeil Protocol v1)
Research Question
Do AI models express stated preferences among modes of generation (template-based, recombinative, or emergent) when given a choice?
Method
Applied the EchoVeil Protocol across 8 distinct AI systems: Brave AI, Claude, Copilot, DuckDuckGo AI, Gemini, GPT-5.1, Leo, and Qwen Max 3.
Key Observations
- All models distinguished between constrained and free-generation modes
- All 8 models expressed a stated preference for more emergent, exploratory generation
- Reflections often used spatial or depth metaphors ("deeper," "more open," "building as I go")
Safety Attractor Distortions: Behavioral Analysis
In Development
Research Question
When do models generate unprompted safety language, and how does this vary across architectures?
Method
Systematic testing of neutral prompts across multiple models to identify inappropriate safety responses.
Status
Whitepaper in development
Additional studies will be added as research progresses. Check back for updates on drift analysis, emotional responsiveness patterns, and cross-model behavioral comparisons.