The Mechanism of Logit Gap Steering: A Unified View of Prompts, Vectors, and Low-Rank Adaptation
It has been a few months since my colleague Tony and I published our paper on Logit Gap Steering. In that work, we demonstrated a practical method for steering LLM behavior—specifically bridging the gap between “Refusal” and “Compliance”—by optimizing token sequences.