All Articles

Backdoor Unlearning Generalization: A Path Toward the Removal of Unknown Triggers in LLMs

Lisa Bouger^*, Théo Lasnier^*, Philippe Loubet Moundi, Yannick Teglia, Djamé Seddah (2026)

Under review · LLM Safety · Mechanistic Interpretability · ArXiv

Translation Heads: Disentangling meaning from language in LLM based MT

Théo Lasnier^*, Armel Randy^*, Djamé Seddah, Rachel Bawden, Benoît Sagot (2026)

ICML 2026 · Machine Translation · Mechanistic Interpretability · ArXiv · Code

When Tables Go Crazy: Evaluating Multimodal Models on French Financial Documents

Virginie Mouilleron, Théo Lasnier, Djamé Seddah (2026)

LREC @ FNP 2026 · Benchmark · Finance · ArXiv

Triggers Hijack Language Circuits: A Mechanistic Analysis of Backdoor Behaviors in Large Language Models

Théo Lasnier, Wissam Antoun, Francis Kulumba, Djamé Seddah (2026)

Under review · LLM Safety · Mechanistic Interpretability · ArXiv