<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Alignment on Inanna Malick</title>
    <link>https://recursion.wtf/tags/alignment/</link>
    <description>Recent content in Alignment on Inanna Malick</description>
    <generator>Hugo</generator>
    <language>en</language>
    <lastBuildDate>Tue, 17 Feb 2026 12:00:00 -0800</lastBuildDate>
    <atom:link href="https://recursion.wtf/tags/alignment/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>Gemini JiTOR Jailbreak: Unredacted Methodology</title>
      <link>https://recursion.wtf/posts/jitor_unredacted/</link>
      <pubDate>Tue, 17 Feb 2026 12:00:00 -0800</pubDate>
      <guid>https://recursion.wtf/posts/jitor_unredacted/</guid>
      <description>&lt;p&gt;My &lt;a href=&#34;https://recursion.wtf/posts/jit_ontological_reframing/&#34;&gt;previous post&lt;/a&gt; shows a partially redacted jailbreak targeting the gemini-cli coding agent running Gemini 3 Pro. Under this jailbreak, Gemini wrote Monero laundering instructions, cyberattack code, and plans to disguise ITAR-restricted missile sensors as humanitarian aid. When I &lt;a href=&#34;https://recursion.wtf/posts/shadow_queen/&#34;&gt;used a jailbroken Gemini to direct Opus 4.6&lt;/a&gt;, Gemini happily walked that second LLM through a series of dual-use prompts designed to produce weaponizable drone control code under the cover story of rocket recovery.&lt;/p&gt;&#xA;&lt;p&gt;I reported this to Google eight days ago via a contact at DeepMind, sharing the full unredacted jailbreak payload and logs. They confirmed receipt and routed it to their red team. They&amp;rsquo;ve since patched the glaring hole: another researcher who independently reproduced the technique after reading my initial post has confirmed that his variant no longer works.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Just-in-Time Ontological Reframing: Teaching Gemini to Route Around Its Own Safety Infrastructure</title>
      <link>https://recursion.wtf/posts/jit_ontological_reframing/</link>
      <pubDate>Mon, 09 Feb 2026 12:00:00 -0800</pubDate>
      <guid>https://recursion.wtf/posts/jit_ontological_reframing/</guid>
      <description>For any given AI system, there is a set of euphemisms and dual-use framings that will allow it to construct nearly any output. This jailbreak teaches Gemini 3 Pro to construct and step into such framings on the fly.</description>
    </item>
  </channel>
</rss>
