TL;DR

ComfyUI-Flux2Klein-Enhancer is a free, MIT-licensed ComfyUI custom-node suite by capitan01R that adds conditioning enhancement and reference-latent control to FLUX.2 Klein 9B. Built from empirical analysis and forward-pass tracing of Klein's dual-stream architecture, it solves the model's biggest weakness: inconsistent subject and object preservation during image edits. Current version: v2.4.0.

What's new

FLUX.2 Klein 9B (released 2026-01-15 by Black Forest Labs) is a step-distilled rectified flow transformer that generates in under half a second while matching or beating models 5x its size. Powerful, but with one frustrating gap: when you edit an image, it sometimes nails the subject and sometimes completely ignores what you asked it to preserve. There was no native knob to control this.

capitan01R attached forward-pass hooks to the running model, mapped every tensor shape, and confirmed a striking fact: the text conditioning and the reference latent travel through completely separate streams until the final blocks. That insight made independent control possible, and the GitHub repo now ships the tooling to act on it.

Why it matters

Most ComfyUI conditioning hacks operate at the tokenizer or text-embedding level. Flux2Klein-Enhancer operates deeper — directly on Klein's native [1, 512, 12288] conditioning tensor and on the image stream's reference-latent tensor, independently. That turns FLUX.2 Klein from a fast-but-moody editor into a predictable one: character identities hold across edits, masked regions stay locked to the source, and complex prompts stop bleeding concepts together.
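To make that distinction concrete, here is a minimal, hypothetical sketch of what "operating directly on the conditioning tensor" means: scale only the active token slice of a [batch, 512, 12288] tensor and leave the padding untouched. Plain Python lists stand in for torch tensors; scale_active and its exact behavior are illustrative assumptions, not the repo's code.

```python
# Illustrative sketch (NOT the repo's implementation): amplify only the
# active token positions of a conditioning tensor shaped [batch, seq, dim].
# Padding positions (index >= active_end) pass through unchanged.
def scale_active(cond, active_end, magnitude):
    return [
        [[v * magnitude for v in tok] if i < active_end else list(tok)
         for i, tok in enumerate(seq)]
        for seq in cond
    ]

# Toy example: batch=1, seq=3, dim=2, with 2 active tokens.
cond = [[[1.0, 2.0], [3.0, 4.0], [0.1, 0.1]]]
out = scale_active(cond, active_end=2, magnitude=2.0)
```

A tokenizer-level hack could not express this: it would change which tokens exist, not how strongly each embedding position is weighted downstream.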

It also shows a broader pattern worth paying attention to: empirical reverse-engineering of open-weight SOTA models is becoming a shipping-quality developer activity, not just a research curiosity.

Technical facts

Verified through diagnostic analysis and model hook tracing:

  • Base model: FLUX.2 Klein 9B (9B-param rectified flow transformer)
  • Text encoder: Qwen3 8B (4096 hidden dim, 36 layers)
  • Conditioning tensor: [batch, 512, 12288]
  • Active region: positions 0–77 (std ~40); the rest is padding (std ~2); ~67 active tokens auto-detected from the attention mask
  • Reference latent: [batch, 128, H, W], stored in metadata, NOT merged into text conditioning
  • Architecture: 8 double_blocks (separate streams) + 24 single_blocks (concatenated streams)
  • Inference: step-distilled, 4 steps, no CFG, <0.5 s on modern hardware
  • VRAM: ~29 GB (RTX 4090 or better)
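
The active-region detection above rests on a simple signal: active token positions have a much higher per-token standard deviation (~40) than padding (~2). A hedged sketch of that detection in plain Python (the real diagnostics run on torch tensors; the 10.0 cutoff is an illustrative assumption):

```python
import math

# Hedged sketch: locate where active prompt tokens end by comparing each
# token's std against a threshold sitting between the active (~40) and
# padding (~2) regimes. Threshold value is an assumption.
def detect_active_end(seq, threshold=10.0):
    def std(vec):
        m = sum(vec) / len(vec)
        return math.sqrt(sum((v - m) ** 2 for v in vec) / len(vec))

    active_end = 0
    for i, tok in enumerate(seq):
        if std(tok) > threshold:
            active_end = i + 1
    return active_end

# Toy sequence: two "active" tokens with large spread, then flat padding.
seq = [[-40.0, 40.0], [-35.0, 35.0], [-2.0, 2.0], [-2.0, 2.0]]
```

With real tensors the same idea reduces to one `std(dim=-1)` call followed by a threshold comparison.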

With a reference latent attached, img_in receives [1, 8140, 128] (reference + noisy concatenated). Without one, [1, 4070, 128]. The enhancer modifies the txt_stream input on one side and half of the img_stream input (the reference portion) on the other — two independent control surfaces, combinable.
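
The sequence-length bookkeeping can be sketched in a few lines; treat the "reference tokens first" concatenation order as an assumption from the shapes above, not a verified detail:

```python
# Hedged sketch: with a reference latent attached, its tokens are
# concatenated with the noisy-image tokens before entering img_in;
# without one, only the noisy tokens remain.
def img_in_seq_len(noisy_tokens, ref_tokens=0):
    return ref_tokens + noisy_tokens

# Numbers from the traced shapes: 4070 noisy tokens, 4070 reference tokens.
with_ref = img_in_seq_len(4070, ref_tokens=4070)
without_ref = img_in_seq_len(4070)
```

The doubled sequence length is also why the reference portion is addressable on its own: it occupies a contiguous, known slice of the img_stream input.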

The nodes

  • FLUX.2 Klein Enhancer — general-purpose text-conditioning control. magnitude (0–3, scale prompt strength), contrast (−1 to 2, sharpen concept separation), normalize_strength (equalize token weight), edit_text_weight (image-edit-only; <1 preserves more source), plus active_end_override and low_vram.
  • FLUX.2 Klein Detail Controller — regional emphasis. front_mult (first 25% of active tokens, usually the subject), mid_mult (middle 50%, usually details), end_mult (last 25%, usually style terms), plus an arbitrary emphasis_start/end/mult window.
  • FLUX.2 Klein Ref Latent Controller — direct control over the reference-image stream. strength 0 acts like txt2img, 1 is normal, 2+ locks harder to the source. blend_with_noise degrades reference to loosen structure. spatial_fade (center_out, edges_out, top_down, left_right) applies a gradient so different regions have different strength.
  • Text/Ref Balance — single slider: 0 = reference dominates, 0.5 = balanced, 1 = prompt dominates.
  • Identity Guidance & Identity Feature Transfer — stackable nodes that pull generations toward a reference latent inside the sampling loop and at the attention level.
  • Mask Ref Controller (BETA) — mask-guided spatial control: masked area freed for prompt, unmasked area stays true to source.
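
To picture what spatial_fade produces, here is a hypothetical sketch of how such a per-position strength map could be built. The mode semantics are guessed from the option names, and plain Python lists stand in for tensors; this is not the node's actual code.

```python
# Hypothetical sketch of spatial_fade-style gradients: returns an h x w grid
# of reference strengths in [0, 1]. Mode semantics are assumptions.
def fade_map(h, w, mode="center_out"):
    grid = []
    for y in range(h):
        row = []
        for x in range(w):
            if mode == "top_down":        # strong at top, fading downward
                t = 1.0 - y / max(h - 1, 1)
            elif mode == "left_right":    # strong at left, fading rightward
                t = 1.0 - x / max(w - 1, 1)
            else:                          # center_out / edges_out
                dy = abs(2 * y - (h - 1)) / max(h - 1, 1)
                dx = abs(2 * x - (w - 1)) / max(w - 1, 1)
                edge_dist = max(dy, dx)    # 0 at center, 1 at the edges
                t = 1.0 - edge_dist if mode == "center_out" else edge_dist
            row.append(t)
        grid.append(row)
    return grid

m = fade_map(3, 3, mode="center_out")
```

Multiplying such a map into the per-position reference strength is what lets one edit hold the center of an image while freeing its borders (or vice versa).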

Use cases

Identity preservation across edits. Set preservation mode to dampen at 1.20–1.30 for reliable subject lock. For extreme precision, stack Identity Guidance (strength 0.3, start 0.0, end 0.8) with Identity Feature Transfer (per-block blend 0.10–0.20) — they operate at different stages and don't interfere.

Targeted mask-guided swaps. Route a ComfyUI mask into Mask Ref Controller; the masked area accepts the new prompt, the unmasked area keeps its reference structure. Raise feather to soften seams.

Concept separation in complex prompts. Raise contrast to a positive value in the main Enhancer to prevent (e.g.) background color bleeding into the subject.
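
One plausible reading of what a contrast control does (an assumption for illustration, not the repo's verified math): push each active token embedding away from the mean embedding, so distinct concepts occupy more separated directions.

```python
# Hedged sketch: a contrast-style operation that pushes token embeddings
# away from their mean. contrast=0 is the identity; higher values separate
# concepts more aggressively.
def apply_contrast(tokens, contrast):
    dim = len(tokens[0])
    mean = [sum(t[d] for t in tokens) / len(tokens) for d in range(dim)]
    return [
        [mean[d] + (t[d] - mean[d]) * (1.0 + contrast) for d in range(dim)]
        for t in tokens
    ]

toks = [[0.0, 2.0], [2.0, 0.0]]
sep = apply_contrast(toks, contrast=1.0)  # doubles each token's distance from the mean
```

Under this reading, negative contrast values would pull tokens toward the mean, blending concepts rather than separating them.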

Quick sanity check on any workflow. Run three generations at the same seed, same prompt, with Ref Latent Controller strength at 0, 1, 2. If 0 looks like txt2img and 2 locks hard to the source, it's wired correctly.

Limitations & pricing

Mask Ref Controller is still BETA — results are promising but vary with prompt and image complexity. It and Identity Feature Transfer require image-edit mode (a reference latent must be present); they won't run on pure text-to-image. The FLUX.2 Klein 4B variant isn't supported yet — the author wants to "first get a full grip on the 9B" before tackling the 4B's different architecture. Early versions had a silent mean-recentering bug that neutralized the enhancer's own changes; v2.x removed it and renamed parameters, so pre-v2 workflows will break on upgrade.

The enhancer itself is free under the MIT license. The underlying FLUX.2 Klein 9B model sits under the FLUX Non-Commercial License, so commercial deployments still need a BFL agreement.

What's next

The January 30, 2026 release was labeled "Possibly Final" for the 9B branch. The roadmap hint is eventual 4B support, but no date. Install via ComfyUI Manager (search ComfyUI-Flux2Klein-Enhancer) or clone the GitHub repo into your custom_nodes folder. Example workflows and images are in the example_workflow/ directory.

Sources: capitan01R/ComfyUI-Flux2Klein-Enhancer, Black Forest Labs, Hugging Face, RunComfy guide.