attention

Explore posts tagged with "attention" below.

Fix GPT-2 Attention Scaling Ignored in SDPA/FlashAttention

A silent bug in Hugging Face Transformers caused GPT-2's attention-scaling configuration to be ignored when the SDPA or FlashAttention backend was used. Here's how I traced, fixed, and tested it through three rounds of maintainer review.

Transformers · GPT-2 · Attention · SDPA · FlashAttention · Python
March 4, 2026