My Latest Blog Posts

Dive into my thoughts on coding, tech trends, and developer life. Explore my latest posts below.

Fix GPT-2 Attention Scaling Ignored in SDPA/FlashAttention

A silent bug in Hugging Face Transformers caused GPT-2's attention-scaling configuration to be ignored when using the SDPA or FlashAttention backends. Here's how I traced, fixed, and tested it through three rounds of maintainer review.

Tags: Transformers, GPT-2, Attention, SDPA, FlashAttention, Python
March 4, 2026