AI Summary
Tianzhu Ye, Li Dong, Yutao Sun, Furu Wei Github Link Notion Link (for better readability) We introduce Differential Transformer V2 (DIFF V2), an improved version of Differential Transformer (DIFF V1). This revision focuses on inference efficiency, training stability for production-level LLMs, and ar