Profiling in PyTorch (Part 2): From nn.Linear to a Fused MLP

This PyTorch profiling deep dive is highly technical and immediately actionable for ML engineers.

General huggingface.co

Summary

The second part of the PyTorch profiling series examines the nn.Linear forward pass on an NVIDIA A100. It shows that the bias addition is fused into the matrix multiplication kernel as an epilogue (aten::addmm), and that the transpose (aten::t) only rewrites tensor metadata without launching a GPU kernel. Stacking three such layers with ReLU forms an MLP; torch.compile can fuse the entire MLP into a single kernel, sharply reducing CPU dispatch overhead.
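
The two nn.Linear observations can be checked with a few lines of PyTorch. What follows is a minimal sketch, not the article's own code: the layer sizes, the variable names, and the choice to run on CPU (so no A100 is needed) are all assumptions made for illustration. On CPU the "no GPU launch" claim is only visible indirectly, as a transposed view that shares storage with the original weight.

```python
import torch
from torch.profiler import profile, ProfilerActivity

torch.manual_seed(0)

# Hypothetical sizes chosen for illustration; the article profiles on an A100.
layer = torch.nn.Linear(8, 4)
x = torch.randn(2, 8)

# Transposing the weight returns a view: same storage, swapped strides,
# so no data is copied and no kernel is needed for the transpose itself.
w_t = layer.weight.t()
shares_storage = w_t.data_ptr() == layer.weight.data_ptr()
strides = (layer.weight.stride(), w_t.stride())

# addmm computes bias + x @ W^T as a single op, which is how the bias
# addition ends up folded into the matrix multiplication.
fused = torch.addmm(layer.bias, x, w_t)
matches = torch.allclose(fused, layer(x), atol=1e-6)

# Profile one forward pass and collect the names of the recorded ops.
with profile(activities=[ProfilerActivity.CPU]) as prof:
    layer(x)
op_names = {evt.key for evt in prof.key_averages()}

print("transpose shares storage:", shares_storage)
print("strides (weight, weight.t()):", strides)
print("addmm matches nn.Linear:", matches)
print(sorted(op_names))
```

The printed op names depend on the PyTorch version and backend, so they are best read as a starting point for comparison with the article's traces rather than as a fixed expected output.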

Author

Aritra Roy Gosthipaty