Megatron (1, 2, and 3) is a large, powerful transformer developed by the Applied Deep Learning Research team at NVIDIA. This repository is for ongoing research on training large transformer language models at scale. We developed efficient, model-parallel (tensor, sequence, and pipeline), and multi-node pre-training of transformer-based models such as GPT, BERT, and T5 using mixed precision.
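As a rough illustration of how these features are typically enabled, the sketch below launches GPT pre-training on a single 8-GPU node with tensor, pipeline, and sequence parallelism plus FP16 mixed precision. The entry point (`pretrain_gpt.py`), flag names, and the model/dataset settings are assumptions based on this repository's usual command-line arguments; the values and paths are placeholders, not a recommended configuration.

```bash
# Illustrative single-node launch: 8 GPUs split into
# tensor-parallel groups of 2 and pipeline-parallel groups of 2.
# All sizes, paths, and hyperparameters below are placeholders.
torchrun --nproc_per_node=8 pretrain_gpt.py \
    --tensor-model-parallel-size 2 \
    --pipeline-model-parallel-size 2 \
    --sequence-parallel \
    --num-layers 24 \
    --hidden-size 1024 \
    --num-attention-heads 16 \
    --seq-length 1024 \
    --max-position-embeddings 1024 \
    --micro-batch-size 4 \
    --global-batch-size 64 \
    --lr 1.5e-4 \
    --train-iters 500000 \
    --tokenizer-type GPT2BPETokenizer \
    --vocab-file <path-to-gpt2-vocab.json> \
    --merge-file <path-to-gpt2-merges.txt> \
    --data-path <path-to-preprocessed-data> \
    --fp16
```

Multi-node runs follow the same pattern, with the launcher pointed at multiple hosts and the parallel sizes scaled to the total GPU count.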
Below are some of the projects where we have directly used Megatron: