Model | Parameters | Model Size (PyTorch) | Training Data | Token Length | Architecture | Training Hardware | Training Time | Release Date | Links | Organization
--- | --- | --- | --- | --- | --- | --- | --- | --- | --- | ---
GPT-2 | small: 124M, medium: 355M, large: 774M, XL: 1.5B | small: 548MB, medium: 1.52GB, large: 3.25GB, XL: 6.43GB | 8M web pages, 40GB of web text, 45M Reddit links, data up to 2017, vocab size 50,257 | 1024 | small: 12 layers, medium: 24 layers, large: 36 layers, XL: 48 layers; batch size: 512 | 256 TPU v3 cores | unknown | 2019 | paper, github, huggingface | OpenAI
GPT | 117M | 479MB | 7,000 books, 5GB, vocab size 40,000 | 512 | 12-layer decoder, 768 hidden size, 12 attention heads; batch size: 64 | 8 P600 GPUs | one month; 0.96 petaflop-days; 100 epochs | 2018 | paper, huggingface | OpenAI
GPT-3 | 175B | | 45TB | 2048 | 96 layers; batch size: 3.2M | | | 2020 | | OpenAI
T5 | small: 60M, base: 220M, large: 770M, T5-3B: 3B, T5-11B: 11B | small: 242MB, base: 892MB, large: 2.95GB, T5-3B: 11.4GB, T5-11B: 45.2GB | 750GB dataset | | encoder: 12 layers, decoder: 12 layers, 1024 hidden size | 1024 TPU v3 | | | | Google
BERT | 340M | | | | | | | | | Google AI
Turing-NLG | 17B | | | | | | | | | Microsoft Research
Megatron-LM | 8.3B | | | | | | | | | NVIDIA
Switch-Transformer | 1600B | | | | | | | | | Google Brain
OPT | 175B | | | | | 1000 A100 GPUs | 2 months | | | Meta
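The parameter counts in the table follow directly from the architecture columns. Below is a minimal sketch (the function name is hypothetical) that reproduces the GPT-2 small/medium figures, assuming a GPT-2-style decoder-only transformer with tied input/output embeddings, learned position embeddings, and a 4x MLP expansion:

```python
def decoder_param_count(n_layer: int, d_model: int,
                        vocab_size: int = 50257, n_ctx: int = 1024) -> int:
    """Rough parameter count for a GPT-2-style decoder-only transformer.

    Assumes tied input/output embeddings, learned position embeddings,
    and a 4x MLP expansion (the GPT-2 configuration in the table above).
    """
    embed = vocab_size * d_model + n_ctx * d_model   # token + position embeddings
    attn = 4 * d_model * d_model + 4 * d_model       # QKV + output projections, with biases
    mlp = 8 * d_model * d_model + 5 * d_model        # two linear layers at 4x width
    ln = 4 * d_model                                 # two layer norms per block
    return embed + n_layer * (attn + mlp + ln) + 2 * d_model  # + final layer norm

print(decoder_param_count(12, 768))   # ~124M (GPT-2 small)
print(decoder_param_count(24, 1024))  # ~355M (GPT-2 medium)
```

At roughly 4 bytes per fp32 parameter, the same counts also approximate the checkpoint sizes in the third column (124M parameters ≈ 0.5GB).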
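The counts can also be checked empirically. Here is a minimal sketch using the Hugging Face `transformers` library; the model IDs below are the public GPT-2 checkpoints (note the XL download is ~6.4GB):

```python
from transformers import GPT2LMHeadModel

# Public Hugging Face checkpoints for the four GPT-2 sizes in the table.
for name in ["gpt2", "gpt2-medium", "gpt2-large", "gpt2-xl"]:
    model = GPT2LMHeadModel.from_pretrained(name)
    print(f"{name}: {model.num_parameters():,} parameters")
```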