0. 往期内容
[一]深度学习Pytorch-张量定义与张量创建
[二]深度学习Pytorch-张量的操作:拼接、切分、索引和变换
[三]深度学习Pytorch-张量数学运算
[四]深度学习Pytorch-线性回归
[五]深度学习Pytorch-计算图与动态图机制
[六]深度学习Pytorch-autograd与逻辑回归
[七]深度学习Pytorch-DataLoader与Dataset(含人民币二分类实战)
[八]深度学习Pytorch-图像预处理transforms
[九]深度学习Pytorch-transforms图像增强(剪裁、翻转、旋转)
[十]深度学习Pytorch-transforms图像操作及自定义方法
[十一]深度学习Pytorch-模型创建与nn.Module
[十二]深度学习Pytorch-模型容器与AlexNet构建
[十三]深度学习Pytorch-卷积层(1D/2D/3D卷积、卷积nn.Conv2d、转置卷积nn.ConvTranspose)
[十四]深度学习Pytorch-池化层、线性层、激活函数层
[十五]深度学习Pytorch-权值初始化
[十六]深度学习Pytorch-18种损失函数loss function
[十七]深度学习Pytorch-优化器Optimizer
[十八]深度学习Pytorch-学习率Learning Rate调整策略
[十九]深度学习Pytorch-可视化工具TensorBoard
[二十]深度学习Pytorch-Hook函数与CAM算法
[二十一]深度学习Pytorch-正则化Regularization之weight decay
[二十二]深度学习Pytorch-正则化Regularization之dropout
[二十三]深度学习Pytorch-批量归一化Batch Normalization
[二十四]深度学习Pytorch-BN、LN(Layer Normalization)、IN(Instance Normalization)、GN(Group Normalization)
[二十五]深度学习Pytorch-模型保存与加载
[二十六]深度学习Pytorch-模型微调Finetune
[二十七]深度学习Pytorch-GPU的使用
深度学习Pytorch-GPU的使用
1. CPU与GPU
代码示例:
cuda_methods.py
# -*- coding: utf-8 -*-
"""
# @file name : cuda_methods.py
# @author : TingsongYu https://github.com/TingsongYu
# @date : 2019-11-11
# @brief : methods for moving data to a CUDA device
"""
import torch
import torch.nn as nn

# Fall back to CPU so the demo also runs on machines without CUDA.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# ========================== tensor to cuda
# flag = 0
flag = 1
if flag:
    x_cpu = torch.ones((3, 3))
    print("x_cpu:\ndevice: {} is_cuda: {} id: {}".format(x_cpu.device, x_cpu.is_cuda, id(x_cpu)))

    # Tensor.to() returns a NEW tensor on the target device (note the different id).
    x_gpu = x_cpu.to(device)
    print("x_gpu:\ndevice: {} is_cuda: {} id: {}".format(x_gpu.device, x_gpu.is_cuda, id(x_gpu)))

    # Older style; .to(device) is the recommended device-agnostic API.
    # x_gpu = x_cpu.cuda()

# ========================== module to cuda
# flag = 0
flag = 1
if flag:
    net = nn.Sequential(nn.Linear(3, 3))
    print("\nid:{} is_cuda: {}".format(id(net), next(net.parameters()).is_cuda))

    # Module.to() moves parameters IN PLACE (same id before and after).
    net.to(device)
    print("\nid:{} is_cuda: {}".format(id(net), next(net.parameters()).is_cuda))

# ========================== forward in cuda
# flag = 0
flag = 1
if flag:
    # Input on GPU + model on GPU -> output on GPU.
    output = net(x_gpu)
    print("output is_cuda: {}".format(output.is_cuda))

    # Input on CPU while the model is on GPU raises a device-mismatch error.
    # output = net(x_cpu)

# ========================== inspect the current gpu index; try changing the visible / primary gpu
flag = 0
# flag = 1
if flag:
    current_device = torch.cuda.current_device()
    print("current_device: ", current_device)

    torch.cuda.set_device(0)
    current_device = torch.cuda.current_device()
    print("current_device: ", current_device)

    cap = torch.cuda.get_device_capability(device=None)
    print(cap)

    name = torch.cuda.get_device_name()
    print(name)

    is_available = torch.cuda.is_available()
    print(is_available)

    # ===================== seed
    # NOTE(review): the seed calls are kept inside this flag=0 section since they
    # require an initialized CUDA context — confirm against the original layout.
    seed = 2
    torch.cuda.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)

    current_seed = torch.cuda.initial_seed()
    print(current_seed)

    s = torch.cuda.seed()
    s_all = torch.cuda.seed_all()
2. torch.cuda常用方法
例如设置 CUDA_VISIBLE_DEVICES=2,3 时:2代表着逻辑gpu0是物理gpu2,3代表着逻辑gpu1是物理gpu3。
设置 CUDA_VISIBLE_DEVICES=0,3,2 时:0代表着逻辑gpu0是物理gpu0,3代表着逻辑gpu1是物理gpu3,2代表着逻辑gpu2是物理gpu2。
默认gpu0是主gpu。
代码示例:
# -*- coding: utf-8 -*-
"""Common torch.cuda queries: pick a device, count GPUs, read a device name."""
import os
import numpy as np
import torch

# ========================== select a gpu
# flag = 0
flag = 1
if flag:
    gpu_id = 0
    gpu_str = "cuda:{}".format(gpu_id)
    # Fall back to CPU so the demo also runs on machines without CUDA.
    device = torch.device(gpu_str if torch.cuda.is_available() else "cpu")

    x_cpu = torch.ones((3, 3))
    x_gpu = x_cpu.to(device)
    print("x_gpu:\ndevice: {} is_cuda: {} id: {}".format(x_gpu.device, x_gpu.is_cuda, id(x_gpu)))

# ========================== query gpu count / name
# flag = 0
flag = 1
if flag:
    device_count = torch.cuda.device_count()
    print("\ndevice_count: {}".format(device_count))

    # get_device_name(0) raises on CPU-only machines; guard on the count.
    if device_count > 0:
        device_name = torch.cuda.get_device_name(0)
        print("\ndevice_name: {}".format(device_name))
3. 多GPU并行计算
类比做作业:分发作业需3分钟,每人独立完成一本作业需60分钟,做完以后收作业需3分钟;多人并行完成多本作业,总耗时远小于一人依次完成,这正是多gpu并行计算的思路(分发-并行运算-结果回收)。
结果默认输出至主gpu上。
(1)1个gpu时:
代码示例:
# -*- coding: utf-8 -*-
"""Select GPUs manually, or pick the primary GPU automatically by free memory."""
import os
import numpy as np
import torch
import torch.nn as nn

# ============================ select gpus manually
# flag = 0
flag = 1
if flag:
    gpu_list = [0]
    gpu_list_str = ','.join(map(str, gpu_list))
    # setdefault: respect CUDA_VISIBLE_DEVICES if the user already exported it.
    os.environ.setdefault("CUDA_VISIBLE_DEVICES", gpu_list_str)
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# ============================ pick the primary gpu automatically by free memory
# flag = 0
flag = 1
if flag:
    def get_gpu_memory():
        """Return a list of free-memory values per GPU, or False on Windows.

        Parses the output of `nvidia-smi -q -d Memory`; not supported on Windows.
        """
        import platform
        if 'Windows' != platform.system():
            import os
            os.system('nvidia-smi -q -d Memory | grep -A4 GPU | grep Free > tmp.txt')
            memory_gpu = [int(x.split()[2]) for x in open('tmp.txt', 'r').readlines()]
            os.system('rm tmp.txt')
        else:
            memory_gpu = False
            print("显存计算功能暂不支持windows操作系统")
        return memory_gpu

    gpu_memory = get_gpu_memory()
    # BUG FIX: the original guard was `if not gpu_memory:`, which skipped the
    # reordering exactly when memory info WAS available, and crashed trying to
    # argsort False/[] when it wasn't.
    if gpu_memory:
        print("\ngpu free memory: {}".format(gpu_memory))
        # Sort GPU indices by free memory, largest first; the first entry in
        # CUDA_VISIBLE_DEVICES becomes logical gpu0, i.e. the primary GPU.
        gpu_list = np.argsort(gpu_memory)[::-1]
        gpu_list_str = ','.join(map(str, gpu_list))
        os.environ.setdefault("CUDA_VISIBLE_DEVICES", gpu_list_str)

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
class FooNet(nn.Module):
    """Toy MLP: `layers` square linear maps (no bias), each followed by ReLU.

    The forward pass prints the batch size it receives, so the input
    scattering performed by nn.DataParallel can be observed per replica.
    """

    def __init__(self, neural_num, layers=3):
        super(FooNet, self).__init__()
        stack = []
        for _ in range(layers):
            stack.append(nn.Linear(neural_num, neural_num, bias=False))
        self.linears = nn.ModuleList(stack)

    def forward(self, x):
        # Under DataParallel, the size printed here is the slice handed to
        # THIS replica, not the full batch size.
        print("\nbatch size in forward: {}".format(x.size()[0]))
        out = x
        for layer in self.linears:
            out = torch.relu(layer(out))
        return out
if __name__ == "__main__":
    batch_size = 16

    # data
    inputs = torch.randn(batch_size, 3)
    labels = torch.randn(batch_size, 3)
    inputs, labels = inputs.to(device), labels.to(device)

    # model
    net = FooNet(neural_num=3, layers=3)
    # DataParallel scatters each input batch across all visible GPUs and
    # gathers the outputs back onto the primary GPU (logical gpu0 by default).
    net = nn.DataParallel(net)
    net.to(device)

    # training
    for epoch in range(1):
        outputs = net(inputs)

        print("model outputs.size: {}".format(outputs.size()))

    print("CUDA_VISIBLE_DEVICES :{}".format(os.environ["CUDA_VISIBLE_DEVICES"]))
    print("device_count :{}".format(torch.cuda.device_count()))
batchsize in forward = 16/1=16。
(2)两个gpu时:
batchsize in forward = 16/2=8。
(3)4个gpu时:
batchsize in forward = 16/4=4。
np.argsort(gpu_memory) 返回按剩余显存从小到大排序后的 gpu 序号,[::-1] 将其倒序,使剩余显存最大的 gpu 排在 CUDA_VISIBLE_DEVICES 最前,成为逻辑gpu0,即主gpu。