This article only introduces how to use the PyTorch deep learning framework. It does not cover the complex mathematical principles behind the algorithms; readers interested in those are encouraged to consult the relevant papers and literature.
1. Basic tensor operations
1.1 Tensor dtypes
| Code | Meaning |
| --- | --- |
| float32 | 32-bit float |
| float | alias for float32 |
| float64 | 64-bit float |
| double | alias for float64 |
| float16 | 16-bit float |
| bfloat16 | wider range than float16 but lower precision |
| int8 | 8-bit int |
| int16 | 16-bit int |
| short | alias for int16 |
| int32 | 32-bit int |
| int | alias for int32 |
| int64 | 64-bit int |
| long | alias for int64 |
| complex32 | 32-bit complex |
| complex64 | 64-bit complex |
| cfloat | alias for complex64 |
| complex128 | 128-bit complex |
| cdouble | alias for complex128 |
1.2 Creating tensors (passing arguments by keyword is recommended)
Tensor-creation functions accept many parameters. To save space, each API is listed only once; overloaded variants are omitted.
1.2.1 Empty tensor (filled with uninitialized data)
API
@overload
def empty(size: Sequence[Union[_int, SymInt]], *, memory_format: Optional[memory_format]=None, out: Optional[Tensor]=None, dtype: Optional[_dtype]=None, layout: Optional[_layout]=None, device: Optional[Union[_device, str, None]]=None, pin_memory: Optional[_bool]=False, requires_grad: Optional[_bool]=False) -> Tensor: ...
size: the shape, e.g. [rows, columns] for a 2-D tensor
dtype (data type): the element type
device: the device (CPU/GPU) on which the tensor is placed
requires_grad: whether to enable automatic differentiation for this tensor; defaults to False
Example
gpu=torch.device("cuda")
empty_tensor=torch.empty(size=[3,4],device=gpu,requires_grad=True)
print(empty_tensor)
Output
tensor([[0., 0., 0., 0.],
[0., 0., 0., 0.],
[0., 0., 0., 0.]], device='cuda:0', requires_grad=True)
1.2.2 All-ones tensor
@overload
def ones(size: _size, *, names: Optional[Sequence[Union[str, ellipsis, None]]], dtype: Optional[_dtype]=None, layout: Optional[_layout]=None, device: Optional[Union[_device, str, None]]=None, pin_memory: Optional[_bool]=False, requires_grad: Optional[_bool]=False) -> Tensor: ...
size: the shape, e.g. [rows, columns] for a 2-D tensor
dtype (data type): the element type
device: the device (CPU/GPU) on which the tensor is placed
requires_grad: whether to enable automatic differentiation for this tensor; defaults to False
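Example (a minimal sketch; it reuses the gpu device defined above and assumes a CUDA device is available):
ones_tensor = torch.ones(size=[2, 3], device=gpu)
print(ones_tensor)
# tensor([[1., 1., 1.],
#         [1., 1., 1.]], device='cuda:0')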
1.2.3 All-zeros tensor
@overload
def zeros(size: _size, *, names: Optional[Sequence[Union[str, ellipsis, None]]], dtype: Optional[_dtype]=None, layout: Optional[_layout]=None, device: Optional[Union[_device, str, None]]=None, pin_memory: Optional[_bool]=False, requires_grad: Optional[_bool]=False) -> Tensor: ...
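Example (a minimal sketch on the CPU):
zeros_tensor = torch.zeros(size=[2, 3], dtype=torch.int64)
print(zeros_tensor)
# tensor([[0, 0, 0],
#         [0, 0, 0]])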
1.2.4 Tensor with random values in [0, 1)
@overload
def rand(size: _size, *, generator: Optional[Generator], names: Optional[Sequence[Union[str, ellipsis, None]]], dtype: Optional[_dtype]=None, layout: Optional[_layout]=None, device: Optional[Union[_device, str, None]]=None, pin_memory: Optional[_bool]=False, requires_grad: Optional[_bool]=False) -> Tensor: ...
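Example (a minimal sketch; the values are random, so the output below is only one possibility):
rand_tensor = torch.rand(size=[2, 3])
print(rand_tensor)
# e.g. tensor([[0.4963, 0.7682, 0.0885],
#              [0.1320, 0.3074, 0.6341]])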
1.2.5 Tensor of random integers with given lower and upper bounds
API
@overload
def randint(low: _int, high: _int, size: _size, *, generator: Optional[Generator]=None, dtype: Optional[_dtype]=None, device: Device=None, requires_grad: _bool=False) -> Tensor: ...
Example
int_tensor=torch.randint(low=0,high=20,size=[5,6],device=gpu)
print(int_tensor)
Output
tensor([[18, 0, 14, 7, 18, 14],
[17, 0, 2, 0, 0, 3],
[16, 17, 5, 15, 1, 14],
[ 7, 12, 8, 6, 4, 11],
[12, 4, 7, 5, 3, 3]], device='cuda:0')
1.2.6 Tensor of random values with mean 0 and variance 1
@overload
def randn(size: _size, *, generator: Optional[Generator], names: Optional[Sequence[Union[str, ellipsis, None]]], dtype: Optional[_dtype]=None, layout: Optional[_layout]=None, device: Optional[Union[_device, str, None]]=None, pin_memory: Optional[_bool]=False, requires_grad: Optional[_bool]=False) -> Tensor: ...
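Example (a minimal sketch; samples are drawn from the standard normal distribution, so your values will differ):
randn_tensor = torch.randn(size=[1000])
print(randn_tensor.mean(), randn_tensor.std())  # close to 0 and 1 respectively for a large sample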
1.2.7 Creating a tensor from a list or NumPy array
def tensor(data: Any, dtype: Optional[_dtype]=None, device: Device=None, requires_grad: _bool=False) -> Tensor: ...
- If you use `torch.from_numpy()`, the returned tensor shares memory with the ndarray.
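A minimal sketch contrasting the two: torch.tensor() copies the data, while torch.from_numpy() shares it:
import numpy as np

arr = np.array([1.0, 2.0, 3.0])
copied = torch.tensor(arr)      # independent copy of the data
shared = torch.from_numpy(arr)  # shares memory with arr
arr[0] = 100.0
print(copied)  # tensor([1., 2., 3.], dtype=torch.float64) -- unchanged
print(shared)  # tensor([100., 2., 3.], dtype=torch.float64) -- changed along with arr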
1.3 Common tensor member functions and member variables
1.3.1 Converting to a NumPy array
def numpy(self,*args, **kwargs): # real signature unknown; NOTE: unreliably restored from __doc__
pass
- Only a tensor on the CPU can be converted to a NumPy array
- A tensor whose requires_grad attribute is True cannot be converted to a NumPy array (see the sketch below)
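A minimal sketch that satisfies both restrictions by detaching the tensor from the graph and moving it to the CPU first (it reuses the gpu device defined above):
t = torch.ones(2, 2, device=gpu, requires_grad=True)
# t.numpy() would raise an error here: t is on CUDA and requires grad
arr = t.detach().cpu().numpy()
print(arr)
# [[1. 1.]
#  [1. 1.]]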
1.3.2 Getting the value of a single-element tensor: item
def item(self): # real signature unknown; restored from __doc__
...
- If the tensor holds exactly one element, returns its value as a Python number
- If the tensor holds more than one element, raises a ValueError (see the sketch below)
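A minimal sketch:
scalar = torch.tensor([42])
print(scalar.item())  # 42
multi = torch.tensor([1, 2])
# multi.item() raises a ValueError: only one-element tensors can be converted to Python scalars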
1.3.3 Getting the number of dimensions
def dim(self): #real signature unknown; restored from __doc__
return 0
- Returns an int giving the number of dimensions, as the sketch below shows
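A minimal sketch:
x = torch.zeros(2, 3, 4)
print(x.dim())  # 3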
1.3.4 Getting the data type
dtype = property(lambda self: object(), lambda self, v: None, lambda self: None) # default
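A minimal sketch (float32 is the default floating-point dtype):
x = torch.zeros(2, 3)
print(x.dtype)  # torch.float32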
1.3.5 Getting the shape
def size(self,dim=None): # real signature unknown; restored from __doc__
pass
- Using `.shape` gives the same result (see the sketch below)
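A minimal sketch:
x = torch.zeros(2, 3)
print(x.size())   # torch.Size([2, 3])
print(x.shape)    # torch.Size([2, 3]), same result
print(x.size(0))  # 2, the size of dimension 0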
1.3.6 Shallow copy and deep copy
Shallow copy with the detach function
Suppose we have model A and model B, and the output of A must be fed to B as input, but during training we only want to train model B. We can do this:
input_B = output_A.detach()
This severs the gradient flow between the two computation graphs, which achieves exactly what we need.
`detach` returns a new tensor that shares the same data memory as the original but takes no part in gradient computation, i.e. requires_grad=False. Modifying the values of one tensor changes the other as well, because they share the same block of memory.
import numpy as np

sequence_tensor = torch.tensor(np.array([[[1, 2, 3],
                                          [4, 5, 6]],
                                         [[9, 8, 7],
                                          [6, 5, 4]]]),
                               dtype=torch.float, device=gpu)
sequence_tensor_shallowCp=sequence_tensor.detach()
sequence_tensor_shallowCp+=1
print(sequence_tensor)
print(sequence_tensor_shallowCp.requires_grad)
Output
tensor([[[ 2., 3., 4.],
[ 5., 6., 7.]],
[[10., 9., 8.],
[ 7., 6., 5.]]], device='cuda:0')
False
Deep copy
- Method 1: `.clone().detach()` (demonstrated below)
- Method 2: `.new_tensor()`
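A minimal sketch continuing the sequence_tensor example above: the deep copy no longer shares memory, so modifying it leaves the original untouched:
sequence_tensor_deepCp = sequence_tensor.clone().detach()
sequence_tensor_deepCp += 1
print(sequence_tensor[0, 0, 0])         # tensor(2., device='cuda:0'), unchanged
print(sequence_tensor_deepCp[0, 0, 0])  # tensor(3., device='cuda:0')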
1.3.7 Shape transformations
Transpose
Transposing a vector or matrix
def t(self): # real signature unknown; restored from __doc__
"""
t() -> Tensor
See :func:`torch.t`
"""
return _te.Tensor(*(), **{})
- The return value shares memory with the original tensor!
Transposing two specified dimensions (note that this is `transpose`, not `permute`):
def transpose(self, dim0: _int, dim1: _int) -> Tensor:
    r"""
    transpose(dim0, dim1) -> Tensor
    See :func:`torch.transpose`
    """
    ...
- The return value shares memory with the original tensor!
- For a matrix, `.t()` is equivalent to `.transpose(0, 1)`, as the sketch below shows
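A minimal sketch:
m = torch.tensor([[1, 2, 3],
                  [4, 5, 6]])
print(m.t().shape)              # torch.Size([3, 2])
print(m.transpose(0, 1).shape)  # torch.Size([3, 2]), same result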
Permuting multiple dimensions at once
def permute(self, *dims): # real signature unknown; restored from __doc__
"""
permute(*dims) -> Tensor
See :func:`torch.permute`
"""
return _te.Tensor(*(), **{})
- Put each dimension index in its target position. For a 3-D tensor, the x, y, z axes correspond to 0, 1, 2; to swap the x and z axes, pass 2, 1, 0.
- The return value shares memory with the original tensor! (See the sketch below.)
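A minimal sketch that swaps the first and last axes of a 3-D tensor and demonstrates the shared memory:
x = torch.zeros(2, 3, 4)
y = x.permute(2, 1, 0)  # shape becomes (4, 3, 2)
print(y.shape)          # torch.Size([4, 3, 2])
y[0, 0, 0] = 1.0
print(x[0, 0, 0])       # tensor(1.), the memory is shared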
cat (concatenation)
`cat` joins two or more tensors along a specified existing dimension. The number of dimensions stays the same: the size of the specified dimension grows, while the sizes of the other dimensions are unchanged. For example, two row vectors of shape=(3,) concatenated along dim=0 become one row vector of shape=(6,); two 3x3 square matrices concatenated along dim=0 become one (6, 3) matrix.
`cat` places a requirement on its inputs: apart from the specified dimension, all other dimensions must have the same size. For example, a shape=(1, 6) matrix can be concatenated with a shape=(2, 6) matrix along dim=0.
Examples can be found in the definition and docstring below.
def cat(tensors: Union[Tuple[Tensor, ...], List[Tensor]], dim: _int = 0, *, out: Optional[Tensor] = None) -> Tensor:
r"""
cat(tensors, dim=0, *, out=None) -> Tensor
Concatenates the given sequence of :attr:`seq` tensors in the given dimension.
All tensors must either have the same shape (except in the concatenating
dimension) or be a 1-D empty tensor with size ``(0,)``.
:func:`torch.cat` can be seen as an inverse operation for :func:`torch.split`
and :func:`torch.chunk`.
:func:`torch.cat` can be best understood via examples.
.. seealso::
:func:`torch.stack` concatenates the given sequence along a new dimension.
Args:
tensors (sequence of Tensors): any python sequence of tensors of the same type.
Non-empty tensors provided must have the same shape, except in the
cat dimension.
dim (int, optional): the dimension over which the tensors are concatenated
Keyword args:
out (Tensor, optional): the output tensor.
Example::
>>> x = torch.randn(2, 3)
>>> x
tensor([[ 0.6580, -1.0969, -0.4614],
[-0.1034, -0.5790, 0.1497]])
>>> torch.cat((x, x, x), 0)
tensor([[ 0.6580, -1.0969, -0.4614],
[-0.1034, -0.5790, 0.1497],
[ 0.6580, -1.0969, -0.4614],
[-0.1034, -0.5790, 0.1497],
[ 0.6580, -1.0969, -0.4614],
[-0.1034, -0.5790, 0.1497]])
>>> torch.cat((x, x, x), 1)
tensor([[ 0.6580, -1.0969, -0.4614, 0.6580, -1.0969, -0.4614, 0.6580,
-1.0969, -0.4614],
[-0.1034, -0.5790, 0.1497, -0.1034, -0.5790, 0.1497, -0.1034,
-0.5790, 0.1497]])
"""
...
- The return value does not share memory with the original tensors!
stack (stacking)
`stack` is very different from `cat`: it joins two or more tensors along a brand-new dimension created at `dim`. The sizes of the existing dimensions are unchanged, and the size of the new dimension equals the number of tensors used in the stack. For example, 3 row vectors of shape=(3,) stacked along dim=0 become a shape=(3, 3) matrix; two 3x3 square matrices stacked along dim=-1 become a (3, 3, 2) tensor.
def stack(tensors: Union[Tuple[Tensor, ...], List[Tensor]], dim: _int = 0, *, out: Optional[Tensor] = None) -> Tensor:
r"""
stack(tensors, dim=0, *, out=None) -> Tensor
Concatenates a sequence of tensors along a new dimension.
All tensors need to be of the same size.
.. seealso::
:func:`torch.cat` concatenates the given sequence along an existing dimension.
Arguments:
tensors (sequence of Tensors): sequence of tensors to concatenate
dim (int, optional): dimension to insert. Has to be between 0 and the number
of dimensions of concatenated tensors (inclusive). Default: 0
Keyword args:
out (Tensor, optional): the output tensor.
Example::
>>> x = torch.randn(2, 3)
>>> x
tensor([[ 0.3367, 0.1288, 0.2345],
[ 0.2303, -1.1229, -0.1863]])
>>> x = torch.stack((x, x)) # same as torch.stack((x, x), dim=0)
>>> x
tensor([[[ 0.3367, 0.1288, 0.2345],
[ 0.2303, -1.1229, -0.1863]],
[[ 0.3367, 0.1288, 0.2345],
[ 0.2303, -1.1229, -0.1863]]])
>>> x.size()
torch.Size([2, 2, 3])
>>> x = torch.stack((x, x), dim=1)
tensor([[[ 0.3367, 0.1288, 0.2345],
[ 0.3367, 0.1288, 0.2345]],
[[ 0.2303, -1.1229, -0.1863],
[ 0.2303, -1.1229, -0.1863]]])
>>> x = torch.stack((x, x), dim=2)
tensor([[[ 0.3367, 0.3367],
[ 0.1288, 0.1288],
[ 0.2345, 0.2345]],
[[ 0.2303, 0.2303],
[-1.1229, -1.1229],
[-0.1863, -0.1863]]])
>>> x = torch.stack((x, x), dim=-1)
tensor([[[ 0.3367, 0.3367],
[ 0.1288, 0.1288],
[ 0.2345, 0.2345]],
[[ 0.2303, 0.2303],
[-1.1229, -1.1229],
[-0.1863, -0.1863]]])
"""
...
- The return value does not share memory with the original tensors!
view (changing the shape)
`view` conceptually flattens the data into a one-dimensional array and then reinterprets it with the specified shape. The number of elements does not change, so the product of the sizes in the new shape must equal that of the old shape. Detailed examples follow:
def view(self, *shape): # real signature unknown; restored from __doc__
"""
Example::
>>> x = torch.randn(4, 4)
>>> x.size()
torch.Size([4, 4])
>>> y = x.view(16)
>>> y.size()
torch.Size([16])
>>> z = x.view(-1, 8) # the size -1 is inferred from other dimensions
>>> z.size()
torch.Size([2, 8])
>>> a = torch.randn(1, 2, 3, 4)
>>> a.size()
torch.Size([1, 2, 3, 4])
>>> b = a.transpose(1, 2) # Swaps 2nd and 3rd dimension
>>> b.size()
torch.Size([1, 3, 2, 4])
>>> c = a.view(1, 3, 2, 4) # Does not change tensor layout in memory
>>> c.size()
torch.Size([1, 3, 2, 4])
>>> torch.equal(b, c)
False
"""
return _te.Tensor(*(), **{})
- The return value shares memory with the original tensor
reshape (changing the shape)
The differences between `reshape` and `view` are as follows:
- `view` can only reshape contiguous tensors (see `.contiguous()`). If the tensor has gone through operations such as `permute` or `transpose`, it may no longer be contiguous in memory, and calling `view` on it then raises an error. The result of `view` shares memory with the original tensor.
- `reshape` automatically checks whether the original tensor is contiguous. If it is, `reshape` is equivalent to `view`; if not, it first calls `.contiguous()` and then `view`, and in that case the return value does not share memory with the original tensor. A sketch of this difference follows the signature below.
def reshape(self, shape: Sequence[Union[_int, SymInt]]) -> Tensor:
...
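A minimal sketch of the contiguity difference described above:
x = torch.randn(2, 3)
t = x.permute(1, 0)       # a non-contiguous view
print(t.is_contiguous())  # False
# t.view(6) would raise a RuntimeError here
y = t.reshape(6)          # works: reshape copies via .contiguous() when needed
print(y.shape)            # torch.Size([6])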
1.3.8 Mathematical operations
def mean(self, dim=None, keepdim=False, *args, **kwargs): # real signature unknown; NOTE: unreliably restored from __doc__
...
def sum(self, dim=None, keepdim=False, dtype=None): # real signature unknown; restored from __doc__
...
def median(self, dim=None, keepdim=False): # real signature unknown; restored from __doc__
...
def mode(self, dim=None, keepdim=False): # real signature unknown; restored from __doc__
...
def dist(self, other, p=2): # real signature unknown; restored from __doc__
...
def std(self, dim, unbiased=True, keepdim=False): # real signature unknown; restored from __doc__
...
def var(self, dim, unbiased=True, keepdim=False): # real signature unknown; restored from __doc__
...
def cumsum(self, dim, dtype=None): # real signature unknown; restored from __doc__
...
def cumprod(self, dim, dtype=None): # real signature unknown; restored from __doc__
...
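A minimal sketch of a few of these reductions, showing the effect of dim and keepdim:
x = torch.tensor([[1.0, 2.0],
                  [3.0, 4.0]])
print(x.sum())                      # tensor(10.)
print(x.mean(dim=0))                # tensor([2., 3.]), column means
print(x.mean(dim=1, keepdim=True))  # tensor([[1.5000], [3.5000]])
print(x.cumsum(dim=1))              # tensor([[1., 3.], [3., 7.]])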
1.3.9 Computing a tensor on a specified device
`to` can move a tensor onto a specified device.
def to(self, *args, **kwargs): # real signature unknown; restored from __doc__
"""
Example::
>>> tensor = torch.randn(2, 2) # Initially dtype=float32, device=cpu
>>> tensor.to(torch.float64)
tensor([[-0.5044, 0.0005],
[ 0.3310, -0.0584]], dtype=torch.float64)
>>> cuda0 = torch.device('cuda:0')
>>> tensor.to(cuda0)
tensor([[-0.5044, 0.0005],
[ 0.3310, -0.0584]], device='cuda:0')
>>> tensor.to(cuda0, dtype=torch.float64)
tensor([[-0.5044, 0.0005],
[ 0.3310, -0.0584]], dtype=torch.float64, device='cuda:0')
>>> other = torch.randn((), dtype=torch.float64, device=cuda0)
>>> tensor.to(other, non_blocking=True)
tensor([[-0.5044, 0.0005],
[ 0.3310, -0.0584]], dtype=torch.float64, device='cuda:0')
"""
return _te.Tensor(*(), **{})