What is PyTorch, and How Does It Work: All You Need to Know

Since its public release by the Facebook AI Research (FAIR) team in 2017, PyTorch has become a highly popular and efficient framework for building Deep Learning (DL) models. This open-source machine learning library is based on Torch and is designed to provide greater flexibility and speed when implementing deep neural networks. Today, PyTorch is among the most widely used libraries in AI (Artificial Intelligence) research.

What Is PyTorch, and How Does It Work?

PyTorch is an optimized Deep Learning tensor library based on Python and Torch, and it runs on both GPUs and CPUs. It is often favored over other Deep Learning frameworks such as TensorFlow and Keras because it uses dynamic computation graphs and feels Pythonic throughout. It allows scientists, developers, and anyone debugging a neural network to run and test portions of the code in real time. Users therefore don't have to wait for the entire program to run before checking whether one part of it works.

The two main features of PyTorch are:

  • Tensor Computation (similar to NumPy) with strong GPU (Graphics Processing Unit) acceleration support
  • Automatic Differentiation for creating and training deep neural networks

Basics of PyTorch

The basic PyTorch operations are very similar to NumPy. Let's understand the basics first.

  • Introduction to Tensors

In machine learning, data has to be represented numerically. A tensor is simply a container that can hold data in multiple dimensions. In mathematical terms, a tensor is a fundamental unit of data that serves as the foundation for more advanced mathematical operations. It can be a number, a vector, a matrix, or a multi-dimensional array, much like a NumPy array. Tensors can be stored on and processed by either the CPU or the GPU, and moving them to the GPU can make operations faster. There are various tensor types, such as Float Tensor, Double Tensor, Half Tensor, Int Tensor, and Long Tensor, but PyTorch uses the 32-bit Float Tensor as the default type for floating-point data.
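
As a rough illustration of the points above, the sketch below builds a scalar, a vector, and a matrix as tensors and inspects their dimensions and data types. The variable names are made up for this example, and the device check simply falls back to the CPU when no GPU is present.

```python
import torch

# A scalar, a vector, and a matrix, all stored as tensors.
scalar = torch.tensor(3.5)
vector = torch.tensor([1.0, 2.0, 3.0])
matrix = torch.tensor([[1, 2], [3, 4]])

# Number of dimensions of each tensor.
print(scalar.ndim, vector.ndim, matrix.ndim)

# Floating-point data defaults to 32-bit floats; Python integers become 64-bit ints.
print(vector.dtype)
print(matrix.dtype)

# Move a tensor to the GPU only if one is available; otherwise stay on the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
vector_on_device = vector.to(device)
print(vector_on_device.device)
```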

  • Mathematical Operations

The code for performing mathematical operations in PyTorch closely mirrors NumPy. Users initialize two tensors and then apply operations such as addition, subtraction, multiplication, and division to them.
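
A minimal sketch of these element-wise operations follows. The tensor values are arbitrary and chosen only so the results are easy to check by hand.

```python
import torch

# Two tensors of the same shape.
a = torch.tensor([4.0, 9.0, 16.0])
b = torch.tensor([2.0, 3.0, 4.0])

# The arithmetic operators work element by element, as in NumPy.
total = a + b
difference = a - b
product = a * b
quotient = a / b

print(total, difference, product, quotient)
```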

  • Matrix Initialization and Matrix Operations

To initialize a matrix with random numbers in PyTorch, use the function torch.randn(), which returns a tensor filled with random numbers drawn from a standard normal distribution. Setting the random seed at the beginning (for example, with torch.manual_seed()) makes the code generate the same numbers every time it runs. Basic matrix operations and the transpose operation in PyTorch are also similar to NumPy.
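
The sketch below shows one way this might look in practice. The seed value and matrix shape are arbitrary choices for the example.

```python
import torch

# Seed the generator, then draw a 2x3 matrix from a standard normal distribution.
torch.manual_seed(0)
m1 = torch.randn(2, 3)

# Re-seeding with the same value reproduces the same numbers.
torch.manual_seed(0)
m2 = torch.randn(2, 3)
print(torch.equal(m1, m2))

# Transpose swaps the two dimensions: (2, 3) becomes (3, 2).
m1_t = m1.t()

# Matrix multiplication: (2, 3) @ (3, 2) gives a (2, 2) result.
gram = m1 @ m1_t
print(m1_t.shape, gram.shape)
```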

Common PyTorch Modules

PyTorch groups its core functionality into a few packages. The three below handle differentiation, optimization, and building neural network models.

  • Autograd

The autograd module is PyTorch's automatic differentiation engine. It records the operations performed during the forward pass and then computes the gradients quickly during the backward pass. Autograd builds a directed acyclic graph in which the leaves are the input tensors and the roots are the output tensors.
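
As a small sketch of autograd at work, the example below differentiates a simple function of one variable. The function y = x² + 2x is picked only because its derivative, 2x + 2, is easy to verify by hand.

```python
import torch

# A leaf tensor: an input for which we ask autograd to track gradients.
x = torch.tensor(3.0, requires_grad=True)

# Forward pass: autograd records these operations in a graph.
y = x ** 2 + 2 * x

# Backward pass: walk the graph from the output back to the leaf.
y.backward()

# dy/dx = 2x + 2, which is 8 at x = 3.
print(x.grad)
```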

  • Optim

The optim module (torch.optim) is a package of pre-written optimization algorithms, such as SGD and Adam, that can be used to train neural networks.
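
To give a feel for how an optimizer is used, here is a minimal sketch that minimizes a one-parameter quadratic with SGD. The loss function, learning rate, and step count are assumptions made for illustration, not recommended settings.

```python
import torch

# A single trainable parameter, starting at 0.
w = torch.tensor(0.0, requires_grad=True)

# Stochastic gradient descent over that one parameter.
optimizer = torch.optim.SGD([w], lr=0.1)

for _ in range(100):
    optimizer.zero_grad()      # clear the gradient from the previous step
    loss = (w - 5.0) ** 2      # toy loss with its minimum at w = 5
    loss.backward()            # compute d(loss)/dw
    optimizer.step()           # update w using the gradient

print(w.item())  # should end up very close to 5
```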

  • nn

The nn module includes various classes that help to build neural network models. All neural network modules in PyTorch subclass nn.Module.
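
The sketch below defines a small two-layer network by subclassing nn.Module. The class name TinyNet and the layer sizes are hypothetical and exist only for this example.

```python
import torch
from torch import nn


class TinyNet(nn.Module):
    """A hypothetical two-layer network: 4 inputs, 8 hidden units, 2 outputs."""

    def __init__(self):
        super().__init__()
        self.hidden = nn.Linear(4, 8)
        self.output = nn.Linear(8, 2)

    def forward(self, x):
        # Apply the hidden layer, a ReLU non-linearity, then the output layer.
        return self.output(torch.relu(self.hidden(x)))


model = TinyNet()
batch = torch.randn(3, 4)   # a batch of 3 samples with 4 features each
logits = model(batch)       # calling the model runs forward()

# Layers assigned as attributes are registered automatically as parameters.
num_params = sum(p.numel() for p in model.parameters())
print(logits.shape, num_params)
```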

Dynamic Computation Graph

Computational graphs in PyTorch allow the framework to calculate gradient values for the neural networks you build. PyTorch uses dynamic computational graphs: the graph is defined implicitly, through operator overloading, while the forward computation executes. Dynamic graphs are more flexible than static graphs because users can interleave construction and evaluation of the graph. They are also debug-friendly, since the code can be executed line by line. Finding problems in code is a lot easier with PyTorch's dynamic graphs, an important feature that has helped make PyTorch a preferred choice in the industry.

Computational graphs in PyTorch are rebuilt from scratch at every iteration, which allows the use of arbitrary Python control flow statements that can change the overall shape and size of the graph from one iteration to the next. The advantage is that there's no need to encode all possible paths before launching the training. What you run is what you differentiate.
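
The following sketch illustrates the idea with an ordinary Python loop whose length changes between calls. The helper name repeated_double is made up for this example. Each call records a graph with a different number of operations, and the gradient follows whichever graph was actually run.

```python
import torch


def repeated_double(x, steps):
    # A plain Python loop: the number of recorded operations depends on `steps`.
    out = x
    for _ in range(steps):
        out = out * 2
    return out


x = torch.tensor(1.0, requires_grad=True)

# First run: three doublings, so y = 8x and dy/dx = 8.
y = repeated_double(x, 3)
y.backward()
grad3 = x.grad.item()

# Reset the stored gradient, then run a differently shaped graph.
x.grad = None

# Second run: five doublings, so y = 32x and dy/dx = 32.
y = repeated_double(x, 5)
y.backward()
grad5 = x.grad.item()

print(grad3, grad5)
```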

Data Loader

Loading a large dataset into memory all at once can exhaust the available memory and slow programs down. Besides, code for processing data samples can be hard to maintain. PyTorch offers two data primitives, DataLoader and Dataset, that parallelize data loading with automated batching and make code more readable and modular. Dataset and DataLoader allow users to work with their own data as well as with pre-loaded datasets. While Dataset stores the samples and their corresponding labels, DataLoader combines a dataset with a sampler and wraps an iterable around the Dataset so users can easily access samples.
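
As a minimal sketch of how the two primitives fit together, the example below wraps a toy dataset in a DataLoader and iterates over it in batches. The SquaresDataset class and its contents are invented for illustration. A real dataset would typically read samples from disk inside __getitem__.

```python
import torch
from torch.utils.data import Dataset, DataLoader


class SquaresDataset(Dataset):
    """A hypothetical dataset: each sample is a number, its label is the square."""

    def __init__(self, n):
        self.inputs = torch.arange(n, dtype=torch.float32)
        self.labels = self.inputs ** 2

    def __len__(self):
        # Number of samples in the dataset.
        return len(self.inputs)

    def __getitem__(self, idx):
        # Return one (sample, label) pair.
        return self.inputs[idx], self.labels[idx]


dataset = SquaresDataset(10)

# The DataLoader groups samples into batches of up to 4, in order.
loader = DataLoader(dataset, batch_size=4, shuffle=False)

batch_sizes = []
for inputs, labels in loader:
    batch_sizes.append(len(inputs))

print(batch_sizes)  # 10 samples split into batches of 4, 4, and 2
```

Passing shuffle=True or a num_workers value greater than zero to DataLoader would randomize the order or load batches in parallel worker processes, respectively.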

Information Source – https://www.simplilearn.com/what-is-pytorch-article