Image classification is one of the most fundamental tasks in computer vision: it enables machines to recognize and categorize objects within images and videos. In this tutorial, we’ll walk through how to build an image classification model using PyTorch, one of the most popular deep learning frameworks today.
We’ll use the CIFAR-10 dataset, a classic benchmark dataset that contains 60,000 color images divided into 10 different classes (‘airplane’, ‘automobile’, ‘bird’, ‘cat’, ‘deer’, ‘dog’, ‘frog’, ‘horse’, ‘ship’, and ‘truck’). By the end of this tutorial, you’ll learn how to:
Load and preprocess image data using torchvision
Build a Convolutional Neural Network (CNN) from scratch
Train the model on the CIFAR-10 dataset
Evaluate its accuracy and visualize the results
Whether you’re a beginner learning deep learning with PyTorch or an intermediate developer looking to strengthen your understanding of CNNs, this guide will give you a solid, hands-on foundation in image classification. So let’s get started.
Prerequisites
Make sure you have installed Python 3.8 or higher and the following libraries:
pip install torch torchvision matplotlib numpy
If you’re using Google Colab, most of these libraries are already installed.
Import Libraries
import matplotlib.pyplot as plt # for plotting
import numpy as np # for transformation
import torch # PyTorch package
import torchvision # load datasets
import torchvision.transforms as transforms # transform data
import torch.nn as nn # basic building block for neural networks
import torch.nn.functional as F # convolution functions like ReLU
import torch.optim as optim # optimizers
First, we import the libraries matplotlib and numpy. These are essential libraries for plotting and data transformation respectively.
The torch library is the PyTorch package.
torchvision is for loading popular datasets. torchvision.transforms is for performing transformations on the image data. torch.nn is for defining the neural network. torch.nn.functional is for functions like ReLU. torch.optim is for optimization algorithms such as Stochastic Gradient Descent (SGD).
1. Load and normalize data
Before loading our data, we first define a transformation that we want to apply to the image data from the CIFAR10 dataset.
# PIL images are in range [0, 1];
# transform them to tensors with normalized range [-1, 1]
transform = transforms.Compose(  # compose several transforms together
    [transforms.ToTensor(),  # convert to a tensor object
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])  # mean = 0.5, std = 0.5

# set batch_size
batch_size = 4

# set number of workers
num_workers = 2

# load train data
trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=batch_size,
                                          shuffle=True, num_workers=num_workers)

# load test data
testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=batch_size,
                                         shuffle=False, num_workers=num_workers)

# the 10 class labels, stored as a tuple
classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')
First, we put the transformations we want into a list [] and pass it into the transforms.Compose() function. In our code, we have these two transformations:
ToTensor()
Converts the images from the CIFAR-10 dataset, which are Python Imaging Library (PIL) images, into tensors to be used with the torch library.
Normalize(mean, std)
The number of parameters we pass into the mean and std arguments depends on the mode (L, LA, P, I, F, RGB, YCbCr, RGBA, CMYK, 1) of our PIL image.
Since our PIL images are RGB, meaning they have three channels (red, green, and blue), we pass in 3 values for both the mean and standard deviation sequences.
We pass in 0.5 for both mean and std because, based on the normalization formula (x - mean) / std, the minimum of our tensor range (0) gives (0 - 0.5) / 0.5 = -1 and the maximum (1) gives (1 - 0.5) / 0.5 = 1. We end up with a range of [-1, 1].
We normalize to help the CNN perform better: it brings the data into a consistent range centered around 0, which reduces skewness and helps the network learn faster and better.
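As a quick sanity check (a small sketch, not part of the original tutorial code), we can plug the endpoints of the ToTensor() range into the normalization formula by hand:

```python
# Check that Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)) maps [0, 1] to [-1, 1]
mean, std = 0.5, 0.5

low = (0.0 - mean) / std   # minimum pixel value after ToTensor()
high = (1.0 - mean) / std  # maximum pixel value after ToTensor()

print(low, high)  # -1.0 1.0
```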
Now, let’s move on to the batch_size and num_workers.
batch_size
This is the number of training samples in one iteration, i.e. one forward/backward pass. Since we give batch_size the argument 4, we are getting 4 images at every iteration of training the network. The first 4 images (1 to 4) are passed into the network, then the next 4 (5 to 8), and so on until all the samples have been processed.
Splitting our data into batches is crucial because the network is constantly learning and updating its weights. Each batch trains the network in succession, building on the weights updated by the previous batches.
There are a few common guidelines for choosing the batch size. The important trade-off to remember is that a larger batch size gives smoother, more stable gradient estimates per iteration, but at the cost of taking up more memory.
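To make this concrete, here is a tiny sketch (using CIFAR-10's actual training split size) of how batch size determines the number of iterations in one epoch:

```python
# CIFAR-10 has 50,000 training images; with batch_size = 4,
# the DataLoader yields 12,500 batches per epoch.
train_size = 50_000
batch_size = 4

iterations_per_epoch = train_size // batch_size
print(iterations_per_epoch)  # 12500
```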
num_workers
When the value of num_workers is set to a positive integer, PyTorch switches to multi-process data loading. In our code, we set 2 as the number of workers, which means there are 2 worker processes simultaneously loading data into the computer’s RAM.
We use num_workers because it speeds up training on machines with multiple cores: by the time the main process is ready for the next batch of samples, that batch is already loaded and ready to go.
Now we are ready to define and load our train and test data.
We use torchvision.datasets and call the CIFAR-10 data with .CIFAR10. Inside this function, we pass in multiple arguments and assign the output to trainset.
root='./data' → this creates a folder named data at the root directory
train=True → we set train to True for the training data
download=True → we set download to True so the data is downloaded
transform=transform → we pass in our previously defined transformation so the data is transformed as it’s loaded in
Then we use torch.utils.data.DataLoader to load the data with the arguments below.
trainset → our defined data above
batch_size=batch_size → our batch size
shuffle=True → set to True to have the data reshuffled at every epoch
num_workers=num_workers → 2 workers loading the data
We then assign the output to trainloader. We do the same for our test set, except we set train=False and shuffle=False because it’s used to test the network.
After that, we define our class labels as a tuple. A tuple (rather than a set) preserves order, which matters because each label must line up with its numeric class index in the dataset.
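With the loaders in place, it’s handy to peek at a few training images. The helper below is a sketch (the names unnormalize and imshow are our own, not torchvision functions); it undoes the Normalize transform so matplotlib can display the images:

```python
import numpy as np
import matplotlib.pyplot as plt
import torch

def unnormalize(img):
    """Map a tensor normalized with mean=0.5, std=0.5 from [-1, 1] back to [0, 1]."""
    return img / 2 + 0.5

def imshow(img):
    """Display a CHW image tensor with matplotlib (which expects HWC)."""
    npimg = unnormalize(img).numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))  # CHW -> HWC
    plt.show()
```

To preview one batch, you could run images, labels = next(iter(trainloader)) and then imshow(torchvision.utils.make_grid(images)).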
2. Define the CNN
class Net(nn.Module):
    '''Models a simple Convolutional Neural Network'''

    def __init__(self):
        '''initialize the network'''
        super(Net, self).__init__()
        # 3 input image channels, 6 output channels,
        # 5x5 square convolution kernel
        self.conv1 = nn.Conv2d(3, 6, 5)
        # Max pooling over a (2, 2) window
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)  # 5x5 from image dimension
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        '''the forward propagation algorithm'''
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

net = Net()
print(net)
Printing the network shows us important information about the layers.
Net(
  (conv1): Conv2d(3, 6, kernel_size=(5, 5), stride=(1, 1))
  (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (conv2): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
  (fc1): Linear(in_features=400, out_features=120, bias=True)
  (fc2): Linear(in_features=120, out_features=84, bias=True)
  (fc3): Linear(in_features=84, out_features=10, bias=True)
)
It might look very scary at first glance, but once you understand the important components that make up this network, it can be very intuitive.
We build the network with a class because it gives us an object-oriented approach to constructing it. This lets us tweak every aspect of the network, and we can easily visualize it along with how the forward algorithm works.
First, we initialize our net with convolutional (conv), pooling, and fully connected (fc) layers.
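To see where the 16 * 5 * 5 = 400 input size of fc1 comes from, we can trace the spatial size of the feature maps by hand. This is a sketch using the standard output-size formulas for valid (no padding, stride 1) convolutions and 2x2 max pooling:

```python
def conv_out(size, kernel):
    """Output size of a valid (no padding) stride-1 convolution."""
    return size - kernel + 1

def pool_out(size, window=2):
    """Output size of max pooling with a square window and matching stride."""
    return size // window

s = 32               # CIFAR-10 images are 32x32
s = conv_out(s, 5)   # conv1: 32 -> 28
s = pool_out(s)      # pool:  28 -> 14
s = conv_out(s, 5)   # conv2: 14 -> 10
s = pool_out(s)      # pool:  10 -> 5

flat_features = 16 * s * s  # 16 channels * 5 * 5 = 400, matching fc1's in_features
print(flat_features)  # 400
```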
