Convolutional Neural Network Theoretical Course (Course VIII)

Padding an Image

In some cases, a convolution/pooling operation cannot be performed on an image because the image is smaller than the filter region. To fix this problem, we can pad such images with extra rows and columns of pixel values to form a larger image tensor. There are different ways to choose the values of the padded pixels, but we mostly use ‘0’ or the value of the closest pixel.

Here is a simple example demonstrating the concept of padding an image with zeroes. Consider an image tensor A with a dimension of 2×2, as shown on the left side of the image below. Since this image has only two rows and two columns of pixel values, we cannot use a 3×3 filter on it. So, we can pad the image with zeroes to make a 3×3 convolution/pooling operation possible. Padding can be done as shown on the right side of the image below:

Padding a 2x2 image

Now, we have a 4×4 image tensor and the 3×3 convolution/pooling operation can be performed. Here, the value of padding is 1, since we added one row or column of pixels on each of the top, left, right, and bottom sides.
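The padding step described above can be sketched with NumPy's `np.pad`. This is only an illustration: the pixel values of A are made up, and the variable names are assumptions rather than part of the course material.

```python
import numpy as np

# Hypothetical 2x2 image tensor A; the pixel values are made up for illustration.
A = np.array([[1, 2],
              [3, 4]])

# Zero padding with p = 1: one row/column of zeros is added on every side.
zero_padded = np.pad(A, pad_width=1, mode="constant", constant_values=0)

# Alternative: fill the padded pixels with the value of the closest pixel.
edge_padded = np.pad(A, pad_width=1, mode="edge")

print(zero_padded.shape)  # the 2x2 image becomes 4x4
print(zero_padded)
print(edge_padded)
```

Both padded tensors are large enough for a 3×3 filter; they differ only in the values placed in the border.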

Finding the size of an output tensor when padding is used

If n_{A1} x n_{A2} is the size of the input image tensor, n_K x n_K is the size of the convolution filter, s is the stride, and p is the amount of padding, then the size of the resulting tensor, n_{O1} x n_{O2} (after the convolution operation), can be found using the following formula:

    \[n_{O1} = \text{floor}\begin{pmatrix} \dfrac{n_{A1}+2p-n_{K}}{s} + 1 \end{pmatrix}\]

and,

    \[n_{O2} = \text{floor}\begin{pmatrix} \dfrac{n_{A2}+2p-n_{K}}{s} + 1 \end{pmatrix}\]

As an example, let us calculate the output tensor size when a 3×3 filter, a stride of 1, and a padding of 1 are used on a 4×2 image:

    \[n_{O1} = \text{floor}\begin{pmatrix} \dfrac{4+2-3}{1} + 1 \end{pmatrix} = 4\]

and,

    \[n_{O2} = \text{floor}\begin{pmatrix} \dfrac{2+2-3}{1} + 1 \end{pmatrix} = 2\]

Thus, the size of the output tensor is 4 x 2.
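The calculation above can be reproduced with a small helper. The function name `conv_output_size` and its argument names are hypothetical, chosen here to mirror the symbols in the formula; treat this as a sketch rather than a library API.

```python
def conv_output_size(n_a, n_k, s=1, p=0):
    """Output size along one dimension: floor((n_a + 2p - n_k) / s + 1).

    n_a: input size, n_k: filter size, s: stride, p: padding.
    """
    if n_a + 2 * p < n_k:
        raise ValueError("the filter is larger than the padded input")
    # The numerator is non-negative here, so integer division is the floor.
    return (n_a + 2 * p - n_k) // s + 1


# The 4x2 image with a 3x3 filter, stride 1, and padding 1:
n_o1 = conv_output_size(4, 3, s=1, p=1)
n_o2 = conv_output_size(2, 3, s=1, p=1)
print(n_o1, n_o2)  # 4 2
```

Without padding (`p=0`), the second dimension would raise an error, because a 3×3 filter does not fit across two columns.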

Note: Without padding, a convolution/pooling operation with a filter larger than 1×1 produces an output tensor smaller than its input, but if the right padding value is chosen, the original size can be retained.
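For a stride of 1, setting the output size equal to the input size in the formula above gives p = (n_K - 1)/2, which is a whole number only when the filter size is odd. The following sketch checks this for a few filter sizes; the helper name `same_padding` and the input size of 28 are assumptions made for illustration.

```python
def same_padding(n_k):
    """Padding that keeps the size unchanged for stride 1 and an odd filter size."""
    if n_k % 2 == 0:
        raise ValueError("this sketch only handles odd filter sizes")
    return (n_k - 1) // 2


n_a = 28  # hypothetical input size along one dimension
results = []
for n_k in (3, 5, 7):
    p = same_padding(n_k)
    n_o = (n_a + 2 * p - n_k) // 1 + 1  # output size formula with stride 1
    results.append((n_k, p, n_o))
    print(f"filter {n_k}x{n_k}: padding {p}, output size {n_o}")
```

In each case the output size equals the input size, which matches the 3×3 example above where a padding of 1 kept the 4×2 image at 4×2.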

With this, you now have all the fundamental knowledge required to build a Convolutional Neural Network. In the next chapter, we will tie together everything we have learned so far to build one.
