Convolutional Neural Network Theoretical Course (Course VIII)

Stride and Calculation of Output Size

In the last lesson, we shifted the convolution filter (kernel) by one pixel at a time, i.e., with a stride of 1. However, a convolution filter does not have to move across an image with a stride of 1.

The stride (s) used during a series of convolution operations can be changed to suit the problem. When s = 1, the filter is shifted one column of pixel values to the right or one row of pixel values down at each step. Similarly, when s = 2, the filter is shifted two columns of pixel values to the right or two rows of pixel values down, and so on.
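To make the shifting concrete, here is a minimal sketch that lists the top-left corner of every filter placement for a given stride. The function name `window_positions` and the 5×5 image size are assumptions chosen for illustration, and the sketch assumes no padding.

```python
def window_positions(n_rows, n_cols, n_k, s):
    """Return the (row, col) of the top-left corner of every filter placement."""
    positions = []
    # The filter must fit entirely inside the image, so the last valid
    # top-left index along an axis is (size - n_k).
    for row in range(0, n_rows - n_k + 1, s):
        for col in range(0, n_cols - n_k + 1, s):
            positions.append((row, col))
    return positions


# Hypothetical 5x5 image with a 3x3 filter.
print(window_positions(5, 5, 3, 1))  # 9 placements, one pixel apart
print(window_positions(5, 5, 3, 2))  # 4 placements, two pixels apart
```

With s = 2 the filter lands only on every second row and column, so fewer placements are evaluated.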

However, if you think about it for a moment, why would anyone want to take a larger stride, given that the network will then skip some filter positions during computation? There are multiple reasons, but here are some major ones:

  • Taking a larger stride allows a series of convolution operations to be computed faster for a large image (say, 3000×3000 pixels).
  • Less memory is needed to store the results of the convolution operation.
  • The size of the output tensor can be reduced to make the input to the next layer of a Convolutional Neural Network smaller.
  • Since neighboring regions overlap less when the filter moves in larger steps, overfitting may be reduced.
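The speed and memory points above can be illustrated with a rough count. The sketch below is only an estimate under stated assumptions: a single-channel square image, a square filter, no padding, and one multiplication per filter weight per placement. The helper name `conv_cost` is hypothetical.

```python
def conv_cost(n_a, n_k, s):
    """Estimate output values and multiplications for one square convolution."""
    # Number of valid filter placements along one axis (no padding).
    placements_per_axis = len(range(0, n_a - n_k + 1, s))
    outputs = placements_per_axis ** 2
    # Each output value needs n_k * n_k multiplications.
    multiplications = outputs * n_k * n_k
    return outputs, multiplications


for s in (1, 2, 3):
    outputs, mults = conv_cost(3000, 3, s)
    print(f"s={s}: {outputs} output values, {mults} multiplications")
```

Under these assumptions, moving from s = 1 to s = 3 on a 3000×3000 image cuts the number of stored output values from roughly 9 million to 1 million, with the multiplication count shrinking by the same factor.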

Finding the size of an output tensor after a series of convolution operations

Generally, in a Convolutional Neural Network, the input image undergoes multiple convolution operations, each of which may change the size of its input. In this section, you will learn an easy way to find the size of an output tensor after a series of convolution operations.

If n_{A1} x n_{A2} is the size of the input image tensor, n_K x n_K is the size of the convolution filter, and s is the stride, then the size of the resulting tensor, n_{O1} x n_{O2} (after a series of convolution operations), can be found using the following formula:

    \[n_{O1} = \text{floor}\begin{pmatrix} \dfrac{n_{A1}-n_{K}}{s} + 1  \end{pmatrix}\]

and,

    \[n_{O2} = \text{floor}\begin{pmatrix} \dfrac{n_{A2}-n_{K}}{s} + 1 \end{pmatrix}\]

where \text{floor()} means that a fractional result is rounded down to the nearest integer.
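The formula translates directly into a few lines of code. The following is a minimal sketch, assuming integer sizes, a square filter, and no padding. The function name `conv_output_size` and the example sizes are illustrative and not part of the course material.

```python
def conv_output_size(n_a1, n_a2, n_k, s=1):
    """Return (n_o1, n_o2) for an n_a1 x n_a2 input, n_k x n_k filter, stride s."""
    if s < 1:
        raise ValueError("stride must be a positive integer")
    if n_k > min(n_a1, n_a2):
        raise ValueError("filter must not be larger than the input")
    # For non-negative integers, floor((n - k) / s + 1) equals (n - k) // s + 1.
    n_o1 = (n_a1 - n_k) // s + 1
    n_o2 = (n_a2 - n_k) // s + 1
    return n_o1, n_o2


# Hypothetical sizes chosen to show the effect of the floor.
print(conv_output_size(7, 7, 3, s=2))  # (3, 3)
print(conv_output_size(8, 6, 3, s=2))  # (3, 2)
```

In the second call, (8 - 3) / 2 + 1 = 3.5 is rounded down to 3: the last column of the input cannot host a complete filter placement, so it is left out.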

Recall that in the first lesson, we started with an image tensor A of size 4×4 and a stride s of 1. After performing a series of convolutions, the output tensor O was reduced to a size of 2×2. Let us see whether the above formula gives the same result for a filter size of 3×3:

    \[n_{O1} = \text{floor}\begin{pmatrix} \dfrac{4-3}{1} + 1 \end{pmatrix} = 2\]

and,

    \[n_{O2} = \text{floor}\begin{pmatrix} \dfrac{4-3}{1} + 1 \end{pmatrix} = 2\]

Thus, the size of the output tensor is 2 x 2.
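The same result can be checked by actually sliding a filter over a 4×4 input. The sketch below is a naive pure-Python version written for clarity rather than speed. The tensor values, the filter values, and the name `convolve2d` are made up for illustration and are not the ones used in the first lesson. As is usual in CNNs, the filter is applied without flipping.

```python
def convolve2d(image, kernel, s=1):
    """Slide a square kernel over a 2D list with stride s (no padding)."""
    n_a1, n_a2 = len(image), len(image[0])
    n_k = len(kernel)
    output = []
    for row in range(0, n_a1 - n_k + 1, s):
        out_row = []
        for col in range(0, n_a2 - n_k + 1, s):
            # Sum of element-wise products between the kernel and the region.
            total = 0
            for i in range(n_k):
                for j in range(n_k):
                    total += image[row + i][col + j] * kernel[i][j]
            out_row.append(total)
        output.append(out_row)
    return output


# Hypothetical 4x4 input tensor A and 3x3 filter K.
A = [[1, 2, 3, 4],
     [5, 6, 7, 8],
     [9, 10, 11, 12],
     [13, 14, 15, 16]]
K = [[1, 0, 0],
     [0, 1, 0],
     [0, 0, 1]]

O = convolve2d(A, K, s=1)
print(len(O), "x", len(O[0]))  # 2 x 2
```

Calling `convolve2d(A, K, s=2)` on the same input leaves room for only one placement, which matches the formula: floor((4 - 3) / 2 + 1) = 1.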

With this, you now know about the Convolution Operation in CNNs. In the next chapter, you will be introduced to another important operation in CNNs, the Pooling Operation.
