Convolutional Neural Network Theoretical Course (Course VIII)

The Pooling Operation

The pooling operation is another fundamental operation of a Convolutional Neural Network. Fortunately, it can be understood much more quickly, since we already have a sound understanding of the convolution operation.

In practice, two kinds of pooling operations are most commonly used: max pooling and average pooling. Both are illustrated in the sections below.

Max-pooling

The max-pooling operation slides a window over the input tensor and, for each window position, outputs the maximum element inside that window. This can be better understood using the following notation-based example:

Consider a 4×4 image tensor A:

    \[A = \begin{pmatrix} a_{11} & a_{12} & a_{13} & a_{14} \\ a_{21} & a_{22} & a_{23} & a_{24} \\ a_{31} & a_{32} & a_{33} & a_{34} \\ a_{41} & a_{42} & a_{43} & a_{44} \end{pmatrix}\]

Applying max pooling with a 2×2 window and a stride of 2 splits A into four non-overlapping 2×2 blocks, and each output element is the maximum of one block:

    \[o_1 = \text{max}\begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}\ ,\ o_2 = \text{max}\begin{pmatrix} a_{13} & a_{14} \\ a_{23} & a_{24} \end{pmatrix}\ ,\ o_3 = \text{max}\begin{pmatrix} a_{31} & a_{32} \\ a_{41} & a_{42} \end{pmatrix}\ ,\ o_4 = \text{max}\begin{pmatrix} a_{33} & a_{34} \\ a_{43} & a_{44} \end{pmatrix}\]

The final output tensor is then obtained as follows,

    \[O = \begin{pmatrix} o_1 & o_2 \\ o_3 & o_4 \end{pmatrix}\]
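The four equations above can be condensed into a single indexed formula for a 2×2 window with stride 2, where the output is indexed by row i and column j (so o_1, o_2, o_3, o_4 correspond to (i, j) = (1, 1), (1, 2), (2, 1), (2, 2)):

    \[o_{ij} = \max\left\{a_{2i-1,\,2j-1},\ a_{2i-1,\,2j},\ a_{2i,\,2j-1},\ a_{2i,\,2j}\right\}, \qquad i, j \in \{1, 2\}\]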

Let us make this even clearer with a numerical example.

Consider the following 4×4 image tensor A:

    \[A = \begin{pmatrix} 2 & 4 & 6 & 8 \\ 10 & 12 & 14 & 16 \\ 18 & 20 & 22 & 24 \\ 26 & 28 & 30 & 32 \end{pmatrix}\]

Applying max pooling with a 2×2 window and a stride of 2 gives:

    \[o_1 = \text{max}\begin{pmatrix} 2 & 4 \\ 10 & 12 \end{pmatrix} = 12 \ ,\ o_2 = \text{max}\begin{pmatrix} 6 & 8 \\ 14 & 16 \end{pmatrix} = 16 \ ,\ o_3 = \text{max}\begin{pmatrix} 18 & 20 \\ 26 & 28 \end{pmatrix} = 28 \ ,\ o_4 = \text{max}\begin{pmatrix} 22 & 24 \\30 & 32 \end{pmatrix} = 32\]

The final output tensor is then obtained as follows,

    \[O = \begin{pmatrix} 12 & 16 \\ 28 & 32 \end{pmatrix}\]
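The windowed maximum is easy to check in a few lines of Python. This is a minimal sketch that assumes the image is a plain list of lists and that no padding is used; `max_pool2d` is a hypothetical helper name, not a library function. It reproduces the numerical example above.

```python
def max_pool2d(x, size=2, stride=2):
    """Max pooling over a 2-D list of numbers (no padding).

    Each output element is the maximum of one size x size window;
    the window moves `stride` positions between output elements.
    """
    rows, cols = len(x), len(x[0])
    out_rows = (rows - size) // stride + 1
    out_cols = (cols - size) // stride + 1
    out = []
    for i in range(out_rows):
        row = []
        for j in range(out_cols):
            # Gather every element of the current window.
            window = [
                x[r][c]
                for r in range(i * stride, i * stride + size)
                for c in range(j * stride, j * stride + size)
            ]
            row.append(max(window))
        out.append(row)
    return out


A = [
    [2, 4, 6, 8],
    [10, 12, 14, 16],
    [18, 20, 22, 24],
    [26, 28, 30, 32],
]
print(max_pool2d(A))  # [[12, 16], [28, 32]]
```

With `stride=1` and the same 2×2 window, the windows overlap and the output grows to 3×3; deep learning frameworks offer this operation as a built-in layer (for example `torch.nn.MaxPool2d` in PyTorch).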

Average-pooling

The average-pooling operation slides a window over the input tensor in the same way and, for each window position, outputs the average (arithmetic mean) of the elements inside that window. This can be better understood using the following notation-based example:

Consider a 4×4 image tensor A:

    \[A = \begin{pmatrix} a_{11} & a_{12} & a_{13} & a_{14} \\ a_{21} & a_{22} & a_{23} & a_{24} \\ a_{31} & a_{32} & a_{33} & a_{34} \\ a_{41} & a_{42} & a_{43} & a_{44} \end{pmatrix}\]

Applying average pooling with a 2×2 window and a stride of 2, each output element is the average of one non-overlapping 2×2 block:

    \[o_1 = \text{avg}\begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}\ ,\ o_2 = \text{avg}\begin{pmatrix} a_{13} & a_{14} \\ a_{23} & a_{24} \end{pmatrix}\ ,\ o_3 = \text{avg}\begin{pmatrix} a_{31} & a_{32} \\ a_{41} & a_{42} \end{pmatrix}\ ,\ o_4 = \text{avg}\begin{pmatrix} a_{33} & a_{34} \\ a_{43} & a_{44} \end{pmatrix}\]

The final output tensor is then obtained as follows,

    \[O = \begin{pmatrix} o_1 & o_2 \\ o_3 & o_4 \end{pmatrix}\]
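As with max pooling, the four equations above can be condensed into a single indexed formula for a 2×2 window with stride 2, with (i, j) = (1, 1), (1, 2), (2, 1), (2, 2) corresponding to o_1, o_2, o_3, o_4:

    \[o_{ij} = \frac{1}{4}\left(a_{2i-1,\,2j-1} + a_{2i-1,\,2j} + a_{2i,\,2j-1} + a_{2i,\,2j}\right), \qquad i, j \in \{1, 2\}\]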

Let us make this even clearer with a numerical example.

Consider the following 4×4 image tensor A:

    \[A = \begin{pmatrix} 2 & 4 & 6 & 8 \\ 10 & 12 & 14 & 16 \\ 18 & 20 & 22 & 24 \\ 26 & 28 & 30 & 32 \end{pmatrix}\]

Applying average pooling with a 2×2 window and a stride of 2 gives:

    \[o_1 = \text{avg}\begin{pmatrix} 2 & 4 \\ 10 & 12 \end{pmatrix} = 7 \ ,\ o_2 = \text{avg}\begin{pmatrix} 6 & 8 \\ 14 & 16 \end{pmatrix} = 11 \ ,\ o_3 = \text{avg}\begin{pmatrix} 18 & 20 \\ 26 & 28 \end{pmatrix} = 23 \ ,\ o_4 = \text{avg}\begin{pmatrix} 22 & 24 \\30 & 32 \end{pmatrix} = 27\]

The final output tensor is then obtained as follows,

    \[O = \begin{pmatrix} 7 & 11 \\ 23 & 27 \end{pmatrix}\]
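Because max pooling and average pooling differ only in the function applied to each window, one helper can cover both. The sketch below assumes plain Python lists and no padding; `pool2d` and `average` are hypothetical names chosen for illustration. Passing `average` reproduces the average-pooling example above, and passing the built-in `max` reproduces the max-pooling one.

```python
def pool2d(x, reduce, size=2, stride=2):
    """Generic 2-D pooling (no padding): apply `reduce` to every window."""
    rows, cols = len(x), len(x[0])
    out_rows = (rows - size) // stride + 1
    out_cols = (cols - size) // stride + 1
    return [
        [
            # Flatten the current size x size window and reduce it to one value.
            reduce([
                x[r][c]
                for r in range(i * stride, i * stride + size)
                for c in range(j * stride, j * stride + size)
            ])
            for j in range(out_cols)
        ]
        for i in range(out_rows)
    ]


def average(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


A = [
    [2, 4, 6, 8],
    [10, 12, 14, 16],
    [18, 20, 22, 24],
    [26, 28, 30, 32],
]
print(pool2d(A, average))  # [[7.0, 11.0], [23.0, 27.0]]
print(pool2d(A, max))      # [[12, 16], [28, 32]]
```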

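The pooling operation is usually performed after the convolution operation. Pooling further reduces the spatial size (height and width) of the tensor while keeping the most salient information from each region: max pooling keeps the strongest value in each window, and average pooling keeps the window's mean. In both examples above, a 2×2 window with a stride of 2 shrinks the 4×4 input to a 2×2 output.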
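How much a pooling layer shrinks its input follows from the window size and stride. Assuming no padding and that windows which would run past the border are dropped (the usual default convention), a dimension of length n becomes floor((n - k) / s) + 1 for a window of size k and stride s. The hypothetical helper `pooled_size` below simply evaluates that expression.

```python
def pooled_size(n, size=2, stride=2):
    """Output length along one spatial dimension, assuming no padding."""
    return (n - size) // stride + 1


print(pooled_size(4))        # 2   (the 4x4 -> 2x2 examples above)
print(pooled_size(28))       # 14  (a 2x2, stride-2 window halves even sizes)
print(pooled_size(7, 3, 2))  # 3   (3x3 window, stride 2)
```

A 2×2 window with stride 2 therefore keeps one value out of every four, a 75% reduction in the number of elements.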
