Convolutional Neural Network Theoretical Course (Course VIII)

The Convolution/Pooling Operation for RGB images

Until now, we have only discussed the convolution and pooling operations on single-channel images, i.e., grayscale images. However, colour photos taken with digital cameras are RGB images. Such images are formed by combining three colour channels: Red, Green, and Blue, as shown in the image below.

RGB image

Mathematically, an RGB image A is represented as a tensor of dimension n_{A1} × n_{A2} × n_{c}, where the first two dimensions (n_{A1} and n_{A2}) are the number of rows and columns of pixels in the image and the last dimension (n_{c}) is the number of colour channels. So, an RGB image of 512×512 resolution is actually represented as a 512×512×3 tensor.
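The shape convention above can be checked with a small sketch. It assumes NumPy and a rows × columns × channels layout with the channels ordered Red, Green, Blue; the all-zero image is a placeholder, not real data.

```python
import numpy as np

# Placeholder 512x512 RGB image: axes are (rows, columns, colour channels).
image = np.zeros((512, 512, 3), dtype=np.uint8)

# Unpack the three dimensions n_A1, n_A2 and n_c.
n_a1, n_a2, n_c = image.shape

# Slicing the last axis gives one 2-D colour channel (index 0 assumed to be Red).
red_channel = image[:, :, 0]

print(image.shape)        # (512, 512, 3)
print(red_channel.shape)  # (512, 512)
```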

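In this case, the convolution operation is performed on all three colour channels (Red, Green and Blue), and a single output tensor is obtained by summing the convolution results of the individual channels. Pooling is likewise applied to each colour channel; note that standard CNN pooling layers keep the pooled channels separate rather than summing them, so the number of channels is preserved.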

Let us understand this clearly with the following example of a convolution operation:

Consider an RGB image A with a dimension of 3x3x3,

    \[A_R = \begin{pmatrix} a^{R}_{11} & a^{R}_{12} & a^{R}_{13} \\ a^{R}_{21} & a^{R}_{22} & a^{R}_{23} \\ a^{R}_{31} & a^{R}_{32} & a^{R}_{33} \end{pmatrix}, \quad A_G = \begin{pmatrix} a^{G}_{11} & a^{G}_{12} & a^{G}_{13} \\ a^{G}_{21} & a^{G}_{22} & a^{G}_{23} \\ a^{G}_{31} & a^{G}_{32} & a^{G}_{33} \end{pmatrix}, \quad A_B = \begin{pmatrix} a^{B}_{11} & a^{B}_{12} & a^{B}_{13} \\ a^{B}_{21} & a^{B}_{22} & a^{B}_{23} \\ a^{B}_{31} & a^{B}_{32} & a^{B}_{33} \end{pmatrix}\]

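Also, consider a kernel K with a dimension of 3×3 that is applied to every colour channel (in practice, a convolutional layer usually has a separate 3×3 slice of weights for each channel; a single shared K keeps this example simple),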

    \[K = \begin{pmatrix} k_{11} & k_{12} & k_{13} \\ k_{21} & k_{22} & k_{23} \\ k_{31} & k_{32} & k_{33} \end{pmatrix}\]

The output tensor O is obtained as follows,

    \[O = A_R * K + A_G * K + A_B * K\]
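The formula above can be tried out numerically. The sketch below assumes NumPy, no padding, and that `*` stands for the multiply-and-sum used in CNNs (matching entries multiplied and added, with no kernel flip). The pixel and kernel values are made up for illustration. Because the image and the kernel are both 3×3, the kernel fits in exactly one position and O is a single number.

```python
import numpy as np

# Illustrative values (assumed, not from the text): each channel is constant.
A_R = np.full((3, 3), 1.0)
A_G = np.full((3, 3), 2.0)
A_B = np.full((3, 3), 3.0)
K = np.ones((3, 3))


def multiply_and_sum(channel, kernel):
    """Multiply matching entries of a channel and the kernel, then add them up."""
    return float(np.sum(channel * kernel))


# O = A_R * K + A_G * K + A_B * K
O = multiply_and_sum(A_R, K) + multiply_and_sum(A_G, K) + multiply_and_sum(A_B, K)
print(O)  # 54.0  (9*1 + 9*2 + 9*3)
```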

The same process applies to an image that is larger than the kernel. The kernel slides over the image and, at each position, is convolved with each colour channel of the kernel-sized sub-tensor of the image; the per-channel results are summed to give one element of the resultant output tensor.
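A minimal sketch of this sliding-window procedure follows. It assumes NumPy, a stride of 1, no padding, and one 2-D kernel shared by all channels as in the example above. The function name `conv2d_rgb_shared_kernel` and the test values are hypothetical.

```python
import numpy as np


def conv2d_rgb_shared_kernel(image, kernel):
    """Slide a 2-D kernel over an RGB image and sum the per-channel results.

    image:  array of shape (rows, cols, channels)
    kernel: array of shape (k_rows, k_cols), shared by every channel
    Assumes stride 1 and no padding, so the output shrinks accordingly.
    """
    n_rows, n_cols, _ = image.shape
    k_rows, k_cols = kernel.shape
    out = np.zeros((n_rows - k_rows + 1, n_cols - k_cols + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Kernel-sized sub-tensor of the image, all channels included.
            patch = image[i:i + k_rows, j:j + k_cols, :]
            # Broadcast the kernel over the channel axis, then sum everything:
            # this equals adding up the three per-channel results.
            out[i, j] = np.sum(patch * kernel[:, :, None])
    return out


image = np.ones((5, 5, 3))   # illustrative 5x5 RGB image of ones
kernel = np.ones((3, 3))     # illustrative 3x3 kernel of ones
output = conv2d_rgb_shared_kernel(image, kernel)
print(output.shape)  # (3, 3)
```

With these values every output element is 27, since each 3×3 patch contributes 9 per channel.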

The same concept extends to the pooling operation, where max-pooling or average-pooling is applied to each colour channel of each sub-tensor of the image to get the resultant output tensor.
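Per-channel pooling can be sketched as below. This assumes NumPy, non-overlapping square windows, and the common convention in which each channel keeps its own pooled map. The helper name `max_pool_per_channel` and the sample values are hypothetical. Replacing `max` with `mean` would give average-pooling.

```python
import numpy as np


def max_pool_per_channel(image, size=2):
    """Max-pool each colour channel separately with non-overlapping windows.

    image: array of shape (rows, cols, channels)
    Rows and columns that do not fill a complete window are dropped.
    """
    n_rows, n_cols, n_channels = image.shape
    out_rows, out_cols = n_rows // size, n_cols // size
    trimmed = image[:out_rows * size, :out_cols * size, :]
    # Split the row and column axes into (window index, position inside window).
    windows = trimmed.reshape(out_rows, size, out_cols, size, n_channels)
    # Take the maximum inside each window; the channel axis is left untouched.
    return windows.max(axis=(1, 3))


image = np.arange(48).reshape(4, 4, 3)  # illustrative 4x4 RGB image
pooled = max_pool_per_channel(image, size=2)
print(pooled.shape)  # (2, 2, 3)
```

The output keeps three channels. If a single combined map is wanted instead, the pooled result could be summed over its last axis.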
