
Deep Learning Theoretical Course (Course VII)

Neurons in Deep Learning

Neurons in Deep Learning are computational units that take in real-valued input features, process them, and produce a real-valued output.

In standard practice, a neuron applies a non-linear function to a linear combination of all the input features passed into it. This is how it ‘computes’ as a computational unit in a deep neural network.

Mathematically, given real-valued input features (x_1, x_2, x_3, ..., x_n) with weights (w_1, w_2, w_3, ..., w_n) and a bias b, the neuron first calculates the linear combination of all the inputs and the weights, denoted by z.

    \[z = w_{1}x_{1} + w_{2}x_{2} + w_{3}x_{3} + ... + w_{n}x_{n} + b\]

where n is the number of input features.

The linear combination z is then passed through a non-linear activation function f to get the final output \hat{y} of the neuron.

    \[\hat{y} = f(z) = f(w_{1}x_{1} + w_{2}x_{2} + w_{3}x_{3} + ... + w_{n}x_{n} + b)\]
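The formula above can be sketched in plain Python. The inputs, weights, bias, and the tanh activation below are illustrative choices, not values from the text:

```python
import math

def neuron_output(x, w, b, f):
    """Output of a single neuron: f(w_1*x_1 + ... + w_n*x_n + b)."""
    # Linear combination of inputs and weights, plus the bias.
    z = sum(w_i * x_i for w_i, x_i in zip(w, x)) + b
    # Pass z through the non-linear activation function f.
    return f(z)

# Example with made-up inputs and weights, using tanh as the activation:
# z = 0.5*1.0 + 0.25*2.0 + 0.1 = 1.1, so y_hat = tanh(1.1)
y_hat = neuron_output(x=[1.0, 2.0], w=[0.5, 0.25], b=0.1, f=math.tanh)
```

Any non-linear function could be substituted for `f`; the structure of the computation stays the same.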

The figure below shows what a neuron looks like conceptually as a diagram.

[Figure: conceptual diagram of a neuron]

Using matrix notation

In practice, the number of input features, and hence the number of weights, is very large, so it is better to use matrix notation. The input features are denoted by a bold-faced \textbf{x} and the weights by a bold-faced \textbf{w}.

    \[\textbf{x} = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}, \textbf{w} = \begin{pmatrix} w_1 \\ w_2 \\ \vdots \\ w_n \end{pmatrix}\]

So, the linear combination of \textbf{x} and \textbf{w} can be computed using the following expression,

    \[z = \textbf{w}^{T}\textbf{x} + b\]

where \textbf{w}^{T} represents the transpose of \textbf{w} (we need to transpose the weight vector for the matrix multiplication to be well-defined).
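The expression z = \textbf{w}^{T}\textbf{x} + b can be written directly in NumPy. The vectors and bias below are illustrative, not from the text:

```python
import numpy as np

# x and w as n-by-1 column vectors, matching the notation above.
x = np.array([[1.0], [2.0], [3.0]])
w = np.array([[0.5], [0.25], [0.1]])
b = 0.1

# w.T is 1-by-n, so w.T @ x is a 1-by-1 matrix; .item() extracts the scalar.
# z = 0.5*1 + 0.25*2 + 0.1*3 + 0.1 = 1.4
z = (w.T @ x).item() + b
```

The `@` operator performs the matrix multiplication, so the transpose makes the shapes (1, n) and (n, 1) compatible, exactly as the formula requires.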

Finally, the output of the neuron can be computed as,

    \[\hat{y} = f(z) = f(\textbf{w}^{T}\textbf{x} + b)\]

Note that the values of the inputs \textbf{x} come from our dataset, whereas the values of the weights \textbf{w} are randomly initialized. Why the weights are randomly initialized will become clear later in the course, when you learn how to train a deep neural network.

Real-life example of a neuron in action

Suppose we are trying to predict whether a person is obese given their height and weight, and we have the following dataset:

Height (in feet)    Weight (in kg)    Condition (Not Obese/Obese)
4                   80                Obese

Here, \textbf{x} = \begin{pmatrix} 4 \\ 80 \end{pmatrix}. Now, let us randomly initialize our weights to, say, \textbf{w} = \begin{pmatrix} 0.2 \\ 0.00001 \end{pmatrix} and our bias to b = 1.

Now that we have a general intuition about what our inputs, weights and bias are, let us find the linear combination of all the inputs and the weights, z.

    \[z = \textbf{w}^{T}\textbf{x} + b = \begin{pmatrix} 0.2 & 0.00001 \end{pmatrix} \begin{pmatrix} 4 \\ 80 \end{pmatrix} + 1 = 0.8008 + 1 = 1.8008\]

Great!

Next, we will use an activation function called the sigmoid function. We will learn more about the sigmoid function, as well as other activation functions, in detail in the next lesson. The sigmoid function is computed as follows:

    \[f(z) = \sigma(z) = \dfrac{1}{1+e^{-z}}\]

The sigmoid function outputs the probability that the input belongs to a certain class, i.e. for two classes A and B, if the output probability is less than 0.5 the input falls in class A; otherwise it falls in class B. In our case, we can take class A to be ‘Not Obese’ and class B to be ‘Obese’.

Now, let us pass z through the activation function f(z).

    \[f(z) = \dfrac{1}{1+e^{-z}} = \dfrac{1}{1+e^{-1.8008}} \approx 0.86\]

Our output probability is greater than 0.5, so our neuron correctly predicted that the input data shows characteristics of the person being ‘Obese’. But think for a second: how did the prediction become accurate just because we passed the input data through two mathematical functions?
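This forward pass can be checked numerically with a short script, recomputing z and the sigmoid output from the x, w, and b defined above:

```python
import math

x = [4, 80]          # height in feet, weight in kg
w = [0.2, 0.00001]   # randomly initialized weights
b = 1                # bias

# Linear combination z = w^T x + b = 0.2*4 + 0.00001*80 + 1
z = sum(w_i * x_i for w_i, x_i in zip(w, x)) + b

# Sigmoid activation turns z into a probability.
y_hat = 1 / (1 + math.exp(-z))

# Probability >= 0.5 -> predict 'Obese', otherwise 'Not Obese'.
prediction = "Obese" if y_hat >= 0.5 else "Not Obese"
```

Running it confirms the numbers worked out by hand above.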

This is purely the result of luck. Had the randomly initialized weights been different from the ones in the example above, our prediction might have been different. In this example, the neuron is not learning anything from the dataset; it is just computing with the information given to it. In short, there is no ‘Machine Learning’ going on here.

Keep this in mind as we move on to future chapters, where we will learn how to train a neuron to find the best weights. But for now, congratulations on knowing what neurons in deep learning are!
