TensorFlow 2.0 for Deep Learning (Course IX)

Logistic Regression in TensorFlow 2.0

In the previous chapter, we predicted a continuous-valued label using linear regression. In this chapter, we discuss logistic regression, which is used for classification problems where the output is discrete rather than continuous. Logistic regression models the relationship between input and output with an S-shaped curve (the logistic function), which gives the probability that an input belongs to a certain class.
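
To make the S-shaped curve concrete, here is a minimal sketch of the logistic function in plain Python. The function name `sigmoid` and the sample inputs are illustrative choices and are not part of the chapter's code.

```python
import math

def sigmoid(z):
    """Logistic function: maps any real number into the interval (0, 1)."""
    # Two branches keep exp() from overflowing for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

# Large negative inputs approach 0, large positive inputs approach 1.
for z in (-4.0, 0.0, 4.0):
    print(z, round(sigmoid(z), 4))
```

An input of 0 maps to a probability of exactly 0.5, and the curve is symmetric around that point: `sigmoid(z) + sigmoid(-z)` equals 1.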

In this chapter, we will be using the MNIST handwritten digits dataset. The dataset contains 60,000 examples for training and 10,000 examples for testing. The digits have been size-normalized and centered in a fixed-size image (28×28 pixels) with values from 0 to 255.

Importing the dataset

We start by loading the MNIST dataset from tensorflow.keras.datasets, including both the training and testing sets. Since the data are images, we flatten each 28×28 image into a 1-D array of 784 values using NumPy's reshape method. We also normalize the pixel intensities so that the values lie between 0 and 1.

import numpy as np
import tensorflow as tf
from tensorflow.keras.datasets import mnist

# Load train and test data
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Converting data to float32
x_train, x_test = np.array(x_train, np.float32), np.array(x_test, np.float32)

# Flatten images to 1-D vector of 784 features (28*28).
x_train, x_test = x_train.reshape(x_train.shape[0], -1), x_test.reshape(x_test.shape[0], -1)

# Normalize images value from [0, 255] to [0, 1].
x_train, x_test = x_train / 255., x_test / 255.

Batching the data

With 60,000 training images, it is practical to train on mini-batches rather than on the whole dataset at once. We will use the tf.data.Dataset API to shuffle the data and split it into batches of 256 images.

train_data = tf.data.Dataset.from_tensor_slices((x_train, y_train))
train_data = train_data.repeat().shuffle(5000).batch(256).prefetch(1)
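
As a rough illustration of what shuffling and batching do, the following plain-Python sketch shuffles example indices and groups them into batches of 256 for a single pass over the data. It is a simplified stand-in, not how tf.data works internally: tf.data shuffles through a fixed-size buffer (5000 elements in the code above) instead of shuffling the whole dataset, and because the chapter calls repeat() before batch(), its batches do not stop at the end of a pass. The helper name `iterate_batches` is hypothetical.

```python
import random

def iterate_batches(num_examples, batch_size, seed=0):
    """Yield lists of shuffled example indices, one list per batch."""
    indices = list(range(num_examples))
    random.Random(seed).shuffle(indices)
    for start in range(0, num_examples, batch_size):
        # The final slice may be shorter than batch_size.
        yield indices[start:start + batch_size]

batches = list(iterate_batches(num_examples=60000, batch_size=256))
print(len(batches), len(batches[0]), len(batches[-1]))
```

With 60,000 examples and a batch size of 256, one pass gives 234 full batches plus a final batch of 96 examples, since 234 × 256 = 59,904.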

Building the model

Now, we define the logistic regression model as a Python class with two methods: __init__ and __call__. As with the linear regression model, the weights and biases are defined in the __init__ method, whereas the formula is defined in the __call__ method.

As each input feature vector has 784 pixel values and there are 10 classes (the digits 0-9), the weight matrix should have shape [784, 10] and the bias should be a 1-D vector of 10 values. We multiply the input vectors by the weights and add the bias to obtain the logits. Finally, a softmax function is applied to normalize the logits into a probability distribution.

class Model:
    def __init__(self):
        self.W = tf.Variable(tf.ones([784, 10]), name="weight")
        self.b = tf.Variable(tf.zeros([10]), name="bias")

    def __call__(self, x):
        return tf.nn.softmax(tf.matmul(x, self.W) + self.b)
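
The softmax step can be sketched in plain Python to show what the normalization does to a vector of logits. This is an illustrative reimplementation under the usual definition of softmax, not the code that tf.nn.softmax runs, and the sample logits are made up.

```python
import math

def softmax(logits):
    """Normalize a list of logits into probabilities that sum to 1."""
    # Subtracting the maximum keeps exp() from overflowing; the result is unchanged.
    shift = max(logits)
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Different logits: the largest logit gets the largest probability.
print(softmax([2.0, 1.0, 0.1]))

# Equal logits: every class receives the same probability.
print(softmax([5.0] * 10))
```

The second case matches the model at initialization. With all-ones weights and a zero bias, every class gets the same logit for a given image, so the untrained model assigns a probability of 0.1 to each of the 10 digits.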

Loss function

Now, we will pass the probabilities predicted by the model to a loss function in order to evaluate the model’s performance. We first one-hot encode the labels using the one_hot() function of TensorFlow.

Then, we compute the cross-entropy loss between the predicted probabilities and the one-hot encoded labels. A second function computes the accuracy of our model. To update the weights and biases on each iteration (epoch), we will use the stochastic gradient descent (SGD) optimizer.

def loss(y_pred, y_true):
    # Encode label to a one hot vector
    y_true = tf.one_hot(y_true, depth=10)
    # Clip prediction values to avoid log(0) error
    y_pred = tf.clip_by_value(y_pred, 1e-9, 1.)
    # Compute cross-entropy
    return tf.reduce_mean(-tf.reduce_sum(y_true * tf.math.log(y_pred),1))

def accuracy(y_pred, y_true):
    # Predicted class is the index of highest score in prediction vector (i.e. argmax).
    correct_prediction = tf.equal(tf.argmax(y_pred, 1), tf.cast(y_true, tf.int64))
    return tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

# Stochastic gradient descent optimizer.
optimizer = tf.optimizers.SGD(learning_rate=0.1)
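
To see what the loss and accuracy functions measure, here is a small plain-Python sketch that applies the same formulas to three hand-written predictions. The probabilities and labels are invented for illustration, and the functions are simplified counterparts of the TensorFlow versions above, working on Python lists instead of tensors.

```python
import math

def cross_entropy(batch_probs, labels, eps=1e-9):
    """Mean cross-entropy for integer labels; equivalent to the one-hot form."""
    total = 0.0
    for probs, label in zip(batch_probs, labels):
        # With a one-hot target, only the true-class term survives the sum.
        p_true = min(max(probs[label], eps), 1.0)
        total += -math.log(p_true)
    return total / len(labels)

def accuracy(batch_probs, labels):
    """Fraction of examples whose highest-probability class equals the label."""
    hits = sum(1 for probs, label in zip(batch_probs, labels)
               if probs.index(max(probs)) == label)
    return hits / len(labels)

batch_probs = [[0.7, 0.2, 0.1],   # correct: true class is 0
               [0.1, 0.3, 0.6],   # correct: true class is 2
               [0.5, 0.4, 0.1]]   # wrong: true class is 1
labels = [0, 2, 1]
print(cross_entropy(batch_probs, labels))
print(accuracy(batch_probs, labels))
```

Two of the three predictions are correct, so the accuracy is 2/3. The loss is the average of -log(0.7), -log(0.6) and -log(0.4): the lower the probability assigned to the true class, the larger that example's contribution.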

Now, for each iteration (epoch) during the model training, we need to:

  • Compute the gradients of the loss with respect to the model parameters: tf.GradientTape() is a context manager that records the operations executed inside it. This record is what makes it possible to compute the gradients.
  • Update the model parameters: After computing the gradients with respect to W and b, update them using the SGD optimizer.

def train(model, x, y):
    with tf.GradientTape() as t:
        pred = model(x)
        current_loss = loss(pred, y)

    # Compute gradients
    gradients = t.gradient(current_loss, [model.W, model.b])
    # Update W and b following gradients.
    optimizer.apply_gradients(zip(gradients, [model.W, model.b]))
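
For intuition about what the tape and the optimizer compute together, the sketch below performs the same kind of update by hand on a tiny model in plain Python. It relies on a standard result: for softmax followed by cross-entropy, the gradient of the loss with respect to each logit is the predicted probability minus the one-hot target. The function name `sgd_step`, the toy dimensions (4 features, 3 classes) and the input values are assumptions made for the example.

```python
import math

def softmax(logits):
    shift = max(logits)
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sgd_step(W, b, x, label, lr):
    """One gradient descent step on a single example; returns the loss before the update."""
    num_classes = len(b)
    logits = [sum(x[i] * W[i][k] for i in range(len(x))) + b[k]
              for k in range(num_classes)]
    probs = softmax(logits)
    loss = -math.log(probs[label])
    for k in range(num_classes):
        # Gradient of the loss with respect to logit k: p_k - y_k.
        grad_logit = probs[k] - (1.0 if k == label else 0.0)
        for i in range(len(x)):
            W[i][k] -= lr * grad_logit * x[i]
        b[k] -= lr * grad_logit
    return loss

W = [[1.0] * 3 for _ in range(4)]   # all-ones weights, as in the model above
b = [0.0] * 3
x = [0.5, 0.0, 1.0, 0.25]
step_losses = [sgd_step(W, b, x, label=2, lr=0.1) for _ in range(5)]
print(step_losses)
```

The first recorded loss is log(3), because the all-ones weights give each of the 3 classes the same probability. Each later value is smaller, as the update shifts probability toward the true class.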

Model training

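Finally, the model is initialized and trained for 60 iterations (epochs). Each iteration in the loop below computes the loss on the full training set and then performs one gradient descent step.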

# Initialize the model
model = Model()
epochs = 60
losses = []

for epoch_count in range(epochs):
    current_loss = loss(model(x_train), y_train)
    # Record the loss so it can be plotted later
    losses.append(current_loss.numpy())
    # Train the model
    train(model, x_train, y_train)


Finally, we can plot the loss value recorded at each epoch using the matplotlib library to see how it decreases during training.

import matplotlib.pyplot as plt
# Visualizing the loss function
plt.plot(losses)
plt.xlabel('Num of epochs')
From the above graph, we can clearly see how the value of loss is decreasing over each epoch. Running the training for a higher number of epochs may decrease the loss even further. So feel free to try it out!

In the next chapter, you will get introduced to building Neural Networks in a more TensorFlow-ic way.
