Stochastic Gradient Descent (SGD) is an optimization algorithm used to find the values of the parameters (coefficients) of a function that minimize a cost function (also called an objective function).
In machine learning, we mainly use stochastic gradient descent to update the parameters of our model. Parameters may, for instance, be the coefficients in linear regression or the weights in a neural network.
Stochastic gradient descent is a popular algorithm for training a wide range of models in machine learning, including (linear) support vector machines, logistic regression, and graphical models. When combined with the backpropagation algorithm, it is the de facto standard for training artificial neural networks. More recently, SGD has been applied to the large-scale and sparse machine learning problems often encountered in text classification and natural language processing.
Stochastic gradient descent is very similar to traditional gradient descent. The difference is that at each step, instead of computing the gradient over the whole dataset, SGD picks one data point at random (hence the name stochastic) and updates the parameters using the gradient on that single point.
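The one-point-at-a-time update can be sketched with a toy, one-parameter example (the data and learning rate here are hypothetical, chosen purely for illustration, and are not part of this chapter's dataset):

```python
import numpy as np

# Toy problem: recover w = 2 in y = 2x by SGD on the squared error.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=100)
y = 2.0 * X

w = 0.0     # single coefficient to learn
lr = 0.1    # learning rate

for step in range(1000):
    i = rng.integers(len(X))                # pick 1 data point at random
    grad = 2 * (w * X[i] - y[i]) * X[i]     # gradient of (w*x - y)^2 on that point
    w -= lr * grad                          # move against the gradient

print(round(w, 2))  # should be close to 2.0
```

Traditional (batch) gradient descent would average the gradient over all 100 points before each update; SGD trades that exactness for much cheaper steps.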
Implementation of Stochastic Gradient Descent Classifier in Python
1. Import the necessary libraries/modules
Some essential Python libraries are needed, namely NumPy (for numerical computation), Pandas (for data loading and preprocessing), and some modules of Sklearn (for model development and prediction). Let's import the other necessary libraries before we import modules of Sklearn:
# Import necessary libraries
import numpy as np
import pandas as pd
2. Import and Inspect the dataset
After importing the necessary libraries, the pandas function read_csv() is used to load the CSV file and store it as a pandas DataFrame object. The dataset is then inspected with the DataFrame's head() function, as shown below. This dataset consists of logs that tell which users purchased a particular product and which did not, given the other features (User ID, Gender, Age, Estimated Salary):
# Import and inspect the dataset
dataset = pd.read_csv("master/Social_Network_Ads.csv")
dataset.head()
3. Separate Dependent and Independent variables
After inspecting the dataset, the independent variables (X) and the dependent variable (y) are separated using iloc for slicing, as shown below. The goal is to predict whether a user purchased the product given Estimated Salary and Age. So the features Estimated Salary and Age (X) are the independent variables and Purchased (y) is the dependent variable, with their values shown below.
# Separate dependent and independent variables
X = dataset.iloc[:, [2, 3]].values
y = dataset.iloc[:, 4].values
print("X:\n", X)
print("\ny:\n", y)
4. Split the dataset into train and test sets and apply feature scaling
After separating the independent variables (X) and the dependent variable (y), these values are split into train and test sets to train and evaluate the model. To split into train and test sets, the train_test_split function from Sklearn is used, with the test set taking 25 percent of the available data, as shown below. Here X_train and y_train are the train sets and X_test and y_test are the test sets. The data is then scaled using the StandardScaler class from Sklearn, which standardizes features by removing the mean and scaling to unit variance:
# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
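Note that fit_transform is called on the training set but only transform on the test set: the mean and standard deviation are learned from the training data and reused, so the test set is scaled consistently without leaking information. What StandardScaler computes can be sketched in plain NumPy (the values below are hypothetical and only illustrate the mechanics):

```python
import numpy as np

# Hypothetical "training" data: two columns on very different scales (age, salary).
X_demo = np.array([[20.0, 20000.0],
                   [30.0, 50000.0],
                   [40.0, 80000.0]])

mean = X_demo.mean(axis=0)          # statistics learned from the training data
std = X_demo.std(axis=0)
X_scaled = (X_demo - mean) / std    # zero mean, unit variance per column

# New "test" data reuses the training mean/std, mirroring transform():
X_new = np.array([[25.0, 35000.0]])
X_new_scaled = (X_new - mean) / std
```

Scaling matters for SGD in particular, because features on very different scales (like Age vs. Estimated Salary) make the gradient steps poorly conditioned.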
5. Fit SGD Classifier model to the dataset
After splitting the data into train and test sets, the SGD Classifier model is fitted on the train sets (i.e., X_train and y_train) using the SGDClassifier class, specifying some parameters. The SGDClassifier class implements a plain stochastic gradient descent learning routine that supports different loss functions and penalties for classification. The available loss functions include 'hinge', 'log', 'modified_huber', 'squared_hinge', and so on. Here, we use the 'hinge' loss, which gives a linear SVM classifier. The parameter alpha is a constant that multiplies the regularization term, and max_iter is the maximum number of passes over the training data (aka epochs).
# Fitting SGD Classifier to the Training set
from sklearn.linear_model import SGDClassifier
classifier = SGDClassifier(loss="hinge", alpha=0.01, max_iter=200)
classifier.fit(X_train, y_train)
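Since the 'hinge' loss yields a linear decision boundary, the fitted model exposes its learned weights via coef_ and intercept_. A self-contained sketch on a hypothetical, linearly separable toy dataset (not the chapter's Social_Network_Ads data):

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

# Hypothetical separable toy data: class 1 when x1 + x2 > 0.
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(100, 2))
y_toy = (X_toy[:, 0] + X_toy[:, 1] > 0).astype(int)

clf = SGDClassifier(loss="hinge", alpha=0.01, max_iter=200, random_state=0)
clf.fit(X_toy, y_toy)

print(clf.coef_)       # weights of the linear decision boundary
print(clf.intercept_)  # bias term
print(clf.predict([[1.0, 1.0]]))  # expected class 1, since x1 + x2 > 0
```

Because SGD visits points in random order, results vary run to run unless random_state is fixed, which is why a seed is set in this sketch.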
6. Predict the test results
Finally, the model is evaluated on the test data: its predictions are compared with the actual values and summarized in a confusion matrix, as shown below:
# Predicting the Test set results
y_pred = classifier.predict(X_test)

# Making the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
print(cm)
OUTPUT:
[[66  2]
 [ 9 23]]
In this chapter, you got familiar with the SGD Classifier and its implementation in Python. Now head on to the next chapter in this course, on Kernel Methods.