# Supervised Machine Learning with Python (Course VI)

July 13, 2020 2020-08-04 10:50

### Support Vector Regression

So far, we have learned several techniques for predicting the value of a dependent variable from its independent features. In this lesson, we look at an algorithm based on support vectors that can perform both linear and non-linear regression.

#### What is Support Vector Regression?

Support Vector Regression (SVR) is a supervised learning model that can perform both linear and nonlinear regression. In the previous lessons, we learned that the goal of linear regression is to minimize the error between the predictions and the data. The goal of SVR is different: it tries to keep the errors within a chosen threshold (usually denoted $\epsilon$). In SVR, we fit as many instances as possible between the boundary lines while limiting margin violations, that is, points that fall outside those lines. The following terms are used to describe how an SVR model works.

**Kernel**: The function used to map lower-dimensional data into a higher-dimensional space.

**Hyperplane**: In classification, the hyperplane is the line separating the data classes. In Support Vector Regression, it is the line used to predict the continuous target value.

**Boundary lines (decision boundaries)**: Two lines drawn on either side of the hyperplane that mark the margin. The best-fit line is the hyperplane that has the maximum number of points inside its boundary lines.

**Support Vectors**: The data points closest to the boundary lines; they can lie on the boundary lines or outside them.
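The threshold idea can be made concrete with the ε-insensitive loss that SVR is usually described with: residuals inside the boundary lines cost nothing, and residuals outside them are penalized by how far they overshoot. The sketch below is illustrative only; the function name `epsilon_insensitive_loss` and the sample numbers are assumptions, not part of this lesson's code.

```python
import numpy as np

def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """Per-point loss: zero inside the tube, linear outside it."""
    residual = np.abs(np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float))
    return np.maximum(0.0, residual - epsilon)

# Hypothetical targets and predictions, chosen only for illustration
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.05, 2.5, 3.0, 3.0]

losses = epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1)
outside_tube = losses > 0  # points that violate the margin

print(losses)
print(outside_tube)
```

Only the second and fourth points fall outside the boundary lines here, so only they contribute to the loss.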

#### Support Vector Regression in Python

This section will walk you through a step-wise Python implementation of the prediction process that we just discussed.

###### 1. Importing necessary libraries

First, let us import some essential Python libraries.

```python
# Importing the libraries
import numpy as np  # for array operations
import pandas as pd  # for working with DataFrames
import requests, io  # for HTTP requests and I/O commands
import matplotlib.pyplot as plt  # for data visualization
%matplotlib inline

# scikit-learn modules
from sklearn.model_selection import train_test_split  # for splitting the data
from sklearn.metrics import mean_squared_error  # for calculating the cost function
from sklearn.preprocessing import StandardScaler  # for scaling the data
from sklearn.svm import SVR  # for building the model
```

###### 2. Importing the data set

For this problem, we will load a CSV data set of temperature and pressure logs from the URL shown in the code below. The file is downloaded with requests and read with the read_csv() function from the pandas library, which stores it as a pandas DataFrame object.

```python
# Importing the dataset from the url of the data set
url = "https://forge.scilab.org/index.php/p/rdataset/source/file/master/csv/datasets/pressure.csv"
data = requests.get(url).content

# Reading the data
dataset = pd.read_csv(io.StringIO(data.decode('utf-8')), index_col = 'Unnamed: 0')
dataset.head()
```

```python
dataset.describe()
```
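To see what `index_col = 'Unnamed: 0'` does without any network access, the same read can be tried on an in-memory string. The rows and the column names below are made up for illustration and are an assumption about the file layout, not a copy of the real data.

```python
import io
import pandas as pd

# Made-up rows in an assumed layout: the first header cell is empty,
# so pandas names that column 'Unnamed: 0'.
csv_text = '''"","temperature","pressure"
"1",0,0.5
"2",20,1.5
"3",40,4.0
'''

# Using the unnamed first column as the index keeps it out of the features
sample = pd.read_csv(io.StringIO(csv_text), index_col='Unnamed: 0')
print(sample.head())
print(sample.shape)
```

With the row labels moved into the index, only the two data columns remain for the later steps.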

###### 3. Separating the features and the target variable

After loading the dataset, the independent variable ($x$) and the dependent variable ($y$) need to be separated. Our concern is to find the relationships between the feature (Temperature) and the target variable (Pressure).

```python
x = dataset.iloc[:, [0]].values  # Temperature values
y = dataset.iloc[:, [1]].values  # Pressure values
```
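The column index is wrapped in a list (`[0]` rather than `0`) so that the result stays two-dimensional, which is the shape the scaler in the next step expects. A small sketch on a hypothetical frame (the values and column names are assumptions) shows the difference:

```python
import pandas as pd

# Hypothetical two-column frame standing in for the lesson's dataset
frame = pd.DataFrame({"temperature": [0, 20, 40], "pressure": [0.5, 1.5, 4.0]})

column_2d = frame.iloc[:, [0]].values  # list selector keeps the column axis
column_1d = frame.iloc[:, 0].values    # scalar selector drops it

print(column_2d.shape)  # (3, 1)
print(column_1d.shape)  # (3,)
```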

###### 4. Feature Scaling

The data is scaled using scikit-learn's StandardScaler class, which standardizes each variable to zero mean and unit variance.

```python
# Feature scaling
sc_x = StandardScaler()
x = sc_x.fit_transform(x)
sc_y = StandardScaler()
y = sc_y.fit_transform(y)
```
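As a quick check of what standardization does, the sketch below scales a small hypothetical column (the numbers are assumptions) and then reverses the transformation with `inverse_transform`, which is useful later for reading predictions in the original units.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical single-feature data, shaped (n_samples, 1)
values = np.array([[10.0], [20.0], [30.0], [40.0]])

scaler = StandardScaler()
scaled = scaler.fit_transform(values)

print(scaled.mean(), scaled.std())           # approximately 0 and 1
restored = scaler.inverse_transform(scaled)  # back to the original units
print(restored.ravel())
```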

###### 5. Splitting the data into a train set and a test set

We use the train_test_split() function of scikit-learn to split the data into a train set and a test set. We will use 20% of the available data as the testing set and the remaining data as the training set.

```python
# Splitting the dataset into training and testing set (80/20)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.2, random_state = 28)
```
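The effect of `test_size` and `random_state` can be checked on a toy array. The ten samples below are hypothetical; the sketch only shows the resulting sizes and that a fixed `random_state` reproduces the same split.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Ten hypothetical samples with one feature each
features = np.arange(10).reshape(-1, 1)
targets = np.arange(10) * 2.0

f_train, f_test, t_train, t_test = train_test_split(
    features, targets, test_size=0.2, random_state=28
)
print(f_train.shape, f_test.shape)

# Repeating the call with the same random_state gives the same split
f_train_2, f_test_2, _, _ = train_test_split(
    features, targets, test_size=0.2, random_state=28
)
same_split = np.array_equal(f_test, f_test_2)
print(same_split)
```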

###### 6. Fitting the model to the training set

After splitting the data into a training set and a testing set, the Support Vector Regression model is fitted to the training data using the SVR() class from scikit-learn.

```python
# Initializing the SVR model with an RBF kernel
model = SVR(kernel = 'rbf')

# Fitting the SVR model to the data
model.fit(x_train, y_train.ravel())
```

SVR(C=1.0, cache_size=200, coef0=0.0, degree=3, epsilon=0.1, gamma='auto', kernel='rbf', max_iter=-1, shrinking=True, tol=0.001, verbose=False)
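The printed representation lists the hyperparameters the model was built with. Default values, `gamma` in particular, can differ between scikit-learn versions, so one option is to pass the main ones explicitly. The sketch below does that on synthetic data; the data and the names `x_demo`, `y_demo`, and `demo_model` are assumptions made for illustration.

```python
import numpy as np
from sklearn.svm import SVR

# Synthetic data for illustration only (not the lesson's data set)
rng = np.random.default_rng(0)
x_demo = np.linspace(-2, 2, 40).reshape(-1, 1)
y_demo = np.sin(x_demo).ravel() + rng.normal(scale=0.05, size=40)

# Values copied from the printed output above; gamma is set explicitly
# because its default may differ between scikit-learn versions.
demo_model = SVR(kernel="rbf", C=1.0, epsilon=0.1, gamma="auto")
demo_model.fit(x_demo, y_demo)

params = demo_model.get_params()
print(params["kernel"], params["C"], params["epsilon"], params["gamma"])

# Number of training points that ended up as support vectors
print(len(demo_model.support_))
```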

###### 7. Calculating the loss after training

Let us now calculate the loss between the actual target values in the testing set and the values predicted by the model with the use of a cost function called the Root Mean Square Error (RMSE).

$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_{i} - \hat{y_{i}})^{2}}$$

where,

$y_i$ is the actual target value,

$\hat{y_{i}}$ is the predicted target value, and

$n$ is the total number of data points.

The RMSE measures the absolute fit of the model to the data. In other words, it indicates how close the actual data points are to the model's predicted values. A lower RMSE indicates a better fit, which makes it a useful measure of the accuracy of the model's predictions. Because the target was standardized in step 4, the RMSE here is expressed in standard deviations of the pressure values rather than in the original pressure units.
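The formula can be followed term by term in a few lines of plain Python. The helper name `rmse` and the numbers are hypothetical, picked so that the arithmetic is easy to verify by hand.

```python
import math

def rmse(actual, predicted):
    """Root mean square error, following the formula above term by term."""
    n = len(actual)
    squared_errors = [(y - y_hat) ** 2 for y, y_hat in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / n)

# Hypothetical values: errors are 1, -1, 3, -5, so the squares sum to 36
actual = [10, 20, 30, 40]
predicted = [9, 21, 27, 45]

result = rmse(actual, predicted)
print(result)  # 3.0
```

Here the mean of the squared errors is 36 / 4 = 9, and its square root is 3.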

```python
# Predicting the target values of the test set
y_pred = model.predict(x_test)

# RMSE (Root Mean Square Error)
rmse = float(format(np.sqrt(mean_squared_error(y_test, y_pred)), '.3f'))
print("\nRMSE: ", rmse)
```

RMSE: 1.005
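Since the target values were standardized before training, this RMSE is on the standardized scale. One way to express it in the original units, sketched here on hypothetical numbers, is to multiply it by the scaler's `scale_` attribute (the standard deviation it divided by). The names `sc_y_demo`, `y_original`, and `pred_original` are assumptions.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical targets and predictions in their original units
y_original = np.array([[2.0], [4.0], [6.0], [8.0]])
pred_original = np.array([[3.0], [4.0], [5.0], [8.0]])

sc_y_demo = StandardScaler().fit(y_original)
y_scaled = sc_y_demo.transform(y_original)
pred_scaled = sc_y_demo.transform(pred_original)

# RMSE on the standardized scale, as computed in this lesson
rmse_scaled = np.sqrt(np.mean((y_scaled - pred_scaled) ** 2))

# Standardization divides by scale_, so errors convert back by multiplying
rmse_converted = rmse_scaled * sc_y_demo.scale_[0]

# RMSE computed directly in the original units, for comparison
rmse_direct = np.sqrt(np.mean((y_original - pred_original) ** 2))
print(rmse_converted, rmse_direct)
```

Both values agree, so the conversion only needs the fitted scaler and the standardized RMSE.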

**Putting it all together**

The final code for the implementation of **Support Vector Regression in Python** is as follows.

```python
# Importing the libraries
import numpy as np  # for array operations
import pandas as pd  # for working with DataFrames
import requests, io  # for HTTP requests and I/O commands
import matplotlib.pyplot as plt  # for data visualization
%matplotlib inline

# scikit-learn modules
from sklearn.model_selection import train_test_split  # for splitting the data
from sklearn.metrics import mean_squared_error  # for calculating the cost function
from sklearn.preprocessing import StandardScaler  # for scaling the data
from sklearn.svm import SVR  # for building the model

# Importing the dataset from the url of the data set
url = "https://forge.scilab.org/index.php/p/rdataset/source/file/master/csv/datasets/pressure.csv"
data = requests.get(url).content

# Reading the data
dataset = pd.read_csv(io.StringIO(data.decode('utf-8')), index_col = 'Unnamed: 0')

x = dataset.iloc[:, [0]].values  # Temperature values
y = dataset.iloc[:, [1]].values  # Pressure values

# Feature scaling
sc_x = StandardScaler()
x = sc_x.fit_transform(x)
sc_y = StandardScaler()
y = sc_y.fit_transform(y)

# Splitting the dataset into training and testing set (80/20)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.2, random_state = 28)

# Initializing the SVR model with an RBF kernel
model = SVR(kernel = 'rbf')

# Fitting the SVR model to the data
model.fit(x_train, y_train.ravel())

# Predicting the target values of the test set
y_pred = model.predict(x_test)

# RMSE (Root Mean Square Error)
rmse = float(format(np.sqrt(mean_squared_error(y_test, y_pred)), '.3f'))
print("\nRMSE: ", rmse)
```
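One possible variation on the same steps, not the approach used in this lesson, wraps the scaling and the model together so that the scalers are fitted on the training rows only and predictions come back in the original units. The sketch uses scikit-learn's `make_pipeline` and `TransformedTargetRegressor` on synthetic stand-in data; the data and all variable names are assumptions.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in data (an assumption, not the pressure data set)
rng = np.random.default_rng(28)
x_demo = np.linspace(0, 10, 50).reshape(-1, 1)
y_demo = 3.0 * x_demo.ravel() + rng.normal(scale=0.5, size=50)

x_tr, x_te, y_tr, y_te = train_test_split(
    x_demo, y_demo, test_size=0.2, random_state=28
)

# Feature scaling and SVR in one pipeline; the target is scaled and
# un-scaled automatically by TransformedTargetRegressor.
wrapped = TransformedTargetRegressor(
    regressor=make_pipeline(StandardScaler(), SVR(kernel="rbf")),
    transformer=StandardScaler(),
)
wrapped.fit(x_tr, y_tr)

pred = wrapped.predict(x_te)  # already in the original target units
rmse_demo = float(np.sqrt(np.mean((y_te - pred) ** 2)))
print(round(rmse_demo, 3))
```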

In this lesson, we learned about Support Vector Regression and its implementation in Python.

This marks the end of the regression section of this course. We will now move on to the concept of classification and implement different kinds of classification algorithms in machine learning using Python.