SVM in Machine Learning-Quick Guide

In this guide to SVM in machine learning, we’re going to talk about the R package e1071.

We’ll learn how to use R to train and test SVM models, as well as the core functions of the e1071 package, such as svm(), predict(), plot(), and tune().

Let’s get started with the tutorial.

SVM in Machine Learning

In R, there are various packages that can be used to run SVM. The e1071 package is the first and most intuitive.

About the e1071 package:

It was the first package to implement SVM in R.

Its svm() function provides a rigorous interface to libsvm, together with visualization and parameter tuning methods.

Some of the libsvm library’s features are listed below:

SVMs may be implemented quickly and easily.

Most common kernels are included, such as linear, polynomial, RBF, and sigmoid.

Computes decision values as well as probability estimates for predictions. It also offers class weighting in classification mode, and cross-validation.

To begin, install the e1071 package from CRAN and load it into your R session. If the package lives in a non-default library directory, add that directory to the library path with .libPaths() first.

The command ‘?svm’ displays the interface’s help information.

Install and load the e1071 package with the following commands.


install.packages('e1071', dependencies = TRUE)
library(e1071)

The S3 class mechanism is used in the R implementation. It has a predict() method and a training function with standard and formula interfaces.
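To make the two training interfaces concrete, here is a minimal sketch that fits the same model both ways on the built-in iris data. The object names (fit_formula, fit_xy, and so on) are our own and purely illustrative.

```r
library(e1071)
data(iris)

# Formula interface: response on the left, predictors on the right.
fit_formula <- svm(Species ~ ., data = iris)

# Standard (x, y) interface: predictors as a matrix, response as a factor.
x <- as.matrix(iris[, -5])
y <- iris$Species
fit_xy <- svm(x, y)

# Both objects inherit from the S3 class "svm", so predict() dispatches
# to the same method for either of them.
inherits(fit_formula, "svm")
inherits(fit_xy, "svm")

pred_formula <- predict(fit_formula, iris)
pred_xy <- predict(fit_xy, x)
```

The formula interface is usually more convenient when the data already sits in a data frame; the (x, y) form is handy when the predictors are already a numeric matrix.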

Data, support vectors, and decision boundaries can all be visualized using the plot() method.

The tune() framework can be used to do hyperparameter tuning. It runs a grid search over the parameter ranges you specify.
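As a rough illustration of such a grid search, the sketch below tunes gamma and cost for an SVM on iris. The parameter ranges, the seed, and the choice of 5-fold cross-validation are arbitrary values picked for the example, not recommendations.

```r
library(e1071)
data(iris)

set.seed(1)  # cross-validation folds are sampled at random

# Try every combination of the listed gamma and cost values.
tuned <- tune(
  svm, Species ~ ., data = iris,
  ranges = list(gamma = 2^(-3:1), cost = 2^(0:3)),
  tunecontrol = tune.control(sampling = "cross", cross = 5)
)

tuned$best.parameters    # gamma/cost pair with the lowest error
tuned$best.performance   # the corresponding cross-validated error
best_model <- tuned$best.model  # model refit with the best pair
```

Calling summary(tuned) lists the error for every combination in the grid, which helps to see whether the best pair sits on the edge of the range and the grid should be widened.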

Learn how to develop S3 and S4 classes in R using the Object-Oriented Programming in R lesson.

e1071 Package Functions

The following are the major functions of the e1071 package:

svm() – Trains an SVM model.

predict() – Returns the model’s predictions and, on request, the decision values of the binary classifiers.

plot() – Visualizes data, support vectors, and decision boundaries.

tune() – Performs a grid search over the supplied parameter ranges for hyperparameter tuning.
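The following sketch shows how predict() can return decision values and class probabilities in addition to the predicted labels. It assumes the iris data; note that probability estimates are only available if the model was trained with probability = TRUE.

```r
library(e1071)
data(iris)

# probability = TRUE at training time enables probability output later.
fit <- svm(Species ~ ., data = iris, probability = TRUE)

pred <- predict(fit, iris, decision.values = TRUE, probability = TRUE)

head(pred)  # predicted class labels (a factor)

# Decision values: one column per pair of classes.
dec <- attr(pred, "decision.values")
dim(dec)
colnames(dec)

# Probability estimates: one column per class.
prob <- attr(pred, "probabilities")
head(prob)
```

Both extras are attached to the prediction vector as attributes, so the return value can still be used as an ordinary factor of labels.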

1. The svm() Function

The svm() function is used to train an SVM. It can carry out general regression and classification, as well as density estimation. It also provides a formula interface.

The following are some of the important parameters of the svm() function.

data – An optional data frame containing the variables in the model. If you use the formula interface together with this parameter, you don’t need the x and y parameters.

By default, variables are taken from the environment in which svm() is called.

x – A data matrix, a vector, or a sparse matrix (an object of class Matrix provided by the Matrix package). It holds the dataset’s instances and their features.

In a data matrix, rows represent instances, whereas columns represent features.

type – svm() can be used as a classification machine, a regression machine, or for novelty detection.

The default depends on y: if y is a factor, the type defaults to C-classification; otherwise it defaults to eps-regression.

The default can be overridden by setting a value explicitly. Valid choices include:

C-classification

nu-classification

one-classification (for novelty detection)

eps-regression

nu-regression

degree – Required for the polynomial kernel (default: 3).

gamma – Required by all kernels except the linear one (default: 1/(data dimension)).

coef0 – Required for the polynomial and sigmoid kernels (default: 0).

cost – The cost of constraint violations (default: 1). It is the C constant of the regularisation term in the Lagrange formulation.
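To see these parameters in use, the sketch below fits a polynomial-kernel classifier with every kernel parameter spelled out, and then a regression model that relies on the default type. The kernel argument is part of svm() although it is not described in the list above; the specific values of coef0 and cost are arbitrary choices for illustration.

```r
library(e1071)
data(iris)

# Classification with a polynomial kernel and explicit parameters.
fit_poly <- svm(
  Species ~ ., data = iris,
  type   = "C-classification",
  kernel = "polynomial",
  degree = 3,
  gamma  = 0.25,  # 1 / number of predictors (4 for iris)
  coef0  = 1,
  cost   = 10
)
pred_class <- predict(fit_poly, iris)  # a factor of class labels

# Numeric response and no explicit type: svm() fits a regression.
fit_reg <- svm(
  Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width,
  data = iris
)
pred_num <- predict(fit_reg, iris)  # a numeric vector
```

The fitted objects store the values that were used, for example fit_poly$cost and fit_poly$degree, which is a quick way to confirm what a model was trained with.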


2. The plot() Function

The plot() method draws a scatter plot of the input data for a fitted classification model, highlighting the classes and the support vectors. If desired, it can also draw a filled contour plot of the class regions.

Let’s look at its signature for objects of class svm.

plot.svm(x, data, formula, fill = TRUE, grid = 50, slice = list(), symbolPalette = palette(), svSymbol = "x", dataSymbol = "o", ...)

Here,

x – An object of class svm.

formula – Formula selecting the two dimensions to plot. Only required when more than two input variables are used.

fill – Indicates whether or not to add a contour plot of the class regions.

grid – The granularity of the contour plot.

slice – A list of named values for the dimensions held constant (only needed when more than two variables are used). Numeric dimensions that are not specified default to 0.

The x argument is the model: a data object of class svm returned by the svm() function.

data – The data to visualize. It should be the same data that was passed to svm() when the model was built.

symbolPalette – The colour palette used for the class that the data points and support vectors belong to.

svSymbol – The symbol used for support vectors.

dataSymbol – The symbol used for data points (other than support vectors).

Together, these options give a simple graphical view of SVM classification models.
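As a small example of these arguments, the sketch below plots an iris model in two of its four dimensions. The slice values are arbitrary constants chosen for illustration.

```r
library(e1071)
data(iris)

fit <- svm(Species ~ ., data = iris)

# iris has four predictors, so the formula picks the two to draw and
# slice holds the remaining two at fixed values.
plot(
  fit, iris,
  Petal.Width ~ Petal.Length,
  slice = list(Sepal.Width = 3, Sepal.Length = 4)
)
```

With the default symbols, support vectors appear as "x" and the other data points as "o", drawn on top of the filled class regions.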

Using R to Create an SVM Model

In R, we’ll utilise the e1071 library and the iris dataset to build our SVM model.

library("e1071")
library("caret")
data("iris")
head(iris)

   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa

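Next, we split the dataset into x and y variables.

The x variables are the four predictors (Sepal.Length, Sepal.Width, Petal.Length, and Petal.Width), whereas y is the Species column, whose levels are setosa, versicolor, and virginica. The model itself is fit through the formula interface on the full data frame; x and y are used afterwards for prediction and evaluation.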

x <- iris[, -5]   # the four predictor columns
y <- iris[5]      # one-column data frame holding Species
model_svm <- svm(Species ~ ., data = iris)   # default settings
summary(model_svm)

Call:
svm(formula = Species ~ ., data = iris)
Parameters:
   SVM-Type:  C-classification
 SVM-Kernel:  radial
       cost:  1
Number of Support Vectors:  51
 ( 8 22 21 )
Number of Classes:  3
Levels:
 setosa versicolor virginica

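In the last step, we make predictions from our input variables x. Note that these are the same observations the model was trained on, so the resulting accuracy is an optimistic estimate.

Then, to compare the SVM predictions with the true class labels, we generate a confusion matrix.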

pred <- predict(model_svm,x)
confusionMatrix(pred,y$Species)

Confusion Matrix and Statistics
            Reference
Prediction   setosa versicolor virginica
  setosa         50          0         0
  versicolor      0         48         2
  virginica       0          2        48
Overall Statistics
               Accuracy : 0.9733         
                 95% CI : (0.9331, 0.9927)
    No Information Rate : 0.3333         
    P-Value [Acc > NIR] : < 2.2e-16      
                  Kappa : 0.96            
 Mcnemar's Test P-Value : NA             
Statistics by Class:
                     Class: setosa Class: versicolor Class: virginica
Sensitivity                 1.0000            0.9600           0.9600
Specificity                 1.0000            0.9800           0.9800
Pos Pred Value              1.0000            0.9600           0.9600
Neg Pred Value              1.0000            0.9800           0.9800
Prevalence                  0.3333            0.3333           0.3333
Detection Rate              0.3333            0.3200           0.3200
Detection Prevalence        0.3333            0.3333           0.3333
Balanced Accuracy           1.0000            0.9700           0.9700
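Because the confusion matrix above is computed on the training data, a hold-out evaluation gives a more honest picture. Here is a minimal sketch under our own assumptions: the 70/30 split and the seed are arbitrary, and base R's table() stands in for caret's confusionMatrix().

```r
library(e1071)
data(iris)

set.seed(42)  # make the random split reproducible

# Hold out 30% of the rows for testing.
train_idx <- sample(nrow(iris), size = round(0.7 * nrow(iris)))
train <- iris[train_idx, ]
test  <- iris[-train_idx, ]

# Train on the training rows only, then predict the held-out rows.
fit <- svm(Species ~ ., data = train)
pred <- predict(fit, test)

# Confusion matrix and overall accuracy on unseen data.
conf <- table(Prediction = pred, Reference = test$Species)
conf
accuracy <- sum(diag(conf)) / sum(conf)
accuracy
```

The exact numbers depend on the random split, so expect them to vary with the seed.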

Summary

In this e1071 package tutorial, we talked about training and testing SVM models in R. The major functions of the e1071 package are svm(), plot(), predict(), and tune().

Please comment to us if you have any questions or suggestions about the lesson.