Machine Learning: Support Vector Machine Algorithm

 

What is SVM?

It is a classification technique, which means it is used to split the data into classes. It helps split the data in the best possible way; it provides the best split.

How is SVM going to do that?

It tries to find the widest margin between the two groups of data. It may not consider
every point in the graph; it considers only the points near the separating line
(these points are called support vectors, and the line is called the hyperplane).

The way it works is that it always tries to find the hyperplane which is as far as
possible from the support vectors on either side.

Suppose you are trying to draw a separation line between India and Pakistan using SVM; it ensures that the line is equally far from both countries.

In this case, the support vectors on the Indian side are places like Kashmir and Punjab, and the places in Pakistan that are close to the separation line are the support vectors on the Pakistani side.

Places like AP, Karnataka and Tamil Nadu in India cannot be considered support vectors, as they are far from the separating line.
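As a rough sketch of this idea (using scikit-learn and a made-up toy dataset, not the India/Pakistan example itself), you can fit a linear SVM and then inspect which points it actually kept as support vectors:

from sklearn import svm
from sklearn.datasets import make_blobs

# Two well-separated groups of points (stand-ins for the two "countries").
X, y = make_blobs(n_samples=60, centers=2, random_state=0)

model = svm.SVC(kernel='linear')
model.fit(X, y)

# Only the points closest to the separating hyperplane are kept as support vectors;
# points far from the boundary do not appear here.
print(model.support_vectors_)
print("support vectors per class:", model.n_support_)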

Can I always get an accurate line that separates the data using SVM?

Not always; it depends on the data and other factors.

Are there any ways I can tune the model to get a correct, or at least better, split?

We can tune the model using the following parameters. These parameters can help in
reducing overfitting and getting a better fit.

1) Kernel:
We can make use of kernels when the data is not linearly separable. For example,
if your data forms concentric circles, you cannot separate it with a straight hyperplane;
in such cases you can take the help of kernels, which map the data into a form that is
linearly separable. The available kernels include linear, poly, rbf, sigmoid, etc.
(A short sketch after this list shows the rbf kernel on concentric circles.)

2) Gamma (if you want to control how much farther points like AP or Karnataka influence the boundary, you can play with this parameter):
Low gamma: far points are also considered when drawing the decision boundary.
Higher gamma: only the nearby points matter, which gives curvier lines that fit the training points more tightly (and can overfit).

3) C Parameter:
Controls the trade-off between a smooth decision boundary and classifying the training points exactly.
A bigger C gives curvier lines that classify more training points correctly, at the risk of overfitting; a smaller C gives a smoother boundary.
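To make these knobs concrete, here is a minimal sketch (assuming scikit-learn and a made-up concentric-circles dataset) showing the rbf kernel handling data that a straight hyperplane cannot split, and how gamma and C are passed in; the values used are purely illustrative:

from sklearn import svm
from sklearn.datasets import make_circles

# Concentric circles: not separable by a straight hyperplane.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# A linear kernel struggles on this data...
linear_model = svm.SVC(kernel='linear')
linear_model.fit(X, y)
print("linear kernel accuracy:", linear_model.score(X, y))

# ...while the rbf kernel maps the data into a space where it becomes separable.
# gamma and C are the tuning parameters described above.
rbf_model = svm.SVC(kernel='rbf', gamma=1.0, C=1.0)
rbf_model.fit(X, y)
print("rbf kernel accuracy:", rbf_model.score(X, y))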

Advantages:
Performs well when there is a clear margin of separation between the classes.
Works well on small datasets with a high number of dimensions.

Disadvantages:

When the classes overlap heavily, SVM is not recommended; in those cases Naive Bayes is a better choice.

Doesn't perform well with larger datasets.

Python Code:

from sklearn import svm

model = svm.SVC(kernel='linear')   # kernel can be modified here; scikit-learn's default is 'rbf'
model.fit(x_train, y_train)        # x_train, y_train are your training features and labels

Using the C parameter:
Examples
model = svm.SVC(kernel='linear', C=0.1)   # small C -> smoother boundary (default is 1)
model = svm.SVC(kernel='linear', C=12)    # large C -> tries to classify every training point correctly
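As a usage note (x_test and y_test here are assumed to come from your own train/test split), once the model is fitted you can predict and check accuracy like this:

predictions = model.predict(x_test)    # predicted class for each test point
accuracy = model.score(x_test, y_test) # fraction of test points classified correctly
print(accuracy)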

To conclude, in layman's terms, we can say that SVM prefers a thick gap over a thin gap between the two groups of data.
