Thursday, February 10, 2011

Using LIBSVM as a tool for classification and prediction

Intelligent Systems

Dr. Abdul Rahim Ahmad



By
Omar Salah Farhan  (ST21058)
Anes .A.Shaker       (ST21059)

Contributions to this project: both authors completed the project work jointly.



Analysis with LibSVM
The data is set up for analysis with the LibSVM tools. It contains two classes with N samples each, and it is treated as 2-D data for the analysis, as shown in Figure 1.
Figure 1. 2-D visualization of the data used in the SVM method

We want to find the best value of the parameter C using 2-fold cross-validation (half the data is used to train, the other half to test) and a linear kernel (-t 0). After finding the best value of C, we train on the entire data set. The data set is downloaded from the LIBSVM website and saved in the Matlab work folder.
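With the LIBSVM command-line tools, this search is a loop over calls such as `svm-train -t 0 -v 2 -c <C>`. The selection loop itself can be sketched in stand-alone Python; since running LIBSVM here would require the actual binary, a trivial threshold classifier (the hypothetical `threshold_score` helper, not part of LIBSVM) stands in for the SVM:

```python
# Conceptual sketch of the 2-fold cross-validation grid search described
# above. A trivial 1-D threshold rule stands in for the SVM so the
# selection loop can run on its own.

def two_fold_cv(train_and_score, data, labels):
    """Split the data in half, score each half as the test fold,
    and return the mean accuracy."""
    mid = len(data) // 2
    folds = [(slice(0, mid), slice(mid, None)),
             (slice(mid, None), slice(0, mid))]
    accs = [train_and_score(data[tr], labels[tr], data[te], labels[te])
            for tr, te in folds]
    return sum(accs) / len(accs)

def threshold_score(c):
    """Stand-in for 'train an SVM with parameter c, report test accuracy':
    classifies x as +1 when x > c."""
    def fit_and_score(xtr, ytr, xte, yte):
        preds = [1 if x > c else -1 for x in xte]
        return sum(p == y for p, y in zip(preds, yte)) / len(yte)
    return fit_and_score

# Toy 1-D two-class data: negatives near 0, positives near 1, interleaved
# so that each fold contains both classes.
data   = [0.1, 0.9, 0.2, 1.0, 0.3, 1.1, 0.4, 1.2]
labels = [-1, 1, -1, 1, -1, 1, -1, 1]

# Powers-of-two grid, mirroring the C grid (2^-5 ... 2^7) used with LIBSVM.
grid = [2.0 ** k for k in range(-5, 8)]
best = max(grid, key=lambda c: two_fold_cv(threshold_score(c), data, labels))
```

The same pattern, with the stand-in replaced by an actual `svm-train -v 2` call, reproduces the grid search used in this report.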

The data used for the analysis is downloaded from the LIBSVM website. The diabetes-scale data set is selected for analysis with LibSVM in combination with Matlab simulation. The input features are taken as indicators of diabetes, and the output is the diabetes scale to be measured.
Using the SVM method, we solve the problem of obtaining the diabetes scale as the final outcome from the supplied diabetes-indicator inputs.
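The downloaded diabetes_scale file is in LIBSVM's sparse text format: one sample per line, written as `label index:value index:value ...`. A minimal sketch of a parser for this format (the example line is made up for illustration, not taken from the real data set):

```python
# Parse one line of LIBSVM's sparse "label index:value ..." text format.

def parse_libsvm_line(line):
    parts = line.split()
    label = float(parts[0])          # class label comes first
    features = {}
    for tok in parts[1:]:            # remaining tokens are index:value pairs
        idx, val = tok.split(":")
        features[int(idx)] = float(val)
    return label, features

# Hypothetical example line in the same format as diabetes_scale.
label, feats = parse_libsvm_line("-1 1:0.058 2:0.44 3:0.40")
```

Tools such as LIBSVM's own readers perform this parsing internally; the sketch just shows what the file layout means.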



A table of the cross-validation training results for all the combinations of C and gamma values follows:


Cross-validation accuracy (%); rows are gamma (g) values, columns are C values.

g \ C     0.03125  0.0625  0.125  0.25   0.5    1      2      4      8      16     32     64     128
0.03125   69       69.5    70     70     70.5   71     72     74     73     71     69     67.5   68
0.0625    70       71.5    72     72.5   73     73     73.5   75     74     72     69     67.5   68
0.125     71.5     72      72     72.75  73.5   74     74.5   75.5   74.5   74     72     70     68.5
0.25      72       72.5    73     74     74     75     76     77     75     74.5   72.5   71     69
0.5       72.5     73      73.25  75     75     75.5   77     78     76     75     73     72     70
1         74       74.25   74     74.5   75.5   76     77.5   79     77     76     74     72.5   71
2         73       73.75   74     74     74.5   77.5   77.5   77.75  75.5   74.5   73     72     70
4         73       73.5    73.5   73.5   74     76     76.5   77     74     73     72.5   71.5   69.5
8         72.5     73      73.25  73.25  73.75  75     75.5   76     73     73     72     71     69
16        71       72      73     73.25  73.5   74     75     76.5   72.5   72     71.5   70.5   68.5
32        70       71.5    72     72.5   73     73.5   74     74     72     72     71     70     68
64        69.5     70      71     72     72.5   72.5   73.5   73.5   71.5   71.5   70.5   68.5   67
128       69       69      70     71     71     72     72.5   73     71     71     70     68     67



Best C: 4
Best gamma: 0.99 (approximately 1)
Highest accuracy: 79.02 %
Time taken for the whole cross-validation training: 5.87 seconds
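Reading the best parameter pair off the table amounts to an argmax over the grid. A small sketch, using a few entries copied from the cross-validation table above:

```python
# Select the (C, gamma) pair with the highest cross-validation accuracy.
# The entries below are a subset of the full grid results reported above.

cv_accuracy = {
    (2.0, 0.5): 77.0, (4.0, 0.5): 78.0,
    (2.0, 1.0): 77.5, (4.0, 1.0): 79.0, (8.0, 1.0): 77.0,
}

# Argmax over the dictionary keys by accuracy value.
best_c, best_gamma = max(cv_accuracy, key=cv_accuracy.get)
```

This matches the table's peak at C = 4, gamma near 1.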


 
 
Results for the SVM two-class model

In order to build an LS-SVM model, we need two extra parameters:
gamma (gam), the regularization parameter, which determines the
trade-off between minimizing the fitting error and smoothness; and,
in the common case of the RBF kernel, sigma^2 (sig2), the bandwidth.
The parameters of the SVM method for the two-class case are shown
in Table 1.
Table 1. Parameters used in the SVM method with two classes

gam        10
sig2       0.2
type       'classification'
[alpha,b]  trainlssvm({X,Y,type,gam,sig2,'RBF_kernel'})
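For reference, sig2 enters as the bandwidth of the RBF kernel. A small sketch under the LS-SVMlab-style convention K(x, z) = exp(-||x - z||^2 / sig2); note that LIBSVM's -g gamma plays the role of 1/sig2 in this pairing, up to the factor-of-two convention some texts use:

```python
import math

# RBF kernel with explicit bandwidth sig2, as tuned in the report:
# K(x, z) = exp(-||x - z||^2 / sig2).

def rbf_kernel(x, z, sig2):
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / sig2)

# Identical points give similarity exactly 1; similarity decays toward 0
# as the points move apart, faster when sig2 is small.
same_point = rbf_kernel([1.0, 2.0], [1.0, 2.0], 0.2)
far_points = rbf_kernel([0.0], [1.0], 0.2)
```

A small sig2 such as the 0.2 in Table 1 therefore makes the kernel very local, which is consistent with tuning it on scaled data.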


Results for the SVM multiclass

The SVM multiclass simulation is performed
with LibSVM in combination with MATLAB.
The simulation results are reported below in sequence.
 1. Coupled Simulated Annealing results:  [gam]         0.98774
                                          [sig2]        40.2467
                                          F(X)=         0.17
 2. Optimization routine:           simplex
    cost function:                  crossvalidatelssvm
    kernel function                 RBF_kernel

 3. starting values:                   0.987738      40.2467
 Iteration   Func-count    min f(x)    log(gamma)    log(sig2)    Procedure

     1           3     1.700000e-001     -0.0123        3.6950      initial
     2           5     1.700000e-001     -0.0123        3.6950      contract inside
     3           9     1.700000e-001     -0.0123        3.6950      shrink
     4          11     1.700000e-001     -0.0123        3.6950      contract outside
Simplex results:
X=0.987738   40.246706,  F(X)=1.700000e-001

Obtained hyper-parameters: [gamma sig2]: 0.987738      40.2467
Multidimensional output; tuning time: 5.875 seconds
Accuracy: 79.02 %
Discussion
   The multiclass classification case is more delicate, as many of the
algorithms are built on top of the binary SVM method through the
LIBSVM tools. In this short survey we investigate techniques for
solving the multiclass classification problem.
Support Vector Machines are among the most robust and
successful classification algorithms. They are based on the idea
of maximizing the margin, i.e. maximizing the minimum distance
from the separating hyperplane to the nearest example.
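The margin idea can be made concrete in a few lines: the distance from a point x to the hyperplane w·x + b = 0 is |w·x + b| / ||w||, and the margin is the minimum of this distance over the training points. The numbers below are toy values, not taken from the diabetes data:

```python
import math

# Distance from point x to the hyperplane w.x + b = 0: |w.x + b| / ||w||.
def distance_to_hyperplane(w, b, x):
    dot = sum(wi * xi for wi, xi in zip(w, x))
    norm = math.sqrt(sum(wi ** 2 for wi in w))
    return abs(dot + b) / norm

# Toy training points; hyperplane x + y = 1 (w = (1, 1), b = -1).
points = [(2.0, 2.0), (0.0, 0.0), (3.0, 1.0)]

# The (geometric) margin is the minimum distance over the training set;
# the SVM chooses w and b to make this quantity as large as possible.
margin = min(distance_to_hyperplane([1.0, 1.0], -1.0, p) for p in points)
```

Here the nearest point is (0, 0), at distance 1/sqrt(2) from the hyperplane.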
In these extensions, additional parameters and constraints are added
to the optimization problem to handle the separation of the different classes.
This formulation of the SVM method results in a large optimization problem,
which may be impractical for a large number of classes.
On the other hand, multiclass SVM classification offers
a better formulation with a more efficient implementation.