Parallel Gradient Projection-based Decomposition Technique

SCALAR AND PARALLEL TRAINING OF SUPPORT VECTOR MACHINES

**T. Serafini
L. Zanni G. Zanghirati**

GPDT is a C++ software designed to train large-scale Support Vector Machines (SVMs) for binary classification in both scalar and distributed memory parallel environments. It uses a popular problem decomposition technique [1, 2, 4, 6, 7] to split the SVM quadratic programming (QP) problem into a sequence of smaller QP subproblems, each one being solved by a suitable gradient projection method (GPM). The currently implemented GPMs are the Generalized Variable Projection Method (GVPM) [3] and the Dai-Fletcher method (DFGPM) [5].

A few minor bugs fixed (see more details in the
CHANGES file, also packaged with the sources distribution)

[Last updated: Fabruary 7, 2007.]

**Authors:**

**Thomas Serafini**,**Luca Zanni**

Dept. of Mathematics, University of Modena and Reggio Emilia - ITALY

serafini.thomas@unimo.it, zanni.luca@unimo.it**Gaetano Zanghirati**

Dept. of Mathematics, University of Ferrara - ITALY

g.zanghirati@unife.it

This work is mainly supported by the Italian FIRB Projects

Statistical Learning: Theory, Algorithms and Applications
(grant RBAU01877P), http://slipguru.disi.unige.it/Research/ASTA

and

Parallel Algorithms and Numerical Nonlinear Optimization
(grant RBAU01JYPN), http://dm.unife.it/pn2o.

This work is also supported by the Italian M.I.U.R. project

Numerical Methods and Mathematical Software in Applications
(grant 2004012559), http://www.math.unifi.it/~brugnano/Cofin2004.

Copyright (C) 2004-2007 by T. Serafini, G. Zanghirati, L. Zanni.

Permission to copy and modify this software and its documentation for internal research use is granted, provided that this notice is retained thereon and on all copies or modifications. The authors and their respective Universities makes no representations as to the suitability and operability of this software for any purpose. It is provided "as is" without express or implied warranty. Use of this software for commercial purposes is expressly prohibited without the authors written permission.

This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License
for more details.

The compressed tar file
gpdt-src-1-0r2.tgz (500 KB,
updated to **rev. 2, Jan. 2007**) contains the
sources to build the serial version of the program.

- Create the gpdt directory structure with

`gzip -d gpdt-src-1-0r2.tgz`

`tar -xvf gpdt-src-1-0r2.tar`

This produces the directory gpdt as a subdirectory of your current directory. On Windows systems this is accomplished by popular programs as WinZip. - Compilation has been tested on Windows (ME, 2000, XP), linux, HP-UX, AIX and DEC Alpha systems. Installation usually works on other systems but has not been tested yet.
- Move to the
`gpdt`

directory, edit the`makefile`

file and set the appropriate values for the given variables: in particular, set the name of your C++ compiler. Then compile the serial version with

`make gpdt`

This creates the serial executable`gpdt`

. - Run the program by giving training samples in the Joachims'
SVM
^{light}format.

A precompiled binary version is also available for the following serial platform:

- Windows32-based PCs (ME, 2000, XP):

gpdt-win32-1-0r2.zip (90 KB, compiled by MS Visual C++ .NET, updated to**rev. 2, Jan. 2007**)

For **MPI-based distributed memory parallel systems** we provide
the source code **here**. To build
the parallel version for your particular platform follows the same instructions
as for the serial version, but the last ones which must be

- Move to the
`pgpdt`

directory, edit the`makefile`

file and set the appropriate values for the given variables: in particular, set the name of your parallel C++ compiler and ensure the MPI libraries are available and linkable. Then compile the parallel version with

`make pgpdt`

This creates the parallel executable`pgpdt`

. - Run the program on the desired processors group,
by giving training samples in the Joachims'
SVM
^{light}format.

**Remark:** if you use our parallel code then you are kindly asked to cite
this page, together with the references [2] and
[7] here below.

Here we provide three freely available data sets, that you can use to test the program:

- mnist8n8-10k.gz (3.9 MB): a 10000 samples subset of the MNIST handwritten digits database, constituted by 5000 samples of the digit "8" and 5000 samples of the other digits;
- uciadu6.gz (79 KB): the 11220 samples set "adu6" of the UCI Adult database;
- web6a.gz (188 KB): the 17188 samples set "web-6a" of the J. Platt's Web database.

- T. Joachims,
*Making large-scale SVM learning practical*, Advances in Kernel Mathods - Support Vector Learning, Schölkopf*et al.*, eds., MIT Press, Cambridge, MA, 1998. - G. Zanghirati, L. Zanni,
*A Parallel Solver for Large Quadratic Programs in Training Support Vector Machines*, Parallel Computing**29**(2003), 535-551. - T. Serafini, G. Zanghirati, L. Zanni,
*Gradient Projection Methods for Large Quadratic Programs and Applications in Training Support Vector Machines*, Optim. Meth. Soft.**20**(2005), 353-378. - T. Serafini, L. Zanni,
*On the Working Set Selection in Gradient Projection-based Decomposition Techniques for Support Vector Machines*, Optim. Meth. Soft.**20**(2005), 583-596. - Y.H. Dai, R. Fletcher,
*New Algorithms for Singly Linearly Constrained Quadratic Programming Problems Subject to Lower and Upper Bounds*, Math. Prog.**106**(3) (2006), 403-421. Also as Research Report NA/216, Dept. of Mathematics, University of Dundee, UK (2003). - L. Zanni,
*An Improved Gradient Projection-based Decomposition Technique for Support Vector Machines*, Computational Management Science**3**(2) (2006), 131-145. -
L. Zanni, T. Serafini, G. Zanghirati,
**Parallel Software for Training Large Scale Support Vector Machines on Multiprocessor Systems**, JMLR**7**(Jul), 1467-1492, 2006.

(Formerly as "*Parallel Training of Large-Scale SVMs: a Software for Multiprocessor Systems*" TR**356**, Feb. 2006, Department of Mathematics, University of Ferrara, Italy, revision of TR 71/2005, University of Modena and Reggio Emilia, Italy)

Last modified: February 7, 2007.