Parallel GPDT
Parallel Gradient Projection-based Decomposition Technique
SCALAR AND PARALLEL TRAINING OF SUPPORT VECTOR MACHINES

T. Serafini L. Zanni G. Zanghirati

Description

GPDT is a C++ software designed to train large-scale Support Vector Machines (SVMs) for binary classification in both scalar and distributed memory parallel environments. It uses a popular problem decomposition technique [1, 2, 4, 6, 7] to split the SVM quadratic programming (QP) problem into a sequence of smaller QP subproblems, each one being solved by a suitable gradient projection method (GPM). The currently implemented GPMs are the Generalized Variable Projection Method (GVPM) [3] and the Dai-Fletcher method (DFGPM) [5].

A few minor bugs fixed (see more details in the CHANGES file, also packaged with the sources distribution)
[Last updated: Fabruary 7, 2007.]

Authors:

Thomas Serafini, Luca Zanni
Dept. of Mathematics, University of Modena and Reggio Emilia - ITALY
serafini.thomas@unimo.it, zanni.luca@unimo.it
Gaetano Zanghirati
Dept. of Mathematics, University of Ferrara - ITALY
g.zanghirati@unife.it

This work is mainly supported by the Italian FIRB Projects
Statistical Learning: Theory, Algorithms and Applications (grant RBAU01877P), http://slipguru.disi.unige.it/Research/ASTA
and
Parallel Algorithms and Numerical Nonlinear Optimization (grant RBAU01JYPN), http://dm.unife.it/pn2o.
This work is also supported by the Italian M.I.U.R. project
Numerical Methods and Mathematical Software in Applications (grant 2004012559), http://www.math.unifi.it/~brugnano/Cofin2004.

COPYRIGHT NOTIFICATION

Permission to copy and modify this software and its documentation for internal research use is granted, provided that this notice is retained thereon and on all copies or modifications. The authors and their respective Universities makes no representations as to the suitability and operability of this software for any purpose. It is provided "as is" without express or implied warranty. Use of this software for commercial purposes is expressly prohibited without the authors written permission.

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

Instructions and documentation

The compressed tar file gpdt-src-1-0r2.tgz (500 KB, updated to rev. 2, Jan. 2007) contains the sources to build the serial version of the program.

Create the gpdt directory structure with
gzip -d gpdt-src-1-0r2.tgz
tar -xvf gpdt-src-1-0r2.tar
This produces the directory gpdt as a subdirectory of your current directory. On Windows systems this is accomplished by popular programs as WinZip.
Compilation has been tested on Windows (ME, 2000, XP), linux, HP-UX, AIX and DEC Alpha systems. Installation usually works on other systems but has not been tested yet.
Move to the gpdt directory, edit the makefile file and set the appropriate values for the given variables: in particular, set the name of your C++ compiler. Then compile the serial version with
make gpdt
This creates the serial executable gpdt.
Run the program by giving training samples in the Joachims' SVM^light format.

More detailed instructions are available in pdf or in compressed PostScrpt format.

A precompiled binary version is also available for the following serial platform:

Windows32-based PCs (ME, 2000, XP):
gpdt-win32-1-0r2.zip (90 KB, compiled by MS Visual C++ .NET, updated to rev. 2, Jan. 2007)

Remark: if you use our serial code then you are kindly asked to cite this page, together with the references [3], [4] and [6] here below.

For MPI-based distributed memory parallel systems we provide the source code here. To build the parallel version for your particular platform follows the same instructions as for the serial version, but the last ones which must be

Move to the pgpdt directory, edit the makefile file and set the appropriate values for the given variables: in particular, set the name of your parallel C++ compiler and ensure the MPI libraries are available and linkable. Then compile the parallel version with
make pgpdt
This creates the parallel executable pgpdt.
Run the program on the desired processors group, by giving training samples in the Joachims' SVM^light format.

Remark: if you use our parallel code then you are kindly asked to cite this page, together with the references [2] and [7] here below.

Here we provide three freely available data sets, that you can use to test the program:

mnist8n8-10k.gz (3.9 MB): a 10000 samples subset of the MNIST handwritten digits database, constituted by 5000 samples of the digit "8" and 5000 samples of the other digits;
uciadu6.gz (79 KB): the 11220 samples set "adu6" of the UCI Adult database;
web6a.gz (188 KB): the 17188 samples set "web-6a" of the J. Platt's Web database.

These files contain the same data as the original ones, just converted to the SVM^light sparse data format.

Additional information

T. Joachims, Making large-scale SVM learning practical, Advances in Kernel Mathods - Support Vector Learning, Schölkopf et al., eds., MIT Press, Cambridge, MA, 1998.
G. Zanghirati, L. Zanni, A Parallel Solver for Large Quadratic Programs in Training Support Vector Machines, Parallel Computing 29 (2003), 535-551.
T. Serafini, G. Zanghirati, L. Zanni, Gradient Projection Methods for Large Quadratic Programs and Applications in Training Support Vector Machines, Optim. Meth. Soft. 20 (2005), 353-378.
T. Serafini, L. Zanni, On the Working Set Selection in Gradient Projection-based Decomposition Techniques for Support Vector Machines, Optim. Meth. Soft. 20 (2005), 583-596.
Y.H. Dai, R. Fletcher, New Algorithms for Singly Linearly Constrained Quadratic Programming Problems Subject to Lower and Upper Bounds, Math. Prog. 106(3) (2006), 403-421. Also as Research Report NA/216, Dept. of Mathematics, University of Dundee, UK (2003).
L. Zanni, An Improved Gradient Projection-based Decomposition Technique for Support Vector Machines, Computational Management Science 3(2) (2006), 131-145.
L. Zanni, T. Serafini, G. Zanghirati, Parallel Software for Training Large Scale Support Vector Machines on Multiprocessor Systems, JMLR 7(Jul), 1467-1492, 2006.
(Formerly as "Parallel Training of Large-Scale SVMs: a Software for Multiprocessor Systems" TR 356, Feb. 2006, Department of Mathematics, University of Ferrara, Italy, revision of TR 71/2005, University of Modena and Reggio Emilia, Italy)

Last modified: February 7, 2007.

Parallel GPDT Parallel Gradient Projection-based Decomposition Technique SCALAR AND PARALLEL TRAINING OF SUPPORT VECTOR MACHINES T. Serafini L. Zanni G. Zanghirati

Description

COPYRIGHT NOTIFICATION

Instructions and documentation

Additional information

Parallel GPDT
Parallel Gradient Projection-based Decomposition Technique
SCALAR AND PARALLEL TRAINING OF SUPPORT VECTOR MACHINES

T. Serafini L. Zanni G. Zanghirati