Parallel GPDT
Parallel Gradient Projection-based Decomposition Technique

SCALAR AND PARALLEL TRAINING OF SUPPORT VECTOR MACHINES

T. Serafini      L. Zanni     G. Zanghirati



Description

GPDT is a C++ software designed to train large-scale Support Vector Machines (SVMs) for binary classification in both scalar and distributed memory parallel environments. It uses a popular problem decomposition technique [1, 2, 4, 6, 7] to split the SVM quadratic programming (QP) problem into a sequence of smaller QP subproblems, each one being solved by a suitable gradient projection method (GPM). The currently implemented GPMs are the Generalized Variable Projection Method (GVPM) [3] and the Dai-Fletcher method (DFGPM) [5].

A few minor bugs fixed (see more details in the CHANGES file, also packaged with the sources distribution)
[Last updated: Fabruary 7, 2007.]

Authors:

This work is mainly supported by the Italian FIRB Projects
Statistical Learning: Theory, Algorithms and Applications (grant RBAU01877P), http://slipguru.disi.unige.it/Research/ASTA
and
Parallel Algorithms and Numerical Nonlinear Optimization (grant RBAU01JYPN), http://dm.unife.it/pn2o.
This work is also supported by the Italian M.I.U.R. project
Numerical Methods and Mathematical Software in Applications (grant 2004012559), http://www.math.unifi.it/~brugnano/Cofin2004.

Copyright (C) 2004-2007 by T. Serafini, G. Zanghirati, L. Zanni.




COPYRIGHT NOTIFICATION

Permission to copy and modify this software and its documentation for internal research use is granted, provided that this notice is retained thereon and on all copies or modifications. The authors and their respective Universities makes no representations as to the suitability and operability of this software for any purpose. It is provided "as is" without express or implied warranty. Use of this software for commercial purposes is expressly prohibited without the authors written permission.

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.



Instructions and documentation

The compressed tar file gpdt-src-1-0r2.tgz (500 KB, updated to rev. 2, Jan. 2007) contains the sources to build the serial version of the program.

  1. Create the gpdt directory structure with
        gzip -d gpdt-src-1-0r2.tgz
        tar -xvf gpdt-src-1-0r2.tar
    This produces the directory gpdt as a subdirectory of your current directory. On Windows systems this is accomplished by popular programs as WinZip.
  2. Compilation has been tested on Windows (ME, 2000, XP), linux, HP-UX, AIX and DEC Alpha systems. Installation usually works on other systems but has not been tested yet.
  3. Move to the gpdt directory, edit the makefile file and set the appropriate values for the given variables: in particular, set the name of your C++ compiler. Then compile the serial version with
        make gpdt
    This creates the serial executable gpdt.
  4. Run the program by giving training samples in the Joachims' SVMlight format.
More detailed instructions are available in pdf or in compressed PostScrpt format.

A precompiled binary version is also available for the following serial platform:

Remark: if you use our serial code then you are kindly asked to cite this page, together with the references [3], [4] and [6] here below.


For MPI-based distributed memory parallel systems we provide the source code here. To build the parallel version for your particular platform follows the same instructions as for the serial version, but the last ones which must be

Remark: if you use our parallel code then you are kindly asked to cite this page, together with the references [2] and [7] here below.


Here we provide three freely available data sets, that you can use to test the program:

These files contain the same data as the original ones, just converted to the SVMlight sparse data format.



Additional information

  1. T. Joachims, Making large-scale SVM learning practical, Advances in Kernel Mathods - Support Vector Learning, Schölkopf et al., eds., MIT Press, Cambridge, MA, 1998.
  2. G. Zanghirati, L. Zanni, A Parallel Solver for Large Quadratic Programs in Training Support Vector Machines, Parallel Computing 29 (2003), 535-551.
  3. T. Serafini, G. Zanghirati, L. Zanni, Gradient Projection Methods for Large Quadratic Programs and Applications in Training Support Vector Machines, Optim. Meth. Soft. 20 (2005), 353-378.
  4. T. Serafini, L. Zanni, On the Working Set Selection in Gradient Projection-based Decomposition Techniques for Support Vector Machines, Optim. Meth. Soft. 20 (2005), 583-596.
  5. Y.H. Dai, R. Fletcher, New Algorithms for Singly Linearly Constrained Quadratic Programming Problems Subject to Lower and Upper Bounds, Math. Prog. 106(3) (2006), 403-421. Also as Research Report NA/216, Dept. of Mathematics, University of Dundee, UK (2003).
  6. L. Zanni, An Improved Gradient Projection-based Decomposition Technique for Support Vector Machines, Computational Management Science 3(2) (2006), 131-145.
  7. L. Zanni, T. Serafini, G. Zanghirati, Parallel Software for Training Large Scale Support Vector Machines on Multiprocessor Systems, JMLR 7(Jul), 1467-1492, 2006.
    (Formerly as "Parallel Training of Large-Scale SVMs: a Software for Multiprocessor Systems" TR 356, Feb. 2006, Department of Mathematics, University of Ferrara, Italy, revision of TR 71/2005, University of Modena and Reggio Emilia, Italy)



Last modified: February 7, 2007.