mahout-dev mailing list archives

From zhao zhendong <zhaozhend...@gmail.com>
Subject Re: SVM algo, code, etc.
Date Fri, 11 Dec 2009 11:52:39 GMT
True. I am still wondering whether it is valuable to implement a
parallel SVM on Hadoop. I would really like to join Mike's group.

As Olivier pointed out, some linear SVM solvers can handle large-scale
data sets (several seconds for 100K-level samples). It is true that the
linear version does not use a Mercer kernel; however, linear methods are
said to reach accuracy very similar to that of solvers with more advanced
kernels on large-scale data sets. I really don't know whether that is
true or not.



On Thu, Dec 3, 2009 at 6:12 PM, Olivier Grisel <olivier.grisel@ensta.org> wrote:

> 2009/12/3 Ted Dunning <ted.dunning@gmail.com>:
> > Very interesting results, particularly the lack of dependence on data size.
> >
> > On Thu, Dec 3, 2009 at 12:02 AM, David Hall <dlwh@cs.berkeley.edu> wrote:
> >
> >> On Wed, Nov 25, 2009 at 2:35 AM, Isabel Drost <isabel@apache.org> wrote:
> >> > On Fri Grant Ingersoll <gsingers@apache.org> wrote:
> >> >> On Nov 19, 2009, at 1:15 PM, Sean Owen wrote:
> >> >> > Post a patch if you'd like to proceed, IMHO.
> >> >> +1
> >> >
> >> > +1 from me as well. I would love to see solid svm support in Mahout.
> >>
> >> And another +1 from me. If you want a pointer, I've recently stumbled
> >> on a new solver for SVMs that seems to be remarkably easy to
> >> implement.
> >>
> >> It's called Pegasos:
> >>
> >> http://ttic.uchicago.edu/~shai/papers/ShalevSiSr07.pdf
>
> Pegasos and other online implementations of SVMs based on regularized
> variants of stochastic gradient descent are indeed amenable to large
> scale problems. They solve the SVM optimization problem with a
> stochastic approximation of the primal (as opposed to more 'classical'
> solvers such as libsvm that solve the dual problem using Sequential
> Minimal Optimization). However, SGD-based SVM implementations are
> currently limited to the linear 'kernel' (which is often expressive
> enough for common NLP tasks such as document categorization).
>
> Other interesting resources on the topic:
>
> A simple reference implementation of Pegasos:
> - http://leon.bottou.org/projects/sgd
>
> Speeding up the convergence of linear SGD-based SVMs using an estimate
> of the diagonal of the Hessian:
> -  http://webia.lip6.fr/~bordes/mywiki/doku.php?id=sgdqn
>
> Using sparsifying L1 priors as a regularizer to perform automated
> feature selection:
> - http://www.cs.berkeley.edu/~jduchi/projects/DuchiSi09_folos.html
>
> Working coordinate-wise on large dimensional problems using L1 priors
> too (maybe easier to make map-reduceable efficiently):
> - http://ttic.uchicago.edu/~tewari/code/scd/
>
> Also do not overlook the highly optimized Vowpal Wabbit, probably the
> fastest linear classifier on earth:
> - http://hunch.net/~vw/
>
> --
> Olivier
> http://twitter.com/ogrisel - http://code.oliviergrisel.name
>
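
For readers following the thread, the Pegasos update described in the
quoted message above (stochastic sub-gradient descent on the regularized
primal objective with a linear 'kernel') can be sketched roughly as
follows. This is only a minimal single-machine illustration, not code from
the paper or from Mahout: the names pegasos_train, predict, lam and
num_iters are made up for this example, the step size 1/(lam * t) is the
schedule usually associated with Pegasos, and the optional projection step
and mini-batches are left out.

```python
import random


def pegasos_train(samples, labels, lam=0.01, num_iters=10000, seed=0):
    """Train a linear SVM with a Pegasos-style stochastic sub-gradient loop.

    samples: list of equal-length feature lists (append a constant 1.0
             feature if a bias term is wanted; that is an assumption of
             this sketch).
    labels:  list of +1 / -1 labels.
    lam:     regularization strength (hypothetical default).
    """
    rng = random.Random(seed)
    w = [0.0] * len(samples[0])
    for t in range(1, num_iters + 1):
        # Pick one training example uniformly at random.
        i = rng.randrange(len(samples))
        x, y = samples[i], labels[i]
        eta = 1.0 / (lam * t)  # decreasing step size
        # Margin is computed with the weights from before this update.
        margin = y * sum(wj * xj for wj, xj in zip(w, x))
        # Shrink step coming from the L2 regularizer.
        shrink = 1.0 - eta * lam
        w = [shrink * wj for wj in w]
        # Hinge-loss step, applied only when the margin is violated.
        if margin < 1.0:
            w = [wj + eta * y * xj for wj, xj in zip(w, x)]
    return w


def predict(w, x):
    """Return +1 or -1 depending on which side of the hyperplane x lies."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0.0 else -1
```

Each iteration touches a single example, so the cost per step depends on
the number of features rather than on the size of the data set. Calling
pegasos_train(samples, labels) on a small separable data set and then
predict(w, x) on each sample is enough to try it out.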



-- 
-------------------------------------------------------------

Zhen-Dong Zhao (Maxim)

<><<><><><><><><><>><><><><><>>>>>>

Department of Computer Science
School of Computing
National University of Singapore

><><><><><><><><><><><><><><><><<<<
Homepage:http://zhaozhendong.googlepages.com
Mail: zhaozhendong@gmail.com
>>>>>>><><><><><><><><<><>><><<<<<<
