spark-dev mailing list archives

From Michael Kun Yang <kuny...@stanford.edu>
Subject Re: multinomial logistic regression
Date Tue, 07 Jan 2014 02:15:09 GMT
Hi Hossein,

I can still use LabeledPoint with little modification. Currently I convert
the category into a {0, 1} sequence, but I can do the conversion in the body
of the methods or functions.
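The category-to-{0, 1} conversion described above is essentially one-hot
encoding of the class label; a minimal sketch (the helper name `oneHot` is
mine, not from the actual patch):

```scala
// One-hot encode a class label out of numClasses categories.
// For K classes, K - 1 indicator values suffice: the reference class
// (label 0) maps to all zeros, matching the {0, 1} sequence above.
def oneHot(label: Int, numClasses: Int): Array[Double] = {
  require(label >= 0 && label < numClasses, "label out of range")
  val v = new Array[Double](numClasses - 1)
  if (label > 0) v(label - 1) = 1.0
  v
}
```

For example, `oneHot(2, 4)` yields `Array(0.0, 1.0, 0.0)`.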

In order to make the code run faster, I try not to use the DoubleMatrix
abstraction, so as to avoid memory allocation; another reason is that jblas
has no data structure for handling symmetric matrix addition efficiently.

My code is not very pretty because I handle matrix operations manually (by
indexing).
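The manual indexing mentioned above can be done by packing the symmetric
matrix (e.g., the Hessian / Fisher information) into a flat array of its
upper triangle, so updates allocate no temporary matrix. A hypothetical
sketch of this scheme (these helper names are mine, not from the patch):

```scala
// Store an n x n symmetric matrix as its upper triangle, packed
// row-major into a flat array of length n * (n + 1) / 2.
// Flat index of entry (i, j), for i <= j: i*n - i*(i-1)/2 + (j - i).
def packedIndex(i: Int, j: Int, n: Int): Int = {
  val (r, c) = if (i <= j) (i, j) else (j, i)
  r * n - r * (r - 1) / 2 + (c - r)
}

// In-place symmetric rank-1 update A += alpha * x * x^T, touching each
// unique entry exactly once -- no intermediate DoubleMatrix needed.
def symRankOneUpdate(a: Array[Double], x: Array[Double], alpha: Double): Unit = {
  val n = x.length
  var i = 0
  while (i < n) {
    var j = i
    while (j < n) {
      a(packedIndex(i, j, n)) += alpha * x(i) * x(j)
      j += 1
    }
    i += 1
  }
}
```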

If you think it is OK, I will open a pull request.


On Mon, Jan 6, 2014 at 5:34 PM, Hossein <falaki@gmail.com> wrote:

> Hi Michael,
>
> This sounds great. Would you please send these as a pull request?
> Especially if you can make your Newton's method implementation generic, so
> that it can later be used by other algorithms, it would be very helpful.
> For example, you could add it as another optimization method under
> mllib/optimization.
>
> Was there a particular reason you chose not to use LabeledPoint?
>
> We have some instructions for contributions here: <
> https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark>
>
> Thanks,
>
> --Hossein
>
>
> On Mon, Jan 6, 2014 at 11:33 AM, Michael Kun Yang <kunyang@stanford.edu
> >wrote:
>
> > I actually have two versions:
> > one is based on gradient descent, like the logistic regression in mllib.
> > the other is based on Newton's method; it is not as fast as SGD, but we
> > can get all the statistics from it, like deviance, p-values, and Fisher
> > information.
> >
> > we can get confusion matrix in both versions
> >
> > the gradient descent version is just a modification of logistic
> > regression with my own implementation. I did not use the LabeledPoint
> > class.
> >
> >
> > On Mon, Jan 6, 2014 at 11:13 AM, Evan Sparks <evan.sparks@gmail.com>
> > wrote:
> >
> > > Hi Michael,
> > >
> > > What strategy are you using to train the multinomial classifier?
> > > One-vs-all? I've got an optimized version of that method that I've been
> > > meaning to clean up and commit for a while. In particular, rather than
> > > shipping a (potentially very big) model with each map task, I ship it
> > > once before each iteration with a broadcast variable. Perhaps we can
> > > compare versions and incorporate some of my optimizations into your
> > > code?
> > >
> > > Thanks,
> > > Evan
> > >
> > > > On Jan 6, 2014, at 10:57 AM, Michael Kun Yang <kunyang@stanford.edu>
> > > wrote:
> > > >
> > > > Hi Spark-ers,
> > > >
> > > > I implemented an SGD version of multinomial logistic regression
> > > > based on mllib's optimization package. If this classifier is in the
> > > > future plans for mllib, I will be happy to contribute my code.
> > > >
> > > > Cheers
> > >
> >
>
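
Evan's broadcast-variable idea quoted above can be sketched as follows. This
is an illustrative outline, not code from either implementation; the names
`gradient`, `train`, and the binary-logistic placeholder gradient are mine:

```scala
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

// Placeholder per-example gradient (binary logistic shown for brevity;
// the real code would compute the multinomial gradient).
def gradient(w: Array[Double], label: Double, x: Array[Double]): Array[Double] = {
  val margin = w.zip(x).map { case (a, b) => a * b }.sum
  val mult = 1.0 / (1.0 + math.exp(-margin)) - label
  x.map(_ * mult)
}

def train(sc: SparkContext, data: RDD[(Double, Array[Double])],
          init: Array[Double], iterations: Int, stepSize: Double): Array[Double] = {
  var weights = init
  for (_ <- 1 to iterations) {
    // Ship the (potentially large) model once per iteration via a
    // broadcast variable, instead of once per map task in the closure.
    val bc = sc.broadcast(weights)
    val grad = data
      .map { case (y, x) => gradient(bc.value, y, x) }
      .reduce((a, b) => a.zip(b).map { case (u, v) => u + v })
    weights = weights.zip(grad).map { case (w, g) => w - stepSize * g }
  }
  weights
}
```

The key line is `sc.broadcast(weights)`: each executor fetches the model
once per iteration, rather than receiving a copy inside every task closure.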
