spark-dev mailing list archives

From Michael Kun Yang <kuny...@stanford.edu>
Subject Re: multinomial logistic regression
Date Tue, 07 Jan 2014 05:18:26 GMT
I will follow up on the Newton one later.


On Mon, Jan 6, 2014 at 9:14 PM, Michael Kun Yang <kunyang@stanford.edu> wrote:

> I just sent the pr for multinomial logistic regression.
>
>
> On Mon, Jan 6, 2014 at 6:26 PM, Michael Kun Yang <kunyang@stanford.edu> wrote:
>
>> Thanks, will do.
>>
>>
>> On Mon, Jan 6, 2014 at 6:21 PM, Reynold Xin <rxin@databricks.com> wrote:
>>
>>> Thanks. Why don't you submit a pr and then we can work on it?
>>>
>>> > On Jan 6, 2014, at 6:15 PM, Michael Kun Yang <kunyang@stanford.edu> wrote:
>>> >
>>> > Hi Hossein,
>>> >
>>> > I can still use LabeledPoint with little modification. Currently I
>>> > convert the category into a {0, 1} sequence, but I can do the conversion
>>> > in the body of methods or functions.
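
A minimal sketch of the {0, 1} conversion described above, assuming the last
class is treated as the reference category; the helper name is made up for
illustration and is not taken from the actual patch:

    // Expand a class label in {0, ..., K-1} into a {0, 1} indicator sequence.
    // The last class is used as the reference, so a K-class label becomes a
    // length K-1 array that stays all zeros for the reference class.
    def toIndicator(label: Int, numClasses: Int): Array[Double] = {
      val y = new Array[Double](numClasses - 1) // initialized to 0.0
      if (label < numClasses - 1) y(label) = 1.0
      y
    }

    // Example: 4 classes, label 2  ->  Array(0.0, 0.0, 1.0)
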
>>> >
>>> > In order to make the code run faster, I try not to use the DoubleMatrix
>>> > abstraction, to avoid memory allocation; another reason is that jblas has
>>> > no data structure to handle symmetric matrix addition efficiently.
>>> >
>>> > My code is not very pretty because I handle matrix operations manually
>>> > (by indexing).
>>> >
>>> > If you think it is ok, I will make a pull request.
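
A rough illustration of the kind of manual indexing described above:
accumulating the symmetric outer product x * x^T into a packed
upper-triangular array in place, so no temporary DoubleMatrix is allocated.
The packing layout is an assumption for illustration, not necessarily the one
used in the patch:

    // Add x * x^T into a packed upper-triangular accumulator of length
    // n * (n + 1) / 2, stored row-major over entries (i, j) with j >= i.
    def addOuterProductUpper(acc: Array[Double], x: Array[Double]): Unit = {
      val n = x.length
      var idx = 0
      var i = 0
      while (i < n) {
        var j = i
        while (j < n) {
          acc(idx) += x(i) * x(j) // entry (i, j) of the upper triangle
          idx += 1
          j += 1
        }
        i += 1
      }
    }
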
>>> >
>>> >
>>> >> On Mon, Jan 6, 2014 at 5:34 PM, Hossein <falaki@gmail.com> wrote:
>>> >>
>>> >> Hi Michael,
>>> >>
>>> >> This sounds great. Would you please send these as a pull request?
>>> >> Especially if you can make your Newton method implementation generic
>>> >> such that it can later be used by other algorithms, it would be very
>>> >> helpful. For example, you could add it as another optimization method
>>> >> under mllib/optimization.
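
A hypothetical sketch of what a model-agnostic Newton loop could look like;
the names and the use of jblas' solver here are illustrative assumptions, not
the actual mllib/optimization interface:

    import org.jblas.{DoubleMatrix, Solve}

    // Any loss that can supply its gradient and Hessian at the current
    // weights could plug into the same Newton iteration.
    def newton(init: DoubleMatrix,
               gradHess: DoubleMatrix => (DoubleMatrix, DoubleMatrix),
               iterations: Int): DoubleMatrix = {
      var w = init.dup()
      for (_ <- 1 to iterations) {
        val (g, h) = gradHess(w)
        val delta = Solve.solve(h, g) // solve H * delta = g, avoid inverting H
        w = w.sub(delta)              // Newton step: w := w - H^{-1} g
      }
      w
    }
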
>>> >>
>>> >> Was there a particular reason you chose not to use LabeledPoint?
>>> >>
>>> >> We have some instructions for contributions here:
>>> >> <https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark>
>>> >>
>>> >> Thanks,
>>> >>
>>> >> --Hossein
>>> >>
>>> >>
>>> >> On Mon, Jan 6, 2014 at 11:33 AM, Michael Kun Yang <kunyang@stanford.edu>
>>> >> wrote:
>>> >>
>>> >>> I actually have two versions:
>>> >>> one is based on gradient descent, like the logistic regression in mllib.
>>> >>> the other is based on Newton iteration; it is not as fast as SGD, but we
>>> >>> can get all the statistics from it, like deviance, p-values and Fisher
>>> >>> info.
>>> >>>
>>> >>> we can get a confusion matrix in both versions.
>>> >>>
>>> >>> the gradient descent version is just a modification of logistic
>>> >>> regression with my own implementation. I did not use the LabeledPoint
>>> >>> class.
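
A sketch of why the Newton variant yields those statistics, assuming the
Hessian of the negative log-likelihood at the fitted weights is available
(the helper name is illustrative): its inverse is the asymptotic covariance of
the estimates, which gives standard errors and Wald z statistics, and p-values
then follow from the normal tail.

    import org.jblas.{DoubleMatrix, Solve}

    // At the optimum, the Hessian of the negative log-likelihood is the
    // observed Fisher information; its inverse is the covariance matrix.
    def waldStats(weights: Array[Double],
                  hessian: DoubleMatrix): Array[(Double, Double)] = {
      val cov = Solve.solve(hessian, DoubleMatrix.eye(hessian.rows)) // H^{-1}
      weights.zipWithIndex.map { case (w, j) =>
        val se = math.sqrt(cov.get(j, j)) // standard error of coefficient j
        (se, w / se)                      // (se_j, z_j); p_j = 2 * P(Z > |z_j|)
      }
    }
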
>>> >>>
>>> >>>
>>> >>> On Mon, Jan 6, 2014 at 11:13 AM, Evan Sparks <evan.sparks@gmail.com>
>>> >>> wrote:
>>> >>>
>>> >>>> Hi Michael,
>>> >>>>
>>> >>>> What strategy are you using to train the multinomial classifier?
>>> >>>> One-vs-all? I've got an optimized version of that method that I've
>>> >>>> been meaning to clean up and commit for a while. In particular, rather
>>> >>>> than shipping a (potentially very big) model with each map task, I
>>> >>>> ship it once before each iteration with a broadcast variable. Perhaps
>>> >>>> we can compare versions and incorporate some of my optimizations into
>>> >>>> your code?
>>> >>>>
>>> >>>> Thanks,
>>> >>>> Evan
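
An outline of the broadcast pattern described above, assuming a generic
per-example gradient function; the method and variable names are placeholders
rather than the actual implementation, and step-size scaling by the number of
examples is omitted:

    import org.apache.spark.SparkContext
    import org.apache.spark.rdd.RDD

    // Broadcast the current model once per iteration instead of shipping it
    // inside every task's closure, then sum the per-example gradients.
    def trainSketch(sc: SparkContext,
                    data: RDD[(Double, Array[Double])], // (label, features)
                    gradient: (Array[Double], Double, Array[Double]) => Array[Double],
                    iterations: Int,
                    stepSize: Double,
                    initialWeights: Array[Double]): Array[Double] = {
      var weights = initialWeights.clone()
      for (_ <- 1 to iterations) {
        val bcWeights = sc.broadcast(weights) // shipped to executors once
        val gradSum = data.map { case (label, features) =>
          gradient(features, label, bcWeights.value)
        }.reduce { (a, b) =>
          val out = new Array[Double](a.length)
          var i = 0
          while (i < a.length) { out(i) = a(i) + b(i); i += 1 }
          out
        }
        var i = 0
        while (i < weights.length) { weights(i) -= stepSize * gradSum(i); i += 1 }
      }
      weights
    }
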
>>> >>>>
>>> >>>>> On Jan 6, 2014, at 10:57 AM, Michael Kun Yang <kunyang@stanford.edu>
>>> >>>>> wrote:
>>> >>>>>
>>> >>>>> Hi Spark-ers,
>>> >>>>>
>>> >>>>> I implemented an SGD version of multinomial logistic regression based
>>> >>>>> on mllib's optimization package. If this classifier is in the future
>>> >>>>> plan of mllib, I will be happy to contribute my code.
>>> >>>>>
>>> >>>>> Cheers
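
For context, a minimal per-example sketch of the multinomial (softmax)
gradient such an implementation needs; the flattened weight layout and the
choice of the last class as the reference are assumptions for illustration:

    // Per-example gradient of the multinomial logistic (softmax) loss, with
    // class K-1 as the reference. `weights` is flattened row-major: one row
    // of length numFeatures for each of the K-1 modeled classes.
    def softmaxGradient(features: Array[Double], label: Int,
                        weights: Array[Double], numClasses: Int): Array[Double] = {
      val n = features.length
      // margin_k = w_k . x for each modeled class; the reference margin is 0.
      val margins = Array.tabulate(numClasses - 1) { k =>
        var s = 0.0
        var i = 0
        while (i < n) { s += weights(k * n + i) * features(i); i += 1 }
        s
      }
      val expMargins = margins.map(math.exp)
      val denom = 1.0 + expMargins.sum // the 1.0 is the reference class term
      val grad = new Array[Double](weights.length)
      var k = 0
      while (k < numClasses - 1) {
        val prob = expMargins(k) / denom // P(y = k | x)
        val indicator = if (label == k) 1.0 else 0.0
        var i = 0
        while (i < n) { grad(k * n + i) = (prob - indicator) * features(i); i += 1 }
        k += 1
      }
      grad
    }
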
>>> >>
>>>
>>
>>
>
