spark-dev mailing list archives

From Reynold Xin <r...@databricks.com>
Subject Re: multinomial logistic regression
Date Tue, 07 Jan 2014 02:21:25 GMT
Thanks. Why don't you submit a PR and then we can work on it?

> On Jan 6, 2014, at 6:15 PM, Michael Kun Yang <kunyang@stanford.edu> wrote:
>
> Hi Hossein,
>
> I can still use LabeledPoint with little modification. Currently I convert
> the category into a {0, 1} sequence, but I can do the conversion in the body
> of the methods or functions.
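[One plausible reading of the conversion Michael mentions is a one-hot (indicator) expansion of the categorical label. A minimal Python sketch of the idea, purely illustrative since the patch under discussion is Scala/MLlib:]

```python
def one_hot(label, num_classes):
    """Expand a categorical label into a {0, 1} indicator sequence."""
    if not 0 <= label < num_classes:
        raise ValueError("label out of range")
    return [1.0 if k == label else 0.0 for k in range(num_classes)]

# e.g. class 1 of 3 -> [0.0, 1.0, 0.0]
```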
>
> In order to make the code run faster, I try not to use the DoubleMatrix
> abstraction, to avoid memory allocation; another reason is that jblas has no
> data structure to handle symmetric matrix addition efficiently.
>
> My code is not very pretty because I handle matrix operations manually (by
> indexing).
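[The manual indexing Michael describes is commonly done by storing only the upper triangle of a symmetric matrix in a flat array. A hypothetical Python sketch of that packed layout (the names and layout here are an assumption, not Michael's actual code):]

```python
def packed_index(i, j, n):
    """Flat index of (i, j) in the row-major packed upper triangle
    of an n x n symmetric matrix."""
    if i > j:
        i, j = j, i  # symmetry: A[i][j] == A[j][i]
    # row i starts after rows 0..i-1, which hold n + (n-1) + ... entries
    return i * n - i * (i - 1) // 2 + (j - i)

def packed_add_inplace(a, b):
    """Add two packed symmetric matrices without materializing full ones."""
    for k in range(len(a)):
        a[k] += b[k]

n = 3
a = [0.0] * (n * (n + 1) // 2)  # 6 entries instead of 9
a[packed_index(2, 0, n)] = 5.0  # writes A[0][2] == A[2][0]
```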
>
> If you think it is ok, I will make a pull request.
>
>
>> On Mon, Jan 6, 2014 at 5:34 PM, Hossein <falaki@gmail.com> wrote:
>>
>> Hi Michael,
>>
>> This sounds great. Would you please send these as a pull request?
>> Especially if you can make your Newton's method implementation generic such
>> that it can later be used by other algorithms, that would be very helpful.
>> For example, you could add it as another optimization method under
>> mllib/optimization.
>>
>> Was there a particular reason you chose not to use LabeledPoint?
>>
>> We have some instructions for contributions here: <
>> https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark>
>>
>> Thanks,
>>
>> --Hossein
>>
>>
>> On Mon, Jan 6, 2014 at 11:33 AM, Michael Kun Yang <kunyang@stanford.edu>
>> wrote:
>>
>>> I actually have two versions:
>>> one is based on gradient descent like the logistic regression on mllib.
>>> the other is based on Newton iteration; it is not as fast as SGD, but we
>>> can get all the statistics from it, like deviance, p-values, and Fisher
>>> info.
>>>
>>> we can get confusion matrix in both versions
>>>
>>> the gradient descent version is just a modification of logistic regression
>>> with my own implementation. I did not use the LabeledPoint class.
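[The appeal of the Newton version Michael describes is that the Hessian of the log-likelihood is the Fisher information, from which standard errors and p-values fall out. A toy single-parameter Python sketch of one such iteration, assuming the usual Newton update for logistic regression (illustrative only; the actual patch is Scala):]

```python
import math

def newton_logistic(xs, ys, iters=25):
    """Fit P(y=1|x) = sigmoid(w*x) by Newton's method.

    Each step is w += score / fisher_info, where the Fisher information
    sum(p * (1-p) * x^2) is the negative expected second derivative.
    Returns (w, fisher_info); var(w) ~= 1 / fisher_info at the optimum.
    """
    w = 0.0
    info = 0.0
    for _ in range(iters):
        score = 0.0
        info = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-w * x))
            score += (y - p) * x          # gradient of the log-likelihood
            info += p * (1.0 - p) * x * x  # Fisher information
        w += score / info
    return w, info
```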
>>>
>>>
>>> On Mon, Jan 6, 2014 at 11:13 AM, Evan Sparks <evan.sparks@gmail.com>
>>> wrote:
>>>
>>>> Hi Michael,
>>>>
>>>> What strategy are you using to train the multinomial classifier?
>>>> One-vs-all? I've got an optimized version of that method that I've been
>>>> meaning to clean up and commit for a while. In particular, rather than
>>>> shipping a (potentially very big) model with each map task, I ship it once
>>>> before each iteration with a broadcast variable. Perhaps we can compare
>>>> versions and incorporate some of my optimizations into your code?
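[The point of the optimization Evan describes: a closure that captures the model re-serializes it for every task, while a broadcast variable ships it once per iteration. A toy Python simulation of the two shipping patterns; `ToyBroadcast` is a hypothetical stand-in, not the real Spark API (in real code this is `SparkContext.broadcast`):]

```python
import pickle

class ToyBroadcast:
    """Hypothetical stand-in for a Spark broadcast variable:
    the value is serialized once and shared by every task."""
    def __init__(self, value):
        self.blob = pickle.dumps(value)
    def value(self):
        return pickle.loads(self.blob)

def bytes_shipped_via_closures(model, num_tasks):
    # Capturing the model in each task's closure serializes it per task.
    return num_tasks * len(pickle.dumps(model))

def bytes_shipped_via_broadcast(model, num_tasks):
    bc = ToyBroadcast(model)  # serialized once per iteration
    return len(bc.blob)       # independent of the number of tasks

model = [0.0] * 10000  # stand-in for a large weight vector
```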
>>>>
>>>> Thanks,
>>>> Evan
>>>>
>>>>> On Jan 6, 2014, at 10:57 AM, Michael Kun Yang <kunyang@stanford.edu>
>>>>> wrote:
>>>>>
>>>>> Hi Spark-ers,
>>>>>
>>>>> I implemented an SGD version of multinomial logistic regression based on
>>>>> mllib's optimization package. If this classifier is in the future plan of
>>>>> mllib, I will be happy to contribute my code.
>>>>>
>>>>> Cheers
>>
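[For context on the proposal that started the thread: the per-example SGD update for multinomial logistic regression follows the softmax cross-entropy gradient. A small pure-Python sketch of one such step, assuming one weight vector per class (illustrative only; the MLlib code in question is Scala):]

```python
import math

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def sgd_step(weights, x, label, lr=0.1):
    """One SGD step on K per-class weight vectors for one example (x, label).

    The cross-entropy gradient wrt class k's weights is
    (p_k - 1{k == label}) * x, where p is the softmax over the class scores.
    """
    scores = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in weights]
    probs = softmax(scores)
    for k in range(len(weights)):
        err = probs[k] - (1.0 if k == label else 0.0)
        weights[k] = [w_i - lr * err * x_i for w_i, x_i in zip(weights[k], x)]
    return weights
```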
