hadoop-user mailing list archives

From Bertrand Dechoux <decho...@gmail.com>
Subject Re: Logistic regression package on Hadoop
Date Mon, 15 Oct 2012 12:53:34 GMT
Hi Rajesh,

You may want to use the Mahout mailing list for Mahout-related questions:
http://mahout.apache.org/mailinglists.html

Regards

Bertrand

On Mon, Oct 15, 2012 at 2:34 PM, Rajesh Nikam <rajeshnikam@gmail.com> wrote:

> Hi Harsh,
>
> Thanks for the link to SGD in Mahout.
>
> I have asked a question about an issue with using SGD; a description is below.
> Ted Dunning mentioned there may be some issue with the data encoding.
>
> However, I am not able to pinpoint the issue. Could you please let me know
> whether the problem is in the data format or in my usage?
>
> The input files used are attached.
>
> I am using the Iris Plants Database from Michael Marshall; please find
> iris.arff attached. I converted it to a CSV file just by updating the
> header: iris-3-classes.csv
>
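
For reference, the ARFF-to-CSV step described above can be sketched in Python. This is a minimal sketch, not the conversion Rajesh actually ran: the function name `arff_to_csv` and the short sample are illustrative assumptions (the attached iris.arff is not reproduced here), and it assumes simple unquoted attribute names and plain comma-separated data rows with no sparse or quoted values.

```python
import csv
import io

# Hypothetical ARFF fragment for illustration; not the attached file.
SAMPLE_ARFF = """\
% comment lines start with a percent sign
@relation iris
@attribute sepallength numeric
@attribute sepalwidth numeric
@attribute petallength numeric
@attribute petalwidth numeric
@attribute class {Iris-setosa,Iris-versicolor,Iris-virginica}
@data
5.1,3.5,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
"""


def arff_to_csv(arff_text: str) -> str:
    """Turn a simple dense ARFF document into CSV with a header row."""
    names = []
    rows = []
    in_data = False
    for raw in arff_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and comments
        lower = line.lower()
        if in_data:
            rows.append([field.strip() for field in line.split(",")])
        elif lower.startswith("@attribute"):
            # The second whitespace-separated token is the attribute name.
            names.append(line.split()[1])
        elif lower.startswith("@data"):
            in_data = True
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(names)
    writer.writerows(rows)
    return out.getvalue()


csv_text = arff_to_csv(SAMPLE_ARFF)
print(csv_text)
```

The header row produced this way carries the same column names that the TrainLogistic commands below pass to --predictors and --target.
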
> mahout org.apache.mahout.classifier.sgd.TrainLogistic --input
> /usr/local/mahout/trunk/iris-3-classes.csv --features 4 --output
> /usr/local/mahout/trunk/iris-3-classes.model --target class --categories 3
> --predictors sepallength sepalwidth petallength petalwidth --types n
>
> >> it gave following error.
> Exception in thread "main" java.lang.IllegalArgumentException: Can only
> call classifyScalar with two categories
>
> Now created csv with only 2 classes. PFA iris-2-classes.csv
>
> >> trained iris-2-classes.csv with sgd
>
> mahout org.apache.mahout.classifier.sgd.TrainLogistic --input
> /usr/local/mahout/trunk/iris-2-classes.csv --features 4 --output
> /usr/local/mahout/trunk/iris-2-classes.model --target class --categories 2
> --predictors sepallength sepalwidth petallength petalwidth --types n
>
>
> mahout runlogistic --input /usr/local/mahout/trunk/iris-2-classes.csv
> --model /usr/local/mahout/trunk/iris-2-classes.model --auc --confusion
>
> AUC = 0.14
> confusion: [[50.0, 50.0], [0.0, 0.0]]
> entropy: [[-0.6, -0.3], [-0.8, -0.4]]
>
> >> AUC seems too poor. Now changed --predictors
>
> mahout org.apache.mahout.classifier.sgd.TrainLogistic --input
> /usr/local/mahout/trunk/iris-2-classes.csv --features 4 --output
> /usr/local/mahout/trunk/iris-2-classes.model --target class --categories 2
> --predictors sepalwidth petallength --types n n
>
> mahout runlogistic --input /usr/local/mahout/trunk/iris-2-classes.csv
> --model /usr/local/mahout/trunk/iris-2-classes.model --auc --confusion
> --scores
>
> AUC = 0.80
> confusion: [[50.0, 50.0], [0.0, 0.0]]
> entropy: [[-0.7, -0.3], [-0.7, -0.4]]
>
> AUC has improved; however, the confusion matrix suggests that everything is
> classified as class a.
>
> Below is the output.
>
> "target","model-output","log-likelihood"
> 0,0.492,-0.677017
> 0,0.493,-0.679192
> 0,0.493,-0.678355
> 0,0.493,-0.678724
> 0,0.492,-0.676583
> 0,0.491,-0.675182
> 0,0.492,-0.677452
> 0,0.492,-0.677419
> 0,0.493,-0.679628
> 0,0.493,-0.678724
> 0,0.491,-0.676116
> 0,0.492,-0.677386
> 0,0.493,-0.679192
> 0,0.493,-0.679291
> 0,0.491,-0.674912
> 0,0.490,-0.673081
> 0,0.491,-0.675313
> 0,0.492,-0.677017
> 0,0.491,-0.675616
> 0,0.491,-0.675682
> 0,0.492,-0.677353
> 0,0.491,-0.676116
> 0,0.492,-0.676714
> 0,0.492,-0.677788
> 0,0.492,-0.677287
> 0,0.493,-0.679126
> 0,0.492,-0.677386
> 0,0.492,-0.676984
> 0,0.492,-0.677452
> 0,0.492,-0.678256
> 0,0.493,-0.678691
> 0,0.492,-0.677419
> 0,0.491,-0.674381
> 0,0.490,-0.673980
> 0,0.493,-0.678724
> 0,0.493,-0.678387
> 0,0.492,-0.677050
> 0,0.493,-0.678724
> 0,0.493,-0.679225
> 0,0.492,-0.677419
> 0,0.492,-0.677050
> 0,0.495,-0.682279
> 0,0.493,-0.678355
> 0,0.492,-0.676951
> 0,0.491,-0.675550
> 0,0.493,-0.679192
> 0,0.491,-0.675649
> 0,0.493,-0.678322
> 0,0.491,-0.676116
> 0,0.492,-0.677887
> 1,0.492,-0.709316
> 1,0.492,-0.709248
> 1,0.492,-0.708935
> 1,0.494,-0.705048
> 1,0.493,-0.707488
> 1,0.493,-0.707454
> 1,0.492,-0.709765
> 1,0.494,-0.705258
> 1,0.493,-0.707936
> 1,0.493,-0.706803
> 1,0.495,-0.703539
> 1,0.493,-0.708249
> 1,0.494,-0.704601
> 1,0.493,-0.707970
> 1,0.493,-0.707597
> 1,0.492,-0.708765
> 1,0.492,-0.708351
> 1,0.493,-0.706871
> 1,0.494,-0.704770
> 1,0.494,-0.705908
> 1,0.492,-0.709350
> 1,0.493,-0.707285
> 1,0.493,-0.706247
> 1,0.493,-0.707522
> 1,0.493,-0.707835
> 1,0.492,-0.708317
> 1,0.493,-0.707556
> 1,0.492,-0.708520
> 1,0.493,-0.707902
> 1,0.494,-0.706220
> 1,0.494,-0.705427
> 1,0.494,-0.705393
> 1,0.493,-0.706803
> 1,0.493,-0.707210
> 1,0.492,-0.708351
> 1,0.492,-0.710146
> 1,0.492,-0.708867
> 1,0.494,-0.705183
> 1,0.493,-0.708215
> 1,0.494,-0.705942
> 1,0.493,-0.706525
> 1,0.492,-0.708385
> 1,0.493,-0.706389
> 1,0.494,-0.704811
> 1,0.493,-0.706905
> 1,0.493,-0.708249
> 1,0.493,-0.707801
> 1,0.493,-0.707835
> 1,0.494,-0.705604
> 1,0.493,-0.707319
>
> AUC = 0.80
> confusion: [[50.0, 50.0], [0.0, 0.0]]
> entropy: [[-0.7, -0.3], [-0.7, -0.4]]
>
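
The scores above hint at why the AUC and the confusion matrix disagree: every model-output is below 0.5, so a 0.5 cutoff assigns every row to class 0, yet the class-1 rows tend to score slightly higher than the class-0 rows, which is all a ranking metric needs. Below is a minimal Python sketch on eight rows copied from the output above. The 0.5 cutoff, the [predicted][actual] matrix layout, and the pairwise AUC formula are assumptions for illustration, not a description of how runlogistic computes its numbers.

```python
# (target, model-output) pairs copied from the scores listing above.
ROWS = [
    (0, 0.492), (0, 0.493), (0, 0.491), (0, 0.490),
    (1, 0.492), (1, 0.494), (1, 0.493), (1, 0.495),
]

THRESHOLD = 0.5  # assumed cutoff; not confirmed from the Mahout source


def confusion_matrix(rows, threshold):
    """Count rows as matrix[predicted][actual] for a two-class problem."""
    matrix = [[0, 0], [0, 0]]
    for target, score in rows:
        predicted = 1 if score >= threshold else 0
        matrix[predicted][target] += 1
    return matrix


def pairwise_auc(rows):
    """Fraction of (class 0, class 1) pairs ranked correctly; ties count half."""
    negatives = [score for target, score in rows if target == 0]
    positives = [score for target, score in rows if target == 1]
    wins = 0.0
    for neg in negatives:
        for pos in positives:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(negatives) * len(positives))


confusion = confusion_matrix(ROWS, THRESHOLD)
auc = pairwise_auc(ROWS)
print("confusion:", confusion)
print("AUC:", auc)
```

On this sample the matrix is [[4, 4], [0, 0]] and the AUC is 0.875; the sample value need not match the 0.80 reported for all 100 rows. The point is only that scores clustered just under the cutoff can rank the classes reasonably well while still predicting a single class.
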
>
> On Fri, Oct 12, 2012 at 10:51 PM, Ted Dunning <tdunning@maprtech.com> wrote:
>
>> Harsh,
>>
>> Thanks for the plug.  Rajesh has been talking to us.
>>
>>
>> On Fri, Oct 12, 2012 at 8:36 AM, Harsh J <harsh@cloudera.com> wrote:
>>
>>> Hi Rajesh,
>>>
>>> Please head over to the Apache Mahout project. See
>>> https://cwiki.apache.org/MAHOUT/logistic-regression.html
>>>
>>> Apache Mahout is homed at http://mahout.apache.org and works well with
>>> Hadoop MR, etc.
>>>
>>> On Fri, Oct 12, 2012 at 6:36 PM, Rajesh Nikam <rajeshnikam@gmail.com>
>>> wrote:
>>> > Hi,
>>> >
>>> > Could you please suggest a logistic regression package that could be
>>> > used on Hadoop?
>>> > I have large data and am looking for an LR package with kernel support.
>>> >
>>> > Thanks
>>> > Rajesh
>>> >
>>> >
>>>
>>>
>>>
>>> --
>>> Harsh J
>>>
>>
>>
>


-- 
Bertrand Dechoux
