mahout-user mailing list archives

From Rajesh Nikam <rajeshni...@gmail.com>
Subject Re: ** Problem using SGD and iris arff as test set **
Date Thu, 11 Oct 2012 04:58:09 GMT
Thanks for looking into it.

Actually, I first tried it on a large data set. Below is the model info from that run.

AUC = 0.50
confusion: [[1252978.0, 23003.0], [0.0, 0.0]]
entropy: [[-0.0, -0.0], [-46.1, -0.8]]
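
One way to read the numbers above, as a rough sketch only. It assumes the printed confusion matrix puts the predicted class on rows and the true class on columns; that layout is an assumption and is not confirmed anywhere in this thread. Under that assumption nothing is ever predicted as class 1, so accuracy is simply the share of the majority class, and an AUC of 0.50 means the scores rank the two classes no better than chance.

```python
# Counts copied from the confusion matrix above.
# Assumed layout (not confirmed in this thread): rows = predicted class, columns = true class.
confusion = [[1252978.0, 23003.0],
             [0.0, 0.0]]

predicted_0 = sum(confusion[0])                  # examples labelled as class 0
predicted_1 = sum(confusion[1])                  # examples labelled as class 1
true_1 = confusion[0][1] + confusion[1][1]       # examples whose true class is 1
total = predicted_0 + predicted_1

# Accuracy counts the diagonal; with no class-1 predictions it equals the majority share.
accuracy = (confusion[0][0] + confusion[1][1]) / total
minority_share = true_1 / total

print(f"predicted as class 1: {predicted_1:.0f} of {total:.0f}")
print(f"accuracy: {accuracy:.4f}, minority share: {minority_share:.4f}")
```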

Looking forward to your comments.

Thanks
Rajesh


On Wed, Oct 10, 2012 at 8:08 PM, Ted Dunning <ted.dunning@gmail.com> wrote:

> SGD is more suitable for large data sets.  I will take a look later today.
>
> On Oct 9, 2012, at 11:29 PM, Rajesh Nikam <rajeshnikam@gmail.com> wrote:
>
> > Hi Ted,
> >
> > Here is a specific question, with data, about a problem I am seeing with SGD.
> >
> > I am using the Iris Plants Database from Michael Marshall. Please find iris.arff attached.
> >
> > I converted it to a CSV file just by updating the header: iris-3-classes.csv
> >
> > mahout org.apache.mahout.classifier.sgd.TrainLogistic --input
> > /usr/local/mahout/trunk/iris-3-classes.csv --features 4 --output
> > /usr/local/mahout/trunk/iris-3-classes.model --target class --categories 3
> > --predictors sepallength sepalwidth petallength petalwidth --types n n
> >
> > >> It gave the following error:
> > Exception in thread "main" java.lang.IllegalArgumentException: Can only
> > call classifyScalar with two categories
> >
> > I then created a CSV with only 2 classes. Please find iris-2-classes.csv attached.
> >
> > >> Trained on iris-2-classes.csv with SGD:
> >
> > mahout org.apache.mahout.classifier.sgd.TrainLogistic --input
> > /usr/local/mahout/trunk/iris-2-classes.csv --features 4 --output
> > /usr/local/mahout/trunk/iris-2-classes.model --target class --categories 2
> > --predictors sepallength sepalwidth petallength petalwidth --types n n
> >
> >
> > mahout runlogistic --input /usr/local/mahout/trunk/iris-2-classes.csv
> > --model /usr/local/mahout/trunk/iris-2-classes.model --auc --confusion
> >
> > AUC = 0.14
> > confusion: [[50.0, 50.0], [0.0, 0.0]]
> > entropy: [[-0.6, -0.3], [-0.8, -0.4]]
> >
> > >> AUC seems too poor. I then changed --predictors:
> >
> > mahout org.apache.mahout.classifier.sgd.TrainLogistic --input
> > /usr/local/mahout/trunk/iris-2-classes.csv --features 4 --output
> > /usr/local/mahout/trunk/iris-2-classes.model --target class --categories 2
> > --predictors sepalwidth petallength --types n n
> >
> > mahout runlogistic --input /usr/local/mahout/trunk/iris-2-classes.csv
> > --model /usr/local/mahout/trunk/iris-2-classes.model --auc --confusion
> > --scores
> >
> > AUC = 0.80
> > confusion: [[50.0, 50.0], [0.0, 0.0]]
> > entropy: [[-0.7, -0.3], [-0.7, -0.4]]
> >
> > AUC has improved; however, the confusion matrix suggests that everything is
> > classified as class a.
> >
> > Below is the output.
> >
> > "target","model-output","log-likelihood"
> > 0,0.492,-0.677017
> > 0,0.493,-0.679192
> > 0,0.493,-0.678355
> > 0,0.493,-0.678724
> > 0,0.492,-0.676583
> > 0,0.491,-0.675182
> > 0,0.492,-0.677452
> > 0,0.492,-0.677419
> > 0,0.493,-0.679628
> > 0,0.493,-0.678724
> > 0,0.491,-0.676116
> > 0,0.492,-0.677386
> > 0,0.493,-0.679192
> > 0,0.493,-0.679291
> > 0,0.491,-0.674912
> > 0,0.490,-0.673081
> > 0,0.491,-0.675313
> > 0,0.492,-0.677017
> > 0,0.491,-0.675616
> > 0,0.491,-0.675682
> > 0,0.492,-0.677353
> > 0,0.491,-0.676116
> > 0,0.492,-0.676714
> > 0,0.492,-0.677788
> > 0,0.492,-0.677287
> > 0,0.493,-0.679126
> > 0,0.492,-0.677386
> > 0,0.492,-0.676984
> > 0,0.492,-0.677452
> > 0,0.492,-0.678256
> > 0,0.493,-0.678691
> > 0,0.492,-0.677419
> > 0,0.491,-0.674381
> > 0,0.490,-0.673980
> > 0,0.493,-0.678724
> > 0,0.493,-0.678387
> > 0,0.492,-0.677050
> > 0,0.493,-0.678724
> > 0,0.493,-0.679225
> > 0,0.492,-0.677419
> > 0,0.492,-0.677050
> > 0,0.495,-0.682279
> > 0,0.493,-0.678355
> > 0,0.492,-0.676951
> > 0,0.491,-0.675550
> > 0,0.493,-0.679192
> > 0,0.491,-0.675649
> > 0,0.493,-0.678322
> > 0,0.491,-0.676116
> > 0,0.492,-0.677887
> > 1,0.492,-0.709316
> > 1,0.492,-0.709248
> > 1,0.492,-0.708935
> > 1,0.494,-0.705048
> > 1,0.493,-0.707488
> > 1,0.493,-0.707454
> > 1,0.492,-0.709765
> > 1,0.494,-0.705258
> > 1,0.493,-0.707936
> > 1,0.493,-0.706803
> > 1,0.495,-0.703539
> > 1,0.493,-0.708249
> > 1,0.494,-0.704601
> > 1,0.493,-0.707970
> > 1,0.493,-0.707597
> > 1,0.492,-0.708765
> > 1,0.492,-0.708351
> > 1,0.493,-0.706871
> > 1,0.494,-0.704770
> > 1,0.494,-0.705908
> > 1,0.492,-0.709350
> > 1,0.493,-0.707285
> > 1,0.493,-0.706247
> > 1,0.493,-0.707522
> > 1,0.493,-0.707835
> > 1,0.492,-0.708317
> > 1,0.493,-0.707556
> > 1,0.492,-0.708520
> > 1,0.493,-0.707902
> > 1,0.494,-0.706220
> > 1,0.494,-0.705427
> > 1,0.494,-0.705393
> > 1,0.493,-0.706803
> > 1,0.493,-0.707210
> > 1,0.492,-0.708351
> > 1,0.492,-0.710146
> > 1,0.492,-0.708867
> > 1,0.494,-0.705183
> > 1,0.493,-0.708215
> > 1,0.494,-0.705942
> > 1,0.493,-0.706525
> > 1,0.492,-0.708385
> > 1,0.493,-0.706389
> > 1,0.494,-0.704811
> > 1,0.493,-0.706905
> > 1,0.493,-0.708249
> > 1,0.493,-0.707801
> > 1,0.493,-0.707835
> > 1,0.494,-0.705604
> > 1,0.493,-0.707319
> >
> > AUC = 0.80
> > confusion: [[50.0, 50.0], [0.0, 0.0]]
> > entropy: [[-0.7, -0.3], [-0.7, -0.4]]
> >
> > What kind of data is SGD suitable for?
> >
> > Thanks,
> > Rajesh
> >
> >
> > <iris-2-classes.csv>
> > <iris-3-classes.csv>
>
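
The quoted scores all sit just below 0.5, which may explain how an AUC of 0.80 and a one-sided confusion matrix can appear together: AUC depends only on how the scores rank the two classes, while the confusion matrix depends on where the cut-off falls. The sketch below illustrates this on ten rows copied from the quoted output. The 0.5 cut-off, the rows = predicted / columns = true layout, and the alternative cut-off of 0.4925 are assumptions made for illustration, and because the rows are a small rounded sample the AUC it prints will not match the reported 0.80.

```python
# A few (target, model-output) rows copied from the quoted runlogistic output.
rows = [
    (0, 0.492), (0, 0.493), (0, 0.491), (0, 0.490), (0, 0.495),
    (1, 0.492), (1, 0.494), (1, 0.493), (1, 0.495), (1, 0.494),
]

THRESHOLD = 0.5  # assumed cut-off for turning a score into a class label


def auc(rows):
    """Fraction of (class 1, class 0) pairs where class 1 scores higher; ties count half."""
    pos = [s for t, s in rows if t == 1]
    neg = [s for t, s in rows if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def confusion(rows, threshold):
    """2x2 counts; rows = predicted class, columns = true class (assumed layout)."""
    m = [[0, 0], [0, 0]]
    for target, score in rows:
        predicted = 1 if score > threshold else 0
        m[predicted][target] += 1
    return m


print("AUC on this sample:", auc(rows))
print("confusion at 0.5:", confusion(rows, THRESHOLD))
# Hypothetical lower cut-off, chosen only to show that the same scores can separate the classes.
print("confusion at 0.4925:", confusion(rows, 0.4925))
```

With every score below the cut-off, the second row of the matrix stays empty no matter how well the scores are ordered, which is the same shape as the matrices reported in this thread.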
