mahout-user mailing list archives

From Rajesh Nikam <rajeshni...@gmail.com>
Subject Re: ** Problem using SGD and iris arff as test set **
Date Thu, 11 Oct 2012 08:58:33 GMT
What could be the problem with the data formatting?
Could you please post an update on this?
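
One way to rule out a hand-editing mistake in the ARFF-to-CSV step is to regenerate the CSV mechanically. The following is only a minimal sketch, not anything from Mahout: the function name arff_to_csv is hypothetical, and it assumes a simple ARFF file with unquoted attribute names, no sparse rows, and `%` comment lines.

```python
import csv
import io


def arff_to_csv(arff_text):
    """Convert simple ARFF text to CSV text with a header row.

    Assumes unquoted attribute names and dense, comma-separated data rows.
    """
    names = []
    rows = []
    in_data = False
    for raw in arff_text.splitlines():
        line = raw.strip()
        # Skip blank lines and ARFF comments.
        if not line or line.startswith("%"):
            continue
        lower = line.lower()
        if in_data:
            rows.append([field.strip() for field in line.split(",")])
        elif lower.startswith("@attribute"):
            # "@attribute <name> <type>": keep only the name.
            names.append(line.split()[1])
        elif lower.startswith("@data"):
            in_data = True
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(names)
    writer.writerows(rows)
    return out.getvalue()


sample = """% iris header, shortened to one data row
@relation iris
@attribute sepallength numeric
@attribute sepalwidth numeric
@attribute petallength numeric
@attribute petalwidth numeric
@attribute class {Iris-setosa,Iris-versicolor,Iris-virginica}
@data
5.1,3.5,1.4,0.2,Iris-setosa
"""

print(arff_to_csv(sample))
```

The header names produced this way are the ones passed to --predictors and --target in the commands quoted below, so any mismatch between the two would show up immediately.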

On Thu, Oct 11, 2012 at 11:31 AM, Ted Dunning <ted.dunning@gmail.com> wrote:

> My first thought was that we needed several passes, but that is clearly
> wrong.
>
> I think that the problem is in the data formatting and conversion somehow.
>  Haven't had time to dope this out just yet.  The iris data should converge
> trivially.
>
> On Wed, Oct 10, 2012 at 9:58 PM, Rajesh Nikam <rajeshnikam@gmail.com>
> wrote:
>
> > Thanks for looking into it.
> >
> > Actually, I first tried it with big data. Below is the model info for it.
> >
> > AUC = 0.50
> > confusion: [[1252978.0, 23003.0], [0.0, 0.0]]
> > entropy: [[-0.0, -0.0], [-46.1, -0.8]]
> >
> > Looking forward to your comments.
> >
> > Thanks
> > Rajesh
> >
> >
> > On Wed, Oct 10, 2012 at 8:08 PM, Ted Dunning <ted.dunning@gmail.com>
> > wrote:
> >
> > > SGD is more suitable for large data.  I will take a look later today.
> > >
> > > Sent from my iPhone
> > >
> > > On Oct 9, 2012, at 11:29 PM, Rajesh Nikam <rajeshnikam@gmail.com> wrote:
> > >
> > > > Hi Ted,
> > > >
> > > > Here is a specific question, with data, about a problem I am getting with SGD.
> > > >
> > > > I am using Iris Plants Database from Michael Marshall. PFA iris.arff.
> > > >
> > > > I converted this to a CSV file just by updating the header: iris-3-classes.csv
> > > >
> > > > mahout org.apache.mahout.classifier.sgd.TrainLogistic --input /usr/local/mahout/trunk/iris-3-classes.csv --features 4 --output /usr/local/mahout/trunk/iris-3-classes.model --target class --categories 3 --predictors sepallength sepalwidth petallength petalwidth --types n n
> > > >
> > > > >> It gave the following error:
> > > > Exception in thread "main" java.lang.IllegalArgumentException: Can only call classifyScalar with two categories
> > > >
> > > > I then created a CSV with only 2 classes. PFA iris-2-classes.csv.
> > > >
> > > > >> Trained on iris-2-classes.csv with SGD:
> > > >
> > > > mahout org.apache.mahout.classifier.sgd.TrainLogistic --input /usr/local/mahout/trunk/iris-2-classes.csv --features 4 --output /usr/local/mahout/trunk/iris-2-classes.model --target class --categories 2 --predictors sepallength sepalwidth petallength petalwidth --types n n
> > > >
> > > >
> > > > mahout runlogistic --input /usr/local/mahout/trunk/iris-2-classes.csv --model /usr/local/mahout/trunk/iris-2-classes.model --auc --confusion
> > > >
> > > > AUC = 0.14
> > > > confusion: [[50.0, 50.0], [0.0, 0.0]]
> > > > entropy: [[-0.6, -0.3], [-0.8, -0.4]]
> > > >
> > > > >> AUC seems too poor. Now changed --predictors:
> > > >
> > > > mahout org.apache.mahout.classifier.sgd.TrainLogistic --input /usr/local/mahout/trunk/iris-2-classes.csv --features 4 --output /usr/local/mahout/trunk/iris-2-classes.model --target class --categories 2 --predictors sepalwidth petallength --types n n
> > > >
> > > > mahout runlogistic --input /usr/local/mahout/trunk/iris-2-classes.csv --model /usr/local/mahout/trunk/iris-2-classes.model --auc --confusion --scores
> > > >
> > > > AUC = 0.80
> > > > confusion: [[50.0, 50.0], [0.0, 0.0]]
> > > > entropy: [[-0.7, -0.3], [-0.7, -0.4]]
> > > >
> > > > AUC has improved; however, from the confusion matrix it seems everything is classified as class a.
> > > >
> > > > Below is the output.
> > > >
> > > > "target","model-output","log-likelihood"
> > > > 0,0.492,-0.677017
> > > > 0,0.493,-0.679192
> > > > 0,0.493,-0.678355
> > > > 0,0.493,-0.678724
> > > > 0,0.492,-0.676583
> > > > 0,0.491,-0.675182
> > > > 0,0.492,-0.677452
> > > > 0,0.492,-0.677419
> > > > 0,0.493,-0.679628
> > > > 0,0.493,-0.678724
> > > > 0,0.491,-0.676116
> > > > 0,0.492,-0.677386
> > > > 0,0.493,-0.679192
> > > > 0,0.493,-0.679291
> > > > 0,0.491,-0.674912
> > > > 0,0.490,-0.673081
> > > > 0,0.491,-0.675313
> > > > 0,0.492,-0.677017
> > > > 0,0.491,-0.675616
> > > > 0,0.491,-0.675682
> > > > 0,0.492,-0.677353
> > > > 0,0.491,-0.676116
> > > > 0,0.492,-0.676714
> > > > 0,0.492,-0.677788
> > > > 0,0.492,-0.677287
> > > > 0,0.493,-0.679126
> > > > 0,0.492,-0.677386
> > > > 0,0.492,-0.676984
> > > > 0,0.492,-0.677452
> > > > 0,0.492,-0.678256
> > > > 0,0.493,-0.678691
> > > > 0,0.492,-0.677419
> > > > 0,0.491,-0.674381
> > > > 0,0.490,-0.673980
> > > > 0,0.493,-0.678724
> > > > 0,0.493,-0.678387
> > > > 0,0.492,-0.677050
> > > > 0,0.493,-0.678724
> > > > 0,0.493,-0.679225
> > > > 0,0.492,-0.677419
> > > > 0,0.492,-0.677050
> > > > 0,0.495,-0.682279
> > > > 0,0.493,-0.678355
> > > > 0,0.492,-0.676951
> > > > 0,0.491,-0.675550
> > > > 0,0.493,-0.679192
> > > > 0,0.491,-0.675649
> > > > 0,0.493,-0.678322
> > > > 0,0.491,-0.676116
> > > > 0,0.492,-0.677887
> > > > 1,0.492,-0.709316
> > > > 1,0.492,-0.709248
> > > > 1,0.492,-0.708935
> > > > 1,0.494,-0.705048
> > > > 1,0.493,-0.707488
> > > > 1,0.493,-0.707454
> > > > 1,0.492,-0.709765
> > > > 1,0.494,-0.705258
> > > > 1,0.493,-0.707936
> > > > 1,0.493,-0.706803
> > > > 1,0.495,-0.703539
> > > > 1,0.493,-0.708249
> > > > 1,0.494,-0.704601
> > > > 1,0.493,-0.707970
> > > > 1,0.493,-0.707597
> > > > 1,0.492,-0.708765
> > > > 1,0.492,-0.708351
> > > > 1,0.493,-0.706871
> > > > 1,0.494,-0.704770
> > > > 1,0.494,-0.705908
> > > > 1,0.492,-0.709350
> > > > 1,0.493,-0.707285
> > > > 1,0.493,-0.706247
> > > > 1,0.493,-0.707522
> > > > 1,0.493,-0.707835
> > > > 1,0.492,-0.708317
> > > > 1,0.493,-0.707556
> > > > 1,0.492,-0.708520
> > > > 1,0.493,-0.707902
> > > > 1,0.494,-0.706220
> > > > 1,0.494,-0.705427
> > > > 1,0.494,-0.705393
> > > > 1,0.493,-0.706803
> > > > 1,0.493,-0.707210
> > > > 1,0.492,-0.708351
> > > > 1,0.492,-0.710146
> > > > 1,0.492,-0.708867
> > > > 1,0.494,-0.705183
> > > > 1,0.493,-0.708215
> > > > 1,0.494,-0.705942
> > > > 1,0.493,-0.706525
> > > > 1,0.492,-0.708385
> > > > 1,0.493,-0.706389
> > > > 1,0.494,-0.704811
> > > > 1,0.493,-0.706905
> > > > 1,0.493,-0.708249
> > > > 1,0.493,-0.707801
> > > > 1,0.493,-0.707835
> > > > 1,0.494,-0.705604
> > > > 1,0.493,-0.707319
> > > >
> > > > AUC = 0.80
> > > > confusion: [[50.0, 50.0], [0.0, 0.0]]
> > > > entropy: [[-0.7, -0.3], [-0.7, -0.4]]
> > > >
> > > > What kind of data is SGD suitable for?
> > > >
> > > > Thanks,
> > > > Rajesh
> > > >
> > > >
> > > > <iris-2-classes.csv>
> > > > <iris-3-classes.csv>
> > >
> >
>
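
For readers following the quoted --scores listing, the numbers appear consistent with the log-likelihood column being log(p) for target 1 and log(1 - p) for target 0, where p is the model-output column. The sketch below checks that reading against four rows copied from the listing. It also illustrates, with clearly hypothetical scores, why a rank-based AUC can be high while every row falls on one side of a 0.5 cutoff. The 0.5 cutoff is an assumption about how the confusion matrix was built, not something stated in the thread, and the helper names are made up for illustration.

```python
import math

# (target, model-output, log-likelihood) rows copied from the quoted listing.
listed_rows = [
    (0, 0.492, -0.677017),
    (0, 0.495, -0.682279),
    (1, 0.492, -0.709316),
    (1, 0.495, -0.703539),
]


def log_likelihood(target, p):
    """Bernoulli log-likelihood of one example given predicted P(target = 1)."""
    return math.log(p) if target == 1 else math.log(1.0 - p)


def predicted_class(p, threshold=0.5):
    """Assumed decision rule: class 1 only when the score exceeds the cutoff."""
    return 1 if p > threshold else 0


def auc(scored):
    """Rank-based AUC: fraction of (positive, negative) pairs ordered correctly."""
    pos = [p for t, p in scored if t == 1]
    neg = [p for t, p in scored if t == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


for target, p, listed in listed_rows:
    # The listing rounds p to three decimals, so expect agreement to about 1e-3.
    print(target, p, listed, round(log_likelihood(target, p), 6), predicted_class(p))

# Hypothetical scores (not from the thread): perfectly ranked, yet all below 0.5.
hypothetical = [(0, 0.490), (0, 0.491), (1, 0.493), (1, 0.494)]
print(auc(hypothetical), [predicted_class(p) for _, p in hypothetical])
```

Since every listed score sits just below 0.5, a 0.5 cutoff would put all 100 rows in a single predicted class, which matches the shape of the reported confusion matrix. Scores this close to 0.5 also suggest the model weights have barely moved from their starting point.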
