mahout-user mailing list archives

From Rajesh Nikam <rajeshni...@gmail.com>
Subject Fwd: SGD: Logistic regression package in Mahout
Date Tue, 16 Oct 2012 11:49:15 GMT
Hi Ted,

I thought this might be due to having only 100 instances for training.

So I created a test set with two classes and ~49K instances, and included
all features as predictors.
Please find attached sgd.grps.zip with the test file.

mahout trainlogistic --input /usr/local/mahout/trainme/sgd-grps.csv
--output /usr/local/mahout/trainme/sgd-grps.model --target class
--categories 2 --features 128 --types n --predictors a1 a2 a3 a4 a5 a6 a7
a8 a9 a10 a11 a12 a13 a14 a15 a16 a17 a18 a19 a20 a21 a22 a23 a24 a25 a26
a27 a28 a29 a30 a31 a32 a33 a34 a35 a36 a37 a38 a39 a40 a41 a42 a43 a44 a45
a46 a47 a48 a49 a50 a51 a52 a53 a54 a55 a56 a57 a58 a59 a60 a61 a62 a63 a64
a65 a66 a67 a68 a69 a70 a71 a72 a73 a74 a75 a76 a77 a78 a79 a80 a81 a82 a83
a84 a85 a86 a87 a88 a89 a90 a91 a92 a93 a94 a95 a96 a97 a98 a99 a100 a101
a102 a103 a104 a105 a106 a107 a108 a109 a110 a111 a112 a113 a114 a115 a116
a117 a118 a119 a120 a121 a122 a123 a124 a125 a126 a127
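
As an aside, the predictor list does not have to be typed by hand. A minimal sketch, assuming a POSIX shell with `seq` available and assuming the CSV columns really are named a1 through a127 as in the command above; the `predictors` variable name is only for illustration:

```shell
# Assumption: the CSV columns are named a1 ... a127, as in the command above.
# seq -f 'a%g' prints a1, a2, ... one per line; tr joins them with spaces.
predictors=$(seq -f 'a%g' 1 127 | tr '\n' ' ')

# Echo the command instead of running it, so the expanded list can be checked.
# $predictors is deliberately unquoted so each name becomes its own argument.
echo mahout trainlogistic \
  --input /usr/local/mahout/trainme/sgd-grps.csv \
  --output /usr/local/mahout/trainme/sgd-grps.model \
  --target class --categories 2 --features 128 --types n \
  --predictors $predictors
```

Dropping the leading `echo` would run the same training command as above.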


mahout runlogistic --input /usr/local/mahout/trainme/sgd-grps.csv --model
/usr/local/mahout/trainme/sgd-grps.model --auc --confusion

The results are still similar: it classifies everything as class_1.

AUC = 0.50
confusion: [[*26563.0, 23006.0*], [0.0, 0.0]]
entropy: [[-0.0, -0.0], [-46.1, -21.4]]
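
A quick sanity check on these numbers, as a sketch assuming a POSIX shell with `awk`; the variable names are only for illustration. It sums the two non-zero counts and reports the share of the larger one. Whichever way the rows and columns of the matrix are oriented, the all-zero second row means one class index never appears on that axis.

```shell
# Counts copied from the first row of the confusion matrix above.
first=26563
second=23006

# Total number of scored rows and the share held by the larger count.
summary=$(awk -v a="$first" -v b="$second" \
  'BEGIN { printf "total=%d share=%.4f", a + b, a / (a + b) }')
echo "$summary"
```

The total agrees with the ~49K instances in the test set, so every row was scored; the counts are simply confined to one row of the matrix.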

I am not sure why this keeps failing.

Looking forward to your reply.

Thanks
Rajesh



On Tue, Oct 16, 2012 at 3:57 AM, Ted Dunning <ted.dunning@gmail.com> wrote:

> I would love to help and will before long.  Just can't do it in the first
> part of this week.
>
> On Mon, Oct 15, 2012 at 6:28 AM, Rajesh Nikam <rajeshnikam@gmail.com>
> wrote:
>
> > Hello,
> >
> > I asked the question below on the Mahout forum about a problem with
> > using SGD.
> >
> > A similar issue with SGD is reported at
> >
> > http://stackoverflow.com/questions/11221436/using-sgd-classifier-in-mahout
> >
> > The link below shows similar output:
> >
> > AUC = 0.57
> > *confusion: [[27.0, 13.0], [0.0, 0.0]]*
> > entropy: [[-0.4, -0.3], [-1.2, -0.7]]
> >
> >
> > http://sujitpal.blogspot.in/2012/09/learning-mahout-classification.html
> >
> > I am still confused about how this model works and is used by so many.
> > I cannot find any pointers on how to use SGD to produce an effective
> > model.
> >
> > Could someone point out what is missing in the input file or in the
> > parameters I provided?
> >
> > I appreciate your help.
> >
> > Below is a description of the steps that I followed.
> >
> > Please find attached the input files used for the experiment.
> >
> > I am using the Iris Plants Database from Michael Marshall. PFA iris.arff.
> > I converted this to a CSV file just by updating the header:
> > iris-3-classes.csv
> >
> > mahout org.apache.mahout.classifier.sgd.TrainLogistic --input
> > /usr/local/mahout/trunk/*iris-3-classes.csv* --features 4 --output
> > /usr/local/mahout/trunk/*iris-3-classes.model* --target class
> > *--categories 3* --predictors sepallength sepalwidth petallength
> > petalwidth --types n
> >
> > >> It gave the following error.
> > Exception in thread "main" java.lang.IllegalArgumentException: Can only
> > call classifyScalar with two categories
> >
> > I then created a CSV with only 2 classes. PFA iris-2-classes.csv
> >
> > >> trained iris-2-classes.csv with sgd
> >
> > mahout org.apache.mahout.classifier.sgd.TrainLogistic --input
> > /usr/local/mahout/trunk/*iris-2-classes.csv* --features 4 --output
> > /usr/local/mahout/trunk/*iris-2-classes.model* --target class
> > *--categories 2* --predictors sepallength sepalwidth petallength
> > petalwidth --types n
> >
> > mahout runlogistic --input /usr/local/mahout/trunk/iris-2-classes.csv
> > --model /usr/local/mahout/trunk/iris-2-classes.model --auc --confusion
> >
> > AUC = 0.14
> > confusion: [[50.0, 50.0], [0.0, 0.0]]
> > entropy: [[-0.6, -0.3], [-0.8, -0.4]]
> >
> > >> The AUC seems too poor. I then changed --predictors.
> >
> > mahout org.apache.mahout.classifier.sgd.TrainLogistic --input
> > /usr/local/mahout/trunk/*iris-2-classes.csv* --features 4 --output
> > /usr/local/mahout/trunk/*iris-2-classes.model* --target class
> > *--categories 2* --predictors sepalwidth petallength --types n
> >
> > mahout runlogistic --input /usr/local/mahout/trunk/iris-2-classes.csv
> > --model /usr/local/mahout/trunk/iris-2-classes.model --auc --confusion
> > --scores
> >
> > AUC = 0.80
> > *confusion: [[50.0, 50.0], [0.0, 0.0]]*
> > entropy: [[-0.7, -0.3], [-0.7, -0.4]]
> >
> > This model classifies everything as category 1, which is of no use.
> >
> > Thanks
> > Rajesh
> >
> >
> >
> >
>
