Thanks for looking into it.
Actually, I first tried it with a large data set. Below is the model output for that run.
AUC = 0.50
confusion: [[1252978.0, 23003.0], [0.0, 0.0]]
entropy: [[0.0, 0.0], [46.1, 0.8]]
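As a quick sanity check on the numbers above, here is a minimal Python sketch (not Mahout code) that sums the rows and columns of the confusion matrix. I have not confirmed which axis is actual and which is predicted, so the sketch only reports totals and which row or column is empty. The function name summarize_confusion is made up for illustration.

```python
def summarize_confusion(matrix):
    """Summarize a confusion matrix given as nested lists.

    No assumption is made about which axis is "actual" and which is
    "predicted"; only totals and empty rows/columns are reported.
    """
    row_sums = [sum(row) for row in matrix]
    col_sums = [sum(col) for col in zip(*matrix)]
    total = sum(row_sums)
    return {
        "total": total,
        "row_sums": row_sums,
        "col_sums": col_sums,
        # Indices of rows/columns that received no records at all.
        "empty_rows": [i for i, s in enumerate(row_sums) if s == 0],
        "empty_cols": [j for j, s in enumerate(col_sums) if s == 0],
    }


# Matrix copied from the large-data run above.
big_run = [[1252978.0, 23003.0], [0.0, 0.0]]
summary = summarize_confusion(big_run)
print("total records:", summary["total"])
print("row sums:", summary["row_sums"])
print("column sums:", summary["col_sums"])
print("empty rows:", summary["empty_rows"])
```

For this run the second row is entirely zero, the same pattern as in the Iris runs quoted below, and the second column holds roughly 1.8% of the records. Together with AUC = 0.50, which is what a random ranking would give, this suggests the model is not separating the two classes at all.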
Looking forward to your comments.
Thanks
Rajesh
On Wed, Oct 10, 2012 at 8:08 PM, Ted Dunning <ted.dunning@gmail.com> wrote:
> SGD is more suitable for large data sets. I will take a look later today.
> On Oct 9, 2012, at 11:29 PM, Rajesh Nikam <rajeshnikam@gmail.com> wrote:
> > Hi Ted,
> >
> > Here is a specific question, with data, about the problem I am having with SGD.
> >
> > I am using the Iris Plants Database from Michael Marshall. Please find iris.arff attached.
> >
> > I converted this to a CSV file just by updating the header: iris3classes.csv
> >
> > mahout org.apache.mahout.classifier.sgd.TrainLogistic \
> >   --input /usr/local/mahout/trunk/iris3classes.csv --features 4 \
> >   --output /usr/local/mahout/trunk/iris3classes.model \
> >   --target class --categories 3 \
> >   --predictors sepallength sepalwidth petallength petalwidth --types n n
> > >> It gave the following error:
> > Exception in thread "main" java.lang.IllegalArgumentException: Can only call classifyScalar with two categories
> >
> > I then created a CSV with only 2 classes. Please find iris2classes.csv attached.
> >
> > >> Trained on iris2classes.csv with SGD:
> >
> > mahout org.apache.mahout.classifier.sgd.TrainLogistic \
> >   --input /usr/local/mahout/trunk/iris2classes.csv --features 4 \
> >   --output /usr/local/mahout/trunk/iris2classes.model \
> >   --target class --categories 2 \
> >   --predictors sepallength sepalwidth petallength petalwidth --types n n
> > mahout runlogistic --input /usr/local/mahout/trunk/iris2classes.csv \
> >   --model /usr/local/mahout/trunk/iris2classes.model --auc --confusion
> > AUC = 0.14
> > confusion: [[50.0, 50.0], [0.0, 0.0]]
> > entropy: [[0.6, 0.3], [0.8, 0.4]]
> > >> The AUC seems too poor. I then changed the predictors:
> >
> > mahout org.apache.mahout.classifier.sgd.TrainLogistic \
> >   --input /usr/local/mahout/trunk/iris2classes.csv --features 4 \
> >   --output /usr/local/mahout/trunk/iris2classes.model \
> >   --target class --categories 2 \
> >   --predictors sepalwidth petallength --types n n
> > mahout runlogistic --input /usr/local/mahout/trunk/iris2classes.csv \
> >   --model /usr/local/mahout/trunk/iris2classes.model --auc --confusion \
> >   --scores
> > AUC = 0.80
> > confusion: [[50.0, 50.0], [0.0, 0.0]]
> > entropy: [[0.7, 0.3], [0.7, 0.4]]
> > The AUC has improved; however, the confusion matrix suggests that everything is classified as class a.
> >
> > What kind of data is SGD suitable for?
> >
> > Thanks,
> > Rajesh
