mahout-user mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: Mahout SGD / Bayes prediction results over 20newsgroups
Date Fri, 30 Dec 2011 18:42:27 GMT
Thanks.

This confirms my suspicions that the AdaptiveLogisticRegression has
regressed somehow.

I am munching on the pig interfaces right now and should get back to this
before too long.

On Fri, Dec 30, 2011 at 10:18 AM, Josh Patterson <josh@cloudera.com> wrote:

> I'm working with free-form text (tweets), which are blobs of text much like
> the newsgroups dataset.
>
> I was stuck in the 60s as well and then tried tuning the parameters.
> What got me up into the upper 70s was setting the "-features" param
> higher (I started at 20 and moved up to 200 to get 76%).
>
> Hope that helps. Tuning parameters is always something of an art in ML,
> and it can be time-consuming.
>
> JP
>
> On Thu, Dec 22, 2011 at 1:46 AM, Sreejith S <srssreejith@gmail.com> wrote:
> > On Thu, Dec 22, 2011 at 12:04 PM, Lance Norskog <goksron@gmail.com> wrote:
> >
> >> The Bayes classifier in the examples doesn't work very well on the
> >> 20 newsgroups example. Something is wrong in the data ETL, the tuning
> >> options, or the Bayes implementation.
> >>
> >> On Wed, Dec 21, 2011 at 10:18 PM, Ted Dunning <ted.dunning@gmail.com> wrote:
> >> > 97% is not correct.  This sounds like you ran it on the training data.
> >>
> >
> > @Ted, yes, I ran it on the same training data.
> >
> >
> >> >
> >> > 63% also sounds low.  I don't know what happened there.
> >>
> >
> > Has anyone tested SGD on the same 20newsgroups data and gotten better results?
> >
> >> >
> >> > On Wed, Dec 21, 2011 at 9:26 PM, Sreejith S <srssreejith@gmail.com> wrote:
> >> >
> >> >> Hi all,
> >> >>
> >> >> I compared the SGD and Bayes classifiers on the 20news-bydate
> >> >> dataset:
> >> >>
> >> >> http://people.csail.mit.edu/jrennie/20Newsgroups/20news-bydate.tar.gz
> >> >>
> >> >> The classifier results and confusion matrices are confusing to me,
> >> >> since SGD is said to be better for small datasets and Bayes for large
> >> >> datasets. Please check my test scenario: http://pastebin.com/K0cy0ayk
> >> >>
> >> >> It seems that even on a small dataset like 20news-bydate, Bayes gives
> >> >> 97% accuracy and SGD gives 63% :(
> >> >> Am I missing something? Please clarify.
> >> >>
> >> >> Thank You,
> >> >> --
> >> >>
> >> >>
> >> >> *Sreejith.S*
> >> >> http://srijiths.wordpress.com/
> >> >> http://sreejiths.emurse.com/
> >> >>
> >> >> tweet2sree@twitter <http://tweet2Sree>
> >> >>
> >>
> >>
> >>
> >> --
> >> Lance Norskog
> >> goksron@gmail.com
> >>
> >
> >
> >
>
>
>
> --
> Twitter: @jpatanooga
> Solution Architect @ Cloudera
> hadoop: http://www.cloudera.com
>
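Ted's diagnosis of the 97% figure, which Sreejith confirmed, comes down to a general rule: accuracy has to be measured on documents the classifier never saw during training. The following is a minimal sketch that assumes a toy multinomial Naive Bayes written in plain Python; it is not Mahout's Bayes implementation. The function names (`train_nb`, `predict_nb`, `accuracy`) and the four-document corpus are hypothetical, chosen only to show how training-set accuracy can overstate held-out accuracy.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercased whitespace split; a real pipeline would also strip headers, punctuation, etc.
    return text.lower().split()


def train_nb(docs, labels):
    # Multinomial Naive Bayes: per-class document counts and per-class word counts.
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for doc, label in zip(docs, labels):
        tokens = tokenize(doc)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict_nb(model, doc):
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        # Log prior plus log likelihoods with add-one (Laplace) smoothing.
        score = math.log(n_docs / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokenize(doc):
            score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, docs, labels):
    correct = sum(predict_nb(model, d) == y for d, y in zip(docs, labels))
    return correct / len(labels)


# Hypothetical toy corpus: two classes, four training documents.
TRAIN_DOCS = ["the goalie made a great save", "hockey playoffs tonight",
              "new gpu drivers released", "compile the kernel module"]
TRAIN_LABELS = ["sport", "sport", "comp", "comp"]
# Held-out documents that are never passed to train_nb.
TEST_DOCS = ["the playoffs save", "save the file"]
TEST_LABELS = ["sport", "comp"]

if __name__ == "__main__":
    model = train_nb(TRAIN_DOCS, TRAIN_LABELS)
    print("training accuracy:", accuracy(model, TRAIN_DOCS, TRAIN_LABELS))  # 1.0
    print("held-out accuracy:", accuracy(model, TEST_DOCS, TEST_LABELS))    # 0.5
```

On the toy data the model scores 1.0 on its own training documents but only 0.5 on the two held-out ones, because "save" appeared only in a sports document during training. The 20news-bydate archive already separates a training split from a test split, so the test split is the one to score against.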

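Josh's tip about raising "-features" from 20 to 200 fits how feature hashing works. This sketch assumes the flag sets the size of the hashed feature vector, as the `--features` option appears to do in Mahout's SGD command-line tools. Under that assumption, every distinct word is folded into a fixed number of slots, and with only 20 slots most words must share a slot with others, so the model cannot tell them apart. The helpers below (`hashed_index`, `encode`, `colliding_tokens`) are hypothetical, use a single probe per word, and use `zlib.crc32` as a stand-in for whatever hash the real encoder uses.

```python
import zlib


def hashed_index(token: str, num_features: int) -> int:
    # crc32 is a deterministic stand-in for the encoder's real hash function;
    # Python's built-in hash() is salted per process, so it is avoided here.
    return zlib.crc32(token.encode("utf-8")) % num_features


def encode(tokens: list[str], num_features: int) -> list[float]:
    # Bag-of-words counts folded into a fixed-size vector (one probe per token).
    vec = [0.0] * num_features
    for tok in tokens:
        vec[hashed_index(tok, num_features)] += 1.0
    return vec


def colliding_tokens(vocab: list[str], num_features: int) -> int:
    # Number of distinct tokens that share their slot with another distinct token.
    buckets: dict[int, set[str]] = {}
    for tok in set(vocab):
        buckets.setdefault(hashed_index(tok, num_features), set()).add(tok)
    return sum(len(b) for b in buckets.values() if len(b) > 1)


if __name__ == "__main__":
    vocab = [f"w{i}" for i in range(50)]  # 50 hypothetical distinct words
    for n in (20, 200):
        print(n, "slots ->", colliding_tokens(vocab, n), "colliding words")
```

Whatever hash is used, the pigeonhole principle guarantees that with 50 distinct words and 20 slots at least 30 of them collide, since at most 20 words can have a slot to themselves. With 200 slots collisions become far less likely, which is one plausible reason the larger setting helped.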