mahout-user mailing list archives

From Vijay Santhanam <vijay.santha...@gmail.com>
Subject Re: Using naive bayes classification with continuous, categorical and word-like features
Date Mon, 04 Jul 2011 18:46:56 GMT
Thank you Ted

However, even with the default OnlineLogisticRegression settings I'm unable
to get acceptable results when trying to replicate the gender guesser from
the example at http://en.wikipedia.org/wiki/Naive_Bayes_classifier

For that particular problem, do you recommend that I take a
binning/discretization approach with naive Bayes, or should I continue
trying to fine-tune the SGD algorithm?
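
For reference, a minimal sketch of the binning/discretization option, in
plain Java with no Mahout dependency. The class name EqualWidthBinner, the
equal-width strategy and the "feature_binN" token format are all assumptions
made for illustration; nothing here is part of Mahout's API. The idea is
that each continuous value becomes a discrete token, which a classifier
built for word-like features can then consume.

```java
final class EqualWidthBinner {
    private final double min;
    private final double width;
    private final int bins;

    // Learns the value range from training data and splits it into `bins` equal intervals.
    EqualWidthBinner(double[] values, int bins) {
        if (values.length == 0 || bins < 1) {
            throw new IllegalArgumentException("need at least one value and one bin");
        }
        double lo = values[0];
        double hi = values[0];
        for (double v : values) {
            lo = Math.min(lo, v);
            hi = Math.max(hi, v);
        }
        this.min = lo;
        this.bins = bins;
        this.width = (hi - lo) / bins;
    }

    // Returns the bin index in [0, bins - 1]; out-of-range values are clamped to the edge bins.
    int bin(double x) {
        if (width == 0.0) {
            return 0; // constant feature: everything falls into one bin
        }
        int index = (int) Math.floor((x - min) / width);
        return Math.max(0, Math.min(bins - 1, index));
    }

    // Turns a continuous value into a word-like token (hypothetical naming scheme).
    String token(String feature, double x) {
        return feature + "_bin" + bin(x);
    }
}
```

With weights ranging from 100 to 200 and 4 bins, each bin is 25 wide, so a
weight of 130 maps to the token weight_bin1. With a very small data set the
number of bins has to stay small, or most bins will hold no training rows.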

At this stage, I'm just guessing at parameters for OnlineLogisticRegression.
Even when I iterate over the same data set many thousands of times, I'm
unable to get a model that can tell female from male given height, weight
and shoe size.
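
One thing that may be worth ruling out with SGD is feature scale: weight is
in the hundreds while height is around 5 to 6, and a single learning rate
tends to suit one scale badly. The sketch below is plain Java, not Mahout's
OnlineLogisticRegression. It standardizes each column and then runs a
hand-written SGD logistic regression. The class name, the learning rate and
the epoch count are assumptions, and the eight rows are the values as
recalled from the Wikipedia example, so treat them as illustrative.

```java
final class ScaledLogisticSketch {
    // Columns: height (ft), weight (lb), foot size (in). Illustrative values.
    static final double[][] X = {
        {6.00, 180, 12}, {5.92, 190, 11}, {5.58, 170, 12}, {5.92, 165, 10},
        {5.00, 100, 6},  {5.50, 150, 8},  {5.42, 130, 7},  {5.75, 150, 9}
    };
    static final int[] Y = {1, 1, 1, 1, 0, 0, 0, 0}; // 1 = male, 0 = female

    private final double[] mean;
    private final double[] std;
    private final double[] w;
    private double bias;

    ScaledLogisticSketch(double[][] x, int[] y, int epochs, double rate) {
        int d = x[0].length;
        mean = new double[d];
        std = new double[d];
        w = new double[d];
        // Per-column mean and standard deviation, computed on the training rows only.
        for (int j = 0; j < d; j++) {
            double sum = 0.0;
            for (double[] row : x) sum += row[j];
            mean[j] = sum / x.length;
            double sq = 0.0;
            for (double[] row : x) sq += (row[j] - mean[j]) * (row[j] - mean[j]);
            double s = Math.sqrt(sq / x.length);
            std[j] = s > 0.0 ? s : 1.0; // avoid dividing by zero on constant columns
        }
        // Plain SGD on the logistic log-likelihood, one row at a time.
        for (int e = 0; e < epochs; e++) {
            for (int i = 0; i < x.length; i++) {
                double[] z = scale(x[i]);
                double err = y[i] - sigmoid(score(z));
                for (int j = 0; j < d; j++) w[j] += rate * err * z[j];
                bias += rate * err;
            }
        }
    }

    // Maps a raw row to zero-mean, unit-variance coordinates.
    double[] scale(double[] raw) {
        double[] z = new double[raw.length];
        for (int j = 0; j < raw.length; j++) z[j] = (raw[j] - mean[j]) / std[j];
        return z;
    }

    private double score(double[] z) {
        double s = bias;
        for (int j = 0; j < z.length; j++) s += w[j] * z[j];
        return s;
    }

    private static double sigmoid(double t) {
        return 1.0 / (1.0 + Math.exp(-t));
    }

    // Estimated probability that a raw (unscaled) row belongs to class 1.
    double probabilityMale(double[] raw) {
        return sigmoid(score(scale(raw)));
    }
}
```

A call such as new ScaledLogisticSketch(X, Y, 5000, 0.1) trains the model,
and probabilityMale(new double[] {6, 130, 8}) scores a new person. The same
scaling would have to be applied to the values before they are put into
whatever vector the real learner is trained on.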

Thanks again for taking the time to answer me.

-V


On Tue, Jul 5, 2011 at 4:30 AM, Ted Dunning <ted.dunning@gmail.com> wrote:

> The wikipedia page recommends binning if you have a large amount of data and
> a supervised variable extraction method if not.  These are both ways of
> preprocessing to discretize continuous variables.
>
> On Mon, Jul 4, 2011 at 11:28 AM, Ted Dunning <ted.dunning@gmail.com> wrote:
>
> > The mahout implementation of Naive_Bayes does not use continuous variables
> > well.  The best bet is to discretize these variables either individually or
> > together using k-means.  Then use the discrete version for the classifier.
> >
> > The random forest implementation and the SGD implementation are both
> > happier with continuous variables.
> >
> >
> > On Mon, Jul 4, 2011 at 8:01 AM, Vijay Santhanam <vijay.santhanam@gmail.com> wrote:
> >
> >> Hi,
> >>
> >> I'm new to Mahout and many of the machine learning ideas, but from what I
> >> understand of Naive Bayes classifier, it's possible to train a Naive Bayes
> >> model with continuous, categorical and word-like features from my
> >> understanding of the wikipedia entry
> >> http://en.wikipedia.org/wiki/Naive_Bayes_classifier
> >>
> >> The 20news and wikipedia examples currently in mahout, from what I gather,
> >> only use a target categorical variable and text-like variables.
> >>
> >> I'm trying to replicate the person-gender-guesser used in the wikipedia
> >> article using mahout.
> >>
> >> Can anyone give me any tips about how to:
> >> * format input files (train and test) for different data types
> >> * inform the trainer and classifier which features are continuous,
> >> categorical and word-like
> >>
> >> My dataset is quite small, so I'd like to be able to process this in code
> >> (using Vectors, Models, etc), but I'm quite confused about how to use the
> >> classifier.bayes packages to train/create model with all my feature types.
> >>
> >> Thanks in advance for any guidance.
> >>
> >> Cheers,
> >> --
> >>  Vijay Santhanam
> >>  Software Engineer
> >>  http://au.linkedin.com/in/vijaysanthanam
> >>  0407525087
> >>
> >
> >
>



-- 
 Vijay Santhanam
 Software Engineer
 http://au.linkedin.com/in/vijaysanthanam
 0407525087
