How many training examples do you have?
Sounds like you have very few. That is definitely not the sweet spot for
online regression.
In any case, can you post your test code to GitHub or something?
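
For reference, here is a minimal plain-Python sketch of logistic regression trained by SGD on a data set this small. It is not Mahout's OnlineLogisticRegression and does not reproduce its parameters. The eight rows are the training data as recalled from the Wikipedia gender example (worth checking against the article), and the feature standardization, learning rate, epoch count and seed are all assumptions chosen for illustration.

```python
import math
import random

# Training rows (height ft, weight lbs, foot size in) as recalled from the
# Wikipedia Naive Bayes gender example; label 1 = male, 0 = female.
ROWS = [
    ((6.00, 180, 12), 1), ((5.92, 190, 11), 1),
    ((5.58, 170, 12), 1), ((5.92, 165, 10), 1),
    ((5.00, 100, 6), 0), ((5.50, 150, 8), 0),
    ((5.42, 130, 7), 0), ((5.75, 150, 9), 0),
]

def standardize(rows):
    """Rescale each feature to zero mean and unit variance (an assumption,
    so that weight in pounds does not dominate the gradient)."""
    cols = list(zip(*(x for x, _ in rows)))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c))
            for c, m in zip(cols, means)]
    scaled = [([(v - m) / s for v, m, s in zip(x, means, stds)], y)
              for x, y in rows]
    return scaled, means, stds

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_loss(rows, w, b):
    """Mean negative log-likelihood of the labels under the model."""
    total = 0.0
    for x, y in rows:
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
        p = min(max(p, 1e-12), 1 - 1e-12)  # guard against log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(rows)

def train_sgd(rows, epochs=500, rate=0.1, seed=0):
    """One gradient step per example, in a freshly shuffled order each epoch."""
    rng = random.Random(seed)
    w, b = [0.0] * len(rows[0][0]), 0.0
    order = list(rows)
    for _ in range(epochs):
        rng.shuffle(order)
        for x, y in order:
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            w = [wi - rate * err * xi for wi, xi in zip(w, x)]
            b -= rate * err
    return w, b

scaled, means, stds = standardize(ROWS)
w, b = train_sgd(scaled)
print("mean log loss before training:", log_loss(scaled, [0.0] * 3, 0.0))
print("mean log loss after training: ", log_loss(scaled, w, b))
```

A new sample has to be rescaled with the same `means` and `stds` before it is scored.
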
On Mon, Jul 4, 2011 at 11:46 AM, Vijay Santhanam <vijay.santhanam@gmail.com> wrote:
> Thank you Ted
>
> However, even with the default OnlineLogisticRegression I'm unable to get
> acceptable results when trying to replicate the gender guesser discussed
> in the example at http://en.wikipedia.org/wiki/Naive_Bayes_classifier
>
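
As a baseline for the example referenced above, here is a minimal sketch of the Gaussian Naive Bayes calculation the Wikipedia article walks through, in plain Python rather than Mahout. The training rows and the test sample are as recalled from that article and should be checked against it; the function names are hypothetical.

```python
import math
from statistics import mean, variance

# (height ft, weight lbs, foot size in) per class, as recalled from the
# Wikipedia Naive Bayes gender example.
TRAIN = {
    "male": [(6.00, 180, 12), (5.92, 190, 11), (5.58, 170, 12), (5.92, 165, 10)],
    "female": [(5.00, 100, 6), (5.50, 150, 8), (5.42, 130, 7), (5.75, 150, 9)],
}

def fit(train):
    """Per class: prior plus (mean, sample variance) for each feature."""
    total = sum(len(rows) for rows in train.values())
    model = {}
    for label, rows in train.items():
        stats = [(mean(col), variance(col)) for col in zip(*rows)]
        model[label] = (len(rows) / total, stats)
    return model

def log_gaussian(x, mu, var):
    """Log of the normal density with mean mu and variance var at x."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)

def classify(model, x):
    """Pick the class with the largest log prior + summed log likelihoods."""
    scores = {
        label: math.log(prior)
        + sum(log_gaussian(v, mu, var) for v, (mu, var) in zip(x, stats))
        for label, (prior, stats) in model.items()
    }
    return max(scores, key=scores.get)

model = fit(TRAIN)
print(classify(model, (6.0, 130, 8)))  # prints "female"
```

Each continuous feature is modelled with one normal distribution per class, so no binning step is needed in this variant.
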
> For that particular problem, do you recommend a binning/discretization
> approach with Naive Bayes, or should I continue trying to fine-tune the
> SGD algorithm?
>
> At this stage, I'm just hopelessly guessing parameters for
> OnlineLogisticRegression. Even when I iterate over the same data set many
> thousands of times, I'm unable to get a suitable model that can pick female
> or male from height, weight and shoe size.
>
> Thanks again for taking the time to answer me.
>
> V
>
>
> On Tue, Jul 5, 2011 at 4:30 AM, Ted Dunning <ted.dunning@gmail.com> wrote:
>
> > The Wikipedia page recommends binning if you have a large amount of data
> > and a supervised variable extraction method if not. These are both ways
> > of preprocessing to discretize continuous variables.
> >
> > On Mon, Jul 4, 2011 at 11:28 AM, Ted Dunning <ted.dunning@gmail.com> wrote:
> >
> > > The Mahout implementation of Naive Bayes does not use continuous
> > > variables well. The best bet is to discretize these variables, either
> > > individually or together using k-means, and then use the discrete
> > > version for the classifier.
> > >
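
The "individually" option above can be sketched as equal-width binning of one variable at a time. This is a plain-Python illustration, not a Mahout API; the function names and the choice of three bins are assumptions, the weights are as recalled from the Wikipedia example, and the k-means variant is not shown.

```python
from bisect import bisect_right

def equal_width_edges(values, bins):
    """Interior cut points that split [min, max] into `bins` equal intervals."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / bins
    return [lo + step * i for i in range(1, bins)]

def discretize(value, edges):
    """Index of the bin that `value` falls into (0 .. len(edges))."""
    return bisect_right(edges, value)

weights = [180, 190, 170, 165, 100, 150, 130, 150]
edges = equal_width_edges(weights, 3)  # cut points at 130.0 and 160.0
print([discretize(w, edges) for w in weights])
# prints [2, 2, 2, 2, 0, 1, 1, 1]
```

The resulting bin indices can then be treated as categorical values by a classifier that expects discrete features.
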
> > > The random forest implementation and the SGD implementation are both
> > > happier with continuous variables.
> > >
> > >
> > > On Mon, Jul 4, 2011 at 8:01 AM, Vijay Santhanam <vijay.santhanam@gmail.com> wrote:
> > >
> > >> Hi,
> > >>
> > >> I'm new to Mahout and to many of the machine learning ideas, but from
> > >> my understanding of the Wikipedia entry, it's possible to train a
> > >> Naive Bayes model with continuous, categorical and wordlike features:
> > >> http://en.wikipedia.org/wiki/Naive_Bayes_classifier
> > >>
> > >> From what I gather, the 20news and wikipedia examples currently in
> > >> Mahout only use a categorical target variable and textlike variables.
> > >>
> > >> I'm trying to replicate the person gender guesser used in the
> > >> Wikipedia article using Mahout.
> > >>
> > >> Can anyone give me any tips about how to:
> > >> * format input files (train and test) for different data types
> > >> * inform the trainer and classifier which features are continuous,
> > >> categorical and wordlike
> > >>
> > >> My dataset is quite small, so I'd like to be able to process this in
> > >> code (using Vectors, Models, etc.), but I'm quite confused about how to
> > >> use the classifier.bayes packages to train/create a model with all my
> > >> feature types.
> > >>
> > >> Thanks in advance for any guidance.
> > >>
> > >> Cheers,
> > >> 
> > >> Vijay Santhanam
> > >> Software Engineer
> > >> http://au.linkedin.com/in/vijaysanthanam
> > >> 0407525087
> > >>
> > >
> > >
> >
