mahout-user mailing list archives

From Vijay Santhanam
Subject Using naive bayes classification with continuous, categorical and word-like features
Date Mon, 04 Jul 2011 15:01:53 GMT

I'm new to Mahout and to many machine learning ideas, but from my reading of
the Wikipedia entry on the Naive Bayes classifier, it should be possible to
train a Naive Bayes model with continuous, categorical and word-like features.

As far as I can tell, the 20news and wikipedia examples currently in Mahout
only use a categorical target variable and text-like variables.

I'm trying to replicate the person-gender-guesser from the Wikipedia
article using Mahout.
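
For reference, the continuous-feature part of such a gender guesser comes down
to Gaussian Naive Bayes: fit a mean and variance per class and per feature, then
pick the class with the highest log prior plus summed log densities. The
following is only a minimal plain-Java sketch of that idea. It does not use any
Mahout API, the class name GaussianNaiveBayesSketch and the helper toyModel()
are hypothetical, and the numbers (height, weight, foot size) are illustrative
toy values.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Gaussian Naive Bayes sketch in plain Java (hypothetical class, no Mahout classes used). */
class GaussianNaiveBayesSketch {
    // Per-class statistics, one array entry per continuous feature.
    private final Map<String, double[]> means = new LinkedHashMap<>();
    private final Map<String, double[]> variances = new LinkedHashMap<>();
    private final Map<String, Double> logPriors = new LinkedHashMap<>();

    /** Fits a mean and a sample variance for every (class, feature) pair. */
    void train(Map<String, double[][]> rowsByLabel) {
        int total = 0;
        for (double[][] rows : rowsByLabel.values()) {
            total += rows.length;
        }
        for (Map.Entry<String, double[][]> entry : rowsByLabel.entrySet()) {
            double[][] rows = entry.getValue();
            int dims = rows[0].length;
            double[] mean = new double[dims];
            double[] var = new double[dims];
            for (double[] row : rows) {
                for (int j = 0; j < dims; j++) {
                    mean[j] += row[j] / rows.length;
                }
            }
            for (double[] row : rows) {
                for (int j = 0; j < dims; j++) {
                    double diff = row[j] - mean[j];
                    var[j] += diff * diff / (rows.length - 1); // needs >= 2 rows per class
                }
            }
            means.put(entry.getKey(), mean);
            variances.put(entry.getKey(), var);
            logPriors.put(entry.getKey(), Math.log((double) rows.length / total));
        }
    }

    /** Returns the label with the highest log prior plus summed Gaussian log densities. */
    String classify(double[] x) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String label : means.keySet()) {
            double[] mean = means.get(label);
            double[] var = variances.get(label);
            double score = logPriors.get(label);
            for (int j = 0; j < x.length; j++) {
                double diff = x[j] - mean[j];
                score += -0.5 * Math.log(2 * Math.PI * var[j]) - diff * diff / (2 * var[j]);
            }
            if (score > bestScore) {
                bestScore = score;
                best = label;
            }
        }
        return best;
    }

    /** Hypothetical helper: trains on illustrative toy rows of {height, weight, foot size}. */
    static GaussianNaiveBayesSketch toyModel() {
        Map<String, double[][]> data = new LinkedHashMap<>();
        data.put("male", new double[][] {
            {6.00, 180, 12}, {5.92, 190, 11}, {5.58, 170, 12}, {5.92, 165, 10}});
        data.put("female", new double[][] {
            {5.00, 100, 6}, {5.50, 150, 8}, {5.42, 130, 7}, {5.75, 150, 9}});
        GaussianNaiveBayesSketch model = new GaussianNaiveBayesSketch();
        model.train(data);
        return model;
    }

    public static void main(String[] args) {
        System.out.println(toyModel().classify(new double[] {6.0, 130, 8}));
    }
}
```

Categorical and word-like features would need a different per-feature
likelihood (for example smoothed frequency counts) added to the same per-class
score; that part is not covered by this sketch.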

Can anyone give me any tips about how to:
* format input files (train and test) for different data types
* inform the trainer and classifier which features are continuous,
categorical and word-like

My dataset is quite small, so I'd like to be able to process it in code
(using Vectors, Models, etc.), but I'm quite confused about how to use the
classifier.bayes packages to train or create a model with all my feature types.

Thanks in advance for any guidance.

 Vijay Santhanam
 Software Engineer
