mahout-user mailing list archives

From Josh Patterson <j...@cloudera.com>
Subject Re: NaiveBayes and Classification of non-documents
Date Thu, 02 Jun 2011 15:23:18 GMT
Yeah, if you look at the W-N-IDF stuff that the Mahout version adds
into NB, it's built for a "bag of words", as opposed to the training
instances set up in Weka for Weka's impl. Weka's impl is more
generalized, whereas the Mahout impl, afaik, is built specifically for
docs.
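
For illustration, here is a minimal Python sketch of how a non-document record could be turned into a "bag of words": each feature becomes one token, and numeric features are binned first. The helper names, the bin edges, and the `name_value` token format are all assumptions made for this example; none of this is Mahout or Weka API, and how Mahout's own tools would tokenize such strings is not covered here.

```python
# Hypothetical helpers (not part of Mahout or Weka): turn one tabular record
# into a whitespace-separated "document" of tokens for a bag-of-words model.

def bin_numeric(value, edges):
    """Return the index of the first bin whose upper edge exceeds value."""
    for i, edge in enumerate(edges):
        if value < edge:
            return i
    return len(edges)  # value is at or above the last edge


def record_to_tokens(record, numeric_edges):
    """Map each feature to a 'name_value' token; numeric features are binned first."""
    tokens = []
    for name, value in record.items():
        if name in numeric_edges:
            tokens.append(f"{name}_bin{bin_numeric(value, numeric_edges[name])}")
        else:
            tokens.append(f"{name}_{value}")
    return tokens


# Assumed bin edges; in practice they would be derived from the data (e.g. quantiles).
edges = {"featurea": [10.0, 100.0], "featured": [0.5]}
record = {"featurea": 42.0, "featureb": 3, "featurec": 7, "featured": 0.1}

doc = " ".join(record_to_tokens(record, edges))
print(doc)
```

Each record then looks like a very short document whose "words" are feature/value pairs, which is the shape a document-oriented NB trainer expects.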

JP

On Thu, Jun 2, 2011 at 11:13 AM, Robin Anil <robin.anil@gmail.com> wrote:
> NB implementation doesn't handle numeric values very well. If you convert
> your data to boolean features, you can construct a document out of it and use
> it on NB.
>
> A better way would be to use Weka formatter to convert to vectors and use
> the SGD classifier in Mahout. You will be pleasantly surprised by its
> accuracy and speed.
>
> Robin
>
>
> On Thu, Jun 2, 2011 at 8:18 PM, Lancaster, Robert (Orbitz) <
> ROBERT.LANCASTER@orbitz.com> wrote:
>
>> I'm looking at the Mahout implementation of NaiveBayes for a classification
>> task, but the language around the Mahout implementation appears to be
>> document-centric.  Is it possible to use the Mahout implementation of NB for
>> a classification task that doesn't involve documents?
>>
>> I have about 80 million records with a small number of features.  The arff
>> header looks like (the numeric features could easily be nominalized if need
>> be):
>>
>> @RELATION        relation
>> @ATTRIBUTE      featurea    NUMERIC
>> @ATTRIBUTE      featureb    {1,2,3,4,5,6,7}
>> @ATTRIBUTE      featurec     {1,2,3,4,5,6,7}
>> @ATTRIBUTE      featured     NUMERIC
>> @ATTRIBUTE      featuref        NUMERIC
>> @ATTRIBUTE      featuref {0,1}
>> @ATTRIBUTE      target  {0,1}
>>
>



-- 
Twitter: @jpatanooga
Solution Architect @ Cloudera
hadoop: http://www.cloudera.com
blog: http://jpatterson.floe.tv
