mahout-user mailing list archives

From Luca Filipponi <luca.filippon...@gmail.com>
Subject Re: Naive Bayes Classifier Sentiment Analysis
Date Tue, 29 Jul 2014 13:46:22 GMT
I am using Mahout 0.9. Which part of the source code should I look at?

My problem is that I don't know how the sequence file without the label should be structured.

Do you have any hints?

On 29 Jul 2014, at 15:24, vaibhav srivastava <vaibhavcse30@gmail.com> wrote:

> Hi,
> If you want to create a test set and do not need to measure accuracy,
> you can create an instance of the classifier, load your model into that
> classifier, and then find the best score.
> Look at the naive Bayes test code.
> Hope this helps. Thanks.
> On 29 Jul 2014 12:53, "Luca Filipponi" <luca.filipponi89@gmail.com> wrote:
> 
>> Hi, I am trying to develop sentiment analysis on Italian tweets from
>> Twitter using the naive Bayes classifier, but I'm having some trouble.
>> 
>> My idea was to classify a lot of tweets as positive, negative, or neutral,
>> and use them as the training set for the classifier. To do that I've written a
>> sequence file in the format <Text,Text>, where the key is
>> /label/tweetID and the value is the tweet text. The text of the whole
>> dataset is then converted to TF-IDF vectors using the Mahout utilities.
>> 
>> Then I'm using the commands
>> 
>> ./mahout trainnb and ./mahout testnb to check the classifier, and the
>> score is as expected (I get nearly 100% because the test set is the same as
>> the training set).
>> 
>> My question is: if I want to use a test set that is unlabeled, how should it
>> be created? If the key format isn't like
>> 
>> key = /label/  the classifier can't find the label and I get an
>> exception.
>> 
>> But a new dataset will obviously be unlabeled, because I need to
>> classify it, so I don't know what to put in the key of the sequence file.
>> 
>> I've searched online for examples, but the only ones I've found
>> use the split command on the original dataset and then test on part of
>> it, which isn't my case.
>> 
>> 
>> Any ideas for developing a better sentiment analysis are welcome. Thanks
>> in advance for the help.
>> 
>> 
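
The two ideas in this thread (the /label/tweetID key layout and "find the best score") can be sketched in plain Java without any Mahout dependency. This is only an illustration under stated assumptions, not Mahout's own code: the class UnlabeledTweetSketch, the placeholder label "unknown", and the score values are all hypothetical. In a real program the per-label scores would come from the classifier loaded with the trained model, and whether a placeholder label is acceptable to testnb itself is not confirmed here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Mahout: illustrates the key convention
// from the thread and the "pick the best score" step.
final class UnlabeledTweetSketch {

    // Assumed placeholder label for tweets whose class is not known yet.
    static final String UNKNOWN_LABEL = "unknown";

    // Builds a key in the /label/tweetID layout described in the thread.
    static String buildKey(String label, String tweetId) {
        return "/" + label + "/" + tweetId;
    }

    // Extracts the label segment from a /label/tweetID key.
    // A key without that layout is rejected, which mirrors the kind of
    // failure reported when the label is missing.
    static String labelFromKey(String key) {
        String[] parts = key.split("/");
        if (parts.length < 3 || parts[1].isEmpty()) {
            throw new IllegalArgumentException("key is not in /label/id form: " + key);
        }
        return parts[1];
    }

    // Returns the label with the highest score, i.e. the predicted class.
    static String bestLabel(Map<String, Double> scores) {
        if (scores.isEmpty()) {
            throw new IllegalArgumentException("no scores to compare");
        }
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Double> entry : scores.entrySet()) {
            if (best == null || entry.getValue() > bestScore) {
                best = entry.getKey();
                bestScore = entry.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Key for an unlabeled tweet: same layout, placeholder label.
        String key = buildKey(UNKNOWN_LABEL, "42");

        // Made-up scores standing in for the classifier's output.
        Map<String, Double> scores = new LinkedHashMap<>();
        scores.put("positive", -12.5);
        scores.put("negative", -9.0);
        scores.put("neutral", -10.2);

        System.out.println(key + " -> " + bestLabel(scores));
    }
}
```

Keeping the same key layout for unlabeled tweets means any code that splits the key on "/" still works, while the prediction itself depends only on the scores and never on the placeholder.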

