hadoop-hdfs-user mailing list archives

From qiaoresearcher <qiaoresearc...@gmail.com>
Subject hadoop+python+text mining
Date Thu, 24 Apr 2014 17:58:12 GMT
I have Hadoop and Python installed, along with NLTK. I now have a large input
file with three columns:
column 1 | column 2 | column 3
positive | id1      | some tweet message
negative | id2      | other tweet message
positive | id3      | tweet message
negative | id4      | tweet message
positive | id5      | tweet message
...      | ...      | ...
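A minimal sketch of reading one record of this file. It assumes the columns are separated by whitespace (tabs or spaces), that the label and id never contain spaces, and that everything after the id is the tweet text. The `Tweet` and `parse_line` names are hypothetical.

```python
from typing import NamedTuple, Optional


class Tweet(NamedTuple):
    label: str      # column 1: "positive" or "negative"
    tweet_id: str   # column 2: e.g. "id1"
    text: str       # column 3: the tweet message


def parse_line(line: str) -> Optional[Tweet]:
    """Split one input record into (label, id, text).

    Assumes whitespace-separated columns where only the last column
    (the tweet text) may itself contain spaces.
    """
    # maxsplit=2 keeps the whole tweet message together as the third field
    parts = line.strip().split(None, 2)
    if len(parts) < 3:
        return None  # skip blank or malformed lines
    label, tweet_id, text = parts
    return Tweet(label, tweet_id, text)


if __name__ == "__main__":
    print(parse_line("positive         id1          some tweet message"))
```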

I want to use text mining to build TF-IDF vectors from the tweet messages
(with stop-word removal, stemming, etc.) and then train a classifier to label
each tweet as positive or negative. I know how to do this with plain Python
and NLTK, but how can I do the same thing on Hadoop?
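One common way to run Python code on Hadoop is Hadoop Streaming: each mapper script reads input lines on stdin and writes `key<TAB>value` lines, Hadoop sorts those lines by key, and the reducer script reads the sorted stream. Below is a minimal sketch of the document-frequency pass of TF-IDF in that style, assuming whitespace-separated `label id text` lines. The script name `tfidf.py`, the tiny stop-word list, and the omission of stemming are all illustrative assumptions, not a tested production job.

```python
import math
import re
import sys
from collections import Counter
from itertools import groupby

# Tiny illustrative stop-word list (assumption); a real job might instead
# use NLTK's stopwords.words("english") if NLTK is installed on every node.
STOP_WORDS = {"a", "an", "the", "and", "or", "is", "to", "of", "some", "other"}


def tokenize(text):
    """Lowercase, keep alphabetic tokens, drop stop words (no stemming here)."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def df_mapper(lines):
    """Map step: emit 'term<TAB>1' once for each distinct term in each tweet."""
    for line in lines:
        parts = line.strip().split(None, 2)  # label, id, tweet text
        if len(parts) < 3:
            continue
        for term in set(tokenize(parts[2])):
            yield f"{term}\t1"


def df_reducer(sorted_lines):
    """Reduce step: sum counts per term; input must be sorted by term."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for term, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{term}\t{sum(int(v) for _, v in group)}"


def tfidf(text, df, n_docs):
    """Weight each term of one tweet by tf * log(N / df)."""
    counts = Counter(tokenize(text))
    return {t: c * math.log(n_docs / df[t]) for t, c in counts.items() if t in df}


if __name__ == "__main__":
    # Hypothetical wiring for Hadoop Streaming:
    #   -mapper "tfidf.py map"   -reducer "tfidf.py reduce"
    phase = sys.argv[1] if len(sys.argv) > 1 else "demo"
    if phase == "map":
        for out in df_mapper(sys.stdin):
            print(out)
    elif phase == "reduce":
        for out in df_reducer(sys.stdin):
            print(out)
    else:
        # Local demo: sorted() stands in for Hadoop's shuffle-and-sort phase.
        data = [
            "positive id1 good day",
            "negative id2 bad day",
            "positive id3 good good news",
        ]
        df = {}
        for row in df_reducer(sorted(df_mapper(data))):
            term, count = row.split("\t")
            df[term] = int(count)
        print(df)
        print(tfidf("good good news", df, len(data)))
```

A full job would run a second pass that uses these document frequencies (and the total tweet count) to compute `tfidf` for every tweet, then train the classifier on the resulting vectors. Stemming could be added inside `tokenize`, for example with NLTK's `PorterStemmer`, provided NLTK is available on every worker node.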

