mahout-dev mailing list archives

From Neel Sheyal <latencybus...@gmail.com>
Subject BagofWords and StopList
Date Thu, 03 Mar 2011 13:29:59 GMT
Hi,

I need to do text clustering in a natural language processing
context, so word order is important. Initially, I will use an
n-gram model (with n = 3).

As I understand it, the Vector and SequenceFileFormat representations
in Mahout do not take contextual information into account. I know I
might need to modify both of them, but is there a bag-of-words
implementation and a stop list that I can use?
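
For illustration, here is a minimal sketch of the kind of preprocessing
in question: word trigrams built after stop-list filtering, so that word
order is preserved. It is plain Python, not Mahout code; the names
STOP_WORDS, tokenize and word_ngrams are hypothetical, and the stop list
is a tiny made-up sample.

```python
import re

# Hypothetical, tiny stop list for illustration only; a real one is much larger.
STOP_WORDS = {"a", "an", "the", "of", "in", "is", "to", "and"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def word_ngrams(tokens, n=3, stop_words=frozenset()):
    """Drop stop words, then return the remaining tokens as ordered n-grams."""
    kept = [t for t in tokens if t not in stop_words]
    # Slide a window of length n over the filtered token sequence.
    return [tuple(kept[i:i + n]) for i in range(len(kept) - n + 1)]


tokens = tokenize("The order of the words is important in language processing")
trigrams = word_ngrams(tokens, n=3, stop_words=STOP_WORDS)
for gram in trigrams:
    print(" ".join(gram))
```

Each trigram could then be treated as a single feature in a term vector,
which keeps local word order that a plain bag of words discards.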

Thanks,
Neel Sheyal
