mahout-user mailing list archives

From Donni Khan <>
Subject Remove instance from SequenceFile
Date Tue, 11 Nov 2014 14:36:48 GMT
Hi All,

I'm working on text mining with Mahout algorithms, computing the similarity
between text documents. First I computed the TF-IDF vectors for all documents
(SequenceFile format). While computing the similarity, I found that many
documents do not have any similar documents. I would like to remove those
documents from the SequenceFile of vectors.

Any ideas on how to do that?

Thanks in advance,

