hadoop-hdfs-user mailing list archives

From Zhige Xin <xinzhi...@gmail.com>
Subject Top K words problem
Date Sat, 09 Aug 2014 16:49:44 GMT
I have a question about Hadoop: how can I modify the WordCount program so
that it outputs the top K words ranked by their number of occurrences?

The naive method is to count the words and then sort them by count, but
that needs too many lines of code and is not elegant. Another approach uses
a data structure called TreeMap to solve the problem; it takes only about
100 lines and reduces the time.
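
For illustration, the TreeMap idea can be sketched in plain Java, independent
of the Hadoop API. This is a minimal sketch under stated assumptions, not the
100-line implementation mentioned above: the class name TopKWords and its
methods offer() and result() are hypothetical. The TreeMap maps a count to the
words seen with that count, so its first entry always holds the least frequent
words and can be evicted whenever more than K words are held.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: keeps only the K most frequent (word, count) pairs.
public class TopKWords {
    private final int k;
    // count -> words with that count; TreeMap keeps counts in ascending order
    private final TreeMap<Integer, List<String>> byCount = new TreeMap<>();
    private int size = 0;

    public TopKWords(int k) {
        this.k = k;
    }

    // Offer one (word, count) pair, e.g. one record produced by WordCount.
    public void offer(String word, int count) {
        byCount.computeIfAbsent(count, c -> new ArrayList<>()).add(word);
        size++;
        if (size > k) {
            // Evict one word from the bucket with the smallest count.
            Map.Entry<Integer, List<String>> smallest = byCount.firstEntry();
            List<String> words = smallest.getValue();
            words.remove(words.size() - 1);
            if (words.isEmpty()) {
                byCount.remove(smallest.getKey());
            }
            size--;
        }
    }

    // The retained words, from most to least frequent.
    public List<String> result() {
        List<String> out = new ArrayList<>();
        for (List<String> words : byCount.descendingMap().values()) {
            out.addAll(words);
        }
        return out;
    }

    public static void main(String[] args) {
        TopKWords top = new TopKWords(2);
        top.offer("hadoop", 5);
        top.offer("the", 1);
        top.offer("map", 3);
        System.out.println(top.result()); // prints [hadoop, map]
    }
}
```

Memory stays bounded by K entries no matter how many distinct words there
are. In an actual job, one assumed arrangement is to call offer() from a
single reducer for each (word, count) it computes and write result() out in
the Reducer's cleanup() method. When several words tie at the K-th largest
count, this sketch keeps only the ones offered first.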

Are there any other ways? Any ideas are welcome.

