hadoop-common-user mailing list archives

From Zhige Xin <xinzhi...@gmail.com>
Subject Top K words problem
Date Sat, 09 Aug 2014 16:49:44 GMT
I have a question about Hadoop: how can the WordCount program be modified to
return the top K words by number of occurrences?

The naive method is to count every word and then sort the results, but that
needs too many lines of code and is not elegant. Another approach uses the
TreeMap data structure to solve the problem; it takes only 100 lines and
reduces the time complexity.
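
For reference, here is a minimal sketch of the TreeMap idea in plain Java,
without any Hadoop wiring. The class and method names (TopKSketch, topK) are
made up for illustration, and it assumes the word counts have already been
computed, for example by the usual WordCount logic. Ties between words with
the same count are broken arbitrarily.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Hadoop: keeps only the K most frequent
// words in a TreeMap keyed by count.
public class TopKSketch {

    // Returns at most k words, most frequent first.
    public static List<String> topK(Map<String, Integer> counts, int k) {
        // count -> words with that count; TreeMap keeps the keys sorted ascending.
        TreeMap<Integer, Deque<String>> best = new TreeMap<>();
        int size = 0; // number of words currently held

        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            best.computeIfAbsent(e.getValue(), c -> new ArrayDeque<>())
                .addLast(e.getKey());
            size++;

            if (size > k) {
                // Evict one word from the bucket with the smallest count.
                Map.Entry<Integer, Deque<String>> lowest = best.firstEntry();
                lowest.getValue().removeLast();
                if (lowest.getValue().isEmpty()) {
                    best.remove(lowest.getKey());
                }
                size--;
            }
        }

        // Walk the buckets from the highest count to the lowest.
        List<String> result = new ArrayList<>();
        for (Deque<String> words : best.descendingMap().values()) {
            result.addAll(words);
        }
        return result;
    }
}
```

Because the map never holds more than K words, each insertion costs O(log K),
so N distinct words take O(N log K) rather than the O(N log N) of a full sort.
In a MapReduce job this logic would presumably run once over the reducer
output, but that part is left out of the sketch.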

Are there any other ways? Any ideas are welcome.




Best,
Isaiah
