hadoop-common-user mailing list archives

From Teodor Macicas <teodor.maci...@epfl.ch>
Subject [HADOOP] Terasort for numbers
Date Sun, 01 Aug 2010 21:23:42 GMT
Hi all,

I am using Hadoop 0.20.2 and I want to sort a huge amount of data.
I've read about Terasort [from the examples], but it uses 10-byte
character keys.
Changing the keys from characters to integers wasn't a good solution,
because Terasort builds a trie over the key bytes to create its
total-order partitions. I got stuck when I tried to change the character
trie into one suitable for numeric keys.
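One possible way around rewriting the trie, offered here as an untested assumption rather than anything Terasort itself does, is to encode each number as bytes whose unsigned lexicographic order matches numeric order. A byte-oriented trie or comparator could then split on key prefixes unchanged. The class `SortableDoubleBytes` and its methods are hypothetical names; the bit trick (flip every bit of a double whose sign bit is set, flip only the sign bit otherwise, then write big-endian) is a standard order-preserving encoding. Whether the 8-byte result can simply be padded to Terasort's 10-byte keys is not verified here.

```java
import java.util.Arrays;

public class SortableDoubleBytes {
    // Encode a double as 8 big-endian bytes whose unsigned lexicographic
    // order matches Double.compare order.
    static byte[] encode(double d) {
        long bits = Double.doubleToLongBits(d); // collapses all NaNs to one value
        // Sign bit set (negative, including -0.0): flip all bits so larger
        // magnitudes sort lower. Sign bit clear: flip only the sign bit so
        // these values sort above every negative one.
        bits ^= (bits < 0) ? 0xFFFFFFFFFFFFFFFFL : 0x8000000000000000L;
        byte[] out = new byte[8];
        for (int i = 7; i >= 0; i--) {
            out[i] = (byte) bits; // lowest byte goes last (big-endian)
            bits >>>= 8;
        }
        return out;
    }

    // Inverse of encode.
    static double decode(byte[] b) {
        long bits = 0;
        for (int i = 0; i < 8; i++) {
            bits = (bits << 8) | (b[i] & 0xFFL);
        }
        // Encoded top bit 1 means the original had its sign bit clear.
        bits ^= (bits < 0) ? 0x8000000000000000L : 0xFFFFFFFFFFFFFFFFL;
        return Double.longBitsToDouble(bits);
    }

    public static void main(String[] args) {
        double[] values = {3.5, -1.25, 0.0, -1000.0, 42.0, 1e-9, -0.5};
        byte[][] keys = new byte[values.length][];
        for (int i = 0; i < values.length; i++) {
            keys[i] = encode(values[i]);
        }
        // Sort purely as unsigned byte strings, the way a byte-oriented
        // comparator or trie would see them.
        Arrays.sort(keys, Arrays::compareUnsigned);
        for (byte[] k : keys) {
            System.out.println(decode(k));
        }
    }
}
```

With this encoding -0.0 sorts just before 0.0 and NaN sorts after positive infinity, which matches `Double.compare`. Integer keys can use the same idea more simply: flip the sign bit of the `long` and write it big-endian.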

Then I gave Sort [also from the examples] a try, and it did work for
integer keys, but without total-order partitioning. In the end, the
final result cannot be produced just by concatenating all the reducers'
outputs: each reducer sorts only its own subset of the data, the key
ranges of different reducers overlap, and no merging occurs between
reducers.
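What is missing is range (total-order) partitioning: sample the keys, pick R-1 split points for R reducers, and send each key to the reducer whose range contains it. Reducer i then only sees keys smaller than those of reducer i+1, so the sorted outputs can be concatenated. Below is a plain-Java sketch of that idea with no Hadoop dependency; all names are hypothetical, and taking every 100th record stands in for a real sampling pass. For numeric keys it uses binary search over the split points instead of a trie. Hadoop's own `TotalOrderPartitioner` and `InputSampler` classes, if available in your version, are meant to do this inside a job; check their documentation for the exact API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class RangePartitionSketch {
    // Choose numPartitions - 1 split points as evenly spaced quantiles of a sample.
    static double[] chooseSplitPoints(double[] sample, int numPartitions) {
        double[] sorted = sample.clone();
        Arrays.sort(sorted);
        double[] splits = new double[numPartitions - 1];
        for (int i = 1; i < numPartitions; i++) {
            splits[i - 1] = sorted[(int) ((long) i * sorted.length / numPartitions)];
        }
        return splits;
    }

    // Partition p receives keys with splits[p-1] <= key < splits[p]
    // (partition 0 has no lower bound, the last one no upper bound).
    static int getPartition(double key, double[] splits) {
        int pos = Arrays.binarySearch(splits, key);
        // Exact match on a split point: the key starts the next range.
        // Otherwise binarySearch returns -(insertionPoint) - 1.
        return pos >= 0 ? pos + 1 : -(pos + 1);
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        double[] data = new double[10_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = rnd.nextGaussian() * 1000.0;
        }

        int reducers = 4;
        // Stand-in for a sampling pass: take every 100th record.
        double[] sample = new double[data.length / 100];
        for (int i = 0; i < sample.length; i++) {
            sample[i] = data[i * 100];
        }
        double[] splits = chooseSplitPoints(sample, reducers);

        // "Map" side: route every key to the reducer owning its range.
        List<List<Double>> buckets = new ArrayList<>();
        for (int r = 0; r < reducers; r++) {
            buckets.add(new ArrayList<>());
        }
        for (double d : data) {
            buckets.get(getPartition(d, splits)).add(d);
        }

        // "Reduce" side: each reducer sorts only its own bucket; the outputs
        // are then simply concatenated in partition order.
        double[] concatenated = new double[data.length];
        int idx = 0;
        for (List<Double> bucket : buckets) {
            Collections.sort(bucket);
            for (double d : bucket) {
                concatenated[idx++] = d;
            }
        }

        double[] expected = data.clone();
        Arrays.sort(expected);
        System.out.println("globally sorted: " + Arrays.equals(concatenated, expected));
    }
}
```

The quality of the sample only affects how evenly the work is spread across reducers; correctness of the global order comes from every reducer owning a disjoint, ordered key range.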

Can anyone advise me on what to use, and how, to sort a huge amount of
real numbers?
Looking forward to your replies.

Thank you.
