hadoop-common-user mailing list archives

From Teodor Macicas <teodor.maci...@epfl.ch>
Subject Re: [HADOOP] Terasort for numbers
Date Mon, 02 Aug 2010 09:25:50 GMT
Hi Alex,

Thank you for your quick reply, and sorry for not being clearer.
The job I want to do is simply to sort data that has numbers [doubles] as
keys [0]. I noticed that Terasort uses 10-byte char keys. How can I use
it for my particular job?
Do I need to modify Terasort?

[0] Example workload:
 123.45    payload1
 -34.56    payload2
 752.10    payload3
  10.25    payload4

Does this make sense now?
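One possible approach, sketched here as an illustration and not something proposed in the thread: because Hadoop compares raw key bytes (as Alex notes below), a double can be turned into a fixed-width byte key whose unsigned byte-by-byte order matches numeric order. The class name `SortableDouble` and its methods are hypothetical. The trick is the usual IEEE 754 one: set the sign bit of non-negative values and invert every bit of negative ones. This 8-byte key does not drop straight into the Terasort example, whose input format expects its own fixed-width keys; it only shows how numeric keys can survive a raw byte comparison.

```java
/**
 * Hypothetical helper: maps a double to 8 bytes whose unsigned
 * lexicographic order matches the numeric order given by Double.compare.
 */
final class SortableDouble {
    private SortableDouble() {}

    static byte[] encode(double value) {
        long bits = Double.doubleToLongBits(value); // canonicalizes NaN
        // Negative (sign bit set): invert all bits, so larger magnitudes
        // come first and every negative sorts below every non-negative.
        // Non-negative: set the sign bit so they sort above negatives.
        bits = (bits < 0) ? ~bits : (bits ^ Long.MIN_VALUE);
        byte[] out = new byte[8];
        for (int i = 7; i >= 0; i--) { // big-endian: most significant byte first
            out[i] = (byte) bits;
            bits >>>= 8;
        }
        return out;
    }

    static double decode(byte[] b) {
        long bits = 0;
        for (int i = 0; i < 8; i++) {
            bits = (bits << 8) | (b[i] & 0xFFL);
        }
        // Sign bit set in the encoded form means the original was non-negative.
        bits = (bits < 0) ? (bits ^ Long.MIN_VALUE) : ~bits;
        return Double.longBitsToDouble(bits);
    }

    /** Unsigned lexicographic byte comparison, like a raw binary comparator. */
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xFF;
            int y = b[i] & 0xFF;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length;
    }
}
```

The encoding reproduces `Double.compare` ordering at the edges as well: `-0.0` sorts just before `0.0`, and NaN (canonicalized by `doubleToLongBits`) sorts after positive infinity.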


On 08/02/2010 12:14 AM, Alex Kozlov wrote:
> Hi Teodor,
> I am not clear on what you mean by 'real numbers'. Terasort does work on
> bytes (a 10-byte key and a 90-byte payload). The actual 'meaning' of the
> bytes really does not matter, as Hadoop uses binary comparators on the raw
> value. Total order partitioning should also work with any WritableComparable
> key (if it doesn't, it's a bug).
> My guess is that your problem is converting a char trie to WritableComparable.
> Can you provide more background? Are the strings of fixed length?
> Alex K
> On Sun, Aug 1, 2010 at 2:23 PM, Teodor Macicas <teodor.macicas@epfl.ch> wrote:
>> Hi all,
>> I am using hadoop 0.20.2 and I want to sort a huge amount of data. I've
>> read about Terasort [from the examples], but it uses 10-byte char keys.
>> Changing the keys from char to integer wasn't a good solution, as Terasort
>> builds a trie for creating total order partitions. I got stuck when I tried
>> to change the char trie to one suitable for number keys.
>> Then I gave Sort [also from the examples] a try, and it did work for
>> integer keys, but without total order partitioning. At the end of the day,
>> the final result cannot be created just by concatenating all reducers'
>> outputs: each reducer sorts only a subset of the data, and no merging
>> occurs between reducers.
>> Can anyone please advise me what to use, and how, in order to sort a huge
>> amount of real numbers?
>> Looking forward to your replies.
>> Thank you.
>> Best,
>> Teodor
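The missing piece in the Sort example is range (total order) partitioning. The standalone sketch below, with hypothetical names (`RangePartitionSketch`, `cutPoints`, `partition`, `simulate`), sorts a sample of the keys, picks `numReducers - 1` evenly spaced cut points, and routes each key with a binary search over those cut points instead of a trie. Because the partition index never decreases as the key grows, concatenating the sorted reducer outputs in reducer order gives a globally sorted result. Hadoop's own `TotalOrderPartitioner`, fed a partition file written by `InputSampler`, is built on the same idea; this sketch does not use those classes' APIs and only simulates the partition and reduce steps in memory.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical in-memory sketch of total order (range) partitioning. */
final class RangePartitionSketch {
    private RangePartitionSketch() {}

    /** Picks numReducers - 1 evenly spaced cut points from a sorted copy of the sample. */
    static double[] cutPoints(double[] sample, int numReducers) {
        if (numReducers < 1) throw new IllegalArgumentException("numReducers must be >= 1");
        if (numReducers > 1 && sample.length == 0) throw new IllegalArgumentException("empty sample");
        double[] sorted = sample.clone();
        Arrays.sort(sorted); // same total order as Double.compare
        double[] cuts = new double[numReducers - 1];
        for (int i = 1; i < numReducers; i++) {
            cuts[i - 1] = sorted[(int) ((long) i * sorted.length / numReducers)];
        }
        return cuts;
    }

    /**
     * Returns the number of cut points <= key. Reducer i therefore receives
     * keys in [cuts[i-1], cuts[i]); the first and last reducers are open-ended.
     */
    static int partition(double key, double[] cuts) {
        int lo = 0;
        int hi = cuts.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (Double.compare(key, cuts[mid]) >= 0) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    /** Partitions the keys, sorts each reducer's bucket, and concatenates in reducer order. */
    static double[] simulate(double[] keys, int numReducers, double[] sample) {
        double[] cuts = cutPoints(sample, numReducers);
        List<List<Double>> buckets = new ArrayList<>();
        for (int r = 0; r < numReducers; r++) {
            buckets.add(new ArrayList<>());
        }
        for (double k : keys) {
            buckets.get(partition(k, cuts)).add(k);
        }
        double[] out = new double[keys.length];
        int pos = 0;
        for (List<Double> bucket : buckets) {
            Collections.sort(bucket); // each "reducer" sorts only its own range
            for (double k : bucket) {
                out[pos++] = k;
            }
        }
        return out;
    }
}
```

In a real job the sample would be a small random subset of the input. A skewed sample gives unbalanced reducers, but any choice of cut points still yields a correct global order.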
