hadoop-common-user mailing list archives

From Alex Kozlov <ale...@cloudera.com>
Subject Re: [HADOOP] Terasort for numbers
Date Sun, 01 Aug 2010 22:14:27 GMT
Hi Teodor,

I am not clear on what you mean by 'real numbers'.  Terasort does work on bytes
(10 bytes key and 90 bytes payload).  The actual 'meaning' of the bytes
really does not matter as Hadoop uses binary comparators on the raw value.
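
To illustrate the point about raw bytes: numeric keys can be sorted by a byte-wise comparator if they are first encoded so that unsigned, big-endian byte order matches numeric order. The following is a minimal sketch in plain Java, not Hadoop code. The class and method names (SortableDoubleKey, encode, compareBytes) are hypothetical, and it assumes the keys are doubles with no NaN values.

```java
import java.nio.ByteBuffer;

// Hypothetical helper (not part of Hadoop): encodes a double into 8 bytes
// whose unsigned lexicographic order matches the numeric order.
class SortableDoubleKey {

    static byte[] encode(double value) {
        long bits = Double.doubleToLongBits(value);
        // Negative values: flip every bit, which reverses their order.
        // Non-negative values: flip only the sign bit, so they sort after negatives.
        bits ^= (bits >> 63) | Long.MIN_VALUE;
        // ByteBuffer writes big-endian by default.
        return ByteBuffer.allocate(8).putLong(bits).array();
    }

    // Unsigned byte-by-byte comparison, the kind a raw binary comparator performs.
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length;
    }
}
```

With an encoding like this, the comparator never needs to know that the bytes represent numbers.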

Total order partitioning should also work with any WritableComparable key
(if it doesn't, it's a bug).
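
The idea behind total order partitioning can be sketched without a trie: given sorted split points (for example, taken from a sample of the input), a binary search assigns each key to a range, so that every key in partition i is less than or equal to every key in partition i+1. This is a minimal plain-Java sketch, not Hadoop's own partitioner. RangePartitioner and getPartition are hypothetical names, and double keys are an assumption.

```java
import java.util.Arrays;

// Hypothetical sketch (not Hadoop's API): maps a key to a partition index
// using sorted split points, so partitions are globally ordered.
class RangePartitioner {

    private final double[] splitPoints;

    // sortedSplitPoints must be in ascending order; n split points give n + 1 partitions.
    RangePartitioner(double[] sortedSplitPoints) {
        this.splitPoints = sortedSplitPoints.clone();
    }

    int getPartition(double key) {
        int pos = Arrays.binarySearch(splitPoints, key);
        // Found at index i: the key equals a split point and goes to partition i + 1.
        // Not found: binarySearch returns -(insertionPoint) - 1, and the
        // insertion point is the partition index.
        return pos >= 0 ? pos + 1 : -(pos + 1);
    }
}
```

For split points {0.0, 10.0}, keys below 0.0 go to partition 0, keys in [0.0, 10.0) to partition 1, and keys of 10.0 or more to partition 2. Concatenating the sorted reducer outputs in partition order then yields a fully sorted result.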

My guess is that your problem is converting a char trie to WritableComparable.
Can you provide more background?  Are the strings of fixed length?

Alex K

On Sun, Aug 1, 2010 at 2:23 PM, Teodor Macicas <teodor.macicas@epfl.ch> wrote:

> Hi all,
>
>
> I am using hadoop 0.20.2 and I want to sort a huge amount of data. I've
> read about Terasort [from examples], but it uses 10-byte char keys.
> Changing the keys from char to integer wasn't a good solution, as Terasort
> builds a trie for creating total order partitions. I got stuck when I tried
> to change the char trie to one suitable for number keys.
>
> Then I gave Sort [also from examples] a try, and it did work for
> integer keys, but without total order partitioning. At the end of the day,
> the final result cannot be created just by putting together all reducers'
> outputs. Each reducer sorts only a subset of the data and no merging occurs
> between two reducers.
>
> Can anyone please advise me on what to use, and how, in order to sort a
> huge amount of real numbers?
> Looking forward to your replies.
>
>
> Thank you.
> Best,
> Teodor
>
