hadoop-common-user mailing list archives

From Ted Dunning <tdunn...@veoh.com>
Subject Re: Multiple keys
Date Tue, 04 Dec 2007 01:28:52 GMT

There is the largely undocumented record stream stuff.  You define your
records in an IDL-like language that compiles to Java code.  I haven't used
it, but it doesn't look particularly hard.

I believe the generated code includes comparator definitions.

Also, if you just concatenate your sort fields into the key that the mapper
outputs, you effectively get multi-key sorting, since the framework sorts on
the key as a whole.
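
To make the concatenated-key idea concrete, here is a rough sketch in plain
Java.  It is not Hadoop code: Collections.sort stands in for the framework's
key sort, and the class name, the tab separator and the zero-padding width
are all assumptions made for the illustration.  It assumes the secondary
field is a non-negative number and that the separator never occurs in the
primary field.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of Hadoop.
class CompositeKeyDemo {
    // Assumed separator: must sort below every character used in the fields.
    static final char SEP = '\t';

    // Build one string key from a primary and a secondary sort field.
    static String compositeKey(String primary, int secondary) {
        // Zero-pad so that string order matches numeric order
        // (only valid for non-negative values).
        return primary + SEP + String.format("%010d", secondary);
    }

    // Stand-in for the sort the framework applies to mapper output keys.
    static List<String> sorted(List<String> keys) {
        List<String> copy = new ArrayList<>(keys);
        Collections.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        List<String> keys = new ArrayList<>();
        keys.add(compositeKey("b", 2));
        keys.add(compositeKey("a", 10));
        keys.add(compositeKey("a", 9));
        for (String k : sorted(keys)) {
            System.out.println(k.replace(SEP, '|'));
        }
    }
}
```

The padding matters: without it "a<TAB>10" would sort before "a<TAB>9",
because the comparison is done on characters rather than on numbers.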

If you really mean that you want to sort the values that your reduce
functions get, that is also possible.  The trick is to define a key that
includes both the partitioning data (which determines which records get
grouped together for reducing) and the sort key (which determines the order
in which the reduce sees the data).  This means that you have to define two
functions in your job config (one for the partitioning, one for the sort
order).  I don't have sample code off-hand for this, but it isn't hard to
figure out from the javadocs.
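
As a rough illustration of the idea only, here is a sketch in plain Java
that simulates what happens to such a key.  It does not use the Hadoop API;
the class, field and method names are made up for the example, and reading
the "two functions" as a partition function plus a sort comparator is an
assumption.  In a real job you would plug the equivalent logic into the job
config (check the JobConf javadocs for the exact setters).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical simulation, not Hadoop code.
class SecondarySortSketch {
    /** Composite key: `group` decides grouping, `order` decides value order. */
    static final class Key {
        final String group;
        final long order;
        Key(String group, long order) { this.group = group; this.order = order; }
    }

    // Function 1: partition on the grouping part only, so every record
    // of a group lands on the same reducer.
    static int partition(Key k, int numPartitions) {
        return (k.group.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    // Function 2: sort on the full key, group first and then sort field.
    static final Comparator<Key> SORT_ORDER =
        Comparator.comparing((Key k) -> k.group).thenComparingLong(k -> k.order);

    // What one reducer would see: its groups, each with values in sorted order.
    static Map<String, List<Long>> reduceInput(List<Key> mapOutput,
                                               int partition, int numPartitions) {
        List<Key> mine = new ArrayList<>();
        for (Key k : mapOutput) {
            if (partition(k, numPartitions) == partition) mine.add(k);
        }
        mine.sort(SORT_ORDER);
        // Consecutive keys with the same `group` form one reduce call.
        Map<String, List<Long>> groups = new LinkedHashMap<>();
        for (Key k : mine) {
            groups.computeIfAbsent(k.group, g -> new ArrayList<>()).add(k.order);
        }
        return groups;
    }
}
```

The point of the sketch is that the partition function ignores the sort
field while the comparator uses it, which is why a single reduce call still
sees the whole group but in sorted order.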


On 12/3/07 5:10 PM, "Rui Shi" <shearershot@yahoo.com> wrote:

> Hi,
> 
> I need to sort the data by multiple keys. Is there any built-in support in
> Hadoop? 
> 
> Thanks,
> 
> Rui

