hadoop-common-user mailing list archives

From Luca Pireddu <pire...@crs4.it>
Subject Re: how to write outputs sequentially?
Date Tue, 22 Mar 2011 16:03:00 GMT
On March 22, 2011 16:54:34 Shi Yu wrote:
> I guess you need to define a Partitioner to send hashed keys to different
> reducers (sorry, I am still using the old API, so probably there is
> something new in the trunk release).  Basically, you try to segment the
> keys into different zones: 0-10, 11-20, ...
> 
> Maybe check the hashCode() function and see how to categorize these zones?
> 
> Shi
> 
> On 3/22/2011 9:24 AM, JunYoung Kim wrote:
> > hi,
> > 
> > I run almost 60 reduce tasks for a single job.
> > 
> > The outputs of the job range from part00 to part59.
> > 
> > Is there a way to write rows sequentially by sorted keys?
> > 
> > Currently my outputs are like this:
> > 
> > part00)
> > 1
> > 10
> > 12
> > 14
> > 
> > part01)
> > 2
> > 4
> > 6
> > 11
> > 13
> > 
> > part02)
> > 3
> > 5
> > 7
> > 8
> > 9
> > 
> > but, my aim is to get the following results.
> > 
> > part00)
> > 1
> > 2
> > 3
> > 4
> > 5
> > 
> > part01)
> > 6
> > 7
> > 8
> > 9
> > 10
> > 
> > part02)
> > 11
> > 12
> > 13
> > 14
> > 15
> > 
> > Is Hadoop able to support this kind of output?
> > 
> > thanks
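
Shi Yu's zone idea amounts to a range-based Partitioner.  A rough sketch with 
the old API might look like the following; the class name, the property name 
and the upper bound on the keys are just placeholders for this example:

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.Partitioner;

// Illustrative range partitioner: keys 0..zoneSize-1 go to reducer 0, the
// next zone to reducer 1, and so on, so every part file holds a contiguous
// key range.  Assumes integer keys with a known upper bound.
public class RangePartitioner implements Partitioner<IntWritable, Text> {

    private int maxKey = 60;

    public void configure(JobConf job) {
        // "range.partition.max.key" is a made-up property name for this sketch
        maxKey = job.getInt("range.partition.max.key", 60);
    }

    public int getPartition(IntWritable key, Text value, int numPartitions) {
        int zoneSize = (maxKey + numPartitions - 1) / numPartitions;  // ceiling division
        int zone = key.get() / zoneSize;
        return Math.min(Math.max(zone, 0), numPartitions - 1);
    }
}

The catch is that you need to know (or guess) the key distribution in advance, 
otherwise the reducers will get very uneven amounts of work.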


You can look at TeraSort in the examples to see how to do this.  There's even 
a short write-up  by Owen O'Malley about it here:  
http://sortbenchmark.org/YahooHadoop.pdf
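
TeraSort handles the key-distribution problem by sampling the input and 
building its partitioner from the sampled split points.  If you don't want to 
copy its code, the stock InputSampler and TotalOrderPartitioner classes in 
org.apache.hadoop.mapred.lib give much the same effect.  A minimal sketch with 
the old API (assuming a SequenceFile input with IntWritable keys, the default 
identity mapper, and illustrative paths and sampling parameters):

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.SequenceFileInputFormat;
import org.apache.hadoop.mapred.lib.InputSampler;
import org.apache.hadoop.mapred.lib.TotalOrderPartitioner;

public class GlobalSortJob {

    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(GlobalSortJob.class);
        conf.setJobName("global-sort");

        // SequenceFile input whose keys are IntWritable; with the default
        // identity mapper the map output keys are the same as the input keys,
        // which is what the sampler assumes.
        conf.setInputFormat(SequenceFileInputFormat.class);
        conf.setOutputKeyClass(IntWritable.class);
        conf.setOutputValueClass(Text.class);
        conf.setNumReduceTasks(60);

        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        // Sample the input to choose 59 split points, so part-00000 gets the
        // smallest key range, part-00001 the next range, and so on.
        Path partitionFile = new Path("/tmp/sort_partitions.lst");  // illustrative path
        TotalOrderPartitioner.setPartitionFile(conf, partitionFile);
        InputSampler.Sampler<IntWritable, Text> sampler =
            new InputSampler.RandomSampler<IntWritable, Text>(0.1, 10000, 10);
        InputSampler.writePartitionFile(conf, sampler);

        conf.setPartitionerClass(TotalOrderPartitioner.class);
        JobClient.runJob(conf);
    }
}

Within each part file the keys are sorted as usual, and because the partition 
boundaries come from the sampled split points the part files themselves end up 
in key order, which is the layout you described.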



-- 
Luca Pireddu
CRS4 - Distributed Computing Group
Loc. Pixina Manna Edificio 1
Pula 09010 (CA), Italy
Tel:  +39 0709250452
