hadoop-common-user mailing list archives

From icebergs <hkm...@gmail.com>
Subject Re: how to write outputs sequentially?
Date Fri, 25 Mar 2011 11:23:59 GMT
You should define your own partitioner.
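
To make that concrete, here is a minimal sketch of the range logic such a partitioner could use. It is plain Java so it runs without Hadoop on the classpath; in a real job the same arithmetic would go into the getPartition() method of your own Partitioner subclass. The class name, MIN_KEY and MAX_KEY are made up for illustration, and the sketch assumes integer keys whose bounds are known in advance (1..15, as in the example below). Each reducer already sorts the keys it receives, so only the assignment of keys to reducers has to change.

```java
public class RangePartitionSketch {
    // Hypothetical key bounds, assumed to be known before the job runs.
    static final int MIN_KEY = 1;
    static final int MAX_KEY = 15;

    /** Maps a key to a reducer index so that lower keys go to lower-numbered reducers. */
    public static int getPartition(int key, int numPartitions) {
        int span = MAX_KEY - MIN_KEY + 1;
        // Width of each key range, rounded up so that every key falls into some range.
        int width = (span + numPartitions - 1) / numPartitions;
        int partition = (key - MIN_KEY) / width;
        // Clamp keys outside the assumed bounds into the first or last range.
        return Math.max(0, Math.min(numPartitions - 1, partition));
    }

    public static void main(String[] args) {
        // With 3 reducers: keys 1-5 map to 0, 6-10 to 1, 11-15 to 2.
        for (int key = MIN_KEY; key <= MAX_KEY; key++) {
            System.out.println(key + " -> partition " + getPartition(key, 3));
        }
    }
}
```

Fixed-width ranges only balance the reducers when the keys are spread fairly evenly; with skewed keys some part files will end up much larger than others.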

2011/3/23 Luca Pireddu <pireddu@crs4.it>

> On March 22, 2011 16:54:34 Shi Yu wrote:
> > I guess you need to define a Partitioner to send hashed keys to different
> > reducers (sorry, I am still using the old API so probably there is
> > something new in the trunk release).  Basically you try to segment the
> > keys into different zones, 0-10, 11-20, ...
> >
> > Maybe check the hashCode() function and see how to categorize these zones?
> >
> > Shi
> >
> > On 3/22/2011 9:24 AM, JunYoung Kim wrote:
> > > hi,
> > >
> > > I run almost 60 reduce tasks for a single job.
> > >
> > > The outputs of the job are part00 through part59.
> > >
> > > Is there a way to write rows sequentially by sorted keys?
> > >
> > > Currently my outputs look like this.
> > >
> > > part00)
> > > 1
> > > 10
> > > 12
> > > 14
> > >
> > > part01)
> > > 2
> > > 4
> > > 6
> > > 11
> > > 13
> > >
> > > part02)
> > > 3
> > > 5
> > > 7
> > > 8
> > > 9
> > >
> > > But my aim is to get the following results.
> > >
> > > part00)
> > > 1
> > > 2
> > > 3
> > > 4
> > > 5
> > >
> > > part01)
> > > 6
> > > 7
> > > 8
> > > 9
> > > 10
> > >
> > > part02)
> > > 11
> > > 12
> > > 13
> > > 14
> > > 15
> > >
> > > Does Hadoop support this kind of output?
> > >
> > > Thanks
>
>
> You can look at TeraSort in the examples to see how to do this.  There's
> even a short write-up by Owen O'Malley about it here:
> http://sortbenchmark.org/YahooHadoop.pdf
>
>
>
> --
> Luca Pireddu
> CRS4 - Distributed Computing Group
> Loc. Pixina Manna Edificio 1
> Pula 09010 (CA), Italy
> Tel:  +39 0709250452
>
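
When the key range is not known up front, the usual alternative (and, as far as I understand it, the general idea behind the TeraSort approach mentioned above) is to sample some keys, pick numPartitions - 1 split points from the sorted sample, and route each key by where it falls among them. The sketch below is plain Java rather than Hadoop code; the class and method names are hypothetical, and it assumes integer keys and a non-empty sample.

```java
import java.util.Arrays;

public class SampledSplitSketch {
    /** Picks numPartitions - 1 split points from a non-empty sample of keys. */
    public static int[] splitPoints(int[] sample, int numPartitions) {
        int[] sorted = sample.clone();
        Arrays.sort(sorted);
        int[] splits = new int[numPartitions - 1];
        for (int i = 1; i < numPartitions; i++) {
            // Use evenly spaced elements of the sorted sample as range boundaries.
            splits[i - 1] = sorted[i * sorted.length / numPartitions];
        }
        return splits;
    }

    /** Keys below splits[0] go to partition 0; keys in [splits[i-1], splits[i]) go to partition i. */
    public static int getPartition(int key, int[] splits) {
        int pos = Arrays.binarySearch(splits, key);
        // binarySearch returns -(insertionPoint) - 1 when the key is not a split point.
        return pos >= 0 ? pos + 1 : -(pos + 1);
    }

    public static void main(String[] args) {
        int[] sample = {14, 3, 9, 1, 12, 6, 15, 2, 8, 11, 5, 13, 4, 10, 7};
        int[] splits = splitPoints(sample, 3);
        for (int key = 1; key <= 15; key++) {
            System.out.println(key + " -> partition " + getPartition(key, splits));
        }
    }
}
```

Hadoop also ships a TotalOrderPartitioner class that works from a file of split points, so it is worth checking whether your version includes it before writing your own.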
