hadoop-common-user mailing list archives

From Boyu Zhang <boyuzhan...@gmail.com>
Subject Re: large parameter file, too many intermediate output
Date Fri, 13 Aug 2010 20:21:03 GMT
Hi Harsh,

Thank you for the reply. I will try that, although right now the map tasks are
taking too much time, almost 20 min to finish all the map tasks (~90). I
don't know if compression will slow me down, but I will run a test and see.
Thank you very much!

Boyu
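
[Archive note: the map-output compression Harsh suggests below was, on
0.20-era Hadoop, typically turned on with job configuration properties
roughly like the following. This is a sketch; the exact property names vary
by Hadoop version, and the LzoCodec class comes from the separate
hadoop-lzo/hadoop-gpl-compression package, not core Hadoop.]

```xml
<!-- mapred-site.xml (or set per-job on the JobConf); 0.20.x-era names -->
<property>
  <name>mapred.compress.map.output</name>
  <value>true</value>
</property>
<property>
  <!-- Requires the native LZO libraries and the hadoop-lzo jar installed
       on every node; otherwise fall back to the built-in DefaultCodec. -->
  <name>mapred.map.output.compression.codec</name>
  <value>com.hadoop.compression.lzo.LzoCodec</value>
</property>
```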

On Thu, Aug 12, 2010 at 11:07 PM, Harsh J <qwertymaniac@gmail.com> wrote:

> Apart from the combiner suggestion, I'd also suggest using
> intermediate map-output compression always (With LZO, if possible).
> Saves you some IO.
>
> On Fri, Aug 13, 2010 at 3:24 AM, Boyu Zhang <boyuzhang35@gmail.com> wrote:
> > Hi Steve,
> >
> > Thanks for the reply!
> >
> > On Thu, Aug 12, 2010 at 5:47 PM, Steve Lewis <lordjoe2000@gmail.com>
> wrote:
> >
> >> I don't think of half a billion key-value pairs as that large a number -
> >> nor 20,000 per task - these are not atypical for Hadoop jobs, and many
> >> users would see them as small numbers. While you might use cleverness
> >> such as a combiner to reduce the output, I wonder if that is needed.
> >> What is your cluster size, and how fast does the job perform?
> >>
> >>
> >
> > I am using a combiner to compact the output a bit before it gets
> > written to disk. My cluster is 48 cores (6 nodes * 8 cores/node), my
> > chunk size is 12 MB, there are 90 or so map tasks, and the job takes
> > about 30 min. I think that is very slow. Thanks for the attention and
> > interest!
> >
> > Boyu
> >
>
>
>
> --
> Harsh J
> www.harshj.com
>
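
[Archive note: the combiner discussed in this thread does map-side
aggregation of intermediate pairs before they are spilled and shuffled.
A minimal word-count-style sketch of that effect, in plain Python rather
than the Hadoop API (all names here are illustrative):]

```python
from collections import Counter

def map_phase(records):
    # Emit one (word, 1) pair per token -- the raw intermediate output
    # a map task would produce.
    for line in records:
        for word in line.split():
            yield (word, 1)

def combine(pairs):
    # Map-side local aggregation: sum the values per key before anything
    # is written to disk or shuffled, shrinking the intermediate data.
    counts = Counter()
    for key, value in pairs:
        counts[key] += value
    return list(counts.items())

records = ["a b a", "b b c"]
raw = list(map_phase(records))   # 6 intermediate pairs
combined = combine(raw)          # 3 pairs, one per distinct key
print(len(raw), len(combined))   # prints: 6 3
```

In Hadoop the same idea is wired in with a reducer-like class via
setCombinerClass; the framework may run it zero or more times, so it must
be associative and commutative, as a sum is here.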
