hadoop-common-user mailing list archives

From Boyu Zhang <boyuzhan...@gmail.com>
Subject Re: large parameter file, too many intermediate output
Date Thu, 12 Aug 2010 21:54:03 GMT
Hi Steve,

Thanks for the reply!

On Thu, Aug 12, 2010 at 5:47 PM, Steve Lewis <lordjoe2000@gmail.com> wrote:

> I don't think of half a billion key-value pairs as that large a number,
> nor 20,000 per task. These are not atypical for Hadoop tasks, and many
> users will see them as small numbers. While you might use cleverness
> such as a combiner to reduce the output, I wonder if this is needed.
> What is your cluster size, and how fast does the job perform?

I am using a combiner to compact the output a little before it is
written to disk. My cluster has 48 cores (6 nodes * 8 cores/node), my
chunk size is 12 MB, there are 90 or so map tasks, and the job takes about
30 minutes to run, which seems very slow to me. Thanks for the attention
and interest!

