hadoop-common-user mailing list archives

From Todd Lipcon <t...@cloudera.com>
Subject Re: Large number of map output keys and performance issues.
Date Wed, 06 May 2009 20:29:49 GMT
Hi Tiago,

Here are a couple thoughts:

1) How much data are you outputting? Obviously there is a certain amount of
IO involved in actually outputting data versus not ;-)

2) Are you using a reduce phase in this job? If so, by cutting off the data
at map output time you're also skipping the entire sort and shuffle, which
involves significant network IO, etc.

3) What version of Hadoop are you running?
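
To make point 2 concrete, here is a minimal sketch in plain Java (no Hadoop
dependency) of the extra work each emitted key triggers when a reduce phase
is present: it is assigned to a reducer and then sorted within that
partition. The names ShuffleSketch, partitionFor and partitionAndSort are
hypothetical and only for illustration. The partition formula mirrors
Hadoop's default HashPartitioner; the real framework additionally spills to
disk, merges the spills and copies the partitions over the network, none of
which is modeled here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class ShuffleSketch {
    // Same formula as Hadoop's default HashPartitioner:
    // mask off the sign bit, then take the remainder by the reducer count.
    static int partitionFor(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Illustrative only: bucket every map output key by reducer, then sort
    // each bucket. With zero map output, none of this work happens.
    static List<List<String>> partitionAndSort(List<String> mapOutputKeys,
                                               int numReducers) {
        List<List<String>> partitions = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) {
            partitions.add(new ArrayList<>());
        }
        for (String key : mapOutputKeys) {
            partitions.get(partitionFor(key, numReducers)).add(key);
        }
        for (List<String> partition : partitions) {
            Collections.sort(partition);
        }
        return partitions;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("d", "b", "a", "c", "a");
        // prints [[b, d], [a, a, c]]
        System.out.println(partitionAndSort(keys, 2));
    }
}
```

The sort alone is O(n log n) in the number of emitted keys per partition, so
a job that emits a very large number of keys pays for it well before any
reducer runs.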


On Wed, May 6, 2009 at 12:23 PM, Tiago Macambira <macambira@gmail.com> wrote:

> I am developing an MR application with Hadoop that generates a really large
> number of output keys during its map phase, and its performance is
> abysmal.
> Just reading the data takes 20 minutes, and processing it without
> outputting anything from the map takes around 30 minutes, but running the
> full application takes around 4 hours. Is this a known or expected issue?
> Cheers.
> Tiago Alves Macambira
> --
> "I may be drunk, but in the morning I will be sober, while you will
> still be stupid and ugly." -Winston Churchill
