hadoop-common-user mailing list archives

From Steve Lewis <lordjoe2...@gmail.com>
Subject Re: I am trying to run a large job and it is consistently failing with timeout - nothing happens for 600 sec
Date Wed, 18 Jan 2012 23:46:10 GMT
I KNOW it is a task timeout. What I do NOT know is WHY merely cutting the
number of writes makes it go away. That seems to imply that some
context.write operation, or something downstream from it, is taking a huge
amount of time, and that is all Hadoop internal code, not mine. So my
question is: why should increasing the number and volume of writes cause a
task to time out?
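
As a rough sanity check on the two counter dumps quoted below, here is a minimal Python sketch that compares the runs. The numbers are copied from those dumps; the dictionary keys and the `summarize` helper are names made up for illustration, and the ratios only describe output volume, not where the time actually goes.

```python
# Counter values copied from the two map-task dumps quoted in this thread.
failed = {
    "map_input_records": 90_103,
    "map_output_records": 10_933_089,
    "map_output_bytes": 3_520_182_794,
}
succeeded = {
    "map_input_records": 90_103,
    "map_output_records": 2_066_588,
    "map_output_bytes": 662_354_413,
}


def summarize(counters):
    """Derive per-record ratios from one task's counters (hypothetical helper)."""
    return {
        "outputs_per_input": counters["map_output_records"] / counters["map_input_records"],
        "bytes_per_output": counters["map_output_bytes"] / counters["map_output_records"],
    }


for name, counters in (("failed", failed), ("succeeded", succeeded)):
    s = summarize(counters)
    print(f"{name}: {s['outputs_per_input']:.1f} outputs per input record, "
          f"{s['bytes_per_output']:.0f} bytes per output record")

# How much more data the failing task pushes through context.write.
volume_ratio = failed["map_output_bytes"] / succeeded["map_output_bytes"]
print(f"failed task emits {volume_ratio:.1f}x the bytes of the succeeded task")
```

By these counters the average output record is about the same size in both runs (roughly 320 bytes), so the difference is the number of records written: the failing task emits about five times as many.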

On Wed, Jan 18, 2012 at 2:33 PM, Tom Melendez <tom@supertom.com> wrote:

> Sounds like mapred.task.timeout?  The default is 10 minutes.
>
> http://hadoop.apache.org/common/docs/current/mapred-default.html
>
> Thanks,
>
> Tom
>
> On Wed, Jan 18, 2012 at 2:05 PM, Steve Lewis <lordjoe2000@gmail.com>
> wrote:
> > The map tasks fail, timing out after 600 sec.
> > I am processing one 9 GB file with 16,000,000 records. Each record (think
> > of it as a line) generates hundreds of key-value pairs.
> > The job is unusual in that the output of the mapper, in terms of records
> > or bytes, is orders of magnitude larger than the input.
> > I have no idea what is slowing down the job except that the problem is in
> > the writes.
> >
> > If I change the job to merely bypass a fraction of the context.write
> > statements the job succeeds.
> > Below are one map task that failed and one that succeeded. I cannot
> > understand how a write can take so long
> > or what else the mapper might be doing.
> >
> > JOB FAILED WITH TIMEOUT
> >
> > *Parser*
> >   TotalProteins                  90,103
> >   NumberFragments            10,933,089
> > *FileSystemCounters*
> >   HDFS_BYTES_READ            67,245,605
> >   FILE_BYTES_WRITTEN        444,054,807
> > *Map-Reduce Framework*
> >   Combine output records     10,033,499
> >   Map input records              90,103
> >   Spilled Records            10,032,836
> >   Map output bytes        3,520,182,794
> >   Combine input records      10,844,881
> >   Map output records         10,933,089
> >
> > Same code but fewer writes
> > JOB SUCCEEDED
> >
> > *Parser*
> >   TotalProteins                  90,103
> >   NumberFragments           206,658,758
> > *FileSystemCounters*
> >   FILE_BYTES_READ           111,578,253
> >   HDFS_BYTES_READ            67,245,607
> >   FILE_BYTES_WRITTEN        220,169,922
> > *Map-Reduce Framework*
> >   Combine output records      4,046,128
> >   Map input records              90,103
> >   Spilled Records             4,046,128
> >   Map output bytes          662,354,413
> >   Combine input records       4,098,609
> >   Map output records          2,066,588
> >
> > Any bright ideas?
> > --
> > Steven M. Lewis PhD
> > 4221 105th Ave NE
> > Kirkland, WA 98033
> > 206-384-1340 (cell)
> > Skype lordjoe_com
>



-- 
Steven M. Lewis PhD
4221 105th Ave NE
Kirkland, WA 98033
206-384-1340 (cell)
Skype lordjoe_com
