hadoop-common-user mailing list archives

From Alex Kozlov <ale...@cloudera.com>
Subject Re: I am trying to run a large job and it is consistently failing with timeout - nothing happens for 600 sec
Date Wed, 18 Jan 2012 23:51:04 GMT
Does it always fail at the same place?  Does the task log show anything
unusual?
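
If a single map() call can run for more than ten minutes without the
framework seeing any progress, one common workaround is to report progress
explicitly while emitting. A rough, untested sketch - the fragment-splitting
logic here is only a placeholder for whatever your parser actually does:

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class FragmentMapper extends Mapper<LongWritable, Text, Text, Text> {

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Placeholder: each input record expands into many key/value pairs.
        String[] fragments = value.toString().split("\\s+");
        long emitted = 0;
        for (String fragment : fragments) {
            context.write(new Text(fragment), value);
            emitted++;
            // Tell the framework the task is still alive; a long stretch of
            // work inside one map() call can otherwise hit mapred.task.timeout.
            if (emitted % 10000 == 0) {
                context.progress();
            }
        }
    }
}

That doesn't explain where the time is going, but it keeps the task from
being killed while you look at the task logs.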

On Wed, Jan 18, 2012 at 3:46 PM, Steve Lewis <lordjoe2000@gmail.com> wrote:

> I KNOW it is a task timeout - what I do NOT know is WHY merely cutting the
> number of writes makes it go away. It seems to imply that some
> context.write operation, or something downstream from it, is taking a huge
> amount of time, and that is all Hadoop internal code - not mine. So my
> question is why increasing the number and volume of writes should cause a
> task to time out.
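
If you just need more headroom while you dig into this, the 600-second limit
is mapred.task.timeout (see Tom's link below) and can be raised per job. An
illustrative fragment - the 30-minute value is arbitrary, not a recommendation:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

Configuration conf = new Configuration();
// Milliseconds; the default is 600000 (10 minutes), and 0 disables the check.
conf.setLong("mapred.task.timeout", 30 * 60 * 1000L);
Job job = new Job(conf, "fragment generation");

If the driver goes through ToolRunner, passing -Dmapred.task.timeout=1800000
on the command line does the same thing.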
>
> On Wed, Jan 18, 2012 at 2:33 PM, Tom Melendez <tom@supertom.com> wrote:
>
> > Sounds like mapred.task.timeout?  The default is 10 minutes.
> >
> > http://hadoop.apache.org/common/docs/current/mapred-default.html
> >
> > Thanks,
> >
> > Tom
> >
> > On Wed, Jan 18, 2012 at 2:05 PM, Steve Lewis <lordjoe2000@gmail.com>
> > wrote:
> > > The map tasks fail, timing out after 600 sec.
> > > I am processing one 9 GB file with 16,000,000 records. Each record (think
> > > of it as a line) generates hundreds of key-value pairs.
> > > The job is unusual in that the output of the mapper, in terms of records
> > > or bytes, is orders of magnitude larger than the input.
> > > I have no idea what is slowing down the job except that the problem is in
> > > the writes.
> > >
> > > If I change the job to merely bypass a fraction of the context.write
> > > statements, the job succeeds.
> > > Below are the counters for one map task that failed and one that
> > > succeeded - I cannot understand how a write can take so long, or what
> > > else the mapper might be doing.
> > >
> > > JOB FAILED WITH TIMEOUT
> > >
> > > *Parser*
> > >   TotalProteins            90,103
> > >   NumberFragments          10,933,089
> > > *FileSystemCounters*
> > >   HDFS_BYTES_READ          67,245,605
> > >   FILE_BYTES_WRITTEN       444,054,807
> > > *Map-Reduce Framework*
> > >   Combine output records   10,033,499
> > >   Map input records        90,103
> > >   Spilled Records          10,032,836
> > >   Map output bytes         3,520,182,794
> > >   Combine input records    10,844,881
> > >   Map output records       10,933,089
> > > Same code but fewer writes
> > > JOB SUCCEEDED
> > >
> > > *Parser*
> > >   TotalProteins            90,103
> > >   NumberFragments          206,658,758
> > > *FileSystemCounters*
> > >   FILE_BYTES_READ          111,578,253
> > >   HDFS_BYTES_READ          67,245,607
> > >   FILE_BYTES_WRITTEN       220,169,922
> > > *Map-Reduce Framework*
> > >   Combine output records   4,046,128
> > >   Map input records        90,103
> > >   Spilled Records          4,046,128
> > >   Map output bytes         662,354,413
> > >   Combine input records    4,098,609
> > >   Map output records       2,066,588
> > > Any bright ideas?
> > > --
> > > Steven M. Lewis PhD
> > > 4221 105th Ave NE
> > > Kirkland, WA 98033
> > > 206-384-1340 (cell)
> > > Skype lordjoe_com
> >
>
>
>
> --
> Steven M. Lewis PhD
> 4221 105th Ave NE
> Kirkland, WA 98033
> 206-384-1340 (cell)
> Skype lordjoe_com
>
