hadoop-mapreduce-user mailing list archives

From Matt Davies <m...@mattdavies.net>
Subject Re: map stucks at 99.99%
Date Thu, 28 Feb 2013 22:10:47 GMT
I've seen this before when the input data changes suddenly and no longer
lends itself to parallelization, for example when counting the number of
tuples in a bag.

One thing that may be interesting is to compare the job counters from a
previous job with those from the job that just completed.  Do they differ?
Is there a particular mapper whose counts seem way out of whack?
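
A rough sketch of that comparison, assuming you have already copied the
counter values out of the job pages by hand. The task IDs, numbers, and
the 5x threshold below are made up for illustration, and the function
names are hypothetical, not part of any Hadoop API:

```python
from statistics import median


def compare_totals(previous, current):
    """Return {counter: (previous, current, ratio)} for job-level counters.

    The ratio is None when the previous value is zero or missing.
    """
    report = {}
    for name in sorted(set(previous) | set(current)):
        old = previous.get(name, 0)
        new = current.get(name, 0)
        ratio = new / old if old else None
        report[name] = (old, new, ratio)
    return report


def find_outlier_tasks(counts, factor=5.0):
    """Return task IDs whose counter exceeds `factor` times the median."""
    if not counts:
        return []
    typical = median(counts.values())
    return sorted(t for t, v in counts.items() if v > factor * typical)


if __name__ == "__main__":
    # Hypothetical job-level totals for yesterday's run and today's run.
    yesterday = {"Map input records": 1000, "Map output records": 2000}
    today = {"Map input records": 1500, "Map output records": 2000}
    for name, row in compare_totals(yesterday, today).items():
        print(name, row)

    # Hypothetical per-mapper values of one counter for today's job.
    per_mapper = {
        "m_000000": 1000,
        "m_000001": 1100,
        "m_000002": 950,
        "m_000003": 250000,
    }
    print("suspect mappers:", find_outlier_tasks(per_mapper))
```

If one mapper stands out like m_000003 does here, that points at skewed
input rather than a sick node.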

Has someone tweaked the production job in one way or another?




On Thu, Feb 28, 2013 at 1:28 PM, Patai Sangbutsarakum <
silvianhadoop@gmail.com> wrote:

> > What type of CPU is on the box ? load average seems pretty high for a
> > 8-core box.
> Xeon 3.07GHz, 24 cores
>
> > Do you have ganglia on these boxes ? Is the load average always so high?
> > What's the memory usage for the task and overall on the box ?
> From top -p pid of the task
> CPU 143.2%  MEM 1.7%
> So, memory is not exhausted here, but the CPU is pretty pegged.
>
> >
> > How long has the map task been running in that stuck state ?
> --> at least 2 hours.
>
>
> It finally finished after hours; it took double the usual time today.. T_T
>
>
>
>
>
>
> On Thu, Feb 28, 2013 at 1:18 PM, Viral Bajaria <viral.bajaria@gmail.com>
> wrote:
> > What type of CPU is on the box ? load average seems pretty high for a
> > 8-core box. Do you have ganglia on these boxes ? Is the load average
> > always so high ? What's the memory usage for the task and overall on the
> > box ?
> >
> > How long has the map task been running in that stuck state ? If it's been
> > a few minutes, I am surprised that the JT didn't try to run it on another
> > node or have you switched off speculative execution ?
> >
> > Sorry too many questions !!
> >
> > You can try jstack, jmap. That will at least tell you about what's getting
> > blocked.
> >
> > On Thu, Feb 28, 2013 at 1:04 PM, Patai Sangbutsarakum
> > <silvianhadoop@gmail.com> wrote:
> >>
> >> - Check the box on which the task is running, is it under heavy load ?
> >> Is there high amount of I/O wait ?
> >> CPU, very warm load average: 47.47, 48.56, 49.00
> >> I/O is quiet: 0.1x% iowait, less than 20 tps, rarely up to
> >> 100 tps, on 10 disks (JBOD).
> >>
> >>
> >> - You could check the task logs and see if they say anything about
> >> what is going wrong ?
> >> I would say no.. pretty much all of them are INFO
> >>
> >> - Did the task get pre-empted to other task trackers ? If yes, is it
> >> stuck at the same spot on those ?
> >> Nope.
> >>
> >> - What kind of work are you doing in the mapper ? Just reading from
> >> HDFS and compute something or reading/writing from HBase ?
> >> HDFS + compute, R/W
> >> Absolutely no HBase.
> >>
> >> Would jstack or jmap be of any use ?
> >>
> >>
> >> > - You could check the task logs and see if they say anything about what is
> >> > going wrong ?
> >> > - Did the task get pre-empted to other task trackers ? If yes, is it stuck
> >> > at the same spot on those ?
> >> > - What kind of work are you doing in the mapper ? Just reading from HDFS and
> >> > compute something or reading/writing from HBase ?
> >>
> >> On Thu, Feb 28, 2013 at 12:25 PM, Viral Bajaria <viral.bajaria@gmail.com>
> >> wrote:
> >> > You could start off doing the following:
> >> >
> >> > - Check the box on which the task is running, is it under heavy load ? Is
> >> > there high amount of I/O wait ?
> >> > - You could check the task logs and see if they say anything about what is
> >> > going wrong ?
> >> > - Did the task get pre-empted to other task trackers ? If yes, is it stuck
> >> > at the same spot on those ?
> >> > - What kind of work are you doing in the mapper ? Just reading from HDFS and
> >> > compute something or reading/writing from HBase ?
> >> >
> >> > Thanks,
> >> > Viral
> >> >
> >> > On Thu, Feb 28, 2013 at 12:06 PM, Patai Sangbutsarakum
> >> > <silvianhadoop@gmail.com> wrote:
> >> >>
> >> >> Hadoopers!!
> >> >>
> >> >> Need input from you guys,
> >> >> I am looking at a critical job in production. It is stuck at 99.99% in
> >> >> the map phase for much longer than it used to be.
> >> >>
> >> >> What can I do to debug what is going on with those maps and why they do
> >> >> not complete? The tasks and task attempts report 100% progress, but
> >> >> there is no finish time.
> >> >>
> >> >> Please suggest
> >> >> Patai
> >> >
> >> >
> >
> >
>
