hadoop-common-user mailing list archives

From Ramya Sunil <ra...@hortonworks.com>
Subject Re: Jobs failing on submit
Date Fri, 26 Aug 2011 18:46:42 GMT
Hi John,

How many tasktrackers do you have? Can you check whether your tasktrackers are
running, and what the total available map and reduce capacity of your cluster is?
Can you also post the configuration of the scheduler you are using? You
might also want to check the jobtracker logs. It would help in further debugging.
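In pseudo-distributed mode every daemon runs on one machine, so one way to handle the first of these checks is the JDK's `jps` tool, which prints one `<pid> <MainClass>` line per running JVM. Below is a minimal sketch built on that. The function name `missing_daemons` is hypothetical, and the sketch assumes `jps` is run as the user that owns the daemons, because `jps` only lists JVMs that user can see. Slot capacity still has to be read from the JobTracker web page.

```sh
# Hypothetical helper: print each core pseudo-distributed daemon that does not
# appear in `jps` output read from stdin. Assumes lines look like "<pid> <MainClass>".
missing_daemons() {
  awk '
    { seen[$2] = 1 }                            # remember every JVM main class name
    END {
      split("NameNode DataNode JobTracker TaskTracker", want, " ")
      for (i = 1; i <= 4; i++)
        if (!(want[i] in seen)) print want[i]   # daemon not found: report it
    }'
}
```

Usage: `jps | missing_daemons`. Empty output means all four daemons are up. A `TaskTracker` line means no map or reduce slots are available to run anything.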


On Fri, Aug 26, 2011 at 7:50 AM, John Armstrong <john.armstrong@ccri.com> wrote:

> One of my colleagues has noticed this problem for a while, and now it's
> biting me.  Jobs seem to be failing before ever really starting.  It seems
> to be limited (so far) to running in pseudo-distributed mode, since that's
> where he saw the problem and where I'm now seeing it; it hasn't come up on
> our cluster (yet).
> So here's what happens:
> $ java -classpath $MY_CLASSPATH MyLauncherClass -conf my-config.xml -D extra.properties=extravalues
> ...
> launcher output
> ...
> 11/08/26 10:35:54 INFO input.FileInputFormat: Total input paths to process : 2
> 11/08/26 10:35:54 INFO mapred.JobClient: Running job: job_201108261034_0001
> 11/08/26 10:35:55 INFO mapred.JobClient:  map 0% reduce 0%
> and it just sits there.  If I look at the jobtracker's web view the number
> of submissions increments, but nothing shows up as a running, completed,
> failed, or retired job.  If I use the command line probe I find
> $ hadoop job -list
> 1 jobs currently running
> JobId   State   StartTime       UserName        Priority        SchedulingInfo
> job_201108261034_0001   4       1314369354247   hdfs    NORMAL  NA
> If I try to kill this job, nothing happens; it remains in the list with
> state 4 (failed?).  I've tried telling the mapper JVM to suspend so I can
> find it in netstat and attach a debugger from IDEA, but it seems that the
> job never gets to the point of even spinning up a JVM to run the mapper.
> Any ideas what might be going wrong?  Thanks.
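The `State` column above is numeric. In the Hadoop 0.20 line, `org.apache.hadoop.mapred.JobStatus` defines the run states as 1 = RUNNING, 2 = SUCCEEDED, 3 = FAILED, 4 = PREP and 5 = KILLED, so state 4 reads as PREP (the job is still being set up), not failed. Below is a minimal sketch for decoding `hadoop job -list` output. It assumes the whitespace-separated layout shown above, with the job ID in the first column and the numeric state in the second. `decode_job_states` is a hypothetical name.

```sh
# Hypothetical helper: read `hadoop job -list` output on stdin and print
# "<JobId> <STATE NAME>" for each job row, using the JobStatus run-state codes.
decode_job_states() {
  awk '
    BEGIN { split("RUNNING SUCCEEDED FAILED PREP KILLED", names, " ") }  # names[1..5]
    $1 ~ /^job_/ {                              # skip the count and header lines
      state = ($2 in names) ? names[$2] : "UNKNOWN(" $2 ")"
      print $1, state
    }'
}
```

Usage: `hadoop job -list | decode_job_states`. For the listing above, this should print `job_201108261034_0001 PREP`.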
