hadoop-mapreduce-dev mailing list archives

From "Sudharsan Sampath (JIRA)" <j...@apache.org>
Subject [jira] [Created] (MAPREDUCE-2635) Jobs hang indefinitely on failure.
Date Fri, 01 Jul 2011 06:10:28 GMT
Jobs hang indefinitely on failure.
----------------------------------

                 Key: MAPREDUCE-2635
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2635
             Project: Hadoop Map/Reduce
          Issue Type: Bug
          Components: jobtracker, task-controller, tasktracker
    Affects Versions: 0.20.2, 0.20.1
         Environment: SUSE Linux cluster with 2 nodes: one running a jobtracker, namenode, datanode, and tasktracker; the other running a tasktracker and datanode.
            Reporter: Sudharsan Sampath
            Priority: Blocker


Running the following example hangs the child job indefinitely.

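import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.MultipleInputs;
import org.apache.hadoop.mapred.lib.NullOutputFormat;

// Note: MyInputFormat is referenced below but is not included in this report.
public class HaltCluster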
{

  public static void main(String[] args) throws IOException
  {
    JobConf jobConf = new JobConf();
    prepareConf(jobConf);
    if (args != null && args.length > 0)
    {
      jobConf.set("callonceagain", args[0]);
      jobConf.setMaxMapAttempts(1);
      jobConf.setJobName("ParentJob");

    }
    JobClient.runJob(jobConf);

  }

  public static void prepareConf(JobConf jobConf)
  {
    jobConf.setJarByClass(HaltCluster.class);
    jobConf.set("mapred.job.tracker", "<<jobtracker>>");
    jobConf.set("fs.default.name", "<<hdfs>>");
    MultipleInputs.addInputPath(jobConf, new Path("/ignore" + System.currentTimeMillis()),
        MyInputFormat.class);
    jobConf.setJobName("ChildJob");
    jobConf.setMapperClass(MyMapper.class);
    jobConf.setOutputFormat(NullOutputFormat.class);
    jobConf.setNumReduceTasks(0);
  }

}

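import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class MyMapper implements Mapper<IntWritable, Text, NullWritable, NullWritable>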
{
  JobConf myConf = null;

  @Override
  public void map(IntWritable arg0, Text arg1,
      OutputCollector<NullWritable, NullWritable> arg2, Reporter arg3) throws IOException
  {
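    // Parent job only: keep reporting status from a background thread,
    // then launch the child job from inside this map task.
    if (myConf != null && "true".equals(myConf.get("callonceagain")))
    {
      startBackGroundReporting(arg3);
      HaltCluster.main(new String[] {});
    }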

    throw new RuntimeException("Throwing exception");
  }

  private void startBackGroundReporting(final Reporter arg3)
  {
    Thread t = new Thread()
    {
      @Override
      public void run()
      {
        while (true)
        {
          arg3.setStatus("Reporting to be alive at " + System.currentTimeMillis());
        }
      }
    };
    t.setDaemon(true);
    t.start();
  }

  @Override
  public void configure(JobConf arg0)
  {
    myConf = arg0;

  }

  @Override
  public void close() throws IOException
  {
    // Nothing to clean up.
  }

}

Run it using the following command:

java -cp <<classpath>> HaltCluster true

However, if only one job is triggered, as in

java -cp <<classpath>> HaltCluster

it fails after the maximum number of attempts and exits as expected.


Also, while the jobs are hung, running the child job once more brings them out of
the deadlock, and all three jobs then complete.



--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
