hadoop-mapreduce-issues mailing list archives

From "Harsh J (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-2236) No task may execute due to an Integer overflow possibility
Date Wed, 07 Sep 2011 08:51:10 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-2236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13098777#comment-13098777
] 

Harsh J commented on MAPREDUCE-2236:
------------------------------------

Well, do you want me to rebase, or do you feel there's no need to? I'm not targeting anything
lower than trunk here, so let me know if it's relevant (I'd also like the hows/whys :D)

> No task may execute due to an Integer overflow possibility
> ----------------------------------------------------------
>
>                 Key: MAPREDUCE-2236
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2236
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 0.20.2
>         Environment: Linux, Hadoop 0.20.2
>            Reporter: Harsh J
>            Assignee: Harsh J
>            Priority: Critical
>             Fix For: 0.23.0
>
>         Attachments: MAPREDUCE-2236.r1.diff, MAPREDUCE-2236.r1.diff, MAPREDUCE-2236.r2.diff
>
>
> If the maximum number of attempts is configured to Integer.MAX_VALUE, an overflow occurs inside TaskInProgress,
and as a result no task is attempted by the cluster and the map tasks stay in the pending state forever.
> For example, here's a job driver that causes this:
> {code}
> import java.io.IOException;
> import org.apache.hadoop.fs.FSDataOutputStream;
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.Path;
> import org.apache.hadoop.mapred.FileInputFormat;
> import org.apache.hadoop.mapred.JobClient;
> import org.apache.hadoop.mapred.JobConf;
> import org.apache.hadoop.mapred.TextInputFormat;
> import org.apache.hadoop.mapred.lib.IdentityMapper;
> import org.apache.hadoop.mapred.lib.NullOutputFormat;
> @SuppressWarnings("deprecation")
> public class IntegerOverflow {
> 	/**
> 	 * @param args
> 	 * @throws IOException 
> 	 */
> 	@SuppressWarnings("deprecation")
> 	public static void main(String[] args) throws IOException {
> 		JobConf conf = new JobConf();
> 		
> 		Path inputPath = new Path("ignore");
> 		FileSystem fs = FileSystem.get(conf);
> 		if (!fs.exists(inputPath)) {
> 			FSDataOutputStream out = fs.create(inputPath);
> 			out.writeChars("Test");
> 			out.close();
> 		}
> 		
> 		conf.setInputFormat(TextInputFormat.class);
> 		conf.setOutputFormat(NullOutputFormat.class);
> 		FileInputFormat.addInputPath(conf, inputPath);
> 		
> 		conf.setMapperClass(IdentityMapper.class);
> 		conf.setNumMapTasks(1);
> 		// Problem inducing line follows.
> 		conf.setMaxMapAttempts(Integer.MAX_VALUE);
> 		
> 		// No reducer in this test, although setMaxReduceAttempts leads to the same problem.
> 		conf.setNumReduceTasks(0);
> 		
> 		JobClient.runJob(conf);
> 	}
> }
> {code}
> The above code will not let any map task run. Additionally, a warning is written to the
JobTracker logs that clearly shows the overflow:
> {code}
> 2010-12-30 00:59:07,836 WARN org.apache.hadoop.mapred.TaskInProgress: Exceeded limit
of -2147483648 (plus 0 killed) attempts for the tip 'task_201012300058_0001_m_000000'
> {code}
> The issue lies inside the TaskInProgress class (/o/a/h/mapred/TaskInProgress.java), at
line 1018 (trunk), inside the getTaskToRun(String taskTracker) method:
> {code}
>   public Task getTaskToRun(String taskTracker) throws IOException {   
>     // Create the 'taskid'; do not count the 'killed' tasks against the job!
>     TaskAttemptID taskid = null;
>     /* ============ THIS LINE v ====================================== */
>     if (nextTaskId < (MAX_TASK_EXECS + maxTaskAttempts + numKilledTasks)) {
>     /* ============ THIS LINE ^ ====================================== */
>       // Make sure that the attempts are unique across restarts
>       int attemptId = job.getNumRestarts() * NUM_ATTEMPTS_PER_RESTART + nextTaskId;
>       taskid = new TaskAttemptID( id, attemptId);
>       ++nextTaskId;
>     } else {
>       LOG.warn("Exceeded limit of " + (MAX_TASK_EXECS + maxTaskAttempts) +
>               " (plus " + numKilledTasks + " killed)"  + 
>               " attempts for the tip '" + getTIPId() + "'");
>       return null;
>     }
> {code}
> Since all three variables being added are of type int, setting one of them to Integer.MAX_VALUE
makes the sum overflow to a negative value, so the condition is never true; the warning is logged
and null is returned.
> One solution would be to make one of these variables a long, so that the addition does
not overflow?
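
To make the failure mode concrete, here is a minimal, self-contained sketch of the arithmetic in that condition and of the proposed widening to long. The constant and variable names mirror the fields in TaskInProgress, but the values here are illustrative, not taken from a running cluster:

```java
// Sketch of the overflow in getTaskToRun()'s bounds check, and the
// proposed fix of widening one operand to long. Names mirror the
// TaskInProgress fields; values are illustrative.
public class OverflowSketch {
    static final int MAX_TASK_EXECS = 1; // extra slot for speculative execution

    public static void main(String[] args) {
        int maxTaskAttempts = Integer.MAX_VALUE; // conf.setMaxMapAttempts(Integer.MAX_VALUE)
        int numKilledTasks = 0;
        int nextTaskId = 0;

        // Current check: the int addition wraps around to a negative value,
        // so the condition is false and no attempt is ever scheduled.
        int intSum = MAX_TASK_EXECS + maxTaskAttempts + numKilledTasks;
        boolean brokenCheck = nextTaskId < intSum;
        System.out.println(intSum);      // -2147483648 (matches the logged limit)
        System.out.println(brokenCheck); // false

        // Proposed fix: widen one operand to long; the whole sum is then
        // computed in 64-bit arithmetic and cannot wrap.
        long longSum = (long) MAX_TASK_EXECS + maxTaskAttempts + numKilledTasks;
        boolean fixedCheck = nextTaskId < longSum;
        System.out.println(fixedCheck);  // true
    }
}
```

Note that the cast must be applied to one of the operands, not to the already-overflowed int result: `(long) (a + b + c)` would still wrap before the widening happens.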

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
