hadoop-mapreduce-issues mailing list archives

From "Balaji Rajagopalan (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-1794) Test the job status of lost task trackers before and after the timeout.
Date Wed, 19 May 2010 08:17:54 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-1794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12869056#action_12869056 ]

Balaji Rajagopalan commented on MAPREDUCE-1794:
-----------------------------------------------


<code>
+  /**
+   * Verify the job status whether it is succeeded or not when 
+   * the lost task trackers time out for all four attempts of a task. 
+   * @throws IOException if an I/O error occurs.
+   */
+  @Test
+  public void testJobStatusOfLostTracker2()  throws 
+      Exception {
+    String testName = "LTT2";
+    setupJobAndRun();
+    JobStatus jStatus = verifyLostTaskTrackerJobStatus(testName);
+    Assert.assertEquals("Job has not been failed...", 
+            JobStatus.SUCCEEDED, jStatus.getRunState());
+  }
</code>

The expected JobStatus should be JobStatus.FAILED instead of JobStatus.SUCCEEDED. If the task
tracker was lost for all four attempts of a task, shouldn't the job fail rather than succeed?
If that is not the case, the message in the assert has to be changed to say that the job
succeeded even after losing the task tracker 4 times.

<code>
+    // Make sure that job should run and completes 40%. 
+    while (jobStatus.getRunState() != JobStatus.RUNNING && 
+      jobStatus.mapProgress() < 0.4f) {
+      UtilsForTests.waitFor(100);
+      jobStatus = wovenClient.getJobInfo(jID).getStatus();
+    }
</code>
Why do we check the job status for 40% completion? Also, can we enhance the building blocks
to check this kind of status, since the code can be reused elsewhere?
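
To illustrate the kind of reusable building block I mean, here is a minimal sketch of a generic polling helper with a timeout. The class name `PollingUtil` and the method `waitUntil` are hypothetical and not part of the existing test framework, and the sketch assumes Java 8 or later for the lambda syntax.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical helper for illustration; not an existing Hadoop class. */
final class PollingUtil {
  private PollingUtil() {}

  /**
   * Polls the condition until it holds or the timeout elapses.
   *
   * @return true if the condition was met, false on timeout or interrupt.
   */
  static boolean waitUntil(BooleanSupplier condition,
                           long timeoutMillis, long intervalMillis) {
    long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
    while (!condition.getAsBoolean()) {
      // Subtraction keeps the comparison safe against nanoTime overflow.
      if (System.nanoTime() - deadline >= 0) {
        return false;
      }
      try {
        Thread.sleep(intervalMillis);
      } catch (InterruptedException e) {
        // Restore the interrupt flag and give up waiting.
        Thread.currentThread().interrupt();
        return false;
      }
    }
    return true;
  }
}
```

In the test, the condition would re-fetch the status through `wovenClient.getJobInfo(jID).getStatus()` and check `mapProgress()` against the threshold, so the loop is bounded and cannot spin forever if the job never reaches it.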

<code>
+    TaskInfo[] taskInfos = wovenClient.getTaskInfo(jID);
+    for (TaskInfo taskinfo : taskInfos) {
+      if (!taskinfo.isSetupOrCleanup()) {
+        taskInfo = taskinfo;
+        break;
+      }
+    }
</code>
The above code can be part of a building block in JTClient. 
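
As a rough sketch of what such a building block could look like, the selection can be written once as a generic "first element matching a predicate" helper. The names `TaskSelector` and `firstMatching` are hypothetical, the helper is generic rather than tied to `TaskInfo`, and it assumes Java 8 or later.

```java
import java.util.function.Predicate;

/** Hypothetical helper for illustration; not an existing JTClient method. */
final class TaskSelector {
  private TaskSelector() {}

  /**
   * Returns the first non-null element that satisfies the predicate,
   * or null if the array is null or no element matches.
   */
  static <T> T firstMatching(T[] items, Predicate<? super T> predicate) {
    if (items == null) {
      return null;
    }
    for (T item : items) {
      if (item != null && predicate.test(item)) {
        return item;
      }
    }
    return null;
  }
}
```

With this, the loop in the patch would reduce to a single call that passes the array from `wovenClient.getTaskInfo(jID)` and a predicate equivalent to `!taskinfo.isSetupOrCleanup()`.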

<code>
+           while (counter < 30) {
+             if (ttClient != null) {
+               break;
+             }else{
+                taskInfo = wovenClient.getTaskInfo(taskInfo.getTaskID());  
+                ttClient = getTTClientIns(taskInfo); 
+             }
+             counter ++;
+           }
</code>
The above code is repeated a couple of times and can be moved into a function; if it is used
across test cases, it can be part of a building block.
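
One possible shape for that function is a bounded retry that repeats a lookup until it returns a non-null value. This is only a sketch: `Retry` and `untilNonNull` are hypothetical names, the pause between attempts is my addition (the loop in the patch does not wait), and it assumes Java 8 or later.

```java
import java.util.function.Supplier;

/** Hypothetical helper for illustration; not an existing Hadoop class. */
final class Retry {
  private Retry() {}

  /**
   * Calls the lookup up to maxAttempts times and returns the first
   * non-null result, or null if every attempt returned null.
   */
  static <T> T untilNonNull(Supplier<T> lookup, int maxAttempts,
                            long pauseMillis) {
    for (int attempt = 0; attempt < maxAttempts; attempt++) {
      T result = lookup.get();
      if (result != null) {
        return result;
      }
      if (pauseMillis > 0) {
        try {
          Thread.sleep(pauseMillis);
        } catch (InterruptedException e) {
          // Restore the interrupt flag and stop retrying.
          Thread.currentThread().interrupt();
          return null;
        }
      }
    }
    return null;
  }
}
```

In the test, the lookup would refresh `taskInfo` through `wovenClient.getTaskInfo(taskInfo.getTaskID())` and then return the result of `getTTClientIns(taskInfo)`, with 30 as the attempt limit.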

If you see the story description, we said we would suspend the task tracker and resume it, but
it seems that you have followed the route of killing the task tracker instead of pausing and
resuming it.
I think killing should be fine, since kill/start emulates pause and resume, but on the
performance side, had we used pause and resume, the waits in the test cases could be
reduced.

One general question: after the same task tracker is killed 4 times, it should get
blacklisted, and if you resubmit the job, that task tracker should not be used by the
job tracker.
Is it good to check that condition as part of this test case, or do you think this is out of
scope?
There is a URL that lists the blacklisted task trackers; if we can get the number through an
aspect, then it can be verified. Also, at the end of the test we need to remove the task
tracker from the blacklist so that the other tests can run without any problem.

> Test the job status of lost task trackers before and after the timeout.
> -----------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1794
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1794
>             Project: Hadoop Map/Reduce
>          Issue Type: Task
>          Components: test
>            Reporter: Vinay Kumar Thota
>            Assignee: Vinay Kumar Thota
>         Attachments: 1794_lost_tasktracker.patch
>
>
> This test covers the following scenarios.
> 1. Verify the job status whether it is succeeded or not when the task tracker is lost
> and alive before the timeout.
> 2. Verify the job status and killed attempts of a task whether it is succeeded or not
> and killed attempts are matched or not when the task trackers are lost and it timeout for
> all the four attempts of a task.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

